Lecture Script

Summary

Kernel Functions and Mapping to Higher Dimensions

Kernel Functions in Soft-Margin SVM

Dual Form and Kernel Matrix

Example: Computing Kernel from Mapping Function

Types of Kernels

Kernel Matrix Properties

RBF Kernel SVM Example

Summary of SVM Concepts

Notes

Transcript

Which mapping can separate the two classes? Okay. So this table of numbers basically shows that the data is separable in a higher-dimensional space, even though it is not separable in the original low-dimensional space. And then the question is, how can we map the data into that higher-dimensional space? Right, this is what we care about. How do we find such a mapping? We will assume that we don't know the data distribution, right?

But what if we don't know the mapping? Can we still find a mapping such that the data can be classified in the higher-dimensional space? Okay. So this is where the kernel comes in. The idea of the kernel basically is to map the low-dimensional data into a high-dimensional space where it becomes linearly separable.

In the higher-dimensional space, we want to use a kernel k(x, x'), which is defined through the mapping phi. This mapping function phi maps a point x from the d-dimensional space to the D-dimensional space. [Student] Excuse me, can you explain what "kernel" means? Ah. The kernel here, I will use an example later to show what an exact kernel looks like.

But here, just to understand the kernel as a function: the data in the original space is not linearly separable, like in this case, but after the mapping it becomes separable. Right.

[Student] So we are just lifting the data? [Instructor] Yes, you can think about it that way.

This is to lift the data into the higher-dimensional space. Okay. The higher-dimensional space is obtained by this mapping function phi, and the kernel is defined through this function phi. Later I will show a specific form of this kernel. But for now, you can think of the kernel as something that depends on the function phi, and phi maps data from the low-dimensional space to the high-dimensional space.
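To pin down the notation being used here, this is the standard definition the lecture is referring to (written out from context, not copied from a slide):

```latex
% Feature map into a higher-dimensional space, and the kernel as an inner product there
\varphi : \mathbb{R}^{d} \to \mathbb{R}^{D}, \qquad
k(\mathbf{x}, \mathbf{x}') \;=\; \varphi(\mathbf{x})^{\top} \varphi(\mathbf{x}')
```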

Of course, this capital D is larger than d. Okay. There are different names for phi: mapping function, feature mapping, or simply some nonlinear function. So here I have just shown the idea of why the kernel function and this phi function are introduced. Now, recall that we talked about the soft-margin SVM, and I mentioned that we need its dual form.

In the soft-margin SVM, in the dual form we have an inner product, but let's talk about that later; just keep it in mind. Okay, so the soft-margin SVM here is now not defined on the points from the original space x_m, but on phi(x_m). Okay. And with that, we can get this prediction.

So if we apply this feature mapping phi, we just need one additional step: map each x to phi(x), and then solve the problem in the higher-dimensional space.

Of course, there are problems with doing this directly. First, there are many parameters w to learn, even more than in the original space, because the dimension D is much higher. And in certain cases, phi can be infinite-dimensional. What that means is, if we take a look at the primal soft-margin SVM, it is not easy to incorporate this mapping function here directly.

If you substitute phi(x_m) into the primal objective function, the weight vector w now lives in the D-dimensional feature space, which can be very large.
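For reference, the primal soft-margin SVM with the feature mapping plugged in takes the standard form below (reconstructed from context rather than quoted from the slide; M denotes the number of training points):

```latex
% Primal soft-margin SVM in the feature space: w now lives in R^D, which can be huge
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;\;
  \tfrac{1}{2}\|\mathbf{w}\|^{2} + C \sum_{m=1}^{M} \xi_{m}
\quad \text{s.t.} \quad
  y_{m}\bigl(\mathbf{w}^{\top}\varphi(\mathbf{x}_{m}) + b\bigr) \ge 1 - \xi_{m},
  \qquad \xi_{m} \ge 0
```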

So this is the learning problem. Remember that I mentioned that in the dual objective function, there is one part that is an inner product between pairs of data points. Similarly, once we apply this feature mapping, we have the same inner product, but now in the higher-dimensional space. And the same thing happens in the prediction function.

The mapped prediction function is this, okay?

Notice that in both the learning objective and the prediction, the phi's appear only in pairs; basically, the data enters only through the inner product between two points. And also, for this objective function, remember that instead of solving the primal problem in the feature space, we only need to solve for the dual variables alpha, and the number of alphas is the same as the number of data points. That means whatever feature mapping you are using, this new objective function still only needs to store the same number of parameters, right?

The number of parameters stays the number of samples. And now, here is the exact role of the kernel function: you just need to define the kernel function as that inner product of the mapped points. In other words, in this dual form we don't need to explicitly define the feature mapping; we don't need to define the function phi itself. We just need to introduce a kernel function.
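Concretely, the dual objective and the prediction function with the inner product replaced by a kernel take the standard form below (reconstructed, not verbatim from the slides):

```latex
% Dual soft-margin SVM: the data enters only through k(x_m, x_n)
\max_{\boldsymbol{\alpha}} \;\;
  \sum_{m=1}^{M} \alpha_{m}
  - \tfrac{1}{2} \sum_{m=1}^{M} \sum_{n=1}^{M}
      \alpha_{m} \alpha_{n}\, y_{m} y_{n}\, k(\mathbf{x}_{m}, \mathbf{x}_{n})
\quad \text{s.t.} \quad 0 \le \alpha_{m} \le C, \;\; \sum_{m=1}^{M} \alpha_{m} y_{m} = 0

% Prediction also needs only kernel evaluations against the training points
f(\mathbf{x}) = \operatorname{sign}\Bigl( \sum_{m=1}^{M} \alpha_{m} y_{m}\, k(\mathbf{x}_{m}, \mathbf{x}) + b \Bigr)
```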

Okay, so the kernel is a function which reflects the similarity between phi(x) and phi(x'). So with that, we just need to simply replace this inner product with a certain kernel function. But at this point we haven't said what exactly the kernel function is, right? The key idea is that in the dual form, for each data point x, whatever the feature mapping phi is,

then in this dual form, in either learning or prediction, the data only ever appears through that inner product. This inner product reflects a similarity between a pair of points, right? And this inner product in the higher-dimensional space is simply defined as the kernel, and of course this kernel should, or can, capture the similarity between the two points. And by introducing the kernel function, we don't need to

explicitly compute the phi's. Okay, we don't even need to know what phi looks like, because phi(x) can be infinite-dimensional, which means very, very high-dimensional. All we need to do is define a kernel k. OK? Right. Now I am using an example to show the relationship between the phi and the kernel k. This is just an example. Suppose x is from the two-dimensional space, x = (x1, x2), and we map it to the three-dimensional space; we assume we have this function phi.

And then, what is the kernel function here? I'll give you one minute to calculate this. Maybe two minutes. Of course, you know the relation between the kernel and the mapping phi, OK? Okay, now, any solution? What is the kernel function based on the definition? Let me just walk through it: we compute the inner product.

We take phi(x) and phi(x'), and the kernel is defined as the inner product between these phi's. The kernel k(x, x') is basically the inner product of phi(x) and phi(x'). If we write that out, we can simplify it: the kernel is the inner product of x and x' in the original space, squared.
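The exact phi on the slide is not legible in the recording, but the standard 2D-to-3D mapping that produces exactly this kernel is the following, and the derivation is one line:

```latex
% Assumed standard feature map; squaring the inner product recovers it exactly
\varphi(\mathbf{x}) = \bigl(x_{1}^{2},\ \sqrt{2}\,x_{1}x_{2},\ x_{2}^{2}\bigr)^{\top}
\qquad\Rightarrow\qquad
\varphi(\mathbf{x})^{\top}\varphi(\mathbf{x}')
  = x_{1}^{2}x_{1}'^{2} + 2\,x_{1}x_{2}\,x_{1}'x_{2}' + x_{2}^{2}x_{2}'^{2}
  = \bigl(\mathbf{x}^{\top}\mathbf{x}'\bigr)^{2}
```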

And let us do another exercise.

Now suppose we know the kernel function; then what is phi(x), the mapping function? Here x is in the three-dimensional space. What is the corresponding phi?

Okay, I'll just show the answer here, okay?

You can see that the original x is in the three-dimensional space, but now phi(x) is in a much higher-dimensional space. So this example basically shows that once you have phi, you can calculate the kernel from phi, and conversely a kernel corresponds to some phi. But in practice we rarely design this kernel function from scratch; we have some representative kernels. The first one is called the linear kernel, and this one is very

simple: it is just the inner product of the data points in the original space. The second one is the polynomial kernel. Okay. This polynomial kernel contains all the polynomial terms up to a given degree. And I would especially like to mention this Gaussian kernel; it is the most commonly used kernel. Because here, we can write this kernel k in a simple closed form, but when you try to work out the corresponding feature mapping, the phi can be infinite-dimensional.
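As a concrete reference, here is a minimal sketch of the three kernels in Python/NumPy (the parameter names degree, c, and gamma are illustrative choices, not values from the slides):

```python
import numpy as np

def linear_kernel(x, z):
    # Inner product in the original space: k(x, z) = x . z
    return x @ z

def polynomial_kernel(x, z, degree=2, c=1.0):
    # (x . z + c)^degree contains all monomial features up to the given degree
    return (x @ z + c) ** degree

def gaussian_kernel(x, z, gamma=1.0):
    # exp(-gamma * ||x - z||^2); the implicit feature map is infinite-dimensional
    return np.exp(-gamma * np.sum((x - z) ** 2))

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(linear_kernel(x, z), polynomial_kernel(x, z), gaussian_kernel(x, z))
```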

For the Gaussian kernel the implicit phi is infinite-dimensional, you see? So here we do not need to design a specific feature mapping; that mapping would be infinite-dimensional. Instead, we just need to define a kernel function, and this kernel already implies some mapping into a (possibly infinite-dimensional) feature space. Once we have a kernel, the next thing we need is the kernel matrix, right?

You see, the kernel is defined on a pair of data points. So by collecting the kernel values over all pairs of data points, we get a kernel matrix, also called the Gram matrix, denoted K, which is m by m, where m is the number of samples. This kind of matrix has many good properties. The first is that it is positive semi-definite. Positive semi-definite means that given any vector v, the product v^T K v is non-negative.

This property is very important, because once you define your objective function, which includes this kernel matrix, it means the problem is convex, so you can always find an optimal solution, and we can even talk about the unique optimal solution. That is because of this property of the kernel matrix, okay? And also, note that given the data, whatever the data is, this kernel matrix can be computed once

and then stored offline, right, because it only depends on the data points themselves, not on the parameters. We just need to compute it once and then store the kernel matrix. OK.
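A minimal sketch of building the Gram matrix once and checking positive semi-definiteness; the Gaussian kernel and the random data here are purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))   # m = 50 samples in d = 2 dimensions

def gaussian_kernel(x, z, gamma=1.0):
    return np.exp(-gamma * np.sum((x - z) ** 2))

# Kernel (Gram) matrix K[i, j] = k(x_i, x_j): computed once, then reusable offline
m = X.shape[0]
K = np.array([[gaussian_kernel(X[i], X[j]) for j in range(m)] for i in range(m)])

# Positive semi-definite: v^T K v >= 0 for every v, i.e. all eigenvalues >= 0 (up to round-off)
print(np.linalg.eigvalsh(K).min() >= -1e-8)   # True
```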

And we use this kernel matrix in the dual form; notice there is no phi here anymore.

So with that, with the kernel function, of course you can use whatever kernel you want, in the objective function or the prediction function, right? And when we use the Gaussian kernel function here, this SVM gets a special name: the RBF kernel SVM.

You may have heard of the RBF SVM before.

It applies the Gaussian kernel, also called the radial basis function kernel, which is why it is named the RBF kernel SVM. And then here is an example. The data, of course, are not linearly separable, right? And if you apply the RBF kernel SVM, you can find this black curve. This black curve is the decision boundary, even though the data is not linearly separable.

You can see this in the 2D space.

This is the boundary.

And these dashed curves, where the decision function equals +1 and -1, are defined by the support vectors. You can see the support vectors highlighted for each class: a few from the crosses and a few from the circles. This is the decision boundary learned using the RBF kernel. Okay?
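For reference, a minimal sketch of fitting an RBF-kernel SVM on data that is not linearly separable, using scikit-learn; the dataset and the C/gamma values are illustrative choices, not the ones from the slide:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the original 2D space
X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=0)

# RBF (Gaussian) kernel SVM; C controls the soft margin, gamma the kernel width
clf = SVC(kernel="rbf", C=1.0, gamma=1.0)
clf.fit(X, y)

print(clf.score(X, y))              # training accuracy
print(len(clf.support_vectors_))    # the support vectors define the boundary
```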

So, a summary about the SVM, the support vector machine. It is a maximum-margin classifier, and the margins here are defined by the support vectors; that is why it is called a support vector machine. Hopefully you are familiar with that, and with how the weight vector relates to the margin. And we talked about the primal problem, which is a QP, a quadratic program, whose number of variables scales with the dimension of the features.

And then we can write the dual form. The dual form is also a quadratic problem, but now the number of variables is the number of samples, which is more efficient for high-dimensional data. Then we talked about the soft-margin SVM, which still separates the data linearly but allows some misclassification. The main idea is to introduce slack variables, right? And we know that with these slack variables, both the primal and the dual problem can be written as

a hinge loss plus a regularization term (the exact form is sketched right after this point). The hinge loss upper-bounds the zero-one loss, which is the ideal loss, okay? And then, finally, we introduced the kernel, the kernel trick.
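Written out, that equivalence is the regularized hinge-loss form (standard formulation, using the same notation as above):

```latex
% Soft-margin SVM as hinge loss + L2 regularization; the hinge term upper-bounds the 0-1 loss
\min_{\mathbf{w},\, b} \;\;
  \tfrac{1}{2}\|\mathbf{w}\|^{2}
  + C \sum_{m=1}^{M} \max\bigl(0,\ 1 - y_{m}(\mathbf{w}^{\top}\mathbf{x}_{m} + b)\bigr)
```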

The kernel trick is especially useful when the data is not linearly separable in the original low-dimensional space, right?

The main idea is to map the data from the original space to a higher-dimensional space where it is linearly separable.

And there are three representative kernels.

Among these, the Gaussian kernel, the RBF kernel, is the most widely used. Okay, right. So far we have...


Hard-Margin SVM

The Concept of Linear Separability and the Margin



Basic Assumptions of the Hard-Margin SVM