Transcript
So it's totally different. Another very common classifier is the k-nearest-neighbor (k-NN) classifier. The K here is actually a hyperparameter of k-NN, but we'll talk about the meaning of K later. So consider a classification task, especially multi-class classification. The input is a set of training examples X1 to Xn, each with a label.
And then we want to classify a new test point. How do we classify it? For the models we have seen so far, what we do is first use the training examples to learn decision boundaries, and then classify a data point according to which region it falls into.
Okay, so here you see that whether it is a linear model or a nonlinear model, the first step is always to learn the decision function from the training samples and their labels.
k-NN works differently. For a test point XT, say with K equal to three, we first find its three nearest neighbors among the training samples. And then, once we have these nearest neighbors, we let the K nearest neighbors vote.
So the first step is to find the K nearest neighbors. Of course, one question is: how do we define the nearest neighbor? What does "nearest" mean? We need a similarity or distance measure. For instance, we can use the Euclidean distance, the cosine similarity, or even a Gaussian kernel.
Right, so the first step is to measure the similarity, and the second is to find the K nearest neighbors. How can we find them? We have n training samples, and to find the K nearest neighbors we just need to first compute the pairwise similarities between XT and all the training examples, and then sort the scores.
First, compute the pairwise similarity between XT and all the Xi.
Then we sort the similarities, and take the top K.
You can see that each sample here is d-dimensional. So computing the similarity between one Xi and XT costs O(d), and we need to compute all n similarities, so the time for testing is O(nd).
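The brute-force procedure just described can be sketched as follows. This is a minimal illustration, not code from the lecture; the data, the function name, and the use of Euclidean distance are my own choices for the example.

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k=3):
    """Classify one test point by majority vote of its k nearest
    training samples (Euclidean distance, brute force)."""
    # Distances between x_test and all n training samples: O(n*d)
    dists = np.linalg.norm(X_train - x_test, axis=1)
    # Indices of the k smallest distances: the k nearest neighbors
    nearest = np.argsort(dists)[:k]
    # Majority vote among the neighbors' labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],   # class 0 cluster
                    [2.0, 2.0], [2.1, 1.9]])              # class 1 cluster
y_train = np.array([0, 0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.15, 0.1]), k=3))  # → 0
```

Note that nothing is learned here: the "model" is just the stored training set, and all the work happens at prediction time.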
In the first option, we assume that every neighbor has the same weight. So in this example, among the five nearest neighbors, three are blue and two are green; with equal weights, we classify the test point as blue by majority vote. But what if we also look at the corresponding similarities, or equivalently the distances, of these five neighbors?
In this case, if we still apply option one, we assume the neighbors have the same weight and classify by majority vote. But the two green points are actually much nearer. So the second option is: we do not force the neighbors to have the same weight; instead, the nearer a neighbor is, the larger its weight.
So here, we use the similarities as the weights of the test point's nearest neighbors.
And even though there are three blue points, the total weight of the three blue points is actually smaller than the total weight of the two green points. So if you consider the weights, you should classify the test point as green instead.
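The weighted vote of option two can be sketched like this. It's a minimal example with made-up distances; I use inverse distance as the weight, which is one common choice among several (the lecture's "similarity as weight" would work the same way).

```python
import numpy as np

def weighted_knn_vote(dists, labels):
    """Option 2: weight each neighbor by inverse distance, so nearer
    neighbors count more. dists/labels are for the k neighbors only."""
    weights = 1.0 / (np.asarray(dists) + 1e-12)  # avoid division by zero
    score = {}
    for w, lab in zip(weights, labels):
        score[lab] = score.get(lab, 0.0) + w
    return max(score, key=score.get)

# Three "blue" neighbors far away, two "green" neighbors very close:
dists  = [5.0, 5.0, 5.0, 0.5, 0.5]
labels = ["blue", "blue", "blue", "green", "green"]
print(weighted_knn_vote(dists, labels))  # → green (unweighted vote says blue)
```

With weights 0.2 each for the blue points (total 0.6) against 2.0 each for the green points (total 4.0), the two near green neighbors outvote the three far blue ones.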
Right, so if you want to classify some test sample XT, the first step is to compute the similarities between XT and all training samples, and then we find the top K and apply the voting rule. You can see the difference between k-NN and the other algorithms: there is no training at all. We only need the training samples and their labels.
So k-NN actually has no training procedure. Okay, no training procedure at all. On the other hand, the cost is that to classify each test sample, the complexity involves all the training samples, because computing the similarities to all of them costs O(nd). If n is very large, of course, this complexity is a problem.
Right, this is the fundamental cost of k-NN, so of course we would like to make it more efficient. Actually, I would like to give you one minute to think about this: if you want to make k-NN efficient, do you have any solution? We know the bottleneck is computing the similarity between the test sample and all the training samples.
Here is an analogy. There are about 30,000 post offices in the US, each with a latitude and longitude. If you want to find the nearest one naively, you need to compute the distance between your location and all 30,000 post office locations. But of course, there's no need for you to consider post offices far away in the east or in the west, right?
We can avoid that with the following idea. At query time, we don't need to compare with all of the post offices. Instead, we can build some landmarks, one for each state. This is kind of like treating each state as a cluster: first, generate one landmark per state, meaning we find the center point...
...of the post offices in that state, right? And then we assign the post offices to their nearest landmarks. That's the training stage.
At query time, we first just compare our location with the landmarks.
Because in the training stage, in step 2, we already know which post offices are near which landmark. So we find the nearest landmark, and then we only compare with the post offices assigned to that landmark. You can see that the time complexity is now reduced from the number of training samples to roughly the number of landmarks plus the size of one cluster.
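The two-stage landmark search might look like the following sketch. The grouping, the point coordinates, and the function names are all invented for illustration; here the cluster assignment is given by hand rather than learned.

```python
import numpy as np

def build_landmarks(points, assign):
    """Training stage: one landmark (centroid) per group, plus the
    membership list of each group. assign[i] is the group of point i."""
    groups = {}
    for i, g in enumerate(assign):
        groups.setdefault(g, []).append(i)
    landmarks = {g: points[idx].mean(axis=0) for g, idx in groups.items()}
    return landmarks, groups

def nearest_via_landmarks(points, landmarks, groups, q):
    """Query stage: compare q with the m landmarks first, then only
    with the members of the closest group -- roughly O(m + n/m)
    distance computations instead of O(n)."""
    g = min(landmarks, key=lambda g: np.linalg.norm(landmarks[g] - q))
    return min(groups[g], key=lambda i: np.linalg.norm(points[i] - q))

# Two "states": points near (0,0) belong to state A, near (10,10) to B.
pts = np.array([[0., 0.], [1., 0.], [0., 1.], [10., 10.], [11., 10.]])
lm, gr = build_landmarks(pts, ["A", "A", "A", "B", "B"])
print(nearest_via_landmarks(pts, lm, gr, np.array([10.2, 10.0])))  # → 3
```

One caveat of this scheme: if the query sits near a cluster boundary, the true nearest neighbor may live in a neighboring cluster, so the answer is approximate.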
Actually, there are also some other efficient algorithms, like the k-d tree and locality-sensitive hashing, but we won't go into them here.
The strategy of cross-validation is actually used for all types of machine learning algorithms that have hyperparameters. So let's first look at what a hyperparameter is. The prefix "hyper" means these parameters cannot be learned with the model.
Like I said, we try to learn the decision function, and this function basically contains the model parameters. The model parameters can be learned through the training procedure. Hyperparameters, in contrast, cannot be learned by the machine learning model itself, and I want to explain why with some examples.
Let's see what the typical hyperparameters are. In polynomial regression, there is the degree of the polynomial...
...which we actually need to specify, right? In regularized linear regression, we have the regularization coefficient lambda. If we run kernel regression with a Gaussian kernel, it has a bandwidth sigma. If you use a Gaussian kernel in your SVM, you also need to specify sigma.
And in stochastic gradient descent, we have the step size alpha, also called the learning rate; that is a hyperparameter.
And we will also give some introduction to deep neural networks in future slides; they have their own hyperparameters, like the number of layers and the number of neurons per layer. And in the k-NN classifier, we have the number of neighbors K. All these hyperparameters are really important: if you set these values inaccurately, then of course the performance can be very bad.
All right. So the terms overfitting and underfitting can actually be illustrated by this figure, where the x-axis is the model complexity and the y-axis is the error.
We know that if the model is very complicated, then the training error can be very small, even zero. But a very complicated model tends to overfit the data, so the test error is very high.
If the model is very simple, the training error is very large; that is underfitting.
Somewhere in between, the test error is small. In the polynomial regression example, the degree-1 (linear) model is very simple, the degree-15 model is very complicated, and the degree-4 model is about right. We will just use this polynomial regression model as an example to show the idea of how we can leverage cross-validation to decide the optimal value of the hyperparameter, the degree.
Right, so which one is the optimal? Like I said, the optimal is basically the model that gives the smallest mean squared error. If we compare across the polynomial degrees, the degree-4 model gives the smallest test error, so it seems the degree-4 model is the best.
But actually, this is a common mistake, and I want to emphasize it here.
I want to say that this is a big mistake. Actually, even published papers have made mistakes like this, and it is totally wrong. What's wrong? When you compute the mean squared error on the test set to choose the hyperparameter, that means the test labels are already being used. But if you want to apply the model in practice, of course you don't know the test labels.
The test labels are what you want to predict. Okay?
It's problematic because the test labels are unavailable in practice. But even if you happened to have the test labels, you still cannot do this. And that's why we use cross-validation to address this issue. So here, we really only know the training samples.
We really only know the training samples, and we also know their labels. So a very common strategy is that, instead of using the test samples, for which we don't have labels, we take the training samples, whose labels we know, and separate them into...
...two portions. The first part is called the training samples, and the second part is called the validation samples.
So here, you can see that the degree-4 polynomial regression produces the smallest validation mean squared error, so we choose degree 4.
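The hold-out validation procedure can be sketched as follows. The data here is synthetic (a cubic with noise, my own choice), and the candidate degrees are illustrative; the point is only that the degree is chosen on the validation split, never on the test set.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 60)
y = x**3 - x + rng.normal(scale=0.2, size=x.size)  # noisy cubic data

x_tr, y_tr = x[::2], y[::2]      # training portion
x_va, y_va = x[1::2], y[1::2]    # held-out validation portion

def val_mse(deg):
    """Fit a degree-`deg` polynomial on the training portion and
    return its mean squared error on the validation portion."""
    coeffs = np.polyfit(x_tr, y_tr, deg)
    pred = np.polyval(coeffs, x_va)
    return float(np.mean((pred - y_va) ** 2))

degrees = [1, 2, 3, 4, 15]
best = min(degrees, key=val_mse)
print(best, val_mse(best))
```

A degree-1 model underfits this data badly, so its validation error is much larger than that of degrees near the true one; the selected `best` depends on the noise draw.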
Now I'll show a common cross-validation strategy, K-fold cross-validation, which lets us maximally use the available labelled data for both training and validation.
How can we do that? The first step is to define a predefined set of hyperparameter candidates, say P, and P can be {1, 2, 5, 8}. And then we randomly partition the training set into K folds. We use K minus 1 folds for training...
...and the remaining one fold for testing, or rather for validation.
And then we compute the average error over the K runs, because when you partition the training set into K folds, each fold takes its turn as the held-out fold, giving K different combinations.
And this average error over the K repeats is called the validation error. And then we just choose the hyperparameter value that gives the smallest validation error.
You can see that across these K combinations, each fold is used for validation exactly once, so we maximally leverage the labelled data.
For example, with K equal to 10, for each run you have 9 parts for training and the remaining part for validation, and then you can compute the error for each run.
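The K-fold procedure just described can be sketched like this (again on synthetic cubic data of my own choosing, with polynomial degree as the hyperparameter and K = 10):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 100)
y = x**3 - x + rng.normal(scale=0.2, size=x.size)  # noisy cubic data

K = 10
idx = rng.permutation(x.size)
folds = np.array_split(idx, K)       # random partition into K folds

def cv_error(deg):
    """Average validation MSE over the K repeats for one degree."""
    errs = []
    for i in range(K):
        va = folds[i]                                   # held-out fold
        tr = np.concatenate([folds[j] for j in range(K) if j != i])
        coeffs = np.polyfit(x[tr], y[tr], deg)          # train on K-1 folds
        errs.append(np.mean((np.polyval(coeffs, x[va]) - y[va]) ** 2))
    return float(np.mean(errs))      # the cross-validation error

degrees = [1, 2, 3, 4, 8]
best = min(degrees, key=cv_error)
print(best, cv_error(best))
```

Note that the same random partition is reused for every candidate degree, so the comparison between degrees is fair.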
Right, so to summarize: for k-NN, there is no training, and the time for testing is O(nd), which we can reduce by introducing landmarks or other index structures.
And to tune hyperparameters, we use K-fold cross-validation. Basically, we split the training data into folds, train the model on some of them, and evaluate on the held-out fold instead of the test set. All right.
Cross Validation