00:00 | Hello, everyone. Welcome to today's lecture. We're going to talk about logistic regression. This is probably one of the most classical machine learning algorithms. It is the very basic classification learning algorithm, and I would guess everyone who wants to learn machine learning learns it as their first classification algorithm. But before I do that, I just want to spend some time explaining the bias and variance problem I touched on last week but probably didn't do a good job explaining.

00:45 | The reason I want to spend time talking about this is that if you train a machine learning model on your data set and find that it is not performing as well as you expected, most likely you have run into either a bias or a variance problem. It's very important to figure out whether you are dealing with a bias or a variance problem, because that tells you what to do next in order to improve your machine learning model. Remember that last week we also talked about the different strategies, the different things you can do, in order to deal with bias and variance problems. So let me use a simple example.
|
|
01:32 | This is an example that you have already seen several times: on the vertical axis is the price, and on the horizontal axis is the size of the house. We are trying to predict the housing price based on the size of the house. In this example, if our hypothesis is too simple, for example if we assume that there is a linear relationship between price and size, then our machine learning model will look something like this, basically a straight line. In this case, it's clear that we are underfitting the training data, because this straight line cannot capture all the information.

02:22 | On the other hand, if your machine learning model is too complicated, for example if we use a polynomial model of degree four, then our model will look something like that. It fits the training data really well, because you can tell that this curve passes through all our training data points. But because of that, this model probably will not generalize as well to a new data set; in this case we have overfit the data.

03:15 | Also, if we assume a model that is neither too simple, like the linear regression case, nor too complex, like the degree-four polynomial case, and we simply assume a polynomial model of up to degree two, the model that you end up with will look like this. In this case, I hope you agree with me that this is probably the best model we can use to predict the housing price based on the size of the house.
|
|
03:52 | In the machine learning literature, people use the term high bias to describe this underfitting problem and high variance to describe the overfitting problem.

04:19 | In most cases the bias problem is due to our oversimplified assumptions. For example, we assume a linear model when our training data were actually generated by a highly nonlinear model. This will lead to underfitting the data, meaning that our machine learning model misses some of the important information, the important relationships, among the training data.

04:48 | The variance problem comes from the fact that our machine learning model tends to be excessively sensitive to small variations in the training data, like noise. In this case, it will lead to overfitting the data, meaning that the model is too powerful and captures irrelevant or sometimes even noise features in your training data. This will happen, for example, if we assume a high-degree polynomial model when our data actually are linear.
|
|
05:46 | It turns out that there is a well-known bias and variance trade-off. What does that mean? What this means is that if we increase the model's complexity, for example if we increase the number of polynomial features and the degree of those polynomial features, that will typically increase the variance, because our model is now more capable of capturing very small variations in the data. So the variance will increase; at the same time, the bias will be reduced.

06:28 | Conversely, if we reduce the model's complexity, for example if we go from a polynomial model back to a degree-one linear regression model, obviously the variance decreases, because the model becomes linear and so it becomes incapable of capturing the small variations, but the bias increases.

07:05 | So you can see that there is a bias and variance trade-off: in most cases, if you reduce one, you will increase the other one. It is very difficult to reduce both bias and variance at the same time; this is very well known as the bias and variance trade-off. I hope that gave you a better understanding of the bias and variance problems.
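(Not part of the lecture, but if you want to see this trade-off numerically, here is a minimal sketch using scikit-learn. The data below is synthetic and purely illustrative: as the polynomial degree grows, training error keeps dropping while the held-out error typically stops improving or gets worse.)

```python
# Sketch: comparing underfit, reasonable, and overfit polynomial models on made-up
# "house size vs. price" data. Not from the course notebook.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
size = rng.uniform(50, 250, 60).reshape(-1, 1)                       # hypothetical house sizes
price = 20 + 2.5 * np.sqrt(size).ravel() + rng.normal(0, 1.0, 60)    # made-up noisy prices

X_train, X_test, y_train, y_test = train_test_split(size, price, random_state=0)

for degree in (1, 2, 4):   # degree 1 tends to underfit, degree 4 tends to overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree,
          mean_squared_error(y_train, model.predict(X_train)),
          mean_squared_error(y_test, model.predict(X_test)))
```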
|
|
07:28 | Okay, now let's focus on today's topic: logistic regression. I'm going to talk about the basic ideas and concepts behind this algorithm. I'll also try to help you understand intuitively what logistic regression does.

07:53 | Also, for those who are interested in learning more about how to develop a cost function and what optimization is about, that material is included, but please feel free to skip that part, because it is not required; it is beyond the scope of this class. For your lab exercise and homework, for example, you do not need to know how to develop the cost function.

08:15 | And even if you work in industry and you deal with logistic regression on a daily basis, if you are using open-source libraries like scikit-learn or TensorFlow, you still do not need to know how the cost function works. That part is purely for those who are interested in learning more about the cost function and optimization. Finally, I will also give a demo that shows how to implement logistic regression in scikit-learn.
|
|
08:54 | Okay, so the first thing to know about logistic regression is that it is a classification method. Remember that last week we talked about the two basic categories of machine learning algorithms: regression and classification. Regression always predicts continuous numerical values, while classification always predicts discrete, categorical numbers. Here, the 1, 2, 3 simply mean class or category one, category two, category three.

09:36 | So the first thing I want to make clear is that logistic regression is a classification method, despite the fact that it is called logistic regression. This name is very, very misleading, but just keep in mind that this is a classification algorithm rather than a regression algorithm.
|
|
10:00 | So here are just a few examples of where logistic regression is useful. For example, for emails, we want to classify emails into spam or non-spam, so basically a yes-or-no answer. Another application of logistic regression is online transactions: we want to detect whether an online transaction is fraudulent or not. Again, this is a case where the answer is yes or no: yes, it is a fraudulent transaction, or no, it is not.

10:49 | Another application is classifying whether a patient's tumor is malignant or benign. For self-driving cars, logistic regression is also useful because it can classify data into pedestrian or not. The cat classification problem that I used last week to explain supervised machine learning can also be done using logistic regression; again, here we're asking whether it is a cat or not a cat.

11:45 | In applications to geoscience problems, logistic regression has also found its use, for example salt dome detection. In this case, you might want to predict whether a particular cell of a model is salt or not. And in this week's lab exercise, you're going to classify about 10,000 seismic traces into the two classes good and bad.

12:12 | So you see that in all these examples, logistic regression serves as a binary classifier: we only have two outcomes, yes or no, or category one and category two.
|
|
12:32 | So, logistic regression is a supervised learning algorithm. Here is just a recap of supervised learning. Remember that for supervised learning, our training set consists of two parts: the input variables X and the labels y. The labels y can be understood as output variables or the true answers.

13:03 | What the learning algorithm is trying to do is to come up with a mapping function F that can map this input variable to the output. Once this function is learned, the next time a new instance x comes in, we can just apply the learned F to this new data to make a prediction.

13:41 | For logistic regression, the output will always be either zero or one. In this case, we can simply understand zero as the negative class: for example, it's not salt, it's not a fraudulent transaction, it's not spam. And we can understand category one as the positive class: for example, yes, it is salt, it is a cat, it is a fraudulent transaction.
|
|
14:27 | This is the linear regression model that we have already seen several times; you also implemented it in the notebook exercise. Here I'm using the life satisfaction prediction as the example. We have one input variable, one feature, which is the GDP per capita. In other words, we're trying to predict the life satisfaction h(x) based on the single feature GDP, x.

15:02 | Here, theta_0 and theta_1 are the model parameters we're trying to learn from the training data. In this case we only have one input variable, or one feature, so this is also called linear regression with one feature. I'll just note here that you can understand an input variable as a feature, so if there is one input variable, that means we only have one feature.
|
|
15:40 | We also talked about generalizing this linear regression with one feature to linear regression with multiple features. The basic idea is very simple: instead of having only one input variable, here we have quite a few input variables, x_1, x_2, up to x_n, each one of them representing one input feature. For example, here x_1 might be the GDP, x_2 medical care, x_3 maybe education, x_4 the air quality, and so on and so forth.

16:08 | So now the problem is to predict life satisfaction based on all these features, x_1 up to x_n, and here theta_0 up to theta_n (sorry, I should write theta_n there instead) are the model parameters we're trying to learn from training. Multiple input variables correspond to multiple features; that's what I put down here. So we can write this in a more general form: h(x) is our prediction, x_1 is our feature one, x_2 feature two, x_3 feature three, and so on, and the thetas are the model parameters.
|
|
17:07 | It turns out that we can use linear algebra, matrix-vector multiplication, to simplify this model and summarize it in the more compact form here: h(x) = theta^T x, the parameter vector theta transposed times the feature vector x, where theta is defined as an (n+1)-by-1 vector and x is also an (n+1)-by-1 vector.
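(Not from the lecture, but as a quick illustration of that vectorized form, here is a minimal sketch with made-up numbers: prepend x_0 = 1 to the feature vector and the hypothesis is just a dot product.)

```python
import numpy as np

theta = np.array([1.5, 0.2, -0.4, 0.1])   # hypothetical parameters theta_0..theta_3
features = np.array([2.0, 7.0, 3.5])      # hypothetical features x_1..x_3

x = np.concatenate(([1.0], features))     # prepend x_0 = 1 so the intercept is included
h = theta @ x                             # h(x) = theta^T x
print(h)
```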
|
|
17:46 | So, for logistic regression, remember that this is a binary classifier, so our prediction will be a yes or a no. In other words, our predictions will always be either zero or one. Making the prediction come out as exactly zero or one turns out to be really difficult, so a similar thing we can do is to make sure that our predicted value falls within this range, the range from 0 to 1.

18:25 | But the problem with the linear regression model is that the output of a linear regression model can be any value; it can go from minus infinity to infinity. For example, suppose you plot the feature here and the output here: obviously the prediction, when we learn a linear regression model, is this straight line, but the output values can be anywhere from minus infinity to infinity. So obviously we have to do something different.
|
|
19:18 | We need to make sure the output always falls within this range. That is where the logistic function comes into play. The logistic function is defined in this way: g(z) = 1 / (1 + e^(-z)). This is also called the sigmoid function.

19:53 | If you plot it, this is how the logistic function looks. First, you notice that it is a smooth function. Also, you notice that as the input variable goes to plus infinity, the output value approaches one; conversely, if the input variable keeps decreasing, the output value gets closer and closer to zero; and when the input variable is zero, the output value is 0.5.

20:41 | A nice thing about this logistic function is that it can map any real number, from minus infinity to infinity, to a real number within this range. So no matter how large or how small the input z is, g(z) will always be within zero and one.
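(As a quick numerical check of those properties, here is a small sketch, not part of the lecture itself.)

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(-10), sigmoid(0.0), sigmoid(10))   # ~0.000045, 0.5, ~0.999955
```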
|
|
21:04 | So how do we go from linear regression to logistic regression? Well, this is the model we have for linear regression, and I also mentioned that the output value from this model can be anything from minus infinity to infinity. Remember, we also talked about the logistic function, which can map a real number to a real number within this range. So to make sure that the output value always falls within this range, we simply apply the logistic function to this product, theta^T x.

22:03 | That is what I wrote down here: h(x) = g(theta^T x), where g is the logistic function, the same function that you saw a few slides ago. One property of this function is that the output will always be within this range, which is exactly what we want. Another good thing about this property is that we can easily interpret the output h(x) as a probability. The reason is simply that h(x) falls within this range, so the interpretation in terms of probability is just a natural thing to do.
|
|
22:55 | For example, that's what I wrote down here: we can interpret h(x) as the estimated probability that y = 1 given x.

23:21 | As an example, I have a very simple example on email spam detection. Here, my input feature vector is a two-by-one vector. The feature I'm using is simply the number of keywords. People have found that spam emails have something in common, and that is the keywords: for example, if you see something like "free", or "cash", or "amazing", these are the kinds of words that are common to spam emails.

24:03 | So one way to detect spam emails is simply to detect the existence of these keywords, or to count how many of these keywords there are. There is a long list of these keywords, and the more keywords from this list you see in your email, the more likely it is a spam email. So in this case, for example, we train a logistic regression based on this input feature, and suppose the output value from logistic regression is 0.82. We can simply interpret this output value as: there is an 82% chance that this email is spam.

24:54 | Well, some people don't like probabilities, because if you tell people that there's an 82% chance their email is spam, that doesn't make much sense to them. People sometimes simply want to know: is my email spam or not, just a simple yes-or-no answer. In that case, a simple thing we can do is the following: if the output value from logistic regression is larger than or equal to the cutoff point 0.5, we predict y equals one; if the value is less than 0.5, we simply predict y equals zero.
|
|
25:40 | So that's pretty much the basic idea behind logistic regression. Next, I want to spend some time explaining what this decision rule is and trying to help you develop an intuitive understanding of what logistic regression does.

26:07 | This is what we talked about just a second ago: this is how we do the classification based on logistic regression. If the output value is larger than or equal to 0.5, we predict y equals one, the positive class. If the output value is less than 0.5, we predict y equals zero. And again, this is how the logistic function looks.
|
|
26:39 | So let's take a closer look at this logistic function. My question here for you guys is to think about when this happens. Remember that h(x) is defined as g(theta^T x), and the blue line is how the logistic function g(z) looks. The z here we can simply define as z = theta^T x, so that h(x) = g(theta^T x) = g(z).

27:42 | If you look at this plot closely, you have probably already found out that whenever theta^T x is larger than or equal to zero, we have h(x) = g(theta^T x) larger than or equal to 0.5. And similarly, if h(x) is smaller than 0.5, that means g(theta^T x) is smaller than 0.5. In other words, if g(z) is smaller than 0.5, that means we are talking about z smaller than zero. Remember that we defined z as theta^T x, so that means theta^T x is smaller than zero. So whenever this happens, we have h(x) = g(theta^T x) smaller than 0.5.

29:03 | So whenever h(x) is smaller than 0.5, we predict y equals zero; equivalently, we are saying that if theta^T x is smaller than zero, we predict y equals zero. That is what I wrote down here: if theta^T x is larger than or equal to zero, we predict y equals one; if theta^T x is smaller than zero, we predict y equals zero. What I wrote down here is exactly the same; it is equivalent to this part.
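(To make that equivalence concrete, here is a minimal sketch, not from the lecture, with invented parameter values: thresholding the sigmoid output at 0.5 gives exactly the same answer as checking the sign of theta^T x.)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-3.0, 1.0, 1.0])        # hypothetical learned parameters
x = np.array([1.0, 2.5, 1.0])             # x_0 = 1, then the two features

z = theta @ x                              # theta^T x
pred_from_prob = int(sigmoid(z) >= 0.5)    # threshold h(x) at 0.5
pred_from_sign = int(z >= 0)               # check the sign of theta^T x
print(pred_from_prob, pred_from_sign)      # the two rules always agree
```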
|
|
29:53 | Okay, next I'm going to try to explain what this means. To do that, here is a simple example. Here we have our training set. Again, we use the red crosses as the positive class and the circles as the negative class; in other words, the red crosses belong to class one and the circles belong to class zero, and we want to classify this data using logistic regression.

30:24 | In this case, because we have two input features, our model looks like this: h(x) = g(theta_0 + theta_1 x_1 + theta_2 x_2). You can also express this as g(theta^T x), where theta is (theta_0, theta_1, theta_2) and x is (x_0, x_1, x_2), and x_0 will always be one.
|
|
31:07 | So, we haven't talked about how the model learns its parameters; more about that later. But assume that we have implemented logistic regression and we have learned the model parameters, and suppose that they are theta_0 = -3, theta_1 = 1, theta_2 = 1.

31:57 | In other words, we predict y equals one whenever -3 + x_1 + x_2 >= 0. We can always move the minus three to the other side, so that it becomes x_1 + x_2 >= 3. Conversely, we will predict y equals zero if -3 + x_1 + x_2 < 0. Again, you can also write this in a slightly different form: move the minus three to the right-hand side, and it becomes x_1 + x_2 < 3.

33:05 | So, to explain what this means, let me write down this equation in a way that is probably much more familiar to you: x_1 + x_2 = 3. This is simple: if we plot it on this x_1-x_2 plane, it is simply a straight line passing through 3 on the x_1 axis and 3 on the x_2 axis, the line x_1 + x_2 = 3.
|
|
33:49 | It turns out that the half space on the top right can be summarized as x_1 + x_2 > 3. If you look at what we wrote down, it simply means that we will predict y equals one if the data point is located in this half space, the top-right one. And conversely, the half space to the bottom left can be mathematically summarized as x_1 + x_2 < 3, which is what we have here. So this is also a half space, and what this says is that we will predict y equals zero whenever the data point is located in this half space.

35:03 | And notice that this straight line here, the line x_1 + x_2 = 3, separates this positive class from this negative class. So we will call this straight line the decision boundary: it is the boundary between the positive class and the negative class.

35:47 | If you don't quite follow me, please feel free to pause here and spend some time thinking about these equations. I guess the important point I want you to understand from what I have done here is to realize that this equation corresponds to a straight line, that this inequality actually corresponds to the half space to the top right, and that this inequality corresponds to the half space to the bottom left; that will help you understand what I did here.
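(If it helps to see those two half spaces numerically, here is a small sketch, again not from the lecture, that classifies a few invented points with the rule from this example.)

```python
import numpy as np

theta = np.array([-3.0, 1.0, 1.0])   # the illustrative parameters from this example

points = np.array([[2.5, 2.0],       # x_1 + x_2 = 4.5 -> top-right half space
                   [0.5, 1.0],       # x_1 + x_2 = 1.5 -> bottom-left half space
                   [1.0, 2.0]])      # x_1 + x_2 = 3.0 -> exactly on the boundary

X = np.column_stack([np.ones(len(points)), points])   # prepend x_0 = 1
predictions = (X @ theta >= 0).astype(int)             # y = 1 above the line x_1 + x_2 = 3
print(predictions)                                      # [1 0 1]
```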
|
|
36:31 | , what I did here. Now let's consider a more complicated example |
|
|
36:44 | hear. I have supposed again and is my training data, and I |
|
|
36:49 | posted class marked highlighted in dressed crosses neck. Next class in it's open |
|
|
36:55 | cups in this case. So this the we also have again, we |
|
|
37:01 | have two features. So this is largest tree regression that we you have |
|
|
37:07 | sin from previous lights. I'm But probably have already realized that these largest |
|
|
37:16 | based on these small there will not able to capture will not be able |
|
|
37:22 | , um, find out the boundaries the positive positive class and Arctic |
|
|
37:28 | Because these theme this model can only linear boundaries like this example. So |
|
|
37:43 | question now is that can we find can we discover? Can we develop |
|
|
37:51 | nonlinear decision Boundaries using largest river got . Well, the answer is |
|
|
37:59 | And one way to do that is by adding higher-degree polynomial features, like these x_1 squared and x_2 squared terms. This should look familiar to you guys, because last time, when we talked about the remedies for underfitting, one of the things we said we can do when we underfit the data is to add more features, for example higher-degree polynomial features, which make your learning model more capable of capturing nonlinear behavior, nonlinear boundaries, in your data.

38:59 | This is what we had from last week's video. Here we do a similar thing: we simply add more features, polynomial features of higher degrees, in order to capture the complicated decision boundaries.
|
|
39:14 | In this case, again, we can rewrite what is inside these parentheses as theta^T x, where theta is (theta_0, theta_1, theta_2, theta_3, theta_4) and x is (x_0, x_1, x_2, x_1 squared, x_2 squared).

40:06 | With this hypothesis function, it basically means that we will predict y equals one if... sorry, I forgot to mention one thing: suppose that during learning we learned that the model parameters are the following: theta_0 = -1, theta_1 = 0, theta_2 = 0, theta_3 = 1, theta_4 = 1. With that, it means that we predict y equals one if this quantity is larger than or equal to zero. Again, to help you understand, I will rewrite this in a slightly different form: x_1 squared plus x_2 squared is equal to or larger than one.

40:59 | You probably already recognize that if I plot this in the x_1-x_2 plane, it corresponds to the circle x_1^2 + x_2^2 = 1, and all the space outside this decision boundary can be summarized as x_1^2 + x_2^2 > 1.

41:36 | So in this case, what we have developed so far can be summarized as follows: we will predict y equals one if the data point falls outside this circle, and in this case, this circle is our decision boundary.
|
|
42:12 | It turns out that what we can do is keep adding more polynomial features to learn more and more complicated decision boundaries. For example, in this simple example, we just keep adding more polynomial features, terms like x_1^2 x_2, x_1^2 x_2^2, x_1^3 x_2, and so on; this is already up to fourth-order polynomial features.

42:37 | And because of these higher-order polynomial features, it turns out that logistic regression in this form is capable of learning more complicated decision boundaries, for example something like the boundary shown here.
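(One common way to get such polynomial features in practice is scikit-learn's PolynomialFeatures combined with LogisticRegression. This is a sketch, not the lecture's own code, and the data below is synthetic.)

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression

# Synthetic two-feature data whose true boundary is the circle x_1^2 + x_2^2 = 1
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)

# Degree-2 polynomial features let this linear classifier learn a circular boundary
clf = make_pipeline(PolynomialFeatures(degree=2), LogisticRegression())
clf.fit(X, y)
print(clf.score(X, y))   # training accuracy; should be close to 1 on this data
```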
|
|
43:11 | So the next thing I want to talk about is the cost function, the cost function for logistic regression. This is an important thing, because remember that two or three weeks ago, when we talked about machine learning, we talked about what learning means. We really talked about the cost function: learning, in many, many cases, simply means that we want to minimize the cost function. So next I want to spend some time talking about the cost function for logistic regression.

43:52 | Here I summarized our training set as this list, where we have the input variables: the first input data point, the first label, the second input data point, the second label, and so on, and suppose we have m training examples. Each one of these input data points x^(i) is an (n+1)-by-1 vector, because we have n features plus x_0, which is always equal to one.

44:49 | A note is that the materials in the following slides, from slide 25 to slide 33, explain how to develop the cost function for logistic regression. Again, this is beyond the scope of this class, so please feel free to skip them. But if you want to learn more about the cost function as well as optimization, then the following materials will be useful.
|
|
45:16 | In order to develop a cost function for logistic regression, let's consider the following. This is the cost function that we have used for linear regression, right? This h_theta(x^(i)) is the prediction for the i-th data point, and y^(i) is the label, or true answer, for the i-th data point.

45:41 | So this thing measures the difference between our prediction and the label; as you see, we square it, and then we sum these differences over all of our training data. So this is the cost function we have been using for linear regression.
|
|
46:17 | But it turns out that this cost function is not a good one for logistic regression. The reason has something to do with convex and non-convex functions, so what I want to do next is spend some time explaining this important concept. This is very important for optimization.

46:46 | What I mean by this is the following: when it comes to optimization, you have two types of cost function, convex and non-convex. It turns out that these two types of cost function have very different behaviors. For example, a non-convex cost function might look something like this, or like this.
|
|
47:17 | (I don't know why this always happens when I draw.) So it has many, many local minima. Each one of these is a local minimum, and this one is probably the global minimum, because it is the smallest among all of the local minima.

47:48 | So you can see that, because of the existence of all of these local minima, which minimum you end up in depends on where you start your optimization from. For example, remember that with gradient descent we always initialize our model parameters first. If our initial model parameters are at this place, then you can imagine that by running gradient descent, we will eventually end up somewhere here. So we will be able to find a local-minimum solution, but this is not the best solution we want.

48:42 | The best solution we want would come from this global minimum. But because of the existence of this many local minima, and because of the way gradient descent works, chances are, and I'm not even saying chances are, most likely, you will end up in a local minimum.
|
|
49:04 | If our cost function is convex, then it will look something like this: it's like a bowl-shaped cost function. The good thing about this convex cost function is that it has only one minimum; any solution you end up with is this global minimum.

49:30 | So when it comes to optimization, it has nothing to do with where you initialize, where you start your gradient descent from. You can start from here, and you end up at this global solution; or you start from here, and you will also end up at the same global minimum, as long as your learning rate is not too large.

49:53 | So for optimization, if it is at all possible, we would like to work with a convex cost function. The reason is that we don't want to get stuck in a local minimum: while a local minimum is still a solution to our problem, it is not the best one. The best solution always comes from the global minimum.
|
|
50:29 | So, with that knowledge in mind, now let me walk you through how we develop a cost function, a convex cost function, for logistic regression.

50:50 | To make things simple, let's consider only a single training example x and its associated label y. The basic idea for developing the cost function is that if our prediction h_theta(x) is very different from the true label, we want to penalize this wrong prediction heavily in our cost function. Conversely, if our prediction h_theta(x) is very, very close to the true label, then we don't want to penalize the prediction heavily; in other words, we want to penalize this good prediction as little as possible.

51:22 | I guess you can simply understand the cost function as a way to impose different penalties for different predictions.
|
|
51:42 | So for logistic regression, remember that our predictions will always be 1 or 0, and it turns out that a cost function having this form works. Let me rephrase: that was the basic idea for developing a convex cost function for logistic regression, and it turns out that one way of actually implementing this idea is to define the logistic regression cost function that has this form here.

52:38 | I know this looks a little bit complicated, so let me explain what it means. Let's first consider the case when the true label is one, when the true answer is one. If that is the case, the cost associated with y = 1 is -log(h_theta(x)). We can plot this function in this plane, and it looks something like this. The horizontal axis corresponds to h_theta(x).

53:14 | The minus log of it we can figure out by looking at what happens at the ends: when h_theta(x) equals one, this is zero, and when h_theta(x) is close to zero, this becomes positive infinity. So it looks something like this. So this is -log(h_theta(x)).
|
|
54:15 | So what this means is this: when the prediction is one, that means our prediction is the same as the true answer, which is one. In this case, because the prediction is very close to the label, we don't want to penalize this prediction; therefore, we have a cost of zero here.

54:39 | Conversely, if the prediction is very different from the true answer, in this case meaning that the prediction is close to zero, the prediction is so different from our true answer of one that we want to impose a very, very high penalty on this prediction. In this case, when the prediction is close to zero, the cost actually goes to infinity.

55:05 | So that is what I did here: when the prediction is one, the cost is zero, and when the prediction is close to zero, the cost goes to infinity. So it captures the intuition that if the prediction is different from our label, we want to penalize it heavily. That is the case for y equal to one.
|
|
55:41 | Now let's take a look at what happens to the cost when y equals zero. When y equals zero, we are looking at a cost function that has this form, -log(1 - h_theta(x)), and we can plot it by noting the following. Again, the horizontal axis is h_theta(x).

56:02 | When h_theta(x) equals zero, this cost is zero. And when h_theta(x) equals one, this cost, -log(1 - h_theta(x)), is going to be infinity. So it looks something like this, going off to infinity.

56:58 | So when our true label is zero and our prediction is zero, that means our prediction exactly matches our label, and we don't want to impose any cost or any penalty on this prediction; therefore, we have a penalty of zero. And when the prediction is one while the actual answer is zero, we want to penalize this prediction heavily.

57:29 | So that is essentially what the cost function does: it penalizes the prediction heavily when the prediction is different from the label, and it penalizes the prediction much less heavily if the prediction is similar to the label.
|
|
57:49 | So this turns out to be the cost function that we have developed for logistic regression. It turns out we can also rewrite this cost function in a more compact form that looks like this; this equation is exactly the same as the one above.

58:18 | Remember that this is only the cost for one single training example. For multiple, for many, training examples, we will just simply sum them up. So this part is the same as this part, but for multiple training examples we need to sum this cost over all the training examples. I guess if you want to be more precise, we should also add the superscript i above the x.

59:03 | So this is the final form of the cost function for logistic regression. A good thing, and an important thing, is that this is convex, meaning that there is only one solution: whatever solution you end up with, that is the global minimum solution, that is the best solution.

59:31 | Okay, so this is the cost function, as you just saw. And remember that learning is all about minimizing this cost function. So by minimizing this cost function, we can obtain the optimal model parameters.
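(The slide carries the formula; written out, the per-example cost is -[y log h(x) + (1 - y) log(1 - h(x))], summed, or often averaged, over the training set. Below is a minimal sketch of computing it, my own illustration with made-up numbers, not code from the course.)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y):
    """Cross-entropy cost: mean of -[y*log(h) + (1-y)*log(1-h)] over the examples."""
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Made-up toy data: the first column of X is x_0 = 1
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]])
y = np.array([0, 0, 1, 1])
theta = np.array([-4.0, 2.0])              # hypothetical parameters
print(logistic_cost(theta, X, y))
```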
|
|
59:56 | Notice that this cost function is differentiable, because every part, every component, in this cost function is a differentiable function. Since this cost function is differentiable, it is very straightforward to calculate its gradient.

60:15 | So this is the gradient of the cost function with respect to the model parameters theta, and this gradient comes out as an (n+1)-by-1 vector. Because we can find the gradient easily, we can apply batch gradient descent or stochastic gradient descent (I also talked about mini-batch gradient descent) to train our logistic regression model. Well, that's the training part, the second part of machine learning, and an important part of machine learning.
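(For those who are curious, here is a rough sketch of what batch gradient descent on this cost can look like. This is my own illustration, not code from the course; the learning rate, iteration count, and data are arbitrary.)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, learning_rate=0.1, n_iters=5000):
    """Batch gradient descent on the logistic regression cost."""
    theta = np.zeros(X.shape[1])                         # initialize the parameters
    m = len(y)
    for _ in range(n_iters):
        gradient = X.T @ (sigmoid(X @ theta) - y) / m    # (n+1)-dimensional gradient
        theta -= learning_rate * gradient                # step downhill
    return theta

# Toy data (made up): the first column is x_0 = 1
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(train_logistic_regression(X, y))
```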
|
|
60:57 | So, once the learning is completed, we will have obtained the learned model parameters, theta_0 up to theta_n. Next time, when new data comes in, a new data point x, we can simply make a prediction on this new data x by calculating h_theta(x), where these thetas are the model parameters we have learned from the training phase.
|
|
61:36 | Okay, next: the implementation of logistic regression using scikit-learn. To do that, I'm going to open a demonstration in the Jupyter notebook.

62:13 | So I'm going to my Azure notebook and walk you through a very simple example of logistic regression. If you go to my notebook account and you click this week's lab exercise, there is already a Jupyter notebook on logistic regression.
|
|
62:42 | In this notebook, I just used the example dataset called the iris dataset to illustrate how you can implement logistic regression using scikit-learn. The iris dataset is a very famous public dataset in machine learning.

62:56 | This iris dataset contains sepal and petal length and width values from 150 iris flowers of three different species: Setosa, Versicolor, and Virginica. This is a picture of these three different iris species.

63:22 | In this case, our task is to train a logistic regression, a binary classifier, to classify flowers into Virginica or non-Virginica based on two features: petal length and petal width. The first thing you want to do is to import the NumPy library.
|
|
63:45 | But first, let me restart and clear all output, so that you can see things more cleanly and you can see what each cell of the code does.

64:00 | You want to import NumPy as np. You also want to import the iris dataset. Here's what you can do: import the datasets module from scikit-learn, and then write iris equals datasets.load_iris(). So this is how you load the iris dataset. Let's go ahead and run it.
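(In code, the two cells described here look roughly like this; a sketch of what the notebook does as I understand it from the lecture, not a verbatim copy.)

```python
import numpy as np
from sklearn import datasets

# Load the built-in iris dataset (a Bunch object that behaves like a dictionary)
iris = datasets.load_iris()
print(iris.keys())
```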
|
|
64:31 | Okay, so the iris dataset is now assigned to this variable. If you want to take a look at everything in it, you can simply type iris and run it. This is how the dataset looks; it's a little bit messy.

64:49 | But if you look at all the information here closely, you can recognize that this iris dataset is a dictionary. Remember that the dictionary is one of the Python data types, and a dictionary always consists of a set of key-value pairs. That is what I wrote down here in this cell: the iris dataset consists of five key-value pairs. If you want to find out what keys are included in this dataset, you can click this cell and run it. So we have five keys: data, target, target_names, DESCR, and feature_names. The DESCR key is just a few sentences describing the dataset.

65:40 | The data key contains the data as an array, a matrix with one row per instance and one column per feature. In this case we have four features, the petal and sepal measurements, so we have four columns. The target key contains the class labels, the feature_names key contains the names of the features, and the target_names key contains the names of the actual target classes. So now let's take a closer look at what each key corresponds to.
|
|
66:24 | If you want to find out what the target values are, you just type iris followed by square brackets with 'target' inside, and this will give you the values corresponding to the target key. In this case, you notice that the values corresponding to this key are simply zeros, ones, and twos. Again, these values are just discrete class values: they simply mean class zero, class one, class two.

67:12 | If you want to know what target each class value corresponds to, you can run this code, iris['target_names']. It tells you that class zero corresponds to Setosa, class one is Versicolor, and class two is Virginica.

67:31 | And if we want to find out the feature names, this is the feature_names key, the names of the features in the iris dataset: we have four features, sepal length, sepal width, petal length, and petal width. Here is the description of the dataset, which you can read if you want to learn more about it.

67:58 | The actual training data, the model input data, looks like this. You can consider this a matrix that has 150 rows and four columns: 150 rows because we take the measurements on 150 flowers, and four columns because we have four features.
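(The inspection cells he runs correspond roughly to the following, continuing the sketch above; again a reconstruction, not a verbatim copy of the notebook.)

```python
print(iris['target'])          # array of 0s, 1s, and 2s: the class of each flower
print(iris['target_names'])    # ['setosa' 'versicolor' 'virginica']
print(iris['feature_names'])   # sepal length/width and petal length/width (in cm)
print(iris['data'].shape)      # (150, 4): 150 flowers, 4 features
```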
|
|
68:21 | So the next thing I want to do is to prepare the input data. In this case, we want to use the petal length and the petal width to make predictions; they correspond to the third and fourth columns of this data array. That is what you see here: I just assigned the third and fourth columns from this data array to this NumPy array X.

68:57 | And this is the target, the target array. To simplify the task, we convert all the labels: remember that the labels in the iris dataset are simply zeros, ones, and twos. So essentially, what this line of code does is to convert the resulting True and False values into zeros and ones, and that will serve as our labels, the y.
|
|
69:44 | If you want to use scikit-learn to train the logistic regression model, again we need to import the module so that we can use it in our workspace. The way to do that is to simply write down this code: from sklearn.linear_model, because logistic regression belongs to the category of linear models in scikit-learn, import LogisticRegression, and then run this cell.

70:17 | With this, we have prepared the data and imported LogisticRegression; the training part is very, very simple if you use the scikit-learn package. This LogisticRegression is what we just imported from scikit-learn. And don't worry about what is inside the parentheses: there are a few parameters that a user can pass to specify how the regression is done, and these are set here for a version-related reason, but again, don't worry about it.

71:02 | With this line of code I'm just defining my logistic regression algorithm, a logistic regression algorithm with these two parameters, and I name my logistic regression classifier log_reg. So this is my logistic regression algorithm. Now, with that, I'm ready to do the training part.
|
|
71:24 | And it's very, very easy. The training part is done here: log_reg, the name of my logistic regression classifier, dot fit, followed by the input variable X and the output y. You run it, and that's it. That's all you need to do in order to train a logistic regression model. The output here just lists the model parameters that were used.
|
|
71:58 | Most of you don't need to worry about these, because the default model parameter values are good enough for our purposes.

72:08 | If you want to find out the learned model parameters, for instance theta_0, which is essentially the intercept, you just write down the name of this logistic regression classifier followed by .intercept_ and run it. So that is the model parameter theta_0 that we learned from logistic regression.

72:36 | And if you want to find the other coefficients, for example theta_1 and theta_2 (in this case we only have two features, so we only have theta_1 and theta_2), the way to find these out is to use the code here: the name of my logistic regression classifier followed by .coef_, and this gives you theta_1 and theta_2.
|
|
73:09 | If you want summary statistics for your logistic regression, for example if you want to find out the overall accuracy of the predictions from logistic regression, you can simply call the method called score. In this case, the prediction accuracy is 92.6%, which is not bad, since we haven't tuned anything and were mostly using the default model parameter values.
|
|
73:44 | So that's it, that's the training part. And if you want to predict, you can call the method associated with logistic regression called predict. You can use predict, or predict_proba, which will give you a real-valued number in the range from 0 to 1, where the argument is the new data that you want to make a prediction on.

74:21 | This part of the code is for visualization. Okay, so this is the data: all the crosses and circles are our training data. The dashed black line is the decision boundary, and all these lines in different colors are the contour lines for the predicted probabilities. Okay, so that's it for implementing logistic regression in scikit-learn.
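(Putting the notebook steps together, a minimal end-to-end sketch looks like this. It is my reconstruction of what the demo does, not a verbatim copy of the notebook; the sample flower passed to predict_proba is made up.)

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X = iris['data'][:, 2:]                      # petal length and petal width (3rd and 4th columns)
y = (iris['target'] == 2).astype(int)        # 1 for Virginica, 0 for everything else

log_reg = LogisticRegression()               # default settings are fine for this example
log_reg.fit(X, y)

print(log_reg.intercept_)                    # theta_0
print(log_reg.coef_)                         # theta_1 and theta_2
print(log_reg.score(X, y))                   # overall accuracy on the training data
print(log_reg.predict_proba([[5.0, 2.0]]))   # probability estimates for a new flower
```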
|
|
74:53 | Very easy, very simple. For the training part, this is all you need to do: LogisticRegression, then the name of your classifier dot fit, and all the training, all the mathematics and the optimization part, is taken care of by this simple code.

75:16 | Okay, so that's all for today. Thank you for your attention. If you have any questions, you can send me an email, or you can ask questions in the discussion forum.
|