Lecture Script

Summary

Core Probability Concepts

Probability Axioms and Calculations

Joint Probability

Random Variables

Probability Distributions

Notation Conventions

Applications to Natural Language Processing

Prior and Posterior Probabilities

Conditional Probability

Chain Rule

Independence

Conditional Independence

Bayes' Rule

Marginalization

Semantic Applications

Notes

Transcript

How about this? Would that work as a nice little event? Rolling an even number? Is that an outcome? Well, it includes three possible outcomes, but... Thank you. Everybody knows the difference between an event and an outcome? Good. What is probability now? Probability is a numerical measure of how likely a certain event or outcome is. It has a value between 0 and 1.

I'll do a side note here that I always do. Horrible repetition. Probability is a value between 0 and 1, right? Okay. Do you want to make a bet that by the end of... I will see someone write that the probability of something is greater than one? I don't think that's a good idea. So I told Chris what was going to happen. If that happens, I always have an urge to give someone fewer points on the exam.

Thank you. Where'd she go? Thank you. All right. Probability theory is obviously math. Alright, everybody knows what an axiom is? Something that's taken to be true; you're not questioning it. In the case of probability: the probability of an event is greater than or equal to zero and less than or equal to one. Right? The probability of the sample space is 1. Does that make sense? The sample space is itself an event, right?

Okay, how do we calculate probabilities? So I'm pretty sure your experience is mostly based on frequencies, right? You count how often something happened, and that's your event frequency, or in our case it will be the size of the set, right? There's only one way it happens, once out of six options: the size of the event over the size of the sample space. Conditional probabilities work in a slightly different way that we'll get to.
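The counting view above can be sketched in a few lines of Python. This is only an illustration under the assumption of a fair six-sided die with equally likely outcomes; the names `sample_space` and `probability` are made up for the example.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (assumed equally likely).
sample_space = {1, 2, 3, 4, 5, 6}

def probability(event, space=sample_space):
    """Size of the event over the size of the sample space."""
    return Fraction(len(event & space), len(space))

roll_a_one = {1}         # a single outcome
roll_even = {2, 4, 6}    # an event made of three outcomes

p_one = probability(roll_a_one)      # 1/6
p_even = probability(roll_even)      # 1/2
p_space = probability(sample_space)  # 1, matching the axiom P(S) = 1
```

Using `Fraction` keeps the values exact, so they can be compared directly against the numbers said in class.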

Consider events. A or B, a union: three or six happens. If they are mutually exclusive, we can add their probabilities: 2/6. The probability of both being rolled at the same time? Impossible. Zero. Does that make sense? The size of that event? 3 happened and 6 happened at the same time: this is an empty set. An empty set has a size of 0. 0 over 6. By the way, I'll be slowly introducing some other notations.

Shorthand for a lot of things in this class, just to keep it simple. So, uh, union, you can consider it an "or". If people write it as A or B, that's fine. Does that work for you? What if we have a union of overlapping events? For the probability of that, we have to subtract the overlap, because otherwise we are double-counting it, so we cannot just add. Otherwise, if they're mutually exclusive, you know already what to do. Yeah. So far so good, the basics?
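A small sketch of the two union cases, again assuming a fair die. The overlapping events chosen here (an even roll, a roll greater than three) are illustrative and were not the ones used on the board.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def probability(event):
    return Fraction(len(event), len(sample_space))

# Mutually exclusive events: rolling a 3 and rolling a 6 share no outcomes.
three, six = {3}, {6}
p_both = probability(three & six)                      # empty set, so 0
p_three_or_six = probability(three) + probability(six)  # 2/6

# Overlapping events: subtract the intersection to avoid double-counting.
even = {2, 4, 6}
greater_than_three = {4, 5, 6}
p_union = (probability(even)
           + probability(greater_than_three)
           - probability(even & greater_than_three))

# Sanity check against counting the union directly.
p_union_direct = probability(even | greater_than_three)
```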

Now, let's get to the stuff that actually matters. Okay. Now... This is where the big deal for this class starts to happen. Intersection. Let me introduce a couple of different ways of writing that, which you will see in this class and in all sorts of places in the literature. P(A, B) means P(A and B). Intersection.

Logically, A and B happen at the same time.

So, joint probability. That's the probability of two events occurring at the same time. You can think of it as an intersection of two or more events. Now, of course, this can be extended to three events, four events, and so on and so on, as many as you want. That's a joint probability distribution.

Same thing. All these ways to denote a joint probability mean the same. If you see any of those, that's the same. How many of you are comfortable with the logic part that we did? It seems most of you are. Okay. I don't think we'll go there in this class, but we could kind of apply probability theory to sentences in some cases.

What is this, do you remember? A tautology is something that is considered to be always true.

Contradiction. Contradiction.

You get the picture? A tricky one for you right now. What is a random variable? The definition may go against your intuition or your knowledge. A random variable is not a variable. Is that okay? Formally speaking, it is not a variable. It's a function that maps events and outcomes to values, essentially. What do I mean by that? When I was discussing rolling the die, right?

We were talking about an event: I roll an even number. In English, everybody understands what it means. You can roll either two, four, or six. That's still not a number, not a probability. A random variable really takes all those possible outcomes and has a way of mapping every possible one of them to a number. Now let's be clear. What does this mean to you? Do you know what that symbol means?

The set of real numbers, okay? So, if what happens...

Outcomes to the set of real numbers. What's your hunch? Am I mapping S to probabilities? Not necessarily. Real values go beyond zero and one, right? If I was absolutely adamant that we're mapping to probabilities, this would be the interval [0, 1], right?

So a random variable technically is a function that maps events to two, three, any number, any real number.

But in practice, for us, it will be a probability value. Okay, so here's an example of how that would work in practice. In many cases in this class we will be sort of bypassing all the nuance here. I don't know, heads or tails, I mean, we'll have a probability, but technically speaking, a random variable would mean that you assign a number to an outcome in a sample space, and then go from there. You don't have to memorize that, but it's good to know.
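To make the "it's a function, not a variable" point concrete, here is a minimal sketch for the heads-or-tails case. The mapping heads to 1 and tails to 0 is an assumption picked for illustration, and the coin is assumed fair.

```python
from fractions import Fraction

# Sample space of a single fair coin flip (assumed for the example).
sample_space = ["heads", "tails"]

def X(outcome):
    """A random variable: a function from outcomes to real numbers."""
    return 1 if outcome == "heads" else 0

def prob_X_equals(value):
    """P(X = value): count the outcomes that X maps to this value."""
    favorable = [o for o in sample_space if X(o) == value]
    return Fraction(len(favorable), len(sample_space))

p_x_is_one = prob_X_equals(1)   # 1/2
p_x_is_two = prob_X_equals(2)   # 0, because no outcome maps to 2
```

The probability only enters in the second function; `X` itself just assigns numbers to outcomes.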

What is a probability distribution? It is the distribution of possible outcomes that a random variable could take.

Okay, so you have possible outcomes, and for every possible outcome you have some number associated with it.

Typically, those will be frequency distributions. Frequency means what? How often it appears. So you could possibly have yes/no, right? That's a frequency plot, right? For two outcomes that could be considered a frequency distribution plot, right? Or I could have a nice little table with yes and no. That's also a distribution, right? Typically this will be replaced... Does that make sense?
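One way to picture the yes/no table is to start from raw counts and normalize them. The observations below are invented for the sketch, and turning counts into relative frequencies is an assumption about where the lecture was heading.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical observations of a yes/no outcome.
observations = ["yes", "no", "yes", "yes", "no"]

# Frequency table: how often each outcome appears.
counts = Counter(observations)

# Dividing by the total gives one probability value per outcome.
total = sum(counts.values())
distribution = {outcome: Fraction(count, total)
                for outcome, count in counts.items()}
```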

How else could you possibly represent a probability distribution? You have a table, you have a plot. Yes, bell curve. A bell curve, okay, for continuous variables. Could we use a piecewise function?

You could do that. Anyway, it's just a matter of notation.

Something that gives you probability values for every possible outcome. By the way, a bell curve: can you have a table for a bell curve, for a normal distribution?

If it's continuous, you can't, because there's an infinite number of values.

In that case, you do what? You have a nice little formula with mean and variance and whatnot, and you have a...

But in the end you have a way, for every possible outcome, to match it to a specific probability. That's your task here. All right. Now here's something that I will promise to adhere to as much as possible, but I imagine eventually somewhere it will be violated a little bit. This is the notation that I'll be using in this class, for the most part. A capital letter will be a random variable.

Bold X, well, typically in textbooks and whatnot, corresponds to a set of variables.

So, bold X may be, you know, X1, X2, X3. Does that make sense? Describing something. Some system.

Bold lowercase would be an assignment to every single one of them.

This could be, in this case, 1, 1, 2.

Notation, all right. For the most part, you will see the probability of a certain random variable X taking on the value lowercase x, P(X = x), shortened to P(x). Following the same way of thinking, what kind of probability is this? It involves a few random variables.

Joint, right. Okay, the probability of these three outcomes at the same time, represented by three different random variables. Okay.

You get the picture: the probability that word one equals some word value and word two equals some word value. What would that be?

I mean, I'm trying to figure it out. The probability of word 1 equals All right, that's a legit sequence of words in English, right?

How often, let's say... How likely is that sequence in English, for example?

Yeah, so I was thinking about potential sequences that could be identified by, let's say, the Declaration of Independence. So is this method used for categorization or classification? Since sequences, like the declaration of independence, could be classified as government documents or something like that?

So I would be hesitant to make that jump. Well, there are two things that I can think of. There are going to be certain expressions in legal or government documents that you might expect, right? The probability of one occurring in that document could be some indication supporting the claim that it's a government document, but in my eyes it would not be enough. I would rather, I would, uh, I would use a different example, but you're absolutely onto something: that is a piece of information that you could use.

I just feel that in your example that would be a stretch. I don't know. But, you know. I'm not sure if my example will be better than yours, but... the probability of, you know, "what's up", right? Being in some conversation, that is an indication of some, you know, age bracket or something like this, right? You would not see that in a government document unless it's a citation. There you go. You see that?

Anyway, I digress. You can see a lot of problems, technical problems, with building a probability model like this, or like this. And I'm pretty sure if you take a deeper look at what I tried to convey here, you will see it. I just used two words, a combination of words. This is a sequence of two words, right? There are so many ways to... match two words in English. Just two words. Let's...

But we could do that. Now if I extend it to three words. Four words. You see how problematic it becomes? Okay. And... Your tragedy, your tragedy. All they are doing at the very At the very least. Yes. Give a sequence. Oh, bro.

No matter how well there's some of the technical division, it could be very low, and it will. You view a probability estimate. What? Does that make sense? Lost or not? Good. This is what I was getting at. If you try to build a probability table that actually captures all the nuances of a probability distribution involving a lot of random variables, you're in trouble, right? This table is going to be huge. Constructing that table is going to be nearly impossible in many of the applications that we do. That's number one. Okay, and if we're looking at a specific sequence of words, for example, what is the probability? That's going to take quite a bit as well. Thank you.
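A rough way to see why the full table is impractical: with one random variable per word position, a joint table needs one entry for every possible word sequence. The vocabulary size below is a made-up round number, not a figure from the lecture.

```python
def joint_table_entries(vocabulary_size, sequence_length):
    """Number of entries in a full joint table over word sequences."""
    return vocabulary_size ** sequence_length

# Hypothetical vocabulary of 10,000 distinct words.
vocabulary_size = 10_000

for n in (1, 2, 3, 4):
    entries = joint_table_entries(vocabulary_size, n)
    print(f"{n}-word sequences: {entries:,} table entries")
```

Under that assumption, going from two words to four multiplies the table size by a factor of one hundred million, which is the blow-up being described.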

Tuesday you will learn how to bypass that. All right, but before that, you have to go through conditional probabilities. Before we start talking about conditional probabilities, let's name the probabilities that we have been discussing so far. Unconditional probabilities, such as the probability that it's raining, or whatever. Prior probabilities. Okay. What does a prior probability mean?

Unconditional probability.

A probability that is not influenced by some a priori event.

By an a priori event, I mean information: you've learned something about reality. Okay. We use that information to learn what the probability of something is after that information came in.

Everybody in the audience, I know you know the probability of a given die roll is one over six. Okay. Now what if I told you that the die you're rolling does not have a 1? I need to design a new die.

What's the probability that you will roll a 1 with that die?

So you would move from 1/6 to 0.

This is called conditioning.

Okay. I had some probability of an event A. And then evidence E came in, so I'm now looking at the probability of A given E. So it was 1/6. It is zero in my specific example. Alright, what is this conditional probability? A given B is... How many ways to calculate it do you know?

The basic formula for conditional probability, for right now: the probability of A and B over the probability of B. Is that the case? Okay. So, how about in plain English? A given B. The probability that some event B happened? No. This means I'm telling you that B happened. B is true. What does it mean in that sample space? I mean, the probability of B is 1. Okay. That also means that I already know that I'm somewhere here, right?

I can't be here, or there. So, automatically, B being true confines me to here, all right? Previously, the probability of A was this circle: the size of this circle divided by the entire sample space. Now, graphically speaking, what is the probability of A given B going to be right now? Okay. We are here, right? Does that make sense? You can think of conditioning on evidence as limiting the number of possible worlds you can explore. This piece of information means that you are here.
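The "limiting the possible worlds" picture can be sketched by shrinking the sample space to the evidence and counting inside it. The die is assumed fair, and the no-1 scenario is modeled here as the evidence "the roll is not a 1", which is one reading of the example.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def probability(event, given=None):
    """P(event | given): count only inside the worlds the evidence allows."""
    world = sample_space if given is None else given
    if not world:
        raise ValueError("cannot condition on an impossible event")
    return Fraction(len(event & world), len(world))

roll_one = {1}
not_one = {2, 3, 4, 5, 6}   # the evidence E
even = {2, 4, 6}

p_prior = probability(roll_one)                      # 1/6
p_posterior = probability(roll_one, given=not_one)   # 0
p_even_given_e = probability(even, given=not_one)    # 3/5
p_e_given_e = probability(not_one, given=not_one)    # 1
```

Dividing by the size of the evidence set is the same as P(A and B) over P(B), since both are counted over the same original sample space.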

If you're dealing with more complex conditional probabilities, you're using the same... What, can you say that again? Some axioms of conditional probability. A conditional probability cannot be greater than 1 and it cannot be less than 0. It's still a probability. Probability of B given B?

One, right? You know that you can...

Okay, so a little test for you. Here is a sample space. There are two possible events. Uh, A means you got an A in the course. H: hard working or not hard working. What's the probability of hard working if you got an A? Based on this. Ah, based on the numbers. So you already know that you got an A, right?

What's the probability that you're hardworking? Hard working is this entire thing, but you're being told that you got an A, so you must be in this region. Thank you. The number? The probability value? Seven out of 10. Seven out of 10? So, this. What about that? Okay, how about that? Same question, but with different numbers. Seven over twenty-one. Seven over twenty-one.

Alright, the chain rule. It is a way of turning a joint probability into a product of conditional probabilities. So if you have the probability of A and B, a joint distribution, let's ignore the values, you will have the probability of A times the probability of B given A.
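As a sketch of how the chain rule applies to a word sequence, the joint probability of three words can be written as P(w1) * P(w2 | w1) * P(w3 | w1, w2). The words and every probability value below are invented for illustration only.

```python
from fractions import Fraction

# Hypothetical probabilities; none of these numbers come from real data.
p_first = {"the": Fraction(1, 2)}
p_second_given_first = {("the", "cat"): Fraction(1, 4)}
p_third_given_first_two = {("the", "cat", "sat"): Fraction(1, 5)}

def sequence_probability(w1, w2, w3):
    """Chain rule: P(w1, w2, w3) = P(w1) * P(w2 | w1) * P(w3 | w1, w2)."""
    return (p_first[w1]
            * p_second_given_first[(w1, w2)]
            * p_third_given_first_two[(w1, w2, w3)])

p_sequence = sequence_probability("the", "cat", "sat")   # 1/40
```

The product has one factor per word, and each factor conditions on everything that came before it.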

Why is that important? In CS480 it was important for our Bayes networks. Here we will be doing some mathematics to show that you can actually estimate probabilities, because you can see that getting the probability of a sequence of words is hard. And we'll have different ways of doing that later on. All right, I asked you about independence before: mutually exclusive versus independent.

These two terms are completely different. Independent means that one does not convey information about the other. Mutually exclusive means that there are no overlapping outcomes that they share. With independence, if I know B, that knowledge does not change anything about my belief or my understanding of A.

Any examples? The first time you rolled the die, that did not have any effect on the second time or third time?

Absolutely, absolutely. Or, you know, my shoe size has no relationship to what you will be doing on January 29, 2015. I mean, we could argue about the universe being one big thing, but for the sake of all the discussions here, this is what independent means. Now, if you know that two events are independent, you can calculate their joint probability as the product of their probabilities. Why does that work? Well... Here's a mathematical confirmation, right?

If A and B are independent, the probability of A given B becomes just the probability of A.

We know that conditioning, using B as evidence, when A is independent of B, does not change my value of A. That's the English version of what I'm saying right now.
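The difference between independent and mutually exclusive can be checked by counting over two rolls of a fair die. The particular events are chosen for this sketch, and `independent` is a helper name made up here.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling a fair die twice.
sample_space = set(product(range(1, 7), repeat=2))

def probability(event):
    return Fraction(len(event), len(sample_space))

def independent(a, b):
    """Independence test: P(A and B) == P(A) * P(B)."""
    return probability(a & b) == probability(a) * probability(b)

first_is_six = {o for o in sample_space if o[0] == 6}
second_is_six = {o for o in sample_space if o[1] == 6}
first_is_three = {o for o in sample_space if o[0] == 3}

# Separate rolls: knowing one tells you nothing about the other.
rolls_independent = independent(first_is_six, second_is_six)

# Conditioning on an independent event leaves the probability unchanged.
p_a_given_b = probability(first_is_six & second_is_six) / probability(second_is_six)

# Mutually exclusive events are not independent: knowing the first roll
# is a 3 tells you for certain that it is not a 6.
exclusive_independent = independent(first_is_three, first_is_six)
```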

Good? Okay. More stuff about independence. If you're shaky on that one... The other aspect that I want to get to is conditional independence. What is conditional independence? Random variables.

No. You're looking at that person, right? And then I approach you and whisper into your ear: this person is rich. Okay. Would you change your belief about that person being powerful? How about this? 90 over 100. That will depend on how much you trust my judgment, of course, right? Definitely, if I have even a shred of credibility, you would immediately raise your belief. That 90 over 100 would be the probability of powerful given rich.

There you go. So in this scenario, when I then tell you the person is happy, you would probably move the dial maybe a sliver of a percentage up or down, but not by a huge margin like here, right? So this is conditional independence. Once you knew that the person is rich, right, additional evidence, regardless of its value, did not change the probability. It's the same probability. You're not moving the dial, pretty much.

That's conditional independence. Does that make sense? So, in my example, powerful is conditionally independent of happy or unhappy, given rich. Now this scenario would play out completely differently if, you know, I ask you the question, "Is this person powerful?" You give me 1/100, right? And then I whisper, "This person is happy." You would not make that drastic change, right?

But see, if you already have a piece of information that, loosely speaking, overwhelms everything, the other piece of information is irrelevant. So mathematically speaking, what does it mean? This is what... Hold on. X is our powerful. Z is our rich. So we know that this is a yes, this is a yes. And then regardless of what the value of this is, I don't know, happy, not happy, regardless of the value, whatever that value is...

Those probabilities are the same, meaning that the happy information does not influence being powerful. This is not the same as independence, but effectively it kind of acts like independence. Based on this example, you cannot say that being powerful, happy and rich are all independent of one another. Or that being powerful is independent of being happy. You cannot say that. But what you can say is that once you know someone is rich, that specific value, you can say that powerful is conditionally independent of happy, given rich.
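Here is a small numeric sketch of the powerful / happy / rich story. Every number is hypothetical (the 9/10 and 1/100 only loosely echo the values said aloud), and the joint table is built on purpose so that happy depends only on rich.

```python
from fractions import Fraction
from itertools import product

# Hypothetical model: powerful and happy each depend only on rich.
p_rich = {True: Fraction(1, 10), False: Fraction(9, 10)}
p_powerful_given_rich = {True: Fraction(9, 10), False: Fraction(1, 100)}
p_happy_given_rich = {True: Fraction(3, 5), False: Fraction(2, 5)}

def bernoulli(p_true, value):
    return p_true if value else 1 - p_true

# Joint table over (powerful, happy, rich).
joint = {}
for powerful, happy, rich in product([True, False], repeat=3):
    joint[(powerful, happy, rich)] = (
        p_rich[rich]
        * bernoulli(p_powerful_given_rich[rich], powerful)
        * bernoulli(p_happy_given_rich[rich], happy)
    )

def prob(predicate):
    return sum(p for key, p in joint.items() if predicate(*key))

def p_powerful_given(condition):
    return prob(lambda x, y, z: x and condition(x, y, z)) / prob(condition)

given_rich = p_powerful_given(lambda x, y, z: z)
given_rich_happy = p_powerful_given(lambda x, y, z: z and y)
given_rich_unhappy = p_powerful_given(lambda x, y, z: z and not y)

# Without knowing rich, happy does shift the belief about powerful.
marginal_powerful = prob(lambda x, y, z: x)
given_happy = p_powerful_given(lambda x, y, z: y)
```

In this toy table the three "given rich" values coincide, while `given_happy` differs from `marginal_powerful`, which is the distinction between conditional independence and plain independence.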

Now, the Bayes rule, okay? Everybody's familiar with the Bayes rule? Alright, now we have the conditional probability rule, which is P(A, B) divided by the probability of B. I literally can calculate it using this formula. Why would I use another one to do so? Which is actually, well, at face value more complex, because it involves three different probabilities versus two, right? Why would I do that? I'll get the same number.

If all the probabilities are correct, right? I'll get the same number regardless. It depends on the information or probabilities that I have at hand. For example, if calculating this is very hard for me, but I already have this and that and that, why would I bother using this one? And vice versa. The other motivation or rationale for using Bayes is kind of looking at two possible ways of looking at conditional probabilities, right?

There you go. This is better. In many cases, okay, is it easy to estimate the probability of a disease given the symptoms? This is your doctor's task. You walk into the office and you say, I have a cough, I have a high temperature, and your doctor's task is to establish what's the probability that the disease is, I don't know, a common cold or something like that. He will go through the list. Different options, right? And based on those probabilities, he or she will make a diagnosis, right?

Is it easier to come up with this probability versus that probability? I'm walking into the doctor's office and I say, "I have a common cold." All right. What's the probability, when you have a common cold, that you have a fever or something like that?

This is all in medical records. Hold on. Of the people that were diagnosed with a common cold, nine out of ten had a cough or something like that. What about the probability of having a disease? How would you go about getting the probability of having a common cold?

Probability of symptoms, that's easy. How many people out of all the people that visited your clinic had a cough? This stuff is relatively easier to find in many cases than that. This is where we use Bayes. Alright, some examples that you might have seen in CS480. If you're kind of shaky on the Bayes rule, this is a nice little visual example that I personally like, to get a gauge.
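The doctor example can be sketched with the Bayes rule, P(cold | cough) = P(cough | cold) * P(cold) / P(cough). The nine-out-of-ten figure is the one mentioned above; the other two values are assumptions made up so the arithmetic can be followed.

```python
from fractions import Fraction

def bayes(p_evidence_given_hypothesis, p_hypothesis, p_evidence):
    """Bayes rule: P(H | E) = P(E | H) * P(H) / P(E)."""
    return p_evidence_given_hypothesis * p_hypothesis / p_evidence

p_cough_given_cold = Fraction(9, 10)   # "nine out of ten" from the records
p_cold = Fraction(1, 10)               # hypothetical share of visits with a cold
p_cough = Fraction(3, 10)              # hypothetical share of visits with a cough

p_cold_given_cough = bayes(p_cough_given_cold, p_cold, p_cough)   # 3/10
```

All three inputs are quantities that could in principle be counted from clinic records, which is the practical argument for going through Bayes instead of estimating the diagnosis probability directly.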

A can be yes, no, maybe. B can be true or false. So this table is relatively long. There are some probability values. What is marginalization? What is a marginal probability? Here's the question. If I have this joint probability table of A and B, can I extract from it the probability that A is yes? If I give you this table, it's fully populated with numbers. Can you get me that value?

Yes. That is the process of marginalization. I would simply sum up all the rows where A is yes, and I'm done. And that's the marginal probability. Okay. It's like an unconditional probability.

There is a special name for it because you're kind of summing things on the margin.

Uh, all right. Another question for you. If you have this joint probability table of A and B, can you give me the probability that A is yes given that B is false? Can I extract that probability from that table? Absolutely. First of all, this is going to be equal to the probability of A being yes and B being false, right? Over the probability that B is equal to false.

That one? Marginalize. This one is a simple lookup, because we will have only one row that corresponds to that. Boom, you can calculate that. Do you see where this is going? If I have a joint probability, I can go and marginalize it. I can go and calculate conditional probabilities, all extracted from it. Does that make sense? And back and forth: if I have conditional and marginal probabilities, I can get the joint probability distribution.
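A sketch of both extractions from a joint table over A (yes / no / maybe) and B (true / false). The table values are hypothetical and only chosen so that they sum to 1.

```python
from fractions import Fraction

# Hypothetical joint probability table P(A, B).
joint = {
    ("yes", True): Fraction(3, 20),
    ("yes", False): Fraction(1, 4),
    ("no", True): Fraction(1, 10),
    ("no", False): Fraction(1, 5),
    ("maybe", True): Fraction(1, 5),
    ("maybe", False): Fraction(1, 10),
}

def marginal_a(a_value):
    """Sum every row where A has the requested value."""
    return sum(p for (a, b), p in joint.items() if a == a_value)

def marginal_b(b_value):
    """Sum every row where B has the requested value."""
    return sum(p for (a, b), p in joint.items() if b == b_value)

def conditional_a_given_b(a_value, b_value):
    """P(A | B) = P(A, B) / P(B): one lookup over one marginalization."""
    return joint[(a_value, b_value)] / marginal_b(b_value)

p_a_yes = marginal_a("yes")                               # 2/5
p_b_false = marginal_b(False)                             # 11/20
p_a_yes_given_b_false = conditional_a_given_b("yes", False)   # 5/11
```

Going the other way, multiplying `conditional_a_given_b("yes", False)` by `marginal_b(False)` recovers the joint entry for that row.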

Marginal probability, let's say the probability of the word "yes", right? This will give you the likelihood of that word occurring in English, right? The probability of, uh... "Yes, sir", right? That's a joint probability. The probability of "sir" given "yes" is the probability that...


Probability Refresher

Random Experiment, Outcome, Sample Space

Random Experiment


Outcome


Sample Space