Summary
This lecture covered the historical development of Natural Language Processing from the early 1900s through modern approaches, followed by foundational concepts in linguistics relevant to NLP applications.
Markov Models (Early 1900s)
Language as a System (Early 1900s)
Foundational Computing Concepts
Early NLP Applications
Translation and Early Systems
Turing Test and Conversational Systems
Chomsky's Contributions
ALPAC Report and AI Winter
Rule-Based Approaches (1950s-1970s)
Statistical Approaches (1980s-1990s)
Modern Approaches (1990s-Present)
Language as a Structured System
Linguistics Subfields Relevant to NLP
Key Linguistic Concepts
Phonemes: Smallest sound units in language that may not carry meaning independently
Morphemes and Lexemes:
Morphological Types:
Syntax:
Notes
Transcript
Markov chains. You might have heard the name: Markov model, Markov chain, the Markov assumption, right? This guy will show up multiple times in this course. It is actually a very important concept in all sorts of AI applications. For NLP, a simple way to put it is that the Markov model is based on the Markov assumption: everything that happened in the past is captured by the present state, so what comes next depends only on where you are now. Within a month you will see that this translates into: how do we predict the next word?
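As a rough illustration of how the Markov assumption turns into next-word prediction, here is a minimal bigram sketch in Python. The corpus and the names `train_bigrams` and `predict_next` are invented for this example and are not from the lecture; it only assumes that the next word depends on the current word alone.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count, for each word, which words follow it."""
    follow = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follow[current][nxt] += 1
    return follow

def predict_next(follow, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = follow.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Toy corpus, invented for illustration only.
corpus = [
    "good morning everyone",
    "good morning class",
    "good evening everyone",
]
model = train_bigrams(corpus)
print(predict_next(model, "good"))
```

Only the current word is consulted when predicting; everything earlier in the sentence is ignored, which is exactly the simplification the Markov assumption makes.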
This is a philosopher. If I read the quote correctly, his idea was: language is a system. Why is it important? You speak, right? You learn how to speak at a very early age. You learn how to write later. And it's a natural process, right? Language is a system. Systems are made up of parts that interact with one another.
This is very important for computer processing, because computers are good at breaking things down into parts. Look at the expression here: meaning is created within a language through relations and differences. What would be the parts of a language? Parts of speech, context, sentences, paragraphs, syllables, right? An exclamation mark. Everything, or almost everything, carries some meaning, and so do the relationships between the parts, even the sequencing, right?
What will it mean if there is an exclamation mark at the end of the sentence? If there is a question mark at the end of the sentence? Or just a full stop? A different meaning altogether. Slightly different, but different.
It is because we share an understanding of that system that we can communicate. Now, that should make even more sense when it comes to computers. You might have been exposed to systems like that: networking protocols, or even programming languages. A C++ compiler on your computer will not compile something that is not C++. If I give you C++ code, you can pass it around. Your compiler will understand it, and my compiler will understand it.
Same system. Now, this should make sense, because if you have a system like that, you can kind of integrate it into a computer. If you can boil language down to a system of parts that always work the same way, then you can have two parties communicating with one another. It is the same with a computer: if the computer understands our system, well then, we can have communication. Of course, that was just a philosophical idea back in the early 1900s; there were no computers back then.
Okay, so, the third important piece. It's a conceptual mathematical model of processing. So: language as a system, a mathematical model for processing anything, and then the last one, which is something that you also should be familiar with by now. All right. So eventually we have all the pieces we need: the first programmable, flexible computer, and the rest is history.
So we have a computer, right? That's 1946, right? Things move forward very quickly: transistors and whatnot, computers like that one. I mean, to me it would be obvious: you have a machine that just listens to you and does what you want, right? And when you see that thing, it can process data.
The war ended in '45. Okay, so, what did people do with that? The first step was digitizing. Translating was next. This is the beginning of the Cold War: spies and whatnot. The guy on the screen right now was unbelievable at the time, and he was really big on: let's make a machine translate things.
Okay. Because it's not easy, right? You have a ton of documents. It would be great to have a way of processing them automatically.
How do you think it worked out? Not well. What do you think was the first approach to translation? Yeah, pretty naive and very, very simple, right? Word by word: try to translate each word. These are people from the 90s, right? Experts, very smart people, and they didn't have any better idea than that at the moment. But in any case, that process was very, very well received as an idea at the time.
More specifically, translation was considered to be kind of a decoding problem: the Russian text was just English encoded in some fancy way, right? So once we have a way to decode it, we will be fine. As you can imagine, that did not go well.
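To make the word-by-word idea concrete, here is a minimal sketch. The three-entry lexicon of transliterated Russian and the function name `translate_word_by_word` are assumptions made for this illustration, not a reconstruction of any historical system.

```python
# Hypothetical toy lexicon: transliterated Russian -> English.
LEXICON = {"ya": "I", "tebya": "you", "lyublyu": "love"}

def translate_word_by_word(sentence, lexicon=LEXICON):
    """Replace each source word with its dictionary entry, keeping word order."""
    words = sentence.lower().split()
    # Unknown words are flagged rather than guessed.
    return " ".join(lexicon.get(w, f"<{w}?>") for w in words)

print(translate_word_by_word("ya tebya lyublyu"))
```

Every word is translated correctly, yet the result keeps the source word order ("I you love"), which hints at why treating translation as simple decoding did not go well.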
And there was quite a bit of work in this area: digitizing first, and then, immediately after, translating. How many of you are bilingual? Quite a few. How do you rate, let's say, Google Translate when it comes to your second language?
Now, now this is better.
It has come a long way, right? If you go ten years back, it was terrible. The less used the language, the worse the translation was. Usually you would laugh at the translation more than you would use it.
Then there is a conceptual idea: how do we verify that a computer is actually intelligent? The idea with the Turing test was that eventually machines would become so good that they would be as intelligent as we are. How do we know when they are? How do we know when we have that? Let's have a conversation with a machine. Okay. How many of you are familiar with ELIZA?
To have a conversation, the computer has to generate text, and it has to generate the right response. If I want the computer to answer your question, it has to understand the question. So this is where another wave, sort of, of NLP approaches started: people started looking at text processing from a linguistic perspective. How do we understand what it means? Let's hire, let's involve a bunch of linguists who will break down a language and treat it as a system.
And that work will produce a bunch of rules.
And you can apply them one by one: this is the rule, we see that, okay, that rule says this and that, right?
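A rule-based responder of the kind described here can be sketched in a few lines. The patterns, templates, and the name `respond` below are hypothetical and written only for this illustration; they are not taken from ELIZA or any other real system.

```python
import re

# Hypothetical hand-written rules: (pattern, response template).
RULES = [
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bhello\b", re.IGNORECASE), "Hello. What would you like to talk about?"),
]
FALLBACK = "Please tell me more."

def respond(utterance):
    """Apply the first matching rule; fall back to a generic reply."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return FALLBACK

print(respond("I feel tired."))
print(respond("The weather is nice."))
```

Anything the rule authors did not anticipate falls through to the fallback, which is one way to see why hand-written rules scale poorly.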
Another linguistics-based concept that came from that era, the 50s and 60s, is Chomsky's idea.
He is an American professor at MIT, and one of his major ideas was that syntax is independent of semantics.
Look at the silly sentence right here. His argument was that this is perfectly valid: there's nothing wrong with it syntactically, but it is meaningless. Now, going back to one of the first slides that I showed you, which said that language is a system: if you take that into account, what is the disconnect, or the connection?
The assumption is that you can decouple the two. What does that mean? Modularity.
You can process one and then the other. How would you move on from that?
By having a grammar. A special grammar.
What do I mean by that? This is very close, in a sense, to a programming language. Right. You all can code, you know how a for loop works, you know how an if-then statement works, right? An if-then, or something like that, is a structure into which you can translate your idea that something has to happen. You can express it in a million ways: if something, when I see this, when I do that. You can cram it into the very rigid structure of a programming language, right?
And computers understand it. There are a ton of ways of expressing a rule in English or any other human language, but there's one way that a programming language will understand it, right? Does that make sense? It is the same idea here: you have a rigid structure that represents, for example, verbs, or nouns, or whatnot. And if you can convert a written text into that form, then it will be much easier for the computer to understand.
Again, that is because we're sharing the same system, right? Let me take a step back. The assumption here, for English, is that there exists one unambiguous grammar for the English language. I don't know, 5,000 rules and that's it, right? There's nothing ambiguous about it. Is that true? Obviously not, right? There are rules, exceptions, and whatnot, but if we can kind of move in this direction, we will be able to handle every sentence that way.
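The idea of cramming text into a rigid structure can be illustrated with a tiny grammar checker. The lexicon, the three grammar rules (S -> NP VP, NP -> Det N, VP -> V NP), and the function names are all assumptions made for this sketch; real grammars are far larger and, as noted above, not unambiguous.

```python
# Hypothetical toy lexicon mapping words to parts of speech.
LEXICON = {
    "the": "Det", "a": "Det",
    "dog": "N", "idea": "N", "bone": "N",
    "eats": "V", "sees": "V",
}

def parse_np(tags, i):
    """NP -> Det N. Return the index after the phrase, or None."""
    if tags[i:i + 2] == ["Det", "N"]:
        return i + 2
    return None

def parse_vp(tags, i):
    """VP -> V NP. Return the index after the phrase, or None."""
    if i < len(tags) and tags[i] == "V":
        return parse_np(tags, i + 1)
    return None

def is_grammatical(sentence):
    """S -> NP VP. True only if the whole sentence fits the toy grammar."""
    words = sentence.lower().split()
    if any(w not in LEXICON for w in words):
        return False
    tags = [LEXICON[w] for w in words]
    end = parse_np(tags, 0)
    if end is None:
        return False
    return parse_vp(tags, end) == len(tags)

print(is_grammatical("the dog eats a bone"))
print(is_grammatical("the idea eats the dog"))
print(is_grammatical("dog the eats bone a"))
```

The second sentence is accepted even though it is nonsense, because the checker looks only at structure; that is the sense in which syntax can be processed separately from semantics.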
The takeaway is that it doesn't work perfectly, but it can get you somewhere. Take, for example, a legal document. You will see a very, very strict way of writing things or expressing things.
Legal documents and medical documents will be relatively easy to translate into that form, because they follow a similar...
Right, you're not allowed to improvise; otherwise you would not be a good doctor or a lawyer. On the other end of the spectrum is slang, right? Or cutting corners. Think of your text messages, things of that nature. Which goes back to what I said before: it depends on your application, for example, if you're writing a chatbot for a doctor or a lawyer, right?
For those who know ELIZA already, I'll try to keep it short. For those who don't know: ELIZA is the first chatbot, from 1964.
There was a chatbot 60 years ago. It wasn't as good as your ChatGPT, but it was designed with a very serious purpose: it was supposed to be a little shrink.
Has anyone ever heard from your parents, or someone in the media, or whatever: well, young people nowadays are so attached to chatbots, it's scary, and that attachment is dangerous for the brain? Look at ELIZA. Well, ELIZA was not massively available, because computers were confined mostly to universities and the military. So pretty much only students and professors had access.
If you read about it: after it was taken away, some people had no control; their lives were a mess.
If you've never seen ELIZA in action: versions of it are online, and you can have a short conversation with one. No serious conversation is going to happen, but still, people at that time felt that way, without the knowledge that you have about what it actually was.
There were a couple more models like that which followed in the '60s and '70s. But people quickly realized that this is a dead end. What is a dead end? This linguistic, rule-based approach. It just didn't work very well. And then came the ALPAC report. You know how research works: you get a grant from somewhere, you research a certain thing, and you get paid for it.
So at some point, that was it for any idea that had to do with talking computers and AI. Enter the AI winter.
20 years later. So all the research essentially stopped at this stage. All this stuff was considered insufficient.
You have a linguist who tries to distill a language, you build a bunch of rules, then you build a computer model, and you have ifs and elses, blah, blah, blah. That didn't work.
Eventually, researchers went for something different. Rather than having a set of rules explicitly defined before you build a system, why don't you distill the rules from data? Don't make any assumptions. Good morning. There you go: your response right now is statistically predictable, right? You're not going to say, "No, no, no."
It is possible, but very unlikely. Sure, sure, but if I say good morning, your experience-based response is to say hello or good morning or something like that. You're not going to get into a tirade, even though it is possible. By the same token, you will not respond with "morning good," right? It clashes with your understanding of English and your upbringing. Do you immediately remember the rule?
Why does "morning" come after "good" and not vice versa? Probably not, but your brain has learned the statistical patterns. I keep using the same example, because it took me a while: "How are you?" I land in this country, someone asks me "How are you?", and they get a full response from me: okay, I'm good, my nose is dripping a little bit, my toe is hurting. This is not what they want, but you have to spend some time talking to Americans right here to learn that this is not how you respond to that question.
I've been here for 20-something years and I still don't know why you do that, but I've learned to adapt. Statistically speaking, I will get a better response if I just say, "I'm fine, all's great," right? Nobody cares about my toe, right?
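The "distill the rules from data" idea can be sketched by simply counting which replies follow which prompts. The exchanges below are invented for illustration, and `reply_probabilities` and `most_likely_reply` are hypothetical names; this is a counting sketch, not a description of any production system.

```python
from collections import Counter, defaultdict

# Invented (prompt, reply) pairs standing in for observed conversations.
observed = [
    ("how are you", "i'm fine"),
    ("how are you", "i'm fine"),
    ("how are you", "all's great"),
    ("how are you", "my toe is hurting"),
    ("good morning", "good morning"),
    ("good morning", "hello"),
    ("good morning", "good morning"),
]

def reply_probabilities(pairs):
    """Estimate P(reply | prompt) from relative frequencies."""
    counts = defaultdict(Counter)
    for prompt, reply in pairs:
        counts[prompt][reply] += 1
    return {
        prompt: {reply: n / sum(c.values()) for reply, n in c.items()}
        for prompt, c in counts.items()
    }

def most_likely_reply(probs, prompt):
    """Pick the reply with the highest estimated probability."""
    return max(probs[prompt], key=probs[prompt].get)

probs = reply_probabilities(observed)
print(most_likely_reply(probs, "how are you"))
print(probs["how are you"]["my toe is hurting"])
```

No rule about greetings is written down anywhere; the preferred reply emerges from the counts, and the unusual reply stays possible but unlikely.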
That is it in statistical terms. But how do you get good statistical data?
You're trying to average. Oversimplifying, it's like trying to reduce all of us here to an average height, average whatever, right? You can't deal with the whole mess, right? So it is the same thing with the English language: whatever you are doing, you compute some averages. Later on you will learn that ChatGPT is essentially averaging the language in the end. Of course, you can argue with me about that.
When you type prompts into ChatGPT, you always get a different response, and it feels like there's always a different conversation, but that's a bunch of little tricks on top of that. I'm almost done with that. So, it's a good approach. And whatever happened afterwards, from the 1990s up to now, is just an extension of that statistical approach, except that different tools are used.
Alright, how many of you are familiar with the term "embeddings"? Embeddings and deep learning. Thank you.
Plus a lot of data. That's what makes modern NLP. Everyone should know that by now. How many of you were around, I guess, when Google was distributing the first Gmail accounts? You couldn't just set one up. You had to be invited. That was some 20 years ago.
In any case: you used Gmail at the IAP until, what was it, '24 or something like that? Did you like it? Was it useful? It was free. That's why the university got it in the first place, right? Well, the moment it stopped being free: gone. But why was it free?
Every single email that you wrote, every single pattern that you used, got stored, processed, and distilled into text data. It is the same with Facebook. Instagram is great because people are happy to tag and describe the image. What else do you need to create a dataset like that? I digress.
Now, the basics we will cover. I think this makes perfect sense when you try to define a language, going back to what we saw earlier: language is a structured system of communication.
Does that make sense? The more structured it is, the easier it is to use.
A structured system means that it is a structure built from little pieces. And this goes back to what I said before: language is sounds, forms, right? Have you heard of the terms phonemes, morphemes, lexemes, things like that? Okay.
How did you say your name? The sound will tell you something about where you are from. The word itself, when you chop it into pieces, possibly with some negation in there, always has something to say. Then the sentence, context, semantics: there are just different levels, okay?
Linguistics is the study of language and its structure. Going back to the 50s and the 60s, it made perfect sense to use the results of linguistics to build a pattern-based system. It didn't work well, but that information is still useful. What does it mean? It means studying morphology, which is how words are formed; syntax, which is how sentences are created and used; and semantics, which is meaning. All these pieces are the basis of linguistics.
How words are formed: there is a structure. Then there is syntax. So there is a structure.
So, all right. Then semantics, the study of the meaning of sentences, and pragmatics. This is a beautiful term, pragmatics.
Well, I'm not going to go by the definition, but here's the thing. Is it possible that a single sentence that you say or I say could be interpreted in multiple ways? So no matter how much you process it using syntax and semantics or whatever, you will not be able to understand it without context, right? So a sentence on its own sometimes may not be understood properly without going up one level, to what's around it. All right, phonemes.
Sounds in a language: different sounds. This is something that I want you to understand. A morpheme is the smallest unit of meaning. It is a single sound or a combination of those. It does not necessarily make a word just yet. Lexemes are also morphemes, but let's say these are also words. So "run" in "running" is a morpheme, a unit of meaning, let's say, but it is also a word on its own.
All right, so, if that sounds even remotely interesting to you, or if it doesn't sound remotely interesting to you right now, let me tell you two things. Yes, I will show you how to do that. And the mechanism that I will show you is actually used in text processing as one of the first steps. Ready?
Inflectional morphemes do not create new meanings.
So "happy" and "happier": two different words, but they mean more or less the same. You're not creating anything new; you're strengthening or dampening. Derivational: "unhappy." You put "un" in front of "happy" and you change the word; you create a new word.
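As a rough sketch of chopping a word into morphemes, the following splits off one derivational prefix and one inflectional suffix. The affix lists, the minimum stem length, and the name `analyze` are assumptions chosen for this toy example; it is not a real stemmer and will mis-split many words (for instance, "under" or "running").

```python
# Hypothetical, deliberately tiny affix inventories.
DERIVATIONAL_PREFIXES = ("un", "re")
INFLECTIONAL_SUFFIXES = ("ing", "ed", "s")
MIN_STEM = 3  # avoid stripping words like "sing" down to "s"

def analyze(word):
    """Split a word into (prefix, stem, suffix) using naive affix stripping."""
    stem = word.lower()
    prefix = suffix = ""
    for p in DERIVATIONAL_PREFIXES:
        if stem.startswith(p) and len(stem) - len(p) >= MIN_STEM:
            prefix, stem = p, stem[len(p):]
            break
    for s in INFLECTIONAL_SUFFIXES:
        if stem.endswith(s) and len(stem) - len(s) >= MIN_STEM:
            suffix, stem = s, stem[:-len(s)]
            break
    return prefix, stem, suffix

print(analyze("unhappy"))
print(analyze("walking"))
print(analyze("sing"))
```

A derivational prefix such as "un" marks a new word with a changed meaning, while an inflectional suffix such as "ing" leaves the core meaning of the stem intact.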
Is that useful information? Has anyone here taken the GRE?
By the way, it doesn't have to be the GRE. You don't know what an analogy is? Okay, one minute, let me just explain.
You get these tests, timed tests, standardized tests, and I got questions like, "Word X is to Y as A is to B, or C is to D." Multiple-choice questions, right?
You get that? I don't even know what X is. I don't understand. I don't have enough English to know what X and Y are, much less A, B, C, D, right? And when you have no clue, you still have to answer. You can guess, obviously. So what made my life a little easier? Looking for those pieces, right? There is some pattern that you may be able to find.
Syntax. By the way, are there any questions? All good? Syntax, you know what it is? The way syntax is used.
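Natural Language Processing (NLP) is an interdisciplinary field of study.
It deals with the interaction between computers and human language,
and in particular studies how to program computers to process and analyze large amounts of natural language data.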