Lecture Script

Summary

Overview of Natural Language Processing

NLP Applications Discussed

Topic Modeling and Text Classification

Sentiment Analysis Deep Dive

Information Extraction

Other NLP Applications

Levels of Language Analysis

The NLP Pipeline

Text Preprocessing: Stemming and Lemmatization

Handling Unknown Words

Semantics and Pragmatics

Notes

Transcript

You use the, is that it has wornJudging that he has learned something fromWritten sources, right? I'm not going to define the learning process here. I'm not going to argue with her. She actually knows how to make things and whatever she does. This class, Having the And we'll be there. Finally, Understanding languages themselves is Okay, so. Here's what we want to do with them. Understand what's next for them.

Is it 100% possible? I don't think so. Is it mostly possible? Absolutely yes. Successful. The dishes are done. Now, that's... Indulge me. Do you think thatBecause that's the next step. Good things to do. question I don't even know how to teach that. Anybody know what that location is? An open question. I'm on the side of no. But, ultimately,Nice to have that. And this is-Progressing very fast. having an interactive communication with a machine and a translation.

Now, so this is what a regular user would want from NLP. Right, you don't care what's on there, that good? You just have a tool that understands you, What's What does it mean from engineer? I pressed a couple times last time to do that. We use what we learn and more to build your own applications. Those applications will do all sorts of things with us. They won't have to do chatbots. They won't have to understand everything.

Depending on the task, you'll be using different algorithms, different approaches, and different data. I'm sure that this will open your eyes to the NLP engineering perspective: scan the text, extract information, and whatnot. Spelling, right? What other applications are you familiar with? Natural language processing, which means human, natural language interactions, whatever it means. What have you seen? What have you used?

I think a lot of companies are automating phone calls and whatnot. Phone calls, automating conversations? All right.

Okay, good. Do you trust him? I don't believe this. I don't either. Anything else? Can I move it? Grammarly. Grammarly, yes. That's a very good tool. All right, so I'm going to give you some other things. There's an overlap in what you said. If you don't know what a language model is just yet, you can find that in our... Big field. Building a model of a language.

What a burden. Text classification, we'll do that in this class. Sentiment analysis. There's a lot of flavors to information extraction, information retrieval. These two are different tasks. I'm sure some of you, CS4.29... Different topics. Conversational agents, that's obvious. Text summarization, question answering, and machine translation. These are not surprising to me. What about topic modeling?

Have you heard that term? What do you think a topic model would be? I imagine most of you, if not all of you, know how clustering works. Right, you have a data set. In this case, it's going to be documents. You suspect that there are patterns in the documents or in that data, or that some of the data points are sharing something, but you don't know what the groupings are. Topic modeling is kind of like that.

Okay. These documents, I'm just giving you a pile of documents, separate them into your own bins. Sports news, this is a novel, this is a short story, this is a... Letters from your grandma, whatever. Based on what's inside. So that's it. Obvious. Have you seen text classification in action? Or not. Let me show you how that works. How do you use text classification? Any favorite news source?

Favorite movie? Let's play it safe. Okay. What just happened? We have a document full of text, right? Words after words, and that document ended up being classified with a number, right? Thank you. On a scale, plus one being positive. Here's a question for you. How would you approach that kind of... What was the text? Was it a bad review of the movie? It was a bad review. I picked one out of them. Oh, okay.

I would look at some of the words and see if they are typically negative. Words with negative connotations or positive connotations and, based on that, potentially add weight to those words, like how negative they are, I guess, and you could create a desirable model using that. Perfect.

Okay, so... Nowadays, I'm not sure what this model is. Specifically, What's the underlying technologies into neural network or something else? But if you don't believe, that kind of sophistication when you use that, right? Take the, look at the words, right, and, you know,Get the word negative. Such as? Crowing Boring. And at the most basic level, we just come back. We'll get to text plus information.

The most simplistic way would be just to have a kind of a list of words that are negative and words that are positive, and count their occurrences. I mean, if there's more on one side than on the other: positive. Of course, that would be just very, very, very... What's that? Most of... Text classification is based on that. We look for certain words. Yes.

Does this classifier take into account context? For instance, if somebody says this is positively horrendous?

I don't know. It's the first demo that I found online. I have no clue whether this one actually is taking the context into account. I doubt that. What you highlighted is something very important: this is a scenario where this counting of positive and negative words kind of breaks down, right? By the same token, say someone wrote: this was not great, not great, not great, not great, right? For a simple model, it would probably be classified as positive, because there's a lot of "great"s in it.

Yeah, this one says no redeeming qualities. It says only redeeming qualities.
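The word-counting approach discussed above can be sketched in a few lines. This is only an illustration: the word lists, the function names, and the scoring formula are assumptions made for the example, not the method behind the demo shown in class.

```python
import re

# Hypothetical, tiny word lists chosen for illustration only.
POSITIVE_WORDS = {"great", "good", "funny", "wonderful", "redeeming"}
NEGATIVE_WORDS = {"boring", "bad", "awful", "horrendous", "dull"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_score(text):
    """Return (positive - negative) / matched words, in [-1, 1]; 0.0 if no word matches."""
    tokens = tokenize(text)
    pos = sum(token in POSITIVE_WORDS for token in tokens)
    neg = sum(token in NEGATIVE_WORDS for token in tokens)
    matched = pos + neg
    return 0.0 if matched == 0 else (pos - neg) / matched


# A plainly negative review gets a negative score.
print(lexicon_score("A boring, awful movie."))
# Negation is ignored, so this negative review gets a positive score.
print(lexicon_score("This was not great, not great, not great."))
```

The second call shows the failure raised in the discussion: the counter sees three occurrences of "great" and never looks at the "not" in front of them.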

It's in your face right now how hard it is, right? There's ways of, you know, massaging and things. Thank you. Trick here, a little trick there. But, well, we'll get to that point. So here's text classification. Where would you get data? Warwick. By the way, You have a text that has a score. A lot of times for a lot of problems people already have. Yeah. And of course, Would you be able to weed out a trouble in Review out of that?

Just one minute, 'cause there's so many little steps that you can do here. I can't do that all in two programs. Information extraction. This is another very useful application of NLP. Does anyone have any favorite news source? There you go. Here's a random text that I plugged in. What happened? Information extraction. Actually, let's do it again. Let's talk briefly about information extraction.

When I ask you about your texts, let's meet this Friday, right? That's information extraction. Did it do a good job? You have some text, here's a person, Mark. Mass Effect was labeled as an organization. Probably not. That's a game, right? Outdoor Wilds game is kind of the one thing that is related to all sorts of games. Misclassified, torn in all. You did a good job the past decade. You did a good job for E-Current.

Do you see what's happening here? So instead of this time, instead of classifying the text, we're plucking up a piece of paper. Thank you. Relevant to some problem in this case. Is it an instance of tending? It is an instance of tending, except it's a kind of more sophisticated I will guess you may-Word.

I don't have a specific answer for this particular demo. I don't know exactly what... This is probably based on... Stacy was just... So I don't know how it was trained. What is it? I'm pretty sure that it's not just the cutting over... My guess it would be that it looks like Oscar Wilde or something. Yeah. You will get to information extraction at some point. You will see how that's being done. Nowadays, mostly it's neural networks.

Some of the sentences here definitely follow that structure where you have Birds, mountains, and... Well, mass effect, these can be considered in this context. So maybe we start with that sentence structure and maybe with data, we'd be able to identify whether they're proper nouns or something else. Great.

By the way, notice what he just described. Before you even slap the tag on Mass Effect, you look, well, wait a minute, whether it's a noun or not. There's an underlying process that precedes it. That's exactly how it works. First, we tag parts of speech, and I mean, do that. Of course you can play it easy, for example, and we don't have any... How difficult is it to have a list of 200 countries? Go through the text and see if I spot one.

Companies, that's a little trickier because they come and go. Good. What I'm trying to draw your attention to is this. That's working, and there will be some really good lookup tables. Numbers, they don't change, right? Weekdays, they don't change. Okay, so... We kind of got to the point of some understanding of how you would do that.
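The lookup-table idea (a fixed list of countries, weekdays, and so on) can be sketched as follows. The table contents, the label names, and the function name are hypothetical and kept tiny on purpose; this is not how the demo shown in class works.

```python
import re

# Hypothetical lookup tables (gazetteers); real ones would be far larger.
GAZETTEER = {
    "COUNTRY": {"france", "japan", "brazil"},
    "WEEKDAY": {"monday", "tuesday", "wednesday", "thursday",
                "friday", "saturday", "sunday"},
}


def tag_entities(text):
    """Return (token, label) pairs for every token found in a lookup table."""
    found = []
    for token in re.findall(r"[A-Za-z]+", text):
        for label, entries in GAZETTEER.items():
            if token.lower() in entries:
                found.append((token, label))
    return found


print(tag_entities("Let's meet this Friday to plan the trip to Japan."))
```

A table like this works for closed sets that rarely change. It does nothing for names such as new companies or game titles, which is where part-of-speech tagging and trained models come in.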

So you already covered the calendar. It will connect you to your calendar and possibly formulate and execute a command. Finally, this is what a lot of your Claws are doing. There is a language model that's built to process words, and on top of that you will have other little components. But what would you use that for?

How about question and answer and then summarization? You may ask a question, okay? You have a... You have a conversation. President Trump, there's a transcript of that. And you asked the question, what is President Trump's stand on this or that, right? It has to look a The parts where President Trump is saying something versus the interviewer and whatnot. Yes?

Maybe like Google or like Yelp reviews for a company or something. Right, right, right.

Yeah. Just write your own script that finds out what people are thinking about. Okay, so that's information extraction. Information retrieval you've seen on Google, I'm not going to give you an example. A search engine is an example of information retrieval. Why? It will search the bazillion websites and documents and show you the ones that are relevant to your search, to your query. Right here in the back.

How is... Not a trivial let's scan through all the documents and When you ask and then you give. The answer is I will too. Take ages. Are you a sectional agent? Thank you. See this, you got that one. You will. An outlier. Use it anymore, but, uh, uh, introduction law space fiction. A few things about Space Station. Never mind. You know how it works. Machine translation, You found that yourself? I think we'll have a little time to talk about that.

You can see that there is-There's a ton of different applications, and once again, I'm... Repeat myself. Those tasks. You use different parts of that. -We'll be going through that. All right, there's other obviously that you can think of. I do hope that many of you will come up. Some of those tasks that you will see in this classI've heard. Some ofThen let me... Asking what do you think about this?

It's an open-domain conversational agent, right? It's pretty easy to understand what it is: a conversational agent that you can talk to about anything. As opposed to a closed domain. Have you seen those? I'm gonna do what my company's reviews is not gonna go to like, shape.com. You've been dealing with chatbots a lot, so how do you rate the quality of existing open-domain conversational agents? Yes. They actually talk about... Anything he did in a serious way satisfying. Thank you.

Still not there. Moving in that direction, like I think. Gemini probably has the best, like, when you're using your voice and communicating back and forth. I think they probably have the most, quote unquote, human interactions with it, but it'll still trip off where it doesn't.

I feel like a completely organic conversation, but it also doesn't entirely feel like you're just... It feels like it's trying to add some personality versus just... anything. All right, last time I showed you this table, just to remind you of the way language is studied linguistically, divided into levels. Let's talk about how those levels of text are being processed. So the complexity grows in this direction, right?

This is the opposite direction right here. We'll skip most of the sound aspect of it. How do we do that? Because these are levels. to understand the attitudes, to understand the syntax and the semantics and whatnot. So what you would-building yourself. Our pilots, okay? You might, if you did any NLP work on agent, creating agent, I'm playing new processes. It has a, Reason, for the most part, it comes from So.

A fundamental NLP pipeline is broken up in that particular way. Let's ignore this future analysis. Thank you. Great film. Next, into words and sub-words. Then you parse sentences. Then you apply pragmatics and distill the context. Here. This is you right here, okay? Grab everything that was discovered along the way. Then, the part that we will not be spending time on. Then you generate the answer and it expands.

Your response might be, I don't know, the number 1956, right? It's a date, a response to some question. Now you have to massage it and turn it into an English paragraph that will then be used in the next one. You're reversing the process. All those little steps along the pipeline are... Really, Chris. There's a reason behind having the pipeline. Reasoning, and this will be application specific, so this is where you come in.

Interest in building an NLP system. Now those two sections of the pipeline are... generation. We are going to be here. Does that make sense? For the most part, your ChatGPTs are doing all this stuff in one lesson. Wait, they do have stages, they do have... But before we will get their rules. So let's start here. Alright, so morphological and lexical analysis. We're looking at text. We're looking at words and their parts.

The prefixes, the suffixes, the roots, and whatnot. This is called morphological analysis. Have you heard the term "lemmatization"? What if... we're standing in both our... When you simplify the word. Okay. Simplifying words as in... Happier, happiest, go, going, went. Stemming, I'll just... describe and explain the differences and how we can do that. Stemming and lemmatization. Hands up.

Looking at your text. Let's stay here. Go, go, into, and into. You got every single area,Two questions for you. Why would you do that? And why would that work or not work?

Sentences can have a lot of filler words that are nice for context, but if you remove them from the sentence, you can still understand. What was the point of trying to do that?

Okay, so that would be... Moving somewhere, surrounding, go, go. Would you keep going, going, yeah. It's an action movie, I think. Probably. This means something. The words that you would skip, probably. There is meaning to it, but very little compared to an action movie.

We haven't got there yet. But any word in your text will have to have an ID, right? You can't just store "go" and... So let's say, let me make it up. Let's say that "go" has 234. "Going" has 4,096. "Went" is... Let's say that they have individual IDs, right? Was that my cat? I'm adding a little dictionary. It's not a proper dictionary because it's a dictionary. You don't have, you don't keep all the forms.

necessarily, right? For his limbs. All right, so. Is this the same word? From a computer perspective, you said "go" and "going" get different numbers, right? If you strip that from meaning, you just replace every word with a number. You're dealing with different words, technically. If you remove the context or you don't get to the context level just yet, you can't... That'd be sick, right? Processing "go" and "going" as different words: is that affecting, possibly affecting the processing?
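The point about every word form getting its own number can be made concrete with a small sketch. The function name and the example sentence are made up for illustration, and the IDs here are simply assigned in order of first appearance (the 234 and 4,096 in the lecture were also invented on the spot).

```python
def build_vocabulary(tokens):
    """Assign each distinct token the next free integer ID, in order of first appearance."""
    vocabulary = {}
    for token in tokens:
        if token not in vocabulary:
            vocabulary[token] = len(vocabulary)
    return vocabulary


tokens = "i go you go she is going he went".split()
vocabulary = build_vocabulary(tokens)
ids = [vocabulary[token] for token in tokens]

print(vocabulary)  # "go", "going", and "went" each get a separate ID
print(ids)         # the text as the computer sees it: a list of numbers
```

Once the text is reduced to IDs, nothing tells the machine that the entries for "go", "going", and "went" are related. Each extra form is one more entry in the dictionary.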

It could be affecting the space complexity. Space complexity, right. We have a bigger dictionary that has to be... We're all a version of you. Very good. Let's go back to Shrek and sentiment analysis. Would you care? Happy, happiest, greatest, greatest, greatest. Would you care whether those different versions are being analyzed, or can you just replace all of them with "happy"? Would you get the same sentiment?

Sentiment analysis. More or less. I'm happy watching Shrek. I would be happier watching Shrek 2, right? I'm the happiest whenever I watch. You can see that. So reducing the forms of different words is just helping you reduce the space complexity of the problem and as a byproduct of that your processing time and time complexity will be reduced. So that's the answer to the first of my questions. Now the second one.
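A rough sketch of the difference between the two reductions mentioned so far: stemming as blind suffix stripping, and lemmatization as a lookup of the dictionary form. The suffix list, the lemma table, and the function names are assumptions made for this example; real stemmers and lemmatizers use much richer rules and dictionaries.

```python
# Hypothetical lemma table for irregular forms; real lemmatizers use full dictionaries.
LEMMA_TABLE = {"went": "go", "gone": "go", "happier": "happy", "happiest": "happy"}
SUFFIXES = ("ing", "est", "er", "s")  # checked in this order


def crude_stem(word):
    """Strip the first matching suffix, keeping at least two characters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def lookup_lemma(word):
    """Use the table for known irregular forms, otherwise fall back to the crude stem."""
    return LEMMA_TABLE.get(word, crude_stem(word))


words = ["go", "going", "went", "happy", "happier", "happiest"]
print([crude_stem(w) for w in words])    # stems need not be real words
print([lookup_lemma(w) for w in words])  # lemmas are dictionary forms
print(len(set(words)), "->", len({lookup_lemma(w) for w in words}))
```

The crude stemmer leaves "went" untouched and turns "happier" into "happi", while the table maps every form to "go" or "happy". That shrinks six vocabulary entries to two.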

Can you imagine the reverse when you're actually losing some of your time? Not included in that.

Saying you're happy watching Shrek and saying you're happiest watching Shrek are two very different things. We'll say you're happy to watch. You're just happy when you're happy. Yes, that means you're happy. Like the happiest that you could ever be. Yeah, correct. So depending on the context of that statement, it could mean. Very good.

And depending on what you are trying to achieve, how many interested in being in answering the question, I don't care whether it's happy or happiest, right? When are you happiest? There you go. If I reduce happiest to happy, I will not have an answer as easily. So again, here's another fundamental problem for you. You will have to make that call. That's helpful. Good way. Do I care about that? Am I willing to sacrifice it on this base because it's in the war?

The good news to some degree is that all the neural-network-based models that you have right now take everything in and actually will... Okay, we'll try. Happy as happier eight pieces by the new Extract everything that you don't have to. However, it is used in some cases. Name your question, we'll get to it slowly. One more example, going back to how those different levels of language analysis are related to applications.

--Would that work for any language? 30ish, right? Is this a real English word already? Probably not. Everybody understands that, right? Yeah. You may rest now, thank you. Any other examples? All right, let me ask you a different question. Have you ever thrown a misspelled word or a sentence? Crazy word that only used in your friends, whatever some slang word. Mason Sineworth at HRGVP. Give it a go and see what happens.

I'm giving you an answer right here. I'm ready. I'm giving you a scenario where you are throwing in a word made up of pieces that are unknown because they're not part of the language or the dictionary or whatnot, right? Or the list of prefixes. Everybody knows what it is, or everybody can understand it. But yeah. Would that be a challenge for the computer?

I saw a YouTube video where a guy talked about how people are becoming lazier with their typing and ChatGPT and other LLMs are actually perpetuating that, because he was doing experiments with it where he intentionally was only spelling words with half of their letters and it was still able to understand everything he was saying. So I think that works from the neural network side of things, but if we were going to use these older approaches, I'm not sure if they would be able to have the same success rate.

How close is it to the spelling of the other words? Given the context of the plantin or plantin, you can kind of figure it out just given the sentence of the type of plantin.
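The idea of mapping an unknown word to the known word with the closest spelling can be sketched with Python's standard difflib module. The vocabulary, the cutoff value, and the function name are assumptions for this example, and difflib's similarity ratio is only one possible measure of closeness. It also ignores sentence context, which the comment above relies on.

```python
import difflib

# Hypothetical known vocabulary; a real system would have many thousands of entries.
KNOWN_WORDS = ["plantain", "planting", "happy", "movie", "review"]


def closest_known_word(word, cutoff=0.6):
    """Return the most similar known word, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(word.lower(), KNOWN_WORDS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(closest_known_word("movi"))    # one letter short of a known word
print(closest_known_word("happpy"))  # one letter too many
print(closest_known_word("zzzz"))    # nothing in the vocabulary is close
```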

Long story short, the naive strategy: if it sees an unknown word, an unknown component... Cut off the ink, I'm sure. It would find the closest equivalent in its own dictionary and use it. And then you do it. Interesting. To actually help massage that situation, but... All right, so morphology. Pieces of words. Words become sentences, sentences are structured. Understanding if they are correctly structured. This will be.

The next step. And then extracting knowledge. I already told you that. This happens at multiple... Syntax. Let's stop at semantics for a moment. Four of them. Semantics means extracting meaning from text. What is this? For all birds, they're very big, they can fly. Okay, so we, um... I don't think it's a rule of logic. It's a first-order logic rule, right? Then can you possibly extract a rule like that from a really good text?
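For reference, a first-order logic rule of the kind described ("for all birds, they can fly") could be written as below. The slide itself is not in the transcript, so the predicate names Bird and CanFly are assumptions chosen for illustration.

```latex
\[
\forall x \, \bigl( \mathrm{Bird}(x) \rightarrow \mathrm{CanFly}(x) \bigr)
\]
```

Read aloud: for every x, if x is a bird, then x can fly.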

Can you represent the text that you wrote or spoke or whatever as something like this, or some formal structure in that way? This goes into the reasoning and everything, right? Perhaps. The idea is that you want your machine to learn from text and learn relationships and whatnot. Otherwise, it's just one word next to another word. Pragmatics. This is above semantics and it's related to intentions.


Introduction to NLP

Natural Language Processing (NLP)

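Definition

Natural Language Processing

is the intersection of linguistics, computer science, and artificial intelligence

studies the interaction between computers and human language

covers how to program computers to process and analyze large amounts of natural language data

Areas covered by NLP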


Computers vs Language and Speech

Conceptual distinctions

Text processing