Summary

Course Administration

Core Terminology Review

What is a DBMS?

The Megatron 3000 Example System

An imaginary database system designed from scratch to illustrate basic DBMS components:

Problems with Simple File-Based Systems Like Megatron

Despite basic functionality, Megatron is inadequate for modern applications with large data volumes and numerous concurrent users:

Course Objectives

This course will teach students how to build a modern database system that addresses all the limitations of simple file-based systems like Megatron

Next topic: Hardware and storage mechanisms

Action Items

Notes

Transcript

It should be able to do To meet all the functionalities that I did There are some people who are doing everything with Excel and they say, "Oh, my database is Excel." You know, yes, it may, you may be able to insert data, you may be able to create tables in Excel, you may be able to query. and delete the CRAN operation but that doesn't make it a database system. So, yeah. The rest are standard policies Academic dishonesty He's punishable here If you have any disability and you're a special needs person, then you can go through the Center for Disability We cannot accommodate you.

For example, If the exam is one hour, we may give you an hour and 30 minutes. If you don't want this kind of dysfunction in an exam, you can find a separate room for you to write an exam and many others depending on your Thank you. Namaste. Okay. So every week on Friday I'll be putting up As it folder Like this week one So like this Friday, end of this Friday, I'm going to put up week two. That is if you finish week one.

So it's gonna be just in that week. Week one, week two, just like that. Hmm. Okay. So In week one, This is the syllabus which we've gone through.

Our database software by the name of UseDatabaseSystem These notes are original for the authors at Stanford.

So if I'm making changes, there are very few changes I'm making. So In other words, these slides are not my own. But there are a few typos, if you see some typos, I'll try to manage to clean up some. If there's a typo and doesn't make sense, please you can email me. I can always Thank you. Explain. So...

So data, database, query, relations, schema, This sounds like 425. you know But just recalling some of these We know that anything, any information that is worthy storing in a computer as an informal electronic format It's what we call data. And a database. As a collection of interrelated now, we are very keen on this word, interrelated. Data, a collection of interrelated data.

If I am asking you to create a universal database And you say, oh yeah, I have student table, I have course, I have enrollment. And they have also diagnosis, disease diagnosis. I'm going to say, wait a minute. What is disease level going to do with a new muscle? That information is not related. That's Should fall under hospital. The subject matter is hospital. How is the course going to be related Good diagnosis.

A query is a, these are operations that we use to It'll extract data to and from the database. We want to exactly extract So Because we're talking about the biggest database, which is the S2O database, it's been around for over five decades.

since 1970. When EF code. I read it in the letter in the letter base. Up to now we still using the same query language, standard structured query language.

Our relation Because it's relational So we store data in a nation which is a table structure, two dimensional tables. Were we interested in that? Colons And it rolls. In the relational rows are records and then columns are fields or attributes.

And then the schema is basically giving us a snapshot of how our data looks like in a database. I'm gonna give you Yes. Description of the structure of the data. And If you have a collection of that, you can also call it metadata, which is always data aboutData. So data about the action database data. And we'll be looking at that. In form of a catalog, system catalog.

So, call it a general purpose system that is going to allow the crowd operation.

creation You know, updating Reading and division.

So in advanced, what do we do? Is it just going to implement a database system? Oh, just a minute, I'll let this. So we are implementing a database system, not a database.

Implementing a database that was a project in 4.25 Where you did it, that is based onThe airlines or Library database So here we are Implementing a relevant system What do you want from a database system? What do we want it to do? We just wanted to store data persistently. Or we wanted to ask any type of questions. And also updated data.

and you want to see the result. This was 425.

That's 425. Five to the five. We want to build a database system from scratch, just like this imaginary. Megatron.

Let's say Just imagine that These are guys From a lab, Megatron lab, they want to They're trying to design a nervous system from scratch. And they get to use the latest relational technology Remember this technology has been around for 50 years. So they say, "Okay, we want to be a relational database system. We still want to make use of the existing relational technology." It's gonna be Unix best, makes a lot of sense.

So they tried to imagine how it should be look like. What do we want it to do? First of all, you're going to say it needs to have a file system that stores The records.

a collection of interrelated data. We're using this American Standard Code for information interchange, which is basically what computer standards required for you to exchange data We want to use separate file So every file is going to be isolated. Stay gay. We're going to have isolation file. If it's too late, All information about students in one file. If it is department, all information about the department in one file.

This application is going to pass. A file. in a given Directory, if we want to read or if you want to update. For example, if I want to read what is in student I have to go through this folder this directory and then get into that and get the file and let's say the file looks like this Remember we said it uses relational technology, so it uses rows and columns. So you can see this is me. This is the ID.

I need to be separated with the character. would you go with? Hashtag or anything. We want to call it.

for the directory. And make sure you pass that. Mm. And, um... Yes, so Now that we know how a fire is,Doesn't one of the things you want to first make sure that your system knows how is data gonna be stored? It's going to be stored in that fashion.

In a good database system must have what we call a schema. Adelavis Kima. which tells us, hey, information about the table for example this table is 2 dates But I need to have another section of the database.

Tell us, hey, the table is too dense. It has the columns name And that name is going to be a string. It's going to have an ID, student ID, that ID is an int. department which is going to be a string Okay, we also have department. Department is gonna be We're going to have C, which is the common C string. And a is going to be an integer. So if you have that region Where?

When you go there, you actually get more information about the table. You call that the database key.

All the other ones are attributes and they are types These are the tabs. attributes can be columns and then So you can see that, hey, something that is organized It's not organized for me, the user. It's organized for the system. The system needs to understand Okay. How to find, because you want to ask it, can you pick everything from the student table? It is the first time, okay, do I have student table?

So it's going to be going line by line. Next today, I use your GPX Grandin 4 3.5 GS Come. Next. GPA is less than 4,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, So I have defined my storage, how fire is And if I'm going to have a thousand files, I know where to get it from.

Now another person will say good Arabic system should be able to return. Information, we should be able to prompt information and give it back. And now... We'll call for Like... An interface. So this guy says "Hey, our interface is going to always first welcome you" He's going to welcome you to Megatron interface And, uh...

These three dots is where we are going to wrap our legal SQL query So they are not saying, hey, we are not inventing, we are not replacing SQL.

We still will accept your SQL, but it's going to be embedded in our code. So... You come here and type the legal s2r query and follow all the prompts at the end of the query And this should be all in between this. Must be the three dots.

They go and say, "Hey, we can actually not want to display The result we may want to send it straight to the printer to print. Then just add on the query, just add the pipe. And then the name of the printer. and then end it with a hashtag So the results would be sent to the printer instead of the result being here like this This is the result. Hey, you want this to be sent to the printer for printing?

I'll let them say hey. We will not want to sell it to the printer, we want to create a new file but call it a custom name. So the same thing. From the student table, can you filter out the IDs? There are less than 100, and that's what I want.

And the short one. Okay. So... My database can store a file, my database can tell me Can this tell me more about my file which is a description of my schema? My database can do Um. queries But it can even do more complex things.

A query that has a condition just like this one here. What's the condition? I'm interested in IDs less than 100.

So, what? What does the database system do in such a query? So, first thing is going to get the schema. You can see how important the schema is. And if I can ask, why do you think it goes to schema? Why do you think it's called the schema? The first thing is read the schema to get all the attributes of R. So If I want to go somewhere, if I want to know the attributes, I have to go to the schema and see, yes, it's been defined in the schema, in a catalog, in a dictionary.

You know, Then check. Check if the condition Um. If the condition is Is a valent For example, someone can say where name is greater than 200. Name is Brethren 200. How is the screen going to be Brethren 200? No. So it will check validity. is Is this condition really? Uh, Ludica, you're right.

Then, we want to display attributes of R You didn't hear this? And then we're gonna be reading line by line. So after displaying the attributes of R, These are the goods.

Then we will go line by line and see which lines or which rows really meet the condition and we're only going to return those roles if it's true My first rule Um...

So when you pass that, It's going to go and say, "Okay, let me first go line by line, but let me only work with one file first." It's going to... bring our file into the memory. and it's going to say okay The first line, which I'm going to call R, Small r, r1. The first line, R1 And then the other line is, the other one is S. So if R, if this R has 3, Rose And then this S has 10 rows. So the first row in R is going to be Evaluating with all the other ones.

Then we're going to go to Ara-Ara-One and S-Two. R1 is 3. R1 is 4, R1 is 5, R1 is 6. That's what they call the Cartesian product. So it's going to get 3 times 10. It's going to have 30 interactions. Every single row in one tablePause.

Ten times. Okay, he's going to interact with every single row in the other table. That's why we do not have 30 interactions. If we have three, our three records in R and then 10 records in S.

At this point we can be happy. I said, wow. So far they are Megatron. Can even pick data from two tables.

If you just stop here and you say I am creating relevant manifesto system We can see now That is not enough. Okay. It's not It is inadequate For application involving Especially the big data today. The large amount of data and also Okay, numerous users. that belong to the database of the same time. And This is very common today in today's era.

some of the functionalities that we find in database systems but those functionalities does not make Megatron a fully fledged database system And here the list of problems we can start pointing out with Megatron.

It's not flexible.

The CSD database, that is the CS department. Everything is going to be redit and that is computationally so... Bad. So you find that, first of all, working with those key characters' story is expensive. And you are writing the entire characters up, right?

When you're deleting a record, So The structure, the layout has that problem when you're reading it and when you're There's no way you can talk about direct this and then you don't talk about indexing.

So with that structure, You don't see how one key is linked to another one in a foreign key, so it's quick searching. or the IDs, how are the IDs? When you retrieve the ID, how is the ID going to be able to retrieve all the entire row?

So the search is going to be really impossible or very expensive. You cannot find a tuple with a given key. So pretty much this is like a flat file. You know, you're always going to read the entire relation, which is again going to require you To run The Dino Theory The other one is the brute force.

When you look at this query This query has a It's clicking from two tables. We're returning. All the attributes After combining attributes of R and S, So, if R has A, B, C and S has A, B, C, then D E F So I'm interested in ABCDEF.

But we saw Megatron can work with such. But let's see. How? Oh. How beta this can be. If we had used query indices You know This selection here and I think I'm I'm thinking A is an ID. If A is an ID Then picking out this will be so easy. A is an idea or even B and A, B and I. My indexes. You know... We can pick up this, we can go through this so quickly. C, B, B and S is greater than I can, you know, we don't have to go through all that file then.

Yeah. And we can easily basically Identify those This Equality. or what we call the differential integrity, where A references We are out of references. S. Boya A. So you can see that, hey, Then the margin. The buffer, the buffer, non-buffer manager And if you're looking at Buffer Manager You basically want an efficient way of managing the main buffer or the main memory.

That's very expensive. And that's what these people do.

The data temporarily hold the block. You know, we want to temporarily hold the block into the membrane such that we don't usually do the Um... Go down, pick the data. You know, it was kind of expensive.

No concurrency control. You can see that, hey, now, Some of these things are must, especially with new modern systems. And before my agent must be there, The layout must be so flexible.

So this kind of system where you have to access the directory to access the student file. So if we have a million people accessing one file, imagine.

Okay. If I change the GPA here, the other person is still looking at their old GPA. And the other person is looking at me, and he goes, You know It's going to bring in inconsistency of data, which you call unpredictable results. So concurrency control must be Incorporated if you're talking to if you're talking about a fully fledged therapy system, but not this megadrop.

Reliability.

You know, No one is immune to this kind of... As a dentist, You made the White House system aboutAre you presenting mute? So this kind of system can remove data so easily.

Security, I know so many people, so many that are based on my military.

So A security feature when you look at Postgres or MySQL The first thing that has to be created is a password. But even the type of security we're looking at is even way beyond just the location.

So many things are still wrong with this. Today, world people want to plug in a lot of APIs into database. Like, how can it be better?

Estrella Light but at some point they want them to be you want a system that is easily applicable for What's the time?

Do we have five minutes? If you don't have a GUE graphical user interface That system is, you know, today's world, people want both. The console, they want the CLI and then you get the GUI. Finally.

So what is this course really telling us? We want to introduce students to a better way of producing a more than database system. A system that goes beyond what Megatron does. And you can see that That means nothing. Just skip that, but I'm going to provide I want to provide some content to refresh you about this 425 content So we will go straight to the hardware. and see where our file is going to be stored.

So I'm not taking attendance today, but starting with the next class, Please make sure that you are ready with your QR code. I'm going to take a tendency using the QR code. Make sure you can scan the QR code. And uh,