Summary

Course Overview and Transition

CS425 focused on logical database concepts and query writing, while CS525 focuses on building DBMS software
The course emphasizes understanding how to efficiently store and manage large volumes of data
Key focus is on hardware (specifically disks) and how data is stored and manipulated at the machine level
Students need to understand how to choose appropriate storage structures for different data types (images, audio, text, PDFs)

Disk-Oriented DBMS Architecture

Primary assumption: Database primary storage is on non-volatile storage (HDD/SSD), not in memory
Database files are stored as fixed-size blocks, also called pages (the lecture calls them slotted pages), typically 4KB to 64KB
Assumption: memory is always smaller than the database, so the entire database cannot fit in RAM
Buffer Pool: Component that manages data movement between disk and memory
- Reading: transferring blocks from disk to memory
- Writing: transferring blocks from memory back to disk
- All query operations are performed in memory to avoid expensive disk access
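Here is a minimal Python sketch of the read/write idea, to make the buffer pool's role concrete. It assumes an LRU eviction policy and a dict standing in for the disk; neither comes from the lecture. `BufferPool`, `get_page`, `mark_dirty`, and `flush` are hypothetical names. A read happens only when a block is not already in memory, and a modified block is written back only when it is evicted or flushed.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer pool caching fixed-size blocks from a simulated disk.

    Assumptions (illustrative only): LRU eviction, and the 'disk' is a
    dict {block_id: bytes}.
    """

    def __init__(self, disk, capacity):
        self.disk = disk                # simulated non-volatile storage
        self.capacity = capacity        # max number of blocks held in memory
        self.frames = OrderedDict()     # block_id -> bytearray, oldest first
        self.dirty = set()              # blocks modified while in memory
        self.reads = 0                  # disk -> memory transfers
        self.writes = 0                 # memory -> disk transfers

    def get_page(self, block_id):
        if block_id in self.frames:     # hit: no disk access needed
            self.frames.move_to_end(block_id)
            return self.frames[block_id]
        if len(self.frames) >= self.capacity:
            self._evict()
        self.frames[block_id] = bytearray(self.disk[block_id])  # read
        self.reads += 1
        return self.frames[block_id]

    def mark_dirty(self, block_id):
        """Record that a block currently in memory has been modified."""
        self.dirty.add(block_id)

    def _evict(self):
        victim, data = self.frames.popitem(last=False)  # least recently used
        if victim in self.dirty:        # write back only if modified
            self.disk[victim] = bytes(data)
            self.dirty.discard(victim)
            self.writes += 1

    def flush(self):
        """Write every modified block back to disk."""
        for block_id in list(self.dirty):
            self.disk[block_id] = bytes(self.frames[block_id])
            self.writes += 1
        self.dirty.clear()


# Example: repeated accesses to the same 3 blocks cost only 3 disk reads.
disk = {i: bytes(4096) for i in range(8)}   # 8 blocks of 4KB each
pool = BufferPool(disk, capacity=3)
for block_id in [0, 1, 2, 0, 1, 2]:
    pool.get_page(block_id)
print(pool.reads, pool.writes)  # 3 0
```

The repeated accesses are served from memory. This is the point of doing all query work in memory rather than going back to the disk each time.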

Storage Hierarchy

Storage is organized in pyramid structure: smaller and faster at top, larger and slower at bottom
Hierarchy levels (top to bottom):
- CPU Registers
- CPU Caches
- DRAM (Memory)
- SSD
- HDD
- Network Storage
Trade-offs:
- Closer to CPU: faster, smaller capacity, more expensive
- Further from CPU: slower, larger capacity, cheaper

Volatile Storage (Memory)

Data is lost when power is turned off
Advantages:
- Byte-addressable (direct access to specific addresses)
- Random access - can jump directly to any location
- Very fast access speed
Disadvantages: Smaller capacity and more expensive
The course will refer to volatile storage simply as "memory" going forward

Non-Volatile Storage (Disk)

Data persists even when power is off
Characteristics:
- Block-addressable (must first locate block, then specific data within it)
- Sequential access is preferred
- Two-level addressing required: first the block ID, then the unit (offset) within that block
Advantages: Larger capacity and cheaper
Disadvantages: Slower access due to addressing mechanism
The course will refer to non-volatile storage as "disk"
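The two-level lookup can be sketched in Python. This sketch assumes a 4KB block size and a disk modeled as a list of byte strings. `locate` and `read_byte` are hypothetical helpers, not part of any DBMS API.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes (the lecture cites 4KB to 64KB)


def locate(byte_address, block_size=BLOCK_SIZE):
    """Split a logical byte address into (block_id, offset_within_block).

    Memory can jump straight to byte_address; a disk must first fetch
    block_id as a whole, then find the offset inside that block.
    """
    block_id, offset = divmod(byte_address, block_size)
    return block_id, offset


def read_byte(disk_blocks, byte_address, block_size=BLOCK_SIZE):
    """Read one byte the 'disk way': fetch the whole block, then index it."""
    block_id, offset = locate(byte_address, block_size)
    block = disk_blocks[block_id]   # level 1: find the block
    return block[offset]            # level 2: find the unit inside it
```

For example, `locate(10000)` gives block 2, offset 1808, because 2 × 4096 = 8192 and 10000 − 8192 = 1808.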

Persistent Memory (PMEM)

New research area attempting to combine advantages of both volatile and non-volatile storage
Aims for disk-level capacity with memory-level speed
Not covered in this course

Disk Components and Structure

Platter: Circular disk with magnetic surfaces on top and bottom
- Each platter has two surfaces for data storage
- Each surface has a read/write head
- Capacity example: if one surface holds 1MB, a 3-platter disk holds 6MB total (3 platters × 2 surfaces × 1MB)
Rotation speed: Typically 7,200 to 15,000 RPM
Disk heads: Move in and out while disk rotates to read/write data

Data Organization on Disk

Track: Concentric circles on disk surface (like lanes on a track field)
- Modern disks can have 50,000 to 100,000 tracks per platter
Sector: Subdivisions within a track
- Typical sector size: 512 bytes
- Inner sectors are physically smaller than outer sectors, but every sector holds the same number of bytes
- Sectors are separated by non-magnetic gaps
Block (Page): Logical unit composed of multiple sectors
Cylinder: Collection of same-numbered tracks across all platter surfaces; all heads sit over the same cylinder at once
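Cylinder, surface (head), and sector together give a physical address. A common way to flatten that address into one linear sector number is the classic cylinder-head-sector (CHS) to logical block addressing (LBA) mapping; it is not from the lecture. This sketch assumes every track holds the same number of sectors, whereas real drives use zoned recording. It uses 0-based indices throughout, while classic CHS numbers sectors from 1. The function names are hypothetical.

```python
def chs_to_lba(cylinder, head, sector, heads, sectors_per_track):
    """Map (cylinder, head, sector) to a linear sector number (0-based).

    Numbering runs through all sectors of one track, then the same-numbered
    track on the next surface (same cylinder), and only then moves on to
    the next cylinder.
    """
    if not (0 <= head < heads and 0 <= sector < sectors_per_track):
        raise ValueError("head or sector out of range")
    return (cylinder * heads + head) * sectors_per_track + sector


def lba_to_chs(lba, heads, sectors_per_track):
    """Inverse mapping: linear sector number -> (cylinder, head, sector)."""
    track, sector = divmod(lba, sectors_per_track)
    cylinder, head = divmod(track, heads)
    return cylinder, head, sector
```

Because a whole cylinder is numbered before the next one, consecutive sector numbers can be read by switching heads rather than moving the arm. Reading them therefore needs no seek.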

Disk Capacity Calculations

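Total capacity can be calculated from the disk specifications:
- Surfaces (heads) = 2 × number of platters
- Tracks per surface = number of cylinders
- Tracks per cylinder = number of surfaces
- Bytes per track = sectors per track × bytes per sector
- Capacity = surfaces × tracks per surface × bytes per track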
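The capacity calculation can be sketched as a small Python function. The parameter names are hypothetical, and every track is assumed to hold the same number of sectors. The geometry in the example is made up so that one surface holds exactly 1MB, reproducing the 3-platter, 6MB example.

```python
MB = 1024 * 1024


def disk_capacity(platters, tracks_per_surface, sectors_per_track, bytes_per_sector):
    """Total bytes on a disk, assuming a uniform number of sectors per track."""
    surfaces = 2 * platters                      # top and bottom of each platter
    bytes_per_track = sectors_per_track * bytes_per_sector
    bytes_per_surface = tracks_per_surface * bytes_per_track
    return surfaces * bytes_per_surface


# 256 tracks x 8 sectors x 512 bytes = 1MB per surface; 3 platters -> 6MB
print(disk_capacity(3, 256, 8, 512) // MB)  # 6
```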

Access Patterns

Sequential Access: Reading blocks in order (1, 2, 3, 4, 5)
Random Access: Reading blocks from arbitrary locations
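One way to see why sequential access is preferred is to count how often the head must be repositioned. This minimal sketch assumes that block n+1 sits physically right after block n, so reading the next block needs no seek. `count_jumps` is a hypothetical helper.

```python
def count_jumps(block_order):
    """Count head repositionings needed to read blocks in the given order.

    Only the first read and any read of a non-adjacent block require
    moving the head, under the contiguous-layout assumption above.
    """
    jumps = 0
    previous = None
    for block in block_order:
        if previous is None or block != previous + 1:
            jumps += 1
        previous = block
    return jumps


print(count_jumps([1, 2, 3, 4, 5]))  # 1 (sequential)
print(count_jumps([5, 1, 4, 2, 3]))  # 4 (random order)
```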

Disk Access Time Calculation

Total Access Time = Seek Time + Rotation Time + Transfer Time
Other delays (CPU, OS, controller) treated as negligible/zero

Seek Time

Time to move disk head to correct track
Can be zero if the head is already on the correct track
Typical range: 0-10 milliseconds for a good disk
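A simplified seek model helps show both properties: zero on the correct track, and up to about 10 ms otherwise. This sketch assumes seek time grows linearly with the number of tracks crossed. Real drives have acceleration and settle time, so actual seek curves are not linear. `seek_time_ms` and `max_seek_ms` are hypothetical names.

```python
def seek_time_ms(current_track, target_track, total_tracks, max_seek_ms=10.0):
    """Linear seek-time model (an illustrative assumption, not a real curve).

    Returns 0 when the head is already on the target track ("lucky day");
    otherwise scales with distance, reaching max_seek_ms for a full sweep.
    """
    distance = abs(target_track - current_track)
    if distance == 0:
        return 0.0
    return max_seek_ms * distance / (total_tracks - 1)


print(seek_time_ms(20, 20, 100_000))      # 0.0
print(seek_time_ms(0, 99_999, 100_000))   # 10.0
```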

Notes

Transcript

of working with complex queries, just like this one. But I would say there are a couple of things that are wrong that we need to work on. Oh, we finished this. Yeah, I think we're done. I just forgot. I just need to go straight to the second part, which is hardware.

Hardware. And now that we know that we're interested in the low level, which is the machine level, we want to look at the database system inside the machine, internally in the computer. The component we are going to be interested in is the disk, where the database file is going to be stored.

So we're basically going to look at database storage: the system that is going to store our database. We look at the database as just a file, just like a .sql file. As we learn how to store data in a file, as we understand how to lay out data on a disk, we are also interested in knowing how to manage the data. How is it going to be manipulated?

Think about all the data around us. 10, 20 years ago, we would not really pay attention to that data. Today, as a business person, I'm very interested in knowing what meaning there is in all that data. So, as we design the system, we need to know how we can store and manage huge volumes of data.

And you want to know in which structures we can store this data. You cannot start forcing an audio file into rows and columns. You cannot start fitting an image into a table. You may want to say, OK, I want to structure my images in an array format, or in a column format, or in a JSON format. You need to know all those options, because we want to choose the right storage structures for the right data.

low-cost vehicles, but I can achieve what a Ferrari achieves. Optimization. Because how many organizations can buy storage like what Amazon and Meta have? But we can achieve those speeds and those capacities when we do optimization. Other things we're going to talk about are cost and failures. So going forward, this is the architecture, the layout, that we are going to assume throughout the course.

It is called the disk-oriented architecture. And here, we are always going to assume that the primary storage is the hard disk. Meaning that if you have the IIT database, I don't know how big it is, let's say 3 gigabytes, then my hard disk must be maybe 500 gigabytes.

Someone is going to ask you: why can't we put the database into memory? Memory is always small. The memory of this computer is about 16... So, because we are thinking of big data, of large volumes of data, we want a disk that can store the biggest database you have ever thought of. Think about Facebook's database.

So, our database file is going to be stored in units we call blocks, or slotted pages. Those blocks are of fixed size, typically 4KB; they range from 4KB to 64KB.

So, in the disk-oriented architecture, the disk is at the bottom, and then we have the main memory. In the main memory, there is a component that manages the interaction between these two. Why do we have this architecture? We have this architecture because we want to get the block from the disk to the memory.

We work from memory because it's very expensive to work from the disk. Every time you access the disk, you have to open it, initialize it, then close it afterwards. That's a lot, so imagine doing that every time you pick a block.

And now our queries are complex. They require data coming from more than two blocks. So let's say you want to access eight blocks. That means you're going to open and close the disk eight times. That is a lot of time. So what do we do? We bring the blocks to memory, so that we do all the manipulation in memory, because it's going to be fast.

And we can write back. So we have a buffer pool. A buffer pool is a component that is in charge of the movement of blocks back and forth. If I am transferring a block from the disk to the memory, I am reading. If I am transferring a block from the memory back to the disk, I am writing.

the non-volatile and the volatile. So, before we talk about these categories, I want you to look at the hierarchy of storage. Our data storage is arranged in a pyramid-like structure, where the top of the pyramid is always small and the bottom is large. That is how we find this data.

If you look at the three top blocks, one, two, three, they are much smaller than these. So these are larger, and these are smaller in terms of capacity. The CPU registers and the CPU caches are small storage components that sit inside the CPU.

Whereas the SSDs and HDDs that we have in laptops and stand-alone computers are much larger compared to the registers right here. You cannot find a register that holds a terabyte or a petabyte. Registers hold just bytes, and caches are normally in the kilobytes to megabytes. Then, network storage could be an arrangement like this: you can have maybe 16 hard disk drives (HDDs) arranged in a network. If each one is 1TB, when you combine and connect them, you have 16TB in total. So your network can hold 16TB.

So, we have separated this hierarchy into volatile and non-volatile categories. Volatile means that if power goes off while you're working with these top three, you're going to lose the data there. With the bottom ones, you're not going to lose the data. The data here persists, whereas the data here is lost.

So volatile storage is very fragile. It is very sensitive to power; it needs constant power all the time. The moment you cut it off, the data is gone. So these two categories have their pros and cons, such as being byte addressable versus block addressable. Now, I told you that when you come to my place, my home, you first have to go to the block.

But if you are byte addressable, you point directly to my unit 4, as opposed to block addressable. Non-volatile storage first looks for the block, then looks for the unit inside the block. So this is going to be slower, because it has two levels of indirection, compared to this one, which has random access that is direct.

Another pro of these is that they are faster, and the drawback is that they are smaller and more expensive. For the non-volatile ones, the advantage is that they are larger and cheaper, but they are slower in terms of access. So let's go back here.

So basically, the ones that are close to the CPU are the volatile ones. Those that are close to the CPU are considered the fastest storage; however, they are smaller and expensive. The ones that are far away are much slower, but they are larger and cheaper.

And you can see that every new year, the cost of hard disks goes down, so they get cheaper per byte, per gigabyte. Now, someone can say, I want to add a third category, which is non-volatile memory. They want the best of both worlds: the world of volatile storage and the world of non-volatile storage.

But I'm going to say that this is a new area of research, pretty new, and we are not really going to cover it. We are going to concentrate on volatile and non-volatile storage. You can choose a storage medium based on many factors. One of the core factors is speed: I don't want to access the disk and have it take a long time.

I want a disk that spins fast, maybe 15,000 RPM, rotations per minute. Then there is cost. If you go online looking for a one-terabyte hard disk, you see Amazon, and maybe Best Buy. They are selling the same capacity, but on one of them you're saving a hundred dollars. So cost can also be a factor.

So volatile means that if you pull out the power from the machine, then the data is going to be lost.

So, you lose content as soon as power goes off.

The benefit of it, as I said, is that it is randomly accessed, which means fast and direct, and also byte addressable. The program can jump straight to your unit, to your door, to your byte, to your address. So, going forward, I'm not going to say volatile storage when I mean memory. I'm just going to say memory.

So we have memory at the top, and at the bottom we have non-volatile storage, which is also going to be referred to as a disk.

Non-volatile is exactly the opposite of volatile. When power goes off, it does not lose its content. Data is not going to be lost. Data is persistent, meaning it's going to be present even when the power goes out.

The drawback here is that it is block addressable. It cannot get to my unit until you first tell it the block number. Come to my block number; once I have the block number, if there are 10 units in the block, I look for the right one. That takes time. So this cannot jump straight to the unit. You have to go through a certain addressing.

So somebody can say, I'm going to have two levels of indexing: one index to point me to the right block ID, then another to point me to the right unit ID. That takes time, and that's why they are slower. The way you access this is sequential access, which is accessing blocks that are in order, in sequence.

So even if the first 10 blocks are useless, you're still going to access them, unlike the other one, which jumps directly to the block that you want. So, going forward, I'm going to call non-volatile storage a disk, as opposed to calling it non-volatile. So back to our architecture: I have a disk at the bottom and memory at the top.

So, this is the category that I said we're not really going to talk about. We call it PMEM. It wants to get both worlds: the advantages of non-volatile storage, of using a disk, and the advantages of using memory. For example, we know that disks give a bigger capacity, although they are slower.

The advantage of memory is that, although it is smaller in capacity, it is faster. So PMEM is for us to say, hey, I want to use both; one can fill in the loophole of the other. This is a new area of research that you can read about, but we're really not going to focus on it. I've already gone through this.

So every time we design a database system, we have to know that the size of the database is always going to exceed the size of the memory, because we cannot fit everything into memory. So all the information about the database, both the database system and the database itself, is going to be on the disk.

So, the disk can be a secondary device, one of those devices that you can always plug out of the system. And you can find two access methods: sequential and random. We already talked about sequential: you're going to follow a sequence.

If I say I want to attend to your queries in a sequence, you know, I have to start from the first one here. If there's someone here, go in that sequence like this until I finish the sequence. Maybe some people don't have a query. Maybe the first five people don't have a query, but still I have to go through it. Because it's sequential, I have to go through the sequence.

So we are reading data blocks in order. So if I want block five, I cannot go straight to block five. I have to go to one, two, three, four, five. Random is different. With random, I just go to this one, that one, that one. See? So you can read data from anywhere at any time.

We want to look at the components of a disk. Basically, we're going to break down the disk. One thing we know is that the disk is going to be partitioned into fixed sizes called blocks. We already know that. But now, a disk has a top and a bottom, and the top has a surface which is magnetic.

We're going to put data on that magnetic surface. The same applies at the bottom. That's why we have two disk heads: a head on the top and a head on the bottom. One reads data on the top, the other reads data on the bottom. And we call that a platter.

So each platter has two surfaces. If I say that the total amount of data that can fill the top surface is 1MB, what is the total capacity of this disk, which stacks three platters? I have to multiply three platters times two surfaces times 1MB: 1 plus 1 plus 1 plus 1 plus 1 plus 1, which is 6MB total capacity for this disk arrangement.

This disk can rotate at 7,200 RPM. Maybe that is slow; if you find one that does 15,000 RPM, maybe you want to go with that. So those are the speeds of rotation. Rotation matters because, as the disk rotates, the data passes under the head. That's how the head reads the data.

And, of course, we talked about the disk heads that can read or write. The beauty of these disk heads is that they move in and out as the disk rotates. So imagine the data engraved all over the surface in the form of tracks, and I think this picture brings it out very well.

So this is a typical top surface or bottom surface. You see those lines that go around? If you've done track and field: if you start from the 5th lane, the 5th track, you cannot finish in the 4th lane. That means somehow you crossed over somebody's track, and you're going to be penalized. The same thing applies to the data. For example, this is a track.

Now, if we partition that into different divisions, each division is a sector. So we can have many sectors in one track, many tracks on one surface, and two surfaces on one platter. When you get to know this, you will understand how you can compute the total capacity of a disk.

Now, these gaps that you see, which are like demarcations of where a sector starts and where it ends, are called gaps, and they are non-magnetic.

And this covers everything I've talked about. A typical platter can have between 50k and 100k tracks. So imagine, this is just one track, three, four, you know. A typical modern disk can have about 100k tracks.

And a typical sector can store 512 bytes. Now, sectors that are closer to the center are physically smaller. For example, this sector that you see here is smaller than this one, although each sector still stores the same number of bytes.

Sectors make up what we call a block. Remember, a block or a page is roughly 4KB to 64KB. With 64KB, you're looking at big data, at Adobe files. Now, a cylinder is when these disks are stacked and aligned in unison, and all the heads are reading the same track. If all the heads are reading or working on the same track, that makes up a cylinder.

So remember, tracks are numbered. If it's track number 20, then it's cylinder 20: cylinder 20 means the heads are on the 20th track.

So you are going to be given the properties of a disk. They'll tell you this disk has 10 cylinders, which also means 10 tracks per surface.

And if you are given those statistics, you are also given how many platters there are. You know each platter has two surfaces. The number of tracks per cylinder equals the number of heads, and the number of heads is two times the number of platters. So how many platters do you have?

If you have so many sectors and so many tracks, you can divide one by the other to find how many sectors one track has on average.

That is the standard size. If you have so many bytes and so many sectors, you can see how many bytes are in each sector. So basically, if you're given the storage characteristics, you can go ahead and get these kinds of summaries. These are the summaries that you need when you're buying a disk.

To access data, we need to be on the right track, so that we don't have to move back and forth, and in the right sector. If we are on the right track and at the right location of the data, which is the sector, then we can easily start reading and writing. So access time can be broken down into three major delays: seek time, rotation, and transfer.

Now, the other delays are going to be treated as negligible. They don't matter because they are associated with the CPU, the operating system, things that we are not looking at. We are looking at what is within the database system, okay? So all those delays for controllers and memory, we're going to treat as zero. We're only left with these three. So, let's see what seek time means.

So, the time to move my arm, to position this head onto the right track, is what we call the seek time. Am I moving this? Yes, I can move it to position my head on the right track; that is the seek time. Can seek time be zero? Yes, it can be zero, but only if the disk head is initially on the right track, and we can call that a lucky day.

If it's our lucky day, seek time is zero, so the total is zero plus the other delays. Can rotation be zero? Yes! Rotation can also be zero if we are already at the right sector.

Now we can start looking at where this time comes from. First, we move the disk head to the right track, and the time that takes is called the seek time. A good disk is approximated to have a seek time of 0 to 10 milliseconds.

Disk-Oriented DBMS Architecture

Most DBMSs assume a disk-oriented architecture.

That is:

Primary storage location = non-volatile disk

Example

Database Storage Structure

The database is stored as a file on disk.

File structure

file → blocks → tuples

Here,

block = page
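The file → blocks → tuples hierarchy can be modeled in a few lines of Python. This is a sketch only. It assumes each block holds a fixed number of tuples, and it identifies a tuple by a (block_id, slot) pair, which is one common convention. `Block`, `DbFile`, `insert`, and `get` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """A fixed-size block (page) holding at most `capacity` tuples (assumed limit)."""
    capacity: int
    tuples: list = field(default_factory=list)

    def has_room(self):
        return len(self.tuples) < self.capacity


@dataclass
class DbFile:
    """A database file viewed as an ordered list of blocks."""
    tuples_per_block: int
    blocks: list = field(default_factory=list)

    def insert(self, row):
        """Append a tuple, starting a new block when the last one is full.

        Returns the tuple's (block_id, slot) address.
        """
        if not self.blocks or not self.blocks[-1].has_room():
            self.blocks.append(Block(self.tuples_per_block))
        block = self.blocks[-1]
        block.tuples.append(row)
        return len(self.blocks) - 1, len(block.tuples) - 1

    def get(self, block_id, slot):
        """Two-step lookup: find the block first, then the tuple inside it."""
        return self.blocks[block_id].tuples[slot]


f = DbFile(tuples_per_block=2)
print([f.insert(("row", i)) for i in range(5)])
# [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)]
```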