Summary
Multiple Buffer Pools
Prefetching
Scan Sharing
Buffer Pool Bypass
FIFO (First In, First Out)
LRU (Least Recently Used)
Clock Algorithm
Problems with Basic Policies
LRU-K
Localization
Priority Hints
Notes
Transcript
So the database system is responsible for managing its memory and moving data between disk and memory. Every manipulation we've seen, insertion, modification, all of it happens in memory, right here in memory. Say page 2 has been read into memory and then we added something to it. You have to push it back down to replace the original copy on disk, and then the in-memory copy can be taken out so that we have extra space. That is the whole dynamic. So I don't want to hear anybody say, "Hey, we're going to modify the data down there on disk." No, no, no. You have to read the page up first, and then you modify it. But the process of reading the page introduces a delay, and that is what we are going to look at.
There's going to be a time delay: you have to wait for the block to be picked from the disk and brought up. We calculated that time as the access time; you guys remember access time. (That reminds me of a folder of examples on calculating access delay that I forgot to upload. I'll upload it; I'm just noting it here as a reminder.) So somebody can say, "Oh, is it a long time that you're going to sit and wait?" It appears fast, but it's actually a long time, because the minimum acceptable latency is about one millisecond. Any system whose delay exceeds that is not meeting the minimum acceptable, so the delay is too much and we work toward minimizing it. That is the whole idea behind the disk-oriented architecture: so many pages moving up and down.
Because our problem is the delay when blocks move up and down, we say: if we know where to write and when to write, meaning we have spatial control and temporal control, then we can minimize that delay.
The real component that does all this is what we call the buffer pool manager. This buffer pool manager is going to be your second assignment, where you're going to show how it efficiently manages memory, how it writes pages back to disk, and which pages should be read into memory.
Okay. The pages that we read first have to be put in a cache before they are available for use, so look at the buffer pool as a portion of memory, just like you see right over here. Some will say it sits between the memory and the disk, but we say the buffer pool is a section in memory, a section that is basically going to help me manage these pages, in memory. We're going to have fixed-size components in the buffer pool, but this time we're going to call them frames. So: frame one, frame two, frame three, frame four. They're just slots where we're going to insert our pages, and they're all the same size, 4 KB, 4 KB, 4 KB, just as you see there. Just know that they are basically the same size as the pages on disk.
Now, when the database requests a page and the page is already in the buffer pool, we're going to say that's our lucky day. What do I mean by lucky day? I mean we don't need to go down to the disk and spend all that time opening the disk, looking for the block, and bringing a copy of the block up to memory for use. In case we do not have it, that is exactly what we do: go down to the disk and pick a copy.
Does the buffer pool have to keep the same order as the disk? No, in the buffer pool the order doesn't matter; the buffer pool has its own way of managing order.
And we introduce that in the form of a page table. A page table is going to help us manage this: each page gets a pointer, and I think I show it right over here. You can see that we need some kind of organization, some way of quickly identifying which page is on which frame. So we use a hash table, which we call the page table. And you know that hashing is perfectly suited to scenarios where we do not need any form of ordering; when order doesn't matter, hashing is the best thing. The page table is very important because you need to track each page that you have collected. In our situation we might hold page 2, and I can also have page 9, page 7; we still need some form of direction, and the hash table is going to give us that. What does it hold? A key and a value: the key, the page id, points to the physical location, which is going to be a frame in the buffer pool. And since ordering doesn't matter, we just need this extra indirection layer, which points to each and every single page that comes into the buffer pool; we identify every page in our indirection layer. Next: the dirty flag, or dirty page.
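As a minimal sketch (in Python, with illustrative names like `place_page` and `lookup` that are not from any particular system), the page-table indirection is just a hash map from page id to frame slot:

```python
# Hypothetical page-table sketch: a dict gives O(1) page_id -> frame lookups,
# mirroring the hash-table indirection layer described above.
FRAME_COUNT = 4

frames = [None] * FRAME_COUNT          # frame index -> page_id (None = free)
page_table = {}                        # page_id -> frame index

def place_page(page_id):
    """Put a page into the first free frame and record it in the page table."""
    for i, occupant in enumerate(frames):
        if occupant is None:
            frames[i] = page_id
            page_table[page_id] = i
            return i
    raise RuntimeError("buffer pool full: a replacement policy must pick a victim")

def lookup(page_id):
    """Return the frame holding page_id, or None on a miss (not cached)."""
    return page_table.get(page_id)

place_page(2)                          # page 2 lands in frame 0
place_page(9)                          # page 9 lands in frame 1
print(lookup(2), lookup(9), lookup(7)) # hits for 2 and 9, miss for page 7
```

Note how the order pages arrive in says nothing about where they sit; the hash map is the only source of truth, exactly the "order doesn't matter" point above.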
Now, you already know what a dirty page is: a page that was read into memory, had some modification done to it, and has yet to be written back. That is a dirty page. If page 2 in memory has some addition but page 2 on the disk doesn't have that addition, that makes it a dirty page, meaning we will need to write it back. So we have a flag, which is basically a bit, a zero or a one, and this identifies whether a page in memory has been modified or not. If my flag shows zero, the page has not been modified. If the flag is one, it has been modified, hence it needs to be written back. These flags help us track the many operations that come and change the data on a page. So don't forget: a dirty page always needs to be written back to the disk. And then the pin. The pin is going to track the number of operations, the multiple threads that you are running, that are currently accessing the page. They could be writing, they could be reading, and every time a thread accesses the page, that pin, or counter, is incremented. If the counter is greater than zero, then we should not allow that page to be evicted. Just as we cannot evict dirty pages without flushing them, we cannot evict pinned pages.
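The dirty flag and pin counter can be sketched like this (an illustrative `Frame` class, not any real system's API):

```python
# Sketch of per-frame metadata: a dirty bit set on modification and a pin
# counter incremented by each operation currently using the page.
class Frame:
    def __init__(self, page_id):
        self.page_id = page_id
        self.dirty = False      # the 1/0 flag: modified in memory?
        self.pin_count = 0      # how many active operations hold the page

    def can_evict(self):
        # A pinned page is in use; evicting it would leave dangling accesses.
        return self.pin_count == 0

    def evict(self, write_back):
        assert self.can_evict()
        if self.dirty:          # dirty pages must be flushed to disk first
            write_back(self.page_id)
            self.dirty = False

f = Frame(2)
f.pin_count += 1            # a reader pins the page
f.dirty = True              # a writer modifies it
print(f.can_evict())        # False: still pinned
f.pin_count -= 1            # the reader finishes and unpins
flushed = []
f.evict(flushed.append)     # eviction allowed now, but page 2 is flushed first
print(flushed)              # [2]
```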
And you don't want to have something in memory that doesn't have a pointer; we don't want any dangling pointers. So what do we do here? We introduce a latch mechanism. This is basically like booking a slot before the page arrives, such that when the page is there you just link it, just like you see right over here. Latches control the competition, the contention for space. You can see that page 2 is here but doesn't have a link yet, so we come here and have the page 2 id point there. So it's exactly: fetch the page, put it there, then update the mapping. The reason we use latches is to make sure that there's no point in time where we pick a page and fail to create its indirection entry and identify it in our page table.
So we have latches, like the latch we just looked at, and we contrast them with locks. Locks live at the higher level, the application level, and latches are at the low level. At the higher level, I mean you are working with tables, indexes, basically database content, not the internals. If you're working with the internals, you're working with critical sections of the internal structure of the database. That means at the much lower level you are talking about things like the page table, some region in memory; that is what a latch protects. So look at latches as features that control or protect the internal components of the database. Can we roll back? Yes, we can roll back any transaction at the high level, but latches can never be rolled back.
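A rough illustration of a latch protecting the page-table critical section, using a plain mutex (the names here are assumptions for this sketch, not the assignment's API):

```python
# Illustrative latch: hold a mutex across "place page + update mapping" so
# no thread ever observes a fetched page without its page-table entry.
import threading

page_table = {}
page_table_latch = threading.Lock()

def fetch_page(page_id, frame_idx):
    with page_table_latch:          # acquire the latch (never "rolled back")
        # ... copy the page into frame frame_idx would happen here ...
        page_table[page_id] = frame_idx   # then publish the mapping

threads = [threading.Thread(target=fetch_page, args=(p, p % 4))
           for p in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(len(page_table))   # all 8 mappings installed, none lost to a race
```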
"Tell me about this page, tell me how much space is left": this metadata layer is tracking pages. The thing is, if this layer of indirection does it in memory, we call it a page table. If this layer of indirection does it on disk, we call it a page directory. The two do the same thing, but because they sit at different layers, we call them differently.
Okay, so do not confuse the page table with the page directory. Now, because the system has to know how to allocate memory, you can categorize allocation policies as global or local. If you have global policies of allocation, you basically make decisions by looking at the entire workload. You're going to say: because of this entire workload, this is what we need to do. All the transactions are considered for you to make a decision. Say I want to decide, from page 1 to page 10, which page comes first; that decision comes from looking at all the pages and saying, page 5 matters more. If you don't do global policies, you do local policies, where decisions are made based on specific transactions. It could be: this transaction has higher priority, or maybe this transaction ran faster. You're not looking at the entire workload, and somebody can say this is going to be unfair. Most systems have a combination: maybe activate local policies when you are prioritizing, when you're basically singling out some transaction. Both of these policies can be used by a database system to manage the allocation of memory.
Multiple buffer pools: that one is obvious, having more than one buffer pool. Scan sharing is something new that I can tell you about, and then buffer pool bypass, and we have a slide for that. What I can say about multiple buffer pools: you can have several buffer pools, each handling different things. For example, I can have a buffer pool per database; maybe I'm running multiple databases. Or I can have a buffer pool per page type. Because you have many, you can have tailored policies in each of those buffer pools. The beauty of having all these buffer pools is that it reduces the latch contention, the competition of "hey, I have to put a latch on this because there's a page coming in." It's kind of a competition. And the big database systems have this functionality: they're not operating on one buffer pool, they're operating on more than one. If you have multiple buffer pools, you map the desired page to a pool by looking at the object id, a unique identifier, and then using hashing. That means we don't care about ordering; we just say: I have an id, hash that id, and return exactly where this record should go. An object id can be a composite id (remember we talked about composite ids), containing something like a page id and a slot number.
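A sketch of routing pages to multiple buffer pools by hashing the id (here a simple modulo stands in for the hash function; all names are illustrative):

```python
# Route each page to one of several buffer pools by hashing its object/page
# id, so each pool has its own latch and contention drops.
N_POOLS = 4
pools = [{} for _ in range(N_POOLS)]   # each pool: page_id -> frame data

def pool_for(object_id):
    # Any stable hash works; modulo spreads ids evenly across the pools.
    return object_id % N_POOLS

for page_id in range(10):
    pools[pool_for(page_id)][page_id] = f"frame for page {page_id}"

print([sorted(p) for p in pools])
# [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

The same lookup always lands on the same pool, so no cross-pool coordination is needed to find a page.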
Prefetching is straightforward. We want to look at how prefetching works with two queries: one is a sequential scan and the other is an index scan. You can see that the database system, to be specific the buffer pool, has fetched page 0, a copy of page 0. We don't really remove the block from the hard disk; no, we take a copy, and a copy of it comes here. This query is sequential. Remember, in a sequence I'm going to follow the order; I'm not going to skip anything. So if I'm doing a sequential scan, someone can guess what the next page to be read is: it's going to be page 1, then page 2. The buffer pool can already know: hey, you've asked me for page 0, the next in sequence is going to be page 1, so why can't I fetch it before you ask for it? That's what we call prefetching. And in an actual sense, the buffer pool says, "Hey, I have free slots; why don't I bring two at the same time?" And this is what you actually see: the query has not asked for page 2 and page 3, but I've gone ahead to prefetch page 2 and page 3 and made them available in memory. So when the time comes for the query to ask for page 2, like you see here, the buffer pool says, "That's my lucky day," because it already fetched it beforehand. And the actual query goes on to ask for the next one.
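The sequential prefetch idea can be sketched like this (hypothetical helper names; real systems prefetch asynchronously, which this toy version skips):

```python
# Toy sequential prefetch: on a miss for page N, also fetch page N+1 before
# the query asks for it, so the next request is a "lucky day" cache hit.
fetch_log = []                          # every synchronous-or-prefetch disk read

def read_from_disk(page_id):
    fetch_log.append(page_id)
    return f"contents of page {page_id}"

cache = {}

def get_page(page_id, last_page=9):
    if page_id in cache:
        return cache[page_id]           # hit: the prefetch already paid off
    cache[page_id] = read_from_disk(page_id)
    nxt = page_id + 1
    if nxt <= last_page and nxt not in cache:
        cache[nxt] = read_from_disk(nxt)    # prefetch the next page in sequence
    return cache[page_id]

get_page(0)       # fetches page 0 and prefetches page 1
get_page(1)       # pure hit: no disk access happens for this call
print(fetch_log)  # [0, 1]
```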
Now we're going to see a drawback there. The drawback: a page like page 4 was never requested, but it was made available anyway. We call that sequential flooding. You are flooding the buffer pool, or we also call it pollution: you're going to pollute a limited resource with pages that we do not need. Okay. So let's see what happens if we are actually doing an index scan. On an index scan, I am going to follow the index.
This is the case where you have told me, "Hey, my table has an index on, maybe, student id." That is different from someone doing a sequential scan. Someone doing a sequential scan is bringing you everything; an index scan says, "Wait, I'm just going to be jumping." Okay. So if you look at this query, this query uses the index. Here are page 0, page 1, page 2, page 3, page 4, page 5, page 6. You can see that my query that has the index comes and asks, "Hey, can I see page 1?" Yes, page 1 is made available, and from page 1 we know which values we are looking for. So my index is basically going to jump here and here. Do we need page 4? No. So what happens? The index skips page 4. That page was never requested, so it's never going to show up in our buffer pool.
Now, with scan sharing, is there something we are saving? Yes. Imagine a table with 30,000 records: every time you issue a query, it's going to scan the entire table. If there is already a scan running in the background, just attach your query to it instead of starting over.
So scan sharing can be very helpful. The other technique that can help us optimize our buffer pool is bypassing the buffer pool itself. This is very common if we do sequential scanning, especially if we're scanning a large, contiguous sequence of pages. If you're reading something in sequence, the next pages are 4, 5, 6, 7, you know that you follow a sequence, so if you know that, hey, these are all needed, you just move them straight into memory without going through the buffer pool. That alone can help us get rid of most of the overhead.
And this is going to be so common when you have a complex query, like a query that does a sorting, or a complex join such as a merge join. Those joins have intermediate results, and some intermediates may be big, covering a number of blocks, so it's better for you to bypass the buffer pool for them. And again, this technique is also not new; it's been applied by most operating systems. The bypass can also happen at the OS level, in the communication between the OS and the database system, and we say that these two work hand in hand. Although we don't focus so much on the operating system, we want to see what happens when the database communicates with the OS: for example, you can turn on a flag, a 1 or a 0, that says, hey, let's bypass. Setting that flag means we talk directly to the disk without going through the OS cache. In the OS cache you would otherwise find things like redundant copies of pages kept by both the database system and the OS, or OS eviction policies, and some of these you may say, hey, you don't need.
So, as you see here, our buffer pool has only 1, 2, 3 slots, okay, but we want to process 5 pages. So what is going to happen? After processing one, we drop it. But what are we going to look at to decide the eviction? We look at the replacement policies, and we start with a simple one which everybody knows, very easy to implement and very straightforward, which is FIFO: first in, first out. This is the clearest, simplest eviction policy that you can apply. This algorithm looks at arrival order: it evicts the one that has stayed in the buffer the longest, okay? And you have seen it just in our last illustration here: when we need room, the one that came in first is the one that gets replaced. If you implement it as a circular buffer, you mostly use modular arithmetic: you take the counter modulo the number of buffers, and that is basically going to give you the next slot to replace.
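A toy FIFO eviction policy, assuming a small three-slot pool as in the illustration (the names are made up for this sketch):

```python
# FIFO eviction sketch: evict the page that entered the pool earliest,
# regardless of how recently or how often it was re-accessed.
from collections import deque

CAPACITY = 3
pool = set()
arrival = deque()     # the oldest arrival sits at the left end

def access_fifo(page_id):
    """Access a page; return the evicted victim's id, or None."""
    if page_id in pool:
        return None                     # hit: FIFO ignores re-accesses
    victim = None
    if len(pool) == CAPACITY:
        victim = arrival.popleft()      # first in, first out
        pool.remove(victim)
    pool.add(page_id)
    arrival.append(page_id)
    return victim

evictions = [access_fifo(p) for p in [0, 1, 2, 0, 3]]
print(evictions)   # [None, None, None, None, 0]
```

Note the weakness: page 0 was re-accessed right before page 3 arrived, yet FIFO still evicts it because it arrived first.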
If you're looking at least recently used, LRU, this one evicts the page that has not been accessed for the longest time. We use timestamps: each page records when it was last accessed. If I access you now, and I access you again, again, again, you keep getting new timestamps. So we look for the oldest timestamp. It's commonly implemented as a list algorithm: you have your blocks or pages, page 1, 2, 3, 4, 5, and the one that is accessed moves to the back, because it gets the latest timestamp. So imagine: the one that is not really accessed keeps the oldest timestamp. If I access 3 first, 3 is given a new timestamp; if I access 1, 1 is given a new timestamp; if I access 5, 5 is given one. You're going to find that whatever is accessed is pushed to the back, so it's the last to be deleted. And the one that is not accessed becomes the least recently accessed page, so it sits at the front. If this is the head and this is the tail, whatever is at the head is just dropped.
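A compact LRU sketch using Python's `OrderedDict` as the list, with move-to-tail standing in for "assign a fresh timestamp":

```python
# LRU sketch: the least recently used page sits at the head of the ordered
# dict; every access moves a page to the tail; eviction pops the head.
from collections import OrderedDict

CAPACITY = 3
lru = OrderedDict()   # page_id -> page data, oldest access first

def access_lru(page_id):
    """Access a page; return the evicted victim's id, or None."""
    victim = None
    if page_id in lru:
        lru.move_to_end(page_id)        # fresh "timestamp": move to the tail
    else:
        if len(lru) == CAPACITY:
            victim, _ = lru.popitem(last=False)   # drop the head (the LRU page)
        lru[page_id] = f"page {page_id}"
    return victim

for p in [1, 2, 3, 1]:       # page 1 is reused, so it becomes the newest
    access_lru(p)
print(access_lru(4))         # evicts 2, the least recently used page
```

Unlike FIFO above, the re-accessed page 1 survives here; page 2, untouched the longest, is the victim.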
The clock is a technique that approximates LRU. Meaning, hey, it's slightly less precise but much faster. One thing you need to know here: it does not track timestamps. It uses a different feature, which is a reference bit, and we talked about a reference bit. You either set it to 1, meaning the page has been accessed, or it's 0, meaning the page has not been accessed recently. You're basically monitoring the bits; the bit is in the page header. As the hand goes around the clock, we call this the second-chance algorithm, or the clock algorithm. So, 1 means accessed. A page showing 1 can never be dropped on this pass, okay? You can never drop it. Why? Because it's active; 1 is the used bit. You only drop a page once its bit has been brought down to zero. So what the hand does: it checks the bit, and if it's a used bit, then if the page needs to be written back, let it be written, and change the 1 to 0, meaning, hey, the second chance has been spent. When the hand comes around to the same block or page and it's still showing a used bit of 1, it was accessed again in between: clear it and move on. Do the same all around. Now, a block whose bit is 0 has nothing to change; it was never used in between, so it is ready to be dropped. That is how you use the reference bit: if the reference bit is 1, do not drop that page, you can't evict it, even if its turn has come; we only evict zeros. That's how the clock algorithm works. Now, the clock has some issues.
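The clock sweep can be sketched like this (three frames with preset reference bits, purely illustrative):

```python
# Clock (second-chance) sketch: the hand sweeps the frames, clearing each
# reference bit of 1 it meets, and evicts the first frame it finds at 0.
frames = ["A", "B", "C"]     # the page held in each frame
ref_bits = [1, 1, 0]         # 1 = recently accessed, 0 = candidate victim
hand = 0

def clock_evict():
    """Advance the hand until a frame with reference bit 0 is found."""
    global hand
    while True:
        if ref_bits[hand] == 1:
            ref_bits[hand] = 0           # second chance: clear and move on
            hand = (hand + 1) % len(frames)
        else:
            victim = hand                # bit is 0: evict this frame
            hand = (hand + 1) % len(frames)
            return victim

print(clock_evict())   # 2: frames A and B had bit 1, so C is the victim
print(ref_bits)        # [0, 0, 0]: A and B spent their second chances
```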
So the clock and LRU are both likely to cause sequential flooding, and I've already told you what sequential flooding is about. If you pick 80% of the data and populate the memory with it, that means you're only going to waste the memory. Why did you bring this one? Why did you bring this? This is where we said 80% is useless data. And when block 4 shows up, I don't need block 4, but I'm being forced by my algorithm to keep filling; that is going to lead to flooding. You don't want to flood 80% of your memory with useless data, but if you're doing a sequential scan, you're most likely going to have this problem: sequential flooding. I've given you an example. Okay, what is this likely to do? Pollution. You're going to find that, hey, of the three blocks you are managing, you are wasting two because you've populated them with blocks that are not needed. And why does a sequential scan cause this? A sequential scan reads everything in the table. If I'm running a sequential scan, I am going to read all the pages, all the blocks in that table, and that is why we end up holding data that we don't need. And with these methods, we're not very sure whether the most recently used pages are actually the ones we will need again.
If I want to read page 5 sequentially, everything before it needs to be brought into memory first; but if I'm using an index, the index will take me straight there. Okay, so there's an improvement of LRU, which we're going to call LRU with a K reference, written LRU-K, and in your assignment you can go ahead and implement these better replacement solutions. So this one, the K reference, tracks the last K times a page was accessed. If this is page 0, I want to see when it was last accessed; I take, say, the last 5 accesses and get the average interval. When I have the average, I can now predict the next access time. I get an indication that, oh, okay, it looks like this page is always accessed at a certain frequency. That's what the K is for. Localization deals with a specific query or transaction. When you see TXN, we mean transaction. Here the page is evicted based on the transaction or the query itself, so we can say the eviction is query-based.
With priority hints, you're going to find that some transactions indicate to the buffer pool that, hey, this block is more important than that one; the other one is less important, so we can evict it. So priority hints, somebody can say, can be unfair.
The K references work as I explained: get the historical 3, 4, 5 references to a page, study them, get a prediction, and then you can predict the next reference. If you have a history of the last five accesses, then you can easily estimate the next one.
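An LRU-K sketch with K = 2: track the last K access times per page and evict the page whose K-th most recent access is oldest, treating pages with fewer than K accesses as immediate victims (a common convention; the names here are illustrative):

```python
# LRU-K sketch (K = 2): keep the last K access times per page; the victim is
# the page with the oldest K-th most recent access (missing = oldest of all).
from collections import defaultdict, deque

K = 2
history = defaultdict(lambda: deque(maxlen=K))   # page -> last K access times
clock_time = 0

def record_access(page_id):
    global clock_time
    clock_time += 1
    history[page_id].append(clock_time)

def pick_victim(pages):
    # A page with fewer than K recorded accesses has "infinite" backward
    # K-distance, so it is evicted first.
    def kth_access(p):
        h = history[p]
        return h[0] if len(h) == K else float("-inf")
    return min(pages, key=kth_access)

for p in [1, 2, 1, 3, 2, 1]:
    record_access(p)
print(pick_victim([1, 2, 3]))   # 3: accessed only once, no K-th reference yet
```

This is why a one-pass sequential scan no longer floods the pool: pages touched exactly once never accumulate a K-th reference, so they are the first to go.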
Localization, as I said, is basically based on the query. You need to know which eviction is based on which query; it could be an aggregation, it could be just a set-membership query. The eviction here is based on the query, and of course this will help us minimize the pollution of the buffer pool. And what does that mean? We're not just going to bring everything in. We keep the blocks that are desired by that query, because the eviction is going to be query-based.
An example here: Postgres maintains a small ring buffer that is private to the query, so you can look at that as well. Now the priority hints. Here, as you create the table, you basically know, okay, these blocks are priority because they keep the relevant data, so the database system has a clear understanding, a context, of which pages have my data. For example, if you're going to insert new values, and you had the hint when creating the table that, hey, my data lies on this block and this one and that one, then those blocks can be labeled as important and the ones we don't want can be evicted. So when you look at this, let's say our values are going to be on block 0, block 4, and block 5; you insert there, and that's where you've placed your data. If we call this student data, those blocks hold all my student data. If I go on to retrieve, a read of the data you just entered, those blocks should not be dropped. The ones marked important by the priority hints should not be deleted; the pages that can be deleted are this, this, this, and that one. And I've already explained the dirty page concept.
The buffer pool is the core component of DBMS memory management.
A DBMS stores most of its data on disk, but query execution has to happen in memory.
The DBMS therefore uses a buffer pool system to move data between disk and memory.
Purpose of the Buffer Pool

Most DBMSs use a disk-oriented architecture.
Characteristics