External Sort-Merge Algorithm In DBMS

nmr98000
Apr 5, 2022
11 min read

Updated: Apr 7, 2022

When we hear the words Sorting and DBMS together, some of the questions that come to our mind are, how is Sorting important in Database Management Systems ?,What role does Sorting of relations play in Database Management Systems ?. So, as an answer to these questions,let's take two different scenarios where sorting operations can be required in a DBMS system. Those two scenarios are :

1 . When an output to an SQL query is explicitly asked to be provided in a sorted manner, by the query itself.

For example : Say, consider a database table ‘employee’

Select * from employee Order By name

In this , the SQL ORDER BY keyword orders the query to return records from the table employee, in ascending order by the value of the name attribute in the table.

2 . Implementation of relational operations in DBMS like joins, union, projection, etc., can be made efficient ,if the internal relations are sorted initially before these operations are performed on them, leading to an efficient query processing.

For example : For a join operation, if both the relations are sorted on their join attribute (it shouldn’t be for any other attribute as it becomes useless),then the join operation becomes much more easier and efficient.

Now, we know that we can join huge relations easily, if they are sorted. But the question here that we would like to get answered is , how do we sort huge relations ?.

For relations that fit in memory, standard sorting techniques like quick sort, merge sort are sufficient. But what about relations that are much bigger than our main memory,which is in an order of magnitude larger than memory available to us, which doesn’t fit in memory ?.

Two approaches followed for sorting relations are

i) Logical (using indexing)

ii) Physical - [ 1) In memory sorting (eg : Quick sort ) and 2) External Sorting (eg : External sort-merge) ]

One of the ways to sort relations is through a logical sorting approach.

In Logical Sorting , we can sort a relation by building an index on the sort key, and use that index to read the relation in a sorted manner. But one of the drawback to such an approach is that while reading the tuples in a sorted order, it may lead to a disk access for each record (disk seek + block transfer), which can be very expensive , since the relation that we are dealing with is larger than our memory, the number of records can be much larger than the number of blocks. So, the ordering of records on a physical basis is what is needed here.

In the physical sorting approach, we sort the relations physically.As discussed before, if the relations fit in main memory,then in memory sorting techniques like quick sort,merge sort etc can be deployed in this scenario.

Fig : Quick sort illustration (GIF)

Now, for the other scenario, where the tables doesn’t fit into the main memory, external sorting is used.

External Sorting : For relations that don't fit in memory, sorting of such relations is what is known as external sorting.Since the records or data is contained in peripheral or external memory,that’s why the sorting method is called external sorting.

Now, here, we will be discussing one of the most commonly used techniques for external sorting, which is the ‘External Sort-Merge Algorithm’. So, now let’s jump onto the details of the algorithm.

External Sort-Merge Algorithm

Basic idea :

From the name of the algorithm itself we can make out that this algorithm is based on divide and conquer approach.In a basic sense, what it does is, the data is divided into chunks where each chunk’s size will be small enough to fit into the main memory.These chunks are then read, sorted, and written out to a temporary file.Then finally,these sorted chunks are combined together or merged together to produce finally the sorted relation as the output.So here we will go deep into the detailed working of the algorithm.

Let M denote memory size (in pages). It denotes the number of blocks in the main memory buffer which is available for sorting.The algorithm is divided into two phases (it works in two stages), First is the ‘Sorting phase’ and the second is the ‘Merging phase’.

First Stage (The Sorting phase) :

Here in the first stage of this algorithm, creation of sorted runs happens.By sorted run here, we mean, is a sequence of pages that is sorted.Here, sort the records on each individual page,that is each run is sorted and also one thing to note here is that each run in this case will contain only some of the records of the relation and those runs are sorted in this first phase of the algorithm.So basically,in this first phase, the relation will be divided as per the main memory space available and those individual runs will be sorted and saved back to the disk.

Now, let’s dive into the algorithm part of the first phase.

Algorithm :

i = 0;

repeat

read M blocks of the relation, or the rest of the relation,

whichever is smaller;

sort the in-memory part of the relation;

write the sorted data to run file Ri;

i = i + 1;

until the end of the relation

Algo Explained

The first three steps in the loop, that is { read M blocks of the relation, or the rest of the relation, whichever is smaller;sort the in-memory part of the relation; write the sorted data to run file Ri ; } is run for a number of iterations until the end of the relation.What happens in these three steps is that first read M blocks of the relations,then sort it individually (using any standard sorting method) and then write the result back to the disk(here,written into temporary run file Ri)This comprises the steps to generate the first sorted run.In the next iteration, again read the M blocks (from the remaining part of the relation) and so on till we haven’t completed the relation R.So,in this manner, number of sorted runs are generated in this first phase.

Second Stage (The Merging phase) :

Now, here in this second stage of the algorithm,from the first phase whatever sorted runs were created, merging of these sorted runs happens in this phase.These sorted runs are merged or combined to form the whole sorted relation.Now say, let N be the number of such sorted runs from the first phase which were to be merged and we know M is the number of memory blocks in the main memory buffer which is available for sorting.So,suppose here N < M, so that after allocating each block of memory available to these N sorted runs,at least there will exist one block of memory left for the output block.

So now, let’s dive right into the algorithm part corresponding to the merging phase.

Algorithm :

read one block of each of the N files Ri into a buffer block in memory;

repeat

choose the first tuple (in sort order) among all buffer blocks;

write the tuple to the output, and delete it from the buffer block;

if the buffer block of any run Ri is empty and not end-of-file(Ri)

then read the next block of Ri into the buffer block;

until all input buffer blocks are empty

Algo Explained

So, as per the algorithm in this phase, the first step is to read one block of each of the N files Ri into a buffer block in memory; , that is to read one block of each of the N run file Ri (where i =0,1,...,N-1) which has been generated in the first stage/phase.Next few steps are inside a loop, where the first step in the loop is to choose the first tuple (in sort order) among all buffer blocks; , Now this means that we have to compare the first tuple of every block and the next step is to write the tuple to the output, and delete it from the buffer block; , that is, after comparing the first tuple of every block in the sorted order, as per the requirement the tuple will be written back to the output. Once that tuple is written to the output,the corresponding tuple entry from the buffer block will be deleted. The last steps of each iteration is that, if the buffer block of any run Ri is empty and not end-of-file(Ri),then read the next block of Ri into the buffer block; , that is, when the buffer becomes empty, we have to read the next part or next block of the run file R into the buffer block.These steps has to be run iteratively until all the run files Ri are not read completely.

So, after the second phase, the final output produced is a sorted one single run, that is, a sorted relation is obtained.In this merge phase, all the N runs are sorted and that’s why this merge is known as an N-way merge,it’s like merge k sorted arrays In this phase, multiple merge pass happens when there are more than M number of runs generated in a stage as the number of memory buffer blocks available at a time is M-1,so if there are more than M runs generated,merge operation can be done in multiple passes,wherein M-1 runs can be taken as input in each pass.These multiple passes are done until the number of runs in a pass becomes less than M and then the pass after that produces the final sorted output.One thing to note here is that each merge pass reduced the number of runs by a factor of M-1.

Now, let’s see an example for seeing the working of the algorithm in practice.

Example :

So, let’s consider that we are having a file or relation of size 5GB.

Now, say if the main memory size available to us is only 1GB.

So, given

Relation Size - 5GB

Memory Size - 1GB = 1000 MB.

So, here we can see that this complete relation of size 5GB is not going to fit in the main memory.

First Stage (Sorting phase):

So, as per in the first phase (Sorting phase), this relation will be divided into five different parts as per the memory size (since 5GB can be divided into five 1GB parts each and the main memory size is 1GB). These parts are then sorted individually as per some standard sorting method in the main memory and then this sorted part is written back into the disk (stored in their corresponding run files). So in this manner five different run files (R0,R1,R2,R3,R4) are generated, which will individually contain a sorted 1GB part of the relation.

Fig 1 : Illustration of steps involved during the first phase

Second Stage (Merge phase):

For the next stage, we have five sorted run files from the first phase.We know that the main memory size is 1GB, so if we bring all the file blocks onto the main memory,it won’t fit and we won’t be able to produce the output here.So, as per the second phase algorithm that we saw before, from every run file that has been generated, some part will be read.So, here let’s assume that 150MB of data has been read from each run file.So, the total amount of data taken into the main memory in an iteration is 150 * 5 = 750 MB.Now, since the main memory is of size 1GB in total, that means 1000 - 750 = 250 MB of space is still left, which can be used for the output block.Now, the first tuple from every 150MB run blocks is compared, whoever is less (for an increasing order arrangement) or more (for a decreasing order arrangement), write them to the output.Then read the next record which has already been written that we need to delete and then go for the next record, so in this way all the data will be compared in written in a sorted order.Now,if the output block of 250MB becomes full,then the data should be written to the disk and then empty it again.So in this way all five files will be empty or the buffers will be empty,then the next set of record has to be read and so on.So,in this manner, finally, we will be left with one single sorted relation.

Fig 2 : Illustration of steps involved during the second phase

Cost Analysis of External Sort-Merge

The Disk access cost for the external merge sort will be calculated here.

Let r denote the relation we want to sort.

Let br denote the number of blocks containing records of r.

In the first stage,that is in the sorting phase, what would be the number of block transfers that could have happened ?

We know,in the first phase,there is both read and write operations performed.Every block of r is read and then at the end of the first stage,every block divided into sorted parts are again written out to the disk.So, here the total number of block transfers that happened in the first phase = 2 * br

Now, in the second stage, that is in the merge phase, initially , br / M number of runs will be read at a time.Now, in each merge pass in the merge phase,the number of runs will be decreasing in each pass by a factor of M - 1.

Hence, the total number of merge passes required will be

⌈logM-1 (br / M)⌉

Note : Here the term M-1 in logM-1 is a subscript to log and also r in br is also a subscript.

One thing to notice here is that in each of these merge passes,every block of the relation r is read in once and written out once,that means two block transfers happening each time,except two scenarios or two exceptions..Those exceptions are :

First one is when the last sorted output is generated in the last merge pass,that output is not written into the disk.So,no block transfer w.r.t the write operation happening in this particular pass.

Second one is when there are runs which are neither read in nor written out during a pass.One example for such a scenario is, suppose say there were M runs to be merged,then in that pass,one run is not accessed during this pass,as the M-1 runs are read and merged while one run is not accessed.

Hence, considering all these things and scenarios,

the total number of block transfers for external sorting the relation r is equal to

br (2⌈log(br / M)⌉ + 1)

Now, what about the Disk-seek costs ?

In the first phase,that is in the sorting phase, the disk seeks required for the read in and write out for each sorted runs generated in that phase will be 2 * ⌈br / M⌉ , since M blocks are taken into the memory at a time.

In the second stage,that is in the merge phase, so let’s say if there are bb read at a time from each run, then the number of disk seeks required for these reads in this phase will be

⌈br / bb⌉ . Since there is an equal amount of write into output during each merge pass,the number of seeks consumed by the writes in this phase will also be ⌈br / bb⌉.

Hence, the total number of disk seeks required in the merge pass will be 2 * ⌈br / bb⌉.

Hence, adding up all the disk seek costs of both the phases,

the total number of seeks will be

2 * ⌈br / M⌉ + ⌈br / bb⌉ (2⌈log(br / M)⌉ - 1)

Note : Here, for the final merge pass,it’s assumed that the final sorted result is not written back to the disk.

Now let’s see with an example as of how to calculate Disk access cost for the external merge sort.

Example :

Consider the figure given below which depicts the steps of the external sort–merge for an example relation.

Assumptions :

Let’s assume that the main memory capacity is three blocks or the memory can hold at most three blocks.
Only one tuple fits into a block.
For input and output during the merge phase,two blocks and one block are used respectively.

Fig 3 : illustrates the steps of the external sort–merge for an example relation

So, in this example, from the figure we can see that there are 12 blocks in total which are holding the given relation.

So, in this example , br = 12.Also, we can see that for the given example,the number of memory blocks in the main memory buffer which is available for sorting(M in our case) is 3.That is,

M = 3.

Now, applying the number of block transfers equation given above,substituting M and br value to the equation ( br (2⌈log(br / M)⌉ + 1)), to the above example, we get

total number of block transfers = 12 * (4+1) = 12 * 5 = 60 block transfers

Similarly, applying the total number of seeks equation given above, that is applying

( 2 * ⌈br / M⌉ + ⌈br / bb⌉ (2⌈log(br / M)⌉ - 1) to the above example, we get

total number of seeks = 8 + 12 * (2 * 2 − 1) = 44 disk seeks.

Hence, for external sorting the above relation using the external sort merge algorithm,in total,

60 number of block transfers took place and 44 number of disk seeks.

Summary :

As a summary, the first thing that we can make out from all these discussions is that

External Sorting is important because the scenarios where the relation to be sorted,being larger than the available main memory in size,are not rare in the real world.
Went in detail, through two phases involved in the algorithm, namely Sort and Merge phases respectively.With an example we saw the workings involved in these two phases.
Cost analysis of the algorithm was done.External merge sort minimizes disk I/O cost, saw in comparison to logical sorting using indexing.With an example,we saw how to calculate the total number of blocks transferred and number of disk seeks.

References :

DATABASE SYSTEM CONCEPTS SIXTH EDITION

Abraham Silberschatz - Yale University, Henry F. Korth - Lehigh University,

S. Sudarshan - Indian Institute of Technology, Bombay

External Sorting : For relations that don't fit in memory, sorting of such relations is what is known as external sorting.Since the records or data is contained in peripheral or external memory,that’s why the sorting method is called external sorting.

External Sort-Merge Algorithm

Basic idea :

First Stage (The Sorting phase) :

Algorithm :

Algo Explained

Second Stage (The Merging phase) :

Algorithm :

Algo Explained

Example :

First Stage (Sorting phase):

Second Stage (Merge phase):

Cost Analysis of External Sort-Merge

Example :

Assumptions :

Summary :

References :

Comments