Sort by Direct 2-Way Merge

8.11. MERGESORT ALGORITHM

8.11.1. Mixing 2 lines

The two-way merge is the operation of merging two sorted key sequences to form a key sequence whose size is equal to the sum of the sizes of the two original key sequences, and the resulting key sequence is also in a sorted order. Its implementation principle is quite simple: compare the two keys at the top of the two sequences, select the smallest key, and put it into the sorting domain (a sub-key sequence whose size is equal to the sum of the sizes of the two original key sequences) in the appropriate position. Then, this key is removed from the key sequence containing it. The process continues until one of the two key sequences is exhausted, at which point it is only necessary to move the entire remaining key sequence to the sorting domain.

For example: With two key sequences: (1, 3, 10, 11) and (2, 4, 9)

Row 1

Row 2	Smallest key in 2 series	Sort Domain
(1, 3, 10, 11)	(2, 4, 9)	1	(1)
(3, 10, 11)	(2, 4, 9)	2	(1, 2)
(3, 10, 11)	(4, 9)	3	(1, 2, 3)
(10, 11)	(4, 9)	4	(1, 2, 3, 4)
(10, 11)	(9)	9	(1, 2, 3, 4, 9)
(10, 11)		Sequence 2 is  , put sequence 1 into the sorted domain	(1, 2, 3, 4, 9, 10, 11)

Maybe you are interested!

8.11.2. Sorting by merging 2 direct lines

We can consider each key in the key sequence k 1 , k 2 , …, k n as a chain of length 1, of course the chains of length 1 can be considered to be sorted. If we merge two consecutive chains into a chain of length 2, we will again have a sequence of sorted chains. Continuing like this, the number of chains in the sequence will decrease after each merge (Figure 35)

Figure 35: Merge sort algorithm

To carry out the direct two-way merge sort algorithm, we write the procedures:

Merge(var x, y: TArray; a, b, c: Integer) procedure; this procedure merges the circuit x a , x a+1 , …, x b with the circuit x b+1 , x b+2 …, x c to get the circuit y a , y a+1 , …, y c .

The MergeByLength(var x, y: TArray; len: Integer) procedure shuffles the pairs of strings in order:

Mix the circuit x 1 …x len and x len + 1 …x 2len to form the circuit y 1 …y 2len .

Mix the circuit x 2len+1 …x 3len and x 3len+1 …x 4len to form the circuit y 2len+1 …y 4len .

…

Note that in the end we may encounter two cases: Either there are two strands left and the second strand has length < len. Or there is only one strand left. In the first case we have to manage the indices correctly to perform the merge, and in the second case we must not forget to move the only remaining strand straight to the y array.

Finally, there is the MergeSort procedure, which requires a sequence of subkeys t 1 , t 2 , …, t n . First, we call MergeByLength(k, t, 1) to merge two consecutive elements of k into a single chain in t, then we call MergeByLength(t, k, 2) to merge two consecutive chains in t into a single chain in k, then we call MergeByLength(k, t, 4) to merge two consecutive chains in k into a single chain in t … Thus, k and t are used in alternating roles: one sequence contains chains and one sequence is used to merge pairs of consecutive chains to get a larger chain.

procedure MergeSort; var

t: TArray; {Secondary key sequence}

len: Integer;

Flag: Boolean; {Flag = True: merge the circuits in k into t; Flag = False: merge the circuits in t into k}

procedure Merge(var X, Y: TArray; a, b, c: Integer); {Merge X a …X b and X b+1 …X c }

var

i, j, p: Integer; begin

{Index p runs in the sorted domain, i runs in the first strand, j runs in the second strand}

p := a; i := a; j := b + 1;

while (i b) and (j c) then {Until both circuits are considered}

begin

if X i X j then {Compare the two smallest elements in the two strands that have not yet been placed in the sorting domain}

begin

Y p := X i ; i := i + 1; {Put x i into the sorting domain and let i run}

end else

begin

Y p := X j ; j := j + 1; {Put x j into the sorted domain and let j run}

end;

p := p + 1;

end;

if i b then (Y p , Y p+1 , …, Yc) := (X i , X i+1 , …, X b ) {Circuit 2 finishes first, Put the last part of strand 1 into the arrangement}

else (Y p , Y p+1 , …, Yc) := (X j , X j+1 , …, X c ); {Circuit 1 ends first, Put the last part of circuit 2 into the sort order}

end;

procedure MergeByLength(var X, Y: TArray; len: Integer); begin. begin

a := 1; b := wool; c := 2 * len;

while c n do {Mix two chains x a …x b and x b+1 …x c both have length len}

begin

Merge(X, Y, a, b, c);

a := a + 2 * len; b := b + 2 * len; c := c + 2 * len; {Move the indices a, b, c back 2.len positions}

end;

if b < n then Merge(X, Y, a, b, n) {There are two strands left, the second strand has length shorter than len}

else

if a n then (Y a , Y a+1 , …, Y n ) := (X a , X a+1, …, X n ); {If there is one remaining strand, move it straight to the y domain}

end;

begin {Merge Sort Algorithm}

Flag := True; len := 1;

while len < n do begin

if Flag then MergeByLength(k, t, len) else MergeByLength(t, k, len); len := len * 2;

Flag := not Flag; {Switch the flags to alternate the roles of k and t}

end;

if not Flag then k := t; {If the final result is in t then copy the result to k}

end;

Regarding the complexity of the algorithm, we see that in the Merge procedure, the active operation is the operation of putting a key into the sorting domain. Each time the MergeByLength procedure is called, all elements in the key array are completely transferred to the sorting domain, so the complexity of the MergeByLength procedure is O(n). The MergeSort procedure has a loop that executes no more than log2n + 1 MergeByLength calls because the len variable will be increased exponentially by a factor of 2. From this, we can deduce that the complexity of MergeSort is O(nlog 2 n) regardless of the state of the input data.

Both are general sorting algorithms with the same average complexity, but unlike QuickSort or HeapSort, MergeSort is stable . The disadvantage of MergeSort is that it must use an additional memory area to store the sub-key array with the same size as the original key array.

One can also take advantage of the state of the input data to make MergeSort run faster: from the beginning, we do not consider each element of the key sequence as a circuit but consider the sorted segments in the key sequence as a circuit. Because any key sequence can be considered as consisting of sorted consecutive circuits. Then people call this method the natural two-path merge method.

More generally, instead of two-strand mixing, one can use k-strand mixing, when

That gives us the k-way merge sort algorithm.

8.12. SETTINGS

We will implement all the above sorting algorithms, with the input data placed in the text file SORT.INP containing no more than 15000 keys and the value of each key is a natural number not exceeding 15000. The result is written to the text file SORT.OUT containing a sorted array of keys, each key on a line.

SORT.INP

SORT.OUT
1 4 3 2 5	1
7 9 8	2
10 6	3
	4
	5
	6
	7
	8
	9
	10

The program has a menu interface, each function corresponds to a sorting algorithm. At each sorting algorithm, we add a few commands to measure its actual time (only measure the algorithm execution time, not including the time to input data and print results).

In the sorting algorithm using base by element permutation, we choose the binary system. In the sorting algorithm using direct base, we use base 256, then a natural number value x  15000 will be represented by two digits in base 256:

The units digit is x mod 256 = x mod 2 8 = x and 255 = x and $FF;

The remaining digit (= the digit in the highest place) is x div 256 = x div 2 8 = x shr 8;

P_2_08_1.PAS * Sorting algorithms

{ $M 65520 0 655360 }

program SortingAlgorithmsDemo; uses crt;

const

InputFile ='SORT.INP'; OutputFile ='SORT.OUT';

max = 15000;

maxV = 15000;

Interval = 1193180 / 65536; {Clock frequency 18.2 times / second}

nMenu = 12;

SMenu: array[0..nMenu] of String = (

);

type

' 0. Display Input', ' 1. SelectionSort',

' 2. BubbleSort',

' 3. InsertionSort',

' 4. InsertionSort with binary searching', ' 5. ShellSort',

' 6. QuickSort',

' 7. HeapSort',

' 8. Distribution Counting', ' 9. Exchange RadixSort',

' 10. Straight RadixSort', ' 11. MergeSort',

' 12. Exit'

TArr = array[1..max] of Integer; TCount = array[0..maxV] of Integer;

var

k: TArr;

n: Integer; selected: Integer; StTime: LongInt;

Time: LongInt absolute 0:$46C; {Clock count variable}

procedure Enter; {Before each sorting algorithm, call this procedure to enter data}

var

f: Text; begin

Assign(f, InputFile); Reset(f); n := 0;

while not SeekEof(f) do begin

Inc(n); Read(f, k[n]); end;

Close(f);

StTime := Time; {Start calculating time immediately after entering}

end;

procedure PrintInput; {Print data}

var

i: Integer; begin

Enter;

for i := 1 to n do Write(k[i]:8); Write('Press any key to return to menu…');

ReadKey end;

procedure PrintResult; {Print the results of each sorting algorithm}

var

f: Text;

i: Integer; ch: Char;

begin

{First print out the execution time}

WriteLn('Running Time =', (Time - StTime) / Interval:1:10, ' (s)'); Assign(f, OutputFile); Rewrite(f);

for i := 1 to n do WriteLn(f, k[i]);

Close(f);

Write('Press <P> to print Output, another key to return to menu…'); ch := ReadKey; WriteLn(ch);

if Upcase(ch) ='P' then begin

for i := 1 to n do Write(k[i]:8); WriteLn;

Write('Press any key to return to menu…'); ReadKey;

end;

procedure Swap(var x, y: Integer); {Procedure to swap the values of two parameters x, y}

var

t: Integer; begin

t := x; x := y; y := t; end;

(** SELECTIONSORT ********************************************** ***) procedure SelectionSort;

var

i, j, jmin: Integer; begin

Enter;

for i := 1 to n - 1 do begin

jmin := i;

for j := i + 1 to n do

if k[j] < k[jmin] then jmin := j; if jmin <> i then Swap(k[i], k[jmin]);

end;

PrintResult;

end;

(** BUBBLESORT ********************************************** ******) procedure BubbleSort;

var

i, j: Integer; begin

Enter;

for i := 2 to n do

for j := n downto i do

if k[j - 1] > k[j] then Swap(k[j - 1], k[j]); PrintResult;

end;

(** INSERTIONSORT ********************************************** ***) procedure InsertionSort;

var

i, j, tmp: Integer; begin

Enter;

for i := 2 to n do begin

tmp := k[i]; j := i - 1;

while (j > 0) and (tmp < k[j]) do begin

k[j + 1] := k[j];

Dec(j);

end;

k[j + 1] := tmp; end;

PrintResult;

end;

(** INSERTIONSORT WITH BINARY SEARCHING ****************************) procedure AdvancedInsertionSort;

var

i, inf, sup, median, tmp: Integer; begin. begin

Enter;

for i := 2 to n do begin

tmp := k[i];

inf := 1; sup := i - 1; repeat. repeat

median := (inf + sup) shr 1;

if tmp < k[median] then sup := median - 1 else inf := median + 1;

until inf > sup;

Move(k[inf], k[inf + 1], (i - inf) * SizeOf(k[1])); k[inf] := tmp;

end;

PrintResult;

end;

(**SHELLSORT ********************************************** *******) procedure ShellSort;

var

tmp: Integer;

i, j, h: Integer; begin

Enter;

h := n shr 1; while h <> 0 do

begin

for i := h + 1 to n do begin

tmp := k[i]; j := i - h;

while (j > 0) and (k[j] > tmp) do begin

k[j + h] := k[j]; j := j - h;

end;

k[j + h] := tmp; end;

h := h shr 1; end;

PrintResult; end;

(**QUICKSORT ********************************************** *******) procedure QuickSort;

procedure Partition(L, H: Integer); var

i, j: Integer;

Pivot: Integer;

begin

if L >= H then Exit;

Pivot := k[L + Random(H - L + 1)]; i := L; j := H;

repeat

while k[i] < Pivot do Inc(i); while k[j] > Pivot do Dec(j); if i <= j then

begin

if i < j then Swap(k[i], k[j]);

Inc(i); Dec(j); end;

until i > j;

Partition(L, j); Partition(i, H); end; end;

begin

Enter;

Partition(1, n);

PrintResult;

end;

(** HEAPSORT ********************************************** ********) procedure HeapSort;

var

r, i: Integer;

procedure Adjust(root, endnode: Integer); var

key, c: Integer; begin

key := k[root];

while root shl 1 <= endnode do begin

c := root shl 1;

if (c < endnode) and (k[c] < k[c + 1]) then Inc(c); if k[c] <= key then Break;

k[root] := k[c]; root := c; end; end;

k[root] := key; end;

begin

Enter;

for r := n shr 1 downto 1 do Adjust(r, n); for i := n downto 2 do

begin

Swap(k[1], k[i]);

Adjust(1, i - 1);

end;

PrintResult;

end;

(** DISTRIBUTION COUNTING ******************************************************) procedure DistributionCounting ;

var

x: TArr; c: TCount;

i, V: Integer; begin

Enter;

FillChar(c, SizeOf(c), 0);

for i := 1 to n do Inc(c[k[i]]);

for V := 1 to MaxV do c[V] := c[V - 1] + c[V]; for i := n downto 1 do

begin

V := k[i]; x[c[V]] := k[i]; Dec(c[V]);

end; k := x;

PrintResult; end;

Comment