External Sorting Algorithms On Parallel Disks To Minimize Parallel I/O's

The program implementing external sorting on parallel disk based on paper[1] can be downloaded from here.

USAGE:

After you extract the file "PDMSorting.tgz" you should see a linux 32-bit binary called pdm sort. The program takes two parameters "N , D" (Following the convention in the paper). To test the algorithm for inputsize "N = 2^24" with "D = 4" (i.e 4 disks) just type "./pdm sort 24 4"(random input of integers will be automatically generated). The output will be a binary file of integers in "run_file.bin".

To convert the integers into text format use the program "bin2ascii" (available in the archive) as follows "./bin2ascii run_file.bin sorted file -r". This would create file(s) like "sorted file.x.txt"(x is the part number, the file will be automatically split if it exceeds size of 4GB). To see the random input file generated by the program in text format use "bin2ascii key_file.txt input". This will create a text file(s) "input.1.txt". Now you can verify by sorting "input.1.txt" by any program and comparing it with "sorted file.x.txt". The test cases reported in the tables of the paper can be verified by getting downloading the input key files from here.
The main external sorting is there in file "ExternalSort.c" in function "TopLevelRWayMerge".

References

[1] V. Kundeti and S. Rajasekaran, "Efficient PDM Sorting Algorithms". IEEE/ACM Conference on High Performance Computing (HiPC) 2008.