psort: Parallel sorting on the command line. An example.

[This is a copy of the post on: at ]

I am in the process of understanding Hadoop and the MapReduce framework.

This introductory line will be clarified in the next post. For now, keep in mind that in this post I am not looking for the fastest sort, but rather for a sort that runs within a parallel framework: I need to sort a lot of data 😉

I needed a simple piece of code that would run both on my Q6600 processor and on my two nodes with 16×2 CPU cores. Sorting seemed a good example: it is easy to understand, easy to implement with the sort command, and a pretty typical problem. Moreover, Hadoop recently (maybe years ago on the IT time scale) won one of the sorting competitions (see here or also here; search online for up-to-date results). It sounded like a good starting point for a simple, admittedly naive comparison.
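To make the idea of "a sort within a parallel framework" concrete, here is a minimal Python sketch of the usual split, sort, merge pattern: the input is cut into chunks, each chunk is sorted independently (loosely the "map" side), and the sorted runs are combined with a k-way merge (loosely the "reduce" side). This is an illustration under stated assumptions, not the psort script from this post; `parallel_sort`, `sort_chunk` and the `workers` parameter are hypothetical names. Threads keep the sketch portable; in CPython, swapping in `ProcessPoolExecutor` (same `map` interface) is what would actually spread the sorting over several cores.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical illustration only: not the author's psort script.


def sort_chunk(chunk):
    # Each worker sorts its own slice independently of the others.
    return sorted(chunk)


def parallel_sort(lines, workers=4):
    """Sort `lines` by splitting into chunks, sorting them concurrently,
    then k-way merging the sorted chunks."""
    if not lines:
        return []
    # Split into roughly equal contiguous chunks, at most one per worker.
    size = -(-len(lines) // workers)  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    # Threads for portability; ProcessPoolExecutor would give real
    # multi-core parallelism for CPU-bound sorting in CPython.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_chunks = list(pool.map(sort_chunk, chunks))
    # Merge the already sorted runs into one sorted list.
    return list(heapq.merge(*sorted_chunks))


if __name__ == "__main__":
    data = ["pear", "apple", "fig", "kiwi", "banana", "cherry", "date"]
    print(parallel_sort(data, workers=3))
    # prints ['apple', 'banana', 'cherry', 'date', 'fig', 'kiwi', 'pear']
```

The merge step only works because every chunk is already sorted; that is what lets the expensive sorting happen in parallel while the final pass stays a cheap linear merge.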
