Category: cmd line

Backup with rsync à la Time Machine: a proof

I have been struggling with backups forever, as I assume everybody has. For my personal machines I used rsync together with my script taritdate.sh. Then I kept reading online about using rsync for incremental backups, but I could not find any simple example. So I wrote this little script:


Less Unix, more linguistics and phonetics: a script to automate Praat

Praat (home page at http://www.praat.org/ and http://www.fon.hum.uva.nl/praat/).

I needed a simple script to quickly change the input data when running Praat from the command line. The interesting part is at the end: it is a first script for handling big data.

So here is a first attempt (let's call the script run_praat.sh):
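The original run_praat.sh is not included in this excerpt. As a hypothetical sketch of the same idea, the driver below calls a Praat script once per input file, so changing the input data only means pointing it at another directory. Assumptions: the Praat script takes an input path and an output path as its two arguments, and the `praat` binary accepts `--run` (Praat 6 and later; older versions take the script path directly). The function name `run_praat` and the `PRAAT` variable are illustrative.

```shell
#!/bin/sh
# Hypothetical batch driver in the spirit of run_praat.sh (not the original).
# Assumes "praat --run SCRIPT ARG..." (Praat 6+) and a Praat script that
# reads its input file from the first argument and writes to the second.

PRAAT=${PRAAT:-praat}   # path to the Praat executable; override if needed

run_praat() {
    script=$1    # Praat script to execute
    indir=$2     # directory with the input .wav files
    outdir=$3    # directory for the per-file results

    mkdir -p "$outdir" || return 1
    for wav in "$indir"/*.wav; do
        [ -e "$wav" ] || continue          # no .wav files: glob stays literal
        base=$(basename "$wav" .wav)
        # One Praat run per file; stop at the first failure.
        "$PRAAT" --run "$script" "$wav" "$outdir/$base.txt" || return 1
    done
}
```

A call such as `run_praat analyse.praat recordings/ results/` (file names hypothetical) would produce one `results/<name>.txt` per recording. Keeping one Praat process per file is slow but makes it trivial to split the input list later and run several drivers in parallel.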


A small script: taritdate.sh tars directories with dates

This is a very simple script that I often use to create temporary backups of working directories. As the complexity of my work grows, I need to save its state locally so that I can easily revert to a previous state. It turns out I also use it for backups. Here it is:
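The script body is not in this excerpt. A minimal sketch of the same idea, assuming the goal is one gzipped tarball per call with a timestamp in its name; the function name `taritdate` and the `NAME_YYYY-MM-DD_HHMMSS.tar.gz` naming scheme are hypothetical choices, not necessarily those of the original.

```shell
#!/bin/sh
# Hypothetical taritdate.sh-style helper (not the original script):
# archive a directory into NAME_YYYY-MM-DD_HHMMSS.tar.gz in the current
# directory, e.g. myproject_2012-08-29_101500.tar.gz.

taritdate() {
    dir=${1%/}                              # directory to archive, trailing slash dropped
    stamp=${2:-$(date +%Y-%m-%d_%H%M%S)}    # optional explicit timestamp
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }

    parent=$(dirname "$dir")
    name=$(basename "$dir")
    archive="${name}_${stamp}.tar.gz"

    # -C stores paths as "name/..." rather than the full path given.
    tar -czf "$archive" -C "$parent" "$name" || return 1
    echo "$archive"                         # report the file just created
}
```

Reverting is then just extracting the chosen archive, e.g. `tar -xzf myproject_2012-08-29_101500.tar.gz` in an empty directory. Because the timestamp sorts lexically, `ls` lists the archives in chronological order.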


Some unix commands I want to remember and an example from recsys

There are some simple Unix commands that are very useful but that I forget from time to time. I often use them to handle large data sets, and I am always surprised how much time and resources a good pipe can save, making it possible to process large amounts of data with limited resources. It can make the difference between getting an answer or not.
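As an illustration of that kind of pipe (the recsys example itself is not in this excerpt), the sketch below finds the most frequently rated items. It assumes a hypothetical ratings file with tab-separated `user<TAB>item<TAB>rating` lines; the function name `top_items` is also illustrative. Each stage streams its input, and `sort` spills to temporary files on large inputs, so memory use stays modest.

```shell
#!/bin/sh
# Hypothetical example: print the N most frequently rated items.
# Assumed input: tab-separated lines "user<TAB>item<TAB>rating".

top_items() {
    file=$1
    n=${2:-10}
    cut -f2 "$file" |              # keep only the item column
        LC_ALL=C sort |            # bring identical items together (byte order)
        uniq -c |                  # one "count item" line per distinct item
        sort -k1,1nr -k2,2 |       # most frequent first, ties by item name
        head -n "$n"               # stop after the top N
}
```

For instance, `top_items ratings.tsv 20` prints the 20 most rated items with their counts. `LC_ALL=C` on the first sort is a common speed trick: byte-wise comparison avoids locale collation, and grouping for `uniq` does not need a human-friendly order.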


Sorting Cats with Hadoop and psort

[This post will also be published on http://lwsffhs.wordpress.com/]

This is my first “self” tutorial on Hadoop MapReduce streaming. If you are really IT oriented, you probably want to read http://hadoop.apache.org/docs/r0.15.2/streaming.html (or a newer version) instead: this post doesn’t add much to that document as far as Hadoop MapReduce streaming itself is concerned. Here I play a bit with “sort” on the command line. You may want to read my previous notes first: psort: parallel sorting …. I will run these examples on a virtual cluster (libvirt/qemu/KVM) made of 1 master node with 4 CPUs and 10 computing nodes with 2 CPUs each. The virtual nodes are spread across two physical machines (I will post some details about this virtual cluster in the future).

The question I had was: what does Hadoop MapReduce streaming actually do?
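A rough way to reason about it before touching the cluster is to model a streaming job with a single reducer as a local pipe: input, then the mapper, then a sort by key, then the reducer. Streaming treats the text up to the first tab as the key (the whole line if there is no tab), and the framework sorts the map output by key before the reducer reads it. This is an approximation, not Hadoop: it ignores partitioning across several reducers, combiners and the ordering of values within a key. The function name `local_streaming` is hypothetical.

```shell
#!/bin/sh
# Hypothetical local model of a Hadoop streaming job with one reducer:
#   input -> mapper -> sort by key -> reducer
# With "cat" as both mapper and reducer, the job simply sorts its input
# by key, which is what makes "sorting cats" a useful first experiment.

TAB=$(printf '\t')

local_streaming() {
    mapper=$1       # mapper command (may contain arguments)
    reducer=$2      # reducer command (may contain arguments)
    shift 2         # the remaining arguments are input files
    # $mapper and $reducer are left unquoted on purpose, so that a command
    # with arguments such as "cut -f1" is split into words.
    cat "$@" |
        $mapper |                          # map phase
        LC_ALL=C sort -t "$TAB" -k1,1 |    # shuffle/sort phase: by key, byte order
        $reducer                           # reduce phase
}
```

On a real cluster the streaming documentation's own first example uses `-mapper /bin/cat -reducer /bin/wc`; swapping the reducer for `cat` gives the sorted-output job that this local model mimics.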


psort: Parallel sorting on the command line. An example.

[This is a copy of the post on http://lwsffhs.wordpress.com/, at http://lwsffhs.wordpress.com/2012/08/29/psort-parallel-sorting-on-the-command-line-an-example/]

I am in the process of understanding Hadoop and the MapReduce framework.

This introductory line will become clearer with the next post, but keep in mind that in this post I am not looking for the fastest sort, but rather for a sort within a parallel framework. I need to sort a lot of data 😉

I needed a simple piece of code that would run on my Q6600 processor and also on my 2 nodes with 16×2 core CPUs. Sorting seemed a good example: it is easy to understand, easy to implement with the sort command, and a pretty typical problem. Moreover, Hadoop recently (maybe years ago on the IT time scale) won one of the sorting competitions (see here or also here, and search online for up-to-date results). It sounded like a good starting point for a simple, toy comparison.
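The psort code itself is not in this excerpt. The classic way to parallelize `sort` on one machine is split, sort, merge: cut the input into N line-aligned chunks, sort each chunk as a background job, then combine the already-sorted chunks with `sort -m`, which only interleaves them in one streaming pass. This is a minimal sketch of that technique; the function name `psort_sketch` and the chunking rule are hypothetical, not the psort from the post.

```shell
#!/bin/sh
# Hypothetical split/sort/merge parallel sort (not the original psort).
# Usage: psort_sketch FILE [N]   (N = number of parallel sort jobs)

psort_sketch() {
    file=$1
    n=${2:-4}
    work=$(mktemp -d) || return 1

    total=$(wc -l < "$file")
    per=$(( ($total + $n - 1) / $n ))   # lines per chunk, rounded up
    [ "$per" -gt 0 ] || per=1           # empty input: avoid "split -l 0"

    # Split phase: chunk.aa, chunk.ab, ... each at most $per lines.
    split -l "$per" "$file" "$work/chunk."

    # Sort phase: one background job per chunk.
    for chunk in "$work"/chunk.*; do
        [ -e "$chunk" ] || continue     # no chunks were produced
        LC_ALL=C sort -o "$chunk.sorted" "$chunk" &
    done
    wait                                # wait for every chunk sort to finish

    # Merge phase: inputs are sorted, so -m merges without re-sorting.
    set -- "$work"/chunk.*.sorted
    if [ -e "$1" ]; then
        LC_ALL=C sort -m "$@"
    fi
    rm -rf "$work"
}
```

Running `psort_sketch big.txt 4` on a quad core such as the Q6600 gives one sort job per core. The same split/sort/merge shape is what a MapReduce sort distributes across machines. For a single host, GNU sort can also use several threads on its own via `--parallel=N`.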
