Category: linux

A small script: tars directories with dates

This is a very simple script that I often use to create temporary backups of working directories. As the complexity of my work grew, I needed a script to save its state locally, so that I could easily revert to a previous state. It turned out that I also use it for backups. See it:
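
The script itself is not reproduced in this excerpt. As an illustration only, here is a minimal sketch of how a date-stamped tar backup might look. The function name `backup_dir`, its arguments, and the archive naming scheme are my assumptions, not the original script.

```shell
#!/bin/sh
# Hypothetical helper (not the original script): archive a directory
# into a gzip-compressed tarball whose name carries the current date.
# Usage: backup_dir <directory> [destination-directory]
backup_dir() {
    src="${1%/}"                      # strip a trailing slash, if any
    dest="${2:-.}"                    # default destination: current directory
    stamp="$(date +%Y%m%d_%H%M%S)"    # date and time, down to the second
    name="$(basename "$src")"
    archive="$dest/${name}_${stamp}.tar.gz"
    # -C keeps the paths inside the archive relative to the parent of $src
    tar -czf "$archive" -C "$(dirname "$src")" "$name" && echo "$archive"
}
```

Calling `backup_dir ~/work/project /tmp` would write something like `/tmp/project_<date>_<time>.tar.gz` and print the archive path, so reverting is a matter of unpacking the snapshot you want.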

Some unix commands I want to remember and an example from recsys

There are some simple unix commands that are pretty useful but that I forget about from time to time. I often use them when handling large data sets, and I am always surprised how much time and resources a good pipe can save, making it possible to handle large amounts of data with few resources. It might make the difference between having an answer or not.
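
To give a flavour of the kind of pipe meant here, below is a small sketch that counts the most frequent values in one column of a tab-separated file. The function name `top_values` and the file layout are assumptions for illustration; this is not the recsys example from the post.

```shell
#!/bin/sh
# Hypothetical helper: print the N most frequent values found in one
# column of a tab-separated file, most frequent first.
# Usage: top_values <file> [column] [N]
top_values() {
    file="$1"
    col="${2:-1}"     # column to inspect, default: first
    n="${3:-10}"      # how many values to report, default: 10
    # cut picks the column, sort groups equal values together,
    # uniq -c counts each group, sort -rn orders by count, head keeps the top N
    cut -f "$col" "$file" | sort | uniq -c | sort -rn | head -n "$n"
}
```

Each stage reads from the previous one through the pipe, so the input file is never loaded and parsed by a single large program.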

hd bbc – our lab local virtual hadoop cluster

When I finished installing our local cluster (after a few days), I finally came across this video on YouTube. Nice video, cheers masterschema!

But I’ll give my little report here in any case, for the record and to have it in a more comprehensive written form.

(EDIT: Please note, I do not describe a fast deployment of hadoop, but rather a hands-on approach to a minimal setup that I used in order to learn some details. Tech people will probably find the network configuration across different physical nodes useful.)

Maybe a little intro on what this is about:

  • Big data, data analytics, and the like. We want to test a hadoop installation and, last but not least, eventually use it to simplify some problems with a different approach.
  • This is (in this blog) a demo installation and eventually a test case (no production here, even if it might be a starting point).
  • You will not actually see any data! This is only about providing the tools.
  • We focus on map/reduce and/or the software implementation, which means we do not consider the system as a data storage solution.
  • Besides this: have fun and check out how we set up a little virtual cluster…

