Setup OpenMPI Cluster for bzip2

One of my jobs for a client involves managing terabytes of logfiles. Each logfile can be about 9 GB in size, and a new one is created every few minutes. Needless to say, however big your storage is, it will always be hard to keep up with such a rate of usage.

I alleviate this by compressing each logfile with bzip2 before sending it to the storage server.
One problem though: compressing a 9 GB logfile can take 2.5 hours! Gzip compresses faster, but the resulting file is about 2x bigger than with bzip2.

So I looked around for a solution, and found mpibzip2 : http://compression.ca/mpibzip2/ 

Now a 9 GB logfile can be compressed in just 15 minutes instead of 2.5 hours. That's about 10x faster! Not bad 🙂

mpibzip2 achieves this by spreading the compression work across a cluster of many machines. Yes, we can now use a cluster to speed up compression (insert Beowulf cluster joke here) 🙂
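
To get a feel for why compression spreads so well over many workers, here is a rough single-machine illustration. It is not mpibzip2's actual code, just the general idea: split a file into chunks, bzip2 each chunk independently, then concatenate the compressed pieces. The bzip2 tool can decompress concatenated streams, so the result is still a valid .bz2 file. The sample file and the 256k chunk size are arbitrary assumptions for the demo.

```shell
#!/bin/sh
# Illustration only: compress independent chunks in parallel, then join them.
set -e
workdir=$(mktemp -d)

# Assumption: a generated sample file stands in for a real logfile.
seq 1 200000 > "$workdir/sample.log"

# Cut the input into fixed-size chunks (chunk.aa, chunk.ab, ...).
split -b 256k "$workdir/sample.log" "$workdir/chunk."

# Compress every chunk as a separate background job, then wait for all of them.
for chunk in "$workdir"/chunk.*; do
    bzip2 "$chunk" &
done
wait

# Concatenate the compressed chunks in order into one .bz2 file.
cat "$workdir"/chunk.*.bz2 > "$workdir/sample.log.bz2"

# Decompress the joined file and compare it with the original input.
bzip2 -dc "$workdir/sample.log.bz2" | cmp - "$workdir/sample.log" && echo "round trip OK"
```

On a cluster, the chunks would be handed to processes on other machines instead of local background jobs, which is the part MPI takes care of.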

An OpenMPI cluster (OpenMPI is an open-source implementation of the Message Passing Interface) enables us to run a program simultaneously on multiple machines : http://en.wikipedia.org/wiki/Message_Passing_Interface

Setting up an OpenMPI cluster may seem like a daunting task at first.
It turns out to be quite easy on a Debian platform (squeeze or newer). Here is how :

=====================
###### MASTER ######
cd /tmp
apt-get install openmpi-bin build-essential libbz2-dev libopenmpi-dev  
wget http://compression.ca/mpibzip2/mpibzip2-0.6.tar.gz
tar xzvf mpibzip2-0.6.tar.gz
cd mpibzip2-0.6

# Need to edit Makefile
vi Makefile
## make sure the line with CC=c++ is changed to
# CC=mpic++
# Otherwise, we'll get the following error message :
# mpibzip2.cpp:72:17: fatal error: mpi.h: No such file or directory

make
make install

# let's test locally, on this machine only
# -n 2 = use 2 processes
mpirun -n 2 mpibzip2 /var/log/syslog

### OK, let's start setting up the cluster
# the master needs to be able to access the slaves without a password
# create SSH keys
ssh-keygen -t rsa -b 4096
# when asked for a passphrase, just press Enter twice

cat ~/.ssh/id_rsa.pub
# then paste this public key into each slave's ~/.ssh/authorized_keys file

# put the slaves' IP addresses / hostnames here :
vi /etc/openmpi/openmpi-default-hostfile
# one slave address per line is all it needs, simple.

###### SLAVE ######
cd /tmp
apt-get install openmpi-bin build-essential libbz2-dev libopenmpi-dev sshfs 
wget http://compression.ca/mpibzip2/mpibzip2-0.6.tar.gz
tar xzvf mpibzip2-0.6.tar.gz
cd mpibzip2-0.6

# Need to edit Makefile
vi Makefile
## make sure the line with CC=c++ is changed to
# CC=mpic++
# Otherwise, we'll get the following error message :
# mpibzip2.cpp:72:17: fatal error: mpi.h: No such file or directory

make
make install

###### TEST mpibzip2 ######

# make sure there is a shared folder on the master & all slaves
# in this example, I'll use sshfs to share the folder
cp /var/log/syslog /tmp/
ssh root@slave1 sshfs root@master:/tmp/ /tmp
ssh root@slave2 sshfs root@master:/tmp/ /tmp

# let's run mpibzip2
mpirun -v -n 40 --hostfile /etc/openmpi/openmpi-default-hostfile /usr/bin/mpibzip2 -v /tmp/syslog
# this will run 20 mpibzip2 processes on each of slave1 & slave2

=====================
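
For reference, here is a sketch of what the hostfile could look like for the two-slave setup above. Open MPI hostfiles accept an optional `slots=N` value per host, which says how many processes to place there. The value 20 per slave is an assumption chosen to match the 40-process run. The snippet writes to a temporary file rather than the real hostfile, and adds up the slots as a sanity check.

```shell
#!/bin/sh
# Write a sample hostfile to a temporary path (not the real /etc/openmpi one).
hostfile=$(mktemp)
cat > "$hostfile" <<'EOF'
# one line per slave; slots = number of processes to place on that host
slave1 slots=20
slave2 slots=20
EOF

# Sum the slots= values of all non-comment lines.
total_slots=$(awk '!/^#/ && NF {
    for (i = 2; i <= NF; i++)
        if ($i ~ /^slots=/) { split($i, kv, "="); sum += kv[2] }
} END { print sum + 0 }' "$hostfile")

echo "total slots: $total_slots"
# A matching run would then be something like:
#   mpirun -n "$total_slots" --hostfile "$hostfile" mpibzip2 /tmp/syslog
```

With the two lines above, this prints `total slots: 40`.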

Hope you'll find this note useful. 

At the moment, my 40-processor OpenMPI cluster is busy scouring the storage server for any uncompressed logfiles and quickly compressing them. Love this :)
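
A loop like that could be sketched roughly as follows. This is not the exact script in use, only an illustration. `LOG_DIR`, `MIN_AGE_MIN`, `NPROCS` and the `compress_pending` function are hypothetical names, and it assumes mpibzip2 writes FILE.bz2 next to its input the way bzip2 does. By default it only prints what it would do; set `DRY_RUN=0` to really call mpirun.

```shell
#!/bin/sh
# Hypothetical settings; adjust to your own setup.
LOG_DIR=${LOG_DIR:-/tmp/logs}
HOSTFILE=${HOSTFILE:-/etc/openmpi/openmpi-default-hostfile}
NPROCS=${NPROCS:-40}
MIN_AGE_MIN=${MIN_AGE_MIN:-10}   # skip files that may still be written to
DRY_RUN=${DRY_RUN:-1}            # 1 = only print, 0 = actually compress

compress_pending() {
    # Find regular files that are not .bz2 and are older than MIN_AGE_MIN minutes.
    find "$1" -type f ! -name '*.bz2' -mmin +"$MIN_AGE_MIN" |
    while IFS= read -r logfile; do
        # Skip files that already have a compressed copy next to them.
        [ -e "$logfile.bz2" ] && continue
        if [ "$DRY_RUN" = 1 ]; then
            echo "would compress: $logfile"
        else
            # </dev/null keeps mpirun from swallowing the file list on stdin.
            mpirun -n "$NPROCS" --hostfile "$HOSTFILE" mpibzip2 "$logfile" </dev/null
        fi
    done
}

# Only scan when the directory actually exists.
if [ -d "$LOG_DIR" ]; then
    compress_pending "$LOG_DIR"
fi
```

Running it from cron every few minutes would be one way to keep up with new logfiles.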
