Request a Web Runner

Go to the SlapOS web interface, and click on "New Service".

Select "SlapOS Web Runner" from the list of available software.

Choose the Web Runner version

Select the latest stable release.

Assign a title to the instance

Give the instance a meaningful title, such as "hadoop-webrunner".

Request from command line

As an alternative to the previous steps, you can request the instance from the command line on a SlapOS node.

    # slapos supply http://git.erp5.org/gitweb/slapos.git/blob_plain/slapos-0.204:/software/slaprunner/software.cfg COMP-xxxx
    # slapos request hadoop-webrunner http://git.erp5.org/gitweb/slapos.git/blob_plain/slapos-0.204:/software/slaprunner/software.cfg --node computer_guid=COMP-xxxx
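
The two commands share the same software release URL and computer identifier, so it can help to build them from a single set of values. The following is a minimal sketch that only assembles the command strings and does not run them. The helper names `software_url` and `build_commands` are hypothetical, and the URL layout is assumed from the example in this section rather than taken from any SlapOS documentation.

```python
import shlex

# Base of the software release URL, copied from the example commands.
BASE_URL = "http://git.erp5.org/gitweb/slapos.git/blob_plain"


def software_url(tag, profile="software/slaprunner/software.cfg"):
    """Build the software release URL for a release tag (layout assumed from the example)."""
    return f"{BASE_URL}/{tag}:/{profile}"


def build_commands(title, tag, computer_id):
    """Return the supply and request command lines as strings, without running them."""
    url = software_url(tag)
    supply = ["slapos", "supply", url, computer_id]
    request = ["slapos", "request", title, url,
               "--node", f"computer_guid={computer_id}"]
    # Quote every argument so the strings are safe to paste into a shell.
    return [" ".join(shlex.quote(part) for part in cmd) for cmd in (supply, request)]


if __name__ == "__main__":
    for line in build_commands("hadoop-webrunner", "slapos-0.204", "COMP-xxxx"):
        print(line)
```

Replace COMP-xxxx with the identifier of your own computer before running the printed commands as root on the node.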

Wait for the instance to be ready

When the instance has been deployed, its page will show the login credentials.

Log in to the Web Runner

Check out the "hadoop" branch

Go to "Manage your repository", enter "hadoop" as the branch name, and click "checkout".

Open Software Release

Select the hadoop-demo buildout profile

Compile with "Run software"

Deploy with "Run instance"

Log in to the Web Runner shell

Click the terminal icon at the top, and enter the shell_password that is shown on the credentials page.

Run the demo

Download the raw data:

    $ ./demo/gutenberg/data-download.sh

Load the data into Hadoop:

    $ ./demo/gutenberg/put-files.sh

Run the job:

    $ ./demo/gutenberg/run.sh

You will find the results in var/gutenberg/input, var/gutenberg/output, and var/gutenberg/raw-data.

The gutenberg demo is adapted from http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/ and performs a simple word count on text files.
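
To give an idea of what a streaming word count looks like, here is a minimal sketch of the map and reduce steps. It is not the code shipped in the demo: the function names are hypothetical, and `simulate_job` stands in for Hadoop by sorting the mapper output locally before reducing it.

```python
from itertools import groupby
from operator import itemgetter


def map_words(lines):
    """Map step: emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_counts(sorted_pairs):
    """Reduce step: sum the counts of consecutive pairs that share a word.

    The input must be sorted by word, which is what the shuffle phase
    guarantees between the mapper and the reducer.
    """
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def simulate_job(lines):
    """Run map, sort, and reduce locally, standing in for the Hadoop shuffle."""
    shuffled = sorted(map_words(lines), key=itemgetter(0))
    return dict(reduce_counts(shuffled))


if __name__ == "__main__":
    sample = ["the quick brown fox", "jumps over the lazy dog"]
    for word, count in simulate_job(sample).items():
        print(f"{word}\t{count}")
```

In a real streaming job the mapper and the reducer would be separate scripts that read lines from standard input and print tab-separated key and value pairs to standard output.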

A second demo is provided, which downloads a snapshot of Wikipedia and extracts the article titles. Both demos use the hadoop-streaming component and are written in Python.
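
As a rough illustration of title extraction, the sketch below pulls titles out of lines of text. It assumes the snapshot is a MediaWiki-style XML export in which each title sits on one line as `<title>...</title>`; the actual format used by the demo may differ, and the name `extract_titles` is hypothetical.

```python
import html
import re

# Assumption: each title appears on a single line as <title>...</title>.
TITLE_RE = re.compile(r"<title>(.*?)</title>")


def extract_titles(lines):
    """Yield the text of every <title> element found in the given lines."""
    for line in lines:
        for match in TITLE_RE.finditer(line):
            # Decode XML entities such as &amp; into plain characters.
            yield html.unescape(match.group(1))


if __name__ == "__main__":
    sample = [
        "  <page>",
        "    <title>Anarchism</title>",
        "    <title>AT&amp;T</title>",
        "  </page>",
    ]
    for title in extract_titles(sample):
        print(title)
```

Used as a streaming mapper, the same function would take standard input as its lines and print one title per line.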

The streaming component does not use Hadoop daemons, so there is no need for a service/ directory. If you do need the daemons, see bin/start-daemons.sh.