Request a Web Runner

Go to the SlapOS web interface and click on "New Service".
Select "SlapOS Web Runner" from the list of available software.

Choose the Web Runner version

Select the latest stable release.

Assign a title to the instance

Give the instance a meaningful title, such as "hadoop-webrunner".

Request from the command line

As an alternative to the previous steps, you can request the instance from the command line interface on a SlapOS node:

    # slapos supply http://git.erp5.org/gitweb/slapos.git/blob_plain/slapos-0.204:/software/slaprunner/software.cfg COMP-xxxx
    # slapos request hadoop-webrunner http://git.erp5.org/gitweb/slapos.git/blob_plain/slapos-0.204:/software/slaprunner/software.cfg --node computer_guid=COMP-xxxx

Wait for the instance to be ready

When the instance has been deployed, its page will show the login credentials.

Log in to the Web Runner

Check out the "hadoop" branch

Go to "Manage your repository", enter "hadoop" as the branch name, and click "checkout".

Open Software Release

Select the hadoop-demo buildout profile.

Compile with "Run software"

Deploy with "Run instance"

Log in to the Web Runner shell

Click on the terminal icon at the top, and enter the shell_password that is shown on the credentials page.

Run the demo

Download raw data

    $ ./demo/gutenberg/data-download.sh

Load data into hadoop

    $ ./demo/gutenberg/put-files.sh

Run the job

    $ ./demo/gutenberg/run.sh

You will find the results in var/gutenberg/input, var/gutenberg/output, and var/gutenberg/raw-data.

The gutenberg demo is adapted from http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/ and performs a simple word count on text files.

A second demo is provided, which downloads a snapshot of Wikipedia and extracts the article titles.

Both demos use the hadoop-streaming component and are written in Python. The streaming component does not use Hadoop daemons, so there is no need for a service/ directory. If you do need the daemons, see bin/start-daemons.sh.
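
To make the word count concrete, here is a minimal sketch of the map and reduce steps in the hadoop-streaming style. It is an illustration only, not the code shipped in the "hadoop" branch: the function names map_words, reduce_counts, and word_count are hypothetical, and the sorting that Hadoop performs between the two phases is emulated locally with sorted().

```python
from itertools import groupby


def map_words(lines):
    """Mapper: yield a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_counts(sorted_pairs):
    """Reducer: sum the counts of adjacent pairs that share the same word.

    The input must already be sorted by word, which is what a streaming
    reducer can rely on for the keys it receives.
    """
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run mapper, sort, and reducer in one process (local emulation only)."""
    return list(reduce_counts(sorted(map_words(lines))))
```

For example, word_count(["to be or not to be"]) returns [("be", 2), ("not", 1), ("or", 1), ("to", 2)]. In an actual streaming job the mapper and reducer would be separate scripts that read from standard input and write tab-separated lines to standard output; keeping them as plain functions here only makes the logic easy to try out in the Web Runner shell.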