I assume that you have:

- installed Hadoop 1.x on your machine
- added Hadoop to your path. To double-check, run the following in a terminal:

$ echo $PATH

This should return a line similar to:

/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/opt/X11/bin:/usr/local/git/bin:/usr/texbin:./:/usr/local/mysql/bin:/usr/local/mysql/bin:/usr/local/hadoop-1.2.1/bin:/usr/local/pig-0.12.0/bin

- set the environment variable JAVA_HOME, e.g.:

$ export JAVA_HOME=`/usr/libexec/java_home -v 1.6`

To get started, you need to make sure Hadoop is running. I have installed Hadoop at /usr/local/hadoop-1.2.1. To start the system, I run the following command:

$ /usr/local/hadoop-1.2.1/bin/start-all.sh

Hadoop runs on top of your operating system, and it does not 'see' your local files, so you have to copy all necessary data into the Hadoop file system (HDFS). But before you do that, you need to format the name node (this is only necessary once, before you start HDFS for the first time; formatting erases anything already stored in HDFS):

$ hadoop namenode -format

I have created a small version of the Wikipedia data set with only 10 documents and stored it in ~/Desktop/wikipedia.txt. To make it available under Hadoop, I used the following command:

$ hadoop fs -copyFromLocal ~/Desktop/wikipedia.txt wikipedia.txt

A new file called wikipedia.txt is then created inside HDFS. To see the files in HDFS:

$ hadoop fs -ls

which outputs:

Found 1 items
-rw-r--r--   1 julianafreire supergroup       5171 2014-04-01 22:40 /user/julianafreire/wikipedia.txt

Now you can run an example. I have provided a mapper and a reducer written in Python, called pmap.py and pred.py (a sketch of what such scripts look like appears at the end of this section). To fire off the job, run:

$ hadoop jar /usr/local/hadoop-1.2.1/contrib/streaming/hadoop-streaming-1.2.1.jar -file pmap.py -mapper pmap.py -file pred.py -reducer pred.py -input /user/julianafreire/wikipedia.txt -output /user/julianafreire/wikipedia.output

The output of this job is now in HDFS, in the directory /user/julianafreire/wikipedia.output. To list the output files:

$ hadoop fs -ls /user/julianafreire/wikipedia.output

You can also inspect the content of the files:

$ hadoop fs -cat wikipedia.output/*

If you'd like to copy the files over to your local directory:

$ hadoop fs -get /user/julianafreire/wikipedia.output output

This copies the outputs to the local directory "output". If you need to run the job again, you must first remove the output directory, since Hadoop refuses to overwrite an existing one:

$ hadoop fs -rmr /user/julianafreire/wikipedia.output

When you are done, remember to stop Hadoop:

$ stop-all.sh
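
I haven't reproduced pmap.py and pred.py above, so to give a feel for what a Hadoop Streaming mapper and reducer look like in Python, here is a minimal, hypothetical word-count pair; the real pmap.py and pred.py may well compute something different. Streaming scripts read records from stdin and write tab-separated key/value pairs to stdout, and Hadoop sorts the mapper's output by key before handing it to the reducer.

pmap.py:

    #!/usr/bin/env python
    # Hypothetical mapper: emit <word, 1> for every word in the input.
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print("%s\t%d" % (word, 1))

pred.py:

    #!/usr/bin/env python
    # Hypothetical reducer: sum the counts for each word.
    # Hadoop delivers the mapper output sorted by key, so all lines
    # for a given word arrive consecutively.
    import sys

    current_word = None
    current_count = 0

    for line in sys.stdin:
        word, count = line.strip().split("\t", 1)
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print("%s\t%d" % (current_word, current_count))
            current_word = word
            current_count = int(count)

    if current_word is not None:
        print("%s\t%d" % (current_word, current_count))

Both scripts need to be executable for the -mapper pmap.py / -reducer pred.py invocation above to work:

$ chmod +x pmap.py pred.py

Since a streaming job is conceptually just a Unix pipeline, you can also test the pair locally before submitting it to Hadoop:

$ cat ~/Desktop/wikipedia.txt | ./pmap.py | sort | ./pred.py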