Hadoop Hangover : How-to launch a hadoop cluster CDH4 [MRv1 / YARN + Ganglia] using Apache Whirr
This post explains how to launch a CDH4 MRv1 or CDH4 YARN cluster on EC2 instances. It is often said that Whirr lets you launch a cluster in a matter of 5 minutes! That is very true if, and only if, everything works out well! ;)
Hopefully, this article helps you in that regard.
So, let's row the boat...
- Download the stable version of Apache Whirr, i.e. whirr-0.8.1.tar.gz.
- Extract the tarball.
- Generate the key.
- Create a properties file that describes the cluster configuration you want to launch.
- Now let me tell you how to avoid getting headaches!
- Cluster name: keep your cluster name simple. Avoid names such as testCluster or testCluster1, i.e. no capital letters or numerals.
- Decide judiciously on the number of datanodes you want.
- Your launch may fail if Java is not installed, so make sure the image has Java. This properties file, however, takes care of that.
- It is better to go ahead with MRv1 for now and switch to MRv2 later, once a production-stable release is available.
- This is the minimal set of configurations for launching a Hadoop cluster, but you can do a lot of performance tuning on top of it.
- I launched this cluster from an EC2 instance. Initially I faced errors regarding the user. Setting the configuration below solved the problem.
- Set proper permissions on ~/.ssh and the whirr-0.8.1 folder before launching.
- Well, we are ready to launch the cluster. Name the properties file "whirr_cdh.properties".
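The steps above can be sketched roughly as the script below. Treat it as a minimal sketch under assumptions, not the exact file from this post: the cluster name, the instance counts, the key file name id_rsa_whirr, the WHIRR_HOME location, the LAUNCH switch, the choice of install_openjdk, and the guess that whirr.cluster-user is the setting that fixes the user errors are all illustrative. Check the property names against the recipes shipped inside the Whirr tarball before relying on them.

```shell
#!/bin/sh
# Sketch: write a minimal CDH4 MRv1 properties file for Whirr 0.8.1 and,
# only when LAUNCH=yes is set, generate the key and launch the cluster.
# All concrete values below are assumptions chosen for illustration.

PROPS="whirr_cdh.properties"
KEY="$HOME/.ssh/id_rsa_whirr"                    # hypothetical key file name
WHIRR_HOME="${WHIRR_HOME:-$PWD/whirr-0.8.1}"     # where the tarball was extracted

# The quoted 'EOF' keeps ${env:...} and ${sys:...} literal, so Whirr
# (not the shell) resolves them when it reads the file.
cat > "$PROPS" <<'EOF'
# Lowercase letters only: no capitals, no digits
whirr.cluster-name=hadoopcluster
# One master and two workers; adjust the datanode count as needed
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,2 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa_whirr
whirr.public-key-file=${whirr.private-key-file}.pub
# Install Java on the nodes (assumed install function)
whirr.java.install-function=install_openjdk
# Use the CDH4 packages
whirr.env.repo=cdh4
whirr.hadoop.install-function=install_cdh_hadoop
whirr.hadoop.configure-function=configure_cdh_hadoop
# Explicit cluster user (assumed to be the fix for the user errors)
whirr.cluster-user=whirr
EOF
echo "Wrote $PROPS"

if [ "${LAUNCH:-no}" = "yes" ]; then
    # Passwordless RSA key, created only if it does not exist yet
    [ -f "$KEY" ] || ssh-keygen -t rsa -P '' -f "$KEY"
    chmod 700 "$HOME/.ssh"
    "$WHIRR_HOME/bin/whirr" launch-cluster --config "$PROPS"
fi
```

Run it once without LAUNCH to inspect the generated file, then export AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY and run it again with LAUNCH=yes to actually start the instances.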
In the console you can see links to the NameNode and JobTracker web UIs. At the end, it also prints how to ssh to the instances.
- By now, the files should have been generated. You will be able to see these files: instances, hadoop-proxy.sh and hadoop-site.xml
- Starting the proxy
- Open another terminal, and type
- You should be able to access the HDFS.
- Alternatively, you can download the Hadoop tarball and launch with
- Okay! So I know that you will not be satisfied unless you have a web UI
- If you want to launch MRv2, use this.
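For the MRv2 case, a properties file along the following lines is one plausible starting point. This is a sketch from memory of the CDH4 YARN settings for Whirr, not the exact file from this post: the file name whirr_cdh_yarn.properties, the cluster name, the instance counts and the key file name are illustrative, and the role and function names should be verified against the recipes in your Whirr tarball.

```shell
#!/bin/sh
# Sketch: write a CDH4 MRv2 (YARN) properties file for Whirr.
# File name and all concrete values are assumptions for illustration.

PROPS="whirr_cdh_yarn.properties"

# Quoted 'EOF' so that ${env:...} and ${sys:...} reach Whirr unexpanded.
cat > "$PROPS" <<'EOF'
whirr.cluster-name=hadoopyarn
# The master runs the ResourceManager and the job history server;
# the workers run NodeManagers instead of TaskTrackers
whirr.instance-templates=1 hadoop-namenode+yarn-resourcemanager+mapreduce-historyserver,2 hadoop-datanode+yarn-nodemanager
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa_whirr
whirr.public-key-file=${whirr.private-key-file}.pub
# CDH4 packages, with MapReduce version 2 selected
whirr.env.repo=cdh4
whirr.env.mapreduce_version=2
whirr.hadoop.install-function=install_cdh_hadoop
whirr.hadoop.configure-function=configure_cdh_hadoop
whirr.mr_jobhistory.start-function=start_cdh_mr_jobhistory
whirr.yarn.configure-function=configure_cdh_yarn
whirr.yarn.start-function=start_cdh_yarn
EOF
echo "Wrote $PROPS"
```

The launch itself is unchanged: pass this file to launch-cluster with --config in place of the MRv1 one.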
Happy Learning! :)