Hadoop Multi Node Cluster Setup
What is Hadoop?
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
Installation Steps
Prerequisites:
- Make sure SSH is installed and active on both the master and the slave systems.
- To check whether it is installed and active, open a terminal and run:
sudo systemctl status ssh
- If it is not installed, install it with:
sudo apt install openssh-server openssh-client
- Java should be installed on both nodes (a quick check is shown below).
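A minimal way to verify Java, assuming an Ubuntu system (the OpenJDK 8 package named here is only one option; use whichever JDK you plan to point JAVA_HOME at later):
java -version
# If no JDK is present, one can be installed with, for example:
sudo apt install openjdk-8-jdk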
Steps to install the multi-node cluster:
- In this setup the server (master) IP is 192.168.0.186 and the node (slave) IP is 192.168.0.119.
Generation of Keys
- Generate an SSH key on each node and add every node's public key to /home/username/.ssh/authorized_keys on all nodes (create this file if it does not exist).
- Generate the key with ssh-keygen -t rsa and press Enter four times to accept the defaults.
- Exchange the keys between the master node and the slave node. For example, paste the slave node's key into the authorized_keys file on the master node, and vice versa (see the example commands after this list).
- You will find your public key in the /home/username/.ssh/id_rsa.pub file. Copy it and paste it on the master node, and vice versa.
- The key looks like: slave1 – ssh-rsa AAAAB3N…
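A minimal sketch of the key generation and exchange, assuming the same username on both machines and the IP addresses used in this guide (ssh-copy-id appends the key to authorized_keys for you, as an alternative to pasting it by hand):
# Generate a key pair on each node (press Enter at every prompt)
ssh-keygen -t rsa
# From the master, copy the public key to the slave, then repeat from the slave towards the master
ssh-copy-id username@192.168.0.119
# Verify that passwordless login now works
ssh username@192.168.0.119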
Download Hadoop & Untar the File
- Download Hadoop from the official website, or open the terminal and use the command:
wget https://downloads.apache.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
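The archive then needs to be extracted; a typical extraction into the home directory (which produces the hadoop-3.2.1 folder referenced in the paths below) would be:
tar -xzf hadoop-3.2.1.tar.gz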
Configure Hadoop Environment Variables (bashrc)
- Edit the .bashrc shell configuration file using a text editor of your choice (we will be using nano):
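For example (assuming the default .bashrc in your home directory):
nano ~/.bashrc
Then add the following lines to the end of the file: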
export HADOOP_HOME=/home/auriga/hadoop-3.2.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
It is vital to apply the changes to the current running environment by using the following command:
source ~/.bashrc
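As a quick sanity check that the new PATH is in effect, the bundled version command can be run (the exact output depends on your installation):
hadoop version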
Edit hadoop-env.sh File
- The hadoop-env.sh file serves as a master file to configure YARN, HDFS, MapReduce, and Hadoop-related project settings.
- When setting up the cluster, you need to define which Java implementation is to be utilized. Use the previously created $HADOOP_HOME variable to access the hadoop-env.sh file:
sudo nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Uncomment the $JAVA_HOME variable (i.e., remove the # sign) and add the full path to the JDK installation on your system.
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_311
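If you are not sure where your JDK is installed, one common way to locate it on Ubuntu is to resolve the real path of the java binary; the directory shown, minus the trailing bin/java (or jre/bin/java) part, is the value to use for JAVA_HOME:
readlink -f $(which java)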
Edit core-site.xml File
- The core-site.xml file defines HDFS and Hadoop core properties.
- Open the core-site.xml file in a text editor:
sudo nano $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.0.186:9000</value>
    <description>The name of the default file system</description>
  </property>
</configuration>
Edit hdfs-site.xml File
sudo nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
- The properties in the hdfs-site.xml file govern the location for storing node metadata, fsimage file, and edit log file.
- Configure the file by defining the NameNode and DataNode storage directories.
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/auriga/hadoop-3.2.1/namenode-dir</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/auriga/hadoop-3.2.1/datanode-dir</value>
  </property>
  <property>
    <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
    <value>false</value>
  </property>
</configuration>
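It can help to create these storage directories ahead of time and make sure the user that runs Hadoop owns them; a minimal sketch, assuming the same /home/auriga/hadoop-3.2.1 layout as above:
# On the master (NameNode)
mkdir -p /home/auriga/hadoop-3.2.1/namenode-dir
# On the slave (DataNode); the master needs it too if it also runs a DataNode
mkdir -p /home/auriga/hadoop-3.2.1/datanode-dir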
Edit mapred-site.xml File
- Use the following command to access the mapred-site.xml file and define MapReduce values:
sudo nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Edit yarn-site.xml File
- The yarn-site.xml file is used to define settings relevant to YARN.
- It contains configurations for the Node Manager, Resource Manager, Containers, and Application Master.
- Open the yarn-site.xml file in a text editor:
sudo nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>192.168.0.186</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>192.168.0.186:8032</value>
  </property>
</configuration>
Edit workers File
- The workers file ($HADOOP_HOME/etc/hadoop/workers) lists the nodes that should run the DataNode and NodeManager daemons. On the master, add the IP address of every worker node, one per line (here the master also acts as a worker):
192.168.0.186
192.168.0.119
Format the NameNode & Start the Cluster
- Before starting the cluster for the first time, format the HDFS NameNode from the Hadoop installation directory on the master:
bin/hadoop namenode -format
- Then start the HDFS and YARN daemons from the master:
sbin/start-all.sh
- Verify which daemons are running on each node with the jps command:
jps
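As a rough, hypothetical illustration (process IDs will differ), the master would be expected to list the NameNode, SecondaryNameNode and ResourceManager, plus a DataNode and NodeManager because its IP is also in the workers file, while the slave lists only DataNode, NodeManager and Jps:
2401 NameNode
2562 SecondaryNameNode
2733 ResourceManager
2845 DataNode
2957 NodeManager
3071 Jps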
We can also access the Hadoop NameNode Web UI by visiting the URL:
http://192.168.0.186:9870
The YARN Resource Manager is accessible on port 8088:
http://192.168.0.186:8088
Conclusion
- You have successfully installed Hadoop on Ubuntu and deployed it in distributed mode.
- A multi-node Hadoop deployment is an excellent starting point for exploring basic HDFS commands and gaining hands-on experience; a few examples follow.
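For instance, a few common HDFS commands to try once the cluster is up (the directory and file names here are only illustrative):
# Create a directory in HDFS
hdfs dfs -mkdir -p /user/auriga/demo
# Copy a local file into HDFS
hdfs dfs -put localfile.txt /user/auriga/demo/
# List the directory contents
hdfs dfs -ls /user/auriga/demo
# Print the file back to the terminal
hdfs dfs -cat /user/auriga/demo/localfile.txt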