Setting Up a Multi-Node Hadoop Cluster

April 24, 2025

Setting up a multi-node Hadoop cluster is a powerful way to handle large-scale data processing efficiently. This article will guide you through the essential steps to set up a Hadoop cluster, ensuring it’s scalable and secure.

Essential Prerequisites

Before diving into the setup, make sure you have the following prerequisites installed on your system:

- Hadoop 2.7.3, from the Apache Hadoop Releases page
- Java 1.8.0_111
- Apache Spark 1.6.2, from the Apache Spark Downloads page
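As a quick sanity check before continuing, you can confirm the JDK build on each node. This is a minimal sketch; it assumes the JDK has been unpacked to `/home/hduser/programming/jdk1.8.0_111`, the path referenced later in this article.

# Confirm the JDK reports version 1.8.0_111
/home/hduser/programming/jdk1.8.0_111/bin/java -version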

Mapping the Nodes

The first step involves configuring the `/etc/hosts` file on each node so that the IP address of every node maps to a hostname for easy reference.
vi /etc/hosts
Add a line for each node, substituting your own IP addresses:
<master-ip>   hadoop-master
<slave-1-ip>  hadoop-slave-1
<slave-2-ip>  hadoop-slave-2
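To confirm the mapping took effect, you can resolve each hostname from every node; this is just a sanity check, not part of the original walkthrough.

# Each command should print the IP address you entered in /etc/hosts
getent hosts hadoop-master
getent hosts hadoop-slave-1
getent hosts hadoop-slave-2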

Passwordless Login Through SSH

Setting up passwordless SSH login ensures seamless communication between nodes. This will be done using key-based authentication.
su hduser
ssh-keygen -t rsa
Then copy the public key to each of the slave nodes:
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@hadoop-slave-1
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@hadoop-slave-2
Ensure the following permissions are set correctly:

- `.ssh` directory: 700
- `authorized_keys`: 644
- `hduser` home directory: 755
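A short sketch of applying those permissions and confirming that passwordless login works; it assumes the home directory is `/home/hduser`.

# Run as hduser on each node
chmod 700 ~/.ssh
chmod 644 ~/.ssh/authorized_keys
chmod 755 /home/hduser
# Should print the remote hostname without prompting for a password
ssh hduser@hadoop-slave-1 hostname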

Setting Up Java Environment

Ensure the same Java environment (path) is set up on the master and slave nodes.
export JAVA_HOME=/home/hduser/programming/jdk1.8.0_111
Add this line to the `~/.bashrc` file and source it to apply the changes:
source ~/.bashrc
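To confirm the variable is picked up in a new shell (a quick sanity check):

echo $JAVA_HOME
$JAVA_HOME/bin/java -version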

Configuring Hadoop

Next, let’s proceed with the Hadoop configuration. We’ll start by installing Hadoop in the `/usr/local` directory.
su
mkdir /usr/local/hadoop
chown hduser /usr/local/hadoop
Set the `HADOOP_HOME` environment variable in the `~/.bashrc` file:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
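The walkthrough doesn't show unpacking the release itself. A minimal sketch, assuming the Hadoop 2.7.3 tarball from the prerequisites was downloaded to hduser's home directory (adjust the path to wherever you saved it):

# Run as root; --strip-components=1 drops the top-level hadoop-2.7.3 directory
tar -xzf /home/hduser/hadoop-2.7.3.tar.gz -C /usr/local/hadoop --strip-components=1
chown -R hduser /usr/local/hadoop
# Back as hduser, confirm the binaries are reachable through the new PATH
su hduser
source ~/.bashrc
hadoop version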
Now create a directory named `hadoop_data` in a location of your choice (see the sketch after the commands below), and under `HADOOP_HOME` create a `dfs` directory containing a `name` directory.
mkdir -p ${HADOOP_HOME}/dfs/name
hdfs namenode -format
Set the appropriate permissions for `name` and `dfs` to be `777`.
chmod -R 777 ${HADOOP_HOME}/dfs
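The location of `hadoop_data` is left up to you. As an illustration only (the path below is an assumption, not from the original article), you could keep it next to the Hadoop installation:

mkdir -p /usr/local/hadoop/hadoop_data
chown -R hduser /usr/local/hadoop/hadoop_data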
Edit the `hadoop-env.sh` file:
vi ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh
And add the following line:
export JAVA_HOME=/home/hduser/programming/jdk1.8.0_111
Edit the `core-site.xml` configuration file:
vi ${HADOOP_HOME}/etc/hadoop/core-site.xml
Your `core-site.xml` file should look like this:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master:54311</value>
    <description>URL for HDFS URI</description>
  </property>
  <property>
    <name></name>
    <value>hdfs://hadoop-master:54311</value>
    <description>Location for the HDFS data</description>
  </property>
</configuration>
Edit the `hdfs-site.xml` configuration file:
vi ${HADOOP_HOME}/etc/hadoop/hdfs-site.xml
Your `hdfs-site.xml` file should look like this:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop/dfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/dfs/data</value>
    <final>true</final>
  </property>
</configuration>
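Note that `hdfs-site.xml` points DataNodes at `/usr/local/hadoop/dfs/data`, a directory the earlier `mkdir` did not create. A small sketch of creating it on each node that will run a DataNode:

mkdir -p ${HADOOP_HOME}/dfs/data
chmod -R 777 ${HADOOP_HOME}/dfs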
Edit the `mapred-site.xml` configuration file:
vi ${HADOOP_HOME}/etc/hadoop/mapred-site.xml
Your `mapred-site.xml` file should look like this:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Edit the `yarn-site.xml` configuration file:
vi ${HADOOP_HOME}/etc/hadoop/yarn-site.xml
Your `yarn-site.xml` file should look like this:
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name></name>
    <value>120000</value>
  </property>
  <property>
    <name></name>
    <value>300000</value>
  </property>
</configuration>
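The same configuration files need to be present on every node. One way to push them from the master (a sketch; it assumes Hadoop is already installed at the same `/usr/local/hadoop` path on the slaves):

# Run on the master as hduser
rsync -av ${HADOOP_HOME}/etc/hadoop/ hduser@hadoop-slave-1:${HADOOP_HOME}/etc/hadoop/
rsync -av ${HADOOP_HOME}/etc/hadoop/ hduser@hadoop-slave-2:${HADOOP_HOME}/etc/hadoop/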
Finally, set the `HADOOP_USER_NAME` environment variable on all nodes and list the slave nodes in the `slaves` file on the master node.
export HADOOP_USER_NAME=hduser
vi ${HADOOP_HOME}/etc/hadoop/slaves
Add the slave hostnames:
hadoop-slave-1
hadoop-slave-2
Remove the `localhost` entry from the `slaves` file if it is present.
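With the configuration distributed to all nodes, a typical next step (not covered in the original walkthrough) is to start the daemons from the master and confirm the slaves have joined:

# Run on the master as hduser
${HADOOP_HOME}/sbin/start-dfs.sh
${HADOOP_HOME}/sbin/start-yarn.sh
# jps should show NameNode/ResourceManager on the master and DataNode/NodeManager on the slaves
jps
# Reports the DataNodes currently registered with the NameNode
hdfs dfsadmin -report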

Conclusion

Proper configuration of a Hadoop cluster ensures that your big data processing tasks are performed efficiently and reliably. Make sure that both the Hadoop and Spark installations are kept consistent across the master and slave nodes to facilitate seamless data processing and analysis.

Note: This setup will provide a foundation for a robust Hadoop environment, but may require further adjustments based on your specific use case and infrastructure.