Configure Hadoop and start cluster services using Ansible Playbook

What is Hadoop?

Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.

Hadoop Cluster Architecture

A Hadoop cluster is a special type of computational cluster designed for storing and analyzing huge amounts of unstructured data in a distributed computing environment.
Two terms come up whenever we talk about Hadoop clusters: cluster and node. They are defined as follows:

  • A collection of nodes is what we call the cluster.
  • A node is a point of intersection/connection within a network, i.e., a server.

The NameNode is the HDFS master. It manages the file system namespace, regulates client access to files, and coordinates with the DataNodes (the HDFS slaves) while data is copied or MapReduce operations run.

A DataNode manages the storage attached to the node it runs on. A cluster usually has many DataNodes, one per slave node.
In other words, the NameNode is the node that knows where the files are to be found in HDFS, and the DataNodes are the nodes that hold the data of those files.
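
To make the master/slave split concrete, here is a minimal sketch of an Ansible inventory in YAML that groups hosts by role. The group names namenode and datanodes and the address 192.168.1.12 are hypothetical; 192.168.1.11 is the master address used in the data node configuration later in this post. The playbooks in this post target localhost, so this inventory is only an illustration of the layout, not something they require.

```yaml
# Hypothetical inventory: one NameNode (master) and one DataNode (slave).
all:
  children:
    namenode:            # assumed group name for the HDFS master
      hosts:
        192.168.1.11:
    datanodes:           # assumed group name for the HDFS slaves
      hosts:
        192.168.1.12:    # made-up address, add one entry per DataNode
```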

First, download the Hadoop and JDK packages:

hadoop-1.2.1-1.x86_64.rpm
jdk-8u171-linux-x64.rpm

CONFIGURING THE MASTER NODE!

Now let us create a playbook to configure the Master Node:

---
- name: Install Hadoop on master
  hosts: localhost
  tasks:
    # Stage both RPMs on the target before installing them
    - name: Copy Hadoop package
      copy:
        src: hadoop-1.2.1-1.x86_64.rpm
        dest: /tmp/hadoop-1.2.1-1.x86_64.rpm
    - name: Copy Java package
      copy:
        src: jdk-8u171-linux-x64.rpm
        dest: /tmp/jdk-8u171-linux-x64.rpm
    - name: Install Hadoop package
      command: rpm -i /tmp/hadoop-1.2.1-1.x86_64.rpm --force
      run_once: yes
    - name: Install Java package
      command: rpm -i /tmp/jdk-8u171-linux-x64.rpm --force
      run_once: yes
    # Both configuration files belong in /etc/hadoop
    - name: Copy core-site.xml configuration file
      template:
        src: core-site.xml.j2
        dest: /etc/hadoop/core-site.xml
      notify: Start hadoop services
    - name: Copy hdfs-site.xml configuration file
      template:
        src: hdfs-site.xml.j2
        dest: /etc/hadoop/hdfs-site.xml
      notify: Start hadoop services
    - name: Create master node directory
      file:
        path: /nn1
        state: directory
      notify: Format namenode
  # Handlers run in the order listed here, so the NameNode is
  # formatted before it is started
  handlers:
    - name: Format namenode
      command: hadoop namenode -format -y
    - name: Start hadoop services
      command: hadoop-daemon.sh start namenode
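
The install tasks above call rpm -i --force on every run, even when the package is already present. One possible way to make that step idempotent is sketched below: query the RPM database first and install only when the query fails. This is an assumption about how you might harden the playbook, not part of the original setup, and it assumes the RPM registers itself under the package name hadoop. The variable name hadoop_rpm is made up for this example.

```yaml
# Sketch: these two tasks could replace the "Install hadoop package" task.
- name: Check whether the hadoop package is already installed
  command: rpm -q hadoop          # assumes the package name is "hadoop"
  register: hadoop_rpm            # hypothetical variable name
  failed_when: false              # a non-zero exit just means "not installed"
  changed_when: false             # a query never changes the system

- name: Install hadoop package only when it is missing
  command: rpm -i /tmp/hadoop-1.2.1-1.x86_64.rpm --force
  when: hadoop_rpm.rc != 0
```

The same pattern would apply to the JDK package once you know the name it registers under (rpm -qp on the file prints it).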

Create two template files, core-site.xml.j2 and hdfs-site.xml.j2:

core-site.xml.j2 file with the following content:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://{{ ansible_facts.default_ipv4.address }}:9001</value>
  </property>
</configuration>
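
The template relies on the Ansible fact ansible_facts.default_ipv4.address to fill in the NameNode address. If you want to see what value that fact holds on your machine before rendering the file, a small throwaway play like the following sketch can print it. The play and task names are made up for illustration.

```yaml
---
- name: Inspect the address used for fs.default.name
  hosts: localhost
  # Facts are gathered by default, so ansible_facts is populated here
  tasks:
    - name: Show the default IPv4 address
      debug:
        var: ansible_facts.default_ipv4.address
```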

hdfs-site.xml.j2 file with the following content:

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/nn1</value>
  </property>
</configuration>

Now let us run the playbook!

ansible-playbook playbook1.yml

To check whether the master node has started, run the following command:

jps

Voilà! The NameNode is configured and started.
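
If you would rather have Ansible perform this check than read the jps output by eye, the following sketch runs jps and fails when no NameNode process is listed. It assumes jps is on the PATH of the user running the play; the play name and the variable jps_out are hypothetical. The leading space in the search string keeps a SecondaryNameNode line from being counted as a NameNode.

```yaml
---
- name: Verify that the NameNode is running
  hosts: localhost
  tasks:
    - name: List running Java processes
      command: jps
      register: jps_out           # hypothetical variable name
      changed_when: false         # listing processes changes nothing

    - name: Fail unless a NameNode process is listed
      assert:
        that:
          # jps prints "<pid> <name>", so match the name after the space
          - "' NameNode' in jps_out.stdout"
```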

CONFIGURING THE DATA NODE!

Now let us create a playbook to configure the Data Node:

---
- name: Install Hadoop on datanode
  hosts: localhost
  tasks:
    # Stage both RPMs on the target before installing them
    - name: Copy Hadoop package
      copy:
        src: hadoop-1.2.1-1.x86_64.rpm
        dest: /tmp/hadoop-1.2.1-1.x86_64.rpm
    - name: Copy Java package
      copy:
        src: jdk-8u171-linux-x64.rpm
        dest: /tmp/jdk-8u171-linux-x64.rpm
    - name: Install Hadoop package
      command: rpm -i /tmp/hadoop-1.2.1-1.x86_64.rpm --force
      run_once: yes
    - name: Install Java package
      command: rpm -i /tmp/jdk-8u171-linux-x64.rpm --force
      run_once: yes
    # Both configuration files belong in /etc/hadoop
    - name: Copy core-site.xml configuration file
      template:
        src: core-site.xml.j2
        dest: /etc/hadoop/core-site.xml
      notify: Start hadoop services
    - name: Copy hdfs-site.xml configuration file
      template:
        src: hdfs-site.xml.j2
        dest: /etc/hadoop/hdfs-site.xml
      notify: Start hadoop services
    - name: Create data node directory
      file:
        path: /dn1
        state: directory
  handlers:
    - name: Start hadoop services
      command: hadoop-daemon.sh start datanode

Create two template files, core-site.xml.j2 and hdfs-site.xml.j2:

core-site.xml.j2 file with the following content:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.11:9001</value>
  </property>
</configuration>

hdfs-site.xml.j2 file with the following content:

<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/dn1</value>
  </property>
</configuration>

Now let us run the playbook!

ansible-playbook playbook2.yml

To check whether the data node has started, run the following command:

jps
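
jps only shows that the DataNode process exists on this machine. To see whether it has also registered with the NameNode, you can ask HDFS for its cluster report. The sketch below wraps the hadoop dfsadmin -report command in a small play and prints the result; the play name and the variable dfs_report are made up, and it assumes core-site.xml on this node already points at the running NameNode.

```yaml
---
- name: Show the HDFS cluster report
  hosts: localhost
  tasks:
    - name: Ask the NameNode for its report
      command: hadoop dfsadmin -report
      register: dfs_report        # hypothetical variable name
      changed_when: false         # read-only query

    - name: Print the report
      debug:
        var: dfs_report.stdout_lines
```

A DataNode that has joined the cluster should appear in the printed report with its address and capacity.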

SUCCESS! WE HAVE CONFIGURED AND STARTED A HADOOP CLUSTER :)

THANK YOU!