Configure Hadoop and start cluster
services using Ansible Playbook

Rishyani Bhattacharya
5 min read · Mar 17, 2021

What is Hadoop?

Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.

Hadoop Cluster Architecture

A Hadoop cluster is a special type of computational cluster designed for storing and analyzing huge amounts of unstructured data in a distributed computing environment.
Whenever we talk about Hadoop clusters, two main terms come up: cluster and node. Defining them:

  • A collection of nodes is what we call the cluster.
  • A node is a point of connection within a network, i.e., a server.

The NameNode is the HDFS master: it manages the file system namespace, regulates client access to files, and coordinates with the DataNodes (the HDFS slaves) when data is copied or MapReduce jobs are run.

The DataNodes manage the storage attached to the nodes they run on; there are typically many of them, one DataNode per slave in the cluster.

In other words, the NameNode knows where the files are located in HDFS, while the DataNodes hold the actual file data.
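
This walkthrough configures everything on localhost, but if the master and data nodes were separate machines, a minimal Ansible inventory could look like the sketch below (the group names and IPs here are hypothetical):

[namenode]
192.168.1.11

[datanode]
192.168.1.12
192.168.1.13

With such an inventory you would replace hosts: localhost in the playbooks that follow with hosts: namenode or hosts: datanode.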

First, download the Hadoop and JDK packages:

hadoop-1.2.1-1.x86_64.rpm
jdk-8u171-linux-x64.rpm
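
As a rough sketch, the Hadoop RPM can be pulled from the Apache archive, assuming the 1.2.1 release is still hosted there; the Oracle JDK RPM must be downloaded manually from Oracle's Java SE 8 archive page after accepting the license:

wget https://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1-1.x86_64.rpm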

CONFIGURING THE MASTER NODE!

Now let us create a playbook to configure the Master Node:

---
- name: Install Hadoop on master
  hosts: localhost
  tasks:
    - name: Copy Hadoop package
      copy:
        src: hadoop-1.2.1-1.x86_64.rpm
        dest: /tmp/hadoop-1.2.1-1.x86_64.rpm
    - name: Copy java package
      copy:
        src: jdk-8u171-linux-x64.rpm
        dest: /tmp/jdk-8u171-linux-x64.rpm
    - name: Install hadoop package
      command: rpm -i /tmp/hadoop-1.2.1-1.x86_64.rpm --force
      run_once: yes
    - name: Install java package
      command: rpm -i /tmp/jdk-8u171-linux-x64.rpm --force
      run_once: yes
    - name: Copy core-site.xml configuration file
      template:
        src: core-site.xml.j2
        dest: /etc/hadoop/core-site.xml
      notify: Start hadoop services
    - name: Copy hdfs-site.xml configuration file
      template:
        src: hdfs-site.xml.j2
        dest: /etc/hadoop/hdfs-site.xml
      notify: Start hadoop services
    - name: Create namenode directory
      file:
        path: /nn1
        state: directory
      notify: Format namenode
  handlers:
    - name: Format namenode
      # pipe Y to confirm the format non-interactively (a pipe needs shell, not command)
      shell: echo Y | hadoop namenode -format
    - name: Start hadoop services
      command: hadoop-daemon.sh start namenode
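
A note on the handlers: Ansible runs a handler only when a task that notifies it reports a change, and only once at the end of the play. That means the NameNode is formatted only on the first run, when the /nn1 directory is actually created; re-running the playbook leaves an already-formatted namespace untouched.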

Create two template files, core-site.xml.j2 and hdfs-site.xml.j2.

core-site.xml.j2 file with the following content:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://{{ ansible_facts.default_ipv4.address }}:9001</value>
  </property>
</configuration>
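
The {{ ... }} expression is Jinja2 templating: the Ansible template module fills it in with the managed node's primary IPv4 address (gathered as an Ansible fact), so the rendered file on the master contains a concrete address, for example:

<value>hdfs://192.168.1.11:9001</value>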

hdfs-site.xml.j2 file with the following content:

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/nn1</value>
  </property>
</configuration>

Now let us run the playbook!

ansible-playbook playbook1.yml

Now, to check whether the NameNode has started, run the following command:

jps
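
If the daemon came up, jps should list a NameNode process alongside its process ID; the output below is only illustrative (the PIDs will differ on your machine):

2884 NameNode
3110 Jps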

Voilà! The NameNode is configured and started.

CONFIGURING THE DATA NODE!

Now let us create a playbook to configure the Data Node:

---
- name: Install Hadoop on datanode
  hosts: localhost
  tasks:
    - name: Copy Hadoop package
      copy:
        src: hadoop-1.2.1-1.x86_64.rpm
        dest: /tmp/hadoop-1.2.1-1.x86_64.rpm
    - name: Copy java package
      copy:
        src: jdk-8u171-linux-x64.rpm
        dest: /tmp/jdk-8u171-linux-x64.rpm
    - name: Install hadoop package
      command: rpm -i /tmp/hadoop-1.2.1-1.x86_64.rpm --force
      run_once: yes
    - name: Install java package
      command: rpm -i /tmp/jdk-8u171-linux-x64.rpm --force
      run_once: yes
    - name: Copy core-site.xml configuration file
      template:
        src: core-site.xml.j2
        dest: /etc/hadoop/core-site.xml
      notify: Start hadoop services
    - name: Copy hdfs-site.xml configuration file
      template:
        src: hdfs-site.xml.j2
        dest: /etc/hadoop/hdfs-site.xml
      notify: Start hadoop services
    - name: Create datanode directory
      file:
        path: /dn1
        state: directory
  handlers:
    - name: Start hadoop services
      command: hadoop-daemon.sh start datanode

Again, create two template files, core-site.xml.j2 and hdfs-site.xml.j2.

core-site.xml.j2 file with the following content (here the IP address must be the NameNode's address; 192.168.1.11 is the master in this example):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.11:9001</value>
  </property>
</configuration>

hdfs-site.xml.j2 file with the following content (dfs.data.dir is the property that sets a DataNode's storage directory in Hadoop 1.x):

<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/dn1</value>
  </property>
</configuration>

Now let us run the playbook!

ansible-playbook playbook2.yml

Now, to check whether the DataNode has started, run the following command:

jps
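
Beyond jps, you can confirm that the DataNode has actually registered with the NameNode by running the HDFS admin report on the master; it prints the cluster's configured capacity and each connected datanode:

hadoop dfsadmin -report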

SUCCESSFUL! WE HAVE CONFIGURED AND STARTED A HADOOP CLUSTER :)

THANK YOU!
