Tuesday 17 July 2012

Step-by-Step Installation of Hadoop on Linux

1) First of all, install Java JDK 1.6.
In my case, JAVA_HOME=/usr/java/jdk1.6.0_33
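
If the JDK is not installed yet, install it from your distribution's packages or Oracle's installer, then confirm the version (the path below is just my setup; yours may differ):

$ java -version
# should report something like: java version "1.6.0_33"
$ ls /usr/java/
# jdk1.6.0_33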

2) Create a dedicated Hadoop user and group to separate the Hadoop installation from other software applications.
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser

This will add the user hduser and the group hadoop to your local machine.
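
A quick way to confirm the user and group exist (an optional sanity check):

$ id hduser
# uid=1001(hduser) gid=1001(hadoop) groups=1001(hadoop)  -- numeric ids will vary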

3) Configure SSH
Name the nodes master and slave so it is easy to distinguish the two:

master's IP: 192.168.0.106
slave's IP: 192.168.0.122


For Master

root@master:~$ su - hduser
hduser@master:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
f6:de:f8:ad:ff:b4:aa:06:63:57:2f:ae:f3:bf:ce:93 hduser@master


Copy the public key into the authorized_keys file of both machines:
[hduser@master ~]$ scp /home/hduser/.ssh/id_rsa.pub 192.168.0.106:$HOME/.ssh/authorized_keys

id_rsa.pub                                    100%  394     0.4KB/s   00

[hduser@master ~]$ scp /home/hduser/.ssh/id_rsa.pub 192.168.0.122:$HOME/.ssh/authorized_keys

id_rsa.pub                                    100%  394     0.4KB/s   00

For Slave

[root@slave ~]# su - hduser
[hduser@slave ~]$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.


Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
f6:de:f8:ad:ff:b4:aa:06:63:57:2f:ae:f3:bf:ce:93 hduser@slave



Copy the key to the master's authorized_keys file:
[hduser@slave ~]$ scp /home/hduser/.ssh/id_rsa.pub 192.168.0.106:$HOME/.ssh/authorized_keys2
hduser@192.168.0.106's password:
id_rsa.pub                                    100%  394     0.4KB/s   00:00  
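Before moving on, confirm that passwordless SSH now works in both directions; each login below should succeed without a password prompt (accept the host-key fingerprint on first connect):

hduser@master:~$ ssh slave
hduser@slave:~$ ssh master
# if you are still prompted for a password, tighten the permissions:
$ chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys
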

4) In /etc/hosts of both machines, add:
192.168.0.106      master
192.168.0.122      slave

Caution: Avoid using localhost here; it creates problems later.
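
A quick check on both machines that the names resolve to the right addresses:

$ ping -c 1 master
$ ping -c 1 slave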

5) You need to disable IPv6
Check the status:
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
A return value of 0 means IPv6 is currently enabled, but it should be disabled.
We will make Hadoop prefer IPv4 by adding the following to conf/hadoop-env.sh:

export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
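
Note that this setting only makes Hadoop's JVMs prefer IPv4; it does not disable IPv6 system-wide. If you prefer disabling IPv6 at the OS level (optional; these sysctl keys apply on reasonably recent kernels and take effect after a reboot), add the following to /etc/sysctl.conf:

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

After a reboot, the cat command above should return 1.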

6) Download Hadoop and extract its contents to /usr/local/hadoop:

$ cd /usr/local
$ sudo tar xzf hadoop-1.0.3.tar.gz
$ sudo mv hadoop-1.0.3 hadoop
$ sudo chown -R hduser:hadoop hadoop
 
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/java/jdk1.6.0_33
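
These exports only last for the current shell session. To make them permanent for hduser, you can append them to ~/.bashrc (adding Hadoop's bin directory to PATH is optional but convenient):

$ cat >> ~/.bashrc <<'EOF'
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/java/jdk1.6.0_33
export PATH=$PATH:$HADOOP_HOME/bin
EOF
$ source ~/.bashrc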

Make a directory /app/hadoop/tmp to store temporary data
 
$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown hduser:hadoop /app/hadoop/tmp
# ...and if you want to tighten up security, chmod from 755 to 750...
$ sudo chmod 750 /app/hadoop/tmp 
 
7) In /usr/local/hadoop/conf/hadoop-env.sh
For the master, add or uncomment these:
export JAVA_HOME=/usr/java/jdk1.6.0_33

# Extra Java CLASSPATH elements.  Optional.
 export HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.92.1.jar:$HBASE_HOME/conf:$HBASE_HOME/lib/zookeeper-3.4.3.jar


# The maximum amount of heap to use, in MB. Default is 1000.
 export HADOOP_HEAPSIZE=2000

 export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
# export HADOOP_TASKTRACKER_OPTS=
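
Note: the HADOOP_CLASSPATH line above references $HBASE_HOME, which is only defined if you have already exported it (for example in hduser's ~/.bashrc). If HBase is not installed yet, skip that line for now or set the variable first; the path below is an assumption matching the HBase post linked at the end:

export HBASE_HOME=/usr/local/hbase   # adjust to wherever HBase is extracted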


 
For the slaves, add or uncomment these:

export JAVA_HOME=/usr/java/jdk1.6.0_33

export HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.92.1.jar:$HBASE_HOME/conf:$HBASE_HOME/lib/zookeeper-3.4.3.jar

export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
 
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
# export HADOOP_TASKTRACKER_OPTS=


8) In /usr/local/hadoop/conf/core-site.xml
For the master, add these properties inside the <configuration> element:


<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
</property>

 
For the slaves, add these properties inside the <configuration> element:
 
 

<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
 
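For reference, each of these conf/*.xml files is a complete XML document; the property blocks shown in steps 8-10 all go between the <configuration> tags, like this:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
  </property>
  <!-- ...remaining properties for this file... -->
</configuration>
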
 
9) In /usr/local/hadoop/conf/mapred-site.xml
For the master, add:

<property>
  <name>mapred.job.tracker</name>
  <value>master:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.</description>
</property>

<property>
  <name>mapred.map.tasks</name>
  <value>20</value>
  <description>The default number of map tasks per job.</description>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>4</value>
  <description>The default number of reduce tasks per job.</description>
</property>

<property>
  <name>mapred.local.dir</name>
  <value>/app/hadoop/tmp/mapred</value>
  <description>The local directory where MapReduce stores intermediate
  data files.</description>
</property>


 
For the slaves, add:


<property>
  <name>mapred.job.tracker</name>
  <value>master:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.</description>
</property>


10) In /usr/local/hadoop/conf/hdfs-site.xml

For the master, add:

<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.</description>
</property>

<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>

<property>
  <name>dfs.name.dir</name>
  <value>/app/hadoop/tmp/dfs/name</value>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/app/hadoop/tmp/dfs/data</value>
</property>

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp/hadoop</value>
</property>

<property>
  <name>mapred.system.dir</name>
  <value>/hadoop/mapred/system</value>
  <final>true</final>
</property>

 
For the slaves, add:

    
<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>

<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.</description>
</property>

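The start-up scripts also read two plain-text files on the master: conf/masters lists the host that runs the SecondaryNameNode, and conf/slaves lists the hosts that run the DataNode/TaskTracker daemons. For this two-node setup (listing the master in conf/slaves as well, so it also stores data, is a choice rather than a requirement) they would look like:

hduser@master:~$ cat /usr/local/hadoop/conf/masters
master
hduser@master:~$ cat /usr/local/hadoop/conf/slaves
master
slave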
 
11) The first step to starting up your Hadoop installation is formatting the Hadoop filesystem, which is implemented on top of the local filesystems of your cluster. You need to do this the first time you set up a Hadoop cluster.
Caution: Do not format a running Hadoop filesystem, as you will lose all the data currently in the cluster (in HDFS).
To format the filesystem (which simply initializes the directory specified by the dfs.name.dir variable), run this command on the master only:

hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format
The output will look like this (the sample below was captured on a 0.20.2 installation, so the version strings will differ from 1.0.3):
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop namenode -format
10/05/08 16:59:56 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntu/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/05/08 16:59:56 INFO namenode.FSNamesystem: fsOwner=hduser,hadoop
10/05/08 16:59:56 INFO namenode.FSNamesystem: supergroup=supergroup
10/05/08 16:59:56 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/05/08 16:59:56 INFO common.Storage: Image file of size 96 saved in 0 seconds.
10/05/08 16:59:57 INFO common.Storage: Storage directory .../hadoop-hduser/dfs/name has been successfully formatted.
10/05/08 16:59:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
hduser@ubuntu:/usr/local/hadoop$
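
Once formatting succeeds, the cluster can be started from the master with the stock Hadoop 1.x scripts; jps (shipped with the JDK) is a quick way to confirm which daemons came up:

hduser@master:~$ /usr/local/hadoop/bin/start-dfs.sh
hduser@master:~$ /usr/local/hadoop/bin/start-mapred.sh
hduser@master:~$ jps
# expect NameNode, SecondaryNameNode and JobTracker here
# (plus DataNode and TaskTracker if the master is also listed in conf/slaves)
hduser@slave:~$ jps
# expect DataNode and TaskTracker here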


For installing HBase, refer to
http://biforbeginners.blogspot.in/2012/07/step-by-step-installation-of-hbase.html
 

 
