Tuesday 17 July 2012

Step-by-Step Installation of Hadoop on Linux

1) First of all, install Java JDK 1.6.
In my case, JAVA_HOME=/usr/java/jdk1.6.0_33
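
If the JDK is not installed yet, install it from your distribution's packages or Oracle's installer, then confirm the version (the path below is just my setup; yours may differ):

$ java -version
# should report something like: java version "1.6.0_33"
$ ls /usr/java/
# jdk1.6.0_33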

2) Create a dedicated Hadoop user and group to separate the Hadoop installation from other software applications.
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser

This will add the user hduser and the group hadoop to your local machine.
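
A quick way to confirm the user and group exist (an optional sanity check):

$ id hduser
# uid=1001(hduser) gid=1001(hadoop) groups=1001(hadoop)  -- numeric ids will vary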

3) Configure SSH
Name the nodes master and slave so it is easy to distinguish the two:

master's IP: 192.168.0.106
slave's IP: 192.168.0.122


For Master

root@master:~$ su - hduser
hduser@master:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
f6:de:f8:ad:ff:b4:aa:06:63:57:2f:ae:f3:bf:ce:93 hduser@master


Copy the public key into the authorized_keys file of both machines:
[hduser@master ~]$ scp /home/hduser/.ssh/id_rsa.pub 192.168.0.106:$HOME/.ssh/authorized_keys

id_rsa.pub                                    100%  394     0.4KB/s   00

[hduser@master ~]$ scp /home/hduser/.ssh/id_rsa.pub 192.168.0.122:$HOME/.ssh/authorized_keys

id_rsa.pub                                    100%  394     0.4KB/s   00

For Slave

[root@slave ~]# su - hduser
[hduser@slave ~]$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.


Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
f6:de:f8:ad:ff:b4:aa:06:63:57:2f:ae:f3:bf:ce:93 hduser@slave



Copy the key to the master's authorized_keys file:
[hduser@slave ~]$ scp /home/hduser/.ssh/id_rsa.pub 192.168.0.106:$HOME/.ssh/authorized_keys2
hduser@192.168.0.106's password:
id_rsa.pub                                    100%  394     0.4KB/s   00:00  
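Before moving on, confirm that passwordless SSH now works in both directions; each login below should succeed without a password prompt (accept the host-key fingerprint on first connect):

hduser@master:~$ ssh slave
hduser@slave:~$ ssh master
# if you are still prompted for a password, tighten the permissions:
$ chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys
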

4) In /etc/hosts of both machines, add:
192.168.0.106      master
192.168.0.122      slave

Caution: Avoid using localhost here; it creates problems later.
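
A quick check on both machines that the names resolve to the right addresses:

$ ping -c 1 master
$ ping -c 1 slave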

5) You need to disable IPv6
Check the status:
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
A return value of 0 means IPv6 is currently enabled, but it should be disabled.
We will make Hadoop prefer IPv4 by adding the following to conf/hadoop-env.sh:

export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
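
Note that this setting only makes Hadoop's JVMs prefer IPv4; it does not disable IPv6 system-wide. If you prefer disabling IPv6 at the OS level (optional; these sysctl keys apply on reasonably recent kernels and take effect after a reboot), add the following to /etc/sysctl.conf:

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

After a reboot, the cat command above should return 1.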

6) Download Hadoop and extract its contents to /usr/local/hadoop:

$ cd /usr/local
$ sudo tar xzf hadoop-1.0.3.tar.gz
$ sudo mv hadoop-1.0.3 hadoop
$ sudo chown -R hduser:hadoop hadoop
 
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/java/jdk1.6.0_33
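
These exports only last for the current shell session. To make them permanent for hduser, you can append them to ~/.bashrc (adding Hadoop's bin directory to PATH is optional but convenient):

$ cat >> ~/.bashrc <<'EOF'
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/java/jdk1.6.0_33
export PATH=$PATH:$HADOOP_HOME/bin
EOF
$ source ~/.bashrc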

Make a directory /app/hadoop/tmp to store temporary data
 
$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown hduser:hadoop /app/hadoop/tmp
# ...and if you want to tighten up security, chmod from 755 to 750...
$ sudo chmod 750 /app/hadoop/tmp 
 
7) In /usr/local/hadoop/conf/hadoop-env.sh
For the master, add or uncomment these:
export JAVA_HOME=/usr/java/jdk1.6.0_33

# Extra Java CLASSPATH elements.  Optional.
 export HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.92.1.jar:$HBASE_HOME/conf:$HBASE_HOME/lib/zookeeper-3.4.3.jar


# The maximum amount of heap to use, in MB. Default is 1000.
 export HADOOP_HEAPSIZE=2000

 export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
# export HADOOP_TASKTRACKER_OPTS=
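
Note: the HADOOP_CLASSPATH line above references $HBASE_HOME, which is only defined if you have already exported it (for example in hduser's ~/.bashrc). If HBase is not installed yet, skip that line for now or set the variable first; the path below is an assumption matching the HBase post linked at the end:

export HBASE_HOME=/usr/local/hbase   # adjust to wherever HBase is extracted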


 
For the slaves, add or uncomment these:

export JAVA_HOME=/usr/java/jdk1.6.0_33

export HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.92.1.jar:$HBASE_HOME/conf:$HBASE_HOME/lib/zookeeper-3.4.3.jar

export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
 
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
# export HADOOP_TASKTRACKER_OPTS=


8) In /usr/local/hadoop/conf/core-site.xml
For the master, add these properties inside the <configuration> element:


<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
</property>

 
For the slaves, add these properties inside the <configuration> element:
 
 

<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
 
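For reference, each of these conf/*.xml files is a complete XML document; the property blocks shown in steps 8-10 all go between the <configuration> tags, like this:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
  </property>
  <!-- ...remaining properties for this file... -->
</configuration>
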
 
9) In /usr/local/hadoop/conf/mapred-site.xml
For the master, add:

<property>
  <name>mapred.job.tracker</name>
  <value>master:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.</description>
</property>

<property>
  <name>mapred.map.tasks</name>
  <value>20</value>
  <description>The default number of map tasks per job.</description>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>4</value>
  <description>The default number of reduce tasks per job.</description>
</property>

<property>
  <name>mapred.local.dir</name>
  <value>/app/hadoop/tmp/mapred</value>
  <description>The local directory where MapReduce stores intermediate
  data files.</description>
</property>


 
For the slaves, add:


<property>
  <name>mapred.job.tracker</name>
  <value>master:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.</description>
</property>


10) In /usr/local/hadoop/conf/hdfs-site.xml

For the master, add:

<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.</description>
</property>

<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>

<property>
  <name>dfs.name.dir</name>
  <value>/app/hadoop/tmp/dfs/name</value>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/app/hadoop/tmp/dfs/data</value>
</property>

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp/hadoop</value>
</property>

<property>
  <name>mapred.system.dir</name>
  <value>/hadoop/mapred/system</value>
  <final>true</final>
</property>

 
For the slaves, add:

    
<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>

<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.</description>
</property>

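The start-up scripts also read two plain-text files on the master: conf/masters lists the host that runs the SecondaryNameNode, and conf/slaves lists the hosts that run the DataNode/TaskTracker daemons. For this two-node setup (listing the master in conf/slaves as well, so it also stores data, is a choice rather than a requirement) they would look like:

hduser@master:~$ cat /usr/local/hadoop/conf/masters
master
hduser@master:~$ cat /usr/local/hadoop/conf/slaves
master
slave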
 
11) The first step to starting up your Hadoop installation is formatting the Hadoop filesystem, which is implemented on top of the local filesystems of your cluster. You need to do this the first time you set up a Hadoop cluster.
Caution: Do not format a running Hadoop filesystem, as you will lose all the data currently in the cluster (in HDFS).
To format the filesystem (which simply initializes the directory specified by the dfs.name.dir variable), run this command on the master only:

hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format
The output will look like this (the sample below was captured on a 0.20.2 installation, so the version strings will differ from 1.0.3):
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop namenode -format
10/05/08 16:59:56 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntu/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/05/08 16:59:56 INFO namenode.FSNamesystem: fsOwner=hduser,hadoop
10/05/08 16:59:56 INFO namenode.FSNamesystem: supergroup=supergroup
10/05/08 16:59:56 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/05/08 16:59:56 INFO common.Storage: Image file of size 96 saved in 0 seconds.
10/05/08 16:59:57 INFO common.Storage: Storage directory .../hadoop-hduser/dfs/name has been successfully formatted.
10/05/08 16:59:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
hduser@ubuntu:/usr/local/hadoop$
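
Once formatting succeeds, the cluster can be started from the master with the stock Hadoop 1.x scripts; jps (shipped with the JDK) is a quick way to confirm which daemons came up:

hduser@master:~$ /usr/local/hadoop/bin/start-dfs.sh
hduser@master:~$ /usr/local/hadoop/bin/start-mapred.sh
hduser@master:~$ jps
# expect NameNode, SecondaryNameNode and JobTracker here
# (plus DataNode and TaskTracker if the master is also listed in conf/slaves)
hduser@slave:~$ jps
# expect DataNode and TaskTracker here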


For installing HBase, refer to
http://biforbeginners.blogspot.in/2012/07/step-by-step-installation-of-hbase.html
 

 
