The following figure shows how I plan to distribute my cloud cluster.
If you followed the previous sections, you now have a Docker container with a Hadoop environment. To build the Hadoop cluster, you need to copy its image to every other endpoint you want to distribute to.
Save the image as a tar file.
$ sudo docker save <image_repository:tag> > XXXX.tar
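If the image is large, you can optionally compress it to speed up the transfer. This is only a sketch, assuming gzip is available on both machines; the compressed file would then be loaded on the target node by piping gunzip into docker load instead of the plain docker load shown later.
$ sudo docker save <image_repository:tag> | gzip > XXXX.tar.gz
## on the receiving node
$ gunzip -c XXXX.tar.gz | sudo docker load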
After the export finishes, I use scp to transfer the image to the other nodes in the cluster.
$ scp -P [port] XXXX.tar [account]@[domain]:<where you want to store>
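To push the image to every node at once, a simple loop over the hostnames (the same ones listed in /etc/hosts below) can be used; the port, account, and destination path are placeholders, as above.
$ for node in master2 slave1 slave2 slave3; do scp -P [port] XXXX.tar [account]@$node:<where you want to store>; done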
Now, switch to master2 to show how to load the image we just transferred from master.
$ sudo docker load < XXXX.tar
After loading the tar file, we can check that the image was imported into the local repository.
$ sudo docker images
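docker images also accepts a repository[:tag] argument if you only want to list the image we just loaded:
$ sudo docker images <image_repository:tag>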
Now we can use this image to distribute the cluster. In this example, I write a simple shell script to run the Docker image instead of using a Dockerfile, because I have not yet learned how to use a Dockerfile to build a Docker container.
Because the --link option does not fit my situation, I simply use port mapping so that each container can connect to the others over the network.
$ vi bootstrap.sh
-> sudo docker run -i -t \
     -p 2122:2122 -p 50020:50020 -p 50090:50090 -p 50070:50070 -p 50010:50010 -p 50075:50075 \
     -p 8030:8030 -p 8031:8031 -p 8032:8032 -p 8033:8033 -p 8040:8040 -p 8042:8042 -p 8088:8088 \
     -p 9000:9000 -p 9001:9001 -p 49707:49707 \
     -p 2181:2181 -p 2888:2888 -p 3888:3888 \
     -p 9090:9090 -p 16020:16020 -p 60000:60000 -p 60010:60010 -p 60020:60020 \
     -h <hostname> <repository:tag>
wq!
$ sh bootstrap.sh
In this example, I use master2 as the hostname and publish all the needed ports from the container to the host machine.
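To confirm the container came up and that the ports were really published, a quick check from the host machine can help; 50070 below is just one example from the mapping above (the HDFS NameNode web UI port in Hadoop 2.x).
$ sudo docker ps
## check one of the published ports; use netstat -tlnp on older systems
$ sudo ss -tlnp | grep 50070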
Now we can boot the Hadoop cluster. There are a few steps to finish before running start-dfs.sh.
$ source /etc/profile
$ sudo vi /etc/hosts
-> add the IP address and hostname of every node
## for example
127.0.0.5 master
10.0.0.2 master2
10.0.0.3 slave1
10.0.0.4 slave2
10.0.0.5 slave3
wq!
## restart ssh twice
$ sudo service ssh restart
$ sudo service ssh restart
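Hadoop's start scripts rely on passwordless SSH between the nodes, so it is worth a quick sanity check that the hosts entries and SSH both work before formatting HDFS. This assumes the SSH keys have already been distributed.
$ ssh master2 hostname
$ ssh slave1 hostname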
## format hdfs namenode
$ hdfs namenode -format
$ <HADOOP_HOME>/sbin/start-dfs.sh
$ <HADOOP_HOME>/sbin/start-yarn.sh
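Once the daemons are up, the standard report commands can verify that the other nodes actually joined HDFS and YARN.
$ hdfs dfsadmin -report
$ yarn node -list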
## make the root directory of hbase
$ hadoop fs -mkdir /hbase
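You can list the HDFS root to confirm the directory was created.
$ hadoop fs -ls /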
## start zookeeper
$ zkServer.sh start
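zkServer.sh also has a status subcommand, which is a quick way to confirm ZooKeeper is actually running (and, in a multi-node ensemble, whether this node is the leader or a follower).
$ zkServer.sh status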
## start hbase
$ start-hbase.sh
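Besides jps, the HBase shell can confirm that the master and region servers registered. A minimal check, piping a single command into the shell:
$ echo 'status' | hbase shell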
## TEST
$ jps
If it succeeds, you will see the following processes running on master