Setting Up the Big Data Environment for the IoT Cloud Platform

I. Environment software overview:

        1. Built on the safecloud test environment (hadoop-2.7.2 + hbase-1.2.5 + phoenix-4.7.0 + zookeeper-3.4.6 + kafka_2.10-0.10.2.0 + spark-2.0.0-bin-hadoop2.7 + scala-2.11.8)

        2. Note on minor-version compatibility:

              phoenix-4.10.0-HBase-1.2-bin behaves differently from phoenix-4.7.0-HBase-1.1-bin.tar: rows inserted with the former show up in HBase columns as raw bytes, while rows inserted with the latter show up as text. So the Phoenix version used to insert data must always match the Phoenix version used to query it. (This issue once ran a poor programmer completely ragged.)
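A quick guard against this pitfall (a sketch, assuming the /usr/lib/*-current symlink layout adopted later in this guide): before debugging "corrupt" data, compare the Phoenix server jar deployed under hbase/lib with the client distribution you query from, and make sure the versions match:

[root@safecloud-master ~]# ls /usr/lib/hbase-current/lib | grep phoenix    # version of the deployed server jar

[root@safecloud-master ~]# ls /usr/lib/phoenix-current/    # version of the client distribution you query from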

        3. For reference, the version set used by Alibaba Cloud EMR (optional reading):

apache-hive-2.0.1-bin

apache-storm-1.0.1

hadoop-2.7.2

hbase-1.1.1

oozie-4.2.0

phoenix-4.7.0-HBase-1.1-bin

pig-0.14.0

presto-server-0.147

spark-2.0.2-bin-hadoop2.7

sqoop-1.4.6

tez-0.8.4

zeppelin-0.6.2-bin-all

zookeeper-3.4.6

II. Environment installation:

1. Edit /etc/hosts (vim /etc/hosts) and map all the cluster machines. If a machine sits on a VPC network it has no NIC with a fixed public IP, so bind the private (intranet) address instead. (Important enough to say three times!!!)

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

172.16.53.143 safecloud-master
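A quick check that the mapping took effect (the hostname should resolve to the private address, not a public one):

[root@safecloud-master ~]# ping -c 1 safecloud-master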

2. Unpack all the packages to be installed:

[root@safecloud-master apps]# tar -xvf hbase-1.2.5.tar

[root@safecloud-master apps]# tar -xvf kafka_2.10-0.10.2.0.tar

[root@safecloud-master apps]# tar -xvf scala-2.11.8.tar

[root@safecloud-master apps]# tar -xvf spark-2.0.0-bin-hadoop2.7.tar

[root@safecloud-master apps]# tar -xvf zookeeper-3.4.6.tar

[root@safecloud-master apps]# tar -xvf hadoop-2.7.2.tar
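If all the tarballs sit directly in /opt/apps, a loop saves some typing (a sketch, equivalent to the individual commands above):

[root@safecloud-master apps]# for f in *.tar; do tar -xvf "$f"; done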

3. Create symlinks for all installed software (borrowed from Alibaba Cloud EMR; optional, but it means swapping a package version later only requires repointing the symlink instead of editing the environment-variable configuration; see the sketch after the verification step below)

[root@safecloud-master apps]# ln -s /opt/apps/hbase-1.2.5/ /usr/lib/hbase-current

[root@safecloud-master apps]# ln -s /opt/apps/kafka_2.10-0.10.2.0/ /usr/lib/kafka-current

[root@safecloud-master apps]# ln -s /opt/apps/scala-2.11.8/ /usr/lib/scala-current

[root@safecloud-master apps]# ln -s /opt/apps/spark-2.0.0-bin-hadoop2.7/ /usr/lib/spark-current

[root@safecloud-master apps]# ln -s /opt/apps/zookeeper-3.4.6/ /usr/lib/zookeeper-current

[root@safecloud-master apps]# ln -s /opt/apps/hadoop-2.7.2/ /usr/lib/hadoop-current

Verify the symlinks were created:

[root@safecloud-master apps]# cd /usr/lib/

[root@safecloud-master apps]# ll
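To illustrate the upgrade benefit mentioned in step 3: moving to a new version later only means repointing the symlink (hbase-1.2.6 here is a hypothetical newer package, used purely for illustration):

[root@safecloud-master apps]# ln -sfn /opt/apps/hbase-1.2.6/ /usr/lib/hbase-current    # -f replaces the old link, -n treats the link itself as the target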

4. Configure environment variables (combined with step 3, this makes later upgrades much easier):

[root@safecloud-master apps]# vim /etc/bashrc

a. Append the environment variables:

export JAVA_HOME=/usr/java/jdk1.8.0_162

export JRE_HOME=/usr/java/jdk1.8.0_162/jre

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib

export SCALA_HOME=/usr/lib/scala-2.11.8

export SPARK_HOME=/usr/lib/spark-current

export HADOOP_HOME=/usr/lib/hadoop-current

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"

export HBASE_HOME=/usr/lib/hbase-current

export HBASE_BIN=$HBASE_HOME/bin

export PHOENIX_HOME=/usr/lib/phoenix-current

export KAFKA_HOME=/usr/lib/kafka-current

export ZOOKEEPER_HOME=/usr/lib/zookeeper-current

export PATH=$PATH:$HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/sbin:$HADOOP_HOME/bin:$KAFKA_HOME/bin:$ZOOKEEPER_HOME/bin:$PHOENIX_HOME/bin:$HBASE_HOME:$HBASE_BIN:./

b. Make them take effect immediately:

[root@safecloud-master apps]# source /etc/bashrc

c. Log out of the current SSH session (it caches the old environment variables, which can cause problems when starting services later):

[root@safecloud-master apps]# exit
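After logging back in, verify the new variables are active in the fresh session:

[root@safecloud-master ~]# echo $HADOOP_HOME    # should print /usr/lib/hadoop-current

[root@safecloud-master ~]# which zkServer.sh    # should resolve via the new PATH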

5. Install ZooKeeper first (Kafka, HBase, and the rest all depend on it)

a. Edit the main ZooKeeper configuration:

[root@safecloud-master apps]# cd /usr/lib/zookeeper-current/conf

[root@safecloud-master apps]# mv  zoo_sample.cfg zoo.cfg

[root@safecloud-master apps]# vim zoo.cfg

dataDir=/mnt/disk1/zookeeper/data

dataLogDir=/mnt/disk1/zookeeper/logs

clientPort=3181    # the default is 2181; change it here if needed
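Create the data and log directories declared above before the first start (safer than relying on ZooKeeper to create them itself):

[root@safecloud-master conf]# mkdir -p /mnt/disk1/zookeeper/data /mnt/disk1/zookeeper/logs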

b. Start the ZooKeeper service:

[root@safecloud-master apps]# cd ../bin          (recommended: a small ZooKeeper gotcha is that its log file is created in whatever directory you run zkServer.sh from)

[root@safecloud-master apps]# ./zkServer.sh start
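To confirm ZooKeeper is serving requests, use its status command or the standard ruok four-letter word (the latter assumes nc is installed; 3181 is the clientPort configured above):

[root@safecloud-master bin]# ./zkServer.sh status

[root@safecloud-master bin]# echo ruok | nc localhost 3181    # a healthy server answers "imok"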

6. Next, install Kafka

a. Edit the main Kafka configuration:

[root@safecloud-master apps]# vim /usr/lib/kafka-current/config/server.properties

############################# Log Basics #############################

# A comma separated list of directories under which to store log files

log.dirs=/mnt/disk1/kafka/kafka-logs

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).

# This is a comma separated list of host:port pairs, each corresponding to a zk

# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".

# You can also append an optional chroot string to the urls to specify the

# root directory for all kafka znodes.

zookeeper.connect=safecloud-master:3181

# Timeout in ms for connecting to zookeeper

zookeeper.connection.timeout.ms=6000

host.name=safecloud-master    # Note: must match the mapping in /etc/hosts

b. Start Kafka (the startup script is on PATH thanks to the environment variables, so it can be run directly):

[root@safecloud-master apps]# kafka-server-start.sh -daemon /usr/lib/kafka-current/config/server.properties
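A quick smoke test with the topic scripts on PATH (test-topic is a throwaway name; the ZK address matches server.properties above):

[root@safecloud-master apps]# kafka-topics.sh --create --zookeeper safecloud-master:3181 --replication-factor 1 --partitions 1 --topic test-topic

[root@safecloud-master apps]# kafka-topics.sh --list --zookeeper safecloud-master:3181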

7. Next, install HBase + Phoenix

a. The Phoenix server side is embedded in HBase as a jar, so simply copy the Phoenix server jar into hbase/lib:

[root@safecloud-master apps]# cp /usr/lib/phoenix-current/phoenix-4.7.0-HBase-1.1-server.jar /usr/lib/hbase-current/lib

b. Edit the HBase configuration files:

[root@safecloud-master apps]# cd /usr/lib/hbase-current/conf/

[root@safecloud-master apps]# vim hbase-env.sh

# Tell HBase whether it should manage its own instance of Zookeeper or not.

export HBASE_MANAGES_ZK=false    # explicitly disable the bundled ZK; required in pseudo-distributed or fully distributed mode

[root@safecloud-master apps]# vim hbase-site.xml

Main hbase-site.xml settings:

<configuration>

        <!-- ========Basic Settings======== -->

        <!--<property>

                <name>hbase.rootdir</name>

                <value>hdfs://emr-cluster/hbase</value>

        </property>-->

        <!-- must be set to true in distributed or pseudo-distributed mode -->

        <property>

                <name>hbase.cluster.distributed</name>

                <value>true</value>

        </property>

        <!-- IP the HMaster binds to on startup; listening on all interfaces (0.0.0.0) is recommended, because a server on a VPC network has no NIC with a public IP -->

        <property>

                <name>hbase.master.ipc.address</name>

                <value>0.0.0.0</value>

        </property>

        <!-- in distributed mode, explicitly specify the external ZK quorum; separate multiple entries with commas -->

        <property>

                <name>hbase.zookeeper.quorum</name>

                <value>safecloud-master:3181,safecloud-worker1:3181,safecloud-worker2:3181</value>

        </property>

        <property>

                <name>zookeeper.znode.parent</name>

                <value>/hbase</value>

        </property>

        <property>

                <name>hbase.zookeeper.property.dataDir</name>

                <value>/mnt/disk1/hbase-zk/data</value>

        </property>

        <property>

                <name>zookeeper.session.timeout</name>

                <value>90000</value>

        </property>

</configuration>

c. Notes on the other configuration files

/usr/lib/hbase-current/conf/regionservers    lists the slave (RegionServer) nodes; it must be set in fully distributed mode (starting the master then starts the slaves automatically, though this has not been tested here)

d. Start the HBase service (the startup script is on PATH, so it can be run directly):

[root@safecloud-master apps]# start-hbase.sh

[root@safecloud-master apps]# jps

11578 HMaster

11727 HRegionServer

2364 QuorumPeerMain

2669 Kafka

If HMaster and HRegionServer appear, HBase started successfully.
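Optionally verify end to end, first through the HBase shell and then through Phoenix (sqlline.py ships with the Phoenix distribution; 3181 is the ZK client port set earlier):

[root@safecloud-master apps]# echo "status" | hbase shell

[root@safecloud-master apps]# /usr/lib/phoenix-current/bin/sqlline.py safecloud-master:3181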

8. Next, install Spark

a. Edit the Spark configuration files:

[root@safecloud-master apps]# cd /usr/lib/spark-current/conf

[root@safecloud-master conf]# cp spark-env.sh.template spark-env.sh

[root@safecloud-master conf]# cp slaves.template slaves

[root@safecloud-master conf]# vim slaves

# A Spark Worker will be started on each of the machines listed below.

#localhost

safecloud-master    # comment out localhost and list the worker hosts

[root@safecloud-master conf]# vim spark-env.sh

# add the Hadoop-related classpath (it seems to run even without this setting)

SPARK_DIST_CLASSPATH=$(/usr/lib/hadoop-current/bin/hadoop classpath)

b. Start the Spark master and workers:

[root@safecloud-master conf]# /usr/lib/spark-current/sbin/start-all.sh

[root@safecloud-master apps]# jps

11578 HMaster

11727 HRegionServer

2364 QuorumPeerMain

2669 Kafka

12152 Master

12250 Worker

If Master and Worker appear, Spark started successfully.
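An optional smoke test using the SparkPi example bundled with Spark (run-example defaults to local mode when no master is given):

[root@safecloud-master conf]# /usr/lib/spark-current/bin/run-example SparkPi 10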

c. Submit a spark-submit job (it consumes data from all the Kafka topics):

# create a directory for the job

[root@safecloud-master conf]# mkdir -p /usr/lib/spark-current/work/consumer-topics

[root@safecloud-master work]# cd /usr/lib/spark-current/work/consumer-topics

[root@safecloud-master consumer-topics]# vim  start-submit.sh

#export logpath=$(cd "`dirname "$0"`"/..; pwd)/customer-topics.out

nohup ../../bin/spark-submit --master local --deploy-mode client \
  --jars kafka2hbase-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
  --class com.sm.stream.Consumer \
  kafka2hbase.jar \
  localhost:3181 localhost:3181 spark-stream-test \
  safeclound_air,safeclound_smoke,safeclound_ray,safeclound_atmos,safeclound_water,safeclound_hum,safeclound_elec_power_facter,safeclound_elec_power,safeclound_elec_phase_v,safeclound_elec_phase_i,safeclound_elec_line_v,safeclound_elec_freq,safeclound_demand,safeclound_freq_rate,safeclound_phase_v_rate,safeclound_rate_v,safeclound_rate_i,safeclound_harmonic_v,safeclound_harmonic_i,safeclound_water_njyc,safeclound_ammeter,safeclound_heartbeat,safeclound_door_ctrl,safeclound_co1,safeclound_pump_controller,safeclound_ggj_controller,safeclound_compensation_module,safeclound_hcho,safeclound_tvoc,safeclound_co2,safeclound_largestDemand,safeclound_lineTemperature \
  3 >/mnt/disk1/spark-log/consumer-topics.out 2>&1 &

Save with :wq

Launch argument order:   kafka2hbase.jar  <kafka ZK list>  <hbase ZK list>  <kafka group ID>  <topic list>  <thread count>

[root@safecloud-master consumer-topics]# chmod 700 start-submit.sh

[root@safecloud-master consumer-topics]# ./start-submit.sh    # launch the Spark job

[root@safecloud-master consumer-topics]# jps

11578 HMaster

11727 HRegionServer

2364 QuorumPeerMain

2669 Kafka

12152 Master

12250 Worker

16889 SparkSubmit

If SparkSubmit appears, the Spark job started successfully.

d. Check the logs:

[root@safecloud-master consumer-topics]# tail -f /mnt/disk1/spark-log/consumer-topics.out

e. kafka2hbase.jar source code references

https://wang4ever.lofter.com/post/1cca927e_12646c79

Netdisk: https://pan.baidu.com/s/1xloEbdsRp4fIvVSvdlUU3A


9. Configure all services to start at boot

[root@safecloud-master ~]# vim /etc/rc.local

cd /usr/lib/zookeeper-current/bin

./zkServer.sh start


cd /usr/lib/kafka-current/bin

./kafka-server-start.sh -daemon ../config/server.properties


/usr/lib/hbase-current/bin/start-hbase.sh


cd /usr/lib/spark-current/sbin

./start-all.sh


cd /usr/lib/spark-current/work/consumer-topics

./start-submit.sh


cd /opt/apps/tomcat8-dataopen

./bin/startup.sh
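Note: on CentOS 7, /etc/rc.d/rc.local is not executable by default, so none of the entries above will run at boot until it is made executable:

[root@safecloud-master ~]# chmod +x /etc/rc.d/rc.local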



At this point, the infrastructure of the big data cloud platform is fully set up.

