IoT Cloud Platform Big Data Environment Setup
I. Environment and software overview:
1. Dependencies: safecloud test environment (hadoop-2.7.2 + hbase-1.2.5 + phoenix-4.7.0 + zookeeper-3.4.6 + kafka_2.10-0.10.2.0 + spark-2.0.0-bin-hadoop2.7 + scala-2.11.8)
2. Note on minor-version compatibility:
phoenix-4.10.0-HBase-1.2-bin behaves differently from phoenix-4.7.0-HBase-1.1-bin.tar: data inserted with the former shows up in HBase columns as raw bytes, while data inserted with the latter shows up as text. The Phoenix version used for inserts must therefore match the Phoenix version used for queries. (This issue once cost a developer a great deal of painful debugging.)
3. Aliyun EMR version dependencies, for reference only (may be skipped):
apache-hive-2.0.1-bin
apache-storm-1.0.1
hadoop-2.7.2
hbase-1.1.1
oozie-4.2.0
phoenix-4.7.0-HBase-1.1-bin
pig-0.14.0
presto-server-0.147
spark-2.0.2-bin-hadoop2.7
sqoop-1.4.6
tez-0.8.4
zeppelin-0.6.2-bin-all
zookeeper-3.4.6
II. Installation:
1. Edit /etc/hosts (vim /etc/hosts) and map every machine in the cluster (machines on a VPC network have no NIC with a fixed public IP, so bind the internal addresses instead). (Important enough to say three times!)
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.16.53.143 safecloud-master
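A quick way to verify the mapping is to look the hostname up directly in a hosts-format file. The sketch below runs against a scratch copy so it is self-contained; in practice point it at /etc/hosts:

```shell
# Sanity-check sketch: print the IP mapped to a hostname in a hosts-format
# file. A scratch copy is used here; substitute /etc/hosts in practice.
hosts=$(mktemp)
cat > "$hosts" <<'EOF'
127.0.0.1 localhost localhost.localdomain
172.16.53.143 safecloud-master
EOF
awk -v h=safecloud-master '$2 == h { print $1 }' "$hosts"
```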
2. Unpack all the packages to be installed:
[root@safecloud-master apps]# tar -xvf hbase-1.2.5.tar
[root@safecloud-master apps]# tar -xvf kafka_2.10-0.10.2.0.tar
[root@safecloud-master apps]# tar -xvf scala-2.11.8.tar
[root@safecloud-master apps]# tar -xvf spark-2.0.0-bin-hadoop2.7.tar
[root@safecloud-master apps]# tar -xvf zookeeper-3.4.6.tar
[root@safecloud-master apps]# tar -xvf hadoop-2.7.2.tar
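The six tar commands above can be collapsed into one loop. A self-contained sketch (run against a scratch directory with a fabricated package here; point it at /opt/apps in practice):

```shell
# Sketch: unpack every .tar in a directory in one loop.
# Demonstrated in a scratch directory with a fake package; use /opt/apps in practice.
apps=$(mktemp -d)
mkdir -p "$apps/pkg-1.0" && touch "$apps/pkg-1.0/README"
tar -cf "$apps/pkg-1.0.tar" -C "$apps" pkg-1.0   # fabricated package for the demo
rm -rf "$apps/pkg-1.0"
for f in "$apps"/*.tar; do
  tar -xf "$f" -C "$apps"
done
ls "$apps/pkg-1.0"
```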
3. Create a symlink for each installed package (borrowed from Aliyun EMR; optional, but it means a version upgrade only requires repointing the symlink rather than editing the environment-variable configuration):
[root@safecloud-master apps]# ln -s /opt/apps/hbase-1.2.5/ /usr/lib/hbase-current
[root@safecloud-master apps]# ln -s /opt/apps/kafka_2.10-0.10.2.0/ /usr/lib/kafka-current
[root@safecloud-master apps]# ln -s /opt/apps/scala-2.11.8/ /usr/lib/scala-current
[root@safecloud-master apps]# ln -s /opt/apps/spark-2.0.0-bin-hadoop2.7/ /usr/lib/spark-current
[root@safecloud-master apps]# ln -s /opt/apps/zookeeper-3.4.6/ /usr/lib/zookeeper-current
[root@safecloud-master apps]# ln -s /opt/apps/hadoop-2.7.2/ /usr/lib/hadoop-current
Check the created symlinks:
[root@safecloud-master apps]# cd /usr/lib/
[root@safecloud-master apps]# ll
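The payoff of the "-current" convention is that an upgrade is a single atomic repoint of one link. A self-contained sketch in a scratch directory (the version numbers are illustrative):

```shell
# Sketch: the "-current" symlink pattern. Upgrading means repointing one link,
# so every *_HOME variable keeps working. Scratch directory, illustrative names.
base=$(mktemp -d)
mkdir -p "$base/hbase-1.2.5" "$base/hbase-1.2.6"
ln -s "$base/hbase-1.2.5" "$base/hbase-current"
# To upgrade: -f replaces the existing link, -n treats the link itself as the
# target instead of following it into the old directory.
ln -sfn "$base/hbase-1.2.6" "$base/hbase-current"
readlink "$base/hbase-current"
```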
4. Configure the environment variables (together with step 3, this makes later upgrades easier):
[root@safecloud-master apps]# vim /etc/bashrc
a. Append the following environment variables:
export JAVA_HOME=/usr/java/jdk1.8.0_162
export JRE_HOME=/usr/java/jdk1.8.0_162/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export SCALA_HOME=/usr/lib/scala-current
export SPARK_HOME=/usr/lib/spark-current
export HADOOP_HOME=/usr/lib/hadoop-current
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export HBASE_HOME=/usr/lib/hbase-current
export HBASE_BIN=$HBASE_HOME/bin
export PHOENIX_HOME=/usr/lib/phoenix-current
export KAFKA_HOME=/usr/lib/kafka-current
export ZOOKEEPER_HOME=/usr/lib/zookeeper-current
export PATH=$PATH:$HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/sbin:$HADOOP_HOME/bin:$KAFKA_HOME/bin:$ZOOKEEPER_HOME/bin:$PHOENIX_HOME/bin:$HBASE_BIN
b. Apply them immediately:
[root@safecloud-master apps]# source /etc/bashrc
c. Log out of the current SSH session (the session caches the old environment variables, which can cause problems when starting services):
[root@safecloud-master apps]# exit
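A typo in any *_HOME path only surfaces later, when a service fails to start. A hypothetical helper (not part of the original setup) that flags PATH entries missing on disk, demonstrated on a fabricated PATH value:

```shell
# Hypothetical helper: list PATH entries that do not exist on disk,
# to catch a mistyped *_HOME early. Demonstrated on a fabricated value.
check_path() {
  printf '%s\n' "$1" | tr ':' '\n' | while read -r d; do
    [ -n "$d" ] && [ ! -d "$d" ] && echo "missing: $d"
  done
}
check_path "/usr/bin:/no/such/dir-xyz"
```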
5. Install ZooKeeper first (Kafka, HBase, and the rest all depend on it):
a. Edit the main ZooKeeper configuration:
[root@safecloud-master apps]# cd /usr/lib/zookeeper-current/conf
[root@safecloud-master apps]# mv zoo_sample.cfg zoo.cfg
[root@safecloud-master apps]# vim zoo.cfg
dataDir=/mnt/disk1/zookeeper/data
dataLogDir=/mnt/disk1/zookeeper/logs
clientPort=3181 # if you need to change it from the default
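For a real ensemble (rather than a single node), zoo.cfg additionally needs a server.N line per member plus a matching myid file on each node. A hedged sketch, with hostnames taken from the hbase.zookeeper.quorum value used later in this document:

```
# Ensemble sketch (only needed for a multi-node ZooKeeper; 2888/3888 are the
# conventional peer and leader-election ports)
server.1=safecloud-master:2888:3888
server.2=safecloud-worker1:2888:3888
server.3=safecloud-worker2:2888:3888
# On each node, write that node's own id into dataDir/myid, e.g. on server.1:
#   echo 1 > /mnt/disk1/zookeeper/data/myid
```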
b. Start the ZooKeeper service:
[root@safecloud-master apps]# cd ../bin (recommended; a small ZooKeeper gotcha is that zkServer.sh writes its log file into whatever directory it is run from)
[root@safecloud-master apps]# ./zkServer.sh start
6. Install Kafka next:
a. Edit the main Kafka configuration:
[root@safecloud-master apps]# vim /usr/lib/kafka-current/config/server.properties
############################# Log Basics #############################
# A comma separated list of directories under which to store log files
log.dirs=/mnt/disk1/kafka/kafka-logs
############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=safecloud-master:3181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
host.name=safecloud-master # note: must match the entry in /etc/hosts
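Before starting the broker, it is worth confirming that the keys this setup relies on are actually present. A hypothetical pre-flight check, run against a scratch copy of the file here (point it at the real server.properties in practice):

```shell
# Hypothetical pre-flight check: verify server.properties defines the keys
# this setup depends on. A scratch copy stands in for the real file.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
log.dirs=/mnt/disk1/kafka/kafka-logs
zookeeper.connect=safecloud-master:3181
host.name=safecloud-master
EOF
for key in log.dirs zookeeper.connect host.name; do
  grep -q "^$key=" "$cfg" && echo "$key: ok" || echo "$key: MISSING"
done
```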
b. Start Kafka (thanks to the environment variables, the startup script can be called directly):
[root@safecloud-master apps]# kafka-server-start.sh -daemon /usr/lib/kafka-current/config/server.properties
7. Install HBase + Phoenix:
a. The Phoenix server side is embedded into HBase as a jar, so simply copy the Phoenix server jar into hbase/lib:
[root@safecloud-master apps]# cp /usr/lib/phoenix-current/phoenix-4.7.0-HBase-1.1-server.jar /usr/lib/hbase-current/lib
b. Edit the HBase configuration files:
[root@safecloud-master apps]# cd /usr/lib/hbase-current/conf/
[root@safecloud-master apps]# vim hbase-env.sh
# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false # explicitly disable HBase's bundled ZooKeeper; mandatory for pseudo-distributed and fully distributed setups
[root@safecloud-master apps]# vim hbase-site.xml
The main hbase-site.xml settings:
<configuration>
<!-- ========Basic Settings======== -->
<!--<property>
<name>hbase.rootdir</name>
<value>hdfs://emr-cluster/hbase</value>
</property>-->
<!-- Must be set to true in distributed or pseudo-distributed mode -->
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<!-- IP the HMaster IPC listens on; binding all interfaces (0.0.0.0) is recommended, since a VPC server has no public-IP NIC -->
<property>
<name>hbase.master.ipc.address</name>
<value>0.0.0.0</value>
</property>
<!-- In distributed mode, explicitly list the external ZooKeeper ensemble, comma-separated -->
<property>
<name>hbase.zookeeper.quorum</name>
<value>safecloud-master:3181,safecloud-worker1:3181,safecloud-worker2:3181</value>
</property>
<property>
<name>zookeeper.znode.parent</name>
<value>/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/mnt/disk1/hbase-zk/data</value>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>90000</value>
</property>
</configuration>
c. Notes on other configuration files:
/usr/lib/hbase-current/conf/regionservers lists the slave nodes and must be set in fully distributed mode (starting the master should then start the slaves automatically; not yet tested in this setup).
d. Start the HBase service (directly via the startup script, thanks to the environment variables):
[root@safecloud-master apps]# start-hbase.sh
[root@safecloud-master apps]# jps
11578 HMaster
11727 HRegionServer
2364 QuorumPeerMain
2669 Kafka
If HMaster and HRegionServer appear, HBase started successfully.
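Checking jps output by eye gets tedious as the process list grows. A hypothetical helper that greps jps-style output for the expected daemons, shown against a canned sample here (substitute "$(jps)" on the real machine):

```shell
# Hypothetical check: confirm each expected daemon appears in jps-style output.
# A canned sample is used here; pipe in real `jps` output in practice.
sample='2364 QuorumPeerMain
2669 Kafka
11578 HMaster
11727 HRegionServer'
for proc in QuorumPeerMain Kafka HMaster HRegionServer; do
  printf '%s\n' "$sample" | grep -q " $proc\$" && echo "$proc: up" || echo "$proc: DOWN"
done
```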
8. Install Spark:
a. Edit the Spark configuration files:
[root@safecloud-master apps]# cd /usr/lib/spark-current/conf
[root@safecloud-master conf]# cp spark-env.sh.template spark-env.sh
[root@safecloud-master conf]# cp slaves.template slaves
[root@safecloud-master conf]# vim slaves
# A Spark Worker will be started on each of the machines listed below.
#localhost
safecloud-master # comment out localhost and list the worker machines here
[root@safecloud-master conf]# vim spark-env.sh
# Add the Hadoop-related classpath; it seems to run even without this, but setting it is safer
SPARK_DIST_CLASSPATH=$(/usr/lib/hadoop-current/bin/hadoop classpath)
b. Start the Spark master service:
[root@safecloud-master conf]# /usr/lib/spark-current/sbin/start-all.sh
[root@safecloud-master apps]# jps
11578 HMaster
11727 HRegionServer
2364 QuorumPeerMain
2669 Kafka
12152 Master
12250 Worker
If Master and Worker appear, Spark started successfully.
c. Submit a spark-submit job (consuming data from every topic in Kafka):
# Create a directory for the job
[root@safecloud-master conf]# mkdir -p /usr/lib/spark-current/work/consumer-topics
[root@safecloud-master work]# cd /usr/lib/spark-current/work/consumer-topics
[root@safecloud-master consumer-topics]# vim start-submit.sh
#export logpath=$(cd "`dirname "$0"`"/..; pwd)/customer-topics.out
nohup ../../bin/spark-submit --master local --deploy-mode client --jars kafka2hbase-0.0.1-SNAPSHOT-jar-with-dependencies.jar --class com.sm.stream.Consumer kafka2hbase.jar localhost:3181 localhost:3181 spark-stream-test safeclound_air,safeclound_smoke,safeclound_ray,safeclound_atmos,safeclound_water,safeclound_hum,safeclound_elec_power_facter,safeclound_elec_power,safeclound_elec_phase_v,safeclound_elec_phase_i,safeclound_elec_line_v,safeclound_elec_freq,safeclound_demand,safeclound_freq_rate,safeclound_phase_v_rate,safeclound_rate_v,safeclound_rate_i,safeclound_harmonic_v,safeclound_harmonic_i,safeclound_water_njyc,safeclound_ammeter,safeclound_heartbeat,safeclound_door_ctrl,safeclound_co1,safeclound_pump_controller,safeclound_ggj_controller,safeclound_compensation_module,safeclound_hcho,safeclound_tvoc,safeclound_co2,safeclound_largestDemand,safeclound_lineTemperature 3 >/mnt/disk1/spark-log/consumer-topics.out 2>&1 &
Save with :wq.
Launch arguments: kafka2hbase.jar <Kafka ZooKeeper list> <HBase ZooKeeper list> <Kafka group ID> <topic list> <thread count>
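The long single-line topic list in start-submit.sh is easy to mangle. One hedged alternative is to build the comma-separated argument from a readable one-topic-per-line list (only three of the topic names from the command above are shown):

```shell
# Sketch: assemble the comma-separated topic argument from a one-per-line
# list (truncated to three of the topics used in the submit command).
topics=$(printf '%s\n' \
  safeclound_air \
  safeclound_smoke \
  safeclound_ray \
  | paste -sd, -)
echo "$topics"
```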
[root@safecloud-master consumer-topics]# chmod 700 start-submit.sh
[root@safecloud-master consumer-topics]# ./start-submit.sh # launch the Spark job
[root@safecloud-master consumer-topics]# jps
11578 HMaster
11727 HRegionServer
2364 QuorumPeerMain
2669 Kafka
12152 Master
12250 Worker
16889 SparkSubmit
If SparkSubmit appears, the Spark job started successfully.
d. Tail the log:
[root@safecloud-master consumer-topics]# tail -f /mnt/disk1/spark-log/consumer-topics.out
e. kafka2hbase.jar source code references:
https://wang4ever.lofter.com/post/1cca927e_12646c79
Cloud drive: https://pan.baidu.com/s/1xloEbdsRp4fIvVSvdlUU3A
9. Configure all services to start at boot:
[root@safecloud-master ~]# vim /etc/rc.local
cd /usr/lib/zookeeper-current/bin
./zkServer.sh start
cd /usr/lib/kafka-current/bin
./kafka-server-start.sh -daemon ../config/server.properties
/usr/lib/hbase-current/bin/start-hbase.sh
cd /usr/lib/spark-current/sbin
./start-all.sh
cd /usr/lib/spark-current/work/consumer-topics
./start-submit.sh
cd /opt/apps/tomcat8-dataopen
./bin/startup.sh
That completes the base infrastructure of the big data cloud platform.