Production Deployment of a Big Data Platform Based on CDH 6

1. Deployment topology

| IP | Host | CM Components | CDH Components |
|---|---|---|---|
| 10.0.0.111 | cdh6-master-1 | cloudera-scm-server, cloudera-scm-agent | QuorumPeerMain/JournalNode/NameNode/HttpFSServerWebServer/ResourceManager/HMaster/ThriftServer/Kafka/HistoryServer/FlinkYarnSessionCli/... |
| 10.0.0.112 | cdh6-worker-1 | cloudera-scm-agent | QuorumPeerMain/JournalNode/DataNode/NodeManager/HRegionServer/Kafka/FlinkYarnSessionCli/... |
| 10.0.0.113 | cdh6-worker-2 | cloudera-scm-agent | QuorumPeerMain/JournalNode/DataNode/NodeManager/HRegionServer/Kafka/FlinkYarnSessionCli/... |

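2. OS tuning

  • Kernel tuning
# Disable SELinux (setenforce only lasts until the next reboot)
sudo getenforce
sudo setenforce 0

# Minimize swapping
sudo sysctl -w vm.swappiness=0
# Note: "sudo echo ... >> file" fails, because the redirect runs as the calling user; use tee instead
echo 'vm.swappiness=0' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

# Disable transparent huge pages
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag

# Time sync and time zone, see: https://blogs.wl4g.com/archives/1267
sudo yum install -y ntp
sudo timedatectl set-timezone Asia/Shanghai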
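To confirm the kernel settings above actually took effect on a node, a small read-only report can help. This is a minimal sketch, not part of the original procedure: the function names thp_mode and report_os_tuning are made up for illustration, and the sysfs/procfs paths are assumed to be the standard ones on EL7.

```shell
# thp_mode FILE: print the active mode (the bracketed word) from a THP sysfs file,
# e.g. "always madvise [never]" -> "never".
thp_mode() {
  sed -n 's/.*\[\(.*\)\].*/\1/p' "$1"
}

# report_os_tuning: print the current SELinux, swappiness and THP state.
# It only reads values and changes nothing.
report_os_tuning() {
  echo "selinux:    $(getenforce 2>/dev/null || echo unknown)"
  echo "swappiness: $(cat /proc/sys/vm/swappiness 2>/dev/null || echo unknown)"
  for f in enabled defrag; do
    p=/sys/kernel/mm/transparent_hugepage/$f
    if [ -r "$p" ]; then
      echo "thp $f: $(thp_mode "$p")"
    fi
  done
  return 0
}
```

Running report_os_tuning after a reboot is a quick way to notice that the setenforce and transparent-huge-page changes are not persistent by themselves.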
  • Disable IPv6
# ----- Temporarily disable IPv6 on all interfaces -----
for f in /proc/sys/net/ipv6/conf/*/disable_ipv6; do echo 1 | sudo tee "$f" >/dev/null; done
# Check the IPv6 state of every interface again (1 = disabled)
cat /proc/sys/net/ipv6/conf/*/disable_ipv6

# ----- Permanently disable IPv6 on all interfaces -----
sudo tee /etc/sysctl.d/99-ipv6-disable.conf >/dev/null <<-'EOF'
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
EOF

# Also disable IPv6 per interface: even with net.ipv6.conf.all.disable_ipv6=1,
# individual interfaces may still be enabled, e.g. net.ipv6.conf.enp3s0.disable_ipv6=0.

for i in $(ls /sys/class/net); do echo "net.ipv6.conf.$i.disable_ipv6=1" | sudo tee -a /etc/sysctl.d/99-ipv6-disable.conf >/dev/null; done

# sysctl -p without a file argument only reloads /etc/sysctl.conf
sudo sysctl -p /etc/sysctl.d/99-ipv6-disable.conf
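After reloading, it can be useful to list any interface that still has IPv6 enabled. The helper below is a sketch under the assumption that the per-interface flags live under /proc/sys/net/ipv6/conf; the function name ipv6_enabled_ifaces is hypothetical.

```shell
# ipv6_enabled_ifaces [CONF_DIR]: print every interface whose disable_ipv6 flag is still 0.
# CONF_DIR defaults to /proc/sys/net/ipv6/conf; no output means IPv6 is off everywhere.
ipv6_enabled_ifaces() {
  conf_dir=${1:-/proc/sys/net/ipv6/conf}
  for flag in "$conf_dir"/*/disable_ipv6; do
    [ -r "$flag" ] || continue
    if [ "$(cat "$flag")" = "0" ]; then
      basename "$(dirname "$flag")"
    fi
  done
  return 0
}
```

An empty result from ipv6_enabled_ifaces suggests both the temporary and the permanent settings have been applied.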
  • Stop and disable firewalld
sudo systemctl stop firewalld
sudo systemctl disable firewalld

3. Deploy MySQL

Note: do not enable GTID, because the cmf database contains tables without primary keys.

sudo tee /etc/my.cnf >/dev/null <<-'EOF'
# see:https://blogs.wl4g.com/archives/650
[mysqld]
server_id = 1
port = 3306

lower_case_table_names = 1
sql_mode = NO_ZERO_IN_DATE,NO_ZERO_DATE,NO_AUTO_CREATE_USER,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
explicit_defaults_for_timestamp = true

symbolic-links = 0
## Note: 
#innodb_buffer_pool_size = 1G
#innodb_buffer_pool_instances = 8
#innodb_buffer_pool_chunk_size = 128M
max_allowed_packet = 1G
max_connections = 2000
slave_max_allowed_packet = 1G

binlog_format = ROW
binlog_checksum = NONE
log_bin = binlog
log_slave_updates = ON
relay_log_info_repository = TABLE
max_binlog_size = 1G
EOF
  • Initialize MySQL
$MYSQL_HOME/bin/mysqladmin -u root -p password --socket=/mnt/disk1/mysql/mysqld.sock
$MYSQL_HOME/bin/mysql -S $MYSQL_SOCKET -uroot -proot

create database cmf character set utf8 collate utf8_bin;

4. Deploy the yum repository

  • 4.1 Create the yum/parcel repository directory
mkdir -p /usr/share/nginx/html/el7-repo/
tree /usr/share/nginx/html/el7-repo/
/usr/share/nginx/html/el7-repo/
├── cdh6
│   ├── CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel
│   ├── CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel.sha1
│   └── manifest.json
└── cm6
    └── 6.3.1
        ├── allkeys.asc
        ├── repodata
        │   ├── 3662f97de72fd44c017bb0e25cee3bc9398108c8efb745def12130a69df2ecb2-filelists.sqlite.bz2
        │   ├── 43f3725f730ee7522712039982aa4befadae4db968c8d780c8eb15ae9872cd4d-primary.xml.gz
        │   ├── 49e4d60647407a36819f1d8ed901258a13361749b742e3be9065025ad31feb8e-filelists.xml.gz
        │   ├── 8afda99b921fd1538dd06355952719652654fc06b6cd14515437bda28376c03d-other.sqlite.bz2
        │   ├── b9300879675bdbc300436c1131a910a535b8b5a5dc6f38e956d51769b6771a96-primary.sqlite.bz2
        │   ├── e28836e19e07f71480c4dad0f7a87a804dc93970ec5277ad95614e8ffcff0d58-other.xml.gz
        │   ├── repomd.xml
        │   ├── repomd.xml.asc
        │   └── repomd.xml.key
        ├── RPM-GPG-KEY-cloudera
        ├── RPMS
        │   ├── noarch
        │   └── x86_64
        │       ├── cloudera-manager-agent-6.3.1-1466458.el7.x86_64.rpm
        │       ├── cloudera-manager-daemons-6.3.1-1466458.el7.x86_64.rpm
        │       ├── cloudera-manager-server-6.3.1-1466458.el7.x86_64.rpm
        │       ├── cloudera-manager-server-db-2-6.3.1-1466458.el7.x86_64.rpm
        │       ├── enterprise-debuginfo-6.3.1-1466458.el7.x86_64.rpm
        │       └── oracle-j2sdk1.8-1.8.0+update181-1.x86_64.rpm
        └── SRPMS
└── parcel-flink # only needed when integrating a custom-built Flink
    ├── 1.11.2
    │   ├── flink-1.11.2-bin-scala_2.12-el7.parcel
    │   ├── flink-1.11.2-bin-scala_2.12-el7.parcel.sha
    │   └── manifest.json
    └── 1.14.4
        ├── flink-1.14.4-bin-scala_2.11-el7.parcel
        ├── flink-1.14.4-bin-scala_2.11-el7.parcel.sha
        └── manifest.json

12 directories, 26 files
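Before publishing the files listed above, it may be worth checking that each parcel matches its checksum file. This is a minimal sketch: it assumes the .sha1 (or .sha) file stores the SHA-1 hex digest as its first field, and the function name parcel_sha_ok is made up for illustration.

```shell
# parcel_sha_ok PARCEL SHAFILE: succeed if the SHA-1 of PARCEL equals the first
# field stored in SHAFILE (assumed format: a hex digest, optionally followed by a name).
parcel_sha_ok() {
  expected=$(awk 'NR==1 {print $1}' "$2")
  actual=$(sha1sum "$1" | awk '{print $1}')
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage, with the paths from the tree above:
#   cd /usr/share/nginx/html/el7-repo/cdh6
#   parcel_sha_ok CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel \
#                 CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel.sha1 && echo OK
```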
  • 4.2 Add the yum repo and install
cat <<-EOF >/etc/yum.repos.d/cloudera.repo
[cloudera-repo]
name=cloudera-repo
baseurl=http://el7-repo.wl4g.io/cm6/6.3.1/
enabled=1
gpgcheck=0
EOF

# All nodes
sudo yum install -y java-1.8.0-openjdk-devel.x86_64 net-tools
sudo yum -y install cloudera-manager-daemons cloudera-manager-agent
# Master node only
sudo yum -y install cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server
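If yum install fails on some node, a quick reachability check of the repo metadata often narrows things down. The sketch below assumes the baseurl from the repo file above; repomd_url and check_repo are hypothetical helper names.

```shell
# repomd_url BASEURL: URL of the repository metadata index (repodata/repomd.xml),
# tolerating a trailing slash on BASEURL.
repomd_url() {
  printf '%s/repodata/repomd.xml\n' "${1%/}"
}

# check_repo BASEURL: succeed only if the metadata index answers over HTTP.
check_repo() {
  curl -fsSI "$(repomd_url "$1")" >/dev/null
}

# Usage on each node:
#   check_repo http://el7-repo.wl4g.io/cm6/6.3.1/ && echo "repo reachable"
```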
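  • 4.3 Initialize the cmf database
/opt/cloudera/cm/schema/scm_prepare_database.sh mysql cmf root root

Note: if initialization prints All done, your SCM database is configured correctly! but the database has not actually been initialized, you have hit a bug in CDH 6.3.1 (I have not tested which later version fixes it, since later releases are paid). It usually appears on repeated initialization; the first run is fine. If you hit it, you can use this script taken from a successful initialization: cdh6.3.1_cmf_init.sql

Note: do not enable MySQL GTID replication (e.g. an MGR cluster). With it enabled, cm-server fails at runtime on inserts with The table does not comply with the requirements by an external plugin., because these tables have no primary key (role_staleness_status/commands_detail/cm_version/client_configs_to_hosts/schema_version).
I tried making the first column of each table the primary key, and also adding an auto-increment ID column; both approaches fail. There is currently no workaround (the code is closed source and cannot be changed).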
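To see which tables are affected on your own installation, the standard information_schema views can be queried. The following is a sketch: no_pk_tables_sql is a hypothetical helper that only prints SQL, and the mysql invocation in the usage comment reuses the socket variable from section 3.

```shell
# no_pk_tables_sql SCHEMA: print a query listing the base tables in SCHEMA
# that have no PRIMARY KEY constraint.
no_pk_tables_sql() {
  cat <<EOF
SELECT t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
  ON c.table_schema = t.table_schema
 AND c.table_name = t.table_name
 AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_schema = '$1'
  AND t.table_type = 'BASE TABLE'
  AND c.constraint_name IS NULL;
EOF
}

# Usage (gtid_mode should report OFF for this setup):
#   $MYSQL_HOME/bin/mysql -S $MYSQL_SOCKET -uroot -p -e 'SELECT @@gtid_mode'
#   no_pk_tables_sql cmf | $MYSQL_HOME/bin/mysql -S $MYSQL_SOCKET -uroot -p
```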

  • Configure the MySQL JDBC driver
sudo ln -snf /opt/apps/mysql-connector-java-5.1.47.jar /usr/share/java/mysql-connector-java.jar
  • 4.4 Configure CM to connect to MySQL

The useSSL=false setting below resolves this warning: WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.

cat <<EOF >>/etc/cloudera-scm-server/db.properties
# For information describing how to configure the Cloudera Manager Server
# to connect to databases, see the "Cloudera Manager Installation Guide."
#
com.cloudera.cmf.db.type=mysql
com.cloudera.cmf.db.host=localhost
com.cloudera.cmf.db.name=cmf
com.cloudera.cmf.db.user=root
com.cloudera.cmf.db.setupType=EXTERNAL
com.cloudera.cmf.db.password=root
# see:https://community.cloudera.com/t5/Support-Questions/Cloudera-6-3-MySql-5-7-SSL-Warning/td-p/316592
com.cloudera.cmf.orm.hibernate.connection.driver_class=com.mysql.jdbc.Driver
com.cloudera.cmf.orm.hibernate.connection.url=jdbc:mysql://127.0.0.1:3306/cmf?useUnicode=true&characterEncoding=UTF-8&useSSL=false
#com.cloudera.cmf.db.useSSL=false
#com.cloudera.cmf.db.verifyServerCertificate=false
#com.cloudera.cmf.db.trustCertificateKeyStoreUrl=file:/usr/java/jdk1.8.0_121-cloudera/jre/lib/security/jssecacerts
#com.cloudera.cmf.db.trustCertificateKeyStoreType=JKS
#com.cloudera.cmf.db.trustCertificateKeyStorePassword=changeit
EOF
  • 4.5 Start the CM server
systemctl restart cloudera-scm-server
tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log

systemctl restart cloudera-scm-agent
tail -f /var/log/cloudera-scm-agent/cloudera-scm-agent.log
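Instead of watching tail -f by hand, a script can wait until the server log reports that the web server is up. This is a rough sketch: it assumes the log contains the line "Started Jetty server" once startup completes (check your own log for the exact wording), and cm_server_ready is a made-up name.

```shell
# cm_server_ready LOGFILE: succeed once LOGFILE contains the Jetty startup message.
# Assumption: the CM server logs "Started Jetty server" when the console is ready.
cm_server_ready() {
  grep -q 'Started Jetty server' "$1"
}

# Usage: poll every 5 seconds until the console is ready.
#   until cm_server_ready /var/log/cloudera-scm-server/cloudera-scm-server.log; do sleep 5; done
```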

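5. Configure the cluster components in the CM console

5.1 Configuration screenshots

5.2 Integrate external Flink

Note: if a Flink parcel is already integrated and you want to upgrade it, uninstall the old Flink parcel in this order: a. stop the Flink yarn service in every cluster that uses it; b. delete the Flink yarn roles from each cluster; c. deactivate the Flink parcel in parcel management; d. click to uninstall the Flink parcel.

5.3 Integrate external Phoenix

Tip 1: Phoenix simplifies many HBase use cases. Like the other tools in Cloudera Labs, it is not officially supported by Cloudera and is intended for experimental use only.

Tip 2: for a quick and simple integration for now: a. download the Phoenix release that matches your HBase version from the official site; b. copy it into the lib directory of the CDH HBase installation on every machine; c. restart HBase. I plan to package it as a standard phoenix parcel later when time allows.

  • Example of downloading and deploying on cdh6-master-1:

Note: be sure to pick the right version, otherwise you are in for surprises :). In my tests so far, cdh6.3.1-hbase-2.1.0 + phoenix-hbase-2.1-5.1.2-bin causes a strange DataNode error when Phoenix initializes its tables on the first launch of sqlline.py: DataXceiver error processing WRITE_BLOCK operation IOException: Premature EOF from inputStream. The fix described here did not help: https://cloud.tencent.com/developer/article/1404118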

cd /opt/cloudera/parcels/

#curl -sSkL -O https://archive.apache.org/dist/phoenix/phoenix-5.1.2/phoenix-hbase-2.1-5.1.2-bin.tar.gz
curl -sSkL -O http://archive.apache.org/dist/phoenix/phoenix-5.1.1/phoenix-hbase-2.1-5.1.1-bin.tar.gz

tar -xf phoenix-hbase-2.1-5.1.1-bin.tar.gz

# All link operations below are required on every node
# (the version must match the downloaded release, 5.1.1 here)
ln -snf phoenix-hbase-2.1-5.1.1-bin phoenix

# Let the HBase server load the Phoenix server jar automatically
ln -snf /opt/cloudera/parcels/phoenix/phoenix-server-hbase-2.1-5.1.1.jar /opt/cloudera/parcels/CDH/lib/hbase/lib/phoenix-server-hbase-2.1-5.1.1.jar
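Because a version mismatch between the unpacked directory and the linked jar is easy to introduce, the jar name can be derived from the directory name instead of being typed by hand. This sketch assumes the naming pattern seen in this section (phoenix-<suffix>-bin contains phoenix-server-<suffix>.jar); phoenix_server_jar is a hypothetical helper.

```shell
# phoenix_server_jar DIST_DIR: derive the server jar file name from the unpacked
# distribution directory name, assuming the pattern
#   phoenix-<suffix>-bin  ->  phoenix-server-<suffix>.jar
phoenix_server_jar() {
  name=$(basename "$1")
  name=${name#phoenix-}
  name=${name%-bin}
  printf 'phoenix-server-%s.jar\n' "$name"
}

# Usage on each node:
#   dist=/opt/cloudera/parcels/phoenix-hbase-2.1-5.1.1-bin
#   jar=$(phoenix_server_jar "$dist")
#   [ -f "$dist/$jar" ] && ln -snf "$dist/$jar" "/opt/cloudera/parcels/CDH/lib/hbase/lib/$jar"
```

The -f test in the usage makes the link step a no-op if the assumed jar name does not exist in the distribution.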
  • (Optional) To enable secondary indexing through the WAL codec, add the following to hbase-site.xml before restarting the HBase service:
<property>
    <name>hbase.regionserver.wal.codec</name>
    <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>

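6. Testing and verification

6.1 Verify the HDFS deployment

View the NameNode UI, or the configuration of the running NN process, at http://cdh6-master-1:9870/conf. The CDH default port is 9870; clusters built with other tools may use 50070.

  • Prerequisite 1: first configure (tune) the HDFS parameters in the CM UI and click "Save Changes", then click "Actions" -> "Deploy Client Configuration". Only then are file-based configs generated on each host, e.g. cat /etc/hadoop/conf/core-site.xml; otherwise hdfs commands cannot find their configuration files.

  • Prerequisite 2: HDFS has finished starting (NameNode/DataNode/JournalNode/HttpFS and the other processes).

  • Verify by listing the files in the HDFS root directory: hdfs dfs -ls /

  • The overall startup order is:
    ZooKeeper(3) -> HDFS(NameNode/DataNode/JournalNode/HttpFS..) -> Yarn/mr2(ResourceManager/NodeManager) -> HBase(HMaster/HRegionServer/ThriftServer) -> Kafka(Broker) -> Spark(HistoryServer), Flink(flink-yarn) -> Hue(Hue Server), etc.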

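6.2 Verify the Yarn deployment

View the Yarn UI, or the configuration of the running ResourceManager process, at http://cdh6-master-1:8088/conf (default port 8088).

  • Create a test dataset and upload it to HDFS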
mkdir -p /tmp/wordcount/
cat <<EOF >/tmp/wordcount/hello.txt
hello xm
hello sir
java c
python vb
java c++
go php
erlang java
EOF

sudo -u hdfs hdfs dfs -mkdir -p /tmp/
sudo -u hdfs hdfs dfs -put /tmp/wordcount/ /tmp
  • Start the MR job

Note: before starting, you must run the command that installs the new MR framework JAR from the CM console, under "Clusters" -> "YARN (MR2 Included)" -> "Install YARN MapReduce Framework JARs". The background is that MapReduce (V1) was the predecessor of Yarn, which arrived with V2. See: docs.cloudera.com/documentation/enterprise/6/6.3/topics/cm_mc_mr_and_yarn.html

  • According to the command log, it runs these two commands:
hdfs/hdfs.sh ["mkdir","/user/yarn/mapreduce/mr-framework","yarn","hadoop","775"]
yarn/yarn.sh ["install-mr-framework","/user/yarn/mapreduce/mr-framework/3.0.0-cdh6.3.1-mr-framework.tar.gz#mr-framework","hdfs://cdh6-master-1:8020"]

cd /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop-mapreduce/

# Note: run as the hdfs user (sudo -u hdfs), otherwise you may see: Cleaning up the staging area file:/tmp/hadoop/mapred/staging/hdfs1449449357/.staging/job_local1449449357_0001 ENOENT: No such file or directory
sudo -u hdfs yarn jar hadoop-mapreduce-examples-3.0.0-cdh6.3.1.jar wordcount /tmp/wordcount/hello.txt /tmp/wordcount/output

# List yarn applications
yarn application -list

# View the result set
hdfs dfs -ls /tmp/wordcount/output/
hdfs dfs -cat /tmp/wordcount/output/part-r-*
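To sanity-check the job output, the same counts can be computed locally from hello.txt with standard tools. This is a sketch that assumes the MR example splits words on whitespace; local_wordcount is a made-up helper name. Note that the Flink run in 6.5 reports c 2 and no c++ entry, which suggests that example tokenizes differently, so its output is not expected to match this one line for line.

```shell
# local_wordcount FILE: whitespace-split word counts, one "word<TAB>count" per line,
# sorted by word in byte order.
local_wordcount() {
  tr -s '[:space:]' '\n' < "$1" | grep -v '^$' | LC_ALL=C sort | uniq -c | awk '{print $2 "\t" $1}'
}

# Usage: compare against the MR result.
#   local_wordcount /tmp/wordcount/hello.txt
#   hdfs dfs -cat /tmp/wordcount/output/part-r-*
```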

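6.3 Verify the HBase deployment

View the HMaster UI, or the configuration of the running HMaster process, at http://cdh6-master-1:16010/conf (default port 16010).

  • Use hbase shell to create a table and write data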
hbase shell

hbase(main):005:0> version
2.1.0-cdh6.3.1, rUnknown, Thu Sep 26 02:56:37 PDT 2019
Took 0.0007 seconds

hbase(main):006:0> status
1 active master, 0 backup masters, 2 servers, 0 dead, 1.0000 average load
Took 0.0186 seconds

# Create a table with 2 column families
hbase(main):008:0> create 't_hello', {NAME=>'f1'}, {NAME=>'f2'}
Created table t_hello
Took 2.9259 seconds
=> Hbase::Table - t_hello

# List all tables
hbase(main):016:0> list
TABLE
t_hello
1 row(s)
Took 0.0181 seconds
=> ["t_hello"]

# Show the table schema
hbase(main):010:0> describe 't_hello'
Table t_hello is ENABLED
t_hello
COLUMN FAMILIES DESCRIPTION
{NAME => 'f1', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', 
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY =
> 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
{NAME => 'f2', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', 
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY =
> 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
2 row(s)
Took 0.2506 seconds

# Add data to the table
hbase(main):012:0> put 't_hello', '001','f1:name','Tom'
Took 0.1881 seconds
hbase(main):013:0> put 't_hello', '001','f1:gender','Man'
Took 0.0201 seconds
hbase(main):014:0> put 't_hello', '001','f1:lang','CN'
Took 0.0196 seconds

# Scan the table data
hbase(main):015:0> scan 't_hello'
ROW                                       COLUMN+CELL
 001                                      column=f1:gender, timestamp=1653460232218, value=Man
 001                                      column=f1:lang, timestamp=1653460248593, value=CN
 001                                      column=f1:name, timestamp=1653460206910, value=Tom
1 row(s)
Took 0.1212 seconds
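The interactive session above can also be scripted as a repeatable smoke test by piping commands into the shell's non-interactive mode (hbase shell -n). This is only a sketch: hbase_smoke_cmds and the table name t_smoke are hypothetical, and the function merely prints the commands.

```shell
# hbase_smoke_cmds TABLE: print hbase shell commands that create TABLE, write one
# cell, read it back, and drop the table again.
hbase_smoke_cmds() {
  cat <<EOF
create '$1', 'f1'
put '$1', '001', 'f1:name', 'Tom'
get '$1', '001'
disable '$1'
drop '$1'
EOF
}

# Usage:
#   hbase_smoke_cmds t_smoke | hbase shell -n
```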

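6.4 Verify the Spark deployment

View the Spark UI at http://cdh6-master-1:4040/ (4040 is the default port in standalone mode). Note: when jobs are scheduled on yarn, each SparkSubmit job has its own UI address; open it from the RUNNING list in the yarn console.

# Reuse the wordcount dataset from 6.2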
spark-shell

scala> val data=sc.textFile("/tmp/wordcount/hello.txt")
data: org.apache.spark.rdd.RDD[String] = /tmp/wordcount/hello.txt MapPartitionsRDD[1] at textFile at <console>:24

scala> val textFile = spark.read.textFile("/tmp/wordcount/hello.txt")
textFile: org.apache.spark.sql.Dataset[String] = [value: string]

scala> data.collect;
res0: Array[String] = Array(hello xm, hello sir, java c, python vb, java c++, go php, erlang java)

scala> val splitdata = data.flatMap(line => line.split(" "));
splitdata: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at flatMap at <console>:25

scala> splitdata.collect;
res1: Array[String] = Array(hello, xm, hello, sir, java, c, python, vb, java, c++, go, php, erlang, java)
  • Example: computing Pi
sudo -u hdfs spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster /opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-examples_2.11-2.4.0-cdh6.3.1.jar 10

6.5 Verify the Flink deployment

View the Flink UI at http://cdh6-master-1:8081/ (8081 is the default port in standalone mode). Note: when jobs are scheduled on yarn, each job (TaskManagerRunner / CliFrontend, etc.) has its own UI address; open it from the RUNNING list in the yarn console.

  • Run the official WordCount example job:
# Again reuse the wordcount dataset from 6.2 (without the 'hdfs://' prefix, paths are read from the local FS by default)
cd /opt/cloudera/parcels/flink-1.14.4-bin-scala_2.11/lib/flink/

# First deploy in local mode:
sudo -u hdfs ./bin/flink run -t local examples/batch/WordCount.jar --input hdfs:///tmp/wordcount/hello.txt --output hdfs:///tmp/wordcount/flink-output.txt

# Then deploy in yarn-per-job mode
sudo -u hdfs ./bin/flink run -t yarn-per-job examples/batch/WordCount.jar --input hdfs:///tmp/wordcount/hello.txt --output hdfs:///tmp/wordcount/flink-output.txt

# View the result set
hdfs dfs -cat /tmp/wordcount/flink-output.txt
c 2
erlang 1
go 1
hello 2
java 3
php 1
python 1
sir 1
vb 1
xm 1
  • Run an interactive job in the Flink Shell:

Note: as of flink-1.14.4 the Scala shell does not yet support scala 2.12; see FAQ 7.6 in this article.

cd /opt/cloudera/parcels/flink-1.14.4-bin-scala_2.11/lib/flink/
./bin/start-scala-shell.sh local

scala> val dataStream = senv.fromElements(1, 2, 3, 4)
scala> dataStream.countWindowAll(2).sum(0).executeAndCollect().foreach(println)
3
7

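6.6 Verify the Phoenix deployment

  • Prerequisite: the phoenix-server-hbase-2.1-*.jar matching your downloaded Phoenix release has been symlinked into the CDH hbase lib directory on every machine, and HBase has been restarted: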
/opt/cloudera/parcels/phoenix/bin/sqlline.py

0: jdbc:phoenix:> !ta
+-----------+-------------+------------+--------------+---------+-----------+---------------------------+----------------+-------------+----------------+--------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME |  TABLE_TYPE  | REMARKS | TYPE_NAME | SELF_REFERENCING_COL_NAME | REF_GENERATION | INDEX_STATE | IMMUTABLE_ROWS | SALT_B |
+-----------+-------------+------------+--------------+---------+-----------+---------------------------+----------------+-------------+----------------+--------+
|           | SYSTEM      | CATALOG    | SYSTEM TABLE |         |           |                           |                |             | false          | null   |
|           | SYSTEM      | CHILD_LINK | SYSTEM TABLE |         |           |                           |                |             | false          | null   |
|           | SYSTEM      | FUNCTION   | SYSTEM TABLE |         |           |                           |                |             | false          | null   |
|           | SYSTEM      | LOG        | SYSTEM TABLE |         |           |                           |                |             | true           | 32     |
|           | SYSTEM      | MUTEX      | SYSTEM TABLE |         |           |                           |                |             | true           | null   |
|           | SYSTEM      | SEQUENCE   | SYSTEM TABLE |         |           |                           |                |             | false          | null   |
|           | SYSTEM      | STATS      | SYSTEM TABLE |         |           |                           |                |             | false          | null   |
|           | SYSTEM      | TASK       | SYSTEM TABLE |         |           |                           |                |             | false          | null   |
+-----------+-------------+------------+--------------+---------+-----------+---------------------------+----------------+-------------+----------------+--------+
0: jdbc:phoenix:>
0: jdbc:phoenix:> CREATE TABLE "test"."t_user" (
    "ROW" VARCHAR PRIMARY KEY,
    "f1"."firstName" VARCHAR (16),
    "f1"."lastName" VARCHAR,
    "f1"."age" INTEGER,
    "f2"."money" DECIMAL(10,2),
    "f2"."idcard" BIGINT
) COLUMN_ENCODED_BYTES = 0;

0: jdbc:phoenix:> UPSERT INTO "test"."t_user" ("ROW", "firstName", "lastName", "age", "money", "idcard") VALUES ('12301062022,admin,1', 'foo', 'bar', 18, 9999999.01, 42310197001011212);
0: jdbc:phoenix:>
0: jdbc:phoenix:> select * from "test"."t_user";

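7. FAQ

7.1 Installing the CDH parcels fails with a message that host health is bad. The official documentation explains the cause as: Block agents from heartbeating to a Cloudera Manager with different UUID until agent restart

  • Fix
# On every affected server, delete the cm_guid file in the agent directory, then restart the agent on the failed nodes
cd /var/lib/cloudera-scm-agent
# Delete the cm_guid file
rm -f cm_guid
# Restart the agent service; after the restart the failed agent is installed automatically
service cloudera-scm-agent restart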

7.2 Starting a component fails with Cannot find CDH's bigtop-detect-javahome.

sudo mkdir -p /usr/libexec/bigtop-utils/
# For example ("sudo echo ... > file" would fail on the redirect, so tee is used):
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.322.b06-1.el7_9.x86_64' | sudo tee /usr/libexec/bigtop-utils/bigtop-detect-javahome

# Recommended: set it here so that it applies to all components
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.322.b06-1.el7_9.x86_64' | sudo tee /etc/default/bigtop-utils
# See the source: vim +21 /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/bin/bigtop-detect-javahome
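The JAVA_HOME path above is specific to one OpenJDK build and changes with every package update. A sketch of deriving it from the javac binary instead follows; it assumes a standard JDK layout where javac lives in $JAVA_HOME/bin, and java_home_from is a hypothetical helper name.

```shell
# java_home_from BIN: resolve all symlinks on BIN (e.g. the javac found on PATH)
# and print the directory two levels up, which is JAVA_HOME for a standard JDK layout.
java_home_from() {
  real=$(readlink -f "$1") || return 1
  dirname "$(dirname "$real")"
}

# Usage:
#   echo "export JAVA_HOME=$(java_home_from "$(command -v javac)")" | sudo tee /etc/default/bigtop-utils
```

javac is used rather than java because, on OpenJDK 8 packages, the java binary usually resolves into the nested jre directory.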

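7.3 How to integrate the third-party component Flink?

  • 7.3.1 Cloudera Manager deploys offline and uniformly through parcel packages, and CDH does not support Flink/Phoenix integration out of the box (CDP does). You therefore have to build a flink parcel yourself; see my project (it uses cm_ext to build a CDH 6.3.1 + Flink 1.14.4 parcel): gitee.com/wl4g-collect/flink-parcel-generator

  • 7.3.2 Before loading the flink parcel, upload the parcel files to /opt/cloudera/parcel-repo/ (overwriting any existing copies), e.g.: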

    tree /opt/cloudera/parcel-repo/
    ├── CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel
    ├── CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel.sha
    ├── CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel.sha1
    ├── CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel.torrent
    ├── flink-1.14.4-bin-scala_2.11-el7.parcel
    ├── flink-1.14.4-bin-scala_2.11-el7.parcel.sha
    ├── flink-1.14.4-bin-scala_2.11-el7.parcel.torrent
    └── manifest.json

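    Then load it via "Hosts" -> "Parcels" -> "Check for New Parcels". You must also manually upload the csd jars to /opt/cloudera/csd/flink_on_yarn-1.14.4.jar and /opt/cloudera/csd/flink_standalone-1.14.4.jar; otherwise, even if the flink parcel is distributed and activated successfully, flink may still not be selectable when adding a service to the cluster.

  • 7.3.3 After integrating Flink, the installation directory looks like this:

/opt/cloudera directory tree: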

/opt/cloudera/
├── cm
│  ├── actionItem_en.properties
...
├── cm-agent
│  ├── bin
│  ├── cm_version.properties
│  ├── lib64 -> lib
...
├── csd
│  ├── flink_on_yarn-1.14.4.jar
│  └── flink_standalone-1.14.4.jar
...
├── parcel-cache
│  ├── CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel.torrent
│  └── flink-1.14.4-bin-scala_2.11-el7.parcel.torrent
...
├── parcel-repo
│  ├── CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel
│  ├── CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel.sha
│  ├── CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel.torrent
│  ├── flink-1.14.4-bin-scala_2.11-el7.parcel  ## the added Flink parcel
│  ├── flink-1.14.4-bin-scala_2.11-el7.parcel.sha
│  ├── flink-1.14.4-bin-scala_2.11-el7.parcel.torrent
│  └── manifest.json
...
└── parcels
    │── CDH -> CDH-6.3.1-1.cdh6.3.1.p0.1470567
    │── CDH-6.3.1-1.cdh6.3.1.p0.1470567
    │    ├── bin
    │    │  ├── bigtop-detect-javahome -> ../lib/bigtop-utils/bigtop-detect-javahome
    │    │  ├── flume-ng
    │    │  ├── hadoop
    │    │  ├── hbase
    │    │  ├── hdfs
    │    │  ├── hive
    │    │  ├── kafka-acls
    │    │  ├── mapred
    │    │  ├── oozie
    │    │  ├── pig
    │    │  ├── spark-submit
    │    │  ├── sqoop
    │    │  ├── yarn
    │    │  ├── zookeeper-server
    │    ...
    │    ├── etc
    │    │  ├── hadoop
    │    │  ├── hadoop-httpfs
    │    │  ├── hbase
    │    │  ├── hbase-solr
    │    │  ├── hive
    │    │  ├── kafka
    │    │  ├── solr
    │    │  ├── spark
    │    │  ├── sqoop
    │    │  └── zookeeper
    │   ...
    │    ├── jars
    │    │  ├── avro-mapred-1.8.2-cdh6.3.1-hadoop2.jar
    │    │  ├── hadoop-aliyun-3.0.0-cdh6.3.1.jar
    │    │  ├── hbase-zookeeper-2.1.0-cdh6.3.1.jar
    │    │  ├── hive-storage-api-2.1.1-cdh6.3.1.jar
    │    │  ├── lucene-analyzers-common-7.4.0-cdh6.3.1.jar
    │    │  ├── netty-resolver-4.1.17.Final.jar
    │    │  ├── oozie-tools-5.1.0-cdh6.3.1.jar
    │    │  ├── parquet-tools-1.9.0-cdh6.3.1.jar
    │    │  ├── spark-yarn_2.11-2.4.0-cdh6.3.1.jar
    │    │  ├── zookeeper-3.4.6.jar
    │    ...
    │    ├── lib
    │    │  ├── avro
    │    │  ├── bigtop-utils
    │    │  ├── flume-ng
    │    │  ├── hadoop
    │    │  ├── hadoop-hdfs
    │    │  ├── hadoop-mapreduce
    │    │  ├── hadoop-yarn
    │    │  ├── hbase
    │    │  ├── hive
    │    │  ├── hue
    │    │  ├── impala
    │    │  ├── kafka
    │    │  ├── pig
    │    │  ├── solr
    │    │  ├── spark
    │    │  ├── sqoop
    │    │  └── zookeeper-native
    │    ...
    │    ├── lib64
    │    │  ├── libhdfs.so -> libhdfs.so.0.0.0
    │    │  ├── libhdfs.so.0.0.0
    │    │  ├── libzookeeper_st.so -> libzookeeper_st.so.2.0.0
    │    ...
    │    ├── libexec
    │    │  └── bigtop-utils
    │    ├── meta
    │    │  ├── alternatives.json
    │    │  ├── parcel.json
    │    └── share
    │        ├── doc
    │    ...
    ├── flink -> flink-1.14.4-bin-scala_2.11    ## install directory of the unpacked Flink parcel
    └── flink-1.14.4-bin-scala_2.11
        ├── lib
        │  └── flink
        │      ├── bin
        │      ├── conf
        │      ├── examples
        │      ├── lib
        │      ├── LICENSE
        │      ├── licenses
        │      ├── log
        │      ├── NOTICE
        │      ├── opt
        │      ├── plugins
        │      └── README.txt
        └── meta
            ├── flink_env.sh
            ├── parcel.json
            └── permissions.json


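  • 7.3.4 If all of the packages above are uploaded and configured correctly and flink still cannot be selected when adding a service to the cluster, restart the server with systemctl restart cloudera-scm-server. It keeps an in-memory cache, and a restart usually fixes this.

  • Related material 1: chowdera.com/2020/12/20201213073419954w.html

7.4 Starting the integrated Flink yarn component fails with NoClassDefFoundError: org/apache/hadoop/yarn/exceptions/YarnException

  • Solution 1: manually upload the hadoop jar that flink depends on to the flink/lib directory. Note: in my tests the hadoop-3-uber-3.1.1 jar then failed with NoClassDefFoundError: org/apache/commons/cli/CommandLine, while hadoop-2-uber-2.8.3 turned out to work.

  • Solution 2: if the flink version is >= 1.12.0, go to "Flink yarn" -> Configuration -> Advanced -> "Flink yarn Service Environment Advanced Configuration Snippet (Safety Valve) Flink yarn (Service-Wide)" and add the following (after restarting the Flink yarn service the error goes away):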

HADOOP_USER_NAME=flink
HADOOP_CONF_DIR=/etc/hadoop/conf
HADOOP_HOME=/opt/cloudera/parcels/CDH
HADOOP_CLASSPATH=/opt/cloudera/parcels/CDH/jars/*

7.5 How to resolve safemode: Access denied for user root. Superuser privilege is required?

Real case: starting HBase/Spark/Flink fails with Name node is in safe mode, and running hdfs dfsadmin -safemode leave then fails with safemode: Access denied for user root. Superuser privilege is required

Fix: sudo -u hdfs hdfs dfsadmin -safemode leave. Reference: safemode-Access-denied-for-user-cloudera-Superuser-privilege

7.6 With the scala_2.12 build, starting /opt/cloudera/parcels/flink-1.14.4-bin-scala_2.12/lib/flink/bin/start-scala-shell.sh fails with: Error: Could not find or load main class org.apache.flink.api.scala.FlinkShell

7.7 How to configure and use Hadoop / HBase tracing?

7.8 How to deploy on ubuntu when only rpm packages are available?

  • Use alien to convert the rpm packages to deb

    Note: this currently deploys successfully in a test environment, with a few minor issues. For example, dpkg -i cloudera-*.deb fails because it cannot create the user cloudera-scm; create that user manually beforehand.

sudo apt install -y alien
sudo alien cloudera-*.rpm -v # convert all packages in one go

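7.9 Starting Flink-yarn fails with /opt/cloudera/parcels/flink/lib/flink/bin/flink-yarn.sh +17 rotateLogFilesWithPrefix command not found

  • This is usually because flink-yarn configures two options by default: security.kerberos.login.keytab / security.kerberos.login.principal. If you do not want kerberos authentication, set both to empty. You should also set the HADOOP-related environment variables; the option is named "Flink-yarn Service Environment Advanced Configuration Snippet (Safety Valve) Flink-yarn (Service-Wide)":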
    HADOOP_USER_NAME=flink
    HADOOP_CONF_DIR=/etc/hadoop/conf
    HADOOP_HOME=/opt/cloudera/parcels/CDH
    HADOOP_CLASSPATH=/opt/cloudera/parcels/CDH/jars/*

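7.10 While creating the CDH cluster, some nodes keep reporting that copying failed or that cloudera-manager-agent cannot be installed?

  • Check that the repo configuration on that node is correct: ls -al /etc/yum.repos.d/

7.11 While creating the CDH cluster, some nodes keep reporting: Installation failed. Failed to receive heartbeat from agent?

  • Possible causes:
    1. Mismatched Python files;
    2. The log file does not exist; uncomment log_file in config.ini;
    3. Wrong host/IP entries in /etc/hosts;
    4. Whether the firewall is off (on ubuntu: ufw disable);
    5. Port configuration: whether the port in config.ini is set to 7182;
    6. Whether cluster time is synchronized; install ntp to sync time;
    7. SSH private key issues. I am still investigating this one: everything above was configured, yet the heartbeat still could not be detected. I was not using a private key and am not sure whether that is related;
    8. If this directory lacks permissions, grant ownership: sudo chown -R cloudera-scm:cloudera-scm /var/log/cloudera-scm-agent/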
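Items 3 and 5 in the checklist above can be checked from the agent's own configuration. The sketch below assumes the agent config lives at /etc/cloudera-scm-agent/config.ini and uses the keys server_host and server_port; ini_get is a hypothetical helper.

```shell
# ini_get KEY FILE: print the value of the first uncommented "KEY=value" line in FILE.
ini_get() {
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" | head -n 1
}

# Usage on a failing node (assumed config path):
#   conf=/etc/cloudera-scm-agent/config.ini
#   host=$(ini_get server_host "$conf"); port=$(ini_get server_port "$conf")
#   getent hosts "$host"                                   # does the name resolve?
#   timeout 3 bash -c "</dev/tcp/$host/$port" && echo "port $port reachable"
```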

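7.12 How to change the cloudera-scm-server log level?

7.13 If you are an ops or development engineer deploying to enterprise private clouds, you may need to set up new clusters frequently. Can the cmf database of an already deployed cluster be imported to speed this up?

  • Probably not. First, different clusters have different IPs, so even after an import you would have to edit them by hand; and because the relationships between the cmf tables are not open source, this is extremely error-prone and close to impossible. Second, and decisively, the cmf database only holds the configured state, which corresponds to the jars and scripts actually installed on each node; importing the data alone achieves nothing. So for large numbers of private-cloud clusters there is currently no reliable approach. The root cause is the closed source; with known table relationships, importing and then manually adjusting parts such as the IPs would be entirely feasible. For this requirement the open-source Ambari has the advantage.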

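7.14 After changing the CDH kafka setting zookeeper.chroot from its default, e.g. to /kafka, topics can no longer be listed?

Environment: CDH6.3.1 / kafka_2.11-2.2.1

Symptoms:

  • Step 1: On a private-cloud CDH cluster for one of our customers, some components were co-located because of limited resources, so I changed kafka's root path in zookeeper (zookeeper.chroot) to /kafka. While testing cluster availability, creating a topic with kafka-topics --zookeeper 127.0.0.1:2181 --create --topic safeclound_ammeter --partitions 100 --replication-factor 5 succeeded, but listing topics with kafka-topics --zookeeper 127.0.0.1:2181 --list returned no topics.

  • Step 2: I tried a console producer kafka-console-producer --broker-list 127.0.0.1:9092 --topic safeclound_ammeter --property parse.key=true --property key.separator=: and a console consumer kafka-console-consumer --bootstrap-server 127.0.0.1:9092 --topic safeclound_ammeter, and found that data for this topic could be both produced and consumed. The configuration generated by CM does contain the chroot: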

    cat /etc/kafka/conf/kafka-client.conf
    zookeeper.connect=emr-master-1:2181,emr-master-2:2181,emr-worker-1:2181,emr-worker-2:2181,emr-worker-3:2181/kafka

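    I also checked zk, and the paths were created correctly under /kafka/xxx.

  • Step 3: I then set zookeeper.chroot back to empty, restarted, and ran kafka-topics --zookeeper 127.0.0.1:2181 --list again; this time the topics were listed.

  • From this it looks like a bug. I was too busy to dig into whether the CDH config key is wrong or kafka itself is at fault. In short, the default empty value normally causes no problems. Also, producer-side topic auto-creation is enabled by default, so any topic that has had data produced to it can be found. The topics are visible in kafka-manager as well; for cmak deployment see: https://blogs.wl4g.com/archives/1485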
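One thing that may be worth ruling out before treating this as a bug: the kafka-topics calls above pass --zookeeper 127.0.0.1:2181 without the /kafka suffix, while the CM-generated client config ends in /kafka. The sketch below builds a connect string that includes the chroot; zk_connect is a made-up helper, and whether this explains the symptom on this cluster is an untested assumption. Kafka 2.2 also accepts --bootstrap-server for kafka-topics, which avoids ZooKeeper paths altogether.

```shell
# zk_connect HOSTS CHROOT: build a ZooKeeper connect string with CHROOT appended,
# e.g. "127.0.0.1:2181" + "/kafka" -> "127.0.0.1:2181/kafka".
zk_connect() {
  printf '%s%s\n' "${1%/}" "$2"
}

# Usage:
#   kafka-topics --zookeeper "$(zk_connect 127.0.0.1:2181 /kafka)" --list
#   kafka-topics --bootstrap-server 127.0.0.1:9092 --list
```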

7.15 Starting the flink-yarn service, /var/log/flink/flink-yarn.out shows org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException$UnimplementedException: KeeperErrorCode = Unimplemented for /flink/cluster_yarn

  • Fix: manually create flink's znodes in zookeeper (parent first, with empty data), then restart
zookeeper-client
create /flink ""
create /flink/cluster_yarn ""
