Etcd + VS Code source debugging + production deployment
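1. Setting up the development environment
Background: a new customer's servers needed a Kubernetes cluster. After installing etcd, startup failed with one of the following errors: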
{"level":"fatal","ts":"2021-08-25T16:24:00.044+0800","caller":"etcdmain/etcd.go:203","msg":"discovery failed","error":"couldn't find local name \"default\" in the initial cluster configuration","stacktrace":"etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tetcd/server/etcdmain/etcd.go:203\netcd/server/v3/etcdmain.Main\n\tetcd/server/etcdmain/main.go:40\nmain.main\n\tetcd/server/main.go:32\nruntime.main\n\truntime/proc.go:225"}
or
{"level":"fatal","ts":"2021-08-25T16:24:00.044+0800","caller":"etcdmain/etcd.go:203","msg":"discovery failed","error":"cannot fetch cluster info from peer urls: could not retrieve cluster information from the given URLs","stacktrace":"etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tetcd/server/etcdmain/etcd.go:203\netcd/server/v3/etcdmain.Main\n\tetcd/server/etcdmain/main.go:40\nmain.main\n\tetcd/server/main.go:32\nruntime.main\n\truntime/proc.go:225"}
1.1 Generate the VS Code debug configuration file
cd $PROJECT_HOME
mkdir -p .vscode
cat <<-'EOF' > .vscode/launch.json
{
// Use IntelliSense to learn about possible attributes.
// Hover to view descriptions of existing attributes.
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
"version": "0.2.0",
"configurations": [
{
"name": "Launch Package",
"type": "go",
"request": "launch",
"mode": "debug",
"program": "${workspaceFolder}/server",
"args": ["--config-file=/etc/etcd/etcd.conf.yml"]
}
]
}
EOF
1.2 Generate an etcd configuration file (demo only)
sudo tee /etc/etcd/etcd.conf.yml >/dev/null <<-'EOF'
name: 'etcd1'
listen-peer-urls: https://10.0.0.123:2380
listen-client-urls: https://10.0.0.123:2379
initial-advertise-peer-urls: https://10.0.0.123:2380
advertise-client-urls: https://10.0.0.123:2379
initial-cluster: etcd1=https://10.0.0.123:2380,etcd2=https://10.0.0.122:2380,etcd3=https://10.0.0.121:2380
initial-cluster-token: 'etcd-cluster'
# Initial cluster state ('new' or 'existing').
initial-cluster-state: 'existing'
EOF
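- Start the etcd server from VS Code
1.3 Resolving the problem
I will not walk through the debugging session itself; the main goal here is to show how to set up an etcd development and debugging environment for troubleshooting hard production problems. The cause of the demo problem above: a new cluster has to be bootstrapped, so initial-cluster-state must be set to new in every node's etcd.conf.yml. Once the cluster has been initialized, it is best to set it to existing before restarting, to avoid other problems.
Note: if set to existing, etcd will attempt to join the existing cluster. If the wrong value is set, etcd will attempt to start but fail safely.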
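The first error above ("couldn't find local name ... in the initial cluster configuration") can be caught before etcd is started. The following is a minimal sketch, assuming the initial-cluster value has the usual name=url,name=url form; name_in_initial_cluster is a hypothetical helper, not part of etcd.

```shell
# Hypothetical pre-flight helper (not part of etcd): succeeds only if NAME
# appears as a member name in an initial-cluster string of the form
# "name1=url1,name2=url2,...".
name_in_initial_cluster() {
  local name="$1" cluster="$2" entry
  local IFS=','
  for entry in $cluster; do
    # Compare the part before the first '=' with the local member name.
    if [ "${entry%%=*}" = "$name" ]; then
      return 0
    fi
  done
  return 1
}

# Values taken from the demo etcd.conf.yml above; "default" is the name etcd
# falls back to when no name is configured.
cluster='etcd1=https://10.0.0.123:2380,etcd2=https://10.0.0.122:2380,etcd3=https://10.0.0.121:2380'
for candidate in etcd1 default; do
  if name_in_initial_cluster "$candidate" "$cluster"; then
    echo "$candidate: listed in initial-cluster"
  else
    echo "$candidate: NOT listed in initial-cluster"
  fi
done
```

If the check fails for the configured name, the name setting and the initial-cluster list disagree, which is what the first fatal log line reports.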
--------------------------------------------------------------------------------
2. Production deployment
- Cluster deployment topology:
IP | Host | FQDN | ETCD_NAME (in etcd.conf) |
---|---|---|---|
10.0.0.121 | k8s-master-1 | n1.etcd.wl4g.uat | etcd1 |
10.0.0.122 | k8s-master-2 | n2.etcd.wl4g.uat | etcd2 |
10.0.0.123 | k8s-master-3 | n3.etcd.wl4g.uat | etcd3 |
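Note: the most elegant way to configure a stateful service such as etcd would be to use domain names or hostnames, so that it can be migrated easily between environments with different IPs. However, etcd requires bind addresses such as listen-peer-urls to be IPs (Validation error: expected IP in URL for binding, see https://github.com/etcd-io/etcd/issues/9575), so this is currently not feasible unless you patch the source: https://github.com/etcd-io/etcd/blob/release-3.5/server/embed/config.go#L904.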
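Because of this restriction, it can help to check the listen URLs before writing them into the configuration. The sketch below is a simplified, IPv4-only check; is_ipv4_bind_url is a hypothetical helper and does not reproduce etcd's own validation code.

```shell
# Hypothetical helper: succeeds if the host part of a URL such as
# https://10.0.0.121:2380 is a dotted-quad IPv4 address.
# Simplified sketch: IPv4 only, no IPv6, not etcd's actual validation.
is_ipv4_bind_url() {
  local hostport host octet
  hostport="${1#*://}"        # strip the scheme
  hostport="${hostport%%/*}"  # strip any path
  host="${hostport%:*}"       # strip the port, if present
  case "$host" in
    ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;  # empty, non-numeric, or malformed dots
  esac
  local IFS='.'
  set -- $host                # split into octets (function-local positional params)
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    [ "$octet" -le 255 ] || return 1
  done
  return 0
}

for url in https://10.0.0.121:2380 https://n1.etcd.wl4g.uat:2380; do
  if is_ipv4_bind_url "$url"; then
    echo "$url: usable as a listen URL"
  else
    echo "$url: host is not an IPv4 address"
  fi
done
```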
- Deployment preparation
sudo mkdir -p /usr/lib/etcd-current; cd /usr/lib/etcd-current
sudo chmod -R 755 /usr/lib/etcd-current
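# Mirror (accelerated) URL for the official GitHub release package.
# The install directory was created with sudo, so the following commands need sudo as well.
sudo curl -O https://github.91chifun.workers.dev/https://github.com//etcd-io/etcd/releases/download/v3.5.0/etcd-v3.5.0-linux-amd64.tar.gz
sudo tar -xf etcd-*.tar.gz --strip-components=1 -C $(pwd)
sudo rm -rf etcd-*.tar.gz # cleanup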
# link binary.
sudo ln -snf $(pwd)/etcd /usr/bin/etcd
sudo ln -snf $(pwd)/etcdctl /usr/bin/etcdctl
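Since the tarball above is fetched through a third-party mirror, it may be worth comparing its SHA-256 digest with the value published for the official release before extracting it. A minimal sketch, assuming sha256sum is available; verify_sha256 is a hypothetical helper, and the expected digest is something you have to obtain yourself.

```shell
# Hypothetical helper: succeeds only if FILE's SHA-256 digest equals EXPECTED.
verify_sha256() {
  local file="$1" expected="$2" actual
  actual="$(sha256sum "$file" | awk '{print $1}')"
  [ "$actual" = "$expected" ]
}

# Example (the digest is a placeholder, not a real value):
# verify_sha256 etcd-v3.5.0-linux-amd64.tar.gz "<expected sha256>" \
#   || echo "checksum mismatch, do not install" >&2
```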
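2.1 Environment configuration
Note on the startup parameter ETCD_INITIAL_CLUSTER_STATE:
This is a rather tedious parameter.
It indicates whether this is a new cluster and accepts two values, new and existing. If set to existing, the member tries to contact the other members on startup. When the cluster is first created it should be set to new; in my tests, setting the last node to existing also worked, but the other nodes must not be set to existing. While the cluster is running, a member that recovers after a failure should be set to existing; in my tests, new also worked.
Source analysis: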
https://github.com/etcd-io/etcd/blob/v3.5.0/server/etcdserver/server.go#L424
/etc/profile.d/profile-etcd.sh
sudo tee /etc/profile.d/profile-etcd.sh >/dev/null <<-'EOF'
#!/bin/bash
# Copyright (c) 2017 ~ 2025, the original author wangl.sir individual Inc,
# All rights reserved. Contact us
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# cat /etc/etcd/etcd.conf | while read etcdVar; do
#   if [[ -n "$(echo $etcdVar|sed 's/ //g')" && -z "$(echo $etcdVar|grep -E '^#|^export')" ]]; then
#     etcdVarName=$(echo $etcdVar|awk -F '=' '{print $1}'|sed 's/ //g')
#     etcdVarVal=$(echo $etcdVar|awk -F '=' '{print $2}'|sed 's/ //g')
#     eval "export $etcdVarName=\"$etcdVarVal\""
#   fi
# done
. /etc/etcd/etcd.conf
export PATH=$PATH:/usr/lib/etcd-current

# This is a foolish parameter.
# Used to indicate whether this is a new cluster. There are two values, 'new' and 'existing'.
# If it is set to 'existing', the member will try to interact with other members when starting.
# When creating a cluster for the first time, it should be set to 'new'. In testing, setting the
# last node to 'existing' also worked, but the other nodes cannot be set to 'existing'.
# During cluster operation, a member that recovers after a failure should be set to 'existing';
# in testing, 'new' also worked.
#
# see: https://etcd.io/docs/v3.5/op-guide/configuration/#--initial-cluster-state.
# see: https://github.com/etcd-io/etcd/blob/v3.5.0/server/etcdserver/server.go#L424
# see: https://my.oschina.net/u/160697/blog/4283750
#
# The naming rule: etcd1/etcd2/etcd3, the default set first 'etcd1' is leader.
#IS_LEADER="$([ "$ETCD_NAME" == "etcd1" ] && echo Y || echo N)"
#if [[ $IS_LEADER == 'Y' && -f "$ETCD_DATA_DIR/member/snap/db" ]]; then
# An existing backend db means this member was already initialized, so use
# 'existing'; a fresh data directory means the cluster is being bootstrapped.
if [[ -f "$ETCD_DATA_DIR/member/snap/db" ]]; then
  export ETCD_INITIAL_CLUSTER_STATE='existing'
else
  export ETCD_INITIAL_CLUSTER_STATE='new'
fi

# [Additional] log directory. see: /etc/init.d/etcd.sh
export ETCD_LOG_DIR=/mnt/disk1/log/etcd
EOF
sudo chmod +x /etc/profile.d/profile-etcd.sh
. /etc/profile.d/profile-etcd.sh
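2.2 Runtime configuration
Note: the following is the configuration for the primary node, etcd1. On the other nodes, change it to ETCD_NAME=etcd2 or ETCD_NAME=etcd3 and so on, and likewise adjust the address-related environment variables ETCD_LISTEN_PEER_URLS, ETCD_LISTEN_CLIENT_URLS, ETCD_INITIAL_ADVERTISE_PEER_URLS, and ETCD_ADVERTISE_CLIENT_URLS to the node's actual IP.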
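To avoid editing these values by hand on every node, the per-node lines can be generated from the node name and IP. This is only a sketch: render_node_env is a hypothetical helper, and the ports and the 0.0.0.0 client listen address are taken from the etcd1 configuration in this section.

```shell
# Hypothetical helper: print the per-node overrides for /etc/etcd/etcd.conf.
# Usage: render_node_env <etcd name> <node IP>
render_node_env() {
  local name="$1" ip="$2"
  cat <<EOF
export ETCD_NAME='${name}'
export ETCD_LISTEN_PEER_URLS='https://${ip}:2380'
export ETCD_LISTEN_CLIENT_URLS='https://0.0.0.0:2379'
export ETCD_INITIAL_ADVERTISE_PEER_URLS='https://${ip}:2380'
export ETCD_ADVERTISE_CLIENT_URLS='https://${ip}:2379'
EOF
}

# Example for the second node in the topology table.
render_node_env etcd2 10.0.0.122
```

The printed lines can be compared with, or pasted over, the corresponding entries of the file on each node.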
/etc/etcd/etcd.conf
sudo mkdir -p /etc/etcd
sudo tee /etc/etcd/etcd.conf >/dev/null <<-'EOF'
# Copyright (c) 2017 ~ 2025, the original author wangl.sir individual Inc,
# All rights reserved. Contact us
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Refer to see: https://etcd.io/docs/v3.5/op-guide/configuration
#

# Member environments.
export ETCD_NAME='etcd1'
export ETCD_DATA_DIR='/mnt/disk1/etcd'
# The dedicated wal directory. (default={data-dir}/wal)
export ETCD_WAL_DIR="${ETCD_DATA_DIR}/wal"
export ETCD_SNAPSHOT_COUNT=10000
export ETCD_HEARTBEAT_INTERVAL=200
export ETCD_ELECTION_TIMEOUT=5000
export ETCD_LISTEN_PEER_URLS='https://10.0.0.121:2380'
export ETCD_LISTEN_CLIENT_URLS='https://0.0.0.0:2379'
export ETCD_MAX_SNAPSHOTS=10
export ETCD_MAX_WALS=10
export ETCD_CORS='*'
export ETCD_QUOTA_BACKEND_BYTES=0
export ETCD_BACKEND_BATCH_LIMIT=0
export ETCD_BACKEND_BBOLT_FREELIST_TYPE=map
export ETCD_BACKEND_BATCH_INTERVAL=0
export ETCD_MAX_TXN_OPS=128
export ETCD_MAX_REQUEST_BYTES=1572864
export ETCD_GRPC_KEEPALIVE_MIN_TIME=5s
export ETCD_GRPC_KEEPALIVE_INTERVAL=2h
export ETCD_GRPC_KEEPALIVE_TIMEOUT=20s

# Clustering environments.
# The etcd cluster runs URLs configuration. Notice=there are differences between leader and follower nodes.
export ETCD_INITIAL_ADVERTISE_PEER_URLS='https://10.0.0.121:2380'
export ETCD_INITIAL_CLUSTER='etcd1=https://10.0.0.121:2380,etcd2=https://10.0.0.122:2380,etcd3=https://10.0.0.123:2380'
# The initial startup must be 'new', and the subsequent startup must be 'existing'. (default='new')
# Set by /etc/profile.d/profile-etcd.sh, so it is left commented out here.
#export ETCD_INITIAL_CLUSTER_STATE='new'
export ETCD_INITIAL_CLUSTER_TOKEN='etcd-cluster'
export ETCD_ADVERTISE_CLIENT_URLS='https://10.0.0.121:2379'
#export ETCD_DISCOVERY=''
#export ETCD_DISCOVERY_PROXY=''
#export ETCD_DISCOVERY_SRV=''
#export ETCD_DISCOVERY_SRV_NAME=''
#export ETCD_DISCOVERY_FALLBACK='proxy'
export ETCD_STRICT_RECONFIG_CHECK=false
export ETCD_AUTO_COMPACTION_MODE=periodic
export ETCD_AUTO_COMPACTION_RETENTION=8
export ETCD_ENABLE_V2=false

# Profiling environments.
export ETCD_ENABLE_PPROF=true
export ETCD_METRICS='basic'
#export ETCD_LISTEN_METRICS_URLS=''

# Proxy environments.
export ETCD_PROXY='off'
export ETCD_PROXY_FAILURE_WAIT=5000
export ETCD_PROXY_REFRESH_INTERVAL=30000
export ETCD_PROXY_DIAL_TIMEOUT=1000
export ETCD_PROXY_WRITE_TIMEOUT=5000
export ETCD_PROXY_READ_TIMEOUT=0

# Auth environments.
#export ETCD_AUTH_TOKEN='simple'

# Security environments.
export ETCD_CERT_FILE='/etc/etcd/ssl/etcd.pem'
export ETCD_KEY_FILE='/etc/etcd/ssl/etcd-key.pem'
#export ETCD_CLIENT_CERT_AUTH=''
#export ETCD_CLIENT_CRL_FILE=''
#export ETCD_CLIENT_CERT_ALLOWED_HOSTNAME=''
export ETCD_TRUSTED_CA_FILE='/etc/etcd/ssl/ca.pem'
#export ETCD_AUTO_TLS=false
export ETCD_PEER_CERT_FILE='/etc/etcd/ssl/etcd.pem'
export ETCD_PEER_KEY_FILE='/etc/etcd/ssl/etcd-key.pem'
#export ETCD_PEER_CLIENT_CERT_AUTH=false
#export ETCD_PEER_CRL_FILE=''
export ETCD_PEER_TRUSTED_CA_FILE='/etc/etcd/ssl/ca.pem'
#export ETCD_PEER_AUTO_TLS=false
#export ETCD_PEER_CERT_ALLOWED_CN=''
#export ETCD_PEER_CERT_ALLOWED_HOSTNAME=''
# Notice: etcdctl SSL handshake failed.
# [issue]: https://github.com/etcd-io/etcd/issues/9785
# [issue]: https://github.com/etcd-io/etcd/issues/10652
#export ETCD_CIPHER_SUITES=''

# Logging environments.
export ETCD_LOGGER=zap
# Specify 'stdout' or 'stderr' to skip journald logging even when running under systemd.
export ETCD_LOG_OUTPUTS='stderr,stdout'
export ETCD_LOG_LEVEL=debug

# Experimental environments.
#export ETCD_EXPERIMENTAL_CORRUPT_CHECK_TIME='0s'
#export ETCD_EXPERIMENTAL_COMPACTION_BATCH_LIMIT=1000
#export ETCD_EXPERIMENTAL_PEER_SKIP_CLIENT_SAN_VERIFICATION=false

# Miscellaneous environments.
#export ETCD_CONFIG_FILE=''

# Unsafe environments.
export ETCD_FORCE_NEW_CLUSTER=false

# Etcdctl environments.
export ETCDCTL_ENDPOINTS=https://10.0.0.121:2379,https://10.0.0.122:2379,https://10.0.0.123:2379
export ETCDCTL_CACERT=/etc/etcd/ssl/ca.pem
export ETCDCTL_CERT=/etc/etcd/ssl/etcd.pem
export ETCDCTL_KEY=/etc/etcd/ssl/etcd-key.pem
EOF
2.3 Management script
The script also works on older Linux systems without systemd, such as CentOS 6.
/etc/init.d/etcd.sh
sudo tee /etc/init.d/etcd.sh >/dev/null <<-'EOF'
#!/bin/bash
# chkconfig: - 85 15
#/*
# * Copyright 2017 ~ 2025 the original author or authors. <Wanglsir@gmail.com, 983708408@qq.com>
# *
# * Licensed under the Apache License, Version 2.0 (the "License");
# * you may not use this file except in compliance with the License.
# * You may obtain a copy of the License at
# *
# *      http://www.apache.org/licenses/LICENSE-2.0
# *
# * Unless required by applicable law or agreed to in writing, software
# * distributed under the License is distributed on an "AS IS" BASIS,
# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# * See the License for the specific language governing permissions and
# * limitations under the License.
# */

[ -f /etc/sysconfig/network ] && . /etc/sysconfig/network
[ "$NETWORKING" = "no" ] && exit 0

# Load the user environment.
[ -f "/etc/profile.d/profile-etcd.sh" ] && . /etc/profile.d/profile-etcd.sh
[ -f "/etc/bashrc" ] && . /etc/bashrc
[ -f "/etc/bash.bashrc" ] && . /etc/bash.bashrc # e.g ubuntu
[ -f "/home/$USER/.bash_profile" ] && . /home/$USER/.bash_profile
[ -f "/home/$USER/.bashrc" ] && . /home/$USER/.bashrc
# Mac OS
[ -f "/Users/$USER/.bash_profile" ] && . /Users/$USER/.bash_profile
[ -f "/Users/$USER/.bashrc" ] && . /Users/$USER/.bashrc

# Environment definition.
etcdBin="$(command -v etcd)"
etcdLogDir="${ETCD_LOG_DIR:-/mnt/disk1/log/etcd}"

function start() {
  local pids=$(getPids)
  if [ -z "$pids" ]; then
    # Notice: if '--config-file' is used configuration here, the environment variables
    # and startup parameters will become invalid. It is not recommended.
    if [ "$PPID" == "1" ]; then # Systemd call.
      "$etcdBin" > "$etcdLogDir/etcd.stdout" 2>&1
    else # Normal shell call.
      nohup "$etcdBin" > "$etcdLogDir/etcd.stdout" 2>&1 &
    fi

    echo -n "Starting etcd ..."
    while true
    do
      pids=$(getPids)
      if [ "$pids" == "" ]; then
        echo -n ".";
        sleep 0.8;
      else
        echo $pids >"$ETCD_DATA_DIR/etcd.pid"
        break
      fi
    done
    echo -e "\nStarted etcd on $pids"
  else
    echo "etcd process is running $pids"
  fi
}

function stop() {
  local pids=$(getPids)
  if [ -z "$pids" ]; then
    echo "etcd not running!"
  else
    echo -n "Stopping etcd for $pids ..."
    kill -s TERM $pids
    while true
    do
      pids=$(getPids)
      if [ "$pids" == "" ]; then
        \rm -f "$ETCD_DATA_DIR/etcd.pid"
        break
      else
        echo -n ".";
        sleep 0.8;
      fi
    done
    echo -e "\nStopped etcd !"
  fi
}

function status() {
  ps -ef | grep -v grep | grep "$etcdBin"
}

function getPids() {
  local pids=$(ps ax | grep -i "$etcdBin" | grep -v grep | awk '{print $1}')
  echo $pids # Output execution result value.
  return 0 # Return the execution result code.
}

# --- Main call. ---
CMD=$1
case $CMD in
  status)
    status
    ;;
  start)
    start
    ;;
  stop)
    stop
    ;;
  restart)
    stop
    start
    ;;
  *)
    echo $"Usage: {start|stop|restart|status}"
    exit 2
esac
EOF

. /etc/profile
sudo useradd etcd
sudo chown -R etcd:etcd /etc/init.d/etcd.sh
sudo chmod +x /etc/init.d/etcd.sh
sudo mkdir -p $ETCD_LOG_DIR
sudo chown -R etcd:etcd $ETCD_LOG_DIR
sudo mkdir -p $ETCD_DATA_DIR
sudo chown -R etcd:etcd $ETCD_DATA_DIR
# Value recommended by etcd (the recommended permission is "-rwx------" to prevent possible unprivileged access to the data).
sudo chmod 700 $ETCD_DATA_DIR
2.4 Service configuration
/etc/systemd/system/etcd.service
sudo tee /etc/systemd/system/etcd.service >/dev/null <<-'EOF'
# Copyright (c) 2017 ~ 2025, the original author wangl.sir individual Inc,
# All rights reserved. Contact us
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
ExecStart=/bin/bash -c "/etc/init.d/etcd.sh start"
ExecReload=/bin/bash -c "/etc/init.d/etcd.sh restart"
ExecStop=/bin/bash -c "/etc/init.d/etcd.sh stop"
StandardOutput=null
StandardError=journal
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TimeoutStartSec=5
Restart=always
KillMode=process
User=etcd
Group=etcd
SuccessExitStatus=143

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable etcd
2.5 Configure certificates
Generate the certificates (this only needs to run on 10.0.0.121 / k8s-master-1; then distribute them with scp -r /etc/etcd/ssl k8s-master-2:/etc/etcd/).
/etc/etcd/ssl
sudo curl -o /bin/cfssl -L https://github.com/cloudflare/cfssl/releases/download/v1.6.1/cfssl_1.6.1_linux_amd64
sudo curl -o /bin/cfssljson -L https://github.com/cloudflare/cfssl/releases/download/v1.6.1/cfssljson_1.6.1_linux_amd64
sudo chmod +x /bin/cfssl
sudo chmod +x /bin/cfssljson
# or: sudo apt install golang-cfssl
#
# Generate everything inside /etc/etcd/ssl, which is the directory copied to the other nodes below.
sudo mkdir -p /etc/etcd/ssl
cd /etc/etcd/ssl
#
# Generating config.
sudo tee config.json >/dev/null <<-'EOF'
{"signing":{"default":{"expiry":"87600h"},"profiles":{"etcd-cluster-1":{"usages":["signing","key encipherment","server auth","client auth"],"expiry":"87600h"}}}}
EOF
#
# Generating CA certificate signing request config.
sudo tee ca-csr.json >/dev/null <<-'EOF'
{"CN":"WL4G Root CA cert issuer","CA":{"expiry":"87600h","pathlen":0},"key":{"algo":"rsa","size":2048},"names":[{"C":"US","L":"San Francisco 12th street","O":"WL4G company, Inc.","OU":"www dept","ST":"California"}]}
EOF
#
# Generating etcd certificate signing request config.
# The hosts entries become certificate SANs, so they are bare IPs and hostnames without a URL scheme.
sudo tee etcd-csr.json >/dev/null <<-'EOF'
{"hosts":["10.0.0.121","10.0.0.122","10.0.0.123","k8s-master-1","k8s-master-2","k8s-master-3","etcd.wl4g.uat","n1.etcd.wl4g.uat","n2.etcd.wl4g.uat","n3.etcd.wl4g.uat","127.0.0.1"],"CN":"wl4g.uat","key":{"algo":"rsa","size":2048},"names":[{"C":"CN","L":"GuangZhou 6th street","O":"SM, Inc.","OU":"WWW dept","ST":"GuangDong"}]}
EOF
#
# Generating CA certificate.
sudo cfssl genkey -initca ca-csr.json | sudo cfssljson -bare ca
#
# Generating etcd certificate.
sudo cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=config.json -profile=etcd-cluster-1 etcd-csr.json | sudo cfssljson -bare etcd
#
# Print CA and etcd certificate.
sudo openssl x509 -in ca.pem -noout -text
sudo openssl x509 -in etcd.pem -noout -text
#
# Copy to other nodes directory.
sudo scp -r /etc/etcd/ssl k8s-master-2:/etc/etcd
sudo scp -r /etc/etcd/ssl k8s-master-3:/etc/etcd
sudo scp -r /etc/etcd/ssl k8s-worker-1:/etc/etcd
2.6 Start the etcd cluster and verify
Verify with etcdctl endpoint status -w table.
# n1.etcd.wl4g.uat
sudo systemctl restart etcd
# n2.etcd.wl4g.uat
sudo systemctl restart etcd
# n3.etcd.wl4g.uat
sudo systemctl restart etcd
# View the log (written by /etc/init.d/etcd.sh into $ETCD_LOG_DIR).
sudo tail -f /mnt/disk1/log/etcd/etcd.stdout
#
# Run etcdctl on any node to check the cluster status.
etcdctl endpoint status -w table --endpoints https://10.0.0.121:2379,https://10.0.0.122:2379,https://10.0.0.123:2379 --cacert /etc/etcd/ssl/ca.pem --cert /etc/etcd/ssl/etcd.pem --key /etc/etcd/ssl/etcd-key.pem
#
# Or drop the long option list (already configured in /etc/etcd/etcd.conf through the ETCDCTL_ prefixed variables).
etcdctl endpoint status -w table
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|        ENDPOINT         |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.0.0.121:2379 | e6e9b682e792c054 |   3.5.0 |   20 kB |     false |      false |         3 |         11 |                 11 |        |
| https://10.0.0.122:2379 | 79378de21f8ee3a3 |   3.5.0 |   25 kB |      true |      false |         3 |         11 |                 11 |        |
| https://10.0.0.123:2379 | aafe9b852edcd7e6 |   3.5.0 |   29 kB |     false |      false |         3 |         11 |                 11 |        |
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
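A healthy cluster should report exactly one leader. The check can be scripted roughly as follows; this is a sketch that assumes the v3.5 table layout shown above, where IS LEADER is the fifth data column, and count_leaders is a hypothetical helper.

```shell
# Hypothetical helper: count rows whose "IS LEADER" column is "true" in the
# output of `etcdctl endpoint status -w table`, read from stdin.
count_leaders() {
  awk -F'|' '
    { v = $6; gsub(/[ \t]/, "", v) }  # field 6 = IS LEADER (field 1 is empty)
    v == "true" { n++ }
    END { print n + 0 }
  '
}

# Rows copied from the sample output above.
sample='| https://10.0.0.121:2379 | e6e9b682e792c054 | 3.5.0 | 20 kB | false | false | 3 | 11 | 11 | |
| https://10.0.0.122:2379 | 79378de21f8ee3a3 | 3.5.0 | 25 kB | true | false | 3 | 11 | 11 | |
| https://10.0.0.123:2379 | aafe9b852edcd7e6 | 3.5.0 | 29 kB | false | false | 3 | 11 | 11 | |'
printf '%s\n' "$sample" | count_leaders   # prints 1

# Against a live cluster:
# etcdctl endpoint status -w table | count_leaders
```

Header and border lines are ignored because their sixth field is never the literal string true.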
3. Backing up data snapshots
Note: an etcdctl snapshot backup can connect to only one node at a time.
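Because of this restriction, the multi-endpoint value of ETCDCTL_ENDPOINTS cannot be used directly for a backup. One option, sketched here with a hypothetical first_endpoint helper, is to take the first entry of the comma-separated list instead of hard-coding a node address.

```shell
# Hypothetical helper: print the first entry of a comma-separated endpoint list.
first_endpoint() {
  printf '%s\n' "${1%%,*}"
}

# Same list as ETCDCTL_ENDPOINTS in /etc/etcd/etcd.conf.
endpoints='https://10.0.0.121:2379,https://10.0.0.122:2379,https://10.0.0.123:2379'
first_endpoint "$endpoints"   # prints https://10.0.0.121:2379

# A snapshot could then be taken against that single endpoint, for example:
# ETCDCTL_ENDPOINTS="$(first_endpoint "$ETCDCTL_ENDPOINTS")" etcdctl snapshot save /path/to/snapshot.db
```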
/etc/etcd/etcd-backup.sh
sudo tee /etc/etcd/etcd-backup.sh >/dev/null <<-'EOF'
#!/bin/bash
# Copyright (c) 2017 ~ 2025, the original author wangl.sir individual Inc,
# All rights reserved. Contact us
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
set -e

# Load environments. (include: /etc/profile.d/profile-etcd.sh, /etc/etcd/etcd.conf)
. /etc/profile

# Global definition.
export backupEndpoint='https://10.0.0.121:2379'
export backupMaxFiles='8'
export backupDir="${ETCD_DATA_DIR}/backups"; mkdir -p "$backupDir"
export backupLogFile="$backupDir/backup.log"
export backupLogMaxBytes=$((10*1024*1024*1024)) # 10GB
export backupCurrentFile="${backupDir}/snapshot-$(date +%Y%m%d_%H%M%S).db"

function doBackupSnapshot() {
  # Temporarily override the etcdctl endpoints environment.
  local originEtcdctlEndpoints=$ETCDCTL_ENDPOINTS
  export ETCDCTL_ENDPOINTS=$backupEndpoint
  {
    echo "$(date +%Y-%m-%d_%H:%M:%S) - Backing up $backupEndpoint ..."
    etcdctl snapshot save "$backupCurrentFile"
    echo "$(date +%Y-%m-%d_%H:%M:%S) - Success backup to $backupCurrentFile on $backupEndpoint"
  } 2>&1 | tee -a "$backupLogFile"
  # Restore etcdctl endpoints environments
  export ETCDCTL_ENDPOINTS=$originEtcdctlEndpoints
}

function doCleanup() {
  # Check older snapshots: keep only the newest $backupMaxFiles snapshot files.
  echo "$(date +%Y-%m-%d_%H:%M:%S) - Cleaning older backup snapshot files ..." | tee -a "$backupLogFile"
  cd "$backupDir" && ls -t snapshot-*.db 2>/dev/null | tail -n +$((backupMaxFiles + 1)) | xargs -r rm -f
  # Check backup log.
  local logSize=$(ls -l "$backupLogFile" | awk '{ print $5 }')
  if [ "$logSize" -gt "$backupLogMaxBytes" ]; then
    echo > "$backupLogFile"
    echo "$(date +%Y-%m-%d_%H:%M:%S) - Cleaned backup log." | tee -a "$backupLogFile"
  fi
}

# --- Main. ---
doBackupSnapshot
doCleanup
EOF

sudo chmod +x /etc/etcd/etcd-backup.sh
# Add backup to crontab.
echo "*/5 * * * * root /etc/etcd/etcd-backup.sh" | sudo tee -a /etc/crontab >/dev/null
# Add kill zombie process (signals the parent of each zombie).
echo "0 3 * * * root /bin/ps -A -ostat,ppid | grep -e '^[Zz]' | awk '{print \$2}' | xargs kill -HUP > /dev/null 2>&1" | sudo tee -a /etc/crontab >/dev/null
sudo systemctl restart crond
sudo tail -f /var/log/cron
# or: sudo journalctl -u crond -f