Kubernetes

Etcd + VS Code source debugging + production deployment

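1. Setting up the development environment

Background:

A new customer's servers needed a Kubernetes cluster. After etcd was installed, it failed to start with the following errors: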

{"level":"fatal","ts":"2021-08-25T16:24:00.044+0800","caller":"etcdmain/etcd.go:203","msg":"discovery failed","error":"couldn't find local name \"default\" in the initial cluster configuration","stacktrace":"etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tetcd/server/etcdmain/etcd.go:203\netcd/server/v3/etcdmain.Main\n\tetcd/server/etcdmain/main.go:40\nmain.main\n\tetcd/server/main.go:32\nruntime.main\n\truntime/proc.go:225"}

{"level":"fatal","ts":"2021-08-25T16:24:00.044+0800","caller":"etcdmain/etcd.go:203","msg":"discovery failed","error":"cannot fetch cluster info from peer urls: could not retrieve cluster information from the given URLs","stacktrace":"etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tetcd/server/etcdmain/etcd.go:203\netcd/server/v3/etcdmain.Main\n\tetcd/server/etcdmain/main.go:40\nmain.main\n\tetcd/server/main.go:32\nruntime.main\n\truntime/proc.go:225"}

1.1 Generate the VS Code debug configuration file

cd $PROJECT_HOME
mkdir -p .vscode
cat <<-'EOF' > .vscode/launch.json
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Launch Package",
            "type": "go",
            "request": "launch",
            "mode": "debug",
            "program": "${workspaceFolder}/server",
            "args": ["--config-file=/etc/etcd/etcd.conf.yml"]
        }
    ]
}
EOF

1.2 Generate an etcd configuration file (demo only)

sudo mkdir -p /etc/etcd
sudo tee /etc/etcd/etcd.conf.yml >/dev/null <<-'EOF'
name: 'etcd1'
listen-peer-urls: https://10.0.0.123:2380
listen-client-urls: https://10.0.0.123:2379
initial-advertise-peer-urls: https://10.0.0.123:2380
advertise-client-urls: https://10.0.0.123:2379
initial-cluster: etcd1=https://10.0.0.123:2380,etcd2=https://10.0.0.122:2380,etcd3=https://10.0.0.121:2380
initial-cluster-token: 'etcd-cluster'
# Initial cluster state ('new' or 'existing').
initial-cluster-state: 'existing'
EOF
  • Start the etcd server from VS Code

[Screenshot: shot1]
[Screenshot: shot2]

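1.3 Resolving the problem

I won't walk through the debugging session itself. The point here is how to set up an etcd development and debugging environment that can be used to track down hard production problems. The cause of the demo problem above: a new cluster has to be initialized, so initial-cluster-state must be set to new in the etcd.conf.yml of every member. Once the cluster has been initialized, it is best to set it to existing for later restarts, to avoid other problems.
Note: if set to existing, etcd will attempt to join the existing cluster. If the wrong value is set, etcd will attempt to start but fail safely.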
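The first fatal error above says that the local member name ("default", the name etcd uses when none is given) is missing from the initial cluster list. A minimal pre-flight sketch in shell, assuming the member name and the initial-cluster string are passed in by hand; the function name name_in_initial_cluster is hypothetical and not part of etcd:

```shell
#!/bin/bash
# Hypothetical helper: succeed if member name $1 appears as a key in the
# comma-separated "name=url" list $2 (the format used by initial-cluster).
name_in_initial_cluster() {
  local name="$1" cluster="$2" entry
  local IFS=','
  for entry in $cluster; do
    [ "${entry%%=*}" = "$name" ] && return 0
  done
  return 1
}

cluster='etcd1=https://10.0.0.123:2380,etcd2=https://10.0.0.122:2380,etcd3=https://10.0.0.121:2380'
name_in_initial_cluster 'etcd1' "$cluster" && echo "etcd1: listed"
name_in_initial_cluster 'default' "$cluster" || echo "default: not listed"
```

Running a check like this before starting etcd catches a name that is missing or misspelled in the config file without reading the stack trace.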


------------------------------------------------------------------------------------------------------


2. Production deployment

  • Cluster deployment topology:

IP          Host          FQDN              etcd.conf#etcd.name
10.0.0.121  k8s-master-1  n1.etcd.wl4g.uat  etcd1
10.0.0.122  k8s-master-2  n2.etcd.wl4g.uat  etcd2
10.0.0.123  k8s-master-3  n3.etcd.wl4g.uat  etcd3

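Note: the most elegant way to configure a stateful service like etcd would be to use domain names or hostnames, so that it can be moved between environments (with different IPs) easily. However, etcd requires that binding addresses such as listen-peer-urls be IPs ("Validation error: expected IP in URL for binding", see https://github.com/etcd-io/etcd/issues/9575), so this is currently not possible unless the source is modified: https://github.com/etcd-io/etcd/blob/release-3.5/server/embed/config.go#L904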
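The restriction can be checked before etcd is started. The following is only a rough sketch of such a check in shell, not etcd's own validation: url_host and listen_url_has_ip are hypothetical helper names, and only dotted IPv4 addresses are recognized (no IPv6, no octet range check).

```shell
#!/bin/bash
# Hypothetical helper: print the host part of a URL such as https://host:port.
url_host() {
  local rest="${1#*://}"   # drop the scheme
  rest="${rest%%/*}"       # drop any path
  echo "${rest%:*}"        # drop the port
}

# Succeed only if the host of URL $1 looks like a dotted IPv4 address.
listen_url_has_ip() {
  printf '%s\n' "$(url_host "$1")" | grep -Eq '^[0-9]+(\.[0-9]+){3}$'
}

for url in 'https://10.0.0.121:2380' 'https://n1.etcd.wl4g.uat:2380'; do
  if listen_url_has_ip "$url"; then
    echo "$url: IP, acceptable for listen-*-urls"
  else
    echo "$url: hostname, rejected for listen-*-urls"
  fi
done
```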

  • Deployment preparation
sudo mkdir -p /usr/lib/etcd-current; cd /usr/lib/etcd-current
sudo chmod -R 755 /usr/lib/etcd-current
# Mirror (accelerated download) URL for the official GitHub release package
sudo curl -L -O https://github.91chifun.workers.dev/https://github.com//etcd-io/etcd/releases/download/v3.5.0/etcd-v3.5.0-linux-amd64.tar.gz
sudo tar -xf etcd-*.tar.gz --strip-components=1 -C $(pwd)
sudo rm -rf etcd-*.tar.gz # cleanup
# link binary.
sudo ln -snf $(pwd)/etcd /usr/bin/etcd
sudo ln -snf $(pwd)/etcdctl /usr/bin/etcdctl

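2.1 Environment configuration

Note on the startup parameter ETCD_INITIAL_CLUSTER_STATE:
This is a somewhat silly parameter.
It indicates whether this is a new cluster and takes one of two values, new or existing. If set to existing, the member tries to contact the other members when it starts. When the cluster is first created it should be set to new; in my tests, setting the last node to existing also worked, but the other nodes must not be set to existing. While the cluster is running, a member that recovers from a failure should use existing; in my tests, new also worked.

Source code reference:
https://github.com/etcd-io/etcd/blob/v3.5.0/server/etcdserver/server.go#L424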
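The rule in this note can be sketched as a small shell function. It assumes that a member which has already been initialized has a member/snap/db file under its data directory (the path the profile script below tests); pick_cluster_state is a hypothetical name, and a member added to a running cluster with an empty data directory would still need existing set by hand.

```shell
#!/bin/bash
# Hypothetical helper: print 'existing' if data dir $1 already holds an
# initialized member (member/snap/db present), otherwise 'new'.
pick_cluster_state() {
  if [ -f "$1/member/snap/db" ]; then
    echo 'existing'
  else
    echo 'new'
  fi
}

# Demonstration on throwaway directories instead of a real data dir.
fresh="$(mktemp -d)"
used="$(mktemp -d)"
mkdir -p "$used/member/snap" && touch "$used/member/snap/db"
echo "fresh data dir -> $(pick_cluster_state "$fresh")"
echo "used data dir  -> $(pick_cluster_state "$used")"
rm -rf "$fresh" "$used"
```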

/etc/profile.d/profile-etcd.sh
sudo tee /etc/profile.d/profile-etcd.sh >/dev/null <<-'EOF'
#!/bin/bash
# Copyright (c) 2017 ~ 2025, the original author wangl.sir individual Inc,
# All rights reserved. Contact us 
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# cat /etc/etcd/etcd.conf | while read etcdVar; do
# if [[ -n "$(echo $etcdVar|sed 's/ //g')" && -z "$(echo $etcdVar|grep -E '^#|^export')" ]]; then
#   etcdVarName=$(echo $etcdVar|awk -F '=' '{print $1}'|sed 's/ //g')
#   etcdVarVal=$(echo $etcdVar|awk -F '=' '{print $2}'|sed 's/ //g')
#   eval "export $etcdVarName=\"$etcdVarVal\""
# fi
# done

. /etc/etcd/etcd.conf
export PATH=$PATH:/usr/lib/etcd-current

# This is a somewhat silly parameter.
# It indicates whether this is a new cluster and takes one of two values, 'new' or 'existing'.
# If set to 'existing', the member tries to contact the other members at startup.
# When the cluster is first created it should be 'new'. In testing, setting the last node
# to 'existing' also worked, but the other nodes must not use 'existing'.
# During cluster operation, a member recovering from a failure should use 'existing';
# in testing, 'new' also worked.
#
# see: https://etcd.io/docs/v3.5/op-guide/configuration/#--initial-cluster-state.
# see: https://github.com/etcd-io/etcd/blob/v3.5.0/server/etcdserver/server.go#L424
# see: https://my.oschina.net/u/160697/blog/4283750
#
# The naming rule: etcd1/etcd2/etcd3, the default set first 'etcd1' is leader.
#IS_LEADER="$([ "$ETCD_NAME" == "etcd1" ] && echo Y || echo N)"
#if [[ $IS_LEADER == 'Y' && -f "$ETCD_DATA_DIR/member/snap/db" ]]; then
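if [[ -f "$ETCD_DATA_DIR/member/snap/db" ]]; then
  # Data already on disk: this member has been initialized before.
  export ETCD_INITIAL_CLUSTER_STATE='existing'
else
  # Empty data dir: bootstrap as part of a new cluster.
  export ETCD_INITIAL_CLUSTER_STATE='new'
fi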

# [Additional] log directory. see: /etc/init.d/etcd.sh
export ETCD_LOG_DIR=/mnt/disk1/log/etcd
EOF

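sudo chmod +x /etc/profile.d/profile-etcd.sh
# The script sources /etc/etcd/etcd.conf (created in section 2.2), so load it only once that file exists.
[ -f /etc/etcd/etcd.conf ] && . /etc/profile.d/profile-etcd.sh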

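2.2 Runtime configuration

Note: the following is the configuration for node etcd1. On the other nodes, change ETCD_NAME to etcd2, etcd3 and so on, and update the address-related variables ETCD_LISTEN_PEER_URLS, ETCD_LISTEN_CLIENT_URLS, ETCD_INITIAL_ADVERTISE_PEER_URLS and ETCD_ADVERTISE_CLIENT_URLS to that node's actual IP in the same way.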
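To make the per-node changes less error-prone, the lines that differ could be generated instead of edited by hand. A minimal sketch, assuming the same ports as the file below (2380 for peers, 2379 for clients); node_overrides is a hypothetical helper, not part of etcd:

```shell
#!/bin/bash
# Hypothetical helper: print the per-node variables for member $1 bound to IP $2.
node_overrides() {
  local name="$1" ip="$2"
  cat <<EOF
export ETCD_NAME='${name}'
export ETCD_LISTEN_PEER_URLS='https://${ip}:2380'
export ETCD_INITIAL_ADVERTISE_PEER_URLS='https://${ip}:2380'
export ETCD_ADVERTISE_CLIENT_URLS='https://${ip}:2379'
EOF
}

node_overrides etcd2 10.0.0.122
```

ETCD_LISTEN_CLIENT_URLS is left out of the sketch because the file below binds it to 0.0.0.0, which is the same on every node.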

/etc/etcd/etcd.conf
sudo mkdir -p /etc/etcd
sudo tee /etc/etcd/etcd.conf >/dev/null <<-'EOF'
# Copyright (c) 2017 ~ 2025, the original author wangl.sir individual Inc,
# All rights reserved. Contact us 
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Refer to see: https://etcd.io/docs/v3.5/op-guide/configuration
#

# Member environments.
export ETCD_NAME='etcd1'
export ETCD_DATA_DIR='/mnt/disk1/etcd'
# The dedicated wal directory. (default={data-dir}/wal)
export ETCD_WAL_DIR="${ETCD_DATA_DIR}/wal"
export ETCD_SNAPSHOT_COUNT=10000
export ETCD_HEARTBEAT_INTERVAL=200
export ETCD_ELECTION_TIMEOUT=5000
export ETCD_LISTEN_PEER_URLS='https://10.0.0.121:2380'
export ETCD_LISTEN_CLIENT_URLS='https://0.0.0.0:2379'
export ETCD_MAX_SNAPSHOTS=10
export ETCD_MAX_WALS=10
export ETCD_CORS='*'
export ETCD_QUOTA_BACKEND_BYTES=0
export ETCD_BACKEND_BATCH_LIMIT=0
export ETCD_BACKEND_BBOLT_FREELIST_TYPE=map
export ETCD_BACKEND_BATCH_INTERVAL=0
export ETCD_MAX_TXN_OPS=128
export ETCD_MAX_REQUEST_BYTES=1572864
export ETCD_GRPC_KEEPALIVE_MIN_TIME=5s
export ETCD_GRPC_KEEPALIVE_INTERVAL=2h
export ETCD_GRPC_KEEPALIVE_TIMEOUT=20s

# Clustering environments.
# URLs the cluster members use to reach each other. Note: these values differ from node to node.
export ETCD_INITIAL_ADVERTISE_PEER_URLS='https://10.0.0.121:2380'
export ETCD_INITIAL_CLUSTER='etcd1=https://10.0.0.121:2380,etcd2=https://10.0.0.122:2380,etcd3=https://10.0.0.123:2380'
# The initial startup must be 'new', and the subsequent startup must be 'existing'. (default='new')
#export ETCD_INITIAL_CLUSTER_STATE='new'
export ETCD_INITIAL_CLUSTER_TOKEN='etcd-cluster'
export ETCD_ADVERTISE_CLIENT_URLS='https://10.0.0.121:2379'
#export ETCD_DISCOVERY=''
#export ETCD_DISCOVERY_PROXY=''
#export ETCD_DISCOVERY_SRV=''
#export ETCD_DISCOVERY_SRV_NAME=''
#export ETCD_DISCOVERY_FALLBACK='proxy'
#export ETCD_DISCOVERY_PROXY=''
export ETCD_STRICT_RECONFIG_CHECK=false
export ETCD_AUTO_COMPACTION_MODE=periodic
export ETCD_AUTO_COMPACTION_RETENTION=8
export ETCD_ENABLE_V2=false

# Profiling environments.
export ETCD_ENABLE_PPROF=true
export ETCD_METRICS='basic'
#export ETCD_LISTEN_METRICS_URLS=''

# Proxy environments.
export ETCD_PROXY='off'
export ETCD_PROXY_FAILURE_WAIT=5000
export ETCD_PROXY_REFRESH_INTERVAL=30000
export ETCD_PROXY_DIAL_TIMEOUT=1000
export ETCD_PROXY_WRITE_TIMEOUT=5000
export ETCD_PROXY_READ_TIMEOUT=0

# Auth environments.
#export ETCD_AUTH_TOKEN='simple'

# Security environments.
export ETCD_CERT_FILE='/etc/etcd/ssl/etcd.pem'
export ETCD_KEY_FILE='/etc/etcd/ssl/etcd-key.pem'
#export ETCD_CLIENT_CERT_AUTH=''
#export ETCD_CLIENT_CRL_FILE=''
#export ETCD_CLIENT_CERT_ALLOWED_HOSTNAME=''
export ETCD_TRUSTED_CA_FILE='/etc/etcd/ssl/ca.pem'
#export ETCD_AUTO_TLS=false
export ETCD_PEER_CERT_FILE='/etc/etcd/ssl/etcd.pem'
export ETCD_PEER_KEY_FILE='/etc/etcd/ssl/etcd-key.pem'
#export ETCD_PEER_CLIENT_CERT_AUTH=false
#export ETCD_PEER_CRL_FILE=''
export ETCD_PEER_TRUSTED_CA_FILE='/etc/etcd/ssl/ca.pem'
#export ETCD_PEER_AUTO_TLS=false
#export ETCD_PEER_CERT_ALLOWED_CN=''
#export ETCD_PEER_CERT_ALLOWED_HOSTNAME=''
# Notice: etcdctl Ssl handshake failed.
# [issue]: https://github.com/etcd-io/etcd/issues/9785
# [issue]: https://github.com/etcd-io/etcd/issues/10652
#export ETCD_CIPHER_SUITES=''

# Logging environments.
export ETCD_LOGGER=zap
# Specify 'stdout' or 'stderr' to skip journald logging even when running under systemd.
export ETCD_LOG_OUTPUTS='stderr,stdout'
export ETCD_LOG_LEVEL=debug

# Experimental environments.
#export ETCD_EXPERIMENTAL_CORRUPT_CHECK_TIME='0s'
#export ETCD_EXPERIMENTAL_COMPACTION_BATCH_LIMIT=1000
#export ETCD_EXPERIMENTAL_PEER_SKIP_CLIENT_SAN_VERIFICATION=false

# Miscellaneous environments.
#export ETCD_CONFIG_FILE=''

# Unsafe environments.
export ETCD_FORCE_NEW_CLUSTER=false

# Etcdctl environments.
export ETCDCTL_ENDPOINTS=https://10.0.0.121:2379,https://10.0.0.122:2379,https://10.0.0.123:2379
export ETCDCTL_CACERT=/etc/etcd/ssl/ca.pem
export ETCDCTL_CERT=/etc/etcd/ssl/etcd.pem
export ETCDCTL_KEY=/etc/etcd/ssl/etcd-key.pem
EOF

2.3 Management script

Compatible with older Linux systems that have no systemd, such as CentOS 6.

/etc/init.d/etcd.sh
sudo tee /etc/init.d/etcd.sh >/dev/null <<-'EOF'
#!/bin/bash
# chkconfig: - 85 15
#/*
# * Copyright 2017 ~ 2025 the original author or authors. <Wanglsir@gmail.com, 983708408@qq.com>
# *
# * Licensed under the Apache License, Version 2.0 (the "License");
# * you may not use this file except in compliance with the License.
# * You may obtain a copy of the License at
# *
# *      http://www.apache.org/licenses/LICENSE-2.0
# *
# * Unless required by applicable law or agreed to in writing, software
# * distributed under the License is distributed on an "AS IS" BASIS,
# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# * See the License for the specific language governing permissions and
# * limitations under the License.
# */

[ -f /etc/sysconfig/network ] && . /etc/sysconfig/network
[ "$NETWORKING" = "no" ] && exit 0

# Load the user environment.
[ -f "/etc/profile.d/profile-etcd.sh" ] && . /etc/profile.d/profile-etcd.sh
[ -f "/etc/bashrc" ] && . /etc/bashrc
[ -f "/etc/bash.bashrc" ] && . /etc/bash.bashrc # e.g ubuntu
[ -f "/home/$USER/.bash_profile" ] && . /home/$USER/.bash_profile
[ -f "/home/$USER/.bashrc" ] && . /home/$USER/.bashrc
# Mac OS
[ -f "/Users/$USER/.bash_profile" ] && . /Users/$USER/.bash_profile
[ -f "/Users/$USER/.bashrc" ] && . /Users/$USER/.bashrc

# Environment definition.
etcdBin="$(command -v etcd)"
etcdLogDir="${ETCD_LOG_DIR:-/mnt/disk1/log/etcd}"

function start() {
  local pids=$(getPids)
  if [ -z "$pids" ]; then
    # Notice: if '--config-file' is used configuration here, the environment variables
    # and startup parameters will become invalid. It is not recommended.
    if [ "$PPID" == "1" ]; then # Systemd call.
      # exec replaces this shell, so systemd tracks etcd itself and the wait loop below is skipped.
      exec "$etcdBin" > "$etcdLogDir/etcd.stdout" 2>&1
    else # Normal shell call.
      nohup "$etcdBin" > "$etcdLogDir/etcd.stdout" 2>&1 &
    fi

    echo -n "Starting etcd ..."
    while true
    do
      pids=$(getPids)
      if [ "$pids" == "" ]; then
        echo -n ".";
        sleep 0.8;
      else
        echo $pids >"$ETCD_DATA_DIR/etcd.pid"
        break
      fi
    done
    echo -e "\nStarted etcd on "$pids
  else
    echo "etcd process is running "$pids
  fi
}

function stop() {
  local pids=$(getPids)
  if [ -z "$pids" ]; then
    echo "etcd not running!"
  else
    echo -n "Stopping etcd for $pids ..."
    kill -s TERM $pids
    while true
    do
      pids=$(getPids)
      if [ "$pids" == "" ]; then
        \rm -f $ETCD_DATA_DIR/etcd.pid
        break
      else
        echo -n ".";
        sleep 0.8;
      fi
    done
    echo -e "\nStopped etcd !"
  fi
}

function status() {
  ps -ef | grep -v grep | grep $etcdBin
}

function getPids() {
  local pids=$(ps ax | grep -i "$etcdBin" | grep -v grep | awk '{print $1}')
  echo $pids # Output execution result value.
  return 0 # Return the execution result code.
}

# --- Main call. ---
CMD=$1
case $CMD in
  status)
    status
    ;;
  start)
    start
    ;;
  stop)
    stop
    ;;
  restart)
    stop
    start
    ;;
  *)
    echo "Usage: {start|stop|restart|status}"
    exit 2
    ;;
esac
EOF

. /etc/profile

sudo useradd etcd
sudo chown -R etcd:etcd /etc/init.d/etcd.sh
sudo chmod +x /etc/init.d/etcd.sh

sudo mkdir -p $ETCD_LOG_DIR
sudo chown -R etcd:etcd $ETCD_LOG_DIR

sudo mkdir -p $ETCD_DATA_DIR
sudo chown -R etcd:etcd $ETCD_DATA_DIR
# Value recommended by etcd (The recommended permission is "-rwx------" to prevent possible unprivileged access to the data)
sudo chmod 700 $ETCD_DATA_DIR

2.4 Service configuration

/etc/systemd/system/etcd.service
sudo tee /etc/systemd/system/etcd.service >/dev/null <<-'EOF'
# Copyright (c) 2017 ~ 2025, the original author wangl.sir individual Inc,
# All rights reserved. Contact us 
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
ExecStart=/bin/bash -c "/etc/init.d/etcd.sh start"
ExecReload=/bin/bash -c "/etc/init.d/etcd.sh restart"
ExecStop=/bin/bash -c "/etc/init.d/etcd.sh stop"
StandardOutput=null
StandardError=journal
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TimeoutStartSec=5
Restart=always
KillMode=process
User=etcd
Group=etcd
SuccessExitStatus=143

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable etcd

2.5 Configuring certificates

Generate the certificates (run this only on 10.0.0.121 / k8s-master-1, then distribute them with scp -r /etc/etcd/ssl k8s-master-2:/etc/etcd/).

/etc/etcd/ssl
sudo curl -o /bin/cfssl -L https://github.com/cloudflare/cfssl/releases/download/v1.6.1/cfssl_1.6.1_linux_amd64
sudo curl -o /bin/cfssljson -L https://github.com/cloudflare/cfssl/releases/download/v1.6.1/cfssljson_1.6.1_linux_amd64
sudo chmod +x /bin/cfssl
sudo chmod +x /bin/cfssljson
# Or: sudo apt install golang-cfssl
#
sudo mkdir -p /etc/etcd/ssl; cd /etc/etcd/ssl
#
# Generating config.
sudo tee config.json >/dev/null <<-'EOF'
{"signing":{"default":{"expiry":"87600h"},"profiles":{"etcd-cluster-1":{"usages":["signing","key encipherment","server auth","client auth"],"expiry":"87600h"}}}}
EOF
#
# Generating CA certificate signing request config.
sudo tee ca-csr.json >/dev/null <<-'EOF'
{"CN":"WL4G Root CA cert issuer","CA":{"expiry":"87600h","pathlen":0},"key":{"algo":"rsa","size":2048},"names":[{"C":"US","L":"San Francisco 12th street","O":"WL4G company, Inc.","OU":"www dept","ST":"California"}]}
EOF
#
# Generating etcd certificate signing request config.
# Note: "hosts" entries are bare IPs or hostnames (no https:// scheme) so they become IP/DNS SANs.
sudo tee etcd-csr.json >/dev/null <<-'EOF'
{"hosts":["10.0.0.121","10.0.0.122","10.0.0.123","k8s-master-1","k8s-master-2","k8s-master-3","etcd.wl4g.uat","n1.etcd.wl4g.uat","n2.etcd.wl4g.uat","n3.etcd.wl4g.uat","127.0.0.1"],"CN":"wl4g.uat","key":{"algo":"rsa","size":2048},"names":[{"C":"CN","L":"GuangZhou 6th street","O":"SM, Inc.","OU":"WWW dept","ST":"GuangDong"}]}
EOF
#
# Generating CA certificate.
sudo cfssl genkey -initca ca-csr.json | sudo cfssljson -bare ca
#
# Generating etcd certificate.
sudo cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=config.json -profile=etcd-cluster-1 etcd-csr.json | sudo cfssljson -bare etcd
#
# Print CA and etcd certificate.
sudo openssl x509 -in ca.pem -noout -text
sudo openssl x509 -in etcd.pem -noout -text
#
# Copy to other nodes directory.
sudo scp -r  /etc/etcd/ssl k8s-master-2:/etc/etcd
sudo scp -r  /etc/etcd/ssl k8s-master-3:/etc/etcd
sudo scp -r  /etc/etcd/ssl k8s-worker-1:/etc/etcd

2.6 Start the etcd cluster and verify

Verify with etcdctl endpoint status -w table:
# n1.etcd.wl4g.uat
sudo systemctl restart etcd
# n2.etcd.wl4g.uat
sudo systemctl restart etcd
# n3.etcd.wl4g.uat
sudo systemctl restart etcd
# View the logs
sudo tail -f /mnt/disk1/log/etcd/etcd.stdout
#
# Run etcdctl on any node to check the cluster status
etcdctl endpoint status -w table --endpoints https://10.0.0.121:2379,https://10.0.0.122:2379,https://10.0.0.123:2379 --cacert /etc/etcd/ssl/ca.pem --cert /etc/etcd/ssl/etcd.pem --key /etc/etcd/ssl/etcd-key.pem
#
# Or drop the long flag list (the values are already set in /etc/etcd/etcd.conf through the ETCDCTL_-prefixed variables)
etcdctl endpoint status -w table
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|        ENDPOINT         |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.0.0.121:2379 | e6e9b682e792c054 |   3.5.0 |   20 kB |     false |      false |         3 |         11 |                 11 |        |
| https://10.0.0.122:2379 | 79378de21f8ee3a3 |   3.5.0 |   25 kB |      true |      false |         3 |         11 |                 11 |        |
| https://10.0.0.123:2379 | aafe9b852edcd7e6 |   3.5.0 |   29 kB |     false |      false |         3 |         11 |                 11 |        |
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
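For scripting, the leader can be picked out of this table with awk. The following is only a sketch that parses the human-readable table layout shown above; leader_endpoint is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read `etcdctl endpoint status -w table` output on stdin
# and print the endpoint whose IS LEADER column is "true".
leader_endpoint() {
  awk -F'|' 'NF > 6 { gsub(/ /, "", $2); gsub(/ /, "", $6); if ($6 == "true") print $2 }'
}

# Sample rows copied from the output above (trailing columns shortened).
leader_endpoint <<'EOF'
| https://10.0.0.121:2379 | e6e9b682e792c054 |   3.5.0 |   20 kB |     false |      false |
| https://10.0.0.122:2379 | 79378de21f8ee3a3 |   3.5.0 |   25 kB |      true |      false |
| https://10.0.0.123:2379 | aafe9b852edcd7e6 |   3.5.0 |   29 kB |     false |      false |
EOF
```

On a live cluster the same function would be fed from a pipe, for example etcdctl endpoint status -w table | leader_endpoint.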

3. Backing up data snapshots

Note: etcdctl snapshot save can connect to only one endpoint at a time.

/etc/etcd/etcd-backup.sh
sudo tee /etc/etcd/etcd-backup.sh >/dev/null <<-'EOF'
#!/bin/bash
# Copyright (c) 2017 ~ 2025, the original author wangl.sir individual Inc,
# All rights reserved. Contact us 
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
set -e

# Load environments. (include: /etc/profile.d/profile-etcd.sh, /etc/etcd/etcd.conf)
. /etc/profile

# Global definition.
export backupEndpoint='https://10.0.0.121:2379'
export backupMaxFiles='8'
export backupDir="${ETCD_DATA_DIR}/backups"; mkdir -p $backupDir
export backupLogFile="$backupDir/backup.log"
export backupLogMaxBytes=$((10*1024*1024*1024)) # 10GB
export backupCurrentFile="${backupDir}/snapshot-$(date +%Y%m%d_%H%M%S).db"

function doBackupSnapshot() {
  # Temporarily point etcdctl at the single backup endpoint.
  local originEtcdctlEndpoints=$ETCDCTL_ENDPOINTS
  export ETCDCTL_ENDPOINTS=$backupEndpoint
  {
    echo "$(date +%Y-%m-%d_%H:%M:%S) - Backing up from $backupEndpoint ..."
    etcdctl snapshot save "$backupCurrentFile"
    echo "$(date +%Y-%m-%d_%H:%M:%S) - Success backup to $backupCurrentFile on $backupEndpoint"
  } 2>&1 | tee -a "$backupLogFile" 
  # Restore etcdctl endpoints environments
  export ETCDCTL_ENDPOINTS=$originEtcdctlEndpoints
}

function doCleanup() {
  # Check older snapshots.
  echo "$(date +%Y-%m-%d_%H:%M:%S) - Cleaning older backup snapshot files ..." | tee -a "$backupLogFile"
  ls -1t "$backupDir"/snapshot-*.db 2>/dev/null | tail -n +$((backupMaxFiles + 1)) | xargs -r rm -f

  # Check backup log.
  local logSize=$(ls -l "$backupLogFile" | awk '{ print $5 }')
  if [ "$logSize" -gt "$backupLogMaxBytes" ]; then
    echo > "$backupLogFile"
    echo "$(date +%Y-%m-%d_%H:%M:%S) - Cleaned backup log." | tee -a "$backupLogFile"
  fi
}

# --- Main. ---
doBackupSnapshot
doCleanup
EOF

sudo chmod +x /etc/etcd/etcd-backup.sh

# Add backup to crontab.
echo "*/5  *  *  *  * root  /etc/etcd/etcd-backup.sh" | sudo tee -a /etc/crontab >/dev/null
# Signal the parents of zombie processes so they can be reaped.
echo "  0  3  *  *  * root  /bin/ps -A -ostat,ppid | grep -e '^[Zz]' | awk '{print \$2}' | xargs kill -HUP > /dev/null 2>&1" | sudo tee -a /etc/crontab >/dev/null

sudo systemctl restart crond
sudo tail -f /var/log/cron
# Or: sudo journalctl -u crond -f
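The retention step of the backup script can be tried out safely on a scratch directory before it is trusted with real snapshots. A minimal sketch, assuming the snapshot-YYYYmmdd_HHMMSS.db naming used by the script, so that sorting by name is the same as sorting by age; prune_snapshots is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: keep the $2 newest snapshot-*.db files in directory $1
# and delete the rest. Relies on the timestamped names sorting chronologically.
prune_snapshots() {
  local dir="$1" keep="$2"
  ls -1 "$dir"/snapshot-*.db 2>/dev/null | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r f; do rm -f -- "$f"; done
}

# Demonstration on a throwaway directory with five fake snapshots.
demo="$(mktemp -d)"
for i in 1 2 3 4 5; do touch "$demo/snapshot-20210825_16000$i.db"; done
prune_snapshots "$demo" 2
ls -1 "$demo"   # lists the two newest files that were kept
rm -rf "$demo"
```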
