Docker offline binary deployment for production

Official documentation reference
Official binary repository

1.1 OS configuration

  • 1.1.1 Synchronize clocks

Synchronize the clocks of all nodes with Chrony.

  • 1.1.2 Disable firewalld

    sudo systemctl stop firewalld
    sudo systemctl disable firewalld
    sudo systemctl status firewalld
  • 1.1.3 Disable SELinux

sudo touch /etc/selinux/config # e.g. Ubuntu 20 does not ship this file
sudo sed -i 's/^SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
sudo setenforce 0 # takes effect immediately; the config file change applies after a reboot
  • 1.1.4 Kernel optimization

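    Adapted from: kainstall-ubuntu.sh#L385

    /etc/sysctl.d/99-kube.conf
    # Write through `sudo tee`: with `sudo cat ... >file` the redirect runs as the calling user and fails.
    sudo tee /etc/sysctl.d/99-kube.conf >/dev/null <<-'EOF'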
    # https://www.kernel.org/doc/Documentation/sysctl/
    #############################################################################################
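    # Virtual memory tuning
    #############################################################################################
    # Default: 60
    # 0 - do not swap until free and file-backed pages fall below the zone's high watermark.
    # 1 - swap as little as possible without disabling swap.
    vm.swappiness = 0
    # Memory overcommit policy
    # 0 - heuristic: the kernel refuses obviously excessive requests and returns an error to the process.
    # 1 - always overcommit: every allocation succeeds regardless of the current memory state.
    # 2 - strict: the commit limit is swap plus overcommit_ratio percent of RAM; requests beyond it fail.
    vm.overcommit_memory=1
    # OOM handling
    # 0 - the kernel invokes the OOM killer, which kills the most memory-hungry process.
    # 1 - the kernel panics when memory is exhausted.
    vm.panic_on_oom=0
    # vm.dirty_background_ratio sets when background writeback of dirty pages to disk starts.
    # Default value is 10.
    # The value is a percentage of total system memory; 5 is appropriate in many cases.
    # Do not set it to zero.
    vm.dirty_background_ratio = 5
    # vm.dirty_ratio is the share of dirty pages allowed before the kernel forces
    # synchronous writes to flush them to disk (also a percentage of system memory).
    # It can be raised above the mainline default of 20; values between 60 and 80 are suggested.
    vm.dirty_ratio = 60
    # vm.max_map_count is the maximum number of memory map areas a process may have.
    # The minimum sensible mmap limit is the open-file limit (cat /proc/sys/fs/file-max).
    # Rule of thumb: about 1 map per 128 KB of system memory, so 262144 on a 32 GB system.
    # Default: 65530
    vm.max_map_count = 2097152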
    #############################################################################################
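    # File system tuning
    #############################################################################################
    # fs.may_detach_mounts is a RHEL/CentOS 7 kernel setting; on kernels without it sysctl reports an unknown key.
    fs.may_detach_mounts = 1
    # Raise the file handle and inode cache limits and restrict core dumps.
    fs.file-max = 2097152
    fs.nr_open = 2097152
    fs.suid_dumpable = 0
    # File watching (inotify)
    fs.inotify.max_user_instances=8192
    fs.inotify.max_user_watches=524288
    fs.inotify.max_queued_events=16384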
    #############################################################################################
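    # Network tuning
    #############################################################################################
    # Default send and receive buffer size for each socket.
    net.core.wmem_default = 25165824
    net.core.rmem_default = 25165824
    # Maximum send and receive buffer size for each socket.
    net.core.wmem_max = 25165824
    net.core.rmem_max = 25165824
    # In addition to the socket settings, TCP send and receive buffer sizes
    # are set separately through net.ipv4.tcp_wmem and net.ipv4.tcp_rmem.
    # Each takes three space-separated integers: minimum, default, and maximum size.
    # The maximum cannot exceed net.core.wmem_max and net.core.rmem_max, which apply to all sockets.
    # A reasonable baseline is 4 KiB minimum, 64 KiB default, and 2 MiB maximum; the values below are larger.
    net.ipv4.tcp_wmem = 20480 12582912 25165824
    net.ipv4.tcp_rmem = 20480 12582912 25165824
    # Raise the total buffer space that can be allocated.
    # Measured in pages (4096 bytes).
    net.ipv4.tcp_mem = 65536 25165824 262144
    net.ipv4.udp_mem = 65536 25165824 262144
    # Minimum send and receive buffer size for each UDP socket.
    net.ipv4.udp_wmem_min = 16384
    net.ipv4.udp_rmem_min = 16384
    # Enable TCP window scaling so clients can transfer data more efficiently and proxies can buffer it.
    net.ipv4.tcp_window_scaling = 1
    # Raise the number of pending connection requests (SYN backlog).
    net.ipv4.tcp_max_syn_backlog = 10240
    # Raising net.core.netdev_max_backlog above its default of 1000
    # helps absorb bursts of traffic, especially on multi-gigabit links,
    # by letting more packets queue until the kernel can process them.
    net.core.netdev_max_backlog = 65536
    # Raise the maximum option memory buffer size.
    net.core.optmem_max = 25165824
    # Number of SYN-ACK retransmissions for passive TCP connections.
    net.ipv4.tcp_synack_retries = 2
    # Allowed local port range.
    net.ipv4.ip_local_port_range = 2048 65535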
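    # Protect against TIME-WAIT assassination (RFC 1337).
    # Default: net.ipv4.tcp_rfc1337 = 0
    net.ipv4.tcp_rfc1337 = 1
    # Lower the default tcp_fin_timeout.
    net.ipv4.tcp_fin_timeout = 15
    # Maximum length of the listen backlog queue.
    # Default is 128.
    net.core.somaxconn = 32768
    # Enable SYN cookies for SYN flood protection.
    net.ipv4.tcp_syncookies = 1
    # Mitigate Smurf attacks:
    # the attacker sends forged ICMP packets to a network broadcast address with the victim as the source,
    # so every host that receives them replies to the victim, flooding it with thousands of packets.
    net.ipv4.icmp_echo_ignore_broadcasts = 1
    # Ignore bogus ICMP error responses.
    net.ipv4.icmp_ignore_bogus_error_responses = 1
    # Enable automatic window scaling.
    # This lets TCP buffers grow beyond the usual 64K maximum when latency justifies it.
    net.ipv4.tcp_window_scaling = 1
    # Log spoofed, source-routed, and redirect packets.
    net.ipv4.conf.all.log_martians = 1
    net.ipv4.conf.default.log_martians = 1
    # Maximum number of TCP sockets not attached to any user file handle. Beyond this number,
    # orphaned connections are reset immediately and a warning is printed.
    # Default: net.ipv4.tcp_max_orphans = 65536
    net.ipv4.tcp_max_orphans = 65536
    # Do not cache metrics when a connection closes.
    net.ipv4.tcp_no_metrics_save = 1
    # Enable timestamps as defined in RFC 1323:
    # Default: net.ipv4.tcp_timestamps = 1
    net.ipv4.tcp_timestamps = 1
    # Enable selective acknowledgments.
    # Default: net.ipv4.tcp_sack = 1
    net.ipv4.tcp_sack = 1
    # Size the TIME-WAIT bucket pool to guard against simple DoS attacks.
    # net.ipv4.tcp_tw_recycle was removed in Linux 4.12; use net.ipv4.tcp_tw_reuse instead.
    net.ipv4.tcp_max_tw_buckets = 14400
    net.ipv4.tcp_tw_reuse = 1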
    # accept_source_route makes an interface accept packets with the strict source route (SSR) or loose source route (LSR) option set.
    # The settings below drop packets that carry the SSR or LSR option.
    net.ipv4.conf.all.accept_source_route = 0
    net.ipv4.conf.default.accept_source_route = 0
    # Enable reverse path filtering.
    net.ipv4.conf.all.rp_filter = 1
    net.ipv4.conf.default.rp_filter = 1
    # Do not accept ICMP redirects.
    net.ipv4.conf.all.accept_redirects = 0
    net.ipv4.conf.default.accept_redirects = 0
    net.ipv4.conf.all.secure_redirects = 0
    net.ipv4.conf.default.secure_redirects = 0
    # Do not send any IPv4 ICMP redirect packets.
    net.ipv4.conf.all.send_redirects = 0
    net.ipv4.conf.default.send_redirects = 0
    # Enable IP forwarding.
    net.ipv4.ip_forward = 1
    # Disable IPv6.
    net.ipv6.conf.lo.disable_ipv6=1
    net.ipv6.conf.all.disable_ipv6 = 1
    net.ipv6.conf.default.disable_ipv6 = 1
    # Pass bridged traffic through ip6tables, iptables, and arptables (1 = enabled).
    # These keys exist only while the br_netfilter module is loaded.
    net.bridge.bridge-nf-call-ip6tables = 1
    net.bridge.bridge-nf-call-iptables = 1
    net.bridge.bridge-nf-call-arptables = 1
    # ARP cache
    # Minimum number of entries kept in the ARP cache; the garbage collector does not run below this number. Default: 128
    net.ipv4.neigh.default.gc_thresh1=2048
    # Soft limit on ARP cache entries; the garbage collector lets the count exceed it for 5 seconds before collecting. Default: 512
    net.ipv4.neigh.default.gc_thresh2=4096
    # Hard limit on ARP cache entries; the garbage collector runs immediately once the count exceeds it. Default: 1024
    net.ipv4.neigh.default.gc_thresh3=8192
    # Keepalive
    net.ipv4.tcp_keepalive_time = 600
    net.ipv4.tcp_keepalive_intvl = 30
    net.ipv4.tcp_keepalive_probes = 10
    # conntrack table
    net.nf_conntrack_max=1048576
    net.netfilter.nf_conntrack_max=1048576
    net.netfilter.nf_conntrack_buckets=262144
    net.netfilter.nf_conntrack_tcp_timeout_fin_wait=30
    net.netfilter.nf_conntrack_tcp_timeout_time_wait=30
    net.netfilter.nf_conntrack_tcp_timeout_close_wait=15
    net.netfilter.nf_conntrack_tcp_timeout_established=300
    #############################################################################################
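    # Kernel tuning
    #############################################################################################
    # Address space layout randomization (ASLR) is a memory protection mechanism that guards against buffer overflow attacks.
    # It makes the memory addresses of running processes unpredictable,
    # so flaws or vulnerabilities in those processes are harder to exploit.
    # Accepted values: 0 = off, 1 = conservative randomization, 2 = full randomization
    kernel.randomize_va_space = 2
    # Raise the PID limit
    kernel.pid_max = 65536
    kernel.threads-max=30938
    # coredump
    kernel.core_pattern=core
    # Controls whether the kernel panics automatically when a soft lockup is detected. Default: 0
    kernel.softlockup_all_cpu_backtrace=1
    kernel.softlockup_panic=1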
    EOF
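
In a file this long a key can easily end up assigned twice, and the net.bridge.* keys are only available while the br_netfilter module is loaded. The following is a minimal sketch of how the file might be checked and then applied; sysctl_dup_keys is a hypothetical helper name introduced here, not part of any tool.

```bash
# Hypothetical helper: print every key that is assigned more than once in a sysctl file.
sysctl_dup_keys() {
  # Drop comment and blank lines, keep the text before '=', strip spaces and tabs,
  # then report keys that occur more than once.
  grep -Ev '^[[:space:]]*(#|$)' "$1" \
    | cut -d= -f1 \
    | tr -d ' \t' \
    | sort | uniq -d
}

# Typical use on a node (the last two commands require root):
#   sysctl_dup_keys /etc/sysctl.d/99-kube.conf
#   sudo modprobe br_netfilter
#   sudo sysctl --system
```

A duplicated key is not an error for sysctl (the last assignment wins), so the check is only a readability aid.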
    

1.2 Install (binary)

sudo mkdir -p /usr/lib/docker-current; cd /usr/lib/docker-current
sudo chmod -R 755 /usr/lib/docker-current
sudo curl -O https://download.docker.com/linux/static/stable/x86_64/docker-20.10.7.tgz
sudo tar -xf docker-20.10.7.tgz --strip-components=1 -C $(pwd)
sudo rm -rf docker-*.tgz # cleanup
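
For a truly offline node, the archive is usually fetched on a connected machine and copied over. The sketch below builds the download URL from a version and an architecture, following the URL pattern used above; docker_static_url is a hypothetical helper, and it assumes the directory name on download.docker.com equals the output of uname -m, which holds for x86_64 as used here but should be checked for other architectures.

```bash
# Hypothetical helper: build the static-binary URL for a Docker version and architecture.
docker_static_url() {
  local version="$1"
  local arch="${2:-$(uname -m)}"   # assumption: directory name matches `uname -m`
  printf 'https://download.docker.com/linux/static/stable/%s/docker-%s.tgz\n' "$arch" "$version"
}

# On a machine with internet access:
#   curl -fLO "$(docker_static_url 20.10.7 x86_64)"
# then copy docker-20.10.7.tgz to the offline node and unpack it as shown above.
```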

1.3 Environment configuration

sudo tee /etc/profile.d/profile-docker.sh >/dev/null <<-'EOF'
#!/bin/bash
# Copyright (c) 2017 ~ 2025, the original author wangl.sir individual Inc,
# All rights reserved. Contact us <wanglsir@gmail.com, 983708408@qq.com>
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
export DOCKER_HOME=/usr/lib/docker-current
export PATH=$PATH:$DOCKER_HOME
EOF

. /etc/profile.d/profile-docker.sh

# Link the binaries into /usr/bin.
for f in "$DOCKER_HOME"/*; do sudo ln -snf "$f" "/usr/bin/$(basename "$f")"; done

1.4 Configure services

docker.service

/etc/systemd/system/docker.service
sudo tee /etc/systemd/system/docker.service >/dev/null <<-'EOF'
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket containerd.service
#
[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always
#
# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3
#
# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s
#
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=262144
LimitNPROC=infinity
LimitCORE=infinity
#
# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity
#
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
#
# kill only the docker process, not all processes in the cgroup
KillMode=process
OOMScoreAdjust=-500
#
[Install]
WantedBy=multi-user.target
EOF

containerd.service

/etc/systemd/system/containerd.service
sudo tee /etc/systemd/system/containerd.service >/dev/null <<-'EOF'
# Copyright The containerd Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target local-fs.target
#
[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/bin/containerd
#
Type=notify
Delegate=yes
KillMode=process
Restart=always
RestartSec=5
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=262144
LimitNPROC=infinity
LimitCORE=infinity
# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity
OOMScoreAdjust=-999
#
[Install]
WantedBy=multi-user.target
EOF

docker.socket

(Without this unit, startup fails with: Failed to start docker.service: Unit docker.socket not found.)

sudo tee /etc/systemd/system/docker.socket >/dev/null <<-'EOF'
[Unit]
Description=Docker Socket for the API

[Socket]
ListenStream=/var/run/docker.sock
SocketMode=0660
SocketUser=root
SocketGroup=docker

[Install]
WantedBy=sockets.target
EOF

1.4 Create the user

sudo groupadd docker
sudo useradd docker -g docker

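1.5 Configure daemon.json (recommended inside mainland China)

The main change here is registry-mirrors; the other settings are only needed when deploying Kubernetes and are optional.

sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json >/dev/null <<-'EOF'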
{
    "registry-mirrors": ["https://hjbu3ivg.mirror.aliyuncs.com"],
    "data-root": "/var/lib/docker",
    "log-level": "warn",
    "log-driver": "json-file",
    "log-opts": {
      "max-size": "200m",
      "max-file": "5"
    },
    "default-ulimits": {
      "nofile": {
        "Name": "nofile",
        "Hard": 65535,
        "Soft": 65535
      },
      "nproc": {
        "Name": "nproc",
        "Hard": 65535,
        "Soft": 65535
      }
    },
    "live-restore": true,
    "oom-score-adjust": -1000,
    "max-concurrent-downloads": 10,
    "max-concurrent-uploads": 10,
    "storage-driver": "overlay2",
    "storage-opts": ["overlay2.override_kernel_check=true"],
    "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
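  • Note: nofile and nproc can be raised as needed, but must not exceed the system limits shown by ulimit -a; otherwise the daemon fails to start or containers fail to run. See #2.5.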
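
The comparison in the note above can be sketched as a small shell check. limit_fits is a hypothetical helper introduced for illustration; it treats the literal value "unlimited", as printed by ulimit, as always sufficient.

```bash
# Hypothetical helper: succeed if a requested limit fits within a system limit.
# $1 = requested value (e.g. the Hard value from daemon.json)
# $2 = system limit as printed by ulimit (a number or "unlimited")
limit_fits() {
  local requested="$1" system_limit="$2"
  [ "$system_limit" = "unlimited" ] || [ "$requested" -le "$system_limit" ]
}

# Typical use in bash, with the values from the daemon.json above:
#   limit_fits 65535 "$(ulimit -Hn)" || echo "nofile exceeds the hard limit"
#   limit_fits 65535 "$(ulimit -Hu)" || echo "nproc exceeds the hard limit"
```

The limits of an interactive shell can differ from those of the dockerd process started by systemd, so this is only a first check.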

1.6 Start and test

sudo systemctl daemon-reload
sudo systemctl enable docker
sudo systemctl start docker
sudo systemctl status docker
# This command downloads a test image and runs it in a container. The container prints an informational message and exits.
sudo docker run hello-world

1.7 View the Docker daemon logs

  • Ubuntu (old, using upstart) : /var/log/upstart/docker.log
  • Ubuntu (new, using systemd) : sudo journalctl -fu docker.service
  • Amazon Linux AMI : /var/log/docker
  • Boot2Docker : /var/log/docker.log
  • Debian GNU/Linux : /var/log/daemon.log
  • CentOS : grep docker /var/log/messages
  • CoreOS : journalctl -u docker.service
  • Fedora : journalctl -u docker.service
  • Red Hat Enterprise Linux Server : grep docker /var/log/messages
  • OpenSuSE : journalctl -u docker.service
  • OSX : ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/log/docker.log
  • Windows : Get-EventLog -LogName Application -Source Docker -After (Get-Date).AddMinutes(-5) | Sort-Object Time

Reference 1: view docker log file

2. FAQ

2.1 Docker fails to start: Failed to start docker.service: Unit docker.service is masked.

Failed to start docker.service: Unit containerd.service is masked.

  • Solution
    sudo systemctl unmask docker.service
    sudo systemctl unmask docker.socket
    or
    sudo systemctl unmask containerd.service
    sudo systemctl unmask containerd.socket

2.2 Startup fails with: Failed at step LIMITS spawning /usr/bin/dockerd: Operation not permitted

Solution: in /etc/systemd/system/containerd.service and /etc/systemd/system/docker.service, lower LimitNOFILE=infinity to LimitNOFILE=65535. The file handle limit must not exceed the current system limit (see ulimit -a).

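2.3 How to configure a proxy when a deployment (for example an istio application) needs to reach the gcr.io registry?

  • Solution
# Add the proxy configuration
sudo mkdir -p /etc/systemd/system/docker.service.d/
sudo tee /etc/systemd/system/docker.service.d/http-proxy.conf >/dev/null <<-'EOF'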
# Copyright (c) 2017 ~ 2025, the original author wangl.sir individual Inc,
# All rights reserved. Contact us <wanglsir@gmail.com, 983708408@qq.com>
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
[Service]
Environment="HTTP_PROXY=http://127.0.0.1:8118" "HTTPS_PROXY=http://127.0.0.1:8118"
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
# Show the environment configured for the docker service
sudo systemctl show --property=Environment docker

  • The 127.0.0.1:8118 proxy service above can be set up by following: Building an HTTP proxy service with shadowsock + privoxy (https://blogs.wl4g.com/archives/121)

  • Verify

git clone https://github.com/istio/istio
cd istio
git checkout 1.9.9
make build
# or: docker pull gcr.io/istio-testing/build-tools:release-1.9-2021-09-09T06-24-57

2.3 Every systemctl stop docker prints: Warning: Stopping docker.service, but it can still be activated by: docker.socket

  • Analysis (stackoverflow.com/questions/47489631/warning-stopping-docker-service-but-it-can-still-be-activated-by-docker-socke): This is because in addition to the docker.service unit file, there is a docker.socket unit file… this is for socket activation. The warning means if you try to connect to the docker socket while the docker service is not running, then systemd will automatically start docker for you. You can get rid of this by removing /lib/systemd/system/docker.socket… you may also need to remove -H fd:// from the docker.service unit file.

  • Solution

# Remove the docker.socket dependency from docker.service, i.e. change the line to Requires=containerd.service
sudo systemctl daemon-reload
sudo systemctl restart docker
# If docker still does not start, restart containerd, then unmask the units as described in #2.1
sudo systemctl restart containerd

2.4 Other common Docker troubleshooting references

2.5 No container can run; every attempt fails with: setting rlimits for ready process caused: error setting rlimit type 7: operation not permitted: unknown.

  • Reproduce
docker run --rm hello-world
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:446: setting rlimits for ready process caused: error setting rlimit type 7: operation not permitted: unknown.
  • Analysis

This usually happens because the nproc or nofile limits that the docker or containerd process starts with exceed the system limits.

  • Solution

Check the settings in /etc/systemd/system/docker.service, /etc/systemd/system/containerd.service, and /etc/docker/daemon.json, lower them to no more than the system limits shown by ulimit -a, then restart docker.
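
On Linux, rlimit type 7 is RLIMIT_NOFILE, so the open-files limit is the first value to compare. One way to see the limits a running daemon actually has is to read its /proc/<pid>/limits file. The sketch below is illustrative only; proc_limit is a hypothetical helper, and it handles only rows that end in a units column, such as "Max open files" and "Max processes".

```bash
# Hypothetical helper: print "soft hard" for a named row of a /proc/<pid>/limits file.
# $1 = path to the limits file, $2 = row name exactly as it appears at the start of the line
proc_limit() {
  # The last three fields of such a row are: soft limit, hard limit, units.
  awk -v name="$2" 'index($0, name) == 1 { print $(NF-2), $(NF-1) }' "$1"
}

# Typical use on a node (assumes a single dockerd and a single containerd process):
#   proc_limit "/proc/$(pidof dockerd)/limits" "Max open files"
#   proc_limit "/proc/$(pidof containerd)/limits" "Max open files"
```

If the nofile value in daemon.json is larger than the hard limit printed for containerd or dockerd, lowering it (or raising LimitNOFILE in the unit files) is the adjustment described above.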

2.6 How to free disk space when Docker uses too much on the host?
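
A common starting point, offered as a sketch and not a complete answer, is to inspect usage with docker system df and then prune unused objects. docker_reclaim is a hypothetical wrapper name; the prune commands delete data, and volumes are deliberately left untouched here.

```bash
# Hypothetical wrapper: report Docker disk usage, then remove unused objects.
docker_reclaim() {
  docker system df             # usage by images, containers, volumes, and build cache
  docker container prune -f    # remove stopped containers
  docker image prune -f        # remove dangling images
  docker builder prune -f      # remove unused build cache
}

# Typical use (add sudo if the current user cannot reach the Docker socket):
#   docker_reclaim
```

Large json-file logs under the data-root can also consume space; the log-opts settings in daemon.json above already cap them at 5 files of 200m per container.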
