Checking with kubectl describe
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 13m default-scheduler Successfully assigned kube-system/kube-flannel-ds-amd64-vfhnj to node-d1
Normal Pulled 13m kubelet, node-d1 Container image "quay.io/coreos/flannel:v0.11.0-amd64" already present on machine
Normal Created 13m kubelet, node-d1 Created container install-cni
Normal Started 13m kubelet, node-d1 Started container install-cni
Normal Pulled 10m (x5 over 13m) kubelet, node-d1 Container image "quay.io/coreos/flannel:v0.11.0-amd64" already present on machine
Normal Created 10m (x5 over 13m) kubelet, node-d1 Created container kube-flannel
Normal Started 10m (x5 over 13m) kubelet, node-d1 Started container kube-flannel
Warning BackOff 3m39s (x30 over 12m) kubelet, node-d1 Back-off restarting failed container
Checking with kubectl logs
[root@master ~]# kubectl logs kube-flannel-ds-amd64-9w5nq -n kube-system
I1216 05:59:40.055608 1 main.go:527] Using interface with name eth0 and address 192.168.1.82
I1216 05:59:40.055666 1 main.go:544] Defaulting external address to interface address (192.168.1.82)
E1216 06:00:10.056546 1 main.go:241] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-amd64-9w5nq': Get https://10.96.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-amd64-9w5nq: dial tcp 10.96.0.1:443: i/o timeout
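Troubleshooting
A network problem?
A test with curl showed that the network itself was fine.
Testing kube-proxy
Restart the kube-proxy container on that node and check its logs: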
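The curl test is not shown in the original, so the following is only a sketch of what such a check might look like. It assumes the service address 10.96.0.1:443 from the flannel log above and the API server's /version endpoint, which typically answers without credentials on a default setup. The helper names apiserver_url and probe_apiserver are hypothetical. The probe is most meaningful when run on the affected node, since that is where the i/o timeout was reported.

```shell
# Hypothetical helper: build the URL of the API server's /version endpoint.
apiserver_url() {
    host="$1"
    port="${2:-443}"
    printf 'https://%s:%s/version\n' "$host" "$port"
}

# Probe the endpoint with a short connect timeout.
# -k skips certificate verification, which is acceptable only for a reachability test.
probe_apiserver() {
    curl -k -sS --connect-timeout 5 "$(apiserver_url "$@")"
}

# Usage (address taken from the flannel log above):
#   probe_apiserver 10.96.0.1
```

A timeout here while the node's physical network is otherwise reachable would point at the service proxy layer rather than at the network.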
kubectl delete pod kube-proxy-vtd27 -n kube-system
[root@master ~]# kubectl logs kube-proxy-pljct -n kube-system
W1216 06:30:51.741835 1 proxier.go:513] Failed to load kernel module ip_vs_wrr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1216 06:30:51.742536 1 proxier.go:513] Failed to load kernel module ip_vs_sh with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1216 06:30:51.748495 1 proxier.go:513] Failed to load kernel module ip_vs_wrr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1216 06:30:51.749223 1 proxier.go:513] Failed to load kernel module ip_vs_sh with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
E1216 06:30:51.750805 1 server_others.go:259] can't determine whether to use ipvs proxy, error: IPVS proxier will not be used because the following required kernel modules are not loaded: [ip_vs_wrr ip_vs_sh]
I1216 06:30:51.757054 1 server_others.go:143] Using iptables Proxier.
W1216 06:30:51.757122 1 proxier.go:321] clusterCIDR not specified, unable to distinguish between internal and external traffic
I1216 06:30:51.757338 1 server.go:534] Version: v1.15.0
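The key line in the log above is the one listing the kernel modules that are not loaded. As a small sketch, that list can be extracted from the log with sed. The function name missing_ipvs_modules is hypothetical, and the pattern assumes the message wording shown in this v1.15.0 log.

```shell
# Hypothetical helper: read kube-proxy log text on stdin and print, one per line,
# the modules named in "required kernel modules are not loaded: [...]".
missing_ipvs_modules() {
    sed -n 's/.*required kernel modules are not loaded: \[\([^]]*\)].*/\1/p' \
        | tr ' ' '\n' \
        | sort -u
}

# Usage:
#   kubectl logs kube-proxy-pljct -n kube-system 2>&1 | missing_ipvs_modules
```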
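The problem
There is clearly a problem here: some required kernel modules failed to load. The kernel modules had already been configured by following the installation documentation. Running the configuration again made no difference; the problem persisted.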
[root@master ~]# cat > /etc/sysconfig/modules/ipvs.modules <<EOF
> #!/bin/bash
> modprobe -- ip_vs
> modprobe -- ip_vs_rr
> modprobe -- ip_vs_wrr
> modprobe -- ip_vs_sh
> modprobe -- nf_conntrack_ipv4
> EOF
[root@master ~]# chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack_ipv4
ip_vs_sh 12688 0
ip_vs_wrr 12697 0
ip_vs_rr 12600 150
ip_vs 145497 156 ip_vs_rr,ip_vs_sh,ip_vs_wrr
nf_conntrack_ipv4 15053 6
nf_defrag_ipv4 12729 1 nf_conntrack_ipv4
nf_conntrack 139224 9 ip_vs,nf_nat,nf_nat_ipv4,nf_nat_ipv6,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_netlink,nf_conntrack_ipv4,nf_conntrack_ipv6
libcrc32c 12644 3 ip_vs,nf_nat,nf_conntrack
[root@master ~]# lsmod | grep -e ip_vs -e nf_conntrack_ipv4
ip_vs_sh 12688 0
ip_vs_wrr 12697 0
ip_vs_rr 12600 150
ip_vs 145497 156 ip_vs_rr,ip_vs_sh,ip_vs_wrr
nf_conntrack_ipv4 15053 6
nf_defrag_ipv4 12729 1 nf_conntrack_ipv4
nf_conntrack 139224 9 ip_vs,nf_nat,nf_nat_ipv4,nf_nat_ipv6,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_netlink,nf_conntrack_ipv4,nf_conntrack_ipv6
libcrc32c 12644 3 ip_vs,nf_nat,nf_conntrack
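Solution
Because ipvs has been merged into the mainline kernel, it depends on kernel module support, so make sure the kernel has loaded the corresponding modules. If you are not sure, run the following script to make sure they are loaded; otherwise the error failed to load kernel modules: [ip_vs_rr ip_vs_sh ip_vs_wrr] will appear.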
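cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
# Module list reconstructed from the lsmod output shown below (an assumption).
ipvs_modules="ip_vs ip_vs_lc ip_vs_wlc ip_vs_rr ip_vs_wrr ip_vs_lblc ip_vs_lblcr ip_vs_dh ip_vs_sh ip_vs_nq ip_vs_sed ip_vs_ftp nf_conntrack_ipv4"
for kernel_module in \${ipvs_modules}; do
    # Load a module only if it exists for the running kernel.
    /sbin/modinfo -F filename \${kernel_module} > /dev/null 2>&1
    if [ \$? -eq 0 ]; then
        /sbin/modprobe \${kernel_module}
    fi
done
EOF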
chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep ip_vs
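After running it, the output should look like the listing below. If ip_vs_rr and the other modules do not appear in the output of lsmod | grep ip_vs, they have not been loaded.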
[root@node-d1 ~]# lsmod | grep ip_vs
ip_vs_ftp 13079 0
ip_vs_sed 12519 0
ip_vs_nq 12516 0
ip_vs_sh 12688 0
ip_vs_dh 12688 0
ip_vs_lblcr 12922 0
ip_vs_lblc 12819 0
ip_vs_wrr 12697 0
ip_vs_wlc 12519 0
ip_vs_lc 12516 0
nf_nat 26583 5 ip_vs_ftp,nf_nat_ipv4,nf_nat_ipv6,xt_nat,nf_nat_masquerade_ipv4
ip_vs_rr 12600 136
ip_vs 145497 162 ip_vs_dh,ip_vs_lc,ip_vs_nq,ip_vs_rr,ip_vs_sh,ip_vs_ftp,ip_vs_sed,ip_vs_wlc,ip_vs_wrr,ip_vs_lblcr,ip_vs_lblc
nf_conntrack 139224 9 ip_vs,nf_nat,nf_nat_ipv4,nf_nat_ipv6,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_netlink,nf_conntrack_ipv4,nf_conntrack_ipv6
libcrc32c 12644 3 ip_vs,nf_nat,nf_conntrack
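Rather than reading the listing by eye, the check can be scripted. The sketch below assumes that the modules to verify are ip_vs plus the three named in the error message (ip_vs_rr, ip_vs_wrr, ip_vs_sh). The function name check_ipvs_modules is hypothetical.

```shell
# Hypothetical helper: read lsmod-style output on stdin and report any of the
# assumed required IPVS modules that are absent. Returns non-zero if one is missing.
check_ipvs_modules() {
    loaded=$(awk '{print $1}')   # first column of lsmod is the module name
    status=0
    for m in ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh; do
        if ! printf '%s\n' "$loaded" | grep -qx "$m"; then
            echo "missing: $m"
            status=1
        fi
    done
    return "$status"
}

# Usage, on the node running kube-proxy:
#   lsmod | check_ipvs_modules && echo "all IPVS modules loaded"
```

Matching whole lines with grep -x avoids treating ip_vs_rr as a match for ip_vs.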
Finally, restart the kube-proxy and flannel containers. After that the problem was resolved.
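One way to restart both containers on a single node is to delete their pods and let the DaemonSets recreate them. This is a sketch, not the exact commands used here. It assumes the label selectors k8s-app=kube-proxy and app=flannel, which are the usual defaults in kubeadm and in the flannel manifest but may differ in your cluster. The function name restart_network_pods and the KUBECTL variable are hypothetical.

```shell
# Hypothetical wrapper: KUBECTL can be overridden, for example with "echo" for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Delete the kube-proxy and flannel pods scheduled on the given node.
# The label selectors are assumptions; verify them with:
#   kubectl get pods -n kube-system --show-labels
restart_network_pods() {
    node="$1"
    for selector in k8s-app=kube-proxy app=flannel; do
        $KUBECTL delete pod -n kube-system -l "$selector" \
            --field-selector "spec.nodeName=$node"
    done
}

# Usage:
#   restart_network_pods node-d1
```

Afterwards, kubectl get pods -n kube-system -o wide should show the recreated pods on that node in the Running state.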