The issue occurs during an upgrade of Continuous Delivery for Puppet Enterprise and Puppet Application Manager, or after the upgrade is complete. In each scenario, the installation worked fine for a while, but you then lost connectivity to your clusters in Puppet Application Manager and Kubernetes. You get errors that images are missing, even though containerd and kubelet are running.
Error messages
When you run kubectl get pods, you get connection errors:
The connection to the server <IP ADDRESS:6443> was refused - did you specify the right host or port?
When you try to migrate to a newer version of Puppet Application Manager, you get errors similar to:
Mar 8 14:28:36 CM2A2072 kubelet[51786]: E0308 14:28:36.556147 51786 kuberuntime_manager.go:790] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"k8s.gcr.io/pause:3.2\": failed to pull image \"k8s.gcr.io/pause:3.2\": failed to pull and unpack image \"k8s.gcr.io/pause:3.2\": failed to resolve reference \"k8s.gcr.io/pause:3.2\": failed to do request: Head \"https://k8s.gcr.io/v2/pause/manifests/3.2\": dial tcp: lookup k8s.gcr.io on 10.109.252.34:53: no such host" pod="kube-system/kube-controller-manager-cm2a2072"
Kubernetes monitors disk space usage on cluster nodes. When utilization of the filesystem hosting the container runtime rises above 85%, image garbage collection (ImageGC) takes place. Each upgrade of a hosted application loads new container images, which increases disk utilization. As upgrades accumulate over time, ImageGC becomes almost certain to run.
When invoked, ImageGC removes images that the runtime determines are unused. However, it also removes needed images, including the k8s.gcr.io/pause image required to start new pods. Online clusters can download missing images from a public registry. However, offline (air-gapped) clusters cannot reach public registries and fail to re-pull images removed by ImageGC.
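As a rough way to see how close a node is to the threshold described above, the following sketch compares filesystem usage with 85%. The path /var/lib/containerd and the RUNTIME_DIR and THRESHOLD variables are assumptions for illustration, not values taken from this article, so adjust them for your installation.

```shell
#!/usr/bin/env bash
# Sketch: report whether the filesystem hosting the container runtime
# is above the ImageGC threshold.
# Assumptions: containerd data lives under /var/lib/containerd and the
# threshold is 85%. Override with RUNTIME_DIR and THRESHOLD if needed.

# usage_percent PATH: print the used percentage (integer) of the
# filesystem that contains PATH, using POSIX df output.
usage_percent() {
  df -P "$1" | awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# over_threshold USED THRESHOLD: succeed only when USED is greater
# than THRESHOLD.
over_threshold() {
  [ "$1" -gt "$2" ]
}

runtime_dir="${RUNTIME_DIR:-/var/lib/containerd}"
threshold="${THRESHOLD:-85}"

# Only run the check when the assumed directory exists on this host.
if [ -d "$runtime_dir" ]; then
  used="$(usage_percent "$runtime_dir")"
  if over_threshold "$used" "$threshold"; then
    printf '%s is at %s%% (above %s%%): ImageGC is likely to run.\n' \
      "$runtime_dir" "$used" "$threshold"
  else
    printf '%s is at %s%% (not above %s%%).\n' \
      "$runtime_dir" "$used" "$threshold"
  fi
fi
```

Running this periodically on each node gives early warning before ImageGC removes images that an offline cluster cannot re-pull.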
Version and installation information
Product: Continuous Delivery for Puppet Enterprise
Version: 4.x
Installation type: Offline (air-gapped)
Solution
To fix the issue, check your syslogs for messages about deleting unused containers. If you find them, repopulate the registry to restore the missing images.
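To see whether ImageGC has already run, the kubelet journal can also be searched retrospectively. The sketch below counts matching messages; the function name count_imagegc_runs and the choice of yesterday as the start of the search window are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: count kubelet log lines that indicate unused containers were
# deleted. The function reads log text on stdin so it can be used with
# journalctl output or with a saved log file.

count_imagegc_runs() {
  # grep -c prints the number of matching lines; it exits non-zero
  # when the count is 0, so "|| true" keeps the function successful.
  grep -c 'Attempting to delete unused containers' || true
}

# Only query the journal when journalctl is available on this host.
if command -v journalctl >/dev/null 2>&1; then
  runs="$(journalctl -u kubelet --since yesterday --no-pager 2>/dev/null | count_imagegc_runs)"
  printf 'Matching kubelet messages since yesterday: %s\n' "$runs"
fi
```

A count greater than zero suggests that garbage collection has run recently and that images might be missing.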
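- To check for the issue, run the following on your control plane node. It checks your syslogs once per second until an Attempting to delete unused containers message appears:
  # start_time must be set before the loop runs. Setting it to the current time is an assumption; choose an earlier time to include older messages.
  start_time="$(date '+%Y-%m-%d %H:%M:%S')"
  until (journalctl -u kubelet --since "${start_time}" | grep 'Attempting to delete unused containers'); do
    printf 'Waiting for kubelet to run ImageGC.\n'
    sleep 1
  done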
- Repopulate the registry manually to restore the missing images.
  A. Log into the control plane node.
  B. If your installation package, for example puppet-application-manager-standalone.tar.gz, is compressed, expand it.
  C. Repopulate the registry by running:
     cat tasks.sh | bash -s load-images
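After the images are reloaded, it can be worth confirming that the pause image is back in the container runtime. The following is a minimal sketch, assuming crictl is installed on the node and lists the image repository in the first column; the function name has_pause_image is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check an image listing for the k8s.gcr.io/pause image.
# Assumption: the listing has the repository name in the first column,
# as in the output of "crictl images".

has_pause_image() {
  # Read the listing on stdin; exit 0 if any line's first column is
  # exactly k8s.gcr.io/pause, otherwise exit 1.
  awk '$1 == "k8s.gcr.io/pause" { found = 1 } END { exit !found }'
}

# Only query the runtime when crictl is available on this host.
if command -v crictl >/dev/null 2>&1; then
  if crictl images 2>/dev/null | has_pause_image; then
    printf 'The pause image is present.\n'
  else
    printf 'The pause image is missing; repopulate the registry again.\n'
  fi
fi
```

If the image is still missing, new pods continue to fail with the CreatePodSandbox error shown earlier.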