Troubleshooting ArgoCD Installation in Kubernetes
Introduction
Installing ArgoCD on Kubernetes can sometimes come with unexpected challenges, especially in environments with existing configurations or resource issues. Recently, I faced several issues while deploying ArgoCD, including CrashLoopBackOff errors, namespaces stuck in Terminating status, and connectivity issues. In this article, I’ll walk you through the problems I encountered and how I resolved them.
The Initial Problem: ArgoCD Pod CrashLoopBackOff
After installing ArgoCD using Helm, I noticed the following issue when checking the pods in the argocd namespace:
root@k3s-master:~# kubectl get pods -n argocd
NAME                             READY   STATUS             RESTARTS      AGE
argocd-redis-secret-init-cvtz8   0/1     CrashLoopBackOff   4 (74s ago)   2m52s
Inspecting the logs of the failing pod revealed a no route to host error:
root@k3s-master:~# kubectl logs argocd-redis-secret-init-cvtz8 -n argocd
Checking for initial Redis password in secret argocd/argocd-redis at key auth.
time="2025-01-21T03:12:22Z" level=fatal msg="Post \"https://10.43.0.1:443/api/v1/namespaces/argocd/secrets\": dial tcp 10.43.0.1:443: connect: no route to host"
This error indicated that the pod was unable to reach the Kubernetes API server, pointing to a potential networking issue within the cluster.
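Before digging into cluster components, a quick TCP probe can confirm whether the API server address is reachable at all. Below is a minimal sketch; the service IP 10.43.0.1:443 comes from the log above, and since that IP is only routable inside the cluster, you would run this from a debug pod rather than your workstation:

```python
import socket

def api_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds.

    A 'no route to host' failure, like the one the init pod logged,
    surfaces here as an OSError and returns False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Inside the cluster: api_reachable("10.43.0.1", 443)
```

If this returns False from inside a pod, the problem is below Kubernetes itself (CNI, routes, firewall) rather than in ArgoCD.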
Further Investigation: Namespace Stuck in Terminating
When attempting to clean up and reinstall ArgoCD, I encountered another issue: the argocd namespace was stuck in Terminating status.
root@k3s-master:~# kubectl delete namespace argocd
namespace "argocd" deleted
root@k3s-master:~# kubectl get namespace argocd
NAME     STATUS        AGE
argocd   Terminating   10m
Initially, I suspected the problem might be related to finalizers. I manually removed them:
kubectl edit namespace argocd
However, the namespace still wouldn’t delete, and the finalizers entry kept reappearing.
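Editing the namespace directly often fails like this because namespace finalizers are only honored through the finalize subresource. The usual last-resort workaround is to strip spec.finalizers from the namespace JSON and send the result to /api/v1/namespaces/argocd/finalize (e.g. via kubectl replace --raw). A sketch of the JSON surgery, with an inline sample standing in for real kubectl get namespace argocd -o json output:

```python
import json

# Sample manifest; on a live cluster this comes from:
#   kubectl get namespace argocd -o json
ns = json.loads("""{
  "apiVersion": "v1",
  "kind": "Namespace",
  "metadata": {"name": "argocd"},
  "spec": {"finalizers": ["kubernetes"]}
}""")

ns["spec"]["finalizers"] = []  # clear the finalizers blocking deletion
payload = json.dumps(ns)
print(payload)

# Then submit through the finalize subresource, e.g.:
#   kubectl replace --raw /api/v1/namespaces/argocd/finalize -f <file-with-payload>
```

Note that force-finalizing only unblocks deletion; as the next section shows, the finalizer was stuck for a reason, and that root cause still needs fixing.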
Upon closer inspection using the following command:
kubectl get namespace argocd -o yaml
I noticed an error message:
Discovery failed for some groups, 1 failing: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1
Root Cause: Metrics Server and Flannel Issues
The discovery error pointed to a problem with the metrics-server in the kube-system namespace. I verified the status of the metrics-server pods:
kubectl get pods -n kube-system | grep metrics-server
All metrics-server pods were in an error state.
Additionally, I noticed issues with the networking component flannel. This likely explained the no route to host error during the initial ArgoCD installation.
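The discovery failure maps to an APIService object (here v1beta1.metrics.k8s.io) whose Available condition is not True. As a rough sketch of how to spot failing groups programmatically, the function below scans the JSON from kubectl get apiservices -o json; the sample data is trimmed and stands in for real cluster output:

```python
import json

def unavailable_groups(apiservices_json: str) -> list:
    """Return names of APIServices lacking an Available=True condition."""
    bad = []
    for svc in json.loads(apiservices_json)["items"]:
        conds = svc.get("status", {}).get("conditions", [])
        available = any(
            c["type"] == "Available" and c["status"] == "True" for c in conds
        )
        if not available:
            bad.append(svc["metadata"]["name"])
    return bad

# Trimmed sample; real data: kubectl get apiservices -o json
sample = json.dumps({"items": [
    {"metadata": {"name": "v1beta1.metrics.k8s.io"},
     "status": {"conditions": [{"type": "Available", "status": "False"}]}},
    {"metadata": {"name": "v1.apps"},
     "status": {"conditions": [{"type": "Available", "status": "True"}]}},
]})
print(unavailable_groups(sample))  # ['v1beta1.metrics.k8s.io']
```

Any name this returns is an aggregated API that kubectl discovery (and namespace finalization) will keep tripping over until its backing service is healthy.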
Solution: Fixing Metrics Server and Flannel
To resolve the issue, I took the following steps:
1. Reinstall Flannel: Flannel is a key networking component in Kubernetes. I re-applied the official Flannel configuration:
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
2. Verify Flannel Status: After applying the configuration, I confirmed that the flannel pods were running:
kubectl get pods -n kube-system | grep flannel
3. Check Metrics Server: Once the networking issue was resolved, I ensured the metrics-server pods were operational:
kubectl get pods -n kube-system | grep metrics-server
4. Reinstall ArgoCD: With the networking and metrics issues resolved, I reinstalled ArgoCD using Helm:
helm install argocd argo/argo-cd -n argocd --create-namespace
5. Verify ArgoCD Installation: Finally, I confirmed that all ArgoCD pods were running successfully:
kubectl get pods -n argocd
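The verification commands above eyeball the READY column; the same check can be automated against kubectl's JSON output. A simplified readiness check (the sample pod names are hypothetical, and real data would come from kubectl get pods -n argocd -o json):

```python
import json

def all_pods_ready(pod_list_json: str) -> bool:
    """True if every container of every pod reports ready.

    Checking containerStatuses (not just the pod phase) matters: a pod in
    CrashLoopBackOff can still show phase=Running while never becoming ready."""
    pods = json.loads(pod_list_json)["items"]
    return bool(pods) and all(
        cs.get("ready", False)
        for p in pods
        for cs in p["status"].get("containerStatuses", [{}])
    )

# Trimmed sample shaped like: kubectl get pods -n argocd -o json
sample = json.dumps({"items": [
    {"metadata": {"name": "argocd-server-abc"},
     "status": {"containerStatuses": [{"ready": True}]}},
    {"metadata": {"name": "argocd-repo-server-def"},
     "status": {"containerStatuses": [{"ready": True}]}},
]})
print(all_pods_ready(sample))  # True
```

Running such a check in a loop after reinstalling gives a clean pass/fail signal instead of re-reading kubectl tables by hand.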
The information above was compiled with the help of GPT.