Nodelet Phases Stuck on Master Node Due to CA Certificate Issue, Leaving All Worker Nodes in NotReady State
Problem
Multiple issues were identified while performing a Kubernetes cluster upgrade from v1.24 to v1.25.
- During the upgrade, the nodelet phases got stuck at the etcd phase on the master node where the upgrade was started.
ETCD Logs
{"log":"{\"level\":\"warn\",\"ts\":\"2024-04-24T08:16:49.171922Z\",\"caller\":\"embed/config_logging.go:169\",\"msg\":\"rejected connection\",\"remote-addr\":\"10.96.8.51:58162\",\"server-name\":\"\",\"error\":\"tls: failed to verify certificate: x509: certificate signed by unknown authority\"}
- To address the issue, a PMK stack restart was performed on all master nodes. However, as an after-effect, all worker nodes transitioned to the NotReady state following the master node upgrade and CA chain rotation after the stack restart.
Kubelet Logs
E0424 09:18:50.382151 1746680 kubelet.go:2424] "Error getting node" err="node \"kube-837943-zone1-worker28\" not found"
E0424 09:18:53.106797 1746680 kubelet_node_status.go:92] "Unable to register node with API server" err="Unauthorized" node="kube-837943-zone1-worker28"
- The existing kubeconfig with the previous CA certificate becomes invalid after the cluster is upgraded.
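The two error signatures quoted above can be searched for in saved log files. The following is a minimal sketch, not a Platform9 tool: the function names are hypothetical, and the log file paths are whatever the operator has collected from the affected nodes.

```shell
#!/bin/sh
# Sketch: summarize the error signatures shown above from saved log files.
# Function names are illustrative; the caller supplies the log file path.

# Print each source IP whose TLS connection etcd rejected, with a count per IP.
etcd_rejected_peers() {
    grep 'certificate signed by unknown authority' "$1" \
        | grep -oE 'remote-addr[^,]*' \
        | grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' \
        | sort | uniq -c
}

# Count kubelet registration attempts that the API server rejected as Unauthorized.
kubelet_unauthorized_count() {
    grep -c 'Unable to register node with API server" err="Unauthorized"' "$1"
}
```

Calling `etcd_rejected_peers` on an etcd log and `kubelet_unauthorized_count` on a kubelet log gives a quick view of which peers are being rejected and how often a worker fails to register.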
Environment
- Platform9 Managed Kubernetes - v5.9
- Kubernetes Version 1.24+
Cause
- The issue stemmed from the pf9-kube code, which failed to use the entire CA chain when generating certificates after certificate rotation. This was not detected during testing, primarily because the test procedure omitted restarting the management plane service (Qbert) pod after initiating the certificate rotation.
- After cluster CA rotation, the old CA certificate is not available in the CA chain.
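One way to check whether a CA bundle still carries more than one certificate, and whether a given certificate verifies against it, is sketched below. This is an illustration under assumptions: the function names are hypothetical, the file paths are placeholders supplied by the caller (the article does not state where pf9-kube stores these files), and `openssl` must be installed for the second function.

```shell
#!/bin/sh
# Sketch: inspect a CA bundle after rotation. All paths are placeholders
# passed in by the caller; none are taken from the Platform9 product.

# Count the PEM certificates in a bundle file. If the old CA were still part
# of the chain after rotation, more than one entry would be expected.
ca_bundle_cert_count() {
    grep -c -e '-----BEGIN CERTIFICATE-----' "$1"
}

# Verify a certificate against the bundle (requires openssl).
# The exit status is 0 only when verification succeeds.
verify_against_bundle() {
    openssl verify -CAfile "$1" "$2"
}
```

For example, `verify_against_bundle <ca-bundle> <node-cert>` failing for a certificate issued by the previous CA would be consistent with the old CA missing from the chain.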
Resolution
- The issue is resolved in PMK version 5.10 with Kubernetes version 1.25 or later.
Workaround
- To unblock the upgrade, restart the nodelet phases on the unaffected master nodes first, and then restart the phases on the affected master node.
Phases restart
# systemctl stop pf9-hostagent pf9-nodeletd
# /opt/pf9/nodelet/nodeletd phases stop
# systemctl start pf9-hostagent
- To recover from the NotReady state, proceed with the upgrades of the worker nodes. After the upgrade, the nodes that were previously NotReady will transition to the Ready state, as they have been upgraded to the required version.
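To track recovery while the worker upgrades proceed, the nodes that are still not Ready can be listed from `kubectl get nodes` output. The helper below is a hypothetical sketch; it reads from standard input so it can also be tried on saved output.

```shell
#!/bin/sh
# Sketch: print the names of nodes whose STATUS column does not start with
# "Ready". Expects `kubectl get nodes --no-headers` output on stdin.
# The function name is illustrative only.
not_ready_nodes() {
    awk '$2 !~ /^Ready/ { print $1 }'
}
```

Usage, assuming a working kubeconfig: `kubectl get nodes --no-headers | not_ready_nodes`. An empty result means every node reports Ready (possibly with scheduling disabled).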