
While Restoring LTS2-Patch2 On SMCP, Management Plane Cluster Backup-Restore Process Fails.
Problem
During restoration of LTS2-Patch2 [v-5.6.7-2624593] to SMCP, the restore step fails with the following error:
# airctl restore --backupdir /root/ --config /opt/pf9/airctl/conf/airctl-config.yaml --verbose
...
2023-09-22T14:01:25.353Z info restoring mysql
2023-09-22T14:01:25.353Z info state file does not contain SSH user
▀ Starting vault... (6m28s)
2023-09-22T14:01:25.456Z debug found pod percona-db-pxc-db-pxc-0
ERROR setting up kplane...
2023-09-22T14:01:34.475Z error failed to install kplane components: failed to install helm chart /sbin/helm install kplane-usermgr /opt/pf9/airctl/conf/helm_charts/kplane-components-0.3.4.tgz -f /opt/pf9/airctl/conf/kplane_values.yaml -f /opt/pf9/airctl/conf/secrets.yaml: exit status 1 - Error: INSTALLATION FAILED: execution error at (kplane-components/templates/required.yaml:17:5): consul_fallback_token is required from values.yaml
2023-09-22T14:01:34.475Z fatal error: failed to install helm chart /sbin/helm install kplane-usermgr /opt/pf9/airctl/conf/helm_charts/kplane-components-0.3.4.tgz -f /opt/pf9/airctl/conf/kplane_values.yaml -f /opt/pf9/airctl/conf/secrets.yaml: exit status 1 - Error: INSTALLATION FAILED: execution error at (kplane-components/templates/required.yaml:17:5): consul_fallback_token is required from values.yaml
...
Environment
- Platform9 Edge Cloud - LTS2-Patch2 [v-5.6.7-2624593]
Cause
This is a known issue. Jira AIR-1218 has been filed to track and resolve it.
The Platform9 Engineering team is actively working on a fix.
Workaround
As a workaround, follow these steps:
- Ensure that the existing DU has no issues by running the following command and verifying that the task state is ready:
  airctl status
- Download the LTS2-Patch4 [v-5.6.7-2658688] artifacts, following the same steps as for LTS2-Patch2:
  bash ./install.sh v-5.6.7-2658688
- Run the upgrade operation, following the upgrade guide (upgrade from LTS2-Patch2 to LTS2-Patch4):
  airctl upgrade
  The upgrade operation is expected to fail because of a known issue, and this failure can be ignored. Even though it fails, the upgrade fixes the state files that are essential for restoring LTS2 on SMCP. The upgrade from LTS2-Patch2 to LTS2-Patch4 is itself affected by the removal of an internal component known as decco and by some related codebase changes.
  The expected error message is shown below:
  ERROR upgrading management plane...
  2023-10-11T10:08:24.585Z error fatal error:failed to upgrade: failed to upgrade kplane components: failed to upgrade helm chart /sbin/helm upgrade kplane-usermgr /opt/pf9/airctl/conf/helm_charts/kplane-components-0.3.4.tgz -f /opt/pf9/airctl/conf/kplane_values.yaml -f /opt/pf9/airctl/conf/secrets.yaml: exit status 1 - Error: UPGRADE FAILED: rendered manifests contain a resource that already exists. Unable to continue with update: Namespace "smcp-mgmt-kplane" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "kplane-usermgr"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "default"
- After this, follow the SMCP restore process with the following change:
  In step #7, while updating the nodelet-bootstrap.yaml file, also add the kubedu-imgs tar file from LTS2-Patch2 to the userImages section. A snippet of the YAML file is shown below for reference:
  isAirgapped: true
  systemImages:
  - /opt/pf9/airctl/imgs/kubedu-imgs-v-5.9.0-2847602.tar.gz
  - /opt/pf9/airctl/imgs/nodelet-imgs-v-5.9.0-2847602.tar.gz
  userImages:
  - /home/centos/patch2/kubedu-imgs-v-5.6.7-2624593.tar.gz
  - /home/centos/patch4/kubedu-imgs-v-5.6.7-2658688.tar.gz
Additional Information
In some cases, especially on systems with limited resources, the container runtime can garbage-collect kubedu images that have not been used yet. This can cause operations such as airctl upgrade/upgrade-hosts to fail with ImagePullBackOff errors.
To determine whether the images need to be reloaded, run the following command and make sure that the images needed for du-upgrade or host-upgrade have not been cleaned up:
sudo /opt/pf9/pf9-kube/bin/nerdctl -n k8s.io images
For reference, some of the images to look for are quay.io/platform9/k8s-helm-runner and quay.io/platform9/kplane-host-upg.
If any of these images are missing, run the following command before the upgrade/upgrade-hosts operations:
sudo /opt/pf9/pf9-kube/bin/nerdctl -n k8s.io load -i /var/opt/pf9/images/kubedu-imgs-v-5.9.0-2847602.tar.gz
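The image check and reload described above can be combined in a small script. The following is a minimal sketch, not an official Platform9 tool. The helper names missing_images and reload_if_missing are hypothetical, and the sketch assumes that the default table printed by nerdctl images lists the repository name in the first column. The image names, the nerdctl path, and the nerdctl commands are the ones quoted in this article.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of airctl or nerdctl.

# nerdctl path as quoted in this article; override via the environment if needed.
NERDCTL=${NERDCTL:-/opt/pf9/pf9-kube/bin/nerdctl}

# Images named in this article; extend the space-separated list as needed.
REQUIRED_IMAGES="quay.io/platform9/k8s-helm-runner quay.io/platform9/kplane-host-upg"

# missing_images: reads an image listing on stdin and prints every required
# repository that does not appear in the first column (assumed table layout).
missing_images() {
  listing=$(cat)
  for repo in $REQUIRED_IMAGES; do
    if ! printf '%s\n' "$listing" |
      awk -v r="$repo" '$1 == r { found = 1 } END { exit !found }'; then
      printf '%s\n' "$repo"
    fi
  done
}

# reload_if_missing TARBALL: loads the tarball only when a required image
# is absent from the k8s.io namespace.
reload_if_missing() {
  tarball=$1
  [ -n "$tarball" ] || { echo "usage: reload_if_missing TARBALL" >&2; return 2; }
  # Stop if the listing itself cannot be obtained.
  current=$(sudo "$NERDCTL" -n k8s.io images) || return 1
  missing=$(printf '%s\n' "$current" | missing_images)
  if [ -n "$missing" ]; then
    echo "Missing images:"
    echo "$missing"
    sudo "$NERDCTL" -n k8s.io load -i "$tarball"
  else
    echo "All required images are present; nothing to load."
  fi
}
```

After sourcing the script, running reload_if_missing /var/opt/pf9/images/kubedu-imgs-v-5.9.0-2847602.tar.gz before the upgrade/upgrade-hosts operations would load the tarball only when one of the listed images is absent. Reading the listing from standard input keeps missing_images usable on saved command output as well.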