How to Proactively Monitor PLEG Health with Prometheus and Alertmanager Rules
Problem
The Pod Lifecycle Event Generator (PLEG), introduced in Kubernetes 1.2, is a kubelet component that detects changes in container state on the local node. PLEG needs to remain in a "Healthy" state: if it is deemed "Unhealthy", the node transitions into a "NotReady" state and no further pods are scheduled onto it. It is therefore important to proactively monitor the metrics that reflect PLEG health.
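As a quick manual check of the failure mode described above, the following sketch lists nodes whose Ready condition message mentions PLEG. The helper names are hypothetical, and the matched wording ("PLEG is not healthy") is an assumption about how the kubelet typically phrases the condition message; adjust the pattern if your cluster reports it differently.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names below are illustrative, not part of any product.

# Keeps only the input lines whose message reports an unhealthy PLEG.
# "|| true" stops an empty result from being treated as a failure.
filter_pleg_unhealthy() {
  grep -i 'PLEG is not healthy' || true
}

# Prints "<node><TAB><Ready condition message>" for every node, then filters.
# Requires kubectl and a KUBECONFIG that points at the cluster.
list_pleg_unhealthy_nodes() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].message}{"\n"}{end}' \
    | filter_pleg_unhealthy
}
```

Calling list_pleg_unhealthy_nodes prints nothing when no node currently reports a PLEG problem. This only catches nodes that are already affected, which is why the alerting rule in the procedure is still needed.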
Environment
- Platform9 Managed Kubernetes – v5.4 and higher
- Prometheus Monitoring
- Kubelet
Procedure
- If Prometheus Monitoring is not already enabled (it is enabled by default, but this may not apply to older clusters), follow the instructions in Enable In-Cluster Monitoring.
- Download Kubeconfig.
- Export Kubeconfig.
Bash
export KUBECONFIG=~/<cluster-name>.yaml
- List the PrometheusRules in the pf9-monitoring namespace.
Bash
$ kubectl -n pf9-monitoring get prometheusrules.monitoring.coreos.com
NAME                      AGE
system-prometheus-rules   22m
- Edit the system-prometheus-rules object and add the following rules.
Bash
$ kubectl edit -n pf9-monitoring prometheusrules.monitoring.coreos.com system-prometheus-rules
YAML
spec:
  groups:
  - name: kube-events
    rules:
    - alert: KubeletPlegDurationHigh
      annotations:
        message: 'The Kubelet Pod Lifecycle Event Generator has a 99th percentile duration of {{ $value }} seconds on node {{ $labels.node }}.'
      expr: |
        node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile{quantile="0.99"} >= 10
      for: 5m
      labels:
        severity: warning
  [...]
  - name: kubelet.rules
    rules:
    - expr: |
        histogram_quantile(0.99, sum(rate(kubelet_pleg_relist_duration_seconds_bucket[5m])) by (cluster, instance, le) * on(cluster, instance) group_left(node) kubelet_node_name{job="kubelet", metrics_path="/metrics"})
      labels:
        quantile: "0.99"
      record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile
- Access the Prometheus UI via the Qbert API proxy (similar to Grafana) or via kubectl port-forward associated with the prometheus-operated service exposed on TCP/9090.
- Navigate to the Alerts tab.
- Search for and/or verify from the list of alarms that the KubeletPlegDurationHigh alarm has been added and is showing green.
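
The verification steps can also be done from a terminal. The sketch below is a minimal example under stated assumptions: the prometheus-operated service is assumed to live in the pf9-monitoring namespace, the PROM_URL variable, local port 9090, and the helper names are hypothetical, and the rule check is a plain text match on the JSON returned by the Prometheus /api/v1/rules endpoint rather than a full JSON parse.

```shell
#!/usr/bin/env bash
# Sketch only: PROM_URL, the local port, and the helper names are assumptions.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Forwards local port 9090 to the prometheus-operated service.
# Runs in the foreground until interrupted, so use a separate terminal.
start_port_forward() {
  kubectl -n pf9-monitoring port-forward svc/prometheus-operated 9090:9090
}

# Reads a /api/v1/rules JSON response on stdin and succeeds if it contains
# an entry whose "name" equals the first argument (simple text match).
has_rule() {
  grep -Eq "\"name\"[[:space:]]*:[[:space:]]*\"$1\""
}

# Fetches the loaded rules from Prometheus and reports whether the alert exists.
check_pleg_rule() {
  if curl -fsS "${PROM_URL}/api/v1/rules" | has_rule "KubeletPlegDurationHigh"; then
    echo "KubeletPlegDurationHigh is loaded"
  else
    echo "KubeletPlegDurationHigh not found" >&2
    return 1
  fi
}
```

With start_port_forward running in one terminal, calling check_pleg_rule in another confirms that Prometheus has picked up the edited PrometheusRule object. It does not show the alert state, which is still easiest to read from the Alerts tab.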
