Expedite Pod Scheduling on a Node That Has Recovered From Disk-Pressure Eviction

Problem

New Pods cannot be scheduled on a node for about 5 minutes after the node recovers from disk-pressure eviction.

Environment

  • Platform9 Managed Kubernetes - All Versions

Procedure

  1. By default, the kubelet waits for a transition period of 5 minutes before it transitions a node condition to a different state. This is why a node that has recovered from disk pressure still rejects new Pods for about 5 minutes.
  2. The transition period can be shortened with the kubelet parameter evictionPressureTransitionPeriod.
  3. This parameter is set through Dynamic Kubelet Configuration.
  4. To configure it for the worker nodes, edit the ConfigMap object worker-default-kubelet-config in the kube-system namespace and add the parameter with a smaller value.
Configure evictionPressureTransitionPeriod to 1 Min
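A minimal sketch of the kubelet configuration fragment with the shorter transition period. The field name and the KubeletConfiguration kind come from the upstream kubelet configuration API. How the worker-default-kubelet-config ConfigMap lays out its data is not shown here, so treat this as an assumption about the shape of the change: add the field to the kubelet configuration already stored in the ConfigMap and leave every other field untouched.

```yaml
# Fragment of a KubeletConfiguration; keep all existing fields as they are.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Wait 1 minute (upstream default: 5 minutes) before the kubelet
# transitions a node out of a pressure condition such as DiskPressure.
evictionPressureTransitionPeriod: 1m0s
```

The value is a duration string, so 1m0s, 60s, and 1m are equivalent ways to express one minute.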

It takes some time for the change to roll out to all worker nodes. During the rollout, pf9-kubelet is restarted on each node, and each node transitions through the states Ready --> NotReady,SchedulingDisabled --> NotReady --> Ready.

Use the following verification steps to confirm that the change has been applied on a node.

Worker node verification

Additional Information

  • When a node oscillates above and below a soft eviction threshold without staying past it for the defined grace period, the corresponding node condition keeps switching between true and false, which can lead to bad eviction decisions.
  • The eviction-pressure-transition-period flag protects against such unwanted oscillation of node conditions. Setting it too low weakens that protection, so choose the shorter value with care.
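To illustrate how the transition period relates to soft eviction thresholds and their grace periods, here is a hedged sketch of the three settings side by side. The field names are those of the upstream KubeletConfiguration API. The threshold and grace-period values are illustrative assumptions, not recommendations and not Platform9 defaults.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Soft threshold (illustrative value): the signal must stay past this
# limit for the whole grace period before the kubelet acts on it.
evictionSoft:
  nodefs.available: "15%"
# Grace period for the soft threshold above (illustrative value).
evictionSoftGracePeriod:
  nodefs.available: "1m30s"
# After the pressure clears, the kubelet keeps reporting the condition
# for this long, which damps flapping between true and false.
evictionPressureTransitionPeriod: 1m0s
```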