Kubernetes Taints and Tolerations are mechanisms for controlling Pod scheduling: taints let a node repel Pods, while tolerations let specific Pods be scheduled onto tainted nodes anyway.
## Taints
Taints are key-value pairs applied to nodes that prevent Pods from being scheduled to that node unless the Pod has matching tolerations.
### Components of a Taint
Each taint consists of three parts:

- **Key**: The taint key (required)
- **Value**: The taint value (optional)
- **Effect**: The taint effect (required)
### Taint Effect Types

- **NoSchedule**:
  - New Pods will not be scheduled to this node
  - Existing Pods are not affected
  - Suitable for dedicated nodes (such as GPU nodes)
- **PreferNoSchedule**:
  - The scheduler tries not to place new Pods on this node
  - But if no other nodes are available, scheduling may still occur
  - Suitable for soft restrictions
- **NoExecute**:
  - New Pods will not be scheduled to this node
  - Existing Pods without matching tolerations will be evicted
  - Suitable for node maintenance or failure scenarios
### Adding Taints

```bash
# Add NoSchedule taint
kubectl taint nodes node1 key=value:NoSchedule

# Add NoExecute taint
kubectl taint nodes node1 key=value:NoExecute

# Add taint without value
kubectl taint nodes node1 key:NoSchedule
```
### Viewing Taints

```bash
# View taints on a node
kubectl describe node node1 | grep Taint

# View taints on all nodes
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
```
### Removing Taints

```bash
# Remove a specific taint
kubectl taint nodes node1 key:NoSchedule-

# Remove all taints with a specific key
kubectl taint nodes node1 key-
```
## Tolerations
Tolerations are configurations applied to Pods that allow the Pod to be scheduled to nodes with matching taints.
### Components of a Toleration
Tolerations include the following fields:

- **Key**: The taint key to tolerate
- **Operator**: The operator (`Equal` or `Exists`)
- **Value**: The taint value to tolerate (required when the operator is `Equal`)
- **Effect**: The taint effect to tolerate
- **TolerationSeconds**: How long an already-running Pod stays bound to the node after the taint is added (only applicable to `NoExecute`)
### Toleration Operators

- **Equal**:
  - Both key and value must match
  - A value must be specified
- **Exists**:
  - Only the key needs to match
  - No value may be specified
### Adding Tolerations

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
  containers:
  - name: my-container
    image: nginx
```
### Toleration Examples
- Tolerate a specific taint:

```yaml
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"
```

- Tolerate all taints with a specific key:

```yaml
tolerations:
- key: "dedicated"
  operator: "Exists"
```

- Tolerate all taints:

```yaml
tolerations:
- operator: "Exists"
```

- Tolerate a NoExecute taint with a toleration time:

```yaml
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
```
## Taint and Toleration Matching Rules
- **Key Matching**:
  - If the toleration key is empty (with operator `Exists`), it matches all taint keys
  - Otherwise, the toleration key must equal the taint key
- **Operator Matching**:
  - `Equal`: Both key and value must match
  - `Exists`: Only the key needs to match
- **Effect Matching**:
  - If the toleration effect is empty, it matches all effects
  - Otherwise, the effect must match exactly
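As a hedged illustration of the effect-matching rule, the toleration below omits `effect`, so it tolerates taints with this key/value under any effect (the `dedicated` key is an example name, not a Kubernetes built-in):

```yaml
# Matches dedicated=gpu:NoSchedule, dedicated=gpu:PreferNoSchedule,
# and dedicated=gpu:NoExecute alike
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "gpu"
  # effect omitted: matches all effects for this key/value
```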
## Common Use Cases
### 1. Dedicated Nodes

Add taints to nodes reserved for specific purposes so that only Pods tolerating the taint can be scheduled there. Note that a toleration only *permits* scheduling onto the tainted node; it does not force it, so pair the taint with a node label and node affinity if the Pods must also land only on those nodes.
```bash
# Add taint to GPU node
kubectl taint nodes gpu-node dedicated=gpu:NoSchedule
```
```yaml
# This Pod tolerates the GPU node's taint and may be scheduled there
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  containers:
  - name: gpu-container
    image: nvidia/cuda:11.0.3-base-ubuntu20.04
```
### 2. Node Maintenance

Use a NoExecute taint to evict running Pods before node maintenance.

```bash
# Mark node as under maintenance; Pods without a matching toleration are evicted
kubectl taint nodes node1 maintenance:NoExecute
```
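A commonly used alternative for maintenance is `kubectl drain`, which cordons the node and evicts its Pods in one step. A sketch of the workflow (the node name `node1` follows the examples above):

```bash
# Cordon the node and evict its Pods (DaemonSet-managed Pods are skipped)
kubectl drain node1 --ignore-daemonsets

# ... perform maintenance ...

# Make the node schedulable again
kubectl uncordon node1

# Or, if the NoExecute taint was used instead, remove it after maintenance
kubectl taint nodes node1 maintenance:NoExecute-
```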
### 3. Special Hardware Nodes

Add taints to nodes with special hardware to ensure only Pods that need this hardware can be scheduled.

```bash
# Add taint to SSD node
kubectl taint nodes ssd-node disktype=ssd:NoSchedule
```
### 4. Failed Nodes

When a node becomes not-ready or unreachable, Kubernetes automatically adds NoExecute taints to it, which evict Pods that do not tolerate them.

```yaml
# Pod tolerates node failure for 300 seconds before being evicted
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
```
## Taints and Tolerations vs Affinity
| Feature | Taints and Tolerations | Affinity |
|---|---|---|
| Target | Taints set on nodes, tolerations on Pods | Rules set on Pods |
| Direction | Node repels Pods | Pod selects nodes |
| Flexibility | Lower | Higher |
| Use Cases | Dedicated nodes, node maintenance | Performance optimization, high availability |
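The two mechanisms are often combined: the taint keeps other Pods off a dedicated node, while node affinity keeps this Pod on it. A hedged sketch, assuming the GPU node from the earlier examples also carries a `dedicated=gpu` label (an assumption, not something the taint provides):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pinned-gpu-pod
spec:
  # Toleration: allows scheduling onto the tainted GPU node
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  # Node affinity: requires scheduling onto a node labeled dedicated=gpu
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: "dedicated"
            operator: In
            values: ["gpu"]
  containers:
  - name: gpu-app
    image: nvidia/cuda:11.0.3-base-ubuntu20.04
```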
## Best Practices
- **Use Taints Reasonably**: Avoid overusing taints, which may lead to scheduling failures
- **Add Taints to Dedicated Nodes**: Ensure only specific Pods can be scheduled to dedicated nodes
- **Set Reasonable Toleration Times**: Set reasonable toleration times for NoExecute taints to avoid frequent evictions
- **Combine with Affinity**: Combine taints/tolerations with affinity for more fine-grained scheduling control
- **Monitor Node Status**: Monitor node taint status and handle failed nodes in a timely manner
- **Document Taint Policies**: Record taint and toleration usage policies for team collaboration
- **Test Toleration Configuration**: Test toleration configuration in non-production environments to ensure correctness
## Troubleshooting
- View node taints:

```bash
kubectl describe node <node-name>
```

- View Pod tolerations:

```bash
kubectl describe pod <pod-name>
```

- Check scheduling failure reasons:

```bash
kubectl describe pod <pod-name> | grep -A 10 Events
```

- View scheduler logs:

```bash
kubectl logs -n kube-system <scheduler-pod-name>
```
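When `describe` output is noisy, jsonpath queries can print just the relevant fields (a sketch; fill in the placeholders as in the commands above):

```bash
# Print a node's taints as JSON
kubectl get node <node-name> -o jsonpath='{.spec.taints}'

# Print a Pod's tolerations as JSON
kubectl get pod <pod-name> -o jsonpath='{.spec.tolerations}'
```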
## Example: Multi-Node Type Cluster
```yaml
# Master node (on Kubernetes 1.24+ the built-in taint key is
# node-role.kubernetes.io/control-plane instead of .../master)
apiVersion: v1
kind: Node
metadata:
  name: master-node
spec:
  taints:
  - key: "node-role.kubernetes.io/master"
    effect: "NoSchedule"
---
# GPU node
apiVersion: v1
kind: Node
metadata:
  name: gpu-node
spec:
  taints:
  - key: "dedicated"
    value: "gpu"
    effect: "NoSchedule"
---
# Normal Pod (can only be scheduled to untainted nodes)
apiVersion: v1
kind: Pod
metadata:
  name: normal-pod
spec:
  containers:
  - name: nginx
    image: nginx
---
# GPU Pod (tolerates the GPU node's taint)
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  containers:
  - name: gpu-app
    image: nvidia/cuda:11.0.3-base-ubuntu20.04
```