Kubernetes Affinity and Anti-Affinity are mechanisms for controlling Pod scheduling. They allow users to define relationships between Pods and nodes or other Pods, thereby influencing scheduling decisions.
Affinity Types
1. Node Affinity
Node affinity constrains which nodes a Pod can be scheduled to, based on node labels.
requiredDuringSchedulingIgnoredDuringExecution
Hard requirement: the Pod is scheduled only to a node that meets the conditions. If no such node exists, the Pod is not scheduled and stays Pending.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - us-west-1a
  containers:
  - name: my-container
    image: nginx
```
preferredDuringSchedulingIgnoredDuringExecution
Soft preference: the scheduler favors nodes that meet the conditions, but if none do, it schedules the Pod to another node.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
  containers:
  - name: my-container
    image: nginx
```
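When several preferences are listed, the scheduler adds up the weights (each between 1 and 100) of the terms a node satisfies and favors nodes with the higher total. A minimal sketch, assuming nodes carry the labels `disktype=ssd` and `network=10g`; the `network` label and the Pod name are hypothetical and only serve the illustration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-weighted-node-affinity   # hypothetical name
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      # Stronger preference: SSD-backed nodes
      - weight: 80
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
      # Weaker preference: nodes with a hypothetical network label
      - weight: 20
        preference:
          matchExpressions:
          - key: network
            operator: In
            values:
            - 10g
  containers:
  - name: my-container
    image: nginx
```

A node with both labels contributes 100 from this rule and an SSD-only node 80. The result is combined with the scheduler's other scoring criteria, so weights influence placement rather than dictate it.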
2. Pod Affinity
Pod affinity attracts a Pod to a topology domain (a node, zone, or other group defined by topologyKey) where Pods matching a label selector are already running.
requiredDuringSchedulingIgnoredDuringExecution
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: topology.kubernetes.io/zone
  containers:
  - name: my-container
    image: nginx
```
preferredDuringSchedulingIgnoredDuringExecution
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: topology.kubernetes.io/zone
  containers:
  - name: my-container
    image: nginx
```
3. Pod Anti-Affinity
Pod anti-affinity keeps a Pod away from a topology domain (a node, zone, or other group defined by topologyKey) where Pods matching a label selector are already running.
requiredDuringSchedulingIgnoredDuringExecution
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-antiaffinity
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - web
        topologyKey: kubernetes.io/hostname
  containers:
  - name: my-container
    image: nginx
```
preferredDuringSchedulingIgnoredDuringExecution
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-antiaffinity
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - web
          topologyKey: kubernetes.io/hostname
  containers:
  - name: my-container
    image: nginx
```
Operator Types
- In: The label value is in the given list
- NotIn: The label value is not in the given list
- Exists: The label exists (no values are specified)
- DoesNotExist: The label does not exist (no values are specified)
- Gt: The label value, parsed as an integer, is greater than the given value (node affinity only)
- Lt: The label value, parsed as an integer, is less than the given value (node affinity only)
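To illustrate how the operators combine, here is a sketch of a node affinity rule. Expressions inside one `matchExpressions` list must all match, while separate `nodeSelectorTerms` entries are alternatives. The label keys `gpu`, `maintenance`, and `cpu-count` and the Pod name are hypothetical; `Gt` takes a single value, written as a quoted string and compared as an integer.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-operators   # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        # Term 1: all three expressions must match (AND)
        - matchExpressions:
          - key: gpu              # hypothetical label; any value is accepted
            operator: Exists
          - key: maintenance      # hypothetical label; must be absent
            operator: DoesNotExist
          - key: cpu-count        # hypothetical label holding an integer
            operator: Gt
            values:
            - "16"
        # Term 2: an alternative to term 1 (OR)
        - matchExpressions:
          - key: disktype
            operator: NotIn
            values:
            - hdd
  containers:
  - name: my-container
    image: nginx
```

A node qualifies if it satisfies either term in full.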
Topology Keys
A topology key is a node label key. Nodes that share the same value for that label form one topology domain. Common topology keys include:
- kubernetes.io/hostname: Group by hostname
- topology.kubernetes.io/zone: Group by availability zone
- topology.kubernetes.io/region: Group by region
- node.kubernetes.io/instance-type: Group by instance type
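Because topology keys are ordinary node labels, it can help to see them on a Node object. The fragment below is an illustrative sketch: the node name and all label values are assumptions, and on most clusters these labels are set by the kubelet or the cloud provider rather than written by hand.

```yaml
apiVersion: v1
kind: Node
metadata:
  name: node-1                                   # hypothetical node
  labels:
    kubernetes.io/hostname: node-1               # unique per node
    topology.kubernetes.io/zone: us-west-1a      # shared by nodes in one zone
    topology.kubernetes.io/region: us-west-1     # shared by nodes in one region
    node.kubernetes.io/instance-type: m5.large   # hypothetical instance type
```

With `topologyKey: topology.kubernetes.io/zone`, every node labeled `us-west-1a` belongs to the same domain, so a pod affinity rule is satisfied by a matching Pod running on any of them.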
Affinity Rule Behavior
Scheduling Phase (During Scheduling)
- required: Hard requirement; the Pod is scheduled only if the rule is met
- preferred: Soft preference; the scheduler tries to meet the rule but does not guarantee it
Execution Phase (During Execution)
- Ignored: If the condition stops being met after the Pod is running (for example, a node label changes), the running Pod is not affected
- Required: If the condition stops being met after the Pod is running, the Pod would be evicted (currently not supported)
Affinity vs nodeSelector
| Feature | nodeSelector | Node Affinity |
|---|---|---|
| Complexity | Simple | Complex |
| Flexibility | Low | High |
| Supported Operators | Equality | Multiple operators |
| Priority | None | Supports weights |
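For comparison, the simplest node constraint can be written with `nodeSelector` alone. This sketch assumes the same `disktype=ssd` node label used in the node affinity example; the Pod name is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-node-selector   # hypothetical name
spec:
  # Exact match only: the node must carry every label listed here
  nodeSelector:
    disktype: ssd
  containers:
  - name: my-container
    image: nginx
```

This behaves like a required node affinity rule with a single `In` expression. Anything beyond exact matching, such as `NotIn`, `Exists`, or weighted preferences, needs node affinity.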
Use Cases
Node Affinity Use Cases
- Hardware Requirements: Schedule Pods to nodes with specific hardware (such as GPU, SSD)
- Zone Requirements: Schedule Pods to specific regions or availability zones
- OS Requirements: Schedule Pods to nodes running specific operating systems
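As one concrete case, the OS requirement can be expressed with the well-known `kubernetes.io/os` and `kubernetes.io/arch` node labels. This is a minimal sketch; the Pod name and the chosen values are assumptions for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: linux-amd64-only   # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        # Both expressions must match: Linux nodes with amd64 CPUs
        - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values:
            - linux
          - key: kubernetes.io/arch
            operator: In
            values:
            - amd64
  containers:
  - name: my-container
    image: nginx
```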
Pod Affinity Use Cases
- Communication Optimization: Schedule Pods that need frequent communication to the same node to reduce network latency
- Dependency Relationships: Schedule dependent Pods to the same node to improve performance
- Data Locality: Schedule Pods near nodes that store data
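The co-location cases can be sketched with `kubernetes.io/hostname` as the topology key, which narrows the domain to a single node. The example assumes a cache Pod labeled `app=cache` is already running; that label and the Pod name are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-near-cache   # hypothetical name
  labels:
    app: web
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      # Only nodes already running a Pod labeled app=cache qualify
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - cache          # hypothetical label value
        topologyKey: kubernetes.io/hostname
  containers:
  - name: web
    image: nginx
```

By default the label selector is evaluated against Pods in the same namespace as the new Pod. Because the rule is required, the Pod stays Pending until a matching cache Pod is running on some node.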
Pod Anti-Affinity Use Cases
- High Availability: Distribute Pods of the same application across different nodes to avoid single points of failure
- Resource Competition: Avoid scheduling resource-intensive Pods to the same node
- Fault Isolation: Distribute Pods of different applications across different nodes to reduce the impact of failures
Best Practices
- Use Hard Requirements Sparingly: Overusing required rules can leave Pods unschedulable (Pending) when no node satisfies them
- Set Reasonable Weights: Give preferred rules weights (1 to 100) that reflect their relative importance
- Use Meaningful Labels: Label nodes and Pods consistently; affinity rules match on labels, not annotations
- Monitor Scheduling Results: Monitor Pod scheduling, including scheduling failure events, to ensure affinity rules work as expected
- Combine with Taints and Tolerations: Combine affinity with taints/tolerations for more fine-grained scheduling control
- Avoid Over-Complexity: Keep rules simple; pod affinity and anti-affinity in particular add work for the scheduler and can slow scheduling in large clusters
- Test Rules: Test affinity rules in non-production environments to ensure correctness
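The taints-and-tolerations practice can be sketched as follows. A taint keeps other Pods off dedicated nodes, while node affinity keeps this Pod on them. The example assumes nodes that are both tainted `dedicated=gpu:NoSchedule` and labeled `dedicated=gpu`; the key, the value, and the Pod name are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload   # hypothetical name
spec:
  # Toleration: allows this Pod onto nodes tainted dedicated=gpu:NoSchedule
  tolerations:
  - key: dedicated
    operator: Equal
    value: gpu
    effect: NoSchedule
  # Affinity: restricts this Pod to nodes labeled dedicated=gpu
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: dedicated
            operator: In
            values:
            - gpu
  containers:
  - name: my-container
    image: nginx
```

A toleration alone only permits placement on the tainted nodes and does not require it, which is why the affinity rule is added.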
Example: High Availability Web Application
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web
            topologyKey: kubernetes.io/hostname
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: disktype
                operator: In
                values:
                - ssd
      containers:
      - name: web
        image: nginx
        ports:
        - containerPort: 80
```
This example ensures:
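- Each web Pod is scheduled to a different node (high availability). Because the rule is required, the cluster needs at least three schedulable nodes; otherwise the extra replicas stay Pending
- Nodes labeled disktype=ssd are preferred when available (performance optimization)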
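Where keeping every replica running matters more than strict one-per-node placement, the anti-affinity rule can be relaxed to a preference. The variant below is a sketch of that trade-off rather than part of the original example; it reuses the same names and labels and omits the SSD preference for brevity.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          # Soft rule: spread across nodes when possible, never block scheduling
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - web
              topologyKey: kubernetes.io/hostname
      containers:
      - name: web
        image: nginx
        ports:
        - containerPort: 80
```

The scheduler still spreads replicas across nodes where it can, but it places two on the same node rather than leaving one unscheduled.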