Driver Upgrade Guide#
This guide walks through the process of upgrading AMD GPU drivers on worker nodes.
Overview#
The upgrade process involves:
- Verifying the current installation
- Updating the driver version
- Managing workloads
- Updating node labels
- Performing the upgrade
Step-by-Step Upgrade Process#
1. Check Current Driver Version#
Verify the existing driver version label on your worker nodes:
kubectl get node <worker-node> -o yaml
Look for the label in this format:
kmm.node.kubernetes.io/version-module.<deviceconfig-namespace>.<deviceconfig-name>=<version>
Example:
kmm.node.kubernetes.io/version-module.kube-amd-gpu.test-device-config=6.1.3
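Rather than scanning the full YAML of each node, the label can be shown as an extra column for every node at once. The following is a minimal sketch, assuming the label format shown above; the function names `kmm_version_label_key` and `show_driver_versions` are hypothetical helpers, not part of the operator.

```shell
# Hypothetical helper: build the KMM version label key for a DeviceConfig.
# $1 = DeviceConfig namespace, $2 = DeviceConfig name
kmm_version_label_key() {
  printf 'kmm.node.kubernetes.io/version-module.%s.%s\n' "$1" "$2"
}

# Hypothetical helper: list all nodes with the driver version label
# printed as an additional column (kubectl's -L / --label-columns flag).
show_driver_versions() {
  kubectl get nodes -L "$(kmm_version_label_key "$1" "$2")"
}
```

For the example above, this would be called as `show_driver_versions kube-amd-gpu test-device-config`. Nodes without the label show an empty value in that column.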
2. Update DeviceConfig#
Update the driversVersion field in your DeviceConfig:
kubectl edit deviceconfigs <config-name> -n kube-amd-gpu
The operator will automatically:
- Look for the new driver image in the registry
- Build the image if it doesn’t exist
- Push the built image to your specified registry
Image Tag Format#
The operator uses specific tag formats based on the OS:
OS | Tag Format | Example |
---|---|---|
Ubuntu | | |
Warning: If a node’s Ready status changes during the upgrade (Ready → NotReady → Ready) before its driver version label has been updated, the old driver is not reinstalled on that node. Complete the remaining upgrade steps on such nodes so that the new driver is installed.
3. Stop Workloads#
Stop all workloads using the AMD GPU driver on the target node before proceeding.
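To decide what needs stopping, it can help to see what is currently scheduled on the target node. The sketch below lists every pod on the node using a field selector; it does not filter for GPU consumers, so identifying which of those pods actually use the AMD GPU driver is still left to you. The function name `list_pods_on_node` is a hypothetical helper.

```shell
# Hypothetical helper: list all pods, in every namespace, that are
# scheduled on the given node.
# $1 = worker node name
list_pods_on_node() {
  kubectl get pods --all-namespaces \
    --field-selector "spec.nodeName=$1" -o wide
}
```

Usage: `list_pods_on_node <worker-node>`.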
4. Update Node Labels#
You have two options for updating node labels:
Option A: Direct Update (Recommended)#
If no additional maintenance is needed, directly update the version label:
# Old label format:
kmm.node.kubernetes.io/version-module.<namespace>.<config-name>=<old-version>
# New label format:
kmm.node.kubernetes.io/version-module.<namespace>.<config-name>=<new-version>
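Because the label already exists on the node, changing its value in place requires kubectl's `--overwrite` flag; without it, `kubectl label` refuses to modify an existing label. A minimal sketch of the direct update, with `set_driver_version_label` as a hypothetical wrapper name:

```shell
# Hypothetical helper: change the driver version label on a node in place.
# $1 = worker node, $2 = DeviceConfig namespace,
# $3 = DeviceConfig name, $4 = new driver version
set_driver_version_label() {
  kubectl label node "$1" \
    "kmm.node.kubernetes.io/version-module.$2.$3=$4" --overwrite
}
```

Usage: `set_driver_version_label <worker-node> <namespace> <config-name> <new-version>`.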
Option B: Remove and Add (If maintenance is needed)#
1. Remove the old version label:
kubectl label node <worker-node> \
kmm.node.kubernetes.io/version-module.<namespace>.<config-name>-
2. Perform the required maintenance.
3. Add the new version label:
kubectl label node <worker-node> \
kmm.node.kubernetes.io/version-module.<namespace>.<config-name>=<new-version>
5. Restart Workloads#
After the new driver is installed successfully, restart your GPU workloads on the upgraded node.
Verification#
To verify the upgrade, check node labels:
kubectl get node <worker-node> --show-labels | grep kmm.node.kubernetes.io
Verify the driver version configured in the DeviceConfig:
kubectl get deviceconfigs <config-name> -n kube-amd-gpu -o yaml
Check driver status:
kubectl get deviceconfigs <config-name> -n kube-amd-gpu -o jsonpath='{.status}'
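When several nodes are being upgraded, a label selector gives a quick view of which nodes already carry the new version label. This is a sketch under the label format used throughout this guide; `nodes_at_driver_version` is a hypothetical helper name. It only checks the label, not whether the driver actually loaded, so the DeviceConfig status check above is still needed.

```shell
# Hypothetical helper: list the nodes whose version label equals the
# given driver version.
# $1 = DeviceConfig namespace, $2 = DeviceConfig name, $3 = driver version
nodes_at_driver_version() {
  kubectl get nodes \
    -l "kmm.node.kubernetes.io/version-module.$1.$2=$3"
}
```

Usage: `nodes_at_driver_version <namespace> <config-name> <new-version>`. Any upgraded node missing from the output still carries the old label value or no label at all.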