AMD GPU Operator Documentation#
The AMD GPU Operator simplifies the deployment and management of AMD Instinct GPU accelerators within Kubernetes clusters. This project enables seamless configuration and operation of GPU-accelerated workloads, including machine learning, Generative AI, and other GPU-intensive applications.
Features#
Automated driver installation and management
Easy deployment of the AMD GPU device plugin
Metrics collection and export
Support for vanilla Kubernetes
Simplified GPU resource allocation for containers
Automatic worker node labeling for GPU-enabled nodes
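To make the resource-allocation feature concrete, the following is a minimal sketch of how a container might request a GPU once the device plugin is running. It assumes the device plugin advertises GPUs under the extended resource name `amd.com/gpu`; the helper `gpu_pod_manifest`, the pod name, and the image are hypothetical and used only for illustration.

```shell
# Hypothetical helper: prints a Pod manifest that requests AMD GPUs.
# Assumption: the device plugin exposes GPUs as the resource "amd.com/gpu".
# Usage: gpu_pod_manifest <pod-name> <image> [gpu-count]
gpu_pod_manifest() {
  name=$1
  image=$2
  gpus=${3:-1}   # default to a single GPU
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  restartPolicy: Never
  containers:
    - name: ${name}
      image: ${image}
      command: ["sleep", "3600"]
      resources:
        limits:
          amd.com/gpu: ${gpus}
EOF
}
```

For example, `gpu_pod_manifest gpu-test rocm/rocm-terminal 1 | kubectl apply -f -` would submit a test pod, assuming that image suits your environment. The scheduler should place the pod only on a node that reports a free GPU.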
Compatibility#
Kubernetes: 1.29.0
Please refer to the ROCm documentation for the compatibility matrix of the AMD GPU DKMS driver.
Prerequisites#
Helm v3.2.0+
kubectl CLI tool configured to access your cluster
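The prerequisites above can be checked before installing. The sketch below is one possible approach, not part of the operator itself: `version_ge` and `check_prereqs` are hypothetical helper names, and the version comparison assumes a `sort` that supports `-V` (GNU coreutils or a recent BSD `sort`).

```shell
# Succeeds if dotted version $1 is greater than or equal to version $2.
# Assumption: `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: verifies that Helm v3.2.0+ is installed and that
# kubectl can reach the cluster.
check_prereqs() {
  command -v helm >/dev/null 2>&1 || { echo "helm not found" >&2; return 1; }
  command -v kubectl >/dev/null 2>&1 || { echo "kubectl not found" >&2; return 1; }

  # Print only the client version (for example "v3.14.0") and drop the "v".
  helm_ver=$(helm version --template '{{.Version}}' | sed 's/^v//')
  version_ge "$helm_ver" 3.2.0 || {
    echo "Helm $helm_ver is older than 3.2.0" >&2
    return 1
  }

  # cluster-info fails if the current kubeconfig context is unreachable.
  kubectl cluster-info >/dev/null || {
    echo "kubectl cannot reach the cluster" >&2
    return 1
  }
  echo "prerequisites OK"
}
```

After sourcing these definitions, run `check_prereqs` in the shell you intend to install from.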
Quick Start#
Add the Helm repository:
helm repo add rocm https://rocm.github.io/gpu-operator
helm repo update
Install the AMD GPU Operator:
helm install amd-gpu-operator rocm/gpu-operator-charts --namespace kube-amd-gpu --create-namespace
Verify the installation:
kubectl get pods -n kube-amd-gpu
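The pod listing can also be summarized by a script. The following sketch counts pods that are in neither the `Running` nor the `Completed` state; `count_unhealthy_pods` is a hypothetical helper, and it assumes the default `kubectl get pods` column layout, where STATUS is the third column.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and prints how many pods have a STATUS other than Running or Completed.
# Assumption: default column order NAME READY STATUS RESTARTS AGE.
count_unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { n++ } END { print n + 0 }'
}
```

For example, `kubectl get pods -n kube-amd-gpu --no-headers | count_unhealthy_pods` prints `0` once every operator pod has started. A nonzero count shortly after installation may only mean that images are still being pulled.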
Support#
For bugs and feature requests, please file an issue on our GitHub Issues page.