iGPU Passthrough with Proxmox and Talos
To use the GPU of the host system in K8s, the cluster needs to be made aware of its existence. As I use VMs in Proxmox to run my Talos cluster, there is an additional step to consider: passing the hardware through into the VM. The idea is the following:
- Disable usage of GPU on the Proxmox Host - done via IOMMU and VFIO
- Passthrough of the GPU into a Talos Worker Node - done via hostpci
- Activation of GPU drivers in Talos - done via talhelper
- GPU management in the cluster - done via Intel GPU Plugin
- Usage of GPU in a deployment - by requesting the resource
There are quite a few steps, and all of them are specific to your environment. Different host system? Different steps. Different cluster software? You need to adapt. Different GPU? You need other drivers and management. So please watch out for the pitfalls and only copy and paste if you run Proxmox with Talos and have an Intel iGPU.
Ideally this is all done in Ansible. We need to find the proper hardware ID for the passthrough on the Proxmox host. We also need to disable any allocation of the GPU by the host (Proxmox) system. This happens in several steps: first, IOMMU for Intel is activated to allow host passthrough; second, the PCI device is passed into the VM via VFIO.
Enable IOMMU and VFIO
Get the PCI device ID of the GPU:
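If you do not know the address yet, listing the display-class devices is a quick way to find it (a minimal sketch; on Intel systems the iGPU typically sits at 00:02.0):

```bash
lspci | grep -iE 'vga|display'
```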
Get the vendor and device ID:
```bash
lspci -nn | grep 00:02.0 # <- replace with your pci device id
```
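The pair in the last bracket of that line is vendor:device. For the Iris Xe iGPU in my NUC 12 this resolves to 8086:46a6; an illustrative output line (your device name and revision will differ) looks roughly like this:

```
00:02.0 VGA compatible controller [0300]: Intel Corporation Alder Lake-P GT2 [Iris Xe Graphics] [8086:46a6] (rev 0c)
```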
Finally, the Ansible playbook to be executed:
```yaml
- name: Enable iGPU passthrough on Intel NUC 12 i5 for Talos VM
  hosts: proxmox
  become: true

  vars:
    gpu_vendor_id: "8086" # taken from above
    gpu_device_id: "46a6" # taken from above
    gpu_pci_id: "00:02.0" # taken from above

  tasks:
    - name: Enable IOMMU in GRUB
      lineinfile:
        path: /etc/default/grub
        regexp: '^GRUB_CMDLINE_LINUX_DEFAULT='
        line: 'GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt video=efifb:off"'
      notify: update grub

    - name: Ensure VFIO modules are loaded at boot
      copy:
        dest: /etc/modules-load.d/vfio.conf
        content: |
          vfio
          vfio_iommu_type1
          vfio_pci
          vfio_virqfd

    - name: Create modprobe override for VFIO GPU binding
      copy:
        dest: /etc/modprobe.d/vfio.conf
        content: |
          options vfio-pci ids={{ gpu_vendor_id }}:{{ gpu_device_id }} disable_vga=1

    - name: Regenerate initramfs
      command: update-initramfs -u

    - name: Add PCI passthrough to Talos VM config
      lineinfile:
        path: "/etc/pve/qemu-server/{{ vm_id }}.conf"
        line: "hostpci0: {{ gpu_pci_id }},pcie=1,x-vga=1"
        insertafter: EOF
        create: yes

    - name: Recommend reboot
      debug:
        msg: "All changes applied. Please reboot the Proxmox host to enable GPU passthrough."

  handlers:
    - name: update grub
      command: update-grub
```
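A way to run it and to verify the result afterwards (a sketch: the inventory, playbook file name and VM ID are placeholders of my own; note that the playbook expects vm_id to be supplied):

```bash
ansible-playbook -i inventory.ini igpu-passthrough.yaml -e vm_id=200

# After rebooting the Proxmox host, the iGPU should be bound to vfio-pci:
lspci -nnk -s 00:02.0
# expected line in the output: "Kernel driver in use: vfio-pci"
```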
This playbook needs to be executed for every worker node that should get a GPU. The next step is to make the hardware device usable in the guest OS (Talos).
Talos is an immutable OS, hence all configuration needs to be done during installation. The configuration takes place in the clusterconfig files, or, if you use talhelper, in talconfig.yaml. My setup allows passthrough to the worker VMs, hence I can use a single config section to enable the Intel-specific drivers. The required changes to the Talos base image are:
```yaml
# yaml-language-server: $schema=https://raw.githubusercontent.com/budimanjojo/talhelper/master/pkg/config/schemas/talconfig.json
clusterName: talos
# omitted for brevity
worker:
  # omitted for brevity
  schematic:
    customization:
      systemExtensions:
        officialExtensions:
          - siderolabs/i915 # intel quicksync
          - siderolabs/intel-ucode # intel quicksync
```
If you are not using talhelper, the respective image will not be calculated automatically. You can build it on the Talos Factory page: https://factory.talos.dev/.
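After the nodes have been upgraded to the new image, the extensions and the DRI device can be checked with talosctl (a sketch; replace the node IP with one of your GPU workers):

```bash
talosctl get extensions --nodes 10.0.0.21   # should list i915 and intel-ucode
talosctl ls /dev/dri --nodes 10.0.0.21      # renderD128 should show up here
```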
Install Intel GPU Device Plugin
The Intel GPU Device Plugin is maintained by Intel and enables GPU resource management for containers. The advantage is that resources can be requested and dynamically allocated. Containers using the GPU do not have to be privileged, and the GPU can be used by multiple containers in parallel - if configured.
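That parallel use is what the plugin's shared device option controls. A sketch of the relevant container args in the DaemonSet (the value 2 is just an example, not my setting):

```yaml
# Excerpt from the intel-gpu-plugin container spec (illustrative):
args:
  - -shared-dev-num=2 # advertise each i915 device to up to 2 containers; the default is 1
```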
The device plugin can be installed via kubectl or Flux CD. The GPU plugin is deployed as a DaemonSet and can be found in /deployments/gpu_plugin/base/intel-gpu-plugin.yaml. Install it in kube-system and ensure correct security admission for the pod - or the namespace.
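If the target namespace enforces a restricted Pod Security Standard, the standard admission label can relax it (a sketch; kube-system is often exempt already):

```bash
kubectl label namespace kube-system pod-security.kubernetes.io/enforce=privileged --overwrite
```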
```bash
kubectl apply -f intel-gpu-plugin.yaml -n kube-system
```
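To confirm the plugin is running and the resource is advertised (the label selector is an assumption and may differ between manifest versions):

```bash
kubectl -n kube-system get pods -l app=intel-gpu-plugin
kubectl describe node <gpu-worker> | grep gpu.intel.com # gpu.intel.com/i915 should be allocatable
```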
Verify in Container
To check that everything is working as expected, a small pod can be launched to verify the availability of the VA-API interface. Using ffmpeg, it only takes three lines in the startup args of the ffmpeg container.
```yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-checker
spec:
  restartPolicy: Never
  containers:
    - name: checker
      image: jrottenberg/ffmpeg:7.1-ubuntu2404
      command: ["/bin/sh", "-c"]
      env:
        - name: XDG_RUNTIME_DIR
          value: /tmp
        - name: LIBVA_DRIVER_NAME
          value: iHD # or 'i965' for older GPUs
        - name: DRI_DEVICE
          value: /dev/dri/renderD128
      args:
        - >
          apt update &&
          apt install -y vainfo intel-media-va-driver-non-free &&
          vainfo && sleep 3600
      resources:
        limits:
          gpu.intel.com/i915: 1
```
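Apply the manifest and follow the pod logs to get the vainfo report (the file name is whatever you saved the manifest as):

```bash
kubectl apply -f gpu-checker.yaml
kubectl logs -f gpu-checker
```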
The output should look similar to this:
```
error: can't connect to X server!
libva info: VA-API version 1.20.0
libva info: User environment variable requested driver 'iHD'
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_20
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.20 (libva 2.12.0)
vainfo: Driver version: Intel iHD driver for Intel(R) Gen Graphics - 24.1.0 ()
vainfo: Supported profile and entrypoints
      VAProfileNone : VAEntrypointVideoProc
      VAProfileNone : VAEntrypointStats
      VAProfileMPEG2Simple : VAEntrypointVLD
      VAProfileMPEG2Simple : VAEntrypointEncSlice
      VAProfileMPEG2Main : VAEntrypointVLD
      VAProfileMPEG2Main : VAEntrypointEncSlice
      VAProfileH264Main : VAEntrypointVLD
      VAProfileH264Main : VAEntrypointEncSlice
      VAProfileH264Main : VAEntrypointFEI
      VAProfileH264Main : VAEntrypointEncSliceLP
      VAProfileH264High : VAEntrypointVLD
      VAProfileH264High : VAEntrypointEncSlice
      VAProfileH264High : VAEntrypointFEI
      VAProfileH264High : VAEntrypointEncSliceLP
      VAProfileVC1Simple : VAEntrypointVLD
      VAProfileVC1Main : VAEntrypointVLD
      VAProfileVC1Advanced : VAEntrypointVLD
      VAProfileJPEGBaseline : VAEntrypointVLD
      VAProfileJPEGBaseline : VAEntrypointEncPicture
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
      VAProfileH264ConstrainedBaseline: VAEntrypointFEI
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
      VAProfileVP8Version0_3 : VAEntrypointVLD
      VAProfileHEVCMain : VAEntrypointVLD
      VAProfileHEVCMain : VAEntrypointEncSlice
      VAProfileHEVCMain : VAEntrypointFEI
      VAProfileHEVCMain : VAEntrypointEncSliceLP
      VAProfileHEVCMain10 : VAEntrypointVLD
      VAProfileHEVCMain10 : VAEntrypointEncSlice
      VAProfileHEVCMain10 : VAEntrypointEncSliceLP
      VAProfileVP9Profile0 : VAEntrypointVLD
      VAProfileVP9Profile0 : VAEntrypointEncSliceLP
      VAProfileVP9Profile1 : VAEntrypointVLD
      VAProfileVP9Profile1 : VAEntrypointEncSliceLP
      VAProfileVP9Profile2 : VAEntrypointVLD
      VAProfileVP9Profile2 : VAEntrypointEncSliceLP
      VAProfileVP9Profile3 : VAEntrypointVLD
      VAProfileVP9Profile3 : VAEntrypointEncSliceLP
      VAProfileHEVCMain12 : VAEntrypointVLD
      VAProfileHEVCMain12 : VAEntrypointEncSlice
      VAProfileHEVCMain422_10 : VAEntrypointVLD
      VAProfileHEVCMain422_10 : VAEntrypointEncSlice
      VAProfileHEVCMain422_12 : VAEntrypointVLD
      VAProfileHEVCMain422_12 : VAEntrypointEncSlice
      VAProfileHEVCMain444 : VAEntrypointVLD
      VAProfileHEVCMain444 : VAEntrypointEncSliceLP
      VAProfileHEVCMain444_10 : VAEntrypointVLD
      VAProfileHEVCMain444_10 : VAEntrypointEncSliceLP
      VAProfileHEVCMain444_12 : VAEntrypointVLD
      VAProfileHEVCSccMain : VAEntrypointVLD
      VAProfileHEVCSccMain : VAEntrypointEncSliceLP
      VAProfileHEVCSccMain10 : VAEntrypointVLD
      VAProfileHEVCSccMain10 : VAEntrypointEncSliceLP
      VAProfileHEVCSccMain444 : VAEntrypointVLD
      VAProfileHEVCSccMain444 : VAEntrypointEncSliceLP
      VAProfileAV1Profile0 : VAEntrypointVLD
      VAProfileHEVCSccMain444_10 : VAEntrypointVLD
      VAProfileHEVCSccMain444_10 : VAEntrypointEncSliceLP
```
The lines `libva info: va_openDriver() returns 0` and the long list of supported profiles show that the configuration succeeded: VA-API was loaded and the iGPU is usable inside the container.
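From here, any workload can use the iGPU exactly like the checker pod: by requesting the resource in its limits. A minimal, illustrative deployment excerpt (name and image are placeholders, not my actual workload):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: transcoder # placeholder
spec:
  replicas: 1
  selector:
    matchLabels:
      app: transcoder
  template:
    metadata:
      labels:
        app: transcoder
    spec:
      containers:
        - name: app
          image: jrottenberg/ffmpeg:7.1-ubuntu2404 # any VA-API capable image
          resources:
            limits:
              gpu.intel.com/i915: 1 # schedules the pod onto a node with the iGPU
```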