Intel iGPU Passthrough with Proxmox and Talos

2025-10-11

To use the host system's GPU in Kubernetes, the cluster needs to be made aware of its existence. Since I run my Talos cluster in Proxmox VMs, there is an additional step to consider: passing the hardware through into the VM. The idea is the following:

  1. Disable usage of GPU on the Proxmox Host - done via IOMMU and VFIO
  2. Passthrough of the GPU into a Talos Worker Node - done via hostpci
  3. Activation of GPU drivers in Talos - done via talhelper
  4. GPU management in the cluster - done via Intel GPU Plugin
  5. Usage of GPU in a deployment - by requesting the resource

There are quite a few steps, and all of them are specific to your environment. Different host system? Different steps. Different cluster software? You need to adapt. Different GPU? You need other drivers and management. So watch out for the pitfalls and only copy and paste if you run Proxmox with Talos and have an Intel iGPU.

Configure Proxmox Host System

Ideally this is all done with Ansible. We need to find the proper hardware IDs of the GPU on the Proxmox host and prevent the host from claiming the device for itself. This is done in several steps: first, IOMMU is activated for Intel to allow passthrough from the host; second, the PCI device is bound to VFIO and passed into the VM.

Enable IOMMU and VFIO

Get the PCI device ID of the GPU:

lspci | grep VGA

Get the vendor and device ID:

lspci -nn | grep 00:02.0 # <- replace with your pci device id
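
For reference, on the Intel NUC 12 used here the relevant line looks roughly like the example below; the pair in the last brackets is the vendor and device ID that the playbook needs. Your hardware will report different names and IDs.

00:02.0 VGA compatible controller [0300]: Intel Corporation Alder Lake-P GT2 [Iris Xe Graphics] [8086:46a6]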

Finally, the Ansible playbook to be executed:

- name: Enable iGPU passthrough on Intel NUC 12 i5 for Talos VM
  hosts: proxmox
  become: true
  vars:
    gpu_vendor_id: "8086"  # taken from above
    gpu_device_id: "46a6"  # taken from above
    gpu_pci_id: "00:02.0"  # taken from above
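    # vm_id is not defined here; provide it per host (e.g. via host_vars or --extra-vars) as the Proxmox VM ID of the Talos worker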

  tasks:
    - name: Enable IOMMU in GRUB
      lineinfile:
        path: /etc/default/grub
        regexp: '^GRUB_CMDLINE_LINUX_DEFAULT='
        line: 'GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt video=efifb:off"'
      notify: update grub

    - name: Ensure VFIO modules are loaded at boot
      copy:
        dest: /etc/modules-load.d/vfio.conf
        content: |
          vfio
          vfio_iommu_type1
          vfio_pci
          vfio_virqfd

    - name: Create modprobe override for VFIO GPU binding
      copy:
        dest: /etc/modprobe.d/vfio.conf
        content: |
          options vfio-pci ids={{ gpu_vendor_id }}:{{ gpu_device_id }} disable_vga=1

    - name: Regenerate initramfs
      command: update-initramfs -u

    - name: Add PCI passthrough to Talos VM config
      lineinfile:
        path: "/etc/pve/qemu-server/{{ vm_id }}.conf"
        line: "hostpci0: {{ gpu_pci_id }},pcie=1,x-vga=1"
        insertafter: EOF
        create: yes

    - name: Recommend reboot
      debug:
        msg: "All changes applied. Please reboot the Proxmox host to enable GPU passthrough."

  handlers:
    - name: update grub
      command: update-grub

This playbook needs to be executed on every Proxmox host that runs a worker VM with a GPU to pass through. The next step is to make the hardware device usable in the guest OS (Talos).
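
After the reboot, a quick sanity check on the Proxmox host shows whether the iGPU is now bound to vfio-pci instead of the host's i915 driver:

lspci -nnk -s 00:02.0
# should report "Kernel driver in use: vfio-pci"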

Configure Talos Worker Nodes

Talos is an immutable OS, hence all configuration needs to be done during installation. The configuration takes place in the clusterconfig files, or, if you use talhelper, in talconfig.yaml. My setup passes the GPU through to the worker VMs, hence I can use a single config section to enable the Intel-specific drivers. The required changes to the Talos base image are:

# yaml-language-server: $schema=https://raw.githubusercontent.com/budimanjojo/talhelper/master/pkg/config/schemas/talconfig.json

clusterName: talos

# omitted for brevity

worker:

  # omitted for brevity

  schematic:
    customization:
      systemExtensions:
        officialExtensions:
          - siderolabs/i915        # Intel GPU driver (Quick Sync / VA-API)
          - siderolabs/intel-ucode # Intel CPU microcode updates

If you are not using talhelper, the respective image will not be generated automatically. You can build it yourself on the Talos Image Factory page: https://factory.talos.dev/.
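
Once a worker boots the new image, the installed extensions can be verified with talosctl (replace the node IP with one of your workers):

talosctl get extensions --nodes 10.0.10.21
# the list should contain i915 and intel-ucode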

Install the Intel GPU Device Plugin

The Intel GPU device plugin is maintained by Intel and enables GPU resource management for containers. The advantage is that the resource can be requested and dynamically allocated: containers using the GPU do not have to be privileged, and the GPU can be used by multiple containers in parallel, if configured accordingly.
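
How many containers can share one GPU is controlled by the plugin's -shared-dev-num flag; the upstream manifest defaults to a single device per GPU. A sketch of what that looks like in the DaemonSet's container spec (the image tag and the value 4 are examples, not taken from this setup):

# illustrative excerpt of the gpu_plugin DaemonSet container spec
containers:
  - name: intel-gpu-plugin
    image: intel/intel-gpu-plugin:0.32.0   # tag is an example, use the one from the manifest
    args:
      - -shared-dev-num=4   # advertise each physical GPU as 4 allocatable devices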

The device plugin can be installed via kubectl or FluxCD. It runs as a DaemonSet and can be found at deployments/gpu_plugin/base/intel-gpu-plugin.yaml in the intel-device-plugins-for-kubernetes repository. Install it in kube-system and ensure correct security admission for the pod - or the namespace.

k apply -f intel-gpu-plugin.yaml -n kube-system
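
Once the DaemonSet pod runs on the GPU node, the node advertises the new resource (replace the node name with your worker's):

kubectl describe node talos-worker-1 | grep -i gpu.intel.com
# gpu.intel.com/i915 should appear under Capacity and Allocatable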

Verify in Container

To check that everything is working as expected, a small pod can be launched to verify the availability of the VA-API interface. Using an ffmpeg image, it is an easy three lines in the startup args of the container.

---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-checker
spec:
  restartPolicy: Never
  containers:
    - name: checker
      image: jrottenberg/ffmpeg:7.1-ubuntu2404
      command: ["/bin/sh", "-c"]
      env:
        - name: XDG_RUNTIME_DIR
          value: /tmp
        - name: LIBVA_DRIVER_NAME
          value: iHD           # or 'i965' for older GPUs
        - name: DRI_DEVICE
          value: /dev/dri/renderD128
      args:
        - >
          apt update &&
          apt install -y vainfo intel-media-va-driver-non-free &&
          vainfo && sleep 3600
      resources:
        limits:
          gpu.intel.com/i915: 1
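
Apply the manifest and read the pod's logs once the container has installed vainfo and run it (the file name is whatever you saved the manifest as):

kubectl apply -f gpu-checker.yaml
kubectl logs gpu-checker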

The output looks like this:

error: can't connect to X server!
libva info: VA-API version 1.20.0
libva info: User environment variable requested driver 'iHD'
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_20
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.20 (libva 2.12.0)
vainfo: Driver version: Intel iHD driver for Intel(R) Gen Graphics - 24.1.0 ()
vainfo: Supported profile and entrypoints
      VAProfileNone                   : VAEntrypointVideoProc
      VAProfileNone                   : VAEntrypointStats
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Simple            : VAEntrypointEncSlice
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointEncSlice
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileH264Main               : VAEntrypointFEI
      VAProfileH264Main               : VAEntrypointEncSliceLP
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointEncSlice
      VAProfileH264High               : VAEntrypointFEI
      VAProfileH264High               : VAEntrypointEncSliceLP
      VAProfileVC1Simple              : VAEntrypointVLD
      VAProfileVC1Main                : VAEntrypointVLD
      VAProfileVC1Advanced            : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointEncPicture
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
      VAProfileH264ConstrainedBaseline: VAEntrypointFEI
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
      VAProfileVP8Version0_3          : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointEncSlice
      VAProfileHEVCMain               : VAEntrypointFEI
      VAProfileHEVCMain               : VAEntrypointEncSliceLP
      VAProfileHEVCMain10             : VAEntrypointVLD
      VAProfileHEVCMain10             : VAEntrypointEncSlice
      VAProfileHEVCMain10             : VAEntrypointEncSliceLP
      VAProfileVP9Profile0            : VAEntrypointVLD
      VAProfileVP9Profile0            : VAEntrypointEncSliceLP
      VAProfileVP9Profile1            : VAEntrypointVLD
      VAProfileVP9Profile1            : VAEntrypointEncSliceLP
      VAProfileVP9Profile2            : VAEntrypointVLD
      VAProfileVP9Profile2            : VAEntrypointEncSliceLP
      VAProfileVP9Profile3            : VAEntrypointVLD
      VAProfileVP9Profile3            : VAEntrypointEncSliceLP
      VAProfileHEVCMain12             : VAEntrypointVLD
      VAProfileHEVCMain12             : VAEntrypointEncSlice
      VAProfileHEVCMain422_10         : VAEntrypointVLD
      VAProfileHEVCMain422_10         : VAEntrypointEncSlice
      VAProfileHEVCMain422_12         : VAEntrypointVLD
      VAProfileHEVCMain422_12         : VAEntrypointEncSlice
      VAProfileHEVCMain444            : VAEntrypointVLD
      VAProfileHEVCMain444            : VAEntrypointEncSliceLP
      VAProfileHEVCMain444_10         : VAEntrypointVLD
      VAProfileHEVCMain444_10         : VAEntrypointEncSliceLP
      VAProfileHEVCMain444_12         : VAEntrypointVLD
      VAProfileHEVCSccMain            : VAEntrypointVLD
      VAProfileHEVCSccMain            : VAEntrypointEncSliceLP
      VAProfileHEVCSccMain10          : VAEntrypointVLD
      VAProfileHEVCSccMain10          : VAEntrypointEncSliceLP
      VAProfileHEVCSccMain444         : VAEntrypointVLD
      VAProfileHEVCSccMain444         : VAEntrypointEncSliceLP
      VAProfileAV1Profile0            : VAEntrypointVLD
      VAProfileHEVCSccMain444_10      : VAEntrypointVLD
      VAProfileHEVCSccMain444_10      : VAEntrypointEncSliceLP

The lines showing that va_openDriver() returns 0, together with the long list of supported profiles and entrypoints, confirm that the configuration succeeded and VA-API is available.



2025-10-11: Initial publication of the article