Wednesday, September 2, 2015

libvirt 1.2.19 - session mode device assignment

Just a quick note to point out a feature, actually a bug fix, in the new libvirt 1.2.19 release:

hostdev: skip ACS check when using VFIO for device assignment (Laine Stump)

Why is this noteworthy?  Well, that ACS checking required access to PCI config space beyond the standard header, which is privileged.  That meant that session (i.e. user) mode libvirt couldn't do it and failed when trying to support <hostdev> entries.  Now that libvirt recognizes that vfio enforces device isolation and a userspace ACS test is unnecessary, session mode libvirt can support device assignment!  Thanks Laine!
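
For context, the <hostdev> entry in the domain XML looks something like the sketch below.  This uses the example NIC at 0000:00:19.0 that appears later in this post, and assumes managed='no', since a session mode libvirt can't rebind host drivers itself (that's exactly what the preparation steps below take care of):

    <hostdev mode='subsystem' type='pci' managed='no'>
      <source>
        <address domain='0x0000' bus='0x00' slot='0x19' function='0x0'/>
      </source>
    </hostdev>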

Note that a user still can't just pluck a device from the host and start using it; that's still privileged.  There's also the problem that a VM making use of device assignment needs to lock all of the VM memory into RAM, which is typically quite a lot more than the standard user locked memory limit of 64kB.  But both of these can be resolved by allowing the (trusted) user to lock enough memory for their VM and by preparing the device for the user ahead of time.  The keys to doing this are:
  1. Use /etc/security/limits.conf to increase memlock for the desired user
  2. Pre-bind the desired device to vfio-pci, either via the mechanisms described in other posts or simply by using virsh nodedev-detach.
  3. Change the ownership of the vfio group to that of the (trusted) user.  To determine the group, follow the links in sysfs or use virsh nodedev-dumpxml, for example:
      $ virsh nodedev-dumpxml pci_0000_00_19_0
      <device>
        <name>pci_0000_00_19_0</name>
        <path>/sys/devices/pci0000:00/0000:00:19.0</path>
        <parent>computer</parent>
        <driver>
          <name>e1000e</name>
        </driver>
        <capability type='pci'>
          <domain>0</domain>
          <bus>0</bus>
          <slot>25</slot>
          <function>0</function>
          <product id='0x1502'>82579LM Gigabit Network Connection</product>
          <vendor id='0x8086'>Intel Corporation</vendor>
          <iommuGroup number='4'>
            <address domain='0x0000' bus='0x00' slot='0x19' function='0x0'/>
          </iommuGroup>
        </capability>
      </device>
The iommuGroup section tells us that this is group number 4, so permissions need to be set on /dev/vfio/4.  As always, also note the set of devices within this group and ensure that all endpoints listed are bound to either vfio-pci or pci-stub; the former allows the user access to the device, while the latter allows the group to be usable without explicitly granting the user access to that device.  A sketch pulling these steps together follows below.
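
As a minimal example, assume a hypothetical user named alice, the example NIC above (IOMMU group 4), and a guest with 8GB of RAM; the user, device, and memlock size (in kB) are placeholders to adjust for your own setup:

    # 1. Let the user lock enough memory for the VM (~8GB guest + overhead)
    $ echo 'alice - memlock 9437184' | sudo tee -a /etc/security/limits.conf

    # 2. Bind the device to vfio-pci ahead of time (privileged, one-time step)
    $ sudo virsh nodedev-detach pci_0000_00_19_0

    # 3. Hand the VFIO group to the user
    $ sudo chown alice /dev/vfio/4

After logging in again to pick up the new memlock limit, the user can start the VM with session mode libvirt (qemu:///session) using a <hostdev> entry like the one shown earlier.
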
Enjoy! 

9 comments:

  1. Very cool! Thanks for sharing, as always!

  2. Not able to get this to work with a GPU:

    qemu-system-x86_64: -device vfio-pci,host=01:00.0: VFIO_MAP_DMA: -12
    qemu-system-x86_64: -device vfio-pci,host=01:00.0: vfio_dma_map(0x560d54a63b60, 0x0, 0xa0000, 0x7f1d18000000) = -12 (Cannot allocate memory)
    qemu-system-x86_64: -device vfio-pci,host=01:00.0: VFIO_MAP_DMA: -12
    qemu-system-x86_64: -device vfio-pci,host=01:00.0: vfio_dma_map(0x560d54a63b60, 0xc0000, 0x20000, 0x7f1f2c400000) = -12 (Cannot allocate memory)
    qemu-system-x86_64: -device vfio-pci,host=01:00.0: VFIO_MAP_DMA: -12
    qemu-system-x86_64: -device vfio-pci,host=01:00.0: vfio_dma_map(0x560d54a63b60, 0xe0000, 0x20000, 0x7f1f2c800000) = -12 (Cannot allocate memory)
    qemu-system-x86_64: -device vfio-pci,host=01:00.0: VFIO_MAP_DMA: -12
    qemu-system-x86_64: -device vfio-pci,host=01:00.0: vfio_dma_map(0x560d54a63b60, 0x100000, 0xbff00000, 0x7f1d18100000) = -12 (Cannot allocate memory)
    qemu-system-x86_64: -device vfio-pci,host=01:00.0: VFIO_MAP_DMA: -12
    qemu-system-x86_64: -device vfio-pci,host=01:00.0: vfio_dma_map(0x560d54a63b60, 0x100000000, 0x140000000, 0x7f1dd8000000) = -12 (Cannot allocate memory)
    qemu-system-x86_64: -device vfio-pci,host=01:00.0: vfio: memory listener initialization failed for container
    qemu-system-x86_64: -device vfio-pci,host=01:00.0: vfio: failed to setup container for group 1
    qemu-system-x86_64: -device vfio-pci,host=01:00.0: vfio: failed to get group 1
    qemu-system-x86_64: -device vfio-pci,host=01:00.0: Device initialization failed

    cat /etc/security/limits.conf
    m - memlock 10485760

    ls -al /dev/vfio/1
    crw-rw---- 1 m users 244, 0 Oct 4 23:03 /dev/vfio/1

    Does this have to take place through virsh/libvirtd? I'm trying this with just a qemu script.

    Thanks

    Replies
    1. What does ulimit -l report for the user starting the VM and what's the memory size of the VM being started?

  3. This comment has been removed by the author.

  4. Memory size was 10GB, which was over the ulimit. I started a 1GB memory VM and it worked fine (via virsh), but now I see something odd:

    When I first bind the GPU, the owner:group is m:users. While running the VM the owner:group is m:users. After the VM is shut down/destroyed the owner:group changes to root:root.

    ls -l /dev/vfio
    total 0
    crw-rw---- 1 m users 244, 0 Oct 5 15:28 1
    crw-rw-rw- 1 root root 10, 196 Oct 5 15:28 vfio
    [m@orange rules.d]$ ls -l /dev/vfio
    total 0
    crw-rw---- 1 root root 244, 0 Oct 5 15:28 1
    crw-rw-rw- 1 root root 10, 196 Oct 5 15:28 vfio

    I'm guessing this has something to do with udev? I'm about to hit my limit on knowledge :)

    Here is my udev rule:
    cat /etc/udev/rules.d/10-vfio-users.rules
    KERNEL=="1", SUBSYSTEM=="vfio", OWNER="m", GROUP="users"

    I would think maybe when the VM is destroyed it releases the group which recreates it defaulting to root:root? Funny thing is that I can restart the VM and ownership goes back to m:users.

    Cool thing about this is now I am running a VM with GPU passthrough without needing root access to anything!
    Thank you

    Replies
    1. This behavior also happens with partitions given to users in udev:
      After boot, ownership is m:users; when the VM is destroyed, ownership reverts to root:root; when the VM is restarted, it goes back to m:users.

      Odd.

  5. "Note that a user still can't just pluck a device from the host and start using it, that's still privileged."

    Hi, how am I supposed to understand this? A regular user can't but a root/privileged one can, or is it not possible at all?

    I'm researching the feasibility of a Linux workstation where I can use the high-end graphics card for either Xorg or the Windows VM.

  6. Does this mean that your ACS patch is no longer necessary for, let's say, NVIDIA cards which iommu-group themselves with the CPU root bridge?

    Replies
    1. No, and it's not the NVIDIA cards that have the problem; it's the root ports. The ACS check in libvirt is completely useless when using vfio for device assignment, so it was easy to remove it and rely on the kernel's isolation enforcement. This change only makes it possible to grant a user permissions to a device outside of libvirt, so that libvirt doesn't need to do that granting itself.


Comments are not a support forum. For help with problems, please try the vfio-users mailing list (https://www.redhat.com/mailman/listinfo/vfio-users)