Commit | Line | Data |
---|---|---|
8e1c5a40 KW |
1 | /* |
2 | * VFIO Mediated devices | |
3 | * | |
4 | * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved. | |
5 | * Author: Neo Jia <cjia@nvidia.com> | |
6 | * Kirti Wankhede <kwankhede@nvidia.com> | |
7 | * | |
8 | * This program is free software; you can redistribute it and/or modify | |
9 | * it under the terms of the GNU General Public License version 2 as | |
10 | * published by the Free Software Foundation. | |
11 | */ | |
12 | ||
13 | Virtual Function I/O (VFIO) Mediated devices[1] | |
14 | =============================================== | |
15 | ||
16 | The number of use cases for virtualizing DMA devices that do not have built-in | |
17 | SR-IOV capability is increasing. Previously, to virtualize such devices, | |
18 | developers had to create their own management interfaces and APIs, and then | |
19 | integrate them with user space software. To simplify integration with user space | |
20 | software, we have identified common requirements and a unified management | |
21 | interface for such devices. | |
22 | ||
23 | The VFIO driver framework provides unified APIs for direct device access. It is | |
24 | an IOMMU/device-agnostic framework for exposing direct device access to user | |
25 | space in a secure, IOMMU-protected environment. This framework is used for | |
26 | multiple devices, such as GPUs, network adapters, and compute accelerators. With | |
27 | direct device access, virtual machines or user space applications have direct | |
28 | access to the physical device. This framework is reused for mediated devices. | |
29 | ||
30 | The mediated core driver provides a common interface for mediated device | |
31 | management that can be used by drivers of different devices. This module | |
32 | provides a generic interface to perform these operations: | |
33 | ||
34 | * Create and destroy a mediated device | |
35 | * Add a mediated device to and remove it from a mediated bus driver | |
36 | * Add a mediated device to and remove it from an IOMMU group | |
37 | ||
38 | The mediated core driver also provides an interface to register a bus driver. | |
39 | For example, the mediated VFIO mdev driver is designed for mediated devices and | |
40 | supports VFIO APIs. The mediated bus driver adds a mediated device to and | |
41 | removes it from a VFIO group. | |
42 | ||
43 | The following high-level block diagram shows the main components and interfaces | |
44 | in the VFIO mediated driver framework. The diagram shows NVIDIA, Intel, and IBM | |
45 | devices as examples, as these devices are the first devices to use this module. | |
46 | ||
47 | +---------------+ | |
48 | | | | |
49 | | +-----------+ | mdev_register_driver() +--------------+ | |
50 | | | | +<------------------------+ | | |
51 | | | mdev | | | | | |
52 | | | bus | +------------------------>+ vfio_mdev.ko |<-> VFIO user | |
53 | | | driver | | probe()/remove() | | APIs | |
54 | | | | | +--------------+ | |
55 | | +-----------+ | | |
56 | | | | |
57 | | MDEV CORE | | |
58 | | MODULE | | |
59 | | mdev.ko | | |
60 | | +-----------+ | mdev_register_device() +--------------+ | |
61 | | | | +<------------------------+ | | |
62 | | | | | | nvidia.ko |<-> physical | |
63 | | | | +------------------------>+ | device | |
64 | | | | | callbacks +--------------+ | |
65 | | | Physical | | | |
66 | | | device | | mdev_register_device() +--------------+ | |
67 | | | interface | |<------------------------+ | | |
68 | | | | | | i915.ko |<-> physical | |
69 | | | | +------------------------>+ | device | |
70 | | | | | callbacks +--------------+ | |
71 | | | | | | |
72 | | | | | mdev_register_device() +--------------+ | |
73 | | | | +<------------------------+ | | |
74 | | | | | | ccw_device.ko|<-> physical | |
75 | | | | +------------------------>+ | device | |
76 | | | | | callbacks +--------------+ | |
77 | | +-----------+ | | |
78 | +---------------+ | |
79 | ||
80 | ||
81 | Registration Interfaces | |
82 | ======================= | |
83 | ||
84 | The mediated core driver provides the following types of registration | |
85 | interfaces: | |
86 | ||
87 | * Registration interface for a mediated bus driver | |
88 | * Physical device driver interface | |
89 | ||
90 | Registration Interface for a Mediated Bus Driver | |
91 | ------------------------------------------------ | |
92 | ||
93 | The registration interface for a mediated bus driver provides the following | |
94 | structure to represent a mediated device's driver: | |
95 | ||
96 | /* | |
97 | * struct mdev_driver [2] - Mediated device's driver | |
98 | * @name: driver name | |
99 | * @probe: called when new device created | |
100 | * @remove: called when device removed | |
101 | * @driver: device driver structure | |
102 | */ | |
103 | struct mdev_driver { | |
104 | const char *name; | |
105 | int (*probe) (struct device *dev); | |
106 | void (*remove) (struct device *dev); | |
107 | struct device_driver driver; | |
108 | }; | |
109 | ||
110 | A mediated bus driver for mdev should use this structure in the function calls | |
111 | to register and unregister itself with the core driver: | |
112 | ||
113 | * Register: | |
114 | ||
115 | extern int mdev_register_driver(struct mdev_driver *drv, | |
116 | struct module *owner); | |
117 | ||
118 | * Unregister: | |
119 | ||
120 | extern void mdev_unregister_driver(struct mdev_driver *drv); | |
121 | ||
122 | The mediated bus driver is responsible for adding mediated devices to the VFIO | |
123 | group when devices are bound to the driver and removing mediated devices from | |
124 | the VFIO group when devices are unbound from the driver. | |
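
As a rough illustration of the registration flow described above, the following
sketch fills in a struct mdev_driver and registers it. It is not kernel code:
struct module, struct device, and struct device_driver are reduced to stand-ins,
and mdev_register_driver()/mdev_unregister_driver() are replaced by simple stubs
so that the sketch builds in user space. The names example_driver,
example_module, example_probe, example_remove, example_init, and example_exit
are hypothetical.

```c
#include <stddef.h>

/* Stand-ins for kernel types so that the sketch builds in user space. */
struct module { const char *name; };
struct device { const char *name; };
struct device_driver { const char *name; struct module *owner; };

/* struct mdev_driver as quoted above. */
struct mdev_driver {
	const char *name;
	int  (*probe)(struct device *dev);
	void (*remove)(struct device *dev);
	struct device_driver driver;
};

/* Stubs for the core driver entry points; the real ones live in mdev.ko. */
static struct mdev_driver *registered_drv;

int mdev_register_driver(struct mdev_driver *drv, struct module *owner)
{
	if (!drv || !drv->name || registered_drv)
		return -1;
	drv->driver.name = drv->name;
	drv->driver.owner = owner;
	registered_drv = drv;
	return 0;
}

void mdev_unregister_driver(struct mdev_driver *drv)
{
	if (registered_drv == drv)
		registered_drv = NULL;
}

/* Hypothetical bus driver: counts the devices currently bound to it. */
static int bound_devices;

static int example_probe(struct device *dev)
{
	(void)dev;
	bound_devices++;	/* a real driver would add dev to a VFIO group here */
	return 0;
}

static void example_remove(struct device *dev)
{
	(void)dev;
	bound_devices--;	/* ...and remove dev from the VFIO group here */
}

static struct mdev_driver example_driver = {
	.name   = "example_mdev",
	.probe  = example_probe,
	.remove = example_remove,
};

/* Stands in for THIS_MODULE. */
static struct module example_module = { "example_mdev" };

int example_init(void)
{
	return mdev_register_driver(&example_driver, &example_module);
}

void example_exit(void)
{
	mdev_unregister_driver(&example_driver);
}
```

In a real module, example_init() and example_exit() would presumably be wired
up as the module's init and exit functions, and the core driver, not the caller,
would invoke probe() and remove() as devices are bound and unbound.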
125 | ||
126 | ||
127 | Physical Device Driver Interface | |
128 | -------------------------------- | |
129 | ||
42930553 AW |
130 | The physical device driver interface provides the mdev_parent_ops[3] structure |
131 | to define the APIs to manage work in the mediated core driver that is related | |
132 | to the physical device. | |
8e1c5a40 | 133 | |
42930553 | 134 | The structures in the mdev_parent_ops structure are as follows: |
8e1c5a40 KW |
135 | |
136 | * dev_attr_groups: attributes of the parent device | |
137 | * mdev_attr_groups: attributes of the mediated device | |
138 | * supported_config: attributes to define supported configurations | |
139 | ||
42930553 | 140 | The functions in the mdev_parent_ops structure are as follows: |
8e1c5a40 KW |
141 | |
142 | * create: allocate basic resources in a driver for a mediated device | |
143 | * remove: free resources in a driver when a mediated device is destroyed | |
144 | ||
42930553 | 145 | The callbacks in the mdev_parent_ops structure are as follows: |
8e1c5a40 KW |
146 | |
147 | * open: open callback of mediated device | |
148 | * close: close callback of mediated device | |
149 | * ioctl: ioctl callback of mediated device | |
150 | * read: read emulation callback | |
151 | * write: write emulation callback | |
152 | * mmap: mmap emulation callback | |
153 | ||
42930553 AW |
154 | A driver should use the mdev_parent_ops structure in the function call to |
155 | register itself with the mdev core driver: | |
8e1c5a40 KW |
156 | |
157 | extern int mdev_register_device(struct device *dev, | |
42930553 | 158 | const struct mdev_parent_ops *ops); |
8e1c5a40 | 159 | |
42930553 AW |
160 | However, the mdev_parent_ops structure is not required in the function call |
161 | that a driver should use to unregister itself with the mdev core driver: | |
8e1c5a40 KW |
162 | |
163 | extern void mdev_unregister_device(struct device *dev); | |
164 | ||
165 | ||
166 | Mediated Device Management Interface Through sysfs | |
167 | ================================================== | |
168 | ||
169 | The management interface through sysfs enables user space software, such as | |
170 | libvirt, to query and configure mediated devices in a hardware-agnostic fashion. | |
171 | This management interface provides flexibility to the underlying physical | |
172 | device's driver to support features such as: | |
173 | ||
174 | * Mediated device hot plug | |
175 | * Multiple mediated devices in a single virtual machine | |
176 | * Multiple mediated devices from different physical devices | |
177 | ||
178 | Links in the mdev_bus Class Directory | |
179 | ------------------------------------- | |
180 | The /sys/class/mdev_bus/ directory contains links to devices that are registered | |
181 | with the mdev core driver. | |
182 | ||
183 | Directories and Files Under the sysfs for Each Physical Device | |
184 | -------------------------------------------------------------- | |
185 | ||
186 | |- [parent physical device] | |
187 | |--- Vendor-specific-attributes [optional] | |
188 | |--- [mdev_supported_types] | |
189 | | |--- [<type-id>] | |
190 | | | |--- create | |
191 | | | |--- name | |
192 | | | |--- available_instances | |
193 | | | |--- device_api | |
194 | | | |--- description | |
195 | | | |--- [devices] | |
196 | | |--- [<type-id>] | |
197 | | | |--- create | |
198 | | | |--- name | |
199 | | | |--- available_instances | |
200 | | | |--- device_api | |
201 | | | |--- description | |
202 | | | |--- [devices] | |
203 | | |--- [<type-id>] | |
204 | | |--- create | |
205 | | |--- name | |
206 | | |--- available_instances | |
207 | | |--- device_api | |
208 | | |--- description | |
209 | | |--- [devices] | |
210 | ||
211 | * [mdev_supported_types] | |
212 | ||
213 | The list of currently supported mediated device types and their details. | |
214 | ||
215 | [<type-id>], device_api, and available_instances are mandatory attributes | |
216 | that should be provided by the vendor driver. | |
217 | ||
218 | * [<type-id>] | |
219 | ||
220 | The [<type-id>] name is created by adding the device driver string as a | |
221 | prefix to the string provided by the vendor driver. The format of this name | |
222 | is as follows: | |
223 | ||
224 | sprintf(buf, "%s-%s", dev_driver_string(parent->dev), group->name); | |
225 | ||
226 | * device_api | |
227 | ||
228 | This attribute should show which device API is being created, for example, | |
229 | "vfio-pci" for a PCI device. | |
230 | ||
231 | * available_instances | |
232 | ||
233 | This attribute should show the number of devices of type <type-id> that can be | |
234 | created. | |
235 | ||
236 | * [devices] | |
237 | ||
238 | This directory contains links to the devices of type <type-id> that have been | |
239 | created. | |
240 | ||
241 | * name | |
242 | ||
243 | This attribute should show a human-readable name. This attribute is optional. | |
244 | ||
245 | * description | |
246 | ||
247 | This attribute should show a brief description of the type and its | |
248 | features. This attribute is optional. | |
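
The [<type-id>] naming rule described above can be sketched in user space as
follows. The helper name mdev_type_id() is hypothetical, and snprintf() is used
in place of sprintf() so that an undersized buffer is reported instead of
overrun.

```c
#include <stdio.h>

/*
 * Build "<driver>-<group>" into buf, mirroring the "%s-%s" format quoted
 * above. Returns 0 on success, or -1 if the result does not fit in buf.
 */
int mdev_type_id(char *buf, size_t len, const char *driver, const char *group)
{
	int n = snprintf(buf, len, "%s-%s", driver, group);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, assuming a driver string of "mtty" and a vendor-provided group name
of "2", mdev_type_id(buf, sizeof(buf), "mtty", "2") stores "mtty-2" in buf.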
249 | ||
250 | Directories and Files Under the sysfs for Each mdev Device | |
251 | ---------------------------------------------------------- | |
252 | ||
253 | |- [parent physical device] | |
254 | |--- [$MDEV_UUID] | |
255 | |--- remove | |
256 | |--- mdev_type {link to its type} | |
257 | |--- vendor-specific-attributes [optional] | |
258 | ||
259 | * remove (write only) | |
260 | Writing '1' to the 'remove' file destroys the mdev device. The vendor driver can | |
261 | fail the remove() callback if that device is active and the vendor driver | |
262 | doesn't support hot unplug. | |
263 | ||
264 | Example: | |
265 | # echo 1 > /sys/bus/mdev/devices/$MDEV_UUID/remove | |
266 | ||
267 | Mediated Device Hot Plug | |
268 | ------------------------ | |
269 | ||
270 | Mediated devices can be created and assigned at runtime. The procedure to hot | |
271 | plug a mediated device is the same as the procedure to hot plug a PCI device. | |
272 | ||
273 | Translation APIs for Mediated Devices | |
274 | ===================================== | |
275 | ||
276 | The following APIs are provided for translating user pfn to host pfn in a VFIO | |
277 | driver: | |
278 | ||
279 | extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn, | |
280 | int npage, int prot, unsigned long *phys_pfn); | |
281 | ||
282 | extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn, | |
283 | int npage); | |
284 | ||
285 | These functions call back into the back-end IOMMU module by using the pin_pages | |
286 | and unpin_pages callbacks of the struct vfio_iommu_driver_ops[4]. Currently, | |
287 | these callbacks are supported in the TYPE1 IOMMU module. To enable them for | |
288 | other IOMMU back-end modules, such as the PPC64 sPAPR module, those modules | |
289 | must provide these two callback functions. | |
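
The calling pattern for the two translation APIs can be sketched as follows.
This is a user-space illustration only: struct device and the IOMMU_READ and
IOMMU_WRITE flags are stand-ins, and vfio_pin_pages()/vfio_unpin_pages() are
stubs that share the prototypes quoted above but fake the translation (host
pfn = user pfn + 0x1000). The sketch also assumes that vfio_pin_pages() returns
the number of pages pinned on success and a negative value on failure. The
helpers example_pin() and example_unpin() are hypothetical.

```c
struct device { const char *name; };	/* stand-in for the kernel type */

/* Stand-ins for the kernel's IOMMU protection flags. */
#define IOMMU_READ	(1 << 0)
#define IOMMU_WRITE	(1 << 1)

static int pinned_pages;	/* bookkeeping for the stubs only */

/* Stub: the real function calls into the back-end IOMMU module. */
int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
		   int npage, int prot, unsigned long *phys_pfn)
{
	int i;

	(void)dev;
	(void)prot;
	for (i = 0; i < npage; i++)
		phys_pfn[i] = user_pfn[i] + 0x1000;	/* fake translation */
	pinned_pages += npage;
	return npage;
}

/* Stub: the real function releases the pages pinned earlier. */
int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn, int npage)
{
	(void)dev;
	(void)user_pfn;
	pinned_pages -= npage;
	return npage;
}

/* Pin npage user pfns for read/write access; 0 on success, -1 on failure. */
int example_pin(struct device *dev, unsigned long *user_pfn, int npage,
		unsigned long *host_pfn)
{
	int ret;

	if (npage <= 0)
		return -1;

	ret = vfio_pin_pages(dev, user_pfn, npage,
			     IOMMU_READ | IOMMU_WRITE, host_pfn);
	if (ret == npage)
		return 0;

	/* Partial success: release what was pinned before reporting failure. */
	if (ret > 0)
		vfio_unpin_pages(dev, user_pfn, ret);
	return -1;
}

/* Release pages that were pinned with example_pin(). */
void example_unpin(struct device *dev, unsigned long *user_pfn, int npage)
{
	vfio_unpin_pages(dev, user_pfn, npage);
}
```

Every successful pin is paired with an unpin of the same user pfns, so that
pages do not stay pinned after the vendor driver has finished using them.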
290 | ||
9d1a546c KW |
291 | Using the Sample Code |
292 | ===================== | |
293 | ||
294 | mtty.c in the samples/vfio-mdev/ directory is a sample driver program that | |
295 | demonstrates how to use the mediated device framework. | |
296 | ||
297 | The sample driver creates an mdev device that simulates a serial port over a PCI | |
298 | card. | |
299 | ||
300 | 1. Build and load the mtty.ko module. | |
301 | ||
302 | This step creates a dummy device, /sys/devices/virtual/mtty/mtty/ | |
303 | ||
304 | Files in this device directory in sysfs are similar to the following: | |
305 | ||
306 | # tree /sys/devices/virtual/mtty/mtty/ | |
307 | /sys/devices/virtual/mtty/mtty/ | |
308 | |-- mdev_supported_types | |
309 | | |-- mtty-1 | |
310 | | | |-- available_instances | |
311 | | | |-- create | |
312 | | | |-- device_api | |
313 | | | |-- devices | |
314 | | | `-- name | |
315 | | `-- mtty-2 | |
316 | | |-- available_instances | |
317 | | |-- create | |
318 | | |-- device_api | |
319 | | |-- devices | |
320 | | `-- name | |
321 | |-- mtty_dev | |
322 | | `-- sample_mtty_dev | |
323 | |-- power | |
324 | | |-- autosuspend_delay_ms | |
325 | | |-- control | |
326 | | |-- runtime_active_time | |
327 | | |-- runtime_status | |
328 | | `-- runtime_suspended_time | |
329 | |-- subsystem -> ../../../../class/mtty | |
330 | `-- uevent | |
331 | ||
332 | 2. Create a mediated device by using the dummy device that you created in the | |
333 | previous step. | |
334 | ||
335 | # echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" > \ | |
336 | /sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2/create | |
337 | ||
338 | 3. Add parameters to qemu-kvm. | |
339 | ||
340 | -device vfio-pci,\ | |
341 | sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 | |
342 | ||
343 | 4. Boot the VM. | |
344 | ||
345 | In the Linux guest VM, with no hardware on the host, the device appears | |
346 | as follows: | |
347 | ||
348 | # lspci -s 00:05.0 -xxvv | |
349 | 00:05.0 Serial controller: Device 4348:3253 (rev 10) (prog-if 02 [16550]) | |
350 | Subsystem: Device 4348:3253 | |
351 | Physical Slot: 5 | |
352 | Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- | |
353 | Stepping- SERR- FastB2B- DisINTx- | |
354 | Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- | |
355 | <TAbort- <MAbort- >SERR- <PERR- INTx- | |
356 | Interrupt: pin A routed to IRQ 10 | |
357 | Region 0: I/O ports at c150 [size=8] | |
358 | Region 1: I/O ports at c158 [size=8] | |
359 | Kernel driver in use: serial | |
360 | 00: 48 43 53 32 01 00 00 02 10 02 00 07 00 00 00 00 | |
361 | 10: 51 c1 00 00 59 c1 00 00 00 00 00 00 00 00 00 00 | |
362 | 20: 00 00 00 00 00 00 00 00 00 00 00 00 48 43 53 32 | |
363 | 30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 00 | |
364 | ||
365 | In the Linux guest VM, dmesg output for the device is as follows: | |
366 | ||
367 | serial 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ | |
368 | 10 | |
369 | 0000:00:05.0: ttyS1 at I/O 0xc150 (irq = 10) is a 16550A | |
370 | 0000:00:05.0: ttyS2 at I/O 0xc158 (irq = 10) is a 16550A | |
371 | ||
372 | ||
373 | 5. In the Linux guest VM, check the serial ports. | |
374 | ||
375 | # setserial -g /dev/ttyS* | |
376 | /dev/ttyS0, UART: 16550A, Port: 0x03f8, IRQ: 4 | |
377 | /dev/ttyS1, UART: 16550A, Port: 0xc150, IRQ: 10 | |
378 | /dev/ttyS2, UART: 16550A, Port: 0xc158, IRQ: 10 | |
379 | ||
380 | 6. Using minicom or any terminal emulation program, open port /dev/ttyS1 or | |
381 | /dev/ttyS2 with hardware flow control disabled. | |
382 | ||
383 | 7. Type data on the minicom terminal or send data to the terminal emulation | |
384 | program and read the data. | |
385 | ||
386 | Data is looped back from the host's mtty driver. | |
387 | ||
388 | 8. Destroy the mediated device that you created. | |
389 | ||
390 | # echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001/remove | |
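
A management program could perform the create and remove steps above without a
shell. The following is a minimal user-space sketch, assuming the sysfs layout
shown in this document; the helper names sysfs_write(), mdev_create(), and
mdev_remove() are hypothetical. Each value is written with a trailing newline,
as echo does.

```c
#include <stdio.h>

/* Write value plus a newline to a sysfs attribute; 0 on success, -1 on error. */
int sysfs_write(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%s\n", value) > 0;
	/* Buffered data is flushed on close, so a write error may show up here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Write uuid to <parent>/mdev_supported_types/<type_id>/create. */
int mdev_create(const char *parent, const char *type_id, const char *uuid)
{
	char path[512];
	int n = snprintf(path, sizeof(path), "%s/mdev_supported_types/%s/create",
			 parent, type_id);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return sysfs_write(path, uuid);
}

/* Write "1" to <devices_dir>/<uuid>/remove. */
int mdev_remove(const char *devices_dir, const char *uuid)
{
	char path[512];
	int n = snprintf(path, sizeof(path), "%s/%s/remove", devices_dir, uuid);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return sysfs_write(path, "1");
}
```

With the mtty sample loaded, steps 2 and 8 would correspond to
mdev_create("/sys/devices/virtual/mtty/mtty", "mtty-2", uuid) and
mdev_remove("/sys/bus/mdev/devices", uuid), where uuid is the string
"83b8f4f2-509f-382f-3c1e-e6bfe0fa1001".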
391 | ||
8e1c5a40 | 392 | References |
9d1a546c | 393 | ========== |
8e1c5a40 KW |
394 | |
395 | [1] See Documentation/vfio.txt for more information on VFIO. | |
396 | [2] struct mdev_driver in include/linux/mdev.h | |
42930553 | 397 | [3] struct mdev_parent_ops in include/linux/mdev.h |
8e1c5a40 | 398 | [4] struct vfio_iommu_driver_ops in include/linux/vfio.h |