====================================
Coherent Accelerator Interface (CXL)
====================================

Introduction
============

    The coherent accelerator interface is designed to allow the
    coherent connection of accelerators (FPGAs and other devices) to a
    POWER system. These devices need to adhere to the Coherent
    Accelerator Interface Architecture (CAIA).

    IBM refers to this as the Coherent Accelerator Processor Interface
    or CAPI. In the kernel it's referred to by the name CXL to avoid
    confusion with the ISDN CAPI subsystem.

    Coherent in this context means that the accelerator and CPUs can
    both access system memory directly and with the same effective
    addresses.


Hardware overview
=================

    ::

         POWER8/9             FPGA
       +----------+        +---------+
       |          |        |         |
       |   CPU    |        |   AFU   |
       |          |        |         |
       |          |        |         |
       |          |        |         |
       +----------+        +---------+
       |   PHB    |        |         |
       |   +------+        |   PSL   |
       |   | CAPP |<------>|         |
       +---+------+  PCIE  +---------+

    The POWER8/9 chip has a Coherently Attached Processor Proxy (CAPP)
    unit which is part of the PCIe Host Bridge (PHB). This is managed
    by Linux by calls into OPAL. Linux doesn't directly program the
    CAPP.

    The FPGA (or coherently attached device) consists of two parts:
    the POWER Service Layer (PSL) and the Accelerator Function Unit
    (AFU). The AFU is used to implement specific functionality behind
    the PSL. The PSL, among other things, provides memory address
    translation services to allow each AFU direct access to userspace
    memory.

    The AFU is the core part of the accelerator (e.g. the compression
    or crypto function). The kernel has no knowledge of the function
    of the AFU. Only userspace interacts directly with the AFU.

    The PSL provides the translation and interrupt services that the
    AFU needs. This is what the kernel interacts with. For example, if
    the AFU needs to read a particular effective address, it sends
    that address to the PSL; the PSL then translates it, fetches the
    data from memory and returns it to the AFU. If the PSL has a
    translation miss, it interrupts the kernel and the kernel services
    the fault. The context in which this fault is serviced depends on
    who owns that acceleration function.

    - POWER8 and PSL Version 8 are compliant with CAIA Version 1.0.
    - POWER9 and PSL Version 9 are compliant with CAIA Version 2.0.

    PSL Version 9 provides new features, including:

    * Interaction with the nest MMU on the POWER9 chip.
    * Native DMA support.
    * Sending ASB_Notify messages for host thread wakeup.
    * Atomic operations.

    Cards with a PSL9 won't work on a POWER8 system and cards with a
    PSL8 won't work on a POWER9 system.

AFU Modes
=========

    There are two programming modes supported by the AFU: dedicated
    and AFU directed. An AFU may support one or both modes.

    When using dedicated mode only one MMU context is supported. In
    this mode, only one userspace process can use the accelerator at
    a time.

    When using AFU directed mode, up to 16K simultaneous contexts can
    be supported. This means up to 16K simultaneous userspace
    applications may use the accelerator (although specific AFUs may
    support fewer). In this mode, the AFU sends a 16-bit context ID
    with each of its requests. This tells the PSL which context is
    associated with each operation. If the PSL can't translate an
    operation, the ID can also be accessed by the kernel so it can
    determine the userspace context associated with an operation.


MMIO space
==========

    A portion of the accelerator MMIO space can be directly mapped
    from the AFU to userspace. Either the whole space can be mapped or
    just a per context portion. The hardware is self describing, hence
    the kernel can determine the offset and size of the per context
    portion.


Interrupts
==========

    AFUs may generate interrupts that are destined for userspace. These
    are received by the kernel as hardware interrupts and passed on to
    userspace via the read syscall documented below.

    Data storage faults and error interrupts are handled by the kernel
    driver.


Work Element Descriptor (WED)
=============================

    The WED is a 64-bit parameter passed to the AFU when a context is
    started. Its format is up to the AFU, hence the kernel has no
    knowledge of what it represents. Typically it will be the
    effective address of a work queue or status block where the AFU
    and userspace can share control and status information.


User API
========

1. AFU character devices

    For AFUs operating in AFU directed mode, two character device
    files will be created. /dev/cxl/afu0.0m will correspond to a
    master context and /dev/cxl/afu0.0s will correspond to a slave
    context. Master contexts have access to the full MMIO space an
    AFU provides. Slave contexts have access to only the per process
    MMIO space an AFU provides.

    For AFUs operating in dedicated process mode, the driver will
    only create a single character device per AFU called
    /dev/cxl/afu0.0d. This will have access to the entire MMIO space
    that the AFU provides (like master contexts in AFU directed).

    The types described below are defined in include/uapi/misc/cxl.h.

    The following file operations are supported on both slave and
    master devices.

    A userspace library libcxl is available here:

        https://github.com/ibm-capi/libcxl

    This provides a C interface to this kernel API.

open
----

    Opens the device and allocates a file descriptor to be used with
    the rest of the API.

    A dedicated mode AFU only has one context and only allows the
    device to be opened once.

    An AFU directed mode AFU can have many contexts; the device can be
    opened once for each context that is available.

    When all available contexts are allocated the open call will fail
    and return -ENOSPC.

    Note:
          IRQs need to be allocated for each context, which may limit
          the number of contexts that can be created, and therefore
          how many times the device can be opened. The POWER8 CAPP
          supports 2040 IRQs and 3 are used by the kernel, so 2037 are
          left. If 1 IRQ is needed per context, then only 2037
          contexts can be allocated. If 4 IRQs are needed per context,
          then only 2037/4 = 509 contexts can be allocated.


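    The arithmetic in the note above can be sketched as a small helper.
    The function name and parameters are illustrative only and are not
    part of the kernel API; the figures 2040 and 3 are the POWER8 CAPP
    values quoted in the note.

```c
#include <assert.h>

/*
 * Upper bound on the number of contexts imposed by the IRQ budget alone.
 * Hypothetical helper: total_irqs and kernel_irqs are the values from the
 * note (2040 and 3 on the POWER8 CAPP), irqs_per_context is whatever the
 * AFU requires for each context. Integer division rounds down.
 */
static unsigned int cxl_max_contexts_by_irq(unsigned int total_irqs,
                                            unsigned int kernel_irqs,
                                            unsigned int irqs_per_context)
{
        assert(irqs_per_context > 0);
        assert(kernel_irqs <= total_irqs);
        return (total_irqs - kernel_irqs) / irqs_per_context;
}
```

    Other limits, such as the number of contexts the AFU itself
    supports, may be lower than this bound.
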
ioctl
-----

    CXL_IOCTL_START_WORK:
        Starts the AFU context and associates it with the current
        process. Once this ioctl is successfully executed, all memory
        mapped into this process is accessible to this AFU context
        using the same effective addresses. No additional calls are
        required to map/unmap memory. The AFU memory context will be
        updated as userspace allocates and frees memory. This ioctl
        returns once the AFU context is started.

        Takes a pointer to a struct cxl_ioctl_start_work

        ::

                struct cxl_ioctl_start_work {
                        __u64 flags;
                        __u64 work_element_descriptor;
                        __u64 amr;
                        __s16 num_interrupts;
                        __s16 reserved1;
                        __s32 reserved2;
                        __u64 reserved3;
                        __u64 reserved4;
                        __u64 reserved5;
                        __u64 reserved6;
                };

            flags:
                Indicates which optional fields in the structure are
                valid.

            work_element_descriptor:
                The Work Element Descriptor (WED) is a 64-bit argument
                defined by the AFU. Typically this is an effective
                address pointing to an AFU specific structure
                describing what work to perform.

            amr:
                Authority Mask Register (AMR), same as the powerpc
                AMR. This field is only used by the kernel when the
                corresponding CXL_START_WORK_AMR value is specified in
                flags. If not specified the kernel will use a default
                value of 0.

            num_interrupts:
                Number of userspace interrupts to request. This field
                is only used by the kernel when the corresponding
                CXL_START_WORK_NUM_IRQS value is specified in flags.
                If not specified the minimum number required by the
                AFU will be allocated. The min and max number can be
                obtained from sysfs.

            reserved fields:
                For ABI padding and future extensions.

    CXL_IOCTL_GET_PROCESS_ELEMENT:
        Get the current context id, also known as the process element.
        The value is returned from the kernel as a __u32.


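    A minimal sketch of preparing the START_WORK argument follows. It
    uses a local mirror of the structure so that it compiles without
    the kernel header; the SKETCH_* flag values and the helper name are
    placeholders invented for illustration. Real code should include
    the uapi header named above and use struct cxl_ioctl_start_work and
    the CXL_START_WORK_* constants defined there.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the documented layout (assumption: fixed-width stdint
 * types stand in for the kernel's __u64/__s16/__s32). */
struct start_work_sketch {
        uint64_t flags;
        uint64_t work_element_descriptor;
        uint64_t amr;
        int16_t  num_interrupts;
        int16_t  reserved1;
        int32_t  reserved2;
        uint64_t reserved3;
        uint64_t reserved4;
        uint64_t reserved5;
        uint64_t reserved6;
};

/* Placeholder bit values for illustration only; take the real
 * CXL_START_WORK_AMR / CXL_START_WORK_NUM_IRQS from the header. */
#define SKETCH_START_WORK_AMR      0x1ULL
#define SKETCH_START_WORK_NUM_IRQS 0x2ULL

/*
 * Zero the structure (so reserved fields are 0), set the WED, and request
 * an explicit interrupt count only when num_irqs is non-negative. With the
 * flag left clear the kernel uses the AFU's minimum, as described above.
 */
static void start_work_prepare(struct start_work_sketch *w,
                               uint64_t wed, int16_t num_irqs)
{
        assert(w != NULL);
        memset(w, 0, sizeof(*w));
        w->work_element_descriptor = wed;
        if (num_irqs >= 0) {
                w->flags |= SKETCH_START_WORK_NUM_IRQS;
                w->num_interrupts = num_irqs;
        }
}
```

    With the real structure, the prepared value would then be passed
    as the argument of the CXL_IOCTL_START_WORK ioctl on the opened
    AFU file descriptor.
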
mmap
----

    An AFU may have an MMIO space to facilitate communication with the
    AFU. If it does, the MMIO space can be accessed via mmap. The size
    and contents of this area are specific to the particular AFU. The
    size can be discovered via sysfs.

    In AFU directed mode, master contexts are allowed to map all of
    the MMIO space and slave contexts are allowed to only map the per
    process MMIO space associated with the context. In dedicated
    process mode the entire MMIO space can always be mapped.

    This mmap call must be done after the START_WORK ioctl.

    Care should be taken when accessing MMIO space. Only 32-bit and
    64-bit accesses are supported by POWER8. Also, the AFU will be
    designed with a specific endianness, so all MMIO accesses should
    consider endianness (the endian(3) variants such as le64toh() and
    be64toh() are recommended). These endian issues equally apply to
    shared memory queues the WED may describe.


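    As an illustration of the endianness advice above, the sketch below
    wraps 64-bit accesses for an AFU that is assumed to be little
    endian. The helper names are hypothetical, and any memory barriers
    a particular AFU may require are deliberately left out.

```c
#define _DEFAULT_SOURCE   /* exposes le64toh()/htole64() in <endian.h> on glibc */
#include <assert.h>
#include <endian.h>
#include <stdint.h>

/* Read one naturally aligned 64-bit register of a little-endian AFU and
 * return it in host byte order. */
static uint64_t mmio_read_le64(const volatile uint64_t *reg)
{
        assert(((uintptr_t)reg & 7) == 0);
        return le64toh(*reg);
}

/* Write a host-order value to a 64-bit register of a little-endian AFU. */
static void mmio_write_le64(volatile uint64_t *reg, uint64_t value)
{
        assert(((uintptr_t)reg & 7) == 0);
        *reg = htole64(value);
}
```

    For a big-endian AFU the same pattern applies with be64toh() and
    htobe64(). In real use, reg would point into the region returned
    by mmap on the AFU file descriptor.
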
read
----

    Reads events from the AFU. Blocks if no events are pending
    (unless O_NONBLOCK is supplied). Returns -EIO in the case of an
    unrecoverable error or if the card is removed.

    read() will always return an integral number of events.

    The buffer passed to read() must be at least 4K bytes.

    The result of the read will be a buffer of one or more events.
    Each event is of type struct cxl_event and varies in size::

            struct cxl_event {
                    struct cxl_event_header header;
                    union {
                            struct cxl_event_afu_interrupt irq;
                            struct cxl_event_data_storage fault;
                            struct cxl_event_afu_error afu_error;
                    };
            };

    The struct cxl_event_header is defined as

    ::

            struct cxl_event_header {
                    __u16 type;
                    __u16 size;
                    __u16 process_element;
                    __u16 reserved1;
            };

        type:
            This defines the type of event. The type determines how
            the rest of the event is structured. These types are
            described below and defined by enum cxl_event_type.

        size:
            This is the size of the event in bytes including the
            struct cxl_event_header. The start of the next event can
            be found at this offset from the start of the current
            event.

        process_element:
            Context ID of the event.

        reserved field:
            For future extensions and padding.

    If the event type is CXL_EVENT_AFU_INTERRUPT then the event
    structure is defined as

    ::

            struct cxl_event_afu_interrupt {
                    __u16 flags;
                    __u16 irq; /* Raised AFU interrupt number */
                    __u32 reserved1;
            };

        flags:
            These flags indicate which optional fields are present
            in this struct. Currently all fields are mandatory.

        irq:
            The IRQ number sent by the AFU.

        reserved field:
            For future extensions and padding.

    If the event type is CXL_EVENT_DATA_STORAGE then the event
    structure is defined as

    ::

            struct cxl_event_data_storage {
                    __u16 flags;
                    __u16 reserved1;
                    __u32 reserved2;
                    __u64 addr;
                    __u64 dsisr;
                    __u64 reserved3;
            };

        flags:
            These flags indicate which optional fields are present in
            this struct. Currently all fields are mandatory.

        addr:
            The address that the AFU unsuccessfully attempted to
            access. Valid accesses will be handled transparently by the
            kernel but invalid accesses will generate this event.

        dsisr:
            This field gives information on the type of fault. It is a
            copy of the DSISR from the PSL hardware when the address
            fault occurred. The form of the DSISR is as defined in the
            CAIA.

        reserved fields:
            For future extensions.

    If the event type is CXL_EVENT_AFU_ERROR then the event structure
    is defined as

    ::

            struct cxl_event_afu_error {
                    __u16 flags;
                    __u16 reserved1;
                    __u32 reserved2;
                    __u64 error;
            };

        flags:
            These flags indicate which optional fields are present in
            this struct. Currently all fields are mandatory.

        error:
            Error status from the AFU. Defined by the AFU.

        reserved fields:
            For future extensions and padding.


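    Because events vary in size, a buffer returned by read() has to be
    walked using the size field of each header. The sketch below shows
    one way to do that. It uses a local mirror of struct
    cxl_event_header so that it compiles without the kernel header;
    the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the documented header layout (8 bytes). */
struct event_header_sketch {
        uint16_t type;
        uint16_t size;
        uint16_t process_element;
        uint16_t reserved1;
};

/*
 * Walk a buffer of len bytes as filled in by read() and count the events.
 * Each step advances by header.size, which includes the header itself.
 * Returns the number of events, or -1 if a header is truncated or its
 * size is smaller than the header or runs past the end of the buffer.
 */
static int count_events(const unsigned char *buf, size_t len)
{
        size_t off = 0;
        int n = 0;

        assert(buf != NULL || len == 0);
        while (off < len) {
                struct event_header_sketch hdr;

                if (len - off < sizeof(hdr))
                        return -1;
                /* memcpy avoids assuming the buffer is suitably aligned */
                memcpy(&hdr, buf + off, sizeof(hdr));
                if (hdr.size < sizeof(hdr) || hdr.size > len - off)
                        return -1;
                off += hdr.size;
                n++;
        }
        return n;
}
```

    A real consumer would switch on hdr.type inside the loop and
    interpret the bytes after the header as the matching event
    structure described above.
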
2. Card character device (PowerVM guest only)

    In a PowerVM guest, an extra character device is created for the
    card. The device is only used to write (flash) a new image on the
    FPGA accelerator. Once the image is written and verified, the
    device tree is updated and the card is reset to reload the updated
    image.

open
----

    Opens the device and allocates a file descriptor to be used with
    the rest of the API. The device can only be opened once.

ioctl
-----

CXL_IOCTL_DOWNLOAD_IMAGE / CXL_IOCTL_VALIDATE_IMAGE:
    Starts and controls flashing a new FPGA image. Partial
    reconfiguration is not supported (yet), so the image must contain
    a copy of the PSL and AFU(s). Since an image can be quite large,
    the caller may have to iterate, splitting the image into smaller
    chunks.

    Takes a pointer to a struct cxl_adapter_image::

        struct cxl_adapter_image {
            __u64 flags;
            __u64 data;
            __u64 len_data;
            __u64 len_image;
            __u64 reserved1;
            __u64 reserved2;
            __u64 reserved3;
            __u64 reserved4;
        };

    flags:
        These flags indicate which optional fields are present in
        this struct. Currently all fields are mandatory.

    data:
        Pointer to a buffer with part of the image to write to the
        card.

    len_data:
        Size of the buffer pointed to by data.

    len_image:
        Full size of the image.


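    The chunked iteration mentioned above can be sketched as follows.
    Only the splitting is shown: the text above does not say which
    ioctl is issued for each chunk or what chunk size the driver
    accepts, so the helper name, the chunk_max parameter and the local
    mirror of struct cxl_adapter_image are assumptions made for
    illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the documented layout (8 x 64-bit fields). */
struct adapter_image_sketch {
        uint64_t flags;
        uint64_t data;
        uint64_t len_data;
        uint64_t len_image;
        uint64_t reserved1;
        uint64_t reserved2;
        uint64_t reserved3;
        uint64_t reserved4;
};

/*
 * Describe the chunk of the image that starts at offset and is at most
 * chunk_max bytes long. data points at the chunk, len_data is the chunk
 * size and len_image is always the full image size. Returns the offset
 * of the next chunk; the caller is done when it equals image_len.
 */
static size_t image_next_chunk(struct adapter_image_sketch *ai,
                               const unsigned char *image, size_t image_len,
                               size_t offset, size_t chunk_max)
{
        size_t n;

        assert(ai != NULL && image != NULL);
        assert(chunk_max > 0 && offset <= image_len);
        n = image_len - offset;
        if (n > chunk_max)
                n = chunk_max;
        memset(ai, 0, sizeof(*ai));     /* flags and reserved fields are 0 */
        ai->data = (uint64_t)(uintptr_t)(image + offset);
        ai->len_data = n;
        ai->len_image = image_len;
        return offset + n;
}
```

    A caller would loop, passing each prepared structure to the
    appropriate ioctl, until the returned offset reaches the image
    length.
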
Sysfs Class
===========

    A cxl sysfs class is added under /sys/class/cxl to facilitate
    enumeration and tuning of the accelerators. Its layout is
    described in Documentation/ABI/testing/sysfs-class-cxl


Udev rules
==========

    The following udev rules could be used to create a symlink to the
    most logical chardev to use in any programming mode (afuX.Yd for
    dedicated, afuX.Ys for AFU directed), since the API is virtually
    identical for each::

        SUBSYSTEM=="cxl", ATTRS{mode}=="dedicated_process", SYMLINK="cxl/%b"
        SUBSYSTEM=="cxl", ATTRS{mode}=="afu_directed", \
                   KERNEL=="afu[0-9]*.[0-9]*s", SYMLINK="cxl/%b"