.. SPDX-License-Identifier: GPL-2.0

======================
Memory Protection Keys
======================

Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
which is found on Intel's Skylake "Scalable Processor" Server CPUs.
It will be available in future non-server parts.

For anyone wishing to test or use this feature, it is available in
Amazon's EC2 C5 instances and is known to work there using an Ubuntu
17.04 image.

Memory Protection Keys provides a mechanism for enforcing page-based
protections without requiring modification of the page tables when
an application changes protection domains. It works by dedicating 4
previously ignored bits in each page table entry to a "protection
key", giving 16 possible keys.
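
As a rough illustration of the key field, the sketch below treats a page table entry as a plain 64-bit integer. It assumes the x86-64 layout, in which the protection key occupies PTE bits 62:59; the helper names pte_pkey() and pte_set_pkey() are hypothetical and are not kernel interfaces (userspace never sees its own PTEs).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 layout: the protection key lives in PTE bits 62:59. */
#define PTE_PKEY_SHIFT 59
#define PTE_PKEY_MASK  (UINT64_C(0xf) << PTE_PKEY_SHIFT)

/* Extract the 4-bit protection key (0..15) from a raw PTE value. */
static unsigned int pte_pkey(uint64_t pte)
{
    return (unsigned int)((pte & PTE_PKEY_MASK) >> PTE_PKEY_SHIFT);
}

/* Return 'pte' retagged with 'pkey', leaving every other bit untouched. */
static uint64_t pte_set_pkey(uint64_t pte, unsigned int pkey)
{
    assert(pkey < 16); /* only four bits are available */
    return (pte & ~PTE_PKEY_MASK) | ((uint64_t)pkey << PTE_PKEY_SHIFT);
}
```

Four bits can encode keys 0 through 15, which is where the limit of 16 keys comes from.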

There is also a new user-accessible register (PKRU) with two separate
bits (Access Disable and Write Disable) for each key. Being a CPU
register, PKRU is inherently thread-local, potentially giving each
thread a different set of protections from every other thread.
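
The PKRU layout can be sketched the same way. Assuming the x86 layout, key k owns bit 2k (Access Disable) and bit 2k+1 (Write Disable), so changing the rights of one key is a read-modify-write on a 32-bit value. The macro and function names below are illustrative only, not a kernel or libc API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PKRU_BITS_PER_KEY 2
#define PKRU_AD_BIT 0x1u /* Access Disable: bit 2*key */
#define PKRU_WD_BIT 0x2u /* Write Disable: bit 2*key + 1 */

/* Return 'pkru' with the AD/WD bits of 'key' replaced by 'bits'. */
static uint32_t pkru_set_bits(uint32_t pkru, int key, uint32_t bits)
{
    assert(key >= 0 && key < 16);
    assert((bits & ~(PKRU_AD_BIT | PKRU_WD_BIT)) == 0);
    int shift = key * PKRU_BITS_PER_KEY;
    pkru &= ~((PKRU_AD_BIT | PKRU_WD_BIT) << shift); /* clear old rights */
    return pkru | (bits << shift);                   /* install new ones */
}

static bool pkru_access_disabled(uint32_t pkru, int key)
{
    return (pkru >> (key * PKRU_BITS_PER_KEY)) & PKRU_AD_BIT;
}

static bool pkru_write_disabled(uint32_t pkru, int key)
{
    return (pkru >> (key * PKRU_BITS_PER_KEY)) & PKRU_WD_BIT;
}
```

For instance, the value 0x55555554 has Access Disable set for keys 1 through 15 and leaves key 0 fully accessible.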

There are two new instructions (RDPKRU/WRPKRU) for reading and writing
the new register. The feature is only available in 64-bit mode, even
though there is theoretically space in the PAE PTEs. These permissions
are enforced on data access only and have no effect on instruction
fetches.
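
Since RDPKRU and WRPKRU are unprivileged instructions, a program can issue them directly. The following is a minimal sketch assuming x86-64 and a GCC-compatible compiler: it emits the raw opcode bytes (0f 01 ee and 0f 01 ef) so no special -mpku flag is needed, and it checks the CPUID OSPKE bit first, because both instructions raise an invalid-opcode exception when the OS has not enabled protection keys. The function names pku_os_enabled(), pkru_read() and pkru_write() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__)
#include <cpuid.h>
#endif

/* CPUID.(EAX=7,ECX=0):ECX bit 4 (OSPKE): PKU present and enabled by the OS. */
static bool pku_os_enabled(void)
{
#if defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx >> 4) & 1;
#else
    return false;
#endif
}

/* RDPKRU (0f 01 ee): ECX must be 0; PKRU is returned in EAX, EDX is zeroed. */
static uint32_t pkru_read(void)
{
#if defined(__x86_64__)
    uint32_t eax, edx;
    __asm__ volatile(".byte 0x0f, 0x01, 0xee" : "=a"(eax), "=d"(edx) : "c"(0));
    (void)edx;
    return eax;
#else
    return 0;
#endif
}

/* WRPKRU (0f 01 ef): new value in EAX; ECX and EDX must both be 0. */
static void pkru_write(uint32_t pkru)
{
#if defined(__x86_64__)
    __asm__ volatile(".byte 0x0f, 0x01, 0xef"
                     : : "a"(pkru), "c"(0), "d"(0) : "memory");
#else
    (void)pkru;
#endif
}
```

Only call pkru_read() and pkru_write() after pku_os_enabled() returns true; on other architectures the stubs simply report that the feature is absent.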

Syscalls
========

There are 3 system calls which directly interact with pkeys::

	int pkey_alloc(unsigned long flags, unsigned long init_access_rights);
	int pkey_free(int pkey);
	int pkey_mprotect(unsigned long start, size_t len,
			  unsigned long prot, int pkey);

Before a pkey can be used, it must first be allocated with
pkey_alloc(). An application calls the WRPKRU instruction
directly in order to change access permissions to memory covered
by a key. In this example WRPKRU is wrapped by a C function
called pkey_set().
::

	int pkey, ret;
	char *ptr;
	int real_prot = PROT_READ|PROT_WRITE;

	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
	ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
	ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
	/* ... application runs here ... */

Now, if the application needs to update the data at 'ptr', it can
gain access, do the update, then remove its write access::

	pkey_set(pkey, 0);                  // clear PKEY_DISABLE_WRITE
	*ptr = foo;                         // assign something
	pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again

When the application frees the memory, it can also free the pkey,
since it is no longer in use::

	munmap(ptr, PAGE_SIZE);
	pkey_free(pkey);
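
Putting the three snippets above together gives roughly the following program. It is a sketch assuming glibc 2.27 or later, whose <sys/mman.h> also declares pkey_set() and pkey_get(), so pkey_set() here is glibc's wrapper rather than the selftest helper. It uses sysconf(_SC_PAGESIZE) because PAGE_SIZE is not generally available to userspace; the function name pkey_lifecycle_demo() and its return codes are hypothetical.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Allocate a write-disabled key, tag a page with it, briefly enable writes
 * to update the page, then release everything. Returns 0 on success, 1 if
 * no key could be allocated, -1 on an unexpected failure.
 */
static int pkey_lifecycle_demo(void)
{
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    int real_prot = PROT_READ | PROT_WRITE;

    int pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
    if (pkey < 0)
        return 1; /* no pkeys on this CPU/kernel, or none left */

    char *ptr = mmap(NULL, page_size, PROT_NONE,
                     MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (ptr == MAP_FAILED) {
        pkey_free(pkey);
        return -1;
    }
    if (pkey_mprotect(ptr, page_size, real_prot, pkey) != 0)
        goto fail;

    /* Reads are allowed; a write would fault until WD is cleared. */
    if (pkey_set(pkey, 0) != 0)                  /* clear PKEY_DISABLE_WRITE */
        goto fail;
    strcpy(ptr, "hello");                        /* the update itself */
    if (pkey_set(pkey, PKEY_DISABLE_WRITE) != 0) /* set it again */
        goto fail;
    if (pkey_get(pkey) != PKEY_DISABLE_WRITE || strcmp(ptr, "hello") != 0)
        goto fail;

    munmap(ptr, page_size); /* unmap before freeing the key */
    return pkey_free(pkey) == 0 ? 0 : -1;

fail:
    munmap(ptr, page_size);
    pkey_free(pkey);
    return -1;
}
```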

.. note:: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
          An example implementation can be found in
          tools/testing/selftests/x86/protection_keys.c.
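
A pkey_set()-style wrapper along the lines of the note above might look like the following sketch. It assumes x86-64 with GCC-compatible inline assembly and takes the PKEY_DISABLE_* constants from glibc's <sys/mman.h> (0x1 and 0x2, which line up with the Access Disable and Write Disable bits). It is named example_pkey_set() to avoid clashing with glibc's own pkey_set(), and it is not the selftest's implementation.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h> /* PKEY_DISABLE_ACCESS, PKEY_DISABLE_WRITE */

#if defined(__x86_64__)
static inline uint32_t pkru_read(void)
{
    uint32_t eax, edx;
    /* RDPKRU: ECX must be 0; PKRU comes back in EAX, EDX is zeroed. */
    __asm__ volatile(".byte 0x0f, 0x01, 0xee" : "=a"(eax), "=d"(edx) : "c"(0));
    (void)edx;
    return eax;
}

static inline void pkru_write(uint32_t pkru)
{
    /* WRPKRU: value in EAX; ECX and EDX must be 0. */
    __asm__ volatile(".byte 0x0f, 0x01, 0xef"
                     : : "a"(pkru), "c"(0), "d"(0) : "memory");
}
#endif

/*
 * Hypothetical wrapper: replace the AD/WD bits for 'pkey' in this thread's
 * PKRU with 'rights'. Only use it with a key obtained from pkey_alloc(),
 * which also implies that the CPU and kernel support PKRU.
 */
static int example_pkey_set(int pkey, unsigned int rights)
{
#if defined(__x86_64__)
    const unsigned int mask = PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE;
    if (pkey < 0 || pkey > 15 || (rights & ~mask)) {
        errno = EINVAL;
        return -1;
    }
    int shift = 2 * pkey;       /* two PKRU bits per key */
    uint32_t pkru = pkru_read();
    pkru &= ~(mask << shift);   /* clear the old AD/WD bits */
    pkru |= rights << shift;    /* AD = bit 2k, WD = bit 2k+1 */
    pkru_write(pkru);
    return 0;
#else
    (void)pkey;
    (void)rights;
    errno = ENOSYS;
    return -1;
#endif
}
```

Because PKRU is per thread, the change affects only the calling thread.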

Behavior
========

The kernel attempts to make protection keys consistent with the
behavior of a plain mprotect(). For instance, if you do this::

	mprotect(ptr, size, PROT_NONE);
	something(ptr);

you can expect the same effects with protection keys when doing this::

	pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS);
	pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
	something(ptr);

That should be true whether something() is a direct access to 'ptr'
like::

	*ptr = foo;

or when the kernel does the access on the application's behalf, as
with a read()::

	read(fd, ptr, 1);
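
In the direct-access case, the kernel sends a SIGSEGV either way, but
si_code is set to SEGV_PKUERR when protection keys are violated versus
SEGV_ACCERR when the plain mprotect() permissions are violated. When the
kernel makes the access on the application's behalf, as with read(), no
signal is sent; the system call fails with EFAULT in both cases.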
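
The si_code difference can be observed directly. This sketch, assuming Linux with glibc 2.27 or later, deliberately faults once on a PROT_NONE page and once on a PROT_READ|PROT_WRITE page whose key denies access, recording si_code from an SA_SIGINFO handler. It escapes each fault with siglongjmp(), which is fine for a demonstration but not a general recovery strategy; the function names are hypothetical.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;
static volatile sig_atomic_t fault_code;

/* Record si_code (SEGV_ACCERR, SEGV_PKUERR, ...) and skip the faulting store. */
static void segv_handler(int sig, siginfo_t *si, void *uctx)
{
    (void)sig;
    (void)uctx;
    fault_code = si->si_code;
    siglongjmp(fault_env, 1);
}

/* Write to *p; return the si_code of the resulting SIGSEGV, or 0 if none. */
static int si_code_of_store(volatile char *p)
{
    fault_code = 0;
    if (sigsetjmp(fault_env, 1) == 0) /* 1: also restore the signal mask */
        *p = 1;
    return fault_code;
}

/* Returns 0 and fills both codes, or -1 if setup fails or pkeys are missing. */
static int compare_si_codes(int *mprotect_code, int *pkey_code)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    char *ptr = mmap(NULL, len, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (ptr == MAP_FAILED)
        return -1;

    /* Case 1: plain mprotect()-style protection (PROT_NONE). */
    *mprotect_code = si_code_of_store(ptr);

    /* Case 2: the page is readable and writable, but its key denies access. */
    int ret = -1;
    int pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS);
    if (pkey >= 0 &&
        pkey_mprotect(ptr, len, PROT_READ | PROT_WRITE, pkey) == 0) {
        *pkey_code = si_code_of_store(ptr);
        ret = 0;
    }
    munmap(ptr, len);
    if (pkey >= 0)
        pkey_free(pkey);
    return ret;
}
```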