libgomp, nvptx: Cuda pinned memory
Use Cuda to pin memory, instead of Linux mlock, when available.
There are two advantages: firstly, this gives a significant speed boost for
NVPTX offloading, and secondly, it side-steps the usual OS ulimit/rlimit
setting.
The design adds a device independent plugin API for allocating pinned memory,
and then implements it for NVPTX. At present, the other supported devices do
not have equivalent capabilities (or requirements).
libgomp/ChangeLog:
* config/linux/allocator.c: Include assert.h.
(using_device_for_page_locked): New variable.
(linux_memspace_alloc): Add init0 parameter. Support device pinning.
(linux_memspace_calloc): Set init0 to true.
(linux_memspace_free): Support device pinning.
(linux_memspace_realloc): Support device pinning.
(MEMSPACE_ALLOC): Set init0 to false.
* libgomp-plugin.h
(GOMP_OFFLOAD_page_locked_host_alloc): New prototype.
(GOMP_OFFLOAD_page_locked_host_free): Likewise.
* libgomp.h (gomp_page_locked_host_alloc): Likewise.
(gomp_page_locked_host_free): Likewise.
(struct gomp_device_descr): Add page_locked_host_alloc_func and
page_locked_host_free_func.
* libgomp.texi: Adjust the docs for the pinned trait.
* plugin/plugin-nvptx.c
(GOMP_OFFLOAD_page_locked_host_alloc): New function.
(GOMP_OFFLOAD_page_locked_host_free): Likewise.
* target.c (device_for_page_locked): New variable.
(get_device_for_page_locked): New function.
(gomp_page_locked_host_alloc): Likewise.
(gomp_page_locked_host_free): Likewise.
(gomp_load_plugin_for_device): Add page_locked_host_alloc and
page_locked_host_free.
* testsuite/libgomp.c/alloc-pinned-1.c: Change expectations for NVPTX
devices.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-3.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-4.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-5.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-6.c: Likewise.
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
(cherry picked from commit
3b8d9d579c2931f1d8d2c89ff67735bc77df55ad)