MAP_LOCKED had a subtly different semantic from mmap(2)+mlock(2)
since it has been introduced.
mlock(2) fails if the memory range cannot get populated to
guarantee that no future major faults will happen on the range.
mmap(MAP_LOCKED) on the other hand silently succeeds even if
the range was populated only partially.
Fixing this subtle difference in the kernel is rather awkward
because the memory population happens after mm locks have been
dropped and so the cleanup before returning failure (munlock)
could operate on something else than the originally mapped area.
E.g. speculative userspace page fault handler catching SEGV and
doing mmap(fault_addr, MAP_FIXED|MAP_LOCKED) might discard portion
of a racing mmap and lead to lost data. Although it is not clear
whether such a usage would be valid, mmap page doesn't explicitly
describe requirements for threaded applications so we cannot
exclude this possibility.
This patch makes the semantic of MAP_LOCKED explicit and suggests
using mmap + mlock as the only way to guarantee no later major
page faults.
Reviewed-by: Eric B Munson <emunson@akamai.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>