Previously, the UCS4 conversion routines limited the number of
characters they examined to the minimum of the number of characters in
the input and the number of characters in the output. This is not the
correct behavior when __GCONV_IGNORE_ERRORS is set, because we do not
consume an output character when we skip a code unit. Instead, track
the input and output pointers and terminate the loop when either
reaches its limit.
This resolves assertion failures when resetting the input buffer in a step of
iconv, which assumes that the input will be fully consumed given sufficient
output space.
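As a rough illustration (hypothetical names, not the actual gconv loop), the fixed loop shape looks like this: the input and output pointers advance independently, and an invalid code unit consumes input without producing output.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Sketch only: stop when either pointer reaches its limit, instead
       of iterating min (input count, output count) times.  */
    static size_t
    ucs4_convert_sketch (const uint32_t **inp, const uint32_t *inend,
                         uint32_t **outp, uint32_t *outend,
                         bool ignore_errors)
    {
      size_t skipped = 0;
      while (*inp < inend && *outp < outend)
        {
          uint32_t ch = **inp;
          if (ch > 0x10ffff)            /* Invalid code unit.  */
            {
              if (!ignore_errors)
                break;
              ++*inp;                   /* Consume input, emit nothing.  */
              ++skipped;
              continue;
            }
          *(*outp)++ = ch;
          ++*inp;
        }
      return skipped;
    }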
Arjun Shankar [Wed, 4 Nov 2020 11:19:38 +0000 (12:19 +0100)]
iconv: Accept redundant shift sequences in IBM1364 [BZ #26224]
The IBM1364, IBM1371, IBM1388, IBM1390 and IBM1399 character sets
share converter logic (iconvdata/ibm1364.c) which would reject
redundant shift sequences when processing input in these character
sets. This led to a hang in the iconv program (CVE-2020-27618).
This commit adjusts the converter to ignore redundant shift sequences
and adds test cases for iconv_prog hangs that would be triggered upon
their rejection. This brings the implementation in line with other
converters that also ignore redundant shift sequences (e.g. IBM930
etc., fixed in commit 692de4b3960d).
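A minimal sketch of the tolerant behavior (hypothetical names; the real converter lives in iconvdata/ibm1364.c): a shift byte that selects the mode the converter is already in is consumed and ignored rather than rejected.

    enum mode { SB_MODE, DB_MODE };     /* Single- / double-byte mode.  */

    #define SI 0x0f                     /* Shift in: single-byte mode.  */
    #define SO 0x0e                     /* Shift out: double-byte mode.  */

    /* Returns nonzero if BYTE was a shift byte and has been consumed.
       A redundant shift (selecting the current mode) is simply ignored.  */
    static int
    handle_shift (enum mode *curcs, unsigned char byte)
    {
      if (byte == SI)
        {
          *curcs = SB_MODE;
          return 1;
        }
      if (byte == SO)
        {
          *curcs = DB_MODE;
          return 1;
        }
      return 0;                         /* Not a shift byte.  */
    }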
DJ Delorie [Thu, 25 Feb 2021 21:08:21 +0000 (16:08 -0500)]
nscd: Fix double free in netgroupcache [BZ #27462]
In commit 745664bd798ec8fd50438605948eea594179fba1 a use-after-free
was fixed, but this led to an occasional double-free. This patch
tracks the "live" allocation better.
Florian Weimer [Wed, 27 Jan 2021 12:36:12 +0000 (13:36 +0100)]
gconv: Fix assertion failure in ISO-2022-JP-3 module (bug 27256)
The conversion loop to the internal encoding does not follow
the interface contract that __GCONV_FULL_OUTPUT is only returned
after the internal wchar_t buffer has been filled completely. This
is enforced by the first of the two asserts in iconv/skeleton.c:
    /* We must run out of output buffer space in this
       rerun.  */
    assert (outbuf == outerr);
    assert (nstatus == __GCONV_FULL_OUTPUT);
This commit solves this issue by queuing a second wide character
which cannot be written immediately in the state variable, like
other converters already do (e.g., BIG5-HKSCS or TSCII).
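A sketch of the queuing scheme (hypothetical names; BIG5-HKSCS and TSCII use the same idea): when a sequence expands to two wide characters and only one slot is free, emit the first and park the second in the conversion state for the next call.

    #include <stdbool.h>
    #include <wchar.h>

    struct conv_state { wchar_t pending; };   /* L'\0': nothing queued.  */

    /* Returns false only if there was no room even for the first char,
       i.e. the output buffer is genuinely full.  */
    static bool
    emit_two (struct conv_state *st, wchar_t c1, wchar_t c2,
              wchar_t **outp, wchar_t *outend)
    {
      if (*outp >= outend)
        return false;
      *(*outp)++ = c1;
      if (*outp < outend)
        *(*outp)++ = c2;                /* Room for both.  */
      else
        st->pending = c2;               /* Queue for the next call.  */
      return true;
    }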
H.J. Lu [Mon, 28 Dec 2020 13:28:49 +0000 (05:28 -0800)]
x86: Check IFUNC definition in unrelocated executable [BZ #20019]
Calling an IFUNC function defined in an unrelocated executable also
leads to a segfault. Issue a fatal error message when an IFUNC function
defined in the unrelocated executable is called from a shared library.
On x86, ifuncmain6pie failed with:
[hjl@gnu-cfl-2 build-i686-linux]$ ./elf/ifuncmain6pie --direct
./elf/ifuncmain6pie: IFUNC symbol 'foo' referenced in '/export/build/gnu/tools-build/glibc-32bit/build-i686-linux/elf/ifuncmod6.so' is defined in the executable and creates an unsatisfiable circular dependency.
[hjl@gnu-cfl-2 build-i686-linux]$ readelf -rW elf/ifuncmod6.so | grep foo
00003ff4  00000706 R_386_GLOB_DAT    0000400c   foo_ptr
00003ff8  00000406 R_386_GLOB_DAT    00000000   foo
0000400c  00000401 R_386_32          00000000   foo
[hjl@gnu-cfl-2 build-i686-linux]$
Remove non-JUMP_SLOT relocations against foo in ifuncmod6.so, which
trigger the circular IFUNC dependency, and build ifuncmain6pie with
-Wl,-z,lazy.
H.J. Lu [Tue, 12 Jan 2021 13:15:49 +0000 (05:15 -0800)]
x86-64: Avoid rep movsb with short distance [BZ #27130]
When copying with "rep movsb", if the distance between source and
destination is N*4GB + [1..63] with N >= 0, performance may be very
slow. This patch updates memmove-vec-unaligned-erms.S for AVX and
AVX512 versions with the distance in RCX:
    cmpl    $63, %ecx
    // Don't use "rep movsb" if ECX <= 63
    jbe     L(Don't use rep movsb)
    Use "rep movsb"
Benchtest data from bench-memcpy, bench-memcpy-large, bench-memcpy-random
and bench-memcpy-walk on Skylake, Ice Lake and Tiger Lake show that the
performance impact is within the noise range, as "rep movsb" is only
used for data sizes >= 4KB.
The variant PCS support was ineffective because in the common case
linkmap->l_mach.plt == 0 but then the symbol table flags were ignored
and normal lazy binding was used instead of resolving the relocs early.
(This was a misunderstanding about how GOT[1] is setup by the linker.)
In practice this mainly affects SVE calls when the vector length is
more than 128 bits, then the top bits of the argument registers get
clobbered during lazy binding.
Wilco Dijkstra [Wed, 11 Mar 2020 17:15:25 +0000 (17:15 +0000)]
[AArch64] Improve integer memcpy
Further optimize integer memcpy. Small cases now include copies up
to 32 bytes. 64-128 byte copies are split into two cases to improve
performance of 64-96 byte copies. Comments have been rewritten.
Krzysztof Koch [Tue, 5 Nov 2019 17:35:18 +0000 (17:35 +0000)]
aarch64: Increase small and medium cases for __memcpy_generic
Increase the upper bound on medium cases from 96 to 128 bytes.
Now, up to 128 bytes are copied unrolled.
Increase the upper bound on small cases from 16 to 32 bytes so that
copies of 17-32 bytes are not impacted by the larger medium case.
Benchmarking:
The attached figures show relative timing difference with respect
to 'memcpy_generic', which is the existing implementation.
'memcpy_med_128' denotes the version of memcpy_generic with
only the medium case enlarged. The 'memcpy_med_128_small_32' numbers
are for the version of memcpy_generic submitted in this patch, which
has both medium and small cases enlarged. The figures were generated
using the script from:
https://www.sourceware.org/ml/libc-alpha/2019-10/msg00563.html
Depending on the platform, the performance improvement in the
bench-memcpy-random.c benchmark ranges from 6% to 20% between
the original and final versions of memcpy.S.
Tested against GLIBC testsuite and randomized tests.
Wilco Dijkstra [Fri, 28 Aug 2020 16:51:40 +0000 (17:51 +0100)]
AArch64: Improve backwards memmove performance
On some microarchitectures performance of the backwards memmove improves if
the stores use STR with decreasing addresses. So change the memmove loop
in memcpy_advsimd.S to use 2x STR rather than STP.
Add a new memcpy using 128-bit Q registers - this is faster on modern
cores and reduces codesize. Similar to the generic memcpy, small cases
include copies up to 32 bytes. 64-128 byte copies are split into two
cases to improve performance of 64-96 byte copies. Large copies align
the source rather than the destination.
bench-memcpy-random is ~9% faster than memcpy_falkor on Neoverse N1,
so make this memcpy the default on N1 (on Centriq it is 15% faster than
memcpy_falkor).
arm: CVE-2020-6096: Fix multiarch memcpy for negative length [BZ #25620]
Use unsigned branch instructions on r2 to fix the incorrect behavior
when a negative length is passed to memcpy.
This commit fixes the armv7 version.
arm: CVE-2020-6096: fix memcpy and memmove for negative length [BZ #25620]
Use unsigned branch instructions on r2 to fix the incorrect behavior
when a negative length is passed to memcpy and memmove.
This commit fixes the generic arm implementation of memcpy and memmove.
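To see why a negative length is dangerous at all, consider what it becomes at the memcpy interface (standalone illustration, not glibc code):

    #include <stdio.h>
    #include <string.h>

    int
    main (void)
    {
      int len = -1;
      /* memcpy's size parameter is size_t, so -1 becomes SIZE_MAX.  An
         implementation that compares the length with signed branches
         sees a negative number and takes the wrong path; unsigned
         branches see one huge, consistent value.  */
      printf ("%zu\n", (size_t) len);   /* 4294967295 on 32-bit arm.  */
      char dst[8], src[8] = "abc";
      memcpy (dst, src, sizeof src);    /* A bounded copy, for contrast.  */
      return 0;
    }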
strcmp-avx2.S: In the avx2 strncmp function, strings are compared in
chunks of 4 vectors (i.e., 32x4 = 128 bytes for avx2). After the first
4-vector comparison, the code must check whether it has already passed
the given offset. This patch implements the avx2 offset check for
strncmp, for the case where both strings compare equal over the first
4 vectors.
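A scalar C rendering of the fixed loop shape (a sketch only: memcmp stands in for the vector comparison, and the final partial block, page-boundary, and NUL handling are all omitted):

    #include <stddef.h>
    #include <string.h>

    #define BLOCK 128                   /* 4 x 32-byte AVX2 vectors.  */

    static int
    strncmp_blocks_sketch (const char *s1, const char *s2, size_t n)
    {
      size_t offset = 0;
      for (;;)
        {
          int r = memcmp (s1 + offset, s2 + offset, BLOCK);
          if (r != 0)
            return r;
          offset += BLOCK;
          if (offset >= n)              /* The added check: the limit
                                           has been reached, stop.  */
            return 0;
        }
    }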
Florian Weimer [Tue, 19 May 2020 12:09:38 +0000 (14:09 +0200)]
nss_compat: internal_end*ent may clobber errno, hiding ERANGE [BZ #25976]
During cleanup, before returning from get*_r functions, the end*ent
calls must not change errno. Otherwise, an ERANGE error from the
underlying implementation can be hidden, causing unexpected lookup
failures. This commit introduces an internal_end*ent_noerror
function which saves and restores errno, and marks the original
internal_end*ent function as warn_unused_result, so that it is used
only in contexts where errors from it can be handled explicitly.
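The pattern described reduces to this (sketch; the fclose-based cleanup here is a stand-in for the real internal_end*ent functions):

    #include <errno.h>
    #include <stdio.h>

    /* Stand-in for the real internal_end*ent cleanup, which may fail
       and set errno.  */
    static void
    internal_endent (FILE **stream)
    {
      if (*stream != NULL && fclose (*stream) != 0)
        ;                               /* Error deliberately ignored.  */
      *stream = NULL;
    }

    /* Wrapper that preserves errno, so an earlier ERANGE from the
       underlying lookup is not hidden from the caller.  */
    static void
    internal_endent_noerror (FILE **stream)
    {
      int saved_errno = errno;
      internal_endent (stream);
      errno = saved_errno;
    }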
H.J. Lu [Wed, 29 Apr 2020 20:20:27 +0000 (13:20 -0700)]
Add C wrappers for process_vm_readv/process_vm_writev [BZ #25810]
Since the U marker can only be applied to 2 unsigned long arguments
in syscalls.list files, add C wrappers for the process_vm_readv and
process_vm_writev syscalls, which have more than 2 unsigned long arguments.
H.J. Lu [Wed, 29 Apr 2020 15:08:40 +0000 (08:08 -0700)]
Mark unsigned long arguments with U in more syscalls [BZ #25810]
Mark unsigned long arguments in mmap, read, recv, recvfrom, send, sendto,
write, ioperm, sendfile64, setxattr, lsetxattr, fsetxattr, getxattr,
lgetxattr, fgetxattr, listxattr, llistxattr and flistxattr with U in
syscalls.list files.
H.J. Lu [Wed, 29 Apr 2020 12:35:34 +0000 (05:35 -0700)]
Add SYSCALL_ULONG_ARG_[12] to pass long to syscall [BZ #25810]
X32 has a 32-bit long and pointer with a 64-bit off_t. Since the x32
psABI requires that pointers passed in registers be zero-extended to
64 bits, x32 can share many syscall interfaces with LP64. When an LP64
syscall with long and unsigned long int arguments is used for x32, these
arguments must be properly extended to 64 bits. Otherwise, if the upper
32 bits of the register have an undefined value, such a syscall will be
rejected by the kernel.
For syscalls implemented in assembly code, 'U' is added to the syscall
signature key letters for unsigned long, which is zero-extended to
64 bits. SYSCALL_ULONG_ARG_1 and SYSCALL_ULONG_ARG_2 are passed
to syscall-template.S for the first and second unsigned long int
arguments if PSEUDOS_HAVE_ULONG_INDICES is defined. They are used by
x32 to zero-extend 32-bit arguments to 64 bits.
Tested on i386, x86-64 and x32 as well as with build-many-glibcs.py.
H.J. Lu [Mon, 13 Apr 2020 17:31:26 +0000 (10:31 -0700)]
x32: Properly pass long to syscall [BZ #25810]
X32 has a 32-bit long and pointer with a 64-bit off_t. Since the x32
psABI requires that pointers passed in registers be zero-extended to
64 bits, x32 can share many syscall interfaces with LP64. When an LP64
syscall with long and unsigned long arguments is used for x32, these
arguments must be properly extended to 64 bits. Otherwise, if the upper
32 bits of the register have an undefined value, such a syscall will be
rejected by the kernel.
Enforce zero-extension for pointers and array system call arguments.
For integer types, extend to int64_t (the full register) using a
regular cast, resulting in zero or sign extension based on the
signedness of the original type.
For
    void *mmap(void *addr, size_t length, int prot, int flags,
               int fd, off_t offset);
1. addr is unchanged.
2. length is zero-extended to 64 bits.
3. prot is sign-extended to 64 bits.
4. flags is sign-extended to 64 bits.
5. fd is sign-extended to 64 bits.
6. offset is unchanged.
For int arguments, since the kernel uses only the lower 32 bits and
ignores the upper 32 bits in 64-bit registers, these work correctly.
Tested on x86-64 and x32. There are no code changes on x86-64.
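The extension rules stated above reduce to these conversions (illustrative helpers, not the glibc macros):

    #include <stdint.h>

    /* x32: long and pointers are 32-bit, but syscall arguments travel
       in full 64-bit registers.  */

    static inline uint64_t
    arg_pointer (const void *p)
    {
      return (uint64_t) (uintptr_t) p;  /* Pointers: zero-extend.  */
    }

    static inline uint64_t
    arg_long (long x)
    {
      return (uint64_t) (int64_t) x;    /* Signed types: sign-extend.  */
    }

    static inline uint64_t
    arg_ulong (unsigned long x)
    {
      return (uint64_t) x;              /* Unsigned long ('U'): zero-extend.  */
    }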
Fix data race in setting function descriptors during lazy binding on hppa.
This addresses an issue that is present mainly on SMP machines running
threaded code. In a typical indirect call or PLT import stub, the
target address is loaded first. Then the global pointer is loaded into
the PIC register in the delay slot of a branch to the target address.
During lazy binding, the target address is a trampoline which transfers
to _dl_runtime_resolve().
_dl_runtime_resolve() uses the relocation offset stored in the global
pointer and the linkage map stored in the trampoline to find the
relocation. Then, the function descriptor is updated.
In a multi-threaded application, it is possible for the global pointer
to be updated between the load of the target address and the load of
the global pointer. When this happens, the relocation offset has been
replaced by the new global pointer.
by the new global pointer. The function pointer has probably been
updated as well but there is no way to find the address of the function
descriptor and to transfer to the target. So, _dl_runtime_resolve()
typically crashes.
HP-UX addressed this problem by adding an extra pc-relative branch to
the trampoline. The descriptor is initially setup to point to the
branch. The branch then transfers to the trampoline. This allowed
the trampoline code to figure out which descriptor was being used
without any modification to user code. I didn't use this approach
as it is more complex and changes function pointer canonicalization.
The order of loading the target address and global pointer in
indirect calls was not consistent with the order used in import stubs.
In particular, $$dyncall and some inline versions of it loaded the
global pointer first. This was inconsistent with the global pointer
being updated first in dl-machine.h. Assuming the accesses are
ordered, we want elf_machine_fixup_plt() to store the global pointer
first and calls to load it last. Then, the global pointer will be
correct when the target function is entered.
However, just to make things more fun, HP added support for
out-of-order execution of accesses in PA 2.0. The accesses used by
calls are weakly ordered. So, it is possible under some circumstances
that a function might be entered with the wrong global pointer.
However, HP uses weakly ordered accesses in 64-bit HP-UX, so I assume
that loading the global pointer in the delay slot of the branch must
work consistently.
The basic fix for the race is a combination of modifying user code to
preserve the address of the function descriptor in register %r22 and
setting the least-significant bit in the relocation offset. The
latter was suggested by Carlos as a way to distinguish relocation
offsets from global pointer values. Conventionally, %r22 is used
as the address of the function descriptor in calls to $$dyncall.
So, it wasn't hard to preserve the address in %r22.
I have updated gcc trunk and gcc-9 branch to not clobber %r22 in
$$dyncall and inline indirect calls. I have also modified the import
stubs in binutils trunk and the 2.33 branch to preserve %r22. This
required making the stubs one instruction longer but we save one
relocation. I also modified binutils to align the .plt section on
an 8-byte boundary. This allows descriptors to be updated atomically
with a floating-point store.
With these changes, _dl_runtime_resolve() can fall back to an alternate
mechanism to find the relocation offset when it has been clobbered.
There's just one additional instruction in the fast path. I tested
the fallback function, _dl_fix_reloc_arg(), by changing the branch to
always use the fallback. Old code still runs as it did before.
As indicated by kernel developers [1], the sigreturn stub cannot change
the register window or the stack pointer, since the kernel has set up
the restore frame at a precise location relative to the stack pointer
when the stub is invoked.
I tried playing with some compiler flags, and even _Noreturn and
__builtin_unreachable after the asm do not help (and SPARC does not
support naked functions).
To avoid similar issues, since the stack-protector support has also
stumbled over this, this patch moves the implementation of the
sigreturn stubs to assembly.
Checked on sparcv9-linux-gnu and sparc64-linux-gnu with gcc 9.2.1
and gcc 7.5.0.
The commit "arm: Split BE/LE abilist"
(1673ba87fefe019c834c09d33673d1d453ea698d) changed the soft-fp order for
ARM selection when __SOFTFP__ is defined by the compiler.
It makes the build select some routines (fadd, fdiv, fmul, fsub, and fma)
from ieee754/flt-32 and ieee754/dbl-64 that require fenv support to be
correctly rounded, which in turn leads to math failures, since
__SOFTFP__ does not have fenv support.
i386: Use comdat instead of .gnu.linkonce for i386 setup pic register (BZ #20543)
GCC no longer uses .gnu.linkonce for the i386 PIC register setup with
the minimum supported compiler version (likewise, the minimum supported
binutils supports comdat).
Trying to pinpoint when binutils added comdat support for i686, it
seems it was around 2004 [1]. Checking with some ancient binutils
older than 2.16, I see:
test.o: In function `__x86.get_pc_thunk.bx':
test.o(.text.__x86.get_pc_thunk.bx+0x0): multiple definition of `__x86.get_pc_thunk.bx'
/usr/lib/gcc/x86_64-linux-gnu/5/../../../i386-linux-gnu/crti.o(.gnu.linkonce.t.__x86.get_pc_thunk.bx+0x0): first defined here
It seems that such versions cannot handle either comdat at all or
a mix of linkonce and comdat. With binutils 2.16.1 I get a different
issue trying to link a binary with a more recent crti.o (an
unrecognized relocation (0x2b) in section `.init', which is
R_386_GOT32X; old binutils won't generate it anyway).
So it is unlikely that someone will use a binutils older than the one
used to build glibc, and even that scenario may fail with issues such
as the R_386_GOT32X one. Also, 2.16.1 is quite old and not really
supported (glibc itself requires 2.25).
Andreas Schwab [Mon, 20 Jan 2020 16:01:50 +0000 (17:01 +0100)]
Fix array overflow in backtrace on PowerPC (bug 25423)
When unwinding through a signal frame the backtrace function on PowerPC
didn't check array bounds when storing the frame address.
Fixes commit d400dcac5e ("PowerPC: fix backtrace to handle signal trampolines").
Joseph Myers [Wed, 12 Feb 2020 23:31:56 +0000 (23:31 +0000)]
Avoid ldbl-96 stack corruption from range reduction of pseudo-zero (bug 25487).
Bug 25487 reports stack corruption in ldbl-96 sinl on a pseudo-zero
argument (a representation where all the significand bits, including
the explicit high bit, are zero, but the exponent is not zero, which
is not a valid representation for the long double type).
Although this is not a valid long double representation, existing
practice in this area (see bug 4586, originally marked invalid but
subsequently fixed) is that we still seek to avoid invalid memory
accesses as a result, in case of programs that treat arbitrary binary
data as long double representations, although the invalid
representations of the ldbl-96 format do not need to be consistently
handled the same as any particular valid representation.
This patch makes the range reduction detect pseudo-zero and unnormal
representations that would otherwise go to __kernel_rem_pio2, and
returns a NaN for them instead of continuing with the range reduction
process. (Pseudo-zero and unnormal representations whose unbiased
exponent is less than -1 have already been safely returned from the
function before this point without going through the rest of range
reduction.) Pseudo-zero representations would previously result in
the value passed to __kernel_rem_pio2 being all-zero, which is
definitely unsafe; unnormal representations would previously result in
a value passed whose high bit is zero, which might well be unsafe
since that is not a form of input expected by __kernel_rem_pio2.
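A sketch of the detection (x86 little-endian ldbl-96 layout assumed; the real code uses glibc's GET_LDOUBLE_* accessors):

    #include <stdint.h>
    #include <string.h>

    /* Pseudo-zero / unnormal: biased exponent nonzero, but the explicit
       integer bit (bit 63 of the significand) is clear.  */
    static int
    is_pseudo_or_unnormal (long double x)
    {
      unsigned char b[10];
      memcpy (b, &x, sizeof b);         /* Low 10 bytes hold the value.  */
      uint64_t mantissa;
      uint16_t se;                      /* Sign + 15-bit biased exponent.  */
      memcpy (&mantissa, b, sizeof mantissa);
      memcpy (&se, b + 8, sizeof se);
      return (se & 0x7fff) != 0 && (mantissa >> 63) == 0;
    }

    /* Range reduction can then return a NaN for such inputs instead of
       calling __kernel_rem_pio2 with a bogus significand.  */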
Fangrui Song [Wed, 5 Feb 2020 05:55:44 +0000 (21:55 -0800)]
Improve IFUNC check [BZ #25506]
GNU ld's RISCV port does not support IFUNC. ld -no-pie produces no
relocation and the test passed incorrectly. Be more rigid by testing
IRELATIVE explicitly.
malloc/tst-mallocfork2: Kill lingering process for unexpected failures
If the test fails due to some unexpected failure after the children
are created, either in the signal handler (by calling abort) or in the
main loop, the created children might not be killed properly.
This patch fixes it by:
* Avoiding aborting in the signal handler: set a flag that an error
has occurred and check it in the main loop.
* Adding an atexit handler to kill the child processes, as sketched
below.
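A sketch of both changes (hypothetical names and child count):

    #include <signal.h>
    #include <stdlib.h>
    #include <sys/types.h>

    #define NCHILDREN 8
    static pid_t pids[NCHILDREN];
    static volatile sig_atomic_t error_seen;

    /* Signal handler: record the error instead of calling abort, so
       the children can still be cleaned up.  */
    static void
    handler (int sig)
    {
      (void) sig;
      error_seen = 1;                   /* Checked in the main loop.  */
    }

    /* atexit handler: make sure the children die on every exit path.  */
    static void
    kill_children (void)
    {
      for (int i = 0; i < NCHILDREN; ++i)
        if (pids[i] > 0)
          kill (pids[i], SIGKILL);
    }

    /* In main, after forking:  atexit (kill_children);  */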
riscv: Avoid clobbering register parameters in syscall
The riscv INTERNAL_SYSCALL macro might clobber the register
parameter if the argument itself might clobber any register (a function
call for instance).
This patch fixes it by using temporary variables for the expressions
between the register assignments (as indicated by GCC documentation,
6.47.5.2 Specifying Registers for Local Variables).
It is similar to the fix done for MIPS (bug 25523).
Checked with riscv64-linux-gnu-rv64imafdc-lp64d build.
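The shape of the fix, in miniature (an illustrative one-argument syscall; the real macros handle all argument counts):

    /* Evaluate the (possibly register-clobbering) argument expression
       into an ordinary temporary *before* binding any local register
       variables, per GCC manual 6.47.5.2.  */
    static long
    riscv_syscall1_sketch (long nr, long (*compute_arg) (void))
    {
      long tmp = compute_arg ();                 /* May clobber a0/a7.  */
      register long a7 __asm__ ("a7") = nr;
      register long a0 __asm__ ("a0") = tmp;     /* Safe: plain copy.  */
      __asm__ volatile ("ecall"
                        : "+r" (a0)
                        : "r" (a7)
                        : "memory");
      return a0;
    }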
microblaze: Avoid clobbering register parameters in syscall
The microblaze INTERNAL_SYSCALL macro might clobber the register
parameter if the argument itself might clobber any register (a function
call for instance).
This patch fixes it by using temporary variables for the expressions
between the register assignments (as indicated by GCC documentation,
6.47.5.2 Specifying Registers for Local Variables).
It is similar to the fix done for MIPS (bug 25523).
Checked with microblaze-linux-gnu and microblazeel-linux-gnu build.
hppa: Align __clone stack argument to 8 bytes (Bug 25066)
The hppa architecture requires strict alignment for loads and stores.
As a result, the minimum stack alignment that will work is 8 bytes.
This patch adjusts __clone() to align the stack argument passed to it.
It also makes some slight formatting adjustments.
Florian Weimer [Fri, 17 Jan 2020 14:11:20 +0000 (15:11 +0100)]
Remove incorrect alloc_size attribute from pvalloc [BZ #25401]
pvalloc is guarantueed to round up the allocation size to the page
size, so applications can assume that the memory region is larger
than the passed-in argument. The alloc_size attribute cannot express
that.
The test case is based on a suggestion from Jakub Jelinek.
Florian Weimer [Tue, 12 Nov 2019 11:25:49 +0000 (12:25 +0100)]
login: Use pread64 in utmp implementation
This reduces the possible error scenarios considerably because
no longer can file seek fail, leaving the file descriptor in an
inconsistent state and out of sync with the cache.
As a result, it is possible to avoid setting file_offset to -1
to make an error persistent. Instead, subsequent calls will retry
the operation and report any errors returned by the kernel.
This change also avoids reading the file from the start if pututline
is called multiple times, to work around lock acquisition failures
due to timeouts.
Florian Weimer [Tue, 12 Nov 2019 11:02:57 +0000 (12:02 +0100)]
login: Introduce matches_last_entry to utmp processing
This simplifies internal_getut_nolock and fixes a regression,
introduced in commit be6b16d975683e6cca57852cd4cfe715b2a9d8b1
("login: Acquire write lock early in pututline [BZ #24882]")
in pututxline because __utmp_equal can only compare process-related
utmp entries.
Fixes: be6b16d975683e6cca57852cd4cfe715b2a9d8b1
Change-Id: Ib8a85002f7f87ee41590846d16d7e52bdb82f5a5
(cherry picked from commit 76a7c103eb9060f9e3ba01d073ae4621a17d8b46)
Florian Weimer [Thu, 7 Nov 2019 17:15:18 +0000 (18:15 +0100)]
login: Acquire write lock early in pututline [BZ #24882]
It has been reported that due to lack of fairness in POSIX file
locking, the current reader-to-writer lock upgrade can result in
lack of forward progress. Acquiring the write lock directly
hopefully avoids this issue if there are only writers.
This also fixes bug 24882 due to the cache revalidation in
__libc_pututline.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Change-Id: I57e31ae30719e609a53505a0924dda101d46372e
(cherry picked from commit be6b16d975683e6cca57852cd4cfe715b2a9d8b1)
Commit 7532837d7b03b3ca5b9a63d77a5bd81dd23f3d9c ("The
-Wstringop-truncation option new in GCC 8 detects common misuses")
added __attribute_nonstring__ to bits/utmp.h, but it did not update
the parallel bits/utmpx.h header. In struct utmp, the nonstring
attribute for ut_id was missing.
Florian Weimer [Thu, 7 Nov 2019 08:53:41 +0000 (09:53 +0100)]
login: Remove double-assignment of fl.l_whence in try_file_lock
Since l_whence is the second member of struct flock, it is written
twice. The double-assignment is technically undefined behavior due to
the lack of a sequence point.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Change-Id: I2baf9e70690e723c61051b25ccbd510aec15976c
(cherry picked from commit b0a83ae71b2588bd2a9e6b40f95191602940e01e)
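The pattern, reconstructed as a sketch (the real code is in the utmp file locking): the buggy form used an assignment expression as the second initializer element, which both positionally initializes l_whence and stores to it.

    #include <fcntl.h>
    #include <unistd.h>

    static int
    try_lock_sketch (int fd, short type)
    {
      /* Buggy form (two unsequenced writes to l_whence):
           struct flock fl =
             {
               .l_type = type,
               fl.l_whence = SEEK_SET,
             };
         Fixed form, with a designated initializer:  */
      struct flock fl =
        {
          .l_type = type,
          .l_whence = SEEK_SET,
        };
      return fcntl (fd, F_SETLKW, &fl);
    }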
Florian Weimer [Wed, 28 Aug 2019 09:59:45 +0000 (11:59 +0200)]
login: pututxline could fail to overwrite existing entries [BZ #24902]
The internal_getut_r function updates the file_offset variable and
therefore must always update last_entry as well.
Previously, if pututxline could not upgrade the read lock to a
write lock, internal_getut_r would update file_offset only,
without updating last_entry, and a subsequent call would not
overwrite the existing utmpx entry at file_offset, instead
creating a new entry. This has been observed to cause unbounded
file growth in high-load situations.
This commit removes the buffer argument to internal_getut_r and
updates the last_entry variable directly, along with file_offset.
Florian Weimer [Thu, 15 Aug 2019 14:09:20 +0000 (16:09 +0200)]
login: Use struct flock64 in utmp [BZ #24880]
Commit 06ab719d30b01da401150068054d3b8ea93dd12f ("Fix Linux fcntl OFD
locks for non-LFS architectures (BZ#20251)") introduced the use of
fcntl64 into the utmp implementation. However, the lock file
structure was not updated to struct flock64 at that point.
Florian Weimer [Thu, 15 Aug 2019 14:09:05 +0000 (16:09 +0200)]
login: Disarm timer after utmp lock acquisition [BZ #24879]
If the file processing takes a long time for some reason, SIGALRM can
arrive while the file is still being processed. At that point, file
access will fail with EINTR. Disarming the timer after lock
acquisition avoids that. (If there was a previous alarm, it is the
responsibility of the caller to deal with the EINTR error.)
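In outline (a sketch with a hypothetical timeout value):

    #include <fcntl.h>
    #include <unistd.h>

    #define LOCK_TIMEOUT 10             /* Seconds; hypothetical value.  */

    static int
    lock_with_timeout (int fd, struct flock *fl)
    {
      alarm (LOCK_TIMEOUT);             /* Bound only the blocking lock.  */
      int rc = fcntl (fd, F_SETLKW, fl);
      alarm (0);                        /* Disarm before file processing,
                                           so SIGALRM cannot cause EINTR
                                           during reads and writes.  */
      return rc;
    }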
Florian Weimer [Thu, 15 Aug 2019 08:30:23 +0000 (10:30 +0200)]
login: Fix updwtmp, updwtmx unlocking
Commit 5a3afa9738f3dbbaf8c0a35665318c1af782111b (login: Replace
macro-based control flow with function calls in utmp) introduced
a regression because after it, __libc_updwtmp attempts to unlock
the wrong file descriptor.
There is just one file-based implementation, so this dispatch
mechanism is unnecessary. Instead of the vtable pointer
__libc_utmp_jump_table, use a non-negative file_fd as the indicator
that the backend is initialized.
Florian Weimer [Thu, 5 Dec 2019 16:29:42 +0000 (17:29 +0100)]
misc/test-errno-linux: Handle EINVAL from quotactl
In commit 3dd4d40b420846dd35869ccc8f8627feef2cff32 ("xfs: Sanity check
flags of Q_XQUOTARM call"), Linux 5.4 added checking for the flags
argument, causing the test to fail due to too restrictive test
expectations.
Kamlesh Kumar [Thu, 5 Dec 2019 15:54:47 +0000 (16:54 +0100)]
<string.h>: Define __CORRECT_ISO_CPP_STRING_H_PROTO for Clang [BZ #25232]
Without the asm redirects, strchr et al. are not const-correct.
libc++ has a wrapper header that works with and without
__CORRECT_ISO_CPP_STRING_H_PROTO (using a Clang extension). But when
Clang is used with libstdc++ or just C headers, the overloaded functions
with the correct types are not declared.
This change does not impact current GCC (with libstdc++ or libc++).
Florian Weimer [Tue, 3 Dec 2019 19:26:28 +0000 (20:26 +0100)]
x86: Assume --enable-cet if GCC defaults to CET [BZ #25225]
This links in CET support if GCC defaults to CET. Otherwise, __CET__
is defined, yet CET functionality is not compiled and linked into the
dynamic loader, resulting in a linker failure due to undefined
references to _dl_cet_check and _dl_open_check.
Florian Weimer [Thu, 28 Nov 2019 13:18:12 +0000 (14:18 +0100)]
libio: Disable vtable validation for pre-2.1 interposed handles [BZ #25203]
Commit c402355dfa7807b8e0adb27c009135a7e2b9f1b0 ("libio: Disable
vtable validation in case of interposition [BZ #23313]") only covered
the interposable glibc 2.1 handles, in libio/stdfiles.c. The
parallel code in libio/oldstdfiles.c needs similar detection logic.
Stefan Liebler [Wed, 27 Nov 2019 11:35:40 +0000 (12:35 +0100)]
S390: Fix handling of needles crossing a page in strstr z15 ifunc-variant. [BZ #25226]
If the specified needle crosses a page-boundary, the s390-z15 ifunc variant of
strstr truncates the needle which results in invalid results.
This is fixed by loading the needle beyond the page boundary to v18 instead of v16.
The bug is sometimes observable in test-strstr.c in check1 and check2,
as the haystack and needle are stored on the stack; thus the needle can
be on a page boundary. check2 is now extended to test haystacks and
needles located on the stack, at the end of a page, and across two
pages.
rtld: Check __libc_enable_secure before honoring LD_PREFER_MAP_32BIT_EXEC (CVE-2019-19126) [BZ #25204]
The problem was introduced in glibc 2.23, in commit b9eb92ab05204df772eb4929eccd018637c9f3e9
("Add Prefer_MAP_32BIT_EXEC to map executable pages with MAP_32BIT").
Don't use a custom wrapper macro around __has_include (bug 25189).
This causes issues when using clang with -frewrite-includes to, e.g.,
submit the translation unit to a distributed compiler.
In my case, I was building Firefox using sccache.
See [1] for a reduced test-case since I initially thought this was a
clang bug, and [2] for more context.
Apparently doing this is invalid C++ per [cpp.cond], which mentions [3]:
> The #ifdef and #ifndef directives, and the defined conditional
> inclusion operator, shall treat __has_include and __has_cpp_attribute
> as if they were the names of defined macros. The identifiers
> __has_include and __has_cpp_attribute shall not appear in any context
> not mentioned in this subclause.
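The fix is to test __has_include directly in the conditional rather than routing it through a wrapper macro (sketch; <sys/random.h> is just an example header):

    /* Problematic: a wrapper macro such as
         #define __glibc_has_include(header) __has_include (header)
       places __has_include in a context the standard does not allow.
       Test it directly instead:  */
    #if defined __has_include
    # if __has_include (<sys/random.h>)
    #  include <sys/random.h>
    # endif
    #endif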
DJ Delorie [Wed, 30 Oct 2019 22:03:14 +0000 (18:03 -0400)]
Base max_fast on alignment, not width, of bins (Bug 24903)
set_max_fast sets the "impossibly small" value based on, eventually,
MALLOC_ALIGNMENT. The comparisons for the smallest chunk used are,
eventually, against MIN_CHUNK_SIZE. Note that i386 is the only platform
where these are the same, so a smallest chunk *would* be put in a
no-fastbins fastbin.
This change calculates the "impossibly small" value
based on MIN_CHUNK_SIZE instead, so that we can know it will
always be impossibly small.
malloc: Fix missing accounting of top chunk in malloc_info [BZ #24026]
Fixes `<total type="rest" size="...">` incorrectly showing as 0 most
of the time.
The rest value being wrong is significant because to compute the
actual amount of memory handed out via malloc, the user must subtract
it from <system type="current" size="...">. That result being wrong
makes investigating memory fragmentation issues like
<https://bugzilla.redhat.com/show_bug.cgi?id=843478> close to
impossible.
Paul A. Clarke [Thu, 19 Sep 2019 16:11:04 +0000 (11:11 -0500)]
[powerpc] No need to enter "Ignore Exceptions Mode"
Since at least POWER8, there is no performance advantage to entering
"Ignore Exceptions Mode", and doing so conditionally requires
- the conditional logic, and
- a system call.
Paul A. Clarke [Thu, 19 Sep 2019 19:04:45 +0000 (14:04 -0500)]
[powerpc] Rename fesetenv_mode to fesetenv_control
fesetenv_mode is used variously to write the FPSCR exception enable
bits and rounding mode bits. These are referred to as the control
bits in the POWER ISA. Change the name to be reflective of its
current and expected use, and match up well with fegetenv_control.
libc_feholdsetround_noex_ppc_ctx currently performs:
1. Read FPSCR, save to context.
2. Create new FPSCR value: clear enables and set new rounding mode.
3. Write new value to FPSCR.
Since other bits just pass through, there is no need to write them.
Instead, write just the changed values (enables and rounding mode),
which can be a bit more efficient.
Paul A. Clarke [Thu, 19 Sep 2019 16:58:46 +0000 (11:58 -0500)]
[powerpc] Rename fegetenv_status to fegetenv_control
fegetenv_status is used variously to retrieve the FPSCR exception enable
bits, rounding mode bits, or both. These are referred to as the control
bits in the POWER ISA. FPSCR status bits are also returned by the
'mffs' and 'mffsl' instructions, but they are uniformly ignored by all
uses of fegetenv_status. Change the name to be reflective of its
current and expected use.
Reviewed-By: Paul E Murphy <murphyp@linux.ibm.com>
Paul A. Clarke [Thu, 19 Sep 2019 16:39:44 +0000 (11:39 -0500)]
[powerpc] __fesetround_inline optimizations
On POWER9, use more efficient means to update the 2-bit rounding mode
via the 'mffscrn' instruction (instead of two 'mtfsb0/1' instructions
or one 'mtfsfi' instruction that modifies 4 bits).
Suggested-by: Paul E. Murphy <murphyp@linux.ibm.com>
Reviewed-By: Paul E Murphy <murphyp@linux.ibm.com>
ROUND_TO_ODD and a couple of other places use libc_feupdateenv_test to
restore the rounding mode and exception enables, preserve exception flags,
and test whether given exception(s) were generated.
If the exception flags haven't changed, then it is sufficient and a bit
more efficient to just restore the rounding mode and enables, rather than
writing the full Floating-Point Status and Control Register (FPSCR).
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
Paul A. Clarke [Thu, 19 Sep 2019 13:35:16 +0000 (08:35 -0500)]
[powerpc] SET_RESTORE_ROUND optimizations and bug fix
SET_RESTORE_ROUND brackets a block of code, temporarily setting and
restoring the rounding mode and letting everything else, including
exceptions generated within the block, pass through.
On powerpc, the current code clears the exception enables, which will hide
exceptions generated within the block. This issue was introduced by me
in commit e905212627350d54b58426214b5a54ddc852b0c9.
Fix this by not clearing exception enable bits in the prologue.
Also, since we are no longer changing the enable bits in either the
prologue or the epilogue, there is no need to test for entering/exiting
non-stop mode.
Also, optimize the prologue get/save/set rounding mode operations for
POWER9 and later by using 'mffscrn' when possible.
Suggested-by: Paul E. Murphy <murphyp@linux.ibm.com>
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
Fixes: e905212627350d54b58426214b5a54ddc852b0c9
2019-09-19 Paul A. Clarke <pc@us.ibm.com>
* sysdeps/powerpc/fpu/fenv_libc.h (fegetenv_and_set_rn): New.
(__fe_mffscrn): New.
* sysdeps/powerpc/fpu/fenv_private.h (libc_feholdsetround_ppc_ctx):
Do not clear enable bits, remove obsolete code, use
fegetenv_and_set_rn.
(libc_feresetround_ppc): Remove obsolete code, use
fegetenv_and_set_rn.
fegetenv_status() wants to use the lighter weight instruction 'mffsl'
for reading the Floating-Point Status and Control Register (FPSCR).
It currently will use it directly if compiled '-mcpu=power9', and will
perform a runtime check (cpu_supports("arch_3_00")) otherwise.
Nicely, it turns out that the 'mffsl' instruction will decode to
'mffs' on architectures older than "arch_3_00" because the additional
bits set for 'mffsl' are "don't care" for 'mffs'. 'mffs' is a superset
of 'mffsl'.
Paul A. Clarke [Tue, 6 Aug 2019 04:13:45 +0000 (00:13 -0400)]
[powerpc] fesetenv: optimize FPSCR access
fesetenv() reads the current value of the Floating-Point Status and Control
Register (FPSCR) to determine the difference between the current state of
exception enables and the newly requested state. All of these bits are also
returned by the lighter weight 'mffsl' instruction used by fegetenv_status().
Use that instead.
Also, remove a local macro _FPU_MASK_ALL in favor of a common macro,
FPU_ENABLES_MASK from fenv_libc.h.
Finally, use a local variable ('new') in favor of a pointer dereference
('*envp').
Paul A. Clarke [Sat, 3 Aug 2019 02:47:57 +0000 (22:47 -0400)]
[powerpc] SET_RESTORE_ROUND improvements
SET_RESTORE_ROUND uses libc_feholdsetround_ppc_ctx and
libc_feresetround_ppc_ctx to bracket a block of code where the floating point
rounding mode must be set to a certain value.
For the *prologue*, libc_feholdsetround_ppc_ctx is used and performs:
1. Read/save FPSCR.
2. Create new value for FPSCR with new rounding mode and enables cleared.
3. If new value is different than current value,
a. If transitioning from a state where some exceptions enabled,
enter "ignore exceptions / non-stop" mode.
b. Write new value to FPSCR.
c. Put a mark on the wall indicating the FPSCR was changed.
(1) uses the 'mffs' instruction. On POWER9, the lighter weight 'mffsl'
instruction can be used, but it doesn't return all of the bits in the FPSCR.
fegetenv_status uses 'mffsl' on POWER9, 'mffs' otherwise, and can thus be
used instead of fegetenv_register.
(3b) uses 'mtfsf 0b11111111' to write the entire FPSCR, so it must
instead use 'mtfsf 0b00000011' to write just the enables and the mode,
because some of the rest of the bits are not valid if 'mffsl' was used.
fesetenv_mode uses 'mtfsf 0b00000011' on POWER9, 'mtfsf 0b11111111'
otherwise.
For the *epilogue*, libc_feresetround_ppc_ctx checks the mark on the wall, then
calls libc_feresetround_ppc, which just calls __libc_femergeenv_ppc with
parameters such that it performs:
1. Retrieve saved value of FPSCR, saved in prologue above.
2. Read FPSCR.
3. Create new value of FPSCR where:
- Summary bits and exception indicators = current OR saved.
- Rounding mode and enables = saved.
- Status bits = current.
4. If transitioning from some exceptions enabled to none,
enter "ignore exceptions / non-stop" mode.
5. If transitioning from no exceptions enabled to some,
enter "catch exceptions" mode.
6. Write new value to FPSCR.
The summary bits are hardwired to the exception indicators, so there is no
need to restore any saved summary bits.
The exception indicator bits, which are sticky and remain set unless
explicitly cleared, would only need to be restored if the code block
might explicitly clear any of them. This is certainly not expected.
So, the only bits that need to be restored are the enables and the mode.
If it is the case that only those bits are to be restored, there is no need to
read the FPSCR. Steps (2) and (3) are unnecessary, and step (6) only needs to
write the bits being restored.
We know we are transitioning out of "ignore exceptions" mode, so step (4) is
unnecessary, and in step (6), we only need to check the state we are
entering.
Since fe{en,dis}ableexcept() and fesetmode() read-modify-write just the
"mode" (exception enable and rounding mode) bits of the Floating Point Status
Control Register (FPSCR), the lighter weight 'mffsl' instruction can be used
to read the FPSCR (enables and rounding mode), and 'mtfsf 0b00000011' can be
used to write just those bits back to the FPSCR. The net is better performance.
In addition, fe{en,dis}ableexcept() read the FPSCR again after writing it, or
they determine that it doesn't need to be written because it is not changing.
In either case, the local variable holds the current values of the enable
bits in the FPSCR. This local variable can be used instead of again reading
the FPSCR.
Also, that value of the FPSCR which is read the second time is validated
against the requested enables. Since the write can't fail, this validation
step is unnecessary, and can be removed. Instead, the exceptions to be
enabled (or disabled) are transformed into available bits in the FPSCR,
then validated after being transformed back, to ensure that all requested
bits are actually being set. For example, FE_INVALID_SQRT can be
requested, but cannot actually be set. This bit is not mapped during the
transformations, so a test for that bit being set before and after
transformations will show the bit would not be set, and the function will
return -1 for failure.
Finally, convert the local macros in fesetmode.c to more generally useful
macros in fenv_libc.h.