Commit 948ce73b31 made recvmsg/recvmmsg to always call
__convert_scm_timestamps for 64 bit time_t symbol, so adjust it to
always build it for __TIMESIZE != 64.
It fixes build for architecture with 32 bit time_t support when
configured with minimum kernel of 5.1.
Samuel Thibault [Sun, 17 Oct 2021 23:39:02 +0000 (01:39 +0200)]
hurd if_index: Explicitly use AF_INET for if index discovery
5bf07e1b3a74 ("Linux: Simplify __opensock and fix race condition [BZ #28353]")
made __opensock try NETLINK then UNIX then INET. On the Hurd, only INET
knows about network interfaces, so better actually specify that in
if_index.
Linux: Simplify __opensock and fix race condition [BZ #28353]
AF_NETLINK support is not quite optional on modern Linux systems
anymore, so it is likely that the first attempt will always succeed.
Consequently, there is no need to cache the result. Keep AF_UNIX
and the Internet address families as a fallback, for the rare case
that AF_NETLINK is missing. The other address families previously
probed are totally obsolete be now, so remove them.
Use this simplified version as the generic implementation, disabling
Netlink support as needed.
Linux: Only generate 64 bit timestamps for 64 bit time_t recvmsg/recvmmsg
The timestamps created by __convert_scm_timestamps only make sense for
64 bit time_t programs, 32 bit time_t programs will ignore 64 bit time_t
timestamps since SO_TIMESTAMP will be defined to old values (either by
glibc or kernel headers).
Worse, if the buffer is not suffice MSG_CTRUNC is set to indicate it
(which breaks some programs [1]).
This patch makes only 64 bit time_t recvmsg and recvmmsg to call
__convert_scm_timestamps. Also, the assumption to called it is changed
from __ASSUME_TIME64_SYSCALLS to __TIMESIZE != 64 since the setsockopt
might be called by libraries built without __TIME_BITS=64. The
MSG_CTRUNC is only set for the 64 bit symbols, it should happen only
if 64 bit time_t programs run older kernels.
linux: Fix ancillary 64-bit time timestamp conversion (BZ #28349, BZ#28350)
The __convert_scm_timestamps only updates the control message last
pointer for SOL_SOCKET type, so if the message control buffer contains
multiple ancillary message types the converted timestamp one might
overwrite a valid message.
The test checks if the extra ancillary space is correctly handled
by recvmsg/recvmmsg, where if there is no extra space for the 64-bit
time_t converted message the control buffer should be marked with
MSG_TRUNC. It also check if recvmsg/recvmmsg handle correctly multiple
ancillary data.
Checked on x86_64-linux and on i686-linux-gnu on both 5.11 and
4.15 kernel.
Check if the socket support 64-bit network packages timestamps
(SO_TIMESTAMP and SO_TIMESTAMPNS). This will be used on recvmsg
and recvmmsg tests to check if the timestamp should be generated.
Florian Weimer [Thu, 27 Jan 2022 15:03:58 +0000 (16:03 +0100)]
Fix glibc 2.34 ABI omission (missing GLIBC_2.34 in dynamic loader)
The glibc 2.34 release really should have added a GLIBC_2.34
symbol to the dynamic loader. With it, we could move functions such
as dlopen or pthread_key_create that work on process-global state
into the dynamic loader (once we have fixed a longstanding issue
with static linking). Without the GLIBC_2.34 symbol, yet another
new symbol version would be needed because old glibc will fail to
load binaries due to the missing symbol version in ld.so that newly
linked programs will require.
Noah Goldstein [Sun, 9 Jan 2022 22:02:28 +0000 (16:02 -0600)]
x86: Fix __wcsncmp_evex in strcmp-evex.S [BZ# 28755]
Fixes [BZ# 28755] for wcsncmp by redirecting length >= 2^56 to
__wcscmp_evex. For x86_64 this covers the entire address range so any
length larger could not possibly be used to bound `s1` or `s2`.
test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass.
Noah Goldstein [Sun, 9 Jan 2022 22:02:21 +0000 (16:02 -0600)]
x86: Fix __wcsncmp_avx2 in strcmp-avx2.S [BZ# 28755]
Fixes [BZ# 28755] for wcsncmp by redirecting length >= 2^56 to
__wcscmp_avx2. For x86_64 this covers the entire address range so any
length larger could not possibly be used to bound `s1` or `s2`.
test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass.
getcwd: Set errno to ERANGE for size == 1 (CVE-2021-3999)
No valid path returned by getcwd would fit into 1 byte, so reject the
size early and return NULL with errno set to ERANGE. This change is
prompted by CVE-2021-3999, which describes a single byte buffer
underflow and overflow when all of the following conditions are met:
- The buffer size (i.e. the second argument of getcwd) is 1 byte
- The current working directory is too long
- '/' is also mounted on the current working directory
Sequence of events:
- In sysdeps/unix/sysv/linux/getcwd.c, the syscall returns ENAMETOOLONG
because the linux kernel checks for name length before it checks
buffer size
- The code falls back to the generic getcwd in sysdeps/posix
- In the generic func, the buf[0] is set to '\0' on line 250
- this while loop on line 262 is bypassed:
while (!(thisdev == rootdev && thisino == rootino))
since the rootfs (/) is bind mounted onto the directory and the flow
goes on to line 449, where it puts a '/' in the byte before the
buffer.
- Finally on line 458, it moves 2 bytes (the underflowed byte and the
'\0') to the buf[0] and buf[1], resulting in a 1 byte buffer overflow.
- buf is returned on line 469 and errno is not set.
realpath: Set errno to ENAMETOOLONG for result larger than PATH_MAX [BZ #28770]
realpath returns an allocated string when the result exceeds PATH_MAX,
which is unexpected when its second argument is not NULL. This results
in the second argument (resolved) being uninitialized and also results
in a memory leak since the caller expects resolved to be the same as the
returned value.
Return NULL and set errno to ENAMETOOLONG if the result exceeds
PATH_MAX. This fixes [BZ #28770], which is CVE-2021-3998.
support: Add helpers to create paths longer than PATH_MAX
Add new helpers support_create_and_chdir_toolong_temp_directory and
support_chdir_toolong_temp_directory to create and descend into
directory trees longer than PATH_MAX.
Paul A. Clarke [Sat, 25 Sep 2021 14:57:15 +0000 (09:57 -0500)]
powerpc: Fix unrecognized instruction errors with recent binutils
Recent versions of binutils (with commit b25f942e18d6ecd7ec3e2d2e9930eb4f996c258a) stopped preserving "sticky"
options across a base `.machine` directive, nullifying the use of
passing "-many" through GCC to the assembler. As a result, some
instructions which were recognized even under older, more stringent
`.machine` directives become unrecognized instructions in that
context.
In `sysdeps/powerpc/tst-set_ppr.c`, the use of the `mfppr32` extended
mnemonic became unrecognized, as the default compilation with GCC for
32bit powerpc adds a `.machine ppc` in the resulting assembly, so the
command line option `-Wa,-many` is essentially ignored, and the ISA 2.06
instructions and mnemonics, like `mfppr32`, are unrecognized.
The compilation of `sysdeps/powerpc/tst-set_ppr.c` fails with:
Error: unrecognized opcode: `mfppr32'
Add appropriate `.machine` directives in the assembly to bracket the
`mfppr32` instruction.
Part of a 2019 fix (commit 9250e6610fdb0f3a6f238d2813e319a41fb7a810) to
the above test's Makefile to add `-many` to the compilation when GCC
itself stopped passing `-many` to the assember no longer has any effect,
so remove that.
Aurelien Jarno [Mon, 17 Jan 2022 18:41:40 +0000 (19:41 +0100)]
x86: use default cache size if it cannot be determined [BZ #28784]
In some cases (e.g QEMU, non-Intel/AMD CPU) the cache information can
not be retrieved and the corresponding values are set to 0.
Commit 2d651eb9265d ("x86: Move x86 processor cache info to
cpu_features") changed the behaviour in such case by defining the
__x86_shared_cache_size and __x86_data_cache_size variables to 0 instead
of using the default values. This cause an issue with the i686 SSE2
optimized bzero/routine which assumes that the cache size is at least
128 bytes, and otherwise tries to zero/set the whole address space minus
128 bytes.
Fix that by restoring the original code to only update
__x86_shared_cache_size and __x86_data_cache_size variables if the
corresponding cache sizes are not zero.
$ cat nptl/test-condattr-printers.out
Error: Response does not match the expected pattern.
Command: start
Expected pattern: main
Response: Temporary breakpoint 1 at 0x11d5: file test-condattr-printers.c, line 43.
Starting program: /export/build/gnu/tools-build/glibc-cet-gitlab/build-x86_64-linux/nptl/test-condattr-printers
This GDB supports auto-downloading debuginfo from the following URLs:
https://debuginfod.fedoraproject.org/
Enable debuginfod for this session? (y or [n])
Disable debuginfod to avoid GDB messages. This fixes BZ #28757.
Joseph Myers [Thu, 13 Jan 2022 22:18:13 +0000 (22:18 +0000)]
Update syscall lists for Linux 5.16
Linux 5.16 has one new syscall, futex_waitv. Update
syscall-names.list and regenerate the arch-syscall.h headers with
build-many-glibcs.py update-syscalls.
The configure check for CAN_USE_REGISTER_ASM_EBP tried to compile a
simple function that uses %ebp as an inline assembly operand. If
compilation failed, CAN_USE_REGISTER_ASM_EBP was set 0, which
eventually had these consequences:
(1) %ebx was avoided as an inline assembly operand, with an
assembler macro hack to avoid unnecessary register moves.
(2) %ebp was avoided as an inline assembly operand, using an
out-of-line syscall function for 6-argument system calls.
(1) is no longer needed for any GCC version that is supported for
building glibc. %ebx can be used directly as a register operand.
Therefore, this commit removes the %ebx avoidance completely. This
avoids the assembler macro hack, which turns out to be incompatible
with the current Systemtap probe macros (which switch to .altmacro
unconditionally).
(2) is still needed in many build configurations. The existing
configure check cannot really capture that because the simple function
succeeds to compile, while the full glibc build still fails.
Therefore, this commit removes the check, the CAN_USE_REGISTER_ASM_EBP
macro, and uses the out-of-line syscall function for 6-argument system
calls unconditionally.
Joseph Myers [Wed, 10 Nov 2021 15:21:19 +0000 (15:21 +0000)]
Update syscall lists for Linux 5.15
Linux 5.15 has one new syscall, process_mrelease (and also enables the
clone3 syscall for RV32). It also has a macro __NR_SYSCALL_MASK for
Arm, which is not a syscall but matches the pattern used for syscall
macro names.
Add __NR_SYSCALL_MASK to the names filtered out in the code dealing
with syscall lists, update syscall-names.list for the new syscall and
regenerate the arch-syscall.h headers with build-many-glibcs.py
update-syscalls.
Paul A. Clarke [Tue, 14 Sep 2021 18:13:33 +0000 (13:13 -0500)]
powerpc: Fix unrecognized instruction errors with recent GCC
Recent binutils commit b25f942e18d6ecd7ec3e2d2e9930eb4f996c258a
changes the behavior of `.machine` directives to override, rather
than augment, the base CPU. This can result in _reduced_ functionality
when, for example, compiling for default machine "power8", but explicitly
asking for ".machine power5", which loses Altivec instructions.
In tst-ucontext-ppc64-vscr.c, while the instructions provoking the new
error messages are bracketed by ".machine power5", which is ostensibly
Power ISA 2.03 (POWER5), the POWER5 processor did not support the
VSX subset, so these instructions are not recognized as "power5".
This test-case is the tzfile for Asuncion generated by
tzlib-2021e as follows, using the tzlib-2021e zic: "zic -d
DEST -r @1546300800 -L /dev/null -b slim
SOURCE/southamerica". Note that in its type 2 header, it
has two entries in its "time-types" array (types), but only
one entry in its "transition types" array (type_idxs).
* timezone/Makefile, timezone/tst-pr28707.c,
timezone/testdata/gen-XT5.sh: New test.
timezone: handle truncated timezones from tzcode-2021d and later (BZ #28707)
When using a timezone file with a truncated starting time,
generated by the zic in IANA tzcode-2021d a.k.a. tzlib-2021d
(also in tzlib-2021e; current as of this writing), glibc
asserts in __tzfile_read (on e.g. tzset() for this file) and
you may find lines matching "tzfile.c:435: __tzfile_read:
Assertion `num_types == 1' failed" in your syslog.
One example of such a file is the tzfile for Asuncion
generated by tzlib-2021e as follows, using the tzlib-2021e zic:
"zic -d DEST -r @1546300800 -L /dev/null -b slim
SOURCE/southamerica". Note that in its type 2 header, it has
two entries in its "time-types" array (types), but only one
entry in its "transition types" array (type_idxs).
This is valid and expected already in the published RFC8536, and
not even frowned upon: "Local time for timestamps before the
first transition is specified by the first time type (time type
0)" ... "every nonzero local time type index SHOULD appear at
least once in the transition type array". Note the "nonzero ...
index". Until the 2021d zic, index 0 has been shared by the
first valid transition but with 2021d it's separate, set apart
as a placeholder and only "implicitly" indexed. (A draft update
of the RFC mandates that the entry at index 0 is a placeholder
in this case, hence can no longer be shared.)
* time/tzfile.c (__tzfile_read): Don't assert when no transitions
are found.
Paul Eggert [Tue, 14 Sep 2021 05:49:45 +0000 (22:49 -0700)]
Fix subscript error with odd TZif file [BZ #28338]
* time/tzfile.c (__tzfile_compute): Fix unlikely off-by-one bug
that accessed before start of an array when an oddball-but-valid
TZif file was queried with an unusual time_t value.
maminjie [Mon, 20 Dec 2021 11:36:32 +0000 (19:36 +0800)]
Linux: Fix 32-bit vDSO for clock_gettime on powerpc32
When the clock_id is CLOCK_PROCESS_CPUTIME_ID or CLOCK_THREAD_CPUTIME_ID,
on the 5.10 kernel powerpc 32-bit, the 32-bit vDSO is executed successfully (
because the __kernel_clock_gettime in arch/powerpc/kernel/vdso32/gettimeofday.S
does not support these two IDs, the 32-bit time_t syscall will be used),
but tp32.tv_sec is equal to 0, causing the 64-bit time_t syscall to continue to be used,
resulting in two system calls.
It turned that the generic implementation of brk() does not work
for sparc, since on failure kernel will just return the previous
input value without setting the conditional register.
This patches adds back a sparc32 and sparc64 implementation removed
by 720480934ab9107.
Checked on sparc64-linux-gnu and sparcv9-linux-gnu.
powerpc64[le]: Allocate extra stack frame on syscall.S
The syscall function does not allocate the extra stack frame for scv like other
assembly syscalls using DO_CALL_SCV. So after commit d120fb9941 changed the
offset that is used to save LR, syscall ended up using an invalid offset,
causing regressions on powerpc64. So make sure the extra stack frame is
allocated in syscall.S as well to make it consistent with other uses of
DO_CALL_SCV and avoid similar issues in the future.
Tested on powerpc, powerpc64, and powerpc64le (with and without scv)
Aurelien Jarno [Wed, 15 Dec 2021 22:46:19 +0000 (23:46 +0100)]
elf: Fix tst-cpu-features-cpuinfo for KVM guests on some AMD systems [BZ #28704]
On KVM guests running on some AMD systems, the IBRS feature is reported
as a synthetic feature using the Intel feature, while the cpuinfo entry
keeps the same. Handle that by first checking the presence of the Intel
feature on AMD systems.
Florian Weimer [Fri, 17 Dec 2021 11:01:20 +0000 (12:01 +0100)]
nss: Use "files dns" as the default for the hosts database (bug 28700)
This matches what is currently in nss/nsswitch.conf. The new ordering
matches what most distributions use in their installed configuration
files.
It is common to add localhost to /etc/hosts because the name does not
exist in the DNS, but is commonly used as a host name.
With the built-in "dns [!UNAVAIL=return] files" default, dns is
searched first and provides an answer for "localhost" (NXDOMAIN).
We never look at the files database as a result, so the contents of
/etc/hosts is ignored. This means that "getent hosts localhost"
fail without a /etc/nsswitch.conf file, even though the host name
is listed in /etc/hosts.
Florian Weimer [Fri, 17 Dec 2021 10:48:41 +0000 (11:48 +0100)]
arm: Guard ucontext _rtld_global_ro access by SHARED, not PIC macro
Due to PIE-by-default, PIC is now defined in more cases. libc.a
does not have _rtld_global_ro, and statically linking setcontext
fails. SHARED is the right condition to use, so that libc.a
references _dl_hwcap instead of _rtld_global_ro.
For static PIE support, the !SHARED case would still have to be made
PIC. This patch does not achieve that.
Xi Ruoyao [Fri, 13 Aug 2021 16:01:14 +0000 (16:01 +0000)]
mips: increase stack alignment in clone to match the ABI
In "mips: align stack in clone [BZ #28223]"
(commit 1f51cd9a860ee45eee8a56fb2ba925267a2a7bfe) I made a mistake: I
misbelieved one "word" was 2-byte and "doubleword" should be 4-byte.
But in MIPS ABI one "word" is defined 32-bit (4-byte), so "doubleword" is
8-byte [1], and "quadword" is 16-byte [2].
Xi Ruoyao [Thu, 12 Aug 2021 20:31:59 +0000 (20:31 +0000)]
mips: align stack in clone [BZ #28223]
The MIPS O32 ABI requires 4 byte aligned stack, and the MIPS N64 and N32
ABI require 8 byte aligned stack. Previously if the caller passed an
unaligned stack to clone the the child misbehaved.
When running this test on the OpenRISC port I am working on this test
fails with a timeout. The test passes when being straced or debugged.
Looking at the code there seems to be a race condition in that:
1 main thread: calls xpthread_cancel
2 sub thread : receives cancel signal
3 sub thread : cleanup routine waits on barrier
4 main thread: re-inits barrier
5 main thread: waits on barrier
After getting to 5 the main thread and sub thread wait forever as the 2
barriers are no longer the same.
Removing the barrier re-init seems to fix this issue. Also, the barrier
does not need to be reinitialized as that is done by default.
Joseph Myers [Fri, 17 Sep 2021 19:24:14 +0000 (19:24 +0000)]
Use $(pie-default) with conformtest
My glibc bot showed that my conformtest changes fail the build of the
conformtest execution tests for x86_64-linux-gnu-static-pie, because
linking the newly built object with the newly built libc and the
associated options normally used for linking requires it to be built
as PIE. Add $(pie-default) to the compiler command used so that PIE
options are used when required.
There's a case for using the whole of $(CFLAGS-.o) (which includes
$(pie-default)), but that raises questions of any impact from using
optimization flags from CFLAGS in these tests. So for now just use
$(pie-default) as the key part of $(CFLAGS-.o) that's definitely
needed.
Tested with build-many-glibcs.py for x86_64-linux-gnu-static-pie.
Joseph Myers [Fri, 17 Sep 2021 13:12:10 +0000 (13:12 +0000)]
Run conform/ tests using newly built libc
Although the conform/ header tests are built using the headers of the
glibc under test, the execution tests from conformtest (a few tests of
the values of macros evaluating to string constants) are linked and
run with system libc, not the newly built libc.
Apart from preventing testing in cross environments, this can be a
problem even for native testing. Specifically, it can be useful to do
native testing when building with a cross compiler that links with a
libc that is not the system libc; for example, on x86_64, you can test
all three ABIs that way if the kernel support is present, even if the
host OS lacks 32-bit or x32 libraries or they are older than the
libraries in the sysroot used by the compiler used to build glibc.
This works for almost all tests, but not for these conformtest tests.
Arrange for conformtest to link and run test programs similarly to
other tests, with consequent refactoring of various variables in
Makeconfig to allow passing relevant parts of the link-time command
lines down to conformtest. In general, the parts of the link command
involving $@ or $^ are separated out from the parts that should be
passed to conformtest (the variables passed to conformtest still
involve various variables whose names involve $(@F), but those
variables simply won't be defined for the conformtest makefile rules
and I think their presence there is harmless).
This is also most of the support that would be needed to allow running
those tests of string constants for cross testing when test-wrapper is
defined. That will also need changes to where conformtest.py puts the
test executables, so it puts them in the main object directory
(expected to be shared with a test system in cross testing) rather
than /tmp (not expected to be shared) as at present.
Florian Weimer [Fri, 10 Dec 2021 04:14:24 +0000 (05:14 +0100)]
nptl: Add one more barrier to nptl/tst-create1
Without the bar_ctor_finish barrier, it was possible that thread2
re-locked user_lock before ctor had a chance to lock it. ctor then
blocked in its locking operation, xdlopen from the main thread
did not return, and thread2 was stuck waiting in bar_dtor:
Matheus Castanho [Tue, 26 Oct 2021 13:44:59 +0000 (10:44 -0300)]
powerpc64[le]: Fix CFI and LR save address for asm syscalls [BZ #28532]
Syscalls based on the assembly templates are missing CFI for r31, which gets
clobbered when scv is used, and info for LR is inaccurate, placed in the wrong
LOC and not using the proper offset. LR was also being saved to the callee's
frame, while the ABI mandates it to be saved to the caller's frame. These are
fixed by this commit.
Florian Weimer [Wed, 24 Nov 2021 07:59:54 +0000 (08:59 +0100)]
nptl: Do not set signal mask on second setjmp return [BZ #28607]
__libc_signal_restore_set was in the wrong place: It also ran
when setjmp returned the second time (after pthread_exit or
pthread_cancel). This is observable with blocked pending
signals during thread exit.
Florian Weimer [Wed, 10 Nov 2021 14:21:37 +0000 (15:21 +0100)]
s390: Use long branches across object boundaries (jgh instead of jh)
Depending on the layout chosen by the linker, the 16-bit displacement
of the jh instruction is insufficient to reach the target label.
Analysis of the linker failure was carried out by Nick Clifton.
Reviewed-by: Carlos O'Donell <carlos@redhat.com> Reviewed-by: Stefan Liebler <stli@linux.ibm.com>
(cherry picked from commit 98966749f2b418825ff2ea496a0ee89fe63d2cc8)
Florian Weimer [Fri, 5 Nov 2021 16:01:24 +0000 (17:01 +0100)]
elf: Earlier missing dynamic segment check in _dl_map_object_from_fd
Separated debuginfo files have PT_DYNAMIC with p_filesz == 0. We
need to check for that before the _dl_map_segments call because
that could attempt to write to mappings that extend beyond the end
of the file, resulting in SIGBUS.
Nikita Popov [Tue, 2 Nov 2021 08:21:42 +0000 (13:21 +0500)]
gconv: Do not emit spurious NUL character in ISO-2022-JP-3 (bug 28524)
Bugfix 27256 has introduced another issue:
In conversion from ISO-2022-JP-3 encoding, it is possible
to force iconv to emit extra NUL character on internal state reset.
To do this, it is sufficient to feed iconv with escape sequence
which switches active character set.
The simplified check 'data->__statep->__count != ASCII_set'
introduced by the aforementioned bugfix picks that case and
behaves as if '\0' character has been queued thus emitting it.
To eliminate this issue, these steps are taken:
* Restore original condition
'(data->__statep->__count & ~7) != ASCII_set'.
It is necessary since bits 0-2 may contain
number of buffered input characters.
* Check that queued character is not NUL.
Similar step is taken for main conversion loop.
Bundled test case follows following logic:
* Try to convert ISO-2022-JP-3 escape sequence
switching active character set
* Reset internal state by providing NULL as input buffer
* Ensure that nothing has been converted.
1. Define DL_RO_DYN_SECTION to initalize bootstrap_map.l_ld_readonly
before calling elf_get_dynamic_info to get dynamic info in bootstrap_map,
2. Define a single
static inline bool
dl_relocate_ld (const struct link_map *l)
{
/* Don't relocate dynamic section if it is readonly */
return !(l->l_ld_readonly || DL_RO_DYN_SECTION);
}
H.J. Lu [Thu, 16 Sep 2021 15:15:29 +0000 (08:15 -0700)]
ld.so: Replace DL_RO_DYN_SECTION with dl_relocate_ld [BZ #28340]
We can't relocate entries in dynamic section if it is readonly:
1. Add a l_ld_readonly field to struct link_map to indicate if dynamic
section is readonly and set it based on p_flags of PT_DYNAMIC segment.
2. Replace DL_RO_DYN_SECTION with dl_relocate_ld to decide if dynamic
section should be relocated.
3. Remove DL_RO_DYN_TEMP_CNT.
4. Don't use a static dynamic section to make readonly dynamic section
in vDSO writable.
5. Remove the temp argument from elf_get_dynamic_info.
Szabolcs Nagy [Wed, 15 Sep 2021 14:16:19 +0000 (15:16 +0100)]
elf: Avoid deadlock between pthread_create and ctors [BZ #28357]
The fix for bug 19329 caused a regression such that pthread_create can
deadlock when concurrent ctors from dlopen are waiting for it to finish.
Use a new GL(dl_load_tls_lock) in pthread_create that is not taken
around ctors in dlopen.
The new lock is also used in __tls_get_addr instead of GL(dl_load_lock).
The new lock is held in _dl_open_worker and _dl_close_worker around
most of the logic before/after the init/fini routines. When init/fini
routines are running then TLS is in a consistent, usable state.
In _dl_open_worker the new lock requires catching and reraising dlopen
failures that happen in the critical section.
The new lock is reinitialized in a fork child, to keep the existing
behaviour and it is kept recursive in case malloc interposition or TLS
access from signal handlers can retake it. It is not obvious if this
is necessary or helps, but avoids changing the preexisting behaviour.
The new lock may be more appropriate for dl_iterate_phdr too than
GL(dl_load_write_lock), since TLS state of an incompletely loaded
module may be accessed. If the new lock can replace the old one,
that can be a separate change.
This was found to be due to the kernel overwriting the stack space
allocated by the timex structure. The reason for the overwrite being
that the kernel timex has 64-bit fields and user space code only
allocates enough stack space for timex with 32-bit fields.
On 32-bit systems with TIMESIZE=64 __USE_TIME_BITS64 is not defined.
This causes the timex structure to use 32-bit fields with type
__syscall_slong_t.
This patch adjusts the ifdef condition to allow 32-bit systems with
TIMESIZE=64 to use the 64-bit long long timex definition.
@@ -856,6 +876,11 @@ no more namespaces available for dlmopen()"));
/* See if an error occurred during loading. */
if (__glibc_unlikely (exception.errstring != NULL))
{
+ /* Avoid keeping around a dangling reference to the libc.so link
+ map in case it has been cached in libc_map. */
+ if (!args.libc_already_loaded)
+ GL(dl_ns)[nsid].libc_map = NULL;
+
do_dlopen calls _dl_open with nsid == __LM_ID_CALLER (-2), which calls
dl_open_worker with args.nsid = nsid. dl_open_worker updates args.nsid
if it is __LM_ID_CALLER. After dl_open_worker returns, it is wrong to
use nsid.
Replace nsid with args.nsid after dl_open_worker returns. This fixes
BZ #27609.
Also note that the kernel commit 511ad531afd4090625def4d9aba1f5227bd44b8e
"s390/hwcaps: shorten HWCAP defines" has shortened the prefix of the macros
from "HWCAP_S390_" to "HWCAP_". For compatibility reasons, we do not
change the prefix in public glibc header file.
linux: Revert the use of sched_getaffinity on get_nproc (BZ #28310)
The use of sched_getaffinity on get_nproc and
sysconf (_SC_NPROCESSORS_ONLN) done in 903bc7dcc2acafc40 (BZ #27645)
breaks the top command in common hypervisor configurations and also
other monitoring tools.
The main issue using sched_getaffinity changed the symbols semantic
from system-wide scope of online CPUs to per-process one (which can
be changed with kernel cpusets or book parameters in VM).
This patch reverts mostly of the 903bc7dcc2acafc40, with the
exceptions:
* No more cached values and atomic updates, since they are inherent
racy.
* No /proc/cpuinfo fallback, since /proc/stat is already used and
it would require to revert more arch-specific code.
* The alloca is replace with a static buffer of 1024 bytes.
So the implementation first consult the sysfs, and fallbacks to procfs.
This patch simplifies the memory allocation code and uses the sched
routines instead of reimplement it. This still uses a stack
allocation buffer, so it can be used on malloc initialization code.
Linux currently supports at maximum of 4096 cpus for most architectures:
$ find -iname Kconfig | xargs git grep -A10 -w NR_CPUS | grep -w range
arch/alpha/Kconfig- range 2 32
arch/arc/Kconfig- range 2 4096
arch/arm/Kconfig- range 2 16 if DEBUG_KMAP_LOCAL
arch/arm/Kconfig- range 2 32 if !DEBUG_KMAP_LOCAL
arch/arm64/Kconfig- range 2 4096
arch/csky/Kconfig- range 2 32
arch/hexagon/Kconfig- range 2 6 if SMP
arch/ia64/Kconfig- range 2 4096
arch/mips/Kconfig- range 2 256
arch/openrisc/Kconfig- range 2 32
arch/parisc/Kconfig- range 2 32
arch/riscv/Kconfig- range 2 32
arch/s390/Kconfig- range 2 512
arch/sh/Kconfig- range 2 32
arch/sparc/Kconfig- range 2 32 if SPARC32
arch/sparc/Kconfig- range 2 4096 if SPARC64
arch/um/Kconfig- range 1 1
arch/x86/Kconfig-# [NR_CPUS_RANGE_BEGIN ... NR_CPUS_RANGE_END] range.
arch/x86/Kconfig- range NR_CPUS_RANGE_BEGIN NR_CPUS_RANGE_END
arch/xtensa/Kconfig- range 2 32
With x86 supporting 8192:
arch/x86/Kconfig
976 config NR_CPUS_RANGE_END
977 int
978 depends on X86_64
979 default 8192 if SMP && CPUMASK_OFFSTACK
980 default 512 if SMP && !CPUMASK_OFFSTACK
981 default 1 if !SMP
So using a maximum of 32k cpu should cover all cases (and I would
expect once we start to have many more CPUs that Linux would provide
a more straightforward way to query for such information).
A test is added to check if sched_getaffinity can successfully return
with large buffers.
This is an internal function meant to return the number of avaliable
processor where the process can scheduled, different than the
__get_nprocs which returns a the system available online CPU.
The Linux implementation currently only calls __get_nprocs(), which
in tuns calls sched_getaffinity.
Florian Weimer [Fri, 1 Oct 2021 16:16:41 +0000 (18:16 +0200)]
nptl: pthread_kill must send signals to a specific thread [BZ #28407]
The choice between the kill vs tgkill system calls is not just about
the TID reuse race, but also about whether the signal is sent to the
whole process (and any thread in it) or to a specific thread.
This was caught by the openposix test suite:
LTP: openposix test suite - FAIL: SIGUSR1 is member of new thread pendingset.
<https://gitlab.com/cki-project/kernel-tests/-/issues/764>
Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit eae81d70574e923ce3c59078b8df857ae192efa6)
Florian Weimer [Fri, 1 Oct 2021 16:16:41 +0000 (18:16 +0200)]
support: Add check for TID zero in support_wait_for_thread_exit
Some kernel versions (observed with kernel 5.14 and earlier) can list
"0" entries in /proc/self/task. This happens when a thread exits
while the task list is being constructed. Treat this entry as not
present, like the proposed kernel patch does:
[PATCH] procfs: Do not list TID 0 in /proc/<pid>/task
<https://lore.kernel.org/all/8735pn5dx7.fsf@oldenburg.str.redhat.com/>
Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 176c88f5214d8107d330971cbbfbbba5186a111f)
nptl: Avoid setxid deadlock with blocked signals in thread exit [BZ #28361]
As part of the fix for bug 12889, signals are blocked during
thread exit, so that application code cannot run on the thread that
is about to exit. This would cause problems if the application
expected signals to be delivered after the signal handler revealed
the thread to still exist, despite pthread_kill can no longer be used
to send signals to it. However, glibc internally uses the SIGSETXID
signal in a way that is incompatible with signal blocking, due to the
way the setxid handshake delays thread exit until the setxid operation
has completed. With a blocked SIGSETXID, the handshake can never
complete, causing a deadlock.
As a band-aid, restore the previous handshake protocol by not blocking
SIGSETXID during thread exit.
The new test sysdeps/pthread/tst-pthread-setuid-loop.c is based on
a downstream test by Martin Osvald.
Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 2849e2f53311b66853cb5159b64cba2bddbfb854)
It returns a range of file descriptor referring to the '/dev/null'
pathname. The function takes care of restarting the open range
if a file descriptor is found within the specified range and
also increases RLIMIT_NOFILE if required.
nptl: Fix type of pthread_mutexattr_getrobust_np, pthread_mutexattr_setrobust_np (bug 28036)
Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit f3e664563361dc17530113b3205998d1f19dc4d9)
nptl: pthread_kill needs to return ESRCH for old programs (bug 19193)
The fix for bug 19193 breaks some old applications which appear
to use pthread_kill to probe if a thread is still running, something
that is not supported by POSIX.
posix: Fix attribute access mode on getcwd [BZ #27476]
There is a GNU extension that allows to call getcwd(NULL, >0). It is
described in the documentation, but also directly in the unistd.h
header, just above the declaration.
Therefore the attribute access mode added in commit 06febd8c6705
is not correct. Drop it.
Joseph Myers [Tue, 14 Sep 2021 14:19:24 +0000 (14:19 +0000)]
Add MADV_POPULATE_READ and MADV_POPULATE_WRITE from Linux 5.14 to bits/mman-linux.h
Linux 5.14 adds constants MADV_POPULATE_READ and MADV_POPULATE_WRITE
(with the same values on all architectures). Add these to glibc's
bits/mman-linux.h.
Joseph Myers [Tue, 14 Sep 2021 13:51:58 +0000 (13:51 +0000)]
Update kernel version to 5.14 in tst-mman-consts.py
This patch updates the kernel version in the test tst-mman-consts.py
to 5.14. (There are no new MAP_* constants covered by this test in
5.14 that need any other header changes.)
Joseph Myers [Wed, 8 Sep 2021 12:42:06 +0000 (12:42 +0000)]
Update syscall lists for Linux 5.14
Linux 5.14 has two new syscalls, memfd_secret (on some architectures
only) and quotactl_fd. Update syscall-names.list and regenerate the
arch-syscall.h headers with build-many-glibcs.py update-syscalls.
Fix failing nss/tst-nss-files-hosts-long with local resolver
When a local resolver like unbound is listening on the IPv4 loopback
address 127.0.0.1, the nss/tst-nss-files-hosts-long test fails. This is
due to:
- the default resolver in the absence of resolv.conf being 127.0.0.1
- the default DNS NSS database configuration in the absence of
nsswitch.conf being 'hosts: dns [!UNAVAIL=return] file'
This causes the requests for 'test4' and 'test6' to first be sent to the
local resolver, which responds with NXDOMAIN in the likely case those
records do no exist. In turn that causes the access to /etc/hosts to be
skipped, which is the purpose of that test.
Fix that by providing a simple nsswitch.conf file forcing access to
/etc/hosts for that test. I have tested that the only changed result in
the testsuite is that test.
iconvconfig: Fix behaviour with --prefix [BZ #28199]
The consolidation of configuration parsing broke behaviour with
--prefix, where the prefix bled into the modules cache. Accept a
prefix which, when non-NULL, is prepended to the path when looking for
configuration files but only the original directory is added to the
modules cache.
This has no effect on the codegen of gconv_conf since it passes NULL.
Reported-by: Patrick McCarty <patrick.mccarty@intel.com> Reported-by: Michael Hudson-Doyle <michael.hudson@canonical.com> Reviewed-by: Andreas Schwab <schwab@linux-m68k.org>
(cherry picked from commit 43cea6d5652b6b9e61ac6ecc69419c909b504f47)
nptl: Fix race between pthread_kill and thread exit (bug 12889)
A new thread exit lock and flag are introduced. They are used to
detect that the thread is about to exit or has exited in
__pthread_kill_internal, and the signal is not sent in this case.
The test sysdeps/pthread/tst-pthread_cancel-select-loop.c is derived
from a downstream test originally written by Marek Polacek.
nptl: pthread_kill, pthread_cancel should not fail after exit (bug 19193)
This closes one remaining race condition related to bug 12889: if
the thread already exited on the kernel side, returning ESRCH
is not correct because that error is reserved for the thread IDs
(pthread_t values) whose lifetime has ended. In case of a
kernel-side exit and a valid thread ID, no signal needs to be sent
and cancellation does not have an effect, so just return 0.
sysdeps/pthread/tst-kill4.c triggers undefined behavior and is
removed with this commit.
Jiaxun Yang [Tue, 7 Sep 2021 05:31:42 +0000 (13:31 +0800)]
MIPS: Setup errno for {f,l,}xstat
{f,l,}xstat stub for MIPS is using INTERNAL_SYSCALL
to do xstat syscall for glibc ver, However it leaves
errno untouched and thus giving bad errno output.
Setup errno properly when syscall returns non-zero.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 66016ec8aeefd40e016d7040d966484c764b0e9c)