From: Philippe Waroquiers Date: Tue, 2 Jul 2019 19:43:41 +0000 (+0200) Subject: Fix 409141 and 409367: valgrind hangs or loops when a process sends a signal to itself. X-Git-Tag: VALGRIND_3_16_0~253 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=63a9f0793113fd5d828ea7b6183812ad71f924f1;p=thirdparty%2Fvalgrind.git Fix 409141 and 409367: valgrind hangs or loops when a process sends a signal to itself. The loop scenario: The main thread sends a signal 15 to another thread, and then calls the exit syscall. The exit syscall done by thread 1 marks all threads as needing to die using exitreason VgSrc_ExitProcess. The main thread then gets all other threads out of their blocking syscall to let them die, and then "busy polls" for all other threads to disappear. However, when the second thread is out of its syscall, it gets the signal 15, which is a fatal signal. This second thread then changes the exit reason of all threads to VgSrc_FatalSig, and itself starts to busy poll for all other threads to disappear. This then loops forever. The fix for this consists in not handling the fatal signal in the second thread when the process is already busy dying. Effectively, the exit syscall should be processed "atomically": either the process is running, or it is dead once the syscall is done. Under valgrind, when threads are marked as being ' VgSrc_ExitProcess', the guest process should be considered as dead. Valgrind has still to do the cleanup, the endof run report, etc but otherwise should not let any more user code to run. So, signal should not be handled anymore once the 'exit syscall' has marked all threads as VgSrc_ExitProcess. The hang scenario: The main thread sends a signal 9 (KILL) to itself. When running natively, this directly kills the process, without giving any opportunity to run some user code. Valgrind intercepts the kill syscall, and detects that this is a fatal signal. The main thread was then dying, but was not getting the other threads out of their syscall (to let them die). The fix for this is to have the 'handling' of the signal 9 sent to a thread of the process to directly make the process die, by getting all threads out of syscall. Note that the previous code was trying to have this action done by the thread to which the signal 9 was sent. This was too tricky to keep (causing other race conditions between the main thread sending the signal 9 e.g. exiting and the other thread supposed to die). As it is not particularly critical to have the signal 9 'handled' by a specific thread, the thread that is sending the signal 9 is the one doing the work to cleanup and terminate the process. --- diff --git a/.gitignore b/.gitignore index 640b8c65e7..a0dedb8c56 100644 --- a/.gitignore +++ b/.gitignore @@ -1419,6 +1419,7 @@ /none/tests/pth_simple_mutex /none/tests/pth_simple_threads /none/tests/pth_specific +/none/tests/pth_self_kill /none/tests/pth_stackalign /none/tests/pth_term_signal /none/tests/pth_yield diff --git a/NEWS b/NEWS index 6406035610..7217164776 100644 --- a/NEWS +++ b/NEWS @@ -56,6 +56,9 @@ where XXXXXX is the bug number as listed below. 408091 Missing pkey syscalls 408414 Add support for missing for preadv2 and pwritev2 syscalls 404406 s390x: z14 miscellaneous instructions not implemented +409141 Valgrind hangs when SIGKILLed +409367 exit_group() after signal to thread waiting in futex() causes hangs + n-i-bz Fix minor one time leaks in dhat. n-i-bz Add --run-cxx-freeres=no in outer args to avoid inner crashes. n-i-bz Add support for the Linux io_uring system calls diff --git a/coregrind/m_signals.c b/coregrind/m_signals.c index 9ba6efc4cc..008fff98b1 100644 --- a/coregrind/m_signals.c +++ b/coregrind/m_signals.c @@ -2426,8 +2426,8 @@ void async_signalhandler ( Int sigNo, sigNo, tid, info->si_code, VG_(name_of_VgSchedReturnCode)(tst->exitreason)); - /* */ - if (tst->exitreason == VgSrc_FatalSig) + /* See similar logic in VG_(poll_signals). */ + if (tst->exitreason != VgSrc_None) resume_scheduler(tid); /* Update thread state properly. The signal can only have been @@ -2951,10 +2951,11 @@ void VG_(poll_signals)(ThreadId tid) ThreadState *tst = VG_(get_ThreadState)(tid); vki_sigset_t saved_mask; - if (tst->exitreason == VgSrc_FatalSig) { - /* This task has been requested to die due to a fatal signal - received by the process. So, we cannot poll new signals, - as we are supposed to die asap. If we would poll and deliver + if (tst->exitreason != VgSrc_None ) { + /* This task has been requested to die (e.g. due to a fatal signal + received by the process, or because of a call to exit syscall). + So, we cannot poll new signals, as we are supposed to die asap. + If we would poll and deliver a new (maybe fatal) signal, this could cause a deadlock, as this thread would believe it has to terminate the other threads and wait for them to die, while we already have a thread doing diff --git a/coregrind/m_syswrap/syswrap-generic.c b/coregrind/m_syswrap/syswrap-generic.c index 01191f634b..ab96648583 100644 --- a/coregrind/m_syswrap/syswrap-generic.c +++ b/coregrind/m_syswrap/syswrap-generic.c @@ -3755,20 +3755,26 @@ Bool ML_(do_sigkill)(Int pid, Int tgid) if (tgid != -1 && tst->os_state.threadgroup != tgid) return False; /* not the right thread group */ - /* Check to see that the target isn't already exiting. */ - if (!VG_(is_exiting)(tid)) { - if (VG_(clo_trace_signals)) - VG_(message)(Vg_DebugMsg, - "Thread %u being killed with SIGKILL\n", - tst->tid); - - tst->exitreason = VgSrc_FatalSig; - tst->os_state.fatalsig = VKI_SIGKILL; - - if (!VG_(is_running_thread)(tid)) - VG_(get_thread_out_of_syscall)(tid); - } - + /* Fatal SIGKILL sent to one of our threads. + "Handle" the signal ourselves, as trying to have tid + handling the signal causes termination problems (see #409367 + and #409141). + Moreover, as a process cannot do anything when receiving SIGKILL, + it is not particularly crucial that "tid" does the work to + terminate the process. */ + + if (VG_(clo_trace_signals)) + VG_(message)(Vg_DebugMsg, + "Thread %u %s being killed with SIGKILL, running tid: %u\n", + tst->tid, VG_(name_of_ThreadStatus) (tst->status), VG_(running_tid)); + + if (!VG_(is_running_thread)(tid)) + tst = VG_(get_ThreadState)(VG_(running_tid)); + VG_(nuke_all_threads_except) (VG_(running_tid), VgSrc_FatalSig); + VG_(reap_threads)(VG_(running_tid)); + tst->exitreason = VgSrc_FatalSig; + tst->os_state.fatalsig = VKI_SIGKILL; + return True; } diff --git a/none/tests/Makefile.am b/none/tests/Makefile.am index 37ac4f087d..b63c8d97c3 100644 --- a/none/tests/Makefile.am +++ b/none/tests/Makefile.am @@ -163,6 +163,8 @@ EXTRA_DIST = \ pth_mutexspeed.stdout.exp pth_mutexspeed.vgtest \ pth_once.stderr.exp pth_once.stdout.exp pth_once.vgtest \ pth_rwlock.stderr.exp pth_rwlock.vgtest \ + pth_self_kill_9.stderr.exp pth_self_kill_9.vgtest \ + pth_self_kill_15_other.stderr.exp pth_self_kill_15_other.vgtest \ pth_stackalign.stderr.exp \ pth_stackalign.stdout.exp pth_stackalign.vgtest \ pth_2sig.stderr.exp-linux pth_2sig.stderr.exp-solaris pth_2sig.vgtest \ @@ -227,7 +229,7 @@ check_PROGRAMS = \ pselect_sigmask_null \ pth_atfork1 pth_blockedsig pth_cancel1 pth_cancel2 pth_cvsimple \ pth_empty pth_exit pth_exit2 pth_mutexspeed pth_once pth_rwlock \ - pth_stackalign pth_2sig pth_term_signal\ + pth_self_kill pth_stackalign pth_2sig pth_term_signal\ rcrl readline1 \ require-text-symbol \ res_search resolv \ @@ -315,6 +317,7 @@ pth_mutexspeed_LDADD = -lpthread pth_once_LDADD = -lpthread pth_rwlock_LDADD = -lpthread pth_rwlock_CFLAGS = $(AM_CFLAGS) +pth_self_kill_LDADD = -lpthread pth_stackalign_LDADD = -lpthread pth_2sig_LDADD = -lpthread pth_term_signal_LDADD = -lpthread diff --git a/none/tests/pth_self_kill.c b/none/tests/pth_self_kill.c new file mode 100644 index 0000000000..98adbadcf4 --- /dev/null +++ b/none/tests/pth_self_kill.c @@ -0,0 +1,44 @@ +#include +#include +#include +#include +#include + +/* This reproduces bugs 409141 and 409367. + valgrind ./pth_self_kill 9 was hanging. + + valgrind ./pth_self_kill 15 killotherthread was looping. +*/ + +void *t(void *p) +{ + sleep (200); + printf ("error: supposed to die without printing this\n"); + exit (5); + return NULL; +} + +int main(int argc, char **argv) +{ + pthread_t thr; + + if (argc <= 1) + { + printf + ("usage: pth_self_kill SIGNALNR [killotherthread] [sleepafterkill]\n"); + exit (1); + } + + int s = atoi(argv[1]); + + pthread_create(&thr, NULL, t, NULL); + sleep (1); + if (argc > 2) + { + pthread_kill(thr, s); + if (argc > 3) + sleep (2); + } + else + raise(s); +} diff --git a/none/tests/pth_self_kill_15_other.stderr.exp b/none/tests/pth_self_kill_15_other.stderr.exp new file mode 100644 index 0000000000..e69de29bb2 diff --git a/none/tests/pth_self_kill_15_other.vgtest b/none/tests/pth_self_kill_15_other.vgtest new file mode 100644 index 0000000000..4141869176 --- /dev/null +++ b/none/tests/pth_self_kill_15_other.vgtest @@ -0,0 +1,3 @@ +prog: pth_self_kill +args: 15 killotherthread +vgopts: -q diff --git a/none/tests/pth_self_kill_9.stderr.exp b/none/tests/pth_self_kill_9.stderr.exp new file mode 100644 index 0000000000..e69de29bb2 diff --git a/none/tests/pth_self_kill_9.vgtest b/none/tests/pth_self_kill_9.vgtest new file mode 100644 index 0000000000..4bed473e2d --- /dev/null +++ b/none/tests/pth_self_kill_9.vgtest @@ -0,0 +1,3 @@ +prog: pth_self_kill +args: 9 +vgopts: -q