Tom de Vries [Tue, 10 Dec 2024 19:30:05 +0000 (20:30 +0100)]
[gdb/testsuite] Use setVariable in gdb.dap/scopes.exp
The test-case gdb.dap/scopes.exp contains the following outdated comment:
...
# setVariable isn't implemented yet, so use the register name.
...
Now that setVariable is implemented, use it to set variable scalar, and remove
the bit that sets the first register. That part is known to fail on s390x,
because the first register isn't writeable [1].
Tested on x86_64-linux.
Suggested-By: Tom Tromey <tom@tromey.com>
Approved-By: Tom Tromey <tom@tromey.com>
[1] https://sourceware.org/pipermail/gdb-patches/2024-December/213823.html
The problem is that the test-case expects three scopes:
...
lassign $scopes scope reg_scope return_scope
...
but the return_scope is missing because this doesn't work:
...
$ gdb -q -batch outputs/gdb.dap/step-out/step-out \
-ex "b function_breakpoint_here" \
-ex run \
-ex finish
...
Value returned has type: struct result. Cannot determine contents
...
This is likely caused by a problem in gdb, but there's nothing wrong with the
DAP support.
Fix this by:
- allowing two scopes, and
- declaring the tests of return_scope unsupported.
Tom de Vries [Tue, 10 Dec 2024 10:48:33 +0000 (11:48 +0100)]
[gdb/testsuite] Fix fails in gdb.python/py-arch-reg-groups.exp
Since commit e69d35f45e0 ("Use ui-out table in "maint print reggroups""),
test-case gdb.python/py-arch-reg-groups.exp fails with check-read1:
...
FAIL: $exp: Same number of registers groups found
FAIL: $exp: all register groups match
...
Fix this by adding a gdb_test_multiple clause that matches the command.
WANG Xuerui [Sat, 19 Oct 2024 14:11:52 +0000 (22:11 +0800)]
LoongArch: Default to a maximum page size of 64KiB
As per the spec (Section 7.5.10, LoongArch Reference Manual Vol. 1),
LoongArch machines are not limited in page size choices, and currently
page sizes of 4KiB, 16KiB and 64KiB are supported by mainline Linux.
While 16KiB is the most common, the current BFD code says it is the
maximum; this is not correct, and as an effect, almost all existing
binaries are incompatible with a 64KiB kernel because the sections are
not sufficiently aligned, while being totally fine otherwise.
This is needlessly complicating integration testing [1].
This patch fixes the inconsistency, and also brings BFD behavior in line
with that of LLD [2].
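As a rough illustration of why a binary laid out for 16KiB pages can be
rejected by a 64KiB kernel, here is a minimal sketch of the usual ELF
loading constraint that a loadable segment's file offset and virtual
address agree modulo the page size. The segment values are hypothetical,
chosen only to resemble a layout produced with a 16KiB maximum page size;
they are not taken from a real LoongArch binary.

```python
KIB = 1024


def congruent_mod_page(p_offset, p_vaddr, page_size):
    """True if file offset and virtual address agree modulo page_size."""
    return p_offset % page_size == p_vaddr % page_size


# Hypothetical PT_LOAD segment placed by a linker that assumed 16KiB pages.
P_OFFSET = 0x3E00
P_VADDR = 0x120007E00

for page_size in (4 * KIB, 16 * KIB, 64 * KIB):
    ok = congruent_mod_page(P_OFFSET, P_VADDR, page_size)
    print(f"{page_size // KIB:>2}KiB pages: {'ok' if ok else 'misaligned'}")
```

With these made-up numbers the segment is fine for 4KiB and 16KiB pages
and only fails the check for 64KiB pages.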
bfd/
* elfnn-loongarch.c (ELF_MAXPAGESIZE): Bump to 64KiB.
(ELF_MINPAGESIZE): Define as 4KiB.
(ELF_COMMONPAGESIZE): Define as 16KiB.
ld/
* testsuite/ld-loongarch-elf/64_pcrel.d: Update assertions after
changing the target max page size to 64KiB.
* testsuite/ld-loongarch-elf/data-got.d: Likewise.
* testsuite/ld-loongarch-elf/desc-relex.d: Likewise.
* testsuite/ld-loongarch-elf/relax-align-ignore-start.d: Likewise.
* testsuite/ld-loongarch-elf/tlsdesc_abs.d: Make the fuzzy match work
as intended by not checking exact instruction words.
* testsuite/ld-loongarch-elf/tlsdesc_extreme.d: Likewise.
Peter Bergner [Mon, 9 Dec 2024 22:32:08 +0000 (17:32 -0500)]
PowerPC: Disallow r0 as a base register for the hashst and hashchk insns
Using r0 as a base address register in the ROP hashst and hashchk instructions
is invalid. Modify the assembler to catch that illegal use and emit an error.
opcodes/
* ppc-opc.c (insert_ras): Update error message and function comment.
(powerpc_opcodes) <hashst, hashstp, hashchk, hashchkp>: Use RAS.
Tom Tromey [Fri, 15 Nov 2024 16:29:27 +0000 (09:29 -0700)]
Introduce NoOpStringPrinter
We discovered that attempting to print a very large string-like array
would succeed on the CLI, but in DAP would cause the "variables"
request to fail with:
value requires 67038491 bytes, which is more than max-value-size
This turns out to be a limitation in Value.format_string, which
de-lazy-ifies the value.
This patch fixes this problem by introducing a new NoOpStringPrinter
class, and then using it for string-like values. This printer returns
a lazy string, which solves the problem.
Note there are some special cases where we do not want to return a
lazy string. I've documented these in the code. I considered making
gdb.Value.lazy_string handle these cases -- for example it could
return 'self' rather than a lazy string in some situations -- but this
approach was simpler.
Tom Tromey [Tue, 19 Nov 2024 14:34:26 +0000 (07:34 -0700)]
Clean up 0-length handling in gdbpy_create_lazy_string_object
gdbpy_create_lazy_string_object will throw an exception if you pass it
a NULL pointer without also setting length=0 -- the default,
length==-1, will fail. This seems bizarre. Furthermore, it doesn't
make sense to do this check for array types, as an array can have a
zero length. This patch cleans up the check and makes it specific to
TYPE_CODE_PTR.
Tom Tromey [Mon, 18 Nov 2024 20:47:22 +0000 (13:47 -0700)]
Reject non-string types in gdb.Value.lazy_string
Currently, gdb.Value.lazy_string will allow the conversion of any
object to a "lazy string". However, this was never the intent and is
weird besides. This patch changes this code to correctly throw an
exception in the non-matching cases.
Tom Tromey [Wed, 20 Nov 2024 13:50:17 +0000 (06:50 -0700)]
Fix error check in gdb_py_test_silent_cmd
I added a new test using gdb_py_test_silent_cmd, and then was
surprised to find out that the new test passed -- it caused a Python
exception and I had expected it to fail. This patch fixes this proc
to detect this situation and fail.
Tom Tromey [Tue, 12 Nov 2024 20:07:46 +0000 (13:07 -0700)]
Omit artificial symbols from DAP variables response
While testing DAP, we found a situation where a compiler-generated
variable caused the "variables" request to fail -- the variable in
question being an apparent 67-megabyte string.
It seems to me that artificial variables like this aren't interesting
to DAP users, and the gdb CLI omits these as well.
This patch changes DAP to omit these variables, adding a new
gdb.Symbol.is_artificial attribute to make this possible.
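The filtering described above can be sketched as follows. Only the
is_artificial attribute name comes from this patch; FakeSymbol is a
stand-in used so the example runs outside gdb, and visible_symbols is a
hypothetical helper, not the function used in the DAP code.

```python
from dataclasses import dataclass


@dataclass
class FakeSymbol:
    """Stand-in for gdb.Symbol; only the attributes used below are modelled."""

    name: str
    is_artificial: bool = False


def visible_symbols(symbols):
    """Keep only the symbols a user would expect to see in a scope."""
    return [sym for sym in symbols if not sym.is_artificial]


frame_symbols = [
    FakeSymbol("argc"),
    FakeSymbol("compiler_temp", is_artificial=True),
    FakeSymbol("argv"),
]
print([sym.name for sym in visible_symbols(frame_symbols)])
```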
Tom Tromey [Wed, 20 Nov 2024 20:04:27 +0000 (13:04 -0700)]
Defer DAP launch command until after configurationDone
PR dap/32090 points out that gdb's DAP "launch" sequencing is
incorrect. The current approach (which is itself a 2nd
implementation...) was based on a misreading of the spec. The spec
has since been clarified here:
The clarification here is that a client is free to send the "launch"
(or "attach") request at any point after the "initialized" event has
been sent by gdb. However, the "launch" does not cause any action to
be taken -- and does not send a response -- until after
"configurationDone" has been seen.
This patch implements this by arranging for the launch and attach
commands to return a DeferredRequest object.
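The sequencing can be pictured with a toy model. This is only a sketch
under stated assumptions: ToyServer and its fields are invented for the
illustration, the DeferredRequest here is a bare placeholder rather than
gdb's class, and the order in which the "configurationDone" and "launch"
responses go out is an assumption of the sketch.

```python
class DeferredRequest:
    """Placeholder: a request whose response is withheld for now."""

    def __init__(self, seq, command):
        self.seq = seq
        self.command = command


class ToyServer:
    """Hypothetical server that records which responses it has sent."""

    def __init__(self):
        self.sent = []      # (seq, command) pairs, in sending order
        self.pending = []   # deferred requests awaiting configurationDone

    def handle(self, seq, command):
        if command in ("launch", "attach"):
            # Accepted at any point, but no action and no response yet.
            self.pending.append(DeferredRequest(seq, command))
            return
        self.sent.append((seq, command))
        if command == "configurationDone":
            # Now the deferred requests may run and be answered.
            for req in self.pending:
                self.sent.append((req.seq, req.command))
            self.pending.clear()


server = ToyServer()
server.handle(1, "initialize")
server.handle(2, "launch")
server.handle(3, "setBreakpoints")
server.handle(4, "configurationDone")
print(server.sent)
```

The "launch" request is received second but answered last, after
"configurationDone" has been seen.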
All the tests needed updates. I've also added a new test that checks
that the deferred "launch" request can be cancelled. (Note that the
cancellation is lazy -- it also waits until configurationDone is seen.
This could be fixed, but I was not sure whether it is important to do
so.)
Finally, the "launch" command has a somewhat funny sequencing now.
Simply sending the command and waiting for a response yielded strange
results if the inferior did not stop -- in this case, the response was
never sent. So now, the command is split into two parts, with some
setup being done synchronously (for better error propagation) and the
actual "run" being done async.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32090
Reviewed-by: Kévin Le Gouguec <legouguec@adacore.com>
Tom Tromey [Wed, 20 Nov 2024 21:56:38 +0000 (14:56 -0700)]
Add DAP deferred requests
This adds a new "deferred request" capability to DAP. The idea here
is that if a request returns a DeferredRequest object, then no
response is sent immediately to the client. Instead, the request is
pending until the deferred request is rescheduled.
Some minor refactorings, particularly in cancellation, were needed to
make this work.
There's no use of this in the tree yet -- that is the next patch.
Reviewed-by: Kévin Le Gouguec <legouguec@adacore.com>
Tom Tromey [Wed, 20 Nov 2024 20:29:39 +0000 (13:29 -0700)]
Allow cancellation of DAP-thread requests
This patch started as an attempt to fix the comment in
CancellationHandler.cancel, but while working on it I found that the
code could be improved as well.
The current DAP cancellation code only handles the case where work is
done on the gdb thread -- by checking for cancellation in
interruptable_region. This means that if a request is handled
completely in the DAP thread, then cancellation will never work.
Now, this isn't a bug per se. DAP doesn't actually require that
cancellation succeed. In fact, I think it can't, because cancellation
is inherently racy.
However, a coming patch will add a sort of "pending" request, and it
would be nice if that were cancellable before any commands are sent to
the gdb thread.
No test in this patch, but one will arrive at the end of the series.
Reviewed-by: Kévin Le Gouguec <legouguec@adacore.com>
Tom Tromey [Wed, 20 Nov 2024 19:46:53 +0000 (12:46 -0700)]
Refactor CancellationHandler in DAP
This refactors the DAP CancellationHandler to be a context manager,
and reorganizes the caller to use this. This is a bit more robust and
also simplifies a subsequent patch in this series.
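For readers unfamiliar with the pattern, here is a minimal sketch of a
cancellation handler used as a context manager. The class and method
names (ToyCancellationHandler, request, check_cancel, RequestCancelled)
are made up for the example and do not reflect the actual DAP code; the
point is only that the bookkeeping is undone on every exit path.

```python
import threading
from contextlib import contextmanager


class RequestCancelled(Exception):
    """Raised when the in-flight request has been cancelled."""


class ToyCancellationHandler:
    def __init__(self):
        self._lock = threading.Lock()
        self.in_flight = None
        self._cancelled = set()

    @contextmanager
    def request(self, seq):
        """Mark request SEQ as in flight for the duration of the block."""
        with self._lock:
            self.in_flight = seq
        try:
            yield self
        finally:
            # Runs on normal exit and on exceptions alike.
            with self._lock:
                self.in_flight = None
                self._cancelled.discard(seq)

    def cancel(self, seq):
        with self._lock:
            self._cancelled.add(seq)

    def check_cancel(self, seq):
        with self._lock:
            if seq in self._cancelled:
                raise RequestCancelled(seq)


handler = ToyCancellationHandler()
with handler.request(7):
    handler.check_cancel(7)  # not cancelled, so this is a no-op
print(handler.in_flight)
```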
Reviewed-by: Kévin Le Gouguec <legouguec@adacore.com>
Tom Tromey [Wed, 20 Nov 2024 18:13:28 +0000 (11:13 -0700)]
Add call_function_later to DAP
This adds a new call_function_later API to DAP. This arranges to run
a function after the current request has completed. This isn't used
yet, but will be at the end of this series.
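A sketch of the idea, not of the real implementation: only the name
call_function_later comes from this patch. ToyRequestLoop, run_request
and the choice to run the queued functions after the response is logged
are assumptions made for the illustration.

```python
class ToyRequestLoop:
    """Hypothetical request loop that supports post-request callbacks."""

    def __init__(self):
        self._later = []
        self.log = []

    def call_function_later(self, fn):
        """Queue FN to run once the current request has completed."""
        self._later.append(fn)

    def run_request(self, name, body):
        body()
        self.log.append(f"response:{name}")
        # Swap the queue out first so callbacks may queue further work.
        later, self._later = self._later, []
        for fn in later:
            fn()


loop = ToyRequestLoop()
loop.run_request(
    "launch",
    lambda: loop.call_function_later(lambda: loop.log.append("deferred work")),
)
print(loop.log)
```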
Reviewed-by: Kévin Le Gouguec <legouguec@adacore.com>
Tom Tromey [Wed, 20 Nov 2024 18:07:05 +0000 (11:07 -0700)]
Reimplement DAP delayed events
This patch changes how delayed events are implemented in DAP. The new
implementation makes it simpler to add a delayed function call, which
will be needed by the final patch in this series.
Reviewed-by: Kévin Le Gouguec <legouguec@adacore.com>
Tom Tromey [Thu, 21 Nov 2024 19:32:53 +0000 (12:32 -0700)]
Reimplement DAP's stopAtBeginningOfMainSubprogram
Right now, stopAtBeginningOfMainSubprogram is implemented "by hand",
but then later the launch function uses "starti" to implement
stopOnEntry. This patch unifies this code and rewrites it to use
"start" when appropriate.
Reviewed-by: Kévin Le Gouguec <legouguec@adacore.com>
Tom de Vries [Mon, 9 Dec 2024 17:19:36 +0000 (18:19 +0100)]
[gdb/symtab] Apply workaround for PR gas/31115 a bit more
In commit 8a61ee551ce ("[gdb/symtab] Workaround PR gas/31115"), I applied a
workaround for PR gas/31115 in read_func_scope, fixing test-case
gdb.arch/pr25124.exp.
Recently I noticed that the test-case is failing again.
Fix this by factoring out the workaround into a new function fixup_low_high_pc
and applying it in dwarf2_die_base_address.
While we're at it, do the same in dwarf2_record_block_ranges.
Tested on arm-linux with target boards unix/-marm and unix/-mthumb.
Reviewed-By: Alexandra Petlanova Hajkova <ahajkova@redhat.com>
Tom de Vries [Mon, 9 Dec 2024 14:49:44 +0000 (15:49 +0100)]
[gdb/syscalls] Generate aarch64-linux.xml.in in update-linux-from-src.sh
Currently aarch64-linux.xml.in is skipped by update-linux-from-src.sh:
...
$ ./update-linux-from-src.sh ~/upstream/linux-stable.git/
Skipping aarch64-linux.xml.in, no syscall.tbl
...
$
...
and instead we use update-linux.sh.
This works fine, but requires an aarch64 system with recent system headers,
which makes it harder to pick up the latest changes in the linux kernel.
Fix this by updating ./update-linux-from-src.sh to:
- build the linux kernel headers for aarch64
- use update-linux.sh with those headers to generate
aarch64-linux.xml.in.
Regenerating aarch64-linux.xml.in using current trunk of linux-stable gives me
these changes:
...
+ <syscall name="setxattrat" number="463"/>
+ <syscall name="getxattrat" number="464"/>
+ <syscall name="listxattrat" number="465"/>
+ <syscall name="removexattrat" number="466"/>
...
which are the same changes I see for the other architectures.
Note that the first step, building the linux kernel headers, is a cross build
and should work on any architecture.
But the second step, update-linux.sh, uses plain gcc rather than a cross-gcc,
so there is scope for problems, but we seem to get away with this on
x86_64-linux.
So, while we could constrain this to only generate aarch64-linux.xml.in on
aarch64-linux, I'm leaving this unconstrained.
For aarch64-linux.xml.in, this doesn't matter much to me because I got an
aarch64-linux system.
But I don't have a loongarch system, and the same approach seems to work
there. I'm leaving this for a follow-up patch though.
Tested on aarch64-linux and x86_64-linux. Verified with shellcheck.
Mark Wielaard [Sat, 7 Dec 2024 00:37:53 +0000 (01:37 +0100)]
Include gdbsupport/gdb_vecs.h in gdb/s390-linux-nat.c
Commit c8889b913175 ("gdb, gdbserver, gdbsupport: remove some unused
gdb_vecs.h includes") removed gdbsupport/gdb_vecs.h from various
header files. This caused a compile issue for gdb/s390-linux-nat.c:
../../binutils-gdb/gdb/s390-linux-nat.c: In member function ‘virtual int s390_linux_nat_target::remove_watchpoint(CORE_ADDR, int, target_hw_bp_type, expression*)’:
../../binutils-gdb/gdb/s390-linux-nat.c:875:11: error: ‘unordered_remove’ was not declared in this scope
875 | unordered_remove (state->watch_areas, ix);
| ^~~~~~~~~~~~~~~~
../../binutils-gdb/gdb/s390-linux-nat.c: In member function ‘virtual int s390_linux_nat_target::remove_hw_breakpoint(gdbarch*, bp_target_info*)’:
../../binutils-gdb/gdb/s390-linux-nat.c:928:11: error: ‘unordered_remove’ was not declared in this scope
928 | unordered_remove (state->break_areas, ix);
| ^~~~~~~~~~~~~~~~
Fix this by including gdbsupport/gdb_vecs.h in gdb/s390-linux-nat.c.
gdb: 'target ...' commands now expect quoted/escaped filenames
it was no longer possible to pass GDB the name of a core file
containing any special characters (white space or quote characters) on
the command line. For example:
The problem is that the above commit changed the 'target core' command
to expect quoted filenames, so before the above commit a user could
write:
(gdb) target core /tmp/core file.core
[New LWP 2345783]
Core was generated by `./mkcore'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000401111 in ?? ()
(gdb)
But after the above commit the user must write:
(gdb) target core /tmp/core\ file.core
or
(gdb) target core "/tmp/core file.core"
This is part of a move to make GDB's filename argument handling
consistent.
Anyway, the problem with the '-c' command line flag is that it
forwards the filename unmodified through to the 'core-file' command,
which in turn forwards to the 'target core' command.
So when the user, at a shell writes:
$ gdb -c "core file.core"
this arrives in GDB as the unquoted string 'core file.core' (without
the single quotes). GDB then forwards this to the 'core-file'
command as if the user had written this at a GDB prompt:
(gdb) core-file core file.core
Which then fails to parse due to the unquoted white space between
'core' and 'file.core'.
The solution I propose is to escape any special characters in the core
file name passed from the command line before calling 'core-file'
command from main.c.
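The escaping step can be sketched like this. It is an illustration only:
escape_core_filename is a hypothetical helper, and the exact set of
characters treated as special (white space, both quote characters, and
backslash itself) is an assumption rather than the list used in main.c.

```python
# Characters assumed to need a backslash before being handed to 'core-file'.
SPECIAL_CHARS = " \t\n\"'\\"


def escape_core_filename(name):
    """Backslash-escape white space, quotes and backslashes in NAME."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in name)


# What the shell delivers for: gdb -c "core file.core"
from_shell = "core file.core"
print("core-file " + escape_core_filename(from_shell))
```

The resulting command has the same backslash-escaped form as the
'target core /tmp/core\ file.core' example above.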
I've updated the corefile.exp test to include a test for passing a
core file containing a white space character. While I was at it I've
modernised the part of corefile.exp that I was touching.
Andrew Burgess [Mon, 9 Dec 2024 10:50:37 +0000 (10:50 +0000)]
gdb: use 'const' more in a couple of small breakpoint functions
Make the 'struct breakpoint *' argument 'const' in user_breakpoint_p
and pending_breakpoint_p. And make the 'struct bp_location *'
argument 'const' in bl_address_is_meaningful.
There should be no user visible changes after this commit.
Simon Marchi [Wed, 4 Dec 2024 20:07:32 +0000 (15:07 -0500)]
gdbserver: simplify win32 process removal
In the spirit of encapsulation, I'm looking to remove the need for
external code to access the "ptid -> thread" map of process_info, making
it an internal implementation detail. The only remaining use is in
function clear_inferiors, and it led me down this rabbit hole:
- clear_inferiors is really only used by the Windows port and doesn't
really make sense in the grand scheme of things, I think (when would
you want to remove all threads of all processes, without removing
those processes?)
- ok, so let's remove clear_inferiors and inline the code where it's
called, in function win32_clear_inferiors
- the Windows port does not support multi-process, so it's not really
necessary to loop over all processes like this:
(or pass down the process from the caller, but it's not important
right now)
- so, the code that we've inlined in win32_clear_inferiors does 3
things:
- clear the process' thread list and map (which deletes the
thread_info objects)
- clear the dll list, which just basically frees some objects
- switch to no current process / no current thread
- let's now look at where this win32_clear_inferiors function is used:
- in win32_process_target::kill, where the process is removed just
after
- in win32_process_target::detach, where the process is removed
just after
- in win32_process_target::wait, when handling a process exit.
After this returns, we could be in handle_target_event (if async)
or resume (if sync), both in `server.cc`. In both of these
cases, target_mourn_inferior gets called, we end up in
win32_process_target::mourn, which removes the process
- in all 3 cases above, we end up removing the process, which takes
care of the 3 actions listed above:
- the thread list and map get cleared when the process gets
destroyed
- same with the dll list
- remove_process switches to no current process / current thread
if the process being removed is the current one
- I conclude that it's probably unnecessary to do the cleanup in
win32_clear_inferiors, because it's going to get done right after
anyway.
Therefore, this patch does:
- remove clear_inferiors, remove the call in win32_clear_inferiors
- remove clear_dlls, which is now unused
- remove process_info::thread_map, which is now unused
- rename win32_clear_inferiors to win32_clear_process, which seems more
accurate
win32_clear_inferiors also does:
for_each_thread (delete_thread_info);
which also makes sure to delete all threads, but it also deletes the
Windows private data object (windows_thread_info), so I'll leave this
one there for now. But if we could make the thread private data
destruction automatic, on thread destruction, it could be removed, I
think.
There should be no user-visible change with this patch. Of course,
operations don't happen in the same order as before, so there might be
some important detail I'm missing. I'm only able to build-test this, if
someone could give it a test run on Windows, it would be appreciated.
gdb: Fix use-after-free when an objfile has no symbols to load
The recent commit <HASH> moved an initialization of an objfile_holder in
syms_from_objfile_1 much earlier in the function, to better deal with
when GDB is unable to read the objfile format.
However, there is an early exit from syms_from_objfile_1 when the
objfile can be understood, but has no symbols. That was not releasing
the objfile_holder, so the objfile was being unlinked from the program
space, but the process of reading the objfile was being continued,
leading to use-after-frees flagged by the Address Sanitizer.
This commit fixes that UAF by making the objfile_holder release the
objfile right before the early exit.
This commit also changes the test gdb.base/dump.exp since that was the
original test that flagged the UAF, but at the end of the test the
generated files were being deleted, meaning we couldn't redo the test
manually after the fact. That final deletion was removed.
Reported-by: Simon Marchi <simark@simark.ca>
Approved-By: Simon Marchi <simon.marchi@efficios.com>
Hannes Domani [Fri, 6 Dec 2024 13:04:00 +0000 (14:04 +0100)]
Reduce WOW64 code duplication
Currently we have duplicate code for each place where
windows_thread_info::context is touched, since for WOW64 processes
it has to do the equivalent with wow64_context instead.
The actual choice of whether context or wow64_context is used is handled by
this new function in windows_process_info:
  template<typename Function>
  auto with_context (windows_thread_info *th, Function function)
  {
  #ifdef __x86_64__
    if (wow64_process)
      return function (th != nullptr ? th->wow64_context : nullptr);
    else
  #endif
      return function (th != nullptr ? th->context : nullptr);
  }
The other parts to make this work are the templated WindowsContext class,
which gives the appropriate ContextFlags for both types.
And there are also overloaded helper functions which, like in the case of
get_thread_context here, call either GetThreadContext or
Wow64GetThreadContext.
According to git log --stat, this results in 120 lines less code.
Nelson Chu [Fri, 12 May 2023 09:15:58 +0000 (17:15 +0800)]
RISC-V: PR27566, consider ELF_MAXPAGESIZE/COMMONPAGESIZE for gp relaxations.
With the default linker script, if a symbol's value lies outside the bounds of
the defined section, then it may cross the data segment alignment, so we should
reserve extra room of about MAXPAGESIZE and COMMONPAGESIZE when doing gp
relaxations. Otherwise we may hit truncation errors, since the data
segment alignment might move the section forward.
bfd/
PR 27566
* elfnn-riscv.c (_bfd_riscv_relax_lui): Consider MAXPAGESIZE and
COMMONPAGESIZE if the symbol's value outsides the bounds of the
defined section.
(_bfd_riscv_relax_pc): Likewise.
ld/
PR 27566
* testsuite/ld-riscv-elf/ld-riscv-elf.exp: Updated.
* testsuite/ld-riscv-elf/relax-data-segment-align*: New testcase
for pr27566. Without this patch, the rv32 binutils will meet
truncated errors for this testcase.
Simon Marchi [Tue, 3 Dec 2024 20:01:14 +0000 (15:01 -0500)]
gdbserver/win32-low.cc: remove use of `all_threads`
Fix this:
gdbserver/win32-low.cc: In function ‘void child_delete_thread(DWORD, DWORD)’:
gdbserver/win32-low.cc:192:7: error: ‘all_threads’ was not declared in this scope; did you mean ‘using_threads’?
192 | if (all_threads.size () == 1)
| ^~~~~~~~~~~
| using_threads
Commit 9f77b3aa0bfc ("gdbserver: change 'all_processes' and
'all_threads' list type") changed the type of `all_threads` to an
intrusive_list, without changing this particular use, which broke the
build because an intrusive_list doesn't know its size, so it doesn't
have a `size()` method. The subsequent commit removed `all_threads`,
leading to the error above.
Fix it by using the number of threads of the concerned process instead.
My rationale: as far as I know, GDBserver on Windows only supports one
process at a time, so there's no need to iterate over all processes. If
we made GDBserver for Windows support multiple processes, my intuition
is that we'd want this check to use the number of threads of the
concerned process, not the number of threads overall.
Add the method `process_info::thread_count`, to get the number of
threads of the process.
I'm not really sure what this check is for in the first place, Hannes
Domani said that this check didn't seem to trigger on Windows 7 and 11.
Perhaps it was necessary before.
Change-Id: I84d6226532b887d99248cf3be90f5065fb7a074a
Tested-By: Hannes Domani <ssbssa@yahoo.de>
Hu, Lin1 [Thu, 5 Dec 2024 06:49:27 +0000 (14:49 +0800)]
Support Intel AVX10.2 satcvt instructions
In this patch, we will support AVX10.2 satcvt instructions. All of them
are new instruction forms. In current documentation, it is still
VCVTTNEBF162I[,U]BS, but it will change to VCVTTBF162I[,U]BS eventually.
In the table part, we use a temporary <sign> iterator to reduce redundancy.
It definitely could be done for legacy cvt insns, but it is out of this
patch's scope.
H.J. Lu [Mon, 2 Dec 2024 04:58:33 +0000 (12:58 +0800)]
x86: Eliminate unnecessary {evex} prefixes
For several instructions, including vps{l,r}l{d,q,w,dq} and vpsra{d,w},
the VEX encoding has no form like the following:
vpsrlw $0x1f,(%r15,%rcx,4),%xmm0
Thus, the {evex} prefix should not be inserted when their second operand is
memory, while we still need it when the second operand is a register. Add a
new macro %ME to solve this problem.
For vpsraq, there is no VEX version, so the {evex} prefix should always
be eliminated.
gas/ChangeLog:
PR binutils/32403
* testsuite/gas/i386/i386.exp: Run new test.
* testsuite/gas/i386/x86-64.exp: Ditto.
* testsuite/gas/i386/evex-only.d: New test.
* testsuite/gas/i386/evex-only.s: Ditto.
* testsuite/gas/i386/x86-64-evex-only.d: Ditto.
* testsuite/gas/i386/x86-64-evex-only.s: Ditto.
opcodes/ChangeLog:
PR binutils/32403
* i386-dis-evex-reg.h: Use %ME instead of %XE for vps{l,r}l{w,dq}
and vpsraw. Split table for vpsra{d,q}.
* i386-dis-evex-w.h: Use %ME instead of %XE for vps{l,r}l{d,q}
and vpsrad. Eliminate vpsraq {evex} prefix.
* i386-dis-evex.h: Split table for vpsra{d,q}.
* i386-dis.c: (EVEX_W_0F72_R_4): New.
(EVEX_W_0FE2): Ditto.
(struct dis386): Add comment for %ME.
(putop): Handle %ME.
Co-authored-by: Haochen Jiang <haochen.jiang@intel.com>
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
gdb: fix handling of DW_AT_entry_pc of inlined subroutines
GDB's buildbot CI testing highlighted this assertion failure:
(gdb) c
Continuing.
../../binutils-gdb/gdb/block.h:203: internal-error: set_entry_pc: Assertion `start >= this->start () && start < this->end ()' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
----- Backtrace -----
FAIL: gdb.base/break-probes.exp: run til our library loads (GDB internal error)
This assertion was in the new function set_entry_pc and is asserting
that the default_entry_pc() value is within the block's start/end
range.
The default_entry_pc() is the value GDB will use as the entry-pc if
the DWARF doesn't specifically override the entry-pc. This value is
calculated as:
1. The start address of the first sub-range within the block, if the
block has more than 1 range, or
2. The low address (from DW_AT_low_pc) for the block.
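The two cases, and the check that set_entry_pc asserts, can be sketched as
below. This is a simplified model, not gdb's code: the function
signatures are invented for the example, and the addresses are the ones
shown for the last block in the 'maintenance info blocks' output quoted
in this message.

```python
def default_entry_pc(low_pc, ranges):
    """Entry pc used when the DWARF does not override it."""
    if len(ranges) > 1:
        # Case 1: more than one range, use the start of the first one.
        return ranges[0][0]
    # Case 2: fall back to the low address.
    return low_pc


def entry_pc_in_block(entry_pc, start, end):
    """The condition asserted by set_entry_pc."""
    return start <= entry_pc < end


START, END = 0x7FFFF7DDDAA4, 0x7FFFF7DDDAC3
RANGES = [
    (0x7FFFF7DDDA0E, 0x7FFFF7DDDA77),
    (0x7FFFF7DDDA90, 0x7FFFF7DDDA96),
]

entry = default_entry_pc(START, RANGES)
print(hex(entry), entry_pc_in_block(entry, START, END))
```

With those values the default entry pc comes from the first range and
falls below the block's start, which is the situation the assertion
rejects.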
If the block only has a single range then this means the block was
defined with low/high pc attributes (case #2 above). These low/high
pc values are what block::start() and block::end() return. This means
that by definition, if the block is continuous, the above assert
cannot trigger as 'start', the default_entry_pc() would be equivalent
to block::start().
This means that, for the assert to trigger, the block must have
multiple ranges, and the first address of the first range is not
within the block's low/high address range. This seems wrong.
I inspected the state at the time the assert triggered and discovered
the block's start() address. Then I removed the assert and restarted
GDB. I was now able to inspect the blocks at the offending address:
(gdb) maintenance info blocks 0x7ffff7dddaa4
Blocks at 0x7ffff7dddaa4:
from objfile: [(objfile *) 0x44a37f0] /lib64/ld-linux-x86-64.so.2
[(block *) 0x46b30c0] 0x7ffff7ddd5a0..0x7ffff7dde8a6
entry pc: 0x7ffff7ddd5a0
is global block
symbol count: 4
is contiguous
[(block *) 0x46b3020] 0x7ffff7ddd5a0..0x7ffff7dde8a6
entry pc: 0x7ffff7ddd5a0
is static block
symbol count: 9
is contiguous
[(block *) 0x46b2f70] 0x7ffff7ddda00..0x7ffff7dddac3
entry pc: 0x7ffff7ddda00
function: __GI__dl_find_dso_for_object
symbol count: 4
is contiguous
[(block *) 0x46b2e10] 0x7ffff7dddaa4..0x7ffff7dddac3
entry pc: 0x7ffff7dddaa4
inline function: __GI__dl_find_dso_for_object
symbol count: 5
is contiguous
[(block *) 0x46b2a40] 0x7ffff7dddaa4..0x7ffff7dddac3
entry pc: 0x7ffff7dddaa4
symbol count: 1
is contiguous
[(block *) 0x46b2970] 0x7ffff7dddaa4..0x7ffff7dddac3
entry pc: 0x7ffff7dddaa4
symbol count: 2
address ranges:
0x7ffff7ddda0e..0x7ffff7ddda77
0x7ffff7ddda90..0x7ffff7ddda96
I've left everything in for context, but the only really interesting
bit is the very last block, its low/high range is:
which are all outside the low/high range. This is what triggers the
assert. But why does that block exist at all?
What I believe is happening is that we're running into a bug in older
versions of GCC. The buildbot failure was with an 8.5 gcc, and Tom de
Vries also reported seeing failures when using version 7 and 8 gcc,
but not with gcc 9 and onward.
Looking at the DWARF I can see that the problematic block is created
from this DIE:
And so we can see that <15efb> has got both low/high pc attributes and
a ranges attribute.
If I widen my checking to parents of DIE <15efb> then I see that they
also have DW_AT_abstract_origin, however, there is something
interesting going on, the parent DIEs are linking to a different DIE
tree than <15efb>.
What I believe is happening is this, we have an abstract instance
tree, this is rooted at a DW_TAG_subprogram, and contains all the
blocks, variables, parameters, etc, that you would expect. As this is
an abstract instance, then there are no low/high pc attributes, and no
ranges attributes in this tree. This makes sense.
Now elsewhere we have a DW_TAG_subprogram (not
DW_TAG_inlined_subroutine) which links via
DW_AT_abstract_origin to the abstract DW_TAG_subprogram. This case is
documented in the DWARF 5 spec in section 3.3.8.3, and describes an
Out-of-Line Instance of an Inlined Subroutine. Within this out of
line instance many of the DIEs correctly link back, using
DW_AT_abstract_origin to the abstract instance tree. This tree also
includes the DIE <15e9f>, which is where our problem DIE references.
Now, to really confuse things, within this out-of-line instance we
have a DW_TAG_inlined_subroutine, which is another instance of the
same abstract instance tree! This would seem to indicate a recursive
call to the inline function, and the compiler, for some reason, needed
to instantiate an out of line instance of this function.
And it is within this nested, inlined subroutine, that the problem DIE
exists. The problem DIE is referencing the corresponding DIE within
the out of line instance tree, but I am convinced this must be a (long
fixed) GCC bug, and that the problem DIE should be referencing the DIE
within the abstract instance tree.
I'm aware that the above is pretty confusing. The actual DWARF would
be around 200 lines long, so I'd like to avoid dumping it in here.
But here's my attempt at representing what's going on in a minimal
example. The numbers down the side represent the section offset, not
the nesting level, and I've removed any attributes that are not
relevant:
The lexical block at <6> is linking to <4> when it should be linking
to <2>.
There is one additional thing that we might wonder about, which is,
when calculating the low/high pc range for a block, why does GDB not
make use of the range information and expand the range beyond the
defined low/high values?
The answer to this is in dwarf_get_pc_bounds_ranges_or_highlow_pc in
dwarf/read.c. This is where the low/high bounds are calculated. What
we see is that GDB first checks for a low/high attribute pair, and if
that is present, this defines the address range for the block. Only
if there is no DW_AT_low_pc do we check for the DW_AT_ranges, and use
that to define the extent of the block. And this makes sense, section
3.5 of the DWARF-5 spec says:
The lexical block entry may have either a DW_AT_low_pc and DW_AT_high_pc
pair of attributes or a DW_AT_ranges attribute whose values encode the
contiguous or non-contiguous address ranges, respectively, of the machine
instructions generated for the lexical block...
Section 3.5 is specifically about lexical blocks, but the same
wording, about it being either low/high OR ranges is repeated for
other DW_TAG_ types.
So this explains why GDB doesn't use the ranges to expand the problem
block's ranges; as the first DIE has low/high addresses, these are
used, and the ranges are not consulted.
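The precedence described above can be modelled in a few lines. This is a
sketch, not the logic of dwarf_get_pc_bounds_ranges_or_highlow_pc itself:
pc_bounds is a hypothetical function, attributes that are absent are
represented by None, and the addresses are again those of the problem
block.

```python
def pc_bounds(low_pc, high_pc, ranges):
    """Return (start, end) for a block, preferring low/high over ranges."""
    if low_pc is not None and high_pc is not None:
        # A low/high pair defines the block; the ranges are not consulted.
        return (low_pc, high_pc)
    if ranges:
        return (min(lo for lo, _ in ranges), max(hi for _, hi in ranges))
    return None


RANGES = [
    (0x7FFFF7DDDA0E, 0x7FFFF7DDDA77),
    (0x7FFFF7DDDA90, 0x7FFFF7DDDA96),
]

# The problem DIE has both a low/high pair and (via the wrong origin) ranges.
print([hex(a) for a in pc_bounds(0x7FFFF7DDDAA4, 0x7FFFF7DDDAC3, RANGES)])
```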
It is only later in dwarf2_record_block_ranges that we create a range
based off the low/high pc, and then also process the ranges data, this
allows the problem block to exist with ranges that are outside the
low/high range.
To solve this I considered a number of options:
1. Prevent loading certain attributes from an abstract instance.
Section 3.3.8.1 of the DWARF-5 spec talks about which attributes are
appropriate to place in an abstract instance. Any attribute that
might vary between instances should not appear in an abstract
instance. DW_AT_ranges is included as an example in the
non-exhaustive list of attributes that should not appear in an
abstract instance.
Currently in dwarf2_attr (dwarf2/read.c), when we see a
DW_AT_abstract_origin attribute, we always follow this to try and find
the attribute we are looking for. But we could change this function
so that we prevent this following for attributes that we know should
not be looked up in an abstract instance. This would solve the
problem in this case by preventing us finding the DW_AT_ranges in the
incorrect abstract instance.
2. Filter the ranges.
Having established a block's low/high address range in
dwarf_get_pc_bounds_ranges_or_highlow_pc, we could allow
dwarf2_record_block_ranges to parse the ranges, but we could reject
any range that extends outside the block's defined start and end
addresses.
For well-behaved DWARF where we have either low/high or ranges,
the block's start/end are defined from the range data, and so, by
definition, every range would be acceptable.
But in our problem case we would reject all of the invalid ranges.
This is my least favourite solution as it feels like rejecting the
ranges is tackling the problem too late on.
3. Don't try to parse ranges when we have low/high attributes.
This option involves updating dwarf2_record_block_ranges to match the
behaviour of dwarf_get_pc_bounds_ranges_or_highlow_pc, and, I believe,
to match the DWARF spec: don't try to read range data from
DW_AT_ranges if we have low/high pc attributes.
In our case this solves the issue because the problematic DIE has the
low/high attributes, and it then links to the wrong DIE which happens
to have DW_AT_ranges. With this change in place we don't even look
for the DW_AT_ranges.
If the problem were reversed, and the initial DIE had DW_AT_ranges,
but the incorrectly referenced DIE had the low/high pc attributes,
we would pick up the wrong addresses, but this wouldn't trigger any
asserts. The reason is that dwarf_get_pc_bounds_ranges_or_highlow_pc
would also find the low/high addresses from the incorrectly referenced
DIE, and so we would just end up with a block which had the wrong
address ranges, but the block would be self consistent, which is
different to the problem we hit here.
In the end, in this commit I went with solution #3, having
dwarf_get_pc_bounds_ranges_or_highlow_pc and
dwarf2_record_block_ranges be consistent seems sensible. However, I
do wonder if in the future we might want to explore solution #1 as an
additional safety feature.
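As a rough sketch of the idea behind solution #3, the following shows a range-recording step that applies the same "low/high first, ranges otherwise" rule as the bounds calculation. The names Die, AddrRange and block_ranges are hypothetical and this is not the actual dwarf2_record_block_ranges code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

using AddrRange = std::pair<std::uint64_t, std::uint64_t>;

// Hypothetical, simplified view of the address attributes of a DIE.
struct Die
{
  std::optional<std::uint64_t> low_pc;   // DW_AT_low_pc, if present
  std::optional<std::uint64_t> high_pc;  // DW_AT_high_pc as an address
  std::vector<AddrRange> ranges;         // DW_AT_ranges, possibly inherited
};

// Return the ranges to record for a block.  When a low/high pair is
// present, DW_AT_ranges is not consulted at all.
inline std::vector<AddrRange>
block_ranges (const Die &die)
{
  if (die.low_pc.has_value () && die.high_pc.has_value ())
    {
      assert (*die.low_pc <= *die.high_pc);
      return { AddrRange (*die.low_pc, *die.high_pc) };
    }
  return die.ranges;
}
```

Every range recorded this way lies within the bounds that the same rule produces, so the block stays self-consistent even when the ranges come from a wrongly referenced DIE.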
With this patch in place I'm able to run the gdb.base/break-probes.exp
without seeing the assert that CI testing highlighted. I see no
regressions when testing on x86-64 GNU/Linux with gcc 9.3.1.
Note: the diff in this commit looks big, but it's really just me
indenting the code.
Tom de Vries [Wed, 4 Dec 2024 20:29:52 +0000 (21:29 +0100)]
[gdb/tdep] Remove includes of gdbsupport/common-defs.h
In commit 18d2988e5da ("gdb, gdbserver, gdbsupport: remove includes of early
headers") all includes of gdbsupport/common-defs.h were removed, but
commit c1cdee0e2c1 ("gdb: LoongArch: Add support for hardware watchpoint")
reintroduced some.
Fix this by removing them.
Tested by doing this on x86_64-linux:
...
$ make \
nat/loongarch-hw-point.o \
nat/loongarch-linux.o \
nat/loongarch-linux-hw-point.o
CXX nat/loongarch-hw-point.o
CXX nat/loongarch-linux.o
CXX nat/loongarch-linux-hw-point.o
...
Approved-By: Simon Marchi <simon.marchi@efficios.com>
Simon Marchi [Wed, 4 Dec 2024 20:29:52 +0000 (21:29 +0100)]
[gdb/build] Fix build breaker on mingw-w64
The mingw-w64 build breaks currently:
...
In file included from gdb/cli/cli-cmds.c:58:
gdbsupport/eintr.h: In function ‘pid_t gdb::waitpid(pid_t, int*, int)’:
gdbsupport/eintr.h:77:35: error: ‘::waitpid’ has not been declared; \
did you mean ‘gdb::waitpid’?
77 | return gdb::handle_eintr (-1, ::waitpid, pid, wstatus, options);
| ^~~~~~~
| gdb::waitpid
gdbsupport/eintr.h:75:1: note: ‘gdb::waitpid’ declared here
75 | waitpid (pid_t pid, int *wstatus, int options)
| ^~~~~~~
...
This is a regression since commit 658a03e9e85 ("[gdbsupport] Add
gdb::{waitpid,read,write,close}"), which moved the use of ::waitpid from
run_under_shell, where it was used conditionally:
...
#if defined(CANT_FORK) || \
(!defined(HAVE_WORKING_VFORK) && !defined(HAVE_WORKING_FORK))
...
#else
...
int ret = gdb::handle_eintr (-1, ::waitpid, pid, &status, 0);
...
to gdb::waitpid, where it's used unconditionally:
...
inline pid_t
waitpid (pid_t pid, int *wstatus, int options)
{
return gdb::handle_eintr (-1, ::waitpid, pid, wstatus, options);
}
...
Likewise for ::wait.
Guard these uses with HAVE_WAITPID and HAVE_WAIT.
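A minimal sketch of such a guard is shown below. The retry loop is modelled on the handle_eintr calls quoted above, but it lives in a hypothetical namespace "sketch" and is an illustration rather than the contents of gdbsupport/eintr.h. HAVE_WAITPID is assumed to come from the configure-generated config header.

```cpp
#include <cassert>
#include <cerrno>

#ifdef HAVE_WAITPID
#include <sys/types.h>
#include <sys/wait.h>
#endif

namespace sketch
{

// Call F with ARGS, retrying for as long as it returns ERRVAL with
// errno set to EINTR.
template<typename ErrorValType, typename Fun, typename... Args>
inline auto
handle_eintr (ErrorValType errval, const Fun &f, const Args &... args)
  -> decltype (f (args...))
{
  decltype (f (args...)) ret;
  do
    {
      errno = 0;
      ret = f (args...);
    }
  while (ret == errval && errno == EINTR);
  return ret;
}

#ifdef HAVE_WAITPID
// Only declared where the host actually provides ::waitpid.
inline pid_t
waitpid (pid_t pid, int *wstatus, int options)
{
  return handle_eintr (-1, ::waitpid, pid, wstatus, options);
}
#endif

} // namespace sketch
```

On a host without waitpid the wrapper is simply not declared, so the header still compiles, and callers need the same guard.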
Reproduced and tested by doing a mingw-w64 cross-build on x86_64-linux.
Reported-By: Simon Marchi <simark@simark.ca>
Co-Authored-By: Tom de Vries <tdevries@suse.de>
Stephan Rohr [Thu, 22 Feb 2024 11:14:29 +0000 (03:14 -0800)]
gdb, testsuite: fix TCL error in 'gdb.base/structs.exp'
A failure of 'runto_main' in 'start_structs_test' results in a TCL
error. The return value of 'start_structs_test' function is evaluated
inside an if conditional clause, which expects a boolean value. Return
'-1' on failure to avoid the error.
Reviewed-By: Keith Seitz <keiths@redhat.com>
Approved-By: Tom Tromey <tom@tromey.com>
Tom de Vries [Wed, 4 Dec 2024 09:21:00 +0000 (10:21 +0100)]
[gdb/testsuite] Fix failure in gdb.python/py-startup-opt.exp
In commit 922ab963e1c ("[gdb/python] Handle empty PYTHONDONTWRITEBYTECODE") I
added a test in gdb.python/py-startup-opt.exp that checks the
"show python dont-write-bytecode" output.
Then in commit 348290c7ef4 ("[gdb/python] Warn and ignore ineffective python
settings") I changed the output of "show python dont-write-bytecode" after
python initialization.
I tested these changes individually and found no problems, but after
committing both, the test started failing, which the Linaro CI reported.
Fix this by updating the expected output.
While we're at it, make the test a bit more generic by testing
"show python $setting" in all cases.
Tom Tromey [Wed, 9 Oct 2024 21:31:11 +0000 (15:31 -0600)]
Fix "maint print" error messages
While working on an earlier patch, I noticed that all the
register-related "maint print" commands used the wrong command name in
an error message. This fixes them.
Reviewed-by: Christina Schimpe <christina.schimpe@intel.com>
Approved-By: Andrew Burgess <aburgess@redhat.com>
Tom de Vries [Tue, 3 Dec 2024 22:03:03 +0000 (23:03 +0100)]
[gdb/testsuite] Fix DUPLICATE in gdb.arch/pr25124.exp
With test-case gdb.arch/pr25124.exp, I run into:
...
PASS: gdb.arch/pr25124.exp: disassemble thumb instruction (1st try)
PASS: gdb.arch/pr25124.exp: disassemble thumb instruction (2nd try)
DUPLICATE: gdb.arch/pr25124.exp: disassemble thumb instruction (2nd try)
...
Tom de Vries [Tue, 3 Dec 2024 21:58:47 +0000 (22:58 +0100)]
[gdb/python] Issue warning if python fails to initialize
A common problem is that python may fail to initialize if PYTHONHOME is
set incorrectly, or points to incompatible default libraries.
Likewise if PYTHONPATH points to incompatible modules.
For instance, say PYTHONHOME is foo, then we get:
...
$ gdb -q
Python path configuration:
PYTHONHOME = 'foo'
PYTHONPATH = (not set)
program name = '/usr/bin/python'
isolated = 0
environment = 1
user site = 1
safe_path = 0
import site = 1
is in build tree = 0
stdlib dir = 'foo/lib64/python3.12'
sys._base_executable = '/usr/bin/python'
sys.base_prefix = 'foo'
sys.base_exec_prefix = 'foo'
sys.platlibdir = 'lib64'
sys.executable = '/usr/bin/python'
sys.prefix = 'foo'
sys.exec_prefix = 'foo'
sys.path = [
'foo/lib64/python312.zip',
'foo/lib64/python3.12',
'foo/lib64/python3.12/lib-dynload',
]
Python Exception <class 'ModuleNotFoundError'>: No module named 'encodings'
Python not initialized
$
...
In this case, it might be easy to figure out what went wrong because of the
obviously incorrect pathnames, but that might not be the case if PYTHONHOME
points to an incompatible python installation.
Fix this by adding a warning with a description of the possible cause and what
to do about it:
...
Python initialization failed: \
failed to get the Python codec of the filesystem encoding
gdb: warning: Python failed to initialize with PYTHONHOME set. Maybe because \
it is set incorrectly? Maybe because it points to incompatible standard \
libraries? Consider changing or unsetting it, or ignoring it using "set \
python ignore-environment on" at early initialization.
...
Likewise for PYTHONPATH:
...
Python initialization failed: \
failed to get the Python codec of the filesystem encoding
gdb: warning: Python failed to initialize with PYTHONPATH set. Maybe because \
it points to incompatible modules? Consider changing or unsetting it, or \
ignoring it using "set python ignore-environment on" at early \
initialization.
...
Tested on aarch64-linux.
Approved-By: Tom Tromey <tom@tromey.com>
PR python/32379
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32379
Tom de Vries [Tue, 3 Dec 2024 21:54:23 +0000 (22:54 +0100)]
[gdb/python] Handle empty PYTHONDONTWRITEBYTECODE
When using PYTHONDONTWRITEBYTECODE with an empty string we get:
...
$ PYTHONDONTWRITEBYTECODE= gdb -q -batch -ex "show python dont-write-bytecode"
Python's dont-write-bytecode setting is auto (currently on).
...
This is incorrect, it should be off.
The actual setting is correct, that was already fixed in commit 24d2cbc42cc
("set/show python dont-write-bytecode fixes"), in function
python_write_bytecode.
Fix this by:
- factoring out new function env_python_dont_write_bytecode out of
python_write_bytecode, and
- using it in show_python_dont_write_bytecode.
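The rule that the factored-out function has to implement can be sketched like this. The helper names below are hypothetical and only illustrate the "set to a non-empty string" check, not the actual gdb implementation.

```cpp
#include <cassert>
#include <cstdlib>

// The variable only has effect when it is set to a non-empty string.
inline bool
env_value_enables (const char *value)
{
  return value != nullptr && value[0] != '\0';
}

// Hypothetical wrapper reading the real environment.
inline bool
env_dont_write_bytecode ()
{
  return env_value_enables (std::getenv ("PYTHONDONTWRITEBYTECODE"));
}
```

Keeping the check in one helper lets the "set" and "show" paths agree on what "auto" currently means.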
Tested on x86_64-linux, using test-case gdb.python/py-startup-opt.exp and:
- PYTHONDONTWRITEBYTECODE=
- PYTHONDONTWRITEBYTECODE=1
- unset PYTHONDONTWRITEBYTECODE
Approved-By: Tom Tromey <tom@tromey.com>
PR python/32389
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32389
Tom de Vries [Tue, 3 Dec 2024 21:54:23 +0000 (22:54 +0100)]
[gdb/testsuite] Fix gdb.python/py-startup-opt.exp with empty PYTHONDONTWRITEBYTECODE
When running test-case gdb.python/py-startup-opt.exp with empty
PYTHONDONTWRITEBYTECODE:
...
$ cd build/gdb/testsuite
$ PYTHONDONTWRITEBYTECODE= make check \
RUNTESTFLAGS=gdb.python/py-startup-opt.exp
...
I get:
...
end^M
dont_write_bytecode is off^M
(gdb) FAIL: $exp: attr=dont_write_bytecode: testname: input 6: end
...
The problem is that the test-case expects dont_write_bytecode to be
on, which is incorrect because PYTHONDONTWRITEBYTECODE only has effect if set
to a non-empty string [1].
Fix this by correctly setting expectations in the test-case.
Tom de Vries [Tue, 3 Dec 2024 21:49:40 +0000 (22:49 +0100)]
[gdb/python] Warn and ignore ineffective python settings
Configuration flags "python dont-write-bytecode" and
"python ignore-environment" have effect only at Python initialization.
For instance, setting "python dont-write-bytecode" here has no effect:
...
$ gdb -q
(gdb) show python dont-write-bytecode
Python's dont-write-bytecode setting is auto (currently off).
(gdb) python import sys
(gdb) python print (sys.dont_write_bytecode)
False
(gdb) set python dont-write-bytecode on
(gdb) python print (sys.dont_write_bytecode)
False
...
This is not clear in the code: we set Py_DontWriteBytecodeFlag and
Py_IgnoreEnvironmentFlag in set_python_ignore_environment and
set_python_dont_write_bytecode. Fix this by moving the setting of those
variables to py_initialization.
Furthermore, this is not clear to the user: after Python initialization, the
user can still modify the configuration flags, and observe the changed setting:
...
$ gdb -q
(gdb) show python ignore-environment
Python's ignore-environment setting is off.
(gdb) set python ignore-environment on
(gdb) show python ignore-environment
Python's ignore-environment setting is on.
(gdb)
...
Fix this by emitting a warning when trying to set these configuration flags
after Python initialization:
...
$ gdb -q
(gdb) set python ignore-environment on
warning: Setting python ignore-environment after Python initialization has \
no effect, try setting this during early initialization
(gdb) set python dont-write-bytecode on
warning: Setting python dont-write-bytecode after Python initialization has \
no effect, try setting this during early initialization, or try setting \
sys.dont_write_bytecode
...
and by keeping the values constant after Python initialization.
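The "warn and keep the value constant" behaviour can be sketched as below. StartupSetting and its members are hypothetical names for illustration, and the warning text is shortened compared to the real messages shown above.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// A boolean setting that only has effect before initialization.
class StartupSetting
{
public:
  explicit StartupSetting (std::string name)
    : m_name (std::move (name))
  {}

  void mark_initialized () { m_initialized = true; }
  bool value () const { return m_value; }

  // Try to change the setting.  After initialization the value is left
  // untouched and a warning text is returned instead.
  std::optional<std::string> set (bool new_value)
  {
    if (m_initialized)
      return "Setting " + m_name + " after initialization has no effect";
    m_value = new_value;
    return std::nullopt;
  }

private:
  std::string m_name;
  bool m_value = false;
  bool m_initialized = false;
};
```

A later "show" then reports the value that was in effect at initialization, rather than a value that was set too late to matter.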
Since the auto setting for python dont-write-bytecode depends on the current
value of environment variable PYTHONDONTWRITEBYTECODE, we simply avoid it
after Python initialization:
...
$ gdb -q -batch \
-eiex "show python dont-write-bytecode" \
-iex "show python dont-write-bytecode"
Python's dont-write-bytecode setting is auto (currently off).
Python's dont-write-bytecode setting is off.
...
Tested on aarch64-linux.
Approved-By: Tom Tromey <tom@tromey.com>
PR python/32388
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32388
Tom de Vries [Tue, 3 Dec 2024 21:49:40 +0000 (22:49 +0100)]
[gdb/python] Drop ATTRIBUTE_UNUSED on py_initialize_catch_abort
I added ATTRIBUTE_UNUSED to py_initialize_catch_abort as a quick fix to deal
with it being unused for PY_VERSION_HEX >= 0x030a0000, but forgot to fix this
before committing.
Fix this now, by removing the attribute and using
'#if PY_VERSION_HEX < 0x030a0000' instead.
Tom de Vries [Tue, 3 Dec 2024 21:49:40 +0000 (22:49 +0100)]
[gdb/python] Factor out and refactor py_initialize
Function do_start_initialization has a large part dedicated to initializing
the python interpreter, as opposed to the rest of the function where
gdb-specific python support is initialized.
Factor out this part, as new function py_initialize, and rename the existing
py_initialize to py_initialize_catch_abort.
Refactor the new function py_initialize by getting rid of the nested:
...
#ifdef WITH_PYTHON_PATH
#if PY_VERSION_HEX < 0x030a0000
#else
#endif
#else
#endif
...
In particular, this changes behaviour for the "!defined (WITH_PYTHON_PATH)"
case.
For the "defined (WITH_PYTHON_PATH)" case, we've started using
Py_InitializeFromConfig () for PY_VERSION_HEX >= 0x030a0000 to deal with the
deprecation of Py_SetProgramName in 3.11.
For the "!defined (WITH_PYTHON_PATH)" case, we don't use Py_SetProgramName so
we stuck with Py_Initialize ().
However, in 3.12 Py_DontWriteBytecodeFlag and Py_IgnoreEnvironmentFlag got
deprecated and also here we need Py_InitializeFromConfig () to deal with this,
but the "!defined (WITH_PYTHON_PATH)" case didn't get updated.
This should be taken care of, now that we have this behavior:
- for PY_VERSION_HEX < 0x030a0000 we use Py_Initialize
- for PY_VERSION_HEX >= 0x030a0000 we use Py_InitializeFromConfig
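The version split can be illustrated with a small sketch. The enum and function names are hypothetical. The only facts assumed are that PY_VERSION_HEX stores the major version in its top byte and the minor version in the next byte, so 0x030a0000 corresponds to Python 3.10.

```cpp
#include <cassert>
#include <cstdint>

enum class InitApi { Initialize, InitializeFromConfig };

// Build the major/minor part of a PY_VERSION_HEX-style value.
constexpr std::uint32_t
version_hex (unsigned major, unsigned minor)
{
  return (major << 24) | (minor << 16);
}

// Mirror of the split described above: older versions use
// Py_Initialize, 3.10 and later use Py_InitializeFromConfig.
constexpr InitApi
choose_init_api (std::uint32_t py_version_hex)
{
  return (py_version_hex < 0x030a0000
          ? InitApi::Initialize
          : InitApi::InitializeFromConfig);
}

static_assert (version_hex (3, 10) == 0x030a0000, "3.10 is the cut-over");
```

In the real code the choice is made by the preprocessor rather than at run time, so only one of the two paths is ever compiled.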
I'm not sure how to test the "!defined (WITH_PYTHON_PATH)" though.
Simon Marchi [Tue, 3 Dec 2024 15:52:18 +0000 (10:52 -0500)]
gdb: restore nullptr check in compunit_symtab::find_call_site
Commit de2b4ab50de ("Convert dwarf2_cu::call_site_htab to new hash
table") removed this nullptr check for no good reason. This causes a
crash if `m_call_site_htab` is not set, as shown in PR 32410. My guess
is that when doing this change, I tried to make `m_call_site_htab` not a
pointer, removed this check, then realized it wasn't so obvious, and
forgot to re-add the check.
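The shape of the restored check can be sketched as follows, using hypothetical stand-in types (SymtabSketch, CallSite) and a standard hash map rather than gdb's own hash table.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>

struct CallSite
{
  std::uint64_t pc;
};

class SymtabSketch
{
public:
  void add_call_site (std::uint64_t pc)
  {
    // The table is only allocated once there is something to store.
    if (m_call_sites == nullptr)
      m_call_sites
        = std::make_unique<std::unordered_map<std::uint64_t, CallSite>> ();
    m_call_sites->emplace (pc, CallSite { pc });
  }

  const CallSite *find_call_site (std::uint64_t pc) const
  {
    // Without this check, a lookup before any insertion would
    // dereference a null pointer.
    if (m_call_sites == nullptr)
      return nullptr;

    auto it = m_call_sites->find (pc);
    return it == m_call_sites->end () ? nullptr : &it->second;
  }

private:
  std::unique_ptr<std::unordered_map<std::uint64_t, CallSite>> m_call_sites;
};
```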
Change-Id: I455e00cdc0519dfb412dc7826d17a839b77aae69
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32410
Approved-By: Tom Tromey <tom@tromey.com>
Approved-By: Tom de Vries <tdevries@suse.de>
Guinevere Larsen [Mon, 25 Nov 2024 17:34:31 +0000 (14:34 -0300)]
gdb/testsuite: make gdb.reverse/i386-avx-reverse.exp require avx
The test gdb.reverse/i386-avx-reverse.exp was assuming that if the CPU
was like x86, it would have AVX instructions because I didn't know how
to check for AVX instruction support explicitly. This commit updates
that to use the pre-existing TCL proc have_avx.
Also update the comment at the top of the test, since it was a copy of a
different test.
Tom de Vries [Tue, 3 Dec 2024 15:53:14 +0000 (16:53 +0100)]
[gdb/testsuite] Fix gdb.base/reset-catchpoint-cond.exp with --with-expat=no
When building gdb with --with-expat=no and running test-case
gdb.base/reset-catchpoint-cond.exp we get:
...
(gdb) catch syscall write^M
warning: Can not parse XML syscalls information; \
XML support was disabled at compile time.^M
Unknown syscall name 'write'.^M
(gdb) FAIL: $exp: mode=syscall: catch syscall write
...
Fix this by skipping the test for --with-expat=no.
Tom de Vries [Tue, 3 Dec 2024 15:53:14 +0000 (16:53 +0100)]
[gdb/testsuite] Fix gdb.python/python.exp with --disable-tui
When building gdb with --disable-tui, we run into:
...
(gdb) python print(type(gdb.TuiWindow))^M
Python Exception <class 'AttributeError'>: \
module 'gdb' has no attribute 'TuiWindow'^M
Error occurred in Python: module 'gdb' has no attribute 'TuiWindow'^M
(gdb) FAIL: gdb.python/python.exp: gdb.TuiWindow is registered
...
Guinevere Larsen [Mon, 21 Oct 2024 18:57:55 +0000 (15:57 -0300)]
gdb: fix crash when GDB can't read an objfile
If a user starts an inferior composed of objfiles that GDB is unable to
read, there is an error thrown in find_sym_fns, printing the famous "I'm
sorry, Dave, I can't do that" and the objfile stops being read. However,
the objfile will already have been linked to the program space, and
future interactions with the objfile will assume that it is readable.
Relevant to this commit, if GDB tries to find out the section that
contains a PC, and this section happens to land in the unreadable
objfile, GDB will try to create a section mapping, eventually calling
update_section_map. Since that function uses bfd to calculate the
sections, it'll think there are sections to be ordered, but when trying
to access the objfile::section_offsets, it'll be indexing a size 0
std::vector, which will end up segfaulting.
Currently, it isn't easy to trigger this crash, but the upcoming
possibility to disable support for some file formats would make the
crash very easy to reproduce, by attempting to debug an unsupported
inferior and using "break *<instruction>" command, or simply connecting
to a gdbserver loaded with an unsupported inferior.
The struct objfile_up seems to have been created to catch these kinds of
errors and unlink the partially-read objfile from the program space, as
the objfile isn't useful to GDB anymore, but it seems to have been added
before find_sym_fns would throw errors for unreadable objfiles, as the
instance in syms_from_objfile_1 (that could save GDB from this crash) is
declared well after find_sym_fns, too late to guard us. This commit
moves the declaration up to the top of the function, so it works as
intended.
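Why the position of the declaration matters can be shown with a small sketch. ProgramSpace, ScopedUnlinker and load_symbols are hypothetical stand-ins, and the "readable" flag stands in for find_sym_fns throwing an error.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical program space: just the names of the linked objfiles.
using ProgramSpace = std::vector<std::string>;

// Unlinks NAME from SPACE on scope exit unless release () was called.
class ScopedUnlinker
{
public:
  ScopedUnlinker (ProgramSpace &space, std::string name)
    : m_space (space), m_name (std::move (name))
  {}

  ~ScopedUnlinker ()
  {
    if (!m_released)
      m_space.erase (std::remove (m_space.begin (), m_space.end (), m_name),
                     m_space.end ());
  }

  ScopedUnlinker (const ScopedUnlinker &) = delete;
  ScopedUnlinker &operator= (const ScopedUnlinker &) = delete;

  void release () { m_released = true; }

private:
  ProgramSpace &m_space;
  std::string m_name;
  bool m_released = false;
};

inline void
load_symbols (ProgramSpace &space, const std::string &name, bool readable)
{
  space.push_back (name);

  // Declared before anything below can throw.
  ScopedUnlinker unlinker (space, name);

  if (!readable)
    throw std::runtime_error ("cannot read " + name);

  unlinker.release ();
}
```

If the guard were declared after the throwing step, the exception would leave the unreadable objfile linked into the program space, which is the situation described above.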
Further discussion on the mailing list also agreed that the name
"objfile_up" implies some level of ownership of the pointer, which this
struct doesn't have. So this commit renames the struct to
scoped_objfile_unlinker, which is more descriptive of what the struct is
actually meant to do.
The final change this commit does is add an assertion to
objfile::section_offset and objfile::set_section_offset, which ensures
that the section_offsets vector is large enough to return the desired
offset. This ensures that we won't mysteriously segfault or, worse,
continue going with garbage data.
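A sketch of such a checked accessor pair is shown below. SectionOffsets is a hypothetical class. It throws std::out_of_range where the commit uses an assertion, purely so that the failure can be observed in a small example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

class SectionOffsets
{
public:
  explicit SectionOffsets (std::size_t count)
    : m_offsets (count, 0)
  {}

  std::int64_t get (std::size_t index) const
  {
    check (index);
    return m_offsets[index];
  }

  void set (std::size_t index, std::int64_t offset)
  {
    check (index);
    m_offsets[index] = offset;
  }

private:
  // Fail loudly instead of indexing past the end of the vector.
  void check (std::size_t index) const
  {
    if (index >= m_offsets.size ())
      throw std::out_of_range ("section index out of range");
  }

  std::vector<std::int64_t> m_offsets;
};
```

With a size 0 vector, as in the unreadable objfile case, any access is reported immediately rather than reading garbage.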
Reported-By: Andrew Burgess <aburgess@redhat.com>
Approved-By: Andrew Burgess <aburgess@redhat.com>
Lulu Cai [Tue, 3 Dec 2024 11:37:26 +0000 (19:37 +0800)]
LoongArch: Fix the infinite loop caused by calling undefweak symbol
The undefweak symbol value of non-default visibility is 0 and does
not use a plt entry, and will not be relocated in the relocate_section
function. As a result, an infinite loop is generated because
bl %plt(sym) => bl 0.
Fix this by converting the call into a jump to address 0.
Jan Beulich [Tue, 3 Dec 2024 09:48:16 +0000 (10:48 +0100)]
gas: partly restore how current_location() had worked
Commit 4a826962e760 changed its behavior without saying why, and without
putting in place any testcase demonstrating the required behavior.
Firmly latch the current position unless deferred-evaluation mode is in
effect.
Jan Beulich [Tue, 3 Dec 2024 09:47:36 +0000 (10:47 +0100)]
gas: streamline expr_build_dot()
There's no point involving symbol_clone_if_forward_ref(), just for it to
replace dot_symbol by one obtained from symbol_temp_new_now(). For the
abs-section case also produce a slightly more "complete" (as in: all
potentially relevant fields filled) expression by going through
expr_build_uconstant().
Move the function next to current_location(), for it to be easier to see
the (dis)similarities. Correct the function's comment while there.
Kong Lingling [Tue, 3 Dec 2024 07:34:05 +0000 (15:34 +0800)]
Support Intel AVX10.2 BF16 instructions
In this patch, we will support AVX10.2 BF16 instructions. All of them
are new instruction forms. In current documentation, it is still
VSCALEFPBF16, but it will change to VSCALEFNEPBF16 eventually.
In the disassembler part, we added %XB to reduce W table pass since all
of them get evex.w=0.
Simon Marchi [Mon, 2 Dec 2024 15:35:23 +0000 (10:35 -0500)]
gdb, gdbserver, gdbsupport: flatten and sort some list in configure files
This makes the lists easier to sort, read and modify. There are no changes
in the generated config.h files, so I'm confident this brings no
functional changes.
aarch64: GCS feature check in GNU note properties for input objects
This patch adds support for Guarded Control Stack in AArch64 linker.
This patch implements the following:
1) Defines GNU_PROPERTY_AARCH64_FEATURE_1_GCS bit for GCS in
GNU_PROPERTY_AARCH64_FEATURE_1_AND macro.
2) Adds readelf support to read and print the GCS feature in GNU
properties in AArch64.
Displaying notes found in: .note.gnu.property
[ ]+Owner[ ]+Data size[ ]+Description
GNU 0x00000010 NT_GNU_PROPERTY_TYPE_0
Properties: AArch64 feature: GCS
3) Adds support for the "-z gcs" linker option and documents all the values
allowed with this option (-z gcs[=always|never|implicit]) where "-z gcs" is
equivalent to "-z gcs=always". When '-z gcs' option is omitted from the
command line, it defaults to "implicit" and relies on the GCS feature
marking in GNU properties.
4) Adds support for the "-z gcs-report" linker option and documents all the
values allowed with this option (-z gcs-report[=none|warning|error]) where
"-z gcs-report" is equivalent to "-z gcs-report=warning". When this option
is omitted from the command line, it defaults to "warning".
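The option values and defaults listed in 3) and 4) can be summarised in a small sketch. The enums and parse functions are hypothetical and do not reflect how ld parses its -z options internally. They take the text that follows "-z ".

```cpp
#include <cassert>
#include <optional>
#include <string>

enum class GcsMode { Always, Never, Implicit };
enum class GcsReport { None, Warning, Error };

// Values used when the option does not appear on the command line.
constexpr GcsMode default_gcs_mode = GcsMode::Implicit;
constexpr GcsReport default_gcs_report = GcsReport::Warning;

// "gcs" on its own is equivalent to "gcs=always".
inline std::optional<GcsMode>
parse_gcs (const std::string &arg)
{
  if (arg == "gcs" || arg == "gcs=always")
    return GcsMode::Always;
  if (arg == "gcs=never")
    return GcsMode::Never;
  if (arg == "gcs=implicit")
    return GcsMode::Implicit;
  return std::nullopt;
}

// "gcs-report" on its own is equivalent to "gcs-report=warning".
inline std::optional<GcsReport>
parse_gcs_report (const std::string &arg)
{
  if (arg == "gcs-report" || arg == "gcs-report=warning")
    return GcsReport::Warning;
  if (arg == "gcs-report=none")
    return GcsReport::None;
  if (arg == "gcs-report=error")
    return GcsReport::Error;
  return std::nullopt;
}
```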
The ABI changes adding GNU_PROPERTY_AARCH64_FEATURE_1_GCS to the GNU
property GNU_PROPERTY_AARCH64_FEATURE_1_AND is merged into main and
can be found in [1].
Matthieu Longo [Thu, 7 Nov 2024 14:33:09 +0000 (14:33 +0000)]
aarch64: rename BTI error/warning message
The previous message for a missing BTI feature in GNU properties was
not very clear. The new message explains that the GNU property
marking is missing on this specific input.
Matthieu Longo [Tue, 5 Nov 2024 13:22:31 +0000 (13:22 +0000)]
aarch64: limit number of reported issues on missing GNU properties
This patch attempts to make the linker output more friendly for the
developers by limiting the number of emitted warning/error messages
related to BTI issues.
Every time an error/warning related to BTI is emitted, the logger
also increments the BTI issues counter. A batch of errors/warnings is
limited to a maximum of 20 explicit errors/warnings. At the end of
the merge, a summary of the total number of errors/warnings is given if the
number exceeds the limit of 20 individual messages.
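The counting scheme can be sketched as below. LimitedIssueLog is a hypothetical class and the summary wording is made up for the example. Only the limit of 20 comes from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class LimitedIssueLog
{
public:
  explicit LimitedIssueLog (std::size_t limit = 20)
    : m_limit (limit)
  {}

  // Every issue is counted, but only the first LIMIT are emitted.
  void report (const std::string &message)
  {
    ++m_total;
    if (m_total <= m_limit)
      m_emitted.push_back (message);
  }

  const std::vector<std::string> &emitted () const { return m_emitted; }

  // Summary line for the end of the merge.  Empty when the total did
  // not exceed the limit.
  std::string summary () const
  {
    if (m_total <= m_limit)
      return "";
    return (std::to_string (m_total) + " issues in total, "
            + std::to_string (m_total - m_limit) + " not shown");
  }

private:
  std::size_t m_limit;
  std::size_t m_total = 0;
  std::vector<std::string> m_emitted;
};
```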
Matthieu Longo [Wed, 6 Nov 2024 17:59:46 +0000 (17:59 +0000)]
aarch64: bugfix when finding 1st bfd input with GNU property
The current implementation of searching the first input BFD with GNU
properties has a bug. The search was not filtering on object inputs
belonging to the output link unit only, but was also including dynamic
objects, BFD plugins, and linker-created files.
This means that the initialization of the output properties
was skewed, and warnings on input files that should have been emitted
were not.
This patch fixes the filtering to exclude the object input files not
belonging to the output link unit, not having the same ELF class, and
not the same target architecture.
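The filtering described above can be sketched as follows. The Input struct and its fields are hypothetical simplifications of what BFD tracks per input file, and the function name is made up for the example.

```cpp
#include <cassert>
#include <vector>

enum class ElfClass { Elf32, Elf64 };
enum class Arch { AArch64, Other };

// Hypothetical, simplified description of one linker input.
struct Input
{
  bool dynamic = false;         // dynamic object
  bool plugin = false;          // BFD plugin
  bool linker_created = false;  // file created by the linker itself
  ElfClass elf_class = ElfClass::Elf64;
  Arch arch = Arch::AArch64;
  bool has_gnu_property = false;
};

// Return the first input that belongs to the output link unit, matches
// the output's ELF class and architecture, and has GNU properties.
inline const Input *
find_first_with_gnu_property (const std::vector<Input> &inputs,
                              ElfClass out_class, Arch out_arch)
{
  for (const Input &in : inputs)
    {
      if (in.dynamic || in.plugin || in.linker_created)
        continue;
      if (in.elf_class != out_class || in.arch != out_arch)
        continue;
      if (in.has_gnu_property)
        return &in;
    }
  return nullptr;
}
```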
Matthieu Longo [Thu, 14 Nov 2024 17:35:24 +0000 (17:35 +0000)]
aarch64: remove early exit when setting up GNU properties with partial linking
There is an early exit in _bfd_aarch64_elf_link_setup_gnu_properties
that is enabled when the output link unit is relocatable, i.e. ld
generates an output file that can in turn serve as input to ld. (see
ld manual, -r,--relocatable for more details).
At this stage, the GNU properties have already been merged and errors
or warnings (if any) have already been issued. However, OUTPROP has
not been updated yet.
Not updating OUTPROP means that implicit enablement of BTI PLTs via
the GNU properties will be ignored for final links. Indeed, the
enablement of BTI PLTs is checked inside _bfd_aarch64_add_call_stub_entries
by looking at gnu_property_aarch64_feature_1_and (OUTPROP).
Since the final link does not happen in the case of partial linking,
the behaviour with or without the early exit should be the same.
Given that there is currently no comment to explain why the exit is
there, and that there might in the future be cases where these properties
affect relocatable links, it is preferable to drop the early exit.
Move the code related to the search of the first bfd input with GNU
properties to a separate function:
_bfd_aarch64_elf_find_1st_bfd_input_with_gnu_property
Before this patch, warnings were reported normally, and errors
(introduced by a previous patch adding the '-z bti-report' option)
were logged as errors but did not provoke a link failure.
The root of the issue was a misuse of _bfd_error_handler to
report the errors.
Replacing _bfd_error_handler by info->callbacks->einfo, with the
addition of the formatter '%X' for errors fixed the issue.