Add the interrupt enable register offset (inten_offset) so that GPIO
interrupts can be enabled normally on more models.
According to the latest interface specifications, the definition of GPIO
interrupts in ACPI is similar to that in FDT. The GPIO interrupts are
listed one by one according to the GPIO number, and the corresponding
interrupt number can be obtained directly through the GPIO number
specified by the consumer.
Christian König [Thu, 10 Jul 2025 14:25:21 +0000 (16:25 +0200)]
drm/ttm: remove ttm_bo_validate_swapout test
The test is quite fragile since it tries to allocate halve available system
memory + 1 page.
If the system has either not enough memory to make the allocation work
with other things running in parallel or to much memory so the allocation
fails as to large/invalid the test will fail.
Completely remove the test. We already validate swapout on the device
level and that test seems to be stable.
serial: sh-sci: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
Convert the Renesas SuperH SCI(F) serial port driver from
SIMPLE_DEV_PM_OPS() to DEFINE_SIMPLE_DEV_PM_OPS() and pm_sleep_ptr().
This lets us drop the __maybe_unused annotations from its suspend and
resume callbacks, and reduces kernel size in case CONFIG_PM or
CONFIG_PM_SLEEP is disabled.
gpio: sysfs: allow disabling the legacy parts of the GPIO sysfs interface
Add a Kconfig switch allowing to disable the legacy parts of the GPIO
sysfs interface. This means that even though we keep the
/sys/class/gpio/ directory, it no longer contains the global
export/unexport attribute pair (instead, the user should use the
per-chip export/unpexport) nor the gpiochip$BASE entries. This option
default to y if GPIO sysfs is enabled but we'll default it to n at some
point in the future.
gpio: sysfs: export the GPIO directory locally in the gpiochip<id> directory
As a way to allow the user-space to stop referring to GPIOs by their
global numbers, introduce a parallel group of line attributes for
exported GPIO that live inside the GPIO chip class device and are
referred to by their HW offset within their parent chip.
gpio: sysfs: don't look up exported lines as class devices
In preparation for adding a parallel, per-chip attribute group for
exported GPIO lines, stop using class device APIs to refer to it in the
code. When unregistering the chip, don't call class_find_device() but
instead store exported lines in a linked list inside the GPIO chip data
object and look it up there.
gpio: sysfs: don't use driver data in sysfs callbacks for line attributes
Currently each exported GPIO is represented in sysfs as a separate class
device. This allows us to simply use dev_get_drvdata() to retrieve the
pointer passed to device_create_with_groups() from sysfs ops callbacks.
However, we're preparing to add a parallel set of per-line sysfs
attributes that will live inside the associated gpiochip group. They are
not registered as class devices and so have the parent device passed as
argument to their callbacks (the GPIO chip class device).
Put the attribute structs inside the GPIO descriptor data and
dereference the relevant ones using container_of() in the callbacks.
This way, we'll be able to reuse the same code for both the legacy and
new GPIO attributes.
gpio: sysfs: rename the data variable in gpiod_(un)export()
In preparation for future commits which will make use of descriptor AND
GPIO-device data in the same functions rename the former from data to
desc_data separately which will make future changes smaller and easier
to read.
gpio: sysfs: pass gpiod_data directly to internal GPIO sysfs functions
We don't use any fields from struct device in gpio_sysfs_request_irq(),
gpio_sysfs_free_irq() and gpio_sysfs_set_active_low(). We only use the
dev argument to get the associated struct gpiod_data pointer with
dev_get_drvdata().
To make the transition to not using dev_get_drvdata() across line
callbacks for sysfs attributes easier, pass gpiod_data directly to
these functions instead of having it wrapped in struct device.
gpio: sysfs: only get the dirent reference for the value attr once
There's no reason to retrieve the reference to the sysfs dirent every
time we request an interrupt, we can as well only do it once when
exporting the GPIO.
gpio: sysfs: add a parallel class device for each GPIO chip using device IDs
In order to enable moving away from the global GPIO numberspace-based
exporting of lines over sysfs: add a parallel, per-chip entry under
/sys/class/gpio/ for every registered GPIO chip, denoted by device ID
in the file name and not its base GPIO number.
Compared to the existing chip group: it does not contain the "base"
attribute as the goal of this change is to not refer to GPIOs by their
global number from user-space anymore. It also contains its own,
per-chip export/unexport attribute pair which allow to export lines by
their hardware offset within the chip.
Caveat #1: the new device cannot be a link to (or be linked to by) the
existing "gpiochip<BASE>" entry as we cannot create links in
/sys/class/xyz/.
Caveat #2: the new entry cannot be named "gpiochipX" as it could
conflict with devices whose base is statically defined to a low number.
Let's go with "chipX" instead.
While at it: the chip label is unique so update the untrue statement
when extending the docs.
gpio: sysfs: use gpiod_is_equal() to compare GPIO descriptors
We have a dedicated comparator for GPIO descriptors that performs
additional checks and hides the implementation detail of whether the
same GPIO can be associated with two separate struct gpio_desc objects.
Use it in sysfs code
Dan Carpenter [Tue, 15 Jul 2025 23:03:17 +0000 (18:03 -0500)]
fs: tighten a sanity check in file_attr_to_fileattr()
The fattr->fa_xflags is a u64 that comes from the user. This is a sanity
check to ensure that the users are only setting allowed flags. The
problem is that it doesn't check the upper 32 bits. It doesn't really
affect anything but for more flexibility in the future, we want to enforce
users zero out those bits.
iio: adc: ad_sigma_delta: Select IIO_BUFFER_DMAENGINE and SPI_OFFLOAD
CONFIG_AD_SIGMA_DELTA uses several symbols that it does not explicitly
select. If no other enabled driver selects them, the build fails with
either a linker failure if the driver is built in or a modpost failure
if the driver is a module.
Mathias Nyman [Mon, 23 Jun 2025 13:39:47 +0000 (16:39 +0300)]
usb: hub: Don't try to recover devices lost during warm reset.
Hub driver warm-resets ports in SS.Inactive or Compliance mode to
recover a possible connected device. The port reset code correctly
detects if a connection is lost during reset, but hub driver
port_event() fails to take this into account in some cases.
port_event() ends up using stale values and assumes there is a
connected device, and will try all means to recover it, including
power-cycling the port.
Details:
This case was triggered when xHC host was suspended with DbC (Debug
Capability) enabled and connected. DbC turns one xHC port into a simple
usb debug device, allowing debugging a system with an A-to-A USB debug
cable.
xhci DbC code disables DbC when xHC is system suspended to D3, and
enables it back during resume.
We essentially end up with two hosts connected to each other during
suspend, and, for a short while during resume, until DbC is enabled back.
The suspended xHC host notices some activity on the roothub port, but
can't train the link due to being suspended, so xHC hardware sets a CAS
(Cold Attach Status) flag for this port to inform xhci host driver that
the port needs to be warm reset once xHC resumes.
CAS is xHCI specific, and not part of USB specification, so xhci driver
tells usb core that the port has a connection and link is in compliance
mode. Recovery from complinace mode is similar to CAS recovery.
xhci CAS driver support that fakes a compliance mode connection was added
in commit 8bea2bd37df0 ("usb: Add support for root hub port status CAS")
Once xHCI resumes and DbC is enabled back, all activity on the xHC
roothub host side port disappears. The hub driver will anyway think
port has a connection and link is in compliance mode, and hub driver
will try to recover it.
The port power-cycle during recovery seems to cause issues to the active
DbC connection.
Fix this by clearing connect_change flag if hub_port_reset() returns
-ENOTCONN, thus avoiding the whole unnecessary port recovery and
initialization attempt.
Cc: stable@vger.kernel.org Fixes: 8bea2bd37df0 ("usb: Add support for root hub port status CAS") Tested-by: Łukasz Bartosik <ukaszb@chromium.org> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/20250623133947.3144608-1-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Lechner [Thu, 10 Jul 2025 20:43:40 +0000 (15:43 -0500)]
iio: adc: ad7173: fix setting ODR in probe
Fix the setting of the ODR register value in the probe function for
AD7177. The AD7177 chip has a different ODR value after reset than the
other chips (0x7 vs. 0x0) and 0 is a reserved value on that chip.
The driver already has this information available in odr_start_value
and uses it when checking valid values when writing to the
sampling_frequency attribute, but failed to set the correct initial
value in the probe function.
David Lechner [Wed, 9 Jul 2025 01:38:33 +0000 (20:38 -0500)]
iio: adc: ad7173: fix calibration channel
Fix the channel index values passed to ad_sd_calibrate() in
ad7173_calibrate_all().
ad7173_calibrate_all() expects these values to be that of the CHANNELx
register assigned to the channel, not the datasheet INPUTx number of the
channel. The incorrect values were causing register writes to fail for
some channels because they set the WEN bit that must always be 0 for
register access and set the R/W bit to read instead of write. For other
channels, the channel number was just wrong because the CHANNELx
registers are generally assigned in reverse order and so almost never
match the INPUTx numbers.
David Lechner [Sun, 6 Jul 2025 18:53:08 +0000 (13:53 -0500)]
iio: adc: ad7173: fix num_slots
Fix the num_slots value for most chips in the ad7173 driver. The correct
value is the number of CHANNELx registers on the chip.
In commit 4310e15b3140 ("iio: adc: ad7173: don't make copy of
ad_sigma_delta_info struct"), we refactored struct ad_sigma_delta_info
to be static const data instead of being dynamically populated during
driver probe. However, there was an existing bug in commit 76a1e6a42802
("iio: adc: ad7173: add AD7173 driver") where num_slots was incorrectly
set to the number of CONFIGx registers instead of the number of
CHANNELx registers. This bug was partially propagated to the refactored
code in that the 16-channel chips were only given 8 slots instead of
16 although we did managed to fix the 8-channel chips and one of the
4-channel chips in that commit. However, we botched two of the 4-channel
chips and ended up incorrectly giving them 8 slots during the
refactoring.
This patch fixes that mistake on the 4-channel chips and also
corrects the 16-channel chips to have 16 slots.
David Lechner [Thu, 3 Jul 2025 19:51:17 +0000 (14:51 -0500)]
iio: adc: ad7173: fix channels index for syscalib_mode
Fix the index used to look up the channel when accessing the
syscalib_mode attribute. The address field is a 0-based index (same
as scan_index) that it used to access the channel in the
ad7173_channels array throughout the driver. The channels field, on
the other hand, may not match the address field depending on the
channel configuration specified in the device tree and could result
in an out-of-bounds access.
David Lechner [Thu, 3 Jul 2025 21:07:44 +0000 (16:07 -0500)]
iio: adc: ad_sigma_delta: change to buffer predisable
Change the buffer disable callback from postdisable to predisable.
This balances the existing posteanble callback. Using postdisable
with posteanble can be problematic, for example, if update_scan_mode
fails, it would call postdisable without ever having called posteanble,
so the drivers using this would be in an unexpected state when
postdisable was called.
Michael Straube [Tue, 15 Jul 2025 18:28:12 +0000 (20:28 +0200)]
staging: rtl8723bs: remove function pointer hal_reset_security_engine
The function pointer hal_reset_security_engine is never set. As a
consequence, the function rtw_hal_reset_security_engine does nothing.
Remove both to reduce dead code.
Stefan Wahren [Tue, 15 Jul 2025 16:11:08 +0000 (18:11 +0200)]
staging: vchiq_arm: Make vchiq_shutdown never fail
Most of the users of vchiq_shutdown ignore the return value,
which is bad because this could lead to resource leaks.
So instead of changing all calls to vchiq_shutdown, it's easier
to make vchiq_shutdown never fail.
Stefan Wahren [Tue, 15 Jul 2025 16:11:07 +0000 (18:11 +0200)]
Revert "staging: vchiq_arm: Create keep-alive thread during probe"
The commit 86bc88217006 ("staging: vchiq_arm: Create keep-alive thread
during probe") introduced a regression for certain configurations,
which doesn't have a VCHIQ user. This results in a unused and hanging
keep-alive thread:
INFO: task vchiq-keep/0:85 blocked for more than 120 seconds.
Not tainted 6.12.34-v8-+ #13
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:vchiq-keep/0 state:D stack:0 pid:85 tgid:85 ppid:2
Call trace:
__switch_to+0x188/0x230
__schedule+0xa54/0xb28
schedule+0x80/0x120
schedule_preempt_disabled+0x30/0x50
kthread+0xd4/0x1a0
ret_from_fork+0x10/0x20
The commit 3e5def4249b9 ("staging: vchiq_arm: Improve initial VCHIQ connect")
based on the assumption that in good case the VCHIQ connect always happen and
therefore the keep-alive thread is guaranteed to be woken up. This is wrong,
because in certain configurations there are no VCHIQ users and so the VCHIQ
connect never happen. So revert it.
Damien Le Moal [Wed, 16 Jul 2025 02:03:15 +0000 (11:03 +0900)]
Documentation: driver-api: Update libata error handler information
Update ``->error_handler()`` section of the libata documentation file
Documentation/driver-api/libata.rst to remove the reference to the
function ata_do_eh() as that function was removed. The reference to the
function ata_bmdma_drive_eh() is also removed as that function does not
exist at all. And while at it, cleanup the description of the various
reset operations using a bullet list.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20250716020315.235457-4-dlemoal@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org>
Introduce struct ata_reset_operations to aggregate in a single structure
the definitions of the 4 reset methods (prereset, softreset, hardreset
and postreset) for a port. This new structure is used in struct ata_port
to define the reset methods for a regular port (reset field) and for a
port-multiplier port (pmp_reset field). A pointer to either of these
fields replaces the 4 reset method arguments passed to ata_eh_recover()
and ata_eh_reset().
The definition of the reset methods for all drivers is changed to use
the reset and pmp_reset fields in struct ata_port_operations.
A large number of files is modifed, but no functional changes are
introduced.
Suggested-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20250716020315.235457-3-dlemoal@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org>
Damien Le Moal [Wed, 16 Jul 2025 02:03:13 +0000 (11:03 +0900)]
ata: libata-eh: Remove ata_do_eh()
The only reason for ata_do_eh() to exist is that the two caller sites,
ata_std_error_handler() and ata_sff_error_handler() may pass it a
NULL hardreset operation so that the built-in (generic) hardreset
operation for a driver is ignored if the adapter SCR access is not
available.
However, ata_std_error_handler() and ata_sff_error_handler()
modifications of the hardreset port operation can easily be combined as
they are mutually exclusive. That is, a driver using sata_std_hardreset()
as its hardreset operation cannot use sata_sff_hardreset() and
vice-versa.
With this observation, ata_do_eh() can be removed and its code moved to
ata_std_error_handler(). The condition used to ignore the built-in
hardreset port operation is modified to be the one that was used in
ata_sff_error_handler(). This requires defining a stub for the function
sata_sff_hardreset() to avoid compilation errors when CONFIG_ATA_SFF is
not enabled. Furthermore, instead of modifying the local hardreset
operation definition, set the ATA_LFLAG_NO_HRST link flag to prevent
the use of built-in hardreset methods for ports without a valid scr_read
function. This flag is checked in ata_eh_reset() and if set, the
hardreset method is ignored.
This change simplifies ata_sff_error_handler() as this function now only
needs to call ata_std_error_handler().
No functional changes.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20250716020315.235457-2-dlemoal@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org>
John Johansen [Wed, 4 Jun 2025 08:45:05 +0000 (01:45 -0700)]
apparmor: fix regression in fs based unix sockets when using old abi
Policy loaded using abi 7 socket mediation was not being applied
correctly in all cases. In some cases with fs based unix sockets a
subset of permissions where allowed when they should have been denied.
This was happening because the check for if the socket was an fs based
unix socket came before the abi check. But the abi check is where the
correct path is selected, so having the fs unix socket check occur
early would cause the wrong code path to be used.
Fix this by pushing the fs unix to be done after the abi check.
Fixes: dcd7a559411e ("apparmor: gate make fine grained unix mediation behind v9 abi") Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Sat, 14 Jun 2025 20:49:02 +0000 (13:49 -0700)]
apparmor: fix af_unix auditing to include all address information
The auditing of addresses currently doesn't include the source address
and mixes source and foreign/peer under the same audit name. Fix this
so source is always addr, and the foreign/peer is peer_addr.
Fixes: c05e705812d1 ("apparmor: add fine grained af_unix mediation") Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Tue, 1 Apr 2025 22:28:13 +0000 (15:28 -0700)]
apparmor: Remove use of the double lock
The use of the double lock is not necessary and problematic. Instead
pull the bits that need locks into their own sections and grab the
needed references.
Fixes: c05e705812d1 ("apparmor: add fine grained af_unix mediation") Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Sun, 22 Jun 2025 11:09:06 +0000 (04:09 -0700)]
apparmor: update kernel doc comments for xxx_label_crit_section
Add a kernel doc header for __end_current_label_crit_section(), and
update the header for __begin_current_label_crit_section().
Fixes: b42ecc5f58ef ("apparmor: make __begin_current_label_crit_section() indicate whether put is needed") Signed-off-by: John Johansen <john.johansen@canonical.com>
Mateusz Guzik [Tue, 18 Mar 2025 22:06:41 +0000 (23:06 +0100)]
apparmor: make __begin_current_label_crit_section() indicate whether put is needed
Same as aa_get_newest_cred_label_condref().
This avoids a bunch of work overall and allows the compiler to note when no
clean up is necessary, allowing for tail calls.
This in particular happens in apparmor_file_permission(), which manages to
tail call aa_file_perm() 105 bytes in (vs a regular call 112 bytes in
followed by branches to figure out if clean up is needed).
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Sat, 24 May 2025 04:04:51 +0000 (21:04 -0700)]
apparmor: mitigate parser generating large xtables
Some versions of the parser are generating an xtable transition per
state in the state machine, even when the state machine isn't using
the transition table.
The parser bug is triggered by
commit 2e12c5f06017 ("apparmor: add additional flags to extended permission.")
In addition to fixing this in userspace, mitigate this in the kernel
as part of the policy verification checks by detecting this situation
and adjusting to what is actually used, or if not used at all freeing
it, so we are not wasting unneeded memory on policy.
Fixes: 2e12c5f06017 ("apparmor: add additional flags to extended permission.") Signed-off-by: John Johansen <john.johansen@canonical.com>
tracing/probes: Avoid using params uninitialized in parse_btf_arg()
After a recent change in clang to strengthen uninitialized warnings [1],
it points out that in one of the error paths in parse_btf_arg(), params
is used uninitialized:
kernel/trace/trace_probe.c:660:19: warning: variable 'params' is uninitialized when used here [-Wuninitialized]
660 | return PTR_ERR(params);
| ^~~~~~
Match many other NO_BTF_ENTRY error cases and return -ENOENT, clearing
up the warning.
Joel Fernandes [Tue, 15 Jul 2025 20:01:52 +0000 (16:01 -0400)]
rcu: Refactor expedited handling check in rcu_read_unlock_special()
Extract the complex expedited handling condition in rcu_read_unlock_special()
into a separate function rcu_unlock_needs_exp_handling() with detailed
comments explaining each condition.
This improves code readability. No functional change intended.
Reviewed-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
This commit removes the SRCU-lite implementation, which has been replaced
by SRCU-fast.
Both SRCU-lite and SRCU-fast provide faster readers by dropping the
smp_mb() call from their lock and unlock primitives, but incur a pair
of added RCU grace periods during the SRCU grace period. There is a
trivial mapping from the SRCU-lite API to that of SRCU-fast, so there
should be no transition issues.
[ paulmck: Apply Christoph Hellwig feedback. ]
Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
Currently, SRCU-fast grace periods use synchronize_rcu() to provide the
needed ordering with readers, even given an expedited SRCU-fast grace
period, which isn't all that expedited. This commit therefore instead
uses synchronize_rcu_expedited() if there is an expedited SRCU-fast
grace period in flight.
Of course, given an non-expedited SRCU-fast grace period blocked in
synchronize_rcu(), a later request for an expedited SRCU-fast grace
period will wait for that synchronize_rcu() to return before switching
to use of synchronize_rcu_expedited(). If this turns out to be a real
problem for a production workload, we can increase the complexity (but
likely also degrade the energy efficiency) to speed things up further.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
Because SRCU-lite is being replaced by SRCU-fast, this commit removes
support for SRCU-lite from rcutorture.c
Both SRCU-lite and SRCU-fast provide faster readers by dropping the
smp_mb() call from their lock and unlock primitives, but incur a pair
of added RCU grace periods during the SRCU grace period. There is a
trivial mapping from the SRCU-lite API to that of SRCU-fast, so there
should be no transition issues.
[ paulmck: Apply Christoph Hellwig feedback. ]
Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
This commit prepares for the removal of SRCU-Lite by removing the SRCU-L
rcutorture scenario that tests it.
Both SRCU-lite and SRCU-fast provide faster readers by dropping the
smp_mb() call from their lock and unlock primitives, but incur a pair
of added RCU grace periods during the SRCU grace period. There is a
trivial mapping from the SRCU-lite API to that of SRCU-fast, so there
should be no transition issues.
[ paulmck: Apply Christoph Hellwig feedback. ]
Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
Because SRCU-lite is being replaced by SRCU-fast, this commit removes
support for SRCU-lite from refscale.c.
Both SRCU-lite and SRCU-fast provide faster readers by dropping the
smp_mb() call from their lock and unlock primitives, but incur a pair
of added RCU grace periods during the SRCU grace period. There is a
trivial mapping from the SRCU-lite API to that of SRCU-fast, so there
should be no transition issues.
[ paulmck: Apply Christoph Hellwig feedback. ]
Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
torture: Make torture.sh --allmodconfig testing fail on warnings
Currently, the torture.sh --allmodconfig testing looks solely at the
exit code from the kernel build, and thus fails to flag many compiler
warnings. This commit therefore checks the kernel-build output for
compiler diagnostics.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
torture: Make torture.sh tolerate runs having bad kvm.sh arguments
Currently, torture.sh assumes excessive levels of reviewer competence
and thus fails to gracefully handle cases where it is tricked into giving
kvm.sh invalid arguments. This commit therefore upgrades error handling
to more gracefully handle this situation.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
torture: Add textid.txt file to --do-allmodconfig and --do-rcu-rust runs
This commit causes the torture.sh --do-allmodconfig and --do-rcu-rust
parameters to add testid.txt files to their results directories, thus
allowing easier analysis of the results of a series of runs kicked off by
"git bisect".
Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
torture: Extract testid.txt generation to separate script
The kvm.sh script places a testid.txt file in the top-level results
directory in order to identify the tree and commit that was tested.
This works well, but there are scripts other than kvm.sh that also create
results directories, and it would be good for them to also identify
exactly what was tested.
This commit therefore extracts the testid.txt generation to a new
mktestid.sh script so that it can be easily used elsewhere.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
Paul E. McKenney [Thu, 15 May 2025 22:30:01 +0000 (15:30 -0700)]
torture: Provide EXPERT Kconfig option for arm64 KCSAN torture.sh runs
The arm64 architecture requires that KCSAN-enabled kernels be built with
the CONFIG_EXPERT=y Kconfig option. This commit therefore causes the
torture.sh script to provide this option, but only for --kcsan runs on
arm64 systems.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: <kasan-dev@googlegroups.com> Cc: <linux-arm-kernel@lists.infradead.org> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
Joel Fernandes [Tue, 8 Jul 2025 14:22:19 +0000 (10:22 -0400)]
rcu: Fix rcu_read_unlock() deadloop due to IRQ work
During rcu_read_unlock_special(), if this happens during irq_exit(), we
can lockup if an IPI is issued. This is because the IPI itself triggers
the irq_exit() path causing a recursive lock up.
This is precisely what Xiongfeng found when invoking a BPF program on
the trace_tick_stop() tracepoint As shown in the trace below. Fix by
managing the irq_work state correctly.
irq_exit()
__irq_exit_rcu()
/* in_hardirq() returns false after this */
preempt_count_sub(HARDIRQ_OFFSET)
tick_irq_exit()
tick_nohz_irq_exit()
tick_nohz_stop_sched_tick()
trace_tick_stop() /* a bpf prog is hooked on this trace point */
__bpf_trace_tick_stop()
bpf_trace_run2()
rcu_read_unlock_special()
/* will send a IPI to itself */
irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
A simple reproducer can also be obtained by doing the following in
tick_irq_exit(). It will hang on boot without the patch:
rcu: Enable rcu_normal_wake_from_gp on small systems
Automatically enable the rcu_normal_wake_from_gp parameter on
systems with a small number of CPUs. The activation threshold
is set to 16 CPUs.
This helps to reduce a latency of normal synchronize_rcu() API
by waking up GP-waiters earlier and decoupling synchronize_rcu()
callers from regular callback handling.
A benchmark running 64 parallel jobs(system with 64 CPUs) invoking
synchronize_rcu() demonstrates a notable latency reduction with the
setting enabled.
Paul E. McKenney [Thu, 24 Apr 2025 23:49:53 +0000 (16:49 -0700)]
rcu: Protect ->defer_qs_iw_pending from data race
On kernels built with CONFIG_IRQ_WORK=y, when rcu_read_unlock() is
invoked within an interrupts-disabled region of code [1], it will invoke
rcu_read_unlock_special(), which uses an irq-work handler to force the
system to notice when the RCU read-side critical section actually ends.
That end won't happen until interrupts are enabled at the soonest.
In some kernels, such as those booted with rcutree.use_softirq=y, the
irq-work handler is used unconditionally.
The per-CPU rcu_data structure's ->defer_qs_iw_pending field is
updated by the irq-work handler and is both read and updated by
rcu_read_unlock_special(). This resulted in the following KCSAN splat:
BUG: KCSAN: data-race in rcu_preempt_deferred_qs_handler / rcu_read_unlock_special
read to 0xffff96b95f42d8d8 of 1 bytes by task 90 on cpu 8:
rcu_read_unlock_special+0x175/0x260
__rcu_read_unlock+0x92/0xa0
rt_spin_unlock+0x9b/0xc0
__local_bh_enable+0x10d/0x170
__local_bh_enable_ip+0xfb/0x150
rcu_do_batch+0x595/0xc40
rcu_cpu_kthread+0x4e9/0x830
smpboot_thread_fn+0x24d/0x3b0
kthread+0x3bd/0x410
ret_from_fork+0x35/0x40
ret_from_fork_asm+0x1a/0x30
write to 0xffff96b95f42d8d8 of 1 bytes by task 88 on cpu 8:
rcu_preempt_deferred_qs_handler+0x1e/0x30
irq_work_single+0xaf/0x160
run_irq_workd+0x91/0xc0
smpboot_thread_fn+0x24d/0x3b0
kthread+0x3bd/0x410
ret_from_fork+0x35/0x40
ret_from_fork_asm+0x1a/0x30
no locks held by irq_work/8/88.
irq event stamp: 200272
hardirqs last enabled at (200272): [<ffffffffb0f56121>] finish_task_switch+0x131/0x320
hardirqs last disabled at (200271): [<ffffffffb25c7859>] __schedule+0x129/0xd70
softirqs last enabled at (0): [<ffffffffb0ee093f>] copy_process+0x4df/0x1cc0
softirqs last disabled at (0): [<0000000000000000>] 0x0
The problem is that irq-work handlers run with interrupts enabled, which
means that rcu_preempt_deferred_qs_handler() could be interrupted,
and that interrupt handler might contain an RCU read-side critical
section, which might invoke rcu_read_unlock_special(). In the strict
KCSAN mode of operation used by RCU, this constitutes a data race on
the ->defer_qs_iw_pending field.
This commit therefore disables interrupts across the portion of the
rcu_preempt_deferred_qs_handler() that updates the ->defer_qs_iw_pending
field. This suffices because this handler is not a fast path.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
Describing the dependencies between registers and features is on
the masochistic side of things, with hard-coded values that would
be better taken from the existing description.
Add a couple of helpers to that effect, and repaint the dependency
array. More could be done to improve this test, but my interest is
wearing thin...
Marc Zyngier [Mon, 14 Jul 2025 12:26:28 +0000 (13:26 +0100)]
KVM: arm64: Let GICv3 save/restore honor visibility attribute
The GICv3 save/restore code never needed any visibility attribute,
but that's about to change. Make vgic_v3_has_cpu_sysregs_attr()
check the visibility in case a register is hidden.
Marc Zyngier [Mon, 14 Jul 2025 12:26:25 +0000 (13:26 +0100)]
KVM: arm64: Don't advertise ICH_*_EL2 registers through GET_ONE_REG
It appears that exposing the GICv3 EL2 registers through the usual
sysreg interface is not consistent with the way we expose the EL1
registers. The latter are exposed via the GICv3 device interface
instead, and there is no reason why the EL2 registers should get
a different treatement.
Hide the registers from userspace until the GICv3 code grows the
required infrastructure.
Marc Zyngier [Mon, 14 Jul 2025 12:26:24 +0000 (13:26 +0100)]
KVM: arm64: Make RVBAR_EL2 accesses UNDEF
We always expose a virtual CPU that has EL3 when NV is enabled,
irrespective of EL3 being actually implemented in HW.
Therefore, as per the architecture, RVBAR_EL2 must UNDEF, since
EL2 is not the highest implemented exception level. This is
consistent with RMR_EL2 also triggering an UNDEF.
Oliver Upton [Tue, 15 Jul 2025 06:25:07 +0000 (23:25 -0700)]
KVM: arm64: Commit exceptions from KVM_SET_VCPU_EVENTS immediately
syzkaller has found that it can trip a warning in KVM's exception
emulation infrastructure by repeatedly injecting exceptions into the
guest.
While it's unlikely that a reasonable VMM will do this, further
investigation of the issue reveals that KVM can potentially discard the
"pending" SEA state. While the handling of KVM_GET_VCPU_EVENTS presumes
that userspace-injected SEAs are realized immediately, in reality the
emulated exception entry is deferred until the next call to KVM_RUN.
Hack-a-fix the immediate issues by committing the pending exceptions to
the vCPU's architectural state immediately in KVM_SET_VCPU_EVENTS. This
is no different to the way KVM-injected exceptions are handled in
KVM_RUN where we potentially call __kvm_adjust_pc() before returning to
userspace.
Reported-by: syzbot+4e09b1432de3774b86ae@syzkaller.appspotmail.com Reported-by: syzbot+1f6f096afda6f4f8f565@syzkaller.appspotmail.com Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
This series contains 3 fixes somewhat related to various races we have
while handling fallback.
The root cause of the issues addressed here is that the check for
"we can fallback to tcp now" and the related action are not atomic. That
also applies to fallback due to MP_FAIL -- where the window race is even
wider.
Address the issue introducing an additional spinlock to bundle together
all the relevant events, as per patch 1 and 2. These fixes can be
backported up to v5.19 and v5.15.
Note that mptcp_disconnect() unconditionally clears the fallback status
(zeroing msk->flags) but don't touch the `allows_infinite_fallback`
flag. Such issue is addressed in patch 3, and can be backported up to
v5.17.
====================
Paolo Abeni [Mon, 14 Jul 2025 16:41:46 +0000 (18:41 +0200)]
mptcp: reset fallback status gracefully at disconnect() time
mptcp_disconnect() clears the fallback bit unconditionally, without
touching the associated flags.
The bit clear is safe, as no fallback operation can race with that --
all subflow are already in TCP_CLOSE status thanks to the previous
FASTCLOSE -- but we need to consistently reset all the fallback related
status.
Also acquire the relevant lock, to avoid fouling static analyzers.
Fixes: b29fcfb54cd7 ("mptcp: full disconnect implementation") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250714-net-mptcp-fallback-races-v1-3-391aff963322@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Mon, 14 Jul 2025 16:41:45 +0000 (18:41 +0200)]
mptcp: plug races between subflow fail and subflow creation
We have races similar to the one addressed by the previous patch between
subflow failing and additional subflow creation. They are just harder to
trigger.
The solution is similar. Use a separate flag to track the condition
'socket state prevent any additional subflow creation' protected by the
fallback lock.
The socket fallback makes such flag true, and also receiving or sending
an MP_FAIL option.
The field 'allow_infinite_fallback' is now always touched under the
relevant lock, we can drop the ONCE annotation on write.
Fixes: 478d770008b0 ("mptcp: send out MP_FAIL when data checksum fails") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250714-net-mptcp-fallback-races-v1-2-391aff963322@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since we need to track the 'fallback is possible' condition and the
fallback status separately, there are a few possible races open between
the check and the actual fallback action.
Add a spinlock to protect the fallback related information and use it
close all the possible related races. While at it also remove the
too-early clearing of allow_infinite_fallback in __mptcp_subflow_connect():
the field will be correctly cleared by subflow_finish_connect() if/when
the connection will complete successfully.
If fallback is not possible, as per RFC, reset the current subflow.
Since the fallback operation can now fail and return value should be
checked, rename the helper accordingly.
Multicast good packets received by PF rings that pass ethternet MAC
address filtering are counted for rtnl_link_stats64.multicast. The
counter is not cleared on read. Fix the duplicate counting on updating
statistics.
When device reset is triggered by feature changes such as toggling Rx
VLAN offload, wx->do_reset() is called to reinitialize Rx rings. The
hardware descriptor ring may retain stale values from previous sessions.
And only set the length to 0 in rx_desc[0] would result in building
malformed SKBs. Fix it to ensure a clean slate after device reset.
The wx_rx_buffer structure contained two DMA address fields: 'dma' and
'page_dma'. However, only 'page_dma' was actually initialized and used
to program the Rx descriptor. But 'dma' was uninitialized and used in
some paths.
This could lead to undefined behavior, including DMA errors or
use-after-free, if the uninitialized 'dma' was used. Althrough such
error has not yet occurred, it is worth fixing in the code.
Fixes: 3c47e8ae113a ("net: libwx: Support to receive packets in NAPI") Cc: stable@vger.kernel.org Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250714024755.17512-3-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>