This test extends "delete and re-add" to validate the previous commit. A
new 'subflow' endpoint is added, but the subflow request will be
rejected. The result is that no subflow will be established from this
address.
Later, the endpoint is removed and re-added after having cleared the
firewall rule. Before the previous commit, the client would not have
been able to create this new subflow.
While at it, extra checks have been added to validate the expected
numbers of MPJ and RM_ADDR.
The 'Fixes' tag below is the same as the one from the previous commit:
this patch is not fixing anything wrong in the selftests, but it
validates the previous fix for an issue introduced by that commit ID.
Fixes: b6c08380860b ("mptcp: remove addr and subflow in PM netlink") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-4-38035d40de5b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This case was not covered, and the wrong ID was set before the previous
commit.
The rest is not modified; it just increases the code coverage.
The right address ID can be verified by looking at the packet traces. We
could automate that using Netfilter with some cBPF code for example, but
that's always a bit cryptic. Packetdrill seems better fitted for that.
select_local_address() and select_signal_address() both select an
endpoint entry from the list inside an RCU protected section, but return
a reference to it, to be read later on. If the entry is dereferenced
after the RCU unlock, reading info could cause a Use-after-Free.
A simple solution is to copy the required info while inside the RCU
protected section to avoid any risk of a UaF later. The address ID might
need to be modified later to handle the ID0 case, so a copy seems
acceptable.
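A minimal sketch of that copy-under-RCU pattern (names are illustrative,
not the exact mptcp code): the needed fields are copied before
rcu_read_unlock() instead of keeping a pointer into the RCU-protected list.

    rcu_read_lock();
    list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) {
            if (!entry_is_usable(entry))    /* hypothetical predicate */
                    continue;
            *local = *entry;                /* copy, don't keep the pointer */
            break;
    }
    rcu_read_unlock();
    /* 'local' can now be read (and its ID adjusted) safely */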
When reacting upon the reception of an ADD_ADDR, the in-kernel PM first
looks for fullmesh endpoints. If there are some, it will pick them,
using their entry ID.
It should set ID 0 when using the endpoint corresponding to the initial
subflow, a special case imposed by the MPTCP specs.
Note that msk->mpc_endpoint_id might not be set when receiving the first
ADD_ADDR from the server, so it is better to compare the addresses.
Fixes: 1a0d6136c5f0 ("mptcp: local addresses fullmesh") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-12-38035d40de5b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
... before decrementing the add_addr_accepted counter helped to find a
bug when running the "remove single subflow" subtest from the
mptcp_join.sh selftest.
Removing a 'subflow' endpoint will first trigger a RM_ADDR, then the
subflow closure. Before this patch, and upon the reception of the
RM_ADDR, the other peer will then try to decrement this
add_addr_accepted. That's not correct because the attached subflows have
not been created upon the reception of an ADD_ADDR.
A way to solve that is to decrement the counter only if the attached
subflow was an MP_JOIN to a remote id that was not 0, and initiated by
the host receiving the RM_ADDR.
Fixes: d0876b2284cf ("mptcp: add the incoming RM_ADDR support") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-9-38035d40de5b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
... before decrementing the local_addr_used counter helped to find a bug
when running the "remove single address" subtest from the mptcp_join.sh
selftests.
Removing a 'signal' endpoint will trigger the removal of all subflows
linked to this endpoint via mptcp_pm_nl_rm_addr_or_subflow() with
rm_type == MPTCP_MIB_RMSUBFLOW. This will decrement the local_addr_used
counter, which is wrong in this case because this counter is linked to
'subflow' endpoints, and here it is a 'signal' endpoint that is being
removed.
Now the counter is decremented only if the ID is used outside of
mptcp_pm_nl_rm_addr_or_subflow(), only for 'subflow' endpoints, and only
if the ID is not 0 -- local_addr_used does not take those into account.
The ID is marked as available, and the counter decremented, regardless of
whether a subflow using this ID is currently present, because the subflow
could have been closed earlier.
Fixes: 06faa2271034 ("mptcp: remove multi addresses and subflows in PM") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-8-38035d40de5b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This helper is confusing. It is in pm.c, but it is specific to the
in-kernel PM and it cannot be used by the userspace one. Also, it simply
calls one in-kernel specific function with the PM lock, while the
similar mptcp_pm_remove_addr() helper requires the PM lock.
What's left is the pr_debug(), which is not that useful, because a
similar one is present in the only function called by this helper:
mptcp_pm_nl_rm_subflow_received().
After these modifications, this helper can be marked as 'static', and
the lock can be taken only once in mptcp_pm_flush_addrs_and_subflows().
Note that it is not a bug fix, but it will help backporting the
following commits.
If no subflow is attached to the 'subflow' endpoint that is being
removed, the addr ID will not be marked as available again.
Mark the linked ID as available when removing the 'subflow' endpoint if
no subflow is attached to it.
While at it, the local_addr_used counter is decremented if the ID was
marked as being used to reflect the reality, but also to allow adding
new endpoints after that.
Fixes: b6c08380860b ("mptcp: remove addr and subflow in PM netlink") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-3-38035d40de5b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Terminating the for_each_available_child_of_node() loop early requires
dropping the OF node reference, so bailing out on errors misses this.
Solve the OF node reference leak with the scoped
for_each_available_child_of_node_scoped() variant.
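For illustration, a minimal sketch of the scoped-iterator pattern
(hypothetical helper, not the actual thermal code): the scoped variant
drops the child reference automatically on every exit path, so early
returns no longer leak it.

    for_each_available_child_of_node_scoped(parent, child) {
            int ret = handle_child(child);  /* hypothetical helper */

            if (ret)
                    return ret;             /* no of_node_put() needed */
    }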
Fixes: 3fd6d6e2b4e8 ("thermal/of: Rework the thermal device tree initialization") Cc: <stable@vger.kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20240814195823.437597-3-krzysztof.kozlowski@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
thermal_of_zone_register() calls of_thermal_zone_find(), which iterates
over OF nodes with for_each_available_child_of_node() to find a matching
thermal zone node. When it finds one, it exits the loop and returns the
node. Prematurely ending a for_each_available_child_of_node() loop
requires dropping the OF node reference, thus success of
of_thermal_zone_find() means that the caller must drop the reference.
Fixes: 3fd6d6e2b4e8 ("thermal/of: Rework the thermal device tree initialization") Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20240814195823.437597-2-krzysztof.kozlowski@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Terminating the for_each_child_of_node() loop early requires dropping the
OF node reference, so bailing out after a thermal_of_populate_trip() error
misses this. Solve the OF node reference leak with the scoped
for_each_child_of_node_scoped() variant.
Fixes: d0c75fa2c17f ("thermal/of: Initialize trip points separately") Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20240814195823.437597-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix this by using the non-coherent allocator instead. There might be a
better answer to this, but it would involve ripping up some of the APIs
that use sg lists.
Cc: stable@vger.kernel.org Fixes: 2541626cfb79 ("drm/nouveau/acr: use common falcon HS FW code for ACR FWs") Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240815201923.632803-1-airlied@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With "quiet" set in bootargs, there is power domain failure:
"imx93_power_domain 44462400.power-domain: pd_off timeout: name: 44462400.power-domain, stat: 4"
The current power-on operation takes the ISO state as the power-on
finished flag, but that is wrong. If a power-off request arrives before
the power-on operation has really finished, the power-off will never
finish because the preceding power-on has not completed, so the power-off
does not actually trigger the hardware state machine to run. SSAR is the
last step when powering on a domain, so wait for SSAR to be done when
powering on.
Since the EdgeLock Enclave (ELE) handshake is involved in the flow,
enlarge the wait time to 10ms for both power-on and power-off to avoid
timeouts.
Cc: stable@vger.kernel.org Fixes: 0a0f7cc25d4a ("soc: imx: add i.MX93 SRC power domain driver") Reviewed-by: Jacky Bai <ping.bai@nxp.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Link: https://lore.kernel.org/r/20240814124740.2778952-1-peng.fan@oss.nxp.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mandatory locking is enforced for cached reads, which violates
default posix semantics, and it is also enforced inconsistently.
This affected recent versions of libreoffice, and can be
demonstrated by opening a file twice from the same client,
locking it from handle one and trying to read from it from
handle two (which fails, returning EACCES).
There is already a mount option "forcemandatorylock"
(which defaults to off), so with this change only when the user
intentionally specifies "forcemandatorylock" on mount will we
break posix semantics on read to a locked range (ie we will
only fail in this case, if the user mounts with
"forcemandatorylock").
An earlier patch fixed the write path.
Fixes: 85160e03a79e ("CIFS: Implement caching mechanism for mandatory brlocks") Cc: stable@vger.kernel.org Cc: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: David Howells <dhowells@redhat.com> Reported-by: abartlet@samba.org Reported-by: Kevin Ottens <kevin.ottens@enioka.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix a race condition where the clock provider comes up after mmc has been
probed; this causes mmc to fail without retrying.
When the clk source returns a DEFER error, pass it on up the chain.
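A rough sketch of the idea (assumed variable names, not the literal
dw_mmc diff): a -EPROBE_DEFER from the clock lookup is propagated so the
core retries probing once the clock provider appears.

    host->ciu_clk = devm_clk_get(host->dev, "ciu");
    if (IS_ERR(host->ciu_clk)) {
            if (PTR_ERR(host->ciu_clk) == -EPROBE_DEFER)
                    return -EPROBE_DEFER;   /* retry probe later */
            dev_dbg(host->dev, "ciu clock not available\n");
    }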
Fixes: f90a0612f0e1 ("mmc: dw_mmc: lookup for optional biu and ciu clocks") Signed-off-by: Ben Whitten <ben.whitten@gmail.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240811212212.123255-1-ben.whitten@gmail.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we use cmd8 as the tuning command in hs400 mode, the command
response sent back by some eMMC devices cannot be correctly sampled
by the MTK eMMC controller at some weak sample timings. In this case,
a command timeout error may occur, so we must receive the following
data to make sure the next cmd8 is sent correctly.
Commit e2ffe502ba45 ("cgroup/cpuset: Add cpuset.cpus.exclusive for
v2") adds a user writable cpuset.cpus.exclusive file for setting
exclusive CPUs to be used for the creation of partitions. Since then
effective_xcpus depends on both the cpuset.cpus and cpuset.cpus.exclusive
setting. If cpuset.cpus.exclusive is set, effective_xcpus will depend
only on cpuset.cpus.exclusive. When it is not set, effective_xcpus
will be set according to the cpuset.cpus value when the cpuset becomes
a valid partition root.
When cpuset.cpus is being cleared by the user, effective_xcpus should
only be cleared when cpuset.cpus.exclusive is not set. However, that
is not currently the case.
# cd /sys/fs/cgroup/
# mkdir test
# echo +cpuset > cgroup.subtree_control
# cd test
# echo 3 > cpuset.cpus.exclusive
# cat cpuset.cpus.exclusive.effective
3
# echo > cpuset.cpus
# cat cpuset.cpus.exclusive.effective // was cleared
Fix it by clearing effective_xcpus only if cpuset.cpus.exclusive is
not set.
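A sketch of the intended condition (approximate, based on the description
above): effective_xcpus is cleared on a cpuset.cpus write only when no
exclusive CPUs were requested.

    if (cpumask_empty(trialcs->cpus_allowed) &&
        cpumask_empty(cs->exclusive_cpus))
            cpumask_clear(trialcs->effective_xcpus);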
It can be reproduced with the following commands:
cd /sys/fs/cgroup/
mkdir test
cd test/
echo +cpuset > ../cgroup.subtree_control
echo root > cpuset.cpus.partition
cat /sys/fs/cgroup/cpuset.cpus.effective
0-3
echo 0-3 > cpuset.cpus // taking away all cpus from root
This issue is caused by the incorrect rebuilding of scheduling domains.
In this scenario, test/cpuset.cpus.partition should be an invalid root
and should not trigger the rebuilding of scheduling domains. When calling
update_parent_effective_cpumask() with partcmd_update, if newmask is not
null, it should be rechecked whether newmask still has CPUs available for
a parent/cs that has tasks.
On a system with a GICv3, if a guest hasn't been configured with
GICv3 and the host is not capable of GICv2 emulation,
a write to any of the ICC_*SGI*_EL1 registers is trapped to EL2.
We therefore try to emulate the SGI access, only to hit a NULL
pointer as no private interrupt is allocated (no GIC, remember?).
The obvious fix is to give the guest what it deserves, in the
shape of an UNDEF exception.
Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240820100349.3544850-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If there were LPIs being mapped behind our back (i.e., between .start() and
.stop()), we would put them at iter_unmark_lpis() without checking if they
were actually *marked*, which is obviously not good.
Switch to use the xa_for_each_marked() iterator to fix it.
Cc: stable@vger.kernel.org Fixes: 85d3ccc8b75b ("KVM: arm64: vgic-debug: Use an xarray mark for debug iterator") Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240817101541.1664-1-yuzenghui@huawei.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kolbjørn and Jonáš reported that their 32-bit PowerMacs were crashing
in pata-macio since commit 09fe2bfa6b83 ("ata: pata_macio: Fix
max_segment_size with PAGE_SIZE == 64K").
That commit increased max_segment_size to 64KB, with the justification
that the SCSI core was already using that size when PAGE_SIZE == 64KB,
and that there was existing logic to split over-sized requests.
However with a sufficiently large request, the splitting logic causes
each sg to be split into two commands in the DMA table, leading to
overflow of the DMA table, triggering the BUG_ON().
With default settings the bug doesn't trigger, because the request size
is limited by max_sectors_kb == 1280, however max_sectors_kb can be
increased, and apparently some distros do that by default using udev
rules.
Fix the bug for 4KB kernels by reverting to the old max_segment_size.
For 64KB kernels the sg_tablesize needs to be halved, to allow for the
possibility that each sg will be split into two.
The old quirk combination sometimes caused a laggy keyboard after boot. With
the new quirk the initial issue of an unresponsive keyboard after s3 resume
is also fixed, but it doesn't have the negative side effect of the
sometimes laggy keyboard.
Signed-off-by: Werner Sembach <wse@tuxedocomputers.com> Cc: stable@vger.kernel.org Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20240104183118.779778-3-wse@tuxedocomputers.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On s3 resume the i8042 driver tries to restore the controller to a known
state by reinitializing things, however this can confuse the controller
with different effects, mostly occasionally unresponsive keyboards after
resume.
These issues do not arise on s0ix resume, as there the controller is
assumed to have preserved its state from before suspend.
This patch adds a quirk for devices where the reinitialization on s3 resume
is not needed and might be harmful as described above. It does this by
using the s0ix resume code path at selected locations.
This new quirk goes beyond what the preexisting reset=never quirk does,
which only skips some reinitialization steps.
Signed-off-by: Werner Sembach <wse@tuxedocomputers.com> Cc: stable@vger.kernel.org Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20240104183118.779778-2-wse@tuxedocomputers.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Wacom driver maps the HID_DG_TWIST usage to ABS_Z (rather than ABS_RZ)
for historic reasons. When the code to support twist was introduced in
commit 50066a042da5 ("HID: wacom: generic: Add support for height, tilt,
and twist usages"), we were careful to write it in such a way that it had
HID calculate the resolution of the twist axis assuming ABS_RZ instead
(so that we would get correct angular behavior). This was broken with
the introduction of commit 08a46b4190d3 ("HID: wacom: Set a default
resolution for older tablets"), which moved the resolution calculation
to occur *before* the adjustment from ABS_Z to ABS_RZ occurred.
This commit moves the calculation of resolution after the point that
we are finished setting things up for its proper use.
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com> Fixes: 08a46b4190d3 ("HID: wacom: Set a default resolution for older tablets") Cc: stable@vger.kernel.org Signed-off-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Loongson64 C and G processors have an EXTIMER feature which
conflicts with the CP0 counter.
Although the processor resets in EXTIMER-disabled & INTIMER-enabled
mode, which is compatible with the MIPS CP0 compare, firmware
may attempt to enable EXTIMER and interfere with CP0 compare.
Set timer mode back to MIPS compatible mode to fix booting on
systems with such firmware before we have an actual driver for
EXTIMER.
Cc: stable@vger.kernel.org Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When performing the port_hwtstamp_set operation, ptp_schedule_worker()
will be called if hardware timestamping is enabled on any of the ports.
When using multiple ports for PTP, port_hwtstamp_set is executed for
each port. When called for the first time ptp_schedule_worker() returns
0. On subsequent calls it returns 1, indicating the worker is already
scheduled. Currently the ksz driver treats 1 as an error and fails to
complete the port_hwtstamp_set operation, thus leaving the timestamping
configuration for those ports unchanged.
This patch fixes this by ignoring the ptp_schedule_worker() return
value.
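A sketch of the fix described above (structure names are assumptions):
the return value is simply not treated as an error, since 0 and 1 only
distinguish "newly scheduled" from "already scheduled".

    /* 0: worker newly scheduled, 1: already scheduled - neither is an error */
    ptp_schedule_worker(ptp_data->clock, 0);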
The MAC can only add the TX delay and it cannot be modified. Having both
the MAC and the PHY add the TX delay causes transmission problems, so the
TX delay must be disabled in the PHY: when RGMII is used to attach an
external PHY, pass PHY_INTERFACE_MODE_RGMII_RXID to the PHY driver.
This does not matter for the internal PHY.
Fixes: bc2426d74aa3 ("net: ngbe: convert phylib to phylink") Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com> Cc: stable@vger.kernel.org # 6.3+ Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/E6759CF1387CF84C+20240820030425.93003-1-mengyuanlou@net-swift.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With the rework of the AP bus scan and the introduction of
a bindings-complete completion, the time until userspace finally
receives an AP bus binding complete uevent has increased.
Unfortunately this event triggers some important
jobs for preparation of KVM guests, for example the modification
of card/queue masks to reassign AP resources to the alternate
AP queue device driver (vfio_ap) which is the precondition
for building mediated devices which may be a precondition for
starting KVM guests using AP resources.
This small fix now triggers the check for binding complete
each time an AP device driver has registered. With this patch
the bindings complete may be posted up to 30s earlier as there
is no need to wait for the next AP bus scan any more.
When only the last resource is invalid, tpmi_sst_dev_add() returns an
error even if there are other valid resources before it. This function
should return an error only when there are no valid resources.
Here tpmi_sst_dev_add() is returning "ret" variable. But this "ret"
variable contains the failure status of last call to sst_main(), which
failed for the invalid resource. But there may be other valid resources
before the last entry.
To address this, do not update "ret" variable for sst_main() return
status.
If there are no valid resources, it is already checked for by !inst
below the loop and -ENODEV is returned.
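A rough sketch of the intent (names are approximations, not the exact
driver code): a per-resource sst_main() failure no longer feeds the
function's return status; only the "no valid instance" case reports an
error.

    for (i = 0; i < num_resources; ++i) {
            if (sst_main(auxdev, &pd_info[i]))
                    continue;       /* skip the invalid resource, keep ret untouched */
            inst++;
    }
    if (!inst)
            return -ENODEV;         /* no valid resources at all */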
The dell-uart-backlight driver supports backlight control on Dell All In
One (AIO) models using a backlight controller board connected to a UART.
In DSDT this uart port will be defined as:
Name (_HID, "DELL0501")
Name (_CID, EisaId ("PNP0501"))
Now the first AIO has turned up which has not only the DSDT bits for this,
but also an actual controller attached to the UART, yet it is not using
this controller for backlight control.
Use the acpi_video_get_backlight_type() function from the ACPI video-detect
code to check if the dell-uart-backlight driver should actually be used.
This allows reusing the existing ACPI video-detect infra to override
the backlight control method on the commandline or with DMI quirks.
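A minimal sketch of the resulting probe-time check (assuming a dell_uart
entry exists in the backlight-type enum, as added by the companion ACPI
video-detect patch; exact identifiers may differ):

    if (acpi_video_get_backlight_type() != acpi_backlight_dell_uart) {
            dev_dbg(dev, "backlight handled by another driver\n");
            return -ENODEV;
    }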
Fixes: 484bae9e4d6a ("platform/x86: Add new Dell UART backlight driver") Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andy@kernel.org> Link: https://patch.msgid.link/20240814190159.15650-3-hdegoede@redhat.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dell All In One (AIO) models released after 2017 may use a backlight
controller board connected to a UART.
In DSDT this uart port will be defined as:
Name (_HID, "DELL0501")
Name (_CID, EisaId ("PNP0501"))
The Dell OptiPlex 7760 AIO has an ACPI device for one of its UARTs with
the above _HID + _CID. Loading the dell-uart-backlight driver shows that
there actually is a backlight controller board attached to the UART,
which reports a firmware version of "G&MX01-V15".
But the backlight controller board does not actually control the backlight
brightness and the GPU's native backlight control method does work.
Add a quirk to use the GPU's native backlight control method on this model.
Fixes: 484bae9e4d6a ("platform/x86: Add new Dell UART backlight driver") Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2303936 Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andy@kernel.org> Link: https://patch.msgid.link/20240814190159.15650-4-hdegoede@redhat.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dell All In One (AIO) models released after 2017 use a backlight
controller board connected to a UART.
In DSDT this uart port will be defined as:
Name (_HID, "DELL0501")
Name (_CID, EisaId ("PNP0501"))
Commit 484bae9e4d6a ("platform/x86: Add new Dell UART backlight driver")
has added support for this, but I neglected to tie this into
acpi_video_get_backlight_type().
Now the first AIO has turned up which has not only the DSDT bits for this,
but also an actual controller attached to the UART, yet it is not using
this controller for backlight control.
Add support to acpi_video_get_backlight_type() for a new dell_uart
backlight type. So that the existing infra to override the backlight
control method on the commandline or with DMI quirks can be used.
Fixes: 484bae9e4d6a ("platform/x86: Add new Dell UART backlight driver") Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andy@kernel.org> Link: https://patch.msgid.link/20240814190159.15650-2-hdegoede@redhat.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the STATUS_NO_MORE_FILES status is set in an smb2 query dir
response, ->StructureSize is set to 9, which means the buffer has 1 byte.
This issue occurs because ->Buffer[1] in smb2_query_directory_rsp was
converted to a flex-array.
Fixes: eb3e28c1e89b ("smb3: Replace smb2pdu 1-element arrays with flex-arrays") Cc: stable@vger.kernel.org # v6.1+ Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
scsi_logical_block_count() should return the block count of a given SCSI
command. The original implementation ended up shifting twice, leading to an
incorrect count being returned. Fix the conversion between bytes and
logical blocks.
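Conceptually the helper needs a single shift from bytes to logical
blocks; a sketch of the intended math (not necessarily the literal
upstream code):

    static inline unsigned int scsi_logical_block_count(struct scsi_cmnd *scmd)
    {
            unsigned int shift = ilog2(scmd->device->sector_size);

            /* one shift only: bytes -> logical blocks */
            return blk_rq_bytes(scsi_cmd_to_rq(scmd)) >> shift;
    }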
Cc: stable@vger.kernel.org Fixes: 6a20e21ae1e2 ("scsi: core: Add helper to return number of logical blocks in a request") Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com> Link: https://lore.kernel.org/r/20240813053534.7720-1-chaotian.jing@mediatek.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 4733b65d82bd ("nvme: start keep-alive after admin queue setup")
moves starting keep-alive from nvme_start_ctrl() into
nvme_init_ctrl_finish(), but don't move stopping keep-alive into
nvme_uninit_ctrl(), so keep-alive work can be started and keep pending
after failing to start controller, finally use-after-free is triggered if
nvme host driver is unloaded.
This patch fixes kernel panic when running nvme/004 in case that connection
failure is triggered, by moving stopping keep-alive into nvme_uninit_ctrl().
This way is reasonable because keep-alive is now started in
nvme_init_ctrl_finish().
Fixes: 3af755a46881 ("nvme: move nvme_stop_keep_alive() back to original position") Cc: Hannes Reinecke <hare@suse.de> Cc: Mark O'Donovan <shiftee@posteo.net> Reported-by: Changhui Zhong <czhong@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Symbol offsets to the KASLR base do not match symbol addresses in
the vmlinux image. That is the result of setting the KASLR base
to the beginning of the .text section as a result of an optimization.
Revert that optimization and allocate virtual memory for the
whole kernel image including __START_KERNEL bytes as per the
linker script. That allows keeping the semantics of the KASLR
base offset in sync with other architectures.
Rename __START_KERNEL to TEXT_OFFSET, since it represents the
offset of the .text section within the kernel image, rather than
a virtual address.
Still skip mapping TEXT_OFFSET bytes to save memory on pgtables
and provoke exceptions in case an attempt to access this area is
made, as no kernel symbol may reside there.
In case CONFIG_KASAN is enabled the location counter might exceed
the value of TEXT_OFFSET, while the decompressor linker script
forcefully resets it to TEXT_OFFSET, which leads to a sections
overlap link failure. Use MAX() expression to avoid that.
When physical memory for the kernel image is allocated it does not
consider extra memory required for offsetting the image start to
match it with the lower 20 bits of KASLR virtual base address. That
might lead to kernel access beyond its memory range.
Suggested-by: Vasily Gorbik <gor@linux.ibm.com> Fixes: 693d41f7c938 ("s390/mm: Restore mapping of kernel image using large pages") Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Pre-allocate but don't initialize fences at xe_sched_job_create(),
and initialize / arm them instead at xe_sched_job_arm(). This
makes it possible to move xe_sched_job_create() with its memory
allocation out of any lock that is required for fence
initialization, and that may not allow memory allocation under it.
Replace the struct dma_fence_array for parallel jobs with a
struct dma_fence_chain, since the former doesn't allow
a split-up between allocation and initialization.
v2:
- Rebase.
- Don't always use the first lrc when initializing parallel
lrc fences.
- Use dma_fence_chain_contained() to access the lrc fences.
v4:
- Add an assert that job->lrc_seqno == fence->seqno.
(Matthew Brost)
Since sometimes a lock is required to initialize a seqno fence,
and it might be desirable not to hold that lock while performing
memory allocations, split the lrc seqno fence creation up into an
allocation phase and an initialization phase.
Since lrc seqno fences under the hood are hw_fences, do the same
for these and remove the xe_hw_fence_create() function since it
is not used anymore.
Tightly coupling these seqnos presents problems if alternative fences for
jobs are used. Decouple these for correctness.
v2:
- Slightly reword commit message (Thomas)
- Make sure the lrc fence ops are used in comparison (Thomas)
- Assume seqno is unsigned rather than signed in format string (Thomas)
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-2-thomas.hellstrom@linux.intel.com
Stable-dep-of: 9e7f30563677 ("drm/xe: Free job before xe_exec_queue_put") Signed-off-by: Sasha Levin <sashal@kernel.org>
Limit the protection only during moments of actual job execution,
and introduce protection for guc submit fini, which is currently
unprotected due to the absence of exec_queue life protection.
In the regular use case scenario, user space will create an
exec queue, and keep it alive to reuse that until it is done
with that kind of workload.
For the regular desktop cases, it means that the exec_queue
is alive even in idle scenarios where the display goes off. This
is unacceptable since it would entirely block runtime PM
indefinitely, blocking deeper Package-C states. This would be
a wasteful drain of power.
Cc: Matthew Brost <matthew.brost@intel.com> Tested-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-3-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Stable-dep-of: 9e7f30563677 ("drm/xe: Free job before xe_exec_queue_put") Signed-off-by: Sasha Levin <sashal@kernel.org>
Harden the buffer peeking a bit, by adding a sanity check for it having
a valid size. Outside of that, arg->max_len is a size_t, though it's
only ever set to a 32-bit value (as it's governed by MAX_RW_COUNT).
Bump our needed check to a size_t so we know it fits. Finally, cap the
calculated needed iov value to the PEEK_MAX_IMPORT, which is the
maximum number of segments that should be peeked.
If the "test->highmem = alloc_pages()" allocation fails then calling
__free_pages(test->highmem) will result in a NULL dereference. Also
change the error code to -ENOMEM instead of returning success.
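A sketch of the hardened error path (variable names are illustrative):

    test->highmem = alloc_pages(GFP_KERNEL | __GFP_HIGHMEM, BUFFER_ORDER);
    if (!test->highmem) {
            ret = -ENOMEM;          /* report failure instead of success */
            goto free_test_buffer;  /* and never __free_pages(NULL, ...) */
    }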
Only set tile->mmio.regs to NULL if not the root tile in tile_fini. The
root tile mmio regs are set up earlier in MMIO init, thus they should be
set to NULL in mmio_fini.
Set our various mmio mappings to NULL. This should make it easier to
catch something rogue trying to mess with mmio after device removal. For
example, we might unmap everything and then start hitting some mmio
address which has already been unmapped by us and then remapped by
something else, causing all kinds of carnage.
It is not valid to touch mmio once the device is removed, so make sure we
unmap on removal and not just when the driver instance goes away. Also set
the mmio pointers to NULL to hopefully catch such issues more easily.
Being part of the display, ideally the setup and cleanup would be done by
the display itself. However this is a bigger refactor that needs to be done
on both i915 and xe. For now, just fix the leak:
We are checking cp_irq_count from the wrong hdcp structure, which
ends up giving timed-out errors. We only increment the cp_irq_count
of the primary connector's hdcp structure, but here, in the case of a
multi-display setup, we end up checking the secondary connector's hdcp
structure, which will not have its cp_irq_count incremented. This leads
to a "timed out at CP_IRQ" error even though a CP_IRQ was raised. Extract
it from the correct intel_hdcp structure.
--v2
-Explain why it was the wrong hdcp structure [Jani]
It is necessary to call the pm_runtime_force_*() hooks as part of system
suspend/resume calls so that the runtime_pm hooks get called. This
ensures latest state of the IP is cached and restored during system
sleep. This is especially true if runtime autosuspend is enabled as
runtime suspend hooks may not be called at all before system sleeps.
Without this patch, OSPI NOR enumeration (READ_ID) fails during resume
as context saved during suspend path is inconsistent.
Fixes: 078d62de433b ("spi: cadence-qspi: add system-wide suspend and resume callbacks") Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com> Link: https://patch.msgid.link/20240814151237.3856184-1-vigneshr@ti.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
sc7180 programs the UBWC settings as 0x1e, as that means a
highest bank bit of 14, which matches what the GPU sets as well.
However, the highest_bank_bit field of the msm_mdss_data which is
being used to program the SSPP's fetch configuration is programmed
to a highest bank bit of 16, as 0x3 translates to 16 and not 14.
Fix the highest bank bit field used for the SSPP to match the mdss
and gpu settings.
Fixes: 6f410b246209 ("drm/msm/mdss: populate missing data") Reviewed-by: Rob Clark <robdclark@gmail.com> Tested-by: Stephen Boyd <swboyd@chromium.org> # Trogdor.Lazor
Patchwork: https://patchwork.freedesktop.org/patch/607625/ Link: https://lore.kernel.org/r/20240808235227.2701479-1-quic_abhinavk@quicinc.com Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When flushing a work item for cancellation, __flush_work() knows that it
exclusively owns the work item through its PENDING bit. 134874e2eee9
("workqueue: Allow cancel_work_sync() and disable_work() from atomic
contexts on BH work items") added a read of @work->data to determine whether
to use busy wait for BH work items that are being canceled. While the read
is safe when @from_cancel, @work->data was read before testing @from_cancel
to simplify code structure:
data = *work_data_bits(work);
if (from_cancel &&
!WARN_ON_ONCE(data & WORK_STRUCT_PWQ) && (data & WORK_OFFQ_BH)) {
While the read data was never used if !@from_cancel, this could trigger
KCSAN data race detection spuriously:
==================================================================
BUG: KCSAN: data-race in __flush_work / __flush_work
write to 0xffff8881223aa3e8 of 8 bytes by task 3998 on cpu 0:
instrument_write include/linux/instrumented.h:41 [inline]
___set_bit include/asm-generic/bitops/instrumented-non-atomic.h:28 [inline]
insert_wq_barrier kernel/workqueue.c:3790 [inline]
start_flush_work kernel/workqueue.c:4142 [inline]
__flush_work+0x30b/0x570 kernel/workqueue.c:4178
flush_work kernel/workqueue.c:4229 [inline]
...
read to 0xffff8881223aa3e8 of 8 bytes by task 50 on cpu 1:
__flush_work+0x42a/0x570 kernel/workqueue.c:4188
flush_work kernel/workqueue.c:4229 [inline]
flush_delayed_work+0x66/0x70 kernel/workqueue.c:4251
...
value changed: 0x0000000000400000 -> 0xffff88810006c00d
Reorganize the code so that @from_cancel is tested before @work->data is
accessed. The only problem is triggering KCSAN detection spuriously. This
shouldn't need READ_ONCE() or other access qualifiers.
No functional changes.
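The reorganized check roughly looks like this (a simplified sketch of the
described change):

    if (from_cancel) {
            unsigned long data = *work_data_bits(work);

            /* @work->data is only read after @from_cancel is confirmed */
            if (!WARN_ON_ONCE(data & WORK_STRUCT_PWQ) &&
                (data & WORK_OFFQ_BH))
                    busy_wait_for_bh_work();    /* placeholder for the BH busy-wait path */
    }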
Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: syzbot+b3e4f2f51ed645fd5df2@syzkaller.appspotmail.com Fixes: 134874e2eee9 ("workqueue: Allow cancel_work_sync() and disable_work() from atomic contexts on BH work items") Link: http://lkml.kernel.org/r/000000000000ae429e061eea2157@google.com Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
This is due to shift_and_mask() using a signed immediate to construct
the mask and being called with a shift of 31 (WORK_OFFQ_POOL_SHIFT) so
that it ends up decrementing from INT_MIN.
Use an unsigned constant '1U' to generate the mask in shift_and_mask().
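In miniature (a sketch close to the described fix): with a plain '1',
'1 << 31' yields INT_MIN and the following '- 1' is signed overflow
(undefined behaviour, flagged by UBSAN); the unsigned constant makes the
mask computation well-defined.

    static unsigned long shift_and_mask(unsigned long v, u32 shift, u32 bits)
    {
            /* 1U avoids signed overflow when bits == 31 */
            return (v >> shift) & ((1U << bits) - 1);
    }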
Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Fixes: 1211f3b21c2a ("workqueue: Preserve OFFQ bits in cancel[_sync] paths") Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Take into account the plane rotation and flipping when calculating src
positions for the wide plane parts.
This is not an issue yet, because rotation is only supported for the
UBWC planes and wide UBWC planes are rejected anyway, because in the
parallel multirect case only half of the usual width is supported for
tiled formats. However it's better to fix this now rather than stumbling
upon it later.
YUV formats require only CSC to be enabled. Even decimated formats
should not require a scaler. Relax the requirement and don't check for the
scaler block while checking if a YUV format can be enabled.
If the dpu_format_populate_layout() fails, then FB is prepared, but not
cleaned up. This ends up leaking the pin_count on the GEM object and
causes a splat during DRM file closure:
msm_obj->pin_count
WARNING: CPU: 2 PID: 569 at drivers/gpu/drm/msm/msm_gem.c:121 update_lru_locked+0xc4/0xcc
[...]
Call trace:
update_lru_locked+0xc4/0xcc
put_pages+0xac/0x100
msm_gem_free_object+0x138/0x180
drm_gem_object_free+0x1c/0x30
drm_gem_object_handle_put_unlocked+0x108/0x10c
drm_gem_object_release_handle+0x58/0x70
idr_for_each+0x68/0xec
drm_gem_release+0x28/0x40
drm_file_free+0x174/0x234
drm_release+0xb0/0x160
__fput+0xc0/0x2c8
__fput_sync+0x50/0x5c
__arm64_sys_close+0x38/0x7c
invoke_syscall+0x48/0x118
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x4c/0x120
el0t_64_sync_handler+0x100/0x12c
el0t_64_sync+0x190/0x194
irq event stamp: 129818
hardirqs last enabled at (129817): [<ffffa5f6d953fcc0>] console_unlock+0x118/0x124
hardirqs last disabled at (129818): [<ffffa5f6da7dcf04>] el1_dbg+0x24/0x8c
softirqs last enabled at (129808): [<ffffa5f6d94afc18>] handle_softirqs+0x4c8/0x4e8
softirqs last disabled at (129785): [<ffffa5f6d94105e4>] __do_softirq+0x14/0x20
Before re-starting link training reset the link phy params namely
the pre-emphasis and voltage swing levels otherwise the next
link training begins at the previously cached levels which can result
in link training failures.
For cases where the crtc's connectors_changed was set without enable/active
getting toggled, there is an atomic_enable() call followed by an
atomic_disable() but without an atomic_mode_set().
This results in a NULL ptr access for the dpu_encoder_get_drm_fmt() call in
the atomic_enable() as the dpu_encoder's connector was cleared in the
atomic_disable() but not re-assigned as there was no atomic_mode_set() call.
Fix the NULL ptr access by moving the assignment for atomic_enable() and also
use drm_atomic_get_new_connector_for_encoder() to get the connector from
the atomic_state.
Fix the dp_panel_get_supported_bpp() API to return the minimum
supported bpp correctly for relevant cases and use this API
to correct the behavior of the DP driver, which hard-codes the max
supported bpp to 30.
This is incorrect because the number of lanes and max data rate
supported by the lanes need to be taken into account.
Replace the hardcoded limit with the appropriate math which accounts
for the accurate number of lanes and max data rate.
changes in v2:
- Fix the dp_panel_get_supported_bpp() and use it
- Drop the max_t usage as dp_panel_get_supported_bpp() already
returns the min_bpp correctly now
changes in v3:
- replace min_t with just min as all params are u32
DPU debugging macros need to be converted to proper drm_debug_*
macros; however, that is going to be an intrusive change, not suitable for
a fix. Wire DPU_DEBUG and DPU_DEBUG_DRIVER to always use DRM_DEBUG_DRIVER
to make sure that DPU debugging messages always end up in the drm debug
messages and are controlled via the usual drm.debug mask.
I don't think that it is a good idea for a generic DPU_DEBUG macro to be
tied to DRM_UT_KMS. It is used to report a debug message from driver, so by
default it should go to the DRM_UT_DRIVER channel. While refactoring
debug macros later on we might end up with particular messages going to
ATOMIC or KMS, but DRIVER should be the default.
iucv_alloc_device() gets a format string and a varying number of
arguments. This is incorrectly forwarded by calling dev_set_name() with
the format string and a va_list, while dev_set_name() also expects a
varying number of arguments.
Symptoms:
Corrupted iucv device names, which can result in log messages like:
sysfs: cannot create duplicate filename '/devices/iucv/hvc_iucv1827699952'
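One safe way to forward a va_list to a variadic function is to render it
into a buffer first; a sketch under that assumption (the actual iucv fix
may differ in detail):

    va_list vargs;
    char buf[32];

    va_start(vargs, fmt);
    vsnprintf(buf, sizeof(buf), fmt, vargs);    /* consume the va_list here */
    va_end(vargs);
    rc = dev_set_name(dev, "%s", buf);          /* hand over a plain string */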
There is something wrong with ovs_drop_reasons. ovs_drop_reasons[0] is
"OVS_DROP_LAST_ACTION", but OVS_DROP_LAST_ACTION == __OVS_DROP_REASON + 1,
which means that ovs_drop_reasons[1] should be "OVS_DROP_LAST_ACTION".
And as Adrian tested, without the patch, adding flow to drop packets
results in:
drop at: do_execute_actions+0x197/0xb20 [openvsw (0xffffffffc0db6f97)
origin: software
input port ifindex: 8
timestamp: Tue Aug 20 10:19:17 2024 859853461 nsec
protocol: 0x800
length: 98
original length: 98
drop reason: OVS_DROP_ACTION_ERROR
With the patch, the same results in:
drop at: do_execute_actions+0x197/0xb20 [openvsw (0xffffffffc0db6f97)
origin: software
input port ifindex: 8
timestamp: Tue Aug 20 10:16:13 2024 475856608 nsec
protocol: 0x800
length: 98
original length: 98
drop reason: OVS_DROP_LAST_ACTION
Fix this by initializing ovs_drop_reasons with index.
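A sketch of the indexed initialization (illustrative; the actual patch may
use a macro): designated initializers pin each string to the slot matching
its enum value's offset from the subsystem base, so lookups find the right
name.

    static const char * const ovs_drop_reasons[] = {
            [OVS_DROP_LAST_ACTION - __OVS_DROP_REASON]  = "OVS_DROP_LAST_ACTION",
            [OVS_DROP_ACTION_ERROR - __OVS_DROP_REASON] = "OVS_DROP_ACTION_ERROR",
            /* ... remaining reasons follow the same pattern ... */
    };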
Fixes: 9d802da40b7c ("net: openvswitch: add last-action drop reason") Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Tested-by: Adrian Moreno <amorenoz@redhat.com> Reviewed-by: Adrian Moreno <amorenoz@redhat.com> Link: https://patch.msgid.link/20240821123252.186305-1-dongml2@chinatelecom.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
If a multicast address is removed but there are still some multicast
addresses, that address would remain programmed into the frame filter.
Fix this by explicitly setting the enable bit for each filter.
Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240822154059.1066595-3-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
If promiscuous mode is disabled when there are fewer than four multicast
addresses, then it will not be reflected in the hardware. Fix this by
always clearing the promiscuous mode flag even when we program multicast
addresses.
Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240822154059.1066595-2-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Some CPT AF registers are per LF and others are global. Translation
of the PF/VF local LF slot number to the actual LF slot number is required
only for accessing per LF registers. CPT AF global register accesses do
not require any LF slot number. Also, there is no reason for a CPT
PF/VF to know the actual LF's register offset.
Without this fix microcode loading will fail, VFs cannot be created
and hardware is not usable.
Fixes: bc35e28af789 ("octeontx2-af: replace cpt slot with lf id on reg write") Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240821070558.1020101-1-bbhushan2@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Remove the dma_unmap_page_attrs() call in the driver's XDP_REDIRECT
code path. This should have been removed when we let the page pool
handle the DMA mapping. This bug causes the warning:
Fixes: 578fcfd26e2a ("bnxt_en: Let the page pool manage the DMA mapping") Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20240820203415.168178-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
After ip6_local_out() has returned, we can no longer safely
dereference rt, unless we hold rcu_read_lock().
A similar issue has been fixed in commit a688caa34beb ("ipv6: take rcu lock in rawv6_send_hdrinc()")
Another potential issue in ip6_finish_output2() is handled in a
separate patch.
[1]
BUG: KASAN: slab-use-after-free in ip6_send_skb+0x18d/0x230 net/ipv6/ip6_output.c:1964
Read of size 8 at addr ffff88806dde4858 by task syz.1.380/6530
When assembling fraglist GSO packets, udp4_gro_complete does not set
skb->csum_start, which makes the extra validation in __udp_gso_segment fail.
Fixes: 89add40066f9 ("net: drop bad gso csum_start and offset in virtio_net_hdr") Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240819150621.59833-1-nbd@nbd.name Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
There is a bug in netem_enqueue() introduced by
commit 5845f706388a ("net: netem: fix skb length BUG_ON in __skb_to_sgvec")
that can lead to a use-after-free.
This commit made netem_enqueue() always return NET_XMIT_SUCCESS
when a packet is duplicated, which can cause the parent qdisc's q.qlen
to be mistakenly incremented. When this happens qlen_notify() may be
skipped on the parent during destruction, leaving a dangling pointer
for some classful qdiscs like DRR.
There are two ways for the bug to happen:
- If the duplicated packet is dropped by rootq->enqueue() and then
the original packet is also dropped.
- If rootq->enqueue() sends the duplicated packet to a different qdisc
and the original packet is dropped.
In both cases NET_XMIT_SUCCESS is returned even though no packets
are enqueued at the netem qdisc.
The fix is to defer the enqueue of the duplicate packet until after
the original packet has been guaranteed to return NET_XMIT_SUCCESS.
Fixes: 5845f706388a ("net: netem: fix skb length BUG_ON in __skb_to_sgvec") Reported-by: Budimir Markovic <markovicbudimir@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240819175753.5151-1-stephen@networkplumber.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Sabrina reports that the igb driver does not cope well with large
MAX_SKB_FRAG values: setting MAX_SKB_FRAG to 45 causes payload
corruption on TX.
An easy reproducer is to run ssh to connect to the machine. With
MAX_SKB_FRAGS=17 it works, with MAX_SKB_FRAGS=45 it fails. This has
been reported originally in
https://bugzilla.redhat.com/show_bug.cgi?id=2265320
The root cause of the issue is that the driver does not take into
account properly the (possibly large) shared info size when selecting
the ring layout, and will try to fit two packets inside the same 4K
page even when the 1st fraglist will trump over the 2nd head.
Address the issue by checking if 2K buffers are insufficient.
Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS") Reported-by: Jan Tluka <jtluka@redhat.com> Reported-by: Jirka Hladky <jhladky@redhat.com> Reported-by: Sabrina Dubroca <sd@queasysnail.net> Tested-by: Sabrina Dubroca <sd@queasysnail.net> Tested-by: Corinna Vinschen <vinschen@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Corinna Vinschen <vinschen@redhat.com> Link: https://patch.msgid.link/20240816152034.1453285-1-vinschen@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The dpaa2_switch_add_bufs() function returns the number of bufs that it
was able to add. It returns BUFS_PER_CMD (7) for complete success or a
smaller number if there are not enough pages available. However, the
error checking is looking at the total number of bufs instead of the
number which were added on this iteration. Thus the error checking
only works correctly for the first iteration through the loop and
subsequent iterations are always counted as a success.
Fix this by checking only the bufs added in the current iteration.
Fixes: 0b1b71370458 ("staging: dpaa2-switch: handle Rx path on control interface") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Link: https://patch.msgid.link/eec27f30-b43f-42b6-b8ee-04a6f83423b6@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When working on multi-buffer packet on arch that has PAGE_SIZE >= 8192,
truesize is calculated and stored in xdp_buff::frame_sz per each
processed Rx buffer. This means that frame_sz will contain the truesize
based on last received buffer, but commit 1dc1a7e7f410 ("ice:
Centrallize Rx buffer recycling") assumed this value will be constant
for each buffer, which breaks the page recycling scheme and mess up the
way we update the page::page_offset.
To fix this, let us work on constant truesize when PAGE_SIZE >= 8192
instead of basing this on size of a packet read from Rx descriptor. This
way we can simplify the code and avoid calculating truesize per each
received frame and on top of that when using
xdp_update_skb_shared_info(), current formula for truesize update will
be valid.
This means ice_rx_frame_truesize() can be removed altogether.
Furthermore, first call to it within ice_clean_rx_irq() for 4k PAGE_SIZE
was redundant as xdp_buff::frame_sz is initialized via xdp_init_buff()
in ice_vsi_cfg_rxq(). This should have been removed at the point where
xdp_buff struct started to be a member of ice_rx_ring and it was no
longer a stack based variable.
There are two fixes tags as my understanding is that the first one
exposed us to broken truesize and page_offset handling and then second
introduced broken skb_shared_info update in ice_{construct,build}_skb().
Reported-and-tested-by: Luiz Capitulino <luizcap@redhat.com> Closes: https://lore.kernel.org/netdev/8f9e2a5c-fd30-4206-9311-946a06d031bb@redhat.com/ Fixes: 1dc1a7e7f410 ("ice: Centrallize Rx buffer recycling") Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Architectures that have PAGE_SIZE >= 8192 such as arm64 should act the
same as x86 currently, meaning reuse of a page should only take place
when no one else is busy with it.
Do two things independently of underlying PAGE_SIZE:
- store the page count under ice_rx_buf::pgcnt
- then act upon its value vs ice_rx_buf::pagecnt_bias when making the
decision regarding page reuse
Fixes: 2b245cb29421 ("ice: Implement transmit and NAPI support") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the active slave is cleared manually the xfrm state is not flushed.
This leads to xfrm add/del imbalance and adding the same state multiple
times. For example when the device cannot handle anymore states we get:
[ 1169.884811] bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
because it's filled with the same state after multiple active slave
clearings. This change also has a few nice side effects: user-space
gets a notification for the change, the old device gets its mac address
and promisc/mcast adjusted properly.
Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
We shouldn't set real_dev to NULL because packets can be in transit and
xfrm might call xdo_dev_offload_ok() in parallel. All callbacks assume
real_dev is set.
Example trace:
kernel: BUG: unable to handle page fault for address: 0000000000001030
kernel: bond0: (slave eni0np1): making interface the new active one
kernel: #PF: supervisor write access in kernel mode
kernel: #PF: error_code(0x0002) - not-present page
kernel: PGD 0 P4D 0
kernel: Oops: 0002 [#1] PREEMPT SMP
kernel: CPU: 4 PID: 2237 Comm: ping Not tainted 6.7.7+ #12
kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
kernel: RIP: 0010:nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
kernel: Code: e0 0f 0b 48 83 7f 38 00 74 de 0f 0b 48 8b 47 08 48 8b 37 48 8b 78 40 e9 b2 e5 9a d7 66 90 0f 1f 44 00 00 48 8b 86 80 02 00 00 <83> 80 30 10 00 00 01 b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 0f 1f
kernel: bond0: (slave eni0np1): making interface the new active one
kernel: RSP: 0018:ffffabde81553b98 EFLAGS: 00010246
kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
kernel:
kernel: RAX: 0000000000000000 RBX: ffff9eb404e74900 RCX: ffff9eb403d97c60
kernel: RDX: ffffffffc090de10 RSI: ffff9eb404e74900 RDI: ffff9eb3c5de9e00
kernel: RBP: ffff9eb3c0a42000 R08: 0000000000000010 R09: 0000000000000014
kernel: R10: 7974203030303030 R11: 3030303030303030 R12: 0000000000000000
kernel: R13: ffff9eb3c5de9e00 R14: ffffabde81553cc8 R15: ffff9eb404c53000
kernel: FS: 00007f2a77a3ad00(0000) GS:ffff9eb43bd00000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000001030 CR3: 00000001122ab000 CR4: 0000000000350ef0
kernel: bond0: (slave eni0np1): making interface the new active one
kernel: Call Trace:
kernel: <TASK>
kernel: ? __die+0x1f/0x60
kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
kernel: ? page_fault_oops+0x142/0x4c0
kernel: ? do_user_addr_fault+0x65/0x670
kernel: ? kvm_read_and_reset_apf_flags+0x3b/0x50
kernel: bond0: (slave eni0np1): making interface the new active one
kernel: ? exc_page_fault+0x7b/0x180
kernel: ? asm_exc_page_fault+0x22/0x30
kernel: ? nsim_bpf_uninit+0x50/0x50 [netdevsim]
kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
kernel: ? nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
kernel: bond0: (slave eni0np1): making interface the new active one
kernel: bond_ipsec_offload_ok+0x7b/0x90 [bonding]
kernel: xfrm_output+0x61/0x3b0
kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
kernel: ip_push_pending_frames+0x56/0x80
Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
We must check if there is an active slave before dereferencing the pointer.
Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
GRO code checks for matching layer 2 headers to see if a packet belongs
to the same flow, and because the ip6 tunnel sets dev->hard_header_len
this check fails in cases where it shouldn't. To fix this, don't
set hard_header_len, but use needed_headroom like ipv4/ip_tunnel.c
does.
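Schematically the change amounts to accounting for the tunnel header in
needed_headroom rather than hard_header_len (a sketch, not the literal
diff):

    /* reserve room for the tunnel header without claiming a link-layer
     * header, which would defeat GRO's layer 2 flow matching
     */
    dev->needed_headroom = t_hlen;      /* t_hlen: tunnel + outer IPv6 header */
    /* dev->hard_header_len is left untouched */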