Heikki Krogerus [Mon, 11 Nov 2024 10:02:20 +0000 (12:02 +0200)]
usb: typec: ucsi: Fix a missing bits to bytes conversion in ucsi_init()
The GET_CAPABILITY size is wrong. The definitions for the
command sizes are for bitfieds and therefore in bits, not
bytes.
This fixes an issue that prevents the interface from being
registered with UCSI versions older than 2.0 because the
command size exceeds the MESSAGE_IN field size.
Fixes: 226ff2e681d0 ("usb: typec: ucsi: Convert connector specific commands to bitmaps") Reported-by: Abel Vesa <abel.vesa@linaro.org> Closes: https://lore.kernel.org/linux-usb/Zy864W7sysWZbCTd@linaro.org/ Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Abel Vesa <abel.vesa@linaro.org> Tested-by: Abel Vesa <abel.vesa@linaro.org> Link: https://lore.kernel.org/r/20241111100220.1743872-1-heikki.krogerus@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
usb: musb: Fix hardware lockup on first Rx endpoint request
There is a possibility that a request's callback could be invoked from
usb_ep_queue() (call trace below, supplemented with missing calls):
req->complete from usb_gadget_giveback_request
(drivers/usb/gadget/udc/core.c:999)
usb_gadget_giveback_request from musb_g_giveback
(drivers/usb/musb/musb_gadget.c:147)
musb_g_giveback from rxstate
(drivers/usb/musb/musb_gadget.c:784)
rxstate from musb_ep_restart
(drivers/usb/musb/musb_gadget.c:1169)
musb_ep_restart from musb_ep_restart_resume_work
(drivers/usb/musb/musb_gadget.c:1176)
musb_ep_restart_resume_work from musb_queue_resume_work
(drivers/usb/musb/musb_core.c:2279)
musb_queue_resume_work from musb_gadget_queue
(drivers/usb/musb/musb_gadget.c:1241)
musb_gadget_queue from usb_ep_queue
(drivers/usb/gadget/udc/core.c:300)
According to the docstring of usb_ep_queue(), this should not happen:
"Note that @req's ->complete() callback must never be called from within
usb_ep_queue() as that can create deadlock situations."
In fact, a hardware lockup might occur in the following sequence:
1. The gadget is initialized using musb_gadget_enable().
2. Meanwhile, a packet arrives, and the RXPKTRDY flag is set, raising an
interrupt.
3. If IRQs are enabled, the interrupt is handled, but musb_g_rx() finds an
empty queue (next_request() returns NULL). The interrupt flag has
already been cleared by the glue layer handler, but the RXPKTRDY flag
remains set.
4. The first request is enqueued using usb_ep_queue(), leading to the call
of req->complete(), as shown in the call trace above.
5. If the callback enables IRQs and another packet is waiting, step (3)
repeats. The request queue is empty because usb_g_giveback() removes the
request before invoking the callback.
6. The endpoint remains locked up, as the interrupt triggered by hardware
setting the RXPKTRDY flag has been handled, but the flag itself remains
set.
For this scenario to occur, it is only necessary for IRQs to be enabled at
some point during the complete callback. This happens with the USB Ethernet
gadget, whose rx_complete() callback calls netif_rx(). If called in the
task context, netif_rx() disables the bottom halves (BHs). When the BHs are
re-enabled, IRQs are also enabled to allow soft IRQs to be processed. The
gadget itself is initialized at module load (or at boot if built-in), but
the first request is enqueued when the network interface is brought up,
triggering rx_complete() in the task context via ioctl(). If a packet
arrives while the interface is down, it can prevent the interface from
receiving any further packets from the USB host.
The situation is quite complicated with many parties involved. This
particular issue can be resolved in several possible ways:
1. Ensure that callbacks never enable IRQs. This would be difficult to
enforce, as discovering how netif_rx() interacts with interrupts was
already quite challenging and u_ether is not the only function driver.
Similar "bugs" could be hidden in other drivers as well.
2. Disable MUSB interrupts in musb_g_giveback() before calling the callback
and re-enable them afterwars (by calling musb_{dis,en}able_interrupts(),
for example). This would ensure that MUSB interrupts are not handled
during the callback, even if IRQs are enabled. In fact, it would allow
IRQs to be enabled when releasing the lock. However, this feels like an
inelegant hack.
3. Modify the interrupt handler to clear the RXPKTRDY flag if the request
queue is empty. While this approach also feels like a hack, it wastes
CPU time by attempting to handle incoming packets when the software is
not ready to process them.
4. Flush the Rx FIFO instead of calling rxstate() in musb_ep_restart().
This ensures that the hardware can receive packets when there is at
least one request in the queue. Once IRQs are enabled, the interrupt
handler will be able to correctly process the next incoming packet
(eventually calling rxstate()). This approach may cause one or two
packets to be dropped (two if double buffering is enabled), but this
seems to be a minor issue, as packet loss can occur when the software is
not yet ready to process them. Additionally, this solution makes the
gadget driver compliant with the rule mentioned in the docstring of
usb_ep_queue().
There may be additional solutions, but from these four, the last one has
been chosen as it seems to be the most appropriate, as it addresses the
"bad" behavior of the driver.
Fixes: baebdf48c360 ("net: dev: Makes sure netif_rx() can be invoked in any context.") Cc: stable@vger.kernel.org Signed-off-by: Hubert Wiśniewski <hubert.wisniewski.25632@gmail.com> Link: https://lore.kernel.org/r/4ee1ead4525f78fb5909a8cbf99513ad0082ad21.camel@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
usb: typec: ucsi: glink: be more precise on orientation-aware ports
Instead of checking if any of the USB-C ports have orientation GPIO and
thus is orientation-aware, check for the GPIO for the port being
registered. There are no boards that are affected by this change at this
moment, so the patch is not marked as a fix, but it might affect other
boards in future.
Reviewed-by: Abel Vesa <abel.vesa@linaro.org> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Reviewed-by: Johan Hovold <johan+linaro@kernel.org> Tested-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20241109-ucsi-glue-fixes-v2-2-8b21ff4f9fbe@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
usb: typec: ucsi: glink: fix off-by-one in connector_status
UCSI connector's indices start from 1 up to 3, PMIC_GLINK_MAX_PORTS.
Correct the condition in the pmic_glink_ucsi_connector_status()
callback, fixing Type-C orientation reporting for the third USB-C
connector.
Fixes: 76716fd5bf09 ("usb: typec: ucsi: glink: move GPIO reading into connector_status callback") Cc: stable@vger.kernel.org Reported-by: Abel Vesa <abel.vesa@linaro.org> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Reviewed-by: Johan Hovold <johan+linaro@kernel.org> Tested-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20241109-ucsi-glue-fixes-v2-1-8b21ff4f9fbe@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After an initial range change on the insigned int alt being > 1
the only possible values for alt are 0 or 1. Therefore the else
statement for values other than 0 or 1 is redundant and can be
removed. Replace the else if (all == 1) check with just an else.
Philipp Stanner [Tue, 5 Nov 2024 11:11:33 +0000 (12:11 +0100)]
thunderbolt: Replace deprecated PCI functions
pcim_iomap_table() and pcim_request_regions() have been deprecated in
commit e354bb84a4c1 ("PCI: Deprecate pcim_iomap_table(),
pcim_iomap_regions_request_all()") and commit d140f80f60358 ("PCI:
Deprecate pcim_iomap_regions() in favor of pcim_iomap_region()").
Replace these functions with pcim_iomap_region().
Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Heikki Krogerus [Wed, 6 Nov 2024 15:06:05 +0000 (17:06 +0200)]
usb: typec: ucsi: Convert connector specific commands to bitmaps
That allows the fields in those command data structures to
be easily validated. If an unsupported field is accessed, a
warning is generated.
This will not force UCSI version checks to be made in every
place where these data structures are accessed, but it will
make it easier to pinpoint issues that are caused by the
unconditional accesses to those fields, and perhaps more
importantly, allow those issues to be noticed immediately.
Stop Endpoint command on an already stopped endpoint fails and may be
misinterpreted as a known hardware bug by the completion handler. This
results in an unnecessary delay with repeated retries of the command.
Avoid queuing this command when endpoint state flags indicate that it's
stopped or halted and the command will fail. If commands are pending on
the endpoint, their completion handlers will process cancelled TDs so
it's done. In case of waiting for external operations like clearing TT
buffer, the endpoint is stopped and cancelled TDs can be processed now.
This eliminates practically all unnecessary retries because an endpoint
with pending URBs is maintained in Running state by the driver, unless
aforementioned commands or other operations are pending on it. This is
guaranteed by xhci_ring_ep_doorbell() and by the fact that it is called
every time any of those operations completes.
The only known exceptions are hardware bugs (the endpoint never starts
at all) and Stream Protocol errors not associated with any TRB, which
cause an endpoint reset not followed by restart. Sounds like a bug.
Generally, these retries are only expected to happen when the endpoint
fails to start for unknown/no reason, which is a worse problem itself,
and fixing the bug eliminates the retries too.
All cases were tested and found to work as expected. SET_DEQ_PENDING
was produced by patching uvcvideo to unlink URBs in 100us intervals,
which then runs into this case very often. EP_HALTED was produced by
restarting 'cat /dev/ttyUSB0' on a serial dongle with broken cable.
EP_CLEARING_TT by the same, with the dongle on an external hub.
Fixes: fd9d55d190c0 ("xhci: retry Stop Endpoint on buggy NEC controllers") CC: stable@vger.kernel.org Signed-off-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20241106101459.775897-34-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michal Pecio [Wed, 6 Nov 2024 10:14:58 +0000 (12:14 +0200)]
usb: xhci: Fix TD invalidation under pending Set TR Dequeue
xhci_invalidate_cancelled_tds() may not work correctly if the hardware
is modifying endpoint or stream contexts at the same time by executing
a Set TR Dequeue command. And even if it worked, it would be unable to
queue Set TR Dequeue for the next stream, failing to clear xHC cache.
On stream endpoints, a chain of Set TR Dequeue commands may take some
time to execute and we may want to cancel more TDs during this time.
Currently this leads to Stop Endpoint completion handler calling this
function without testing for SET_DEQ_PENDING, which will trigger the
aforementioned problems when it happens.
On all endpoints, a halt condition causes Reset Endpoint to be queued
and an error status given to the class driver, which may unlink more
URBs in response. Stop Endpoint is queued and its handler may execute
concurrently with Set TR Dequeue queued by Reset Endpoint handler.
(Reset Endpoint handler calls this function too, but there seems to
be no possibility of it running concurrently with Set TR Dequeue).
Fix xhci_invalidate_cancelled_tds() to work correctly under a pending
Set TR Dequeue. Bail out of the function when SET_DEQ_PENDING is set,
then make the completion handler call the function again and also call
xhci_giveback_invalidated_tds(), which needs to be called next.
This seems to fix another potential bug, where the handler would call
xhci_invalidate_cancelled_tds(), which may clear some deferred TDs if
a sanity check fails, and the TDs wouldn't be given back promptly.
Said sanity check seems to be wrong and prone to false positives when
the endpoint halts, but fixing it is beyond the scope of this change,
besides ensuring that cleared TDs are given back properly.
Michal Pecio [Wed, 6 Nov 2024 10:14:57 +0000 (12:14 +0200)]
usb: xhci: Limit Stop Endpoint retries
Some host controllers fail to atomically transition an endpoint to the
Running state on a doorbell ring and enter a hidden "Restarting" state,
which looks very much like Stopped, with the important difference that
it will spontaneously transition to Running anytime soon.
A Stop Endpoint command queued in the Restarting state typically fails
with Context State Error and the completion handler sees the Endpoint
Context State as either still Stopped or already Running. Even a case
of Halted was observed, when an error occurred right after the restart.
The Halted state is already recovered from by resetting the endpoint.
The Running state is handled by retrying Stop Endpoint.
The Stopped state was recognized as a problem on NEC controllers and
worked around also by retrying, because the endpoint soon restarts and
then stops for good. But there is a risk: the command may fail if the
endpoint is "stopped for good" already, and retries will fail forever.
The possibility of this was not realized at the time, but a number of
cases were discovered later and reproduced. Some proved difficult to
deal with, and it is outright impossible to predict if an endpoint may
fail to ever start at all due to a hardware bug. One such bug (albeit
on ASM3142, not on NEC) was found to be reliably triggered simply by
toggling an AX88179 NIC up/down in a tight loop for a few seconds.
An endless retries storm is quite nasty. Besides putting needless load
on the xHC and CPU, it causes URBs never to be given back, paralyzing
the device and connection/disconnection logic for the whole bus if the
device is unplugged. User processes waiting for URBs become unkillable,
drivers and kworker threads lock up and xhci_hcd cannot be reloaded.
For peace of mind, impose a timeout on Stop Endpoint retries in this
case. If they don't succeed in 100ms, consider the endpoint stopped
permanently for some reason and just give back the unlinked URBs. This
failure case is rare already and work is under way to make it rarer.
Start this work today by also handling one simple case of race with
Reset Endpoint, because it costs just two lines to implement.
Fixes: fd9d55d190c0 ("xhci: retry Stop Endpoint on buggy NEC controllers") CC: stable@vger.kernel.org Signed-off-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20241106101459.775897-32-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Niklas Neronin [Wed, 6 Nov 2024 10:14:56 +0000 (12:14 +0200)]
usb: xhci: remove irrelevant comment
The code which it is referencing does not exist in the same function,
or the file for that matter. Since it was added [1], the Interrupter
Moderation Interval can be changed within xhci addon, e.g. PCI
xhci_pci_setup().
[1], commit 0ebbab374223 ("USB: xhci: Ring allocation and initialization.")
Niklas Neronin [Wed, 6 Nov 2024 10:14:54 +0000 (12:14 +0200)]
usb: xhci: refactor xhci_td_cleanup() to return void
The function is modified to return 'void' instead of an integer since it
invariably returns '0'. Additionally, multiple functions which only
return xhci_td_cleanup() are also refactored to return void.
This change eliminates the need for callers to handle a return value that
does not convey meaningful information and improve code readability, as it
becomes immediately clear that the function does not produce a significant
output.
Niklas Neronin [Wed, 6 Nov 2024 10:14:52 +0000 (12:14 +0200)]
usb: xhci: improve xhci_clear_command_ring()
Remove redundant TRB cycle reset, the TRB cycle is already set to zero by
the preceding memset(), making the explicit reset unnecessary.
Clarify ring loop start point. Change the loop start from the dequeue
segment to the start segment. Both approaches achieve the same result,
but starting from the start segment makes it clearer that the entire ring
is being zeroed out.
Niklas Neronin [Wed, 6 Nov 2024 10:14:51 +0000 (12:14 +0200)]
usb: xhci: request MSI/-X according to requested amount
Variable 'max_interrupters' contains the maximum supported interrupters
or the maximum interrupters the user has requested. Thus, it should be
used instead of HCS_MAX_INTRS().
User set 'max_interrupters' value is validated in xhci_gen_setup(),
otherwise 'max_interrupters' value is 'HCS_MAX_INTRS(xhci->hcs_params1)'.
Niklas Neronin [Wed, 6 Nov 2024 10:14:49 +0000 (12:14 +0200)]
usb: xhci: simplify TDs start and end naming scheme in struct 'xhci_td'
Old names:
* start_seg - last_trb_seg
* first_trb - last_trb
New names:
* start_seg - end_seg
* start_trb - end_trb
A Transfer Descriptor (TD) in the xhci driver is a data structure that
represents a single transaction to be performed by the USB host controller.
This transaction is defined by TRBs from 'start_trb' in 'start_seg' to
'end_trb' in 'end_seg'.
The terms "start" and "end" were chosen over "first" and "last" for ease
of searching within the codebase. The ring structure uses 'first_seg' and
'last_seg', while the TD structure uses 'start_seg' and 'end_seg'.
Kuangyi Chiang [Wed, 6 Nov 2024 10:14:46 +0000 (12:14 +0200)]
xhci: Don't perform Soft Retry for Etron xHCI host
Since commit f8f80be501aa ("xhci: Use soft retry to recover faster from
transaction errors"), unplugging USB device while enumeration results in
errors like this:
[ 364.855321] xhci_hcd 0000:0b:00.0: ERROR Transfer event for disabled endpoint slot 5 ep 2
[ 364.864622] xhci_hcd 0000:0b:00.0: @0000002167656d7067f03000000000210c00000005038001
[ 374.934793] xhci_hcd 0000:0b:00.0: Abort failed to stop command ring: -110
[ 374.958793] xhci_hcd 0000:0b:00.0: xHCI host controller not responding, assume dead
[ 374.967590] xhci_hcd 0000:0b:00.0: HC died; cleaning up
[ 374.973984] xhci_hcd 0000:0b:00.0: Timeout while waiting for configure endpoint command
Seems that Etorn xHCI host can not perform Soft Retry correctly, apply
XHCI_NO_SOFT_RETRY quirk to disable Soft Retry and then issue is gone.
This patch depends on commit a4a251f8c235 ("usb: xhci: do not perform
Soft Retry for some xHCI hosts").
Fixes: f8f80be501aa ("xhci: Use soft retry to recover faster from transaction errors") Cc: stable@vger.kernel.org Signed-off-by: Kuangyi Chiang <ki.chiang65@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20241106101459.775897-21-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The r8152 driver will periodically issue lots of control-IN requests
to access the status of ethernet adapter hardware registers during
the test.
This happens when the xHCI driver enqueue a control TD (which cross
over the Link TRB between two ring segments, as shown) in the endpoint
zero's transfer ring. Seems the Etron xHCI host can not perform this
TD correctly, causing the USB transfer error occurred, maybe the upper
driver retry that control-IN request can solve problem, but not all
drivers do this.
| |
-------
| TRB | Setup Stage
-------
| TRB | Link
-------
-------
| TRB | Data Stage
-------
| TRB | Status Stage
-------
| |
To work around this, the xHCI driver should enqueue a No Op TRB if
next available TRB is the Link TRB in the ring segment, this can
prevent the Setup and Data Stage TRB to be breaked by the Link TRB.
Check if the XHCI_ETRON_HOST quirk flag is set before invoking the
workaround in xhci_queue_ctrl_tx().
Kuangyi Chiang [Wed, 6 Nov 2024 10:14:44 +0000 (12:14 +0200)]
xhci: Don't issue Reset Device command to Etron xHCI host
Sometimes the hub driver does not recognize the USB device connected
to the external USB2.0 hub when the system resumes from S4.
After the SetPortFeature(PORT_RESET) request is completed, the hub
driver calls the HCD reset_device callback, which will issue a Reset
Device command and free all structures associated with endpoints
that were disabled.
This happens when the xHCI driver issue a Reset Device command to
inform the Etron xHCI host that the USB device associated with a
device slot has been reset. Seems that the Etron xHCI host can not
perform this command correctly, affecting the USB device.
To work around this, the xHCI driver should obtain a new device slot
with reference to commit 651aaf36a7d7 ("usb: xhci: Handle USB transaction
error on address command"), which is another way to inform the Etron
xHCI host that the USB device has been reset.
Add a new XHCI_ETRON_HOST quirk flag to invoke the workaround in
xhci_discover_or_reset_device().
Fixes: 2a8f82c4ceaf ("USB: xhci: Notify the xHC when a device is reset.") Cc: stable@vger.kernel.org Signed-off-by: Kuangyi Chiang <ki.chiang65@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20241106101459.775897-19-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Niklas Neronin [Wed, 6 Nov 2024 10:14:42 +0000 (12:14 +0200)]
usb: xhci: add xhci_initialize_ring_segments()
A ring consists of a list of segments, each containing a specific number of
TRBs. The xhci driver allocates and initializes ring segments and TRBs in
the same functions. This combined allocation and initialization process
leads to an issue where, after hibernation (S4 state), the xhci driver
frees all its memory and re-creates the rings, segments, and TRBs from
scratch.
Move all default ring segment initialization into function
xhci_initialize_ring_segments(). This function can be called to
reinitialize a ring without freeing and reallocating it.
Since xhci_alloc_segments_for_ring() no longer initializes segment TRBs,
xhci_initialize_ring_segments() is added to xhci_ring_expansion(). This
results in the last segment of the source ring having the 'LINK_TOGGLE'
bit set. Therefore, if the last source ring segment is not the last in
the destination ring, the 'LINK_TOGGLE' bit must be cleared.
Niklas Neronin [Wed, 6 Nov 2024 10:14:41 +0000 (12:14 +0200)]
usb: xhci: rework xhci_link_segments()
Prepare for splitting ring segments allocation and initialization by
reworking the xhci_link_segments() function. Segment linking and ring type
checks are moved out of xhci_link_segments(), and the function is renamed
to "xhci_set_link_trb()".
The goal is to keep ring linking within xhci_alloc_segments_for_ring() and
move initialization into a separate function.
Additionally, reorder and simplify xhci_set_link_trb() for better
readability.
Niklas Neronin [Wed, 6 Nov 2024 10:14:40 +0000 (12:14 +0200)]
usb: xhci: refactor xhci_link_rings() to use source and destination rings
Refactor the xhci_link_rings() function to accept two rings: a source ring
and a destination ring. Previously, the function accepted a ring and a
segment list as arguments, now the function splices the source ring segment
list into the destination ring. This new approach reduces the number of
arguments and simplifies the code, making it easier to follow.
Niklas Neronin [Wed, 6 Nov 2024 10:14:39 +0000 (12:14 +0200)]
usb: xhci: rework xhci_free_segments_for_ring()
The segment list is only relevant within the context of the ring.
Thus, it is more logical to pass the ring itself when freeing all segments
from a ring, rather than passing the ring list.
Rename the function to "xhci_ring_segments_free" to align with the naming
convention of xhci_segment_free().
To make the freeing process more intuitive, free segments sequentially
from start to end (i.e., 0, 1, 2, 3, ...) instead of freeing the first
segment last (i.e., 1, 2, 3, ... 0).
The function xhci_alloc_segments_for_ring() currently takes 7 arguments,
5 of which are components of the 'xhci_ring' struct. Refactor the function
to accept a pointer to the 'xhci_ring' struct instead of passing these
components separately.
The change reduces the number of arguments, making the function signature
cleaner and easier to understand.
Niklas Neronin [Wed, 6 Nov 2024 10:14:37 +0000 (12:14 +0200)]
usb: xhci: remove option to change a default ring's TRB cycle bit
The TRB cycle bit indicates TRB ownership by the Host Controller (HC) or
Host Controller Driver (HCD). New rings are initialized with 'cycle_state'
equal to one, and all its TRBs' cycle bits are set to zero. When handling
ring expansion, set the source ring cycle bits to the same value as the
destination ring.
Move the cycle bit setting from xhci_segment_alloc() to xhci_link_rings(),
and remove the 'cycle_state' argument from xhci_initialize_ring_info().
The xhci_segment_alloc() function uses kzalloc_node() to allocate segments,
ensuring that all TRB cycle bits are initialized to zero.
Niklas Neronin [Wed, 6 Nov 2024 10:14:36 +0000 (12:14 +0200)]
usb: xhci: introduce macro for ring segment list iteration
Add macro to streamline and standardize the iteration over ring
segment list.
xhci_for_each_ring_seg(): Iterates over the entire ring segment list.
The xhci_free_segments_for_ring() function's while loop has not been
updated to use the new macro. This function has some underlying issues,
and as a result, it will be handled separately in a future patch.
Mathias Nyman [Wed, 6 Nov 2024 10:14:34 +0000 (12:14 +0200)]
xhci: trace stream context at Set TR Deq command completion
A successful 'Set TR Deq' command writes a new dequeue pointer to the
stream context for stream rings, show the content of the stream ctx
at command completion.
Mathias Nyman [Wed, 6 Nov 2024 10:14:32 +0000 (12:14 +0200)]
xhci: Don't trace ring at every enqueue or dequeue increase
Don't trace ring pointers every time driver increases enqueue pointer
after queuing a TRB, or dequeue pointer after handling an event.
These xhci_inc_deq: and xhci_inc_enq: trace entries fill up the trace and
provides little useful info now that the TRB DMA address is printed with
the TRB itself.
Only trace ring during xhci_inc_enq() and xhci_inc_deq() in case we move
to a new ring segment.
Also don't show both segment and TRB addess in trace.
Segment is just TRB with 0xfff masked off.
Mathias Nyman [Wed, 6 Nov 2024 10:14:31 +0000 (12:14 +0200)]
xhci: show DMA address of TRB when tracing TRBs
The DMA address of a queued TRB is essential when looking at traces as
both transfer events and command completion events refer to the command
or transfer based on its DMA address.
Previously the TRB address was figured out from xhci_inc_enq and
xhci_inc_deq trace entries seen after queuing or handlong a TRB.
Now that DMA address is shown in TRB tracing we can get rid of most of the
xhci_inc_enq and xhci_inc_deq traces thus decreasing trace size.
Mathias Nyman [Wed, 6 Nov 2024 10:14:30 +0000 (12:14 +0200)]
xhci: Cleanup Candence controller PCI device and vendor ID usage
Use predefined PCI vendor ID constant for Cadence controller in pci_ids.h
Rename the Candence device ID to match the pattern other PCI vendor and
device IDs use
Michal Pecio [Wed, 6 Nov 2024 10:14:29 +0000 (12:14 +0200)]
usb: xhci: Fix sum_trb_lengths()
This function is supposed to sum the lengths of all transfer TRBs in
a TD up to a point, but it starts summing at the current dequeue since
it only ever gets called on the first pending TD.
This won't work when there are cancelled TDs at the beginning of the
ring. The function tries to exclude No-Ops from the count, but not all
cancelled TDs are No-Op'ed - not those the HW stopped on.
The absolutely obvious fix is to start counting at the TD's first TRB.
And remove the now-useless 'ring' parameter, and 'xhci' too.
usb: Use (of|device)_property_present() for non-boolean properties
The use of (of|device)_property_read_bool() for non-boolean properties
is deprecated in favor of of_property_present() when testing for
property presence.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Acked-by: Peter Chen <peter.chen@kernel.org> Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Link: https://lore.kernel.org/r/20241104190820.277702-1-robh@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Romain Gantois [Thu, 24 Oct 2024 08:54:17 +0000 (10:54 +0200)]
usb: typec: mux: Add support for the TUSB1046 crosspoint switch
The TUSB1046-DCI is a USB-C linear redriver crosspoint switch, which can
mux SuperSpeed lanes from a Type-C connector to a USB3.0 data lane or up to
4 display port lanes.
Add support for driving the TUSB1046 as a Type-C orientation switch and
DisplayPort altmode multiplexer.
Describe the Texas Instruments TUSB1046-DCI USB Type-C linear redriver
crosspoint switch. This component is used to handle orientation switching
and DisplayPort altmode multiplexing for Type-C signals.
Parth Pancholi [Tue, 29 Oct 2024 07:24:44 +0000 (08:24 +0100)]
USB: xhci: add support for PWRON active high
Some PCIe-to-USB controllers such as TI's TUSB73x0 3.0 xHCI host
controller supports controlling the PWRONx polarity via the USB control
register (E0h). Add support for device tree property
ti,pwron-active-high which indicates PWRONx to be active high and
configure the E0h register accordingly. This enables the software
control for the TUSB73x0's PWRONx outputs with an inverted polarity from
the default configuration which could be used as USB EN signals for the
other hubs or devices.
Parth Pancholi [Tue, 29 Oct 2024 07:24:43 +0000 (08:24 +0100)]
dt-bindings: usb: add TUSB73x0 PCIe
Add device tree bindings for TI's TUSB73x0 PCIe-to-USB 3.0 xHCI
host controller. The controller supports software configuration
through PCIe registers, such as controlling the PWRONx polarity
via the USB control register (E0h).
Datasheet: https://www.ti.com/lit/ds/symlink/tusb7320.pdf Signed-off-by: Parth Pancholi <parth.pancholi@toradex.com> Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20241029072444.8827-2-francesco@dolcini.it Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linus Torvalds [Sun, 3 Nov 2024 20:25:05 +0000 (10:25 -1000)]
Merge tag 'mm-hotfixes-stable-2024-11-03-10-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"17 hotfixes. 9 are cc:stable. 13 are MM and 4 are non-MM.
The usual collection of singletons - please see the changelogs"
* tag 'mm-hotfixes-stable-2024-11-03-10-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()
mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats
mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes
mm: shrinker: avoid memleak in alloc_shrinker_info
.mailmap: update e-mail address for Eugen Hristev
vmscan,migrate: fix page count imbalance on node stats when demoting pages
mailmap: update Jarkko's email addresses
mm: allow set/clear page_type again
nilfs2: fix potential deadlock with newly created symlinks
Squashfs: fix variable overflow in squashfs_readpage_block
kasan: remove vmalloc_percpu test
tools/mm: -Werror fixes in page-types/slabinfo
mm, swap: avoid over reclaim of full clusters
mm: fix PSWPIN counter for large folios swap-in
mm: avoid VM_BUG_ON when try to map an anon large folio to zero page.
mm/codetag: fix null pointer check logic for ref and tag
mm/gup: stop leaking pinned pages in low memory conditions
Linus Torvalds [Sun, 3 Nov 2024 20:15:50 +0000 (10:15 -1000)]
Merge tag 'dmaengine-fix-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine fixes from Vinod Koul:
- TI driver fix to set EOP for cyclic BCDMA transfers
- sh rz-dmac driver fix for handling config with zero address
* tag 'dmaengine-fix-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
dmaengine: ti: k3-udma: Set EOP for all TRs in cyclic BCDMA transfer
dmaengine: sh: rz-dmac: handle configs where one address is zero
Linus Torvalds [Sun, 3 Nov 2024 18:51:53 +0000 (08:51 -1000)]
Merge tag 'driver-core-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core revert from Greg KH:
"Here is a single driver core revert for 6.12-rc6. It reverts a change
that came in -rc1 that was supposed to resolve a reported problem, but
caused another one, so revert it for now so that we can get this all
worked out properly in 6.13.
The revert has been in linux-next all week with no reported issues"
* tag 'driver-core-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
Revert "driver core: Fix uevent_show() vs driver detach race"
Linus Torvalds [Sun, 3 Nov 2024 18:48:11 +0000 (08:48 -1000)]
Merge tag 'usb-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt fixes from Greg KH:
"Here are some small USB and Thunderbolt driver fixes for 6.12-rc6 that
have been sitting in my tree this week. Included in here are the
following:
- thunderbolt driver fixes for reported issues
- USB typec driver fixes
- xhci driver fixes for reported problems
- dwc2 driver revert for a broken change
- usb phy driver fix
- usbip tool fix
All of these have been in linux-next this week with no reported
issues"
* tag 'usb-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: tcpm: restrict SNK_WAIT_CAPABILITIES_TIMEOUT transitions to non self-powered devices
usb: phy: Fix API devm_usb_put_phy() can not release the phy
usb: typec: use cleanup facility for 'altmodes_node'
usb: typec: fix unreleased fwnode_handle in typec_port_register_altmodes()
usb: typec: qcom-pmic-typec: fix missing fwnode removal in error path
usb: typec: qcom-pmic-typec: use fwnode_handle_put() to release fwnodes
usb: acpi: fix boot hang due to early incorrect 'tunneled' USB3 device links
Revert "usb: dwc2: Skip clock gating on Broadcom SoCs"
xhci: Fix Link TRB DMA in command ring stopped completion event
xhci: Use pm_runtime_get to prevent RPM on unsupported systems
usbip: tools: Fix detach_port() invalid port error path
thunderbolt: Honor TMU requirements in the domain when setting TMU mode
thunderbolt: Fix KASAN reported stack out-of-bounds read in tb_retimer_scan()
Yu Zhao [Sat, 19 Oct 2024 01:29:39 +0000 (01:29 +0000)]
mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()
When the MM_WALK capability is enabled, memory that is mostly accessed by
a VM appears younger than it really is, therefore this memory will be less
likely to be evicted. Therefore, the presence of a running VM can
significantly increase swap-outs for non-VM memory, regressing the
performance for the rest of the system.
Fix this regression by always calling {ptep,pmdp}_clear_young_notify()
whenever we clear the young bits on PMDs/PTEs.
[jthoughton@google.com: fix link-time error] Link: https://lkml.kernel.org/r/20241019012940.3656292-3-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao@google.com> Signed-off-by: James Houghton <jthoughton@google.com> Reported-by: David Stevens <stevensd@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Matlack <dmatlack@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Wei Xu <weixugc@google.com> Cc: <stable@vger.kernel.org> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yu Zhao [Sat, 19 Oct 2024 01:29:38 +0000 (01:29 +0000)]
mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats
Patch series "mm: multi-gen LRU: Have secondary MMUs participate in
MM_WALK".
Today, the MM_WALK capability causes MGLRU to clear the young bit from
PMDs and PTEs during the page table walk before eviction, but MGLRU does
not call the clear_young() MMU notifier in this case. By not calling this
notifier, the MM walk takes less time/CPU, but it causes pages that are
accessed mostly through KVM / secondary MMUs to appear younger than they
should be.
We do call the clear_young() notifier today, but only when attempting to
evict the page, so we end up clearing young/accessed information less
frequently for secondary MMUs than for mm PTEs, and therefore they appear
younger and are less likely to be evicted. Therefore, memory that is
*not* being accessed mostly by KVM will be evicted *more* frequently,
worsening performance.
ChromeOS observed a tab-open latency regression when enabling MGLRU with a
setup that involved running a VM:
Tab-open latency histogram (ms)
Version p50 mean p95 p99 max
base 1315 1198 2347 3454 10319
mglru 2559 1311 7399 12060 43758
fix 1119 926 2470 4211 6947
This series replaces the final non-selftest patchs from this series[1],
which introduced a similar change (and a new MMU notifier) with KVM
optimizations. I'll send a separate series (to Sean and Paolo) for the
KVM optimizations.
This series also makes proactive reclaim with MGLRU possible for KVM
memory. I have verified that this functions correctly with the selftest
from [1], but given that that test is a KVM selftest, I'll send it with
the rest of the KVM optimizations later. Andrew, let me know if you'd
like to take the test now anyway.
The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful
and become more complicated to properly compute when adding
test/clear_young() notifiers in MGLRU's mm walk.
Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao@google.com> Signed-off-by: James Houghton <jthoughton@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Matlack <dmatlack@google.com> Cc: David Rientjes <rientjes@google.com> Cc: David Stevens <stevensd@google.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Wei Xu <weixugc@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Linus Torvalds [Sun, 3 Nov 2024 18:45:03 +0000 (08:45 -1000)]
Merge tag 'char-misc-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull misc driver fixes from Greg KH:
"Here are some small char/misc/iio fixes for 6.12-rc6 that resolve
some reported issues. Included in here are the following:
- small IIO driver fixes for many reported issues
- mei driver fix for a suddenly much reported issue for an "old"
issue.
- MAINTAINERS update for a developer who has moved companies and
forgot to update their old entry.
All of these have been in linux-next this week with no reported
issues"
* tag 'char-misc-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
mei: use kvmalloc for read buffer
MAINTAINERS: add netup_unidvb maintainer
iio: dac: Kconfig: Fix build error for ltc2664
iio: adc: ad7124: fix division by zero in ad7124_set_channel_odr()
staging: iio: frequency: ad9832: fix division by zero in ad9832_calc_freqreg()
docs: iio: ad7380: fix supply for ad7380-4
iio: adc: ad7380: fix supplies for ad7380-4
iio: adc: ad7380: add missing supplies
iio: adc: ad7380: use devm_regulator_get_enable_read_voltage()
dt-bindings: iio: adc: ad7380: fix ad7380-4 reference supply
iio: light: veml6030: fix microlux value calculation
iio: gts-helper: Fix memory leaks for the error path of iio_gts_build_avail_scale_table()
iio: gts-helper: Fix memory leaks in iio_gts_build_avail_scale_table()
Linus Torvalds [Sun, 3 Nov 2024 18:35:29 +0000 (08:35 -1000)]
Merge tag 'input-for-v6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- a fix for regression in input core introduced in 6.11 preventing
re-registering input handlers
- a fix for adp5588-keys driver tyring to disable interrupt 0 at
suspend when devices is used without interrupt
- a fix for edt-ft5x06 to stop leaking regmap structure when probing
fails and to make sure it is not released too early on removal.
* tag 'input-for-v6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: fix regression when re-registering input handlers
Input: adp5588-keys - do not try to disable interrupt 0
Input: edt-ft5x06 - fix regmap leak when probe fails
Linus Torvalds [Sun, 3 Nov 2024 18:29:02 +0000 (08:29 -1000)]
Merge tag 'kbuild-fixes-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix a memory leak in modpost
- Resolve build issues when cross-compiling RPM and Debian packages
- Fix another regression in Kconfig
- Fix incorrect MODULE_ALIAS() output in modpost
* tag 'kbuild-fixes-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
modpost: fix input MODULE_DEVICE_TABLE() built for 64-bit on 32-bit host
modpost: fix acpi MODULE_DEVICE_TABLE built with mismatched endianness
kconfig: show sub-menu entries even if the prompt is hidden
kbuild: deb-pkg: add pkg.linux-upstream.nokerneldbg build profile
kbuild: deb-pkg: add pkg.linux-upstream.nokernelheaders build profile
kbuild: rpm-pkg: disable kernel-devel package when cross-compiling
sumversion: Fix a memory leak in get_src_version()
Linus Torvalds [Sun, 3 Nov 2024 18:22:21 +0000 (08:22 -1000)]
Merge tag 'timers-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single fix for posix CPU timers.
When a thread is cloned, the posix CPU timers are not inherited.
If the parent has a CPU timer armed the corresponding tick dependency
in the tasks tick_dep_mask is set and copied to the new thread, which
means the new thread and all decendants will prevent the system to go
into full NOHZ operation.
Clear the tick dependency mask in copy_process() to fix this"
* tag 'timers-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-cpu-timers: Clear TICK_DEP_BIT_POSIX_TIMER on clone
Linus Torvalds [Sun, 3 Nov 2024 18:18:28 +0000 (08:18 -1000)]
Merge tag 'sched-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
- Plug a race between pick_next_task_fair() and try_to_wake_up() where
both try to write to the same task, even though both paths hold a
runqueue lock, but obviously from different runqueues.
The problem is that the store to task::on_rq in __block_task() is
visible to try_to_wake_up() which assumes that the task is not
queued. Both sides then operate on the same task.
Cure it by rearranging __block_task() so the the store to task::on_rq
is the last operation on the task.
- Prevent a potential NULL pointer dereference in task_numa_work()
task_numa_work() iterates the VMAs of a process. A concurrent unmap
of the address space can result in a NULL pointer return from
vma_next() which is unchecked.
Add the missing NULL pointer check to prevent this.
- Operate on the correct scheduler policy in task_should_scx()
task_should_scx() returns true when a task should be handled by sched
EXT. It checks the tasks scheduling policy.
This fails when the check is done before a policy has been set.
Cure it by handing the policy into task_should_scx() so it operates
on the requested value.
- Add the missing handling of sched EXT in the delayed dequeue
mechanism. This was simply forgotten.
* tag 'sched-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/ext: Fix scx vs sched_delayed
sched: Pass correct scheduling policy to __setscheduler_class
sched/numa: Fix the potential null pointer dereference in task_numa_work()
sched: Fix pick_next_task_fair() vs try_to_wake_up() race
Linus Torvalds [Sun, 3 Nov 2024 18:13:52 +0000 (08:13 -1000)]
Merge tag 'perf-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Thomas Gleixner:
"perf_event_clear_cpumask() uses list_for_each_entry_rcu() without
being in a RCU read side critical section, which triggers a
'suspicious RCU usage' warning.
It turns out that the list walk does not be RCU protected because the
write side lock is held in this context.
Change it to a regular list walk"
* tag 'perf-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix missing RCU reader protection in perf_event_clear_cpumask()
Linus Torvalds [Sun, 3 Nov 2024 18:09:25 +0000 (08:09 -1000)]
Merge tag 'irq-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
- Fix an off-by-one error in the failure path of msi_domain_alloc(),
which causes the cleanup loop to terminate early and leaking the
first allocated interrupt.
- Handle a corner case in GIC-V4 versus a lazily mapped Virtual
Processing Element (VPE). If the VPE has not been mapped because the
guest has not yet emitted a mapping command, then the set_affinity()
callback returns an error code, which causes the vCPU management to
fail.
Return success in this case without touching the hardware. This will
be done later when the guest issues the mapping command.
* tag 'irq-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v4: Correctly deal with set_affinity on lazily-mapped VPEs
genirq/msi: Fix off-by-one error in msi_domain_alloc()
When building a 64-bit kernel, BITS_PER_LONG is defined as 64. However,
on a 32-bit build machine, the constant 1L is a signed 32-bit value.
Left-shifting it beyond 32 bits causes wraparound, and shifting by 31
or 63 bits makes it a negative value.
The fix in commit e0e92632715f ("[PATCH] PATCH: 1 line 2.6.18 bugfix:
modpost-64bit-fix.patch") is incorrect; it only addresses cases where
a 64-bit kernel is built on a 64-bit build machine, overlooking cases
on a 32-bit build machine.
Using 1ULL ensures a 64-bit width on both 32-bit and 64-bit machines,
avoiding the wraparound issue.
Dmitry Torokhov [Mon, 28 Oct 2024 05:31:15 +0000 (22:31 -0700)]
Input: fix regression when re-registering input handlers
Commit d469647bafd9 ("Input: simplify event handling logic") introduced
code that would set handler->events() method to either
input_handler_events_filter() or input_handler_events_default() or
input_handler_events_null(), depending on the kind of input handler
(a filter or a regular one) we are dealing with. Unfortunately this
breaks cases when we try to re-register the same filter (as is the case
with sysrq handler): after initial registration the handler will have 2
event handling methods defined, and will run afoul of the check in
input_handler_check_methods():
input: input_handler_check_methods: only one event processing method can be defined (sysrq)
sysrq: Failed to register input handler, error -22
Fix this by adding handle_events() method to input_handle structure and
setting it up when registering a new input handle according to event
handling methods defined in associated input_handler structure, thus
avoiding modifying the input_handler structure.
Reported-by: "Ned T. Crigler" <crigler@gmail.com> Reported-by: Christian Heusel <christian@heusel.eu> Tested-by: "Ned T. Crigler" <crigler@gmail.com> Tested-by: Peter Seiderer <ps.report@gmx.net> Fixes: d469647bafd9 ("Input: simplify event handling logic") Link: https://lore.kernel.org/r/Zx2iQp6csn42PJA7@xavtug Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Linus Torvalds [Sat, 2 Nov 2024 19:27:11 +0000 (09:27 -1000)]
Merge tag 'nfsd-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Fix two async COPY bugs found during NFS bake-a-thon
- Fix an svcrdma memory leak
* tag 'nfsd-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
rpcrdma: Always release the rpcrdma_device's xa_array
NFSD: Never decrement pending_async_copies on error
NFSD: Initialize struct nfsd4_copy earlier
Linus Torvalds [Sat, 2 Nov 2024 19:22:16 +0000 (09:22 -1000)]
Merge tag 'xfs-6.12-fixes-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:
- fix a sysbot reported crash on filestreams
- Reduce cpu time spent searching for extents in a very fragmented FS
- Check for delayed allocations before setting extsize
* tag 'xfs-6.12-fixes-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: streamline xfs_filestream_pick_ag
xfs: fix finding a last resort AG in xfs_filestream_pick_ag
xfs: Reduce unnecessary searches when searching for the best extents
xfs: Check for delayed allocations before setting extsize
- fix idmap_mount_tree_invalid test failure due to incorrect argument
- fix watchdog-test run leaving the watchdog timer enabled causing
system reboot. With this fix, the test disables the watchdog timer
when it gets terminated with SIGTERM, SIGKILL, and SIGQUIT in
addition to SIGINT
* tag 'linux_kselftest-fixes-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/watchdog-test: Fix system accidentally reset after watchdog-test
selftests/intel_pstate: check if cpupower is installed
selftests/intel_pstate: fix operand expected error
selftests/mount_setattr: fix idmap_mount_tree_invalid failed to run
Linus Torvalds [Sat, 2 Nov 2024 01:59:46 +0000 (15:59 -1000)]
Merge tag 'rust-fixes-6.12-3' of https://github.com/Rust-for-Linux/linux
Pull rust fixes from Miguel Ojeda:
"Toolchain and infrastructure:
- Avoid build errors with old 'rustc's without LLVM patch version
(important since it impacts people that do not even enable Rust)
- Update LLVM version for 'HAVE_CFI_ICALL_NORMALIZE_INTEGERS' in
'depends on' condition (the fix was eventually backported rather
than land in LLVM 19)"
* tag 'rust-fixes-6.12-3' of https://github.com/Rust-for-Linux/linux:
cfi: tweak llvm version for HAVE_CFI_ICALL_NORMALIZE_INTEGERS
kbuild: rust: avoid errors with old `rustc`s without LLVM patch version
Linus Torvalds [Sat, 2 Nov 2024 01:44:23 +0000 (15:44 -1000)]
Merge tag 'pci-v6.12-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fix from Bjorn Helgaas:
- Enable device-specific ACS-like functionality even if the device
doesn't advertise an ACS capability, which got broken when adding
fancy ACS kernel parameter (Jason Gunthorpe)
* tag 'pci-v6.12-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: Fix pci_enable_acs() support for the ACS quirks
Linus Torvalds [Sat, 2 Nov 2024 01:37:09 +0000 (15:37 -1000)]
Merge tag 'drm-fixes-2024-11-02' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Regular fixes pull, nothing too out of the ordinary, the mediatek
fixes came in a batch that I might have preferred a bit earlier but
all seem fine, otherwise regular xe/amdgpu and a few misc ones.
xe:
- Fix missing HPD interrupt enabling, bringing one PM refactor with it
- Workaround LNL GGTT invalidation not being visible to GuC
- Avoid getting jobs stuck without a protecting timeout
ivpu:
- Fix firewall IRQ handling
panthor:
- Fix firmware initialization wrt page sizes
- Fix handling and reporting of dead job groups
sched:
- Guarantee forward progress via WC_MEM_RECLAIM
tests:
- Fix memory leak in drm_display_mode_from_cea_vic()
mediatek:
- Fix degradation problem of alpha blending
- Fix color format MACROs in OVL
- Fix get efuse issue for MT8188 DPTX
- Fix potential NULL dereference in mtk_crtc_destroy()
- Correct dpi power-domains property
- Add split subschema property constraints"
* tag 'drm-fixes-2024-11-02' of https://gitlab.freedesktop.org/drm/kernel: (27 commits)
drm/xe: Don't short circuit TDR on jobs not started
drm/xe: Add mmio read before GGTT invalidate
drm/tests: hdmi: Fix memory leaks in drm_display_mode_from_cea_vic()
drm/connector: hdmi: Fix memory leak in drm_display_mode_from_cea_vic()
drm/tests: helpers: Add helper for drm_display_mode_from_cea_vic()
drm/panthor: Report group as timedout when we fail to properly suspend
drm/panthor: Fail job creation when the group is dead
drm/panthor: Fix firmware initialization on systems with a page size > 4k
accel/ivpu: Fix NOC firewall interrupt handling
drm/xe/display: Add missing HPD interrupt enabling during non-d3cold RPM resume
drm/xe/display: Separate the d3cold and non-d3cold runtime PM handling
drm/xe: Remove runtime argument from display s/r functions
drm/amdgpu/smu13: fix profile reporting
drm/amd/pm: Vangogh: Fix kernel memory out of bounds write
Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35"
drm/sched: Mark scheduler work queues with WQ_MEM_RECLAIM
drm/tegra: Fix NULL vs IS_ERR() check in probe()
dt-bindings: display: mediatek: split: add subschema property constraints
dt-bindings: display: mediatek: dpi: correct power-domains property
drm/mediatek: Fix potential NULL dereference in mtk_crtc_destroy()
...
Linus Torvalds [Sat, 2 Nov 2024 01:22:57 +0000 (15:22 -1000)]
Merge tag 'cxl-fixes-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl fixes from Ira Weiny:
"The bulk of these fixes center around an initialization order bug
reported by Gregory Price and some additional fall out from the
debugging effort.
In summary, cxl_acpi and cxl_mem race and previously worked because of
a bus_rescan_devices() while testing without modules built in.
Unfortunately with modules built in the rescan would fail due to the
cxl_port driver being registered late via the build order. Furthermore
it was found bus_rescan_devices() did not guarantee a probe barrier
which CXL was expecting. Additional fixes to cxl-test and decoder
allocation came along as they were found in this debugging effort.
The other fixes are pretty minor but one affects trace point data seen
by user space.
Summary:
- Fix crashes when running with cxl-test code
- Fix Trace DRAM Event Record field decodes
- Fix module/built in initialization order errors
- Fix use after free on decoder shutdowns
- Fix out of order decoder allocations
- Improve cxl-test to better reflect real world systems"
* tag 'cxl-fixes-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/test: Improve init-order fidelity relative to real-world systems
cxl/port: Prevent out-of-order decoder allocation
cxl/port: Fix use-after-free, permit out-of-order decoder shutdown
cxl/acpi: Ensure ports ready at cxl_acpi_probe() return
cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices()
cxl/port: Fix CXL port initialization order when the subsystem is built-in
cxl/events: Fix Trace DRAM Event Record
cxl/core: Return error when cxl_endpoint_gather_bandwidth() handles a non-PCI device
Linus Torvalds [Fri, 1 Nov 2024 23:41:55 +0000 (13:41 -1000)]
Merge tag 'block-6.12-20241101' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- Fixup for a recent blk_rq_map_user_bvec() patch
- NVMe pull request via Keith:
- Spec compliant identification fix (Keith)
- Module parameter to enable backward compatibility on unusual
namespace formats (Keith)
- Target double free fix when using keys (Vitaliy)
- Passthrough command error handling fix (Keith)
* tag 'block-6.12-20241101' of git://git.kernel.dk/linux:
nvme: re-fix error-handling for io_uring nvme-passthrough
nvmet-auth: assign dh_key to NULL after kfree_sensitive
nvme: module parameter to disable pi with offsets
block: fix queue limits checks in blk_rq_map_user_bvec for real
nvme: enhance cns version checking
Linus Torvalds [Fri, 1 Nov 2024 23:38:01 +0000 (13:38 -1000)]
Merge tag 'io_uring-6.12-20241101' of git://git.kernel.dk/linux
Pull io_uring fix from Jens Axboe:
- Fix not honoring IOCB_NOWAIT for starting buffered writes in terms of
calling sb_start_write(), leading to a deadlock if someone is
attempting to freeze the file system with writes in progress, as each
side will end up waiting for the other to make progress.
* tag 'io_uring-6.12-20241101' of git://git.kernel.dk/linux:
io_uring/rw: fix missing NOWAIT check for O_DIRECT start write
Linus Torvalds [Fri, 1 Nov 2024 19:04:23 +0000 (09:04 -1000)]
Merge tag 'acpi-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Make the ACPI CPPC library use a raw spinlock for operations carried
out in scheduler context via the schedutil governor and the ACPI CPPC
cpufreq driver (Pierre Gondois)"
* tag 'acpi-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: CPPC: Make rmw_lock a raw_spin_lock
Dave Airlie [Fri, 1 Nov 2024 18:44:02 +0000 (04:44 +1000)]
Merge tag 'drm-xe-fixes-2024-10-31' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Fix missing HPD interrupt enabling, bringing one PM refactor with it
(Imre / Maarten)
- Workaround LNL GGTT invalidation not being visible to GuC
(Matthew Brost)
- Avoid getting jobs stuck without a protecting timeout (Matthew Brost)
Linus Torvalds [Fri, 1 Nov 2024 18:26:38 +0000 (08:26 -1000)]
Merge tag 'riscv-for-linus-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- Avoid accessing the early boot ACPI tables via unsafe memory
attributes, which can result in incorrect ACPI table data appearing.
This can cause all sorts of bad behavior.
- Avoid compiler-inserted library calls in the VDSO.
- GCC+Rust builds have been disabled, to avoid issues related to ISA
string mismatched between the GCC and LLVM Rust implementations.
- The NX flag is now set in the EFI PE/COFF headers, which is necessary
for some distro GRUB versions to boot images.
- A fix to avoid leaking DT node reference counts on ACPI systems
during cache info parsing.
- CPU numbers are now printed as unsigned values during hotplug.
- A pair of build fixes for usused macros, which can trigger warnings
on some configurations.
* tag 'riscv-for-linus-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Remove duplicated GET_RM
riscv: Remove unused GENERATING_ASM_OFFSETS
riscv: Use '%u' to format the output of 'cpu'
riscv: Prevent a bad reference count on CPU nodes
riscv: efi: Set NX compat flag in PE/COFF header
RISC-V: disallow gcc + rust builds
riscv: Do not use fortify in early code
RISC-V: ACPI: fix early_ioremap to early_memremap
riscv: vdso: Prevent the compiler from inserting calls to memset()
Linus Torvalds [Fri, 1 Nov 2024 17:54:11 +0000 (07:54 -1000)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The important one is a change to the way in which we handle protection
keys around signal delivery so that we're more closely aligned with
the x86 behaviour, however there is also a revert of the previous fix
to disable software tag-based KASAN with GCC, since a workaround
materialised shortly afterwards.
I'd love to say we're done with 6.12, but we're aware of some
longstanding fpsimd register corruption issues that we're almost at
the bottom of resolving.
Summary:
- Fix handling of POR_EL0 during signal delivery so that pushing the
signal context doesn't fail based on the pkey configuration of the
interrupted context and align our user-visible behaviour with that
of x86.
- Fix a bogus pointer being passed to the CPU hotplug code from the
Arm SDEI driver.
- Re-enable software tag-based KASAN with GCC by using an alternative
implementation of '__no_sanitize_address'"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: signal: Improve POR_EL0 handling to avoid uaccess failures
firmware: arm_sdei: Fix the input parameter of cpuhp_remove_state()
Revert "kasan: Disable Software Tag-Based KASAN with GCC"
kasan: Fix Software Tag-Based KASAN with GCC
Linus Torvalds [Fri, 1 Nov 2024 17:45:00 +0000 (07:45 -1000)]
Merge tag 'vfs-6.12-rc6.iomap' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull iomap fixes from Christian Brauner:
"Fixes for iomap to prevent data corruption bugs in the fallocate
unshare range implementation of fsdax and a small cleanup to turn
iomap_want_unshare_iter() into an inline function"
* tag 'vfs-6.12-rc6.iomap' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
iomap: turn iomap_want_unshare_iter into an inline function
fsdax: dax_unshare_iter needs to copy entire blocks
fsdax: remove zeroing code from dax_unshare_iter
iomap: share iomap_unshare_iter predicate code with fsdax
xfs: don't allocate COW extents when unsharing a hole
Linus Torvalds [Fri, 1 Nov 2024 17:37:10 +0000 (07:37 -1000)]
Merge tag 'vfs-6.12-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull filesystem fixes from Christian Brauner:
"VFS:
- Fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP=y is set
- Add a get_tree_bdev_flags() helper that allows to modify e.g.,
whether errors are logged into the filesystem context during
superblock creation. This is used by erofs to fix a userspace
regression where an error is currently logged when its used on a
regular file which is an new allowed mode in erofs.
netfs:
- Fix the sysfs debug path in the documentation.
- Fix iov_iter_get_pages*() for folio queues by skipping the page
extracation if we're at the end of a folio.
afs:
- Fix moving subdirectories to different parent directory.
autofs:
- Fix handling of AUTOFS_DEV_IOCTL_TIMEOUT_CMD ioctl in
validate_dev_ioctl(). The actual ioctl number, not the ioctl
command needs to be checked for autofs"
* tag 'vfs-6.12-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
iov_iter: fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP
autofs: fix thinko in validate_dev_ioctl()
iov_iter: Fix iov_iter_get_pages*() for folio_queue
afs: Fix missing subdir edit when renamed between parent dirs
doc: correcting the debug path for cachefiles
erofs: use get_tree_bdev_flags() to avoid misleading messages
fs/super.c: introduce get_tree_bdev_flags()
Linus Torvalds [Fri, 1 Nov 2024 17:31:47 +0000 (07:31 -1000)]
Merge tag 'for-6.12-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more stability fixes. There's one patch adding export of MIPS
cmpxchg helper, used in the error propagation fix.
- fix error propagation from split bios to the original btrfs bio
- fix merging of adjacent extents (normal operation, defragmentation)
- fix potential use after free after freeing btrfs device structures"
* tag 'for-6.12-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix defrag not merging contiguous extents due to merged extent maps
btrfs: fix extent map merging not happening for adjacent extents
btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids()
btrfs: fix error propagation of split bios
MIPS: export __cmpxchg_small()
Linus Torvalds [Fri, 1 Nov 2024 17:21:03 +0000 (07:21 -1000)]
Merge tag 'bcachefs-2024-10-31' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Various syzbot fixes, and the more notable ones:
- Fix for pointers in an extent overflowing the max (16) on a
filesystem with many devices: we were creating too many cached
copies when moving data around. Now, we only create at most one
cached copy if there's a promote target set.
Caching will be a bit broken for reflinked data until 6.13: I have
larger series queued up which significantly improves the plumbing
for data options down into the extent (bch_extent_rebalance) to fix
this.
- Fix for deadlock on -ENOSPC on tiny filesystems
Allocation from the partial open_bucket list wasn't correctly
accounting partial open_buckets as free: this fixes the main cause
of tests timing out in the automated tests"
* tag 'bcachefs-2024-10-31' of git://evilpiepirate.org/bcachefs:
bcachefs: Fix NULL ptr dereference in btree_node_iter_and_journal_peek
bcachefs: fix possible null-ptr-deref in __bch2_ec_stripe_head_get()
bcachefs: Fix deadlock on -ENOSPC w.r.t. partial open buckets
bcachefs: Don't filter partial list buckets in open_buckets_to_text()
bcachefs: Don't keep tons of cached pointers around
bcachefs: init freespace inited bits to 0 in bch2_fs_initialize
bcachefs: Fix unhandled transaction restart in fallocate
bcachefs: Fix UAF in bch2_reconstruct_alloc()
bcachefs: fix null-ptr-deref in have_stripes()
bcachefs: fix shift oob in alloc_lru_idx_fragmentation
bcachefs: Fix invalid shift in validate_sb_layout()
Aapo Vienamo [Thu, 15 Aug 2024 18:45:20 +0000 (21:45 +0300)]
thunderbolt: debugfs: Implement asymmetric lane margining
Add support for the RX2 receiver which is used as the third receiver in
asymmetric links. This requires expanding the results array for the
additional third data word of the hardware margining results.
Signed-off-by: Aapo Vienamo <aapo.vienamo@iki.fi> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Use ARRAY_SIZE() when available or pass in the array size derived from
it. This is in preparation for adding another result data word for
supporting Gen 4 asymmetric links with an additional lane.
Signed-off-by: Aapo Vienamo <aapo.vienamo@iki.fi> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Aapo Vienamo [Thu, 15 Aug 2024 18:45:16 +0000 (21:45 +0300)]
thunderbolt: debugfs: Replace "both lanes" with "all lanes"
With USB4 Gen 4, the link can be configured into an asymmetric mode,
where there are three receivers and only one transmitter. The USB4
specification also uses the "all lanes" nomenclature instead of "both
lanes".
Signed-off-by: Aapo Vienamo <aapo.vienamo@iki.fi> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Aapo Vienamo [Thu, 15 Aug 2024 18:45:15 +0000 (21:45 +0300)]
thunderbolt: debugfs: Implement Gen 4 margining eye selection
Add a debugfs knob for USB4 Gen 4 margining eye selection. Gen 4 uses
3-level pulse amplitude modulation (PAM3) which changes how margining
measurements are made because PAM3 has two eyes per lane from which
the margins can be measured.
Signed-off-by: Aapo Vienamo <aapo.vienamo@iki.fi> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Aapo Vienamo [Thu, 15 Aug 2024 18:45:14 +0000 (21:45 +0300)]
thunderbolt: debugfs: Add USB4 Gen 4 margining capabilities
Parse the Gen 4 specific capabilities. Change the return types of
independent_voltage_margins() and independent_time_margins() to enums
that distinguish between the Gen 2/3 and Gen 4 margins since they behave
differently between generations.
Signed-off-by: Aapo Vienamo <aapo.vienamo@iki.fi> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Use or pass ARRAY_SIZE() of the capabilities array instead of hardcoding
it. USB4 Gen 4 introduces an additional data word, which requires
expanding the capabilities array.
Signed-off-by: Aapo Vienamo <aapo.vienamo@iki.fi> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Vlastimil Babka [Thu, 24 Oct 2024 15:12:29 +0000 (17:12 +0200)]
mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes
Since commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
boundaries") a mmap() of anonymous memory without a specific address hint
and of at least PMD_SIZE will be aligned to PMD so that it can benefit
from a THP backing page.
However this change has been shown to regress some workloads
significantly. [1] reports regressions in various spec benchmarks, with
up to 600% slowdown of the cactusBSSN benchmark on some platforms. The
benchmark seems to create many mappings of 4632kB, which would have merged
to a large THP-backed area before commit efa7df3e3bb5 and now they are
fragmented to multiple areas each aligned to PMD boundary with gaps
between. The regression then seems to be caused mainly due to the
benchmark's memory access pattern suffering from TLB or cache aliasing due
to the aligned boundaries of the individual areas.
Another known regression bisected to commit efa7df3e3bb5 is darktable [2]
[3] and early testing suggests this patch fixes the regression there as
well.
To fix the regression but still try to benefit from THP-friendly anonymous
mapping alignment, add a condition that the size of the mapping must be a
multiple of PMD size instead of at least PMD size. In case of many
odd-sized mapping like the cactusBSSN creates, those will stop being
aligned and with gaps between, and instead naturally merge again.
Link: https://lkml.kernel.org/r/20241024151228.101841-2-vbabka@suse.cz Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Michael Matz <matz@suse.de> Debugged-by: Gabriel Krisman Bertazi <gabriel@krisman.be> Closes: https://bugzilla.suse.com/show_bug.cgi?id=1229012 [1] Reported-by: Matthias Bodenbinder <matthias@bodenbinder.de> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219366 [2] Closes: https://lore.kernel.org/all/2050f0d4-57b0-481d-bab8-05e8d48fed0c@leemhuis.info/ [3] Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Yang Shi <yang@os.amperecomputing.com> Cc: Rik van Riel <riel@surriel.com> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Petr Tesarik <ptesarik@suse.com> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Gregory Price [Fri, 25 Oct 2024 14:17:24 +0000 (10:17 -0400)]
vmscan,migrate: fix page count imbalance on node stats when demoting pages
When numa balancing is enabled with demotion, vmscan will call
migrate_pages when shrinking LRUs. migrate_pages will decrement the
the node's isolated page count, leading to an imbalanced count when
invoked from (MG)LRU code.
This path happens for SUCCESSFUL migrations, not failures. Typically
callers to migrate_pages are required to handle putback/accounting for
failures, but this is already handled in the shrink code.
When accounting for migrations, instead do not decrement the count when
the migration reason is MR_DEMOTION. As of v6.11, this demotion logic
is the only source of MR_DEMOTION.
Link: https://lkml.kernel.org/r/20241025141724.17927-1-gourry@gourry.net Fixes: 26aa2d199d6f ("mm/migrate: demote pages during reclaim") Signed-off-by: Gregory Price <gourry@gourry.net> Reviewed-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Wei Xu <weixugc@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>