]> git.ipfire.org Git - thirdparty/kernel/linux.git/log
thirdparty/kernel/linux.git
3 months agoirqchip/davinci-cp-intc: Remove public header
Bartosz Golaszewski [Tue, 4 Mar 2025 13:18:14 +0000 (14:18 +0100)] 
irqchip/davinci-cp-intc: Remove public header

There are no more users of irq-davinci-cp-intc.h (da830.c doesn't use
any of its symbols). Remove the header and make the driver stop using the
config structure.

[ tglx: Mop up coding style ]

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250304131815.86549-1-brgl@bgdev.pl
3 months agoirqchip/renesas-rzv2h: Add RZ/G3E support
Biju Das [Mon, 24 Feb 2025 13:11:28 +0000 (13:11 +0000)] 
irqchip/renesas-rzv2h: Add RZ/G3E support

The ICU block on the RZ/G3E SoC is almost identical to the one found on
the RZ/V2H SoC, with the following differences:

 - The TINT register base offset is 0x800 instead of zero.
 - The number of GPIO interrupts for TINT selection is 141 instead of 86.
 - The pin index and TINT selection index are not in the 1:1 map.
 - The number of TSSR registers is 16 instead of 8.
 - Each TSSR register can program 2 TINTs instead of 4 TINTs.

Add support for the RZ/G3E driver by filling the rzv2h_hw_info table and
adding LUT for mapping between pin index and TINT selection index.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com>
Reviewed-by: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/all/20250224131253.134199-13-biju.das.jz@bp.renesas.com
3 months agoirqchip/renesas-rzv2h: Update macros ICU_TSSR_TSSEL_{MASK,PREP}
Biju Das [Mon, 24 Feb 2025 13:11:27 +0000 (13:11 +0000)] 
irqchip/renesas-rzv2h: Update macros ICU_TSSR_TSSEL_{MASK,PREP}

On RZ/G3E, TSSEL register field is 8 bits wide compared to 7 on RZ/V2H.
Also bits 8..14 is reserved on RZ/G3E and any writes on these reserved
bits is ignored.

Use bitmask GENMASK(field_width - 2, 0) on both SoCs for extracting TSSEL
and then update the macros ICU_TSSR_TSSEL_PREP and ICU_TSSR_TSSEL_MASK for
supporting both SoCs.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/all/20250224131253.134199-12-biju.das.jz@bp.renesas.com
3 months agoirqchip/renesas-rzv2h: Update TSSR_TIEN macro
Biju Das [Mon, 24 Feb 2025 13:11:26 +0000 (13:11 +0000)] 
irqchip/renesas-rzv2h: Update TSSR_TIEN macro

On RZ/G3E, TIEN bit position is at 15 compared to 7 on RZ/V2H. Replace the
macro ICU_TSSR_TIEN(n)->ICU_TSSR_TIEN(n, _field_width) for supporting both
these SoCs.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/all/20250224131253.134199-11-biju.das.jz@bp.renesas.com
3 months agoirqchip/renesas-rzv2h: Add field_width to struct rzv2h_hw_info
Biju Das [Mon, 24 Feb 2025 13:11:25 +0000 (13:11 +0000)] 
irqchip/renesas-rzv2h: Add field_width to struct rzv2h_hw_info

On RZ/G3E the field width for TSSR register for a TINT is 16 compared to 8
on the RZ/V2H.

Add field_width to struct rzv2h_hw_info and replace the macros ICU_TSSR_K
and ICU_TSSR_TSSEL_N by a runtime evaluation:

    (32 / field_width) provides the number of tints in the TSSR register.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/all/20250224131253.134199-10-biju.das.jz@bp.renesas.com
3 months agoirqchip/renesas-rzv2h: Add max_tssel to struct rzv2h_hw_info
Biju Das [Mon, 24 Feb 2025 13:11:24 +0000 (13:11 +0000)] 
irqchip/renesas-rzv2h: Add max_tssel to struct rzv2h_hw_info

The number of GPIO interrupts on RZ/G3E for TINT selection is 141 compared
to 86 on RZ/V2H. Rename the macro ICU_PB5_TINT->ICU_RZV2H_TSSEL_MAX_VAL to
hold this difference for RZ/V2H.

Add max_tssel to struct rzv2h_hw_info and replace the hardcoded constants
in the code.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com>
Reviewed-by: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/all/20250224131253.134199-9-biju.das.jz@bp.renesas.com
3 months agoirqchip/renesas-rzv2h: Add struct rzv2h_hw_info with t_offs variable
Biju Das [Mon, 24 Feb 2025 13:11:23 +0000 (13:11 +0000)] 
irqchip/renesas-rzv2h: Add struct rzv2h_hw_info with t_offs variable

The ICU block on the RZ/G3E SoC is almost identical to the one found on
the RZ/V2H SoC, with the following differences:

 - The TINT register base offset is 0x800 instead of zero.
 - The number of GPIO interrupts for TINT selection is 141 instead of 86.
 - The pin index and TINT selection index are not in the 1:1 map
 - The number of TSSR registers is 16 instead of 8
 - Each TSSR register can program 2 TINTs instead of 4 TINTs

Introduce struct rzv2h_hw_info to describe the SoC properties and refactor
the code by moving rzv2h_icu_init() into rzv2h_icu_init_common() and pass
the variable containing hw difference to support both these SoCs.

As a first step add t_offs to the new struct and replace the hardcoded
constants in the code.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com>
Reviewed-by: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/all/20250224131253.134199-8-biju.das.jz@bp.renesas.com
3 months agoirqchip/renesas-rzv2h: Use devm_pm_runtime_enable()
Biju Das [Mon, 24 Feb 2025 13:11:22 +0000 (13:11 +0000)] 
irqchip/renesas-rzv2h: Use devm_pm_runtime_enable()

Simplify rzv2h_icu_init() by using devm_pm_runtime_enable().

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/all/20250224131253.134199-7-biju.das.jz@bp.renesas.com
3 months agoirqchip/renesas-rzv2h: Use devm_reset_control_get_exclusive_deasserted()
Biju Das [Mon, 24 Feb 2025 13:11:21 +0000 (13:11 +0000)] 
irqchip/renesas-rzv2h: Use devm_reset_control_get_exclusive_deasserted()

Use devm_reset_control_get_exclusive_deasserted() to simplify
rzv2h_icu_init().

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://lore.kernel.org/all/20250224131253.134199-6-biju.das.jz@bp.renesas.com
3 months agoirqchip/renesas-rzv2h: Simplify rzv2h_icu_init()
Biju Das [Mon, 24 Feb 2025 13:11:20 +0000 (13:11 +0000)] 
irqchip/renesas-rzv2h: Simplify rzv2h_icu_init()

Use devm_add_action_or_reset() for calling put_device in error path of
rzv2h_icu_init() to simplify the code by using the recently added devm_*
helpers.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://lore.kernel.org/all/20250224131253.134199-5-biju.das.jz@bp.renesas.com
3 months agoirqchip/renesas-rzv2h: Drop irqchip from struct rzv2h_icu_priv
Biju Das [Mon, 24 Feb 2025 13:11:19 +0000 (13:11 +0000)] 
irqchip/renesas-rzv2h: Drop irqchip from struct rzv2h_icu_priv

Use rzv2h_icu_chip directly on irq_domain_set_hwirq_and_chip() and drop
the global variable irqchip from struct rzv2h_icu_priv.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com>
Reviewed-by: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/all/20250224131253.134199-4-biju.das.jz@bp.renesas.com
3 months agoirqchip/renesas-rzv2h: Fix wrong variable usage in rzv2h_tint_set_type()
Biju Das [Mon, 24 Feb 2025 13:11:18 +0000 (13:11 +0000)] 
irqchip/renesas-rzv2h: Fix wrong variable usage in rzv2h_tint_set_type()

The variable tssel_n is used for selecting TINT source and titsel_n for
setting the interrupt type. The variable titsel_n is wrongly used for
enabling the TINT interrupt in rzv2h_tint_set_type(). Fix this issue by
using the correct variable tssel_n.

While at it, move the tien variable assignment near to tssr.

Fixes: 0d7605e75ac2 ("irqchip: Add RZ/V2H(P) Interrupt Control Unit (ICU) driver")
Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250224131253.134199-3-biju.das.jz@bp.renesas.com
Closes: https://lore.kernel.org/CAMuHMdU3xJpz-jh=j7t4JreBat2of2ksP_OR3+nKAoZBr4pSxg@mail.gmail.com
3 months agodt-bindings: interrupt-controller: renesas,rzv2h-icu: Document RZ/G3E SoC
Biju Das [Mon, 24 Feb 2025 13:11:17 +0000 (13:11 +0000)] 
dt-bindings: interrupt-controller: renesas,rzv2h-icu: Document RZ/G3E SoC

Document RZ/G3E (R9A09G047) ICU bindings. The ICU block on the RZ/G3E
SoC is almost identical to the one found on the RZ/V2H SoC, with the
following differences:
 - The TINT register base offset is 0x800 instead of zero.
 - The number of supported GPIO interrupts for TINT selection is 141
   instead of 86.
 - The pin index and TINT selection index are not in the 1:1 map
 - The number of TSSR registers is 16 instead of 8
 - Each TSSR register can program 2 TINTs instead of 4 TINTs

Hence add the new compatible string "renesas,r9a09g047-icu" for RZ/G3E SoC.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com>
Reviewed-by: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/all/20250224131253.134199-2-biju.das.jz@bp.renesas.com
3 months agoriscv: sophgo: dts: Add msi controller for SG2042
Chen Wang [Wed, 26 Feb 2025 02:15:37 +0000 (10:15 +0800)] 
riscv: sophgo: dts: Add msi controller for SG2042

Add msi-controller node to dts for SG2042.

Signed-off-by: Chen Wang <unicorn_wang@outlook.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/f47c6c3f0309a543d495cb088d6c8c5750bb5647.1740535748.git.unicorn_wang@outlook.com
3 months agoirqchip: Add the Sophgo SG2042 MSI interrupt controller
Chen Wang [Wed, 26 Feb 2025 02:15:19 +0000 (10:15 +0800)] 
irqchip: Add the Sophgo SG2042 MSI interrupt controller

Add driver for Sophgo SG2042 MSI interrupt controller.

Signed-off-by: Chen Wang <unicorn_wang@outlook.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Inochi Amaoto <inochiama@gmail.com>
Link: https://lore.kernel.org/all/3104216ca90a5f532bafb676c1c5b1efb19e94d1.1740535748.git.unicorn_wang@outlook.com
3 months agodt-bindings: interrupt-controller: Add Sophgo SG2042 MSI
Chen Wang [Wed, 26 Feb 2025 02:15:01 +0000 (10:15 +0800)] 
dt-bindings: interrupt-controller: Add Sophgo SG2042 MSI

Add binding for Sophgo SG2042 MSI controller.

Signed-off-by: Chen Wang <unicorn_wang@outlook.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/all/44de02977624be334ba6328acfdbb2a375f2071f.1740535748.git.unicorn_wang@outlook.com
4 months agoarm64: dts: rockchip: rk356x: Move PCIe MSI to use GIC ITS instead of MBI
Dmitry Osipenko [Sun, 16 Feb 2025 22:16:34 +0000 (01:16 +0300)] 
arm64: dts: rockchip: rk356x: Move PCIe MSI to use GIC ITS instead of MBI

Rockchip 356x device-tree now supports GIC ITS. Move PCIe controller's
MSI to use ITS instead of MBI. This removes extra CPU overhead of handling
PCIe MBIs by letting GIC's ITS to serve the PCIe MSIs.

Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250216221634.364158-4-dmitry.osipenko@collabora.com
4 months agoarm64: dts: rockchip: rk356x: Add MSI controller node
Dmitry Osipenko [Sun, 16 Feb 2025 22:16:33 +0000 (01:16 +0300)] 
arm64: dts: rockchip: rk356x: Add MSI controller node

Rockchip 356x SoC's GIC has two hardware integration issues that
affect MSI functionality of the GIC. Previously, both these GIC
issues were worked around by using MBI for MSI instead of ITS
because kernel GIC driver didn't have necessary quirks.

First issue is about RK356x GIC not supporting programmable
shareability, while reporting it as supported in a GIC's feature
register. Rockchip assigned Erratum ID #3568001 for this issue. This
patch adds dma-noncoherent property to the GIC node, denoting that a SW
workaround is required for mitigating the issue.

Second issue is about GIC AXI master interface addressing limited to
the first 4GB of physical address space. Rockchip assigned Erratum
ID #3568002 for this issue.

Now that kernel supports quirks for both of the erratums, add
MSI controller node to RK356x device-tree.

Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250216221634.364158-3-dmitry.osipenko@collabora.com
4 months agoirqchip/gic-v3: Add Rockchip 3568002 erratum workaround
Dmitry Osipenko [Sun, 16 Feb 2025 22:16:32 +0000 (01:16 +0300)] 
irqchip/gic-v3: Add Rockchip 3568002 erratum workaround

Rockchip RK3566/RK3568 GIC600 integration has DDR addressing
limited to the first 32bit of physical address space. Rockchip
assigned Erratum ID #3568002 for this issue. Add driver quirk for
this Rockchip GIC Erratum.

Note, that the 0x0201743b GIC600 ID is not Rockchip-specific and is
common for many ARM GICv3 implementations. Hence, there is an extra
of_machine_is_compatible() check.

Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/all/20250216221634.364158-2-dmitry.osipenko@collabora.com
4 months agoirqchip/riscv-imsic: Special handling for non-atomic device MSI update
Anup Patel [Mon, 17 Feb 2025 08:56:56 +0000 (14:26 +0530)] 
irqchip/riscv-imsic: Special handling for non-atomic device MSI update

Devices, which have a non-atomic MSI update, might see an intermediate
state when changing the target IMSIC vector from one CPU to another.

To avoid losing interrupts due to this intermediate state, do the following
just like x86 APIC:

 1) First write a temporary IMSIC vector to the device which has the same
    MSI address as the old IMSIC vector and MSI data pointing to the new
    IMSIC vector.

 2) Next write the new IMSIC vector to the device.

Based on the above, the __imsic_local_sync() must check pending status of
both old MSI data and new MSI data on the old CPU. In addition, the
movement of IMSIC vector for non-atomic device MSI update must be done in
interrupt context using IRQCHIP_MOVE_DEFERRED.

Implememnt the logic and enforce the chip flag for PCI/MSI[X].

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250217085657.789309-11-apatel@ventanamicro.com
4 months agoirqchip/riscv-imsic: Avoid interrupt translation in interrupt handler
Anup Patel [Mon, 17 Feb 2025 08:56:55 +0000 (14:26 +0530)] 
irqchip/riscv-imsic: Avoid interrupt translation in interrupt handler

Currently, imsic_handle_irq() uses generic_handle_domain_irq() to handle
the interrupt, which internally has an extra step of resolving hwirq using
domain.

Avoid the translation step by replacing the hardware interrupt number with
the Linux interrupt number in the IMSIC vector data and directly call
generic_handle_irq().

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250217085657.789309-10-apatel@ventanamicro.com
4 months agoirqchip/riscv-imsic: Implement irq_force_complete_move() for IMSIC
Anup Patel [Mon, 17 Feb 2025 08:56:54 +0000 (14:26 +0530)] 
irqchip/riscv-imsic: Implement irq_force_complete_move() for IMSIC

Implement irq_force_complete_move() for IMSIC driver so that in-flight
vector movements on a CPU can be cleaned-up when the CPU goes down.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250217085657.789309-9-apatel@ventanamicro.com
4 months agoirqchip/riscv-imsic: Separate next and previous pointers in IMSIC vector
Anup Patel [Mon, 17 Feb 2025 08:56:53 +0000 (14:26 +0530)] 
irqchip/riscv-imsic: Separate next and previous pointers in IMSIC vector

Currently, there is only one "move" pointer in struct imsic_vector so
during vector movement the old vector points to the new vector and new
vector points to itself.

To support forced cleanup of the old vector, add separate "move_next" and
"move_prev" pointers to struct imsic_vector, where during vector movement
the "move_next" pointer of the old vector points to the new vector and the
"move_prev" pointer of the new vector points to the old vector.

Both "move_next" and "move_prev" pointers are cleared separately by
__imsic_local_sync() with a restriction that "move_prev" on the new
CPU is cleared only after the old CPU has cleared "move_next".

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250217085657.789309-8-apatel@ventanamicro.com
4 months agoRISC-V: Select CONFIG_GENERIC_PENDING_IRQ
Anup Patel [Mon, 17 Feb 2025 08:56:52 +0000 (14:26 +0530)] 
RISC-V: Select CONFIG_GENERIC_PENDING_IRQ

Enable CONFIG_GENERIC_PENDING_IRQ for RISC-V so that RISC-V interrupt chips
can support delayed interrupt mirgration in interrupt context.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250217085657.789309-7-apatel@ventanamicro.com
4 months agogenirq: Introduce irq_can_move_in_process_context()
Anup Patel [Mon, 17 Feb 2025 08:56:51 +0000 (14:26 +0530)] 
genirq: Introduce irq_can_move_in_process_context()

Interrupt controller drivers which enable CONFIG_GENERIC_PENDING_IRQ
require to know whether an interrupt can be moved in process context or not
to decide whether they need to invoke the work around for non-atomic MSI
updates or not.

This information can be retrieved via irq_can_move_pcntxt(). That helper
requires access to the top-most interrupt domain data, but the driver which
requires this is usually further down in the hierarchy.

Introduce irq_can_move_in_process_context() which retrieves that
information from the top-most interrupt domain data.

[ tglx: Massaged change log ]

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250217085657.789309-6-apatel@ventanamicro.com
4 months agogenirq: Introduce common irq_force_complete_move() implementation
Thomas Gleixner [Mon, 17 Feb 2025 08:56:50 +0000 (14:26 +0530)] 
genirq: Introduce common irq_force_complete_move() implementation

CONFIG_GENERIC_PENDING_IRQ requires an architecture specific implementation
of irq_force_complete_move() for CPU hotplug. At the moment, only x86
implements this unconditionally, but for RISC-V irq_force_complete_move()
is only needed when the RISC-V IMSIC driver is in use and not needed
otherwise.

To allow runtime configuration of this mechanism, introduce a common
irq_force_complete_move() implementation in the interrupt core code, which
only invokes the completion function, when a interrupt chip in the
hierarchy implements it.

Switch X86 over to the new mechanism. No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250217085657.789309-5-apatel@ventanamicro.com
4 months agoirqchip/riscv-imsic: Move to common MSI library
Thomas Gleixner [Mon, 17 Feb 2025 08:56:49 +0000 (14:26 +0530)] 
irqchip/riscv-imsic: Move to common MSI library

Simplify the leaf MSI domain handling in the RISC-V IMSIC driver by
using msi_lib_init_dev_msi_info() and msi_lib_irq_domain_select()
provided by the common MSI library.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250217085657.789309-4-apatel@ventanamicro.com
4 months agoirqchip/irq-msi-lib: Optionally set default irq_eoi()/irq_ack()
Thomas Gleixner [Mon, 17 Feb 2025 08:56:48 +0000 (14:26 +0530)] 
irqchip/irq-msi-lib: Optionally set default irq_eoi()/irq_ack()

msi_lib_init_dev_msi_info() sets the default irq_eoi()/irq_ack() callbacks
unconditionally. This is correct for all existing users, but prevents the
IMSIC driver to be moved to the MSI library implementation.

Introduce chip_flags in struct msi_parent_ops, which instruct the library
to selectively set the callbacks depending on the flags, and update all
current users to set them.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250217085657.789309-3-apatel@ventanamicro.com
4 months agoirqchip/riscv-imsic: Set irq_set_affinity() for IMSIC base
Andrew Jones [Mon, 17 Feb 2025 08:56:47 +0000 (14:26 +0530)] 
irqchip/riscv-imsic: Set irq_set_affinity() for IMSIC base

The IMSIC driver assigns the IMSIC domain specific imsic_irq_set_affinity()
callback to the per device leaf MSI domain. That's a layering violation as
it is called with the leaf domain data and not with the IMSIC domain
data. This prevents moving the IMSIC driver to the common MSI library which
uses the generic msi_domain_set_affinity() callback for device MSI domains.

Instead of using imsic_irq_set_affinity() for leaf MSI domains, use
imsic_irq_set_affinity() for the non-leaf IMSIC base domain and use
irq_chip_set_affinity_parent() for leaf MSI domains.

[ tglx: Massaged change log ]

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250217085657.789309-2-apatel@ventanamicro.com
4 months agoirqchip/renesas-rzg2l: Simplify checks in rzg2l_irqc_common_init()
Fabrizio Castro [Wed, 12 Feb 2025 18:20:34 +0000 (18:20 +0000)] 
irqchip/renesas-rzg2l: Simplify checks in rzg2l_irqc_common_init()

Both devm_pm_runtime_enable() and pm_runtime_resume_and_get()
return 0 or a negative error code.

Simplify the checks done with their respective return values
accordingly.

Signed-off-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250212182034.366167-7-fabrizio.castro.jz@renesas.com
4 months agoirqchip/renesas-rzg2l: Switch to using dev_err_probe()
Fabrizio Castro [Wed, 12 Feb 2025 18:20:33 +0000 (18:20 +0000)] 
irqchip/renesas-rzg2l: Switch to using dev_err_probe()

Make use of dev_err_probe() to simplify rzg2l_irqc_common_init().

Signed-off-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://lore.kernel.org/all/20250212182034.366167-6-fabrizio.castro.jz@renesas.com
4 months agoirqchip/renesas-rzg2l: Remove pm_put label
Fabrizio Castro [Wed, 12 Feb 2025 18:20:32 +0000 (18:20 +0000)] 
irqchip/renesas-rzg2l: Remove pm_put label

No need to keep label `pm_put`, as it's only used once.
Call pm_runtime_put() directly from the error path.

Signed-off-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://lore.kernel.org/all/20250212182034.366167-5-fabrizio.castro.jz@renesas.com
4 months agoirqchip/renesas-rzg2l: Use devm_pm_runtime_enable()
Fabrizio Castro [Wed, 12 Feb 2025 18:20:31 +0000 (18:20 +0000)] 
irqchip/renesas-rzg2l: Use devm_pm_runtime_enable()

Simplify rzg2l_irqc_common_init() by using devm_pm_runtime_enable().

Signed-off-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250212182034.366167-4-fabrizio.castro.jz@renesas.com
4 months agoirqchip/renesas-rzg2l: Use devm_reset_control_get_exclusive_deasserted()
Fabrizio Castro [Wed, 12 Feb 2025 18:20:30 +0000 (18:20 +0000)] 
irqchip/renesas-rzg2l: Use devm_reset_control_get_exclusive_deasserted()

Use devm_reset_control_get_exclusive_deasserted() to simplify
rzg2l_irqc_common_init().

Signed-off-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://lore.kernel.org/all/20250212182034.366167-3-fabrizio.castro.jz@renesas.com
4 months agoirqchip/renesas-rzg2l: Use local dev pointer in rzg2l_irqc_common_init()
Fabrizio Castro [Wed, 12 Feb 2025 18:20:29 +0000 (18:20 +0000)] 
irqchip/renesas-rzg2l: Use local dev pointer in rzg2l_irqc_common_init()

Replace direct references to `&pdev->dev` with the local `dev` pointer
in rzg2l_irqc_common_init() to avoid redundant dereferencing.

Suggested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://lore.kernel.org/all/20250212182034.366167-2-fabrizio.castro.jz@renesas.com
4 months agoirqchip/riscv-aplic: Add support for hart indexes
Vladimir Kondratiev [Wed, 29 Jan 2025 09:16:37 +0000 (11:16 +0200)] 
irqchip/riscv-aplic: Add support for hart indexes

RISC-V APLIC specification defines "hart index" in:

  https://github.com/riscv/riscv-aia

Within a given interrupt domain, each of the domain’s harts has a unique
index number in the range 0 to 2^14 − 1 (= 16,383). The index number a
domain associates with a hart may or may not have any relationship to the
unique hart identifier (“hart ID”) that the RISC-V Privileged Architecture
assigns to the hart. Two different interrupt domains may employ entirely
different index numbers for the same set of harts.

Further, this document says in "4.5 Memory-mapped control region for an
interrupt domain":

The array of IDC structures may include some for potential hart index
numbers that are not actual hart index numbers in the domain. For example,
the first IDC structure is always for hart index 0, but 0 is not
necessarily a valid index number for any hart in the domain.

Support arbitrary hart indices specified in an optional APLIC property
"riscv,hart-indexes" which is specified as an array of u32 elements, one
per interrupt target. If this property is not specified, fallback to use
logical hart indices within the domain.

Signed-off-by: Vladimir Kondratiev <vladimir.kondratiev@mobileye.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/all/20250129091637.1667279-3-vladimir.kondratiev@mobileye.com
4 months agodt-bindings: interrupt-controller: Add risc-v,aplic hart indexes
Vladimir Kondratiev [Wed, 29 Jan 2025 09:16:36 +0000 (11:16 +0200)] 
dt-bindings: interrupt-controller: Add risc-v,aplic hart indexes

Document optional property "riscv,hart-indexes"

The RISC-V APLIC specification defines "hart index" in:

  https://github.com/riscv/riscv-aia

Within a given interrupt domain, each of the domain’s harts has a unique
index number in the range 0 to 2^14 − 1 (= 16,383). The index number a
domain associates with a hart may or may not have any relationship to the
unique hart identifier (“hart ID”) that the RISC-V Privileged Architecture
assigns to the hart. Two different interrupt domains may employ entirely
different index numbers for the same set of harts.

Further, this document says in "4.5 Memory-mapped control region for an
interrupt domain":

The array of IDC structures may include some for potential hart index
numbers that are not actual hart index numbers in the domain. For example,
the first IDC structure is always for hart index 0, but 0 is not
necessarily a valid index number for any hart in the domain.

Support arbitrary hart indexes specified in a optional APLIC property
"riscv,hart-indexes" which is specificed as an array of u32 elements, one
per interrupt target. If this property is not specified, fallback to use
the logical hart indices within the domain.

Signed-off-by: Vladimir Kondratiev <vladimir.kondratiev@mobileye.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/all/20250129091637.1667279-2-vladimir.kondratiev@mobileye.com
4 months agoLinux 6.14-rc1 v6.14-rc1
Linus Torvalds [Sun, 2 Feb 2025 23:39:26 +0000 (15:39 -0800)] 
Linux 6.14-rc1

4 months agoMerge tag 'turbostat-2025.02.02' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 2 Feb 2025 18:49:13 +0000 (10:49 -0800)] 
Merge tag 'turbostat-2025.02.02' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux

Pull turbostat updates from Len Brown:

 - Fix regression that affinitized forked child in one-shot mode.

 - Harden one-shot mode against hotplug online/offline

 - Enable RAPL SysWatt column by default

 - Add initial PTL, CWF platform support

 - Harden initial PMT code in response to early use

 - Enable first built-in PMT counter: CWF c1e residency

 - Refuse to run on unsupported platforms without --force, to encourage
   updating to a version that supports the system, and to avoid
   no-so-useful measurement results

* tag 'turbostat-2025.02.02' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (25 commits)
  tools/power turbostat: version 2025.02.02
  tools/power turbostat: Add CPU%c1e BIC for CWF
  tools/power turbostat: Harden one-shot mode against cpu offline
  tools/power turbostat: Fix forked child affinity regression
  tools/power turbostat: Add tcore clock PMT type
  tools/power turbostat: version 2025.01.14
  tools/power turbostat: Allow adding PMT counters directly by sysfs path
  tools/power turbostat: Allow mapping multiple PMT files with the same GUID
  tools/power turbostat: Add PMT directory iterator helper
  tools/power turbostat: Extend PMT identification with a sequence number
  tools/power turbostat: Return default value for unmapped PMT domains
  tools/power turbostat: Check for non-zero value when MSR probing
  tools/power turbostat: Enhance turbostat self-performance visibility
  tools/power turbostat: Add fixed RAPL PSYS divisor for SPR
  tools/power turbostat: Fix PMT mmaped file size rounding
  tools/power turbostat: Remove SysWatt from DISABLED_BY_DEFAULT
  tools/power turbostat: Add an NMI column
  tools/power turbostat: add Busy% to "show idle"
  tools/power turbostat: Introduce --force parameter
  tools/power turbostat: Improve --help output
  ...

4 months agoMerge tag 'sh-for-v6.14-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubi...
Linus Torvalds [Sun, 2 Feb 2025 18:40:27 +0000 (10:40 -0800)] 
Merge tag 'sh-for-v6.14-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux

Pull sh updates from John Paul Adrian Glaubitz:
 "Fixes and improvements for sh:

   - replace seq_printf() with the more efficient
     seq_put_decimal_ull_width() to increase performance when stress
     reading /proc/interrupts (David Wang)

   - migrate sh to the generic rule for built-in DTB to help avoid race
     conditions during parallel builds which can occur because Kbuild
     decends into arch/*/boot/dts twice (Masahiro Yamada)

   - replace select with imply in the board Kconfig for enabling
     hardware with complex dependencies. This addresses warnings which
     were reported by the kernel test robot (Geert Uytterhoeven)"

* tag 'sh-for-v6.14-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
  sh: boards: Use imply to enable hardware with complex dependencies
  sh: Migrate to the generic rule for built-in DTB
  sh: irq: Use seq_put_decimal_ull_width() for decimal values

4 months agotools/power turbostat: version 2025.02.02
Len Brown [Sun, 2 Feb 2025 16:43:02 +0000 (10:43 -0600)] 
tools/power turbostat: version 2025.02.02

Summary of Changes since 2024.11.30:

Fix regression in 2023.11.07 that affinitized forked child
in one-shot mode.

Harden one-shot mode against hotplug online/offline

Enable RAPL SysWatt column by default.

Add initial PTL, CWF platform support.

Harden initial PMT code in response to early use.

Enable first built-in PMT counter: CWF c1e residency

Refuse to run on unsupported platforms without --force,
to encourage updating to a version that supports the system,
and to avoid no-so-useful measurement results.

Signed-off-by: Len Brown <len.brown@intel.com>
4 months agoMerge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sat, 1 Feb 2025 23:07:56 +0000 (15:07 -0800)] 
Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull misc vfs cleanups from Al Viro:
 "Two unrelated patches - one is a removal of long-obsolete include in
  overlayfs (it used to need fs/internal.h, but the extern it wanted has
  been moved back to include/linux/namei.h) and another introduces
  convenience helper constructing struct qstr by a NUL-terminated
  string"

* tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  add a string-to-qstr constructor
  fs/overlayfs/namei.c: get rid of include ../internal.h

4 months agoMerge tag 'mips_6.14_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Linus Torvalds [Sat, 1 Feb 2025 22:54:33 +0000 (14:54 -0800)] 
Merge tag 'mips_6.14_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fix from Thomas Bogendoerfer:
 "Revert commit breaking sysv ipc for o32 ABI"

* tag 'mips_6.14_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  Revert "mips: fix shmctl/semctl/msgctl syscall for o32"

4 months agoMerge tag 'v6.14-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sat, 1 Feb 2025 19:30:41 +0000 (11:30 -0800)] 
Merge tag 'v6.14-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull more smb client updates from Steve French:

   - various updates for special file handling: symlink handling,
     support for creating sockets, cleanups, new mount options (e.g. to
     allow disabling using reparse points for them, and to allow
     overriding the way symlinks are saved), and fixes to error paths

   - fix for kerberos mounts (allow IAKerb)

   - SMB1 fix for stat and for setting SACL (auditing)

   - fix an incorrect error code mapping

   - cleanups"

* tag 'v6.14-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: (21 commits)
  cifs: Fix parsing native symlinks directory/file type
  cifs: update internal version number
  cifs: Add support for creating WSL-style symlinks
  smb3: add support for IAKerb
  cifs: Fix struct FILE_ALL_INFO
  cifs: Add support for creating NFS-style symlinks
  cifs: Add support for creating native Windows sockets
  cifs: Add mount option -o reparse=none
  cifs: Add mount option -o symlink= for choosing symlink create type
  cifs: Fix creating and resolving absolute NT-style symlinks
  cifs: Simplify reparse point check in cifs_query_path_info() function
  cifs: Remove symlink member from cifs_open_info_data union
  cifs: Update description about ACL permissions
  cifs: Rename struct reparse_posix_data to reparse_nfs_data_buffer and move to common/smb2pdu.h
  cifs: Remove struct reparse_posix_data from struct cifs_open_info_data
  cifs: Remove unicode parameter from parse_reparse_point() function
  cifs: Fix getting and setting SACLs over SMB1
  cifs: Remove intermediate object of failed create SFU call
  cifs: Validate EAs for WSL reparse points
  cifs: Change translation of STATUS_PRIVILEGE_NOT_HELD to -EPERM
  ...

4 months agoMerge tag 'driver-core-6.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 1 Feb 2025 18:04:29 +0000 (10:04 -0800)] 
Merge tag 'driver-core-6.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull debugfs fix from Greg KH:
 "Here is a single debugfs fix from Al to resolve a reported regression
  in the driver-core tree. It has been reported to fix the issue"

* tag 'driver-core-6.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  debugfs: Fix the missing initializations in __debugfs_file_get()

4 months agoMerge tag 'mm-hotfixes-stable-2025-02-01-03-56' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sat, 1 Feb 2025 17:49:20 +0000 (09:49 -0800)] 
Merge tag 'mm-hotfixes-stable-2025-02-01-03-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "21 hotfixes. 8 are cc:stable and the remainder address post-6.13
  issues. 13 are for MM and 8 are for non-MM.

  All are singletons, please see the changelogs for details"

* tag 'mm-hotfixes-stable-2025-02-01-03-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
  MAINTAINERS: include linux-mm for xarray maintenance
  revert "xarray: port tests to kunit"
  MAINTAINERS: add lib/test_xarray.c
  mailmap, MAINTAINERS, docs: update Carlos's email address
  mm/hugetlb: fix hugepage allocation for interleaved memory nodes
  mm: gup: fix infinite loop within __get_longterm_locked
  mm, swap: fix reclaim offset calculation error during allocation
  .mailmap: update email address for Christopher Obbard
  kfence: skip __GFP_THISNODE allocations on NUMA systems
  nilfs2: fix possible int overflows in nilfs_fiemap()
  mm: compaction: use the proper flag to determine watermarks
  kernel: be more careful about dup_mmap() failures and uprobe registering
  mm/fake-numa: handle cases with no SRAT info
  mm: kmemleak: fix upper boundary check for physical address objects
  mailmap: add an entry for Hamza Mahfooz
  MAINTAINERS: mailmap: update Yosry Ahmed's email address
  scripts/gdb: fix aarch64 userspace detection in get_current_task
  mm/vmscan: accumulate nr_demoted for accurate demotion statistics
  ocfs2: fix incorrect CPU endianness conversion causing mount failure
  mm/zsmalloc: add __maybe_unused attribute for is_first_zpdesc()
  ...

4 months agoMerge tag 'media/v6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Sat, 1 Feb 2025 17:15:01 +0000 (09:15 -0800)] 
Merge tag 'media/v6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fix from Mauro Carvalho Chehab:
 "A revert for a regression in the uvcvideo driver"

* tag 'media/v6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  Revert "media: uvcvideo: Require entities to have a non-zero unique ID"

4 months agoMAINTAINERS: include linux-mm for xarray maintenance
Andrew Morton [Fri, 31 Jan 2025 00:16:20 +0000 (16:16 -0800)] 
MAINTAINERS: include linux-mm for xarray maintenance

MM developers have an interest in the xarray code.

Cc: David Gow <davidgow@google.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Tamir Duberstein <tamird@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agorevert "xarray: port tests to kunit"
Andrew Morton [Fri, 31 Jan 2025 00:09:20 +0000 (16:09 -0800)] 
revert "xarray: port tests to kunit"

Revert c7bb5cf9fc4e ("xarray: port tests to kunit").  It broke the build
when compiing the xarray userspace test harness code.

Reported-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Closes: https://lkml.kernel.org/r/07cf896e-adf8-414f-a629-a808fc26014a@oracle.com
Cc: David Gow <davidgow@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Tamir Duberstein <tamird@gmail.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agoMAINTAINERS: add lib/test_xarray.c
Tamir Duberstein [Wed, 29 Jan 2025 21:13:49 +0000 (16:13 -0500)] 
MAINTAINERS: add lib/test_xarray.c

Ensure test-only changes are sent to the relevant maintainer.

Link: https://lkml.kernel.org/r/20250129-xarray-test-maintainer-v1-1-482e31f30f47@gmail.com
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Cc: Mattew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agomailmap, MAINTAINERS, docs: update Carlos's email address
Carlos Bilbao [Thu, 30 Jan 2025 01:22:44 +0000 (19:22 -0600)] 
mailmap, MAINTAINERS, docs: update Carlos's email address

Update .mailmap to reflect my new (and final) primary email address,
carlos.bilbao@kernel.org.  Also update contact information in files
Documentation/translations/sp_SP/index.rst and MAINTAINERS.

Link: https://lkml.kernel.org/r/20250130012248.1196208-1-carlos.bilbao@kernel.org
Signed-off-by: Carlos Bilbao <carlos.bilbao@kernel.org>
Cc: Carlos Bilbao <bilbao@vt.edu>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mattew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agomm/hugetlb: fix hugepage allocation for interleaved memory nodes
Ritesh Harjani (IBM) [Sat, 11 Jan 2025 11:06:55 +0000 (16:36 +0530)] 
mm/hugetlb: fix hugepage allocation for interleaved memory nodes

gather_bootmem_prealloc() assumes the start nid as 0 and size as
num_node_state(N_MEMORY).  That means in case if memory attached numa
nodes are interleaved, then gather_bootmem_prealloc_parallel() will fail
to scan few of these nodes.

Since memory attached numa nodes can be interleaved in any fashion, hence
ensure that the current code checks for all numa node ids
(.size = nr_node_ids). Let's still keep max_threads as N_MEMORY, so that
it can distributes all nr_node_ids among the these many no. threads.

e.g. qemu cmdline
========================
numa_cmd="-numa node,nodeid=1,memdev=mem1,cpus=2-3 -numa node,nodeid=0,cpus=0-1 -numa dist,src=0,dst=1,val=20"
mem_cmd="-object memory-backend-ram,id=mem1,size=16G"

w/o this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2):
==========================
~ # cat /proc/meminfo  |grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:               0 kB

with this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2):
===========================
~ # cat /proc/meminfo |grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:       2
HugePages_Free:        2
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:         2097152 kB

Link: https://lkml.kernel.org/r/f8d8dad3a5471d284f54185f65d575a6aaab692b.1736592534.git.ritesh.list@gmail.com
Fixes: b78b27d02930 ("hugetlb: parallelize 1G hugetlb initialization")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reported-by: Pavithra Prakash <pavrampu@linux.ibm.com>
Suggested-by: Muchun Song <muchun.song@linux.dev>
Tested-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Reviewed-by: Luiz Capitulino <luizcap@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Gang Li <gang.li@linux.dev>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agomm: gup: fix infinite loop within __get_longterm_locked
Zhaoyang Huang [Tue, 21 Jan 2025 02:01:59 +0000 (10:01 +0800)] 
mm: gup: fix infinite loop within __get_longterm_locked

We can run into an infinite loop in __get_longterm_locked() when
collect_longterm_unpinnable_folios() finds only folios that are isolated
from the LRU or were never added to the LRU.  This can happen when all
folios to be pinned are never added to the LRU, for example when
vm_ops->fault allocated pages using cma_alloc() and never added them to
the LRU.

Fix it by simply taking a look at the list in the single caller, to see if
anything was added.

[zhaoyang.huang@unisoc.com: move definition of local]
Link: https://lkml.kernel.org/r/20250122012604.3654667-1-zhaoyang.huang@unisoc.com
Link: https://lkml.kernel.org/r/20250121020159.3636477-1-zhaoyang.huang@unisoc.com
Fixes: 67e139b02d99 ("mm/gup.c: refactor check_and_migrate_movable_pages()")
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Aijun Sun <aijun.sun@unisoc.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agomm, swap: fix reclaim offset calculation error during allocation
Kairui Song [Thu, 30 Jan 2025 11:51:31 +0000 (19:51 +0800)] 
mm, swap: fix reclaim offset calculation error during allocation

There is a code error that will cause the swap entry allocator to reclaim
and check the whole cluster with an unexpected tail offset instead of the
part that needs to be reclaimed.  This may cause corruption of the swap
map, so fix it.

Link: https://lkml.kernel.org/r/20250130115131.37777-1-ryncsn@gmail.com
Fixes: 3b644773eefd ("mm, swap: reduce contention on device lock")
Signed-off-by: Kairui Song <kasong@tencent.com>
Cc: Chris Li <chrisl@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months ago.mailmap: update email address for Christopher Obbard
Christopher Obbard [Wed, 22 Jan 2025 12:04:27 +0000 (12:04 +0000)] 
.mailmap: update email address for Christopher Obbard

Update my email address.

Link: https://lkml.kernel.org/r/20250122-wip-obbardc-update-email-v2-1-12bde6b79ad0@linaro.org
Signed-off-by: Christopher Obbard <christopher.obbard@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agokfence: skip __GFP_THISNODE allocations on NUMA systems
Marco Elver [Fri, 24 Jan 2025 12:01:38 +0000 (13:01 +0100)] 
kfence: skip __GFP_THISNODE allocations on NUMA systems

On NUMA systems, __GFP_THISNODE indicates that an allocation _must_ be on
a particular node, and failure to allocate on the desired node will result
in a failed allocation.

Skip __GFP_THISNODE allocations if we are running on a NUMA system, since
KFENCE can't guarantee which node its pool pages are allocated on.

Link: https://lkml.kernel.org/r/20250124120145.410066-1-elver@google.com
Fixes: 236e9f153852 ("kfence: skip all GFP_ZONEMASK allocations")
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Chistoph Lameter <cl@linux.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agonilfs2: fix possible int overflows in nilfs_fiemap()
Nikita Zhandarovich [Fri, 24 Jan 2025 22:20:53 +0000 (07:20 +0900)] 
nilfs2: fix possible int overflows in nilfs_fiemap()

Since nilfs_bmap_lookup_contig() in nilfs_fiemap() calculates its result
by being prepared to go through potentially maxblocks == INT_MAX blocks,
the value in n may experience an overflow caused by left shift of blkbits.

While it is extremely unlikely to occur, play it safe and cast right hand
expression to wider type to mitigate the issue.

Found by Linux Verification Center (linuxtesting.org) with static analysis
tool SVACE.

Link: https://lkml.kernel.org/r/20250124222133.5323-1-konishi.ryusuke@gmail.com
Fixes: 622daaff0a89 ("nilfs2: fiemap support")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agomm: compaction: use the proper flag to determine watermarks
yangge [Sat, 25 Jan 2025 06:53:57 +0000 (14:53 +0800)] 
mm: compaction: use the proper flag to determine watermarks

There are 4 NUMA nodes on my machine, and each NUMA node has 32GB of
memory.  I have configured 16GB of CMA memory on each NUMA node, and
starting a 32GB virtual machine with device passthrough is extremely slow,
taking almost an hour.

Long term GUP cannot allocate memory from CMA area, so a maximum of 16 GB
of no-CMA memory on a NUMA node can be used as virtual machine memory.
There is 16GB of free CMA memory on a NUMA node, which is sufficient to
pass the order-0 watermark check, causing the __compaction_suitable()
function to consistently return true.

For costly allocations, if the __compaction_suitable() function always
returns true, it causes the __alloc_pages_slowpath() function to fail to
exit at the appropriate point.  This prevents timely fallback to
allocating memory on other nodes, ultimately resulting in excessively long
virtual machine startup times.

Call trace:
__alloc_pages_slowpath
    if (compact_result == COMPACT_SKIPPED ||
        compact_result == COMPACT_DEFERRED)
        goto nopage; // should exit __alloc_pages_slowpath() from here

We could use the real unmovable allocation context to have
__zone_watermark_unusable_free() subtract CMA pages, and thus we won't
pass the order-0 check anymore once the non-CMA part is exhausted.  There
is some risk that in some different scenario the compaction could in fact
migrate pages from the exhausted non-CMA part of the zone to the CMA part
and succeed, and we'll skip it instead.  But only __GFP_NORETRY
allocations should be affected in the immediate "goto nopage" when
compaction is skipped, others will attempt with DEF_COMPACT_PRIORITY
anyway and won't fail without trying to compact-migrate the non-CMA
pageblocks into CMA pageblocks first, so it should be fine.

After this fix, it only takes a few tens of seconds to start a 32GB
virtual machine with device passthrough functionality.

Link: https://lore.kernel.org/lkml/1736335854-548-1-git-send-email-yangge1116@126.com/
Link: https://lkml.kernel.org/r/1737788037-8439-1-git-send-email-yangge1116@126.com
Signed-off-by: yangge <yangge1116@126.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Barry Song <21cnbao@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agokernel: be more careful about dup_mmap() failures and uprobe registering
Liam R. Howlett [Mon, 27 Jan 2025 17:02:21 +0000 (12:02 -0500)] 
kernel: be more careful about dup_mmap() failures and uprobe registering

If a memory allocation fails during dup_mmap(), the maple tree can be left
in an unsafe state for other iterators besides the exit path.  All the
locks are dropped before the exit_mmap() call (in mm/mmap.c), but the
incomplete mm_struct can be reached through (at least) the rmap finding
the vmas which have a pointer back to the mm_struct.

Up to this point, there have been no issues with being able to find an
mm_struct that was only partially initialised.  Syzbot was able to make
the incomplete mm_struct fail with recent forking changes, so it has been
proven unsafe to use the mm_struct that hasn't been initialised, as
referenced in the link below.

Although 8ac662f5da19f ("fork: avoid inappropriate uprobe access to
invalid mm") fixed the uprobe access, it does not completely remove the
race.

This patch sets the MMF_OOM_SKIP to avoid the iteration of the vmas on the
oom side (even though this is extremely unlikely to be selected as an oom
victim in the race window), and sets MMF_UNSTABLE to avoid other potential
users from using a partially initialised mm_struct.

When registering vmas for uprobe, skip the vmas in an mm that is marked
unstable.  Modifying a vma in an unstable mm may cause issues if the mm
isn't fully initialised.

Link: https://lore.kernel.org/all/6756d273.050a0220.2477f.003d.GAE@google.com/
Link: https://lkml.kernel.org/r/20250127170221.1761366-1-Liam.Howlett@oracle.com
Fixes: d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agomm/fake-numa: handle cases with no SRAT info
Bruno Faccini [Mon, 27 Jan 2025 17:16:23 +0000 (09:16 -0800)] 
mm/fake-numa: handle cases with no SRAT info

Handle more gracefully cases where no SRAT information is available, like
in VMs with no Numa support, and allow fake-numa configuration to complete
successfully in these cases

Link: https://lkml.kernel.org/r/20250127171623.1523171-1-bfaccini@nvidia.com
Fixes: 63db8170bf34 (“mm/fake-numa: allow later numa node hotplug”)
Signed-off-by: Bruno Faccini <bfaccini@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hyeonggon Yoo <hyeonggon.yoo@sk.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Len Brown <lenb@kernel.org>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agomm: kmemleak: fix upper boundary check for physical address objects
Catalin Marinas [Mon, 27 Jan 2025 18:42:33 +0000 (18:42 +0000)] 
mm: kmemleak: fix upper boundary check for physical address objects

Memblock allocations are registered by kmemleak separately, based on their
physical address.  During the scanning stage, it checks whether an object
is within the min_low_pfn and max_low_pfn boundaries and ignores it
otherwise.

With the recent addition of __percpu pointer leak detection (commit
6c99d4eb7c5e ("kmemleak: enable tracking for percpu pointers")), kmemleak
started reporting leaks in setup_zone_pageset() and
setup_per_cpu_pageset().  These were caused by the node_data[0] object
(initialised in alloc_node_data()) ending on the PFN_PHYS(max_low_pfn)
boundary.  The non-strict upper boundary check introduced by commit
84c326299191 ("mm: kmemleak: check physical address when scan") causes the
pg_data_t object to be ignored (not scanned) and the __percpu pointers it
contains to be reported as leaks.

Make the max_low_pfn upper boundary check strict when deciding whether to
ignore a physical address object and not scan it.

Link: https://lkml.kernel.org/r/20250127184233.2974311-1-catalin.marinas@arm.com
Fixes: 84c326299191 ("mm: kmemleak: check physical address when scan")
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Cc: Patrick Wang <patrick.wang.shcn@gmail.com>
Cc: <stable@vger.kernel.org> [6.0.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agomailmap: add an entry for Hamza Mahfooz
Hamza Mahfooz [Mon, 20 Jan 2025 20:56:59 +0000 (15:56 -0500)] 
mailmap: add an entry for Hamza Mahfooz

Map my previous work email to my current one.

Link: https://lkml.kernel.org/r/20250120205659.139027-1-hamzamahfooz@linux.microsoft.com
Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Hans verkuil <hverkuil@xs4all.nl>
Cc: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agoMAINTAINERS: mailmap: update Yosry Ahmed's email address
Yosry Ahmed [Thu, 23 Jan 2025 23:13:44 +0000 (23:13 +0000)] 
MAINTAINERS: mailmap: update Yosry Ahmed's email address

Moving to a linux.dev email address.

Link: https://lkml.kernel.org/r/20250123231344.817358-1-yosry.ahmed@linux.dev
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agoscripts/gdb: fix aarch64 userspace detection in get_current_task
Jan Kiszka [Fri, 10 Jan 2025 10:36:33 +0000 (11:36 +0100)] 
scripts/gdb: fix aarch64 userspace detection in get_current_task

At least recent gdb releases (seen with 14.2) return SP_EL0 as signed long
which lets the right-shift always return 0.

Link: https://lkml.kernel.org/r/dcd2fabc-9131-4b48-8419-6444e2d67454@siemens.com
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agomm/vmscan: accumulate nr_demoted for accurate demotion statistics
Li Zhijian [Fri, 10 Jan 2025 12:21:32 +0000 (20:21 +0800)] 
mm/vmscan: accumulate nr_demoted for accurate demotion statistics

In shrink_folio_list(), demote_folio_list() can be called 2 times.
Currently stat->nr_demoted will only store the last nr_demoted( the later
nr_demoted is always zero, the former nr_demoted will get lost), as a
result number of demoted pages is not accurate.

Accumulate the nr_demoted count across multiple calls to
demote_folio_list(), ensuring accurate reporting of demotion statistics.

[lizhijian@fujitsu.com: introduce local nr_demoted to fix nr_reclaimed double counting]
Link: https://lkml.kernel.org/r/20250111015253.425693-1-lizhijian@fujitsu.com
Link: https://lkml.kernel.org/r/20250110122133.423481-1-lizhijian@fujitsu.com
Fixes: f77f0c751478 ("mm,memcg: provide per-cgroup counters for NUMA balancing operations")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Acked-by: Kaiyang Zhao <kaiyang2@cs.cmu.edu>
Tested-by: Donet Tom <donettom@linux.ibm.com>
Reviewed-by: Donet Tom <donettom@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agoocfs2: fix incorrect CPU endianness conversion causing mount failure
Heming Zhao [Tue, 21 Jan 2025 11:22:03 +0000 (19:22 +0800)] 
ocfs2: fix incorrect CPU endianness conversion causing mount failure

Commit 23aab037106d ("ocfs2: fix UBSAN warning in ocfs2_verify_volume()")
introduced a regression bug.  The blksz_bits value is already converted to
CPU endian in the previous code; therefore, the code shouldn't use
le32_to_cpu() anymore.

Link: https://lkml.kernel.org/r/20250121112204.12834-1-heming.zhao@suse.com
Fixes: 23aab037106d ("ocfs2: fix UBSAN warning in ocfs2_verify_volume()")
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agomm/zsmalloc: add __maybe_unused attribute for is_first_zpdesc()
Hyeonggon Yoo [Mon, 27 Jan 2025 23:16:31 +0000 (08:16 +0900)] 
mm/zsmalloc: add __maybe_unused attribute for is_first_zpdesc()

Commit c1b3bb73d55e ("mm/zsmalloc: use zpdesc in
trylock_zspage()/lock_zspage()") introduces is_first_zpdesc() function.
However, the function is only used when CONFIG_DEBUG_VM=y.

When building with LLVM=1 and W=1 option, the following warning is
generated:
  $ make -j12 W=1 LLVM=1 mm/zsmalloc.o
  mm/zsmalloc.c:455:20: error: function 'is_first_zpdesc' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
    455 | static inline bool is_first_zpdesc(struct zpdesc *zpdesc)
        |                    ^~~~~~~~~~~~~~~
  1 error generated.

Fix the warning by adding __maybe_unused attribute to the function.
No functional change intended.

Link: https://lkml.kernel.org/r/20250127231631.4363-1-42.hyeyoo@gmail.com
Fixes: c1b3bb73d55e ("mm/zsmalloc: use zpdesc in trylock_zspage()/lock_zspage()")
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501240958.4ILzuBrH-lkp@intel.com/
Cc: Alex Shi <alexs@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agomm/vmscan: fix hard LOCKUP in function isolate_lru_folios
liuye [Tue, 19 Nov 2024 06:08:42 +0000 (14:08 +0800)] 
mm/vmscan: fix hard LOCKUP in function isolate_lru_folios

This fixes the following hard lockup in isolate_lru_folios() during memory
reclaim.  If the LRU mostly contains ineligible folios this may trigger
watchdog.

watchdog: Watchdog detected hard LOCKUP on cpu 173
RIP: 0010:native_queued_spin_lock_slowpath+0x255/0x2a0
Call Trace:
_raw_spin_lock_irqsave+0x31/0x40
folio_lruvec_lock_irqsave+0x5f/0x90
folio_batch_move_lru+0x91/0x150
lru_add_drain_per_cpu+0x1c/0x40
process_one_work+0x17d/0x350
worker_thread+0x27b/0x3a0
kthread+0xe8/0x120
ret_from_fork+0x34/0x50
ret_from_fork_asm+0x1b/0x30

lruvec->lru_lock owner:

PID: 2865     TASK: ffff888139214d40  CPU: 40   COMMAND: "kswapd0"
 #0 [fffffe0000945e60] crash_nmi_callback at ffffffffa567a555
 #1 [fffffe0000945e68] nmi_handle at ffffffffa563b171
 #2 [fffffe0000945eb0] default_do_nmi at ffffffffa6575920
 #3 [fffffe0000945ed0] exc_nmi at ffffffffa6575af4
 #4 [fffffe0000945ef0] end_repeat_nmi at ffffffffa6601dde
    [exception RIP: isolate_lru_folios+403]
    RIP: ffffffffa597df53  RSP: ffffc90006fb7c28  RFLAGS: 00000002
    RAX: 0000000000000001  RBX: ffffc90006fb7c60  RCX: ffffea04a2196f88
    RDX: ffffc90006fb7c60  RSI: ffffc90006fb7c60  RDI: ffffea04a2197048
    RBP: ffff88812cbd3010   R8: ffffea04a2197008   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000000001  R12: ffffea04a2197008
    R13: ffffea04a2197048  R14: ffffc90006fb7de8  R15: 0000000003e3e937
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    <NMI exception stack>
 #5 [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53
 #6 [ffffc90006fb7cf8] shrink_active_list at ffffffffa597f788
 #7 [ffffc90006fb7da8] balance_pgdat at ffffffffa5986db0
 #8 [ffffc90006fb7ec0] kswapd at ffffffffa5987354
 #9 [ffffc90006fb7ef8] kthread at ffffffffa5748238
crash>

Scenario:
User processe are requesting a large amount of memory and keep page active.
Then a module continuously requests memory from ZONE_DMA32 area.
Memory reclaim will be triggered due to ZONE_DMA32 watermark alarm reached.
However pages in the LRU(active_anon) list are mostly from
the ZONE_NORMAL area.

Reproduce:
Terminal 1: Construct to continuously increase pages active(anon).
mkdir /tmp/memory
mount -t tmpfs -o size=1024000M tmpfs /tmp/memory
dd if=/dev/zero of=/tmp/memory/block bs=4M
tail /tmp/memory/block

Terminal 2:
vmstat -a 1
active will increase.
procs ---memory--- ---swap-- ---io---- -system-- ---cpu--- ...
 r  b   swpd   free  inact active   si   so    bi    bo
 1  0   0 1445623076 45898836 83646008    0    0     0
 1  0   0 1445623076 43450228 86094616    0    0     0
 1  0   0 1445623076 41003480 88541364    0    0     0
 1  0   0 1445623076 38557088 90987756    0    0     0
 1  0   0 1445623076 36109688 93435156    0    0     0
 1  0   0 1445619552 33663256 95881632    0    0     0
 1  0   0 1445619804 31217140 98327792    0    0     0
 1  0   0 1445619804 28769988 100774944    0    0     0
 1  0   0 1445619804 26322348 103222584    0    0     0
 1  0   0 1445619804 23875592 105669340    0    0     0

cat /proc/meminfo | head
Active(anon) increase.
MemTotal:       1579941036 kB
MemFree:        1445618500 kB
MemAvailable:   1453013224 kB
Buffers:            6516 kB
Cached:         128653956 kB
SwapCached:            0 kB
Active:         118110812 kB
Inactive:       11436620 kB
Active(anon):   115345744 kB
Inactive(anon):   945292 kB

When the Active(anon) is 115345744 kB, insmod module triggers
the ZONE_DMA32 watermark.

perf record -e vmscan:mm_vmscan_lru_isolate -aR
perf script
isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=2
nr_skipped=2 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=0
nr_skipped=0 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=28835844
nr_skipped=28835844 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=28835844
nr_skipped=28835844 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=29
nr_skipped=29 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=0
nr_skipped=0 nr_taken=0 lru=active_anon

See nr_scanned=28835844.
28835844 * 4k = 115343376KB approximately equal to 115345744 kB.

If increase Active(anon) to 1000G then insmod module triggers
the ZONE_DMA32 watermark. hard lockup will occur.

In my device nr_scanned = 0000000003e3e937 when hard lockup.
Convert to memory size 0x0000000003e3e937 * 4KB = 261072092 KB.

   [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53
    ffffc90006fb7c300000000000000020 0000000000000000
    ffffc90006fb7c40ffffc90006fb7d40 ffff88812cbd3000
    ffffc90006fb7c50ffffc90006fb7d30 0000000106fb7de8
    ffffc90006fb7c60ffffea04a2197008 ffffea0006ed4a48
    ffffc90006fb7c700000000000000000 0000000000000000
    ffffc90006fb7c800000000000000000 0000000000000000
    ffffc90006fb7c900000000000000000 0000000000000000
    ffffc90006fb7ca00000000000000000 0000000003e3e937
    ffffc90006fb7cb00000000000000000 0000000000000000
    ffffc90006fb7cc08d7c0b56b7874b00 ffff88812cbd3000

About the Fixes:
Why did it take eight years to be discovered?

The problem requires the following conditions to occur:
1. The device memory should be large enough.
2. Pages in the LRU(active_anon) list are mostly from the ZONE_NORMAL area.
3. The memory in ZONE_DMA32 needs to reach the watermark.

If the memory is not large enough, or if the usage design of ZONE_DMA32
area memory is reasonable, this problem is difficult to detect.

notes:
The problem is most likely to occur in ZONE_DMA32 and ZONE_NORMAL,
but other suitable scenarios may also trigger the problem.

Link: https://lkml.kernel.org/r/20241119060842.274072-1-liuye@kylinos.cn
Fixes: b2e18757f2c9 ("mm, vmscan: begin reclaiming pages on a per-node basis")
Signed-off-by: liuye <liuye@kylinos.cn>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agosh: boards: Use imply to enable hardware with complex dependencies
Geert Uytterhoeven [Fri, 24 Jan 2025 08:39:19 +0000 (09:39 +0100)] 
sh: boards: Use imply to enable hardware with complex dependencies

If CONFIG_I2C=n:

    WARNING: unmet direct dependencies detected for SND_SOC_AK4642
      Depends on [n]: SOUND [=y] && SND [=y] && SND_SOC [=y] && I2C [=n]
      Selected by [y]:
      - SH_7724_SOLUTION_ENGINE [=y] && CPU_SUBTYPE_SH7724 [=y] && SND_SIMPLE_CARD [=y]

    WARNING: unmet direct dependencies detected for SND_SOC_DA7210
      Depends on [n]: SOUND [=y] && SND [=y] && SND_SOC [=y] && SND_SOC_I2C_AND_SPI [=n]
      Selected by [y]:
      - SH_ECOVEC [=y] && CPU_SUBTYPE_SH7724 [=y] && SND_SIMPLE_CARD [=y]

Fix this by replacing select by imply, instead of adding a dependency on
I2C.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501240836.OvXqmANX-lkp@intel.com/
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
4 months agosh: Migrate to the generic rule for built-in DTB
Masahiro Yamada [Sun, 22 Dec 2024 00:32:07 +0000 (09:32 +0900)] 
sh: Migrate to the generic rule for built-in DTB

Commit 654102df2ac2 ("kbuild: add generic support for built-in
boot DTBs") introduced generic support for built-in DTBs.

Select GENERIC_BUILTIN_DTB when built-in DTB support is enabled.

To keep consistency across architectures, this commit also renames
CONFIG_USE_BUILTIN_DTB to CONFIG_BUILTIN_DTB, and
CONFIG_BUILTIN_DTB_SOURCE to CONFIG_BUILTIN_DTB_NAME.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
4 months agosh: irq: Use seq_put_decimal_ull_width() for decimal values
David Wang [Sat, 30 Nov 2024 13:49:09 +0000 (21:49 +0800)] 
sh: irq: Use seq_put_decimal_ull_width() for decimal values

On a system with n CPUs and m interrupts, there will be n*m decimal
values yielded via seq_printf(.."%10u "..) which has significant costs
parsing format string and is less efficient than seq_put_decimal_ull_width().
Stress reading /proc/interrupts indicates ~30% performance improvement with
this patch.

Signed-off-by: David Wang <00107082@163.com>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
4 months agoMerge tag 'for-linus-hexagon-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 1 Feb 2025 04:11:24 +0000 (20:11 -0800)] 
Merge tag 'for-linus-hexagon-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/bcain/linux

Pull hexagon updates from Brian Cain:

 - Move kernel prototypes out of uapi header to internal header

 - Fix to address an unbalanced spinlock

 - Miscellaneous patches to fix static checks

 - Update bcain@quicinc.com->brian.cain@oss.qualcomm.com

* tag 'for-linus-hexagon-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/bcain/linux:
  MAINTAINERS: Update my email address
  hexagon: Fix unbalanced spinlock in die()
  hexagon: Fix warning comparing pointer to 0
  hexagon: Move kernel prototypes out of uapi/asm/setup.h header
  hexagon: time: Remove redundant null check for resource
  hexagon: fix using plain integer as NULL pointer warning in cmpxchg

4 months agoRemove stale generated 'genheaders' file
Linus Torvalds [Sat, 1 Feb 2025 03:49:17 +0000 (19:49 -0800)] 
Remove stale generated 'genheaders' file

This bogus stale file was added in commit 101971298be2 ("riscv: add a
warning when physical memory address overflows").  It's the old location
for what is now 'security/selinux/genheaders'.

It looks like it got incorrectly committed back when that file was in
the old location, and then rebasing kept the bogus file alive.

Reported-by: Eric Biggers <ebiggers@kernel.org>
Link: https://lore.kernel.org/linux-riscv/20250201020003.GA77370@sol.localdomain/
Fixes: 101971298be2 ("riscv: add a warning when physical memory address overflows")
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agoMerge tag 'AT_EXECVE_CHECK-v6.14-rc1-fix1' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sat, 1 Feb 2025 01:12:31 +0000 (17:12 -0800)] 
Merge tag 'AT_EXECVE_CHECK-v6.14-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull AT_EXECVE_CHECK selftest fix from Kees Cook:
 "Fixes the AT_EXECVE_CHECK selftests which didn't run on old versions
  of glibc"

* tag 'AT_EXECVE_CHECK-v6.14-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  selftests: Handle old glibc without execveat(2)

4 months agoMerge tag 'hardening-v6.14-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 1 Feb 2025 01:10:26 +0000 (17:10 -0800)] 
Merge tag 'hardening-v6.14-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening fixes from Kees Cook:
 "This is a fix for the soon to be released GCC 15 which has regressed
  its initialization of unions when performing explicit initialization
  (i.e. a general problem, not specifically a hardening problem; we're
  just carrying the fix).

  Details in the final patch, Acked by Masahiro, with updated selftests
  to validate the fix"

* tag 'hardening-v6.14-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  kbuild: Use -fzero-init-padding-bits=all
  stackinit: Add union initialization to selftests
  stackinit: Add old-style zero-init syntax to struct tests

4 months agoMerge tag 'drm-next-2025-02-01' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Fri, 31 Jan 2025 23:45:41 +0000 (15:45 -0800)] 
Merge tag 'drm-next-2025-02-01' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "This is only AMD fixes:

  amdgpu:
   - GC 12 fix
   - Aldebaran fix
   - DCN 3.5 fix
   - Freesync fix

  amdkfd:
   - Per queue reset fix
   - MES fix"

* tag 'drm-next-2025-02-01' of https://gitlab.freedesktop.org/drm/kernel:
  drm/amd/display: restore invalid MSA timing check for freesync
  drm/amdkfd: only flush the validate MES contex
  drm/amd/display: Correct register address in dcn35
  drm/amd/pm: Mark MM activity as unsupported
  drm/amd/amdgpu: change the config of cgcg on gfx12
  drm/amdkfd: Block per-queue reset when halt_if_hws_hang=1

4 months agoMerge tag 'pci-v6.14-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Linus Torvalds [Fri, 31 Jan 2025 23:39:50 +0000 (15:39 -0800)] 
Merge tag 'pci-v6.14-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci fix from Bjorn Helgaas:

 - Save the original INTX_DISABLE bit at the first pcim_intx() call and
   restore that at devres cleanup instead of restoring the opposite of
   the most recent enable/disable pcim_intx() argument, which was wrong
   when a driver called pcim_intx() multiple times or with the already
   enabled state (Takashi Iwai)

* tag 'pci-v6.14-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI: Restore original INTX_DISABLE bit by pcim_intx()

4 months agoMerge tag 'riscv-for-linus-6.14-mw1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 31 Jan 2025 23:13:25 +0000 (15:13 -0800)] 
Merge tag 'riscv-for-linus-6.14-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - The PH1520 pinctrl and dwmac drivers are enabeled in defconfig

 - A redundant AQRL barrier has been removed from the futex cmpxchg
   implementation

 - Support for the T-Head vector extensions, which includes exposing
   these extensions to userspace on systems that implement them

 - Some more page table information is now printed on die() and systems
   that cause PA overflows

* tag 'riscv-for-linus-6.14-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: add a warning when physical memory address overflows
  riscv/mm/fault: add show_pte() before die()
  riscv: Add ghostwrite vulnerability
  selftests: riscv: Support xtheadvector in vector tests
  selftests: riscv: Fix vector tests
  riscv: hwprobe: Document thead vendor extensions and xtheadvector extension
  riscv: hwprobe: Add thead vendor extension probing
  riscv: vector: Support xtheadvector save/restore
  riscv: Add xtheadvector instruction definitions
  riscv: csr: Add CSR encodings for CSR_VXRM/CSR_VXSAT
  RISC-V: define the elements of the VCSR vector CSR
  riscv: vector: Use vlenb from DT for thead
  riscv: Add thead and xtheadvector as a vendor extension
  riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree
  dt-bindings: cpus: add a thead vlen register length property
  dt-bindings: riscv: Add xtheadvector ISA extension description
  RISC-V: Mark riscv_v_init() as __init
  riscv: defconfig: drop RT_GROUP_SCHED=y
  riscv/futex: Optimize atomic cmpxchg
  riscv: defconfig: enable pinctrl and dwmac support for TH1520

4 months agoRevert "media: uvcvideo: Require entities to have a non-zero unique ID"
Thadeu Lima de Souza Cascardo [Tue, 14 Jan 2025 20:00:45 +0000 (17:00 -0300)] 
Revert "media: uvcvideo: Require entities to have a non-zero unique ID"

This reverts commit 3dd075fe8ebbc6fcbf998f81a75b8c4b159a6195.

Tomasz has reported that his device, Generalplus Technology Inc. 808 Camera,
with ID 1b3f:2002, stopped being detected:

$ ls -l /dev/video*
zsh: no matches found: /dev/video*
[    7.230599] usb 3-2: Found multiple Units with ID 5

This particular device is non-compliant, having both the Output Terminal
and Processing Unit with ID 5. uvc_scan_fallback, though, is able to build
a chain. However, when media elements are added and uvc_mc_create_links
call uvc_entity_by_id, it will get the incorrect entity,
media_create_pad_link will WARN, and it will fail to register the entities.

In order to reinstate support for such devices in a timely fashion,
reverting the fix for these warnings is appropriate. A proper fix that
considers the existence of such non-compliant devices will be submitted in
a later development cycle.

Reported-by: Tomasz Sikora <sikora.tomus@gmail.com>
Fixes: 3dd075fe8ebb ("media: uvcvideo: Require entities to have a non-zero unique ID")
Cc: stable@vger.kernel.org
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ricardo Ribalda <ribalda@chromium.org>
Link: https://lore.kernel.org/r/20250114200045.1401644-1-cascardo@igalia.com
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
4 months agoMerge tag 'kbuild-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy...
Linus Torvalds [Fri, 31 Jan 2025 20:07:07 +0000 (12:07 -0800)] 
Merge tag 'kbuild-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Support multiple hook locations for maint scripts of Debian package

 - Remove 'cpio' from the build tool requirement

 - Introduce gendwarfksyms tool, which computes CRCs for export symbols
   based on the DWARF information

 - Support CONFIG_MODVERSIONS for Rust

 - Resolve all conflicts in the genksyms parser

 - Fix several syntax errors in genksyms

* tag 'kbuild-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (64 commits)
  kbuild: fix Clang LTO with CONFIG_OBJTOOL=n
  kbuild: Strip runtime const RELA sections correctly
  kconfig: fix memory leak in sym_warn_unmet_dep()
  kconfig: fix file name in warnings when loading KCONFIG_DEFCONFIG_LIST
  genksyms: fix syntax error for attribute before init-declarator
  genksyms: fix syntax error for builtin (u)int*x*_t types
  genksyms: fix syntax error for attribute after 'union'
  genksyms: fix syntax error for attribute after 'struct'
  genksyms: fix syntax error for attribute after abstact_declarator
  genksyms: fix syntax error for attribute before nested_declarator
  genksyms: fix syntax error for attribute before abstract_declarator
  genksyms: decouple ATTRIBUTE_PHRASE from type-qualifier
  genksyms: record attributes consistently for init-declarator
  genksyms: restrict direct-declarator to take one parameter-type-list
  genksyms: restrict direct-abstract-declarator to take one parameter-type-list
  genksyms: remove Makefile hack
  genksyms: fix last 3 shift/reduce conflicts
  genksyms: fix 6 shift/reduce conflicts and 5 reduce/reduce conflicts
  genksyms: reduce type_qualifier directly to decl_specifier
  genksyms: rename cvar_qualifier to type_qualifier
  ...

4 months agoMerge tag 'block-6.14-20250131' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 31 Jan 2025 19:49:30 +0000 (11:49 -0800)] 
Merge tag 'block-6.14-20250131' of git://git.kernel.dk/linux

Pull more block updates from Jens Axboe:

 - MD pull request via Song:
      - Fix a md-cluster regression introduced

 - More sysfs race fixes

 - Mark anything inside queue freezing as not being able to do IO for
   memory allocations

 - Fix for a regression introduced in loop in this merge window

 - Fix for a regression in queue mapping setups introduced in this merge
   window

 - Fix for the block dio fops attempting an iov_iter revert upton
   getting -EIOCBQUEUED on the read side. This one is going to stable as
   well

* tag 'block-6.14-20250131' of git://git.kernel.dk/linux:
  block: force noio scope in blk_mq_freeze_queue
  block: fix nr_hw_queue update racing with disk addition/removal
  block: get rid of request queue ->sysfs_dir_lock
  loop: don't clear LO_FLAGS_PARTSCAN on LOOP_SET_STATUS{,64}
  md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetime
  blk-mq: create correct map for fallback case
  block: don't revert iter for -EIOCBQUEUED

4 months agoMerge tag 'io_uring-6.14-20250131' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 31 Jan 2025 19:29:23 +0000 (11:29 -0800)] 
Merge tag 'io_uring-6.14-20250131' of git://git.kernel.dk/linux

Pull more io_uring updates from Jens Axboe:

 - Series cleaning up the alloc cache changes from this merge window,
   and then another series on top making it better yet.

   This also solves an issue with KASAN_EXTRA_INFO, by making io_uring
   resilient to KASAN using parts of the freed struct for storage

 - Cleanups and simplications to buffer cloning and io resource node
   management

 - Fix an issue introduced in this merge window where READ/WRITE_ONCE
   was used on an atomic_t, which made some archs complain

 - Fix for an errant connect retry when the socket has been shut down

 - Fix for multishot and provided buffers

* tag 'io_uring-6.14-20250131' of git://git.kernel.dk/linux:
  io_uring/net: don't retry connect operation on EPOLLERR
  io_uring/rw: simplify io_rw_recycle()
  io_uring: remove !KASAN guards from cache free
  io_uring/net: extract io_send_select_buffer()
  io_uring/net: clean io_msg_copy_hdr()
  io_uring/net: make io_net_vec_assign() return void
  io_uring: add alloc_cache.c
  io_uring: dont ifdef io_alloc_cache_kasan()
  io_uring: include all deps for alloc_cache.h
  io_uring: fix multishots with selected buffers
  io_uring/register: use atomic_read/write for sq_flags migration
  io_uring/alloc_cache: get rid of _nocache() helper
  io_uring: get rid of alloc cache init_once handling
  io_uring/uring_cmd: cleanup struct io_uring_cmd_data layout
  io_uring/uring_cmd: use cached cmd_op in io_uring_cmd_sock()
  io_uring/msg_ring: don't leave potentially dangling ->tctx pointer
  io_uring/rsrc: Move lockdep assert from io_free_rsrc_node() to caller
  io_uring/rsrc: remove unused parameter ctx for io_rsrc_node_alloc()
  io_uring: clean up io_uring_register_get_file()
  io_uring/rsrc: Simplify buffer cloning by locking both rings

4 months agokbuild: fix Clang LTO with CONFIG_OBJTOOL=n
Masahiro Yamada [Fri, 31 Jan 2025 14:04:01 +0000 (23:04 +0900)] 
kbuild: fix Clang LTO with CONFIG_OBJTOOL=n

Since commit bede169618c6 ("kbuild: enable objtool for *.mod.o and
additional kernel objects"), Clang LTO builds do not perform any
optimizations when CONFIG_OBJTOOL is disabled (e.g., for ARCH=arm64).
This is because every LLVM bitcode file is immediately converted to
ELF format before the object files are linked together.

This commit fixes the breakage.

Fixes: bede169618c6 ("kbuild: enable objtool for *.mod.o and additional kernel objects")
Reported-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Yonghong Song <yonghong.song@linux.dev>
4 months agokbuild: Strip runtime const RELA sections correctly
Ard Biesheuvel [Mon, 13 Jan 2025 15:53:07 +0000 (16:53 +0100)] 
kbuild: Strip runtime const RELA sections correctly

Due to the fact that runtime const ELF sections are named without a
leading period or double underscore, the RSTRIP logic that removes the
static RELA sections from vmlinux fails to identify them. This results
in a situation like below, where some sections that were supposed to get
removed are left behind.

  [Nr] Name                              Type            Address          Off     Size   ES Flg Lk Inf Al

  [58] runtime_shift_d_hash_shift        PROGBITS        ffffffff83500f50 2900f50 000014 00   A  0   0  1
  [59] .relaruntime_shift_d_hash_shift   RELA            0000000000000000 55b6f00 000078 18   I 70  58  8
  [60] runtime_ptr_dentry_hashtable      PROGBITS        ffffffff83500f68 2900f68 000014 00   A  0   0  1
  [61] .relaruntime_ptr_dentry_hashtable RELA            0000000000000000 55b6f78 000078 18   I 70  60  8
  [62] runtime_ptr_USER_PTR_MAX          PROGBITS        ffffffff83500f80 2900f80 000238 00   A  0   0  1
  [63] .relaruntime_ptr_USER_PTR_MAX     RELA            0000000000000000 55b6ff0 000d50 18   I 70  62  8

So tweak the match expression to strip all sections starting with .rel.
While at it, consolidate the logic used by RISC-V, s390 and x86 into a
single shared Makefile library command.

Link: https://lore.kernel.org/all/CAHk-=wjk3ynjomNvFN8jf9A1k=qSc=JFF591W00uXj-qqNUxPQ@mail.gmail.com/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 months agoMerge tag 'ata-6.14-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/libat...
Linus Torvalds [Fri, 31 Jan 2025 19:07:56 +0000 (11:07 -0800)] 
Merge tag 'ata-6.14-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull more ata updates from Niklas Cassel:

 - Add ATA_QUIRK_NOLPM for Samsung SSD 870 QVO drives (Daniel)

 - Ensure that PIO transfers using libata-sff cannot write outside the
   allocated buffer (me)

* tag 'ata-6.14-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: libata-sff: Ensure that we cannot write outside the allocated buffer
  ata: libata-core: Add ATA_QUIRK_NOLPM for Samsung SSD 870 QVO drives

4 months agocifs: Fix parsing native symlinks directory/file type
Pali Rohár [Mon, 23 Sep 2024 20:29:30 +0000 (22:29 +0200)] 
cifs: Fix parsing native symlinks directory/file type

As SMB protocol distinguish between symlink to directory and symlink to
file, add some mechanism to disallow resolving incompatible types.

When SMB symlink is of the directory type, ensure that its target path ends
with slash. This forces Linux to not allow resolving such symlink to file.

And when SMB symlink is of the file type and its target path ends with
slash then returns an error as such symlink is unresolvable. Such symlink
always points to invalid location as file cannot end with slash.

As POSIX server does not distinguish between symlinks to file and symlink
directory, do not apply this change for symlinks from POSIX SMB server. For
POSIX SMB servers, this change does nothing.

This mimics Windows behavior of native SMB symlinks.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 months agocifs: update internal version number
Steve French [Mon, 27 Jan 2025 23:45:57 +0000 (17:45 -0600)] 
cifs: update internal version number

To 2.53

Signed-off-by: Steve French <stfrench@microsoft.com>
4 months agocifs: Add support for creating WSL-style symlinks
Pali Rohár [Sat, 28 Sep 2024 11:24:26 +0000 (13:24 +0200)] 
cifs: Add support for creating WSL-style symlinks

This change implements support for creating new symlink in WSL-style by
Linux cifs client when -o reparse=wsl mount option is specified. WSL-style
symlink uses reparse point with tag IO_REPARSE_TAG_LX_SYMLINK and symlink
target location is stored in reparse buffer in UTF-8 encoding prefixed by
32-bit flags. Flags bits are unknown, but it was observed that WSL always
sets flags to value 0x02000000. Do same in Linux cifs client.

New symlinks would be created in WSL-style only in case the mount option
-o reparse=wsl is specified, which is not by default. So default CIFS
mounts are not affected by this change.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 months agosmb3: add support for IAKerb
Steve French [Tue, 28 Jan 2025 07:04:23 +0000 (01:04 -0600)] 
smb3: add support for IAKerb

There are now more servers which advertise support for IAKerb (passthrough
Kerberos authentication via proxy).  IAKerb is a public extension industry
standard Kerberos protocol that allows a client without line-of-sight
to a Domain Controller to authenticate. There can be cases where we
would fail to mount if the server only advertises the OID for IAKerb
in SPNEGO/GSSAPI.  Add code to allow us to still upcall to userspace
in these cases to obtain the Kerberos ticket.

Signed-off-by: Steve French <stfrench@microsoft.com>
4 months agocifs: Fix struct FILE_ALL_INFO
Pali Rohár [Sun, 29 Dec 2024 14:31:05 +0000 (15:31 +0100)] 
cifs: Fix struct FILE_ALL_INFO

struct FILE_ALL_INFO for level 263 (0x107) used by QPathInfo does not have
any IndexNumber, AccessFlags, IndexNumber1, CurrentByteOffset, Mode or
AlignmentRequirement members. So remove all of them.

Also adjust code in move_cifs_info_to_smb2() function which converts struct
FILE_ALL_INFO to struct smb2_file_all_info.

Fixed content of struct FILE_ALL_INFO was verified that is correct against:
* [MS-CIFS] section 2.2.8.3.10 SMB_QUERY_FILE_ALL_INFO
* Samba server implementation of trans2 query file/path for level 263
* Packet structure tests against Windows SMB servers

This change fixes CIFSSMBQFileInfo() and CIFSSMBQPathInfo() functions which
directly copy received FILE_ALL_INFO network buffers into kernel structures
of FILE_ALL_INFO type.

struct FILE_ALL_INFO is the response structure returned by the SMB server.
So the incorrect definition of this structure can lead to returning bogus
information in stat() call.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 months agocifs: Add support for creating NFS-style symlinks
Pali Rohár [Fri, 13 Sep 2024 09:42:59 +0000 (11:42 +0200)] 
cifs: Add support for creating NFS-style symlinks

CIFS client is currently able to parse NFS-style symlinks, but is not able
to create them. This functionality is useful when the mounted SMB share is
used also by Windows NFS server (on Windows Server 2012 or new). It allows
interop of symlinks between SMB share mounted by Linux CIFS client and same
export from Windows NFS server mounted by some NFS client.

New symlinks would be created in NFS-style only in case the mount option
-o reparse=nfs is specified, which is not by default. So default CIFS
mounts are not affected by this change.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 months agocifs: Add support for creating native Windows sockets
Pali Rohár [Mon, 16 Sep 2024 18:21:52 +0000 (20:21 +0200)] 
cifs: Add support for creating native Windows sockets

Native Windows sockets created by WinSock on Windows 10 April 2018 Update
(version 1803) or Windows Server 2019 (version 1809) or later versions is
reparse point with IO_REPARSE_TAG_AF_UNIX tag, with empty reparse point
data buffer and without any EAs.

Create AF_UNIX sockets in this native format if -o nonativesocket was not
specified.

This change makes AF_UNIX sockets created by Linux CIFS client compatible
with AF_UNIX sockets created by Windows applications on NTFS volumes.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 months agoMerge tag 'x86-mm-2025-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds [Fri, 31 Jan 2025 18:39:07 +0000 (10:39 -0800)] 
Merge tag 'x86-mm-2025-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 mm updates from Ingo Molnar:

 - The biggest changes are the TLB flushing scalability optimizations,
   to update the mm_cpumask lazily and related changes.

   This feature has both a track record and a continued risk of
   performance regressions, so it was already delayed by a cycle - but
   it's all 100% perfect now™ (Rik van Riel)

 - Also miscellaneous fixes and cleanups. (Gautam Somani, Kirill
   Shutemov, Sebastian Andrzej Siewior)

* tag 'x86-mm-2025-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Remove unnecessary include of <linux/extable.h>
  x86/mtrr: Rename mtrr_overwrite_state() to guest_force_mtrr_state()
  x86/mm/selftests: Fix typo in lam.c
  x86/mm/tlb: Only trim the mm_cpumask once a second
  x86/mm/tlb: Also remove local CPU from mm_cpumask if stale
  x86/mm/tlb: Add tracepoint for TLB flush IPI to stale CPU
  x86/mm/tlb: Update mm_cpumask lazily

4 months agoMerge tag 'ceph-for-6.14-rc1' of https://github.com/ceph/ceph-client
Linus Torvalds [Fri, 31 Jan 2025 18:30:34 +0000 (10:30 -0800)] 
Merge tag 'ceph-for-6.14-rc1' of https://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "A fix for a memory leak from Antoine (marked for stable) and two
  cleanups from Liang and Slava"

* tag 'ceph-for-6.14-rc1' of https://github.com/ceph/ceph-client:
  ceph: exchange hardcoded value on NAME_MAX
  ceph: streamline request head structures in MDS client
  ceph: fix memory leak in ceph_mds_auth_match()

4 months agoMerge tag 'for-linus-6.14-ofs4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 31 Jan 2025 17:40:31 +0000 (09:40 -0800)] 
Merge tag 'for-linus-6.14-ofs4' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

Pull orangefs fix from Mike Marshall:
 "Fix a oob in orangefs_debug_write

  I got a syzbot report: "slab-out-of-bounds Read in orangefs_debug_write"

  Several people suggested fixes, I tested Al Viro's suggestion and made
  this patch"

* tag 'for-linus-6.14-ofs4' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: fix a oob in orangefs_debug_write

4 months agoMerge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Fri, 31 Jan 2025 17:33:54 +0000 (09:33 -0800)] 
Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull hostfs fix from Al Viro:
 "Fix hostfs __dentry_name() string handling.

  The use of strcpy() with overlapping source and destination is a UB;
  original loop hadn't been. More to the point, the whole thing is much
  easier done with memcpy() + memmove()"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  hostfs: fix string handling in __dentry_name()

4 months agoMerge tag 'sound-fix-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 31 Jan 2025 17:17:02 +0000 (09:17 -0800)] 
Merge tag 'sound-fix-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Here is a collection of fixes that have been gathered since the
  previous pull request.

  All about device-specific fixes and quirks, and most of them are
  pretty small and trivial"

* tag 'sound-fix-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (25 commits)
  ALSA: hda/realtek: Workaround for resume on Dell Venue 11 Pro 7130
  ALSA: hda: Fix headset detection failure due to unstable sort
  ALSA: pcm: use new array-copying-wrapper
  ASoC: codec: es8316: "DAC Soft Ramp Rate" is just a 2 bit control
  ASoC: amd: acp: Fix possible deadlock
  firmware: cs_dsp: FW_CS_DSP_KUNIT_TEST should not select REGMAP
  ALSA: usb-audio: Add delay quirk for iBasso DC07 Pro
  ALSA: hda/realtek: Fix quirk matching for Legion Pro 7
  ASoC: renesas: SND_SIU_MIGOR should depend on DMADEVICES
  ASoC: Intel: bytcr_rt5640: Add DMI quirk for Vexia Edu Atla 10 tablet 5V
  ASoC: da7213: Initialize the mutex
  ASoC: use to_platform_device() instead of container_of()
  ASoC: acp: Support microphone from Lenovo Go S
  ASoC: SOF: imx8m: Add entry for new 8M Plus revision
  ASoC: SOF: imx8: Add entries for new 8QM and 8QXP revisions
  ASoC: SOF: imx: Add mach entry to select cs42888 topology
  dt-bindings: arm: imx: Add board revisions for i.MX8MP, i.MX8QM and i.MX8QXP
  ASoC: fsl_asrc_m2m: select CONFIG_DMA_SHARED_BUFFER
  ASoC: audio-graph-card2: use correct endpoint when getting link parameters
  ASoC: SOF: imx8m: add SAI2,5,6,7
  ...

4 months agoblock: force noio scope in blk_mq_freeze_queue
Christoph Hellwig [Fri, 31 Jan 2025 12:03:47 +0000 (13:03 +0100)] 
block: force noio scope in blk_mq_freeze_queue

When block drivers or the core block code perform allocations with a
frozen queue, this could try to recurse into the block device to
reclaim memory and deadlock.  Thus all allocations done by a process
that froze a queue need to be done without __GFP_IO and __GFP_FS.
Instead of tying to track all of them down, force a noio scope as
part of freezing the queue.

Note that nvme is a bit of a mess here due to the non-owner freezes,
and they will be addressed separately.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250131120352.1315351-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 months agoRevert "mips: fix shmctl/semctl/msgctl syscall for o32"
Thomas Bogendoerfer [Thu, 30 Jan 2025 10:48:56 +0000 (11:48 +0100)] 
Revert "mips: fix shmctl/semctl/msgctl syscall for o32"

This reverts commit bc7584e009c39375294794f7ca751a6b2622c425.

The split IPC system calls for o32 have been introduced with modern
version only. Changing this breaks ABI.

Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 months agoMAINTAINERS: Update my email address
Brian Cain [Mon, 27 Jan 2025 20:51:03 +0000 (12:51 -0800)] 
MAINTAINERS: Update my email address

Qualcomm is migrating away from quicinc.com email addresses towards ones
with *.qualcomm.com.

Signed-off-by: Brian Cain <brian.cain@oss.qualcomm.com>
Signed-off-by: Brian Cain <bcain@quicinc.com>