Arnd Bergmann [Wed, 15 Jan 2025 17:05:37 +0000 (18:05 +0100)]
Merge tag 'reset-for-v6.14-2' of git://git.pengutronix.de/pza/linux into soc/drivers
Reset controller updates for v6.14 (v2)
* Add support for A1 SoC in amlogic reset driver.
* Drop aux registration helper from amlogic reset driver.
* tag 'reset-for-v6.14-2' of git://git.pengutronix.de/pza/linux:
reset: amlogic: aux: drop aux registration helper
reset: amlogic: aux: get regmap through parent device
reset: amlogic: add support for A1 SoC in auxiliary reset driver
dt-bindings: reset: add bindings for A1 SoC audio reset controller
clk: amlogic: axg-audio: revert reset implementation
Revert "clk: Fix invalid execution of clk_set_rate"
Jerome Brunet [Mon, 9 Dec 2024 16:04:35 +0000 (17:04 +0100)]
reset: amlogic: aux: drop aux registration helper
Having the aux registration helper along with the registered driver is not
great dependency wise. It does not allow the registering driver to be
properly decoupled from the registered auxiliary driver.
Drop the registration helper from the amlogic auxiliary reset driver.
This will be handled in the registering clock driver to start with while
a more generic solution is worked on.
Jan Dakinevich [Tue, 12 Nov 2024 23:00:55 +0000 (02:00 +0300)]
dt-bindings: reset: add bindings for A1 SoC audio reset controller
This reset controller is part of the audio clock controller and is handled by
the auxiliary reset driver. The introduced defines are meant to be used together
with the upcoming device tree nodes for the audio clock controller of the A1 SoC.
Arnd Bergmann [Wed, 15 Jan 2025 14:59:24 +0000 (15:59 +0100)]
Merge tag 'samsung-drivers-6.14' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into soc/drivers
Samsung SoC drivers for v6.14
1. Add new bindings for sysreg in Exynos8895.
2. Minor improvements in Exynos USI bindings.
3. Fix for Smatch warning in Exynos PMU syscon driver.
* tag 'samsung-drivers-6.14' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
soc: samsung: exynos-pmu: Fix uninitialized ret in tensor_set_bits_atomic()
dt-bindings: soc: samsung: exynos-sysreg: add sysreg compatibles for exynos8895
dt-bindings: samsung: exynos-usi: Restrict possible samsung,mode values
Arnd Bergmann [Wed, 15 Jan 2025 14:58:01 +0000 (15:58 +0100)]
Merge tag 'qcom-drivers-for-6.14' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/drivers
Qualcomm driver updates for v6.14
The Qualcomm SCM driver gains a number of fixes and improvements
related to race conditions during initialization. QSEECOM and the EFI
variable service therein are enabled for a few 8cx Gen 3 and X Elite
boards.
The LLCC driver gains configuration for IPQ5424 and WRCACHE is enabled on X
Elite.
The BCM_TCS_CMD() macro is corrected and cleaned up.
Support for SM7225 and X 1 Plus is added to the pd-mapper.
pmic_glink and the associated altmode driver are simplified using guards.
socinfo is added for QCS9075 and serial number readout on MSM8916
devices is corrected.
* tag 'qcom-drivers-for-6.14' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: (29 commits)
firmware: qcom: scm: add calls for wrapped key support
soc: qcom: pd_mapper: Add SM7225 compatible
dt-bindings: firmware: qcom,scm: Document ipq5424 SCM
soc: qcom: llcc: Update configuration data for IPQ5424
dt-bindings: cache: qcom,llcc: Add IPQ5424 compatible
firmware: qcom: scm: smc: Narrow 'mempool' variable scope
firmware: qcom: scm: smc: Handle missing SCM device
firmware: qcom: scm: Cleanup global '__scm' on probe failures
firmware: qcom: scm: Fix missing read barrier in qcom_scm_get_tzmem_pool()
firmware: qcom: scm: Fix missing read barrier in qcom_scm_is_available()
soc: qcom: socinfo: add QCS9075 SoC ID
dt-bindings: arm: qcom,ids: add SoC ID for QCS9075
soc: qcom: socinfo: Avoid out of bounds read of serial number
firmware: qcom: scm: Allow QSEECOM on Huawei Matebook E Go (sc8280xp)
firmware: qcom: scm: Allow QSEECOM for Windows Dev Kit 2023
firmware: qcom: scm: Allow QSEECOM for HP Omnibook X14
soc: qcom: rmtfs: constify rmtfs_class
soc: qcom: rmtfs: allow building the module with COMPILE_TEST=y
soc: qcom: pmic_glink_altmode: simplify locking with guard()
soc: qcom: Rework BCM_TCS_CMD macro
...
Arnd Bergmann [Wed, 15 Jan 2025 14:54:30 +0000 (15:54 +0100)]
Merge tag 'tegra-for-6.14-soc' of https://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into soc/drivers
soc/tegra: Cleanups for v6.14-rc1
This contains a debugfs error handling cleanup, a typo fix and an update to
the FUSE block's keepout list to properly allow reading these registers.
* tag 'tegra-for-6.14-soc' of https://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
soc/tegra: fuse: Update Tegra234 nvmem keepout list
soc/tegra: Fix spelling error in tegra234_lookup_slave_timeout()
soc/tegra: cbb: Drop unnecessary debugfs error handling
Arnd Bergmann [Wed, 15 Jan 2025 14:14:38 +0000 (15:14 +0100)]
Merge tag 'scmi-updates-6.14' of https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into soc/drivers
Arm SCMI updates for 6.14
This mainly has 2 updates:
1. Extension of the transport properties read from devicetree to support
multiple SCMI platform/server instances
2. Addition of the capability to automatically load the proper SCMI vendor
protocol module. The vendor protocol selection is already provided by
the SCMI core while the automatic loading of vendor protocols was not.
* tag 'scmi-updates-6.14' of https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
firmware: arm_scmi: Add aliases to transport modules
firmware: arm_scmi: Add module aliases to i.MX vendor protocols
firmware: arm_scmi: Support vendor protocol modules autoloading
firmware: arm_scmi: Allow transport properties for multiple instances
Arnd Bergmann [Wed, 15 Jan 2025 14:11:33 +0000 (15:11 +0100)]
Merge tag 'memory-controller-drv-ti-6.14' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl into soc/drivers
Memory controller drivers for v6.14 - TI
TI AEMIF driver enhancements: some refactoring around timing
parameters and finally adding and exporting interfaces for devices
using the AEMIF interface (e.g. TI Davinci NAND controller) to better
configure the memory interface.
The exported functions are going to be used by:
drivers/mtd/nand/raw/davinci_nand.c
* tag 'memory-controller-drv-ti-6.14' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl:
memory: ti-aemif: Export aemif_*_cs_timings()
memory: ti-aemif: Create aemif_set_cs_timings()
memory: ti-aemif: Create aemif_check_cs_timings()
memory: ti-aemif: Wrap CS timings into a struct
memory: ti-aemif: Remove unnecessary local variables
memory: ti-aemif: Store timings parameter in number of cycles - 1
Arnd Bergmann [Wed, 15 Jan 2025 14:10:52 +0000 (15:10 +0100)]
Merge tag 'memory-controller-drv-6.14' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl into soc/drivers
Memory controller drivers for v6.14
1. OMAP GPMC: Cleanup dead code.
2. Tegra20 EMC: Fix OF reference counting when iterating over
emc-tables.
* tag 'memory-controller-drv-6.14' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl:
memory: tegra20-emc: fix an OF node reference bug in tegra_emc_find_node_by_ram_code()
memory: omap-gpmc: deadcode a pair of functions
soc/tegra: cbb: Drop unnecessary debugfs error handling
Kernel coding style expects all drivers to ignore debugfs errors,
partly because debugfs is purely for debugging, not for important user
interfaces. Simplify the code by dropping the unnecessary probe failure
and error message on debugfs failures, which also fixes the incorrect usage
of IS_ERR_OR_NULL() and a Smatch warning:
drivers/soc/tegra/cbb/tegra-cbb.c:80 tegra_cbb_err_debugfs_init() warn: passing zero to 'PTR_ERR'
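The Smatch warning follows from how error pointers are encoded. A minimal userspace sketch, assuming simplified stand-ins (the `demo_*` helpers below are hypothetical, not the kernel's definitions of ERR_PTR(), PTR_ERR(), IS_ERR() and IS_ERR_OR_NULL()), shows why a NULL pointer that takes an IS_ERR_OR_NULL() error path ends up reported as 0, i.e. success:

```c
#include <stdint.h>
#include <stddef.h>

/* Assumption: errno values are encoded in the top 4095 addresses. */
#define DEMO_MAX_ERRNO 4095

/* Hypothetical stand-in for ERR_PTR(): encode an errno as a pointer. */
static inline void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Hypothetical stand-in for PTR_ERR(): decode the pointer back. */
static inline long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Hypothetical stand-in for IS_ERR(). */
static inline int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Hypothetical stand-in for IS_ERR_OR_NULL(). */
static inline int demo_is_err_or_null(const void *ptr)
{
	return !ptr || demo_is_err(ptr);
}

/* The questionable pattern: NULL takes the error path but yields 0. */
static inline long demo_buggy_init(const void *dentry)
{
	if (demo_is_err_or_null(dentry))
		return demo_ptr_err(dentry); /* 0 when dentry == NULL */
	return 0;
}
```

Ignoring the debugfs return value altogether, as the commit does, removes the pattern instead of patching it.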
Gaurav Kashyap [Fri, 13 Dec 2024 04:19:51 +0000 (20:19 -0800)]
firmware: qcom: scm: add calls for wrapped key support
Add helper functions for the SCM calls required to support
hardware-wrapped inline storage encryption keys. These SCM calls manage
wrapped keys via Qualcomm's Hardware Key Manager (HWKM), which can only
be accessed from TrustZone.
QCOM_SCM_ES_GENERATE_ICE_KEY and QCOM_SCM_ES_IMPORT_ICE_KEY create a new
long-term wrapped key, with the former making the hardware generate the
key and the latter importing a raw key. QCOM_SCM_ES_PREPARE_ICE_KEY
converts the key to ephemerally-wrapped form so that it can be used for
inline storage encryption. These are planned to be wired up to new
ioctls via the blk-crypto framework; see the proposed documentation for
the hardware-wrapped keys feature for more information.
Similarly there's also QCOM_SCM_ES_DERIVE_SW_SECRET which derives a
"software secret" from an ephemerally-wrapped key and will be wired up
to the corresponding operation in the blk_crypto_profile.
These will all be used by the ICE driver in drivers/soc/qcom/ice.c.
soc: qcom: llcc: Update configuration data for IPQ5424
The 'broadcast' register space is present only in chipsets that
have multiple instances of LLCC IP. Since IPQ5424 has only one
instance, both the LLCC and LLCC_BROADCAST point to the same
register space.
Document the Last Level Cache Controller on IPQ5424. The
'broadcast' register space is present only in chipsets that have
multiple instances of LLCC IP. Since IPQ5424 has only one
instance, both the LLCC and LLCC_BROADCAST point to the same
register space.
Hence, allow only '1' reg & reg-names entry for IPQ5424.
Commit ca61d6836e6f ("firmware: qcom: scm: fix a NULL-pointer
dereference") makes it explicit that qcom_scm_get_tzmem_pool() can
return NULL, therefore its users should handle this.
firmware: qcom: scm: Cleanup global '__scm' on probe failures
If SCM driver fails the probe, it should not leave global '__scm'
variable assigned, because external users of this driver will assume the
probe finished successfully. For example TZMEM parts ('__scm->mempool')
are initialized later in the probe, but users of it (__scm_smc_call())
rely on the '__scm' variable.
This fixes a theoretical NULL pointer exception, triggered by introducing
probe deferral in the SCM driver, with call trace:
firmware: qcom: scm: Fix missing read barrier in qcom_scm_get_tzmem_pool()
Commit 2e4955167ec5 ("firmware: qcom: scm: Fix __scm and waitq
completion variable initialization") introduced a write barrier in probe
function to store global '__scm' variable. We all know barriers are
paired (see memory-barriers.txt: "Note that write barriers should
normally be paired with read or address-dependency barriers"), therefore
accessing it from concurrent contexts requires read barrier. Previous
commit added such barrier in qcom_scm_is_available(), so let's use that
directly.
Lack of this read barrier can result in fetching stale '__scm' variable
value, NULL, and dereferencing it.
Note that barrier in qcom_scm_is_available() satisfies here the control
dependency.
firmware: qcom: scm: Fix missing read barrier in qcom_scm_is_available()
Commit 2e4955167ec5 ("firmware: qcom: scm: Fix __scm and waitq
completion variable initialization") introduced a write barrier in probe
function to store global '__scm' variable. It also claimed that it
added a read barrier, because as we all know barriers are paired (see
memory-barriers.txt: "Note that write barriers should normally be paired
with read or address-dependency barriers"), however it did not really
add it.
The offending commit used READ_ONCE() to access '__scm' global which is
not a barrier.
The barrier is needed so the store to '__scm' will be properly visible.
This is most likely not fatal in current driver design, because missing
read barrier would mean qcom_scm_is_available() callers will access old
value, NULL. Driver does not support unbinding and does not correctly
handle probe failures, thus there is no risk of stale or old pointer in
'__scm' variable.
However for code correctness, readability and to be sure that we did not
mess up something in this tricky topic of SMP barriers, add a read
barrier for accessing '__scm'. Change also comment from useless/obvious
what does barrier do, to what is expected: which other parts of the code
are involved here.
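The write/read pairing described above can be sketched in userspace with C11 atomics. This is only an analogue under stated assumptions: `demo_scm`, `demo_publish()`, `demo_get()` and `demo_is_available()` are hypothetical names, and release/acquire ordering stands in for the kernel's barrier primitives rather than reproducing the driver's code.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's global state. */
struct demo_scm {
	int ready;
};

/* Global pointer published by "probe" and read from other contexts. */
static _Atomic(struct demo_scm *) demo_scm_ptr;

/* Writer side: initialise the object, then publish it with release order. */
static inline void demo_publish(struct demo_scm *scm)
{
	scm->ready = 1;
	atomic_store_explicit(&demo_scm_ptr, scm, memory_order_release);
}

/* Reader side: the acquire load pairs with the release store above. */
static inline struct demo_scm *demo_get(void)
{
	return atomic_load_explicit(&demo_scm_ptr, memory_order_acquire);
}

/* A non-NULL result guarantees the initialised fields are visible too. */
static inline int demo_is_available(void)
{
	return demo_get() != NULL;
}
```

A plain once-only load on the reader side would still fetch the pointer atomically, but it would not order the later reads of the object's fields, which is the gap the commit message describes.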
The firmware used on MSM8916 exposes SOCINFO_VERSION(0, 8), which does not
have support for the serial_num field in the socinfo struct. There is an
existing check to avoid exposing the serial number in that case, but it's
not correct: When checking the item_size returned by SMEM, we need to make
sure the *end* of the serial_num is within bounds, instead of comparing
with the *start* offset. The serial_number currently exposed on MSM8916
devices is just an out of bounds read of whatever comes after the socinfo
struct in SMEM.
Fix this by changing offsetof() to offsetofend(), so that the size of the
field is also taken into account.
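The difference between the two checks can be shown with a small sketch. The struct below is a hypothetical, heavily trimmed layout (not the real socinfo struct), and `OFFSETOFEND()` is a userspace stand-in for the kernel's offsetofend():

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for offsetofend(): offset of the first byte after a member. */
#define OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Hypothetical trimmed layout, for illustration only. */
struct demo_socinfo {
	uint32_t fmt;
	uint32_t id;
	uint32_t serial_num;
};

/* Buggy check: passes as soon as the field merely *starts* inside the item. */
static inline bool demo_serial_ok_buggy(size_t item_size)
{
	return offsetof(struct demo_socinfo, serial_num) <= item_size;
}

/* Fixed check: the *end* of the field must be inside the item. */
static inline bool demo_serial_ok_fixed(size_t item_size)
{
	return OFFSETOFEND(struct demo_socinfo, serial_num) <= item_size;
}
```

With an item that is exactly 8 bytes long, the buggy check accepts a field that lies entirely past the end of the item, while the fixed check rejects it.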
soc: samsung: exynos-pmu: Fix uninitialized ret in tensor_set_bits_atomic()
If tensor_set_bits_atomic() is called with a mask of 0, the function will
just iterate over its bits, not perform any updates, and return the
uninitialized stack value of 'ret'.
Also reported by smatch:
drivers/soc/samsung/exynos-pmu.c:129 tensor_set_bits_atomic() error: uninitialized symbol 'ret'.
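A minimal sketch of the pattern, assuming a hypothetical per-bit write helper (`demo_sec_reg_write()`) and a simplified signature, shows one way to give the function a defined result for an empty mask. It is not the driver's code:

```c
#include <stdint.h>

/* Hypothetical counter so the number of writes can be observed. */
static int demo_write_count;

/* Hypothetical stand-in for the per-bit secure register write. */
static inline int demo_sec_reg_write(unsigned int bit, int set)
{
	(void)bit;
	(void)set;
	demo_write_count++;
	return 0;
}

/* Update every bit selected by 'mask' to the matching bit of 'val'. */
static inline int demo_set_bits_atomic(uint32_t val, uint32_t mask)
{
	unsigned int i;
	int ret;

	for (i = 0; i < 32; i++) {
		if (!(mask & (UINT32_C(1) << i)))
			continue;

		ret = demo_sec_reg_write(i, !!(val & (UINT32_C(1) << i)));
		if (ret)
			return ret;
	}

	/* Return a constant: with mask == 0 the loop never assigns 'ret'. */
	return 0;
}
```

Returning 0 after the loop is equivalent to initialising 'ret' to 0 at its declaration; either form avoids handing an indeterminate value back to the caller.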
Joe Hattori [Tue, 17 Dec 2024 09:14:34 +0000 (18:14 +0900)]
memory: tegra20-emc: fix an OF node reference bug in tegra_emc_find_node_by_ram_code()
As of_find_node_by_name() releases the reference of the argument device
node, tegra_emc_find_node_by_ram_code() releases some device nodes while
still in use, resulting in possible UAFs. According to the bindings and
the in-tree DTS files, the "emc-tables" node is always device's child
node with the property "nvidia,use-ram-code", and the "lpddr2" node is a
child of the "emc-tables" node. Thus utilize the
for_each_child_of_node() macro and of_get_child_by_name() instead of
of_find_node_by_name() to simplify the code.
This bug was found by an experimental verification tool that I am
developing.
Document QCS615 BWMONs, which includes one BWMONv4 instance for CPU to
LLCC path bandwidth monitoring and one BWMONv5 instance for LLCC to DDR
path bandwidth monitoring.
Sahil Malhotra [Fri, 29 Nov 2024 11:46:48 +0000 (12:46 +0100)]
optee: fix format string for printing optee build_id
There has been a recent change in OP-TEE to print an 8 or 16 character
commit id for 32-bit and 64-bit architectures respectively.
If the commit id starts with 0, like 04d1c612ec7beaede073b8c,
the revision is printed as below, with the leading 0 removed:
"optee: revision 4.4 (4d1c612ec7beaed)"
gpmc_get_client_irq() last use was removed by
commit ac28e47ccc3f ("ARM: OMAP2+: Remove legacy gpmc-nand.c")
gpmc_ticks_to_ns() last use was removed by
commit 2514830b8b8c ("ARM: OMAP2+: Remove gpmc-onenand")
Remove them.
gpmc_clk_ticks_to_ns() is now only used in some DEBUG
code; move inside the ifdef to avoid unused warnings.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Acked-by: Kevin Hilman <khilman@baylibre.com>
Link: https://lore.kernel.org/r/20241211214227.107980-1-linux@treblig.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
soc: qcom: pmic_glink: simplify locking with guard()
Simplify error handling over locks with guard(). In a few places this
eliminates error gotos and local variables. Switch to guard() everywhere
(except when jumps would go over scoped guard) for consistency, even if
it does not bring benefit in such places.
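The effect of scope-based locking can be sketched in userspace with the GCC/Clang `cleanup` variable attribute. Everything here is an assumption made for illustration: `demo_lock` is a counter rather than a real mutex, and `DEMO_GUARD()` is a hypothetical macro, not the kernel's guard() implementation.

```c
/* Hypothetical lock: a counter is enough to observe acquire/release. */
struct demo_lock {
	int held;
};

static inline struct demo_lock *demo_lock_acquire(struct demo_lock *lock)
{
	lock->held++;
	return lock;
}

/* Cleanup handler: runs automatically when the guard variable leaves scope. */
static inline void demo_lock_release(struct demo_lock **lock)
{
	(*lock)->held--;
}

/* Hypothetical guard macro relying on a compiler extension. */
#define DEMO_GUARD(lock) \
	struct demo_lock *demo_guard_ \
	__attribute__((cleanup(demo_lock_release), unused)) = \
		demo_lock_acquire(lock)

/* Early returns need no 'goto unlock': the lock is dropped on every path. */
static inline int demo_update(struct demo_lock *lock, int *value, int new_value)
{
	DEMO_GUARD(lock);

	if (new_value < 0)
		return -1;

	*value = new_value;
	return 0;
}
```

Both the error path and the success path leave `held` at 0, which is the property that lets the error gotos and the local return-code variables be removed.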
firmware: arm_scmi: Add module aliases to i.MX vendor protocols
Using the pattern 'scmi-protocol-0x<PROTO_ID>-<VEND_ID>' as MODULE_ALIAS
allows the SCMI core to autoload this protocol, if built as a module, when
its protocol operations are requested by an SCMI driver.
Cc: Peng Fan <peng.fan@nxp.com>
Acked-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Message-Id: <20241209164957.1801886-3-cristian.marussi@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
firmware: arm_scmi: Support vendor protocol modules autoloading
SCMI vendor protocols namespace is shared amongst all vendors so that there
can be multiple implementations for the same protocol ID by different
vendors, exposing completely different functionalities and used by distinct
SCMI vendor drivers.
For these reasons, at runtime, when some driver asks for a protocol, the
proper implementation to use is chosen based on the SCMI vendor/subvendor/
impl_version data as advertised by the platform SCMI server and gathered
from the SCMI core during stack initialization: this enables proper runtime
selection of vendor protocols even when many different protocols from
different vendors are built into the same image via a common defconfig.
This same selection mechanism works similarly well even when all the vendor
protocols are compiled as loadable modules, as long as all such required
protocol modules have been previously loaded by some other means.
Add support for the automatic loading of vendor protocol modules, based on
protocol/vendor IDs, when an SCMI driver attempts to use such a protocol.
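As an illustration of the lookup key, the alias pattern 'scmi-protocol-0x<PROTO_ID>-<VEND_ID>' can be built from the protocol and vendor IDs as sketched below. The helper name `demo_scmi_alias()` is hypothetical, and the two-digit lowercase hex formatting of the protocol ID is an assumption of this sketch:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build a module alias of the form 'scmi-protocol-0x<PROTO_ID>-<VEND_ID>'.
 * Returns the number of characters snprintf() would have written.
 */
static inline int demo_scmi_alias(char *buf, size_t len,
				  unsigned int proto_id, const char *vendor_id)
{
	return snprintf(buf, len, "scmi-protocol-0x%02x-%s",
			proto_id, vendor_id);
}
```

A core that fails to find a registered implementation for a protocol could hand such a string to the module loader and retry the lookup afterwards; the protocol ID and vendor string passed in are whatever the platform advertises.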
aemif_calc_rate() checks the validity of a new computed timing against a
'max' value given as input. This isn't convenient if we want to check
the CS timing configuration somewhere else in the code.
Wrap the verification of all the chip select's timing configuration into a
single function to ease its exportation in upcoming patches.
Remove the validity check from aemif_calc_rate(). Also remove the no
longer used 'max' input and change the return type to u32.
Remove the check of the aemif_calc_rate()'s return value during
device-tree parsing as aemif_calc_rate() can't fail anymore.
CS timings are stored in the struct aemif_cs_data along with other CS
parameters. It isn't convenient for exposing CS timings to other drivers
without also exposing the other parameters.
Wrap the CS timings in a new struct aemif_cs_timings to simplify their
export in upcoming patches.
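A rough sketch of the resulting shape, a timings struct plus a single validation function, is given below. The struct name, field names and maximum values are assumptions for illustration; the real limits come from the CS configuration register layout.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed field limits, for illustration only. */
#define DEMO_TA_MAX     0x3
#define DEMO_HOLD_MAX   0x7
#define DEMO_STROBE_MAX 0x3f
#define DEMO_SETUP_MAX  0xf

/* Hypothetical timing bundle, each value in 'number of cycles - 1'. */
struct demo_cs_timings {
	uint32_t ta;
	uint32_t rhold, rstrobe, rsetup;
	uint32_t whold, wstrobe, wsetup;
};

/* Validate every timing in one place; 0 on success, -EINVAL otherwise. */
static inline int demo_check_cs_timings(const struct demo_cs_timings *t)
{
	if (t->ta > DEMO_TA_MAX)
		return -EINVAL;
	if (t->rhold > DEMO_HOLD_MAX || t->whold > DEMO_HOLD_MAX)
		return -EINVAL;
	if (t->rstrobe > DEMO_STROBE_MAX || t->wstrobe > DEMO_STROBE_MAX)
		return -EINVAL;
	if (t->rsetup > DEMO_SETUP_MAX || t->wsetup > DEMO_SETUP_MAX)
		return -EINVAL;
	return 0;
}
```

Keeping the timings in their own struct means another driver can fill in and validate a configuration without seeing the remaining chip-select parameters.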
memory: ti-aemif: Store timings parameter in number of cycles - 1
The CS configuration register expects timings to be expressed in
'number of cycles - 1' but they are stored in ns in the struct
aemif_cs_data. So at init, the timings currently set are converted to ns
by aemif_get_hw_params(), updated with values from the device-tree
properties, and then converted back to 'number of cycles - 1' before
being applied.
Store the timings directly in 'number of cycles - 1' instead of
nanoseconds.
Perform the conversion from nanoseconds during the device-tree parsing.
Remove aemif_cycles_to_nsec() as it isn't used anymore.
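The conversion to 'number of cycles - 1' can be sketched as follows. The helper name and the choice of a clock rate expressed in kHz are assumptions of this sketch, not a copy of the driver's function:

```c
#include <stdint.h>

/*
 * Convert a timing in nanoseconds to 'number of cycles - 1' for a clock
 * given in kHz. The cycle count is rounded up so the programmed timing is
 * never shorter than requested, and the result is clamped at 0.
 */
static inline uint32_t demo_ns_to_cycles_m1(uint32_t ns, uint32_t clk_khz)
{
	/* cycles = ceil(ns * clk_khz / 1e6), computed in 64 bits. */
	uint64_t cycles = ((uint64_t)ns * clk_khz + 999999) / 1000000;

	return cycles ? (uint32_t)(cycles - 1) : 0;
}
```

For example, with a 100 MHz clock (100000 kHz) a 25 ns timing needs 2.5 cycles, which rounds up to 3 and is stored as 2.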
alice.guo [Mon, 18 Nov 2024 02:17:16 +0000 (10:17 +0800)]
soc: imx: Add SoC device register for i.MX9
i.MX9 SoCs have SoC ID, SoC revision number and chip unique identifier
which are provided by the corresponding ARM trusted firmware API. This
patch uses an SMC call to obtain this information and then
register i.MX9 SoC as a device.
Signed-off-by: Alice Guo <alice.guo@nxp.com>
Tested-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Stefan Wahren <wahrenst@gmx.net>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
firmware: arm_scmi: Allow transport properties for multiple instances
Default SCMI transport properties values can be overridden with devicetree
provided descriptors; in order to support multiple SCMI instances, make the
properties-update happen on a per-instance copy of the original transport
descriptor.
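The idea of updating a per-instance copy rather than the shared defaults can be sketched as below. The descriptor fields, the default values and the helper name are all hypothetical; the override argument stands in for a devicetree-provided value.

```c
#include <stdint.h>

/* Hypothetical transport descriptor; field names are illustrative. */
struct demo_transport_desc {
	uint32_t max_rx_timeout_ms;
	uint32_t max_msg;
	uint32_t max_msg_size;
};

/* Shared defaults: never modified, so every instance starts from them. */
static const struct demo_transport_desc demo_default_desc = {
	.max_rx_timeout_ms = 30,
	.max_msg = 20,
	.max_msg_size = 128,
};

/*
 * Return a per-instance copy of the defaults. A nonzero override replaces
 * the receive timeout for this instance only.
 */
static inline struct demo_transport_desc
demo_instance_desc(uint32_t rx_timeout_override_ms)
{
	struct demo_transport_desc desc = demo_default_desc;

	if (rx_timeout_override_ms)
		desc.max_rx_timeout_ms = rx_timeout_override_ms;

	return desc;
}
```

Because each instance works on its own copy, an override supplied for one SCMI instance cannot leak into a second instance that keeps the defaults.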
Linus Torvalds [Sun, 8 Dec 2024 20:01:06 +0000 (12:01 -0800)]
Merge tag 'kbuild-fixes-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix a section mismatch warning in modpost
- Fix Debian package build error with the O= option
* tag 'kbuild-fixes-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: deb-pkg: fix build error with O=
modpost: Add .irqentry.text to OTHER_SECTIONS
Linus Torvalds [Sun, 8 Dec 2024 19:54:04 +0000 (11:54 -0800)]
Merge tag 'irq_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Fix a /proc/interrupts formatting regression
- Have the BCM2836 interrupt controller enter power management states
properly
- Other fixlets
* tag 'irq_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/stm32mp-exti: CONFIG_STM32MP_EXTI should not default to y when compile-testing
genirq/proc: Add missing space separator back
irqchip/bcm2836: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
irqchip/gic-v3: Fix irq_complete_ack() comment
Linus Torvalds [Sun, 8 Dec 2024 19:51:29 +0000 (11:51 -0800)]
Merge tag 'timers_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Borislav Petkov:
- Handle the case where clocksources with small counter width can,
in conjunction with overly long idle sleeps, falsely trigger the
negative motion detection of clocksources
* tag 'timers_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource: Make negative motion detection more robust
Linus Torvalds [Sun, 8 Dec 2024 19:38:56 +0000 (11:38 -0800)]
Merge tag 'x86_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Make sure the Automatic IBRS setting check on AMD does not falsely fire in
the guest when it has been set already on the host
- Make sure cacheinfo structures memory is allocated to address a boot
NULL ptr dereference on Intel Meteor Lake which has different numbers
of subleafs in its CPUID(4) leaf
- Take care of the GDT restoring on the kexec path too, as expected by
the kernel
- Make sure SMP is not disabled when IO-APIC is disabled on the kernel
cmdline
- Add a PGD flag _PAGE_NOPTISHADOW to instruct machinery not to
propagate changes to the kernelmode page tables, to the user portion,
in PTI
- Mark Intel Lunar Lake as affected by an issue where MONITOR wakeups
can get lost and thus user-visible delays happen
- Make sure PKRU is properly restored with XRSTOR on AMD after a PKRU
write of 0 (WRPKRU) which will mark PKRU in its init state and thus
lose the actual buffer
* tag 'x86_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/CPU/AMD: WARN when setting EFER.AUTOIBRS if and only if the WRMSR fails
x86/cacheinfo: Delete global num_cache_leaves
cacheinfo: Allocate memory during CPU hotplug if not done from the primary CPU
x86/kexec: Restore GDT on return from ::preserve_context kexec
x86/cpu/topology: Remove limit of CPUs due to disabled IO/APIC
x86/mm: Add _PAGE_NOPTISHADOW bit to avoid updating userspace page tables
x86/cpu: Add Lunar Lake to list of CPUs with a broken MONITOR implementation
x86/pkeys: Ensure updated PKRU value is XRSTOR'd
x86/pkeys: Change caller of update_pkru_in_sigframe()
Linus Torvalds [Sun, 8 Dec 2024 19:26:13 +0000 (11:26 -0800)]
Merge tag 'mm-hotfixes-stable-2024-12-07-22-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"24 hotfixes. 17 are cc:stable. 15 are MM and 9 are non-MM.
The usual bunch of singletons - please see the relevant changelogs for
details"
* tag 'mm-hotfixes-stable-2024-12-07-22-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (24 commits)
iio: magnetometer: yas530: use signed integer type for clamp limits
sched/numa: fix memory leak due to the overwritten vma->numab_state
mm/damon: fix order of arguments in damos_before_apply tracepoint
lib: stackinit: hide never-taken branch from compiler
mm/filemap: don't call folio_test_locked() without a reference in next_uptodate_folio()
scatterlist: fix incorrect func name in kernel-doc
mm: correct typo in MMAP_STATE() macro
mm: respect mmap hint address when aligning for THP
mm: memcg: declare do_memsw_account inline
mm/codetag: swap tags when migrate pages
ocfs2: update seq_file index in ocfs2_dlm_seq_next
stackdepot: fix stack_depot_save_flags() in NMI context
mm: open-code page_folio() in dump_page()
mm: open-code PageTail in folio_flags() and const_folio_flags()
mm: fix vrealloc()'s KASAN poisoning logic
Revert "readahead: properly shorten readahead when falling back to do_page_cache_ra()"
selftests/damon: add _damon_sysfs.py to TEST_FILES
selftest: hugetlb_dio: fix test naming
ocfs2: free inode when ocfs2_get_init_inode() fails
nilfs2: fix potential out-of-bounds memory access in nilfs_find_entry()
...
Masahiro Yamada [Sun, 8 Dec 2024 07:56:45 +0000 (16:56 +0900)]
kbuild: deb-pkg: fix build error with O=
Since commit 13b25489b6f8 ("kbuild: change working directory to external
module directory with M="), the Debian package build fails if a relative
path is specified with the O= option.
$ make O=build bindeb-pkg
[ snip ]
dpkg-deb: building package 'linux-image-6.13.0-rc1' in '../linux-image-6.13.0-rc1_6.13.0-rc1-6_amd64.deb'.
Rebuilding host programs with x86_64-linux-gnu-gcc...
make[6]: Entering directory '/home/masahiro/linux/build'
/home/masahiro/linux/Makefile:190: *** specified kernel directory "build" does not exist. Stop.
This occurs because the sub_make_done flag is cleared, even though the
working directory is already in the output directory.
Passing KBUILD_OUTPUT=. resolves the issue.
Fixes: 13b25489b6f8 ("kbuild: change working directory to external module directory with M=")
Reported-by: Charlie Jenkins <charlie@rivosinc.com>
Closes: https://lore.kernel.org/all/Z1DnP-GJcfseyrM3@ghost/
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Thomas Gleixner [Sun, 1 Dec 2024 11:17:30 +0000 (12:17 +0100)]
modpost: Add .irqentry.text to OTHER_SECTIONS
The compiler can fully inline the actual handler function of an interrupt
entry into the .irqentry.text entry point. If such a function contains an
access which has an exception table entry, modpost complains about a
section mismatch:
WARNING: vmlinux.o(__ex_table+0x447c): Section mismatch in reference ...
The relocation at __ex_table+0x447c references section ".irqentry.text"
which is not in the list of authorized sections.
Add .irqentry.text to OTHER_SECTIONS to cure the issue.
Reported-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org # needed for linux-5.4-y
Link: https://lore.kernel.org/all/20241128111844.GE10431@google.com/
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Linus Torvalds [Sun, 8 Dec 2024 01:27:25 +0000 (17:27 -0800)]
Merge tag '6.13-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- DFS fix (for race with tree disconnect and dfs cache worker)
- Four fixes for SMB3.1.1 posix extensions:
- improve special file support e.g. to Samba, retrieving the file
type earlier
- reduce roundtrips (e.g. on ls -l, in some cases)
* tag '6.13-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: fix potential race in cifs_put_tcon()
smb3.1.1: fix posix mounts to older servers
fs/smb/client: cifs_prime_dcache() for SMB3 POSIX reparse points
fs/smb/client: Implement new SMB3 POSIX type
fs/smb/client: avoid querying SMB2_OP_QUERY_WSL_EA for SMB3 POSIX
Linus Torvalds [Sun, 8 Dec 2024 01:17:38 +0000 (17:17 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Large number of small fixes, all in drivers"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (32 commits)
scsi: scsi_debug: Fix hrtimer support for ndelay
scsi: storvsc: Do not flag MAINTENANCE_IN return of SRB_STATUS_DATA_OVERRUN as an error
scsi: ufs: core: Add missing post notify for power mode change
scsi: sg: Fix slab-use-after-free read in sg_release()
scsi: ufs: core: sysfs: Prevent div by zero
scsi: qla2xxx: Update version to 10.02.09.400-k
scsi: qla2xxx: Supported speed displayed incorrectly for VPorts
scsi: qla2xxx: Fix NVMe and NPIV connect issue
scsi: qla2xxx: Remove check req_sg_cnt should be equal to rsp_sg_cnt
scsi: qla2xxx: Fix use after free on unload
scsi: qla2xxx: Fix abort in bsg timeout
scsi: mpi3mr: Update driver version to 8.12.0.3.50
scsi: mpi3mr: Handling of fault code for insufficient power
scsi: mpi3mr: Start controller indexing from 0
scsi: mpi3mr: Fix corrupt config pages PHY state is switched in sysfs
scsi: mpi3mr: Synchronize access to ioctl data buffer
scsi: mpt3sas: Update driver version to 51.100.00.00
scsi: mpt3sas: Diag-Reset when Doorbell-In-Use bit is set during driver load time
scsi: ufs: pltfrm: Dellocate HBA during ufshcd_pltfrm_remove()
scsi: ufs: pltfrm: Drop PM runtime reference count after ufshcd_remove()
...
Linus Torvalds [Sat, 7 Dec 2024 18:07:05 +0000 (10:07 -0800)]
Merge tag 'block-6.13-20241207' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- Target fix using incorrect zero buffer (Nilay)
- Device specific deallocate quirk fixes (Christoph, Keith)
- Fabrics fix for handling max command target bugs (Maurizio)
- Cocci fix usage for kzalloc (Yu-Chen)
- DMA size fix for host memory buffer feature (Christoph)
- Fabrics queue cleanup fixes (Chunguang)
- CPU hotplug ordering fixes
- Add missing MODULE_DESCRIPTION for rnull
- bcache error value fix
- virtio-blk queue freeze fix
* tag 'block-6.13-20241207' of git://git.kernel.dk/linux:
blk-mq: move cpuhp callback registering out of q->sysfs_lock
blk-mq: register cpuhp callback after hctx is added to xarray table
virtio-blk: don't keep queue frozen during system suspend
nvme-tcp: simplify nvme_tcp_teardown_io_queues()
nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues()
nvme-rdma: unquiesce admin_q before destroy it
nvme-tcp: fix the memleak while create new ctrl failed
nvme-pci: don't use dma_alloc_noncontiguous with 0 merge boundary
nvmet: replace kmalloc + memset with kzalloc for data allocation
nvme-fabrics: handle zero MAXCMD without closing the connection
bcache: revert replacing IS_ERR_OR_NULL with IS_ERR again
nvme-pci: remove two deallocate zeroes quirks
block: rnull: add missing MODULE_DESCRIPTION
nvme: don't apply NVME_QUIRK_DEALLOCATE_ZEROES when DSM is not supported
nvmet: use kzalloc instead of ZERO_PAGE in nvme_execute_identify_ns_nvm()
Linus Torvalds [Fri, 6 Dec 2024 23:07:48 +0000 (15:07 -0800)]
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Daniel Borkmann:
- Fix several issues for BPF LPM trie map which were found by syzbot
and during addition of new test cases (Hou Tao)
- Fix a missing process_iter_arg register type check in the BPF
verifier (Kumar Kartikeya Dwivedi, Tao Lyu)
- Fix several correctness gaps in the BPF verifier when interacting
with the BPF stack without CAP_PERFMON (Kumar Kartikeya Dwivedi,
Eduard Zingerman, Tao Lyu)
- Fix OOB BPF map writes when deleting elements for the case of xsk map
as well as devmap (Maciej Fijalkowski)
- Fix xsk sockets to always clear DMA mapping information when
unmapping the pool (Larysa Zaremba)
- Fix sk_mem_uncharge logic in tcp_bpf_sendmsg to only uncharge after
sent bytes have been finalized (Zijian Zhang)
- Fix BPF sockmap with vsocks which was missing a queue check in poll
and sockmap cleanup on close (Michal Luczaj)
- Fix tools infra to override makefile ARCH variable if defined but
empty, which addresses cross-building tools. (Björn Töpel)
- Fix two resolve_btfids build warnings on unresolved bpf_lsm symbols
(Thomas Weißschuh)
- Fix a NULL pointer dereference in bpftool (Amir Mohammadi)
- Fix BPF selftests to check for CONFIG_PREEMPTION instead of
CONFIG_PREEMPT (Sebastian Andrzej Siewior)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (31 commits)
selftests/bpf: Add more test cases for LPM trie
selftests/bpf: Move test_lpm_map.c to map_tests
bpf: Use raw_spinlock_t for LPM trie
bpf: Switch to bpf mem allocator for LPM trie
bpf: Fix exact match conditions in trie_get_next_key()
bpf: Handle in-place update for full LPM trie correctly
bpf: Handle BPF_EXIST and BPF_NOEXIST for LPM trie
bpf: Remove unnecessary kfree(im_node) in lpm_trie_update_elem
bpf: Remove unnecessary check when updating LPM trie
selftests/bpf: Add test for narrow spill into 64-bit spilled scalar
selftests/bpf: Add test for reading from STACK_INVALID slots
selftests/bpf: Introduce __caps_unpriv annotation for tests
bpf: Fix narrow scalar spill onto 64-bit spilled scalar slots
bpf: Don't mark STACK_INVALID as STACK_MISC in mark_stack_slot_misc
samples/bpf: Remove unnecessary -I flags from libbpf EXTRA_CFLAGS
bpf: Zero index arg error string for dynptr and iter
selftests/bpf: Add tests for iter arg check
bpf: Ensure reg is PTR_TO_STACK in process_iter_arg
tools: Override makefile ARCH variable if defined, but empty
selftests/bpf: Add apply_bytes test to test_txmsg_redir_wait_sndmem in test_sockmap
...
Linus Torvalds [Fri, 6 Dec 2024 21:47:55 +0000 (13:47 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
"Nothing major, some left-overs from the recent merging window (MTE,
coco) and some newly found issues like the ptrace() ones.
- MTE/hugetlbfs:
- Set VM_MTE_ALLOWED in the arch code and remove it from the core
code for hugetlbfs mappings
- Fix copy_highpage() warning when the source is a huge page but
not MTE tagged, taking the wrong small page path
- drivers/virt/coco:
- Add the pKVM and Arm CCA drivers under the arm64 maintainership
- Fix the pkvm driver to fall back to ioremap() (and warn) if the
MMIO_GUARD hypercall fails
- Keep the Arm CCA driver default 'n' rather than 'm'
- A series of fixes for the arm64 ptrace() implementation,
potentially leading to the kernel consuming uninitialised stack
variables when PTRACE_SETREGSET is invoked with a length of 0
- Fix zone_dma_limit calculation when RAM starts below 4GB and
ZONE_DMA is capped to this limit
- Fix early boot warning with CONFIG_DEBUG_VIRTUAL=y triggered by a
call to page_to_phys() (from patch_map()) which checks pfn_valid()
before vmemmap has been set up
- Do not clobber bits 15:8 of the ASID used for TTBR1_EL1 and TLBI
ops when the kernel assumes 8-bit ASIDs but running under a
hypervisor on a system that implements 16-bit ASIDs (found running
Linux under Parallels on Apple M4)
- ACPI/IORT: Add PMCG platform information for HiSilicon HIP09A as it
is using the same SMMU PMCG as HIP09 and suffers from the same
errata
- Add GCS to cpucap_is_possible(), missed in the recent merge"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: ptrace: fix partial SETREGSET for NT_ARM_GCS
arm64: ptrace: fix partial SETREGSET for NT_ARM_POE
arm64: ptrace: fix partial SETREGSET for NT_ARM_FPMR
arm64: ptrace: fix partial SETREGSET for NT_ARM_TAGGED_ADDR_CTRL
arm64: cpufeature: Add GCS to cpucap_is_possible()
coco: virt: arm64: Do not enable cca guest driver by default
arm64: mte: Fix copy_highpage() warning on hugetlb folios
arm64: Ensure bits ASID[15:8] are masked out when the kernel uses 8-bit ASIDs
ACPI/IORT: Add PMCG platform information for HiSilicon HIP09A
MAINTAINERS: Add CCA and pKVM CoCO guest support to the ARM64 entry
drivers/virt: pkvm: Don't fail ioremap() call if MMIO_GUARD fails
arm64: patching: avoid early page_to_phys()
arm64: mm: Fix zone_dma_limit calculation
arm64: mte: set VM_MTE_ALLOWED for hugetlbfs at correct place
Linus Torvalds [Fri, 6 Dec 2024 21:42:03 +0000 (13:42 -0800)]
Merge tag 'fixes-2024-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock fixes from Mike Rapoport:
"Restore check for node validity in arch_numa.
The rework of NUMA initialization in arch_numa dropped a check that
refused to accept configurations with invalid node IDs.
Restore that check to ensure that when firmware passes invalid nodes,
such configuration is rejected and kernel gracefully falls back to
dummy NUMA"
* tag 'fixes-2024-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
arch_numa: Restore nid checks before registering a memblock with a node
memblock: allow zero threshold in validate_numa_converage()
* tag 'drm-fixes-2024-12-06' of https://gitlab.freedesktop.org/drm/kernel:
drm/amdgpu: rework resume handling for display (v2)
drm/amd/pm: fix and simplify workload handling
Revert "drm/amd/pm: correct the workload setting"
drm/amdgpu: fix sriov reinit late orders
drm/amdgpu: Fix ISP hw init issue
drm/amd/display: Add hblank borrowing support
drm/amd/display: Limit VTotal range to max hw cap minus fp
drm/amd/display: Correct prefetch calculation
drm/amd/display: Add option to retrieve detile buffer size
drm/amd/display: Add a left edge pixel if in YCbCr422 or YCbCr420 and odm
drm/amdkfd: hard-code cacheline for gc943,gc944
drm/amdkfd: add MEC version that supports no PCIe atomics for GFX12
drm/amd/display: Fix programming backlight on OLED panels
drm/amd: Sanity check the ACPI EDID
drm/amdgpu/hdp7.0: do a posting read when flushing HDP
drm/amdgpu/hdp6.0: do a posting read when flushing HDP
drm/amdgpu/hdp5.2: do a posting read when flushing HDP
drm/amdgpu/hdp5.0: do a posting read when flushing HDP
drm/amdgpu/hdp4.0: do a posting read when flushing HDP
drm/amdgpu/jpeg1.0: fix idle work handler
Linus Torvalds [Fri, 6 Dec 2024 19:52:15 +0000 (11:52 -0800)]
Merge tag 'drm-fixes-2024-12-07' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Pretty quiet week which is probably expected after US holidays, the
dma-fence and displayport MST message handling fixes make up the bulk
of this, along with a couple of minor xe and other driver fixes.
dma-fence:
- Fix reference leak on fence-merge failure path
- Simplify fence merging with kernel's sort()
- Fix dma_fence_array_signaled() to ensure forward progress
dp_mst:
- Fix MST sideband message body length check
- Fix a bunch of locking/state handling with DP MST msgs
sti:
- Add __iomem for mixer_dbg_mxn()'s parameter
xe:
- Missing init value and 64-bit write-order check
- Fix a memory allocation issue causing lockdep violation
v3d:
- Performance counter fix"
* tag 'drm-fixes-2024-12-07' of https://gitlab.freedesktop.org/drm/kernel:
drm/v3d: Enable Performance Counters before clearing them
drm/dp_mst: Use reset_msg_rx_state() instead of open coding it
drm/dp_mst: Reset message rx state after OOM in drm_dp_mst_handle_up_req()
drm/dp_mst: Ensure mst_primary pointer is valid in drm_dp_mst_handle_up_req()
drm/dp_mst: Fix down request message timeout handling
drm/dp_mst: Simplify error path in drm_dp_mst_handle_down_rep()
drm/dp_mst: Verify request type in the corresponding down message reply
drm/dp_mst: Fix resetting msg rx state after topology removal
drm/xe: Move the coredump registration to the worker thread
drm/xe/guc: Fix missing init value and add register order check
drm/sti: Add __iomem for mixer_dbg_mxn's parameter
drm/dp_mst: Fix MST sideband message body length check
dma-buf: fix dma_fence_array_signaled v4
dma-fence: Use kernel's sort for merging fences
dma-fence: Fix reference leak on fence merge failure path
Linus Torvalds [Fri, 6 Dec 2024 19:46:39 +0000 (11:46 -0800)]
Merge tag 'sound-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes that have been gathered in the week.
- Fix the missing XRUN handling in USB-audio low latency mode
- Fix regression by the previous USB-audio hardening change
- Clean up old SH sound driver to use the standard helpers
- A few further fixes for MIDI 2.0 UMP handling
- Various HD-audio and USB-audio quirks
- Fix jack handling at PM on ASoC Intel AVS
- Misc small fixes for ASoC SOF and Mediatek"
* tag 'sound-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Fix spelling mistake "Firelfy" -> "Firefly"
ASoC: mediatek: mt8188-mt6359: Remove hardcoded dmic codec
ALSA: hda/realtek: fix micmute LEDs don't work on HP Laptops
ALSA: usb-audio: Add extra PID for RME Digiface USB
ALSA: usb-audio: Fix a DMA to stack memory bug
ASoC: SOF: ipc3-topology: fix resource leaks in sof_ipc3_widget_setup_comp_dai()
ALSA: hda/realtek: Add support for Samsung Galaxy Book3 360 (NP730QFG)
ASoC: Intel: avs: da7219: Remove suspend_pre() and resume_post()
ALSA: hda/tas2781: Fix error code tas2781_read_acpi()
ALSA: hda/realtek: Enable mute and micmute LED on HP ProBook 430 G8
ALSA: usb-audio: add mixer mapping for Corsair HS80
ALSA: ump: Shut up truncated string warning
ALSA: sh: Use standard helper for buffer accesses
ALSA: usb-audio: Notify xrun for low-latency mode
ALSA: hda/conexant: fix Z60MR100 startup pop issue
ALSA: ump: Update legacy substream names upon FB info update
ALSA: ump: Indicate the inactive group in legacy substream names
ALSA: ump: Don't open legacy substream for an inactive group
ALSA: seq: ump: Fix seq port updates per FB info notify
Linus Torvalds [Fri, 6 Dec 2024 19:43:22 +0000 (11:43 -0800)]
Merge tag 'regmap-fix-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fixes from Mark Brown:
"A couple of small fixes, fixing an incorrect format specifier in a log
message and adding missing cleanup of the devres data used to support
dev_get_regmap() when a device is unregistered"
* tag 'regmap-fix-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: detach regmap from dev on regmap_exit
regmap: Use correct format specifier for logging range errors
Linus Torvalds [Fri, 6 Dec 2024 19:36:48 +0000 (11:36 -0800)]
Merge tag 'spi-fix-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few small driver specific fixes and device ID updates for SPI.
The Apple change flags the driver as being compatible with the core's
GPIO chip select support, fixing support for some systems"
* tag 'spi-fix-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: omap2-mcspi: Fix the IS_ERR() bug for devm_clk_get_optional_enabled()
spi: intel: Add Panther Lake SPI controller support
spi: apple: Set use_gpio_descriptors to true
spi: mpc52xx: Add cancel_work_sync before module remove
Linus Torvalds [Fri, 6 Dec 2024 19:27:10 +0000 (11:27 -0800)]
Merge tag 'mmc-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"Core:
- Further prevent card detect during shutdown
Host drivers:
- sdhci-pci: Add DMI quirk for missing CD GPIO on Vexia Edu Atla 10
tablet"
* tag 'mmc-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: core: Further prevent card detect during shutdown
mmc: sdhci-pci: Add DMI quirk for missing CD GPIO on Vexia Edu Atla 10 tablet
Linus Torvalds [Fri, 6 Dec 2024 19:24:00 +0000 (11:24 -0800)]
Merge tag 'pmdomain-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
Pull pmdomain fixes from Ulf Hansson:
"Core:
- Fix a couple of memory-leaks during genpd init/remove
Providers:
- imx: Adjust delay for gpcv2 to fix power up handshake
- mediatek: Fix DT bindings by adding another nested power-domain
layer"
* tag 'pmdomain-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
pmdomain: imx: gpcv2: Adjust delay after power up handshake
pmdomain: core: Fix error path in pm_genpd_init() when ida alloc fails
pmdomain: core: Add missing put_device()
dt-bindings: power: mediatek: Add another nested power-domain layer
x86/CPU/AMD: WARN when setting EFER.AUTOIBRS if and only if the WRMSR fails
When ensuring EFER.AUTOIBRS is set, WARN only on a negative return code
from msr_set_bit(), as '1' is used to indicate the WRMSR was successful
('0' indicates the MSR bit was already set).
Fixes: 8cc68c9c9e92 ("x86/CPU/AMD: Make sure EFER[AIBRSE] is set")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/Z1MkNofJjt7Oq0G6@google.com
Closes: https://lore.kernel.org/all/20241205220604.GA2054199@thelio-3990X
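The return-value convention described in this commit message can be sketched in plain C. This is only an illustration: fake_msr_set_bit(), struct fake_msr and should_warn() are hypothetical userspace stand-ins, not the kernel's msr_set_bit(). Only the tri-state convention (negative on failure, 0 if the bit was already set, 1 if it was written) is taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an MSR; not a real kernel interface. */
struct fake_msr {
	uint64_t val;
	bool write_fails;	/* simulate a failing write */
};

/*
 * Mirrors the return convention described in the commit message:
 *   < 0  the write failed
 *     0  the bit was already set, nothing was written
 *     1  the bit was written successfully
 */
static int fake_msr_set_bit(struct fake_msr *msr, unsigned int bit)
{
	uint64_t mask;

	assert(bit < 64);
	mask = UINT64_C(1) << bit;

	if (msr->val & mask)
		return 0;
	if (msr->write_fails)
		return -5;	/* arbitrary negative error code for this sketch */

	msr->val |= mask;
	return 1;
}

/* Warn only on a negative return; both 0 and 1 mean the bit is now set. */
static bool should_warn(int ret)
{
	return ret < 0;
}
```

A check written as "warn if the return value is non-zero" would fire on the successful-write case (1), which is the situation the commit message describes.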
====================
This patch set fixes several issues for LPM trie. These issues were
found while adding new test cases or were reported by syzbot.
The patch set is structured as follows:
Patch #1~#2 are clean-ups for lpm_trie_update_elem().
Patch #3 handles BPF_EXIST and BPF_NOEXIST correctly for LPM trie.
Patch #4 fixes the accounting of n_entries when doing in-place update.
Patch #5 fixes the exact match condition in trie_get_next_key(), which
may otherwise skip keys when the passed key is not found in the map.
Patch #6~#7 switch from kmalloc() to bpf memory allocator for LPM trie
to fix several lock order warnings reported by syzbot. It also enables
raw_spinlock_t for LPM trie again. After these changes, the LPM trie will
be closer to being usable in any context (though the reentrance check of
trie->lock is still missing, but it is on my todo list).
Patch #8: move test_lpm_map to map_tests to make it run regularly.
Patch #9: add test cases for the issues fixed by patch #3~#5.
Please see individual patches for more details. Comments are always
welcome.
Change Log:
v3:
* patch #2: remove the unnecessary NULL-init for im_node
* patch #6: alloc the leaf node before disabling IRQ to lower
the possibility of -ENOMEM when leaf_size is large; Free
these nodes outside the trie lock (Suggested by Alexei)
* collect review and ack tags (Thanks to Toke & Daniel)
v2: https://lore.kernel.org/bpf/20241127004641.1118269-1-houtao@huaweicloud.com/
* collect review tags (Thanks to Toke)
* drop "Add bpf_mem_cache_is_mergeable() helper" patch
* patch #3~#4: add fix tag
* patch #4: rename the helper to trie_check_add_elem() and increase
n_entries in it.
* patch #6: use one bpf mem allocator and update commit message to
clarify that using bpf mem allocator is more appropriate.
* patch #7: update commit message to add the possible max running time
for update operation.
* patch #9: update commit message to specify the purpose of these test
cases.
Hou Tao [Fri, 6 Dec 2024 11:06:22 +0000 (19:06 +0800)]
selftests/bpf: Add more test cases for LPM trie
Add more test cases for LPM trie in test_maps:
1) test_lpm_trie_update_flags
It constructs various use cases for BPF_EXIST and BPF_NOEXIST and checks
whether the return value of the update operation is as expected.
2) test_lpm_trie_update_full_maps
It tests the update operations on a full LPM trie map. Adding a new node
will fail and overwriting the value of an existing node will succeed.
3) test_lpm_trie_iterate_strs and test_lpm_trie_iterate_ints
These two test cases test whether the iteration through get_next_key is
sorted and as expected. These two test cases delete the minimal key after
each iteration and check whether next iteration returns the second
minimal key. The only difference between these two test cases is the
former one saves strings in the LPM trie and the latter saves integers.
Without the fix of get_next_key, these two cases will fail as shown
below:
test_lpm_trie_iterate_strs(1091):FAIL:iterate #2 got abc exp abS
test_lpm_trie_iterate_ints(1142):FAIL:iterate #1 got 0x2 exp 0x1
Hou Tao [Fri, 6 Dec 2024 11:06:21 +0000 (19:06 +0800)]
selftests/bpf: Move test_lpm_map.c to map_tests
Move test_lpm_map.c to map_tests/ to include LPM trie test cases in
regular test_maps run. Most code remains unchanged, including the use of
assert(). Only reduce n_lookups from 64K to 512, which decreases
test_lpm_map runtime from 37s to 0.7s.
Hou Tao [Fri, 6 Dec 2024 11:06:20 +0000 (19:06 +0800)]
bpf: Use raw_spinlock_t for LPM trie
After switching from kmalloc() to the bpf memory allocator, there will be
no blocking operation during the update of LPM trie. Therefore, change
trie->lock from spinlock_t to raw_spinlock_t to make LPM trie usable in
atomic context, even on RT kernels.
The max value of prefixlen is 2048. Therefore, update or deletion
operations will find the target after at most 2048 comparisons.
In a test case that updates an element after 2048 comparisons on an
8-CPU VM, the average and maximal times for such an update operation
are about 210us and 900us, respectively.
Hou Tao [Fri, 6 Dec 2024 11:06:19 +0000 (19:06 +0800)]
bpf: Switch to bpf mem allocator for LPM trie
Multiple syzbot warnings have been reported. These warnings are mainly
about the lock order between trie->lock and kmalloc()'s internal lock.
See report [1] as an example:
======================================================
WARNING: possible circular locking dependency detected
6.10.0-rc7-syzkaller-00003-g4376e966ecb7 #0 Not tainted
------------------------------------------------------
syz.3.2069/15008 is trying to acquire lock:
ffff88801544e6d8 (&n->list_lock){-.-.}-{2:2}, at: get_partial_node ...
but task is already holding lock:
ffff88802dcc89f8 (&trie->lock){-.-.}-{2:2}, at: trie_update_elem ...
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
A bpf program attached to trace_contention_end() triggers after
acquiring &n->list_lock. The program invokes trie_delete_elem(), which
then acquires trie->lock. However, it is possible that another
process is invoking trie_update_elem(). trie_update_elem() will acquire
trie->lock first, then invoke kmalloc_node(). kmalloc_node() may invoke
get_partial_node() and try to acquire &n->list_lock (not necessarily the
same lock object). Therefore, lockdep warns about the circular locking
dependency.
Invoking kmalloc() before acquiring trie->lock could fix the warning.
However, since BPF programs can be invoked from any context (e.g.,
through kprobe/tracepoint/fentry), there may still be lock ordering
problems for internal locks in kmalloc() or trie->lock itself.
To eliminate these potential lock ordering problems with kmalloc()'s
internal locks, replace kmalloc()/kfree()/kfree_rcu() with equivalent
BPF memory allocator APIs that can be invoked in any context. The lock
ordering problems with trie->lock (e.g., reentrance) will be handled
separately.
Three aspects of this change require explanation:
1. Intermediate and leaf nodes are allocated from the same allocator.
Since the value size of LPM trie is usually small, using a single
allocator reduces the memory overhead of the BPF memory allocator.
2. Leaf nodes are allocated before disabling IRQs. This handles cases
where leaf_size is large (e.g., > 4KB - 8) and updates require
intermediate node allocation. If leaf nodes were allocated in an
IRQ-disabled region, the free objects in the BPF memory allocator would
not be refilled in time and the intermediate node allocation might fail.
3. Paired migrate_{disable|enable}() calls for node alloc and free. The
BPF memory allocator uses per-CPU structs internally, so these paired
calls are necessary to guarantee correctness.
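The "allocate before the critical section, free after it" ordering described above can be illustrated with a userspace analogue. This is only a sketch under loud assumptions: it uses malloc() and a pthread mutex instead of the BPF memory allocator and trie->lock, and struct table, struct node and table_update() are hypothetical names that do not appear in the kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical list-based table standing in for the trie. */
struct node {
	int key;
	int value;
	struct node *next;
};

struct table {
	pthread_mutex_t lock;
	struct node *head;
};

/* Insert or overwrite; returns 0 on success, -1 if allocation fails. */
static int table_update(struct table *t, int key, int value)
{
	struct node *new_node, *n, *unused = NULL;

	/* Allocate up front so the allocator is never entered with the lock held. */
	new_node = malloc(sizeof(*new_node));
	if (!new_node)
		return -1;
	new_node->key = key;
	new_node->value = value;
	new_node->next = NULL;

	pthread_mutex_lock(&t->lock);
	for (n = t->head; n; n = n->next) {
		if (n->key == key) {
			/* In-place update: the preallocated node is not needed. */
			n->value = value;
			unused = new_node;
			break;
		}
	}
	if (!unused) {
		new_node->next = t->head;
		t->head = new_node;
	}
	pthread_mutex_unlock(&t->lock);

	/* Release the unused node only after dropping the lock. */
	free(unused);
	return 0;
}
```

The cost of this ordering is an allocation that is sometimes thrown away; in exchange, no allocator lock is ever taken inside the critical section.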
Hou Tao [Fri, 6 Dec 2024 11:06:18 +0000 (19:06 +0800)]
bpf: Fix exact match conditions in trie_get_next_key()
trie_get_next_key() uses node->prefixlen == key->prefixlen to identify
an exact match. However, this is incorrect because when the target key
doesn't fully match the found node (e.g., node->prefixlen != matchlen),
these two nodes may also have the same prefixlen. It returns the
expected result when the passed key exists in the trie. However, when a
recently-deleted key or nonexistent key is passed to
trie_get_next_key(), it may skip keys and return an incorrect result.
Fix it by using node->prefixlen == matchlen to identify exact matches.
When the condition is true after the search, it also implies
node->prefixlen equals key->prefixlen; otherwise, the search would
return NULL instead.
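The difference between the two conditions can be seen in a small standalone sketch. It is not the kernel implementation: struct prefix, match_len() and the two exact_by_*() helpers are hypothetical, keys are limited to 32 bits, and the sketch only compares one node with one key of equal prefixlen, which is the case where the old check goes wrong. In the real search the traversal itself guarantees the equal-length property mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit prefix; the real trie keys are byte arrays. */
struct prefix {
	uint32_t data;
	uint32_t prefixlen;
};

/* Leading bits shared by node and key, capped at the shorter prefixlen. */
static uint32_t match_len(const struct prefix *node, const struct prefix *key)
{
	uint32_t limit = node->prefixlen < key->prefixlen ?
			 node->prefixlen : key->prefixlen;
	uint32_t diff = node->data ^ key->data;
	uint32_t n = 0;

	assert(limit <= 32);
	while (n < limit && !(diff & (UINT32_C(1) << (31 - n))))
		n++;
	return n;
}

/* Old condition: equal lengths alone are treated as an exact match. */
static bool exact_by_prefixlen(const struct prefix *node,
			       const struct prefix *key)
{
	return node->prefixlen == key->prefixlen;
}

/* New condition: every bit of the node's prefix must actually match. */
static bool exact_by_matchlen(const struct prefix *node,
			      const struct prefix *key)
{
	return node->prefixlen == match_len(node, key);
}
```

For example, a node 192.168.1.0/24 and a nonexistent key 192.168.2.0/24 have the same prefixlen, so the old condition reports an exact match even though the two prefixes share only 22 leading bits.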
Hou Tao [Fri, 6 Dec 2024 11:06:17 +0000 (19:06 +0800)]
bpf: Handle in-place update for full LPM trie correctly
When an LPM trie is full, in-place updates of existing elements
incorrectly return -ENOSPC.
Fix this by deferring the check of trie->n_entries. For new insertions,
n_entries must not exceed max_entries. However, in-place updates are
allowed even when the trie is full.
Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241206110622.1161752-5-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
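Deferring the capacity check can be sketched with a tiny array-backed map. This is an illustration only: struct small_map, MAX_ENTRIES and small_map_update() are hypothetical, and the real fix operates on the trie and trie->n_entries rather than on an array.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ENTRIES 2	/* deliberately tiny so the map fills up quickly */

struct entry {
	int key;
	int value;
};

struct small_map {
	struct entry slots[MAX_ENTRIES];
	size_t n_entries;
};

static int small_map_update(struct small_map *m, int key, int value)
{
	size_t i;

	assert(m->n_entries <= MAX_ENTRIES);

	/* Look for an existing element first: overwriting needs no new slot. */
	for (i = 0; i < m->n_entries; i++) {
		if (m->slots[i].key == key) {
			m->slots[i].value = value;
			return 0;
		}
	}

	/* Only a genuinely new element is subject to the capacity check. */
	if (m->n_entries == MAX_ENTRIES)
		return -ENOSPC;

	m->slots[m->n_entries].key = key;
	m->slots[m->n_entries].value = value;
	m->n_entries++;
	return 0;
}
```

Checking n_entries before the lookup instead would reject the overwrite of an existing key on a full map, which is the behaviour this commit removes.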
Hou Tao [Fri, 6 Dec 2024 11:06:16 +0000 (19:06 +0800)]
bpf: Handle BPF_EXIST and BPF_NOEXIST for LPM trie
Add the currently missing handling for the BPF_EXIST and BPF_NOEXIST
flags. These flags can be specified by users and are relevant since LPM
trie supports exact matches during update.
Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241206110622.1161752-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
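The flag handling can be summarised as a small decision function. This is a sketch, not the patch itself: check_update_flags() and the SKETCH_* names are hypothetical. The numeric values mirror BPF_ANY, BPF_NOEXIST and BPF_EXIST from the BPF UAPI, and the error codes are assumed to follow the usual BPF map convention (-EEXIST and -ENOENT).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Local stand-ins for the update flags; values mirror the BPF UAPI. */
enum sketch_flags {
	SKETCH_ANY = 0,		/* create a new element or update an existing one */
	SKETCH_NOEXIST = 1,	/* create only if the element does not exist */
	SKETCH_EXIST = 2,	/* update only if the element already exists */
};

/*
 * Decide whether an update may proceed, given whether an exact match
 * for the key was found in the map.
 */
static int check_update_flags(bool exists, unsigned int flags)
{
	assert(flags <= SKETCH_EXIST);

	if (exists && flags == SKETCH_NOEXIST)
		return -EEXIST;
	if (!exists && flags == SKETCH_EXIST)
		return -ENOENT;
	return 0;
}
```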