Most of the remaining ARM board files in the kernel have no known users,
and we plan to remove those in early 2023.
For ep93xx, Alexander Sverdlin still has access to the edb93xx family
of reference boards, while Nikita Shubin has a ts7250 and is working on
a device tree conversion for those. Hartley Sweeten has a
MACH_VISION_EP9307 that is still in use.
This is a total of nine machine definitions that we will keep
around, but these are all similar machines and are defined in only
two board files. The other six board files now have a dependency on
CONFIG_UNUSED_BOARD_FILES to indicate that they are likely going away.
ARM: davinci: mark all ATAGS board files as unused
From an earlier discussion, it appears that the davinci da8xx machines
that are still functional have already been converted to DT, while the
remaining board files are only kept because nobody has stepped up to
remove them.
Mark all these boards as 'depends on UNUSED_BOARD_FILES' with the
plan to remove them in early 2023 after the next longterm supported
kernel is out.
Most of the remaining arm board files in the kernel are unused and will be
removed in early 2023 if no users step up. So far I got no user replies
about the orion5x and mv78xx0 machines, but these are still supported
in the default kernel of the Debian 'armel' (armv5 softfloat) distro,
and there is an active project on github that tries to keep some of
these machines working, and Mauri Sandberg is working on a DT conversion
for the D-Link DNS-323.
It appears the Debian-on-Buffalo project has not got the Terastation WXL
working in a few years, and the other mv78xx0 machines are just the
reference designs, so I assume none of these have remaining users.
For the Orion5x family, the same is probably true for its reference
implementations (RD88Fxxxxx, DB88F281) and the machines with less than
64MB of memory (WNR854T, WRT350N v2).
The remaining nine machines are now scheduled to be kept for at least
2023, hopefully to be replaced with DT based versions.
The mv78xx0_defconfig file needs to enable CONFIG_UNUSED_BOARD_FILES
to still build, while the other affected defconfig files lose the
specific boards.
Cc: Andrew Lunn <andrew@lunn.ch> Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Cc: Gregory Clement <gregory.clement@bootlin.com> Cc: Mauri Sandberg <maukka@ext.kapsi.fi> Link: https://github.com/1000001101000/Debian_on_Buffalo Signed-off-by: Arnd Bergmann <arnd@arndb.de>
ARM: pxa: add Kconfig dependencies for ATAGS based boards
Most of the traditional board files are no longer used by anyone and
will be removed next year, while the DT based machine support remains.
Adding a CONFIG_ATAGS dependency around all the board files means
that they now actaully get disabled when ATAGS support is left out,
and the individual boards that have no known users are marked
as depending on CONFIG_UNUSED_BOARD_FILES, with the plan to remove
them in early 2023 unless someone else shows interest.
Laurence de Bruxelles intends to work on converting the Spitz/Akita/Borzoi
family of Sharp Zaurus SL machines to DT, to make that easier those
remain for the moment.
In addition, the "Gumstix" machine is the one that is supported in
qemu with 256MB of RAM, which makes it particularly nice for testing,
I'm leaving it in hoping that someone can take care of converting it to
DT as well.
Finally, Marc Zyngier is still able to test the Zeus and Viper machines,
so these could be saved as well if anyone wants to conver them to DT.
This seems less likely, so I'm marking them as unused for the time being.
For the defconfig files, both the pxa3xx_defconfig and pxa_defconfig
now only enable the boards that are not marked as unused, while all the
other ones explicitly enable CONFIG_UNUSED_BOARD_FILES to still allow
building the kernels.
Cc: Robert Jarzmik <robert.jarzmik@free.fr> Cc: Daniel Mack <daniel@zonque.org> Cc: Laurence de Bruxelles <lfdebrux@gmail.com> Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Based on the recent mailing list discussion, most board file support
has no remaining users and can be scheduled for removal early next
year.
If a board is still found to have users, it will remain for this round
but users are encouraged to migrate to devicetree based booting where
possible.
The timing is meant to ensure the next longterm supported kernel
still contains all the board files, giving another year of support
for potential users that did not speak up and would otherwise be
stuck on the v5.15.y longterm kernel from 2021.
There are a total of eight platforms that only suppor ATAGS based boot
with board files but no devicetree booting.
For dove, the DT support is part of the mvebu platform, which shares
driver but no code in arch/arm.
Most of these will never get converted to DT, and the majority of the
board files appear to be entirely unused already. There are still known
users on a few machines, and there may be interest in converting some
omap1, ep93xx or footbridge machines over in the future.
For the moment, just add a Kconfig dependency to hide these platforms
completely when CONFIG_ATAGS is disabled, and reorder the priority
of the options: Rather than offering to turn ATAGS off for platforms
that have DT support, make it a top-level setting that determines
which platforms are visible.
The s3c24xx platform supports one machine with DT support, but it
cannot be built without also including ATAGS support, and the
entire platform is scheduled for removal, so leaving the entire
platform behind a dependency seems good enough.
All defconfig files should keep working, as the option remains default
enabled.
tcp: Fix a data-race around sysctl_tcp_invalid_ratelimit.
While reading sysctl_tcp_invalid_ratelimit, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 032ee4236954 ("tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacks") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: Fix a data-race around sysctl_tcp_min_rtt_wlen.
While reading sysctl_tcp_min_rtt_wlen, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: f672258391b4 ("tcp: track min RTT using windowed min-filter") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: Fix a data-race around sysctl_tcp_tso_rtt_log.
While reading sysctl_tcp_tso_rtt_log, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: 65466904b015 ("tcp: adjust TSO packet sizes based on min_rtt") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: Fix a data-race around sysctl_tcp_limit_output_bytes.
While reading sysctl_tcp_limit_output_bytes, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 46d3ceabd8d9 ("tcp: TCP Small Queues") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: Fix data-races around sysctl_tcp_workaround_signed_windows.
While reading sysctl_tcp_workaround_signed_windows, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 15d99e02baba ("[TCP]: sysctl to allow TCP window > 32767 sans wscale") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: Fix data-races around sysctl_tcp_no_ssthresh_metrics_save.
While reading sysctl_tcp_no_ssthresh_metrics_save, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 65e6d90168f3 ("net-tcp: Disable TCP ssthresh metrics cache by default") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
CONFIG_LEDS was replaced by CONFIG_NEW_LEDS over ten years ago with commit fa8bbb13ab49 ("ARM: use new LEDS CPU trigger stub to replace old one"),
but some defconfig files still reference it.
Replace it and its sub-options with the corresponding new versions.
Some of these machines may not actually have a new-style LED driver,
and I did not check them individually as most of the machines are
going away soon anyway.
Since commit 1515b186c235 ("ARM: make configuration of userspace
Thumb support an expert option"), CONFIG_THUMB cannot be disabled
unless one turns on CONFIG_EXPERT first.
This is probably for the better, so remove the statements that
turn it off.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
CONFIG_DEBUG_INFO is now implicitly selected if one picks one of the
explicit options that could be DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT,
DEBUG_INFO_DWARF4, DEBUG_INFO_DWARF5.
This was actually not what I had in mind when I suggested making
it a 'choice' statement, but it's too late to change again now,
and the Kconfig logic is more sensible in the new form.
Change any defconfig file that had CONFIG_DEBUG_INFO enabled
but did not pick DWARF4 or DWARF5 explicitly to now pick the toolchain
default.
Fixes: f9b3cd245784 ("Kconfig.debug: make DEBUG_INFO selectable from a choice") Acked-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
ARM: defconfig: remove stale CONFIG_ZBOOT_ROM entries
The default is always 0x0 after commit 39c3e304567a ("ARM: 8984/1:
Kconfig: set default ZBOOT_ROM_TEXT/BSS value to 0x0"), so any
defconfig file that has these two lines can now drop them to reduce
the diff against the 'make savedefconfig' version.
A lot of Kconfig options have changed over the years, and we tend
to not do a blind 'make defconfig' to refresh the files, to ensure
we catch options that should not have gone away.
I used some a bit of scripting to only rework the bits where an
option moved around in any of the defconfig files, without also
dropping any of the other lines, to make it clearer which options
we no longer have.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Acked-by: Neil Armstrong <narmstrong@baylibre.com> Acked-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
On pcb8291, can0 and the network driver share some of the GPIOs so only
1 device can be active. Therefore disable can0 as we want to enable the
network driver.
crypto: testmgr - some more fixes to RSA test vectors
Two more fixes:
* some test vectors in commit 79e6e2f3f3ff ("crypto: testmgr - populate
RSA CRT parameters in RSA test vectors") had misplaced commas, which
break the test and trigger KASAN warnings at least on x86-64
* pkcs1pad test vector did not have its CRT parameters
Fixes: 79e6e2f3f3ff ("crypto: testmgr - populate RSA CRT parameters in RSA test vectors") Reported-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Randy Dunlap [Fri, 15 Jul 2022 01:59:14 +0000 (18:59 -0700)]
crypto: rmd160 - fix Kconfig "its" grammar
Use the possessive "its" instead of the contraction "it's"
where appropriate.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-crypto@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto: keembay-ocs-ecc - Drop if with an always false condition
The remove callback is only called after probe completed successfully.
In this case platform_set_drvdata() was called with a non-NULL argument
and so ecc_dev is never NULL.
This is a preparation for making platform remove callbacks return void.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
arch_topology: Fix cache attributes detection in the CPU hotplug path
init_cpu_topology() is called only once at the boot and all the cache
attributes are detected early for all the possible CPUs. However when
the CPUs are hotplugged out, the cacheinfo gets removed. While the
attributes are added back when the CPUs are hotplugged back in as part
of CPU hotplug state machine, it ends up called quite late after the
update_siblings_masks() are called in the secondary_start_kernel()
resulting in wrong llc_sibling_masks.
Move the call to detect_cache_attributes() inside update_siblings_masks()
to ensure the cacheinfo is updated before the LLC sibling masks are
updated. This will fix the incorrect LLC sibling masks generated when
the CPUs are hotplugged out and hotplugged back in again.
ACPI: PPTT: Leave the table mapped for the runtime usage
Currently, everytime an information needs to be fetched from the PPTT,
the table is mapped via acpi_get_table() and unmapped after the use via
acpi_put_table() which is fine. However we do this at runtime especially
when the CPU is hotplugged out and plugged in back since we re-populate
the cache topology and other information.
However, with the support to fetch LLC information from the PPTT in the
cpuhotplug path which is executed in the atomic context, it is preferred
to avoid mapping and unmapping of the PPTT for every single use as the
acpi_get_table() might sleep waiting for a mutex.
In order to avoid the same, the table is needs to just mapped once on
the boot CPU and is never unmapped allowing it to be used at runtime
with out the hassle of mapping and unmapping the table.
Reported-by: Guenter Roeck <linux@roeck-us.net> Cc: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
--
Hi Rafael,
Sorry to bother you again on this PPTT changes. Guenter reported an issue
with lockdep enabled in -next that include my cacheinfo/arch_topology changes
to utilise LLC from PPTT in the CPU hotplug path.
Please ack the change once you are happy so that I can get it merged with
other fixes via Greg's tree.
cacheinfo: Use atomic allocation for percpu cache attributes
On couple of architectures like RISC-V and ARM64, we need to detect
cache attribues quite early during the boot when the secondary CPUs
start. So we will call detect_cache_attributes in the atomic context
and since use of normal allocation can sleep, we will end up getting
"sleeping in the atomic context" bug splat.
In order avoid that, move the allocation to use atomic version in
preparation to move the actual detection of cache attributes in the
CPU hotplug path which is atomic.
Merge tag 'iio-for-5.20b' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next
Jonathan writes:
2nd set of IIO new device support, cleanup etc for 5.20
Given the slight delay in the likely start of the merge cycle lets get
a few more things in for IIO. A few late breaking fixes also included.
Bulk in number of patches is mechanical conversion to new PM macros.
New device support
* npcm
- Add support for NPCM8XX with different resolution and voltage references
from previously supported parts.
Tree wide
* pm_ptr()/pm_sleep_ptr()/DEFINE_[SIMPLE/RUNTIME]_DEV_PM_OPS() conversions
- Convert all the low hanging fruit to the new macros. The remaining
cases will need more careful consideration.
Tidy-up, fixes and minor features.
* Documentation
- Fix duplicate ABI definitions and a missing blank line to squash all
remaining docs build issues in IIO.
* core
- Make iio_format_avail_range() handle non IIO_VAL_INT cases.
* core/trigger
- Move the setup of trigger->owner to allocation rather than registration.
There doesn't seem to be any advantage in doing this late and a few bugs
have occurred because of mis-balanced module reference counting.
- coding style fix-ups.
* tests
- Allow kunit tests to be built as a module.
* ad7949
- Fix a reversed error message.
* cio-dac
- Use structures for register map to improve readability.
* cros_ec
- Register fifo callback only after the sensor is registered. Closes
a theoretical race.
* hmc5843
- Duplicate word fix.
* isl29028
- Fix mixed devm_ and non devm_ for iio_device_register().
* max1027
- Fix missing unlocks in error paths.
* rm3100
- Let core code handle setting INDIO_BUFFER_TRIGGERED.
* sca3000
- Fix ordering and buffer size issue.
* sx_common
- Don't use the IIO device to get device properties when the parent struct
device is readily available.
- Allow the IIO core to connect up the firmware node.
* stx104
- Use structures for register map to improve readability.
* vf610-adc
- Add compatible entries for imx6ul and imx6sx with fallback to
fsl,vf610-adc.
* tag 'iio-for-5.20b' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (65 commits)
dt-bindings: iio: adc: Add compatible for MT8188
iio: light: isl29028: Fix the warning in isl29028_remove()
iio: accel: sca3300: Extend the trigger buffer from 16 to 32 bytes
iio: fix iio_format_avail_range() printing for none IIO_VAL_INT
iio: adc: max1027: unlock on error path in max1027_read_single_value()
iio: proximity: sx9324: add empty line in front of bullet list
iio: magnetometer: hmc5843: Remove duplicate 'the'
iio: magn: yas530: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
iio: magnetometer: ak8974: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
iio: light: veml6030: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
iio: light: vcnl4035: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
iio: light: vcnl4000: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
iio: light: tsl2591: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr()
iio: light: tsl2583: Use DEFINE_RUNTIME_DEV_PM_OPS and pm_ptr()
iio: light: isl29028: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr()
iio: light: gp2ap002: Switch to DEFINE_RUNTIME_DEV_PM_OPS and pm_ptr()
iio: adc: imx8qxp: Switch to DEFINE_RUNTIME_DEV_PM_OPS and pm_ptr()
iio: light: us5182: Switch from CONFIG_PM guards to pm_ptr() etc
iio: temperature: ltc2983: Switch to DEFINE_SIMPLE_DEV_PM_OPS() and pm_sleep_ptr()
iio: proximity: cros_ec_mkbp: Switch to DEFINE_SIMPLE_DEV_PM_OPS() and pm_sleep_ptr()
...
Rob Herring [Wed, 6 Jul 2022 21:19:33 +0000 (15:19 -0600)]
dt-bindings: mfd: stm32-timers: Move fixed string node names under 'properties'
Fixed string node names should be under 'properties' rather than
'patternProperties'. Additionally, without beginning and end of line
anchors, any prefix or suffix is allowed on the specified node name.
Move the stm32 timers 'counter' and 'timer' nodes to the 'properties'
section.
Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Fabrice Gasnier <fabrice.gasnier@foss.st.com> Signed-off-by: Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20220706211934.567432-1-robh@kernel.org
Dave Airlie [Fri, 22 Jul 2022 05:51:26 +0000 (15:51 +1000)]
Merge tag 'drm-intel-gt-next-2022-07-13' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
Driver uAPI changes:
- All related to the Small BAR support: (and all by Matt Auld)
* add probed_cpu_visible_size
* expose the avail memory region tracking
* apply ALLOC_GPU only by default
* add NEEDS_CPU_ACCESS hint
* tweak error capture on recoverable contexts
Driver highlights:
- Add Small BAR support (Matt)
- Add MeteorLake support (RK)
- Add support for LMEM PCIe resizable BAR (Akeem)
Driver important fixes:
- ttm related fixes (Matt Auld)
- Fix a performance regression related to waitboost (Chris)
- Fix GT resets (Chris)
Driver others:
- Adding GuC SLPC selftest (Vinay)
- Fix ADL-N GuC load (Daniele)
- Add platform workaround (Gustavo, Matt Roper)
- DG2 and ATS-M device ID updates (Matt Roper)
- Add VM_BIND doc rfc with uAPI documentation (Niranjana)
- Fix user-after-free in vma destruction (Thomas)
- Async flush of GuC log regions (Alan)
- Fixes in selftests (Chris, Dan, Andrzej)
- Convert to drm_dbg (Umesh)
- Disable OA sseu config param for newer hardware (Umesh)
- Multi-cast register steering changes (Matt Roper)
- Add lmem_bar_size modparam (Priyanka)
Merge branch 'New nf_conntrack kfuncs for insertion, changing timeout, status'
Kumar Kartikeya Dwivedi says:
====================
Introduce the following new kfuncs:
- bpf_{xdp,skb}_ct_alloc
- bpf_ct_insert_entry
- bpf_ct_{set,change}_timeout
- bpf_ct_{set,change}_status
The setting of timeout and status on allocated or inserted/looked up CT
is same as the ctnetlink interface, hence code is refactored and shared
with the kfuncs. It is ensured allocated CT cannot be passed to kfuncs
that expected inserted CT, and vice versa. Please see individual patches
for details.
* Introduce kfunc flags, rework verifier to work with them
* Add documentation for kfuncs
* Add comment explaining TRUSTED_ARGS kfunc flag (Alexei)
* Fix missing offset check for trusted arguments (Alexei)
* Change nf_conntrack test minimum delta value to 8
* Drop read-only PTR_TO_BTF_ID approach, use struct nf_conn___init (Alexei)
* Drop acquire release pair code that is no longer required (Alexei)
* Disable writes into nf_conn, use dedicated helpers (Florian, Alexei)
* Refactor and share ctnetlink code for setting timeout and status
* Do strict type matching on finding __ref suffix on argument to
prevent passing nf_conn___init as nf_conn (offset = 0, match on walk)
* Remove bpf_ct_opts parameter from bpf_ct_insert_entry
* Update selftests for new additions, add more negative tests
* add bpf_xdp_ct_add and bpf_ct_refresh_timeout kfunc helpers
* remove conntrack dependency from selftests
* add support for forcing kfunc args to be referenced and related selftests
Kumar Kartikeya Dwivedi (10):
bpf: Introduce 8-byte BTF set
tools/resolve_btfids: Add support for 8-byte BTF sets
bpf: Switch to new kfunc flags infrastructure
bpf: Add support for forcing kfunc args to be trusted
bpf: Add documentation for kfuncs
net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup
net: netfilter: Add kfuncs to set and change CT timeout
selftests/bpf: Add verifier tests for trusted kfunc args
selftests/bpf: Add negative tests for new nf_conntrack kfuncs
selftests/bpf: Fix test_verifier failed test in unprivileged mode
====================
selftests/bpf: Fix test_verifier failed test in unprivileged mode
Loading the BTF won't be permitted without privileges, hence only test
for privileged mode by setting the prog type. This makes the
test_verifier show 0 failures when unprivileged BPF is enabled.
Fixes: 41188e9e9def ("selftest/bpf: Test for use-after-free bug fix in inline_bpf_loop") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-14-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
selftests/bpf: Add verifier tests for trusted kfunc args
Make sure verifier rejects the bad cases and ensure the good case keeps
working. The selftests make use of the bpf_kfunc_call_test_ref kfunc
added in the previous patch only for verification.
Lorenzo Bianconi [Thu, 21 Jul 2022 13:42:41 +0000 (15:42 +0200)]
net: netfilter: Add kfuncs to set and change CT status
Introduce bpf_ct_set_status and bpf_ct_change_status kfunc helpers in
order to set nf_conn field of allocated entry or update nf_conn status
field of existing inserted entry. Use nf_ct_change_status_common to
share the permitted status field changes between netlink and BPF side
by refactoring ctnetlink_change_status.
It is required to introduce two kfuncs taking nf_conn___init and nf_conn
instead of sharing one because KF_TRUSTED_ARGS flag causes strict type
checking. This would disallow passing nf_conn___init to kfunc taking
nf_conn, and vice versa. We cannot remove the KF_TRUSTED_ARGS flag as we
only want to accept refcounted pointers and not e.g. ct->master.
Hence, bpf_ct_set_* kfuncs are meant to be used on allocated CT, and
bpf_ct_change_* kfuncs are meant to be used on inserted or looked up
CT entry.
net: netfilter: Add kfuncs to set and change CT timeout
Introduce bpf_ct_set_timeout and bpf_ct_change_timeout kfunc helpers in
order to change nf_conn timeout. This is same as ctnetlink_change_timeout,
hence code is shared between both by extracting it out to
__nf_ct_change_timeout. It is also updated to return an error when it
sees IPS_FIXED_TIMEOUT_BIT bit in ct->status, as that check was missing.
It is required to introduce two kfuncs taking nf_conn___init and nf_conn
instead of sharing one because KF_TRUSTED_ARGS flag causes strict type
checking. This would disallow passing nf_conn___init to kfunc taking
nf_conn, and vice versa. We cannot remove the KF_TRUSTED_ARGS flag as we
only want to accept refcounted pointers and not e.g. ct->master.
Apart from this, bpf_ct_set_timeout is only called for newly allocated
CT so it doesn't need to inspect the status field just yet. Sharing the
helpers even if it was possible would make timeout setting helper
sensitive to order of setting status and timeout after allocation.
Hence, bpf_ct_set_* kfuncs are meant to be used on allocated CT, and
bpf_ct_change_* kfuncs are meant to be used on inserted or looked up
CT entry.
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-9-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Lorenzo Bianconi [Thu, 21 Jul 2022 13:42:39 +0000 (15:42 +0200)]
net: netfilter: Add kfuncs to allocate and insert CT
Introduce bpf_xdp_ct_alloc, bpf_skb_ct_alloc and bpf_ct_insert_entry
kfuncs in order to insert a new entry from XDP and TC programs.
Introduce bpf_nf_ct_tuple_parse utility routine to consolidate common
code.
We extract out a helper __nf_ct_set_timeout, used by the ctnetlink and
nf_conntrack_bpf code, extract it out to nf_conntrack_core, so that
nf_conntrack_bpf doesn't need a dependency on CONFIG_NF_CT_NETLINK.
Later this helper will be reused as a helper to set timeout of allocated
but not yet inserted CT entry.
The allocation functions return struct nf_conn___init instead of
nf_conn, to distinguish allocated CT from an already inserted or looked
up CT. This is later used to enforce restrictions on what kfuncs
allocated CT can be used with.
net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup
Move common checks inside the common function, and maintain the only
difference the two being how to obtain the struct net * from ctx.
No functional change intended.
As the usage of kfuncs grows, we are starting to form consensus on the
kinds of attributes and annotations that kfuncs can have. To better help
developers make sense of the various options available at their disposal
to present an unstable API to the BPF users, document the various kfunc
flags and annotations, their expected usage, and explain the process of
defining and registering a kfunc set.
bpf: Add support for forcing kfunc args to be trusted
Teach the verifier to detect a new KF_TRUSTED_ARGS kfunc flag, which
means each pointer argument must be trusted, which we define as a
pointer that is referenced (has non-zero ref_obj_id) and also needs to
have its offset unchanged, similar to how release functions expect their
argument. This allows a kfunc to receive pointer arguments unchanged
from the result of the acquire kfunc.
This is required to ensure that kfunc that operate on some object only
work on acquired pointers and not normal PTR_TO_BTF_ID with same type
which can be obtained by pointer walking. The restrictions applied to
release arguments also apply to trusted arguments. This implies that
strict type matching (not deducing type by recursively following members
at offset) and OBJ_RELEASE offset checks (ensuring they are zero) are
used for trusted pointer arguments.
Instead of populating multiple sets to indicate some attribute and then
researching the same BTF ID in them, prepare a single unified BTF set
which indicates whether a kfunc is allowed to be called, and also its
attributes if any at the same time. Now, only one call is needed to
perform the lookup for both kfunc availability and its attributes.
tools/resolve_btfids: Add support for 8-byte BTF sets
A flag is a 4-byte symbol that may follow a BTF ID in a set8. This is
used in the kernel to tag kfuncs in BTF sets with certain flags. Add
support to adjust the sorting code so that it passes size as 8 bytes
for 8-byte BTF sets.
Introduce support for defining flags for kfuncs using a new set of
macros, BTF_SET8_START/BTF_SET8_END, which define a set which contains
8 byte elements (each of which consists of a pair of BTF ID and flags),
using a new BTF_ID_FLAGS macro.
This will be used to tag kfuncs registered for a certain program type
as acquire, release, sleepable, ret_null, etc. without having to create
more and more sets which was proving to be an unscalable solution.
Now, when looking up whether a kfunc is allowed for a certain program,
we can also obtain its kfunc flags in the same call and avoid further
lookups.
The resolve_btfids change is split into a separate patch.
Jaehee Park [Wed, 20 Jul 2022 18:36:32 +0000 (14:36 -0400)]
net: ipv6: avoid accepting values greater than 2 for accept_untracked_na
The accept_untracked_na sysctl changed from a boolean to an integer
when a new knob '2' was added. This patch provides a safeguard to avoid
accepting values that are not defined in the sysctl. When setting a
value greater than 2, the user will get an 'invalid argument' warning.
Fixes: aaa5f515b16b ("net: ipv6: new accept_untracked_na option to accept na only if in-network") Signed-off-by: Jaehee Park <jhpark1013@gmail.com> Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Suggested-by: Roopa Prabhu <roopa@nvidia.com> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20220720183632.376138-1-jhpark1013@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 20 Jul 2022 11:20:57 +0000 (14:20 +0300)]
net: pcs: xpcs: propagate xpcs_read error to xpcs_get_state_c37_sgmii
While phylink_pcs_ops :: pcs_get_state does return void, xpcs_get_state()
does check for a non-zero return code from xpcs_get_state_c37_sgmii()
and prints that as a message to the kernel log.
However, a non-zero return code from xpcs_read() is translated into
"return false" (i.e. zero as int) and the I/O error is therefore not
printed. Fix that.
Jakub Kicinski [Wed, 20 Jul 2022 20:37:00 +0000 (13:37 -0700)]
tls: rx: release the sock lock on locking timeout
Eric reports we should release the socket lock if the entire
"grab reader lock" operation has failed. The callers assume
they don't have to release it or otherwise unwind.
Reported-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+16e72110feb2b653ef27@syzkaller.appspotmail.com Fixes: 4cbc325ed6b4 ("tls: rx: allow only one reader at a time") Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220720203701.2179034-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 22 Jul 2022 01:45:34 +0000 (18:45 -0700)]
Merge tag 'linux-can-next-for-5.20-20220721' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
can-next 2022-07-21
The patch is by Vincent Mailhol and fixes a use on an uninitialized
variable in the pch_can driver (introduced in last pull request to
net-next).
* tag 'linux-can-next-for-5.20-20220721' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
can: pch_can: pch_can_error(): initialize errc before using it
====================
Paul E. McKenney [Fri, 22 Jul 2022 00:43:16 +0000 (17:43 -0700)]
Merge branches 'doc.2022.06.21a', 'fixes.2022.07.19a', 'nocb.2022.07.19a', 'poll.2022.07.21a', 'rcu-tasks.2022.06.21a' and 'torture.2022.06.21a' into HEAD
Zqiang [Wed, 18 May 2022 11:43:10 +0000 (19:43 +0800)]
rcu: Add irqs-disabled indicator to expedited RCU CPU stall warnings
If a CPU has interrupts disabled continuously starting before the
beginning of a given expedited RCU grace period, that CPU will not
execute that grace period's IPI handler. This will in turn mean
that the ->cpu_no_qs.b.exp field in that CPU's rcu_data structure
will continue to contain the boolean value false.
Knowing whether or not a CPU has had interrupts disabled can be helpful
when debugging an expedited RCU CPU stall warning, so this commit
adds a "D" indicator expedited RCU CPU stall warnings that signifies
that the corresponding CPU has had interrupts disabled throughout.
rcu: Put panic_on_rcu_stall() after expedited RCU CPU stall warnings
When a normal RCU CPU stall warning is encountered with the
panic_on_rcu_stall sysfs variable is set, the system panics only after
the stall warning is printed. But when an expedited RCU CPU stall
warning is encountered with the panic_on_rcu_stall sysfs variable is
set, the system panics first, thus never printing the stall warning.
This commit therefore brings the expedited stall warning into line with
the normal stall warning by printing first and panicking afterwards.
Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Fri, 15 Apr 2022 17:55:42 +0000 (10:55 -0700)]
rcu: Add polled expedited grace-period primitives
This commit adds expedited grace-period functionality to RCU's polled
grace-period API, adding start_poll_synchronize_rcu_expedited() and
cond_synchronize_rcu_expedited(), which are similar to the existing
start_poll_synchronize_rcu() and cond_synchronize_rcu() functions,
respectively.
Note that although start_poll_synchronize_rcu_expedited() can be invoked
very early, the resulting expedited grace periods are not guaranteed
to start until after workqueues are fully initialized. On the other
hand, both synchronize_rcu() and synchronize_rcu_expedited() can also
be invoked very early, and the resulting grace periods will be taken
into account as they occur.
Paul E. McKenney [Thu, 14 Apr 2022 16:09:11 +0000 (09:09 -0700)]
rcutorture: Verify that polled GP API sees synchronous grace periods
This commit causes rcu_torture_writer() to use WARN_ON_ONCE() to check
that the cookie returned by the current RCU flavor's ->get_gp_state()
function (get_state_synchronize_rcu() for vanilla RCU) causes that
flavor's ->poll_gp_state function (poll_state_synchronize_rcu() for
vanilla RCU) to unconditionally return true.
Note that a pair calls to synchronous grace-period-wait functions are
used. This is necessary to account for partially overlapping normal and
expedited grace periods aligning in just the wrong way with polled API
invocations, which can cause those polled API invocations to ignore one or
the other of those partially overlapping grace periods. It is unlikely
that this sort of ignored grace period will be a problem in production,
but rcutorture can make it happen quite within a few tens of seconds.
This commit is in preparation for polled expedited grace periods.
[ paulmck: Apply feedback from Frederic Weisbecker. ]
Paul E. McKenney [Thu, 14 Apr 2022 18:49:58 +0000 (11:49 -0700)]
rcu: Make Tiny RCU grace periods visible to polled APIs
This commit makes the Tiny RCU implementation of synchronize_rcu()
increment the rcu_ctrlblk.gp_seq counter, thus making both
synchronize_rcu() and synchronize_rcu_expedited() visible to
get_state_synchronize_rcu() and friends.
This situation is counter-intuitive and user-unfriendly. After all, there
really was a perfectly valid full grace period right after the call to
get_state_synchronize_rcu(), so why shouldn't poll_state_synchronize_rcu()
know about it?
This commit therefore makes the polled grace-period API aware of expedited
grace periods in addition to the normal grace periods that it is already
aware of. With this change, the above code is guaranteed not to splat.
Please note that the above code can still splat due to counter wrap on the
one hand and situations involving partially overlapping normal/expedited
grace periods on the other. On 64-bit systems, the second is of course
much more likely than the first. It is possible to modify this approach
to prevent overlapping grace periods from causing splats, but only at
the expense of greatly increasing the probability of counter wrap, as
in within milliseconds on 32-bit systems and within minutes on 64-bit
systems.
This commit is in preparation for polled expedited grace periods.
Paul E. McKenney [Thu, 14 Apr 2022 00:46:15 +0000 (17:46 -0700)]
rcu: Switch polled grace-period APIs to ->gp_seq_polled
This commit switches the existing polled grace-period APIs to use a
new ->gp_seq_polled counter in the rcu_state structure. An additional
->gp_seq_polled_snap counter in that same structure allows the normal
grace period kthread to interact properly with the !SMP !PREEMPT fastpath
through synchronize_rcu(). The first of the two to note the end of a
given grace period will make knowledge of this transition available to
the polled API.
This commit is in preparation for polled expedited grace periods.
[ paulmck: Fix use of rcu_state.gp_seq_polled to start normal grace period. ]
Some cases need multiple nop instructions and arm64 already has a
nice helper for not needing to write all of them out but instead
use a helper to add n nops.
So add a similar thing to riscv and convert the T-Head PMA
alternative to use it.
* 'riscv-nops' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git:
riscv: convert the t-head pbmt errata to use the __nops macro
riscv: introduce nops and __nops macros for NOP sequences
Ben Widawsky [Tue, 8 Jun 2021 17:28:34 +0000 (10:28 -0700)]
cxl/region: Add region creation support
CXL 2.0 allows for dynamic provisioning of new memory regions (system
physical address resources like "System RAM" and "Persistent Memory").
Whereas DDR and PMEM resources are conveyed statically at boot, CXL
allows for assembling and instantiating new regions from the available
capacity of CXL memory expanders in the system.
Sysfs with an "echo $region_name > $create_region_attribute" interface
is chosen as the mechanism to initiate the provisioning process. This
was chosen over ioctl() and netlink() to keep the configuration
interface entirely in a pseudo-fs interface, and it was chosen over
configfs since, aside from this one creation event, the interface is
read-mostly. I.e. configfs supports cases where an object is designed to
be provisioned each boot, like an iSCSI storage target, and CXL region
creation is mostly for PMEM regions which are created usually once
per-lifetime of a server instance. This is an improvement over nvdimm
that pre-created "seed" devices that tended to confuse users looking to
determine which devices are active and which are idle.
Recall that the major change that CXL brings over previous persistent
memory architectures is the ability to dynamically define new regions.
Compare that to drivers like 'nfit' where the region configuration is
statically defined by platform firmware.
Regions are created as a child of a root decoder that encompasses an
address space with constraints. When created through sysfs, the root
decoder is explicit. When created from an LSA's region structure a root
decoder will possibly need to be inferred by the driver.
Upon region creation through sysfs, a vacant region is created with a
unique name. Regions have a number of attributes that must be configured
before the region can be bound to the driver where HDM decoder program
is completed.
An example of creating a new region:
- Allocate a new region name:
region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region)
- Create a new region by name:
while
region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region)
! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region
do true; done
- Region now exists in sysfs:
stat -t /sys/bus/cxl/devices/decoder0.0/$region
- Delete the region, and name:
echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region
Dan Williams [Fri, 20 May 2022 20:41:24 +0000 (13:41 -0700)]
resource: Introduce alloc_free_mem_region()
The core of devm_request_free_mem_region() is a helper that searches for
free space in iomem_resource and performs __request_region_locked() on
the result of that search. The policy choices of the implementation
conform to what CONFIG_DEVICE_PRIVATE users want which is memory that is
immediately marked busy, and a preference to search for the first-fit
free range in descending order from the top of the physical address
space.
CXL has a need for a similar allocator, but with the following tweaks:
1/ Search for free space in ascending order
2/ Search for free space relative to a given CXL window
3/ 'insert' rather than 'request' the new resource given downstream
drivers from the CXL Region driver (like the pmem or dax drivers) are
responsible for request_mem_region() when they activate the memory
range.
Rework __request_free_mem_region() into get_free_mem_region() which
takes a set of GFR_* (Get Free Region) flags to control the allocation
policy (ascending vs descending), and "busy" policy (insert_resource()
vs request_region()).
As part of the consolidation of the legacy GFR_REQUEST_REGION case with
the new default of just inserting a new resource into the free space
some minor cleanups like not checking for NULL before calling
devres_free() (which does its own check) is included.
Dan Williams [Tue, 7 Jun 2022 17:35:39 +0000 (10:35 -0700)]
cxl/mem: Enumerate port targets before adding endpoints
The port scanning algorithm in devm_cxl_enumerate_ports() walks up the
topology and adds cxl_port objects starting from the root down to the
endpoint. When those ports are initially created they know all their
dports, but they do not know the downstream cxl_port instance that
represents the next descendant in the topology. Rework create_endpoint()
into devm_cxl_add_endpoint() that enumerates the downstream cxl_port
topology into each port's 'struct cxl_ep' record for each endpoint it
that the port is an ancestor.
Ben Widawsky [Sun, 10 Apr 2022 22:26:13 +0000 (15:26 -0700)]
cxl/hdm: Add sysfs attributes for interleave ways + granularity
The region provisioning flow involves selecting interleave ways +
granularity settings for a region, and then programming the decoder
topology to meet those constraints, if possible. For example, root
decoders set the minimum interleave ways + granularity for any hosted
regions.
Given decoder programming is not atomic and collisions can occur between
multiple requesting regions userspace will be responsible for conflict
resolution and it needs these attributes to make those decisions.
Dan Williams [Sat, 28 May 2022 03:51:19 +0000 (20:51 -0700)]
cxl/port: Move dport tracking to an xarray
Reduce the complexity and the overhead of walking the topology to
determine endpoint connectivity to root decoder interleave
configurations.
Note that cxl_detach_ep(), after it determines that the last @ep has
departed and decides to delete the port, now needs to walk the dport
array with the device_lock() held to remove entries. Previously
list_splice_init() could be used atomically delete all dport entries at
once and then perform entry tear down outside the lock. There is no
list_splice_init() equivalent for the xarray.
Dan Williams [Fri, 27 May 2022 17:58:26 +0000 (10:58 -0700)]
cxl/port: Move 'cxl_ep' references to an xarray per port
In preparation for region provisioning that needs to walk the topology
by endpoints, use an xarray to record endpoint interest in a given port.
In addition to being more space and time efficient it also reduces the
complexity of the implementation by moving locking internal to the
xarray implementation. It also allows for a single cxl_ep reference to
be recorded in multiple xarrays.
Dan Williams [Fri, 27 May 2022 17:57:01 +0000 (10:57 -0700)]
cxl/port: Record parent dport when adding ports
At the time that cxl_port instances are being created, cache the dport
from the parent port that points to this new child port. This will be
useful for region provisioning when walking the tree to calculate
decoder targets, and saves rewalking the dport list after the fact to
build this information.
Dan Williams [Fri, 27 May 2022 07:56:59 +0000 (00:56 -0700)]
cxl/port: Record dport in endpoint references
Recall that the primary role of the cxl_mem driver is to probe if the
given endpoint is connected to a CXL port topology. In that process it
walks its device ancestry to its PCI root port. If that root port is
also a CXL root port then the probe process adds cxl_port object
instances at switch in the path between to the root and the endpoint. As
those cxl_port instances are added, or if a previous enumeration
attempt already created the port, a 'struct cxl_ep' instance is
registered with that port to track the endpoints interested in that
port.
At the time the cxl_ep is registered the downstream egress path from the
port to the endpoint is known. Take the opportunity to record that
information as it will be needed for dynamic programming of decoder
targets during region provisioning.
Dan Williams [Tue, 24 May 2022 01:02:30 +0000 (18:02 -0700)]
cxl/hdm: Add support for allocating DPA to an endpoint decoder
The region provisioning flow will roughly follow a sequence of:
1/ Allocate DPA to a set of decoders
2/ Allocate HPA to a region
3/ Associate decoders with a region and validate that the DPA allocations
and topologies match the parameters of the region.
For now, this change (step 1) arranges for DPA capacity to be allocated
and deleted from non-committed decoders based on the decoder's mode /
partition selection. Capacity is allocated from the lowest DPA in the
partition and any 'pmem' allocation blocks out all remaining ram
capacity in its 'skip' setting. DPA allocations are enforced in decoder
instance order. I.e. decoder N + 1 always starts at a higher DPA than
instance N, and deleting allocations must proceed from the
highest-instance allocated decoder to the lowest.
Dan Williams [Tue, 24 May 2022 19:04:58 +0000 (12:04 -0700)]
cxl/hdm: Track next decoder to allocate
The CXL specification enforces that endpoint decoders are committed in
hw instance id order. In preparation for adding dynamic DPA allocation,
record the hw instance id in endpoint decoders, and enforce allocations
to occur in hw instance id order.
Dan Williams [Mon, 23 May 2022 19:15:35 +0000 (12:15 -0700)]
cxl/hdm: Add 'mode' attribute to decoder objects
Recall that the Device Physical Address (DPA) space of a CXL Memory
Expander is potentially partitioned into a volatile and persistent
portion. A decoder maps a Host Physical Address (HPA) range to a DPA
range and that translation depends on the value of all previous (lower
instance number) decoders before the current one.
In preparation for allowing dynamic provisioning of regions, decoders
need an ABI to indicate which DPA partition a decoder targets. This ABI
needs to be prepared for the possibility that some other agent committed
and locked a decoder that spans the partition boundary.
Add 'decoderX.Y/mode' to endpoint decoders that indicates which
partition 'ram' / 'pmem' the decoder targets, or 'mixed' if the decoder
currently spans the partition boundary.
Dan Williams [Fri, 22 Jul 2022 00:19:12 +0000 (17:19 -0700)]
cxl/hdm: Enumerate allocated DPA
In preparation for provisioning CXL regions, add accounting for the DPA
space consumed by existing regions / decoders. Recall, a CXL region is a
memory range comprised from one or more endpoint devices contributing a
mapping of their DPA into HPA space through a decoder.
Record the DPA ranges covered by committed decoders at initial probe of
endpoint ports relative to a per-device resource tree of the DPA type
(pmem or volatile-ram).
The cxl_dpa_rwsem semaphore is introduced to globally synchronize DPA
state across all endpoints and their decoders at once. The vast majority
of DPA operations are reads as region creation is expected to be as rare
as disk partitioning and volume creation. The device_lock() for this
synchronization is specifically avoided for concern of entangling with
sysfs attribute removal.
Heiko Stuebner [Tue, 7 Jun 2022 14:30:58 +0000 (16:30 +0200)]
riscv: introduce nops and __nops macros for NOP sequences
NOP sequences tend to get used for padding out alternative sections
This change adds macros for generating these sequences as both inline
asm blocks, but also as strings suitable for embedding in other asm
blocks directly.
It essentially mimics similar functionality from arm64 introduced by
Wil Deacon in commit f99a250cb6a3 ("arm64: barriers: introduce nops
and __nops macros for NOP sequences").