tcp: Fix data-races around sysctl_tcp_slow_start_after_idle.
While reading sysctl_tcp_slow_start_after_idle, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 35089bb203f4 ("[TCP]: Add tcp_slow_start_after_idle sysctl.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
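A minimal userspace sketch of the reader-side pattern follows. It assumes a simplified `READ_ONCE()` defined as a single volatile load (the kernel's real macro is defined in its own headers and does more), and `sysctl_slow_start_after_idle` and `should_restart_cwnd()` are hypothetical stand-ins, not the actual TCP code. The point is that the tunable is loaded exactly once into a local, so the compiler can neither tear the load nor re-read a concurrently changed value later in the function.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the kernel macro: a single volatile load. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical tunable that a sysctl write may change at any time. */
static int sysctl_slow_start_after_idle = 1;

/* Hypothetical reader: restart the congestion window only if the
 * tunable is enabled and the connection idled longer than the RTO.
 * The tunable is loaded exactly once into a local variable. */
static bool should_restart_cwnd(unsigned int idle_ms, unsigned int rto_ms)
{
	int enabled = READ_ONCE(sysctl_slow_start_after_idle);

	if (!enabled)
		return false;
	return idle_ms > rto_ms;
}
```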
tcp: Fix a data-race around sysctl_tcp_thin_linear_timeouts.
While reading sysctl_tcp_thin_linear_timeouts, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 36e31b0af587 ("net: TCP thin linear timeouts")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While reading sysctl_tcp_recovery, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes: 4f41b1c58a32 ("tcp: use RACK to detect losses")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: Fix a data-race around sysctl_tcp_early_retrans.
While reading sysctl_tcp_early_retrans, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: eed530b6c676 ("tcp: early retransmit")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
udp: Fix a data-race around sysctl_udp_l3mdev_accept.
While reading sysctl_udp_l3mdev_accept, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: 63a6fff353d0 ("net: Avoid receiving packets with an l3mdev on unbound UDP sockets")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sysctl_ip_prot_sock is accessed concurrently, and there is always a chance
of data-race. So, all readers and writers need some basic protection to
avoid load/store-tearing.
Fixes: 4548b683b781 ("Introduce a sysctl that modifies the value of PROT_SOCK.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
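For the writer side, a similar sketch pairs `WRITE_ONCE()` in a hypothetical sysctl handler with `READ_ONCE()` in a hypothetical bind-time check. Both macros are simplified volatile-access stand-ins for the kernel ones, and `ip_prot_sock`, `set_prot_sock()` and `port_needs_privilege()` are illustrative names, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified userspace stand-ins for the kernel macros. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical value: ports below it require extra privilege. */
static int ip_prot_sock = 1024;

/* Writer side, e.g. a sysctl handler: one untorn store. */
static void set_prot_sock(int new_value)
{
	WRITE_ONCE(ip_prot_sock, new_value);
}

/* Reader side, e.g. a bind() path: one untorn load. */
static bool port_needs_privilege(unsigned short port)
{
	return port < READ_ONCE(ip_prot_sock);
}
```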
ipv4: Fix data-races around sysctl_fib_multipath_hash_fields.
While reading sysctl_fib_multipath_hash_fields, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: ce5c9c20d364 ("ipv4: Add a sysctl to control multipath hash fields")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipv4: Fix data-races around sysctl_fib_multipath_hash_policy.
While reading sysctl_fib_multipath_hash_policy, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: bf4e0a3db97e ("net: ipv4: add support for ECMP hash policy choice")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipv4: Fix a data-race around sysctl_fib_multipath_use_neigh.
While reading sysctl_fib_multipath_use_neigh, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: a6db4494d218 ("net: ipv4: Consider failed nexthops in multipath routes")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pierre Morel [Thu, 14 Jul 2022 10:18:23 +0000 (12:18 +0200)]
KVM: s390: guest support for topology function
We report a topology change to the guest for any CPU hotplug.
The reporting to the guest is done using the Multiprocessor
Topology-Change-Report (MTCR) bit of the utility entry in the guest's
SCA which will be cleared during the interpretation of PTF.
On every vCPU creation we set the MTCR bit to let the guest know, the
next time it uses the PTF instruction with command 2, that the
topology changed and that it should use the STSI(15.1.x) instruction
to get the topology details.
STSI(15.1.x) gives information on the CPU configuration topology.
Let's accept the interception of STSI with the function code 15 and
let the userland part of the hypervisor handle it when userland
supports the CPU Topology facility.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
Reviewed-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20220714101824.101601-2-pmorel@linux.ibm.com
Message-Id: <20220714101824.101601-2-pmorel@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Pierre Morel [Wed, 4 May 2022 12:29:08 +0000 (14:29 +0200)]
KVM: s390: Cleanup ipte lock access and SIIF facility checks
We can check if SIIF is enabled by testing the sclp_info struct
instead of testing the sie control block eca variable as that
facility is always enabled if available.
Also let's clean up all the ipte related struct member accesses
which currently happen by referencing the KVM struct via the
VCPU struct.
Making the KVM struct the parameter to the ipte_* functions
removes one level of indirection which makes the code more readable.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
Link: https://lore.kernel.org/all/20220711084148.25017-2-pmorel@linux.ibm.com/
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Ben Dooks [Tue, 19 Jul 2022 08:52:00 +0000 (09:52 +0100)]
reset: reset-simple should depend on HAS_IOMEM
The reset-simple driver does not build on all architectures as it requires
devm_ioremap_resource(), which is only built when CONFIG_HAS_IOMEM is enabled
in the kernel. Fix the following error by depending on CONFIG_HAS_IOMEM:
drivers/reset/reset-simple.o: in function `reset_simple_probe':
reset-simple.c:(.text+0x3aa): undefined reference to `devm_ioremap_resource'
Fixes: 18d1909be345 ("reset: allow building of reset simple driver if expert config selected")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ben Dooks <ben.dooks@sifive.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://lore.kernel.org/r/20220719085200.203688-1-ben.dooks@sifive.com
In commit c6f2a617a0a8 ("can: mcp251xfd: add support for mcp251863")
support for the mcp251863 was added. However it was not taken into
account that the auto detection of the chip model cannot distinguish
between mcp2518fd and mcp251863 and would lead to a warning message if
the firmware specifies a mcp251863.
Fix auto detection: if a mcp2518fd compatible chip is found, keep the
mcp251863 if it is specified by firmware; otherwise use mcp2518fd.
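The auto-detection rule above can be sketched as a small decision function. The enum values and `resolve_model()` are hypothetical, not the driver's identifiers, and any mismatch warning the driver may print is not modelled.

```c
#include <assert.h>

/* Hypothetical model identifiers; the driver's own enum differs. */
enum chip_model { MODEL_MCP2517FD, MODEL_MCP2518FD, MODEL_MCP251863 };

/* fw_model: what the firmware (e.g. DT) specifies.
 * detected: what probing the hardware reports. Probing cannot tell an
 * mcp2518fd from an mcp251863, so MODEL_MCP2518FD here really means
 * "mcp2518fd compatible". */
static enum chip_model resolve_model(enum chip_model fw_model,
				     enum chip_model detected)
{
	if (detected == MODEL_MCP2518FD && fw_model == MODEL_MCP251863)
		return MODEL_MCP251863;	/* keep what firmware specified */
	return detected;		/* otherwise trust the probe */
}
```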
Clément Léger [Mon, 6 Jun 2022 14:57:01 +0000 (16:57 +0200)]
ARM: at91: setup outer cache .write_sec() callback if needed
When running under OP-TEE, the L2 cache is configured by OP-TEE and the
sam platform code does not allow any modification yet. Set up a dummy
.write_sec() callback to avoid triggering exceptions when Linux tries
to modify the L2 cache configuration.
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
[claudiu.beznea: keep .init_early populated only for SAMA5D2, remove
sam_secure_init() from sama5d2_init() as it is also called in
sama5_secure_cache_init()]
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20220606145701.185552-3-clement.leger@bootlin.com
Clément Léger [Mon, 6 Jun 2022 14:57:00 +0000 (16:57 +0200)]
ARM: at91: add sam_linux_is_optee_available() function
Add sam_linux_is_optee_available(), which reports whether OP-TEE is
available to Linux. This function is used by code that needs to
know whether we are running with OP-TEE available or not.
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
[claudiu.beznea: edit commit title and message, renamed
sam_linux_is_in_normal_world() into sam_linux_is_optee_available()]
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20220606145701.185552-2-clement.leger@bootlin.com
====================
can: error: set of fixes and improvement on txerr and rxerr reporting
This series is a collection of patches targeting the CAN error
counter. The series is split into three blocks (only loosely related to
each other).
Several drivers use the data[6] and data[7] fields (both of type u8)
of the CAN error frame to report those values. However, the maximum
value a u8 can hold is 255, and the error counter can exceed this value
if bus-off occurs. As such, the first nine patches of this
series make sure that no driver tries to report txerr or rxerr through
the CAN error frame when the bus-off state is reached.
can_frame::data[5..7] are defined as being "controller
specific". Controller specific behaviors are not desirable
(they cause portability issues). The tenth patch of this series specifies how
can_frame::data[5..7] should be used and removes any "controller
specific" freedom. The eleventh patch adds a flag to notify through
can_frame::can_id that data[6..7] were populated (in order to be
consistent with other fields).
Finally, the twelfth and last patch adds three macros specifying
the different error counter thresholds, which so far were hard-coded as
magic numbers in the drivers.
N.B.:
* patches 1 to 10 are for net (stable).
* patches 11 and 12 are for net-next (but depend on patches 1 to 10).
** Changelog **
v1 -> v2: https://lore.kernel.org/all/20220712153157.83847-1-mailhol.vincent@wanadoo.fr
* Fix typo in patch #10: data[7] of CAN error frames is for the RX
error counter, not the TX one (this is literally a one byte
change).
====================
As discussed, take the whole series via can-next -> net-next.
Eric Biggers [Tue, 19 Jul 2022 03:04:15 +0000 (03:04 +0000)]
crypto: lib - add module license to libsha1
libsha1 can be a module, so it needs a MODULE_LICENSE.
Fixes: ec8f7f4821d5 ("crypto: lib - make the sha1 library optional")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Vincent Mailhol [Tue, 19 Jul 2022 14:35:50 +0000 (23:35 +0900)]
can: error: add definitions for the different CAN error thresholds
Currently, drivers are using magic numbers to derive the CAN error
states from the error counter. Add three macro declarations to
remediate this.
For reference, the error-active, error-passive and bus-off states are defined
in ISO 11898, section 12.1.4.2 "Error counting". Although ISO 11898
does not define the error-warning state, this extra value is also commonly
used and is thus also added.
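Assuming the threshold values commonly used with ISO 11898 (96 for error-warning, 128 for error-passive, i.e. a counter above 127, and 256 for bus-off, i.e. a TX counter above 255), deriving the state from the counters might look like the sketch below. The macro, enum and function names are illustrative and need not match what the patch adds to the uapi header.

```c
#include <assert.h>

/* Illustrative thresholds; the uapi macro names may differ. */
#define WARNING_THRESHOLD  96   /* error-warning (not in ISO 11898) */
#define PASSIVE_THRESHOLD 128   /* error-passive: counter > 127 */
#define BUS_OFF_THRESHOLD 256   /* bus-off: TX counter > 255 */

enum can_state_sketch { ERROR_ACTIVE, ERROR_WARNING, ERROR_PASSIVE, BUS_OFF };

/* Derive the controller state from the TX and RX error counters.
 * Only the TX counter can take a node to bus-off. */
static enum can_state_sketch state_from_counters(unsigned int txerr,
						 unsigned int rxerr)
{
	if (txerr >= BUS_OFF_THRESHOLD)
		return BUS_OFF;
	if (txerr >= PASSIVE_THRESHOLD || rxerr >= PASSIVE_THRESHOLD)
		return ERROR_PASSIVE;
	if (txerr >= WARNING_THRESHOLD || rxerr >= WARNING_THRESHOLD)
		return ERROR_WARNING;
	return ERROR_ACTIVE;
}
```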
Vincent Mailhol [Tue, 19 Jul 2022 14:35:49 +0000 (23:35 +0900)]
can: add CAN_ERR_CNT flag to notify availability of error counter
Add a dedicated flag in uapi/linux/can/error.h to notify the userland
that fields data[6] and data[7] of the CAN error frame were
respectively populated with the tx and rx error counters.
For all drivers tree-wide, set this flag whenever needed.
Vincent Mailhol [Tue, 19 Jul 2022 14:35:48 +0000 (23:35 +0900)]
can: error: specify the values of data[5..7] of CAN error frames
Currently, data[5..7] of struct can_frame, when used as a CAN error
frame, are defined as being "controller specific". Device specific
behaviours are problematic because they prevent anyone from writing
code that is portable between devices.
As a matter of fact, data[5] is never used, data[6] is always used to
report TX error counter and data[7] is always used to report RX error
counter. can-utils also relies on this.
This patch updates the comment in the uapi header to specify that
data[5] is reserved (and thus should not be used) and that data[6..7]
are used for error counters.
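Taken together with the CAN_ERR_CNT flag and the bus-off rule from this series, a driver filling an error frame might follow the pattern below. `struct err_frame_sketch` is a trimmed stand-in for `struct can_frame`, and `SKETCH_ERR_CNT` stands in for `CAN_ERR_CNT` with an arbitrary bit value; both are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Trimmed stand-in for struct can_frame used as an error frame. */
struct err_frame_sketch {
	uint32_t can_id;
	uint8_t data[8];
};

/* Stand-in for CAN_ERR_CNT; this bit value is arbitrary. */
#define SKETCH_ERR_CNT (1u << 9)

/* data[5] stays reserved (zero), data[6] carries the TX error counter
 * and data[7] the RX error counter. In bus-off the counter may exceed
 * 255 and would be truncated by a u8, so nothing is reported and the
 * flag stays clear. Outside bus-off both counters are assumed to fit. */
static void fill_err_counters(struct err_frame_sketch *cf, bool bus_off,
			      unsigned int txerr, unsigned int rxerr)
{
	if (bus_off)
		return;
	cf->can_id |= SKETCH_ERR_CNT;
	cf->data[6] = (uint8_t)txerr;
	cf->data[7] = (uint8_t)rxerr;
}
```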
Vincent Mailhol [Tue, 19 Jul 2022 14:35:46 +0000 (23:35 +0900)]
can: kvaser_usb_leaf: do not report txerr and rxerr during bus-off
During bus-off, the error count is greater than 255 and cannot fit in
a u8.
Fixes: 7259124eac7d1 ("can: kvaser_usb: Split driver into kvaser_usb_core.c and kvaser_usb_leaf.c")
Link: https://lore.kernel.org/all/20220719143550.3681-9-mailhol.vincent@wanadoo.fr
CC: Jimmy Assarsson <extja@kvaser.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Vincent Mailhol [Tue, 19 Jul 2022 14:35:45 +0000 (23:35 +0900)]
can: kvaser_usb_hydra: do not report txerr and rxerr during bus-off
During bus-off, the error count is greater than 255 and cannot fit in
a u8.
Fixes: aec5fb2268b7 ("can: kvaser_usb: Add support for Kvaser USB hydra family")
Link: https://lore.kernel.org/all/20220719143550.3681-8-mailhol.vincent@wanadoo.fr
CC: Jimmy Assarsson <extja@kvaser.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Fixes: 0a9cdcf098a4 ("can: slcan: extend the protocol with CAN state info")
Link: https://lore.kernel.org/all/20220719143550.3681-5-mailhol.vincent@wanadoo.fr
CC: Dario Binacchi <dario.binacchi@amarulasolutions.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Liang He [Thu, 14 Jul 2022 08:13:37 +0000 (16:13 +0800)]
drm/imx/dcss: Add missing of_node_put() in fail path
In dcss_dev_create() and dcss_dev_destroy(), we should call of_node_put()
in the failure path and before destroying dcss, respectively, because
of_graph_get_port_by_id() has increased the refcount.
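The refcount rule can be illustrated with a hypothetical node type whose getter, like `of_graph_get_port_by_id()`, hands back a referenced object. `node_get()`, `node_put()`, `dev_create()` and `dev_destroy()` are made-up names; the sketch only shows that every exit path after the get, including the error path, drops the reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted node standing in for struct device_node. */
struct node_sketch {
	int refcount;
};

static struct node_sketch *node_get(struct node_sketch *np)
{
	np->refcount++;
	return np;
}

static void node_put(struct node_sketch *np)
{
	if (np)
		np->refcount--;
}

struct dev_sketch {
	struct node_sketch *port;
};

/* The port comes back with an elevated refcount, so every exit path
 * after node_get(), including failures, must drop it again. */
static struct dev_sketch *dev_create(struct node_sketch *port, int fail_init)
{
	struct dev_sketch *dev = malloc(sizeof(*dev));

	if (!dev)
		return NULL;
	dev->port = node_get(port);
	if (fail_init)
		goto err_put;
	return dev;

err_put:
	node_put(dev->port);	/* the put that was missing in the fail path */
	free(dev);
	return NULL;
}

static void dev_destroy(struct dev_sketch *dev)
{
	node_put(dev->port);	/* drop the reference before freeing */
	free(dev);
}
```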
Michael Ellerman [Mon, 18 Jul 2022 13:44:18 +0000 (23:44 +1000)]
powerpc/64s: Disable stack variable initialisation for prom_init
With GCC 12 allmodconfig prom_init fails to build:
Error: External symbol 'memset' referenced from prom_init.c
make[2]: *** [arch/powerpc/kernel/Makefile:204: arch/powerpc/kernel/prom_init_check] Error 1
The allmodconfig build enables KASAN, so all calls to memset in
prom_init should be converted to __memset by the #ifdefs in
asm/string.h, because prom_init must use the non-KASAN instrumented
versions.
The build failure happens because there's a call to memset that hasn't
been caught by the pre-processor and converted to __memset. Typically
that's because it's a memset generated by the compiler itself, and that
is the case here.
With GCC 12, allmodconfig enables CONFIG_INIT_STACK_ALL_PATTERN, which
causes the compiler to emit memset calls to initialise on-stack
variables with a pattern.
Because prom_init is boot-time-only code that is not user-facing, work
around the problem by disabling stack variable initialisation for it,
which unbreaks the build.
RISC-V: Support for 64bit hartid on RV64 platforms
The hartid can be a 64bit value on RV64 platforms. This series updates
the code so that 64bit hartid can be supported on RV64 platforms.
* 'riscv-64bit_hartid' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git:
riscv/efi_stub: Add 64bit boot-hartid support on RV64
riscv: cpu: Add 64bit hartid support on RV64
riscv: smp: Add 64bit hartid support on RV64
riscv: spinwait: Fix hartid variable type
riscv: cpu_ops_sbi: Add 64bit hartid support on RV64
This patch re-introduces support for GuC v69 in parallel to v70. As this
is a quick fix, v69 has been re-introduced as the single "fallback" guc
version in case v70 is not available on disk and only for platforms that
are out of force_probe and require the GuC by default. All v69 specific
code has been labeled as such for easy identification, and the same was
done for all v70 functions for which there is a separate v69 version,
to avoid accidentally calling the wrong version via the unlabeled name.
When the fallback mode kicks in, a drm_notice message is printed in
dmesg to inform the user of the required update. The existing
logging of the fetch function has also been updated so that we no
longer complain immediately if we can't find a fw and we only throw an
error if the fetch of both the base and fallback blobs fails.
The plan is to follow this up with a more complex rework to allow for
multiple different GuC versions to be supported at the same time.
v2: restrict the fallback to platforms that require it, switch to
firmware_request_nowarn(), improve logs.
Matthew Brost [Wed, 4 May 2022 23:46:36 +0000 (16:46 -0700)]
drm/i915/guc: Support programming the EU priority in the GuC descriptor
In GuC submission mode the EU priority must be updated by the GuC rather
than the driver as the GuC owns the programming of the context descriptor.
Given that the GuC code uses the GuC priorities, we can't use a generic
function using i915 priorities for both execlists and GuC submission.
The existing function has therefore been pushed to the execlists
back-end while a new one has been added for GuC.
drivers/platform/chrome/cros_kbd_led_backlight.c got a new build warning
when using the randconfig in [1]:
>>> warning: unused variable 'keyboard_led_drvdata_ec_pwm'
The warning happens when CONFIG_CROS_EC is set but CONFIG_OF is not set.
Reproduce:
- mkdir build_dir
- wget [1] -O build_dir/.config
- COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 \
O=build_dir ARCH=s390 SHELL=/bin/bash drivers/platform/chrome/
Fix the warning by using __maybe_unused. Also use IS_ENABLED() because
CROS_EC is a tristate.
platform/chrome: cros_ec_proto: return -EPROTO if empty payload
cros_ec_wait_until_complete() sends EC_CMD_GET_COMMS_STATUS which expects
to receive sizeof(struct ec_response_get_comms_status) from
cros_ec_xfer_command().
Return -EPROTO if cros_ec_xfer_command() returns 0.
platform/chrome: cros_ec_proto: add Kunit test for empty payload
cros_ec_wait_until_complete() sends EC_CMD_GET_COMMS_STATUS which expects
to receive sizeof(struct ec_response_get_comms_status) from
cros_ec_xfer_command().
Add a Kunit test that expects an error code when
cros_ec_xfer_command() returns 0.
platform/chrome: cros_ec_proto: return -EAGAIN when retries timed out
If the EC_COMMS_STATUS_PROCESSING flag is still set after sending
EC_CMD_GET_COMMS_STATUS EC_COMMAND_RETRIES times,
cros_ec_wait_until_complete() doesn't return an error code. Return
-EAGAIN in that case.
platform/chrome: cros_ec_proto: change Kunit expectation when timed out
If the EC_COMMS_STATUS_PROCESSING flag is still set after sending
EC_CMD_GET_COMMS_STATUS EC_COMMAND_RETRIES times,
cros_ec_wait_until_complete() doesn't return an error code.
platform/chrome: cros_ec_proto: separate cros_ec_wait_until_complete()
The EC returns EC_RES_IN_PROGRESS if the host command needs more time to
complete. Whenever it receives this return code, cros_ec_send_command()
sends EC_CMD_GET_COMMS_STATUS to query the command status.
Separate cros_ec_wait_until_complete() from cros_ec_send_command().
It sends EC_CMD_GET_COMMS_STATUS and waits until the previous command
completes, encounters an error, or times out.
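Combining the behaviour described across this series (an empty payload yields -EPROTO, exhausting the retries yields -EAGAIN), the polling loop could be sketched as below. The transport callback, the constants and the fake EC are hypothetical stand-ins (the retry count is illustrative, not necessarily the value of `EC_COMMAND_RETRIES`), and any delay between polls is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define RETRIES_SKETCH    50        /* stand-in for EC_COMMAND_RETRIES */
#define STATUS_PROCESSING (1u << 0) /* stand-in for EC_COMMS_STATUS_PROCESSING */

struct comms_status_sketch {
	uint32_t flags;
};

/* Hypothetical transport: returns bytes received (>= 0) or -errno. */
typedef int (*xfer_fn)(void *ctx, struct comms_status_sketch *resp);

/* Poll the comms status until the previous command finishes. */
static int wait_until_complete(xfer_fn xfer, void *ctx)
{
	struct comms_status_sketch status;
	int i, ret;

	for (i = 0; i < RETRIES_SKETCH; i++) {
		ret = xfer(ctx, &status);
		if (ret < 0)
			return ret;        /* transport error */
		if (ret == 0)
			return -EPROTO;    /* empty payload: no status to read */
		if (!(status.flags & STATUS_PROCESSING))
			return 0;          /* previous command completed */
	}
	return -EAGAIN;                    /* still processing after all retries */
}

/* Fake EC for demonstration: "processing" for busy_polls polls, then
 * done; payload_len is what each transfer reports as received. */
struct fake_ec {
	int busy_polls;
	int payload_len;
};

static int fake_xfer(void *ctx, struct comms_status_sketch *resp)
{
	struct fake_ec *ec = ctx;

	resp->flags = ec->busy_polls > 0 ? STATUS_PROCESSING : 0;
	if (ec->busy_polls > 0)
		ec->busy_polls--;
	return ec->payload_len;
}
```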
platform/chrome: cros_ec_proto: separate cros_ec_xfer_command()
cros_ec_send_command() has extra logic to handle EC_RES_IN_PROGRESS.
Separate the command transfer part into cros_ec_xfer_command() so
that other functions can re-use it.
Jakub Kicinski [Wed, 20 Jul 2022 00:43:02 +0000 (17:43 -0700)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-07-18
This series contains updates to iavf driver only.
Przemyslaw fixes handling of multiple VLAN requests to account for
individual errors instead of rejecting them all. He removes incorrect
implementations of ETHTOOL_COALESCE_MAX_FRAMES and
ETHTOOL_COALESCE_MAX_FRAMES_IRQ.
He also corrects an issue with NULL pointer caused by improper handling of
dummy receive descriptors. Finally, he corrects debug prints reporting an
unknown state.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: Fix missing state logs
iavf: Fix handling of dummy receive descriptors
iavf: Disallow changing rx/tx-frames and rx/tx-frames-irq
iavf: Fix VLAN_V2 addition/rejection
====================
Xin Long [Mon, 18 Jul 2022 17:56:59 +0000 (13:56 -0400)]
Documentation: fix udp_wmem_min in ip-sysctl.rst
UDP doesn't support tx memory accounting, and sysctl udp_wmem_min
is not really used anywhere. So we should fix the description in
ip-sysctl.rst accordingly.
Lorenzo Bianconi [Mon, 18 Jul 2022 09:51:53 +0000 (11:51 +0200)]
net: ethernet: mtk_ppe: fix possible NULL pointer dereference in mtk_flow_get_wdma_info
The odev pointer can be NULL in mtk_flow_offload_replace() depending on
the flower action rules. Fix a possible NULL pointer dereference in
mtk_flow_get_wdma_info().
The last caller was removed by commit 96f5e66e8a79 ("mac80211: fix
aggregation for hardware with ampdu queues"), so it's safe to remove this
function.
Hayes Wang [Mon, 18 Jul 2022 08:21:20 +0000 (16:21 +0800)]
r8152: fix a WOL issue
This fixes the platform being woken up by an unexpected packet. The
size and range of the FIFO are different when the device enters the S3
state, so some settings need to be corrected when suspending.
Regardless of jumbo frame, set RMS to 1522 and MTPS to MTPS_DEFAULT.
In addition, enable MCU_BORW_EN to update the method of calculating the
data pointer, so that the hardware gets the correct data.
Sunil V L [Fri, 27 May 2022 05:17:40 +0000 (10:47 +0530)]
riscv: spinwait: Fix hartid variable type
The hartid variable is of type int but is compared with
ULONG_MAX (INVALID_HARTID). Fix this by changing the hartid
variable type to unsigned long.
Fixes: c78f94f35cf6 ("RISC-V: Use __cpu_up_stack/task_pointer only for spinwait method")
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20220527051743.2829940-3-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
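A sketch of why the type matters, assuming an LP64 target such as RV64: hartids are kept as `unsigned long`, so values above 32 bits survive, and the sentinel comparison against `ULONG_MAX` stays within one unsigned type. The lookup table, `hartid_of_cpu()` and `hartid_valid()` are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Sentinel, as in the text: INVALID_HARTID is ULONG_MAX. */
#define INVALID_HARTID ULONG_MAX

/* Hypothetical cpu -> hartid table; the last entry needs more than
 * 32 bits, which an int could not hold. Assumes a 64-bit unsigned long. */
static const unsigned long cpu_to_hartid[] = { 0x0UL, 0x1UL, 0x100000000UL };

static unsigned long hartid_of_cpu(size_t cpu)
{
	if (cpu >= sizeof(cpu_to_hartid) / sizeof(cpu_to_hartid[0]))
		return INVALID_HARTID;
	return cpu_to_hartid[cpu];
}

/* Both operands are unsigned long, so no signed/unsigned conversion. */
static bool hartid_valid(unsigned long hartid)
{
	return hartid != INVALID_HARTID;
}
```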
drm/panel-edp: Fix variable typo when saving hpd absent delay from DT
The value read from the "hpd-absent-delay-ms" property in DT was being
saved to the wrong variable, overriding the hpd_reliable delay. Fix the
typo.
Fixes: 5540cf8f3e8d ("drm/panel-edp: Implement generic "edp-panel"s probed by EDID")
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220719203857.1488831-4-nfraprado@collabora.com
Ira Weiny [Tue, 19 Jul 2022 20:52:49 +0000 (13:52 -0700)]
cxl/port: Read CDAT table
The per-device CDAT data provides performance data that is relevant for
mapping which CXL devices can participate in which CXL ranges by QTG
(QoS Throttling Group) (per ECN: CXL 2.0 CEDT CFMWS & QTG_DSM) [1]. The
QTG association specified in the ECN is advisory. Until the
cxl_acpi driver grows support for invoking the QTG _DSM method the CDAT
data is only of interest to userspace that may need it for debug
purposes.
Search the DOE mailboxes available, query CDAT data, cache the data and
make it available via a sysfs binary attribute per endpoint at:
/sys/bus/cxl/devices/endpointX/CDAT
...similar to other ACPI-structured table data in
/sys/firmware/acpi/tables. The CDAT is relative to 'struct cxl_port'
objects since switches in addition to endpoints can host a CDAT
instance. Switch CDAT support is not implemented.
This does not support table updates at runtime. It will always provide
whatever was there when first cached. It is also the case that table
updates are not expected outside of explicit DPA address map affecting
commands like Set Partition with the immediate flag set. Given that the
driver does not support Set Partition with the immediate flag set there
is no current need for update support.
Link: https://www.computeexpresslink.org/spec-landing
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
[djbw: drop in-kernel parsing infra for now, and other minor fixups]
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220719205249.566684-7-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Ira Weiny [Tue, 19 Jul 2022 20:52:48 +0000 (13:52 -0700)]
driver-core: Introduce BIN_ATTR_ADMIN_{RO,RW}
Many binary attributes need to limit access to CAP_SYS_ADMIN only, i.e.,
many binary attributes specify is_visible with 0400 or 0600.
Make setting the permissions of such attributes more explicit by
defining BIN_ATTR_ADMIN_{RO,RW}.
Cc: Bjorn Helgaas <bhelgaas@google.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Suggested-by: Krzysztof Wilczyński <kw@linux.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220719205249.566684-6-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Ira Weiny [Tue, 19 Jul 2022 20:52:47 +0000 (13:52 -0700)]
cxl/pci: Create PCI DOE mailboxes for memory devices
DOE mailbox objects will be needed for various mailbox communications
with each memory device.
Iterate over each DOE mailbox capability and create PCI DOE mailbox
objects as they are found.
It is not anticipated that this is the final resting place for the
iteration of the DOE devices. The support of switch ports will drive
this code into the PCIe side. In this imagined architecture the CXL
port driver would then query into the PCI device for the DOE mailbox
array.
For now creating the mailboxes in the CXL port is good enough for the
endpoints. Later PCIe ports will need to support this to support switch
ports more generically.
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220719205249.566684-5-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Jonathan Cameron [Tue, 19 Jul 2022 20:52:46 +0000 (13:52 -0700)]
PCI/DOE: Add DOE mailbox support functions
Introduced in PCIe r6.0, sec 6.30, DOE provides a config space based
mailbox with standard protocol discovery. Each mailbox is accessed
through a DOE Extended Capability.
Each DOE mailbox must support the DOE discovery protocol in addition to
any number of additional protocols.
Define core PCIe functionality to manage a single PCIe DOE mailbox at a
defined config space offset. Functionality includes iterating,
creating, querying supported protocols, and task submission. Destruction
of the mailboxes is device managed.
Cc: "Li, Ming" <ming4.li@intel.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Acked-by: Bjorn Helgaas <helgaas@kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Co-developed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220719205249.566684-4-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Jonathan Cameron [Tue, 19 Jul 2022 20:52:44 +0000 (13:52 -0700)]
PCI: Add vendor ID for the PCI SIG
This ID is used in DOE headers to identify protocols that are defined
within the PCI Express Base Specification, PCIe r6.0, sec 6.30.1.1 table
6-32.
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220719205249.566684-2-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
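As a small illustration of how DOE protocols are named, the sketch below identifies a protocol by a (vendor ID, data object type) pair and looks one up in a list a mailbox might report through discovery. It assumes the PCI-SIG vendor ID value 0x0001 and data object type 0 for the discovery protocol; `struct doe_prot_sketch` and `doe_supports()` are hypothetical and are not the PCI/DOE API added by this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_VENDOR_ID_PCI_SIG 0x0001  /* assumed PCI-SIG vendor ID value */
#define DOE_PROT_DISCOVERY    0x00    /* assumed discovery object type */

/* A DOE protocol is identified by a (vendor ID, data object type) pair. */
struct doe_prot_sketch {
	uint16_t vid;
	uint8_t type;
};

/* Hypothetical helper: is the protocol in a mailbox's discovered list? */
static bool doe_supports(const struct doe_prot_sketch *prots, size_t n,
			 uint16_t vid, uint8_t type)
{
	for (size_t i = 0; i < n; i++)
		if (prots[i].vid == vid && prots[i].type == type)
			return true;
	return false;
}
```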
It does not hurt to fill in the changeset id while the mutex is still
held. After doing so, the function tails for the success and failure
cases become identical, so they can be unified.
There is no point in doing several preparatory steps in
of_overlay_fdt_apply(), only to see of_overlay_apply() return early
because of a corrupt device tree.
Move the check for a corrupt device tree from of_overlay_apply() to
of_overlay_fdt_apply(), to check for this as early as possible.
Jakub Kicinski [Tue, 19 Jul 2022 21:13:33 +0000 (14:13 -0700)]
Merge branch 'io_uring-zerocopy-send' of git://git.kernel.org/pub/scm/linux/kernel/git/kuba/linux
Pavel Begunkov says:
====================
io_uring zerocopy send
The patchset implements io_uring zerocopy send. It works with both registered
and normal buffers, mixing is allowed but not recommended. Apart from usual
request completions, just as with MSG_ZEROCOPY, io_uring separately notifies
the userspace when buffers are freed and can be reused (see API design below),
which is delivered into io_uring's Completion Queue. Those "buffer-free"
notifications are not necessarily per request, but the userspace has control
over it and should explicitly attach a number of requests to a single
notification. The series also adds some internal optimisations when used with
registered buffers like removing page referencing.
From the kernel networking perspective there are two main changes. The first
one is passing ubuf_info into the network layer from io_uring (inside of an
in-kernel struct msghdr). This allows extra optimisations, e.g. ubuf_info
caching on the io_uring side, but also helps to avoid cross-referencing
and synchronisation problems. The second part is an optional optimisation
removing page referencing for requests with registered buffers.
Benchmarking UDP with an optimised version of the selftest (see [1]), which
sends a bunch of requests, waits for completions and repeats. "+ flush" column
posts one additional "buffer-free" notification per request, and just "zc"
doesn't post buffer notifications at all.
Previously it also brought a massive performance speedup compared to the
msg_zerocopy tool (see [3]), which is probably not super interesting. There
is also an additional bunch of refcounting optimisations that were omitted from
the series for simplicity; as they don't change the picture drastically,
they will be sent as a follow-up, as well as flushing optimisations closing the
performance gap between the last two columns.
For TCP on localhost (with hacks enabling localhost zerocopy) and including
additional overhead for receive:
With a real NIC and 1200-byte payloads, zc is ~5-10% worse than non-zc; the
omitted optimisations may help somewhat. It should look better with 4000
bytes, but that couldn't be tested properly because of setup problems.
The net patches are based on:
git@github.com:isilence/linux.git zc_v4-net-base
or
https://github.com/isilence/linux/tree/zc_v4-net-base
API design overview:
The series introduces an io_uring concept of notifiers. From the userspace
perspective it's an entity to which it can bind one or more requests and which
it can then request to flush. Flushing a notifier makes it impossible to attach new
requests to it, and instructs the notifier to post a completion once all
requests attached to it are completed and the kernel doesn't need the buffers
anymore.
Notifications are stored in notification slots, which should be registered as
an array in io_uring. Each slot stores only one notifier at any particular
moment. Flushing removes it from the slot and the slot automatically replaces
it with a new notifier. All operations with notifiers are done by specifying
an index of a slot it's currently in.
When registering a notification the userspace specifies a u64 tag for each
slot, which will be copied in notification completion entries as
cqe::user_data. cqe::res is 0 and cqe::flags is equal to a wrapping u32
sequence number counting notifiers of a slot.
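To make the slot/notifier semantics concrete, here is a hypothetical single-threaded model. None of these types or functions are io_uring's actual API; the slot's own reference stands in for "still accepting requests", and allocation failure simply aborts for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Completion entry as described: user_data = slot tag, res = 0,
 * flags = wrapping u32 sequence number of the notifier. */
struct cqe_sketch {
	uint64_t user_data;
	int32_t res;
	uint32_t flags;
};

struct notifier_sketch {
	uint64_t tag;
	uint32_t seq;
	unsigned int refs;  /* attached requests + 1 while still in its slot */
};

struct slot_sketch {
	uint64_t tag;
	uint32_t next_seq;  /* unsigned, so it wraps around at 2^32 */
	struct notifier_sketch *cur;
};

static struct notifier_sketch *notifier_new(struct slot_sketch *s)
{
	struct notifier_sketch *n = malloc(sizeof(*n));

	if (!n)
		abort();  /* error handling omitted in this sketch */
	n->tag = s->tag;
	n->seq = s->next_seq++;
	n->refs = 1;
	return n;
}

/* Drop a reference; the last one posts the notification CQE. */
static bool notifier_put(struct notifier_sketch *n, struct cqe_sketch *cqe)
{
	if (--n->refs)
		return false;
	cqe->user_data = n->tag;
	cqe->res = 0;
	cqe->flags = n->seq;
	free(n);
	return true;
}

static void slot_init(struct slot_sketch *s, uint64_t tag)
{
	s->tag = tag;
	s->next_seq = 0;
	s->cur = notifier_new(s);
}

/* Bind a request to the notifier currently in the slot. */
static struct notifier_sketch *slot_attach(struct slot_sketch *s)
{
	s->cur->refs++;
	return s->cur;
}

/* Flush: take the notifier out of the slot (nothing new can attach),
 * install a fresh one, and drop the slot's reference. Returns true if
 * the CQE was posted immediately because no requests were attached. */
static bool slot_flush(struct slot_sketch *s, struct cqe_sketch *cqe)
{
	struct notifier_sketch *old = s->cur;

	s->cur = notifier_new(s);
	return notifier_put(old, cqe);
}
```

In this model, flushing while a request is still attached defers the CQE until that request drops its reference, which mirrors the "post a completion once all requests attached to it are completed" rule above.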
Pavel Begunkov [Tue, 12 Jul 2022 20:52:35 +0000 (21:52 +0100)]
tcp: support externally provided ubufs
Teach tcp how to use external ubuf_info provided in msghdr and
also prepare it for managed frags by sprinkling
skb_zcopy_downgrade_managed() when it could mix managed and not managed
frags.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pavel Begunkov [Tue, 12 Jul 2022 20:52:34 +0000 (21:52 +0100)]
ipv6/udp: support externally provided ubufs
Teach ipv6/udp how to use external ubuf_info provided in msghdr and
also prepare it for managed frags by sprinkling
skb_zcopy_downgrade_managed() when it could mix managed and not managed
frags.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pavel Begunkov [Tue, 12 Jul 2022 20:52:33 +0000 (21:52 +0100)]
ipv4/udp: support externally provided ubufs
Teach ipv4/udp how to use external ubuf_info provided in msghdr and
also prepare it for managed frags by sprinkling
skb_zcopy_downgrade_managed() when it could mix managed and not managed
frags.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pavel Begunkov [Tue, 12 Jul 2022 20:52:32 +0000 (21:52 +0100)]
net: introduce __skb_fill_page_desc_noacc
Managed pages contain pinned userspace pages and are controlled by upper
layers, so there is no need to track skb->pfmemalloc for them. Introduce
a helper for filling frags but ignoring page tracking; it'll be needed
later.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pavel Begunkov [Tue, 12 Jul 2022 20:52:31 +0000 (21:52 +0100)]
net: introduce managed frags infrastructure
Some users like io_uring can do page pinning more efficiently, so we
want a way to delegate referencing to other subsystems. For that add
a new flag called SKBFL_MANAGED_FRAG_REFS. When set, skb doesn't hold
page references and upper layers are responsible for managing page
lifetime.
It's allowed to convert skbs from managed to normal by calling
skb_zcopy_downgrade_managed(). The function will take all needed
page references and clear the flag. It's needed, for instance,
to avoid mixing managed modes.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
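A simplified model of the managed-frag rule follows. `struct skb_sketch`, `add_frag()` and `downgrade_managed()` are hypothetical stand-ins (the real conversion helper is `skb_zcopy_downgrade_managed()`), and the flag value standing in for `SKBFL_MANAGED_FRAG_REFS` is arbitrary.

```c
#include <assert.h>

/* Hypothetical stand-ins: a refcounted page and an skb-like buffer. */
struct page_sketch { int refcount; };

#define MAX_FRAGS 4
#define MANAGED_FRAG_REFS (1u << 0)  /* stand-in for SKBFL_MANAGED_FRAG_REFS */

struct skb_sketch {
	unsigned int flags;
	int nr_frags;
	struct page_sketch *frags[MAX_FRAGS];
};

/* Attach a page. In managed mode the upper layer (e.g. io_uring) keeps
 * the page pinned, so no per-frag reference is taken. */
static void add_frag(struct skb_sketch *skb, struct page_sketch *page)
{
	assert(skb->nr_frags < MAX_FRAGS);
	if (!(skb->flags & MANAGED_FRAG_REFS))
		page->refcount++;
	skb->frags[skb->nr_frags++] = page;
}

/* Convert to normal mode: take the references the skb never held, then
 * clear the flag, so managed and unmanaged frags are never mixed. */
static void downgrade_managed(struct skb_sketch *skb)
{
	if (!(skb->flags & MANAGED_FRAG_REFS))
		return;
	for (int i = 0; i < skb->nr_frags; i++)
		skb->frags[i]->refcount++;
	skb->flags &= ~MANAGED_FRAG_REFS;
}
```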
David Ahern [Tue, 12 Jul 2022 20:52:30 +0000 (21:52 +0100)]
net: Allow custom iter handler in msghdr
Add support for custom iov_iter handling to msghdr. The idea is that
in-kernel subsystems want control over how an SG is split.
Signed-off-by: David Ahern <dsahern@kernel.org>
[pavel: move callback into msghdr]
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Aya Levin [Mon, 4 Jul 2022 16:34:26 +0000 (19:34 +0300)]
net/mlx5e: Add resiliency for PTP TX port timestamp
PTP TX port timestamp relies on receiving 2 CQEs for each outgoing
packet (WQE). The regular CQE has a less accurate timestamp than the
wire CQE. On link change, the wire CQE may get lost. Let the driver
detect and restore the relation between the CQEs, and re-sync after
timeout.
Add resiliency for this as follows: add an id (producer counter)
into the WQE's metadata. This id will be received in the wire
CQE (in the wqe_counter field). On handling the wire CQE, if there is no
match, reply to the PTP application with the timestamp from the regular
CQE and restore the sync between the CQEs and their SKBs. This patch
adds 2 ptp counters:
1) ptp_cq0_resync_event: number of times a mismatch was detected between
the regular CQE and the wire CQE.
2) ptp_cq0_resync_cqe: total number of missing wire CQEs.
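The id-matching resync can be sketched as below under simplifying assumptions: pending packets sit in a ring in posting order, each regular-CQE timestamp is already known, and the timeout path mentioned above is not modelled. All names are hypothetical apart from the two counters they mirror.

```c
#include <assert.h>
#include <stdint.h>

#define RING 8  /* hypothetical ring size */

struct pending_sketch {
	uint16_t id;           /* producer counter stored in WQE metadata */
	uint64_t regular_ts;   /* timestamp from the regular CQE */
	uint64_t delivered_ts; /* timestamp handed to the application */
};

struct ptp_sq_sketch {
	struct pending_sketch ring[RING];
	unsigned int head, tail;
	uint64_t resync_event; /* mirrors ptp_cq0_resync_event */
	uint64_t resync_cqe;   /* mirrors ptp_cq0_resync_cqe */
};

static void post_wqe(struct ptp_sq_sketch *sq, uint16_t id, uint64_t regular_ts)
{
	struct pending_sketch *p;

	assert(sq->tail - sq->head < RING);  /* ring must not overflow */
	p = &sq->ring[sq->tail++ % RING];
	p->id = id;
	p->regular_ts = regular_ts;
	p->delivered_ts = 0;
}

/* Handle a wire CQE carrying wqe_counter == id. Entries ahead of the
 * match lost their wire CQE: complete them with the regular-CQE
 * timestamp and count them, then complete the match with wire_ts. */
static void handle_wire_cqe(struct ptp_sq_sketch *sq, uint16_t id,
			    uint64_t wire_ts)
{
	unsigned int skipped = 0;

	while (sq->head != sq->tail) {
		struct pending_sketch *p = &sq->ring[sq->head++ % RING];

		if (p->id == id) {
			p->delivered_ts = wire_ts;
			break;
		}
		p->delivered_ts = p->regular_ts;  /* wire CQE was lost */
		skipped++;
	}
	if (skipped) {
		sq->resync_event++;
		sq->resync_cqe += skipped;
	}
}
```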
Aya Levin [Mon, 4 Jul 2022 16:34:08 +0000 (19:34 +0300)]
net/mlx5: Expose ts_cqe_metadata_size2wqe_counter
Add a capability field that indicates the mask for wqe_counter, which
connects the loopback CQE to the original WQE. With this connection
the driver can identify a lost loopback CQE and reply to PTP
synchronization with the timestamp given in the original CQE.
Moshe Tal [Tue, 5 Apr 2022 01:34:00 +0000 (04:34 +0300)]
net/mlx5e: HTB, remove priv from htb function calls
As a step towards making htb self-contained, replace the passing of priv
as a parameter to htb function calls with members in the htb struct.
Fully decoupling htb from priv will require more work, so for now
leave priv as one of the members in the htb struct, to be replaced
by channels in a future commit.