Joe Damato [Wed, 4 Sep 2024 15:34:30 +0000 (15:34 +0000)]
net: napi: Prevent overflow of napi_defer_hard_irqs
In commit 6f8b12d661d0 ("net: napi: add hard irqs deferral feature")
napi_defer_irqs was added to net_device and napi_defer_irqs_count was
added to napi_struct, both as type int.
This value never goes below zero, so there is not reason for it to be a
signed int. Change the type for both from int to u32, and add an
overflow check to sysfs to limit the value to S32_MAX.
The limit of S32_MAX was chosen because the practical limit before this
patch was S32_MAX (anything larger was an overflow) and thus there are
no behavioral changes introduced. If the extra bit is needed in the
future, the limit can be raised.
Merge tag 'net-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from can, bluetooth and wireless.
No known regressions at this point. Another calm week, but chances are
that has more to do with vacation season than the quality of our work.
Current release - new code bugs:
- smc: prevent NULL pointer dereference in txopt_get
- eth: ti: am65-cpsw: number of XDP-related fixes
Previous releases - regressions:
- Revert "Bluetooth: MGMT/SMP: Fix address type when using SMP over
BREDR/LE", it breaks existing user space
- Bluetooth: qca: if memdump doesn't work, re-enable IBS to avoid
later problems with suspend
- can: mcp251x: fix deadlock if an interrupt occurs during
mcp251x_open
- eth: r8152: fix the firmware communication error due to use of bulk
write
- ptp: ocp: fix serial port information export
- eth: igb: fix not clearing TimeSync interrupts for 82580
- Revert "wifi: ath11k: support hibernation", fix suspend on Lenovo
Previous releases - always broken:
- eth: intel: fix crashes and bugs when reconfiguration and resets
happening in parallel
- wifi: ath11k: fix NULL dereference in ath11k_mac_get_eirp_power()
Misc:
- docs: netdev: document guidance on cleanup.h"
* tag 'net-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
ila: call nf_unregister_net_hooks() sooner
tools/net/ynl: fix cli.py --subscribe feature
MAINTAINERS: fix ptp ocp driver maintainers address
selftests: net: enable bind tests
net: dsa: vsc73xx: fix possible subblocks range of CAPT block
sched: sch_cake: fix bulk flow accounting logic for host fairness
docs: netdev: document guidance on cleanup.h
net: xilinx: axienet: Fix race in axienet_stop
net: bridge: br_fdb_external_learn_add(): always set EXT_LEARN
r8152: fix the firmware doesn't work
fou: Fix null-ptr-deref in GRO.
bareudp: Fix device stats updates.
net: mana: Fix error handling in mana_create_txq/rxq's NAPI cleanup
bpf, net: Fix a potential race in do_sock_getsockopt()
net: dqs: Do not use extern for unused dql_group
sch/netem: fix use after free in netem_dequeue
usbnet: modern method to get random MAC
MAINTAINERS: wifi: cw1200: add net-cw1200.h
ice: do not bring the VSI up, if it was down before the XDP setup
ice: remove ICE_CFG_BUSY locking from AF_XDP code
...
Merge tag 'spi-fix-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few small driver specific fixes (including some of the widespread
work on fixing missing ID tables for module autoloading and the revert
of some problematic PM work in spi-rockchip), some improvements to the
MAINTAINERS information for the NXP drivers and the addition of a new
device ID to spidev"
* tag 'spi-fix-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
MAINTAINERS: SPI: Add mailing list imx@lists.linux.dev for nxp spi drivers
MAINTAINERS: SPI: Add freescale lpspi maintainer information
spi: spi-fsl-lpspi: Fix off-by-one in prescale max
spi: spidev: Add missing spi_device_id for jg10309-01
spi: bcm63xx: Enable module autoloading
spi: intel: Add check devm_kasprintf() returned value
spi: spidev: Add an entry for elgin,jg10309-01
spi: rockchip: Resolve unbalanced runtime PM / system PM handling
Merge tag 'regulator-fix-v6.11-stub' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"A fix from Doug Anderson for a missing stub, required to fix the build
for some newly added users of devm_regulator_bulk_get_const() in
!REGULATOR configurations"
* tag 'regulator-fix-v6.11-stub' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: core: Stub devm_regulator_bulk_get_const() if !CONFIG_REGULATOR
Merge tag 'rust-fixes-6.11-2' of https://github.com/Rust-for-Linux/linux
Pull Rust fixes from Miguel Ojeda:
"Toolchain and infrastructure:
- Fix builds for nightly compiler users now that 'new_uninit' was
split into new features by using an alternative approach for the
code that used what is now called the 'box_uninit_write' feature
- Allow the 'stable_features' lint to preempt upcoming warnings about
them, since soon there will be unstable features that will become
stable in nightly compilers
- Export bss symbols too
'kernel' crate:
- 'block' module: fix wrong usage of lockdep API
'macros' crate:
- Provide correct provenance when constructing 'THIS_MODULE'
Documentation:
- Remove unintended indentation (blockquotes) in generated output
- Fix a couple typos
MAINTAINERS:
- Remove Wedson as Rust maintainer
- Update Andreas' email"
* tag 'rust-fixes-6.11-2' of https://github.com/Rust-for-Linux/linux:
MAINTAINERS: update Andreas Hindborg's email address
MAINTAINERS: Remove Wedson as Rust maintainer
rust: macros: provide correct provenance when constructing THIS_MODULE
rust: allow `stable_features` lint
docs: rust: remove unintended blockquote in Quick Start
rust: alloc: eschew `Box<MaybeUninit<T>>::write`
rust: kernel: fix typos in code comments
docs: rust: remove unintended blockquote in Coding Guidelines
rust: block: fix wrong usage of lockdep API
rust: kbuild: fix export of bss symbols
Merge tag 'trace-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix adding a new fgraph callback after function graph tracing has
already started.
If the new caller does not initialize its hash before registering the
fgraph_ops, it can cause a NULL pointer dereference. Fix this by
adding a new parameter to ftrace_graph_enable_direct() passing in the
newly added gops directly and not rely on using the fgraph_array[],
as entries in the fgraph_array[] must be initialized.
Assign the new gops to the fgraph_array[] after it goes through
ftrace_startup_subops() as that will properly initialize the
gops->ops and initialize its hashes.
- Fix a memory leak in fgraph storage memory test.
If the "multiple fgraph storage on a function" boot up selftest fails
in the registering of the function graph tracer, it will not free the
memory it allocated for the filter. Break the loop up into two where
it allocates the filters first and then registers the functions where
any errors will do the appropriate clean ups.
- Only clear the timerlat timers if it has an associated kthread.
In the rtla tool that uses timerlat, if it was killed just as it was
shutting down, the signals can free the kthread and the timer. But
the closing of the timerlat files could cause the hrtimer_cancel() to
be called on the already freed timer. As the kthread variable is is
set to NULL when the kthreads are stopped and the timers are freed it
can be used to know not to call hrtimer_cancel() on the timer if the
kthread variable is NULL.
- Use a cpumask to keep track of osnoise/timerlat kthreads
The timerlat tracer can use user space threads for its analysis. With
the killing of the rtla tool, the kernel can get confused between if
it is using a user space thread to analyze or one of its own kernel
threads. When this confusion happens, kthread_stop() can be called on
a user space thread and bad things happen. As the kernel threads are
per-cpu, a bitmask can be used to know when a kernel thread is used
or when a user space thread is used.
- Add missing interface_lock to osnoise/timerlat stop_kthread()
The stop_kthread() function in osnoise/timerlat clears the osnoise
kthread variable, and if it was a user space thread does a put_task
on it. But this can race with the closing of the timerlat files that
also does a put_task on the kthread, and if the race happens the task
will have put_task called on it twice and oops.
- Add cond_resched() to the tracing_iter_reset() loop.
The latency tracers keep writing to the ring buffer without resetting
when it issues a new "start" event (like interrupts being disabled).
When reading the buffer with an iterator, the tracing_iter_reset()
sets its pointer to that start event by walking through all the
events in the buffer until it gets to the time stamp of the start
event. In the case of a very large buffer, the loop that looks for
the start event has been reported taking a very long time with a non
preempt kernel that it can trigger a soft lock up warning. Add a
cond_resched() into that loop to make sure that doesn't happen.
- Use list_del_rcu() for eventfs ei->list variable
It was reported that running loops of creating and deleting kprobe
events could cause a crash due to the eventfs list iteration hitting
a LIST_POISON variable. This is because the list is protected by SRCU
but when an item is deleted from the list, it was using list_del()
which poisons the "next" pointer. This is what list_del_rcu() was to
prevent.
* tag 'trace-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/timerlat: Add interface_lock around clearing of kthread in stop_kthread()
tracing/timerlat: Only clear timer if a kthread exists
tracing/osnoise: Use a cpumask to know what threads are kthreads
eventfs: Use list_del_rcu() for SRCU protected list variable
tracing: Avoid possible softlockup in tracing_iter_reset()
tracing: Fix memory leak in fgraph storage selftest
tracing: fgraph: Fix to add new fgraph_ops to array after ftrace_startup_subops()
Memory state around the buggy address: ffff88806461ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88806461ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888064620000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
^ ffff888064620080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888064620100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
Fixes: 7f00feaf1076 ("ila: Add generic ILA translation facility") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Reviewed-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20240904144418.1162839-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Execution of command:
./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml /
--subscribe "monitor" --sleep 10
fails with:
File "/repo/./tools/net/ynl/cli.py", line 109, in main
ynl.check_ntf()
File "/repo/tools/net/ynl/lib/ynl.py", line 924, in check_ntf
op = self.rsp_by_value[nl_msg.cmd()]
KeyError: 19
Parsing Generic Netlink notification messages performs lookup for op in
the message. The message was not yet decoded, and is not yet considered
GenlMsg, thus msg.cmd() returns Generic Netlink family id (19) instead of
proper notification command id (i.e.: DPLL_CMD_PIN_CHANGE_NTF=13).
Allow the op to be obtained within NetlinkProtocol.decode(..) itself if the
op was not passed to the decode function, thus allow parsing of Generic
Netlink notifications without causing the failure.
Merge tag 'platform-drivers-x86-v6.11-6' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
- amd/pmf: ASUS GA403 quirk matching tweak
- dell-smbios: Fix to the init function rollback path
* tag 'platform-drivers-x86-v6.11-6' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86/amd: pmf: Make ASUS GA403 quirk generic
platform/x86: dell-smbios: Fix error path in dell_smbios_init()
Merge tag 'linux_kselftest-kunit-fixes-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit fix fromShuah Khan:
"One single fix to a use-after-free bug resulting from
kunit_driver_create() failing to copy the driver name leaving it on
the stack or freeing it"
* tag 'linux_kselftest-kunit-fixes-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: Device wrappers should also manage driver name
Steven Rostedt [Thu, 5 Sep 2024 15:33:59 +0000 (11:33 -0400)]
tracing/timerlat: Add interface_lock around clearing of kthread in stop_kthread()
The timerlat interface will get and put the task that is part of the
"kthread" field of the osn_var to keep it around until all references are
released. But here's a race in the "stop_kthread()" code that will call
put_task_struct() on the kthread if it is not a kernel thread. This can
race with the releasing of the references to that task struct and the
put_task_struct() can be called twice when it should have been called just
once.
Take the interface_lock() in stop_kthread() to synchronize this change.
But to do so, the function stop_per_cpu_kthreads() needs to change the
loop from for_each_online_cpu() to for_each_possible_cpu() and remove the
cpu_read_lock(), as the interface_lock can not be taken while the cpu
locks are held. The only side effect of this change is that it may do some
extra work, as the per_cpu variables of the offline CPUs would not be set
anyway, and would simply be skipped in the loop.
Remove unneeded "return;" in stop_kthread().
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Tomas Glozar <tglozar@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Link: https://lore.kernel.org/20240905113359.2b934242@gandalf.local.home Fixes: e88ed227f639e ("tracing/timerlat: Add user-space interface") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt [Thu, 5 Sep 2024 12:53:30 +0000 (08:53 -0400)]
tracing/timerlat: Only clear timer if a kthread exists
The timerlat tracer can use user space threads to check for osnoise and
timer latency. If the program using this is killed via a SIGTERM, the
threads are shutdown one at a time and another tracing instance can start
up resetting the threads before they are fully closed. That causes the
hrtimer assigned to the kthread to be shutdown and freed twice when the
dying thread finally closes the file descriptors, causing a use-after-free
bug.
Only cancel the hrtimer if the associated thread is still around. Also add
the interface_lock around the resetting of the tlat_var->kthread.
Note, this is just a quick fix that can be backported to stable. A real
fix is to have a better synchronization between the shutdown of old
threads and the starting of new ones.
Steven Rostedt [Wed, 4 Sep 2024 14:34:28 +0000 (10:34 -0400)]
tracing/osnoise: Use a cpumask to know what threads are kthreads
The start_kthread() and stop_thread() code was not always called with the
interface_lock held. This means that the kthread variable could be
unexpectedly changed causing the kthread_stop() to be called on it when it
should not have been, leading to:
while true; do
rtla timerlat top -u -q & PID=$!;
sleep 5;
kill -INT $PID;
sleep 0.001;
kill -TERM $PID;
wait $PID;
done
This is because it would mistakenly call kthread_stop() on a user space
thread making it "exit" before it actually exits.
Since kthreads are created based on global behavior, use a cpumask to know
when kthreads are running and that they need to be shutdown before
proceeding to do new work.
Steven Rostedt [Wed, 4 Sep 2024 17:16:05 +0000 (13:16 -0400)]
eventfs: Use list_del_rcu() for SRCU protected list variable
Chi Zhiling reported:
We found a null pointer accessing in tracefs[1], the reason is that the
variable 'ei_child' is set to LIST_POISON1, that means the list was
removed in eventfs_remove_rec. so when access the ei_child->is_freed, the
panic triggered.
by the way, the following script can reproduce this panic
loop1 (){
while true
do
echo "p:kp submit_bio" > /sys/kernel/debug/tracing/kprobe_events
echo "" > /sys/kernel/debug/tracing/kprobe_events
done
}
loop2 (){
while true
do
tree /sys/kernel/debug/tracing/events/kprobes/
done
}
loop1 &
loop2
The issue is that list_del() is used on an SRCU protected list variable
before the synchronization occurs. This can poison the list pointers while
there is a reader iterating the list.
This is simply fixed by using list_del_rcu() that is specifically made for
this purpose.
Zheng Yejian [Tue, 27 Aug 2024 12:46:54 +0000 (20:46 +0800)]
tracing: Avoid possible softlockup in tracing_iter_reset()
In __tracing_open(), when max latency tracers took place on the cpu,
the time start of its buffer would be updated, then event entries with
timestamps being earlier than start of the buffer would be skipped
(see tracing_iter_reset()).
Softlockup will occur if the kernel is non-preemptible and too many
entries were skipped in the loop that reset every cpu buffer, so add
cond_resched() to avoid it.
Cc: stable@vger.kernel.org Fixes: 2f26ebd549b9a ("tracing: use timestamp to determine start of latency traces") Link: https://lore.kernel.org/20240827124654.3817443-1-zhengyejian@huaweicloud.com Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Zheng Yejian <zhengyejian@huaweicloud.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
====================
Add driver for Motorcomm yt8821 2.5G ethernet phy
yt8521 and yt8531s as Gigabit transceiver use bit15:14(bit9 reserved
default 0) as phy speed mask, yt8821 as 2.5G transceiver uses bit9 bit15:14
as phy speed mask.
Be compatible to yt8821, reform phy speed mask and phy speed macro.
Based on update above, add yt8821 2.5G phy driver.
====================
Frank Sae [Sun, 1 Sep 2024 08:35:26 +0000 (01:35 -0700)]
net: phy: Add driver for Motorcomm yt8821 2.5G ethernet phy
Add a driver for the motorcomm yt8821 2.5G ethernet phy. Verified the
driver on BPI-R3(with MediaTek MT7986(Filogic 830) SoC) development board,
which is developed by Guangdong Bipai Technology Co., Ltd..
yt8821 2.5G ethernet phy works in AUTO_BX2500_SGMII or FORCE_BX2500
interface, supports 2.5G/1000M/100M/10M speeds, and wol(magic package).
Signed-off-by: Frank Sae <Frank.Sae@motor-comm.com> Reviewed-by: Sai Krishna <saikrishnag@marvell.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Frank Sae [Sun, 1 Sep 2024 08:35:25 +0000 (01:35 -0700)]
net: phy: Optimize phy speed mask to be compatible to yt8821
yt8521 and yt8531s as Gigabit transceiver use bit15:14(bit9 reserved
default 0) as phy speed mask, yt8821 as 2.5G transceiver uses bit9 bit15:14
as phy speed mask.
Be compatible to yt8821, reform phy speed mask and phy speed macro.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Frank Sae <Frank.Sae@motor-comm.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* tag 'linux-can-next-for-6.12-20240904-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
can: rockchip_canfd: add support for CAN_CTRLMODE_BERR_REPORTING
can: rockchip_canfd: add support for CAN_CTRLMODE_LOOPBACK
can: rockchip_canfd: add hardware timestamping support
can: rockchip_canfd: enable full TX-FIFO depth of 2
can: rockchip_canfd: prepare to use full TX-FIFO depth
can: rockchip_canfd: add stats support for errata workarounds
can: rockchip_canfd: rkcanfd_get_berr_counter_corrected(): work around broken {RX,TX}ERRORCNT register
can: rockchip_canfd: implement workaround for erratum 12
can: rockchip_canfd: implement workaround for erratum 6
can: rockchip_canfd: add TX PATH
can: rockchip_canfd: rkcanfd_register_done(): add warning for erratum 5
can: rockchip_canfd: rkcanfd_handle_rx_int_one(): implement workaround for erratum 5: check for empty FIFO
can: rockchip_canfd: add notes about known issues
can: rockchip_canfd: add support for rk3568v3
can: rockchip_canfd: add quirk for broken CAN-FD support
can: rockchip_canfd: add quirks for errata workarounds
can: rockchip_canfd: add driver for Rockchip CAN-FD controller
dt-bindings: can: rockchip_canfd: add rockchip CAN-FD controller
====================
Stefan Wahren [Thu, 5 Sep 2024 11:15:37 +0000 (13:15 +0200)]
spi: spi-fsl-lpspi: Fix off-by-one in prescale max
The commit 783bf5d09f86 ("spi: spi-fsl-lpspi: limit PRESCALE bit in
TCR register") doesn't implement the prescaler maximum as intended.
The maximum allowed value for i.MX93 should be 1 and for i.MX7ULP
it should be 7. So this needs also a adjustment of the comparison
in the scldiv calculation.
Chen Ni [Wed, 4 Sep 2024 01:50:03 +0000 (09:50 +0800)]
ptp: ptp_idt82p33: Convert comma to semicolon
Replace comma between expressions with semicolons.
Using a ',' in place of a ';' can have unintended side effects.
Although that is not the case here, it is seems best to use ';'
unless ',' is intended.
Found by inspection.
No functional change intended.
Compile tested only.
Hongbo Li [Wed, 4 Sep 2024 01:49:56 +0000 (09:49 +0800)]
net: dsa: felix: Annotate struct action_gate_entry with __counted_by
Add the __counted_by compiler attribute to the flexible array member
entries to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.
====================
Bonding: support new xfrm state offload functions
Add 2 new xfrm state offload functions xdo_dev_state_advance_esn and
xdo_dev_state_update_stats for bonding. The xdo_dev_state_free will be
added by Jianbo's patchset [1]. I will add the bonding xfrm policy offload
in future.
v7: no update, just rebase the code.
v6: Use "Return: " based on ./scripts/kernel-doc (Simon Horman)
v5: Rebase to latest net-next, update function doc (Jakub Kicinski)
v4: Ratelimit pr_warn (Sabrina Dubroca)
v3: Re-format bond_ipsec_dev, use slave_warn instead of WARN_ON (Nikolay Aleksandrov)
Fix bond_ipsec_dev defination, add *. (Simon Horman, kernel test robot)
Fix "real" typo (kernel test robot)
v2: Add a function to process the common device checking (Nikolay Aleksandrov)
Remove unused variable (Simon Horman)
v1: lore.kernel.org/netdev/20240816035518.203704-1-liuhangbin@gmail.com
====================
Hangbin Liu [Wed, 4 Sep 2024 00:34:57 +0000 (08:34 +0800)]
bonding: support xfrm state update
The patch add xfrm statistics update for bonding IPsec offload.
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hangbin Liu [Wed, 4 Sep 2024 00:34:56 +0000 (08:34 +0800)]
bonding: Add ESN support to IPSec HW offload
Currently, users can see that bonding supports IPSec HW offload via ethtool.
However, this functionality does not work with NICs like Mellanox cards when
ESN (Extended Sequence Numbers) is enabled, as ESN functions are not yet
supported. This patch adds ESN support to the bonding IPSec device offload,
ensuring proper functionality with NICs that support ESN.
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hangbin Liu [Wed, 4 Sep 2024 00:34:55 +0000 (08:34 +0800)]
bonding: add common function to check ipsec device
This patch adds a common function to check the status of IPSec devices.
This function will be useful for future implementations, such as IPSec ESN
and state offload callbacks.
Suggested-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
sched: sch_cake: fix bulk flow accounting logic for host fairness
In sch_cake, we keep track of the count of active bulk flows per host,
when running in dst/src host fairness mode, which is used as the
round-robin weight when iterating through flows. The count of active
bulk flows is updated whenever a flow changes state.
This has a peculiar interaction with the hash collision handling: when a
hash collision occurs (after the set-associative hashing), the state of
the hash bucket is simply updated to match the new packet that collided,
and if host fairness is enabled, that also means assigning new per-host
state to the flow. For this reason, the bulk flow counters of the
host(s) assigned to the flow are decremented, before new state is
assigned (and the counters, which may not belong to the same host
anymore, are incremented again).
Back when this code was introduced, the host fairness mode was always
enabled, so the decrement was unconditional. When the configuration
flags were introduced the *increment* was made conditional, but
the *decrement* was not. Which of course can lead to a spurious
decrement (and associated wrap-around to U16_MAX).
AFAICT, when host fairness is disabled, the decrement and wrap-around
happens as soon as a hash collision occurs (which is not that common in
itself, due to the set-associative hashing). However, in most cases this
is harmless, as the value is only used when host fairness mode is
enabled. So in order to trigger an array overflow, sch_cake has to first
be configured with host fairness disabled, and while running in this
mode, a hash collision has to occur to cause the overflow. Then, the
qdisc has to be reconfigured to enable host fairness, which leads to the
array out-of-bounds because the wrapped-around value is retained and
used as an array index. It seems that syzbot managed to trigger this,
which is quite impressive in its own right.
This patch fixes the issue by introducing the same conditional check on
decrement as is used on increment.
The original bug predates the upstreaming of cake, but the commit listed
in the Fixes tag touched that code, meaning that this patch won't apply
before that.
Fixes: 712639929912 ("sch_cake: Make the dual modes fairer") Reported-by: syzbot+7fe7b81d602cc1e6b94d@syzkaller.appspotmail.com Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20240903160846.20909-1-toke@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Fri, 30 Aug 2024 17:14:42 +0000 (10:14 -0700)]
docs: netdev: document guidance on cleanup.h
Document what was discussed multiple times on list and various
virtual / in-person conversations. guard() being okay in functions
<= 20 LoC is a bit of my own invention. If the function is trivial
it should be fine, but feel free to disagree :)
We'll obviously revisit this guidance as time passes and we and other
subsystems get more experience.
Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20240830171443.3532077-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Thu, 5 Sep 2024 00:37:37 +0000 (17:37 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
ice: fix synchronization between .ndo_bpf() and reset
Larysa Zaremba says:
PF reset can be triggered asynchronously, by tx_timeout or by a user. With some
unfortunate timings both ice_vsi_rebuild() and .ndo_bpf will try to access and
modify XDP rings at the same time, causing system crash.
The first patch factors out rtnl-locked code from VSI rebuild code to avoid
deadlock. The following changes lock rebuild and .ndo_bpf() critical sections
with an internal mutex as well and provide complementary fixes.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: do not bring the VSI up, if it was down before the XDP setup
ice: remove ICE_CFG_BUSY locking from AF_XDP code
ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset
ice: check for XDP rings instead of bpf program when unconfiguring
ice: protect XDP configuration with a mutex
ice: move netif_queue_set_napi to rtnl-protected sections
====================
Jakub Kicinski [Thu, 5 Sep 2024 00:14:11 +0000 (17:14 -0700)]
Merge tag 'wireless-2024-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v6.11
Hopefully final fixes for v6.11 and this time only fixes to ath11k
driver. We need to revert hibernation support due to reported
regressions and we have a fix for kernel crash introduced in
v6.11-rc1.
* tag 'wireless-2024-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
MAINTAINERS: wifi: cw1200: add net-cw1200.h
Revert "wifi: ath11k: support hibernation"
Revert "wifi: ath11k: restore country code during resume"
wifi: ath11k: fix NULL pointer dereference in ath11k_mac_get_eirp_power()
====================
Sean Anderson [Tue, 3 Sep 2024 18:49:12 +0000 (14:49 -0400)]
net: cadence: macb: Enable software IRQ coalescing by default
This NIC doesn't have hardware IRQ coalescing. Under high load,
interrupts can adversely affect performance. To mitigate this, enable
software IRQ coalescing by default. On my system this increases receive
throughput with iperf3 from 853 MBit/sec to 934 MBit/s, decreases
interrupts from 69489/sec to 2016/sec, and decreases CPU utilization
from 27% (4x Cortex-A53) to 14%. Latency is not affected (as far as I
can tell).
Sean Anderson [Tue, 3 Sep 2024 17:51:41 +0000 (13:51 -0400)]
net: xilinx: axienet: Fix race in axienet_stop
axienet_dma_err_handler can race with axienet_stop in the following
manner:
CPU 1 CPU 2
====================== ==================
axienet_stop()
napi_disable()
axienet_dma_stop()
axienet_dma_err_handler()
napi_disable()
axienet_dma_stop()
axienet_dma_start()
napi_enable()
cancel_work_sync()
free_irq()
Fix this by setting a flag in axienet_stop telling
axienet_dma_err_handler not to bother doing anything. I chose not to use
disable_work_sync to allow for easier backporting.
When reading registers from the PHY using the SIOCGMIIREG IOCTL any
errors returned from either mdiobus_read() or mdiobus_c45_read() are
ignored, and parts of the returned error is passed as the register value
back to user-space.
For example, if mdiobus_c45_read() is used with a bus that do not
implement the read_c45() callback -EOPNOTSUPP is returned. This is
however directly stored in mii_data->val_out and returned as the
registers content. As val_out is a u16 the error code is truncated and
returned as a plausible register value.
Fix this by first checking the return value for errors before returning
it as the register content.
Li Zetao [Tue, 3 Sep 2024 14:33:43 +0000 (22:33 +0800)]
pds_core: Remove redundant null pointer checks
Since the debugfs_create_dir() never returns a null pointer, checking
the return value for a null pointer is redundant, and using IS_ERR is
safe enough.
Li Zetao [Tue, 3 Sep 2024 14:31:49 +0000 (22:31 +0800)]
ionic: Remove redundant null pointer checks in ionic_debugfs_add_qcq()
Since the debugfs_create_dir() never returns a null pointer, checking
the return value for a null pointer is redundant, and using IS_ERR is
safe enough.
Jakub Kicinski [Wed, 4 Sep 2024 23:57:13 +0000 (16:57 -0700)]
Merge branch 'unmask-upper-dscp-bits-part-3'
Ido Schimmel says:
====================
Unmask upper DSCP bits - part 3
tl;dr - This patchset continues to unmask the upper DSCP bits in the
IPv4 flow key in preparation for allowing IPv4 FIB rules to match on
DSCP. No functional changes are expected.
The TOS field in the IPv4 flow key ('flowi4_tos') is used during FIB
lookup to match against the TOS selector in FIB rules and routes.
It is currently impossible for user space to configure FIB rules that
match on the DSCP value as the upper DSCP bits are either masked in the
various call sites that initialize the IPv4 flow key or along the path
to the FIB core.
In preparation for adding a DSCP selector to IPv4 and IPv6 FIB rules, we
need to make sure the entire DSCP value is present in the IPv4 flow key.
This patchset continues to unmask the upper DSCP bits, but this time in
the output route path, specifically in the callers of
ip_route_output_ports().
The next patchset (last) will handle the callers of
ip_route_output_key(). Split from this patchset to avoid going over the
15 patches limit.
No functional changes are expected as commit 1fa3314c14c6 ("ipv4:
Centralize TOS matching") moved the masking of the upper DSCP bits to
the core where 'flowi4_tos' is matched against the TOS selector.
====================
ipv6: sit: Unmask upper DSCP bits in ipip6_tunnel_bind_dev()
Unmask the upper DSCP bits when calling ip_route_output_ports() so that
in the future it could perform the FIB lookup according to the full DSCP
value.
ip6_tunnel: Unmask upper DSCP bits in ip4ip6_err()
Unmask the upper DSCP bits when calling ip_route_output_ports() so that
in the future it could perform the FIB lookup according to the full DSCP
value.
ipv4: ipmr: Unmask upper DSCP bits in ipmr_queue_xmit()
Unmask the upper DSCP bits when calling ip_route_output_ports() so that
in the future it could perform the FIB lookup according to the full DSCP
value.
The function is passed the full DS field in its 'tos' argument by its
two callers. It then masks the upper DSCP bits using RT_TOS() when
passing it to ip_route_output_ports().
Unmask the upper DSCP bits when passing 'tos' to ip_route_output_ports()
so that in the future it could perform the FIB lookup according to the
full DSCP value.
Chen Ni [Wed, 4 Sep 2024 01:44:41 +0000 (09:44 +0800)]
selftests: net: convert comma to semicolon
Replace comma between expressions with semicolons.
Using a ',' in place of a ';' can have unintended side effects.
Although that is not the case here, it is seems best to use ';'
unless ',' is intended.
Found by inspection.
No functional change intended.
Compile tested only.
ipv4: Fix user space build failure due to header change
RT_TOS() from include/uapi/linux/in_route.h is defined using
IPTOS_TOS_MASK from include/uapi/linux/ip.h. This is problematic for
files such as include/net/ip_fib.h that want to use RT_TOS() as without
including both header files kernel compilation fails:
In file included from ./include/net/ip_fib.h:25,
from ./include/net/route.h:27,
from ./include/net/lwtunnel.h:9,
from net/core/dst.c:24:
./include/net/ip_fib.h: In function ‘fib_dscp_masked_match’:
./include/uapi/linux/in_route.h:31:32: error: ‘IPTOS_TOS_MASK’ undeclared (first use in this function)
31 | #define RT_TOS(tos) ((tos)&IPTOS_TOS_MASK)
| ^~~~~~~~~~~~~~
./include/net/ip_fib.h:440:45: note: in expansion of macro ‘RT_TOS’
440 | return dscp == inet_dsfield_to_dscp(RT_TOS(fl4->flowi4_tos));
Therefore, cited commit changed linux/in_route.h to include linux/ip.h.
However, as reported by David, this breaks iproute2 compilation due
overlapping definitions between linux/ip.h and
/usr/include/netinet/ip.h:
In file included from ../include/uapi/linux/in_route.h:5,
from iproute.c:19:
../include/uapi/linux/ip.h:25:9: warning: "IPTOS_TOS" redefined
25 | #define IPTOS_TOS(tos) ((tos)&IPTOS_TOS_MASK)
| ^~~~~~~~~
In file included from iproute.c:17:
/usr/include/netinet/ip.h:222:9: note: this is the location of the previous definition
222 | #define IPTOS_TOS(tos) ((tos) & IPTOS_TOS_MASK)
Fix by changing include/net/ip_fib.h to include linux/ip.h. Note that
usage of RT_TOS() should not spread further in the kernel due to recent
work in this area.
Fixes: 1fa3314c14c6 ("ipv4: Centralize TOS matching") Reported-by: David Ahern <dsahern@kernel.org> Closes: https://lore.kernel.org/netdev/2f5146ff-507d-4cab-a195-b28c0c9e654e@kernel.org/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Guillaume Nault <gnault@redhat.com> Link: https://patch.msgid.link/20240903133554.2807343-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
James Chapman [Tue, 3 Sep 2024 11:35:47 +0000 (12:35 +0100)]
l2tp: remove unneeded null check in l2tp_v2_session_get_next
Commit aa92c1cec92b ("l2tp: add tunnel/session get_next helpers") uses
idr_get_next APIs to iterate over l2tp session IDR lists. Sessions in
l2tp_v2_session_idr always have a non-null session->tunnel pointer
since l2tp_session_register sets it before inserting the session into
the IDR. Therefore the null check on session->tunnel in
l2tp_v2_session_get_next is redundant and can be removed. Removing the
check avoids a warning from lkp.
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202408111407.HtON8jqa-lkp@intel.com/ CC: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: James Chapman <jchapman@katalix.com> Acked-by: Tom Parkin <tparkin@katalix.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240903113547.1261048-1-jchapman@katalix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonas Gorski [Tue, 3 Sep 2024 08:19:57 +0000 (10:19 +0200)]
net: bridge: br_fdb_external_learn_add(): always set EXT_LEARN
When userspace wants to take over a fdb entry by setting it as
EXTERN_LEARNED, we set both flags BR_FDB_ADDED_BY_EXT_LEARN and
BR_FDB_ADDED_BY_USER in br_fdb_external_learn_add().
If the bridge updates the entry later because its port changed, we clear
the BR_FDB_ADDED_BY_EXT_LEARN flag, but leave the BR_FDB_ADDED_BY_USER
flag set.
If userspace then wants to take over the entry again,
br_fdb_external_learn_add() sees that BR_FDB_ADDED_BY_USER and skips
setting the BR_FDB_ADDED_BY_EXT_LEARN flags, thus silently ignores the
update.
Fix this by always allowing to set BR_FDB_ADDED_BY_EXT_LEARN regardless
if this was a user fdb entry or not.
Fixes: 710ae7287737 ("net: bridge: Mark FDB entries that were added by user as such") Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20240903081958.29951-1-jonas.gorski@bisdn.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hayes Wang [Tue, 3 Sep 2024 06:33:33 +0000 (14:33 +0800)]
r8152: fix the firmware doesn't work
generic_ocp_write() asks the parameter "size" must be 4 bytes align.
Therefore, write the bp would fail, if the mac->bp_num is odd. Align the
size to 4 for fixing it. The way may write an extra bp, but the
rtl8152_is_fw_mac_ok() makes sure the value must be 0 for the bp whose
index is more than mac->bp_num. That is, there is no influence for the
firmware.
Besides, I check the return value of generic_ocp_write() to make sure
everything is correct.
We observed a null-ptr-deref in fou_gro_receive() while shutting down
a host. [0]
The NULL pointer is sk->sk_user_data, and the offset 8 is of protocol
in struct fou.
When fou_release() is called due to netns dismantle or explicit tunnel
teardown, udp_tunnel_sock_release() sets NULL to sk->sk_user_data.
Then, the tunnel socket is destroyed after a single RCU grace period.
So, in-flight udp4_gro_receive() could find the socket and execute the
FOU GRO handler, where sk->sk_user_data could be NULL.
Let's use rcu_dereference_sk_user_data() in fou_from_sock() and add NULL
checks in FOU GRO handlers.
net: mana: Improve mana_set_channels() in low mem conditions
The mana_set_channels() function requires detaching the mana
driver and reattaching it with changed channel values.
During this operation if the system is low on memory, the reattach
might fail, causing the network device being down.
To avoid this we pre-allocate buffers at the beginning of set operation,
to prevent complete network loss
Merge tag 'bcachefs-2024-09-04' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
- Fix a typo in the rebalance accounting changes
- BCH_SB_MEMBER_INVALID: small on disk format feature which will be
needed for full erasure coding support; this is only the minimum so
that 6.11 can handle future versions without barfing.
* tag 'bcachefs-2024-09-04' of git://evilpiepirate.org/bcachefs:
bcachefs: BCH_SB_MEMBER_INVALID
bcachefs: fix rebalance accounting
Merge tag 'perf-tools-fixes-for-v6.11-2024-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Namhyung Kim:
"A number of small fixes for the late cycle:
- Two more build fixes on 32-bit archs
- Fixed a segfault during perf test
- Fixed spinlock/rwlock accounting bug in perf lock contention"
* tag 'perf-tools-fixes-for-v6.11-2024-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf daemon: Fix the build on more 32-bit architectures
perf python: include "util/sample.h"
perf lock contention: Fix spinlock and rwlock accounting
perf test pmu: Set uninitialized PMU alias to null
Merge tag 'hwmon-for-v6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- hp-wmi-sensors: Check if WMI event data exists before accessing it
- ltc2991: fix register bits defines
* tag 'hwmon-for-v6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (hp-wmi-sensors) Check if WMI event data exists
hwmon: ltc2991: fix register bits defines
Merge tag 'for-6.11-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- followup fix for direct io and fsync under some conditions, reported
by QEMU users
- fix a potential leak when disabling quotas while some extent tracking
work can still happen
- in zoned mode handle unexpected change of zone write pointer in
RAID1-like block groups, turn the zones to read-only
* tag 'for-6.11-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix race between direct IO write and fsync when using same fd
btrfs: zoned: handle broken write pointer on zones
btrfs: qgroup: don't use extent changeset when not needed
Merge tag 'v6.11-rc6-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- Fix crash in session setup
- Fix locking bug
- Improve access bounds checking
* tag 'v6.11-rc6-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: Unlock on in ksmbd_tcp_set_interfaces()
ksmbd: unset the binding mark of a reused connection
smb: Annotate struct xattr_smb_acl with __counted_by()
Merge tag 'vfs-6.11-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"Two netfs fixes for this merge window:
- Ensure that fscache_cookie_lru_time is deleted when the fscache
module is removed to prevent UAF
- Fix filemap_invalidate_inode() to use invalidate_inode_pages2_range()
Before it used truncate_inode_pages_partial() which causes
copy_file_range() to fail on cifs"
* tag 'vfs-6.11-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fscache: delete fscache_cookie_lru_timer when fscache exits to avoid UAF
mm: Fix filemap_invalidate_inode() to use invalidate_inode_pages2_range()
Merge tag 'parisc-for-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc architecture fix from Helge Deller:
- Fix boot issue where boot memory is marked read-only too early
* tag 'parisc-for-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Delay write-protection until mark_rodata_ro() call
Merge tag 'mm-hotfixes-stable-2024-09-03-20-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"17 hotfixes, 15 of which are cc:stable.
Mostly MM, no identifiable theme. And a few nilfs2 fixups"
* tag 'mm-hotfixes-stable-2024-09-03-20-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
alloc_tag: fix allocation tag reporting when CONFIG_MODULES=n
mm: vmalloc: optimize vmap_lazy_nr arithmetic when purging each vmap_area
mailmap: update entry for Jan Kuliga
codetag: debug: mark codetags for poisoned page as empty
mm/memcontrol: respect zswap.writeback setting from parent cg too
scripts: fix gfp-translate after ___GFP_*_BITS conversion to an enum
Revert "mm: skip CMA pages when they are not available"
maple_tree: remove rcu_read_lock() from mt_validate()
kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y
mm/slub: add check for s->flags in the alloc_tagging_slab_free_hook
nilfs2: fix state management in error path of log writing function
nilfs2: fix missing cleanup on rollforward recovery error
nilfs2: protect references to superblock parameters exposed in sysfs
userfaultfd: don't BUG_ON() if khugepaged yanks our page table
userfaultfd: fix checks for huge PMDs
mm: vmalloc: ensure vmap_block is initialised before adding to queue
selftests: mm: fix build errors on armhf
The error occurs in older versions of the GNU ld with version earlier
than 2.36. It makes most sense to have a minimum LD version as
a dependency for HAVE_LD_DEAD_CODE_DATA_ELIMINATION and eliminate
the impact of ".reloc .text, R_ARM_NONE, ." when
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is not enabled.
Fixes: ed0f94102251 ("ARM: 9404/1: arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION") Reported-by: Harith George <mail2hgg@gmail.com> Tested-by: Harith George <mail2hgg@gmail.com> Suggested-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Yuntao Liu <liuyuntao12@huawei.com> Link: https://lore.kernel.org/all/14e9aefb-88d1-4eee-8288-ef15d4a9b059@gmail.com/ Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Merge patch series "can: rockchip_canfd: add support for CAN-FD IP core found on Rockchip RK3568"
Marc Kleine-Budde <mkl@pengutronix.de> says:
This series adds support for the CAN-FD IP core found on the Rockchip
RK3568.
The IP core is a bit complicated and has several documented errata.
The driver is added in several stages, first the base driver including
the RX-path. Then several workarounds for errata and the TX-path, and
finally features like hardware time stamping, loop-back mode and
bus error reporting.
can: rockchip_canfd: enable full TX-FIFO depth of 2
The previous commit prepared the TX path to make use of the full
TX-FIFO depth as much as possible. Increase the available TX-FIFO
depth to the hardware maximum of 2.
can: rockchip_canfd: prepare to use full TX-FIFO depth
So far the TX-FIFO is only used with a depth of 1, although the
hardware offers a depth of 2.
The workaround for the chips that are affected by erratum 6, i.e. EFF
frames may be send as standard frames, is to re-send the EFF frame.
This means the driver cannot queue the next frame for sending, as long
ad the EFF frame has not been successfully send out.
Introduce rkcanfd_get_effective_tx_free() that returns "0" space in
the TX-FIFO if an EFF frame is pending and the actual free space in
the TX-FIFO otherwise. Then replace rkcanfd_get_tx_free() with
rkcanfd_get_effective_tx_free() everywhere.
can: rockchip_canfd: add stats support for errata workarounds
The driver contains workarounds for some of the rk3568v2 errata. Add
ethtool-based statistics ("ethtool -S") to track how often an erratum
workaround was needed.
can: rockchip_canfd: rkcanfd_get_berr_counter_corrected(): work around broken {RX,TX}ERRORCNT register
Tests show that sometimes both CAN bus error counters read 0x0, even
if the controller is in warning mode
(RKCANFD_REG_STATE_ERROR_WARNING_STATE in RKCANFD_REG_STATE
set).
To work around this issue, if both error counters read from hardware
are 0x0, use the structure priv->bec, otherwise save the read value in
priv->bec.
In rkcanfd_handle_rx_int_one() decrement the priv->bec.rxerr for
successfully RX'ed CAN frames.
In rkcanfd_handle_tx_done_one() decrement the priv->bec.txerr for
successfully TX'ed CAN frames.
can: rockchip_canfd: implement workaround for erratum 12
The rk3568 CAN-FD errata sheet as of Tue 07 Nov 2023 11:25:31 +08:00
says:
| A dominant bit at the third bit of the intermission may cause a
| transmission error.
|
| When sampling the third bit of the intermission as a dominant bit, if
| tx_req is configured to transmit extended frames at this time, the
| extended frame may be sent to the bus in the format of a standard
| frame. The extended frame will be sent as a standard frame and will not
| result in error frames
Turn on "Interframe Spaceing RX Mode" only during TX to work around
erratum 12, according to rock-chip:
| Spaceing RX Mode = 1, the third Bit between frames cannot receive
| and send, and the fourth Bit begins to receive and send.
|
| Spaceing RX Mode = 0, allowing the third Bit between frames to
| receive and send.
can: rockchip_canfd: implement workaround for erratum 6
The rk3568 CAN-FD errata sheet as of Tue 07 Nov 2023 11:25:31 +08:00
says:
| The CAN controller's transmission of extended frames may
| intermittently change into standard frames.
|
| When using the CAN controller to send extended frames, if the
| 'tx_req' is configured as 1 and coincides with the internal
| transmission point, the extended frame will be transmitted onto the
| bus in the format of a standard frame.
To work around Erratum 6, the driver is in self-receiving mode (RXSTX)
and all received CAN frames are passed through rkcanfd_rxstx_filter().
Add a check in rkcanfd_rxstx_filter() whether the received frame
corresponds to the current outgoing frame, but the extended CAN ID has
been mangled to a standard ID. In this case re-send the original CAN
frame.
The IP core has a TX event FIFO. In other IP cores, this type of FIFO
usually contains the events that a CAN frame has been successfully
sent. However, the IP core on the rk3568v2 the FIFO also holds events
of unsuccessful transmission attempts.
It turned out that the best way to work around this problem is to set
the IP core to self-receive mode (RXSTX), filter out the self-received
frames and insert them into the complete TX path.
Add a pair new functions to check if 2 struct canfd_frame are equal.
The 1st checks if the header of the CAN frames are equal, the 2nd
checks if the data portion are equal:
can: rockchip_canfd: rkcanfd_register_done(): add warning for erratum 5
Tests on the rk3568v2 and rk3568v3 show that a reduced "baudclk" (e.g.
80MHz, compared to the standard 300MHz) significantly increases the
possibility of incorrect FIFO counters, i.e. erratum 5.
Print an info message if the clock is below the known good value of
300MHz.
can: rockchip_canfd: rkcanfd_handle_rx_int_one(): implement workaround for erratum 5: check for empty FIFO
The rk3568 CAN-FD errata sheet as of Tue 07 Nov 2023 11:25:31 +08:00
says:
| Erratum 5: Counters related to the TXFIFO and RXFIFO exhibit
| abnormal counting behavior.
|
| Due to a bug in the cross-asynchronous logic of the enable signals
| for rx_fifo_cnt and txe_fifo_frame_cnt counters, the counts of these
| two counters become inaccurate. This issue has resulted in the
| inability to use the TXFIFO and RXFIFO functions.
The errata sheet mentioned above states that only the rk3568v2 is
affected by this erratum, but tests with the rk3568v2 and rk3568v3
show that the RX_FIFO_CNT is sometimes too high. This leads to CAN
frames being read from the FIFO, which is then already empty.
Further tests on the rk3568v2 and rk3568v3 show that in this
situation (i.e. empty FIFO) all elements of the FIFO
header (frameinfo, id, ts) contain the same data.
On the rk3568v2 and rk3568v3, this problem only occurs extremely
rarely with the standard clock of 300 MHz, but almost immediately at
80 MHz.
To workaround this problem, check for empty FIFO with
rkcanfd_fifo_header_empty() in rkcanfd_handle_rx_int_one() and exit
early.
can: rockchip_canfd: add quirk for broken CAN-FD support
The errata sheets doesn't say anything about CAN-FD, but tests on the
rk3568v2 and rk3568v3 show that receiving certain CAN-FD frames
triggers an Error Interrupt.
can: rockchip_canfd: add driver for Rockchip CAN-FD controller
Add driver for the Rockchip CAN-FD controller.
The IP core on the rk3568v2 SoC has 12 documented errata. Corrections
for these errata will be added in the upcoming patches.
Since several workarounds are required for the TX path, only add the
base driver that only implements the RX path.
Although the RX path implements CAN-FD support, it's not activated in
ctrlmode_supported, as the IP core in the rk3568v2 has problems with
receiving or sending certain CAN-FD frames.
Add support for group stats for mac. The fbnic_set_counter help preserve
the default values for counters which are not touched by the driver.
The 'reset' flag in 'get_eth_mac_stats' allows to choose between
resetting the counter to recent most value or fetching the aggregated
values of the counter.
The 'fbnic_stat_rd64' read 64b stats counters in an atomic fashion using
read-read-read approach. This allows to isolate cases where counter is
moving too fast making accuracy of the counter questionable.
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Add ethtool ops support and enable 'get_drvinfo' for fbnic. The driver
provides firmware version information while the driver name and bus
information is provided by ethtool_get_drvinfo().
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Xing [Mon, 2 Sep 2024 16:06:10 +0000 (00:06 +0800)]
selftests: add selftest for UDP SO_PEEK_OFF support
Add the SO_PEEK_OFF selftest for UDP. In this patch, I mainly do
three things:
1. rename tcp_so_peek_off.c
2. adjust for UDP protocol
3. add selftests into it
Suggested-by: Jon Maloy <jmaloy@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 4 Sep 2024 10:53:50 +0000 (11:53 +0100)]
Merge branch 'sparx5-fdma-part-one'
Daniel Machon says:
====================
net: microchip: add FDMA library and use it for Sparx5
This patch series is the first of a 2-part series, that adds a new
common FDMA library for Microchip switch chips Sparx5 and lan966x. These
chips share the same FDMA engine, and as such will benefit from a
common library with a common implementation. This also has the benefit
of removing a lot open-coded bookkeeping and duplicate code for the two
drivers.
Additionally, upstreaming efforts for a third chip, lan969x, will begin
in the near future. This chip will use the new library too.
In this first series, the FDMA library is introduced and used by the
Sparx5 switch driver.
###################
# Example of use: #
###################
- Initialize the rx and tx fdma structs with values for: number of
DCB's, number of DB's, channel ID, DB size (data buffer size), and
total size of the requested memory. Also provide two callbacks:
nextptr_cb() and dataptr_cb() for getting the nextptr and dataptr.
- Allocate memory using fdma_alloc_phys() or fdma_alloc_coherent().
- Initialize the DCB's with fdma_dcb_init().
- Add new DCB's with fdma_dcb_add().
- Free memory with fdma_free_phys() or fdma_free_coherent().
Patch #1: introduces library and selects it for Sparx5.
Patch #2: includes the fdma_api.h header and removes old symbols.
Patch #3: replaces old rx and tx variables with equivalent ones from the
fdma struct. Only the variables that can be changed without
breaking traffic is changed in this patch.
Patch #4: uses the library for allocation of rx buffers. This requires
quite a bit of refactoring in this single patch.
Patch #5: uses the library for adding DCB's in the rx path.
Patch #6: uses the library for freeing rx buffers.
Patch #7: uses the library helpers in the rx path.
Patch #8: uses the library for allocation of tx buffers. This requires
quite a bit of refactoring in this single patch.
Patch #9: uses the library for adding DCB's in the tx path.
Patch #10: uses the library helpers in the tx path.
Patch #11: ditches the existing linked list for storing buffer addresses,
and instead uses offsets into contiguous memory.
Patch #12: modifies existing rx and tx functions to be direction
independent.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
These direction specific functions can be ditched in favor of a single
function: sparx5_fdma_reload(), which retrieves the channel id from the
fdma struct instead.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Mon, 2 Sep 2024 14:54:16 +0000 (16:54 +0200)]
net: sparx5: use contiguous memory for tx buffers
Currently, the driver uses a linked list for storing the tx buffer
addresses. This requires a good amount of extra bookkeeping code. Ditch
the linked list in favor of tx buffers being in the same contiguous
memory space as the DCB's and the DB's. The FDMA library has a helper
for this - so use that.
The tx buffer addresses are now retrieved as an offset into the FDMA
memory space.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Mon, 2 Sep 2024 14:54:15 +0000 (16:54 +0200)]
net: sparx5: use library helper for freeing tx buffers
The library has the helper fdma_free_phys() for freeing physical FDMA
memory. Use it in the exit path.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Mon, 2 Sep 2024 14:54:14 +0000 (16:54 +0200)]
net: sparx5: use FDMA library for adding DCB's in the tx path
Use the fdma_dcb_add() function to add DCB's in the tx path. This gets
rid of the open-coding of nextptr and dataptr handling and leaves it to
the library.
Also, make sure the fdma indexes are advanced using: fdma_dcb_advance(),
so that the correct nextptr and dataptr offsets are retrieved.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Mon, 2 Sep 2024 14:54:13 +0000 (16:54 +0200)]
net: sparx5: use the FDMA library for allocation of tx buffers
Use the two functions: fdma_alloc_phys() and fdma_dcb_init() for tx
buffer allocation and use the new buffers throughout.
In order to replace the old buffers with the new ones, we have to do the
following refactoring:
- use fdma_alloc_phys() and fdma_dcb_init()
- replace the variables: tx->dma, tx->first_entry and tx->curr_entry
with the equivalents from the FDMA struct.
- replace uses of sparx5_db_hw and sparx5_tx_dcb_hw with fdma_db and
fdma_dcb.
- add sparx5_fdma_tx_dataptr_cb callback for obtaining the dataptr.
- Initialize FDMA struct values.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Mon, 2 Sep 2024 14:54:12 +0000 (16:54 +0200)]
net: sparx5: use a few FDMA helpers in the rx path
The library provides helpers for a number of DCB and DB operations. Use
these in the rx path.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Mon, 2 Sep 2024 14:54:11 +0000 (16:54 +0200)]
net: sparx5: use library helper for freeing rx buffers
The library has the helper fdma_free_phys() for freeing physical FDMA
memory. Use it in the exit path.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Mon, 2 Sep 2024 14:54:10 +0000 (16:54 +0200)]
net: sparx5: use FDMA library for adding DCB's in the rx path
Use the fdma_dcb_add() function to add DCB's in the rx path. This gets
rid of the open-coding of nextptr and dataptr handling and leaves it to
the library.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Mon, 2 Sep 2024 14:54:09 +0000 (16:54 +0200)]
net: sparx5: use the FDMA library for allocation of rx buffers
Use the two functions: fdma_alloc_phys() and fdma_dcb_init() for rx
buffer allocation and use the new buffers throughout.
In order to replace the old buffers with the new ones, we have to do the
following refactoring:
- use fdma_alloc_phys() and fdma_dcb_init()
- replace the variables: rx->dma, rx->dcb_entries and rx->last_entry
with the equivalents from the FDMA struct.
- replace uses of sparx5_db_hw and sparx5_rx_dcb_hw with fdma_db and
fdma_dcb.
- add sparx5_fdma_rx_dataptr_cb callback for obtaining the dataptr.
- Initialize FDMA struct values.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Mon, 2 Sep 2024 14:54:08 +0000 (16:54 +0200)]
net: sparx5: replace a few variables with new equivalent ones
Replace the old rx and tx variables: channel_id, FDMA_DCB_MAX,
FDMA_RX_DCB_MAX_DBS, FDMA_TX_DCB_MAX_DBS, dcb_index and db_index with
the equivalents from the FDMA rx and tx structs. These variables are not
entangled in any buffer allocation and can therefore be replaced in
advance.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>