The original patch rejects any tree containing two netems when
either has duplication set, even when they sit on unrelated classes
of the same classful parent. That broke configurations that have
worked since netem was introduced.
The re-entrancy problem the original commit was trying to solve is
handled by later patch using tc_depth flag.
Doing this revert will (re)expose the original bug with multiple
netem duplication. When this patch is backported make sure
and get the full series.
Fixes: ec8e0e3d7ade ("net/sched: Restrict conditions for adding duplicating netems to qdisc tree") Reported-by: Ji-Soo Chung <jschung2@proton.me> Reported-by: Gerlinde <lrGerlinde@mailfence.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220774 Reported-by: zyc zyc <zyc199902@zohomail.cn> Closes: https://lore.kernel.org/all/19adda5a1e2.12410b78222774.9191120410578703463@zohomail.cn/ Reported-by: Manas Ghandat <ghandatmanas@gmail.com> Closes: https://lore.kernel.org/netdev/f69b2c8f-8325-4c2e-a011-6dbc089f30e4@gmail.com/ Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260525122556.973584-3-jhs@mojatatu.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jamal Hadi Salim [Mon, 25 May 2026 12:25:48 +0000 (08:25 -0400)]
net: Introduce skb tc depth field to track packet loops
Add a 2-bit per-skb tc depth field to track packet loops across the stack.
The previous per-CPU loop counters like MIRRED_NEST_LIMIT
assume a single call stack and lose state in two cases:
1) When a packet is queued and reprocessed later (e.g., egress->ingress
via backlog), the per-cpu state is gone by the time it is dequeued.
2) With XPS/RPS a packet may arrive on one CPU and be processed on
another.
A per-skb field solves both by travelling with the packet itself.
The field fits in existing padding, using 2 bits that were previously a
hole:
- /* XXX 2 bits hole, try to pack */
/* XXX 1 byte hole, try to pack */
__u16 tc_index; /* 134 2 */
There used to be a ttl field which was removed as part of tc_verd in commit aec745e2c520 ("net-tc: remove unused tc_verd fields"). It was already
unused by that time, due to remove earlier in commit c19ae86a510c ("tc: remove
unused redirect ttl").
The first user of this field is netem, which increments tc_depth on
duplicated packets before re-enqueueing them at the root qdisc. On
re-entry, netem skips duplication for any skb with tc_depth already set,
bounding recursion to a single level regardless of tree topology.
The other user is mirred which increments it on each pass
and limits to depth to MIRRED_DEFER_LIMIT (3).
The new field was called ttl in earlier versions of this patch
but renamed to tc_depth to avoid confusion with IP ttl.
Note (looking at you Sashiko! Dont ignore me and continue bringing this up):
1. Since both mirred and netem utilize the same 2-bit tc_depth field it is
possible when netem and mirred are used together that netem qdisc to skip
the duplication step. This is a known trade-off, as a 2-bit field cannot
independently track both features' recursion depths and it is not considered
sane to have a setup that addresses both features on at the same time.
2. skb_scrub_packet does not clear tc_depth. This means a packet's loop history
is preserved even across namespaces. While this might be restrictive for
some topologies, it is also design intent to provide robustness against loops
across namespaces.
Heiko Carstens [Tue, 19 May 2026 11:03:15 +0000 (13:03 +0200)]
seqlock: Allow UBSAN_ALIGNMENT to fail optimizing
With gcc-15 and gcc-16 with UBSAN_ALIGNMENT enabled the compiler fails to
inline and optimize __scoped_seqlock_bug() away on s390:
s390x-16.1.0-ld: kernel/sched/build_policy.o: in function `__scoped_seqlock_next':
/.../seqlock.h:1286:(.text+0x22030): undefined reference to `__scoped_seqlock_bug'
Fix this by adding UBSAN_ALIGNMENT to the list of config options where a
not inlined empty __scoped_seqlock_bug() is allowed.
Marco Elver [Fri, 15 May 2026 12:43:31 +0000 (14:43 +0200)]
compiler-context-analysis: Bump required Clang version to 23
Clang 23 introduces several major improvements:
1. Support for multiple arguments in the `guarded_by` and
`pt_guarded_by` attributes [1]. This allows defining variables
protected by multiple context locks, where read access requires
holding at least one lock (shared or exclusive), and write access
requires holding all of them exclusively.
2. Function pointer support [2]. We can now add attributes to function
pointers just like we do on normal functions.
3. A fix to use arrays of locks [3]. Each index is now correctly treated
as a separate lock instance.
4. A fix for implicit member access in attributes [4]. This allows to
use __guarded_by(&foo->lock) correctly.
Overall that makes it worthwhile bumping the compiler version instead of
trying to make both Clang 22 and later work while supporting these new
features.
Jan Polensky [Thu, 21 May 2026 12:01:32 +0000 (14:01 +0200)]
s390/bug: Always emit format word in __BUG_ENTRY
When CONFIG_DEBUG_BUGVERBOSE is disabled, the s390 __BUG_ENTRY() macro
omits the format string pointer, so the generated __bug_table entry no
longer matches struct bug_entry.
With HAVE_ARCH_BUG_FORMAT enabled, the generic BUG infrastructure reads
bug_entry::format via bug_get_format(). If the format word is missing,
subsequent fields are read from the wrong offset, which may:
- Misinterpret flags (BUG vs WARN classification errors)
- Fault when dereferencing a misread format pointer
The root cause is that __BUG_ENTRY() delegates format word emission to
__BUG_ENTRY_VERBOSE(), which is conditional on CONFIG_DEBUG_BUGVERBOSE.
Fix this by moving the format field emission directly into __BUG_ENTRY()
so it is always emitted unconditionally. Remove the format parameter from
__BUG_ENTRY_VERBOSE() and keep only file/line emission conditional on
CONFIG_DEBUG_BUGVERBOSE.
Fixes: 2b71b8ab9718 ("s390/bug: Use BUG_FORMAT for DEBUG_BUGVERBOSE_DETAILED") Signed-off-by: Jan Polensky <japo@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
====================
wangxun: improve service task synchronization
This series improves synchronization between asynchronous service work,
device teardown, and module event handling in the Wangxun drivers.
====================
Jiawen Wu [Mon, 25 May 2026 10:05:43 +0000 (18:05 +0800)]
net: txgbe: rework service event handling
Convert to use test_and_clear_bit() for link event subtasks. Only re-arm
the WX_FLAG_NEED_MODULE_RESET flag when module is absent. Unsupported or
invalid modules no longer cause the service task to continuously retry
module identification.
Additionally, explicitly cancel service_task during device teardown to
ensure no pending asynchronous service work survives after the device
has entered the DOWN state.
Jiawen Wu [Mon, 25 May 2026 10:05:42 +0000 (18:05 +0800)]
net: wangxun: avoid statistics updates during device teardown
After introducing WX_STATE_DOWN, wx_update_stats() now explicitly skips
statistics collection while the device is in teardown or reset state.
Calling wx_update_stats() from the device disable path therefore becomes
redundant.
Remove wx_update_stats() calls from ngbe_disable_device() and
txgbe_disable_device().
Jiawen Wu [Mon, 25 May 2026 10:05:41 +0000 (18:05 +0800)]
net: wangxun: introduce WX_STATE_DOWN to serialize device shutdown state
Replace various netif_running() checks with an explicit WX_STATE_DOWN
state bit to track whether the device datapath and interrupt handling
are operational.
The previous logic relied on netif_running() to gate interrupt
reenablement, queue wakeups, statistics updates, and service task
execution. However, netif_running() only reflects the administrative
state of the netdevice and does not fully serialize against teardown
and reset paths. During device shutdown and reset flows, asynchronous
contexts such as interrupt handlers, NAPI poll, and service work could
still observe netif_running() as true while device resources were
already being disabled or freed.
mm, slab: simplify returning slab in __refill_objects_node()
When we return slabs to the partial list because we didn't fully refill
from them, we observe the min_partial limit when the returned slab is
empty, and discard it when over the limit. But it's unlikely for the
limit to be reached while we were refilling, and the worst outcome is to
have temporarily more free slabs on the list than necessary. So just
drop that code and simplify the function.
mm, slab: add an optimistic __slab_try_return_freelist()
When we end up returning extraneous objects during refill to a slab
where we just did a get_freelist_nofreeze(), it is likely no other CPU
has freed objects to it meanwhile. We can then reattach the remainder of
the freelist without having to walk the (potentially cache cold)
freelist for finding its tail to connect slab->freelist to it.
Add a __slab_try_return_freelist() function that does that. As suggested
by Hao Li, it doesn't need to also return the slab to the partial list,
because there's code in __refill_objects_node() that already does that
for any slabs where we don't detach the freelist in the first place. So
we just put the slab back to the pc.slabs list. It's no longer likely
that the list will be empty now, so remove the unlikely() annotation.
However, also change that code to add to the tail of the partial list
instead of head to match what __slab_free() did and avoid a regression,
that was reported for the earlier version by the kernel test robot [1].
This change will also affect slabs which were grabbed from the partial
list and not refilled from even partially, but those should be much more
rare than a partial refill.
Peter Zijlstra [Tue, 26 May 2026 09:06:31 +0000 (11:06 +0200)]
x86/kvm/vmx: Fix x86_64 CFI build
It was missed that idt_do_interrupt_irqoff() gets compiled on x84_64;
this is a problem for CFI builds because it includes an unadorned
indirect call. It is however completely dead code.
Rework things to not emit this function at all.
Fixes: 0701c9e17bd9 ("x86/kvm/vmx: Move IRQ/NMI dispatch from KVM into x86 core") Reported-by: Nathan Chancellor <nathan@kernel.org> Reported-by: Calvin Owens <calvin@wbinvd.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://patch.msgid.link/20260526090631.GA4149641@noisy.programming.kicks-ass.net
KVM: arm64: Fallback to a supported value for unsupported guest TGx
When KVM derives the translation granule for emulated stage-1 and
stage-2 walks, it decodes TCR/VTCR.TGx and treats the granule as-is.
This is wrong when the guest programs a granule size that is not
advertised in the guest's ID_AA64MMFR0_EL1.TGRAN* fields.
Architecturally, such a value must be treated as an implemented granule
size. Choose an available one while prioritizing PAGE_SIZE.
rust: error: replace match + panic in const context with const expect
This patch replaces an instance of match + panic with const expect,
which is now usable in const contexts after the MSRV was updated to
1.85.0 (it was available since Rust 1.83.0).
KVM: arm64: nv: Use literal granule size in TLBI range calculation
TLBI handling derives the invalidation range from guest VTCR_EL2.TG0 in
get_guest_mapping_ttl() and compute_tlb_inval_range(). Switch these to
use a helper that returns the decoded VTCR_EL2.TG0 granule size instead
of decoding it inline.
This keeps the granule size derivation in one place and prepares for
following changes that adjust the effective granule size.
KVM: arm64: Factor out TG0/1 decoding of VTCR and TCR
The current code decodes TCR.TG0/TG1 and VTCR.TG0 inline at several
places. Extract this logic into helpers so the granule size can be
derived in one place. This enables us to alter the effective granule
size in the same place, which we will do in a later patch.
Rosen Penev [Thu, 28 May 2026 04:10:31 +0000 (21:10 -0700)]
gpio: realtek-otto: fix kernel-doc warnings
Add the missing 'struct' keyword in the kernel-doc comment for
realtek_gpio_ctrl, and document the @cpumask_base and @cpu_irq_maskable
members that were added later but never described. Also fix the
mismatch between documented @imr_line_pos and the actual member name
line_imr_pos.
Fixes W=1 warning:
Warning: drivers/gpio/gpio-realtek-otto.c:66 cannot understand function prototype: 'struct realtek_gpio_ctrl'
gpio: Use named initializers for platform_device_id arrays
Named initializers are better readable and more robust to changes of the
struct definition. This robustness is relevant for a planned change to
struct platform_device_id replacing .driver_data by an anonymous unit.
While touching these arrays unify spacing and usage of commas.
Linus Walleij [Sat, 28 Feb 2026 00:05:48 +0000 (01:05 +0100)]
ARM: dts: gemini: Correct the RUT1xx
Fix two problems with the RUT1xx device tree:
- The memory is 32MB not 128MB
- The console is 19200 BPS
- Activate the PCI
- Disable the unused USB ports
Linus Walleij [Sat, 28 Feb 2026 00:05:47 +0000 (01:05 +0100)]
ARM: dts: Add a Raidsonic IB-4210-B DTS
This adds a device tree for the Raidsonic IB-4210-B NAS, a slightly
under-powered version of IB-4220-B with half the memory and
the cheaper version of the SoC.
Linus Walleij [Sat, 28 Feb 2026 00:05:46 +0000 (01:05 +0100)]
ARM: dts: Add a Verbatim Gigabit NAS DTS
This adds a device tree for the Verbatim S08V1901-D1 NAS
which also has the product names "Gigabit Network Hard Drive"
"Gigabit NAS" and maybe other names.
Johannes Berg [Thu, 28 May 2026 08:23:12 +0000 (10:23 +0200)]
Merge tag 'ath-next-20260526' of git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath
Jeff Johnson says:
==================
ath.git patches for v7.2 (PR #2)
For ath12k:
- Add thermal throttling and cooling device support
- Add support for handling incumbent signal interference in 6 GHz
- Add support for channel 177 in the 5 GHz band
In addition, a large number of cleanup and minor bug fixing across
all supported drivers.
==================
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This contains mainly:
UHR support (DPS, DUO, multi-link PM), NAN enhancements
(multicast, schedule config v2, multiple stations), EMLSR fixes, new
Killer/LNL device IDs, firmware API cleanups, and a few bugfixes
====================
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Feng Tang [Mon, 25 May 2026 01:51:11 +0000 (09:51 +0800)]
dma-contiguous: simplify numa cma area handling
Currently, there are 2 kernel cmdline ways to setup numa cma area:
"cma_pernuma=" and "numa_cma=", and there are 2 cma arrays as well,
while they have no difference technically. Robin suggested to cleanup
the code and only use one array [1], as "the apparent intent that
users only want one _or_ the other".
Simplify the code by only using one array to save the numa cma area.
And in rare case that a user really setup the 2 cmdline parameters
at the same time, let the per-node specific size setting 'numa_cma='
take priority over the global numa cma setting.
Simona Vetter [Thu, 28 May 2026 07:56:06 +0000 (09:56 +0200)]
Merge v7.1-rc5 into drm-next
Boris Brezillion needs the gem lru fixes 379e8f1ca5e9 ("drm/gem: Make
the GEM LRU lock part of drm_device") backmerged for drm-misc-next.
That also means we need to sort out the rename conflict in panthor with
the fixup patch from Boris from drm-tip.
Miri Korenblit [Wed, 27 May 2026 19:51:45 +0000 (22:51 +0300)]
wifi: mac80211: fix channel evacuation logic
When we try to assign a chanctx to a link, if
ieee80211_find_or_create_chanctx() failed, we try to evacuate a NAN
channel and call it again.
This logic is broken:
In case there are not enough chanctxs we will fail earlier,
when we check ieee80211_check_combinations().
To fix this, do the following in case ieee80211_check_combinations()
failed:
- check if there is a NAN channel that can be evacuated
- make ieee80211_check_combinations() not consider the chanctx of that NAN
channel, so we pretend that it was already evacuated
- If now ieee80211_check_combinations() is successful, we know that it
helped, and we can remove that NAN channel for real.
Miri Korenblit [Wed, 27 May 2026 19:51:43 +0000 (22:51 +0300)]
wifi: mac80211: add an option to filter out a channel in combinations check
Sometimes, ieee80211_check_combinations fails, but it is hard to know
why exactly. We will have to return an array of reasons, one per
combination.
In cases where we want to check if it failed because there are not
enough chanctxs (and maybe remove one if needed), we can just not fill
in that chanctx(s) in iface_combination_params::num_different_channels
in ieee80211_fill_ifcomb_params, so that chanctx(s) won't be taken into
account.
To allow that, add an option to pass a callback to
ieee80211_fill_ifcomb_params. This callback will be called for each
chanctx we consider to count in num_different_channels and will return
whether or not this chanctx should be skipped and not counted.
Maoyi Xie [Wed, 27 May 2026 13:33:58 +0000 (21:33 +0800)]
wifi: nl80211: re-check wiphy netns in testmode and vendor dump continuations
Commit 79240f3f6d76 ("wifi: nl80211: re-check wiphy netns in
nl80211_prepare_wdev_dump() continuation") fixed one dumpit path that
looked the wiphy up by index on a later call without confirming it was
still in the caller's netns. Two more dumpit paths have the same gap.
nl80211_testmode_dump() and nl80211_prepare_vendor_dump() both keep the
wiphy index in cb->args[] and look it up again on later calls, through
cfg80211_rdev_by_wiphy_idx() and wiphy_idx_to_wiphy(). The first call
binds to the caller's netns. A later call does not check it again. In
between, the wiphy can move to another netns via
NL80211_CMD_SET_WIPHY_NETNS.
Add the same net_eq() check to both. On a mismatch, return -ENODEV and
the dump ends.
No mainline driver registers .testmode_dump or
wiphy_vendor_command.dumpit, so these paths are not reachable today.
Drivers outside the tree can register either.
Lachlan Hodges [Wed, 27 May 2026 03:38:28 +0000 (13:38 +1000)]
wifi: mac80211_hwsim: modernise S1G channel list
The current S1G channel list in mac80211_hwsim does not represent
what S1G drivers would advertise that being 1MHz primaries. Also,
include the NO_PRIMARY flag on the edgeband 1MHz primaries to emulate
US operation such that it can also be tested.
Lachlan Hodges [Wed, 27 May 2026 03:38:27 +0000 (13:38 +1000)]
wifi: mac80211_hwsim: don't run RC update on new STA on S1G vif
mac80211_hwsim_sta_rc_update() is unable to handle S1G widths so
when a new STA is added under a S1G vif the WARN is hit preventing
hwsim use for S1G. For now, skip calling rc_update() for S1G
interfaces. This is required such that the soon-to-be S1G hwsim tests
can successfully run.
wifi: mac80211: add KUnit coverage for negotiated TTLM parser
Add KUnit coverage for ieee80211_parse_neg_ttlm() to lock the sparse
link_map_presence layout against future regressions.
The sparse_presence_no_oob_read case crafts a negotiated TTLM element
with link_map_presence = BIT(0) | BIT(7) and bm_size = 2 in a buffer
sized exactly to the validated element length. Without the parser
fix this would read 14 bytes past the buffer when processing TID 7;
under KASAN that is a slab-out-of-bounds report.
The dense_presence_baseline case crafts a fully populated
link_map_presence = 0xff element to confirm that the cursor-advance
fix does not regress the path that was already correct.
Export ieee80211_parse_neg_ttlm via VISIBLE_IF_MAC80211_KUNIT so the
test can call it directly.
Jackie Dong [Wed, 27 May 2026 13:03:53 +0000 (21:03 +0800)]
ALSA: hda/realtek:ALC269 fixup for Yoga Pro 7 15ASH11 mic mute LED
Lenovo Yoga Pro 7 15ASH11 with AMD RYZEN AI MAX+ 388 (Strix Halo, ACP
7.0) uses Realtek ALC287 series codec. The ALC269_FIXUP_LENOVO_XPAD_ACPI
in alc269_fixup_vendor_tbl[] can load lenovo_wmi_hotkey_utilities module
by default in this laptop, but the driver doesn't control mic mute LED.
If users run below command and the mic mute LED can work normally.
Zhao Dongdong [Wed, 27 May 2026 12:09:14 +0000 (20:09 +0800)]
ALSA: aoa: check snd_ctl_new1() return value
snd_ctl_new1() can return NULL when memory allocation fails. In
layout.c, the function does not check the return value before
dereferencing ctl->id.name or passing to aoa_snd_ctl_add(), which can
lead to a NULL pointer dereference.
Add NULL checks after snd_ctl_new1() calls and return early if any
fails.
Jani Nikula [Wed, 27 May 2026 10:02:12 +0000 (13:02 +0300)]
drm/i915: rename intel_runtime_{suspend, resume} to i915_pm_runtime_{suspend, resume}
All the other struct dev_pm_ops hooks are named i915_pm_*(), but the
.runtime_suspend and .runtime_resume hooks are called
intel_runtime_suspend() and intel_runtime_resume(), respectively.
Rename intel_runtime_suspend() to i915_pm_runtime_suspend() and
intel_runtime_resume() to i915_pm_runtime_resume() to unify.
Jani Nikula [Wed, 27 May 2026 10:02:11 +0000 (13:02 +0300)]
drm/i915/power: add "runtime" to intel_display_power_{suspend, resume}() names
The intel_display_power_suspend() and intel_display_power_resume()
functions are supposed to be called from the struct dev_pm_pops
.runtime_suspend and .runtime_resume hook paths. Name them accordingly
to intel_display_power_runtime_suspend() and
intel_display_power_runtime_resume().
Zhao Dongdong [Wed, 27 May 2026 12:09:13 +0000 (20:09 +0800)]
ALSA: cmipci: check snd_ctl_new1() return value
snd_ctl_new1() can return NULL when memory allocation fails.
snd_cmipci_spdif_controls() does not check the return value before
dereferencing kctl->id.device, which can lead to a NULL pointer
dereference.
Add NULL checks after snd_ctl_new1() calls and return -ENOMEM if any
fails.
Zhao Dongdong [Wed, 27 May 2026 12:09:12 +0000 (20:09 +0800)]
ALSA: ymfpci: check snd_ctl_new1() return value
snd_ctl_new1() can return NULL when memory allocation fails.
snd_ymfpci_create_spdif_controls() does not check the return value
before dereferencing kctl->id.device, which can lead to a NULL pointer
dereference.
Add NULL checks after snd_ctl_new1() calls and return -ENOMEM if any
fails.
Zhao Dongdong [Wed, 27 May 2026 12:09:11 +0000 (20:09 +0800)]
ALSA: ice1712: check snd_ctl_new1() return value
snd_ctl_new1() can return NULL when memory allocation fails. The
ice1712 driver calls snd_ctl_new1() without checking the return value
before dereferencing the pointer in multiple places (ice1712.c,
ice1724.c, aureon.c), which can lead to NULL pointer dereferences.
Add NULL checks after snd_ctl_new1() calls and return -ENOMEM if any
fails.
Zhao Dongdong [Wed, 27 May 2026 12:09:10 +0000 (20:09 +0800)]
ALSA: gus: check snd_ctl_new1() return value
snd_ctl_new1() can return NULL when memory allocation fails.
snd_gf1_pcm_volume_control() does not check the return value before
dereferencing kctl->id.index, which can lead to a NULL pointer
dereference.
Add a NULL check after snd_ctl_new1() and return -ENOMEM if it fails.
Nicolin Chen [Thu, 21 May 2026 20:34:22 +0000 (13:34 -0700)]
iommu/arm-smmu-v3: Allow ATS to be always on
When a device's default substream attaches to an identity domain, the SMMU
driver currently sets the device's STE between two modes:
Mode 1: Cfg=Translate, S1DSS=Bypass, EATS=1
Mode 2: Cfg=bypass (EATS is ignored by HW)
When there is an active PASID (non-default substream), mode 1 is used. And
when there is no PASID support or no active PASID, mode 2 is used.
The driver will also downgrade an STE from mode 1 to mode 2, when the last
active substream becomes inactive.
However, there are PCIe devices that demand ATS to be always on. For these
devices, their STEs have to use the mode 1 as HW ignores EATS with mode 2.
Change the driver accordingly:
- always use the mode 1
- never downgrade to mode 2
- allocate and retain a CD table (see note below)
Note that these devices might not support PASID, i.e. doing non-PASID ATS.
In such a case, the ssid_bits is set to 0. However, s1cdmax must be set to
a !0 value in order to keep the S1DSS field effective. Thus, when a master
requires ats_always_on, set its s1cdmax to at least 1, meaning that the CD
table will have a dummy entry (SSID=1) that will never be used.
Now for these devices, arm_smmu_cdtab_allocated() will always return true,
v.s. false prior to this change. When its default substream is attached to
an IDENTITY domain, its first CD is NULL in the table, which is a totally
valid case. Thus, add "!master->ats_always_on" to the condition.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Tested-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Nirmoy Das <nirmoyd@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Nicolin Chen [Thu, 21 May 2026 20:34:21 +0000 (13:34 -0700)]
PCI: Allow ATS to be always on for pre-CXL devices
Some NVIDIA GPU/NIC devices, though they don't implement CXL config space,
have many CXL-like properties. Call this kind "pre-CXL".
Similar to CXL.cache capability, these pre-CXL devices also require the ATS
function even when their RIDs are IOMMU bypassed, i.e. keep ATS "always on"
v.s. "on demand" when a non-zero PASID line gets enabled in SVA use cases.
Introduce pci_dev_specific_ats_required() quirk function to scan a list of
IDs for these devices. Then, include it in pci_ats_required().
Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nirmoy Das <nirmoyd@nvidia.com> Tested-by: Nirmoy Das <nirmoyd@nvidia.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Zhao Dongdong [Wed, 27 May 2026 12:09:09 +0000 (20:09 +0800)]
ALSA: es1938: check snd_ctl_new1() return value
snd_ctl_new1() can return NULL when memory allocation fails.
snd_es1938_mixer() does not check the return value before dereferencing
the pointer, which can lead to a NULL pointer dereference.
Add a NULL check after snd_ctl_new1() and return -ENOMEM if it fails.
Nicolin Chen [Thu, 21 May 2026 20:34:20 +0000 (13:34 -0700)]
PCI: Add pci_ats_required() for CXL.cache capable devices
Controlled by IOMMU drivers, ATS can be enabled "on demand", when a given
PASID on a device is attached to an I/O page table. This is working, even
when a device has no translation on its RID (i.e., RID is IOMMU bypassed).
However, certain PCIe devices require non-PASID ATS on their RID even when
the RID is IOMMU bypassed. Call this "ATS always on" in IOMMU term.
For example, CXL spec r4.0 notes in sec 3.2.5.13 Memory Type on CXL.cache:
"To source requests on CXL.cache, devices need to get the Host Physical
Address (HPA) from the Host by means of an ATS request on CXL.io."
In other words, the CXL.cache capability requires ATS; otherwise, it can't
access host physical memory.
Introduce a new pci_ats_required() helper for the IOMMU driver to scan a
PCI device and shift ATS policies between "on demand" and "always on".
Add the support for CXL.cache devices first. Pre-CXL devices will be added
in quirks.c file.
Note that pci_ats_required() validates against pci_ats_supported(), so we
ensure that untrusted devices (e.g. external ports) will not be always on.
This maintains the existing ATS security policy regarding potential side-
channel attacks via ATS.
Cc: linux-cxl@vger.kernel.org Suggested-by: Vikram Sethi <vsethi@nvidia.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Nirmoy Das <nirmoyd@nvidia.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Arnd Bergmann [Tue, 19 May 2026 20:36:51 +0000 (22:36 +0200)]
iommu: vsi: avoid -Wformat-security warning
When -Wformat-security is enabled, it catches a call to
iommu_device_sysfs_add() that passes a string variable in
place of a format:
drivers/iommu/vsi-iommu.c: In function 'vsi_iommu_probe':
drivers/iommu/vsi-iommu.c:717:9: error: format not a string literal and no format arguments [-Werror=format-security]
717 | err = iommu_device_sysfs_add(&iommu->iommu, dev, NULL, dev_name(dev));
| ^~~
Pass this indirectly using "%s" as the format instead.
Vasant Hegde [Sun, 17 May 2026 12:29:25 +0000 (12:29 +0000)]
iommu/amd: Fix premature break in init_iommu_one()
In init_iommu_one(), when processing IOMMU EFR attributes, the code checks
whether GASUP is enabled. If GASUP is not enabled, the code falls back to
legacy guest IR mode and then breaks out of the switch statement.
This break incorrectly skips the subsequent initialization steps that
follow the GASUP check. These initializations are independent of GASUP
support and must always be performed.
Fix this by replacing the early break with a conditional else block,
ensuring that the XTSUP check is only skipped when GASUP is not available.
Fixes: a44092e326d4 ("iommu/amd: Use IVHD EFR for early initialization of IOMMU features") Reported-by: Sudheer Dantuluri <dantuluris@google.com> Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Miguel Ojeda [Thu, 28 May 2026 07:25:41 +0000 (09:25 +0200)]
Merge tag 'alloc-7.2-rc1' of https://github.com/Rust-for-Linux/linux into rust-next
Pull alloc updates from Danilo Krummrich:
- Fix the 'Vec::reserve()' doctest to properly account for the existing
vector length in the capacity assertion.
- Fix an incorrect operator in the 'Vec::extend_with()' SAFETY comment;
add a doc test demonstrating basic usage and the zero-length case.
- Cleanup all imports in the alloc module and its doctests to use the
"kernel vertical" import style.
* tag 'alloc-7.2-rc1' of https://github.com/Rust-for-Linux/linux:
rust: alloc: cleanup doctest imports to "kernel vertical" style
rust: alloc: cleanup imports and use "kernel vertical" style
rust: alloc: fix `Vec::extend_with` SAFETY comment
rust: alloc: add doc test for `Vec::extend_with`
rust: alloc: fix assert in `Vec::reserve` doc test
Arnd Bergmann [Wed, 27 May 2026 19:45:07 +0000 (21:45 +0200)]
drm/exynos: fix size_t format string
The exynos_gem->base.size argument is a size_t rather than an
unsigned long, so adapt the printk() format string accordingly:
In file included from drivers/gpu/drm/exynos/exynos_drm_gem.c:16:
drivers/gpu/drm/exynos/exynos_drm_gem.c: In function 'exynos_drm_alloc_buf':
drivers/gpu/drm/exynos/exynos_drm_gem.c:69:49: error: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'size_t' {aka 'unsigned int'} [-Werror=format=]
69 | DRM_DEV_DEBUG_KMS(drm_dev_dma_dev(dev), "dma_addr(0x%lx), size(0x%lx)\n",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
70 | (unsigned long)exynos_gem->dma_addr, exynos_gem->base.size);
| ~~~~~~~~~~~~~~~~~~~~~
| |
| size_t {aka unsigned int}
The dma_addr in the same line is already printed using a cast
to unsigned long, so change that similarly to use the correct
%pad format.
gcc-16 has gained some more advanced inter-procedual optimization
techniques that enable it to inline the dummy_tlb_add_page() and
dummy_tlb_flush() function pointers into a specialized version of
__arm_v7s_unmap:
>From what I can tell, the transformation is correct, as this is only
called when __arm_v7s_unmap() is called from arm_v7s_do_selftests(),
which is also __init. Since __arm_v7s_unmap() however is not __init,
gcc cannot inline the inner function calls directly.
In debug_objects_selftest(), the same thing happens. Both the
caller and the leaf function are __init, but the IPA pulls
it into a non-init one:
Marking the affected functions as not "__init" would reliably avoid this
issue but is not a good solution because it removes an otherwise correct
annotation. I tried marking the functions as 'noinline', but that ended
up not covering all the affected configurations.
With some more experimenting, I found that marking these functions as
__attribute__((noipa)) is both logical and reliable.
In order to keep the syntax readable, add a custom macro for this in
include/linux/compiler_attributes.h next to other related macros and
use it to annotate both files.
Link: https://lore.kernel.org/all/abRB6g-48ZX6Yl2r@willie-the-truck/ Cc: Will Deacon <will@kernel.org> Cc: Thomas Gleixner <tglx@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: linux-kbuild@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Will Deacon <will@kernel.org> Acked-by: Thomas Gleixner <tglx@kernel.org> Acked-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
All published NPU firmware versions support D0i3 delayed entry
flow, making this workaround obsolete. It was originally added as
a safety measure for potential firmware bugs.
Recent firmware dropped legacy D0i3 entry support, so the workaround
can't be used anyway. Hardcode d0i3_delayed_entry boot param to 1 to
ensure older firmware works in the correct mode.
Ville Syrjälä [Fri, 22 May 2026 20:03:45 +0000 (23:03 +0300)]
drm/i915/bw: Remove deinterleave fallback for TGL+
Remove the deinterleave fallback calculation from the TGL+ codepath.
The fallback is using the ICL deinterleave calculation which was never
in the TGL+ algorithm. All supported memory types have the correct
deinterleave already specified for TGL+ anyway, so this is dead code.
Ville Syrjälä [Fri, 22 May 2026 20:03:43 +0000 (23:03 +0300)]
drm/i915/bw: Fix/unify peakbw calculations
We have several copies of the same memory peak bandwidth calculations,
and the rounding directions are all over the place in some of them.
Unify it all into one small function (with rounding matching what Bspec
says).
Note that 'channel_width' is always a multiple of 8 anyway, so for
'channel_width / 8' the rounding direction doesn't actually matter.
Ville Syrjälä [Fri, 22 May 2026 20:03:42 +0000 (23:03 +0300)]
drm/i915/bw: Fix DEPROGBWPCLIMIT handling on BMG
DEPROGBWPCLIMIT is specified in %, so divide by 100 instead of 10.
Fortunately the deprobbwlimit is much lower than the peak memory
bandwidth on BMG, so whether we take 60% or 600% of the peak
bandwidth doesn't matter as the min() will pick the lower
deprobbwlimit anyway.
Eg. on the BMG here I get (with or without the fix):
QGV 0: deratedbw=33600 peakbw=48000
QGV 1: deratedbw=53000 peakbw=456000
Ville Syrjälä [Fri, 22 May 2026 20:03:40 +0000 (23:03 +0300)]
drm/i915/bw: Fix 'deinterleave' rounding direction
For some reason we're rounding up when calculating the deinterleave
value. But the spec says we should round down. Fix it.
But I suppose this doesn't actually matter since the deinterleave
values should always be power of two. The only exception is therefore
the deinterleave==1 case, which gets handled by the max(..., 1).
Ville Syrjälä [Fri, 22 May 2026 20:03:38 +0000 (23:03 +0300)]
drm/i915/bw: Fix DCLK rounding mess
Fix up the total mess when calculating the DCLK
frequency. Some codepaths are trying to do both DIV_ROUND_UP()
and an open coded "round to nearest" at the same time. The
MTL+ codepath was the only one that was correct (using
DIV_ROUND_CLOSEST()).
Let's unify all of them, and borrow the actual '100/6'
approach from adl_calc_psf_bw() so that we get even less
rounding errors.
Ville Syrjälä [Fri, 22 May 2026 20:03:37 +0000 (23:03 +0300)]
drm/i915/bw: Fix num_planes handling on TGL+
The TGL+ bw code has an off by one error on the num_planes
calculation, and tgl_max_bw_index() incorrectly bumps
the num_planes to 1 from 0.
That approach made sense on ICL where num_planes is more or
less a minimum number of planes to consider for the group,
but on TGL+ num_planes really is a maximum number of planes,
so these adjustments no longer make any sense there.
Introduce an event relayer mechanism. Instead of each open file
registering directly with `ec_dev->event_notifier`, the platform device
registers a single relayer notifier. Individual files then register
with a local subscribers list in `chardev_pdata`.
This allows the driver to safely disconnect from the event chain
`ec_dev->event_notifier` during cros_ec_chardev_remove(), preventing
events from being delivered to open files after the device is removed,
while still allowing those files to be closed safely later.
Introduce struct chardev_pdata to hold platform driver data.
The platform driver data is allocated by kzalloc() instead of devm
variant, allowing for managed cleanup that can eventually extend beyond
device removal if files are still open.
hdrlen is __u8. For n >= 127 the result exceeds 255 and silently
truncates. With n=127 (cmpri=15, cmpre=15, pad=0, hdrlen=16):
(128 * 16) >> 3 = 256, truncated to 0 as __u8
The caller in ipv6_rpl_srh_rcv() then places the compressed header
at buf + ((ohdr->hdrlen + 1) << 3). With hdrlen=0 this is buf + 8,
but the decompressed region occupies buf[0..2055] (8-byte header
plus 128 full addresses). The compressed header overlaps the
decompressed data, and ipv6_rpl_srh_compress() writes into this
overlap, corrupting the routing header of the forwarded packet.
The existing guard at exthdrs.c:546 checks (n + 1) > 255, which
prevents n+1 from overflowing unsigned char (the segments_left
field), but does not prevent the computed hdrlen from overflowing
__u8. n=127 passes because 128 <= 255, yet hdrlen=256 does not
fit.
Tighten the bound to (n + 1) > 127. This caps n at 126, giving
hdrlen = (127 * 16) >> 3 = 254, which fits in __u8. The compressed
header then lands at buf + ((254 + 1) << 3) = buf + 2040, exactly
past the decompressed region (buf[0..2039]). No overlap. 127
segments is well beyond any realistic RPL deployment.
Jim Mattson [Wed, 27 May 2026 23:47:07 +0000 (23:47 +0000)]
KVM: x86/pmu: Allow Host-Only/Guest-Only bits with nSVM and mediated PMU
Now that KVM correctly handles Host-Only and Guest-Only bits in the
event selector MSRs, allow the guest to set them if the vCPU advertises
SVM and uses the mediated PMU.
Yosry Ahmed [Wed, 27 May 2026 23:47:06 +0000 (23:47 +0000)]
KVM: x86/pmu: Reprogram Host/Guest-Only counters on nested transitions
Reprogram PMU counters on nested transitions for the mediated PMU, to
re-evaluate Host-Only and Guest-Only bits and enable/disable the PMU
counters accordingly. For example, if Host-Only is set and Guest-Only is
cleared, a counter should be disabled when entering guest mode and
enabled when exiting guest mode.
According to the APM, when EFER.SVME is cleared, setting Host-Only or
Guest-Only disables the counter, so also trigger counter reprogramming
when EFER.SVME is toggled.
Counters setting any of Host-Only and Guest-Only bits are already being
tracked in pmc_has_mode_specific_enables, use the bitmap to reprogram
these counters.
Reprogram the counters synchronously on nested VMRUN/#VMEXIT and
EFER.SVME toggling. This is necessary as these instructions are counted
based on the new CPU state (after the instruction is retired in
hardware). Hence, the PMU needs to be updated before instruction
emulation is completed and kvm_pmu_instruction_retired() is called.
Defer reprogramming the counters when force leaving guest mode through
svm_leave_nested() to avoid potentially reading stale state (e.g.
incorrect EFER). All flows force leaving nested are non-architectural,
so accuracy is irrelevant.
Refactor a helper out of kvm_pmu_request_reprogram_counters() that
accepts a boolean allowing synchronous vs deferred reprogramming, and
use that from SVM code to support both scenarios.
Yosry Ahmed [Wed, 27 May 2026 23:47:05 +0000 (23:47 +0000)]
KVM: x86/pmu: Track mediated PMU counters with mode-specific enables
Instead of always checking of a counter needs to be disabled for
mode-specific reasons (e.g. Host-Only/Guest-Only bits in SVM), add a
bitmap to track such counters. Set the bit for counters using either
Host-Only or Guest-Only bits in EVENTSEL on SVM.
This bitmap will also be reused in following changes to selectively
apply changes to such counters.
Yosry Ahmed [Wed, 27 May 2026 23:47:04 +0000 (23:47 +0000)]
KVM: x86/pmu: Disable counters based on Host-Only/Guest-Only bits in SVM
Introduce an optional per-vendor PMU callback for checking if a counter
is disabled in the current mode, and register a callback on AMD to
disable a counter based on the vCPU's setting of Host-Only or Guest-Only
EVENT_SELECT bits with the mediated PMU.
If EFER.SVME is set, all events are counted if both bits are set or
cleared. If only one bit is set, the counter is disabled if the vCPU
context does not match the set bit.
If EFER.SVME is cleared, the counter is disabled if any of the bits is
set, otherwise all events are counted. Note that a Linux guest correctly
handles this and clears Host-Only when EFER.SVME is cleared, see commit 1018faa6cf23 ("perf/x86/kvm: Fix Host-Only/Guest-Only counting with SVM
disabled").
The callback is made from pmc_is_locally_enabled(), which is used for
the mediated PMU when updating eventsel_hw in
kvm_mediated_pmu_refresh_eventsel_hw(), as well as when checking what
PMCs count instructions/branches for emulation in
kvm_pmu_recalc_pmc_emulation().
Host-Only and Guest-Only bits are currently reserved, so this change is
a noop, but the bits will be allowed with mediated PMU in a following
change when fully supported.
Yosry Ahmed [Wed, 27 May 2026 23:47:03 +0000 (23:47 +0000)]
KVM: x86/pmu: Add support for KVM_X86_PMU_OP_OPTIONAL_RET0
Add definitions for KVM_X86_PMU_OP_OPTIONAL_RET0() to resolve to
__static_call_return0, similar to KVM_X86_OP_OPTIONAL_RET0(). Move the
definition of kvm_pmu_call() to pmu.h, and add declarations for the
static PMU calls in the header to allow making callbacks from the header
in following changes.
Yosry Ahmed [Wed, 27 May 2026 23:47:02 +0000 (23:47 +0000)]
KVM: x86/pmu: Check mediated PMU counter enablement before event filters
If the guest disables the counter (by clearing
ARCH_PERFMON_EVENTSEL_ENABLE), KVM still performs the PMU filter lookup,
even though it doesn't end up changing eventsel_hw. Check if the
counter is enabled by the guest before doing the potentially expensive
PMU filter lookup.
Yosry Ahmed [Wed, 27 May 2026 23:47:00 +0000 (23:47 +0000)]
KVM: x86/pmu: Rename reprogram_counters() to clarify usage
Rename reprogram_counters() to kvm_pmu_request_counters_reprogram()
clarifying that it is more similar to
kvm_pmu_request_counter_reprogram(), and less similar to
reprogram_counter(). The kvm_pmu_* prefix is also appropriate as the
function is exposed in the header.
Opportunistically rename the argument from 'diff' to 'counters'.
Yosry Ahmed [Wed, 27 May 2026 23:46:59 +0000 (23:46 +0000)]
KVM: x86: Move enable_pmu/enable_mediated_pmu to pmu.h and pmu.c
The declaration and definition of enable_pmu/enable_mediated_pmu
semantically belongs in pmu.h and pmu.c, and more importantly, pmu.h
uses enable_mediated_pmu and relies on the caller including x86.h.
There is already precedence for other module params defined outside of
x86.c, so move enable_pmu/enable_mediated_pmu to pmu.c.
Yosry Ahmed [Wed, 27 May 2026 23:46:58 +0000 (23:46 +0000)]
KVM: nSVM: Move VMRUN instruction retirement after entering guest mode
A successful VMRUN retires in guest mode and should be counted by the
PMU as a guest instruction. Move the call to
kvm_pmu_instruction_retired() after potentially entering guest mode,
such that VMRUN is counted correctly.
The PMU event will be matched against L2's CPL, but otherwise this does
not change the behavior in terms of guest vs. host, because KVM does
not virtualize Host-Only/Guest-Only PMC controls yet, so all
instructions are counted regardless of the vCPU's host/guest state. But
this change is needed for the incoming support for Host-Only/Guest-Only
controls to count VMRUN correctly.
Yosry Ahmed [Wed, 27 May 2026 23:46:57 +0000 (23:46 +0000)]
KVM: nSVM: Unify RIP and PMU handling calls when emulating VMRUN
The code paths for advancing RIP and retiring the instruction for RIP
are very similar whether or not caching vmcb12 succeeds. The only
difference is handling mapping failures (i.e. EFAULT).
Pull the mapping failure handling out and unify the calls to
svm_skip_emulated_instruction() and kvm_pmu_instruction_retired(), but
return immediately after if copying and caching vmcb12 failed. A nice
side effect of this is that the FIXME comment is now above the only code
path calling svm_skip_emulated_instruction().
Yosry Ahmed [Wed, 27 May 2026 23:46:56 +0000 (23:46 +0000)]
KVM: nSVM: Bail early out of VMRUN emulation if advancing RIP fails
If svm_skip_emulation_instruction() fails, then RIP could not be
advanced correctly (e.g. decode failure when NextRIP is not available).
KVM will exit to userspace to handle the emulation failure, but only
after stuffing the wrong RIP into vmcb01 and entering guest mode.
Bail early and exit to userspace before committing any side-effects of
emulating the VMRUN (e.g. entering guest mode).
Fixes: c8e16b78c614 ("x86: KVM: svm: eliminate hardcoded RIP advancement from vmrun_interception()") Signed-off-by: Yosry Ahmed <yosry@kernel.org> Link: https://patch.msgid.link/20260527234711.4175166-3-yosry@kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com>
Yosry Ahmed [Wed, 27 May 2026 23:46:55 +0000 (23:46 +0000)]
KVM: nSVM: Stop leaking single-stepping on VMRUN into L2
According to the APM, TF on VMRUN causes a #DB after VMRUN completes on
the _host_ side. However, KVM injects a #DB in L2 context instead (or
exits to userspace if KVM_GUESTDBG_SINGLESTEP is set) in
kvm_skip_emulated_instruction().
Avoid single-step handling on VMRUN by open-coding the rest of
kvm_skip_emulated_instruction() in nested_svm_vmrun(). This doesn't look
pretty, but following changes will need to open-code
kvm_pmu_instruction_retired() anyway, and will cleanup the code. This
ignores TF on VMRUN instead of injecting a spurious exception into
L2. Document this virtualization hole with a FIXME.
Note that a failed VMRUN would have been correctly single-stepped, but
now TF is always ignored for consistency and simplicity purposes. VMX
does not support TF on a successful VMLAUNCH/VMRESUME, so it's unlikely
that single-stepping VMRUN properly is important, especially if it's
only for failed VMRUNs.
Fixes: c8e16b78c614 ("x86: KVM: svm: eliminate hardcoded RIP advancement from vmrun_interception()") Signed-off-by: Yosry Ahmed <yosry@kernel.org> Link: https://patch.msgid.link/20260527234711.4175166-2-yosry@kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com>
Willem de Bruijn [Tue, 26 May 2026 13:40:37 +0000 (09:40 -0400)]
net: sch_fq: update flow delivery time on earlier EDT packet
When inserting an EDT packet with time before flow->time_next_packet,
update the flow and possibly queue next delivery time.
Reinsert the flow into the q->delayed rb-tree to position correctly
and to have fq_check_throttled set wake-up at the right next time.
Factor RB tree insertion out fq_flow_set_throttled to avoid open
coding twice.
EDT packets do not take precedence over queue rate limit. Skip this
new step if a queue limit is set. EDT packets do take precedence over
per-socket rate limits, as can be seen from fq_dequeue reading
sk_pacing_rate if !skb->tstamp.
With this change the so_txtime selftest sends packets in the expected
order.
====================
Introduce Airoha AN8801R series Gigabit Ethernet PHY driver
This series introduces the Airoha AN8801R Gigabit Ethernet PHY initial
support.
The Airoha AN8801R is a low power single-port Ethernet PHY Transceiver
with Single-port serdes interface for 1000Base-X/RGMII.
This chip is compliant with 10Base-T, 100Base-TX and 1000Base-T IEEE
802.3(u,ab) and supports:
- Energy Efficient Ethernet (802.3az)
- Full Duplex Control Flow (802.3x)
- auto-negotiation
- crossover detect and autocorrection,
- Wake-on-LAN with Magic Packet
- Jumbo Frame up to 9 Kilobytes.
This PHY also supports up to three user-configurable LEDs, which are
usually used for LAN Activity, 100M, 1000M indication.
The series provides the devicetree binding and the driver that have been
written by AngeloGioacchino Del Regno, based on downstream
implementation ([1]). The driver allows setting up PHY LEDs, 10/100M,
1000M speeds, and Wake on LAN and PHY interrupts.
Since v2, the series also adds the air_phy_lib library, which goal is to
share common code between air_en8811h and air_an8801 drivers, and its use
in them. The first shared functions are the existing BuckPbus register
accessors and air_phy_read/write_page functions coming from air_en8811h
driver.
The series is based on net-next kernel tree (sha1: 90d03ee2c5dc) and
I have tested it on Mediatek Genio 720-EVK board (that integrates an
Airoha AN8801RIN/A Ethernet PHY) with early board hardware enablement
patches.
net: phy: air_an8801: ensure maximum available speed link use
To ensure that the Airoha AN8801R PHY uses the maximum available link
speed, an additional register write is needed to configure the function
mode for either 1G or 100M/10M operation after link detection.
So, in air_an8801 driver, implement a custom read_status callback, that
after genphy_read_status determines the link speed, sets the bit 0 of
the link mode register (REG_LINK_MODE) if the detected speed is 1Gbps,
or unsets it otherwise.
Introduce a driver for the Airoha AN8801R Series Gigabit Ethernet
PHY; this currently supports setting up PHY LEDs, 10/100M, 1000M
speeds, and Wake on LAN and PHY interrupts.
net: phy: Rename Airoha common BuckPBus register accessors
Rename the BuckPBus register accessors functions present in air_phy_lib
and their calls in air_en8811h driver, so all exported functions start
with the same prefix.
In preparation of Airoha AN8801R PHY support, move the BuckPBus
register accessors and definitions, present in air_en8811h driver,
into the Airoha PHY shared code (air_phy_lib), so they will be usable
by the new driver without duplicating them.
In preparation of Airoha AN8801R PHY support, split out the interface
functions that will be common between the already present air_en8811h
driver and the new one, and put them into a new library named
air_phy_lib.