Allocate SVM's interrupt remapping metadata using GFP_ATOMIC, as
svm_ir_list_add() is called with IRQs disabled and irqfds.lock held
when kvm_irq_routing_update() reacts to GSI routing changes.
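A minimal sketch of the intended change, assuming the allocation sits inside svm_ir_list_add(); the surrounding list manipulation is omitted:

    /*
     * Called with IRQs disabled and irqfds.lock held, so a sleeping
     * GFP_KERNEL allocation is not allowed here; use GFP_ATOMIC.
     */
    ir = kzalloc(sizeof(*ir), GFP_ATOMIC);
    if (!ir)
        return -ENOMEM;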
Fixes: 411b44ba80ab ("svm: Implements update_pi_irte hook to setup posted interrupt") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250404193923.1413163-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As per the virtio spec, the fields cols and rows are specified as little
endian. Although there is no legacy interface requirement stating that
cols and rows need to be handled as native endian when the legacy
interface is used, unlike for the fields of the adjacent struct
virtio_console_control, I decided to err on the side of caution based
on some non-conclusive virtio spec repo archaeology and opted for
virtio16_to_cpu(), much like for virtio_console_control.event. Strictly
by the letter of the spec, virtio_le_to_cpu() would have been sufficient,
but when the legacy interface is not used, it boils down to the same.
And when using the legacy interface, the device formatting these as
little endian when the guest is big endian would surprise me more than
it using guest native byte order (which would make it compatible with
the current implementation). Nevertheless, somebody implementing the
spec to the letter could end up forcing little-endian byte order when
the legacy interface is in use, so IMHO this ultimately needs a
judgement call by the maintainers.
Fixes: 8345adbf96fc1 ("virtio: console: Accept console size along with resize control message") Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Cc: stable@vger.kernel.org # v2.6.35+
Message-Id: <20250322002954.3129282-1-pasic@linux.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Return -EINVAL instead of success if amd_ir_set_vcpu_affinity() is
invoked without use_vapic; lying to KVM about whether or not the IRTE was
configured to post IRQs is all kinds of bad.
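A sketch of the gist of the change; the exact surrounding code and field names in amd_ir_set_vcpu_affinity() are assumed here:

    if (!dev_data || !dev_data->use_vapic)
        return -EINVAL;    /* previously returned 0, lying to KVM */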
Fixes: d98de49a53e4 ("iommu/amd: Enable vAPIC interrupt remapping mode by default") Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250404193923.1413163-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
This patch fixes a Use-After-Free vulnerability in the HFSC qdisc class
handling. The issue occurs due to a time-of-check/time-of-use condition
in hfsc_change_class() when working with certain child qdiscs like netem
or codel.
The vulnerability works as follows:
1. hfsc_change_class() checks if a class has packets (q.qlen != 0)
2. It then calls qdisc_peek_len(), which for certain qdiscs (e.g.,
codel, netem) might drop packets and empty the queue
3. The code continues assuming the queue is still non-empty, adding
the class to vttree
4. This breaks HFSC scheduler assumptions that only non-empty classes
are in vttree
5. Later, when the class is destroyed, this can lead to a Use-After-Free
The fix adds a second queue length check after qdisc_peek_len() to verify
the queue wasn't emptied.
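A sketch of the second check in hfsc_change_class(); the HFSC helpers and flags shown (update_ed(), update_vf(), HFSC_RSC/HFSC_FSC, cur_time) are assumed from that function's context:

    if (cl->qdisc->q.qlen != 0) {
        int len = qdisc_peek_len(cl->qdisc);

        /*
         * qdisc_peek_len() may have dropped packets (codel, netem)
         * and emptied the queue; only treat the class as active if
         * it is still backlogged.
         */
        if (cl->qdisc->q.qlen != 0) {
            if (cl->cl_flags & HFSC_RSC)
                update_ed(cl, len);
            if (cl->cl_flags & HFSC_FSC)
                update_vf(cl, 0, cur_time);
        }
    }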
Fixes: 21f4d5cc25ec ("net_sched/hfsc: fix curve activation in hfsc_change_class()") Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg> Reviewed-by: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20250417184732.943057-2-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
There is a race condition between the workqueue created when enabling a
bearer and another thread created when disabling the bearer right after
that, as follows:
A network restart test on a router led to an out-of-memory condition,
which was traced to a memory leak in the PHY LED trigger code.
The root cause is misuse of the devm API. The registration function
(phy_led_triggers_register) is called from phy_attach_direct, not
phy_probe, and the unregister function (phy_led_triggers_unregister)
is called from phy_detach, not phy_remove. This means the register and
unregister functions can be called multiple times for the same PHY
device, but devm-allocated memory is not freed until the driver is
unbound.
This also prevents kmemleak from detecting the leak, as the devm API
internally stores the allocated pointer.
Fix this by replacing devm_kzalloc/devm_kcalloc with standard
kzalloc/kcalloc, and add the corresponding kfree calls in the unregister
path.
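A sketch of the resulting allocation pattern; the field names are taken from the description above, the rest is assumed:

    /* register path (phy_led_triggers_register()): plain kcalloc() */
    phy->phy_led_triggers = kcalloc(phy->phy_num_led_triggers,
                                    sizeof(struct phy_led_trigger),
                                    GFP_KERNEL);
    if (!phy->phy_led_triggers)
        return -ENOMEM;

    /* unregister path (phy_led_triggers_unregister()): free explicitly,
     * since the memory is no longer devm-managed
     */
    kfree(phy->phy_led_triggers);
    phy->phy_led_triggers = NULL;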
Fixes: 3928ee6485a3 ("net: phy: leds: Add support for "link" trigger") Fixes: 2e0bc452f472 ("net: phy: leds: add support for led triggers on phy link state change") Signed-off-by: Hao Guan <hao.guan@siflower.com.cn> Signed-off-by: Qingfang Deng <qingfang.deng@siflower.com.cn> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250417032557.2929427-1-dqfext@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
cpufreq_cpu_get_raw() can return NULL when the target CPU is not present
in the policy->cpus mask. scpi_cpufreq_get_rate() does not check for
this case, which results in a NULL pointer dereference.
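A sketch of the missing check, assuming the usual shape of a cpufreq ->get() callback:

    static unsigned int scpi_cpufreq_get_rate(unsigned int cpu)
    {
        struct cpufreq_policy *policy = cpufreq_cpu_get_raw(cpu);

        /* the CPU may not be present in any policy->cpus mask */
        if (!policy)
            return 0;

        /* ... use policy->driver_data as before ... */
    }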
The page_link lower bits of the first sg could contain something like
SG_END, if we are mapping a single VRAM page or a contiguous blob which
fits into one sg entry. Instead, pull out the struct page and use that in
our check to know whether we mapped struct pages or VRAM.
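A sketch of the idea; the exact function in amdgpu_vram_mgr.c is assumed:

    /*
     * sgl->page_link may carry SG_END (or chain) bits for a single
     * contiguous mapping, so compare the decoded struct page instead
     * of the raw page_link value.
     */
    struct page *page = sg_page(sgt->sgl);

    if (page) {
        /* we mapped struct pages, not VRAM */
    }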
Fixes: f44ffd677fb3 ("drm/amdgpu: add support for exporting VRAM using DMA-buf v3") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Christian König <christian.koenig@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v5.8+ Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently callers have to provide a handle to amdgpu_device,
which is not used by the implementation. It is unlikely this
parameter will become useful in the future, so remove it.
v2: squash in unused variable fix
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: c0dd8a9253fa ("drm/amdgpu/dma_buf: fix page_link check") Signed-off-by: Sasha Levin <sashal@kernel.org>
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:648: warning: Function parameter or member 'dev' not described in 'amdgpu_vram_mgr_free_sgt'
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:648: warning: Function parameter or member 'dir' not described in 'amdgpu_vram_mgr_free_sgt'
Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: c0dd8a9253fa ("drm/amdgpu/dma_buf: fix page_link check") Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently, if the filesystem for the cgroup version it wants to use is not
mounted, the charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh
tests will attempt to mount it on the hard-coded path /dev/cgroup/memory,
deleting that directory when the test finishes. This will fail if there
is not a preexisting directory at that path, and since the directory is
deleted, subsequent runs of the test will fail. Instead of relying on this
hard-coded directory name, use mktemp to generate a temporary directory to
use as a mountpoint, fixing both the assumption and the disruption caused
by deleting a preexisting directory.
This means that if the relevant cgroup filesystem is not already mounted
then we rely on having coreutils (which provides mktemp) installed. I
suspect that many current users are relying on having things automounted
by default, and given that the script relies on bash it's probably not an
unreasonable requirement.
Link: https://lkml.kernel.org/r/20250404-kselftest-mm-cgroup2-detection-v1-1-3dba6d32ba8c@kernel.org Fixes: 209376ed2a84 ("selftests/vm: make charge_reserved_hugetlb.sh work with existing cgroup setting") Signed-off-by: Mark Brown <broonie@kernel.org> Cc: Aishwarya TCV <aishwarya.tcv@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Mina Almasry <almasrymina@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Replace the assignment of PCI domain IDs from atomic_inc_return() with
ida_alloc().
Use two IDAs, one for static domain allocations (those defined in the
device tree) and a second for dynamic allocations (all others).
During removal of a root bus / host bridge, also release the domain ID. The
released ID can then be reused, for example when dynamically loading and
unloading native PCI host bridge drivers.
This change also allows mixing static device tree assignment and dynamic
assignment by the kernel, as all static allocations are reserved in the
dynamic pool.
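A sketch of the allocation scheme described above, using the plain IDA API; identifier names are illustrative and the static IDA only tracks which IDs came from the device tree:

    static DEFINE_IDA(pci_domain_nr_static_ida);   /* DT-assigned domains */
    static DEFINE_IDA(pci_domain_nr_dynamic_ida);  /* all other domains */

    /* static (device tree) domain: also reserve it in the dynamic pool */
    ret = ida_alloc_range(&pci_domain_nr_dynamic_ida,
                          domain, domain, GFP_KERNEL);

    /* dynamic domain */
    domain = ida_alloc(&pci_domain_nr_dynamic_ida, GFP_KERNEL);

    /* on root bus / host bridge removal */
    ida_free(&pci_domain_nr_dynamic_ida, bus->domain_nr);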
[bhelgaas: set "err" if "bus->domain_nr < 0"] Link: https://lore.kernel.org/r/20220714184130.5436-1-pali@kernel.org Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Stable-dep-of: 804443c1f278 ("PCI: Fix reference leak in pci_register_host_bridge()") Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently we retrieve the PCI domain number of the host bridge from the
bus sysdata (or pci_config_window if PCI_DOMAINS_GENERIC=y). Actually
we have the information at PCI host bridge probing time, and it makes
sense that we store it into pci_host_bridge. One benefit of doing so is
the requirement for supporting PCI on Hyper-V for ARM64, because the
host bridge of Hyper-V doesn't have pci_config_window, whereas ARM64 is
a PCI_DOMAINS_GENERIC=y arch, so we cannot retrieve the PCI domain
number from pci_config_window on ARM64 Hyper-V guest.
As the preparation for ARM64 Hyper-V PCI support, we introduce the
domain_nr in pci_host_bridge and a sentinel value to allow drivers to
set domain numbers properly at probing time. Currently,
CONFIG_PCI_DOMAINS_GENERIC=y archs are the only users of this
newly introduced field.
The ad7768-1 ADC output code is two's complement, meaning that the voltage
conversion result is a signed value. Since the value is a 24-bit one,
stored in a 32-bit variable, the sign should be extended in order to get
the correct representation.
Also the channel description has been updated to signed representation,
to match the ADC specifications.
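The gist of the fix is a sign extension of the 24-bit raw code, roughly as follows (variable names assumed):

    /*
     * The 24-bit two's complement result sits in a 32-bit variable;
     * extend the sign from bit 23 so negative codes read correctly.
     */
    *val = sign_extend32(readval, chan->scan_type.realbits - 1);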
The data used is all in local variables, so there is no advantage
in setting *val = ret with the direct mode claim held.
Move it later, to after the error check.
soc_dev_attr->revision could be NULL; thus, a pointer check is added to
prevent a potential NULL pointer dereference.
This is similar to the fix in commit 3027e7b15b02
("ice: Fix some null pointer dereference issues in ice_ptp.c").
Old Exynos SoCs have both Product ID and Revision ID in one single
register, while new SoCs tend to have two separate registers for those
IDs. Implement handling of both cases by passing Revision ID register
offsets in driver data.
Previously existing macros for Exynos4210 (removed in this patch) were
incorrect:
soc_device_to_device() seems to be discouraged [1], so remove it in favor
of printing the info message with the platform device. This will only change
the prefix in the info message from "soc soc0: " to "exynos-chipid 10000000.chipid:".
The Exynos Chip ID driver on Exynos SoCs has so far only informational
purpose - to expose the SoC device in sysfs. No other drivers depend on
it so there is really no benefit of initializing it early.
The code would be the most flexible if converted to a regular driver.
However there is already another driver - Exynos ASV (Adaptive Supply
Voltage) - which binds to the device node of Chip ID.
The solution is to convert the Exynos Chip ID to a built in driver and
merge the Exynos ASV into it.
This has several benefits:
1. Although the Exynos ASV driver binds to a device node present in all
Exynos DTS (generic compatible), it fails to probe except on the
supported ones (only Exynos5422). This means that the regular boot
process has a planned/normal device probe failure.
Merging the ASV into Chip ID will remove this probe failure because
the final driver will always bind, just with disabled ASV features.
2. Allows to use dev_info() as the SoC bus is present (since
core_initcall).
3. Could speed things up because of execution of the Chip ID code in an SMP
environment (after bringing up secondary CPUs, unlike early_initcall).
This reduces the amount of work to be done early, when the kernel has
to bring up critical devices.
The Exynos ChipID driver on Exynos SoCs has only informational
purpose - to expose the SoC device in sysfs. No other drivers
depend on it so there is really no benefit of initializing it early.
Instead, initialize everything with arch_initcall which:
1. Allows to use dev_info() as the SoC bus is present (since
core_initcall),
2. Could speed things up because of execution in an SMP environment
(after bringing up secondary CPUs, unlike early_initcall),
3. Reduces the amount of work to be done early, when the kernel has to
bring up critical devices.
words_count denotes the number of words in the total payload, while data
points to the payload of the various properties within it. When words_count
reaches the last word, data can access memory beyond the total payload. This
can lead to OOB access. With this patch, the utility API for handling
individual properties now returns the size of data consumed. Accordingly,
the remaining bytes are calculated before parsing the payload, thereby
eliminating the OOB access possibilities.
Introduce a new hfi platform to cover differences between hfi
versions. As a start, move the vpp/vsp freq data into that hfi
platform; more platform data will come later.
hfi_session_init() can be called many times, and it returns
EINVAL when the session was already initialized. This error code
(EINVAL) is confusing for the callers. Change hfi_session_init() to
return EALREADY when the session has already been
initialized.
Currently we rely on the firmware to return an error when we reach the
maximum supported number of sessions. But these errors happen at reqbuf
time, which is a bit later. The more reasonable approach is to
return the error on driver open.
To achieve that, modify hfi_session_create() to return an error when we
reach the maximum count of sessions and thus refuse the open.
Syzkaller has reported a general protection fault at function
ir_raw_event_store_with_filter(). This crash is caused by a NULL pointer
dereference of dev->raw pointer, even though it is checked for NULL in
the same function, which means there is a race condition. It occurs due
to the incorrect order of actions in the streamzap_disconnect() function:
rc_unregister_device() is called before usb_kill_urb(). The dev->raw
pointer is freed and set to NULL in rc_unregister_device(), and only
after that usb_kill_urb() waits for in-progress requests to finish.
If rc_unregister_device() is called while the streamzap_callback() handler
has not finished, this can lead to accessing freed resources. Thus
rc_unregister_device() should be called after usb_kill_urb().
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
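A sketch of the reordered disconnect path (struct field names assumed from the streamzap driver):

    static void streamzap_disconnect(struct usb_interface *interface)
    {
        /* ... */
        /*
         * Wait for in-flight URB callbacks first, so that
         * streamzap_callback() can never see a freed dev->raw.
         */
        usb_kill_urb(sz->urb_in);
        rc_unregister_device(sz->rdev);
        /* ... */
    }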
Fixes: 8e9e60640067 ("V4L/DVB: staging/lirc: port lirc_streamzap to ir-core") Cc: stable@vger.kernel.org Reported-by: syzbot+34008406ee9a31b13c73@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=34008406ee9a31b13c73 Signed-off-by: Murad Masimov <m.masimov@mt-integration.ru> Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Sasha Levin <sashal@kernel.org>
The printk format for synth event uses "%.*s" to print string fields,
but then only passes the pointer part as var arg.
Replace %.*s with %s as the C string is guaranteed to be null-terminated.
The output in print fmt should never have been updated as __get_str()
handles the string limit because it can access the length of the string in
the string meta data that is saved in the ring buffer.
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Fixes: 8db4d6bfbbf92 ("tracing: Change synthetic event string format to limit printed length") Link: https://lore.kernel.org/20250325165202.541088-1-douglas.raillard@arm.com Signed-off-by: Douglas Raillard <douglas.raillard@arm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Allow a stacktrace from one event to be displayed by the end event of a
synthetic event. This is very useful when looking for the longest latency
of a sleep or something blocked on I/O.
The above creates a "block_lat" synthetic event that takes the stacktrace of
when a task schedules out in either the interruptible or uninterruptible
states, and on a new per-process max $delta (the time it was scheduled
out), will print the process id and the stacktrace.
After resume/online, the turbo limit ratio (TRL) is restored only partially
if the admin explicitly changed TRL from user space.
A hash table is used to store SST mailbox and MSR settings when modified,
to restore those settings after resume or online. This uses a struct
isst_cmd field "data" to store these settings. This is a 64-bit field.
But isst_store_new_cmd() is only assigning it as u32. This results in
truncation of the upper 32 bits.
Change the argument to u64 from u32.
Fixes: f607874f35cb ("platform/x86: ISST: Restore state on resume") Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250328224749.2691272-1-srinivas.pandruvada@linux.intel.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit cc7ad0d77b51 ("drivers: staging: rtl8723bs: Fix deadlock in
rtw_surveydone_event_callback()") besides fixing the deadlock also
modified rtw_scan_timeout_handler() to use spin_[un]lock_irq()
instead of spin_[un]lock_bh().
Disabling IRQs is not necessary since all code taking this lock
runs either from user context or from softirqs.
rtw_scan_timeout_handler() is the only function taking pmlmepriv->lock
which uses spin_[un]lock_irq() for this. Switch back to
spin_[un]lock_bh() to make it consistent with the rest of the code.
Fixes: cc7ad0d77b51 ("drivers: staging: rtl8723bs: Fix deadlock in rtw_surveydone_event_callback()") Cc: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20230221145326.7808-2-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
devm_kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure. Ensure the allocation was successful
by checking the pointer validity.
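The pattern being enforced, as a generic sketch (the format string and variable names are placeholders):

    name = devm_kasprintf(dev, GFP_KERNEL, "%s-clk", dev_name(dev));
    if (!name)
        return -ENOMEM;   /* allocation failed, don't dereference NULL */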
Once the discipline is associated with the device, deleting the device
takes care of decrementing the module's refcount. Doing it manually on
this error path causes the refcount to artificially decrease on each error
while it should just stay the same.
Fixes: c020d722b110 ("s390/dasd: fix panic during offline processing") Signed-off-by: Miroslav Franc <mfranc@suse.cz> Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20240209124522.3697827-3-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
[Minor context change fixed] Signed-off-by: Feng Liu <Feng.Liu3@windriver.com> Signed-off-by: He Zhe <Zhe.He@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We hold pmlmepriv->lock in position (1) of thread 1 and use
del_timer_sync() to wait for the timer to stop, but the timer handler
also needs pmlmepriv->lock in position (2) of thread 2.
As a result, rtw_surveydone_event_callback() will block forever.
This patch extracts del_timer_sync() from the protection of
spin_lock_bh(), which lets the timer handler obtain
the needed lock. What's more, we change spin_lock_bh() in
rtw_scan_timeout_handler() to spin_lock_irq(). Otherwise,
spin_lock_bh() will also cause a deadlock in the timer handler.
In the case of apply_to_existing_page_range(), apply_to_pte_range() is
reached with 'create' set to false. When !create, the loop over the PTE
page table is broken.
apply_to_pte_range() will only move to the next PTE entry if 'create' is
true or if the current entry is not pte_none().
This means that the user of apply_to_existing_page_range() will not have
'fn' called for any entries after the first pte_none() in the PTE page
table.
Fix the loop logic in apply_to_pte_range().
There are no known runtime issues from this, but the fix is trivial enough
for stable@ even without a known buggy user.
Link: https://lkml.kernel.org/r/20250409094043.1629234-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Fixes: be1db4753ee6 ("mm/memory.c: add apply_to_existing_page_range() helper") Cc: Daniel Axtens <dja@axtens.net> Cc: David Hildenbrand <david@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
lock_task_sighand() can trigger a hard lockup. If NR_CPUS threads call
do_task_stat() at the same time and the process has NR_THREADS, it will
spin with irqs disabled O(NR_CPUS * NR_THREADS) time.
Change do_task_stat() to use sig->stats_lock to gather the statistics
outside of ->siglock protected section, in the likely case this code will
run lockless.
Link: https://lkml.kernel.org/r/20240123153357.GA21857@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Dylan Hatch <dylanbhatch@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: David Sauerwein <dssauerw@amazon.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we abort driver initialisation in the middle of gt/engine discovery,
some engines will be fully set up and some not. Those incompletely set-up
engines only have 'engine->release == NULL' and so will leak any of the
common objects allocated.
v2:
- Drop the destroy_pinned_context() helper for now. It's not really
worth it with just a single callsite at the moment. (Janusz)
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com> Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220915232654.3283095-2-matthew.d.roper@intel.com Signed-off-by: Zhi Yang <Zhi.Yang@windriver.com> Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit ebff7d8f270d ("mem hotunplug: fix kfree() of bootmem
memory"), we could get a resource allocated during boot via
alloc_resource(), and it's required to release the resource using
free_resource(). However, many people use kfree() directly, which will
result in a kernel BUG. In order to fix this without fixing every call
site, just leak a couple of bytes in such a corner case.
Link: https://lkml.kernel.org/r/20220217083619.19305-1-linmiaohe@huawei.com Fixes: ebff7d8f270d ("mem hotunplug: fix kfree() of bootmem memory") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Suggested-by: David Hildenbrand <david@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Alistair Popple <apopple@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: David Sauerwein <dssauerw@amazon.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If 'vfio_pci_core_device::needs_pm_restore' is set (PCI device does
not have No_Soft_Reset bit set in its PMCSR config register), then
the current PCI state will be saved locally in
'vfio_pci_core_device::pm_save' during D0->D3hot transition and same
will be restored back during D3hot->D0 transition.
For saving the PCI state locally, pci_store_saved_state() is being
used and the pci_load_and_free_saved_state() will free the allocated
memory.
But for reset-related IOCTLs, the vfio driver calls PCI reset-related
APIs which will internally change the PCI power state back to D0. So,
when the guest resumes, it will see the current state as D0 and it
will skip the call to vfio_pci_set_power_state() for changing the
power state to D0 explicitly. In this case, the memory pointed to by
'pm_save' will never be freed. In a malicious sequence, the state changing
to D3hot followed by VFIO_DEVICE_RESET/VFIO_DEVICE_PCI_HOT_RESET can be
run in a loop and it can cause an OOM situation.
This patch frees the earlier allocated memory first before overwriting
'pm_save' to prevent the mentioned memory leak.
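A sketch of the fix, assuming it lands where 'pm_save' is (re)stored in vfio_pci_set_power_state():

    /*
     * A reset IOCTL may have brought the device back to D0 behind our
     * back, leaving a stale pm_save around; free it before storing a
     * new copy so the old one cannot leak.
     */
    if (vdev->pm_save)
        pci_load_and_free_saved_state(pdev, &vdev->pm_save);
    vdev->pm_save = pci_store_saved_state(pdev);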
Fixes: 51ef3a004b1e ("vfio/pci: Restore device state on PM transition") Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220217122107.22434-2-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
[Minor context change fixed] Signed-off-by: Feng Liu <Feng.Liu3@windriver.com> Signed-off-by: He Zhe <Zhe.He@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
An unexpected WARN_ON from flush_work() may occur when cache creation
fails, caused by destroying the uninitialized delayed_work waker in the
error path of cache_create(). For example, the warning appears on the
superblock checksum error.
(snip)
WARNING: CPU: 0 PID: 84 at kernel/workqueue.c:4178 __flush_work+0x5d4/0x890
Fix by pulling out the cancel_delayed_work_sync() from the constructor's
error path. This patch doesn't affect the use-after-free fix for
concurrent dm_resume and dm_destroy (commit 6a459d8edbdb ("dm cache: Fix
UAF in destroy()")) as cache_dtr is not changed.
Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com> Fixes: 6a459d8edbdb ("dm cache: Fix UAF in destroy()") Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Synchronize the declaration of ds1287_set_base_clock() between
cevt-ds1287.c and ds1287.h.
Fix the following error with gcc-14 when -Werror is enabled:
arch/mips/kernel/cevt-ds1287.c:21:5: error: conflicting types for ‘ds1287_set_base_clock’; have ‘int(unsigned int)’
21 | int ds1287_set_base_clock(unsigned int hz)
| ^~~~~~~~~~~~~~~~~~~~~
In file included from arch/mips/kernel/cevt-ds1287.c:13:
./arch/mips/include/asm/ds1287.h:11:13: note: previous declaration of ‘ds1287_set_base_clock’ with type ‘void(unsigned int)’
11 | extern void ds1287_set_base_clock(unsigned int clock);
| ^~~~~~~~~~~~~~~~~~~~~
make[7]: *** [scripts/Makefile.build:207: arch/mips/kernel/cevt-ds1287.o] Error 1
make[6]: *** [scripts/Makefile.build:465: arch/mips/kernel] Error 2
make[6]: *** Waiting for unfinished jobs....
Declare which_prom() as static to suppress the gcc 'missing-prototypes'
compiler warning. This function is not intended to be called
from other parts of the kernel.
Fix the following error with gcc-14 when -Werror is enabled:
arch/mips/dec/prom/init.c:45:13: error: no previous prototype for ‘which_prom’ [-Werror=missing-prototypes]
45 | void __init which_prom(s32 magic, s32 *prom_vec)
| ^~~~~~~~~~
cc1: all warnings being treated as errors
make[6]: *** [scripts/Makefile.build:207: arch/mips/dec/prom/init.o] Error 1
make[5]: *** [scripts/Makefile.build:465: arch/mips/dec/prom] Error 2
make[5]: *** Waiting for unfinished jobs....
Currently, this does not cause any issues, but I believe it is necessary to
set bsg_queue to NULL after removing it to prevent potential use-after-free
(UAF) access.
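The change amounts to clearing the pointer right after teardown, roughly as follows (the exact teardown call in the UFS bsg code is assumed):

    bsg_remove_queue(hba->bsg_queue);
    hba->bsg_queue = NULL;   /* avoid later use of a stale pointer */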
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Link: https://lore.kernel.org/r/20241218014214.64533-3-kanie@linux.alibaba.com Reviewed-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Xiangyu Chen <xiangyu.chen@windriver.com> Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
However, in the same function, phba->fcf.fcf_flag is assigned 0
without holding the lock, which can cause a data race:
phba->fcf.fcf_flag = 0;
To fix this possible data race, a lock and unlock pair is added when
accessing the variable phba->fcf.fcf_flag.
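A sketch of the fix; the specific lpfc lock protecting fcf_flag is assumed to be the one used by the surrounding code:

    spin_lock_irq(&phba->hbalock);
    phba->fcf.fcf_flag = 0;
    spin_unlock_irq(&phba->hbalock);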
Reported-by: BassCheck <bass@buaa.edu.cn> Signed-off-by: Tuo Li <islituo@gmail.com> Link: https://lore.kernel.org/r/20230630024748.1035993-1-islituo@gmail.com Reviewed-by: Justin Tee <justin.tee@broadcom.com> Reviewed-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Bin Lan <bin.lan.cn@windriver.com> Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When device is unregistering, the 'dev->real_num_tx_queues' goes to
zero and the 'while (unlikely(hash >= qcount))' loop inside the
'skb_tx_hash' becomes infinite, locking up the core forever.
But unfortunately, checking just the carrier status is not enough to
fix the issue, because some devices may still be in unregistering
state while reporting carrier status OK.
One example of such device is a net/dummy. It sets carrier ON
on start, but it doesn't implement .ndo_stop to set the carrier off.
And it makes sense, because dummy doesn't really have a carrier.
Therefore, while this device is unregistering, it's still easy to hit
the infinite loop in the skb_tx_hash() from the OVS datapath. There
might be other drivers that do the same, but dummy by itself is
important for the OVS ecosystem, because it is frequently used as a
packet sink for tcpdump while debugging OVS deployments. And when the
issue is hit, the only way to recover is to reboot.
Fix that by also checking if the device is running. The running
state is handled by the net core during unregistering, so it covers
unregistering case better, and we don't really need to send packets
to devices that are not running anyway.
While only checking the running state might be enough, the carrier
check is preserved. The running and the carrier states seem disjoined
throughout the code and different drivers. And other core functions
like __dev_direct_xmit() check both before attempting to transmit
a packet. So, it seems safer to check both flags in OVS as well.
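A sketch of the resulting check in the OVS output path (do_output()); the surrounding code and the send call are abbreviated and assumed:

    struct vport *vport = ovs_vport_rcu(dp, out_port);

    if (likely(vport && netif_running(vport->dev) &&
               netif_carrier_ok(vport->dev)))
        ovs_vport_send(vport, skb, mac_proto);   /* details elided */
    else
        kfree_skb(skb);                          /* treat like a missing port */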
Fixes: 066b86787fa3 ("net: openvswitch: fix race on port output") Reported-by: Friedrich Weber <f.weber@proxmox.com> Closes: https://mail.openvswitch.org/pipermail/ovs-discuss/2025-January/053423.html Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Tested-by: Friedrich Weber <f.weber@proxmox.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20250109122225.4034688-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Carlos Soto <carlos.soto@broadcom.com> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
assume the following setup on a single machine:
1. An openvswitch instance with one bridge and default flows
2. two network namespaces "server" and "client"
3. two ovs interfaces "server" and "client" on the bridge
4. for each ovs interface a veth pair with a matching name and 32 rx and
tx queues
5. move the ends of the veth pairs to the respective network namespaces
6. assign ip addresses to each of the veth ends in the namespaces (needs
to be the same subnet)
7. start some http server on the server network namespace
8. test if a client in the client namespace can reach the http server
When following the actions below, the host has a chance of getting a cpu
stuck in an infinite loop:
1. send a large amount of parallel requests to the http server (around
3000 curls should work)
2. in parallel delete the network namespace (do not delete interfaces or
stop the server, just kill the namespace)
there is a low chance that this will cause the below kernel cpu stuck
message. If this does not happen just retry.
Below there is also the output of bpftrace for the functions mentioned
in the output.
The series of events happening here is:
1. the network namespace is deleted calling
`unregister_netdevice_many_notify` somewhere in the process
2. this sets first `NETREG_UNREGISTERING` on both ends of the veth and
then runs `synchronize_net`
3. it then calls `call_netdevice_notifiers` with `NETDEV_UNREGISTER`
4. this is then handled by `dp_device_event` which calls
`ovs_netdev_detach_dev` (if a vport is found, which is the case for
the veth interface attached to ovs)
5. this removes the rx_handlers of the device but does not prevent
packages to be sent to the device
6. `dp_device_event` then queues the vport deletion to work in
background as a ovs_lock is needed that we do not hold in the
unregistration path
7. `unregister_netdevice_many_notify` continues to call
`netdev_unregister_kobject` which sets `real_num_tx_queues` to 0
8. port deletion continues (but details are not relevant for this issue)
9. at some future point the background task deletes the vport
If, after 7. but before 9., a packet is sent to the ovs vport (which is
not deleted at this point in time), it is forwarded to the
`dev_queue_xmit` flow even though the device is unregistering.
In the `skb_tx_hash` path (which is called from `dev_queue_xmit`) there is
a while loop (if the packet has an rx_queue recorded) that is infinite if
`dev->real_num_tx_queues` is zero.
To prevent this from happening we update `do_output` to handle devices
without carrier the same as if the device is not found (which would
be the code path after 9. is done).
Additionally we now produce a warning in `skb_tx_hash` if we will hit
the infinite loop.
Function mlx5e_rep_neigh_update() wasn't updated to accommodate rtnl lock
removal from TC filter update path and properly handle concurrent encap
entry insertion/deletion which can lead to following use-after-free:
[23827.464923] ==================================================================
[23827.469446] BUG: KASAN: use-after-free in mlx5e_encap_take+0x72/0x140 [mlx5_core]
[23827.470971] Read of size 4 at addr ffff8881d132228c by task kworker/u20:6/21635
[23827.472251]
[23827.472615] CPU: 9 PID: 21635 Comm: kworker/u20:6 Not tainted 5.13.0-rc3+ #5
[23827.473788] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[23827.475639] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core]
[23827.476731] Call Trace:
[23827.477260] dump_stack+0xbb/0x107
[23827.477906] print_address_description.constprop.0+0x18/0x140
[23827.478896] ? mlx5e_encap_take+0x72/0x140 [mlx5_core]
[23827.479879] ? mlx5e_encap_take+0x72/0x140 [mlx5_core]
[23827.480905] kasan_report.cold+0x7c/0xd8
[23827.481701] ? mlx5e_encap_take+0x72/0x140 [mlx5_core]
[23827.482744] kasan_check_range+0x145/0x1a0
[23827.493112] mlx5e_encap_take+0x72/0x140 [mlx5_core]
[23827.494054] ? mlx5e_tc_tun_encap_info_equal_generic+0x140/0x140 [mlx5_core]
[23827.495296] mlx5e_rep_neigh_update+0x41e/0x5e0 [mlx5_core]
[23827.496338] ? mlx5e_rep_neigh_entry_release+0xb80/0xb80 [mlx5_core]
[23827.497486] ? read_word_at_a_time+0xe/0x20
[23827.498250] ? strscpy+0xa0/0x2a0
[23827.498889] process_one_work+0x8ac/0x14e0
[23827.499638] ? lockdep_hardirqs_on_prepare+0x400/0x400
[23827.500537] ? pwq_dec_nr_in_flight+0x2c0/0x2c0
[23827.501359] ? rwlock_bug.part.0+0x90/0x90
[23827.502116] worker_thread+0x53b/0x1220
[23827.502831] ? process_one_work+0x14e0/0x14e0
[23827.503627] kthread+0x328/0x3f0
[23827.504254] ? _raw_spin_unlock_irq+0x24/0x40
[23827.505065] ? __kthread_bind_mask+0x90/0x90
[23827.505912] ret_from_fork+0x1f/0x30
[23827.506621]
[23827.506987] Allocated by task 28248:
[23827.507694] kasan_save_stack+0x1b/0x40
[23827.508476] __kasan_kmalloc+0x7c/0x90
[23827.509197] mlx5e_attach_encap+0xde1/0x1d40 [mlx5_core]
[23827.510194] mlx5e_tc_add_fdb_flow+0x397/0xc40 [mlx5_core]
[23827.511218] __mlx5e_add_fdb_flow+0x519/0xb30 [mlx5_core]
[23827.512234] mlx5e_configure_flower+0x191c/0x4870 [mlx5_core]
[23827.513298] tc_setup_cb_add+0x1d5/0x420
[23827.514023] fl_hw_replace_filter+0x382/0x6a0 [cls_flower]
[23827.514975] fl_change+0x2ceb/0x4a51 [cls_flower]
[23827.515821] tc_new_tfilter+0x89a/0x2070
[23827.516548] rtnetlink_rcv_msg+0x644/0x8c0
[23827.517300] netlink_rcv_skb+0x11d/0x340
[23827.518021] netlink_unicast+0x42b/0x700
[23827.518742] netlink_sendmsg+0x743/0xc20
[23827.519467] sock_sendmsg+0xb2/0xe0
[23827.520131] ____sys_sendmsg+0x590/0x770
[23827.520851] ___sys_sendmsg+0xd8/0x160
[23827.521552] __sys_sendmsg+0xb7/0x140
[23827.522238] do_syscall_64+0x3a/0x70
[23827.522907] entry_SYSCALL_64_after_hwframe+0x44/0xae
[23827.523797]
[23827.524163] Freed by task 25948:
[23827.524780] kasan_save_stack+0x1b/0x40
[23827.525488] kasan_set_track+0x1c/0x30
[23827.526187] kasan_set_free_info+0x20/0x30
[23827.526968] __kasan_slab_free+0xed/0x130
[23827.527709] slab_free_freelist_hook+0xcf/0x1d0
[23827.528528] kmem_cache_free_bulk+0x33a/0x6e0
[23827.529317] kfree_rcu_work+0x55f/0xb70
[23827.530024] process_one_work+0x8ac/0x14e0
[23827.530770] worker_thread+0x53b/0x1220
[23827.531480] kthread+0x328/0x3f0
[23827.532114] ret_from_fork+0x1f/0x30
[23827.532785]
[23827.533147] Last potentially related work creation:
[23827.534007] kasan_save_stack+0x1b/0x40
[23827.534710] kasan_record_aux_stack+0xab/0xc0
[23827.535492] kvfree_call_rcu+0x31/0x7b0
[23827.536206] mlx5e_tc_del_fdb_flow+0x577/0xef0 [mlx5_core]
[23827.537305] mlx5e_flow_put+0x49/0x80 [mlx5_core]
[23827.538290] mlx5e_delete_flower+0x6d1/0xe60 [mlx5_core]
[23827.539300] tc_setup_cb_destroy+0x18e/0x2f0
[23827.540144] fl_hw_destroy_filter+0x1d2/0x310 [cls_flower]
[23827.541148] __fl_delete+0x4dc/0x660 [cls_flower]
[23827.541985] fl_delete+0x97/0x160 [cls_flower]
[23827.542782] tc_del_tfilter+0x7ab/0x13d0
[23827.543503] rtnetlink_rcv_msg+0x644/0x8c0
[23827.544257] netlink_rcv_skb+0x11d/0x340
[23827.544981] netlink_unicast+0x42b/0x700
[23827.545700] netlink_sendmsg+0x743/0xc20
[23827.546424] sock_sendmsg+0xb2/0xe0
[23827.547084] ____sys_sendmsg+0x590/0x770
[23827.547850] ___sys_sendmsg+0xd8/0x160
[23827.548606] __sys_sendmsg+0xb7/0x140
[23827.549303] do_syscall_64+0x3a/0x70
[23827.549969] entry_SYSCALL_64_after_hwframe+0x44/0xae
[23827.550853]
[23827.551217] The buggy address belongs to the object at ffff8881d1322200
[23827.551217] which belongs to the cache kmalloc-256 of size 256
[23827.553341] The buggy address is located 140 bytes inside of
[23827.553341] 256-byte region [ffff8881d1322200, ffff8881d1322300)
[23827.555747] The buggy address belongs to the page:
[23827.556847] page:00000000898762aa refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1d1320
[23827.558651] head:00000000898762aa order:2 compound_mapcount:0 compound_pincount:0
[23827.559961] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[23827.561243] raw: 002ffff800010200dead000000000100dead000000000122ffff888100042b40
[23827.562653] raw: 0000000000000000000000000020002000000001ffffffff0000000000000000
[23827.564112] page dumped because: kasan: bad access detected
[23827.565439]
[23827.565932] Memory state around the buggy address:
[23827.566917] ffff8881d1322180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[23827.568485] ffff8881d1322200: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[23827.569818] >ffff8881d1322280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[23827.571143] ^
[23827.571879] ffff8881d1322300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[23827.573283] ffff8881d1322380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[23827.574654] ==================================================================
Most of the necessary logic is already correctly implemented by
mlx5e_get_next_valid_encap() helper that is used in neigh stats update
handler. Make the handler generic by renaming it to
mlx5e_get_next_matching_encap() and use callback to test whether flow is
matching instead of hardcoded check for 'valid' flag value. Implement
mlx5e_get_next_valid_encap() by calling mlx5e_get_next_matching_encap()
with callback that tests encap MLX5_ENCAP_ENTRY_VALID flag. Implement new
mlx5e_get_next_init_encap() helper by calling
mlx5e_get_next_matching_encap() with callback that tests encap completion
result to be non-error and use it in mlx5e_rep_neigh_update() to safely
iterate over nhe->encap_list.
Remove encap completion logic from mlx5e_rep_update_flows() since the encap
entries passed to this function are already guaranteed to be properly
initialized by similar code in mlx5e_get_next_init_encap().
Fixes: 2a1f1768fa17 ("net/mlx5e: Refactor neigh update for concurrent execution") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
[ since kernel 5.10 doesn't have commit 0d9f96471493
("net/mlx5e: Extract tc tunnel encap/decap code to dedicated file")
which moved encap/decap from en_tc.c to tc_tun_encap.c, so backport and
move the additional functions to en_tc.c instead of tc_tun_encap.c ] Signed-off-by: Xiangyu Chen <xiangyu.chen@windriver.com> Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The del_timer_sync function cancels the s_err_report timer,
which reminds about filesystem errors daily. We should
guarantee the timer is no longer active before kfree(sbi).
When filesystem mounting fails, the flow goes to failed_mount3,
where an error occurs when ext4_stop_mmpd is called, causing
a read I/O failure. This triggers the ext4_handle_error function
that ultimately re-arms the timer,
leaving the s_err_report timer active before kfree(sbi) is called.
Fix the issue by canceling the s_err_report timer after calling ext4_stop_mmpd.
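A sketch of the reordered error path in ext4_fill_super() (label name as commonly seen in that function):

    failed_mount3:
        ext4_stop_mmpd(sbi);
        /*
         * ext4_stop_mmpd() may hit a read error and go through
         * ext4_handle_error(), which re-arms s_err_report; cancel the
         * timer only after that, right before sbi is freed.
         */
        del_timer_sync(&sbi->s_err_report);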
In iocg_pay_debt(), warn is triggered if 'active_list' is empty, which
is intended to confirm iocg is active when it has debt. However, warn
can be triggered during a blkcg or disk removal, if iocg_waitq_timer_fn()
is run at that time:
The warn in this situation is meaningless. Since this iocg is being
removed, the state of the 'active_list' is irrelevant, and 'waitq_timer'
is canceled after removing 'active_list' in ioc_pd_free(), which ensures
iocg is freed after iocg_waitq_timer_fn() returns.
Therefore, add a check for whether the iocg was already offlined to avoid
the warn when removing a blkcg or disk.
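The resulting check is roughly the following one-liner in iocg_pay_debt(), with pd.online being the per-policy-data online flag:

    /* an empty active_list is expected once the iocg has been offlined */
    WARN_ON_ONCE(list_empty(&iocg->active_list) && iocg->pd.online);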
Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20240419093257.3004211-1-linan666@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Bin Lan <bin.lan.cn@windriver.com> Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A new field 'online' is added to blkg_policy_data to fix the following
two problems:
1) In blkcg_activate_policy(), if pd_alloc_fn() with 'GFP_NOWAIT'
failed, 'queue_lock' will be dropped and pd_alloc_fn() will try again
without 'GFP_NOWAIT'. In the meantime, removing the cgroup can race with
it, and pd_offline_fn() will be called without pd_init_fn() and
pd_online_fn(). This way a null-ptr-dereference can be triggered.
2) In order to synchronize pd_free_fn() from blkg_free_workfn() and
blkcg_deactivate_policy(), 'list_del_init(&blkg->q_node)' will be
delayed to blkg_free_workfn(), hence pd_offline_fn() can be called
first in blkg_destroy(), and then blkcg_deactivate_policy() will
call it again, we must prevent it.
The new field 'online' will be set after pd_online_fn() and will be
cleared after pd_offline_fn(), in the meantime pd_offline_fn() will only
be called if 'online' is set.
Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230119110350.2287325-3-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Bin Lan <bin.lan.cn@windriver.com> Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
These three bpf_map_{lookup,update,delete}_elem() helpers are also
available to sleepable bpf programs, so add the corresponding lock
assertion for sleepable bpf programs; otherwise the following warning
will be reported when a sleepable bpf program manipulates a bpf map under
interpreter mode (aka bpf_jit_enable=0):
We use map->freeze_mutex to prevent races between map_freeze() and
memory mapping BPF map contents with writable permissions. The way we
naively do this means we'll hold freeze_mutex for the entire duration of all
the mm and VMA manipulations, which is completely unnecessary. This can
potentially also lead to deadlocks, as reported by syzbot in [0].
So, instead, hold freeze_mutex only during writeability checks, bump
(proactively) "write active" count for the map, unlock the mutex and
proceed with mmap logic. And only if something went wrong during mmap
logic, then undo that "write active" counter increment.
Neither SMB3.0 nor SMB3.02 supports encryption negotiate context, so
when SMB2_GLOBAL_CAP_ENCRYPTION flag is set in the negotiate response,
the client uses AES-128-CCM as the default cipher. See MS-SMB2
3.3.5.4.
Commit b0abcd65ec54 ("smb: client: fix UAF in async decryption") added
a @server->cipher_type check to conditionally call
smb3_crypto_aead_allocate(), but that check would always be false as
@server->cipher_type is unset for SMB3.02.
Fix the following KASAN splat by setting @server->cipher_type for
SMB3.02 as well.
Fix this by allocating a new AEAD TFM for async decryption, but keep
the existing one for synchronous READ cases (similar to what is done
in smb3_calc_signature()).
Also remove the calls to aead_request_set_callback() and
crypto_wait_req() since it's always going to be a synchronous operation.
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
[In linux-5.10, dec and enc fields are named ccmaesdecrypt and ccmaesencrypt.] Signed-off-by: Jianqi Ren <jianqi.ren.cn@windriver.com> Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Skip sessions that are being torn down (status == SES_EXITING) to
avoid UAF.
Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
[ cifs_debug.c was moved from fs/cifs to fs/smb/client since 38c8a9a52082 ("smb: move client and server files to common directory fs/smb").
The cifs_ses_exiting() was introduced to cifs_debug.c since ca545b7f0823 ("smb: client: fix potential UAF in cifs_debug_files_proc_show()")
which has been sent to upstream already. ] Signed-off-by: Jianqi Ren <jianqi.ren.cn@windriver.com> Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
All release_mid() callers seem to hold a reference to @mid, so there is
no need to call kref_put(&mid->refcount, __release_mid) under the
@server->mid_lock spinlock. If they didn't, a use-after-free bug
would have occurred anyway.
Getting rid of that spinlock also fixes a potential deadlock, as
shown below:
    CPU 0                                  CPU 1
    ------------------------------------------------------------------
    cifs_demultiplex_thread()              cifs_debug_data_proc_show()
     release_mid()
      spin_lock(&server->mid_lock);
                                           spin_lock(&cifs_tcp_ses_lock)
                                            spin_lock(&server->mid_lock)
      __release_mid()
       smb2_find_smb_tcon()
        spin_lock(&cifs_tcp_ses_lock) *deadlock*
Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
[cifs_mid_q_entry_release() is renamed to release_mid() and
_cifs_mid_q_entry_release() is renamed to __release_mid() by
commit 70f08f914a37 ("cifs: remove useless DeleteMidQEntry()")
which is integrated into v6.0, so preserve old names in v5.10.] Signed-off-by: Cliff Liu <donghua.liu@windriver.com> Signed-off-by: He Zhe <Zhe.He@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
1. After sending a request to the server, wait for the response until
mid->mid_state != SUBMITTED;
2. Receive the response from the server, and set it to the mid;
3. Set the mid state to RECEIVED;
4. Kill the process; the mid state is already RECEIVED, so get 0;
5. Handle and release the negotiate response;
6. UAF.
It can be easily reproduced by adding some delay in [3] - [6].
Only the sync call has the problem since the async call's callback is
executed in the cifsd process.
Add an extra state to mark the mid state as READY before waking up the
waiter, then it can get the response safely.
Fixes: ec637e3ffb6b ("[CIFS] Avoid extra large buffer allocation (and memcpy) in cifs_readpages") Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
[fs/cifs was moved to fs/smb/client since 38c8a9a52082 ("smb: move client and server files to common directory fs/smb").
We apply the patch to fs/cifs with some minor context changes.] Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Xiangyu Chen <xiangyu.chen@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
[ This patch removed lock/unlock operation due to ses_lock is
not present in v5.10 and not ported yet. ses->status is protected
by a global lock, cifs_tcp_ses_lock, in v5.10. ] Signed-off-by: Xiangyu Chen <xiangyu.chen@windriver.com> Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Skip sessions that are being torn down (status == SES_EXITING) to
avoid UAF.
Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
[ This patch removes lock/unlock operation in routine cifs_ses_exiting()
for ses_lock is not present in v5.10 and not ported yet. ses->status
is protected by a global lock, cifs_tcp_ses_lock, in v5.10. ] Signed-off-by: Jianqi Ren <jianqi.ren.cn@windriver.com> Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The functions nvmet_fc_iodnum() and nvmet_fc_fodnum() are currently
unutilized.
Following commit c53432030d86 ("nvme-fabrics: Add target support for FC
transport"), which introduced these two functions, they have not been
used at all in practice.
An issue was identified in the dcn21_link_encoder_create function where
an out-of-bounds access could occur when the hpd_source index was used
to reference the link_enc_hpd_regs array. This array has a fixed size
and the index was not being checked against the array's bounds before
accessing it.
This fix adds a conditional check to ensure that the hpd_source index is
within the valid range of the link_enc_hpd_regs array. If the index is
out of bounds, the function now returns NULL to prevent undefined
behavior.
Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Roman Li <roman.li@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Bin Lan <bin.lan.cn@windriver.com> Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Why]
Coverity reports an OVERRUN warning. There are
only max_links elements within dc->links. The link
count could be up to AMDGPU_DM_MAX_DISPLAY_INDEX (31).
[How]
Make sure the link count is less than max_links.
Reviewed-by: Harry Wentland <harry.wentland@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Hersen Wu <hersenxs.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Minor conflict resolved due to code context change. And the macro MAX_LINKS
is introduced by Commit 60df5628144b ("drm/amd/display: handle invalid
connector indices") after 6.10. So here we still use the original array
length MAX_PIPES * 2] Signed-off-by: Jianqi Ren <jianqi.ren.cn@windriver.com> Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When running 'rmmod ath10k', ath10k_sdio_remove() will free the sdio
workqueue via destroy_workqueue(). But if CONFIG_INIT_ON_FREE_DEFAULT_ON
is set to yes, a kernel panic will happen:
Call trace:
destroy_workqueue+0x1c/0x258
ath10k_sdio_remove+0x84/0x94
sdio_bus_remove+0x50/0x16c
device_release_driver_internal+0x188/0x25c
device_driver_detach+0x20/0x2c
This is because during 'rmmod ath10k', ath10k_sdio_remove() will call
ath10k_core_destroy() before destroy_workqueue(). wiphy_dev_release()
will finally be called in ath10k_core_destroy(). This function will free
struct cfg80211_registered_device *rdev and all its members, including
wiphy, dev and the pointer to the sdio workqueue. Then the pointer to the
sdio workqueue will be set to NULL due to CONFIG_INIT_ON_FREE_DEFAULT_ON.
After the device release, destroy_workqueue() will use a NULL pointer and
then the kernel panic happens.
We need to call destroy_workqueue() before ath10k_core_destroy(): free
the workqueue buffer first and then free the pointer to the workqueue in
ath10k_core_destroy(). This order matches the error path order in
ath10k_sdio_probe().
No work will be queued on the sdio workqueue between the time it is
destroyed and ath10k_core_destroy() is called. Based on the call stack
above, the reason is:
Only ath10k_sdio_sleep_timer_handler(), ath10k_sdio_hif_tx_sg() and
ath10k_sdio_irq_disable() queue work on the sdio workqueue.
The sleep timer is deleted before ath10k_core_destroy() in
ath10k_hif_power_down().
ath10k_sdio_irq_disable() is only called in ath10k_hif_stop().
ath10k_core_unregister() will call ath10k_hif_power_down() to stop the hif
bus, so ath10k_sdio_hif_tx_sg() won't be called anymore.
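A sketch of the reordered remove path (field names assumed from the ath10k sdio code):

    static void ath10k_sdio_remove(struct sdio_func *func)
    {
        /* ... */
        ath10k_core_unregister(ar);

        /*
         * Flush and free pending work while the workqueue pointer is
         * still valid; ath10k_core_destroy() frees (and, with
         * init_on_free, poisons) the memory holding that pointer.
         */
        destroy_workqueue(ar_sdio->workqueue);

        ath10k_core_destroy(ar);
        /* ... */
    }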
Signed-off-by: Kang Yang <quic_kangyang@quicinc.com> Tested-by: David Ruth <druth@chromium.org> Reviewed-by: David Ruth <druth@chromium.org> Link: https://patch.msgid.link/20241008022246.1010-1-quic_kangyang@quicinc.com Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Alva Lan <alvalan9@foxmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Callers of the tegra_xusb_find_port_node() function only do NULL checking of
the return value. Return NULL instead of ERR_PTR(-ENOMEM) to keep it
consistent.
The 'nargs' and 'nret' locals come directly from a user-supplied
buffer and are used as indexes into a small stack-based array and as
inputs to copy_to_user() after they are subject to bounds checks.
Use array_index_nospec() after the bounds checks to clamp these values
for speculative execution.
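The general pattern, as a sketch with a hypothetical MAX_ARGS bound (linux/nospec.h provides the helper):

    if (nargs > MAX_ARGS || nret > MAX_ARGS)
        return -EINVAL;

    /* clamp under speculation before using them as array indexes */
    nargs = array_index_nospec(nargs, MAX_ARGS + 1);
    nret  = array_index_nospec(nret,  MAX_ARGS + 1);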
If a discard request needs to be retried, and that retry may fail before
a new special payload is added, a double free will result. Clear the
RQF_SPECIAL_PAYLOAD flag when the request is cleaned.
Signed-off-by: Chunguang Xu <chunguang.xu@shopee.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
[Minor context change fixed] Signed-off-by: Cliff Liu <donghua.liu@windriver.com> Signed-off-by: He Zhe <Zhe.He@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Calling C code via a different mapping than it was linked at is
problematic, because the compiler assumes that RIP-relative and absolute
symbol references are interchangeable. GCC in particular may use
RIP-relative per-CPU variable references even when not using -fpic.
So call xen_prepare_pvh() via its kernel virtual mapping on x86_64, so
that those RIP-relative references produce the correct values. This
matches the pre-existing behavior for i386, which also invokes
xen_prepare_pvh() via the kernel virtual mapping before invoking
startup_32 with paging disabled again.
There are two variables that indicate the interrupt type to be used in
the next test execution: the global "irq_type" and "test->irq_type".
The global is referenced from pci_endpoint_test_get_irq() to preserve
the current type for ioctl(PCITEST_GET_IRQTYPE).
The type set in this function isn't reflected in the global "irq_type",
so ioctl(PCITEST_GET_IRQTYPE) returns the previous type.
As a result, the wrong type is displayed in the old version of "pcitest"
as follows:
- Result of running "pcitest -i 0"
SET IRQ TYPE TO LEGACY: OKAY
- Result of running "pcitest -I"
GET IRQ TYPE: MSI
Whereas running the new version of "pcitest" in kselftest results in an
error as follows:
# RUN pci_ep_basic.LEGACY_IRQ_TEST ...
# pci_endpoint_test.c:104:LEGACY_IRQ_TEST:Expected 0 (0) == ret (1)
# pci_endpoint_test.c:104:LEGACY_IRQ_TEST:Can't get Legacy IRQ type
Fix this issue by propagating the current type to the global "irq_type".
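A minimal sketch of the fix, assuming the type is switched in a helper
along the lines of pci_endpoint_test_set_irq() (the helper and parameter
names here are assumptions based on the description):

	/* remember the newly requested type ... */
	test->irq_type = req_irq_type;

	/* ... and mirror it into the global so PCITEST_GET_IRQTYPE reports
	 * the type that is actually in use */
	irq_type = test->irq_type;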
After devm_request_irq() fails in pci_endpoint_test_request_irq(),
pci_endpoint_test_free_irq_vectors() is called on the assumption that
all IRQs have been released.
However, some requested IRQs remain unreleased, so there are still
/proc/irq/* entries left behind, and this results in a WARN() with the
following message:
remove_proc_entry: removing non-empty directory 'irq/30', leaking at least 'pci-endpoint-test.0'
WARNING: CPU: 0 PID: 202 at fs/proc/generic.c:719 remove_proc_entry +0x190/0x19c
To solve this issue, set the number of remaining IRQs to test->num_irqs,
and release IRQs in advance by calling pci_endpoint_test_release_irq().
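A minimal sketch of the error path described above (identifiers assumed,
and the exact call sites differ): record how many IRQs were actually
requested before bailing out, then release them so no /proc/irq/*
entries are leaked:

	fail:
		/* i IRQs were requested successfully before the failure */
		test->num_irqs = i;
		/* release them before the IRQ vectors are freed, so no
		 * /proc/irq/* entries are left behind */
		pci_endpoint_test_release_irq(test);
		return false;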
When adding a socket option support in MPTCP, both the get and set parts
are supposed to be implemented.
Support for the setsockopt part of IPV6_V6ONLY was added a while ago,
but it looks like the get part was forgotten. It should have been
present as a way to verify that a setting has been applied as expected,
and so as not to behave differently from TCP or any other socket type.
Not supporting this getsockopt(IPV6_V6ONLY) blocks some apps which want
to check the default value, before doing extra actions. On Linux, the
default value is 0, but this can be changed with the net.ipv6.bindv6only
sysctl knob. On Windows, it is set to 1 by default. So supporting the
get part, like for all other socket options, is important.
Everything was in place to expose it; just the last step was missing.
Only new code is added to cover this specific getsockopt(), so this
seems safe.
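A minimal sketch of the missing piece, modelled on how other MPTCP
getsockopt options are handled; the shape below follows the upstream
helpers named in the backport note and may differ in detail from this
version:

	static int mptcp_getsockopt_v6(struct mptcp_sock *msk, int optname,
				       char __user *optval, int __user *optlen)
	{
		struct sock *sk = (void *)msk;

		switch (optname) {
		case IPV6_V6ONLY:
			return mptcp_put_int_option(msk, optval, optlen,
						    sk->sk_ipv6only);
		}

		return -EOPNOTSUPP;
	}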
Fixes: c9b95a135987 ("mptcp: support IPV6_V6ONLY setsockopt") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/550 Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250314-net-mptcp-fix-data-stream-corr-sockopt-v1-2-122dbb249db3@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
[ Conflicts in sockopt.c in the context, because commit 0abdde82b163
("mptcp: move sockopt function into a new file") is not in this
release. The modifications can still be done in protocol.c without
difficulties. A particularity is that the mptcp_put_int_option()
helper is required, and imported from newer versions without taking
the extra features introduced with them in commit 2c9e77659a0c
("mptcp: add TCP_INQ cmsg support") and commit 3b1e21eb60e8 ("mptcp:
getsockopt: add support for IP_TOS"). ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Recently, during a debugging session using local MPTCP connections, I
noticed MPJoinAckHMacFailure was not zero on the server side. The
counter was in fact incremented when the PM rejected new subflows,
because the 'subflow' limit was reached.
The fix is easy: simply dissociate the two cases, so that only the HMAC
validation check increases the MPTCP_MIB_JOINACKMAC counter.
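A minimal sketch of the split, assuming the usual subflow_syn_recv_sock()
structure (the error label and surrounding code differ between versions):

	if (!subflow_hmac_valid(req, &mp_opt)) {
		SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_JOINACKMAC);
		goto dispose_child;
	}

	/* hitting the subflow limit is not an HMAC failure, so it must not
	 * bump the MIB counter */
	if (!mptcp_can_accept_new_subflow(subflow_req->msk))
		goto dispose_child;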
Fixes: 4cf8b7e48a09 ("subflow: introduce and use mptcp_can_accept_new_subflow()") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250407-net-mptcp-hmac-failure-mib-v1-1-3c9ecd0a3a50@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ No conflicts, but subflow_add_reset_reason() is not needed in this
version: the related feature is not supported in this version. ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to the debug log, the same req received two SYN-ACKs in a very
short time, very likely because the client retransmits the SYN-ACK for
multiple reasons.
Even if the packets are transmitted with a noticeable time interval
between them, they can be processed by the server on different CPUs
concurrently. The 'subflow_req->msk' ownership is transferred to the
subflow the first time this happens, and there is then a risk of a NULL
pointer dereference here.
This patch fixes the issue by moving the 'subflow_req->msk' access under
the `own_req == true` conditional.
Note that the !msk check in subflow_hmac_valid() can be dropped, because
the same check already exists under the own_req mpj branch, where the
code has been moved.
Fixes: 9466a1ccebbe ("mptcp: enable JOIN requests even if cookies are in use") Cc: stable@vger.kernel.org Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Gang Yan <yangang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250328-net-mptcp-misc-fixes-6-15-v1-1-34161a482a7f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ Conflict in subflow.c because commit 74c7dfbee3e1 ("mptcp: consolidate
in_opt sub-options fields in a bitmask") is not in this version. The
conflict is in the context, and the modification can still be applied.
Note that subflow_add_reset_reason() is not needed here, because the
related feature is not supported in this version. ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Martin KaFai Lau reported use-after-free [0] in reqsk_timer_handler().
"""
We are seeing a use-after-free from a bpf prog attached to
trace_tcp_retransmit_synack. The program passes the req->sk to the
bpf_sk_storage_get_tracing kernel helper which does check for null
before using it.
"""
The commit 83fccfc3940c ("inet: fix potential deadlock in
reqsk_queue_unlink()") added a timer_pending() check in
reqsk_queue_unlink() so that del_timer_sync() is not called from
reqsk_timer_handler(), but it introduced a small race window.
Before the timer is called, expire_timers() calls detach_timer(timer, true)
to clear timer->entry.pprev, marking it as not pending.
If reqsk_queue_unlink() checks timer_pending() just after expire_timers()
calls detach_timer(), TCP will miss del_timer_sync(); the reqsk timer will
continue running and send multiple SYN+ACKs until it expires.
The reported UAF could happen if req->sk is close()d earlier than the timer
expiration, which is 63s by default.
The scenario would be
1. inet_csk_complete_hashdance() calls inet_csk_reqsk_queue_drop(),
but del_timer_sync() is missed
2. reqsk timer is executed and scheduled again
3. req->sk is accept()ed and reqsk_put() decrements rsk_refcnt, but
the reqsk timer still holds another reference, and inet_csk_accept()
does not clear req->sk for non-TFO sockets
4. sk is close()d
5. reqsk timer is executed again, and BPF touches req->sk
Let's not use timer_pending() by passing the caller context to
__inet_csk_reqsk_queue_drop().
Note that reqsk timer is pinned, so the issue does not happen in most
use cases. [1]
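A minimal sketch of the approach (simplified; the helpers shown are the
upstream ones and the exact shape may differ in this version):

	static bool __inet_csk_reqsk_queue_drop(struct sock *sk,
						struct request_sock *req,
						bool from_timer)
	{
		bool unlinked = reqsk_queue_unlink(req);

		/* Only a non-timer caller may stop the timer synchronously;
		 * the timer handler itself never needs to, so there is no
		 * reliance on a racy timer_pending() check. */
		if (!from_timer && del_timer_sync(&req->rsk_timer))
			reqsk_put(req);

		if (unlinked) {
			reqsk_queue_removed(&inet_csk(sk)->icsk_accept_queue, req);
			reqsk_put(req);
		}
		return unlinked;
	}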
[0]
BUG: KFENCE: use-after-free read in bpf_sk_storage_get_tracing+0x2e/0x1b0
Use-after-free read at 0x00000000a891fb3a (in kfence-#1):
bpf_sk_storage_get_tracing+0x2e/0x1b0
bpf_prog_5ea3e95db6da0438_tcp_retransmit_synack+0x1d20/0x1dda
bpf_trace_run2+0x4c/0xc0
tcp_rtx_synack+0xf9/0x100
reqsk_timer_handler+0xda/0x3d0
run_timer_softirq+0x292/0x8a0
irq_exit_rcu+0xf5/0x320
sysvec_apic_timer_interrupt+0x6d/0x80
asm_sysvec_apic_timer_interrupt+0x16/0x20
intel_idle_irq+0x5a/0xa0
cpuidle_enter_state+0x94/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb
allocated by task 0 on cpu 9 at 260507.901592s:
sk_prot_alloc+0x35/0x140
sk_clone_lock+0x1f/0x3f0
inet_csk_clone_lock+0x15/0x160
tcp_create_openreq_child+0x1f/0x410
tcp_v6_syn_recv_sock+0x1da/0x700
tcp_check_req+0x1fb/0x510
tcp_v6_rcv+0x98b/0x1420
ipv6_list_rcv+0x2258/0x26e0
napi_complete_done+0x5b1/0x2990
mlx5e_napi_poll+0x2ae/0x8d0
net_rx_action+0x13e/0x590
irq_exit_rcu+0xf5/0x320
common_interrupt+0x80/0x90
asm_common_interrupt+0x22/0x40
cpuidle_enter_state+0xfb/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb
freed by task 0 on cpu 9 at 260507.927527s:
rcu_core_si+0x4ff/0xf10
irq_exit_rcu+0xf5/0x320
sysvec_apic_timer_interrupt+0x6d/0x80
asm_sysvec_apic_timer_interrupt+0x16/0x20
cpuidle_enter_state+0xfb/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb
Fixes: 83fccfc3940c ("inet: fix potential deadlock in reqsk_queue_unlink()") Reported-by: Martin KaFai Lau <martin.lau@kernel.org> Closes: https://lore.kernel.org/netdev/eb6684d0-ffd9-4bdc-9196-33f690c25824@linux.dev/ Link: https://lore.kernel.org/netdev/b55e2ca0-42f2-4b7c-b445-6ffd87ca74a0@linux.dev/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20241014223312.4254-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Bin Lan <bin.lan.cn@windriver.com> Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A recent optimization change in LLVM [1] aims to transform certain loop
idioms into calls to strlen() or wcslen(). This change transforms the
first while loop in UniStrcat() into a call to wcslen(), breaking the
build when UniStrcat() gets inlined into alloc_path_with_tree_prefix():
ld.lld: error: undefined symbol: wcslen
>>> referenced by nls_ucs2_utils.h:54 (fs/smb/client/../../nls/nls_ucs2_utils.h:54)
>>> vmlinux.o:(alloc_path_with_tree_prefix)
>>> referenced by nls_ucs2_utils.h:54 (fs/smb/client/../../nls/nls_ucs2_utils.h:54)
>>> vmlinux.o:(alloc_path_with_tree_prefix)
Disable this optimization with '-fno-builtin-wcslen', which prevents the
compiler from assuming that wcslen() is available in the kernel's C
library.
[ More to the point - it's not that we couldn't implement wcslen(), it's
that this isn't an optimization at all in the context of the kernel.
Replacing a simple inlined loop with a function call to the same loop
is just stupid and pointless if you don't have long strings and fancy
libraries with vectorization support etc.
For the regular 'strlen()' cases, we want the compiler to do this in
order to handle the trivial case of constant strings. And we do have
optimized versions of 'strlen()' on some architectures. But for
wcslen? Just no. - Linus ]
Since acpi_processor_notify() can be called before registering a cpufreq
driver or even in cases when a cpufreq driver is not registered at all,
cpufreq_update_limits() needs to check if a cpufreq driver is present
and prevent it from being unregistered.
For this purpose, make it call cpufreq_cpu_get() to obtain a cpufreq
policy pointer for the given CPU and reference count the corresponding
policy object, if present.
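A minimal sketch of cpufreq_update_limits() with the described guard,
using an explicit cpufreq_cpu_put() as the backport note below indicates
(details of the surrounding code are assumptions):

	void cpufreq_update_limits(unsigned int cpu)
	{
		struct cpufreq_policy *policy;

		policy = cpufreq_cpu_get(cpu);	/* NULL if no driver/policy */
		if (!policy)
			return;

		if (cpufreq_driver->update_limits)
			cpufreq_driver->update_limits(cpu);
		else
			cpufreq_update_policy(cpu);

		cpufreq_cpu_put(policy);
	}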
Fixes: 5a25e3f7cc53 ("cpufreq: intel_pstate: Driver-specific handling of _PPC updates") Closes: https://lore.kernel.org/linux-acpi/Z-ShAR59cTow0KcR@mail-itl Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://patch.msgid.link/1928789.tdWV9SEqCh@rjwysocki.net
[do not use __free(cpufreq_cpu_put) in a backport] Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
- drm_prime_gem_destroy calls dma_buf_put(dma_buf) which releases the
reference to the shared dma_buf. The reference count is 0, so the
dma_buf is destroyed, which in turn decrements the corresponding
amdgpu_bo reference count to 0, and the amdgpu_bo is destroyed -
calling drm_gem_object_release then dma_resv_fini (which destroys the
reservation object), then finally freeing the amdgpu_bo.
- nouveau_bo obj->bo.base.resv is now a dangling pointer to the memory
formerly allocated to the amdgpu_bo.
- nouveau_gem_object_del calls ttm_bo_put(&nvbo->bo) which calls
ttm_bo_release, which schedules ttm_bo_delayed_delete.
- ttm_bo_delayed_delete runs and dereferences the dangling resv pointer,
resulting in a general protection fault.
Fix this by moving the drm_prime_gem_destroy call from
nouveau_gem_object_del to nouveau_bo_del_ttm. This ensures that it will
be run after ttm_bo_delayed_delete.
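A minimal sketch of the move (the surrounding teardown in
nouveau_bo_del_ttm() is elided and assumed): release the imported
dma-buf from the TTM destroy callback, so it runs after
ttm_bo_delayed_delete rather than from nouveau_gem_object_del():

	static void nouveau_bo_del_ttm(struct ttm_buffer_object *bo)
	{
		struct nouveau_bo *nvbo = nouveau_bo(bo);

		/* ... existing teardown ... */

		if (bo->base.import_attach)
			drm_prime_gem_destroy(&bo->base, bo->sg);

		/* ... */
		kfree(nvbo);
	}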
Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com> Suggested-by: Christian König <christian.koenig@amd.com> Fixes: 22b33e8ed0e3 ("nouveau: add PRIME support") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3937 Cc: Stable@vger.kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/Z-P4epVK8k7tFZ7C@debian.local Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>