Similar to VHE, calculate the value of cptr_el2 from scratch on
activate traps. This removes the need to store cptr_el2 in every
vcpu structure. Moreover, some traps, such as whether the guest
owns the fp registers, need to be set on every vcpu run.
Reported-by: James Clark <james.clark@linaro.org> Fixes: 5294afdbf45a ("KVM: arm64: Exclude FP ownership from kvm_vcpu_arch") Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241216105057.579031-13-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Before the blamed commit, sendzc relied on io_req_msg_cleanup() to clear
REQ_F_NEED_CLEANUP, so after the following snippet the request will
never hit the core io_uring cleanup path.
io_notif_flush();
io_req_msg_cleanup();
The easiest fix is to null the notification. io_send_zc_cleanup() can
still be called after, but it's tolerated.
parse_dcal() validate num_aces to allocate posix_ace_state_array.
if (num_aces > ULONG_MAX / sizeof(struct smb_ace *))
It is an incorrect validation that we can create an array of size ULONG_MAX.
smb_acl has ->size field to calculate actual number of aces in request buffer
size. Use this to check invalid num_aces.
Reported-by: Igor Leite Ladessa <igor-ladessa@hotmail.com> Tested-by: Igor Leite Ladessa <igor-ladessa@hotmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To workaround queue full h/w issue on Gfx7/8, when application create
AQL queue, the ring buffer bo allocate size is queue_size/2 and
map queue_size ring buffer to GPU in 2 pieces using 2 attachments, each
attachment map size is queue_size/2, with same ring_bo backing memory.
For Gfx7/8, user queue buffer validation should use queue_size/2 to
verify ring_bo allocation and mapping size.
Fixes: 68e599db7a54 ("drm/amdkfd: Validate user queue buffers") Suggested-by: Tomáš Trnka <trnka@scm.com> Signed-off-by: Philip Yang <Philip.Yang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e7a477735f1771b9a9346a5fbd09d7ff0641723a) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, it seems like the code was carried over from RDNA3 because
it assumes two possible values to set. RDNA4, instead of having:
0: min SCLK
1: max SCLK
only has
0: SCLK offset
This change makes it so it only reports current offset value instead of
showing possible min/max values and their indices. Moreover, it now only
accepts the offset as a value, without the indice index.
Additionally, the lower bound was printed as %u by mistake.
[WHY]
DMUB locking is important to make sure that registers aren't accessed
while in PSR. Previously it was enabled but caused a deadlock in
situations with multiple eDP panels.
[HOW]
Detect if multiple eDP panels are in use to decide whether to use
lock. Refactor the function so that the first check is for PSR-SU
and then replay is in use to prevent having to look up number
of eDP panels for those configurations.
Fixes: f245b400a223 ("Revert "drm/amd/display: Use HW lock mgr for PSR1"") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3965 Reviewed-by: ChiaHsuan Chung <chiahsuan.chung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ed569e1279a3045d6b974226c814e071fa0193a6) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In gfx_v12_0_cp_gfx_load_me_microcode_rs64(), gfx_v12_0_pfp_fini() is
incorrectly used to free 'me' field of 'gfx', since gfx_v12_0_pfp_fini()
can only release 'pfp' field of 'gfx'. The release function of 'me' field
should be gfx_v12_0_me_fini().
Fixes: 52cb80c12e8a ("drm/amdgpu: Add gfx v12_0 ip block support (v6)") Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ebdc52607a46cda08972888178c6aa9cd6965141) Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On the off chance that command stream passed from userspace via
ioctl() call to radeon_vce_cs_parse() is weirdly crafted and
first command to execute is to encode (case 0x03000001), the function
in question will attempt to call radeon_vce_cs_reloc() with size
argument that has not been properly initialized. Specifically, 'size'
will point to 'tmp' variable before the latter had a chance to be
assigned any value.
Play it safe and init 'tmp' with 0, thus ensuring that
radeon_vce_cs_reloc() will catch an early error in cases like these.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: 2fc5703abda2 ("drm/radeon: check VCE relocation buffer range v3") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2d52de55f9ee7aaee0e09ac443f77855989c6b68) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When some client process A call pdr_add_lookup() to add the look up for
the service and does schedule locator work, later a process B got a new
server packet indicating locator is up and call pdr_locator_new_server()
which eventually sets pdr->locator_init_complete to true which process A
sees and takes list lock and queries domain list but it will timeout due
to deadlock as the response will queued to the same qmi->wq and it is
ordered workqueue and process B is not able to complete new server
request work due to deadlock on list lock.
Fix it by removing the unnecessary list iteration as the list iteration
is already being done inside locator work, so avoid it here and just
call schedule_work() here.
pdr_get_domain_list()
pr_err("PDR: %s get domain list
txn wait failed: %d\n",
req->service_name,
ret);
Timeout error log due to deadlock:
"
PDR: tms/servreg get domain list txn wait failed: -110
PDR: service lookup for msm/adsp/sensor_pd:tms/servreg failed: -110
"
Thanks to Bjorn and Johan for letting me know that this commit also fixes
an audio regression when using the in-kernel pd-mapper as that makes it
easier to hit this race. [1]
An OGMv1 and OGMv2 packet receive processing were not only limited by the
number of bytes in the received packet but also by the nodes maximum
aggregation packet size limit. But this limit is relevant for TX and not
for RX. It must not be enforced by batadv_(i)v_ogm_aggr_packet to avoid
loss of information in case of a different limit for sender and receiver.
This has a minor side effect for B.A.T.M.A.N. IV because the
batadv_iv_ogm_aggr_packet is also used for the preprocessing for the TX.
But since the aggregation code itself will not allow more than
BATADV_MAX_AGGREGATION_BYTES bytes, this check was never triggering (in
this context) prior of removing it.
Since the i and pool->chunk_size variables are of type 'u32',
their product can wrap around and then be cast to 'u64'.
This can lead to two different XDP buffers pointing to the same
memory area.
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
Fixes: 94033cd8e73b ("xsk: Optimize for aligned case") Cc: stable@vger.kernel.org Signed-off-by: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru> Link: https://patch.msgid.link/20250313085007.3116044-1-Ilia.Gavrilov@infotecs.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Once a key's reference count has been reduced to 0, the garbage collector
thread may destroy it at any time and so key_put() is not allowed to touch
the key after that point. The most key_put() is normally allowed to do is
to touch key_gc_work as that's a static global variable.
However, in an effort to speed up the reclamation of quota, this is now
done in key_put() once the key's usage is reduced to 0 - but now the code
is looking at the key after the deadline, which is forbidden.
Fix this by using a flag to indicate that a key can be gc'd now rather than
looking at the key's refcount in the garbage collector.
Fixes: 9578e327b2b4 ("keys: update key quotas in key_put()") Reported-by: syzbot+6105ffc1ded71d194d6d@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/673b6aec.050a0220.87769.004a.GAE@google.com/ Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: syzbot+6105ffc1ded71d194d6d@syzkaller.appspotmail.com Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ben reports spurious EFI zboot failures on a system where physical RAM
starts at 0x0. When doing random memory allocation from the EFI stub on
such a platform, a random seed of 0x0 (which means no entropy source is
available) will result in the allocation to be placed at address 0x0 if
sufficient space is available.
When this allocation is subsequently passed on to the decompression
code, the 0x0 address is mistaken for NULL and the code complains and
gives up.
So avoid address 0x0 when doing random allocation, and set the minimum
address to the minimum alignment.
Since the conversion to using the TZ allocator, the efivars service is
registered before the memory pool has been allocated, something which
can lead to a NULL-pointer dereference in case of a racing EFI variable
access.
Make sure that all resources have been set up before registering the
efivars.
Fixes: 6612103ec35a ("firmware: qcom: qseecom: convert to using the TZ allocator") Cc: stable@vger.kernel.org # 6.11 Cc: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Maximilian Luz <luzmaximilian@gmail.com> Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20250120151000.13870-1-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the addresses of the shmobile_smp_mpidr, shmobile_smp_fn, and
shmobile_smp_arg variables are not multiples of 4 bytes, secondary CPU
bring-up fails:
smp: Bringing up secondary CPUs ...
CPU1: failed to come online
CPU2: failed to come online
CPU3: failed to come online
smp: Brought up 1 node, 1 CPU
Fix this by adding the missing alignment directive.
The current solution for powering off the Apalis iMX6 is not functioning
as intended. To resolve this, it is necessary to power off the
vgen2_reg, which will also set the POWER_ENABLE_MOCI signal to a low
state. This ensures the carrier board is properly informed to initiate
its power-off sequence.
The new solution uses the regulator-poweroff driver, which will power
off the regulator during a system shutdown.
Cc: <stable@vger.kernel.org> Fixes: 4eb56e26f92e ("ARM: dts: imx6q-apalis: Command pmic to standby for poweroff") Signed-off-by: Stefan Eichenberger <stefan.eichenberger@toradex.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently on cpu hotplug teardown, only memcg stock is drained but we
need to drain the obj stock as well otherwise we will miss the stats
accumulated on the target cpu as well as the nr_bytes cached. The stats
include MEMCG_KMEM, NR_SLAB_RECLAIMABLE_B & NR_SLAB_UNRECLAIMABLE_B. In
addition we are leaking reference to struct obj_cgroup object.
Link: https://lkml.kernel.org/r/20250310230934.2913113-1-shakeel.butt@linux.dev Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API") Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix race between rmmod and /proc/XXX's inode instantiation.
The bug is that pde->proc_ops don't belong to /proc, it belongs to a
module, therefore dereferencing it after /proc entry has been registered
is a bug unless use_pde/unuse_pde() pair has been used.
use_pde/unuse_pde can be avoided (2 atomic ops!) because pde->proc_ops
never changes so information necessary for inode instantiation can be
saved _before_ proc_register() in PDE itself and used later, avoiding
pde->proc_ops->... dereference.
[adobriyan@gmail.com: don't do 2 atomic ops on the common path] Link: https://lkml.kernel.org/r/3d25ded0-1739-447e-812b-e34da7990dcf@p183 Fixes: 778f3dd5a13c ("Fix procfs compat_ioctl regression") Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: David S. Miller <davem@davemloft.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Watermarks are initialized during the postcore initcall. Until then, all
watermarks are set to zero. This causes cond_accept_memory() to
incorrectly skip memory acceptance because a watermark of 0 is always met.
This can lead to a premature OOM on boot.
To ensure progress, accept one MAX_ORDER page if the watermark is zero.
Link: https://lkml.kernel.org/r/20250310082855.2587122-1-kirill.shutemov@linux.intel.com Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Farrah Chen <farrah.chen@intel.com> Reported-by: Farrah Chen <farrah.chen@intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Cc: Ashish Kalra <ashish.kalra@amd.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> Cc: Thomas Lendacky <thomas.lendacky@amd.com> Cc: <stable@vger.kernel.org> [6.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A shmem folio can be either in page cache or in swap cache, but not at the
same time. Namely, once it is in swap cache, folio->mapping should be
NULL, and the folio is no longer in a shmem mapping.
In __folio_migrate_mapping(), to determine the number of xarray entries to
update, folio_test_swapbacked() is used, but that conflates shmem in page
cache case and shmem in swap cache case. It leads to xarray multi-index
entry corruption, since it turns a sibling entry to a normal entry during
xas_store() (see [1] for a userspace reproduction). Fix it by only using
folio_test_swapcache() to determine whether xarray is storing swap cache
entries or not to choose the right number of xarray entries to update.
Note:
In __split_huge_page(), folio_test_anon() && folio_test_swapcache() is
used to get swap_cache address space, but that ignores the shmem folio in
swap cache case. It could lead to NULL pointer dereferencing when a
in-swap-cache shmem folio is split at __xa_store(), since
!folio_test_anon() is true and folio->mapping is NULL. But fortunately,
its caller split_huge_page_to_list_to_order() bails out early with EBUSY
when folio->mapping is NULL. So no need to take care of it here.
Link: https://lkml.kernel.org/r/20250305200403.2822855-1-ziy@nvidia.com Fixes: fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly") Signed-off-by: Zi Yan <ziy@nvidia.com> Reported-by: Liu Shixin <liushixin2@huawei.com> Closes: https://lore.kernel.org/all/28546fb4-5210-bf75-16d6-43e1f8646080@huawei.com/ Suggested-by: Hugh Dickins <hughd@google.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Charan Teja Kalla <quic_charante@quicinc.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
original report:
https://lore.kernel.org/all/CAKhLTr1UL3ePTpYjXOx2AJfNk8Ku2EdcEfu+CH1sf3Asr=B-Dw@mail.gmail.com/T/
When doing buffered writes with FGP_NOWAIT, under memory pressure, the
system returned ENOMEM despite there being plenty of available memory, to
be reclaimed from page cache. The user space used io_uring interface,
which in turn submits I/O with FGP_NOWAIT (the fast path).
This is likely a regression caused by 66dabbb65d67 ("mm: return an ERR_PTR
from __filemap_get_folio"), which moved error handling from
io_map_get_folio() to __filemap_get_folio(), but broke FGP_NOWAIT
handling, so ENOMEM is being escaped to user space. Had it correctly
returned -EAGAIN with NOWAIT, either io_uring or user space itself would
be able to retry the request.
It's not enough to patch io_uring since the iomap interface is the one
responsible for it, and pwritev2(RWF_NOWAIT) and AIO interfaces must
return the proper error too.
The patch was tested with scylladb test suite (its original reproducer),
and the tests all pass now when memory is pressured.
Link: https://lkml.kernel.org/r/20250224143700.23035-1-raphaelsc@scylladb.com Fixes: 66dabbb65d67 ("mm: return an ERR_PTR from __filemap_get_folio") Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We noticed that uffd-stress test was always failing to run when invoked
for the hugetlb profiles on x86_64 systems with a processor count of 64 or
bigger:
The problem boils down to how run_vmtests.sh (mis)calculates the size of
the region it feeds to uffd-stress. The latter expects to see an amount
of MiB while the former is just giving out the number of free hugepages
halved down. This measurement discrepancy ends up violating uffd-stress'
assertion on number of hugetlb pages allocated per CPU, causing it to bail
out with the error above.
This commit fixes that issue by adjusting run_vmtests.sh's
half_ufd_size_MB calculation so it properly renders the region size in
MiB, as expected, while maintaining all of its original constraints in
place.
Link: https://lkml.kernel.org/r/20250218192251.53243-1-aquini@redhat.com Fixes: 2e47a445d7b3 ("selftests/mm: run_vmtests.sh: fix hugetlb mem size calculation") Signed-off-by: Rafael Aquini <raquini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The error path when atmci_configure_dma() set dma fails in atmci driver
does not correctly disable the clock.
Add the missing clk_disable_unprepare() to the error path for pair with
clk_prepare_enable().
Fixes: 467e081d23e6 ("mmc: atmel-mci: use probe deferring if dma controller is not ready yet") Signed-off-by: Gu Bowen <gubowen5@huawei.com> Acked-by: Aubin Constans <aubin.constans@microchip.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250225022856.3452240-1-gubowen5@huawei.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
UART0 pinmux by default configures GPIO0_B5 in its UART RTS function for
UART0. However, by default on Haikou, it is used as GPIO as UART RTS for
UART5.
Therefore, let's update UART0 pinmux to not configure the pin in that
mode, a later commit will make UART5 request the GPIO pinmux.
The simple-audio-card's microphone widget currently connects to the
headphone jack. Routing the microphone input to the microphone jack
allows for independent operation of the microphone and headphones.
This resolves the following boot-time kernel log message, which
indicated a conflict when the microphone and headphone functions were
not separated:
debugfs: File 'Headphone Jack' in directory 'dapm' already present!
Fixes: 6a57f224f734 ("arm64: dts: freescale: add initial support for verdin imx8m mini") Signed-off-by: Stefan Eichenberger <stefan.eichenberger@toradex.com> Reviewed-by: Francesco Dolcini <francesco.dolcini@toradex.com> Cc: <stable@vger.kernel.org> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The simple-audio-card's microphone widget currently connects to the
headphone jack. Routing the microphone input to the microphone jack
allows for independent operation of the microphone and headphones.
This resolves the following boot-time kernel log message, which
indicated a conflict when the microphone and headphone functions were
not separated:
debugfs: File 'Headphone Jack' in directory 'dapm' already present!
These are u64 variables that come from the user via
qaic_attach_slice_bo_ioctl(). Use check_add_overflow() to ensure that
the math doesn't have an integer wrapping bug.
Cc: stable@vger.kernel.org Fixes: ff13be830333 ("accel/qaic: Add datapath") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Link: https://patchwork.freedesktop.org/patch/msgid/176388fa-40fe-4cb4-9aeb-2c91c22130bd@stanley.mountain Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
By placing some extra BUG_ON() statements I could verify that this is
raised because probing of the 'dummy' regulator driver is not completed
('dummy_regulator_rdev' is still NULL).
In the JTAG debugger I can see that dummy_regulator_probe() and
anatop_regulator_probe() can be run by different kernel threads
(kworker/u4:*). I haven't further investigated whether this can be
changed or if there are other possibilities to force synchronization
between these two probe routines. On the other hand I don't expect much
boot time penalty by probing the 'dummy' regulator synchronously.
Cc: stable@vger.kernel.org Fixes: 259b93b21a9f ("regulator: Set PROBE_PREFER_ASYNCHRONOUS for drivers that existed in 4.14") Signed-off-by: Christian Eggers <ceggers@arri.de> Link: https://patch.msgid.link/20250311091803.31026-1-ceggers@arri.de Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Many filesystems such as NFS and Ceph do not implement the
`invalidate_cache` method. On those filesystems, if writing to the
cache (`NETFS_WRITE_TO_CACHE`) fails for some reason, the kernel
crashes like this:
io_req_msg_cleanup() relies on the fact that io_netmsg_recycle() will
always fully recycle, but that may not be the case if the msg cache
was already full. To ensure that normal cleanup always gets run,
let io_netmsg_recycle() deal with clearing the relevant cleanup flags,
as it knows exactly when that should be done.
The V3D driver still relies on `drm_sched_increase_karma()` and
`drm_sched_resubmit_jobs()` for resubmissions when a timeout occurs.
The function `drm_sched_increase_karma()` marks the job as guilty, while
`drm_sched_resubmit_jobs()` sets an error (-ECANCELED) in the DMA fence of
that guilty job.
Because of this, we must check whether the job’s DMA fence has been
flagged with an error before executing the job. Otherwise, the same guilty
job may be resubmitted indefinitely, causing repeated GPU resets.
This patch adds a check for an error on the job's fence to prevent running
a guilty job that was previously flagged when the GPU timed out.
Note that the CPU and CACHE_CLEAN queues do not require this check, as
their jobs are executed synchronously once the DRM scheduler starts them.
Cc: stable@vger.kernel.org Fixes: d223f98f0209 ("drm/v3d: Add support for compute shader dispatch.") Fixes: 1584f16ca96e ("drm/v3d: Add support for submitting jobs to the TFU.") Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Maíra Canal <mcanal@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250313-v3d-gpu-reset-fixes-v4-1-c1e780d8e096@igalia.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The `struct ttm_resource->placement` contains TTM_PL_FLAG_* flags, but
it was incorrectly tested for XE_PL_* flags.
This caused xe_dma_buf_pin() to always fail when invoked for
the second time. Fix this by checking the `mem_type` field instead.
Fixes: 7764222d54b7 ("drm/xe: Disallow pinning dma-bufs in VRAM") Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: intel-xe@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v6.8+ Signed-off-by: Tomasz Rusinowicz <tomasz.rusinowicz@intel.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250218100353.2137964-1-jacek.lawrynowicz@linux.intel.com Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
(cherry picked from commit b96dabdba9b95f71ded50a1c094ee244408b2a8e) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After a suspend/resume cycle on a down interface, it will come up as
ERROR-ACTIVE.
$ ip -details -s -s a s dev flexcan0
3: flexcan0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN group default qlen 10
link/can promiscuity 0 allmulti 0 minmtu 0 maxmtu 0
can state STOPPED (berr-counter tx 0 rx 0) restart-ms 1000
$ sudo systemctl suspend
$ ip -details -s -s a s dev flexcan0
3: flexcan0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN group default qlen 10
link/can promiscuity 0 allmulti 0 minmtu 0 maxmtu 0
can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 1000
And only set CAN state to CAN_STATE_ERROR_ACTIVE when resume process
has no issue, otherwise keep in CAN_STATE_SLEEPING as suspend did.
Commit 7fdaf8966aae ("can: ucan: use strscpy() to instead of strncpy()")
unintentionally introduced a one byte out of bound read on strscpy()'s
source argument (which is kind of ironic knowing that strscpy() is meant
to be a more secure alternative :)).
Let's consider below buffers:
dest[len + 1]; /* will be NUL terminated */
src[len]; /* may not be NUL terminated */
When doing:
strncpy(dest, src, len);
dest[len] = '\0';
strncpy() will read up to len bytes from src.
On the other hand:
strscpy(dest, src, len + 1);
will read up to len + 1 bytes from src, that is to say, an out of bound
read of one byte will occur on src if it is not NUL terminated. Note
that the src[len] byte is never copied, but strscpy() still needs to
read it to check whether a truncation occurred or not.
This exact pattern happened in ucan.
The root cause is that the source is not NUL terminated. Instead of
doing a copy in a local buffer, directly NUL terminate it as soon as
usb_control_msg() returns. With this, the local firmware_str[] variable
can be removed.
On top of this do a couple refactors:
- ucan_ctl_payload->raw is only used for the firmware string, so
rename it to ucan_ctl_payload->fw_str and change its type from u8 to
char.
- ucan_device_request_in() is only used to retrieve the firmware
string, so rename it to ucan_get_fw_str() and refactor it to make it
directly handle all the string termination logic.
Reported-by: syzbot+d7d8c418e8317899e88c@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-can/67b323a4.050a0220.173698.002b.GAE@google.com/ Fixes: 7fdaf8966aae ("can: ucan: use strscpy() to instead of strncpy()") Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Link: https://patch.msgid.link/20250218143515.627682-2-mailhol.vincent@wanadoo.fr Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are a total of 96 AFL pages and each page has 16 entries with
registers CFDGAFLIDr, CFDGAFLMr, CFDGAFLP0r, CFDGAFLP1r holding
the rule entries (r = 0..15).
Currently, RCANFD_GAFL* macros use a start variable to find AFL entries,
which is incorrect as the testing on RZ/G3E shows ch1 and ch4
gets a start value of 0 and the register contents are overwritten.
Fix this issue by using rule_entry corresponding to the channel
to find the page entries in the AFL list.
Fixes: dd3bd23eb438 ("can: rcar_canfd: Add Renesas R-Car CAN FD driver") Cc: stable@vger.kernel.org Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://patch.msgid.link/20250307170330.173425-3-biju.das.jz@bp.renesas.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to GDMA protocol, holes (zeros) are allowed at the beginning
or middle of the gdma_list_devices_resp message. The existing code
cannot properly handle this, and may miss some devices in the list.
To fix, scan the entire list until the num_of_devs are found, or until
the end of the list.
Cc: stable@vger.kernel.org Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Reviewed-by: Shradha Gupta <shradhagupta@microsoft.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/1741723974-1534-1-git-send-email-haiyangz@microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On the GTA04A5 writing a reset command to the gyroscope causes IRQ
storms because NACK IRQs are enabled and therefore triggered but not
acked.
Sending a reset command to the gyroscope by
i2cset 1 0x69 0x14 0xb6
with an additional debug print in the ISR (not the thread) itself
causes
[ 363.353515] i2c i2c-1: ioctl, cmd=0x720, arg=0xbe801b00
[ 363.359039] omap_i2c 48072000.i2c: addr: 0x0069, len: 2, flags: 0x0, stop: 1
[ 363.366180] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x1110)
[ 363.371673] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
[ 363.376892] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
[ 363.382263] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
[ 363.387664] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
repeating till infinity
[...]
(0x2 = NACK, 0x100 = Bus free, which is not enabled)
Apparently no other IRQ bit gets set, so this stalls.
Do not ignore enabled interrupts and make sure they are acked.
If the NACK IRQ is not needed, it should simply not enabled, but
according to the above log, caring about it is necessary unless
the Bus free IRQ is enabled and handled. The assumption that is
will always come with a ARDY IRQ, which was the idea behind
ignoring it, proves wrong.
It is true for simple reads from an unused address.
To still avoid the i2cdetect trouble which is the reason for
commit c770657bd261 ("i2c: omap: Fix standard mode false ACK readings"),
avoid doing much about NACK in omap_i2c_xfer_data() which is used
by both IRQ mode and polling mode, so also the false detection fix
is extended to polling usage and IRQ storms are avoided.
By changing this, the hardirq handler is not needed anymore to filter
stuff.
The mentioned gyro reset now just causes a -ETIMEDOUT instead of
hanging the system.
When enabling the tracepoint at loading module, the target module
refcount is incremented by find_tracepoint_in_module(). But it is
unnecessary because the module is not unloaded while processing
module loading callbacks.
Moreover, the refcount is not decremented in that function.
To be clear the module refcount handling, move the try_module_get()
callsite to trace_fprobe_create_internal(), where it is actually
required.
This patch broke net/forwarding/ip6gre_custom_multipath_hash.sh in some
circumstances (https://lore.kernel.org/netdev/Z9RIyKZDNoka53EO@mini-arch/).
Let's revert it while the problem is being investigated.
Previous commit 8b5c171bb3dc ("neigh: new unresolved queue limits")
introduces new netlink attribute NDTPA_QUEUE_LENBYTES to represent
approximative value for deprecated QUEUE_LEN. However, it forgot to add
the associated nla_policy in nl_ntbl_parm_policy array. Fix it with one
simple NLA_U32 type policy.
Fixes: 8b5c171bb3dc ("neigh: new unresolved queue limits") Signed-off-by: Lin Ma <linma@zju.edu.cn> Link: https://patch.msgid.link/20250315165113.37600-1-linma@zju.edu.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In tmpfs, when using the "ls" command to list the contents
of a directory with a large number of files, glibc performs
the getdents call in multiple rounds. If a concurrent unlink
occurs between these getdents calls, it may lead to duplicate
directory entries in the ls output. One possible reproduction
scenario is as follows:
Create 1026 files and execute ls and rm concurrently:
for i in {1..1026}; do
echo "This is file $i" > /tmp/dir/file$i
done
ls /tmp/dir rm /tmp/dir/file4
->getdents(file1026-file5)
->unlink(file4)
->getdents(file5,file3,file2,file1)
It is expected that the second getdents call to return file3
through file1, but instead it returns an extra file5.
The root cause of this problem is in the offset_dir_lookup
function. It uses mas_find to determine the starting position
for the current getdents call. Since mas_find locates the first
position that is greater than or equal to mas->index, when file4
is deleted, it ends up returning file5.
It can be fixed by replacing mas_find with mas_find_rev, which
finds the first position that is less than or equal to mas->index.
Fixes: b9b588f22a0c ("libfs: Use d_children list to iterate simple_offset directories") Signed-off-by: Yongjian Sun <sunyongjian1@huawei.com> Link: https://lore.kernel.org/r/20250320034417.555810-1-sunyongjian@huaweicloud.com Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix the lwtunnel_output() reentry loop in ioam6_iptunnel when the
destination is the same after transformation. Note that a check on the
destination address was already performed, but it was not enough. This
is the example of a lwtunnel user taking care of loops without relying
only on the last resort detection offered by lwtunnel.
Fixes: 8cb3bf8bff3c ("ipv6: ioam: Add support for the ip6ip6 encapsulation") Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Link: https://patch.msgid.link/20250314120048.12569-3-justin.iurman@uliege.be Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
This patch acts as a parachute, catch all solution, by detecting
recursion loops in lwtunnel users and taking care of them (e.g., a loop
between routes, a loop within the same route, etc). In general, such
loops are the consequence of pathological configurations. Each lwtunnel
user is still free to catch such loops early and do whatever they want
with them. It will be the case in a separate patch for, e.g., seg6 and
seg6_local, in order to provide drop reasons and update statistics.
Another example of a lwtunnel user taking care of loops is ioam6, which
has valid use cases that include loops (e.g., inline mode), and which is
addressed by the next patch in this series. Overall, this patch acts as
a last resort to catch loops and drop packets, since we don't want to
leak something unintentionally because of a pathological configuration
in lwtunnels.
The solution in this patch reuses dev_xmit_recursion(),
dev_xmit_recursion_inc(), and dev_xmit_recursion_dec(), which seems fine
considering the context.
Closes: https://lore.kernel.org/netdev/2bc9e2079e864a9290561894d2a602d6@akamai.com/ Closes: https://lore.kernel.org/netdev/Z7NKYMY7fJT5cYWu@shredder/ Fixes: ffce41962ef6 ("lwtunnel: support dst output redirect function") Fixes: 2536862311d2 ("lwt: Add support to redirect dst.input") Fixes: 14972cbd34ff ("net: lwtunnel: Handle fragmentation") Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Link: https://patch.msgid.link/20250314120048.12569-2-justin.iurman@uliege.be Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently the API emac_update_hardware_stats() reads different ICSSG
stats without any lock protection.
This API gets called by .ndo_get_stats64() which is only under RCU
protection and nothing else. Add lock to this API so that the reading of
statistics happens during lock.
Previously with tegra-smmu, even with CONFIG_IOMMU_DMA, the default domain
could have been left as NULL. The NULL domain is specially recognized by
host1x_iommu_attach() as meaning it is not the DMA domain and
should be replaced with the special shared domain.
This happened prior to the below commit because tegra-smmu was using the
NULL domain to mean IDENTITY.
Now that the domain is properly labled the test in DRM doesn't see NULL.
Check for IDENTITY as well to enable the special domains.
This is the same issue and basic fix as seen in
commit fae6e669cdc5 ("drm/tegra: Do not assume that a NULL domain means no
DMA IOMMU").
Fixes: c8cc2655cc6c ("iommu/tegra-smmu: Implement an IDENTITY domain") Reported-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt> Closes: https://lore.kernel.org/all/c6a6f114-3acd-4d56-a13b-b88978e927dc@tecnico.ulisboa.pt/ Tested-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Link: https://patchwork.freedesktop.org/patch/msgid/0-v1-10dcc8ce3869+3a7-host1x_identity_jgg@nvidia.com Signed-off-by: Sasha Levin <sashal@kernel.org>
xa_alloc_cyclic() can return 1, which isn't an error. To prevent
situation when the caller of this function will treat it as no error do
a check only for negative here.
Fixes: 384968786909 ("net: phy: Introduce ethernet link topology representation") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
In case of returning 1 from xa_alloc_cyclic() (wrapping) ERR_PTR(1) will
be returned, which will cause IS_ERR() to be false. Which can lead to
dereference not allocated pointer (pin).
Fix it by checking if err is lower than zero.
This wasn't found in real usecase, only noticed. Credit to Pierre.
Fixes: 97f265ef7f5b ("dpll: allocate pin ids in cycle") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
In case of returning 1 from xa_alloc_cyclic() (wrapping) ERR_PTR(1) will
be returned, which will cause IS_ERR() to be false. Which can lead to
dereference not allocated pointer (rel).
Fix it by checking if err is lower than zero.
This wasn't found in real usecase, only noticed. Credit to Pierre.
Fixes: c137743bce02 ("devlink: introduce object and nested devlink relationship infra") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
While creating a new IPv6, we could get a weird -ENOMEM when
RTA_NH_ID is set and either of the conditions below is true:
1) CONFIG_IPV6_SUBTREES is enabled and rtm_src_len is specified
2) nexthop_get() fails
e.g.)
# strace ip -6 route add fe80::dead:beef:dead:beef nhid 1 from ::
recvmsg(3, {msg_iov=[{iov_base=[...[
{error=-ENOMEM, msg=[... [...]]},
[{nla_len=49, nla_type=NLMSGERR_ATTR_MSG}, "Nexthops can not be used with so"...]
]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 148
Let's set err explicitly after ip_fib_metrics_init() in
ip6_route_info_create().
Fixes: f88d8ea67fbd ("ipv6: Plumb support for nexthop object in a fib6_info") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250312013854.61125-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
fib_check_nh_v6_gw() expects that fib6_nh_init() cleans up everything
when it fails.
Commit 7dd73168e273 ("ipv6: Always allocate pcpu memory in a fib6_nh")
moved fib_nh_common_init() before alloc_percpu_gfp() within fib6_nh_init()
but forgot to add cleanup for fib6_nh->nh_common.nhc_pcpu_rth_output in
case it fails to allocate fib6_nh->rt6i_pcpu, resulting in memleak.
Let's call fib_nh_common_release() and clear nhc_pcpu_rth_output in the
error path.
Note that we can remove the fib6_nh_release() call in nh_create_ipv6()
later in net-next.git.
Fixes: 7dd73168e273 ("ipv6: Always allocate pcpu memory in a fib6_nh") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250312010333.56001-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Registering the interrupts for TX or RX DMA Channels prior to registering
their respective NAPI callbacks can result in a NULL pointer dereference.
This is seen in practice as a random occurrence since it depends on the
randomness associated with the generation of traffic by Linux and the
reception of traffic from the wire.
Before commit 7627a0edef54 ("ata: ahci: Drop low power policy board type")
the ATI AHCI controllers specified board type 'board_ahci' rather than
board type 'board_ahci'. This means that LPM was historically not enabled
for the ATI AHCI controllers.
By looking at commit 7a8526a5cd51 ("libata: Add ATA_HORKAGE_NO_NCQ_ON_ATI
for Samsung 860 and 870 SSD."), it is clear that, for some unknown reason,
that Samsung SSDs do not play nice with ATI AHCI controllers. (When using
other AHCI controllers, NCQ can be enabled on these Samsung SSDs without
issues.)
In a similar way, from user reports, it is clear the ATI AHCI controllers
can enable LPM on e.g. Maxtor HDDs perfectly fine, but when enabling LPM
on certain Samsung SSDs, things break. (E.g. the SSDs will not get detected
by the ATI AHCI controller even after a COMRESET.)
Yet, when using LPM on these Samsung SSDs with other AHCI controllers, e.g.
Intel AHCI controllers, these Samsung drives appear to work perfectly fine.
Considering that the combination of ATI + Samsung, for some unknown reason,
does not seem to work well, disable LPM when detecting an ATI AHCI
controller with a problematic Samsung SSD.
Apply this new ATA_QUIRK_NO_LPM_ON_ATI quirk for all Samsung SSDs that have
already been reported to not play nice with ATI (ATA_QUIRK_NO_NCQ_ON_ATI).
Fixes: 7627a0edef54 ("ata: ahci: Drop low power policy board type") Suggested-by: Hans de Goede <hdegoede@redhat.com> Reported-by: Eric <eric.4.debian@grabatoulnz.fr> Closes: https://lore.kernel.org/linux-ide/Z8SBZMBjvVXA7OAK@eldamar.lan/ Tested-by: Eric <eric.4.debian@grabatoulnz.fr> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20250317170348.1748671-2-cassel@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When unloading module, the tprobe events are not correctly cleaned
up. Thus it becomes `fprobe-event` and never be enabled again even
if loading the same module again.
When slicing a BO, we need to iterate through the BO's sgt to find the
right pieces to construct the slice. Some of the data types chosen for
this process are incorrectly too small, and can overflow. This can
result in the incorrect slice construction, which can lead to data
corruption in workload execution.
The device can only handle 32-bit sized transfers, and the scatterlist
struct only supports 32-bit buffer sizes, so our upper limit for an
individual transfer is an unsigned int. Using an int is incorrect due to
the reservation of the sign bit. Upgrade the length of a scatterlist
entry and the offsets into a scatterlist entry to unsigned int for a
correct representation.
While each transfer may be limited to 32-bits, the overall BO may exceed
that size. For counting the total length of the BO, we need a type that
can represent the largest allocation possible on the system. That is the
definition of size_t, so use it.
Due to a typo during defining HCI errors it is not possible to connect
LE-capable device with BR/EDR only adapter. The connection is terminated
by the LE adapter because the invalid LL params error code is treated
as unsupported remote feature.
Fixes: 79c0868ad65a ("Bluetooth: hci_event: Use HCI error defines instead of magic values") Signed-off-by: Arkadiusz Bokowy <arkadiusz.bokowy@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When ib_copy_to_udata() fails in hns_roce_create_qp_common(),
hns_roce_qp_remove() should be called in the error path to
clean up resources in hns_roce_qp_store().
Fixes: 0f00571f9433 ("RDMA/hns: Use new SQ doorbell register for HIP09") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250311084857.3803665-6-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
SQ params from userspace are checked in by set_user_sq_size(). But
when the check fails, the function doesn't return but instead keep
running and overwrite 'ret'. As a result, the invalid params will
not get blocked actually.
Add a return right after the failed check. Besides, although the
check result of kernel sq params will not be overwritten, to keep
coding style unified, move default_congest_type() before
set_kernel_sq_size().
Fixes: 6ec429d5887a ("RDMA/hns: Support userspace configuring congestion control algorithm with QP granularity") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250311084857.3803665-5-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently the condition of unmapping sdb in error path is not exactly
the same as the condition of mapping in alloc_user_qp_db(). This may
cause a problem of unmapping an unmapped db in some case, such as
when the QP is XRC TGT. Unified the two conditions.
Driver runs a for-loop when allocating bt pages and mapping them with
buffer pages. When a large buffer (e.g. MR over 100GB) is being allocated,
it may require a considerable loop count. This will lead to soft lockup:
Add a cond_resched() to fix soft lockup during these loops. In order not
to affect the allocation performance of normal-size buffer, set the loop
count of a 100GB MR as the threshold to call cond_resched().
Fixes: 38389eaa4db1 ("RDMA/hns: Add mtr support for mixed multihop addressing") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250311084857.3803665-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Driver is always clearing the mask that sets the VLAN ID/Service Level
in the adapter. Recent change for supporting multiple traffic class
exposed this issue.
Allow setting SL and VLAN_ID while QP is moved from INIT to RTR state.
Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Fixes: c64b16a37b6d ("RDMA/bnxt_re: Support different traffic class") Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1741670196-2919-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
After using the device for a while, Tom reports that he initially described
the switch port labels incorrectly. Apparently, ASUS's own firmware also
describes them incorrectly. Correct them to what is seen on the chassis.
During upstream process of Raspberry Pi 4 back in 2019 the ARMv7 stubs
didn't configured the ARM architectural timer. This firmware issue has
been fixed in 2020, which gave users enough time to update their system.
So drop this property to allow the use of the vDSO version of
clock_gettime.
Link: https://github.com/raspberrypi/tools/pull/113 Fixes: 7dbe8c62ceeb ("ARM: dts: Add minimal Raspberry Pi 4 support") Signed-off-by: Phil Elwell <phil@raspberrypi.com> Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20250222094113.48198-1-wahrenst@gmx.net Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When GENERIC_IRQ_CHIP is disabled, OMAP1 kernels fail to link:
arm-linux-gnueabi-ld: arch/arm/mach-omap1/irq.o: in function `omap1_init_irq':
irq.c:(.init.text+0x1e8): undefined reference to `irq_alloc_generic_chip'
arm-linux-gnueabi-ld: irq.c:(.init.text+0x228): undefined reference to `irq_setup_generic_chip'
arm-linux-gnueabi-ld: irq.c:(.init.text+0x2a8): undefined reference to `irq_gc_set_wake'
arm-linux-gnueabi-ld: irq.c:(.init.text+0x2b0): undefined reference to `irq_gc_mask_set_bit'
arm-linux-gnueabi-ld: irq.c:(.init.text+0x2b4): undefined reference to `irq_gc_mask_clr_bit'
This has apparently been the case for many years, but I never caught it
in randconfig builds until now, as there are dozens of other drivers
that also 'select GENERIC_IRQ_CHIP' and statistically there is almost
always one of them enabled.
In function create_ib_ah() the following line attempts
to left shift the return value of mlx5r_ib_rate() by 4
and store it in the stat_rate_sl member of av:
However the code overlooks the fact that mlx5r_ib_rate()
may return -EINVAL if the rate passed to it is less than
IB_RATE_2_5_GBPS or greater than IB_RATE_800_GBPS.
Because of this, the code may invoke undefined behaviour when
shifting a signed negative value when doing "-EINVAL << 4".
To fix this check for errors before assigning stat_rate_sl and
propagate any error value to the callers.
Property "supports-sd" isn't documented anywhere and is unnecessary for
mainline driver to function. It seems a property used by downstream
kernel was brought into mainline.
This should be reported by dtbs_check, but mmc-controller-common.yaml
defaults additionalProperties to true thus allows it. Remove the
property to clean the devicetree up and avoid possible confusion.
The ARM PL011 UART instances in BCM2712 are r1p5 spec, which means they
have 32-entry FIFOs. The correct periphid value for this is 0x00341011.
Thanks to N Buchwitz for pointing this out.
Signed-off-by: Phil Elwell <phil@raspberrypi.com> Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Link: https://lore.kernel.org/r/20250223125614.3592-3-wahrenst@gmx.net Fixes: faa3381267d0 ("arm64: dts: broadcom: Add minimal support for Raspberry Pi 5") Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The ARM PL011 UART instances in BCM2711 are r1p5 spec, which means they
have 32-entry FIFOs. The correct periphid value for this is 0x00341011.
Thanks to N Buchwitz for pointing this out.
Signed-off-by: Phil Elwell <phil@raspberrypi.com> Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Link: https://lore.kernel.org/r/20250223125614.3592-2-wahrenst@gmx.net Fixes: 7dbe8c62ceeb ("ARM: dts: Add minimal Raspberry Pi 4 support") Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
During s2idle tests on the Raspberry CM4 the VPU firmware always crashes
on xHCI power-domain resume:
root@raspberrypi:/sys/power# echo freeze > state
[ 70.724347] xhci_suspend finished
[ 70.727730] xhci_plat_suspend finished
[ 70.755624] bcm2835-power bcm2835-power: Power grafx off
[ 70.761127] USB: Set power to 0
[ 74.653040] USB: Failed to set power to 1 (-110)
This seems to be caused because of the mixed usage of
raspberrypi-power and bcm2835-power at the same time. So avoid
the usage of the VPU firmware power-domain driver, which
prevents the VPU crash.
Unregister the cpufreq device and soc device when resource unwinding,
otherwise there will be warning when do removing test:
sysfs: cannot create duplicate filename '/devices/platform/imx-cpufreq-dt'
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.13.0-rc1-next-20241204
Hardware name: NXP i.MX8MPlus EVK board (DT)
Fixes: 9cc832d37799 ("soc: imx8m: Probe the SoC driver as platform driver") Cc: Marco Felsch <m.felsch@pengutronix.de> Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Marco Felsch <m.felsch@pengutronix.de> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The static global soc_uid is only ever used as kasprintf() parameter in
imx8m_soc_probe(). Pass pointer to local u64 variable to .soc_revision()
callback instead and let the .soc_revision() callback fill in the content.
Remove the unnecessary static global variable.
Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Stable-dep-of: cf7139aac463 ("soc: imx8m: Unregister cpufreq and soc dev in cleanup path") Signed-off-by: Sasha Levin <sashal@kernel.org>
The cited commit fixed a software GSO bug with VXLAN + IPSec in tunnel
mode. Unfortunately, it is slightly broader than necessary, as it also
severely affects performance for Geneve + IPSec transport mode over a
device capable of both HW GSO and IPSec crypto offload. In this case,
xfrm_output unnecessarily triggers software GSO instead of letting the
HW do it. In simple iperf3 tests over Geneve + IPSec transport mode over
a back-2-back pair of NICs with MTU 1500, the performance was observed
to be up to 6x worse when doing software GSO compared to leaving it to
the hardware.
This commit makes xfrm_output only trigger software GSO in crypto
offload cases for already encapsulated packets in tunnel mode, as not
doing so would then cause the inner tunnel skb->inner_networking_header
to be overwritten and break software GSO for that packet later if the
device turns out to not be capable of HW GSO.
Taking a closer look at the conditions for the original bug, to better
understand the reasons for this change:
- vxlan_build_skb -> iptunnel_handle_offloads sets inner_protocol and
inner network header.
- then, udp_tunnel_xmit_skb -> ip_tunnel_xmit adds outer transport and
network headers.
- later in the xmit path, xfrm_output -> xfrm_outer_mode_output ->
xfrm4_prepare_output -> xfrm4_tunnel_encap_add overwrites the inner
network header with the one set in ip_tunnel_xmit before adding the
second outer header.
- __dev_queue_xmit -> validate_xmit_skb checks whether GSO segmentation
needs to happen based on dev features. In the original bug, the hw
couldn't segment the packets, so skb_gso_segment was invoked.
- deep in the .gso_segment callback machinery, __skb_udp_tunnel_segment
tries to use the wrong inner network header, expecting the one set in
iptunnel_handle_offloads but getting the one set by xfrm instead.
- a bit later, ipv6_gso_segment accesses the wrong memory based on that
wrong inner network header.
With the new change, the original bug (or similar ones) cannot happen
again, as xfrm will now trigger software GSO before applying a tunnel.
This concern doesn't exist in packet offload mode, when the HW adds
encapsulation headers. For the non-offloaded packets (crypto in SW),
software GSO is still done unconditionally in the else branch.
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Fixes: a204aef9fd77 ("xfrm: call xfrm_output_gso when inner_protocol is set in xfrm_output") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Packets that match the output xfrm policy are delivered to the netstack.
In IPsec packet mode for tunnel mode, the HW is responsible for building
the hard header and outer IP header. In such a situation, the inner
header may refer to a network that is not directly reachable by the host,
resulting in a failed neighbor resolution. The packet is then dropped.
xfrm policy defines the netdevice to use for xmit so we can send packets
directly to it.
Makes direct xmit exclusive to tunnel mode, since some rules may apply
in transport mode.
Fixes: f8a70afafc17 ("xfrm: add TX datapath support for IPsec packet offload mode") Signed-off-by: Alexandre Cassen <acassen@corp.free.fr> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>