When ice driver reads recipes from firmware information about
need_pass_l2 and allow_pass_l2 flags is not stored correctly.
Those flags are stored as one bit each in ice_sw_recipe structure.
Because of that, the result of checking a flag has to be casted to bool.
Note that the need_pass_l2 flag currently works correctly, because
it's stored in the first bit.
Fixes: bccd9bce29e0 ("ice: Add guard rule when creating FDB in switchdev") Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The Record Route IP option records the addresses of the routers that
routed the packet. In the case of forwarded packets, the kernel performs
a route lookup via fib_lookup() and fills in the preferred source
address of the matched route.
The lookup is performed with the DS field of the forwarded packet, but
using the RT_TOS() macro which only masks one of the two ECN bits. If
the packet is ECT(0) or CE, the matched route might be different than
the route via which the packet was forwarded as the input path masks
both of the ECN bits, resulting in the wrong address being filled in the
Record Route option.
When the CM block migrated from CM2.5 to CM3.0, the address offset for
the Global CSR Access Privilege register was modified. We saw this in
the "MIPS64 I6500 Multiprocessing System Programmer's Guide," it is
stated that "the Global CSR Access Privilege register is located at
offset 0x0120" in section 5.4. It is at least the same for I6400.
This fix allows to use the VP cores in SMP mode if the reset values
were modified by the bootloader.
Based on the work of Vladimir Kondratiev
<vladimir.kondratiev@mobileye.com> and the feedback from Jiaxun Yang
<jiaxun.yang@flygoat.com>.
Add a type cast for set8->pairs to fix below compile warning:
main.c: In function 'sets_patch':
main.c:699:50: warning: comparison of distinct pointer types lacks a cast
699 | BUILD_BUG_ON(set8->pairs != &set8->pairs[0].id);
| ^~
Fixes: 9707ac4fe2f5 ("tools/resolve_btfids: Refactor set sorting with types from btf_ids.h") Signed-off-by: Liwei Song <liwei.song.lsong@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20240722083305.4009723-1-liwei.song.lsong@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Since commit 0108a4e9f358 ("bpf: ensure main program has an extable"),
prog->aux->func[0]->kallsyms is left as uninitialized. For BPF programs
with subprogs, the symbol for the main program is missing just as shown
in the output of perf script below:
Fix it by always using prog instead prog->aux->func[0] to emit ksymbol
event for the main program. After the fix, the output of perf script
will be correct:
Fixes: 0108a4e9f358 ("bpf: ensure main program has an extable") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Yonghong Song <yonghong.song@linux.dev> Reviewed-by: Krister Johansen <kjlx@templeofstupid.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20240714065533.1112616-1-houtao@huaweicloud.com Signed-off-by: Sasha Levin <sashal@kernel.org>
dmam_free_coherent() frees a DMA allocation, which makes the
freed vaddr available for reuse, then calls devres_destroy()
to remove and free the data structure used to track the DMA
allocation. Between the two calls, it is possible for a
concurrent task to make an allocation with the same vaddr
and add it to the devres list.
If this happens, there will be two entries in the devres list
with the same vaddr and devres_destroy() can free the wrong
entry, triggering the WARN_ON() in dmam_match.
Fix by destroying the devres entry before freeing the DMA
allocation.
AF_UNIX socket tracks the most recent OOB packet (in its receive queue)
with an `oob_skb` pointer. BPF redirecting does not account for that: when
an OOB packet is moved between sockets, `oob_skb` is left outdated. This
results in a single skb that may be accessed from two different sockets.
Take the easy way out: silently drop MSG_OOB data targeting any socket that
is in a sockmap or a sockhash. Note that such silent drop is akin to the
fate of redirected skb's scm_fp_list (SCM_RIGHTS, SCM_CREDENTIALS).
For symmetry, forbid MSG_OOB in unix_bpf_recvmsg().
Fixes: 314001f0bf92 ("af_unix: Add OOB support") Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20240713200218.2140950-2-mhal@rbox.co Signed-off-by: Sasha Levin <sashal@kernel.org>
For all these years libbpf's BTF dumper has been emitting not strictly
valid syntax for function prototypes that have no input arguments.
Instead of `int (*blah)()` we should emit `int (*blah)(void)`.
This is not normally a problem, but it manifests when we get kfuncs in
vmlinux.h that have no input arguments. Due to compiler internal
specifics, we get no BTF information for such kfuncs, if they are not
declared with proper `(void)`.
The fix is trivial. We also need to adjust a few ancient tests that
happily assumed `()` is correct.
Fixes: 351131b51c7a ("libbpf: add btf_dump API for BTF-to-C conversion") Reported-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://lore.kernel.org/bpf/20240712224442.282823-1-andrii@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
On ARM64 the stack pointer should be aligned at a 16 byte boundary or
the SPAlignmentFault can occur. The fexit_sleep selftest allocates the
stack for the child process as a character array, this is not guaranteed
to be aligned at 16 bytes.
Because of the SPAlignmentFault, the child process is killed before it
can do the nanosleep call and hence fentry_cnt remains as 0. This causes
the main thread to hang on the following line:
while (READ_ONCE(fexit_skel->bss->fentry_cnt) != 2);
Fix this by allocating the stack using mmap() as described in the
example in the man page of clone().
Remove the fexit_sleep test from the DENYLIST of arm64.
Fixes: eddbe8e65214 ("selftest/bpf: Add a test to check trampoline freeing logic.") Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240715173327.8657-1-puranjay@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
As commit afa974b77128 ("kbuild: add real-prereqs shorthand for
$(filter-out FORCE,$^)") explained, $(real-prereqs) is not just a list
of objects when linking a multi-object module. If a single-object module
is turned into a multi-object module, $^ (and therefore $(real-prereqs)
as well) contains header files recorded in the *.cmd file. Such headers
must be filtered out.
Now that a DTB can be built either from a single source or multiple
source files, the same issue can occur.
Consider the following scenario:
First, foo.dtb is implemented as a single-blob device tree.
If you rebuild foo.dtb without 'make clean', you will get this error:
Overlay 'scripts/dtc/include-prefixes/dt-bindings/gpio/gpio.h' is incomplete
$(real-prereqs) contains not only foo-base.dtb and foo-addon.dtbo but
also scripts/dtc/include-prefixes/dt-bindings/gpio/gpio.h, which is
passed to scripts/dtc/fdtoverlay.
Commit 59c9081bc86e ("f2fs: allow write page cache when writting cp")
allows write() to write data to page cache during checkpoint, so block
count fields like .total_valid_block_count, .alloc_valid_block_count
and .rf_node_block_count may encounter race condition as below:
CP Thread A
- write_checkpoint
- block_operations
- f2fs_down_write(&sbi->node_change)
- __prepare_cp_block
: ckpt->valid_block_count = .total_valid_block_count
- f2fs_up_write(&sbi->node_change)
- write
- f2fs_preallocate_blocks
- f2fs_map_blocks(,F2FS_GET_BLOCK_PRE_AIO)
- f2fs_map_lock
- f2fs_down_read(&sbi->node_change)
- f2fs_reserve_new_blocks
- inc_valid_block_count
: percpu_counter_add(&sbi->alloc_valid_block_count, count)
: sbi->total_valid_block_count += count
- f2fs_up_read(&sbi->node_change)
- do_checkpoint
: sbi->last_valid_block_count = sbi->total_valid_block_count
: percpu_counter_set(&sbi->alloc_valid_block_count, 0)
: percpu_counter_set(&sbi->rf_node_block_count, 0)
- fsync
- need_do_checkpoint
- f2fs_space_for_roll_forward
: alloc_valid_block_count was reset to zero,
so, it may missed last data during checkpoint
Let's change to update .total_valid_block_count, .alloc_valid_block_count
and .rf_node_block_count in block_operations(), then their access can be
protected by .node_change and .cp_rwsem lock, so that it can avoid above
race condition.
Fixes: 59c9081bc86e ("f2fs: allow write page cache when writting cp") Cc: Yunlei He <heyunlei@oppo.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
This patch fixes a potentially null pointer being accessed by
is_end_zone_blkaddr() that checks the last block of a zone
when f2fs is mounted as a single device.
Fixes: e067dc3c6b9c ("f2fs: maintain six open zones for zoned devices") Signed-off-by: Daejun Park <daejun7.park@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
pm_runtime_get_sync() may return with error. In case it returns with error
dev->power.usage_count needs to be decremented. pm_runtime_resume_and_get()
takes care of this. Thus use it.
Along with it the rzg2l_wdt_set_timeout() function was updated to
propagate the result of rzg2l_wdt_start() to its caller.
get_ckpt_valid_blocks() checks valid ckpt blocks in current section.
It counts all vblocks from the first to the last segment in the
large section. However, START_SEGNO() is used to get the first segno
in an SIT block. This patch fixes that to get the correct start segno.
Fixes: 61461fc921b7 ("f2fs: fix to avoid touching checkpointed data in get_victim()") Signed-off-by: Sheng Yong <shengyong@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When signals are hard-blocked in order to do time-travel
socket processing, we set signals_blocked and then handle
SIGIO signals by setting the SIGIO bit in signals_pending.
When unblocking, we first set signals_blocked to 0, and
then handle all pending signals. We have to set it first,
so that we can again properly block/unblock inside the
unblock, if the time-travel handlers need to be processed.
Unfortunately, this is racy. We can get into this situation:
// signals_pending = SIGIO_MASK
unblock_signals_hard()
signals_blocked = 0;
if (signals_pending && signals_enabled) {
block_signals();
unblock_signals()
...
sig_handler_common(SIGIO, NULL, NULL);
sigio_handler()
...
sigio_reg_handler()
irq_do_timetravel_handler()
reg->timetravel_handler() ==
vu_req_interrupt_comm_handler()
vu_req_read_message()
vhost_user_recv_req()
vhost_user_recv()
vhost_user_recv_header()
// reads 12 bytes header of
// 20 bytes message
<-- receive SIGIO here <--
sig_handler()
int enabled = signals_enabled; // 1
if ((signals_blocked || !enabled) && (sig == SIGIO)) {
if (!signals_blocked && time_travel_mode == TT_MODE_EXTERNAL)
sigio_run_timetravel_handlers()
_sigio_handler()
sigio_reg_handler()
... as above ...
vhost_user_recv_header()
// reads 8 bytes that were message payload
// as if it were header - but aborts since
// it then gets -EAGAIN
...
--> end signal handler -->
// continue in vhost_user_recv()
// full_read() for 8 bytes payload busy loops
// entire process hangs here
Conceptually, to fix this, we need to ensure that the
signal handler cannot run while we hard-unblock signals.
The thing that makes this more complex is that we can be
doing hard-block/unblock while unblocking. Introduce a
new signals_blocked_pending variable that we can keep at
non-zero as long as pending signals are being processed,
then we only need to ensure it's decremented safely and
the signal handler will only increment it if it's already
non-zero (or signals_blocked is set, of course.)
Note also that only the outermost call to hard-unblock is
allowed to decrement signals_blocked_pending, since it
could otherwise reach zero in an inner call, and leave
the same race happening if the timetravel_handler loops,
but that's basically required of it.
The Rust compiler can take a target config from 'target.json', which is
generated by scripts/generate_rust_target.rs. It used to be that all
Linux architectures used this to generate a target.json, but now
architectures must opt-in to this, or they will default to the Rust
compiler's built-in target definition.
This is mostly okay for (64-bit) x86 and UML, except that it can
generate SSE instructions, which we can't use in the kernel. So
re-instate the custom target.json, which disables SSE (and generally
enables the 'soft-float' feature). This fixes the following compile
error:
error: <unknown>:0:0: in function _RNvMNtCs5QSdWC790r4_4core3f32f7next_up float (float): SSE register return with SSE disabled
Fixes: f82811e22b48 ("rust: Refactor the build target to allow the use of builtin targets") Signed-off-by: David Gow <davidgow@google.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Tested-by: Miguel Ojeda <ojeda@kernel.org> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Link: https://patch.msgid.link/20240529093336.4075206-1-davidgow@google.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Lanes can use other lanes' reference clocks, as determined by refclk.
Use refclk to determine the clock to enable/disable instead of always
using the lane's own reference clock. This ensures the clock selected in
xpsgtr_configure_pll is the one enabled.
For the other half of the equation, always program REF_CLK_SEL even when
we are selecting the lane's own clock. This ensures that Linux's idea of
the reference clock matches the hardware. We use the "local" clock mux
for this instead of going through the ref clock network.
Fixes: 25d700833513 ("phy: xilinx: phy-zynqmp: dynamic clock support for power-save") Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Link: https://lore.kernel.org/r/20240628205540.3098010-2-sean.anderson@linux.dev Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
cdns_torrent_dp_set_power_state() does not consider that ret might be
overwritten. Add return value check of regmap_read_poll_timeout() after
register read in cdns_torrent_dp_set_power_state().
Fixes: 5b16a790f18d ("phy: cadence-torrent: Reorder few functions to remove function declarations") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Reviewed-by: Roger Quadros <rogerq@kernel.org> Link: https://lore.kernel.org/r/20240702032042.3993031-1-make24@iscas.ac.cn Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Compile testing configurations without REGMAP support enabled results in
a bunch of errors being reported:
../drivers/phy/rockchip/phy-rockchip-samsung-hdptx.c:569:21: error: variable ‘rk_hdptx_phy_regmap_config’ has initializer but incomplete type
569 | static const struct regmap_config rk_hdptx_phy_regmap_config = {
| ^~~~~~~~~~~~~
../drivers/phy/rockchip/phy-rockchip-samsung-hdptx.c:570:10: error: ‘const struct regmap_config’ has no member named ‘reg_bits’
570 | .reg_bits = 32,
| ^~~~~~~~
Note that selecting REGMAP alone is not enough, because of the following
liker error:
phy-rockchip-samsung-hdptx.c:(.text+0x10c): undefined reference to `__devm_regmap_init_mmio_clk'
Instead of the obvious fix to enable REGMAP_MMIO, select MFD_SYSCON,
which implicitly enables REGMAP_MMIO as well. The rationale is that the
driver has been already relying on the syscon functionality.
Moreover, without MFD_SYSCON enabled, the test coverage is reduced,
since the linker might not detect any potential undefined references
following syscon_regmap_lookup_by_phandle() invocation in
rk_hdptx_phy_probe() body. That is because the function would
unconditionally return -ENOTSUP, hence the compiler is free to optimize
out any unreachable code.
Finally ensure PHY_ROCKCHIP_SAMSUNG_HDPTX depends on HAS_IOMEM, as
required by MFD_SYSCON.
Unlike other channel counts in CAPx registers, BCDMA BCHAN CNT doesn't
include UHC and HC BC channels. So include them explicitly to arrive at
total BC channel in the instance.
The function kdb_position_cursor() takes in a "prompt" parameter but
never uses it. This doesn't _really_ matter since all current callers
of the function pass the same value and it's a global variable, but
it's a bit ugly. Let's clean it up.
Found by code inspection. This patch is expected to functionally be a
no-op.
When -Wformat-security is not disabled, using a string pointer
as a format causes a warning:
kernel/debug/kdb/kdb_io.c: In function 'kdb_read':
kernel/debug/kdb/kdb_io.c:365:36: error: format not a string literal and no format arguments [-Werror=format-security]
365 | kdb_printf(kdb_prompt_str);
| ^~~~~~~~~~~~~~
kernel/debug/kdb/kdb_io.c: In function 'kdb_getstr':
kernel/debug/kdb/kdb_io.c:456:20: error: format not a string literal and no format arguments [-Werror=format-security]
456 | kdb_printf(kdb_prompt_str);
| ^~~~~~~~~~~~~~
Use an explcit "%s" format instead.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 5d5314d6795f ("kdb: core for kgdb back end (1 of 2)") Reviewed-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20240528121154.3662553-1-arnd@kernel.org Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Existing device trees specify only a single clock-output-name for the
PCIe PHYs. The function phy_aux_clk_register() expects a second entry in
that property. When it doesn't find it, it returns an error, thus
failing the probe of the PHY and thus breaking support for the
corresponding PCIe host.
Follow the approach of the combo USB+DT PHY and generate the name for
the AUX clocks instead of requiring it in DT.
Fixes: 583ca9ccfa80 ("phy: qcom: qmp-pcie: register second optional PHY AUX clock") Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20240614-fix-pcie-phy-compat-v3-1-730d1811acf4@linaro.org Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
It is because inode.i_crypt_info is not initialized during below path:
- mount
- f2fs_fill_super
- f2fs_disable_checkpoint
- f2fs_gc
- f2fs_iget
- f2fs_truncate
So, let's relocate truncation of preallocated blocks to f2fs_file_open(),
after fscrypt_file_open().
Fixes: d4dd19ec1ea0 ("f2fs: do not expose unwritten blocks to user by DIO") Reported-by: chenyuwen <yuwen.chen@xjmz.com> Closes: https://lore.kernel.org/linux-kernel/20240517085327.1188515-1-yuwen.chen@xjmz.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
While working on simplifying the minmax functions, and avoiding
excessive macro expansion, it turns out that the sr.c use of the
'clamp()' macro has the arguments the wrong way around.
The clamp logic is
val = clamp(in, low, high);
and it returns the input clamped to the low/high limits. But sr.c ddid
speed = clamp(0, speed, 0xffff / 177);
which clamps the value '0' to the range '[speed, 0xffff / 177]' and ends
up being nonsensical.
Every time a watch is reestablished after getting lost, we need to
update the cookie which involves quiescing exclusive lock. For this,
we transition from RBD_LOCK_STATE_LOCKED to RBD_LOCK_STATE_QUIESCING
roughly for the duration of rbd_reacquire_lock() call. If the mapping
is exclusive and I/O happens to arrive in this time window, it's failed
with EROFS (later translated to EIO) based on the wrong assumption in
rbd_img_exclusive_lock() -- "lock got released?" check there stopped
making sense with commit a2b1da09793d ("rbd: lock should be quiesced on
reacquire").
To make it worse, any such I/O is added to the acquiring list before
EROFS is returned and this sets up for violating rbd_lock_del_request()
precondition that the request is either on the running list or not on
any list at all -- see commit ded080c86b3f ("rbd: don't move requests
to the running list on errors"). rbd_lock_del_request() ends up
processing these requests as if they were on the running list which
screws up quiescing_wait completion counter and ultimately leads to
... to RBD_LOCK_STATE_QUIESCING and quiescing_wait to recognize that
this state and the associated completion are backing rbd_quiesce_lock(),
which isn't specific to releasing the lock.
While exclusive lock does get quiesced before it's released, it also
gets quiesced before an attempt to update the cookie is made and there
the lock is not released as long as ceph_cls_set_cookie() succeeds.
Panfrost DRM driver uses devfreq to perform DVFS, while using simple_ondemand
devfreq governor by default. This causes driver initialization to fail on
boot when simple_ondemand governor isn't built into the kernel statically,
as a result of the missing module dependency and, consequently, the required
governor module not being included in the initial ramdisk. Thus, let's mark
simple_ondemand governor as a softdep for Panfrost, to have its kernel module
included in the initial ramdisk.
This is a rather longstanding issue that has forced distributions to build
devfreq governors statically into their kernels, [1][2] or has forced users
to introduce some unnecessary workarounds. [3]
For future reference, not having support for the simple_ondemand governor in
the initial ramdisk produces errors in the kernel log similar to these below,
which were taken from a Pine64 RockPro64:
panfrost ff9a0000.gpu: [drm:panfrost_devfreq_init [panfrost]] *ERROR* Couldn't initialize GPU devfreq
panfrost ff9a0000.gpu: Fatal error during GPU init
panfrost: probe of ff9a0000.gpu failed with error -22
Having simple_ondemand marked as a softdep for Panfrost may not resolve this
issue for all Linux distributions. In particular, it will remain unresolved
for the distributions whose utilities for the initial ramdisk generation do
not handle the available softdep information [4] properly yet. However, some
Linux distributions already handle softdeps properly while generating their
initial ramdisks, [5] and this is a prerequisite step in the right direction
for the distributions that don't handle them properly yet.
Since 45ecaea73883 ("drm/sched: Partial revert of 'drm/sched: Keep
s_fence->parent pointer'") still active jobs aren't put back in the
pending list on drm_sched_start(), as they don't have a active
parent fence anymore, so if the GPU is still working and the timeout
is extended, all currently active jobs will be freed.
To avoid prematurely freeing jobs that are still active on the GPU,
don't block the scheduler until we are fully committed to actually
reset the GPU.
As the current job is already removed from the pending list and
will not be put back when drm_sched_start() isn't called, we must
make sure to put the job back on the pending list when extending
the timeout.
Cc: stable@vger.kernel.org #6.0 Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We should always use firmware's poweroff & reboot service
if it's available as firmware may need to perform more task
than platform's syscon etc.
However _machine_restart & poweroff hooks are registered at
low priority, which means platform reboot driver can override
them.
Register firmware based reboot/poweroff implementation with
register_sys_off_handler with appropriate priority so that
they will be prioritised. Remove _machine_halt hook as it's
deemed to be unnecessary.
Cc: stable@vger.kernel.org Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
arch/mips/sgi-ip30/ip30-console.c: In function ‘prom_putchar’:
arch/mips/sgi-ip30/ip30-console.c:21:17: error: implicit declaration of function ‘cpu_relax’ [-Werror=implicit-function-declaration]
21 | cpu_relax();
Cc: stable@vger.kernel.org Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In imx_rproc_addr_init() strcmp() is performed over the node after the
of_node_put() is performed over it.
Fix this error by moving of_node_put() calls.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 5e4c1243071d ("remoteproc: imx_rproc: support remote cores booted before Linux Kernel") Cc: stable@vger.kernel.org Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru> Link: https://lore.kernel.org/r/20240612131714.12907-1-amishin@t-argos.ru Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In imx_rproc_addr_init() "nph = of_count_phandle_with_args()" just counts
number of phandles. But phandles may be empty. So of_parse_phandle() in
the parsing loop (0 < a < nph) may return NULL which is later dereferenced.
Adjust this issue by adding NULL-return check.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: a0ff4aa6f010 ("remoteproc: imx_rproc: add a NXP/Freescale imx_rproc driver") Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru> Reviewed-by: Peng Fan <peng.fan@nxp.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240606075204.12354-1-amishin@t-argos.ru
[Fixed title to fit within the prescribed 70-75 charcters] Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Expanding on the previous commit, assuming that rbd_is_lock_owner()
always returns true (i.e. that we are either in RBD_LOCK_STATE_LOCKED
or RBD_LOCK_STATE_QUIESCING) if the mapping is exclusive is wrong too.
In case ceph_cls_set_cookie() fails, the lock would be temporarily
released even if the mapping is exclusive, meaning that we can end up
even in RBD_LOCK_STATE_UNLOCKED.
IOW, exclusive mappings are really "just" about disabling automatic
lock transitions (as documented in the man page), not about grabbing
the lock and holding on to it whatever it takes.
When CONFIG_DM_VERITY=y, dm_is_verity_target() returned true for any
builtin dm target, not just dm-verity. Fix this by checking for
verity_target instead of THIS_MODULE (which is NULL for builtin code).
Building the sigaltstack test with GCC on 64-bit powerpc errors with:
gcc -Wall sas.c -o /home/michael/linux/.build/kselftest/sigaltstack/sas
In file included from sas.c:23:
current_stack_pointer.h:22:2: error: #error "implement current_stack_pointer equivalent"
22 | #error "implement current_stack_pointer equivalent"
| ^~~~~
sas.c: In function ‘my_usr1’:
sas.c:50:13: error: ‘sp’ undeclared (first use in this function); did you mean ‘p’?
50 | if (sp < (unsigned long)sstack ||
| ^~
This happens because GCC doesn't define __ppc__ for 64-bit builds, only
32-bit builds. Instead use __powerpc__ to detect powerpc builds, which
is defined by clang and GCC for 64-bit and 32-bit builds.
Fix a null pointer dereference induced by DEBUG_TEST_DRIVER_REMOVE.
Return from __sev_snp_shutdown_locked() if the psp_device or the
sev_device structs are not initialized. Without the fix, the driver will
produce the following splat:
Fixes: 1ca5614b84ee ("crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP") Cc: stable@vger.kernel.org Signed-off-by: Kim Phillips <kim.phillips@amd.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: John Allen <john.allen@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
rdma_destroy_id() frees both the cm_id and the struct rdma_id_private. Make
sure that cm_work_handler() does not trigger a use-after-free by only
freeing of the struct rdma_id_private after all pending work has finished.
Cc: stable@vger.kernel.org Fixes: 59c68ac31e15 ("iw_cm: free cm_id resources on the last deref") Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240605145117.397751-6-bvanassche@acm.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is incorrect to request the input pin format of the destination widget
using the output pin index of the source module as the indexes are not
necessarily matching.
moduleA.out_pin1 can be connected to moduleB.in_pin0 for example.
Use the dst_queue_id to request the input format of the destination module.
This bug remained unnoticed likely because in nocodec topologies we don't
have process modules after a module copier, thus the pin/queue index is
ignored.
For the process module case, the code was likely have been tested in a
controlled way where all the pin/queue/format properties were present to
work.
Update the debug prints to have better information.
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Cc: stable@vger.kernel.org # v6.8+ Link: https://patch.msgid.link/20240624121519.91703-3-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Driver does not unregister typec structures (typec_mux_dev and
typec_switch_desc) during removal leading to leaks. Fix this by moving
typec registering parts to separate function and using devm interface to
release them. This also makes code a bit simpler:
- Smaller probe() function with less error paths and no #ifdefs,
- No need to store typec_mux_dev and typec_switch_desc in driver state
container structure.
Cc: stable@vger.kernel.org Fixes: 10f514bd172a ("ASoC: codecs: Add WCD939x Codec driver") Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patch.msgid.link/20240701122616.414158-1-krzysztof.kozlowski@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
MHI endpoint stack accidentally started allocating memory for objects from
DMA zone since commit 62210a26cd4f ("bus: mhi: ep: Use slab allocator
where applicable"). But there is no real need to allocate memory from this
naturally limited DMA zone. This also causes the MHI endpoint stack to run
out of memory while doing high bandwidth transfers.
For systems on which the performance counter can expire early due to turbo
modes the watchdog handler has a safety net in place which validates that
since the last watchdog event there has at least 4/5th of the watchdog
period elapsed.
This works reliably only after the first watchdog event because the per
CPU variable which holds the timestamp of the last event is never
initialized.
So a first spurious event will validate against a timestamp of 0 which
results in a delta which is likely to be way over the 4/5 threshold of the
period. As this might happen before the first watchdog hrtimer event
increments the watchdog counter, this can lead to false positives.
Fix this by initializing the timestamp before enabling the hardware event.
Reset the rearm counter as well, as that might be non zero after the
watchdog was disabled and reenabled.
Link: https://lkml.kernel.org/r/87frsfu15a.ffs@tglx Fixes: 7edaeb6841df ("kernel/watchdog: Prevent false positives with turbo modes") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Read callbacks registered with nvmem core expect 0 to be returned on
success and a negative value to be returned on failure.
abx80x_nvmem_xfer() on read calls i2c_smbus_read_i2c_block_data() which
returns the number of bytes read on success as per its api description,
this return value is handled as an error and returned to nvmem even on
success.
Fix to handle all possible values that would be returned by
i2c_smbus_read_i2c_block_data().
Fixes: e90ff8ede777 ("rtc: abx80x: Add nvmem support") Cc: stable@vger.kernel.org Signed-off-by: Joy Chakraborty <joychakr@google.com> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Sean Anderson <sean.anderson@seco.com> Link: https://lore.kernel.org/r/20240613120750.1455209-1-joychakr@google.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Read/write callbacks registered with nvmem core expect 0 to be returned
on success and a negative value to be returned on failure.
isl1208_nvmem_read()/isl1208_nvmem_write() currently return the number of
bytes read/written on success, fix to return 0 on success and negative on
failure.
Fixes: c3544f6f51ed ("rtc: isl1208: Add new style nvmem support to driver") Cc: stable@vger.kernel.org Signed-off-by: Joy Chakraborty <joychakr@google.com> Link: https://lore.kernel.org/r/20240612080831.1227131-1-joychakr@google.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Switching to transparent mode leads to a loss of link synchronization,
so prevent doing this on an active link. This happened at least on an
Intel N100 system / DELL UD22 dock, the LTTPR residing either on the
host or the dock. To fix the issue, keep the current mode on an active
link, adjusting the LTTPR count accordingly (resetting it to 0 in
transparent mode).
v2: Adjust code comment during link training about reiniting the LTTPRs.
(Ville)
Fixes: 7b2a4ab8b0ef ("drm/i915: Switch to LTTPR transparent mode link training") Reported-and-tested-by: Gareth Yu <gareth.yu@intel.com> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10902 Cc: <stable@vger.kernel.org> # v5.15+ Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240708190029.271247-3-imre.deak@intel.com
(cherry picked from commit 211ad49cf8ccfdc798a719b4d1e000d0a8a9e588) Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Regularly retraining a link during an atomic commit happens with the
given pipe/link already disabled and hence intel_dp->link_trained being
false. Ensure this also for retraining a DP SST link via direct calls to
the link training functions (vs. an actual commit as for DP MST). So far
nothing depended on this, however the next patch will depend on
link_trained==false for changing the LTTPR mode to non-transparent.
[Why]
Page table of compute VM in the VRAM will lost after gpu reset.
VRAM won't be restored since compute VM has no shadows.
[How]
Use higher 32-bit of vm->generation to record a vram_lost_counter.
Reset the VM state machine when vm->genertaion is not equal to
the new generation token.
v2: Check vm->generation instead of calling drm_sched_entity_error
in amdgpu_vm_validate.
v3: Use new generation token instead of vram_lost_counter for check.
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
(cherry picked from commit 47c0388b0589cb481c294dcb857d25a214c46eb3) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Why]
After supend/resume, with topology unchanged, observe that
link_address_sent of all mstb are marked as false even the topology probing
is done without any error.
It is caused by wrongly also include "ret == 0" case as a probing failure
case.
[How]
Remove inappropriate checking conditions.
Cc: Lyude Paul <lyude@redhat.com> Cc: Harry Wentland <hwentlan@amd.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Imre Deak <imre.deak@intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: stable@vger.kernel.org Fixes: 37dfdc55ffeb ("drm/dp_mst: Cleanup drm_dp_send_link_address() a bit") Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240626084825.878565-2-Wayne.Lin@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
DisplayLink devices do not generate hotplug events. Remove the poll
flag DRM_CONNECTOR_POLL_HPD, as it may not be specified together with
DRM_CONNECTOR_POLL_CONNECT or DRM_CONNECTOR_POLL_DISCONNECT.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Fixes: afdfc4c6f55f ("drm/udl: Fixed problem with UDL adpater reconnection") Reviewed-by: Jani Nikula <jani.nikula@intel.com> Cc: Robert Tarasov <tutankhamen@chromium.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Sean Paul <sean@poorly.run> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: dri-devel@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v4.15+ Link: https://patchwork.freedesktop.org/patch/msgid/20240510154841.11370-2-tzimmermann@suse.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We seem to have a case where SDMA will sometimes miss a doorbell
if GFX is entering the powergating state when the doorbell comes in.
To workaround this, we can update the wptr via MMIO, however,
this is only safe because we disallow gfxoff in begin_ring() for
SDMA 5.2 and then allow it again in end_ring().
Enable this workaround while we are root causing the issue with
the HW team.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/3440 Tested-by: Friedrich Vock <friedrich.vock@gmx.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
(cherry picked from commit f2ac52634963fc38e4935e11077b6f7854e5d700) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We're seeing a GPU hang issue on a CHV platform, which was caused by commit bac24f59f454 ("drm/i915/execlists: Enable coarse preemption boundaries for
Gen8").
The Gen8 platform only supports timeslicing and doesn't have a preemption
mechanism, as its engines do not have a preemption timer.
Commit 751f82b353a6 ("drm/i915/gt: Only disable preemption on Gen8 render
engines") addressed this issue only for render engines. This patch extends
that fix by ensuring that preemption is not considered for all engines on
Gen8 platforms.
v4:
- Use the correct Fixes tag (Rodrigo Vivi)
- Reworded commit log (Andi Shyti)
v3:
- Inside need_preempt(), condition of can_preempt() is not required
as simplified can_preempt() is enough. (Chris Wilson)
v2: Simplify can_preempt() function (Tvrtko Ursulin)
Fixes: 751f82b353a6 ("drm/i915/gt: Only disable preemption on gen8 render engines") Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11396 Suggested-by: Andi Shyti <andi.shyti@intel.com> Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Cc: Chris Wilson <chris.p.wilson@linux.intel.com> CC: <stable@vger.kernel.org> # v5.12+ Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240711163208.1355736-1-nitin.r.gote@intel.com
(cherry picked from commit 7df0be6e6280c6fca01d039864bb123e5e36604b) Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
topa_entry->base is a bit-field. Bit-fields are not promoted to a 64-bit
type, even if the underlying type is 64-bit, and so, if necessary, must
be cast to a larger type when calculations are done.
Fix a topa_entry->base address calculation by adding a cast.
Without the cast, the address was limited to 36-bits i.e. 64GiB.
The address calculation is used on systems that do not support Multiple
Entry ToPA (only Broadwell), and affects physical addresses on or above
64GiB. Instead of writing to the correct address, the address comprising
the first 36 bits would be written to.
Intel PT snapshot and sampling modes are not affected.
Fixes: 52ca9ced3f70 ("perf/x86/intel/pt: Add Intel PT PMU driver") Reported-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240624201101.60186-3-adrian.hunter@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
topa_entry->base needs to store a pfn. It obviously needs to be
large enough to store the largest possible x86 pfn which is
MAXPHYADDR-PAGE_SIZE (52-12). So it is 4 bits too small.
Increase the size of topa_entry->base from 36 bits to 40 bits.
Note, systems where physical addresses can be 256TiB or more are affected.
[ Adrian: Amend commit message as suggested by Dave Hansen ]
Fixes: 52ca9ced3f70 ("perf/x86/intel/pt: Add Intel PT PMU driver") Signed-off-by: Marco Cavenati <cavenati.marco@gmail.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240624201101.60186-2-adrian.hunter@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A non-0 retire latency can be observed on a Raptorlake which doesn't
support the retire latency feature.
By design, the retire latency shares the PERF_SAMPLE_WEIGHT_STRUCT
sample type with other types of latency. That could avoid adding too
many different sample types to support all kinds of latency. For the
machine which doesn't support some kind of latency, 0 should be
returned.
Perf doesn’t clear/init all the fields of a sample data for the sake
of performance. It expects the later perf_{prepare,output}_sample() to
update the uninitialized field. However, the current implementation
doesn't touch the field of the retire latency if the feature is not
supported. The memory garbage is dumped into the perf data.
Clear the retire latency if the feature is not supported.
The perf stat errors out with UNC_CHA_TOR_INSERTS.IA_HIT_CXL_ACC_LOCAL
event.
$perf stat -e uncore_cha_55/event=0x35,umask=0x10c0008101/ -a -- ls
event syntax error: '..0x35,umask=0x10c0008101/'
\___ Bad event or PMU
The definition of the CHA umask is config:8-15,32-55, which is 32bit.
However, the umask of the event is bigger than 32bit.
This is an error in the original uncore spec.
Add a new umask_ext5 for the new CHA umask range.
Fixes: 949b11381f81 ("perf/x86/intel/uncore: Add Sapphire Rapids server CHA support") Closes: https://lore.kernel.org/linux-perf-users/alpine.LRH.2.20.2401300733310.11354@Diego/ Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ian Rogers <irogers@google.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20240708185524.1185505-1-kan.liang@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The hard-coded metrics is wrongly calculated on the hybrid machine.
$ perf stat -e cycles,instructions -a sleep 1
Performance counter stats for 'system wide':
18,205,487 cpu_atom/cycles/
9,733,603 cpu_core/cycles/
9,423,111 cpu_atom/instructions/ # 0.52 insn per cycle
4,268,965 cpu_core/instructions/ # 0.23 insn per cycle
The insn per cycle for cpu_core should be 4,268,965 / 9,733,603 = 0.44.
When finding the metric events, the find_stat() doesn't take the PMU
type into account. The cpu_atom/cycles/ is wrongly used to calculate
the IPC of the cpu_core.
In the hard-coded metrics, the events from a different PMU are only
SW_CPU_CLOCK and SW_TASK_CLOCK. They both have the stat type,
STAT_NSECS. Except the SW CLOCK events, check the PMU type as well.
Fixes: 0a57b910807a ("perf stat: Use counts rather than saved_value") Reported-by: Khalil, Amiri <amiri.khalil@intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240606180316.4122904-1-kan.liang@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The perf pending task work is never waited upon the matching event
release. In the case of a child event, released via free_event()
directly, this can potentially result in a leaked event, such as in the
following scenario that doesn't even require a weak IRQ work
implementation to trigger:
begin_new_exec()
perf_event_exit_task()
perf_event_exit_event()
// If is child event
free_event()
WARN(atomic_long_cmpxchg(&event->refcount, 1, 0) != 1)
// event is leaked
Similar scenarios can also happen with perf_event_remove_on_exec() or
simply against concurrent perf_event_release().
Fix this with synchonizing against the possibly remaining pending task
work while freeing the event, just like is done with remaining pending
IRQ work. This means that the pending task callback neither need nor
should hold a reference to the event, preventing it from ever beeing
freed.
When a task is scheduled out, pending sigtrap deliveries are deferred
to the target task upon resume to userspace via task_work.
However failures while adding an event's callback to the task_work
engine are ignored. And since the last call for events exit happen
after task work is eventually closed, there is a small window during
which pending sigtrap can be queued though ignored, leaking the event
refcount addition such as in the following scenario:
TASK A
-----
do_exit()
exit_task_work(tsk);
<IRQ>
perf_event_overflow()
event->pending_sigtrap = pending_id;
irq_work_queue(&event->pending_irq);
</IRQ>
=========> PREEMPTION: TASK A -> TASK B
event_sched_out()
event->pending_sigtrap = 0;
atomic_long_inc_not_zero(&event->refcount)
// FAILS: task work has exited
task_work_add(&event->pending_task)
[...]
<IRQ WORK>
perf_pending_irq()
// early return: event->oncpu = -1
</IRQ WORK>
[...]
=========> TASK B -> TASK A
perf_event_exit_task(tsk)
perf_event_exit_event()
free_event()
WARN(atomic_long_cmpxchg(&event->refcount, 1, 0) != 1)
// leak event due to unexpected refcount == 2
As a result the event is never released while the task exits.
Fix this with appropriate task_work_add()'s error handling.
The command completion was done early while aborting the commands in driver
unload path but outside lock to avoid the WARN_ON condition of performing
dma_free_attr within the lock. However this caused race condition while
command completion via multiple paths causing system crash.
Hence complete the command early in unload path but within the lock to
avoid race condition.
Link up failure is observed as a result of flash read failure. Current
code does not check flash read return code where it relies on FW checksum
to detect the problem.
Add check of flash read failure to detect the problem sooner.
Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/202406210815.rPDRDMBi-lkp@intel.com/ Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240710171057.35066-6-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For fabric scan, current code uses switch scan opcode and flags as the
method to iterate through different commands to carry out the process.
This makes it hard to read. This patch convert those opcode and flags into
steps. In addition, this help reduce some duplicate code.
The device does not come online when the target port is online. There were
multiple RSCNs indicating multiple devices were affected. Driver is in the
process of finishing a fabric scan. A new RSCN (device up) arrived at the
tail end of the last fabric scan. Driver mistakenly thinks the new RSCN is
being taken care of by the previous fabric scan, where this notification is
cleared and not acted on. The laser needs to be blinked again to get the
device to show up.
To prevent driver from accidentally clearing the RSCN notification, each
RSCN is given a generation value. A fabric scan will scan for that
generation(s). Any new RSCN arrive after the scan start will have a new
generation value. This will trigger another scan to get latest data. The
RSCN notification flag will be cleared when the scan is associate to that
generation.
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202406210538.w875N70K-lkp@intel.com/ Fixes: bb2ca6b3f09a ("scsi: qla2xxx: Relogin during fabric disturbance") Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240710171057.35066-2-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
During vport delete, it is observed that during unload we hit a crash
because of stale entries in outstanding command array. For all these stale
I/O entries, eh_abort was issued and aborted (fast_fail_io = 2009h) but
I/Os could not complete while vport delete is in process of deleting.
Certain vendor specific targets initially register with the fabric as an
initiator function first and then re-register as a target function
afterwards.
The timing of the target function re-registration can cause a race
condition such that the driver is stuck assuming the remote port as an
initiator function and never discovers the target's hosted LUNs.
Expand the nlp_state qualifier to also include NLP_STE_PRLI_ISSUE because
the state means that PRLI was issued but we have not quite reached
MAPPED_NODE state yet. If we received an RSCN in the PRLI_ISSUE state,
then we should restart discovery again by going into DEVICE_RECOVERY.
Fixes: dded1dc31aa4 ("scsi: lpfc: Modify when a node should be put in device recovery mode during RSCN") Cc: <stable@vger.kernel.org> # v6.6+ Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240628172011.25921-3-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Read/write callbacks registered with nvmem core expect 0 to be returned
on success and a negative value to be returned on failure.
cmos_nvram_read()/cmos_nvram_write() currently return the number of
bytes read or written, fix to return 0 on success and -EIO incase number
of bytes requested was not read or written.
Fixes: 8b5b7958fd1c ("rtc: cmos: use generic nvmem") Cc: stable@vger.kernel.org Signed-off-by: Joy Chakraborty <joychakr@google.com> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/20240612083635.1253039-1-joychakr@google.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since balancing mode was added in bda420b98505 ("numa balancing: migrate
on fault among multiple bound nodes"), it was possible to set this mode
but it wouldn't be shown in /proc/<pid>/numa_maps since there was no
support for it in the mpol_to_str() helper.
Furthermore, because the balancing mode sets the MPOL_F_MORON flag, it
would be displayed as 'default' due a workaround introduced a few years
earlier in 8790c71a18e5 ("mm/mempolicy.c: fix mempolicy printing in
numa_maps").
To tidy this up we implement two changes:
Replace the MPOL_F_MORON check by pointer comparison against the
preferred_node_policy array. By doing this we generalise the current
special casing and replace the incorrect 'default' with the correct 'bind'
for the mode.
Secondly, we add a string representation and corresponding handling for
the MPOL_F_NUMA_BALANCING flag.
With the two changes together we start showing the balancing flag when it
is set and therefore complete the fix.
Representation format chosen is to separate multiple flags with vertical
bars, following what existed long time ago in kernel 2.6.25. But as
between then and now there wasn't a way to display multiple flags, this
patch does not change the format in practice.
The power domain is automatically activated from clk_prepare(). However, on
certain platforms like i.MX8QM and i.MX8QXP, the power-on handling invokes
sleeping functions, which triggers the 'scheduling while atomic' bug in the
context switch path during device probing:
Cure this by implementing the irq_bus_lock/sync_unlock() interrupt chip
callbacks and handle power management in them as they are invoked from
non-atomic context.
[ tglx: Rewrote change log, added Fixes tag ]
Fixes: 0136afa08967 ("irqchip: Add driver for imx-irqsteer controller") Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240703163250.47887-1-shenwei.wang@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linux 6.9+ is unable to start a degraded RAID1 array with one drive,
when that drive has a write-mostly flag set. During such an attempt,
the following assertion in bio_split() is hit:
After investigation, it turned out that choose_slow_rdev() does not set
the value of max_sectors in some cases and because of it,
raid1_read_request calls bio_split with sectors == 0.
Fix it by filling in this variable.
This bug was introduced in
commit dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()")
but apparently hidden until
commit 0091c5a269ec ("md/raid1: factor out helpers to choose the best rdev from read_balance()")
shortly thereafter.
Cc: stable@vger.kernel.org # 6.9.x+ Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl> Fixes: dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()") Cc: Song Liu <song@kernel.org> Cc: Yu Kuai <yukuai3@huawei.com> Cc: Paul Luse <paul.e.luse@linux.intel.com> Cc: Xiao Ni <xni@redhat.com> Cc: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> Link: https://lore.kernel.org/linux-raid/20240706143038.7253-1-mat.jonczyk@o2.pl/ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
--
Tested on both Linux 6.10 and 6.9.8.
Inside a VM, mdadm testsuite for RAID1 on 6.10 did not find any problems:
./test --dev=loop --no-error --raidtype=raid1
(on 6.9.8 there was one failure, caused by external bitmap support not
compiled in).
Notes:
- I was reliably getting deadlocks when adding / removing devices
on such an array - while the array was loaded with fsstress with 20
concurrent processes. When the array was idle or loaded with fsstress
with 8 processes, no such deadlocks happened in my tests.
This occurred also on unpatched Linux 6.8.0 though, but not on
6.1.97-rc1, so this is likely an independent regression (to be
investigated).
- I was also getting deadlocks when adding / removing the bitmap on the
array in similar conditions - this happened on Linux 6.1.97-rc1
also though. fsstress with 8 concurrent processes did cause it only
once during many tests.
- in my testing, there was once a problem with hot adding an
internal bitmap to the array:
mdadm: Cannot add bitmap while array is resyncing or reshaping etc.
mdadm: failed to set internal bitmap.
even though no such reshaping was happening according to /proc/mdstat.
This seems unrelated, though.
fwnode_handle_get(fwnode) is called when a domain is created with fwnode
passed as a function parameter. fwnode_handle_put(domain->fwnode) is called
when the domain is destroyed but during the creation a path exists that
does not set domain->fwnode.
If this path is taken, the fwnode get will never be put.
To avoid the unbalanced get and put, set domain->fwnode unconditionally.
Fixes: d59f6617eef0 ("genirq: Allow fwnode to carry name information only") Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240614173232.1184015-4-herve.codina@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It will cause memory leakage when use driver API devm_free_percpu()
to free memory allocated by devm_alloc_percpu(), fixed by using
devres_release() instead of devres_destroy() within devm_free_percpu().
Driver API devm_krealloc() calls alloc_dr() with wrong argument
@total_new_size, so causes more memory to be allocated than required
fix this memory waste by using @new_size as the argument for alloc_dr().
While the iavf driver adds a s/w limit (128) on the number of FDIR
filters that the VF can request, a malicious VF driver can request more
than that and exhaust the resources for other VFs.
Add a similar limit in ice.
CC: stable@vger.kernel.org Fixes: 1f7ea1cd6a37 ("ice: Enable FDIR Configure for AVF") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Suggested-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The NIC requires each TSO segment to not span more than 10
descriptors. NIC further requires each descriptor to not exceed
16KB - 1 (GVE_TX_MAX_BUF_SIZE_DQO).
The descriptors for an skb are generated by
gve_tx_add_skb_no_copy_dqo() for DQO RDA queue format.
gve_tx_add_skb_no_copy_dqo() loops through each skb frag and
generates a descriptor for the entire frag if the frag size is
not greater than GVE_TX_MAX_BUF_SIZE_DQO. If the frag size is
greater than GVE_TX_MAX_BUF_SIZE_DQO, it is split into descriptor(s)
of size GVE_TX_MAX_BUF_SIZE_DQO and a descriptor is generated for
the remainder (frag size % GVE_TX_MAX_BUF_SIZE_DQO).
gve_can_send_tso() checks if the descriptors thus generated for an
skb would meet the requirement that each TSO-segment not span more
than 10 descriptors. However, the current code misses an edge case
when a TSO segment spans multiple descriptors within a large frag.
This change fixes the edge case.
gve_can_send_tso() relies on the assumption that max gso size (9728)
is less than GVE_TX_MAX_BUF_SIZE_DQO and therefore within an skb
fragment a TSO segment can never span more than 2 descriptors.
Fixes: a57e5de476be ("gve: DQO: Add TX path") Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Jeroen de Borst <jeroendb@google.com> Cc: stable@vger.kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240724143431.3343722-1-pkaligineedi@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
zap_modalias_env() wrongly calculates size of memory block to move, so
will cause OOB memory access issue if variable MODALIAS is not the last
one within its @env parameter, fixed by correcting size to memmove.
Fixes: 9b3fa47d4a76 ("kobject: fix suppressing modalias in uevents delivered over netlink") Cc: stable@vger.kernel.org Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Reviewed-by: Lk Sii <lk_sii@163.com> Link: https://lore.kernel.org/r/1717074877-11352-1-git-send-email-quic_zijuhu@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lina reports random oopsen originating from the fast GUP code when
16K pages are used with 4-level page-tables, the fourth level being
folded at runtime due to lack of LPA2.
In this configuration, the generic implementation of
p4d_offset_lockless() will return a 'p4d_t *' corresponding to the
'pgd_t' allocated on the stack of the caller, gup_fast_pgd_range().
This is normally fine, but when the fourth level of page-table is folded
at runtime, pud_offset_lockless() will offset from the address of the
'p4d_t' to calculate the address of the PUD in the same page-table page.
This results in a stray stack read when the 'p4d_t' has been allocated
on the stack and can send the walker into the weeds.
Fix the problem by providing our own definition of p4d_offset_lockless()
when CONFIG_PGTABLE_LEVELS <= 4 which returns the real page-table
pointer rather than the address of the local stack variable.
The DMA Link ID is set to the IPC message's primary during dai_config,
which is only during hw_params.
During xrun handling the hw_params is not called and the DMA Link ID
information will be lost.
All other fields in the message expected to be 0 for re-configuration, only
the DMA Link ID needs to be preserved and the in case of repeated
dai_config, it is correctly updated (masked and then set).
Cc: stable@vger.kernel.org Fixes: ca5ce0caa67f ("ASoC: SOF: ipc4/intel: Add support for chained DMA") Link: https://github.com/thesofproject/linux/issues/5116 Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://patch.msgid.link/20240724081932.24542-3-peter.ujfalusi@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The DMA Link ID is only valid in snd_sof_dai_config_data when the
dai_config is called with HW_PARAMS.
The commit that this patch fixes is actually moved a code section without
changing it, the same bug exists in the original code, needing different
patch to kernel prior to 6.9 kernels.
Cc: stable@vger.kernel.org Fixes: 3858464de57b ("ASoC: SOF: ipc4-topology: change chain_dma handling in dai_config") Link: https://github.com/thesofproject/linux/issues/5116 Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://patch.msgid.link/20240724081932.24542-2-peter.ujfalusi@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Outline and export free_reserved_page() because modules use it and it in
turn uses page_ext_{get|put} which should not be exported. The same
result could be obtained by outlining {get|put}_page_tag_ref() but that
would have higher performance impact as these functions are used in more
performance critical paths.
Link: https://lkml.kernel.org/r/20240717212844.2749975-1-surenb@google.com Fixes: dcfe378c81f7 ("lib: introduce support for page allocation tagging") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202407080044.DWMC9N9I-lkp@intel.com/ Suggested-by: Christoph Hellwig <hch@infradead.org> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Kees Cook <keescook@chromium.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Sourav Panda <souravpanda@google.com> Cc: <stable@vger.kernel.org> [6.10] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After a recent change in clang to stop consuming all instances of '-S'
and '-c' [1], the stack protector scripts break due to the kernel's use
of -Werror=unused-command-line-argument to catch cases where flags are
not being properly consumed by the compiler driver:
$ echo | clang -o - -x c - -S -c -Werror=unused-command-line-argument
clang: error: argument unused during compilation: '-c' [-Werror,-Wunused-command-line-argument]
This results in CONFIG_STACKPROTECTOR getting disabled because
CONFIG_CC_HAS_SANE_STACKPROTECTOR is no longer set.
'-c' and '-S' both instruct the compiler to stop at different stages of
the pipeline ('-S' after compiling, '-c' after assembling), so having
them present together in the same command makes little sense. In this
case, the test wants to stop before assembling because it is looking at
the textual assembly output of the compiler for either '%fs' or '%gs',
so remove '-c' from the list of arguments to resolve the error.
All versions of GCC continue to work after this change, along with
versions of clang that do or do not contain the change mentioned above.
The decompression code parses a huffman tree and counts the number of
symbols for a given bit length. In rare cases, there may be >= 256
symbols with a given bit length, causing the unsigned char to overflow.
This causes a decompression failure later when the code tries and fails to
find the bit length for a given symbol.
Since the maximum number of symbols is 258, use unsigned short instead.
Link: https://lkml.kernel.org/r/20240717162016.1514077-1-ross.lagerwall@citrix.com Fixes: bc22c17e12c1 ("bzip2/lzma: library support for gzip, bzip2 and lzma decompression") Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Cc: Alain Knaff <alain@knaff.lu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>