The cap_auths that are allocated during an MDS session opening are never
released, causing a memory leak detected by kmemleak. Fix this by freeing
the memory allocated when shutting down the MDS client.
Fixes: 1d17de9534cb ("ceph: save cap_auths in MDS client when session is opened") Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
During noirq suspend phase the Raspberry Pi power driver suffer of
firmware property timeouts. The reason is that the IRQ of the underlying
BCM2835 mailbox is disabled and rpi_firmware_property_list() will always
run into a timeout [1].
Since the VideoCore side isn't consider as a wakeup source, set the
IRQF_NO_SUSPEND flag for the mailbox IRQ in order to keep it enabled
during suspend-resume cycle.
[1]
PM: late suspend of devices complete after 1.754 msecs
WARNING: CPU: 0 PID: 438 at drivers/firmware/raspberrypi.c:128
rpi_firmware_property_list+0x204/0x22c
Firmware transaction 0x00028001 timeout
Modules linked in:
CPU: 0 PID: 438 Comm: bash Tainted: G C 6.9.3-dirty #17
Hardware name: BCM2835
Call trace:
unwind_backtrace from show_stack+0x18/0x1c
show_stack from dump_stack_lvl+0x34/0x44
dump_stack_lvl from __warn+0x88/0xec
__warn from warn_slowpath_fmt+0x7c/0xb0
warn_slowpath_fmt from rpi_firmware_property_list+0x204/0x22c
rpi_firmware_property_list from rpi_firmware_property+0x68/0x8c
rpi_firmware_property from rpi_firmware_set_power+0x54/0xc0
rpi_firmware_set_power from _genpd_power_off+0xe4/0x148
_genpd_power_off from genpd_sync_power_off+0x7c/0x11c
genpd_sync_power_off from genpd_finish_suspend+0xcc/0xe0
genpd_finish_suspend from dpm_run_callback+0x78/0xd0
dpm_run_callback from device_suspend_noirq+0xc0/0x238
device_suspend_noirq from dpm_suspend_noirq+0xb0/0x168
dpm_suspend_noirq from suspend_devices_and_enter+0x1b8/0x5ac
suspend_devices_and_enter from pm_suspend+0x254/0x2e4
pm_suspend from state_store+0xa8/0xd4
state_store from kernfs_fop_write_iter+0x154/0x1a0
kernfs_fop_write_iter from vfs_write+0x12c/0x184
vfs_write from ksys_write+0x78/0xc0
ksys_write from ret_fast_syscall+0x0/0x54
Exception stack(0xcc93dfa8 to 0xcc93dff0)
[...]
PM: noirq suspend of devices complete after 3095.584 msecs
MODULE_DEVICE_TABLE(of, rockchip_mbox_of_match) could let the module
properly autoloaded based on the alias from of_device_id table. It
should be 'rockchip_mbox_of_match' instead of 'rockchp_mbox_of_match',
just fix it.
The ARM MHUv3 controller is only present on ARM64 SoCs. Hence add a
dependency on ARM64, to prevent asking the user about this driver when
configuring a kernel for a different architecture than ARM64.
As set_drr() is called from IRQ context, it can happen that the
pipe context has been nulled by dc_state_destruct().
Apply the same protection here that is already present for
dcn35_set_drr() and dcn10_set_drr(). I.e. fetch the tg pointer
first (to avoid a race with dc_state_destruct()), and then
check the local copy before using it.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3142 Fixes: 06ad7e164256 ("drm/amd/display: Destroy DC context while keeping DML and DML2") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Panel Self Refresh on eDP requires the AUX IO power to be enabled
whenever the output (main link) is enabled. This is required by the
AUX_PHY_WAKE/ML_PHY_LOCK signaling initiated by the HW automatically to
re-enable the main link after it got disabled in power saving states
(see eDP v1.4b, sections 5.1, 6.1.3.3.1.1).
The Panel Replay mode on non-eDP outputs on the other hand is only
supported by keeping the main link active, thus not requiring the above
AUX_PHY_WAKE/ML_PHY_LOCK signaling (eDP v1.4b, section 6.1.3.3.1.2).
Thus enabling the AUX IO power for this case is not required either.
Based on the above enable the AUX IO power only for eDP/PSR outputs.
Bspec: 49274, 53370
v2:
- Add a TODO comment to adjust the requirement for AUX IO based on
whether the ALPM/main-link off mode gets enabled. (Rodrigo)
We do not have ALPM on DP Panel Replay. Due to this SRD_STATUS[SRD State]
doesn't change from SRDENT_ON after Panel Replay is enabled until it gets
disabled.
On eDP Panel Replay DEEP_SLEEP is not reached.
_psr2_ready_for_pipe_update_locked is waiting DEEP_SLEEP bit getting reset.
Take these into account in Panel Replay code by not waiting PSR getting
idle after enabling VBI.
blk_mq_pci_map_queues() maps all queues but right after this, we overwrite
these mappings by calling blk_mq_map_queues(). Just use one helper but not
both.
Fixes: 42f22fe36d51 ("scsi: pm8001: Expose hardware queues for pm80xx") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Daniel Wagner <dwagner@suse.de> Link: https://lore.kernel.org/r/20240912-do-not-overwrite-pci-mapping-v1-1-85724b6cec49@suse.de Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
A previous change was introduced to prevent data loss during a power-on
reset when a tape is present inside the drive. This commit set the
"pos_unknown" flag to true to avoid operations that could compromise data
by performing actions from an untracked position. The relevant change is
commit 9604eea5bd3a ("scsi: st: Add third party poweron reset handling")
As a consequence of this change, a new issue has surfaced: the driver now
returns an "Input/output error" even for empty drives when the drive, host,
or bus is reset. This issue stems from the "flush_buffer" function, which
first checks whether the "pos_unknown" flag is set. If the flag is set, the
user will encounter an "Input/output error" until the tape position is
known again. This behavior differs from the previous implementation, where
empty drives were not affected at system start up time, allowing tape
software to send commands to the driver to retrieve the drive's status and
other information.
The current behavior prioritizes the "pos_unknown" flag over the
"ST_NO_TAPE" status, leading to issues for software that detects drives
during system startup. This software will receive an "Input/output error"
until a tape is loaded and its position is known.
To resolve this, the "ST_NO_TAPE" status should take priority when the
drive is empty, allowing communication with the drive following a power-on
reset. At the same time, the change should continue to protect data by
maintaining the "pos_unknown" flag when the drive contains a tape and its
position is unknown.
Signed-off-by: Rafael Rocha <rrochavi@fnal.gov> Link: https://lore.kernel.org/r/20240905173921.10944-1-rrochavi@fnal.gov Fixes: 9604eea5bd3a ("scsi: st: Add third party poweron reset handling") Acked-by: Kai Mäkisara <kai.makisara@kolumbus.fi> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
While commit 83ab38ef0a0b ("jump_label: Fix concurrency issues in
static_key_slow_dec()") fixed one problem, it created yet another,
notably the following is now possible:
slow_dec
if (try_dec) // dec_not_one-ish, false
// enabled == 1
slow_inc
if (inc_not_disabled) // inc_not_zero-ish
// enabled == 2
return
guard((mutex)(&jump_label_mutex);
if (atomic_cmpxchg(1,0)==1) // false, we're 2
static_call_module_notify() triggers a WARN_ON(), when memory allocation
fails in __static_call_add_module().
That's not really justified, because the failure case must be correctly
handled by the well known call chain and the error code is passed
through to the initiating userspace application.
A memory allocation fail is not a fatal problem, but the WARN_ON() takes
the machine out when panic_on_warn is set.
Module insertion invokes static_call_add_module() to initialize the static
calls in a module. static_call_add_module() invokes __static_call_init(),
which allocates a struct static_call_mod to either encapsulate the built-in
static call sites of the associated key into it so further modules can be
added or to append the module to the module chain.
If that allocation fails the function returns with an error code and the
module core invokes static_call_del_module() to clean up eventually added
static_call_mod entries.
This works correctly, when all keys used by the module were converted over
to a module chain before the failure. If not then static_call_del_module()
causes a #GP as it blindly assumes that key::mods points to a valid struct
static_call_mod.
The problem is that key::mods is not a individual struct member of struct
static_call_key, it's part of a union to save space:
union {
/* bit 0: 0 = mods, 1 = sites */
unsigned long type;
struct static_call_mod *mods;
struct static_call_site *sites;
};
key::sites is a pointer to the list of built-in usage sites of the static
call. The type of the pointer is differentiated by bit 0. A mods pointer
has the bit clear, the sites pointer has the bit set.
As static_call_del_module() blidly assumes that the pointer is a valid
static_call_mod type, it fails to check for this failure case and
dereferences the pointer to the list of built-in call sites, which is
obviously bogus.
Cure it by checking whether the key has a sites or a mods pointer.
If it's a sites pointer then the key is not to be touched. As the sites are
walked in the same order as in __static_call_init() the site walk can be
terminated because all subsequent sites have not been touched by the init
code due to the error exit.
If it was converted before the allocation fail, then the inner loop which
searches for a module match will find nothing.
A fail in the second allocation in __static_call_init() is harmless and
does not require special treatment. The first allocation succeeded and
converted the key to a module chain. That first entry has mod::mod == NULL
and mod::next == NULL, so the inner loop of static_call_del_module() will
neither find a module match nor a module chain. The next site in the walk
was either already converted, but can't match the module, or it will exit
the outer loop because it has a static_call_site pointer and not a
static_call_mod pointer.
The problem that the commit e6a3531dd542cb127c8de32ab1e54a48ae19962b
fixes was reported as a security bug, but Google engineers working on
Android and ChromeOS didn't want to change the default behavior, they
want to get -EIO rather than restarting the system, so I am reverting
that commit.
Note also that calling machine_restart from the I/O handling code is
potentially unsafe (the reboot notifiers may wait for the bio that
triggered the restart), but Android uses the reboot notifiers to store
the reboot reason into the PMU microcontroller, so machine_restart must
be used.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Fixes: e6a3531dd542 ("dm-verity: restart or panic on an I/O error") Suggested-by: Sami Tolvanen <samitolvanen@google.com> Suggested-by: Will Drewry <wad@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 734e1a860312 ("block: Prevent deadlocks when switching
elevators") introduced the function elv_iosched_load_module() to allow
loading an elevator module outside of elv_iosched_store() with the
target device queue not frozen, to avoid deadlocks. However, the "none"
scheduler does not have a module and as a result,
elv_iosched_load_module() always returns an error when trying to switch
to this valid scheduler.
Fix this by ignoring the return value of the request_module() call
done by elv_iosched_load_module(). This restores the behavior before
commit 734e1a860312, which was to ignore the request_module() result and
instead rely on elevator_change() to handle the "none" scheduler case.
Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Fixes: 734e1a860312 ("block: Prevent deadlocks when switching elevators") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240917133231.134806-1-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If bpf_link_prime() fails, bpf_uprobe_multi_link_attach() goes to the
error_free label and frees the array of bpf_uprobe's without calling
bpf_uprobe_unregister().
This leaks bpf_uprobe->uprobe and worse, this frees bpf_uprobe->consumer
without removing it from the uprobe->consumers list.
is caused by incorrectly treating a line as the continuation of a paragraph,
rather than as the first line in a bullet list.
Fixed: 44d174596260 ("KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the event that the I2C bus was powered down when the I2C controller
driver loads, or some spurious pulses occur on the I2C bus, it's
possible that the controller detects a spurious I2C "start" condition.
In this situation it may continue to report the bus is busy indefinitely
and block the controller from working.
The "single-master" DT flag can be specified to disable bus busy checks
entirely, but this may not be safe to use in situations where other I2C
masters may potentially exist.
In the event that the controller reports "bus busy" for too long when
starting a transaction, we can try reinitializing the controller to see
if the busy condition clears. This allows recovering from this scenario.
Fixes: e1d5b6598cdc ("i2c: Add support for Xilinx XPS IIC Bus Interface") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Cc: <stable@vger.kernel.org> # v2.6.34+ Reviewed-by: Manikanta Guntupalli <manikanta.guntupalli@amd.com> Acked-by: Michal Simek <michal.simek@amd.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the i2c bus recovery occurs, driver will send i2c stop command
in the scl low condition. In this case the sw state will still keep
original situation. Under multi-master usage, i2c bus recovery will
be called when i2c transfer timeout occurs. Update the stop command
calling with aspeed_i2c_do_stop function to update master_state.
Traversing VMAs of a given maple tree should be protected by rcu read
lock. However, __damon_va_three_regions() is not doing the protection.
Hold the lock.
Link: https://lkml.kernel.org/r/20240905001204.1481-1-sj@kernel.org Fixes: d0cf3dd47f0d ("damon: convert __damon_va_three_regions to use the VMA iterator") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: Guenter Roeck <linux@roeck-us.net> Closes: https://lore.kernel.org/b83651a0-5b24-4206-b860-cb54ffdf209b@roeck-us.net Tested-by: Guenter Roeck <linux@roeck-us.net> Cc: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some callers of vmf_anon_prepare() may not want us to release the per-VMA
lock ourselves. Rename vmf_anon_prepare() to __vmf_anon_prepare() and let
the callers drop the lock when desired.
Also, make vmf_anon_prepare() a wrapper that releases the per-VMA lock
itself for any callers that don't care.
This is in preparation to fix this bug reported by syzbot:
https://lore.kernel.org/linux-mm/00000000000067c20b06219fbc26@google.com/
Link: https://lkml.kernel.org/r/20240914194243.245-1-vishal.moola@gmail.com Fixes: 9acad7ba3e25 ("hugetlb: use vmf_anon_prepare() instead of anon_vma_prepare()") Reported-by: syzbot+2dab93857ee95f2eeb08@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-mm/00000000000067c20b06219fbc26@google.com/ Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Syzbot reports a UAF in hugetlb_fault(). This happens because
vmf_anon_prepare() could drop the per-VMA lock and allow the current VMA
to be freed before hugetlb_vma_unlock_read() is called.
We can fix this by using a modified version of vmf_anon_prepare() that
doesn't release the VMA lock on failure, and then release it ourselves
after hugetlb_vma_unlock_read().
Link: https://lkml.kernel.org/r/20240914194243.245-2-vishal.moola@gmail.com Fixes: 9acad7ba3e25 ("hugetlb: use vmf_anon_prepare() instead of anon_vma_prepare()") Reported-by: syzbot+2dab93857ee95f2eeb08@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-mm/00000000000067c20b06219fbc26@google.com/ Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 08d08e2e9f0a ("tpm: ibmvtpm: Call tpm2_sessions_init() to
initialize session support") adds call to tpm2_sessions_init() in ibmvtpm,
which could be built as a module. However, tpm2_sessions_init() wasn't
exported, causing libmvtpm to fail to build as a module:
Probing xen-fbfront faults in video_is_primary_device(). The passed-in
struct device is NULL since xen-fbfront doesn't assign it and the
memory is kzalloc()-ed. Assign fb_info->device to avoid this.
This was exposed by the conversion of fb_is_primary_device() to
video_is_primary_device() which dropped a NULL check for struct device.
Fixes: f178e96de7f0 ("arch: Remove struct fb_info from video helpers") Reported-by: Arthur Borsboom <arthurborsboom@gmail.com> Closes: https://lore.kernel.org/xen-devel/CALUcmUncX=LkXWeiSiTKsDY-cOe8QksWhFvcCneOKfrKd0ZajA@mail.gmail.com/ Tested-by: Arthur Borsboom <arthurborsboom@gmail.com> CC: stable@vger.kernel.org Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The generic mmap_base code tries to leave a gap between the top of the
stack and the mmap base address, but enforces a minimum gap size (MIN_GAP)
of 128MB, which is too large on some setups. In particular, on arm tasks
without ADDR_LIMIT_32BIT, the STACK_TOP value is less than 128MB, so it's
impossible to fit such a gap in.
Only enforce this minimum if MIN_GAP < MAX_GAP, as we'd prefer to honour
MAX_GAP, which is defined proportionally, so scales better and always
leaves us with both _some_ stack space and some room for mmap.
This fixes the usercopy KUnit test suite on 32-bit arm, as it doesn't set
any personality flags so gets the default (in this case 26-bit) task size.
This test can be run with: ./tools/testing/kunit/kunit.py run --arch arm
usercopy --make_options LLVM=1
Link: https://lkml.kernel.org/r/20240803074642.1849623-2-davidgow@google.com Fixes: dba79c3df4a2 ("arm: use generic mmap top-down layout and brk randomization") Signed-off-by: David Gow <davidgow@google.com> Reviewed-by: Kees Cook <kees@kernel.org> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ftrace_return_address() is called extremely often from
performance-critical code paths when debugging features like
CONFIG_TRACE_IRQFLAGS are enabled. For example, with debug_defconfig,
ftrace selftests on my LPAR currently execute ftrace_return_address()
as follows:
ftrace_return_address(0) - 0 times (common code uses __builtin_return_address(0) instead)
ftrace_return_address(1) - 2,986,805,401 times (with this patch applied)
ftrace_return_address(2) - 140 times
ftrace_return_address(>2) - 0 times
The use of __builtin_return_address(n) was replaced by return_address()
with an unwinder call by commit cae74ba8c295 ("s390/ftrace:
Use unwinder instead of __builtin_return_address()") because
__builtin_return_address(n) simply walks the stack backchain and doesn't
check for reaching the stack top. For shallow stacks with fewer than
"n" frames, this results in reads at low addresses and random
memory accesses.
While calling the fully functional unwinder "works", it is very slow
for this purpose. Moreover, potentially following stack switches and
walking past IRQ context is simply wrong thing to do for
ftrace_return_address().
Reimplement return_address() to essentially be __builtin_return_address(n)
with checks for reaching the stack top. Since the ftrace_return_address(n)
argument is always a constant, keep the implementation in the header,
allowing both GCC and Clang to unroll the loop and optimize it to the
bare minimum.
Batch the HVO work, including de-HVO of the source and HVO of the
destination hugeTLB folios, to speed up demotion.
After commit bd225530a4c7 ("mm/hugetlb_vmemmap: fix race with speculative
PFN walkers"), each request of HVO or de-HVO, batched or not, invokes
synchronize_rcu() once. For example, when not batched, demoting one 1GB
hugeTLB folio to 512 2MB hugeTLB folios invokes synchronize_rcu() 513
times (1 de-HVO plus 512 HVO requests), whereas when batched, only twice
(1 de-HVO plus 1 HVO request). And the performance difference between the
two cases is significant, e.g.,
echo 2048kB >/sys/kernel/mm/hugepages/hugepages-1048576kB/demote_size
time echo 100 >/sys/kernel/mm/hugepages/hugepages-1048576kB/demote
Before this patch:
real 8m58.158s
user 0m0.009s
sys 0m5.900s
After this patch:
real 0m0.900s
user 0m0.000s
sys 0m0.851s
Note that this patch changes the behavior of the `demote` interface when
de-HVO fails. Before, the interface aborts immediately upon failure; now,
it tries to finish an entire batch, meaning it can make extra progress if
the rest of the batch contains folios that do not need to de-HVO.
Link: https://lkml.kernel.org/r/20240812224823.3914837-1-yuzhao@google.com Fixes: bd225530a4c7 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers") Signed-off-by: Yu Zhao <yuzhao@google.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If exfat_load_upcase_table reaches end and returns -EINVAL,
allocated memory doesn't get freed and while
exfat_load_default_upcase_table allocates more memory, leading to a
memory leak.
Here's link to syzkaller crash report illustrating this issue:
https://syzkaller.appspot.com/text?tag=CrashReport&x=1406c201980000
Reported-by: syzbot+e1c69cadec0f1a078e3d@syzkaller.appspotmail.com Fixes: a13d1a4de3b0 ("exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper") Cc: stable@vger.kernel.org Signed-off-by: Daniel Yang <danielyangkang@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is no above issue on x86 due to the generated section flag is only
"A" (allocatable). In order to silence the warning on LoongArch, specify
the attribute like ".rodata..c_jump_table,\"a\",@progbits #" explicitly,
then the section attribute of ".rodata..c_jump_table" must be readonly
in the kernel/bpf/core.o file.
By the way, AFAICT, maybe the root cause is related with the different
compiler behavior of various archs, so to some extent this change is a
workaround for LoongArch, and also there is no effect for x86 which is the
only port supported by objtool before LoongArch with this patch.
Link: https://lkml.kernel.org/r/20240924062710.1243-1-yangtiezhu@loongson.cn Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> [6.9+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Maxim Suhanov reported that dm-verity doesn't crash if an I/O error
happens. In theory, this could be used to subvert security, because an
attacker can create sectors that return error with the Write Uncorrectable
command. Some programs may misbehave if they have to deal with EIO.
This commit fixes dm-verity, so that if "panic_on_corruption" or
"restart_on_corruption" was specified and an I/O error happens, the
machine will panic or restart.
This commit also changes kernel_restart to emergency_restart -
kernel_restart calls reboot notifiers and these reboot notifiers may wait
for the bio that failed. emergency_restart doesn't call the notifiers.
Move management of the sock->sk_security blob out
of the individual security modules and into the security
infrastructure. Instead of allocating the blobs from within
the modules the modules tell the infrastructure how much
space is required, and the space is allocated there.
Acked-by: Paul Moore <paul@paul-moore.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: John Johansen <john.johansen@canonical.com> Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subject tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>
Stable-dep-of: 63dff3e48871 ("lsm: add the inode_free_security_rcu() LSM implementation hook") Signed-off-by: Sasha Levin <sashal@kernel.org>
Add sanity checks for new trip temperature and hysteresis values to
trip_point_temp_store() and trip_point_hyst_store() to prevent trip
point threshold from falling below THERMAL_TEMP_INVALID.
However, still allow user space to pass THERMAL_TEMP_INVALID as the
new trip temperature value to invalidate the trip if necessary.
Also allow the hysteresis to be updated when the temperature is invalid
to allow user space to avoid having to adjust hysteresis after a valid
temperature has been set, but in that case just change the value and do
nothing else.
Fixes: be0a3600aa1e ("thermal: sysfs: Rework the handling of trip point updates") Cc: 6.8+ <stable@vger.kernel.org> # 6.8+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://patch.msgid.link/12528772.O9o76ZdvQC@rjwysocki.net Signed-off-by: Sasha Levin <sashal@kernel.org>
Change trip_point_hyst_store() and replace thermal_zone_trip_updated()
with thermal_zone_set_trip_hyst() to follow the trip_point_temp_store()
code pattern for more consistency.
The _store() and _show() functions for sysfs attributes corresponding
to trip point parameters (type, temperature and hysteresis) read the
trip ID from the attribute name and then use the trip ID as the index
in the given thermal zone's trips table to get to the trip object they
want.
Instead of doing this, make them use the attribute pointer they get
as the second argument to get to the trip object embedded in the same
struct thermal_trip_desc as the struct device_attribute pointed to by
it, which is much more straightforward and less overhead.
Instead of allocating memory for trip point sysfs attributes separately,
store them in struct thermal_trip_desc for each trip individually which
allows a few extra memory allocations to be avoided.
The Qualcomm serial console implementation is broken and can lose
characters when the serial port is also used for tty output.
Specifically, the console code only waits for the current tx command to
complete when all data has already been written to the fifo. When there
are on-going longer transfers this often means that console output is
lost when the console code inadvertently "hijacks" the current tx
command instead of starting a new one.
This can, for example, be observed during boot when console output that
should have been interspersed with init output is truncated:
Add a new state variable to track how much data has been written to the
fifo and use it to determine when the fifo and shift register are both
empty. This is needed since there is currently no other known way to
determine when the shift register is empty.
This in turn allows the console code to interrupt long transfers without
losing data.
Note that the oops-in-progress case is similarly broken as it does not
cancel any active command and also waits for the wrong status flag when
attempting to drain the fifo (TX_FIFO_NOT_EMPTY_EN is only set when
cancelling a command leaves data in the fifo).
Fixes: c4f528795d1a ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP") Fixes: a1fee899e5be ("tty: serial: qcom_geni_serial: Fix softlock") Fixes: 9e957a155005 ("serial: qcom-geni: Don't cancel/abort if we can't get the port lock") Cc: stable@vger.kernel.org # 4.17 Reviewed-by: Douglas Anderson <dianders@chromium.org> Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Link: https://lore.kernel.org/r/20240906131336.23625-7-johan+linaro@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
With a small modification the qcom_geni_serial_poll_bit() function
could be used to poll more than just a single bit. Let's generalize
it. We'll make the qcom_geni_serial_poll_bit() into just a wrapper of
the general function.
If the busy indicator is set, all other fields in CCI should be
clear according to the spec. However, some UCSI implementations do
not follow this rule and report bogus data in CCI along with the
busy indicator. Ignore the contents of CCI if the busy indicator is
set.
If a command timeout is hit it is possible that the EVENT_PENDING
bit is cleared while connector work is still scheduled which can
cause the EVENT_PENDING bit to go out of sync with scheduled connector
work. Check and set the EVENT_PENDING bit on entry to
ucsi_handle_connector_change() to fix this.
Finally, check UCSI_CCI_BUSY before the return code of ->sync_control.
This ensures that the command is cancelled even if ->sync_control
returns an error (most likely -ETIMEDOUT).
Reported-by: Anurag Bijea <icaliberdev@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219108 Bisected-by: Christian Heusel <christian@heusel.eu> Tested-by: Anurag Bijea <icaliberdev@gmail.com> Fixes: de52aca4d9d5 ("usb: typec: ucsi: Never send a lone connector change ack") Cc: stable@vger.kernel.org Signed-off-by: Christian A. Ehrhardt <lk@c--e.de> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20240912074132.722855-1-lk@c--e.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The command cancellation can be done right after detecting
that the PPM is busy. There is no need to do it separately
in ucsi_read_error() and ucsi_send_command_common().
string.h tests for the macros NOLIBC_ARCH_HAS_$FUNC to use the
architecture-optimized function variants.
However if string.h is included before arch.h header then that check
does not work, leading to duplicate function definitions.
Fixes: 553845eebd60 ("tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`") Fixes: 12108aa8c1a1 ("tools/nolibc: x86-64: Use `rep stosb` for `memset()`") Cc: stable@vger.kernel.org Acked-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/r/20240725-arch-has-func-v1-1-5521ed354acd@weissschuh.net Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The property is "fsl,pins", not "fsl,pin". Wrong property means the pin
configuration was not applied. Fixes dtbs_check warnings:
imx6ull-seeed-npi-dev-board-emmc.dtb: pinctrl@20e0000: uart1grp: 'fsl,pins' is a required property
imx6ull-seeed-npi-dev-board-emmc.dtb: pinctrl@20e0000: uart1grp: 'fsl,pin' does not match any of the regexes: 'pinctrl-[0-9]+'
Cc: stable@vger.kernel.org Fixes: e3b5697195c8 ("ARM: dts: imx6ull: add seeed studio NPi dev board") Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Parthiban Nallathambi <parthiban@linumiz.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The property is "fsl,pins", not "fsl,pin". Wrong property means the pin
configuration was not applied. Fixes dtbs_check warnings:
imx6ul-geam.dtb: pinctrl@20e0000: tscgrp: 'fsl,pins' is a required property
imx6ul-geam.dtb: pinctrl@20e0000: tscgrp: 'fsl,pin' does not match any of the regexes: 'pinctrl-[0-9]+'
The flexspi on different SoCs may have different number of LUTs.
So involve lut_num in nxp_fspi_devtype_data to make distinguish.
This patch prepare for the adding of imx8ulp.
Fixes: ef89fd56bdfc ("arm64: dts: imx8ulp: add flexspi node") Cc: stable@kernel.org Signed-off-by: Haibo Chen <haibo.chen@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20240905094338.1986871-3-haibo.chen@nxp.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The LSM framework has an existing inode_free_security() hook which
is used by LSMs that manage state associated with an inode, but
due to the use of RCU to protect the inode, special care must be
taken to ensure that the LSMs do not fully release the inode state
until it is safe from a RCU perspective.
This patch implements a new inode_free_security_rcu() implementation
hook which is called when it is safe to free the LSM's internal inode
state. Unfortunately, this new hook does not have access to the inode
itself as it may already be released, so the existing
inode_free_security() hook is retained for those LSMs which require
access to the inode.
Cc: stable@vger.kernel.org Reported-by: syzbot+5446fbf332b0602ede0b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/00000000000076ba3b0617f65cc8@google.com Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When submitting more than 2^32 padata objects to padata_do_serial, the
current sorting implementation incorrectly sorts padata objects with
overflowed seq_nr, causing them to be placed before existing objects in
the reorder list. This leads to a deadlock in the serialization process
as padata_find_next cannot match padata->seq_nr and pd->processed
because the padata instance with overflowed seq_nr will be selected
next.
To fix this, we use an unsigned integer wrap around to correctly sort
padata objects in scenarios with integer overflow.
Fixes: bfde23ce200e ("padata: unbind parallel jobs from specific CPUs") Cc: <stable@vger.kernel.org> Co-developed-by: Christian Gafert <christian.gafert@rohde-schwarz.com> Signed-off-by: Christian Gafert <christian.gafert@rohde-schwarz.com> Co-developed-by: Max Ferger <max.ferger@rohde-schwarz.com> Signed-off-by: Max Ferger <max.ferger@rohde-schwarz.com> Signed-off-by: Van Giang Nguyen <vangiang.nguyen@rohde-schwarz.com> Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Two return statements in sbi_cpuidle_dt_init_states() did not drop the
OF node reference count. Solve the issue and simplify entire error
handling with scoped/cleanup.h.
Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver") Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://patch.msgid.link/20240820094023.61155-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Because flush sync_work may trigger mddev_suspend() if there are spares,
and this should never be done in IO path because mddev_suspend() is used
to wait for IO.
This problem is found by code review.
Fixes: bc08041b32ab ("md: suspend array in md_start_sync() if array need reconfiguration") Cc: stable@vger.kernel.org Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240801124746.242558-1-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The conversion of system address to physical memory address (as viewed by
the memory controller) by igen6_edac is incorrect when the system address
is above the TOM (Total amount Of populated physical Memory) for Elkhart
Lake and Ice Lake (Neural Network Processor). Fix this conversion.
Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC") Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/stable/20240814061011.43545-1-qiuxu.zhuo%40intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit c77e22834ae9 ("NFSv4: Fix a potential sleep while atomic in
nfs4_do_reclaim()") separate out the freeing of the state owners from
nfs4_purge_state_owners() and finish it outside the rcu lock.
However, the error path is omitted. As a result, the state owners in
"freeme" will not be released.
Fix it by adding freeing in the error path.
Fixes: c77e22834ae9 ("NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()") Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The fcntl's F_SETOWN command sets the process that handle SIGIO/SIGURG
for the related file descriptor. Before this change, the
file_set_fowner LSM hook was always called, ignoring the VFS logic which
may not actually change the process that handles SIGIO (e.g. TUN, TTY,
dnotify), nor update the related UID/EUID.
Moreover, because security_file_set_fowner() was called without lock
(e.g. f_owner.lock), concurrent F_SETOWN commands could result to a race
condition and inconsistent LSM states (e.g. SELinux's fown_sid) compared
to struct fown_struct's UID/EUID.
This change makes sure the LSM states are always in sync with the VFS
state by moving the security_file_set_fowner() call close to the
UID/EUID updates and using the same f_owner.lock .
Rename f_modown() to __f_setown() to simplify code.
Cc: stable@vger.kernel.org Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: Christian Brauner <brauner@kernel.org> Cc: James Morris <jmorris@namei.org> Cc: Jann Horn <jannh@google.com> Cc: Ondrej Mosnacek <omosnace@redhat.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Serge E. Hallyn <serge@hallyn.com> Cc: Stephen Smalley <stephen.smalley.work@gmail.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Mickaël Salaün <mic@digikod.net> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In netfs_init() or fscache_proc_init(), we create dentry under 'fs/netfs',
but in netfs_exit(), we only delete the proc entry of 'fs/netfs' without
deleting its subtree. This triggers the following WARNING:
Therefore use remove_proc_subtree() instead of remove_proc_entry() to
fix the above problem.
Fixes: 7eb5b3e3a0a5 ("netfs, fscache: Move /proc/fs/fscache to /proc/fs/netfs and put in a symlink") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240826113404.3214786-1-libaokun@huaweicloud.com Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Recently I noticed a bug[1] in btrfs, after digged it into
and I believe it'a race in vfs.
Let's assume there's a inode (ie ino 261) with i_count 1 is
called by iput(), and there's a concurrent thread calling
generic_shutdown_super().
cpu0: cpu1:
iput() // i_count is 1
->spin_lock(inode)
->dec i_count to 0
->iput_final() generic_shutdown_super()
->__inode_add_lru() ->evict_inodes()
// cause some reason[2] ->if (atomic_read(inode->i_count)) continue;
// return before // inode 261 passed the above check
// list_lru_add_obj() // and then schedule out
->spin_unlock()
// note here: the inode 261
// was still at sb list and hash list,
// and I_FREEING|I_WILL_FREE was not been set
btrfs_iget()
// after some function calls
->find_inode()
// found the above inode 261
->spin_lock(inode)
// check I_FREEING|I_WILL_FREE
// and passed
->__iget()
->spin_unlock(inode) // schedule back
->spin_lock(inode)
// check (I_NEW|I_FREEING|I_WILL_FREE) flags,
// passed and set I_FREEING
iput() ->spin_unlock(inode)
->spin_lock(inode) ->evict()
// dec i_count to 0
->iput_final()
->spin_unlock()
->evict()
Now, we have two threads simultaneously evicting
the same inode, which may trigger the BUG(inode->i_state & I_CLEAR)
statement both within clear_inode() and iput().
To fix the bug, recheck the inode->i_count after holding i_lock.
Because in the most scenarios, the first check is valid, and
the overhead of spin_lock() can be reduced.
If there is any misunderstanding, please let me know, thanks.
[1]: https://lore.kernel.org/linux-btrfs/000000000000eabe1d0619c48986@google.com/
[2]: The reason might be 1. SB_ACTIVE was removed or 2. mapping_shrinkable()
return false when I reproduced the bug.
Reported-by: syzbot+67ba3c42bcbb4665d3ad@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=67ba3c42bcbb4665d3ad CC: stable@vger.kernel.org Fixes: 63997e98a3be ("split invalidate_inodes()") Signed-off-by: Julian Sun <sunjunchao2870@gmail.com> Link: https://lore.kernel.org/r/20240823130730.658881-1-sunjunchao2870@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
All batches of the Pine64 Pinebook Pro, except the latest batch (as of 2024)
whose hardware design was revised due to the component shortage, use a 1S
lithium battery whose nominal/design capacity is 10,000 mAh, according to the
battery datasheet. [1][2] Let's correct the design full-charge value in the
Pinebook Pro board dts, to improve the accuracy of the hardware description,
and to hopefully improve the accuracy of the fuel gauge a bit on all units
that don't belong to the latest batch.
The above-mentioned latest batch uses a different 1S lithium battery with
a slightly lower capacity, more precisely 9,600 mAh. To make the fuel gauge
work reliably on the latest batch, a sample battery would need to be sent to
CellWise, to obtain its proprietary battery profile, whose data goes into
"cellwise,battery-profile" in the Pinebook Pro board dts. Without that data,
the fuel gauge reportedly works unreliably, so changing the design capacity
won't have any negative effects on the already unreliable operation of the
fuel gauge in the Pinebook Pros that belong to the latest batch.
According to the battery datasheet, its voltage can go as low as 2.75 V while
discharging, but it's better to leave the current 3.0 V value in the dts file,
because of the associated Pinebook Pro's voltage regulation issues.
Increase the frequency of the PWM signal that drives the LED backlight of
the Pinebook Pro's panel, from about 1.35 KHz (which equals to the PWM
period of 740,740 ns), to exactly 8 kHz (which equals to the PWM period of
125,000 ns). Using a higher PWM frequency for the panel backlight, which
reduces the flicker, can only be beneficial to the end users' eyes.
On top of that, increasing the backlight PWM signal frequency reportedly
eliminates the buzzing emitted from the Pinebook Pro's built-in speakers
when certain backlight levels are set, which cause some weird interference
with some of the components of the Pinebook Pro's audio chain.
The old value for the backlight PWM period, i.e. 740,740 ns, is pretty much
an arbitrary value that was selected during the very early bring-up of the
Pinebook Pro, only because that value seemed to minimize horizontal line
distortion on the display, which resulted from the old X.org drivers causing
screen tearing when dragging windows around. That's no longer an issue, so
there are no reasons to stick with the old PWM period value.
The lower and the upper backlight PWM frequency limits for the Pinebook Pro's
panel, according to its datasheet, are 200 Hz and 10 kHz, respectively. [1]
These changes still leave some headroom, which may have some positive effects
on the lifetime expectancy of the panel's backlight LEDs.
The DPI display interface feeds the external display pipeline. However
the pipeline representation is currently incomplete. Efforts are still
under way to come up with a way to represent the "creative" repurposing
of the DP bridge chip's internal output mux, which is meant to support
USB type-C orientation changes, to output to one of two type-C ports.
Until that is finalized, the external display can't be fully described,
and thus won't work. Even worse, the half complete graph potentially
confuses the OS, breaking the internal display as well.
Disable the external display interface across the whole Corsola family
until the DP / USB Type-C muxing graph binding is ready.
Add explicit casting to prevent expantion of 32th bit of
u32 into highest half of u64 in several places.
For example, in inject_abt64:
ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT = 0x24 << 26.
This operation's result is int with 1 in 32th bit.
While casting this value into u64 (esr is u64) 1
fills 32 highest bits.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Some f2fs ioctl interfaces like f2fs_ioc_set_pin_file(),
f2fs_move_file_range(), and f2fs_defragment_range() missed to
check atomic_write status, which may cause potential race issue,
fix it.
As the helper function f2fs_bdev_support_discard() shows, f2fs checks if
the target block devices support discard by calling
bdev_max_discard_sectors() and bdev_is_zoned(). This check works well
for most cases, but it does not work for conventional zones on zoned
block devices. F2fs assumes that zoned block devices support discard,
and calls __submit_discard_cmd(). When __submit_discard_cmd() is called
for sequential write required zones, it works fine since
__submit_discard_cmd() issues zone reset commands instead of discard
commands. However, when __submit_discard_cmd() is called for
conventional zones, __blkdev_issue_discard() is called even when the
devices do not support discard.
The inappropriate __blkdev_issue_discard() call was not a problem before
the commit 30f1e7241422 ("block: move discard checks into the ioctl
handler") because __blkdev_issue_discard() checked if the target devices
support discard or not. If not, it returned EOPNOTSUPP. After the
commit, __blkdev_issue_discard() no longer checks it. It always returns
zero and sets NULL to the given bio pointer. This NULL pointer triggers
f2fs_bug_on() in __submit_discard_cmd(). The BUG is recreated with the
commands below at the umount step, where /dev/nullb0 is a zoned null_blk
with 5GB total size, 128MB zone size and 10 conventional zones.
$ mkfs.f2fs -f -m /dev/nullb0
$ mount /dev/nullb0 /mnt
$ for ((i=0;i<5;i++)); do dd if=/dev/zero of=/mnt/test bs=65536 count=1600 conv=fsync; done
$ umount /mnt
To fix the BUG, avoid the inappropriate __blkdev_issue_discard() call.
When discard is requested for conventional zones, check if the device
supports discard or not. If not, return EOPNOTSUPP.
Fixes: 30f1e7241422 ("block: move discard checks into the ioctl handler") Cc: stable@vger.kernel.org Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The F2FS ioctls for starting and committing atomic writes check for
inode_owner_or_capable(), but this does not give LSMs like SELinux or
Landlock an opportunity to deny the write access - if the caller's FSUID
matches the inode's UID, inode_owner_or_capable() immediately returns true.
There are scenarios where LSMs want to deny a process the ability to write
particular files, even files that the FSUID of the process owns; but this
can currently partially be bypassed using atomic write ioctls in two ways:
- F2FS_IOC_START_ATOMIC_REPLACE + F2FS_IOC_COMMIT_ATOMIC_WRITE can
truncate an inode to size 0
- F2FS_IOC_START_ATOMIC_WRITE + F2FS_IOC_ABORT_ATOMIC_WRITE can revert
changes another process concurrently made to a file
Fix it by requiring FMODE_WRITE for these operations, just like for
F2FS_IOC_MOVE_RANGE. Since any legitimate caller should only be using these
ioctls when intending to write into the file, that seems unlikely to break
anything.
Fixes: 88b88a667971 ("f2fs: support atomic writes") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The result of multiplication between values derived from functions
dir_buckets() and bucket_blocks() *could* technically reach
2^30 * 2^2 = 2^32.
While unlikely to happen, it is prudent to ensure that it will not
lead to integer overflow. Thus, use mul_u32_u32() as it's more
appropriate to mitigate the issue.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
When dealing with large extents and calculating file offsets by
summing up according extent offsets and lengths of unsigned int type,
one may encounter possible integer overflow if the values are
big enough.
Prevent this from happening by expanding one of the addends to
(pgoff_t) type.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
It should wait all existing dio write IOs before block removal,
otherwise, previous direct write IO may overwrite data in the
block which may be reused by other inode.
[BUG]
There are some reports about invalid data backref objectids, the report
looks like this:
BTRFS critical (device sda): corrupt leaf: block=333654787489792 slot=110 extent bytenr=333413935558656 len=65536 invalid data ref objectid value 2543
The data ref objectid is the inode number inside the subvolume.
But in above case, the value is completely sane, not really showing the
problem.
[CAUSE]
The root cause of the problem is the deprecated feature, inode cache.
This feature results a special inode number, -12ULL, and it's no longer
recognized by tree-checker, triggering the error.
The direct problem here is the output of data ref objectid. The value
shown is in fact the dref_root (subvolume id), not the dref_objectid
(inode number).
[FIX]
Fix the output to use dref_objectid instead.
When doing concurrent lseek(2) system calls against the same file
descriptor, using multiple threads belonging to the same process, we have
a short time window where a race happens and can result in a memory leak.
The race happens like this:
1) A program opens a file descriptor for a file and then spawns two
threads (with the pthreads library for example), lets call them
task A and task B;
2) Task A calls lseek with SEEK_DATA or SEEK_HOLE and ends up at
file.c:find_desired_extent() while holding a read lock on the inode;
3) At the start of find_desired_extent(), it extracts the file's
private_data pointer into a local variable named 'private', which has
a value of NULL;
4) Task B also calls lseek with SEEK_DATA or SEEK_HOLE, locks the inode
in shared mode and enters file.c:find_desired_extent(), where it also
extracts file->private_data into its local variable 'private', which
has a NULL value;
5) Because it saw a NULL file private, task A allocates a private
structure and assigns to the file structure;
6) Task B also saw a NULL file private so it also allocates its own file
private and then assigns it to the same file structure, since both
tasks are using the same file descriptor.
At this point we leak the private structure allocated by task A.
Besides the memory leak, there's also the detail that both tasks end up
using the same cached state record in the private structure (struct
btrfs_file_private::llseek_cached_state), which can result in a
use-after-free problem since one task can free it while the other is
still using it (only one task took a reference count on it). Also, sharing
the cached state is not a good idea since it could result in incorrect
results in the future - right now it should not be a problem because it
end ups being used only in extent-io-tree.c:count_range_bits() where we do
range validation before using the cached state.
Fix this by protecting the private assignment and check of a file while
holding the inode's spinlock and keep track of the task that allocated
the private, so that it's used only by that task in order to prevent
user-after-free issues with the cached state record as well as potentially
using it incorrectly in the future.
Fixes: 3c32c7212f16 ("btrfs: use cached state when looking for delalloc ranges with lseek") CC: stable@vger.kernel.org # 6.6+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fill_pool() uses 'obj_pool_min_free' to decide whether objects should be
handed back to the kmem cache. But 'obj_pool_min_free' records the lowest
historical value of the number of objects in the object pool and not the
minimum number of objects which should be kept in the pool.
Use 'debug_objects_pool_min_level' instead, which holds the minimum number
which was scaled to the number of CPUs at boot time.
[ tglx: Massage change log ]
Fixes: d26bf5056fc0 ("debugobjects: Reduce number of pool_lock acquisitions in fill_pool()") Fixes: 36c4ead6f6df ("debugobjects: Add global free list and the counter") Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20240904133944.2124-3-thunder.leizhen@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After its conversion to the new mount API, debugfs displays "none" in
/proc/mounts instead of the actual source. Fix this by recognising its
"source" mount option.
Signed-off-by: Marc Aurèle La France <tsi@tuyoix.net> Link: https://lore.kernel.org/r/e439fae2-01da-234b-75b9-2a7951671e27@tuyoix.net Fixes: a20971c18752 ("vfs: Convert debugfs to use the new mount API") Cc: stable@vger.kernel.org # 6.10.x: 49abee5991e1: debugfs: Convert to new uid/gid option parsing helpers Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In multi-channel scenarios, the granted channel must be aborted before
suspending. Otherwise, the firmware will be put into a wrong state,
resulting in an association failure after resuming.
With this patch, the granted channel will be aborted before suspending
if necessary.
Cc: stable@vger.kernel.org Fixes: c948b5da6bbe ("wifi: mt76: mt7925: add Mediatek Wi-Fi7 driver for mt7925 chips") Signed-off-by: Michael Lo <michael.lo@mediatek.com> Signed-off-by: Ming Yen Hsieh <mingyen.hsieh@mediatek.com> Link: https://patch.msgid.link/20240902090054.15806-1-mingyen.hsieh@mediatek.com Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
"iw dev wlp2s0 station dump" shows incorrect rx bitrate:
tx bitrate: 866.7 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 2
rx bitrate: 86.7 MBit/s VHT-MCS 9 VHT-NSS 1
This is because the RX band width is calculated incorrectly. Fix the
calculation according to the phydm_rxsc_2_bw() function from the
official drivers.
After:
tx bitrate: 866.7 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 2
rx bitrate: 390.0 MBit/s VHT-MCS 9 80MHz VHT-NSS 1
It also works correctly with the AP configured for 20 MHz and 40 MHz.
Remove VID/PID 0bda:c82c as it was inadvertently added to the device
list in driver rtw8821cu. This VID/PID is for the rtw8822cu device
and it is already in the appropriate place for that device.
netif_txq_maybe_stop() returns -1, 0, or 1, while
idpf_tx_maybe_stop_common() says it returns 0 or -EBUSY. As a result,
there sometimes are Tx queue timeout warnings despite that the queue
is empty or there is at least enough space to restart it.
Make idpf_tx_maybe_stop_common() inline and returning true or false,
handling the return of netif_txq_maybe_stop() properly. Use a correct
goto in idpf_tx_maybe_stop_splitq() to avoid stopping the queue or
incrementing the stops counter twice.
Fixes: 6818c4d5b3c2 ("idpf: add splitq start_xmit") Fixes: a5ab9ee0df0b ("idpf: add singleq start_xmit and napi poll") Cc: stable@vger.kernel.org # 6.7+ Signed-off-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>