In a review discussion of the changes to support vCPU hotplug where
a check was added on the GICC being enabled if was online, it was
noted that there is need to map back to the cpu and use that to index
into a cpumask. As such, a valid ID is needed.
If an MPIDR check fails in acpi_map_gic_cpu_interface() it is possible
for the entry in cpu_madt_gicc[cpu] == NULL. This function would
then cause a NULL pointer dereference. Whilst a path to trigger
this has not been established, harden this caller against the
possibility.
If acpi_processor_get_info() returned an error, pr and the associated
pr->throttling.shared_cpu_map were leaked.
The unwind code was in the wrong order wrt to setup, relying on
some unwind actions having no affect (clearing variables that were
never set etc). That makes it harder to reason about so reorder
and add appropriate labels to only undo what was actually set up
in the first place.
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240529133446.28446-6-Jonathan.Cameron@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Rafael observed [1] that returning 0 from processor_add() will result in
acpi_default_enumeration() being called which will attempt to create a
platform device, but that makes little sense when the processor is known
to be not available. So just return the error code from acpi_processor_get_info()
instead.
The superblock buffers of nilfs2 can not only be overwritten at runtime
for modifications/repairs, but they are also regularly swapped, replaced
during resizing, and even abandoned when degrading to one side due to
backing device issues. So, accessing them requires mutual exclusion using
the reader/writer semaphore "nilfs->ns_sem".
Some sysfs attribute show methods read this superblock buffer without the
necessary mutual exclusion, which can cause problems with pointer
dereferencing and memory access, so fix it.
Ole reported that event->mmap_mutex is strictly insufficient to
serialize the AUX buffer, add a per RB mutex to fully serialize it.
Note that in the lock order comment the perf_event::mmap_mutex order
was already wrong, that is, it nesting under mmap_lock is not new with
this patch.
Fixes: 45bfb2e50471 ("perf: Add AUX area to ring buffer for raw data streams") Reported-by: Ole <ole@binarygecko.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drivers/clocksource/timer-of.c: In function ‘timer_of_irq_exit’:
drivers/clocksource/timer-of.c:29:46: error: passing argument 2 of
‘free_percpu_irq’ from pointer to non-enclosed address space
29 | free_percpu_irq(of_irq->irq, clkevt);
| ^~~~~~
In file included from drivers/clocksource/timer-of.c:8:
./include/linux/interrupt.h:201:43: note: expected ‘__seg_gs void *’
but argument is of type ‘struct clock_event_device *’
201 | extern void free_percpu_irq(unsigned int, void __percpu *);
| ^~~~~~~~~~~~~~~
drivers/clocksource/timer-of.c: In function ‘timer_of_irq_init’:
drivers/clocksource/timer-of.c:74:51: error: passing argument 4 of
‘request_percpu_irq’ from pointer to non-enclosed address space
74 | np->full_name, clkevt) :
| ^~~~~~
./include/linux/interrupt.h:190:56: note: expected ‘__seg_gs void *’
but argument is of type ‘struct clock_event_device *’
190 | const char *devname, void __percpu *percpu_dev_id)
It appears the code is incorrect as reported by Uros Bizjak:
"The referred code is questionable as it tries to reuse
the clkevent pointer once as percpu pointer and once as generic
pointer, which should be avoided."
This change removes the percpu related code as no drivers is using it.
The value written into the TPM CnV can only be updated into the hardware
when the counter increases. Additional writes to the CnV write buffer are
ignored until the register has been updated. Therefore, we need to check
if the CnV has been updated before continuing. This may require waiting for
1 counter cycle in the worst case.
Cc: stable@vger.kernel.org Fixes: 059ab7b82eec ("clocksource/drivers/imx-tpm: Add imx tpm timer support") Signed-off-by: Jacky Bai <ping.bai@nxp.com> Reviewed-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Ye Li <ye.li@nxp.com> Reviewed-by: Jason Liu <jason.hui.liu@nxp.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20240725193355.1436005-2-Frank.Li@nxp.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In tpm_set_next_event(delta), return -ETIME by wrong cast to int when delta
is larger than INT_MAX.
For example:
tpm_set_next_event(delta = 0xffff_fffe)
{
...
next = tpm_read_counter(); // assume next is 0x10
next += delta; // next will 0xffff_fffe + 0x10 = 0x1_0000_000e
now = tpm_read_counter(); // now is 0x10
...
return (int)(next - now) <= 0 ? -ETIME : 0;
^^^^^^^^^^
0x1_0000_000e - 0x10 = 0xffff_fffe, which is -2 when
cast to int. So return -ETIME.
}
To fix this, introduce a 'prev' variable and check if 'now - prev' is
larger than delta.
Cc: stable@vger.kernel.org Fixes: 059ab7b82eec ("clocksource/drivers/imx-tpm: Add imx tpm timer support") Signed-off-by: Jacky Bai <ping.bai@nxp.com> Reviewed-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Ye Li <ye.li@nxp.com> Reviewed-by: Jason Liu <jason.hui.liu@nxp.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20240725193355.1436005-1-Frank.Li@nxp.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When removing a resource from vmci_resource_table in
vmci_resource_remove(), the search is performed using the resource
handle by comparing context and resource fields.
It is possible though to create two resources with different types
but same handle (same context and resource fields).
When trying to remove one of the resources, vmci_resource_remove()
may not remove the intended one, but the object will still be freed
as in the case of the datagram type in vmci_datagram_destroy_handle().
vmci_resource_table will still hold a pointer to this freed resource
leading to a use-after-free vulnerability.
Rescind offer handling relies on rescind callbacks for some of the
resources cleanup, if they are registered. It does not unregister
vmbus device for the primary channel closure, when callback is
registered. Without it, next onoffer does not come, rescind flag
remains set and device goes to unusable state.
Add logic to unregister vmbus for the primary channel in rescind callback
to ensure channel removal and relid release, and to ensure that next
onoffer can be received and handled properly.
For primary VM Bus channels, primary_channel pointer is always NULL. This
pointer is valid only for the secondary channels. Also, rescind callback
is meant for primary channels only.
Fix NULL pointer dereference by retrieving the device_obj from the parent
for the primary channel.
Binder objects are processed and copied individually into the target
buffer during transactions. Any raw data in-between these objects is
copied as well. However, this raw data copy lacks an out-of-bounds
check. If the raw data exceeds the data section size then the copy
overwrites the offsets section. This eventually triggers an error that
attempts to unwind the processed objects. However, at this point the
offsets used to index these objects are now corrupted.
Unwinding with corrupted offsets can result in decrements of arbitrary
nodes and lead to their premature release. Other users of such nodes are
left with a dangling pointer triggering a use-after-free. This issue is
made evident by the following KASAN report (trimmed):
==================================================================
BUG: KASAN: slab-use-after-free in _raw_spin_lock+0xe4/0x19c
Write of size 4 at addr ffff47fc91598f04 by task binder-util/743
The ad7124_soft_reset() function has the assumption that the chip will
assert the "power-on reset" bit in the STATUS register after a software
reset without any delay. The POR bit =0 is used to check if the chip
initialization is done.
A chip ID mismatch probe error appears intermittently when the probe
continues too soon and the ID register does not contain the expected
value.
Fix by adding a 200us delay after the software reset command is issued.
Fixes: b3af341bbd96 ("iio: adc: Add ad7124 support") Signed-off-by: Dumitru Ceclan <dumitru.ceclan@analog.com> Reviewed-by: Nuno Sa <nuno.sa@analog.com> Link: https://patch.msgid.link/20240731-ad7124-fix-v1-1-46a76aa4b9be@analog.com Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the scale_type is IIO_VAL_INT_PLUS_MICRO or IIO_VAL_INT_PLUS_NANO
the scale passed as argument is only applied to the fractional part of
the value. Fix it by also multiplying the integer part by the scale
provided.
Fixes: 48e44ce0f881 ("iio:inkern: Add function to read the processed value") Signed-off-by: Matteo Martelli <matteomartelli3@gmail.com> Link: https://patch.msgid.link/20240730-iio-fix-scale-v1-1-6246638c8daa@gmail.com Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In ad9834_write_frequency() clk_get_rate() can return 0. In such case
ad9834_calc_freqreg() call will lead to division by zero. Checking
'if (fout > (clk_freq / 2))' doesn't protect in case of 'fout' is 0.
ad9834_write_frequency() is called from ad9834_write(), where fout is
taken from text buffer, which can contain any value.
Modify parameters checking.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 12b9d5bf76bf ("Staging: IIO: DDS: AD9833 / AD9834 driver") Suggested-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://patch.msgid.link/20240703154506.25584-1-amishin@t-argos.ru Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The overflow/underflow conditions in pata_macio_qc_prep() should never
happen. But if they do there's no need to kill the system entirely, a
WARN and failing the IO request should be sufficient and might allow the
system to keep running.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ 0.118053] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:283
Caused by get_c0_compare_int on secondary CPU.
We also skipped saving IRQ number to struct clock_event_device *cd as
it's never used by clockevent core, as per comments it's only meant
for "non CPU local devices".
Reported-by: Serge Semin <fancer.lancer@gmail.com> Closes: https://lore.kernel.org/linux-mips/6szkkqxpsw26zajwysdrwplpjvhl5abpnmxgu2xuj3dkzjnvsf@4daqrz4mf44k/ Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
If we need to increase the tree depth, allocate a new node, and then
race with another thread that increased the tree depth before us, we'll
still have a preallocated node that might be used later.
If we then use that node for a new non-root node, it'll still have a
pointer to the old root instead of being zeroed - fix this by zeroing it
in the cmpxchg failure path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Sasha Levin <sashal@kernel.org>
When of_irq_parse_raw() is invoked with a device address smaller than
the interrupt parent node (from #address-cells property), KASAN detects
the following out-of-bounds read when populating the initial match table
(dyndbg="func of_irq_parse_* +p"):
OF: of_irq_parse_one: dev=/soc@0/picasso/watchdog, index=0
OF: parent=/soc@0/pci@878000000000/gpio0@17,0, intsize=2
OF: intspec=4
OF: of_irq_parse_raw: ipar=/soc@0/pci@878000000000/gpio0@17,0, size=2
OF: -> addrsize=3
==================================================================
BUG: KASAN: slab-out-of-bounds in of_irq_parse_raw+0x2b8/0x8d0
Read of size 4 at addr ffffff81beca5608 by task bash/764
The buggy address belongs to the object at ffffff81beca5600
which belongs to the cache kmalloc-128 of size 128
The buggy address is located 8 bytes inside of
128-byte region [ffffff81beca5600, ffffff81beca5680)
Memory state around the buggy address: ffffff81beca5500: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffffff81beca5580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffffff81beca5600: 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^ ffffff81beca5680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffffff81beca5700: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc
==================================================================
OF: -> got it !
Prevent the out-of-bounds read by copying the device address into a
buffer of sufficient size.
Syzkiller reports a "KMSAN: uninit-value in pick_link" bug.
This is caused by an uninitialised page, which is ultimately caused
by a corrupted symbolic link size read from disk.
The reason why the corrupted symlink size causes an uninitialised
page is due to the following sequence of events:
1. squashfs_read_inode() is called to read the symbolic
link from disk. This assigns the corrupted value 3875536935 to inode->i_size.
2. Later squashfs_symlink_read_folio() is called, which assigns
this corrupted value to the length variable, which being a
signed int, overflows producing a negative number.
3. The following loop that fills in the page contents checks that
the copied bytes is less than length, which being negative means
the loop is skipped, producing an uninitialised page.
This patch adds a sanity check which checks that the symbolic
link size is not larger than expected.
ipheth_sndbulk_callback() can submit carrier_work
as a part of its error handling. That means that
the driver must make sure that the work is cancelled
after it has made sure that no more URB can terminate
with an error condition.
Hence the order of actions in ipheth_close() needs
to be inverted.
Signed-off-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: Foster Snowhill <forst@pen.gy> Tested-by: Georgi Valkov <gvalkov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
When exercising uinput interface syzkaller may try setting up device
with a really large number of slots, which causes memory allocation
failure in input_mt_init_slots(). While this allocation failure is
handled properly and request is rejected, it results in syzkaller
reports. Additionally, such request may put undue burden on the
system which will try to free a lot of memory for a bogus request.
Fix it by limiting allowed number of slots to 100. This can easily
be extended if we see devices that can track more than 100 contacts.
The .data.rel.ro and .got section were added between the rodata and
ro_after_init data section, which adds an RW mapping in between all RO
mapping of the kernel image:
Some arch + compiler combinations report a potentially unused variable
location in btrfs_lookup_dentry(). This is a false alert as the variable
is passed by value and always valid or there's an error. The compilers
cannot probably reason about that although btrfs_inode_by_name() is in
the same file.
> + /kisskb/src/fs/btrfs/inode.c: error: 'location.objectid' may be used
+uninitialized in this function [-Werror=maybe-uninitialized]: => 5603:9
> + /kisskb/src/fs/btrfs/inode.c: error: 'location.type' may be used
+uninitialized in this function [-Werror=maybe-uninitialized]: => 5674:5
Initialize it to zero, this should fix the warnings and won't change the
behaviour as btrfs_inode_by_name() accepts only a root or inode item
types, otherwise returns an error.
Even if a vgem device is configured in, we will skip the import_vgem_fd()
test almost every time.
TAP version 13
1..11
# Testing heap: system
# =======================================
# Testing allocation and importing:
ok 1 # SKIP Could not open vgem -1
The problem is that we use the DRM_IOCTL_VERSION ioctl to query the driver
version information but leave the name field a non-null-terminated string.
Terminate it properly to actually test against the vgem device.
While at it, let's check the length of the driver name is exactly 4 bytes
and return early otherwise (in case there is a name like "vgemfoo" that
gets converted to "vgem\0" unexpectedly).
The dpaa-eth driver is written for PowerPC and Arm SoCs which have 1-24
CPUs. It depends on CONFIG_NR_CPUS having a reasonably small value in
Kconfig. Otherwise, there are 2 functions which allocate on-stack arrays
of NR_CPUS elements, and these can quickly explode in size, leading to
warnings such as:
The problem is twofold:
- Reducing the array size to the boot-time num_possible_cpus() (rather
than the compile-time NR_CPUS) creates a variable-length array,
which should be avoided in the Linux kernel.
- Using NR_CPUS as an array size makes the driver blow up in stack
consumption with generic, as opposed to hand-crafted, .config files.
A simple solution is to use dynamic allocation for num_possible_cpus()
elements (aka a small number determined at runtime).
Where pci_reset_bus() users are triggering unlocked secondary bus resets.
Ironically pci_bus_reset(), several calls down from pci_reset_bus(), uses
pci_bus_lock() before issuing the reset which locks everything *but* the
bridge itself.
For the same motivation as adding:
bridge = pci_upstream_bridge(dev);
if (bridge)
pci_dev_lock(bridge);
to pci_reset_function() for the "bus" and "cxl_bus" reset cases, add
pci_dev_lock() for @bus->self to pci_bus_lock().
Link: https://lore.kernel.org/r/171711747501.1628941.15217746952476635316.stgit@dwillia2-xfh.jf.intel.com Reported-by: Imre Deak <imre.deak@intel.com> Closes: http://lore.kernel.org/r/6657833b3b5ae_14984b29437@dwillia2-xfh.jf.intel.com.notmuch Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
[bhelgaas: squash in recursive locking deadlock fix from Keith Busch:
https://lore.kernel.org/r/20240711193650.701834-1-kbusch@meta.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Hans de Goede <hdegoede@redhat.com> Tested-by: Kalle Valo <kvalo@kernel.org> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In reada we BUG_ON(refs == 0), which could be unkind since we aren't
holding a lock on the extent leaf and thus could get a transient
incorrect answer. In walk_down_proc we also BUG_ON(refs == 0), which
could happen if we have extent tree corruption. Change that to return
-EUCLEAN. In do_walk_down() we catch this case and handle it correctly,
however we return -EIO, which -EUCLEAN is a more appropriate error code.
Finally in walk_up_proc we have the same BUG_ON(refs == 0), so convert
that to proper error handling. Also adjust the error message so we can
actually do something with the information.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
We have a couple of areas where we check to make sure the tree block is
locked before looking up or messing with references. This is old code
so it has this as BUG_ON(). Convert this to ASSERT() for developers.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
For CONFIG_DEBUG_OBJECTS_WORK=y kernels sscs.work defined by
INIT_WORK_ONSTACK() is initialized by debug_object_init_on_stack() for
the debug check in __init_work() to work correctly.
But this lacks the counterpart to remove the tracked object from debug
objects again, which will cause a debug object warning once the stack is
freed.
Add the missing destroy_work_on_stack() invocation to cure that.
mwifiex_get_priv_by_id() returns the priv pointer corresponding to
the bss_num and bss_type, but without checking if the priv is actually
currently in use.
Unused priv pointers do not have a wiphy attached to them which can
lead to NULL pointer dereferences further down the callstack. Fix
this by returning only used priv pointers which have priv->bss_mode
set to something else than NL80211_IFTYPE_UNSPECIFIED.
Said NULL pointer dereference happened when an Accesspoint was started
with wpa_supplicant -i mlan0 with this config:
While calling bpf_object__find_map_by_name with obj = NULL is
obviously incorrect, this should not lead to a segmentation
fault but rather be handled gracefully.
As __bpf_map__iter already handles this situation correctly, we
can delegate the check for the regular case there and only add
a check in case the prev or next parameter is NULL.
DIV_ROUND_CLOSEST() after kstrtol() results in an underflow if a large
negative number such as -9223372036854775808 is provided by the user.
Fix it by reordering clamp_val() and DIV_ROUND_CLOSEST() operations.
DIV_ROUND_CLOSEST() after kstrtol() results in an underflow if a large
negative number such as -9223372036854775808 is provided by the user.
Fix it by reordering clamp_val() and DIV_ROUND_CLOSEST() operations.
DIV_ROUND_CLOSEST() after kstrtol() results in an underflow if a large
negative number such as -9223372036854775808 is provided by the user.
Fix it by reordering clamp_val() and DIV_ROUND_CLOSEST() operations.
DIV_ROUND_CLOSEST() after kstrtol() results in an underflow if a large
negative number such as -9223372036854775808 is provided by the user.
Fix it by reordering clamp_val() and DIV_ROUND_CLOSEST() operations.
The hotplug driver for powerpc (pci/hotplug/pnv_php.c) causes a kernel
crash when we try to hot-unplug/disable the PCIe switch/bridge from
the PHB.
The crash occurs because although the MSI data structure has been
released during disable/hot-unplug path and it has been assigned
with NULL, still during unregistration the code was again trying to
explicitly disable the MSI which causes the NULL pointer dereference and
kernel crash.
The patch fixes the check during unregistration path to prevent invoking
pci_disable_msi/msix() since its data structure is already freed.
The pointer isn't initialized by callers, but I have
encountered cases where it's still printed; initialize
it in all possible cases in setup_one_line().
The writing of css->cgroup associated with the cgroup root in
rebind_subsystems() is currently protected only by cgroup_mutex.
However, the reading of css->cgroup in both proc_cpuset_show() and
proc_cgroup_show() is protected just by css_set_lock. That makes the
readers susceptible to racing problems like data tearing or caching.
It is also a problem that can be reported by KCSAN.
This can be fixed by using READ_ONCE() and WRITE_ONCE() to access
css->cgroup. Alternatively, the writing of css->cgroup can be moved
under css_set_lock as well which is done by this patch.
Queued invalidation wait descriptor status is volatile in that IOMMU
hardware writes the data upon completion.
Use READ_ONCE() to prevent compiler optimizations which ensures memory
reads every time. As a side effect, READ_ONCE() also enforces strict
types and may add an extra instruction. But it should not have negative
performance impact since we use cpu_relax anyway and the extra time(by
adding an instruction) may allow IOMMU HW request cacheline ownership
easier.
e.g. gcc 12.3
BEFORE:
81 38 ad de 00 00 cmpl $0x2,(%rax)
AFTER (with READ_ONCE())
772f: 8b 00 mov (%rax),%eax
7731: 3d ad de 00 00 cmp $0x2,%eax
//status data is 32 bit
dm_parse_device_entry() simply copies the minor number into dmi.dev, but
the dev_t format splits the minor number between the lowest 8 bytes and
highest 12 bytes. If the minor number is larger than 255, part of it
will end up getting treated as the major number
Fix this by checking that the minor number is valid and then encoding it
as a dev_t.
When userspace wants to take over a fdb entry by setting it as
EXTERN_LEARNED, we set both flags BR_FDB_ADDED_BY_EXT_LEARN and
BR_FDB_ADDED_BY_USER in br_fdb_external_learn_add().
If the bridge updates the entry later because its port changed, we clear
the BR_FDB_ADDED_BY_EXT_LEARN flag, but leave the BR_FDB_ADDED_BY_USER
flag set.
If userspace then wants to take over the entry again,
br_fdb_external_learn_add() sees that BR_FDB_ADDED_BY_USER and skips
setting the BR_FDB_ADDED_BY_EXT_LEARN flags, thus silently ignores the
update.
Fix this by always allowing to set BR_FDB_ADDED_BY_EXT_LEARN regardless
if this was a user fdb entry or not.
Fixes: 710ae7287737 ("net: bridge: Mark FDB entries that were added by user as such") Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20240903081958.29951-1-jonas.gorski@bisdn.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
We observed a null-ptr-deref in fou_gro_receive() while shutting down
a host. [0]
The NULL pointer is sk->sk_user_data, and the offset 8 is of protocol
in struct fou.
When fou_release() is called due to netns dismantle or explicit tunnel
teardown, udp_tunnel_sock_release() sets NULL to sk->sk_user_data.
Then, the tunnel socket is destroyed after a single RCU grace period.
So, in-flight udp4_gro_receive() could find the socket and execute the
FOU GRO handler, where sk->sk_user_data could be NULL.
Let's use rcu_dereference_sk_user_data() in fou_from_sock() and add NULL
checks in FOU GRO handlers.
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Manually fix all net/usb drivers without separate maintainers.
v2: catc does DMA to the buffer, leave the conversion to Oliver
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: bab8eb0dd4cb ("usbnet: modern method to get random MAC") Signed-off-by: Sasha Levin <sashal@kernel.org>
strcpy() performs no bounds checking on the destination buffer. This
could result in linear overflows beyond the end of the buffer, leading
to all kinds of misbehaviors. The safe replacement is strscpy().
Signed-off-by: Len Baker <len.baker@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: bab8eb0dd4cb ("usbnet: modern method to get random MAC") Signed-off-by: Sasha Levin <sashal@kernel.org>
Call rtnl_unlock() on this error path, before returning.
Fixes: bc23aa949aeb ("igc: Add pcie error handler support") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
When we cork messages in psock->cork, the last message triggers the
flushing will result in sending a sk_msg larger than the current
message size. In this case, in tcp_bpf_send_verdict(), 'copied' becomes
negative at least in the following case:
Therefore, it could lead to the following BUG with a proper value of
'copied' (thanks to syzbot). We should not use negative 'copied' as a
return value here.
In case of error in build_tokens_sysfs(), all the memory that has been
allocated is freed at end of this function. But then free_group() is
called which performs memory deallocation again.
Also, instead of free_group() call, there should be exit_dell_smbios_smm()
and exit_dell_smbios_wmi() calls, since there is initialization, but there
is no release of resources in case of an error.
Fix these issues by replacing free_group() call with
exit_dell_smbios_wmi() and exit_dell_smbios_smm().
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 33b9ca1e53b4 ("platform/x86: dell-smbios: Add a sysfs interface for SMBIOS tokens") Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru> Link: https://lore.kernel.org/r/20240830065428.9544-1-amishin@t-argos.ru Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
82580 NICs have a hardware bug that makes it
necessary to write into the TSICR (TimeSync Interrupt Cause) register
to clear it:
https://lore.kernel.org/all/CDCB8BE0.1EC2C%25matthew.vick@intel.com/
Add a conditional so only for 82580 we write into the TSICR register,
so we don't risk losing events for other models.
Without this change, when running ptp4l with an Intel 82580 card,
I get the following output:
> timed out while polling for tx timestamp increasing tx_timestamp_timeout or
> increasing kworker priority may correct this issue, but a driver bug likely
> causes it
This goes away with this change.
This (partially) reverts commit ee14cc9ea19b ("igb: Fix missing time sync events").
Fixes: ee14cc9ea19b ("igb: Fix missing time sync events") Closes: https://lore.kernel.org/intel-wired-lan/CAN0jFd1kO0MMtOh8N2Ztxn6f7vvDKp2h507sMryobkBKe=xk=w@mail.gmail.com/ Tested-by: Daiwei Li <daiweili@google.com> Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Daiwei Li <daiweili@google.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Errata #i2037 in AM65x/DRA80xM Processors Silicon Revision 1.0
(SPRZ452D_July 2018_Revised December 2019 [1]) mentions when an
inbound PCIe TLP spans more than two internal AXI 128-byte bursts,
the bus may corrupt the packet payload and the corrupt data may
cause associated applications or the processor to hang.
The workaround for Errata #i2037 is to limit the maximum read
request size and maximum payload size to 128 bytes. Add workaround
for Errata #i2037 here.
The errata and workaround is applicable only to AM65x SR 1.0 and
later versions of the silicon will have this fixed.
Set the host status byte when a data completion error is encountered
otherwise the upper layer may end up using the invalid zero'ed data.
The following output was observed from scsi/sd.c prior to this fix.
After being asked about support for WPA3 for BCM43224 chipset it
was found that all it takes is setting the MFP_CAPABLE flag and
mac80211 will take care of all that is needed [1].
In several places a division by fmt->vdownsampling[p] was
missing in the sizeimage[p] calculation, causing incorrect
behavior for multiplanar formats were some planes are smaller
than the first plane.
Found by new v4l2-compliance tests.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Sasha Levin <sashal@kernel.org>
Avoid mounting filesystems where the partition would overflow the
32-bits used for block number. Also refuse to mount filesystems where
the partition length is so large we cannot safely index bits in a
block bitmap.
Now there is a issue is that code checks reports a warning: implicit
narrowing conversion from type 'unsigned int' to small type 'u8' (the
'keylen' variable). Fix it by removing the 'keylen' variable.
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The Allwinner H6 IOMMU has a bypass register, which allows to circumvent
the page tables for each possible master. The reset value for this
register is 0, which disables the bypass.
The Allwinner H616 IOMMU resets this register to 0x7f, which activates
the bypass for all masters, which is not what we want.
Always clear this register to 0, to enforce the usage of page tables,
and make this driver compatible with the H616 in this respect.
When (AF_UNIX, SOCK_STREAM) socket connect()s to a listening socket,
the listener's sk_peer_pid/sk_peer_cred are copied to the client in
copy_peercred().
Then, the client's sk_peer_pid and sk_peer_cred are always NULL, so
we need not call put_pid() and put_cred() there.
When a process accept()s connection from a unix socket
(either stream or seqpacket)
it gets the socket with the label of the connecting process.
For example, if a connecting process has a label 'foo',
the accept()ed socket will also have 'in' and 'out' labels 'foo',
regardless of the label of the listener process.
This is because kernel creates unix child sockets
in the context of the connecting process.
I do not see any obvious way for the listener to abuse
alien labels coming with the new socket, but,
to be on the safe side, it's better fix new socket labels.
Add a simple sanity check to HD-audio HDMI Channel Map controls.
Although the value might not be accepted for the actual connection, we
can filter out some bogus values beforehand, and that should be enough
for making kselftest happier.
The final bit of stats that is global is the rpc svc_stat. Move this
into the nfsd_net struct and use that everywhere instead of the global
struct. Remove the unused global struct.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We have a global set of counters that we modify for all of the nfsd
operations, but now that we're exposing these stats across all network
namespaces we need to make the stats also be per-network namespace. We
already have some caching stats that are per-network namespace, so move
these definitions into the same counter and then adjust all the helpers
and users of these stats to provide the appropriate nfsd_net struct so
that the stats are maintained for the per-network namespace objects.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
[ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We are running nfsd servers inside of containers with their own network
namespace, and we want to monitor these services using the stats found
in /proc. However these are not exposed in the proc inside of the
container, so we have to bind mount the host /proc into our containers
to get at this information.
Separate out the stat counters init and the proc registration, and move
the proc registration into the pernet operations entry and exit points
so that these stats can be exposed inside of network namespaces.
This is an intermediate step, this just exposes the global counters in
the network namespace. Subsequent patches will move these counters into
the per-network namespace container.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We're going to merge the stats all into per network namespace in
subsequent patches, rename these nn counters to be consistent with the
rest of the stats.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
nfsd is the only thing using this helper, and it doesn't use the private
currently. When we switch to per-network namespace stats we will need
the struct net * in order to get to the nfsd_net. Use the net as the
proc private so we can utilize this when we make the switch over.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
[ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since only one service actually reports the rpc stats there's not much
of a reason to have a pointer to it in the svc_program struct. Adjust
the svc_create_pooled function to take the sv_stats as an argument and
pass the struct through there as desired instead of getting it from the
svc_program->pg_stats.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
[ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A lot of places are setting a blank svc_stats in ->pg_stats and never
utilizing these stats. Remove all of these extra structs as we're not
reporting these stats anywhere.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We check for the existence of ->sv_stats elsewhere except in the core
processing code. It appears that only nfsd actual exports these values
anywhere, everybody else just has a write only copy of sv_stats in their
svc_program. Add a check for ->sv_stats before every adjustment to
allow us to eliminate the stats struct from all the users who don't
report the stats.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
[ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fs/nfsd/export.c: In function 'svc_export_parse':
fs/nfsd/export.c:737:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=]
737 | }
On my systems, svc_export_parse() has a stack frame of over 800
bytes, not 1040, but nonetheless, it could do with some reduction.
When a struct svc_export is on the stack, it's a temporary structure
used as an argument, and not visible as an actual exported FS. No
need to reserve space for export_stats in such cases.
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310012359.YEw5IrK6-lkp@intel.com/ Cc: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
Stable-dep-of: 4b14885411f7 ("nfsd: make all of the nfsd stats per-network namespace")
[ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In function ‘export_stats_init’,
inlined from ‘svc_export_alloc’ at fs/nfsd/export.c:866:6:
fs/nfsd/export.c:337:16: warning: ‘nfsd_percpu_counters_init’ accessing 40 bytes in a region of size 0 [-Wstringop-overflow=]
337 | return nfsd_percpu_counters_init(&stats->counter, EXP_STATS_COUNTERS_NUM);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/nfsd/export.c:337:16: note: referencing argument 1 of type ‘struct percpu_counter[0]’
fs/nfsd/stats.h: In function ‘svc_export_alloc’:
fs/nfsd/stats.h:40:5: note: in a call to function ‘nfsd_percpu_counters_init’
40 | int nfsd_percpu_counters_init(struct percpu_counter counters[], int num);
| ^~~~~~~~~~~~~~~~~~~~~~~~~
Cc: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
Stable-dep-of: 93483ac5fec6 ("nfsd: expose /proc/net/sunrpc/nfsd in net namespaces") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The error paths in nfsd_svc() are needlessly complex and can result in a
final call to svc_put() without nfsd_last_thread() being called. This
results in the listening sockets not being closed properly.
The per-netns setup provided by nfsd_startup_new() and removed by
nfsd_shutdown_net() is needed precisely when there are running threads.
So we don't need nfsd_up_before. We don't need to know if it *was* up.
We only need to know if any threads are left. If none are, then we must
call nfsd_shutdown_net(). But we don't need to do that explicitly as
nfsd_last_thread() does that for us.
So simply call nfsd_last_thread() before the last svc_put() if there are
no running threads. That will always do the right thing.
Also discard:
pr_info("nfsd: last server has exited, flushing export cache\n");
It may not be true if an attempt to start the first server failed, and
it isn't particularly helpful and it simply reports normal behaviour.
Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Avoid holding the bucket lock while freeing cache entries. This
change also caps the number of entries that are freed when the
shrinker calls to reduce the shrinker's impact on the cache's
effectiveness.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Enable nfsd_prune_bucket() to drop the bucket lock while calling
kfree(). Use the same pattern that Jeff recently introduced in the
NFSD filecache.
A few percpu operations are moved outside the lock since they
temporarily disable local IRQs which is expensive and does not
need to be done while the lock is held.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Stable-dep-of: c135e1269f34 ("NFSD: Refactor the duplicate reply cache shrinker") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For readability, rename to match the other helpers.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Stable-dep-of: 4b14885411f7 ("nfsd: make all of the nfsd stats per-network namespace") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>