If the i_mode field in inode of metadata files is corrupted on disk, it
can cause the initialization of bmap structure, which should have been
called from nilfs_read_inode_common(), not to be called. This causes a
lockdep warning followed by a NULL pointer dereference at
nilfs_bmap_lookup_at_level().
This patch fixes these issues by adding a missing sanitiy check for the
i_mode field of metadata file's inode.
Syzbot found an issue in usbmon module, where the user space client can
corrupt the monitor's internal memory, causing the usbmon module to
crash the kernel with segfault, UAF, etc.
The reproducer mmaps the /dev/usbmon memory to user space, and
overwrites it with arbitrary data, which causes all kinds of issues.
Return an -EPERM error from mon_bin_mmap() if the flag VM_WRTIE is set.
Also clear VM_MAYWRITE to make it impossible to change it to writable
later.
arch.tls_array is statically allocated so checking for NULL doesn't
make sense. This causes the compiler warning below.
Remove the checks to silence these warnings.
../arch/x86/um/tls_32.c: In function 'get_free_idx':
../arch/x86/um/tls_32.c:68:13: warning: the comparison will always evaluate as 'true' for the address of 'tls_array' will never be NULL [-Waddress]
68 | if (!t->arch.tls_array)
| ^
In file included from ../arch/x86/um/asm/processor.h:10,
from ../include/linux/rcupdate.h:30,
from ../include/linux/rculist.h:11,
from ../include/linux/pid.h:5,
from ../include/linux/sched.h:14,
from ../arch/x86/um/tls_32.c:7:
../arch/x86/um/asm/processor_32.h:22:31: note: 'tls_array' declared here
22 | struct uml_tls_struct tls_array[GDT_ENTRY_TLS_ENTRIES];
| ^~~~~~~~~
../arch/x86/um/tls_32.c: In function 'get_tls_entry':
../arch/x86/um/tls_32.c:243:13: warning: the comparison will always evaluate as 'true' for the address of 'tls_array' will never be NULL [-Waddress]
243 | if (!t->arch.tls_array)
| ^
../arch/x86/um/asm/processor_32.h:22:31: note: 'tls_array' declared here
22 | struct uml_tls_struct tls_array[GDT_ENTRY_TLS_ENTRIES];
| ^~~~~~~~~
Signed-off-by: Lukas Straub <lukasstraub2@web.de> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
There is uninit value bug in dgram_sendmsg function in
net/ieee802154/socket.c when the length of valid data pointed by the
msg->msg_name isn't verified.
We introducing a helper function ieee802154_sockaddr_check_size to
check namelen. First we check there is addr_type in ieee802154_addr_sa.
Then, we check namelen according to addr_type.
Also fixed in raw_bind, dgram_bind, dgram_connect.
Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
In alloc_inode, inode_init_always() could return -ENOMEM if
security_inode_alloc() fails, which causes inode->i_private
uninitialized. Then nilfs_is_metadata_file_inode() returns
true and nilfs_free_inode() wrongly calls nilfs_mdt_destroy(),
which frees the uninitialized inode->i_private
and leads to crashes(e.g., UAF/GPF).
Fix this by moving security_inode_alloc just prior to
this_cpu_inc(nr_inodes)
Link: https://lkml.kernel.org/r/CAFcO6XOcf1Jj2SeGt=jJV59wmhESeSKpfR0omdFRq+J9nD1vfQ@mail.gmail.com Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Reported-by: Hao Sun <sunhao.th@gmail.com> Reported-by: Jiacheng Xu <stitch@zju.edu.cn> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Naresh Kamboju recently reported that the function-graph tracer crashes
on ARM. The function-graph tracer assumes that the kernel is built with
frame pointers.
We explicitly disabled the function-graph tracer when building Thumb2,
since the Thumb2 ABI doesn't have frame pointers.
We recently changed the way the unwinder method was selected, which
seems to have made it more likely that we can end up with the function-
graph tracer enabled but without the kernel built with frame pointers.
Fix up the function graph tracer dependencies so the option is not
available when we have no possibility of having frame pointers, and
adjust the dependencies on the unwinder option to hide the non-frame
pointer unwinder options if the function-graph tracer is enabled.
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reported-by: Danilo Cezar Zanella <danilo.zanella@iag.usp.br> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We enable -Wcast-function-type globally in the kernel to warn about
mismatching types in function pointer casts. Compilers currently
warn only about ABI incompability with this flag, but Clang 16 will
enable a stricter version of the check by default that checks for an
exact type match. This will be very noisy in the kernel, so disable
-Wcast-function-type-strict without W=1 until the new warnings have
been addressed.
After commit 31fd9b79dc58 ("ARM: dts: BCM5301X: update CRU block
description") a warning from clk-iproc-pll.c was generated due to a
duplicate PLL name as well as the console stopped working. Upon closer
inspection it became clear that iproc_pll_clk_setup() used the Device
Tree node unit name as an unique identifier as well as a parent name to
parent all clocks under the PLL.
BCM5301X was the first platform on which that got noticed because of the
DT node unit name renaming but the same assumptions hold true for any
user of the iproc_pll_clk_setup() function.
The first 'clock-output-names' property is always guaranteed to be
unique as well as providing the actual desired PLL clock name, so we
utilize that to register the PLL and as a parent name of all children
clock.
There were a few fields in the iproc pll data structures that were
holding information that was not true state information.
Using stack variables is sufficient and simplifies the structure.
There are not any functional changes in this commit.
Signed-off-by: Lori Hikichi <lori.hikichi@broadcom.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Stable-dep-of: 1b24a132eba7 ("clk: iproc: Do not rely on node name for correct PLL setup") Signed-off-by: Sasha Levin <sashal@kernel.org>
The IOC_PR_CLEAR and IOC_PR_RELEASE ioctls are
non-functional on NVMe devices because the nvme_pr_clear()
and nvme_pr_release() functions set the IEKEY field incorrectly.
The IEKEY field should be set only when the key is zero (i.e,
not specified). The current code does it backwards.
Furthermore, the NVMe spec describes the persistent
reservation "clear" function as an option on the reservation
release command. The current implementation of nvme_pr_clear()
erroneously uses the reservation register command.
Fix these errors. Note that NVMe version 1.3 and later specify
that setting the IEKEY field will return an error of Invalid
Field in Command. The fix will set IEKEY when the key is zero,
which is appropriate as these ioctls consider a zero key to
be "unspecified", and the intention of the spec change is
to require a valid key.
Tested on a version 1.4 PCI NVMe device in an Azure VM.
Fixes: 1673f1f08c88 ("nvme: move block_device_operations and ns/ctrl freeing to common code") Fixes: 1d277a637a71 ("NVMe: Add persistent reservation ops") Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add a new line in functions nvme_pr_preempt(), nvme_pr_clear(), and
nvme_pr_release() after variable declaration which follows the rest of
the code in the nvme/host/core.c.
No functional change(s) in this patch.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Stable-dep-of: c292a337d0e4 ("nvme: Fix IOC_PR_CLEAR and IOC_PR_RELEASE ioctls for nvme devices") Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently usbnet_disconnect() unanchors and frees all deferred URBs
using usb_scuttle_anchored_urbs(), which does not free urb->context,
causing a memory leak as reported by syzbot.
Use a usb_get_from_anchor() while loop instead, similar to what we did
in commit 19cfe912c37b ("Bluetooth: btusb: Fix memory leak in
play_deferred"). Also free urb->sg.
Reported-and-tested-by: syzbot+dcd3e13cf4472f2e0ba1@syzkaller.appspotmail.com Fixes: 69ee472f2706 ("usbnet & cdc-ether: Autosuspend for online devices") Fixes: 638c5115a794 ("USBNET: support DMA SG") Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Link: https://lore.kernel.org/r/20220923042551.2745-1-yepeilin.cs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
For quite some time, core DRM helpers already ensure that any relevant
connectors/CRTCs/etc. are disabled, as well as their associated
components (e.g., bridges) when suspending the system. Thus,
analogix_dp_bridge_{enable,disable}() already get called, which in turn
call drm_panel_{prepare,unprepare}(). This makes these drm_panel_*()
calls redundant.
Besides redundancy, there are a few problems with this handling:
(1) drm_panel_{prepare,unprepare}() are *not* reference-counted APIs and
are not in general designed to be handled by multiple callers --
although some panel drivers have a coarse 'prepared' flag that mitigates
some damage, at least. So at a minimum this is redundant and confusing,
but in some cases, this could be actively harmful.
(2) The error-handling is a bit non-standard. We ignored errors in
suspend(), but handled errors in resume(). And recently, people noticed
that the clk handling is unbalanced in error paths, and getting *that*
right is not actually trivial, given the current way errors are mostly
ignored.
(3) In the particular way analogix_dp_{suspend,resume}() get used (e.g.,
in rockchip_dp_*(), as a late/early callback), we don't necessarily have
a proper PM relationship between the DP/bridge device and the panel
device. So while the DP bridge gets resumed, the panel's parent device
(e.g., platform_device) may still be suspended, and so any prepare()
calls may fail.
So remove the superfluous, possibly-harmful suspend()/resume() handling
of panel state.
A number of drivers call page_frag_alloc() with a fragment's size >
PAGE_SIZE.
In low memory conditions, __page_frag_cache_refill() may fail the order
3 cache allocation and fall back to order 0; In this case, the cache
will be smaller than the fragment, causing memory corruptions.
Prevent this from happening by checking if the newly allocated cache is
large enough for the fragment; if not, the allocation will fail and
page_frag_alloc() will return NULL.
Link: https://lkml.kernel.org/r/20220715125013.247085-1-mlombard@redhat.com Fixes: b63ae8ca096d ("mm/net: Rename and move page fragment handling from net/ to mm/") Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Cc: Chen Lin <chen45464546@163.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to the datasheet [1] at page 377, 4-bit bus width is turned on by
bit 2 of the Bus Width Register. Thus the current bitmask is wrong: define
BUS_WIDTH_4 BIT(1)
BIT(1) does not work but BIT(2) works. This has been verified on real MOXA
hardware with FTSDC010 controller revision 1_6_0.
The corrected value of BUS_WIDTH_4 mask collides with: define BUS_WIDTH_8
BIT(2). Additionally, 8-bit bus width mode isn't supported according to the
datasheet, so let's remove the corresponding code.
Fix this by adding sanity check on extended system files' directory inode
to ensure that it is directory, just like ntfs_extend_init() when mounting
ntfs3.
Link: https://lkml.kernel.org/r/20220809064730.2316892-1-chenxiaosong2@huawei.com Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> Cc: Anton Altaparmakov <anton@tuxera.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The UAS mode of Thinkplus(0x17ef, 0x3899) is reported to influence
performance and trigger kernel panic on several platforms with the
following error message:
The UAS mode of Hiksemi USB_HDD is reported to fail to work on several
platforms with the following error message, then after re-connecting the
device will be offlined and not working at all.
The UAS mode of Hiksemi is reported to fail to work on several platforms
with the following error message, then after re-connecting the device will
be offlined and not working at all.
Currently the Orlov inode allocator searches for free inodes for a
directory only in flex block groups with at most inodes_per_group/16
more directory inodes than average per flex block group. However with
growing size of flex block group this becomes unnecessarily strict.
Scale allowed difference from average directory count per flex block
group with flex block group size as we do with other metrics.
the interesting part is the 'f8000000' region as it is actually the
VM's framebuffer:
$ lspci -v
...
0000:00:08.0 VGA compatible controller: Microsoft Corporation Hyper-V virtual VGA (prog-if 00 [VGA controller])
Flags: bus master, fast devsel, latency 0, IRQ 11
Memory at f8000000 (32-bit, non-prefetchable) [size=64M]
...
hv_vmbus: registering driver hyperv_drm
hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] Synthvid Version major 3, minor 5
hyperv_drm 0000:00:08.0: vgaarb: deactivate vga console
hyperv_drm 0000:00:08.0: BAR 0: can't reserve [mem 0xf8000000-0xfbffffff]
hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] Cannot request framebuffer, boot fb still active?
Note: "Cannot request framebuffer" is not a fatal error in
hyperv_setup_gen1() as the code assumes there's some other framebuffer
device there but we actually have some other PCI device (mlx4 in this
case) config space there!
The problem appears to be that vmbus_allocate_mmio() can use dedicated
framebuffer region to serve any MMIO request from any device. The
semantics one might assume of a parameter named "fb_overlap_ok"
aren't implemented because !fb_overlap_ok essentially has no effect.
The existing semantics are really "prefer_fb_overlap". This patch
implements the expected and needed semantics, which is to not allocate
from the frame buffer space when !fb_overlap_ok.
Note, Gen2 VMs are usually unaffected by the issue because
framebuffer region is already taken by EFI fb (in case kernel supports
it) but Gen1 VMs may have this region unclaimed by the time Hyper-V PCI
pass-through driver tries allocating MMIO space if Hyper-V DRM/FB drivers
load after it. Devices can be brought up in any sequence so let's
resolve the issue by always ignoring 'fb_mmio' region for non-FB
requests, even if the region is unclaimed.
Fix Oops in dasd_alias_get_start_dev() function caused by the pavgroup
pointer being NULL.
The pavgroup pointer is checked on the entrance of the function but
without the lcu->lock being held. Therefore there is a race window
between dasd_alias_get_start_dev() and _lcu_update() which sets
pavgroup to NULL with the lcu->lock held.
Fix by checking the pavgroup pointer with lcu->lock held.
Cc: <stable@vger.kernel.org> # 2.6.25+ Fixes: 8e09f21574ea ("[S390] dasd: add hyper PAV support to DASD device driver, part 1") Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Link: https://lore.kernel.org/r/20220919154931.4123002-2-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a separate receive path for small packets (under 256 bytes).
Instead of allocating a new dma-capable skb to be used for the next packet,
this path allocates a skb and copies the data into it (reusing the existing
sbk for the next packet). There are two bytes of junk data at the beginning
of every packet. I believe these are inserted in order to allow aligned DMA
and IP headers. We skip over them using skb_reserve. Before copying over
the data, we must use a barrier to ensure we see the whole packet. The
current code only synchronizes len bytes, starting from the beginning of
the packet, including the junk bytes. However, this leaves off the final
two bytes in the packet. Synchronize the whole packet.
To reproduce this problem, ping a HME with a payload size between 17 and
214
$ ping -s 17 <hme_address>
which will complain rather loudly about the data mismatch. Small packets
(below 60 bytes on the wire) do not have this issue. I suspect this is
related to the padding added to increase the minimum packet size.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Sean Anderson <seanga2@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220920235018.1675956-1-seanga2@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
/proc/kallsyms and /proc/modules are compared before and after the copy
in order to ensure no changes during the copy.
However /proc/modules also might change due to reference counts changing
even though that does not make any difference.
Any modules loaded or unloaded should be visible in changes to kallsyms,
so it is not necessary to check /proc/modules also anyway.
Remove the comparison checking that /proc/modules is unchanged.
Fixes: fc1b691d7651d949 ("perf buildid-cache: Add ability to add kcore to the cache") Reported-by: Daniel Dao <dqminh@cloudflare.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Daniel Dao <dqminh@cloudflare.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220914122429.8770-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The dev->can.state is set to CAN_STATE_ERROR_ACTIVE, after the device
has been started. On busy networks the CAN controller might receive
CAN frame between and go into an error state before the dev->can.state
is assigned.
Assign dev->can.state before starting the controller to close the race
window.
Netdev drivers are expected to call dev_{uc,mc}_sync() in their
ndo_set_rx_mode method and dev_{uc,mc}_unsync() in their ndo_stop method.
This is mentioned in the kerneldoc for those dev_* functions.
The team driver calls dev_{uc,mc}_unsync() during ndo_uninit instead of
ndo_stop. This is ineffective because address lists (dev->{uc,mc}) have
already been emptied in unregister_netdevice_many() before ndo_uninit is
called. This mistake can result in addresses being leftover on former team
ports after a team device has been deleted; see test_LAG_cleanup() in the
last patch in this series.
Add unsync calls at their expected location, team_close().
v3:
* When adding or deleting a port, only sync/unsync addresses if the team
device is up. In other cases, it is taken care of at the right time by
ndo_open/ndo_set_rx_mode/ndo_stop.
Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device") Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
If an AF_PACKET socket is used to send packets through ipvlan and the
default xmit function of the AF_PACKET socket is changed from
dev_queue_xmit() to packet_direct_xmit() via setsockopt() with the option
name of PACKET_QDISC_BYPASS, the skb->mac_header may not be reset and
remains as the initial value of 65535, this may trigger slab-out-of-bounds
bugs as following:
=================================================================
UG: KASAN: slab-out-of-bounds in ipvlan_xmit_mode_l2+0xdb/0x330 [ipvlan]
PU: 2 PID: 1768 Comm: raw_send Kdump: loaded Not tainted 6.0.0-rc4+ #6
ardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33
all Trace:
print_address_description.constprop.0+0x1d/0x160
print_report.cold+0x4f/0x112
kasan_report+0xa3/0x130
ipvlan_xmit_mode_l2+0xdb/0x330 [ipvlan]
ipvlan_start_xmit+0x29/0xa0 [ipvlan]
__dev_direct_xmit+0x2e2/0x380
packet_direct_xmit+0x22/0x60
packet_snd+0x7c9/0xc40
sock_sendmsg+0x9a/0xa0
__sys_sendto+0x18a/0x230
__x64_sys_sendto+0x74/0x90
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The root cause is:
1. packet_snd() only reset skb->mac_header when sock->type is SOCK_RAW
and skb->protocol is not specified as in packet_parse_headers()
2. packet_direct_xmit() doesn't reset skb->mac_header as dev_queue_xmit()
In this case, skb->mac_header is 65535 when ipvlan_xmit_mode_l2() is
called. So when ipvlan_xmit_mode_l2() gets mac header with eth_hdr() which
use "skb->head + skb->mac_header", out-of-bound access occurs.
This patch replaces eth_hdr() with skb_eth_hdr() in ipvlan_xmit_mode_l2()
and reset mac header in multicast to solve this out-of-bound bug.
Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.") Signed-off-by: Lu Wei <luwei32@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
CTCP messages should only be at the start of an IRC message, not
anywhere within it.
While the helper only decodes packes in the ORIGINAL direction, its
possible to make a client send a CTCP message back by empedding one into
a PING request. As-is, thats enough to make the helper believe that it
saw a CTCP message.
ct_sip_next_header and ct_sip_get_header return an absolute
value of matchoff, not a shift from current dataoff.
So dataoff should be assigned matchoff, not incremented by it.
This issue can be seen in the scenario when there are multiple
Contact headers and the first one is using a hostname and other headers
use IP addresses. In this case, ct_sip_walk_headers will work as follows:
The first ct_sip_get_header call to will find the first Contact header
but will return -1 as the header uses a hostname. But matchoff will
be changed to the offset of this header. After that, dataoff should be
set to matchoff, so that the next ct_sip_get_header call find the next
Contact header. But instead of assigning dataoff to matchoff, it is
incremented by it, which is not correct, as matchoff is an absolute
value of the offset. So on the next call to the ct_sip_get_header,
dataoff will be incorrect, and the next Contact header may not be
found at all.
In create_unique_id(), kmalloc(, GFP_KERNEL) can fail due to
out-of-memory, if it fails, return errno correctly rather than
triggering panic via BUG_ON();
ieee80211_scan_rx() tries to access scan_req->flags after a
null check, but a UAF is observed when the scan is completed
and __ieee80211_scan_completed() executes, which then calls
cfg80211_scan_done() leading to the freeing of scan_req.
Since scan_req is rcu_dereference()'d, prevent the racing in
__ieee80211_scan_completed() by ensuring that from mac80211's
POV it is no longer accessed from an RCU read critical section
before we call cfg80211_scan_done().
In pxa3xx_gcu_write, a count parameter of type size_t is passed to words of
type int. Then, copy_from_user() may cause a heap overflow because it is used
as the third argument of copy_from_user().
Signed-off-by: Stefan Metzmacher <metze@samba.org> Cc: stable@vger.kernel.org Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mm: Fix TLB flush for not-first PFNMAP mappings in unmap_region()
This is a stable-specific patch.
I botched the stable-specific rewrite of
commit b67fbebd4cf98 ("mmu_gather: Force tlb-flush VM_PFNMAP vmas"):
As Hugh pointed out, unmap_region() actually operates on a list of VMAs,
and the variable "vma" merely points to the first VMA in that list.
So if we want to check whether any of the VMAs we're operating on is
PFNMAP or MIXEDMAP, we have to iterate through the list and check each VMA.
2 keymap fixes for the Acer Aspire One AOD270 and the same hardware
rebranded as Packard Bell Dot SC:
1. The F2 key is marked with a big '?' symbol on the Packard Bell Dot SC,
this sends WMID_HOTKEY_EVENTs with a scancode of 0x27 add a mapping
for this.
2. Scancode 0x61 is KEY_SWITCHVIDEOMODE. Usually this is a duplicate
input event with the "Video Bus" input device events. But on these devices
the "Video Bus" does not send events for this key. Map 0x61 to KEY_UNKNOWN
instead of using KE_IGNORE so that udev/hwdb can override it on these devs.
The MSI is probably raised by incoming packets, so power down the device
and disable bus mastering to stop the traffic, as user confirmed this
approach works.
In addition to that, be extra safe and cancel reset task if it's running.
If the previous thing cat'ing $debugfs/rd left the FIFO full, then
subsequent open could deadlock in rd_write() (because open is blocked,
not giving a chance for read() to consume any data in the FIFO). Also
it is generally a good idea to clear out old data from the FIFO.
Prior to Linux 5.3, ->transport_lock in sunrpc required the _bh style
spinlocks (when not called from a bottom-half handler).
When upstream 3848e96edf4788f772d83990022fa7023a233d83 was backported to
stable kernels, the spin_lock/unlock calls should have been changed to
the _bh version, but this wasn't noted in the patch and didn't happen.
So convert these lock/unlock calls to the _bh versions.
This patch is required for any stable kernel prior to 5.3 to which the
above mentioned patch was backported. Namely 4.9.y, 4.14.y, 4.19.y.
Cong Wang noticed that the previous fix for sch_sfb accessing the queued
skb after enqueueing it to a child qdisc was incomplete: the SFB enqueue
function was also calling qdisc_qstats_backlog_inc() after enqueue, which
reads the pkt len from the skb cb field. Fix this by also storing the skb
len, and using the stored value to increment the backlog after enqueueing.
Fixes: 9efd23297cca ("sch_sfb: Don't assume the skb is still around after enqueueing to child") Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Acked-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/20220905192137.965549-1-toke@toke.dk Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix a bug reported and analyzed by Nagaraj Arankal, where the handling
of a spurious non-SACK RTO could cause a connection to fail to clear
retrans_stamp, causing a later RTO to very prematurely time out the
connection with ETIMEDOUT.
Here is the buggy scenario, expanding upon Nagaraj Arankal's excellent
report:
(*1) Send one data packet on a non-SACK connection
(*2) Because no ACK packet is received, the packet is retransmitted
and we enter CA_Loss; but this retransmission is spurious.
(*3) The ACK for the original data is received. The transmitted packet
is acknowledged. The TCP timestamp is before the retrans_stamp,
so tcp_may_undo() returns true, and tcp_try_undo_loss() returns
true without changing state to Open (because tcp_is_sack() is
false), and tcp_process_loss() returns without calling
tcp_try_undo_recovery(). Normally after undoing a CA_Loss
episode, tcp_fastretrans_alert() would see that the connection
has returned to CA_Open and fall through and call
tcp_try_to_open(), which would set retrans_stamp to 0. However,
for non-SACK connections we hold the connection in CA_Loss, so do
not fall through to call tcp_try_to_open() and do not set
retrans_stamp to 0. So retrans_stamp is (erroneously) still
non-zero.
At this point the first "retransmission event" has passed and
been recovered from. Any future retransmission is a completely
new "event". However, retrans_stamp is erroneously still
set. (And we are still in CA_Loss, which is correct.)
(*4) After 16 minutes (to correspond with tcp_retries2=15), a new data
packet is sent. Note: No data is transmitted between (*3) and
(*4) and we disabled keep alives.
The socket's timeout SHOULD be calculated from this point in
time, but instead it's calculated from the prior "event" 16
minutes ago (step (*2)).
(*5) Because no ACK packet is received, the packet is retransmitted.
(*6) At the time of the 2nd retransmission, the socket returns
ETIMEDOUT, prematurely, because retrans_stamp is (erroneously)
too far in the past (set at the time of (*2)).
This commit fixes this bug by ensuring that we reuse in
tcp_try_undo_loss() the same careful logic for non-SACK connections
that we have in tcp_try_undo_recovery(). To avoid duplicating logic,
we factor out that logic into a new
tcp_is_non_sack_preventing_reopen() helper and call that helper from
both undo functions.
There is a shift wrapping bug in this code so anything thing above
31 will return false.
Fixes: 35c55c9877f8 ("tipc: add neighbor monitoring framework") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The sch_sfb enqueue() routine assumes the skb is still alive after it has
been enqueued into a child qdisc, using the data in the skb cb field in the
increment_qlen() routine after enqueue. However, the skb may in fact have
been freed, causing a use-after-free in this case. In particular, this
happens if sch_cake is used as a child of sfb, and the GSO splitting mode
of CAKE is enabled (in which case the skb will be split into segments and
the original skb freed).
Fix this by copying the sfb cb data to the stack before enqueueing the skb,
and using this stack copy in increment_qlen() instead of the skb pointer
itself.
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-18231 Fixes: e13e02a3c68d ("net_sched: SFB flow scheduler") Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Ensure the match happens in the right direction, previously the
destination used was the server, not the NAT host, as the comment
shows the code intended.
Additionally nf_nat_irc uses port 0 as a signal and there's no valid way
it can appear in a DCC message, so consider port 0 also forged.
The IPv6 path already drops dst in the daddr changed case, but the IPv4
path does not. This change makes the two code paths consistent.
Further, it is possible that there is already a metadata_dst allocated from
ingress that might already be attached to skbuff->dst while following
the bridge path. If it is not released before setting a new
metadata_dst, it will be leaked. This is similar to what is done in
bpf_set_tunnel_key() or ip6_route_input().
It is important to note that the memory being leaked is not the dst
being set in the bridge code, but rather memory allocated from some
other code path that is not being freed correctly before the skb dst is
overwritten.
An example of the leakage fixed by this commit found using kmemleak:
Both __device_attach_driver() and __driver_attach() check the return
code of the bus_type.match() function to see if the device needs to be
added to the deferred probe list. After adding the device to the list,
the logic attempts to bind the device to the driver anyway, as if the
device had matched with the driver, which is not correct.
If __device_attach_driver() detects that the device in question is not
ready to match with a driver on the bus, then it doesn't make sense for
the device to attempt to bind with the current driver or continue
attempting to match with any of the other drivers on the bus. So, update
the logic in __device_attach_driver() to reflect this.
If __driver_attach() detects that a driver tried to match with a device
that is not ready to match yet, then the driver should not attempt to bind
with the device. However, the driver can still attempt to match and bind
with other devices on the bus, as drivers can be bound to multiple
devices. So, update the logic in __driver_attach() to reflect this.
There may be a bad USB audio device with a USB ID of (0x04fa, 0x4201) and
the number of it's interfaces less than 4, an out-of-bounds read bug occurs
when parsing the interface descriptor for this device.
In loopback_jiffies_timer_pos_update(), we are getting jiffies twice.
First time for playback, second time for capture. Jiffies can be updated
between these two calls and if the capture jiffies is larger, extra zeros
will be filled in the capture buffer.
Change to get jiffies once and use it for both playback and capture.
The voice allocator sometimes begins allocating from near the end of the
array and then wraps around, however snd_emu10k1_pcm_channel_alloc()
accesses the newly allocated voices as if it never wrapped around.
This results in out of bounds access if the first voice has a high enough
index so that first_voice + requested_voice_count > NUM_G (64).
The more voices are requested, the more likely it is for this to occur.
This was initially discovered using PipeWire, however it can be reproduced
by calling aplay multiple times with 16 channels:
aplay -r 48000 -D plughw:CARD=Live,DEV=3 -c 16 /dev/zero
UBSAN: array-index-out-of-bounds in sound/pci/emu10k1/emupcm.c:127:40
index 65 is out of range for type 'snd_emu10k1_voice [64]'
CPU: 1 PID: 31977 Comm: aplay Tainted: G W IOE 6.0.0-rc2-emu10k1+ #7
Hardware name: ASUSTEK COMPUTER INC P5W DH Deluxe/P5W DH Deluxe, BIOS 3002 07/22/2010
Call Trace:
<TASK>
dump_stack_lvl+0x49/0x63
dump_stack+0x10/0x16
ubsan_epilogue+0x9/0x3f
__ubsan_handle_out_of_bounds.cold+0x44/0x49
snd_emu10k1_playback_hw_params+0x3bc/0x420 [snd_emu10k1]
snd_pcm_hw_params+0x29f/0x600 [snd_pcm]
snd_pcm_common_ioctl+0x188/0x1410 [snd_pcm]
? exit_to_user_mode_prepare+0x35/0x170
? do_syscall_64+0x69/0x90
? syscall_exit_to_user_mode+0x26/0x50
? do_syscall_64+0x69/0x90
? exit_to_user_mode_prepare+0x35/0x170
snd_pcm_ioctl+0x27/0x40 [snd_pcm]
__x64_sys_ioctl+0x95/0xd0
do_syscall_64+0x5c/0x90
? do_syscall_64+0x69/0x90
? do_syscall_64+0x69/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
If a 32-bit kernel was compiled for PA2.0 CPUs, it won't be able to run
on machines with PA1.x CPUs. Add a check and bail out early if a PA1.x
machine is detected.
Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Zhenneng Li <lizhenneng@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 23c2d497de21 ("mm: kmemleak: take a full lowmem check in
kmemleak_*_phys()") brought false leak alarms on some archs like arm64
that does not init pfn boundary in early booting. The final solution
lands on linux-6.0: commit 0c24e061196c ("mm: kmemleak: add rbtree and
store physical address for objects allocated with PA").
Revert this commit before linux-6.0. The original issue of invalid PA
can be mitigated by additional check in devicetree.
Commit d4252071b97d ("add barriers to buffer_uptodate and
set_buffer_uptodate") added proper memory barriers to the buffer head
BH_Uptodate bit, so that anybody who tests a buffer for being up-to-date
will be guaranteed to actually see initialized state.
However, that commit didn't _just_ add the memory barrier, it also ended
up dropping the "was it already set" logic that the BUFFER_FNS() macro
had.
That's conceptually the right thing for a generic "this is a memory
barrier" operation, but in the case of the buffer contents, we really
only care about the memory barrier for the _first_ time we set the bit,
in that the only memory ordering protection we need is to avoid anybody
seeing uninitialized memory contents.
Any other access ordering wouldn't be about the BH_Uptodate bit anyway,
and would require some other proper lock (typically BH_Lock or the folio
lock). A reader that races with somebody invalidating the buffer head
isn't an issue wrt the memory ordering, it's a serialization issue.
Now, you'd think that the buffer head operations don't matter in this
day and age (and I certainly thought so), but apparently some loads
still end up being heavy users of buffer heads. In particular, the
kernel test robot reported that not having this bit access optimization
in place caused a noticeable direct IO performance regression on ext4:
It's been reported that there is a possible data-race accessing to the
global card_requested[] array at ALSA sequencer core, which is used
for determining whether to call request_module() for the card or not.
This data race itself is almost harmless, as it might end up with one
extra request_module() call for the already loaded module at most.
But it's still better to fix.
This patch addresses the possible data race of card_requested[] and
client_requested[] arrays by replacing them with bitmask.
It's an atomic operation and can work without locks.
ALSA OSS sequencer refers to a global variable max_midi_devs at
creating a new port, storing it to its own field. Meanwhile this
variable may be changed by other sequencer events at
snd_seq_oss_midi_check_exit_port() in parallel, which may cause a data
race.
OTOH, this data race itself is almost harmless, as the access to the
MIDI device is done via get_mdev() and it's protected with a refcount,
hence its presence is guaranteed.
Though, it's sill better to address the data-race from the code sanity
POV, and this patch adds the proper spinlock for the protection.
Upon reception, a packet must be categorized, either it's destination is
the host, or it is another host. A packet with no destination addressing
fields may be valid in two situations:
- the packet has no source field: only ACKs are built like that, we
consider the host as the destination.
- the packet has a valid source field: it is directed to the PAN
coordinator, as for know we don't have this information we consider we
are not the PAN coordinator.
There was likely a copy/paste error made during a previous cleanup
because the if clause is now containing exactly the same condition as in
the switch case, which can never be true. In the past the destination
address was used in the switch and the source address was used in the
if, which matches what the spec says.
Cc: stable@vger.kernel.org Fixes: ae531b9475f6 ("ieee802154: use ieee802154_addr instead of *_sa variants") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20220826142954.254853-1-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we are not connected to a channel, sending channel "switch"
announcement doesn't make any sense.
The BSS list is empty in that case. This causes the for loop in
cfg80211_get_bss() to be bypassed, so the function returns NULL
(check line 1424 of net/wireless/scan.c), causing the WARN_ON()
in ieee80211_ibss_csa_beacon() to get triggered (check line 500
of net/mac80211/ibss.c), which was consequently reported on the
syzkaller dashboard.
Thus, check if we have an existing connection before generating
the CSA beacon in ieee80211_ibss_finish_csa().
During cdrom emulation, the response to read_toc command must contain
the cdrom address as the number of sectors (2048 byte sized blocks)
represented either as an absolute value (when MSF bit is '0') or in
terms of PMin/PSec/PFrame (when MSF bit is set to '1'). Incase of
cdrom, the fsg_lun_open call sets the sector size to 2048 bytes.
When MAC OS sends a read_toc request with MSF set to '1', the
store_cdrom_address assumes that the address being provided is the
LUN size represented in 512 byte sized blocks instead of 2048. It
tries to modify the address further to convert it to 2048 byte sized
blocks and store it in MSF format. This results in data transfer
failures as the cdrom address being provided in the read_toc response
is incorrect.
This turned out not to be an error in usb-storage but rather a nested
device reset attempt. That is, as the rtl8712 driver was being
unbound from a composite device in preparation for an unrelated USB
reset (that driver does not have pre_reset or post_reset callbacks),
its ->remove routine called usb_reset_device() -- thus nesting one
reset call within another.
Performing a reset as part of disconnect processing is a questionable
practice at best. However, the bug report points out that the USB
core does not have any protection against nested resets. Adding a
reset_in_progress flag and testing it will prevent such errors in the
future.
Add proper alignment for .nospec_call_table and .nospec_return_table in
vmlinux.
[hca@linux.ibm.com]: The problem with the missing alignment of the nospec
tables exist since a long time, however only since commit e6ed91fd0768
("s390/alternatives: remove padding generation code") and with
CONFIG_RELOCATABLE=n the kernel may also crash at boot time.
The above named commit reduced the size of struct alt_instr by one byte,
so its new size is 11 bytes. Therefore depending on the number of cpu
alternatives the size of the __alt_instructions array maybe odd, which
again also causes that the addresses of the nospec tables will be odd.
If the address of __nospec_call_start is odd and the kernel is compiled
With CONFIG_RELOCATABLE=n the compiler may generate code that loads the
address of __nospec_call_start with a 'larl' instruction.
This will generate incorrect code since the 'larl' instruction only works
with even addresses. In result the members of the nospec tables will be
accessed with an off-by-one offset, which subsequently may lead to
addressing exceptions within __nospec_revert().
The alignment check in prepare_hugepage_range() is wrong for 2 GB
hugepages, it only checks for 1 MB hugepage alignment.
This can result in kernel crash in __unmap_hugepage_range() at the
BUG_ON(start & ~huge_page_mask(h)) alignment check, for mappings
created with MAP_FIXED at unaligned address.
Fix this by correctly handling multiple hugepage sizes, similar to the
generic version of prepare_hugepage_range().
Fixes: d08de8e2d867 ("s390/mm: add support for 2GB hugepages") Cc: <stable@vger.kernel.org> # 4.8+ Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After xHC controller is started, either in probe or resume, it can take
a while before any of the connected usb devices are visible to the roothub
due to link training.
It's possible xhci driver loads, sees no acivity and suspends the host
before the USB device is visible.
In one testcase with a hotplugged xHC controller the host finally detected
the connected USB device and generated a wake 500ms after host initial
start.
If hosts didn't suspend the device duringe training it probablty wouldn't
take up to 500ms to detect it, but looking at specs reveal USB3 link
training has a couple long timeout values, such as 120ms
RxDetectQuietTimeout, and 360ms PollingLFPSTimeout.
So Add a 500ms grace period that keeps polling the roothub for 500ms after
start, preventing runtime suspend until USB devices are detected.
The driver does not check if the cooling state passed to
gpio_fan_set_cur_state() exceeds the maximum cooling state as
stored in fan_data->num_speeds. Since the cooling state is later
used as an array index in set_fan_speed(), an array out of bounds
access can occur.
This can be exploited by setting the state of the thermal cooling device
to arbitrary values, causing for example a kernel oops when unavailable
memory is accessed this way.
When changing the console font with ioctl(KDFONTOP) the new font size
can be bigger than the previous font. A previous selection may thus now
be outside of the new screen size and thus trigger out-of-bounds
accesses to graphics memory if the selection is removed in
vc_do_resize().
Prevent such out-of-memory accesses by dropping the selection before the
various con_font_set() console handlers are called.
_Read/Write_MACREG callbacks are NULL so the read/write_macreg_hdl()
functions don't do anything except free the "pcmd" pointer. It
results in a use after free. Delete them.
Fixes: 2865d42c78a9 ("staging: r8712u: Add the new driver to the mainline kernel") Cc: stable <stable@kernel.org> Reported-by: Zheng Wang <hackerzheng666@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/Yw4ASqkYcUhUfoY2@kili Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The setting of RS485 RTS polarity is inverse in the current driver.
When the property of 'rs485-rts-active-low' is enabled in the dts node,
the RTS signal should be LOW during sending. Otherwise, if there is no
such a property, the RTS should be HIGH during sending.
Fixes: 03895cf41d18 ("tty: serial: fsl_lpuart: Add support for RS-485") Cc: stable <stable@kernel.org> Signed-off-by: Nicolas Diaz <nicolas.diaz@nxp.com> Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com> Link: https://lore.kernel.org/r/20220805144529.604856-1-shenwei.wang@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The return type is supposed to be ssize_t, which is signed long,
but "r" was declared as unsigned int. This means that on 64 bit systems
we return positive values instead of negative error codes.
On Intel hardware the SLP_TYPx bitfield occupies bits 10-12 as per ACPI
specification (see Table 4.13 "PM1 Control Registers Fixed Hardware
Feature Control Bits" for the details).
Fix the mask and other related definitions accordingly.
Fixes: 93e5eadd1f6e ("x86/platform: New Intel Atom SOC power management controller driver") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220801113734.36131-1-andriy.shevchenko@linux.intel.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In `do_fb_ioctl()` of fbmem.c, if cmd is FBIOPUT_VSCREENINFO, var will be
copied from user, then go through `fb_set_var()` and
`info->fbops->fb_check_var()` which could may be `pm2fb_check_var()`.
Along the path, `var->pixclock` won't be modified. This function checks
whether reciprocal of `var->pixclock` is too high. If `var->pixclock` is
zero, there will be a divide by zero error. So, it is necessary to check
whether denominator is zero to avoid crash. As this bug is found by
Syzkaller, logs are listed below.