With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:
drivers/s390/net/ctcm_main.c:1064:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
.ndo_start_xmit = ctcm_tx,
^~~~~~~
drivers/s390/net/ctcm_main.c:1072:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
.ndo_start_xmit = ctcmpc_tx,
^~~~~~~~~
->ndo_start_xmit() in 'struct net_device_ops' expects a return type of
'netdev_tx_t', not 'int'. Adjust the return type of ctc{mp,}m_tx() to
match the prototype's to resolve the warning and potential CFI failure,
should s390 select ARCH_SUPPORTS_CFI_CLANG in the future.
Additionally, while in the area, remove a comment block that is no
longer relevant.
Link: https://github.com/ClangBuiltLinux/linux/issues/1750 Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:
drivers/gpu/drm/amd/amdgpu/../pm/swsmu/amdgpu_smu.c:3008:29: error: incompatible function pointer types initializing 'int (*)(void *, uint32_t, long *, uint32_t)' (aka 'int (*)(void *, unsigned int, long *, unsigned int)') with an expression of type 'int (void *, enum PP_OD_DPM_TABLE_COMMAND, long *, uint32_t)' (aka 'int (void *, enum PP_OD_DPM_TABLE_COMMAND, long *, unsigned int)') [-Werror,-Wincompatible-function-pointer-types-strict]
.odn_edit_dpm_table = smu_od_edit_dpm_table,
^~~~~~~~~~~~~~~~~~~~~
1 error generated.
There are only two implementations of ->odn_edit_dpm_table() in 'struct
amd_pm_funcs': smu_od_edit_dpm_table() and pp_odn_edit_dpm_table(). One
has a second parameter type of 'enum PP_OD_DPM_TABLE_COMMAND' and the
other uses 'u32'. Ultimately, smu_od_edit_dpm_table() calls
->od_edit_dpm_table() from 'struct pptable_funcs' and
pp_odn_edit_dpm_table() calls ->odn_edit_dpm_table() from 'struct
pp_hwmgr_func', which both have a second parameter type of 'enum
PP_OD_DPM_TABLE_COMMAND'.
Update the type parameter in both the prototype in 'struct amd_pm_funcs'
and pp_odn_edit_dpm_table() to 'enum PP_OD_DPM_TABLE_COMMAND', which
cleans up the warning.
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:
Avoid potential use-after-free condition under memory pressure. If the
kzalloc() fails, q_vector will be freed but left in the original
adapter->q_vector[v_idx] array position.
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: intel-wired-lan@lists.osuosl.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The XP-PEN Deco LW drawing tablet can be connected by USB cable or using
a USB Bluetooth dongle. When it is connected using the dongle, there
might be a small delay until the tablet is paired with the dongle.
Fetching the device battery during this delay results in random battery
percentage values.
Add a quirk to avoid actively querying the battery percentage and wait
for the device to report it on its own.
Reported-by: Mia Kanashi <chad@redpilled.dev> Tested-by: Mia Kanashi <chad@redpilled.dev> Signed-off-by: José Expósito <jose.exposito89@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
This patch fixes a shift-out-of-bounds in brcmfmac that occurs in
BIT(chiprev) when a 'chiprev' provided by the device is too large.
It should also not be equal to or greater than BITS_PER_TYPE(u32)
as we do bitwise AND with a u32 variable and BIT(chiprev). The patch
adds a check that makes the function return NULL if that is the case.
Note that the NULL case is later handled by the bus-specific caller,
brcmf_usb_probe_cb() or brcmf_usb_reset_resume(), for example.
Reported-by: Dokyung Song <dokyungs@yonsei.ac.kr> Reported-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr> Reported-by: Minsuk Kang <linuxlovemin@yonsei.ac.kr> Signed-off-by: Minsuk Kang <linuxlovemin@yonsei.ac.kr> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221024071329.504277-1-linuxlovemin@yonsei.ac.kr Signed-off-by: Sasha Levin <sashal@kernel.org>
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:
drivers/net/hamradio/baycom_epp.c:1119:25: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
.ndo_start_xmit = baycom_send_packet,
^~~~~~~~~~~~~~~~~~
1 error generated.
->ndo_start_xmit() in 'struct net_device_ops' expects a return type of
'netdev_tx_t', not 'int'. Adjust the return type of baycom_send_packet()
to match the prototype's to resolve the warning and CFI failure.
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:
drivers/net/ethernet/ti/netcp_core.c:1944:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
.ndo_start_xmit = netcp_ndo_start_xmit,
^~~~~~~~~~~~~~~~~~~~
1 error generated.
->ndo_start_xmit() in 'struct net_device_ops' expects a return type of
'netdev_tx_t', not 'int'. Adjust the return type of
netcp_ndo_start_xmit() to match the prototype's to resolve the warning
and CFI failure.
The reproducer doesn't really reproduce outside of syzkaller
environment, so I'm taking a guess here. It looks like we
do generate correct ETH_HLEN-sized packet, but we redirect
the packet to the tunneling device. Before we do so, we
__skb_pull l2 header and arrive again at skb->len == 0.
Doesn't seem like we can do anything better than having
an explicit check after __skb_pull?
Cc: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+f635e86ec3fa0a37e019@syzkaller.appspotmail.com Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20221027225537.353077-1-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:
drivers/gpu/drm/meson/meson_encoder_cvbs.c:211:16: error: incompatible function pointer types initializing 'enum drm_mode_status (*)(struct drm_bridge *, const struct drm_display_info *, const struct drm_display_mode *)' with an expression of type 'int (struct drm_bridge *, const struct drm_display_info *, const struct drm_display_mode *)' [-Werror,-Wincompatible-function-pointer-types-strict]
.mode_valid = meson_encoder_cvbs_mode_valid,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
->mode_valid() in 'struct drm_bridge_funcs' expects a return type of
'enum drm_mode_status', not 'int'. Adjust the return type of
meson_encoder_cvbs_mode_valid() to match the prototype's to resolve the
warning and CFI failure.
gcc 13 correctly reports overflow in qed_grc_dump_addr_range():
In file included from drivers/net/ethernet/qlogic/qed/qed.h:23,
from drivers/net/ethernet/qlogic/qed/qed_debug.c:10:
drivers/net/ethernet/qlogic/qed/qed_debug.c: In function 'qed_grc_dump_addr_range':
include/linux/qed/qed_if.h:1217:9: error: overflow in conversion from 'int' to 'u8' {aka 'unsigned char'} changes value from '(int)vf_id << 8 | 128' to '128' [-Werror=overflow]
We do:
u8 fid;
...
fid = vf_id << 8 | 128;
Since fid is 16bit (and the stored value above too), fid should be u16,
not u8. Fix that.
qmi_msg_handler is required to be null terminated by QMI module.
There might be a case where a handler for a msg id is not present in the
handlers array which can lead to infinite loop while searching the handler
and therefore out of bound access in qmi_invoke_handler().
Hence update the initialization in qmi_msg_handler data structure.
The iso_layout parameter must be manually set to get the driver to
swap KEY_102ND and KEY_GRAVE. This patch eliminates the need to do that.
This is safe to do, as Macs with keyboards that do not need the quirk
will keep working the same way as the value of hid->country will be
different than HID_COUNTRY_INTERNATIONAL_ISO. This was tested by one
person with a Mac with the WELLSPRINGT2_J152F keyboard with a layout
that does not require the quirk to be set.
The hid-apple driver does not support chaining translations or
dependencies on other translations. This creates two problems:
1 - In Non-English keyboards of Macs, KEY_102ND and KEY_GRAVE are
swapped and the APPLE_ISO_TILDE_QUIRK is used to work around this
problem. The quirk is not set for the Macs where these bugs happen yet
(see the 2nd patch for that), but this can be forced by setting the
iso_layout parameter. Unfortunately, this only partially works.
KEY_102ND gets translated to KEY_GRAVE, but KEY_GRAVE does not get
translated to KEY_102ND, so both of them end up functioning as
KEY_GRAVE. This is because the driver translates the keys as if Fn was
pressed and the original is sent if it is not pressed, without any
further translations happening on the key[#463]. KEY_GRAVE is present at
macbookpro_no_esc_fn_keys[#195], so this is what happens:
- KEY_GRAVE -> KEY_ESC (as if Fn is pressed)
- KEY_GRAVE is returned (Fn isn't pressed, so translation is discarded)
- KEY_GRAVE -> KEY_102ND (this part is not reached!)
...
2 - In case the touchbar does not work, the driver supports sending
Escape when Fn+KEY_GRAVE is pressed. As mentioned previously, KEY_102ND
is actually KEY_GRAVE and needs to be translated before this happens.
Normally, these are the steps that should happen:
- KEY_102ND -> KEY_GRAVE
- KEY_GRAVE -> KEY_ESC (Fn is pressed)
- KEY_ESC is returned
Though this is what happens instead, as dependencies on other
translations are not supported:
- KEY_102ND -> KEY_ESC (Fn is pressed)
- KEY_ESC is returned
This patch fixes both bugs by ordering the translations correctly and by
making the translations continue and not return immediately after
translating a key so that chained translations work and translations can
depend on other ones.
This patch also simplifies the implementation of the swap_fn_leftctrl
option a little bit, as it makes it simply use a normal translation
instead adding extra code to translate a key to KEY_FN[#381]. This change
wasn't put in another patch as the code that translates the Fn key needs
to be changed because of the changes in the patch, and those changes
would be discarded with the next patch anyway (the part that originally
translates KEY_FN to KEY_LEFTCTRL needs to be made an else-if branch of
the part that transltes KEY_LEFTCTRL to KEY_FN).
Note: Line numbers (#XYZ) are for drivers/hid/hid-apple.c at commit 20afcc462579 ("HID: apple: Add "GANSS" to the non-Apple list").
Note: These bugs are only present on Macs with a keyboard with no
dedicated escape key and a non-English layout.
David Jeffery found one double ->queue_rq() issue, so far it can
be triggered in VM use case because of long vmexit latency or preempt
latency of vCPU pthread or long page fault in vCPU pthread, then block
IO req could be timed out before queuing the request to hardware but after
calling blk_mq_start_request() during ->queue_rq(), then timeout handler
may handle it by requeue, then double ->queue_rq() is caused, and kernel
panic.
So far, it is driver's responsibility to cover the race between timeout
and completion, so it seems supposed to be solved in driver in theory,
given driver has enough knowledge.
But it is really one common problem, lots of driver could have similar
issue, and could be hard to fix all affected drivers, even it isn't easy
for driver to handle the race. So David suggests this patch by draining
in-progress ->queue_rq() for solving this issue.
Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Keith Busch <kbusch@kernel.org> Cc: virtualization@lists.linux-foundation.org Cc: Bart Van Assche <bvanassche@acm.org> Signed-off-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20221026051957.358818-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
The LG 27GP950 and LG 27GN950 have visible display corruption when
trying to use 10bpc modes. So, to fix this, cap their maximum DSC
target bitrate to 15bpp.
Suggested-by: Roman Li <roman.li@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
On WCN3990, we are seeing a rare scenario where copy engine hardware is
sending a copy complete interrupt to the host driver while still
processing the buffer that the driver has sent, this is leading into an
SMMU fault triggering kernel panic. This is happening on copy engine
channel 3 (CE3) where the driver normally enqueues WMI commands to the
firmware. Upon receiving a copy complete interrupt, host driver will
immediately unmap and frees the buffer presuming that hardware has
processed the buffer. In the issue case, upon receiving copy complete
interrupt, host driver will unmap and free the buffer but since hardware
is still accessing the buffer (which in this case got unmapped in
parallel), SMMU hardware will trigger an SMMU fault resulting in a
kernel panic.
In order to avoid this, as a work around, add a delay before unmapping
the copy engine source DMA buffer. This is conditionally done for
WCN3990 and only for the CE3 channel where issue is seen.
After the IPMI disconnect problem, the memory kept rising and we tried
to unload the driver to free the memory. However, only part of the
free memory is recovered after the driver is uninstalled. Using
ebpf to hook free functions, we find that neither ipmi_user nor
ipmi_smi_msg is free, only ipmi_recv_msg is free.
We find that the deliver_smi_err_response call in clean_smi_msgs does
the destroy processing on each message from the xmit_msg queue without
checking the return value and free ipmi_smi_msg.
deliver_smi_err_response is called only at this location. Adding the
free handling has no effect.
To verify, try using ebpf to trace the free function.
If ar5523_cmd() timed out, then ar5523_host_available() failed and
ar5523_probe() freed the device structure. So, ar5523_cmd_tx_cb()
might touch the freed structure.
This patch fixes this issue by canceling in-flight tx cmd if submitted
urb timed out.
The bug arises when a USB device claims to be an ATH9K but doesn't
have the expected endpoints. (In this case there was an interrupt
endpoint where the driver expected a bulk endpoint.) The kernel
needs to be able to handle such devices without getting an internal error.
When firmware hit trap at initialization, host will read abnormal
max_flowrings number from dongle, and it will cause kernel panic when
doing iowrite to initialize dongle ring.
To detect this error at early stage, we directly return error when getting
invalid max_flowrings(>256).
Signed-off-by: Wright Feng <wright.feng@cypress.com> Signed-off-by: Chi-hsien Lin <chi-hsien.lin@cypress.com> Signed-off-by: Ian Lin <ian.lin@infineon.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20220929031001.9962-3-ian.lin@infineon.com Signed-off-by: Sasha Levin <sashal@kernel.org>
There is a hardware bug that the interrupt STMBUF_HALF may be triggered
after or when disable interrupt.
It may led to unexpected kernel panic.
And interrupt STMBUF_HALF and STMBUF_RTND have no other effect.
So disable them and the unused interrupts.
meanwhile clear the interrupt status when disable interrupt.
Signed-off-by: Ming Qian <ming.qian@nxp.com> Reviewed-by: Mirela Rabulea <mirela.rabulea@nxp.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Sasha Levin <sashal@kernel.org>
The GC300's features register doesn't specify that a 2D pipe is
available, and like the GC600, its idle register reports zero bits where
modules aren't present.
Signed-off-by: Doug Brown <doug@schmorgal.com> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the input inode of hfs_write_inode() is incorrect:
struct inode
struct hfs_inode_info
struct hfs_cat_key
struct hfs_name
u8 len # len is greater than HFS_NAMELEN(31) which is the
maximum length of an HFS filename
OOB read occurred:
hfs_write_inode()
hfs_brec_find()
__hfs_brec_find()
hfs_cat_keycmp()
hfs_strcmp() # OOB read occurred due to len is too large
Fix this by adding a Check on len in hfs_write_inode() before calling
hfs_brec_find().
Link: https://lkml.kernel.org/r/20221130065959.2168236-1-zhangpeng362@huawei.com Signed-off-by: ZhangPeng <zhangpeng362@huawei.com> Reported-by: <syzbot+e836ff7133ac02be825f@syzkaller.appspotmail.com> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Nanyong Sun <sunnanyong@huawei.com> Cc: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The Medion Lifetab S10346 is a x86 tablet which ships with Android x86 as
factory OS. The Android x86 kernel fork ignores I2C devices described in
the DSDT, except for the PMIC and Audio codecs.
As usual the Medion Lifetab S10346's DSDT contains a bunch of extra I2C
devices which are not actually there, causing various resource conflicts.
Add an ACPI_QUIRK_SKIP_I2C_CLIENTS quirk for the Medion Lifetab S10346 to
the acpi_quirk_skip_dmi_ids table to woraround this.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The Lenovo Yoga Tab 3 Pro (YT3-X90F) is a x86 (Cherry Trail) tablet which
ships with Android x86 as factory OS. The Android x86 kernel fork ignores
I2C devices described in the DSDT, except for the PMIC and Audio codecs.
As usual the Lenovo Yoga Tab 3 Pro's DSDT contains a bunch of extra I2C
devices which are not actually there, causing various resource conflicts.
Add an ACPI_QUIRK_SKIP_I2C_CLIENTS quirk for the Lenovo Yoga Tab 3 Pro to
the acpi_quirk_skip_dmi_ids table to woraround this.
ACPI_QUIRK_SKIP_I2C_CLIENTS handling uses i2c_acpi_known_good_ids[],
so that PMICs and Audio codecs will still be enumerated properly.
The Lenovo Yoga Tab 3 Pro uses a Whiskey Cove PMIC, add the INT34D3 HID
for this PMIC to the i2c_acpi_known_good_ids[] list.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
A kernel that was compiled without CONFIG_X86_X2APIC was unable to boot on
platforms that have x2APIC already enabled in the BIOS before starting the
kernel.
The kernel was supposed to panic with an approprite error message in
validate_x2apic() due to the missing X2APIC support.
However, validate_x2apic() was run too late in the boot cycle, and the
kernel tried to initialize the APIC nonetheless. This resulted in an
earlier panic in setup_local_APIC() because the APIC was not registered.
In my experiments, a panic message in setup_local_APIC() was not visible
in the graphical console, which resulted in a hang with no indication
what has gone wrong.
Instead of calling panic(), disable the APIC, which results in a somewhat
working system with the PIC only (and no SMP). This way the user is able to
diagnose the problem more easily.
Disabling X2APIC mode is not an option because it's impossible on systems
with locked x2APIC.
The proper place to disable the APIC in this case is in check_x2apic(),
which is called early from setup_arch(). Doing this in
__apic_intr_mode_select() is too late.
Make check_x2apic() unconditionally available and remove the empty stub.
The integer overflow is descripted with following codes:
> 317 static comp_t encode_comp_t(u64 value)
> 318 {
> 319 int exp, rnd;
......
> 341 exp <<= MANTSIZE;
> 342 exp += value;
> 343 return exp;
> 344 }
Currently comp_t is defined as type of '__u16', but the variable 'exp' is
type of 'int', so overflow would happen when variable 'exp' in line 343 is
greater than 65535.
If field s_log_block_size of superblock data is corrupted and too large,
init_nilfs() and load_nilfs() still can trigger a shift-out-of-bounds
warning followed by a kernel panic (if panic_on_warn is set):
shift exponent 38973 is too large for 32-bit type 'int'
Call Trace:
<TASK>
dump_stack_lvl+0xcd/0x134
ubsan_epilogue+0xb/0x50
__ubsan_handle_shift_out_of_bounds.cold.12+0x17b/0x1f5
init_nilfs.cold.11+0x18/0x1d [nilfs2]
nilfs_mount+0x9b5/0x12b0 [nilfs2]
...
This fixes the issue by adding and using a new helper function for getting
block size with sanity check.
Patch series "nilfs2: fix UBSAN shift-out-of-bounds warnings on mount
time".
The first patch fixes a bug reported by syzbot, and the second one fixes
the remaining bug of the same kind. Although they are triggered by the
same super block data anomaly, I divided it into the above two because the
details of the issues and how to fix it are different.
Both are required to eliminate the shift-out-of-bounds issues at mount
time.
This patch (of 2):
If the block size exponent information written in an on-disk superblock is
corrupted, nilfs_sb2_bad_offset helper function can trigger
shift-out-of-bounds warning followed by a kernel panic (if panic_on_warn
is set):
shift exponent 38983 is too large for 64-bit type 'unsigned long long'
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106
ubsan_epilogue lib/ubsan.c:151 [inline]
__ubsan_handle_shift_out_of_bounds+0x33d/0x3b0 lib/ubsan.c:322
nilfs_sb2_bad_offset fs/nilfs2/the_nilfs.c:449 [inline]
nilfs_load_super_block+0xdf5/0xe00 fs/nilfs2/the_nilfs.c:523
init_nilfs+0xb7/0x7d0 fs/nilfs2/the_nilfs.c:577
nilfs_fill_super+0xb1/0x5d0 fs/nilfs2/super.c:1047
nilfs_mount+0x613/0x9b0 fs/nilfs2/super.c:1317
...
In addition, since nilfs_sb2_bad_offset() performs multiplication without
considering the upper bound, the computation may overflow if the disk
layout parameters are not normal.
This fixes these issues by inserting preliminary sanity checks for those
parameters and by converting the comparison from one involving
multiplication and left bit-shifting to one using division and right
bit-shifting.
A use-after-free in acpi_ps_parse_aml() after a failing invocaion of
acpi_ds_call_control_method() is reported by KASAN [1] and code
inspection reveals that next_walk_state pushed to the thread by
acpi_ds_create_walk_state() is freed on errors, but it is not popped
from the thread beforehand. Thus acpi_ds_get_current_walk_state()
called by acpi_ps_parse_aml() subsequently returns it as the new
walk state which is incorrect.
To address this, make acpi_ds_call_control_method() call
acpi_ds_pop_walk_state() to pop next_walk_state from the thread before
returning an error.
Added GPE quirk entry for the HP Pavilion Gaming 15-cx0041ur.
There is a quirk entry for the 15-cx0xxx laptops, but this one has
different DMI_PRODUCT_NAME.
Notably backlight keys and other ACPI events now function correctly.
Signed-off-by: Mia Kanashi <chad@redpilled.dev> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The return value of acpi_fetch_acpi_dev() could be NULL, which would
cause a NULL pointer dereference to occur in acpi_device_hid().
Signed-off-by: Li Zhong <floridsleeves@gmail.com>
[ rjw: Subject and changelog edits, added empty line after if () ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
/* _inline may overflow into _inline_ea when needed */
/* _inline_ea may overlay the last part of
* file._xtroot if maxentry = XTROOTINITSLOT
*/
union {
struct {
/* 128: inline symlink */
unchar _inline[128];
/* 128: inline extended attr */
unchar _inline_ea[128];
};
unchar _inline_all[256];
and currently the symlink code copies into _inline;
if this is larger than 128 bytes it triggers a fortify warning of the
form:
memcpy: detected field-spanning write (size 132) of single field
"ip->i_link" at fs/jfs/namei.c:950 (size 18446744073709551615)
when it's actually OK.
Copy it into _inline_all instead.
Reported-by: syzbot+5fc38b2ddbbca7f5c680@syzkaller.appspotmail.com Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The cause of the issue is that brelse() is called on both ofibh.sbh
and ofibh.ebh by udf_find_entry() when it returns NULL. However,
brelse() is called by udf_rename(), too. So, b_count on buffer_head
becomes unbalanced.
This patch fixes the issue by not calling brelse() by udf_rename()
when udf_find_entry() returns NULL.
Syzbot found a crash : UBSAN: shift-out-of-bounds in dbAllocAG. The
underlying bug is the missing check of bmp->db_agl2size. The field can
be greater than 64 and trigger the shift-out-of-bounds.
Fix this bug by adding a check of bmp->db_agl2size in dbMount since this
field is used in many following functions. The upper bound for this
field is L2MAXL2SIZE - L2MAXAG, thanks for the help of Dave Kleikamp.
Note that, for maintenance, I reorganized error handling code of dbMount.
Reported-by: syzbot+15342c1aa6a00fb7a438@syzkaller.appspotmail.com Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When enabling the APPS SMMU the mainline driver reconfigures the SMMU
from its bootloader configuration, losing the stream mapping for (among
which) the SDHCI hardware and breaking its ADMA feature. This feature
can be disabled with:
sdhci.debug_quirks=0x40
But it is of course desired to have this feature enabled and working
through the SMMU.
Hyper-V cleanup code comes under panic path where preemption and irq
is already disabled. So calling of unregister_syscore_ops might schedule
out the thread even for the case where mutex lock is free.
hyperv_cleanup
unregister_syscore_ops
mutex_lock(&syscore_ops_lock)
might_sleep
Here might_sleep might schedule out this thread, where voluntary preemption
config is on and this thread will never comes back. And also this was added
earlier to maintain the symmetry which is not required as this can comes
during crash shutdown path only.
To prevent the same, removing unregister_syscore_ops function call.
The Hyper-V framebuffer code registers a panic notifier in order
to try updating its fbdev if the kernel crashed. The notifier
callback is straightforward, but it calls the vmbus_sendpacket()
routine eventually, and such function takes a spinlock for the
ring buffer operations.
Panic path runs in atomic context, with local interrupts and
preemption disabled, and all secondary CPUs shutdown. That said,
taking a spinlock might cause a lockup if a secondary CPU was
disabled with such lock taken. Fix it here by checking if the
ring buffer spinlock is busy on Hyper-V framebuffer panic notifier;
if so, bail-out avoiding the potential lockup scenario.
Cc: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Tianyu Lan <Tianyu.Lan@microsoft.com> Cc: Wei Liu <wei.liu@kernel.org> Tested-by: Fabio A M Martins <fabiomirmar@gmail.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20220819221731.480795-10-gpiccoli@igalia.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The flash_memory region currently starts at offset 896MiB:
0xb8000000 (flash_memory offset) - 0x80000000 (base memory address) = 0x38000000 = 896MiB
This is the end of the available DRAM with ECC enabled and therefore it
needs to be moved.
Since the flash_memory is 64MiB in size and needs to be 64MiB aligned,
it can just be moved up by 64MiB and would sit right at the end of the
available DRAM buffer.
The ramoops region currently follows the flash_memory, but it can be
moved to sit above flash_memory which would minimize the address-space
fragmentation.
We use is_ttbr0_addr() in noinstr code, but as it's only marked as
inline, it's theoretically possible for the compiler to place it
out-of-line and instrument it, which would be problematic.
Mark is_ttbr0_addr() as __always_inline such that that can safely be
used from noinstr code. For consistency, do the same to is_ttbr1_addr().
Note that while is_ttbr1_addr() calls arch_kasan_reset_tag(), this is a
macro (and its callees are either macros or __always_inline), so there
is not a risk of transient instrumentation.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20221114144042.3001140-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Running rcutorture with non-zero fqs_duration module parameter in a
kernel built with CONFIG_PREEMPTION=y results in the following splat:
BUG: using __this_cpu_read() in preemptible [00000000]
code: rcu_torture_fqs/398
caller is __this_cpu_preempt_check+0x13/0x20
CPU: 3 PID: 398 Comm: rcu_torture_fqs Not tainted 6.0.0-rc1-yoctodev-standard+
Call Trace:
<TASK>
dump_stack_lvl+0x5b/0x86
dump_stack+0x10/0x16
check_preemption_disabled+0xe5/0xf0
__this_cpu_preempt_check+0x13/0x20
rcu_force_quiescent_state.part.0+0x1c/0x170
rcu_force_quiescent_state+0x1e/0x30
rcu_torture_fqs+0xca/0x160
? rcu_torture_boost+0x430/0x430
kthread+0x192/0x1d0
? kthread_complete_and_exit+0x30/0x30
ret_from_fork+0x22/0x30
</TASK>
The problem is that rcu_force_quiescent_state() uses __this_cpu_read()
in preemptible code instead of the proper raw_cpu_read(). This commit
therefore changes __this_cpu_read() to raw_cpu_read().
Signed-off-by: Zqiang <qiang1.zhang@intel.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The unregister check could be incorrectly triggered if a netdev
changes its type after register. That is possible for a tun device
using TUNSETLINK ioctl, resulting in mctp unregister failing
and the netdev unregister waiting forever.
This was encountered by https://github.com/openthread/openthread/issues/8523
Neither check at register or unregister is required. They were added in
an attempt to track down mctp_ptr being set unexpectedly, which should
not happen in normal operation.
Fixes: 7b1871af75f3 ("mctp: Warn if pointer is set for a wrong dev type") Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Link: https://lore.kernel.org/r/20221215054933.2403401-1-matt@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
RFC1662 defines the start state for the crc16 FCS to be 0xffff, but
we're currently starting at zero.
This change uses the correct start state. We're only early in the
adoption for the serial binding, so there aren't yet any other users to
interface to.
Fixes: a0c2ccd9b5ad ("mctp: Add MCTP-over-serial transport binding") Reported-by: Harsh Tyagi <harshtya@google.com> Tested-by: Harsh Tyagi <harshtya@google.com> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Changheon Lee reported TCP socket leaks, with a nice repro.
It seems we leak TCP sockets with the following sequence:
1) SOF_TIMESTAMPING_TX_ACK is enabled on the socket.
Each ACK will cook an skb put in error queue, from __skb_tstamp_tx().
__skb_tstamp_tx() is using skb_clone(), unless
SOF_TIMESTAMPING_OPT_TSONLY was also requested.
2) If the application is also using MSG_ZEROCOPY, then we put in the
error queue cloned skbs that had a struct ubuf_info attached to them.
Whenever an struct ubuf_info is allocated, sock_zerocopy_alloc()
does a sock_hold().
As long as the cloned skbs are still in sk_error_queue,
socket refcount is kept elevated.
3) Application closes the socket, while error queue is not empty.
Since tcp_close() no longer purges the socket error queue,
we might end up with a TCP socket with at least one skb in
error queue keeping the socket alive forever.
This bug can be (ab)used to consume all kernel memory
and freeze the host.
We need to purge the error queue, with proper synchronization
against concurrent writers.
Fixes: 24bcbe1cc69f ("net: stream: don't purge sk_error_queue in sk_stream_kill_queues()") Reported-by: Changheon Lee <darklight2357@icloud.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
One of the error paths in rxrpc_do_sendmsg() doesn't unlock the call mutex
before returning. Fix it to do this.
Note that this still doesn't get rid of the checker warning:
../net/rxrpc/sendmsg.c:617:5: warning: context imbalance in 'rxrpc_do_sendmsg' - wrong count at exit
I think the interplay between the socket lock and the call's user_mutex may
be too complicated for checker to analyse, especially as
rxrpc_new_client_call_for_sendmsg(), which it calls, returns with the
call's user_mutex if successful but unconditionally drops the socket lock.
Fixes: e754eba685aa ("rxrpc: Provide a cmsg to specify the amount of Tx data for a call") Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
"You don't have to, providing a 32bit data chunk without TCF_EM_SIMPLE
set will simply result in allocating & copy. It's an optimization,
nothing more."
So if an ematch module provides ops->datalen that means it wants a
complex data structure (saved in its em->data) instead of a simple u32
value. We should simply reject such a combination, otherwise this u32
could be misinterpreted as a pointer.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-and-tested-by: syzbot+4caeae4c7103813598ae@syzkaller.appspotmail.com Reported-by: Jun Nie <jun.nie@linaro.org> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
If device_register() fails, it has two issues:
1. The name allocated by dev_set_name() is leaked.
2. The parent of device is not NULL, device_unregister() is called
in zynqmp_ipi_free_mboxes(), it will lead a kernel crash because
of removing not added device.
Call put_device() to give up the reference, so the name is freed in
kobject_cleanup(). Add device registered check in zynqmp_ipi_free_mboxes()
to avoid null-ptr-deref.
Some services explicitly return an error code in their response, but
others rely on the system controller to set a status in its status
register. The meaning of the bits varies based on what service is
requested, so pass it back up to the driver that requested the service
in the first place. The field in the message struct already existed, but
was unused until now.
If the system controller is busy, in which case we should never actually
be in the interrupt handler, or if the service fails the mailbox itself
should not be read. Callers should check the status before operating on
the response.
There's an existing, but unused, #define for the mailbox mask - but it
was incorrect. It was doing a GENMASK_ULL(32, 16) which should've just
been a GENMASK(31, 16), so fix that up and start using it.
Extending the tail can have some unexpected side effects if a program uses
a helper like BPF_FUNC_skb_pull_data to read partial content beyond the
head skb headlen when all the skbs in the gso frag_list are linear with no
head_frag -
kernel BUG at net/core/skbuff.c:4219!
pc : skb_segment+0xcf4/0xd2c
lr : skb_segment+0x63c/0xd2c
Call trace:
skb_segment+0xcf4/0xd2c
__udp_gso_segment+0xa4/0x544
udp4_ufo_fragment+0x184/0x1c0
inet_gso_segment+0x16c/0x3a4
skb_mac_gso_segment+0xd4/0x1b0
__skb_gso_segment+0xcc/0x12c
udp_rcv_segment+0x54/0x16c
udp_queue_rcv_skb+0x78/0x144
udp_unicast_rcv_skb+0x8c/0xa4
__udp4_lib_rcv+0x490/0x68c
udp_rcv+0x20/0x30
ip_protocol_deliver_rcu+0x1b0/0x33c
ip_local_deliver+0xd8/0x1f0
ip_rcv+0x98/0x1a4
deliver_ptype_list_skb+0x98/0x1ec
__netif_receive_skb_core+0x978/0xc60
Fix this by marking these skbs as GSO_DODGY so segmentation can handle
the tail updates accordingly.
Fixes: 3dcbdb134f32 ("net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list") Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com> Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/1671084718-24796-1-git-send-email-quic_subashab@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Take the instance lock around devlink_nl_fill() when dumping,
doit takes it already.
We are only dumping basic info so in the worst case we were risking
data races around the reload statistics. Until the big devlink mutex
was removed all relevant code was protected by it, so the missing
instance lock was not exposed.
The actual clock feeding into the Mali GPU on the MT8183 is from the
clock gate in the MFGCFG block, not CLK_TOP_MFGPLL_CK from the TOPCKGEN
block, which itself is simply a pass-through placeholder for the MFGPLL
in the APMIXEDSYS block.
Fix the hardware description with the correct clock reference.
Fixes: a8168cebf1bc ("arm64: dts: mt8183: Add node for the Mali GPU") Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Link: https://lore.kernel.org/r/20220927101128.44758-2-angelogioacchino.delregno@collabora.com Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Power reset maybe generate unexpected signal. In order to avoid
the glitch issue, we need to enable isolation first to guarantee the
stable signal when power reset is triggered.
Fixes: 59b644b01cf4 ("soc: mediatek: Add MediaTek SCPSYS power domains") Signed-off-by: Chun-Jie Chen <chun-jie.chen@mediatek.com> Signed-off-by: Allen-KH Cheng <allen-kh.cheng@mediatek.com> Reviewed-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Miles Chen <miles.chen@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20221014102029.1162-1-allen-kh.cheng@mediatek.com Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The commit mentioned below causes the ovs_flow_tbl_lookup() function
to be called with the masked key. However, it's supposed to be called
with the unmasked key. This due to the fact that the datapath supports
installing wider flows, and OVS relies on this behavior. For example
if ipv4(src=1.1.1.1/192.0.0.0, dst=1.1.1.2/192.0.0.0) exists, a wider
flow (smaller mask) of ipv4(src=192.1.1.1/128.0.0.0,dst=192.1.1.2/
128.0.0.0) is allowed to be added.
However, if we try to add a wildcard rule, the installation fails:
The reason is that the key used to determine if the flow is already
present in the system uses the original key ANDed with the mask.
This results in the IP address not being part of the (miniflow) key,
i.e., being substituted with an all-zero value. When doing the actual
lookup, this results in the key wrongfully matching the first flow,
and therefore the flow does not get installed.
This change reverses the commit below, but rather than having the key
on the stack, it's allocated.
Fixes: 190aa3e77880 ("openvswitch: Fix Frame-size larger than 1024 bytes warning.") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
$number + > bash means redirect FD $number, e.g. commonly
used 2> redirects stderr (fd 2). The test uses 8192> to
write the number 8192 to a file, this results in:
./devlink.sh: line 499: 8192: Bad file descriptor
Oddly the test also papers over this issue by checking
for failure (expecting an error rather than success)
so it passes, anyway.
Fixes: ff18176ad806 ("selftests: Add a test of large binary to devlink health test") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
This is the locking assert in devlink_region_snapshot_del(),
we're supposed to be holding the region->snapshot_lock here.
Fixes: 2dec18ad826f ("net: devlink: remove region snapshots list dependency on devlink->lock") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The default setting of end_time minus start_time is whole 1 second.
Thus, if it's not being configured in any GCL entry then it will be
staying at original 1 second.
This patch is changing the start_time and end_time to be end_time as
if setting zero will be having weird HW behavior where the gate will
not be fully closed.
Fixes: ec50a9d437f0 ("igc: Add support for taprio offloading") Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Qbv users can specify a cycle time that is not equal to the total GCL
intervals. Hence, recalculation is necessary here to exclude the time
interval that exceeds the cycle time. As those GCL which exceeds the
cycle time will be truncated.
According to IEEE Std. 802.1Q-2018 section 8.6.9.2, once the end of
the list is reached, it will switch to the END_OF_CYCLE state and
leave the gates in the same state until the next cycle is started.
Fixes: ec50a9d437f0 ("igc: Add support for taprio offloading") Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Introduce qbv_enable flag in igc_adapter struct to store the Qbv on/off.
So this allow the BaseTime to enroll with zero value.
Fixes: 61572d5f8f91 ("igc: Simplify TSN flags handling") Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Using the tc qdisc command, the user can set basetime to any value.
Checking should be done on the driver's side to prevent registering
basetime values that are less than zero.
Fixes: ec50a9d437f0 ("igc: Add support for taprio offloading") Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The I225 hardware has a limitation that packets can only be scheduled
in the [0, cycle-time] interval. So, scheduling a packet to the start
of the next cycle doesn't usually work.
To overcome this, we use the Transmit Descriptor first flag to indicates
that a packet should be the first packet (from a queue) in a cycle
according to the section 7.5.2.9.3.4 The First Packet on Each QBV Cycle
in Intel Discrete I225/6 User Manual.
But this only works if there was any packet from that queue during the
current cycle, to avoid this issue, we issue an empty packet if that's
not the case. Also require one more descriptor to be available, to take
into account the empty packet that might be issued.
Test Setup:
Talker: Use l2_tai to generate the launchtime into packet load.
Listener: Use timedump.c to compute the delta between packet arrival
and LaunchTime packet payload.
In the blamed commit, it was not noticed that one implementation of
chip->info->ops->phylink_get_caps(), called by mv88e6xxx_get_caps(),
may access hardware registers, and in doing so, it takes the
mv88e6xxx_reg_lock(). Namely, this is mv88e6352_phylink_get_caps().
This is a problem because mv88e6xxx_get_caps(), apart from being
a top-level function (method invoked by dsa_switch_ops), is now also
directly called from mv88e6xxx_setup_port(), which runs under the
mv88e6xxx_reg_lock() taken by mv88e6xxx_setup(). Therefore, when running
on mv88e6352, the reg_lock would be acquired a second time and the
system would deadlock on driver probe.
The things that mv88e6xxx_setup() can compete with in terms of register
access with are the IRQ handlers and MDIO bus operations registered by
mv88e6xxx_probe(). So there is a real need to acquire the register lock.
The register lock can, in principle, be dropped and re-acquired pretty
much at will within the driver, as long as no operations that involve
waiting for indirect access to complete (essentially, callers of
mv88e6xxx_smi_direct_wait() and mv88e6xxx_wait_mask()) are interrupted
with the lock released. However, I would guess that in mv88e6xxx_setup(),
the critical section is kept open for such a long time just in order to
optimize away multiple lock/unlock operations on the registers.
We could, in principle, drop the reg_lock right before the
mv88e6xxx_setup_port() -> mv88e6xxx_get_caps() call, and
re-acquire it immediately afterwards. But this would look ugly, because
mv88e6xxx_setup_port() would release a lock which it didn't acquire, but
the caller did.
A cleaner solution to this issue comes from the observation that struct
mv88e6xxxx_ops methods generally assume they are called with the
reg_lock already acquired. Whereas mv88e6352_phylink_get_caps() is more
the exception rather than the norm, in that it acquires the lock itself.
Let's enforce the same locking pattern/convention for
chip->info->ops->phylink_get_caps() as well, and make
mv88e6xxx_get_caps(), the top-level function, acquire the register lock
explicitly, for this one implementation that will access registers for
port 4 to work properly.
This means that mv88e6xxx_setup_port() will no longer call the top-level
function, but the low-level mv88e6xxx_ops method which expects the
correct calling context (register lock held).
Compared to chip->info->ops->phylink_get_caps(), mv88e6xxx_get_caps()
also fixes up the supported_interfaces bitmap for internal ports, since
that can be done generically and does not require per-switch knowledge.
That's code which will no longer execute, however mv88e6xxx_setup_port()
doesn't need that. It just needs to look at the mac_capabilities bitmap.
Fixes: cc1049ccee20 ("net: dsa: mv88e6xxx: fix speed setting for CPU/DSA ports") Reported-by: Maksim Kiselev <bigunclemax@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Maksim Kiselev <bigunclemax@gmail.com> Link: https://lore.kernel.org/r/20221214110120.3368472-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The problem occurs in probe process as follows:
r6040_init_one:
mdiobus_register
mdiobus_scan <- alloc and register phy_device,
the reference count of phy_device is 3
r6040_mii_probe
phy_connect <- connect to the first phy_device,
so the reference count of the first
phy_device is 4, others are 3
register_netdev <- fault inject succeeded, goto error handling path
// error handling path
err_out_mdio_unregister:
mdiobus_unregister(lp->mii_bus);
err_out_mdio:
mdiobus_free(lp->mii_bus); <- the reference count of the first
phy_device is 1, it is not released
and other phy_devices are released
// similarly, the remove process also has the same problem
The root cause is traced to the phy_device is not disconnected when
removes one r6040 device in r6040_remove_one() or on error handling path
after r6040_mii probed successfully. In r6040_mii_probe(), a net ethernet
device is connected to the first PHY device of mii_bus, in order to
notify the connected driver when the link status changes, which is the
default behavior of the PHY infrastructure to handle everything.
Therefore the phy_device should be disconnected when removes one r6040
device or on error handling path.
Fix it by adding phy_disconnect() when removes one r6040 device or on
error handling path after r6040_mii probed successfully.
Fixes: 3831861b4ad8 ("r6040: implement phylib") Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20221213125614.927754-1-lizetao1@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
There is a race resulting in alive SOCK_SEQPACKET socket
may change its state from TCP_ESTABLISHED to TCP_CLOSE:
unix_release_sock(peer) unix_dgram_sendmsg(sk)
sock_orphan(peer)
sock_set_flag(peer, SOCK_DEAD)
sock_alloc_send_pskb()
if !(sk->sk_shutdown & SEND_SHUTDOWN)
OK
if sock_flag(peer, SOCK_DEAD)
sk->sk_state = TCP_CLOSE
sk->sk_shutdown = SHUTDOWN_MASK
After that socket sk remains almost normal: it is able to connect, listen, accept
and recvmsg, while it can't sendmsg.
Since this is the only possibility for alive SOCK_SEQPACKET to change
the state in such way, we should better fix this strange and potentially
danger corner case.
Note, that we will return EPIPE here like this is normally done in sock_alloc_send_pskb().
Originally used ECONNREFUSED looks strange, since it's strange to return
a specific retval in dependence of race in kernel, when user can't affect on this.
Also, move TCP_CLOSE assignment for SOCK_DGRAM sockets under state lock
to fix race with unix_dgram_connect():
Fix a slab-out-of-bounds read that occurs in nla_put() called from
nfc_genl_send_target() when target->sensb_res_len, which is duplicated
from an nfc_target in pn533, is too large as the nfc_target is not
properly initialized and retains garbage values. Clear nfc_targets with
memset() before they are used.
Fixes: 673088fb42d0 ("NFC: pn533: Send ATR_REQ directly for active device detection") Fixes: 361f3cb7f9cf ("NFC: DEP link hook implementation for pn533") Signed-off-by: Minsuk Kang <linuxlovemin@yonsei.ac.kr> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20221214015139.119673-1-linuxlovemin@yonsei.ac.kr Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Before enetc_clean_rx_ring_xdp() calls xdp_do_redirect(), each software
BD in the RX ring between index orig_i and i can have one of 2 refcount
values on its page.
We are the owner of the current buffer that is being processed, so the
refcount will be at least 1.
If the current owner of the buffer at the diametrically opposed index
in the RX ring (i.o.w, the other half of this page) has not yet called
kfree(), this page's refcount could even be 2.
enetc_page_reusable() in enetc_flip_rx_buff() tests for the page
refcount against 1, and [ if it's 2 ] does not attempt to reuse it.
But if enetc_flip_rx_buff() is put after the xdp_do_redirect() call,
the page refcount can have one of 3 values. It can also be 0, if there
is no owner of the other page half, and xdp_do_redirect() for this
buffer ran so far that it triggered a flush of the devmap/cpumap bulk
queue, and the consumers of those bulk queues also freed the buffer,
all by the time xdp_do_redirect() returns the execution back to enetc.
This is the reason why enetc_flip_rx_buff() is called before
xdp_do_redirect(), but there is a big flaw with that reasoning:
enetc_flip_rx_buff() will set rx_swbd->page = NULL on both sides of the
enetc_page_reusable() branch, and if xdp_do_redirect() returns an error,
we call enetc_xdp_free(), which does not deal gracefully with that.
In fact, what happens is quite special. The page refcounts start as 1.
enetc_flip_rx_buff() figures they're reusable, transfers these
rx_swbd->page pointers to a different rx_swbd in enetc_reuse_page(), and
bumps the refcount to 2. When xdp_do_redirect() later returns an error,
we call the no-op enetc_xdp_free(), but we still haven't lost the
reference to that page. A copy of it is still at rx_ring->next_to_alloc,
but that has refcount 2 (and there are no concurrent owners of it in
flight, to drop the refcount). What really kills the system is when
we'll flip the rx_swbd->page the second time around. With an updated
refcount of 2, the page will not be reusable and we'll really leak it.
Then enetc_new_page() will have to allocate more pages, which will then
eventually leak again on further errors from xdp_do_redirect().
The problem, summarized, is that we zeroize rx_swbd->page before we're
completely done with it, and this makes it impossible for the error path
to do something with it.
Since the packet is potentially multi-buffer and therefore the
rx_swbd->page is potentially an array, manual passing of the old
pointers between enetc_flip_rx_buff() and enetc_xdp_free() is a bit
difficult.
For the sake of going with a simple solution, we accept the possibility
of racing with xdp_do_redirect(), and we move the flip procedure to
execute only on the redirect success path. By racing, I mean that the
page may be deemed as not reusable by enetc (having a refcount of 0),
but there will be no leak in that case, either.
Once we accept that, we have something better to do with buffers on
XDP_REDIRECT failure. Since we haven't performed half-page flipping yet,
we won't, either (and this way, we can avoid enetc_xdp_free()
completely, which gives the entire page to the slab allocator).
Instead, we'll call enetc_xdp_drop(), which will recycle this half of
the buffer back to the RX ring.
Fixes: 9d2b68cc108d ("net: enetc: add support for XDP_REDIRECT") Suggested-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20221213001908.2347046-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The patch adding support for dynamically allocated arrays accidentally
dropped the line setting ctrl->is_new to 1, thus new string values were
always ignored.
Fixes: fb582cba4492 ("media: v4l2-ctrls: add support for dynamically allocated arrays.") Reported-by: Alice Yuan <alice.yuan@nxp.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In [0], we added the ability to bpf_prog_attach LSM programs to cgroups,
but in our validation to make sure the prog is meant to be attached to
BPF_LSM_CGROUP, we return too early if the check fails. This results in
lack of decrementing prog's refcnt (through bpf_prog_put)
leaving the LSM program alive past the point of the expected lifecycle.
This fix allows for the decrement to take place.
BPF selftests require CONFIG_FUNCTION_ERROR_INJECTION to work. However,
CONFIG_FUNCTION_ERROR_INJECTION is no longer 'y' by default after recent
changes. As a result, we are seeing errors like the following from BPF CI:
bpf_testmod_test_read() is not modifiable
__x64_sys_setdomainname is not sleepable
__x64_sys_getpgid is not sleepable
Fix this by explicitly selecting CONFIG_FUNCTION_ERROR_INJECTION in the
selftest config.
Fixes: a4412fdd49dc ("error-injection: Add prompt for function error injection") Reported-by: Daniel Müller <deso@posteo.net> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Müller <deso@posteo.net> Link: https://lore.kernel.org/bpf/20221213220500.3427947-1-song@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 3bc5e683c67d ("bfq: Split shared queues on move between cgroups")
changes that move process to a new cgroup will allocate a new bfqq to
use, however, the old bfqq and new bfqq can point to the same bic:
1) Initial state, two process with io in the same cgroup.
Process 1 Process 2
(BIC1) (BIC2)
| Λ | Λ
| | | |
V | V |
bfqq1 bfqq2
2) bfqq1 is merged to bfqq2.
Process 1 Process 2
(BIC1) (BIC2)
| |
\-------------\|
V
bfqq1 bfqq2(coop)
3) Process 1 exit, then issue new io(denoce IOA) from Process 2.
(BIC2)
| Λ
| |
V |
bfqq2(coop)
4) Before IOA is completed, move Process 2 to another cgroup and issue io.
Process 2
(BIC2)
Λ
|\--------------\
| V
bfqq2 bfqq3
Now that BIC2 points to bfqq3, while bfqq2 and bfqq3 both point to BIC2.
If all the requests are completed, and Process 2 exit, BIC2 will be
freed while there is no guarantee that bfqq2 will be freed before BIC2.
Fix the problem by clearing bfqq->bic while bfqq is detached from bic.
Fixes: 3bc5e683c67d ("bfq: Split shared queues on move between cgroups") Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221214030430.3304151-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
It is not allowed to call kfree_skb() or consume_skb() from hardware
interrupt context or with hardware interrupts being disabled.
skb_queue_purge() is called under spin_lock_irqsave() in handle_dmsg()
and hfcm_l1callback(), kfree_skb() is called in them, to fix this, use
skb_queue_splice_init() to move the dch->squeue to a free queue, also
enqueue the tx_skb and rx_skb, at last calling __skb_queue_purge() to
free the SKBs afer unlock.
Fixes: af69fb3a8ffa ("Add mISDN HFC multiport driver") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
It is not allowed to call kfree_skb() or consume_skb() from hardware
interrupt context or with hardware interrupts being disabled.
skb_queue_purge() is called under spin_lock_irqsave() in hfcpci_l2l1D(),
kfree_skb() is called in it, to fix this, use skb_queue_splice_init()
to move the dch->squeue to a free queue, also enqueue the tx_skb and
rx_skb, at last calling __skb_queue_purge() to free the SKBs afer unlock.
Fixes: 1700fe1a10dc ("Add mISDN HFC PCI driver") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
It is not allowed to call kfree_skb() or consume_skb() from hardware
interrupt context or with hardware interrupts being disabled.
It should use dev_kfree_skb_irq() or dev_consume_skb_irq() instead.
The difference between them is free reason, dev_kfree_skb_irq() means
the SKB is dropped in error and dev_consume_skb_irq() means the SKB
is consumed in normal.
skb_queue_purge() is called under spin_lock_irqsave() in hfcusb_l2l1D(),
kfree_skb() is called in it, to fix this, use skb_queue_splice_init()
to move the dch->squeue to a free queue, also enqueue the tx_skb and
rx_skb, at last calling __skb_queue_purge() to free the SKBs afer unlock.
In tx_iso_complete(), dev_kfree_skb() is called to consume the transmitted
SKB, so replace it with dev_consume_skb_irq().
Fixes: 69f52adb2d53 ("mISDN: Add HFC USB driver") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently, when a high prio link enslaved, or when current link down,
the high prio port could be selected. But when high prio link up, the
new active slave reselection is not triggered. Fix it by checking link's
prio when getting up. Making the do_failover after looping all slaves as
there may be multi high prio slaves up.
Reported-by: Liang Li <liali@redhat.com> Fixes: 0a2ff7cc8ad4 ("Bonding: add per-port priority for failover re-selection") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
There is one direct accesses to bond->curr_active_slave in
bond_miimon_commit(). Protected it by rcu_access_pointer()
since the later of this function also use this one.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: e95cc44763a4 ("bonding: do failover when high prio link up") Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently macsec offload selection update routine accesses
the net device prior to holding the relevant lock.
Fix by holding the lock prior to the device access.
Fixes: dcb780fb2795 ("net: macsec: add nla support for changing the offloading selection") Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Link: https://lore.kernel.org/r/20221211075532.28099-1-ehakim@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
On error situation `clp->cl_cb_conn.cb_xprt` should not be given
a reference to the xprt otherwise both client cleanup and the
error handling path of the caller call to put it. Better to
delay handing over the reference to a later branch.
The pic32_rtc_enable(pdata, 0) and clk_disable_unprepare(pdata->clk)
should be called in the error handling of devm_rtc_allocate_device(),
so we should move devm_rtc_allocate_device earlier in pic32_rtc_probe()
to fix it.
In case of error, the function devm_regmap_init_i2c() returns
ERR_PTR() and never returns NULL. The NULL test in the return
value check should be replaced with IS_ERR().
Fixes: 6b149f3310a4 ("mfd: pm8008: Add driver for QCOM PM8008 PMIC") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Guru Das Srinagesh <gurus@codeaurora.org> Signed-off-by: Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20221125073626.1868229-1-yangyingliang@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Since commit 856c288b0039 ("ARM: Use do_kernel_power_off()"), the
function axp20x_power_off() now runs inside a RCU read-side critical
section, so it is not allowed to call msleep(). Use mdelay() instead.
Fixes: 856c288b0039 ("ARM: Use do_kernel_power_off()") Signed-off-by: Samuel Holland <samuel@sholland.org> Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20221105212909.6526-1-samuel@sholland.org Signed-off-by: Sasha Levin <sashal@kernel.org>
The PWM node is not a separate device and is expected to be part of parent
SPMI PMIC node, thus it obtains the address space from the parent. One IO
address in "reg" is also not correct description because LPG block maps to
several regions.
Fixes: 3f5117be9584 ("dt-bindings: mfd: convert to yaml Qualcomm SPMI PMIC") Suggested-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Signed-off-by: Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20220928000517.228382-2-bryan.odonoghue@linaro.org Signed-off-by: Sasha Levin <sashal@kernel.org>
rtas-error-log-max is not the name of an RTAS function, so rtas_token()
is not the appropriate API for retrieving its value. We already have
rtas_get_error_log_max() which returns a sensible value if the property
is absent for any reason, so use that instead.
Fixes: 8d633291b4fc ("powerpc/eeh: pseries platform EEH error log retrieval") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
[mpe: Drop no-longer possible error handling as noticed by ajd] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221118150751.469393-6-nathanl@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>