This change does not have any functional effect. The switch works just
fine without this patch as it has full access to all the addresses
on the bus. This is simply a clean-up to set the node name address
and reg address to the same value.
Fixes: 15b43e497ffd ("ARM: dts: imx6dl-yapp4: Use correct pseudo PHY address for the switch") Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When the development chip ROM was added, the "direct-mapped" compatible
value was already obsolete. In addition, the device node lacked the
accompanying "probe-type" property, causing the old physmap_of_core
driver to fall back to trying all available probe types.
Unfortunately this fallback was lost when the DT and pdata cases were
merged.
Fix this by using the modern "mtd-rom" compatible value instead.
Avoid the following warnings by removing the ena_select_queue() function
and rely on the net core to do the queue selection, The issue happen
when an skb received from an interface with more queues than ena is
forwarded to the ena interface.
[ 1176.159959] eth0 selects TX queue 11, but real number of TX queues is 8
[ 1176.863976] eth0 selects TX queue 14, but real number of TX queues is 8
[ 1180.767877] eth0 selects TX queue 14, but real number of TX queues is 8
[ 1188.703742] eth0 selects TX queue 14, but real number of TX queues is 8
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Kamal Heib <kheib@redhat.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
On many systems that have an AMD IOMMU the following sequence of
warnings is observed during bootup.
```
pci 0000:00:00.2 can't derive routing for PCI INT A
pci 0000:00:00.2: PCI INT A: not connected
```
This series of events happens because of the IOMMU initialization
sequence order and the lack of _PRT entries for the IOMMU.
During initialization the IOMMU driver first enables the PCI device
using pci_enable_device(). This will call acpi_pci_irq_enable()
which will check if the interrupt is declared in a PCI routing table
(_PRT) entry. According to the PCI spec [1] these routing entries
are only required under PCI root bridges:
The _PRT object is required under all PCI root bridges
The IOMMU is directly connected to the root complex, so there is no
parent bridge to look for a _PRT entry. The first warning is emitted
since no entry could be found in the hierarchy. The second warning is
then emitted because the interrupt hasn't yet been configured to any
value. The pin was configured in pci_read_irq() but the byte in
PCI_INTERRUPT_LINE return 0xff which means "Unknown".
After that sequence of events pci_enable_msi() is called and this
will allocate an interrupt.
That is both of these warnings are totally harmless because the IOMMU
uses MSI for interrupts. To avoid even trying to probe for a _PRT
entry mark the IOMMU as IRQ managed. This avoids both warnings.
Update the architecture dependency to be the generic Tegra
because the driver works on the four latest Tegra generations
not just Tegra210, if you build a kernel with a specific
ARCH_TEGRA_xxx_SOC option that excludes Tegra210 you don't get
this driver.
Fixes: 46a88534afb59 ("bus: Add support for Tegra ACONNECT") Signed-off-by: Peter Robinson <pbrobinson@gmail.com> Cc: Jon Hunter <jonathanh@nvidia.com> Cc: Thierry Reding <treding@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix this by freeing the CPU idle device after unregistering it.
Fixes: 3d339dcbb56d ("cpuidle / ACPI : move cpuidle_device field out of the acpi_processor_power structure") Signed-off-by: Armin Wolf <W_Armin@gmx.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
wilc_netdev_cleanup currently triggers a KASAN warning, which can be
observed on interface registration error path, or simply by
removing the module/unbinding device from driver:
==================================================================
BUG: KASAN: slab-use-after-free in wilc_netdev_cleanup+0x508/0x5cc
Read of size 4 at addr c54d1ce8 by task sh/86
CPU: 0 PID: 86 Comm: sh Not tainted 6.8.0-rc1+ #117
Hardware name: Atmel SAMA5
unwind_backtrace from show_stack+0x18/0x1c
show_stack from dump_stack_lvl+0x34/0x58
dump_stack_lvl from print_report+0x154/0x500
print_report from kasan_report+0xac/0xd8
kasan_report from wilc_netdev_cleanup+0x508/0x5cc
wilc_netdev_cleanup from wilc_bus_remove+0xc8/0xec
wilc_bus_remove from spi_remove+0x8c/0xac
spi_remove from device_release_driver_internal+0x434/0x5f8
device_release_driver_internal from unbind_store+0xbc/0x108
unbind_store from kernfs_fop_write_iter+0x398/0x584
kernfs_fop_write_iter from vfs_write+0x728/0xf88
vfs_write from ksys_write+0x110/0x1e4
ksys_write from ret_fast_syscall+0x0/0x1c
David Mosberger-Tan initial investigation [1] showed that this
use-after-free is due to netdevice unregistration during vif list
traversal. When unregistering a net device, since the needs_free_netdev has
been set to true during registration, the netdevice object is also freed,
and as a consequence, the corresponding vif object too, since it is
attached to it as private netdevice data. The next occurrence of the loop
then tries to access freed vif pointer to the list to move forward in the
list.
Fix this use-after-free thanks to two mechanisms:
- navigate in the list with list_for_each_entry_safe, which allows to
safely modify the list as we go through each element. For each element,
remove it from the list with list_del_rcu
- make sure to wait for RCU grace period end after each vif removal to make
sure it is safe to free the corresponding vif too (through
unregister_netdev)
Since we are in a RCU "modifier" path (not a "reader" path), and because
such path is expected not to be concurrent to any other modifier (we are
using the vif_mutex lock), we do not need to use RCU list API, that's why
we can benefit from list_for_each_entry_safe.
Currently tracing is supposed not to allow for bpf_spin_{lock,unlock}()
helper calls. This is to prevent deadlock for the following cases:
- there is a prog (prog-A) calling bpf_spin_{lock,unlock}().
- there is a tracing program (prog-B), e.g., fentry, attached
to bpf_spin_lock() and/or bpf_spin_unlock().
- prog-B calls bpf_spin_{lock,unlock}().
For such a case, when prog-A calls bpf_spin_{lock,unlock}(),
a deadlock will happen.
The related source codes are below in kernel/bpf/helpers.c:
notrace BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock)
notrace BPF_CALL_1(bpf_spin_unlock, struct bpf_spin_lock *, lock)
notrace is supposed to prevent fentry prog from attaching to
bpf_spin_{lock,unlock}().
But actually this is not the case and fentry prog can successfully
attached to bpf_spin_lock(). Siddharth Chintamaneni reported
the issue in [1]. The following is the macro definition for
above BPF_CALL_1:
#define BPF_CALL_x(x, name, ...) \
static __always_inline \
u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__)); \
typedef u64 (*btf_##name)(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__)); \
u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__)); \
u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__)) \
{ \
return ((btf_##name)____##name)(__BPF_MAP(x,__BPF_CAST,__BPF_N,__VA_ARGS__));\
} \
static __always_inline \
u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__))
The notrace attribute is actually applied to the static always_inline function
____bpf_spin_{lock,unlock}(). The actual callback function
bpf_spin_{lock,unlock}() is not marked with notrace, hence
allowing fentry prog to attach to two helpers, and this
may cause the above mentioned deadlock. Siddharth Chintamaneni
actually has a reproducer in [2].
To fix the issue, a new macro NOTRACE_BPF_CALL_1 is introduced which
will add notrace attribute to the original function instead of
the hidden always_inline function and this fixed the problem.
This fixes:
arch/arm64/boot/dts/mediatek/mt7622-rfb1.dtb: /: memory@40000000: 'device_type' is a required property
from schema $id: http://devicetree.org/schemas/memory.yaml#
arch/arm64/boot/dts/mediatek/mt7622-bananapi-bpi-r64.dtb: /: memory@40000000: 'device_type' is a required property
from schema $id: http://devicetree.org/schemas/memory.yaml#
Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20240122132357.31264-1-zajec5@gmail.com Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In the for statement of lbs_allocate_cmd_buffer(), if the allocation of
cmdarray[i].cmdbuf fails, both cmdarray and cmdarray[i].cmdbuf needs to
be freed. Otherwise, there will be memleaks in lbs_allocate_cmd_buffer().
Fixes: 876c9d3aeb98 ("[PATCH] Marvell Libertas 8388 802.11b/g USB driver") Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://msgid.link/20240126075336.2825608-1-alexious@zju.edu.cn Signed-off-by: Sasha Levin <sashal@kernel.org>
EWRD ACPI table contains up to 3 additional sar profiles.
According to the BIOS spec, the table contains a n_profile
variable indicating how many additional profiles exist in the
table.
Currently we check that n_profiles is not <= 0.
But according to the BIOS spec, 0 is a valid value,
and it can't be < 0 anyway because we receive that from ACPI as
an unsigned integer.
Fixes: 39c1a9728f93 ("iwlwifi: refactor the SAR tables from mvm to acpi") Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Reviewed-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://msgid.link/20240129211905.448ea2f40814.Iffd2aadf8e8693e6cb599bee0406a800a0c1e081@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The ath9k_wmi_event_tasklet() used in ath9k_htc assumes that all the data
structures have been fully initialised by the time it runs. However, because of
the order in which things are initialised, this is not guaranteed to be the
case, because the device is exposed to the USB subsystem before the ath9k driver
initialisation is completed.
We already committed a partial fix for this in commit: 8b3046abc99e ("ath9k_htc: fix NULL pointer dereference at ath9k_htc_tx_get_packet()")
However, that commit only aborted the WMI_TXSTATUS_EVENTID command in the event
tasklet, pairing it with an "initialisation complete" bit in the TX struct. It
seems syzbot managed to trigger the race for one of the other commands as well,
so let's just move the existing synchronisation bit to cover the whole
tasklet (setting it at the end of ath9k_htc_probe_device() instead of inside
ath9k_tx_init()).
There exists the following warning when building bpftool:
CC prog.o
prog.c: In function ‘profile_open_perf_events’:
prog.c:2301:24: warning: ‘calloc’ sizes specified with ‘sizeof’ in the earlier argument and not in the later argument [-Wcalloc-transposed-args]
2301 | sizeof(int), obj->rodata->num_cpu * obj->rodata->num_metric);
| ^~~
prog.c:2301:24: note: earlier argument should specify number of elements, later size of each element
Tested with the latest upstream GCC which contains a new warning option
-Wcalloc-transposed-args. The first argument to calloc is documented to
be number of elements in array, while the second argument is size of each
element, just switch the first and second arguments of calloc() to silence
the build warning, compile tested only.
debugfs_create_dir() returns ERR_PTR and never return NULL.
As Russell suggested, this patch removes the error checking for
debugfs_create_dir(). This is because the DebugFS kernel API is developed
in a way that the caller can safely ignore the errors that occur during
the creation of DebugFS nodes. The debugfs APIs have a IS_ERR() judge in
start_creating() which can handle it gracefully. So these checks are
unnecessary.
Fixes: 5e6e3a92b9a4 ("wireless: mwifiex: initial commit for Marvell mwifiex driver") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://msgid.link/20230903030216.1509013-3-ruanjinjie@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Adding then removing a second vif currently makes the first vif not working
anymore. This is visible for example when we have a first interface
connected to some access point:
- create a wpa_supplicant.conf with some AP credentials
- wpa_supplicant -Dnl80211 -c /etc/wpa_supplicant.conf -i wlan0
- dhclient wlan0
- iw phy phy0 interface add wlan1 type managed
- iw dev wlan1 del
wlan0 does not manage properly traffic anymore (eg: ping not working)
This is due to vif mode being incorrectly reconfigured with some default
values in del_virtual_intf, affecting by default first vif.
Prevent first vif from being affected on second vif removal by removing vif
mode change command in del_virtual_intf
Fixes: 9bc061e88054 ("staging: wilc1000: added support to dynamically add/remove interfaces") Signed-off-by: Ajay Singh <ajay.kathat@microchip.com> Co-developed-by: Alexis Lothoré <alexis.lothore@bootlin.com> Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://msgid.link/20240115-wilc_1000_fixes-v1-5-54d29463a738@bootlin.com Signed-off-by: Sasha Levin <sashal@kernel.org>
The workqueue might still be running, when the driver is stopped. To
avoid a use-after-free, call cancel_work_sync() in rtl8xxxu_stop().
Fixes: e542e66b7c2e ("rtl8xxxu: add bluetooth co-existence support for single antenna") Signed-off-by: Martin Kaistra <martin.kaistra@linutronix.de> Reviewed-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://msgid.link/20240111163628.320697-2-martin.kaistra@linutronix.de Signed-off-by: Sasha Levin <sashal@kernel.org>
With lockdep enabled, calls to the connect function from cfg802.11 layer
lead to the following warning:
=============================
WARNING: suspicious RCU usage
6.7.0-rc1-wt+ #333 Not tainted
-----------------------------
drivers/net/wireless/microchip/wilc1000/hif.c:386
suspicious rcu_dereference_check() usage!
[...]
stack backtrace:
CPU: 0 PID: 100 Comm: wpa_supplicant Not tainted 6.7.0-rc1-wt+ #333
Hardware name: Atmel SAMA5
unwind_backtrace from show_stack+0x18/0x1c
show_stack from dump_stack_lvl+0x34/0x48
dump_stack_lvl from wilc_parse_join_bss_param+0x7dc/0x7f4
wilc_parse_join_bss_param from connect+0x2c4/0x648
connect from cfg80211_connect+0x30c/0xb74
cfg80211_connect from nl80211_connect+0x860/0xa94
nl80211_connect from genl_rcv_msg+0x3fc/0x59c
genl_rcv_msg from netlink_rcv_skb+0xd0/0x1f8
netlink_rcv_skb from genl_rcv+0x2c/0x3c
genl_rcv from netlink_unicast+0x3b0/0x550
netlink_unicast from netlink_sendmsg+0x368/0x688
netlink_sendmsg from ____sys_sendmsg+0x190/0x430
____sys_sendmsg from ___sys_sendmsg+0x110/0x158
___sys_sendmsg from sys_sendmsg+0xe8/0x150
sys_sendmsg from ret_fast_syscall+0x0/0x1c
This warning is emitted because in the connect path, when trying to parse
target BSS parameters, we dereference a RCU pointer whithout being in RCU
critical section.
Fix RCU dereference usage by moving it to a RCU read critical section. To
avoid wrapping the whole wilc_parse_join_bss_param under the critical
section, just use the critical section to copy ies data
Fixes: c460495ee072 ("staging: wilc1000: fix incorrent type in initializer") Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://msgid.link/20240105075733.36331-3-alexis.lothore@bootlin.com Signed-off-by: Sasha Levin <sashal@kernel.org>
bcm4331 seems to not function correctly with QoS support. This may be due
to issues with currently available firmware or potentially a device
specific issue.
When queues that are not of the default "best effort" priority are
selected, traffic appears to not transmit out of the hardware while no
errors are returned. This behavior is present among all the other priority
queues: video, voice, and background. While this can be worked around by
setting a kernel parameter, the default behavior is problematic for most
users and may be difficult to debug. This patch offers a working out-of-box
experience for bcm4331 users.
Log of the issue (using ssh low-priority traffic as an example):
ssh -T -vvvv git@github.com
OpenSSH_9.6p1, OpenSSL 3.0.12 24 Oct 2023
debug1: Reading configuration data /etc/ssh/ssh_config
debug2: checking match for 'host * exec "/nix/store/q1c2flcykgr4wwg5a6h450hxbk4ch589-bash-5.2-p15/bin/bash -c '/nix/store/c015armnkhr6v18za0rypm7sh1i8js8w-gnupg-2.4.1/bin/gpg-connect-agent --quiet updatestartuptty /bye >/dev/null 2>&1'"' host github.com originally github.com
debug3: /etc/ssh/ssh_config line 5: matched 'host "github.com"'
debug1: Executing command: '/nix/store/q1c2flcykgr4wwg5a6h450hxbk4ch589-bash-5.2-p15/bin/bash -c '/nix/store/c015armnkhr6v18za0rypm7sh1i8js8w-gnupg-2.4.1/bin/gpg-connect-agent --quiet updatestartuptty /bye >/dev/null 2>&1''
debug3: command returned status 0
debug3: /etc/ssh/ssh_config line 5: matched 'exec "/nix/store/q1c2flcykgr4wwg5a6h450hxbk4ch589-bash-5.2-p15/bin/bash -c '/nix/store/c015armnkhr6v18za0r"'
debug2: match found
debug1: /etc/ssh/ssh_config line 9: Applying options for *
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts' -> '/home/binary-eater/.ssh/known_hosts'
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts2' -> '/home/binary-eater/.ssh/known_hosts2'
debug2: resolving "github.com" port 22
debug3: resolve_host: lookup github.com:22
debug3: channel_clear_timeouts: clearing
debug3: ssh_connect_direct: entering
debug1: Connecting to github.com [192.30.255.113] port 22.
debug3: set_sock_tos: set socket 3 IP_TOS 0x48
When QoS is disabled, the queue priority value will not map to the correct
ieee80211 queue since there is only one queue. Stop queue 0 when QoS is
disabled to prevent trying to stop a non-existent queue and failing to stop
the actual queue instantiated.
Fixes: bad691946966 ("b43: avoid packet losses in the dma worker code.") Signed-off-by: Rahul Rameshbabu <sergeantsagara@protonmail.com> Reviewed-by: Julian Calaby <julian.calaby@gmail.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://msgid.link/20231231050300.122806-4-sergeantsagara@protonmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
When QoS is disabled, the queue priority value will not map to the correct
ieee80211 queue since there is only one queue. Stop/wake queue 0 when QoS
is disabled to prevent trying to stop/wake a non-existent queue and failing
to stop/wake the actual queue instantiated.
Fixes: 5100d5ac81b9 ("b43: Add PIO support for PCMCIA devices") Signed-off-by: Rahul Rameshbabu <sergeantsagara@protonmail.com> Reviewed-by: Julian Calaby <julian.calaby@gmail.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://msgid.link/20231231050300.122806-3-sergeantsagara@protonmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
When QoS is disabled, the queue priority value will not map to the correct
ieee80211 queue since there is only one queue. Stop/wake queue 0 when QoS
is disabled to prevent trying to stop/wake a non-existent queue and failing
to stop/wake the actual queue instantiated.
We should check whether the WMI_TLV_TAG_STRUCT_MGMT_TX_COMPL_EVENT tlv is
present before accessing it, otherwise a null pointer deference error will
occur.
Fixes: dc405152bb64 ("ath10k: handle mgmt tx completion event") Signed-off-by: Xingyuan Mo <hdthky0@gmail.com> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://msgid.link/20231208043433.271449-1-hdthky0@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
So far, get_device_system_crosststamp() unconditionally passes
system_counterval.cycles to timekeeping_cycles_to_ns(). But when
interpolating system time (do_interp == true), system_counterval.cycles is
before tkr_mono.cycle_last, contrary to the timekeeping_cycles_to_ns()
expectations.
On x86, CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE will mitigate on
interpolating, setting delta to 0. With delta == 0, xtstamp->sys_monoraw
and xtstamp->sys_realtime are then set to the last update time, as
implicitly expected by adjust_historical_crosststamp(). On other
architectures, the resulting nonsense xtstamp->sys_monoraw and
xtstamp->sys_realtime corrupt the xtstamp (ts) adjustment in
adjust_historical_crosststamp().
Fix this by deriving xtstamp->sys_monoraw and xtstamp->sys_realtime from
the last update time when interpolating, by using the local variable
"cycles". The local variable already has the right value when
interpolating, unlike system_counterval.cycles.
Fixes: 2c756feb18d9 ("time: Add history to cross timestamp interface supporting slower devices") Signed-off-by: Peter Hilber <peter.hilber@opensynergy.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/r/20231218073849.35294-4-peter.hilber@opensynergy.com Signed-off-by: Sasha Levin <sashal@kernel.org>
The cycle_between() helper checks if parameter test is in the open interval
(before, after). Colloquially speaking, this also applies to the counter
wrap-around special case before > after. get_device_system_crosststamp()
currently uses cycle_between() at the first call site to decide whether to
interpolate for older counter readings.
get_device_system_crosststamp() has the following problem with
cycle_between() testing against an open interval: Assume that, by chance,
cycles == tk->tkr_mono.cycle_last (in the following, "cycle_last" for
brevity). Then, cycle_between() at the first call site, with effective
argument values cycle_between(cycle_last, cycles, now), returns false,
enabling interpolation. During interpolation,
get_device_system_crosststamp() will then call cycle_between() at the
second call site (if a history_begin was supplied). The effective argument
values are cycle_between(history_begin->cycles, cycles, cycles), since
system_counterval.cycles == interval_start == cycles, per the assumption.
Due to the test against the open interval, cycle_between() returns false
again. This causes get_device_system_crosststamp() to return -EINVAL.
This failure should be avoided, since get_device_system_crosststamp() works
both when cycles follows cycle_last (no interpolation), and when cycles
precedes cycle_last (interpolation). For the case cycles == cycle_last,
interpolation is actually unneeded.
Fix this by changing cycle_between() into timestamp_in_interval(), which
now checks against the closed interval, rather than the open interval.
This changes the get_device_system_crosststamp() behavior for three corner
cases:
1. Bypass interpolation in the case cycles == tk->tkr_mono.cycle_last,
fixing the problem described above.
2. At the first timestamp_in_interval() call site, cycles == now no longer
causes failure.
3. At the second timestamp_in_interval() call site, history_begin->cycles
== system_counterval.cycles no longer causes failure.
adjust_historical_crosststamp() also works for this corner case,
where partial_history_cycles == total_history_cycles.
These behavioral changes should not cause any problems.
Fixes: 2c756feb18d9 ("time: Add history to cross timestamp interface supporting slower devices") Signed-off-by: Peter Hilber <peter.hilber@opensynergy.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20231218073849.35294-3-peter.hilber@opensynergy.com Signed-off-by: Sasha Levin <sashal@kernel.org>
This patch is against CVE-2023-6270. The description of cve is:
A flaw was found in the ATA over Ethernet (AoE) driver in the Linux
kernel. The aoecmd_cfg_pkts() function improperly updates the refcnt on
`struct net_device`, and a use-after-free can be triggered by racing
between the free on the struct and the access through the `skbtxq`
global queue. This could lead to a denial of service condition or
potential code execution.
In aoecmd_cfg_pkts(), it always calls dev_put(ifp) when skb initial
code is finished. But the net_device ifp will still be used in
later tx()->dev_queue_xmit() in kthread. Which means that the
dev_put(ifp) should NOT be called in the success path of skb
initial code in aoecmd_cfg_pkts(). Otherwise tx() may run into
use-after-free because the net_device is freed.
This patch removed the dev_put(ifp) in the success path in
aoecmd_cfg_pkts(), and added dev_put() after skb xmit in tx().
The raid should not be opened anymore when it is about to be stopped.
However, other processes can open it again if the flag MD_CLOSING is
cleared before exiting. From now on, this flag will not be cleared when
the raid will be stopped.
Fixes: 065e519e71b2 ("md: MD_CLOSING needs to be cleared after called md_set_readonly or do_md_stop") Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240226031444.3606764-6-linan666@huaweicloud.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Implement the ->set_read_only method instead of parsing the actual
ioctl command.
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of: 9674f54e41ff ("md: Don't clear MD_CLOSING when the raid is about to stop") Signed-off-by: Sasha Levin <sashal@kernel.org>
Add a new method to allow for driver-specific processing when setting or
clearing the block device read-only state. This allows to replace the
cumbersome and error-prone override of the whole ioctl implementation.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of: 9674f54e41ff ("md: Don't clear MD_CLOSING when the raid is about to stop") Signed-off-by: Sasha Levin <sashal@kernel.org>
A while ago, we changed the way that select() and poll() preallocate
a temporary buffer just under the size of the static warning limit of
1024 bytes, as clang was frequently going slightly above that limit.
The warnings have recently returned and I took another look. As it turns
out, clang is not actually inherently worse at reserving stack space,
it just happens to inline do_select() into core_sys_select(), while gcc
never inlines it.
Annotate do_select() to never be inlined and in turn remove the special
case for the allocation size. This should give the same behavior for
both clang and gcc all the time and once more avoids those warnings.
Bytes 18-19 of 20 are uninitialized
Memory access of size 20 starts at ffff888128a46380
Data copied to user address 0000000020000240"
Per Chuck Lever's suggestion, use kzalloc() instead of kmalloc() to
solve the problem.
Fixes: 990d6c2d7aee ("vfs: Add name to file handle conversion support") Suggested-by: Chuck Lever III <chuck.lever@oracle.com> Reported-and-tested-by: <syzbot+09b349b3066c2e0b1e96@syzkaller.appspotmail.com> Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Link: https://lore.kernel.org/r/20240119153906.4367-1-n.zhandarovich@fintech.ru Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
x86/paravirt: Fix build due to __text_gen_insn() backport
The Link tag has all the details but basically due to missing upstream
commits, the header which contains __text_gen_insn() is not in the
includes in paravirt.c, leading to:
arch/x86/kernel/paravirt.c: In function 'paravirt_patch_call':
arch/x86/kernel/paravirt.c:65:9: error: implicit declaration of function '__text_gen_insn' \
[-Werror=implicit-function-declaration]
65 | __text_gen_insn(insn_buff, CALL_INSN_OPCODE,
| ^~~~~~~~~~~~~~~
When resetting the bus after a gap count error, use a long rather than
short bus reset.
IEEE 1394-1995 uses only long bus resets. IEEE 1394a adds the option of
short bus resets. When video or audio transmission is in progress and a
device is hot-plugged elsewhere on the bus, the resulting bus reset can
cause video frame drops or audio dropouts. Short bus resets reduce or
eliminate this problem. Accordingly, short bus resets are almost always
preferred.
However, on a mixed 1394/1394a bus, a short bus reset can trigger an
immediate additional bus reset. This double bus reset can be interpreted
differently by different nodes on the bus, resulting in an inconsistent gap
count after the bus reset. An inconsistent gap count will cause another bus
reset, leading to a neverending bus reset loop. This only happens for some
bus topologies, not for all mixed 1394/1394a buses.
By instead sending a long bus reset after a gap count inconsistency, we
avoid the doubled bus reset, restoring the bus to normal operation.
During our fuzz testing of the connection and disconnection process at the
RFCOMM layer, we discovered this bug. By comparing the packets from a
normal connection and disconnection process with the testcase that
triggered a KASAN report. We analyzed the cause of this bug as follows:
1. In the packets captured during a normal connection, the host sends a
`Read Encryption Key Size` type of `HCI_CMD` packet
(Command Opcode: 0x1408) to the controller to inquire the length of
encryption key.After receiving this packet, the controller immediately
replies with a Command Completepacket (Event Code: 0x0e) to return the
Encryption Key Size.
2. In our fuzz test case, the timing of the controller's response to this
packet was delayed to an unexpected point: after the RFCOMM and L2CAP
layers had disconnected but before the HCI layer had disconnected.
3. After receiving the Encryption Key Size Response at the time described
in point 2, the host still called the rfcomm_check_security function.
However, by this time `struct l2cap_conn *conn = l2cap_pi(sk)->chan->conn;`
had already been released, and when the function executed
`return hci_conn_security(conn->hcon, d->sec_level, auth_type, d->out);`,
specifically when accessing `conn->hcon`, a null-ptr-deref error occurred.
To fix this bug, check if `sk->sk_state` is BT_CLOSED before calling
rfcomm_recv_frame in rfcomm_process_rx.
Signed-off-by: Yuxuan Hu <20373622@buaa.edu.cn> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the driver detects that the controller is not ready before sending the
first IOC facts command, it will wait for a maximum of 10 seconds for it to
become ready. However, even if the controller becomes ready within 10
seconds, the driver will still issue a diagnostic reset.
Modify the driver to avoid sending a diag reset if the controller becomes
ready within the 10-second wait time.
"struct bvec_iter" is defined with the __packed attribute, so it is
aligned on a single byte. On X86 (and on other architectures that support
unaligned addresses in hardware), "struct bvec_iter" is accessed using the
8-byte and 4-byte memory instructions, however these instructions are less
efficient if they operate on unaligned addresses.
(on RISC machines that don't have unaligned access in hardware, GCC
generates byte-by-byte accesses that are very inefficient - see [1])
This commit reorders the entries in "struct dm_verity_io" and "struct
convert_context", so that "struct bvec_iter" is aligned on 8 bytes.
The SED Opal response parsing function response_parse() does not
handle the case of an empty atom in the response. This causes
the entry count to be too high and the response fails to be
parsed. Recognizing, but ignoring, empty atoms allows response
handling to succeed.
Fixes a bug revealed by -Wmissing-prototypes when
CONFIG_FUNCTION_GRAPH_TRACER is enabled but not CONFIG_DYNAMIC_FTRACE:
arch/parisc/kernel/ftrace.c:82:5: error: no previous prototype for 'ftrace_enable_ftrace_graph_caller' [-Werror=missing-prototypes]
82 | int ftrace_enable_ftrace_graph_caller(void)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/parisc/kernel/ftrace.c:88:5: error: no previous prototype for 'ftrace_disable_ftrace_graph_caller' [-Werror=missing-prototypes]
88 | int ftrace_disable_ftrace_graph_caller(void)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
iucv_path_table is a dynamically allocated array of pointers to
struct iucv_path items. Yet, its size is calculated as if it was
an array of struct iucv_path items.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
1) A bpf program uses bpf_probe_read_kernel() to read from the vsyscall
page and invokes copy_from_kernel_nofault() which in turn calls
__get_user_asm().
2) Because the vsyscall page address is not readable from kernel space,
a page fault exception is triggered accordingly.
3) handle_page_fault() considers the vsyscall page address as a user
space address instead of a kernel space address. This results in the
fix-up setup by bpf not being applied and a page_fault_oops() is invoked
due to SMAP.
Considering handle_page_fault() has already considered the vsyscall page
address as a userspace address, fix the problem by disallowing vsyscall
page read for copy_from_kernel_nofault().
Originally-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: syzbot+72aa0161922eba61b50e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/bpf/CAG48ez06TZft=ATH1qh2c5mpS5BT8UakwNkzi6nvK5_djC-4Nw@mail.gmail.com Reported-by: xingwei lee <xrivendell7@gmail.com> Closes: https://lore.kernel.org/bpf/CABOYnLynjBoFZOf3Z4BhaZkc5hx_kHfsjiW+UWLoB=w33LvScw@mail.gmail.com Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240202103935.3154011-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Relax DEVX access upon modify commands to be UVERBS_ACCESS_READ.
The kernel doesn't need to protect what firmware protects, or what
causes no damage to anyone but the user.
As firmware needs to protect itself from parallel access to the same
object, don't block parallel modify/query commands on the same object in
the kernel side.
This change will allow user space application to run parallel updates to
different entries in the same bulk object.
Clear Cause.BD after we use instruction_pointer_set to override
EPC.
This can prevent exception_epc check against instruction code at
new return address.
It won't be considered as "in delay slot" after epc being overridden
anyway.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure. Ensure the allocation was successful
by checking the pointer validity.
The DMI strings used for the LattePanda board DMI quirks are very generic.
Using the dmidecode database from https://linux-hardware.org/ shows
that the chosen DMI strings also match the following 2 laptops
which also have a rt5645 codec:
This exact case was fail for async crypto and we weren't
catching it.
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
When updating or deleting an inner map in map array or map htab, the map
may still be accessed by non-sleepable program or sleepable program.
However bpf_map_fd_put_ptr() decreases the ref-counter of the inner map
directly through bpf_map_put(), if the ref-counter is the last one
(which is true for most cases), the inner map will be freed by
ops->map_free() in a kworker. But for now, most .map_free() callbacks
don't use synchronize_rcu() or its variants to wait for the elapse of a
RCU grace period, so after the invocation of ops->map_free completes,
the bpf program which is accessing the inner map may incur
use-after-free problem.
Fix the free of inner map by invoking bpf_map_free_deferred() after both
one RCU grace period and one tasks trace RCU grace period if the inner
map has been removed from the outer map before. The deferment is
accomplished by using call_rcu() or call_rcu_tasks_trace() when
releasing the last ref-counter of bpf map. The newly-added rcu_head
field in bpf_map shares the same storage space with work field to
reduce the size of bpf_map.
Fixes: bba1dc0b55ac ("bpf: Remove redundant synchronize_rcu.") Fixes: 638e4b825d52 ("bpf: Allows per-cpu maps and map-in-map in sleepable programs") Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20231204140425.1480317-5-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 62fca83303d608ad4fec3f7428c8685680bb01b0) Signed-off-by: Robert Kolchmeyer <rkolchmeyer@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
As an accident of implementation, an RCU Tasks Trace grace period also
acts as an RCU grace period. However, this could change at any time.
This commit therefore creates an rcu_trace_implies_rcu_gp() that currently
returns true to codify this accident. Code relying on this accident
must call this function to verify that this accident is still happening.
Reported-by: Hou Tao <houtao@huaweicloud.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Link: https://lore.kernel.org/r/20221014113946.965131-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stable-dep-of: 876673364161 ("bpf: Defer the free of inner map when necessary") Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 10108826191ab30388e8ae9d54505a628f78a7ec) Signed-off-by: Robert Kolchmeyer <rkolchmeyer@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Since we no longer allow sending io_uring fds over SCM_RIGHTS, move to
using io_is_uring_fops() to detect whether this is a io_uring fd or not.
With that done, kill off io_uring_get_socket() as nobody calls it
anymore.
This is in preparation to yanking out the rest of the core related to
unix gc with io_uring.
After upgrading from 5.16 to 6.1, our board with a MAX14830 started
producing lots of garbage data over UART. Bisection pointed out commit 285e76fc049c as the culprit. That patch tried to replace hand-written
code which I added in 2b4bac48c1084 ("serial: max310x: Use batched reads
when reasonably safe") with the generic regmap infrastructure for
batched operations.
Unfortunately, the `regmap_raw_read` and `regmap_raw_write` which were
used are actually functions which perform IO over *multiple* registers.
That's not what is needed for accessing these Tx/Rx FIFOs; the
appropriate functions are the `_noinc_` versions, not the `_raw_` ones.
Fix this regression by using `regmap_noinc_read()` and
`regmap_noinc_write()` along with the necessary `regmap_config` setup;
with this patch in place, our board communicates happily again. Since
our board uses SPI for talking to this chip, the I2C part is completely
untested.
Fixes: 285e76fc049c ("serial: max310x: use regmap methods for SPI batch operations") Cc: stable@vger.kernel.org Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Jan Kundrát <jan.kundrat@cesnet.cz> Link: https://lore.kernel.org/r/79db8e82aadb0e174bc82b9996423c3503c8fb37.1680732084.git.jan.kundrat@cesnet.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
I2C implementation on this chip has a few key differences
compared to SPI, as described in previous patches.
* extended register space access needs no extra logic
* slave address is used to select which UART to communicate
with
To accommodate these differences, add an I2C interface config,
set the RevID register address and implement an empty method
for setting the GlobalCommand register, since no special handling
is needed for the extended register space.
To handle the port-specific slave address, create an I2C dummy
device for each port, except the base one (UART0), which is
expected to be the one specified in firmware, and create a
regmap for each I2C device.
Add minimum and maximum slave addresses to each devtype for
sanity checking.
Also, use a separate regmap config with no write_flag_mask,
since I2C has a R/W bit in its slave address, and set the
max register to the address of the RevID register, since the
extended register space needs no extra logic.
Finally, add the I2C driver.
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Cosmin Tanislav <cosmin.tanislav@analog.com> Link: https://lore.kernel.org/r/20220605144659.4169853-5-demonsingur@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stable-dep-of: 3f42b142ea11 ("serial: max310x: fix IO data corruption in batched operations") Signed-off-by: Sasha Levin <sashal@kernel.org>
SPI can only use 5 address bits, since one bit is reserved for
specifying R/W and 2 bits are used to specify the UART port.
To access registers that have addresses past 0x1F, an extended
register space can be enabled by writing to the GlobalCommand
register (address 0x1F).
I2C uses 8 address bits. The R/W bit is placed in the slave
address, and so is the UART port. Because of this, registers
that have addresses higher than 0x1F can be accessed normally.
To access the RevID register, on SPI, 0xCE must be written to
the 0x1F address to enable the extended register space, after
which the RevID register is accessible at address 0x5. 0xCD
must be written to the 0x1F address to disable the extended
register space.
On I2C, the RevID register is accessible at address 0x25.
Create an interface config struct, and add a method for
toggling the extended register space and a member for the RevId
register address. Implement these for SPI.
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Cosmin Tanislav <cosmin.tanislav@analog.com> Link: https://lore.kernel.org/r/20220605144659.4169853-4-demonsingur@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stable-dep-of: 3f42b142ea11 ("serial: max310x: fix IO data corruption in batched operations") Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently the regmap_config structure only allows the user to implement
single element register read/write using .reg_read/.reg_write callbacks.
The regmap_bus already implements bulk counterparts of both, and is being
misused as a workaround for the missing bulk read/write callbacks in
regmap_config by a couple of drivers. To stop this misuse, add the bulk
read/write callbacks to regmap_config and call them from the regmap core
code.
Signed-off-by: Marek Vasut <marex@denx.de> Cc: Jagan Teki <jagan@amarulasolutions.com> Cc: Mark Brown <broonie@kernel.org> Cc: Maxime Ripard <maxime@cerno.tech> Cc: Robert Foss <robert.foss@linaro.org> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Zimmermann <tzimmermann@suse.de>
To: dri-devel@lists.freedesktop.org Link: https://lore.kernel.org/r/20220430025145.640305-1-marex@denx.de Signed-off-by: Mark Brown <broonie@kernel.org>
Stable-dep-of: 3f42b142ea11 ("serial: max310x: fix IO data corruption in batched operations") Signed-off-by: Sasha Levin <sashal@kernel.org>
Some device requires a special handling for reg_update_bits and can't use
the normal regmap read write logic. An example is when locking is
handled by the device and rmw operations requires to do atomic operations.
Allow to declare a dedicated function in regmap_config for
reg_update_bits in no bus configuration.
Running out of request IDs on a channel essentially produces the same
effect as running out of space in the ring buffer, in that -EAGAIN is
returned. The error message in hv_ringbuffer_write() should either be
dropped (since we don't output a message when the ring buffer is full)
or be made conditional/debug-only.
Suggested-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Fixes: e8b7db38449ac ("Drivers: hv: vmbus: Add vmbus_requestor data structure for VMBus hardening") Link: https://lore.kernel.org/r/20210301191348.196485-1-parri.andrea@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
lock_task_sighand() can trigger a hard lockup. If NR_CPUS threads call
getrusage() at the same time and the process has NR_THREADS, spin_lock_irq
will spin with irqs disabled O(NR_CPUS * NR_THREADS) time.
Change getrusage() to use sig->stats_lock, it was specifically designed
for this type of use. This way it runs lockless in the likely case.
TODO:
- Change do_task_stat() to use sig->stats_lock too, then we can
remove spin_lock_irq(siglock) in wait_task_zombie().
- Turn sig->stats_lock into seqcount_rwlock_t, this way the
readers in the slow mode won't exclude each other. See
https://lore.kernel.org/all/20230913154907.GA26210@redhat.com/
- stats_lock has to disable irqs because ->siglock can be taken
in irq context, it would be very nice to change __exit_signal()
to avoid the siglock->stats_lock dependency.
Link: https://lkml.kernel.org/r/20240122155053.GA26214@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Dylan Hatch <dylanbhatch@google.com> Tested-by: Dylan Hatch <dylanbhatch@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
do/while_each_thread should be avoided when possible.
Plus this change allows to avoid lock_task_sighand(), we can use rcu
and/or sig->stats_lock instead.
Link: https://lkml.kernel.org/r/20230909172629.GA20454@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: f7ec1cd5cc7e ("getrusage: use sig->stats_lock rather than lock_task_sighand()") Signed-off-by: Sasha Levin <sashal@kernel.org>
Patch series "getrusage: use sig->stats_lock", v2.
This patch (of 2):
thread_group_cputime() does its own locking, we can safely shift
thread_group_cputime_adjusted() which does another for_each_thread loop
outside of ->siglock protected section.
This is also preparation for the next patch which changes getrusage() to
use stats_lock instead of siglock, thread_group_cputime() takes the same
lock. With the current implementation recursive read_seqbegin_or_lock()
is fine, thread_group_cputime() can't enter the slow mode if the caller
holds stats_lock, yet this looks more safe and better performance-wise.
For shared memory of type SHM_HUGETLB, hugetlb pages are reserved in
shmget() call. If SHM_NORESERVE flags is specified then the hugetlb pages
are not reserved. However when the shared memory is attached with the
shmat() call the hugetlb pages are getting reserved incorrectly for
SHM_HUGETLB shared memory created with SHM_NORESERVE which is a bug.
-------------------------------
Following test shows the issue.
$cat shmhtb.c
int main()
{
int shmflags = 0660 | IPC_CREAT | SHM_HUGETLB | SHM_NORESERVE;
int shmid;
While reviewing a bug in hugetlb_reserve_pages, it was noticed that all
callers ignore the return value. Any failure is considered an ENOMEM
error by the callers.
Change the function to be of type bool. The function will return true if
the reservation was successful, false otherwise. Callers currently assume
a zero return code indicates success. Change the callers to look for true
to indicate success. No functional change, only code cleanup.
Link: https://lkml.kernel.org/r/20201221192542.15732-1-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stable-dep-of: e656c7a9e596 ("mm: hugetlb pages should not be reserved by shmat() if SHM_NORESERVE") Signed-off-by: Sasha Levin <sashal@kernel.org>
If hv_netvsc driver is unloaded and reloaded, the NET_DEVICE_REGISTER
handler cannot perform VF register successfully as the register call
is received before netvsc_probe is finished. This is because we
register register_netdevice_notifier() very early( even before
vmbus_driver_register()).
To fix this, we try to register each such matching VF( if it is visible
as a netdevice) at the end of netvsc_probe.
Cc: stable@vger.kernel.org Fixes: 85520856466e ("hv_netvsc: Fix race of register_netdevice_notifier and VF register") Suggested-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Use netif_is_bond_master() function instead of open code, which is
((event_dev->priv_flags & IFF_BONDING) && (event_dev->flags & IFF_MASTER)).
This patch doesn't change logic.
Signed-off-by: Juhee Kang <claudiajkang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 9cae43da9867 ("hv_netvsc: Register VF in netvsc_probe if NET_DEVICE_REGISTER missed") Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently the netvsc/VF binding logic only checks the PCI serial number.
The Microsoft Azure Network Adapter (MANA) supports multiple net_device
interfaces (each such interface is called a "vPort", and has its unique
MAC address) which are backed by the same VF PCI device, so the binding
logic should check both the MAC address and the PCI serial number.
The change should not break any other existing VF drivers, because
Hyper-V NIC SR-IOV implementation requires the netvsc network
interface and the VF network interface have the same MAC address.
Co-developed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Co-developed-by: Shachar Raindel <shacharr@microsoft.com> Signed-off-by: Shachar Raindel <shacharr@microsoft.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 9cae43da9867 ("hv_netvsc: Register VF in netvsc_probe if NET_DEVICE_REGISTER missed") Signed-off-by: Sasha Levin <sashal@kernel.org>
On VF hot remove, NETDEV_GOING_DOWN is sent to notify the VF is about to
go down. At this time, the VF is still sending/receiving traffic and we
request the VSP to switch datapath.
On completion, the datapath is switched to synthetic and we can proceed
with VF hot remove.
Signed-off-by: Long Li <longli@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 9cae43da9867 ("hv_netvsc: Register VF in netvsc_probe if NET_DEVICE_REGISTER missed") Signed-off-by: Sasha Levin <sashal@kernel.org>
The completion indicates if NVSP_MSG4_TYPE_SWITCH_DATA_PATH has been
processed by the VSP. The traffic is steered to VF or synthetic after we
receive this completion.
Signed-off-by: Long Li <longli@microsoft.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 9cae43da9867 ("hv_netvsc: Register VF in netvsc_probe if NET_DEVICE_REGISTER missed") Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently, pointers to guest memory are passed to Hyper-V as
transaction IDs in netvsc. In the face of errors or malicious
behavior in Hyper-V, netvsc should not expose or trust the transaction
IDs returned by Hyper-V to be valid guest memory addresses. Instead,
use small integers generated by vmbus_requestor as requests
(transaction) IDs.
Signed-off-by: Andres Beltran <lkmlabelt@gmail.com> Co-developed-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Wei Liu <wei.liu@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Link: https://lore.kernel.org/r/20201109100402.8946-4-parri.andrea@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
Stable-dep-of: 9cae43da9867 ("hv_netvsc: Register VF in netvsc_probe if NET_DEVICE_REGISTER missed") Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently, VMbus drivers use pointers into guest memory as request IDs
for interactions with Hyper-V. To be more robust in the face of errors
or malicious behavior from a compromised Hyper-V, avoid exposing
guest memory addresses to Hyper-V. Also avoid Hyper-V giving back a
bad request ID that is then treated as the address of a guest data
structure with no validation. Instead, encapsulate these memory
addresses and provide small integers as request IDs.
Signed-off-by: Andres Beltran <lkmlabelt@gmail.com> Co-developed-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20201109100402.8946-2-parri.andrea@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
Stable-dep-of: 9cae43da9867 ("hv_netvsc: Register VF in netvsc_probe if NET_DEVICE_REGISTER missed") Signed-off-by: Sasha Levin <sashal@kernel.org>
ext4_da_map_blocks() only hold i_data_sem in shared mode and i_rwsem
when inserting delalloc extents, it could be raced by another querying
path of ext4_map_blocks() without i_rwsem, .e.g buffered read path.
Suppose we buffered read a file containing just a hole, and without any
cached extents tree, then it is raced by another delayed buffered write
to the same area or the near area belongs to the same hole, and the new
delalloc extent could be overwritten to a hole extent.
This race could lead to inconsistent delalloc extents tree and
incorrect reserved space counter. Fix this by converting to hold
i_data_sem in exclusive mode when adding a new delalloc extent in
ext4_da_map_blocks().
Cc: stable@vger.kernel.org Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240127015825.1608160-3-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
For these hooks the true "neutral" value is -EOPNOTSUPP, which is
currently what is returned when no LSM provides this hook and what LSMs
return when there is no security context set on the socket. Correct the
value in <linux/lsm_hooks.h> and adjust the dispatch functions in
security/security.c to avoid issues when the BPF LSM is enabled.
Cc: stable@vger.kernel.org Fixes: 98e828a0650f ("security: Refactor declaration of LSM hooks") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
[PM: subject line tweak] Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 4ff09db1b79b ("bpf: net: Change sk_getsockopt() to take the
sockptr_t argument") made it possible to call sk_getsockopt()
with both user and kernel address space buffers through the use of
the sockptr_t type. Unfortunately at the time of conversion the
security_socket_getpeersec_stream() LSM hook was written to only
accept userspace buffers, and in a desire to avoid having to change
the LSM hook the commit author simply passed the sockptr_t's
userspace buffer pointer. Since the only sk_getsockopt() callers
at the time of conversion which used kernel sockptr_t buffers did
not allow SO_PEERSEC, and hence the
security_socket_getpeersec_stream() hook, this was acceptable but
also very fragile as future changes presented the possibility of
silently passing kernel space pointers to the LSM hook.
There are several ways to protect against this, including careful
code review of future commits, but since relying on code review to
catch bugs is a recipe for disaster and the upstream eBPF maintainer
is "strongly against defensive programming", this patch updates the
LSM hook, and all of the implementations to support sockptr_t and
safely handle both user and kernel space buffers.
Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
Stable-dep-of: 5a287d3d2b9d ("lsm: fix default return value of the socket_getpeersec_*() hooks") Signed-off-by: Sasha Levin <sashal@kernel.org>
This patch changes sk_getsockopt() to take the sockptr_t argument
such that it can be used by bpf_getsockopt(SOL_SOCKET) in a
latter patch.
security_socket_getpeersec_stream() is not changed. It stays
with the __user ptr (optval.user and optlen.user) to avoid changes
to other security hooks. bpf_getsockopt(SOL_SOCKET) also does not
support SO_PEERSEC.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220902002802.2888419-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stable-dep-of: 5a287d3d2b9d ("lsm: fix default return value of the socket_getpeersec_*() hooks") Signed-off-by: Sasha Levin <sashal@kernel.org>
A latter patch refactors bpf_getsockopt(SOL_SOCKET) with the
sock_getsockopt() to avoid code duplication and code
drift between the two duplicates.
The current sock_getsockopt() takes sock ptr as the argument.
The very first thing of this function is to get back the sk ptr
by 'sk = sock->sk'.
bpf_getsockopt() could be called when the sk does not have
the sock ptr created. Meaning sk->sk_socket is NULL. For example,
when a passive tcp connection has just been established but has yet
been accept()-ed. Thus, it cannot use the sock_getsockopt(sk->sk_socket)
or else it will pass a NULL ptr.
This patch moves all sock_getsockopt implementation to the newly
added sk_getsockopt(). The new sk_getsockopt() takes a sk ptr
and immediately gets the sock ptr by 'sock = sk->sk_socket'
The existing sock_getsockopt(sock) is changed to call
sk_getsockopt(sock->sk). All existing callers have both sock->sk
and sk->sk_socket pointer.
The latter patch will make bpf_getsockopt(SOL_SOCKET) call
sk_getsockopt(sk) directly. The bpf_getsockopt(SOL_SOCKET) does
not use the optnames that require sk->sk_socket, so it will
be safe.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220902002756.2887884-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stable-dep-of: 5a287d3d2b9d ("lsm: fix default return value of the socket_getpeersec_*() hooks") Signed-off-by: Sasha Levin <sashal@kernel.org>
If there is a problem after resetting a port, the do/while() loop that
checks the default value of DIVLSB register may run forever and spam the
I2C bus.
Add a delay before each read of DIVLSB, and a maximum number of tries to
prevent that situation from happening.
The driver currently does manual register manipulation in
multiple places to talk to a specific UART port.
In order to talk to a specific UART port over SPI, the bits U1
and U0 of the register address can be set, as explained in the
Command byte configuration section of the datasheet.
Make this more elegant by creating regmaps for each UART port
and setting the read_flag_mask and write_flag_mask
accordingly.
All communcations regarding global registers are done on UART
port 0, so replace the global regmap entirely with the port 0
regmap.
Also, remove the 0x1f masks from reg_writeable(), reg_volatile()
and reg_precious() methods, since setting the U1 and U0 bits of
the register address happens inside the regmap core now.
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Cosmin Tanislav <cosmin.tanislav@analog.com> Link: https://lore.kernel.org/r/20220605144659.4169853-3-demonsingur@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stable-dep-of: b35f8dbbce81 ("serial: max310x: prevent infinite while() loop in port startup") Signed-off-by: Sasha Levin <sashal@kernel.org>