Now that the application is simply stored in struct hid_input, we can
overwrite it in mt_input_mapping() for the faulty egalax and have a
simpler suffix processing in mt_input_configured()
The list iterator value 'encoder' will *always* be set and non-NULL
by list_for_each_entry(), so it is incorrect to assume that the
iterator value will be NULL if the list is empty or no element
is found.
To fix the bug, use a new variable 'iter' as the list iterator,
while use the original variable 'encoder' as a dedicated pointer
to point to the found element.
Earlier there were no mode_valid() helper for crtc and tilcdc had a
hack to over come this limitation. But now the mode_valid() helper is
there (has been since v4.13), so it is about time to get rid of that
hack.
Fixes: f9aefd6b2aa3 ("net: warn if mac header was not set") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220707123900.945305-1-edumazet@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mpol_set_nodemask()(mm/mempolicy.c) does not set up nodemask when
pol->mode is MPOL_LOCAL. Check pol->mode before access
pol->w.cpuset_mems_allowed in mpol_rebind_policy()(mm/mempolicy.c).
Revert "Revert "char/random: silence a lockdep splat with printk()""
In 2019, Sergey fixed a lockdep splat with 15341b1dd409 ("char/random:
silence a lockdep splat with printk()"), but that got reverted soon
after from 4.19 because back then it apparently caused various problems.
But the issue it was fixing is still there, and more generally, many
patches turning printk() into printk_deferred() have landed since,
making me suspect it's okay to try this out again.
This should fix the following deadlock found by the kernel test robot:
While reading sysctl_tcp_thin_linear_timeouts, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 36e31b0af587 ("net: TCP thin linear timeouts") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
While reading sysctl_tcp_recovery, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes: 4f41b1c58a32 ("tcp: use RACK to detect losses") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
While reading sysctl_tcp_early_retrans, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: eed530b6c676 ("tcp: early retransmit") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
be_cmd_read_port_transceiver_data assumes that it is given a buffer that
is at least PAGE_DATA_LEN long, or twice that if the module supports SFF
8472. However, this is not always the case.
Fix this by passing the desired offset and length to
be_cmd_read_port_transceiver_data so that we only copy the bytes once.
While reading sysctl_tcp_fastopen, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes: 2100c8d2d9db ("net-tcp: Fast Open base") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
While reading sysctl_igmp_llm_reports, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
This test can be packed into a helper, so such changes will be in the
follow-up series after net is merged into net-next.
if (ipv4_is_local_multicast(pmc->multiaddr) &&
!READ_ONCE(net->ipv4.sysctl_igmp_llm_reports))
Fixes: df2cf4a78e48 ("IGMP: Inhibit reports for local multicast groups") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Socket destruction flow and tls_device_down function sync against each
other using tls_device_lock and the context refcount, to guarantee the
device resources are freed via tls_dev_del() by the end of
tls_device_down.
In the following unfortunate flow, this won't happen:
- refcount is decreased to zero in tls_device_sk_destruct.
- tls_device_down starts, skips the context as refcount is zero, going
all the way until it flushes the gc work, and returns without freeing
the device resources.
- only then, tls_device_queue_ctx_destruction is called, queues the gc
work and frees the context's device resources.
Solve it by decreasing the refcount in the socket's destruction flow
under the tls_device_lock, for perfect synchronization. This does not
slow down the common likely destructor flow, in which both the refcount
is decreased and the spinlock is acquired, anyway.
Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Problems were observed on the Xilinx ZynqMP platform with large I2C reads.
When a read of 277 bytes was performed, the controller NAKed the transfer
after only 252 bytes were transferred and returned an ENXIO error on the
transfer.
There is some code in cdns_i2c_master_isr to handle this case by resetting
the transfer count in the controller before it reaches 0, to allow larger
transfers to work, but it was conditional on the CDNS_I2C_BROKEN_HOLD_BIT
quirk being set on the controller, and ZynqMP uses the r1p14 version of
the core where this quirk is not being set. The requirement to do this to
support larger reads seems like an inherently required workaround due to
the core only having an 8-bit transfer size register, so it does not
appear that this should be conditional on the broken HOLD bit quirk which
is used elsewhere in the driver.
Remove the dependency on the CDNS_I2C_BROKEN_HOLD_BIT for this transfer
size reset logic to fix this problem.
Fixes: 63cab195bf49 ("i2c: removed work arounds in i2c driver for Zynq Ultrascale+ MPSoC") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Reviewed-by: Shubhrajyoti Datta <Shubhrajyoti.datta@amd.com> Acked-by: Michal Simek <michal.simek@amd.com> Signed-off-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
While reading sysctl_tcp_probe_interval, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: 05cbc0db03e8 ("ipv4: Create probe timer for tcp PMTU as per RFC4821") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
While reading sysctl_tcp_probe_threshold, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: 6b58e0a5f32d ("ipv4: Use binary search to choose tcp PMTU probe_size") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
While reading sysctl_tcp_mtu_probing, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes: 5d424d5a674f ("[TCP]: MTU probing") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
While reading sysctl_fwmark_reflect, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: e110861f8609 ("net: add a sysctl to reflect the fwmark on replies") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
While reading sysctl_ip_fwd_use_pmtu, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes: f87c10a8aa1e ("ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
After this; e1 is attached to an unmapped rb and a subsequent
perf_mmap() will loop forever more:
again:
mutex_lock(&e->mmap_mutex);
if (event->rb) {
...
if (!atomic_inc_not_zero(&e->rb->mmap_count)) {
...
mutex_unlock(&e->mmap_mutex);
goto again;
}
}
The loop in perf_mmap_close() holds e2->mmap_mutex, while the attach
in perf_event_set_output() holds e1->mmap_mutex. As such there is no
serialization to avoid this race.
Change perf_event_set_output() to take both e1->mmap_mutex and
e2->mmap_mutex to alleviate that problem. Additionally, have the loop
in perf_mmap() detach the rb directly, this avoids having to wait for
the concurrent perf_mmap_close() to get around to doing it to make
progress.
Fixes: 9bb5d40cd93c ("perf: Fix mmap() accounting hole") Reported-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Yang Jihong <yangjihong1@huawei.com> Link: https://lkml.kernel.org/r/YsQ3jm2GR38SW7uD@worktop.programming.kicks-ass.net Signed-off-by: Sasha Levin <sashal@kernel.org>
Because of the possible failure of the allocation, data->domains might
be NULL pointer and will cause the dereference of the NULL pointer
later.
Therefore, it might be better to check it and directly return -ENOMEM
without releasing data manually if fails, because the comment of the
devm_kmalloc() says "Memory allocated with this function is
automatically freed on driver detach.".
of_find_matching_node_and_match() returns a node pointer with refcount
incremented, we should use of_node_put() on it when not need anymore.
Add missing of_node_put() to avoid refcount leak.
Fixes: 0e545f57b708 ("power: reset: driver for the Versatile syscon reboot") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
xfrm_policy_lookup() will call xfrm_pol_hold_rcu() to get a refcount of
pols[0]. This refcount can be dropped in xfrm_expand_policies() when
xfrm_expand_policies() return error. pols[0]'s refcount is balanced in
here. But xfrm_bundle_lookup() will also call xfrm_pols_put() with
num_pols == 1 to drop this refcount when xfrm_expand_policies() return
error.
This patch also fix an illegal address access. pols[0] will save a error
point when xfrm_policy_lookup fails. This lead to xfrm_pols_put to resolve
an illegal address in xfrm_bundle_lookup's error path.
Fix these by setting num_pols = 0 in xfrm_expand_policies()'s error path.
The error paths of gntdev_mmap() can call unmap_grant_pages() even
though not all of the pages have been successfully mapped. This will
trigger the WARN_ON()s in __unmap_grant_pages_done(). The number of
warnings can be very large; I have observed thousands of lines of
warnings in the systemd journal.
Avoid this problem by only warning on unmapping failure if the handle
being unmapped is not INVALID_GRANT_HANDLE. The handle field of any
page that was not successfully mapped will be INVALID_GRANT_HANDLE, so
this catches all cases where unmapping can legitimately fail.
When trying to load modules built for RISC-V which include assembly files
the kernel loader errors with "unexpected relocation type 'R_RISCV_ALIGN'"
due to R_RISCV_ALIGN relocations being generated by the assembler.
The R_RISCV_ALIGN relocations can be removed at the expense of code space
by adding -mno-relax to gcc and as. In commit 7a8e7da42250138
("RISC-V: Fixes to module loading") -mno-relax is added to the build
variable KBUILD_CFLAGS_MODULE. See [1] for more info.
The issue is that when kbuild builds a .S file, it invokes gcc with
the -mno-relax flag, but this is not being passed through to the
assembler. Adding -Wa,-mno-relax to KBUILD_AFLAGS_MODULE ensures that
the assembler is invoked correctly. This may have now been fixed in
gcc[2] and this addition should not stop newer gcc and as from working.
can_put_echo_skb() will clone skb then free the skb. Move the
can_put_echo_skb() for the m_can version 3.0.x directly before the
start of the xmit in hardware, similar to the 3.1.x branch.
Fixes: 80646733f11c ("can: m_can: update to support CAN FD features") Link: https://lore.kernel.org/all/20220317081305.739554-1-mkl@pengutronix.de Cc: stable@vger.kernel.org Reported-by: Hangyu Hua <hbh25y@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
[sudip: adjust context] Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver must provide throttle and unthrottle in uart_ops when it
sets UPSTAT_AUTORTS. Add them using existing stop_rx &
enable_interrupts functions.
Fixes: 2a76fa283098 (serial: pl011: Adopt generic flag to store auto RTS status) Cc: stable <stable@kernel.org> Cc: Lukas Wunner <lukas@wunner.de> Reported-by: Nuno Gonçalves <nunojpg@gmail.com> Tested-by: Nuno Gonçalves <nunojpg@gmail.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20220614075637.8558-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The code lacks clearing of previous DEAT/DEDT values. Thus, changing
values on the fly results in garbage delays tending towards the maximum
value as more and more bits are ORed together. (Leaving RS485 mode
would have cleared the old values though).
Fixes: 1bcda09d2910 ("serial: stm32: add support for RS485 hardware control mode") Cc: stable <stable@kernel.org> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20220627150753.34510-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If port->mapbase = NULL in serial8250_request_std_resource() , it need
return a error code instead of 0. If uart_set_info() fail to request new
regions by serial8250_request_std_resource() but the return value of
serial8250_request_std_resource() is 0, The system incorrectly considers
that the resource application is successful and does not attempt to
restore the old setting. A null pointer reference is triggered when the
port resource is later invoked.
The src_maxburst and dst_maxburst have been changed to 1 but the settings
of the UCON register aren't changed yet. They should be changed as well
according to the dmaengine slave config.
Fixes: aa2f80e752c7 ("serial: samsung: fix maxburst parameter for DMA transactions") Cc: stable <stable@kernel.org> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Chanho Park <chanho61.park@samsung.com> Link: https://lore.kernel.org/r/20220627065113.139520-1-chanho61.park@samsung.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The DWC3_EVENT_PENDING flag is used to protect against invalid call to
top-half interrupt handler, which can occur when there's a delay in
software detection of the interrupt line deassertion.
However, the clearing of this flag was done prior to unmasking the
interrupt line, creating opportunity where the top-half handler can
come. This breaks the serialization and creates a race between the
top-half and bottom-half handler, resulting in losing synchronization
between the controller and the driver when processing events.
To fix this, make sure the clearing of the DWC3_EVENT_PENDING is done at
the end of the bottom-half handler.
System like Android allow user control power role from UI, it is possible
to implement application base on typec uevent to refresh UI, but found
there is chance that UI show different state from typec attribute file.
In typec_set_pwr_opmode(), when partner support PD, there is no uevent
send to user space which cause the problem.
Fix it by sending uevent notification when change power mode to PD.
These are indeed "should not happen" situations, but it turns out recent
changes made the 'task_is_stopped_or_trace()' case trigger (fix for that
exists, is pending more testing), and the BUG_ON() makes it
unnecessarily hard to actually debug for no good reason.
It's been that way for a long time, but let's make it clear: BUG_ON() is
not good for debugging, and should never be used in situations where you
could just say "this shouldn't happen, but we can continue".
Use WARN_ON_ONCE() instead to make sure it gets logged, and then just
continue running. Instead of making the system basically unusuable
because you crashed the machine while potentially holding some very core
locks (eg this function is commonly called while holding 'tasklist_lock'
for writing).
The .brk section has the same properties as .bss: it is an alloc-only
section and should be cleared before being used.
Not doing so is especially a problem for Xen PV guests, as the
hypervisor will validate page tables (check for writable page tables
and hypervisor private bits) before accepting them to be used.
Make sure .brk is initially zero by letting clear_bss() clear the brk
area, too.
The mask_ack operation clears the interrupt by writing to the PICSR
register. This we don't want for level triggered interrupt because
it does not actually clear the interrupt on the source hardware.
This was causing issues in qemu with multi core setups where
interrupts would continue to fire even though they had been cleared in
PICSR.
Don't print a misleading header length mismatch error if the i2c call
returns an error. Instead just return the error code without any error
message.
Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
In pmac_cpufreq_init_MacRISC3(), we need to add corresponding
of_node_put() for the three node pointers whose refcount have
been incremented by of_find_node_by_name().
When br_netfilter module is loaded, skbs may be diverted to the
ipv4/ipv6 hooks, just like as if we were routing.
Unfortunately, bridge filter hooks with priority 0 may be skipped
in this case.
Example:
1. an nftables bridge ruleset is loaded, with a prerouting
hook that has priority 0.
2. interface is added to the bridge.
3. no tcp packet is ever seen by the bridge prerouting hook.
4. flush the ruleset
5. load the bridge ruleset again.
6. tcp packets are processed as expected.
After 1) the only registered hook is the bridge prerouting hook, but its
not called yet because the bridge hasn't been brought up yet.
After 2), hook order is:
0 br_nf_pre_routing // br_netfilter internal hook
0 chain bridge f prerouting // nftables bridge ruleset
The packet is diverted to br_nf_pre_routing.
If call-iptables is off, the nftables bridge ruleset is called as expected.
But if its enabled, br_nf_hook_thresh() will skip it because it assumes
that all 0-priority hooks had been called previously in bridge context.
To avoid this, check for the br_nf_pre_routing hook itself, we need to
resume directly after it, even if this hook has a priority of 0.
Unfortunately, this still results in different packet flow.
With this fix, the eval order after in 3) is:
1. br_nf_pre_routing
2. ip(6)tables (if enabled)
3. nftables bridge
but after 5 its the much saner:
1. nftables bridge
2. br_nf_pre_routing
3. ip(6)tables (if enabled)
Unfortunately I don't see a solution here:
It would be possible to move br_nf_pre_routing to a higher priority
so that it will be called later in the pipeline, but this also impacts
ebtables evaluation order, and would still result in this very ordering
problem for all nftables-bridge hooks with the same priority as the
br_nf_pre_routing one.
Searching back through the git history I don't think this has
ever behaved in any other way, hence, no fixes-tag.
Virtio devices might lose their state when the VMM is restarted
after a suspend to disk (hibernation) cycle. This means that the
guest page size register must be restored for the virtio_mmio legacy
interface, since otherwise the virtio queues are not functional.
This is particularly problematic for QEMU that currently still defaults
to using the legacy interface for virtio_mmio. Write the guest page
size register again in virtio_mmio_restore() to make legacy virtio_mmio
devices work correctly after hibernation.
Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com>
Message-Id: <20220621110621.3638025-3-stephan.gerhold@kernkonzept.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Most virtio drivers provide freeze/restore callbacks to finish up
device usage before suspend and to reinitialize the virtio device after
resume. However, these callbacks are currently only called when using
virtio_pci. virtio_mmio does not have any PM ops defined.
This causes problems for example after suspend to disk (hibernation),
since the virtio devices might lose their state after the VMM is
restarted. Calling virtio_device_freeze()/restore() ensures that
the virtio devices are re-initialized correctly.
Fix this by implementing the dev_pm_ops for virtio_mmio,
similar to virtio_pci_common.
Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com>
Message-Id: <20220621110621.3638025-2-stephan.gerhold@kernkonzept.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When creating VFs a kernel panic can happen when calling to
efx_ef10_try_update_nic_stats_vf.
When releasing a DMA coherent buffer, sometimes, I don't know in what
specific circumstances, it has to unmap memory with vunmap. It is
disallowed to do that in IRQ context or with BH disabled. Otherwise, we
hit this line in vunmap, causing the crash:
BUG_ON(in_interrupt());
This patch reenables BH to release the buffer.
Log messages when the bug is hit:
kernel BUG at mm/vmalloc.c:2727!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 6 PID: 1462 Comm: NetworkManager Kdump: loaded Tainted: G I --------- --- 5.14.0-119.el9.x86_64 #1
Hardware name: Dell Inc. PowerEdge R740/06WXJT, BIOS 2.8.2 08/27/2020
RIP: 0010:vunmap+0x2e/0x30
...skip...
Call Trace:
__iommu_dma_free+0x96/0x100
efx_nic_free_buffer+0x2b/0x40 [sfc]
efx_ef10_try_update_nic_stats_vf+0x14a/0x1c0 [sfc]
efx_ef10_update_stats_vf+0x18/0x40 [sfc]
efx_start_all+0x15e/0x1d0 [sfc]
efx_net_open+0x5a/0xe0 [sfc]
__dev_open+0xe7/0x1a0
__dev_change_flags+0x1d7/0x240
dev_change_flags+0x21/0x60
...skip...
Fixes: d778819609a2 ("sfc: DMA the VF stats only when requested") Reported-by: Ma Yuying <yuma@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/20220713092116.21238-1-ihuguet@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Both helper functions bpf_lwt_seg6_action() and bpf_lwt_push_encap() use
the bpf_push_seg6_encap() to encapsulate the packet in an IPv6 with Segment
Routing Header (SRH) or insert an SRH between the IPv6 header and the
payload.
To achieve this result, such helper functions rely on bpf_push_seg6_encap()
which, in turn, leverages seg6_do_srh_{encap,inline}() to perform the
required operation (i.e. encap/inline).
This patch removes the initialization of the IPv6 header payload length
from bpf_push_seg6_encap(), as it is now handled properly by
seg6_do_srh_{encap,inline}() to prevent corruption of the skb checksum.
Fixes: fe94cc290f53 ("bpf: Add IPv6 Segment Routing helpers") Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The SRv6 End.B6 and End.B6.Encaps behaviors rely on functions
seg6_do_srh_{encap,inline}() to, respectively: i) encapsulate the
packet within an outer IPv6 header with the specified Segment Routing
Header (SRH); ii) insert the specified SRH directly after the IPv6
header of the packet.
This patch removes the initialization of the IPv6 header payload length
from the input_action_end_b6{_encap}() functions, as it is now handled
properly by seg6_do_srh_{encap,inline}() to avoid corruption of the skb
checksum.
Fixes: 140f04c33bbc ("ipv6: sr: implement several seg6local actions") Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Support for SRH encapsulation and insertion was introduced with
commit 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and
injection with lwtunnels"), through the seg6_do_srh_encap() and
seg6_do_srh_inline() functions, respectively.
The former encapsulates the packet in an outer IPv6 header along with
the SRH, while the latter inserts the SRH between the IPv6 header and
the payload. Then, the headers are initialized/updated according to the
operating mode (i.e., encap/inline).
Finally, the skb checksum is calculated to reflect the changes applied
to the headers.
The IPv6 payload length ('payload_len') is not initialized
within seg6_do_srh_{inline,encap}() but is deferred in seg6_do_srh(), i.e.
the caller of seg6_do_srh_{inline,encap}().
However, this operation invalidates the skb checksum, since the
'payload_len' is updated only after the checksum is evaluated.
To solve this issue, the initialization of the IPv6 payload length is
moved from seg6_do_srh() directly into the seg6_do_srh_{inline,encap}()
functions and before the skb checksum update takes place.
Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Reported-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/all/20220705190727.69d532417be7438b15404ee1@uniroma2.it Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Use after free is detected by kfence when disabling sriov. What was read
after being freed was vf->pci_dev: it was freed from pci_disable_sriov
and later read in efx_ef10_sriov_free_vf_vports, called from
efx_ef10_sriov_free_vf_vswitching.
Set the pointer to NULL at release time to not trying to read it later.
Reproducer and dmesg log (note that kfence doesn't detect it every time):
$ echo 1 > /sys/class/net/enp65s0f0np0/device/sriov_numvfs
$ echo 0 > /sys/class/net/enp65s0f0np0/device/sriov_numvfs
BUG: KFENCE: use-after-free read in efx_ef10_sriov_free_vf_vswitching+0x82/0x170 [sfc]
Use-after-free read at 0x00000000ff3c1ba5 (in kfence-#224):
efx_ef10_sriov_free_vf_vswitching+0x82/0x170 [sfc]
efx_ef10_pci_sriov_disable+0x38/0x70 [sfc]
efx_pci_sriov_configure+0x24/0x40 [sfc]
sriov_numvfs_store+0xfe/0x140
kernfs_fop_write_iter+0x11c/0x1b0
new_sync_write+0x11f/0x1b0
vfs_write+0x1eb/0x280
ksys_write+0x5f/0xe0
do_syscall_64+0x5c/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
allocated by task 6771 on cpu 10 at 3137.860196s:
pci_alloc_dev+0x21/0x60
pci_iov_add_virtfn+0x2a2/0x320
sriov_enable+0x212/0x3e0
efx_ef10_sriov_configure+0x67/0x80 [sfc]
efx_pci_sriov_configure+0x24/0x40 [sfc]
sriov_numvfs_store+0xba/0x140
kernfs_fop_write_iter+0x11c/0x1b0
new_sync_write+0x11f/0x1b0
vfs_write+0x1eb/0x280
ksys_write+0x5f/0xe0
do_syscall_64+0x5c/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
freed by task 6771 on cpu 12 at 3170.991309s:
device_release+0x34/0x90
kobject_cleanup+0x3a/0x130
pci_iov_remove_virtfn+0xd9/0x120
sriov_disable+0x30/0xe0
efx_ef10_pci_sriov_disable+0x57/0x70 [sfc]
efx_pci_sriov_configure+0x24/0x40 [sfc]
sriov_numvfs_store+0xfe/0x140
kernfs_fop_write_iter+0x11c/0x1b0
new_sync_write+0x11f/0x1b0
vfs_write+0x1eb/0x280
ksys_write+0x5f/0xe0
do_syscall_64+0x5c/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: 3c5eb87605e85 ("sfc: create vports for VFs and assign random MAC addresses") Reported-by: Yanghang Liu <yanghliu@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://lore.kernel.org/r/20220712062642.6915-1-ihuguet@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The device tree should include generic "jedec,spi-nor" compatible, and a
manufacturer-specific one.
The macronix part is what is shipped on the boards that come with a
flash chip.
While reading icmp sysctl variables, they can be changed concurrently.
So, we need to add READ_ONCE() to avoid data-races.
Fixes: 4cdf507d5452 ("icmp: add a global rate limitation") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
While reading cipso sysctl variables, they can be changed concurrently.
So, we need to add READ_ONCE() to avoid data-races.
Fixes: 446fda4f2682 ("[NetLabel]: CIPSOv4 engine") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Put the SGTL5000 in a silent/safe state on shutdown/remove, this is
required since the SGTL5000 produces a constant noise on its output
after it is configured and its clock is removed. Without this change
this is happening every time the module is unbound/removed or from
reboot till the clock is enabled again.
The issue was experienced on both a Toradex Colibri/Apalis iMX6, but can
be easily reproduced everywhere just playing something on the codec and
after that removing/unbinding the driver.
Fixes: 9b34e6cc3bc2 ("ASoC: Add Freescale SGTL5000 codec support") Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Reviewed-by: Fabio Estevam <festevam@denx.de> Link: https://lore.kernel.org/r/20220624101301.441314-1-francesco.dolcini@toradex.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Jon reports that the Spectre-BHB init code is filling up the kernel log
with spurious notifications about which mitigation has been enabled,
every time any CPU comes out of a low power state.
Given that Spectre-BHB mitigations are system wide, only a single
mitigation can be enabled, and we already print an error if two types of
CPUs coexist in a single system that require different Spectre-BHB
mitigations.
This means that the pr_info() that describes the selected mitigation
does not need to be emitted for each CPU anyway, and so we can simply
emit it only once.
In order to clarify the above in the log message, update it to describe
that the selected mitigation will be enabled on all CPUs, including ones
that are unaffected. If another CPU comes up later that is affected and
requires a different mitigation, we report an error as before.
Fixes: b9baf5c8c5c3 ("ARM: Spectre-BHB workaround") Tested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
The permission flags of newly created symlinks are wrongly dropped on
nilfs2 with the current umask value even though symlinks should have 777
(rwxrwxrwx) permissions:
Each cset (css_set) is pinned by its tasks. When we're moving tasks around
across csets for a migration, we need to hold the source and destination
csets to ensure that they don't go away while we're moving tasks about. This
is done by linking cset->mg_preload_node on either the
mgctx->preloaded_src_csets or mgctx->preloaded_dst_csets list. Using the
same cset->mg_preload_node for both the src and dst lists was deemed okay as
a cset can't be both the source and destination at the same time.
Unfortunately, this overloading becomes problematic when multiple tasks are
involved in a migration and some of them are identity noop migrations while
others are actually moving across cgroups. For example, this can happen with
the following sequence on cgroup1:
the process including the group leader back into a. In this final migration,
non-leader threads would be doing identity migration while the group leader
is doing an actual one.
After #3, let's say the whole process was in cset A, and that after #4, the
leader moves to cset B. Then, during #6, the following happens:
1. cgroup_migrate_add_src() is called on B for the leader.
2. cgroup_migrate_add_src() is called on A for the other threads.
3. cgroup_migrate_prepare_dst() is called. It scans the src list.
4. It notices that B wants to migrate to A, so it tries to A to the dst
list but realizes that its ->mg_preload_node is already busy.
5. and then it notices A wants to migrate to A as it's an identity
migration, it culls it by list_del_init()'ing its ->mg_preload_node and
putting references accordingly.
6. The rest of migration takes place with B on the src list but nothing on
the dst list.
This means that A isn't held while migration is in progress. If all tasks
leave A before the migration finishes and the incoming task pins it, the
cset will be destroyed leading to use-after-free.
This is caused by overloading cset->mg_preload_node for both src and dst
preload lists. We wanted to exclude the cset from the src list but ended up
inadvertently excluding it from the dst list too.
This patch fixes the issue by separating out cset->mg_preload_node into
->mg_src_preload_node and ->mg_dst_preload_node, so that the src and dst
preloadings don't interfere with each other.
After emulating a misaligned load or store issued in Thumb mode, we have
to advance the IT state by hand, or it will get out of sync with the
actual instruction stream, which means we'll end up applying the wrong
condition code to subsequent instructions. This might corrupt the
program state rather catastrophically.
So borrow the it_advance() helper from the probing code, and use it on
CPSR if the emulated instruction is Thumb.
Cc: <stable@vger.kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Print the message about disabled Spectre workarounds only once. The
message is printed each time CPU goes out from idling state on NVIDIA
Tegra boards, causing storm in KMSG that makes system unusable.
Cc: stable@vger.kernel.org Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The trace event sock_exceed_buf_limit saves the prot->sysctl_mem pointer
and then dereferences it in the TP_printk() portion. This is unsafe as the
TP_printk() portion is executed at the time the buffer is read. That is,
it can be seconds, minutes, days, months, even years later. If the proto
is freed, then this dereference will can also lead to a kernel crash.
Instead, save the sysctl_mem array into the ring buffer and have the
TP_printk() reference that instead. This is the proper and safe way to
read pointers in trace events.
Link: https://lore.kernel.org/all/20220706052130.16368-12-kuniyu@amazon.com/ Cc: stable@vger.kernel.org Fixes: 3847ce32aea9f ("core: add tracepoints for queueing skb to rcvbuf") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As commit 46bbe5c671e0 ("tracing: fix double free") said, the
"double free" problem reported by clang static analyzer is:
> In parse_var_defs() if there is a problem allocating
> var_defs.expr, the earlier var_defs.name is freed.
> This free is duplicated by free_var_defs() which frees
> the rest of the list.
However, if there is a problem allocating N-th var_defs.expr:
+ in parse_var_defs(), the freed 'earlier var_defs.name' is
actually the N-th var_defs.name;
+ then in free_var_defs(), the names from 0th to (N-1)-th are freed;
IF ALLOCATING PROBLEM HAPPENED HERE!!! -+
\
|
0th 1th (N-1)-th N-th V
+-------------+-------------+-----+-------------+-----------
var_defs: | name | expr | name | expr | ... | name | expr | name | ///
+-------------+-------------+-----+-------------+-----------
These two frees don't act on same name, so there was no "double free"
problem before. Conversely, after that commit, we get a "memory leak"
problem because the above "N-th var_defs.name" is not freed.
If enable CONFIG_DEBUG_KMEMLEAK and inject a fault at where the N-th
var_defs.expr allocated, then execute on shell like:
$ echo 'hist:key=call_site:val=$v1,$v2:v1=bytes_req,v2=bytes_alloc' > \
/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
xenvif_rx_next_skb() is expecting the rx queue not being empty, but
in case the loop in xenvif_rx_action() is doing multiple iterations,
the availability of another skb in the rx queue is not being checked.
On a HP 288 Pro G2 MT (X9W02AV), the front mic could not be detected.
In order to get it working, the pin configuration needs to be set
correctly, and the ALC221_FIXUP_HP_288PRO_MIC_NO_PRESENCE fixup needs
to be applied.
There is another HP ProDesk 600 G3 model with the PCI SSID 103c:82b4
that requires the quirk HP_MIC_NO_PRESENCE. Add the corresponding
entry to the quirk table.
It seems that it is valid to have less than the requested number of
descriptors. But what is not valid and leads to subsequent errors is to
have zero descriptors. In that case, abort the probing.
Fixes: e1f7c9eee707 ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver") Signed-off-by: Michael Walle <michael@walle.cc> Link: https://lore.kernel.org/r/20220526135111.1470926-1-michael@walle.cc Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The DEFINE_SPINLOCK() macro shouldn't be used for dynamically allocated
spinlocks. The lockdep warns about this and disables locking validator.
Fix the warning by making lock static.
INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.
Hardware name: Radxa ROCK Pi 4C (DT)
Call trace:
dump_backtrace.part.0+0xcc/0xe0
show_stack+0x18/0x6c
dump_stack_lvl+0x8c/0xb8
dump_stack+0x18/0x34
register_lock_class+0x4a8/0x4cc
__lock_acquire+0x78/0x20cc
lock_acquire.part.0+0xe0/0x230
lock_acquire+0x68/0x84
_raw_spin_lock_irqsave+0x84/0xc4
add_desc+0x44/0xc0
pl330_get_desc+0x15c/0x1d0
pl330_prep_dma_cyclic+0x100/0x270
snd_dmaengine_pcm_trigger+0xec/0x1c0
dmaengine_pcm_trigger+0x18/0x24
...
This is another old BUG_ON() that just shouldn't exist (see also commit a382f8fee42c: "signal handling: don't use BUG_ON() for debugging").
In fact, as Matthew Wilcox points out, this condition shouldn't really
even result in a warning, since a negative id allocation result is just
a normal allocation failure:
"I wonder if we should even warn here -- sure, the caller is trying to
free something that wasn't allocated, but we don't warn for
kfree(NULL)"
and goes on to point out how that current error check is only causing
people to unnecessarily do their own index range checking before freeing
it.
This was noted by Itay Iellin, because the bluetooth HCI socket cookie
code does *not* do that range checking, and ends up just freeing the
error case too, triggering the BUG_ON().
The HCI code requires CAP_NET_RAW, and seems to just result in an ugly
splat, but there really is no reason to BUG_ON() here, and we have
generally striven for allocation models where it's always ok to just do
free(alloc());
even if the allocation were to fail for some random reason (usually
obviously that "random" reason being some resource limit).
Fixes: 88eca0207cf1 ("ida: simplified functions for id allocation") Reported-by: Itay Iellin <ieitayie@gmail.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Set return value in rsp_buf alloc error path before going to
error handling.
drivers/misc/cardreader/rtsx_usb.c:639:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
if (!ucr->rsp_buf)
^~~~~~~~~~~~~
drivers/misc/cardreader/rtsx_usb.c:678:9: note: uninitialized use occurs here
return ret;
^~~
drivers/misc/cardreader/rtsx_usb.c:639:2: note: remove the 'if' if its condition is always false
if (!ucr->rsp_buf)
^~~~~~~~~~~~~~~~~~
drivers/misc/cardreader/rtsx_usb.c:622:9: note: initialize the variable 'ret' to silence this warning
int ret;
^
= 0
Fixes: 3776c7855985 ("misc: rtsx_usb: use separate command and response buffers") Reported-by: kernel test robot <lkp@intel.com> Cc: stable <stable@kernel.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20220701165352.15687-1-skhan@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
rtsx_usb uses same buffer for command and response. There could
be a potential conflict using the same buffer for both especially
if retries and timeouts are involved.
Use separate command and response buffers to avoid conflicts.
rtsx_usb driver allocates coherent dma buffer for urb transfers.
This buffer is passed to usb_bulk_msg() and usb core tries to
map already mapped buffer running into a dma mapping error.
xhci_hcd 0000:01:00.0: rejecting DMA map of vmalloc memory
WARNING: CPU: 1 PID: 279 at include/linux/dma-mapping.h:326 usb_ hcd_map_urb_for_dma+0x7d6/0x820
When packets are not received, they aren't received on $host1_if, so the
message talking about the second host not receiving them is incorrect.
Fix it.
Fixes: d4deb01467ec ("selftests: forwarding: Add a test for FDB learning") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The first host interface has by default no interest in receiving packets
MAC DA de:ad:be:ef:13:37, so it might drop them before they hit the tc
filter and this might confuse the selftest.
Enable promiscuous mode such that the filter properly counts received
packets.
Fixes: d4deb01467ec ("selftests: forwarding: Add a test for FDB learning") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>