It happens when enable debug log, if set_alt() returns
USB_GADGET_DELAYED_STATUS and usb_composite_setup_continue()
is called before increasing count of @delayed_status,
so fix it by using spinlock of @cdev->lock.
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com> Tested-by: Jay Hsu <shih-chieh.hsu@mediatek.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If isoc split in transfer with no data (the length of DATA0
packet is zero), we can't simply return immediately. Because
the DATA0 can be the first transaction or the second transaction
for the isoc split in transaction. If the DATA0 packet with no
data is in the first transaction, we can return immediately.
But if the DATA0 packet with no data is in the second transaction
of isoc split in transaction sequence, we need to increase the
qtd->isoc_frame_index and giveback urb to device driver if needed,
otherwise, the MDATA packet will be lost.
A typical test case is that connect the dwc2 controller with an
usb hs Hub (GL852G-12), and plug an usb fs audio device (Plantronics
headset) into the downstream port of Hub. Then use the usb mic
to record, we can find noise when playback.
In the case, the isoc split in transaction sequence like this:
- SSPLIT IN transaction
- CSPLIT IN transaction
- MDATA packet (176 bytes)
- CSPLIT IN transaction
- DATA0 packet (0 byte)
This patch use both the length of DATA0 and qtd->isoc_split_offset
to check if the DATA0 is in the second transaction.
The commit 3bc04e28a030 ("usb: dwc2: host: Get aligned DMA in
a more supported way") rips out a lot of code to simply the
allocation of aligned DMA. However, it also introduces a new
issue when use isoc split in transfer.
In my test case, I connect the dwc2 controller with an usb hs
Hub (GL852G-12), and plug an usb fs audio device (Plantronics
headset) into the downstream port of Hub. Then use the usb mic
to record, we can find noise when playback.
It's because that the usb Hub uses an MDATA for the first
transaction and a DATA0 for the second transaction for the isoc
split in transaction. An typical isoc split in transaction sequence
like this:
- SSPLIT IN transaction
- CSPLIT IN transaction
- MDATA packet
- CSPLIT IN transaction
- DATA0 packet
The DMA address of MDATA (urb->dma) is always DWORD-aligned, but
the DMA address of DATA0 (urb->dma + qtd->isoc_split_offset) may
not be DWORD-aligned, it depends on the qtd->isoc_split_offset (the
length of MDATA). In my test case, the length of MDATA is usually
unaligned, this cause DATA0 packet transmission error.
This patch use kmem_cache to allocate aligned DMA buf for isoc
split in transaction. Note that according to usb 2.0 spec, the
maximum data payload size is 1023 bytes for each fs isoc ep,
and the maximum allowable interrupt data payload size is 64 bytes
or less for fs interrupt ep. So we set the size of object to be
1024 bytes in the kmem cache.
Tested-by: Gevorg Sahakyan <sahakyan@synopsys.com> Tested-by: Heiko Stuebner <heiko@sntech.de> Acked-by: Minas Harutyunyan hminas@synopsys.com> Signed-off-by: William Wu <william.wu@rock-chips.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In case when a hub is connected to DWC2 host
auto suspend occurs and host goes to
hibernation. When any device connected to hub
host hibernation exiting incorrectly.
- Added dwc2_hcd_rem_wakeup() function call to
exit from suspend state by remote wakeup.
- Increase timeout value for port suspend bit to be set.
Acked-by: Minas Harutyunyan <hminas@synopsys.com> Signed-off-by: Artur Petrosyan <arturp@synopsys.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 0198d7bb8a0c ("ASoC: omap-mcbsp: Convert to use the sdma-pcm
instead of omap-pcm") resulted in broken audio playback on OMAP1510
(discovered on Amstrad Delta).
When running on OMAP1510, omap-pcm used to obtain DMA offset from
snd_dmaengine_pcm_pointer_no_residue() based on DMA interrupt triggered
software calculations instead of snd_dmaengine_pcm_pointer() which
depended on residue value calculated from omap_dma_get_src_pos().
Similar code path is still available in now used
sound/soc/soc-generic-dmaengine-pcm.c but it is not triggered.
It was verified already before that omap_get_dma_src_pos() from
arch/arm/plat-omap/dma.c didn't work correctly for OMAP1510 - see
commit 1bdd7419910c ("ASoC: OMAP: fix OMAP1510 broken PCM pointer
callback") for details. Apparently the same applies to its successor,
omap_dma_get_src_pos() from drivers/dma/ti/omap-dma.c.
On the other hand, snd_dmaengine_pcm_pointer_no_residue() is described
as depreciated and discouraged for use in new drivers because of its
unreliable accuracy. However, it seems the only working option for
OPAM1510 now, as long as a software calculated residue is not
implemented as OMAP1510 fallback in omap-dma.
Using snd_dmaengine_pcm_pointer_no_residue() code path instead of
snd_dmaengine_pcm_pointer() in sound/soc/soc-generic-dmaengine-pcm.c
can be triggered in two ways:
- by passing pcm->flags |= SND_DMAENGINE_PCM_FLAG_NO_RESIDUE from
sound/soc/omap/sdma-pcm.c,
- by passing dma_caps.residue_granularity =
DMA_RESIDUE_GRANULARITY_DESCRIPTOR from DMA engine.
Let's do the latter.
Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com> Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently smatch warns of possible Spectre-V1 issue in ahci_led_store():
drivers/ata/libahci.c:1150 ahci_led_store() warn: potential spectre issue 'pp->em_priv' (local cap)
Userspace controls @pmp from following callchain:
em_message->store()
->ata_scsi_em_message_store()
-->ap->ops->em_store()
--->ahci_led_store()
After the mask+shift @pmp is effectively an 8b value, which is used to
index into an array of length 8, so sanitize the array index.
Run the completer task to post a work completion after processing
a memory registration or invalidate work request. This covers the
case where the memory registration or invalidate was the last work
request posted to the qp.
Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com> Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On some Mali-DP processors, the LAYER_FORMAT register contains fields
other than the format. These bits were unconditionally cleared when
setting the pixel format, whereas they should be preserved at their
reset values.
In the situation that DE and SE aren’t shared the same interrupt number,
the Global SE interrupts mask bit MASK_IRQ_EN in MASKIRQ must be set, or
else other mask bits will not work and no SE interrupt will occur. This
patch enables MASK_IRQ_EN for SE to fix this problem.
One needs to ensure that the crtcs are shutdown so that the
drm_crtc_state->connector_mask reflects that no connectors
are currently active. Further, it reduces the reference
count for each connector. This ensures that the connectors
and encoders can be cleanly removed either when _unbind
is called for the corresponding drivers or by
drm_mode_config_cleanup().
We need drm_atomic_helper_shutdown() to be called before
component_unbind_all() otherwise the connectors attached to the
component device will have the wrong reference count value and will not
be cleanly removed.
When vm test is skipped because of unmet dependencies and/or unsupported
configuration, it exits with error which is treated as a fail by the
Kselftest framework. This leads to false negative result even when the
test could not be run.
Change it to return kselftest skip code when a test gets skipped to
clearly report that the test could not be run.
Kselftest framework SKIP code is 4 and the framework prints appropriate
messages to indicate that the test is skipped.
When zram test is skipped because of unmet dependencies and/or
unsupported configuration, it exits with error which is treated as
a fail by the Kselftest framework. This leads to false negative result
even when the test could not be run.
Change it to return kselftest skip code when a test gets skipped to
clearly report that the test could not be run.
Kselftest framework SKIP code is 4 and the framework prints appropriate
messages to indicate that the test is skipped.
When user test is skipped because of unmet dependencies and/or
unsupported configuration, it exits with error which is treated as
a fail by the Kselftest framework. This leads to false negative result
even when the test could not be run.
Change it to return kselftest skip code when a test gets skipped to
clearly report that the test could not be run. Add an explicit check
for module presence and return skip code if module isn't present.
Kselftest framework SKIP code is 4 and the framework prints appropriate
messages to indicate that the test is skipped.
When sysctl test is skipped because of unmet dependencies and/or
unsupported configuration, it exits with error which is treated as
a fail by the Kselftest framework. This leads to false negative result
even when the test could not be run.
Change it to return kselftest skip code when a test gets skipped to
clearly report that the test could not be run.
Changed return code to kselftest skip code in skip error legs that check
requirements and module probe test error leg.
Kselftest framework SKIP code is 4 and the framework prints appropriate
messages to indicate that the test is skipped.
When static_keys test is skipped because of unmet dependencies and/or
unsupported configuration, it exits with error which is treated as a fail
by the Kselftest framework. This leads to false negative result even when
the test could not be run.
Change it to return kselftest skip code when a test gets skipped to clearly
report that the test could not be run.
Added an explicit searches for test_static_key_base and test_static_keys
modules and return skip code if they aren't found to differentiate between
the failure to load the module condition and module not found condition.
Kselftest framework SKIP code is 4 and the framework prints appropriate
messages to indicate that the test is skipped.
When pstore_post_reboot test gets skipped because of unmet dependencies
and/or unsupported configuration, it returns 0 which is treated as a pass
by the Kselftest framework. This leads to false positive result even when
the test could not be run.
Change it to return kselftest skip code when a test gets skipped to clearly
report that the test could not be run.
Kselftest framework SKIP code is 4 and the framework prints appropriate
messages to indicate that the test is skipped.
The helper module would be unloaded after nf_conntrack_helper_unregister,
so it may cause a possible panic caused by race.
nf_ct_iterate_destroy(unhelp, me) reset the helper of conntrack as NULL,
but maybe someone has gotten the helper pointer during this period. Then
it would panic, when it accesses the helper and the module was unloaded.
Take an example as following:
CPU0 CPU1
ctnetlink_dump_helpinfo
helper = rcu_dereference(help->helper);
unhelp
set helper as NULL
unload helper module
helper->to_nlattr(skb, ct);
As above, the cpu0 tries to access the helper and its module is unloaded,
then the panic happens.
It is a waste of memory to use a full "struct netns_sysctl_ipv6"
while only one pointer is really used, considering netns_sysctl_ipv6
keeps growing.
Also, since "struct netns_frags" has cache line alignment,
it is better to move the frags_hdr pointer outside, otherwise
we spend a full cache line for this pointer.
This saves 192 bytes of memory per netns.
Fixes: c038a767cd69 ("ipv6: add a new namespace for nf_conntrack_reasm") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On this system EC interrupt triggers constantly kicking devices out of
low power states and thus blocking power management. The system also has
a PCIe root port hosting Alpine Ridge Thunderbolt controller and it
never gets a chance to go to D3cold because of this.
Since the power button works the same regardless if EC interrupt is
enabled or not during s2idle, add a quirk for this machine that sets
ec_no_wakeup=true preventing spurious wakeups.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The clocks have already been explicitly disabled and put as part of
remove() so the runtime suspend callback must not be run when balancing
the runtime PM usage count before returning.
Fixes: 16adc674d0d6 ("usb: dwc3: add generic OF glue layer") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Correct MIPI/PCIe/USB_HSIC's PGC offset based on
design RTL, the values in the Reference Manual
(Rev. 1, 01/2018 and the older ones) are incorrect.
The correct offset values should be as below:
0x800 ~ 0x83F: PGC for core0 of A7 platform;
0x840 ~ 0x87F: PGC for core1 of A7 platform;
0x880 ~ 0x8BF: PGC for SCU of A7 platform;
0xA00 ~ 0xA3F: PGC for fastmix/megamix;
0xC00 ~ 0xC3F: PGC for MIPI PHY;
0xC40 ~ 0xC7F: PGC for PCIe_PHY;
0xC80 ~ 0xCBF: PGC for USB OTG1 PHY;
0xCC0 ~ 0xCFF: PGC for USB OTG2 PHY;
0xD00 ~ 0xD3F: PGC for USB HSIC PHY;
Commit cc66b3038254 ("hwmon: (nct6775) Rework temperature source and label
handling") changed a loop limit from "data->temp_label_num - 1" to "32",
as part of moving from a string array to a bit mask. This results in the
following error, reported by UBSAN.
UBSAN: Undefined behaviour in drivers/hwmon/nct6775.c:4179:27
shift exponent 32 is too large for 32-bit type 'long unsigned int'
Similar to the original loop, the limit has to be one less than the
number of bits.
Fixes: cc66b3038254 ("hwmon: (nct6775) Rework temperature source and label handling") Reported-by: Paul Menzel <pmenzel+linux-hwmon@molgen.mpg.de> Cc: Paul Menzel <pmenzel+linux-hwmon@molgen.mpg.de> Tested-by: Paul Menzel <pmenzel+linux-hwmon@molgen.mpg.de> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Calling fan related SMM functions implemented by Dell BIOS firmware on Dell
XPS13 9333 freeze kernel for about 500ms. Until Dell fixes it we need to
disable fan support for Dell XPS13 9333.
Via "force" module param fan support can be enabled.
Compared to other clients the Linux smb3 client ramps up
credits very slowly, taking more than 128 operations before a
maximum size write could be sent (since the number of credits
requested is only 2 per small operation, causing the credit
limit to grow very slowly).
This lack of credits initially would impact large i/o performance,
when large i/o is tried early before enough credits are built up.
Signed-off-by: Steve French <sfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Modern distroes increasingly make use of BPF programs. Default
Ubuntu 18.04 installation boots with a number of cgroup_skb
programs loaded.
test_offloads.py tries to check if programs and maps are not
leaked on error paths by confirming the list of programs on the
system is empty between tests.
Since we can no longer expect the system to have no BPF objects
at boot try to remember the programs and maps present at the start,
and skip those when scanning the system.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
GCC built for arc*-*-linux has "-mmedium-calls" implicitly enabled by default
thus we don't see any problems during Linux kernel compilation.
----------------------------->8------------------------
arc-linux-gcc -mcpu=arc700 -Q --help=target | grep calls
-mlong-calls [disabled]
-mmedium-calls [enabled]
----------------------------->8------------------------
But if we try to use so-called Elf32 toolchain with GCC configured for
arc*-*-elf* then we'd see the following failure:
----------------------------->8------------------------
init/do_mounts.o: In function 'init_rootfs':
do_mounts.c:(.init.text+0x108): relocation truncated to fit: R_ARC_S21W_PCREL
against symbol 'unregister_filesystem' defined in .text section in fs/filesystems.o
arc-elf32-ld: final link failed: Symbol needs debug section which does not exist
make: *** [vmlinux] Error 1
----------------------------->8------------------------
That happens because neither "-mmedium-calls" nor "-mlong-calls" are enabled in
Elf32 GCC:
----------------------------->8------------------------
arc-elf32-gcc -mcpu=arc700 -Q --help=target | grep calls
-mlong-calls [disabled]
-mmedium-calls [disabled]
----------------------------->8------------------------
Now to make it possible to use Elf32 toolchain for building Linux kernel
we're explicitly add "-mmedium-calls" to CFLAGS.
And since we add "-mmedium-calls" to the global CFLAGS there's no point in
having per-file copies thus removing them.
Buffer overflow error should not occur, as mode_fixup() callback
filters pixel clock value and it should never exceed 600000. However,
current implementation is not obviously safe and relies on
implementation of mode_fixup().
Make 'i' variable never reach unsafe value in order to avoid buffer
overflow error.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: bf1722ca ("drm/bridge/sii8620: rewrite hdmi start sequence") Signed-off-by: Maciej Purski <m.purski@samsung.com> Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Link: https://patchwork.freedesktop.org/patch/msgid/1511341718-6974-1-git-send-email-m.purski@samsung.com Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Before returning -EPERM we should release some resources, as already done
in the other error handling path of the function.
Fixes: d8f9cc328c88 ("IB/mlx4: Mark user MR as writable if actual virtual memory is writable") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The documentation for the touchscreen-swapped-x-y property states that
swapping is done after inverting if both are used. RMI4 did it the other
way around, leading to inconsistent behavior with regard to other
touchscreens.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Tested-by: Nick Dyer <nick@shmanahar.org> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some RoCE specific code in qedr_modify_qp was run over an iWARP device
when running perftest benchmarks without the -R option.
The commit 3e44e0ee0893 ("IB/providers: Avoid null netdev check for RoCE")
exposed this. Dropping the check for NULL pointer on ndev in
qedr_modify_qp lead to a null pointer dereference when running over
iWARP. Before the code would identify ndev as being NULL and return an
error.
In rxe_send, when network_type is not RDMA_NETWORK_IPV4 or
RDMA_NETWORK_IPV6, skb is freed and -EINVAL is returned.
Then rxe_xmit_packet will return -EINVAL, too. In rxe_requester,
this skb is double freed.
In rxe_requester, kfree_skb is needed only after fill_packet fails.
So kfree_skb is moved from label err to test fill_packet.
Fixes: 5793b4652155 ("IB/rxe: remove unnecessary skb_clone in xmit") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The mvneta Ethernet driver is used on a few different Marvell SoCs.
Some SoCs have per cpu interrupts for Ethernet events, the driver uses
a per CPU napi structure for this case. Some SoCs such as armada 3700
have a single interrupt for Ethernet events, the driver uses a global
napi structure for this case.
Current mvneta_config_rss() always operates the per cpu napi structure.
Fix it by operating a global napi for "single interrupt" case, and per
cpu napi structure for remaining cases.
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Fixes: 2636ac3cc2b4 ("net: mvneta: Add network support for Armada 3700 SoC") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The mvneta Ethernet driver is used on a few different Marvell SoCs.
Some SoCs have per cpu interrupts for Ethernet events. Some SoCs have
a single interrupt, independent of the CPU. The driver handles this by
having a per CPU napi structure when there are per CPU interrupts, and
a global napi structure when there is a single interrupt.
When the napi core calls mvneta_poll(), it passes the napi
instance. This was not being propagated through the call chain, and
instead the per-cpu napi instance was passed to napi_gro_receive()
call. This breaks when there is a single global napi instance.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Fixes: 2636ac3cc2b4 ("net: mvneta: Add network support for Armada 3700 SoC") Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix tcf_unbind_filter missing in cls_matchall as this will trigger
WARN_ON() in cbq_destroy_class().
Fixes: fd62d9f5c575f ("net/sched: matchall: Fix configuration race") Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: 79134e6ce2c9 ("net: do not create fallback tunnels for non-default namespaces") Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller [10ec:8136]
(rev 07)
Subsystem: ASUSTeK Computer Inc. RTL810xE PCI Express Fast
Ethernet controller [1043:200f]
Flags: bus master, fast devsel, latency 0, IRQ 16
I/O ports at e000 [size=256]
Memory at ef100000 (64-bit, non-prefetchable) [size=4K]
Memory at e0000000 (64-bit, prefetchable) [size=16K]
Capabilities: <access denied>
Kernel driver in use: r8169
Kernel modules: r8169
Falling back to MSI fixes the issue.
Fixes: 6c6aa15fdea5 ("r8169: improve interrupt handling") Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
req->sdiag_family is a user-controlled value that's used as an array
index. Sanitize it after the bounds check to avoid speculative
out-of-bounds array access.
This also protects the sock_is_registered() call, so this removes the
sanitize call there.
Fixes: e978de7a6d38 ("net: socket: Fix potential spectre v1 gadget in sock_is_registered") Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: konrad.wilk@oracle.com Cc: jamie.iles@oracle.com Cc: liran.alon@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Jeremy Cline <jcline@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It was possible to directly leak the kernel address where the isdn_dev
structure pointer was stored. This is a kernel ASLR bypass for anyone
with access to the ioctl. The code had been present since the beginning
of git history, though this shouldn't ever be needed for normal operation,
therefore remove it.
Reported-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Karsten Keil <isdn@linux-pingi.de> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ 210.115454] Memory state around the buggy address:
[ 210.115460] ffff880107e17000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 210.115465] ffff880107e17080: fc fc fc fc fc fc fc fc fc fc fc fc fc fb fb fb
[ 210.115469] >ffff880107e17100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 210.115472] ^
[ 210.115477] ffff880107e17180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 210.115481] ffff880107e17200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 210.115483] ==================================================================
And finally when BT_DBG() and ftrace was enabled it showed:
Only in the failed case, sco_sock_kill() gets called with the same sock
pointer two times. Add a check for SOCK_DEAD to avoid continue killing
a socket which has already been killed.
Make sure to disable clocks and deregister any exported partitions
before returning on late probe errors.
Note that since commit ee895ccdf776 ("misc: sram: fix enabled clock leak
on error path"), partitions are deliberately exported before enabling
the clock so we stick to that logic here. A follow up patch will address
this.
dw8250_set_termios() doesn't set baud rate if the arg "old ktermios" is
NULL. This happens during resume.
Call Trace:
...
[ 54.928108] dw8250_set_termios+0x162/0x170
[ 54.928114] serial8250_set_termios+0x17/0x20
[ 54.928117] uart_change_speed+0x64/0x160
[ 54.928119] uart_resume_port
...
So the baud rate is not restored after S3 and breaks the apps who use
UART, for example, console and bluetooth etc.
We address this issue by setting the baud rate irrespective of arg
"old", just like the drivers for other 8250 IPs. This is tested with
Intel Broxton platform.
Signed-off-by: Chen Hu <hu1.chen@intel.com> Fixes: 4e26b134bd17 ("serial: 8250_dw: clock rate handling for all ACPI platforms") Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com> Cc: stable <stable@vger.kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
did not account for devices with a slave device on the expansion port.
This patch pokes the INT0 register in the slave device, if present, in
order to ensure that MSI interrupts don't get permanently "stuck"
because of a sleep wake-up interrupt as described here:
The above commit causes userland application to no longer write
correctly its first write to a dumb terminal connected to /dev/ttyS0.
This commit seems to be the culprit. It's as though the TX FIFO is being
reset during that write. What should be displayed is:
Every time I tried to upgrade my laptop from 3.10.x to 4.x I faced an
issue by which the fan would run at full speed upon resume. Bisecting
it showed me the issue was introduced in 3.17 by commit 821d6f0359b0
(ACPI / sleep: Do not save NVS for new machines to accelerate S3). This
code only affects machines built starting as of 2012, but this Asus
1025C laptop was made in 2012 and apparently needs the NVS data to be
saved, otherwise the CPU's thermal state is not properly reported on
resume and the fan runs at full speed upon resume.
Here's a very simple way to check if such a machine is affected :
# cat /sys/class/thermal/thermal_zone0/temp
55000
( now suspend, wait one second and resume )
# cat /sys/class/thermal/thermal_zone0/temp
0
(and after ~15 seconds the fan starts to spin)
Let's apply the same quirk as commit cbc00c13 (ACPI: save NVS memory
for Lenovo G50-45) and reuse the function it provides. Note that this
commit was already backported to 4.9.x but not 4.4.x.
Cc: 3.17+ <stable@vger.kernel.org> # 3.17+: requires cbc00c13 Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The device exposes AT, NMEA and DIAG ports in both USB configurations.
The patch explicitly ignores interfaces 0 and 1, as they're bound to
other drivers already; and also interface 6, which is a GNSS interface
for which we don't have a driver yet.
Signed-off-by: Movie Song <MovieSong@aten-itlab.cn> Cc: Johan Hovold <johan@kernel.org> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The portdata spinlock can be taken in interrupt context (via
sierra_outdat_callback()).
Disable interrupts when taking the portdata spinlock when discarding
deferred URBs during close to prevent a possible deadlock.
Fixes: 014333f77c0b ("USB: sierra: fix urb and memory leak on disconnect") Cc: stable <stable@vger.kernel.org> Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[ johan: amend commit message and add fixes and stable tags ] Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The endian conversions used in vxp_dma_read() and vxp_dma_write() are
superfluous and even wrong on big-endian machines, as inw() and outw()
already do conversions. Kill them.
snd_dma_alloc_pages_fallback() tries to allocate pages again when the
allocation fails with reduced size. But the first try actually
*increases* the size to power-of-two, which may give back a larger
chunk than the requested size. This confuses the callers, e.g. sgbuf
assumes that the size is equal or less, and it may result in a bad
loop due to the underflow and eventually lead to Oops.
The code of this function seems incorrectly assuming the usage of
get_order(). We need to decrease at first, then align to
power-of-two.
Reported-and-tested-by: he, bo <bo.he@intel.com> Reported-by: zhang jun <jun.zhang@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
One place in cs5535audio_build_dma_packets() does an extra conversion
via cpu_to_le32(); namely jmpprd_addr is passed to setup_prd() ops,
which writes the value via cs_writel(). That is, the callback does
the conversion by itself, and we don't need to convert beforehand.
The virmidi output trigger tries to parse the all available bytes and
process sequencer events as much as possible. In a normal situation,
this is supposed to be relatively short, but a program may give a huge
buffer and it'll take a long time in a single spin lock, which may
eventually lead to a soft lockup.
This patch simply adds a workaround, a cond_resched() call in the loop
if applicable. A better solution would be to move the event processor
into a work, but let's put a duct-tape quickly at first.
The endian conversions used in vx2_dma_read() and vx2_dma_write() are
superfluous and even wrong on big-endian machines, as inl() and outl()
already do conversions. Kill them.
Spotted by sparse, a warning like:
sound/pci/vx222/vx222_ops.c:278:30: warning: incorrect type in argument 1 (different base types)
AF_RXRPC has a keepalive message generator that generates a message for a
peer ~20s after the last transmission to that peer to keep firewall ports
open. The implementation is incorrect in the following ways:
(1) It mixes up ktime_t and time64_t types.
(2) It uses ktime_get_real(), the output of which may jump forward or
backward due to adjustments to the time of day.
(3) If the current time jumps forward too much or jumps backwards, the
generator function will crank the base of the time ring round one slot
at a time (ie. a 1s period) until it catches up, spewing out VERSION
packets as it goes.
Fix the problem by:
(1) Only using time64_t. There's no need for sub-second resolution.
(2) Use ktime_get_seconds() rather than ktime_get_real() so that time
isn't perceived to go backwards.
(3) Simplifying rxrpc_peer_keepalive_worker() by splitting it into two
parts:
(a) The "worker" function that manages the buckets and the timer.
(b) The "dispatch" function that takes the pending peers and
potentially transmits a keepalive packet before putting them back
in the ring into the slot appropriate to the revised last-Tx time.
(4) Taking everything that's pending out of the ring and splicing it into
a temporary collector list for processing.
In the case that there's been a significant jump forward, the ring
gets entirely emptied and then the time base can be warped forward
before the peers are processed.
The warping can't happen if the ring isn't empty because the slot a
peer is in is keepalive-time dependent, relative to the base time.
(5) Limit the number of iterations of the bucket array when scanning it.
(6) Set the timer to skip any empty slots as there's no point waking up if
there's nothing to do yet.
This can be triggered by an incoming call from a server after a reboot with
AF_RXRPC and AFS built into the kernel causing a peer record to be set up
before userspace is started. The system clock is then adjusted by
userspace, thereby potentially causing the keepalive generator to have a
meltdown - which leads to a message like:
Fixes: ace45bec6d77 ("rxrpc: Fix firewall route keepalive") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There have been two reports that network doesn't come back on resume
from suspend when using MSI-X. Both cases affect the same chip version
(RTL8168g - version 40), on different systems. Falling back to MSI
fixes the issue.
Even though we don't really have a proof yet that the network chip
version is to blame, let's disable MSI-X for this version.
Reported-by: Steve Dodd <steved424@gmail.com> Reported-by: Lou Reed <gogen@disroot.org> Tested-by: Steve Dodd <steved424@gmail.com> Tested-by: Lou Reed <gogen@disroot.org> Fixes: 6c6aa15fdea5 ("r8169: improve interrupt handling") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current check relies on function BDF addresses and can get
us wrong e.g when two VFs are assigned into a VM and the PCI
v-address is set by the hypervisor.
Fixes: 5c65c564c962 ('net/mlx5e: Support offloading TC NIC hairpin flows') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Alaa Hleihel <alaa@mellanox.com> Tested-by: Alaa Hleihel <alaa@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In previous patch mlxsw_afa_resource_del() was added to avoid a duplicate
resource detruction scenario.
For mirror actions, such duplicate destruction leads to a crash as in:
# tc qdisc add dev swp49 ingress
# tc filter add dev swp49 parent ffff: \
protocol ip chain 100 pref 10 \
flower skip_sw dst_ip 192.168.101.1 action drop
# tc filter add dev swp49 parent ffff: \
protocol ip pref 10 \
flower skip_sw dst_ip 192.168.101.1 action goto chain 100 \
action mirred egress mirror dev swp4
Therefore add a call to mlxsw_afa_resource_del() in
mlxsw_afa_mirror_destroy() in order to clear that resource
from rule's resources.
Fixes: d0d13c1858a1 ("mlxsw: spectrum_acl: Add support for mirror action") Signed-off-by: Nir Dotan <nird@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Each tc flower rule uses a hidden count action. As counter resource may
not be available due to limited HW resources, update _counter_create()
and _counter_destroy() pair to follow previously introduced symmetric
error condition handling, add a call to mlxsw_afa_resource_del() as part
of the counter resource destruction.
Fixes: c18c1e186ba8 ("mlxsw: core: Make counter index allocated inside the action append") Signed-off-by: Nir Dotan <nird@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some ACL actions require the allocation of a separate resource
prior to applying the action itself. When facing an error condition
during the setup phase of the action, resource should be destroyed.
For such actions the destruction was done twice which is dangerous
and lead to a potential crash.
The destruction took place first upon error on action setup phase
and then as the rule was destroyed.
The following sequence generated a crash:
# tc qdisc add dev swp49 ingress
# tc filter add dev swp49 parent ffff: \
protocol ip chain 100 pref 10 \
flower skip_sw dst_ip 192.168.101.1 action drop
# tc filter add dev swp49 parent ffff: \
protocol ip pref 10 \
flower skip_sw dst_ip 192.168.101.1 action goto chain 100 \
action mirred egress mirror dev swp4
Therefore add mlxsw_afa_resource_del() as a complement of
mlxsw_afa_resource_add() to add symmetry to resource_list membership
handling. Call this from mlxsw_afa_fwd_entry_ref_destroy() to make the
_fwd_entry_ref_create() and _fwd_entry_ref_destroy() pair of calls a
NOP.
Fixes: 140ce421217e ("mlxsw: core: Convert fwd_entry_ref list to be generic per-block resource list") Signed-off-by: Nir Dotan <nird@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to RFC791, 68 bytes is the minimum size of IPv4 datagram every
device must be able to forward without further fragmentation while 576
bytes is the minimum size of IPv4 datagram every device has to be able
to receive, so in ip6_tnl_xmit(), 68(IPV4_MIN_MTU) should be the right
value for the ipv4 min mtu check in ip6_tnl_xmit.
While at it, change to use max() instead of if statement.
Fixes: c9fefa08190f ("ip6_tunnel: get the min mtu properly in ip6_tnl_xmit") Reported-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It was noticed that NIC always pass all multicast traffic to the host
regardless of IFF_ALLMULTI flag on the interface.
The rule in MC Filter Table in NIC, that is configured to accept any
multicast packets, is turning on if IFF_MULTICAST flag is set on the
interface. It leads to passing all multicast traffic to the host.
This fix changes the condition to turn on that rule by checking
IFF_ALLMULTI flag as it should.
Fixes: b21f502f84be ("net:ethernet:aquantia: Fix for multicast filter handling.") Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Spectrum switch ACL action set is built in groups of three actions
which may point to additional actions. A group holds a single record
which can be set as goto record for pointing at a following group
or can be set to mark the termination of the lookup. This is perfectly
adequate for handling a series of actions to be executed on a packet.
While the SW model allows configuration of conflicting actions
where it is clear that some actions will never execute, the mlxsw
driver must block such configurations as it creates a conflict
over the single terminate/goto record value.
For a conflicting actions configuration such as:
# tc filter add dev swp49 parent ffff: \
protocol ip pref 10 \
flower skip_sw dst_ip 192.168.101.1 \
action goto chain 100 \
action mirred egress mirror dev swp4
Where it is clear that the last action will never execute, the
mlxsw driver was issuing a warning instead of returning an error.
Therefore replace that warning with an error for this specific
case.
Fixes: 4cda7d8d7098 ("mlxsw: core: Introduce flexible actions support") Signed-off-by: Nir Dotan <nird@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We need to reset metadata cache during new IOTLB initialization,
otherwise the stale pointers to previous IOTLB may be still accessed
which will lead a use after free.
Reported-by: syzbot+c51e6736a1bf614b3272@syzkaller.appspotmail.com Fixes: f88949138058 ("vhost: introduce O(1) vq metadata cache") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is because we didn't update f->result.res when create new filter. Then in
tcindex_delete() -> tcf_unbind_filter(), we will failed to find out the res
and unbind filter, which will trigger the WARN_ON() in cbq_destroy_class().
Fix it by updating f->result.res when create new filter.
Fixes: 6e0565697a106 ("net_sched: fix another crash in cls_tcindex") Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
syzbot reported that we reinitialize an active delayed
work in vsock_stream_connect():
ODEBUG: init active (active state 0) object type: timer_list hint:
delayed_work_timer_fn+0x0/0x90 kernel/workqueue.c:1414
WARNING: CPU: 1 PID: 11518 at lib/debugobjects.c:329
debug_print_object+0x16a/0x210 lib/debugobjects.c:326
The pattern is apparently wrong, we should only initialize
the dealyed work once and could repeatly schedule it. So we
have to move out the initializations to allocation side.
And to avoid confusion, we can split the shared dwork
into two, instead of re-using the same one.
Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Reported-by: <syzbot+8a9b1bd330476a4f3db6@syzkaller.appspotmail.com> Cc: Andy king <acking@vmware.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reproducer:
tc qdisc add dev lo handle 1:0 root dsmark indices 64 set_tc_index
tc filter add dev lo parent 1:0 protocol ip prio 1 tcindex mask 0xfc shift 2
tc qdisc add dev lo parent 1:0 handle 2:0 cbq bandwidth 10Mbit cell 8 avpkt 1000 mpu 64
tc class add dev lo parent 2:0 classid 2:1 cbq bandwidth 10Mbit rate 1500Kbit avpkt 1000 prio 1 bounded isolated allot 1514 weight 1 maxburst 10
tc filter add dev lo parent 2:0 protocol ip prio 1 handle 0x2e tcindex classid 2:1 pass_on
tc qdisc add dev lo parent 2:1 pfifo limit 5
tc qdisc del dev lo root
This is because in tcindex_set_parms, when there is no old_r, we set new
exts to cr.exts. And we didn't set it to filter when r == &new_filter_result.
Then in tcindex_delete() -> tcf_exts_get_net(), we will get NULL pointer
dereference as we didn't init exts.
Fix it by moving tcf_exts_change() after "if (old_r && old_r != r)" check.
Then we don't need "cr" as there is no errout after that.
Fixes: bf63ac73b3e13 ("net_sched: fix an oops in tcindex filter") Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
llc_sap_put() decreases the refcnt before deleting sap
from the global list. Therefore, there is a chance
llc_sap_find() could find a sap with zero refcnt
in this global list.
Close this race condition by checking if refcnt is zero
or not in llc_sap_find(), if it is zero then it is being
removed so we can just treat it as gone.
Reported-by: <syzbot+278893f3f7803871f7ce@syzkaller.appspotmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In l2tp code, if it is a L2TP_UDP_ENCAP tunnel, tunnel->sk points to a
UDP socket. User could call sendmsg() on both this tunnel and the UDP
socket itself concurrently. As l2tp_xmit_skb() holds socket lock and call
__sk_dst_check() to refresh sk->sk_dst_cache, while udpv6_sendmsg() is
lockless and call sk_dst_check() to refresh sk->sk_dst_cache, there
could be a race and cause the dst cache to be freed multiple times.
So we fix l2tp side code to always call sk_dst_check() to garantee
xchg() is called when refreshing sk->sk_dst_cache to avoid race
conditions.
Syzkaller reported stack trace:
BUG: KASAN: use-after-free in atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
BUG: KASAN: use-after-free in atomic_fetch_add_unless include/linux/atomic.h:575 [inline]
BUG: KASAN: use-after-free in atomic_add_unless include/linux/atomic.h:597 [inline]
BUG: KASAN: use-after-free in dst_hold_safe include/net/dst.h:308 [inline]
BUG: KASAN: use-after-free in ip6_hold_safe+0xe6/0x670 net/ipv6/route.c:1029
Read of size 4 at addr ffff8801aea9a880 by task syz-executor129/4829
Fixes: 71b1391a4128 ("l2tp: ensure sk->dst is still valid") Reported-by: syzbot+05f840f3b04f211bad55@syzkaller.appspotmail.com Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Guillaume Nault <g.nault@alphalink.fr> Cc: David Ahern <dsahern@gmail.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The shift of 'cwnd' with '(now - hc->tx_lsndtime) / hc->tx_rto' value
can lead to undefined behavior [1].
In order to fix this use a gradual shift of the window with a 'while'
loop, similar to what tcp_cwnd_restart() is doing.
When comparing delta and RTO there is a minor difference between TCP
and DCCP, the last one also invokes dccp_cwnd_restart() and reduces
'cwnd' if delta equals RTO. That case is preserved in this change.
[1]:
[40850.963623] UBSAN: Undefined behaviour in net/dccp/ccids/ccid2.c:237:7
[40851.043858] shift exponent 67 is too large for 32-bit type 'unsigned int'
[40851.127163] CPU: 3 PID: 15940 Comm: netstress Tainted: G W E 4.18.0-rc7.x86_64 #1
...
[40851.377176] Call Trace:
[40851.408503] dump_stack+0xf1/0x17b
[40851.451331] ? show_regs_print_info+0x5/0x5
[40851.503555] ubsan_epilogue+0x9/0x7c
[40851.548363] __ubsan_handle_shift_out_of_bounds+0x25b/0x2b4
[40851.617109] ? __ubsan_handle_load_invalid_value+0x18f/0x18f
[40851.686796] ? xfrm4_output_finish+0x80/0x80
[40851.739827] ? lock_downgrade+0x6d0/0x6d0
[40851.789744] ? xfrm4_prepare_output+0x160/0x160
[40851.845912] ? ip_queue_xmit+0x810/0x1db0
[40851.895845] ? ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp]
[40851.963530] ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp]
[40852.029063] dccp_xmit_packet+0x1d3/0x720 [dccp]
[40852.086254] dccp_write_xmit+0x116/0x1d0 [dccp]
[40852.142412] dccp_sendmsg+0x428/0xb20 [dccp]
[40852.195454] ? inet_dccp_listen+0x200/0x200 [dccp]
[40852.254833] ? sched_clock+0x5/0x10
[40852.298508] ? sched_clock+0x5/0x10
[40852.342194] ? inet_create+0xdf0/0xdf0
[40852.388988] sock_sendmsg+0xd9/0x160
...
Fixes: 113ced1f52e5 ("dccp ccid-2: Perform congestion-window validation") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It turns out that we should *not* invert all not-present mappings,
because the all zeroes case is obviously special.
clear_page() does not undergo the XOR logic to invert the address bits,
i.e. PTE, PMD and PUD entries that have not been individually written
will have val=0 and so will trigger __pte_needs_invert(). As a result,
{pte,pmd,pud}_pfn() will return the wrong PFN value, i.e. all ones
(adjusted by the max PFN mask) instead of zero. A zeroed entry is ok
because the page at physical address 0 is reserved early in boot
specifically to mitigate L1TF, so explicitly exempt them from the
inversion when reading the PFN.
Manifested as an unexpected mprotect(..., PROT_NONE) failure when called
on a VMA that has VM_PFNMAP and was mmap'd to as something other than
PROT_NONE but never used. mprotect() sends the PROT_NONE request down
prot_none_walk(), which walks the PTEs to check the PFNs.
prot_none_pte_entry() gets the bogus PFN from pte_pfn() and returns
-EACCES because it thinks mprotect() is trying to adjust a high MMIO
address.
[ This is a very modified version of Sean's original patch, but all
credit goes to Sean for doing this and also pointing out that
sometimes the __pte_needs_invert() function only gets the protection
bits, not the full eventual pte. But zero remains special even in
just protection bits, so that's ok. - Linus ]
Fixes: f22cc87f6c1f ("x86/speculation/l1tf: Invert all not present mappings") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>