DO NOT access the underlying struct page of an sg table exported
by DMA-buf in dmabuf_imp_to_refs(), this is not allowed.
Please see drivers/dma-buf/dma-buf.c:mangle_sg_table() for details.
Fortunately, here (for special Xen device) we can avoid using
pages and calculate gfns directly from dma addresses provided by
the sg table.
Suggested-by: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Daniel Vetter <daniel@ffwll.ch> Link: https://lore.kernel.org/r/20240107103426.2038075-1-olekstysh@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
As per the Cadence IP document fixed the I2C clock divider value limit from
16 bits instead of 10 bits. Without this change setting up the I2C clock to
low frequencies will not work as the prescaler value might be greater than
10 bit number.
I3C clock divider value is 10 bits only. Updating the macro names for both.
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
warning in clang aims to catch these at compile time, which reveals:
arch/um/drivers/net_kern.c:353:21: warning: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Wincompatible-function-pointer-types-strict]
353 | .ndo_start_xmit = uml_net_start_xmit,
| ^~~~~~~~~~~~~~~~~~
1 warning generated.
->ndo_start_xmit() in 'struct net_device_ops' expects a return type of
'netdev_tx_t', not 'int'. Adjust the return type of uml_net_start_xmit()
to match the prototype's to resolve the warning. While UML does not
currently implement support for kCFI, it could in the future, which
means this warning becomes a fatal CFI failure at run time.
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310031340.v1vPh207-lkp@intel.com/ Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
The threads allocated inside the kernel have only a single page of
stack. Unfortunately, the vfprintf function in standard glibc may use
too much stack-space, overflowing it.
To make os_info safe to be used by helper threads, use the kernel
vscnprintf function into a smallish buffer and write out the information
to stderr.
Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
Return value of container_of(...) can't be null, so null check is not
required for 'fence'. Hence drop its NULL check.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c:93 to_amdgpu_amdkfd_fence() warn: can 'fence' even be NULL?
Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Change the rules for amdgpu_sync_resv to let KFD synchronize with VM
fences on page table reservations. This fixes intermittent memory
corruption after evictions when using amdgpu_vm_handle_moved to update
page tables for VM mappings managed through render nodes.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
WDTCTRL bit 3 sets the mode choice for the clock input of IT8784/IT8786.
Some motherboards require this bit to be set to 1 (= PCICLK mode),
otherwise the watchdog functionality gets broken. The BIOS of those
motherboards sets WDTCTRL bit 3 already to 1.
Instead of setting all bits of WDTCTRL to 0 by writing 0x00 to it, keep
bit 3 of it unchanged for IT8784/IT8786 chips. In this way, bit 3 keeps
the status as set by the BIOS of the motherboard.
Watchdog tests have been successful with this patch with the following
systems:
IT8784: Thomas-Krenn LES plus v2 (YANLING YL-KBRL2 V2)
IT8786: Thomas-Krenn LES plus v3 (YANLING YL-CLU L2)
IT8786: Thomas-Krenn LES network 6L v2 (YANLING YL-CLU6L)
In cases where mapping of mpmu/apmu/apbc registers fails, the code path
does not handle the failure gracefully, potentially leading to a memory
leak. This fix ensures proper cleanup by freeing the allocated memory
for 'pxa_unit' before returning.
In cases where kcalloc() fails for the 'clk_data->clks' allocation, the
code path does not handle the failure gracefully, potentially leading
to a memory leak. This fix ensures proper cleanup by freeing the
allocated memory for 'clk_data' before returning.
Releasing the `priv->lock` while iterating the `priv->multicast_list` in
`ipoib_mcast_join_task()` opens a window for `ipoib_mcast_dev_flush()` to
remove the items while in the middle of iteration. If the mcast is removed
while the lock was dropped, the for loop spins forever resulting in a hard
lockup (as was reported on RHEL 4.18.0-372.75.1.el8_6 kernel):
Task A (kworker/u72:2 below) | Task B (kworker/u72:0 below)
-----------------------------------+-----------------------------------
ipoib_mcast_join_task(work) | ipoib_ib_dev_flush_light(work)
spin_lock_irq(&priv->lock) | __ipoib_ib_dev_flush(priv, ...)
list_for_each_entry(mcast, | ipoib_mcast_dev_flush(dev = priv->dev)
&priv->multicast_list, list) |
ipoib_mcast_join(dev, mcast) |
spin_unlock_irq(&priv->lock) |
| spin_lock_irqsave(&priv->lock, flags)
| list_for_each_entry_safe(mcast, tmcast,
| &priv->multicast_list, list)
| list_del(&mcast->list);
| list_add_tail(&mcast->list, &remove_list)
| spin_unlock_irqrestore(&priv->lock, flags)
spin_lock_irq(&priv->lock) |
| ipoib_mcast_remove_list(&remove_list)
(Here, `mcast` is no longer on the | list_for_each_entry_safe(mcast, tmcast,
`priv->multicast_list` and we keep | remove_list, list)
spinning on the `remove_list` of | >>> wait_for_completion(&mcast->done)
the other thread which is blocked |
and the list is still valid on |
it's stack.)
Fix this by keeping the lock held and changing to GFP_ATOMIC to prevent
eventual sleeps.
Unfortunately we could not reproduce the lockup and confirm this fix but
based on the code review I think this fix should address such lockups.
Based on grepping through the source code this driver appears to be
missing a call to drm_atomic_helper_shutdown() at system shutdown time
and at driver unbind time. Among other things, this means that if a
panel is in use that it won't be cleanly powered off at system
shutdown time.
The fact that we should call drm_atomic_helper_shutdown() in the case
of OS shutdown/restart and at driver remove (or unbind) time comes
straight out of the kernel doc "driver instance overview" in
drm_drv.c.
A few notes about this fix:
- When adding drm_atomic_helper_shutdown() to the unbind path, I added
it after drm_kms_helper_poll_fini() since that's when other drivers
seemed to have it.
- Technically with a previous patch, ("drm/atomic-helper:
drm_atomic_helper_shutdown(NULL) should be a noop"), we don't
actually need to check to see if our "drm" pointer is NULL before
calling drm_atomic_helper_shutdown(). We'll leave the "if" test in,
though, so that this patch can land without any dependencies. It
could potentially be removed later.
- This patch also makes sure to set the drvdata to NULL in the case of
bind errors to make sure that shutdown can't access freed data.
Suggested-by: Maxime Ripard <mripard@kernel.org> Reviewed-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When using 32 bit RGB formats, the RGA on the rk3568 produces wrong
colors as the wrong color channels are read or written. The reason is
that the format description for the channel swizzeling is wrong and the
wrong bits are configured. For example, when converting ARGB32 to NV12,
the alpha channel is used as blue channel.. This doesn't happen if the
color format is the same on both sides.
Fix the color_swap settings of the formats to correctly handle 32 bit
RGB formats.
For RGA_COLOR_FMT_XBGR8888, the RGA_COLOR_ALPHA_SWAP bit doesn't have an
effect. Thus, it isn't possible to handle the V4L2_PIX_FMT_XRGB32. Thus,
it is removed from the list of supported formats.
Signed-off-by: Michael Tretter <m.tretter@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Sasha Levin <sashal@kernel.org>
The function stk1160_dbg gets called too many times, which causes
the output to get flooded with messages. Since stk1160_dbg uses
printk, it is now replaced with printk_ratelimited.
Suggested-by: Phillip Potter <phil@philpotter.co.uk> Signed-off-by: Ghanshyam Agrawal <ghanshyam1898@gmail.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Sasha Levin <sashal@kernel.org>
After reading the code, I think this is what happens:
We have a DSI host defined in the device tree and a DSI peripheral under
that host (i.e. an i2c device using the DSI as data bus doesn't exhibit
this behavior).
The host driver calls mipi_dsi_host_register(), which causes (via a few
functions) mipi_dsi_device_add() to be called for the DSI peripheral. So
now we have a DSI device under the host, but attach hasn't been called.
Normally the probing of the devices continues, and eventually the DSI
peripheral's driver will call mipi_dsi_attach(), attaching the
peripheral.
However, if the host driver's probe encounters an error after calling
mipi_dsi_host_register(), and before the peripheral has called
mipi_dsi_attach(), the host driver will do cleanups and return an error
from its probe function. The cleanups include calling
mipi_dsi_host_unregister().
mipi_dsi_host_unregister() will call two functions for all its DSI
peripheral devices: mipi_dsi_detach() and mipi_dsi_device_unregister().
The latter makes sense, as the device exists, but the former may be
wrong as attach has not necessarily been done.
To fix this, track the attached state of the peripheral, and only detach
from mipi_dsi_host_unregister() if the peripheral was attached.
Note that I have only tested this with a board with an i2c DSI
peripheral, not with a "pure" DSI peripheral.
However, slightly related, the unregister machinery still seems broken.
E.g. if the DSI host driver is unbound, it'll detach and unregister the
DSI peripherals. After that, when the DSI peripheral driver unbound
it'll call detach either directly or using the devm variant, leading to
a crash. And probably the driver will crash if it happens, for some
reason, to try to send a message via the DSI bus.
But that's another topic.
Tested-by: H. Nikolaus Schaller <hns@goldelico.com> Acked-by: Maxime Ripard <mripard@kernel.org> Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com> Tested-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230921-dsi-detach-fix-v1-1-d0de2d1621d9@ideasonboard.com Signed-off-by: Sasha Levin <sashal@kernel.org>
drivers/gpu/drm/drm_framebuffer.c:654 drm_mode_getfb2_ioctl() error: uninitialized symbol 'ret'.
'ret' is possibly not set when there are no errors, causing the error
above. I can't say if that ever happens in real-life, but in any case I
think it is good to initialize 'ret' to 0.
drivers/gpu/drm/drm_file.c:967 drm_show_memory_stats() error: uninitialized symbol 'supported_status'.
'supported_status' is only set in one code path. I'm not familiar with
the code to say if that path will always be ran in real life, but
whether that is the case or not, I think it is good to initialize
'supported_status' to 0 to silence the warning (and possibly fix a bug).
If refcount is less than 1, we should just warn, unlock dentry and
return true, so that the caller doesn't try to do anything else.
Taking care of that leaves the rest of "lockref_put_return() has
failed" case equivalent to "decrement refcount and rejoin the
normal slow path after the point where we grab ->d_lock".
NOTE: lockref_put_return() is strictly a fastpath thing - unlike
the rest of lockref primitives, it does not contain a fallback.
Caller (and it looks like fast_dput() is the only legitimate one
in the entire kernel) has to do that itself. Reasons for
lockref_put_return() failures:
* ->d_lock held by somebody
* refcount <= 0
* ... or an architecture not supporting lockref use of
cmpxchg - sparc, anything non-SMP, config with spinlock debugging...
We could add a fallback, but it would be a clumsy API - we'd have
to distinguish between:
(1) refcount > 1 - decremented, lock not held on return
(2) refcount < 1 - left alone, probably no sense to hold the lock
(3) refcount is 1, no cmphxcg - decremented, lock held on return
(4) refcount is 1, cmphxcg supported - decremented, lock *NOT* held
on return.
We want to return with no lock held in case (4); that's the whole point of that
thing. We very much do not want to have the fallback in case (3) return without
a lock, since the caller might have to retake it in that case.
So it wouldn't be more convenient than doing the fallback in the caller and
it would be very easy to screw up, especially since the test coverage would
suck - no way to test (3) and (4) on the same kernel build.
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
The virtual widget example makes use of an undefined SND_SOC_DAPM_NOPM
argument passed to SND_SOC_DAPM_MIXER(). Replace with the correct
SND_SOC_NOPM definition.
In a couple of loops over the all streams, we check the bitmap against
the loop counter. A more correct reference would be, however, the
index of each stream, instead.
This patch corrects the check of bitmaps to the stream index.
Note that this change doesn't fix anything for now; all existing
drivers set up the stream indices properly, hence the loop count is
always equal with the stream index. That said, this change is only
for consistency.
Let's check return value of f2fs_reserve_new_block() in do_recover_data()
rather than letting it fails silently.
Also refactoring check condition on return value of f2fs_reserve_new_block()
as below:
- trigger f2fs_bug_on() only for ENOSPC case;
- use do-while statement to avoid redundant codes;
Currently, if a VF is disabled using the
'ip link set dev $ETHX vf $VF_NUM state disable' command, the VF is still
able to receive traffic.
Fix the behavior of the 'ip link set dev $ETHX vf $VF_NUM state disable'
to completely shutdown the VF's queues making it entirely disabled and
not able to receive or send any traffic.
Modify the behavior of the 'ip link set $ETHX vf $VF_NUM state enable'
command to make a VF do reinitialization bringing the queues back up.
Co-developed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Jan Sokolowski <jan.sokolowski@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Andrii Staikov <andrii.staikov@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In case of an incomplete command or a command with a null identifier 2
reject packets will be sent, one with the identifier and one with 0.
Consuming the data of the command will prevent it.
This allows to send a reject packet for each corrupted command in a
multi-command packet.
Signed-off-by: Frédéric Danis <frederic.danis@collabora.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Set both WIDEBAND_SPEECH_SUPPORTED and VALID_LE_STATES quirks
for QCA2066.
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
This is a more of a cosmetic fix. The branch will only be taken if
proberesp_ies is set, which implies that beacon_ies is not set unless we
are connected to an AP that just did a channel switch. And, in that case
we should have found the BSS in the internal storage to begin with.
Upon assembling the array, both kernel and mdadm allow the devices to have event
counter difference of 1, and still consider them as up-to-date.
However, a device whose event count is behind by 1, may in fact not be up-to-date,
and array resync with such a device may cause data corruption.
To avoid this, consult the superblock of the freshest device about the status
of a device, whose event counter is behind by 1.
mv88e6xxx_get_stats, which collects stats from various sources,
expects all callees to return the number of stats read. If an error
occurs, 0 should be returned.
Prevent future mishaps of this kind by updating the return type to
reflect this contract.
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Per fsl,mxs-dma.yaml, the node name should be 'dma-controller'.
Change it to fix the following dt-schema warning.
imx28-apf28.dtb: dma-apbx@80024000: $nodename:0: 'dma-apbx@80024000' does not match '^dma-controller(@.*)?$'
from schema $id: http://devicetree.org/schemas/dma/fsl,mxs-dma.yaml#
The 'gpios' property to describe the SDA and SCL GPIOs is considered
deprecated according to i2c-gpio.yaml.
Switch to the preferred 'sda-gpios' and 'scl-gpios' properties.
This fixes the following schema warnings:
imx23-sansa.dtb: i2c-0: 'sda-gpios' is a required property
from schema $id: http://devicetree.org/schemas/i2c/i2c-gpio.yaml#
imx23-sansa.dtb: i2c-0: 'scl-gpios' is a required property
from schema $id: http://devicetree.org/schemas/i2c/i2c-gpio.yaml#
Per leds-gpio.yaml, the led names should start with 'led'.
Change it to fix the following dt-schema warning:
imx27-apf27dev.dtb: leds: 'user' does not match any of the regexes: '(^led-[0-9a-f]$|led)', 'pinctrl-[0-9]+'
from schema $id: http://devicetree.org/schemas/leds/leds-gpio.yaml#
Per display-timings.yaml, the 'timing' pattern should be used to
describe the display timings.
Change it accordingly to fix the following dt-schema warning:
imx27-apf27dev.dtb: display-timings: '800x480' does not match any of the regexes: '^timing', 'pinctrl-[0-9]+'
from schema $id: http://devicetree.org/schemas/display/panel/display-timings.yaml#
Per imx-iim.yaml, the compatible string should only contain a single
entry.
Use it as "fsl,imx25-iim" to fix the following dt-schema warning:
imx25-karo-tx25.dtb: efuse@53ff0000: compatible: ['fsl,imx25-iim', 'fsl,imx27-iim'] is too long
from schema $id: http://devicetree.org/schemas/nvmem/imx-iim.yaml#
Since "dev_search_path" can technically be as large as PATH_MAX,
there was a risk of truncation when copying it and a second string
into "full_path" since it was also PATH_MAX sized. The W=1 builds were
reporting this warning:
drivers/block/rnbd/rnbd-srv.c: In function 'process_msg_open.isra':
drivers/block/rnbd/rnbd-srv.c:616:51: warning: '%s' directive output may be truncated writing up to 254 bytes into a region of size between 0 and 4095 [-Wformat-truncation=]
616 | snprintf(full_path, PATH_MAX, "%s/%s",
| ^~
In function 'rnbd_srv_get_full_path',
inlined from 'process_msg_open.isra' at drivers/block/rnbd/rnbd-srv.c:721:14: drivers/block/rnbd/rnbd-srv.c:616:17: note: 'snprintf' output between 2 and 4351 bytes into a destination of size 4096
616 | snprintf(full_path, PATH_MAX, "%s/%s",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
617 | dev_search_path, dev_name);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~
To fix this, unconditionally check for truncation (as was already done
for the case where "%SESSNAME%" was present).
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312100355.lHoJPgKy-lkp@intel.com/ Cc: Md. Haris Iqbal <haris.iqbal@ionos.com> Cc: Jack Wang <jinpu.wang@ionos.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: <linux-block@vger.kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev> Acked-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20231212214738.work.169-kees@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
Don't rely on the PCI memory for the devcmd opcode because we
read a 0xff value if the PCI bus is broken, which can cause us
to report a bogus dev_cmd opcode later.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Per sram.yaml, address-cells, size-cells and ranges are mandatory.
The node name should be sram.
Change the node name and pass the required properties to fix the
following dt-schema warnings:
imx1-apf9328.dtb: esram@300000: $nodename:0: 'esram@300000' does not match '^sram(@.*)?'
from schema $id: http://devicetree.org/schemas/sram/sram.yaml#
imx1-apf9328.dtb: esram@300000: '#address-cells' is a required property
from schema $id: http://devicetree.org/schemas/sram/sram.yaml#
imx1-apf9328.dtb: esram@300000: '#size-cells' is a required property
from schema $id: http://devicetree.org/schemas/sram/sram.yaml#
imx1-apf9328.dtb: esram@300000: 'ranges' is a required property
from schema $id: http://devicetree.org/schemas/sram/sram.yaml#
Per mtd-physmap.yaml, 'nor@0,0' is not a valid node pattern.
Change it to 'flash@0,0' to fix the following dt-schema warning:
imx1-ads.dtb: nor@0,0: $nodename:0: 'nor@0,0' does not match '^(flash|.*sram|nand)(@.*)?$'
from schema $id: http://devicetree.org/schemas/mtd/mtd-physmap.yaml#
Node names should be generic. Use 'rtc' as node name to fix
the following dt-schema warning:
imx25-eukrea-mbimxsd25-baseboard.dtb: pcf8563@51: $nodename:0: 'pcf8563@51' does not match '^rtc(@.*|-([0-9]|[1-9][0-9]+))?$'
from schema $id: http://devicetree.org/schemas/rtc/nxp,pcf8563.yaml#
generic_map_{delete,update}_batch() doesn't set uattr->batch.count as
zero before it tries to allocate memory for key. If the memory
allocation fails, the value of uattr->batch.count will be incorrect.
Fix it by setting uattr->batch.count as zero beore batched update or
deletion.
We should set the status to FC_TIMED_OUT when a timeout error is passed to
fc_fcp_rec_error().
Signed-off-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20231129165832.224100-3-hare@kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The current FC error recovery is sending up to three REC (recovery) frames
in 10 second intervals, and as a final step sending an ABTS after 30
seconds for the command itself. Unfortunately sending an ABTS is also the
action for the SCSI abort handler, and the default timeout for SCSI
commands is also 30 seconds. This causes two ABTS to be scheduled, with the
libfc one slightly earlier. The ABTS scheduled by SCSI EH then sees the
command to be already aborted, and will always return with a 'GOOD' status
irrespective on the actual result from the first ABTS. This causes the
SCSI EH abort handler to always succeed, and SCSI EH never to be engaged.
Fix this by not issuing an ABTS when a SCSI command is present for the
exchange, but rather wait for the abort scheduled from SCSI EH. And warn
if an abort is already scheduled to avoid similar errors in the future.
Signed-off-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20231129165832.224100-2-hare@kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
map is the pointer of outer map, and need_defer needs some explanation.
need_defer tells the implementation to defer the reference release of
the passed element and ensure that the element is still alive before
the bpf program, which may manipulate it, exits.
The following three cases will invoke map_fd_put_ptr() and different
need_defer values will be passed to these callers:
1) release the reference of the old element in the map during map update
or map deletion. The release must be deferred, otherwise the bpf
program may incur use-after-free problem, so need_defer needs to be
true.
2) release the reference of the to-be-added element in the error path of
map update. The to-be-added element is not visible to any bpf
program, so it is OK to pass false for need_defer parameter.
3) release the references of all elements in the map during map release.
Any bpf program which has access to the map must have been exited and
released, so need_defer=false will be OK.
These two parameters will be used by the following patches to fix the
potential use-after-free problem for map-in-map.
Fix an array-index-out-of-bounds read in ath9k_htc_txstatus(). The bug
occurs when txs->cnt, data from a URB provided by a USB device, is
bigger than the size of the array txs->txstatus, which is
HTC_MAX_TX_STATUS. WARN_ON() already checks it, but there is no bug
handling code after the check. Make the function return if that is the
case.
Found by a modified version of syzkaller.
UBSAN: array-index-out-of-bounds in htc_drv_txrx.c
index 13 is out of range for type '__wmi_event_txstatus [12]'
Call Trace:
ath9k_htc_txstatus
ath9k_wmi_event_tasklet
tasklet_action_common
__do_softirq
irq_exit_rxu
sysvec_apic_timer_interrupt
nand-controller.yaml bindings says #size-cells shall be set to 0.
Fixes the dtbs_check warning:
arch/arm/boot/dts/nxp/imx/imx7s-mba7.dtb: nand-controller@33002000:
#size-cells:0:0: 0 was expected
from schema $id: http://devicetree.org/schemas/mtd/gpmi-nand.yaml#
imx7d-lcdif is compatible to imx6sx-lcdif. MXSFB_V6 supports overlay
by using LCDC_AS_CTRL register. This registers used by overlay plane:
* LCDC_AS_CTRL
* LCDC_AS_BUF
* LCDC_AS_NEXT_BUF
are listed in i.MX7D RM as well.
imx7d uses two ports for 'in-ports', so the syntax port@<num> has to
be used. imx7d has both port and port@1 nodes present, raising these
error:
funnel@30041000: in-ports: More than one condition true in oneOf schema
funnel@30041000: Unevaluated properties are not allowed
('in-ports' was unexpected)
If failed to allocate "tags" or could not find the final upper device from
start_dev's upper list in bond_verify_device_path(), only the loopback
detection of the current upper device should be affected, and the system is
no need to be panic.
So return -ENOMEM in alb_upper_dev_walk to stop walking, print some warn
information when failed to allocate memory for vlan tags in
bond_verify_device_path.
I also think that the following function calls
netdev_walk_all_upper_dev_rcu
---->>>alb_upper_dev_walk
---------->>>bond_verify_device_path
From this way, "end device" can eventually be obtained from "start device"
in bond_verify_device_path, IS_ERR(tags) could be instead of
IS_ERR_OR_NULL(tags) in alb_upper_dev_walk.
Spectrum-{1,2,3,4} devices report that a D3hot->D0 transition causes a
reset (i.e., they advertise NoSoftRst-). However, this transition does
not have any effect on the device: It continues to be operational and
network ports remain up. Advertising this support makes it seem as if a
PM reset is viable for these devices. Mark it as unavailable to skip it
when testing reset methods.
Before:
# cat /sys/bus/pci/devices/0000\:03\:00.0/reset_method
pm bus
After:
# cat /sys/bus/pci/devices/0000\:03\:00.0/reset_method
bus
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Because file_name and phba->ModelName are both declared a size 80 bytes,
the extra ".grp" file extension could cause an overflow into file_name.
Define a ELX_FW_NAME_SIZE macro with value 84. 84 incorporates the 4 extra
characters from ".grp". file_name is changed to be declared as a char and
initialized to zeros i.e. null chars.
Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20231031191224.150862-3-justintee8345@gmail.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
With latest clang18 (main branch of llvm-project repo), when building bpf selftests,
[~/work/bpf-next (master)]$ make -C tools/testing/selftests/bpf LLVM=1 -j
The following compilation error happens:
fatal error: error in backend: Branch target out of insn range
...
Stack dump:
0. Program arguments: clang -g -Wall -Werror -D__TARGET_ARCH_x86 -mlittle-endian
-I/home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/include
-I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -I/home/yhs/work/bpf-next/tools/include/uapi
-I/home/yhs/work/bpf-next/tools/testing/selftests/usr/include -idirafter
/home/yhs/work/llvm-project/llvm/build.18/install/lib/clang/18/include -idirafter /usr/local/include
-idirafter /usr/include -Wno-compare-distinct-pointer-types -DENABLE_ATOMICS_TESTS -O2 --target=bpf
-c progs/pyperf180.c -mcpu=v3 -o /home/yhs/work/bpf-next/tools/testing/selftests/bpf/pyperf180.bpf.o
1. <eof> parser at end of file
2. Code generation
...
The compilation failure only happens to cpu=v2 and cpu=v3. cpu=v4 is okay
since cpu=v4 supports 32-bit branch target offset.
The above failure is due to upstream llvm patch [1] where some inlining behavior
are changed in clang18.
To workaround the issue, previously all 180 loop iterations are fully unrolled.
The bpf macro __BPF_CPU_VERSION__ (implemented in clang18 recently) is used to avoid
unrolling changes if cpu=v4. If __BPF_CPU_VERSION__ is not available and the
compiler is clang18, the unrollng amount is unconditionally reduced.
When a hardware reset is triggered, all registers are reset, so all
queues are forced to stop in hardware interface. However, mac80211
will not automatically stop the queue. If we don't manually stop the
beacon queue, the queue will be deadlocked and unable to start again.
This patch fixes the issue where Apple devices cannot connect to the
AP after calling ieee80211_restart_hw().
When we online resize an ext4 filesystem with a oversized flexbg_size,
mkfs.ext4 -F -G 67108864 $dev -b 4096 100M
mount $dev $dir
resize2fs $dev 16G
the following WARN_ON is triggered:
==================================================================
WARNING: CPU: 0 PID: 427 at mm/page_alloc.c:4402 __alloc_pages+0x411/0x550
Modules linked in: sg(E)
CPU: 0 PID: 427 Comm: resize2fs Tainted: G E 6.6.0-rc5+ #314
RIP: 0010:__alloc_pages+0x411/0x550
Call Trace:
<TASK>
__kmalloc_large_node+0xa2/0x200
__kmalloc+0x16e/0x290
ext4_resize_fs+0x481/0xd80
__ext4_ioctl+0x1616/0x1d90
ext4_ioctl+0x12/0x20
__x64_sys_ioctl+0xf0/0x150
do_syscall_64+0x3b/0x90
==================================================================
This is because flexbg_size is too large and the size of the new_group_data
array to be allocated exceeds MAX_ORDER. Currently, the minimum value of
MAX_ORDER is 8, the minimum value of PAGE_SIZE is 4096, the corresponding
maximum number of groups that can be allocated is:
And the value that is down-aligned to the power of 2 is 16384. Therefore,
this value is defined as MAX_RESIZE_BG, and the number of groups added
each time does not exceed this value during resizing, and is added multiple
times to complete the online resizing. The difference is that the metadata
in a flex_bg may be more dispersed.
In commit 967ac8af4475 ("ext4: fix potential integer overflow in
alloc_flex_gd()"), an overflow check is added to alloc_flex_gd() to
prevent the allocated memory from being smaller than expected due to
the overflow. However, after kmalloc() is replaced with kmalloc_array()
in commit 6da2ec56059c ("treewide: kmalloc() -> kmalloc_array()"), the
kmalloc_array() function has an overflow check, so the above problem
will not occur. Therefore, the extra check is removed.
The maximum value of flexbg_size is 2^31, but the maximum value of int
is (2^31 - 1), so overflow may occur when the type of flexbg_size is
declared as int.
For example, when uninit_mask is initialized in ext4_alloc_group_tables(),
if flexbg_size == 2^31, the initialized uninit_mask is incorrect, and this
may causes set_flexbg_block_bitmap() to trigger a BUG_ON().
Therefore, the flexbg_size type is declared as unsigned int to avoid
overflow and memory waste.
Suppose we issue two FITRIM ioctls for ranges [0,15] and [16,31] with
mininum length of trimmed range set to 8 blocks. If we have say a range of
blocks 10-22 free, this range will not be trimmed because it straddles the
boundary of the two FITRIM ranges and neither part is big enough. This is a
bit surprising to some users that call FITRIM on smaller ranges of blocks
to limit impact on the system. Also XFS trims all free space extents that
overlap with the specified range so we are inconsistent among filesystems.
Let's change ext4_try_to_trim_range() to consider for trimming the whole
free space extent that straddles the end of specified range, not just the
part of it within the range.
Even though it seems to be able to resolve some names of
case-insensitive directories, the lack of d_hash and d_compare means we
end up with a broken state in the d_cache. Considering it was never a
goal to support these two together, and we are preparing to use
d_revalidate in case-insensitive filesystems, which would make the
combination even more broken, reject any attempt to get a casefolded
inode from ecryptfs.
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The problem seems to be that two out of three callers aren't taking the
rcu_read_lock() before calling the list_for_each_entry_rcu() function in
rpc_xprt_switch_has_addr(). I fix this by having
rpc_xprt_switch_has_addr() unconditionaly take the rcu_read_lock(),
which is okay to do recursively in the case that the lock has already
been taken by a caller.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
kvm_arch_vcpu_ioctl_set_fpu() allows to set the floating point control
(fpc) register of a guest cpu. The new value is tested for validity by
temporarily loading it into the fpc register.
This may lead to corruption of the fpc register of the host process:
if an interrupt happens while the value is temporarily loaded into the fpc
register, and within interrupt context floating point or vector registers
are used, the current fp/vx registers are saved with save_fpu_regs()
assuming they belong to user space and will be loaded into fp/vx registers
when returning to user space.
test_fp_ctl() restores the original user space / host process fpc register
value, however it will be discarded, when returning to user space.
In result the host process will incorrectly continue to run with the value
that was supposed to be used for a guest cpu.
Fix this by simply removing the test. There is another test right before
the SIE context is entered which will handles invalid values.
This results in a change of behaviour: invalid values will now be accepted
instead of that the ioctl fails with -EINVAL. This seems to be acceptable,
given that this interface is most likely not used anymore, and this is in
addition the same behaviour implemented with the memory mapped interface
(replace invalid values with zero) - see sync_regs() in kvm-s390.c.
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the content of the floating point control (fpc) register of a traced
process is modified with the ptrace interface the new value is tested for
validity by temporarily loading it into the fpc register.
This may lead to corruption of the fpc register of the tracing process:
if an interrupt happens while the value is temporarily loaded into the
fpc register, and within interrupt context floating point or vector
registers are used, the current fp/vx registers are saved with
save_fpu_regs() assuming they belong to user space and will be loaded into
fp/vx registers when returning to user space.
test_fp_ctl() restores the original user space fpc register value, however
it will be discarded, when returning to user space.
In result the tracer will incorrectly continue to run with the value that
was supposed to be used for the traced process.
Fix this by saving fpu register contents with save_fpu_regs() before using
test_fp_ctl().
[Analysis]
When the agstart is too large, it can cause agno overflow.
[Fix]
After obtaining agno, if the value is invalid, exit the subsequent process.
Reported-and-tested-by: syzbot+553d90297e6d2f50dbc7@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Modified the test from agno > MAXAG to agno >= MAXAG based on linux-next
report by kernel test robot (Dan Carpenter).
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
There could be a lot of servers in the list and each server can have
multiple addresses, so I think this would be better with an exclusive
second pass.
The server list isn't likely to change all that often, but when it does
change, there's a good chance several servers are going to be
added/removed one after the other. Further, this is only going to be
used for incoming cache management/callback requests from the server,
which hopefully aren't going to happen too often - but it is remotely
drivable.
(6) afs_find_server_by_uuid().
Similarly to (5), there could be a lot of servers to search through, but
they are in a tree not a flat list, so it should be faster to process.
Again, it's not likely to change that often and, again, when it does
change it's likely to involve multiple changes. This can be driven
remotely by an incoming cache management request but is mostly going to
be driven by setting up or reconfiguring a volume's server list -
something that also isn't likely to happen often.
Make the "seq" counter odd on the 2nd pass, otherwise read_seqbegin_or_lock()
never takes the lock.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/20231130115614.GA21581@redhat.com/ Signed-off-by: Sasha Levin <sashal@kernel.org>
There can be a lot of volumes known by a system. A thousand would
require a 10-step walk and this is drivable by remote operation, so I
think this should probably take a lock on the second pass too.
Make the "seq" counter odd on the 2nd pass, otherwise read_seqbegin_or_lock()
never takes the lock.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/20231130115606.GA21571@redhat.com/ Signed-off-by: Sasha Levin <sashal@kernel.org>
smatch warnings:
drivers/crypto/stm32/stm32-crc32.c:108 stm32_crc_get_next_crc() warn:
can 'crc' even be NULL?
Use list_first_entry_or_null instead of list_first_entry to retrieve
the first device registered.
The function list_first_entry always return a non NULL pointer even if
the list is empty. Hence checking if the pointer returned is NULL does
not tell if the list is empty or not.
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202311281111.ou2oUL2i-lkp@intel.com/ Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/r/202311281111.ou2oUL2i-lkp@intel.com/ Signed-off-by: Thomas Bourgoin <thomas.bourgoin@foss.st.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
When the number of cpu cores is adjusted to 7 or other odd numbers,
the zone size will become an odd number.
The address of the zone will become:
addr of zone0 = BASE
addr of zone1 = BASE + zone_size
addr of zone2 = BASE + zone_size*2
...
The address of zone1/3/5/7 will be mapped to non-alignment va.
Eventually crashes will occur when accessing these va.
So, use ALIGN_DOWN() to make sure the zone size is even
to avoid this bug.
When the execution of diMount(ipimap) fails, the object ipimap that has been
released may be accessed in diFreeSpecial(). Asynchronous ipimap release occurs
when rcu_core() calls jfs_free_node().
Therefore, when diMount(ipimap) fails, sbi->ipimap should not be initialized as
ipimap.
Reported-and-tested-by: syzbot+01cf2dbcbe2022454388@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently there is a bound check missing in the dbAdjTree while
accessing the dmt_stree. To add the required check added the bool is_ctl
which is required to determine the size as suggest in the following
commit.
https://lore.kernel.org/linux-kernel-mentees/f9475918-2186-49b8-b801-6f0f9e75f4fa@oracle.com/
The issue is caused when the value of fsi becomes less than -1.
The check to break the loop when fsi value becomes -1 is present
but syzbot was able to produce value less than -1 which cause the error.
This patch simply add the change for the values less than 0.
The patch is tested via syzbot.
Reported-and-tested-by: syzbot+d4b1df2e9d4ded6488ec@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=d4b1df2e9d4ded6488ec Signed-off-by: Osama Muhammad <osmtendev@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dmap.c:2867:6
index 196694 is out of range for type 's8[1365]' (aka 'signed char[1365]')
CPU: 1 PID: 109 Comm: jfsCommit Not tainted 6.6.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
ubsan_epilogue lib/ubsan.c:217 [inline]
__ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348
dbAdjTree+0x474/0x4f0 fs/jfs/jfs_dmap.c:2867
dbJoin+0x210/0x2d0 fs/jfs/jfs_dmap.c:2834
dbFreeBits+0x4eb/0xda0 fs/jfs/jfs_dmap.c:2331
dbFreeDmap fs/jfs/jfs_dmap.c:2080 [inline]
dbFree+0x343/0x650 fs/jfs/jfs_dmap.c:402
txFreeMap+0x798/0xd50 fs/jfs/jfs_txnmgr.c:2534
txUpdateMap+0x342/0x9e0
txLazyCommit fs/jfs/jfs_txnmgr.c:2664 [inline]
jfs_lazycommit+0x47a/0xb70 fs/jfs/jfs_txnmgr.c:2732
kthread+0x2d3/0x370 kernel/kthread.c:388
ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
</TASK>
================================================================================
Kernel panic - not syncing: UBSAN: panic_on_warn set ...
CPU: 1 PID: 109 Comm: jfsCommit Not tainted 6.6.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
panic+0x30f/0x770 kernel/panic.c:340
check_panic_on_warn+0x82/0xa0 kernel/panic.c:236
ubsan_epilogue lib/ubsan.c:223 [inline]
__ubsan_handle_out_of_bounds+0x13c/0x150 lib/ubsan.c:348
dbAdjTree+0x474/0x4f0 fs/jfs/jfs_dmap.c:2867
dbJoin+0x210/0x2d0 fs/jfs/jfs_dmap.c:2834
dbFreeBits+0x4eb/0xda0 fs/jfs/jfs_dmap.c:2331
dbFreeDmap fs/jfs/jfs_dmap.c:2080 [inline]
dbFree+0x343/0x650 fs/jfs/jfs_dmap.c:402
txFreeMap+0x798/0xd50 fs/jfs/jfs_txnmgr.c:2534
txUpdateMap+0x342/0x9e0
txLazyCommit fs/jfs/jfs_txnmgr.c:2664 [inline]
jfs_lazycommit+0x47a/0xb70 fs/jfs/jfs_txnmgr.c:2732
kthread+0x2d3/0x370 kernel/kthread.c:388
ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
</TASK>
Kernel Offset: disabled
Rebooting in 86400 seconds..
The issue is caused when the value of lp becomes greater than
CTLTREESIZE which is the max size of stree. Adding a simple check
solves this issue.
Dave:
As the function returns a void, good error handling
would require a more intrusive code reorganization, so I modified
Osama's patch at use WARN_ON_ONCE for lack of a cleaner option.
The patch is tested via syzbot.
Reported-by: syzbot+39ba34a099ac2e9bd3cb@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=39ba34a099ac2e9bd3cb Signed-off-by: Osama Muhammad <osmtendev@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
There are two major types of uncorrected recoverable (UCR) errors :
- Synchronous error: The error is detected and raised at the point of
the consumption in the execution flow, e.g. when a CPU tries to
access a poisoned cache line. The CPU will take a synchronous error
exception such as Synchronous External Abort (SEA) on Arm64 and
Machine Check Exception (MCE) on X86. OS requires to take action (for
example, offline failure page/kill failure thread) to recover this
uncorrectable error.
- Asynchronous error: The error is detected out of processor execution
context, e.g. when an error is detected by a background scrubber.
Some data in the memory are corrupted. But the data have not been
consumed. OS is optional to take action to recover this uncorrectable
error.
When APEI firmware first is enabled, a platform may describe one error
source for the handling of synchronous errors (e.g. MCE or SEA notification
), or for handling asynchronous errors (e.g. SCI or External Interrupt
notification). In other words, we can distinguish synchronous errors by
APEI notification. For synchronous errors, kernel will kill the current
process which accessing the poisoned page by sending SIGBUS with
BUS_MCEERR_AR. In addition, for asynchronous errors, kernel will notify the
process who owns the poisoned page by sending SIGBUS with BUS_MCEERR_AO in
early kill mode. However, the GHES driver always sets mf_flags to 0 so that
all synchronous errors are handled as asynchronous errors in memory failure.
To this end, set memory failure flags as MF_ACTION_REQUIRED on synchronous
events.
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Tested-by: Ma Wupeng <mawupeng1@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Xiaofei Tan <tanxiaofei@huawei.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
There is a chance if a frequent switch of the governor
done in a loop result in timer list corruption where
timer cancel being done from two place one from
cancel_delayed_work_sync() and followed by expire_timers()
can be seen from the traces[1].
while true
do
echo "simple_ondemand" > /sys/class/devfreq/1d84000.ufshc/governor
echo "performance" > /sys/class/devfreq/1d84000.ufshc/governor
done
It looks to be issue with devfreq driver where
device_monitor_[start/stop] need to synchronized so that
delayed work should get corrupted while it is either
being queued or running or being cancelled.
Let's use polling flag and devfreq lock to synchronize the
queueing the timer instance twice and work data being
corrupted.
The gcc plugin -fanalyzer [1] tries to detect various
patterns of incorrect behaviour. The tool reports:
drivers/acpi/acpi_extlog.c: In function ‘extlog_exit’:
drivers/acpi/acpi_extlog.c:307:12: warning: check of ‘extlog_l1_addr’ for NULL after already dereferencing it [-Wanalyzer-deref-before-check]
|
| 306 | ((struct extlog_l1_head *)extlog_l1_addr)->flags &= ~FLAG_OS_OPTIN;
| | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
| | |
| | (1) pointer ‘extlog_l1_addr’ is dereferenced here
| 307 | if (extlog_l1_addr)
| | ~
| | |
| | (2) pointer ‘extlog_l1_addr’ is checked for NULL here but it was already dereferenced at (1)
|
Fix the NULL pointer dereference check in extlog_exit().
When compiling with gcc version 14.0.0 20231126 (experimental)
and CONFIG_FORTIFY_SOURCE=y, I've noticed the following:
In file included from ./include/linux/string.h:295,
from ./include/linux/bitmap.h:12,
from ./include/linux/cpumask.h:12,
from ./arch/x86/include/asm/paravirt.h:17,
from ./arch/x86/include/asm/cpuid.h:62,
from ./arch/x86/include/asm/processor.h:19,
from ./arch/x86/include/asm/cpufeature.h:5,
from ./arch/x86/include/asm/thread_info.h:53,
from ./include/linux/thread_info.h:60,
from ./arch/x86/include/asm/preempt.h:9,
from ./include/linux/preempt.h:79,
from ./include/linux/spinlock.h:56,
from ./include/linux/mmzone.h:8,
from ./include/linux/gfp.h:7,
from ./include/linux/slab.h:16,
from ./include/linux/resource_ext.h:11,
from ./include/linux/acpi.h:13,
from drivers/pnp/pnpacpi/rsparser.c:11:
In function 'fortify_memcpy_chk',
inlined from 'pnpacpi_parse_allocated_vendor' at drivers/pnp/pnpacpi/rsparser.c:158:3,
inlined from 'pnpacpi_allocated_resource' at drivers/pnp/pnpacpi/rsparser.c:249:3:
./include/linux/fortify-string.h:588:25: warning: call to '__read_overflow2_field'
declared with attribute warning: detected read beyond size of field (2nd parameter);
maybe use struct_group()? [-Wattribute-warning]
588 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
According to the comments in include/linux/fortify-string.h, 'memcpy()',
'memmove()' and 'memset()' must not be used beyond individual struct
members to ensure that the compiler can enforce protection against
buffer overflows, and, IIUC, this also applies to partial copies from
the particular member ('vendor->byte_data' in this case). So it should
be better (and safer) to do both copies at once (and 'byte_data' of
'struct acpi_resource_vendor_typed' seems to be a good candidate for
'__counted_by(byte_length)' as well).
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The Colorful X15 AT 23 ACPI video-bus device report spurious
ACPI_VIDEO_NOTIFY_CYCLE events resulting in spurious KEY_SWITCHVIDEOMODE
events being reported to userspace (and causing trouble there) when
an external screen plugged in.
Add a quirk setting the report_key_events mask to
REPORT_BRIGHTNESS_KEY_EVENTS so that the ACPI_VIDEO_NOTIFY_CYCLE
events will be ignored, while still reporting brightness up/down
hotkey-presses to userspace normally.
Signed-off-by: Yuluo Qiu <qyl27@outlook.com> Co-developed-by: Celeste Liu <CoelacanthusHex@gmail.com> Signed-off-by: Celeste Liu <CoelacanthusHex@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When auditd_set sets the auditd_conn pointer, audit messages can
immediately be put on the socket by other kernel threads. If the backlog
is large or the rate is high, this can immediately fill the socket
buffer. If the audit daemon requested an ACK for this operation, a full
socket buffer causes the ACK to get dropped, also setting ENOBUFS on the
socket.
To avoid this race and ensure ACKs get through, fast-track the ACK in
this specific case to ensure it is sent before auditd_conn is set.
Signed-off-by: Chris Riches <chris.riches@nutanix.com>
[PM: fix some tab vs space damage] Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The use_count of a regulator should only be incremented when the
enable_count changes from 0 to 1. Similarly, the use_count should
only be decremented when the enable_count changes from 1 to 0.
In the previous implementation, use_count was sometimes decremented
to 0 when some consumer called unbalanced disable,
leading to unexpected disable even the regulator is enabled by
other consumers. With this change, the use_count accurately reflects
the number of users which the regulator is enabled.
This should make things more robust in the case where a consumer does
leak references.
After release of the hashbucket lock the tracking object can be modified or
freed by a concurrent thread. Using it in such a case is error prone, even
for printing the object state:
1. T1 tries to deactivate destroyed object, debugobjects detects it,
hash bucket lock is released.
2. T2 preempts T1 and frees the tracking object.
3. The freed tracking object is allocated and initialized for a
different to be tracked kernel object.
4. T1 resumes and reports error for wrong kernel object.
Create a local copy of the tracking object before releasing the hash bucket
lock and use the local copy for reporting and fixups to prevent this.
Memory errors don't happen very often, especially fatal ones. However,
in large-scale scenarios such as data centers, that probability
increases with the amount of machines present.
When a fatal machine check happens, mce_panic() is called based on the
severity grading of that error. The page containing the error is not
marked as poison.
However, when kexec is enabled, tools like makedumpfile understand when
pages are marked as poison and do not touch them so as not to cause
a fatal machine check exception again while dumping the previous
kernel's memory.
Therefore, mark the page containing the error as poisoned so that the
kexec'ed kernel can avoid accessing the page.
[ bp: Rewrite commit message and comment. ]
Co-developed-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Link: https://lore.kernel.org/r/20231014051754.3759099-1-zhiquan1.li@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Some of the fp/vmx code in sstep.c assume a certain maximum size for the
instructions being emulated. The size of those operations however is
determined separately in analyse_instr().
Add a check to validate the assumption on the maximum size of the
operations, so as to prevent any unintended kernel stack corruption.
Signed-off-by: Naveen N Rao <naveen@kernel.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Build-tested-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231123071705.397625-1-naveen@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
The linux-next build of powerpc64 allnoconfig fails with:
arch/powerpc/mm/book3s64/pgtable.c:557:5: error: no previous prototype for 'pmd_move_must_withdraw'
557 | int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
| ^~~~~~~~~~~~~~~~~~~~~~
Caused by commit:
c6345dfa6e3e ("Makefile.extrawarn: turn on missing-prototypes globally")
Fix it by moving the function definition under
CONFIG_TRANSPARENT_HUGEPAGE like the prototype. The function is only
called when CONFIG_TRANSPARENT_HUGEPAGE=y.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
[mpe: Flesh out change log from linux-next patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231127132809.45c2b398@canb.auug.org.au Signed-off-by: Sasha Levin <sashal@kernel.org>