git.ipfire.org Git - people/ms/linux.git/log

arm64: dts: qcom: sdm845-db845c: Specify i2c bus clocks

The kernel log contains complaints about i2c11 and i2c14 lacking
clock-frequency. Specify a reasonable value to suppress this warning.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20220717034403.2135027-4-bjorn.andersson@linaro.org

arm64: dts: qcom: sdm845-db845c: Enable gpi_dma1

Enable gpi_dma1 so that i2c14 is able to find its DMA controller.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20220717034403.2135027-3-bjorn.andersson@linaro.org

arm64: dts: qcom: sdm845: Fill in GENI DMA references

The I2C and SPI controllers might be configured in GPI DMA mode. Fill in
the properties needed for this.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20220717034403.2135027-2-bjorn.andersson@linaro.org

scsi: core: cap shost max_sectors according to DMA limits only once

The shost->max_sectors is repeatedly capped according to the host DMA
mapping limit for each sdev in __scsi_init_queue(). This is unnecessary, so
set it only once when adding the host.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

dma-iommu: add iommu_dma_opt_mapping_size()

Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which
allows the drivers to know the optimal mapping limit and thus limit the
requested IOVA lengths.

This value is based on the IOVA rcache range limit, as IOVAs allocated
above this limit must always be newly allocated, which may be quite slow.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

dma-mapping: add dma_opt_mapping_size()

Streaming DMA mapping involving an IOMMU may be much slower for larger
total mapping size. This is because every IOMMU DMA mapping requires an
IOVA to be allocated and freed. IOVA sizes above a certain limit are not
cached, which can have a big impact on DMA mapping performance.

Provide an API for device drivers to know this "optimal" limit, such that
they may try to produce mappings which don't exceed it.
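
As a rough illustration of how a driver might consume such a limit, the
userspace sketch below caps a sector count by an optimal mapping size given
in bytes. The helper name cap_max_sectors() and the convention that SIZE_MAX
means "no limit known" are assumptions made for this example, not something
this patch defines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* 512-byte sectors */

/*
 * Hypothetical helper: cap max_sectors so that a single request never
 * needs a DMA mapping larger than opt_mapping_bytes. Passing SIZE_MAX
 * (assumed here to mean "no limit known") leaves max_sectors unchanged.
 */
static unsigned int cap_max_sectors(unsigned int max_sectors,
                                    size_t opt_mapping_bytes)
{
    size_t opt_sectors = opt_mapping_bytes >> SECTOR_SHIFT;

    /* The limit must cover at least one sector to be usable. */
    assert(opt_sectors > 0);

    return max_sectors < opt_sectors ? max_sectors
                                     : (unsigned int)opt_sectors;
}
```

With a 128 KiB limit, a host advertising 2048 sectors would be capped to 256,
while a host that already advertises fewer sectors is left alone.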

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2022-07-15

This series contains updates to ice driver only.

Ani updates feature restriction for devices that don't support external
time stamping.

Zhuo Chen removes unnecessary call to pci_aer_clear_nonfatal_status().

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: Remove pci_aer_clear_nonfatal_status() call
ice: Add EXTTS feature to the feature bitmap
====================

Link: https://lore.kernel.org/r/20220715214642.2968799-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: macb: fixup sparse warnings on __be16 ports

The port fields in the ethtool flow structures are defined
to be __be16 types, so sparse is showing issues where these
are being passed to htons(). Fix these warnings by passing
them to be16_to_cpu() instead.
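
For readers unfamiliar with the distinction, here is a minimal userspace
sketch of what a big-endian to CPU conversion does. be16_to_host() and
host_to_be16() are stand-ins written for this example, not the kernel
helpers. On the machine the conversion is the same byte swap in either
direction, which is why the change only matters to sparse's type checking.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Userspace stand-in for be16_to_cpu(): interpret a 16-bit value stored
 * in big-endian (network) byte order as a host integer. Reading the bytes
 * one at a time makes this correct on any host endianness.
 */
static uint16_t be16_to_host(uint16_t be_value)
{
    unsigned char bytes[2];

    memcpy(bytes, &be_value, sizeof(bytes));
    return (uint16_t)((bytes[0] << 8) | bytes[1]);
}

/* Inverse direction: store a host integer in big-endian byte order. */
static uint16_t host_to_be16(uint16_t host_value)
{
    unsigned char bytes[2] = { (unsigned char)(host_value >> 8),
                               (unsigned char)(host_value & 0xff) };
    uint16_t be_value;

    memcpy(&be_value, bytes, sizeof(be_value));
    return be_value;
}
```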

These are being used in netdev_dbg() so should only affect
anyone doing debug.

Fixes the following sparse warnings:

drivers/net/ethernet/cadence/macb_main.c:3366:9: warning: cast from restricted __be16
drivers/net/ethernet/cadence/macb_main.c:3366:9: warning: cast from restricted __be16
drivers/net/ethernet/cadence/macb_main.c:3366:9: warning: cast from restricted __be16
drivers/net/ethernet/cadence/macb_main.c:3419:25: warning: cast from restricted __be16
drivers/net/ethernet/cadence/macb_main.c:3419:25: warning: cast from restricted __be16
drivers/net/ethernet/cadence/macb_main.c:3419:25: warning: cast from restricted __be16
drivers/net/ethernet/cadence/macb_main.c:3419:25: warning: cast from restricted __be16

Signed-off-by: Ben Dooks <ben.dooks@sifive.com>
Link: https://lore.kernel.org/r/20220715173009.526126-1-ben.dooks@sifive.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: prestera: acl: fix code formatting

Make the code look better.

Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu>
Link: https://lore.kernel.org/r/20220715103806.7108-1-maksym.glubokiy@plvision.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

vmxnet3: Record queue number to incoming packets

Make generic XDP processing attribute packets to their actual
queues instead of queue #0. This improves AF_XDP performance
considerably since softirq threads no longer fight over a single
AF_XDP socket spinlock.

Signed-off-by: Andrey Turkin <andrey.turkin@gmail.com>
Link: https://lore.kernel.org/r/20220717022050.822766-2-andrey.turkin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: remove redundant disable xPCS EEE call

Disable is done in stmmac_init_eee() on the event of MAC link down.
Since setting enable/disable EEE via ethtool will eventually trigger
a MAC down, remove this redundant call in stmmac_ethtool.c to avoid
calling xpcs_config_eee() twice.

Fixes: d4aeaed80b0e ("net: stmmac: trigger PCS EEE to turn off on link down")
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Link: https://lore.kernel.org/r/20220715122402.1017470-1-vee.khee.wong@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'fix-2-dsa-issues-with-vlan_filtering_is_global'

Vladimir Oltean says:

====================
Fix 2 DSA issues with vlan_filtering_is_global

This patch set fixes 2 issues with vlan_filtering_is_global switches.

Both are regressions introduced by refactoring commit d0004a020bb5
("net: dsa: remove the "dsa_to_port in a loop" antipattern from the
core"), which wasn't tested on a wide enough variety of switches.

Tested on the sja1105 driver.
====================

Link: https://lore.kernel.org/r/20220715151659.780544-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: fix NULL pointer dereference in dsa_port_reset_vlan_filtering

The "ds" iterator variable used in dsa_port_reset_vlan_filtering() ->
dsa_switch_for_each_port() overwrites the "dp" received as argument,
which is later used to call dsa_port_vlan_filtering() proper.

As a result, switches which do enter that code path (the ones with
vlan_filtering_is_global=true) will dereference an invalid dp in
dsa_port_reset_vlan_filtering() after leaving a VLAN-aware bridge.

Use a dedicated "other_dp" iterator variable to prevent this from
happening.
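
The shape of the bug can be sketched in plain C. The structures and the
for_each_port() macro below are simplified, hypothetical stand-ins for the
DSA ones and only illustrate how a loop cursor can clobber a function
argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a DSA port. */
struct port {
    int index;
    int bridged;        /* non-zero if the port is still under a bridge */
    struct port *next;
};

/* Simplified iterator macro: walks a NULL-terminated list of ports. */
#define for_each_port(_p, _head) \
    for ((_p) = (_head); (_p) != NULL; (_p) = (_p)->next)

/*
 * Buggy shape: the loop reuses "dp" as its cursor, so once the loop has
 * run to completion "dp" no longer points at the caller's port.
 */
static struct port *reset_filtering_buggy(struct port *dp, struct port *head)
{
    assert(head != NULL);
    for_each_port(dp, head) {
        if (dp->bridged)
            break;
    }
    return dp; /* NULL when no port is bridged */
}

/* Fixed shape: a dedicated cursor leaves the argument untouched. */
static struct port *reset_filtering_fixed(struct port *dp, struct port *head)
{
    struct port *other_dp;

    assert(head != NULL);
    for_each_port(other_dp, head) {
        if (other_dp->bridged)
            break;
    }
    return dp; /* still the caller's port */
}

/* Build a two-port switch with no bridged ports and run one variant. */
static int argument_survives(int use_fixed)
{
    struct port p1 = { 1, 0, NULL };
    struct port p0 = { 0, 0, &p1 };
    struct port *result = use_fixed ? reset_filtering_fixed(&p0, &p0)
                                    : reset_filtering_buggy(&p0, &p0);

    return result == &p0;
}
```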

Fixes: d0004a020bb5 ("net: dsa: remove the "dsa_to_port in a loop" antipattern from the core")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: fix dsa_port_vlan_filtering when global

The blamed refactoring commit replaced a "port" iterator with "other_dp",
but still looked at the slave_dev of the dp outside the loop, instead of
other_dp->slave from the loop.

As a result, dsa_port_vlan_filtering() would call
dsa_slave_manage_vlan_filtering() only for the port in question, and not
for all switch ports as expected.

Fixes: d0004a020bb5 ("net: dsa: remove the "dsa_to_port in a loop" antipattern from the core")
Reported-by: Lucian Banu <Lucian.Banu@westermo.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ixgbe: Add locking to prevent panic when setting sriov_numvfs to zero

It is possible to disable VFs while the PF driver is processing requests
from the VF driver.  This can result in a panic.

BUG: unable to handle kernel paging request at 000000000000106c
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 8 PID: 0 Comm: swapper/8 Kdump: loaded Tainted: G I      --------- -
Hardware name: Dell Inc. PowerEdge R740/06WXJT, BIOS 2.8.2 08/27/2020
RIP: 0010:ixgbe_msg_task+0x4c8/0x1690 [ixgbe]
Code: 00 00 48 8d 04 40 48 c1 e0 05 89 7c 24 24 89 fd 48 89 44 24 10 83 ff
01 0f 84 b8 04 00 00 4c 8b 64 24 10 4d 03 a5 48 22 00 00 <41> 80 7c 24 4c
00 0f 84 8a 03 00 00 0f b7 c7 83 f8 08 0f 84 8f 0a
RSP: 0018:ffffb337869f8df8 EFLAGS: 00010002
RAX: 0000000000001020 RBX: 0000000000000000 RCX: 000000000000002b
RDX: 0000000000000002 RSI: 0000000000000008 RDI: 0000000000000006
RBP: 0000000000000006 R08: 0000000000000002 R09: 0000000000029780
R10: 00006957d8f42832 R11: 0000000000000000 R12: 0000000000001020
R13: ffff8a00e8978ac0 R14: 000000000000002b R15: ffff8a00e8979c80
FS:  0000000000000000(0000) GS:ffff8a07dfd00000(0000) knlGS:00000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000106c CR3: 0000000063e10004 CR4: 00000000007726e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
? ttwu_do_wakeup+0x19/0x140
? try_to_wake_up+0x1cd/0x550
? ixgbevf_update_xcast_mode+0x71/0xc0 [ixgbevf]
ixgbe_msix_other+0x17e/0x310 [ixgbe]
__handle_irq_event_percpu+0x40/0x180
handle_irq_event_percpu+0x30/0x80
handle_irq_event+0x36/0x53
handle_edge_irq+0x82/0x190
handle_irq+0x1c/0x30
do_IRQ+0x49/0xd0
common_interrupt+0xf/0xf

This can eventually be reproduced with the following script:

while :
do
    echo 63 > /sys/class/net/<devname>/device/sriov_numvfs
    sleep 1
    echo 0 > /sys/class/net/<devname>/device/sriov_numvfs
    sleep 1
done

Add a lock when disabling SR-IOV to prevent processing of VF mailbox
communication.

Fixes: d773d1310625 ("ixgbe: Fix memory leak when SR-IOV VFs are direct assigned")
Signed-off-by: Piotr Skajewski <piotrx.skajewski@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220715214456.2968711-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

i40e: Fix erroneous adapter reinitialization during recovery process

Fix an issue where the driver incorrectly detects the state
of the recovery process and erroneously reinitializes interrupts,
which results in a kernel error and call trace message.

The issue was caused by a combination of two factors:
1. Assuming the EMP reset issued after completing
firmware recovery means the whole recovery process is complete.
2. Erroneous reinitialization of interrupt vector after detecting
the above mentioned EMP reset.

Fix (1) by changing how the recovery state change is detected
and (2) by adjusting the conditional expression to ensure the proper
interrupt reinitialization method is used, depending on the situation.

Fixes: 4ff0ee1af016 ("i40e: Introduce recovery mode support")
Signed-off-by: Dawid Lukwinski <dawid.lukwinski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220715214542.2968762-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: mtk_eth_soc: fix off by one check of ARRAY_SIZE

In mtk_wed_tx_ring_setup(.., int idx, ..), idx is used as an index here
struct mtk_wed_ring *ring = &dev->tx_ring[idx];

The bounds of idx are checked here
BUG_ON(idx > ARRAY_SIZE(dev->tx_ring));

If idx is the size of the array, it will pass this check and index one
element past the end of the array. So change the check to >= .
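
The difference between the two checks can be shown with a small standalone
sketch. The struct layout and the array length of 2 are assumptions made for
this example, and the helpers return a flag instead of calling BUG_ON().

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

struct ring { int id; };

/* Hypothetical device with a two-entry ring array. */
struct device {
    struct ring tx_ring[2];
};

/* Off-by-one: idx == ARRAY_SIZE() is accepted although it is out of range. */
static int idx_ok_buggy(const struct device *dev, size_t idx)
{
    return !(idx > ARRAY_SIZE(dev->tx_ring));
}

/* Corrected: valid indices are 0 .. ARRAY_SIZE() - 1. */
static int idx_ok_fixed(const struct device *dev, size_t idx)
{
    return !(idx >= ARRAY_SIZE(dev->tx_ring));
}
```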

Fixes: 804775dfc288 ("net: ethernet: mtk_eth_soc: add support for Wireless Ethernet Dispatch (WED)")
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20220716214654.1540240-1-trix@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'devlink-prepare-mlxsw-and-netdevsim-for-locked-reload'

Jiri Pirko says:

====================
devlink: prepare mlxsw and netdevsim for locked reload

This is a preparation patchset for eventually switching the reload cmd
to take devlink->lock, as the other commands do.

This patchset is preparing 2 major users of devlink API - mlxsw and
netdevsim. The sets of functions are similar, therefore taking care of
both here.
====================

Link: https://lore.kernel.org/r/20220716110241.3390528-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: remove unused locked functions

Remove locked versions of functions that are no longer used by anyone.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netdevsim: convert driver to use unlocked devlink API during init/fini

Prepare for devlink reload being called with devlink->lock held and
convert the netdevsim driver to use unlocked devlink API during init and
fini flows. Take devl_lock() in reload_down() and reload_up() ops in the
meantime before reload cmd is converted to take the lock itself.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: add unlocked variants of devlink_region_create/destroy() functions

Add unlocked variants of devlink_region_create/destroy() functions
to be used in drivers called-in with devlink->lock held.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: convert driver to use unlocked devlink API during init/fini

Prepare for devlink reload being called with devlink->lock held and
convert the mlxsw driver to use unlocked devlink API during init and
fini flows. Take devl_lock() in reload_down() and reload_up() ops in the
meantime before reload cmd is converted to take the lock itself.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: add unlocked variants of devlink_dpipe*() functions

Add unlocked variants of devlink_dpipe*() functions to be used
in drivers called-in with devlink->lock held.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: add unlocked variants of devlink_sb*() functions

Add unlocked variants of devlink_sb*() functions to be used
in drivers called-in with devlink->lock held.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: add unlocked variants of devlink_resource*() functions

Add unlocked variants of devlink_resource*() functions to be used
in drivers called-in with devlink->lock held.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: add unlocked variants of devlink_trap*() functions

Add unlocked variants of devl_trap*() functions to be used in drivers
called-in with devlink->lock held.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: avoid false DEADLOCK warning reported by lockdep

Add a lock_class_key per devlink instance to avoid DEADLOCK warning by
lockdep, while locking more than one devlink instance in driver code,
for example in opening VFs flow.

Kernel log:
[  101.433802] ============================================
[  101.433803] WARNING: possible recursive locking detected
[  101.433810] 5.19.0-rc1+ #35 Not tainted
[  101.433812] --------------------------------------------
[  101.433813] bash/892 is trying to acquire lock:
[  101.433815] ffff888127bfc2f8 (&devlink->lock){+.+.}-{3:3}, at: probe_one+0x3c/0x690 [mlx5_core]
[  101.433909]
               but task is already holding lock:
[  101.433910] ffff888118f4c2f8 (&devlink->lock){+.+.}-{3:3}, at: mlx5_core_sriov_configure+0x62/0x280 [mlx5_core]
[  101.433989]
               other info that might help us debug this:
[  101.433990]  Possible unsafe locking scenario:

[  101.433991]        CPU0
[  101.433991]        ----
[  101.433992]   lock(&devlink->lock);
[  101.433993]   lock(&devlink->lock);
[  101.433995]
                *** DEADLOCK ***

[  101.433996]  May be due to missing lock nesting notation

[  101.433996] 6 locks held by bash/892:
[  101.433998]  #0: ffff88810eb50448 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0xf3/0x1d0
[  101.434009]  #1: ffff888114777c88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x20d/0x520
[  101.434017]  #2: ffff888102b58660 (kn->active#231){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x230/0x520
[  101.434023]  #3: ffff888102d70198 (&dev->mutex){....}-{3:3}, at: sriov_numvfs_store+0x132/0x310
[  101.434031]  #4: ffff888118f4c2f8 (&devlink->lock){+.+.}-{3:3}, at: mlx5_core_sriov_configure+0x62/0x280 [mlx5_core]
[  101.434108]  #5: ffff88812adce198 (&dev->mutex){....}-{3:3}, at: __device_attach+0x76/0x430
[  101.434116]
               stack backtrace:
[  101.434118] CPU: 5 PID: 892 Comm: bash Not tainted 5.19.0-rc1+ #35
[  101.434120] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[  101.434130] Call Trace:
[  101.434133]  <TASK>
[  101.434135]  dump_stack_lvl+0x57/0x7d
[  101.434145]  __lock_acquire.cold+0x1df/0x3e7
[  101.434151]  ? register_lock_class+0x1880/0x1880
[  101.434157]  lock_acquire+0x1c1/0x550
[  101.434160]  ? probe_one+0x3c/0x690 [mlx5_core]
[  101.434229]  ? lockdep_hardirqs_on_prepare+0x400/0x400
[  101.434232]  ? __xa_alloc+0x1ed/0x2d0
[  101.434236]  ? ksys_write+0xf3/0x1d0
[  101.434239]  __mutex_lock+0x12c/0x14b0
[  101.434243]  ? probe_one+0x3c/0x690 [mlx5_core]
[  101.434312]  ? probe_one+0x3c/0x690 [mlx5_core]
[  101.434380]  ? devlink_alloc_ns+0x11b/0x910
[  101.434385]  ? mutex_lock_io_nested+0x1320/0x1320
[  101.434388]  ? lockdep_init_map_type+0x21a/0x7d0
[  101.434391]  ? lockdep_init_map_type+0x21a/0x7d0
[  101.434393]  ? __init_swait_queue_head+0x70/0xd0
[  101.434397]  probe_one+0x3c/0x690 [mlx5_core]
[  101.434467]  pci_device_probe+0x1b4/0x480
[  101.434471]  really_probe+0x1e0/0xaa0
[  101.434474]  __driver_probe_device+0x219/0x480
[  101.434478]  driver_probe_device+0x49/0x130
[  101.434481]  __device_attach_driver+0x1b8/0x280
[  101.434484]  ? driver_allows_async_probing+0x140/0x140
[  101.434487]  bus_for_each_drv+0x123/0x1a0
[  101.434489]  ? bus_for_each_dev+0x1a0/0x1a0
[  101.434491]  ? lockdep_hardirqs_on_prepare+0x286/0x400
[  101.434494]  ? trace_hardirqs_on+0x2d/0x100
[  101.434498]  __device_attach+0x1a3/0x430
[  101.434501]  ? device_driver_attach+0x1e0/0x1e0
[  101.434503]  ? pci_bridge_d3_possible+0x1e0/0x1e0
[  101.434506]  ? pci_create_resource_files+0xeb/0x190
[  101.434511]  pci_bus_add_device+0x6c/0xa0
[  101.434514]  pci_iov_add_virtfn+0x9e4/0xe00
[  101.434517]  ? trace_hardirqs_on+0x2d/0x100
[  101.434521]  sriov_enable+0x64a/0xca0
[  101.434524]  ? pcibios_sriov_disable+0x10/0x10
[  101.434528]  mlx5_core_sriov_configure+0xab/0x280 [mlx5_core]
[  101.434602]  sriov_numvfs_store+0x20a/0x310
[  101.434605]  ? sriov_totalvfs_show+0xc0/0xc0
[  101.434608]  ? sysfs_file_ops+0x170/0x170
[  101.434611]  ? sysfs_file_ops+0x117/0x170
[  101.434614]  ? sysfs_file_ops+0x170/0x170
[  101.434616]  kernfs_fop_write_iter+0x348/0x520
[  101.434619]  new_sync_write+0x2e5/0x520
[  101.434621]  ? new_sync_read+0x520/0x520
[  101.434624]  ? lock_acquire+0x1c1/0x550
[  101.434626]  ? lockdep_hardirqs_on_prepare+0x400/0x400
[  101.434630]  vfs_write+0x5cb/0x8d0
[  101.434633]  ksys_write+0xf3/0x1d0
[  101.434635]  ? __x64_sys_read+0xb0/0xb0
[  101.434638]  ? lockdep_hardirqs_on_prepare+0x286/0x400
[  101.434640]  ? syscall_enter_from_user_mode+0x1d/0x50
[  101.434643]  do_syscall_64+0x3d/0x90
[  101.434647]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  101.434650] RIP: 0033:0x7f5ff536b2f7
[  101.434658] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f
1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f
05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[  101.434661] RSP: 002b:00007ffd9ea85d58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  101.434664] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f5ff536b2f7
[  101.434666] RDX: 0000000000000002 RSI: 000055c4c279e230 RDI: 0000000000000001
[  101.434668] RBP: 000055c4c279e230 R08: 000000000000000a R09: 0000000000000001
[  101.434669] R10: 000055c4c283cbf0 R11: 0000000000000246 R12: 0000000000000002
[  101.434670] R13: 00007f5ff543d500 R14: 0000000000000002 R15: 00007f5ff543d700
[  101.434673]  </TASK>

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

scsi: target: iscsi: Fix clang -Wformat warnings

When building with Clang we encounter these warnings:
| drivers/target/iscsi/iscsi_target_login.c:719:24: error: format
| specifies type 'unsigned short' but the argument has type 'int'
| [-Werror,-Wformat] " from node: %s\n", atomic_read(&sess->nconn),
-
| drivers/target/iscsi/iscsi_target_login.c:767:12: error: format
| specifies type 'unsigned short' but the argument has type 'int'
| [-Werror,-Wformat] " %s\n", atomic_read(&sess->nconn),
-
| drivers/target/iscsi/iscsi_target.c:4365:12: error: format specifies
| type 'unsigned short' but the argument has type 'int' [-Werror,-Wformat]
| " %s\n", atomic_read(&sess->nconn)

For all warnings, the format specifier is '%hu' which describes an unsigned
short. The resulting type of atomic_read is an int. The proposed fix is to
listen to Clang and swap the format specifier.
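
A standalone sketch of the mismatch, using a plain int in place of the
atomic counter. read_conn_count() and format_conn_count() are hypothetical
helpers for this example. The point is only that the conversion specifier
has to match the int that the read returns.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for atomic_read(): like the kernel helper, it
 * returns a plain int.
 */
static int read_conn_count(const int *counter)
{
    assert(counter != NULL);
    return *counter;
}

/*
 * Format a connection count into buf. "%d" matches the int argument;
 * "%hu" would claim an unsigned short and trip Clang's -Wformat.
 */
static int format_conn_count(char *buf, size_t len, const int *counter)
{
    return snprintf(buf, len, "%d connection(s)", read_conn_count(counter));
}
```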

Link: https://github.com/ClangBuiltLinux/linux/issues/378
Link: https://lore.kernel.org/r/20220718180421.49697-1-justinstitt@google.com
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: Read device property for ref clock

UFS storage devices require bRefClkFreq attribute to be set to operate
correctly at high speed mode. The necessary value is determined by what the
SoC / board supports. The standard doesn't specify a method to query the
value, so the information needs to be fed in separately.

DT information feeds into setting up the clock framework, so platforms
using DT can get the UFS reference clock frequency from the clock
framework. A special node "ref_clk" from the clock array for the UFS
controller node is used as the source for the information.

On platforms that do not use DT (e.g. Intel), an alternative mechanism
to feed the intended reference clock frequency is necessary. This patch
proposes specifying the necessary information in the DSD of the UFS
controller ACPI node. Such properties can be accessed via the firmware
property facility in the kernel, in much the same way as properties
defined in DT are queried.

This patch introduces a small helper function to query a predetermined ACPI
supplied property of the UFS controller, and uses it to attempt retrieving
reference clock value, unless that was already done by the clock
infrastructure.

Link: https://lore.kernel.org/r/20220715210230.1.I365d113d275117dee8fd055ce4fc7e6aebd0bce9@changeid
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Daniil Lunev <dlunev@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: libsas: Resume SAS host for phy reset or enable via sysfs

Currently if a phy reset or enable phy is issued via sysfs when controller
is suspended, those operations will be ignored as SAS_HA_REGISTERED is
cleared. If RPM is enabled then we may aggressively suspend automatically.
In this case it may be difficult to enable or reset a phy via sysfs, so
resume the host in these scenarios.

Link: https://lore.kernel.org/r/1657823002-139010-6-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: hisi_sas: Modify v3 HW SATA completion error processing

If the I/O completion response frame returned by the target device has been
written to the host memory and the err bit in the status field of the
received fis is 1, ts->stat should be set to SAS_PROTO_RESPONSE, and this
will let EH analyze and further determine the cause of failure.

Link: https://lore.kernel.org/r/1657823002-139010-5-git-send-email-john.garry@huawei.com
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: hisi_sas: Relocate DMA unmap of SMP task

Currently SMP tasks are DMA unmapped only when the cq of SMP I/O is
returned normally. If the cq of SMP I/O is returned with an exception, the
SMP task is never unmapped. Relocate DMA unmap of SMP task to fix the issue.

Link: https://lore.kernel.org/r/1657823002-139010-4-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: hisi_sas: Remove unnecessary variable to hold DMA map elements

Use slot->n_elem to store the return value of dma_map_sg() for SSP and SMP
IOs, and remove unnecessary variable n_elem_req.

Link: https://lore.kernel.org/r/1657823002-139010-3-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: hisi_sas: Call hisi_sas_slave_configure() from slave_configure_v3_hw()

There is duplicated code between slave_configure_v3_hw() and
hisi_sas_slave_configure(), so call common function
hisi_sas_slave_configure() from slave_configure_v3_hw().

Link: https://lore.kernel.org/r/1657823002-139010-2-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpi3mr: Delete a stray tab

This code is indented one more tab than it should be.

Link: https://lore.kernel.org/r/YtVCFshEJNC7ELid@kili
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpi3mr: Unlock on error path

There is some clean up necessary before returning.  Smatch complains:

    drivers/scsi/mpi3mr/mpi3mr_fw.c:4786 mpi3mr_soft_reset_handler()
    warn: inconsistent returns '&mrioc->reset_mutex'.
      Locked on  : 4730
      Unlocked on: 4786
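
The usual way to keep lock and unlock balanced is to route every exit
through a single label. A minimal sketch of that pattern follows. The
toy_mutex type and soft_reset() are invented for this example and do not
mirror the driver's code.

```c
#include <assert.h>

/* Toy lock that only tracks whether it is held; not a real mutex. */
struct toy_mutex { int held; };

static void toy_lock(struct toy_mutex *m)
{
    assert(!m->held);
    m->held = 1;
}

static void toy_unlock(struct toy_mutex *m)
{
    assert(m->held);
    m->held = 0;
}

/*
 * Hypothetical reset handler: every exit taken after toy_lock() funnels
 * through the "out" label, so the lock is released on success and error.
 */
static int soft_reset(struct toy_mutex *reset_mutex, int step_fails)
{
    int ret = 0;

    toy_lock(reset_mutex);

    if (step_fails) {
        ret = -1;
        goto out; /* an early "return ret;" here would leak the lock */
    }

    /* ... the rest of the reset sequence would run here ... */

out:
    toy_unlock(reset_mutex);
    return ret;
}
```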

Link: https://lore.kernel.org/r/YtVCEsxMU8buuMjP@kili
Fixes: f10af057325c ("scsi: mpi3mr: Resource Based Metering")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpi3mr: Reduce VD queue depth on detecting throttling

Reduce the VD queue depth on detecting the throttling condition.

[mkp: incorporate fix for pointer cast issue reported by the test
robot and Guenter Roeck]

Link: https://lore.kernel.org/r/20220708195020.8323-3-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpi3mr: Resource Based Metering

Update driver to track cumulative pending large data size at the controller
level and at the throttle group level. When one of the values meets or
exceeds the controller's firmware-determined high threshold value, the
driver will divert future selective I/O to the firmware. Once the
cumulative pending large data size at both the controller level and the
throttle group level reaches the controller's firmware-determined low
threshold value, the driver will stop diverting I/Os to the firmware.
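
The high/low threshold behaviour described above is a form of hysteresis and
can be sketched as follows. The struct, its field names, the single shared
pair of thresholds and the reading of "reach the low threshold" as "at or
below it" are all assumptions for illustration; the driver's actual
bookkeeping may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state; field names are illustrative, not the driver's. */
struct metering {
    uint64_t ctrl_pending;  /* pending large data, controller level */
    uint64_t group_pending; /* pending large data, throttle group level */
    uint64_t high;          /* firmware-provided high threshold */
    uint64_t low;           /* firmware-provided low threshold */
    bool diverting;         /* currently sending I/O to the firmware? */
};

/* Re-evaluate the divert decision after the pending counters changed. */
static bool metering_update(struct metering *m)
{
    assert(m->low <= m->high);

    if (!m->diverting) {
        /* Start diverting when either level meets or exceeds "high". */
        if (m->ctrl_pending >= m->high || m->group_pending >= m->high)
            m->diverting = true;
    } else {
        /* Stop only once both levels have dropped to "low" or below. */
        if (m->ctrl_pending <= m->low && m->group_pending <= m->low)
            m->diverting = false;
    }
    return m->diverting;
}
```

The gap between the two thresholds keeps the driver from flipping between
the two modes on every I/O when the pending size hovers near one value.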

Link: https://lore.kernel.org/r/20220708195020.8323-2-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Merge branch 'net-lan966x-fix-issues-with-mac-table'

Horatiu Vultur says:

====================
net: lan966x: Fix issues with MAC table

The patch series fixes 2 issues:
- when an entry was forgotten the irq thread was holding a spin lock and then
was also taking rtnl_lock.
- the access to the HW MAC table is indirect, and it was not synchronized,
which means that there could be race conditions.
====================

Link: https://lore.kernel.org/r/20220714194040.231651-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan966x: Fix usage of lan966x->mac_lock when used by FDB

When the SW bridge was trying to add/remove entries to/from HW, the
access to HW was not protected by any lock. In this way, it was
possible to have race conditions.
Fix this by using the lan966x->mac_lock to protect parallel access to HW
for these cases.

Fixes: 25ee9561ec622 ("net: lan966x: More MAC table functionality")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan966x: Fix usage of lan966x->mac_lock inside lan966x_mac_irq_handler

The problem with this spin lock is that it was protecting only the list
of the MAC entries in SW and not the access to the MAC entries in HW.
Because the access to HW is indirect, race conditions were possible,
for example when SW introduces an entry in the MAC table while the MAC
irq handler is trying to read something from the MAC.
Update the code so that the access to MAC entries in HW is also
protected by this lock.

Fixes: 5ccd66e01cbef ("net: lan966x: add support for interrupts from analyzer")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan966x: Fix usage of lan966x->mac_lock when entry is removed

To remove an entry from the MAC table, it is required first to set up the
entry and then issue a command for the MAC to forget the entry.
So if two threads remove an entry from the MAC table simultaneously,
there is a race condition.
Fix this by using lan966x->mac_lock to protect the HW access.

Fixes: e18aba8941b40 ("net: lan966x: add mactable support")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan966x: Fix usage of lan966x->mac_lock when entry is added

To add an entry to the MAC table, it is required first to set up the
entry and then issue a command for the MAC to learn the entry.
So if two threads add an entry to the MAC table simultaneously, there
is a race condition.
Fix this by using lan966x->mac_lock to protect the HW access.

Fixes: fc0c3fe7486f2 ("net: lan966x: Add function lan966x_mac_ip_learn()")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan966x: Fix taking rtnl_lock while holding spin_lock

When the HW deletes an entry in the MAC table, it generates an
interrupt. The SW then goes through its own list of MAC entries and, if
an entry is not found, notifies the listeners about this. The problem
is that the SW takes a spin lock (lan966x->mac_lock) while going through
its list and sends the notification with that lock still held. But
notifying the listeners takes the rtnl_lock, which is illegal while
holding a spin lock.

Fix this by not notifying right away that the entry is deleted. Instead,
move the entry to a temporary list and, once all the entries have been
checked, notify that the entries on the temporary list are deleted.
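
The pattern described above can be sketched in plain userspace C. This is
only an illustration, not the driver code: struct mac_entry, purge_stale()
and notify_deleted() are hypothetical names, and a simple flag stands in for
lan966x->mac_lock so that the "never notify under the lock" rule can be
checked with an assert.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a software MAC table entry. */
struct mac_entry {
	int id;
	int in_hw;                 /* 0 means the hardware dropped the entry */
	struct mac_entry *next;
};

static int mac_lock_held;          /* models the spin lock as a plain flag */
static int notified;               /* number of notifications sent */

/* Models a call that may sleep (like one taking rtnl_lock). */
static void notify_deleted(const struct mac_entry *e)
{
	(void)e;
	assert(!mac_lock_held);    /* must never run under the lock */
	notified++;
}

/* Unlink stale entries under the lock, notify only after dropping it. */
static int purge_stale(struct mac_entry **head)
{
	struct mac_entry *tmp = NULL, **pp = head, *e;
	int count = 0;

	mac_lock_held = 1;
	while ((e = *pp) != NULL) {
		if (!e->in_hw) {
			*pp = e->next;     /* unlink from the main list */
			e->next = tmp;     /* push onto the temporary list */
			tmp = e;
		} else {
			pp = &e->next;
		}
	}
	mac_lock_held = 0;

	while (tmp) {                      /* lock dropped: safe to notify */
		e = tmp;
		tmp = tmp->next;
		e->next = NULL;
		notify_deleted(e);
		count++;
	}
	return count;
}
```

The list is walked only once under the lock, and the potentially sleeping
call happens strictly after the lock is released.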

Fixes: 5ccd66e01cbe ("net: lan966x: add support for interrupts from analyzer")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

skbuff: add SKBFL_DONT_ORPHAN flag

We don't want to list every single ubuf_info callback in
skb_orphan_frags(), so add a flag controlling the behaviour.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

skbuff: don't mix ubuf_info from different sources

We should not append MSG_ZEROCOPY requests to an skbuff with a
non-MSG_ZEROCOPY ubuf_info, as they might not be compatible.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: avoid partial copy for zc

Even when zerocopy transmission is requested and possible,
__ip_append_data() will still copy a small chunk of data just because it
allocated some extra linear space (e.g. 128 bytes). It wastes CPU cycles
on copy and iter manipulations and also misaligns potentially aligned
data. Avoid such copies. As a bonus, we can allocate a smaller skb.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv4: avoid partial copy for zc

Even when zerocopy transmission is requested and possible,
__ip_append_data() will still copy a small chunk of data just because it
allocated some extra linear space (e.g. 148 bytes). It wastes CPU cycles
on copy and iter manipulations and also misaligns potentially aligned
data. Avoid such copies. As a bonus, we can allocate a smaller skb.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

clk: qcom: gcc-msm8994: use parent_hws for gpll0/4

Use parent_hws for two remaining clocks in gcc-msm8994 that used
parent_names.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220620080505.1573948-1-dmitry.baryshkov@linaro.org

scsi: sg: Allow waiting for commands to complete on removed device

When a SCSI device is removed while in active use, currently sg will
immediately return -ENODEV on any attempt to wait for active commands that
were sent before the removal.  This is problematic for commands that use
SG_FLAG_DIRECT_IO since the data buffer may still be in use by the kernel
when userspace frees or reuses it after getting ENODEV, leading to
corrupted userspace memory (in the case of READ-type commands) or corrupted
data being sent to the device (in the case of WRITE-type commands).  This
has been seen in practice when logging out of an iscsi_tcp session, where
the iSCSI driver may still be processing commands after the device has been
marked for removal.

Change the policy to allow userspace to wait for active sg commands even
when the device is being removed.  Return -ENODEV only when there are no
more responses to read.

Link: https://lore.kernel.org/r/5ebea46f-fe83-2d0b-233d-d0dcb362dd0a@cybernetics.com
Cc: <stable@vger.kernel.org>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: Update version to 10.02.07.800-k

Link: https://lore.kernel.org/r/20220713052045.10683-11-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: Update manufacturer details

Update manufacturer details to indicate Marvell Semiconductors.

Link: https://lore.kernel.org/r/20220713052045.10683-10-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bikash Hazarika <bhazarika@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: Fix sparse warning for dport_data

Use le16_to_cpu to fix sparse warning reported for dport_data.

sparse warnings: (new ones prefixed by >>)
>> drivers/scsi/qla2xxx/qla_bsg.c:2485:34: sparse: sparse: incorrect
>> type in assignment (different base types) @@ expected unsigned
>> short [usertype] mbx1 @@ got restricted __le16 @@
   drivers/scsi/qla2xxx/qla_bsg.c:2485:34: sparse: expected unsigned short [usertype] mbx1
   drivers/scsi/qla2xxx/qla_bsg.c:2485:34: sparse: got restricted __le16
>> drivers/scsi/qla2xxx/qla_bsg.c:2486:34: sparse: sparse:
>> incorrect type in assignment (different base types) @@
>> expected unsigned short [usertype] mbx2 @@ got restricted __le16 @@
   drivers/scsi/qla2xxx/qla_bsg.c:2486:34: sparse: expected unsigned short [usertype] mbx2
   drivers/scsi/qla2xxx/qla_bsg.c:2486:34: sparse: got restricted __le16
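
For readers unfamiliar with the warning: a __le16 value is stored in
little-endian byte order regardless of the CPU, so it has to be converted
before being assigned to a plain unsigned short. The following userspace
sketch shows the equivalent conversion; le16_bytes_to_host() is a
hypothetical helper written for illustration, not the kernel's
le16_to_cpu() implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a 16-bit little-endian value from two raw bytes.
 * Works the same on little-endian and big-endian hosts. */
static uint16_t le16_bytes_to_host(const uint8_t b[2])
{
	assert(b);
	return (uint16_t)(b[0] | (b[1] << 8));
}
```

Given the bytes 0x34 0x12 as they would sit in a little-endian buffer,
the helper yields 0x1234 on any host.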

Link: https://lore.kernel.org/r/20220713052045.10683-9-njavali@marvell.com
Fixes: 476da8faa336 ("scsi: qla2xxx: Add a new v2 dport diagnostic feature")
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: Fix discovery issues in FC-AL topology

A direct-attach tape device, when swapped with another, was not
discovered. Fix this by looking at the loop map and reinitializing the
link if there are devices present.

Link: https://lore.kernel.org/linux-scsi/baef87c3-5dad-3b47-44c1-6914bfc90108@cybernetics.com/
Link: https://lore.kernel.org/r/20220713052045.10683-8-njavali@marvell.com
Cc: stable@vger.kernel.org
Reported-by: Tony Battersby <tonyb@cybernetics.com>
Tested-by: Tony Battersby <tonyb@cybernetics.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: Fix imbalance vha->vref_count

vref_count took an extra decrement in the task management path. Add an
extra ref count to compensate for the imbalance.

Link: https://lore.kernel.org/r/20220713052045.10683-7-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: edif: Fix dropped IKE message

This patch fixes IKE messages being dropped due to an error in processing
Purex IOCBs and Continuation IOCBs.

Link: https://lore.kernel.org/r/20220713052045.10683-6-njavali@marvell.com
Fixes: fac2807946c1 ("scsi: qla2xxx: edif: Add extraction of auth_els from the wire")
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: Fix response queue handler reading stale packets

On some platforms, the current logic of finding a new packet based
solely on a signature pattern can lead to the driver reading stale
packets. Though this is a bug in those platforms, reduce such exposure by
reading packets only up to the IN pointer.

Two module parameters are introduced:

  ql2xrspq_follow_inptr:

    When set, on newer adapters that have queue pointer shadowing, look for
    response packets only up to the response queue IN pointer.

    When cleared, response packets are read based on the signature pattern
    logic (old way).

  ql2xrspq_follow_inptr_legacy:

    Like ql2xrspq_follow_inptr, but for those adapters where there is no
    queue pointer shadowing.

Link: https://lore.kernel.org/r/20220713052045.10683-5-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: Zero undefined mailbox IN registers

While requesting a new mailbox command, the driver does not write any data
to unused registers. Initialize the unused register values to zero while
requesting a new mailbox command to prevent stale entry access by firmware.

Link: https://lore.kernel.org/r/20220713052045.10683-4-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bikash Hazarika <bhazarika@marvell.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: Fix incorrect display of max frame size

Replace display field with the correct field.

Link: https://lore.kernel.org/r/20220713052045.10683-3-njavali@marvell.com
Fixes: 8777e4314d39 ("scsi: qla2xxx: Migrate NVME N2N handling into state machine")
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bikash Hazarika <bhazarika@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: Revert "scsi: qla2xxx: Fix disk failure to rediscover"

This fixes the regression of NVMe discovery failure during driver load
time.

This reverts commit 6a45c8e137d4e2c72eecf1ac7cf64f2fdfcead99.

Link: https://lore.kernel.org/r/20220713052045.10683-2-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
"Two bug fixes for irdma:

   - x722 does not support 1GB pages, trying to configure them will
     corrupt the dma mapping

   - Fix a sleep while holding a spinlock"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/irdma: Fix sleep from invalid context BUG
  RDMA/irdma: Do not advertise 1GB page size for x722

ARM: dts: qcom: add rpmcc missing clocks for apq/ipq8064 and msm8660

Add the missing rpmcc pxo and cxo clocks for the apq8064, ipq8064 and
msm8660 dtsi.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220706225321.26215-3-ansuelsmth@gmail.com

clk: qcom: clk-rpm: convert to parent_data API

Convert the clk-rpm driver to the parent_data API.
We keep the old pxo/cxo_board parent naming to keep compatibility with
old DTs, and we use the new pxo/cxo for new implementations where these
clocks are defined in DTS.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220706225321.26215-4-ansuelsmth@gmail.com

dt-bindings: clock: fix wrong clock documentation for qcom,rpmcc

qcom,rpmcc describes two different kinds of device.
Currently we have a definition for rpm-smd based devices but we lack
documentation for simple rpm based devices.

Add the missing clocks for ipq806x, apq8060, msm8660 and apq8064 and
provide an additional example to describe these new simple rpm based
devices.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220706225321.26215-2-ansuelsmth@gmail.com

arm64: dts: qcom: sc7280: delete vdda-1p2 and vdda-0p9 from both dp and edp

Both vdda-1p2-supply and vdda-0p9-supply regulators are controlled
by dp combo phy. Therefore remove them from dp controller.

Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/1657556603-15024-1-git-send-email-quic_khsieh@quicinc.com

arm64: defconfig: Demote Qualcomm USB PHYs to modules

The Qualcomm USB PHYs are not critical for reaching the ramdisk to load
modules, so they can be demoted to be built as such instead of builtin.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20220712031821.4134712-1-bjorn.andersson@linaro.org

clk: qcom: gcc-msm8939: Add missing USB HS system clock frequencies

The shipped qcom driver defines:
static struct clk_freq_tbl ftbl_gcc_usb_hs_system_clk[] = {
        F(  57140000,      gpll0_out_main,  14,    0,    0),
        F(  80000000,      gpll0_out_main,  10,   0,    0),
        F( 100000000,      gpll0_out_main,   8,   0,    0),
        F_END
};
In the upstream code we omit 57.14 MHz and 100 MHz.

Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@somainline.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220712125922.3461675-7-bryan.odonoghue@linaro.org

clk: qcom: gcc-msm8939: Add missing MDSS MDP clock frequencies

Again, the msm8936/msm8939 supports a wider range of operating frequencies
than the antecedent msm8916, from which the msm8939.c driver is derived.

static struct clk_freq_tbl ftbl_gcc_mdss_mdp_clk[] = {
        F(  50000000,      gpll0_out_aux,  16,    0,    0),
        F(  80000000,      gpll0_out_aux,  10,    0,    0),
        F( 100000000,      gpll0_out_aux,   8,    0,    0),
        F( 145500000,      gpll0_out_aux,  5.5,   0,    0),
        F( 153600000,      gpll1_out_main,      4,      0,      0),
        F( 160000000,      gpll0_out_aux,   5,    0,    0),
        F( 177780000,      gpll0_out_aux, 4.5,    0,    0),
        F( 200000000,      gpll0_out_aux,   4,    0,    0),
        F( 266670000,      gpll0_out_aux,   3,    0,    0),
        F( 307200000,      gpll1_out_main,      2,      0,      0),
        F( 366670000,      gpll3_out_aux,   3,        0,    0),
        F_END
};

We are missing 145.5 MHz and 153.6 MHz.
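
As a quick check of the two missing rows, the output rate is the parent
rate divided by a (possibly half-integer) divider. The sketch below assumes
gpll0_out_aux runs at 800 MHz and gpll1_out_main at 614.4 MHz, which is
inferred from the other rows of the table above (80 MHz at divider 10,
307.2 MHz at divider 2) rather than stated in this commit. rcg_rate_hz() is
a hypothetical helper, not part of the clk framework; the divider is passed
doubled so that 5.5 becomes 11 and only integer math is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Rate in Hz produced by a parent clock through a half-integer divider.
 * div_x2 is twice the divider, e.g. 11 for a divider of 5.5. */
static uint64_t rcg_rate_hz(uint64_t parent_hz, unsigned int div_x2)
{
	assert(div_x2 >= 2);       /* dividers below 1 make no sense here */
	return parent_hz * 2 / div_x2;
}
```

Under those assumptions, 800 MHz / 5.5 comes out at roughly 145.45 MHz,
which the table rounds to 145.5 MHz, and 614.4 MHz / 4 is exactly 153.6 MHz.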

Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@somainline.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220712125922.3461675-6-bryan.odonoghue@linaro.org

clk: qcom: gcc-msm8939: Add missing CAMSS CPP clock frequencies

Reviewing the qcom msm8936.c clock frequency tables we see

static struct clk_freq_tbl ftbl_gcc_camss_cpp_clk[] = {
        F( 160000000,      gpll0_out_main,   5,   0,    0),
        F( 200000000,      gpll0_out_main,   4,   0,    0),
        F( 228570000,      gpll0_out_main, 3.5,   0,    0),
        F( 266670000,      gpll0_out_main,   3,   0,    0),
        F( 320000000,      gpll0_out_main, 2.5,   0,    0),
        F( 465000000,      gpll2_out_main,   2,   0,    0),
        F_END
};
which is a super-set of the msm8916 original definitions.
Add in the missing frequency definitions now.

Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@somainline.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220712125922.3461675-5-bryan.odonoghue@linaro.org

clk: qcom: gcc-msm8939: Fix venus0_vcodec0_clk frequency definitions

The Venus clock frequencies are a copy/paste error from msm8916. Looking
at the original clock-gcc-8936.c ftbl_gcc_venus0_vcodec0_clk defines we
have:

- 133 MHz
- 200 MHz
- 266 MHz

These values are borne out by the relevant Qualcomm documentation for the
msm8936/msm8939 Venus core performance levels.

Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@somainline.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220712125922.3461675-4-bryan.odonoghue@linaro.org

clk: qcom: gcc-msm8939: Add missing CAMSS CCI bus clock

Standard CCI bus clocks are 19.2 MHz and 37.5 MHz. We already define
the 19.2 MHz but are missing the 37.5 MHz.

See qcom kernel drivers/clk/qcom/clock-gcc-8936.c::ftbl_gcc_camss_cci_clk[]

Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@somainline.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220712125922.3461675-3-bryan.odonoghue@linaro.org

clk: qcom: gcc-msm8939: Fix weird field spacing in ftbl_gcc_camss_cci_clk

Adding a new item to this frequency table I see the existing indentation is
incorrect.

Fixes: 1664014e4679 ("clk: qcom: gcc-msm8939: Add MSM8939 Generic Clock Controller")
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@somainline.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220712125922.3461675-2-bryan.odonoghue@linaro.org

arm64: dts: sdm850: Remove unnecessary turbo-mode

qcom-cpufreq-hw finds turbo-mode in the LUT hardware tables
and slaps the flag on the last element, so there's no reason
to add it in the dts. Remove it.

Signed-off-by: Steev Klimaszewski <steev@kali.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220718230109.8193-1-steev@kali.org

ARM: mach-qcom: Add support for MSM8909

Add a Kconfig entry for MSM8909 and the "qcom,msm8909-smp" CPU
enable-method. The ARM Cortex-A7 cores are booted just like on MSM8226.

Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@somainline.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220705143523.3390944-9-stephan.gerhold@kernkonzept.com

dt-bindings: arm: cpus: Document "qcom,msm8909-smp" enable-method

MSM8909 is a fairly old 32-bit SoC without PSCI support, so the
additional CPU cores need to be initialized with a custom enable-method.
Fortunately it works just like on MSM8226 and MSM8916 so just add
an additional compatible as alias to the DT schema.

Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220705143523.3390944-8-stephan.gerhold@kernkonzept.com

soc: qcom: spm: Add CPU data for MSM8909

Given the lack of public documentation for the SPM, the configuration
data is taken without modification from Qualcomm's msm-3.10 release [1].
It is pretty much identical to the one for MSM8916, except that 0x3B is
missing in the sequence for standalone power collapse for some reason.

[1]: https://git.codelinaro.org/clo/la/kernel/msm-3.10/-/blob/LA.BR.1.2.3-00910-8x09.0/arch/arm/boot/dts/qcom/msm8909-pm8909-pm.dtsi

Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220705143523.3390944-7-stephan.gerhold@kernkonzept.com

dt-bindings: soc: qcom: spm: Add MSM8909 CPU compatible

Document the "qcom,msm8909-saw2-v3.0-cpu" compatible for the CPU
Subsystem Power Manager (SPM) on the MSM8909 SoC. This is necessary
for CPU idle since this is a fairly old 32-bit SoC without support
for PSCI.

Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220705143523.3390944-6-stephan.gerhold@kernkonzept.com

soc: qcom: rpmpd: Add compatible for MSM8909

MSM8909 has the same power domains as MSM8916, so just define another
compatible for the existing definition.

Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220705143523.3390944-5-stephan.gerhold@kernkonzept.com

dt-bindings: power: qcom-rpmpd: Add MSM8909 power domains

MSM8909 has the same power domains as MSM8916 so just define them
as aliases for the existing definitions.

Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220705143523.3390944-4-stephan.gerhold@kernkonzept.com

soc: qcom: smd-rpm: Add compatible for MSM8909

Add the new "qcom,rpm-msm8909" compatible to the driver so the interface
to the Resource Power Manager (RPM) is initialized correctly on MSM8909.

Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220705143523.3390944-3-stephan.gerhold@kernkonzept.com

dt-bindings: soc: qcom: smd-rpm: Add MSM8909

Document the "qcom,rpm-msm8909" compatible to describe the interface to
the Resource Power Manager (RPM) on the MSM8909 SoC.

Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220705143523.3390944-2-stephan.gerhold@kernkonzept.com

arm64: dts: qcom: sc8280xp: add missing 300MHz

When booting a Thinkpad x13s, we see the message

[ 0.997647] cpu cpu0: failed to update OPP for freq=300000

So, let's add in 300MHz to make it happy

Signed-off-by: Steev Klimaszewski <steev@kali.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220718225714.8074-1-steev@kali.org

pinctrl: armada-37xx: use raw spinlocks for regmap to avoid invalid wait context

The irqchip->irq_set_type method is called by __irq_set_trigger() under
the desc->lock raw spinlock.

The armada-37xx implementation, armada_37xx_irq_set_type(), uses an MMIO
regmap created by of_syscon_register(), which uses plain spinlocks
(the kind that are sleepable on RT).

Therefore, this is an invalid locking scheme for which we get a kernel
splat stating just that ("[ BUG: Invalid wait context ]"), because the
context in which the plain spinlock may sleep is atomic due to the raw
spinlock. We need to go raw spinlocks all the way.

Make this driver create its own MMIO regmap, with use_raw_spinlock=true,
and stop relying on syscon to provide it.

This patch depends on commit 67021f25d952 ("regmap: teach regmap to use
raw spinlocks if requested in the config").

Cc: <stable@vger.kernel.org> # 5.15+
Fixes: 2f227605394b ("pinctrl: armada-37xx: Add irqchip support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220716233745.1704677-3-vladimir.oltean@nxp.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

pinctrl: armada-37xx: make irq_lock a raw spinlock to avoid invalid wait context

The irqchip->irq_set_type method is called by __irq_set_trigger() under
the desc->lock raw spinlock.

The armada-37xx implementation, armada_37xx_irq_set_type(), takes a
plain spinlock, the kind that becomes sleepable on RT.

Therefore, this is an invalid locking scheme for which we get a kernel
splat stating just that ("[ BUG: Invalid wait context ]"), because the
context in which the plain spinlock may sleep is atomic due to the raw
spinlock. We need to go raw spinlocks all the way.

Replace the driver's irq_lock with a raw spinlock, to disable preemption
even on RT.

Cc: <stable@vger.kernel.org> # 5.15+
Fixes: 2f227605394b ("pinctrl: armada-37xx: Add irqchip support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220716233745.1704677-2-vladimir.oltean@nxp.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

Revert "ocfs2: mount shared volume without ha stack"

This reverts commit 912f655d78c5d4ad05eac287f23a435924df7144.

This commit introduced a regression that can cause mount to hang.  The
changes in __ocfs2_find_empty_slot allow any node with a non-zero node
number to grab the slot that was already taken by node 0, so node 1
will access the same journal as node 0. When it tries to grab the journal
cluster lock, it will hang because the lock was already acquired by node 0.
This is very easy to reproduce: in one cluster, mount node 0 first, then
node 1, and you will see the following call trace from node 1.

[13148.735424] INFO: task mount.ocfs2:53045 blocked for more than 122 seconds.
[13148.739691]       Not tainted 5.15.0-2148.0.4.el8uek.mountracev2.x86_64 #2
[13148.742560] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[13148.745846] task:mount.ocfs2     state:D stack:    0 pid:53045 ppid: 53044 flags:0x00004000
[13148.749354] Call Trace:
[13148.750718]  <TASK>
[13148.752019]  ? usleep_range+0x90/0x89
[13148.753882]  __schedule+0x210/0x567
[13148.755684]  schedule+0x44/0xa8
[13148.757270]  schedule_timeout+0x106/0x13c
[13148.759273]  ? __prepare_to_swait+0x53/0x78
[13148.761218]  __wait_for_common+0xae/0x163
[13148.763144]  __ocfs2_cluster_lock.constprop.0+0x1d6/0x870 [ocfs2]
[13148.765780]  ? ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
[13148.768312]  ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
[13148.770968]  ocfs2_journal_init+0x91/0x340 [ocfs2]
[13148.773202]  ocfs2_check_volume+0x39/0x461 [ocfs2]
[13148.775401]  ? iput+0x69/0xba
[13148.777047]  ocfs2_mount_volume.isra.0.cold+0x40/0x1f5 [ocfs2]
[13148.779646]  ocfs2_fill_super+0x54b/0x853 [ocfs2]
[13148.781756]  mount_bdev+0x190/0x1b7
[13148.783443]  ? ocfs2_remount+0x440/0x440 [ocfs2]
[13148.785634]  legacy_get_tree+0x27/0x48
[13148.787466]  vfs_get_tree+0x25/0xd0
[13148.789270]  do_new_mount+0x18c/0x2d9
[13148.791046]  __x64_sys_mount+0x10e/0x142
[13148.792911]  do_syscall_64+0x3b/0x89
[13148.794667]  entry_SYSCALL_64_after_hwframe+0x170/0x0
[13148.797051] RIP: 0033:0x7f2309f6e26e
[13148.798784] RSP: 002b:00007ffdcee7d408 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[13148.801974] RAX: ffffffffffffffda RBX: 00007ffdcee7d4a0 RCX: 00007f2309f6e26e
[13148.804815] RDX: 0000559aa762a8ae RSI: 0000559aa939d340 RDI: 0000559aa93a22b0
[13148.807719] RBP: 00007ffdcee7d5b0 R08: 0000559aa93a2290 R09: 00007f230a0b4820
[13148.810659] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcee7d420
[13148.813609] R13: 0000000000000000 R14: 0000559aa939f000 R15: 0000000000000000
[13148.816564]  </TASK>

To fix it, we could just fix __ocfs2_find_empty_slot.  But the original
commit introduced the feature to mount ocfs2 locally even if it is
cluster based, which is very dangerous: it can easily cause serious data
corruption, because there is no way to stop other nodes from mounting the
fs and corrupting it. Setting up ha or another cluster-aware stack is just
the cost that we have to take for avoiding corruption; otherwise we would
have to do it in the kernel.

Link: https://lkml.kernel.org/r/20220603222801.42488-1-junxiao.bi@oracle.com
Fixes: 912f655d78c5 ("ocfs2: mount shared volume without ha stack")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <heming.zhao@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

hugetlb: fix memoryleak in hugetlb_mcopy_atomic_pte

When alloc_huge_page fails, *pagep is set to NULL without put_page first.
So the hugepage indicated by *pagep is leaked.

Link: https://lkml.kernel.org/r/20220709092629.54291-1-linmiaohe@huawei.com
Fixes: 8cc5fcbb5be8 ("mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

fs: sendfile handles O_NONBLOCK of out_fd

sendfile has to return EAGAIN if out_fd is nonblocking and the write into
it would block.

Here is a small reproducer for the problem:

#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/sendfile.h>

#define FILE_SIZE (1UL << 30)
int main(int argc, char **argv) {
        int p[2], fd;

        if (pipe2(p, O_NONBLOCK))
                return 1;

        fd = open(argv[1], O_RDWR | O_TMPFILE, 0666);
        if (fd < 0)
                return 1;
        ftruncate(fd, FILE_SIZE);

        if (sendfile(p[1], fd, 0, FILE_SIZE) == -1) {
                fprintf(stderr, "FAIL\n");
        }
        if (sendfile(p[1], fd, 0, FILE_SIZE) != -1 || errno != EAGAIN) {
                fprintf(stderr, "FAIL\n");
        }
        return 0;
}

It worked before b964bf53e540, it is stuck after b964bf53e540, and it
works again with this fix.

This regression occurred because do_splice_direct() calls pipe_write
that handles O_NONBLOCK.  Here is a trace log from the reproducer:

1)               |  __x64_sys_sendfile64() {
1)               |    do_sendfile() {
1)               |      __fdget()
1)               |      rw_verify_area()
1)               |      __fdget()
1)               |      rw_verify_area()
1)               |      do_splice_direct() {
1)               |        rw_verify_area()
1)               |        splice_direct_to_actor() {
1)               |          do_splice_to() {
1)               |            rw_verify_area()
1)               |            generic_file_splice_read()
1) + 74.153 us   |          }
1)               |          direct_splice_actor() {
1)               |            iter_file_splice_write() {
1)               |              __kmalloc()
1)   0.148 us    |              pipe_lock();
1)   0.153 us    |              splice_from_pipe_next.part.0();
1)   0.162 us    |              page_cache_pipe_buf_confirm();
... 16 times
1)   0.159 us    |              page_cache_pipe_buf_confirm();
1)               |              vfs_iter_write() {
1)               |                do_iter_write() {
1)               |                  rw_verify_area()
1)               |                  do_iter_readv_writev() {
1)               |                    pipe_write() {
1)               |                      mutex_lock()
1)   0.153 us    |                      mutex_unlock();
1)   1.368 us    |                    }
1)   1.686 us    |                  }
1)   5.798 us    |                }
1)   6.084 us    |              }
1)   0.174 us    |              kfree();
1)   0.152 us    |              pipe_unlock();
1) + 14.461 us   |            }
1) + 14.783 us   |          }
1)   0.164 us    |          page_cache_pipe_buf_release();
... 16 times
1)   0.161 us    |          page_cache_pipe_buf_release();
1)               |          touch_atime()
1) + 95.854 us   |        }
1) + 99.784 us   |      }
1) ! 107.393 us  |    }
1) ! 107.699 us  |  }

Link: https://lkml.kernel.org/r/20220415005015.525191-1-avagin@gmail.com
Fixes: b964bf53e540 ("teach sendfile(2) to handle send-to-pipe directly")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

ntfs: fix use-after-free in ntfs_ucsncmp()

Syzkaller reported a use-after-free bug as follows:

==================================================================
BUG: KASAN: use-after-free in ntfs_ucsncmp+0x123/0x130
Read of size 2 at addr ffff8880751acee8 by task a.out/879

CPU: 7 PID: 879 Comm: a.out Not tainted 5.19.0-rc4-next-20220630-00001-gcc5218c8bd2c-dirty #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x1c0/0x2b0
print_address_description.constprop.0.cold+0xd4/0x484
print_report.cold+0x55/0x232
kasan_report+0xbf/0xf0
ntfs_ucsncmp+0x123/0x130
ntfs_are_names_equal.cold+0x2b/0x41
ntfs_attr_find+0x43b/0xb90
ntfs_attr_lookup+0x16d/0x1e0
ntfs_read_locked_attr_inode+0x4aa/0x2360
ntfs_attr_iget+0x1af/0x220
ntfs_read_locked_inode+0x246c/0x5120
ntfs_iget+0x132/0x180
load_system_files+0x1cc6/0x3480
ntfs_fill_super+0xa66/0x1cf0
mount_bdev+0x38d/0x460
legacy_get_tree+0x10d/0x220
vfs_get_tree+0x93/0x300
do_new_mount+0x2da/0x6d0
path_mount+0x496/0x19d0
__x64_sys_mount+0x284/0x300
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f3f2118d9ea
Code: 48 8b 0d a9 f4 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 76 f4 0b 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc269deac8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3f2118d9ea
RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffc269dec00
RBP: 00007ffc269dec80 R08: 00007ffc269deb00 R09: 00007ffc269dec44
R10: 0000000000000000 R11: 0000000000000202 R12: 000055f81ab1d220
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>

The buggy address belongs to the physical page:
page:0000000085430378 refcount:1 mapcount:1 mapping:0000000000000000 index:0x555c6a81d pfn:0x751ac
memcg:ffff888101f7e180
anon flags: 0xfffffc00a0014(uptodate|lru|mappedtodisk|swapbacked|node=0|zone=1|lastcpupid=0x1fffff)
raw: 000fffffc00a0014 ffffea0001bf2988 ffffea0001de2448 ffff88801712e201
raw: 0000000555c6a81d 0000000000000000 0000000100000000 ffff888101f7e180
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff8880751acd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff8880751ace00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff8880751ace80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
ffff8880751acf00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff8880751acf80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================

The cause is that struct ATTR_RECORD->name_offset is 6485, which puts the
end address of the name string out of bounds.

Fix this by adding a sanity check on the end address of the attribute name
string.
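
The kind of check described above can be sketched in plain C.  This is a
userspace illustration only, not the ntfs code: struct attr_record,
attr_name_in_bounds() and the buf_len parameter are hypothetical names, and
the name is assumed to be stored as 2-byte code units.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an on-disk attribute header. */
struct attr_record {
	uint16_t name_offset;	/* byte offset of the name from the record start */
	uint8_t  name_length;	/* name length in 2-byte code units (assumed) */
};

/*
 * Return true only if the whole name, from name_offset up to
 * name_offset + 2 * name_length, lies inside a buffer of buf_len bytes
 * that starts at the record.
 */
static bool attr_name_in_bounds(const struct attr_record *a, size_t buf_len)
{
	size_t name_end = (size_t)a->name_offset +
			  (size_t)a->name_length * sizeof(uint16_t);

	return name_end <= buf_len;
}
```

With a 1024-byte buffer, a record whose name_offset is 6485 is rejected
before any byte of the name is compared.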

[akpm@linux-foundation.org: coding-style cleanups]
[chenxiaosong2@huawei.com: cleanup suggested by Hawkins Jiawei]
Link: https://lkml.kernel.org/r/20220709064511.3304299-1-chenxiaosong2@huawei.com
Link: https://lkml.kernel.org/r/20220707105329.4020708-1-chenxiaosong2@huawei.com
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Cc: Anton Altaparmakov <anton@tuxera.com>
Cc: ChenXiaoSong <chenxiaosong2@huawei.com>
Cc: Yongqiang Liu <liuyongqiang13@huawei.com>
Cc: Zhang Yi <yi.zhang@huawei.com>
Cc: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

secretmem: fix unhandled fault in truncate

syzkaller reports the following issue:

BUG: unable to handle page fault for address: ffff888021f7e005
PGD 11401067 P4D 11401067 PUD 11402067 PMD 21f7d063 PTE 800fffffde081060
Oops: 0002 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 3761 Comm: syz-executor281 Not tainted 5.19.0-rc4-syzkaller-00014-g941e3e791269 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:memset_erms+0x9/0x10 arch/x86/lib/memset_64.S:64
Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
RSP: 0018:ffffc9000329fa90 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000ffb
RDX: 0000000000000ffb RSI: 0000000000000000 RDI: ffff888021f7e005
RBP: ffffea000087df80 R08: 0000000000000001 R09: ffff888021f7e005
R10: ffffed10043efdff R11: 0000000000000000 R12: 0000000000000005
R13: 0000000000000000 R14: 0000000000001000 R15: 0000000000000ffb
FS: 00007fb29d8b2700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff888021f7e005 CR3: 0000000026e7b000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
zero_user_segments include/linux/highmem.h:272 [inline]
folio_zero_range include/linux/highmem.h:428 [inline]
truncate_inode_partial_folio+0x76a/0xdf0 mm/truncate.c:237
truncate_inode_pages_range+0x83b/0x1530 mm/truncate.c:381
truncate_inode_pages mm/truncate.c:452 [inline]
truncate_pagecache+0x63/0x90 mm/truncate.c:753
simple_setattr+0xed/0x110 fs/libfs.c:535
secretmem_setattr+0xae/0xf0 mm/secretmem.c:170
notify_change+0xb8c/0x12b0 fs/attr.c:424
do_truncate+0x13c/0x200 fs/open.c:65
do_sys_ftruncate+0x536/0x730 fs/open.c:193
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fb29d900899
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 11 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fb29d8b2318 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
RAX: ffffffffffffffda RBX: 00007fb29d988408 RCX: 00007fb29d900899
RDX: 00007fb29d900899 RSI: 0000000000000005 RDI: 0000000000000003
RBP: 00007fb29d988400 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fb29d98840c
R13: 00007ffca01a23bf R14: 00007fb29d8b2400 R15: 0000000000022000
</TASK>
Modules linked in:
CR2: ffff888021f7e005
---[ end trace 0000000000000000 ]---

Eric Biggers suggested that this happens when
secretmem_setattr()->simple_setattr() races with secretmem_fault() so that
a page that is faulted in by secretmem_fault() (and thus removed from the
direct map) is zeroed by inode truncation right afterwards.

Use mapping->invalidate_lock to make secretmem_fault() and
secretmem_setattr() mutually exclusive.

[rppt@linux.ibm.com: v3]
Link: https://lkml.kernel.org/r/20220714091337.412297-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20220707165650.248088-1-rppt@kernel.org
Reported-by: syzbot+9bd2b7adbd34b30b87e4@syzkaller.appspotmail.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/hugetlb: separate path for hwpoison entry in copy_hugetlb_page_range()

Originally, copy_hugetlb_page_range() handled migration entries and
hwpoison entries in a similar manner. But recently the related code path
has gained more code for migration entries, and when
is_writable_migration_entry() was converted to
!is_readable_migration_entry(), hwpoison entries on source processes
started being updated unexpectedly (which is legitimate for migration
entries, but not for hwpoison entries). This results in serious issues
such as a kernel panic when forking processes with hwpoison entries in
the pmd.

Separate the if branch into one for hwpoison entries and one for migration
entries.

Link: https://lkml.kernel.org/r/20220704013312.2415700-3-naoya.horiguchi@linux.dev
Fixes: 6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org> [5.18]
Cc: David Hildenbrand <david@redhat.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: fix missing wake-up event for FSDAX pages

FSDAX page refcounts are 1-based, rather than 0-based: if the refcount is
1, then the page is freed. FSDAX pages can be pinned through GUP and are
then unpinned via unpin_user_page(), which uses a folio variant to put
the page. However, the folio variants did not consider this special case,
so a wakeup event can be missed (for example, by the user of
__fuse_dax_break_layouts()). This results in a task being permanently
stuck in TASK_INTERRUPTIBLE state.

Since FSDAX pages can only be obtained by GUP users, fix GUP
instead of folio_put() to lower overhead.
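
The difference between a 0-based and a 1-based put can be shown with a toy
model.  This is a sketch under stated assumptions, not the kernel code:
struct dax_page, generic_put() and fsdax_put() are hypothetical names, and
a counter stands in for the real wake-up mechanism.

```c
/* Hypothetical model of a page whose idle refcount is 1, not 0. */
struct dax_page {
	int refcount;	/* 1 means nobody holds the page */
	int wakeups;	/* wake-up events delivered to waiters */
};

/*
 * 0-based put: only reacts when the count reaches 0, which a
 * 1-based page never does while it is merely idle.
 */
static void generic_put(struct dax_page *p, int refs)
{
	p->refcount -= refs;
	if (p->refcount == 0)
		p->wakeups++;
}

/* 1-based put: wake waiters when the count drops back to 1. */
static void fsdax_put(struct dax_page *p, int refs)
{
	p->refcount -= refs;
	if (p->refcount == 1)
		p->wakeups++;
}
```

In this model, a pin followed by generic_put() leaves the page idle with no
wake-up delivered, while the same sequence with fsdax_put() delivers one.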

Link: https://lkml.kernel.org/r/20220705123532.283-1-songmuchun@bytedance.com
Fixes: d8ddc099c6b3 ("mm/gup: Add gup_put_folio()")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: fix page leak with multiple threads mapping the same page

We have an application with a lot of threads that use a shared mmap backed
by tmpfs mounted with -o huge=within_size.  This application started
leaking loads of huge pages when we upgraded to a recent kernel.

Using the page ref tracepoints and a BPF program written by Tejun Heo we
were able to determine that these pages would have multiple refcounts from
the page fault path, but when it came to unmap time we wouldn't drop the
number of refs we had added from the faults.

I wrote a reproducer that mmap'ed a file backed by tmpfs with -o
huge=always, and then spawned 20 threads all looping faulting random
offsets in this map, while using madvise(MADV_DONTNEED) randomly for huge
page aligned ranges.  This very quickly reproduced the problem.

The problem here is that we check for the case that we have multiple
threads faulting in a range that was previously unmapped.  One thread maps
the PMD, the other thread loses the race and then returns 0.  However at
this point we already have the page, and we are no longer putting this
page into the process's address space, and so we leak the page.  We
actually did the correct thing prior to f9ce0be71d1f, however it looks
like Kirill copied what we do in the anonymous page case.  In the
anonymous page case we don't yet have a page, so we don't have to drop a
reference on anything.  Previously we did the correct thing for file based
faults by returning VM_FAULT_NOPAGE so we correctly drop the reference on
the page we faulted in.

Fix this by returning VM_FAULT_NOPAGE in the pmd_devmap_trans_unstable()
case.  This makes us drop the ref on the page properly, and my
reproducer no longer leaks the huge pages.
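
The reference accounting described above can be modelled in a few lines of
C.  This is an illustration only: the types, the FAULT_* values, the
"fixed" switch and both functions are hypothetical and do not mirror the
kernel's fault path in detail.

```c
#include <stdbool.h>

/* Hypothetical return codes, named after the kernel's but not its values. */
enum { FAULT_OK = 0, FAULT_NOPAGE = 1 };

struct page { int refcount; };

struct fault {
	struct page *page;		/* page obtained by this fault */
	bool pmd_mapped_by_other;	/* another thread won the race */
};

/* Mapping step: report NOPAGE when the race is lost (fixed behaviour). */
static int model_finish_fault(const struct fault *f, bool fixed)
{
	if (f->pmd_mapped_by_other)
		return fixed ? FAULT_NOPAGE : FAULT_OK;
	return FAULT_OK;	/* the new mapping owns the reference */
}

/* Caller: takes a reference, and drops it only if told the page is unused. */
static int model_do_fault(struct fault *f, bool fixed)
{
	int ret;

	f->page->refcount++;
	ret = model_finish_fault(f, fixed);
	if (ret == FAULT_NOPAGE)
		f->page->refcount--;
	return ret;
}
```

When the race is lost and 0 is returned, the extra reference stays on the
page; returning NOPAGE lets the caller drop it.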

[josef@toxicpanda.com: v2]
Link: https://lkml.kernel.org/r/e90c8f0dbae836632b669c2afc434006a00d4a67.1657721478.git.josef@toxicpanda.com
Link: https://lkml.kernel.org/r/2b798acfd95c9ab9395fe85e8d5a835e2e10a920.1657051137.git.josef@toxicpanda.com
Fixes: f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Chris Mason <clm@fb.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mailmap: update Seth Forshee's email address

seth.forshee@canonical.com is no longer valid, use sforshee@kernel.org
instead.

Link: https://lkml.kernel.org/r/20220628200734.424495-1-sforshee@kernel.org
Signed-off-by: Seth Forshee <sforshee@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

tmpfs: fix the issue that the mount and remount results are inconsistent.

An undefined-behavior issue has not been completely fixed since commit
d14f5efadd84 ("tmpfs: fix undefined-behaviour in shmem_reconfigure()").
That commit added a check to shmem_reconfigure() in the remount path to
avoid the UBSAN problem.  However, the check was not added to the mount
path, which causes inconsistent results between mount and remount.  The
problem can be reproduced from user space as follows:

If nr_blocks is set to 0x8000000000000000, the mounting is successful.

  # mount tmpfs /dev/shm/ -t tmpfs -o nr_blocks=0x8000000000000000

However, when -o remount is used, the mount fails because of the
check in shmem_reconfigure():

  # mount tmpfs /dev/shm/ -t tmpfs -o remount,nr_blocks=0x8000000000000000
  mount: /dev/shm: mount point not mounted or bad option.

Therefore, add checks in the shmem_parse_one() function and remove the
check in shmem_reconfigure() to avoid this problem.
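
Validating the option once, at parse time, means mount and remount see the
same result.  A minimal userspace sketch of that idea follows; it assumes
the limit is INT64_MAX (suggested by the 0x8000000000000000 example above),
parse_nr_blocks() is a hypothetical helper, and unlike the kernel's parser
it does not accept size suffixes.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical parse-time validator shared by mount and remount.
 * Returns 0 and stores the value on success, -EINVAL otherwise.
 * Assumption: values above INT64_MAX are rejected.
 */
static int parse_nr_blocks(const char *s, unsigned long long *out)
{
	char *rest;
	unsigned long long v;

	errno = 0;
	v = strtoull(s, &rest, 0);	/* base 0 accepts the 0x prefix */
	if (errno || rest == s || *rest != '\0')
		return -EINVAL;
	if (v > (unsigned long long)INT64_MAX)
		return -EINVAL;

	*out = v;
	return 0;
}
```

Because both paths would go through the same helper, the value
0x8000000000000000 is rejected in either case.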

Link: https://lkml.kernel.org/r/20220629124324.1640807-1-wangzhaolong1@huawei.com
Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com>
Cc: Luo Meng <luomeng12@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Yu Kuai <yukuai3@huawei.com>
Cc: Zhihao Cheng <chengzhihao1@huawei.com>
Cc: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: kfence: apply kmemleak_ignore_phys on early allocated pool

This patch solves two issues.

(1) The pool allocated by memblock needs to be unregistered from
kmemleak scanning. Use kmemleak_ignore_phys in place of the
original kmemleak_free, as its address is now stored in the phys tree.

(2) The pool allocated late by page-alloc doesn't need to be unregistered.
Move the freeing operation out of its call path.

Link: https://lkml.kernel.org/r/20220628113714.7792-2-yee.lee@mediatek.com
Fixes: 0c24e061196c21d5 ("mm: kmemleak: add rbtree and store physical address for objects allocated with PA")
Signed-off-by: Yee Lee <yee.lee@mediatek.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Suggested-by: Marco Elver <elver@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

clk: qcom: gdsc: Bump parent usage count when GDSC is found enabled

When a GDSC is found to be enabled at boot the pm_runtime state will
be unbalanced as the GDSC is later turned off. Fix this by increasing
the usage counter on the power-domain, in line with how we handled the
regulator state.

Fixes: 1b771839de05 ("clk: qcom: gdsc: enable optional power domain support")
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Link: https://lore.kernel.org/r/20220713212818.130277-1-bjorn.andersson@linaro.org

clk: qcom: Drop mmcx gdsc supply for dispcc and videocc

Both dispcc and videocc use mmcx power domain now.
Let's drop the mmcx supply from every gdsc.

Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Fixes: 266e5cf39a0f ("arm64: dts: qcom: sm8250: remove mmcx regulator")
Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220713143200.3686765-1-abel.vesa@linaro.org

soc: qcom: icc-bwmon: Remove unnecessary print function dev_err()

Eliminate the following coccicheck warning:
./drivers/soc/qcom/icc-bwmon.c:349:2-9: line 349 is redundant because platform_get_irq() already prints an error

Fixes: b9c2ae6cac40 ("soc: qcom: icc-bwmon: Add bandwidth monitoring driver")
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220714075532.104665-1-yang.lee@linux.alibaba.com

Merge tag 'clk-imx-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/abelvesa/linux into clk-imx

Pull i.MX clk driver updates from Abel Vesa:

- Correct adc1, nic_media and edma1's parents for i.MX93
- Fix rdiv, mfd values, the return rate in recalc_rate and add more
   frequencies in the table for fracn-gppll

* tag 'clk-imx-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/abelvesa/linux:
  clk: imx: clk-fracn-gppll: Add more freq config for video pll
  clk: imx: clk-fracn-gppll: correct rdiv
  clk: imx: clk-fracn-gppll: Return rate in rate table properly in ->recalc_rate()
  clk: imx: clk-fracn-gppll: fix mfd value
  clk: imx93: Correct the edma1's parent clock
  clk: imx93: correct nic_media parent
  clk: imx93: use adc_root as the parent clock of adc1

Merge tag 'sunxi-clk-for-5.20-1' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into clk-allwinner

Pull Allwinner clk driver updates from Jernej Skrabec:

- deduplicate Allwinner ccu_clks arrays
- Allwinner H6 GPU DFS support
- adjust Allwinner Kconfig

* tag 'sunxi-clk-for-5.20-1' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
  clk: sunxi-ng: sun50i: h6: Modify GPU clock configuration to support DFS
  clk: sunxi: Do not select the PRCM MFD
  clk: sunxi: Limit legacy clocks to 32-bit ARM
  clk: sunxi-ng: Deduplicate ccu_clks arrays