mmc: Set PROBE_PREFER_ASYNCHRONOUS for drivers that existed in v4.4
This is like commit 3d3451124f3d ("mmc: sdhci-msm: Prefer asynchronous
probe") but applied to a whole pile of drivers. This batch converts
the drivers that appeared to be around in the v4.4 timeframe.
Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com> # SH_MMCIF Tested-by: Thierry Reding <treding@nvidia.com> Link: https://lore.kernel.org/r/20200903162412.1.Id501e96fa63224f77bb86b2135a5e8324ffb9c43@changeid Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Robin Murphy [Thu, 3 Sep 2020 21:18:06 +0000 (22:18 +0100)]
mmc: renesas_sdhi: Drop local dma_parms
Since commit 9495b7e92f71 ("driver core: platform: Initialize dma_parms
for platform devices"), struct platform_device already provides a
dma_parms structure, so we can save allocating another one.
Wolfram Sang [Tue, 1 Sep 2020 15:02:49 +0000 (17:02 +0200)]
mmc: renesas_sdhi: keep SCC clock active when tuning
Tuning procedure switches to lower frequencies but that will turn the
SCC off and accessing its register then will hang. So, check when we are
tuning and keep the current setup of the external clock if we are doing
so. Note that we still switch to the lower frequency because of the
internal divider. We just make sure to not modify the external clock.
This patch depends on a MMC core patch calling the downgrade function
earlier.
Wolfram Sang [Tue, 1 Sep 2020 15:02:48 +0000 (17:02 +0200)]
mmc: core: add a 'doing_init_tune' flag and a 'mmc_doing_tune' helper
Our driver needs to know when tuning is in progress. 'doing_retune' only
covers re-tuning, not the initial tuning. Add another flag to detect the
initial tuning state and add a helper which tells us if any kind of
tuning is going on. Only implemented for MMC currently because that's
where we need it. SD can be added later if it becomes necessary.
Wolfram Sang [Tue, 1 Sep 2020 15:02:47 +0000 (17:02 +0200)]
mmc: core: when downgrading HS400, callback into drivers earlier
The driver specific downgrade function makes more sense if we run it
before we set the timing to something lower, not after. Otherwise some
non-HS400 communication has already happened.
No need to convert users. There is only one currently which needs this
change in a following patch.
Turning on initcall debug on one system showed this:
initcall sdhci_msm_driver_init+0x0/0x28 returned 0 after 34782 usecs
The lion's share of this time (~33 ms) was in mmc_power_up(). This
shouldn't be terribly surprising since there are a few calls to delay
based on "power_delay_ms" and the default delay there is 10 ms.
Because we haven't specified that we'd prefer asynchronous probe for
this driver then we'll wait for this driver to finish before we start
probes for more drivers. While 33 ms doesn't sound like tons, every
little bit counts.
There should be little problem with turning on asynchronous probe for
this driver. It's already possible that previous drivers may have
turned on asynchronous probe so we might already have other things
(that probed before us) probing at the same time we are anyway. This
driver isn't really providing resources (clocks, regulators, etc) that
other drivers need to probe and even if it was they should be handling
-EPROBE_DEFER.
Let's turn this on and get a bit of boot speed back.
mmc: s3cmci: Drop unused variables in dbg_dumpregs
The 'imask' and 'bsize' are not used in dbg_dumpregs:
drivers/mmc/host/s3cmci.c:149:36: warning: variable 'imask' set but not used [-Wunused-but-set-variable]
drivers/mmc/host/s3cmci.c:148:63: warning: variable 'bsize' set but not used [-Wunused-but-set-variable]
mmc: sdhci-of-sparx5: Use proper printk format for dma_addr_t
dma_addr_t size varies between architectures so use dedicated printk
format to fix compile testing warning (e.g. on 32-bit MIPS):
drivers/mmc/host/sdhci-of-sparx5.c:63:11: warning:
format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 5 has type ‘dma_addr_t {aka unsigned int}’ [-Wformat=]
Common pattern of handling deferred probe can be simplified with
dev_err_probe(). Less code, the error value gets printed and real error
from dw_mci_parse_dt() is passed further instead of fixed -EINVAL.
mmc: core: Allow setting slot index via device tree alias
As with GPIO, UART and others, allow specifying the device index via the
aliases node in the device tree.
On embedded devices, there is often a combination of removable (e.g.
SD card) and non-removable MMC devices (e.g. eMMC).
Therefore the index might change depending on
* host of removable device
* removable card present or not
This makes it difficult to hardcode the root device, if it is on the
non-removable device. E.g. if SD card is present eMMC will be mmcblk1,
if SD card is not present at boot, eMMC will be mmcblk0.
Alternative solutions like PARTUUIDs do not cover the case where multiple
mmcblk devices contain the same image. This is a common issue on devices
that can boot both from eMMC (for regular boot) and SD cards (as a
temporary boot medium for development). When a firmware image is
installed to eMMC after a test boot via SD card, there will be no
reliable way to refer to a specific device using (PART)UUIDs oder
LABELs.
The demand for this feature has led to multiple attempts to implement
it, dating back at least to 2012 (see
https://www.spinics.net/lists/linux-mmc/msg26586.html for a previous
discussion from 2014).
All indices defined in the aliases node will be reserved for use by the
respective MMC device, moving the indices of devices that don't have an
alias up into the non-reserved range. If the aliases node is not found,
the driver will act as before.
This is a rebased and cleaned up version of
https://www.spinics.net/lists/linux-mmc/msg26588.html .
dt-bindings: mmc: mmc-pwreq-simple: Accept more than one reset GPIO
There might be multiple reset GPIOs but dtschema has trouble parsing it
if there are no maxItems:
arch/arm/boot/dts/exynos5250-snow.dt.yaml: mmc3_pwrseq: reset-gpios: [[20, 2, 1], [20, 1, 1]] is too long
From schema: Documentation/devicetree/bindings/mmc/mmc-pwrseq-simple.yaml
The i.MX 8 DTSes use two compatibles so update the binding to fix
dtbs_check warnings like:
arch/arm64/boot/dts/freescale/imx8mn-evk.dt.yaml: mmc@30b40000:
compatible: ['fsl,imx8mn-usdhc', 'fsl,imx7d-usdhc'] is too long
From schema: Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.yaml
arch/arm64/boot/dts/freescale/imx8mn-evk.dt.yaml: mmc@30b40000:
compatible: Additional items are not allowed ('fsl,imx7d-usdhc' was unexpected)
arch/arm64/boot/dts/freescale/imx8mn-ddr4-evk.dt.yaml: mmc@30b40000:
compatible: ['fsl,imx8mn-usdhc', 'fsl,imx7d-usdhc'] is too long
Stefan Wahren [Fri, 28 Aug 2020 21:47:14 +0000 (23:47 +0200)]
mmc: sdhci-iproc: Enable eMMC DDR 3.3V support for bcm2711
The emmc2 interface on the bcm2711 supports DDR modes for eMMC devices
running at 3.3V. This allows to run eMMC module with 3.3V signaling voltage
at DDR52 mode on the Raspberry Pi 4 using a SD adapter:
clock: 52000000 Hz
actual clock: 50000000 Hz
vdd: 21 (3.3 ~ 3.4 V)
bus mode: 2 (push-pull)
chip select: 0 (don't care)
power mode: 2 (on)
bus width: 2 (4 bits)
timing spec: 8 (mmc DDR52)
signal voltage: 0 (3.30 V)
driver type: 0 (driver type B)
Chun-Hung Wu [Thu, 27 Aug 2020 09:33:03 +0000 (17:33 +0800)]
mmc: mediatek: add pre_enable() and post_disable() hook function
CQHCI_ENABLE bit in CQHCI_CFG should be disabled
after msdc_cqe_disable(), and should be enabled before
msdc_ceq_enable() for MTK platform.
Add hook functions for cqhci_host_ops->pre_enable() and
cqhci_host_ops->post_disable().
mmc: sdhci-msm: Enable restore_dll_config flag for sc7180 target
On sc7180 target, issues are observed with HS400 mode due to a
hardware limitation. If sdcc clock is dynamically gated and ungated,
the very next command is failing with command CRC/timeout errors.
To mitigate this issue, DLL phase has to be restored whenever sdcc
clock is gated dynamically. The restore_dll_config ensures this.
Enabling this flag with this change. And simply re-using the sdm845
target configuration for this flag.
Wolfram Sang [Thu, 20 Aug 2020 13:25:38 +0000 (15:25 +0200)]
mmc: tmio: remove indirection of 'execute_tuning' callback
After all the previous refactorization, we can now populate mmc_ops
directly and don't need a layer inbetween. The NULL-pointer check and
the error printout are already done by the MMC core.
Wolfram Sang [Thu, 20 Aug 2020 13:25:37 +0000 (15:25 +0200)]
mmc: tmio: don't reset whole IP core when tuning fails
SDHI needs to reset the SCC only, not the whole IP core. So, if tuning
fails, don't handle specifics in the generic TMIO core, but in the
specific drivers. For SDHI, we need to move around the reset routine a
bit. It is not modified.
Wolfram Sang [Thu, 20 Aug 2020 13:25:36 +0000 (15:25 +0200)]
mmc: tmio: factor out common parts of the reset routine
Some TMIO variants need specific actions in their reset routine, but
they are all based on a generic reset routine. So, the optional 'reset'
callback will now only take care of the additional stuff and we will
have a generic function around it. Less code, easier to maintain, and
much more readable. Code in tmio_mmc.c is untested but in my TC6387XB
datasheet the SDIO part is reset independently from the SD part, too.
Wolfram Sang [Thu, 20 Aug 2020 13:25:34 +0000 (15:25 +0200)]
Revert "mmc: tmio: fix reset operation"
This reverts commit a87852c6b8827b7fece78ae57d871d56e4348e30. It did fix
the issue, but was building on top of already wrong assumptions. The
driver missed that 'hw_reset' was only for resetting remote HW (card)
and not for the IP core. Since we fixed that in a previous patch, we can
now remove this patch to make it clear that 'reset' is for resetting the
IP core only. Also, cancelling DMA will only be called when actually
needed again. It will also allow for further cleanups and better
readability. Note that in addition to the revert, the call in
'tmio_mmc_execute_tuning' will be converted, too, to maintain the
current behaviour.
Wolfram Sang [Thu, 20 Aug 2020 13:25:33 +0000 (15:25 +0200)]
mmc: renesas_sdhi: move wrong 'hw_reset' to 'reset'
This driver got the usage of 'hw_reset' wrong and missed that it is used
to reset the remote HW (card) only, not the local one (controller). Move
everything to the proper 'reset' callback. Also, add the generic reset
code from TMIO, so we will ensure the same behaviour (it will get
refactored away in a later patch). This also means we need to drop
MMC_CAP_HW_RESET because this is currently not supported by our
hardware.
Faiz Abbas [Tue, 25 Aug 2020 17:00:15 +0000 (22:30 +0530)]
mmc: sdhci_am654: Add workaround for card detect debounce timer
There is a one time delay because of a card detect debounce timer in the
controller IP. This timer runs as soon as power is applied to the module
regardless of whether a card is present or not and any writes to
SDHCI_POWER_ON will return 0 before it expires. This timeout has been
measured to be about 1 second in am654x and j721e.
Write-and-read-back in a loop on SDHCI_POWER_ON for a maximum of
1.5 seconds to make sure that the controller actually powers on.
via_save_pcictrlreg() should be called with host->lock held
as it writes to pm_pcictrl_reg, otherwise there can be a race
condition between via_sd_suspend() and via_sdc_card_detect().
The same pattern is used in the function via_reset_pcictrl()
as well, where via_save_pcictrlreg() is called with host->lock
held.
Found by Linux Driver Verification project (linuxtesting.org).
Wolfram Sang [Fri, 21 Aug 2020 06:35:33 +0000 (08:35 +0200)]
mmc: core: Improve documentation of MMC_CAP_HW_RESET
MMC_CAP_HW_RESET means that the controller is capable of resetting the eMMC
device via RST_n signal, not a reset of the controller. Two drivers got
this wrong, so let's make it more clear.
Adrian Hunter [Tue, 18 Aug 2020 10:45:08 +0000 (13:45 +0300)]
mmc: sdhci: Add LTR support for some Intel BYT based controllers
Some Intel BYT based host controllers support the setting of latency
tolerance. Accordingly, implement the PM QoS ->set_latency_tolerance()
callback. The raw register values are also exposed via debugfs.
Wolfram Sang [Mon, 17 Aug 2020 11:58:38 +0000 (13:58 +0200)]
mmc: test: remove ambiguity in test description
When reading the test description, I thought a correction of the
xfer_size was tested, which is not the case. It is tested that the
xfer_size is correct. Use 'proper xfer_size' to remove this ambiguity.
Tobias Schramm [Fri, 14 Aug 2020 18:50:11 +0000 (20:50 +0200)]
mmc: mmc_spi: fix timeout calculation
Previously the cycle timeout was converted to a microsecond value but
then incorrectly treated as a nanosecond timeout. This patch changes
the code to convert both the nanosecond timeout and the cycle timeout
to a microsecond value and use that directly.
mmc: sdio: Export SDIO revision and info strings to userspace
For SDIO functions, SDIO cards and SD COMBO cards are exported revision
number and info strings from CISTPL_VERS_1 structure. Revision number
should indicate compliance of standard and info strings should contain
product information in same format as product information for PCMCIA cards.
Product information for PCMCIA cards should contain following strings in
this order: Manufacturer, Product Name, Lot number, Programming Conditions.
Note that not all SDIO cards export all those info strings in that order as
described in PCMCIA Metaformat Specification.
Haibo Chen [Tue, 11 Aug 2020 08:37:37 +0000 (16:37 +0800)]
mmc: sdhci-esdhc-imx: Reset before sending tuning command for manual tuning
According to IC suggestion, everytime before sending the tuning command,
need to reset the usdhc, so to reset the tuning circuit, to let every
tuning command work well for the manual tuning method. For standard tuning
method, IC already add the reset operation in the hardware logic.
mmc: sdhci_am654: Replace HTTP links with HTTPS ones
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.
Deterministic algorithm:
For each file:
If not .svg:
For each line:
If doesn't contain `\bxmlns\b`:
For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
If both the HTTP and HTTPS versions
return 200 OK and serve the same content:
Replace HTTP with HTTPS.
Adrian Hunter [Thu, 3 Sep 2020 08:20:07 +0000 (11:20 +0300)]
mmc: sdio: Use mmc_pre_req() / mmc_post_req()
SDHCI changed from using a tasklet to finish requests, to using an IRQ
thread i.e. commit c07a48c2651965 ("mmc: sdhci: Remove finish_tasklet").
Because this increased the latency to complete requests, a preparatory
change was made to complete the request from the IRQ handler if
possible i.e. commit 19d2f695f4e827 ("mmc: sdhci: Call mmc_request_done()
from IRQ handler if possible"). That alleviated the situation for MMC
block devices because the MMC block driver makes use of mmc_pre_req()
and mmc_post_req() so that successful requests are completed in the IRQ
handler and any DMA unmapping is handled separately in mmc_post_req().
However SDIO was still affected, and an example has been reported with
up to 20% degradation in performance.
Looking at SDIO I/O helper functions, sdio_io_rw_ext_helper() appeared
to be a possible candidate for making use of asynchronous requests
within its I/O loops, but analysis revealed that these loops almost
never iterate more than once, so the complexity of the change would not
be warrented.
Instead, mmc_pre_req() and mmc_post_req() are added before and after I/O
submission (mmc_wait_for_req) in mmc_io_rw_extended(). This still has
the potential benefit of reducing the duration of interrupt handlers, as
well as addressing the latency issue for SDHCI. It also seems a more
reasonable solution than forcing drivers to do everything in the IRQ
handler.
Chris Packham [Thu, 3 Sep 2020 01:20:29 +0000 (13:20 +1200)]
mmc: sdhci-of-esdhc: Don't walk device-tree on every interrupt
Commit b214fe592ab7 ("mmc: sdhci-of-esdhc: add erratum eSDHC7 support")
added code to check for a specific compatible string in the device-tree
on every esdhc interrupat. Instead of doing this record the quirk in
struct sdhci_esdhc and lookup the struct in esdhc_irq.
mmc: mmc_spi: Allow the driver to be built when CONFIG_HAS_DMA is unset
The commit cd57d07b1e4e ("sh: don't allow non-coherent DMA for NOMMU") made
CONFIG_NO_DMA to be set for some platforms, for good reasons.
Consequentially, CONFIG_HAS_DMA doesn't get set, which makes the DMA
mapping interface to be built as stub functions, but also prevent the
mmc_spi driver from being built as it depends on CONFIG_HAS_DMA.
It turns out that for some odd cases, the driver still relied on the DMA
mapping interface, even if the DMA was not actively being used.
To fixup the behaviour, let's drop the build dependency for CONFIG_HAS_DMA.
Moreover, as to allow the driver to succeed probing, let's move the DMA
initializations behind "#ifdef CONFIG_HAS_DMA".
Douglas Anderson [Thu, 27 Aug 2020 14:58:41 +0000 (07:58 -0700)]
mmc: sdhci-msm: Add retries when all tuning phases are found valid
As the comments in this patch say, if we tune and find all phases are
valid it's _almost_ as bad as no phases being found valid. Probably
all phases are not really reliable but we didn't detect where the
unreliable place is. That means we'll essentially be guessing and
hoping we get a good phase.
This is not just a problem in theory. It was causing real problems on
a real board. On that board, most often phase 10 is found as the only
invalid phase, though sometimes 10 and 11 are invalid and sometimes
just 11. Some percentage of the time, however, all phases are found
to be valid. When this happens, the current logic will decide to use
phase 11. Since phase 11 is sometimes found to be invalid, this is a
bad choice. Sure enough, when phase 11 is picked we often get mmc
errors later in boot.
I have seen cases where all phases were found to be valid 3 times in a
row, so increase the retry count to 10 just to be extra sure.
Fixes: 415b5a75da43 ("mmc: sdhci-msm: Add platform_execute_tuning implementation") Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Veerabhadrarao Badiganti <vbadigan@codeaurora.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20200827075809.1.If179abf5ecb67c963494db79c3bc4247d987419b@changeid Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Raul E Rangel [Mon, 31 Aug 2020 21:10:32 +0000 (15:10 -0600)]
mmc: sdhci-acpi: Clear amd_sdhci_host on reset
The commit 61d7437ed1390 ("mmc: sdhci-acpi: Fix HS400 tuning for AMDI0040")
broke resume for eMMC HS400. When the system suspends the eMMC controller
is powered down. So, on resume we need to reinitialize the controller.
Although, amd_sdhci_host was not getting cleared, so the DLL was never
re-enabled on resume. This results in HS400 being non-functional.
To fix the problem, this change clears the tuned_clock flag, clears the
dll_enabled flag and disables the DLL on reset.
Merge tag 'io_uring-5.9-2020-09-06' of git://git.kernel.dk/linux-block
Pull more io_uring fixes from Jens Axboe:
"Two followup fixes. One is fixing a regression from this merge window,
the other is two commits fixing cancelation of deferred requests.
Both have gone through full testing, and both spawned a few new
regression test additions to liburing.
- Don't play games with const, properly store the output iovec and
assign it as needed.
- Deferred request cancelation fix (Pavel)"
* tag 'io_uring-5.9-2020-09-06' of git://git.kernel.dk/linux-block:
io_uring: fix linked deferred ->files cancellation
io_uring: fix cancel of deferred reqs with ->files
io_uring: fix explicit async read/write mapping for large segments
Merge tag 'iommu-fixes-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- three Intel VT-d fixes to fix address handling on 32bit, fix a NULL
pointer dereference bug and serialize a hardware register access as
required by the VT-d spec.
- two patches for AMD IOMMU to force AMD GPUs into translation mode
when memory encryption is active and disallow using IOMMUv2
functionality. This makes the AMDGPU driver work when memory
encryption is active.
- two more fixes for AMD IOMMU to fix updating the Interrupt Remapping
Table Entries.
- MAINTAINERS file update for the Qualcom IOMMU driver.
* tag 'iommu-fixes-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Handle 36bit addressing for x86-32
iommu/amd: Do not use IOMMUv2 functionality when SME is active
iommu/amd: Do not force direct mapping when SME is active
iommu/amd: Use cmpxchg_double() when updating 128-bit IRTE
iommu/amd: Restore IRTE.RemapEn bit after programming IRTE
iommu/vt-d: Fix NULL pointer dereference in dev_iommu_priv_set()
iommu/vt-d: Serialize IOMMU GCMD register modifications
MAINTAINERS: Update QUALCOMM IOMMU after Arm SMMU drivers move
Merge tag 'x86-urgent-2020-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- more generic entry code ABI fallout
- debug register handling bugfixes
- fix vmalloc mappings on 32-bit kernels
- kprobes instrumentation output fix on 32-bit kernels
- fix over-eager WARN_ON_ONCE() on !SMAP hardware
- NUMA debugging fix
- fix Clang related crash on !RETPOLINE kernels
* tag 'x86-urgent-2020-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry: Unbreak 32bit fast syscall
x86/debug: Allow a single level of #DB recursion
x86/entry: Fix AC assertion
tracing/kprobes, x86/ptrace: Fix regs argument order for i386
x86, fakenuma: Fix invalid starting node ID
x86/mm/32: Bring back vmalloc faulting on x86_32
x86/cmdline: Disable jump tables for cmdline.c
Merge tags 'auxdisplay-for-linus-v5.9-rc4', 'clang-format-for-linus-v5.9-rc4' and 'compiler-attributes-for-linus-v5.9-rc4' of git://github.com/ojeda/linux
Pull misc fixes from Miguel Ojeda:
"A trivial patch for auxdisplay:
- Replace HTTP links with HTTPS ones (Alexander A. Klimov)
The usual clang-format trivial update:
- Update with the latest for_each macro list (Miguel Ojeda)
And Luc requested me to pick a sparse fix on my queue, so here it goes
along with other two trivial Compiler Attributes ones (also from Luc).
- sparse: use static inline for __chk_{user,io}_ptr() (Luc Van
Oostenryck)
- Compiler Attributes: remove comment about sparse not supporting
__has_attribute (Luc Van Oostenryck)"
* tag 'auxdisplay-for-linus-v5.9-rc4' of git://github.com/ojeda/linux:
auxdisplay: Replace HTTP links with HTTPS ones
* tag 'clang-format-for-linus-v5.9-rc4' of git://github.com/ojeda/linux:
clang-format: Update with the latest for_each macro list
* tag 'compiler-attributes-for-linus-v5.9-rc4' of git://github.com/ojeda/linux:
sparse: use static inline for __chk_{user,io}_ptr()
Compiler Attributes: fix comment concerning GCC 4.6
Compiler Attributes: remove comment about sparse not supporting __has_attribute
Merge tag 'arc-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- HSDK-4xd Dev system: perf driver updates for sampling interrupt
- HSDK* Dev System: Ethernet broken [Evgeniy Didin]
- HIGHMEM broken (2 memory banks) [Mike Rapoport]
- show_regs() rewrite once and for all
- Other minor fixes
* tag 'arc-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: [plat-hsdk]: Switch ethernet phy-mode to rgmii-id
arc: fix memory initialization for systems with two memory banks
irqchip/eznps: Fix build error for !ARC700 builds
ARC: show_regs: fix r12 printing and simplify
ARC: HSDK: wireup perf irq
ARC: perf: don't bail setup if pct irq missing in device-tree
ARC: pgalloc.h: delete a duplicated word + other fixes
Subsystems affected by this patch series: MAINTAINERS, ipc, fork,
checkpatch, lib, and mm (memcg, slub, pagemap, madvise, migration,
hugetlb)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
include/linux/log2.h: add missing () around n in roundup_pow_of_two()
mm/khugepaged.c: fix khugepaged's request size in collapse_file
mm/hugetlb: fix a race between hugetlb sysctl handlers
mm/hugetlb: try preferred node first when alloc gigantic page from cma
mm/migrate: preserve soft dirty in remove_migration_pte()
mm/migrate: remove unnecessary is_zone_device_page() check
mm/rmap: fixup copying of soft dirty and uffd ptes
mm/migrate: fixup setting UFFD_WP flag
mm: madvise: fix vma user-after-free
checkpatch: fix the usage of capture group ( ... )
fork: adjust sysctl_max_threads definition to match prototype
ipc: adjust proc_ipc_sem_dointvec definition to match prototype
mm: track page table modifications in __apply_to_page_range()
MAINTAINERS: IA64: mark Status as Odd Fixes only
MAINTAINERS: add LLVM maintainers
MAINTAINERS: update Cavium/Marvell entries
mm: slub: fix conversion of freelist_corrupted()
mm: memcg: fix memcg reclaim soft lockup
memcg: fix use-after-free in uncharge_batch
Jason Gunthorpe [Fri, 4 Sep 2020 23:36:19 +0000 (16:36 -0700)]
include/linux/log2.h: add missing () around n in roundup_pow_of_two()
Otherwise gcc generates warnings if the expression is complicated.
Fixes: 312a0c170945 ("[PATCH] LOG2: Alter roundup_pow_of_two() so that it can use a ilog2() on a constant") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/0-v1-8a2697e3c003+41165-log_brackets_jgg@nvidia.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Fri, 4 Sep 2020 23:36:16 +0000 (16:36 -0700)]
mm/khugepaged.c: fix khugepaged's request size in collapse_file
collapse_file() in khugepaged passes PAGE_SIZE as the number of pages to
be read to page_cache_sync_readahead(). The intent was probably to read
a single page. Fix it to use the number of pages to the end of the
window instead.
Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Yang Shi <shy828301@gmail.com> Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Eric Biggers <ebiggers@google.com> Link: https://lkml.kernel.org/r/20200903140844.14194-2-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Muchun Song [Fri, 4 Sep 2020 23:36:13 +0000 (16:36 -0700)]
mm/hugetlb: fix a race between hugetlb sysctl handlers
There is a race between the assignment of `table->data` and write value
to the pointer of `table->data` in the __do_proc_doulongvec_minmax() on
the other thread.
Fix this by duplicating the `table`, and only update the duplicate of
it. And introduce a helper of proc_hugetlb_doulongvec_minmax() to
simplify the code.
Li Xinhai [Fri, 4 Sep 2020 23:36:10 +0000 (16:36 -0700)]
mm/hugetlb: try preferred node first when alloc gigantic page from cma
Since commit cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic
hugepages using cma"), the gigantic page would be allocated from node
which is not the preferred node, although there are pages available from
that node. The reason is that the nid parameter has been ignored in
alloc_gigantic_page().
Besides, the __GFP_THISNODE also need be checked if user required to
alloc only from the preferred node.
After this patch, the preferred node is tried first before other allowed
nodes, and don't try to allocate from other nodes if __GFP_THISNODE is
specified. If user don't specify the preferred node, the current node
will be used as preferred node, which makes sure consistent behavior of
allocating gigantic and non-gigantic hugetlb page.
Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma") Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <guro@fb.com> Link: https://lkml.kernel.org/r/20200902025016.697260-1-lixinhai.lxh@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ralph Campbell [Fri, 4 Sep 2020 23:36:07 +0000 (16:36 -0700)]
mm/migrate: preserve soft dirty in remove_migration_pte()
The code to remove a migration PTE and replace it with a device private
PTE was not copying the soft dirty bit from the migration entry. This
could lead to page contents not being marked dirty when faulting the page
back from device private memory.
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Bharata B Rao <bharata@linux.ibm.com> Link: https://lkml.kernel.org/r/20200831212222.22409-3-rcampbell@nvidia.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm/migrate: preserve soft dirty in remove_migration_pte()".
I happened to notice this from code inspection after seeing Alistair
Popple's patch ("mm/rmap: Fixup copying of soft dirty and uffd ptes").
This patch (of 2):
The check for is_zone_device_page() and is_device_private_page() is
unnecessary since the latter is sufficient to determine if the page is a
device private page. Simplify the code for easier reading.
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Bharata B Rao <bharata@linux.ibm.com> Link: https://lkml.kernel.org/r/20200831212222.22409-1-rcampbell@nvidia.com Link: https://lkml.kernel.org/r/20200831212222.22409-2-rcampbell@nvidia.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm/rmap: fixup copying of soft dirty and uffd ptes
During memory migration a pte is temporarily replaced with a migration
swap pte. Some pte bits from the existing mapping such as the soft-dirty
and uffd write-protect bits are preserved by copying these to the
temporary migration swap pte.
However these bits are not stored at the same location for swap and
non-swap ptes. Therefore testing these bits requires using the
appropriate helper function for the given pte type.
Unfortunately several code locations were found where the wrong helper
function is being used to test soft_dirty and uffd_wp bits which leads to
them getting incorrectly set or cleared during page-migration.
Fix these by using the correct tests based on pte type.
Fixes: a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in migration") Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while collecting pages") Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration") Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Alistair Popple <alistair@popple.id.au> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200825064232.10023-2-alistair@popple.id.au Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit f45ec5ff16a75 ("userfaultfd: wp: support swap and page migration")
introduced support for tracking the uffd wp bit during page migration.
However the non-swap PTE variant was used to set the flag for zone device
private pages which are a type of swap page.
This leads to corruption of the swap offset if the original PTE has the
uffd_wp flag set.
Fixes: f45ec5ff16a75 ("userfaultfd: wp: support swap and page migration") Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Link: https://lkml.kernel.org/r/20200825064232.10023-1-alistair@popple.id.au Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Fri, 4 Sep 2020 23:35:55 +0000 (16:35 -0700)]
mm: madvise: fix vma user-after-free
The syzbot reported the below use-after-free:
BUG: KASAN: use-after-free in madvise_willneed mm/madvise.c:293 [inline]
BUG: KASAN: use-after-free in madvise_vma mm/madvise.c:942 [inline]
BUG: KASAN: use-after-free in do_madvise.part.0+0x1c8b/0x1cf0 mm/madvise.c:1145
Read of size 8 at addr ffff8880a6163eb0 by task syz-executor.0/9996
It is because vma is accessed after releasing mmap_lock, but someone
else acquired the mmap_lock and the vma is gone.
Releasing mmap_lock after accessing vma should fix the problem.
Fixes: 692fe62433d4c ("mm: Handle MADV_WILLNEED through vfs_fadvise()") Reported-by: syzbot+b90df26038d1d5d85c97@syzkaller.appspotmail.com Signed-off-by: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: <stable@vger.kernel.org> [5.4+] Link: https://lkml.kernel.org/r/20200816141204.162624-1-shy828301@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
checkpatch: fix the usage of capture group ( ... )
The usage of "capture group (...)" in the immediate condition after `&&`
results in `$1` being uninitialized. This issues a warning "Use of
uninitialized value $1 in regexp compilation at ./scripts/checkpatch.pl
line 2638".
I noticed this bug while running checkpatch on the set of commits from
v5.7 to v5.8-rc1 of the kernel on the commits with a diff content in
their commit message.
This bug was introduced in the script by commit e518e9a59ec3
("checkpatch: emit an error when there's a diff in a changelog"). It
has been in the script since then.
The author intended to store the match made by capture group in variable
`$1`. This should have contained the name of the file as `[\w/]+`
matched. However, this couldn't be accomplished due to usage of capture
group and `$1` in the same regular expression.
Fix this by placing the capture group in the condition before `&&`.
Thus, `$1` can be initialized to the text that capture group matches
thereby setting it to the desired and required value.
Fixes: e518e9a59ec3 ("checkpatch: emit an error when there's a diff in a changelog") Signed-off-by: Mrinal Pandey <mrinalmni@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Joe Perches <joe@perches.com> Link: https://lkml.kernel.org/r/20200714032352.f476hanaj2dlmiot@mrinalpandey Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fork: adjust sysctl_max_threads definition to match prototype
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer. Adjust the
definition of sysctl_max_threads to match its prototype in
linux/sysctl.h which fixes the following sparse error/warning:
kernel/fork.c:3050:47: warning: incorrect type in argument 3 (different address spaces)
kernel/fork.c:3050:47: expected void *
kernel/fork.c:3050:47: got void [noderef] __user *buffer
kernel/fork.c:3036:5: error: symbol 'sysctl_max_threads' redeclared with different type (incompatible argument 3 (different address spaces)):
kernel/fork.c:3036:5: int extern [addressable] [signed] [toplevel] sysctl_max_threads( ... )
kernel/fork.c: note: in included file (through include/linux/key.h, include/linux/cred.h, include/linux/sched/signal.h, include/linux/sched/cputime.h):
include/linux/sysctl.h:242:5: note: previously declared as:
include/linux/sysctl.h:242:5: int extern [addressable] [signed] [toplevel] sysctl_max_threads( ... )
Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: https://lkml.kernel.org/r/20200825093647.24263-1-tklauser@distanz.ch Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ipc: adjust proc_ipc_sem_dointvec definition to match prototype
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer. Adjust the
signature of proc_ipc_sem_dointvec to match ctl_table.proc_handler which
fixes the following sparse error/warning:
ipc/ipc_sysctl.c:94:47: warning: incorrect type in argument 3 (different address spaces)
ipc/ipc_sysctl.c:94:47: expected void *buffer
ipc/ipc_sysctl.c:94:47: got void [noderef] __user *buffer
ipc/ipc_sysctl.c:194:35: warning: incorrect type in initializer (incompatible argument 3 (different address spaces))
ipc/ipc_sysctl.c:194:35: expected int ( [usertype] *proc_handler )( ... )
ipc/ipc_sysctl.c:194:35: got int ( * )( ... )
Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Link: https://lkml.kernel.org/r/20200825105846.5193-1-tklauser@distanz.ch Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm: track page table modifications in __apply_to_page_range()
__apply_to_page_range() is also used to change and/or allocate
page-table pages in the vmalloc area of the address space. Make sure
these changes get synchronized to other page-tables in the system by
calling arch_sync_kernel_mappings() when necessary.
The impact appears limited to x86-32, where apply_to_page_range may miss
updating the PMD. That leads to explosions in drivers like
Nominate Nathan and myself to be point of contact for clang/LLVM related
support, after a poll at the LLVM BoF at Linux Plumbers Conf 2020.
While corporate sponsorship is beneficial, its important to not entrust
the keys to the nukes with any one entity. Should Nathan and I find
ourselves at the same employer, I would gladly step down.
Robert Richter [Fri, 4 Sep 2020 23:35:33 +0000 (16:35 -0700)]
MAINTAINERS: update Cavium/Marvell entries
I am leaving Marvell and already do not have access to my @marvell.com
email address. So switching over to my korg mail address or removing my
address there another maintainer is already listed. For the entries
there no other maintainer is listed I will keep looking into patches for
Cavium systems for a while until someone from Marvell takes it over.
Since I might have limited access to hardware and also limited time I
changed state to 'Odd Fixes' for those entries.
Signed-off-by: Robert Richter <rric@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Ganapatrao Kulkarni <gkulkarni@marvell.com> Cc: Sunil Goutham <sgoutham@marvell.com> CC: Borislav Petkov <bp@alien8.de> Cc: Marc Zyngier <maz@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Wolfram Sang <wsa@kernel.org>, Link: https://lkml.kernel.org/r/20200824122050.31164-1-rric@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 52f23478081ae0 ("mm/slub.c: fix corrupted freechain in
deactivate_slab()") suffered an update when picked up from LKML [1].
Specifically, relocating 'freelist = NULL' into 'freelist_corrupted()'
created a no-op statement. Fix it by sticking to the behavior intended
in the original patch [1]. In addition, make freelist_corrupted()
immune to passing NULL instead of &freelist.
The issue has been spotted via static analysis and code review.
It only happens on our 1-vcpu instances, because there's no chance for
oom reaper to run to reclaim the to-be-killed process.
Add a cond_resched() at the upper shrink_node_memcgs() to solve this
issue, this will mean that we will get a scheduling point for each memcg
in the reclaimed hierarchy without any dependency on the reclaimable
memory in that memcg thus making it more predictable.
Suggested-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Chris Down <chris@chrisdown.name> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: http://lkml.kernel.org/r/1598495549-67324-1-git-send-email-xlpang@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 1a3e1f40962c ("mm: memcontrol: decouple reference counting from
page accounting") reworked the memcg lifetime to be bound the the struct
page rather than charges. It also removed the css_put_many from
uncharge_batch and that is causing the above splat.
uncharge_batch() is supposed to uncharge accumulated charges for all
pages freed from the same memcg. The queuing is done by uncharge_page
which however drops the memcg reference after it adds charges to the
batch. If the current page happens to be the last one holding the
reference for its memcg then the memcg is OK to go and the next page to
be freed will trigger batched uncharge which needs to access the memcg
which is gone already.
Fix the issue by taking a reference for the memcg in the current batch.
Fixes: 1a3e1f40962c ("mm: memcontrol: decouple reference counting from page accounting") Reported-by: syzbot+b305848212deec86eabe@syzkaller.appspotmail.com Reported-by: syzbot+b5ea6fb6f139c8b9482b@syzkaller.appspotmail.com Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Roman Gushchin <guro@fb.com> Cc: Hugh Dickins <hughd@google.com> Link: https://lkml.kernel.org/r/20200820090341.GC5033@dhcp22.suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'xfs-5.9-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fix from Darrick Wong:
"Fix a broken metadata verifier that would incorrectly validate attr
fork extents of a realtime file against the realtime volume"
* tag 'xfs-5.9-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix xfs_bmap_validate_extent_raw when checking attr fork of rt files
When running in a dax mode, if the user maps a page with MAP_PRIVATE and
PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime
when the user hits a COW fault.
This breaks building of the Linux kernel. How to reproduce:
1. extract the Linux kernel tree on dax-mounted xfs filesystem
2. run make clean
3. run make -j12
4. run make -j12
at step 4, make would incorrectly rebuild the whole kernel (although it
was already built in step 3).
The reason for the breakage is that almost all object files depend on
objtool. When we run objtool, it takes COW page fault on its .data
section, and these faults will incorrectly update the timestamp of the
objtool binary. The updated timestamp causes make to rebuild the whole
tree.
When running in a dax mode, if the user maps a page with MAP_PRIVATE and
PROT_WRITE, the ext2 filesystem would incorrectly update ctime and mtime
when the user hits a COW fault.
This breaks building of the Linux kernel. How to reproduce:
1. extract the Linux kernel tree on dax-mounted ext2 filesystem
2. run make clean
3. run make -j12
4. run make -j12
at step 4, make would incorrectly rebuild the whole kernel (although it
was already built in step 3).
The reason for the breakage is that almost all object files depend on
objtool. When we run objtool, it takes COW page fault on its .data
section, and these faults will incorrectly update the timestamp of the
objtool binary. The updated timestamp causes make to rebuild the whole
tree.
io_uring: fix explicit async read/write mapping for large segments
If we exceed UIO_FASTIOV, we don't handle the transition correctly
between an allocated vec for requests that are queued with IOSQE_ASYNC.
Store the iovec appropriately and re-set it in the iter iov in case
it changed.
Fixes: ff6165b2d7f6 ("io_uring: retain iov_iter state over io_read/io_write calls") Reported-by: Nick Hill <nick@nickhill.org> Tested-by: Norman Maurer <norman.maurer@googlemail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>