Mike Snitzer [Tue, 26 Jul 2022 15:29:50 +0000 (11:29 -0400)]
dm verity: allow optional args to alter primary args handling
The previous commit ("dm verity: Add optional "try_verify_in_tasklet"
feature") imposed that CRYPTO_ALG_ASYNC mask be used even if the
optional "try_verify_in_tasklet" feature was not specified. This was
because verity_parse_opt_args() was called after handling the primary
args (due to it having data dependencies on having first parsed all
primary args).
Enhance verity_ctr() so that simple optional args, that don't have a
data dependency on primary args parsing, can alter how the primary
args are handled. In practice this means verity_parse_opt_args() gets
called twice. First with the new 'only_modifier_opts' arg set to true,
then again with it set to false _after_ parsing all primary args.
This allows the v->use_tasklet flag to be properly set and then used
when verity_ctr() parses the primary args and then calls
crypto_alloc_ahash() with CRYPTO_ALG_ASYNC conditionally set.
Using tasklets for disk verification can reduce IO latency. When there
are accelerated hash instructions it is often better to compute the
hash immediately using a tasklet rather than deferring verification to
a work-queue. This reduces time spent waiting to schedule work-queue
jobs, but requires spending slightly more time in interrupt context.
If the dm-bufio cache does not have the required hashes we fallback to
the work-queue implementation. FEC is only possible using work-queue
because code to support the FEC feature may sleep.
The following shows a speed comparison of random reads on a dm-verity
device. The dm-verity device uses a 1G ramdisk for data and a 1G
ramdisk for hashes. One test was run using tasklets and one test was
run using the existing work-queue solution. Both tests were run when
the dm-bufio cache was hot. The tasklet implementation performs
significantly better since there is no time spent waiting for
work-queue jobs to be scheduled.
Bjorn Helgaas [Thu, 4 Aug 2022 16:46:53 +0000 (11:46 -0500)]
Merge branch 'pci/header-cleanup-immutable'
- Remove pci_get_legacy_ide_irq(); use ATA_PRIMARY_IRQ() and
ATA_SECONDARY_IRQ() instead (Stafford Horne)
- Remove isa_dma_bridge_buggy, except for x86_32, the only place it's used
(Stafford Horne)
- Define ARCH_GENERIC_PCI_MMAP_RESOURCE for csky (Stafford Horne)
- Move common PCI definitions that arches sometimes override to
asm-generic/pci.h (Stafford Horne)
- Include <linux/isa-dma.h> for 'isa_dma_bridge_buggy' when needed
(bisection hole here) (Randy Dunlap)
* pci/header-cleanup-immutable:
PCI: Stub __pci_ioport_map() for arches that don't support it at all
x86/cyrix: include header linux/isa-dma.h
asm-generic: Add new pci.h and use it
csky: PCI: Define ARCH_GENERIC_PCI_MMAP_RESOURCE
PCI: Move isa_dma_bridge_buggy out of asm/dma.h
PCI: Remove pci_get_legacy_ide_irq() and asm-generic/pci.h
Bjorn Helgaas [Thu, 4 Aug 2022 16:41:57 +0000 (11:41 -0500)]
Merge branch 'pci/ctrl/mediatek-gen3'
- Fix refcount leak in mtk_pcie_init_irq_domains() (Miaoqian Lin)
- Print decoded LTSSM state when the link doesn't come up (Jianjun Wang)
* pci/ctrl/mediatek-gen3:
PCI: mediatek-gen3: Print LTSSM state when PCIe link down
PCI: mediatek-gen3: Fix refcount leak in mtk_pcie_init_irq_domains()
Bjorn Helgaas [Thu, 4 Aug 2022 16:41:55 +0000 (11:41 -0500)]
Merge branch 'pci/ctrl/imx6'
- Factor out ref clock disables to match enables (Bjorn Helgaas)
- Collect clock enables in imx6_pcie_clk_enable() (Richard Zhu)
- Propagate regulator and clock errors back to .host_init() caller (Richard
Zhu)
- Disable i.MX6QDL clock when disabling ref clocks (Richard Zhu)
- Call host init function directly in resume instead of duplicating the
code (Richard Zhu)
- Turn off regulators when suspending (Richard Zhu)
- Make link being down a non-fatal error so probe doesn't fail (Richard
Zhu)
- Start link in resume only if it was up before suspend to reduce resume
time (Richard Zhu)
- Move PHY init and power-on out of clock- and reset-related functions
(Richard Zhu)
- Rework suspend callback to be more symmetric with resume (Richard Zhu)
- Set PCIE_DBI_RO_WR_EN before writing DBI registers (Richard Zhu)
- Allow speeds faster than Gen2 (Richard Zhu)
* pci/ctrl/imx6:
PCI: imx6: Support more than Gen2 speed link mode
PCI: imx6: Set PCIE_DBI_RO_WR_EN before writing DBI registers
PCI: imx6: Reformat suspend callback to keep symmetric with resume
PCI: imx6: Move the imx6_pcie_ltssm_disable() earlier
PCI: imx6: Disable clocks in reverse order of enable
PCI: imx6: Do not hide PHY driver callbacks and refine the error handling
PCI: imx6: Reduce resume time by only starting link if it was up before suspend
PCI: imx6: Mark the link down as non-fatal error
PCI: imx6: Move regulator enable out of imx6_pcie_deassert_core_reset()
PCI: imx6: Turn off regulator when system is in suspend mode
PCI: imx6: Call host init function directly in resume
PCI: imx6: Disable i.MX6QDL clock when disabling ref clocks
PCI: imx6: Propagate .host_init() errors to caller
PCI: imx6: Collect clock enables in imx6_pcie_clk_enable()
PCI: imx6: Factor out ref clock disable to match enable
PCI: imx6: Move imx6_pcie_clk_disable() earlier
PCI: imx6: Move imx6_pcie_enable_ref_clk() earlier
PCI: imx6: Move PHY management functions together
PCI: imx6: Move imx6_pcie_grp_offset(), imx6_pcie_configure_type() earlier
PCI: imx6: Convert to NOIRQ_SYSTEM_SLEEP_PM_OPS()
- Move eDMA private data from struct dw_edma to struct dw_edma_chip (Frank
Li)
- Convert "struct dw_edma_region rg_region" to "void __iomem *reg_base"
since only the virtual address (not physical address or size) is used
(Frank Li)
- Rename "*_ch_cnt" to "ll_*_cnt" to reflect actual usage (Frank Li)
- Drop dma_slave_config.direction field usage (Serge Semin)
- Fix eDMA Rd/Wr-channels and DMA-direction semantics (Serge Semin)
- Add chip-specific DW_EDMA_CHIP_LOCAL flag to indicate that local eDMA
doesn't require generating MSIs to remote (Frank Li)
- Enable DMA tests for endpoints that support it (Frank Li)
* pci/ctrl/dwc-edma:
PCI: endpoint: Enable DMA tests for endpoints with DMA capabilities
dmaengine: dw-edma: Add support for chip-specific flags
dmaengine: dw-edma: Fix eDMA Rd/Wr-channels and DMA-direction semantics
dmaengine: dw-edma: Drop dma_slave_config.direction field usage
dmaengine: dw-edma: Rename wr(rd)_ch_cnt to ll_wr(rd)_cnt in struct dw_edma_chip
dmaengine: dw-edma: Change rg_region to reg_base in struct dw_edma_chip
dmaengine: dw-edma: Detach the private data and chip info structures
dmaengine: dw-edma: Remove unused irq field in struct dw_edma_chip
- Detect iATU region size from hardware (Serge Semin)
- Validate iATU outbound mappings against hardware constraints (Serge
Semin)
- Check for errors in iATU setup (Serge Semin)
- Allocate a 32-bit DMA-able page to be MSI target instead of using a
driver data structure that may not be addressable with 32-bit address
(Will McVicker)
- Use the bitmap API to allocate bitmaps instead of open-coding it
(Christophe JAILLET)
- Correct dw_pcie_free_msi() checking for when to remove IRQ handler and
data (Dmitry Baryshkov)
- Split MSI init to new dw_pcie_msi_host_init() function (Dmitry Baryshkov)
- Convert struct pcie_port.msi_irq to an array so we can support more than
32 MSI interrupts (Dmitry Baryshkov)
- Handle MSIs routed to multiple GIC interrupts for Qualcomm platforms with
groups of 32 MSI vectors (Dmitry Baryshkov)
- Add additional MSI interrupts to qcom DT (Dmitry Baryshkov)
* pci/ctrl/dwc:
dt-bindings: PCI: qcom: Support additional MSI vectors
PCI: dwc: Handle MSIs routed to multiple GIC interrupts
PCI: dwc: Convert struct pcie_port.msi_irq to an array
PCI: dwc: Split MSI IRQ parsing/allocation to a separate function
PCI: dwc: Correct msi_irq condition in dw_pcie_free_msi()
PCI: dwc: Use the bitmap API to allocate bitmaps
PCI: dwc: Fix MSI msi_msg DMA mapping
PCI: dwc: Check iATU in/outbound range setup status
PCI: dwc: Validate iATU outbound mappings against hardware constraints
PCI: dwc: Add iATU regions size detection procedure
PCI: dwc: Simplify in/outbound iATU setup methods
PCI: dwc: Drop enum dw_pcie_region_type in favor of PCIE_ATU_REGION_DIR_IB/OB
PCI: dwc: Drop enum dw_pcie_as_type in favor of PCIE_ATU_TYPE_MEM/IO
PCI: dwc: Add dw_pcie_ops.host_deinit() callback
PCI: tegra194: Drop manual DW PCIe controller version setup
PCI: intel-gw: Drop manual DW PCIe controller version setup
PCI: dwc: Add macros to compare Synopsys IP core versions
PCI: dwc: Read DWC IP core version from register
PCI: dwc: Use native DWC IP core version representation
PCI: dwc: Detect iATU settings after getting "addr_space" resource
PCI: dwc: Log link speed and width if it comes up
PCI: dwc-plat: Drop dw_plat_pcie_of_match[] forward declaration
PCI: dwc-plat: Drop unused regmap pointer
PCI: dwc-plat: Simplify dw_plat_pcie_probe() return values
PCI: dwc: Rename struct pcie_port to dw_pcie_rp
PCI: dwc: Move io_cfg_atu_shared to struct pcie_port
PCI: dwc: Add start_link/stop_link inlines
PCI: dwc: Reuse local pointer to the resource data
PCI: dwc: Organize local variable usage
PCI: dwc: Convert dw_pcie_link_up() to use dw_pcie_readl_dbi()
PCI: dwc: Simplify unrolled iATU detection
PCI: dwc: Add newlines to log messages
PCI: dwc: Add braces to multi-line if-else statements
PCI: dwc: Always enable CDM check if "snps,enable-cdm-check" exists
PCI: dwc: Deallocate EPC memory on dw_pcie_ep_init() errors
PCI: dwc: Set INCREASE_REGION_SIZE flag based on limit address
PCI: dwc: Disable outbound windows only for controllers using iATU
PCI: dwc: Add unroll iATU space support to dw_pcie_disable_atu()
PCI: dwc: Stop link on host_init errors and de-initialization
- Prevent config space access when link is down (Jim Quinlan)
- Split post-link up initialization to brcm_pcie_start_link() (Jim Quinlan)
- Enable child bus device regulators described under Root Ports in DT (Jim
Quinlan)
- Disable/enable regulators in suspend/resume (Jim Quinlan)
- Rename .map_bus() functions to end with 'map_bus' as they do in other
drivers (Jim Quinlan)
* pci/ctrl/brcmstb:
PCI: brcmstb: Rename .map_bus() functions to end with 'map_bus'
PCI: brcmstb: Disable/enable regulators in suspend/resume
PCI: brcmstb: Enable child bus device regulators from DT
PCI: brcmstb: Split post-link up initialization to brcm_pcie_start_link()
PCI: brcmstb: Prevent config space access when link is down
PCI: brcmstb: Remove unnecessary forward declarations
Bjorn Helgaas [Thu, 4 Aug 2022 16:41:52 +0000 (11:41 -0500)]
Merge branch 'pci/resource'
- Replace sparc pci_mmap_page_range() wrapper. This still leaves a
sparc-specific pci_mmap_resource_range(), but it's only one interface
instead of two (Arnd Bergmann)
- Remove sparc-specific pci_mmap_resource_range() by implementing
pci_iobar_pfn(). This removes the ability to map the entire PCI I/O
space using /proc/bus/pci, but we believe that's already been broken
since v2.6.28 (Arnd Bergmann)
* pci/resource:
sparc: Use generic pci_mmap_resource_range()
PCI: Remove pci_mmap_page_range() wrapper
Bjorn Helgaas [Thu, 4 Aug 2022 16:41:52 +0000 (11:41 -0500)]
Merge branch 'pci/err'
- Recognize disconnected devices so we don't bother trying to set them to
"frozen" or "normal" state (Christoph Hellwig)
- Clear PCI Status register during enumeration in case firmware left errors
logged (Kai-Heng Feng)
- Configure ECRC for every device, including hot-added ones (Stefan Roese)
- Keep AER error reporting enabled for switches (Stefan Roese)
- Enable error reporting for all devices that support AER (Stefan Roese)
- Iterate over error counters instead of error strings to avoid printing
junk in AER sysfs counters (Mohamed Khalfella)
* pci/err:
PCI/AER: Iterate over error counters instead of error strings
PCI/AER: Enable error reporting when AER is native
PCI/portdrv: Don't disable AER reporting in get_port_device_capability()
PCI/AER: Configure ECRC for every device
PCI: Clear PCI_STATUS when setting up device
PCI/ERR: Recognize disconnected devices in report_error_detected()
Bjorn Helgaas [Thu, 4 Aug 2022 16:41:51 +0000 (11:41 -0500)]
Merge branch 'pci/enumeration'
- Split out ARI "next function" handling from the traditional one (Niklas
Schnelle)
- Move jailhouse "isolated function" (non-zero functions where function 0
doesn't exist) handling to pci_scan_slot() to avoid duplicating
multi-function scanning in pci_scan_child_bus_extend() (Niklas Schnelle)
- Extend "isolated function" probing to s390 (Niklas Schnelle).
- Allow s390 zPCI zbus without a function 0 (Niklas Schnelle)
* pci/enumeration:
s390/pci: allow zPCI zbus without a function zero
PCI: Extend isolated function probing to s390
PCI: Move jailhouse's isolated function handling to pci_scan_slot()
PCI: Split out next_ari_fn() from next_fn()
PCI: Clean up pci_scan_slot()
Namjae Jeon [Mon, 1 Aug 2022 22:28:51 +0000 (07:28 +0900)]
ksmbd: fix heap-based overflow in set_ntacl_dacl()
The testcase use SMB2_SET_INFO_HE command to set a malformed file attribute
under the label `security.NTACL`. SMB2_QUERY_INFO_HE command in testcase
trigger the following overflow.
[ 4712.003781] ==================================================================
[ 4712.003790] BUG: KASAN: slab-out-of-bounds in build_sec_desc+0x842/0x1dd0 [ksmbd]
[ 4712.003807] Write of size 1060 at addr ffff88801e34c068 by task kworker/0:0/4190
This patch add the buffer validation for security descriptor that is
stored by malformed SMB2_SET_INFO_HE command. and allocate large
response buffer about SMB2_O_INFO_SECURITY file info class.
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17771 Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Pavel Begunkov [Thu, 4 Aug 2022 14:15:30 +0000 (15:15 +0100)]
io_uring/net: send retry for zerocopy
io_uring handles short sends/recvs for stream sockets when MSG_WAITALL
is set, however new zerocopy send is inconsistent in this regard, which
might be confusing. Handle short sends.
Pavel Begunkov [Thu, 4 Aug 2022 14:13:46 +0000 (15:13 +0100)]
io_uring: mem-account pbuf buckets
Potentially, someone may create as many pbuf bucket as there are indexes
in an xarray without any other restrictions bounding our memory usage,
put memory needed for the buckets under memory accounting.
3. io_sq_thread() or io_wqe_worker() frees @audit_context using
audit_free();
4. do_exit() eventually calls audit_free() again, which is okay
because audit_free() does a NULL check.
As suggested by Paul Moore, fix it by deleting audit_alloc_kernel() and
redundant audit_free() calls.
Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring") Suggested-by: Paul Moore <paul@paul-moore.com> Cc: stable@vger.kernel.org Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/r/20220803222343.31673-1-yepeilin.cs@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jeff Layton [Mon, 1 Aug 2022 19:57:26 +0000 (15:57 -0400)]
lockd: detect and reject lock arguments that overflow
lockd doesn't currently vet the start and length in nlm4 requests like
it should, and can end up generating lock requests with arguments that
overflow when passed to the filesystem.
The NLM4 protocol uses unsigned 64-bit arguments for both start and
length, whereas struct file_lock tracks the start and end as loff_t
values. By the time we get around to calling nlm4svc_retrieve_args,
we've lost the information that would allow us to determine if there was
an overflow.
Start tracking the actual start and len for NLM4 requests in the
nlm_lock. In nlm4svc_retrieve_args, vet these values to ensure they
won't cause an overflow, and return NLM4_FBIG if they do.
Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=392 Reported-by: Jan Kasiak <j.kasiak@gmail.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@vger.kernel.org> # 5.14+
NFSD: discard fh_locked flag and fh_lock/fh_unlock
As all inode locking is now fully balanced, fh_put() does not need to
call fh_unlock().
fh_lock() and fh_unlock() are no longer used, so discard them.
These are the only real users of ->fh_locked, so discard that too.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
NFSD: use (un)lock_inode instead of fh_(un)lock for file operations
When locking a file to access ACLs and xattrs etc, use explicit locking
with inode_lock() instead of fh_lock(). This means that the calls to
fh_fill_pre/post_attr() are also explicit which improves readability and
allows us to place them only where they are needed. Only the xattr
calls need pre/post information.
When locking a file we don't need I_MUTEX_PARENT as the file is not a
parent of anything, so we can use inode_lock() directly rather than the
inode_lock_nested() call that fh_lock() uses.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
When creating or unlinking a name in a directory use explicit
inode_lock_nested() instead of fh_lock(), and explicit calls to
fh_fill_pre_attrs() and fh_fill_post_attrs(). This is already done
for renames, with lock_rename() as the explicit locking.
Also move the 'fill' calls closer to the operation that might change the
attributes. This way they are avoided on some error paths.
For the v2-only code in nfsproc.c, the fill calls are not replaced as
they aren't needed.
Making the locking explicit will simplify proposed future changes to
locking for directories. It also makes it easily visible exactly where
pre/post attributes are used - not all callers of fh_lock() actually
need the pre/post attributes.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
nfsd_lookup() takes an exclusive lock on the parent inode, but no
callers want the lock and it may not be needed at all if the
result is in the dcache.
Change nfsd_lookup_dentry() to not take the lock, and call
lookup_one_len_locked() which takes lock only if needed.
nfsd4_open() currently expects the lock to still be held, but that isn't
necessary as nfsd_validate_delegated_dentry() provides required
guarantees without the lock.
NOTE: NFSv4 requires directory changeinfo for OPEN even when a create
wasn't requested and no change happened. Now that nfsd_lookup()
doesn't use fh_lock(), we need to explicitly fill the attributes
when no create happens. A new fh_fill_both_attrs() is provided
for that task.
Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
On non-error paths, nfsd_link() calls fh_unlock() twice. This is safe
because fh_unlock() records that the unlock has been done and doesn't
repeat it.
However it makes the code a little confusing and interferes with changes
that are planned for directory locking.
So rearrange the code to ensure fh_unlock() is called exactly once if
fh_lock() was called.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Some error paths in nfsd_unlink() allow it to exit without unlocking the
directory. This is not a problem in practice as the directory will be
locked with an fh_put(), but it is untidy and potentially confusing.
This allows us to remove all the fh_unlock() calls that are immediately
after nfsd_unlink() calls.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
NFSD: change nfsd_create()/nfsd_symlink() to unlock directory before returning.
nfsd_create() usually returns with the directory still locked.
nfsd_symlink() usually returns with it unlocked. This is clumsy.
Until recently nfsd_create() needed to keep the directory locked until
ACLs and security label had been set. These are now set inside
nfsd_create() (in nfsd_setattr()) so this need is gone.
So change nfsd_create() and nfsd_symlink() to always unlock, and remove
any fh_unlock() calls that follow calls to these functions.
Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
pacl and dpacl pointers are added to struct nfsd_attrs, which requires
that we have an nfsd_attrs_free() function to free them.
Those nfsv4 functions that can set ACLs now set up these pointers
based on the passed in NFSv4 ACL.
nfsd_setattr() sets the acls as appropriate.
Errors are handled as with security labels.
Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
MIPS: tlbex: Explicitly compare _PAGE_NO_EXEC against 0
When CONFIG_XPA is enabled, Clang warns:
arch/mips/mm/tlbex.c:629:24: error: converting the result of '<<' to a boolean; did you mean '(1 << _PAGE_NO_EXEC_SHIFT) != 0'? [-Werror,-Wint-in-bool-context]
if (cpu_has_rixi && !!_PAGE_NO_EXEC) {
^
arch/mips/include/asm/pgtable-bits.h:174:28: note: expanded from macro '_PAGE_NO_EXEC'
# define _PAGE_NO_EXEC (1 << _PAGE_NO_EXEC_SHIFT)
^
arch/mips/mm/tlbex.c:2568:24: error: converting the result of '<<' to a boolean; did you mean '(1 << _PAGE_NO_EXEC_SHIFT) != 0'? [-Werror,-Wint-in-bool-context]
if (!cpu_has_rixi || !_PAGE_NO_EXEC) {
^
arch/mips/include/asm/pgtable-bits.h:174:28: note: expanded from macro '_PAGE_NO_EXEC'
# define _PAGE_NO_EXEC (1 << _PAGE_NO_EXEC_SHIFT)
^
2 errors generated.
_PAGE_NO_EXEC can be '0' or '1 << _PAGE_NO_EXEC_SHIFT' depending on the
build and runtime configuration, which is what the negation operators
are trying to convey. To silence the warning, explicitly compare against
0 so the result of the '<<' operator is not implicitly converted to a
boolean.
According to its documentation, GCC enables -Wint-in-bool-context with
-Wall but this warning is not visible when building the same
configuration with GCC. It appears GCC only warns when compiling C++,
not C, although the documentation makes no note of this:
https://godbolt.org/z/x39q3brxf
Even after 8 years later, GCC LTO has not been upstreamed. Also, it said
"This is a workaround". If this is needed in the future, it should be
added in a proper way.
Andrea Righi [Thu, 14 Jul 2022 07:49:15 +0000 (09:49 +0200)]
x86/entry: Build thunk_$(BITS) only if CONFIG_PREEMPTION=y
With CONFIG_PREEMPTION disabled, arch/x86/entry/thunk_$(BITS).o becomes
an empty object file.
With some old versions of binutils (i.e., 2.35.90.20210113-1ubuntu1) the
GNU assembler doesn't generate a symbol table for empty object files and
objtool fails with the following error when a valid symbol table cannot
be found:
arch/x86/entry/thunk_64.o: warning: objtool: missing symbol table
To prevent this from happening, build thunk_$(BITS).o only if
CONFIG_PREEMPTION is enabled.
When DCSS + MIPI_DSI is used, and the last bridge in the chain supports
HPD, we can see a "Hot plug detection already enabled" warning stack
trace dump that's thrown when DCSS is initialized.
The problem appeared when HPD was enabled by default in the
bridge_connector initialization, which made the
drm_bridge_connector_enable_hpd() call, in DCSS init path, redundant.
So, let's remove that call.
Commit c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
optimises ttwu by queueing a task that is descheduling on the wakelist,
but does not check if the task descheduling is still allowed to run on that CPU.
In this warning, the problematic task is a workqueue rescue thread which
checks if the rescue is for a per-cpu workqueue and running on the wrong CPU.
While this is early in boot and it should be possible to create workers,
the rescue thread may still used if the MAYDAY_INITIAL_TIMEOUT is reached
or MAYDAY_INTERVAL and on a sufficiently large machine, the rescue
thread is being used frequently.
Tracing confirmed that the task should have migrated properly using the
stopper thread to handle the migration. However, a parallel wakeup from udev
running on another CPU that does not share CPU cache observes p->on_cpu and
uses task_cpu(p), queues the task on the old CPU and triggers the warning.
Check that the wakee task that is descheduling is still allowed to run
on its current CPU and if not, wait for the descheduling to complete
and select an allowed CPU.
Huacai Chen [Thu, 4 Aug 2022 02:54:57 +0000 (10:54 +0800)]
irqchip/loongson-eiointc: Fix a build warning
Make acpi_get_vec_parent() be a static function, to avoid:
drivers/irqchip/irq-loongson-eiointc.c:289:20: warning: no previous prototype for 'acpi_get_vec_parent'
Michael Kelley [Tue, 26 Jul 2022 00:53:40 +0000 (17:53 -0700)]
iommu/hyper-v: Use helper instead of directly accessing affinity
Recent changes to solve inconsistencies in handling IRQ masks #ifdef
out the affinity field in irq_common_data for non-SMP configurations.
The current code in hyperv_irq_remapping_alloc() gets a compiler error
in that case.
Fix this by using the new irq_data_update_affinity() helper, which
handles the non-SMP case correctly.
Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Marc Zyngier <maz@kernel.org> Fixes: aa0813581b8d ("genirq: Provide an IRQ affinity mask in non-SMP configs") Link: https://lore.kernel.org/r/1658796820-2261-1-git-send-email-mikelley@microsoft.com
fbdev:
- device unregistering fixes
- vesa: Support COMPILE_TEST
- Disable firmware-device registration when first native driver loads
aperture:
- fix segfault during hot-unplug
- export for use with other subsystems
client:
- use driver validated modes
dp:
- aux: make probing more reliable
- mst: Read extended DPCD capabilities during system resume
- Support waiting for HDP signal
- Port-validation fixes
probe-helper:
- use 640x480 as displayport fallback
scheduler:
- don't kill jobs in interrupt context
bridge:
- Add support for i.MX8qxp and i.MX8qm
- lots of fixes/cleanups
- Add TI-DLPC3433
- fy07024di26a30d: Optional GPIO reset
- ldb: Add reg and reg-name properties to bindings, Kconfig fixes
- lt9611: Fix display sensing;
- tc358767: DSI/DPI refactoring and DSI-to-eDP support, DSI lane handling
- tc358775: Fix clock settings
- ti-sn65dsi83: Allow GPIO to sleep
- adv7511: I2C fixes
- anx7625: Fix error handling; DPI fixes; Implement HDP timeout via callback
- fsl-ldb: Drop DE flip
- ti-sn65dsi86: Convert to atomic modesetting
amdgpu:
- use atomic fence helpers in DM
- fix VRAM address calculations
- export CRTC bpc via debugfs
- Initial devcoredump support
- Enable high priority gfx queue on asics which support it
- Adjust GART size on newer APUs for S/G display
- Soft reset for GFX 11 / SDMA 6
- Add gfxoff status query for vangogh
- Fix timestamps for cursor only commits
- Adjust GART size on newer APUs for S/G display
- fix buddy memory corruption
amdkfd:
- MMU notifier fixes
- P2P DMA support using dma-buf
- Add available memory IOCTL
- HMM profiler support
- Simplify GPUVM validation
- Unified memory for CWSR save/restore area
i915:
- General driver clean-up
- DG2 enabling (still under force probe)
- DG2 small BAR memory support
- HuC loading support
- DG2 workarounds
- DG2/ATS-M device IDs added
- Ponte Vecchio prep work and new blitter engines
- add Meteorlake support
- Fix sparse warnings
- DMC MMIO range checks
- Audio related fixes
- Runtime PM fixes
- PSR fixes
- Media freq factor and per-gt enhancements
- DSI fixes for ICL+
- Disable DMC flip queue handlers
- ADL_P voltage swing updates
- Use more the VBT for panel information
- Fix on Type-C ports with TBT mode
- Improve fastset and allow seamless M/N changes
- Accept more fixed modes with VRR/DMRRS panels
- Disable connector polling for a headless SKU
- ADL-S display PLL w/a
- Enable THP on Icelake and beyond
- Fix i915_gem_object_ggtt_pin_ww regression on old platforms
- Expose per tile media freq factor in sysfs
- Fix dma_resv fence handling in multi-batch execbuf
- Improve on suspend / resume time with VT-d enabled
- export CRTC bpc settings via debugfs
msm:
- gpu: a619 support
- gpu: Fix for unclocked GMU register access
- gpu: Devcore dump enhancements
- client utilization via fdinfo support
- fix fence rollover issue
- gem: Lockdep false-positive warning fix
- gem: Switch to pfn mappings
- WB support on sc7180
- dp: dropped custom bulk clock implementation
- fix link retraining on resolution change
- hdmi: dropped obsolete GPIO support
tegra:
- context isolation for host1x engines
- tegra234 soc support
exynos:
- Fix resume function issue of exynos decon driver by calling
clk_disable_unprepare() properly if clk_prepare_enable() failed.
nouveau:
- set of misc fixes/cleanups
- display cleanups
gma500:
- Cleanup connector I2C handling
hyperv:
- Unify VRAM allocation of Gen1 and Gen2
meson:
- Support YUV422 output; Refcount fixes
mgag200:
- Support damage clipping
- Support gamma handling
- Protect concurrent HW access
- Fixes to connector
- Store model-specific limits in device-info structure
- fix PCI register init
panfrost:
- Valhall support
r128:
- Fix bit-shift overflow
rockchip:
- Locking fixes in error path
ssd130x:
- Fix built-in linkage
udl:
- Always advertize VGA connector
ast:
- Support multiple outputs
- fix black screen on resume
sun4i:
- HDMI PHY cleanups
vc4:
- Add support for BCM2711
vkms:
- Allocate output buffer with vmalloc()
mcde:
- Fix ref-count leak
mxsfb/lcdif:
- Support i.MX8MP LCD controller
stm/ltdc:
- Support dynamic Z order
- Support mirroring
ingenic:
- Fix display at maximum resolution"
* tag 'drm-next-2022-08-03' of git://anongit.freedesktop.org/drm/drm: (1480 commits)
drm/amd/display: Fix a compilation failure on PowerPC caused by FPU code
drm/amdgpu: enable support for psp 13.0.4 block
drm/amdgpu: add files for PSP 13.0.4
drm/amdgpu: add header files for MP 13.0.4
drm/amdgpu: correct RLC_RLCS_BOOTLOAD_STATUS offset and index
drm/amdgpu: send msg to IMU for the front-door loading
drm/amdkfd: use time_is_before_jiffies(a + b) to replace "jiffies - a > b"
drm/amdgpu: fix hive reference leak when reflecting psp topology info
drm/amd/pm: enable GFX ULV feature support for SMU13.0.0
drm/amd/pm: update driver if header for SMU 13.0.0
drm/amdgpu: move mes self test after drm sched re-started
drm/amdgpu: drop non-necessary call trace dump
drm/amdgpu: enable VCN cg and JPEG cg/pg
drm/amdgpu: vcn_4_0_2 video codec query
drm/amdgpu: add VCN_4_0_2 firmware support
drm/amdgpu: add VCN function in NBIO v7.7
drm/amdgpu: fix a vcn4 boot poll bug in emulation mode
drm/amd/amdgpu: add memory training support for PSP_V13
drm/amdkfd: remove an unnecessary amdgpu_bo_ref
drm/amd/pm: Add get_gfx_off_status interface for yellow carp
...
Linus Torvalds [Thu, 4 Aug 2022 02:23:51 +0000 (19:23 -0700)]
Merge tag 'i2c-for-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
- new drivers: Microchip CoreI2C, Renesas RZV2M
- quite some DT schema conversions and extensions
- and a bunch of driver updates and improvements
* tag 'i2c-for-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (37 commits)
i2c: extend documentation about retvals of master_xfer functions
i2c: mux-gpmux: Add of_node_put() when breaking out of loop
dt-bindings: i2c: i2c-rk3x: Document Rockchip RV1126
i2c: qcom-geni: Use the correct return value
i2c: cadence: Support PEC for SMBus block read
i2c: qcom-geni: Propagate GENI_ABORT_DONE to geni_i2c_abort_xfer()
i2c: brcmstb: Use dev_name() for adapter name
i2c: Add Renesas RZ/V2M controller
dt-bindings: i2c: Document RZ/V2M I2C controller
i2c: mlxcpld: Add callback to notify probing completion
i2c: scmi: Replace open coded device_get_match_data()
i2c: stm32: add support for the STM32MP13 soc
dt-bindings: i2c: st,stm32-i2c: add entry for stm32mp13
dt-bindings: i2c: i2c-rk3x: add rk3588 compatible
i2c: add support for microchip fpga i2c controllers
i2c: i801: Add support for Intel Meteor Lake-P
dt-bindings: i2c: nomadik: Add power domain to binding
dt-bindings: i2c: nomadik: Drop unused voltage supply from example
i2c: Fix a potential use after free
i2c: hisi: use HZ_PER_KHZ macro in units.h
...
Yu Xiao [Tue, 2 Aug 2022 09:33:55 +0000 (10:33 +0100)]
nfp: ethtool: fix the display error of `ethtool -m DEVNAME`
The port flag isn't set to `NFP_PORT_CHANGED` when using
`ethtool -m DEVNAME` before, so the port state (e.g. interface)
cannot be updated. Therefore, it caused that `ethtool -m DEVNAME`
sometimes cannot read the correct information.
E.g. `ethtool -m DEVNAME` cannot work when load driver before plug
in optical module, as the port interface is still NONE without port
update.
Now update the port state before sending info to NIC to ensure that
port interface is correct (latest state).
Fixes: 61f7c6f44870 ("nfp: implement ethtool get module EEPROM") Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Yu Xiao <yu.xiao@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20220802093355.69065-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: phy: Warn about incorrect mdio_bus_phy_resume() state
Calling mdio_bus_phy_resume() with neither the PHY state machine set to
PHY_HALTED nor phydev->mac_managed_pm set to true is a good indication
that we can produce a race condition looking like this:
with the phy_resume() function triggering a PHY behavior that might have
to be worked around with (see bf8bfc4336f7 ("net: phy: broadcom: Fix
brcm_fet_config_init()") for instance) that ultimately leads to an error
reading from the PHY.
====================
Make DSA work with bonding's ARP monitor
Since commit 2b86cb829976 ("net: dsa: declare lockless TX feature for
slave ports") in v5.7, DSA breaks the ARP monitoring logic from the
bonding driver, fact which was pointed out by Brian Hutchinson who uses
a linux-5.10.y stable kernel.
Initially I got lured by other similar hacks introduced for other
NETIF_F_LLTX drivers, which, inspired by the bonding documentation,
update the trans_start of their TX queues by hand.
However Jakub pointed out that this simply isn't a proper solution, and
after coming to think more about it, I agree, and it doesn't work
properly with DSA nor is it maintainable for the future changes I plan
for it (multiple DSA masters in a LAG).
I've tested these changes using a DSA-based setup and a veth-based
setup, using the active-backup mode and ARP monitoring, with and without
arp_validate.
Link to v1:
https://patchwork.kernel.org/project/netdevbpf/patch/20220715232641.952532-1-vladimir.oltean@nxp.com/
Link to v2:
https://patchwork.kernel.org/project/netdevbpf/patch/20220727152000.3616086-1-vladimir.oltean@nxp.com/
====================
Vladimir Oltean [Sun, 31 Jul 2022 12:41:07 +0000 (15:41 +0300)]
Revert "veth: Add updating of trans_start"
This reverts commit e66e257a5d8368d9c0ba13d4630f474436533e8b. The veth
driver no longer needs these hacks which are slightly detrimential to
the fast path performance, because the bonding driver is keeping track
of TX times of ARP and NS probes by itself, which it should.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Sun, 31 Jul 2022 12:41:06 +0000 (15:41 +0300)]
net/sched: remove hacks added to dev_trans_start() for bonding to work
Now that the bonding driver keeps track of the last TX time of ARP and
NS probes, we effectively revert the following commits:
32d3e51a82d4 ("net_sched: use macvlan real dev trans_start in dev_trans_start()") 07ce76aa9bcf ("net_sched: make dev_trans_start return vlan's real dev trans_start")
Note that the approach of continuing to hack at this function would not
get us very far, hence the desire to take a different approach. DSA is
also a virtual device that uses NETIF_F_LLTX, but there, many uppers
share the same lower (DSA master, i.e. the physical host port of a
switch). By making dev_trans_start() on a DSA interface return the
dev_trans_start() of the master, we effectively assume that all other
DSA interfaces are silent, otherwise this corrupts the validity of the
probe timestamp data from the bonding driver's perspective.
Furthermore, the hacks didn't take into consideration the fact that the
lower interface of @dev may not have been physical either. For example,
VLAN over VLAN, or DSA with 2 masters in a LAG.
And even furthermore, there are NETIF_F_LLTX devices which are not
stacked, like veth. The hack here would not work with those, because it
would not have to provide the bonding driver something to chew at all.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Sun, 31 Jul 2022 12:41:05 +0000 (15:41 +0300)]
net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS
The bonding driver piggybacks on time stamps kept by the network stack
for the purpose of the netdev TX watchdog, and this is problematic
because it does not work with NETIF_F_LLTX devices.
It is hard to say why the driver looks at dev_trans_start() of the
slave->dev, considering that this is updated even by non-ARP/NS probes
sent by us, and even by traffic not sent by us at all (for example PTP
on physical slave devices). ARP monitoring in active-backup mode appears
to still work even if we track only the last TX time of actual ARP
probes.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Add support for 64 bits enum, to match types exposed by kernel.
- Introduce support for sleepable uprobes program.
- Introduce support for enum textual representation in libbpf.
- New helpers to implement synproxy with eBPF/XDP.
- Improve loop performances, inlining indirect calls when possible.
- Removed all the deprecated libbpf APIs.
- Implement new eBPF-based LSM flavor.
- Add type match support, which allow accurate queries to the eBPF
used types.
- A few TCP congetsion control framework usability improvements.
- Add new infrastructure to manipulate CT entries via eBPF programs.
- Allow for livepatch (KLP) and BPF trampolines to attach to the same
kernel function.
Protocols:
- Introduce per network namespace lookup tables for unix sockets,
increasing scalability and reducing contention.
- Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support.
- Add support to forciby close TIME_WAIT TCP sockets via user-space
tools.
- Significant performance improvement for the TLS 1.3 receive path,
both for zero-copy and not-zero-copy.
- Support for changing the initial MTPCP subflow priority/backup
status
- Introduce virtually contingus buffers for sockets over RDMA, to
cope better with memory pressure.
- Extend CAN ethtool support with timestamping capabilities
- Refactor CAN build infrastructure to allow building only the needed
features.
Driver API:
- Remove devlink mutex to allow parallel commands on multiple links.
- Add support for pause stats in distributed switch.
- Implement devlink helpers to query and flash line cards.
- New helper for phy mode to register conversion.
New hardware / drivers:
- Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro.
- Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch.
- Ethernet DSA driver for the Microchip LAN937x switch.
- Ethernet PHY driver for the Aquantia AQR113C EPHY.
- CAN driver for the OBD-II ELM327 interface.
- CAN driver for RZ/N1 SJA1000 CAN controller.
- Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device.
Drivers:
- Intel Ethernet NICs:
- i40e: add support for vlan pruning
- i40e: add support for XDP framented packets
- ice: improved vlan offload support
- ice: add support for PPPoE offload
- Mellanox Ethernet (mlx5)
- refactor packet steering offload for performance and scalability
- extend support for TC offload
- refactor devlink code to clean-up the locking schema
- support stacked vlans for bridge offloads
- use TLS objects pool to improve connection rate
- Netronome Ethernet NICs (nfp):
- extend support for IPv6 fields mangling offload
- add support for vepa mode in HW bridge
- better support for virtio data path acceleration (VDPA)
- enable TSO by default
- Microsoft vNIC driver (mana)
- add support for XDP redirect
- Others Ethernet drivers:
- bonding: add per-port priority support
- microchip lan743x: extend phy support
- Fungible funeth: support UDP segmentation offload and XDP xmit
- Solarflare EF100: add support for virtual function representors
- MediaTek SoC: add XDP support
- Mellanox Ethernet/IB switch (mlxsw):
- dropped support for unreleased H/W (XM router).
- improved stats accuracy
- unified bridge model coversion improving scalability (parts 1-6)
- support for PTP in Spectrum-2 asics
- Broadcom PHYs
- add PTP support for BCM54210E
- add support for the BCM53128 internal PHY
- Marvell Ethernet switches (prestera):
- implement support for multicast forwarding offload
- Embedded Ethernet switches:
- refactor OcteonTx MAC filter for better scalability
- improve TC H/W offload for the Felix driver
- refactor the Microchip ksz8 and ksz9477 drivers to share the
probe code (parts 1, 2), add support for phylink mac
configuration
- Other WiFi:
- Microchip wilc1000: diable WEP support and enable WPA3
- Atheros ath10k: encapsulation offload support
Old code removal:
- Neterion vxge ethernet driver: this is untouched since more than 10 years"
* tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1890 commits)
doc: sfp-phylink: Fix a broken reference
wireguard: selftests: support UML
wireguard: allowedips: don't corrupt stack when detecting overflow
wireguard: selftests: update config fragments
wireguard: ratelimiter: use hrtimer in selftest
net/mlx5e: xsk: Discard unaligned XSK frames on striding RQ
net: usb: ax88179_178a: Bind only to vendor-specific interface
selftests: net: fix IOAM test skip return code
net: usb: make USB_RTL8153_ECM non user configurable
net: marvell: prestera: remove reduntant code
octeontx2-pf: Reduce minimum mtu size to 60
net: devlink: Fix missing mutex_unlock() call
net/tls: Remove redundant workqueue flush before destroy
net: txgbe: Fix an error handling path in txgbe_probe()
net: dsa: Fix spelling mistakes and cleanup code
Documentation: devlink: add add devlink-selftests to the table of contents
dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock
net: ionic: fix error check for vlan flags in ionic_set_nic_features()
net: ice: fix error NETIF_F_HW_VLAN_CTAG_FILTER check in ice_vsi_sync_fltr()
nfp: flower: add support for tunnel offload without key ID
...
Linus Torvalds [Wed, 3 Aug 2022 22:26:04 +0000 (15:26 -0700)]
Merge tag 'ata-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ATA updates from Damien Le Moal:
- Some code refactoring for the pata_hpt37x and pata_hpt3x2n drivers,
from Sergei.
- Several patches to cleanup in libata-core, libata-scsi and libata-eh
code: fixes arguments and variables types, change some functions
declaration to static and fix for a typo in a comment. From Sergey
and Xiang.
- Fix a compilation warning in the pata_macio driver, from me.
- A fix for the expected number of resources in the sata_mv driver fix,
from Andrew.
* tag 'ata-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: sata_mv: Fixes expected number of resources now IRQs are gone
ata: libata-scsi: fix result type of ata_ioc32()
ata: pata_macio: Fix compilation warning
ata: libata-eh: fix sloppy result type of ata_internal_cmd_timeout()
ata: libata-core: fix sloppy parameter type in ata_exec_internal[_sg]()
ata: make ata_port::fastdrain_cnt *unsigned int*
ata: libata-eh: fix sloppy result type of ata_eh_nr_in_flight()
ata: libata-core: make ata_exec_internal_sg() *static*
ata: make transfer mode masks *unsigned int*
ata: libata-core: get rid of *else* branches in ata_id_n_sectors()
ata: libata-core: fix sloppy typing in ata_id_n_sectors()
ata: pata_hpt3x2n: pass base DPLL frequency to hpt3x2n_pci_clock()
ata: pata_hpt37x: merge hpt374_read_freq() to hpt37x_pci_clock()
ata: pata_hpt37x: factor out hpt37x_pci_clock()
ata: pata_hpt37x: move claculating PCI clock from hpt37x_clock_slot()
ata: libata: Fix syntax errors in comments
Linus Torvalds [Wed, 3 Aug 2022 22:21:53 +0000 (15:21 -0700)]
Merge tag 'zonefs-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs
Pull zonefs update from Damien Le Moal:
"A single change for this cycle to simplify handling of the memory page
used as super block buffer during mount (from Fabio)"
* tag 'zonefs-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: Call page_address() on page acquired with GFP_KERNEL flag
Linus Torvalds [Wed, 3 Aug 2022 22:16:49 +0000 (15:16 -0700)]
Merge tag 'iomap-5.20-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap updates from Darrick Wong:
"The most notable change in this first batch is that we no longer
schedule pages beyond i_size for writeback, preferring instead to let
truncate deal with those pages.
Next week, there may be a second pull request to remove
iomap_writepage from the other two filesystems (gfs2/zonefs) that use
iomap for buffered IO. This follows in the same vein as the recent
removal of writepage from XFS, since it hasn't been triggered in a few
years; it does nothing during direct reclaim; and as far as the people
who examined the patchset can tell, it's moving the codebase in the
right direction.
However, as it was a late addition to for-next, I'm holding off on
that section for another week of testing to see if anyone can come up
with a solid reason for holding off in the meantime.
Summary:
- Skip writeback for pages that are completely beyond EOF
- Minor code cleanups"
* tag 'iomap-5.20-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
dax: set did_zero to true when zeroing successfully
iomap: set did_zero to true when zeroing successfully
iomap: skip pages past eof in iomap_do_writepage()
Linus Torvalds [Wed, 3 Aug 2022 21:54:52 +0000 (14:54 -0700)]
Merge tag 'for-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"This brings some long awaited changes, the send protocol bump,
otherwise lots of small improvements and fixes. The main core part is
reworking bio handling, cleaning up the submission and endio and
improving error handling.
There are some changes outside of btrfs adding helpers or updating
API, listed at the end of the changelog.
Features:
- sysfs:
- export chunk size, in debug mode add tunable for setting its size
- show zoned among features (was only in debug mode)
- show commit stats (number, last/max/total duration)
- send protocol updated to 2
- new commands:
- ability write larger data chunks than 64K
- send raw compressed extents (uses the encoded data ioctls),
ie. no decompression on send side, no compression needed on
receive side if supported
- send 'otime' (inode creation time) among other timestamps
- send file attributes (a.k.a file flags and xflags)
- this is first version bump, backward compatibility on send and
receive side is provided
- there are still some known and wanted commands that will be
implemented in the near future, another version bump will be
needed, however we want to minimize that to avoid causing
usability issues
- print checksum type and implementation at mount time
- don't print some messages at mount (mentioned as people asked about
it), we want to print messages namely for new features so let's
make some space for that
- big metadata - this has been supported for a long time and is
not a feature that's worth mentioning
- skinny metadata - same reason, set by default by mkfs
Performance improvements:
- reduced amount of reserved metadata for delayed items
- when inserted items can be batched into one leaf
- when deleting batched directory index items
- when deleting delayed items used for deletion
- overall improved count of files/sec, decreased subvolume lock
contention
- metadata item access bounds checker micro-optimized, with a few
percent of improved runtime for metadata-heavy operations
- increase direct io limit for read to 256 sectors, improved
throughput by 3x on sample workload
Notable fixes:
- raid56
- reduce parity writes, skip sectors of stripe when there are no
data updates
- restore reading from on-disk data instead of using stripe cache,
this reduces chances to damage correct data due to RMW cycle
- refuse to replay log with unknown incompat read-only feature bit
set
- zoned
- fix page locking when COW fails in the middle of allocation
- improved tracking of active zones, ZNS drives may limit the
number and there are ENOSPC errors due to that limit and not
actual lack of space
- adjust maximum extent size for zone append so it does not cause
late ENOSPC due to underreservation
- mirror reading error messages show the mirror number
- don't fallback to buffered IO for NOWAIT direct IO writes, we don't
have the NOWAIT semantics for buffered io yet
- send, fix sending link commands for existing file paths when there
are deleted and created hardlinks for same files
- repair all mirrors for profiles with more than 1 copy (raid1c34)
- fix repair of compressed extents, unify where error detection and
repair happen
Core changes:
- bio completion cleanups
- don't double defer compression bios
- simplify endio workqueues
- add more data to btrfs_bio to avoid allocation for read requests
- rework bio error handling so it's same what block layer does,
the submission works and errors are consumed in endio
- when asynchronous bio offload fails fall back to synchronous
checksum calculation to avoid errors under writeback or memory
pressure
- super block log_root_transid deprecated (never used)
- mixed_backref and big_metadata sysfs feature files removed, they've
been default for sufficiently long time, there are no known users
and mixed_backref could be confused with mixed_groups
Non-btrfs changes, API updates:
- minor highmem API update to cover const arguments
* tag 'for-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (163 commits)
btrfs: don't call btrfs_page_set_checked in finish_compressed_bio_read
btrfs: fix repair of compressed extents
btrfs: remove the start argument to check_data_csum and export
btrfs: pass a btrfs_bio to btrfs_repair_one_sector
btrfs: simplify the pending I/O counting in struct compressed_bio
btrfs: repair all known bad mirrors
btrfs: merge btrfs_dev_stat_print_on_error with its only caller
btrfs: join running log transaction when logging new name
btrfs: simplify error handling in btrfs_lookup_dentry
btrfs: send: always use the rbtree based inode ref management infrastructure
btrfs: send: fix sending link commands for existing file paths
btrfs: send: introduce recorded_ref_alloc and recorded_ref_free
btrfs: zoned: wait until zone is finished when allocation didn't progress
btrfs: zoned: write out partially allocated region
btrfs: zoned: activate necessary block group
btrfs: zoned: activate metadata block group on flush_space
btrfs: zoned: disable metadata overcommit for zoned
btrfs: zoned: introduce space_info->active_total_bytes
btrfs: zoned: finish least available block group on data bg allocation
btrfs: let can_allocate_chunk return error
...
Linus Torvalds [Wed, 3 Aug 2022 21:41:36 +0000 (14:41 -0700)]
Merge tag 'efi-efivars-removal-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull efivars sysfs interface removal from Ard Biesheuvel:
"Remove the obsolete 'efivars' sysfs based interface to the EFI
variable store, now that all users have moved to the efivarfs pseudo
file system, which was created ~10 years ago to address some
fundamental shortcomings in the sysfs based driver.
Move the 'business logic' related to which EFI variables are important
and may affect the boot flow from the efivars support layer into the
efivarfs pseudo file system, so it is no longer exposed to other parts
of the kernel"
* tag 'efi-efivars-removal-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: vars: Move efivar caching layer into efivarfs
efi: vars: Switch to new wrapper layer
efi: vars: Remove deprecated 'efivars' sysfs interface
Linus Torvalds [Wed, 3 Aug 2022 21:38:02 +0000 (14:38 -0700)]
Merge tag 'efi-next-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
- Enable mirrored memory for arm64
- Fix up several abuses of the efivar API
- Refactor the efivar API in preparation for moving the 'business
logic' part of it into efivarfs
- Enable ACPI PRM on arm64
* tag 'efi-next-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits)
ACPI: Move PRM config option under the main ACPI config
ACPI: Enable Platform Runtime Mechanism(PRM) support on ARM64
ACPI: PRM: Change handler_addr type to void pointer
efi: Simplify arch_efi_call_virt() macro
drivers: fix typo in firmware/efi/memmap.c
efi: vars: Drop __efivar_entry_iter() helper which is no longer used
efi: vars: Use locking version to iterate over efivars linked lists
efi: pstore: Omit efivars caching EFI varstore access layer
efi: vars: Add thin wrapper around EFI get/set variable interface
efi: vars: Don't drop lock in the middle of efivar_init()
pstore: Add priv field to pstore_record for backend specific use
Input: applespi - avoid efivars API and invoke EFI services directly
selftests/kexec: remove broken EFI_VARS secure boot fallback check
brcmfmac: Switch to appropriate helper to load EFI variable contents
iwlwifi: Switch to proper EFI variable store interface
media: atomisp_gmin_platform: stop abusing efivar API
efi: efibc: avoid efivar API for setting variables
efi: avoid efivars layer when loading SSDTs from variables
efi: Correct comment on efi_memmap_alloc
memblock: Disable mirror feature if kernelcore is not specified
...
Linus Torvalds [Wed, 3 Aug 2022 21:03:51 +0000 (14:03 -0700)]
Merge tag 'pull-work.9p' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull 9p iov_iter fix from Al Viro:
"net/9p abuses iov_iter primitives - it attempts to copy _from_ a
destination-only iov_iter when it handles Rerror arriving in reply to
zero-copy request. Not hard to fix, fortunately.
This is a prereq for the iov_iter_get_pages() work in the second part
of iov_iter series, ended up in a separate branch"
* tag 'pull-work.9p' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
9p: handling Rerror without copy_from_iter_full()
Linus Torvalds [Wed, 3 Aug 2022 20:59:15 +0000 (13:59 -0700)]
Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull copy_to_iter_mc fix from Al Viro:
"Backportable fix for copy_to_iter_mc() - the second part of iov_iter
work will pretty much overwrite this, but would be much harder to
backport"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix short copy handling in copy_mc_pipe_to_iter()
Mårten Lindahl [Mon, 1 Aug 2022 13:57:03 +0000 (15:57 +0200)]
tpm: Add check for Failure mode for TPM2 modules
In commit 0aa698787aa2 ("tpm: Add Upgrade/Reduced mode support for
TPM2 modules") it was said that:
"If the TPM is in Failure mode, it will successfully respond to both
tpm2_do_selftest() and tpm2_startup() calls. Although, will fail to
answer to tpm2_get_cc_attrs_tbl(). Use this fact to conclude that TPM
is in Failure mode."
But a check was never added in the commit when calling
tpm2_get_cc_attrs_tbl() to conclude that the TPM is in Failure mode.
This commit corrects this by adding a check.
Fixes: 0aa698787aa2 ("tpm: Add Upgrade/Reduced mode support for TPM2 modules") Cc: stable@vger.kernel.org # v5.17+ Signed-off-by: Mårten Lindahl <marten.lindahl@axis.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
tpm: eventlog: Fix section mismatch for DEBUG_SECTION_MISMATCH
If DEBUG_SECTION_MISMATCH enabled, __calc_tpm2_event_size() will not be
inlined, this cause section mismatch like this:
WARNING: modpost: vmlinux.o(.text.unlikely+0xe30c): Section mismatch in reference from the variable L0 to the function .init.text:early_ioremap()
The function L0() references
the function __init early_memremap().
This is often because L0 lacks a __init
annotation or the annotation of early_ioremap is wrong.
Fix it by using __always_inline instead of inline for the called-once
function __calc_tpm2_event_size().
Yang Li [Fri, 1 Jul 2022 09:13:22 +0000 (17:13 +0800)]
tpm: fix platform_no_drv_owner.cocci warning
Eliminate the following coccicheck warning:
./drivers/char/tpm/tpm_tis_i2c.c:379:3-8: No need to set .owner here. The core will do it.
Remove .owner field if calls are used which set it automatically
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Tianjia Zhang [Tue, 28 Jun 2022 03:37:20 +0000 (11:37 +0800)]
KEYS: asymmetric: enforce SM2 signature use pkey algo
The signature verification of SM2 needs to add the Za value and
recalculate sig->digest, which requires the detection of the pkey_algo
in public_key_verify_signature(). As Eric Biggers said, the pkey_algo
field in sig is attacker-controlled and should be use pkey->pkey_algo
instead of sig->pkey_algo, and secondly, if sig->pkey_algo is NULL, it
will also cause signature verification failure.
The software_key_determine_akcipher() already forces the algorithms
are matched, so the SM3 algorithm is enforced in the SM2 signature,
although this has been checked, we still avoid using any algorithm
information in the signature as input.
Fixes: 215525639631 ("X.509: support OSCCA SM2-with-SM3 certificate verification") Reported-by: Eric Biggers <ebiggers@google.com> Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Tianjia Zhang [Mon, 27 Jun 2022 09:21:41 +0000 (17:21 +0800)]
pkcs7: parser support SM2 and SM3 algorithms combination
Support parsing the message signature of the SM2 and SM3 algorithm
combination. This group of algorithms has been well supported. One
of the main users is module signature verification.
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Reviewed-by: Vitaly Chikunov <vt@altlinux.org> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Tianjia Zhang [Mon, 27 Jun 2022 09:21:07 +0000 (17:21 +0800)]
sign-file: Fix confusing error messages
When an error occurs, use errx() instead of err() to display the
error message, because openssl has its own error record. When an
error occurs, errno will not be changed, while err() displays the
errno error message. It will cause confusion. For example, when
CMS_add1_signer() fails, the following message will appear:
sign-file: CMS_add1_signer: Success
errx() ignores errno and does not cause such issue.
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Tianjia Zhang [Mon, 27 Jun 2022 09:19:58 +0000 (17:19 +0800)]
X.509: Support parsing certificate using SM2 algorithm
The SM2-with-SM3 certificate generated by latest openssl no longer
reuses the OID_id_ecPublicKey, but directly uses OID_sm2. This patch
supports this type of x509 certificate parsing.
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Implement the TCG I2C Interface driver, as specified in the TCG PC
Client Platform TPM Profile (PTP) specification for TPM 2.0 v1.04
revision 14, section 8, I2C Interface Definition.
This driver supports Guard Times. That is, if required by the TPM, the
driver has to wait by a vendor-specific time after each I2C read/write.
The specific time is read from the TPM_I2C_INTERFACE_CAPABILITY register.
Unfortunately, the TCG specified almost but not quite compatible
register addresses. Therefore, the TIS register addresses need to be
mapped to I2C ones. The locality is stripped because for now, only
locality 0 is supported.
Add a sanity check to I2C reads of e.g. TPM_ACCESS and TPM_STS. This is
to detect communication errors and issues due to non-standard behaviour
(E.g. the clock stretching quirk in the BCM2835, see 4dbfb5f4401f). In
case the sanity check fails, attempt a retry.
Co-developed-by: Johannes Holland <johannes.holland@infineon.com> Signed-off-by: Johannes Holland <johannes.holland@infineon.com> Co-developed-by: Amir Mizinski <amirmizi6@gmail.com> Signed-off-by: Amir Mizinski <amirmizi6@gmail.com> Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
tpm: Add tpm_tis_verify_crc to the tpm_tis_phy_ops protocol layer
Some TPMs, e.g. those implementing the I2C variant of TIS, can verify
data transfers to/from the FIFO with a CRC. The CRC is calculated over
the entirety of the FIFO register. Since the phy_ops layer is not aware
when the core layer is done reading/writing the FIFO, CRC verification
must be triggered from the core layer. To this end, add an optional
phy_ops API call.
Co-developed-by: Johannes Holland <johannes.holland@infineon.com> Signed-off-by: Johannes Holland <johannes.holland@infineon.com> Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Initial device to be supported by the upcoming tpm_tis_i2c driver. More
to be added later.
Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com> Acked-by: Rob Herring <robh@kernel.org> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
tpm: Add upgrade/reduced mode support for TPM1.2 modules
In case a TPM in failure mode is detected, the TPM should be accessible
through a transparent communication channel for analysing purposes (e.g.
TPM_GetTestResult) or a field upgrade. Since a TPM in failure mode has
similar reduced functionality as in field upgrade mode, the flag
TPM_CHIP_FLAG_FIRMWARE_UPGRADE is also valid.
As described in TCG TPM Main Part1 Design Principles, Revision 116,
chapter 9.2.1. the TPM also allows an update function in case a TPM is
in failure mode.
If the TPM in failure mode is detected, the function tpm1_auto_startup()
sets TPM_CHIP_FLAG_FIRMWARE_UPGRADE flag, which is used later during
driver initialization/deinitialization to disable functionality which
makes no sense or will fail in the current TPM state. The following
functionality is affected:
* Do not register TPM as a hwrng
* Do not get pcr allocation
* Do not register sysfs entries which provide information impossible to
obtain in limited mode
Signed-off-by: Stefan Mahnke-Hartmann <stefan.mahnke-hartmann@infineon.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Linus Torvalds [Wed, 3 Aug 2022 20:50:22 +0000 (13:50 -0700)]
Merge tag 'pull-work.iov_iter-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs iov_iter updates from Al Viro:
"Part 1 - isolated cleanups and optimizations.
One of the goals is to reduce the overhead of using ->read_iter() and
->write_iter() instead of ->read()/->write().
new_sync_{read,write}() has a surprising amount of overhead, in
particular inside iocb_flags(). That's the explanation for the
beginning of the series is in this pile; it's not directly
iov_iter-related, but it's a part of the same work..."
* tag 'pull-work.iov_iter-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
first_iovec_segment(): just return address
iov_iter: massage calling conventions for first_{iovec,bvec}_segment()
iov_iter: first_{iovec,bvec}_segment() - simplify a bit
iov_iter: lift dealing with maxpages out of first_{iovec,bvec}_segment()
iov_iter_get_pages{,_alloc}(): cap the maxsize with MAX_RW_COUNT
iov_iter_bvec_advance(): don't bother with bvec_iter
copy_page_{to,from}_iter(): switch iovec variants to generic
keep iocb_flags() result cached in struct file
iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC
struct file: use anonymous union member for rcuhead and llist
btrfs: use IOMAP_DIO_NOSYNC
teach iomap_dio_rw() to suppress dsync
No need of likely/unlikely on calls of check_copy_size()
Linus Torvalds [Wed, 3 Aug 2022 18:43:12 +0000 (11:43 -0700)]
Merge tag 'pull-work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs dcache updates from Al Viro:
"The main part here is making parallel lookups safe for RT - making
sure preemption is disabled in start_dir_add()/ end_dir_add() sections
(on non-RT it's automatic, on RT it needs to to be done explicitly)
and moving wakeups from __d_lookup_done() inside of such to the end of
those sections.
Wakeups can be safely delayed for as long as ->d_lock on in-lookup
dentry is held; proving that has caught a bug in d_add_ci() that
allows memory corruption when sufficiently bogus ntfs (or
case-insensitive xfs) image is mounted. Easily fixed, fortunately"
* tag 'pull-work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs/dcache: Move wakeup out of i_seq_dir write held region.
fs/dcache: Move the wakeup from __d_lookup_done() to the caller.
fs/dcache: Disable preemption on i_dir_seq write side on PREEMPT_RT
d_add_ci(): make sure we don't miss d_lookup_done()
Andrew Morton [Wed, 3 Aug 2022 01:03:03 +0000 (18:03 -0700)]
tools/testing/selftests/vm/hmm-tests.c: fix build
hmm-tests.c:1607:42: error: 'HMM_DMIRROR_MIGRATE' undeclared (first use in this function); did you mean 'HMM_DMIRROR_WRITE'?
Fixes: f6c3e1ae0114cd0 ("mm/hmm: add a test for cross device private faults") Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Cc: Alistair Popple <apopple@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Linus Torvalds [Wed, 3 Aug 2022 18:35:20 +0000 (11:35 -0700)]
Merge tag 'pull-work.lseek' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs lseek updates from Al Viro:
"Jason's lseek series.
Saner handling of 'lseek should fail with ESPIPE' - this gets rid of
the magical no_llseek thing and makes checks consistent.
In particular, the ad-hoc "can we do splice via internal pipe" checks
got saner (and somewhat more permissive, which is what Jason had been
after, AFAICT)"
* tag 'pull-work.lseek' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: remove no_llseek
fs: check FMODE_LSEEK to control internal pipe splicing
vfio: do not set FMODE_LSEEK flag
dma-buf: remove useless FMODE_LSEEK flag
fs: do not compare against ->llseek
fs: clear or set FMODE_LSEEK based on llseek function
Linus Torvalds [Wed, 3 Aug 2022 18:31:42 +0000 (11:31 -0700)]
Merge tag 'pull-work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs namei updates from Al Viro:
"RCU pathwalk cleanups.
Storing sampled ->d_seq of the next dentry in nameidata simplifies
life considerably, especially if we delay fetching ->d_inode until
step_into()"
* tag 'pull-work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
step_into(): move fetching ->d_inode past handle_mounts()
lookup_fast(): don't bother with inode
follow_dotdot{,_rcu}(): don't bother with inode
step_into(): lose inode argument
namei: stash the sampled ->d_seq into nameidata
namei: move clearing LOOKUP_RCU towards rcu_read_unlock()
switch try_to_unlazy_next() to __legitimize_mnt()
follow_dotdot{,_rcu}(): change calling conventions
namei: get rid of pointless unlikely(read_seqcount_retry(...))
__follow_mount_rcu(): verify that mount_lock remains unchanged
John Garry [Tue, 26 Jul 2022 10:02:44 +0000 (18:02 +0800)]
pinctrl: qcom: Make PINCTRL_SM8450 depend on PINCTRL_MSM
All the many other configs depend on config PINCTRL_MSM, yet for config
PINCTRL_SM8450 we select config PINCTRL_MSM. Make config PINCTRL_SM8450
depend on PINCTRL_MSM to be consistent with the rest.