Merge tag 'char-misc-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are four small char/misc driver fixes for 5.19-rc6 to resolve
some reported issues. They only affect two drivers:
- rtsx_usb: fix for of-reported DMA warning error, the driver was
handling memory buffers in odd ways, it has now been fixed up to be
much simpler and correct by Shuah.
- at25 eeprom driver bugfix for reported problem
All of these have been in linux-next for a week with no reported
problems"
* tag 'char-misc-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
misc: rtsx_usb: set return value in rsp_buf alloc err path
misc: rtsx_usb: use separate command and response buffers
misc: rtsx_usb: fix use of dma mapped buffer for usb bulk transfer
eeprom: at25: Rework buggy read splitting
Merge tag 'kbuild-fixes-v5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Adjust gen_compile_commands.py to the format change of *.mod files
- Remove unused macro in scripts/Makefile.modinst
* tag 'kbuild-fixes-v5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: remove unused cmd_none in scripts/Makefile.modinst
gen_compile_commands: handle multiple lines per .mod file
Merge tag 'irq_urgent_for_v5.19_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Gracefully handle failure to request MMIO resources in the GICv3
driver
- Make a static key static in the Apple AIC driver
- Fix the Xilinx intc driver dependency on OF_ADDRESS
* tag 'irq_urgent_for_v5.19_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/apple-aic: Make symbol 'use_fast_ipi' static
irqchip/xilinx: Add explicit dependency on OF_ADDRESS
irqchip/gicv3: Handle resource request failure consistently
Merge tag 'x86_urgent_for_v5.19_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Prepare for and clear .brk early in order to address XenPV guests
failures where the hypervisor verifies page tables and uninitialized
data in that range leads to bogus failures in those checks
- Add any potential setup_data entries supplied at boot to the identity
pagetable mappings to prevent kexec kernel boot failures. Usually,
this is not a problem for the normal kernel as those mappings are
part of the initially mapped 2M pages but if kexec gets to allocate
the second kernel somewhere else, those setup_data entries need to be
mapped there too.
- Fix objtool not to discard text references from the __tracepoints
section so that ENDBR validation still works
- Correct the setup_data types limit as it is user-visible, before 5.19
releases
* tag 'x86_urgent_for_v5.19_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Fix the setup data types max limit
x86/ibt, objtool: Don't discard text references from tracepoint section
x86/compressed/64: Add identity mappings for setup_data entries
x86: Fix .brk attribute in linker script
x86: Clear .brk area at early boot
x86/xen: Use clear_bss() for Xen PV guests
Hans de Goede [Sun, 10 Jul 2022 14:07:36 +0000 (16:07 +0200)]
platform/x86/intel/ifs: Mark as BROKEN
A recent suggested change to the IFS code has shown that the userspace
API needs a bit more work, see:
https://lore.kernel.org/platform-driver-x86/20220708151938.986530-1-jithu.joseph@intel.com/
Mark it as BROKEN before 5.19 ships, to give ourselves one more
kernel-devel cycle to get the userspace API right.
On laptops like ASUS TUF Gaming A15, which have hotkeys to start Armoury
Crate or AURA Sync, these hotkeys are unavailable. This patch add
mappings for them.
Hans de Goede [Fri, 8 Jul 2022 13:14:12 +0000 (15:14 +0200)]
efi: Fix efi_power_off() not being run before acpi_power_off() when necessary
Commit 98f30d0ecf79 ("ACPI: power: Switch to sys-off handler API")
switched the ACPI sleep code from directly setting the old global
pm_power_off handler to using the new register_sys_off_handler()
mechanism with a priority of SYS_OFF_PRIO_FIRMWARE.
This is a problem when the old global pm_power_off handler would later
be overwritten, such as done by the late_initcall(efi_shutdown_init):
if (efi_poweroff_required())
pm_power_off = efi_power_off;
The old global pm_power_off handler gets run with a priority of
SYS_OFF_PRIO_DEFAULT which is lower then SYS_OFF_PRIO_FIRMWARE, causing
acpi_power_off() to run first, changing the behavior from before
the ACPI sleep code switched to the new register_sys_off_handler().
Switch the registering of efi_power_off over to register_sys_off_handler()
with a priority of SYS_OFF_PRIO_FIRMWARE + 1 so that it will run before
acpi_power_off() as before.
Note since the new sys-off-handler code will try all handlers in
priority order, there is no more need for the EFI code to store and
call the original pm_power_off handler.
Fixes: 98f30d0ecf79 ("ACPI: power: Switch to sys-off handler API") Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220708131412.81078-3-hdegoede@redhat.com
Hans de Goede [Fri, 8 Jul 2022 13:14:11 +0000 (15:14 +0200)]
platform/x86: x86-android-tablets: Fix Lenovo Yoga Tablet 2 830/1050 poweroff again
Commit 98f30d0ecf79 ("ACPI: power: Switch to sys-off handler API")
switched the ACPI sleep code from directly setting the old global
pm_power_off handler to using the new register_sys_off_handler()
mechanism with a priority of SYS_OFF_PRIO_FIRMWARE.
This is a problem in special cases where the old global pm_power_off
handler later gets overwritten, such as the Lenovo Tab2 poweroff bugfix
in x86-android-tablets. The old global pm_power_off handler gets run
with a priority of SYS_OFF_PRIO_DEFAULT which is lower then
SYS_OFF_PRIO_FIRMWARE, causing the troublesome ACPI poweroff (which
freezes the system) to run first.
Switch the registering of lenovo_yoga_tab2_830_1050_power_off over to
register_sys_off_handler() with a priority of SYS_OFF_PRIO_FIRMWARE + 1
so that it will run before acpi_power_off() to fix this.
Hans de Goede [Sun, 10 Jul 2022 14:07:36 +0000 (16:07 +0200)]
platform/x86/intel/ifs: Mark as BROKEN
A recent suggested change to the IFS code has shown that the userspace
API needs a bit more work, see:
https://lore.kernel.org/platform-driver-x86/20220708151938.986530-1-jithu.joseph@intel.com/
Mark it as BROKEN before 5.19 ships, to give ourselves one more
kernel-devel cycle to get the userspace API right.
On laptops like ASUS TUF Gaming A15, which have hotkeys to start Armoury
Crate or AURA Sync, these hotkeys are unavailable. This patch add
mappings for them.
Hans de Goede [Fri, 8 Jul 2022 13:14:12 +0000 (15:14 +0200)]
efi: Fix efi_power_off() not being run before acpi_power_off() when necessary
Commit 98f30d0ecf79 ("ACPI: power: Switch to sys-off handler API")
switched the ACPI sleep code from directly setting the old global
pm_power_off handler to using the new register_sys_off_handler()
mechanism with a priority of SYS_OFF_PRIO_FIRMWARE.
This is a problem when the old global pm_power_off handler would later
be overwritten, such as done by the late_initcall(efi_shutdown_init):
if (efi_poweroff_required())
pm_power_off = efi_power_off;
The old global pm_power_off handler gets run with a priority of
SYS_OFF_PRIO_DEFAULT which is lower then SYS_OFF_PRIO_FIRMWARE, causing
acpi_power_off() to run first, changing the behavior from before
the ACPI sleep code switched to the new register_sys_off_handler().
Switch the registering of efi_power_off over to register_sys_off_handler()
with a priority of SYS_OFF_PRIO_FIRMWARE + 1 so that it will run before
acpi_power_off() as before.
Note since the new sys-off-handler code will try all handlers in
priority order, there is no more need for the EFI code to store and
call the original pm_power_off handler.
Fixes: 98f30d0ecf79 ("ACPI: power: Switch to sys-off handler API") Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220708131412.81078-3-hdegoede@redhat.com
Hans de Goede [Fri, 8 Jul 2022 13:14:11 +0000 (15:14 +0200)]
platform/x86: x86-android-tablets: Fix Lenovo Yoga Tablet 2 830/1050 poweroff again
Commit 98f30d0ecf79 ("ACPI: power: Switch to sys-off handler API")
switched the ACPI sleep code from directly setting the old global
pm_power_off handler to using the new register_sys_off_handler()
mechanism with a priority of SYS_OFF_PRIO_FIRMWARE.
This is a problem in special cases where the old global pm_power_off
handler later gets overwritten, such as the Lenovo Tab2 poweroff bugfix
in x86-android-tablets. The old global pm_power_off handler gets run
with a priority of SYS_OFF_PRIO_DEFAULT which is lower then
SYS_OFF_PRIO_FIRMWARE, causing the troublesome ACPI poweroff (which
freezes the system) to run first.
Switch the registering of lenovo_yoga_tab2_830_1050_power_off over to
register_sys_off_handler() with a priority of SYS_OFF_PRIO_FIRMWARE + 1
so that it will run before acpi_power_off() to fix this.
Marc Zyngier [Sun, 10 Jul 2022 08:51:20 +0000 (09:51 +0100)]
Merge branch irq/plic-masking into irq/irqchip-next
* irq/plic-masking:
: .
: SiFive PLIC optimisations from Samuel Holland:
:
: "This series removes the spinlocks and cpumask operations from the PLIC
: driver's hot path. As far as I know, using the priority to mask
: interrupts is an intended usage and will work on all existing
: implementations. [...]"
: .
irqchip/sifive-plic: Separate the enable and mask operations
irqchip/sifive-plic: Make better use of the effective affinity mask
PCI: hv: Take a const cpumask in hv_compose_msi_req_get_cpu()
genirq: Provide an IRQ affinity mask in non-SMP configs
genirq: Return a const cpumask from irq_data_get_affinity_mask
genirq: Add and use an irq_data_update_affinity helper
genirq: Refactor accessors to use irq_data_get_affinity_mask
genirq: Drop redundant irq_init_effective_affinity
genirq: GENERIC_IRQ_EFFECTIVE_AFF_MASK depends on SMP
genirq: GENERIC_IRQ_IPI depends on SMP
irqchip/mips-gic: Only register IPI domain when SMP is enabled
Samuel Holland [Fri, 1 Jul 2022 20:24:40 +0000 (15:24 -0500)]
irqchip/sifive-plic: Separate the enable and mask operations
The PLIC has two per-IRQ checks before sending an IRQ to a hart context.
First, it checks that the IRQ's priority is nonzero. Then, it checks
that the enable bit is set for that combination of IRQ and context.
Currently, the PLIC driver sets both the priority value and the enable
bit in its (un)mask operations. However, modifying the enable bit is
problematic for two reasons:
1) The enable bits are packed, so changes are not atomic and require
taking a spinlock.
2) The following requirement from the PLIC spec, which explains the
racy (un)mask operations in plic_irq_eoi():
If the completion ID does not match an interrupt source
that is currently enabled for the target, the completion
is silently ignored.
Both of these problems are solved by using the priority value to mask
IRQs. Each IRQ has a separate priority register, so writing the priority
value is atomic. And since the enable bit remains set while an IRQ is
masked, the EOI operation works normally. The enable bits are still used
to control the IRQ's affinity.
Samuel Holland [Fri, 1 Jul 2022 20:24:39 +0000 (15:24 -0500)]
irqchip/sifive-plic: Make better use of the effective affinity mask
The PLIC driver already updates the effective affinity mask in its
.irq_set_affinity callback. Take advantage of that information to only
touch bits (and take spinlocks) for the specific relevant hart contexts.
First, make sure the effective affinity mask is set before IRQ startup.
Then, since this mask already takes priv->lmask into account, checking
that mask later is no longer needed (and handler->present is equivalent
to the bit being set in priv->lmask).
Finally, when (un)masking or changing affinity, only clear/set the
enable bits in the specific old/new context(s). The cpumask operations
in plic_irq_unmask() are not needed because they duplicate the code in
plic_set_affinity().
Marc Zyngier [Sun, 10 Jul 2022 08:49:44 +0000 (09:49 +0100)]
Merge branch irq/affinity-nosmp into irq/plic-masking
* irq/affinity-nosmp:
: .
: non-SMP IRQ affinity fixes courtesy of Samuel Holland:
:
: "This series solves some inconsistency with how IRQ affinity masks are
: handled between SMP and non-SMP configurations.
:
: In non-SMP configs, an IRQ's true affinity is always cpumask_of(0), so
: irq_{,data_}get_affinity_mask now return that, instead of returning an
: uninitialized per-IRQ cpumask. This change makes iterating over the
: affinity mask do the right thing in both SMP and non-SMP configurations.
:
: To accomplish that:
: - patches 1-3 disable some library code that was broken anyway on !SMP
: - patches 4-7 refactor the code so that irq_{,data_}get_affinity_mask
: can return a const cpumask, since that is what cpumask_of provides
: - patch 8 drops the per-IRQ cpumask and replaces it with cpumask_of(0)"
: .
PCI: hv: Take a const cpumask in hv_compose_msi_req_get_cpu()
genirq: Provide an IRQ affinity mask in non-SMP configs
genirq: Return a const cpumask from irq_data_get_affinity_mask
genirq: Add and use an irq_data_update_affinity helper
genirq: Refactor accessors to use irq_data_get_affinity_mask
genirq: Drop redundant irq_init_effective_affinity
genirq: GENERIC_IRQ_EFFECTIVE_AFF_MASK depends on SMP
genirq: GENERIC_IRQ_IPI depends on SMP
irqchip/mips-gic: Only register IPI domain when SMP is enabled
Marc Zyngier [Sun, 10 Jul 2022 08:48:26 +0000 (09:48 +0100)]
Merge branch irq/stm32-exti-updates into irq/irqchip-next
* irq/stm32-exti-updates:
: .
: stm32-exti updates courtesy of Antonio Borneo:
:
: "This series address some code fix for irq-stm32-exti driver and
: simplifies the table that remaps the interrupts from exti to gic."
:
: Also comes with an additional change to irq_chip_request_resources_parent(),
: making it possible to omit the callback in hierarchies.
: .
irqchip/stm32-exti: Simplify irq description table
irqchip/stm32-exti: Read event trigger type from event_trg register
irqchip/stm32-exti: Tag emr register as undefined for stm32mp15
irqchip/stm32-exti: Prevent illegal read due to unbounded DT value
irqchip/stm32-exti: Fix irq_mask/irq_unmask for direct events
irqchip/stm32-exti: Fix irq_set_affinity return value
genirq: Don't return error on missing optional irq_request_resources()
pinctrl: renesas: pinctrl-rzg2l: Add IRQ domain to handle GPIO interrupt
Add IRQ domain to RZ/G2L pinctrl driver to handle GPIO interrupt.
GPIO0-GPIO122 pins can be used as IRQ lines but only 32 pins can be
used as IRQ lines at a given time. Selection of pins as IRQ lines
is handled by IA55 (which is the IRQC block) which sits in between the
GPIO and GIC.
gpio: gpiolib: Allow free() callback to be overridden
Allow free() callback to be overridden from irq_domain_ops for
hierarchical chips.
This allows drivers to free up resources which are allocated during
child_to_parent_hwirq()/populate_parent_alloc_arg() callbacks.
On Renesas RZ/G2L platform a bitmap is maintained for TINT slots, a slot
is allocated in child_to_parent_hwirq() callback which is freed up in free
callback hence this override.
Add a driver for the Renesas RZ/G2L Interrupt Controller.
This supports external pins being used as interrupts. It supports
one line for NMI, 8 external pins and 32 GPIO pins (out of 123)
to be used as IRQ lines.
Marc Zyngier [Thu, 7 Jul 2022 18:23:09 +0000 (19:23 +0100)]
gpio: Remove dynamic allocation from populate_parent_alloc_arg()
The gpiolib is unique in the way it uses intermediate fwspecs
when feeding an interrupt specifier to the parent domain, as it
relies on the populate_parent_alloc_arg() callback to perform
a dynamic allocation.
This is pretty inefficient (we free the structure almost immediately),
and the only reason this isn't a stack allocation is that our
ThunderX friend uses MSIs rather than a FW-constructed structure.
Let's solve it by providing a new type composed of the union
of a struct irq_fwspec and a msi_info_t, which satisfies both
requirements. This allows us to use a stack allocation, and we
can move the handful of users to this new scheme.
Also perform some additional cleanup, such as getting rid of the
stub versions of the irq_domain_translate_*cell helpers, which
are never used when CONFIG_IRQ_DOMAIN_HIERARCHY isn't selected.
Tested on a Tegra186.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Daniel Palmer <daniel@thingy.jp> Cc: Romain Perier <romain.perier@gmail.com> Cc: Bartosz Golaszewski <brgl@bgdev.pl> Cc: Thierry Reding <thierry.reding@gmail.com> Cc: Jonathan Hunter <jonathanh@nvidia.com> Cc: Robert Richter <rric@kernel.org> Cc: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp> Cc: Andy Gross <agross@kernel.org> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Bartosz Golaszewski <brgl@bgdev.pl> Link: https://lore.kernel.org/r/20220707182314.66610-2-prabhakar.mahadev-lad.rj@bp.renesas.com
Ben Widawsky [Wed, 13 Apr 2022 05:18:09 +0000 (22:18 -0700)]
cxl/hdm: Require all decoders to be enumerated
In preparation for region provisioning all device decoders need to be
enumerated since DPA allocations are calculated by summing the
capacities of all decoders in a set. I.e. the programming for decoder[N]
depends on the state of decoder[N-1], so skipping over decoders that
fail to initialize prevents accurate DPA accounting.
Dan Williams [Sat, 21 May 2022 22:35:29 +0000 (15:35 -0700)]
cxl/mem: Convert partition-info to resources
To date the per-device-partition DPA range information has only been
used for enumeration purposes. In preparation for allocating regions
from available DPA capacity, convert those ranges into DPA-type resource
trees.
With resources and the new add_dpa_res() helper some open coded end
address calculations and debug prints can be cleaned.
The 'cxlds->pmem_res' and 'cxlds->ram_res' resources are child resources
of the total-device DPA space and they in turn will host DPA allocations
from cxl_endpoint_decoder instances (tracked by cxled->dpa_res).
Dan Williams [Mon, 23 May 2022 00:04:27 +0000 (17:04 -0700)]
cxl: Introduce cxl_to_{ways,granularity}
Interleave granularity and ways have CXL specification defined encodings.
Promote the conversion helpers to a common header, and use them to
replace other open-coded instances.
Force caller to consider the error case of the conversion similarly to
other conversion helpers like kstrto*().
Dan Williams [Thu, 19 May 2022 04:38:29 +0000 (21:38 -0700)]
cxl/core: Drop is_cxl_decoder()
This helper was only used to identify the object type for lockdep
purposes. Now that lockdep support is done with explicit lock classes,
this helper can be dropped.
Dan Williams [Thu, 19 May 2022 01:02:39 +0000 (18:02 -0700)]
cxl/core: Drop ->platform_res attribute for root decoders
Root decoders are responsible for hosting the available host address
space for endpoints and regions to claim. The tracking of that available
capacity can be done in iomem_resource directly. As a result, root
decoders no longer need to host their own resource tree. The
current ->platform_res attribute was added prematurely.
Otherwise, ->hpa_range fills the role of conveying the current decode
range of the decoder.
Ben Widawsky [Thu, 28 Apr 2022 18:15:40 +0000 (11:15 -0700)]
cxl/hdm: Use local hdm variable
Save a few characters and use the already initialized local variable.
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Adam Manzanares <a.manzanares@samsung.com> Link: https://lore.kernel.org/r/165603872171.551046.913207574344536475.stgit@dwillia2-xfh Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Merge tag 'intel-pinctrl-v5.20-1' of gitolite.kernel.org:pub/scm/linux/kernel/git/pinctrl/intel into devel
intel-pinctrl for v5.20-1
* Update MAINTAINERS to set the Intel pin control status to Supported
* Switch Intel pin control drivers to use struct pingroup
The following is an automated git shortlog grouped by driver:
baytrail:
- Switch to to embedded struct pingroup
cherryview:
- Switch to to embedded struct pingroup
intel:
- Add Intel Meteor Lake pin controller support
- Drop no more used members of struct intel_pingroup
- Switch to to embedded struct pingroup
- Embed struct pingroup into struct intel_pingroup
lynxpoint:
- Switch to to embedded struct pingroup
MAINTAINERS:
- Update Intel pin control to Supported
Robert Marko [Fri, 24 Jun 2022 19:51:12 +0000 (21:51 +0200)]
pinctrl: qcom: spmi-gpio: make the irqchip immutable
Commit 6c846d026d49 ("gpio: Don't fiddle with irqchips marked as
immutable") added a warning to indicate if the gpiolib is altering the
internals of irqchips.
Following this change the following warning is now observed for the SPMI
PMIC pinctrl driver:
gpio gpiochip1: (200f000.spmi:pmic@0:gpio@c000): not an immutable chip, please consider fixing it!
Fix this by making the irqchip in the SPMI PMIC pinctrl driver immutable.
Signed-off-by: Robert Marko <robimarko@gmail.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Tested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20220624195112.894916-1-robimarko@gmail.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Darrick J. Wong [Sat, 9 Jul 2022 17:56:06 +0000 (10:56 -0700)]
xfs: use XFS_IFORK_Q to determine the presence of an xattr fork
Modify xfs_ifork_ptr to return a NULL pointer if the caller asks for the
attribute fork but i_forkoff is zero. This eliminates the ambiguity
between i_forkoff and i_af.if_present, which should make it easier to
understand the lifetime of attr forks.
While we're at it, remove the if_present checks around calls to
xfs_idestroy_fork and xfs_ifork_zap_attr since they can both handle attr
forks that have already been torn down.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Sat, 9 Jul 2022 17:56:06 +0000 (10:56 -0700)]
xfs: make inode attribute forks a permanent part of struct xfs_inode
Syzkaller reported a UAF bug a while back:
==================================================================
BUG: KASAN: use-after-free in xfs_ilock_attr_map_shared+0xe3/0xf6 fs/xfs/xfs_inode.c:127
Read of size 4 at addr ffff88802cec919c by task syz-executor262/2958
Memory state around the buggy address: ffff88802cec9080: fb fb fb fc fc fa fb fb fb fb fc fc fb fb fb fb ffff88802cec9100: fb fc fc fb fb fb fb fb fc fc fb fb fb fb fb fc
>ffff88802cec9180: fc fa fb fb fb fb fc fc fa fb fb fb fb fc fc fb
^ ffff88802cec9200: fb fb fb fb fc fc fb fb fb fb fb fc fc fb fb fb ffff88802cec9280: fb fb fc fc fa fb fb fb fb fc fc fa fb fb fb fb
==================================================================
The root cause of this bug is the unlocked access to xfs_inode.i_afp
from the getxattr code paths while trying to determine which ILOCK mode
to use to stabilize the xattr data. Unfortunately, the VFS does not
acquire i_rwsem when vfs_getxattr (or listxattr) call into the
filesystem, which means that getxattr can race with a removexattr that's
tearing down the attr fork and crash:
Regrettably, the VFS is much more lax about i_rwsem and getxattr than
is immediately obvious -- not only does it not guarantee that we hold
i_rwsem, it actually doesn't guarantee that we *don't* hold it either.
The getxattr system call won't acquire the lock before calling XFS, but
the file capabilities code calls getxattr with and without i_rwsem held
to determine if the "security.capabilities" xattr is set on the file.
Fixing the VFS locking requires a treewide investigation into every code
path that could touch an xattr and what i_rwsem state it expects or sets
up. That could take years or even prove impossible; fortunately, we
can fix this UAF problem inside XFS.
An earlier version of this patch used smp_wmb in xfs_attr_fork_remove to
ensure that i_forkoff is always zeroed before i_afp is set to null and
changed the read paths to use smp_rmb before accessing i_forkoff and
i_afp, which avoided these UAF problems. However, the patch author was
too busy dealing with other problems in the meantime, and by the time he
came back to this issue, the situation had changed a bit.
On a modern system with selinux, each inode will always have at least
one xattr for the selinux label, so it doesn't make much sense to keep
incurring the extra pointer dereference. Furthermore, Allison's
upcoming parent pointer patchset will also cause nearly every inode in
the filesystem to have extended attributes. Therefore, make the inode
attribute fork structure part of struct xfs_inode, at a cost of 40 more
bytes.
This patch adds a clunky if_present field where necessary to maintain
the existing logic of xattr fork null pointer testing in the existing
codebase. The next patch switches the logic over to XFS_IFORK_Q and it
all goes away.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Sat, 9 Jul 2022 17:56:05 +0000 (10:56 -0700)]
xfs: convert XFS_IFORK_PTR to a static inline helper
We're about to make this logic do a bit more, so convert the macro to a
static inline function for better typechecking and fewer shouty macros.
No functional changes here.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
Xiu Jianfeng [Tue, 14 Jun 2022 09:00:01 +0000 (17:00 +0800)]
apparmor: Fix memleak in aa_simple_write_to_buffer()
When copy_from_user failed, the memory is freed by kvfree. however the
management struct and data blob are allocated independently, so only
kvfree(data) cause a memleak issue here. Use aa_put_loaddata(data) to
fix this issue.
Fixes: a6a52579e52b5 ("apparmor: split load data into management struct and data blob") Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: John Johansen <john.johansen@canonical.com>
apparmor: fix reference count leak in aa_pivotroot()
The aa_pivotroot() function has a reference counting bug in a specific
path. When aa_replace_current_label() returns on success, the function
forgets to decrement the reference count of “target”, which is
increased earlier by build_pivotroot(), causing a reference leak.
Fix it by decreasing the refcount of “target” in that path.
Fixes: 2ea3ffb7782a ("apparmor: add mount mediation") Co-developed-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Co-developed-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn> Signed-off-by: John Johansen <john.johansen@canonical.com>
Yang Li [Thu, 17 Mar 2022 01:03:30 +0000 (09:03 +0800)]
apparmor: Fix some kernel-doc comments
Remove some warnings found by running scripts/kernel-doc,
which is caused by using 'make W=1'.
security/apparmor/domain.c:137: warning: Function parameter or member
'state' not described in 'label_compound_match'
security/apparmor/domain.c:137: warning: Excess function parameter
'start' description in 'label_compound_match'
security/apparmor/domain.c:1294: warning: Excess function parameter
'onexec' description in 'aa_change_profile'
Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Tue, 22 Feb 2022 10:09:20 +0000 (02:09 -0800)]
apparmor: Fix undefined reference to `zlib_deflate_workspacesize'
IF CONFIG_SECURITY_APPARMOR_EXPORT_BINARY is disabled, there remains
some unneed references to zlib, and can result in undefined symbol
references if ZLIB_INFLATE or ZLIB_DEFLATE are not defined.
Reported-by: kernel test robot <lkp@intel.com> Fixes: abfb9c0725f2 ("apparmor: make export of raw binary profile to userspace optional") Signed-off-by: John Johansen <john.johansen@canonical.com>
Tom Rix [Sun, 13 Feb 2022 21:32:28 +0000 (13:32 -0800)]
apparmor: fix aa_label_asxprint return check
Clang static analysis reports this issue
label.c:1802:3: warning: 2nd function call argument
is an uninitialized value
pr_info("%s", str);
^~~~~~~~~~~~~~~~~~
str is set from a successful call to aa_label_asxprint(&str, ...)
On failure a negative value is returned, not a -1. So change
the check.
Fixes: f1bd904175e8 ("apparmor: add the base fns() for domain labels") Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: John Johansen <john.johansen@canonical.com>
Yang Li [Sat, 29 Jan 2022 02:51:01 +0000 (10:51 +0800)]
apparmor: Fix some kernel-doc comments
Don't use /** for non-kernel-doc comments and change function name
aa_mangle_name to mangle_name in kernel-doc comment to Remove some
warnings found by running scripts/kernel-doc, which is caused by
using 'make W=1'.
security/apparmor/apparmorfs.c:1503: warning: Cannot understand *
on line 1503 - I thought it was a doc line
security/apparmor/apparmorfs.c:1530: warning: Cannot understand *
on line 1530 - I thought it was a doc line
security/apparmor/apparmorfs.c:1892: warning: Cannot understand *
on line 1892 - I thought it was a doc line
security/apparmor/apparmorfs.c:108: warning: expecting prototype for
aa_mangle_name(). Prototype was for mangle_name() instead
Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: John Johansen <john.johansen@canonical.com>
Yang Li [Sat, 29 Jan 2022 02:51:00 +0000 (10:51 +0800)]
apparmor: Fix some kernel-doc comments
Add the description of @ns_name, change function name aa_u16_chunck to
unpack_u16_chunk and verify_head to verify_header in kernel-doc comment
to remove warnings found by running scripts/kernel-doc, which is caused
by using 'make W=1'.
security/apparmor/policy_unpack.c:224: warning: expecting prototype for
aa_u16_chunck(). Prototype was for unpack_u16_chunk() instead
security/apparmor/policy_unpack.c:678: warning: Function parameter or
member 'ns_name' not described in 'unpack_profile'
security/apparmor/policy_unpack.c:950: warning: expecting prototype for
verify_head(). Prototype was for verify_header() instead
Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: John Johansen <john.johansen@canonical.com>
Yang Li [Sat, 29 Jan 2022 02:50:59 +0000 (10:50 +0800)]
apparmor: Fix match_mnt_path_str() and match_mnt() kernel-doc comment
Fix a spelling problem and change @mntpath to @path to remove warnings
found by running scripts/kernel-doc, which is caused by using 'make W=1'.
security/apparmor/mount.c:321: warning: Function parameter or member
'devname' not described in 'match_mnt_path_str'
security/apparmor/mount.c:321: warning: Excess function parameter
'devnme' description in 'match_mnt_path_str'
security/apparmor/mount.c:377: warning: Function parameter or member
'path' not described in 'match_mnt'
security/apparmor/mount.c:377: warning: Excess function parameter
'mntpath' description in 'match_mnt'
Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: John Johansen <john.johansen@canonical.com>
Make use of the struct_size() helper instead of an open-coded version,
in order to avoid any potential type mistakes or integer overflows that,
in the worst scenario, could lead to heap overflows.
Also, address the following sparse warnings:
security/apparmor/lib.c:139:23: warning: using sizeof on a flexible structure
Link: https://github.com/KSPP/linux/issues/174 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Tue, 25 Jan 2022 08:37:42 +0000 (00:37 -0800)]
apparmor: Fix failed mount permission check error message
When the mount check fails due to a permission check failure instead
of explicitly at one of the subcomponent checks, AppArmor is reporting
a failure in the flags match. However this is not true and AppArmor
can not attribute the error at this point to any particular component,
and should only indicate the mount failed due to missing permissions.
Fixes: 2ea3ffb7782a ("apparmor: add mount mediation") Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Thu, 29 Apr 2021 08:48:28 +0000 (01:48 -0700)]
apparmor: fix quiet_denied for file rules
Global quieting of denied AppArmor generated file events is not
handled correctly. Unfortunately the is checking if quieting of all
audit events is set instead of just denied events.
Fixes: 67012e8209df ("AppArmor: basic auditing infrastructure.") Signed-off-by: John Johansen <john.johansen@canonical.com>
Mike Salvatore [Wed, 8 Jul 2020 17:04:43 +0000 (13:04 -0400)]
apparmor: resolve uninitialized symbol warnings in policy_unpack_test.c
Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mike Salvatore <mike.salvatore@canonical.com> Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Fri, 5 Feb 2021 12:56:02 +0000 (04:56 -0800)]
apparmor: don't create raw_sha1 symlink if sha1 hashing is disabled
Currently if sha1 hashing of policy is disabled a sha1 hash symlink
to the non-existent file is created. There is now reason to create
the symlink in this case so don't do it.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Wed, 3 Feb 2021 09:35:12 +0000 (01:35 -0800)]
apparmor: Enable tuning of policy paranoid load for embedded systems
AppArmor by default does an extensive check on loaded policy that
can take quite some time on limited resource systems. Allow
disabling this check for embedded systems where system images are
readonly and have checksumming making the need for the embedded
policy to be fully checked to be redundant.
Note: basic policy checks are still done.
Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Mon, 1 Feb 2021 11:43:18 +0000 (03:43 -0800)]
apparmor: make export of raw binary profile to userspace optional
Embedded systems have limited space and don't need the introspection
or checkpoint restore capability provided by exporting the raw
profile binary data so make it so make it a config option.
This will reduce run time memory use and also speed up policy loads.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Yang Li [Fri, 10 Dec 2021 05:57:12 +0000 (13:57 +0800)]
lsm: Fix kernel-doc
Fix function name in lsm.c kernel-doc comment
to remove some warnings found by running scripts/kernel-doc,
which is caused by using 'make W=1'.
security/apparmor/lsm.c:819: warning: expecting prototype for
apparmor_clone_security(). Prototype was for
apparmor_sk_clone_security() instead
security/apparmor/lsm.c:923: warning: expecting prototype for
apparmor_socket_list(). Prototype was for apparmor_socket_listen()
instead
security/apparmor/lsm.c:1028: warning: expecting prototype for
apparmor_getsockopt(). Prototype was for apparmor_socket_getsockopt()
instead
security/apparmor/lsm.c:1038: warning: expecting prototype for
apparmor_setsockopt(). Prototype was for apparmor_socket_setsockopt()
instead
ecurity/apparmor/lsm.c:1061: warning: expecting prototype for
apparmor_socket_sock_recv_skb(). Prototype was for
apparmor_socket_sock_rcv_skb() instead
Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: John Johansen <john.johansen@canonical.com>
Yang Li [Wed, 17 Nov 2021 07:37:58 +0000 (15:37 +0800)]
apparmor: Fix kernel-doc
Fix function name in security/apparmor/label.c, policy.c, procattr.c
kernel-doc comment to remove some warnings found by clang(make W=1 LLVM=1).
security/apparmor/label.c:499: warning: expecting prototype for
aa_label_next_not_in_set(). Prototype was for
__aa_label_next_not_in_set() instead
security/apparmor/label.c:2147: warning: expecting prototype for
__aa_labelset_udate_subtree(). Prototype was for
__aa_labelset_update_subtree() instead
security/apparmor/policy.c:434: warning: expecting prototype for
aa_lookup_profile(). Prototype was for aa_lookupn_profile() instead
security/apparmor/procattr.c:101: warning: expecting prototype for
aa_setprocattr_chagnehat(). Prototype was for aa_setprocattr_changehat()
instead
Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: John Johansen <john.johansen@canonical.com>
John Johansen [Tue, 14 Dec 2021 10:59:28 +0000 (02:59 -0800)]
apparmor: fix absroot causing audited secids to begin with =
AppArmor is prefixing secids that are converted to secctx with the =
to indicate the secctx should only be parsed from an absolute root
POV. This allows catching errors where secctx are reparsed back into
internal labels.
Unfortunately because audit is using secid to secctx conversion this
means that subject and object labels can result in a very unfortunate
== that can break audit parsing.
eg. the subj==unconfined term in the below audit message
Fix this by switch the prepending of = to a _. This still works as a
special character to flag this case without breaking audit. Also move
this check behind debug as it should not be needed during normal
operqation.
Fixes: 26b7899510ae ("apparmor: add support for absolute root view based labels") Reported-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: John Johansen <john.johansen@canonical.com>
Dan Williams [Fri, 3 Jun 2022 23:43:48 +0000 (16:43 -0700)]
cxl/port: Keep port->uport valid for the entire life of a port
The upcoming region provisioning implementation has a need to
dereference port->uport during the port unregister flow. Specifically,
endpoint decoders need to be able to lookup their corresponding memdev
via port->uport.
The existing ->dead flag was added for cases where the core was
committed to tearing down the port, but needed to drop locks before
calling device_unregister(). Reuse that flag to indicate to
delete_endpoint() that it has no "release action" work to do as
unregister_port() will handle it.
We've added 94 non-merge commits during the last 19 day(s) which contain
a total of 125 files changed, 5141 insertions(+), 6701 deletions(-).
The main changes are:
1) Add new way for performing BTF type queries to BPF, from Daniel Müller.
2) Add inlining of calls to bpf_loop() helper when its function callback is
statically known, from Eduard Zingerman.
3) Implement BPF TCP CC framework usability improvements, from Jörn-Thorben Hinz.
4) Add LSM flavor for attaching per-cgroup BPF programs to existing LSM
hooks, from Stanislav Fomichev.
5) Remove all deprecated libbpf APIs in prep for 1.0 release, from Andrii Nakryiko.
6) Add benchmarks around local_storage to BPF selftests, from Dave Marchevsky.
7) AF_XDP sample removal (given move to libxdp) and various improvements around AF_XDP
selftests, from Magnus Karlsson & Maciej Fijalkowski.
8) Add bpftool improvements for memcg probing and bash completion, from Quentin Monnet.
9) Add arm64 JIT support for BPF-2-BPF coupled with tail calls, from Jakub Sitnicki.
10) Sockmap optimizations around throughput of UDP transmissions which have been
improved by 61%, from Cong Wang.
11) Rework perf's BPF prologue code to remove deprecated functions, from Jiri Olsa.
12) Fix sockmap teardown path to avoid sleepable sk_psock_stop, from John Fastabend.
13) Fix libbpf's cleanup around legacy kprobe/uprobe on error case, from Chuang Wang.
14) Fix libbpf's bpf_helpers.h to work with gcc for the case of its sec/pragma
macro, from James Hilliard.
15) Fix libbpf's pt_regs macros for riscv to use a0 for RC register, from Yixun Lan.
16) Fix bpftool to show the name of type BPF_OBJ_LINK, from Yafang Shao.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (94 commits)
selftests/bpf: Fix xdp_synproxy build failure if CONFIG_NF_CONNTRACK=m/n
bpf: Correctly propagate errors up from bpf_core_composites_match
libbpf: Disable SEC pragma macro on GCC
bpf: Check attach_func_proto more carefully in check_return_code
selftests/bpf: Add test involving restrict type qualifier
bpftool: Add support for KIND_RESTRICT to gen min_core_btf command
MAINTAINERS: Add entry for AF_XDP selftests files
selftests, xsk: Rename AF_XDP testing app
bpf, docs: Remove deprecated xsk libbpf APIs description
selftests/bpf: Add benchmark for local_storage RCU Tasks Trace usage
libbpf, riscv: Use a0 for RC register
libbpf: Remove unnecessary usdt_rel_ip assignments
selftests/bpf: Fix few more compiler warnings
selftests/bpf: Fix bogus uninitialized variable warning
bpftool: Remove zlib feature test from Makefile
libbpf: Cleanup the legacy uprobe_event on failed add/attach_event()
libbpf: Fix wrong variable used in perf_event_uprobe_open_legacy()
libbpf: Cleanup the legacy kprobe_event on failed add/attach_event()
selftests/bpf: Add type match test against kernel's task_struct
selftests/bpf: Add nested type to type based tests
...
====================
* tag 'i2c-for-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: cadence: Unregister the clk notifier in error path
i2c: piix4: Fix a memory leak in the EFCH MMIO support
drm/aperture: Run fbdev removal before internal helpers
Always run fbdev removal first to remove simpledrm via sysfb_disable().
This clears the internal state.
The later call to drm_aperture_detach_drivers() then does nothing.
Otherwise, with drm_aperture_detach_drivers() running first, the call to
sysfb_disable() uses inconsistent state.
Example backtrace show below:
BUG: KASAN: use-after-free in device_del+0x79/0x5f0
Read of size 8 at addr ffff888108185050 by task systemd-udevd/311
CPU: 0 PID: 311 Comm: systemd-udevd Tainted: G E 5.19.0-rc2-1-default+ #1689
Hardware name: HP ProLiant DL120 G7, BIOS J01 04/21/2011
Call Trace:
device_del+0x79/0x5f0
platform_device_del.part.0+0x19/0xe0
platform_device_unregister+0x1c/0x30
sysfb_disable+0x2d/0x70
remove_conflicting_framebuffers+0x1c/0xf0
remove_conflicting_pci_framebuffers+0x130/0x1a0
drm_aperture_remove_conflicting_pci_framebuffers+0x86/0xb0
mgag200_pci_probe+0x2d/0x140 [mgag200]
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Fixes: 873eb3b11860 ("fbdev: Disable sysfb device registration when removing conflicting FBs") Cc: Javier Martinez Canillas <javierm@redhat.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Helge Deller <deller@gmx.de> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Zhen Lei <thunder.leizhen@huawei.com> Cc: Changcheng Deng <deng.changcheng@zte.com.cn> Reviewed-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andre Przywara [Fri, 8 Jul 2022 10:52:35 +0000 (11:52 +0100)]
arm64: dts: allwinner: h616: Add X96 Mate TV box support
The X96 Mate is an Allwinner H616 based TV box, featuring:
- Four ARM Cortex-A53 cores, Mali-G31 MP2 GPU
- 2GiB/4GiB RAM (fully usable!)
- 16/32/64GiB eMMC
- 100Mbps Ethernet (via embedded AC200 EPHY, not yet supported)
- Unsupported Allwinner WiFi chip
- 2 x USB 2.0 host ports
- HDMI port
- IR receiver
- 5V/2A DC power supply via barrel plug
Add a basic devicetree for it, with SD card and eMMC working, as
well as serial and the essential peripherals, like the AXP PMIC.
This DT is somewhat minimal, and should work on many other similar TV
boxes with the Allwinner H616 chip.
Andre Przywara [Fri, 8 Jul 2022 10:52:34 +0000 (11:52 +0100)]
arm64: dts: allwinner: h616: Add OrangePi Zero 2 board support
The OrangePi Zero 2 is a development board with the new H616 SoC. It
comes with the following features:
- Four ARM Cortex-A53 cores, Mali-G31 MP2 GPU
- 512MiB/1GiB DDR3 DRAM
- AXP305 PMIC
- Raspberry-Pi-1 compatible GPIO header
- extra 13 pin expansion header, exposing pins for 2x USB 2.0 ports
- 1 USB 2.0 host port
- 1 USB 2.0 type C port (power supply + OTG)
- MicroSD slot
- on-board 2MiB bootable SPI NOR flash
- 1Gbps Ethernet port (via RTL8211F PHY)
- micro-HDMI port
- (yet) unsupported Allwinner WiFi/BT chip
Add the devicetree file describing the currently supported features.
Andre Przywara [Fri, 8 Jul 2022 10:52:33 +0000 (11:52 +0100)]
dt-bindings: arm: sunxi: Add two H616 board compatible strings
This adds the two board compatible strings of two boards with the
Allwinner H616 SoC. One is a development board from OrangePi, the other
some TV box from some formerly unused vendor. Add that vendor to the
vendor list on the way.
Andre Przywara [Fri, 8 Jul 2022 10:52:32 +0000 (11:52 +0100)]
dt-bindings: pinctrl: sunxi: allow vcc-pi-supply
The Allwinner H616 SoC contains a VCC_PI pin, which supplies the voltage
for GPIO port I.
Extend the range of supply port names to include vcc-pi-supply to cover
that.
This (relatively) new SoC is similar to the H6, but drops the (broken)
PCIe support and the USB 3.0 controller. It also gets the management
controller removed, which in turn removes *some*, but not all of the
devices formerly dedicated to the ARISC (CPUS).
And while there is still the extra sunxi interrupt controller, the
package lacks the corresponding NMI pin, so no interrupts for the PMIC.
The reserved memory node is actually handled by Trusted Firmware now,
but U-Boot fails to propagate this to a separately loaded DTB, so we
keep it in here for now, until U-Boot learns to do this properly.
Andre Przywara [Fri, 8 Jul 2022 10:52:30 +0000 (11:52 +0100)]
dt-bindings: pinctrl: sunxi: Make interrupts optional
The R_PIO pinctrl device on the Allwinner H616 SoC does not have an
interrupt (it features only two pins).
However the binding requires at least naming one upstream interrupt,
plus the #interrupt-cells and interrupt-controller properties.
Drop the unconditional requirement for the interrupt properties, and
make them dependent on being not this particular pinctrl device.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Andrey Strachuk <strochuk@ispras.ru> Fixes: 4d0cdd2bb8f0 ("xfs: clean up xfs_attr_node_hasname") Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Eric Sandeen [Sat, 9 Jul 2022 17:56:02 +0000 (10:56 -0700)]
xfs: add selinux labels to whiteout inodes
We got a report that "renameat2() with flags=RENAME_WHITEOUT doesn't
apply an SELinux label on xfs" as it does on other filesystems
(for example, ext4 and tmpfs.) While I'm not quite sure how labels
may interact w/ whiteout files, leaving them as unlabeled seems
inconsistent at best. Now that xfs_init_security is not static,
rename it to xfs_inode_init_security per dchinner's suggestion.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Sat, 9 Jul 2022 17:55:44 +0000 (10:55 -0700)]
Merge tag 'xfs-perag-conv-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs into xfs-5.20-mergeA
xfs: per-ag conversions for 5.20
This series drives the perag down into the AGI, AGF and AGFL access
routines and unifies the perag structure initialisation with the
high level AG header read functions. This largely replaces the
xfs_mount/agno pair that is passed to all these functions with a
perag, and in most places we already have a perag ready to pass in.
There are a few places where perags need to be grabbed before
reading the AG header buffers - some of these will need to be driven
to higher layers to ensure we can run operations on AGs without
getting stuck part way through waiting on a perag reference.
The latter section of this patchset moves some of the AG geometry
information from the xfs_mount to the xfs_perag, and starts
converting code that requires geometry validation to use a perag
instead of a mount and having to extract the AGNO from the object
location. This also allows us to store the AG size in the perag and
then we can stop having to compare the agno against sb_agcount to
determine if the AG is the last AG and so has a runt size. This
greatly simplifies some of the type validity checking we do and
substantially reduces the CPU overhead of type validity checking. It
also cuts over 1.2kB out of the binary size.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* tag 'xfs-perag-conv-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: make is_log_ag() a first class helper
xfs: replace xfs_ag_block_count() with perag accesses
xfs: Pre-calculate per-AG agino geometry
xfs: Pre-calculate per-AG agbno geometry
xfs: pass perag to xfs_alloc_read_agfl
xfs: pass perag to xfs_alloc_put_freelist
xfs: pass perag to xfs_alloc_get_freelist
xfs: pass perag to xfs_read_agf
xfs: pass perag to xfs_read_agi
xfs: pass perag to xfs_alloc_read_agf()
xfs: kill xfs_alloc_pagf_init()
xfs: pass perag to xfs_ialloc_read_agi()
xfs: kill xfs_ialloc_pagi_init()
xfs: make last AG grow/shrink perag centric
Darrick J. Wong [Sat, 9 Jul 2022 17:55:21 +0000 (10:55 -0700)]
Merge tag 'xfs-cil-scale-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs into xfs-5.20-mergeA
xfs: improve CIL scalability
This series aims to improve the scalability of XFS transaction
commits on large CPU count machines. My 32p machine hits contention
limits in xlog_cil_commit() at about 700,000 transaction commits a
section. It hits this at 16 thread workloads, and 32 thread
workloads go no faster and just burn CPU on the CIL spinlocks.
This patchset gets rid of spinlocks and global serialisation points
in the xlog_cil_commit() path. It does this by moving to a
combination of per-cpu counters, unordered per-cpu lists and
post-ordered per-cpu lists.
This results in transaction commit rates exceeding 1.4 million
commits/s under unlink certain workloads, and while the log lock
contention is largely gone there is still significant lock
contention in the VFS (dentry cache, inode cache and security layers)
at >600,000 transactions/s that still limit scalability.
The changes to the CIL accounting and behaviour, combined with the
structural changes to xlog_write() in prior patchsets make the
per-cpu restructuring possible and sane. This allows us to move to
precalculated reservation requirements that allow for reservation
stealing to be accounted across multiple CPUs accurately.
That is, instead of trying to account for continuation log opheaders
on a "growth" basis, we pre-calculate how many iclogs we'll need to
write out a maximally sized CIL checkpoint and steal that reserveD
that space one commit at a time until the CIL has a full
reservation. If we ever run a commit when we are already at the hard
limit (because post-throttling) we simply take an extra reservation
from each commit that is run when over the limit. Hence we don't
need to do space usage math in the fast path and so never need to
sum the per-cpu counters in this fast path.
Similarly, per-cpu lists have the problem of ordering - we can't
remove an item from a per-cpu list if we want to move it forward in
the CIL. We solve this problem by using an atomic counter to give
every commit a sequence number that is copied into the log items in
that transaction. Hence relogging items just overwrites the sequence
number in the log item, and does not move it in the per-cpu lists.
Once we reaggregate the per-cpu lists back into a single list in the
CIL push work, we can run it through list-sort() and reorder it back
into a globally ordered list. This costs a bit of CPU time, but now
that the CIL can run multiple works and pipelines properly, this is
not a limiting factor for performance. It does increase fsync
latency when the CIL is full, but workloads issuing large numbers of
fsync()s or sync transactions end up with very small CILs and so the
latency impact or sorting is not measurable for such workloads.
OVerall, this pushes the transaction commit bottleneck out to the
lockless reservation grant head updates. These atomic updates don't
start to be a limiting fact until > 1.5 million transactions/s are
being run, at which point the accounting functions start to show up
in profiles as the highest CPU users. Still, this series doubles
transaction throughput without increasing CPU usage before we get
to that cacheline contention breakdown point...
` Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* tag 'xfs-cil-scale-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: expanding delayed logging design with background material
xfs: xlog_sync() manually adjusts grant head space
xfs: avoid cil push lock if possible
xfs: move CIL ordering to the logvec chain
xfs: convert log vector chain to use list heads
xfs: convert CIL to unordered per cpu lists
xfs: Add order IDs to log items in CIL
xfs: convert CIL busy extents to per-cpu
xfs: track CIL ticket reservation in percpu structure
xfs: implement percpu cil space used calculation
xfs: introduce per-cpu CIL tracking structure
xfs: rework per-iclog header CIL reservation
xfs: lift init CIL reservation out of xc_cil_lock
xfs: use the CIL space used counter for emptiness checks
Merge tag 'powerpc-5.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fix from Michael Ellerman:
- On Power8 bare metal, fix creation of RNG platform devices, which are
needed for the /dev/hwrng driver to probe correctly.
Thanks to Jason A. Donenfeld, and Sachin Sant.
* tag 'powerpc-5.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/powernv: delay rng platform device creation until later in boot
Old code resets SIE for up to 8 streams using byte accessor, but
register is laid out in following way:
31 GIE
30 CIE
29:x Reserved
x-1:0 SIE
If there is more than 8 streams, some of them may and up with enabled
interrupts. To fix this just clear whole INTCTL register when disabling
interrupts.
ALSA: hda: Fix page fault in snd_hda_codec_shutdown()
If early probe of HDAudio bus driver fails e.g.: due to missing
firmware file, snd_hda_codec_shutdown() ends in manipulating
uninitialized codec->pcm_list_head causing page fault.
Iinitialization of HDAudio codec in ASoC is split in two:
- snd_hda_codec_device_init()
- snd_hda_codec_device_new()
snd_hda_codec_device_init() is called during probe_codecs() by HDAudio
bus driver while snd_hda_codec_device_new() is called by
codec-component's ->probe(). The second call will not happen until all
components required by related sound card are present within the ASoC
framework. With firmware failing to load during the PCI's deferred
initialization i.e.: probe_work(), no platform components are ever
registered. HDAudio codec enumeration is done at that point though, so
the codec components became registered to ASoC framework, calling
snd_hda_codec_device_init() in the process.
Now, during platform reboot snd_hda_codec_shutdown() is called for every
codec found on the HDAudio bus causing oops if any of them has not
completed both of their initialization steps. Relocating field
initialization fixes the issue.
ALSA: hda: Fix put_device() inconsistency in error path
AVS HDAudio bus driver does not tie with codec drivers tighly. Codec
device and its respective driver cleanup procedures are split and may
not occur one after the other. Device cleanup is performed only on
snd_hdac_ext_bus_device_remove() i.e. it's the bus driver's
responsibility. If codec component probing fails, put_device() found in
snd_hda_codec_device_new() may lead to page fault. Relocate it to
snd_hda_codec_new() to address the problem on ASoC side while keeping
status quo for snd_hda_intel.
ALSA: hda: Make device usage_count consistent across subsequent probing
AVS HDAudio bus driver does not tie with codec drivers tighly and
snd_hda_codec_device_new() can be called after codec's module reload. In
such case, rpm is forbidden and invoking pm_runtime_forbid()
unconditionally causes device's usage_count to become unbalanced. This
is later caught by WARN_ON() found in sound/soc/hda.c. Detect such
circumstance and bump the usage_count instead.
ALSA: hda: Fix null-ptr-deref when i915 fails and hdmi is denylisted
If snd_hda_hdmi_codec module is denylisted and any event causes i915
enumeration to fail, is_likely_hdmi_codec() ends in null-ptr-deref.
As snd_soc_hda is an ASoC-based driver, its initialization is delayed
until all the necessary components appear in the system - allowing
actual sound card to enumerate. snd_hda_codec_configure() gets called by
the avs-driver core during probe_codecs() but the
snd_hda_codec_device_new(), necessary to complete codecs initialization,
happens only when codec-component of hda sound card is being probed.
Denylisting snd_hda_codec_hdmi module causes snd_hda_codec_configure()
to reach: codec_bind_generic() -> is_likely_hdmi_codec() which makes use
of ->wcaps and at this point the it isn't initialized yet - again,
requires completion of snd_hda_codec_device_new().
netfilter: nf_tables: replace BUG_ON by element length check
BUG_ON can be triggered from userspace with an element with a large
userdata area. Replace it by length check and return EINVAL instead.
Over time extensions have been growing in size.
Pick a sufficiently old Fixes: tag to propagate this fix.
Fixes: 7d7402642eaf ("netfilter: nf_tables: variable sized set element keys / data") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>