DMA mapping might fail, we have to check it with dma_mapping_error().
Otherwise DMA-API is not happy:
DMA-API: pch_udc 0000:02:02.4: device driver failed to check map error[device address=0x00000000027ee678] [size=64 bytes] [mapped as single]
Fixes: abab0c67c061 ("usb: pch_udc: Fixed issue which does not work with g_serial") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20210323153626.54908-3-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Either way ~0 will be in the correct byte order, hence
replace cpu_to_le32() by lower_32_bits(). Moreover,
it makes sparse happy, otherwise it complains:
.../pch_udc.c:1813:27: warning: incorrect type in assignment (different base types)
.../pch_udc.c:1813:27: expected unsigned int [usertype] dataptr
.../pch_udc.c:1813:27: got restricted __le32 [usertype]
Currently, the late microcode loading mechanism checks whether any CPUs
are offlined, and, in such a case, aborts the load attempt.
However, this must be done before the kernel caches new microcode from
the filesystem. Otherwise, when offlined CPUs are onlined later, those
cores are going to be updated through the CPU hotplug notifier callback
with the new microcode, while CPUs previously onine will continue to run
with the older microcode.
The rationale for why the update is aborted when at least one primary
thread is offline is because even if that thread is soft-offlined
and idle, it will still have to participate in broadcasted MCE's
synchronization dance or enter SMM, and in both examples it will execute
instructions so it better have the same microcode revision as the other
cores.
[ bp: Heavily edit and extend commit message with the reasoning behind all
this. ]
Fixes: 30ec26da9967 ("x86/microcode: Do not upload microcode if CPUs are offline") Signed-off-by: Otavio Pontes <otavio.pontes@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Ashok Raj <ashok.raj@intel.com> Link: https://lkml.kernel.org/r/20210319165515.9240-2-otavio.pontes@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
In qcom_probe_nand_devices() function, the error code returned by
qcom_nand_host_init_and_register() is converted to -ENODEV in the case
of failure. This poses issue if -EPROBE_DEFER is returned when the
dependency is not available for a component like parser.
So let's restructure the error handling logic a bit and return the
actual error code in case of qcom_nand_host_init_and_register() failure.
There are chances that the parse_mtd_partitions() function will return
-EPROBE_DEFER in mtd_device_parse_register(). This might happen when
the dependency is not available for the parser. For instance, on SDX55
the MTD_QCOMSMEM_PARTS parser depends on the QCOM_SMEM driver to parse
the partitions defined in the shared memory region. With the current
flow, the error returned from parse_mtd_partitions() will be discarded
in favor of trying to add the fallback partition.
This will prevent the driver to end up in probe deferred pool and the
partitions won't be parsed even after the QCOM_SMEM driver is available.
Fix this issue by bailing out of mtd_device_parse_register() when
-EPROBE_DEFER error is returned from parse_mtd_partitions() function and
propagate the error code to the driver core for probing later.
Hamming ECC doesn't cover the OOB data, so reading or writing OOB shall
always be done without ECC enabled.
This is a problem when adding JFFS2 cleanmarkers to erased blocks. If JFFS2
clenmarkers are added to the OOB with ECC enabled, OOB bytes will be changed
from ff ff ff to 00 00 00, reporting incorrect ECC errors.
There is a upstream commit cffa4b2122f5("regmap:debugfs:
Fix a memory leak when calling regmap_attach_dev") that
adds a if condition when create name for debugfs_name.
With below function invoking logical, debugfs_name is
freed in regmap_debugfs_exit(), but it is not created again
because of the if condition introduced by above commit.
regmap_reinit_cache()
regmap_debugfs_exit()
...
regmap_debugfs_init()
So, set debugfs_name to NULL after it is freed.
Fixes: cffa4b2122f5 ("regmap: debugfs: Fix a memory leak when calling regmap_attach_dev") Signed-off-by: Meng Li <Meng.Li@windriver.com> Link: https://lore.kernel.org/r/20210226021737.7690-1-Meng.Li@windriver.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
While interpreting CC_STATUS, ROLE_CONTROL has to be read to make
sure that CC1/CC2 is not forced presenting Rp/Rd.
>From the TCPCI spec:
4.4.5.2 ROLE_CONTROL (Normative):
The TCPM shall write B6 (DRP) = 0b and B3..0 (CC1/CC2) if it wishes
to control the Rp/Rd directly instead of having the TCPC perform
DRP toggling autonomously. When controlling Rp/Rd directly, the
TCPM writes to B3..0 (CC1/CC2) each time it wishes to change the
CC1/CC2 values. This control is used for TCPM-TCPC implementing
Source or Sink only as well as when a connection has been detected
via DRP toggling but the TCPM wishes to attempt Try.Src or Try.Snk.
In "tx_empty", we should poll TC bit in both DMA and PIO modes (instead of
TXE) to check transmission data register has been transmitted independently
of the FIFO mode. TC indicates that both transmit register and shift
register are empty. When shift register is empty, tx_empty should return
TIOCSER_TEMT instead of TC value.
Cleans the USART_CR_TC TCCF register define (transmission complete clear
flag) as it is duplicate of USART_ICR_TCCF.
Incorrect characters are observed on console during boot. This issue occurs
when init/main.c is modifying termios settings to open /dev/console on the
rootfs.
This patch adds a waiting loop in set_termios to wait for TX shift register
empty (and TX FIFO if any) before stopping serial port.
The Maxim PMIC datasheets describe the interrupt line as active low
with a requirement of acknowledge from the CPU. Without specifying the
interrupt type in Devicetree, kernel might apply some fixed
configuration, not necessarily working for this hardware.
Additionally, the interrupt line is shared so using level sensitive
interrupt is here especially important to avoid races.
The Maxim PMIC datasheets describe the interrupt line as active low
with a requirement of acknowledge from the CPU. Without specifying the
interrupt type in Devicetree, kernel might apply some fixed
configuration, not necessarily working for this hardware.
Additionally, the interrupt line is shared so using level sensitive
interrupt is here especially important to avoid races.
Fixes: 47580e8d94c2 ("ARM: dts: Specify MAX77686 pmic interrupt for exynos5250-smdk5250") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Link: https://lore.kernel.org/r/20201210212534.216197-8-krzk@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
The Maxim PMIC datasheets describe the interrupt line as active low
with a requirement of acknowledge from the CPU. Without specifying the
interrupt type in Devicetree, kernel might apply some fixed
configuration, not necessarily working for this hardware.
Additionally, the interrupt line is shared so using level sensitive
interrupt is here especially important to avoid races.
The Maxim PMIC datasheets describe the interrupt line as active low
with a requirement of acknowledge from the CPU. Without specifying the
interrupt type in Devicetree, kernel might apply some fixed
configuration, not necessarily working for this hardware.
Additionally, the interrupt line is shared so using level sensitive
interrupt is here especially important to avoid races.
The Maxim MUIC datasheets describe the interrupt line as active low
with a requirement of acknowledge from the CPU. Without specifying the
interrupt type in Devicetree, kernel might apply some fixed
configuration, not necessarily working for this hardware.
Additionally, the interrupt line is shared so using level sensitive
interrupt is here especially important to avoid races.
The Maxim fuel gauge datasheets describe the interrupt line as active
low with a requirement of acknowledge from the CPU. The falling edge
interrupt will mostly work but it's not correct.
Currently the array gpmc_cs is indexed by cs before it cs is range checked
and the pointer read from this out-of-index read is dereferenced. Fix this
by performing the range check on cs before the read and the following
pointer dereference.
Addresses-Coverity: ("Negative array index read") Fixes: 9ed7a776eb50 ("ARM: OMAP2+: Fix support for multiple devices on a GPMC chip select") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Tony Lindgren <tony@atomide.com> Link: https://lore.kernel.org/r/20210223193821.17232-1-colin.king@canonical.com Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The commit d3cb25a12138 ("usb: gadget: udc: fix spin_lock in pch_udc")
obviously was not thought through and had made the situation even worse
than it was before. Two changes after almost reverted it. but a few
leftovers have been left as it. With this revert d3cb25a12138 completely.
While at it, narrow down the scope of unlocked section to prevent
potential race when prot_stall is assigned.
Smatch complains about missing that the ovl_override_creds() doesn't
have a matching revert_creds() if the dentry is disconnected. Fix this
by moving the ovl_override_creds() until after the disconnected check.
Fixes: aa3ff3c152ff ("ovl: copy up of disconnected dentries") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
store_regs_fmt2() has an ordering problem: first the guarded storage
facility is enabled on the local cpu, then preemption disabled, and
then the STGSC (store guarded storage controls) instruction is
executed.
If the process gets scheduled away between enabling the guarded
storage facility and before preemption is disabled, this might lead to
a special operation exception and therefore kernel crash as soon as
the process is scheduled back and the STGSC instruction is executed.
Fixes: 4e0b1ab72b8a ("KVM: s390: gs support for kvm guests") Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Cc: <stable@vger.kernel.org> # 4.12 Link: https://lore.kernel.org/r/20210415080127.1061275-1-hca@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Split kvm_s390_logical_to_effective to a generic function called
_kvm_s390_logical_to_effective. The new function takes a PSW and an address
and returns the address with the appropriate bits masked off. The old
function now calls the new function with the appropriate PSW from the vCPU.
This is needed to avoid code duplication for vSIE.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: stable@vger.kernel.org # for VSIE: correctly handle MVPG when in VSIE Link: https://lore.kernel.org/r/20210302174443.514363-2-imbrenda@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Initialize MSR_TSC_AUX with CPU node information if RDTSCP or RDPID is
supported. This fixes a bug where vdso_read_cpunode() will read garbage
via RDPID if RDPID is supported but RDTSCP is not. While no known CPU
supports RDPID but not RDTSCP, both Intel's SDM and AMD's APM allow for
RDPID to exist without RDTSCP, e.g. it's technically a legal CPU model
for a virtual machine.
Note, technically MSR_TSC_AUX could be initialized if and only if RDPID
is supported since RDTSCP is currently not used to retrieve the CPU node.
But, the cost of the superfluous WRMSR is negigible, whereas leaving
MSR_TSC_AUX uninitialized is just asking for future breakage if someone
decides to utilize RDTSCP.
Fixes: a582c540ac1b ("x86/vdso: Use RDPID in preference to LSL when available") Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210504225632.1532621-2-seanjc@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The quirk entry for Uniwill ECS M31EI is with the PCI SSID device 0,
which means matching with all. That is, it's essentially equivalent
with SND_PCI_QUIRK_VENDOR(0x1584), which also matches with the
previous entry for Haier W18 applying the very same quirk.
Let's unify them with the single vendor-quirk entry.
Just re-order the alc269_fixup_tbl[] entries for Lenovo devices for
avoiding the oversight of the duplicated or unapplied item in future.
No functional changes.
Also Cc-to-stable for the further patch applications.
Just re-order the alc269_fixup_tbl[] entries for Sony devices for
avoiding the oversight of the duplicated or unapplied item in future.
No functional changes.
Also Cc-to-stable for the further patch applications.
Just re-order the alc269_fixup_tbl[] entries for Dell devices for
avoiding the oversight of the duplicated or unapplied item in future.
No functional changes.
Also Cc-to-stable for the further patch applications.
Just re-order the alc269_fixup_tbl[] entries for HP devices for
avoiding the oversight of the duplicated or unapplied item in future.
No functional changes.
Formerly, some entries were grouped for the actual codec, but this
doesn't seem reasonable to keep in that way. So now we simply keep
the PCI SSID order for the whole.
Also Cc-to-stable for the further patch applications.
Just re-order the alc882_fixup_tbl[] entries for Clevo devices for
avoiding the oversight of the duplicated or unapplied item in future.
No functional changes.
Also, user lower hex letters in the entry.
Also Cc-to-stable for the further patch applications.
Just re-order the alc882_fixup_tbl[] entries for Sony devices for
avoiding the oversight of the duplicated or unapplied item in future.
No functional changes.
Also Cc-to-stable for the further patch applications.
Just re-order the alc882_fixup_tbl[] entries for Acer devices for
avoiding the oversight of the duplicated or unapplied item in future.
No functional changes.
Also Cc-to-stable for the further patch applications.
Currently the ioctl command RADEON_INFO_SI_BACKEND_ENABLED_MASK can
copy back uninitialised data in value_tmp that pointer *value points
to. This can occur when rdev->family is less than CHIP_BONAIRE and
less than CHIP_TAHITI. Fix this by adding in a missing -EINVAL
so that no invalid value is copied back to userspace.
Addresses-Coverity: ("Uninitialized scalar variable) Cc: stable@vger.kernel.org # 3.13+ Fixes: 439a1cfffe2c ("drm/radeon: expose render backend mask to the userspace") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we overflow the maximum number of BSS entries and free the
new entry, drop it from any hidden_list that it may have been
added to in the code above or in cfg80211_combine_bsses().
commit d3374825ce57 ("md: make devices disappear when they are no longer
needed.") introduced protection between mddev creating & removing. The
md_open shouldn't create mddev when all_mddevs list doesn't contain
mddev. With currently code logic, there will be very easy to trigger
soft lockup in non-preempt env.
This patch changes md_open returning from -ERESTARTSYS to -EBUSY, which
will break the infinitely retry when md_open enter racing area.
This patch is partly fix soft lockup issue, full fix needs mddev_find
is split into two functions: mddev_find & mddev_find_or_alloc. And
md_open should call new mddev_find (it only does searching job).
For more detail, please refer with Christoph's "split mddev_find" patch
in later commits.
Split mddev_find into a simple mddev_find that just finds an existing
mddev by the unit number, and a more complicated mddev_find that deals
with find or allocating a mddev.
This turns out to fix this bug reported by Zhao Heming.
----------------------------- snip ------------------------------
commit d3374825ce57 ("md: make devices disappear when they are no longer
needed.") introduced protection between mddev creating & removing. The
md_open shouldn't create mddev when all_mddevs list doesn't contain
mddev. With currently code logic, there will be very easy to trigger
soft lockup in non-preempt env.
Fixes: dbb64f8635f5d ("md-cluster: Fix adding of new disk with new reload code") Fixes: 659b254fa7392 ("md-cluster: remove a disk asynchronously from cluster environment") Cc: stable@vger.kernel.org Reviewed-by: Gang He <ghe@suse.com> Signed-off-by: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
bio in the above stack is a bitmap write whose completion is invoked after
the tear down sequence sets the mddev structure to NULL in rdev.
During tear down, there is an attempt to flush the bitmap writes, but for
external bitmaps, there is no explicit wait for all the bitmap writes to
complete. For instance, md_bitmap_flush() is called to flush the bitmap
writes, but the last call to md_bitmap_daemon_work() in md_bitmap_flush()
could generate new bitmap writes for which there is no explicit wait to
complete those writes. The call to md_bitmap_update_sb() will return
simply for external bitmaps and the follow-up call to md_update_sb() is
conditional and may not get called for external bitmaps. This results in a
kernel panic when the completion routine, super_written() is called which
tries to reference mddev in the rdev that has been set to
NULL(in unbind_rdev_from_array() by tear down sequence).
The solution is to call md_super_wait() for external bitmaps after the
last call to md_bitmap_daemon_work() in md_bitmap_flush() to ensure there
are no pending bitmap writes before proceeding with the tear down.
Cc: stable@vger.kernel.org Signed-off-by: Sudhakar Panneerselvam <sudhakar.panneerselvam@oracle.com> Reviewed-by: Zhao Heming <heming.zhao@suse.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
KMSAN complains that the vmci_use_ppn64() == false path in
vmci_dbell_register_notification_bitmap() left upper 32bits of
bitmap_set_msg.bitmap_ppn64 member uninitialized.
Local variable ----bitmap_set_msg@vmci_dbell_register_notification_bitmap created at:
vmci_dbell_register_notification_bitmap+0x50/0x1e0
vmci_dbell_register_notification_bitmap+0x50/0x1e0
Bytes 28-31 of 32 are uninitialized
Memory access of size 32 starts at ffff88810098f570
=====================================================
Before this commit lis3lv02d_get_pwron_wait() had a WARN_ONCE() to catch
a potential divide by 0. WARN macros should only be used to catch internal
kernel bugs and that is not the case here. We have been receiving a lot of
bug reports about kernel backtraces caused by this WARN.
The div value being checked comes from the lis3->odrs[] array. Which
is sized to be a power-of-2 matching the number of bits in lis3->odr_mask.
The only lis3 model where this array is not entirely filled with non zero
values. IOW the only model where we can hit the div == 0 check is the
3dc ("8 bits 3DC sensor") model:
Note the 0 value at index 0, according to the datasheet an odr index of 0
means "Power-down mode". HP typically uses a lis3 accelerometer for HDD
fall protection. What I believe is happening here is that on newer
HP devices, which only contain a SDD, the BIOS is leaving the lis3 device
powered-down since it is not used for HDD fall protection.
Note that the lis3_3dc_rates array initializer only specifies 10 values,
which matches the datasheet. So it also contains 6 zero values at the end.
Replace the WARN with a normal check, which treats an odr index of 0
as power-down and uses a normal dev_err() to report the error in case
odr index point past the initialized part of the array.
Whilst running some basic tests as part of writing up the dt-bindings for
this driver (to follow), it became clear it doesn't actually load
currently.
iio iio:device1: tried to double register : in_incli_x_index
adis16201 spi0.0: Failed to create buffer sysfs interfaces
adis16201: probe of spi0.0 failed with error -16
Looks like a cut and paste / update bug. Fixes tag obviously not accurate
but we don't want to bother carry thing back to before the driver moved
out of staging.
Fixes: 591298e54cea ("Staging: iio: accel: adis16201: Move adis16201 driver out of staging") Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: <Stable@vger.kernel.org> Cc: Himanshu Jha <himanshujha199640@gmail.com> Cc: Nuno Sá <nuno.sa@analog.com> Reviewed-by: Alexandru Ardelean <ardeleanalex@gmail.com> Link: https://lore.kernel.org/r/20210321182956.844652-1-jic23@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Recent versions of the PCI Express specification have deprecated support
for I/O transactions and actually some PCIe host bridges, such as Power
Systems Host Bridge 4 (PHB4), do not implement them.
For those systems the PCI BARs that request a mapping in the I/O space
have the length recorded in the corresponding PCI resource set to zero,
which makes it unassigned:
# lspci -s 0031:02:04.0 -v
0031:02:04.0 FDDI network controller: Digital Equipment Corporation PCI-to-PDQ Interface Chip [PFI] FDDI (DEFPA) (rev 02)
Subsystem: Digital Equipment Corporation FDDIcontroller/PCI (DEFPA)
Flags: bus master, medium devsel, latency 136, IRQ 57, NUMA node 8
Memory at 620c080020000 (32-bit, non-prefetchable) [size=128]
I/O ports at <unassigned> [disabled]
Memory at 620c080030000 (32-bit, non-prefetchable) [size=64K]
Capabilities: [50] Power Management version 2
Kernel driver in use: defxx
Kernel modules: defxx
#
Regardless the driver goes ahead and requests it (here observed with a
Raptor Talos II POWER9 system), resulting in an odd /proc/ioport entry:
Furthermore, the system gets confused as the driver actually continues
and pokes at those locations, causing a flood of messages being output
to the system console by the underlying system firmware, like:
defxx: v1.11 2014/07/01 Lawrence V. Stefani and others
defxx 0031:02:04.0: enabling device (0140 -> 0142)
LPC[000]: Got SYNC no-response error. Error address reg: 0xd0010000
IPMI: dropping non severe PEL event
LPC[000]: Got SYNC no-response error. Error address reg: 0xd0010014
IPMI: dropping non severe PEL event
LPC[000]: Got SYNC no-response error. Error address reg: 0xd0010014
IPMI: dropping non severe PEL event
and so on and so on (possibly intermixed actually, as there's no locking
between the kernel and the firmware in console port access with this
particular system, but cleaned up above for clarity), and once some 10k
of such pairs of the latter two messages have been produced an interace
eventually shows up in a useless state:
This was not expected to happen as resource handling was added to the
driver a while ago, because it was not known at that time that a PCI
system would be possible that cannot assign port I/O resources, and
oddly enough `request_region' does not fail, which would have caught it.
Correct the problem then by checking for the length of zero for the CSR
resource and bail out gracefully refusing to register an interface if
that turns out to be the case, producing messages like:
defxx: v1.11 2014/07/01 Lawrence V. Stefani and others
0031:02:04.0: Cannot use I/O, no address set, aborting
0031:02:04.0: Recompile driver with "CONFIG_DEFXX_MMIO=y"
Keep the original check for the EISA MMIO resource as implemented,
because in that case the length is hardwired to 0x400 as a consequence
of how the compare/mask address decoding works in the ESIC chip and it
is only the base address that is set to zero if MMIO has been disabled
for the adapter in EISA configuration, which in turn could be a valid
bus address in a legacy-free system implementing PCI, especially for
port I/O.
Where the EISA MMIO resource has been disabled for the adapter in EISA
configuration this arrangement keeps producing messages like:
eisa 00:05: EISA: slot 5: DEC3002 detected
defxx: v1.11 2014/07/01 Lawrence V. Stefani and others
00:05: Cannot use MMIO, no address set, aborting
00:05: Recompile driver with "CONFIG_DEFXX_MMIO=n"
00:05: Or run ECU and set adapter's MMIO location
with the last two lines now swapped for easier handling in the driver.
There is no need to check for and catch the case of a port I/O resource
not having been assigned for EISA as the adapter uses the slot-specific
I/O space, which gets assigned by how EISA has been specified and maps
directly to the particular slot an option card has been placed in. And
the EISA variant of the adapter has additional registers that are only
accessible via the port I/O space anyway.
While at it factor out the error message calls into helpers and fix an
argument order bug with the `pr_err' call now in `dfx_register_res_err'.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Fixes: 4d0438e56a8f ("defxx: Clean up DEFEA resource management") Cc: stable@vger.kernel.org # v3.19+ Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
pci_fixup_irqs() used to call pcibios_map_irq on every PCI device, which
for RT2880 included bus 0 slot 0. After pci_fixup_irqs() got removed,
only slots/funcs with devices attached would be called. While arguably
the right thing, that left no chance for this driver to ever initialize
slot 0, effectively bricking PCI and USB on RT2880 devices such as the
Belkin F5D8235-4 v1.
Slot 0 configuration needs to happen after PCI bus enumeration, but
before any device at slot 0x11 (func 0 or 1) is talked to. That was
determined empirically by testing on a Belkin F5D8235-4 v1 device. A
minimal BAR 0 config write followed by read, then setting slot 0
PCI_COMMAND to MASTER | IO | MEMORY is all that seems to be required for
proper functionality.
Tested by ensuring that full- and high-speed USB devices get enumerated
on the Belkin F5D8235-4 v1 (with an out of tree DTS file from OpenWrt).
Fixes: 04c81c7293df ("MIPS: PCI: Replace pci_fixup_irqs() call with host bridge IRQ mapping hooks") Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Tobias Wolf <dev-NTEO@vplace.de> Cc: <stable@vger.kernel.org> # v4.14+ Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Upstream a long-standing OpenWrt patch [0] that fixes MT7620 PCIe PLL
lock check. The existing code checks the wrong register bit: PPLL_SW_SET
is not defined in PPLL_CFG1 and bit 31 of PPLL_CFG1 is marked as reserved
in the MT7620 Programming Guide. The correct bit to check for PLL lock
is PPLL_LD (bit 23).
Also reword the error message for clarity.
Without this change it is unlikely that this driver ever worked with
mainline kernel.
sound/soc/samsung/tm2_wm5110.c:605:6: style: Variable 'ret' is
reassigned a value before the old one has been
used. [redundantAssignment]
ret = devm_snd_soc_register_component(dev, &tm2_component,
^
sound/soc/samsung/tm2_wm5110.c:554:7: note: ret is assigned
ret = of_parse_phandle_with_args(dev->of_node, "i2s-controller",
^
sound/soc/samsung/tm2_wm5110.c:605:6: note: ret is overwritten
ret = devm_snd_soc_register_component(dev, &tm2_component,
^
The args is a stack variable, so it could have junk (uninitialized)
therefore args.np could have a non-NULL and random value even though
property was missing. Later could trigger invalid pointer dereference.
There's no need to check for args.np because args.np won't be
initialized on errors.
Fixes: 8d1513cef51a ("ASoC: samsung: Add support for HDMI audio on TM2 board") Cc: <stable@vger.kernel.org> Suggested-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Reviewed-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20210312180231.2741-2-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commits 8a4cd82d ("nfc: fix refcount leak in llcp_sock_connect()")
and c33b1cc62 ("nfc: fix refcount leak in llcp_sock_bind()")
fixed a refcount leak bug in bind/connect but introduced a
use-after-free if the same local is assigned to 2 different sockets.
This can be triggered by the following simple program:
int sock1 = socket( AF_NFC, SOCK_STREAM, NFC_SOCKPROTO_LLCP );
int sock2 = socket( AF_NFC, SOCK_STREAM, NFC_SOCKPROTO_LLCP );
memset( &addr, 0, sizeof(struct sockaddr_nfc_llcp) );
addr.sa_family = AF_NFC;
addr.nfc_protocol = NFC_PROTO_NFC_DEP;
bind( sock1, (struct sockaddr*) &addr, sizeof(struct sockaddr_nfc_llcp) )
bind( sock2, (struct sockaddr*) &addr, sizeof(struct sockaddr_nfc_llcp) )
close(sock1);
close(sock2);
Fix this by assigning NULL to llcp_sock->local after calling
nfc_llcp_local_put.
This addresses CVE-2021-23134.
Reported-by: Or Cohen <orcohen@paloaltonetworks.com> Reported-by: Nadav Markus <nmarkus@paloaltonetworks.com> Fixes: c33b1cc62 ("nfc: fix refcount leak in llcp_sock_bind()") Signed-off-by: Or Cohen <orcohen@paloaltonetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a possible race condition vulnerability between issuing a HCI
command and removing the cont. Specifically, functions hci_req_sync()
and hci_dev_do_close() can race each other like below:
thread-A in hci_req_sync() | thread-B in hci_dev_do_close()
| hci_req_sync_lock(hdev);
test_bit(HCI_UP, &hdev->flags); |
... | test_and_clear_bit(HCI_UP, &hdev->flags)
hci_req_sync_lock(hdev); |
|
In this commit we alter the sequence in function hci_req_sync(). Hence,
the thread-A cannot issue th.
Signed-off-by: Lin Ma <linma@zju.edu.cn> Cc: Marcel Holtmann <marcel@holtmann.org> Fixes: 7c6a329e4447 ("[Bluetooth] Fix regression from using default link policy") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When HSR interface is sending a frame, it finds a node with
the destination ethernet address from the list.
If there is no node, it calls WARN_ONCE().
But, using WARN_ONCE() for this situation is a little bit overdoing.
So, in this patch, the netdev_err() is used instead.
Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: George Kennedy <george.kennedy@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
hci_chan can be created in 2 places: hci_loglink_complete_evt() if
it is an AMP hci_chan, or l2cap_conn_add() otherwise. In theory,
Only AMP hci_chan should be removed by a call to
hci_disconn_loglink_complete_evt(). However, the controller might mess
up, call that function, and destroy an hci_chan which is not initiated
by hci_loglink_complete_evt().
This patch adds a verification that the destroyed hci_chan must have
been init'd by hci_loglink_complete_evt().
Memory state around the buggy address: ffff8881d7af9080: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ffff8881d7af9100: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff8881d7af9180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff8881d7af9200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8881d7af9280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
If a TAINT_PROPRIETARY_MODULE exports symbol, inherit the taint flag
for all modules importing these symbols, and don't allow loading
symbols from TAINT_PROPRIETARY_MODULE modules if the module previously
imported gplonly symbols. Add a anti-circumvention devices so people
don't accidentally get themselves into trouble this way.
Comment from Greg:
"Ah, the proven-to-be-illegal "GPL Condom" defense :)"
[jeyu: pr_info -> pr_err and pr_warn as per discussion] Link: http://lore.kernel.org/r/20200730162957.GA22469@lst.de Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jessica Yu <jeyu@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When loading a device-mapper table for a request-based mapped device,
and the allocation/initialization of the blk_mq_tag_set for the device
fails, a following device remove will cause a double free.
When allocation/initialization of the blk_mq_tag_set fails in
dm_mq_init_request_queue(), it is uninitialized/freed, but the pointer
is not reset to NULL; so when dev_remove() later gets into
dm_mq_cleanup_mapped_device() it sees the pointer and tries to
uninitialize and free it again.
Fix this by setting the pointer to NULL in dm_mq_init_request_queue()
error-handling. Also set it to NULL in dm_mq_cleanup_mapped_device().
Cc: <stable@vger.kernel.org> # 4.6+ Fixes: 1c357a1e86a4 ("dm: allocate blk_mq_tag_set rather than embed in mapped_device") Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This division bug meant the search for free metadata space could skip
the final allocation bitmap's worth of entries. Fix affects DM thinp,
cache and era targets.
Cc: stable@vger.kernel.org Signed-off-by: Joe Thornber <ejt@redhat.com> Tested-by: Ming-Hung Tsai <mtsai@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It was reported that a fix to the ring buffer recursion detection would
cause a hung machine when performing suspend / resume testing. The
following backtrace was extracted from debugging that case:
Since the fix to the recursion detection would allow a single recursion to
happen while tracing, this lead to the trace_clock_global() taking a spin
lock and then trying to take it again:
ring_buffer_lock_reserve() {
trace_clock_global() {
arch_spin_lock() {
queued_spin_lock_slowpath() {
/* lock taken */
(something else gets traced by function graph tracer)
ring_buffer_lock_reserve() {
trace_clock_global() {
arch_spin_lock() {
queued_spin_lock_slowpath() {
/* DEAD LOCK! */
Tracing should *never* block, as it can lead to strange lockups like the
above.
Restructure the trace_clock_global() code to instead of simply taking a
lock to update the recorded "prev_time" simply use it, as two events
happening on two different CPUs that calls this at the same time, really
doesn't matter which one goes first. Use a trylock to grab the lock for
updating the prev_time, and if it fails, simply try again the next time.
If it failed to be taken, that means something else is already updating
it.
Link: https://lkml.kernel.org/r/20210430121758.650b6e8a@gandalf.local.home Cc: stable@vger.kernel.org Tested-by: Konstantin Kharlamov <hi-angel@yandex.ru> Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com> Fixes: b02414c8f045 ("ring-buffer: Fix recursion protection transitions between interrupt context") # started showing the problem Fixes: 14131f2f98ac3 ("tracing: implement trace_clock_*() APIs") # where the bug happened
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212761 Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The default max PID is set by PID_MAX_DEFAULT, and the tracing
infrastructure uses this number to map PIDs to the comm names of the
tasks, such output of the trace can show names from the recorded PIDs in
the ring buffer. This mapping is also exported to user space via the
"saved_cmdlines" file in the tracefs directory.
But currently the mapping expects the PIDs to be less than
PID_MAX_DEFAULT, which is the default maximum and not the real maximum.
Recently, systemd will increases the maximum value of a PID on the system,
and when tasks are traced that have a PID higher than PID_MAX_DEFAULT, its
comm is not recorded. This leads to the entire trace to have "<...>" as
the comm name, which is pretty useless.
Instead, keep the array mapping the size of PID_MAX_DEFAULT, but instead
of just mapping the index to the comm, map a mask of the PID
(PID_MAX_DEFAULT - 1) to the comm, and find the full PID from the
map_cmdline_to_pid array (that already exists).
This bug goes back to the beginning of ftrace, but hasn't been an issue
until user space started increasing the maximum value of PIDs.
The rsi_resume() does access the bus to enable interrupts on the RSI
SDIO WiFi card, however when calling sdio_claim_host() in the resume
path, it is possible the bus is already claimed and sdio_claim_host()
spins indefinitelly. Enable the SDIO card interrupts in resume_noirq
instead to prevent anything else from claiming the SDIO bus first.
Fixes: 20db07332736 ("rsi: sdio suspend and resume support") Signed-off-by: Marek Vasut <marex@denx.de> Cc: Amitkumar Karwar <amit.karwar@redpinesignals.com> Cc: Angus Ainslie <angus@akkea.ca> Cc: David S. Miller <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: Karun Eagalapati <karun256@gmail.com> Cc: Martin Kepplinger <martink@posteo.de> Cc: Sebastian Krzyszkowiak <sebastian.krzyszkowiak@puri.sm> Cc: Siva Rebbagondla <siva8118@gmail.com> Cc: netdev@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20210327235932.175896-1-marex@denx.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
syzbot reported memory leak in tty/vt.
The problem was in VT_DISALLOCATE ioctl cmd.
After allocating unimap with PIO_UNIMAP it wasn't
freed via VT_DISALLOCATE, but vc_cons[currcons].d was
zeroed.
The START_TRANSFER command needs to be executed while in ON/U0 link
state (with an exception during register initialization). Don't use
dwc->link_state to check this since the driver only tracks the link
state when the link state change interrupt is enabled. Check the link
state from DSTS register instead.
Note that often the host already brings the device out of low power
before it sends/requests the next transfer. So, the user won't see any
issue when the device starts transfer then. This issue is more
noticeable in cases when the device delays starting transfer, which can
happen during delayed control status after the host put the device in
low power.
Fixes bug with the handling of more than one language in
the string table in f_fs.c.
str_count was not reset for subsequent language codes.
str_count-- "rolls under" and processes u32 max strings on
the processing of the second language entry.
The existing bug can be reproduced by adding a second language table
to the structure "strings" in tools/usb/ffs-test.c.
Upon driver unbind usb_free_all_descriptors() function frees all
speed descriptor pointers without setting them to NULL. In case
gadget speed changes (i.e from super speed plus to super speed)
after driver unbind only upto super speed descriptor pointers get
populated. Super speed plus desc still holds the stale (already
freed) pointer. Fix this issue by setting all descriptor pointers
to NULL after freeing them in usb_free_all_descriptors().
Fixes: f5c61225cf29 ("usb: gadget: Update function for SuperSpeedPlus")
cc: stable@vger.kernel.org Reviewed-by: Peter Chen <peter.chen@kernel.org> Signed-off-by: Hemant Kumar <hemantk@codeaurora.org> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org> Link: https://lore.kernel.org/r/1619034452-17334-1-git-send-email-wcheng@codeaurora.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix a general protection fault reported by syzbot due to a race between
gadget_setup() and gadget_unbind() in raw_gadget.
The gadget core is supposed to guarantee that there won't be any more
callbacks to the gadget driver once the driver's unbind routine is
called. That guarantee is enforced in usb_gadget_remove_driver as
follows:
usb_gadget_disconnect(udc->gadget);
if (udc->gadget->irq)
synchronize_irq(udc->gadget->irq);
udc->driver->unbind(udc->gadget);
usb_gadget_udc_stop(udc);
usb_gadget_disconnect turns off the pullup resistor, telling the host
that the gadget is no longer connected and preventing the transmission
of any more USB packets. Any packets that have already been received
are sure to processed by the UDC driver's interrupt handler by the time
synchronize_irq returns.
But this doesn't work with dummy_hcd, because dummy_hcd doesn't use
interrupts; it uses a timer instead. It does have code to emulate the
effect of synchronize_irq, but that code doesn't get invoked at the
right time -- it currently runs in usb_gadget_udc_stop, after the unbind
callback instead of before. Indeed, there's no way for
usb_gadget_remove_driver to invoke this code before the unbind callback.
To fix this, move the synchronize_irq() emulation code to dummy_pullup
so that it runs before unbind. Also, add a comment explaining why it is
necessary to have it there.
dvb_media_device_free() is leaking memory. Free `dvbdev->adapter->conn`
before setting it to NULL, as documented in include/media/media-device.h:
"The media_entity instance itself must be freed explicitly by the driver
if required."
We should set the error code when ext4_commit_super check argument failed.
Found in code review. Fixes: c4be0c1dc4cdc ("filesystem freeze: add error handling of write_super_lockfs/unlockfs"). Cc: stable@kernel.org Signed-off-by: Fengnan Chang <changfengnan@vivo.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20210402101631.561-1-changfengnan@vivo.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When CONFIG_QUOTA is enabled, if we failed to mount the filesystem due
to some error happens behind ext4_orphan_cleanup(), it will end up
triggering a after free issue of super_block. The problem is that
ext4_orphan_cleanup() will set SB_ACTIVE flag if CONFIG_QUOTA is
enabled, after we cleanup the truncated inodes, the last iput() will put
them into the lru list, and these inodes' pages may probably dirty and
will be write back by the writeback thread, so it could be raced by
freeing super_block in the error path of mount_bdev().
After check the setting of SB_ACTIVE flag in ext4_orphan_cleanup(), it
was used to ensure updating the quota file properly, but evict inode and
trash data immediately in the last iput does not affect the quotafile,
so setting the SB_ACTIVE flag seems not required[1]. Fix this issue by
just remove the SB_ACTIVE setting.
Cc: stable@kernel.org Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Tested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210331033138.918975-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit <50122847007> ("ext4: fix check to prevent initializing reserved
inodes") check the block group zero and prevent initializing reserved
inodes. But in some special cases, the reserved inode may not all belong
to the group zero, it may exist into the second group if we format
filesystem below.
So, it will end up triggering a false positive report of a corrupted
file system. This patch fix it by avoid check reserved inodes if no free
inode blocks will be zeroed.
Cc: stable@kernel.org Fixes: 50122847007 ("ext4: fix check to prevent initializing reserved inodes") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210331121516.2243099-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jian Cai [Thu, 6 May 2021 01:25:08 +0000 (18:25 -0700)]
arm64: vdso: remove commas between macro name and arguments
LLVM's integrated assembler appears to assume an argument with default
value is passed whenever it sees a comma right after the macro name.
It will be fine if the number of following arguments is one less than
the number of parameters specified in the macro definition. Otherwise,
it fails. For example, the following code works:
$ llvm-mc -triple=armv7a -filetype=obj foo.s -o ias.o
foo.s:6:14: error: too many positional arguments
foo, arg1=2, arg2=8
This causes build failures as follows:
arch/arm64/kernel/vdso/gettimeofday.S:230:24: error: too many positional
arguments
clock_gettime_return, shift=1
^
arch/arm64/kernel/vdso/gettimeofday.S:253:24: error: too many positional
arguments
clock_gettime_return, shift=1
^
arch/arm64/kernel/vdso/gettimeofday.S:274:24: error: too many positional
arguments
clock_gettime_return, shift=1
This error is not in mainline because commit 28b1a824a4f4 ("arm64: vdso:
Substitute gettimeofday() with C implementation") rewrote this assembler
file in C as part of a 25 patch series that is unsuitable for stable.
Just remove the comma in the clock_gettime_return invocations in 4.19 so
that GNU as and LLVM's integrated assembler work the same.
The return value on success (>= 0) is overwritten by the return value of
put_old_timex32(). That works correct in the fault case, but is wrong for
the success case where put_old_timex32() returns 0.
Just check the return value of put_old_timex32() and return -EFAULT in case
it is not zero.
[ tglx: Massage changelog ]
Fixes: 3a4d44b61625 ("ntp: Move adjtimex related compat syscalls to native counterparts") Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Richard Cochran <richardcochran@gmail.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210414030449.90692-1-chenjun102@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The FUTEX_WAIT operand has historically a relative timeout which means that
the clock id is irrelevant as relative timeouts on CLOCK_REALTIME are not
subject to wall clock changes and therefore are mapped by the kernel to
CLOCK_MONOTONIC for simplicity.
If a caller would set FUTEX_CLOCK_REALTIME for FUTEX_WAIT the timeout is
still treated relative vs. CLOCK_MONOTONIC and then the wait arms that
timeout based on CLOCK_REALTIME which is broken and obviously has never
been used or even tested.
Reject any attempt to use FUTEX_CLOCK_REALTIME with FUTEX_WAIT again.
The desired functionality can be achieved with FUTEX_WAIT_BITSET and a
FUTEX_BITSET_MATCH_ANY argument.
Fixes: 337f13046ff0 ("futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210422194704.834797921@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
KASAN reports a BUG when download file in jffs2 filesystem.It is
because when dstlen == 1, cpage_out will write array out of bounds.
Actually, data will not be compressed in jffs2_zlib_compress() if
data's length less than 4.
[ 393.799778] BUG: KASAN: slab-out-of-bounds in jffs2_rtime_compress+0x214/0x2f0 at addr ffff800062e3b281
[ 393.809166] Write of size 1 by task tftp/2918
[ 393.813526] CPU: 3 PID: 2918 Comm: tftp Tainted: G B 4.9.115-rt93-EMBSYS-CGEL-6.1.R6-dirty #1
[ 393.823173] Hardware name: LS1043A RDB Board (DT)
[ 393.827870] Call trace:
[ 393.830322] [<ffff20000808c700>] dump_backtrace+0x0/0x2f0
[ 393.835721] [<ffff20000808ca04>] show_stack+0x14/0x20
[ 393.840774] [<ffff2000086ef700>] dump_stack+0x90/0xb0
[ 393.845829] [<ffff20000827b19c>] kasan_object_err+0x24/0x80
[ 393.851402] [<ffff20000827b404>] kasan_report_error+0x1b4/0x4d8
[ 393.857323] [<ffff20000827bae8>] kasan_report+0x38/0x40
[ 393.862548] [<ffff200008279d44>] __asan_store1+0x4c/0x58
[ 393.867859] [<ffff2000084ce2ec>] jffs2_rtime_compress+0x214/0x2f0
[ 393.873955] [<ffff2000084bb3b0>] jffs2_selected_compress+0x178/0x2a0
[ 393.880308] [<ffff2000084bb530>] jffs2_compress+0x58/0x478
[ 393.885796] [<ffff2000084c5b34>] jffs2_write_inode_range+0x13c/0x450
[ 393.892150] [<ffff2000084be0b8>] jffs2_write_end+0x2a8/0x4a0
[ 393.897811] [<ffff2000081f3008>] generic_perform_write+0x1c0/0x280
[ 393.903990] [<ffff2000081f5074>] __generic_file_write_iter+0x1c4/0x228
[ 393.910517] [<ffff2000081f5210>] generic_file_write_iter+0x138/0x288
[ 393.916870] [<ffff20000829ec1c>] __vfs_write+0x1b4/0x238
[ 393.922181] [<ffff20000829ff00>] vfs_write+0xd0/0x238
[ 393.927232] [<ffff2000082a1ba8>] SyS_write+0xa0/0x110
[ 393.932283] [<ffff20000808429c>] __sys_trace_return+0x0/0x4
[ 393.937851] Object at ffff800062e3b280, in cache kmalloc-64 size: 64
[ 393.944197] Allocated:
[ 393.946552] PID = 2918
[ 393.948913] save_stack_trace_tsk+0x0/0x220
[ 393.953096] save_stack_trace+0x18/0x20
[ 393.956932] kasan_kmalloc+0xd8/0x188
[ 393.960594] __kmalloc+0x144/0x238
[ 393.963994] jffs2_selected_compress+0x48/0x2a0
[ 393.968524] jffs2_compress+0x58/0x478
[ 393.972273] jffs2_write_inode_range+0x13c/0x450
[ 393.976889] jffs2_write_end+0x2a8/0x4a0
[ 393.980810] generic_perform_write+0x1c0/0x280
[ 393.985251] __generic_file_write_iter+0x1c4/0x228
[ 393.990040] generic_file_write_iter+0x138/0x288
[ 393.994655] __vfs_write+0x1b4/0x238
[ 393.998228] vfs_write+0xd0/0x238
[ 394.001543] SyS_write+0xa0/0x110
[ 394.004856] __sys_trace_return+0x0/0x4
[ 394.008684] Freed:
[ 394.010691] PID = 2918
[ 394.013051] save_stack_trace_tsk+0x0/0x220
[ 394.017233] save_stack_trace+0x18/0x20
[ 394.021069] kasan_slab_free+0x88/0x188
[ 394.024902] kfree+0x6c/0x1d8
[ 394.027868] jffs2_sum_write_sumnode+0x2c4/0x880
[ 394.032486] jffs2_do_reserve_space+0x198/0x598
[ 394.037016] jffs2_reserve_space+0x3f8/0x4d8
[ 394.041286] jffs2_write_inode_range+0xf0/0x450
[ 394.045816] jffs2_write_end+0x2a8/0x4a0
[ 394.049737] generic_perform_write+0x1c0/0x280
[ 394.054179] __generic_file_write_iter+0x1c4/0x228
[ 394.058968] generic_file_write_iter+0x138/0x288
[ 394.063583] __vfs_write+0x1b4/0x238
[ 394.067157] vfs_write+0xd0/0x238
[ 394.070470] SyS_write+0xa0/0x110
[ 394.073783] __sys_trace_return+0x0/0x4
[ 394.077612] Memory state around the buggy address:
[ 394.082404] ffff800062e3b180: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
[ 394.089623] ffff800062e3b200: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
[ 394.096842] >ffff800062e3b280: 01 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 394.104056] ^
[ 394.107283] ffff800062e3b300: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 394.114502] ffff800062e3b380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 394.121718] ==================================================================
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Cc: Joel Stanley <joel@jms.id.au> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It seems like Fedora 34 ends up enabling a few new gcc warnings, notably
"-Wstringop-overread" and "-Warray-parameter".
Both of them cause what seem to be valid warnings in the kernel, where
we have array size mismatches in function arguments (that are no longer
just silently converted to a pointer to element, but actually checked).
This fixes most of the trivial ones, by making the function declaration
match the function definition, and in the case of intel_pm.c, removing
the over-specified array size from the argument declaration.
At least one 'stringop-overread' warning remains in the i915 driver, but
that one doesn't have the same obvious trivial fix, and may or may not
actually be indicative of a bug.
[ It was a mistake to upgrade one of my machines to Fedora 34 while
being busy with the merge window, but if this is the extent of the
compiler upgrade problems, things are better than usual - Linus ]
gcc-11 introdces a harmless warning for cap_inode_getsecurity:
security/commoncap.c: In function ‘cap_inode_getsecurity’:
security/commoncap.c:440:33: error: ‘memcpy’ reading 16 bytes from a region of size 0 [-Werror=stringop-overread]
440 | memcpy(&nscap->data, &cap->data, sizeof(__le32) * 2 * VFS_CAP_U32);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The problem here is that tmpbuf is initialized to NULL, so gcc assumes
it is not accessible unless it gets set by vfs_getxattr_alloc(). This is
a legitimate warning as far as I can tell, but the code is correct since
it correctly handles the error when that function fails.
Add a separate NULL check to tell gcc about it as well.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: James Morris <jamorris@linux.microsoft.com> Cc: Andrey Zhizhikin <andrey.z@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If fast table reloads occur during an ongoing reshape of raid4/5/6
devices the target may race reading a superblock vs the the MD resync
thread; causing an inconclusive reshape state to be read in its
constructor.
lvm2 test lvconvert-raid-reshape-stripes-load-reload.sh can cause
BUG_ON() to trigger in md_run(), e.g.:
"kernel BUG at drivers/md/raid5.c:7567!".
Scenario triggering the bug:
1. the MD sync thread calls end_reshape() from raid5_sync_request()
when done reshaping. However end_reshape() _only_ updates the
reshape position to MaxSector keeping the changed layout
configuration though (i.e. any delta disks, chunk sector or RAID
algorithm changes). That inconclusive configuration is stored in
the superblock.
2. dm-raid constructs a mapping, loading named inconsistent superblock
as of step 1 before step 3 is able to finish resetting the reshape
state completely, and calls md_run() which leads to mentioned bug
in raid5.c.
3. the MD RAID personality's finish_reshape() is called; which resets
the reshape information on chunk sectors, delta disks, etc. This
explains why the bug is rarely seen on multi-core machines, as MD's
finish_reshape() superblock update races with the dm-raid
constructor's superblock load in step 2.
Fix identifies inconclusive superblock content in the dm-raid
constructor and resets it before calling md_run(), factoring out
identifying checks into rs_is_layout_change() to share in existing
rs_reshape_requested() and new rs_reset_inclonclusive_reshape(). Also
enhance a comment and remove an empty line.
This patch addresses a data corruption bug in raid1 arrays using bitmaps.
Without this fix, the bitmap bits for the failed I/O end up being cleared.
Since we are in the failure leg of raid1_end_write_request, the request
either needs to be retried (R1BIO_WriteError) or failed (R1BIO_Degraded).
Fixes: eeba6809d8d5 ("md/raid1: end bio when the device faulty") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Paul Clements <paul.clements@us.sios.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Avoid allocating memory and reading the host log when a virtual device
is used since this log is of no use to that driver. A virtual
device can be identified through the flag TPM_CHIP_FLAG_VIRTUAL, which
is only set for the tpm_vtpm_proxy driver.
Cc: stable@vger.kernel.org Fixes: 6f99612e2500 ("tpm: Proxy driver for supporting multiple emulated TPMs") Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A few archs like powerpc have different errno.h values for macros
EDEADLOCK and EDEADLK. In code including both libc and linux versions of
errno.h, this can result in multiple definitions of EDEADLOCK in the
include chain. Definitions to the same value (e.g. seen with mips) do
not raise warnings, but on powerpc there are redefinitions changing the
value, which raise warnings and errors (if using "-Werror").
Guard against these redefinitions to avoid build errors like the following,
first seen cross-compiling libbpf v5.8.9 for powerpc using GCC 8.4.0 with
musl 1.1.24:
In file included from ../../arch/powerpc/include/uapi/asm/errno.h:5,
from ../../include/linux/err.h:8,
from libbpf.c:29:
../../include/uapi/asm-generic/errno.h:40: error: "EDEADLOCK" redefined [-Werror]
#define EDEADLOCK EDEADLK
In file included from toolchain-powerpc_8540_gcc-8.4.0_musl/include/errno.h:10,
from libbpf.c:26:
toolchain-powerpc_8540_gcc-8.4.0_musl/include/bits/errno.h:58: note: this is the location of the previous definition
#define EDEADLOCK 58
During the EEH MMIO error checking, the current implementation fails to map
the (virtual) MMIO address back to the pci device on radix with hugepage
mappings for I/O. This results into failure to dispatch EEH event with no
recovery even when EEH capability has been enabled on the device.
In case of hugepage mappings, eeh_token_to_phys() has a bug in virt -> phys
translation that results in wrong physical address, which is then passed to
eeh_addr_cache_get_dev() to match it against cached pci I/O address ranges
to get to a PCI device. Hence, it fails to find a match and the EEH event
never gets dispatched leaving the device in failed state.
The commit 33439620680be ("powerpc/eeh: Handle hugepages in ioremap space")
introduced following logic to translate virt to phys for hugepage mappings:
eeh_token_to_phys():
+ pa = pte_pfn(*ptep);
+
+ /* On radix we can do hugepage mappings for io, so handle that */
+ if (hugepage_shift) {
+ pa <<= hugepage_shift; <= This is wrong
+ pa |= token & ((1ul << hugepage_shift) - 1);
+ }
This patch fixes the virt -> phys translation in eeh_token_to_phys()
function.
$ cat /sys/kernel/debug/powerpc/eeh_address_cache
mem addr range [0x0000040080000000-0x00000400807fffff]: 0030:01:00.1
mem addr range [0x0000040080800000-0x0000040080ffffff]: 0030:01:00.1
mem addr range [0x0000040081000000-0x00000400817fffff]: 0030:01:00.0
mem addr range [0x0000040081800000-0x0000040081ffffff]: 0030:01:00.0
mem addr range [0x0000040082000000-0x000004008207ffff]: 0030:01:00.1
mem addr range [0x0000040082080000-0x00000400820fffff]: 0030:01:00.0
mem addr range [0x0000040082100000-0x000004008210ffff]: 0030:01:00.1
mem addr range [0x0000040082110000-0x000004008211ffff]: 0030:01:00.0
Above is the list of cached io address ranges of pci 0030:01:00.<fn>.
Before this patch:
Tracing 'arg1' of function eeh_addr_cache_get_dev() during error injection
clearly shows that 'addr=' contains wrong physical address:
KASAN report a slab-out-of-bounds problem. The logs are listed below.
It is because in function jffs2_scan_dirent_node, we alloc "checkedlen+1"
bytes for fd->name and we check crc with length rd->nsize. If checkedlen
is less than rd->nsize, it will cause the slab-out-of-bounds problem.
jffs2: Dirent at *** has zeroes in name. Truncating to %d char
==================================================================
BUG: KASAN: slab-out-of-bounds in crc32_le+0x1ce/0x260 at addr ffff8800842cf2d1
Read of size 1 by task test_JFFS2/915
=============================================================================
BUG kmalloc-64 (Tainted: G B O ): kasan: bad access detected
-----------------------------------------------------------------------------
INFO: Allocated in jffs2_alloc_full_dirent+0x2a/0x40 age=0 cpu=1 pid=915
___slab_alloc+0x580/0x5f0
__slab_alloc.isra.24+0x4e/0x64
__kmalloc+0x170/0x300
jffs2_alloc_full_dirent+0x2a/0x40
jffs2_scan_eraseblock+0x1ca4/0x3b64
jffs2_scan_medium+0x285/0xfe0
jffs2_do_mount_fs+0x5fb/0x1bbc
jffs2_do_fill_super+0x245/0x6f0
jffs2_fill_super+0x287/0x2e0
mount_mtd_aux.isra.0+0x9a/0x144
mount_mtd+0x222/0x2f0
jffs2_mount+0x41/0x60
mount_fs+0x63/0x230
vfs_kern_mount.part.6+0x6c/0x1f4
do_mount+0xae8/0x1940
SyS_mount+0x105/0x1d0
INFO: Freed in jffs2_free_full_dirent+0x22/0x40 age=27 cpu=1 pid=915
__slab_free+0x372/0x4e4
kfree+0x1d4/0x20c
jffs2_free_full_dirent+0x22/0x40
jffs2_build_remove_unlinked_inode+0x17a/0x1e4
jffs2_do_mount_fs+0x1646/0x1bbc
jffs2_do_fill_super+0x245/0x6f0
jffs2_fill_super+0x287/0x2e0
mount_mtd_aux.isra.0+0x9a/0x144
mount_mtd+0x222/0x2f0
jffs2_mount+0x41/0x60
mount_fs+0x63/0x230
vfs_kern_mount.part.6+0x6c/0x1f4
do_mount+0xae8/0x1940
SyS_mount+0x105/0x1d0
entry_SYSCALL_64_fastpath+0x1e/0x97
Call Trace:
[<ffffffff815befef>] dump_stack+0x59/0x7e
[<ffffffff812d1d65>] print_trailer+0x125/0x1b0
[<ffffffff812d82c8>] object_err+0x34/0x40
[<ffffffff812dadef>] kasan_report.part.1+0x21f/0x534
[<ffffffff81132401>] ? vprintk+0x2d/0x40
[<ffffffff815f1ee2>] ? crc32_le+0x1ce/0x260
[<ffffffff812db41a>] kasan_report+0x26/0x30
[<ffffffff812d9fc1>] __asan_load1+0x3d/0x50
[<ffffffff815f1ee2>] crc32_le+0x1ce/0x260
[<ffffffff814764ae>] ? jffs2_alloc_full_dirent+0x2a/0x40
[<ffffffff81485cec>] jffs2_scan_eraseblock+0x1d0c/0x3b64
[<ffffffff81488813>] ? jffs2_scan_medium+0xccf/0xfe0
[<ffffffff81483fe0>] ? jffs2_scan_make_ino_cache+0x14c/0x14c
[<ffffffff812da3e9>] ? kasan_unpoison_shadow+0x35/0x50
[<ffffffff812da3e9>] ? kasan_unpoison_shadow+0x35/0x50
[<ffffffff812da462>] ? kasan_kmalloc+0x5e/0x70
[<ffffffff812d5d90>] ? kmem_cache_alloc_trace+0x10c/0x2cc
[<ffffffff818169fb>] ? mtd_point+0xf7/0x130
[<ffffffff81487dc9>] jffs2_scan_medium+0x285/0xfe0
[<ffffffff81487b44>] ? jffs2_scan_eraseblock+0x3b64/0x3b64
[<ffffffff812da3e9>] ? kasan_unpoison_shadow+0x35/0x50
[<ffffffff812da3e9>] ? kasan_unpoison_shadow+0x35/0x50
[<ffffffff812da462>] ? kasan_kmalloc+0x5e/0x70
[<ffffffff812d57df>] ? __kmalloc+0x12b/0x300
[<ffffffff812da462>] ? kasan_kmalloc+0x5e/0x70
[<ffffffff814a2753>] ? jffs2_sum_init+0x9f/0x240
[<ffffffff8148b2ff>] jffs2_do_mount_fs+0x5fb/0x1bbc
[<ffffffff8148ad04>] ? jffs2_del_noinode_dirent+0x640/0x640
[<ffffffff812da462>] ? kasan_kmalloc+0x5e/0x70
[<ffffffff81127c5b>] ? __init_rwsem+0x97/0xac
[<ffffffff81492349>] jffs2_do_fill_super+0x245/0x6f0
[<ffffffff81493c5b>] jffs2_fill_super+0x287/0x2e0
[<ffffffff814939d4>] ? jffs2_parse_options+0x594/0x594
[<ffffffff81819bea>] mount_mtd_aux.isra.0+0x9a/0x144
[<ffffffff81819eb6>] mount_mtd+0x222/0x2f0
[<ffffffff814939d4>] ? jffs2_parse_options+0x594/0x594
[<ffffffff81819c94>] ? mount_mtd_aux.isra.0+0x144/0x144
[<ffffffff81258757>] ? free_pages+0x13/0x1c
[<ffffffff814fa0ac>] ? selinux_sb_copy_data+0x278/0x2e0
[<ffffffff81492b35>] jffs2_mount+0x41/0x60
[<ffffffff81302fb7>] mount_fs+0x63/0x230
[<ffffffff8133755f>] ? alloc_vfsmnt+0x32f/0x3b0
[<ffffffff81337f2c>] vfs_kern_mount.part.6+0x6c/0x1f4
[<ffffffff8133ceec>] do_mount+0xae8/0x1940
[<ffffffff811b94e0>] ? audit_filter_rules.constprop.6+0x1d10/0x1d10
[<ffffffff8133c404>] ? copy_mount_string+0x40/0x40
[<ffffffff812cbf78>] ? alloc_pages_current+0xa4/0x1bc
[<ffffffff81253a89>] ? __get_free_pages+0x25/0x50
[<ffffffff81338993>] ? copy_mount_options.part.17+0x183/0x264
[<ffffffff8133e3a9>] SyS_mount+0x105/0x1d0
[<ffffffff8133e2a4>] ? copy_mnt_ns+0x560/0x560
[<ffffffff810e8391>] ? msa_space_switch_handler+0x13d/0x190
[<ffffffff81be184a>] entry_SYSCALL_64_fastpath+0x1e/0x97
[<ffffffff810e9274>] ? msa_space_switch+0xb0/0xe0
Memory state around the buggy address: ffff8800842cf180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8800842cf200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8800842cf280: fc fc fc fc fc fc 00 00 00 00 01 fc fc fc fc fc
^ ffff8800842cf300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8800842cf380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
If the pNFS layout segment is marked with the NFS_LSEG_LAYOUTRETURN
flag, then the assumption is that it has some reporting requirement
to perform through a layoutreturn (e.g. flexfiles layout stats or error
information).
Fixes: 6d597e175012 ("pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the pNFS layout segment is marked with the NFS_LSEG_LAYOUTRETURN
flag, then the assumption is that it has some reporting requirement
to perform through a layoutreturn (e.g. flexfiles layout stats or error
information).
Fixes: e0b7d420f72a ("pNFS: Don't discard layout segments that are marked for return") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When failing the driver probe because of invalid firmware properties,
the GTDT driver unmaps the interrupt that it mapped earlier.
However, it never checks whether the mapping of the interrupt actially
succeeded. Even more, should the firmware report an illegal interrupt
number that overlaps with the GIC SGI range, this can result in an
IPI being unmapped, and subsequent fireworks (as reported by Dann
Frazier).
Rework the driver to have a slightly saner behaviour and actually
check whether the interrupt has been mapped before unmapping things.
Reported-by: dann frazier <dann.frazier@canonical.com> Fixes: ca9ae5ec4ef0 ("acpi/arm64: Add SBSA Generic Watchdog support in GTDT driver") Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/YH87dtTfwYgavusz@xps13.dannf Cc: <stable@vger.kernel.org> Cc: Fu Wei <wefu@redhat.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: dann frazier <dann.frazier@canonical.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Link: https://lore.kernel.org/r/20210421164317.1718831-2-maz@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in ovs_fragment(), similarly to what is done for IPv6 few lines below.
Fixes: d52e5a7e7ca4 ("ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmt") Cc: <stable@vger.kernel.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Each multicast route that is forwarding packets (as opposed to trapping
them) points to a list of egress router interfaces (RIFs) through which
packets are replicated.
A route's action can transition from trap to forward when a RIF is
created for one of the route's egress virtual interfaces (eVIF). When
this happens, the route's action is first updated and only later the
list of egress RIFs is committed to the device.
This results in the route pointing to an invalid list. In case the list
pointer is out of range (due to uninitialized memory), the device will
complain:
Fix this by first committing the list of egress RIFs to the device and
only later update the route's action.
Note that a fix is not needed in the reverse function (i.e.,
mlxsw_sp_mr_route_evif_unresolve()), as there the route's action is
first updated and only later the RIF is removed from the list.
Cc: stable@vger.kernel.org Fixes: c011ec1bbfd6 ("mlxsw: spectrum: Add the multicast routing offloading logic") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/20210506072308.3834303-1-idosch@idosch.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The root cause is, if nat entry in checkpoint journal area is corrupted,
e.g. nid of journalled nat entry exceeds max nid value, during checkpoint,
once it tries to flush nat journal to NAT area, get_next_nat_page() may
access out-of-bounds memory on nat_bitmap due to it uses wrong nid value
as bitmap offset.
Conside the following case, it just write a big file into flash,
when complete writing, delete the file, and then power off promptly.
Next time power on, we'll get a replay list like:
...
LEB 1105:211344 len 4144 deletion 0 sqnum 428783 key type 1 inode 80
LEB 15:233544 len 160 deletion 1 sqnum 428785 key type 0 inode 80
LEB 1105:215488 len 4144 deletion 0 sqnum 428787 key type 1 inode 80
...
In the replay list, data nodes' deletion are 0, and the inode node's
deletion is 1. In current logic, the file's dentry will be removed,
but inode and the flash space it occupied will be reserved.
User will see that much free space been disappeared.
We only need to check the deletion value of the following inode type
node of the replay entry.
Fixes: e58725d51fa8 ("ubifs: Handle re-linking of inodes correctly while recovery") Cc: stable@vger.kernel.org Signed-off-by: Guochun Mao <guochun.mao@mediatek.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The arm64 assembler in binutils 2.32 and above generates a program
property note in a note section, .note.gnu.property, to encode used x86
ISAs and features. But the kernel linker script only contains a single
NOTE segment:
Commit dbcc7d57bffc0c ("btrfs: fix race when cloning extent buffer during
rewind of an old root"), fixed a race when we need to rewind the extent
buffer of an old root. It was caused by picking a new mod log operation
for the extent buffer while getting a cloned extent buffer with an outdated
number of items (off by -1), because we cloned the extent buffer without
locking it first.
However there is still another similar race, but in the opposite direction.
The cloned extent buffer has a number of items that does not match the
number of tree mod log operations that are going to be replayed. This is
because right after we got the last (most recent) tree mod log operation to
replay and before locking and cloning the extent buffer, another task adds
a new pointer to the extent buffer, which results in adding a new tree mod
log operation and incrementing the number of items in the extent buffer.
So after cloning we have mismatch between the number of items in the extent
buffer and the number of mod log operations we are going to apply to it.
This results in hitting a BUG_ON() that produces the following stack trace:
(gdb) l *(tree_mod_log_rewind+0x3b1)
0xffffffff819e5b21 is in tree_mod_log_rewind (fs/btrfs/tree-mod-log.c:675).
670 * the modification. As we're going backwards, we do the
671 * opposite of each operation here.
672 */
673 switch (tm->op) {
674 case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING:
675 BUG_ON(tm->slot < n);
676 fallthrough;
677 case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_MOVING:
678 case BTRFS_MOD_LOG_KEY_REMOVE:
679 btrfs_set_node_key(eb, &tm->key, tm->slot);
(gdb) quit
The following steps explain in more detail how it happens:
1) We have one tree mod log user (through fiemap or the logical ino ioctl),
with a sequence number of 1, so we have fs_info->tree_mod_seq == 1.
This is task A;
2) Another task is at ctree.c:balance_level() and we have eb X currently as
the root of the tree, and we promote its single child, eb Y, as the new
root.
Then, at ctree.c:balance_level(), we call:
ret = btrfs_tree_mod_log_insert_root(root->node, child, true);
3) At btrfs_tree_mod_log_insert_root() we create a tree mod log operation
of type BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING, with a ->logical field
pointing to ebX->start. We only have one item in eb X, so we create
only one tree mod log operation, and store in the "tm_list" array;
4) Then, still at btrfs_tree_mod_log_insert_root(), we create a tree mod
log element of operation type BTRFS_MOD_LOG_ROOT_REPLACE, ->logical set
to ebY->start, ->old_root.logical set to ebX->start, ->old_root.level
set to the level of eb X and ->generation set to the generation of eb X;
5) Then btrfs_tree_mod_log_insert_root() calls tree_mod_log_free_eb() with
"tm_list" as argument. After that, tree_mod_log_free_eb() calls
tree_mod_log_insert(). This inserts the mod log operation of type
BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING from step 3 into the rbtree
with a sequence number of 2 (and fs_info->tree_mod_seq set to 2);
6) Then, after inserting the "tm_list" single element into the tree mod
log rbtree, the BTRFS_MOD_LOG_ROOT_REPLACE element is inserted, which
gets the sequence number 3 (and fs_info->tree_mod_seq set to 3);
7) Back to ctree.c:balance_level(), we free eb X by calling
btrfs_free_tree_block() on it. Because eb X was created in the current
transaction, has no other references and writeback did not happen for
it, we add it back to the free space cache/tree;
8) Later some other task B allocates the metadata extent from eb X, since
it is marked as free space in the space cache/tree, and uses it as a
node for some other btree;
9) The tree mod log user task calls btrfs_search_old_slot(), which calls
btrfs_get_old_root(), and finally that calls tree_mod_log_oldest_root()
with time_seq == 1 and eb_root == eb Y;
10) The first iteration of the while loop finds the tree mod log element
with sequence number 3, for the logical address of eb Y and of type
BTRFS_MOD_LOG_ROOT_REPLACE;
11) Because the operation type is BTRFS_MOD_LOG_ROOT_REPLACE, we don't
break out of the loop, and set root_logical to point to
tm->old_root.logical, which corresponds to the logical address of
eb X;
12) On the next iteration of the while loop, the call to
tree_mod_log_search_oldest() returns the smallest tree mod log element
for the logical address of eb X, which has a sequence number of 2, an
operation type of BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING and
corresponds to the old slot 0 of eb X (eb X had only 1 item in it
before being freed at step 7);
13) We then break out of the while loop and return the tree mod log
operation of type BTRFS_MOD_LOG_ROOT_REPLACE (eb Y), and not the one
for slot 0 of eb X, to btrfs_get_old_root();
14) At btrfs_get_old_root(), we process the BTRFS_MOD_LOG_ROOT_REPLACE
operation and set "logical" to the logical address of eb X, which was
the old root. We then call tree_mod_log_search() passing it the logical
address of eb X and time_seq == 1;
15) But before calling tree_mod_log_search(), task B locks eb X, adds a
key to eb X, which results in adding a tree mod log operation of type
BTRFS_MOD_LOG_KEY_ADD, with a sequence number of 4, to the tree mod
log, and increments the number of items in eb X from 0 to 1.
Now fs_info->tree_mod_seq has a value of 4;
16) Task A then calls tree_mod_log_search(), which returns the most recent
tree mod log operation for eb X, which is the one just added by task B
at the previous step, with a sequence number of 4, a type of
BTRFS_MOD_LOG_KEY_ADD and for slot 0;
17) Before task A locks and clones eb X, task A adds another key to eb X,
which results in adding a new BTRFS_MOD_LOG_KEY_ADD mod log operation,
with a sequence number of 5, for slot 1 of eb X, increments the
number of items in eb X from 1 to 2, and unlocks eb X.
Now fs_info->tree_mod_seq has a value of 5;
18) Task A then locks eb X and clones it. The clone has a value of 2 for
the number of items and the pointer "tm" points to the tree mod log
operation with sequence number 4, not the most recent one with a
sequence number of 5, so there is mismatch between the number of
mod log operations that are going to be applied to the cloned version
of eb X and the number of items in the clone;
19) Task A then calls tree_mod_log_rewind() with the clone of eb X, the
tree mod log operation with sequence number 4 and a type of
BTRFS_MOD_LOG_KEY_ADD, and time_seq == 1;
20) At tree_mod_log_rewind(), we set the local variable "n" with a value
of 2, which is the number of items in the clone of eb X.
Then in the first iteration of the while loop, we process the mod log
operation with sequence number 4, which is targeted at slot 0 and has
a type of BTRFS_MOD_LOG_KEY_ADD. This results in decrementing "n" from
2 to 1.
Then we pick the next tree mod log operation for eb X, which is the
tree mod log operation with a sequence number of 2, a type of
BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING and for slot 0, it is the one
added in step 5 to the tree mod log tree.
We go back to the top of the loop to process this mod log operation,
and because its slot is 0 and "n" has a value of 1, we hit the BUG_ON:
Fix this by checking for a more recent tree mod log operation after locking
and cloning the extent buffer of the old root node, and use it as the first
operation to apply to the cloned extent buffer when rewinding it.
Stable backport notes: due to moved code and renames, in =< 5.11 the
change should be applied to ctree.c:get_old_root.
Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Link: https://lore.kernel.org/linux-btrfs/20210404040732.GZ32440@hungrycats.org/ Fixes: 834328a8493079 ("Btrfs: tree mod log's old roots could still be part of the tree") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>