[Why]
Color corruption can occur on bootup into a login
manager that applies a non-linear gamma LUT because
the LUT may not actually be powered on before writing.
It's cleared on the next full pipe reprogramming as
we switch to LUTB from LUTA and the pipe accessing
the LUT has taken it out of light sleep mode.
[How]
The MPCC_OGAM_MEM_PWR_FORCE register does not force
the current power mode when set to 0. It only forces
when set light sleep, deep sleep or shutdown.
The register to actually force power on and ignore
sleep modes is MPCC_OGAM_MEM_PWR_DIS - a value of 0
will enable power requests and a value of 1 will
disable them.
When PWR_FORCE!=0 is combined with PWR_DIS=0 then
MPCC OGAM memory is forced into the state specified
by the force bits.
If PWR_FORCE is 0 then it respects the mode specified
by MPCC_OGAM_MEM_LOW_PWR_MODE if the RAM LUT is not
in use.
We set that bit to shutdown on low power, but otherwise
it inherits from bootup defaults.
So for the fix:
1. Update the sequence to "force" power on when needed
We can use MPCC_OGAM_MEM_PWR_DIS for this to turn on the
memory even when the block is in bypass and pending to be
enabled for the next frame.
We need this for both low power enabled or disabled.
If we don't set this then we can run into issues when we
first program the LUT from bootup.
2. Don't apply FORCE_SEL
Once we enable power requests with DIS=0 we run into the
issue of the RAM being forced into light sleep and being
unusable for display output. Leave this 0 like we used to
for DCN20.
3. Rely on MPCC OGAM init to determine light sleep/deep sleep
MPC low power debug mode isn't enabled on any ASIC currently
but we'll respect the setting determined during init if it
is.
Lightly tested as working with IGT tests and desktop color
adjustment.
4. Change the MPC resource default for DCN30
It was interleaving the dcn20 and dcn30 versions before
depending on the sequence.
5. REG_WAIT for it to be on whenever we're powering up the
memory
Otherwise we can write register values too early and we'll
get corruption.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Reviewed-by: Eric Yang <eric.yang2@amd.com> Acked-by: Qingqing Zhuo <Qingqing.Zhuo@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Use AST_MAX_HWC_HEIGHT for setting offset_y in the cursor plane's
atomic_check. The code used AST_MAX_HWC_WIDTH instead. This worked
because both constants has the same value.
This patch fixes identification of DA913x parts by the DA9121 driver,
where a lack of clarity lead to implementation on the basis that variant
IDs were to be identical to the equivalent rated non-automotive parts.
There is a new emphasis on the DT identity to cope with overlap in these
ID's - this is not considered to be problematic, because projects would
be exclusively using automotive or consumer grade parts.
We call btrfs_update_root in btrfs_update_reloc_root, which can fail for
all sorts of reasons, including IO errors. Instead of panicing the box
lets return the error, now that all callers properly handle those
errors.
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
We do memory allocations here, read blocks from disk, all sorts of
operations that could easily fail at any given point. Instead of
panicing the box, simply return the error back up the chain, all callers
at this point have proper error handling.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When we are running out of space for updating the chunk tree, that is,
when we are low on available space in the system space info, if we have
many task concurrently allocating block groups, via fallocate for example,
many of them can end up all allocating new system chunks when only one is
needed. In extreme cases this can lead to exhaustion of the system chunk
array, which has a size limit of 2048 bytes, and results in a transaction
abort with errno EFBIG, producing a trace in dmesg like the following,
which was triggered on a PowerPC machine with a node/leaf size of 64K:
The following steps explain how we can end up in this situation:
1) Task A is at check_system_chunk(), either because it is allocating a
new data or metadata block group, at btrfs_chunk_alloc(), or because
it is removing a block group or turning a block group RO. It does not
matter why;
2) Task A sees that there is not enough free space in the system
space_info object, that is 'left' is < 'thresh'. And at this point
the system space_info has a value of 0 for its 'bytes_may_use'
counter;
3) As a consequence task A calls btrfs_alloc_chunk() in order to allocate
a new system block group (chunk) and then reserves 'thresh' bytes in
the chunk block reserve with the call to btrfs_block_rsv_add(). This
changes the chunk block reserve's 'reserved' and 'size' counters by an
amount of 'thresh', and changes the 'bytes_may_use' counter of the
system space_info object from 0 to 'thresh'.
Also during its call to btrfs_alloc_chunk(), we end up increasing the
value of the 'total_bytes' counter of the system space_info object by
8MiB (the size of a system chunk stripe). This happens through the
call chain:
4) After it finishes the first phase of the block group allocation, at
btrfs_chunk_alloc(), task A unlocks the chunk mutex;
5) At this point the new system block group was added to the transaction
handle's list of new block groups, but its block group item, device
items and chunk item were not yet inserted in the extent, device and
chunk trees, respectively. That only happens later when we call
btrfs_finish_chunk_alloc() through a call to
btrfs_create_pending_block_groups();
Note that only when we update the chunk tree, through the call to
btrfs_finish_chunk_alloc(), we decrement the 'reserved' counter
of the chunk block reserve as we COW/allocate extent buffers,
through:
If we end up COWing less chunk btree nodes/leaves than expected, which
is the typical case since the amount of space we reserve is always
pessimistic to account for the worst possible case, we release the
unused space through:
But before task A gets into btrfs_create_pending_block_groups()...
6) Many other tasks start allocating new block groups through fallocate,
each one does the first phase of block group allocation in a
serialized way, since btrfs_chunk_alloc() takes the chunk mutex
before calling check_system_chunk() and btrfs_alloc_chunk().
However before everyone enters the final phase of the block group
allocation, that is, before calling btrfs_create_pending_block_groups(),
new tasks keep coming to allocate new block groups and while at
check_system_chunk(), the system space_info's 'bytes_may_use' keeps
increasing each time a task reserves space in the chunk block reserve.
This means that eventually some other task can end up not seeing enough
free space in the system space_info and decide to allocate yet another
system chunk.
This may repeat several times if yet more new tasks keep allocating
new block groups before task A, and all the other tasks, finish the
creation of the pending block groups, which is when reserved space
in excess is released. Eventually this can result in exhaustion of
system chunk array in the superblock, with btrfs_add_system_chunk()
returning EFBIG, resulting later in a transaction abort.
Even when we don't reach the extreme case of exhausting the system
array, most, if not all, unnecessarily created system block groups
end up being unused since when finishing creation of the first
pending system block group, the creation of the following ones end
up not needing to COW nodes/leaves of the chunk tree, so we never
allocate and deallocate from them, resulting in them never being
added to the list of unused block groups - as a consequence they
don't get deleted by the cleaner kthread - the only exceptions are
if we unmount and mount the filesystem again, which adds any unused
block groups to the list of unused block groups, if a scrub is
run, which also adds unused block groups to the unused list, and
under some circumstances when using a zoned filesystem or async
discard, which may also add unused block groups to the unused list.
So fix this by:
*) Tracking the number of reserved bytes for the chunk tree per
transaction, which is the sum of reserved chunk bytes by each
transaction handle currently being used;
*) When there is not enough free space in the system space_info,
if there are other transaction handles which reserved chunk space,
wait for some of them to complete in order to have enough excess
reserved space released, and then try again. Otherwise proceed with
the creation of a new system chunk.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
We have a race between marking that an inode needs to be logged, either
at btrfs_set_inode_last_trans() or at btrfs_page_mkwrite(), and between
btrfs_sync_log(). The following steps describe how the race happens.
1) We are at transaction N;
2) Inode I was previously fsynced in the current transaction so it has:
inode->logged_trans set to N;
3) The inode's root currently has:
root->log_transid set to 1
root->last_log_commit set to 0
Which means only one log transaction was committed to far, log
transaction 0. When a log tree is created we set ->log_transid and
->last_log_commit of its parent root to 0 (at btrfs_add_log_tree());
4) One more range of pages is dirtied in inode I;
5) Some task A starts an fsync against some other inode J (same root), and
so it joins log transaction 1.
Before task A calls btrfs_sync_log()...
6) Task B starts an fsync against inode I, which currently has the full
sync flag set, so it starts delalloc and waits for the ordered extent
to complete before calling btrfs_inode_in_log() at btrfs_sync_file();
7) During ordered extent completion we have btrfs_update_inode() called
against inode I, which in turn calls btrfs_set_inode_last_trans(),
which does the following:
So ->last_trans is set to N and ->last_sub_trans set to 1.
But before setting ->last_log_commit...
8) Task A is at btrfs_sync_log():
- it increments root->log_transid to 2
- starts writeback for all log tree extent buffers
- waits for the writeback to complete
- writes the super blocks
- updates root->last_log_commit to 1
It's a lot of slow steps between updating root->log_transid and
root->last_log_commit;
9) The task doing the ordered extent completion, currently at
btrfs_set_inode_last_trans(), then finally runs:
Which results in inode->last_log_commit being set to 1.
The ordered extent completes;
10) Task B is resumed, and it calls btrfs_inode_in_log() which returns
true because we have all the following conditions met:
inode->logged_trans == N which matches fs_info->generation &&
inode->last_subtrans (1) <= inode->last_log_commit (1) &&
inode->last_subtrans (1) <= root->last_log_commit (1) &&
list inode->extent_tree.modified_extents is empty
And as a consequence we return without logging the inode, so the
existing logged version of the inode does not point to the extent
that was written after the previous fsync.
It should be impossible in practice for one task be able to do so much
progress in btrfs_sync_log() while another task is at
btrfs_set_inode_last_trans() right after it reads root->log_transid and
before it reads root->last_log_commit. Even if kernel preemption is enabled
we know the task at btrfs_set_inode_last_trans() can not be preempted
because it is holding the inode's spinlock.
However there is another place where we do the same without holding the
spinlock, which is in the memory mapped write path at:
So with preemption happening after setting ->last_sub_trans and before
setting ->last_log_commit, it is less of a stretch to have another task
do enough progress at btrfs_sync_log() such that the task doing the memory
mapped write ends up with ->last_sub_trans and ->last_log_commit set to
the same value. It is still a big stretch to get there, as the task doing
btrfs_sync_log() has to start writeback, wait for its completion and write
the super blocks.
So fix this in two different ways:
1) For btrfs_set_inode_last_trans(), simply set ->last_log_commit to the
value of ->last_sub_trans minus 1;
2) For btrfs_page_mkwrite() only set the inode's ->last_sub_trans, just
like we do for buffered and direct writes at btrfs_file_write_iter(),
which is all we need to make sure multiple writes and fsyncs to an
inode in the same transaction never result in an fsync missing that
the inode changed and needs to be logged. Turn this into a helper
function and use it both at btrfs_page_mkwrite() and at
btrfs_file_write_iter() - this also fixes the problem that at
btrfs_page_mkwrite() we were setting those fields without the
protection of the inode's spinlock.
This is an extremely unlikely race to happen in practice.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
A few places we intermix btrfs_inode_lock with a inode_unlock, and some
places we just use inode_lock/inode_unlock instead of btrfs_inode_lock.
None of these places are using this incorrectly, but as we adjust some
of these callers it would be nice to keep everything consistent, so
convert everybody to use btrfs_inode_lock/btrfs_inode_unlock.
Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When initially probing the SPI slave device, the call for disabling an
SPI device without the SPI_CS_HIGH flag is not applied, as the
condition for checking whether or not the state to be applied equals the
one currently set evaluates to true.
This however might not necessarily be the case, as the chipselect might
be active.
Add a force flag to spi_set_cs which allows to override this
early exit condition. Set it to false everywhere except when called
from spi_setup to sync up the initial CS state.
Fixes commit d40f0b6f2e21 ("spi: Avoid setting the chip select if we don't
need to")
The DMI callbacks, used for quirks, currently access the PMC by getting
the address a global pmc_dev struct. Instead, have the callbacks set a
global quirk specific variable. In probe, after calling dmi_check_system(),
pass pmc_dev to a function that will handle each quirk if its variable
condition is met. This allows removing the global pmc_dev later.
Signed-off-by: David E. Box <david.e.box@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Rajneesh Bhardwaj <irenic.rajneesh@gmail.com> Link: https://lore.kernel.org/r/20210417031252.3020837-2-david.e.box@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
pm_runtime_get_sync will increment pm usage counter even it failed.
Forgetting to putting operation will result in reference leak here.
Fix it by replacing it with pm_runtime_resume_and_get to keep usage
counter balanced.
Signed-off-by: Shixin Liu <liushixin2@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
pm_runtime_get_sync will increment pm usage counter even it failed.
Forgetting to putting operation will result in reference leak here.
Fix it by replacing it with pm_runtime_resume_and_get to keep usage
counter balanced.
Signed-off-by: Shixin Liu <liushixin2@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
pm_runtime_get_sync will increment pm usage counter even it failed.
Forgetting to putting operation will result in reference leak here.
Fix it by replacing it with pm_runtime_resume_and_get to keep usage
counter balanced.
Signed-off-by: Shixin Liu <liushixin2@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
pm_runtime_get_sync will increment pm usage counter even it failed.
Forgetting to putting operation will result in reference leak here.
Fix it by replacing it with pm_runtime_resume_and_get to keep usage
counter balanced.
Signed-off-by: Shixin Liu <liushixin2@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
pm_runtime_get_sync will increment pm usage counter even it failed.
Forgetting to putting operation will result in reference leak here.
Fix it by replacing it with pm_runtime_resume_and_get to keep usage
counter balanced.
Signed-off-by: Shixin Liu <liushixin2@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
pm_runtime_get_sync will increment pm usage counter even it failed.
Forgetting to putting operation will result in reference leak here.
Fix it by replacing it with pm_runtime_resume_and_get to keep usage
counter balanced.
Signed-off-by: Shixin Liu <liushixin2@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
pm_runtime_get_sync will increment pm usage counter even it failed.
Forgetting to putting operation will result in reference leak here.
Fix it by replacing it with pm_runtime_resume_and_get to keep usage
counter balanced.
Signed-off-by: Shixin Liu <liushixin2@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
This driver's remove path calls cancel_delayed_work(). However, that
function does not wait until the work function finishes. This means
that the callback function may still be running after the driver's
remove function has finished, which would result in a use-after-free.
Fix by calling cancel_delayed_work_sync(), which ensures that
the work is properly cancelled, no longer running, and unable
to re-schedule itself.
pm_runtime_get_sync will increment pm usage counter even it failed.
thus a pairing decrement is needed.
Fix it by replacing it with pm_runtime_resume_and_get to keep usage
counter balanced.
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Bixuan Cui <cuibixuan@huawei.com> Link: https://lore.kernel.org/r/20210408130831.56239-1-cuibixuan@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
pm_runtime_get_sync will increment pm usage counter even it failed.
thus a pairing decrement is needed.
Fix it by replacing it with pm_runtime_resume_and_get to keep usage
counter balanced.
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Bixuan Cui <cuibixuan@huawei.com> Link: https://lore.kernel.org/r/20210408091836.55227-1-cuibixuan@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
This driver's remove path calls cancel_delayed_work(). However, that
function does not wait until the work function finishes. This means
that the callback function may still be running after the driver's
remove function has finished, which would result in a use-after-free.
Fix by calling cancel_delayed_work_sync(), which ensures that
the work is properly cancelled, no longer running, and unable
to re-schedule itself.
pm_runtime_get_sync will increment pm usage counter even it failed.
Forgetting to putting operation will result in reference leak here.
Fix it by replacing it with pm_runtime_resume_and_get to keep usage
counter balanced.
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Li <wangli74@huawei.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20210409095458.29921-1-wangli74@huawei.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Call spi_master_get() holds the reference count to master device, thus
we need an additional spi_master_put() call to reduce the reference
count, otherwise we will leak a reference to master.
This commit fix it by removing the unnecessary spi_master_get().
Call spi_master_get() holds the reference count to master device, thus
we need an additional spi_master_put() call to reduce the reference
count, otherwise we will leak a reference to master.
This commit fix it by removing the unnecessary spi_master_get().
Some Chromebooks use hard-coded interrupts in their ACPI tables.
This is an excerpt as dumped on Relm:
...
Name (_HID, "ELAN0001") // _HID: Hardware ID
Name (_DDN, "Elan Touchscreen ") // _DDN: DOS Device Name
Name (_UID, 0x05) // _UID: Unique ID
Name (ISTP, Zero)
Method (_CRS, 0, NotSerialized) // _CRS: Current Resource Settings
{
Name (BUF0, ResourceTemplate ()
{
I2cSerialBusV2 (0x0010, ControllerInitiated, 0x00061A80,
AddressingMode7Bit, "\\_SB.I2C1",
0x00, ResourceConsumer, , Exclusive,
)
Interrupt (ResourceConsumer, Edge, ActiveLow, Exclusive, ,, )
{
0x000000B8,
}
})
Return (BUF0) /* \_SB_.I2C1.ETSA._CRS.BUF0 */
}
...
This interrupt is hard-coded to 0xB8 = 184 which is too high to be mapped
to IO-APIC, so no triggering information is propagated as acpi_register_gsi()
fails and irqresource_disabled() is issued, which leads to erasing triggering
and polarity information.
Do not overwrite flags as it leads to erasing triggering and polarity
information which might be useful in case of hard-coded interrupts.
This way the information can be read later on even though mapping to
APIC domain failed.
Signed-off-by: Angela Czubak <acz@semihalf.com>
[ rjw: Changelog rearrangement ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In some cases when firmware is busy or updating, some mailbox commands
still timeout on some newer CPUs. To fix this issue, change how we
process timeout.
With this change, replaced timeout from using simple count with real
timeout in micro-seconds using ktime. When the command response takes
more than average processing time, yield to other tasks. The worst case
timeout is extended upto 1 milli-second.
The current string size to print cpulist can accommodate upto 80
logical CPUs per package. But this limit is not enough. So increase
the string size. Also prevent buffer overflow, if the string size
reaches limit.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Having a button code and not a key code causes issues with libinput.
udev won't set ID_INPUT_KEY. If it is forced, then it causes a bug
within libinput.
Deinit the device on shutdown to halt MHI/PCI operation on device
side. This change fixes floating device state with some hosts that
do not fully shutdown PCIe device when rebooting.
If a channel was explicitly stopped but not reset and a driver
remove is issued, clean up the channel context such that it is
reflected on the device. This move is useful if a client driver
module is unloaded or a device crash occurs with the host having
placed the channel in a stopped state.
The same values are parsed several times from transfer and event
TRBs by different functions in the same call path, all while processing
one transfer event.
As the TRBs are in DMA memory and can be accessed by the xHC host we want
to avoid this to prevent double-fetch issues.
To resolve this pass the already parsed values to the different functions
in the path of parsing a transfer event
The Max Interrupters supported by the controller is given in a 10bit
wide bitfield, but the driver uses a fixed 128 size array to index these
interrupters.
Klockwork reports a possible array out of bounds case which in theory
is possible. In practice this hasn't been hit as a common number of Max
Interrupters for new controllers is 8, not even close to 128.
By virtue of using platform_irq_get_optional() under the covers,
platform_irq_count() needs the target interrupt controller to be
available and may return -EPROBE_DEFER if it isn't. Let's use
dev_err_probe() to avoid a spurious error log (and help debug any
deferral issues) in that case.
We sometimes see COMMAND_IGNORED responses during the clock stop
sequence. It turns out we already have information if devices are
present on a link, so we should only prepare those when they
are attached.
In addition, even when COMMAND_IGNORED are received, we should still
proceed with the clock stop. The device will not be prepared but
that's not a problem.
The only case where the clock stop will fail is if the Cadence IP
reports an error (including a timeout), or if the devices throw a
COMMAND_FAILED response.
When Secure World returns, it may have changed the size attribute of the
memory references passed as [in/out] parameters. The GlobalPlatform TEE
Internal Core API specification does not restrict the values that this
size can take. In particular, Secure World may increase the value to be
larger than the size of the input buffer to indicate that it needs more.
Therefore, the size check in optee_from_msg_param() is incorrect and
needs to be removed. This fixes a number of failed test cases in the
GlobalPlatform TEE Initial Configuratiom Test Suite v2_0_0_0-2017_06_09
when OP-TEE is compiled without dynamic shared memory support
(CFG_CORE_DYN_SHM=n).
Commit 99e71c029213 ("arm64: dts: imx8mq-librem5: Don't mark buck3 as always on")
removed always-on marking from GPU regulator, which is great for power
saving - however it introduces additional i2c0 traffic which can be deadly
for devices from the Dogwood batch.
To workaround the i2c0 shutdown issue on Dogwood, this commit marks
buck3 as always-on again - but only for Dogwood (r3).
Signed-off-by: Sebastian Krzyszkowiak <sebastian.krzyszkowiak@puri.sm> Signed-off-by: Martin Kepplinger <martin.kepplinger@puri.sm> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The SW-initiated power gate toggling is dropped by PMC if there is
contention with a HW-initiated toggling, i.e. when one of CPU cores is
gated by cpuidle driver. Software should retry the toggling after 10
microseconds on Tegra20/30 SoCs, hence add the retrying. On Tegra114+ the
toggling method was changed in hardware, the TOGGLE_START bit indicates
whether PMC is busy or could accept the command to toggle, hence handle
that bit properly.
The problem pops up after enabling dynamic power gating of 3D hardware,
where 3D power domain fails to turn on/off "randomly".
The programming sequence and quirks are documented in TRMs, but PMC
driver obliviously re-used the Tegra20 logic for Tegra30+, which strikes
back now. The 10 microseconds and other timeouts aren't documented in TRM,
they are taken from downstream kernel.
When cross compiling x86 on an ARM machine with clang, there are several
errors along the lines of:
arch/x86/include/asm/page_64.h:52:7: error: invalid output constraint '=D' in asm
This happens because the x86 flags in the EFI stub are not derived from
KBUILD_CFLAGS like the other architectures are and the clang flags that
set the target architecture ('--target=') and the path to the GNU cross
tools ('--prefix=') are not present, meaning that the host architecture
is targeted.
These flags are available as $(CLANG_FLAGS) from the main Makefile so
add them to the cflags for x86 so that cross compiling works as expected.
When cross compiling x86 on an ARM machine with clang, there are several
errors along the lines of:
arch/x86/include/asm/string_64.h:27:10: error: invalid output constraint '=&c' in asm
This happens because the compressed boot Makefile reassigns KBUILD_CFLAGS
and drops the clang flags that set the target architecture ('--target=')
and the path to the GNU cross tools ('--prefix='), meaning that the host
architecture is targeted.
These flags are available as $(CLANG_FLAGS) from the main Makefile so
add them to the compressed boot folder's KBUILD_CFLAGS so that cross
compiling works as expected.
When cross-compiling with Clang, the `$(CLANG_FLAGS)' variable
contains additional flags needed to build C and assembly sources
for the target platform. Normally this variable is automatically
included in `$(KBUILD_CFLAGS)' via the top-level Makefile.
The x86 real-mode makefile builds `$(REALMODE_CFLAGS)' from a
plain assignment and therefore drops the Clang flags. This causes
Clang to not recognize x86-specific assembler directives:
 arch/x86/realmode/rm/header.S:36:1: error: unknown directive
 .type real_mode_header STT_OBJECT ; .size real_mode_header, .-real_mode_header
 ^
Explicit propagation of `$(CLANG_FLAGS)' to `$(REALMODE_CFLAGS)',
which is inherited by real-mode make rules, fixes cross-compilation
with Clang for x86 targets.
Relevant flags:
* `--target' sets the target architecture when cross-compiling. This
 flag must be set for both compilation and assembly (`KBUILD_AFLAGS')
 to support architecture-specific assembler directives.
* `-no-integrated-as' tells clang to assemble with GNU Assembler
 instead of its built-in LLVM assembler. This flag is set by default
 unless `LLVM_IAS=1' is set, because the LLVM assembler can't yet
 parse certain GNU extensions.
The TVK1281618 R3 sensors are different from the R2 board,
some incorrectness is fixed and some new sensors added, we
also rename the nodes appropriately with accelerometer@
etc.
This fixes warnings/errors like:
arch/arm/boot/dts/bcm4708-buffalo-wzr-1750dhp.dt.yaml: /: memory@0:reg:0: [0, 134217728, 2281701376, 402653184] is too long
From schema: /lib/python3.6/site-packages/dtschema/schemas/reg.yaml
To check whether the CPU and kernel support the MTE features we want
to test, we use an (emulated) CPU ID register read. However we only
check against a very particular feature version (0b0010), even though
the ARM ARM promises ID register features to be backwards compatible.
While this could be fixed by using ">=" instead of "==", we should
actually use the explicit HWCAP2_MTE hardware capability, exposed by the
kernel via the ELF auxiliary vectors.
That moves this responsibility to the kernel, and fixes running the
tests on machines with FEAT_MTE3 capability.
It should not be necessary to update the current_state field of
struct pci_dev in pci_enable_device_flags() before calling
do_pci_enable_device() for the device, because none of the
code between that point and the pci_set_power_state() call in
do_pci_enable_device() invoked later depends on it.
Moreover, doing that is actively harmful in some cases. For example,
if the given PCI device depends on an ACPI power resource whose _STA
method initially returns 0 ("off"), but the config space of the PCI
device is accessible and the power state retrieved from the
PCI_PM_CTRL register is D0, the current_state field in the struct
pci_dev representing that device will get out of sync with the
power.state of its ACPI companion object and that will lead to
power management issues going forward.
To avoid such issues it is better to leave the current_state value
as is until it is changed to PCI_D0 by do_pci_enable_device() as
appropriate. However, the power state of the device is not changed
to PCI_D0 if it is already enabled when pci_enable_device_flags()
gets called for it, so update its current_state in that case, but
use pci_update_current_state() covering platform PM too for that.
Link: https://lore.kernel.org/lkml/20210314000439.3138941-1-luzmaximilian@gmail.com/ Reported-by: Maximilian Luz <luzmaximilian@gmail.com> Tested-by: Maximilian Luz <luzmaximilian@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The mte selftest Makefile contains a check for GCC, to add the memtag
-march flag to the compiler options. This check fails if the compiler
is not explicitly specified, so reverts to the standard "cc", in which
case --version doesn't mention the "gcc" string we match against:
$ cc --version | head -n 1
cc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
This will not add the -march switch to the command line, so compilation
fails:
mte_helper.S: Assembler messages:
mte_helper.S:25: Error: selected processor does not support `irg x0,x0,xzr'
mte_helper.S:38: Error: selected processor does not support `gmi x1,x0,xzr'
...
Actually clang accepts the same -march option as well, so we can just
drop this check and add this unconditionally to the command line, to avoid
any future issues with this check altogether (gcc actually prints
basename(argv[0]) when called with --version).
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Mark Brown <broone@kernel.org> Link: https://lore.kernel.org/r/20210319165334.29213-2-andre.przywara@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Some hosts incorrectly use sub-minor version for minor version (i.e.
0x02 instead of 0x20 for bcdUSB 0x320 and 0x01 for bcdUSB 0x310).
Currently the xHCI driver works around this by just checking for minor
revision > 0x01 for USB 3.1 everywhere. With the addition of USB 3.2,
checking this gets a bit cumbersome. Since there is no USB release with
bcdUSB 0x301 to 0x309, we can assume that sub-minor version 01 to 09 is
incorrect. Let's try to fix this and use the minor revision that matches
with the USB/xHCI spec to help with the version checking within the
driver.
The current dwc3_gadget_reset_interrupt() will stop any active
transfers, but only addresses blocking of EP queuing for while we are
coming from a disconnected scenario, i.e. after receiving the disconnect
event. If the host decides to issue a bus reset on the device, the
connected parameter will still be set to true, allowing for EP queuing
to continue while we are disabling the functions. To avoid this, set the
connected flag to false until the stop active transfers is complete.
Currently user can configure UAC1 function with
parameters that violate UAC1 spec or are not supported
by UAC1 gadget implementation.
This can lead to incorrect behavior if such gadget
is connected to the host - like enumeration failure
or other issues depending on host's UAC1 driver
implementation, bringing user to a long hours
of debugging the issue.
Instead of silently accept these parameters, throw
an error if they are not valid.
Currently user can configure UAC2 function with
parameters that violate UAC2 spec or are not supported
by UAC2 gadget implementation.
This can lead to incorrect behavior if such gadget
is connected to the host - like enumeration failure
or other issues depending on host's UAC2 driver
implementation, bringing user to a long hours
of debugging the issue.
Instead of silently accept these parameters, throw
an error if they are not valid.
When irq_matrix_free() is called for an unallocated vector the
managed_allocated and total_allocated counters get out of sync with the
real state of the matrix. Later, when the last interrupt is freed, these
counters will underflow resulting in UINTMAX because the counters are
unsigned.
While this is certainly a problem of the calling code, this can be catched
in the allocator by checking the allocation bit for the to be freed vector
which simplifies debugging.
An example of the problem described above:
https://lore.kernel.org/lkml/20210318192819.636943062@linutronix.de/
Add the missing sanity check and emit a warning when it triggers.
Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210319111823.1105248-1-vkuznets@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
A malicious hypervisor could disable the CPUID intercept for an SEV or
SEV-ES guest and trick it into the no-SEV boot path, where it could
potentially reveal secrets. This is not an issue for SEV-SNP guests,
as the CPUID intercept can't be disabled for those.
Remove the Hypervisor CPUID bit check from the SEV detection code to
protect against this kind of attack and add a Hypervisor bit equals zero
check to the SME detection path to prevent non-encrypted guests from
trying to enable SME.
This handles the following cases:
1) SEV(-ES) guest where CPUID intercept is disabled. The guest
will still see leaf 0x8000001f and the SEV bit. It can
retrieve the C-bit and boot normally.
2) Non-encrypted guests with intercepted CPUID will check
the SEV_STATUS MSR and find it 0 and will try to enable SME.
This will fail when the guest finds MSR_K8_SYSCFG to be zero,
as it is emulated by KVM. But we can't rely on that, as there
might be other hypervisors which return this MSR with bit
23 set. The Hypervisor bit check will prevent that the guest
tries to enable SME in this case.
3) Non-encrypted guests on SEV capable hosts with CPUID intercept
disabled (by a malicious hypervisor) will try to boot into
the SME path. This will fail, but it is also not considered
a problem because non-encrypted guests have no protection
against the hypervisor anyway.
According with USB Device Class Definition for Video Device the
Processing Unit Descriptor bLength should be 12 (10 + bmControlSize),
but it has 11.
Invalid length caused that Processing Unit Descriptor Test Video form
CV tool failed. To fix this issue patch adds bmVideoStandards into
uvc_processing_unit_descriptor structure.
The bmVideoStandards field was added in UVC 1.1 and it wasn't part of
UVC 1.0a.
Patch adds extra checking for bInterval passed by configfs.
The 5.6.4 chapter of USB Specification (rev. 2.0) say:
"A high-bandwidth endpoint must specify a period of 1x125 µs
(i.e., a bInterval value of 1)."
The issue was observed during testing UVC class on CV.
I treat this change as improvement because we can control
bInterval by configfs.
Given that crypto_alloc_tfm() may return ERR pointers, and to avoid
crashes on obscure error paths where such pointers are presented to
crypto_destroy_tfm() (such as [0]), add an ERR_PTR check there
before dereferencing the second argument as a struct crypto_tfm
pointer.
In current design, whenever the BHI interrupt is fired, the
execution environment is updated. This can cause race conditions
and impede ongoing power up/down processing. For example, if a
power down is in progress, MHI host updates to a local "disabled"
execution environment. If a BHI interrupt fires later, that value
gets replaced with one from the BHI EE register. This impacts the
controller as it does not expect multiple RDDM execution
environment change status callbacks as an example. Another issue
would be that the device can enter mission mode and the execution
environment is updated, while device creation for SBL channels is
still going on due to slower PM state worker thread run, leading
to multiple attempts at opening the same channel.
Ensure that EE changes are handled only from appropriate places
and occur one after another and handle only PBL modes or RDDM EE
changes as critical events directly from the interrupt handler.
Simplify handling by waiting for SYS ERROR before handling RDDM.
This also makes sure that we use the correct execution environment
to notify the controller driver when the device resets to one of
the PBL execution environments.
Currently, client devices are created in SBL or AMSS (mission
mode) and only destroyed after power down or SYS ERROR. When
moving between certain execution environments, such as from SBL
to AMSS, no clean-up is required. This presents an issue where
SBL-specific channels are left open and client drivers now run in
an execution environment where they cannot operate. Fix this by
expanding the mhi_destroy_device() to do an execution environment
specific clean-up if one is requested. Close the gap and destroy
devices in such scenarios that allow SBL client drivers to clean
up once device enters mission mode.
The wake_db register presence is highly speculative and can fuze MHI
devices. Indeed, currently the wake_db register address is defined at
entry 127 of the 'Channel doorbell array', thus writing to this address
is equivalent to ringing the doorbell for channel 127, causing trouble
with some devics (e.g. SDX24 based modems) that get an unexpected
channel 127 doorbell interrupt.
This change fixes that issue by setting wake get/put as no-op for
pci_generic devices. The wake device sideband mechanism seems really
specific to each device, and is AFAIK not defined by the MHI spec.
It also removes zeroing initialization of wake_db register during MMIO
initialization, the register being set via wake_get/put accessors few
cycles later during M0 transition.
This removes the assignment of setup and cleanup functions for the ath79
target. Assigning the setup-method will lead to 'setup_transfer' not
being assigned in spi_bitbang_init. Because of this, performing any
TX/RX operation will lead to a kernel oops.
Also drop the redundant cleanup assignment, as it's also assigned in
spi_bitbang_init.
spi-bitbang has to call the chipselect function on the ath79 SPI driver
in order to communicate with the SPI slave device, as the ath79 SPI
driver has three dedicated chipselect lines but can also be used with
GPIOs for the CS lines.
Fixes commit 4a07b8bcd503 ("spi: bitbang: Make chipselect callback optional")
We want to probe l4_wkup and l4_cfg interconnect devices first to avoid
issues with missing resources. Otherwise we attempt to probe l4_per
devices first causing pointless deferred probe and also annoyingh
renumbering of the MMC devices for example.
Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Trusted Foundation firmware doesn't implement the do_idle call and in
this case suspending should fall back to the common suspend path. In order
to fix this issue we will unconditionally set the NOFLUSH_L2 mode via
firmware call, which is a NO-OP on Tegra30/124, and then proceed to the
C7 idling, like it was done by the older Tegra114 cpuidle driver.
Fixes: 14e086baca50 ("cpuidle: tegra: Squash Tegra114 driver into the common driver") Cc: stable@vger.kernel.org # 5.7+ Reported-by: Anton Bambura <jenneron@protonmail.com> # TF701 T114 Tested-by: Anton Bambura <jenneron@protonmail.com> # TF701 T114 Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30 Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30 Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210302095405.28453-1-digetx@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use kzalloc() rather than kmalloc() for the dynamically allocated parts
of the colormap in fb_alloc_cmap_gfp, to prevent a leak of random kernel
data to userspace under certain circumstances.
The return value on success (>= 0) is overwritten by the return value of
put_old_timex32(). That works correct in the fault case, but is wrong for
the success case where put_old_timex32() returns 0.
Just check the return value of put_old_timex32() and return -EFAULT in case
it is not zero.
[ tglx: Massage changelog ]
Fixes: 3a4d44b61625 ("ntp: Move adjtimex related compat syscalls to native counterparts") Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Richard Cochran <richardcochran@gmail.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210414030449.90692-1-chenjun102@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For zoned btrfs, zone append is mandatory to write to a sequential write
only zone, otherwise parallel writes to the same zone could result in
unaligned write errors.
If a zoned block device does not support zone append (e.g. a dm-crypt
zoned device using a non-NULL IV cypher), fail to mount.
CC: stable@vger.kernel.org # 5.12 Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When doing a device replace on a zoned filesystem, if we find a block
group with ->to_copy == 0, we jump to the label 'done', which will result
in later calling btrfs_unfreeze_block_group(), even though at this point
we never called btrfs_freeze_block_group().
Since at this point we have neither turned the block group to RO mode nor
made any progress, we don't need to jump to the label 'done'. So fix this
by jumping instead to the label 'skip' and dropping our reference on the
block group before the jump.
Fixes: 78ce9fc269af6e ("btrfs: zoned: mark block groups to copy for device-replace") CC: stable@vger.kernel.org # 5.12 Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a race between a task aborting a transaction during a commit,
a task doing an fsync and the transaction kthread, which leads to an
use-after-free of the log root tree. When this happens, it results in a
stack trace like the following:
The steps that lead to this crash are the following:
1) We are at transaction N;
2) We have two tasks with a transaction handle attached to transaction N.
Task A and Task B. Task B is doing an fsync;
3) Task B is at btrfs_sync_log(), and has saved fs_info->log_root_tree
into a local variable named 'log_root_tree' at the top of
btrfs_sync_log(). Task B is about to call write_all_supers(), but
before that...
4) Task A calls btrfs_commit_transaction(), and after it sets the
transaction state to TRANS_STATE_COMMIT_START, an error happens before
it waits for the transaction's 'num_writers' counter to reach a value
of 1 (no one else attached to the transaction), so it jumps to the
label "cleanup_transaction";
5) Task A then calls cleanup_transaction(), where it aborts the
transaction, setting BTRFS_FS_STATE_TRANS_ABORTED on fs_info->fs_state,
setting the ->aborted field of the transaction and the handle to an
errno value and also setting BTRFS_FS_STATE_ERROR on fs_info->fs_state.
After that, at cleanup_transaction(), it deletes the transaction from
the list of transactions (fs_info->trans_list), sets the transaction
to the state TRANS_STATE_COMMIT_DOING and then waits for the number
of writers to go down to 1, as it's currently 2 (1 for task A and 1
for task B);
6) The transaction kthread is running and sees that BTRFS_FS_STATE_ERROR
is set in fs_info->fs_state, so it calls btrfs_cleanup_transaction().
There it sees the list fs_info->trans_list is empty, and then proceeds
into calling btrfs_drop_all_logs(), which frees the log root tree with
a call to btrfs_free_log_root_tree();
7) Task B calls write_all_supers() and, shortly after, under the label
'out_wake_log_root', it deferences the pointer stored in
'log_root_tree', which was already freed in the previous step by the
transaction kthread. This results in a use-after-free leading to a
crash.
Fix this by deleting the transaction from the list of transactions at
cleanup_transaction() only after setting the transaction state to
TRANS_STATE_COMMIT_DOING and waiting for all existing tasks that are
attached to the transaction to release their transaction handles.
This makes the transaction kthread wait for all the tasks attached to
the transaction to be done with the transaction before dropping the
log roots and doing other cleanups.
Fixes: ef67963dac255b ("btrfs: drop logs when we've aborted a transaction") CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When creating a subvolume we allocate an extent buffer for its root node
after starting a transaction. We setup a root item for the subvolume that
points to that extent buffer and then attempt to insert the root item into
the root tree - however if that fails, due to ENOMEM for example, we do
not free the extent buffer previously allocated and we do not abort the
transaction (as at that point we did nothing that can not be undone).
This means that we effectively do not return the metadata extent back to
the free space cache/tree and we leave a delayed reference for it which
causes a metadata extent item to be added to the extent tree, in the next
transaction commit, without having backreferences. When this happens
'btrfs check' reports the following:
$ btrfs check /dev/sdi
Opening filesystem to check...
Checking filesystem on /dev/sdi
UUID: dce2cb9d-025f-4b05-a4bf-cee0ad3785eb
[1/7] checking root items
[2/7] checking extents
ref mismatch on [30425088 16384] extent item 1, found 0
backref 30425088 root 256 not referenced back 0x564a91c23d70
incorrect global backref count on 30425088 found 1 wanted 0
backpointer mismatch on [30425088 16384]
owner ref check failed [30425088 16384]
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 212992 bytes used, error(s) found
total csum bytes: 0
total tree bytes: 131072
total fs tree bytes: 32768
total extent tree bytes: 16384
btree space waste bytes: 124669
file data blocks allocated: 65536
referenced 65536
So fix this by freeing the metadata extent if btrfs_insert_root() returns
an error.
CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix a regression caused by making the 486SX separately selectable in
Kconfig, for which the HIGHMEM64G setting has not been updated and
therefore has become exposed as a user-selectable option for the M486SX
configuration setting unlike with original M486 and all the other
settings that choose non-PAE-enabled processors:
High Memory Support
> 1. off (NOHIGHMEM)
2. 4GB (HIGHMEM4G)
3. 64GB (HIGHMEM64G)
choice[1-3?]:
With the fix in place the setting is now correctly removed:
High Memory Support
> 1. off (NOHIGHMEM)
2. 4GB (HIGHMEM4G)
choice[1-2?]:
[ bp: Massage commit message. ]
Fixes: 87d6021b8143 ("x86/math-emu: Limit MATH_EMULATION to 486SX compatibles") Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # v5.5+ Link: https://lkml.kernel.org/r/alpine.DEB.2.21.2104141221340.44318@angie.orcam.me.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is already after the patch "btrfs: inode: fix NULL pointer
dereference if inode doesn't need compression."
[CAUSE]
@pages is firstly created by kcalloc() in compress_file_extent():
pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
Then passed to btrfs_compress_pages() to be utilized there:
ret = btrfs_compress_pages(...
pages,
&nr_pages,
...);
btrfs_compress_pages() will initialize each page as output, in
zlib_compress_pages() we have:
pages[nr_pages] = out_page;
nr_pages++;
Normally this is completely fine, but there is a special case which
is in btrfs_compress_pages() itself:
switch (type) {
default:
return -E2BIG;
}
In this case, we didn't modify @pages nor @out_pages, leaving them
untouched, then when we cleanup pages, the we can hit NULL pointer
dereference again:
if (pages) {
for (i = 0; i < nr_pages; i++) {
WARN_ON(pages[i]->mapping);
put_page(pages[i]);
}
...
}
Since pages[i] are all initialized to zero, and btrfs_compress_pages()
doesn't change them at all, accessing pages[i]->mapping would lead to
NULL pointer dereference.
This is not possible for current kernel, as we check
inode_need_compress() before doing pages allocation.
But if we're going to remove that inode_need_compress() in
compress_file_extent(), then it's going to be a problem.
[FIX]
When btrfs_compress_pages() hits its default case, modify @out_pages to
0 to prevent such problem from happening.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212331 CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* rqst[1,2,3] is allocated in vars
* each rqst->rq_iov is also allocated in vars or using pooled memory
SMB2_open_free, SMB2_ioctl_free, SMB2_query_info_free are iterating on
each rqst after vars has been freed (use-after-free), and they are
freeing the kvec a second time (double-free).
==================================================================
BUG: KASAN: use-after-free in SMB2_open_free+0x1c/0xa0
Read of size 8 at addr ffff888007b10c00 by task python3/1200
Memory state around the buggy address: ffff888007b10b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888007b10b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888007b10c00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff888007b10c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888007b10d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Signed-off-by: Aurelien Aptel <aaptel@suse.com> CC: <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The commit 315db9a05b7a ("cifs: fix leak in cifs_smb3_do_mount() ctx")
revealed an existing bug when mounting shares that contain a prefix
path or DFS links.
cifs_setup_volume_info() requires the @devname to contain the full
path (UNC + prefix) to update the fs context with the new UNC and
prepath values, however we were passing only the UNC
path (old_ctx->UNC) in @device thus discarding any prefix paths.
Instead of concatenating both old_ctx->{UNC,prepath} and pass it in
@devname, just keep the dup'ed values of UNC and prepath in
cifs_sb->ctx after calling smb3_fs_context_dup(), and fix
smb3_parse_devname() to correctly parse and not leak the new UNC and
prefix paths.
Cc: <stable@vger.kernel.org> # v5.11+ Fixes: 315db9a05b7a ("cifs: fix leak in cifs_smb3_do_mount() ctx") Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Acked-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We can detect server unresponsiveness only if echoes are enabled.
Echoes can be disabled under two scenarios:
1. The connection is low on credits, so we've disabled echoes/oplocks.
2. The connection has not seen any request till now (other than
negotiate/sess-setup), which is when we enable these two, based on
the credits available.
So this fix will check for dead connection, only when echo is enabled.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> CC: <stable@vger.kernel.org> # v5.8+ Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cifs_smb3_do_mount() calls smb3_fs_context_dup() and then
cifs_setup_volume_info(). The latter's subsequent smb3_parse_devname()
call overwrites the cifs_sb->ctx->UNC string already dup'ed by
smb3_fs_context_dup(), resulting in a leak. E.g.
This change is a bandaid until the cifs_setup_volume_info() TODO and
error handling issues are resolved.
Signed-off-by: David Disseldorp <ddiss@suse.de> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> CC: <stable@vger.kernel.org> # v5.11+ Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If smb3_notify() is called at mount point of CIFS, build_path_from_dentry()
returns the pointer to kmalloc-ed memory with terminating zero (this is
empty FileName to be passed to SMB2 CREATE request). This pointer is assigned
to the `path` variable.
Then `path + 1` (to skip first backslash symbol) is passed to
cifs_convert_path_to_utf16(). This is incorrect for empty path and causes
out-of-bound memory access.
Get rid of this "increase by one". cifs_convert_path_to_utf16() already
contains the check for leading backslash in the path.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=212693 CC: <stable@vger.kernel.org> # v5.6+ Signed-off-by: Eugene Korenevsky <ekorenevsky@astralinux.ru> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[440700.376476] CIFS VFS: \\otters.example.com crypt_message: Could not get encryption key
[440700.386947] ------------[ cut here ]------------
[440700.386948] err = 1
[440700.386977] WARNING: CPU: 11 PID: 2733 at /build/linux-hwe-5.4-p6lk6L/linux-hwe-5.4-5.4.0/lib/errseq.c:74 errseq_set+0x5c/0x70
...
[440700.397304] CPU: 11 PID: 2733 Comm: tar Tainted: G OE 5.4.0-70-generic #78~18.04.1-Ubuntu
...
[440700.397334] Call Trace:
[440700.397346] __filemap_set_wb_err+0x1a/0x70
[440700.397419] cifs_writepages+0x9c7/0xb30 [cifs]
[440700.397426] do_writepages+0x4b/0xe0
[440700.397444] __filemap_fdatawrite_range+0xcb/0x100
[440700.397455] filemap_write_and_wait+0x42/0xa0
[440700.397486] cifs_setattr+0x68b/0xf30 [cifs]
[440700.397493] notify_change+0x358/0x4a0
[440700.397500] utimes_common+0xe9/0x1c0
[440700.397510] do_utimes+0xc5/0x150
[440700.397520] __x64_sys_utimensat+0x88/0xd0
Fixes: 61cfac6f267d ("CIFS: Fix possible use after free in demultiplex thread") Signed-off-by: Paul Aurich <paul@darkrain42.org> CC: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
which is cause by a 'BUG_ON(in_nmi())' in nmi_enter().
From the call trace, we can find three interrupts (noted A, B, C above):
interrupt (A) is preempted by (B), which is further interrupted by (C).
Subsequent investigations show that (B) results in nmi_enter() being
called, but that it actually is a spurious interrupt. Furthermore,
interrupts are reenabled in the context of (B), and (C) fires with
NMI priority. We end-up with a nested NMI situation, something
we definitely do not want to (and cannot) handle.
The bug here is that spurious interrupts should never result in any
state change, and we should just return to the interrupted context.
Moving the handling of spurious interrupts as early as possible in
the GICv3 handler fixes this issue.
Fixes: 3f1f3234bc2d ("irqchip/gic-v3: Switch to PMR masking before calling IRQ handler") Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: He Ying <heying24@huawei.com>
[maz: rewrote commit message, corrected Fixes: tag] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210423083516.170111-1-heying24@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The mmc core uses a PM notifier to temporarily during system suspend, turn
off the card detection mechanism for removal/insertion of (e)MMC/SD/SDIO
cards. Additionally, the notifier may be used to remove an SDIO card
entirely, if a corresponding SDIO functional driver don't have the system
suspend/resume callbacks assigned. This behaviour has been around for a
very long time.
However, a recent bug report tells us there are problems with this
approach. More precisely, when receiving the PM_SUSPEND_PREPARE
notification, we may end up hanging on I/O to be completed, thus also
preventing the system from getting suspended.
In the end what happens, is that the cancel_delayed_work_sync() in
mmc_pm_notify() ends up waiting for mmc_rescan() to complete - and since
mmc_rescan() wants to claim the host, it needs to wait for the I/O to be
completed first.
Typically, this problem is triggered in Android, if there is ongoing I/O
while the user decides to suspend, resume and then suspend the system
again. This due to that after the resume, an mmc_rescan() work gets punted
to the workqueue, which job is to verify that the card remains inserted
after the system has resumed.
To fix this problem, userspace needs to become frozen to suspend the I/O,
prior to turning off the card detection mechanism. Therefore, let's drop
the PM notifiers for mmc subsystem altogether and rely on the card
detection to be turned off/on as a part of the system_freezable_wq, that we
are already using.
Moreover, to allow and SDIO card to be removed during system suspend, let's
manage this from a ->prepare() callback, assigned at the mmc_host_class
level. In this way, we can use the parent device (the mmc_host_class
device), to remove the card device that is the child, in the
device_prepare() phase.
Some of SD cards sets permanent write protection bit in their CSD register,
due to lifespan or internal problem. To avoid unnecessary I/O write
operations, let's parse the bits in the CSD during initialization and mark
the card as read only for this case.