These symbols are currently part of the generic archrandom.h
interface, but are currently unused and can be removed.
Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20200110145422.49141-4-broonie@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
These symbols are currently part of the generic archrandom.h
interface, but are currently unused and can be removed.
Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20200110145422.49141-3-broonie@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use the expansion of these macros directly in arch_get_random_*.
These symbols are currently part of the generic archrandom.h
interface, but are currently unused and can be removed.
Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20200110145422.49141-2-broonie@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As crng_initialize_secondary() is only called by do_numa_crng_init(),
and the latter is under ifdeffery for CONFIG_NUMA, when CONFIG_NUMA is
not selected the compiler will warn that the former is unused:
| drivers/char/random.c:820:13: warning: 'crng_initialize_secondary' defined but not used [-Wunused-function]
| 820 | static void crng_initialize_secondary(struct crng_state *crng)
| | ^~~~~~~~~~~~~~~~~~~~~~~~~
Stephen reports that this happens for x86_64 noallconfig builds.
We could move crng_initialize_secondary() and crng_init_try_arch() under
the CONFIG_NUMA ifdeffery, but this has the unfortunate property of
separating them from crng_initialize_primary() and
crng_init_try_arch_early() respectively. Instead, let's mark
crng_initialize_secondary() as __maybe_unused.
Link: https://lore.kernel.org/r/20200310121747.GA49602@lakrids.cambridge.arm.com Fixes: 5cbe0f13b51a ("random: split primary/secondary crng init paths") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently crng_initialize() is used for both the primary CRNG and
secondary CRNGs. While we wish to share common logic, we need to do a
number of additional things for the primary CRNG, and this would be
easier to deal with were these handled in separate functions.
This patch splits crng_initialize() into crng_initialize_primary() and
crng_initialize_secondary(), with common logic factored out into a
crng_init_try_arch() helper.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20200210130015.17664-2-mark.rutland@arm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, if CONFIG_RANDOM_TRUST_BOOTLOADER is enabled, multiple calls
to add_bootloader_randomness() are broken and can cause a NULL pointer
dereference, as noted by Ivan T. Ivanov. This is not only a hypothetical
problem, as qemu on arm64 may provide bootloader entropy via EFI and via
devicetree.
On the first call to add_hwgenerator_randomness(), crng_fast_load() is
executed, and if the seed is long enough, crng_init will be set to 1.
On subsequent calls to add_bootloader_randomness() and then to
add_hwgenerator_randomness(), crng_fast_load() will be skipped. Instead,
wait_event_interruptible() and then credit_entropy_bits() will be called.
If the entropy count for that second seed is large enough, that proceeds
to crng_reseed().
However, both wait_event_interruptible() and crng_reseed() depends
(at least in numa_crng_init()) on workqueues. Therefore, test whether
system_wq is already initialized, which is a sufficient indicator that
workqueue_init_early() has progressed far enough.
If we wind up hitting the !system_wq case, we later want to do what
would have been done there when wqs are up, so set a flag, and do that
work later from the rand_initialize() call.
Reported-by: Ivan T. Ivanov <iivanov@suse.de> Fixes: 18b915ac6b0a ("efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness") Cc: stable@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
[Jason: added crng_need_done state and related logic.] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
but those code is so old, and I have no clue how to de-couple it after
checking other locks in the splat. There is an onging effort to make all
printk() as deferred, so until that happens, workaround it for now as a
short-term fix.
/dev/random and getrandom() never warn. Split the meat of
urandom_read() into urandom_read_nowarn() and leave the warning code
in urandom_read().
This has no effect on kernel behavior, but it makes subsequent
patches more straightforward. It also makes the fact that
getrandom() never warns more obvious.
crng_init_wait is only used to wayt for crng_init to be set to 2, so
there's no point to waking it when crng_init is set to 1. Remove the
unnecessary wake_up_interruptible() call.
With SHA-1 no longer being used for anything performance oriented, and
also soon to be phased out entirely, we can make up for the space added
by unrolled BLAKE2s by simply re-rolling SHA-1. Since SHA-1 is so much
more complex, re-rolling it more or less takes care of the code size
added by BLAKE2s. And eventually, hopefully we'll see SHA-1 removed
entirely from most small kernel builds.
Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ard Biesheuvel <ardb@kernel.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Basically nobody should use blake2s in an HMAC construction; it already
has a keyed variant. But unfortunately for historical reasons, Noise,
used by WireGuard, uses HKDF quite strictly, which means we have to use
this. Because this really shouldn't be used by others, this commit moves
it into wireguard's noise.c locally, so that kernels that aren't using
WireGuard don't get this superfluous code baked in. On m68k systems,
this shaves off ~314 bytes.
Cc: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Ard Biesheuvel <ardb@kernel.org>
[Jason: for stable, skip the wireguard changes, since this kernel
doesn't have wireguard.] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The C implementation was originally based on Samuel Neves' public
domain reference implementation but has since been heavily modified
for the kernel. We're able to do compile-time optimizations by moving
some scaffolding around the final function into the header file.
Information: https://blake2.net/
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Co-developed-by: Samuel Neves <sneves@dei.uc.pt>
[ardb: - move from lib/zinc to lib/crypto
- remove simd handling
- rewrote selftest for better coverage
- use fixed digest length for blake2s_hmac() and rename to
blake2s256_hmac() ] Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[Jason: for stable, skip kconfig and wire up directly, and skip the arch
hooks; optimized implementations need not be backported.] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 03a3bb7ae631 ("hwrng: core - Freeze khwrng
thread during suspend"), ff296293b353 ("random: Support freezable
kthreads in add_hwgenerator_randomness()") and 59b569480dc8 ("random:
Use wait_event_freezable() in add_hwgenerator_randomness()").
These patches introduced regressions and we need more time to
get them ready for mainline.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On Tue, Oct 01, 2019 at 10:14:40AM -0700, Linus Torvalds wrote:
> The previous state of the file didn't have that 0xa at the end, so you get that
>
>
> -EXPORT_SYMBOL_GPL(add_bootloader_randomness);
> \ No newline at end of file
> +EXPORT_SYMBOL_GPL(add_bootloader_randomness);
>
> which is "the '-' line doesn't have a newline, the '+' line does" marker.
Aaha, that makes total sense, thanks for explaining. Oh well, let's fix
it then so that people don't scratch heads like me.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sebastian reports that after commit ff296293b353 ("random: Support freezable
kthreads in add_hwgenerator_randomness()") we can call might_sleep() when the
task state is TASK_INTERRUPTIBLE (state=1). This leads to the following warning.
do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000349d1489>] prepare_to_wait_event+0x5a/0x180
WARNING: CPU: 0 PID: 828 at kernel/sched/core.c:6741 __might_sleep+0x6f/0x80
Modules linked in:
We shouldn't call kthread_freezable_should_stop() from deep within the
wait_event code because the task state is still set as
TASK_INTERRUPTIBLE instead of TASK_RUNNING and
kthread_freezable_should_stop() will try to call into the freezer with
the task in the wrong state. Use wait_event_freezable() instead so that
it calls schedule() in the right place and tries to enter the freezer
when the task state is TASK_RUNNING instead.
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Keerthy <j-keerthy@ti.com> Fixes: ff296293b353 ("random: Support freezable kthreads in add_hwgenerator_randomness()") Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Introducing a chosen node, rng-seed, which is an entropy that can be
passed to kernel called very early to increase initial device
randomness. Bootloader should provide this entropy and the value is
read from /chosen/rng-seed in DT.
Obtain of_fdt_crc32 for CRC check after early_init_dt_scan_nodes(),
since early_init_dt_scan_chosen() would modify fdt to erase rng-seed.
Add a new interface add_bootloader_randomness() for rng-seed use case.
Depends on whether the seed is trustworthy, rng seed would be passed to
add_hwgenerator_randomness(). Otherwise it would be passed to
add_device_randomness(). Decision is controlled by kernel config
RANDOM_TRUST_BOOTLOADER.
Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Theodore Ts'o <tytso@mit.edu> # drivers/char/random.c Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The kthread calling this function is freezable after commit 03a3bb7ae631
("hwrng: core - Freeze khwrng thread during suspend") is applied.
Unfortunately, this function uses wait_event_interruptible() but doesn't
check for the kthread being woken up by the fake freezer signal. When a
user suspends the system, this kthread will wake up and if it fails the
entropy size check it will immediately go back to sleep and not go into
the freezer. Eventually, suspend will fail because the task never froze
and a warning message like this may appear:
PM: suspend entry (deep)
Filesystems sync: 0.000 seconds
Freezing user space processes ... (elapsed 0.001 seconds) done.
OOM killer disabled.
Freezing remaining freezable tasks ...
Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
hwrng R running task 0 289 2 0x00000020
[<c08c64c4>] (__schedule) from [<c08c6a10>] (schedule+0x3c/0xc0)
[<c08c6a10>] (schedule) from [<c05dbd8c>] (add_hwgenerator_randomness+0xb0/0x100)
[<c05dbd8c>] (add_hwgenerator_randomness) from [<bf1803c8>] (hwrng_fillfn+0xc0/0x14c [rng_core])
[<bf1803c8>] (hwrng_fillfn [rng_core]) from [<c015abec>] (kthread+0x134/0x148)
[<c015abec>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
Check for a freezer signal here and skip adding any randomness if the
task wakes up because it was frozen. This should make the kthread freeze
properly and suspend work again.
Fixes: 03a3bb7ae631 ("hwrng: core - Freeze khwrng thread during suspend") Reported-by: Keerthy <j-keerthy@ti.com> Tested-by: Keerthy <j-keerthy@ti.com> Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: eb9d1bf079bb: "random: only read from /dev/random after its pool has received 128 bits" Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some architectures set up CFLAGS for linux decompressor phase from
scratch and do not include GCC_PLUGINS_CFLAGS. Since "latent_entropy"
variable declaration is generated by the plugin code itself including
linux/random.h in decompressor code then would cause a build
error. E.g. on s390:
In file included from ./include/linux/net.h:22,
from ./include/linux/skbuff.h:29,
from ./include/linux/if_ether.h:23,
from ./arch/s390/include/asm/diag.h:12,
from arch/s390/boot/startup.c:8:
./include/linux/random.h: In function 'add_latent_entropy':
./include/linux/random.h:26:39: error: 'latent_entropy' undeclared
(first use in this function); did you mean 'add_latent_entropy'?
26 | add_device_randomness((const void *)&latent_entropy,
| ^~~~~~~~~~~~~~
| add_latent_entropy
./include/linux/random.h:26:39: note: each undeclared identifier is
reported only once for each function it appears in
The build error is triggered by commit a80313ff91ab ("s390/kernel:
introduce .dma sections") which made it into 5.2 merge window.
To address that avoid using CONFIG_GCC_PLUGIN_LATENT_ENTROPY in
favour of LATENT_ENTROPY_PLUGIN definition which is defined as a
part of gcc plugins cflags and hence reflect more accurately when gcc
plugin is active. Besides that it is also used for similar purpose in
linux/compiler-gcc.h for latent_entropy attribute definition.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Right now rand_initialize() is run as an early_initcall(), but it only
depends on timekeeping_init() (for mixing ktime_get_real() into the
pools). However, the call to boot_init_stack_canary() for stack canary
initialization runs earlier, which triggers a warning at boot:
random: get_random_bytes called from start_kernel+0x357/0x548 with crng_init=0
Instead, this moves rand_initialize() to after timekeeping_init(), and moves
canary initialization here as well.
Note that this warning may still remain for machines that do not have
UEFI RNG support (which initializes the RNG pools during setup_arch()),
or for x86 machines without RDRAND (or booting without "random.trust=on"
or CONFIG_RANDOM_TRUST_CPU=y).
Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Immediately after boot, we allow reads from /dev/random before its
entropy pool has been fully initialized. Fix this so that we don't
allow this until the blocking pool has received 128 bits.
We do this by repurposing the initialized flag in the entropy pool
struct, and use the initialized flag in the blocking pool to indicate
whether it is safe to pull from the blocking pool.
To do this, we needed to rework when we decide to push entropy from the
input pool to the blocking pool, since the initialized flag for the
input pool was used for this purpose. To simplify things, we no
longer use the initialized flag for that purpose, nor do we use the
entropy_total field any more.
Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since the definition of struct crng_state is private to random.c, and
primary_crng is neither declared or used elsewhere, there's no reason
for that symbol to have external linkage.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit 45089142b149 Aneesh had missed one (admittedly, very unlikely
to hit) case in v9fs_stat2inode_dotl(). However, the same considerations
apply there as well - we have no business whatsoever to change ->i_rdev
or the file type.
Cc: Tadeusz Struk <tadeusz.struk@linaro.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The enumeration of MD_CLEAR in CPUID(EAX=7,ECX=0).EDX{bit 10} is not an
accurate indicator on all CPUs of whether the VERW instruction will
overwrite fill buffers. FB_CLEAR enumeration in
IA32_ARCH_CAPABILITIES{bit 17} covers the case of CPUs that are not
vulnerable to MDS/TAA, indicating that microcode does overwrite fill
buffers.
Guests running in VMM environments may not be aware of all the
capabilities/vulnerabilities of the host CPU. Specifically, a guest may
apply MDS/TAA mitigations when a virtual CPU is enumerated as vulnerable
to MDS/TAA even when the physical CPU is not. On CPUs that enumerate
FB_CLEAR_CTRL the VMM may set FB_CLEAR_DIS to skip overwriting of fill
buffers by the VERW instruction. This is done by setting FB_CLEAR_DIS
during VMENTER and resetting on VMEXIT. For guests that enumerate
FB_CLEAR (explicitly asking for fill buffer clear capability) the VMM
will not use FB_CLEAR_DIS.
Irrespective of guest state, host overwrites CPU buffers before VMENTER
to protect itself from an MMIO capable guest, as part of mitigation for
MMIO Stale Data vulnerabilities.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[cascardo: arch/x86/kvm/vmx.c has been split and context adjustment at
vmx_vcpu_run]
[cascardo: moved functions so they are after struct vcpu_vmx definition]
[cascardo: fb_clear is disabled/enabled around __vmx_vcpu_run] Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Shared Buffers Data Sampling (SBDS) variant of Processor MMIO Stale
Data vulnerabilities may expose RDRAND, RDSEED and SGX EGETKEY data.
Mitigation for this is added by a microcode update.
As some of the implications of SBDS are similar to SRBDS, SRBDS mitigation
infrastructure can be leveraged by SBDS. Set X86_BUG_SRBDS and use SRBDS
mitigation.
Mitigation is enabled by default; use srbds=off to opt-out. Mitigation
status can be checked from below file:
/sys/devices/system/cpu/vulnerabilities/srbds
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[cascardo: adjust for processor model names] Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, Linux disables SRBDS mitigation on CPUs not affected by
MDS and have the TSX feature disabled. On such CPUs, secrets cannot
be extracted from CPU fill buffers using MDS or TAA. Without SRBDS
mitigation, Processor MMIO Stale Data vulnerabilities can be used to
extract RDRAND, RDSEED, and EGETKEY data.
Do not disable SRBDS mitigation by default when CPU is also affected by
Processor MMIO Stale Data vulnerabilities.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add the sysfs reporting file for Processor MMIO Stale Data
vulnerability. It exposes the vulnerability and mitigation state similar
to the existing files for the other hardware vulnerabilities.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the CPU is affected by Processor MMIO Stale Data vulnerabilities,
Fill Buffer Stale Data Propagator (FBSDP) can propagate stale data out
of Fill buffer to uncore buffer when CPU goes idle. Stale data can then
be exploited with other variants using MMIO operations.
Mitigate it by clearing the Fill buffer before entering idle state.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
MDS, TAA and Processor MMIO Stale Data mitigations rely on clearing CPU
buffers. Moreover, status of these mitigations affects each other.
During boot, it is important to maintain the order in which these
mitigations are selected. This is especially true for
md_clear_update_mitigation() that needs to be called after MDS, TAA and
Processor MMIO Stale Data mitigation selection is done.
Introduce md_clear_select_mitigation(), and select all these mitigations
from there. This reflects relationships between these mitigations and
ensures proper ordering.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Processor MMIO Stale Data is a class of vulnerabilities that may
expose data after an MMIO operation. For details please refer to
Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst.
These vulnerabilities are broadly categorized as:
Device Register Partial Write (DRPW):
Some endpoint MMIO registers incorrectly handle writes that are
smaller than the register size. Instead of aborting the write or only
copying the correct subset of bytes (for example, 2 bytes for a 2-byte
write), more bytes than specified by the write transaction may be
written to the register. On some processors, this may expose stale
data from the fill buffers of the core that created the write
transaction.
Shared Buffers Data Sampling (SBDS):
After propagators may have moved data around the uncore and copied
stale data into client core fill buffers, processors affected by MFBDS
can leak data from the fill buffer.
Shared Buffers Data Read (SBDR):
It is similar to Shared Buffer Data Sampling (SBDS) except that the
data is directly read into the architectural software-visible state.
An attacker can use these vulnerabilities to extract data from CPU fill
buffers using MDS and TAA methods. Mitigate it by clearing the CPU fill
buffers using the VERW instruction before returning to a user or a
guest.
On CPUs not affected by MDS and TAA, user application cannot sample data
from CPU fill buffers using MDS or TAA. A guest with MMIO access can
still use DRPW or SBDR to extract data architecturally. Mitigate it with
VERW instruction to clear fill buffers before VMENTER for MMIO capable
guests.
Add a kernel parameter mmio_stale_data={off|full|full,nosmt} to control
the mitigation.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[cascardo: arch/x86/kvm/vmx.c has been moved] Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Processor MMIO Stale Data mitigation uses similar mitigation as MDS and
TAA. In preparation for adding its mitigation, add a common function to
update all mitigations that depend on MD_CLEAR.
[ bp: Add a newline in md_clear_update_mitigation() to separate
statements better. ]
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Processor MMIO Stale Data is a class of vulnerabilities that may
expose data after an MMIO operation. For more details please refer to
Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst
Add the Processor MMIO Stale Data bug enumeration. A microcode update
adds new bits to the MSR IA32_ARCH_CAPABILITIES, define them.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[cascardo: adapted family names to the ones in v4.19] Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Japser Lake is an Atom family processor.
It uses Tremont cores and is targeted at mobile platforms.
Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit dfeae1073583("mtd: cfi_cmdset_0002: Change write buffer to
check correct value") buffered writes fail on S29GL064N. This is
because, on S29GL064N, reads return 0xFF at the end of DQ polling for
write completion, where as, chip_good() check expects actual data
written to the last location to be returned post DQ polling completion.
Fix is to revert to using chip_good() for S29GL064N which only checks
for DQ lines to settle down to determine write completion.
The RAID0 layout is irrelevant if all members have the same size so the
array has only one zone. It is *also* irrelevant if the array has two
zones and the second zone has only one device, for example if the array
has two members of different sizes.
So in that case it makes sense to allow assembly even when the layout is
undefined, like what is done when the array has only one zone.
Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Pascal Hambourg <pascal@plouf.fr.eu.org> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ptrace PEEKUSR/POKEUSR (aka PEEKUSER/POKEUSER) API allows a process
to read/write registers of another process.
To get/set a register, the API takes an index into an imaginary address
space called the "USER area", where the registers of the process are
laid out in some fashion.
The kernel then maps that index to a particular register in its own data
structures and gets/sets the value.
The API only allows a single machine-word to be read/written at a time.
So 4 bytes on 32-bit kernels and 8 bytes on 64-bit kernels.
The way floating point registers (FPRs) are addressed is somewhat
complicated, because double precision float values are 64-bit even on
32-bit CPUs. That means on 32-bit kernels each FPR occupies two
word-sized locations in the USER area. On 64-bit kernels each FPR
occupies one word-sized location in the USER area.
Internally the kernel stores the FPRs in an array of u64s, or if VSX is
enabled, an array of pairs of u64s where one half of each pair stores
the FPR. Which half of the pair stores the FPR depends on the kernel's
endianness.
To handle the different layouts of the FPRs depending on VSX/no-VSX and
big/little endian, the TS_FPR() macro was introduced.
Unfortunately the TS_FPR() macro does not take into account the fact
that the addressing of each FPR differs between 32-bit and 64-bit
kernels. It just takes the index into the "USER area" passed from
userspace and indexes into the fp_state.fpr array.
On 32-bit there are 64 indexes that address FPRs, but only 32 entries in
the fp_state.fpr array, meaning the user can read/write 256 bytes past
the end of the array. Because the fp_state sits in the middle of the
thread_struct there are various fields than can be overwritten,
including some pointers. As such it may be exploitable.
It has also been observed to cause systems to hang or otherwise
misbehave when using gdbserver, and is probably the root cause of this
report which could not be easily reproduced:
https://lore.kernel.org/linuxppc-dev/dc38afe9-6b78-f3f5-666b-986939e40fc6@keymile.com/
Rather than trying to make the TS_FPR() macro even more complicated to
fix the bug, or add more macros, instead add a special-case for 32-bit
kernels. This is more obvious and hopefully avoids a similar bug
happening again in future.
Note that because 32-bit kernels never have VSX enabled the code doesn't
need to consider TS_FPRWIDTH/OFFSET at all. Add a BUILD_BUG_ON() to
ensure that 32-bit && VSX is never enabled.
Fixes: 87fec0514f61 ("powerpc: PTRACE_PEEKUSR/PTRACE_POKEUSER of FPR registers in little endian builds") Cc: stable@vger.kernel.org # v3.13+ Reported-by: Ariel Miculas <ariel.miculas@belden.com> Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220609133245.573565-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The bcm5974 driver does the allocation and dma mapping of the usb urb
data buffer, but driver does not set the URB_NO_TRANSFER_DMA_MAP flag
to let usb core know the buffer is already mapped.
usb core tries to map the already mapped buffer, causing a warning:
"xhci_hcd 0000:00:14.0: rejecting DMA map of vmalloc memory"
Fix this by setting the URB_NO_TRANSFER_DMA_MAP, letting usb core
know buffer is already mapped by bcm5974 driver
When the promiscuous mode is enabled on a VF, the IXGBE_VMOLR_VPE
bit (VLAN Promiscuous Enable) is set. This means that the VF will
receive packets whose VLAN is not the same than the VLAN of the VF.
If we tcpdump on VF3, we see all the packets, even those transmitted
on vlan 1000.
This behavior prevents to bridge VF1 and VF2 in VM2, because it will
create a loop: packets transmitted on VF1 will be received by VF2 and
vice-versa, and bridged again through the software bridge.
This patch remove the activation of VLAN Promiscuous when a VF enables
the promiscuous mode. However, the IXGBE_VMOLR_UPE bit (Unicast
Promiscuous) is kept, so that a VF receives all packets that has the
same VLAN, whatever the destination MAC address.
Fixes: 8443c1a4b192 ("ixgbe, ixgbevf: Add new mbox API xcast mode") Cc: stable@vger.kernel.org Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After a VF requested to remove the promiscuous flag on an interface, the
broadcast packets are not received anymore. This breaks some protocols
like ARP.
In ixgbe_update_vf_xcast_mode(), we should keep the IXGBE_VMOLR_BAM
bit (Broadcast Accept) on promiscuous removal.
This flag is already set by default in ixgbe_set_vmolr() on VF reset.
Fixes: 8443c1a4b192 ("ixgbe, ixgbevf: Add new mbox API xcast mode") Cc: stable@vger.kernel.org Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The first validation check for EVT_TRANSACTION has two different checks
tied together with logical AND. One is a check for minimum packet length,
and the other is for a valid aid_tag. If either condition is true (fails),
then an error should be triggered. The fix is to change && to ||.
The intention of the use of mmc_blk_reset_success() in
mmc_blk_cqe_recovery() was to prevent repeated resets when retrying and
getting the same error. However, that may not be the case - any amount
of time and I/O may pass before another recovery is needed, in which
case there would be no reason to deny it the opportunity to recover via
a reset if necessary. CQE recovery is expected seldom and failure to
recover (if the clear tasks command fails), even more seldom, so it is
better to allow the reset always, which can be done by calling
mmc_blk_reset_success() always.
The {dma|pio}_mode sysfs files are incorrectly documented as having a
list of the supported DMA/PIO transfer modes, while the corresponding
fields of the *struct* ata_device hold the transfer mode IDs, not masks.
To match these docs, the {dma|pio}_mode (and even xfer_mode!) sysfs
files are handled by the ata_bitfield_name_match() macro which leads to
reading such kind of nonsense from them:
While fixing the file documentation, somewhat reword the {dma|pio}_mode
file doc and add a note about being mostly useful for PATA devices to
the xfer_mode file doc...
Fixes: d9027470b886 ("[libata] Add ATA transport class") Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
During reconnects, we check the return value from
cifs_negotiate_protocol, and have handlers for both success
and failures. But if that passes, and cifs_setup_session
returns any errors other than -EACCES, we do not handle
that. This fix adds a handler for that, so that we don't
go ahead and try a tree_connect on a failed session.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We should use size of descriptor chain to test loop condition
in the indirect case. And another statistical count is also introduced
for indirect descriptors to avoid conflict with the statistical count
of direct descriptors.
Fixes: f87d0fbb5798 ("vringh: host-side implementation of virtio rings.") Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Fam Zheng <fam.zheng@bytedance.com>
Message-Id: <20220505100910.137-1-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The nodemask routines had mixed return values that provided potentially
signed return values that could never happen. This was leading to the
compiler getting confusing about the range of possible return values
(it was thinking things could be negative where they could not be). Fix
all the nodemask routines that should be returning unsigned
(or bool) values. Silences:
mm/swapfile.c: In function ‘setup_swap_info’:
mm/swapfile.c:2291:47: error: array subscript -1 is below array bounds of ‘struct plist_node[]’ [-Werror=array-bounds]
2291 | p->avail_lists[i].prio = 1;
| ~~~~~~~~~~~~~~^~~
In file included from mm/swapfile.c:16:
./include/linux/swap.h:292:27: note: while referencing ‘avail_lists’
292 | struct plist_node avail_lists[]; /*
| ^~~~~~~~~~~
"qemu-ndb -d" will call ioctl 'NBD_DISCONNECT' first, however, following
message was found:
block nbd0: Send disconnect failed -32
Which indicate that something is wrong with the server. Then,
"qemu-nbd -d" will call ioctl 'NBD_CLEAR_SOCK', however ioctl can't clear
requests after commit 2516ab1543fd("nbd: only clear the queue on device
teardown"). And in the meantime, request can't complete through timeout
because nbd_xmit_timeout() will always return 'BLK_EH_RESET_TIMER', which
means such request will never be completed in this situation.
Now that the flag 'NBD_CMD_INFLIGHT' can make sure requests won't
complete multiple times, switch back to call nbd_clear_sock() in
nbd_clear_sock_ioctl(), so that inflight requests can be cleared.
When nbd module is being removing, nbd_alloc_config() may be
called concurrently by nbd_genl_connect(), although try_module_get()
will return false, but nbd_alloc_config() doesn't handle it.
The race may lead to the leak of nbd_config and its related
resources (e.g, recv_workq) and oops in nbd_read_stat() due
to the unload of nbd module as shown below:
Fixing it by checking the return value of try_module_get()
in nbd_alloc_config(). As nbd_alloc_config() may return ERR_PTR(-ENODEV),
assign nbd->config only when nbd_alloc_config() succeeds to ensure
the value of nbd->config is binary (valid or NULL).
Also adding a debug message to check the reference counter
of nbd_config during module removal.
The return value of is_arm_mapping_symbol() is unpredictable when "$"
is passed in.
strchr(3) says:
The strchr() and strrchr() functions return a pointer to the matched
character or NULL if the character is not found. The terminating null
byte is considered part of the string, so that if c is specified as
'\0', these functions return a pointer to the terminator.
When str[1] is '\0', strchr("axtd", str[1]) is not NULL, and str[2] is
referenced (i.e. buffer overrun).
In radeon_fp_native_mode(), the return value of drm_mode_duplicate()
is assigned to mode, which will lead to a NULL pointer dereference
on failure of drm_mode_duplicate(). Add a check to avoid npd.
The failure status of drm_cvt_mode() on the other path is checked too.
Signed-off-by: Gong Yuanjun <ruc_gongyuanjun@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
`rctime' has been a pain point in cephfs due to its buggy
nature - inconsistent values reported and those sorts.
Fixing rctime is non-trivial needing an overall redesign
of the entire nested statistics infrastructure.
As a workaround, PR
http://github.com/ceph/ceph/pull/37938
allows this extended attribute to be manually set. This allows
users to "fixup" inconsistent rctime values. While this sounds
messy, its probably the wisest approach allowing users/scripts
to workaround buggy rctime values.
The above PR enables Ceph MDS to allow manually setting
rctime extended attribute with the corresponding user-land
changes. We may as well allow the same to be done via kclient
for parity.
A non-zero return value from pfkey_broadcast() does not necessarily mean
an error occurred as this function returns -ESRCH when no registered
listener received the message. In particular, a call with
BROADCAST_PROMISC_ONLY flag and null one_sk argument can never return
zero so that this commit in fact prevents processing any PF_KEY message.
One visible effect is that racoon daemon fails to find encryption
algorithms like aes and refuses to start.
Excluding -ESRCH return value would fix this but it's not obvious that
we really want to bail out here and most other callers of
pfkey_broadcast() also ignore the return value. Also, as pointed out by
Steffen Klassert, PF_KEY is kind of deprecated and newer userspace code
should use netlink instead so that we should only disturb the code for
really important fixes.
v2: add a comment explaining why is the return value ignored
Generally, the md_unregister_thread is called with reconfig_mutex, but
raid_message in dm-raid doesn't hold reconfig_mutex to unregister thread,
so md_unregister_thread can be called simulitaneously from two call sites
in theory.
Then after previous commit which remove the protection of reconfig_mutex
for md_unregister_thread completely, the potential issue could be worse
than before.
Let's take pers_lock at the beginning of function to ensure reentrancy.
Reported-by: Donald Buczek <buczek@molgen.mpg.de> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Previously the protection of kernfs_pr_cont_buf was piggy backed by
rename_lock, which means that pr_cont() needs to be protected under
rename_lock. This can cause potential circular lock dependencies.
If there is an OOM, we have the following call hierarchy:
pr_cont_kernfs_name() will grab rename_lock and call printk. So we have
the following lock dependencies:
kernfs_rename_lock -> console_sem
Sometimes, printk does a wakeup before releasing console_sem, which has
the dependence chain:
console_sem -> p->pi_lock -> rq->lock
Now, imagine one wants to read cgroup_name under rq->lock, for example,
printing cgroup_name in a tracepoint in the scheduler code. They will
be holding rq->lock and take rename_lock:
rq->lock -> kernfs_rename_lock
Now they will deadlock.
A prevention to this circular lock dependency is to separate the
protection of pr_cont_buf from rename_lock. In principle, rename_lock
is to protect the integrity of cgroup name when copying to buf. Once
pr_cont_buf has got its content, rename_lock can be dropped. So it's
safe to drop rename_lock after kernfs_name_locked (and
kernfs_path_from_node_locked) and rely on a dedicated pr_cont_lock
to protect pr_cont_buf.
__msm_console_write() assumes that interrupts are disabled, but
with threaded console printers it is possible that the write()
callback of the console is called with interrupts enabled.
Explicitly disable interrupts using local_irq_save() to preserve
the assumed context.
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: John Ogness <john.ogness@linutronix.de> Link: https://lore.kernel.org/r/20220506213324.470461-1-john.ogness@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Local variable mac created at:
r871xu_drv_init+0x1771/0x3070 drivers/staging/rtl8712/usb_intf.c:394
usb_probe_interface+0xf19/0x1600 drivers/usb/core/driver.c:396
When a machine sports more than one SP804 timer instance, we only bring
up the first one, since multiple timers of the same kind are not useful
to Linux. As this is intentional behaviour, we should not return an
error message, as we do today:
===============
[ 0.000800] Failed to initialize '/bus@8000000/motherboard-bus@8000000/iofpga-bus@300000000/timer@120000': -22
===============
Replace the -EINVAL return with a debug message and return 0 instead.
Also we do not reach the init function anymore if the DT node is
disabled (as this is now handled by OF_DECLARE), so remove the explicit
check for that case.
This fixes a long standing bogus error when booting ARM's fastmodels.
Currently, someone can invoke the sysfs such as state_show()
intermittently before dev_set_drvdata() is done.
And it can be a cause of kernel Oops because of edev is Null at that time.
So modified the driver registration to after setting drviver data.
rtsx_usb_probe() doesn't call usb_set_intfdata() to null out the
interface pointer when probe fails. This leaves a stale pointer.
Noticed the missing usb_set_intfdata() while debugging an unrelated
invalid DMA mapping problem.
Fix it with a call to usb_set_intfdata(..., NULL).
UDC driver should not touch gadget's driver internals, especially it
should not reset driver->bus. This wasn't harmful so far, but since
commit fc274c1e9973 ("USB: gadget: Add a new bus for gadgets") gadget
subsystem got it's own bus and messing with ->bus triggers the
following NULL pointer dereference:
dwc2 12480000.hsotg: bound driver g_ether
8<--- cut here ---
Unable to handle kernel NULL pointer dereference at virtual address 00000000
[00000000] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in: ...
CPU: 0 PID: 620 Comm: modprobe Not tainted 5.18.0-rc5-next-20220504 #11862
Hardware name: Samsung Exynos (Flattened Device Tree)
PC is at module_add_driver+0x44/0xe8
LR is at sysfs_do_create_link_sd+0x84/0xe0
...
Process modprobe (pid: 620, stack limit = 0x(ptrval))
...
module_add_driver from bus_add_driver+0xf4/0x1e4
bus_add_driver from driver_register+0x78/0x10c
driver_register from usb_gadget_register_driver_owner+0x40/0xb4
usb_gadget_register_driver_owner from do_one_initcall+0x44/0x1e0
do_one_initcall from do_init_module+0x44/0x1c8
do_init_module from load_module+0x19b8/0x1b9c
load_module from sys_finit_module+0xdc/0xfc
sys_finit_module from ret_fast_syscall+0x0/0x54
Exception stack(0xf1771fa8 to 0xf1771ff0)
...
dwc2 12480000.hsotg: new device is high-speed
---[ end trace 0000000000000000 ]---
The documentation for the freeze() method says that it "should quiesce
the device so that it doesn't generate IRQs or DMA". The unspoken
consequence of not doing this is that MSIs aimed at non-boot CPUs may
get fully lost if they're sent during the period where the target CPU is
offline.
The current callbacks for USB HCD do not fully quiesce interrupts,
specifically on XHCI. Change to use the full suspend/resume flow for
freeze/thaw to ensure interrupts are fully quiesced. This fixes issues
where USB devices fail to thaw during hibernation because XHCI misses
its interrupt and cannot recover.
We hold oxu->lock in position (1) of thread 1, and use
del_timer_sync() to wait timer to stop, but timer handler
also need oxu->lock in position (2) of thread 2. As a result,
oxu_bus_suspend() will block forever.
This patch extracts del_timer_sync() from the protection of
spin_lock_irq(), which could let timer handler to obtain
the needed lock.
We hold sport->port.lock in position (1) of thread 1 and
use del_timer_sync() to wait timer to stop, but timer handler
also need sport->port.lock in position (2) of thread 2. As a result,
sa1100_set_termios() will block forever.
This patch moves del_timer_sync() before spin_lock_irqsave()
in order to prevent the deadlock.
We hold ieee->beacon_lock in position (1) of thread 1 and
use del_timer_sync() to wait timer to stop, but timer handler
also need ieee->beacon_lock in position (2) of thread 2.
As a result, rtllib_beacons_stop() will block forever.
This patch extracts del_timer_sync() from the protection of
spin_lock_irqsave(), which could let timer handler to obtain
the needed lock.
We hold ieee->beacon_lock in position (1) of thread 1 and use
del_timer_sync() to wait timer to stop, but timer handler
also need ieee->beacon_lock in position (2) of thread 2.
As a result, ieee80211_beacons_stop() will block forever.
This patch extracts del_timer_sync() from the protection of
spin_lock_irqsave(), which could let timer handler to obtain
the needed lock.
To be sufficiently out of range for the usercopy test to see the lifetime
mismatch, expand the size of the "bad" buffer, which will let it be
beyond current_stack_pointer regardless of stack growth direction.
Paired with the recent addition of stack depth checking under
CONFIG_HARDENED_USERCOPY=y, this will correctly start tripping again.
kstrdup() is also a memory allocation-related function, it returns NULL
when some memory errors happen. So it is better to check the return
value of it so to catch the memory error in time. Besides, there should
have a kfree() to clear up the allocation if we get a failure later in
this function to prevent memory leak.
Gcc-12 correctly warned about this code using a non-NULL pointer as a
truth value:
drivers/gpu/drm/imx/ipuv3-crtc.c: In function ‘ipu_crtc_disable_planes’:
drivers/gpu/drm/imx/ipuv3-crtc.c:72:21: error: the comparison will always evaluate as ‘true’ for the address of ‘plane’ will never be NULL [-Werror=address]
72 | if (&ipu_crtc->plane[1] && plane == &ipu_crtc->plane[1]->base)
| ^
due to the extraneous '&' address-of operator.
Philipp Zabel points out that The mistake had no adverse effect since
the following condition doesn't actually dereference the NULL pointer,
but the intent of the code was obviously to check for it, not to take
the address of the member.
Every iteration of for_each_child_of_node() decrements
the reference count of the previous node.
When break from a for_each_child_of_node() loop,
we need to explicitly call of_node_put() on the child node when
not need anymore.
Add missing of_node_put() to avoid refcount leak.
Fixes: bbd2190ce96d ("Altera TSE: Add main and header file for Altera Ethernet Driver") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20220607041144.7553-1-linmq006@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
But was eventually revised more thoroughly:
- restrict the check to the only branch where needed, in an
uncommon GRE path that uses header_ops and calls skb_pull.
- test skb_transport_header, which is set along with csum_start
in skb_partial_csum_set in the normal header_ops datapath.
Turns out skbs can arrive in this branch without the transport
header set, e.g., through BPF redirection.
Revise the check back to check csum_start directly, and only if
CHECKSUM_PARTIAL. Do leave the check in the updated location.
Check field regardless of whether TUNNEL_CSUM is configured.
The current design does not arm the tracer if traces are available before
the tracer string database is fully loaded, leading to an unfunctional tracer.
This fix will rearm the tracer every time the FW triggers tracer event
regardless of the tracer strings database status.