The minimum encryption key size for LE connections is 56 bits and to
align LE with BR/EDR, enforce 56 bits of minimum encryption key size for
BR/EDR connections as well.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
On ThinkPad P50 SKUs with an Nvidia Quadro M1000M instead of the M2000M
variant, the BIOS does not always reset the secondary Nvidia GPU during
reboot if the laptop is configured in Hybrid Graphics mode. The reason is
unknown, but the following steps and possibly a good bit of patience will
reproduce the issue:
1. Boot up the laptop normally in Hybrid Graphics mode
2. Make sure nouveau is loaded and that the GPU is awake
3. Allow the Nvidia GPU to runtime suspend itself after being idle
4. Reboot the machine, the more sudden the better (e.g. sysrq-b may help)
5. If nouveau loads up properly, reboot the machine again and go back to
step 2 until you reproduce the issue
This results in some very strange behavior: the GPU will be left in exactly
the same state it was in when the previously booted kernel started the
reboot. This has all sorts of bad side effects: for starters, this
completely breaks nouveau starting with a mysterious EVO channel failure
that happens well before we've actually used the EVO channel for anything:
This causes a timeout trying to bring up the GR ctx:
nouveau 0000:01:00.0: timeout
WARNING: CPU: 0 PID: 12 at drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.c:1547 gf100_grctx_generate+0x7b2/0x850 [nouveau]
Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET82W (1.55 ) 12/18/2018
Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
...
nouveau 0000:01:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
nouveau 0000:01:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000008000 engine 00 [GR] client 15 [HUB/SCC_NB] reason c4 [] on channel -1 [0000000000 unknown]
The GPU never manages to recover. Booting without loading nouveau causes
issues as well, since the GPU starts sending spurious interrupts that cause
other device's IRQs to get disabled by the kernel:
irq 16: nobody cared (try booting with the "irqpoll" option)
...
handlers:
[<000000007faa9e99>] i801_isr [i2c_i801]
Disabling IRQ #16
...
serio: RMI4 PS/2 pass-through port at rmi4-00.fn03
i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
i801_smbus 0000:00:1f.4: Transaction timeout
rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-110).
i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
i801_smbus 0000:00:1f.4: Transaction timeout
rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts!
This causes the touchpad and sometimes other things to get disabled.
Since this happens without nouveau, we can't fix this problem from nouveau
itself.
Add a PCI quirk for the specific P50 variant of this GPU. Make sure the
GPU is advertising NoReset- so we don't reset the GPU when the machine is
in Dedicated graphics mode (where the GPU being initialized by the BIOS is
normal and expected). Map the GPU MMIO space and read the magic 0x2240c
register, which will have bit 1 set if the device was POSTed during a
previous boot. Once we've confirmed all of this, reset the GPU and
re-disable it - bringing it back to a healthy state.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203003 Link: https://lore.kernel.org/lkml/20190212220230.1568-1-lyude@redhat.com Signed-off-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: nouveau@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: Karol Herbst <kherbst@redhat.com> Cc: Ben Skeggs <skeggsb@gmail.com>
[bwh: Backported to 3.16:
- Use dev_{err,info}() instead of pci_{err,info}()
- Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
We need to compute the uart state only on the first open. This is
usually what is done in the ->install hook. serial_core used to do this
in ->open on every open. So move it to ->install.
As a side effect, it ensures the state is set properly in the window
after tty_init_dev is called, but before uart_open. This fixes a bunch
of races between tty_open and flush_to_ldisc we were dealing with
recently.
One of such bugs was attempted to fix in commit fedb5760648a (serial:
fix race between flush_to_ldisc and tty_open), but it only took care of
a couple of functions (uart_start and uart_unthrottle). I was able to
reproduce the crash on a SLE system, but in uart_write_room which is
also called from flush_to_ldisc via process_echoes. I was *unable* to
reproduce the bug locally. It is due to having this patch in my queue
since 2012!
0 in rbx means tty->driver_data is NULL in uart_write_room. 0x178 is
tried to be dereferenced (0x178 >> 3 is 0x2f in rdx) at
uart_write_room+0xc4. 0x178 is exactly (struct uart_state *)NULL->refcount
used in uart_port_lock from uart_write_room.
So revert the upstream commit here as my local patch should fix the
whole family.
Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Li RongQing <lirongqing@baidu.com> Cc: Wang Li <wangli39@baidu.com> Cc: Zhang Yu <zhangyu31@baidu.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16: The previous fix didn't apply, so we don't need
to revert it here.] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In journal_reclaim() ja->cur_idx of each cache will be update to
reclaim available journal buckets. Variable 'int n' is used to count how
many cache is successfully reclaimed, then n is set to c->journal.key
by SET_KEY_PTRS(). Later in journal_write_unlocked(), a for_each_cache()
loop will write the jset data onto each cache.
The problem is, if all jouranl buckets on each cache is full, the
following code in journal_reclaim(),
529 for_each_cache(ca, c, iter) {
530 struct journal_device *ja = &ca->journal;
531 unsigned int next = (ja->cur_idx + 1) % ca->sb.njournal_buckets;
532
533 /* No space available on this device */
534 if (next == ja->discard_idx)
535 continue;
536
537 ja->cur_idx = next;
538 k->ptr[n++] = MAKE_PTR(0,
539 bucket_to_sector(c, ca->sb.d[ja->cur_idx]),
540 ca->sb.nr_this_dev);
541 }
542
543 bkey_init(k);
544 SET_KEY_PTRS(k, n);
If there is no available bucket to reclaim, the if() condition at line
534 will always true, and n remains 0. Then at line 544, SET_KEY_PTRS()
will set KEY_PTRS field of c->journal.key to 0.
Setting KEY_PTRS field of c->journal.key to 0 is wrong. Because in
journal_write_unlocked() the journal data is written in following loop,
649 for (i = 0; i < KEY_PTRS(k); i++) {
650-671 submit journal data to cache device
672 }
If KEY_PTRS field is set to 0 in jouranl_reclaim(), the journal data
won't be written to cache device here. If system crahed or rebooted
before bkeys of the lost journal entries written into btree nodes, data
corruption will be reported during bcache reload after rebooting the
system.
Indeed there is only one cache in a cache set, there is no need to set
KEY_PTRS field in journal_reclaim() at all. But in order to keep the
for_each_cache() logic consistent for now, this patch fixes the above
problem by not setting 0 KEY_PTRS of journal key, if there is no bucket
available to reclaim.
Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
There is a race between cache device register and cache set unregister.
For an already registered cache device, register_bcache will call
bch_is_open to iterate through all cachesets and check every cache
there. The race occurs if cache_set_free executes at the same time and
clears the caches right before ca is dereferenced in bch_is_open_cache.
To close the race, let's make sure the clean up work is protected by
the bch_register_lock as well.
This issue can be reproduced as follows,
while true; do echo /dev/XXX> /sys/fs/bcache/register ; done&
while true; do echo 1> /sys/block/XXX/bcache/set/unregister ; done &
Do the proper cleanup in case the size check fails.
Tested with xfstests:generic/228
Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 0cbade024ba5 ("fuse: honor RLIMIT_FSIZE in fuse_file_fallocate") Cc: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The error from snd_usb_mixer_apply_create_quirk() is ignored in the
current usb-audio driver code, which will continue the probing even
after the error. Let's take it more serious.
Fixes: 7b1eda223deb ("ALSA: usb-mixer: factor out quirks") Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The __put_user() macro evaluates it's @ptr argument inside the
__uaccess_begin() / __uaccess_end() region. While this would normally
not be expected to be an issue, an UBSAN bug (it ignored -fwrapv,
fixed in GCC 8+) would transform the @ptr evaluation for:
drivers/gpu/drm/i915/i915_gem_execbuffer.c: if (unlikely(__put_user(offset, &urelocs[r-stack].presumed_offset))) {
into a signed-overflow-UB check and trigger the objtool AC validation.
Finish this commit:
2a418cf3f5f1 ("x86/uaccess: Don't leak the AC flag into __put_user() value evaluation")
and explicitly evaluate all 3 arguments early.
Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: luto@kernel.org Fixes: 2a418cf3f5f1 ("x86/uaccess: Don't leak the AC flag into __put_user() value evaluation") Link: http://lkml.kernel.org/r/20190424072208.695962771@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The test robot reported a wrong assignment of a per-CPU variable which
it detected by using sparse and sent a report. The assignment itself is
correct. The annotation for sparse was wrong and hence the report.
The first pointer is a "normal" pointer and points to the per-CPU memory
area. That means that the __percpu annotation has to be moved.
Move the __percpu annotation to pointer which points to the per-CPU
area. This change affects only the sparse tool (and is ignored by the
compiler).
Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: f97f8f06a49fe ("smpboot: Provide infrastructure for percpu hotplug threads") Link: http://lkml.kernel.org/r/20190424085253.12178-1-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When using PCI passthrough with this device, the host machine locks up
completely when starting the VM, requiring a hard reboot. Add a quirk to
avoid bus resets on this device.
The ctrl_check_input() function is called from pvr2_ctrl_range_check().
It's supposed to validate user supplied input and return true or false
depending on whether the input is valid or not. The problem is that
negative shifts or shifts greater than 31 are undefined in C. In
practice with GCC they result in shift wrapping so this function returns
true for some inputs which are not valid and this could result in a
buffer overflow:
drivers/media/usb/pvrusb2/pvrusb2-ctrl.c:205 pvr2_ctrl_get_valname()
warn: uncapped user index 'names[val]'
The cptr->hdw->input_allowed_mask mask is configured in pvr2_hdw_create()
and the highest valid bit is BIT(4).
Fixes: 7fb20fa38caa ("V4L/DVB (7299): pvrusb2: Improve logic which handles input choice availability") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
After removal of clock_start() from before soc_camera_init_i2c() in
soc_camera_probe() by commit 9aea470b399d ("[media] soc-camera: switch
I2C subdevice drivers to use v4l2-clk") introduced in v3.11, the ov6650
driver could no longer probe the sensor successfully because its clock
was no longer turned on in advance. The issue was initially worked
around by adding that missing clock_start() equivalent to OMAP1 camera
interface driver - the only user of this sensor - but a propoer fix
should be rather implemented in the sensor driver code itself.
Fix the issue by inserting a delay between the clock is turned on and
the sensor I2C registers are read for the first time.
Tested on Amstrad Delta with now out of tree but still locally
maintained omap1_camera host driver.
Fixes: 9aea470b399d ("[media] soc-camera: switch I2C subdevice drivers to use v4l2-clk") Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
[bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Add an of_node_put() when a tested device node is not available.
Fixes: c026c98739c7e ("powerpc/83xx: Do not configure or probe disabled FSL DR USB controllers") Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This fixes the various compiler warnings when building the msgque
selftest. The primary change is using sys/msg.h instead of linux/msg.h
directly to gain the API declarations.
If the user-provided IV needs to be aligned to the algorithm's
alignmask, then skcipher_walk_virt() copies the IV into a new aligned
buffer walk.iv. But skcipher_walk_virt() can fail afterwards, and then
if the caller unconditionally accesses walk.iv, it's a use-after-free.
arm32 xts-aes-neonbs doesn't set an alignmask, so currently it isn't
affected by this despite unconditionally accessing walk.iv. However
this is more subtle than desired, and it was actually broken prior to
the alignmask being removed by commit cc477bf64573 ("crypto: arm/aes -
replace bit-sliced OpenSSL NEON code"). Thus, update xts-aes-neonbs to
start checking the return value of skcipher_walk_virt().
Fixes: e4e7f10bfc40 ("ARM: add support for bit sliced AES using NEON instructions") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If the user-provided IV needs to be aligned to the algorithm's
alignmask, then skcipher_walk_virt() copies the IV into a new aligned
buffer walk.iv. But skcipher_walk_virt() can fail afterwards, and then
if the caller unconditionally accesses walk.iv, it's a use-after-free.
salsa20-generic doesn't set an alignmask, so currently it isn't affected
by this despite unconditionally accessing walk.iv. However this is more
subtle than desired, and it was actually broken prior to the alignmask
being removed by commit b62b3db76f73 ("crypto: salsa20-generic - cleanup
and convert to skcipher API").
Since salsa20-generic does not update the IV and does not need any IV
alignment, update it to use req->iv instead of walk.iv.
Fixes: 2407d60872dd ("[CRYPTO] salsa20: Salsa20 stream cipher") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Currently the kfree of output.pointer can be potentially freeing
an uninitalized pointer in the case where out_data is NULL. Fix this
by reworking the case where out_data is not-null to perform the
ACPI status check and also the kfree of outpoint.pointer in one block
and hence ensuring the pointer is only freed when it has been used.
Also replace the if (ptr != NULL) idiom with just if (ptr).
Fixes: ff0e9f26288d ("platform/x86: alienware-wmi: Correct a memory leak") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Currently if alloc_skb fails to allocate the skb a null skb is passed to
t4_set_arp_err_handler and this ends up dereferencing the null skb. Avoid
the NULL pointer dereference by checking for a NULL skb and returning
early.
Addresses-Coverity: ("Dereference null return") Fixes: b38a0ad8ec11 ("RDMA/cxgb4: Set arp error handler for PASS_ACCEPT_RPL messages") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Super-IO accesses may fail on a system with no or unmapped LPC bus.
Also, other drivers may attempt to access the LPC bus at the same time,
resulting in undefined behavior.
Use request_muxed_region() to ensure that IO access on the requested
address space is supported, and to ensure that access by multiple drivers
is synchronized.
Fixes: 2219cd81a6cd ("hwmon/vt1211: Add probing of alternate config index port") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Super-IO accesses may fail on a system with no or unmapped LPC bus.
Also, other drivers may attempt to access the LPC bus at the same time,
resulting in undefined behavior.
Use request_muxed_region() to ensure that IO access on the requested
address space is supported, and to ensure that access by multiple drivers
is synchronized.
Fixes: b72656dbc491 ("hwmon: (w83627hf) Stop using globals for I/O port numbers") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Super-IO accesses may fail on a system with no or unmapped LPC bus.
Also, other drivers may attempt to access the LPC bus at the same time,
resulting in undefined behavior.
Use request_muxed_region() to ensure that IO access on the requested
address space is supported, and to ensure that access by multiple drivers
is synchronized.
Fixes: 8d5d45fb1468 ("I2C: Move hwmon drivers (2/3)") Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reported-by: John Garry <john.garry@huawei.com> Cc: John Garry <john.garry@huawei.com> Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Super-IO accesses may fail on a system with no or unmapped LPC bus.
Also, other drivers may attempt to access the LPC bus at the same time,
resulting in undefined behavior.
Use request_muxed_region() to ensure that IO access on the requested
address space is supported, and to ensure that access by multiple drivers
is synchronized.
Fixes: 8d5d45fb1468 ("I2C: Move hwmon drivers (2/3)") Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reported-by: John Garry <john.garry@huawei.com> Cc: John Garry <john.garry@huawei.com> Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Super-IO accesses may fail on a system with no or unmapped LPC bus.
Also, other drivers may attempt to access the LPC bus at the same time,
resulting in undefined behavior.
Use request_muxed_region() to ensure that IO access on the requested
address space is supported, and to ensure that access by multiple drivers
is synchronized.
Fixes: ba224e2c4f0a7 ("hwmon: New PC87427 hardware monitoring driver") Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reported-by: John Garry <john.garry@huawei.com> Cc: John Garry <john.garry@huawei.com> Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Also, other drivers may attempt to access the LPC bus at the same time,
resulting in undefined behavior.
Use request_muxed_region() to ensure that IO access on the requested
address space is supported, and to ensure that access by multiple
drivers is synchronized.
Fixes: e53004e20a58e ("hwmon: New f71805f driver") Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reported-by: John Garry <john.garry@huawei.com> Cc: John Garry <john.garry@huawei.com> Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Currently, compat tasks running on arm64 can allocate memory up to
TASK_SIZE_32 (UL(0x100000000)).
This means that mmap() allocations, if we treat them as returning an
array, are not compliant with the sections 6.5.8 of the C standard
(C99) which states that: "If the expression P points to an element of
an array object and the expression Q points to the last element of the
same array object, the pointer expression Q+1 compares greater than P".
Redefine TASK_SIZE_32 to address the issue.
Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Jann Horn <jannh@google.com> Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
[will: fixed typo in comment] Signed-off-by: Will Deacon <will.deacon@arm.com>
[bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If module initialization fails after the character device has been
registered, unregister the character device. Additionally, avoid
duplicating error path code.
Cc: Himanshu Madhani <hmadhani@marvell.com> Cc: Giridhar Malavali <giridhar.malavali@qlogic.com> Fixes: 6a03b4cd78f3 ("[SCSI] qla2xxx: Add char device to increase driver use count") # v2.6.35. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The ->digest() method of crct10dif-pclmul reads the current CRC value
from the shash_desc context. But this value is uninitialized, causing
crypto_shash_digest() to compute the wrong result. Fix it.
Probably this wasn't noticed before because lib/crc-t10dif.c only uses
crypto_shash_update(), not crypto_shash_digest(). Likewise,
crypto_shash_digest() is not yet tested by the crypto self-tests because
those only test the ahash API which only uses shash init/update/final.
Fixes: 0b95a7f85718 ("crypto: crct10dif - Glue code to cast accelerated CRCT10DIF assembly as a crypto transform") Cc: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The ->digest() method of crct10dif-generic reads the current CRC value
from the shash_desc context. But this value is uninitialized, causing
crypto_shash_digest() to compute the wrong result. Fix it.
Probably this wasn't noticed before because lib/crc-t10dif.c only uses
crypto_shash_update(), not crypto_shash_digest(). Likewise,
crypto_shash_digest() is not yet tested by the crypto self-tests because
those only test the ahash API which only uses shash init/update/final.
This bug was detected by my patches that improve testmgr to fuzz
algorithms against their generic implementation.
Fixes: 2d31e518a428 ("crypto: crct10dif - Wrap crc_t10dif function all to use crypto transform framework") Cc: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
We hit a BUG at fs/buffer.c:3057 if we detached the nbd device
before unmounting ext4 filesystem.
The typical chain of events leading to the BUG:
jbd2_write_superblock
submit_bh
submit_bh_wbc
BUG_ON(!buffer_mapped(bh));
The block device is removed and all the pages are invalidated. JBD2
was trying to write journal superblock to the block device which is
no longer present.
Fix this by checking the journal superblock's buffer head prior to
submitting.
Reported-by: Eric Ren <renzhen@linux.alibaba.com> Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Due to an erratum in some Pericom PCIe-to-PCI bridges in reverse mode
(conventional PCI on primary side, PCIe on downstream side), the Retrain
Link bit needs to be cleared manually to allow the link training to
complete successfully.
If it is not cleared manually, the link training is continuously restarted
and no devices below the PCI-to-PCIe bridge can be accessed. That means
drivers for devices below the bridge will be loaded but won't work and may
even crash because the driver is only reading 0xffff.
See the Pericom Errata Sheet PI7C9X111SLB_errata_rev1.2_102711.pdf for
details. Devices known as affected so far are: PI7C9X110, PI7C9X111SL,
PI7C9X130.
Add a new flag, clear_retrain_link, in struct pci_dev. Quirks for affected
devices set this bit.
Note that pcie_retrain_link() lives in aspm.c because that's currently the
only place we use it, but this erratum is not specific to ASPM, and we may
retrain links for other reasons in the future.
Signed-off-by: Stefan Mätje <stefan.maetje@esd.eu>
[bhelgaas: apply regardless of CONFIG_PCIEASPM] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
[bwh: Backported to 3.16:
- Use dev_info() instead of pci_info()
- Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
pr->tx_bytes should be assigned to tx_bytes other than
rx_bytes.
Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: ce45b873028f ("ehea: Fixing statistics") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Commit e6f77540c067 ("scsi: qla2xxx: Fix an integer overflow in sysfs
code") incorrectly set 'optrom_region_size' to 'start+size', which can
overflow option-rom boundaries when 'start' is non-zero. Continue setting
optrom_region_size to the proper adjusted value of 'size'.
Fixes: e6f77540c067 ("scsi: qla2xxx: Fix an integer overflow in sysfs code") Signed-off-by: Andrew Vasquez <andrewv@marvell.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The use of zero-sized array causes undefined behaviour when it is not
the last member in a structure. As it happens to be in this case.
Also, the current code makes use of a language extension to the C90
standard, but the preferred mechanism to declare variable-length
types such as this one is a flexible array member, introduced in
C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last. Which is beneficial
to cultivate a high-quality code.
Fixes: e48f129c2f20 ("[SCSI] cxgb3i: convert cdev->l2opt to use rcu to prevent NULL dereference") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
syzbot is hitting use-after-free bug in uinput module [1]. This is because
kobject_uevent(KOBJ_REMOVE) is called again due to commit 0f4dafc0563c6c49
("Kobject: auto-cleanup on final unref") after memory allocation fault
injection made kobject_uevent(KOBJ_REMOVE) from device_del() from
input_unregister_device() fail, while uinput_destroy_device() is expecting
that kobject_uevent(KOBJ_REMOVE) is not called after device_del() from
input_unregister_device() completed.
That commit intended to catch cases where nobody even attempted to send
"remove" uevents. But there is no guarantee that an event will ultimately
be sent. We are at the point of no return as far as the rest of the kernel
is concerned; there are no repeats or do-overs.
Also, it is not clear whether some subsystem depends on that commit.
If no subsystem depends on that commit, it will be better to remove
the state_{add,remove}_uevent_sent logic. But we don't want to risk
a regression (in a patch which will be backported) by trying to remove
that logic. Therefore, as a first step, let's avoid the use-after-free bug
by making sure that kobject_uevent(KOBJ_REMOVE) won't be triggered twice.
Smatch marks skb->data as untrusted so it warns that "evt_hdr->dlen"
can copy up to 255 bytes and we only have room for two bytes. Even
if this comes from the firmware and we trust it, the new policy
generally is just to fix it as kernel hardenning.
I can't test this code so I tried to be very conservative. I considered
not allowing "evt_hdr->dlen == 1" because it doesn't initialize the
whole variable but in the end I decided to allow it and manually
initialized "asic_id" and "asic_ver" to zero.
Fixes: e8454ff7b9a4 ("[media] drivers:media:radio: wl128x: FM Driver Common sources") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
clang warns about a possible variable use that gcc never
complained about:
drivers/media/platform/davinci/isif.c:982:32: error: variable 'frame_size' is uninitialized when used here
[-Werror,-Wuninitialized]
dm365_vpss_set_pg_frame_size(frame_size);
^~~~~~~~~~
drivers/media/platform/davinci/isif.c:887:2: note: variable 'frame_size' is declared here
struct vpss_pg_frame_size frame_size;
^
1 error generated.
There is no initialization for this variable at all, and there
has never been one in the mainline kernel, so we really should
not put that stack data into an mmio register.
On the other hand, I suspect that gcc checks the condition
more closely and notices that the global
isif_cfg.bayer.config_params.test_pat_gen flag is initialized
to zero and never written to from any code path, so anything
depending on it can be eliminated.
To shut up the clang warning, just remove the dead code manually,
it has probably never been used because any attempt to do so
would have resulted in undefined behavior.
Fixes: 63e3ab142fa3 ("V4L/DVB: V4L - vpfe capture - source for ISIF driver on DM365") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Lad, Prabhakar <prabhakar.csengg@gmail.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The bug manifests on systemd systems with multiple vtcon devices:
# cat /sys/devices/virtual/vtconsole/vtcon0/name
(S) dummy device
# cat /sys/devices/virtual/vtconsole/vtcon1/name
(M) frame buffer device
There systemd runs 'loadkeys' tool in tapallel for each vtcon
instance. This causes two parallel ioctl(KDSKBSENT) calls to
race into adding the same entry into 'func_table' array at:
drivers/tty/vt/keyboard.c:vt_do_kdgkb_ioctl()
The function has no locking around writes to 'func_table'.
The simplest reproducer is to have initrams with the following
init on a 8-CPU machine x86_64:
Commit 747834ab8347 ("ARM: OMAP2+: hwmod: revise hardreset behavior") made
the call to _enable() conditional based on no oh->rst_lines_cnt. This
caused the return value to be potentially uninitialized. Curiously we see
no compiler warnings for this, probably as this gets inlined.
We call _setup_reset() from _setup() and only _setup_postsetup() if the
return value is zero. Currently the return value can be uninitialized for
cases where oh->rst_lines_cnt is set and HWMOD_INIT_NO_RESET is not set.
Fixes: 747834ab8347 ("ARM: OMAP2+: hwmod: revise hardreset behavior") Cc: Paul Walmsley <paul@pwsan.com> Cc: Tero Kristo <t-kristo@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Clang -Wuninitialized notices that on is_qla40XX we never allocate any DMA
memory in get_fw_boot_info() but attempt to free it anyway:
drivers/scsi/qla4xxx/ql4_os.c:5915:7: error: variable 'buf_dma' is used uninitialized whenever 'if' condition is false
[-Werror,-Wsometimes-uninitialized]
if (!(val & 0x07)) {
^~~~~~~~~~~~~
drivers/scsi/qla4xxx/ql4_os.c:5985:47: note: uninitialized use occurs here
dma_free_coherent(&ha->pdev->dev, size, buf, buf_dma);
^~~~~~~
drivers/scsi/qla4xxx/ql4_os.c:5915:3: note: remove the 'if' if its condition is always true
if (!(val & 0x07)) {
^~~~~~~~~~~~~~~~~~~
drivers/scsi/qla4xxx/ql4_os.c:5885:20: note: initialize the variable 'buf_dma' to silence this warning
dma_addr_t buf_dma;
^
= 0
Skip the call to dma_free_coherent() here.
Fixes: 2a991c215978 ("[SCSI] qla4xxx: Boot from SAN support for open-iscsi") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
It must be made sure that immediate mode is not already set, when
modifying shadow register value in ehrpwm_pwm_disable(). Otherwise
modifications to the action-qualifier continuous S/W force
register(AQSFRC) will be done in the active register.
This may happen when both channels are being disabled. In this case,
only the first channel state will be recorded as disabled in the shadow
register. Later, when enabling the first channel again, the second
channel would be enabled as well. Setting RLDCSF to zero, first, ensures
that the shadow register is updated as desired.
Fixes: 38dabd91ff0b ("pwm: tiehrpwm: Fix disabling of output of PWMs") Signed-off-by: Christoph Vogtländer <c.vogtlaender@sigma-surface-science.com>
[vigneshr@ti.com: Improve commit message] Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com> Signed-off-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The sysfs unexport in pwmchip_remove() is completely asymmetric
to what we do in pwmchip_add_with_polarity() and commit 0733424c9ba9
("pwm: Unexport children before chip removal") is a strong indication
that this was wrong to begin with. We should just move
pwmchip_sysfs_unexport() where it belongs, which is right after
pwmchip_sysfs_unexport_children(). In that case, we do not need
separate functions anymore either.
We also really want to remove sysfs irrespective of whether or not
the chip will be removed as a result of pwmchip_remove(). We can only
assume that the driver will be gone after that, so we shouldn't leave
any dangling sysfs files around.
This warning disappears if we move pwmchip_sysfs_unexport() to
the top of pwmchip_remove(), pwmchip_sysfs_unexport_children().
That way it is also outside of the pwm_lock section, which indeed
doesn't seem to be needed.
Moving the pwmchip_sysfs_export() call outside of that section also
seems fine and it'd be perfectly symmetric with pwmchip_remove() again.
So, this patch fixes them.
Signed-off-by: Phong Hoang <phong.hoang.wz@renesas.com>
[shimoda: revise the commit log and code] Fixes: 76abbdde2d95 ("pwm: Add sysfs interface") Fixes: 0733424c9ba9 ("pwm: Unexport children before chip removal") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Tested-by: Hoan Nguyen An <na-hoan@jinso.co.jp> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
While the sequencer is reset after each SPI message since commit 880c6d114fd79a69 ("spi: rspi: Add support for Quad and Dual SPI
Transfers on QSPI"), it was never reset for the first message, thus
relying on reset state or bootloader settings.
Fix this by initializing it explicitly during configuration.
Fixes: 0b2182ddac4b8837 ("spi: add support for Renesas RSPI") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The Renesas RSPI/QSPI driver performs SPI controller register
initialization in its spi_operations.setup() callback, without calling
pm_runtime_get_sync() first, which may cause spurious failures.
So far this went unnoticed, as this SPI controller is typically used
with a single SPI NOR FLASH containing the boot loader:
1. If the device's module clock is still enabled (left enabled by the
bootloader, and not yet disabled by the clk_disable_unused() late
initcall), register initialization succeeds,
2. If the device's module clock is disabled, register writes don't
seem to cause lock-ups or crashes.
Data received in the first SPI message may be corrupted, though.
Subsequent SPI messages seem to be OK.
E.g. on r8a7791/koelsch, one bit is lost while receiving the 6th
byte of the JEDEC ID for the s25fl512s FLASH, corrupting that byte
and all later bytes. But until commit a2126b0a010905e5 ("mtd:
spi-nor: refine Spansion S25FL512S ID"), the 6th byte was not
considered for FLASH identification.
Fix this by moving all initialization from the .setup() to the
.prepare_message() callback. The latter is always called after the
device has been runtime-resumed by the SPI core.
This also makes the driver follow the rule that .setup() must not change
global driver state or register values, as that might break a transfer
in progress.
Fixes: 490c97747d5dc77d ("spi: rspi: Add runtime PM support, using spi core auto_runtime_pm") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Mark Brown <broonie@kernel.org>
[bwh: Backported to 3.16: s/(controller|ctlr)/master/g] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
HalfSipHash, or hsiphash, is a shortened version of SipHash, which
generates 32-bit outputs using a weaker 64-bit key. It has *much* lower
security margins, and shouldn't be used for anything too sensitive, but
it could be used as a hashtable key function replacement, if the output
is never exposed, and if the security requirement is not too high.
The goal is to make this something that performance-critical jhash users
would be willing to use.
On 64-bit machines, HalfSipHash1-3 is slower than SipHash1-3, so we alias
SipHash1-3 to HalfSipHash1-3 on those systems.
hsiphash() is roughly 3 times slower than jhash(), but comes with a
considerable security improvement.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16 to avoid a build regression for WireGuard with
only part of the siphash API available] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Currently, we are releasing the indirect buffer where we are done with
it in ext4_ind_remove_space(), so we can see the brelse() and
BUFFER_TRACE() everywhere. It seems fragile and hard to read, and we
may probably forget to release the buffer some day. This patch cleans
up the code by putting of the code which releases the buffers to the
end of the function.
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
All indirect buffers get by ext4_find_shared() should be released no
mater the branch should be freed or not. But now, we forget to release
the lower depth indirect buffers when removing space from the same
higher depth indirect block. It will lead to buffer leak and futher
more, it may lead to quota information corruption when using old quota,
consider the following case.
- Create and mount an empty ext4 filesystem without extent and quota
features,
- quotacheck and enable the user & group quota,
- Create some files and write some data to them, and then punch hole
to some files of them, it may trigger the buffer leak problem
mentioned above.
- Disable quota and run quotacheck again, it will create two new
aquota files and write the checked quota information to them, which
probably may reuse the freed indirect block(the buffer and page
cache was not freed) as data block.
- Enable quota again, it will invoke
vfs_load_quota_inode()->invalidate_bdev() to try to clean unused
buffers and pagecache. Unfortunately, because of the buffer of quota
data block is still referenced, quota code cannot read the up to date
quota info from the device and lead to quota information corruption.
This problem can be reproduced by xfstests generic/231 on ext3 file
system or ext4 file system without extent and quota features.
This patch fix this problem by releasing the missing indirect buffers,
in ext4_ind_remove_space().
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Ben Hutchings [Sat, 3 Aug 2019 16:11:42 +0000 (17:11 +0100)]
tcp: Clear sk_send_head after purging the write queue
Denis Andzakovic discovered a potential use-after-free in older kernel
versions, using syzkaller. tcp_write_queue_purge() frees all skbs in
the TCP write queue and can leave sk->sk_send_head pointing to freed
memory. tcp_disconnect() clears that pointer after calling
tcp_write_queue_purge(), but tcp_connect() does not. It is
(surprisingly) possible to add to the write queue between
disconnection and reconnection, so this needs to be done in both
places.
This bug was introduced by backports of commit 7f582b248d0a ("tcp:
purge write queue in tcp_connect_init()") and does not exist upstream
because of earlier changes in commit 75c119afe14f ("tcp: implement
rb-tree based retransmit queue"). The latter is a major change that's
not suitable for stable.
Reported-by: Denis Andzakovic <denis.andzakovic@pulsesecurity.co.nz> Bisected-by: Salvatore Bonaccorso <carnil@debian.org> Fixes: 7f582b248d0a ("tcp: purge write queue in tcp_connect_init()") Cc: <stable@vger.kernel.org> # before 4.15 Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When the lldd is processing the complete sas task in interrupt and set the
task stat as SAS_TASK_STATE_DONE, the smp timeout timer is able to be
triggered at the same time. And smp_task_timedout() will complete the task
wheter the SAS_TASK_STATE_DONE is set or not. Then the sas task may freed
before lldd end the interrupt process. Thus a use-after-free will happen.
Fix this by calling the complete() only when SAS_TASK_STATE_DONE is not
set. And remove the check of the return value of the del_timer(). Once the
LLDD sets DONE, it must call task->done(), which will call
smp_task_done()->complete() and the task will be completed and freed
correctly.
Reported-by: chenxiang <chenxiang66@hisilicon.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> CC: John Garry <john.garry@huawei.com> CC: Johannes Thumshirn <jthumshirn@suse.de> CC: Ewan Milne <emilne@redhat.com> CC: Christoph Hellwig <hch@lst.de> CC: Tomas Henzl <thenzl@redhat.com> CC: Dan Williams <dan.j.williams@intel.com> CC: Hannes Reinecke <hare@suse.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: John Garry <john.garry@huawei.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
else, we leak the addresses to userspace via ctnetlink events
and dumps.
Compute an ID on demand based on the immutable parts of nf_conn struct.
Another advantage compared to using an address is that there is no
immediate re-use of the same ID in case the conntrack entry is freed and
reallocated again immediately.
Fixes: 3583240249ef ("[NETFILTER]: nf_conntrack_expect: kill unique ID") Fixes: 7f85f914721f ("[NETFILTER]: nf_conntrack: kill unique ID") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[bwh: Backported to 3.16:
- Include <net/netns/hash.h> in nf_conntrack_core.c
- Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
According to Amit Klein and Benny Pinkas, IP ID generation is too weak
and might be used by attackers.
Even with recent net_hash_mix() fix (netns: provide pure entropy for net_hash_mix())
having 64bit key and Jenkins hash is risky.
It is time to switch to siphash and its 128bit keys.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Amit Klein <aksecurity@gmail.com> Reported-by: Benny Pinkas <benny@pinkas.net> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
SipHash is a 64-bit keyed hash function that is actually a
cryptographically secure PRF, like HMAC. Except SipHash is super fast,
and is meant to be used as a hashtable keyed lookup function, or as a
general PRF for short input use cases, such as sequence numbers or RNG
chaining.
For the first usage:
There are a variety of attacks known as "hashtable poisoning" in which an
attacker forms some data such that the hash of that data will be the
same, and then preceeds to fill up all entries of a hashbucket. This is
a realistic and well-known denial-of-service vector. Currently
hashtables use jhash, which is fast but not secure, and some kind of
rotating key scheme (or none at all, which isn't good). SipHash is meant
as a replacement for jhash in these cases.
There are a modicum of places in the kernel that are vulnerable to
hashtable poisoning attacks, either via userspace vectors or network
vectors, and there's not a reliable mechanism inside the kernel at the
moment to fix it. The first step toward fixing these issues is actually
getting a secure primitive into the kernel for developers to use. Then
we can, bit by bit, port things over to it as deemed appropriate.
While SipHash is extremely fast for a cryptographically secure function,
it is likely a bit slower than the insecure jhash, and so replacements
will be evaluated on a case-by-case basis based on whether or not the
difference in speed is negligible and whether or not the current jhash usage
poses a real security risk.
For the second usage:
A few places in the kernel are using MD5 or SHA1 for creating secure
sequence numbers, syn cookies, port numbers, or fast random numbers.
SipHash is a faster and more fitting, and more secure replacement for MD5
in those situations. Replacing MD5 and SHA1 with SipHash for these uses is
obvious and straight-forward, and so is submitted along with this patch
series. There shouldn't be much of a debate over its efficacy.
Dozens of languages are already using this internally for their hash
tables and PRFs. Some of the BSDs already use this in their kernels.
SipHash is a widely known high-speed solution to a widely known set of
problems, and it's time we catch-up.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
As namespaces are sometimes used with overlapping ip address ranges,
we should also use the namespace as input to the hash to select the ip
fragmentation counter bucket.
Cc: Eric Dumazet <edumazet@google.com> Cc: Flavio Leitner <fbl@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The socket parameter might legally be NULL, thus sock_net is sometimes
causing a NULL pointer dereference. Using net_device pointer in dst_entry
is more reliable.
Fixes: b6a7719aedd7e5c ("ipv4: hash net ptr into fragmentation bucket selection") Reported-by: Rick Jones <rick.jones2@hp.com> Cc: Rick Jones <rick.jones2@hp.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
As namespaces are sometimes used with overlapping ip address ranges,
we should also use the namespace as input to the hash to select the ip
fragmentation counter bucket.
Cc: Eric Dumazet <edumazet@google.com> Cc: Flavio Leitner <fbl@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Matt Grant reported frequent crashes in ipv6_select_ident when
udp6_ufo_fragment is called from openvswitch on a skb that doesn't
have a dst_entry set.
ipv6_proxy_select_ident generates the frag_id without using the dst
associated with the skb. This approach was suggested by Vladislav
Yasevich.
Fixes: 0508c07f5e0c ("ipv6: Select fragment id during UFO segmentation if not set.") Cc: Vladislav Yasevich <vyasevic@redhat.com> Reported-by: Matt Grant <matt@mattgrant.net.nz> Tested-by: Matt Grant <matt@mattgrant.net.nz> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Make __ipv6_select_ident() static as it isn't used outside
the file.
Fixes: 0508c07f5e0c9 (ipv6: Select fragment id during UFO segmentation if not set.) Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ipv6: Select fragment id during UFO segmentation if not set.
Introduced a bug on LE in how ipv6 fragment id is assigned.
This was cought by nightly sparce check:
Resolve the following sparce error:
net/ipv6/output_core.c:57:38: sparse: incorrect type in assignment
(different base types)
net/ipv6/output_core.c:57:38: expected restricted __be32
[usertype] ip6_frag_id
net/ipv6/output_core.c:57:38: got unsigned int [unsigned]
[assigned] [usertype] id
Fixes: 0508c07f5e0c9 (ipv6: Select fragment id during UFO segmentation if not set.) Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Now that GSO layer can track if fragment id has been selected
and can allocate one if necessary, we don't need to do this in
tap and macvtap. This reverts most of the code and only keeps
the new ipv6 fragment id generation function that is still needed.
Fixes: 3d0ad09412ff (drivers/net: Disable UFO through virtio) Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If the IPv6 fragment id has not been set and we perform
fragmentation due to UFO, select a new fragment id.
We now consider a fragment id of 0 as unset and if id selection
process returns 0 (after all the pertrubations), we set it to
0x80000000, thus giving us ample space not to create collisions
with the next packet we may have to fragment.
When doing UFO integrity checking, we also select the
fragment id if it has not be set yet. This is stored into
the skb_shinfo() thus allowing UFO to function correclty.
This patch also removes duplicate fragment id generation code
and moves ipv6_select_ident() into the header as it may be
used during GSO.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Ben Hutchings [Fri, 9 Aug 2019 19:09:49 +0000 (20:09 +0100)]
Revert "inet: update the IP ID generation algorithm to higher standards."
This reverts commit 8b197d3ce585d6777197e0633d71e5af7d98cb35, which
was a stable-specific improvement to IP ID selection. I will apply
the upstream changes instead.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Instead of trying to allocate pages with GFP_USER in
add_ballooned_pages() check the available free memory via
si_mem_available(). GFP_USER is far less limiting memory exhaustion
than the test via si_mem_available().
This will avoid dom0 running out of memory due to excessive foreign
page mappings especially on ARM and on x86 in PVH mode, as those don't
have a pre-ballooned area which can be used for foreign mappings.
As the normal ballooning suffers from the same problem don't balloon
down more than si_mem_available() pages in one iteration. At the same
time limit the default maximum number of retries.
This is part of XSA-300.
Signed-off-by: Juergen Gross <jgross@suse.com>
[bwh: Backported to 3.16: adjust context, indentation] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Add a new field, VIRTIO_BALLOON_S_AVAIL, to virtio_balloon memory
statistics protocol, corresponding to 'Available' in /proc/meminfo.
It indicates to the hypervisor how big the balloon can be inflated
without pushing the guest system to swap. This metric would be very
useful in VM orchestration software to improve memory management of
different VMs under overcommit.
This patch (of 2):
Factor out calculation of the available memory counter into a separate
exportable function, in order to be able to use it in other parts of the
kernel.
In particular, it appears a relevant metric to report to the hypervisor
via virtio-balloon statistics interface (in a followup patch).
Signed-off-by: Igor Redko <redkoi@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16 as dependency of commit a1078e821b60
"xen: let alloc_xenballooned_pages() fail if not enough memory free"] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The MemAvailable item in /proc/meminfo is to give users a hint of how
much memory is allocatable without causing swapping, so it excludes the
zones' low watermarks as unavailable to userspace.
However, for a userspace allocation, kswapd will actually reclaim until
the free pages hit a combination of the high watermark and the page
allocator's lowmem protection that keeps a certain amount of DMA and
DMA32 memory from userspace as well.
Subtract the full amount we know to be unavailable to userspace from the
number of free pages when calculating MemAvailable.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This fixes a global out-of-bounds read access in the copy_buffer
function of the floppy driver.
The FDDEFPRM ioctl allows one to set the geometry of a disk. The sect
and head fields (unsigned int) of the floppy_drive structure are used to
compute the max_sector (int) in the make_raw_rw_request function. It is
possible to overflow the max_sector. Next, max_sector is passed to the
copy_buffer function and used in one of the memcpy calls.
An unprivileged user could trigger the bug if the device is accessible,
but requires a floppy disk to be inserted.
The patch adds the check for the .sect * .head multiplication for not
overflowing in the set_geometry function.
This fixes the invalid pointer dereference in the drive_name function of
the floppy driver.
The native_format field of the struct floppy_drive_params is used as
floppy_type array index in the drive_name function. Thus, the field
should be checked the same way as the autodetect field.
To trigger the bug, one could use a value out of range and set the drive
parameters with the FDSETDRVPRM ioctl. Next, FDGETDRVTYP ioctl should
be used to call the drive_name. A floppy disk is not required to be
inserted.
CAP_SYS_ADMIN is required to call FDSETDRVPRM.
The patch adds the check for a value of the native_format field to be in
the '0 <= x < ARRAY_SIZE(floppy_type)' range of the floppy_type array
indices.
The bug was found by syzkaller.
Signed-off-by: Denis Efremov <efremov@ispras.ru> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16: Drop changes in compat_setdrvprm(), as compat
ioctls go via fd_ioctl_locked() after translation in compat_ioctl.c.] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This fixes a global out-of-bounds read access in the next_valid_format
function of the floppy driver.
The values from autodetect field of the struct floppy_drive_params are
used as indices for the floppy_type array in the next_valid_format
function 'floppy_type[DP->autodetect[probed_format]].sect'.
To trigger the bug, one could use a value out of range and set the drive
parameters with the FDSETDRVPRM ioctl. A floppy disk is not required to
be inserted.
CAP_SYS_ADMIN is required to call FDSETDRVPRM.
The patch adds the check for values of the autodetect field to be in the
'0 <= x < ARRAY_SIZE(floppy_type)' range of the floppy_type array indices.
The bug was found by syzkaller.
Signed-off-by: Denis Efremov <efremov@ispras.ru> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16: Drop changes in compat_setdrvprm(), as compat
ioctls go via fd_ioctl_locked() after translation in compat_ioctl.c.] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This fixes a divide by zero error in the setup_format_params function of
the floppy driver.
Two consecutive ioctls can trigger the bug: The first one should set the
drive geometry with such .sect and .rate values for the F_SECT_PER_TRACK
to become zero. Next, the floppy format operation should be called.
A floppy disk is not required to be inserted. An unprivileged user
could trigger the bug if the device is accessible.
The patch checks F_SECT_PER_TRACK for a non-zero value in the
set_geometry function. The proper check should involve a reasonable
upper limit for the .sect and .rate fields, but it could change the
UAPI.
The patch also checks F_SECT_PER_TRACK in the setup_format_params, and
cancels the formatting operation in case of zero.
On systems like P9 powernv where we have no TM (or P8 booted with
ppc_tm=off), userspace can construct a signal context which still has
the MSR TS bits set. The kernel tries to restore this context which
results in the following crash:
The problem is the signal code assumes TM is enabled when
CONFIG_PPC_TRANSACTIONAL_MEM is enabled. This may not be the case as
with P9 powernv or if `ppc_tm=off` is used on P8.
This means any local user can crash the system.
Fix the problem by returning a bad stack frame to the user if they try
to set the MSR TS bits with sigreturn() on systems where TM is not
supported.
Found with sigfuz kernel selftest on P9.
This fixes CVE-2019-13648.
Fixes: 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context") Reported-by: Praveen Pandey <Praveen.Pandey@in.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190719050502.405-1-mikey@neuling.org Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The GTCO tablet input driver configures itself from an HID report sent
via USB during the initial enumeration process. Some debugging messages
are generated during the parsing. A debugging message indentation
counter is not bounds checked, leading to the ability for a specially
crafted HID report to cause '-' and null bytes be written past the end
of the indentation array. As long as the kernel has CONFIG_DYNAMIC_DEBUG
enabled, this code will not be optimized out. This was discovered
during code review after a previous syzkaller bug was found in this
driver.
Signed-off-by: Grant Hernandez <granthernandez@google.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Certain ttys operations (pty_unix98_ops) lack tiocmget() and tiocmset()
functions which are called by the certain HCI UART protocols (hci_ath,
hci_bcm, hci_intel, hci_mrvl, hci_qca) via hci_uart_set_flow_control()
or directly. This leads to an execution at NULL and can be triggered by
an unprivileged user. Fix this by adding a helper function and a check
for the missing tty operations in the protocols code.
This fixes CVE-2019-10207. The Fixes: lines list commits where calls to
tiocm[gs]et() or hci_uart_set_flow_control() were added to the HCI UART
protocols.
Link: https://syzkaller.appspot.com/bug?id=1b42faa2848963564a5b1b7f8c837ea7b55ffa50 Reported-by: syzbot+79337b501d6aa974d0f6@syzkaller.appspotmail.com Fixes: b3190df62861 ("Bluetooth: Support for Atheros AR300x serial chip") Fixes: 118612fb9165 ("Bluetooth: hci_bcm: Add suspend/resume PM functions") Fixes: ff2895592f0f ("Bluetooth: hci_intel: Add Intel baudrate configuration support") Fixes: 162f812f23ba ("Bluetooth: hci_uart: Add Marvell support") Fixes: fa9ad876b8e0 ("Bluetooth: hci_qca: Add support for Qualcomm Bluetooth chip wcn3990") Signed-off-by: Vladis Dronov <vdronov@redhat.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Reviewed-by: Yu-Chen, Cho <acho@suse.com> Tested-by: Yu-Chen, Cho <acho@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16:
- Only hci_ath is affected
- There is no serdev support] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This patch will check the weight and exit the loop if we exceeds the
weight. This is useful for preventing scsi kthread from hogging cpu
which is guest triggerable.
This addresses CVE-2019-3900.
Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Fixes: 057cbf49a1f0 ("tcm_vhost: Initial merge for vhost level target fabric driver") Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[bwh: Backported to 3.16:
- Drop changes in vhost_scsi_ctl_handle_vq()
- Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When the rx buffer is too small for a packet, we will discard the vq
descriptor and retry it for the next packet:
while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
&busyloop_intr))) {
...
/* On overrun, truncate and discard */
if (unlikely(headcount > UIO_MAXIOV)) {
iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
err = sock->ops->recvmsg(sock, &msg,
1, MSG_DONTWAIT | MSG_TRUNC);
pr_debug("Discarded rx packet: len %zd\n", sock_len);
continue;
}
...
}
This makes it possible to trigger a infinite while..continue loop
through the co-opreation of two VMs like:
1) Malicious VM1 allocate 1 byte rx buffer and try to slow down the
vhost process as much as possible e.g using indirect descriptors or
other.
2) Malicious VM2 generate packets to VM1 as fast as possible
Fixing this by checking against weight at the end of RX and TX
loop. This also eliminate other similar cases when:
- userspace is consuming the packets in the meanwhile
- theoretical TOCTOU attack if guest moving avail index back and forth
to hit the continue after vhost find guest just add new buffers
This addresses CVE-2019-3900.
Fixes: d8316f3991d20 ("vhost: fix total length when packets are too short") Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server") Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
[bwh: Backported to 3.16:
- Both Tx modes are handled in one loop in handle_tx()
- Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
We used to have vhost_exceeds_weight() for vhost-net to:
- prevent vhost kthread from hogging the cpu
- balance the time spent between TX and RX
This function could be useful for vsock and scsi as well. So move it
to vhost.c. Device must specify a weight which counts the number of
requests, or it can also specific a byte_weight which counts the
number of bytes that has been processed.
Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
[bwh: Backported to 3.16:
- Drop changes to vhost_vsock
- In vhost_net, both Tx modes are handled in one loop in handle_tx()
- Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Similar to commit a2ac99905f1e ("vhost-net: set packet weight of
tx polling to 2 * vq size"), we need a packet-based limit for
handler_rx, too - elsewhere, under rx flood with small packets,
tx can be delayed for a very long time, even without busypolling.
The pkt limit applied to handle_rx must be the same applied by
handle_tx, or we will get unfair scheduling between rx and tx.
Tying such limit to the queue length makes it less effective for
large queue length values and can introduce large process
scheduler latencies, so a constant valued is used - likewise
the existing bytes limit.
The selected limit has been validated with PVP[1] performance
test with different queue sizes:
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
handle_tx will delay rx for tens or even hundreds of milliseconds when tx busy
polling udp packets with small length(e.g. 1byte udp payload), because setting
VHOST_NET_WEIGHT takes into account only sent-bytes but no single packet length.
Ping-Latencies shown below were tested between two Virtual Machines using
netperf (UDP_STREAM, len=1), and then another machine pinged the client:
Seems pretty consistent, a small dip at 2 VQ sizes.
Ring size is a hint from device about a burst size it can tolerate. Based on
benchmarks, set the weight to 2 * vq size.
To evaluate this change, another tests were done using netperf(RR, TX) between
two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and vq size was
tweaked through qemu. Results shown below does not show obvious changes.
Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Haibin Zhang <haibinzhang@tencent.com> Signed-off-by: Yunfang Tai <yunfangtai@tencent.com> Signed-off-by: Lidong Chen <lidongchen@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
On all current Atom processors, instructions that use a segment register
value (e.g. a load or store) will not speculatively execute before the
last writer of that segment retires. Thus they will not use a
speculatively written segment value.
That means on ATOMs there is no speculation through SWAPGS, so the SWAPGS
entry paths can be excluded from the extra LFENCE if PTI is disabled.
Create a separate bug flag for the through SWAPGS speculation and mark all
out-of-order ATOMs and AMD/HYGON CPUs as not affected. The in-order ATOMs
are excluded from the whole mitigation mess anyway.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
[bwh: Backported to 3.16:
- There's no whitelist entry (or any support) for Hygon CPUs
- Use the next available X86_BUG number
- Adjust context, indentation] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Somehow the swapgs mitigation entry code patch ended up with a JMPQ
instruction instead of JMP, where only the short jump is needed. Some
assembler versions apparently fail to optimize JMPQ into a two-byte JMP
when possible, instead always using a 7-byte JMP with relocation. For
some reason that makes the entry code explode with a #GP during boot.
Change it back to "JMP" as originally intended.
Fixes: 18ec54fdd6d1 ("x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations") Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.16: adjust filename, context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The previous commit added macro calls in the entry code which mitigate the
Spectre v1 swapgs issue if the X86_FEATURE_FENCE_SWAPGS_* features are
enabled. Enable those features where applicable.
The mitigations may be disabled with "nospectre_v1" or "mitigations=off".
There are different features which can affect the risk of attack:
- When FSGSBASE is enabled, unprivileged users are able to place any
value in GS, using the wrgsbase instruction. This means they can
write a GS value which points to any value in kernel space, which can
be useful with the following gadget in an interrupt/exception/NMI
handler:
if (coming from user space)
swapgs
mov %gs:<percpu_offset>, %reg1
// dependent load or store based on the value of %reg
// for example: mov %(reg1), %reg2
If an interrupt is coming from user space, and the entry code
speculatively skips the swapgs (due to user branch mistraining), it
may speculatively execute the GS-based load and a subsequent dependent
load or store, exposing the kernel data to an L1 side channel leak.
Note that, on Intel, a similar attack exists in the above gadget when
coming from kernel space, if the swapgs gets speculatively executed to
switch back to the user GS. On AMD, this variant isn't possible
because swapgs is serializing with respect to future GS-based
accesses.
NOTE: The FSGSBASE patch set hasn't been merged yet, so the above case
doesn't exist quite yet.
- When FSGSBASE is disabled, the issue is mitigated somewhat because
unprivileged users must use prctl(ARCH_SET_GS) to set GS, which
restricts GS values to user space addresses only. That means the
gadget would need an additional step, since the target kernel address
needs to be read from user space first. Something like:
if (coming from user space)
swapgs
mov %gs:<percpu_offset>, %reg1
mov (%reg1), %reg2
// dependent load or store based on the value of %reg2
// for example: mov %(reg2), %reg3
It's difficult to audit for this gadget in all the handlers, so while
there are no known instances of it, it's entirely possible that it
exists somewhere (or could be introduced in the future). Without
tooling to analyze all such code paths, consider it vulnerable.
Effects of SMAP on the !FSGSBASE case:
- If SMAP is enabled, and the CPU reports RDCL_NO (i.e., not
susceptible to Meltdown), the kernel is prevented from speculatively
reading user space memory, even L1 cached values. This effectively
disables the !FSGSBASE attack vector.
- If SMAP is enabled, but the CPU *is* susceptible to Meltdown, SMAP
still prevents the kernel from speculatively reading user space
memory. But it does *not* prevent the kernel from reading the
user value from L1, if it has already been cached. This is probably
only a small hurdle for an attacker to overcome.
Thanks to Dave Hansen for contributing the speculative_smap() function.
Thanks to Andrew Cooper for providing the inside scoop on whether swapgs
is serializing on AMD.
[ tglx: Fixed the USER fence decision and polished the comment as suggested
by Dave Hansen ]
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com>
[bwh: Backported to 3.16:
- Check for X86_FEATURE_KAISER instead of X86_FEATURE_PTI
- mitigations= parameter is x86-only here
- powerpc doesn't have Spectre mitigations
- Don't use __ro_after_init
- Adjust filename, context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Spectre v1 isn't only about array bounds checks. It can affect any
conditional checks. The kernel entry code interrupt, exception, and NMI
handlers all have conditional swapgs checks. Those may be problematic in
the context of Spectre v1, as kernel code can speculatively run with a user
GS.
For example:
if (coming from user space)
swapgs
mov %gs:<percpu_offset>, %reg
mov (%reg), %reg1
When coming from user space, the CPU can speculatively skip the swapgs, and
then do a speculative percpu load using the user GS value. So the user can
speculatively force a read of any kernel value. If a gadget exists which
uses the percpu value as an address in another load/store, then the
contents of the kernel value may become visible via an L1 side channel
attack.
A similar attack exists when coming from kernel space. The CPU can
speculatively do the swapgs, causing the user GS to get used for the rest
of the speculative window.
The mitigation is similar to a traditional Spectre v1 mitigation, except:
a) index masking isn't possible; because the index (percpu offset)
isn't user-controlled; and
b) an lfence is needed in both the "from user" swapgs path and the
"from kernel" non-swapgs path (because of the two attacks described
above).
The user entry swapgs paths already have SWITCH_TO_KERNEL_CR3, which has a
CR3 write when PTI is enabled. Since CR3 writes are serializing, the
lfences can be skipped in those cases.
On the other hand, the kernel entry swapgs paths don't depend on PTI.
To avoid unnecessary lfences for the user entry case, create two separate
features for alternative patching:
Use these features in entry code to patch in lfences where needed.
The features aren't enabled yet, so there's no functional change.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com>
[bwh: Backported to 3.16:
- Assign the CPU feature bits from word 7
- Add FENCE_SWAPGS_KERNEL_ENTRY to NMI entry, since it does not
use paranoid_entry
- Add a return after .Lerror_entry_from_usermode_after_swapgs, done
upstream by commit f10750536fa7 "x86/entry/64: Fix irqflag tracing wrt
context tracking"
- Include <asm/cpufeatures.h> in calling.h
- Adjust filenames, context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
... can be reproduced by running the GS testcase of the ldt_gdt test unit in
the x86 selftests.
do_int80_syscall_32() will call enter_form_user_mode() to convert context
tracking state from user state to kernel state. The load_gs_index() call
can fail with user gsbase, gsbase will be fixed up and proceed if this
happen.
However, enter_from_user_mode() will be called again in the fixed up path
though it is context tracking kernel state currently.
This patch fixes it by just fixing up gsbase and telling lockdep that IRQs
are off once load_gs_index() failed with user gsbase.
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1475197266-3440-1-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16 as dependency of commit 18ec54fdd6d1
"x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations":
- Adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In 539f51136500 ("x86/asm/entry/64: Disentangle error_entry/exit
gsbase/ebx/usermode code"), I arranged the code slightly wrong
-- IRET faults would skip the code path that was intended to
execute on all error entries from user mode. Fix it up.
While we're at it, make all the labels in error_entry local.
This does not fix a bug, but we'll need it, and it slightly
shrinks the code.
Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: paulmck@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/91e17891e49fa3d61357eadc451529ad48143ee1.1435952415.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16 as dependency of commit 18ec54fdd6d1
"x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations":
- Adjust filename, context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The error_entry/error_exit code to handle gsbase and whether we
return to user mdoe was a mess:
- error_sti was misnamed. In particular, it did not enable interrupts.
- Error handling for gs_change was hopelessly tangled the normal
usermode path. Separate it out. This saves a branch in normal
entries from kernel mode.
- The comments were bad.
Fix it up. As a nice side effect, there's now a code path that
happens on error entries from user mode. We'll use it soon.
Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f1be898ab93360169fb845ab85185948832209ee.1433878454.git.luto@kernel.org
[ Prettified it, clarified comments some more. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16 as dependency of commit 18ec54fdd6d1
"x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations":
- We do not use %ebx as a flag since we already have a backport of commit b3681dd548d0 "x86/entry/64: Remove %ebx handling from error_entry/exit",
so don't add the comments about that
- Adjust filename, context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
To choose whether to pick the GID from the old (16bit) or new (32bit)
field, we should check if the old gid field is set to 0xffff. Mainline
checks the old *UID* field instead - cut'n'paste from the corresponding
code in ufs_get_inode_uid().
Fixes: 252e211e90ce Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Packet sockets in datagram mode take a destination address. Verify its
length before passing to dev_hard_header.
Prior to 2.6.14-rc3, the send code ignored sll_halen. This is
established behavior. Directly compare msg_namelen to dev->addr_len.
Change v1->v2: initialize addr in all paths
Fixes: 6b8d95f1795c4 ("packet: validate address length if non-zero") Suggested-by: David Laight <David.Laight@aculab.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Packet send checks that msg_name is at least sizeof sockaddr_ll.
Packet recv must return at least this length, so that its output
can be passed unmodified to packet send.
This ceased to be true since adding support for lladdr longer than
sll_addr. Since, the return value uses true address length.
Always return at least sizeof sockaddr_ll, even if address length
is shorter. Zero the padding bytes.
Change v1->v2: do not overwrite zeroed padding again. use copy_len.
Fixes: 0fb375fb9b93 ("[AF_PACKET]: Allow for > 8 byte hardware addresses.") Suggested-by: David Laight <David.Laight@aculab.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>