There is a cut and paste issue here. The bug is that we are allocating
more memory than necessary for msp_maps. We should be allocating enough
space for a map_info struct (144 bytes) but we instead allocate enough
for an mtd_info struct (1840 bytes). It's a small waste.
The other part of this is not harmful but when we allocated msp_flash
then we allocated enough space fro a map_info pointer instead of an
mtd_info pointer. But since pointers are the same size it works out
fine.
Anyway, I decided to clean up all three allocations a bit to make them
a bit more consistent and clear.
Fixes: 68aa0fa87f6d ('[MTD] PMC MSP71xx flash/rootfs mappings') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
Commit 073db4a51ee4 ("mtd: fix: avoid race condition when accessing
mtd->usecount") fixed a race condition but due to poor ordering of the
mutex acquisition, introduced a potential deadlock.
The deadlock can occur, for example, when rmmod'ing the m25p80 module, which
will delete one or more MTDs, along with any corresponding mtdblock
devices. This could potentially race with an acquisition of the block
device as follows.
This is a classic (potential) ABBA deadlock, which can be fixed by
making the A->B ordering consistent everywhere. There was no real
purpose to the ordering in the original patch, AFAIR, so this shouldn't
be a problem. This ordering was actually already present in
del_mtd_blktrans_dev(), for one, where the function tried to ensure that
its caller already held mtd_table_mutex before it acquired &dev->lock:
if (mutex_trylock(&mtd_table_mutex)) {
mutex_unlock(&mtd_table_mutex);
BUG();
}
So, reverse the ordering of acquisition of &dev->lock and &mtd_table_mutex so
we always acquire mtd_table_mutex first.
When there is a CM id object that has port assigned to it, it means that
the cm-id asked for the specific port that it should go by it, but if
that port was removed (hot-unplug event) the cm-id was not updated.
In order to fix that the port keeps a list of all the cm-id's that are
planning to go by it, whenever the port is removed it marks all of them
as invalid.
This commit fixes a kernel panic which happens when running traffic between
guests and we force reboot a guest mid traffic, it triggers a kernel panic:
Because of an incorrect bit-masking done on the join state bits, when
handling a join request we failed to detect a difference between the
group join state and the request join state when joining as send only
full member (0x8). This caused the MC join request not to be sent.
This issue is relevant only when SRIOV is enabled and SM supports
send only full member.
This fix separates scope bits and join states bits a nibble each.
Fixes: b9c5d6a64358 ('IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV') Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
This fix solves a race between light flush and on the fly joins.
Light flush doesn't set the device to down and unset IPOIB_OPER_UP
flag, this means that if while flushing we have a MC join in progress
and the QP was attached to BC MGID we can have a mismatches when
re-attaching a QP to the BC MGID.
The light flush would set the broadcast group to NULL causing an on
the fly join to rejoin and reattach to the BC MCG as well as adding
the BC MGID to the multicast list. The flush process would later on
remove the BC MGID and detach it from the QP. On the next flush
the BC MGID is present in the multicast list but not found when trying
to detach it because of the previous double attach and single detach.
Fixes: ee1e2c82c245 ("IPoIB: Refresh paths instead of flushing them on SM change events") Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
The function send_leave sets the member: group->query_id
(group->query_id = ret) after calling the sa_query, but leave_handler
can be executed before the setting and it might delete the group object,
and will get a memory corruption.
Additionally, this patch gets rid of group->query_id variable which is
not used.
When a new CM connection is being requested, ipoib driver copies data
from the path pointer in the CM/tx object, the path object might be
invalid at the point and memory corruption will happened later when now
the CM driver will try using that data.
The next scenario demonstrates it:
neigh_add_path --> ipoib_cm_create_tx -->
queue_work (pointer to path is in the cm/tx struct)
#while the work is still in the queue,
#the port goes down and causes the ipoib_flush_paths:
ipoib_flush_paths --> path_free --> kfree(path)
#at this point the work scheduled starts.
ipoib_cm_tx_start --> copy from the (invalid)path pointer:
(memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);)
-> memory corruption.
To fix that the driver now starts the CM/tx connection only if that
specific path exists in the general paths database.
This check is protected with the relevant locks, and uses the gid from
the neigh member in the CM/tx object which is valid according to the ref
count that was taken by the CM/tx.
Regardless of the previous CPU a timer was on, add_timer_on()
currently simply sets timer->flags to the new CPU. As the caller must
be seeing the timer as idle, this is locally fine, but the timer
leaving the old base while unlocked can lead to race conditions as
follows.
Let's say timer was on cpu 0.
cpu 0 cpu 1
-----------------------------------------------------------------------------
del_timer(timer) succeeds
del_timer(timer)
lock_timer_base(timer) locks cpu_0_base
add_timer_on(timer, 1)
spin_lock(&cpu_1_base->lock)
timer->flags set to cpu_1_base
operates on @timer operates on @timer
This triggered with mod_delayed_work_on() which contains
"if (del_timer()) add_timer_on()" sequence eventually leading to the
following oops.
In the critical sysfs entry the thermal hwmon was returning wrong
temperature to the user-space. It was reporting the temperature of the
first trip point instead of the temperature of critical trip point.
For example:
/sys/class/hwmon/hwmon0/temp1_crit:50000
/sys/class/thermal/thermal_zone0/trip_point_0_temp:50000
/sys/class/thermal/thermal_zone0/trip_point_0_type:active
/sys/class/thermal/thermal_zone0/trip_point_3_temp:120000
/sys/class/thermal/thermal_zone0/trip_point_3_type:critical
Since commit e68b16abd91d ("thermal: add hwmon sysfs I/F") the driver
have been registering a sysfs entry if get_crit_temp() callback was
provided. However when accessed, it was calling get_trip_temp() instead
of the get_crit_temp().
Fixes: e68b16abd91d ("thermal: add hwmon sysfs I/F") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[wt: s/thermal_hwmon.c/thermal_core.c in 3.10]
All the scaling of the KXSD9 involves multiplication with a
fraction number < 1.
However the scaling value returned from IIO_INFO_SCALE was
unpredictable as only the micros of the value was assigned, and
not the integer part, resulting in scaling like this:
Any readings from the raw interface of the KXSD9 driver will
return an empty string, because it does not return
IIO_VAL_INT but rather some random value from the accelerometer
to the caller.
In some cases a NACK interrupt may be pending in the Status Register (SR)
as a result of a previous transfer. However at91_do_twi_transfer() did not
read the SR to clear pending interruptions before starting a new transfer.
Hence a NACK interrupt rose as soon as it was enabled again at the I2C
controller level, resulting in a wrong sequence of operations and strange
patterns of behaviour on the I2C bus, such as a clock stretch followed by
a restart of the transfer.
This first issue occurred with both DMA and PIO write transfers.
Also when a NACK error was detected during a PIO write transfer, the
interrupt handler used to wrongly start a new transfer by writing into the
Transmit Holding Register (THR). Then the I2C slave was likely to reply
with a second NACK.
This second issue is fixed in atmel_twi_interrupt() by handling the TXRDY
status bit only if both the TXCOMP and NACK status bits are cleared.
Tested with a at24 eeprom on sama5d36ek board running a linux-4.1-at91
kernel image. Adapted to linux-next.
Reported-by: Peter Rosin <peda@lysator.liu.se> Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com> Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com> Tested-by: Peter Rosin <peda@lysator.liu.se> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Fixes: 93563a6a71bb ("i2c: at91: fix a race condition when using the DMA controller") Signed-off-by: Willy Tarreau <w@1wt.eu>
Race condition between registering an I2C device driver and
deregistering an I2C adapter device which is assumed to manage that
I2C device may lead to a NULL pointer dereference due to the
uninitialized list head of driver clients.
The root cause of the issue is that the I2C bus may know about the
registered device driver and thus it is matched by bus_for_each_drv(),
but the list of clients is not initialized and commonly it is NULL,
because I2C device drivers define struct i2c_driver as static and
clients field is expected to be initialized by I2C core:
To solve the problem it is sufficient to do clients list head
initialization before calling driver_register().
The problem was found while using an I2C device driver with a sluggish
registration routine on a bus provided by a physically detachable I2C
master controller, but practically the oops may be reproduced under
the race between arbitraty I2C device driver registration and managing
I2C bus device removal e.g. by unbinding the latter over sysfs:
% echo 21a4000.i2c > /sys/bus/platform/drivers/imx-i2c/unbind
Unable to handle kernel NULL pointer dereference at virtual address 00000000
Internal error: Oops: 17 [#1] SMP ARM
CPU: 2 PID: 533 Comm: sh Not tainted 4.9.0-rc3+ #61
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
task: e5ada400 task.stack: e4936000
PC is at i2c_do_del_adapter+0x20/0xcc
LR is at __process_removed_adapter+0x14/0x1c
Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 35bd004a DAC: 00000051
Process sh (pid: 533, stack limit = 0xe4936210)
Stack: (0xe4937d28 to 0xe4938000)
Backtrace:
[<c0667be0>] (i2c_do_del_adapter) from [<c0667cc0>] (__process_removed_adapter+0x14/0x1c)
[<c0667cac>] (__process_removed_adapter) from [<c0516998>] (bus_for_each_drv+0x6c/0xa0)
[<c051692c>] (bus_for_each_drv) from [<c06685ec>] (i2c_del_adapter+0xbc/0x284)
[<c0668530>] (i2c_del_adapter) from [<bf0110ec>] (i2c_imx_remove+0x44/0x164 [i2c_imx])
[<bf0110a8>] (i2c_imx_remove [i2c_imx]) from [<c051a838>] (platform_drv_remove+0x2c/0x44)
[<c051a80c>] (platform_drv_remove) from [<c05183d8>] (__device_release_driver+0x90/0x12c)
[<c0518348>] (__device_release_driver) from [<c051849c>] (device_release_driver+0x28/0x34)
[<c0518474>] (device_release_driver) from [<c0517150>] (unbind_store+0x80/0x104)
[<c05170d0>] (unbind_store) from [<c0516520>] (drv_attr_store+0x28/0x34)
[<c05164f8>] (drv_attr_store) from [<c0298acc>] (sysfs_kf_write+0x50/0x54)
[<c0298a7c>] (sysfs_kf_write) from [<c029801c>] (kernfs_fop_write+0x100/0x214)
[<c0297f1c>] (kernfs_fop_write) from [<c0220130>] (__vfs_write+0x34/0x120)
[<c02200fc>] (__vfs_write) from [<c0221088>] (vfs_write+0xa8/0x170)
[<c0220fe0>] (vfs_write) from [<c0221e74>] (SyS_write+0x4c/0xa8)
[<c0221e28>] (SyS_write) from [<c0108a20>] (ret_fast_syscall+0x0/0x1c)
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Willy Tarreau <w@1wt.eu>
the eg20t driver call request_irq() function before the pch_base_address,
base address of i2c controller's register, is assigned an effective value.
there is one possible scenario that an interrupt which isn't inside eg20t
arrives immediately after request_irq() is executed when i2c controller
shares an interrupt number with others. since the interrupt handler
pch_i2c_handler() has already active as shared action, it will be called
and read its own register to determine if this interrupt is from itself.
At that moment, since base address of i2c registers is not remapped
in kernel space yet,so the INT handler will access an illegal address
and then a error occurs.
Currently omap-rng checks the return value of pm_runtime_get_sync and
reports failure if anything is returned, however it should be checking
if ret < 0 as pm_runtime_get_sync return 0 on success but also can return
1 if the device was already active which is not a failure case. Only
values < 0 are actual failures.
Fixes: 61dc0a446e5d ("hwrng: omap - Fix assumption that runtime_get_sync will always succeed") Signed-off-by: Dave Gerlach <d-gerlach@ti.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Willy Tarreau <w@1wt.eu>
pm_runtime_get_sync does return a error value that must be checked for
error conditions, else, due to various reasons, the device maynot be
enabled and the system will crash due to lack of clock to the hardware
module.
Before:
12.562784] [00000000] *pgd=fe193835
12.562792] Internal error: : 1406 [#1] SMP ARM
[...]
12.562864] CPU: 1 PID: 241 Comm: modprobe Not tainted 4.7.0-rc4-next-20160624 #2
12.562867] Hardware name: Generic DRA74X (Flattened Device Tree)
12.562872] task: ed51f140 ti: ed44c000 task.ti: ed44c000
12.562886] PC is at omap4_rng_init+0x20/0x84 [omap_rng]
12.562899] LR is at set_current_rng+0xc0/0x154 [rng_core]
[...]
After the proper checks:
[ 94.366705] omap_rng 48090000.rng: _od_fail_runtime_resume: FIXME:
missing hwmod/omap_dev info
[ 94.375767] omap_rng 48090000.rng: Failed to runtime_get device -19
[ 94.382351] omap_rng 48090000.rng: initialization failed.
Fixes: 665d92fa85b5 ("hwrng: OMAP: convert to use runtime PM") Cc: Paul Walmsley <paul@pwsan.com> Signed-off-by: Nishanth Menon <nm@ti.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[wt: adjusted context for pre-3.12-rc1 kernels]
Add proper error path (for disabling runtime PM) when registering of
hwrng fails.
Fixes: b329669ea0b5 ("hwrng: exynos - Add support for Exynos random number generator") Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Willy Tarreau <w@1wt.eu>
The commit 4097461897df ("Input: i8042 - break load dependency ...")
correctly set up ps2_cmd_mutex pointer for the KBD port but forgot to do
the same for AUX port(s), which results in communication on KBD and AUX
ports to clash with each other.
Fixes: 4097461897df ("Input: i8042 - break load dependency ...") Reported-by: Bruno Wolff III <bruno@wolff.to> Tested-by: Bruno Wolff III <bruno@wolff.to> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
As explained in 1407814240-4275-1-git-send-email-decui@microsoft.com we
have a hard load dependency between i8042 and atkbd which prevents
keyboard from working on Gen2 Hyper-V VMs.
> hyperv_keyboard invokes serio_interrupt(), which needs a valid serio
> driver like atkbd.c. atkbd.c depends on libps2.c because it invokes
> ps2_command(). libps2.c depends on i8042.c because it invokes
> i8042_check_port_owner(). As a result, hyperv_keyboard actually
> depends on i8042.c.
>
> For a Generation 2 Hyper-V VM (meaning no i8042 device emulated), if a
> Linux VM (like Arch Linux) happens to configure CONFIG_SERIO_I8042=m
> rather than =y, atkbd.ko can't load because i8042.ko can't load(due to
> no i8042 device emulated) and finally hyperv_keyboard can't work and
> the user can't input: https://bugs.archlinux.org/task/39820
> (Ubuntu/RHEL/SUSE aren't affected since they use CONFIG_SERIO_I8042=y)
To break the dependency we move away from using i8042_check_port_owner()
and instead allow serio port owner specify a mutex that clients should use
to serialize PS/2 command stream.
Reported-by: Mark Laws <mdl@60hz.org> Tested-by: Mark Laws <mdl@60hz.org> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
Michel Dänzer [Tue, 29 Nov 2016 09:40:20 +0000 (18:40 +0900)]
drm/radeon: Ensure vblank interrupt is enabled on DPMS transition to on
NOTE: This patch only applies to 4.5.y or older kernels. With newer
kernels, this problem cannot happen because the driver now uses
drm_crtc_vblank_on/off instead of drm_vblank_pre/post_modeset[0]. I
consider this patch safer for older kernels than backporting the API
change, because drm_crtc_vblank_on/off had various issues in older
kernels, and I'm not sure all fixes for those have been backported to
all stable branches where this patch could be applied.
---------------------
Fixes the vblank interrupt being disabled when it should be on, which
can cause at least the following symptoms:
* Hangs when running 'xset dpms force off' in a GNOME session with
gnome-shell using DRI2.
* RandR 1.4 slave outputs freezing with garbage displayed using
xf86-video-ati 7.8.0 or newer.
Reported-and-Tested-by: Max Staudt <mstaudt@suse.de> Reviewed-by: Daniel Vetter <daniel@ffwll.ch> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
Somehow this one slipped through, which means drivers without modeset
support can be oopsed (since those also don't call
drm_mode_config_init, which means the crtc lookup will chase an
uninitalized idr).
Reported-by: Alexander Potapenko <glider@google.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
This bug seems to be present for a very long time.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
The global mutex of 'gdp_mutex' is used to serialize creating/querying
glue dir and its cleanup. Turns out it isn't a perfect way because
part(kobj_kset_leave()) of the actual cleanup action() is done inside
the release handler of the glue dir kobject. That means gdp_mutex has
to be held before releasing the last reference count of the glue dir
kobject.
This patch moves glue dir's cleanup after kobject_del() in device_del()
for avoiding the race.
Cc: Yijing Wang <wangyijing@huawei.com> Reported-by: Chandra Sekhar Lingutla <clingutla@codeaurora.org> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
The put_device() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[wt: backported only to ease next patch as suggested by Jiri]
When isofs_mount() is called to mount a device read-write, it returns
EACCES even before it checks that the device actually contains an isofs
filesystem. This may confuse mount(8) which then tries to mount all
subsequent filesystem types in read-only mode.
Fix the problem by returning EACCES only once we verify that the device
indeed contains an iso9660 filesystem.
Fixes: 17b7f7cf58926844e1dd40f5eb5348d481deca6a Reported-by: Kent Overstreet <kent.overstreet@gmail.com> Reported-by: Karel Zak <kzak@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Willy Tarreau <w@1wt.eu>
Change thaw_super() to check frozen != SB_FREEZE_COMPLETE rather than
frozen == SB_UNFROZEN, otherwise it can race with freeze_super() which
drops sb->s_umount after SB_FREEZE_WRITE to preserve the lock ordering.
In this case thaw_super() will wrongly call s_op->unfreeze_fs() before
it was actually frozen, and call sb_freeze_unlock() which leads to the
unbalanced percpu_up_write(). Unfortunately lockdep can't detect this,
so this triggers misc BUG_ON()'s in kernel/rcu/sync.c.
Reported-and-tested-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu>
seq_read() is a nasty piece of work, not to mention buggy.
It has (I think) an old bug which allows unprivileged userspace to read
beyond the end of m->buf.
I was getting these:
BUG: KASAN: slab-out-of-bounds in seq_read+0xcd2/0x1480 at addr ffff880116889880
Read of size 2713 by task trinity-c2/1329
CPU: 2 PID: 1329 Comm: trinity-c2 Not tainted 4.8.0-rc1+ #96
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
Call Trace:
kasan_object_err+0x1c/0x80
kasan_report_error+0x2cb/0x7e0
kasan_report+0x4e/0x80
check_memory_region+0x13e/0x1a0
kasan_check_read+0x11/0x20
seq_read+0xcd2/0x1480
proc_reg_read+0x10b/0x260
do_loop_readv_writev.part.5+0x140/0x2c0
do_readv_writev+0x589/0x860
vfs_readv+0x7b/0xd0
do_readv+0xd8/0x2c0
SyS_readv+0xb/0x10
do_syscall_64+0x1b3/0x4b0
entry_SYSCALL64_slow_path+0x25/0x25
Object at ffff880116889100, in cache kmalloc-4096 size: 4096
Allocated:
PID = 1329
save_stack_trace+0x26/0x80
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
__kmalloc+0x1aa/0x4a0
seq_buf_alloc+0x35/0x40
seq_read+0x7d8/0x1480
proc_reg_read+0x10b/0x260
do_loop_readv_writev.part.5+0x140/0x2c0
do_readv_writev+0x589/0x860
vfs_readv+0x7b/0xd0
do_readv+0xd8/0x2c0
SyS_readv+0xb/0x10
do_syscall_64+0x1b3/0x4b0
return_from_SYSCALL_64+0x0/0x6a
Freed:
PID = 0
(stack is not available)
Memory state around the buggy address: ffff88011688a000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff88011688a080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88011688a100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^ ffff88011688a180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88011688a200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Disabling lock debugging due to kernel taint
This seems to be the same thing that Dave Jones was seeing here:
https://lkml.org/lkml/2016/8/12/334
There are multiple issues here:
1) If we enter the function with a non-empty buffer, there is an attempt
to flush it. But it was not clearing m->from after doing so, which
means that if we try to do this flush twice in a row without any call
to traverse() in between, we are going to be reading from the wrong
place -- the splat above, fixed by this patch.
2) If there's a short write to userspace because of page faults, the
buffer may already contain multiple lines (i.e. pos has advanced by
more than 1), but we don't save the progress that was made so the
next call will output what we've already returned previously. Since
that is a much less serious issue (and I have a headache after
staring at seq_read() for the past 8 hours), I'll leave that for now.
Link: http://lkml.kernel.org/r/1471447270-32093-1-git-send-email-vegard.nossum@oracle.com Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Reported-by: Dave Jones <davej@codemonkey.org.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
If the file permissions change on the server, then we may not be able to
recover open state. If so, we need to ensure that we mark the file
descriptor appropriately.
Before commit 778be232a207 ("NFS do not find client in NFSv4
pg_authenticate"), the Linux callback server replied with
RPC_AUTH_ERROR / RPC_AUTH_BADCRED, instead of dropping the CB
request. Let's restore that behavior so the server has a chance to
do something useful about it, and provide a warning that helps
admins correct the problem.
Fixes: 778be232a207 ("NFS do not find client in NFSv4 ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
scan_pool() does not mark the PEB for scrubing when bitflips are
detected in the EC header of a free PEB (VID header region left to
0xff).
Make sure we scrub the PEB in this case.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Fixes: dbb7d2a88d2a ("UBI: Add fastmap core") Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Willy Tarreau <w@1wt.eu>
If UBIFS is facing an error while walking a directory, it reports this
error and ubifs_readdir() returns the error code. But the VFS readdir
logic does not make the getdents system call fail in all cases. When the
readdir cursor indicates that more entries are present, the system call
will just return and the libc wrapper will try again since it also
knows that more entries are present.
This causes the libc wrapper to busy loop for ever when a directory is
corrupted on UBIFS.
A common approach do deal with corrupted directory entries is
skipping them by setting the cursor to the next entry. On UBIFS this
approach is not possible since we cannot compute the next directory
entry cursor position without reading the current entry. So all we can
do is setting the cursor to the "no more entries" position and make
getdents exit.
Signed-off-by: Richard Weinberger <richard@nod.at>
[wt: adjusted context]
An assertion in layout_in_gaps() verifies that the gap_lebs pointer is
below the maximum bound. When computing this maximum bound the idx_lebs
count is multiplied by sizeof(int), while C pointers arithmetic does take
into account the size of the pointed elements implicitly already. Remove
the multiplication to fix the assertion.
Fixes: 1e51764a3c2ac05a ("UBIFS: add new flash file system") Signed-off-by: Vincent Stehlé <vincent.stehle@intel.com> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Willy Tarreau <w@1wt.eu>
If we punch a hole on a reflink such that following conditions are met:
1. start offset is on a cluster boundary
2. end offset is not on a cluster boundary
3. (end offset is somewhere in another extent) or
(hole range > MAX_CONTIG_BYTES(1MB)),
we dont COW the first cluster starting at the start offset. But in this
case, we were wrongly passing this cluster to
ocfs2_zero_range_for_truncate() to zero out. This will modify the
cluster in place and zero it in the source too.
Fix this by skipping this cluster in such a scenario.
To reproduce:
1. Create a random file of say 10 MB
xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile
2. Reflink it
reflink -f 10MBfile reflnktest
3. Punch a hole at starting at cluster boundary with range greater that
1MB. You can also use a range that will put the end offset in another
extent.
fallocate -p -o 0 -l 1048615 reflnktest
4. sync
5. Check the first cluster in the source file. (It will be zeroed out).
dd if=10MBfile iflag=direct bs=<cluster size> count=1 | hexdump -C
Link: http://lkml.kernel.org/r/1470957147-14185-1-git-send-email-ashish.samant@oracle.com Signed-off-by: Ashish Samant <ashish.samant@oracle.com> Reported-by: Saar Maoz <saar.maoz@oracle.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Eric Ren <zren@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
Commit ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery")
checks if lockres master has changed to identify whether new master has
finished recovery or not. This will introduce a race that right after
old master does umount ( means master will change), a new convert
request comes.
In this case, it will reset lockres state to DLM_RECOVERING and then
retry convert, and then fail with lockres->l_action being set to
OCFS2_AST_INVALID, which will cause inconsistent lock level between
ocfs2 and dlm, and then finally BUG.
Since dlm recovery will clear lock->convert_pending in
dlm_move_lockres_to_recovery_list, we can use it to correctly identify
the race case between convert and recovery. So fix it.
Fixes: ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery") Link: http://lkml.kernel.org/r/57CE1569.8010704@huawei.com Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
If the subvol/snapshot create/destroy ioctls are passed a regular file
with execute permissions set, we'll eventually Oops while trying to do
inode->i_op->lookup via lookup_one_len.
This patch ensures that the file descriptor refers to a directory.
The function xfs_calc_dquots_per_chunk takes a parameter in units
of basic blocks. The kernel seems to get the units wrong, but
userspace got 'fixed' by commenting out the unnecessary conversion.
Fix both.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
From inspection, the superblock sb_inprogress check is done in the
verifier and triggered only for the primary superblock via a
"bp->b_bn == XFS_SB_DADDR" check.
Unfortunately, the primary superblock is an uncached buffer, and
hence it is configured by xfs_buf_read_uncached() with:
bp->b_bn = XFS_BUF_DADDR_NULL; /* always null for uncached buffers */
And so this check never triggers. Fix it.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
[wt: s/xfs_sb.c/xfs_mount.c in 3.10]
If we hold the superblock lock while calling reiserfs_quota_on_mount(), we can
deadlock our own worker - mount blocks kworker/3:2, sleeps forever more.
Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Mike Galbraith <mgalbraith@suse.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Willy Tarreau <w@1wt.eu>
new_insert_key only makes any sense when it's associated with a
new_insert_ptr, which is initialized to NULL and changed to a
buffer_head when we also initialize new_insert_key. We can key off of
that to avoid the uninitialized warning.
Link: http://lkml.kernel.org/r/5eca5ffb-2155-8df2-b4a2-f162f105efed@suse.com Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jan Kara <jack@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
If the block size or cluster size is insane, reject the mount. This
is important for security reasons (although we shouldn't be just
depending on this check).
Currently when doing a DAX hole punch with ext4 we fail to do a writeback.
This is because the logic around filemap_write_and_wait_range() in
ext4_punch_hole() only looks for dirty page cache pages in the radix tree,
not for dirty DAX exceptional entries.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Willy Tarreau <w@1wt.eu>
Now, ext4_do_update_inode() clears high 16-bit fields of uid/gid
of deleted and evicted inode to fix up interoperability with old
kernels. However, it checks only i_dtime of an inode to determine
whether the inode was deleted and evicted, and this is very risky,
because i_dtime can be used for the pointer maintaining orphan inode
list, too. We need to further check whether the i_dtime is being
used for the orphan inode list even if the i_dtime is not NULL.
We found that high 16-bit fields of uid/gid of inode are unintentionally
and permanently cleared when the inode truncation is just triggered,
but not finished, and the inode metadata, whose high uid/gid bits are
cleared, is written on disk, and the sudden power-off follows that
in order.
This might be unexpected but pages allocated for sbi->s_buddy_cache are
charged to current memory cgroup. So, GFP_NOFS allocation could fail if
current task has been killed by OOM or if current memory cgroup has no
free memory left. Block allocator cannot handle such failures here yet.
We temporally change checksum fields in buffers of some types of
metadata into '0' for verifying the checksum values. By doing this
without locking the buffer, some metadata's checksums, which are
being committed or written back to the storage, could be damaged.
In our test, several metadata blocks were found with damaged metadata
checksum value during recovery process. When we only verify the
checksum value, we have to avoid modifying checksum fields directly.
Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
The arcmsr driver failed to pass SYNCHRONIZE CACHE to controller
firmware. Depending on how drive caches are handled internally by
controller firmware this could potentially lead to data integrity
problems.
Ensure that cache flushes are passed to the controller.
[mkp: applied by hand and removed unused vars]
Signed-off-by: Ching Huang <ching2048@areca.com.tw> Reported-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
We need to put an upper bound on "user_len" so the memcpy() doesn't
overflow.
[js] no ARCMSR_API_DATA_BUFLEN defined, use the number
Reported-by: Marco Grassi <marco.gra@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Willy Tarreau <w@1wt.eu>
This patch fixes one use-after-free report[1] by KASAN.
In __scsi_scan_target(), when a type 31 device is probed,
SCSI_SCAN_TARGET_PRESENT is returned and the target will be scanned
again.
Inside the following scsi_report_lun_scan(), one new scsi_device
instance is allocated, and scsi_probe_and_add_lun() is called again to
probe the target and still see type 31 device, finally
__scsi_remove_device() is called to remove & free the device at the end
of scsi_probe_and_add_lun(), so cause use-after-free in
scsi_report_lun_scan().
And the following SCSI log can be observed:
scsi 0:0:2:0: scsi scan: INQUIRY pass 1 length 36
scsi 0:0:2:0: scsi scan: INQUIRY successful with code 0x0
scsi 0:0:2:0: scsi scan: peripheral device type of 31, no device added
scsi 0:0:2:0: scsi scan: Sending REPORT LUNS to (try 0)
scsi 0:0:2:0: scsi scan: REPORT LUNS successful (try 0) result 0x0
scsi 0:0:2:0: scsi scan: REPORT LUN scan
scsi 0:0:2:0: scsi scan: INQUIRY pass 1 length 36
scsi 0:0:2:0: scsi scan: INQUIRY successful with code 0x0
scsi 0:0:2:0: scsi scan: peripheral device type of 31, no device added
BUG: KASAN: use-after-free in __scsi_scan_target+0xbf8/0xe40 at addr ffff88007b44a104
This patch fixes the issue by moving the putting reference at
the end of scsi_report_lun_scan().
Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
If a VFC port gets unmapped in the VIOS, it may not respond with a CRQ
init complete following H_REG_CRQ. If this occurs, we can end up having
called scsi_block_requests and not a resulting unblock until the init
complete happens, which may never occur, and we end up hanging I/O
requests. This patch ensures the host action stay set to
IBMVFC_HOST_ACTION_TGT_DEL so we move all rports into devloss state and
unblock unless we receive an init complete.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
This patch will fix regression caused by commit 1e793f6fc0db ("scsi:
megaraid_sas: Fix data integrity failure for JBOD (passthrough)
devices").
The problem was that the MEGASAS_IS_LOGICAL macro did not have braces
and as a result the driver ended up exposing a lot of non-existing SCSI
devices (all SCSI commands to channels 1,2,3 were returned as
SUCCESS-DID_OK by driver).
Commit 02b01e010afe ("megaraid_sas: return sync cache call with
success") modified the driver to successfully complete SYNCHRONIZE_CACHE
commands without passing them to the controller. Disk drive caches are
only explicitly managed by controller firmware when operating in RAID
mode. So this commit effectively disabled writeback cache flushing for
any drives used in JBOD mode, leading to data integrity failures.
[mkp: clarified patch description]
Fixes: 02b01e010afeeb49328d35650d70721d2ca3fd59 Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
Problem:
This is a work around for a bug with LSI Fusion MPT SAS2 when
pefroming secure erase. Due to the very long time the operation
takes commands issued during the erase will time out and will trigger
execution of abort hook. Even though the abort hook is called for
the specific command which timed out this leads to entire device halt
(scsi_state terminated) and premature termination of the secured erase.
Fix:
Set device state to busy while erase in progress to reject any incoming
commands until the erase is done. The device is blocked any way during
this time and cannot execute any other command.
More data and logs can be found here -
https://drive.google.com/file/d/0B9ocOHYHbbS1Q3VMdkkzeWFkTjg/view
P.S
This is a backport from the same fix for mpt3sas driver intended
for pre-4.4 stable trees.
Signed-off-by: Andrey Grodzovsky <andrey2805@gmail.com> Cc: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com> Cc: Hannes Reinecke <hare@suse.de> Cc: PDL-MPT-FUSIONLINUX <MPT-FusionLinux.pdl@broadcom.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
mpt3sas has a firmware failure where it can only handle one pass through
ATA command at a time. If another comes in, contrary to the SAT
standard, it will hang until the first one completes (causing long
commands like secure erase to timeout). The original fix was to block
the device when an ATA command came in, but this caused a regression
with
scsi: srp_transport: Move queuecommand() wait code to SCSI core
So fix the original fix of the secure erase timeout by properly
returning SAM_STAT_BUSY like the SAT recommends. The original patch
also had a concurrency problem since scsih_qcmd is lockless at that
point (this is fixed by using atomic bitops to set and test the flag).
[mkp: addressed feedback wrt. test_bit and fixed whitespace]
Fixes: 18f6084a989ba1b (mpt3sas: Fix secure erase premature termination) Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reported-by: Ingo Molnar <mingo@kernel.org> Tested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[wt: adjust context] Signed-off-by: Willy Tarreau <w@1wt.eu>
While issuing any ATA passthrough command to firmware the driver will
block the device. But it will unblock the device only if the I/O
completes through the ISR path. If a controller reset occurs before
command completion the device will remain in blocked state.
Make sure we unblock the device following a controller reset if an ATA
passthrough command was queued.
[mkp: clarified patch description]
Cc: <stable@vger.kernel.org> # v4.4+ Fixes: ac6c2a93bd07 ("mpt3sas: Fix for SATA drive in blocked state, after diag reset") Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[wt: adjust context] Signed-off-by: Willy Tarreau <w@1wt.eu>
This is a work around for a bug with LSI Fusion MPT SAS2 when perfoming
secure erase. Due to the very long time the operation takes, commands
issued during the erase will time out and will trigger execution of the
abort hook. Even though the abort hook is called for the specific
command which timed out, this leads to entire device halt
(scsi_state terminated) and premature termination of the secure erase.
Set device state to busy while ATA passthrough commands are in progress.
[mkp: hand applied to 4.9/scsi-fixes, tweaked patch description]
Signed-off-by: Andrey Grodzovsky <andrey2805@gmail.com> Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com> Cc: <linux-scsi@vger.kernel.org> Cc: Sathya Prakash <sathya.prakash@broadcom.com> Cc: Chaitra P B <chaitra.basappa@broadcom.com> Cc: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com> Cc: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com> Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
We accidentally overwrite the original saved value of "flags" so that we
can't re-enable IRQs at the end of the function. Presumably this
function is mostly called with IRQs disabled or it would be obvious in
testing.
Fixes: aceeffbb59bb ("zfcp: trace full payload of all SAN records (req,resp,iels)") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
This was lost with commit 2c55b750a884b86dea8b4cc5f15e1484cc47a25c
("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
but is necessary for problem determination, e.g. to see the
currently active zone set during automatic port scan.
For the large GPN_FT response (4 pages), save space by not dumping
any empty residual entries.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") Reviewed-by: Alexey Ishchuk <aishchuk@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 2c55b750a884b86dea8b4cc5f15e1484cc47a25c
("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
started to add FC_CT_HDR_LEN which made zfcp dump random data
out of bounds for RSPN GS responses because u.rspn.rsp
is the largest and last field in the union of struct zfcp_fc_req.
Other request/response types only happened to stay within bounds
due to the padding of the union or
due to the trace capping of u.gspn.rsp to ZFCP_DBF_SAN_MAX_PAYLOAD.
Timestamp : ...
Area : SAN
Subarea : 00
Level : 1
Exception : -
CPU id : ..
Caller : ...
Record id : 2
Tag : fsscth2
Request id : 0x...
Destination ID : 0x00fffffc
Payload short : 01000000fc0200008002000000000000
xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx <=== 00000000000000000000000000000000
Payload length : 32 <===
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") Reviewed-by: Alexey Ishchuk <aishchuk@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
With commit 2c55b750a884b86dea8b4cc5f15e1484cc47a25c
("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
we lost the N_Port-ID where an ELS response comes from.
With commit 7c7dc196814b9e1d5cc254dc579a5fa78ae524f7
("[SCSI] zfcp: Simplify handling of ct and els requests")
we lost the N_Port-ID where a CT response comes from.
It's especially useful if the request SAN trace record
with D_ID was already lost due to trace buffer wrap.
GS uses an open WKA port handle and ELS just a D_ID, and
only for ELS we could get D_ID from QTCB bottom via zfcp_fsf_req.
To cover both cases, add a new field to zfcp_fsf_ct_els
and fill it in on request to use in SAN response trace.
Strictly speaking the D_ID on SAN response is the FC frame's S_ID.
We don't need a field for the other end which is always us.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") Fixes: 7c7dc196814b ("[SCSI] zfcp: Simplify handling of ct and els requests") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
This information was lost with
commit a54ca0f62f953898b05549391ac2a8a4dad6482b
("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
but is required to debug e.g. invalid handle situations.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: a54ca0f62f95 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
Since commit a54ca0f62f953898b05549391ac2a8a4dad6482b
("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
HBA records no longer contain WWPN, D_ID, or LUN
to reduce duplicate information which is already in REC records.
In contrast to "regular" target ports, we don't use recovery to open
WKA ports such as directory/nameserver, so we don't get REC records.
Therefore, introduce pseudo REC running records without any
actual recovery action but including D_ID of WKA port on open/close.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: a54ca0f62f95 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: ae0904f60fab ("[SCSI] zfcp: Redesign of the debug tracing for recovery actions.") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
While retaining the actual filtering according to trace level,
the following commits started to write such filtered records
with a hardcoded record level of 1 instead of the actual record level:
commit 250a1352b95e1db3216e5c5d4f4365bea5122f4a
("[SCSI] zfcp: Redesign of the debug tracing for SCSI records.")
commit a54ca0f62f953898b05549391ac2a8a4dad6482b
("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
Now we can distinguish written records again for offline level filtering.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 250a1352b95e ("[SCSI] zfcp: Redesign of the debug tracing for SCSI records.") Fixes: a54ca0f62f95 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
On a successful end of reopen port forced,
zfcp_erp_strategy_followup_success() re-uses the port erp_action
and the subsequent zfcp_erp_action_cleanup() now
sees ZFCP_ERP_SUCCEEDED with
erp_action->action==ZFCP_ERP_ACTION_REOPEN_PORT
instead of ZFCP_ERP_ACTION_REOPEN_PORT_FORCED
but must not perform zfcp_scsi_schedule_rport_register().
We can detect this because the fresh port reopen erp_action
is in its very first step ZFCP_ERP_STEP_UNINITIALIZED.
Otherwise this opens a time window with unblocked rport
(until the followup port reopen recovery would block it again).
If a scsi_cmnd timeout occurs during this time window
fc_timed_out() cannot work as desired and such command
would indeed time out and trigger scsi_eh. This prevents
a clean and timely path failover.
This should not happen if the path issue can be recovered
on FC transport layer such as path issues involving RSCNs.
Also, unnecessary and repeated DID_IMM_RETRY for pending and
undesired new requests occur because internally zfcp still
has its zfcp_port blocked.
As follow-on errors with scsi_eh, it can cause,
in the worst case, permanently lost paths due to one of:
sd <scsidev>: [<scsidisk>] Medium access timeout failure. Offlining disk!
sd <scsidev>: Device offlined - not ready after error recovery
For fix validation and to aid future debugging with other recoveries
we now also trace (un)blocking of rports.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 5767620c383a ("[SCSI] zfcp: Do not unblock rport from REOPEN_PORT_FORCED") Fixes: a2fa0aede07c ("[SCSI] zfcp: Block FC transport rports early on errors") Fixes: 5f852be9e11d ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI") Fixes: 338151e06608 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable") Fixes: 3859f6a248cb ("[PATCH] zfcp: add rports to enable scsi_add_device to work again") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
In the hardware data router case, introduced with kernel 3.2
commit 86a9668a8d29 ("[SCSI] zfcp: support for hardware data router")
the ELS/GS request&response length needs to be initialized
as in the chained SBAL case.
Otherwise, the FCP channel rejects ELS requests with
FSF_REQUEST_SIZE_TOO_LARGE.
Such ELS requests can be issued by user space through BSG / HBA API,
or zfcp itself uses ADISC ELS for remote port link test on RSCN.
The latter can cause a short path outage due to
unnecessary remote target port recovery because the always
failing ADISC cannot detect extremely short path interruptions
beyond the local FCP channel.
Below example is decoded with zfcpdbf from s390-tools:
Timestamp : ...
Area : SAN
Subarea : 00
Level : 1
Exception : -
CPU id : ..
Caller : zfcp_dbf_san_req+0408
Record id : 1
Tag : fssels1
Request id : 0x<reqid>
Destination ID : 0x00<target d_id>
Payload info : 5200000000000000 <our wwpn > [ADISC]
<our wwnn > 00<s_id> 00000000 00000000000000000000000000000000
Timestamp : ...
Area : HBA
Subarea : 00
Level : 1
Exception : -
CPU id : ..
Caller : zfcp_dbf_hba_fsf_res+0740
Record id : 1
Tag : fs_ferr
Request id : 0x<reqid>
Request status : 0x00000010
FSF cmnd : 0x0000000b [FSF_QTCB_SEND_ELS]
FSF sequence no: 0x...
FSF issued : ...
FSF stat : 0x00000061 [FSF_REQUEST_SIZE_TOO_LARGE]
FSF stat qual : 00000000000000000000000000000000
Prot stat : 0x00000100
Prot stat qual : 00000000000000000000000000000000
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 86a9668a8d29 ("[SCSI] zfcp: support for hardware data router") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
For an NPIV-enabled FCP device, zfcp can erroneously show
"NPort (fabric via point-to-point)" instead of "NPIV VPORT"
for the port_type sysfs attribute of the corresponding
fc_host.
s390-tools that can be affected are dbginfo.sh and ziomon.
zfcp_fsf_exchange_config_evaluate() ignores
fsf_qtcb_bottom_config.connection_features indicating NPIV
and only sets fc_host_port_type to FC_PORTTYPE_NPORT if
fsf_qtcb_bottom_config.fc_topology is FSF_TOPO_FABRIC.
Only the independent zfcp_fsf_exchange_port_evaluate()
evaluates connection_features to overwrite fc_host_port_type
to FC_PORTTYPE_NPIV in case of NPIV.
Code was introduced with upstream kernel 2.6.30
commit 0282985da5923fa6365adcc1a1586ae0c13c1617
("[SCSI] zfcp: Report fc_host_port_type as NPIV").
This works during FCP device recovery (such as set online)
because it performs FSF_QTCB_EXCHANGE_CONFIG_DATA followed by
FSF_QTCB_EXCHANGE_PORT_DATA in sequence.
However, the zfcp-specific scsi host sysfs attributes
"requests", "megabytes", or "seconds_active" trigger only
zfcp_fsf_exchange_config_evaluate() resetting fc_host
port_type to FC_PORTTYPE_NPORT despite NPIV.
The zfcp-specific scsi host sysfs attribute "utilization"
triggers only zfcp_fsf_exchange_port_evaluate() correcting
the fc_host port_type again in case of NPIV.
Evaluate fsf_qtcb_bottom_config.connection_features
in zfcp_fsf_exchange_config_evaluate() where it belongs to.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 0282985da592 ("[SCSI] zfcp: Report fc_host_port_type as NPIV") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
Currently kill_fasync() is called outside the stream lock in
snd_pcm_period_elapsed(). This is potentially racy, since the stream
may get released even during the irq handler is running. Although
snd_pcm_release_substream() calls snd_pcm_drop(), this doesn't
guarantee that the irq handler finishes, thus the kill_fasync() call
outside the stream spin lock may be invoked after the substream is
detached, as recently reported by KASAN.
As a quick workaround, move kill_fasync() call inside the stream
lock. The fasync is rarely used interface, so this shouldn't have a
big impact from the performance POV.
Ideally, we should implement some sync mechanism for the proper finish
of stream and irq handler. But this oneliner should suffice for most
cases, so far.
The pointer callbacks of ali5451 driver may return the value at the
boundary occasionally, and it results in the kernel warning like
snd_ali5451 0000:00:06.0: BUG: , pos = 16384, buffer size = 16384, period size = 1024
It seems that folding the position offset is enough for fixing the
warning and no ill-effect has been seen by that.
The problem happens when you call ioctl(SNDRV_TIMER_IOCTL_CONTINUE) on a
completely new/unused timer -- it will have ->sticks == 0, which causes a
divide by 0 in snd_hrtimer_callback().
- ioctl(SNDRV_TIMER_IOCTL_SELECT), which potentially sets
tu->queue/tu->tqueue to NULL on memory allocation failure, so read()
would get a NULL pointer dereference like the above splat
- the same ioctl() can free tu->queue/to->tqueue which means read()
could potentially see (and dereference) the freed pointer
We can fix both by taking the ioctl_lock mutex when dereferencing
->queue/->tqueue, since that's always held over all the ioctl() code.
Just looking at the code I find it likely that there are more problems
here such as tu->qhead pointing outside the buffer if the size is
changed concurrently using SNDRV_TIMER_IOCTL_PARAMS.
When a seq-virmidi driver is initialized, it registers a rawmidi
instance with its callback to create an associated seq kernel client.
Currently it's done throughly in rawmidi's register_mutex context.
Recently it was found that this may lead to a deadlock another rawmidi
device that is being attached with the sequencer is accessed, as both
open with the same register_mutex. This was actually triggered by
syzkaller, as Dmitry Vyukov reported:
======================================================
[ INFO: possible circular locking dependency detected ]
4.8.0-rc1+ #11 Not tainted
-------------------------------------------------------
syz-executor/7154 is trying to acquire lock:
(register_mutex#5){+.+.+.}, at: [<ffffffff84fd6d4b>] snd_rawmidi_kernel_open+0x4b/0x260 sound/core/rawmidi.c:341
but task is already holding lock:
(&grp->list_mutex){++++.+}, at: [<ffffffff850138bb>] check_and_subscribe_port+0x5b/0x5c0 sound/core/seq/seq_ports.c:495
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
The fix is to simply move the registration parts in
snd_rawmidi_dev_register() to the outside of the register_mutex lock.
The lock is needed only to manage the linked list, and it's not
necessarily to cover the whole initialization process.
Some code (all error handling) submits CDBs that are allocated
on the stack. This breaks with CB/CBI code that tries to create
URB directly from SCSI command buffer - which happens to be in
vmalloced memory with vmalloced kernel stacks.
Let's make copy of the command in usb_stor_CB_transport.
Signed-off-by: Petr Vandrovec <petr@vandrovec.name> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
According to Dave Miller "the networking stack has a
hard requirement that all SKBs which are transmitted
must have their completion signalled in a fininte
amount of time. This is because, until the SKB is
freed by the driver, it holds onto socket,
netfilter, and other subsystem resources."
In summary, this means that using TX IRQ throttling
for the networking gadgets is, at least, complex and
we should avoid it for the time being.
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
The current tiocmget implementation would fail to report errors up the
stack and instead leaked a few bits from the stack as a mask of
modem-status flags.
Fixes: 39a66b8d22a3 ("[PATCH] USB: CP2101 Add support for flow control") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
If we don't guarantee that we will always get an
interrupt at least when we're queueing our very last
request, we could fall into situation where we queue
every request with 'no_interrupt' set. This will
cause the link to get stuck.
The behavior above has been triggered with g_ether
and dwc3.
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
This patch fixes a NULL pointer dereference caused by a race codition in
the probe function of the legousbtower driver. It re-structures the
probe function to only register the interface after successfully reading
the board's firmware ID.
The probe function does not deregister the usb interface after an error
receiving the devices firmware ID. The device file registered
(/dev/usb/legousbtower%d) may be read/written globally before the probe
function returns. When tower_delete is called in the probe function
(after an r/w has been initiated), core dev structures are deleted while
the file operation functions are still running. If the 0 address is
mappable on the machine, this vulnerability can be used to create a
Local Priviege Escalation exploit via a write-what-where condition by
remapping dev->interrupt_out_buffer in tower_write. A forged USB device
and local program execution would be required for LPE. The USB device
would have to delay the control message in tower_probe and accept
the control urb in tower_open whilst guest code initiated a write to the
device file as tower_delete is called from the error in tower_probe.
This bug has existed since 2003. Patch tested by emulated device.
Reported-by: James Patrick-Evans <james@jmp-e.com> Tested-by: James Patrick-Evans <james@jmp-e.com> Signed-off-by: James Patrick-Evans <james@jmp-e.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
Some full-speed mceusb infrared transceivers contain invalid endpoint
descriptors for their interrupt endpoints, with bInterval set to 0.
In the past they have worked out okay with the mceusb driver, because
the driver sets the bInterval field in the descriptor to 1,
overwriting whatever value may have been there before. However, this
approach was never sanctioned by the USB core, and in fact it does not
work with xHCI controllers, because they use the bInterval value that
was present when the configuration was installed.
Currently usbcore uses 32 ms as the default interval if the value in
the endpoint descriptor is invalid. It turns out that these IR
transceivers don't work properly unless the interval is set to 10 ms
or below. To work around this mceusb problem, this patch changes the
endpoint-descriptor parsing routine, making the default interval value
be 10 ms rather than 32 ms.
The previous driver is possible to stop the transfer wrongly.
For example:
1) An interrupt happens, but not BRDY interruption.
2) Read INTSTS0. And than state->intsts0 is not set to BRDY.
3) BRDY is set to 1 here.
4) Read BRDYSTS.
5) Clear the BRDYSTS. And then. the BRDY is cleared wrongly.
Remarks:
- The INTSTS0.BRDY is read only.
- If any bits of BRDYSTS are set to 1, the BRDY is set to 1.
- If BRDYSTS is 0, the BRDY is set to 0.
So, this patch adds condition to avoid such situation. (And about
NRDYSTS, this is not used for now. But, avoiding any side effects,
this patch doesn't touch it.)
udriver struct allocated by kzalloc() will not be freed
if usb_register() and next calls fail. This patch fixes this
by adding one more step with kfree(udriver) in error path.
Signed-off-by: Alexey Klimov <klimov.linux@gmail.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Willy Tarreau <w@1wt.eu>