The rtsx_usb_sdmmc driver may bail out in its ->set_ios() callback when no
SD card is inserted. This is wrong, as it could cause the device to remain
runtime resumed when it's unused. Fix this behaviour.
Tested-by: Ritesh Raj Sarraf <rrs@researchut.com> Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Potentially overflowing expression 1000000 * data->timeout_clks with
type unsigned int is evaluated using 32-bit arithmetic, and then used
in a context that expects an expression of type unsigned long long.
To avoid overflow, cast 1000000U to type unsigned long long.
Special thanks to Coverity.
Fixes: 7f05538af71c ("mmc: sdhci: fix data timeout (part 2)") Signed-off-by: Haibo Chen <haibo.chen@nxp.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
We accidentally overwrite the original saved value of "flags" so that we
can't re-enable IRQs at the end of the function. Presumably this
function is mostly called with IRQs disabled or it would be obvious in
testing.
Fixes: aceeffbb59bb ("zfcp: trace full payload of all SAN records (req,resp,iels)") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The goal of the patch is to fix this scenario:
ip link add dummy1 type dummy
ip link set dummy1 up
ip link set lo down ; ip link set lo up
After that sequence, the local route to the link layer address of dummy1 is
not there anymore.
When the loopback is set down, all local routes are deleted by
addrconf_ifdown()/rt6_ifdown(). At this time, the rt6_info entry still
exists, because the corresponding idev has a reference on it. After the rcu
grace period, dst_rcu_free() is called, and thus ___dst_free(), which will
set obsolete to DST_OBSOLETE_DEAD.
In this case, init_loopback() is called before dst_rcu_free(), thus
obsolete is still sets to something <= 0. So, the function doesn't add the
route again. To avoid that race, let's check the rt6 refcnt instead.
Fixes: 25fb6ca4ed9c ("net IPv6 : Fix broken IPv6 routing table after loopback down-up") Fixes: a881ae1f625c ("ipv6: don't call addrconf_dst_alloc again when enable lo") Fixes: 33d99113b110 ("ipv6: reallocate addrconf router for ipv6 address when lo device up") Reported-by: Francesco Santoro <francesco.santoro@6wind.com> Reported-by: Samuel Gauthier <samuel.gauthier@6wind.com> CC: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com> CC: Maruthi Thotad <Maruthi.Thotad@ap.sony.com> CC: Sabrina Dubroca <sd@queasysnail.net> CC: Hannes Frederic Sowa <hannes@stressinduktion.org> CC: Weilong Chen <chenweilong@huawei.com> CC: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In mac80211, multicast A-MSDUs are accepted in many cases that
they shouldn't be accepted in:
* drop A-MSDUs with a multicast A1 (RA), as required by the
spec in 9.11 (802.11-2012 version)
* drop A-MSDUs with a 4-addr header, since the fourth address
can't actually be useful for them; unless 4-address frame
format is actually requested, even though the fourth address
is still not useful in this case, but ignored
Accepting the first case, in particular, is very problematic
since it allows anyone else with possession of a GTK to send
unicast frames encapsulated in a multicast A-MSDU, even when
the AP has client isolation enabled.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This patch fixes one use-after-free report[1] by KASAN.
In __scsi_scan_target(), when a type 31 device is probed,
SCSI_SCAN_TARGET_PRESENT is returned and the target will be scanned
again.
Inside the following scsi_report_lun_scan(), one new scsi_device
instance is allocated, and scsi_probe_and_add_lun() is called again to
probe the target and still see type 31 device, finally
__scsi_remove_device() is called to remove & free the device at the end
of scsi_probe_and_add_lun(), so cause use-after-free in
scsi_report_lun_scan().
And the following SCSI log can be observed:
scsi 0:0:2:0: scsi scan: INQUIRY pass 1 length 36
scsi 0:0:2:0: scsi scan: INQUIRY successful with code 0x0
scsi 0:0:2:0: scsi scan: peripheral device type of 31, no device added
scsi 0:0:2:0: scsi scan: Sending REPORT LUNS to (try 0)
scsi 0:0:2:0: scsi scan: REPORT LUNS successful (try 0) result 0x0
scsi 0:0:2:0: scsi scan: REPORT LUN scan
scsi 0:0:2:0: scsi scan: INQUIRY pass 1 length 36
scsi 0:0:2:0: scsi scan: INQUIRY successful with code 0x0
scsi 0:0:2:0: scsi scan: peripheral device type of 31, no device added
BUG: KASAN: use-after-free in __scsi_scan_target+0xbf8/0xe40 at addr ffff88007b44a104
This patch fixes the issue by moving the putting reference at
the end of scsi_report_lun_scan().
Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Mismatching stream names in DAPM route and widget definitions are
causing compilation errors. Fixing these names allows the cs4270
driver to compile and function.
[Errors must be at probe time not compile time -- broonie]
Signed-off-by: Murray Foster <mrafoster@gmail.com> Acked-by: Paul Handrigan <Paul.Handrigan@cirrus.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Commit f68381a70bb2 (mmc: block: fix packed command header endianness)
correctly fixed endianness handling of packed_cmd_hdr in
mmc_blk_packed_hdr_wrq_prep.
But now, sparse complains about incorrect types:
drivers/mmc/card/block.c:1613:27: sparse: incorrect type in assignment (different base types)
drivers/mmc/card/block.c:1613:27: expected unsigned int [unsigned] [usertype] <noident>
drivers/mmc/card/block.c:1613:27: got restricted __le32 [usertype] <noident>
...
So annotate cmd_hdr properly using __le32 to make everyone happy.
Change thaw_super() to check frozen != SB_FREEZE_COMPLETE rather than
frozen == SB_UNFROZEN, otherwise it can race with freeze_super() which
drops sb->s_umount after SB_FREEZE_WRITE to preserve the lock ordering.
In this case thaw_super() will wrongly call s_op->unfreeze_fs() before
it was actually frozen, and call sb_freeze_unlock() which leads to the
unbalanced percpu_up_write(). Unfortunately lockdep can't detect this,
so this triggers misc BUG_ON()'s in kernel/rcu/sync.c.
Reported-and-tested-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cleanup some missing mem frees on some cifs ioctls, and
clarify others to make more obvious that no data is returned.
Signed-off-by: Steve French <smfrench@gmail.com> Acked-by: Sachin Prabhu <sprabhu@redhat.com>
[bwh: Backported to 3.16:
- Drop changes to smb2_duplicate_extents(), smb3_set_integrity()
- Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[CIFS] We had cases where we sent a SMB2/SMB3 setinfo request with all
timestamp (and DOS attribute) fields marked as 0 (ie do not change)
e.g. on chmod or chown.
Signed-off-by: Steve French <steve.french@primarydata.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Remove the global file_list_lock to simplify cifs/smb3 locking and
have spinlocks that more closely match the information they are
protecting.
Add new tcon->open_file_lock and file->file_info_lock spinlocks.
Locks continue to follow a heirachy,
cifs_socket --> cifs_ses --> cifs_tcon --> cifs_file
where global tcp_ses_lock still protects socket and cifs_ses, while the
the newer locks protect the lower level structure's information
(tcon and cifs_file respectively).
Signed-off-by: Steve French <steve.french@primarydata.com> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Germano Percossi <germano.percossi@citrix.com>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
GUIDs although random, and 16 bytes, need to be generated as
proper uuids.
Signed-off-by: Steve French <steve.french@primarydata.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Reported-by: David Goebels <davidgoe@microsoft.com>
[bwh: Backported to 3.16: drop changes to create_durable_v2_buf()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Steve French <steve.french@primarydata.com> Reported-by: David Goebel <davidgoe@microsoft.com>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The kernel client requests 2 credits for many operations even though
they only use 1 credit (presumably to build up a buffer of credit).
Some servers seem to give the client as much credit as is requested. In
this case, the amount of credit the client has continues increasing to
the point where (server->credits * MAX_BUFFER_SIZE) overflows in
smb2_wait_mtu_credits().
Fix this by throttling the credit requests if an set limit is reached.
For async requests where the credit charge may be > 1, request as much
credit as what is charged.
The limit is chosen somewhat arbitrarily. The Windows client
defaults to 128 credits, the Windows server allows clients up to
512 credits (or 8192 for Windows 2016), and the NetApp server
(and at least one other) does not limit clients at all.
Choose a high enough value such that the client shouldn't limit
performance.
This behavior was seen with a NetApp filer (NetApp Release 9.0RC2).
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Steve French <smfrench@gmail.com>
[bwh: Backported to 3.16: drop changes to smb2_async_{read,write}v()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In debugging smb3, it is useful to display the number
of credits available, so we can see when the server has not granted
sufficient operations for the client to make progress, or alternatively
the client has requested too many credits (as we saw in a recent bug)
so we can compare with the number of credits the server thinks
we have.
Add a /proc/fs/cifs/DebugData line to display the client view
on how many credits are available.
Signed-off-by: Steve French <steve.french@primarydata.com> Reported-by: Germano Percossi <germano.percossi@citrix.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Currently regs_return_value always negates reg[2] if it determines
the syscall has failed, but when called in kernel context this check is
invalid and may result in returning a wrong value.
This fixes errors reported by CONFIG_KPROBES_SANITY_TEST
Fixes: d7e7528bcd45 ("Audit: push audit success and retcode into arch ptrace.h") Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14381/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a
race:
sem_lock has a fast path that allows parallel simple operations.
There are two reasons why a simple operation cannot run in parallel:
- a non-simple operations is ongoing (sma->sem_perm.lock held)
- a complex operation is sleeping (sma->complex_count != 0)
As both facts are stored independently, a thread can bypass the current
checks by sleeping in the right positions. See below for more details
(or kernel bugzilla 105651).
The patch fixes that by creating one variable (complex_mode)
that tracks both reasons why parallel operations are not possible.
The patch also updates stale documentation regarding the locking.
With regards to stable kernels:
The patch is required for all kernels that include the
commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") (3.10?)
The alternative is to revert the patch that introduced the race.
The patch is safe for backporting, i.e. it makes no assumptions
about memory barriers in spin_unlock_wait().
Background:
Here is the race of the current implementation:
Thread A: (simple op)
- does the first "sma->complex_count == 0" test
Thread B: (complex op)
- does sem_lock(): This includes an array scan. But the scan can't
find Thread A, because Thread A does not own sem->lock yet.
- the thread does the operation, increases complex_count,
drops sem_lock, sleeps
Thread A:
- spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock)
- sleeps before the complex_count test
Thread C: (complex op)
- does sem_lock (no array scan, complex_count==1)
- wakes up Thread B.
- decrements complex_count
Thread A:
- does the complex_count test
Bug:
Now both thread A and thread C operate on the same array, without
any synchronization.
Fixes: 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") Link: http://lkml.kernel.org/r/1469123695-5661-1-git-send-email-manfred@colorfullife.com Reported-by: <felixh@informatik.uni-bremen.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: <1vier1@web.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16:
- We missed out on some earlier memory barrier changes
- Use set_mb instead of smp_store_mb] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
CPUs without single-byte and double-byte loads and stores place some
"interesting" requirements on concurrent code. For example (adapted
from Peter Hurley's test code), suppose we have the following structure:
Of course, it is common (and good) practice to place data protected
by different locks in separate cache lines. However, if the locks are
rarely acquired (for example, only in rare error cases), and there are
a great many instances of the data structure, then memory footprint can
trump false-sharing concerns, so that it can be better to place them in
the same cache cache line as above.
But if the CPU does not support single-byte loads and stores, a store
to foop->a will do a non-atomic read-modify-write operation on foop->b,
which will come as a nasty surprise to someone holding foop->lock2. So we
now require CPUs to support single-byte and double-byte loads and stores.
Therefore, this commit adjusts the definition of __native_word() to allow
these sizes to be used by smp_load_acquire() and smp_store_release().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Daniel Walker reported problems which happens when
crash_kexec_post_notifiers kernel option is enabled
(https://lkml.org/lkml/2015/6/24/44).
In that case, smp_send_stop() is called before entering kdump routines
which assume other CPUs are still online. As the result, kdump
routines fail to save other CPUs' registers. Additionally for MIPS
OCTEON, it misses to stop the watchdog timer.
To fix this problem, call a new kdump friendly function,
crash_smp_send_stop(), instead of the smp_send_stop() when
crash_kexec_post_notifiers is enabled. crash_smp_send_stop() is a
weak function, and it just call smp_send_stop(). Architecture
codes should override it so that kdump can work appropriately.
This patch provides MIPS version.
Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option) Link: http://lkml.kernel.org/r/20160810080950.11028.28000.stgit@sysi4-13.yrl.intra.hitachi.co.jp Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Reported-by: Daniel Walker <dwalker@fifo99.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Daniel Walker <dwalker@fifo99.com> Cc: Xunlei Pang <xpang@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Daney <david.daney@cavium.com> Cc: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: "Steven J. Hill" <steven.hill@cavium.com> Cc: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Daniel Walker reported problems which happens when
crash_kexec_post_notifiers kernel option is enabled
(https://lkml.org/lkml/2015/6/24/44).
In that case, smp_send_stop() is called before entering kdump routines
which assume other CPUs are still online. As the result, for x86, kdump
routines fail to save other CPUs' registers and disable virtualization
extensions.
To fix this problem, call a new kdump friendly function,
crash_smp_send_stop(), instead of the smp_send_stop() when
crash_kexec_post_notifiers is enabled. crash_smp_send_stop() is a weak
function, and it just call smp_send_stop(). Architecture codes should
override it so that kdump can work appropriately. This patch only
provides x86-specific version.
For Xen's PV kernel, just keep the current behavior.
NOTES:
- Right solution would be to place crash_smp_send_stop() before
__crash_kexec() invocation in all cases and remove smp_send_stop(), but
we can't do that until all architectures implement own
crash_smp_send_stop()
- crash_smp_send_stop()-like work is still needed by
machine_crash_shutdown() because crash_kexec() can be called without
entering panic()
Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option) Link: http://lkml.kernel.org/r/20160810080948.11028.15344.stgit@sysi4-13.yrl.intra.hitachi.co.jp Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Reported-by: Daniel Walker <dwalker@fifo99.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Daniel Walker <dwalker@fifo99.com> Cc: Xunlei Pang <xpang@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Daney <david.daney@cavium.com> Cc: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: "Steven J. Hill" <steven.hill@cavium.com> Cc: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Debugging a data corruption issue with virtio-net/vhost-net led to
the observation that __copy_tofrom_user was occasionally returning
a value 16 larger than it should. Since the return value from
__copy_tofrom_user is the number of bytes not copied, this means
that __copy_tofrom_user can occasionally return a value larger
than the number of bytes it was asked to copy. In turn this can
cause higher-level copy functions such as copy_page_to_iter_iovec
to corrupt memory by copying data into the wrong memory locations.
It turns out that the failing case involves a fault on the store
at label 79, and at that point the first unmodified byte of the
destination is at R3 + 16. Consequently the exception handler
for that store needs to add 16 to R3 before using it to work out
how many bytes were not copied, but in this one case it was not
adding the offset to R3. To fix it, this moves the label 179 to
the point where we add 16 to R3. I have checked manually all the
exception handlers for the loads and stores in this code and the
rest of them are correct (it would be excellent to have an
automated test of all the exception cases).
This bug has been present since this code was initially
committed in May 2002 to Linux version 2.5.20.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This commit fixes a stack corruption in the pseries specific code dealing
with the huge pages.
In __pSeries_lpar_hugepage_invalidate() the buffer used to pass arguments
to the hypervisor is not large enough. This leads to a stack corruption
where a previously saved register could be corrupted leading to unexpected
result in the caller, like the following panic:
Most of the time, the bug is surfacing in a caller up in the stack from
__pSeries_lpar_hugepage_invalidate() which is quite confusing.
This bug is pending since v3.11 but was hidden if a caller of the
caller of __pSeries_lpar_hugepage_invalidate() has pushed the corruped
register (r18 in this case) in the stack and is not using it until
restoring it. GCC 6.2.0 seems to raise it more frequently.
This commit also change the definition of the parameter buffer in
pSeries_lpar_flush_hash_range() to rely on the global define
PLPAR_HCALL9_BUFSIZE (no functional change here).
The filesystem used in this case is ubifs, but it can be triggered on
many other filesystems.
When preadv64() is called with offset=0xffffffff000, a page with
index=0xffffffff will be added to the radix tree of ->mapping. Then
this page can be found in ->mapping with pagevec_lookup(). After that,
truncate_inode_pages_range(), which is called in ftruncate(), will fall
into an infinite loop:
- find a page with index=0xffffffff, since index>=end, this page won't
be truncated
- index++, and index become 0
- the page with index=0xffffffff will be found again
The data type of index is unsigned long, so index won't overflow to 0 on
64 bits architecture in this case, and the dead loop won't happen.
Since truncate_inode_pages_range() is executed with holding lock of
inode->i_rwsem, any operation related with this lock will be blocked,
and a hung task will happen, e.g.:
INFO: task truncate_test:3364 blocked for more than 120 seconds.
...
call_rwsem_down_write_failed+0x17/0x30
generic_file_write_iter+0x32/0x1c0
ubifs_write_iter+0xcc/0x170
__vfs_write+0xc4/0x120
vfs_write+0xb2/0x1b0
SyS_write+0x46/0xa0
The page with index=0xffffffff added to ->mapping is useless. Fix this
by checking the read position before allocating pages.
Link: http://lkml.kernel.org/r/1475151010-40166-1-git-send-email-fangwei1@huawei.com Signed-off-by: Wei Fang <fangwei1@huawei.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In dissolve_free_huge_pages(), free hugepages will be dissolved without
making sure that there are enough of them left to satisfy hugepage
reservations.
Fix this by adding a return value to dissolve_free_huge_pages() and
checking h->free_huge_pages vs. h->resv_huge_pages. Note that this may
lead to the situation where dissolve_free_huge_page() returns an error
and all free hugepages that were dissolved before that error are lost,
while the memory block still cannot be set offline.
Fixes: c8721bbb ("mm: memory-hotplug: enable memory hotplug to handle hugepage") Link: http://lkml.kernel.org/r/20160926172811.94033-3-gerald.schaefer@de.ibm.com Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Rui Teng <rui.teng@linux.vnet.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Patch series "mm/hugetlb: memory offline issues with hugepages", v4.
This addresses several issues with hugepages and memory offline. While
the first patch fixes a panic, and is therefore rather important, the
last patch is just a performance optimization.
The second patch fixes a theoretical issue with reserved hugepages,
while still leaving some ugly usability issue, see description.
This patch (of 3):
dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
list corruption and addressing exception when trying to set a memory
block offline that is part (but not the first part) of a "gigantic"
hugetlb page with a size > memory block size.
When no other smaller hugetlb page sizes are present, the VM_BUG_ON()
will trigger directly. In the other case we will run into an addressing
exception later, because dissolve_free_huge_page() will not work on the
head page of the compound hugetlb page which will result in a NULL
hstate from page_hstate().
To fix this, first remove the VM_BUG_ON() because it is wrong, and then
use the compound head page in dissolve_free_huge_page(). This means
that an unused pre-allocated gigantic page that has any part of itself
inside the memory block that is going offline will be dissolved
completely. Losing an unused gigantic hugepage is preferable to failing
the memory offline, for example in the situation where a (possibly
faulty) memory DIMM needs to go offline.
Fixes: c8721bbb ("mm: memory-hotplug: enable memory hotplug to handle hugepage") Link: http://lkml.kernel.org/r/20160926172811.94033-2-gerald.schaefer@de.ibm.com Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Rui Teng <rui.teng@linux.vnet.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Another Lifebook machine that needs the same quirk as other similar
models to make the driver working.
Also let's reorder elantech_dmi_force_crc_enabled list so LIfebook enries
are in alphabetical order.
Reported-by: William Linna <william.linna@gmail.com> Tested-by: William Linna <william.linna@gmail.com> Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Avoid that mapping an sg-list in which the first element has a
non-zero offset triggers an infinite loop when using FMR. This
patch makes the FMR mapping code similar to that of ib_sg_to_pages().
Note: older Mellanox HCAs do not support non-zero offsets for FMR.
See also commit 8c4037b501ac ("IB/srp: always avoid non-zero offsets
into an FMR").
Reported-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The definition of atomic_dec_if_positive() assumes that
atomic_sub_if_positive() exists, which is only the case if
metag specific atomics are used. This results in the following
build error when trying to build metag1_defconfig.
kernel/ucount.c: In function 'dec_ucount':
kernel/ucount.c:211: error:
implicit declaration of function 'atomic_sub_if_positive'
Moving the definition of atomic_dec_if_positive() into the metag
conditional code fixes the problem.
Fixes: 6006c0d8ce94 ("metag: Atomics, locks and bitops") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Since linux-3.15, netlink_dump() can use up to 16384 bytes skb
allocations.
Due to struct skb_shared_info ~320 bytes overhead, we end up using
order-3 (on x86) page allocations, that might trigger direct reclaim and
add stress.
The intent was really to attempt a large allocation but immediately
fallback to a smaller one (order-1 on x86) in case of memory stress.
On recent kernels (linux-4.4), we can remove __GFP_DIRECT_RECLAIM to
meet the goal. Old kernels would need to remove __GFP_WAIT
While we are at it, since we do an order-3 allocation, allow to use
all the allocated bytes instead of 16384 to reduce syscalls during
large dumps.
Commit 22f2ac51b6d6 ("mm: workingset: fix crash in shadow node shrinker
caused by replace_page_cache_page()") switched replace_page_cache() from
raw radix tree operations to page_cache_tree_insert() but didn't take
into account that the latter function, unlike the raw radix tree op,
handles mapping->nrpages. As a result, that counter is bumped for each
page replacement rather than balanced out even.
The mapping->nrpages counter is used to skip needless radix tree walks
when invalidating, truncating, syncing inodes without pages, as well as
statistics for userspace. Since the error is positive, we'll do more
page cache tree walks than necessary; we won't miss a necessary one.
And we'll report more buffer pages to userspace than there are. The
error is limited to fuse inodes.
Fixes: 22f2ac51b6d6 ("mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Member "status" of struct usb_sg_request is managed by usb core. A
spin lock is used to serialize the change of it. The driver could
check the value of req->status, but should avoid changing it without
the hold of the spinlock. Otherwise, it could cause race or error
in usb core.
This patch could be backported to stable kernels with version later
than v3.14.
Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Roger Tseng <rogerable@realtek.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
set_bit() and clear_bit() take the bit number so this code is really
doing "1 << (1 << irq)" which is a double shift bug. It's done
consistently so it won't cause a problem unless "irq" is more than 4.
Fixes: 70c6cce04066 ('mfd: Support 88pm80x in 80x driver') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
drivers/built-in.o: In function `wm8350_i2c_probe':
core.c:(.text+0x828b0): undefined reference to `__devm_regmap_init_i2c'
Makefile:953: recipe for target 'vmlinux' failed
Fixes: 52b461b86a9f ("mfd: Add regmap cache support for wm8350") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Although rare, it's possible to hit PCI error early on device
probe, meaning possibly some structs are not entirely initialized,
and some might even be completely uninitialized, leading to NULL
pointer dereference.
The i40e driver currently presents a "bad" behavior if device hits
such early PCI error: firstly, the struct i40e_pf might not be
attached to pci_dev yet, leading to a NULL pointer dereference on
access to pf->state.
Even checking if the struct is NULL and avoiding the access in that
case isn't enough, since the driver cannot recover from PCI error
that early; in our experiments we saw multiple failures on kernel
log, like:
[549.664] i40e 0007:01:00.1: Initial pf_reset failed: -15
[549.664] i40e: probe of 0007:01:00.1 failed with error -15
[...]
[871.644] i40e 0007:01:00.1: The driver for the device stopped because the
device firmware failed to init. Try updating your NVM image.
[871.644] i40e: probe of 0007:01:00.1 failed with error -32
[...]
[872.516] i40e 0007:01:00.0: ARQ: Unknown event 0x0000 ignored
Between the first probe failure (error -15) and the second (error -32)
another PCI error happened due to the first bad probe. Also, driver
started to flood console with those ARQ event messages.
This patch will prevent these issues by allowing error recovery
mechanism to remove the failed device from the system instead of
trying to recover from early PCI errors during device probe.
Signed-off-by: Guilherme G Piccoli <gpiccoli@linux.vnet.ibm.com> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Usually Fastmap is free to consider every PEB in one of the pools
as newer than the existing PEB. Since PEBs in a pool are by definition
newer than everything else.
But update_vol() missed the case that a pool can contain more than
one candidate.
Fixes: dbb7d2a88d ("UBI: Add fastmap core") Signed-off-by: Richard Weinberger <richard@nod.at> Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Richard Weinberger <richard@nod.at>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When writing a new Fastmap the first thing that happens
is refilling the pools in memory.
At this stage it is possible that new PEBs from the new pools
get already claimed and written with data.
If this happens before the new Fastmap data structure hits the
flash and we face power cut the freshly written PEB will not
scanned and unnoticed.
Solve the issue by locking the pools until Fastmap is written.
Fixes: dbb7d2a88d ("UBI: Add fastmap core") Signed-off-by: Richard Weinberger <richard@nod.at>
[bwh: Backported to 3.16:
- Adjust filename, context, indentation
- s/ubi->fm_eba_sem/ubi->fm_sem/] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When Fastmap is used we can face here an -EBADMSG
since Fastmap cannot know about unmaps.
If the erasure was interrupted the PEB may show ECC
errors and UBI would go to ro-mode as it assumes
that the PEB was check during attach time, which is
not the case with Fastmap.
Fixes: dbb7d2a88d ("UBI: Add fastmap core") Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
scan_pool() does not mark the PEB for scrubing when bitflips are
detected in the EC header of a free PEB (VID header region left to
0xff).
Make sure we scrub the PEB in this case.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Fixes: dbb7d2a88d2a ("UBI: Add fastmap core") Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The cipher block size for GCM is 16 bytes, and thus the CTR transform
used in crypto_gcm_setkey() will also expect a 16-byte IV. However,
the code currently reserves only 8 bytes for the IV, causing
an out-of-bounds access in the CTR transform. This patch fixes
the issue by setting the size of the IV buffer to 16 bytes.
Fixes: 84c911523020 ("[CRYPTO] gcm: Add support for async ciphers") Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fuse allowed VFS to set mode in setattr in order to clear suid/sgid on
chown and truncate, and (since writeback_cache) write. The problem with
this is that it'll potentially restore a stale mode.
The poper fix would be to let the filesystems do the suid/sgid clearing on
the relevant operations. Possibly some are already doing it but there's no
way we can detect this.
So fix this by refreshing and recalculating the mode. Do this only if
ATTR_KILL_S[UG]ID is set to not destroy performance for writes. This is
still racy but the size of the window is reduced.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Without "default_permissions" the userspace filesystem's lookup operation
needs to perform the check for search permission on the directory.
If directory does not allow search for everyone (this is quite rare) then
userspace filesystem has to set entry timeout to zero to make sure
permissions are always performed.
Changing the mode bits of the directory should also invalidate the
(previously cached) dentry to make sure the next lookup will have a chance
of updating the timeout, if needed.
Reported-by: Jean-Pierre André <jean-pierre.andre@wanadoo.fr> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
con3270 contains an optimisation that reduces the amount of data to be
transmitted to the 3270 terminal by putting a Repeat to Address (RA)
order into the data stream. The RA order itself takes up space, so
con3270 only uses it if there's enough space left in the line
buffer. Otherwise it just pads out the line manually.
For lines that were _just_ short enough that the RA order still fit in
the line buffer, the line was instead padded with an insufficient
amount of spaces. This was caused by examining the size of the
allocated line buffer rather than the length of the string to be
displayed.
For con3270_cline_end(), we just compare against the line length. For
con3270_update_string() however that isn't available anymore, so we
check whether the Repeat to Address order is present.
Fixes: f51320a5 ("[PATCH] s390: new 3270 driver.") (tglx/history.git) Tested-by: Jing Liu <liujbjl@linux.vnet.ibm.com> Tested-by: Yang Chen <bjcyang@linux.vnet.ibm.com> Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
con3270 contains an optimisation that reduces the amount of data to be
transmitted to the 3270 terminal by putting a Repeat to Address (RA)
order into the data stream. The RA order itself takes up space, so
con3270 only uses it if there's enough space left in the line
buffer. Otherwise it just pads out the line manually.
For lines too long to include the RA order, one byte was left
uninitialised. This was caused by an off-by-one bug in the loop that
pads out the line. Since the buffer is allocated from a common pool,
the single byte left uninitialised contained some previous buffer
content. Usually this was just a space or some character (which can
result in clutter but is otherwise harmless). Sometimes, however, it
was a Repeat to Address order, messing up the entire screen layout and
causing the display to send the entire buffer content on every
keystroke.
Fixes: f51320a5 ("[PATCH] s390: new 3270 driver.") (tglx/history.git) Reported-by: Liu Jing <liujbjl@linux.vnet.ibm.com> Tested-by: Jing Liu <liujbjl@linux.vnet.ibm.com> Tested-by: Yang Chen <bjcyang@linux.vnet.ibm.com> Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
__kernel_get_syscall_map() and __kernel_clock_getres() use cmpli to
check if the passed in pointer is non zero. cmpli maps to a 32 bit
compare on binutils, so we ignore the top 32 bits.
A simple test case can be created by passing in a bogus pointer with
the bottom 32 bits clear. Using a clk_id that is handled by the VDSO,
then one that is handled by the kernel shows the problem:
The bigger issue is if we pass a valid pointer with the bottom 32 bits
clear, in this case we will return success but won't write any data
to the pointer.
I stumbled across this issue because the LLVM integrated assembler
doesn't accept cmpli with 3 arguments. Fix this by converting them to
cmpldi.
Fixes: a7f290dad32e ("[PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel") Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Used the wrong index to setup the phase shedding mask.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[bwh: Backported to 3.16: adjust context, indentation] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Turns out it was totally wrong. The memory is supposed to be bound to
the kref, as the original code was doing correctly, not the
device/driver binding as the devm_kzalloc() would cause.
This fixes an oops when read would be called after the device was
unbound from the driver.
Reported-by: Ladislav Michl <ladis@linux-mips.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If the file permissions change on the server, then we may not be able to
recover open state. If so, we need to ensure that we mark the file
descriptor appropriately.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The MMCR2 register is available twice, one time with number 785
(privileged access), and one time with number 769 (unprivileged,
but it can be disabled completely). In former times, the Linux
kernel was using the unprivileged register 769 only, but since
commit 8dd75ccb571f3c92c ("powerpc: Use privileged SPR number
for MMCR2"), it uses the privileged register 785 instead.
The KVM-PR code then of course also switched to use the SPR 785,
but this is causing older guest kernels to crash, since these
kernels still access 769 instead. So to support older kernels
with KVM-PR again, we have to support register 769 in KVM-PR, too.
Fixes: 8dd75ccb571f3c92c48014b3dabd3d51a115ab41 Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
POWER8 has one virtual timebase (VTB) register per subcore, not one
per CPU thread. The HV KVM code currently treats VTB as a per-thread
register, which can lead to spurious soft lockup messages from guests
which use the VTB as the time source for the soft lockup detector.
(CPUs before POWER8 did not have the VTB register.)
For HV KVM, this fixes the problem by making only the primary thread
in each virtual core save and restore the VTB value. With this,
the VTB state becomes part of the kvmppc_vcore structure. This
also means that "piggybacking" of multiple virtual cores onto one
subcore is not possible on POWER8, because then the virtual cores
would share a single VTB register.
PR KVM emulates a VTB register, which is per-vcpu because PR KVM
has no notion of CPU threads or SMT. For PR KVM we move the VTB
state into the kvmppc_vcpu_book3s struct.
Reported-by: Thomas Huth <thuth@redhat.com> Tested-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
[bwh: Backported to 3.16:
- Adjust filenames, context
- Drop changes to kvmppc_core_emulate_mfspr_pr(), can_piggyback_subcore(),
kvmppc_copy_from_svcpu()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
CMD23 aka SET_BLOCK_COUNT was introduced with MMC v3.1.
Older versions of the specification allowed to terminate
multi-block transfers only with CMD12.
The patch fixes the following problem:
mmc0: new MMC card at address 0001
mmcblk0: mmc0:0001 SDMB-16 15.3 MiB
mmcblk0: timed out sending SET_BLOCK_COUNT command, card status 0x400900
...
blk_update_request: I/O error, dev mmcblk0, sector 0
Buffer I/O error on dev mmcblk0, logical block 0, async page read
mmcblk0: unable to read partition table
Signed-off-by: Daniel Glöckner <dg@emlix.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
We have two new Dell laptop models, they have the same ALC255 pin
definition, but not in the pin quirk table yet, as a result, the
headset microphone can't work. After adding the definition in the
table, the headset microphone works well.
Signed-off-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
http://www.ti.com/lit/pdf/SWCZ010:
DCDC o/p voltage can go higher than programmed value
Impact:
VDDI, VDD2, and VIO output programmed voltage level can go higher than
expected or crash, when coming out of PFM to PWM mode or using DVFS.
Description:
When DCDC CLK SYNC bits are 11/01:
* VIO 3-MHz oscillator is the source clock of the digital core and input
clock of VDD1 and VDD2
* Turn-on of VDD1 and VDD2 HSD PFETis synchronized or at a constant
phase shift
* Current pulled though VCC1+VCC2 is Iload(VDD1) + Iload(VDD2)
* The 3 HSD PFET will be turned-on at the same time, causing the highest
possible switching noise on the application. This noise level depends
on the layout, the VBAT level, and the load current. The noise level
increases with improper layout.
When DCDC CLK SYNC bits are 00:
* VIO 3-MHz oscillator is the source clock of digital core
* VDD1 and VDD2 are running on their own 3-MHz oscillator
* Current pulled though VCC1+VCC2 average of Iload(VDD1) + Iload(VDD2)
* The switching noise of the 3 SMPS will be randomly spread over time,
causing lower overall switching noise.
Workaround:
Set DCDCCTRL_REG[1:0]= 00.
Signed-off-by: Jan Remmet <j.remmet@phytec.de> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Avoid that sparse complains about blkg_hint manipulations.
Fixes: a637120e4902 ("blkcg: use radix tree to index blkgs from blkcg") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
[bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The ctxt's count field is overloaded to mean the number of pages in
the ctxt->page array and the number of SGEs in the ctxt->sge array.
Typically these two numbers are the same.
However, when an inline RPC reply is constructed from an xdr_buf
with a tail iovec, the head and tail often occupy the same page,
but each are DMA mapped independently. In that case, ->count equals
the number of pages, but it does not equal the number of SGEs.
There's one more SGE, for the tail iovec. Hence there is one more
DMA mapping than there are pages in the ctxt->page array.
This isn't a real problem until the server's iommu is enabled. Then
each RPC reply that has content in that iovec orphans a DMA mapping
that consists of real resources.
krb5i and krb5p always populate that tail iovec. After a couple
million sent krb5i/p RPC replies, the NFS server starts behaving
erratically. Reboot is needed to clear the problem.
Fixes: 9d11b51ce7c1 ("svcrdma: Fix send_reply() scatter/gather set-up") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[bwh: Backported to 3.16:
- Adjust context
- Drop changes to svc_rdma_bc_sendto()
- s/xprt->sc_pd->local_dma_lkey/xprt->sc_dma_lkey/ Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fetch value and validate u32 netlink attribute. This validation is
usually required when the u32 netlink attributes are being stored in a
field whose size is smaller.
This patch revisits 4da449ae1df9 ("netfilter: nft_exthdr: Add size check
on u8 nft_exthdr attributes").
Fixes: 96518518cc41 ("netfilter: add nftables") Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fix the direct assignment of offset and length attributes included in
nft_exthdr structure from u32 data to u8.
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The DragonFly quirk added in 42e3121d90f4 ("ALSA: usb-audio: Add a more
accurate volume quirk for AudioQuest DragonFly") applies a custom dB map
on the volume control when its range is reported as 0..50 (0 .. 0.2dB).
However, there exists at least one other variant (hw v1.0c, as opposed
to the tested v1.2) which reports a different non-sensical volume range
(0..53) and the custom map is therefore not applied for that device.
This results in all of the volume change appearing close to 100% on
mixer UIs that utilize the dB TLV information.
Add a fallback case where no dB TLV is reported at all if the control
range is not 0..50 but still 0..N where N <= 1000 (3.9 dB). Also
restrict the quirk to only apply to the volume control as there is also
a mute control which would match the check otherwise.
Fixes: 42e3121d90f4 ("ALSA: usb-audio: Add a more accurate volume quirk for AudioQuest DragonFly") Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi> Reported-by: David W <regulars@d-dub.org.uk> Tested-by: David W <regulars@d-dub.org.uk> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
eeh_pe_bus_get() can return NULL if a PCI bus isn't found for a given PE.
Some callers don't check this, and can cause a null pointer dereference
under certain circumstances.
Fix this by checking NULL everywhere eeh_pe_bus_get() is called.
Fixes: 8a6b1bc70dbb ("powerpc/eeh: EEH core to handle special event") Signed-off-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When switching from polling-based fw commands to event-based fw
commands, there is a race condition which could cause a fw command
in another task to hang: that task will keep waiting for the polling
sempahore, but may never be able to acquire it. This is due to
mlx4_cmd_use_events, which "down"s the sempahore back to 0.
During driver initialization, this is not a problem, since no other
tasks which invoke FW commands are active.
However, there is a problem if the driver switches to polling mode
and then back to event mode during normal operation.
The "test_interrupts" feature does exactly that.
Running "ethtool -t <eth device> offline" causes the PF driver to
temporarily switch to polling mode, and then back to event mode.
(Note that for VF drivers, such switching is not performed).
Fix this by adding a read-write semaphore for protection when
switching between modes.
Fixes: 225c7b1feef1 ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: adjust context, indentation] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Use tabs instead of spaces before if statement, no functional change.
Fixes: e7c1c2c46201 ("mlx4_en: Added self diagnostics test implementation") Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This patch fixes a NULL pointer dereference caused by a race codition in
the probe function of the legousbtower driver. It re-structures the
probe function to only register the interface after successfully reading
the board's firmware ID.
The probe function does not deregister the usb interface after an error
receiving the devices firmware ID. The device file registered
(/dev/usb/legousbtower%d) may be read/written globally before the probe
function returns. When tower_delete is called in the probe function
(after an r/w has been initiated), core dev structures are deleted while
the file operation functions are still running. If the 0 address is
mappable on the machine, this vulnerability can be used to create a
Local Priviege Escalation exploit via a write-what-where condition by
remapping dev->interrupt_out_buffer in tower_write. A forged USB device
and local program execution would be required for LPE. The USB device
would have to delay the control message in tower_probe and accept
the control urb in tower_open whilst guest code initiated a write to the
device file as tower_delete is called from the error in tower_probe.
This bug has existed since 2003. Patch tested by emulated device.
Reported-by: James Patrick-Evans <james@jmp-e.com> Tested-by: James Patrick-Evans <james@jmp-e.com> Signed-off-by: James Patrick-Evans <james@jmp-e.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The pointer callbacks of ali5451 driver may return the value at the
boundary occasionally, and it results in the kernel warning like
snd_ali5451 0000:00:06.0: BUG: , pos = 16384, buffer size = 16384, period size = 1024
It seems that folding the position offset is enough for fixing the
warning and no ill-effect has been seen by that.
When we merge two contiguous partitions whose signatures are marked
NVRAM_SIG_FREE, We need update prev's length and checksum, then write it
to nvram, not cur's. So lets fix this mistake now.
Also use memset instead of strncpy to set the partition's name. It's
more readable if we want to fill up with duplicate chars .
Fixes: fa2b4e54d41f ("powerpc/nvram: Improve partition removal") Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If a VFC port gets unmapped in the VIOS, it may not respond with a CRQ
init complete following H_REG_CRQ. If this occurs, we can end up having
called scsi_block_requests and not a resulting unblock until the init
complete happens, which may never occur, and we end up hanging I/O
requests. This patch ensures the host action stay set to
IBMVFC_HOST_ACTION_TGT_DEL so we move all rports into devloss state and
unblock unless we receive an init complete.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: afe4fd062416 ("pkt_sched: fq: Fair Queue packet scheduler") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: keep using ktime_to_ns(ktime_get())] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Once a chunk is enqueued successfully, sctp queues can take care of it.
Even if it is failed to transmit (like because of nomem), it should be
put into retransmit queue.
If sctp report this error to users, it confuses them, they may resend
that msg, but actually in kernel sctp stack is in charge of retransmit
it already.
Besides, this error probably is not from the failure of transmitting
current msg, but transmitting or retransmitting another msg's chunks,
as sctp_outq_flush just tries to send out all transports' chunks.
This patch is to make sctp_cmd_send_msg return avoid, and not return the
transmit err back to sctp_sendmsg
Fixes: 8b570dc9f7b6 ("sctp: only drop the reference on the datamsg after sending a msg") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: no gfp flags parameter] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If we hold the superblock lock while calling reiserfs_quota_on_mount(), we can
deadlock our own worker - mount blocks kworker/3:2, sleeps forever more.
Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Mike Galbraith <mgalbraith@suse.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The pinctrl pull up/down register on exynos4210 is 2-bit wide for each
pin and it accepts only values of 0, 1 and 3. The pins sd4-bus-width8
were configured with value of 4. The driver does not validate the value
so this overflow effectively set a bit 1 in adjacent pins thus
configuring them to pull down.
The author's intention was probably to set drive strength of 4x. All
other bus-widths pins are configured with pull up and drive strength of
4x. Fix this one with same pattern.
Fixes: 87711d8c7c70 ("ARM: dts: Add pinctrl node entries for SAMSUNG EXYNOS4210 SoC") Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Reviewed-by: Javier Martinez Canillas <javier@osg.samsung.com> Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Acked-by: Linus Walleij <linus.walleij@linaro.org>
[bwh: Backported to 3.16: use literal constant] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
On faulting sigreturn we do get SIGSEGV, all right, but anything
we'd put into pt_regs could end up in the coredump. And since
__copy_from_user() never zeroed on arc, we'd better bugger off
on its failure without copying random uninitialized bits of
kernel stack into pt_regs...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.16:
- Adjust context
- Don't change the single return statement] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Pages clear buffers after ext4 delayed block allocation failed,
However, it does not clean its pte_dirty flag.
if the pages unmap ,in cording to the pte_dirty ,
unmap_page_range may try to call __set_page_dirty,
which may lead to the bugon at
mpage_prepare_extent_to_map:head = page_buffers(page);.
This patch just call clear_page_dirty_for_io to clean pte_dirty
at mpage_release_unused_pages for pages mmaped.
Steps to reproduce the bug:
(1) mmap a file in ext4
addr = (char *)mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED,
fd, 0);
memset(addr, 'i', 4096);
If pg_init_retries is set and a request is queued against a multipath
device with all underlying block device request_queues in the "dying"
state then an infinite loop is triggered because activate_path() never
succeeds and hence never calls pg_init_done().
This change avoids that device removal triggers an infinite loop by
failing the activate_path() which causes the "dying" path to be failed.
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This avoids that new requests are queued while __dm_destroy() is in
progress.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
[js: use md->queue instead of non-present helper] Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
oss.sgi.com is going away, move contact details over to vger.
Signed-off-by: Dave Chinner <david@fromorbit.com>
[bwh: Backported to 3.16: Also update the git URL, done upstream in commit 9f273c24ec5f "MAINTAINERS: add/fix git URLs for various subsystems"] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The musb driver calls into this phy driver to disable/enable squelch
detection. This function was introduced in 24fe86a617c5 ("phy: sun4i-usb:
Add a sunxi specific function for setting squelch-detect"). This
function in turn calls sun4i_usb_phy_write, which uses a mutex to
guard the common access register. Unfortunately musb does this
in atomic context, which results in the following warning with lock
debugging enabled:
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:97
in_atomic(): 1, irqs_disabled(): 128, pid: 96, name: kworker/0:2
CPU: 0 PID: 96 Comm: kworker/0:2 Not tainted 4.8.0-rc4-00181-gd502f8ad1c3e #13
Hardware name: Allwinner sun8i Family
Workqueue: events musb_deassert_reset
[<c010bc01>] (unwind_backtrace) from [<c0109237>] (show_stack+0xb/0xc)
[<c0109237>] (show_stack) from [<c02a669b>] (dump_stack+0x67/0x74)
[<c02a669b>] (dump_stack) from [<c05d68c9>] (mutex_lock+0x15/0x2c)
[<c05d68c9>] (mutex_lock) from [<c02c3589>] (sun4i_usb_phy_write+0x39/0xec)
[<c02c3589>] (sun4i_usb_phy_write) from [<c03e6327>] (musb_port_reset+0xfb/0x184)
[<c03e6327>] (musb_port_reset) from [<c03e4917>] (musb_deassert_reset+0x1f/0x2c)
[<c03e4917>] (musb_deassert_reset) from [<c012ecb5>] (process_one_work+0x129/0x2b8)
[<c012ecb5>] (process_one_work) from [<c012f5e3>] (worker_thread+0xf3/0x424)
[<c012f5e3>] (worker_thread) from [<c0132dbd>] (kthread+0xa1/0xb8)
[<c0132dbd>] (kthread) from [<c0105f31>] (ret_from_fork+0x11/0x20)
Since the register access is mmio, we can use a spinlock to guard this
specific access, rather than the mutex that guards the entire phy.
Fixes: ba4bdc9e1dc0 ("PHY: sunxi: Add driver for sunxi usb phy") Cc: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Chen-Yu Tsai <wens@csie.org> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The commit 9bf448c66d4b ("ARM: pxa: use generic gpio operation instead of
gpio register") from Oct 17, 2011, leads to the following static checker
warning:
arch/arm/mach-pxa/spitz_pm.c:172 spitz_charger_wakeup()
warn: double left shift '!gpio_get_value(SPITZ_GPIO_KEY_INT)
<< (1 << ((SPITZ_GPIO_KEY_INT) & 31))'
As Dan reported, the value is shifted three times :
- once by gpio_get_value(), which returns either 0 or BIT(gpio)
- once by the shift operation '<<'
- a last time by GPIO_bit(gpio) which is BIT(gpio)
Therefore the calculation lead to a chained or operator of :
- (1 << gpio) << (1 << gpio) = (2^gpio)^gpio = 2 ^ (gpio * gpio)
It is be sheer luck the former statement works, only because each gpio
used is strictly smaller than 6, and therefore 2^(gpio^2) never
overflows a 32 bits value, and because it is used as a boolean value to
check a gpio activation.
As the xxx_charger_wakeup() functions are used as a true/false detection
mechanism, take that opportunity to change their prototypes from integer
return value to boolean one.
Fixes: 9bf448c66d4b ("ARM: pxa: use generic gpio operation instead of
gpio register") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
[bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In commit f02db315b8d8 ("ipv4: IP_TOS and IP_TTL can be specified as
ancillary data") Francesco added IP_TOS values specified as integer.
However, kernel sends to userspace (at recvmsg() time) an IP_TOS value
in a single byte, when IP_RECVTOS is set on the socket.
It can be very useful to reflect all ancillary options as given by the
kernel in a subsequent sendmsg(), instead of aborting the sendmsg() with
EINVAL after Francesco patch.
So this patch extends IP_TOS ancillary to accept an u8, so that an UDP
server can simply reuse same ancillary block without having to mangle
it.
Jesper can then augment
https://github.com/netoptimizer/network-testing/blob/master/src/udp_example02.c
to add TOS reflection ;)
Fixes: f02db315b8d8 ("ipv4: IP_TOS and IP_TTL can be specified as ancillary data") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Francesco Fusco <ffusco@redhat.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The ramoops buffer may be mapped as either I/O memory or uncached
memory. On ARM64, this results in a device-type (strongly-ordered)
mapping. Since unnaligned accesses to device-type memory will
generate an alignment fault (regardless of whether or not strict
alignment checking is enabled), it is not safe to use memcpy().
memcpy_fromio() is guaranteed to only use aligned accesses, so use
that instead.
persistent_ram_update uses vmap / iomap based on whether the buffer is in
memory region or reserved region. However, both map it as non-cacheable
memory. For armv8 specifically, non-cacheable mapping requests use a
memory type that has to be accessed aligned to the request size. memcpy()
doesn't guarantee that.
I have here a FPGA behind PCIe which exports SRAM which I use for
pstore. Now it seems that the FPGA no longer supports cmpxchg based
updates and writes back 0xff…ff and returns the same. This leads to
crash during crash rendering pstore useless.
Since I doubt that there is much benefit from using cmpxchg() here, I am
dropping this atomic access and use the spinlock based version.
Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Kees Cook <keescook@chromium.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Rabin Vincent <rabinv@axis.com> Tested-by: Rabin Vincent <rabinv@axis.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Guenter Roeck <linux@roeck-us.net>
[kees: remove "_locked" suffix since it's the only option now] Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Now, ext4_do_update_inode() clears high 16-bit fields of uid/gid
of deleted and evicted inode to fix up interoperability with old
kernels. However, it checks only i_dtime of an inode to determine
whether the inode was deleted and evicted, and this is very risky,
because i_dtime can be used for the pointer maintaining orphan inode
list, too. We need to further check whether the i_dtime is being
used for the orphan inode list even if the i_dtime is not NULL.
We found that high 16-bit fields of uid/gid of inode are unintentionally
and permanently cleared when the inode truncation is just triggered,
but not finished, and the inode metadata, whose high uid/gid bits are
cleared, is written on disk, and the sudden power-off follows that
in order.
The cx231xx_set_agc_analog_digital_mux_select() callers
expect it to return 0 or an error. Returning a positive value
makes the first attempt to switch between analog/digital to fail.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
With the current settings, only one channel locks properly.
That's likely because, when this driver was written, Brazil
were still using experimental transmissions.
Change it to reproduce the settings used by the newer drivers.
That makes it lock on other channels.
Tested with both PixelView SBTVD Hybrid (cx231xx-based) and
C3Tech Digital Duo HDTV/SDTV (em28xx-based) devices.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
On this frontend, it takes a while to start output normal
TS data. That only happens on state S9. On S8, the TS output
is enabled, but it is not reliable enough.
However, the zigzag loop is too fast to let it sync.
As, on practical tests, the zigzag software loop doesn't
seem to be helping, but just slowing down the tuning, let's
switch to hardware algorithm, as the tuners used on such
devices are capable of work with frequency drifts without
any help from software.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Exported pwm channels aren't removed before the pwmchip and are
leaked. This results in invalid sysfs files. This fix removes
all exported pwm channels before chip removal.
Signed-off-by: David Hsu <davidhsu@google.com> Fixes: 76abbdde2d95 ("pwm: Add sysfs interface") Signed-off-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>