If __key_link_begin() failed then "edit" would be uninitialized. I've
added a check to fix that.
This allows a random user to crash the kernel, though it's quite
difficult to achieve. There are three ways it can be done as the user
would have to cause an error to occur in __key_link():
(1) Cause the kernel to run out of memory. In practice, this is difficult
to achieve without ENOMEM cropping up elsewhere and aborting the
attempt.
(2) Revoke the destination keyring between the keyring ID being looked up
and it being tested for revocation. In practice, this is difficult to
time correctly because the KEYCTL_REJECT function can only be used
from the request-key upcall process. Further, users can only make use
of what's in /sbin/request-key.conf, though this does including a
rejection debugging test - which means that the destination keyring
has to be the caller's session keyring in practice.
(3) Have just enough key quota available to create a key, a new session
keyring for the upcall and a link in the session keyring, but not then
sufficient quota to create a link in the nominated destination keyring
so that it fails with EDQUOT.
The bug can be triggered using option (3) above using something like the
following:
echo 80 >/proc/sys/kernel/keys/root_maxbytes
keyctl request2 user debug:fred negate @t
The above sets the quota to something much lower (80) to make the bug
easier to trigger, but this is dependent on the system. Note also that
the name of the keyring created contains a random number that may be
between 1 and 10 characters in size, so may throw the test off by
changing the amount of quota used.
Assuming the failure occurs, something like the following will be seen:
Several Lenovo users have reported problems with their Sierra
Wireless EM7455 modem. The driver has loaded successfully and
the MBIM management channel has appeared to work, including
establishing a connection to the mobile network. But no frames
have been received over the data interface.
The problem affects all EM7455 and MC7455, and is assumed to
affect other modems based on the same Qualcomm chipset and
baseband firmware.
Testing narrowed the problem down to what seems to be a
firmware timing bug during initialization. Adding a short sleep
while probing is sufficient to make the problem disappear.
Experiments have shown that 1-2 ms is too little to have any
effect, while 10-20 ms is enough to reliably succeed.
Reported-by: Stefan Armbruster <ml001@armbruster-it.de> Reported-by: Ralph Plawetzki <ralph@purejava.org> Reported-by: Andreas Fett <andreas.fett@secunet.com> Reported-by: Rasmus Lerdorf <rasmus@lerdorf.com> Reported-by: Samo Ratnik <samo.ratnik@gmail.com> Reported-and-tested-by: Aleksander Morgado <aleksander@aleksander.es> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Copy __kvm_mips_vcpu_run() into unmapped memory, so that we can never
get a TLB refill exception in it when KVM is built as a module.
This was observed to happen with the host MIPS kernel running under
QEMU, due to a not entirely transparent optimisation in the QEMU TLB
handling where TLB entries replaced with TLBWR are copied to a separate
part of the TLB array. Code in those pages continue to be executable,
but those mappings persist only until the next ASID switch, even if they
are marked global.
An ASID switch happens in __kvm_mips_vcpu_run() at exception level after
switching to the guest exception base. Subsequent TLB mapped kernel
instructions just prior to switching to the guest trigger a TLB refill
exception, which enters the guest exception handlers without updating
EPC. This appears as a guest triggered TLB refill on a host kernel
mapped (host KSeg2) address, which is not handled correctly as user
(guest) mode accesses to kernel (host) segments always generate address
error exceptions.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: kvm@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: <stable@vger.kernel.org> # 3.10.x- Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[james.hogan@imgtec.com: backported for stable 3.14] Signed-off-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sending SI_TKILL from rt_[tg]sigqueueinfo was deprecated, so now we issue
a warning on the first attempt of doing it. We use WARN_ON_ONCE, which is
not informative and, what is worse, taints the kernel, making the trinity
syscall fuzzer complain false-positively from time to time.
It does not look like we need this warning at all, because the behaviour
changed quite a long time ago (2.6.39), and if an application relies on
the old API, it gets EPERM anyway and can issue a warning by itself.
So let us zap the warning in kernel.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Richard Weinberger <richard@nod.at> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use set_posix_acl, which includes proper permission checks, instead of
calling ->set_acl directly. Without this anyone may be able to grant
themselves permissions to a file by setting the ACL.
Lock the inode to make the new checks atomic with respect to set_acl.
(Also, nfsd was the only caller of set_acl not locking the inode, so I
suspect this may fix other races.)
This also simplifies the code, and ensures our ACLs are checked by
posix_acl_valid.
The permission checks and the inode locking were lost with commit 4ac7249e, which changed nfsd to use the set_acl inode operation directly
instead of going through xattr handlers.
Reported-by: David Sinquin <david@sinquin.eu>
[agreunba@redhat.com: use set_posix_acl] Fixes: 4ac7249e Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Factor out part of posix_acl_xattr_set into a common function that takes
a posix_acl, which nfsd can also call.
The prototype already exists in include/linux/posix_acl.h.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
During page migrations UBIFS might get confused
and the following assert triggers:
[ 213.480000] UBIFS assert failed in ubifs_set_page_dirty at 1451 (pid 436)
[ 213.490000] CPU: 0 PID: 436 Comm: drm-stress-test Not tainted 4.4.4-00176-geaa802524636-dirty #1008
[ 213.490000] Hardware name: Allwinner sun4i/sun5i Families
[ 213.490000] [<c0015e70>] (unwind_backtrace) from [<c0012cdc>] (show_stack+0x10/0x14)
[ 213.490000] [<c0012cdc>] (show_stack) from [<c02ad834>] (dump_stack+0x8c/0xa0)
[ 213.490000] [<c02ad834>] (dump_stack) from [<c0236ee8>] (ubifs_set_page_dirty+0x44/0x50)
[ 213.490000] [<c0236ee8>] (ubifs_set_page_dirty) from [<c00fa0bc>] (try_to_unmap_one+0x10c/0x3a8)
[ 213.490000] [<c00fa0bc>] (try_to_unmap_one) from [<c00fadb4>] (rmap_walk+0xb4/0x290)
[ 213.490000] [<c00fadb4>] (rmap_walk) from [<c00fb1bc>] (try_to_unmap+0x64/0x80)
[ 213.490000] [<c00fb1bc>] (try_to_unmap) from [<c010dc28>] (migrate_pages+0x328/0x7a0)
[ 213.490000] [<c010dc28>] (migrate_pages) from [<c00d0cb0>] (alloc_contig_range+0x168/0x2f4)
[ 213.490000] [<c00d0cb0>] (alloc_contig_range) from [<c010ec00>] (cma_alloc+0x170/0x2c0)
[ 213.490000] [<c010ec00>] (cma_alloc) from [<c001a958>] (__alloc_from_contiguous+0x38/0xd8)
[ 213.490000] [<c001a958>] (__alloc_from_contiguous) from [<c001ad44>] (__dma_alloc+0x23c/0x274)
[ 213.490000] [<c001ad44>] (__dma_alloc) from [<c001ae08>] (arm_dma_alloc+0x54/0x5c)
[ 213.490000] [<c001ae08>] (arm_dma_alloc) from [<c035cecc>] (drm_gem_cma_create+0xb8/0xf0)
[ 213.490000] [<c035cecc>] (drm_gem_cma_create) from [<c035cf20>] (drm_gem_cma_create_with_handle+0x1c/0xe8)
[ 213.490000] [<c035cf20>] (drm_gem_cma_create_with_handle) from [<c035d088>] (drm_gem_cma_dumb_create+0x3c/0x48)
[ 213.490000] [<c035d088>] (drm_gem_cma_dumb_create) from [<c0341ed8>] (drm_ioctl+0x12c/0x444)
[ 213.490000] [<c0341ed8>] (drm_ioctl) from [<c0121adc>] (do_vfs_ioctl+0x3f4/0x614)
[ 213.490000] [<c0121adc>] (do_vfs_ioctl) from [<c0121d30>] (SyS_ioctl+0x34/0x5c)
[ 213.490000] [<c0121d30>] (SyS_ioctl) from [<c000f2c0>] (ret_fast_syscall+0x0/0x34)
UBIFS is using PagePrivate() which can have different meanings across
filesystems. Therefore the generic page migration code cannot handle this
case correctly.
We have to implement our own migration function which basically does a
plain copy but also duplicates the page private flag.
UBIFS is not a block device filesystem and cannot use buffer_migrate_page().
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
[rw: Massaged changelog, build fixes, etc...] Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In a subsequent patch, pmd_mknotpresent will clear the valid bit of the
pmd entry, resulting in a not-present entry from the hardware's
perspective. Unfortunately, pmd_present simply checks for a non-zero pmd
value and will therefore continue to return true even after a
pmd_mknotpresent operation. Since pmd_mknotpresent is only used for
managing huge entries, this is only an issue for the 3-level case.
This patch fixes the 3-level pmd_present implementation to take into
account the valid bit. For bisectability, the change is made before the
fix to pmd_mknotpresent.
Fixes: 8d9625070073 ("ARM: mm: Transparent huge page support for LPAE systems.") Cc: Russell King <linux@armlinux.org.uk> Cc: Steve Capper <Steve.Capper@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Olga Kornievskaia reports that the following test fails to trigger
an OPEN_DOWNGRADE on the wire, and only triggers the final CLOSE.
fd0 = open(foo, RDRW) -- should be open on the wire for "both"
fd1 = open(foo, RDONLY) -- should be open on the wire for "read"
close(fd0) -- should trigger an open_downgrade
read(fd1)
close(fd1)
The issue is that we're missing a check for whether or not the current
state transitioned from an O_RDWR state as opposed to having transitioned
from a combination of O_RDONLY and O_WRONLY.
Reported-by: Olga Kornievskaia <aglo@umich.edu> Fixes: cd9288ffaea4 ("NFSv4: Fix another bug in the close/open_downgrade code") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In "NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code"
unconditional d_drop() after the ->open_context() had been removed. It had
been correct for success cases (there ->open_context() itself had been doing
dcache manipulations), but not for error ones. Only one of those (ENOENT)
got a compensatory d_drop() added in that commit, but in fact it should've
been done for all errors. As it is, the case of O_CREAT non-exclusive open
on a hashed negative dentry racing with e.g. symlink creation from another
client ended up with ->open_context() getting an error and proceeding to
call nfs_lookup(). On a hashed dentry, which would've instantly triggered
BUG_ON() in d_materialise_unique() (or, these days, its equivalent in
d_splice_alias()).
Tested-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix boot crash that triggers if this driver is built into a kernel and
run on non-AMD systems.
AMD northbridges users call amd_cache_northbridges() and it returns
a negative value to signal that we weren't able to cache/detect any
northbridges on the system.
At least, it should do so as all its callers expect it to do so. But it
does return a negative value only when kmalloc() fails.
Fix it to return -ENODEV if there are no NBs cached as otherwise, amd_nb
users like amd64_edac, for example, which relies on it to know whether
it should load or not, gets loaded on systems like Intel Xeons where it
shouldn't.
Fix kprobe_fault_handler() to clear the TF (trap flag) bit of
the flags register in the case of a fault fixup on single-stepping.
If we put a kprobe on the instruction which caused a
page fault (e.g. actual mov instructions in copy_user_*),
that fault happens on the single-stepping buffer. In this
case, kprobes resets running instance so that the CPU can
retry execution on the original ip address.
However, current code forgets to reset the TF bit. Since this
fault happens with TF bit set for enabling single-stepping,
when it retries, it causes a debug exception and kprobes
can not handle it because it already reset itself.
On the most of x86-64 platform, it can be easily reproduced
by using kprobe tracer. E.g.
# cd /sys/kernel/debug/tracing
# echo p copy_user_enhanced_fast_string+5 > kprobe_events
# echo 1 > events/kprobes/enable
And you'll see a kernel panic on do_debug(), since the debug
trap is not handled by kprobes.
To fix this problem, we just need to clear the TF bit when
resetting running kprobe.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: systemtap@sourceware.org Link: http://lkml.kernel.org/r/20160611140648.25885.37482.stgit@devbox
[ Updated the comments. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For newer versions of Syslinux, we need ldlinux.c32 in addition to
isolinux.bin to reside on the boot disk, so if the latter is found,
copy it, too, to the isoimage tree.
Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The HOSTPC extension registers found in some EHCI implementations form
a variable-length array, with one element for each port. Therefore
the hostpc field in struct ehci_regs should be declared as a
zero-length array, not a single-element array.
Userspace can quite legitimately perform an exec() syscall with a
suspended transaction. exec() does not return to the old process, rather
it load a new one and starts that, the expectation therefore is that the
new process starts not in a transaction. Currently exec() is not treated
any differently to any other syscall which creates problems.
Firstly it could allow a new process to start with a suspended
transaction for a binary that no longer exists. This means that the
checkpointed state won't be valid and if the suspended transaction were
ever to be resumed and subsequently aborted (a possibility which is
exceedingly likely as exec()ing will likely doom the transaction) the
new process will jump to invalid state.
Secondly the incorrect attempt to keep the transactional state while
still zeroing state for the new process creates at least two TM Bad
Things. The first triggers on the rfid to return to userspace as
start_thread() has given the new process a 'clean' MSR but the suspend
will still be set in the hardware MSR. The second TM Bad Thing triggers
in __switch_to() as the processor is still transactionally suspended but
__switch_to() wants to zero the TM sprs for the new process.
This is an example of the outcome of calling exec() with a suspended
transaction. Note the first 700 is likely the first TM bad thing
decsribed earlier only the kernel can't report it as we've loaded
userspace registers. c000000000009980 is the rfid in
fast_exception_return()
In commit 8445a87f7092 "powerpc/iommu: Remove the dependency on EEH
struct in DDW mechanism", the PE address was replaced with the PCI
config address in order to remove dependency on EEH. According to PAPR
spec, firmware (pHyp or QEMU) should accept "xxBBSSxx" format PCI config
address, not "xxxxBBSS" provided by the patch. Note that "BB" is PCI bus
number and "SS" is the combination of slot and function number.
This fixes the PCI address passed to DDW RTAS calls.
Fixes: 8445a87f7092 ("powerpc/iommu: Remove the dependency on EEH struct in DDW mechanism") Reported-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Tested-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 39baadbf36ce ("powerpc/eeh: Remove eeh information from pci_dn")
changed the pci_dn struct by removing its EEH-related members.
As part of this clean-up, DDW mechanism was modified to read the device
configuration address from eeh_dev struct.
As a consequence, now if we disable EEH mechanism on kernel command-line
for example, the DDW mechanism will fail, generating a kernel oops by
dereferencing a NULL pointer (which turns to be the eeh_dev pointer).
This patch just changes the configuration address calculation on DDW
functions to a manual calculation based on pci_dn members instead of
using eeh_dev-based address.
No functional changes were made. This was tested on pSeries, both
in PHyp and qemu guest.
Fixes: 39baadbf36ce ("powerpc/eeh: Remove eeh information from pci_dn") Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When this code was reworked for IBoE support the order of assignments
for the sl_tclass_flowlabel got flipped around resulting in
TClass & FlowLabel being permanently set to 0 in the packet headers.
This breaks IB routers that rely on these headers, but only affects
kernel users - libmlx4 does this properly for user space.
Fixes: fa417f7b520e ("IB/mlx4: Add support for IBoE") Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A wmediumd that does not send this attribute causes a NULL pointer
dereference, as the attribute is accessed even if it does not exist.
The attribute was required but never checked ever since userspace frame
forwarding has been introduced. The issue gets more problematic once we
allow wmediumd registration from user namespaces.
Fixes: 7882513bacb1 ("mac80211_hwsim driver support userspace frame tx/rx") Signed-off-by: Martin Willi <martin@strongswan.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix both of these by unconditionally flushing paths before destroying
the sta, and by adding a synchronize_net() after path flush to ensure
no active readers can still dereference the sta.
Reported-by: Fred Veldini <fred.veldini@gmail.com> Tested-by: Fred Veldini <fred.veldini@gmail.com> Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
shared_fifo endpoints would only get a previous tx state cleared
out, the rx state was only cleared for non shared_fifo endpoints
Change this so that the rx state is cleared for all endpoints.
This addresses an issue that resulted in rx packets being dropped
silently.
Signed-off-by: Andrew Goodbody <andrew.goodbody@cambrionix.com> Signed-off-by: Bin Liu <b-liu@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ensure that the endpoint is stopped by clearing REQPKT before
clearing DATAERR_NAKTIMEOUT before rotating the queue on the
dedicated bulk endpoint.
This addresses an issue where a race could result in the endpoint
receiving data before it was reprogrammed resulting in a warning
about such data from musb_rx_reinit before it was thrown away.
The data thrown away was a valid packet that had been correctly
ACKed which meant the host and device got out of sync.
Signed-off-by: Andrew Goodbody <andrew.goodbody@cambrionix.com> Signed-off-by: Bin Liu <b-liu@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Note: This is a verified backported patch for stable 4.4 kernel, and it
could also be applied to 4.3/4.2/4.1/3.18/3.16
There is a problem with alx devices, that the network link will be
lost in 1-5 minutes after the device is up.
>From debugging without datasheet, we found the error always
happen when the DMA RX address is set to 0x....fc0, which is very
likely to be a HW/silicon problem.
This patch will apply rx skb with 64 bytes longer space, and if the
allocated skb has a 0x...fc0 address, it will use skb_resever(skb, 64)
to advance the address, so that the RX overflow can be avoided.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=70761 Signed-off-by: Feng Tang <feng.tang@intel.com> Suggested-by: Eric Dumazet <edumazet@google.com> Tested-by: Ole Lukoie <olelukoie@mail.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes wrong-interface signaling on 32-bit platforms for entries
created when jiffies > 2^31 + MFC_ASSERT_THRESH.
Signed-off-by: Tom Goff <thomas.goff@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since 32b8a8e59c9c ("sit: add IPv4 over IPv4 support")
ipip6_err() may be called for packets whose IP protocol is
IPPROTO_IPIP as well as those whose IP protocol is IPPROTO_IPV6.
In the case of IPPROTO_IPIP packets the correct protocol value is not
passed to ipv4_update_pmtu() or ipv4_redirect().
This patch resolves this problem by using the IP protocol of the packet
rather than a hard-coded value. This appears to be consistent
with the usage of the protocol of a packet by icmp_socket_deliver()
the caller of ipip6_err().
I was able to exercise the redirect case by using a setup where an ICMP
redirect was received for the destination of the encapsulated packet.
However, it appears that although incorrect the protocol field is not used
in this case and thus no problem manifests. On inspection it does not
appear that a problem will manifest in the fragmentation needed/update pmtu
case either.
In short I believe this is a cosmetic fix. None the less, the use of
IPPROTO_IPV6 seems wrong and confusing.
Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The hash buffer is really HASH_BLOCK_SIZE bytes, someone
must have thought that memmove takes n*u32 words by mistake.
Tests work as good/bad as before after this patch.
Cc: Joakim Bech <joakim.bech@linaro.org> Reported-by: David Binderman <linuxdev.baldrick@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This partially reverts commit 1086bbe97a07 ("netfilter: ensure number of
counters is >0 in do_replace()") in net/bridge/netfilter/ebtables.c.
Setting rules with ebtables does not work any more with 1086bbe97a07 place.
There is an error message and no rules set in the end.
e.g.
~# ebtables -t nat -A POSTROUTING --src 12:34:56:78:9a:bc -j DROP
Unable to update the kernel. Two possible causes:
1. Multiple ebtables programs were executing simultaneously. The ebtables
userspace tool doesn't by default support multiple ebtables programs
running
Reverting the ebtables part of 1086bbe97a07 makes this work again.
Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This looks like refactoring, but its also a bug fix.
Problem is that the compat path (32bit iptables, 64bit kernel) lacks a few
sanity tests that are done in the normal path.
For example, we do not check for underflows and the base chain policies.
While its possible to also add such checks to the compat path, its more
copy&pastry, for instance we cannot reuse check_underflow() helper as
e->target_offset differs in the compat case.
Other problem is that it makes auditing for validation errors harder; two
places need to be checked and kept in sync.
At a high level 32 bit compat works like this:
1- initial pass over blob:
validate match/entry offsets, bounds checking
lookup all matches and targets
do bookkeeping wrt. size delta of 32/64bit structures
assign match/target.u.kernel pointer (points at kernel
implementation, needed to access ->compatsize etc.)
2- allocate memory according to the total bookkeeping size to
contain the translated ruleset
3- second pass over original blob:
for each entry, copy the 32bit representation to the newly allocated
memory. This also does any special match translations (e.g.
adjust 32bit to 64bit longs, etc).
4- check if ruleset is free of loops (chase all jumps)
5-first pass over translated blob:
call the checkentry function of all matches and targets.
The alternative implemented by this patch is to drop steps 3&4 from the
compat process, the translation is changed into an intermediate step
rather than a full 1:1 translate_table replacement.
In the 2nd pass (step #3), change the 64bit ruleset back to a kernel
representation, i.e. put() the kernel pointer and restore ->u.user.name .
This gets us a 64bit ruleset that is in the format generated by a 64bit
iptables userspace -- we can then use translate_table() to get the
'native' sanity checks.
This has two drawbacks:
1. we re-validate all the match and target entry structure sizes even
though compat translation is supposed to never generate bogus offsets.
2. we put and then re-lookup each match and target.
THe upside is that we get all sanity tests and ruleset validations
provided by the normal path and can remove some duplicated compat code.
iptables-restore time of autogenerated ruleset with 300k chains of form
-A CHAIN0001 -m limit --limit 1/s -j CHAIN0002
-A CHAIN0002 -m limit --limit 1/s -j CHAIN0003
shows no noticeable differences in restore times:
old: 0m30.796s
new: 0m31.521s
64bit: 0m25.674s
It turns out we don't validate that the num_counters field in the
struct we pass in from userspace is initialized.
The same problem also exists in ebtables, arptables, ipv6, and the
compat variants.
Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Quoting John Stultz:
In updating a 32bit arm device from 4.6 to Linus' current HEAD, I
noticed I was having some trouble with networking, and realized that
/proc/net/ip_tables_names was suddenly empty.
Digging through the registration process, it seems we're catching on the:
Validate that all matches (if any) add up to the beginning of
the target and that each match covers at least the base structure size.
The compat path should be able to safely re-use the function
as the structures only differ in alignment; added a
BUILD_BUG_ON just in case we have an arch that adds padding as well.
We're currently asserting that targetoff + targetsize <= nextoff.
Extend it to also check that targetoff is >= sizeof(xt_entry).
Since this is generic code, add an argument pointing to the start of the
match/target, we can then derive the base structure size from the delta.
We also need the e->elems pointer in a followup change to validate matches.
We have targets and standard targets -- the latter carries a verdict.
The ip/ip6tables validation functions will access t->verdict for the
standard targets to fetch the jump offset or verdict for chainloop
detection, but this happens before the targets get checked/validated.
Thus we also need to check for verdict presence here, else t->verdict
can point right after a blob.
32bit rulesets have different layout and alignment requirements, so once
more integrity checks get added to xt_check_entry_offsets it will reject
well-formed 32bit rulesets.
Once we add more sanity testing to xt_check_entry_offsets it
becomes relvant if we're expecting a 32bit 'config_compat' blob
or a normal one.
Since we already have a lot of similar-named functions (check_entry,
compat_check_entry, find_and_check_entry, etc.) and the current
incarnation is short just fold its contents into the callers.
Currently arp/ip and ip6tables each implement a short helper to check that
the target offset is large enough to hold one xt_entry_target struct and
that t->u.target_size fits within the current rule.
Unfortunately these checks are not sufficient.
To avoid adding new tests to all of ip/ip6/arptables move the current
checks into a helper, then extend this helper in followup patches.
In the mark_source_chains function (net/ipv4/netfilter/ip_tables.c) it
is possible for a user-supplied ipt_entry structure to have a large
next_offset field. This field is not bounds checked prior to writing a
counter value at the supplied offset.
Base chains enforce absolute verdict.
User defined chains are supposed to end with an unconditional return,
xtables userspace adds them automatically.
But if such return is missing we will move to non-existent next rule.
Reported-by: Ben Hawkes <hawkes@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On no-so-small systems, it is possible for a single process to cause an
OOM condition by filling large pipes with data that are never read. A
typical process filling 4000 pipes with 1 MB of data will use 4 GB of
memory. On small systems it may be tricky to set the pipe max size to
prevent this from happening.
This patch makes it possible to enforce a per-user soft limit above
which new pipes will be limited to a single page, effectively limiting
them to 4 kB each, as well as a hard limit above which no new pipes may
be created for this user. This has the effect of protecting the system
against memory abuse without hurting other users, and still allowing
pipes to work correctly though with less data at once.
The limit are controlled by two new sysctls : pipe-user-pages-soft, and
pipe-user-pages-hard. Both may be disabled by setting them to zero. The
default soft limit allows the default number of FDs per process (1024)
to create pipes of the default size (64kB), thus reaching a limit of 64MB
before starting to create only smaller pipes. With 256 processes limited
to 1024 FDs each, this results in 1024*64kB + (256*1024 - 1024) * 4kB =
1084 MB of memory allocated for a user. The hard limit is disabled by
default to avoid breaking existing applications that make intensive use
of pipes (eg: for splicing).
In the mark_source_chains function (net/ipv4/netfilter/ip_tables.c) it
is possible for a user-supplied ipt_entry structure to have a large
next_offset field. This field is not bounds checked prior to writing a
counter value at the supplied offset.
Problem is that mark_source_chains should not have been called --
the rule doesn't have a next entry, so its supposed to return
an absolute verdict of either ACCEPT or DROP.
However, the function conditional() doesn't work as the name implies.
It only checks that the rule is using wildcard address matching.
However, an unconditional rule must also not be using any matches
(no -m args).
The underflow validator only checked the addresses, therefore
passing the 'unconditional absolute verdict' test, while
mark_source_chains also tested for presence of matches, and thus
proceeeded to the next (not-existent) rule.
Unify this so that all the callers have same idea of 'unconditional rule'.
Reported-by: Ben Hawkes <hawkes@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the "ibm,configure-pe" and "ibm,configure-bridge" RTAS calls, the
spec states that values of 9900-9905 can be returned, indicating that
software should delay for 10^x (where x is the last digit, i.e. 990x)
milliseconds and attempt the call again. Currently, the kernel doesn't
know about this, and respecting it fixes some PCI failures when the
hypervisor is busy.
The delay is capped at 0.2 seconds.
Cc: <stable@vger.kernel.org> # 3.10+ Signed-off-by: Russell Currey <ruscur@russell.cc> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
TASK_SIZE was defined as 0x7fff8000UL which for 64k pages is not a
multiple of the page size. Somewhere further down the math fails
such that executing an ELF binary fails.
Ascend-to-parent logics in d_walk() depends on all encountered child
dentries not getting freed without an RCU delay. Unfortunately, in
quite a few cases it is not true, with hard-to-hit oopsable race as
the result.
Fortunately, the fix is simiple; right now the rule is "if it ever
been hashed, freeing must be delayed" and changing it to "if it
ever had a parent, freeing must be delayed" closes that hole and
covers all cases the old rule used to cover. Moreover, pipes and
sockets remain _not_ covered, so we do not introduce RCU delay in
the cases which are the reason for having that delay conditional
in the first place.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
iwpriv app uses iw_point structure to send data to Kernel. The iw_point
structure holds a pointer. For compatibility Kernel converts the pointer
as required for WEXT IOCTLs (SIOCIWFIRST to SIOCIWLAST). Some drivers
may use iw_handler_def.private_args to populate iwpriv commands instead
of iw_handler_def.private. For those case, the IOCTLs from
SIOCIWFIRSTPRIV to SIOCIWLASTPRIV will follow the path ndo_do_ioctl().
Accordingly when the filled up iw_point structure comes from 32 bit
iwpriv to 64 bit Kernel, Kernel will not convert the pointer and sends
it to driver. So, the driver may get the invalid data.
The pointer conversion for the IOCTLs (SIOCIWFIRSTPRIV to
SIOCIWLASTPRIV), which follow the path ndo_do_ioctl(), is mandatory.
This patch adds pointer conversion from 32 bit to 64 bit and vice versa,
if the ioctl comes from 32 bit iwpriv to 64 bit Kernel.
Signed-off-by: Prasun Maiti <prasunmaiti87@gmail.com> Signed-off-by: Ujjal Roy <royujjal@gmail.com> Tested-by: Dibyajyoti Ghosh <dibyajyotig@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This prevents users from triggering a stack overflow through a recursive
invocation of pagefault handling that involves mapping procfs files into
virtual memory.
This means the userspace program clock_adjtime called the clock_adjtime()
syscall and then crashed inside the compat_get_timex() function.
Syscalls should never crash programs, but instead return EFAULT.
The IIR register contains the executed instruction, which disassebles
into "ldw 0(sr3,r5),r9".
This load-word instruction is part of __get_user() which tried to read the word
at %r5/IOR (0xfa6f7fff). This means the unaligned handler jumped in. The
unaligned handler is able to emulate all ldw instructions, but it fails if it
fails to read the source e.g. because of page fault.
int main(void) {
/* allocate 8k */
char *ptr = mmap(NULL, 2*4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
/* free second half (upper 4k) and make it invalid. */
munmap(ptr+4096, 4096);
/* syscall where first int is unaligned and clobbers into invalid memory region */
/* syscall should return EFAULT */
return syscall(__NR_clock_adjtime, 0, ptr+4095);
}
To fix this issue we simply need to check if the faulting instruction address
is in the exception fixup table when the unaligned handler failed. If it
is, call the fixup routine instead of crashing.
While looking at the unaligned handler I found another issue as well: The
target register should not be modified if the handler was unsuccessful.
We are already using the privileged versions of MMCR0, MMCR1
and MMCRA in the kernel, so for MMCR2, we should better use
the privileged versions, too, to be consistent.
Fixes: 240686c13687 ("powerpc: Initialise PMU related regs on Power8") Suggested-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Thomas Huth <thuth@redhat.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The SIAR and SDAR registers are available twice, one time as SPRs
780 / 781 (unprivileged, but read-only), and one time as the SPRs
796 / 797 (privileged, but read and write). The Linux kernel code
currently uses the unprivileged SPRs - while this is OK for reading,
writing to that register of course does not work.
Since the KVM code tries to write to this register, too (see the mtspr
in book3s_hv_rmhandlers.S), the contents of this register sometimes get
lost for the guests, e.g. during migration of a VM.
To fix this issue, simply switch to the privileged SPR numbers instead.
Signed-off-by: Thomas Huth <thuth@redhat.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ccp-crypto module for AES XTS support has a bug that can allow requests
greater than 4096 bytes in size to be passed to the CCP hardware. The CCP
hardware does not support request sizes larger than 4096, resulting in
incorrect output. The request should actually be handled by the fallback
mechanism instantiated by the ccp-crypto module.
Add a check to insure the request size is less than or equal to the maximum
supported size and use the fallback mechanism if it is not.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
PTRACE_SETVFPREGS fails to properly mark the VFP register set to be
reloaded, because it undoes one of the effects of vfp_flush_hwstate().
Specifically vfp_flush_hwstate() sets thread->vfpstate.hard.cpu to
an invalid CPU number, but vfp_set() overwrites this with the original
CPU number, thereby rendering the hardware state as apparently "valid",
even though the software state is more recent.
Fix this by reverting the previous change.
Fixes: 8130b9d7b9d8 ("ARM: 7308/1: vfp: flush thread hwstate before copying ptrace registers") Acked-by: Will Deacon <will.deacon@arm.com> Tested-by: Simon Marchi <simon.marchi@ericsson.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
MOV to DR6 or DR7 causes a #GP if an attempt is made to write a 1 to
any of bits 63:32. However, this is not detected at KVM_SET_DEBUGREGS
time, and the next KVM_RUN oopses:
Otherwise, if we fail to allocate new PIO buffers, our TXQs will try to
use the old ones, which aren't there any more.
Fixes: 183233bec810 "sfc: Allocate and link PIO buffers; map them with write-combining" Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we free cb->skb after a dump, we do it after releasing the
lock. This means that a new dump could have started in the time
being and we'll end up freeing their skb instead of ours.
This patch saves the skb and module before we unlock so we free
the right memory.
Fixes: 16b304f3404f ("netlink: Eliminate kmalloc in netlink dump operation.") Reported-by: Baozeng Ding <sploving1@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We don't write back stale inodes so we should skip them in
xfs_iflush_cluster, too.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some careless idiot(*) wrote crap code in commit 1a3e8f3 ("xfs:
convert inode cache lookups to use RCU locking") back in late 2010,
and so xfs_iflush_cluster checks the wrong inode for whether it is
still valid under RCU protection. Fix it to lock and check the
correct inode.
(*) Careless-idiot: Dave Chinner <dchinner@redhat.com>
Discovered-by: Brain Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a failure due to an inode buffer occurs, the error handling
fails to abort the inode writeback correctly. This can result in the
inode being reclaimed whilst still in the AIL, leading to
use-after-free situations as well as filesystems that cannot be
unmounted as the inode log items left in the AIL never get removed.
Fix this by ensuring fatal errors from xfs_imap_to_bp() result in
the inode flush being aborted correctly.
Reported-by: Shyam Kaushik <shyam@zadarastorage.com> Diagnosed-by: Shyam Kaushik <shyam@zadarastorage.com> Tested-by: Shyam Kaushik <shyam@zadarastorage.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With netconsole (at least) the pr_err("... disablingn") call can
recurse back into the dma-debug code, where it'll try to grab
free_entries_lock again. Avoid the problem by doing the printk after
dropping the lock.
Currently, in ext4_mb_init(), there's a loop like the following:
do {
...
offset += 1 << (sb->s_blocksize_bits - i);
i++;
} while (i <= sb->s_blocksize_bits + 1);
Note that the updated offset is used in the loop's next iteration only.
However, at the last iteration, that is at i == sb->s_blocksize_bits + 1,
the shift count becomes equal to (unsigned)-1 > 31 (c.f. C99 6.5.7(3))
and UBSAN reports
Observe that the mentioned shift exponent, 4294967295, equals (unsigned)-1.
Unless compilers start to do some fancy transformations (which at least
GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the
such calculated value of offset is never used again.
Silence UBSAN by introducing another variable, offset_incr, holding the
next increment to apply to offset and adjust that one by right shifting it
by one position per loop iteration.
Unless compilers start to do some fancy transformations (which at least
GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the
such calculated value of bb is never used again.
Silence UBSAN by introducing another variable, bb_incr, holding the next
increment to apply to bb and adjust that one by right shifting it by one
position per loop iteration.
If the orphaned inode list contains inode #5, ext4_iget() returns a
bad inode (since the bootloader inode should never be referenced
directly). Because of the bad inode, we end up processing the inode
repeatedly and this hangs the machine.
(But don't do this if you are using an unpatched kernel if you care
about the system staying functional. :-)
This bug was found by the port of American Fuzzy Lop into the kernel
to find file system problems[1]. (Since it *only* happens if inode #5
shows up on the orphan list --- 3, 7, 8, etc. won't do it, it's not
surprising that AFL needed two hours before it found it.)
During boot, MST hotplugs are generally expected (even if no physical
hotplugging occurs) and result in DRM's connector topology changing.
This means that using num_connector from the current mode configuration
can lead to the number of connectors changing under us. This can lead to
some nasty scenarios in fbcon:
- We allocate an array to the size of dev->mode_config.num_connectors.
- MST hotplug occurs, dev->mode_config.num_connectors gets incremented.
- We try to loop through each element in the array using the new value
of dev->mode_config.num_connectors, and end up going out of bounds
since dev->mode_config.num_connectors is now larger then the array we
allocated.
fb_helper->connector_count however, will always remain consistent while
we do a modeset in fb_helper.
Note: This is just polish for 4.7, Dave Airlie's drm_connector
refcounting fixed these bugs for real. But it's good enough duct-tape
for stable kernel backporting, since backporting the refcounting
changes is way too invasive.
Fix possible out of bounds read, by adding missing comma.
The code may read pass the end of the dsi_errors array
when the most significant bit (bit #31) in the intr_stat register
is set.
This bug has been detected using CppCheck (static analysis tool).
The length of the GSS MIC token need not be a multiple of four bytes.
It is then padded by XDR to a multiple of 4 B, but unwrap_integ_data()
would previously only trim mic.len + 4 B. The remaining up to three
bytes would then trigger a check in nfs4svc_decode_compoundargs(),
leading to a "garbage args" error and mount failure:
nfs4svc_decode_compoundargs: compound not properly padded!
nfsd: failed to decode arguments!
This would prevent older clients using the pre-RFC 4121 MIC format
(37-byte MIC including a 9-byte OID) from mounting exports from v3.9+
servers using krb5i.
The trimming was introduced by commit 4c190e2f913f ("sunrpc: trim off
trailing checksum before returning decrypted or integrity authenticated
buffer").
Fixes: 4c190e2f913f "unrpc: trim off trailing checksum..." Signed-off-by: Tomáš Trnka <ttrnka@mail.muni.cz> Acked-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit ff1e22e7a638 ("xen/events: Mask a moving irq") open-coded
irq_move_irq() but left out checking if the IRQ is disabled. This broke
resuming from suspend since it tries to move a (disabled) irq without
holding the IRQ's desc->lock. Fix it by adding in a check for disabled
IRQs.
Some of the interrupt vectors on 64-bit POWER server processors are only
32 bytes long (8 instructions), which is not enough for the full
first-level interrupt handler. For these we need to branch to an
out-of-line (OOL) handler. But when we are running a relocatable kernel,
interrupt vectors till __end_interrupts marker are copied down to real
address 0x100. So, branching to labels (ie. OOL handlers) outside this
section must be handled differently (see LOAD_HANDLER()), considering
relocatable kernel, which would need at least 4 instructions.
However, branching from interrupt vector means that we corrupt the
CFAR (come-from address register) on POWER7 and later processors as
mentioned in commit 1707dd16. So, EXCEPTION_PROLOG_0 (6 instructions)
that contains the part up to the point where the CFAR is saved in the
PACA should be part of the short interrupt vectors before we branch out
to OOL handlers.
But as mentioned already, there are interrupt vectors on 64-bit POWER
server processors that are only 32 bytes long (like vectors 0x4f00,
0x4f20, etc.), which cannot accomodate the above two cases at the same
time owing to space constraint. Currently, in these interrupt vectors,
we simply branch out to OOL handlers, without using LOAD_HANDLER(),
which leaves us vulnerable when running a relocatable kernel (eg. kdump
case). While this has been the case for sometime now and kdump is used
widely, we were fortunate not to see any problems so far, for three
reasons:
1. In almost all cases, production kernel (relocatable) is used for
kdump as well, which would mean that crashed kernel's OOL handler
would be at the same place where we end up branching to, from short
interrupt vector of kdump kernel.
2. Also, OOL handler was unlikely the reason for crash in almost all
the kdump scenarios, which meant we had a sane OOL handler from
crashed kernel that we branched to.
3. On most 64-bit POWER server processors, page size is large enough
that marking interrupt vector code as executable (see commit 429d2e83) leads to marking OOL handler code from crashed kernel,
that sits right below interrupt vector code from kdump kernel, as
executable as well.
Let us fix this by moving the __end_interrupts marker down past OOL
handlers to make sure that we also copy OOL handlers to real address
0x100 when running a relocatable kernel.
This fix has been tested successfully in kdump scenario, on an LPAR with
4K page size by using different default/production kernel and kdump
kernel.
Also tested by manually corrupting the OOL handlers in the first kernel
and then kdump'ing, and then causing the OOL handlers to fire - mpe.
Fixes: c1fb6816fb1b ("powerpc: Add relocation on exception vector handlers") Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> It was found that the fix for CVE-2015-1805 incorrectly kept buffer
> offset and buffer length in sync on a failed atomic read, potentially
> resulting in a pipe buffer state corruption. A local, unprivileged user
> could use this flaw to crash the system or leak kernel memory to user
> space. (CVE-2016-0774, Moderate)
The same flawed fix was applied to stable branches from 2.6.32.y to
3.14.y inclusive, and I was able to reproduce the issue on 3.2.y.
We need to give pipe_iov_copy_to_user() a separate offset variable
and only update the buffer offset if it succeeds.
In commit a269913c52ad ("rtlwifi: Rework rtl_lps_leave() and
rtl_lps_enter() to use work queue"), the tests for enter/exit
power-save mode were inverted. With this change applied, the
wifi connection becomes much more stable.
Fixes: a269913c52ad ("rtlwifi: Rework rtl_lps_leave() and rtl_lps_enter() to use work queue") Signed-off-by: Wang YanQing <udknight@gmail.com> Acked-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
b84106b4e229 ("PCI: Disable IO/MEM decoding for devices with non-compliant
BARs") disabled BAR sizing for BARs 0-5 of devices that don't comply with
the PCI spec. But it didn't do anything for expansion ROM BARs, so we
still try to size them, resulting in warnings like this on Broadwell-EP:
pci 0000:ff:12.0: BAR 6: failed to assign [mem size 0x00000001 pref]
Move the non-compliant BAR check from __pci_read_base() up to
pci_read_bases() so it applies to the expansion ROM BAR as well as
to BARs 0-5.
Note that direct callers of __pci_read_base(), like sriov_init(), will now
bypass this check. We haven't had reports of devices with broken SR-IOV
BARs yet.
Currently the 'registered' member of the cpuidle_device struct is set
to 1 during cpuidle_register_device. In this same function there are
checks to see if the device is already registered to prevent duplicate
calls to register the device, but this value is never set to 0 even on
unregister of the device. Because of this, any attempt to call
cpuidle_register_device after a call to cpuidle_unregister_device will
fail which shouldn't be the case.
To prevent this, set registered to 0 when the device is unregistered.
Fixes: c878a52d3c7c (cpuidle: Check if device is already registered) Signed-off-by: Dave Gerlach <d-gerlach@ti.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Typically under error conditions, it is possible for aac_command_thread()
to miss the wakeup from kthread_stop() and go back to sleep, causing it
to hang aac_shutdown.
In the observed scenario, the adapter is not functioning correctly and so
aac_fib_send() never completes (or time-outs depending on how it was
called). Shortly after aac_command_thread() starts it performs
aac_fib_send(SendHostTime) which hangs. When aac_probe_one
/aac_get_adapter_info send time outs, kthread_stop is called which breaks
the command thread out of it's hang.
The code will still go back to sleep in schedule_timeout() without
checking kthread_should_stop() so it causes aac_probe_one to hang until
the schedule_timeout() which is 30 minutes.
Fixed by: Adding another kthread_should_stop() before schedule_timeout() Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
aac_fib_send has a special function case for initial commands during
driver initialization using wait < 0(pseudo sync mode). In this case,
the command does not sleep but rather spins checking for timeout.This
loop is calls cpu_relax() in an attempt to allow other processes/threads
to use the CPU, but this function does not relinquish the CPU and so the
command will hog the processor. This was observed in a KDUMP
"crashkernel" and that prevented the "command thread" (which is
responsible for completing the command from being timed out) from
starting because it could not get the CPU.
Fixed by replacing "cpu_relax()" call with "schedule()" Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
BugLink: http://bugs.launchpad.net/bugs/972604
Commit 09c9bae26b0d3c9472cb6ae45010460a2cee8b8d ("ath5k: add led pin
configuration for compaq c700 laptop") added a pin configuration for the Compaq
c700 laptop. However, the polarity of the led pin is reversed. It should be
red for wifi off and blue for wifi on, but it is the opposite. This bug was
reported in the following bug report:
http://pad.lv/972604
Fixes: 09c9bae26b0d3c9472cb6ae45010460a2cee8b8d ("ath5k: add led pin configuration for compaq c700 laptop") Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When running a 32-bit userspace on a 64-bit kernel, the UI_SET_PHYS
ioctl needs to be treated with special care, as it has the pointer
size encoded in the command.
This makes the ath79 bootconsole behave the same way as the generic 8250
bootconsole.
Also waiting for TEMT (transmit buffer is empty) instead of just THRE
(transmit buffer is not full) ensures that all characters have been
transmitted before the real serial driver starts reconfiguring the serial
controller (which would sometimes result in garbage being transmitted.)
This change does not cause a visible performance loss.
In addition, this seems to fix a hang observed in certain configurations on
many AR7xxx/AR9xxx SoCs during autoconfig of the real serial driver.
A more complete follow-up patch will disable 8250 autoconfig for ath79
altogether (the serial controller is detected as a 16550A, which is not
fully compatible with the ath79 serial, and the autoconfig may lead to
undefined behavior on ath79.)
Commit 85efde6f4e0d ("make exported headers use strict posix types")
changed the asm-generic siginfo.h to use the __kernel_* types, and
commit 3a471cbc081b ("remove __KERNEL_STRICT_NAMES") make the internal
types accessible only to the kernel, but the MIPS implementation hasn't
been updated to match.
Switch to proper types now so that the exported asm/siginfo.h won't
produce quite so many compiler errors when included alone by a user
program.
When emulating a jalr instruction with rd == $0, the code in
isBranchInstr was incorrectly writing to GPR $0 which should actually
always remain zeroed. This would lead to any further instructions
emulated which use $0 operating on a bogus value until the task is next
context switched, at which point the value of $0 in the task context
would be restored to the correct zero by a store in SAVE_SOME. Fix this
by not writing to rd if it is $0.
Fixes: 102cedc32a6e ("MIPS: microMIPS: Floating point support.") Signed-off-by: Paul Burton <paul.burton@imgtec.com> Cc: Maciej W. Rozycki <macro@imgtec.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/13160/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fixes the broken serial log when changing the clock source
of uart device. Before disabling the original clock source, this patch
enables the new clock source to protect the clock off state for a split second.
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
When csw->con_startup() fails in do_register_con_driver, we return no
error (i.e. 0). This was changed back in 2006 by commit 3e795de763.
Before that we used to return -ENODEV.
So fix the return value to be -ENODEV in that case again.
Fixes: 3e795de763 ("VT binding: Add binding/unbinding support for the VT console") Signed-off-by: Jiri Slaby <jslaby@suse.cz> Reported-by: "Dan Carpenter" <dan.carpenter@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
URBs and buffers allocated in attach for Epic devices would never be
deallocated in case of a later probe error (e.g. failure to allocate
minor numbers) as disconnect is then never called.
Fix by moving deallocation to release and making sure that the
URBs are first unlinked.
The interface read URB is submitted in attach, but was only unlinked by
the driver at disconnect.
In case of a late probe error (e.g. due to failed minor allocation),
disconnect is never called and we would end up with active URBs for an
unbound interface. This in turn could lead to deallocated memory being
dereferenced in the completion callback.
Fixes: f7a33e608d9a ("USB: serial: add quatech2 usb to serial driver") Signed-off-by: Johan Hovold <johan@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The interface instat and indat URBs were submitted in attach, but never
unlinked in release before deallocating the corresponding transfer
buffers.
In the case of a late probe error (e.g. due to failed minor allocation),
disconnect would not have been called before release, causing the
buffers to be freed while the URBs are still in use. We'd also end up
with active URBs for an unbound interface.
The interface read and event URBs are submitted in attach, but were
never explicitly unlinked by the driver. Instead the URBs would have
been killed by usb-serial core on disconnect.
In case of a late probe error (e.g. due to failed minor allocation),
disconnect is never called and we could end up with active URBs for an
unbound interface. This in turn could lead to deallocated memory being
dereferenced in the completion callbacks.