The MIC VOP driver does two successive reads from user space to read a
variable length data structure. Kernel memory corruption can result if
the data structure changes between the two reads. This patch removes
that possibility.
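The diff is not reproduced here. A common way to close a double-fetch window like this one is to re-check, after the second read, that the header copied the second time matches the header from the first read. The sketch below shows that idea under stated assumptions: `memcpy` stands in for `copy_from_user()`, and `struct desc_hdr`, `read_desc()` and the field names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-length descriptor: a fixed header followed by
 * config_len bytes of payload. */
struct desc_hdr {
	uint8_t num_vq;
	uint8_t config_len;
};

/* Stand-in for copy_from_user(): in the kernel the source is user memory
 * that another thread may modify between the two calls. */
static int fake_copy_from_user(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
	return 0;
}

/* Read the header, size the allocation from it, read the whole object, then
 * reject the request if the header changed between the two reads. */
static int read_desc(const void *user_buf, void **out)
{
	struct desc_hdr hdr;
	size_t total;
	void *buf;

	if (fake_copy_from_user(&hdr, user_buf, sizeof(hdr)))
		return -EFAULT;
	total = sizeof(hdr) + hdr.config_len;

	buf = malloc(total);
	if (!buf)
		return -ENOMEM;
	if (fake_copy_from_user(buf, user_buf, total)) {
		free(buf);
		return -EFAULT;
	}
	/* The second fetch must agree with the first one. */
	if (memcmp(&hdr, buf, sizeof(hdr)) != 0) {
		free(buf);
		return -EINVAL;
	}
	*out = buf;
	return 0;
}
```

Everything after the check uses only `buf`, so a length that changes in user memory can no longer disagree with the size that was allocated.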
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=116651
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com> Reviewed-by: Sudeep Dutt <sudeep.dutt@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16:
- Adjust filename, context
- goto exit on failure] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Yue Cao claims that current host rate limiting of challenge ACKs
(RFC 5961) could leak enough information to allow a patient attacker
to hijack TCP sessions. He will soon provide details in an academic
paper.
This patch increases the default limit from 100 to 1000, and adds
some randomization so that the attacker can no longer hijack
sessions without spending a considerable number of probes.
Based on initial analysis and patch from Linus.
Note that we also have per socket rate limiting, so it is tempting
to remove the host limit in the future.
v2: randomize the count of challenge acks per second, not the period.
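The per-second budget can be sketched as follows. This is an illustration of "randomize the count per second" under assumptions, not the patch itself. The globals are hypothetical and the code is single-threaded (the backport's `ACCESS_ONCE()` lockless accesses are omitted), and `rand()` stands in for a kernel PRNG.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical host-wide state modeled on the description above. */
static unsigned int challenge_ack_limit = 1000; /* new default */
static uint32_t challenge_timestamp;            /* second the budget belongs to */
static unsigned int challenge_count;            /* ACKs left in that second */

/* Returns 1 if a challenge ACK may be sent at time now_sec. At the start of
 * each second the budget is redrawn from roughly [limit/2, limit/2 + limit),
 * so an observer cannot tell exactly how many ACKs have been spent. */
static int challenge_ack_allowed(uint32_t now_sec)
{
	if (now_sec != challenge_timestamp) {
		unsigned int half = (challenge_ack_limit + 1) >> 1;

		challenge_timestamp = now_sec;
		challenge_count = half + (unsigned int)rand() % challenge_ack_limit;
	}
	if (challenge_count > 0) {
		challenge_count--;
		return 1;
	}
	return 0;
}
```

With the default of 1000, between 500 and 1499 challenge ACKs can go out in a given second, and the exact number differs from one second to the next.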
Fixes: 282f23c6ee34 ("tcp: implement RFC 5961 3.2") Reported-by: Yue Cao <ycao009@ucr.edu> Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16:
- Adjust context
- Use ACCESS_ONCE() instead of {READ,WRITE}_ONCE()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
There is a double fetch problem in audit_log_single_execve_arg()
where we first check the execve(2) arguments for any "bad" characters
which would require hex encoding and then re-fetch the arguments for
logging in the audit record[1]. Of course this leaves a window of
opportunity for an unsavory application to munge with the data.
This patch reworks things by only fetching the argument data once[2]
into a buffer where it is scanned and logged into the audit
record(s). In addition to fixing the double fetch, this patch
improves on the original code in a few other ways: better handling
of large arguments which require encoding, stricter record length
checking, and some performance improvements (completely unverified,
but we got rid of some strlen() calls, that's got to be a good
thing).
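The single-fetch principle can be shown in isolation. In this sketch the caller copies the argument once into `snap`, and both the "needs encoding?" scan and the output formatting read that same snapshot. The byte set that triggers hex encoding is modeled on audit's control-character check. `needs_hex()` and `format_arg()` are hypothetical names, and the record splitting and length bookkeeping of the real code are omitted.

```c
#include <stdio.h>

/* True if any byte would need hex encoding in the log record: space and
 * control characters, '"', and bytes >= 0x7f. */
static int needs_hex(const unsigned char *s, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (s[i] < 0x21 || s[i] == '"' || s[i] > 0x7e)
			return 1;
	return 0;
}

/* snap is the caller's one-time copy of the argument. Because the scan and
 * the formatting read the same snapshot, a concurrent writer can no longer
 * make them disagree. Requires outsz >= 1. */
static void format_arg(const char *snap, size_t len, char *out, size_t outsz)
{
	const unsigned char *s = (const unsigned char *)snap;
	size_t pos = 0;

	if (!needs_hex(s, len)) {
		snprintf(out, outsz, "\"%.*s\"", (int)len, snap);
		return;
	}
	/* Two uppercase hex digits per byte, stopping while a NUL still fits. */
	for (size_t i = 0; i < len && pos + 3 <= outsz; i++)
		pos += (size_t)snprintf(out + pos, outsz - pos, "%02X", s[i]);
	out[pos] = '\0';
}
```

For example, `abc` is logged as `"abc"`, while `a b` (which contains a space) is logged as `612062`.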
As part of the development of this patch, I've also created a basic
regression test for the audit-testsuite, the test can be tracked on
GitHub at the following link:
[1] If you pay careful attention, there is actually a triple fetch
problem due to a strnlen_user() call at the top of the function.
[2] This is a tiny white lie, we do make a call to strnlen_user()
prior to fetching the argument data. I don't like it, but due to the
way the audit record is structured we really have no choice unless we
copy the entire argument at once (which would require a rather
wasteful allocation). The good news is that with this patch the
kernel no longer relies on this strnlen_user() value for anything
beyond recording it in the log, we also update it with a trustworthy
value whenever possible.
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The sclp_ctl_ioctl_sccb function uses two copy_from_user calls to
retrieve the sclp request from user space. The first copy_from_user
fetches the length of the request which is stored in the first two
bytes of the request. The second copy_from_user gets the complete
sclp request, but this copies the length field a second time.
A malicious user may have changed the length in the meantime.
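The message above does not show how the fix is written. One possible approach is sketched below as an assumption: validate the length from the first fetch, and treat a different length in the second copy as an error. `memcpy` stands in for `copy_from_user()`. `SCCB_MIN`, `SCCB_MAX` and `fetch_sccb()` are hypothetical, and so is the choice of `-EINVAL`.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define SCCB_MAX 4096 /* hypothetical upper bound, e.g. one page */
#define SCCB_MIN 8    /* hypothetical minimum request size */

/* user_req stands in for the user pointer; buf is a kernel buffer of
 * SCCB_MAX bytes. The first two bytes of a request hold its total length. */
static int fetch_sccb(const void *user_req, unsigned char *buf)
{
	uint16_t len, len_again;

	memcpy(&len, user_req, sizeof(len));        /* first fetch: length only */
	if (len < SCCB_MIN || len > SCCB_MAX)
		return -EINVAL;
	memcpy(buf, user_req, len);                 /* second fetch: whole request */
	memcpy(&len_again, buf, sizeof(len_again));
	/* The length field was copied twice; only the validated value counts. */
	if (len_again != len)
		return -EINVAL;
	return 0;
}
```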
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com> Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
It turns out that if the guest does a H_CEDE while the CPU is in
a transactional state, and the H_CEDE does a nap, and the nap
loses the architected state of the CPU (which it is allowed to do),
then we lose the checkpointed state of the virtual CPU. In addition,
the transactional-memory state recorded in the MSR gets reset back
to non-transactional, and when we try to return to the guest, we take
a TM bad thing type of program interrupt because we are trying to
transition from non-transactional to transactional with a hrfid
instruction, which is not permitted.
The result of the program interrupt occurring at that point is that
the host CPU will hang in an infinite loop with interrupts disabled.
Thus this is a denial of service vulnerability in the host which can
be triggered by any guest (and depending on the guest kernel, it can
potentially be triggered by unprivileged userspace in the guest).
This vulnerability has been assigned the ID CVE-2016-5412.
To fix this, we save the TM state before napping and restore it
on exit from the nap, when handling a H_CEDE in real mode. The
case where H_CEDE exits to host virtual mode is already OK (as are
other hcalls which exit to host virtual mode) because the exit
path saves the TM state.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This moves the transactional memory state save and restore sequences
out of the guest entry/exit paths into separate procedures. This is
so that these sequences can be used in going into and out of nap
in a subsequent patch.
The only code changes here are (a) saving and restoring LR on the
stack, since these new procedures get called with a bl instruction,
(b) explicitly saving r1 into the PACA instead of assuming that
HSTATE_HOST_R1(r13) is already set, and (c) removing an unnecessary
and redundant setting of MSR[TM] that should have been removed by
commit 9d4d0bdd9e0a ("KVM: PPC: Book3S HV: Add transactional memory
support", 2013-09-24) but wasn't.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
[bwh: Backported to 3.16: include dots in subroutine names] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The last field "flags" of object "minfo" is not initialized.
Copying this object out may leak kernel stack data.
Assign 0 to it to avoid leak.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
link_info.str is a char array of size 60. Memory after the NULL
byte is not initialized. Sending the whole object out can cause
a leak.
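The backport note below says strncpy() is used. The point is that strncpy() fills the rest of the destination with zeros after a short source string, so the whole 60-byte field is initialized before it is copied out. In this sketch `fill_link_name()` is a hypothetical helper. The last byte is forced to NUL because strncpy() does not terminate a source that is too long.

```c
#include <string.h>

#define LINK_NAME_LEN 60 /* size of link_info.str per the text above */

/* Copy a link name into a fixed-size field that will be sent out whole.
 * strncpy() zero-fills the tail after a short source, so no stale stack
 * bytes remain. The final byte is set explicitly to guarantee termination. */
static void fill_link_name(char dst[LINK_NAME_LEN], const char *src)
{
	strncpy(dst, src, LINK_NAME_LEN - 1);
	dst[LINK_NAME_LEN - 1] = '\0';
}
```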
Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
[carnil: Backported to 3.16 (same as bwh did for 3.2): the unpadded strcpy() is
in tipc_node_get_links() and no nlattr is involved, so use strncpy()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The stack object “r1” has a total size of 32 bytes. Its fields
“event” and “val” are each followed by 4 bytes of padding. These 8
padding bytes are sent to user space without being initialized.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The stack object “r1” has a total size of 32 bytes. Its fields
“event” and “val” are each followed by 4 bytes of padding. These 8
padding bytes are sent to user space without being initialized.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The stack object “tread” has a total size of 32 bytes. Its fields
“event” and “val” are each followed by 4 bytes of padding. These 8
padding bytes are sent to user space without being initialized.
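The usual remedy for leaked padding is to memset() the whole object before filling in its fields. This sketch uses a hypothetical struct with the layout described above (int, struct timespec, unsigned int). On a typical LP64 ABI that layout is 32 bytes, with 4 bytes of padding after `event` and 4 after `val`. Assigning only the named fields, or using `= {0}`, is not portably guaranteed to clear padding.

```c
#include <string.h>
#include <time.h>

/* Hypothetical record with the layout described above. */
struct tread_like {
	int event;
	struct timespec tstamp;
	unsigned int val;
};

/* Build a record destined for user space. The memset() zeroes the padding
 * bytes as well as the fields, so no stack contents leak. */
static void build_tread(struct tread_like *t, int event, unsigned int val)
{
	memset(t, 0, sizeof(*t));
	t->event = event;
	t->val = val;
}
```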
Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The stack object “ci” has a total size of 8 bytes. Its last 3 bytes
are padding bytes which are not initialized and leaked to userland
via “copy_to_user”.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This prevents stacking filesystems (ecryptfs and overlayfs) from using
procfs as lower filesystem. There is too much magic going on inside
procfs, and there is no good reason to stack stuff on top of procfs.
(For example, procfs does access checks in VFS open handlers, and
ecryptfs by design calls open handlers from a kernel thread that doesn't
drop privileges or so.)
Add a simple read-only counter to super_block that indicates how deep this
is in the stack of filesystems. Previously ecryptfs was the only stackable
filesystem and it explicitly disallowed multiple layers of itself.
Overlayfs, however, can be stacked recursively and also may be stacked
on top of ecryptfs or vice versa.
To limit the kernel stack usage we must limit the depth of the
filesystem stack. Initially the limit is set to 2.
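The depth rule can be sketched as follows. The limit macro follows the kernel's `FILESYSTEM_MAX_STACK_DEPTH`. `struct sb_like`, `stack_on()`, `mark_unstackable()` and the `-EINVAL` return are hypothetical stand-ins. One way procfs can refuse to be a lower layer is to report itself as already at the maximum depth.

```c
#include <errno.h>

#define FILESYSTEM_MAX_STACK_DEPTH 2 /* "Initially the limit is set to 2." */

/* Minimal stand-in for the relevant part of struct super_block. */
struct sb_like {
	int s_stack_depth; /* 0 for a filesystem that is not stacked */
};

/* Mount-time check for a stacking filesystem: the new superblock sits one
 * level above the lower one, and mounting fails once the limit is exceeded. */
static int stack_on(struct sb_like *sb, const struct sb_like *lower)
{
	sb->s_stack_depth = lower->s_stack_depth + 1;
	if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH)
		return -EINVAL;
	return 0;
}

/* A filesystem that must never be a lower layer claims the maximum depth. */
static void mark_unstackable(struct sb_like *sb)
{
	sb->s_stack_depth = FILESYSTEM_MAX_STACK_DEPTH;
}
```

Under these rules, ecryptfs on overlayfs on ext4 (depth 2) is allowed, a third stacked layer is refused, and any stacking on procfs is refused.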
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
[bwh: Backported to 3.16: drop changes to overlayfs] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
I previously added an integer overflow check here but looking at it now,
it's still buggy.
The bug happens in snd_compr_allocate_buffer(). We multiply
".fragments" and ".fragment_size" and that doesn't overflow but then we
save it in an unsigned int so it truncates the high bits away and we
allocate a smaller than expected size.
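The check has to be made against the type the product is finally stored in. This sketch assumes that type is a 32-bit `unsigned int`. `buffer_params_ok()` is hypothetical, and the exact bound used by the kernel fix may differ. For example, 0x10000 * 0x10000 fits in 64 bits but truncates to 0 in 32 bits, so it must be rejected.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Accept fragments * fragment_size only if the product fits in the
 * unsigned int it will be stored in; a zero fragment size is rejected,
 * which also avoids dividing by zero. */
static bool buffer_params_ok(uint32_t fragments, uint32_t fragment_size)
{
	if (fragment_size == 0)
		return false;
	return fragments <= UINT_MAX / fragment_size;
}
```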
Fixes: b35cc8225845 ('ALSA: compress_core: integer overflow in snd_compr_allocate_buffer()') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The well-spotted fallocate undo fix is good in most cases, but not when
fallocate failed on the very first page. index 0 then passes lend -1
to shmem_undo_range(), and that has two bad effects: (a) it will
undo every fallocation throughout the file, unrestricted by the current
range; but more importantly (b) it can cause the undo to hang, because
lend -1 is treated as truncation, which makes it keep on retrying until
every page has gone, but those already fully instantiated will never go
away. Big thank you to xfstests generic/269 which demonstrates this.
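The corner case can be sketched like this: compute the undo end only when at least one page was instantiated. If the failure happened on the first page (`index == start`), `(index << shift) - 1` would be -1 when start is 0, which means "to the end of the file". In this sketch `undo_range()` is a hypothetical recorder standing in for `shmem_undo_range()`, and `fallocate_fail_cleanup()` is also hypothetical.

```c
#include <stdint.h>

/* Hypothetical recorder standing in for shmem_undo_range(). */
static int undo_calls;
static int64_t undo_lstart, undo_lend;

static void undo_range(int64_t lstart, int64_t lend)
{
	undo_calls++;
	undo_lstart = lstart;
	undo_lend = lend;
}

/* After a failure at page `index`, undo only the pages [start, index) that
 * this call instantiated. If it failed on the very first page, there is
 * nothing to undo, so lend never becomes -1. */
static void fallocate_fail_cleanup(uint64_t start, uint64_t index,
				   unsigned int page_shift)
{
	if (index > start)
		undo_range((int64_t)(start << page_shift),
			   (int64_t)(index << page_shift) - 1);
}
```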
Fixes: b9b4bb26af01 ("tmpfs: don't undo fallocate past its last page") Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16: use PAGE_CACHE_SHIFT instead of PAGE_SHIFT] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
There are legitimate reasons to disallow mmap on certain files, notably
in sysfs or procfs. We shouldn't emulate mmap support on file systems
that don't offer support natively.
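The check amounts to refusing mmap when the lower file has no `->mmap` handler of its own. This sketch uses minimal stand-ins for `struct file` and `struct file_operations`. `stacked_mmap()` is hypothetical, and `-ENODEV` is the error chosen here.

```c
#include <errno.h>
#include <stddef.h>

/* Minimal stand-ins for struct file_operations and struct file. */
struct fops_like {
	int (*mmap)(void *file, void *vma);
};

struct file_like {
	const struct fops_like *f_op;
};

/* A stacked filesystem's mmap handler: do not emulate mmap for a lower file
 * whose filesystem does not support it natively. */
static int stacked_mmap(const struct file_like *lower)
{
	if (!lower->f_op->mmap)
		return -ENODEV;
	return 0; /* the generic mapping path would continue here */
}
```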
CVE-2016-1583
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
[tyhicks: clean up f_op check by using ecryptfs_file_to_lower()] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
As of Xen 4.7 PV CPUID doesn't expose either of CPUID[1].ECX[7] and
CPUID[0x80000007].EDX[7] anymore, causing the driver to fail to load on
both Intel and AMD systems. Doing any kind of hardware capability
checks in the driver as a prerequisite was wrong anyway: With the
hypervisor being in charge, all such checking should be done by it. If
ACPI data gets uploaded despite some missing capability, the hypervisor
is free to ignore part or all of that data.
Ditch the entire check_prereq() function, and do the only valid check
(xen_initial_domain()) in the caller in its place.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
xenbus_dev_request_and_reply() needs to track whether a transaction is
open. For XS_TRANSACTION_START messages it calls transaction_start()
and for XS_TRANSACTION_END messages it calls transaction_end().
If sending an XS_TRANSACTION_START message fails or it responds with
an error, the transaction is not open and transaction_end() must be
called.
If sending an XS_TRANSACTION_END message fails, the transaction is
still open, but if an error response is returned the transaction is
closed.
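The rules above can be written as a small decision function. In this sketch the enum values are local stand-ins for the real XenStore message types, and `transaction_open_after()` is hypothetical. It only encodes the open/closed outcome described in the text.

```c
#include <stdbool.h>

/* Local stand-ins for the XenStore message types discussed above. */
enum xs_msg { XS_TRANSACTION_START, XS_TRANSACTION_END, XS_OTHER };

/* Is the transaction open after the message completes? send_ok is false if
 * sending failed; reply_error is true if the reply was an error. */
static bool transaction_open_after(enum xs_msg msg, bool was_open,
				   bool send_ok, bool reply_error)
{
	switch (msg) {
	case XS_TRANSACTION_START:
		/* Open only if the request went out and did not fail. */
		return send_ok && !reply_error;
	case XS_TRANSACTION_END:
		/* A send failure leaves it open; any reply, even an error,
		 * closes it. */
		return was_open && !send_ok;
	default:
		return was_open;
	}
}
```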
Commit 027bd7e89906 ("xen/xenbus: Avoid synchronous wait on XenBus
stalling shutdown/restart") introduced a regression where failed
XS_TRANSACTION_START messages were leaving the transaction open. This
can cause problems with suspend (and migration) as all transactions
must be closed before suspending.
It appears that the problematic change was added accidentally, so just
remove it.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
A qeth_card contains a napi_struct linked to the net_device during
device probing. This struct must be deleted when removing the qeth
device; otherwise a panic on oops can occur when qeth devices are
repeatedly removed and added.
Fixes: a1c3ed4c9ca ("qeth: NAPI support for l2 and l3 discipline") Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Tested-by: Alexander Klein <ALKL@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The user timer tu->qused counter may go to a negative value when
multiple concurrent reads are performed since both the check and the
decrement of tu->qused are done in two individual locked contexts.
This results in bogus reads and an endless loop on the
user-space side.
The fix is to move the decrement of the tu->qused counter into the
same spinlock context as the zero-check of the counter.
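The fixed pattern is "check and decrement under one lock". In this sketch a C11 `atomic_flag` spinlock stands in for the kernel spinlock, and `struct tu_like` and `take_event()` are hypothetical. Two readers can no longer both see `qused == 1` and both decrement it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reader state; qlock stands in for the kernel spinlock. */
struct tu_like {
	atomic_flag qlock;
	int qused;
};

static void tu_lock(struct tu_like *tu)
{
	while (atomic_flag_test_and_set_explicit(&tu->qlock, memory_order_acquire))
		; /* spin */
}

static void tu_unlock(struct tu_like *tu)
{
	atomic_flag_clear_explicit(&tu->qlock, memory_order_release);
}

/* The zero-check and the decrement happen in the same critical section,
 * so qused can never go negative. */
static bool take_event(struct tu_like *tu)
{
	bool ok = false;

	tu_lock(tu);
	if (tu->qused > 0) {
		tu->qused--;
		ok = true;
	}
	tu_unlock(tu);
	return ok;
}
```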
Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
On 64-bit kernels, device stats are 64 bits wide, not 32 bits.
Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
get_task_ioprio() accesses the task->io_context without holding the task
lock and thus can race with exit_io_context(), leading to a
use-after-free. The reproducer below hits this within a few seconds on
my 4-core QEMU VM:
The current implementation does not handle timeouts for commands
with a callback request, and this can lead to a deadlock if the command
doesn't get a fw response.
Add delayed callback timeout work before posting the command to fw.
In case of real fw command completion we will cancel the delayed work.
In case of fw command timeout the callback timeout handler will be
called and it will simulate fw completion with timeout error.
Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Call command completion handler in case of timeout when working in
interrupts mode.
Avoid flushing the commands workqueue after acquiring the semaphores to
prevent a potential deadlock.
Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: the calculation of ds is more complex] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ether_addr_equal_64bits() requires some care about its arguments,
namely that 8 bytes might be read, even though the values of the last
2 bytes are not used.
KASan detected a violation with null_mac_addr and lacpdu_mcast_addr
in bond_3ad.c
Same problem with mac_bcast[] and mac_v6_allmcast[] in bond_alb.c:
although the 8-byte alignment was there, KASan would detect
out-of-bounds accesses.
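The reason for the 8-byte requirement is easy to see in a portable sketch of such a comparison. Each address is loaded as a 64-bit word and the 16 bits from bytes 6 and 7 are discarded, so every operand must have at least 8 readable bytes. This is why constant addresses are padded to `ETH_ALEN + 2`. `mac_equal_64bits()` and `null_mac_padded` are hypothetical, and `memcpy` replaces the kernel's direct word loads.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Compare two MAC addresses by loading 8 bytes from each and ignoring the
 * two bytes past ETH_ALEN. Both buffers must be at least 8 bytes long. */
static bool mac_equal_64bits(const uint8_t *a, const uint8_t *b)
{
	const uint16_t probe = 1;
	bool little = *(const uint8_t *)&probe == 1;
	uint64_t x, y, fold;

	memcpy(&x, a, 8); /* reads 2 bytes beyond the address itself */
	memcpy(&y, b, 8);
	fold = x ^ y;
	/* Drop the 16 bits that came from bytes 6 and 7. */
	return little ? (fold << 16) == 0 : (fold >> 16) == 0;
}

/* A constant padded so that the 8-byte read stays in bounds. */
static const uint8_t null_mac_padded[ETH_ALEN + 2];
```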
Fixes: 815117adaf5b ("bonding: use ether_addr_equal_unaligned for bond addr compare") Fixes: bb54e58929f3 ("bonding: Verify RX LACPDU has proper dest mac-addr") Fixes: 885a136c52a8 ("bonding: use compare_ether_addr_64bits() in ALB") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16:
- Adjust filename
- Drop change to bond_params::ad_actor_system
- Fix one more copy of null_mac_addr to use eth_zero_addr()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fix boot crash that triggers if this driver is built into a kernel and
run on non-AMD systems.
Users of AMD northbridges call amd_cache_northbridges(), which is
supposed to return a negative value to signal that we weren't able to
cache/detect any northbridges on the system.
At least, that is what all its callers expect. But it actually returns
a negative value only when kmalloc() fails.
Fix it to return -ENODEV if there are no NBs cached. Otherwise, amd_nb
users like amd64_edac, which relies on it to know whether it should
load or not, get loaded on systems like Intel Xeons where they
shouldn't.
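The intended contract can be sketched like this: "nothing found" is reported as an error, not as success. In this sketch `struct nb_cache`, `cache_northbridges()` and its `n_found` parameter are hypothetical, and the PCI enumeration is reduced to a count.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical cache of detected northbridges. */
struct nb_cache {
	unsigned int num;
	int *nodes;
};

/* n_found is how many northbridges enumeration turned up. Callers such as
 * an EDAC driver treat a negative return as "do not load". */
static int cache_northbridges(struct nb_cache *c, unsigned int n_found)
{
	if (c->num)          /* already cached */
		return 0;
	if (n_found == 0)    /* e.g. a non-AMD system */
		return -ENODEV;
	c->nodes = calloc(n_found, sizeof(*c->nodes));
	if (!c->nodes)
		return -ENOMEM;
	c->num = n_found;
	return 0;
}
```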
Logan Gunthorpe reports that hibernation stopped working reliably for
him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
and rodata).
That turns out to be a consequence of a long-standing issue with the
64-bit image restoration code on x86, which is that the temporary
page tables set up by it to avoid page tables corruption when the
last bits of the image kernel's memory contents are copied into
their original page frames re-use the boot kernel's text mapping,
but that mapping may very well get corrupted just like any other
part of the page tables. Of course, if that happens, the final
jump to the image kernel's entry point will go to nowhere.
The exact reason why commit ab76f7b4ab23 matters here is that it
sometimes causes a PMD of a large page to be split into PTEs
that are allocated dynamically and get corrupted during image
restoration as described above.
To fix that issue, note that the code copying the last bits of the
image kernel's memory contents to the page frames occupied by them
previously doesn't use the kernel text mapping, because it runs from
a special page covered by the identity mapping set up for that code
from scratch. Hence, the kernel text mapping is only needed before
that code starts to run, and after that it is only used for the
final jump to the image kernel's entry point.
Accordingly, the temporary page tables set up in swsusp_arch_resume()
on x86-64 need to contain the kernel text mapping too. That mapping
is only going to be used for the final jump to the image kernel, so
it only needs to cover the image kernel's entry point, because the
first thing the image kernel does after getting control back is to
switch over to its own original page tables. Moreover, the virtual
address of the image kernel's entry point in that mapping has to be
the same as the one mapped by the image kernel's page tables.
With that in mind, modify the x86-64's arch_hibernation_header_save()
and arch_hibernation_header_restore() routines to pass the physical
address of the image kernel's entry point (in addition to its virtual
address) to the boot kernel (a small piece of assembly code involved
in passing the entry point's virtual address to the image kernel is
not necessary any more after that, so drop it). Update RESTORE_MAGIC
too to reflect the image header format change.
Next, in set_up_temporary_mappings(), use the physical and virtual
addresses of the image kernel's entry point passed in the image
header to set up a minimum kernel text mapping (using memory pages
that won't be overwritten by the image kernel's memory contents) that
will map those addresses to each other as appropriate.
This makes the concern about the possible corruption of the original
boot kernel text mapping go away, and if the minimum kernel text
mapping used for the final jump marks the image kernel's entry point
memory as executable, the jump to it is guaranteed to succeed.
Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata) Link: http://marc.info/?l=linux-pm&m=146372852823760&w=2 Reported-by: Logan Gunthorpe <logang@deltatee.com> Reported-and-tested-by: Borislav Petkov <bp@suse.de> Tested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
vortex_wtdma_bufshift() calculates the page index wrongly: it masks
first and then shifts, which always results in zero.
The proper computation is to shift first, then mask.
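The difference is easy to demonstrate with a made-up field layout: a 2-bit page index at bits 12..13. The constants and function names here are hypothetical, not the driver's.

```c
#include <stdint.h>

/* Hypothetical layout: a 2-bit page index at bits 12..13. */
#define SUBBUF_SHIFT 12
#define SUBBUF_MASK  0x3u

/* Buggy: the unshifted mask keeps only bits 0..1, and shifting those right
 * by 12 always yields 0. */
static uint32_t page_index_wrong(uint32_t reg)
{
	return (reg & SUBBUF_MASK) >> SUBBUF_SHIFT;
}

/* Fixed: move the field down to bit 0 first, then mask it. */
static uint32_t page_index_right(uint32_t reg)
{
	return (reg >> SUBBUF_SHIFT) & SUBBUF_MASK;
}
```

For a register value of 0x2000, the buggy version returns 0 and the fixed one returns 2.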
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
There is a static checker warning here "warn: mask and shift to zero"
and the code sets "ring" to zero every time. From looking at how
QLCNIC_FETCH_RING_ID() is used in qlcnic_83xx_process_rcv_ring() the
qlcnic_83xx_hndl() should be removed.
Fixes: 4be41e92f7c6 ('qlcnic: 83xx data path routines') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The untagged vlan object is only destroyed when the interface is removed
via the legacy sysfs interface. But it also has to be destroyed when the
standard rtnl-link interface is used.
Fixes: 5d2c05b21337 ("batman-adv: add per VLAN interface attribute framework") Signed-off-by: Sven Eckelmann <sven@narfation.org> Acked-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: s/_put/_free_ref/] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
skb_linearize() may reallocate the skb. This makes the previously
calculated ethhdr pointer invalid, but the pointer is used later to fill
in the RR field of the batadv_icmp_packet_rr packet.
Instead re-evaluate eth_hdr after the skb_linearize+skb_cow to fix the
pointer and avoid the invalid read.
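The general rule is to re-derive any pointer into a buffer after a call that may reallocate that buffer. In this sketch `struct buf_like` is a minimal stand-in for an skb. `buf_make_private()` plays the role of skb_linearize()/skb_cow() by always moving the data, and `prepare()` is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for an skb whose data may move. */
struct buf_like {
	unsigned char *data;
	size_t len;
};

/* Like skb_linearize()/skb_cow(), this may reallocate the data, which
 * invalidates any pointer computed from b->data beforehand. */
static int buf_make_private(struct buf_like *b)
{
	unsigned char *n = malloc(b->len);

	if (!n)
		return -1;
	memcpy(n, b->data, b->len);
	free(b->data);
	b->data = n;
	return 0;
}

/* Take the header pointer only after the call, never before it. */
static unsigned char *prepare(struct buf_like *b)
{
	if (buf_make_private(b) < 0)
		return NULL;
	return b->data; /* fresh pointer into the (possibly new) buffer */
}
```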
Fixes: da6b8c20a5b8 ("batman-adv: generalize batman-adv icmp packet handling") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Each batadv_tt_local_entry holds a single reference to a
batadv_softif_vlan. In case a new entry cannot be added to the hash
table, the error path puts the reference, but the reference is now
also dropped by batadv_tt_local_entry_release(), so it is put twice.
Fixes: a33d970d0b54 ("batman-adv: Fix reference counting of vlan object for tt_local_entry") Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: s/_put/_free_ref/]
The tt_req_node is added to and removed from a list inside a spinlock. But
the lock is sometimes released while the object is still referenced and
will be used later via this reference. For example, batadv_send_tt_request
can create a new tt_req_node (including adding it to the list) and later
re-acquire the lock to remove it from the list and free it. But by that
time another context could have already removed this tt_req_node from the
list and freed it.
CPU#0
batadv_batman_skb_recv from net_device 0
-> batadv_iv_ogm_receive
-> batadv_iv_ogm_process
-> batadv_iv_ogm_process_per_outif
-> batadv_tvlv_ogm_receive
-> batadv_tvlv_ogm_receive
-> batadv_tvlv_containers_process
-> batadv_tvlv_call_handler
-> batadv_tt_tvlv_ogm_handler_v1
-> batadv_tt_update_orig
-> batadv_send_tt_request
-> batadv_tt_req_node_new
spin_lock(...)
allocates new tt_req_node and adds it to list
spin_unlock(...)
return tt_req_node
CPU#1
batadv_batman_skb_recv from net_device 1
-> batadv_recv_unicast_tvlv
-> batadv_tvlv_containers_process
-> batadv_tvlv_call_handler
-> batadv_tt_tvlv_unicast_handler_v1
-> batadv_handle_tt_response
spin_lock(...)
tt_req_node gets removed from list and is freed
spin_unlock(...)
CPU#0
<- returned to batadv_send_tt_request
spin_lock(...)
tt_req_node gets removed from list and is freed
MEMORY CORRUPTION/SEGFAULT/...
spin_unlock(...)
This can only be solved via reference counting to allow multiple contexts
to handle the list manipulation while making sure that only the last
context holding a reference will free the object.
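The reference-counting scheme can be sketched single-threaded. The list owns one reference and each context that keeps a pointer owns another. Only the context that actually unlinks the node drops the list's reference, and whoever drops the last reference frees it. In kernel code the counter would be a kref or atomic, and the `in_list` test and removal would happen under the list spinlock. `struct req_node`, `req_put()` and `req_unlink()` are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted list node. */
struct req_node {
	int refcount;
	bool in_list;
};

static int frees; /* counts frees, for the example */

static void req_put(struct req_node *n)
{
	if (--n->refcount == 0) {
		free(n);
		frees++;
	}
}

/* Called (under the list lock in real code) by any context that wants the
 * node gone. Only the first caller unlinks it and drops the list's
 * reference; later callers do nothing. */
static void req_unlink(struct req_node *n)
{
	if (n->in_list) {
		n->in_list = false;
		req_put(n);
	}
}
```

Replaying the CPU#0/CPU#1 trace against this model, CPU#1 unlinks the node and drops the list's reference. CPU#0's unlink then does nothing, and its final `req_put()` is the only free.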
Fixes: a73105b8d4c7 ("batman-adv: improved client announcement mechanism") Signed-off-by: Sven Eckelmann <sven@narfation.org> Tested-by: Martin Weinelt <martin@darmstadt.freifunk.net> Tested-by: Amadeus Alfa <amadeus@chemnitz.freifunk.net> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16:
- Adjust context
- Use list_empty() instead of hlist_unhashed()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If a VLAN tagged frame is received and the corresponding VLAN is not
configured on the soft interface, it will splat a WARN on every packet
received. This is quite annoying behaviour in some scenarios, e.g. if
bat0 is bridged with eth0, and there are arbitrary VLAN tagged frames
from Ethernet coming in without having any VLAN configuration on bat0.
The code should probably create vlan objects on the fly and
transparently transport these VLAN-tagged Ethernet frames, but until
this is done, at least the WARN splat should be replaced by a rate
limited output.
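The kernel already provides helpers for rate-limited output, for example `pr_warn_ratelimited()`. Which helper the patch uses is not stated here. To show what "rate limited" means, here is a minimal interval/burst limiter. `struct ratelimit` and `ratelimit_ok()` are hypothetical, and time is passed in as a plain number.

```c
#include <stdbool.h>

/* Allow at most `burst` messages per `interval` time units. */
struct ratelimit {
	long interval;
	int burst;
	long begin;
	int printed;
};

static bool ratelimit_ok(struct ratelimit *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		rs->begin = now; /* start a new window */
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;
}
```

With such a limiter guarding the message, a flood of unexpected VLAN-tagged frames produces a few log lines per interval instead of a WARN splat for every packet.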
Fixes: 354136bcc3c4 ("batman-adv: fix kernel crash due to missing NULL checks") Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The object tt_local is allocated with kmalloc and not yet initialized when
the function batadv_tt_local_add checks for the vlan. But this function can
only clean up the object when the (not yet initialized) reference counter of
the object is 1. This is unlikely, and thus the object would leak when the
vlan could not be found.
Instead, the uninitialized object tt_local has to be freed manually and the
pointer has to be set to NULL to avoid calling the function which would try
to decrement the reference counter of the non-existent object.
CID: 1316518 Fixes: 354136bcc3c4 ("batman-adv: fix kernel crash due to missing NULL checks") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If we have a system which uses fixed PHY devices and calls
fixed_phy_register() then fixed_phy_unregister() we can exhaust the
number of fixed PHYs available after a while, since we keep incrementing
the variable phy_fixed_addr, but we never decrement it.
This patch fixes that by converting the fixed PHY allocation to use
IDA, which takes care of the allocation/deallocation of the PHY
addresses for us.
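In the kernel this is done with the IDA API, for example `ida_simple_get()`/`ida_simple_remove()` in kernels of that era. The property that matters can be shown with a tiny lowest-free-ID allocator: freed addresses become available again, unlike a counter that only ever increments. `MAX_FIXED_PHYS`, `phy_addr_alloc()` and `phy_addr_free()` are hypothetical.

```c
#include <stdbool.h>

#define MAX_FIXED_PHYS 32 /* hypothetical number of addresses */

/* Tiny stand-in for an IDA: tracks which addresses are in use. */
static bool phy_addr_used[MAX_FIXED_PHYS];

/* Hand out the lowest free address, or -1 if all are taken. */
static int phy_addr_alloc(void)
{
	for (int i = 0; i < MAX_FIXED_PHYS; i++) {
		if (!phy_addr_used[i]) {
			phy_addr_used[i] = true;
			return i;
		}
	}
	return -1;
}

/* Return an address to the pool so a later register can reuse it. */
static void phy_addr_free(int addr)
{
	if (addr >= 0 && addr < MAX_FIXED_PHYS)
		phy_addr_used[addr] = false;
}
```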
Fixes: a75951217472 ("net: phy: extend fixed driver with fixed_phy_register()") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16:
- Adjust filename, context
- fixed_phy_register() returns an integer, not a pointer/error] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Currently we have 2 segments that are bolted for the kernel linear
mapping (i.e. 0xc000... addresses): 0 to 1TB, and the kernel
stacks. Anything accessed outside of these regions may need to be
faulted in. (In practice, machines with TM always have 1TB segments.)
If a machine has < 2TB of memory we never fault on the kernel linear
mapping as these two segments cover all physical memory. If a machine
has > 2TB of memory, there may be structures outside of these two
segments that need to be faulted in. This faulting can occur when
running as a guest as the hypervisor may remove any SLB that's not
bolted.
When we treclaim and trecheckpoint we have a window where we need to
run with the userspace GPRs. This means that we no longer have a valid
stack pointer in r1. For this window we therefore clear MSR RI to
indicate that any exceptions taken at this point won't be able to be
handled. This means that we can't take segment misses in this RI=0
window.
In this RI=0 region, we currently access the thread_struct for the
process being context switched to or from. This thread_struct access
may cause a segment fault since it's not guaranteed to be covered by
the two bolted segment entries described above.
We've seen this with a crash when running as a guest with > 2TB of
memory on PowerVM:
This patch fixes it by copying the required data from the thread_struct to
the stack before we clear MSR RI. Then, once RI is cleared, we only access
the stack, guaranteeing there's no segment miss.
We also tighten the region over which we set RI=0 on the treclaim()
path. This may have a slight performance impact since we're adding an
mtmsr instruction.
Fixes: 090b9284d725 ("powerpc/tm: Clear MSR RI in non-recoverable TM code") Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If we fall back to using LSI on the Croc or Crocodile chip we need to
clear the interrupt so we don't hang the system.
Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Olga Kornievskaia reports that the following test fails to trigger
an OPEN_DOWNGRADE on the wire, and only triggers the final CLOSE.
fd0 = open(foo, O_RDWR) -- should be open on the wire for "both"
fd1 = open(foo, O_RDONLY) -- should be open on the wire for "read"
close(fd0) -- should trigger an open_downgrade
read(fd1)
close(fd1)
The issue is that we're missing a check for whether or not the current
state transitioned from an O_RDWR state as opposed to having transitioned
from a combination of O_RDONLY and O_WRONLY.
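The reported test sequence, written as a small POSIX C program, is below. Run it against a file on an NFSv4 mount while capturing traffic. With the fix, `close(fd0)` should show up as an OPEN_DOWNGRADE before the final CLOSE. The program only checks that the system calls succeed; it does not observe the wire itself. `downgrade_sequence()` is a hypothetical name.

```c
#include <fcntl.h>
#include <unistd.h>

/* Replays: open RDWR, open RDONLY, close the RDWR fd, read, close.
 * Returns 0 if every call succeeds, -1 otherwise. */
static int downgrade_sequence(const char *path)
{
	char c;
	int fd0 = open(path, O_RDWR);   /* open for "both" */

	if (fd0 < 0)
		return -1;
	int fd1 = open(path, O_RDONLY); /* open for "read" */
	if (fd1 < 0) {
		close(fd0);
		return -1;
	}
	close(fd0);                     /* should trigger OPEN_DOWNGRADE */
	ssize_t n = read(fd1, &c, 1);
	close(fd1);                     /* final CLOSE */
	return n < 0 ? -1 : 0;
}
```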
Reported-by: Olga Kornievskaia <aglo@umich.edu> Fixes: cd9288ffaea4 ("NFSv4: Fix another bug in the close/open_downgrade code") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The bridge is falsely dropping IPv6 multicast packets if there is:
1. No IPv6 address assigned on the bridge.
2. No external MLD querier present.
3. The internal querier enabled.
When the bridge fails to build MLD queries because it has no
IPv6 address, it silently returns, but keeps the local querier enabled.
This specific case causes confusing packet loss.
IPv6 multicast snooping can only work if:
a) An external querier is present
OR
b) The bridge has an IPv6 address and is capable of sending its own queries
Otherwise it has to forward/flood the IPv6 multicast traffic,
because snooping cannot work.
This patch fixes the issue by adding a flag to the bridge struct that
indicates that there is currently no IPv6 address assigned to the bridge
and returns a false state for the local querier in
__br_multicast_querier_exists().
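Conditions a) and b) above reduce to a simple predicate. In this sketch `struct br_mld_state`, `mld_querier_exists()` and `must_flood()` are hypothetical and do not reproduce the bridge code. `has_ipv6_addr` plays the role of the flag the patch adds.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the bridge state relevant to MLD snooping. */
struct br_mld_state {
	bool other_querier_present; /* an external MLD querier was seen */
	bool local_querier_enabled;
	bool has_ipv6_addr;         /* role of the flag added by the fix */
};

/* Snooping is only trustworthy if someone actually sends queries: an
 * external querier, or our own querier when it can build queries. */
static bool mld_querier_exists(const struct br_mld_state *s)
{
	if (s->other_querier_present)
		return true;
	return s->local_querier_enabled && s->has_ipv6_addr;
}

/* Without a working querier, IPv6 multicast must be flooded. */
static bool must_flood(const struct br_mld_state *s)
{
	return !mld_querier_exists(s);
}
```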
Special thanks to Linus Lüssing.
Fixes: d1d81d4c3dd8 ("bridge: check return value of ipv6_dev_get_saddr()") Signed-off-by: Daniel Danzberger <daniel@dd-wrt.com> Acked-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If a user space program (e.g., wpa_supplicant) deletes a STA entry that
is currently in NL80211_PLINK_ESTAB state, the number of established
plinks counter was not decremented, which could result in new plink
establishment being rejected before the real maximum plink limit was
reached. For the !user_mpm case, this decrement is handled by
mesh_plink_deactive().
Fix this by decrementing estab_plinks on STA deletion
(mesh_sta_cleanup() gets called from there) so that the counter has a
correct value and the Beacon frame advertisement in Mesh Configuration
element shows the proper value for capability to accept additional
peers.
Signed-off-by: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
[bwh: Backported to 3.16: plink_state field is in struct sta_info] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
bgmac_open() calls phy_start() to initialize the PHY state machine,
which will set the interface's carrier state accordingly. There is no need
to force it, as doing so could conflict with the PHY state determined by
PHYLIB.
Fixes: dd4544f05469 ("bgmac: driver for GBit MAC core on BCMA bus") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The driver does not start the transmit queue in bgmac_open(). If the
queue was stopped prior to closing and then re-opening the interface, it
could never be woken up again.
Fixes: dd4544f05469 ("bgmac: driver for GBit MAC core on BCMA bus") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The test_fp_ctl function is used to test if a given value is a valid
floating-point control. The inline assembly in test_fp_ctl uses an
incorrect constraint for the 'orig_fpc' variable. If the compiler
chooses the same register for 'fpc' and 'orig_fpc' the test_fp_ctl()
function always returns true. This allows user space to trigger
kernel oopses with invalid floating-point control values on the
signal stack.
This problem was introduced with git commit 4725c86055f5bbdcdf
"s390: fix save and restore of the floating-point-control register"
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If CONFIG_ARC_DW2_UNWIND is disabled, the following message gets printed
on the debug console every time arc_unwind_core() is called:
----------------->8---------------
CONFIG_ARC_DW2_UNWIND needs to be enabled
----------------->8---------------
That message makes sense if the user indeed wants to see a backtrace or
get nice function call-graphs in perf, but what if the user disabled the
unwinder on purpose? Why pollute their debug console?
So instead we warn the user about the possibly missing feature once and
let them decide whether that was what they really wanted.
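A userspace sketch of the warn-once pattern (inside the kernel, pr_warn_once() provides this). The function name is hypothetical and, unlike the kernel helper, this version is not meant to be thread-safe.

```c
#include <stdbool.h>
#include <stdio.h>

/* Print the warning on the first call only; returns true if it printed. */
static bool warn_unwinder_missing_once(void)
{
	static bool warned;  /* zero-initialized, persists across calls */

	if (warned)
		return false;
	warned = true;
	fprintf(stderr, "CONFIG_ARC_DW2_UNWIND needs to be enabled\n");
	return true;
}
```

Every later call is silent, so repeated unwind attempts no longer flood the console.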
With recent binutils update to support dwarf CFI pseudo-ops in gas, we
now get .eh_frame vs. .debug_frame. Although the call frame info is
exactly the same in both, the CIE differs, which the current kernel
unwinder can't cope with.
This broke both the kernel unwinder and loadable modules (the latter
because of a new unhandled relo R_ARC_32_PCREL from .rela.eh_frame in
the module loader).
The ideal solution would be to switch the unwinder to .eh_frame.
For now, however, we can make do by just ensuring .debug_frame is
generated, by removing -fasynchronous-unwind-tables:
.eh_frame generated with -gdwarf-2 -fasynchronous-unwind-tables
.debug_frame generated with -gdwarf-2
'commpage_bak' is allocated with 'sizeof(struct echoaudio)' bytes.
We then copy 'sizeof(struct comm_page)' bytes in it.
On my system, smatch complains because one is 2960 and the other is 3072.
This would result in memory corruption or an oops.
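A minimal sketch of the safer idiom, assuming a simplified stand-in type (the 3072-byte size mirrors the smatch report above and is otherwise arbitrary). Sizing the allocation with sizeof(*bak) ties it to the type that is actually copied, so the two sizes cannot drift apart. The driver would use kmalloc(); malloc() keeps the sketch self-contained.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in; only its size matters for this illustration. */
struct comm_page { unsigned char data[3072]; };

/* Allocate the backup with the size of the object copied into it. */
static struct comm_page *backup_comm_page(const struct comm_page *src)
{
	struct comm_page *bak = malloc(sizeof(*bak));

	if (!bak)
		return NULL;
	/* Sizing bak by a smaller, unrelated struct would make this overrun. */
	memcpy(bak, src, sizeof(*bak));
	return bak;
}
```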
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The USB core contains a bug that can show up when a USB-3 host
controller is removed. If the primary (USB-2) hcd structure is
released before the shared (USB-3) hcd, the core will try to do a
double-free of the common bandwidth_mutex.
The problem was described in graphical form by Chung-Geol Kim, who
first reported it:
This happens because hcd_release() frees the bandwidth_mutex whenever
it sees a primary hcd being released (which is not a very good idea
in any case), but in the course of releasing the primary hcd, it
changes the pointers in the shared hcd in such a way that the shared
hcd will appear to be primary when it gets released.
This patch fixes the problem by changing hcd_release() so that it
deallocates the bandwidth_mutex only when the _last_ hcd structure
referencing it is released. The patch also removes an unnecessary
test, so that when an hcd is released, both the shared_hcd and
primary_hcd pointers in the hcd's peer will be cleared.
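A simplified model of the fixed release logic, assuming each hcd's shared_hcd field points at its peer for as long as that peer is alive. The struct is a hypothetical stand-in for struct usb_hcd, with an int standing in for the mutex: whichever hcd is released first only detaches itself from its peer, clearing both of the peer's pointers, and the last one frees the shared mutex.

```c
#include <stdlib.h>

/* Hypothetical, stripped-down model of two hcds sharing one mutex. */
struct hcd {
	struct hcd *shared_hcd;   /* the peer, while it has not been released */
	struct hcd *primary_hcd;  /* the primary (USB-2) hcd */
	int *bandwidth_mutex;     /* stands in for the shared struct mutex */
};

/* Free the shared resource only when no peer still references it. */
static void hcd_release(struct hcd *hcd)
{
	struct hcd *peer = hcd->shared_hcd;

	if (peer) {
		/* Detach from the survivor; clear both links unconditionally. */
		peer->shared_hcd = NULL;
		peer->primary_hcd = NULL;
	} else {
		/* Last hcd referencing the mutex: safe to free it now. */
		free(hcd->bandwidth_mutex);
	}
	free(hcd);
}
```

Releasing the primary first and the shared hcd second (the problematic order) frees the mutex exactly once.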
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Chung-Geol Kim <chunggeol.kim@samsung.com> Tested-by: Chung-Geol Kim <chunggeol.kim@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16: free only usb_hcd::bandwidth_mutex, not
usb_hcd::address0_mutex too] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In "NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code",
the unconditional d_drop() after ->open_context() was removed. That removal
was correct for success cases (there ->open_context() itself had been doing
dcache manipulations), but not for error ones. Only one of those (ENOENT)
got a compensatory d_drop() added in that commit, but in fact it should've
been done for all errors. As it is, the case of O_CREAT non-exclusive open
on a hashed negative dentry racing with e.g. symlink creation from another
client ended up with ->open_context() getting an error and proceeding to
call nfs_lookup() on a hashed dentry, which would've instantly triggered
BUG_ON() in d_materialise_unique() (or, these days, its equivalent in
d_splice_alias()).
Tested-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Userspace can quite legitimately perform an exec() syscall with a
suspended transaction. exec() does not return to the old process; rather,
it loads a new one and starts that, so the expectation is that the
new process does not start in a transaction. Currently exec() is not treated
any differently from any other syscall, which creates problems.
Firstly it could allow a new process to start with a suspended
transaction for a binary that no longer exists. This means that the
checkpointed state won't be valid and if the suspended transaction were
ever to be resumed and subsequently aborted (a possibility which is
exceedingly likely as exec()ing will likely doom the transaction) the
new process will jump to invalid state.
Secondly the incorrect attempt to keep the transactional state while
still zeroing state for the new process creates at least two TM Bad
Things. The first triggers on the rfid to return to userspace as
start_thread() has given the new process a 'clean' MSR but the suspend
will still be set in the hardware MSR. The second TM Bad Thing triggers
in __switch_to() as the processor is still transactionally suspended but
__switch_to() wants to zero the TM sprs for the new process.
This is an example of the outcome of calling exec() with a suspended
transaction. Note the first 700 is likely the first TM Bad Thing
described earlier, only the kernel can't report it as we've loaded
userspace registers. c000000000009980 is the rfid in
fast_exception_return()
Currently the ad7266 driver treats any failure to get vref as though the
regulator were not present but this means that if probe deferral is
triggered the driver will act as though the regulator were not present.
Instead only use the internal reference if we explicitly got -ENODEV which
is what is returned for absent regulators.
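The intended decision can be sketched as a pure function over the error code. The enum and function are hypothetical; in the driver the error would come from PTR_ERR() on the regulator lookup result. Only -ENODEV selects the internal reference, while every other error, including a probe deferral, is returned to the caller.

```c
#include <errno.h>

/* Hypothetical outcome of looking up the optional "vref" supply. */
enum vref_choice { VREF_EXTERNAL, VREF_INTERNAL, VREF_FAIL };

/*
 * err is 0 if the regulator was obtained, otherwise a negative errno.
 * Only -ENODEV means "no regulator described"; anything else (such as a
 * probe deferral in the kernel) must be propagated rather than masked.
 */
static enum vref_choice choose_vref(int err)
{
	if (err == 0)
		return VREF_EXTERNAL;
	if (err == -ENODEV)
		return VREF_INTERNAL;
	return VREF_FAIL;
}
```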
Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The ad7266 driver attempts to support deciding between the use of internal
and external power supplies by checking to see if an error is returned when
requesting the regulator. This doesn't work with the current code since the
driver uses a normal regulator_get() which is for non-optional supplies
and so assumes that if a regulator is not provided by the platform then
this is a bug in the platform integration and so substitutes a dummy
regulator. Use regulator_get_optional() instead, which indicates to the
framework that the regulator may be absent, so that it returns an error
instead of providing a dummy regulator.
Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
All regulator_get() variants return either a pointer to a regulator or an
ERR_PTR() so testing for NULL makes no sense and may lead to bugs if we
use NULL as a valid regulator. Fix this by using IS_ERR() as expected.
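To see why a NULL test misses failures, here is a userspace re-creation of the ERR_PTR()/IS_ERR() convention from the kernel's <linux/err.h>. The lowercase names are stand-ins, and the casts assume a typical flat address space: an encoded error is a non-NULL pointer in the top MAX_ERRNO addresses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Errors are encoded in the last MAX_ERRNO values of the address space. */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error)
{
	return (void *)error;  /* e.g. -ENODEV becomes a very high address */
}

static inline long ptr_err(const void *ptr)
{
	return (long)ptr;
}

static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
```

An encoded error such as err_ptr(-19) is non-NULL, so `if (!reg)` lets it through, while is_err() catches it.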
Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
These two spi_w8r8() calls return a value which is used by the code
following the error check. The dubious use was caused by a cleanup
patch.
Fixes: d34dbee8ac8e ("staging:iio:accel:kxsd9 cleanup and conversion to iio_chan_spec.") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
sca3000_read_ctrl_reg() returns a negative number on failure, check for
this instead of zero.
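A minimal sketch of the corrected check, using a hypothetical register-read helper that returns either the register value or a negative errno. Since 0 is a valid register value, only a negative return can signal failure. The 0x07 mask is illustrative only.

```c
#include <errno.h>

/* Hypothetical register read: value (0..255) on success, -errno on failure. */
static int read_ctrl_reg(int fail)
{
	return fail ? -EIO : 0;  /* 0 is a legitimate register value */
}

/* Check for a negative return, not for zero, before using the value. */
static int get_mode(int fail, int *mode)
{
	int ret = read_ctrl_reg(fail);

	if (ret < 0)
		return ret;
	*mode = ret & 0x07;  /* illustrative field mask */
	return 0;
}
```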
Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The value `bytes' comes from the filesystem which is about to be
mounted. We cannot trust that the value is always in the range we
expect it to be.
Check its value before using it to calculate the length for the crc32_le
call. Its value must be greater than or equal to sumoff + 4.
This fixes a kernel bug when accidentally mounting an image file which
had the nilfs2 magic value 0x3434 at the right offset 0x406 by chance.
The bytes 0x01 0x00 were stored at 0x408 and were interpreted as an
s_bytes value of 1. This caused an underflow when subtracting sumoff +
4 (20) in the call to crc32_le.
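A sketch of the bounds check, assuming sumoff is 16 (so that sumoff + 4 is the 20 mentioned above). The helper name is hypothetical, and it only computes the length that would be passed to crc32_le().

```c
#include <errno.h>
#include <stdint.h>

/*
 * Length of the superblock tail to checksum after the 4-byte checksum
 * field at sumoff. 'bytes' comes from disk and is untrusted.
 */
static int sb_crc_tail_len(uint32_t bytes, uint32_t sumoff, uint32_t *len)
{
	/* Without this check, bytes - (sumoff + 4) wraps to a huge length. */
	if (bytes < sumoff + 4)
		return -EINVAL;
	*len = bytes - (sumoff + 4);
	return 0;
}
```

With the corrupted s_bytes value of 1, the helper now rejects the superblock instead of producing a length of nearly 4 GiB.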
When fallocate is interrupted it will undo a range that extends one byte
past its range of allocated pages. This can corrupt an in-use page by
zeroing out its first byte. Instead, undo using the inclusive byte
range.
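The off-by-one can be illustrated with plain arithmetic, assuming 4 KiB pages and an undo helper whose end argument is inclusive, as described above. The page indexes start and end delimit the half-open range [start, end) of pages to undo, and the names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* Inclusive byte range [*lstart, *lend] covering pages [start, end). */
static void undo_byte_range(uint64_t start, uint64_t end,
			    uint64_t *lstart, uint64_t *lend)
{
	*lstart = start << SKETCH_PAGE_SHIFT;
	/* Using end << SKETCH_PAGE_SHIFT would reach the first byte of page 'end'. */
	*lend = (end << SKETCH_PAGE_SHIFT) - 1;
}
```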
Fixes: 1635f6a74152f1d ("tmpfs: undo fallocation on failure") Link: http://lkml.kernel.org/r/1462713387-16724-1-git-send-email-anthony.romano@coreos.com Signed-off-by: Anthony Romano <anthony.romano@coreos.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Brandon Philips <brandon@ifup.co> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16: use PAGE_CACHE_SHIFT instead of PAGE_SHIFT] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Azure server blocks clients that open a socket and don't do anything on it.
In our reconnect scenarios, we can reconnect the tcp session and
detect the socket is available but we defer the negprot and SMB3 session
setup and tree connect reconnection until the next i/o is requested, but
this looks suspicious to some servers, which expect SMB3 negprot and session
setup soon after a socket is created.
In the echo thread, reconnect SMB3 sessions and tree connections
that are disconnected. A later patch will replay persistent (and
resilient) handle opens.
Signed-off-by: Steve French <steve.french@primarydata.com> Acked-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Use set_posix_acl, which includes proper permission checks, instead of
calling ->set_acl directly. Without this anyone may be able to grant
themselves permissions to a file by setting the ACL.
Lock the inode to make the new checks atomic with respect to set_acl.
(Also, nfsd was the only caller of set_acl not locking the inode, so I
suspect this may fix other races.)
This also simplifies the code, and ensures our ACLs are checked by
posix_acl_valid.
The permission checks and the inode locking were lost with commit 4ac7249e, which changed nfsd to use the set_acl inode operation directly
instead of going through xattr handlers.
Reported-by: David Sinquin <david@sinquin.eu>
[agreunba@redhat.com: use set_posix_acl] Fixes: 4ac7249e Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[carnil: backport for 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Factor out part of posix_acl_xattr_set into a common function that takes
a posix_acl, which nfsd can also call.
The prototype already exists in include/linux/posix_acl.h.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[carnil: backport to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This patch validates the num_values parameter from userland during the
HIDIOCGUSAGES and HIDIOCSUSAGES commands. Previously, if the report id was set
to HID_REPORT_ID_UNKNOWN, we would fail to validate the num_values parameter
leading to a heap overflow.
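A generic sketch of the kind of validation described, with hypothetical names. A user-supplied index and count are checked against the number of usages the field actually has, and against an assumed per-request cap, before anything is copied. The comparison is written so that it cannot itself overflow.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Validate a user-supplied (usage_index, num_values) pair.
 * field_usages: usages the field really has.
 * max_values:   assumed cap on values per request.
 */
static int check_usage_range(size_t usage_index, size_t num_values,
			     size_t field_usages, size_t max_values)
{
	if (num_values > max_values)
		return -EINVAL;
	/* Subtraction form: usage_index + num_values could wrap around. */
	if (usage_index > field_usages ||
	    num_values > field_usages - usage_index)
		return -EINVAL;
	return 0;
}
```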
Signed-off-by: Scott Bauer <sbauer@plzdonthack.me> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In sess_auth_rawntlmssp_authenticate(), the ntlmssp blob is allocated
statically and its size is an "empirical" 5*sizeof(struct
_AUTHENTICATE_MESSAGE) (320B on x86_64). I don't know where this value
comes from or if it was ever appropriate, but it is currently
insufficient: the user and domain name in UTF16 could take 1kB by
themselves. Because of that, build_ntlmssp_auth_blob() might corrupt
memory (out-of-bounds write). The size of ntlmssp_blob in
SMB2_sess_setup() is too small too (sizeof(struct _NEGOTIATE_MESSAGE)
+ 500).
This patch allocates the blob dynamically in
build_ntlmssp_auth_blob().
Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
[bwh: Backported to 3.16: adjust context, indentation] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Currently in build_ntlmssp_auth_blob(), when converting the domain
name to UTF16, CIFS_MAX_USERNAME_LEN limit is used. It should be
CIFS_MAX_DOMAINNAME_LEN. This patch fixes this.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The touchpad in HP Pavilion 14-ab057ca reports its version as 12 and
according to Elan both 11 and 12 are valid IC types and should be
identified as hw_version 4.
Reported-by: Patrick Lessard <Patrick.Lessard@cogeco.com> Tested-by: Patrick Lessard <Patrick.Lessard@cogeco.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When RC, UC, or RAW QPs are created, a qp object is allocated (kzalloc).
If at a later point (in procedure create_qp_common) the qp creation fails,
this qp object must be freed.
Fixes: 1ffeb2eb8be99 ("IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support") Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In procedure mlx4_ib_create_flow, passing an invalid port number
will cause an out-of-bounds array access. Data passed to this procedure
can come from user space. Therefore, the port number must be validated
before proceeding.
Note that we check against the number of physical ports declared at
the verbs (ib core) level; when bonding is active, the verbs level
sees one physical port, even though the low-level driver sees two ports.
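A minimal sketch of the check, assuming the usual 1-based numbering of InfiniBand HCA ports and a port count taken from the verbs level. The function name is hypothetical.

```c
#include <errno.h>

/*
 * Reject port numbers outside 1..num_ports before using them as an
 * array index. num_ports is the verbs-level count (1 when bonding).
 */
static int validate_port(unsigned int port, unsigned int num_ports)
{
	if (port < 1 || port > num_ports)
		return -EINVAL;
	return 0;
}
```

With bonding active, port 2 is rejected even though the low-level driver sees two ports.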
Fixes: f77c0162a339 ("IB/mlx4: Add receive flow steering support") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
[bwh: Backported to 3.16:
- Adjust context
- Function returns an integer, not a pointer/error] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fix mad send error flow to prevent double freeing address handles,
and leaking tx_ring entries when SRIOV is active.
If ib_mad_post_send fails, the address handle pointer in the tx_ring entry
must be set to NULL (or there will be a double-free) and tx_tail must be
incremented (or there will be a leak of tx_ring entries).
The tx_ring is handled the same way in the send-completion handler.
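The error-path bookkeeping can be modelled with a small ring. The types and sizes are hypothetical, and free() stands in for destroying the address handle. On a failed post, the slot's pointer is cleared and the tail advanced, just as a send completion would do.

```c
#include <errno.h>
#include <stdlib.h>

#define RING_SIZE 8  /* hypothetical; must be a power of two */

/* Hypothetical send ring: each slot holds the handle to free on completion. */
struct tx_ring {
	void *ah[RING_SIZE];
	unsigned int head;  /* next slot to fill */
	unsigned int tail;  /* next slot to reclaim */
};

/* Post one send; post_ok simulates the result of the hardware post. */
static int post_send(struct tx_ring *r, void *ah, int post_ok)
{
	unsigned int slot = r->head++ & (RING_SIZE - 1);

	r->ah[slot] = ah;
	if (!post_ok) {
		/* Destroy the handle once and forget it, so nothing frees it again. */
		free(ah);
		r->ah[slot] = NULL;
		/* Reclaim the slot, or the ring slowly leaks entries. */
		r->tail++;
		return -EIO;
	}
	return 0;
}
```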
Fixes: 37bfc7c1e83f ("IB/mlx4: SR-IOV multiplex and demultiplex MADs") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When calculating the required size of an RC QP send queue, leave
enough space for masked atomic operations, which require more space than
"regular" atomic operations.
Fixes: 6fa8f719844b ("IB/mlx4: Add support for masked atomic operations") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@mellanox.co.il> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If the caller specified IB_SEND_FENCE in the send flags of the work
request and no previous work request stated that the successive one
should be fenced, the work request would be executed without a fence.
This could result in RDMA read or atomic operation failures due to an MR
being invalidated. Fix this by adding the mlx5 enumeration for fencing
RDMA/atomic operations and fixing the logic to apply it.
Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The current overlap check evaluates to false when a filter field is
fully contained in (a proper subset of) an r/w request. This change
applies the classical overlap check instead, to include all the
scenarios.
More specifically, for (Hilscher GmbH CIFX 50E-DP(M/S)) device driver
the logic is such that the entire confspace is read and written in 4
byte chunks. In this case as an example, CACHE_LINE_SIZE,
LATENCY_TIMER and PCI_BIST are arriving together in one call to
xen_pcibk_config_write() with offset == 0xc and size == 4. With the
existing overlap check, the LATENCY_TIMER field (offset == 0xd, length
== 1) is fully contained in the write request and hence is excluded
from write, which is incorrect.
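The classical test is easiest to see on half-open byte ranges. This sketch (hypothetical function name) returns true for partial overlap on either side and for containment in either direction, including the LATENCY_TIMER example above.

```c
#include <stdbool.h>

/*
 * Overlap of [req_off, req_off + req_size) and
 * [field_off, field_off + field_size): each range must start
 * before the other one ends.
 */
static bool ranges_overlap(unsigned int req_off, unsigned int req_size,
			   unsigned int field_off, unsigned int field_size)
{
	return req_off < field_off + field_size &&
	       field_off < req_off + req_size;
}
```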
Signed-off-by: Andrey Grodzovsky <andrey2805@gmail.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
For 'real' hardware CAN devices the netlink interface is used to set CAN
specific communication parameters. Real CAN hardware can be neither created
nor removed with the ip tool.
This patch adds a private dellink function for the CAN device driver interface
that does nothing.
It's a follow up to commit 993e6f2fd ("can: fix oops caused by wrong rtnl
newlink usage") but for dellink.
Reported-by: ajneu <ajneu1@gmail.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Classic BPF JIT was never ported completely to work on little endian
powerpc. However, it can be enabled and will crash the system when used.
As such, disable use of BPF JIT on ppc64le.
Fixes: 7c105b63bd98 ("powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.") Reported-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[bwh: Backported to 3.16: config symbol is BPF_JIT and also depends on PPC64] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
During page migrations UBIFS might get confused
and the following assert triggers:
[ 213.480000] UBIFS assert failed in ubifs_set_page_dirty at 1451 (pid 436)
[ 213.490000] CPU: 0 PID: 436 Comm: drm-stress-test Not tainted 4.4.4-00176-geaa802524636-dirty #1008
[ 213.490000] Hardware name: Allwinner sun4i/sun5i Families
[ 213.490000] [<c0015e70>] (unwind_backtrace) from [<c0012cdc>] (show_stack+0x10/0x14)
[ 213.490000] [<c0012cdc>] (show_stack) from [<c02ad834>] (dump_stack+0x8c/0xa0)
[ 213.490000] [<c02ad834>] (dump_stack) from [<c0236ee8>] (ubifs_set_page_dirty+0x44/0x50)
[ 213.490000] [<c0236ee8>] (ubifs_set_page_dirty) from [<c00fa0bc>] (try_to_unmap_one+0x10c/0x3a8)
[ 213.490000] [<c00fa0bc>] (try_to_unmap_one) from [<c00fadb4>] (rmap_walk+0xb4/0x290)
[ 213.490000] [<c00fadb4>] (rmap_walk) from [<c00fb1bc>] (try_to_unmap+0x64/0x80)
[ 213.490000] [<c00fb1bc>] (try_to_unmap) from [<c010dc28>] (migrate_pages+0x328/0x7a0)
[ 213.490000] [<c010dc28>] (migrate_pages) from [<c00d0cb0>] (alloc_contig_range+0x168/0x2f4)
[ 213.490000] [<c00d0cb0>] (alloc_contig_range) from [<c010ec00>] (cma_alloc+0x170/0x2c0)
[ 213.490000] [<c010ec00>] (cma_alloc) from [<c001a958>] (__alloc_from_contiguous+0x38/0xd8)
[ 213.490000] [<c001a958>] (__alloc_from_contiguous) from [<c001ad44>] (__dma_alloc+0x23c/0x274)
[ 213.490000] [<c001ad44>] (__dma_alloc) from [<c001ae08>] (arm_dma_alloc+0x54/0x5c)
[ 213.490000] [<c001ae08>] (arm_dma_alloc) from [<c035cecc>] (drm_gem_cma_create+0xb8/0xf0)
[ 213.490000] [<c035cecc>] (drm_gem_cma_create) from [<c035cf20>] (drm_gem_cma_create_with_handle+0x1c/0xe8)
[ 213.490000] [<c035cf20>] (drm_gem_cma_create_with_handle) from [<c035d088>] (drm_gem_cma_dumb_create+0x3c/0x48)
[ 213.490000] [<c035d088>] (drm_gem_cma_dumb_create) from [<c0341ed8>] (drm_ioctl+0x12c/0x444)
[ 213.490000] [<c0341ed8>] (drm_ioctl) from [<c0121adc>] (do_vfs_ioctl+0x3f4/0x614)
[ 213.490000] [<c0121adc>] (do_vfs_ioctl) from [<c0121d30>] (SyS_ioctl+0x34/0x5c)
[ 213.490000] [<c0121d30>] (SyS_ioctl) from [<c000f2c0>] (ret_fast_syscall+0x0/0x34)
UBIFS is using PagePrivate() which can have different meanings across
filesystems. Therefore the generic page migration code cannot handle this
case correctly.
We have to implement our own migration function which basically does a
plain copy but also duplicates the page private flag.
UBIFS is not a block device filesystem and cannot use buffer_migrate_page().
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
[rw: Massaged changelog, build fixes, etc...] Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
recover_peb() was never power-cut aware: if a power cut happened right
after writing the VID header, upon the next attach UBI would blindly use
the new, partially written PEB, and all data from the old PEB would be lost.
In order to make recover_peb() power cut aware, write the new
VID with a proper crc and copy_flag set such that the UBI attach
process will detect whether the new PEB is completely written
or not.
We cannot directly use ubi_eba_atomic_leb_change() since we'd
have to unlock the LEB which is facing a write error.
Reported-by: Jörg Pfähler <pfaehler@isse.de> Reviewed-by: Jörg Pfähler <pfaehler@isse.de> Signed-off-by: Richard Weinberger <richard@nod.at>
[bwh: Backported to 3.16: no need to unlock ubi->fm_eba_sem on error] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Modify mlx4_en_vlan_rx_[add/kill]_vid to return error value in case of
failure.
Fixes: 8e586137e6b6 ('net: make vlan ndo_vlan_rx_[add/kill]_vid return error value') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
With many repeated suspend resume cycles, the pin specific wakeirq
may not always work on omaps. This is because the write to enable the
pin interrupt may not have reached the device over the interconnect
before suspend happens.
Let's fix the issue with a flush of posted write with a readback.
Reported-by: Nishanth Menon <nm@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
__sync_icache_dcache unconditionally skips the cache maintenance for
anonymous pages, under the assumption that flushing is only required in
the presence of D-side aliases [see 7249b79f6b4cc ("arm64: Do not flush
the D-cache for anonymous pages")].
Unfortunately, this breaks migration of anonymous pages holding
self-modifying code, where userspace cannot be reasonably expected to
reissue maintenance instructions in response to a migration.
This patch fixes the problem by removing the broken page_mapping(page)
check from the cache syncing code, otherwise we may end up fetching and
executing stale instructions from the PoU.
Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If a task uses a non-constant string for the format parameter in
trace_printk(), then the trace_printk_fmt variable is set to NULL. This
variable is then saved in the __trace_printk_fmt section.
The function hold_module_trace_bprintk_format() checks to see if duplicate
formats are used by modules, and reuses them if so (saves them to the list
if it is new). But this function calls lookup_format(), which does a strcmp()
on the value (which is now NULL) and can cause a kernel oops.
This wasn't an issue till 3debb0a9ddb ("tracing: Fix trace_printk() to print
when not using bprintk()") which added "__used" to the trace_printk_fmt
variable, and before that, the kernel simply optimized it out (no NULL value
was saved).
The fix is simply to handle the NULL pointer in lookup_format() and have the
caller ignore the value if it was NULL.
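A userspace sketch of the NULL guard. A hypothetical fixed table stands in for the kernel's list of saved formats, and a NULL format is treated as "not found" instead of being passed to strcmp().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table of already-registered format strings. */
static const char *const known_fmts[] = { "a=%d\n", "b=%s\n" };

/* Return the stored copy of fmt, or NULL if fmt is NULL or unknown. */
static const char *lookup_format(const char *fmt)
{
	size_t i;

	/* A non-constant trace_printk() format leaves a NULL entry behind. */
	if (!fmt)
		return NULL;
	for (i = 0; i < sizeof(known_fmts) / sizeof(known_fmts[0]); i++)
		if (strcmp(known_fmts[i], fmt) == 0)
			return known_fmts[i];
	return NULL;
}
```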
Link: http://lkml.kernel.org/r/1464769870-18344-1-git-send-email-zhengjun.xing@intel.com Reported-by: xingzhen <zhengjun.xing@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Fixes: 3debb0a9ddb ("tracing: Fix trace_printk() to print when not using bprintk()") Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
At high bus load it could happen that "at91_poll()" enters with all RX
message boxes filled up. If then at the end the "quota" is exceeded as
well, "rx_next" will not be reset to the first RX mailbox and hence the
interrupts remain disabled.
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com> Tested-by: Amr Bekhit <amrbekhit@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When testing CAN write floods on Altera's CycloneV, the first 2 bytes
are sometimes 0x00, 0x00 or corrupted instead of the values sent. Bytes
4 and 5 were also observed to be corrupted in some cases.
The D_CAN Data registers are 32 bits and changing from 16 bit writes to
32 bit writes fixes the problem.
Testing performed on Altera CycloneV (D_CAN). Requesting tests on other
C_CAN & D_CAN platforms.
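For illustration, a sketch that builds each 32-bit data register value from four payload bytes, so that each register can be written with one 32-bit access instead of two 16-bit ones. The byte order (byte 0 in the least significant position) and the function name are assumptions and are not taken from the c_can driver.

```c
#include <stdint.h>

/*
 * Pack eight CAN payload bytes into two 32-bit register values.
 * Assumption: byte 0 lands in bits 7:0 of the first word.
 */
static void pack_can_data(const uint8_t data[8], uint32_t words[2])
{
	for (int w = 0; w < 2; w++)
		words[w] = (uint32_t)data[4 * w] |
			   (uint32_t)data[4 * w + 1] << 8 |
			   (uint32_t)data[4 * w + 2] << 16 |
			   (uint32_t)data[4 * w + 3] << 24;
}
```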
Reported-by: Richard Andrysek <richard.andrysek@gomtec.de> Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
For security reasons an ordinary user must not be able to control fan speed
via /proc/i8k by default. Malicious software running as the "nobody"
user could turn the fan off and cause hardware problems. So this patch
changes the default value of the "restricted" parameter to 1.
Also restrict reading of DMI_PRODUCT_SERIAL from /proc/i8k via the "restricted"
parameter. This is because a non-root user cannot read DMI_PRODUCT_SERIAL from
the sysfs file /sys/class/dmi/id/product_serial either.
The old, insecure behaviour of /proc/i8k can be restored by loading this
module with the "restricted" parameter set to 0.
Note that this patch only affects kernels compiled with CONFIG_I8K
and only the file /proc/i8k. The hwmon interface provided by this driver was
not changed, and root access was already required there for setting fan speed.
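The resulting access rule can be sketched as follows. The global mirrors the module parameter's new default, and is_admin stands in for the kernel's capability check. Both names are assumptions made for this sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Mirrors the new default of the "restricted" module parameter. */
static bool restricted = true;

/*
 * Gate fan-control writes and serial-number reads.
 * is_admin: assumed stand-in for a privilege check.
 */
static int i8k_check_access(bool is_admin)
{
	if (restricted && !is_admin)
		return -EPERM;
	return 0;
}
```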
Reported-by: Mario Limonciello <Mario_Limonciello@dell.com> Signed-off-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
[bwh: Backported to 3.16: adjust filename, context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The isa_bus_init function must be called before drivers which utilize
the ISA bus driver are registered. A race condition for initialization
exists if device_initcall is used (the isa_bus_init callback is placed
in the same initcall level as dependent drivers which use module_init).
This patch ensures that isa_bus_init is called first by utilizing
postcore_initcall in favor of device_initcall.
Fixes: a5117ba7da37 ("[PATCH] Driver model: add ISA bus") Cc: Rene Herman <rene.herman@keyaccess.nl> Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When this code was reworked for IBoE support the order of assignments
for the sl_tclass_flowlabel got flipped around resulting in
TClass & FlowLabel being permanently set to 0 in the packet headers.
This breaks IB routers that rely on these headers, but only affects
kernel users - libmlx4 does this properly for user space.
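A sketch of building the whole field in one expression. It assumes the layout SL in bits 31:28, traffic class in bits 27:20 and flow label in bits 19:0; check the mlx4 address-vector definition before relying on it. The function name is hypothetical, and the conversion to big endian done in the driver is omitted.

```c
#include <stdint.h>

/*
 * Assumed layout: SL[31:28] | TClass[27:20] | FlowLabel[19:0].
 * Building the word in one expression means a later "= sl << 28"
 * cannot wipe out TClass and FlowLabel.
 */
static uint32_t pack_sl_tclass_flowlabel(uint8_t sl, uint8_t tclass,
					 uint32_t flow_label)
{
	return (uint32_t)(sl & 0xf) << 28 |
	       (uint32_t)tclass << 20 |
	       (flow_label & 0xfffff);
}
```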
Fixes: fa417f7b520e ("IB/mlx4: Add support for IBoE") Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If __key_link_begin() failed then "edit" would be uninitialized. I've
added a check to fix that.
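The shape of the fix can be sketched with a hypothetical two-phase link API whose begin step may fail before filling in its output. The pointer starts out as NULL, and the later steps only run when the begin step succeeded.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical edit handle produced by a successful begin step. */
struct link_edit { int slot; };

/* Begin step: may fail (e.g. over quota) without touching *edit. */
static int link_begin(int fail, struct link_edit **edit)
{
	static struct link_edit e;

	if (fail)
		return -EDQUOT;
	e.slot = 1;
	*edit = &e;
	return 0;
}

/* Only use the edit when begin succeeded; it is never read uninitialized. */
static int construct_and_link(int fail, int *linked)
{
	struct link_edit *edit = NULL;
	int ret = link_begin(fail, &edit);

	*linked = 0;
	if (ret == 0 && edit)
		*linked = edit->slot;  /* link and end the edit here */
	return ret;
}
```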
This allows a random user to crash the kernel, though it's quite
difficult to achieve. There are three ways it can be done as the user
would have to cause an error to occur in __key_link():
(1) Cause the kernel to run out of memory. In practice, this is difficult
to achieve without ENOMEM cropping up elsewhere and aborting the
attempt.
(2) Revoke the destination keyring between the keyring ID being looked up
and it being tested for revocation. In practice, this is difficult to
time correctly because the KEYCTL_REJECT function can only be used
from the request-key upcall process. Further, users can only make use
of what's in /sbin/request-key.conf, though this does include a
rejection debugging test - which means that the destination keyring
has to be the caller's session keyring in practice.
(3) Have just enough key quota available to create a key, a new session
keyring for the upcall and a link in the session keyring, but not then
sufficient quota to create a link in the nominated destination keyring
so that it fails with EDQUOT.
The bug can be triggered using option (3) above using something like the
following:
echo 80 >/proc/sys/kernel/keys/root_maxbytes
keyctl request2 user debug:fred negate @t
The above sets the quota to something much lower (80) to make the bug
easier to trigger, but this is dependent on the system. Note also that
the name of the keyring created contains a random number that may be
between 1 and 10 characters in size, so may throw the test off by
changing the amount of quota used.
Assuming the failure occurs, something like the following will be seen:
Fixes: f70e2e06196a ('KEYS: Do preallocation for __key_link()') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
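The added check can be sketched in userspace. All names are hypothetical stand-ins: `link_begin_sim()` plays the role of __key_link_begin() and only writes `*edit` on success, and `link_end_sim()` plays __key_link_end(), which dereferences it. The sketch assumes the fix pairs the end step with a successful begin; the pointer is initialised to NULL here only so the sketch itself is well defined.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the keyring edit state. */
struct assoc_edit_sim {
	int in_use;
};

static struct assoc_edit_sim edit_slot;
static int link_end_calls;

/* Models __key_link_begin(): *edit is written only on success. */
static int link_begin_sim(int quota_left, struct assoc_edit_sim **edit)
{
	if (quota_left <= 0)
		return -EDQUOT;
	edit_slot.in_use = 1;
	*edit = &edit_slot;
	return 0;
}

/* Models __key_link_end(): uses edit, so it must be valid. */
static void link_end_sim(struct assoc_edit_sim *edit)
{
	assert(edit != NULL);
	edit->in_use = 0;
	link_end_calls++;
}

/* The end step runs only if the begin step reported success. */
static int reject_and_link_sim(int have_keyring, int quota_left)
{
	struct assoc_edit_sim *edit = NULL;
	int link_ret = 0;

	if (have_keyring)
		link_ret = link_begin_sim(quota_left, &edit);

	/* ... the negative key would be instantiated here ... */

	if (have_keyring && link_ret == 0)
		link_end_sim(edit);
	return link_ret;
}
```

An EDQUOT failure (option (3) above) now returns the error without ever touching `edit`.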
In the omap gpmc driver it can be noticed that GPMC_CONFIG4_OEEXTRADELAY
is overwritten by the WEEXTRADELAY value from the device tree and
GPMC_CONFIG4_WEEXTRADELAY is not updated by the value from the device
tree.
As a consequence, the memory accesses cannot be configured properly when
extra delays are needed for OE and WE.
Fix the update of GPMC_CONFIG4_WEEXTRADELAY so that it takes the value from
the device tree file, and prevent GPMC_CONFIG4_OEEXTRADELAY from
being overwritten by the WEEXTRADELAY value from the device tree.
Signed-off-by: Ocquidant, Sebastien <sebastienocquidant@eaton.com> Signed-off-by: Roger Quadros <rogerq@ti.com>
[bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
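The GPMC CONFIG4 mix-up described above can be sketched as a read-modify-write on a plain integer. The bit positions (OEEXTRADELAY at bit 7, WEEXTRADELAY at bit 23) are assumed from the OMAP GPMC CONFIG4 layout, and `modify_bit`, `config4_broken` and `config4_fixed` are hypothetical helpers, not the driver's functions.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions assumed from the OMAP GPMC CONFIG4 register layout. */
#define CONFIG4_OEEXTRADELAY (1u << 7)
#define CONFIG4_WEEXTRADELAY (1u << 23)

/* Set or clear one flag bit, leaving the rest of the register intact. */
static uint32_t modify_bit(uint32_t reg, uint32_t mask, int enable)
{
	return enable ? (reg | mask) : (reg & ~mask);
}

/* Broken: the WE value overwrites the OE bit; the WE bit is never set. */
static uint32_t config4_broken(uint32_t reg, int oe_extra, int we_extra)
{
	reg = modify_bit(reg, CONFIG4_OEEXTRADELAY, oe_extra);
	reg = modify_bit(reg, CONFIG4_OEEXTRADELAY, we_extra);
	return reg;
}

/* Fixed: each device-tree value updates its own bit. */
static uint32_t config4_fixed(uint32_t reg, int oe_extra, int we_extra)
{
	reg = modify_bit(reg, CONFIG4_OEEXTRADELAY, oe_extra);
	reg = modify_bit(reg, CONFIG4_WEEXTRADELAY, we_extra);
	return reg;
}
```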
Modules which register drivers via the standard path (driver_register) in
parallel can cause a warning:
WARNING: CPU: 2 PID: 3492 at ../fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80
sysfs: cannot create duplicate filename '/module/saa7146/drivers'
Modules linked in: hexium_gemini(+) mxb(+) ...
...
Call Trace:
...
[<ffffffff812e63a2>] sysfs_warn_dup+0x62/0x80
[<ffffffff812e6487>] sysfs_create_dir_ns+0x77/0x90
[<ffffffff8140f2c4>] kobject_add_internal+0xb4/0x340
[<ffffffff8140f5b8>] kobject_add+0x68/0xb0
[<ffffffff8140f631>] kobject_create_and_add+0x31/0x70
[<ffffffff8157a703>] module_add_driver+0xc3/0xd0
[<ffffffff8155e5d4>] bus_add_driver+0x154/0x280
[<ffffffff815604c0>] driver_register+0x60/0xe0
[<ffffffff8145bed0>] __pci_register_driver+0x60/0x70
[<ffffffffa0273e14>] saa7146_register_extension+0x64/0x90 [saa7146]
[<ffffffffa0033011>] hexium_init_module+0x11/0x1000 [hexium_gemini]
...
As can be (mostly) seen, driver_register causes this call sequence:
-> bus_add_driver
-> module_add_driver
-> module_create_drivers_dir
The last one creates the "drivers" directory in /sys/module/<...>. When
this is done in parallel, the kernel may attempt to create the directory
twice at the same time.
This can be easily reproduced by loading mxb and hexium_gemini in
parallel:
while :; do
modprobe mxb &
modprobe hexium_gemini
wait
rmmod mxb hexium_gemini saa7146_vv saa7146
done
saa7146 calls pci_register_driver for both mxb and hexium_gemini,
which means /sys/module/saa7146/drivers is to be created for both of
them.
Fix this by adding a new mutex in module_create_drivers_dir which makes
the test-and-create of the "drivers" dir atomic.
I inverted the condition and removed 'return' to avoid multiple
unlocks or a goto.
Signed-off-by: Jiri Slaby <jslaby@suse.cz> Fixes: fe480a2675ed (Modules: only add drivers/ direcory if needed) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
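The locking pattern can be sketched in userspace with a pthread mutex standing in for the kernel mutex. `module_kobject_sim`, `create_drivers_dir_sim` and `parallel_register` are hypothetical; the point is that the existence test and the creation happen under one lock, with the inverted condition and a single unlock as described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the module's kobject state. */
struct module_kobject_sim {
	int has_drivers_dir;	/* non-zero once "drivers" exists */
	int create_calls;	/* how many times creation actually ran */
};

static pthread_mutex_t drivers_dir_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct module_kobject_sim saa7146_mk;

/* Hypothetical stand-in for creating /sys/module/<name>/drivers. */
static void create_drivers_dir_sim(struct module_kobject_sim *mk)
{
	mk->create_calls++;
	mk->has_drivers_dir = 1;
}

/* Test-and-create under one lock: inverted condition, single unlock. */
static void module_create_drivers_dir_sim(struct module_kobject_sim *mk)
{
	pthread_mutex_lock(&drivers_dir_mutex);
	if (!mk->has_drivers_dir)
		create_drivers_dir_sim(mk);
	pthread_mutex_unlock(&drivers_dir_mutex);
}

static void *register_driver_sim(void *arg)
{
	module_create_drivers_dir_sim(arg);
	return NULL;
}

/* Register nthreads drivers in parallel; return how often the dir was made. */
static int parallel_register(int nthreads)
{
	pthread_t tids[16];
	int rc;

	assert(nthreads > 0 && nthreads <= 16);
	saa7146_mk.has_drivers_dir = 0;
	saa7146_mk.create_calls = 0;
	for (int i = 0; i < nthreads; i++) {
		rc = pthread_create(&tids[i], NULL, register_driver_sim, &saa7146_mk);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	return saa7146_mk.create_calls;
}
```

However the threads interleave, the directory is created exactly once.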
Thanks to Ville Syrjälä for pointing me towards the cause of this issue.
Unfortunately one of the side effects of having the refclk for a DPLL set
to SSC is that as long as it's set to SSC, the GPU will prevent us from
powering down any of the pipes or transcoders using it. A couple of
BIOSes enable SSC in both PCH_DREF_CONTROL and in the DPLL
configurations. This causes issues on the first modeset, since we don't
expect SSC to be left on and as a result, can't successfully power down
the pipes or the transcoders using it. Here's an example from this Dell
OptiPlex 990:
[drm:intel_modeset_init] SSC enabled by BIOS, overriding VBT which says disabled
[drm:intel_modeset_init] 2 display pipes available.
[drm:intel_update_cdclk] Current CD clock rate: 400000 kHz
[drm:intel_update_max_cdclk] Max CD clock rate: 400000 kHz
[drm:intel_update_max_cdclk] Max dotclock rate: 360000 kHz
vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[drm:intel_crt_reset] crt adpa set to 0xf40000
[drm:intel_dp_init_connector] Adding DP connector on port C
[drm:intel_dp_aux_init] registering DPDDC-C bus for card0-DP-1
[drm:ironlake_init_pch_refclk] has_panel 0 has_lvds 0 has_ck505 0
[drm:ironlake_init_pch_refclk] Disabling SSC entirely
… later we try committing the first modeset …
[drm:intel_dump_pipe_config] [CRTC:26][modeset] config ffff88041b02e800 for pipe A
[drm:intel_dump_pipe_config] cpu_transcoder: A
…
[drm:intel_dump_pipe_config] dpll_hw_state: dpll: 0xc4016001, dpll_md: 0x0, fp0: 0x20e08, fp1: 0x30d07
[drm:intel_dump_pipe_config] planes on this crtc
[drm:intel_dump_pipe_config] STANDARD PLANE:23 plane: 0.0 idx: 0 enabled
[drm:intel_dump_pipe_config] FB:42, fb = 800x600 format = 0x34325258
[drm:intel_dump_pipe_config] scaler:0 src (0, 0) 800x600 dst (0, 0) 800x600
[drm:intel_dump_pipe_config] CURSOR PLANE:25 plane: 0.1 idx: 1 disabled, scaler_id = 0
[drm:intel_dump_pipe_config] STANDARD PLANE:27 plane: 0.1 idx: 2 disabled, scaler_id = 0
[drm:intel_get_shared_dpll] CRTC:26 allocated PCH DPLL A
[drm:intel_get_shared_dpll] using PCH DPLL A for pipe A
[drm:ilk_audio_codec_disable] Disable audio codec on port C, pipe A
[drm:intel_disable_pipe] disabling pipe A
------------[ cut here ]------------
WARNING: CPU: 1 PID: 130 at drivers/gpu/drm/i915/intel_display.c:1146 intel_disable_pipe+0x297/0x2d0 [i915]
pipe_off wait timed out
…
---[ end trace 94fc8aa03ae139e8 ]---
[drm:intel_dp_link_down]
[drm:ironlake_crtc_disable [i915]] *ERROR* failed to disable transcoder A
Later modesets succeed since they reset the DPLL's configuration anyway,
but this is enough to get stuck with a big fat warning in dmesg.
A better solution would be to add refcounts for the SSC source, but for
now leaving the source clock on should suffice.
Changes since v4:
- Fix calculation of final for systems with LVDS panels (fixes BUG() on
CI test suite)
Changes since v3:
- Move temp variable into loop
- Move checks for using_ssc_source to after we've figured out has_ck505
- Add using_ssc_source to debug output
Changes since v2:
- Fix debug output for when we disable the CPU source
Changes since v1:
- Leave the SSC source clock on instead of just shutting it off on all
of the DPLL configurations.
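The changelog notes (a per-iteration temp variable, a using_ssc_source flag) suggest a scan over the PCH DPLLs. A userspace sketch of such a scan follows. The bit definitions (VCO enable at bit 31, reference-input select in bits 14:13 with value 3 meaning the spread-spectrum input) are assumed from the i915 register headers and should be checked against them; the function is a hypothetical stand-in, not the driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit definitions assumed from the i915 PCH DPLL register layout. */
#define DPLL_VCO_ENABLE                 (1u << 31)
#define PLL_REF_INPUT_MASK              (3u << 13)
#define PLLB_REF_INPUT_SPREADSPECTRUMIN (3u << 13)

/* True if any enabled DPLL takes its reference clock from the SSC input,
 * in which case the SSC source should be left on. */
static bool using_ssc_source(const uint32_t *dpll_regs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		uint32_t temp = dpll_regs[i];

		if (!(temp & DPLL_VCO_ENABLE))
			continue;	/* disabled DPLLs don't count */
		if ((temp & PLL_REF_INPUT_MASK) == PLLB_REF_INPUT_SPREADSPECTRUMIN)
			return true;
	}
	return false;
}
```

Under the assumed layout, the dpll value 0xc4016001 from the pipe config dump above decodes as an enabled DPLL fed from the SSC input, which is the situation the BIOS left behind.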
When the qdisc is full, we drop a packet at the head of the queue,
queue the current skb and return NET_XMIT_CN.
Now that we track backlog on upper qdiscs, we need to call
qdisc_tree_reduce_backlog(), even if the qlen did not change.
Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
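Why the qlen can stay the same while the backlog changes can be shown with a toy two-level hierarchy. Everything here is hypothetical: `qdisc_sim`, `tree_reduce_backlog` (a stand-in for qdisc_tree_reduce_backlog()), the limit of 2 packets, and the assumption that a parent counts a packet only when the child reports plain success, not the congestion-notification return.

```c
#include <assert.h>
#include <stddef.h>

#define LIMIT 2	/* hypothetical queue limit, in packets */

enum { XMIT_SUCCESS = 0, XMIT_CN = 2 };

/* Hypothetical, simplified qdisc: FIFO of packet lengths plus counters. */
struct qdisc_sim {
	unsigned int pkt[LIMIT];
	unsigned int qlen, backlog;
	struct qdisc_sim *parent;
};

/* Stand-in for qdisc_tree_reduce_backlog(): update every ancestor.
 * Unsigned wrap-around turns a "negative" reduction into an increase. */
static void tree_reduce_backlog(struct qdisc_sim *q, unsigned int n,
				unsigned int bytes)
{
	for (q = q->parent; q; q = q->parent) {
		q->qlen -= n;
		q->backlog -= bytes;
	}
}

static int head_drop_enqueue(struct qdisc_sim *q, unsigned int len)
{
	unsigned int prev_backlog = q->backlog;

	if (q->qlen < LIMIT) {
		q->pkt[q->qlen++] = len;
		q->backlog += len;
		return XMIT_SUCCESS;	/* the parent accounts for this itself */
	}
	/* Full: drop the head and append the new packet; qlen is unchanged. */
	q->backlog -= q->pkt[0];
	for (unsigned int i = 1; i < LIMIT; i++)
		q->pkt[i - 1] = q->pkt[i];
	q->pkt[LIMIT - 1] = len;
	q->backlog += len;
	/* The parent ignores XMIT_CN, so push the byte delta up explicitly. */
	tree_reduce_backlog(q, 0, prev_backlog - q->backlog);
	return XMIT_CN;
}

/* Parent side: counts a packet only when the child reports success. */
static void parent_enqueue(struct qdisc_sim *child, unsigned int len)
{
	if (head_drop_enqueue(child, len) == XMIT_SUCCESS) {
		child->parent->qlen++;
		child->parent->backlog += len;
	}
}

/* Push packets of differing sizes through a full queue and check that the
 * root's counters still match the leaf's. */
static int backlog_consistent(void)
{
	struct qdisc_sim root = { .parent = NULL };
	struct qdisc_sim leaf = { .parent = &root };
	const unsigned int lens[] = { 100, 200, 1500, 60 };

	for (unsigned int i = 0; i < 4; i++)
		parent_enqueue(&leaf, lens[i]);
	return root.backlog == leaf.backlog && root.qlen == leaf.qlen;
}
```

Without the `tree_reduce_backlog(q, 0, ...)` call the root's backlog would stay at 300 bytes while the leaf holds 1560.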
If the packet was dropped by lower qdisc, then we must not
access it later.
Save qdisc_pkt_len(skb) in a temp variable.
Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: not using qdisc_qstats_drop()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
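The "save the length first" rule can be sketched with a child that frees the skb when it drops it. `skb_sim`, `child_enqueue_sim` and `parent_enqueue_sim` are hypothetical; the sketch assumes the parent needs the packet length on both the success and the drop path, which is exactly where reading `skb->len` after the call would be a use-after-free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down skb: only the length matters here. */
struct skb_sim {
	unsigned int len;
};

/* Parent-level counters kept by the sketch. */
static struct {
	unsigned int qlen, backlog, drops, dropped_bytes;
} parent_stats;

/* One-packet child queue (the sketch never dequeues). */
static struct skb_sim *child_slot;

/* Hypothetical lower qdisc: returns 0 if queued, -1 if it dropped and
 * freed the skb, after which the caller must not dereference it. */
static int child_enqueue_sim(struct skb_sim *skb)
{
	if (child_slot) {
		free(skb);
		return -1;
	}
	child_slot = skb;
	return 0;
}

/* Parent enqueue: the length is saved while skb is known to be valid. */
static int parent_enqueue_sim(unsigned int len)
{
	struct skb_sim *skb = malloc(sizeof(*skb));
	unsigned int pkt_len;
	int ret;

	if (!skb)
		return -1;
	skb->len = len;
	pkt_len = skb->len;		/* the temp variable */
	ret = child_enqueue_sim(skb);	/* skb may be freed in here */
	if (ret == 0) {
		parent_stats.qlen++;
		parent_stats.backlog += pkt_len;
	} else {
		parent_stats.drops++;
		parent_stats.dropped_bytes += pkt_len;	/* never skb->len */
	}
	return ret;
}
```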
When the bottom qdisc decides to, for example, drop some packet,
it calls qdisc_tree_decrease_qlen() to update the queue length
for all its ancestors; we need to update the backlog too, to
keep the stats on the root qdisc accurate.
Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Remove nearly duplicated code and prepare for the following patch.
Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The IPv6 ping socket error handler doesn't correctly convert the new 32-bit
MTU to host endianness before using it.
Cc: Lorenzo Colitti <lorenzo@google.com> Fixes: 6d0bfe22611602f ("net: ipv6: Add IPv6 support to the ping socket.") Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
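The conversion in question is the usual network-to-host one. In an ICMPv6 Packet Too Big message the MTU is a 32-bit field in network byte order at offset 4 of the ICMPv6 header (after type, code and checksum). The helper `ptb_mtu` below is a hypothetical userspace illustration, not the kernel's error handler.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read the MTU of an ICMPv6 Packet Too Big message and convert it from
 * network to host byte order before using it. */
static uint32_t ptb_mtu(const uint8_t *icmp6_hdr)
{
	uint32_t mtu_be;

	memcpy(&mtu_be, icmp6_hdr + 4, sizeof(mtu_be));	/* alignment-safe */
	return ntohl(mtu_be);
}
```

On a little-endian host, using the raw field for an MTU of 1280 (bytes 00 00 05 00) without ntohl() would give 0x00050000 instead.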
Lengthy output of sysrq-w may take a lot of time on a slow serial console.
Currently we reset the NMI watchdog on the current CPU to avoid spurious
lockup messages. Sometimes this doesn't work since the softlockup watchdog
might trigger on another CPU which is waiting for an IPI to proceed.
We reset the softlockup watchdogs on all CPUs, but we do this only after
listing all tasks, and this may be too late on a busy system.
So, reset the watchdogs on all CPUs earlier, inside the for_each_process_thread() loop.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1465474805-14641-1-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
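The effect of moving the reset into the loop can be shown with a toy timing model. It assumes a single watchdog timestamp shared by all CPUs and a fixed console cost per printed task; `max_untouched` is a hypothetical helper that reports the longest stretch without a watchdog touch.

```c
#include <assert.h>

/* Each printed task costs `cost` time units on a slow console. Returns the
 * longest time the (shared) watchdog went untouched during the listing. */
static unsigned int max_untouched(unsigned int ntasks, unsigned int cost,
				  int touch_in_loop)
{
	unsigned int now = 0, last_touch = 0, worst = 0;

	for (unsigned int i = 0; i < ntasks; i++) {
		if (touch_in_loop)
			last_touch = now;	/* reset watchdogs per task */
		now += cost;			/* print one task */
		if (now - last_touch > worst)
			worst = now - last_touch;
	}
	/* The old code only reset the watchdogs here, after the loop. */
	return worst;
}
```

With 1000 tasks at 2 units each, the stall grows to 2000 units when the watchdogs are reset only after the loop, but stays at 2 units when they are reset per task.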
Fix kprobe_fault_handler() to clear the TF (trap flag) bit of
the flags register in the case of a fault fixup on single-stepping.
If we put a kprobe on the instruction which caused a
page fault (e.g. actual mov instructions in copy_user_*),
that fault happens on the single-stepping buffer. In this
case, kprobes resets the running instance so that the CPU can
retry execution on the original ip address.
However, the current code forgets to reset the TF bit. Since this
fault happens with the TF bit set for enabling single-stepping,
when it retries, it causes a debug exception and kprobes
cannot handle it because it has already reset itself.
On most x86-64 platforms, it can easily be reproduced
by using the kprobe tracer, e.g.:
# cd /sys/kernel/debug/tracing
# echo p copy_user_enhanced_fast_string+5 > kprobe_events
# echo 1 > events/kprobes/enable
And you'll see a kernel panic on do_debug(), since the debug
trap is not handled by kprobes.
To fix this problem, we just need to clear the TF bit when
resetting running kprobe.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: systemtap@sourceware.org Link: http://lkml.kernel.org/r/20160611140648.25885.37482.stgit@devbox
[ Updated the comments. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
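The reset can be sketched on a trimmed register frame. TF is bit 8 of the x86 (R/E)FLAGS register. `regs_sim`, `reset_current_kprobe_sim` and `flags_after_reset` are hypothetical; keeping TF when it was already set before the probe hit is an assumption about how a careful reset should behave, not a claim about the exact upstream code.

```c
#include <assert.h>

#define X86_TF_BIT (1ul << 8)	/* Trap Flag: bit 8 of (R/E)FLAGS */

/* Hypothetical, trimmed register frame. */
struct regs_sim {
	unsigned long ip;
	unsigned long flags;
};

/* After a fault in the single-step buffer: retry at the probed address and
 * drop the TF that kprobes set, keeping TF only if it was set beforehand. */
static void reset_current_kprobe_sim(struct regs_sim *regs,
				     unsigned long probe_addr,
				     unsigned long old_flags)
{
	regs->ip = probe_addr;
	regs->flags &= ~X86_TF_BIT;
	regs->flags |= old_flags & X86_TF_BIT;
}

static unsigned long flags_after_reset(unsigned long flags,
				       unsigned long old_flags)
{
	struct regs_sim regs = { .ip = 0xdead, .flags = flags };

	reset_current_kprobe_sim(&regs, 0x1000, old_flags);
	assert(regs.ip == 0x1000);
	return regs.flags;
}
```

For example, flags 0x346 (TF and IF set) reset against pre-probe flags 0x246 come back as 0x246, so the retried instruction no longer raises a debug exception.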