target_core_sbc's compare_and_write functionality suffers from taking
data at the wrong memory location when writing a CAW request to disk
when a SGL offset is non-zero.
This can happen with loopback and vhost-scsi fabric drivers when
SCF_PASSTHROUGH_SG_TO_MEM_NOALLOC is used to map existing user-space
SGL memory into COMPARE_AND_WRITE READ/WRITE payload buffers.
Rather than writing all-zeroes as instructed with the -D file, it
corrupts the data in the sector by splicing some of the original
bytes in. The page of the first entry of cmd->t_data_sg includes the
CDB, and sg->offset is set to a position past the CDB. I presume that
sg->offset is also the right choice to use for subsequent sglist
members.
Signed-off-by: Jan Engelhardt <jengelh@netitwork.de> Tested-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
This patch addresses a race + use after free where the first
stage of COMPARE_AND_WRITE in compare_and_write_callback()
is rescheduled after the backend sends the secondary WRITE,
resulting in second stage compare_and_write_post() callback
completing in target_complete_ok_work() before the first
can return.
Because current code depends on checking se_cmd->se_cmd_flags
after return from se_cmd->transport_complete_callback(),
this results in first stage having SCF_COMPARE_AND_WRITE_POST
set, which incorrectly falls through into second stage CAW
processing code, eventually triggering a NULL pointer
dereference due to use after free.
To address this bug, pass in a new *post_ret parameter into
se_cmd->transport_complete_callback(), and depend upon this
value instead of ->se_cmd_flags to determine when to return
or fall through into ->queue_status() code for CAW.
Cc: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
This patch addresses a case where iscsi_target_do_tx_login_io()
fails sending the last login response PDU, after the RX/TX
threads have already been started.
The case centers around iscsi_target_rx_thread() not invoking
allow_signal(SIGINT) before the send_sig(SIGINT, ...) occurs
from the failure path, resulting in RX thread hanging
indefinately on iscsi_conn->rx_login_comp.
To address this bug, complete ->rx_login_complete for good
measure in the failure path, and immediately return from
RX thread context if connection state did not actually reach
full feature phase (TARG_CONN_STATE_LOGGED_IN).
Cc: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Doing so will cause the grant to be unmapped and then, during
fault handling, the fault to be mistakenly treated as NUMA hint
fault.
In addition, even if those maps could partcipate in NUMA
balancing, it wouldn't provide any benefit since we are unable
to determine physical page's node (even if/when VNUMA is
implemented).
Marking grant maps' VMAs as VM_IO will exclude them from being
part of NUMA balancing.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
If we pass in an empty nfs_fattr struct to nfs_update_inode, it will
(correctly) not update any of the attributes, but it then clears the
NFS_INO_INVALID_ATTR flag, which indicates that the attributes are
up to date. Don't clear the flag if the fattr struct has no valid
attrs to apply.
Reviewed-by: Steve French <steve.french@primarydata.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Gigabyte Z710X mobo with ALC1150 codec gets significant noises from
the analog loopback routes even if their inputs are all muted.
Simply kill the aamix for fixing it.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=108301 Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
When running a 32bit guest under a 64bit hypervisor, the ARMv8
architecture defines a mapping of the 32bit registers in the 64bit
space. This includes banked registers that are being demultiplexed
over the 64bit ones.
On exceptions caused by an operation involving a 32bit register, the
HW exposes the register number in the ESR_EL2 register. It was so
far understood that SW had to distinguish between AArch32 and AArch64
accesses (based on the current AArch32 mode and register number).
It turns out that I misinterpreted the ARM ARM, and the clue is in
D1.20.1: "For some exceptions, the exception syndrome given in the
ESR_ELx identifies one or more register numbers from the issued
instruction that generated the exception. Where the exception is
taken from an Exception level using AArch32 these register numbers
give the AArch64 view of the register."
Which means that the HW is already giving us the translated version,
and that we shouldn't try to interpret it at all (for example, doing
an MMIO operation from the IRQ mode using the LR register leads to
very unexpected behaviours).
The fix is thus not to perform a call to vcpu_reg32() at all from
vcpu_reg(), and use whatever register number is supplied directly.
The only case we need to find out about the mapping is when we
actively generate a register access, which only occurs when injecting
a fault in a guest.
Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
If clp->cl_cb_ident is zero, then nfs_cb_idr_remove_locked() skips removing
it when the nfs_client is freed. A decoding or server bug can then find
and try to put that first nfs_client which would lead to a crash.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Fixes: d6870312659d ("nfs4client: convert to idr_alloc()") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The following test program from Dmitry can cause softlockups or RCU
stalls as it copies 1GB from tmpfs into eventfd and we don't have any
scheduling point at that path in sendfile(2) implementation:
int r1 = eventfd(0, 0);
int r2 = memfd_create("", 0);
unsigned long n = 1<<30;
fallocate(r2, 0, 0, n);
sendfile(r1, r2, 0, n);
Add cond_resched() into __splice_from_pipe() to fix the problem.
CC: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Commit 296291cdd162 (mm: make sendfile(2) killable) fixed an issue where
sendfile(2) was doing a lot of tiny writes into a filesystem and thus
was unkillable for a long time. However sendfile(2) can be (mis)used to
issue lots of writes into arbitrary file descriptor such as evenfd or
similar special file descriptors which never hit the standard filesystem
write path and thus are still unkillable. E.g. the following example
from Dmitry burns CPU for ~16s on my test system without possibility to
be killed:
int r1 = eventfd(0, 0);
int r2 = memfd_create("", 0);
unsigned long n = 1<<30;
fallocate(r2, 0, 0, n);
sendfile(r1, r2, 0, n);
There are actually quite a few tests for pending signals in sendfile
code however we data to write is always available none of them seems to
trigger. So fix the problem by adding a test for pending signal into
splice_from_pipe_next() also before the loop waiting for pipe buffers to
be available. This should fix all the lockup issues with sendfile of the
do-ton-of-tiny-writes nature.
Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The thing got broken back in 2002 - sysvfs does *not* have inline
symlinks; even short ones have bodies stored in the first block
of file. sysv_symlink() handles that correctly; unfortunately,
attempting to look an existing symlink up will end up confusing
them for inline symlinks, and interpret the block number containing
the body as the body itself.
Nobody has noticed until now, which says something about the level
of testing sysvfs gets ;-/
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Fixes STAR 9000953410: "perf callgraph profiling causing RCU stalls"
| perf record -g -c 15000 -e cycles /sbin/hackbench
|
| INFO: rcu_preempt self-detected stall on CPU
| 1: (1 GPs behind) idle=609/140000000000002/0 softirq=2914/2915 fqs=603
| Task dump for CPU 1:
in-kernel dwarf unwinder has a fast binary lookup and a fallback linear
search (which iterates thru each of ~11K entries) thus takes 2 orders of
magnitude longer (~3 million cycles vs. 2000). Routines written in hand
assembler lack dwarf info (as we don't support assembler CFI pseudo-ops
yet) fail the unwinder binary lookup, hit linear search, failing
nevertheless in the end.
However the linear search is pointless as binary lookup tables are created
from it in first place. It is impossible to have binary lookup fail while
succeed the linear search. It is pure waste of cycles thus removed by
this patch.
This manifested as RCU stalls / NMI watchdog splat when running
hackbench under perf with callgraph profiling. The triggering condition
was perf counter overflowing in routine lacking dwarf info (like memset)
leading to patheic 3 million cycle unwinder slow path and by the time it
returned new interrupts were already pending (Timer, IPI) and taken
rightaway. The original memset didn't make forward progress, system kept
accruing more interrupts and more unwinder delayes in a vicious feedback
loop, ultimately triggering the NMI diagnostic.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Currently we can hit a scenario where we'll tm_reclaim() twice. This
results in a TM bad thing exception because the second reclaim occurs
when not in suspend mode.
The scenario in which this can happen is the following. We attempt to
deliver a signal to userspace. To do this we need obtain the stack
pointer to write the signal context. To get this stack pointer we
must tm_reclaim() in case we need to use the checkpointed stack
pointer (see get_tm_stackpointer()). Normally we'd then return
directly to userspace to deliver the signal without going through
__switch_to().
Unfortunatley, if at this point we get an error (such as a bad
userspace stack pointer), we need to exit the process. The exit will
result in a __switch_to(). __switch_to() will attempt to save the
process state which results in another tm_reclaim(). This
tm_reclaim() now causes a TM Bad Thing exception as this state has
already been saved and the processor is no longer in TM suspend mode.
Whee!
This patch checks the state of the MSR to ensure we are TM suspended
before we attempt the tm_reclaim(). If we've already saved the state
away, we should no longer be in TM suspend mode. This has the
additional advantage of checking for a potential TM Bad Thing
exception.
Found using syscall fuzzer.
Fixes: fb09692e71f1 ("powerpc: Add reclaim and recheckpoint functions for context switching transactional memory processes") Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Currently we allow both the MSR T and S bits to be set by userspace on
a signal return. Unfortunately this is a reserved configuration and
will cause a TM Bad Thing exception if attempted (via rfid).
This patch checks for this case in both the 32 and 64 bit signals
code. If both T and S are set, we mark the context as invalid.
Found using a syscall fuzzer.
Fixes: 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context") Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
It turned out that many HP laptops suffer from the same problem as
fixed in commit [c932b98c1e47: ALSA: hda - Apply pin fixup for HP
ProBook 6550b]. But, it's tiresome to list up all such PCI SSIDs, as
there are really lots of HP machines.
Instead, we do a bit more clever, try to check the supposedly dock and
built-in headphone pins, and apply the fixup when both seem valid.
This rule can be applied generically to all models using the same
quirk, so we'll fix all in a shot.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=107491 Signed-off-by: Takashi Iwai <tiwai@suse.de>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
If md->signature == MAC_DRIVER_MAGIC and md->block_size == 1023, a single
512 byte sector would be read (secsize / 512). However the partition
structure would be located past the end of the buffer (secsize % 512).
For making the speakers on Acer Aspire One Cloudbook 14 to work, we
need the as same quirk as for another Chromebook. This patch adds the
corresponding fixup entry.
Reported-by: Patrick <epictetus@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
If either of the memory allocations in kvm_arch_vcpu_create() fail, the
vcpu which has been allocated and kvm_vcpu_init'd doesn't get uninit'd
in the error handling path. Add a call to kvm_vcpu_uninit() to fix this.
Fixes: 669e846e6c4e ("KVM/MIPS32: MIPS arch specific APIs for KVM") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ luis: backported to 3.16:
- file rename: mips.c -> kvm_mips.c ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The immediate field of the CACHE instruction is signed, so ensure that
it gets sign extended by casting it to an int16_t rather than just
masking the low 16 bits.
Fixes: e685c689f3a8 ("KVM/MIPS32: Privileged instruction/target branch emulation.") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ luis: backported to 3.16:
- file rename: emulate.c -> kvm_mips_emul.c ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
ASID restoration on guest resume should determine the guest execution
mode based on the guest Status register rather than bit 30 of the guest
PC.
Fix the two places in locore.S that do this, loading the guest status
from the cop0 area. Note, this assembly is specific to the trap &
emulate implementation of KVM, so it doesn't need to check the
supervisor bit as that mode is not implemented in the guest.
Fixes: b680f70fc111 ("KVM/MIPS32: Entry point for trampolining to...") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ luis: backported to 3.16:
- file rename: locore.S -> kvm_locore.S ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
dm-mpath retries ioctl, when no path is readily available and the device
is configured to queue I/O in such a case. If you want to stop the retry
before multipathd decides to turn off queueing mode, you could send
signal for the process to exit from the loop.
However the check of fatal signal has not carried along when commit 6c182cd88d17 ("dm mpath: fix ioctl deadlock when no paths") moved the
loop from dm-mpath to dm core. As a result, we can't terminate such
a process in the retry loop.
A thin-pool that is in out-of-data-space (OODS) mode may transition back
to write mode -- without the admin adding more space to the thin-pool --
if/when blocks are released (either by deleting thin devices or
discarding provisioned blocks).
But as part of the thin-pool's earlier transition to out-of-data-space
mode the thin-pool may have set the 'error_if_no_space' flag to true if
the no_space_timeout expires without more space having been made
available. That implementation detail, of changing the pool's
error_if_no_space setting, needs to be reset back to the default that
the user specified when the thin-pool's table was loaded.
Otherwise we'll drop the user requested behaviour on the floor when this
out-of-data-space to write mode transition occurs.
Reported-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Fixes: 2c43fd26e4 ("dm thin: fix missing out-of-data-space to write mode transition if blocks are released")
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
One of the many faults of the QinHeng CH345 USB MIDI interface chip is
that it does not handle received SysEx messages correctly -- every second
event packet has a wrong code index number, which is the one from the last
seen message, instead of 4. For example, the two messages "FE F0 01 02 03
04 05 06 07 08 09 0A 0B 0C 0D 0E F7" result in the following event
packets:
A class-compliant driver must interpret an event packet with CIN 15 as
having a single data byte, so the other two bytes would be ignored. The
message received by the host would then be missing two bytes out of six;
in this example, "F0 01 02 03 06 07 08 09 0C 0D 0E F7".
These corrupted SysEx event packages contain only data bytes, while the
CH345 uses event packets with a correct CIN value only for messages with
a status byte, so it is possible to distinguish between these two cases by
checking for the presence of this status byte.
(Other bugs in the CH345's input handling, such as the corruption resulting
from running status, cannot be worked around.)
Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The CH345 USB MIDI chip has two output ports. However, they are
multiplexed through one pin, and the number of ports cannot be reduced
even for hardware that implements only one connector, so for those
devices, data sent to either port ends up on the same hardware output.
This becomes a problem when both ports are used at the same time, as
longer MIDI commands (such as SysEx messages) are likely to be
interrupted by messages from the other port, and thus to get lost.
It would not be possible for the driver to detect how many ports the
device actually has, except that in practice, _all_ devices built with
the CH345 have only one port. So we can just ignore the device's
descriptors, and hardcode one output port.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Takashi Iwai <tiwai@suse.de>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Today, if we perform an xfs_growfs which adds allocation groups,
mp->m_maxagi is not properly updated when the growfs is complete.
Therefore inodes will continue to be allocated only in the
AGs which existed prior to the growfs, and the new space
won't be utilized.
This is because of this path in xfs_growfs_data_private():
xfs_growfs_data_private
xfs_initialize_perag(mp, nagcount, &nagimax);
if (mp->m_flags & XFS_MOUNT_32BITINODES)
index = xfs_set_inode32(mp);
else
index = xfs_set_inode64(mp);
if (maxagi)
*maxagi = index;
where xfs_set_inode* iterates over the (old) agcount in
mp->m_sb.sb_agblocks, which has not yet been updated
in the growfs path. So "index" will be returned based on
the old agcount, not the new one, and new AGs are not available
for inode allocation.
Fix this by explicitly passing the proper AG count (which
xfs_initialize_perag() already has) down another level,
so that xfs_set_inode* can make the proper decision about
acceptable AGs for inode allocation in the potentially
newly-added AGs.
This has been broken since 3.7, when these two
xfs_set_inode* functions were added in commit 2d2194f.
Prior to that, we looped over "agcount" not sb_agblocks
in these calculations.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Cc: Ben Hutchings <ben@decadent.org.uk> Acked-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Dmitry Vyukov reported a memory leak using IPV6 SCTP sockets.
We need to call inet6_destroy_sock() to properly release
inet6 specific fields.
Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Proxy entries could have null pointer to net-device.
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Fixes: 84920c1420e2 ("net: Allow ipv6 proxies and arp proxies be shown with iproute2") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
while socket is not locked : Other threads can change np->opt
concurrently. Dmitry posted a syzkaller
(http://github.com/google/syzkaller) program desmonstrating
use-after-free.
Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
and dccp_v6_request_recv_sock() also need to use RCU protection
to dereference np->opt once (before calling ipv6_dup_options())
This patch adds full RCU protection to np->opt
Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
If a fragmented multicast packet is received on an ethernet device which
has an active macvlan on top of it, each fragment is duplicated and
received both on the underlying device and the macvlan. If some
fragments for macvlan are processed before the whole packet for the
underlying device is reassembled, the "overlapping fragments" test in
ip6_frag_queue() discards the whole fragment queue.
To resolve this, add device ifindex to the search key and require it to
match reassembling multicast packets and packets to link-local
addresses.
Note: similar patch has been already submitted by Yoshifuji Hideaki in
http://patchwork.ozlabs.org/patch/220979/
but got lost and forgotten for some reason.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Commit fcb26ec5b18d ("broadcom: move all PHY_ID's to header")
updated broadcom_tbl to use PHY_IDs, but incorrectly replaced 0x0143bca0
with PHY_ID_BCM5482 (making a duplicate entry, and completely omitting
the original). Fix that.
Fixes: fcb26ec5b18d ("broadcom: move all PHY_ID's to header") Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Similar to ipv4, when destroying an mrt table the static mfc entries and
the static devices are kept, which leads to devices that can never be
destroyed (because of refcnt taken) and leaked memory. Make sure that
everything is cleaned up on netns destruction.
Fixes: 8229efdaef1e ("netns: ip6mr: enable namespace support in ipv6 multicast forwarding code") CC: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
When destroying an mrt table the static mfc entries and the static
devices are kept, which leads to devices that can never be destroyed
(because of refcnt taken) and leaked memory, for example:
unreferenced object 0xffff880034c144c0 (size 192):
comm "mfc-broken", pid 4777, jiffies 4320349055 (age 46001.964s)
hex dump (first 32 bytes):
98 53 f0 34 00 88 ff ff 98 53 f0 34 00 88 ff ff .S.4.....S.4....
ef 0a 0a 14 01 02 03 04 00 00 00 00 01 00 00 00 ................
backtrace:
[<ffffffff815c1b9e>] kmemleak_alloc+0x4e/0xb0
[<ffffffff811ea6e0>] kmem_cache_alloc+0x190/0x300
[<ffffffff815931cb>] ip_mroute_setsockopt+0x5cb/0x910
[<ffffffff8153d575>] do_ip_setsockopt.isra.11+0x105/0xff0
[<ffffffff8153e490>] ip_setsockopt+0x30/0xa0
[<ffffffff81564e13>] raw_setsockopt+0x33/0x90
[<ffffffff814d1e14>] sock_common_setsockopt+0x14/0x20
[<ffffffff814d0b51>] SyS_setsockopt+0x71/0xc0
[<ffffffff815cdbf6>] entry_SYSCALL_64_fastpath+0x16/0x7a
[<ffffffffffffffff>] 0xffffffffffffffff
Make sure that everything is cleaned on netns destruction.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Further investigation showed that this can happen when an *odd* number of
fds are being passed over AF_UNIX sockets.
In these cases CMSG_LEN(i * sizeof(int)) and CMSG_SPACE(i * sizeof(int)),
where i is the number of successfully passed fds, differ by 4 bytes due
to the extra CMSG_ALIGN() padding in CMSG_SPACE() to an 8 byte boundary
on 64 bit. The padding is used to align subsequent cmsg headers in the
control buffer.
When the control buffer passed in from the receiver side *lacks* these 4
bytes (e.g. due to buggy/wrong API usage), then msg->msg_controllen will
overflow in scm_detach_fds():
int cmlen = CMSG_LEN(i * sizeof(int)); <--- cmlen w/o tail-padding
err = put_user(SOL_SOCKET, &cm->cmsg_level);
if (!err)
err = put_user(SCM_RIGHTS, &cm->cmsg_type);
if (!err)
err = put_user(cmlen, &cm->cmsg_len);
if (!err) {
cmlen = CMSG_SPACE(i * sizeof(int)); <--- cmlen w/ 4 byte extra tail-padding
msg->msg_control += cmlen;
msg->msg_controllen -= cmlen; <--- iff no tail-padding space here ...
} ... wrap-around
F.e. it will wrap to a length of 18446744073709551612 bytes in case the
receiver passed in msg->msg_controllen of 20 bytes, and the sender
properly transferred 1 fd to the receiver, so that its CMSG_LEN results
in 20 bytes and CMSG_SPACE in 24 bytes.
In case of MSG_CMSG_COMPAT (scm_detach_fds_compat()), I haven't seen an
issue in my tests as alignment seems always on 4 byte boundary. Same
should be in case of native 32 bit, where we end up with 4 byte boundaries
as well.
In practice, passing msg->msg_controllen of 20 to recvmsg() while receiving
a single fd would mean that on successful return, msg->msg_controllen is
being set by the kernel to 24 bytes instead, thus more than the input
buffer advertised. It could f.e. become an issue if such application later
on zeroes or copies the control buffer based on the returned msg->msg_controllen
elsewhere.
Maximum number of fds we can send is a hard upper limit SCM_MAX_FD (253).
Going over the code, it seems like msg->msg_controllen is not being read
after scm_detach_fds() in scm_recv() anymore by the kernel, good!
Relevant recvmsg() handler are unix_dgram_recvmsg() (unix_seqpacket_recvmsg())
and unix_stream_recvmsg(). Both return back to their recvmsg() caller,
and ___sys_recvmsg() places the updated length, that is, new msg_control -
old msg_control pointer into msg->msg_controllen (hence the 24 bytes seen
in the example).
Long time ago, Wei Yongjun fixed something related in commit 1ac70e7ad24a
("[NET]: Fix function put_cmsg() which may cause usr application memory
overflow").
RFC3542, section 20.2. says:
The fields shown as "XX" are possible padding, between the cmsghdr
structure and the data, and between the data and the next cmsghdr
structure, if required by the implementation. While sending an
application may or may not include padding at the end of last
ancillary data in msg_controllen and implementations must accept both
as valid. On receiving a portable application must provide space for
padding at the end of the last ancillary data as implementations may
copy out the padding at the end of the control message buffer and
include it in the received msg_controllen. When recvmsg() is called
if msg_controllen is too small for all the ancillary data items
including any trailing padding after the last item an implementation
may set MSG_CTRUNC.
Since we didn't place MSG_CTRUNC for already quite a long time, just do
the same as in 1ac70e7ad24a to avoid an overflow.
Btw, even man-page author got this wrong :/ See db939c9b26e9 ("cmsg.3: Fix
error in SCM_RIGHTS code sample"). Some people must have copied this (?),
thus it got triggered in the wild (reported several times during boot by
David and HacKurx).
No Fixes tag this time as pre 2002 (that is, pre history tree).
Reported-by: David Sterba <dave@jikos.cz> Reported-by: HacKurx <hackurx@gmail.com> Cc: PaX Team <pageexec@freemail.hu> Cc: Emese Revfy <re.emese@gmail.com> Cc: Brad Spengler <spender@grsecurity.net> Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: Eric Dumazet <edumazet@google.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
His program is specifically attempting a Cross SYN TCP exchange,
that we support (for the pleasure of hackers ?), but it looks we
lack proper tcp->copied_seq initialization.
Thanks again Dmitry for your report and testings.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
When a passive TCP is created, we eventually call tcp_md5_do_add()
with sk pointing to the child. It is not owner by the user yet (we
will add this socket into listener accept queue a bit later anyway)
But we do own the spinlock, so amend the lockdep annotation to avoid
following splat :
Fixes: a8afca0329988 ("tcp: md5: protects md5sig_info with RCU") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Thomas reports
"
4gsystems sells two total different LTE-surfsticks under the same name.
..
The newer version of XS Stick W100 is from "omega"
..
Under windows the driver switches to the same ID, and uses MI03\6 for
network and MI01\6 for modem.
..
echo "1c9e 9b01" > /sys/bus/usb/drivers/qmi_wwan/new_id
echo "1c9e 9b01" > /sys/bus/usb-serial/drivers/option1/new_id
There is also ttyUSB0, but it is not usable, at least not for at.
The device works well with qmi and ModemManager-NetworkManager.
"
Reported-by: Thomas Schäfer <tschaefer@t-online.de> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
the OUTMCAST stat is double incremented, getting bumped once in the mcast code
itself, and again in the common ip output path. Remove the mcast bump, as its
not needed
Validated by the reporter, with good results
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Claus Jensen <claus.jensen@microsemi.com> CC: Claus Jensen <claus.jensen@microsemi.com> CC: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Drivers like vxlan use the recently introduced
udp_tunnel_xmit_skb/udp_tunnel6_xmit_skb APIs. udp_tunnel6_xmit_skb
makes use of ip6tunnel_xmit, and ip6tunnel_xmit, after sending the
packet, updates the struct stats using the usual
u64_stats_update_begin/end calls on this_cpu_ptr(dev->tstats).
udp_tunnel_xmit_skb makes use of iptunnel_xmit, which doesn't touch
tstats, so drivers like vxlan, immediately after, call
iptunnel_xmit_stats, which does the same thing - calls
u64_stats_update_begin/end on this_cpu_ptr(dev->tstats).
While vxlan is probably fine (I don't know?), calling a similar function
from, say, an unbound workqueue, on a fully preemptable kernel causes
real issues:
The solution would be to protect the whole
this_cpu_ptr(dev->tstats)/u64_stats_update_begin/end blocks with
disabling preemption and then reenabling it.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
In case no struct sockaddr_ll has been passed to packet
socket's sendmsg() when doing a TX_RING flush run, then
skb->protocol is set to po->num instead, which is the protocol
passed via socket(2)/bind(2).
Applications only xmitting can go the path of allocating the
socket as socket(PF_PACKET, <mode>, 0) and do a bind(2) on the
TX_RING with sll_protocol of 0. That way, register_prot_hook()
is neither called on creation nor on bind time, which saves
cycles when there's no interest in capturing anyway.
That leaves us however with po->num 0 instead and therefore
the TX_RING flush run sets skb->protocol to 0 as well. Eric
reported that this leads to problems when using tools like
trafgen over bonding device. I.e. the bonding's hash function
could invoke the kernel's flow dissector, which depends on
skb->protocol being properly set. In the current situation, all
the traffic is then directed to a single slave.
Fix it up by inferring skb->protocol from the Ethernet header
when not set and we have ARPHRD_ETHER device type. This is only
done in case of SOCK_RAW and where we have a dev->hard_header_len
length. In case of ARPHRD_ETHER devices, this is guaranteed to
cover ETH_HLEN, and therefore being accessed on the skb after
the skb_store_bits().
Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
We concluded that the skb_probe_transport_header() should better be
called unconditionally. Avoiding the call into the flow dissector has
also not really much to do with the direct xmit mode.
While it seems that only virtio_net code makes use of GSO from non
RX/TX ring packet socket paths, we should probe for a transport header
nevertheless before they hit devices.
Reference: http://thread.gmane.org/gmane.linux.network/386173/ Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
An AF_UNIX datagram socket being the client in an n:1 association with
some server socket is only allowed to send messages to the server if the
receive queue of this socket contains at most sk_max_ack_backlog
datagrams. This implies that prospective writers might be forced to go
to sleep despite none of the message presently enqueued on the server
receive queue were sent by them. In order to ensure that these will be
woken up once space becomes again available, the present unix_dgram_poll
routine does a second sock_poll_wait call with the peer_wait wait queue
of the server socket as queue argument (unix_dgram_recvmsg does a wake
up on this queue after a datagram was received). This is inherently
problematic because the server socket is only guaranteed to remain alive
for as long as the client still holds a reference to it. In case the
connection is dissolved via connect or by the dead peer detection logic
in unix_dgram_sendmsg, the server socket may be freed despite "the
polling mechanism" (in particular, epoll) still has a pointer to the
corresponding peer_wait queue. There's no way to forcibly deregister a
wait queue with epoll.
Based on an idea by Jason Baron, the patch below changes the code such
that a wait_queue_t belonging to the client socket is enqueued on the
peer_wait queue of the server whenever the peer receive queue full
condition is detected by either a sendmsg or a poll. A wake up on the
peer queue is then relayed to the ordinary wait queue of the client
socket via wake function. The connection to the peer wait queue is again
dissolved if either a wake up is about to be relayed or the client
socket reconnects or a dead peer is detected or the client socket is
itself closed. This enables removing the second sock_poll_wait from
unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
that no blocked writer sleeps forever.
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets") Reviewed-by: Jason Baron <jbaron@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Reset pskb in macvlan_handle_frame in case skb_share_check returned a
clone.
Fixes: 8a4eb5734e8d ("net: introduce rx_handler results and logic around that") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
fs/cachefiles/rdwr.c: In function ‘cachefiles_write_page’:
fs/cachefiles/rdwr.c:882: warning: ‘ret’ may be used uninitialized in
this function
If the jump to label "error" is taken, "ret" will indeed be
uninitialized, and random stack data may be printed by the debug code.
Fixes: 102f4d900c9c8f5e ("FS-Cache: Handle a write to the page immediately beyond the EOF marker") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The callback registered by the musb driver has to comply to the latter,
but up to now had "offset" first which effectively made the function
broken for correct users. So flip the order and while at it also
switch to the parameter names of struct usb_phy_io_ops's write.
Fixes: ffb865b1e460 ("usb: musb: add ulpi access operations") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
now sctp auth cannot work well when setting a hmacid manually, which
is caused by that we didn't use the network order for hmacid, so fix
it by adding the transformation in sctp_auth_ep_set_hmacs.
even we set hmacid with the network order in userspace, it still
can't work, because of this condition in sctp_auth_ep_set_hmacs():
if (id > SCTP_AUTH_HMAC_ID_MAX)
return -EOPNOTSUPP;
so this wasn't working before and thus it won't break compatibility.
Fixes: 65b07e5d0d09 ("[SCTP]: API updates to suport SCTP-AUTH extensions.") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Since it's introduction in commit 69e3c75f4d54 ("net: TX_RING and
packet mmap"), TX_RING could be used from SOCK_DGRAM and SOCK_RAW
side. When used with SOCK_DGRAM only, the size_max > dev->mtu +
reserve check should have reserve as 0, but currently, this is
unconditionally set (in it's original form as dev->hard_header_len).
I think this is not correct since tpacket_fill_skb() would then
take dev->mtu and dev->hard_header_len into account for SOCK_DGRAM,
the extra VLAN_HLEN could be possible in both cases. Presumably, the
reserve code was copied from packet_snd(), but later on missed the
check. Make it similar as we have it in packet_snd().
Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Packet sockets can be used by various net devices and are not
really restricted to ARPHRD_ETHER device types. However, when
currently checking for the extra 4 bytes that can be transmitted
in VLAN case, our assumption is that we generally probe on
ARPHRD_ETHER devices. Therefore, before looking into Ethernet
header, check the device type first.
This also fixes the issue where non-ARPHRD_ETHER devices could
have no dev->hard_header_len in TX_RING SOCK_RAW case, and thus
the check would test unfilled linear part of the skb (instead
of non-linear).
Fixes: 57f89bfa2140 ("network: Allow af_packet to transmit +4 bytes for VLAN packets.") Fixes: 52f1454f629f ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
In tpacket_fill_skb() commit c1aad275b029 ("packet: set transport
header before doing xmit") and later on 40893fd0fd4e ("net: switch
to use skb_probe_transport_header()") was probing for a transport
header on the skb from a ring buffer slot, but at a time, where
the skb has _not even_ been filled with data yet. So that call into
the flow dissector is pretty useless. Lets do it after we've set
up the skb frags.
Fixes: c1aad275b029 ("packet: set transport header before doing xmit") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Use the local uapi headers to keep in sync with "recently" added #define's
(e.g. SKF_AD_VLAN_TPID). Refactored CFLAGS, and bpf_asm doesn't need -I.
Fixes: 3f356385e8a4 ("filter: bpf_asm: add minimal bpf asm tool") Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Anytime a write operation is performed with Reliable Write flag enabled,
the eMMC device is enforced to bypass the cache and do a write to the
underling NVM device by Jedec specification; this causes a performance
penalty since write operations can't be optimized by the device cache.
In our tests, we replayed a typical mobile daily trace pattern and found
~9% overall time reduction in trace replay by using this patch. Also the
write ops within 4KB~64KB chunk size range get a 40~60% performance
improvement by using the patch (as this range of write chunks are the ones
affected by REQ_META).
This patch has been discussed in the Mobile & Embedded Linux Storage Forum
and it's the results of feedbacks from many people. We also checked with
fsdevl and f2fs mailing list developers that this change in the usage of
REQ_META is not affecting FS behavior and we got positive feedbacks.
Reporting here the feedbacks:
http://comments.gmane.org/gmane.linux.file-systems/97219
http://thread.gmane.org/gmane.linux.file-systems.f2fs/3178/focus=3183
Signed-off-by: Bruce Ford <bford@micron.com> Signed-off-by: Luca Porzio <lporzio@micron.com> Fixes: ce39f9d17c14 ("mmc: support packed write command for eMMC4.5 devices") Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
New created file's mode is not masked with umask, and this makes umask not
work for ocfs2 volume.
Fixes: 702e5bc ("ocfs2: use generic posix ACL infrastructure") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Gang He <ghe@suse.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
sigsuspend() is nowhere used except in signal.c itself, so we can mark it
static do not pollute the global namespace.
But this patch is more than a boring cleanup patch, it fixes a real issue
on UserModeLinux. UML has a special console driver to display ttys using
xterm, or other terminal emulators, on the host side. Vegard reported
that sometimes UML is unable to spawn a xterm and he's facing the
following warning:
WARNING: CPU: 0 PID: 908 at include/linux/thread_info.h:128 sigsuspend+0xab/0xc0()
It turned out that this warning makes absolutely no sense as the UML
xterm code calls sigsuspend() on the host side, at least it tries. But
as the kernel itself offers a sigsuspend() symbol the linker choose this
one instead of the glibc wrapper. Interestingly this code used to work
since ever but always blocked signals on the wrong side. Some recent
kernel change made the WARN_ON() trigger and uncovered the bug.
It is a wonderful example of how much works by chance on computers. :-)
For the root directory, . and .. are faked (using dir_emit_dots()) and
ctx->pos is reset from 2 to 0.
A corrupted root directory could cause fat_get_entry() to fail, but
->iterate() (fat_readdir()) reports progress to the VFS (with ctx->pos
rewound to 0), so any following calls to ->iterate() continue to return
the same entries again and again.
The result is that userspace will never see the end of the directory,
causing e.g. 'ls' to hang in a getdents() loop.
[hirofumi@mail.parknet.co.jp: cleanup and make sure to correct fake_offset] Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Tested-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Richard Weinberger <richard.weinberger@gmail.com> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
It is not permitted to set task state before lock. usblp_wwait sets
the state to TASK_INTERRUPTIBLE and calls mutex_lock_interruptible.
Upon return from that function, the state will be TASK_RUNNING again.
Thomas reports
"
4gsystems sells two total different LTE-surfsticks under the same name.
..
The newer version of XS Stick W100 is from "omega"
..
Under windows the driver switches to the same ID, and uses MI03\6 for
network and MI01\6 for modem.
..
echo "1c9e 9b01" > /sys/bus/usb/drivers/qmi_wwan/new_id
echo "1c9e 9b01" > /sys/bus/usb-serial/drivers/option1/new_id
As previously reported, some userspace applications depend on bogomips
showed by /proc/cpuinfo. Although there is much less legacy impact on
aarch64 than arm, it does break libvirt.
This patch reverts commit 326b16db9f69 ("arm64: delay: don't bother
reporting bogomips in /proc/cpuinfo"), but with some tweak due to
context change and without the pr_info().
Fixes: 326b16db9f69 ("arm64: delay: don't bother reporting bogomips in /proc/cpuinfo") Signed-off-by: Yang Shi <yang.shi@linaro.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[ luis: backported to 3.16:
- file rename: cpuinfo.c -> setup.c
- linux/delay.h is already included
- adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
There appears to be no formal statement of what pv_irq_ops.save_fl() is
supposed to return precisely. Native returns the full flags, while lguest and
Xen only return the Interrupt Flag, and both have comments by the
implementations stating that only the Interrupt Flag is looked at. This may
have been true when initially implemented, but no longer is.
To make matters worse, the Xen PVOP leaves the upper bits undefined, making
the BUG_ON() undefined behaviour. Experimentally, this now trips for 32bit PV
guests on Broadwell hardware. The BUG_ON() is consistent for an individual
build, but not consistent for all builds. It has also been a sitting timebomb
since SMAP support was introduced.
Use native_save_fl() instead, which will obtain an accurate view of the AC
flag.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Tested-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: <lguest@lists.ozlabs.org> Cc: Xen-devel <xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/1433323874-6927-1-git-send-email-andrew.cooper3@citrix.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Clear device initiated resume variables once device is fully up and running
in U0 state.
Resume needs to be signaled for 20ms for usb2 devices before they can be
moved to U0 state.
An interrupt is triggered if a device initiates resume. As we handle the
event in interrupt context we can not sleep for 20ms, so we instead set
a resume flag, a timestamp, and start the roothub polling.
The roothub code will later move the port to U0 when it finds a port in
resume state with the resume flag set, and timestamp passed by 20ms.
A host initiated resume is however not done in interrupt context, and
host initiated resume code will directly signal resume, wait 20ms and then
move the port to U0.
These two codepaths can race, if we are in the middle of a host initated
resume, while sleeping for 20ms, we may handle a port event and find the
port in resume state. The port event handling code will assume the resume
was device initiated and set the resume flag and timestamp.
Root hub code will however not catch the port in resume state again as the
host initated resume code has already moved the port to U0.
The resume flag and timestamp will remain set for this port preventing port
from suspending again (LPM setting port to U3)
Fix this for now by always clearing the device initated resume parameters
once port is in U0
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Existing Intel xHCI controllers require a delay of 1 mS,
after setting the CMD_RESET bit in command register, before
accessing any HC registers. This allows the HC to complete
the reset operation and be ready for HC register access.
Without this delay, the subsequent HC register access,
may result in a system hang, very rarely.
Verified CherryView / Braswell platforms go through over
5000 warm reboot cycles (which was not possible without
this patch), without any xHCI reset hang.
Signed-off-by: Rajmohan Mani <rajmohan.mani@intel.com> Tested-by: Joe Lawrence <joe.lawrence@stratus.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Since the ci->role will be set after the host role start is complete, there
will be nobody cared irq during start host if usb irq enabled. This error
can be reproduced on i.mx6 sololite EVK board by:
1. disable otg id irq(IDIE) and disable all real otg properties of usbotg1
in dts.
2. boot up the board with ID cable and usb device connected.
3. echo gadget > /sys/kernel/debug/ci_hdrc.0/role
4. echo host > /sys/kernel/debug/ci_hdrc.0/role
5. irq 212: nobody cared.
Signed-off-by: Li Jun <jun.li@freescale.com> Signed-off-by: Peter Chen <peter.chen@freescale.com>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
In some SoCs, dwc3 is implemented as a USB2.0 only
core, meaning that it can't ever achieve SuperSpeed.
Currect driver always sets gadget.max_speed to
USB_SPEED_SUPER unconditionally. This can causes
issues to some Host stacks where the host will issue
a GetBOS() request and we will reply with a BOS
containing Superspeed Capability Descriptor.
At least Windows seems to be upset by this fact and
prints a warning that we should connect $this device
to another port.
[ balbi@ti.com : rewrote entire commit, including
source code comment to make a lot clearer what the
problem is ]
Signed-off-by: Ben McCauley <ben.mccauley@garmin.com> Signed-off-by: Felipe Balbi <balbi@ti.com>
[ luis: backported to 3.16:
- used dev_vdbg() instead of dwc3_trace() ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The function graph tracer adds instrumentation that is required to trace
both entry and exit of a function. In particular the function graph
tracer updates the "return address" of a function in order to insert
a trace callback on function exit.
Kernel power management functions like cpu_suspend() are called
upon power down entry with functions called "finishers" that are in turn
called to trigger the power down sequence but they may not return to the
kernel through the normal return path.
When the core resumes from low-power it returns to the cpu_suspend()
function through the cpu_resume path, which leaves the trace stack frame
set-up by the function tracer in an incosistent state upon return to the
kernel when tracing is enabled.
This patch fixes the issue by pausing/resuming the function graph
tracer on the thread executing cpu_suspend() (ie the function call that
subsequently triggers the "suspend finishers"), so that the function graph
tracer state is kept consistent across functions that enter power down
states and never return by effectively disabling graph tracer while they
are executing.
Fixes: 819e50e25d0c ("arm64: Add ftrace support") Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reported-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The DEVICE_HWI type was added under the faulty assumption that Huawei
devices based on Qualcomm chipsets and firmware use the static USB
interface numbering known from Gobi devices. But this model does
not apply to Huawei devices like the HP branded lt4112 (Huawei me906e).
Huawei firmwares will dynamically assign interface numbers. Functions
are renumbered when the firmware is reconfigured.
Fix by changing the DEVICE_HWI type to use a simplified version
of Huawei's subclass + protocol scheme: Blacklisting known network
interface combinations and assuming the rest are serial.
Reported-and-tested-by: Muri Nicanor <muri+libqmi@immerda.ch> Tested-by: Martin Hauke <mardnh@gmx.de> Fixes: e7181d005e84 ("USB: qcserial: Add support for HP lt4112 LTE/HSPA+ Gobi 4G Modem") Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
After Damien's D3 fix I started to get runtime suspend residency for the
first time and that revealed a breakage on the set_caching IOCTL path
that accesses the HW but doesn't take an RPM ref. Fix this up.
The Honeywell HGI80 is a wireless interface to the evohome connected
thermostat. It uses a TI 3410 USB-serial port.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The device is modeswitched from 1410:9020 to 1410:9022 by selecting the
4th USB configuration:
$ sudo usb_modeswitch –v 0x1410 –p 0x9020 –u 4
This configuration provides a ECM interface as well as TTYs ('Enterprise
Mode' according to the U620 Linux integration guide).
Signed-off-by: Aleksander Morgado <aleksander@aleksander.es> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
It seems like this device has same vendor and product IDs as G2K
devices, but it has different number of interfaces(4 vs 5) and also
different interface layout which makes it currently unusable:
usbcore: registered new interface driver qcserial
usbserial: USB Serial support registered for Qualcomm USB modem
usb 2-1.2: unknown number of interfaces: 5
Signed-off-by: Petr Štetiar <ynezz@true.cz>
[johan: rename define and add comment ] Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Unsurprisingly macbooks have backlights, just the VBT doesn't seem to
know it in this case.
Reported-and-tested-by: Daniel Nicoletti <dantti12@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88325 Fixes: c675949ec58c ("drm/i915: do not setup backlight if not available according to VBT") Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1446716999-1796-1-git-send-email-jani.nikula@intel.com Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The scaling factor for VREFN is 3.0/4096 (not 1.0/4096), just as for
VREFP. This is not immediately obvious from the specification (Xilinx
UG480), but has been confirmed by Xilinx support.
Suggested-by: Hartmut Knaack <knaack.h@gmx.de> Signed-off-by: Thomas Betker <thomas.betker@rohde-schwarz.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
(This should have gone to LKML originally. Sorry for the extra
noise, folks on the cc.)
Background:
Signal frames on x86 have two formats:
1. For 32-bit executables (whether on a real 32-bit kernel or
under 32-bit emulation on a 64-bit kernel) we have a
'fpregset_t' that includes the "FSAVE" registers.
2. For 64-bit executables (on 64-bit kernels obviously), the
'fpregset_t' is smaller and does not contain the "FSAVE"
state.
When creating the signal frame, we have to be aware of whether
we are running a 32 or 64-bit executable so we create the
correct format signal frame.
Problem:
save_xstate_epilog() uses 'fx_sw_reserved_ia32' whenever it is
called for a 32-bit executable. This is for real 32-bit and
ia32 emulation.
But, fpu__init_prepare_fx_sw_frame() only initializes
'fx_sw_reserved_ia32' when emulation is enabled, *NOT* for real
32-bit kernels.
This leads to really wierd situations where 32-bit programs
lose their extended state when returning from a signal handler.
The kernel copies the uninitialized (zero) 'fx_sw_reserved_ia32'
out to userspace in save_xstate_epilog(). But when returning
from the signal, the kernel errors out in check_for_xstate()
when it does not see FP_XSTATE_MAGIC1 present (because it was
zeroed). This leads to the FPU/XSAVE state being initialized.
For MPX, this leads to the most permissive state and means we
silently lose bounds violations. I think this would also mean
that we could lose *ANY* FPU/SSE/AVX state. I'm not sure why
no one has spotted this bug.
I believe this was broken by:
72a671ced66d ("x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels")
way back in 2012.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@sr71.net Cc: fenghua.yu@intel.com Cc: yu-cheng.yu@intel.com Link: http://lkml.kernel.org/r/20151111002354.A0799571@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
[ luis: backported to 3.16:
- file and function rename:
* arch/x86/kernel/fpu/signal.c -> arch/x86/kernel/xsave.c
* fpu__init_prepare_fx_sw_frame() -> prepare_fx_sw_frame()
- use 'i387_fsave_struct' instead of 'fregs_state'
- adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The ad5629/ad5669 are the I2C variant of the ad5628/ad5668, which has a SPI
interface. They are mostly identical with the exception that the shift
factor is different. Currently the driver does not take care of this
difference which leads to incorrect DAC output values.
Fix this by introducing a custom channel spec for the ad5629/ad5669 with
the correct shift factor.
Fixes: commit 6a17a0768f77 ("iio:dac:ad5064: Add support for the ad5629r and ad5669r") Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
i2c_master_send() returns the number of bytes transferred on success while
the ad5064 driver expects that the write() callback returns 0 on success.
Fix that by translating any non negative return value of i2c_master_send()
to 0.
Fixes: commit 6a17a0768f77 ("iio:dac:ad5064: Add support for the ad5629r and ad5669r") Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
If common clock framework is configured, the driver generates a warning,
which is fixed by this change:
root@devkit3250:~# cat /sys/bus/iio/devices/iio\:device0/in_voltage0_raw
------------[ cut here ]------------
WARNING: CPU: 0 PID: 724 at drivers/clk/clk.c:727 clk_core_enable+0x2c/0xa4()
Modules linked in: sc16is7xx snd_soc_uda1380
CPU: 0 PID: 724 Comm: cat Not tainted 4.3.0-rc2+ #198
Hardware name: LPC32XX SoC (Flattened Device Tree)
Backtrace:
[<>] (dump_backtrace) from [<>] (show_stack+0x18/0x1c)
[<>] (show_stack) from [<>] (dump_stack+0x20/0x28)
[<>] (dump_stack) from [<>] (warn_slowpath_common+0x90/0xb8)
[<>] (warn_slowpath_common) from [<>] (warn_slowpath_null+0x24/0x2c)
[<>] (warn_slowpath_null) from [<>] (clk_core_enable+0x2c/0xa4)
[<>] (clk_core_enable) from [<>] (clk_enable+0x24/0x38)
[<>] (clk_enable) from [<>] (lpc32xx_read_raw+0x38/0x80)
[<>] (lpc32xx_read_raw) from [<>] (iio_read_channel_info+0x70/0x94)
[<>] (iio_read_channel_info) from [<>] (dev_attr_show+0x28/0x4c)
[<>] (dev_attr_show) from [<>] (sysfs_kf_seq_show+0x8c/0xf0)
[<>] (sysfs_kf_seq_show) from [<>] (kernfs_seq_show+0x2c/0x30)
[<>] (kernfs_seq_show) from [<>] (seq_read+0x1c8/0x440)
[<>] (seq_read) from [<>] (kernfs_fop_read+0x38/0x170)
[<>] (kernfs_fop_read) from [<>] (do_readv_writev+0x16c/0x238)
[<>] (do_readv_writev) from [<>] (vfs_readv+0x50/0x58)
[<>] (vfs_readv) from [<>] (default_file_splice_read+0x1a4/0x308)
[<>] (default_file_splice_read) from [<>] (do_splice_to+0x78/0x84)
[<>] (do_splice_to) from [<>] (splice_direct_to_actor+0xc8/0x1cc)
[<>] (splice_direct_to_actor) from [<>] (do_splice_direct+0xa0/0xb8)
[<>] (do_splice_direct) from [<>] (do_sendfile+0x1a8/0x30c)
[<>] (do_sendfile) from [<>] (SyS_sendfile64+0x104/0x10c)
[<>] (SyS_sendfile64) from [<>] (ret_fast_syscall+0x0/0x38)
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Handle a write being requested to the page immediately beyond the EOF
marker on a cache object. Currently this gets an assertion failure in
CacheFiles because the EOF marker is used there to encode information about
a partial page at the EOF - which could lead to an unknown blank spot in
the file if we extend the file over it.
The problem is actually in fscache where we check the index of the page
being written against store_limit. store_limit is set to the number of
pages that we're allowed to store by fscache_set_store_limit() - which
means it's one more than the index of the last page we're allowed to store.
The problem is that we permit writing to a page with an index _equal_ to
the store limit - when we should reject that case.
Whilst we're at it, change the triggered assertion in CacheFiles to just
return -ENOBUFS instead.
The assertion failure looks something like this:
CacheFiles: Assertion failed
1000 < 7b1 is false
------------[ cut here ]------------
kernel BUG at fs/cachefiles/rdwr.c:962!
...
RIP: 0010:[<ffffffffa02c9e83>] [<ffffffffa02c9e83>] cachefiles_write_page+0x273/0x2d0 [cachefiles]
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ kamal: backport to 3.13-stable: no __kernel_write(); thanks Ben H. ] Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The following statement of ABI/testing/dev-kmsg is not quite right:
It is not possible to inject messages from userspace with the
facility number LOG_KERN (0), to make sure that the origin of the
messages can always be reliably determined.
Userland actually can inject messages with a facility of 0 by abusing the
fact that the facility is stored in a u8 data type. By using a facility
which is a multiple of 256 the assignment of msg->facility in log_store()
implicitly truncates it to 0, i.e. LOG_KERN, allowing users of /dev/kmsg
to spoof kernel messages as shown below:
The following call...
# printf '<%d>Kernel panic - not syncing: beer empty\n' 0 >/dev/kmsg
...leads to the following log entry (dmesg -x | tail -n 1):
user :emerg : [ 66.137758] Kernel panic - not syncing: beer empty
However, this call...
# printf '<%d>Kernel panic - not syncing: beer empty\n' 0x800 >/dev/kmsg
...leads to the slightly different log entry (note the kernel facility):
kern :emerg : [ 74.177343] Kernel panic - not syncing: beer empty
Fix that by limiting the user provided facility to 8 bit right from the
beginning and catch the truncation early.
Fixes: 7ff9554bb578 ("printk: convert byte-buffer to variable-length...") Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Petr Mladek <pmladek@suse.cz> Cc: Alex Elder <elder@linaro.org> Cc: Joe Perches <joe@perches.com> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ kamal: backport to 3.13-stable: retain local 'int i' ] Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
We've had many reports that some Creative sound cards with CA0132
don't work well. Some reported that it starts working after reloading
the module, while some reported it starts working when a 32bit kernel
is used. All these facts seem implying that the chip fails to
communicate when the buffer is located in 64bit address.
This patch addresses these issues by just adding AZX_DCAPS_NO_64BIT
flag to the corresponding PCI entries. I casually had a chance to
test an SB Recon3D board, and indeed this seems helping.
Although this hasn't been tested on all Creative devices, it's safer
to assume that this restriction applies to the rest of them, too. So
the flag is applied to all Creative entries.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[ kamal: backport to 3.13-stable: context ] Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
So the /proc/PID/stat 'wchan' field (the 30th field, which contains
the absolute kernel address of the kernel function a task is blocked in)
leaks absolute kernel addresses to unprivileged user-space:
seq_put_decimal_ull(m, ' ', wchan);
The absolute address might also leak via /proc/PID/wchan as well, if
KALLSYMS is turned off or if the symbol lookup fails for some reason:
static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
struct pid *pid, struct task_struct *task)
{
unsigned long wchan;
char symname[KSYM_NAME_LEN];
There's one compatibility quirk here: procps relies on whether the
absolute value is non-zero - and we can provide that functionality
by outputing "0" or "1" depending on whether the task is blocked
(whether there's a wchan address).
These days there appears to be very little legitimate reason
user-space would be interested in the absolute address. The
absolute address is mostly historic: from the days when we
didn't have kallsyms and user-space procps had to do the
decoding itself via the System.map.
So this patch sets all numeric output to "0" or "1" and keeps only
symbolic output, in /proc/PID/wchan.
( The absolute sleep address can generally still be profiled via
perf, by tasks with sufficient privileges. )
Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: kasan-dev <kasan-dev@googlegroups.com> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150930135917.GA3285@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
[ kamal: backport to 3.16-stable: proc_pid_wchan context ] Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
There seems to be two problems causing this issue.
First, in drivers/tty/n_tty.c, __receive_buf() stores the data and
updates ldata->commit_head using smp_store_release() and then checks
the wait queue using waitqueue_active(). However, since there is no
memory barrier, __receive_buf() could return without calling
wake_up_interactive_poll(), and at the same time, n_tty_read() could
start to wait in wait_woken() as in the following chart.
__receive_buf() n_tty_read()
------------------------------------------------------------------------
if (waitqueue_active(&tty->read_wait))
/* Memory operations issued after the
RELEASE may be completed before the
RELEASE operation has completed */
add_wait_queue(&tty->read_wait, &wait);
...
if (!input_available_p(tty, 0)) {
smp_store_release(&ldata->commit_head,
ldata->read_head);
...
timeout = wait_woken(&wait,
TASK_INTERRUPTIBLE, timeout);
------------------------------------------------------------------------
The second problem is that n_tty_read() also lacks a memory barrier
call and could also cause __receive_buf() to return without calling
wake_up_interactive_poll(), and n_tty_read() to wait in wait_woken()
as in the chart below.
__receive_buf() n_tty_read()
------------------------------------------------------------------------
spin_lock_irqsave(&q->lock, flags);
/* from add_wait_queue() */
...
if (!input_available_p(tty, 0)) {
/* Memory operations issued after the
RELEASE may be completed before the
RELEASE operation has completed */
smp_store_release(&ldata->commit_head,
ldata->read_head);
if (waitqueue_active(&tty->read_wait))
__add_wait_queue(q, wait);
spin_unlock_irqrestore(&q->lock,flags);
/* from add_wait_queue() */
...
timeout = wait_woken(&wait,
TASK_INTERRUPTIBLE, timeout);
------------------------------------------------------------------------
There are also other places in drivers/tty/n_tty.c which have similar
calls to waitqueue_active(), so instead of adding many memory barrier
calls, this patch simply removes the call to waitqueue_active(),
leaving just wake_up*() behind.
This fixes both problems because, even though the memory access before
or after the spinlocks in both wake_up*() and add_wait_queue() can
sneak into the critical section, it cannot go past it and the critical
section assures that they will be serialized (please see "INTER-CPU
ACQUIRING BARRIER EFFECTS" in Documentation/memory-barriers.txt for a
better explanation). Moreover, the resulting code is much simpler.
Latency measurement using a ping-pong test over a pty doesn't show any
visible performance drop.
Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ luis: backported to 3.16:
- always use wake_up_interruptible() instead of wake_up_interruptible_poll()
- adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Calling e.g. blk_queue_max_hw_sectors() after calls to
disk_stack_limits() discards the settings determined by
disk_stack_limits().
So we need to make those calls first.
Fixes: 199dc6ed5179 ("md/raid0: update queue parameter in a safer location.") Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
When a (e.g.) RAID5 array is reshaped to RAID0, the updating
of queue parameters (e.g. max number of sectors per bio) is
done in the wrong place.
It should be part of ->run, but it is actually part of ->takeover.
This means it happens before level_store() calls:
blk_set_stacking_limits(&mddev->queue->limits);
and so it ineffective. This can lead to errors from underlying
devices.
So move all the relevant settings out of create_stripe_zones()
and into raid0_run().
As this can lead to a bug-on it is suitable for any -stable
kernel which supports reshape to RAID0. So 2.6.35 or later.
As the bug has been present for five years there is no urgency,
so no need to rush into -stable.
Fixes: 9af204cf720c ("md: Add support for Raid5->Raid0 and Raid10->Raid0 takeover") Reported-by: Yi Zhang <yizhan@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com>
[ luis: backported to 3.16:
- raid0 isn't accessed from dm-raid so no conditional mddev->queue accesses
(done with commit 753f2856cda2 "md raid0: access mddev->queue (request
queue member) conditionally because it is not set when accessed from
dm-raid")
- adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Reference to the 'np' node is dropped before dereferencing the 'sizep' and
'basep' pointers, which could by then point to junk if the node has been
freed.
Refactor code to call 'of_node_put' later.
Fixes: c5df39262dd5 ("drivers/char/tpm: Add securityfs support for event log") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Acked-by: Peter Huewe <PeterHuewe@gmx.de>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
mcp794xx alarm registers must be written in BCD format. However, the
alarm programming logic neglected this by adding one to the value
after bin2bcd conversion has been already done, writing bad values
to month register in case the alarm being set is in October. In this
case, the alarm month value becomes 0x0a instead of the expected 0x10.
Fix by moving the +1 addition within the bin2bcd call also.
Fixes: 1d1945d261a2 ("drivers/rtc/rtc-ds1307.c: add alarm support for mcp7941x chips") Signed-off-by: Tero Kristo <t-kristo@ti.com> Acked-by: Nishanth Menon <nm@ti.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Since commit 7d5cd2ce529b, when bond_enslave fails on devices that
are not ARPHRD_ETHER, if needed, it resets the bonding device back to
ARPHRD_ETHER by calling ether_setup.
Unfortunately, ether_setup clobbers dev->flags, clearing IFF_UP
if the bond device is up, leaving it in a quasi-down state without
having actually gone through dev_close. For bonding, if any periodic
work queue items are active (miimon, arp_interval, etc), those will
remain running, as they are stopped by bond_close. At this point, if
the bonding module is unloaded or the bond is deleted, the system will
panic when the work function is called.
This panic is resolved by calling dev_close on the bond itself
prior to calling ether_setup.
Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Fixes: 7d5cd2ce5292 ("bonding: correctly handle bonding type change on enslave failure") Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The -i flag was incorrectly listed as a short flag for --no-inherit. It
should have only been listed as a short flag for --input.
This documentation error has existed since the --input flag was
introduced in 6810fc915f7a89d8134edb3996dbbf8eac386c26 (perf trace: Add
option to analyze events in a file versus live).
Signed-off-by: Peter Feiner <pfeiner@google.com> Cc: David Ahern <dsahern@gmail.com> Link: http://lkml.kernel.org/r/1446657706-14518-1-git-send-email-pfeiner@google.com Fixes: 6810fc915f7a ("perf trace: Add option to analyze events in a file versus live") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Both tunnel6_protocol and tunnel46_protocol share the same error
handler, tunnel6_err(), which traverses through tunnel6_handlers list.
For ipip6 tunnels, we need to traverse tunnel46_handlers as we do e.g.
in tunnel46_rcv(). Current code can generate an ICMPv6 error message
with an IPv4 packet embedded in it.
Fixes: 73d605d1abbd ("[IPSEC]: changing API of xfrm6_tunnel_register") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
create_request_message() computes the maximum length of a message,
but uses the wrong type for the time stamp: sizeof(struct timespec)
may be 8 or 16 depending on the architecture, while sizeof(struct
ceph_timespec) is always 8, and that is what gets put into the
message.
Found while auditing the uses of timespec for y2038 problems.
Fixes: b8e69066d8af ("ceph: include time stamp in every MDS request") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>