If we use the USB ID pin as a wakeup source and there is a USB block
device on this USB OTG (ID) cable, the system will deadlock after
system resume.
The root cause of this problem is that the workqueue ci_otg may try
to remove the hcd before the driver resume has finished. The hcd will
disconnect the device on it and then call device_release_driver(),
which takes the device lock "dev->mutex". That lock is never released,
because the caller waits for the writeback workqueue to run and flush
the block information, but the writeback workqueue is freezable and is
not thawed before the driver resume has finished.
When the resume of the driver (device: sd 0:0:0:0:) reaches dpm_complete, it
tries to take its device lock "dev->mutex" but can never acquire it,
and the deadlock occurs. The call stacks below show the situation.
So, in order to fix this problem, we change the ci_otg workqueue to be
freezable. The work items in this workqueue will then run after the
driver's resume, and the workqueue will not be blocked forever as in the
case above, since the writeback workqueue has been thawed as well.
The 5 volt detect functionality broke in 3.14: the code reads IO register 0x70
again after it has already been cleared. Instead it should use the cached
irq_reg_0x70 value, and the io_write to 0x71 that clears 0x70 can be dropped
since this has already been done.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
As reported by Soohoon Lee, the HDIO_GET_32BIT ioctl does not
work correctly in compat mode with libata.
I have investigated the issue further and found multiple problems
that all appeared with the same commit that originally introduced
HDIO_GET_32BIT handling in libata back in linux-2.6.8 and presumably
also linux-2.4, as the code uses "copy_to_user(arg, &val, 1)" to copy
a 'long' variable containing either 0 or 1 to user space.
The problems with this are:
* On big-endian machines, this will always write a zero because it
stores the wrong byte into user space.
* In compat mode, the upper three bytes of the variable are updated
by the compat_hdio_ioctl() function, but they now contain
uninitialized stack data.
* The hdparm tool calling this ioctl uses a 'static long' variable
to store the result. This means at least the upper bytes are
initialized to zero, but calling another ioctl like HDIO_GET_MULTCOUNT
would fill them with data that remains stale when the low byte
is overwritten. Fortunately libata doesn't implement any of the
affected ioctl commands, so this would only happen when we query
both an IDE and an ATA device in the same command such as
"hdparm -N -c /dev/hda /dev/sda"
* The libata code for unknown reasons started using ATA_IOC_GET_IO32
and ATA_IOC_SET_IO32 as aliases for HDIO_GET_32BIT and HDIO_SET_32BIT,
while the ioctl commands that were added later use the normal
HDIO_* names. This is harmless but rather confusing.
This addresses all four issues by changing the code to use put_user()
on an 'unsigned long' variable in HDIO_GET_32BIT, like the IDE subsystem
does, and by clarifying the names of the ioctl commands.
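A minimal sketch of the HDIO_GET_32BIT path after the change (where the 0/1 value comes from is driver-specific and only illustrative here):

    case HDIO_GET_32BIT: {
        /* 0 or 1 depending on whether 32-bit PIO is enabled; the source
         * of this value is illustrative, not the exact libata code. */
        unsigned long val = pio32_enabled ? 1 : 0;

        /* put_user() writes the full 'unsigned long', so it behaves
         * correctly on big-endian machines and in compat mode, unlike
         * copy_to_user(arg, &val, 1) which copied a single byte. */
        return put_user(val, (unsigned long __user *)arg);
    }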
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Soohoon Lee <Soohoon.Lee@f5.com> Tested-by: Soohoon Lee <Soohoon.Lee@f5.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
This patch was triggering an Oops in stable kernel 3.10.99. Christian
agrees that the patch is correct but "assumes that radeon_fence_unref()
can safely take NULL as the fence which is not the case for older
kernels."
Reported-by: Erik Andersen <andersen@codepoet.org> Acked-by: Christian König <christian.koenig@amd.com> Cc: Nicolai Hähnle <nicolai.haehnle@amd.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
This patch fixes a race between setting of SCF_SEND_DELAYED_TAS
in transport_send_task_abort(), and check of the same bit in
transport_check_aborted_status().
It adds a __transport_check_aborted_status() version that is
used by target_execute_cmd() when se_cmd->t_state_lock is
held, and a transport_check_aborted_status() wrapper for
all other existing callers.
Also, it handles the case where the check happens before
transport_send_task_abort() gets called. For this, go
ahead and set SCF_SEND_DELAYED_TAS early when necessary,
and have transport_send_task_abort() send the abort.
Cc: Quinn Tran <quinn.tran@qlogic.com> Cc: Himanshu Madhani <himanshu.madhani@qlogic.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Andy Grover <agrover@redhat.com> Cc: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[ luis: backported to 3.16: based on Nicholas' backport to 3.14 ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
To address the bug where fabric driver level shutdown
of se_cmd occurs at the same time as TMR CMD_T_ABORTED
handling, resulting in a -1 ->cmd_kref, this patch
adds a CMD_T_FABRIC_STOP bit that is used to determine
when TMR + driver I_T nexus shutdown are happening
concurrently.
It changes target_sess_cmd_list_set_waiting() to obtain
se_cmd->cmd_kref + set CMD_T_FABRIC_STOP, and to drop the
local reference in target_wait_for_sess_cmds() and invoke
an extra target_put_sess_cmd() during Task Aborted Status
(TAS) when necessary.
Also, it adds a new target_wait_free_cmd() wrapper around
transport_wait_for_tasks() for the special case within
transport_generic_free_cmd() to set CMD_T_FABRIC_STOP,
and is now aware of CMD_T_ABORTED + CMD_T_TAS status
bits to know when an extra transport_put_cmd() during
TAS is required.
Note transport_generic_free_cmd() is expected to block on
cmd->cmd_wait_comp in order to follow what iscsi-target
expects during iscsi_conn context se_cmd shutdown.
Cc: Quinn Tran <quinn.tran@qlogic.com> Cc: Himanshu Madhani <himanshu.madhani@qlogic.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Andy Grover <agrover@redhat.com> Cc: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
[ luis: backported to 3.16: used Nicholas' backport to 3.14 ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
This patch fixes a bug in TMR task aborted status (TAS)
handling when multiple sessions are connected to the
same target WWPN endpoint and se_node_acl descriptor,
resulting in TASK_ABORTED status not being generated
for aborted se_cmds on the remote port.
This is due to core_tmr_handle_tas_abort() incorrectly
comparing se_node_acl instead of se_session, for which
the multi-session case is expected to be sharing the
same se_node_acl.
Instead, go ahead and update core_tmr_handle_tas_abort()
to compare tmr_sess + cmd->se_sess in order to determine
if the LUN_RESET was received on a different I_T nexus,
and TASK_ABORTED status response needs to be generated.
Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Quinn Tran <quinn.tran@qlogic.com> Cc: Himanshu Madhani <himanshu.madhani@qlogic.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Andy Grover <agrover@redhat.com> Cc: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[ luis: backported to 3.16: used Nicholas' backport to 3.14 ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
This patch fixes a NULL pointer se_cmd->cmd_kref < 0
refcount bug during TMR LUN_RESET with active se_cmd
I/O, that can be triggered during se_cmd descriptor
shutdown + release via core_tmr_drain_state_list() code.
To address this bug, add common __target_check_io_state()
helper for ABORT_TASK + LUN_RESET w/ CMD_T_COMPLETE
checking, and set CMD_T_ABORTED + obtain ->cmd_kref for
both cases ahead of last target_put_sess_cmd() after
TFO->aborted_task() -> transport_cmd_finish_abort()
callback has completed.
It also introduces SCF_ACK_KREF to determine when
transport_cmd_finish_abort() needs to drop the second
extra reference, ahead of calling target_put_sess_cmd()
for the final kref_put(&se_cmd->cmd_kref).
It also updates transport_cmd_check_stop() to avoid
holding se_cmd->t_state_lock while dropping se_cmd
device state via target_remove_from_state_list(), now
that core_tmr_drain_state_list() is holding the
se_device lock while checking se_cmd state from
within TMR logic.
Finally, move transport_put_cmd() release of SGL +
TMR + extended CDB memory into target_free_cmd_mem()
in order to avoid potential resource leaks in TMR
ABORT_TASK + LUN_RESET code-paths. Also update
target_release_cmd_kref() accordingly.
Reviewed-by: Quinn Tran <quinn.tran@qlogic.com> Cc: Himanshu Madhani <himanshu.madhani@qlogic.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Andy Grover <agrover@redhat.com> Cc: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[ luis: backported to 3.16: used Nicholas' backport to 3.14 ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
This patch fixes a NULL pointer se_cmd->cmd_kref < 0
refcount bug during TMR LUN_RESET with active TMRs,
triggered during se_cmd + se_tmr_req descriptor
shutdown + release via core_tmr_drain_tmr_list().
To address this bug, go ahead and obtain a local
kref_get_unless_zero(&se_cmd->cmd_kref) for active I/O
to set CMD_T_ABORTED, and transport_wait_for_tasks()
followed by the final target_put_sess_cmd() to drop
the local ->cmd_kref.
Also add two new checks within target_tmr_work() to
avoid CMD_T_ABORTED -> TFO->queue_tm_rsp() callbacks
ahead of invoking the backend -> fabric put in
transport_cmd_check_stop_to_fabric().
For good measure, also change core_tmr_release_req()
to use list_del_init() ahead of se_tmr_req memory
free.
Reviewed-by: Quinn Tran <quinn.tran@qlogic.com> Cc: Himanshu Madhani <himanshu.madhani@qlogic.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Andy Grover <agrover@redhat.com> Cc: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[ luis: backported to 3.16: used Nicholas' backport to 3.14 ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Dmitry Vyukov noted recently that the sctp_port_hashtable had an error in
its size computation, observing that the current method never guaranteed
that the hashsize (measured in number of entries) would be a power of two,
which the input hash function for that table requires. The root cause of
the problem is that two values need to be computed (one, the allocation
order of the storage required, as passed to __get_free_pages, and two, the
number of entries for the hash table). Both need to be powers of two, but
for different reasons, and the existing code simply computes one order
value and uses it as the basis for both, which is wrong (i.e. it assumes
that ((1<<order)*PAGE_SIZE)/sizeof(bucket) is still a power of two when it's not).
To fix this, we change the logic slightly. We start by computing a goal
allocation order (which is limited by the maximum size of hash table we want
to support). Then we attempt to allocate a table of that size, decreasing the
order until a successful allocation is made. Then, with the resulting
successful order, we compute the number of buckets that hash table supports,
which we then round down to the nearest power of two, giving us the number
of entries the table actually supports.
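A hedged sketch of that sizing logic (variable names are illustrative and error handling is omitted):

    /* Goal order, capped by the largest table we are willing to support. */
    int order = min(goal_order, MAX_ORDER - 1);
    unsigned long table;

    /* Walk the order down until an allocation succeeds. */
    do {
        table = __get_free_pages(GFP_KERNEL | __GFP_NOWARN, order);
    } while (!table && --order >= 0);

    /* Number of buckets that actually fit in the allocation ... */
    unsigned int entries = ((1UL << order) * PAGE_SIZE) / sizeof(bucket);

    /* ... rounded down to a power of two, as the hash function requires. */
    entries = rounddown_pow_of_two(entries);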
I've tested this locally here, using non-debug and spinlock-debug kernels,
and the number of entries in the hashtable consistently works out to be
a power of two in all cases.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> CC: Dmitry Vyukov <dvyukov@google.com> CC: Vladislav Yasevich <vyasevich@gmail.com> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
An error response from a RTM_GETNETCONF request can return the positive
error value EINVAL in the struct nlmsgerr, which can mislead userspace.
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Since the gc of ipv4 routes was removed, a cached route has no chance
to be removed, and even after it has timed out it can still be used,
because no code checks its expiry.
Fix this issue by checking and removing the route cache when we get the route.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Drop reference on the relay_po socket when __pppoe_xmit() succeeds.
This is already handled correctly in the error path.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Reported-by: Thomas Schäfer <tschaefer@t-online.de> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Dmitry reported memory leaks of IP options allocated in
ip_cmsg_send() when/if this function returns an error.
Callers are responsible for the freeing.
Many thanks to Dmitry for the report and diagnostic.
Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
tg3_tso_bug() can hit a condition where the entire tx ring is not big
enough to segment the GSO packet. For example, if MSS is very small,
gso_segs can exceed the tx ring size. When we hit the condition, it
will cause tx timeout.
tg3_tso_bug() is called to handle TSO and DMA hardware bugs.
For TSO bugs, if tg3_tso_bug() cannot succeed, we have to drop the packet.
For DMA bugs, we can still fall back to linearize the SKB and let the
hardware transmit the TSO packet.
This patch adds a function tg3_tso_bug_gso_check() to check if there
are enough tx descriptors for GSO before calling tg3_tso_bug().
The caller will then handle the error appropriately - drop or
linearize the SKB.
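The check itself is small; a hedged sketch of what it might look like (the threshold and field names may differ from the actual driver code):

    static bool tg3_tso_bug_gso_check(struct tg3_napi *tnapi, struct sk_buff *skb)
    {
        /* Refuse the tg3_tso_bug() path when the segmented packet could
         * never fit in the tx ring, even with some headroom reserved. */
        return skb_shinfo(skb)->gso_segs < tnapi->tx_pending / 3;
    }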
v2: Corrected patch description to avoid confusion.
Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Acked-by: Prashant Sreedharan <prashant@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Devices may have limits on the number of fragments in an skb they support.
The current codebase uses a constant as the maximum number of fragments one
skb can hold and use.
When enabling scatter/gather and running traffic with many small messages,
the codebase uses the maximum number of fragments and may thereby violate
the maximum for certain devices.
The patch introduces a global variable for the maximum number of fragments.
Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface")
disabled accepting the hop limit from an RA if it is smaller than the current
hop limit, for security reasons. But this behavior somewhat breaks the RFC definition.
RFC 4861, 6.3.4. Processing Received Router Advertisements
A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
and Retrans Timer) may contain a value denoting that it is
unspecified. In such cases, the parameter should be ignored and the
host should continue using whatever value it is already using.
If the received Cur Hop Limit value is non-zero, the host SHOULD set
its CurHopLimit variable to the received value.
So add a sysctl option accept_ra_min_hop_limit to let users choose the minimum
hop limit value they will accept from an RA, and set the default to 1 to meet
the RFC standard.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ luis: backported to 3.16:
- added missing DEVCONF_* as suggested by Yoshfuji so that uapi contains
the same values as mainline
- adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Currently, the egress interface index specified via IPV6_PKTINFO
is ignored by __ip6_datagram_connect(), so that RFC 3542 section 6.7
can be subverted when the user space application calls connect()
before sendmsg().
Fix it by properly initializing flowi6_oif in connect() before
performing the route lookup.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The problem here is the skb we provide to tcp_v4_send_ack() had to
be parked in the backlog of a new TCP fastopen child because this child
was owned by the user at the time an out of window packet arrived.
Before queuing a packet, TCP has to set skb->dev to NULL as the device
could disappear before packet is removed from the queue.
Fix this issue by using the net pointer provided by the socket (being a
timewait or a request socket).
IPv6 is immune to the bug: tcp_v6_send_response() already gets the net
pointer from the socket if provided.
Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path") Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jerry Chu <hkchu@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
PHY status frames are not reliable, the PHY may not be able to send them
during heavy receive traffic. This overflow condition is signaled by the
PHY in the next status frame, but the driver did not make use of it.
Instead it always reported wrong tx timestamps to user space after an
overflow happened because it assigned newly received tx timestamps to old
packets in the queue.
This commit fixes this issue by clearing the tx timestamp queue every time
an overflow happens, so that no timestamps are delivered for overflow
packets. This way time stamping will continue correctly after an overflow.
Signed-off-by: Manfred Rudigier <manfred.rudigier@omicron.at> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The value passed by unix_diag_get_exact to unix_lookup_by_ino has type
__u32, but unix_lookup_by_ino's argument ino has type int, which is not
a problem yet.
However, when ino is compared with the sock_i_ino return value of type
unsigned long, ino is sign extended to signed long, and this results
in an incorrect comparison on 64-bit architectures for inode numbers
greater than INT_MAX.
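A small illustration of the sign-extension problem (not the actual driver code):

    __u32 ino = 0x80000001;           /* inode number above INT_MAX */
    int arg = ino;                    /* old unix_lookup_by_ino() parameter type */
    unsigned long i_ino = 0x80000001; /* what sock_i_ino() returns */

    /* On 64-bit, 'arg' is sign extended to 0xffffffff80000001 before the
     * comparison, so the lookup never matches this socket. */
    if (i_ino == arg)
        ; /* never reached */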
This bug was found by strace test suite.
Fixes: 5d3cae8bc39d ("unix_diag: Dumping exact socket core") Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
It's forbidden to manually change dev->features at run-time. Currently, this is
done in the driver to make sure that GSO_UDP_TUNNEL is advertised only when a
VXLAN tunnel is set. However, since the stack actually does features intersection
with hw_enc_features, we can safely revert to advertising features early when
registering the netdevice.
Fixes: f4a1edd56120 ('net/mlx4_en: Advertize encapsulation offloads [...]') Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Previously, the shift value used for time-stamping was constant and didn't
depend on the HW chip frequency. Change that to take the frequency into account
and calculate the maximal value in cycles per wraparound of ten seconds. This
time slot was chosen since it gives a good accuracy in time synchronization.
Algorithm for shift value calculation (sketched below):
* Round up the maximal value in cycles to the nearest power of two
* Calculate the maximal multiplier by dividing a value with all 64 bits set
by the above result
* Then invert the function clocksource_khz2mult() to get the shift from the
maximal mult value
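A hedged sketch of that calculation using generic kernel helpers (names and rounding details are illustrative, not necessarily the mlx4 code):

    /* Maximal number of cycles in a ten-second wraparound window,
     * rounded up to the nearest power of two. */
    u64 max_cycles = roundup_pow_of_two((u64)freq_khz * 1000 * 10);

    /* Maximal multiplier: all 64 bits set, divided by the value above. */
    u64 max_mult = div64_u64(U64_MAX, max_cycles);

    /* clocksource_khz2mult() computes mult = (10^6 << shift) / khz, so
     * inverting it gives shift = log2(mult * khz / 10^6). */
    u32 shift = ilog2(div_u64(max_mult * freq_khz, 1000000));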
Fixes: ec693d47010e ('net/mlx4_en: Add HW timestamping (TS) support') Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
RdropOvflw counts overruns of the HW buffer and therefore should
be used for rx_fifo_errors only.
Currently the RdropOvflw counter is mistakenly also set into
rx_missed_errors and rx_over_errors, which makes the device's
total dropped-packets accounting show wrong results.
Fix that. Use it for rx_fifo_errors only.
Fixes: c27a02cd94d6 ('mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC') Signed-off-by: Amir Vadai <amir@vadai.me> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The clear and set masks in the call to phy_set_clr_bits() called from
bcm7xxx_config_init() are inverted. We need to fix this by swapping the two
arguments, that is, set 0 bits, but clear the shadow mode 2 enable bit.
Fixes: b560a58c45c66 ("net: phy: add Broadcom BCM7xxx internal PHY driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The unix_dgram_sendmsg routine uses the following test
if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
to determine if sk and other are in an n:1 association (either
established via connect or by using sendto to send messages to an
unrelated socket identified by address). This isn't correct as the
specified address could have been bound to the sending socket itself or
because this socket could have been connected to itself by the time of
the unix_peer_get but disconnected before the unix_state_lock(other). In
both cases, the if-block would be entered despite other == sk which
might either block the sender unintentionally or lead to trying to unlock
the same spin lock twice for a non-blocking send. Add an other != sk
check to guard against this.
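A minimal sketch of the guarded test (the body of the if-block is unchanged and elided):

    if (other != sk &&
        unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
        /* wait for the receive queue to drain, or return -EAGAIN for a
         * non-blocking send, exactly as before */
    }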
Fixes: 7d267278a9ec ("unix: avoid use-after-free in ep_remove_wait_queue") Reported-By: Philipp Hahn <pmhahn@pmhahn.de> Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> Tested-by: Philipp Hahn <pmhahn@pmhahn.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The current logic in bond_arp_rcv will accept an incoming ARP for
validation if (a) the receiving slave is either "active" (which includes
the currently active slave, or the current ARP slave) or, (b) there is a
currently active slave, and it has received an ARP since it became active.
For case (b), the receiving slave isn't the currently active slave, and is
receiving the original broadcast ARP request, not an ARP reply from the
target.
This logic can fail if there is no currently active slave. In
this situation, the ARP probe logic cycles through all slaves, assigning
each in turn as the "current_arp_slave" for one arp_interval, then setting
that one as "active," and sending an ARP probe from that slave. The
current logic expects the ARP reply to arrive on the sending
current_arp_slave, however, due to switch FDB updating delays, the reply
may be directed to another slave.
This can arise if the bonding slaves and switch are working, but
the ARP target is not responding. When the ARP target recovers, a
condition may result wherein the ARP target host replies faster than the
switch can update its forwarding table, causing each ARP reply to be sent
to the previous current_arp_slave. This will never pass the logic in
bond_arp_rcv, as neither of the above conditions (a) or (b) are met.
Some experimentation on a LAN shows ARP reply round trips in the
200 usec range, but my available switches never update their FDB in less
than 4000 usec.
This patch changes the logic in bond_arp_rcv to additionally
accept an ARP reply for validation on any slave if there is a current ARP
slave and it sent an ARP probe during the previous arp_interval.
Fixes: aeea64ac717a ("bonding: don't trust arp requests unless active slave really works") Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Downstream packages like Debian flash-kernel use
/proc/device-tree/model
to determine which dtb file to install.
Hence each dts in the Linux kernel should provide a unique model
identifier.
Commit 2d0a7addbd10 ("ARM: Kirkwood: Add support for many Synology NAS
devices") created the new files kirkwood-ds111.dts and kirkwood-ds112.dts
using the same model identifier.
This patch provides a unique model identifier for the
Synology DiskStation DS112.
Fixes: 2d0a7addbd10 ("ARM: Kirkwood: Add support for many Synology NAS devices") Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
... into returning a positive to path_openat(), which would interpret that
as "symlink had been encountered" and proceed to corrupt memory, etc.
It can only happen due to a bug in some ->open() instance or in some LSM
hook, etc., so we report any such event *and* make sure it doesn't trick
us into further unpleasantness.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The delete operation can allocate additional space on the HPFS filesystem
due to btree split. The HPFS driver checks in advance if there is
available space, so that it won't corrupt the btree if we run out of space
during splitting.
If there is not enough available space, the HPFS driver attempted to
truncate the file, but this results in a deadlock since the commit 7dd29d8d865efdb00c0542a5d2c87af8c52ea6c7 ("HPFS: Introduce a global mutex
and lock it on every callback from VFS").
This patch removes the code that tries to truncate the file and -ENOSPC is
returned instead. If the user hits -ENOSPC on delete, he should try to
delete other files (that are stored in a leaf btree node), so that the
delete operation will make some space for deleting the file stored in
non-leaf btree node.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
HP EliteBook 755 G2 with ALC3228 (ALC280) codec [103c:221c] requires
the known fixup (ALC269_FIXUP_HEADSET_MIC) for making the headset mic
working. Also, it suffers from the loopback noise problem, so we
should disable aamix path as well.
Reported-by: Derick Eddington <derick.eddington@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
After login to the desktop on a Dell Inspiron 3162,
there is a very loud background noise coming from the built-in speaker.
The noise does not go away even if the speaker is muted.
The noise disappears after applying the aamix fixup.
Codec: Realtek ALC3234
Address: 0
AFG Function Id: 0x1 (unsol 1)
Vendor Id: 0x10ec0255
Subsystem Id: 0x10280725
Revision Id: 0x100002
No Modem Function Group found
The contract between try_read() and try_write() is that when called
each processes as much data as possible. When instructed by osd_client
to skip a message, try_read() is violating this contract by returning
after receiving and discarding a single message instead of checking for
more. try_write() then gets a chance to write out more requests,
generating more replies/skips for try_read() to handle, forcing the
messenger into a starvation loop.
Reported-by: Varada Kari <Varada.Kari@sandisk.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Varada Kari <Varada.Kari@sandisk.com> Reviewed-by: Alex Elder <elder@linaro.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The ftrace:function event is only displayed for parsing the function tracer
data. It is not used to enable function tracing, and does not include an
"enable" file in its event directory.
Originally, this event was kept separate from other events because it did
not have a ->reg parameter. But perf added a "reg" parameter for its use
which caused issues, because it made the event available to functions where
it was not compatible for.
Commit 9b63776fa3ca9 "tracing: Do not enable function event with enable"
added a TRACE_EVENT_FL_IGNORE_ENABLE flag that prevented the function event
from being enabled by normal trace events. But this commit missed keeping
the function event from being displayed by the "available_events" directory,
which is used to show what events can be enabled by set_event.
One documented way to enable all events is to:
cat available_events > set_event
But because the function event is displayed in the available_events, this
now causes an INVALID error:
cat: write error: Invalid argument
Reported-by: Chunyu Hu <chuhu@redhat.com> Fixes: 9b63776fa3ca9 "tracing: Do not enable function event with enable" Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
In async_pf we try to allocate with NOWAIT to get an element quickly
or fail. This code also handles failures gracefully. Let's silence
potential page allocation failures under load.
The qword_get() function NUL-terminates its output buffer. If the input
string is in hex format \xXXXX... and the same length as the output
buffer, there is an off-by-one.
This patch ensures the NUL terminator doesn't fall outside the output
buffer.
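A hedged sketch of a hex-decoding loop with room reserved for the terminator (illustrative, not the exact qword_get() code):

    /* Stop one byte early so the NUL written below stays in bounds. */
    while (len < bufsize - 1) {
        if (!isxdigit(bp[0]) || !isxdigit(bp[1]))
            break;
        dest[len++] = (hex_to_bin(bp[0]) << 4) | hex_to_bin(bp[1]);
        bp += 2;
    }
    dest[len] = '\0';  /* always inside the bufsize-byte output buffer */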
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
set_power_state defaults to no displays, so we need to update
the display configuration after setting up the powerstate on the
first call. In most cases this is not an issue since it ends up
getting called multiple times at any given modeset and the proper
order is achieved in the display changed handling at the top of
the function.
Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Jordan Lazare <Jordan.Lazare@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
This patch fixes the problem that more CAN messages could be sent to the
interface than could be sent on the CAN bus. This was more likely with slow
baud rates. The sleeping _start_xmit was woken up in the _write_bulk_callback.
Under heavy TX load this produced another bulk transfer without checking the
free_slots variable and hence caused the overflow in the interface.
Signed-off-by: Gerhard Uttenthaler <uttenthaler@ems-wuensche.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
One point was missed in commit da49889deb34 ("staging:
binder: Support concurrent 32 bit and 64 bit processes."): when
BINDER_IPC_32BIT is configured, the size of binder_uintptr_t is 32 bits, but
the size of void * is 64 bits on a 64-bit system. Correct it here.
Signed-off-by: Lisa Du <cldu@marvell.com> Signed-off-by: Nicolas Boichat <drinkcat@chromium.org> Fixes: da49889deb34 ("staging: binder: Support concurrent 32 bit and 64 bit processes.") Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ luis: backported to 3.16:
- binder is still in staging in the 3.16 kernel] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Add USB ID for cp2104/5 devices on GE B650v3 and B850v3 boards.
Signed-off-by: Ken Lin <ken.lin@advantech.com.tw> Signed-off-by: Akshay Bhat <akshay.bhat@timesys.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
In the case where d_add_unique() finds an appropriate alias to use it will
have already incremented the reference count. An additional dget() to swap
the open context's dentry is unnecessary and will leak a reference.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Fixes: 275bb307865a3 ("NFSv4: Move dentry instantiation into the NFSv4-...") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The assignment of EP transfer resources was not handled properly in the
dwc3 driver. Commit aebda6187181 ("usb: dwc3: Reset the transfer
resource index on SET_INTERFACE") previously fixed one aspect of this
where resources may be exhausted with multiple calls to SET_INTERFACE.
However, it introduced an issue where composite devices with multiple
interfaces can be assigned the same transfer resources for different
endpoints. This patch solves both issues.
The assignment of transfer resources cannot perfectly follow the data
book due to the fact that the controller driver does not have all
knowledge of the configuration in advance. It is given this information
piecemeal by the composite gadget framework after every
SET_CONFIGURATION and SET_INTERFACE. Trying to follow the databook
programming model in this scenario can cause errors. For two reasons:
1) The databook says to do DEPSTARTCFG for every SET_CONFIGURATION and
SET_INTERFACE (8.1.5). This is incorrect in the scenario of multiple
interfaces.
2) The databook does not mention doing more DEPXFERCFG for new endpoint
on alt setting (8.1.6).
The following simplified method is used instead:
All hardware endpoints can be assigned a transfer resource and this
setting will stay persistent until either a core reset or hibernation.
So whenever we do a DEPSTARTCFG(0) we can go ahead and do DEPXFERCFG for
every hardware endpoint as well. We are guaranteed that there are as
many transfer resources as endpoints.
This patch triggers off of the calling dwc3_gadget_start_config() for
EP0-out, which always happens first, and which should only happen in one
of the above conditions.
Fixes: aebda6187181 ("usb: dwc3: Reset the transfer resource index on SET_INTERFACE") Reported-by: Ravi Babu <ravibabu@ti.com> Signed-off-by: John Youn <johnyoun@synopsys.com> Signed-off-by: Felipe Balbi <balbi@kernel.org>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
When we call pci_scan_root_bus we would pass in the 'sd' (sysdata)
pointer, which was a 'pcifront_sd' structure. However, pci_device_add
expects that 'sd' is a 'struct pci_sysdata' and sets dev->node to what
is in sd->node (offset 4):
That is a hole - filled with garbage, as we used kmalloc instead of
kzalloc (the second problem).
This patch fixes the issue by:
1) Use kzalloc to initialize to a well known state.
2) Put 'struct pci_sysdata' at the start of 'pcifront_sd'. That
way access to the 'node' will access the right offset.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Commit 8135cf8b092723dbfcc611fe6fdcb3a36c9951c5 (xen/pciback: Save
xen_pci_op commands before processing it) broke enabling MSI-X because
it would never copy the resulting vectors into the response. The
number of vectors requested was being overwritten by the return value
(typically zero for success).
Save the number of vectors before processing the op, so the correct
number of vectors are copied afterwards.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Commit 408fb0e5aa7fda0059db282ff58c3b2a4278baa0 (xen/pciback: Don't
allow MSI-X ops if PCI_COMMAND_MEMORY is not set) prevented enabling
MSI-X on passed-through virtual functions, because it checked the VF
for PCI_COMMAND_MEMORY but this is not a valid bit for VFs.
Instead, check the physical function for PCI_COMMAND_MEMORY.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
crypto: {blk,giv}cipher: Set has_setkey
Commit a1383cd86a06 ("crypto: skcipher - Add crypto_skcipher_has_setkey")
was incorrectly backported to the 3.2.y and 3.16.y stable branches.
We need to set ablkcipher_tfm::has_setkey in the
crypto_init_blkcipher_ops_async() and crypto_init_givcipher_ops()
functions as well as crypto_init_ablkcipher_ops().
Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
This doesn't seem to fix a regression -- I don't think the CLAC was
ever there.
I double-checked in a debugger: entries through the int80 gate do
not automatically clear AC.
Stable maintainers: I can provide a backport to 4.3 and earlier if
needed. This needs to be backported all the way to 3.10.
Reported-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 63bcff2a307b ("x86, smap: Add STAC and CLAC instructions to control user space access") Link: http://lkml.kernel.org/r/b02b7e71ae54074be01fc171cbd4b72517055c0e.1456345086.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
[ kamal: backport to 3.10 through 3.19-stable: file rename; context ] Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
In __request_region, if a conflict with a BUSY and MUXED resource is
detected, then the caller goes to sleep and waits for the resource to be
released. A pointer to the conflicting resource is kept. At wake-up
this pointer is used as a parent to retry requesting the region.
A first problem is that this pointer might well be invalid (if for
example the conflicting resource has already been freed). Another
problem is that the next call to __request_region() fails to detect a
remaining conflict. The previously conflicting resource is passed as a
parameter and __request_region() will look for a conflict among the
children of this resource and not at the resource itself. It is likely
to succeed anyway, even if there is still a conflict.
Instead, the parent of the conflicting resource should be passed to
__request_region().
As a fix, this patch doesn't update the parent resource pointer in the
case we have to wait for a muxed region right after.
Reported-and-tested-by: Vincent Pelletier <plr.vincent@gmail.com> Signed-off-by: Simon Guinot <simon.guinot@sequanux.org> Tested-by: Vincent Donnefort <vdonnefort@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Competing overwrite DIO in dioread_nolock mode will just overwrite
the pointer to io_end in the inode. This may result in data corruption or
extent conversion happening from IO completion interrupt because we
don't properly set buffer_defer_completion() when unlocked DIO races
with locked DIO to unwritten extent.
Since unlocked DIO doesn't need io_end for anything, just avoid
allocating it and corrupting pointer from inode for locked DIO.
A cleaner fix would be to avoid these games with io_end pointer from the
inode but that requires more intrusive changes so we leave that for
later.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
ext4 can update bh->b_state non-atomically in _ext4_get_block() and
ext4_da_get_block_prep(). Usually this is fine since bh is just a
temporary storage for mapping information on stack but in some cases it
can be fully living bh attached to a page. In such case non-atomic
update of bh->b_state can race with an atomic update which then gets
lost. Usually when we are mapping bh and thus updating bh->b_state
non-atomically, nobody else touches the bh and so things work out fine
but there is one case to especially worry about: ext4_finish_bio() uses
BH_Uptodate_Lock on the first bh in the page to synchronize handling of
PageWriteback state. So when blocksize < pagesize, we can be atomically
modifying bh->b_state of a buffer that actually isn't under IO and thus
can race e.g. with delalloc trying to map that buffer. The result is
that we can mistakenly set / clear BH_Uptodate_Lock bit resulting in the
corruption of PageWriteback state or missed unlock of BH_Uptodate_Lock.
Fix the problem by always updating bh->b_state bits atomically.
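A hedged sketch of such an atomic update, close in spirit to the fix (the flag mask and variable names are illustrative):

    unsigned long old_state, new_state;

    /* Replace the mapping-related bits of bh->b_state without losing
     * concurrent atomic set_bit()/clear_bit() updates such as
     * BH_Uptodate_Lock. */
    do {
        old_state = ACCESS_ONCE(bh->b_state);
        new_state = (old_state & ~EXT4_MAP_FLAGS) | map_flags;
    } while (cmpxchg(&bh->b_state, old_state, new_state) != old_state);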
Reported-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[ luis: backported to 3.16:
- replaced READ_ONCE() by ACCESS_ONCE() ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
We need to use post-decrement to get the pci_map_page undone also for
i==0, and to avoid some very unpleasant behaviour if pci_map_page
failed already at i==0.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The commit [7f0973e973cd: ALSA: seq: Fix lockdep warnings due to
double mutex locks] split the management of two linked lists (source
and destination) into two individual calls for avoiding the AB/BA
deadlock. However, this may leave the possible double deletion of one
of two lists when the counterpart is being deleted concurrently.
It ends up with a list corruption, as revealed by syzkaller fuzzer.
This patch fixes it by checking the list emptiness and skipping the
deletion and the following process.
BugLink: http://lkml.kernel.org/r/CACT4Y+bay9qsrz6dQu31EcGaH9XwfW7o3oBzSQUG9fMszoh=Sg@mail.gmail.com Fixes: 7f0973e973cd ('ALSA: seq: Fix lockdep warnings due to 'double mutex locks) Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
In my randconfig tests, I came across a bug that involves several
components:
* gcc-4.9 through at least 5.3
* CONFIG_GCOV_PROFILE_ALL enabling -fprofile-arcs for all files
* CONFIG_PROFILE_ALL_BRANCHES overriding every if()
* The optimized implementation of do_div() that tries to
replace a library call with an division by multiplication
* code in drivers/media/dvb-frontends/zl10353.c doing
In this case, gcc fails to determine whether the divisor
in do_div() is __builtin_constant_p(). In particular, it
concludes that __builtin_constant_p(adc_clock) is false, while
__builtin_constant_p(!!adc_clock) is true.
That in turn throws off the logic in do_div() that also uses
__builtin_constant_p() to pick the constant-optimized division,
and the code in ilog2() that uses
__builtin_constant_p() to figure out whether it knows the answer at
compile time. The result is a link error from failing to find
multiple symbols that should never have been called based on
the __builtin_constant_p():
dvb-frontends/zl10353.c:138: undefined reference to `____ilog2_NaN'
dvb-frontends/zl10353.c:138: undefined reference to `__aeabi_uldivmod'
ERROR: "____ilog2_NaN" [drivers/media/dvb-frontends/zl10353.ko] undefined!
ERROR: "__aeabi_uldivmod" [drivers/media/dvb-frontends/zl10353.ko] undefined!
This patch avoids the problem by changing __trace_if() to check
whether the condition is known at compile-time to be nonzero, rather
than checking whether it is actually a constant.
I see this one link error in roughly one out of 1600 randconfig builds
on ARM, and the patch fixes all known instances.
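A hedged sketch of the fixed macro (simplified; instrumented() stands in for the real branch-statistics statement expression):

    #define __trace_if(cond) \
        if (__builtin_constant_p(!!(cond)) ? !!(cond) : instrumented(cond))

    /* Asking about !!(cond) checks whether the truth value is known at
     * compile time, instead of whether the whole expression is a constant,
     * which is what confused gcc in the do_div()/ilog2() case above. */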
Link: http://lkml.kernel.org/r/1455312410-1058841-1-git-send-email-arnd@arndb.de Acked-by: Nicolas Pitre <nico@linaro.org> Fixes: ab3c9c686e22 ("branch tracer, intel-iommu: fix build with CONFIG_BRANCH_TRACER=y") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The tracepoint infrastructure uses RCU sched protection to enable and
disable tracepoints safely. There are some instances where tracepoints are
used in infrastructure code (like kfree()) that get called after a CPU is
going offline, and perhaps when it is coming back online but hasn't been
registered yet.
This can produce the following warning:
[ INFO: suspicious RCU usage. ] 4.4.0-00006-g0fe53e8-dirty #34 Tainted: G S
-------------------------------
include/trace/events/kmem.h:141 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/8/0.
This warning is not a false positive either. RCU is not protecting code that
is being executed while the CPU is offline.
Instead of playing "whack-a-mole(TM)" and adding conditional statements to
the tracepoints we find that are used in this instance, simply add a
cpu_online() test to the tracepoint code where the tracepoint will be
ignored if the CPU is offline.
Use of raw_smp_processor_id() is fine, as there should never be a case where
the tracepoint code goes from running on a CPU that is online and suddenly
gets migrated to a CPU that is offline.
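A hedged sketch of the guard (simplified; the real change folds this test into the tracepoint macros rather than open-coding it per tracepoint):

    static inline void trace_example(void)
    {
        /* Tracepoints rely on RCU-sched, which is illegal on an offline
         * CPU, so silently skip calling the probes in that case. */
        if (!cpu_online(raw_smp_processor_id()))
            return;

        /* normal static-key check and probe iteration follow */
    }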
Link: http://lkml.kernel.org/r/1455387773-4245-1-git-send-email-kda@linux-powerpc.org Reported-by: Denis Kirjanov <kda@linux-powerpc.org> Fixes: 97e1c18e8d17b ("tracing: Kernel Tracepoints") Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[ luis: backported to 3.16:
- included linux/percpu.h as suggested by Steven for other stable kernels ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The commit 2895b2cad6e7 ("dmaengine: dw: fix cyclic transfer callbacks")
re-enabled BLOCK interrupts in order to make cyclic transfers work. However,
this change becomes a regression for non-cyclic transfers, as interrupt
counters under a stress test grew enormously (approximately one interrupt per
4-5 bytes in the UART loopback test).
Taking the above into consideration, enable BLOCK interrupts if and only if
the channel is programmed to perform a cyclic transfer.
Fixes: 2895b2cad6e7 ("dmaengine: dw: fix cyclic transfer callbacks") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Mans Rullgard <mans@mansr.com> Tested-by: Mans Rullgard <mans@mansr.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
When multiple concurrent writes happen on the ALSA sequencer device
right after the open, it may try to allocate vmalloc buffer for each
write and leak some of them. It's because the presence check and the
assignment of the buffer is done outside the spinlock for the pool.
The fix is to move the check and the assignment into the spinlock.
(The current implementation is suboptimal, as there can be multiple
unnecessary vmallocs because the allocation is done before the check
in the spinlock. But the pool size is already checked beforehand, so
this isn't a big problem; that is, the only possible path is the
multiple writes before any pool assignment, and practically seen, the
current coverage should be "good enough".)
Commit 35dc248383bbab0a7203fca4d722875bc81ef091 introduced a check for
current->mm to see if we have a user space context and only copies data
if we do. Now if an IO gets interrupted by a signal, data isn't copied
into user space any more (as we don't have a user space context) but
user space isn't notified about it.
This patch modifies the behaviour to return -EINTR from bio_uncopy_user()
to notify userland that a signal has interrupted the syscall, otherwise
it could lead to a situation where the caller may get a buffer with
no data returned.
This can be reproduced by issuing SG_IO ioctl()s in one thread while
constantly sending signals to it.
Fixes: 35dc248 [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
[ luis: backported to 3.16: based on Johannes' backport to 3.14 ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
I notice ext4/307 fails occasionally on ppc64 host, reporting md5
checksum mismatch after moving data from original file to donor file.
The reason is that move_extent_per_page() calls __block_write_begin()
and block_commit_write() to write saved data from original inode blocks
to donor inode blocks, but __block_write_begin() not only maps buffer
heads but also reads block content from disk if the size is not block
size aligned. At this time the physical block number in mapped buffer
head is pointing to the donor file not the original file, and that
results in reading wrong data to page, which get written to disk in
following block_commit_write call.
This also can be reproduced by the following script on 1k block size ext4
on x86_64 host:
# reserve space for donor file, written by 0xaa and sync to disk to
# avoid EBUSY on EXT4_IOC_MOVE_EXT
xfs_io -fc "pwrite -S 0xaa 0 1m" -c "fsync" $donorfile
# create test file written by 0xbb
xfs_io -fc "pwrite -S 0xbb 0 1023" -c "fsync" $testfile
# compute initial md5sum
md5sum $testfile | tee md5sum.txt
# drop cache, force e4compact to read data from disk
echo 3 > /proc/sys/vm/drop_caches
Fix it by creating & mapping buffer heads only but not reading blocks
from disk, because all the data in page is guaranteed to be up-to-date
in mext_page_mkuptodate().
Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Since sizeof(ext_new_group_data) > sizeof(ext_new_flex_group_data),
an integer overflow could happen.
Therefore, fix the integer overflow sanitization.
What appears to be happening is that something has pinned the target
so it can't go into STARGET_DEL via final release and the loop in
scsi_remove_target spins endlessly until that happens.
The fix for this soft lockup is to not keep looping over a device that
we've called remove on but which hasn't gone into DEL state. This
patch will retain a simplistic memory of the last target and not keep
looping over it.
Reported-by: Sebastian Herbszt <herbszt@gmx.de> Tested-by: Sebastian Herbszt <herbszt@gmx.de> Fixes: 40998193560dab6c3ce8d25f4fa58a23e252ef38 Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Add a refcount to the DASD device when a summary unit check worker is
scheduled. This prevents the device from being set offline while the
worker is in place.
Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The channel checks the specified length and the provided amount of
data for CCWs and provides an incorrect length error if the size does
not match. Under z/VM with simulation activated the length may get
changed. Having the suppress length indication bit set is stated as
good CCW coding practice and avoids errors under z/VM.
Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The setup_ntlmv2_rsp() function may return positive value ENOMEM instead
of -ENOMEM in case of kmalloc failure.
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Since binutils 2.26 BFD is doing suffix merging on STRTAB sections. But
dedotify modifies the symbol names in place, which can also modify
unrelated symbols with a name that matches a suffix of a dotted name. To
remove the leading dot of a symbol name we can just increment the pointer
into the STRTAB section instead.
Backport to all stables to avoid breakage when people update their
binutils - mpe.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
This commit introduced a regression in this kernel. Commit 95be58df74a5
("firmware: dmi_scan: Use full dmi version for SMBIOS3") added support for
SMBIOS3. However, this was not backported to this kernel version, and thus
commit ff4319dc7cd5 should not have been applied.
Commit 2c7b49212a86 ("phy: fix the use of PHY_IGNORE_INTERRUPT") changed
a hunk in phy_state_machine() in the PHY_RUNNING case which was not
needed. The change essentially makes the PHY library treat PHY devices
with PHY_IGNORE_INTERRUPT to keep polling for the PHY device, even
though the intent is not to do it.
Fix this by reverting that specific hunk, which makes the PHY state
machine wait for state changes, and stay in the PHY_RUNNING state for as
long as needed.
Fixes: 2c7b49212a86 ("phy: fix the use of PHY_IGNORE_INTERRUPT") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Currently, if the phy state is PHY_RUNNING, we always register a CHANGE
when the phy works in polling mode or with interrupts ignored; this makes
adjust_link be called even though the phy link did not change.
Check the phy link to make sure the link actually changed before
registering a CHANGE; if the link did not change, do nothing.
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
A regression introduced when the master ttm lock was split into two.
Reported-and-tested-by: Brian Paul <brianp@vmware.com> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
On not-so-small systems, it is possible for a single process to cause an
OOM condition by filling large pipes with data that are never read. A
typical process filling 4000 pipes with 1 MB of data will use 4 GB of
memory. On small systems it may be tricky to set the pipe max size to
prevent this from happening.
This patch makes it possible to enforce a per-user soft limit above
which new pipes will be limited to a single page, effectively limiting
them to 4 kB each, as well as a hard limit above which no new pipes may
be created for this user. This has the effect of protecting the system
against memory abuse without hurting other users, and still allowing
pipes to work correctly though with less data at once.
The limits are controlled by two new sysctls: pipe-user-pages-soft and
pipe-user-pages-hard. Both may be disabled by setting them to zero. The
default soft limit allows the default number of FDs per process (1024)
to create pipes of the default size (64kB), thus reaching a limit of 64MB
before starting to create only smaller pipes. With 256 processes limited
to 1024 FDs each, this results in 1024*64kB + (256*1024 - 1024) * 4kB =
1084 MB of memory allocated for a user. The hard limit is disabled by
default to avoid breaking existing applications that make intensive use
of pipes (e.g. for splicing).
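A rough userspace model of the sizing decision described above; the limits are
expressed in pages, and the names and defaults below are illustrative (the real
knobs are the two sysctls):
  #include <stdbool.h>

  static unsigned long soft_limit = 16384; /* pages: 64 MB with 4 kB pages */
  static unsigned long hard_limit;         /* 0 = disabled (the default) */

  /* Decide how many pages a new pipe may get for a user who already
   * has user_pages worth of pipe buffers allocated. */
  static unsigned long pipe_pages_for_user(unsigned long user_pages,
                                           unsigned long requested,
                                           bool *denied)
  {
          *denied = false;
          if (hard_limit && user_pages >= hard_limit) {
                  *denied = true;           /* no new pipe at all */
                  return 0;
          }
          if (soft_limit && user_pages >= soft_limit)
                  return 1;                 /* fall back to a single 4 kB page */
          return requested;                 /* default, e.g. 16 pages = 64 kB */
  }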
Fix the RC QPs send queue overhead computation to take into account
two additional segments in the WQE which are needed for registration
operations.
The ATOMIC and UMR segments can't coexist in the same WQE, so take the
maximum of the two.
The commit 9e65dc371b5c ("IB/mlx5: Fix RC transport send queue overhead
computation") was intended to update RC transport as commit messages
states, but added the code to UC transport.
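A small illustration of the computation (the segment sizes below are
placeholders, not the mlx5 structure sizes):
  #include <stddef.h>

  #define ATOMIC_SEG_SIZE   16    /* placeholder */
  #define UMR_CTRL_SEG_SIZE 48    /* placeholder */

  /* ATOMIC and UMR segments never appear in the same WQE, so the RC
   * send-queue overhead only has to reserve room for the larger one. */
  static size_t rc_send_wqe_overhead(size_t base_overhead)
  {
          size_t extra = ATOMIC_SEG_SIZE > UMR_CTRL_SEG_SIZE ?
                         ATOMIC_SEG_SIZE : UMR_CTRL_SEG_SIZE;

          return base_overhead + extra;
  }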
Fixes: 9e65dc371b5c ("IB/mlx5: Fix RC transport send queue overhead computation") Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Commit ed5a377d87dc ("sctp: translate host order to network order when
setting a hmacid") corrected the hmacid byte-order when setting a hmacid.
but the same issue also exists on getting a hmacid.
We fix it by changing hmacids to host order when users get them with
getsockopt.
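A userspace sketch of the conversion that getsockopt now performs (the helper
name is illustrative):
  #include <arpa/inet.h>
  #include <stddef.h>
  #include <stdint.h>

  /* HMAC identifiers are kept in network byte order internally, so
   * convert each one to host order before handing them to userspace. */
  static void hmacids_to_host_order(uint16_t *ids, size_t count)
  {
          for (size_t i = 0; i < count; i++)
                  ids[i] = ntohs(ids[i]);
  }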
Fixes: Commit ed5a377d87dc ("sctp: translate host order to network order when setting a hmacid") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Silence lockdep false positive about rcu_dereference() being
used in the wrong context.
The first one should use rcu_dereference_protected(), as we own the spinlock.
The second one should be a plain assignment, as no barrier is needed.
Fixes: 18367681a10bd ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.") Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The commit referenced in the Fixes tag incorrectly accounted the number
of in-flight fds over a unix domain socket to the original opener
of the file descriptor. This allows another process to arbitrarily
deplete the original file opener's resource limit for the maximum number
of open files. Instead, the sending process and its struct cred should
be credited.
To do so, we add a reference counted struct user_struct pointer to the
scm_fp_list and use it to account for the number of inflight unix fds.
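A simplified userspace model of the accounting change (struct names and fields
are illustrative, not the kernel's scm_fp_list layout):
  /* Charge in-flight fds to the sending user, carried along with the
   * fd list itself, instead of to whoever originally opened the file. */
  struct user_model {
          long inflight_fds;
          long refcount;
  };

  struct fp_list_model {
          struct user_model *user;  /* pinned for the lifetime of the list */
          int count;
  };

  static void account_inflight(struct fp_list_model *fpl,
                               struct user_model *sender, int nr_fds)
  {
          fpl->user = sender;
          sender->refcount++;        /* keep the user alive while in flight */
          sender->inflight_fds += nr_fds;
          fpl->count = nr_fds;
  }

  static void unaccount_inflight(struct fp_list_model *fpl)
  {
          fpl->user->inflight_fds -= fpl->count;
          fpl->user->refcount--;
  }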
Fixes: 712f4aad406bb1 ("unix: properly account for FDs passed over unix sockets") Reported-by: David Herrmann <dh.herrmann@gmail.com> Cc: David Herrmann <dh.herrmann@gmail.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
We should not trim the skb for an mmaped socket, since its buffer size is
fixed and userspace reads it as a frame whose data starts at skb->head.
An mmaped socket does not call recvmsg, which means max_recvmsg_len is 0
and skb_reserve was not called before commit db65a3aaf29e.
Fixes: db65a3aaf29e (netlink: Trim skb to alloc size to avoid MSG_TRUNC) Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
This patch extends commit b93d6471748d ("sctp: implement the sender side
for SACK-IMMEDIATELY extension") as it didn't whitelist
SCTP_SACK_IMMEDIATELY in sctp_msghdr_parse(), causing it to be
understood as an invalid flag and returning -EINVAL to the application.
Note that the actual handling of the flag is already there in
sctp_datamsg_from_user().
https://tools.ietf.org/html/rfc7053#section-7
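A minimal model of the whitelist check; the flag values are taken from the SCTP
uapi but should be treated as illustrative here:
  #include <errno.h>

  #define SCTP_UNORDERED        0x0001
  #define SCTP_ADDR_OVER        0x0002
  #define SCTP_ABORT            0x0004
  #define SCTP_SACK_IMMEDIATELY 0x0008   /* the newly whitelisted flag */
  #define SCTP_EOF              0x0100

  /* Reject any sinfo_flags bit that is not explicitly allowed; without
   * SCTP_SACK_IMMEDIATELY in the mask, the send fails with -EINVAL
   * even though the flag is handled later when the message is built. */
  static int check_sinfo_flags(unsigned int flags)
  {
          unsigned int valid = SCTP_UNORDERED | SCTP_ADDR_OVER |
                               SCTP_ABORT | SCTP_SACK_IMMEDIATELY |
                               SCTP_EOF;

          return (flags & ~valid) ? -EINVAL : 0;
  }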
Fixes: b93d6471748d ("sctp: implement the sender side for SACK-IMMEDIATELY extension") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ luis: backported to 3.16:
- dropped changes to SCTP_SNDINFO case ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Several times already this has been reported as kasan reports caused by
syzkaller and trinity, and people always looked at RCU races, but it is
much simpler. :)
In case we bind a pptp socket multiple times, we simply add it to
the callid_sock list but don't remove the old binding. Thus the old
socket stays in the bucket with unused call_id indexes and doesn't get
cleaned up. This causes various forms of kasan reports which were hard
to pinpoint.
Simply don't allow multiple binds and correct error handling in
pptp_bind. Also keep sk_state bits in place in pptp_connect.
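A userspace model of the bind guard (types and the error code are illustrative;
the real fix operates on the callid_sock table and socket state bits):
  #include <errno.h>
  #include <stdbool.h>

  struct pptp_sock_model {
          bool bound;
          int call_id;
  };

  /* Refuse a second bind instead of silently taking another call_id
   * slot and leaving the old entry behind without cleanup. */
  static int pptp_bind_model(struct pptp_sock_model *sk, int call_id)
  {
          if (sk->bound)
                  return -EBUSY;
          sk->call_id = call_id;
          sk->bound = true;
          return 0;
  }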
Fixes: 00959ade36acad ("PPTP: PPP over IPv4 (Point-to-Point Tunneling Protocol)") Cc: Dmitry Kozlov <xeb@mail.ru> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Dmitry Vyukov <dvyukov@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Dave Jones <davej@codemonkey.org.uk> Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Dmitry reported a struct pid leak detected by a syzkaller program.
Bug happens in unix_stream_recvmsg() when we break the loop when a
signal is pending, without properly releasing scm.
Fixes: b3ca9b02b007 ("net: fix multithreaded signal handling in unix recv routines") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Rainer Weikusat <rweikusat@mobileactivedefense.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ luis: backported to 3.16:
- use siocb->scm instead of &scm ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Commit 5ea94e7686a3 ("phy: add phy_mac_interrupt()") to use with
PHY_IGNORE_INTERRUPT added a cancel_work_sync() call, which may sleep, to
phy_mac_interrupt(), whereas phy_mac_interrupt() is expected to be
callable from interrupt context.
Now that we have fixed how the PHY state machine treats
PHY_IGNORE_INTERRUPT with respect to state changes, we can just set the
new link state, and queue the PHY state machine for execution so it is
going to read the new link state.
For that to work properly, we need to update phy_change() not to try to
invoke any interrupt callbacks if we have configured the PHY device for
PHY_IGNORE_INTERRUPT, because that PHY device and its driver are not
required to implement those.
Fixes: 5ea94e7686a3 ("phy: add phy_mac_interrupt() to use with PHY_IGNORE_INTERRUPT") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The value of ctx->pos in the last readdir call is supposed to be set to
INT_MAX due to 32bit compatibility, unless 'pos' is intentionally set to a
larger value, in which case it's LLONG_MAX.
There's a report from PaX SIZE_OVERFLOW plugin that "ctx->pos++"
overflows (https://forums.grsecurity.net/viewtopic.php?f=1&t=4284), on a
64bit arch, where the value is 0x7fffffffffffffff, i.e. LLONG_MAX, before
the increment.
We can get to that situation like that:
* emit all regular readdir entries
* still in the same call to readdir, bump the last pos to INT_MAX
* next call to readdir will not emit any entries, but will reach the
bump code again, find pos to be INT_MAX and set it to LLONG_MAX
Normally this is not a problem, but if we call readdir again, we'll find
'pos' set to LLONG_MAX and the unconditional increment will overflow.
The report from Victor at
(http://thread.gmane.org/gmane.comp.file-systems.btrfs/49500), with a
debugging print added, shows that pattern.
The jump from 7fffffff to 7fffffffffffffff happens when new dir entries
are not yet synced and are processed from the delayed list. Then the code
could go to the bump section again even though it might not emit any new
dir entries from the delayed list.
The fix avoids entering the "bump" section again once we've finished
emitting the entries, both for synced and delayed entries.
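A simplified model of the guarded bump (the helper is illustrative; the real
code lives in the btrfs readdir and delayed-item paths):
  #include <limits.h>
  #include <stdbool.h>

  /* Only pad the position when this readdir call actually emitted
   * entries; a later call that emits nothing must leave pos alone, so
   * it can never find pos already at LLONG_MAX and overflow it with
   * the unconditional pos++. */
  static void bump_readdir_pos(long long *pos, bool emitted)
  {
          if (!emitted)
                  return;
          if (*pos <= INT_MAX)
                  *pos = INT_MAX;       /* 32-bit compatible sentinel */
          else
                  *pos = LLONG_MAX;
  }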
References: https://forums.grsecurity.net/viewtopic.php?f=1&t=4284 Reported-by: Victor <services@swwu.com> Signed-off-by: David Sterba <dsterba@suse.com> Tested-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Since the dawn of time the ICST code has only supported dividing
by one or hanging in an eternal loop. Luckily we were always dividing
by one, because the reference frequency for the systems using
the ICSTs is 24MHz and the [min,max] range for the PLL input
is [10,320] MHz for ICST307 and [6,200] MHz for ICST525, so the loop
will always terminate immediately without assigning any divisor
for the reference frequency.
But for the code to make sense, let's insert the missing i++
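An illustrative reconstruction of the loop shape (simplified; the real code is
in the ICST clock driver and also bounds the divisor):
  /* Divide the reference frequency down until it falls within the PLL
   * input range; without the i++ the loop body never advances, so it
   * either matches immediately (the 24 MHz case) or spins forever. */
  static unsigned int find_ref_divisor(unsigned long f_ref_khz,
                                       unsigned long f_max_khz)
  {
          unsigned int i = 1;

          while (f_ref_khz / i > f_max_khz)
                  i++;            /* the missing increment */

          return i;
  }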
Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
An arbitrary amount of time can pass between spin_unlock and
radeon_fence_wait_any, so we need to ensure that nobody frees the
fences from under us.
Based on the analogous fix for amdgpu.
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
When looking up the pool_workqueue to use for an unbound workqueue,
workqueue assumes that the target CPU is always bound to a valid NUMA
node. However, currently, when a CPU goes offline, the mapping is
destroyed and cpu_to_node() returns NUMA_NO_NODE.
This has always been broken but hasn't triggered often enough before 874bbfe600a6 ("workqueue: make sure delayed work run in local cpu").
After the commit, workqueue forcibly assigns the local CPU for
delayed work items without an explicit target CPU, to fix a different
issue. This widens the window in which a CPU can go offline while a
delayed work item is pending, causing delayed work items to be dispatched
with the target CPU set to an already offlined CPU. The resulting
NUMA_NO_NODE mapping makes workqueue try to queue the work item on a
NULL pool_workqueue and thus crash.
While 874bbfe600a6 has been reverted for a different reason making the
bug less visible again, it can still happen. Fix it by mapping
NUMA_NO_NODE to the default pool_workqueue from unbound_pwq_by_node().
This is a temporary workaround. The long term solution is keeping CPU
-> NODE mapping stable across CPU off/online cycles which is being
worked on.
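A simplified model of the workaround (field and function names mirror the
kernel's, but the structures are stripped down for illustration):
  #define NUMA_NO_NODE (-1)
  #define NR_NODES     4

  struct pwq_model { int id; };

  struct wq_model {
          struct pwq_model *dfl_pwq;                 /* default pwq */
          struct pwq_model *numa_pwq_tbl[NR_NODES];  /* per-node pwqs */
  };

  /* An offlined CPU may map to NUMA_NO_NODE; fall back to the default
   * pool_workqueue instead of indexing the table with -1 and queueing
   * onto a NULL pointer. */
  static struct pwq_model *unbound_pwq_by_node(struct wq_model *wq, int node)
  {
          if (node == NUMA_NO_NODE)
                  return wq->dfl_pwq;
          return wq->numa_pwq_tbl[node];
  }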
Otherwise rmmod omap2430; rmmod phy-twl4030-usb; modprobe omap2430
will try to use a non-existing phy and oops:
Unable to handle kernel paging request at virtual address b6f7c1f0
...
[<c048a284>] (devm_usb_get_phy_by_node) from [<bf0758ac>]
(omap2430_musb_init+0x44/0x2b4 [omap2430])
[<bf0758ac>] (omap2430_musb_init [omap2430]) from [<bf055ec0>]
(musb_init_controller+0x194/0x878 [musb_hdrc])
Cc: Bin Liu <b-liu@ti.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Kishon Vijay Abraham I <kishon@ti.com> Cc: NeilBrown <neil@brown.name> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
snd_timer_user_read() has a potential race among parallel reads, as
qhead and qused are updated outside the critical section due to
copy_to_user() calls. Move them into the critical section, and also
sanitize the relevant code a bit.
Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>