Replace free_skb with dev_kfree_skb_any in be_tx_compl_process as
which can be called in hard irq by netpoll, softirq context
by normal napi polling, and in normal sleepable context
by the network device close method.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Replace dev_kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Replace dev_kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Replace dev_kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Replace kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Replace dev_kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Replace dev_kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Replace kfree_skb with dev_kfree_skb_any in cp_start_xmit
as it can be called in both hard irq and other contexts.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
I noticed tcpdump was giving funky timestamps for locally
generated SYNACK messages on loopback interface.
11:42:46.938990 IP 127.0.0.1.48245 > 127.0.0.2.23850: S 945476042:945476042(0) win 43690 <mss 65495,nop,nop,sackOK,nop,wscale 7>
20:28:58.502209 IP 127.0.0.2.23850 > 127.0.0.1.48245: S 3160535375:3160535375(0) ack 945476043 win 43690 <mss
65495,nop,nop,sackOK,nop,wscale 7>
This is because we need to clear skb->tstamp before
entering lower stack, otherwise net_timestamp_check()
does not set skb->tstamp.
Fixes: 7faee5c0d514 ("tcp: remove TCP_SKB_CB(skb)->when") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
On processing cumulative ACKs, the FRTO code was not checking the
SACKed bit, meaning that there could be a spurious FRTO undo on a
cumulative ACK of a previously SACKed skb.
The FRTO code should only consider a cumulative ACK to indicate that
an original/unretransmitted skb is newly ACKed if the skb was not yet
SACKed.
The effect of the spurious FRTO undo would typically be to make the
connection think that all previously-sent packets were in flight when
they really weren't, leading to a stall and an RTO.
Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Fixes: e33099f96d99c ("tcp: implement RFC5682 F-RTO") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
A local route may have a lower hop_limit set than global routes do.
RFC 3756, Section 4.2.7, "Parameter Spoofing"
> 1. The attacker includes a Current Hop Limit of one or another small
> number which the attacker knows will cause legitimate packets to
> be dropped before they reach their destination.
> As an example, one possible approach to mitigate this threat is to
> ignore very small hop limits. The nodes could implement a
> configurable minimum hop limit, and ignore attempts to set it below
> said limit.
Signed-off-by: D.S. Ljungmark <ljungmark@modio.se> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
On s390x, gcc 4.8 compiles this part of tcp_v6_early_demux()
struct dst_entry *dst = sk->sk_rx_dst;
if (dst)
dst = dst_check(dst, inet6_sk(sk)->rx_dst_cookie);
to code reading sk->sk_rx_dst twice, once for the test and once for
the argument of ip6_dst_check() (dst_check() is inline). This allows
ip6_dst_check() to be called with null first argument, causing a crash.
Protect sk->sk_rx_dst access by ACCESS_ONCE() both in IPv4 and IPv6
TCP early demux code.
Fixes: 41063e9dd119 ("ipv4: Early TCP socket demux.") Fixes: c7109986db3c ("ipv6: Early TCP socket demux") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
arm64 is unlikely to have a VGA console and does not export screen_info
causing build failures if the driver is build, for example in all*config.
Add a dependency on !ARM64 to prevent this.
This list is getting quite long, it may be easier to depend on a symbol
which architectures that do support the driver can select.
Signed-off-by: Mark Brown <broonie@linaro.org>
[tomi.valkeinen@ti.com: moved && to first modified line] Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Some architectures do not support PCI, but still support USB, so need
let our usb driver try to use usb_* instead of pci_* to support these
architectures, or can not pass compiling.
The related error (with allmodconfig for arc):
CC [M] drivers/media/usb/b2c2/flexcop-usb.o
drivers/media/usb/b2c2/flexcop-usb.c: In function ‘flexcop_usb_transfer_exit’:
drivers/media/usb/b2c2/flexcop-usb.c:393: error: implicit declaration of function ‘pci_free_consistent’
drivers/media/usb/b2c2/flexcop-usb.c: In function ‘flexcop_usb_transfer_init’:
drivers/media/usb/b2c2/flexcop-usb.c:410: error: implicit declaration of function ‘pci_alloc_consistent’
For RoCE ports, we set the u32 PMA values based on u64 HCA counters. In case of
overflow, according to the IB spec, we have to saturate a counter to its
max value, do that.
Fixes: c37791349cc7 ('IB/mlx4: Support PMA counters for IBoE') Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The timeout entries are sizeof(int) rather than sizeof(long), which
means that when they were getting read we'd also leak kernel memory
to userspace along with the timeout values.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
TASK_SIZE is depends on the systems architecture (32 or 64 bits) and it
should not be used for defining offset boundary for mmaping buffers for
CAPTURE and OUTPUT queues. This patch fixes support for MMAP calls on
the CAPTURE queue on 64bit architectures (like ARM64).
This fixes a oops due to a double list add when adding a reject PDU for
iscsit_allocate_iovecs allocation failures. The cmd has already been
added to the conn_cmd_list in iscsit_setup_scsi_cmd, so this has us call
iscsit_reject_cmd.
Note that for ERL0 the reject PDU is not actually sent, so this patch
is not completely tested. Just verified we do not oops. The problem is the
add reject functions return -1 which is returned all the way up to
iscsi_target_rx_thread which for ERL0 will drop the connection.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
If we fail past the aio_setup_ring(), we need to destroy the
mapping. We don't need to care about anybody having found ctx,
or added requests to it, since the last failure exit is exactly
the failure to make ctx visible to lookups.
Reproducer (based on one by Joe Mario <jmario@redhat.com>):
"ocfs2 syncs the wrong range" had been broken; prior to it the
code was doing the wrong thing in case of O_APPEND, all right,
but _after_ it we were syncing the wrong range in 100% cases.
*ppos, aka iocb->ki_pos is incremented prior to that point,
so we are always doing sync on the area _after_ the one we'd
written to.
Spotted by Joseph Qi <joseph.qi@huawei.com> back in January;
unfortunately, I'd missed his mail back then ;-/
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Kernel panic was happening as iscsi_host_remove() was called on
a host which was not yet added.
Signed-off-by: John Soni Jose <sony.john-n@emulex.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Under intermittent network outages, find_writable_file() is susceptible
to the following race condition, which results in a user-after-free in
the cifs_writepages code-path:
At this point we loop back through with an invalid inv_file pointer
and a refind value of 1. On second pass, inv_file is not overwritten on
openFileList traversal, and is subsequently dereferenced.
Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
In canon mode, the read buffer head will advance over the buffer tail
if the input > 4095 bytes without receiving a line termination char.
Discard additional input until a line termination is received.
Before evaluating for overflow, the 'room' value is normalized for
I_PARMRK and 1 byte is reserved for line termination (even in !icanon
mode, in case the mode is switched). The following table shows the
transform:
actual buffer | 'room' value before overflow calc
space avail | !I_PARMRK | I_PARMRK
--------------------------------------------------
0 | -1 | -1
1 | 0 | 0
2 | 1 | 0
3 | 2 | 0
4+ | 3 | 1
When !icanon or when icanon and the read buffer contains newlines,
normalized 'room' values of -1 and 0 are clamped to 0, and
'overflow' is 0, so read_head is not adjusted and the input i/o loop
exits (setting no_room if called from flush_to_ldisc()). No input
is discarded since the reader does have input available to read
which ensures forward progress.
When icanon and the read buffer does not contain newlines and the
normalized 'room' value is 0, then overflow and room are reset to 1,
so that the i/o loop will process the next input char normally
(except for parity errors which are ignored). Thus, erasures, signalling
chars, 7-bit mode, etc. will continue to be handled properly.
If the input char processed was not a line termination char, then
the canon_head index will not have advanced, so the normalized 'room'
value will now be -1 and 'overflow' will be set, which indicates the
read_head can safely be reset, effectively erasing the last char
processed.
If the input char processed was a line termination, then the
canon_head index will have advanced, so 'overflow' is cleared to 0,
the read_head is not reset, and 'room' is cleared to 0, which exits
the i/o loop (because the reader now have input available to read
which ensures forward progress).
Note that it is possible for a line termination to be received, and
for the reader to copy the line to the user buffer before the
input i/o loop is ready to process the next input char. This is
why the i/o loop recomputes the room/overflow state with every
input char while handling overflow.
Finally, if the input data was processed without receiving
a line termination (so that overflow is still set), the pty
driver must receive a write wakeup. A pty writer may be waiting
to write more data in n_tty_write() but without unthrottling
here that wakeup will not arrive, and forward progress will halt.
(Normally, the pty writer is woken when the reader reads data out
of the buffer and more space become available).
Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
(backported from commit fb5ef9e7da39968fec6d6f37f20a23d23740c75e) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
When the receiver was enabled during startup, a character could
have been in the FIFO when the UART get initially used. The
driver configures the (receive) watermark level, and flushes the
FIFO. However, the receive flag (RDRF) could still be set at that
stage (as mentioned in the register description of UARTx_RWFIFO).
This leads to an interrupt which won't be handled properly in
interrupt mode: The receive interrupt function lpuart_rxint checks
the FIFO count, which is 0 at that point (due to the flush
during initialization). The problem does not manifest when using
DMA to receive characters.
Fix this situation by explicitly read the status register, which
leads to clearing of the RDRF flag. Due to the flush just after
the status flag read, a explicit data read is not to required.
Signed-off-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
When a device with an isochronous endpoint is plugged into the Intel
xHCI host controller, and the driver submits multiple frames per URB,
the xHCI driver will set the Block Event Interrupt (BEI) flag on all
but the last TD for the URB. This causes the host controller to place
an event on the event ring, but not send an interrupt. When the last
TD for the URB completes, BEI is cleared, and we get an interrupt for
the whole URB.
However, under Intel xHCI host controllers, if the event ring is full
of events from transfers with BEI set, an "Event Ring is Full" event
will be posted to the last entry of the event ring, but no interrupt
is generated. Host will cease all transfer and command executions and
wait until software completes handling the pending events in the event
ring. That means xHC stops, but event of "event ring is full" is not
notified. As the result, the xHC looks like dead to user.
This patch is to apply XHCI_AVOID_BEI quirk to Intel xHC devices. And
it should be backported to kernels as old as 3.0, that contains the
commit 69e848c2090a ("Intel xhci: Support EHCI/xHCI port switching.").
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Alistair Grant <akgrant0710@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Linux xHCI driver doesn't report and handle port cofig error change.
If Port Configure Error for root hub port occurs, CEC bit in PORTSC
would be set by xHC and remains 1. This happends when the root port
fails to configure its link partner, e.g. the port fails to exchange
port capabilities information using Port Capability LMPs.
Then the Port Status Change Events will be blocked until all status
change bits(CEC is one of the change bits) are cleared('0') (refer to
xHCI spec 4.19.2). Otherwise, the port status change event for this
root port will not be generated anymore, then root port would look
like dead for user and can't be recovered until a Host Controller
Reset(HCRST).
This patch is to check CEC bit in PORTSC in xhci_get_port_status()
and set a Config Error in the return status if CEC is set. This will
cause a ClearPortFeature request, where CEC bit is cleared in
xhci_clear_port_change_bit().
[The commit log is based on initial Marvell patch posted at
http://marc.info/?l=linux-kernel&m=142323612321434&w=2]
Return EPROBE_DEFER if Regulator returns EPROBE_DEFER
If the Flexcan driver is built into kernel and a regulator is used to
enable the CAN transceiver, the Flexcan driver may not use the regulator.
When initializing the Flexcan device with a regulator defined in the device
tree, but not initialized, the regulator subsystem returns EPROBE_DEFER, hence
the Flexcan init fails.
The solution for this is to return EPROBE_DEFER if regulator is not initialized
and wait until the regulator is initialized.
Signed-off-by: Andreas Werner <kernel@andy89.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The ASRock Q1900DC-ITX mainboard (Baytrail-D) hangs randomly in
both BIOS and UEFI mode while rebooting unless reboot=pci is
used. Add a quirk to reboot via the pci method.
The problem is very intermittent and hard to debug, it might succeed
rebooting just fine 40 times in a row - but fails half a dozen times
the next day. It seems to be slightly less common in BIOS CSM mode
than native UEFI (with the CSM disabled), but it does happen in either
mode. Since I've started testing this patch in late january, rebooting
has been 100% reliable.
Most of the time it already hangs during POST, but occasionally it
might even make it through the bootloader and the kernel might even
start booting, but then hangs before the mode switch. The same symptoms
occur with grub-efi, gummiboot and grub-pc, just as well as (at least)
kernel 3.16-3.19 and 4.0-rc6 (I haven't tried older kernels than 3.16).
Upgrading to the most current mainboard firmware of the ASRock
Q1900DC-ITX, version 1.20, does not improve the situation.
( Searching the web seems to suggest that other Bay Trail-D mainboards
might be affected as well. )
-- Signed-off-by: Stefan Lippers-Hollmann <s.l-h@gmx.de> Cc: Matt Fleming <matt.fleming@intel.com> Link: http://lkml.kernel.org/r/20150330224427.0fb58e42@mir Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
In omap_dma_start_desc the vdesc->node is removed from the virt-dma
framework managed lists (to be precise from the desc_issued list).
If a terminate_all comes before the transfer finishes the omap_desc will
not be freed up because it is not in any of the lists and we stopped the
DMA channel so the transfer will not going to complete.
There is no special sequence for leaking memory when using cyclic (audio)
transfer: with every start and stop of a cyclic transfer the driver leaks
struct omap_desc worth of memory.
Free up the allocated memory directly in omap_dma_terminate_all() since the
framework will not going to do that for us.
This patch uses iio_trigger_get to increment the reference
count of trigger device, to avoid incorrect assignment.
Can result in a null pointer dereference during removal if the
trigger has been changed before removal.
This patch refers to a similar situation encountered through the
following discussion:
http://www.spinics.net/lists/linux-iio/msg13669.html
SCSI transport drivers and SCSI LLDs block a SCSI device if the
transport layer is not operational. This means that in this state
no requests should be processed, even if the REQ_PREEMPT flag has
been set. This patch avoids that a rescan shortly after a cable
pull sporadically triggers the following kernel oops:
This patch uses the existing CALAO Systems ftdi_8u2232c_probe in order
to avoid attaching a TTY to the JTAG port as this board is based on the
CALAO Systems reference design and needs the same fix up.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
[johan: clean up probe logic ] Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Use readb() and memcpy_fromio() accessors instead.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2f800fbd777b ("writeback: fix dirtied pages accounting on redirty")
introduced account_page_redirty() which reverts stat updates for a
redirtied page, making BDI_DIRTIED no longer monotonically increasing.
bdi_update_write_bandwidth() uses the delta in BDI_DIRTIED as the
basis for bandwidth calculation. While unlikely, since the above
patch, the newer value may be lower than the recorded past value and
underflow the bandwidth calculation leading to a wild result.
Fix it by subtracing min of the old and new values when calculating
delta. AFAIK, there hasn't been any report of it happening but the
resulting erratic behavior would be non-critical and temporary, so
it's possible that the issue is happening without being reported. The
risk of the fix is very low, so tagged for -stable.
global_update_bandwidth() uses static variable update_time as the
timestamp for the last update but forgets to initialize it to
INITIALIZE_JIFFIES.
This means that global_dirty_limit will be 5 mins into the future on
32bit and some large amount jiffies into the past on 64bit. This
isn't critical as the only effect is that global_dirty_limit won't be
updated for the first 5 mins after booting on 32bit machines,
especially given the auxiliary nature of global_dirty_limit's role -
protecting against global dirty threshold's sudden dips; however, it
does lead to unintended suboptimal behavior. Fix it.
When non-realtime tasks get priority-inheritance boosted to a realtime
scheduling class, RLIMIT_RTTIME starts to apply to them. However, the
counter used for checking this (the same one used for SCHED_RR
timeslices) was not getting reset. This meant that tasks running with a
non-realtime scheduling class which are repeatedly boosted to a realtime
one, but never block while they are running realtime, eventually hit the
timeout without ever running for a time over the limit. This patch
resets the realtime timeslice counter when un-PI-boosting from an RT to
a non-RT scheduling class.
I have some test code with two threads and a shared PTHREAD_PRIO_INHERIT
mutex which induces priority boosting and spins while boosted that gets
killed by a SIGXCPU on non-fixed kernels but doesn't with this patch
applied. It happens much faster with a CONFIG_PREEMPT_RT kernel, and
does happen eventually with PREEMPT_VOLUNTARY kernels.
The cause is the "memset(pgdat, 0, sizeof(*pgdat))" at the end of
try_offline_node, which will reset all the content of pgdat to 0, as the
pgdat is accessed lock-free, so that the users still using the pgdat
will panic, such as the vmstat_update routine.
process A: offline node XX:
vmstat_updat()
refresh_cpu_vm_stats()
for_each_populated_zone()
find online node XX
cond_resched()
offline cpu and memory, then try_offline_node()
node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
zone = next_zone(zone)
pg_data_t *pgdat = zone->zone_pgdat; // here pgdat is NULL now
next_online_pgdat(pgdat)
next_online_node(pgdat->node_id); // NULL pointer access
So the solution here is postponing the reset of obsolete pgdat from
try_offline_node() to hotadd_new_pgdat(), and just resetting
pgdat->nr_zones and pgdat->classzone_idx to be 0 rather than the memset
0 to avoid breaking pointer information in pgdat.
The assumption before this patch was that we don't need to
run again the INIT firmware after the system booted. The
INIT firmware runs calibrations which impact the physical
layer's behavior.
Users reported that it may be helpful to run these
calibrations again every time the interface is brought up.
The penatly is minimal, since the calibrations run fast.
This fixes:
https://bugzilla.kernel.org/show_bug.cgi?id=94341
Properly verify that the resulting page aligned end address is larger
than both the start address and the length of the memory area requested.
Both the start and length arguments for ib_umem_get are controlled by
the user. A misbehaving user can provide values which will cause an
integer overflow when calculating the page aligned end address.
This overflow can cause also miscalculation of the number of pages
mapped, and additional logic issues.
Addresses: CVE-2014-8159 Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Ben Hutchings [Wed, 15 Apr 2015 18:00:32 +0000 (19:00 +0100)]
tcp: Fix crash in TCP Fast Open
Commit 355a901e6cf1 ("tcp: make connect() mem charging friendly")
changed tcp_send_syn_data() to perform an open-coded copy of the 'syn'
skb rather than using skb_copy_expand().
The open-coded copy does not cover the skb_shared_info::gso_segs
field, so in the new skb it is left set to 0. When this commit was
backported into stable branches between 3.10.y and 3.16.7-ckty
inclusive, it triggered the BUG() in tcp_transmit_skb().
Since Linux 3.18 the GSO segment count is kept in the
tcp_skb_cb::tcp_gso_segs field and tcp_send_syn_data() does copy the
tcp_skb_cb structure to the new skb, so mainline and newer stable
branches are not affected.
Set skb_shared_info::gso_segs to the correct value of 1.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Return a negative error value like the rest of the entries in this function.
Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: tweaked subject line] Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The regfile provided to SA_SIGINFO signal handler as ucontext was off by
one due to pt_regs gutter cleanups in 2013.
Before handling signal, user pt_regs are copied onto user_regs_struct and copied
back later. Both structs are binary compatible. This was all fine until
commit 2fa919045b72 (ARC: pt_regs update #2) which removed the empty stack slot
at top of pt_regs (corresponding to first pad) and made the corresponding
fixup in struct user_regs_struct (the pad in there was moved out of
@scratch - not removed altogether as it is part of ptrace ABI)
struct user_regs_struct {
+ long pad;
struct {
- long pad;
long bta, lp_start, lp_end,....
} scratch;
...
}
This meant that now user_regs_struct was off by 1 reg w.r.t pt_regs and
signal code needs to user_regs_struct.scratch to reflect it as pt_regs,
which is what this commit does.
This problem was hidden for 2 years, because both save/restore, despite
using wrong location, were using the same location. Only an interim
inspection (reproducer below) exposed the issue.
Some BIOS version of Fujitsu Lifebook T731 seems to set up the
headphone pin (0x21) without the assoc number 0x0f while it's set only
to the output on the docking port (0x1a). With the recent commit
[03ad6a8c93b6: ALSA: hda - Fix "PCM" name being used on one DAC when
there are two DACs], this resulted in the weird mixer element
mapping where the headphone on the laptop is assigned as a shared
volume with the speaker and the docking port is assigned as an
individual headphone.
This patch improves the situation by correcting the headphone pin
config to the more appropriate value.
Pin sense will active when power pin is wake up.
Power pin will not wake up immediately during resume state.
Add some delay to wait for power pin activated.
Adds an entry for Creative USB X-Fi to the rc_config array in
mixer_quirks.c to allow use of volume knob on the device.
Adds support for newer X-Fi Pro card, known as "Model No. SB1095"
with USB ID "041e:3237"
We have a HP machine which use the codec node 0x17 connecting the
internal speaker, and from the node capability, we saw the EAPD,
if we don't set the EAPD on for this node, the internal speaker
can't output any sound.
Now that the definition is centralized in <linux/kernel.h>, the
definitions of U32_MAX (and related) elsewhere in the kernel can be
removed.
Signed-off-by: Alex Elder <elder@linaro.org> Acked-by: Sage Weil <sage@inktank.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Create constants that define the maximum and minimum values
representable by the kernel types u8, s8, u16, s16, and so on.
Signed-off-by: Alex Elder <elder@linaro.org> Cc: Sage Weil <sage@inktank.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The symbol U32_MAX is defined in several spots. Change these
definitions to be conditional. This is in preparation for the next
patch, which centralizes the definition in <linux/kernel.h>.
Signed-off-by: Alex Elder <elder@linaro.org> Cc: Sage Weil <sage@inktank.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
On a MIPS Malta board, tons of fifo underflow errors have been observed
when using u-boot as bootloader instead of YAMON. The reason for that
is that YAMON used to set the pcnet device to SRAM mode but u-boot does
not. As a result, the default Tx threshold (64 bytes) is now too small to
keep the fifo relatively used and it can result to Tx fifo underflow errors.
As a result of which, it's best to setup the SRAM on supported controllers
so we can always use the NOUFLO bit.
Cc: <netdev@vger.kernel.org> Cc: <linux-kernel@vger.kernel.org> Cc: Don Fry <pcnet32@frontier.com> Signed-off-by: Markos Chandras <markos.chandras@imgtec.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
We currently use the device tree update code in the kernel after resuming
from a suspend operation to re-sync the kernels view of the device tree with
that of the hypervisor. The code as it stands is not endian safe as it relies
on parsing buffers returned by RTAS calls that thusly contains data in big
endian format.
This patch annotates variables and structure members with __be types as well
as performing necessary byte swaps to cpu endian for data that needs to be
parsed.
The idle_task_exit() function may call switch_mm() with next ==
&init_mm. On arm64, init_mm.pgd cannot be used for user mappings, so
this patch simply sets the reserved TTBR0.
Reported-by: Jon Medhurst (Tixy) <tixy@linaro.org> Tested-by: Jon Medhurst (Tixy) <tixy@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Stable commit "core, nfqueue, openvswitch: Orphan frags in
skb_zerocopy and handle errors", upstream commit 36d5fe6a000790f56039afe26834265db0a3ad4c, was not correctly backported
and missed to change a const 'from' parameter to non-const. This
results in a new batch of warnings:
net/netfilter/nfnetlink_queue_core.c: In function ‘nfqnl_zcopy’:
net/netfilter/nfnetlink_queue_core.c:272:2: warning: passing argument 1 of ‘skb_orphan_frags’ discards ‘const’ qualifier from pointer target type [enabled by default]
if (unlikely(skb_orphan_frags(from, GFP_ATOMIC))) {
^
In file included from net/netfilter/nfnetlink_queue_core.c:18:0:
include/linux/skbuff.h:1822:19: note: expected ‘struct sk_buff *’ but argument is of type ‘const struct sk_buff *’
static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
^
net/netfilter/nfnetlink_queue_core.c:273:3: warning: passing argument 1 of ‘skb_tx_error’ discards ‘const’ qualifier from pointer target type [enabled by default]
skb_tx_error(from);
^
In file included from net/netfilter/nfnetlink_queue_core.c:18:0:
include/linux/skbuff.h:630:13: note: expected ‘struct sk_buff *’ but argument is of type ‘const struct sk_buff *’
extern void skb_tx_error(struct sk_buff *skb);
Remove const from the 'from' parameter, the same as in the upstream
commit.
As far as I can see, this leaked into 3.10, 3.12, and 3.13 already.
Cc: Zoltan Kiss <zoltan.kiss@citrix.com> Cc: David S. Miller <davem@davemloft.net> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org # v3.10, v3.12, v3.13 Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Fix B-tree corruption when a new record is inserted at position 0 in the
node in hfs_brec_insert(). In this case a hfs_brec_update_parent() is
called to update the parent index node (if exists) and it is passed
hfs_find_data with a search_key containing a newly inserted key instead
of the key to be updated. This results in an inconsistent index node.
The bug reproduces on my machine after an extents overflow record for
the catalog file (CNID=4) is inserted into the extents overflow B-tree.
Because of a low (reserved) value of CNID=4, it has to become the first
record in the first leaf node.
A change in hfs_brec_insert() makes hfs_brec_update_parent() work
correctly by preventing it from getting fd->record=-1 value from
__hfs_brec_find().
Along the way, I removed duplicate code with unification of the if
condition. The resulting code is equivalent to the original code
because node is never 0.
Also hfs_brec_update_parent() will now return an error after getting a
negative fd->record value. However, the return value of
hfs_brec_update_parent() is not checked anywhere in the file and I'm
leaving it unchanged by this patch. brec.c lacks error checking after
some other calls too, but this issue is of less importance than the one
being fixed by this patch.
Signed-off-by: Sergei Antonov <saproj@gmail.com> Cc: Joe Perches <joe@perches.com> Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com> Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Anton Altaparmakov <aia21@cam.ac.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Since it's possible for the discard and write same queue limits to
change while the upper level command is being sliced and diced, fix up
both of them (a) to reject IO if the special command is unsupported at
the start of the function and (b) read the limits once and let the
commands error out on their own if the status happens to change.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The commit 9cade1a46c77 (dma: dw: split driver to library part and platform
code) introduced a separate platform driver but missed to add a
MODULE_ALIAS("platform:dw_dmac"); to that module.
The patch adds this to get driver loaded automatically if platform device is
registered.
Reported-by: "Blin, Jerome" <jerome.blin@intel.com> Fixes: 9cade1a46c77 (dma: dw: split driver to library part and platform code) Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Which is caused by an irq_work generating new irq_work and therefore
not allowing forward progress.
This happens because processing the perf irq_work triggers another
perf event (tracepoint stuff) which in turn generates an irq_work ad
infinitum.
Avoid this by raising the recursion counter in the irq_work -- which
effectively disables all software events (including tracepoints) from
actually triggering again.
Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20150219170311.GH21418@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Some APs experience problems when working with
U-APSD. Decreasing the probability of that
happening by using legacy mode for all ACs but VO
isn't enough.
Cisco 4410N originally forced us to enable VO by
default only because it treated non-VO ACs as
legacy.
However some APs (notably Netgear R7000) silently
reclassify packets to different ACs. Since u-APSD
ACs require trigger frames for frame retrieval
clients would never see some frames (e.g. ARP
responses) or would fetch them accidentally after
a long time.
It makes little sense to enable u-APSD queues by
default because it needs userspace applications to
be aware of it to actually take advantage of the
possible additional powersavings. Implicitly
depending on driver autotrigger frame support
doesn't make much sense.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
As HT/VHT depend heavily on QoS/WMM, it's not a good idea to
let userspace add clients that have HT/VHT but not QoS/WMM.
Since it does so in certain cases we've observed (client is
using HT IEs but not QoS/WMM) just ignore the HT/VHT info at
this point and don't pass it down to the drivers which might
unconditionally use it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
This patch fixes the incorrect use of __transport_register_session()
in tcm_qla2xxx_check_initiator_node_acl() code, that does not perform
explicit se_tpg->session_lock when accessing se_tpg->tpg_sess_list
to add new se_sess nodes.
Given that tcm_qla2xxx_check_initiator_node_acl() is not called with
qla_hw->hardware_lock held for all accesses of ->tpg_sess_list, the
code should be using transport_register_session() instead.
This patch adds a missing set of conditional check braces in
ft_invl_hw_context() originally introduced by commit dcd998ccd
when handling DDP failures in ft_recv_write_data() code.
When inserting a new register into a block at the lower end the present
bitmap is currently shifted into the wrong direction. The effect of this is
that the bitmap becomes corrupted and registers which are present might be
reported as not present and vice versa.
Fix this by shifting left rather than right.
Fixes: 472fdec7380c("regmap: rbtree: Reduce number of nodes, take 2") Reported-by: Daniel Baluta <daniel.baluta@gmail.com> Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de> Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de> Acked-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de> Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de> Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de> Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de> Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de> Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de> Acked-by: Paul Handrigan <Paul.Handrigan@cirrus.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Right now the mvneta driver doesn't handle Tx IRQ, and relies on two
mechanisms to flush Tx descriptors : a flush at the end of mvneta_tx()
and a timer. If a burst of packets is emitted faster than the device
can send them, then the queue is stopped until next wake-up of the
timer 10ms later. This causes jerky output traffic with bursts and
pauses, making it difficult to reach line rate with very few streams.
A test on UDP traffic shows that it's not possible to go beyond 134
Mbps / 12 kpps of outgoing traffic with 1500-bytes IP packets. Routed
traffic tends to observe pauses as well if the traffic is bursty,
making it even burstier after the wake-up.
It seems that this feature was inherited from the original driver but
nothing there mentions any reason for not using the interrupt instead,
which the chip supports.
Thus, this patch enables Tx interrupts and removes the timer. It does
the two at once because it's not really possible to make the two
mechanisms coexist, so a split patch doesn't make sense.
First tests performed on a Mirabox (Armada 370) show that less CPU
seems to be used when sending traffic. One reason might be that we now
call the mvneta_tx_done_gbe() with a mask indicating which queues have
been done instead of looping over all of them.
The same UDP test above now happily reaches 987 Mbps / 87.7 kpps.
Single-stream TCP traffic can now more easily reach line rate. HTTP
transfers of 1 MB objects over a single connection went from 730 to
840 Mbps. It is even possible to go significantly higher (>900 Mbps)
by tweaking tcp_tso_win_divisor.
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Gregory CLEMENT <gregory.clement@free-electrons.com> Cc: Arnaud Ebalard <arno@natisbad.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Marvell has not published the chip's datasheet yet, so it's very hard
to find the relevant bits to manipulate to change the IRQ behaviour.
Fortunately, these bits are described in the proprietary LSP patch set
which is publicly available here :
http://www.plugcomputer.org/downloads/mirabox/
So let's put them back in the driver in order to reduce the burden of
current and future maintenance.
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
If a queue timeout is reported, we can oops because of some
schedules while the caller is atomic, as shown below :
mvneta d0070000.ethernet eth0: tx timeout
BUG: scheduling while atomic: bash/1528/0x00000100
Modules linked in: slhttp_ethdiv(C) [last unloaded: slhttp_ethdiv]
CPU: 2 PID: 1528 Comm: bash Tainted: G WC 3.13.0-rc4-mvebu-nf #180
[<c0011bd9>] (unwind_backtrace+0x1/0x98) from [<c000f1ab>] (show_stack+0xb/0xc)
[<c000f1ab>] (show_stack+0xb/0xc) from [<c02ad323>] (dump_stack+0x4f/0x64)
[<c02ad323>] (dump_stack+0x4f/0x64) from [<c02abe67>] (__schedule_bug+0x37/0x4c)
[<c02abe67>] (__schedule_bug+0x37/0x4c) from [<c02ae261>] (__schedule+0x325/0x3ec)
[<c02ae261>] (__schedule+0x325/0x3ec) from [<c02adb97>] (schedule_timeout+0xb7/0x118)
[<c02adb97>] (schedule_timeout+0xb7/0x118) from [<c0020a67>] (msleep+0xf/0x14)
[<c0020a67>] (msleep+0xf/0x14) from [<c01dcbe5>] (mvneta_stop_dev+0x21/0x194)
[<c01dcbe5>] (mvneta_stop_dev+0x21/0x194) from [<c01dcfe9>] (mvneta_tx_timeout+0x19/0x24)
[<c01dcfe9>] (mvneta_tx_timeout+0x19/0x24) from [<c024afc7>] (dev_watchdog+0x18b/0x1c4)
[<c024afc7>] (dev_watchdog+0x18b/0x1c4) from [<c0020b53>] (call_timer_fn.isra.27+0x17/0x5c)
[<c0020b53>] (call_timer_fn.isra.27+0x17/0x5c) from [<c0020cad>] (run_timer_softirq+0x115/0x170)
[<c0020cad>] (run_timer_softirq+0x115/0x170) from [<c001ccb9>] (__do_softirq+0xbd/0x1a8)
[<c001ccb9>] (__do_softirq+0xbd/0x1a8) from [<c001cfad>] (irq_exit+0x61/0x98)
[<c001cfad>] (irq_exit+0x61/0x98) from [<c000d4bf>] (handle_IRQ+0x27/0x60)
[<c000d4bf>] (handle_IRQ+0x27/0x60) from [<c000843b>] (armada_370_xp_handle_irq+0x33/0xc8)
[<c000843b>] (armada_370_xp_handle_irq+0x33/0xc8) from [<c000fba9>] (__irq_usr+0x49/0x60)
Ben Hutchings attempted to propose a better fix consisting in using a
scheduled work for this, but while it fixed this panic, it caused other
random freezes and panics proving that the reset sequence in the driver
is unreliable and that additional fixes should be investigated.
When sending multiple streams over a link limited to 100 Mbps, Tx timeouts
happen from time to time, and the driver correctly recovers only when the
function is disabled.
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Gregory CLEMENT <gregory.clement@free-electrons.com> Cc: Ben Hutchings <ben@decadent.org.uk> Tested-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Stats writers are mvneta_rx() and mvneta_tx(). They don't lock anything
when they update the stats, and as a result, it randomly happens that
the stats freeze on SMP if two updates happen during stats retrieval.
This is very easily reproducible by starting two HTTP servers and binding
each of them to a different CPU, then consulting /proc/net/dev in loops
during transfers, the interface should immediately lock up. This issue
also randomly happens upon link state changes during transfers, because
the stats are collected in this situation, but it takes more attempts to
reproduce it.
The comments in netdevice.h suggest using per_cpu stats instead to get
rid of this issue.
This patch implements this. It merges both rx_stats and tx_stats into
a single "stats" member with a single syncp. Both mvneta_rx() and
mvneta_rx() now only update the a single CPU's counters.
In turn, mvneta_get_stats64() does the summing by iterating over all CPUs
to get their respective stats.
With this change, stats are still correct and no more lockup is encountered.
Note that this bug was present since the first import of the mvneta
driver. It might make sense to backport it to some stable trees. If
so, it depends on "d33dc73 net: mvneta: increase the 64-bit rx/tx stats
out of the hot path".
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Gregory CLEMENT <gregory.clement@free-electrons.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Tested-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
[wt: port to 3.10 : u64_stats_init() does not exist in 3.10 and is not needed] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Better count packets and bytes in the stack and on 32 bit then
accumulate them at the end for once. This saves two memory writes
and two memory barriers per packet. The incoming packet rate was
increased by 4.7% on the Openblocks AX3 thanks to this.
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Gregory CLEMENT <gregory.clement@free-electrons.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Tested-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
pci_match_id() just match the static pci_device_id, which may return NULL if
someone binds the driver to a device manually using
/sys/bus/pci/drivers/.../new_id.
This patch wrap up a helper function __mlx4_remove_one() which does the tear
down function but preserve the drv_data. Functions like
mlx4_pci_err_detected() and mlx4_restart_one() will call this one with out
releasing drvdata.
Fixes: 97a5221 "net/mlx4_core: pass pci_device_id.driver_data to __mlx4_init_one during reset". CC: Bjorn Helgaas <bhelgaas@google.com> CC: Amir Vadai <amirv@mellanox.com> CC: Jack Morgenstein <jackm@dev.mellanox.co.il> CC: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The second parameter of __mlx4_init_one() is used to identify whether the
pci_dev is a PF or VF. Currently, when it is invoked in mlx4_pci_slot_reset()
this information is missed.
This patch match the pci_dev with mlx4_pci_table and passes the
pci_device_id.driver_data to __mlx4_init_one() in mlx4_pci_slot_reset().
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
sk_dst_cache has __rcu annotation, so we need a cast to avoid
following sparse error :
include/net/sock.h:1774:19: warning: incorrect type in initializer (different address spaces)
include/net/sock.h:1774:19: expected struct dst_entry [noderef] <asn:4>*__ret
include/net/sock.h:1774:19: got struct dst_entry *dst
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: kbuild test robot <fengguang.wu@intel.com> Fixes: 7f502361531e ("ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
This commit fixes the command value generated for CSUM calculation
when running in big endian mode. The Ethernet protocol ID for IP was
being unconditionally byte-swapped in the layer 3 protocol check (with
swab16), which caused the mvneta driver to not function correctly in
big endian mode. This patch byte-swaps the ID conditionally with
htons.
Cc: <stable@vger.kernel.org> # v3.13+ Signed-off-by: Thomas Fitzsimmons <fitzsim@fitzsim.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
*_result[len] is parsed as *(_result[len]) which is not at all what we
want to touch here.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Fixes: 84a7c0b1db1c ("dns_resolver: assure that dns_query() result is null-terminated") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>