Syncookies borrow the ->rx_opt.ts_recent_stamp field to store the
timestamp of the last synflood. Protect them with READ_ONCE() and
WRITE_ONCE() since reads and writes aren't serialised.
Use of .rx_opt.ts_recent_stamp for storing the synflood timestamp was
introduced by a0f82f64e269 ("syncookies: remove last_synq_overflow from
struct tcp_sock"). But unprotected accesses were already there when
timestamp was stored in .last_synq_overflow.
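For illustration, the annotated accesses look roughly like this (sketch only, not the full upstream diff, which also covers request and timewait sockets):

    u32 last_overflow = READ_ONCE(tcp_sk(sk)->rx_opt.ts_recent_stamp);  /* lockless read */

    WRITE_ONCE(tcp_sk(sk)->rx_opt.ts_recent_stamp, jiffies);            /* lockless write */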
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When no synflood occurs, the synflood timestamp isn't updated.
Therefore it can be so old that time_after32() can consider it to be
in the future.
That's a problem for tcp_synq_no_recent_overflow() as it may report
that a recent overflow occurred while, in fact, it's just that jiffies
has grown past 'last_overflow' + TCP_SYNCOOKIE_VALID + 2^31.
Spurious detection of recent overflows leads to extra syncookie
verification in cookie_v[46]_check(). At that point, the verification
should fail and the packet be dropped. But we should have dropped the
packet earlier as we didn't even send a syncookie.
Let's refine tcp_synq_no_recent_overflow() to report a recent overflow
only if jiffies is within the
[last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval. This
way, no spurious recent overflow is reported when jiffies wraps and
'last_overflow' becomes in the future from the point of view of
time_after32().
However, if jiffies wraps and enters the
[last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval (with
'last_overflow' being a stale synflood timestamp), then
tcp_synq_no_recent_overflow() still erroneously reports an
overflow. In such cases, we have to rely on syncookie verification
to drop the packet. We unfortunately have no way to differentiate
between a fresh and a stale syncookie timestamp.
In practice, using last_overflow as lower bound is problematic.
If the synflood timestamp is concurrently updated between the time
we read jiffies and the moment we store the timestamp in
'last_overflow', then 'now' becomes smaller than 'last_overflow' and
tcp_synq_no_recent_overflow() returns true, potentially dropping a
valid syncookie.
Reading jiffies after loading the timestamp could fix the problem,
but that'd require a memory barrier. Let's just accommodate potential
timestamp growth instead and extend the interval by using
'last_overflow - HZ' as the lower bound.
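Roughly, the resulting check looks like this (sketch only; the wrap-safe range test is open-coded here rather than relying on a helper):

    static inline bool tcp_synq_no_recent_overflow(const struct sock *sk)
    {
            u32 last_overflow = READ_ONCE(tcp_sk(sk)->rx_opt.ts_recent_stamp);
            u32 now = jiffies;
            u32 lo = last_overflow - HZ;                    /* slack for a concurrent update */
            u32 hi = last_overflow + TCP_SYNCOOKIE_VALID;

            /* wrap-safe: report "no recent overflow" unless lo <= now <= hi */
            return (u32)(now - lo) > (u32)(hi - lo);
    }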
Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If no synflood happens for a long enough period of time, then the
synflood timestamp isn't refreshed and jiffies can advance so much
that time_after32() can't accurately compare them any more.
Therefore, we can end up in a situation where time_after32(now,
last_overflow + HZ) returns false, just because these two values are
too far apart. In that case, the synflood timestamp isn't updated as
it should be, which can trick tcp_synq_no_recent_overflow() into
rejecting valid syncookies.
For example, let's consider the following scenario on a system
with HZ=1000:
* The synflood timestamp is 0, either because that's the timestamp
of the last synflood or, more commonly, because we're working with
a freshly created socket.
* We receive a new SYN, which triggers synflood protection. Let's say
that this happens when jiffies == 2147484649 (that is,
'synflood timestamp' + HZ + 2^31 + 1).
* Then tcp_synq_overflow() doesn't update the synflood timestamp,
because time_after32(2147484649, 1000) returns false.
With:
- 2147484649: the value of jiffies, aka. 'now'.
- 1000: the value of 'last_overflow' + HZ.
* A bit later, we receive the ACK completing the 3WHS. But
cookie_v[46]_check() rejects it because tcp_synq_no_recent_overflow()
says that we're not under synflood. That's because
time_after32(2147484649, 120000) returns false.
With:
- 2147484649: the value of jiffies, aka. 'now'.
- 120000: the value of 'last_overflow' + TCP_SYNCOOKIE_VALID.
Of course, in reality jiffies would have increased a bit, but this
condition will last for the next 119 seconds, which is long enough
to accommodate jiffies' growth.
Fix this by updating the overflow timestamp whenever jiffies isn't
within the [last_overflow, last_overflow + HZ] range. That shouldn't
have any performance impact since the update still happens at most once
per second.
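A sketch of the refresh condition (illustrative, not the verbatim hunk):

    static inline void tcp_synq_overflow(struct sock *sk)
    {
            u32 last_overflow = READ_ONCE(tcp_sk(sk)->rx_opt.ts_recent_stamp);
            u32 now = jiffies;

            /* Refresh unless 'now' lies within [last_overflow, last_overflow + HZ],
             * so the stamp is written at most once per second but can never grow
             * stale enough to confuse time_after32() while under synflood.
             */
            if ((u32)(now - last_overflow) > (u32)HZ)
                    WRITE_ONCE(tcp_sk(sk)->rx_opt.ts_recent_stamp, now);
    }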
Now we're guaranteed to have fresh timestamps while under synflood, so
tcp_synq_no_recent_overflow() can safely use it with time_after32() in
such situations.
Stale timestamps can still make tcp_synq_no_recent_overflow() return
the wrong verdict when not under synflood. This will be handled in the
next patch.
For 64-bit architectures, the problem was introduced with the
conversion of ->tw_ts_recent_stamp to a 32-bit integer by commit cca9bab1b72c ("tcp: use monotonic timestamps for PAWS").
The problem has always been there on 32-bit architectures.
Fixes: cca9bab1b72c ("tcp: use monotonic timestamps for PAWS") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the user changes the prio2buffer mapping while global pause is
enabled, the mlx5 driver incorrectly sets all active buffers
(buffers that have at least one priority mapped) to lossy.
Solution:
If global pause is enabled, set all the active buffers to lossless
in the prio2buffer command.
Also, add an error message for when the buffer size is not enough to meet
the xoff threshold.
In order to set/get/dump, tipc uses the generic netlink
infrastructure. So, when the tipc module is inserted, its init function
calls genl_register_family().
After genl_register_family(), set/get/dump commands are immediately
allowed and these callbacks internally use net_generic.
net_generic is allocated by register_pernet_device(), but this
is called after genl_register_family() in the __init function.
So these callbacks could use an uninitialized net_generic.
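A sketch of the intended init ordering (the ops/family symbol names follow net/tipc but should be read as illustrative):

    static int __init tipc_init(void)
    {
            int err;

            /* set up per-netns storage first, so genl handlers can use it */
            err = register_pernet_device(&tipc_net_ops);
            if (err)
                    return err;

            /* only now expose the genl commands to userspace */
            err = genl_register_family(&tipc_genl_family);
            if (err) {
                    unregister_pernet_device(&tipc_net_ops);
                    return err;
            }
            return 0;
    }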
Test commands:
#SHELL1
while :
do
modprobe tipc
modprobe -rv tipc
done
Fixes: 7e4369057806 ("tipc: fix a slab object leak") Fixes: a62fbccecd62 ("tipc: make subscriber server support net namespace") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The openvswitch module shares a common conntrack and NAT infrastructure
exposed via netfilter. It's possible that a packet needs both SNAT and
DNAT manipulation, due to e.g. tuple collision. Netfilter can support
this because it runs through the NAT table twice - once on ingress and
again after egress. The openvswitch module doesn't have such capability.
Like the netfilter hook infrastructure, we should run through NAT twice to
keep the symmetry.
Fixes: 05752523e565 ("openvswitch: Interface with NAT.") Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit 2b3e88ea6528 ("net: phy: improve phy state checking")
phy_start_aneg() expects phy state to be >= PHY_UP. Call phy_start()
before calling phy_start_aneg() during probe so that autonegotiation
is initiated.
As phy_start() takes care of calling phy_start_aneg(), drop the explicit
call to phy_start_aneg().
Network fails without this patch on Octeon TX.
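Illustration of the probe-path change (sketch; surrounding driver code omitted):

    /* before: autonegotiation never started, the PHY was still below PHY_UP */
    /*     phy_start_aneg(phydev); */

    /* after: phy_start() brings the PHY to PHY_UP and triggers aneg itself */
    phy_start(phydev);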
Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking") Signed-off-by: Mian Yousaf Kaukab <ykaukab@suse.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In mq_dump() and mqprio_dump(), sch->q.qlen hasn't been set when the
subqueue is a NOLOCK qdisc.
Fixes: ce679e8df7ed ("net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio") Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now the RX interrupt is triggered twice every time, because in
cpsw_rx_interrupt() it is acked first and then disabled. So there will
always be a pending interrupt when the RX interrupt is enabled again in the
NAPI handler.
Fix it by first disabling the IRQ and then doing the ack.
Fixes: 870915feabdc ("drivers: net: cpsw: remove disable_irq/enable_irq as irq can be masked from cpsw itself") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 43e665287f93 ("net-next: dsa: fix flow dissection") added an
ability to override protocol and network offset during flow dissection
for DSA-enabled devices (i.e. controllers shipped as switch CPU ports)
in order to fix skb hashing for RPS on Rx path.
However, skb_hash() and the added part of the code can be invoked not only
on the Rx path, but also on the Tx path if we have a multi-queued device and:
- the kernel is running on a UP system, or
- XPS is not configured.
The call stack in these two cases will be like: dev_queue_xmit() ->
__dev_queue_xmit() -> netdev_core_pick_tx() -> netdev_pick_tx() ->
skb_tx_hash() -> skb_get_hash().
The problem is that skbs queued for Tx have both network offset and
correct protocol already set up even after inserting a CPU tag by DSA
tagger, so calling tag_ops->flow_dissect() on this path actually only
breaks flow dissection and hashing.
This can be observed by adding debug prints just before and right after
the tag_ops->flow_dissect() call in the related block of code.
In order to fix that we can add the check 'proto == htons(ETH_P_XDSA)'
to prevent code from calling tag_ops->flow_dissect() on Tx.
I also decided to initialize the 'offset' variable so tagger callbacks can
now safely leave it untouched without causing chaos.
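The resulting flow-dissector block then looks roughly like this (sketch):

    if (unlikely(skb->dev && netdev_uses_dsa(skb->dev) &&
                 proto == htons(ETH_P_XDSA))) {
            const struct dsa_device_ops *ops;
            int offset = 0;     /* taggers may legitimately leave this untouched */

            ops = skb->dev->dsa_ptr->tag_ops;
            if (ops->flow_dissect &&
                !ops->flow_dissect(skb, &proto, &offset)) {
                    hlen -= offset;
                    nhoff += offset;
            }
    }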
Fixes: 43e665287f93 ("net-next: dsa: fix flow dissection") Signed-off-by: Alexander Lobakin <alobakin@dlink.ru> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We have an interesting memory leak in the bridge when it is being
unregistered and is a slave to a master device which would change the
mac of its slaves on unregister (e.g. bond, team). This is a very
unusual setup, but we do end up leaking 1 fdb entry because
dev_set_mac_address() causes the bridge to insert the new mac address
into its table after all fdbs are flushed: after dellink() on the
bridge has finished and we call NETDEV_UNREGISTER, the bond/team
releases the bridge and calls dev_set_mac_address() to restore its original
address, and that in turn adds an fdb in the bridge.
One fix is to check for the bridge dev's reg_state in its
ndo_set_mac_address callback and return an error if the bridge is not in
NETREG_REGISTERED.
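A sketch of such a check (the exact error value and placement are illustrative):

    static int br_set_mac_address(struct net_device *dev, void *p)
    {
            struct sockaddr *addr = p;

            if (!is_valid_ether_addr(addr->sa_data))
                    return -EADDRNOTAVAIL;

            /* Refuse late address changes once the bridge is being torn down,
             * so no new local fdb entry can appear after the fdb flush.
             */
            if (dev->reg_state != NETREG_REGISTERED)
                    return -EBUSY;

            /* ... existing address update and local fdb insertion ... */
            return 0;
    }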
Easy steps to reproduce:
1. add bond in mode != A/B
2. add any slave to the bond
3. add bridge dev as a slave to the bond
4. destroy the bridge device
Fixes: 43598813386f ("bridge: add local MAC address to forwarding table (v2)") Reported-by: syzbot+2add91c08eb181fea1bf@syzkaller.appspotmail.com Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a user runs a command like
tc qdisc add dev eth1 root mqprio
a KASAN stack-out-of-bounds warning is emitted.
Currently, the NLA_ALIGN macro used in mqprio_dump provides too large a
buffer size as the argument for nla_put and memcpy down the call stack.
The flow looks like this:
1. nla_put expects the exact object size as an argument;
2. Later it provides this size to memcpy;
3. To calculate correct padding for SKB, nla_put applies NLA_ALIGN
macro itself.
Therefore, NLA_ALIGN should not be applied to the nla_put parameter.
Otherwise it will lead to out-of-bounds memory access in memcpy.
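For illustration (the field and attribute below are chosen as an example, not quoted from the patch):

    u64 rate = priv->min_rate[i];   /* illustrative field */

    /* wrong: NLA_ALIGN() inflates the length handed to nla_put()/memcpy() */
    /*     nla_put(skb, TCA_MQPRIO_MIN_RATE64, NLA_ALIGN(sizeof(rate)), &rate); */

    /* right: pass the exact object size; nla_put() pads the skb itself */
    if (nla_put(skb, TCA_MQPRIO_MIN_RATE64, sizeof(rate), &rate))
            goto nla_put_failure;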
Fixes: 4e8b86c06269 ("mqprio: Introduce new hardware offload mode and shaper in mqprio") Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
syzbot was once again able to crash a host by setting a very small mtu
on a loopback device.
Let's make inetdev_valid_mtu() available in include/net/ip.h,
and use it in ip_setup_cork(), so that we protect both ip_append_page()
and __ip_append_data().
Also add a READ_ONCE() when the device mtu is read.
Pair this lockless read with one WRITE_ONCE() in __dev_set_mtu(),
even if other code paths might write over this field.
Add a big comment in include/linux/netdevice.h about dev->mtu
needing READ_ONCE()/WRITE_ONCE() annotations.
Hopefully we will add the missing ones in followup patches.
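A sketch of the helper and its new call site (paraphrased, not the exact hunks):

    /* include/net/ip.h */
    static inline bool inetdev_valid_mtu(unsigned int mtu)
    {
            return likely(mtu >= IPV4_MIN_MTU);
    }

    /* ip_setup_cork(): refuse to build a cork around an absurdly small mtu */
    if (unlikely(!inetdev_valid_mtu(mtu)))
            return -ENETUNREACH;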
Fixes: 1470ddf7f8ce ("inet: Remove explicit write references to sk/inet in ip_append_data") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In attach_node_and_children(), memory is allocated for full_name via
kasprintf(). If the condition of the first if statement is not met, the
function returns early without freeing the memory. Add a kfree() to fix that.
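Roughly (sketch; the actual condition is abbreviated as a placeholder):

    static void attach_node_and_children(struct device_node *np)
    {
            char *full_name;

            full_name = kasprintf(GFP_KERNEL, "%pOF", np);

            if (!name_is_wanted(full_name)) {       /* illustrative condition */
                    kfree(full_name);               /* previously leaked on this path */
                    return;
            }

            /* ... rest of the function uses and later frees full_name ... */
    }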
This has been detected with kmemleak: Link: https://bugzilla.kernel.org/show_bug.cgi?id=205327
It looks like the leak was introduced by this commit: Fixes: 5babefb7f7ab ("of: unittest: allow base devicetree to have symbol metadata") Signed-off-by: Erhard Furtner <erhard_f@mailbox.org> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When using this driver on a Blizzard 1260, there were failures whenever DMA
transfers from the SCSI bus to memory of 65535 bytes were followed by a DMA
transfer of 1 byte. This caused the byte at offset 65535 to be overwritten
with 0xff. The Blizzard hardware can't handle single byte DMA transfers.
Besides this issue, limiting the DMA length to something that is not a
multiple of the page size is very inefficient on most file systems.
It seems this limit was chosen because the DMA transfer counter of the ESP
by default is 16 bits wide, thus limiting the length to 65535 bytes.
However, the value 0 means 65536 bytes, which is handled by the ESP and the
Blizzard just fine. It is also the default maximum used by esp_scsi when
drivers don't provide their own dma_length_limit() function.
The limit of 65536 bytes can be used by all boards except the Fastlane. The
old driver used a limit of 65532 bytes (0xfffc), which is reintroduced in
this patch.
Fixes: b7ded0e8b0d1 ("scsi: zorro_esp: Limit DMA transfers to 65535 bytes") Link: https://lore.kernel.org/r/20191112175523.23145-1-jongk@linux-m68k.org Signed-off-by: Kars de Jong <jongk@linux-m68k.org> Reviewed-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 5c089fd0c734 ("idr: Fix idr_get_next race with idr_remove")
neglected to fix idr_get_next_ul(). As far as I can tell, nobody's
actually using this interface under the RCU read lock, but fix it now
before anybody decides to use it.
Fixes: 5c089fd0c734 ("idr: Fix idr_get_next race with idr_remove") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The driver only supports 3-axis gyro and/or 3-axis accel.
For icm20602, temp data is mandatory for all configurations.
Fix all single- and double-axis configurations (almost never used) and, more
importantly, fix the 3-axis gyro and 6-axis accel+gyro buffer on icm20602 when
temp data is not enabled.
When a port sends PLOGI, the discovery state should be changed to login
pending, otherwise the RELOGIN_NEEDED bit is set in
qla24xx_handle_plogi_done_event(). RELOGIN_NEEDED triggers another PLOGI,
and it never gets out of the loop until the login timer expires.
Fixes: 8777e4314d397 ("scsi: qla2xxx: Migrate NVME N2N handling into state machine") Fixes: 8b5292bcfcacf ("scsi: qla2xxx: Fix Relogin to prevent modifying scan_state flag") Cc: Quinn Tran <qutran@marvell.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20191125165702.1013-6-r.bolshakov@yadro.com Acked-by: Himanshu Madhani <hmadhani@marvell.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Tested-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
With commit 6ce220dd2f8ea71d6afc29b9a7524c12e39f374a ("raid5: don't set
STRIPE_HANDLE to stripe which is in batch list"), we don't want to set the
STRIPE_HANDLE flag for an sh which is already in a batch list.
However, the stripe which is the head of a batch list should set this flag,
otherwise a panic could happen inside init_stripe at BUG_ON(sh->batch_head);
this is reproducible with raid5 on top of nvdimm devices, as Xiao observed.
Thanks to Xiao for the effort to verify the change.
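The resulting condition is roughly (sketch):

    /* handle batch heads (sh == sh->batch_head) and un-batched stripes,
     * but never stripes that merely sit in someone else's batch list
     */
    if (!sh->batch_head || sh == sh->batch_head)
            set_bit(STRIPE_HANDLE, &sh->state);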
Fixes: 6ce220dd2f8ea ("raid5: don't set STRIPE_HANDLE to stripe which is in batch list") Reported-by: Xiao Ni <xni@redhat.com> Tested-by: Xiao Ni <xni@redhat.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The Terra Pad 1061 has the usual micro-USB-B id-pin handler, but instead
of controlling the actual micro-USB-B it turns the 5V boost for the
tablet's USB-A connector and its keyboard-cover connector off.
The actual micro-USB-B connector on the tablet is wired for charging only,
and its id pin is *not* connected to the GPIO which is used for the
(broken) id-pin event handler in the DSDT.
While at it, not only add a comment explaining why the Terra Pad 1061 is on the
blacklist, but also fix the missing comment for the Minix Neo Z83-4 entry.
Fixes: 61f7f7c8f978 ("gpiolib: acpi: Add gpiolib_acpi_run_edge_events_on_boot option and blacklist") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
We used to skip reconnects on all SMB2_IOCTL commands due to SMB3+
FSCTL_VALIDATE_NEGOTIATE_INFO - which made sense since we're still
establishing an SMB session.
However, when refresh_cache_worker() calls smb2_get_dfs_refer() and
we're under reconnect, SMB2_ioctl() will not be able to get a proper
status error (e.g. -EHOSTDOWN in case we failed to reconnect) but an
-EAGAIN from cifs_send_recv() thus looping forever in
refresh_cache_worker().
Fixes: e99c63e4d86d ("SMB3: Fix deadlock in validate negotiate hits reconnect") Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Suggested-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
No changeset entries are created for #address-cells and #size-cells
properties, but the duplicated properties are never freed. This
results in a memory leak which is detected by kmemleak:
Commit 9287c6452d2b fixed a situation in which gfs2 could use a glock
after it had been freed. To do that, it temporarily added a new glock
reference by calling gfs2_glock_hold in function gfs2_add_revoke.
However, if the bd element was removed by gfs2_trans_remove_revoke, it
failed to drop the additional reference.
This patch adds logic to gfs2_trans_remove_revoke to properly drop the
additional glock reference.
The R-Car hardware manual strictly enforces the MACCTLR initialization value
in section 39.3.1 - "Initial Setting of PCI Express":
"Be sure to write the initial value (= H'80FF 0000) to MACCTLR before
enabling PCIETCTLR.CFINIT".
To avoid unexpected behavior and to match the SW initialization sequence
guidelines, this patch programs the MACCTLR with the correct value.
Note that the MACCTLR.SPCHG bit in the MACCTLR register description
reports that "Only writing 1 is valid and writing 0 is invalid" but this
"invalid" has to be interpreted as a write-ignore aka "ignored", not
"prohibited".
Reported-by: Eugeniu Rosca <erosca@de.adit-jv.com> Fixes: c25da4778803 ("PCI: rcar: Add Renesas R-Car PCIe driver") Fixes: be20bbcb0a8c ("PCI: rcar: Add the initialization of PCIe link in resume_noirq()") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: <stable@vger.kernel.org> # v5.2+ Signed-off-by: Sasha Levin <sashal@kernel.org>
The NETDEV_CHANGENAME code is not "unneeded" like it is stated in commit 4cb6560514fa ("leds: trigger: netdev: fix refcnt leak on interface
rename").
The event was accidentally misinterpreted as equivalent to
NETDEV_UNREGISTER, but it should be equivalent to NETDEV_REGISTER.
This was the case in the original code from the openwrt project.
Otherwise, you are unable to set netdev LED triggers for (not yet existing)
netdevices which still have to be renamed. This is the case, for example, for
ppp interfaces in openwrt.
Fixes: 06f502f57d0d ("leds: trigger: Introduce a NETDEV trigger") Fixes: 4cb6560514fa ("leds: trigger: netdev: fix refcnt leak on interface rename") Signed-off-by: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
I was investigating a crash in our Virtuozzo7 kernel which happened in
svcauth_unix_set_client. I found out that we access the m_client field
in the ip_map structure, which was received from sunrpc_cache_lookup (we
have a bit older kernel; now the code is in sunrpc_cache_add_entry), and
this field looks uninitialized (m_client == 0x74 doesn't look like a
pointer), but in the cache_head's flags we see 0x1, which is CACHE_VALID.
It looks like the problem appeared from our previous fix to sunrpc (1):
commit 4ecd55ea0742 ("sunrpc: fix cache_head leak due to queued
request")
And we've also found a patch already fixing our patch (2):
commit d58431eacb22 ("sunrpc: don't mark uninitialised items as VALID.")
Though the crash is eliminated, I think the core of the problem is not
completely fixed:
In patch (2), Neil makes the cache_head CACHE_NEGATIVE before the
cache_fresh_locked call which was added in (1) to fix the crash. This way
cache_is_valid won't say the cache is valid anymore, and in
svcauth_unix_set_client the function cache_check will return an error
instead of 0, so we don't count the entry as initialized.
But it looks like we need to remove cache_fresh_locked completely in
sunrpc_cache_lookup:
In (1) we only wanted cache_fresh_unlocked->cache_dequeue, so
that cache_requests with no readers also release the corresponding
cache_head, to fix their leak. Vasily and I were not sure whether
cache_fresh_locked and cache_fresh_unlocked should be used as a pair or
not, so we guessed and used them as a pair.
Now we see that we don't want the CACHE_VALID bit set here by
cache_fresh_locked, as "valid" means "initialized" and there is no
initialization in sunrpc_cache_add_entry. Both expiry_time and
last_refresh are unused in the cache_fresh_unlocked code path and are also
not required for the initial fix.
So to conclude, cache_fresh_locked was called by mistake, and we can just
safely remove it instead of propping it up with CACHE_NEGATIVE. That looks
conceptually cleaner to me. I hope I'm not missing something here.
Fixes: d58431eacb22 ("sunrpc: don't mark uninitialised items as VALID.") Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If device_register() fails, both put_device() and kfree() are called,
ending with a double free of the scmi_dev.
Calling kfree() is needed only when a failure happens between the
allocation of the scmi_dev and its registration, so move it there
and remove it from the error flow.
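The general pattern being restored (sketch with placeholder names):

    scmi_dev = kzalloc(sizeof(*scmi_dev), GFP_KERNEL);
    if (!scmi_dev)
            return NULL;

    /* ... fill in scmi_dev, set scmi_dev->dev.release ... */

    retval = device_register(&scmi_dev->dev);
    if (retval) {
            /* ownership has passed to the driver core: put_device() only,
             * the release callback performs the kfree()
             */
            put_device(&scmi_dev->dev);
            return NULL;
    }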
Fixes: 46edb8d1322c ("firmware: arm_scmi: provide the mandatory device release callback") Signed-off-by: Wen Yang <wenyang@linux.alibaba.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
After pskb_may_pull() we should always refetch the header
pointers from the skb->data in case it got reallocated.
In gre_parse_header(), the erspan header is still fetched
from the 'options' pointer which is fetched before
pskb_may_pull().
Found this during code review of a KMSAN bug report.
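A sketch of the idea (the concrete erspan field handling is omitted):

    if (!pskb_may_pull(skb, nhs + hdr_len + sizeof(*ershdr)))
            return -EINVAL;

    /* pskb_may_pull() may reallocate skb->head: re-derive the header
     * pointer from skb->data instead of reusing the stale 'options'
     * pointer computed before the pull.
     */
    ershdr = (struct erspan_base_hdr *)(skb->data + nhs + hdr_len);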
Fixes: cb73ee40b1b3 ("net: ip_gre: use erspan key field for tunnel lookup") Cc: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Acked-by: William Tu <u9012063@gmail.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 8962842ca5ab ("blk-mq: avoid sysfs buffer overflow with too many CPU cores")
avoids a sysfs buffer overflow and reserves one character for the line break.
However, the last snprintf() doesn't get the correct 'size' parameter passed
in, so fix it.
Fixes: 8962842ca5ab ("blk-mq: avoid sysfs buffer overflow with too many CPU cores") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Cc: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a secondary CPU is brought up it must initialize its control
registers. CPU A, which triggers the bringup of secondary CPU B, stores
its own control register contents into the lowcore of the new CPU B,
which then loads these values on startup.
This is problematic in various ways: the control register which
contains the home space ASCE will correctly contain the kernel ASCE;
however control registers for primary and secondary ASCEs are
initialized with whatever values were present in CPU A.
Typically:
- the primary ASCE will contain the user process ASCE of the process
that triggered onlining of CPU B.
- the secondary ASCE will contain the percpu VDSO ASCE of CPU A.
Due to lazy ASCE handling we may also end up with other combinations.
When CPU B then switches to a different process (!= idle) it will
fix up the primary ASCE. However the problem is that the (wrong) ASCE
from CPU A was loaded into control register 1: as soon as an ASCE is
attached (aka loaded) a CPU is free to generate TLB entries using that
address space.
Even though it is very unlikely that CPU B will actually generate such
entries, this could result in TLB entries of the address space of the
process that ran on CPU A. These entries shouldn't exist at all and
could cause problems later on.
Furthermore the secondary ASCE of CPU B will not be updated correctly.
This means that processes may see wrong results or even crash if they
access VDSO data on CPU B. The correct VDSO ASCE will eventually be
loaded on return to user space as soon as the kernel executed a call
to strnlen_user or an atomic futex operation on CPU B.
Fix both issues by initializing the to-be-loaded control register
contents with the correct ASCEs and also enforcing (re-)loading of the
ASCEs upon first context switch and return to user space.
Fixes: 0aaba41b58bc ("s390: remove all code using the access register mode") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Userspace falls short when trying to find out whether a specific memory
range is eligible for THP. There are use cases that would like to know
that
http://lkml.kernel.org/r/alpine.DEB.2.21.1809251248450.50347@chino.kir.corp.google.com
: This is used to identify heap mappings that should be able to fault thp
: but do not, and they normally point to a low-on-memory or fragmentation
: issue.
The only way to deduce this now is to query for the hg resp. nh flags and
confront the state with the global setting. Except that there is also
PR_SET_THP_DISABLE that might change the picture. So the final logic is
not trivial. Moreover, the eligibility of the vma depends on the type of
VMA as well. In the past we supported only anonymous memory VMAs,
but things have changed and shmem-based vmas are supported as well these
days, and the query logic gets even more complicated because the
eligibility depends on the mount option and another global configuration
knob.
Simplify the current state and report the THP eligibility in
/proc/<pid>/smaps for each existing vma. Reuse
transparent_hugepage_enabled for this purpose. The original
implementation of this function assumes that the caller knows that the vma
itself is supported for THP so make the core checks into
__transparent_hugepage_enabled and use it for existing callers.
__show_smap just uses the new transparent_hugepage_enabled, which also
checks the vma support status (please note that this one has to be out of
line due to include dependency issues).
[mhocko@kernel.org: fix oops with NULL ->f_mapping] Link: http://lkml.kernel.org/r/20181224185106.GC16738@dhcp22.suse.cz Link: http://lkml.kernel.org/r/20181211143641.3503-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Oppenheimer <bepvte@gmail.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The Rockchip PMIC driver can automatically detect connected component
versions by reading the ID_MSB and ID_LSB registers. The probe function
will always fail with RK818 PMICs because the ID_MSK is 0xFFF0 and the
RK818 template ID is 0x8181.
This patch changes this value to 0x8180.
Fixes: 9d6105e19f61 ("mfd: rk808: Fix up the chip id get failed") Cc: stable@vger.kernel.org Cc: Elaine Zhang <zhangqing@rock-chips.com> Cc: Joseph Chen <chenjh@rock-chips.com> Signed-off-by: Daniel Schultz <d.schultz@phytec.de> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
No need to wait for any commit once the page is fully truncated.
Besides, it may confuse e.g. a concurrent ext4_writepage() with the page
still being dirty (it will be cleared by truncate_pagecache() in
ext4_setattr()) but the buffers already freed, and then trigger a bug.
The bug will be triggered once we see the below order:
reproduce1                              reproduce2

...                                   | ...
truncate to 4k                        |
change to journal data mode          |
                                      | memcpy(set page dirty)
truncate to 0:                        |
ext4_setattr:                         |
...                                   |
ext4_wait_for_tail_page_commit        |
                                      | mbind(trigger bug)
truncate_pagecache(clean dirty)       | ...
...                                   |
mbind will call ext4_writepage() since the page is still dirty, and then
report the bug since the buffers have been freed. Fix it by returning
directly once offset equals 0, which means the page has been fully
truncated.
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: yangerkun <yangerkun@huawei.com> Link: https://lore.kernel.org/r/20190919063508.1045-1-yangerkun@huawei.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andreas Grünbacher reports that on the two filesystems that support
iomap directio, it's possible for splice() to return -EAGAIN (instead of
a short splice) if the pipe being written to has less space available in
its pipe buffers than the length supplied by the calling process.
Months ago we fixed splice_direct_to_actor to clamp the length of the
read request to the size of the splice pipe. Do the same to do_splice.
Fixes: 17614445576b6 ("splice: don't read more than available pipe space") Reported-by: syzbot+3c01db6025f26530cf8d@syzkaller.appspotmail.com Reported-by: Andreas Grünbacher <andreas.gruenbacher@gmail.com> Reviewed-by: Andreas Grünbacher <andreas.gruenbacher@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When setting the time in the future with the uie timer enabled,
rtc_timer_do_work will loop for a while because the expiration of the uie
timer was way before the current RTC time and a new timer will be enqueued
until the current rtc time is reached.
If the uie timer is enabled, disable it before setting the time and enable
it after expiring current timers (which may actually be an alarm).
This is the safest thing to do to ensure the uie timer is still
synchronized with the RTC, especially in the UIE emulation case.
Reported-by: syzbot+08116743f8ad6f9a6de7@syzkaller.appspotmail.com Fixes: 6610e0893b8b ("RTC: Rework RTC code to use timerqueue for events") Link: https://lore.kernel.org/r/20191020231320.8191-1-alexandre.belloni@bootlin.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The 'a0' member of 'struct arm_smccc_res' is declared as 'unsigned long',
however the Qualcomm SCM firmware interface driver expects to receive
negative error codes via this field, so ensure that it's cast to 'long'
before comparing to see if it is less than 0.
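For illustration (the error-translation helper name is an assumption, not quoted from the driver):

    struct arm_smccc_res res;

    arm_smccc_smc(fn_id, arg0, arg1, arg2, 0, 0, 0, 0, &res);

    /* res.a0 is unsigned long: cast before the sign test */
    if ((long)res.a0 < 0)
            return qcom_scm_remap_error(res.a0);    /* assumed helper name */

    return res.a0;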
If the file system is corrupted such that a file's i_links_count is
too small, then it's possible that when unlinking that file, i_nlink
will already be zero. Previously we were working around this kind of
corruption by forcing i_nlink to one; but we were doing this before
trying to delete the directory entry --- and if the file system is
corrupted enough that ext4_delete_entry() fails, then we exit with
i_nlink elevated, and this causes the orphan inode list handling to be
FUBAR'ed, such that when we unmount the file system, the orphan inode
list can get corrupted.
A better way to fix this is to simply skip trying to call drop_nlink()
if i_nlink is already zero, thus moving the check to the place where
it makes the most sense.
clock_getres in the vDSO library has to preserve the same behaviour
as posix_get_hrtimer_res().
In particular, posix_get_hrtimer_res() does:
sec = 0;
ns = hrtimer_resolution;
and hrtimer_resolution depends on the enablement of the high
resolution timers that can happen either at compile or at run time.
Fix the powerpc vdso implementation of clock_getres by keeping a copy of
hrtimer_resolution in the vdso data and using that directly.
Fixes: a7f290dad32e ("[PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel") Cc: stable@vger.kernel.org Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Acked-by: Shuah Khan <skhan@linuxfoundation.org>
[chleroy: changed CLOCK_REALTIME_RES to CLOCK_HRTIMER_RES] Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a55eca3a5e85233838c2349783bcb5164dae1d09.1575273217.git.christophe.leroy@c-s.fr Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit aea447141c7e ("powerpc: Disable -Wbuiltin-requires-header when
setjmp is used") disabled -Wbuiltin-requires-header because of a
warning about the setjmp and longjmp declarations.
r367387 in clang added another diagnostic around this, complaining
that there is no jmp_buf declaration.
In file included from ../arch/powerpc/xmon/xmon.c:47:
../arch/powerpc/include/asm/setjmp.h:10:13: error: declaration of
built-in function 'setjmp' requires the declaration of the 'jmp_buf'
type, commonly provided in the header <setjmp.h>.
[-Werror,-Wincomplete-setjmp-declaration]
extern long setjmp(long *);
^
../arch/powerpc/include/asm/setjmp.h:11:13: error: declaration of
built-in function 'longjmp' requires the declaration of the 'jmp_buf'
type, commonly provided in the header <setjmp.h>.
[-Werror,-Wincomplete-setjmp-declaration]
extern void longjmp(long *, long);
^
2 errors generated.
We are not using the standard library's longjmp/setjmp implementations
for obvious reasons; make this clear to clang by using -ffreestanding
on these files.
On SMP platforms, when continuously running wifi up/down, the napi
poll can be scheduled during chip reset, and it will call
ath10k_pci_has_fw_crashed() to check the fw status. But during the reset
period, the value read from the FW_INDICATOR_ADDRESS register will be
0xdeadbeef, which is also treated as a fw crash. Fix the issue by
moving the chip reset to after napi is disabled.
If the system has other devices being registered in the component
framework, the compare function will be called with a device that
doesn't belong to vimc.
This device is not necessarily a platform_device, nor does it necessarily
have platform_data (which causes a NULL pointer dereference error), and if it
does have a pdata, it is not necessarily of type struct vimc_platform_data.
So casting to any of these types is wrong.
Instead of expecting a given pdev with a given pdata, just match on
the device itself. vimc-core is the one that creates them, so we know in
advance exactly which object to expect in the match.
Fixes: 4a29b7090749 ("[media] vimc: Subdevices as modules") Signed-off-by: Helen Koike <helen.koike@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Tested-by: Boris Brezillon <boris.brezillon@collabora.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The accumulator sample register is a signed 32-bit wide register on
droid 4, and only the earlier version of cpcap has a signed 24-bit
wide register. We're currently passing it around as unsigned, so
let's fix that and use sign_extend32() for the earlier revision.
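A sketch of the conversion (the revision check is illustrative, not the driver's actual test):

    static s32 cpcap_cc_sample_to_s32(u32 raw, bool has_24bit_accumulator)
    {
            if (has_24bit_accumulator)
                    return sign_extend32(raw, 23);  /* older cpcap: signed 24-bit */

            return (s32)raw;                        /* droid 4: full signed 32-bit */
    }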
Signed-off-by: Tony Lindgren <tony@atomide.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The MC4_MISC thresholding quirk needs to be applied during S5 -> S0 and
S3 -> S0 state transitions, which follow different code paths. Carve it
out into a separate function and call it mce_amd_feature_init() where
the two code paths of the state transitions converge.
[ bp: massage commit message and the carved out function. ]
The SAS controller cannot support a programmed minimum linkrate of > 1.5G
(it will always negotiate to 1.5G at least), so just reject it.
This solves a strange situation where the PHY negotiated linkrate may be
less than the programmed minimum linkrate.
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Send the primitive NOTIFY in the SSP situation only, or it causes an underflow
issue when sending IO. Also rename hisi_sas_hw.sl_notify() to
hisi_sas_hw.sl_notify_ssp().
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In hnae3_register_ae_dev(), ae_algo->ops is assigned to ae_dev->ops
before checking that ae_algo->ops is valid.
And in hnae3_register_ae_algo(), the check for ae_algo->ops is missing.
This patch fixes both.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
hnae3_register_ae_dev() may fail, and it should return an error code
to its caller, so change hnae3_register_ae_dev()'s return type to int.
Also, when hnae3_register_ae_dev() returns an error, hns3_probe() should
do some error handling and return the error code.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
When unloading the hns3 driver, we should clear the PCI private data.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
smc_cdc_get_free_slot() might wait for free transfer buffers when using
SMC-R. This wait should not be done under the send_lock, which is a
spin_lock. This fixes a cpu loop in parallel threads waiting for the
send_lock.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
To ensure parent qdiscs have the same notion of the number of enqueued
packets even after splitting a GSO packet, update the qdisc tree with the
number of packets that was added due to the split.
Reported-by: Pete Heist <pete@heistp.net> Tested-by: Pete Heist <pete@heistp.net> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Clang warns when an implicit conversion is done between enumerated
types:
drivers/block/drbd/drbd_state.c:708:8: warning: implicit conversion from
enumeration type 'enum drbd_ret_code' to different enumeration type
'enum drbd_state_rv' [-Wenum-conversion]
rv = ERR_INTR;
~ ^~~~~~~~
drbd_request_detach_interruptible's only call site is in the return
statement of adm_detach, which returns an int. Change the return type of
drbd_request_detach_interruptible to match, silencing Clang's warning.
Reported-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
The driver missed classifying the chip type for G7 when reporting supported
topologies. This resulted in loop topology being shown as supported on FC
links where it is not supported per the standard.
Add the chip classifications to the topology checks in the driver.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Driver is setting bits in word 10 of the SLI4 ABORT WQE (the wqid). The
field was a carry over from a prior SLI revision. The field does not exist
in SLI4, and the action may result in an overlap with future definition of
the WQE.
Remove the setting of WQID in the ABORT WQE.
Also cleaned up WQE field settings - initialize to zero, don't bother to
set fields to zero.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Depending on the chipset, the number of NPIV vports may vary and be in
excess of what most switches support (256). To avoid confusion with the
users, limit the reported NPIV vports to 256.
Additionally correct the 16G adapter which is reporting a bogus NPIV vport
number if the link is down.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
With a wl1251 child node of mmc3 in the device tree, decoded
in omap_hsmmc.c to handle special wl1251 initialization, we
no longer need to instantiate mmc3 through pdata quirks.
We can also remove the wlan regulator and reset/interrupt definitions
and do them through the device tree.
Fixes: 81eef6ca9201 ("mmc: omap_hsmmc: Use dma_request_chan() for requesting DMA channel") Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Cc: <stable@vger.kernel.org> # v4.7+ Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
spin_unlock_irqrestore() might be called with stale flags after
reading the port status, possibly restoring interrupts to an incorrect
state.
If a usb2 port just finished resuming while the port status is read,
the spin lock will be temporarily released and re-acquired in a separate
function. The flags parameter is passed by value instead of as a pointer,
so the flags are not updated properly before the final
spin_unlock_irqrestore() is called.
When the GPSC/GPDB switch command fails, the driver just returns without doing
proper cleanup. This patch fixes this memory leak by calling sp->free() in
the error path.
Link: https://lore.kernel.org/r/20191105150657.8092-4-hmadhani@marvell.com Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
This patch updates log message which indicates number of vectors used by
the driver instead of displaying failure to get maximum requested
vectors. Driver will always request maximum vectors during
initialization. In the event driver is not able to get maximum requested
vectors, it will adjust the allocated vectors. This is normal and does not
imply failure in driver.
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Link: https://lore.kernel.org/r/20190830222402.23688-2-hmadhani@marvell.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Set the r??_data_len variables before using these instead of after.
This patch fixes the following Coverity complaint:
const: At condition req_data_len != rsp_data_len, the value of req_data_len
must be equal to 0.
const: At condition req_data_len != rsp_data_len, the value of rsp_data_len
must be equal to 0.
dead_error_condition: The condition req_data_len != rsp_data_len cannot be
true.
Cc: Himanshu Madhani <hmadhani@marvell.com> Fixes: a9b6f722f62d ("[SCSI] qla2xxx: Implementation of bidirectional.") # v3.7. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Himanshu Madhani <hmadhani@marvell.com> Reviewed-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
With a debug kernel we see the following warnings indicating a memory leak.
[28809.523959] WARNING: CPU: 3 PID: 6790 at lib/dma-debug.c:978
dma_debug_device_change+0x166/0x1d0
[28809.523964] pci 0000:0c:00.6: DMA-API: device driver has pending DMA
allocations while released from device [count=5]
[28809.523964] One of leaked entries details: [device
address=0x00000002aefe4000] [size=8208 bytes] [mapped with DMA_BIDIRECTIONAL]
[mapped as coherent]
Fix this by unmapping DMA memory.
Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
While v2.6.26 commit b75db73159cc ("[SCSI] zfcp: Add qtcb dump to hba debug
trace") is right that we don't want to flood the (payload) trace ring
buffer, we don't trace successful FCP command responses by default. So we
can include the channel log for problem determination with failed responses
of any FSF request type.
Fixes: b75db73159cc ("[SCSI] zfcp: Add qtcb dump to hba debug trace") Fixes: a54ca0f62f95 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") Cc: <stable@vger.kernel.org> #2.6.38+ Link: https://lore.kernel.org/r/e37597b5c4ae123aaa85fd86c23a9f71e994e4a9.1572018132.git.bblock@linux.ibm.com Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
There are two cases of discard merge:
- one is the normal discard merge, just like a normal read/write request,
call it single-range discard
- another is the multi-range discard, where queue_max_discard_segments(rq->q) > 1
For the former case, queue_max_discard_segments(rq->q) is 1, and we
should handle this kind of discard merge like the normal read/write
request.
This patch fixes the following kernel panic issue[1], which is caused by
not removing the single-range discard request from elevator queue.
Guangwu has one raid discard test case, in which this issue is a bit
easier to trigger, and I verified that this patch can fix the kernel
panic issue in Guangwu's test case.
Since commit d0a5b995a308 (vfs: Add IOP_XATTR inode operations flag)
extended attributes haven't worked on the root directory in reiserfs.
This is due to reiserfs conditionally setting the sb->s_xattrs handler
array depending on whether it located or create the internal privroot
directory. It necessarily does this after the root inode is already
read in. The IOP_XATTR flag is set during inode initialization, so
it never gets set on the root directory.
This commit unconditionally assigns sb->s_xattrs and clears IOP_XATTR on
internal inodes. The old return values due to the conditional assignment
are handled via open_xa_root, which now returns EOPNOTSUPP as the VFS
would have done.
Link: https://lore.kernel.org/r/20191024143127.17509-1-jeffm@suse.com CC: stable@vger.kernel.org Fixes: d0a5b995a308 ("vfs: Add IOP_XATTR inode operations flag") Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The estimate of the number of credits needed for the final freeing of an inode
in ext4_evict_inode() was too small. We may modify 4 blocks (inode & sb for
orphan deletion, bitmap & group descriptor for inode freeing) and not
just 3.
There is a race window where the quota is redirtied once we drop dq_list_lock inside dqput(),
but before we grab dquot->dq_lock inside dquot_release():
TASK1                                           TASK2 (chowner)
->dqput()
  we_slept:
    spin_lock(&dq_list_lock)
    if (dquot_dirty(dquot)) {
          spin_unlock(&dq_list_lock);
          dquot->dq_sb->dq_op->write_dquot(dquot);
          goto we_slept
    if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) {
          spin_unlock(&dq_list_lock);
          dquot->dq_sb->dq_op->release_dquot(dquot);
                                                dqget()
                                                mark_dquot_dirty()
                                                dqput()
          goto we_slept;
    }
So the dirty dquot will be released by TASK1, but on the next we_slept loop
we detect this and call ->write_dquot() for it.
XFSTEST: https://github.com/dmonakhov/xfstests/commit/440a80d4cbb39e9234df4d7240aee1d551c36107
The PCI INTx interrupts and other LSI interrupts are handled differently
under a sPAPR platform. When the interrupt source characteristics are
queried, the hypervisor returns an H_INT_ESB flag to inform the OS
that it should be using the H_INT_ESB hcall for interrupt management
and not loads and stores on the interrupt ESB pages.
A default -1 value is returned for the addresses of the ESB pages. The
driver ignores this condition today and performs a bogus IO mapping.
Recent changes and the DEBUG_VM configuration option make the bug
visible with :
When the machine crash handler is invoked, all interrupts are masked
but interrupts which have not been started yet do not have an ESB page
mapped in the Linux address space. This crashes the 'crash kexec'
sequence on sPAPR guests.
To fix, force the mapping of the ESB page when an interrupt is being
mapped in the Linux IRQ number space. This is done by setting the
initial state of the interrupt to OFF which is not necessarily the
case on PowerNV.
Fixes: 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE interrupt controller") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191031063100.3864-1-clg@kaod.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When calling __kernel_sync_dicache with a size >4GB, we were masking
off the upper 32 bits, so we would incorrectly flush a range smaller
than intended.
This patch replaces the 32 bit shifts with 64 bit ones, so that
the full size is accounted for.
The MMC card detection GPIO polarity is active low on TAO3530, like in many
other similar boards. Now the card is not detected, and the system is unable
to mount rootfs from an SD card.
Fix this by using the correct polarity.
This incorrect polarity was defined already in the commit 30d95c6d7092
("ARM: dts: omap3: Add Technexion TAO3530 SOM omap3-tao3530.dtsi") in v3.18
kernel and later changed to use defined GPIO constants in v4.4 kernel by
the commit 3a637e008e54 ("ARM: dts: Use defined GPIO constants in flags
cell for OMAP2+ boards").
While the latter commit did not introduce the issue I'm marking it with
Fixes tag due the v4.4 kernels still being maintained.
Fixes: 3a637e008e54 ("ARM: dts: Use defined GPIO constants in flags cell for OMAP2+ boards") Cc: linux-stable <stable@vger.kernel.org> # 4.4+ Signed-off-by: Jarkko Nikula <jarkko.nikula@bitmer.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pandora_wl1251_init_card was used to do special pdata-based
setup of the sdio mmc interface. This no longer works with
v4.7 and later. A fix requires a device tree based mmc3 setup.
Therefore we move the special setup to omap_hsmmc.c instead
of calling some pdata supplied init_card function.
The new code checks for a DT child node compatible to wl1251
so it will not affect other MMC3 use cases.
Generally, this code was and still is a hack and should be
moved to mmc core to e.g. read such properties from optional
DT child nodes.
Fixes: 81eef6ca9201 ("mmc: omap_hsmmc: Use dma_request_chan() for requesting DMA channel") Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Cc: <stable@vger.kernel.org> # v4.7+
[Ulf: Fixed up some checkpatch complaints] Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In s3c64xx_eint_eint0_init() the for_each_child_of_node() loop is used
with a break to find a matching child node. Although each iteration of
for_each_child_of_node() puts the previous node, an early exit from the loop
misses the final put. This leads to a leak of the device node.
Several functions use a for_each_child_of_node() loop with a break to find
a matching child node. Although each iteration of
for_each_child_of_node() puts the previous node, an early exit from the loop
misses the final put. This leads to a leak of the device node.
In s3c24xx_eint_init() the for_each_child_of_node() loop is used with a
break to find a matching child node. Although each iteration of
for_each_child_of_node() puts the previous node, an early exit from the loop
misses the final put. This leads to a leak of the device node.
In exynos_eint_wkup_init() the for_each_child_of_node() loop is used
with a break to find a matching child node. Although each iteration of
for_each_child_of_node() puts the previous node, an early exit from the loop
misses the final put. This leads to a leak of the device node.
Cc: <stable@vger.kernel.org> Fixes: 43b169db1841 ("pinctrl: add exynos4210 specific extensions for samsung pinctrl driver") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Each iteration of for_each_child_of_node puts the previous node, but in
the case of a return from the middle of the loop, there is no put, thus
causing a memory leak. Hence add an of_node_put() before the return in the
exynos_eint_wkup_init() error path.
Issue found with Coccinelle.
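The general shape of such a fix (sketch; the callee name is illustrative):

    for_each_child_of_node(np, child) {
            ret = setup_wakeup_eint(child);         /* illustrative callee */
            if (ret) {
                    /* the iterator still holds a reference on 'child' */
                    of_node_put(child);
                    return ret;
            }
    }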
As explained in commit a9a1a4833613 ("pinctrl:
armada-37xx: Fix gpio interrupt setup"), the armada_37xx_irq_set_type()
function can be called before the initialization of the mask field.
That means that we can't use this field in this function and need to
work around it using hwirq.
Fixes: 30ac0d3b0702 ("pinctrl: armada-37xx: Add edge both type gpio irq support") Cc: stable@vger.kernel.org Reported-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com> Link: https://lore.kernel.org/r/20191115155752.2562-1-gregory.clement@bootlin.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Certain ACPI-enumerated devices represented as platform devices in
Linux, like fans, require special low-level power management handling
implemented by their drivers that is not in agreement with the ACPI
PM domain behavior. That leads to problems with managing ACPI fans
during system-wide suspend and resume.
For this reason, make acpi_dev_pm_attach() skip the affected devices
by adding a list of device IDs to avoid to it and putting the IDs of
the affected devices into that list.
Fixes: e5cc8ef31267 (ACPI / PM: Provide ACPI PM callback routines for subsystems) Reported-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com> Cc: 3.10+ <stable@vger.kernel.org> # 3.10+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In i2c_acpi_remove_space_handler(), a leak occurs whenever the
"data" parameter is initialized to 0 before being passed to
acpi_bus_get_private_data().
This is because the NULL pointer check in acpi_bus_get_private_data()
(the 'if (!*data)' check) returns EINVAL and, in consequence, memory is
never freed in i2c_acpi_remove_space_handler().
Fix the NULL pointer check in acpi_bus_get_private_data() to follow
the analogous check in acpi_get_data_full().
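The corrected check, roughly (sketch):

    acpi_status acpi_bus_get_private_data(acpi_handle handle, void **data)
    {
            acpi_status status;

            if (!data)      /* was: if (!*data), rejecting callers that pre-zero *data */
                    return AE_BAD_PARAMETER;

            status = acpi_get_data(handle, acpi_bus_private_data_handler, data);
            if (ACPI_FAILURE(status) || !*data)
                    return AE_NOT_FOUND;

            return AE_OK;
    }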
Signed-off-by: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com>
[ rjw: Subject & changelog ] Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>