Before commit 778be232a207 ("NFS do not find client in NFSv4
pg_authenticate"), the Linux callback server replied with
RPC_AUTH_ERROR / RPC_AUTH_BADCRED, instead of dropping the CB
request. Let's restore that behavior so the server has a chance to
do something useful about it, and provide a warning that helps
admins correct the problem.
Fixes: 778be232a207 ("NFS do not find client in NFSv4 ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If an RPC program does not set vs_dispatch and pc_func() returns
rpc_drop_reply, the server sends a reply anyway containing a single
word containing the value RPC_DROP_REPLY (in network byte-order, of
course). This is a nonsense RPC message.
Fixes: 9e701c610923 ("svcrpc: simpler request dropping") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Variable "now" seems to be genuinely used unintialized
if branch
if (CPUCLOCK_PERTHREAD(timer->it_clock)) {
is not taken and branch
if (unlikely(sighand == NULL)) {
is taken. In this case the process has been reaped and the timer is marked as
disarmed anyway. So none of the postprocessing of the sample is
required. Return right away.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Link: http://lkml.kernel.org/r/20160707223911.GA26483@p183.telecom.by Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The EFI firmware on Macs contains a full-fledged network stack for
downloading OS X images from osrecovery.apple.com. Unfortunately
on Macs introduced 2011 and 2012, EFI brings up the Broadcom 4331
wireless card on every boot and leaves it enabled even after
ExitBootServices has been called. The card continues to assert its IRQ
line, causing spurious interrupts if the IRQ is shared. It also corrupts
memory by DMAing received packets, allowing for remote code execution
over the air. This only stops when a driver is loaded for the wireless
card, which may be never if the driver is not installed or blacklisted.
The issue seems to be constrained to the Broadcom 4331. Chris Milsted
has verified that the newer Broadcom 4360 built into the MacBookPro11,3
(2013/2014) does not exhibit this behaviour. The chances that Apple will
ever supply a firmware fix for the older machines appear to be zero.
The solution is to reset the card on boot by writing to a reset bit in
its mmio space. This must be done as an early quirk and not as a plain
vanilla PCI quirk to successfully combat memory corruption by DMAed
packets: Matthew Garrett found out in 2012 that the packets are written
to EfiBootServicesData memory (http://mjg59.dreamwidth.org/11235.html).
This type of memory is made available to the page allocator by
efi_free_boot_services(). Plain vanilla PCI quirks run much later, in
subsys initcall level. In-between a time window would be open for memory
corruption. Random crashes occurring in this time window and attributed
to DMAed packets have indeed been observed in the wild by Chris
Bainbridge.
When Matthew Garrett analyzed the memory corruption issue in 2012, he
sought to fix it with a grub quirk which transitions the card to D3hot:
http://git.savannah.gnu.org/cgit/grub.git/commit/?id=9d34bb85da56
This approach does not help users with other bootloaders and while it
may prevent DMAed packets, it does not cure the spurious interrupts
emanating from the card. Unfortunately the card's mmio space is
inaccessible in D3hot, so to reset it, we have to undo the effect of
Matthew's grub patch and transition the card back to D0.
Note that the quirk takes a few shortcuts to reduce the amount of code:
The size of BAR 0 and the location of the PM capability is identical
on all affected machines and therefore hardcoded. Only the address of
BAR 0 differs between models. Also, it is assumed that the BCMA core
currently mapped is the 802.11 core. The EFI driver seems to always take
care of this.
Michael Büsch, Bjorn Helgaas and Matt Fleming contributed feedback
towards finding the best solution to this problem.
The following should be a comprehensive list of affected models:
iMac13,1 2012 21.5" [Root Port 00:1c.3 = 8086:1e16]
iMac13,2 2012 27" [Root Port 00:1c.3 = 8086:1e16]
Macmini5,1 2011 i5 2.3 GHz [Root Port 00:1c.1 = 8086:1c12]
Macmini5,2 2011 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1c12]
Macmini5,3 2011 i7 2.0 GHz [Root Port 00:1c.1 = 8086:1c12]
Macmini6,1 2012 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1e12]
Macmini6,2 2012 i7 2.3 GHz [Root Port 00:1c.1 = 8086:1e12]
MacBookPro8,1 2011 13" [Root Port 00:1c.1 = 8086:1c12]
MacBookPro8,2 2011 15" [Root Port 00:1c.1 = 8086:1c12]
MacBookPro8,3 2011 17" [Root Port 00:1c.1 = 8086:1c12]
MacBookPro9,1 2012 15" [Root Port 00:1c.1 = 8086:1e12]
MacBookPro9,2 2012 13" [Root Port 00:1c.1 = 8086:1e12]
MacBookPro10,1 2012 15" [Root Port 00:1c.1 = 8086:1e12]
MacBookPro10,2 2012 13" [Root Port 00:1c.1 = 8086:1e12]
For posterity, spurious interrupts caused by the Broadcom 4331 wireless
card resulted in splats like this (stacktrace omitted):
We used to scan secondary buses until the following commit that
was applied in 2009:
8659c406ade3 ("x86: only scan the root bus in early PCI quirks")
which commit constrained early quirks to the root bus only. Its
motivation was to prevent application of the nvidia_bugs quirk
on secondary buses.
We're about to add a quirk to reset the Broadcom 4331 wireless card on
2011/2012 Macs, which is located on a secondary bus behind a PCIe root
port. To facilitate that, reintroduce scanning of secondary buses.
The commit message of 8659c406ade3 notes that scanning only the root bus
"saves quite some unnecessary scanning work". The algorithm used prior
to 8659c406ade3 was particularly time consuming because it scanned
buses 0 to 31 brute force. To avoid lengthening boot time, employ a
recursive strategy which only scans buses that are actually reachable
from the root bus.
Yinghai Lu pointed out that the secondary bus number read from a
bridge's config space may be invalid, in particular a value of 0 would
cause an infinite loop. The PCI core goes beyond that and recurses to a
child bus only if its bus number is greater than the parent bus number
(see pci_scan_bridge()). Since the root bus is numbered 0, this implies
that secondary buses may not be 0. Do the same on early scanning.
If this algorithm is found to significantly impact boot time or cause
infinite loops on broken hardware, it would be possible to limit its
recursion depth: The Broadcom 4331 quirk applies at depth 1, all others
at depth 0, so the bus need not be scanned deeper than that for now. An
alternative approach would be to revert to scanning only the root bus,
and apply the Broadcom 4331 quirk to the root ports 8086:1c12, 8086:1e12
and 8086:1e16. Apple always positioned the card behind either of these
three ports. The quirk would then check presence of the card in slot 0
below the root port and do its deed.
Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-pci@vger.kernel.org Link: http://lkml.kernel.org/r/f0daa70dac1a9b2483abdb31887173eb6ab77bdf.1465690253.git.lukas@wunner.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8659c406ade3 ("x86: only scan the root bus in early PCI quirks")
... early quirks are only applied to devices on the root bus.
The motivation was to prevent application of the nvidia_bugs quirk on
secondary buses.
We're about to reintroduce scanning of secondary buses for a quirk to
reset the Broadcom 4331 wireless card on 2011/2012 Macs. To prevent
regressions, open code the requirement to apply nvidia_bugs only on the
root bus.
Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/4d5477c1d76b2f0387a780f2142bbcdd9fee869b.1465690253.git.lukas@wunner.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Implement memory barriers according to Documentation/circular-buffers.txt:
- use smp_store_release() to update ringbuffer read/write pointers
- use smp_load_acquire() to load write pointer on reader side
- use ACCESS_ONCE() to load read pointer on writer side
This fixes data stream corruptions observed e.g. on an ARM Cortex-A9
quad core system with different types (PCI, USB) of DVB tuners.
The reset value of weekday is 0x1. This is wrong since
the reset values of the day/month/year make up to Jan 1 2001.
When computed weekday comes out to be Monday. On a scale
of 1-7(Sunday - Saturday) it should be 0x2. So we should not
be relying on the reset value.
Hence compute the wday using the current date/month/year values.
Check if reset wday is any different from the computed wday,
If different then set the wday which we computed using
date/month/year values.
Matt reported that we have a NULL pointer dereference
in ppp_pernet() from ppp_connect_channel(),
i.e. pch->chan_net is NULL.
This is due to that a parallel ppp_unregister_channel()
could happen while we are in ppp_connect_channel(), during
which pch->chan_net set to NULL. Since we need a reference
to net per channel, it makes sense to sync the refcnt
with the life time of the channel, therefore we should
release this reference when we destroy it.
Fixes: 1f461dcdd296 ("ppp: take reference on channels netns") Reported-by: Matt Bennett <Matt.Bennett@alliedtelesis.co.nz> Cc: Paul Mackerras <paulus@samba.org> Cc: linux-ppp@vger.kernel.org Cc: Guillaume Nault <g.nault@alphalink.fr> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Commit aebea2ba0f74 ("net: mvneta: fix Tx interrupt delay") intended to
set coalescing threshold to a value guaranteeing interrupt generation
per each sent packet, so that buffers can be released with no delay.
In fact setting threshold to '1' was wrong, because it causes interrupt
every two packets. According to the documentation a reason behind it is
following - interrupt occurs once sent buffers counter reaches a value,
which is higher than one specified in MVNETA_TXQ_SIZE_REG(q). This
behavior was confirmed during tests. Also when testing the SoC working
as a NAS device, better performance was observed with int-per-packet,
as it strongly depends on the fact that all transmitted packets are
released immediately.
This commit enables NETA controller work in interrupt per sent packet mode
by setting coalescing threshold to 0.
Signed-off-by: Dmitri Epshtein <dima@marvell.com> Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Fixes aebea2ba0f74 ("net: mvneta: fix Tx interrupt delay") Acked-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The chmap ctls assigned to PCM streams are freed in the PCM disconnect
callback. However, since the disconnect callback isn't called when
the card gets freed before registering, the chmap ctls may still be
left assigned. They are eventually freed together with other ctls,
but it may cause an Oops at pcm_chmap_ctl_private_free(), as the
function refers to the assigned PCM stream, while the PCM objects have
been already freed beforehand.
The fix is to free the chmap ctls also at PCM free callback, not only
at PCM disconnect.
Reported-by: Laxminath Kasam <b_lkasam@codeaurora.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
snd_ctl_remove() has a notification for the removal event. It's
superfluous when done during the device got disconnected. Although
the notification itself is mostly harmless, it may potentially be
harmful, and should be suppressed. Actually some components PCM may
free ctl elements during the disconnect or free callbacks, thus it's
no theoretical issue.
This patch adds the check of card->shutdown flag for avoiding
unnecessary notifications after (or during) the disconnect.
Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
DRM_CONNECTOR_POLL_CONNECT only enables polling for connections, not
disconnections. Because of this, we end up losing hotplug polling for
analog connectors once they get connected.
Easy way to reproduce:
- Grab a machine with a radeon GPU and a VGA port
- Plug a monitor into the VGA port, wait for it to update the connector
from disconnected to connected
- Disconnect the monitor on VGA, a hotplug event is never sent for the
removal of the connector.
Originally, only using DRM_CONNECTOR_POLL_CONNECT might have been a good
idea since doing VGA polling can sometimes result in having to mess with
the DAC voltages to figure out whether or not there's actually something
there since VGA doesn't have HPD. Doing this would have the potential of
showing visible artifacts on the screen every time we ran a poll while a
VGA display was connected. Luckily, radeon_vga_detect() only resorts to
this sort of polling if the poll is forced, and DRM's polling helper
doesn't force it's polls.
Additionally, this removes some assignments to connector->polled that
weren't actually doing anything.
Signed-off-by: Lyude <cpaul@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ATPX dGPU power control requires a 200ms delay between
power off and on. This should fix dGPU failures on
resume from power off.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Speedy join only works when the received packet is either broadcast or an
4addr unicast packet. Thus packets converted from broadcast to unicast via
the gateway handling code have to be converted to 4addr packets to allow
the receiving gateway server to add the sender address as temporary entry
to the translation table.
Not doing it will make the batman-adv gateway server drop the DHCP response
in many situations because it doesn't yet have the TT entry for the
destination of the DHCP response.
Fixes: 371351731e9c ("batman-adv: change interface_rx to get orig node") Signed-off-by: Sven Eckelmann <sven@narfation.org> Acked-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When the core THP code is modifying the permissions of a huge page it
calls pmd_modify(), which unfortunately was clearing the _PAGE_HUGE bit
of the page table entry. The result can be kernel messages like:
This fixes a pretty ancient bug that hasn't manifested itself
until now.
The scratchbuf for command queue is allocated only for 32 slots
but is accessed with the queue write pointer - which can be
up to 256.
Since the scratch buf size was 16 and there are up to 256 TFDs
we never passed a page boundary when accessing the scratch buffer,
but when attempting to increase the size of the scratch buffer a
panic was quick to follow when trying to access the address resulted
in a page boundary.
Signed-off-by: Sara Sharon <sara.sharon@intel.com> Fixes: 38c0f334b359 ("iwlwifi: use coherent DMA memory for command header") Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
[bwh: Backported to 3.2: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If s_reserved_gdt_blocks is extremely large, it's possible for
ext4_init_block_bitmap(), which is called when ext4 sets up an
uninitialized block bitmap, to corrupt random kernel memory. Add the
same checks which e2fsck has --- it must never be larger than
blocksize / sizeof(__u32) --- and then add a backup check in
ext4_init_block_bitmap() in case the superblock gets modified after
the file system is mounted.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.16:
- Use EIO instead of EFSCORRUPTED
- Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The orig_ifinfo reference counter for last_bonding_candidate in
batadv_orig_node has to be reduced when an originator node is released.
Otherwise the orig_ifinfo is leaked and the reference counter the netdevice
is not reduced correctly.
Fixes: f3b3d9018975 ("batman-adv: add bonding again") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
[bwh: Backported to 3.16:
- s/_put/_free_ref/
- Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The replacement of last_bonding_candidate in batadv_orig_node has to be an
atomic operation. Otherwise it is possible that the reference counter of a
batadv_orig_ifinfo is reduced which was no longer the
last_bonding_candidate when the new candidate is added. This can either
lead to an invalid memory access or to reference leaks which make it
impossible to an interface which was added to batman-adv.
Fixes: f3b3d9018975 ("batman-adv: add bonding again") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
[bwh: Backported to 3.16:
- s/kref_get/atomic_inc/
- s/_put/_free_ref/] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The pointer batadv_bla_claim::backbone_gw can be changed at any time.
Therefore, access to it must be protected to ensure that two function
accessing the same backbone_gw are actually accessing the same. This is
especially important when the crc_lock is used or when the backbone_gw of a
claim is exchanged.
Not doing so leads to invalid memory access and/or reference leaks.
Fixes: 23721387c409 ("batman-adv: add basic bridge loop avoidance code") Fixes: 5a1dd8a4773d ("batman-adv: lock crc access in bridge loop avoidance") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
[bwh: Backported to 3.16:
- s/kref_get/atomic_inc/
- s/_put/_free_ref/
- Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
We have found some networks in which nodes were constantly requesting
other nodes BLA claim tables to synchronize, just to ask for that again
once completed. The reason was that the crc checksum of the asked nodes
were out of sync due to missing locking and multiple writes to the same
crc checksum when adding/removing entries. Therefore the asked nodes
constantly reported the wrong crc, which caused repeating requests.
To avoid multiple functions changing a backbone gateways crc entry at
the same time, lock it using a spinlock.
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Tested-by: Alfons Name <AlfonsName@web.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
batadv_orig_node_new uses batadv_orig_node_vlan_new to allocate a new
batadv_orig_node_vlan and add it to batadv_orig_node::vlan_list. References
to this list have also to be cleaned when the batadv_orig_node is removed.
Fixes: 7ea7b4a14275 ("batman-adv: make the TT CRC logic VLAN specific") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
[bwh: Backported to 3.16:
- vlan_list is a list not an hlist
- s/_put/_free_ref/] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
vlan_insert_tag can return NULL on errors. The distributed arp table code
therefore has to check the return value of vlan_insert_tag for NULL before
it can safely operate on this pointer.
Fixes: be1db4f6615b ("batman-adv: make the Distributed ARP Table vlan aware") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
vlan_insert_tag can return NULL on errors. The bridge loop avoidance code
therefore has to check the return value of vlan_insert_tag for NULL before
it can safely operate on this pointer.
Fixes: 23721387c409 ("batman-adv: add basic bridge loop avoidance code") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If ext4_fill_super() fails early, it's possible for ext4_evict_inode()
to call ext4_should_journal_data() before superblock options and flags
are fully set up. In that case, the iput() on the journal inode can
end up causing a BUG().
Work around this problem by reordering the tests so we only call
ext4_should_journal_data() after we know it's not the journal inode.
Fixes: 2d859db3e4 ("ext4: fix data corruption in inodes with journalled data") Fixes: 2b405bfa84 ("ext4: fix data=journal fast mount/umount hang") Cc: Jan Kara <jack@suse.cz> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
deadlock in ext4_writepages() which was previously much harder to hit.
After this commit xfstest generic/130 reproduces the deadlock on small
filesystems.
The problem happens when ext4_do_update_inode() sets LARGE_FILE feature
and marks current inode handle as synchronous. That subsequently results
in ext4_journal_stop() called from ext4_writepages() to block waiting for
transaction commit while still holding page locks, reference to io_end,
and some prepared bio in mpd structure each of which can possibly block
transaction commit from completing and thus results in deadlock.
Fix the problem by releasing page locks, io_end reference, and
submitting prepared bio before calling ext4_journal_stop().
[ Changed to defer the call to ext4_journal_stop() only if the handle
is synchronous. --tytso ]
Reported-and-tested-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
An extent with lblock = 4294967295 and len = 1 will pass the
ext4_valid_extent() test:
ext4_lblk_t last = lblock + len - 1;
if (len == 0 || lblock > last)
return 0;
since last = 4294967295 + 1 - 1 = 4294967295. This would later trigger
the BUG_ON(es->es_lblk + es->es_len < es->es_lblk) in ext4_es_end().
We can simplify it by removing the - 1 altogether and changing the test
to use lblock + len <= lblock, since now if len = 0, then lblock + 0 ==
lblock and it fails, and if len > 0 then lblock + len > lblock in order
to pass (i.e. it doesn't overflow).
Fixes: 5946d0893 ("ext4: check for overlapping extents in ext4_valid_extent_entries()") Fixes: 2f974865f ("ext4: check for zero length extent explicitly") Cc: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The function ar9003_hw_apply_minccapwr_thresh takes as second parameter not
a pointer to the channel but a boolean value describing whether the channel
is 2.4GHz or not. This broke (according to the origin commit) the ETSI
regulatory compliance on 5GHz channels.
Fixes: 3533bf6b15a0 ("ath9k: Fix regulatory compliance") Signed-off-by: Sven Eckelmann <sven@narfation.org> Cc: Simon Wunderlich <sw@simonwunderlich.de> Cc: Sujith Manoharan <c_manoha@qca.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This patch fixes an issue that the CFIFOSEL register value is possible
to be changed by usbhsg_ep_enable() wrongly. And then, a data transfer
using CFIFO may not work correctly.
For example:
# modprobe g_multi file=usb-storage.bin
# ifconfig usb0 192.168.1.1 up
(During the USB host is sending file to the mass storage)
# ifconfig usb0 down
In this case, since the u_ether.c may call usb_ep_enable() in
eth_stop(), if the renesas_usbhs driver is also using CFIFO for
mass storage, the mass storage may not work correctly.
So, this patch adds usbhs_lock() and usbhs_unlock() calling in
usbhsg_ep_enable() to protect CFIFOSEL register. This is because:
- CFIFOSEL.CURPIPE = 0 is also needed for the pipe configuration
- The CFIFOSEL (fifo->sel) is already protected by usbhs_lock()
Fixes: 97664a207bc2 ("usb: renesas_usbhs: shrink spin lock area") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This patch fixes an issue that the xfer_work() is possible to cause
NULL pointer dereference if the usb cable is disconnected while data
transfer is running.
In such case, a gadget driver may call usb_ep_disable()) before
xfer_work() is actually called. In this case, the usbhs_pkt_pop()
will call usbhsf_fifo_unselect(), and then usbhs_pipe_to_fifo()
in xfer_work() will return NULL.
Fixes: e73a989 ("usb: renesas_usbhs: add DMAEngine support") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Several users reported wifi cannot be unblocked as discussed in [1].
This patch removes the use of the 2009 flag by BIOS but uses the actual
WMI function calls - it will be skipped if WMI reports unsupported.
pm_runtime_get_sync does return a error value that must be checked for
error conditions, else, due to various reasons, the device maynot be
enabled and the system will crash due to lack of clock to the hardware
module.
Before:
12.562784] [00000000] *pgd=fe193835
12.562792] Internal error: : 1406 [#1] SMP ARM
[...]
12.562864] CPU: 1 PID: 241 Comm: modprobe Not tainted 4.7.0-rc4-next-20160624 #2
12.562867] Hardware name: Generic DRA74X (Flattened Device Tree)
12.562872] task: ed51f140 ti: ed44c000 task.ti: ed44c000
12.562886] PC is at omap4_rng_init+0x20/0x84 [omap_rng]
12.562899] LR is at set_current_rng+0xc0/0x154 [rng_core]
[...]
After the proper checks:
[ 94.366705] omap_rng 48090000.rng: _od_fail_runtime_resume: FIXME:
missing hwmod/omap_dev info
[ 94.375767] omap_rng 48090000.rng: Failed to runtime_get device -19
[ 94.382351] omap_rng 48090000.rng: initialization failed.
Fixes: 665d92fa85b5 ("hwrng: OMAP: convert to use runtime PM") Cc: Paul Walmsley <paul@pwsan.com> Signed-off-by: Nishanth Menon <nm@ti.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
On non-DeviceTree platforms, the index of serial device is a static
variable incremented on each probe. It is incremented even if deferred
probe happens when getting the clock in s3c24xx_serial_init_port().
This index is used for referencing elements of statically allocated
s3c24xx_serial_ports array. In case of re-probe, the index will point
outside of this array leading to memory corruption.
Increment the index only on successful probe.
Reported-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Fixes: b497549a035c ("[ARM] S3C24XX: Split serial driver into core and per-cpu drivers") Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When the clk_get() of "uart" clock returns EPROBE_DEFER, the next re-probe
finishes with success but uses invalid (ERR_PTR) values. This leads to
dereferencing of ERR_PTR stored under ourport->clk:
12c30000.serial: Controller clock not found
(...) 12c30000.serial: ttySAC3 at MMIO 0x12c30000 (irq = 61, base_baud = 0) is a S3C6400/10
Unable to handle kernel paging request at virtual address fffffdfb
(clk_prepare) from [<c039f7d0>] (s3c24xx_serial_pm+0x20/0x128)
(s3c24xx_serial_pm) from [<c0395414>] (uart_change_pm+0x38/0x40)
(uart_change_pm) from [<c039689c>] (uart_add_one_port+0x31c/0x44c)
(uart_add_one_port) from [<c03a035c>] (s3c24xx_serial_probe+0x2a8/0x418)
(s3c24xx_serial_probe) from [<c03ee110>] (platform_drv_probe+0x50/0xb0)
(platform_drv_probe) from [<c03ecb44>] (driver_probe_device+0x1f4/0x2b0)
(driver_probe_device) from [<c03eb0c0>] (bus_for_each_drv+0x44/0x8c)
(bus_for_each_drv) from [<c03ec8c8>] (__device_attach+0x9c/0x100)
(__device_attach) from [<c03ebf54>] (bus_probe_device+0x84/0x8c)
(bus_probe_device) from [<c03ec388>] (deferred_probe_work_func+0x60/0x8c)
(deferred_probe_work_func) from [<c012fee4>] (process_one_work+0x120/0x328)
(process_one_work) from [<c0130150>] (worker_thread+0x2c/0x4ac)
(worker_thread) from [<c0135320>] (kthread+0xd8/0xf4)
(kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c)
The first unsuccessful clk_get() causes s3c24xx_serial_init_port() to
exit with failure but the s3c24xx_uart_port is left half-configured
(e.g. port->mapbase is set, clk contains ERR_PTR). On next re-probe,
the function s3c24xx_serial_init_port() will exit early with success
because of configured port->mapbase and driver will use old values,
including the ERR_PTR as clock.
Fix this by cleaning the port->mapbase on error path so each re-probe
will initialize all of the port settings.
Fixes: 60e93575476f ("serial: samsung: enable clock before clearing pending interrupts during init") Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Reviewed-by: Javier Martinez Canillas <javier@osg.samsung.com> Tested-by: Javier Martinez Canillas <javier@osg.samsung.com> Tested-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16: driver doesn't set up DMA here] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
radeon_device_init() returns an error if either of the two calls to
radeon_init() fail. One level up in the call stack,
radeon_driver_load_kms() will then skip runtime pm initialization and
call radeon_driver_unload_kms(), which acquires a runtime pm ref that
is leaked.
Balance by releasing a runtime pm ref in the error path of
radeon_device_init().
Fixes: 10ebc0bc0934 ("drm/radeon: add runtime PM support (v2)") Cc: Dave Airlie <airlied@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/fa5bb977c1fe00474acedae5b03232dbf0b49410.1465392124.git.lukas@wunner.de Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
radeon_driver_load_kms() calls pm_runtime_put_autosuspend() if
radeon_is_px(dev), but radeon_driver_unload_kms() calls
pm_runtime_get_sync() unconditionally. We therefore leak a runtime pm
ref whenever radeon is unloaded on a non-PX machine or if runpm=0. The
GPU will subsequently never runtime suspend after loading radeon again.
Fix by taking the runtime pm ref under the same condition that it was
released on driver load.
Fixes: 10ebc0bc0934 ("drm/radeon: add runtime PM support (v2)") Cc: Dave Airlie <airlied@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/aaf71106c042126817aeca8b8e54ed468ab61ef7.1465392124.git.lukas@wunner.de Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
nouveau_drm_load() calls pm_runtime_put() if nouveau_runtime_pm != 0,
but nouveau_drm_unload() calls pm_runtime_get_sync() unconditionally.
We therefore leak a runtime pm ref whenever nouveau is loaded with
runpm=0 and then unloaded. The GPU will subsequently never runtime
suspend even if nouveau is loaded again with runpm=1.
Fix by taking the runtime pm ref under the same condition that it was
released on driver load.
Fixes: 5addcf0a5f0f ("nouveau: add runtime PM support (v0.9)") Cc: Dave Airlie <airlied@redhat.com> Cc: Ben Skeggs <bskeggs@redhat.com> Reported-by: Karol Herbst <karolherbst@gmail.com> Tested-by: Karol Herbst <karolherbst@gmail.com> Tested-by: Peter Wu <peter@lekensteyn.nl> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1544b82007037601fbc510b1a50edc56c529e75f.1465392124.git.lukas@wunner.de Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f3af36511e60 ("usb: dwc3: gadget: always
enable IOC on bulk/interrupt transfers") ended up
regressing Isochronous endpoints by clearing
DWC3_EP_BUSY flag too early, which resulted in
choppy audio playback over USB.
Fix that by partially reverting original commit and
making sure that we check for isochronous endpoints.
Fixes: f3af36511e60 ("usb: dwc3: gadget: always enable IOC
on bulk/interrupt transfers") Signed-off-by: Konrad Leszczynski <konrad.leszczynski@intel.com> Signed-off-by: Rafal Redzimski <rafal.f.redzimski@intel.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Until now, our understanding for HW I/O coherency to work on the
Cortex-A9 based Marvell SoC was that only the PCIe regions should be
mapped strongly-ordered. However, we were still encountering some
deadlocks, especially when testing the CESA crypto engine. After
checking with the HW designers, it was concluded that all the MMIO
registers should be mapped as strongly ordered for the HW I/O coherency
mechanism to work properly.
This fixes some easy to reproduce deadlocks with the CESA crypto engine
driver (dmcrypt on a sufficiently large disk partition).
Tested-by: Terry Stockert <stockert@inkblotadmirer.me> Tested-by: Romain Perier <romain.perier@free-electrons.com> Cc: Terry Stockert <stockert@inkblotadmirer.me> Cc: Romain Perier <romain.perier@free-electrons.com> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")
... set rq->prev_* to 0 after a CPU hotplug comes back, in order to
fix the case where (after CPU hotplug) steal time is smaller than
rq->prev_steal_time.
However, this should never happen. Steal time was only smaller because of the
KVM-specific bug fixed by the previous patch. Worse, the previous patch
triggers a bug on CPU hot-unplug/plug operation: because
rq->prev_steal_time is cleared, all of the CPU's past steal time will be
accounted again on hot-plug.
Since the root cause has been fixed, we can just revert commit e9532e69b8d1.
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 'commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")' Link: http://lkml.kernel.org/r/1465813966-3116-3-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The Hyper-V Linux Integration Services use the VMBus implementation for
communication with the Hypervisor. VMBus registers its own interrupt
handler that completely bypasses the common Linux interrupt handling.
This implies that the interrupt entropy collector is not triggered.
This patch adds the interrupt entropy collection callback into the VMBus
interrupt handler function.
Signed-off-by: Stephan Mueller <stephan.mueller@atsec.com> Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Since systemd is consistently using /dev/urandom before it is
initialized, we can't see the other potentially dangerous users of
/dev/urandom immediately after boot. So print the first ten such
complaints instead.
Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
NBANK() macro assumes that ngpios is a multiple of 8(BANK_SZ) and
hence results in 0 banks for PCA9536 which has just 4 gpios. This is
wrong as PCA9356 has 1 bank with 4 gpios. This results in uninitialized
PCA953X_INVERT register. Fix this by using DIV_ROUND_UP macro in
NBANK().
Signed-off-by: Vignesh R <vigneshr@ti.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Similar to the AR93xx series, the AR94xx and the Qualcomm QCA988x also have
the same quirk for the Bus Reset.
Fixes: c3e59ee4e766 ("PCI: Mark Atheros AR93xx to avoid bus reset") Signed-off-by: Chris Blake <chrisrblake93@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
It seems risky to always rely on the caller to ensure the socket's
address family is correct before passing it to the NetLabel kAPI,
especially since we see at least one LSM which didn't. Add address
family checks to the *_delattr() functions to help prevent future
problems.
Reported-by: Maninder Singh <maninder1.s@samsung.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When s5p_mfc_remove() calls put_device() for the reserved memory region
devs, the driver core warns that the dev doesn't have a release callback:
WARNING: CPU: 0 PID: 591 at drivers/base/core.c:251 device_release+0x8c/0x90
Device 's5p-mfc-l' does not have a release() function, it is broken and must be fixed.
Also, the declared DMA memory using dma_declare_coherent_memory() isn't
relased so add a dev .release that calls dma_release_declared_memory().
Fixes: 6e83e6e25eb4 ("[media] s5p-mfc: Fix kernel warning on memory init") Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The devices don't have a name set, so makes dev_name() returns NULL which
makes harder to identify the devices that are causing issues, for example:
WARNING: CPU: 2 PID: 616 at drivers/base/core.c:251 device_release+0x8c/0x90
Device '(null)' does not have a release() function, it is broken and must be fixed.
And after setting the device name:
WARNING: CPU: 0 PID: 591 at drivers/base/core.c:251 device_release+0x8c/0x90
Device 's5p-mfc-l' does not have a release() function, it is broken and must be fixed.
Fixes: 6e83e6e25eb4 ("[media] s5p-mfc: Fix kernel warning on memory init") Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When we postpone a broadcast packet we save the source port in
the skb if it is local. However, the source port can disappear
before we get a chance to process the packet.
This patch fixes this by holding a ref count on the netdev.
It also delays the skb->cb modification until after we allocate
the new skb as you should not modify shared skbs.
Fixes: 412ca1550cbe ("macvlan: Move broadcasts into a work queue") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Many devices use userspace bluetooth stacks like BlueZ or Bluedroid in combination
with uhid. If any of these stacks is used with a HID device for which the driver
performs a HID request as part .probe (or technically another HID operation),
this results in a deadlock situation. The deadlock results in a 5 second timeout
for I/O operations in HID drivers, so isn't fatal, but none of the I/O operations
have a chance of succeeding.
The root cause for the problem is that uhid only allows for one request to be
processed at a time per uhid instance and locks out other operations. This means
that if a user space is creating a new HID device through 'UHID_CREATE', which
ultimately triggers '.probe' through the HID layer. Then any HID request e.g. a
read for calibration data would trigger a HID operation on uhid again, but it
won't go out to userspace, because it is still stuck in UHID_CREATE.
In addition bluetooth stacks are typically single threaded, so they wouldn't be
able to handle any requests while waiting on uhid.
Lucikly the UHID spec is somewhat flexible and allows for fixing the issue,
without breaking user space. The idea which the patch implements as discussed
with David Herrmann is to decouple adding of a hid device (which triggers .probe)
from UHID_CREATE. The work will kick off roughly once UHID_CREATE completed (or
else will wait a tiny bit of time in .probe for a lock). A HID driver has to call
HID to call 'hid_hw_start()' as part of .probe once it is ready for I/O, which
triggers UHID_START to user space. Any HID operations should function now within
.probe and won't deadlock because userspace is stuck on UHID_CREATE.
We verified this patch on Bluedroid with Android 6.0 and on desktop Linux with
BlueZ stacks. Prior to the patch they had the deadlock issue.
The number of bits, nbits, is calculated in mpi_read_raw_data() as follows:
nbits = nbytes * 8;
Afterwards, the number of leading zero bits of the first byte get
subtracted:
nbits -= count_leading_zeros(buffer[0]);
However, count_leading_zeros() takes an unsigned long and thus,
the u8 gets promoted to an unsigned long.
Thus, the above doesn't subtract the number of leading zeros in the most
significant nonzero input byte from nbits, but the number of leading
zeros of the most significant nonzero input byte promoted to unsigned long,
i.e. BITS_PER_LONG - 8 too many.
Fix this by subtracting
count_leading_zeros(...) - (BITS_PER_LONG - 8)
from nbits only.
Fixes: e1045992949 ("MPILIB: Provide a function to read raw data into an
MPI") Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This is an ancient bug that was actually attempted to be fixed once
(badly) by me eleven years ago in commit 4ceb5db9757a ("Fix
get_user_pages() race for write access") but that was then undone due to
problems on s390 by commit f33ea7f404e5 ("fix get_user_pages bug").
In the meantime, the s390 situation has long been fixed, and we can now
fix it by checking the pte_dirty() bit properly (and do it better). The
s390 dirty bit was implemented in abf09bed3cce ("s390/mm: implement
software dirty bits") which made it into v3.9. Earlier kernels will
have to look at the page state itself.
Also, the VM has become more scalable, and what used a purely
theoretical race back then has become easier to trigger.
To fix it, we introduce a new internal FOLL_COW flag to mark the "yes,
we already did a COW" rather than play racy games with FOLL_WRITE that
is very fundamental, and then use the pte dirty flag to validate that
the FOLL_COW flag is still valid.
Reported-and-tested-by: Phil "not Paul" Oester <kernel@linuxace.com> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: Michal Hocko <mhocko@suse.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Nick Piggin <npiggin@gmail.com> Cc: Greg Thelen <gthelen@google.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[carnil: backport to 3.16, adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This partially reverts commit 1086bbe97a07 ("netfilter: ensure number of
counters is >0 in do_replace()") in net/bridge/netfilter/ebtables.c.
Setting rules with ebtables does not work any more with 1086bbe97a07 place.
There is an error message and no rules set in the end.
e.g.
~# ebtables -t nat -A POSTROUTING --src 12:34:56:78:9a:bc -j DROP
Unable to update the kernel. Two possible causes:
1. Multiple ebtables programs were executing simultaneously. The ebtables
userspace tool doesn't by default support multiple ebtables programs
running
Reverting the ebtables part of 1086bbe97a07 makes this work again.
Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
It turns out we don't validate that the num_counters field in the
struct we pass in from userspace is initialized.
The same problem also exists in ebtables, arptables, ipv6, and the
compat variants.
Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This looks like refactoring, but its also a bug fix.
Problem is that the compat path (32bit iptables, 64bit kernel) lacks a few
sanity tests that are done in the normal path.
For example, we do not check for underflows and the base chain policies.
While its possible to also add such checks to the compat path, its more
copy&pastry, for instance we cannot reuse check_underflow() helper as
e->target_offset differs in the compat case.
Other problem is that it makes auditing for validation errors harder; two
places need to be checked and kept in sync.
At a high level 32 bit compat works like this:
1- initial pass over blob:
validate match/entry offsets, bounds checking
lookup all matches and targets
do bookkeeping wrt. size delta of 32/64bit structures
assign match/target.u.kernel pointer (points at kernel
implementation, needed to access ->compatsize etc.)
2- allocate memory according to the total bookkeeping size to
contain the translated ruleset
3- second pass over original blob:
for each entry, copy the 32bit representation to the newly allocated
memory. This also does any special match translations (e.g.
adjust 32bit to 64bit longs, etc).
4- check if ruleset is free of loops (chase all jumps)
5-first pass over translated blob:
call the checkentry function of all matches and targets.
The alternative implemented by this patch is to drop steps 3&4 from the
compat process, the translation is changed into an intermediate step
rather than a full 1:1 translate_table replacement.
In the 2nd pass (step #3), change the 64bit ruleset back to a kernel
representation, i.e. put() the kernel pointer and restore ->u.user.name .
This gets us a 64bit ruleset that is in the format generated by a 64bit
iptables userspace -- we can then use translate_table() to get the
'native' sanity checks.
This has two drawbacks:
1. we re-validate all the match and target entry structure sizes even
though compat translation is supposed to never generate bogus offsets.
2. we put and then re-lookup each match and target.
THe upside is that we get all sanity tests and ruleset validations
provided by the normal path and can remove some duplicated compat code.
iptables-restore time of autogenerated ruleset with 300k chains of form
-A CHAIN0001 -m limit --limit 1/s -j CHAIN0002
-A CHAIN0002 -m limit --limit 1/s -j CHAIN0003
shows no noticeable differences in restore times:
old: 0m30.796s
new: 0m31.521s
64bit: 0m25.674s
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[bwh: Backported to 3.16: deleted code is a little different] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Quoting John Stultz:
In updating a 32bit arm device from 4.6 to Linus' current HEAD, I
noticed I was having some trouble with networking, and realized that
/proc/net/ip_tables_names was suddenly empty.
Digging through the registration process, it seems we're catching on the:
Validate that all matches (if any) add up to the beginning of
the target and that each match covers at least the base structure size.
The compat path should be able to safely re-use the function
as the structures only differ in alignment; added a
BUILD_BUG_ON just in case we have an arch that adds padding as well.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
We're currently asserting that targetoff + targetsize <= nextoff.
Extend it to also check that targetoff is >= sizeof(xt_entry).
Since this is generic code, add an argument pointing to the start of the
match/target, we can then derive the base structure size from the delta.
We also need the e->elems pointer in a followup change to validate matches.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
We have targets and standard targets -- the latter carries a verdict.
The ip/ip6tables validation functions will access t->verdict for the
standard targets to fetch the jump offset or verdict for chainloop
detection, but this happens before the targets get checked/validated.
Thus we also need to check for verdict presence here, else t->verdict
can point right after a blob.
Spotted with UBSAN while testing malformed blobs.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
32bit rulesets have different layout and alignment requirements, so once
more integrity checks get added to xt_check_entry_offsets it will reject
well-formed 32bit rulesets.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Once we add more sanity testing to xt_check_entry_offsets it
becomes relvant if we're expecting a 32bit 'config_compat' blob
or a normal one.
Since we already have a lot of similar-named functions (check_entry,
compat_check_entry, find_and_check_entry, etc.) and the current
incarnation is short just fold its contents into the callers.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Currently arp/ip and ip6tables each implement a short helper to check that
the target offset is large enough to hold one xt_entry_target struct and
that t->u.target_size fits within the current rule.
Unfortunately these checks are not sufficient.
To avoid adding new tests to all of ip/ip6/arptables move the current
checks into a helper, then extend this helper in followup patches.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In the mark_source_chains function (net/ipv4/netfilter/ip_tables.c) it
is possible for a user-supplied ipt_entry structure to have a large
next_offset field. This field is not bounds checked prior to writing a
counter value at the supplied offset.
Base chains enforce absolute verdict.
User defined chains are supposed to end with an unconditional return,
xtables userspace adds them automatically.
But if such return is missing we will move to non-existent next rule.
Reported-by: Ben Hawkes <hawkes@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The MIC VOP driver does two successive reads from user space to read a
variable length data structure. Kernel memory corruption can result if
the data structure changes between the two reads. This patch disallows
the chance of this happening.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=116651
Reported by: Pengfei Wang <wpengfeinudt@gmail.com> Reviewed-by: Sudeep Dutt <sudeep.dutt@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16:
- Adjust filename, context
- goto exit on failure] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Yue Cao claims that current host rate limiting of challenge ACKS
(RFC 5961) could leak enough information to allow a patient attacker
to hijack TCP sessions. He will soon provide details in an academic
paper.
This patch increases the default limit from 100 to 1000, and adds
some randomization so that the attacker can no longer hijack
sessions without spending a considerable amount of probes.
Based on initial analysis and patch from Linus.
Note that we also have per socket rate limiting, so it is tempting
to remove the host limit in the future.
v2: randomize the count of challenge acks per second, not the period.
Fixes: 282f23c6ee34 ("tcp: implement RFC 5961 3.2") Reported-by: Yue Cao <ycao009@ucr.edu> Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16:
- Adjust context
- Use ACCESS_ONCE() instead of {READ,WRITE}_ONCE()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
There is a double fetch problem in audit_log_single_execve_arg()
where we first check the execve(2) argumnets for any "bad" characters
which would require hex encoding and then re-fetch the arguments for
logging in the audit record[1]. Of course this leaves a window of
opportunity for an unsavory application to munge with the data.
This patch reworks things by only fetching the argument data once[2]
into a buffer where it is scanned and logged into the audit
records(s). In addition to fixing the double fetch, this patch
improves on the original code in a few other ways: better handling
of large arguments which require encoding, stricter record length
checking, and some performance improvements (completely unverified,
but we got rid of some strlen() calls, that's got to be a good
thing).
As part of the development of this patch, I've also created a basic
regression test for the audit-testsuite, the test can be tracked on
GitHub at the following link:
[1] If you pay careful attention, there is actually a triple fetch
problem due to a strnlen_user() call at the top of the function.
[2] This is a tiny white lie, we do make a call to strnlen_user()
prior to fetching the argument data. I don't like it, but due to the
way the audit record is structured we really have no choice unless we
copy the entire argument at once (which would require a rather
wasteful allocation). The good news is that with this patch the
kernel no longer relies on this strnlen_user() value for anything
beyond recording it in the log, we also update it with a trustworthy
value whenever possible.
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The sclp_ctl_ioctl_sccb function uses two copy_from_user calls to
retrieve the sclp request from user space. The first copy_from_user
fetches the length of the request which is stored in the first two
bytes of the request. The second copy_from_user gets the complete
sclp request, but this copies the length field a second time.
A malicious user may have changed the length in the meantime.
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com> Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
It turns out that if the guest does a H_CEDE while the CPU is in
a transactional state, and the H_CEDE does a nap, and the nap
loses the architected state of the CPU (which is is allowed to do),
then we lose the checkpointed state of the virtual CPU. In addition,
the transactional-memory state recorded in the MSR gets reset back
to non-transactional, and when we try to return to the guest, we take
a TM bad thing type of program interrupt because we are trying to
transition from non-transactional to transactional with a hrfid
instruction, which is not permitted.
The result of the program interrupt occurring at that point is that
the host CPU will hang in an infinite loop with interrupts disabled.
Thus this is a denial of service vulnerability in the host which can
be triggered by any guest (and depending on the guest kernel, it can
potentially triggered by unprivileged userspace in the guest).
This vulnerability has been assigned the ID CVE-2016-5412.
To fix this, we save the TM state before napping and restore it
on exit from the nap, when handling a H_CEDE in real mode. The
case where H_CEDE exits to host virtual mode is already OK (as are
other hcalls which exit to host virtual mode) because the exit
path saves the TM state.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
[bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This moves the transactional memory state save and restore sequences
out of the guest entry/exit paths into separate procedures. This is
so that these sequences can be used in going into and out of nap
in a subsequent patch.
The only code changes here are (a) saving and restore LR on the
stack, since these new procedures get called with a bl instruction,
(b) explicitly saving r1 into the PACA instead of assuming that
HSTATE_HOST_R1(r13) is already set, and (c) removing an unnecessary
and redundant setting of MSR[TM] that should have been removed by
commit 9d4d0bdd9e0a ("KVM: PPC: Book3S HV: Add transactional memory
support", 2013-09-24) but wasn't.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
[bwh: Backported to 3.16: include dots in subroutine names] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The last field "flags" of object "minfo" is not initialized.
Copying this object out may leak kernel stack data.
Assign 0 to it to avoid leak.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
link_info.str is a char array of size 60. Memory after the NULL
byte is not initialized. Sending the whole object out can cause
a leak.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
[carnil: Backported to 3.16 (same as bwh did for 3.2): the unpadded strcpy() is
in tipc_node_get_links() and no nlattr is involved, so use strncpy()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The stack object “r1” has a total size of 32 bytes. Its field
“event” and “val” both contain 4 bytes padding. These 8 bytes
padding bytes are sent to user without being initialized.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The stack object “r1” has a total size of 32 bytes. Its field
“event” and “val” both contain 4 bytes padding. These 8 bytes
padding bytes are sent to user without being initialized.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The stack object “tread” has a total size of 32 bytes. Its field
“event” and “val” both contain 4 bytes padding. These 8 bytes
padding bytes are sent to user without being initialized.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The stack object “ci” has a total size of 8 bytes. Its last 3 bytes
are padding bytes which are not initialized and leaked to userland
via “copy_to_user”.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This prevents stacking filesystems (ecryptfs and overlayfs) from using
procfs as lower filesystem. There is too much magic going on inside
procfs, and there is no good reason to stack stuff on top of procfs.
(For example, procfs does access checks in VFS open handlers, and
ecryptfs by design calls open handlers from a kernel thread that doesn't
drop privileges or so.)
Add a simple read-only counter to super_block that indicates how deep this
is in the stack of filesystems. Previously ecryptfs was the only stackable
filesystem and it explicitly disallowed multiple layers of itself.
Overlayfs, however, can be stacked recursively and also may be stacked
on top of ecryptfs or vice versa.
To limit the kernel stack usage we must limit the depth of the
filesystem stack. Initially the limit is set to 2.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
[bwh: Backported to 3.16: drop changes to overlayfs] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
I previously added an integer overflow check here but looking at it now,
it's still buggy.
The bug happens in snd_compr_allocate_buffer(). We multiply
".fragments" and ".fragment_size" and that doesn't overflow but then we
save it in an unsigned int so it truncates the high bits away and we
allocate a smaller than expected size.
Fixes: b35cc8225845 ('ALSA: compress_core: integer overflow in snd_compr_allocate_buffer()') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The well-spotted fallocate undo fix is good in most cases, but not when
fallocate failed on the very first page. index 0 then passes lend -1
to shmem_undo_range(), and that has two bad effects: (a) that it will
undo every fallocation throughout the file, unrestricted by the current
range; but more importantly (b) it can cause the undo to hang, because
lend -1 is treated as truncation, which makes it keep on retrying until
every page has gone, but those already fully instantiated will never go
away. Big thank you to xfstests generic/269 which demonstrates this.
Fixes: b9b4bb26af01 ("tmpfs: don't undo fallocate past its last page") Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16: use PAGE_CACHE_SHIFT instead of PAGE_SHIFT] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
There are legitimate reasons to disallow mmap on certain files, notably
in sysfs or procfs. We shouldn't emulate mmap support on file systems
that don't offer support natively.
CVE-2016-1583
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
[tyhicks: clean up f_op check by using ecryptfs_file_to_lower()] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>