The 3w-9xxx driver needs to tear down the dma mappings before returning
the command to the midlayer, as there is no guarantee the sglist and
count are valid after that point. Also remove the dma mapping helpers
which have another inherent race due to the request_id index.
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Adam Radford <aradford@gmail.com> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The 3w-xxxx driver needs to tear down the dma mappings before returning
the command to the midlayer, as there is no guarantee the sglist and
count are valid after that point. Also remove the dma mapping helpers
which have another inherent race due to the request_id index.
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Adam Radford <aradford@gmail.com> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The 3w-sas driver needs to tear down the dma mappings before returning
the command to the midlayer, as there is no guarantee the sglist and
count are valid after that point. Also remove the dma mapping helpers
which have another inherent race due to the request_id index.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Torsten Luettgert <ml-lkml@enda.eu> Tested-by: Bernd Kardatzki <Bernd.Kardatzki@med.uni-tuebingen.de> Acked-by: Adam Radford <aradford@gmail.com> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
This works around a issue with qnap iscsi targets not handling large IOs
very well.
The target returns:
VPD INQUIRY: Block limits page (SBC)
Maximum compare and write length: 1 blocks
Optimal transfer length granularity: 1 blocks
Maximum transfer length: 4294967295 blocks
Optimal transfer length: 4294967295 blocks
Maximum prefetch, xdread, xdwrite transfer length: 0 blocks
Maximum unmap LBA count: 8388607
Maximum unmap block descriptor count: 1
Optimal unmap granularity: 16383
Unmap granularity alignment valid: 0
Unmap granularity alignment: 0
Maximum write same length: 0xffffffff blocks
Maximum atomic transfer length: 0
Atomic alignment: 0
Atomic transfer length granularity: 0
and it is *sometimes* able to handle at least one IO of size up to 8 MB. We
have seen in traces where it will sometimes work, but other times it
looks like it fails and it looks like it returns failures if we send
multiple large IOs sometimes. Also it looks like it can return 2 different
errors. It will sometimes send iscsi reject errors indicating out of
resources or it will send invalid cdb illegal requests check conditions.
And then when it sends iscsi rejects it does not seem to handle retries
when there are command sequence holes, so I could not just add code to
try and gracefully handle that error code.
The problem is that we do not have a good contact for the company,
so we are not able to determine under what conditions it returns
which error and why it sometimes works.
So, this patch just adds a new black list flag to set targets like this to
the old max safe sectors of 1024. The max_hw_sectors changes added in 3.19
caused this regression, so I also ccing stable.
Reported-by: Christian Hesse <list@eworm.de> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Basically snd_emux_detach_seq() doesn't need a protection of
emu->register_mutex as it's already being unregistered. So, we can
get rid of this for avoiding the deadlock.
Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Some models provide too long string for the shortname that has 32bytes
including the terminator, and it results in a non-terminated string
exposed to the user-space. This isn't too critical, though, as the
string is stopped at the succeeding longname string.
This patch fixes such entries by dropping "SB" prefix (it's enough to
fit within 32 bytes, so far). Meanwhile, it also changes strcpy()
with strlcpy() to make sure that this kind of problem won't happen in
future, too.
Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Buffers allocated by dma_alloc_coherent() are always zeroed on Alpha,
ARM (32bit), MIPS, PowerPC, x86/x86_64 and probably other architectures.
It turned out that some drivers rely on this 'feature'. Allocated buffer
might be also exposed to userspace with dma_mmap() call, so clearing it
is desired from security point of view to avoid exposing random memory
to userspace. This patch unifies dma_alloc_coherent() behavior on ARM64
architecture with other implementations by unconditionally zeroing
allocated buffer.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
[ luis: backported to 3.16:
- dropped changes to __alloc_from_pool() which doesn't exist in 3.16 ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The mute-LED mode control has the fixed on/off states that are
supposed to remain on/off regardless of the master switch. However,
this doesn't work actually because the vmaster hook is called in the
vmaster code itself.
This patch fixes it by calling the hook indirectly after checking the
mute LED mode.
Reported-and-tested-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Whenever the check for a send in progress introduced in commit 521e0546c970 (btrfs: protect snapshots from deleting during send) is
hit, we return without unlocking inode->i_mutex. This is easy to see
with lockdep enabled:
[ +0.000059] ================================================
[ +0.000028] [ BUG: lock held when returning to user space! ]
[ +0.000029] 4.0.0-rc5-00096-g3c435c1 #93 Not tainted
[ +0.000026] ------------------------------------------------
[ +0.000029] btrfs/211 is leaving the kernel with locks still held!
[ +0.000029] 1 lock held by btrfs/211:
[ +0.000023] #0: (&type->i_mutex_dir_key){+.+.+.}, at: [<ffffffff8135b8df>] btrfs_ioctl_snap_destroy+0x2df/0x7a0
Make sure we unlock it in the error path.
Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The PLL output will be unstable in some cases. We can fix it by
setting some registers.
Signed-off-by: Bard Liao <bardliao@realtek.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Correct small copy and paste error where autodisable was not being
enabled for the SOC_DAPM_SINGLE_TLV_AUTODISABLE control.
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The only users of collect_mounts are in audit_tree.c
In audit_trim_trees and audit_add_tree_rule the path passed into
collect_mounts is generated from kern_path passed an audit_tree
pathname which is guaranteed to be an absolute path. In those cases
collect_mounts is obviously intended to work on mounted paths and
if a race results in paths that are unmounted when collect_mounts
it is reasonable to fail early.
The paths passed into audit_tag_tree don't have the absolute path
check. But are used to play with fsnotify and otherwise interact with
the audit_trees, so again operating only on mounted paths appears
reasonable.
Avoid having to worry about what happens when we try and audit
unmounted filesystems by restricting collect_mounts to mounts
that appear in the mount tree.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Running mtd-utils/tests/ubi-tests/io_basic.c could cause
soft lockup or watchdog reset. It is because *updatevol*
will perform ubi_check_volume() after updating finish
and this function will full scan the updated lebs if the
volume is initialized as STATIC_VOLUME.
This patch adds *cond_resched()* in the loop of lebs scan
to avoid soft lockup.
Helped by Richard Weinberger <richard@nod.at>
[ 2158.067096] INFO: rcu_sched self-detected stall on CPU { 1} (t=2101 jiffies g=1606 c=1605 q=56)
[ 2158.172867] CPU: 1 PID: 2073 Comm: io_basic Tainted: G O 3.10.53 #21
[ 2158.172898] [<c000f624>] (unwind_backtrace+0x0/0x120) from [<c000c294>] (show_stack+0x10/0x14)
[ 2158.172918] [<c000c294>] (show_stack+0x10/0x14) from [<c008ac3c>] (rcu_check_callbacks+0x1c0/0x660)
[ 2158.172936] [<c008ac3c>] (rcu_check_callbacks+0x1c0/0x660) from [<c002b480>] (update_process_times+0x38/0x64)
[ 2158.172953] [<c002b480>] (update_process_times+0x38/0x64) from [<c005ff38>] (tick_sched_handle+0x54/0x60)
[ 2158.172966] [<c005ff38>] (tick_sched_handle+0x54/0x60) from [<c00601ac>] (tick_sched_timer+0x44/0x74)
[ 2158.172978] [<c00601ac>] (tick_sched_timer+0x44/0x74) from [<c003f348>] (__run_hrtimer+0xc8/0x1b8)
[ 2158.172992] [<c003f348>] (__run_hrtimer+0xc8/0x1b8) from [<c003fd9c>] (hrtimer_interrupt+0x128/0x2a4)
[ 2158.173007] [<c003fd9c>] (hrtimer_interrupt+0x128/0x2a4) from [<c0246f1c>] (arch_timer_handler_virt+0x28/0x30)
[ 2158.173022] [<c0246f1c>] (arch_timer_handler_virt+0x28/0x30) from [<c0086214>] (handle_percpu_devid_irq+0x9c/0x124)
[ 2158.173036] [<c0086214>] (handle_percpu_devid_irq+0x9c/0x124) from [<c0082bd8>] (generic_handle_irq+0x20/0x30)
[ 2158.173049] [<c0082bd8>] (generic_handle_irq+0x20/0x30) from [<c000969c>] (handle_IRQ+0x64/0x8c)
[ 2158.173060] [<c000969c>] (handle_IRQ+0x64/0x8c) from [<c0008544>] (gic_handle_irq+0x3c/0x60)
[ 2158.173074] [<c0008544>] (gic_handle_irq+0x3c/0x60) from [<c02f0f80>] (__irq_svc+0x40/0x50)
[ 2158.173083] Exception stack(0xc4043c98 to 0xc4043ce0)
[ 2158.173092] 3c80: c4043ce400000019
[ 2158.173102] 3ca0: 1f8a865fc050ad101f8a864c00000031c04b59700003ebce00000000f3550000
[ 2158.173113] 3cc0: bf00bc68000008000003ebcec4043ce0c0186d14c0186cb880000013ffffffff
[ 2158.173130] [<c02f0f80>] (__irq_svc+0x40/0x50) from [<c0186cb8>] (read_current_timer+0x4/0x38)
[ 2158.173145] [<c0186cb8>] (read_current_timer+0x4/0x38) from [<1f8a865f>] (0x1f8a865f)
[ 2183.927097] BUG: soft lockup - CPU#1 stuck for 22s! [io_basic:2073]
[ 2184.002229] Modules linked in: nandflash(O) [last unloaded: nandflash]
Signed-off-by: Wang Kai <morgan.wang@huawei.com> Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
A malicious signal handler / restorer can DOS the system by fudging the
user regs saved on stack, causing weird things such as sigreturn returning
to user mode PC but cpu state still being kernel mode....
Ensure that in sigreturn path status32 always has U bit; any other bogosity
(gargbage PC etc) will be taken care of by normal user mode exceptions mechanisms.
It appears that the BayTrail-T class of hardware requires EFI in order
to powerdown and reboot and no other reliable method exists.
This quirk is generally applicable to all hardware that has the ACPI
Hardware Reduced bit set, since usually ACPI would be the preferred
method.
Cc: Len Brown <len.brown@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: Ben Hutchings <ben@decadent.org.uk>
[ luis: backported to 3.16:
- move changes from quirks.c into efi.c
- adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Not only can EfiResetSystem() be used to reboot, it can also be used to
power down machines.
By and large, this functionality doesn't work very well across the range
of EFI machines in the wild, so it should definitely only be used as a
last resort. In an ideal world, this wouldn't be needed at all.
Unfortunately, we're starting to see machines where EFI is the *only*
reliable way to power down, and nothing else, not PCI, not ACPI, works.
efi_poweroff_required() should be implemented on a per-architecture
basis, since exactly when we should be using EFI runtime services is a
platform-specific decision. There's no analogue for reboot because each
architecture handles reboot very differently - the x86 code in
particular is pretty complex.
Patches to enable this for specific classes of hardware will be
submitted separately.
Tested-by: Mark Salter <msalter@redhat.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Implement efi_reboot(), which is really just a wrapper around the
EfiResetSystem() EFI runtime service, but it does at least allow us to
funnel all callers through a single location.
It also simplifies the callsites since users no longer need to check to
see whether EFI_RUNTIME_SERVICES are enabled.
Cc: Tony Luck <tony.luck@intel.com> Tested-by: Mark Salter <msalter@redhat.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
When system is out of memory, refilling of RX buffers fails while
the driver continue to pass the received packets to the kernel stack.
At some point, when all RX buffers deplete, driver may fall into a
sleep, and not recover when memory for new RX buffers is once again
availible. This is because hardware does not have valid descriptors,
so no interrupt will be generated for the driver to return to work
in napi context. Fix it by schedule the napi poll function from
stats_task delayed workqueue, as long as the allocations fail.
Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
By default, the number of tx queues is limited by the number of online cpus
in mlx4_en_get_profile(). However, this limit no longer holds after the
ethtool .set_channels method has been called. In that situation, the driver
may access invalid bits of certain cpumask variables when queue_index >=
nr_cpu_ids.
Signed-off-by: Benjamin Poirier <bpoirier@suse.de> Acked-by: Ido Shamay <idos@mellanox.com> Fixes: d03a68f ("net/mlx4_en: Configure the XPS queue mapping on driver load") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
If we don't do that, then the poison value is left in the ->pprev
backlink.
This can cause crashes if we do a disconnect, followed by a connect().
Tested-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Wen Xu <hotdog3645@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Currently there is a bug in zero range code which causes zero range
calls to only allocate block aligned portion of the range, while
ignoring the rest in some cases.
In some cases, namely if the end of the range is past i_size, we do
attempt to preallocate the last nonaligned block. However this might
cause kernel to BUG() in some carefully designed zero range requests
on setups where page size > block size.
Fix this problem by first preallocating the entire range, including
the nonaligned edges and converting the written extents to unwritten
in the next step. This approach will also give us the advantage of
having the range to be as linearly contiguous as possible.
Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Moritz Muehlenhoff <jmm@debian.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
We accidently aliased EXT4_EX_NOCACHE and EXT4_GET_CONVERT_UNWRITTEN
falgs, which apparently was hiding a bug that was unmasked when this
flag aliasing issue was addressed (see the subsequent commit). The
reproduction case was:
... which would cause fsx to report corruption in the data file.
The fix we have is a bit of an overkill, but I'd much rather be
conservative for now, and we can optimize ZERO_RANGE_FL handling
later. The fact that we need to zap the extent_status cache for the
inode is unfortunate, but correctness is far more important than
performance.
Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Moritz Muehlenhoff <jmm@debian.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Commit b8a8684502a0f introduced an accidental flag aliasing between
EXT4_EX_NOCACHE and EXT4_GET_BLOCKS_CONVERT_UNWRITTEN.
Fortunately, this didn't introduce any untorward side effects --- we
got lucky. Nevertheless, fix this and leave a warning to hopefully
avoid this from happening in the future.
Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Moritz Muehlenhoff <jmm@debian.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The Fujitsu H730 does not work with crc_enabled = 0, even though the
crc_enabled bit in the firmware version indicated it would. When switching
this value to crc_enabled to 1, the touchpad works. This patch uses DMI to
detect H730.
Reported-by: Stefan Valouch <stefan@valouch.com> Tested-by: Stefan Valouch <stefan@valouch.com> Tested-by: Alfredo Gemma <alfredo.gemma@gmail.com> Signed-off-by: Ulrik De Bie <ulrik.debie-os@e2big.org> Acked-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The root cause is a race conditon -- cpufreq core and acpi-cpufreq driver
were initiated, but cpufreq_governor wasn't and _PPC changed notification
happened, __cpufreq_governor() was called within acpi_os_execute_deferred
kernel thread context.
To fix this panic issue, add pointer checking code in __cpufreq_governor()
before pointer policy->governor is to be dereferenced.
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Since commit ee0778a30153
("tools/power: turbostat: make Makefile a bit more capable")
turbostat's Makefile is using
[...]
BUILD_OUTPUT := $(PWD)
[...]
which obviously causes trouble when building "turbostat" with
make -C /usr/src/linux/tools/power/x86/turbostat ARCH=x86 turbostat
because GNU make does not update nor guarantee that $PWD is set.
This patch changes the Makefile to use $CURDIR instead, which GNU make
guarantees to set and update (i.e. when using "make -C ...") and also
adds support for the O= option (see "make help" in your root of your
kernel source tree for more details).
Link: https://bugs.gentoo.org/show_bug.cgi?id=533918 Fixes: ee0778a30153 ("tools/power: turbostat: make Makefile a bit more capable") Signed-off-by: Thomas D. <whissi@whissi.de> Cc: Mark Asselstine <mark.asselstine@windriver.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Using the indenting we can see the curly braces were obviously intended.
This is a static checker fix, but my guess is that we don't read enough
bytes, because we don't calculate "t_len" correctly.
Fixes: f1d82698029b ('memstick: use fully asynchronous request processing') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Alex Dubov <oakad@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Commit 2473238eac95 ("ihex: add support for CS:IP/EIP records") removes
the "default:" statement in the switch block, making the "return
usage();" line dead code and ihex2fw silently ignoring unknown options.
Restore this statement.
This bug was found by building with HOSTCC=clang and adding
-Wunreachable-code-return to HOSTCFLAGS.
Fixes: 2473238eac95 ("ihex: add support for CS:IP/EIP records") Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
On Wed, Apr 15, 2015 at 05:41:26PM +0200, Nicolas Dichtel wrote:
> Le 15/04/2015 15:57, Herbert Xu a écrit :
> >On Wed, Apr 15, 2015 at 06:22:29PM +0800, Herbert Xu wrote:
> [snip]
> >Subject: skbuff: Do not scrub skb mark within the same name space
> >
> >The commit ea23192e8e577dfc51e0f4fc5ca113af334edff9 ("tunnels:
> Maybe add a Fixes tag?
> Fixes: ea23192e8e57 ("tunnels: harmonize cleanup done on skb on rx path")
>
> >harmonize cleanup done on skb on rx path") broke anyone trying to
> >use netfilter marking across IPv4 tunnels. While most of the
> >fields that are cleared by skb_scrub_packet don't matter, the
> >netfilter mark must be preserved.
> >
> >This patch rearranges skb_scurb_packet to preserve the mark field.
> nit: s/scurb/scrub
>
> Else it's fine for me.
Sure.
PS I used the wrong email for James the first time around. So
let me repeat the question here. Should secmark be preserved
or cleared across tunnels within the same name space? In fact,
do our security models even support name spaces?
---8<---
The commit ea23192e8e577dfc51e0f4fc5ca113af334edff9 ("tunnels:
harmonize cleanup done on skb on rx path") broke anyone trying to
use netfilter marking across IPv4 tunnels. While most of the
fields that are cleared by skb_scrub_packet don't matter, the
netfilter mark must be preserved.
This patch rearranges skb_scrub_packet to preserve the mark field.
Fixes: ea23192e8e57 ("tunnels: harmonize cleanup done on skb on rx path") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
If CONFIG_ARCH_DMA_ADDR_T_64BIT enabled for x86 systems and physical
memory is more than 4GB, dma_map_page may return a valid memory
address which greater than 0xffffffff. As a result, the mlx5 device page
allocator RB tree will be initialized with valid addresses greater than
0xfffffff.
However, (addr & PAGE_MASK) set the high four bytes to zeros. So, it's
impossible for the function, free_4k, to release the pages whose
addresses greater than 4GB. Memory leaks. And mlx5_ib module can't
release the pages when user try to remove the module, as a result,
system hang.
[root@rdma05 root]# dmesg | grep addr | head
addr = 3fe384000
addr & PAGE_MASK = fe384000
[root@rdma05 root]# rmmod mlx5_ib <---- hang on
---------------------- cosnole log -----------------
mlx5_ib 0000:04:00.0: irq 138 for MSI/MSI-X
alloc irq_desc for 139 on node -1
alloc kstat_irqs on node -1
mlx5_ib 0000:04:00.0: irq 139 for MSI/MSI-X
0000:04:00.0:free_4k:221:(pid 1519): page not found
0000:04:00.0:free_4k:221:(pid 1519): page not found
0000:04:00.0:free_4k:221:(pid 1519): page not found
0000:04:00.0:free_4k:221:(pid 1519): page not found
---------------------- cosnole log -----------------
Fixes: bf0bf77f6519 ('mlx5: Support communicating arbitrary host page size to firmware') Signed-off-by: Honggang Li <honli@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The current code decreases from the mss size (which is the gso_size
from the kernel skb) the size of the packet headers.
It shouldn't do that because the mss that comes from the stack
(e.g IPoIB) includes only the tcp payload without the headers.
The result is indication to the HW that each packet that the HW sends
is smaller than what it could be, and too many packets will be sent
for big messages.
An easy way to demonstrate one more aspect of the problem is by
configuring the ipoib mtu to be less than 2*hlen (2*56) and then
run app sending big TCP messages. This will tell the HW to send packets
with giant (negative value which under unsigned arithmetics becomes
a huge positive one) length and the QP moves to SQE state.
Fixes: b832be1e4007 ('IB/mlx4: Add IPoIB LSO support') Reported-by: Matthew Finlay <matt@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Current -next fails to link an ARM allmodconfig because drivers that use
the core recovery functions can be built as modules but those functions
are not exported:
Fixes: 5f9296ba21b3c (i2c: Add bus recovery infrastructure) Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
This problem appears to have been introduced in 2.6.29 by commit 93197a36a9c1 "Rewrite sysfs processor cache info code".
This caused lscpu to error out on at least e500v2 devices, eg:
error: cannot open /sys/devices/system/cpu/cpu0/cache/index2/size: No such file or directory
Some embedded powerpc systems use cache-size in DTS for the unified L2
cache size, not d-cache-size, so we need to allow for both DTS names.
Added a new CACHE_TYPE_UNIFIED_D cache_type_info structure to handle
this.
Fixes: 93197a36a9c1 ("powerpc: Rewrite sysfs processor cache info code") Signed-off-by: Dave Olson <olson@cumulusnetworks.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
kvm_write_guest_cached() does not mark all written pages as dirty and
code comments in kvm_gfn_to_hva_cache_init() talk about NULL memslot
with cross page accesses. Fix all the easy way.
The check is '<= 1' to have the same result for 'len = 0' cache anywhere
in the page. (nr_pages_needed is 0 on page boundary.)
Fixes: 8f964525a121 ("KVM: Allow cross page reads and writes from cached translations.") Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Message-Id: <20150408121648.GA3519@potion.brq.redhat.com> Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The patch to add it_page_shift incorrectly changed the increment of
uaddr to use it_page_shift, rather then (1 << it_page_shift).
This broke booting on at least some Cell blades, as the iommu was
basically non-functional.
Fixes: 3a553170d35d ("powerpc/iommu: Add it_page_shift field to determine iommu page size") Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
When the kernel deleted a vti6 interface, this interface was not removed from
the tunnels list. Thus, when the ip6_vti module was removed, this old interface
was found and the kernel tried to delete it again. This was leading to a kernel
panic.
Fixes: 61220ab34948 ("vti6: Enable namespace changing") Signed-off-by: Yao Xiwei <xiwei.yao@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Looking over the implementation for jhash2 and comparing it to jhash_3words
I realized that the two hashes were in fact very different. Doing a bit of
digging led me to "The new jhash implementation" in which lookup2 was
supposed to have been replaced with lookup3.
In reviewing the patch I noticed that jhash2 had originally initialized a
and b to JHASH_GOLDENRATIO and c to initval, but after the patch a, b, and
c were initialized to initval + (length << 2) + JHASH_INITVAL. However the
changes in jhash_3words simply replaced the initialization of a and b with
JHASH_INITVAL.
This change corrects what I believe was an oversight so that a, b, and c in
jhash_3words all have the same value added consisting of initval + (length
<< 2) + JHASH_INITVAL so that jhash2 and jhash_3words will now produce the
same hash result given the same inputs.
Fixes: 60d509c823cca ("The new jhash implementation") Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Normally, when a CPU wants to clear a cache line to zero in the external
L2 cache, it would generate bus cycles to write each word as it would do
with any other data access.
However, a Cortex A9 connected to a L2C-310 has a specific feature where
the CPU can detect this operation, and signal that it wants to zero an
entire cache line. This feature, known as Full Line of Zeros (FLZ),
involves a non-standard AXI signalling mechanism which only the L2C-310
can properly interpret.
There are separate enable bits in both the L2C-310 and the Cortex A9 -
the L2C-310 needs to be enabled and have the FLZ enable bit set in the
auxiliary control register before the Cortex A9 has this feature
enabled.
Unfortunately, the suspend code was not respecting this - it's not
obvious from the code:
swsusp_arch_suspend()
cpu_suspend() /* saves the Cortex A9 auxiliary control register */
arch_save_image()
soft_restart() /* turns off FLZ in Cortex A9, and disables L2C */
cpu_resume() /* restores the Cortex A9 registers, inc auxcr */
At this point, we end up with the L2C disabled, but the Cortex A9 with
FLZ enabled - which means any memset() or zeroing of a full cache line
will fail to take effect.
A similar issue exists in the resume path, but it's slightly more
complex:
swsusp_arch_suspend()
cpu_suspend() /* saves the Cortex A9 auxiliary control register */
arch_save_image() /* image with A9 auxcr saved */
...
swsusp_arch_resume()
call_with_stack()
arch_restore_image() /* restores image with A9 auxcr saved above */
soft_restart() /* turns off FLZ in Cortex A9, and disables L2C */
cpu_resume() /* restores the Cortex A9 registers, inc auxcr */
Again, here we end up with the L2C disabled, but Cortex A9 FLZ enabled.
There's no need to turn off the L2C in either of these two paths; there
are benefits from not doing so - for example, the page copies will be
faster with the L2C enabled.
Hence, fix this by providing a variant of soft_restart() which can be
used without turning the L2 cache controller off, and use it in both
of these paths to keep the L2C enabled across the respective resume
transitions.
Fixes: 8ef418c7178f ("ARM: l2c: trial at enabling some Cortex-A9 optimisations") Reported-by: Sean Cross <xobs@kosagi.com> Tested-by: Sean Cross <xobs@kosagi.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
For cases where total length of an input SGs is not same as
length of the input data for encryption, omap-aes driver
crashes. This happens in the case when IPsec is trying to use
omap-aes driver.
To avoid this, we copy all the pages from the input SG list
into a contiguous buffer and prepare a single element SG list
for this buffer with length as the total bytes to crypt, which is
similar thing that is done in case of unaligned lengths.
Fixes: 6242332ff2f3 ("crypto: omap-aes - Add support for cases of unaligned lengths") Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
If a provider advertizes a zero max_fast_reg_page_list_len, FRWR
depth detection loops forever. Instead of just failing the mount,
try other memory registration modes.
Fixes: 0fc6c4e7bb28 ("xprtrdma: mind the device's max fast . . .") Reported-by: Devesh Sharma <Devesh.Sharma@Emulex.Com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com> Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com> Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
[ luis: backported to 3.16:
- devattr struct isn't a pointer ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
time_init invokes timer64_init (which is __init annotation)
since all of these are invoked at init time, lets maintain
consistency by ensuring time_init is marked appropriately
as well.
This fixes the following warning with CONFIG_DEBUG_SECTION_MISMATCH=y
WARNING: vmlinux.o(.text+0x3bfc): Section mismatch in reference from the function time_init() to the function .init.text:timer64_init()
The function time_init() references
the function __init timer64_init().
This is often because time_init lacks a __init
annotation or the annotation of timer64_init is wrong.
Fixes: 546a39546c64 ("C6X: time management") Signed-off-by: Nishanth Menon <nm@ti.com> Signed-off-by: Mark Salter <msalter@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
After writing the page tables, we use __inval_cache_range to invalidate
any stale cache entries. Strongly Ordered memory accesses are not
ordered w.r.t. cache maintenance instructions, and hence explicit memory
barriers are required to provide this ordering. However,
__inval_cache_range was written to be used on Normal Cacheable memory
once the MMU and caches are on, and does not have any barriers prior to
the DC instructions.
This patch adds a DMB between the page tables being written and the
corresponding cachelines being invalidated, ensuring that the
invalidation makes the new data visible to subsequent cacheable
accesses. A barrier is not required before the prior invalidate as we do
not access the page table memory area prior to this, and earlier
barriers in preserve_boot_args and set_cpu_boot_mode_flag ensures
ordering w.r.t. any stores performed prior to entering Linux.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Fixes: c218bca74eeafa2f ("arm64: Relax the kernel cache requirements for boot") Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Currently, a RCG's M/N counter (used for fraction division) is
set to either 'bypass' (counter disabled) or 'dual edge' (counter
enabled) based on whether the corresponding rcg struct has a mnd
field specified and a non-zero N.
In the case where M and N are the same value, the M/N counter is
still enabled by code even though no division takes place.
Leaving the RCG in such a state can result in improper behavior.
This was observed with the DSI pixel clock RCG when M and N were
both set to 1.
Add an additional check (M != N) to enable the M/N counter only
when it's needed for fraction division.
Signed-off-by: Archit Taneja <architt@codeaurora.org> Fixes: bcd61c0f535a (clk: qcom: Add support for root clock
generators (RCGs)) Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
... but in case in future we might use facilities such as LTO, then
OPTIMIZER_HIDE_VAR() is not sufficient to protect gcc from a possible
eviction of the memset(). We have to use a compiler barrier instead.
Minimal test example when we assume memzero_explicit() would *not* be
a call, but would have been *inlined* instead:
As GMUX depends on IO for iGP to be enabled and active, lock the IO at
vgaarb level. This should prevent GPU driver for dGPU to disable IO for
iGP while it tries to own legacy VGA IO.
This fixes usage of backlight control combined with closed nvidia
driver on some Apple dual-GPU (intel/nvidia) systems.
On those systems loading nvidia driver disables intel IO decoding,
disabling the gmux backlight controls as a side effect.
Prior to commits moving boot_vga from (optional) efifb to less optional
vgaarb this mis-behavior could be avoided by using right kernel config
(efifb enabled but vgaarb disabled).
This patch explicitly does not try to trigger vgaarb changes in order
to avoid confusing already running graphics drivers. If IO has been
mis-configured by vgaarb gmux will thus fail to probe.
It is expected to load/probe gmux prior to graphics drivers.
Fixes: ce027dac592c0ada241ce0f95ae65856828ac450 # nvidia interaction
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=86121 Reported-by: Petri Hodju <petrihodju@yahoo.com> Tested-by: Petri Hodju <petrihodju@yahoo.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Matthew Garrett <matthew.garrett@nebula.com> Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org> Signed-off-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Set the SYSCIER as per the values indicated in the documentation.
The value previously used appears to been copied from the r8a7779
implementation but on closer inspection is not correct for the r8a7790.
In struct wl18xx_acx_rx_rate_stat, rx_frames_per_rates field is an
array, not a number. This means WL18XX_DEBUGFS_FWSTATS_FILE can't be
used to display this field in debugfs (it would display a pointer, not
the actual data). Use WL18XX_DEBUGFS_FWSTATS_FILE_ARRAY instead.
This bug has been found by adding a __printf attribute to
wl1271_format_buffer. gcc complained about "format '%u' expects
argument of type 'unsigned int', but argument 5 has type 'u32 *'".
Fixes: c5d94169e818 ("wl18xx: use new fw stats structures") Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
There is a race condition between e1000_change_mtu's cleanups and
netpoll, when we change the MTU across jumbo size:
Changing MTU frees all the rx buffers:
e1000_change_mtu -> e1000_down -> e1000_clean_all_rx_rings ->
e1000_clean_rx_ring
Then, close to the end of e1000_change_mtu:
pr_info -> ... -> netpoll_poll_dev -> e1000_clean ->
e1000_clean_rx_irq -> e1000_alloc_rx_buffers -> e1000_alloc_frag
And when we come back to do the rest of the MTU change:
e1000_up -> e1000_configure -> e1000_configure_rx ->
e1000_alloc_jumbo_rx_buffers
alloc_jumbo finds the buffers already != NULL, since data (shared with
page in e1000_rx_buffer->rxbuf) has been re-alloc'd, but it's garbage,
or at least not what is expected when in jumbo state.
This results in an unusable adapter (packets don't get through), and a
NULL pointer dereference on the next call to e1000_clean_rx_ring
(other mtu change, link down, shutdown):
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81194d6e>] put_compound_page+0x7e/0x330
Tested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Grant Likely <grant.likely@linaro.org>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Calling unlazy_walk() in walk_component() and do_last() when we find
a symlink that needs to be followed doesn't acquire a reference to vfsmount.
That's fine when the symlink is on the same vfsmount as the parent directory
(which is almost always the case), but it's not always true - one _can_
manage to bind a symlink on top of something. And in such cases we end up
with excessive mntput().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
acpi_scan_is_offline() may be called under the physical_node_lock
lock of the given device object's parent, so prevent lockdep from
complaining about that by annotating that instance with
SINGLE_DEPTH_NESTING.
Fixes: caa73ea158de (ACPI / hotplug / driver core: Handle containers in a special way) Reported-and-tested-by: Xie XiuQi <xiexiuqi@huawei.com> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
I noticed this only by reading the code. To my knowledge it shouldn't
cause any real problems at the moment, since the power well backing this
register remains on across a runtime s/r. This may change once
system-wide s0ix functionality is enabled in the kernel.
v2:
- resend after a missing git add -u :/
Signed-off-by: Imre Deak <imre.deak@intel.com> Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The hardware, according to the specs, is limited to 256 byte transfers,
and current driver has no protections in case users attempt to do larger
transfers. The code will just stomp over status register and mayhem
ensues.
Let's split larger transfers into digestable chunks. Doing this allows
Atmel MXT driver on Pixel 1 function properly (it hasn't since commit 9d8dc3e529a19e427fd379118acd132520935c5d "Input: atmel_mxt_ts -
implement T44 message handling" which tries to consume multiple
touchscreen/touchpad reports in a single transaction).
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Chuck pointed out a problem that crept in with commit 6ffa30d3f734 (nfs:
don't call blocking operations while !TASK_RUNNING). Linux counts tasks
in uninterruptible sleep against the load average, so this caused the
system's load average to be pinned at at least 1 when there was a
NFSv4.1+ mount active.
Not a huge problem, but it's probably worth fixing before we get too
many complaints about it. This patch converts the code back to use
TASK_INTERRUPTIBLE sleep, simply has it flush any signals on each loop
iteration. In practice no one should really be signalling this thread at
all, so I think this is reasonably safe.
With this change, there's also no need to game the hung task watchdog so
we can also convert the schedule_timeout call back to a normal schedule.
Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com> Fixes: commit 6ffa30d3f734 (“nfs: don't call blocking . . .”) Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Commit 523c5b89640e ("i2c: Remove support for legacy PM") removed the PM
ops from the bus type, which causes the pm operations on the s3c2410
adapter device to fail (-ENOSUPP in rpm_callback). The adapter device
doesn't get bound to a driver and as such can't have its own pm_runtime
callbacks. Previously this was fine as the bus callbacks would have been
used, but now this can cause devices which use PM runtime and are
attached over I2C to fail to resume.
This commit fixes this issue by marking all adapter devices with
pm_runtime_no_callbacks, since they can't have any.
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Acked-by: Beata Michalska <b.michalska@samsung.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Fixes: 523c5b89640e Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
master_xfer() method should return number of i2c messages transferred,
but on Rockchip we were usually returning just 1, which caused trouble
with users that actually check number of transferred messages vs.
checking for negative error codes.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Fixed this behaviour by calling register_pernet_subsys(&nfsd_net_ops) before
registering rpc_pipefs_event(...) with the notifier chain.
Signed-off-by: Giuseppe Cantavenera <giuseppe.cantavenera.ext@nokia.com> Signed-off-by: Lorenzo Restelli <lorenzo.restelli.ext@nokia.com> Reviewed-by: Kinlong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
mvsas is giving a General protection fault when it encounters an expander
attached ATA device. Analysis of mvs_task_prep_ata() shows that the driver is
assuming all ATA devices are locally attached and obtaining the phy mask by
indexing the local phy table (in the HBA structure) with the phy id. Since
expanders have many more phys than the HBA, this is causing the index into the
HBA phy table to overflow and returning rubbish as the pointer.
mvs_task_prep_ssp() instead does the phy mask using the port properties.
Mirror this in mvs_task_prep_ata() to fix the panic.
Reported-by: Adam Talbot <ajtalbot1@gmail.com> Tested-by: Adam Talbot <ajtalbot1@gmail.com> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
ptrace_resume() is called when the tracee is still __TASK_TRACED. We set
tracee->exit_code and then wake_up_state() changes tracee->state. If the
tracer's sub-thread does wait() in between, task_stopped_code(ptrace => T)
wrongly looks like another report from tracee.
This confuses debugger, and since wait_task_stopped() clears ->exit_code
the tracee can miss a signal.
Note for stable: the bug is very old, but without 9899d11f6544 "ptrace:
ensure arch_ptrace/ptrace_request can never race with SIGKILL" the fix
should use lock_task_sighand(child).
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Pavel Labath <labath@google.com> Tested-by: Pavel Labath <labath@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
When UNMAP command is issued with DIF protection support enabled,
the protection info for the unmapped region is remain unchanged.
So READ command for the region causes data integrity failure.
This fixes it by invalidating protection info for the unmapped region
by filling with 0xff pattern. This change also adds helper function
fd_do_prot_fill() in order to reduce code duplication with existing
fd_format_prot().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
In fd_do_prot_rw(), it allocates prot_buf which is used to copy from
se_cmd->t_prot_sg by sbc_dif_copy_prot(). The SG table for prot_buf
is also initialized by allocating 'se_cmd->t_prot_nents' entries of
scatterlist and setting the data length of each entry to PAGE_SIZE
at most.
However if se_cmd->t_prot_sg contains a clustered entry (i.e.
sg->length > PAGE_SIZE), the SG table for prot_buf can't be
initialized correctly and sbc_dif_copy_prot() can't copy to prot_buf.
(This actually happened with TCM loopback fabric module)
As prot_buf is allocated by kzalloc() and it's physically contiguous,
we only need a single scatterlist entry.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
When CONFIG_DEBUG_SG=y and DIF protection support enabled, kernel
BUG()s are triggered due to the following two issues:
1) prot_sg is not initialized by sg_init_table().
When CONFIG_DEBUG_SG=y, scatterlist helpers check sg entry has a
correct magic value.
2) vmalloc'ed buffer is passed to sg_set_buf().
sg_set_buf() uses virt_to_page() to convert virtual address to struct
page, but it doesn't work with vmalloc address. vmalloc_to_page()
should be used instead. As prot_buf isn't usually too large, so
fix it by allocating prot_buf by kmalloc instead of vmalloc.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
In a call to ib_umem_get(), if address is 0x0 and size is
already page aligned, check added in commit 8494057ab5e4
("IB/uverbs: Prevent integer overflow in ib_umem_get address
arithmetic") will refuse to register a memory region that
could otherwise be valid (provided vm.mmap_min_addr sysctl
and mmap_low_allowed SELinux knobs allow userspace to map
something at address 0x0).
This patch allows back such registration: ib_umem_get()
should probably don't care of the base address provided it
can be pinned with get_user_pages().
There's two possible overflows, in (addr + size) and in
PAGE_ALIGN(addr + size), this patch keep ensuring none
of them happen while allowing to pin memory at address
0x0. Anyway, the case of size equal 0 is no more (partially)
handled as 0-length memory region are disallowed by an
earlier check.
Link: http://mid.gmane.org/cover.1428929103.git.ydroneaud@opteya.com Cc: Shachar Raindel <raindel@mellanox.com> Cc: Jack Morgenstein <jackm@mellanox.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
If ib_umem_get() is called with a size equal to 0 and an
non-page aligned address, one page will be pinned and a
0-sized umem will be returned to the caller.
This should not be allowed: it's not expected for a memory
region to have a size equal to 0.
This patch adds a check to explicitly refuse to register
a 0-sized region.
Link: http://mid.gmane.org/cover.1428929103.git.ydroneaud@opteya.com Cc: Shachar Raindel <raindel@mellanox.com> Cc: Jack Morgenstein <jackm@mellanox.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
With CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE enabled, and a normal top-down
address allocation strategy, load_elf_binary() will attempt to map a PIE
binary into an address range immediately below mm->mmap_base.
Unfortunately, load_elf_ binary() does not take account of the need to
allocate sufficient space for the entire binary which means that, while
the first PT_LOAD segment is mapped below mm->mmap_base, the subsequent
PT_LOAD segment(s) end up being mapped above mm->mmap_base into the are
that is supposed to be the "gap" between the stack and the binary.
Since the size of the "gap" on x86_64 is only guaranteed to be 128MB this
means that binaries with large data segments > 128MB can end up mapping
part of their data segment over their stack resulting in corruption of the
stack (and the data segment once the binary starts to run).
Any PIE binary with a data segment > 128MB is vulnerable to this although
address randomization means that the actual gap between the stack and the
end of the binary is normally greater than 128MB. The larger the data
segment of the binary the higher the probability of failure.
Fix this by calculating the total size of the binary in the same way as
load_elf_interp().
Signed-off-by: Michael Davidson <md@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Commit 61f77eda9bbf ("mm/hugetlb: reduce arch dependent code around
follow_huge_*") broke follow_huge_pmd() on s390, where pmd and pte
layout differ and using pte_page() on a huge pmd will return wrong
results. Using pmd_page() instead fixes this.
All architectures that were touched by that commit have pmd_page()
defined, so this should not break anything on other architectures.
Fixes: 61f77eda "mm/hugetlb: reduce arch dependent code around follow_huge_*" Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@suse.cz>, Andrea Arcangeli <aarcange@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
This patch fixes a bug for COMPARE_AND_WRITE handling with
fabrics using SCF_PASSTHROUGH_SG_TO_MEM_NOALLOC.
It adds the missing allocation for cmd->t_bidi_data_sg within
transport_generic_new_cmd() that is used by COMPARE_AND_WRITE
for the initial READ payload, even if the fabric is already
providing a pre-allocated buffer for cmd->t_data_sg.
Also, fix zero-length COMPARE_AND_WRITE handling within the
compare_and_write_callback() and target_complete_ok_work()
to queue the response, skipping the initial READ.
This fixes COMPARE_AND_WRITE emulation with loopback, vhost,
and xen-backend fabric drivers using SG_TO_MEM_NOALLOC.
Reported-by: Christoph Hellwig <hch@lst.de> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
kvm_init_msr_list is currently called before hardware_setup. As a result,
vmx_mpx_supported always returns false when kvm_init_msr_list checks whether to
save MSR_IA32_BNDCFGS.
Move kvm_init_msr_list after vmx_hardware_setup is called to fix this issue.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Message-Id: <1428864435-4732-1-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
drm/i915: Stop gathering error states for CS error interrupts
but just clearing is apparently not enough: A sufficiently dead gpu
left behind by firmware (*cough* coreboot *cough*) can keep the gpu in
an endless loop of such interrupts, eventually leading to the nmi
firing. And definitely to what looks like a machine hang.
Since we don't even enable these interrupts on gen5+ let's do the same
on earlier platforms.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=93171 Tested-by: Mono <mono-for-kernel-org@donderklumpen.de> Tested-by: info@gluglug.org.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
ACPI_MTX_TABLES is acquired and released by the callers of
acpi_tb_install_standard_table() so releasing it in the function itself is
causing the following error in Linux kernel if the table is reloaded:
ACPI Error: Mutex [0x2] is not acquired, cannot release (20141107/utmutex-321)
Call Trace:
[<ffffffff81b0bd48>] dump_stack+0x4f/0x7b
[<ffffffff81546bf5>] acpi_ut_release_mutex+0x47/0x67
[<ffffffff81544357>] acpi_load_table+0x73/0xcb
Link: https://github.com/acpica/acpica/commit/c70434d4 Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
We cap 32bit userspace backtraces to PERF_MAX_STACK_DEPTH
(currently 127), but we forgot to do the same for 64bit backtraces.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
If we attempt to clone a 0 length region into a file we can end up
inserting a range in the inode's extent_io tree with a start offset
that is greater then the end offset, which triggers immediately the
following warning:
So just bail out of the clone ioctl if the length of the region to clone
is zero, without locking any extent range, in order to prevent this issue
(same behaviour as a pwrite with a 0 length for example).
This is trivial to reproduce. For example, the steps for the test I just
made for fstests:
mkfs.btrfs -f SCRATCH_DEV
mount SCRATCH_DEV $SCRATCH_MNT
If we pass a length of 0 to the extent_same ioctl, we end up locking an
extent range with a start offset greater then its end offset (if the
destination file's offset is greater than zero). This results in a warning
from extent_io.c:insert_state through the following call chain:
This leads to an infinite loop when evicting the inode. This is the same
problem that my previous patch titled
"Btrfs: fix inode eviction infinite loop after cloning into it" addressed
but for the extent_same ioctl instead of the clone ioctl.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Sebastian reported a crash caused by a jump label mismatch after resume.
This happens because we do not save the kernel text section during suspend
and therefore also do not restore it during resume, but use the kernel image
that restores the old system.
This means that after a suspend/resume cycle we lost all modifications done
to the kernel text section.
The reason for this is the pfn_is_nosave() function, which incorrectly
returns that read-only pages don't need to be saved. This is incorrect since
we mark the kernel text section read-only.
We still need to make sure to not save and restore pages contained within
NSS and DCSS segment.
To fix this add an extra case for the kernel text section and only save
those pages if they are not contained within an NSS segment.
Fixes the following crash (and the above bugs as well):
Reported-and-tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
- don't lock lp->lock in the iss_net_timer for the call of iss_net_poll,
it will lock it itself;
- invert order of lp->lock and opened_lock acquisition in the
iss_net_open to make it consistent with iss_net_poll;
- replace spin_lock with spin_lock_bh when acquiring locks used in
iss_net_timer from non-atomic context;
- replace spin_lock_irqsave with spin_lock_bh in the iss_net_start_xmit
as the driver doesn't use lp->lock in the hard IRQ context;
- replace __SPIN_LOCK_UNLOCKED(lp.lock) with spin_lock_init, otherwise
lockdep is unhappy about using non-static key.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
First check for rate == 0 in set_rate and round_rate to avoid div by zero.
Then, in order to get the closest rate, round all divisions to the closest
result instead of rounding them down.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Michael Turquette <mturquette@linaro.org>
[ luis: 3.16 prereq for: 4591243102fa "clk: at91: usb: propagate rate modification to the parent clk" ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
The number of resets controls is 32 times the number of peripheral
register banks rather than 32 times the number of clocks. This reduces
(drastically) the number of reset controls registered from 10080 (315
clocks * 32) to 224 (6 peripheral register banks * 32).
This also fixes a potential crash because trying to use any of the
excess reset controls (224-10079) would have caused accesses beyond
the array bounds of the peripheral register banks definition array.
Cc: Peter De Schrijver <pdeschrijver@nvidia.com> Cc: Prashant Gaikwad <pgaikwad@nvidia.com> Fixes: 6d5b988e7dc5 ("clk: tegra: implement a reset driver") Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
memsize denotes the amount of RAM we can access from kseg{0,1} and
that should be up to 256M. In case the bootloader reports a value
higher than that (perhaps reporting all the available RAM) it's best
if we fix it ourselves and just warn the user about that. This is
usually a problem with the bootloader and/or its environment.
[ralf@linux-mips.org: Remove useless parens as suggested bei Sergei.
Reformat long pr_warn statement to fit into 80 column limit.]
The functions snd_emu10k1_proc_spdif_read and snd_emu1010_fpga_read
acquire the emu_lock before accessing the FPGA. The function used
to access the FPGA (snd_emu1010_fpga_read) also tries to take
the emu_lock which causes a deadlock.
Remove the outer locking in the proc-functions (guarding only the
already safe fpga read) to prevent this deadlock.
[removed superfluous flags variables too -- tiwai]
Signed-off-by: Michael Gernoth <michael@gernoth.net> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
We may exit this function without properly freeing up the maapings
we may have acquired. Fix the bug.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
img_ir_remove() passes a pointer to the ISR function as the 2nd
parameter to irq_free() instead of a pointer to the device data
structure.
This issue causes unloading img-ir module to fail with the below
warning after building and loading img-ir as a module.
Level IRQ handlers and edge IRQ handler are managed by tow different
sets of registers. But currently the driver uses the same mask for the
both registers. It lead to issues with the following scenario:
First, an IRQ is requested on a GPIO to be triggered on front. After,
this an other IRQ is requested for a GPIO of the same bank but
triggered on level. Then the first one will be also setup to be
triggered on level. It leads to an interrupt storm.
The different kind of handler are already associated with two
different irq chip type. With this patch the driver uses a private
mask for each one which solves this issue.
It has been tested on an Armada XP based board and on an Armada 375
board. For the both boards, with this patch is applied, there is no
such interrupt storm when running the previous scenario.
This bug was already fixed but in a different way in the legacy
version of this driver by Evgeniy Dushistov: 9ece8839b1277fb9128ff6833411614ab6c88d68 "ARM: orion: Fix for certain
sequence of request_irq can cause irq storm". The fact the new version
of the gpio drive could be affected had been discussed there:
http://thread.gmane.org/gmane.linux.ports.arm.kernel/344670/focus=364012
Reported-by: Evgeniy A. Dushistov <dushistov@mail.ru> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
If we were migrated right after __getcpu, but before reading the
migration_count, we wouldn't notice that we read TSC of a different
VCPU, nor that KVM's bug made pvti invalid, as only migration_count
on source VCPU is increased.
Change vdso instead of updating migration_count on destination.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Fixes: 0a4e6be9ca17 ("x86: kvm: Revert "remove sched notifier for cross-cpu migrations"")
Message-Id: <1428000263-11892-1-git-send-email-rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Before we reach to connection established we may get an
error event. In this case the core won't teardown this
connection (never established it), so we take care of freeing
it ourselves.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
This hang was a result of a missing command put when
a DIF error occurred during a rdma read (and we sent
an CHECK_CONDITION error without passing it to the
backend).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>