drivers/pwm/core.c: In function ‘pwm_get’:
drivers/pwm/core.c:610: warning: ‘polarity’ may be used uninitialized in this function
drivers/pwm/core.c:609: warning: ‘period’ may be used uninitialized in this function
While these are false positives, we can get rid of them by refactoring
the code to store a pointer to the best match, as suggested before by
Thierry Reding. This does require moving the mutex_unlock() down.
Fixes: d717ea73e36dd565 ("pwm: Fix period and polarity in pwm_get() for non-perfect matches") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kernel/time/jiffies.c provides a default clocksource_default_clock()
definition explicitly marked "weak". arch/s390 provides its own definition
intended to override the default, but the "weak" attribute on the
declaration applied to the s390 definition as well, so the linker chose one
based on link order (see 10629d711ed7 ("PCI: Remove __weak annotation from
pcibios_get_phb_of_node decl")).
Remove the "weak" attribute from the clocksource_default_clock()
declaration so we always prefer a non-weak definition over the weak one,
independent of link order.
Fixes: f1b82746c1e9 ("clocksource: Cleanup clocksource selection") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: John Stultz <john.stultz@linaro.org> Acked-by: Ingo Molnar <mingo@kernel.org> CC: Daniel Lezcano <daniel.lezcano@linaro.org> CC: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kernel/debug/debug_core.c provides a default kgdb_arch_pc() definition
explicitly marked "weak". Several architectures provide their own
definitions intended to override the default, but the "weak" attribute on
the declaration applied to the arch definitions as well, so the linker
chose one based on link order (see 10629d711ed7 ("PCI: Remove __weak
annotation from pcibios_get_phb_of_node decl")).
Remove the "weak" attribute from the declaration so we always prefer a
non-weak definition over the weak one, independent of link order.
fs/proc/vmcore.c provides default definitions explicitly marked "weak".
arch/s390 provides its own definitions intended to override the default
ones, but the "weak" attribute on the declarations applied to the s390
definitions as well, so the linker chose one based on link order (see 10629d711ed7 ("PCI: Remove __weak annotation from pcibios_get_phb_of_node
decl")).
Remove the "weak" attribute from the declarations so we always prefer a
non-weak definition over the weak one, independent of link order.
Fixes: be8a8d069e50 ("vmcore: introduce ELF header in new memory feature") Fixes: 9cb218131de1 ("vmcore: introduce remap_oldmem_pfn_range()") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vivek Goyal <vgoyal@redhat.com> CC: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drivers/base/memory.c provides a default memory_block_size_bytes()
definition explicitly marked "weak". Several architectures provide their
own definitions intended to override the default, but the "weak" attribute
on the declaration applied to the arch definitions as well, so the linker
chose one based on link order (see 10629d711ed7 ("PCI: Remove __weak
annotation from pcibios_get_phb_of_node decl")).
Remove the "weak" attribute from the declaration so we always prefer a
non-weak definition over the weak one, independent of link order.
Fixes: 41f107266b19 ("drivers: base: Add prototype declaration to the header file") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> CC: Rashika Kheria <rashika.kheria@gmail.com> CC: Nathan Fontenot <nfont@austin.ibm.com> CC: Anton Blanchard <anton@au1.ibm.com> CC: Heiko Carstens <heiko.carstens@de.ibm.com> CC: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
bcm_sysport_resume() was missing an UniMAC reset which can lead to
various receive FIFO corruptions coming out of a suspend cycle. If the
RX FIFO is stuck, it will deliver corrupted/duplicate packets towards
the host CPU interface.
This could be reproduced on crowded network and when Wake-on-LAN is
enabled for this particular interface because the switch still forwards
packets towards the host CPU interface (SYSTEMPORT), and we had to leave
the UniMAC RX enable bit on to allow matching MagicPackets.
Once we re-enter the resume function, there is a small window during
which the UniMAC receive is still enabled, and we start queueing
packets, but the RDMA and RBUF engines are not ready, which leads to
having packets stuck in the UniMAC RX FIFO, ultimately delivered towards
the host CPU as corrupted.
Fixes: 40755a0fce17 ("net: systemport: add suspend and resume support") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is currently a small window during which the SYSTEMPORT adapter
enables its RX interrupts without having enabled its NAPI handler, which
can result in packets to be discarded during interface bringup.
A similar but more serious window exists in bcm_sysport_resume() during
which we can have the RDMA engine not fully prepared to receive packets
and yet having RX interrupts enabled.
Fix this my moving the RX interrupt enable down to
bcm_sysport_netif_start() after napi_enable() for the RX path is called,
which fixes both call sites: bcm_sysport_open() and
bcm_sysport_resume().
Fixes: b02e6d9ba7ad ("net: systemport: add bcm_sysport_netif_{enable,stop}") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Disabling DCBx in firmware automatically enables DCBx for control via host
lldp agents. Wait for an explicit setstate call from an lldp agents to enable
DCBx instead.
Fixes: 76bcb31efc06 ("cxgb4 : Add DCBx support codebase and dcbnl_ops") Signed-off-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch removes the assumption made previously, that we only need to
check the delegation stateid when it matches the stateid on a cached
open.
If we believe that we hold a delegation for this file, then we must assume
that its stateid may have been revoked or expired too. If we don't test it
then our state recovery process may end up caching open/lock state in a
situation where it should not.
We therefore rename the function nfs41_clear_delegation_stateid as
nfs41_check_delegation_stateid, and change it to always run through the
delegation stateid test and recovery process as outlined in RFC5661.
Any attempt to call nfs_remove_bad_delegation() while a delegation is being
returned is currently a no-op. This means that we can end up looping
forever in nfs_end_delegation_return() if something causes the delegation
to be revoked.
This patch adds a mechanism whereby the state recovery code can communicate
to the delegation return code that the delegation is no longer valid and
that it should not be used when reclaiming state.
It also changes the return value for nfs4_handle_delegation_recall_error()
to ensure that nfs_end_delegation_return() does not reattempt the lock
reclaim before state recovery is done.
Variable 'err' needn't be initialized when nfs_getattr() uses it to
check whether it should call generic_fillattr() or not. That can result
in spurious error returns. Initialize 'err' properly.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
NFSv4.0 does not have TEST_STATEID/FREE_STATEID functionality, so
unlike NFSv4.1, the recovery procedure when stateids have expired or
have been revoked requires us to just forget the delegation.
md_check_recovery will skip any recovery and also clear
MD_RECOVERY_NEEDED if MD_RECOVERY_FROZEN is set.
So when we clear _FROZEN, we must set _NEEDED and ensure that
md_check_recovery gets run.
Otherwise we could miss out on something that is needed.
In particular, this can make it impossible to remove a
failed device from an array is the 'recovery-needed' processing
didn't happen.
Suitable for stable kernels since 3.13.
Reported-and-tested-by: Joe Lawrence <joe.lawrence@stratus.com> Fixes: 30b8feb730f9b9b3c5de02580897da03f59b6b16 Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When choosing a random address, the current implementation does not take into
account the reversed space for .bss and .brk sections. Thus the relocated kernel
may overlap other components in memory. Here is an example of the overlap from a
x86_64 kernel in qemu (the ranges of physical addresses are presented):
This patch prevents the above situation by requiring a larger space when looking
for a random kernel base, so that existing logic can effectively avoids the
overlap.
[kees: switched to perl to avoid hex translation pain in mawk vs gawk]
[kees: calculated overlap without relocs table]
Fixes: 82fa9637a2 ("x86, kaslr: Select random position from e820 maps") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Junjie Mao <eternal.n08@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1414762838-13067-1-git-send-email-eternal.n08@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Save the patch while we're running on the BSP instead of later, before
the initrd has been jettisoned. More importantly, on 32-bit we need to
access the physical address instead of the virtual.
This way we actually do find it on the APs instead of having to go
through the initrd each time.
Konrad triggered the following splat below in a 32-bit guest on an AMD
box. As it turns out, in save_microcode_in_initrd_amd() we're using the
*physical* address of the container *after* we have enabled paging and
thus we #PF in load_microcode_amd() when trying to access the microcode
container in the ramdisk range.
Because the ramdisk is exactly there:
[ 0.000000] RAMDISK: [mem 0x35e04000-0x36ef9fff]
and we fault at 0x35e04304.
And since this guest doesn't relocate the ramdisk, we don't do the
computation which will give us the correct virtual address and we end up
with the PA.
So, we should actually be using virtual addresses on 32-bit too by the
time we're freeing the initrd. Do that then!
Memory allocated for 'name' was leaking if required binding properties
were not present.
The memory for 'name' was allocated early at probe with kasprintf(). It
was freed in error paths executed before and after parsing DTS but not
in that error path.
Fix the error path for parsing device tree properties.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Fixes: faffd234cf85 ("bq2415x_charger: Add DT support") Signed-off-by: Sebastian Reichel <sre@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The power_supply_get_by_phandle() on error returns ENODEV or NULL.
The driver later expects obtained pointer to power supply to be
valid or NULL. If it is not NULL then it dereferences it in
bq2415x_notifier_call() which would lead to dereferencing ENODEV-value
pointer.
Properly handle the power_supply_get_by_phandle() error case by
replacing error value with NULL. This indicates that usb charger
detection won't be used.
Fix also memory leak of 'name' if power_supply_get_by_phandle() fails
with NULL and probe should defer.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Fixes: faffd234cf85 ("bq2415x_charger: Add DT support")
[small fix regarding the missing ti,usb-charger-detection info message] Signed-off-by: Sebastian Reichel <sre@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The charger manager obtained in probe references to power supplies for
all chargers with power_supply_get_by_name() for later usage. However
if such charger driver was removed then this reference would point to
old power supply (from driver which was removed).
The charger manager obtained reference to fuel gauge power supply in probe
with power_supply_get_by_name() for later usage. However if fuel gauge
driver was removed and re-added then this reference would point to old
power supply (from driver which was removed).
This lead to accessing old (and probably invalid) memory which could be
observed with:
$ echo "12-0036" > /sys/bus/i2c/drivers/max17042/unbind
$ echo "12-0036" > /sys/bus/i2c/drivers/max17042/bind
$ cat /sys/devices/virtual/power_supply/battery/capacity
[ 240.480084] INFO: task cat:1393 blocked for more than 120 seconds.
[ 240.484799] Not tainted 3.17.0-next-20141007-00028-ge60b6dd79570 #203
[ 240.491782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.499589] cat D c0469530 0 1393 1 0x00000000
[ 240.505947] [<c0469530>] (__schedule) from [<c0469d3c>] (schedule_preempt_disabled+0x14/0x20)
[ 240.514449] [<c0469d3c>] (schedule_preempt_disabled) from [<c046af08>] (mutex_lock_nested+0x1bc/0x458)
[ 240.523736] [<c046af08>] (mutex_lock_nested) from [<c0287a98>] (regmap_read+0x30/0x60)
[ 240.531647] [<c0287a98>] (regmap_read) from [<c032238c>] (max17042_get_property+0x2e8/0x350)
[ 240.540055] [<c032238c>] (max17042_get_property) from [<c03247d8>] (charger_get_property+0x264/0x348)
[ 240.549252] [<c03247d8>] (charger_get_property) from [<c0320764>] (power_supply_show_property+0x48/0x1e0)
[ 240.558808] [<c0320764>] (power_supply_show_property) from [<c027308c>] (dev_attr_show+0x1c/0x48)
[ 240.567664] [<c027308c>] (dev_attr_show) from [<c0141fb0>] (sysfs_kf_seq_show+0x84/0x104)
[ 240.575814] [<c0141fb0>] (sysfs_kf_seq_show) from [<c0140b18>] (kernfs_seq_show+0x24/0x28)
[ 240.584061] [<c0140b18>] (kernfs_seq_show) from [<c0104574>] (seq_read+0x1b0/0x484)
[ 240.591702] [<c0104574>] (seq_read) from [<c00e1e24>] (vfs_read+0x88/0x144)
[ 240.598640] [<c00e1e24>] (vfs_read) from [<c00e1f20>] (SyS_read+0x40/0x8c)
[ 240.605507] [<c00e1f20>] (SyS_read) from [<c000e760>] (ret_fast_syscall+0x0/0x48)
[ 240.612952] 4 locks held by cat/1393:
[ 240.616589] #0: (&p->lock){+.+.+.}, at: [<c01043f4>] seq_read+0x30/0x484
[ 240.623414] #1: (&of->mutex){+.+.+.}, at: [<c01417dc>] kernfs_seq_start+0x1c/0x8c
[ 240.631086] #2: (s_active#31){++++.+}, at: [<c01417e4>] kernfs_seq_start+0x24/0x8c
[ 240.638777] #3: (&map->mutex){+.+...}, at: [<c0287a98>] regmap_read+0x30/0x60
The charger-manager should get reference to fuel gauge power supply on
each use of get_property callback. The thermal zone 'tzd' field of
power supply should not be used because of the same reason.
Additionally this change solves also the issue with nested
thermal_zone_get_temp() calls and related false lockdep positive for
deadlock for thermal zone's mutex [1]. When fuel gauge is used as source of
temperature then the charger manager forwards its get_temp calls to fuel
gauge thermal zone. So actually different mutexes are used (one for
charger manager thermal zone and second for fuel gauge thermal zone) but
for lockdep this is one class of mutex.
The recursion is removed by retrieving temperature through power
supply's get_property().
In case external thermal zone is used ('cm-thermal-zone' property is
present in DTS) the recursion does not exist. Charger manager simply
exports POWER_SUPPLY_PROP_TEMP_AMBIENT property (instead of
POWER_SUPPLY_PROP_TEMP) thus no thermal zone is created for this power
supply.
[1] https://lkml.org/lkml/2014/10/6/309
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Fixes: 3bb3dbbd56ea ("power_supply: Add initial Charger-Manager driver") Signed-off-by: Sebastian Reichel <sre@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sleeping under the rcu_read_lock is bad. This patch fixes it by dropping
the rcu_read_lock before doing the allocation and then reacquiring it
and redoing the dereference before doing the copy. If we find that the
string has somehow grown in the meantime, we'll reallocate and try again.
Reported-by: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When resuming from s2ram on an SMP system without cpufreq operating
points (e.g. there's no "operating-points" property for the CPU node in
DT, or the platform doesn't use DT yet), the kernel crashes when
bringing CPU 1 online:
Sometimes on Dell Latitude laptops psmouse/alps driver receive invalid ALPS
protocol V3 packets with bit7 set in last byte. More often it can be
reproduced on Dell Latitude E6440 or E7440 with closed lid and pushing
cover above touchpad.
If bit7 in last packet byte is set then it is not valid ALPS packet. I was
told that ALPS devices never send these packets. It is not know yet who
send those packets, it could be Dell EC, bug in BIOS and also bug in
touchpad firmware...
With this patch alps driver does not process those invalid packets, but
instead of reporting PSMOUSE_BAD_DATA, getting into out of sync state,
getting back in sync with the next byte and spam dmesg we return
PSMOUSE_FULL_PACKET. If driver is truly out of sync we'll fail the checks
on the next byte and report PSMOUSE_BAD_DATA then.
On some Dell Latitude laptops ALPS device or Dell EC send one invalid byte
in 6 bytes ALPS packet. In this case psmouse driver enter out of sync
state. It looks like that all other bytes in packets are valid and also
device working properly. So there is no need to do full device reset, just
need to wait for byte which match condition for first byte (start of
packet). Because ALPS packets are bigger (6 or 8 bytes) default limit is
small.
This patch increase number of invalid bytes to size of 2 ALPS packets which
psmouse driver can drop before do full reset.
Resetting ALPS devices take some time and when doing reset on some Dell
laptops touchpad, trackstick and also keyboard do not respond. So it is
better to do it only if really necessary.
5th and 6th byte of ALPS trackstick V3 protocol match condition for first
byte of PS/2 3 bytes packet. When driver enters out of sync state and ALPS
trackstick is sending data then driver match 5th, 6th and next 1st bytes as
PS/2.
It basically means if user is using trackstick when driver is in out of
sync state driver will never resync. Processing these bytes as 3 bytes PS/2
data cause total mess (random cursor movements, random clicks) and make
trackstick unusable until psmouse driver decide to do full device reset.
Lot of users reported problems with ALPS devices on Dell Latitude E6440,
E6540 and E7440 laptops. ALPS device or Dell EC for unknown reason send
some invalid ALPS PS/2 bytes which cause driver out of sync. It looks like
that i8042 and psmouse/alps driver always receive group of 6 bytes packets
so there are no missing bytes and no bytes were inserted between valid
ones.
This patch does not fix root of problem with ALPS devices found in Dell
Latitude laptops but it does not allow to process some (invalid)
subsequence of 6 bytes ALPS packets as 3 bytes PS/2 when driver is out of
sync.
So with this patch trackstick input device does not report bogus data when
also driver is out of sync, so trackstick should be usable on those
machines.
The dm-raid superblock (struct dm_raid_superblock) is padded to 512
bytes and that size is being used to read it in from the metadata
device into one preallocated page.
Reading or writing this on a 512-byte sector device works fine but on
a 4096-byte sector device this fails.
Set the dm-raid superblock's size to the logical block size of the
metadata device, because IO at that size is guaranteed too work. Also
add a size check to avoid silent partial metadata loss in case the
superblock should ever grow past the logical block size or PAGE_SIZE.
[includes pointer math fix from Dan Carpenter] Reported-by: "Liuhua Wang" <lwang@suse.com> Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The walk code was using a 'ro_spine' to hold it's locked btree nodes.
But this data structure is designed for the rolling lock scheme, and
as such automatically unlocks blocks that are two steps up the call
chain. This is not suitable for the simple recursive walk algorithm,
which retraces its steps.
This code is only used by the persistent array code, which in turn is
only used by dm-cache. In order to trigger it you need to have a
mapping tree that is more than 2 levels deep; which equates to 8-16
million cache blocks. For instance a 4T ssd with a very small block
size of 32k only just triggers this bug.
The fix just places the locked blocks on the stack, and stops using
the ro_spine altogether.
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The shrinker uses gfp flags to indicate what kind of operation can the
driver wait for. If __GFP_IO flag is present, the driver can wait for
block I/O operations, if __GFP_FS flag is present, the driver can wait on
operations involving the filesystem.
dm-bufio tested for __GFP_IO. However, dm-bufio can run on a loop block
device that makes calls into the filesystem. If __GFP_IO is present and
__GFP_FS isn't, dm-bufio could still block on filesystem operations if it
runs on a loop block device.
The change from __GFP_IO to __GFP_FS supposedly fixes one observed (though
unreproducible) deadlock involving dm-bufio and loop device.
Priority of a merged request is computed by ioprio_best(). If one of the
requests has undefined priority (IOPRIO_CLASS_NONE) and another request
has priority from IOPRIO_CLASS_BE, the function will return the
undefined priority which is wrong. Fix the function to properly return
priority of a request with the defined priority.
Fixes: d58cdfb89ce0c6bd5f81ae931a984ef298dbda20 Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Switch over the msgctl, shmat, shmctl and semtimedop syscalls to use the compat
layer. The problem was found with the debian procenv package, which called
shmctl(0, SHM_INFO, &info);
in which the shmctl syscall then overwrote parts of the surrounding areas on
the stack on which the info variable was stored and thus lead to a segfault
later on.
Additionally fix the definition of struct shminfo64 to use unsigned longs like
the other architectures. This has no impact on userspace since we only have a
32bit userspace up to now.
Signed-off-by: Helge Deller <deller@gmx.de> Cc: John David Anglin <dave.anglin@bell.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Setups that use the blk-mq I/O path can lock up if a host with a single
device that has its door locked enters EH. Make sure to only send the
command to re-lock the door to devices that actually were reset and thus
might have lost their state. Otherwise the EH code might be get blocked
on blk_get_request as all requests for non-reset devices might be in use.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Meelis Roos <meelis.roos@ut.ee> Tested-by: Meelis Roos <meelis.roos@ut.ee> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When experimenting with patches to provide kprobes support for aarch64
smp machines would hang when inserting breakpoints into kernel code.
The hangs were caused by a race condition in the code called by
aarch64_insn_patch_text_sync(). The first processor in the
aarch64_insn_patch_text_cb() function would patch the code while other
processors were still entering the function and incrementing the
cpu_count field. This resulted in some processors never observing the
exit condition and exiting the function. Thus, processors in the
system hung.
The first processor to enter the patching function performs the
patching and signals that the patching is complete with an increment
of the cpu_count field. When all the processors have incremented the
cpu_count field the cpu_count will be num_cpus_online()+1 and they
will return to normal execution.
Fixes: ae16480785de arm64: introduce interfaces to hotpatch kernel and module code Signed-off-by: William Cohen <wcohen@redhat.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For pNFS direct writes, layout driver may dynamically allocate ds_cinfo.buckets.
So we need to take care to free them when freeing dreq.
Ideally this needs to be done inside layout driver where ds_cinfo.buckets
are allocated. But buckets are attached to dreq and reused across LD IO iterations.
So I feel it's OK to free them in the generic layer.
Found by the UC-KLEE tool: A user could supply less input to
firewire-cdev ioctls than write- or write/read-type ioctl handlers
expect. The handlers used data from uninitialized kernel stack then.
This could partially leak back to the user if the kernel subsequently
generated fw_cdev_event_'s (to be read from the firewire-cdev fd)
which notably would contain the _u64 closure field which many of the
ioctl argument structures contain.
The fact that the handlers would act on random garbage input is a
lesser issue since all handlers must check their input anyway.
The fix simply always null-initializes the entire ioctl argument buffer
regardless of the actual length of expected user input. That is, a
runtime overhead of memset(..., 40) is added to each firewirew-cdev
ioctl() call. [Comment from Clemens Ladisch: This part of the stack is
most likely to be already in the cache.]
Remarks:
- There was never any leak from kernel stack to the ioctl output
buffer itself. IOW, it was not possible to read kernel stack by a
read-type or write/read-type ioctl alone; the leak could at most
happen in combination with read()ing subsequent event data.
- The actual expected minimum user input of each ioctl from
include/uapi/linux/firewire-cdev.h is, in bytes:
[0x00] = 32, [0x05] = 4, [0x0a] = 16, [0x0f] = 20, [0x14] = 16,
[0x01] = 36, [0x06] = 20, [0x0b] = 4, [0x10] = 20, [0x15] = 20,
[0x02] = 20, [0x07] = 4, [0x0c] = 0, [0x11] = 0, [0x16] = 8,
[0x03] = 4, [0x08] = 24, [0x0d] = 20, [0x12] = 36, [0x17] = 12,
[0x04] = 20, [0x09] = 24, [0x0e] = 4, [0x13] = 40, [0x18] = 4.
Reported-by: David Ramos <daramos@stanford.edu> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While efi-entry.S mentions that efi_entry() will have relocated the
kernel image, it actually means that efi_entry will have placed a copy
of the kernel in the appropriate location, and until this is branched to
at the end of efi_entry.S, all instructions are executed from the
original image.
Thus while the flush in efi_entry.S does ensure that the copy is visible
to noncacheable accesses, it does not guarantee that this is true for
the image instructions are being executed from. This could have
disasterous effects when the MMU and caches are disabled if the image
has not been naturally evicted to the PoC.
Additionally, due to a missing dsb following the ic ialluis, the new
kernel image is not necessarily clean in the I-cache when it is branched
to, with similar potentially disasterous effects.
This patch adds additional flushing to ensure that the currently
executing stub text is flushed to the PoC and is thus visible to
noncacheable accesses. As it is placed after the instructions cache
maintenance for the new image and __flush_dcache_area already contains a
dsb, we do not need to add a separate barrier to ensure completion of
the icache maintenance.
Comments are updated to clarify the situation with regard to the two
images and the maintenance required for both.
Fixes: 3c7f255039a2ad6ee1e3890505caf0d029b22e29 Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Joel Schopp <joel.schopp@amd.com> Reviewed-by: Roy Franz <roy.franz@linaro.org> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ian Campbell <ijc@hellion.org.uk> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Mark Salter <msalter@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ARM64 currently doesn't fix up faults on the single-byte (strb) case of
__clear_user... which means that we can cause a nasty kernel panic as an
ordinary user with any multiple PAGE_SIZE+1 read from /dev/zero.
i.e.: dd if=/dev/zero of=foo ibs=1 count=1 (or ibs=65537, etc.)
This is a pretty obscure bug in the general case since we'll only
__do_kernel_fault (since there's no extable entry for pc) if the
mmap_sem is contended. However, with CONFIG_DEBUG_VM enabled, we'll
always fault.
if (!down_read_trylock(&mm->mmap_sem)) {
if (!user_mode(regs) && !search_exception_tables(regs->pc))
goto no_context;
retry:
down_read(&mm->mmap_sem);
} else {
/*
* The above down_read_trylock() might have succeeded in
* which
* case, we'll have missed the might_sleep() from
* down_read().
*/
might_sleep();
if (!user_mode(regs) && !search_exception_tables(regs->pc))
goto no_context;
}
Fix that by adding an extable entry for the strb instruction, since it
touches user memory, similar to the other stores in __clear_user.
The branches of the if (i->type & ITER_BVEC) statement in
iov_iter_single_seg_count() are the wrong way around; if ITER_BVEC is
clear then we use i->bvec, when we should be using i->iov. This fixes
it.
In my case, the symptom that this caused was that a KVM guest doing
filesystem operations on a virtual disk would result in one of qemu's
threads on the host going into an infinite loop in
generic_perform_write(). The loop would hit the copied == 0 case and
call iov_iter_single_seg_count() to reduce the number of bytes to try
to process, but because of the error, iov_iter_single_seg_count()
would just return i->count and the loop made no progress and continued
forever.
Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A second product has come to light which makes use of the A0 stepping
of the Armada XP SoC. A0 stepping has a hardware bug in the i2c core
meaning that hardware offload does not work, resulting in the kernel
failing to boot. The quirk detects that the kernel is running on an A0
stepping SoC and disables the use of hardware offload.
Currently the quirk is only enabled for PlatHome Openblocks AX3. The
AX3 has been produced with both A0 and B0 stepping SoCs. The second
product is the Lenovo Iomega IX4-300d. It seems likely that this
device will also swap from A0 to B0 SoC sometime during its life.
If there are two products using A0, it seems likely there are more
products with A0. Also, since the number of A0 SoCs is limited, these
products are also likely to transition to B0. Hence detecting at run
time is the safest option. So enable the quirk for all Armada XP
boards.
Tested on an AX3 with A0 stepping.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Fixes: 930ab3d403ae: ("i2c: mv64xxx: Add I2C Transaction Generator support") Link: https://lkml.kernel.org/r/1406395238-29758-2-git-send-email-andrew@lunn.ch Signed-off-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The 5th NAND partition should be named "NAND.u-boot-spl-os"
instead of "NAND.u-boot-spl". This is to be consistent with other
TI boards as well as u-boot.
Fixes: 91994facdd2d ("ARM: dts: am335x-evm: NAND: update MTD partition table") Signed-off-by: Roger Quadros <rogerq@ti.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To speed up decompression, the decompressor sets up a flat, cacheable
mapping of memory. However, when there is insufficient space to hold
the page tables for this mapping, we don't bother to enable the caches
and subsequently skip all the cache maintenance hooks.
Skipping the cache maintenance before jumping to the relocated code
allows the processor to predict the branch and populate the I-cache
with stale data before the relocation loop has completed (since a
bootloader may have SCTLR.I set, which permits normal, cacheable
instruction fetches regardless of SCTLR.M).
This patch moves the cache maintenance check into the maintenance
routines themselves, allowing the v6/v7 versions to invalidate the
I-cache regardless of the MMU state.
Reported-by: Marc Carino <marc.ceeeee@gmail.com> Tested-by: Julien Grall <julien.grall@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The kuser helpers page is not set up on non-MMU systems, so it does
not make sense to allow CONFIG_KUSER_HELPERS to be enabled when
CONFIG_MMU=n. Allowing it to be set on !MMU results in an oops in
set_tls (used in execve and the arm_syscall trap handler):
As best I can tell this issue existed for the set_tls ARM syscall
before commit fbfb872f5f41 "ARM: 8148/1: flush TLS and thumbee
register state during exec" consolidated the TLS manipulation code
into the set_tls helper function, but now that we're using it to flush
register state during execve, !MMU users encounter the oops at the
first exec.
Prevent CONFIG_MMU=n configurations from enabling
CONFIG_KUSER_HELPERS.
Fixes: fbfb872f5f41 (ARM: 8148/1: flush TLS and thumbee register state during exec) Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com> Reported-by: Stefan Agner <stefan@agner.ch> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While developing MST support I noticed I often got the wrong data
back from a transaction, in a racy fashion. I noticed the scratch
space wasn't locked against concurrent users.
Based on a patch by Alex, but I've made it a bit more obvious when
things are locked.
Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The power management code calls into the display code for
certain things. If certain power management sysfs attributes
are called before the driver has finished initializing all of
the hardware we can run into problems with uninitialized
modesetting state. Add a check to make sure modesetting
init has completed to the bandwidth update callbacks to
fix this. Can be triggered by the tlp and laptop start
up scripts depending on the timing.
Global GTT doesn't have pat_sel[2:0] so it always point to pat_sel = 000;
So the only way to avoid screen corruptions is setting PAT 0 to Uncached.
MOCS can still be used though. But if userspace is trusting PTE for
cache selection the safest thing to do is to let caches disabled.
BSpec: "For GGTT, there is NO pat_sel[2:0] from the entry,
so RTL will always use the value corresponding to pat_sel = 000"
- System agent ggtt writes (i.e. cpu gtt mmaps) already work before
this patch, i.e. the same uncached + snooping access like on gen6/7
seems to be in effect.
- So this just fixes blitter/render access. Again it looks like it's
not just uncached access, but uncached + snooping. So we can still
hold onto all our assumptions wrt cpu clflushing on LLC machines.
v2: Cleaner patch as suggested by Chris.
v3: Add Daniel's comment
Reference: https://bugs.freedesktop.org/show_bug.cgi?id=85576 Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: James Ausmus <james.ausmus@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jani Nikula <jani.nikula@intel.com> Tested-by: James Ausmus <james.ausmus@intel.com> Reviewed-by: James Ausmus <james.ausmus@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Upon receiving the last fragment, all but the first fragment
are freed, but the multicast check for statistics at the end
of the function refers to the current skb (the last fragment)
causing a use-after-free bug.
Since multicast frames cannot be fragmented and we check for
this early in the function, just modify that check to also
do the accounting to fix the issue.
Due to the time it takes to process the beacon that started the CSA
process, we may be late for the switch if we try to reach exactly
beacon 0. To avoid that, use count - 1 when calculating the switch time.
If we are switching from an HT40+ to an HT40- channel (or vice-versa),
we need the secondary channel offset IE to specify what is the
post-CSA offset to be used. This applies both to beacons and to probe
responses.
In ieee80211_parse_ch_switch_ie() we were ignoring this IE from
beacons and using the *current* HT information IE instead. This was
causing us to use the same offset as before the switch.
Fix that by using the secondary channel offset IE also for beacons and
don't ever use the pre-switch offset. Additionally, remove the
"beacon" argument from ieee80211_parse_ch_switch_ie(), since it's not
needed anymore.
When an interface is deleted, an ongoing hardware scan is canceled and
the driver must abort the scan, at the very least reporting completion
while the interface is removed.
However, if it scheduled the work that might only run after everything
is said and done, which leads to cfg80211 warning that the scan isn't
reported as finished yet; this is no fault of the driver, it already
did, but mac80211 hasn't processed it.
To fix this situation, flush the delayed work when the interface being
removed is the one that was executing the scan.
The driver is not released when ieee80211_register_hw fails in
mac80211_hwsim_create_radio, leading to the access to the unregistered (and
possibly freed) device in platform_driver_unregister:
When VLAN is in use in macvtap_put_user, we end up setting
csum_start to the wrong place. The result is that the whoever
ends up doing the checksum setting will corrupt the packet instead
of writing the checksum to the expected location, usually this
means writing the checksum with an offset of -4.
This patch fixes this by adjusting csum_start when VLAN tags are
detected.
Fixes: f09e2249c4f5 ("macvtap: restore vlan header on user read") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Large (greater than 32k, the value of PAGE_ALLOC_COSTLY_ORDER) auth
tickets will have their buffers vmalloc'ed, which leads to the
following crash in crypto:
Userspace actually passes single parameter (path name) to the umount
syscall, so new umount just fails. Fix it by requesting old umount
syscall implementation and re-wiring umount to it.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lenovo Ideapad Z560 has a mute LED that is controlled via EAPD pin
0x1b on CX20585 codec. (EAPD bit on corresponds to mute LED on.)
The machine doesn't need other EAPD, so the fixup concentrates on
controlling EAPD 0x1b following the vmaster state (but inversely).
Samsung pci-e SSDs on macbooks failed miserably on NCQ commands, so 67809f85d31e ("ahci: disable NCQ on Samsung pci-e SSDs on macbooks")
disabled NCQ on them. It turns out that NCQ is fine as long as MSI is
not used, so let's turn off MSI and leave NCQ on.
Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=60731 Tested-by: <dorin@i51.org> Tested-by: Imre Kaloz <kaloz@openwrt.org> Fixes: 67809f85d31e ("ahci: disable NCQ on Samsung pci-e SSDs on macbooks") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes into the AHCI subsystem have introduced a bug by not taking into
account the force_port_map and mask_port_map parameters when using the
ahci_pci_save_initial_config function. This commit fixes it by setting
the internal parameters of the ahci_port_priv structure.
Currently if the user passes an invalid value on the kernel command line
then the kernel will crash during argument parsing. On most systems this
is very hard to debug because the console hasn't been initialized yet.
This is a regression due to commit 51e158c12aca ("param: hand arguments
after -- straight to init") which, in response to the systemd debug
controversy, made it possible to explicitly pass arguments to init. To
achieve this parse_args() was extended from simply returning an error
code to returning a pointer. Regretably the new init args logic does not
perform a proper validity check on the pointer resulting in a crash.
This patch fixes the validity check. Should the check fail then no arguments
will be passed to init. This is reasonable and matches how the kernel treats
its own arguments (i.e. no error recovery).
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The problem is this: tracing_buffers_splice_read() calls
ring_buffer_wait() to wait for data in the ring buffers. The buffers
are not empty so ring_buffer_wait() returns immediately. But
tracing_buffers_splice_read() calls ring_buffer_read_page() with full=1,
meaning it only wants to read a full page. When the full page is not
available, tracing_buffers_splice_read() tries to wait again with
ring_buffer_wait(), which again returns immediately, and so on.
Fix this by adding a "full" argument to ring_buffer_wait() which will
make ring_buffer_wait() wait until the writer has left the reader's
page, i.e. until full-page reads will succeed.
Link: http://lkml.kernel.org/r/1415645194-25379-1-git-send-email-rabin@rab.in Fixes: b1169cc69ba9 ("tracing: Remove mock up poll wait function") Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Audit rules disappear when an inode they watch is evicted from the cache.
This is likely not what we want.
The guilty commit is "fsnotify: allow marks to not pin inodes in core",
which didn't take into account that audit_tree adds watches with a zero
mask.
Adding any mask should fix this.
Fixes: 90b1e7a57880 ("fsnotify: allow marks to not pin inodes in core") Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add a space between subj= and feature= fields to make them parsable.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When an AUDIT_GET_FEATURE message is sent from userspace to the kernel, it
should reply with a message tagged as an AUDIT_GET_FEATURE type with a struct
audit_feature. The current reply is a message tagged as an AUDIT_GET
type with a struct audit_feature.
This appears to have been a cut-and-paste-eo in commit b0fed40.
Reported-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When VLAN acceleration is in use on the xmit path, we end up
setting csum_start to the wrong place. The result is that the
whoever ends up doing the checksum setting will corrupt the packet
instead of writing the checksum to the expected location, usually
this means writing the checksum with an offset of -4.
This patch fixes this by adjusting csum_start when VLAN acceleration
is detected.
Fixes: 6680ec68eff4 ("tuntap: hardware vlan tx support") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The emulator could reuse an op->type from a previous instruction for some
immediate values. If it mistakenly considers the operands as memory
operands, it will performs a memory read and overwrite op->val.
Consider for instance the ROR instruction - src2 (the number of times)
would be read from memory instead of being used as immediate.
Mark every immediate operand as such to avoid this problem.
Fixes: c44b4c6ab80eef3a9c52c7b3f0c632942e6489aa Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When memory is hot-added, all the memory is in offline state. So clear
all zones' present_pages because they will be updated in online_pages()
and offline_pages(). Otherwise, /proc/zoneinfo will corrupt:
In free_area_init_core(), zone->managed_pages is set to an approximate
value for lowmem, and will be adjusted when the bootmem allocator frees
pages into the buddy system.
But free_area_init_core() is also called by hotadd_new_pgdat() when
hot-adding memory. As a result, zone->managed_pages of the newly added
node's pgdat is set to an approximate value in the very beginning.
Even if the memory on that node has node been onlined,
/sys/device/system/node/nodeXXX/meminfo has wrong value:
This patch fixes this problem by reset node managed pages to 0 after
hot-adding a new node.
1. Move reset_managed_pages_done from reset_node_managed_pages() to
reset_all_zones_managed_pages()
2. Make reset_node_managed_pages() non-static
3. Call reset_node_managed_pages() in hotadd_new_pgdat() after pgdat
is initialized
The add_early_randomness() function in drivers/char/hw_random/core.c passes
a 16-byte buffer to pseries_rng_data_read(). Unfortunately, plpar_hcall()
returns four 64-bit values and trashes 16 bytes on the stack.
This bug has been lying around for a long time. It got unveiled by:
It may trig a oops while loading or unloading the pseries-rng module for both
PowerVM and PowerKVM guests.
This patch does two things:
- pass an intermediate well sized buffer to plpar_hcall(). This is acceptalbe
since we're not on a hot path.
- move to the new read API so that we know the return buffer size for sure.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
All interrupts coming from MUIC were ignored because interrupt source
register was masked.
The Maxim 77693 has a "interrupt source" - a separate register and interrupts
which give information about PMIC block triggering the individual
interrupt (charger, topsys, MUIC, flash LED).
By default bootloader could initialize this register to "mask all"
value. In such case (observed on Trats2 board) MUIC interrupts won't be
generated regardless of their mask status. Regmap irq chip was unmasking
individual MUIC interrupts but the source was masked
Before introducing regmap irq chip this interrupt source was unmasked,
read and acked. Reading and acking is not necessary but unmasking is.
Fixes: 342d669c1ee4 ("mfd: max77693: Handle IRQs using regmap") Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Interrupts coming from Maxim77693 MUIC block (MicroUSB Interface
Controller) were not handled at all because wrong regmap was used for
MUIC's regmap_irq_chip.
The MUIC component of Maxim 77693 uses different I2C address thus second
regmap is created and used by max77693 extcon driver. The registers for
MUIC interrupts are also in that block and should be handled by that
second regmap.
However the regmap irq chip for MUIC was configured with default regmap
which could not read MUIC registers.
Fixes: 342d669c1ee4 ("mfd: max77693: Handle IRQs using regmap") Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit e7cd1d1eb16f ("mfd: twl4030-power: Add generic reset
configuration") enabled configuring the PM features for twl4030.
This caused poweroff command to fail on devices that have the
BCI charger on twl4030 wired, or have power wired for VBUS.
Instead of powering off, the device reboots. This is because
voltage is detected on charger or VBUS with the default bits
enabled for the power transition registers.
To fix the issue, let's just clear VBUS and CHG bits as we want
poweroff command to keep the system powered off.
Fixes: e7cd1d1eb16f ("mfd: twl4030-power: Add generic reset configuration") Reported-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Replace equivalent (and partially incorrect) scatter-gather functions
with ones from crypto-API.
The replacement is motivated by page-faults in sg_copy_part triggered
by successive calls to crypto_hash_update. The following fault appears
after calling crypto_ahash_update twice, first with 13 and then
with 285 bytes:
Unable to handle kernel paging request for data at address 0x00000008
Faulting instruction address: 0xf9bf9a8c
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=8 CoreNet Generic
Modules linked in: tcrypt(+) caamhash caam_jr caam tls
CPU: 6 PID: 1497 Comm: cryptomgr_test Not tainted
3.12.19-rt30-QorIQ-SDK-V1.6+g9fda9f2 #75
task: e9308530 ti: e700e000 task.ti: e700e000
NIP: f9bf9a8c LR: f9bfcf28 CTR: c0019ea0
REGS: e700fb80 TRAP: 0300 Not tainted
(3.12.19-rt30-QorIQ-SDK-V1.6+g9fda9f2)
MSR: 00029002 <CE,EE,ME> CR: 44f92024 XER: 20000000
DEAR: 00000008, ESR: 00000000
In a system with NUMA configuration we want to enforce that the accelerator is
connected to a node with memory to avoid cross QPI memory transaction.
Otherwise there is no point in using the accelerator as the encryption in
software will be faster.
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Tested-by: Nikolay Aleksandrov <nikolay@redhat.com> Reviewed-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If dma mapping for dma_addr_out fails, the descriptor memory is freed
but the previous dma mapping for dma_addr_in remains.
This patch resolves the missing dma unmap and groups resource
allocations at function start.
Current pageblock isolation logic could isolate each pageblock
individually. This causes freepage accounting problem if freepage with
pageblock order on isolate pageblock is merged with other freepage on
normal pageblock. We can prevent merging by restricting max order of
merging to pageblock order if freepage is on isolate pageblock.
A side-effect of this change is that there could be non-merged buddy
freepage even if finishing pageblock isolation, because undoing
pageblock isolation is just to move freepage from isolate buddy list to
normal buddy list rather than to consider merging. So, the patch also
makes undoing pageblock isolation consider freepage merge. When
un-isolation, freepage with more than pageblock order and it's buddy are
checked. If they are on normal pageblock, instead of just moving, we
isolate the freepage and free it in order to get merged.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Heesub Shin <heesub.shin@samsung.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Ritesh Harjani <ritesh.list@gmail.com> Cc: Gioh Kim <gioh.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
All the caller of __free_one_page() has similar freepage counting logic,
so we can move it to __free_one_page(). This reduce line of code and
help future maintenance.
This is also preparation step for "mm/page_alloc: restrict max order of
merging on isolated pageblock" which fix the freepage counting problem
on freepage with more than pageblock order.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Heesub Shin <heesub.shin@samsung.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Ritesh Harjani <ritesh.list@gmail.com> Cc: Gioh Kim <gioh.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In free_pcppages_bulk(), we use cached migratetype of freepage to
determine type of buddy list where freepage will be added. This
information is stored when freepage is added to pcp list, so if
isolation of pageblock of this freepage begins after storing, this
cached information could be stale. In other words, it has original
migratetype rather than MIGRATE_ISOLATE.
There are two problems caused by this stale information.
One is that we can't keep these freepages from being allocated.
Although this pageblock is isolated, freepage will be added to normal
buddy list so that it could be allocated without any restriction. And
the other problem is incorrect freepage accounting. Freepages on
isolate pageblock should not be counted for number of freepage.
Following is the code snippet in free_pcppages_bulk().
/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
__free_one_page(page, page_to_pfn(page), zone, 0, mt);
trace_mm_page_pcpu_drain(page, 0, mt);
if (likely(!is_migrate_isolate_page(page))) {
__mod_zone_page_state(zone, NR_FREE_PAGES, 1);
if (is_migrate_cma(mt))
__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
}
As you can see above snippet, current code already handle second
problem, incorrect freepage accounting, by re-fetching pageblock
migratetype through is_migrate_isolate_page(page).
But, because this re-fetched information isn't used for
__free_one_page(), first problem would not be solved. This patch try to
solve this situation to re-fetch pageblock migratetype before
__free_one_page() and to use it for __free_one_page().
In addition to move up position of this re-fetch, this patch use
optimization technique, re-fetching migratetype only if there is isolate
pageblock. Pageblock isolation is rare event, so we can avoid
re-fetching in common case with this optimization.
This patch also correct migratetype of the tracepoint output.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Heesub Shin <heesub.shin@samsung.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Ritesh Harjani <ritesh.list@gmail.com> Cc: Gioh Kim <gioh.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Before describing bugs itself, I first explain definition of freepage.
1. pages on buddy list are counted as freepage.
2. pages on isolate migratetype buddy list are *not* counted as freepage.
3. pages on cma buddy list are counted as CMA freepage, too.
Now, I describe problems and related patch.
Patch 1: There is race conditions on getting pageblock migratetype that
it results in misplacement of freepages on buddy list, incorrect
freepage count and un-availability of freepage.
Patch 2: Freepages on pcp list could have stale cached information to
determine migratetype of buddy list to go. This causes misplacement of
freepages on buddy list and incorrect freepage count.
Patch 4: Merging between freepages on different migratetype of
pageblocks will cause freepages accouting problem. This patch fixes it.
Without patchset [3], above problem doesn't happens on my CMA allocation
test, because CMA reserved pages aren't used at all. So there is no
chance for above race.
With patchset [3], I did simple CMA allocation test and get below
result:
- Virtual machine, 4 cpus, 1024 MB memory, 256 MB CMA reservation
- run kernel build (make -j16) on background
- 30 times CMA allocation(8MB * 30 = 240MB) attempts in 5 sec interval
- Result: more than 5000 freepage count are missed
With patchset [3] and this patchset, I found that no freepage count are
missed so that I conclude that problems are solved.
On my simple memory offlining test, these problems also occur on that
environment, too.
This patch (of 4):
There are two paths to reach core free function of buddy allocator,
__free_one_page(), one is free_one_page()->__free_one_page() and the
other is free_hot_cold_page()->free_pcppages_bulk()->__free_one_page().
Each paths has race condition causing serious problems. At first, this
patch is focused on first type of freepath. And then, following patch
will solve the problem in second type of freepath.
In the first type of freepath, we got migratetype of freeing page
without holding the zone lock, so it could be racy. There are two cases
of this race.
1. pages are added to isolate buddy list after restoring orignal
migratetype
CPU1 CPU2
get migratetype => return MIGRATE_ISOLATE
call free_one_page() with MIGRATE_ISOLATE
grab the zone lock
unisolate pageblock
release the zone lock
grab the zone lock
call __free_one_page() with MIGRATE_ISOLATE
freepage go into isolate buddy list,
although pageblock is already unisolated
This may cause two problems. One is that we can't use this page anymore
until next isolation attempt of this pageblock, because freepage is on
isolate buddy list. The other is that freepage accouting could be wrong
due to merging between different buddy list. Freepages on isolate buddy
list aren't counted as freepage, but ones on normal buddy list are
counted as freepage. If merge happens, buddy freepage on normal buddy
list is inevitably moved to isolate buddy list without any consideration
of freepage accouting so it could be incorrect.
2. pages are added to normal buddy list while pageblock is isolated.
It is similar with above case.
This also may cause two problems. One is that we can't keep these
freepages from being allocated. Although this pageblock is isolated,
freepage would be added to normal buddy list so that it could be
allocated without any restriction. And the other problem is same as
case 1, that it, incorrect freepage accouting.
This race condition would be prevented by checking migratetype again
with holding the zone lock. Because it is somewhat heavy operation and
it isn't needed in common case, we want to avoid rechecking as much as
possible. So this patch introduce new variable, nr_isolate_pageblock in
struct zone to check if there is isolated pageblock. With this, we can
avoid to re-check migratetype in common case and do it only if there is
isolated pageblock or migratetype is MIGRATE_ISOLATE. This solve above
mentioned problems.
Changes from v3:
Add one more check in free_one_page() that checks whether migratetype is
MIGRATE_ISOLATE or not. Without this, abovementioned case 1 could happens.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Heesub Shin <heesub.shin@samsung.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Ritesh Harjani <ritesh.list@gmail.com> Cc: Gioh Kim <gioh.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
zram could kunmap_atomic() a NULL pointer in a rare situation: a zram
page becomes a full-zeroed page after a partial write io. The current
code doesn't handle this case and performs kunmap_atomic() on a NULL
pointer, which panics the kernel.
This patch fixes this issue.
Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Weijie Yang <weijie.yang.kh@gmail.com> Acked-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Atomicity between xchg and cmpxchg cannot be guaranteed when xchg is
implemented with a swap and cmpxchg is implemented with locks.
Without this, e.g. mcs_spin_lock and mcs_spin_unlock are broken.
Signed-off-by: Andreas Larsson <andreas@gaisler.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Otherwise rcu_irq_{enter,exit}() do not happen and we get dumps like:
====================
[ 188.275021] ===============================
[ 188.309351] [ INFO: suspicious RCU usage. ]
[ 188.343737] 3.18.0-rc3-00068-g20f3963-dirty #54 Not tainted
[ 188.394786] -------------------------------
[ 188.429170] include/linux/rcupdate.h:883 rcu_read_lock() used
illegally while idle!
[ 188.505235]
other info that might help us debug this:
Based almost entirely upon a patch by Paul E. McKenney.
Reported-by: Meelis Roos <mroos@linux.ee> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This code is trying to go through the standard PCI config space
interfaces to read the PCI controller's PCI_STATUS register.
This doesn't work, because we more often than not do not enumerate
the PCI controller as a bonafide PCI device during the OF device
node scan. Therefore bus->self remains NULL.
Existing common code for PSYCHO and PSYCHO-like PCI controllers
handles this properly, by doing the config space access directly.
Do the same here, pbm->pci_ops->{read,write}().
Reported-by: Meelis Roos <mroos@linux.ee> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The VD_OP_GET_VTOC operation will succeed only if the vdisk backend has a
VTOC label, otherwise it will fail. In particular, it will return error
48 (ENOTSUP) if the disk has an EFI label. VTOC disk labels are already
handled by directly reading the disk in block/partitions/sun.c (enabled by
CONFIG_SUN_PARTITION which defaults to y on SPARC). Since port->label is
unused in the driver, remove the call and the field.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
vio_dring_avail() will allow use of every dring entry, but when the last
entry is allocated then dr->prod == dr->cons which is indistinguishable from
the ring empty condition. This causes the next allocation to reuse an entry.
When this happens in sunvdc, the server side vds driver begins nack'ing the
messages and ends up resetting the ldc channel. This problem does not effect
sunvnet since it checks for < 2.
The fix here is to just never allocate the very last dring slot so that full
and empty are not the same condition. The request start path was changed to
check for the ring being full a bit earlier, and to stop the blk_queue if
there is no space left. The blk_queue will be restarted once the ring is
only half full again. The number of ring entries was increased to 512 which
matches the sunvnet and Solaris vdc drivers, and greatly reduces the
frequency of hitting the ring full condition and the associated blk_queue
stop/starting. The checks in sunvent were adjusted to account for
vio_dring_avail() returning 1 less.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>