Priority of a merged request is computed by ioprio_best(). If one of the
requests has undefined priority (IOPRIO_CLASS_NONE) and another request
has priority from IOPRIO_CLASS_BE, the function will return the
undefined priority which is wrong. Fix the function to properly return
priority of a request with the defined priority.
Fixes: d58cdfb89ce0c6bd5f81ae931a984ef298dbda20 Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Switch over the msgctl, shmat, shmctl and semtimedop syscalls to use the compat
layer. The problem was found with the debian procenv package, which called
shmctl(0, SHM_INFO, &info);
in which the shmctl syscall then overwrote parts of the surrounding areas on
the stack on which the info variable was stored and thus lead to a segfault
later on.
Additionally fix the definition of struct shminfo64 to use unsigned longs like
the other architectures. This has no impact on userspace since we only have a
32bit userspace up to now.
Signed-off-by: Helge Deller <deller@gmx.de> Cc: John David Anglin <dave.anglin@bell.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Setups that use the blk-mq I/O path can lock up if a host with a single
device that has its door locked enters EH. Make sure to only send the
command to re-lock the door to devices that actually were reset and thus
might have lost their state. Otherwise the EH code might be get blocked
on blk_get_request as all requests for non-reset devices might be in use.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Meelis Roos <meelis.roos@ut.ee> Tested-by: Meelis Roos <meelis.roos@ut.ee> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When experimenting with patches to provide kprobes support for aarch64
smp machines would hang when inserting breakpoints into kernel code.
The hangs were caused by a race condition in the code called by
aarch64_insn_patch_text_sync(). The first processor in the
aarch64_insn_patch_text_cb() function would patch the code while other
processors were still entering the function and incrementing the
cpu_count field. This resulted in some processors never observing the
exit condition and exiting the function. Thus, processors in the
system hung.
The first processor to enter the patching function performs the
patching and signals that the patching is complete with an increment
of the cpu_count field. When all the processors have incremented the
cpu_count field the cpu_count will be num_cpus_online()+1 and they
will return to normal execution.
Fixes: ae16480785de arm64: introduce interfaces to hotpatch kernel and module code Signed-off-by: William Cohen <wcohen@redhat.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For pNFS direct writes, layout driver may dynamically allocate ds_cinfo.buckets.
So we need to take care to free them when freeing dreq.
Ideally this needs to be done inside layout driver where ds_cinfo.buckets
are allocated. But buckets are attached to dreq and reused across LD IO iterations.
So I feel it's OK to free them in the generic layer.
Found by the UC-KLEE tool: A user could supply less input to
firewire-cdev ioctls than write- or write/read-type ioctl handlers
expect. The handlers used data from uninitialized kernel stack then.
This could partially leak back to the user if the kernel subsequently
generated fw_cdev_event_'s (to be read from the firewire-cdev fd)
which notably would contain the _u64 closure field which many of the
ioctl argument structures contain.
The fact that the handlers would act on random garbage input is a
lesser issue since all handlers must check their input anyway.
The fix simply always null-initializes the entire ioctl argument buffer
regardless of the actual length of expected user input. That is, a
runtime overhead of memset(..., 40) is added to each firewirew-cdev
ioctl() call. [Comment from Clemens Ladisch: This part of the stack is
most likely to be already in the cache.]
Remarks:
- There was never any leak from kernel stack to the ioctl output
buffer itself. IOW, it was not possible to read kernel stack by a
read-type or write/read-type ioctl alone; the leak could at most
happen in combination with read()ing subsequent event data.
- The actual expected minimum user input of each ioctl from
include/uapi/linux/firewire-cdev.h is, in bytes:
[0x00] = 32, [0x05] = 4, [0x0a] = 16, [0x0f] = 20, [0x14] = 16,
[0x01] = 36, [0x06] = 20, [0x0b] = 4, [0x10] = 20, [0x15] = 20,
[0x02] = 20, [0x07] = 4, [0x0c] = 0, [0x11] = 0, [0x16] = 8,
[0x03] = 4, [0x08] = 24, [0x0d] = 20, [0x12] = 36, [0x17] = 12,
[0x04] = 20, [0x09] = 24, [0x0e] = 4, [0x13] = 40, [0x18] = 4.
Reported-by: David Ramos <daramos@stanford.edu> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While efi-entry.S mentions that efi_entry() will have relocated the
kernel image, it actually means that efi_entry will have placed a copy
of the kernel in the appropriate location, and until this is branched to
at the end of efi_entry.S, all instructions are executed from the
original image.
Thus while the flush in efi_entry.S does ensure that the copy is visible
to noncacheable accesses, it does not guarantee that this is true for
the image instructions are being executed from. This could have
disasterous effects when the MMU and caches are disabled if the image
has not been naturally evicted to the PoC.
Additionally, due to a missing dsb following the ic ialluis, the new
kernel image is not necessarily clean in the I-cache when it is branched
to, with similar potentially disasterous effects.
This patch adds additional flushing to ensure that the currently
executing stub text is flushed to the PoC and is thus visible to
noncacheable accesses. As it is placed after the instructions cache
maintenance for the new image and __flush_dcache_area already contains a
dsb, we do not need to add a separate barrier to ensure completion of
the icache maintenance.
Comments are updated to clarify the situation with regard to the two
images and the maintenance required for both.
Fixes: 3c7f255039a2ad6ee1e3890505caf0d029b22e29 Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Joel Schopp <joel.schopp@amd.com> Reviewed-by: Roy Franz <roy.franz@linaro.org> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ian Campbell <ijc@hellion.org.uk> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Mark Salter <msalter@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ARM64 currently doesn't fix up faults on the single-byte (strb) case of
__clear_user... which means that we can cause a nasty kernel panic as an
ordinary user with any multiple PAGE_SIZE+1 read from /dev/zero.
i.e.: dd if=/dev/zero of=foo ibs=1 count=1 (or ibs=65537, etc.)
This is a pretty obscure bug in the general case since we'll only
__do_kernel_fault (since there's no extable entry for pc) if the
mmap_sem is contended. However, with CONFIG_DEBUG_VM enabled, we'll
always fault.
if (!down_read_trylock(&mm->mmap_sem)) {
if (!user_mode(regs) && !search_exception_tables(regs->pc))
goto no_context;
retry:
down_read(&mm->mmap_sem);
} else {
/*
* The above down_read_trylock() might have succeeded in
* which
* case, we'll have missed the might_sleep() from
* down_read().
*/
might_sleep();
if (!user_mode(regs) && !search_exception_tables(regs->pc))
goto no_context;
}
Fix that by adding an extable entry for the strb instruction, since it
touches user memory, similar to the other stores in __clear_user.
The branches of the if (i->type & ITER_BVEC) statement in
iov_iter_single_seg_count() are the wrong way around; if ITER_BVEC is
clear then we use i->bvec, when we should be using i->iov. This fixes
it.
In my case, the symptom that this caused was that a KVM guest doing
filesystem operations on a virtual disk would result in one of qemu's
threads on the host going into an infinite loop in
generic_perform_write(). The loop would hit the copied == 0 case and
call iov_iter_single_seg_count() to reduce the number of bytes to try
to process, but because of the error, iov_iter_single_seg_count()
would just return i->count and the loop made no progress and continued
forever.
Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A second product has come to light which makes use of the A0 stepping
of the Armada XP SoC. A0 stepping has a hardware bug in the i2c core
meaning that hardware offload does not work, resulting in the kernel
failing to boot. The quirk detects that the kernel is running on an A0
stepping SoC and disables the use of hardware offload.
Currently the quirk is only enabled for PlatHome Openblocks AX3. The
AX3 has been produced with both A0 and B0 stepping SoCs. The second
product is the Lenovo Iomega IX4-300d. It seems likely that this
device will also swap from A0 to B0 SoC sometime during its life.
If there are two products using A0, it seems likely there are more
products with A0. Also, since the number of A0 SoCs is limited, these
products are also likely to transition to B0. Hence detecting at run
time is the safest option. So enable the quirk for all Armada XP
boards.
Tested on an AX3 with A0 stepping.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Fixes: 930ab3d403ae: ("i2c: mv64xxx: Add I2C Transaction Generator support") Link: https://lkml.kernel.org/r/1406395238-29758-2-git-send-email-andrew@lunn.ch Signed-off-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The 5th NAND partition should be named "NAND.u-boot-spl-os"
instead of "NAND.u-boot-spl". This is to be consistent with other
TI boards as well as u-boot.
Fixes: 91994facdd2d ("ARM: dts: am335x-evm: NAND: update MTD partition table") Signed-off-by: Roger Quadros <rogerq@ti.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To speed up decompression, the decompressor sets up a flat, cacheable
mapping of memory. However, when there is insufficient space to hold
the page tables for this mapping, we don't bother to enable the caches
and subsequently skip all the cache maintenance hooks.
Skipping the cache maintenance before jumping to the relocated code
allows the processor to predict the branch and populate the I-cache
with stale data before the relocation loop has completed (since a
bootloader may have SCTLR.I set, which permits normal, cacheable
instruction fetches regardless of SCTLR.M).
This patch moves the cache maintenance check into the maintenance
routines themselves, allowing the v6/v7 versions to invalidate the
I-cache regardless of the MMU state.
Reported-by: Marc Carino <marc.ceeeee@gmail.com> Tested-by: Julien Grall <julien.grall@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The kuser helpers page is not set up on non-MMU systems, so it does
not make sense to allow CONFIG_KUSER_HELPERS to be enabled when
CONFIG_MMU=n. Allowing it to be set on !MMU results in an oops in
set_tls (used in execve and the arm_syscall trap handler):
As best I can tell this issue existed for the set_tls ARM syscall
before commit fbfb872f5f41 "ARM: 8148/1: flush TLS and thumbee
register state during exec" consolidated the TLS manipulation code
into the set_tls helper function, but now that we're using it to flush
register state during execve, !MMU users encounter the oops at the
first exec.
Prevent CONFIG_MMU=n configurations from enabling
CONFIG_KUSER_HELPERS.
Fixes: fbfb872f5f41 (ARM: 8148/1: flush TLS and thumbee register state during exec) Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com> Reported-by: Stefan Agner <stefan@agner.ch> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While developing MST support I noticed I often got the wrong data
back from a transaction, in a racy fashion. I noticed the scratch
space wasn't locked against concurrent users.
Based on a patch by Alex, but I've made it a bit more obvious when
things are locked.
Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The power management code calls into the display code for
certain things. If certain power management sysfs attributes
are called before the driver has finished initializing all of
the hardware we can run into problems with uninitialized
modesetting state. Add a check to make sure modesetting
init has completed to the bandwidth update callbacks to
fix this. Can be triggered by the tlp and laptop start
up scripts depending on the timing.
Global GTT doesn't have pat_sel[2:0] so it always point to pat_sel = 000;
So the only way to avoid screen corruptions is setting PAT 0 to Uncached.
MOCS can still be used though. But if userspace is trusting PTE for
cache selection the safest thing to do is to let caches disabled.
BSpec: "For GGTT, there is NO pat_sel[2:0] from the entry,
so RTL will always use the value corresponding to pat_sel = 000"
- System agent ggtt writes (i.e. cpu gtt mmaps) already work before
this patch, i.e. the same uncached + snooping access like on gen6/7
seems to be in effect.
- So this just fixes blitter/render access. Again it looks like it's
not just uncached access, but uncached + snooping. So we can still
hold onto all our assumptions wrt cpu clflushing on LLC machines.
v2: Cleaner patch as suggested by Chris.
v3: Add Daniel's comment
Reference: https://bugs.freedesktop.org/show_bug.cgi?id=85576 Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: James Ausmus <james.ausmus@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jani Nikula <jani.nikula@intel.com> Tested-by: James Ausmus <james.ausmus@intel.com> Reviewed-by: James Ausmus <james.ausmus@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Upon receiving the last fragment, all but the first fragment
are freed, but the multicast check for statistics at the end
of the function refers to the current skb (the last fragment)
causing a use-after-free bug.
Since multicast frames cannot be fragmented and we check for
this early in the function, just modify that check to also
do the accounting to fix the issue.
Due to the time it takes to process the beacon that started the CSA
process, we may be late for the switch if we try to reach exactly
beacon 0. To avoid that, use count - 1 when calculating the switch time.
If we are switching from an HT40+ to an HT40- channel (or vice-versa),
we need the secondary channel offset IE to specify what is the
post-CSA offset to be used. This applies both to beacons and to probe
responses.
In ieee80211_parse_ch_switch_ie() we were ignoring this IE from
beacons and using the *current* HT information IE instead. This was
causing us to use the same offset as before the switch.
Fix that by using the secondary channel offset IE also for beacons and
don't ever use the pre-switch offset. Additionally, remove the
"beacon" argument from ieee80211_parse_ch_switch_ie(), since it's not
needed anymore.
When an interface is deleted, an ongoing hardware scan is canceled and
the driver must abort the scan, at the very least reporting completion
while the interface is removed.
However, if it scheduled the work that might only run after everything
is said and done, which leads to cfg80211 warning that the scan isn't
reported as finished yet; this is no fault of the driver, it already
did, but mac80211 hasn't processed it.
To fix this situation, flush the delayed work when the interface being
removed is the one that was executing the scan.
The driver is not released when ieee80211_register_hw fails in
mac80211_hwsim_create_radio, leading to the access to the unregistered (and
possibly freed) device in platform_driver_unregister:
When VLAN is in use in macvtap_put_user, we end up setting
csum_start to the wrong place. The result is that the whoever
ends up doing the checksum setting will corrupt the packet instead
of writing the checksum to the expected location, usually this
means writing the checksum with an offset of -4.
This patch fixes this by adjusting csum_start when VLAN tags are
detected.
Fixes: f09e2249c4f5 ("macvtap: restore vlan header on user read") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Large (greater than 32k, the value of PAGE_ALLOC_COSTLY_ORDER) auth
tickets will have their buffers vmalloc'ed, which leads to the
following crash in crypto:
Userspace actually passes single parameter (path name) to the umount
syscall, so new umount just fails. Fix it by requesting old umount
syscall implementation and re-wiring umount to it.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lenovo Ideapad Z560 has a mute LED that is controlled via EAPD pin
0x1b on CX20585 codec. (EAPD bit on corresponds to mute LED on.)
The machine doesn't need other EAPD, so the fixup concentrates on
controlling EAPD 0x1b following the vmaster state (but inversely).
Samsung pci-e SSDs on macbooks failed miserably on NCQ commands, so 67809f85d31e ("ahci: disable NCQ on Samsung pci-e SSDs on macbooks")
disabled NCQ on them. It turns out that NCQ is fine as long as MSI is
not used, so let's turn off MSI and leave NCQ on.
Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=60731 Tested-by: <dorin@i51.org> Tested-by: Imre Kaloz <kaloz@openwrt.org> Fixes: 67809f85d31e ("ahci: disable NCQ on Samsung pci-e SSDs on macbooks") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes into the AHCI subsystem have introduced a bug by not taking into
account the force_port_map and mask_port_map parameters when using the
ahci_pci_save_initial_config function. This commit fixes it by setting
the internal parameters of the ahci_port_priv structure.
Currently if the user passes an invalid value on the kernel command line
then the kernel will crash during argument parsing. On most systems this
is very hard to debug because the console hasn't been initialized yet.
This is a regression due to commit 51e158c12aca ("param: hand arguments
after -- straight to init") which, in response to the systemd debug
controversy, made it possible to explicitly pass arguments to init. To
achieve this parse_args() was extended from simply returning an error
code to returning a pointer. Regretably the new init args logic does not
perform a proper validity check on the pointer resulting in a crash.
This patch fixes the validity check. Should the check fail then no arguments
will be passed to init. This is reasonable and matches how the kernel treats
its own arguments (i.e. no error recovery).
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The problem is this: tracing_buffers_splice_read() calls
ring_buffer_wait() to wait for data in the ring buffers. The buffers
are not empty so ring_buffer_wait() returns immediately. But
tracing_buffers_splice_read() calls ring_buffer_read_page() with full=1,
meaning it only wants to read a full page. When the full page is not
available, tracing_buffers_splice_read() tries to wait again with
ring_buffer_wait(), which again returns immediately, and so on.
Fix this by adding a "full" argument to ring_buffer_wait() which will
make ring_buffer_wait() wait until the writer has left the reader's
page, i.e. until full-page reads will succeed.
Link: http://lkml.kernel.org/r/1415645194-25379-1-git-send-email-rabin@rab.in Fixes: b1169cc69ba9 ("tracing: Remove mock up poll wait function") Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Audit rules disappear when an inode they watch is evicted from the cache.
This is likely not what we want.
The guilty commit is "fsnotify: allow marks to not pin inodes in core",
which didn't take into account that audit_tree adds watches with a zero
mask.
Adding any mask should fix this.
Fixes: 90b1e7a57880 ("fsnotify: allow marks to not pin inodes in core") Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add a space between subj= and feature= fields to make them parsable.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When an AUDIT_GET_FEATURE message is sent from userspace to the kernel, it
should reply with a message tagged as an AUDIT_GET_FEATURE type with a struct
audit_feature. The current reply is a message tagged as an AUDIT_GET
type with a struct audit_feature.
This appears to have been a cut-and-paste-eo in commit b0fed40.
Reported-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When VLAN acceleration is in use on the xmit path, we end up
setting csum_start to the wrong place. The result is that the
whoever ends up doing the checksum setting will corrupt the packet
instead of writing the checksum to the expected location, usually
this means writing the checksum with an offset of -4.
This patch fixes this by adjusting csum_start when VLAN acceleration
is detected.
Fixes: 6680ec68eff4 ("tuntap: hardware vlan tx support") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The emulator could reuse an op->type from a previous instruction for some
immediate values. If it mistakenly considers the operands as memory
operands, it will performs a memory read and overwrite op->val.
Consider for instance the ROR instruction - src2 (the number of times)
would be read from memory instead of being used as immediate.
Mark every immediate operand as such to avoid this problem.
Fixes: c44b4c6ab80eef3a9c52c7b3f0c632942e6489aa Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When memory is hot-added, all the memory is in offline state. So clear
all zones' present_pages because they will be updated in online_pages()
and offline_pages(). Otherwise, /proc/zoneinfo will corrupt:
In free_area_init_core(), zone->managed_pages is set to an approximate
value for lowmem, and will be adjusted when the bootmem allocator frees
pages into the buddy system.
But free_area_init_core() is also called by hotadd_new_pgdat() when
hot-adding memory. As a result, zone->managed_pages of the newly added
node's pgdat is set to an approximate value in the very beginning.
Even if the memory on that node has node been onlined,
/sys/device/system/node/nodeXXX/meminfo has wrong value:
This patch fixes this problem by reset node managed pages to 0 after
hot-adding a new node.
1. Move reset_managed_pages_done from reset_node_managed_pages() to
reset_all_zones_managed_pages()
2. Make reset_node_managed_pages() non-static
3. Call reset_node_managed_pages() in hotadd_new_pgdat() after pgdat
is initialized
The add_early_randomness() function in drivers/char/hw_random/core.c passes
a 16-byte buffer to pseries_rng_data_read(). Unfortunately, plpar_hcall()
returns four 64-bit values and trashes 16 bytes on the stack.
This bug has been lying around for a long time. It got unveiled by:
It may trig a oops while loading or unloading the pseries-rng module for both
PowerVM and PowerKVM guests.
This patch does two things:
- pass an intermediate well sized buffer to plpar_hcall(). This is acceptalbe
since we're not on a hot path.
- move to the new read API so that we know the return buffer size for sure.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
All interrupts coming from MUIC were ignored because interrupt source
register was masked.
The Maxim 77693 has a "interrupt source" - a separate register and interrupts
which give information about PMIC block triggering the individual
interrupt (charger, topsys, MUIC, flash LED).
By default bootloader could initialize this register to "mask all"
value. In such case (observed on Trats2 board) MUIC interrupts won't be
generated regardless of their mask status. Regmap irq chip was unmasking
individual MUIC interrupts but the source was masked
Before introducing regmap irq chip this interrupt source was unmasked,
read and acked. Reading and acking is not necessary but unmasking is.
Fixes: 342d669c1ee4 ("mfd: max77693: Handle IRQs using regmap") Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Interrupts coming from Maxim77693 MUIC block (MicroUSB Interface
Controller) were not handled at all because wrong regmap was used for
MUIC's regmap_irq_chip.
The MUIC component of Maxim 77693 uses different I2C address thus second
regmap is created and used by max77693 extcon driver. The registers for
MUIC interrupts are also in that block and should be handled by that
second regmap.
However the regmap irq chip for MUIC was configured with default regmap
which could not read MUIC registers.
Fixes: 342d669c1ee4 ("mfd: max77693: Handle IRQs using regmap") Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit e7cd1d1eb16f ("mfd: twl4030-power: Add generic reset
configuration") enabled configuring the PM features for twl4030.
This caused poweroff command to fail on devices that have the
BCI charger on twl4030 wired, or have power wired for VBUS.
Instead of powering off, the device reboots. This is because
voltage is detected on charger or VBUS with the default bits
enabled for the power transition registers.
To fix the issue, let's just clear VBUS and CHG bits as we want
poweroff command to keep the system powered off.
Fixes: e7cd1d1eb16f ("mfd: twl4030-power: Add generic reset configuration") Reported-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Replace equivalent (and partially incorrect) scatter-gather functions
with ones from crypto-API.
The replacement is motivated by page-faults in sg_copy_part triggered
by successive calls to crypto_hash_update. The following fault appears
after calling crypto_ahash_update twice, first with 13 and then
with 285 bytes:
Unable to handle kernel paging request for data at address 0x00000008
Faulting instruction address: 0xf9bf9a8c
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=8 CoreNet Generic
Modules linked in: tcrypt(+) caamhash caam_jr caam tls
CPU: 6 PID: 1497 Comm: cryptomgr_test Not tainted
3.12.19-rt30-QorIQ-SDK-V1.6+g9fda9f2 #75
task: e9308530 ti: e700e000 task.ti: e700e000
NIP: f9bf9a8c LR: f9bfcf28 CTR: c0019ea0
REGS: e700fb80 TRAP: 0300 Not tainted
(3.12.19-rt30-QorIQ-SDK-V1.6+g9fda9f2)
MSR: 00029002 <CE,EE,ME> CR: 44f92024 XER: 20000000
DEAR: 00000008, ESR: 00000000
In a system with NUMA configuration we want to enforce that the accelerator is
connected to a node with memory to avoid cross QPI memory transaction.
Otherwise there is no point in using the accelerator as the encryption in
software will be faster.
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Tested-by: Nikolay Aleksandrov <nikolay@redhat.com> Reviewed-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If dma mapping for dma_addr_out fails, the descriptor memory is freed
but the previous dma mapping for dma_addr_in remains.
This patch resolves the missing dma unmap and groups resource
allocations at function start.
Current pageblock isolation logic could isolate each pageblock
individually. This causes freepage accounting problem if freepage with
pageblock order on isolate pageblock is merged with other freepage on
normal pageblock. We can prevent merging by restricting max order of
merging to pageblock order if freepage is on isolate pageblock.
A side-effect of this change is that there could be non-merged buddy
freepage even if finishing pageblock isolation, because undoing
pageblock isolation is just to move freepage from isolate buddy list to
normal buddy list rather than to consider merging. So, the patch also
makes undoing pageblock isolation consider freepage merge. When
un-isolation, freepage with more than pageblock order and it's buddy are
checked. If they are on normal pageblock, instead of just moving, we
isolate the freepage and free it in order to get merged.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Heesub Shin <heesub.shin@samsung.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Ritesh Harjani <ritesh.list@gmail.com> Cc: Gioh Kim <gioh.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
All the caller of __free_one_page() has similar freepage counting logic,
so we can move it to __free_one_page(). This reduce line of code and
help future maintenance.
This is also preparation step for "mm/page_alloc: restrict max order of
merging on isolated pageblock" which fix the freepage counting problem
on freepage with more than pageblock order.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Heesub Shin <heesub.shin@samsung.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Ritesh Harjani <ritesh.list@gmail.com> Cc: Gioh Kim <gioh.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In free_pcppages_bulk(), we use cached migratetype of freepage to
determine type of buddy list where freepage will be added. This
information is stored when freepage is added to pcp list, so if
isolation of pageblock of this freepage begins after storing, this
cached information could be stale. In other words, it has original
migratetype rather than MIGRATE_ISOLATE.
There are two problems caused by this stale information.
One is that we can't keep these freepages from being allocated.
Although this pageblock is isolated, freepage will be added to normal
buddy list so that it could be allocated without any restriction. And
the other problem is incorrect freepage accounting. Freepages on
isolate pageblock should not be counted for number of freepage.
Following is the code snippet in free_pcppages_bulk().
/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
__free_one_page(page, page_to_pfn(page), zone, 0, mt);
trace_mm_page_pcpu_drain(page, 0, mt);
if (likely(!is_migrate_isolate_page(page))) {
__mod_zone_page_state(zone, NR_FREE_PAGES, 1);
if (is_migrate_cma(mt))
__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
}
As you can see above snippet, current code already handle second
problem, incorrect freepage accounting, by re-fetching pageblock
migratetype through is_migrate_isolate_page(page).
But, because this re-fetched information isn't used for
__free_one_page(), first problem would not be solved. This patch try to
solve this situation to re-fetch pageblock migratetype before
__free_one_page() and to use it for __free_one_page().
In addition to move up position of this re-fetch, this patch use
optimization technique, re-fetching migratetype only if there is isolate
pageblock. Pageblock isolation is rare event, so we can avoid
re-fetching in common case with this optimization.
This patch also correct migratetype of the tracepoint output.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Heesub Shin <heesub.shin@samsung.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Ritesh Harjani <ritesh.list@gmail.com> Cc: Gioh Kim <gioh.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Before describing bugs itself, I first explain definition of freepage.
1. pages on buddy list are counted as freepage.
2. pages on isolate migratetype buddy list are *not* counted as freepage.
3. pages on cma buddy list are counted as CMA freepage, too.
Now, I describe problems and related patch.
Patch 1: There is race conditions on getting pageblock migratetype that
it results in misplacement of freepages on buddy list, incorrect
freepage count and un-availability of freepage.
Patch 2: Freepages on pcp list could have stale cached information to
determine migratetype of buddy list to go. This causes misplacement of
freepages on buddy list and incorrect freepage count.
Patch 4: Merging between freepages on different migratetype of
pageblocks will cause freepages accouting problem. This patch fixes it.
Without patchset [3], above problem doesn't happens on my CMA allocation
test, because CMA reserved pages aren't used at all. So there is no
chance for above race.
With patchset [3], I did simple CMA allocation test and get below
result:
- Virtual machine, 4 cpus, 1024 MB memory, 256 MB CMA reservation
- run kernel build (make -j16) on background
- 30 times CMA allocation(8MB * 30 = 240MB) attempts in 5 sec interval
- Result: more than 5000 freepage count are missed
With patchset [3] and this patchset, I found that no freepage count are
missed so that I conclude that problems are solved.
On my simple memory offlining test, these problems also occur on that
environment, too.
This patch (of 4):
There are two paths to reach core free function of buddy allocator,
__free_one_page(), one is free_one_page()->__free_one_page() and the
other is free_hot_cold_page()->free_pcppages_bulk()->__free_one_page().
Each paths has race condition causing serious problems. At first, this
patch is focused on first type of freepath. And then, following patch
will solve the problem in second type of freepath.
In the first type of freepath, we got migratetype of freeing page
without holding the zone lock, so it could be racy. There are two cases
of this race.
1. pages are added to isolate buddy list after restoring orignal
migratetype
CPU1 CPU2
get migratetype => return MIGRATE_ISOLATE
call free_one_page() with MIGRATE_ISOLATE
grab the zone lock
unisolate pageblock
release the zone lock
grab the zone lock
call __free_one_page() with MIGRATE_ISOLATE
freepage go into isolate buddy list,
although pageblock is already unisolated
This may cause two problems. One is that we can't use this page anymore
until next isolation attempt of this pageblock, because freepage is on
isolate buddy list. The other is that freepage accouting could be wrong
due to merging between different buddy list. Freepages on isolate buddy
list aren't counted as freepage, but ones on normal buddy list are
counted as freepage. If merge happens, buddy freepage on normal buddy
list is inevitably moved to isolate buddy list without any consideration
of freepage accouting so it could be incorrect.
2. pages are added to normal buddy list while pageblock is isolated.
It is similar with above case.
This also may cause two problems. One is that we can't keep these
freepages from being allocated. Although this pageblock is isolated,
freepage would be added to normal buddy list so that it could be
allocated without any restriction. And the other problem is same as
case 1, that it, incorrect freepage accouting.
This race condition would be prevented by checking migratetype again
with holding the zone lock. Because it is somewhat heavy operation and
it isn't needed in common case, we want to avoid rechecking as much as
possible. So this patch introduce new variable, nr_isolate_pageblock in
struct zone to check if there is isolated pageblock. With this, we can
avoid to re-check migratetype in common case and do it only if there is
isolated pageblock or migratetype is MIGRATE_ISOLATE. This solve above
mentioned problems.
Changes from v3:
Add one more check in free_one_page() that checks whether migratetype is
MIGRATE_ISOLATE or not. Without this, abovementioned case 1 could happens.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Heesub Shin <heesub.shin@samsung.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Ritesh Harjani <ritesh.list@gmail.com> Cc: Gioh Kim <gioh.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
zram could kunmap_atomic() a NULL pointer in a rare situation: a zram
page becomes a full-zeroed page after a partial write io. The current
code doesn't handle this case and performs kunmap_atomic() on a NULL
pointer, which panics the kernel.
This patch fixes this issue.
Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Weijie Yang <weijie.yang.kh@gmail.com> Acked-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Atomicity between xchg and cmpxchg cannot be guaranteed when xchg is
implemented with a swap and cmpxchg is implemented with locks.
Without this, e.g. mcs_spin_lock and mcs_spin_unlock are broken.
Signed-off-by: Andreas Larsson <andreas@gaisler.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Otherwise rcu_irq_{enter,exit}() do not happen and we get dumps like:
====================
[ 188.275021] ===============================
[ 188.309351] [ INFO: suspicious RCU usage. ]
[ 188.343737] 3.18.0-rc3-00068-g20f3963-dirty #54 Not tainted
[ 188.394786] -------------------------------
[ 188.429170] include/linux/rcupdate.h:883 rcu_read_lock() used
illegally while idle!
[ 188.505235]
other info that might help us debug this:
Based almost entirely upon a patch by Paul E. McKenney.
Reported-by: Meelis Roos <mroos@linux.ee> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This code is trying to go through the standard PCI config space
interfaces to read the PCI controller's PCI_STATUS register.
This doesn't work, because we more often than not do not enumerate
the PCI controller as a bonafide PCI device during the OF device
node scan. Therefore bus->self remains NULL.
Existing common code for PSYCHO and PSYCHO-like PCI controllers
handles this properly, by doing the config space access directly.
Do the same here, pbm->pci_ops->{read,write}().
Reported-by: Meelis Roos <mroos@linux.ee> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The VD_OP_GET_VTOC operation will succeed only if the vdisk backend has a
VTOC label, otherwise it will fail. In particular, it will return error
48 (ENOTSUP) if the disk has an EFI label. VTOC disk labels are already
handled by directly reading the disk in block/partitions/sun.c (enabled by
CONFIG_SUN_PARTITION which defaults to y on SPARC). Since port->label is
unused in the driver, remove the call and the field.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
vio_dring_avail() will allow use of every dring entry, but when the last
entry is allocated then dr->prod == dr->cons which is indistinguishable from
the ring empty condition. This causes the next allocation to reuse an entry.
When this happens in sunvdc, the server side vds driver begins nack'ing the
messages and ends up resetting the ldc channel. This problem does not effect
sunvnet since it checks for < 2.
The fix here is to just never allocate the very last dring slot so that full
and empty are not the same condition. The request start path was changed to
check for the ring being full a bit earlier, and to stop the blk_queue if
there is no space left. The blk_queue will be restarted once the ring is
only half full again. The number of ring entries was increased to 512 which
matches the sunvnet and Solaris vdc drivers, and greatly reduces the
frequency of hitting the ring full condition and the associated blk_queue
stop/starting. The checks in sunvent were adjusted to account for
vio_dring_avail() returning 1 less.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ldc_map_sg() could fail its check that the number of pages referred to
by the sg scatterlist was <= the number of cookies.
This fixes the issue by doing a similar thing to the xen-blkfront driver,
ensuring that the scatterlist will only ever contain a segment count <=
port->ring_cookies, and each segment will be page aligned, and <= page
size. This ensures that the scatterlist is always mappable.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The LDom diskserver doesn't return reliable geometry data. In addition,
the types for all fields in the vio_disk_geom are u16, which were being
truncated in the cast into the u8's of the Linux struct hd_geometry.
Modify vdc_getgeo() to compute the geometry from the disk's capacity in a
manner consistent with xen-blkfront::blkif_getgeo().
Signed-off-by: Dwight Engen <dwight.engen@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Interpret the media type from v1.1 protocol to support CDROM/DVD.
For v1.0 protocol, a disk's size continues to be calculated from the
geometry returned by the vdisk server. The geometry returned by the server
can be less than the actual number of sectors available in the backing
image/device due to the rounding in the division used to compute the
geometry in the vdisk server.
In v1.1 protocol a disk's actual size in sectors is returned during the
handshake. Use this size when v1.1 protocol is negotiated. Since this size
will always be larger than the former geometry computed size, disks created
under v1.0 will be forwards compatible to v1.1, but not vice versa.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With commit be9dad1f9f26604fb ("net: phy: suspend phydev when going
to HALTED"), the PHY device will be put in a low-power mode using
BMCR_PDOWN if the the interface is set down. The smsc911x driver does
a software_reset opening the device driver (ndo_open). In such case,
the PHY must be powered-up before access to any register and before
calling the software_reset function. Otherwise, as the PHY is powered
down the software reset fails and the interface can not be enabled
again.
This patch fixes this scenario that is easy to reproduce setting down
the network interface and setting up again.
$ ifconfig eth0 down
$ ifconfig eth0 up
ifconfig: SIOCSIFFLAGS: Input/output error
Signed-off-by: Enric Balletbo i Serra <eballetbo@iseebcn.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Even if netlink_kernel_cfg::unbind is implemented the unbind() method is
not called, because cfg->unbind is omitted in __netlink_kernel_create().
And fix wrong argument of test_bit() and off by one problem.
At this point, no unbind() method is implemented, so there is no real
issue.
Fixes: 4f520900522f ("netlink: have netlink per-protocol bind function return an error code.") Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Cc: Richard Guy Briggs <rgb@redhat.com> Acked-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit ae5c6c6d "ptp: Classify ptp over ip over vlan packets" changed the
code in two drivers that matches time stamps with PTP frames, with the goal
of allowing VLAN tagged PTP packets to receive hardware time stamps.
However, that commit failed to account for the VLAN header when parsing
IPv4 packets. This patch fixes those two drivers to correctly match VLAN
tagged IPv4/UDP PTP messages with their time stamps.
This patch should also be applied to v3.17.
Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use IS_ENABLED(CONFIG_IPV6), to enable this code if IPv6 is
a module.
Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: c8e6ad0829a7 ("ipv6: honor IPV6_PKTINFO with v4 mapped addresses on sendmsg") Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A very minimal and simple user space application allocating an SCTP
socket, setting SCTP_AUTH_KEY setsockopt(2) on it and then closing
the socket again will leak the memory containing the authentication
key from user space:
This is bad because of two things, we can bring down a machine from
user space when auth_enable=1, but also we would leave security sensitive
keying material in memory without clearing it after use. The issue is
that sctp_auth_create_key() already sets the refcount to 1, but after
allocation sctp_auth_set_key() does an additional refcount on it, and
thus leaving it around when we free the socket.
Fixes: 65b07e5d0d0 ("[SCTP]: API updates to suport SCTP-AUTH extensions.") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While the INIT chunk parameter verification dissects through many things
in order to detect malformed input, it misses to actually check parameters
inside of parameters. E.g. RFC5061, section 4.2.4 proposes a 'set primary
IP address' parameter in ASCONF, which has as a subparameter an address
parameter.
So an attacker may send a parameter type other than SCTP_PARAM_IPV4_ADDRESS
or SCTP_PARAM_IPV6_ADDRESS, param_type2af() will subsequently return 0
and thus sctp_get_af_specific() returns NULL, too, which we then happily
dereference unconditionally through af->from_addr_param().
A minimal way to address this is to check for NULL as we do on all
other such occasions where we know sctp_get_af_specific() could
possibly return with NULL.
Fixes: d6de3097592b ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In ppp_ioctl(), bpf_prog_create() is called inside ppp_lock, which
eventually calls vmalloc() and hits BUG_ON() in vmalloc.c. This patch
works around the problem by moving the allocation outside the lock.
The bug was revealed by the recent change in net/core/filter.c, as it
allocates via vmalloc() instead of kmalloc() now.
Reported-and-tested-by: Stefan Seyfried <stefan.seyfried@googlemail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, we only match against local port number in order to reuse
socket. But if this new vxlan wants an IPv6 socket and a IPv4 one bound
to that port, vxlan will reuse an IPv4 socket as IPv6 and a panic will
follow. The following steps reproduce it:
# ip link add vxlan6 type vxlan id 42 group 229.10.10.10 \
srcport 5000 6000 dev eth0
# ip link add vxlan7 type vxlan id 43 group ff0e::110 \
srcport 5000 6000 dev eth0
# ip link set vxlan6 up
# ip link set vxlan7 up
<panic>
When doing GRO processing for UDP tunnels, we never add
SKB_GSO_UDP_TUNNEL to gso_type - only the type of the inner protocol
is added (such as SKB_GSO_TCPV4). The result is that if the packet is
later resegmented we will do GSO but not treat it as a tunnel. This
results in UDP fragmentation of the outer header instead of (i.e.) TCP
segmentation of the inner header as was originally on the wire.
Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ATM, txq_reclaim will dequeue and free an skb for each tx desc released
by the hw that has TX_LAST_DESC set. However, in case of TSO, each
hw desc embedding the last part of a segment has TX_LAST_DESC set,
losing the one-to-one 'last skb frag'/'TX_LAST_DESC set' correspondance,
which causes data corruption.
Fix this by checking TX_ENABLE_INTERRUPT instead of TX_LAST_DESC, and
warn when trying to dequeue from an empty txq (which can be symptomatic
of releasing skbs prematurely).
Fixes: 3ae8f4e0b98 ('net: mv643xx_eth: Implement software TSO') Reported-by: Slawomir Gajzner <slawomir.gajzner@gmail.com> Reported-by: Julien D'Ascenzio <jdascenzio@yahoo.fr> Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com> Cc: Ian Campbell <ijc@hellion.org.uk> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Otherwise it gets overwritten by register_netdev().
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ipip6_tunnel_init() sets the dev->iflink via a call to
ipip6_tunnel_bind_dev(). After that, register_netdevice()
sets dev->iflink = -1. So we loose the iflink configuration
for ipv6 tunnels. Fix this by using ipip6_tunnel_init() as the
ndo_init function. Then ipip6_tunnel_init() is called after
dev->iflink is set to -1 from register_netdevice().
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
vti6_dev_init() sets the dev->iflink via a call to
vti6_link_config(). After that, register_netdevice()
sets dev->iflink = -1. So we loose the iflink configuration
for vti6 tunnels. Fix this by using vti6_dev_init() as the
ndo_init function. Then vti6_dev_init() is called after
dev->iflink is set to -1 from register_netdevice().
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ip6_tnl_dev_init() sets the dev->iflink via a call to
ip6_tnl_link_config(). After that, register_netdevice()
sets dev->iflink = -1. So we loose the iflink configuration
for ipv6 tunnels. Fix this by using ip6_tnl_dev_init() as the
ndo_init function. Then ip6_tnl_dev_init() is called after
dev->iflink is set to -1 from register_netdevice().
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The WARN_ON in inet_evict_bucket can be triggered by a valid case:
inet_frag_kill and inet_evict_bucket can be running in parallel on the
same queue which means that there has been at least one more ref added
by a previous inet_frag_find call, but inet_frag_kill can delete the
timer before inet_evict_bucket which will cause the WARN_ON() there to
trigger since we'll have refcnt!=1. Now, this case is valid because the
queue is being "killed" for some reason (removed from the chain list and
its timer deleted) so it will get destroyed in the end by one of the
inet_frag_put() calls which reaches 0 i.e. refcnt is still valid.
CC: Florian Westphal <fw@strlen.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McLean <chutzpah@gentoo.org> Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue") Reported-by: Patrick McLean <chutzpah@gentoo.org> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the evictor is running it adds some chosen frags to a local list to
be evicted once the chain lock has been released but at the same time
the *frag_queue can be running for some of the same queues and it
may call inet_frag_kill which will wait on the chain lock and
will then delete the queue from the wrong list since it was added in the
eviction one. The fix is simple - check if the queue has the evict flag
set under the chain lock before deleting it, this is safe because the
evict flag is set only under that lock and having the flag set also means
that the queue has been detached from the chain list, so no need to delete
it again.
An important note to make is that we're safe w.r.t refcnt because
inet_frag_kill and inet_evict_bucket will sync on the del_timer operation
where only one of the two can succeed (or if the timer is executing -
none of them), the cases are:
1. inet_frag_kill succeeds in del_timer
- then the timer ref is removed, but inet_evict_bucket will not add
this queue to its expire list but will restart eviction in that chain
2. inet_evict_bucket succeeds in del_timer
- then the timer ref is kept until the evictor "expires" the queue, but
inet_frag_kill will remove the initial ref and will set
INET_FRAG_COMPLETE which will make the frag_expire fn just to remove
its ref.
In the end all of the queue users will do an inet_frag_put and the one
that reaches 0 will free it. The refcount balance should be okay.
CC: Florian Westphal <fw@strlen.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McLean <chutzpah@gentoo.org> Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue") Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Patrick McLean <chutzpah@gentoo.org> Tested-by: Patrick McLean <chutzpah@gentoo.org> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit d1442d85cc30 ("KVM: x86: Handle errors when RIP is set during far
jumps") introduced a bug that caused the fix to be incomplete. Due to
incorrect evaluation, far jump to segment with L bit cleared (i.e., 32-bit
segment) and RIP with any of the high bits set (i.e, RIP[63:32] != 0) set may
not trigger #GP. As we know, this imposes a security problem.
In addition, the condition for two warnings was incorrect.
Fixes: d1442d85cc30ea75f7d399474ca738e0bc96f715 Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
[Add #ifdef CONFIG_X86_64 to avoid complaints of undefined behavior. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The bulkstat main loop progress is tracked by the "lastino"
variable, which is a full 64 bit inode. However, the loop actually
works on agno/agino pairs, and so there's a significant disconnect
between the rest of the loop and the main cursor. Convert this to
use the agino, and pass the agino into the chunk formatting function
and convert it too.
This gets rid of the inconsistency in the loop processing, and
finally makes it simple for us to skip inodes at any point in the
loop simply by incrementing the agino cursor.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The error propagation is a horror - xfs_bulkstat() returns
a rval variable which is only set if there are formatter errors. Any
sort of btree walk error or corruption will cause the bulkstat walk
to terminate but will not pass an error back to userspace. Worse
is the fact that formatter errors will also be ignored if any inodes
were correctly formatted into the user buffer.
Hence bulkstat can fail badly yet still report success to userspace.
This causes significant issues with xfsdump not dumping everything
in the filesystem yet reporting success. It's not until a restore
fails that there is any indication that the dump was bad and tha
bulkstat failed. This patch now triggers xfsdump to fail with
bulkstat errors rather than silently missing files in the dump.
This now causes bulkstat to fail when the lastino cookie does not
fall inside an existing inode chunk. The pre-3.17 code tolerated
that error by allowing the code to move to the next inode chunk
as the agino target is guaranteed to fall into the next btree
record.
With the fixes up to this point in the series, xfsdump now passes on
the troublesome filesystem image that exposes all these bugs.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are a bunch of variables tha tare more wildy scoped than they
need to be, obfuscated user buffer checks and tortured "next inode"
tracking. This all needs cleaning up to expose the real issues that
need fixing.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The loop construct has issues:
- clustidx is completely unused, so remove it.
- the loop tries to be smart by terminating when the
"freecount" tells it that all inodes are free. Just drop
it as in most cases we have to scan all inodes in the
chunk anyway.
- move the "user buffer left" condition check to the only
point where we consume space int eh user buffer.
- move the initialisation of agino out of the loop, leaving
just a simple loop control logic using the clusteridx.
Also, double handling of the user buffer variables leads to problems
tracking the current state - use the cursor variables directly
rather than keeping local copies and then having to update the
cursor before returning.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The xfs_bulkstat_agichunk formatting cursor takes buffer values from
the main loop and passes them via the structure to the chunk
formatter, and the writes the changed values back into the main loop
local variables. Unfortunately, this complex dance is full of corner
cases that aren't handled correctly.
The biggest problem is that it is double handling the information in
both the main loop and the chunk formatting function, leading to
inconsistent updates and endless loops where progress is not made.
To fix this, push the struct xfs_bulkstat_agichunk outwards to be
the primary holder of user buffer information. this removes the
double handling in the main loop.
Also, pass the last inode processed by the chunk formatter as a
separate parameter as it purely an output variable and is not
related to the user buffer consumption cursor.
Finally, the chunk formatting code is not shared by anyone, so make
it local to xfs_itable.c.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The bulkstat code has several different ways of detecting the end of
an AG when doing a walk. They are not consistently detected, and the
code that checks for the end of AG conditions is not consistently
coded. Hence the are conditions where the walk code can get stuck in
an endless loop making no progress and not triggering any
termination conditions.
Convert all the "tmp/i" status return codes from btree operations
to a common name (stat) and apply end-of-ag detection to these
operations consistently.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
xfs_bulkstat() doesn't check error return from xfs_btree_increment(). In
case of specific fs corruption that could result in xfs_bulkstat()
entering an infinite loop because we would be looping over the same
chunk over and over again. Fix the problem by checking the return value
and terminating the loop properly.
Coverity-id: 1231338 Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Jie Liu <jeff.u.liu@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The recent refactoring of the bulkstat code left a small landmine in
the code. If a inobt read fails, then the tree walk is aborted and
returns without releasing the AGI buffer or freeing the cursor. This
can lead to a subsequent bulkstat call hanging trying to grab the
AGI buffer again.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we hit any errors in btrfs_lookup_csums_range, we'll loop through all
the csums we allocate and free them. But the code was using list_entry
incorrectly, and ended up trying to free the on-stack list_head instead.
The string property read helpers will run off the end of the buffer if
it is handed a malformed string property. Rework the parsers to make
sure that doesn't happen. At the same time add new test cases to make
sure the functions behave themselves.
The original implementations of of_property_read_string_index() and
of_property_count_strings() both open-coded the same block of parsing
code, each with it's own subtly different bugs. The fix here merges
functions into a single helper and makes the original functions static
inline wrappers around the helper.
One non-bugfix aspect of this patch is the addition of a new wrapper,
of_property_read_string_array(). The new wrapper is needed by the
device_properties feature that Rafael is working on and planning to
merge for v3.19. The implementation is identical both with and without
the new static inline wrapper, so it just got left in to reduce the
churn on the header file.
Signed-off-by: Grant Likely <grant.likely@linaro.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Rob Herring <robh+dt@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Darren Hart <darren.hart@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Parallella board comes with a U-Boot bootloader that loads one of
two predefined FPGA bitstreams before booting the kernel. Both define an
AXI interface to the on-board Epiphany processor.
Enable clocks FCLK0..FCLK3 for the Programmable Logic by default.
Otherwise accessing, e.g., the ESYSRESET register freezes the board,
as seen with the Epiphany SDK tools e-reset and e-hw-rev, using /dev/mem.
Signed-off-by: Andreas Färber <afaerber@suse.de> Acked-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a race condition when removing glue directory.
It can be reproduced in following test:
path 1: Add first child device
device_add()
get_device_parent()
/*find parent from glue_dirs.list*/
list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry)
if (k->parent == parent_kobj) {
kobj = kobject_get(k);
break;
}
....
class_dir_create_and_add()
path2: Remove last child device under glue dir
device_del()
cleanup_device_parent()
cleanup_glue_dir()
kobject_put(glue_dir);
If path2 has been called cleanup_glue_dir(), but not
call kobject_put(glue_dir), the glue dir is still
in parent's kset list. Meanwhile, path1 find the glue
dir from the glue_dirs.list. Path2 may release glue dir
before path1 call kobject_get(). So kernel will report
the warning and bug_on.
This is a "classic" problem we have of a kref in a list
that can be found while the last instance could be removed
at the same time.
This patch reuse gdp_mutex to fix this race condition.
The following calltrace is captured in kernel 3.4, but
the latest kernel still has this bug.