If filesystem groups are artifically small (using parameter -g to
mkfs.ext4), ext4_mb_normalize_request() can result in a request that is
larger than a block group. Trim the request size to not confuse
allocation code.
Reported-by: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Because start block is not included in the range the hole appears at
the wrong offset (just after the desired offset) and the following
pwrite() overwrites already existent block, keeping hole untouched.
Simple way to verify wrong behaviour is to check zeroed blocks after
the test:
$ hexdump ./ext4.file | grep '0000 0000'
The root cause of the bug is a wrong range (start, stop], where start
should be inclusive, i.e. [start, stop].
This patch fixes the problem by including start into the range. But
not to break left shift (range collapse) stop points to the beginning
of the a block, not to the end.
The other not obvious change is an iterator check on validness in a
main loop. Because iterator is unsigned the following corner case
should be considered with care: insert a block at 0 offset, when stop
variables overflows and never becomes less than start, which is 0.
To handle this special case iterator is set to NULL to indicate that
end of the loop is reached.
loop_reread_partitions() needs to do I/O, but we just froze the queue,
so we end up waiting forever. This can easily be reproduced with losetup
-P. Fix it by moving the reread to after we unfreeze the queue.
Fixes: ecdd09597a57 ("block/loop: fix race between I/O and set_status") Reported-by: Tejun Heo <tj@kernel.org> Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the journal has been aborted, we shouldn't mark the underlying
buffer head as dirty, since that will cause the metadata block to get
modified. And if the journal has been aborted, we shouldn't allow
this since it will almost certainly lead to a corrupted file system.
Userspace applications should be allowed to expect the membarrier system
call with MEMBARRIER_CMD_SHARED command to issue memory barriers on
nohz_full CPUs, but synchronize_sched() does not take those into
account.
Given that we do not want unrelated processes to be able to affect
real-time sensitive nohz_full CPUs, simply return ENOSYS when membarrier
is invoked on a kernel with enabled nohz_full CPUs.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Josh Triplett <josh@joshtriplett.org> CC: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Rik van Riel <riel@redhat.com> Acked-by: Lai Jiangshan <jiangshanlai@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sd_check_events() is called asynchronously, and might race
with device removal. So always take a disk reference when
processing the event to avoid the device being removed while
the event is processed.
Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: Jinpu Wang <jinpu.wang@profitbricks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The device handler needs to check if a given queue belongs to a scsi
device; only then does it make sense to attach a device handler.
[mkp: dropped flags]
Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver currently checks the SELF_TEST_FAILED first and then
KERNEL_PANIC next. Under error conditions(boot code failure) both
SELF_TEST_FAILED and KERNEL_PANIC can be set at the same time.
The driver has the capability to reset the controller on an KERNEL_PANIC,
but not on SELF_TEST_FAILED.
Fixed by first checking KERNEL_PANIC and then the others.
Fixes: e8b12f0fb835223752 ([SCSI] aacraid: Add new code for PMC-Sierra's SRC base controller family) Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com> Reviewed-by: David Carroll <David.Carroll@microsemi.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On I/O errors, the Windows driver doesn't set data_transfer_length
on error conditions other than SRB_STATUS_DATA_OVERRUN.
In these cases we need to set data_transfer_length to 0,
indicating there is no data transferred. On SRB_STATUS_DATA_OVERRUN,
data_transfer_length is set by the Windows driver to the actual data transferred.
Reported-by: Shiva Krishna <Shiva.Krishna@nimblestorage.com> Signed-off-by: Long Li <longli@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When sense message is present on error, we should pass along to the upper
layer to decide how to deal with the error.
This patch fixes connectivity issues with Fiber Channel devices.
Signed-off-by: Long Li <longli@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Properly set SRB flags when hosting device supports tagged queuing.
This patch improves the performance on Fiber Channel disks.
Signed-off-by: Long Li <longli@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A rounding bug due to compiler generated temporary being 32bit was found
in remap_to_cache(). A localized cast in remap_to_cache() fixes the
corruption but this preferred fix (changing from uint32_t to sector_t)
eliminates potential for future rounding errors elsewhere.
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The problem is that shmat() calls do_mmap_pgoff() with MAP_FIXED, and
the address rounded down to 0. For the regular mmap case, the
protection mentioned above is that the kernel gets to generate the
address -- arch_get_unmapped_area() will always check for MAP_FIXED and
return that address. So by the time we do security_mmap_addr(0) things
get funky for shmat().
The testcase itself shows that while a regular user crashes, root will
not have a problem attaching a nil-page. There are two possible fixes
to this. The first, and which this patch does, is to simply allow root
to crash as well -- this is also regular mmap behavior, ie when hacking
up the testcase and adding mmap(... |MAP_FIXED). While this approach
is the safer option, the second alternative is to ignore SHM_RND if the
rounded address is 0, thus only having MAP_SHARED flags. This makes the
behavior of shmat() identical to the mmap() case. The downside of this
is obviously user visible, but does make sense in that it maintains
semantics after the round-down wrt 0 address and mmap.
With rw_page, page_endio is used for completing IO on a page and it
propagates write error to the address space if the IO fails. The
problem is it accesses page->mapping directly which might be okay for
file-backed pages but it shouldn't for anonymous page. Otherwise, it
can corrupt one of field from anon_vma under us and system goes panic
randomly.
swap_writepage
bdev_writepage
ops->rw_page
I encountered the BUG during developing new zram feature and it was
really hard to figure it out because it made random crash, somtime
mmap_sem lockdep, sometime other places where places never related to
zram/zsmalloc, and not reproducible with some configuration.
When I consider how that bug is subtle and people do fast-swap test with
brd, it's worth to add stable mark, I think.
Fixes: dd6bd0d9c7db ("swap: use bdev_read_page() / bdev_write_page()") Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
At the end of a window period, if the reclaimed pages is greater than
scanned, an unsigned underflow can result in a huge pressure value and
thus a critical event. Reclaimed pages is found to go higher than
scanned because of the addition of reclaimed slab pages to reclaimed in
shrink_node without a corresponding increment to scanned pages.
Minchan Kim mentioned that this can also happen in the case of a THP
page where the scanned is 1 and reclaimed could be 512.
Link: http://lkml.kernel.org/r/1486641577-11685-1-git-send-email-vinmenon@codeaurora.org Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Anton Vorontsov <anton.vorontsov@linaro.org> Cc: Shiraz Hashim <shashim@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When @node_reclaim_node isn't 0, the page allocator tries to reclaim
pages if the amount of free memory in the zones are below the low
watermark. On Power platform, none of NUMA nodes are scanned for page
reclaim because no nodes match the condition in zone_allows_reclaim().
On Power platform, RECLAIM_DISTANCE is set to 10 which is the distance
of Node-A to Node-A. So the preferred node even won't be scanned for
page reclaim.
size <<= 30;
p = malloc(size);
assert(p);
memset(p, 0, size);
end = time(NULL);
printf("Used time: %ld seconds\n", end - start);
sleep(3600);
return 0;
}
The system I use for testing has two NUMA nodes. Both have 128GB
memory. In below scnario, the page caches on node#0 should be reclaimed
when it encounters pressure to accommodate request of allocation.
Some of the macros are incorrect with wrong bit-shifts resulting in picking
the incorrect invalidation granularity. Incorrect Source-ID in extended
devtlb invalidation caused device side errors.
It is allowed to call regulator_get with a NULL dev argument
(_regulator_get explicitly checks for it) but this causes an error later
when printing /sys/kernel/debug/regulator_summary.
Fix this by explicitly handling "deviceless" consumers in the debugfs code.
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
gcc-7 detects that wlanhdr_to_ethhdr() in two drivers calls memcpy() with
a destination argument that an earlier function call may have set to NULL:
staging/rtl8188eu/core/rtw_recv.c: In function 'wlanhdr_to_ethhdr':
staging/rtl8188eu/core/rtw_recv.c:1318:2: warning: argument 1 null where non-null expected [-Wnonnull]
staging/rtl8712/rtl871x_recv.c: In function 'r8712_wlanhdr_to_ethhdr':
staging/rtl8712/rtl871x_recv.c:649:2: warning: argument 1 null where non-null expected [-Wnonnull]
I'm fixing this by adding a NULL pointer check and returning failure
from the function, which is hopefully already handled properly.
This seems to date back to when the drivers were originally added,
so backporting the fix to stable seems appropriate. There are other
related realtek drivers in the kernel, but none of them contain a
function with a similar name or produce this warning.
Fixes: 1cc18a22b96b ("staging: r8188eu: Add files for new driver - part 5") Fixes: 2865d42c78a9 ("staging: r8712u: Add the new driver to the mainline kernel") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The sequencer FIFO management has a bug that may lead to a corruption
(shortage) of the cell linked list. When a sequencer client faces an
error at the event delivery, it tries to put back the dequeued cell.
When the first queue was put back, this forgot the tail pointer
tracking, and the link will be screwed up.
Although there is no memory corruption, the sequencer client may stall
forever at exit while flushing the pending FIFO cells in
snd_seq_pool_done(), as spotted by syzkaller.
This patch addresses the missing tail pointer tracking at
snd_seq_fifo_cell_putback(). Also the patch makes sure to clear the
cell->enxt pointer at snd_seq_fifo_event_in() for avoiding a similar
mess-up of the FIFO linked list.
Currently ctxfi driver tries to set only the 64bit DMA mask on 64bit
architectures, and bails out if it fails. This causes a problem on
some platforms since the 64bit DMA isn't always guaranteed. We should
fall back to the default 32bit DMA when 64bit DMA fails.
When a user sets a too small ticks with a fine-grained timer like
hrtimer, the kernel tries to fire up the timer irq too frequently.
This may lead to the condensed locks, eventually the kernel spinlock
lockup with warnings.
For avoiding such a situation, we define a lower limit of the
resolution, namely 1ms. When the user passes a too small tick value
that results in less than that, the kernel returns -EINVAL now.
Like for Sunrise Point, the total stream number of Lewisburg's
input and output stream exceeds 15 (GCAP is 0x9701), which will
cause some streams do not work because of the overflow on
SDxCTL.STRM field if using the legacy stream tag allocation method.
The issue is the same as "dd9aa335c880 ALSA: hda/realtek - Can't adjust
speaker's volume on a Dell AIO", the output requires to connect to a node
with Amp-out capability.
Applying the same fixup "ALC298_FIXUP_SPK_VOLUME" can fix the issue.
Enable DMA on usart3 to get a more reliable console. This is especially
useful for automation and kernelci were a kernel with PROVE_LOCKING enabled
is quite susceptible to character loss, resulting in tests failure.
is_jump_ins() checks for plain jump ("j") instructions since commit e7438c4b893e ("MIPS: Fix sibling call handling in get_frame_info") but
that commit didn't make the same change to the microMIPS code, leaving
it inconsistent with the MIPS32/MIPS64 code. Handle the microMIPS
encoding of the jump instruction too such that it behaves consistently.
Signed-off-by: Paul Burton <paul.burton@imgtec.com> Fixes: e7438c4b893e ("MIPS: Fix sibling call handling in get_frame_info") Cc: Tony Wu <tung7970@gmail.com> Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14533/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
get_frame_info() calculates the offset of the return address within a
stack frame simply by dividing a the bottom 16 bits of the instruction,
treated as a signed integer, by the size of a long. Whilst this works
for MIPS32 & MIPS64 ISAs where the sw or sd instructions are used, it's
incorrect for microMIPS where encodings differ. The result is that we
typically completely fail to unwind the stack on microMIPS.
Fix this by adjusting is_ra_save_ins() to calculate the return address
offset, and take into account the various different encodings there in
the same place as we consider whether an instruction is storing the
ra/$31 register.
With this we are now able to unwind the stack for kernels targetting the
microMIPS ISA, for example we can produce:
is_jump_ins() checks 16b instruction fields without verifying that the
instruction is indeed 16b, as is done by is_ra_save_ins() &
is_sp_move_ins(). Add the appropriate check.
get_frame_info() is meant to iterate over up to the first 128
instructions within a function, but for microMIPS kernels it will not
reach that many instructions unless the function is 512 bytes long since
we calculate the maximum number of instructions to check by dividing the
function length by the 4 byte size of a union mips_instruction. In
microMIPS kernels this won't do since instructions are variable length.
Fix this by instead checking whether the pointer to the current
instruction has reached the end of the function, and use max_insns as a
simple constant to check the number of iterations against.
During stack unwinding we call a number of functions to determine what
type of instruction we're looking at. The union mips_instruction pointer
provided to them may be pointing at a 2 byte, but not 4 byte, aligned
address & we thus cannot directly access the 4 byte wide members of the
union mips_instruction. To avoid this is_ra_save_ins() copies the
required half-words of the microMIPS instruction to a correctly aligned
union mips_instruction on the stack, which it can then access safely.
The is_jump_ins() & is_sp_move_ins() functions do not correctly perform
this temporary copy, and instead attempt to directly dereference 4 byte
fields which may be misaligned and lead to an address exception.
Fix this by copying the instruction halfwords to a temporary union
mips_instruction in get_frame_info() such that we can provide a 4 byte
aligned union mips_instruction to the is_*_ins() functions and they do
not need to deal with misalignment themselves.
get_frame_info() can be called in microMIPS kernels with the ISA bit
already clear. For example this happens when unwind_stack_by_address()
is called because we begin with a PC that has the ISA bit set & subtract
the (odd) offset from the preceding symbol (which does not have the ISA
bit set). Since get_frame_info() unconditionally subtracts 1 from the PC
in microMIPS kernels it incorrectly misaligns the address it then
attempts to access code at, leading to an address error exception.
Fix this by using msk_isa16_mode() to clear the ISA bit, which allows
get_frame_info() to function regardless of whether it is provided with a
PC that has the ISA bit set or not.
Disabling ethernet during reboot (only to enable it again when the
ethernet driver attaches) can put the chip into a faulty state where it
corrupts the header of all incoming packets.
This happens if packets arrive during the time window where the core is
disabled, and it can be easily reproduced by rebooting while sending a
flood ping to the broadcast address.
Fixes: 95135bfa7ead ("MIPS: Lantiq: Deactivate most of the devices by default") Signed-off-by: Felix Fietkau <nbd@nbd.name> Acked-by: John Crispin <john@phrozen.org> Cc: hauke.mehrtens@lantiq.com Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/15078/ Signed-off-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If copy_from_user is called with a large buffer (>= 128 bytes) and the
userspace buffer refers partially to unreadable memory, then it is
possible for Octeon's copy_from_user to report the wrong number of bytes
have been copied. In the case where the buffer size is an exact multiple
of 128 and the fault occurs in the last 64 bytes, copy_from_user will
report that all the bytes were copied successfully but leave some
garbage in the destination buffer.
The bug is in the main __copy_user_common loop in octeon-memcpy.S where
in the middle of the loop, src and dst are incremented by 128 bytes. The
l_exc_copy fault handler is used after this but that assumes that
"src < THREAD_BUADDR($28)". This is not the case if src has already been
incremented.
Fix by adding an extra fault handler which rewinds the src and dst
pointers 128 bytes before falling though to l_exc_copy.
Thanks to the pwritev test from the strace test suite for originally
highlighting this bug!
Fixes: 5b3b16880f40 ("MIPS: Add Cavium OCTEON processor support ...") Signed-off-by: James Cowgill <James.Cowgill@imgtec.com> Acked-by: David Daney <david.daney@cavium.com> Reviewed-by: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14978/ Signed-off-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For certain arguments such as saddr = 0xc0a8fd60, daddr = 0xc0a8fda1,
len = 80, proto = 17, sum = 0x7eae049d there will be a carry when
folding the intermediate 64 bit checksum to 32 bit but the code doesn't
add the carry back to the one's complement sum, thus an incorrect result
will be generated.
Reported-by: Mark Zhang <bomb.zhang@gmail.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Reviewed-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move mic/mpssd examples to samples and remove it from Documentation
Makefile. Create a new Makefile to build mic/mpssd. It can be built
from top level directory or from mic/mpssd directory:
Run make -C samples/mic/mpssd or cd samples/mic/mpssd; make
Acked-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
[backported to 4.4-stable as this code is broken on newer versions of
gcc and we don't want to break the build for a Documentation sample.
- gregkh] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Between loading the new VMCS and enabling PML, the CPU was unpinned.
If the vCPU thread were migrated to another CPU in the interim (e.g.,
due to preemption or sleeping alloc_page), then the VMWRITEs to enable
PML would target the wrong VMCS -- or no VMCS at all:
[ 2087.266950] vmwrite error: reg 200e value 3fe1d52000 (err -506126336)
[ 2087.267062] vmwrite error: reg 812 value 1ff (err 511)
[ 2087.267125] vmwrite error: reg 401e value 12229c00 (err 304258048)
This patch ensures that the VMCS remains current while enabling PML by
doing the VMWRITEs while the CPU is pinned. Allocation of the PML buffer
is hoisted out of the critical section.
Signed-off-by: Peter Feiner <pfeiner@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Cc: "Herongguang (Stephen)" <herongguang.he@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the most of cases, we only use one transaction per frame and the
frame rate may be high, If the platforms want to support multiple
transactions but less frame rate cases like [1] and [2], it can set
"non-zero-ttctrl-ttha" at dts.
In the function rtl_usb_start we pre-allocate a certain number of urbs
for RX path but they will not be freed when calling rtl_usb_stop. This
results in leaking urbs when doing ifconfig up and down. Eventually,
the system has no available urbs.
Signed-off-by: Michael Schenk <michael.schenk@albis-elcon.com> Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When !CONFIG_CGROUP_WRITEBACK, bdi has single bdi_writeback_congested
at bdi->wb_congested. cgwb_bdi_init() allocates it with kzalloc() and
doesn't do further initialization. This usually works fine as the
reference count gets bumped to 1 by wb_init() and the put from
wb_exit() releases it.
However, when wb_init() fails, it puts the wb base ref automatically
freeing the wb and the explicit kfree() in cgwb_bdi_init() error path
ends up trying to free the same pointer the second time causing a
double-free.
Fix it by explicitly initilizing the refcnt to 1 and putting the base
ref from cgwb_bdi_destroy().
The goldfish platform code registers the platform device unconditionally
which causes havoc in several ways if the goldfish_pdev_bus driver is
enabled:
- Access to the hardcoded physical memory region, which is either not
available or contains stuff which is completely unrelated.
- Prevents that the interrupt of the serial port can be requested
- In case of a spurious interrupt it goes into a infinite loop in the
interrupt handler of the pdev_bus driver (which needs to be fixed
seperately).
Add a 'goldfish' command line option to make the registration opt-in when
the platform is compiled in.
I'm seriously grumpy about this engineering trainwreck, which has seven
SOBs from Intel developers for 50 lines of code. And none of them figured
out that this is broken. Impressive fail!
Fixes: ddd70cf93d78 ("goldfish: platform device for x86") Reported-by: Gabriel C <nix.or.die@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current implementation failed to detect short transfers, something
which could lead to bits of the uninitialised heap transfer buffer
leaking to user space.
Fixes: 149fc791a452 ("USB: ark3116: Setup some basic infrastructure for new ark3116 driver.") Fixes: f4c1e8d597d1 ("USB: ark3116: Make existing functions 16450-aware and add close and release functions.") Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The opticon driver used a control request at open to trigger a CTS
status notification to be sent over the bulk-in pipe. When the driver
was converted to using the generic read implementation, an inverted test
prevented this request from being sent, something which could lead to
TIOCMGET reporting an incorrect CTS state.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 7a6ee2b02751 ("USB: opticon: switch to generic read implementation") Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make sure to detect short control transfers and return zero on success
when retrieving the modem status.
This fixes the TIOCMGET implementation which since e1ed212d8593 ("USB:
spcp8x5: add proper modem-status support") has returned TIOCM_LE on
successful retrieval, and avoids leaking bits from the stack on short
transfers.
This also fixes the carrier-detect implementation which since the above
mentioned commit unconditionally has returned true.
FTDI devices use a receive latency timer to periodically empty the
receive buffer and report modem and line status (also when the buffer is
empty).
When a break or error condition is detected the corresponding status
flags will be set on a packet with nonzero data payload and the flags
are not updated until the break is over or further characters are
received.
In order to avoid over-reporting break and error conditions, these flags
must therefore only be processed for packets with payload.
This specifically fixes the case where after an overrun, the error
condition is continuously reported and NULL-characters inserted until
further data is received.
Reported-by: Michael Walle <michael@walle.cc> Fixes: 72fda3ca6fc1 ("USB: serial: ftd_sio: implement sysrq handling on
break") Fixes: 166ceb690750 ("USB: ftdi_sio: clean up line-status handling") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit 557aaa7ffab6 ("ft232: support the ASYNC_LOW_LATENCY
flag") the FTDI driver has been using a receive latency-timer value of
1 ms instead of the device default of 16 ms.
The latency timer is used to periodically empty a non-full receive
buffer, but a status header is always sent when the timer expires
including when the buffer is empty. This means that a two-byte bulk
message is received every millisecond also for an otherwise idle port as
long as it is open.
Let's restore the pre-2009 behaviour which reduces the rate of the
status messages to 1/16th (e.g. interrupt frequency drops from 1 kHz to
62.5 Hz) by not setting ASYNC_LOW_LATENCY by default.
Anyone willing to pay the price for the minimum-latency behaviour should
set the flag explicitly instead using the TIOCSSERIAL ioctl or a tool
such as setserial (e.g. setserial /dev/ttyUSB0 low_latency).
Note that since commit 0cbd81a9f6ba ("USB: ftdi_sio: remove
tty->low_latency") the ASYNC_LOW_LATENCY flag has no other effects but
to set a minimal latency timer.
Reported-by: Antoine Aubert <a.aubert@overkiz.com> Fixes: 557aaa7ffab6 ("ft232: support the ASYNC_LOW_LATENCY flag") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make sure to detect short responses when fetching the modem status in
order to avoid parsing uninitialised buffer data and having bits of it
leak to user space.
Note that we still allow for short 1-byte responses.
Add new USB IDs for cp2104/5 devices on Bx50v3 boards due to the design
change.
Signed-off-by: Ken Lin <yungching0725@gmail.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.
Export the module alias information using the MODULE_DEVICE_TABLE() macro.
Before this patch:
$ modinfo drivers/tty/serial/msm_serial.ko | grep alias
$
Commit 34b88a68f26a ("net: Fix use after free in the recvmmsg exit path"),
changed the exit path of recvmmsg to always return the datagrams
variable and modified the error paths to set the variable to the error
code returned by recvmsg if necessary.
However in the case sock_error returned an error, the error code was
then ignored, and recvmmsg returned 0.
Change the error path of recvmmsg to correctly return the error code
of sock_error.
The bug was triggered by using recvmmsg on a CAN interface which was
not up. Linux 4.6 and later return 0 in this case while earlier
releases returned -ENETDOWN.
Fixes: 34b88a68f26a ("net: Fix use after free in the recvmmsg exit path") Signed-off-by: Maxime Jayat <maxime.jayat@mobile-devices.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The skbs processed by ip_cmsg_recv() are not guaranteed to
be linear e.g. when sending UDP packets over loopback with
MSGMORE.
Using csum_partial() on [potentially] the whole skb len
is dangerous; instead be on the safe side and use skb_checksum().
Thanks to syzkaller team to detect the issue and provide the
reproducer.
v1 -> v2:
- move the variable declaration in a tighter scope
Fixes: ad6f939ab193 ("ip: Add offset parameter to ip_cmsg_recv") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the current DCCP implementation an skb for a DCCP_PKT_REQUEST packet
is forcibly freed via __kfree_skb in dccp_rcv_state_process if
dccp_v6_conn_request successfully returns.
However, if IPV6_RECVPKTINFO is set on a socket, the address of the skb
is saved to ireq->pktopts and the ref count for skb is incremented in
dccp_v6_conn_request, so skb is still in use. Nevertheless, it gets freed
in dccp_rcv_state_process.
Fix by calling consume_skb instead of doing goto discard and therefore
calling __kfree_skb.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3. calling dev_remove_pack(&fanout->prot_hook), from inside
spin_lock(&po->bind_lock) or rcu_read-side critical-section. dev_remove_pack()
-> synchronize_net(), which might sleep.
4. fanout_release() races with calls from different CPU.
To fix the above problems, remove the call to fanout_release() under
rcu_read_lock(). Instead, call __dev_remove_pack(&fanout->prot_hook) and
netdev_run_todo will be happy that &dev->ptype_specific list is empty. In order
to achieve this, I moved dev_{add,remove}_pack() out of fanout_{add,release} to
__fanout_{link,unlink}. So, call to {,__}unregister_prot_hook() will make sure
fanout->prot_hook is removed as well.
Fixes: 6664498280cf ("packet: call fanout_release, while UNREGISTERING a netdev") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Anoob Soman <anoob.soman@citrix.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fortunately fuzzers like syzkaller still know how to run this code,
otherwise it would be no fun.
Setting skb->sk without skb->destructor leads to all kinds of
bugs, we now prefer to be very strict about it.
Ideally here we would use skb_set_owner() but this helper does not exist yet,
only CAN seems to have a private helper for that.
Fixes: 376c7311bdb6 ("net: add a temporary sanity check in skb_orphan()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 0809e3ac6231 ("block: fix plug list flushing for nomerge queues")
updated blk_mq_make_request() to set request_count even when
blk_queue_nomerges() returns true. However, blk_mq_make_request() only
does limited plugging and doesn't use request_count;
blk_sq_make_request() is the one that should have been fixed. Do that
and get rid of the unnecessary work in the mq version.
Fixes: 0809e3ac6231 ("block: fix plug list flushing for nomerge queues") Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fixes a RTC wakealarm issue, namely, the event fires during
hibernate and is not cleared from the list, causing hwclock to block.
The current enqueuing does not trigger an alarm if any expired timers
already exist on the timerqueue. This can occur when a RTC wake alarm
is used to wake a machine out of hibernate and the resumed state has
old expired timers that have not been removed from the timer queue.
This fix skips over any expired timers and triggers an alarm if there
are no pending timers on the timerqueue. Note that the skipped expired
timer will get reaped later on, so there is no need to clean it up
immediately.
The issue can be reproduced by putting a machine into hibernate and
waking it with the RTC wakealarm. Running the example RTC test program
from tools/testing/selftests/timers/rtctest.c after the hibernate will
block indefinitely. With the fix, it no longer blocks after the
hibernate resume.
BugLink: http://bugs.launchpad.net/bugs/1333569 Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
These drivers need to be able to reference "struct ieee80211_hw" from
the driver's private data, and vice versa. The USB driver failed to
store the address of ieee80211_hw in the private data. Although this
bug has been present for a long time, it was not exposed until
commit ba9f93f82aba ("rtlwifi: Fix enter/exit power_save").
Fixes: ba9f93f82aba ("rtlwifi: Fix enter/exit power_save") Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 577fb13199b1 ("mmc: rework selection of bus speed mode")
refactored bus width selection code to mmc_select_bus_width().
However, it also altered the behavior to not call the selection code in
non-high-speed modes anymore.
This causes 1-bit mode to always be used when the high-speed mode is not
enabled, even though 4-bit and 8-bit bus are valid bus widths in the
backwards-compatibility (legacy) mode as well (see e.g. 5.3.2 Bus Speed
Modes in JEDEC 84-B50). This results in a significant regression in
transfer speeds.
Fix the code to allow 4-bit and 8-bit widths even without high-speed
mode, as before.
Tested with a Zynq-7000 PicoZed 7020 board.
Fixes: 577fb13199b1 ("mmc: rework selection of bus speed mode") Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
[anssi.hannula@bitwise.fi: backported for the different err variable
check on v4.4 and tested] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The call to debugfs_remove_recursive(qp->debugfs_dir) of the sub-level
directory must not be later than
debugfs_remove_recursive(nt_debugfs_dir) of the top-level directory.
Otherwise, the sub-level directory will not exist, and it would be
invalid (panic) to attempt to remove it. This removes the top-level
directory last, after sub-level directories have been cleaned up.
Signed-off-by: Allen Hubbe <Allen.Hubbe@dell.com> Fixes: e26a5843f ("NTB: Split ntb_hw_intel and ntb_transport drivers") Signed-off-by: Jon Mason <jdmason@kudzu.us> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This RCU warning, however, is suppressed by lockdep_off() in printk().
lockdep_off() increments the ->lockdep_recursion counter and thus
disables RCU_LOCKDEP_WARN() and debug_lockdep_rcu_enabled(), which want
lockdep to be enabled "current->lockdep_recursion == 0".
Link: http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reported-by: Tony Lindgren <tony@atomide.com> Tested-by: Tony Lindgren <tony@atomide.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Lindgren <tony@atomide.com> Cc: Russell King <rmk@armlinux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The 64-bit get_user() wasn't clearing the high word due to a typo in the
error handler. The exception handler entry was already correct, though.
Noticed during recent usercopy test additions in lib/test_user_copy.c.
Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The UEVENT user mode helper is enabled before the initcalls are executed
and is available when the root filesystem has been mounted.
The user mode helper is triggered by device init calls and the executable
might use the futex syscall.
futex_init() is marked __initcall which maps to device_initcall, but there
is no guarantee that futex_init() is invoked _before_ the first device init
call which triggers the UEVENT user mode helper.
If the user mode helper uses the futex syscall before futex_init() then the
syscall crashes with a NULL pointer dereference because the futex subsystem
has not been initialized yet.
Move futex_init() to core_initcall so futexes are initialized before the
root filesystem is mounted and the usermode helper becomes available.
[ tglx: Rewrote changelog ]
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Cc: jiang.biao2@zte.com.cn Cc: jiang.zhengxiong@zte.com.cn Cc: zhong.weidong@zte.com.cn Cc: deng.huali@zte.com.cn Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1483085875-6130-1-git-send-email-yang.yang29@zte.com.cn Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
100% reproducible issue found on SKL SkullCanyon NUC with two external
DP daisy-chained monitors in DP/MST mode. When turning off or changing
the input of the second monitor the machine stops with a kernel
oops. This issue happened with 4.8.8 as well as drm/drm-intel-nightly.
This issue is traced to an inconsistent control flow in
drm_dp_update_payload_part1(): the 'port' pointer is set to NULL at the
same time as 'req_payload.num_slots' is set to zero, but the pointer is
dereferenced even when req_payload.num_slot is zero.
The problematic dereference was introduced in commit dfda0df34
("drm/mst: rework payload table allocation to conform better") and may
impact all versions since v3.18
The fix suggested by Chris Wilson removes the kernel oops and was found to
work well after 10mn of monkey-testing with the second monitor power and
input buttons
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98990 Fixes: dfda0df34264 ("drm/mst: rework payload table allocation to conform better.") Cc: Dave Airlie <airlied@redhat.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Nathan D Ciobanu <nathan.d.ciobanu@linux.intel.com> Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com> Cc: Sean Paul <seanpaul@chromium.org> Tested-by: Nathan D Ciobanu <nathan.d.ciobanu@linux.intel.com> Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1487076561-2169-1-git-send-email-jani.nikula@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ELAN0605 has been confirmed to be a variant of ELAN0600, which is
blacklisted in the hid-core to be managed by elan_i2c. This device can be
found in Lenovo ideapad 310s (80U4000).
What happens is that a write to /dev/sg is given a request with non-zero
->iovec_count combined with zero ->dxfer_len. Or with ->dxferp pointing
to an array full of empty iovecs.
Having write permission to /dev/sg shouldn't be equivalent to the
ability to trigger BUG_ON() while holding spinlocks...
Found by Dmitry Vyukov and syzkaller.
[ The BUG_ON() got changed to a WARN_ON_ONCE(), but this fixes the
underlying issue. - Linus ]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Don't crash the machine just because of an empty transfer. Use WARN_ON()
combined with returning an error.
Found by Dmitry Vyukov and syzkaller.
[ Changed to "WARN_ON_ONCE()". Al has a patch that should fix the root
cause, but a BUG_ON() is not acceptable in any case, and a WARN_ON()
might still be a cause of excessive log spamming.
NOTE! If this warning ever triggers, we may end up leaking resources,
since this doesn't bother to try to clean the command up. So this
WARN_ON_ONCE() triggering does imply real problems. But BUG_ON() is
much worse.
People really need to stop using BUG_ON() for "this shouldn't ever
happen". It makes pretty much any bug worse. - Linus ]
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: James Bottomley <jejb@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a potential race between fuse_dev_do_write()
and request_wait_answer() contexts as shown below:
TASK 1:
__fuse_request_send():
|--spin_lock(&fiq->waitq.lock);
|--queue_request();
|--spin_unlock(&fiq->waitq.lock);
|--request_wait_answer():
|--if (test_bit(FR_SENT, &req->flags))
<gets pre-empted after it is validated true>
TASK 2:
fuse_dev_do_write():
|--clears bit FR_SENT,
|--request_end():
|--sets bit FR_FINISHED
|--spin_lock(&fiq->waitq.lock);
|--list_del_init(&req->intr_entry);
|--spin_unlock(&fiq->waitq.lock);
|--fuse_put_request();
|--queue_interrupt();
<request gets queued to interrupts list>
|--wake_up_locked(&fiq->waitq);
|--wait_event_freezable();
<as FR_FINISHED is set, it returns and then
the caller frees this request>
Now, the next fuse_dev_do_read(), see interrupts list is not empty
and then calls fuse_read_interrupt() which tries to access the request
which is already free'd and gets the below crash:
[11432.401266] Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b6b6b
...
[11432.418518] Kernel BUG at ffffff80083720e0
[11432.456168] PC is at __list_del_entry+0x6c/0xc4
[11432.463573] LR is at fuse_dev_do_read+0x1ac/0x474
...
[11432.679999] [<ffffff80083720e0>] __list_del_entry+0x6c/0xc4
[11432.687794] [<ffffff80082c65e0>] fuse_dev_do_read+0x1ac/0x474
[11432.693180] [<ffffff80082c6b14>] fuse_dev_read+0x6c/0x78
[11432.699082] [<ffffff80081d5638>] __vfs_read+0xc0/0xe8
[11432.704459] [<ffffff80081d5efc>] vfs_read+0x90/0x108
[11432.709406] [<ffffff80081d67f0>] SyS_read+0x58/0x94
As FR_FINISHED bit is set before deleting the intr_entry with input
queue lock in request completion path, do the testing of this flag and
queueing atomically with the same lock in queue_interrupt().
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: fd22d62ed0c3 ("fuse: no fc->lock for iqueue parts") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reported as a Kaffeine bug:
https://bugs.kde.org/show_bug.cgi?id=375811
The USB control messages require DMA to work. We cannot pass
a stack-allocated buffer, as it is not warranted that the
stack would be into a DMA enabled area.
On Kernel 4.9, the default is to not accept DMA on stack anymore
on x86 architecture. On other architectures, this has been a
requirement since Kernel 2.2. So, after this patch, this driver
should likely work fine on all archs.
Flags (PIPE_BUF_FLAG_PACKET, PIPE_BUF_FLAG_GIFT) could remain on the
unused part of the pipe ring buffer. Previously splice_to_pipe() left
the flags value alone, which could result in incorrect behavior.
Uninitialized flags appears to have been there from the introduction of
the splice syscall.
udp_ioctl(), as its name suggests, is used by UDP protocols,
but is also used by L2TP :(
L2TP should use its own handler, because it really does not
look the same.
SIOCINQ for instance should not assume UDP checksum or headers.
Thanks to Andrey and syzkaller team for providing the report
and a nice reproducer.
While crashes only happen on recent kernels (after commit 7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue")), this
probably needs to be backported to older kernels.
Fixes: 7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue") Fixes: 85584672012e ("udp: Fix udp_poll() and ioctl()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link layer protocols may unconditionally pull headers, as Ethernet
does in eth_type_trans. Ensure that the entire link layer header
always lies in the skb linear segment. tpacket_snd has such a check.
Extend this to packet_snd.
Variable length link layer headers complicate the computation
somewhat. Here skb->len may be smaller than dev->hard_header_len.
Round up the linear length to be at least as long as the smallest of
the two.
Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The stack must not pass packets to device drivers that are shorter
than the minimum link layer header length.
Previously, packet sockets would drop packets smaller than or equal
to dev->hard_header_len, but this has false positives. Zero length
payload is used over Ethernet. Other link layer protocols support
variable length headers. Support for validation of these protocols
removed the min length check for all protocols.
Introduce an explicit dev->min_header_len parameter and drop all
packets below this value. Initially, set it to non-zero only for
Ethernet and loopback. Other protocols can follow in a patch to
net-next.
Fixes: 9ed988cd5915 ("packet: validate variable length ll headers") Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is because when tunnel->dst_cache init fails, we free dev->tstats
once in ipip6_tunnel_init() and twice in sit_init_net(). This looks
redundant but its ndo_uinit() does not seem enough to clean up everything
here. So avoid this by setting dev->tstats to NULL after the first free,
at least for -net.
Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexander Popov reported that an application may trigger a BUG_ON in
sctp_wait_for_sndbuf if the socket tx buffer is full, a thread is
waiting on it to queue more data and meanwhile another thread peels off
the association being used by the first thread.
This patch replaces the BUG_ON call with a proper error handling. It
will return -EPIPE to the original sendmsg call, similarly to what would
have been done if the association wasn't found in the first place.
Acked-by: Alexander Popov <alex.popov@linux.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mlx4 may schedule napi from a workqueue. Afterwards, softirqs are not run
in a deterministic time frame and the following message may be logged:
NOHZ: local_softirq_pending 08
The problem is the same as what was described in commit ec13ee80145c
("virtio_net: invoke softirqs after __napi_schedule") and this patch
applies the same fix to mlx4.
Fixes: 07841f9d94c1 ("net/mlx4_en: Schedule napi when RX buffers allocation fails") Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Acked-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When IFF_VNET_HDR is enabled, a virtio_net header must precede data.
Data length is verified to be greater than or equal to expected header
length tun->vnet_hdr_sz before copying.
Macvtap functions read the value once, but unless READ_ONCE is used,
the compiler may ignore this and read multiple times. Enforce a single
read and locally cached value to avoid updates between test and use.
Signed-off-by: Willem de Bruijn <willemb@google.com> Suggested-by: Eric Dumazet <edumazet@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When IFF_VNET_HDR is enabled, a virtio_net header must precede data.
Data length is verified to be greater than or equal to expected header
length tun->vnet_hdr_sz before copying.
Read this value once and cache locally, as it can be updated between
the test and use (TOCTOU).
Signed-off-by: Willem de Bruijn <willemb@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> CC: Eric Dumazet <edumazet@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>