The current semaphore code allows a potential use after free: in
exit_sem we may free the task's sem_undo_list while there is still
another task looping through the same semaphore set and cleaning the
sem_undo list at freeary function (the task called IPC_RMID for the same
semaphore set).
For example, with a test program [1] running which keeps forking a lot
of processes (which then do a semop call with SEM_UNDO flag), and with
the parent right after removing the semaphore set with IPC_RMID, and a
kernel built with CONFIG_SLAB, CONFIG_SLAB_DEBUG and
CONFIG_DEBUG_SPINLOCK, you can easily see something like the following
in the kernel log:
I wasn't able to trigger any badness on a recent kernel without the
proper config debugs enabled, however I have softlockup reports on some
kernel versions, in the semaphore code, which are similar as above (the
scenario is seen on some servers running IBM DB2 which uses semaphore
syscalls).
The patch here fixes the race against freeary, by acquiring or waiting
on the sem_undo_list lock as necessary (exit_sem can race with freeary,
while freeary sets un->semid to -1 and removes the same sem_undo from
list_proc or when it removes the last sem_undo).
After the patch I'm unable to reproduce the problem using the test case
[1].
void create_set()
{
int i, j;
pid_t p;
union {
int val;
struct semid_ds *buf;
unsigned short int *array;
struct seminfo *__buf;
} un;
/* Create and initialize semaphore set */
for (i = 0; i < NSET; i++) {
sid[i] = semget(IPC_PRIVATE , NSEM, 0644 | IPC_CREAT);
if (sid[i] < 0) {
perror("semget");
exit(EXIT_FAILURE);
}
}
un.val = 0;
for (i = 0; i < NSET; i++) {
for (j = 0; j < NSEM; j++) {
if (semctl(sid[i], j, SETVAL, un) < 0)
perror("semctl");
}
}
/* Launch threads that operate on semaphore set */
for (i = 0; i < NSEM * NSET * NSET; i++) {
p = fork();
if (p < 0)
perror("fork");
if (p == 0)
thread();
}
/* Free semaphore set */
for (i = 0; i < NSET; i++) {
if (semctl(sid[i], NSEM, IPC_RMID))
perror("IPC_RMID");
}
/* Wait for forked processes to exit */
while (wait(NULL)) {
if (errno == ECHILD)
break;
};
}
int main(int argc, char **argv)
{
pid_t p;
srand(time(NULL));
while (1) {
p = fork();
if (p < 0) {
perror("fork");
exit(EXIT_FAILURE);
}
if (p == 0) {
create_set();
goto end;
}
/* Wait for forked processes to exit */
while (wait(NULL)) {
if (errno == ECHILD)
break;
};
}
end:
return 0;
}
[akpm@linux-foundation.org: use normal comment layout] Signed-off-by: Herton R. Krzesinski <herton@redhat.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Rafael Aquini <aquini@redhat.com> CC: Aristeu Rozanski <aris@redhat.com> Cc: David Jeffery <djeffery@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
Since fc_fcp_cleanup_cmd() can sleep this function must not
be called while holding a spinlock. This patch avoids that
fc_fcp_cleanup_each_cmd() triggers the following bug:
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Vasu Dev <vasu.dev@intel.com> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
In case of hw iscsi offload, an host can have N-number of active
connections. There can be IO's running on some connections which
make host->host_busy always TRUE. Now if logout from a connection
is tried then the code gets into an infinite loop as host->host_busy
is always TRUE.
iscsi_conn_teardown(....)
{
.........
/*
* Block until all in-progress commands for this connection
* time out or fail.
*/
for (;;) {
spin_lock_irqsave(session->host->host_lock, flags);
if (!atomic_read(&session->host->host_busy)) { /* OK for ERL == 0 */
spin_unlock_irqrestore(session->host->host_lock, flags);
break;
}
spin_unlock_irqrestore(session->host->host_lock, flags);
msleep_interruptible(500);
iscsi_conn_printk(KERN_INFO, conn, "iscsi conn_destroy(): "
"host_busy %d host_failed %d\n",
atomic_read(&session->host->host_busy),
session->host->host_failed);
................
...............
}
}
This is not an issue with software-iscsi/iser as each cxn is a separate
host.
Fix:
Acquiring eh_mutex in iscsi_conn_teardown() before setting
session->state = ISCSI_STATE_TERMINATE.
Signed-off-by: John Soni Jose <sony.john@avagotech.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
[lizf: Backported to 3.4: adjust context] Signed-of-by: Zefan Li <lizefan@huawei.com>
When using nested btrees, the top leaves of the top levels contain
block addresses for the root of the next tree down. If we shadow a
shared leaf node the leaf values (sub tree roots) should be incremented
accordingly.
This is only an issue if there is metadata sharing in the top levels.
Which only occurs if metadata snapshots are being used (as is possible
with dm-thinp). And could result in a block from the thinp metadata
snap being reused early, thus corrupting the thinp metadata snap.
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
[lizf: Backported to 3.4:
- drop const
- drop changes to remove_one()] Signed-off-by: Zefan Li <lizefan@huawei.com>
In kbuild it is allowed to define objects in files named "Makefile"
and "Kbuild".
Currently localmodconfig reads objects only from "Makefile"s and misses
modules like nouveau.
Link: http://lkml.kernel.org/r/1437948415-16290-1-git-send-email-richard@nod.at Reported-and-tested-by: Leonidas Spyropoulos <artafinde@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
The "BUG_ON(list_empty(&osb->blocked_lock_list))" in
ocfs2_downconvert_thread_do_work can be triggered in the following case:
ocfs2dc has firstly saved osb->blocked_lock_count to local varibale
processed, and then processes the dentry lockres. During the dentry
put, it calls iput and then deletes rw, inode and open lockres from
blocked list in ocfs2_mark_lockres_freeing. And this causes the
variable `processed' to not reflect the number of blocked lockres to be
processed, which triggers the BUG.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
On MIPS the GLOBAL bit of the PTE must have the same value in any
aligned pair of PTEs. These pairs of PTEs are referred to as
"buddies". In a SMP system is is possible for two CPUs to be calling
set_pte() on adjacent PTEs at the same time. There is a race between
setting the PTE and a different CPU setting the GLOBAL bit in its
buddy PTE.
This race can be observed when multiple CPUs are executing
vmap()/vfree() at the same time.
Make setting the buddy PTE's GLOBAL bit an atomic operation to close
the race condition.
The case of CONFIG_64BIT_PHYS_ADDR && CONFIG_CPU_MIPS32 is *not*
handled.
Signed-off-by: David Daney <david.daney@cavium.com> Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10835/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
Vince reported that the fasync signal stuff doesn't work proper for
inherited events. So fix that.
Installing fasync allocates memory and sets filp->f_flags |= FASYNC,
which upon the demise of the file descriptor ensures the allocation is
freed and state is updated.
Now for perf, we can have the events stick around for a while after the
original FD is dead because of references from child events. So we
cannot copy the fasync pointer around. We can however consistently use
the parent's fasync, as that will be updated.
Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho deMelo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1434011521.1495.71.camel@twins Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
"len" is a signed integer. We check that len is not negative, so it
goes from zero to INT_MAX. PAGE_SIZE is unsigned long so the comparison
is type promoted to unsigned long. ULONG_MAX - 4095 is a higher than
INT_MAX so the condition can never be true.
I don't know if this is harmful but it seems safe to limit "len" to
INT_MAX - 4095.
Fixes: a8c879a7ee98 ('RDS: Info and stats') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Zefan Li <lizefan@huawei.com>
We need to check that a TRB is part of the current segment
before calculating its DMA address.
Previously a ring segment didn't use a full memory page, and every
new ring segment got a new memory page, so the off by one
error in checking the upper bound was never seen.
Now that we use a full memory page, 256 TRBs (4096 bytes), the off by one
didn't catch the case when a TRB was the first element of the next segment.
This is triggered if the virtual memory pages for a ring segment are
next to each in increasing order where the ring buffer wraps around and
causes errors like:
[ 106.398223] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[ 106.398230] xhci_hcd 0000:00:14.0: Looking for event-dma fffd3000 trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 seg-end fffd4ff0
The trb-end address is one outside the end-seg address.
raid1_end_read_request() assumes that the In_sync bits are consistent
with the ->degaded count.
raid1_spare_active updates the In_sync bit before the ->degraded count
and so exposes an inconsistency, as does error()
So extend the spinlock in raid1_spare_active() and error() to hide those
inconsistencies.
This should probably be part of
Commit: 34cab6f42003 ("md/raid1: fix test for 'was read error from
last working device'.")
as it addresses the same issue. It fixes the same bug and should go
to -stable for same reasons.
Fixes: 76073054c95b ("md/raid1: clean up read_balance.") Signed-off-by: NeilBrown <neilb@suse.com>
[lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
modify_ldt() has questionable locking and does not synchronize
threads. Improve it: redesign the locking and synchronize all
threads' LDTs using an IPI on all modifications.
This will dramatically slow down modify_ldt in multithreaded
programs, but there shouldn't be any multithreaded programs that
care about modify_ldt's performance in the first place.
This fixes some fallout from the CVE-2015-5157 fixes.
Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: security@kernel.org <security@kernel.org> Cc: xen-devel <xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/4c6978476782160600471bd865b318db34c7b628.1438291540.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2:
- Adjust context
- Drop comment changes in switch_mm()
- Drop changes to get_segment_base() in arch/x86/kernel/cpu/perf_event.c
- Open-code lockless_dereference(), smp_store_release(), on_each_cpu_mask()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
The update_va_mapping hypercall can fail if the VA isn't present
in the guest's page tables. Under certain loads, this can
result in an OOPS when the target address is in unpopulated vmap
space.
While we're at it, add comments to help explain what's going on.
This isn't a great long-term fix. This code should probably be
changed to use something like set_memory_ro.
Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Vrabel <dvrabel@cantab.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: security@kernel.org <security@kernel.org> Cc: xen-devel <xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
Make sure all non-READ SCSI commands get targ_xfer_tag initialized
to 0xffffffff, not just WRITEs.
Double-free of a TUR cmd object occurs under the following scenario:
1. TUR received (targ_xfer_tag is uninitialized and left at 0)
2. TUR status sent
3. First unsolicited NOPIN is sent to initiator (gets targ_xfer_tag of 0)
4. NOPOUT for NOPIN (with TTT=0) arrives
- its ExpStatSN acks TUR status, TUR is queued for removal
- LIO tries to find NOPIN with TTT=0, but finds the same TUR instead,
TUR is queued for removal for the 2nd time
(Drop unbalanced conditional bracket usage - nab)
Signed-off-by: Alexei Potashnik <alexei@purestorage.com> Signed-off-by: Spencer Baugh <sbaugh@catern.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[lizf: Backported to 3.4:
- adjust context
- leave the braces as it is] Signed-off-by: Zefan Li <lizefan@huawei.com>
Add support for the Sierra Wireless AR8550 device with
USB descriptor 0x1199, 0x68AB.
It is common with MC879x modules 1199:683c/683d which
also are composite devices with 7 interfaces (0..6)
and also MDM62xx based as the AR8550.
The major difference are only the interface attributes
02/02/01 on interfaces 3 and 4 on the AR8550. They are
vendor specific ff/ff/ff on MC879x modules.
lsusb reports:
Bus 001 Device 004: ID 1199:68ab Sierra Wireless, Inc.
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 2.00
bDeviceClass 0 (Defined at Interface level)
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 64
idVendor 0x1199 Sierra Wireless, Inc.
idProduct 0x68ab
bcdDevice 0.06
iManufacturer 3 Sierra Wireless, Incorporated
iProduct 2 AR8550
iSerial 0
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 198
bNumInterfaces 7
bConfigurationValue 1
iConfiguration 1 Sierra Configuration
bmAttributes 0xe0
Self Powered
Remote Wakeup
MaxPower 0mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 255 Vendor Specific Subclass
bInterfaceProtocol 255 Vendor Specific Protocol
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x01 EP 1 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 1
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 255 Vendor Specific Subclass
bInterfaceProtocol 255 Vendor Specific Protocol
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x82 EP 2 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x02 EP 2 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 2
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 255 Vendor Specific Subclass
bInterfaceProtocol 255 Vendor Specific Protocol
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x83 EP 3 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x03 EP 3 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 3
bAlternateSetting 0
bNumEndpoints 3
bInterfaceClass 2 Communications
bInterfaceSubClass 2 Abstract (modem)
bInterfaceProtocol 1 AT-commands (v.25ter)
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x84 EP 4 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 5
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x85 EP 5 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x04 EP 4 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 4
bAlternateSetting 0
bNumEndpoints 3
bInterfaceClass 2 Communications
bInterfaceSubClass 2 Abstract (modem)
bInterfaceProtocol 1 AT-commands (v.25ter)
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x86 EP 6 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 5
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x87 EP 7 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x05 EP 5 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 5
bAlternateSetting 0
bNumEndpoints 3
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 255 Vendor Specific Subclass
bInterfaceProtocol 255 Vendor Specific Protocol
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x88 EP 8 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 5
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x89 EP 9 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x06 EP 6 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 6
bAlternateSetting 0
bNumEndpoints 3
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 255 Vendor Specific Subclass
bInterfaceProtocol 255 Vendor Specific Protocol
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x8a EP 10 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 5
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x8b EP 11 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x07 EP 7 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Device Qualifier (for other device speed):
bLength 10
bDescriptorType 6
bcdUSB 2.00
bDeviceClass 0 (Defined at Interface level)
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 64
bNumConfigurations 1
Device Status: 0x0001
Self Powered
Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com> Cc: Lars Melin <larsm17@gmail.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
While reviewing vhost log code, I found out that log_file is never
set. Note: I haven't tested the change (QEMU doesn't use LOG_FD yet).
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
Fixes: a3138df9 ("[NIU]: Add Sun Neptune ethernet driver.") Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Zefan Li <lizefan@huawei.com>
This patch fixes a use-after-free bug in iscsit_release_sessions_for_tpg()
where se_portal_group->session_lock was incorrectly released/re-acquired
while walking the active se_portal_group->tpg_sess_list.
The can result in a NULL pointer dereference when iscsit_close_session()
shutdown happens in the normal path asynchronously to this code, causing
a bogus dereference of an already freed list entry to occur.
To address this bug, walk the session list checking for the same state
as before, but move entries to a local list to avoid dropping the lock
while walking the active list.
As before, signal using iscsi_session->session_restatement=1 for those
list entries to be released locally by iscsit_free_session() code.
Reported-by: Sunilkumar Nadumuttlu <sjn@datera.io> Cc: Sunilkumar Nadumuttlu <sjn@datera.io> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
This patch removes a bogus BUG_ON in the ablkcipher path that
triggers when the destination buffer is different from the source
buffer and is scattered.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Zefan Li <lizefan@huawei.com>
When zones were originally introduced, the expectation functions were
all extended to perform lookup using the zone. However, insertion was
not modified to check the zone. This means that two expectations which
are intended to apply for different connections that have the same tuple
but exist in different zones cannot both be tracked.
Fixes: 5d0aa2ccd4 (netfilter: nf_conntrack: add support for "conntrack zones") Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
When we get a read error from the last working device, we don't
try to repair it, and don't fail the device. We simple report a
read error to the caller.
However the current test for 'is this the last working device' is
wrong.
When there is only one fully working device, it assumes that a
non-faulty device is that device. However a spare which is rebuilding
would be non-faulty but so not the only working device.
So change the test from "!Faulty" to "In_sync". If ->degraded says
there is only one fully working device and this device is in_sync,
this must be the one.
This bug has existed since we allowed read_balance to read from
a recovering spare in v3.0
Reported-and-tested-by: Alexander Lyakas <alex.bolshoy@gmail.com> Fixes: 76073054c95b ("md/raid1: clean up read_balance.") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
This patch fixes a problem in the usbtouchscreen driver for DMC TSC-30
touch screen. Due to a missing delay between the RESET and SET_RATE
commands, the touch screen may become unresponsive during system startup or
driver loading.
According to the DMC documentation, a delay is needed after the RESET
command to allow the chip to complete its internal initialization. As this
delay is not guaranteed, we had a system where the touch screen
occasionally did not send any touch data. There was no other indication of
the problem.
The patch fixes the problem by adding a 150ms delay between the RESET and
SET_RATE commands.
Suggested-by: Jakob Mustafa <jakob.mustafa@bytecmed.com> Signed-off-by: Bernhard Bender <bernhard.bender@bytecmed.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
This device automatically switches itself to another mode (0x1405)
unless the specific access pattern of Windows is followed in its
initial mode. That makes a dirty unmount of the internal storage
devices inevitable if they are mounted. So the card reader of
such a device should be ignored, lest an unclean removal become
inevitable.
This replaces an earlier patch that ignored all LUNs of this device.
That patch was overly broad.
Signed-off-by: Oliver Neukum <oneukum@suse.com> Reviewed-by: Lars Melin <larsm17@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
When the link is just waken, it's in Resume state, and driver sets PLS to
U0. This refers to Phase 1. Phase 2 refers to when the link has completed
the transition from Resume state to U0.
With the fix of xhci: report U3 when link is in resume state, it also
exposes an issue that usb3 roothub and controller can suspend right
after phase 1, and this causes a hard hang in controller.
To fix the issue, we need to prevent usb3 bus suspend if any port is
resuming in phase 1.
[merge separate USB2 and USB3 port resume checking to one -Mathias] Signed-off-by: Zhuang Jin Can <jin.can.zhuang@intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
xhci_hub_report_usb3_link_state() returns pls as U0 when the link
is in resume state, and this causes usb core to think the link is in
U0 while actually it's in resume state. When usb core transfers
control request on the link, it fails with TRB error as the link
is not ready for transfer.
To fix the issue, report U3 when the link is in resume state, thus
usb core knows the link it's not ready for transfer.
Signed-off-by: Zhuang Jin Can <jin.can.zhuang@intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
When resetting a device the number of active TTs may need to be
corrected by xhci_update_tt_active_eps, but the number of old active
endpoints supplied to it was always zero, so the number of TTs and the
bandwidth reserved for them was not updated, and could rise
unnecessarily.
This affected systems using Intel's Patherpoint chipset, which rely on
software bandwidth checking. For example, a Lenovo X230 would lose the
ability to use ports on the docking station after enough suspend/resume
cycles because the bandwidth calculated would rise with every cycle when
a suitable device is attached.
The correct number of active endpoints is calculated in the same way as
in xhci_reserve_bandwidth.
Signed-off-by: Brian Campbell <bacam@z273.org.uk> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
virt_dev->num_cached_rings counts on freed ring and is not updated
correctly. In xhci_free_or_cache_endpoint_ring() function, the free ring
is added into cache and then num_rings_cache is incremented as below:
virt_dev->ring_cache[rings_cached] =
virt_dev->eps[ep_index].ring;
virt_dev->num_rings_cached++;
here, free ring pointer is added to a current index and then
index is incremented.
So current index always points to empty location in the ring cache.
For getting available free ring, current index should be decremented
first and then corresponding ring buffer value should be taken from ring
cache.
But In function xhci_endpoint_init(), the num_rings_cached index is
accessed before decrement.
virt_dev->eps[ep_index].new_ring =
virt_dev->ring_cache[virt_dev->num_rings_cached];
virt_dev->ring_cache[virt_dev->num_rings_cached] = NULL;
virt_dev->num_rings_cached--;
This is bug in manipulating the index of ring cache.
And it should be as below:
virt_dev->num_rings_cached--;
virt_dev->eps[ep_index].new_ring =
virt_dev->ring_cache[virt_dev->num_rings_cached];
virt_dev->ring_cache[virt_dev->num_rings_cached] = NULL;
Signed-off-by: Aman Deep <aman.deep@samsung.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
This fixes an issue introduced in commit b23c843992b6 (usb: dwc3:
gadget: fix DEPSTARTCFG for non-EP0 EPs) that made sure we would
only use DEPSTARTCFG once per SetConfig.
The trick is that we should use one DEPSTARTCFG per SetConfig *OR*
SetInterface. SetInterface was completely missed from the original
patch.
This problem became aparent after commit 76e838c9f776 (usb: dwc3:
gadget: return error if command sent to DEPCMD register fails)
added checking of the return status of device endpoint commands.
'Set Endpoint Transfer Resource' command was caught failing
occasionally. This is because the Transfer Resource
Index was not getting reset during a SET_INTERFACE request.
Finally, to fix the issue, was we have to do is make sure that
our start_config_issued flag gets reset whenever we receive a
SetInterface request.
To verify the problem (and its fix), all we have to do is run
test 9 from testusb with 'testusb -t 9 -s 2048 -a -c 5000'.
Tested-by: Huang Rui <ray.huang@amd.com> Tested-by: Subbaraya Sundeep Bhatta <subbaraya.sundeep.bhatta@xilinx.com> Fixes: b23c843992b6 (usb: dwc3: gadget: fix DEPSTARTCFG for non-EP0 EPs) Signed-off-by: John Youn <johnyoun@synopsys.com> Signed-off-by: Felipe Balbi <balbi@ti.com>
[lizf: Backported to 3.4: use dev_vdbg() instead of dwc3_trace()] Signed-off-by: Zefan Li <lizefan@huawei.com>
md_stop() would stop an array, but not free various attached
data structures.
For internal arrays, these are freed later in do_md_stop() or
mddev_put(), but they don't apply for dm-raid arrays.
So get md_stop() to free them, and only all it from dm-raid.
For internal arrays we now call __md_stop.
Reported-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Zefan Li <lizefan@huawei.com>
When ip_frag_queue() computes positions, it assumes that the passed
sk_buff does not contain L2 headers.
However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly
functions can be called on outgoing packets that contain L2 headers.
Also, IPv4 checksum is not corrected after reassembly.
Fixes: 7736d33f4262 ("packet: Add pre-defragmentation support for ipv4 fanouts.") Signed-off-by: Edward Hyunkoo Jee <edjee@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
If we don't do this, and we then fail to recreate the debugfs
directory during a mode change, then we will fail later trying
to add stations to this now bogus directory:
Signed-off-by: Tom Hughes <tom@compton.nu> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
If a valid power regulator or a dummy regulator is used (which
happens to be the case when no regulator is specified), restart_work
is queued no matter whether the device was running or not at suspend
time. Since work queues get initialized in the ndo_open callback,
resuming leads to a NULL pointer exception.
Reverse exactly the steps executed at suspend time:
- Enable the power regulator in any case
- Enable the transceiver regulator if the device was running, even in
case we have a power regulator
- Queue restart_work only in case the device was running
Fixes: bf66f3736a94 ("can: mcp251x: Move to threaded interrupts instead of workqueues.") Signed-off-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
[lizf: Backported to 3.4:
- adjust filename
- adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
Shared skbs must not be modified and this is crucial for broadcast
and/or multicast paths where we use it as an optimisation to avoid
unnecessary cloning.
The function skb_recv_datagram breaks this rule by setting peeked
without cloning the skb first. This causes funky races which leads
to double-free.
This patch fixes this by cloning the skb and replacing the skb
in the list when setting skb->peeked.
Fixes: a59322be07c9 ("[UDP]: Only increment counter on first peek/recv") Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
I have a ST4000DM000 disk. If Linux is booted while the disk is spun down,
the command that sets transfer mode causes the disk to spin up. The
spin-up takes longer than the default 5s timeout, so the command fails and
timeout is reported.
Fix this by increasing the timeout to 15s, which is enough for the disk to
spin up.
Some devices lose data on TRIM whether queued or not. This patch adds
a horkage to disable TRIM.
tj: Collapsed unnecessary if() nesting.
Signed-off-by: Arne Fitzenreiter <arne_f@ipfire.org> Signed-off-by: Tejun Heo <tj@kernel.org>
[lizf: Backported to 3.4:
- adjust context
- drop changes to show_ata_dev_trim()] Signed-off-by: Zefan Li <lizefan@huawei.com>
This commit adds the necessary quirk to make the Marvell 4140 SATA PMP
work properly. This PMP doesn't like SRST on port number 4 (the host
port) so this commit marks this port as not supporting SRST.
Signed-off-by: Lior Amsalem <alior@marvell.com> Reviewed-by: Nadav Haklai <nadavh@marvell.com> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
Analysis reveals the cause of the crash to be due to STp->device
being NULL. The pointer was NULLed via scsi_tape_put(STp) when it
calls scsi_tape_release(). In st_open() we jump to err_out after
scsi_block_when_processing_errors() completes and returns the
device as offline (sdev_state was SDEV_DEL):
1180 /* Open the device. Needs to take the BKL only because of incrementing the SCSI host
1181 module count. */
1182 static int st_open(struct inode *inode, struct file *filp)
1183 {
1184 int i, retval = (-EIO);
1185 int resumed = 0;
1186 struct scsi_tape *STp;
1187 struct st_partstat *STps;
1188 int dev = TAPE_NR(inode);
1189 char *name;
...
1217 if (scsi_autopm_get_device(STp->device) < 0) {
1218 retval = -EIO;
1219 goto err_out;
1220 }
1221 resumed = 1;
1222 if (!scsi_block_when_processing_errors(STp->device)) {
1223 retval = (-ENXIO);
1224 goto err_out;
1225 }
...
1264 err_out:
1265 normalize_buffer(STp->buffer);
1266 spin_lock(&st_use_lock);
1267 STp->in_use = 0;
1268 spin_unlock(&st_use_lock);
1269 scsi_tape_put(STp); <-- STp->device = 0 after this
1270 if (resumed)
1271 scsi_autopm_put_device(STp->device);
1272 return retval;
The ref count for the struct scsi_tape had already been reduced
to 1 when the .remove method of the st module had been called.
The kref_put() in scsi_tape_put() caused scsi_tape_release()
to be called:
Although the problem was reported on SLES11.3 the problem appears
in linux-next as well.
The crash is fixed by reordering the code so we no longer access
the struct scsi_tape after the kref_put() is done on it in st_open().
Signed-off-by: Shane Seymour <shane.seymour@hp.com> Signed-off-by: Darren Lavender <darren.lavender@hp.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.com> Acked-by: Kai Mäkisara <kai.makisara@kolumbus.fi> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
Fixes: 3e0249f9c05c ("RDS/IB: add refcount tracking to struct rds_ib_device")
There lacks a dropping on rds_ib_device.refcount in case rds_ib_alloc_fmr
failed(mr pool running out). this lead to the refcount overflow.
A complain in line 117(see following) is seen. From vmcore:
s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
That is the evidence the mr pool is used up. so rds_ib_alloc_fmr is very likely
to return ERR_PTR(-EAGAIN).
The sfpc inline assembly within execve_tail() may incorrectly set bits
28-31 of the sfpc instruction to a value which is not zero.
These bits however are currently unused and therefore should be zero
so we won't get surprised if these bits will be used in the future.
Therefore remove the second operand from the inline assembly.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
Incoming packet should be either in backlog queue or
in RCU read-side section. Otherwise, the final sequence of
flush_backlog() and synchronize_net() may miss packets
that can run without device reference:
CPU 1 CPU 2
skb->dev: no reference
process_backlog:__skb_dequeue
process_backlog:local_irq_enable
on_each_cpu for
flush_backlog => IPI(hardirq): flush_backlog
- packet not found in backlog
CPU delayed ...
synchronize_net
- no ongoing RCU
read-side sections
netdev_run_todo,
rcu_barrier: no
ongoing callbacks
__netif_receive_skb_core:rcu_read_lock
- too late
free dev
process packet for freed dev
Fixes: 6e583ce5242f ("net: eliminate refcounting in backlog queue") Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4:
- adjust context
- no need to change "goto unlock" to "goto out"] Signed-off-by: Zefan Li <lizefan@huawei.com>
commit 381c759d9916 ("ipv4: Avoid crashing in ip_error")
fixes a problem where processed packet comes from device
with destroyed inetdev (dev->ip_ptr). This is not expected
because inetdev_destroy is called in NETDEV_UNREGISTER
phase and packets should not be processed after
dev_close_many() and synchronize_net(). Above fix is still
required because inetdev_destroy can be called for other
reasons. But it shows the real problem: backlog can keep
packets for long time and they do not hold reference to
device. Such packets are then delivered to upper levels
at the same time when device is unregistered.
Calling flush_backlog after NETDEV_UNREGISTER_FINAL still
accounts all packets from backlog but before that some packets
continue to be delivered to upper levels long after the
synchronize_net call which is supposed to wait the last
ones. Also, as Eric pointed out, processed packets, mostly
from other devices, can continue to add new packets to backlog.
Fix the problem by moving flush_backlog early, after the
device driver is stopped and before the synchronize_net() call.
Then use netif_running check to make sure we do not add more
packets to backlog. We have to do it in enqueue_to_backlog
context when the local IRQ is disabled. As result, after the
flush_backlog and synchronize_net sequence all packets
should be accounted.
Thanks to Eric W. Biederman for the test script and his
valuable feedback!
Reported-by: Vittorio Gambaletta <linuxbugs@vittgam.net> Fixes: 6e583ce5242f ("net: eliminate refcounting in backlog queue") Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
Jason Gunthorpe reported that since commit c02db8c6290b ("rtnetlink: make
SR-IOV VF interface symmetric"), we don't verify IFLA_VF_INFO attributes
anymore with respect to their policy, that is, ifla_vfinfo_policy[].
Before, they were part of ifla_policy[], but they have been nested since
placed under IFLA_VFINFO_LIST, that contains the attribute IFLA_VF_INFO,
which is another nested attribute for the actual VF attributes such as
IFLA_VF_MAC, IFLA_VF_VLAN, etc.
Despite the policy being split out from ifla_policy[] in this commit,
it's never applied anywhere. nla_for_each_nested() only does basic nla_ok()
testing for struct nlattr, but it doesn't know about the data context and
their requirements.
Fix, on top of Jason's initial work, does 1) parsing of the attributes
with the right policy, and 2) using the resulting parsed attribute table
from 1) instead of the nla_for_each_nested() loop (just like we used to
do when still part of ifla_policy[]).
Reference: http://thread.gmane.org/gmane.linux.network/368913 Fixes: c02db8c6290b ("rtnetlink: make SR-IOV VF interface symmetric") Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Cc: Greg Rose <gregory.v.rose@intel.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Rony Efraim <ronye@mellanox.com> Cc: Vlad Zolotarov <vladz@cloudius-systems.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Vlad Zolotarov <vladz@cloudius-systems.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Drop unsupported attributes
- Use ndo_set_vf_tx_rate operation, not ndo_set_vf_rate] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Zefan Li <lizefan@huawei.com>
If the kernel is compiled with gcc 5.1 and the XZ compression option
the decompress_kernel function calls _sclp_print_early in 64-bit mode
while the content of the upper register half of %r6 is non-zero.
This causes a specification exception on the servc instruction in
_sclp_servc.
The _sclp_print_early function saves and restores the upper registers
halves but it fails to clear them for the 31-bit code of the mini sclp
driver.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
Allocate memory using GFP_NOIO when deleting a btree. dm_btree_del()
can be called via an ioctl and we don't want to recurse into the FS or
block layer.
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Claudio Cappelli <claudio.cappelli.linux@gmail.com> Suggested-by: Lars Melin <larsm17@gmail.com>
[johan: amend commit message with devices info ] Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
Given the pool's cell_sort_array holds 8192 pointers it triggers an
order 5 allocation via kmalloc. This order 5 allocation is prone to
failure as system memory gets more fragmented over time.
Fix this by allocating the cell_sort_array using vmalloc.
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
[lizf: Backported 3.4: it's prinson_{create,destroy}() that need fixing] Signed-off-by: Zefan Li <lizefan@huawei.com>
redistribute3() shares entries out across 3 nodes. Some entries were
being moved the wrong way, breaking the ordering. This manifested as a
BUG() in dm-btree-remove.c:shift() when entries were removed from the
btree.
For additional context see:
https://www.redhat.com/archives/dm-devel/2015-May/msg00113.html
Signed-off-by: Dennis Yang <shinrairis@gmail.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
The Steinberg MI2 and MI4 interfaces are compatible with the USB class
audio spec, but the MIDI part of the devices is reported as a vendor
specific interface.
This patch adds entries to quirks-table.h to recognize the MIDI
endpoints. Audio functionality was already working and is unaffected by
this change.
Signed-off-by: Dominic Sacré <dominic.sacre@gmx.de> Signed-off-by: Albert Huitsing <albert@huitsing.nl> Acked-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Zefan Li <lizefan@huawei.com>
Removing unnecessary static buffers is good.
Use the vsprintf %pV extension instead.
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Mikulas Patocka <mikulas@twibright.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Mikulas:
- The bug corrected by the patch is - if hpfs_error is called concurrently
on multiple filesystems, it could corrupt the string because the text
buffer is shared. That's why I marked the patch for stable.]
[lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
Reading page fault handler code I've noticed that under right
circumstances kernel would map anonymous pages into file mappings: if
the VMA doesn't have vm_ops->fault() and the VMA wasn't fully populated
on ->mmap(), kernel would handle page fault to not populated pte with
do_anonymous_page().
Let's change page fault handler to use do_anonymous_page() only on
anonymous VMA (->vm_ops == NULL) and make sure that the VMA is not
shared.
For file mappings without vm_ops->fault() or shred VMA without vm_ops,
page fault on pte_none() entry would lead to SIGBUS.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
ext4_free_blocks is looping around the allocation request and mimics
__GFP_NOFAIL behavior without any allocation fallback strategy. Let's
remove the open coded loop and replace it with __GFP_NOFAIL. Without the
flag the allocator has no way to find out never-fail requirement and
cannot help in any way.
Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Zefan Li <lizefan@huawei.com>
sb_getblk() is used during ext4 (and possibly other FSes) writeback
paths. Sometimes such path require allocating memory and guaranteeing
that such allocation won't block. Currently, however, there is no way
to provide user flags for sb_getblk which could lead to deadlocks.
This patch implements a sb_getblk_gfp with the only difference it can
accept user-provided GFP flags.
Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Zefan Li <lizefan@huawei.com>
A buffer cache is allocated from movable area because it is referred
for a while and released soon. But some filesystems are taking buffer
cache for a long time and it can disturb page migration.
New APIs are introduced to allocate buffer cache with user specific
flag. *_gfp APIs are for user want to set page allocation flag for
page cache allocation. And *_unmovable APIs are for the user wants to
allocate page cache from non-movable area.
Signed-off-by: Gioh Kim <gioh.kim@lge.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
[lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
The free space entries are allocated using kmem_cache_zalloc(),
through __btrfs_add_free_space(), therefore we should use
kmem_cache_free() and not kfree() to avoid any confusion and
any potential problem. Looking at the kfree() definition at
mm/slab.c it has the following comment:
/*
* (...)
*
* Don't free memory not originally allocated by kmalloc()
* or you will run into trouble.
*/
So better be safe and use kmem_cache_free().
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
A test case is as the description says:
open(foobar, O_WRONLY);
sleep() --> reboot the server
close(foobar)
The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
line before going to restart, there is
clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
out state and when we go to close it, “call_close” doesn’t get set as
state flag is not set and CLOSE doesn’t go on the wire.
Signed-off-by: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
In rare cases a directory can be renamed out from under a bind mount.
In those cases without special handling it becomes possible to walk up
the directory tree to the root dentry of the filesystem and down
from the root dentry to every other file or directory on the filesystem.
Like division by zero .. from an unconnected path can not be given
a useful semantic as there is no predicting at which path component
the code will realize it is unconnected. We certainly can not match
the current behavior as the current behavior is a security hole.
Therefore when encounting .. when following an unconnected path
return -ENOENT.
- Add a function path_connected to verify path->dentry is reachable
from path->mnt.mnt_root. AKA to validate that rename did not do
something nasty to the bind mount.
To avoid races path_connected must be called after following a path
component to it's next path component.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
A rename can result in a dentry that by walking up d_parent
will never reach it's mnt_root. For lack of a better term
I call this an escaped path.
prepend_path is called by four different functions __d_path,
d_absolute_path, d_path, and getcwd.
__d_path only wants to see paths are connected to the root it passes
in. So __d_path needs prepend_path to return an error.
d_absolute_path similarly wants to see paths that are connected to
some root. Escaped paths are not connected to any mnt_root so
d_absolute_path needs prepend_path to return an error greater
than 1. So escaped paths will be treated like paths on lazily
unmounted mounts.
getcwd needs to prepend "(unreachable)" so getcwd also needs
prepend_path to return an error.
d_path is the interesting hold out. d_path just wants to print
something, and does not care about the weird cases. Which raises
the question what should be printed?
Given that <escaped_path>/<anything> should result in -ENOENT I
believe it is desirable for escaped paths to be printed as empty
paths. As there are not really any meaninful path components when
considered from the perspective of a mount tree.
So tweak prepend_path to return an empty path with an new error
code of 3 when it encounters an escaped path.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Zefan Li <lizefan@huawei.com>
The lkey table is allocated with with a get_user_pages() with an
order based on a number of index bits from a module parameter.
The underlying kernel code cannot allocate that many contiguous pages.
There is no reason the underlying memory needs to be physically
contiguous.
This patch:
- switches the allocation/deallocation to vmalloc/vfree
- caps the number of bits to 23 to insure at least 1 generation bit
o this matches the module parameter description
Reviewed-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
[bwh: Backported to 3.2:
- Adjust context
- Add definition of qib_dev_warn(), added upstream by commit ddb887658970
("IB/qib: Convert opcode counters to per-context")] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Zefan Li <lizefan@huawei.com>
Commit 6f6a6fda2945 "jbd2: fix ocfs2 corrupt when updating journal
superblock fails" changed jbd2_cleanup_journal_tail() to return EIO
when the journal is aborted. That makes logic in
jbd2_log_do_checkpoint() bail out which is fine, except that
jbd2_journal_destroy() expects jbd2_log_do_checkpoint() to always make
a progress in cleaning the journal. Without it jbd2_journal_destroy()
just loops in an infinite loop.
Fix jbd2_journal_destroy() to cleanup journal checkpoint lists of
jbd2_log_do_checkpoint() fails with error.
Reported-by: Eryu Guan <guaneryu@gmail.com> Tested-by: Eryu Guan <guaneryu@gmail.com> Fixes: 6f6a6fda294506dfe0e3e0a253bb2d2923f28f0a Signed-off-by: Jan Kara <jack@suse.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
This fixes an issue introduced in commit b23c843992b6 (usb: dwc3:
gadget: fix DEPSTARTCFG for non-EP0 EPs) that made sure we would
only use DEPSTARTCFG once per SetConfig.
The trick is that we should use one DEPSTARTCFG per SetConfig *OR*
SetInterface. SetInterface was completely missed from the original
patch.
This problem became aparent after commit 76e838c9f776 (usb: dwc3:
gadget: return error if command sent to DEPCMD register fails)
added checking of the return status of device endpoint commands.
'Set Endpoint Transfer Resource' command was caught failing
occasionally. This is because the Transfer Resource
Index was not getting reset during a SET_INTERFACE request.
Finally, to fix the issue, was we have to do is make sure that
our start_config_issued flag gets reset whenever we receive a
SetInterface request.
To verify the problem (and its fix), all we have to do is run
test 9 from testusb with 'testusb -t 9 -s 2048 -a -c 5000'.
Tested-by: Huang Rui <ray.huang@amd.com> Tested-by: Subbaraya Sundeep Bhatta <subbaraya.sundeep.bhatta@xilinx.com> Fixes: b23c843992b6 (usb: dwc3: gadget: fix DEPSTARTCFG for non-EP0 EPs) Signed-off-by: John Youn <johnyoun@synopsys.com> Signed-off-by: Felipe Balbi <balbi@ti.com>
[lizf: Backported to 3.4: use dev_vdbg() instead of dwc3_trace()] Signed-off-by: Zefan Li <lizefan@huawei.com>
We have that bug for years and some users report side effects when fixing it on older hardware.
So revert it for VM_CONTEXT0_PAGE_TABLE_END_ADDR, but keep it for VM 1-15.
Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[lizf: Backported to 3.4: drop the change to clk.c] Signed-off-by: Zefan Li <lizefan@huawei.com>
Fixes: 0aedb1626566 ("drm/i915: Don't skip request retirement if the active list is empty") Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
[lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
If we'd already sent a request and decide to abort it, we *must*
issue TFLUSH properly and not just blindly reuse the tag, or
we'll get seriously screwed when response eventually arrives
and we confuse it for response to later request that had reused
the same tag.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Zefan Li <lizefan@huawei.com>
Writes were a bit racy, but hard to turn into a bug at the same time.
(Particularly because modern Linux doesn't use this feature anymore.)
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[Actually the next patch makes it much, much easier to trigger the race
so I'm including this one for stable@ as well. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
This patch adds a new FACS initialization flag for acpi_tb_initialize().
acpi_enable_subsystem() might be invoked several times in OS bootup process,
and we don't want FACS initialization to be invoked twice. Lv Zheng.
Link: https://github.com/acpica/acpica/commit/90f5332a Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[lizf: Backported to 3.4: adjust filename] Signed-off-by: Zefan Li <lizefan@huawei.com>
fc->release is called from fuse_conn_put() which was used in the error
cleanup before fc->release was initialized.
[Jeremiah Mahler <jmmahler@gmail.com>: assign fc->release after calling
fuse_conn_init(fc) instead of before.]
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Fixes: a325f9b92273 ("fuse: update fuse_conn_init() and separate out fuse_conn_kill()") Signed-off-by: Zefan Li <lizefan@huawei.com>
struct crush_bucket_tree::num_nodes is u8, so ceph_decode_8_safe()
should be used. -Wconversion catches this, but I guess it went
unnoticed in all the noise it spews. The actual problem (at least for
common crushmaps) isn't the u32 -> u8 truncation though - it's the
advancement by 4 bytes instead of 1 in the crushmap buffer.
In needs_ilk_vtd_wa(), we pass in the GPU device but compared it against
the ids for the mobile GPU and the mobile host bridge. That latter is
impossible and so likely was just a typo for the desktop GPU device id
(which is also buggy).
drm/i915: Disable WC PTE updates to w/a buggy IOMMU on ILK
Reported-by: Ting-Wei Lan <lantw44@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91127
References: https://bugzilla.freedesktop.org/show_bug.cgi?id=60391 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
bitmap_parselist("", &mask, nmaskbits) will erroneously set bit zero in
the mask. The same bug is visible in cpumask_parselist() since it is
layered on top of the bitmask code, e.g. if you boot with "isolcpus=",
you will actually end up with cpu zero isolated.
The bug was introduced in commit 4b060420a596 ("bitmap, irq: add
smp_affinity_list interface to /proc/irq") when bitmap_parselist() was
generalized to support userspace as well as kernelspace.
Fixes: 4b060420a596 ("bitmap, irq: add smp_affinity_list interface to /proc/irq") Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
The filter_parse() function can call infix_get_op() which calls
infix_advance() that updates the infix filter pointers for the cnt
and tail without checking if the filter is already at the end, which
will put the cnt to zero and the tail beyond the end. The loop then calls
infix_next() that has
The cnt will now be below zero, and the tail that is returned is
already passed the end of the filter string. So far the allocation
of the filter string usually has some buffer that is zeroed out, but
if the filter string is of the exact size of the allocated buffer
there's no guarantee that the charater after the nul terminating
character will be zero.
Luckily, only root can write to the filter.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
When testing the fix for the trace filter, I could not come up with
a scenario where the operand count goes below zero, so I added a
WARN_ON_ONCE(cnt < 0) to the logic. But there is legitimate case
that it can happen (although the filter would be wrong).
That is, a single operation without any operands will hit the path
where the WARN_ON_ONCE() can trigger. Although this is harmless,
and the filter is reported as a error. But instead of spitting out
a warning to the kernel dmesg, just fail nicely and report it via
the proper channels.
Link: http://lkml.kernel.org/r/558C6082.90608@oracle.com Reported-by: Vince Weaver <vincent.weaver@maine.edu> Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
This commit fix kernel crash when probing for rfkill devices in dell-laptop
driver failed. Function free_page() was incorrectly used on struct page *
instead of virtual address of SMI buffer.
This commit also simplify allocating page for SMI buffer by using
__get_free_page() function instead of sequential call of functions
alloc_page() and page_address().
Signed-off-by: Pali Rohár <pali.rohar@gmail.com> Acked-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
[lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
The kmemleak scanning thread can run for minutes. Callbacks like
kmemleak_free() are allowed during this time, the race being taken care
of by the object->lock spinlock. Such lock also prevents a memory block
from being freed or unmapped while it is being scanned by blocking the
kmemleak_free() -> ... -> __delete_object() function until the lock is
released in scan_object().
When a kmemleak error occurs (e.g. it fails to allocate its metadata),
kmemleak_enabled is set and __delete_object() is no longer called on
freed objects. If kmemleak_scan is running at the same time,
kmemleak_free() no longer waits for the object scanning to complete,
allowing the corresponding memory block to be freed or unmapped (in the
case of vfree()). This leads to kmemleak_scan potentially triggering a
page fault.
This patch separates the kmemleak_free() enabling/disabling from the
overall kmemleak_enabled nob so that we can defer the disabling of the
object freeing tracking until the scanning thread completed. The
kmemleak_free_part() is deliberately ignored by this patch since this is
only called during boot before the scanning thread started.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org> Tested-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
Current implementation of descriptor init procedure only takes
care about setting/clearing ownership flag in "des0"/"des1"
fields while it is perfectly possible to get unexpected bits
set because of the following factors:
[1] On driver probe underlying memory allocated with
dma_alloc_coherent() might not be zeroed and so
it will be filled with garbage.
[2] During driver operation some bits could be set by SD/MMC
controller (for example error flags etc).
And unexpected and/or randomly set flags in "des0"/"des1"
fields may lead to unpredictable behavior of GMAC DMA block.
This change addresses both items above with:
[1] Use of dma_zalloc_coherent() instead of simple
dma_alloc_coherent() to make sure allocated memory is
zeroed. That shouldn't affect performance because
this allocation only happens once on driver probe.
[2] Do explicit zeroing of both "des0" and "des1" fields
of all buffer descriptors during initialization of
DMA transfer.
And while at it fixed identation of dma_free_coherent()
counterpart as well.
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: arc-linux-dev@synopsys.com Cc: linux-kernel@vger.kernel.org Cc: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4:
- adjust contest
- adjust allocations in init_dma_desc_rings()] Signed-off-by: Zefan Li <lizefan@huawei.com>
When a port goes through a link down/up the multicast router configuration
is not restored.
Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: 0909e11758bd ("bridge: Add multicast_router sysfs entries") Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
The omap watchdog has the annoying behaviour that writes to most
registers don't have any effect when the watchdog is already running.
Quoting the AM335x reference manual:
To modify the timer counter value (the WDT_WCRR register),
prescaler ratio (the WDT_WCLR[4:2] PTV bit field), delay
configuration value (the WDT_WDLY[31:0] DLY_VALUE bit field), or
the load value (the WDT_WLDR[31:0] TIMER_LOAD bit field), the
watchdog timer must be disabled by using the start/stop sequence
(the WDT_WSPR register).
Currently the timer is stopped in the .probe callback but still there
are possibilities that yield to a situation where omap_wdt_start is
entered with the timer running (e.g. when /dev/watchdog is closed
without stopping and then reopened). In such a case programming the
timeout silently fails!
To circumvent this stop the timer before reprogramming.
Assuming one of the first things the watchdog user does is setting the
timeout explicitly nothing too bad should happen because this explicit
setting works fine.
ext4 isn't willing to map clusters to a non-extent file. Don't signal
this with an out of space error, since the FS will retry the
allocation (which didn't fail) forever. Instead, return EUCLEAN so
that the operation will fail immediately all the way back to userspace.
(The fix is either to run e2fsck -E bmap2extent, or to chattr +e the file.)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Zefan Li <lizefan@huawei.com>
PACKET_FANOUT_LB computes f->rr_cur such that it is modulo
f->num_members. It returns the old value unconditionally, but
f->num_members may have changed since the last store. Ensure
that the return value is always < num.
When modifying the logic, simplify it further by replacing the loop
with an unconditional atomic increment.
Fixes: dc99f600698d ("packet: Add fanout support.") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4:
- adjust context
- fanout_demux_lb() returns a pointer] Signed-off-by: Zefan Li <lizefan@huawei.com>
We need to tell compiler it must not read f->num_members multiple
times. Otherwise testing if num is not zero is flaky, and we could
attempt an invalid divide by 0 in fanout_demux_cpu()
Note bug was present in packet_rcv_fanout_hash() and
packet_rcv_fanout_lb() but final 3.1 had a simple location
after commit 95ec3eb417115fb ("packet: Add 'cpu' fanout policy.")
Fixes: dc99f600698dc ("packet: Add fanout support.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4: use ACCESS_ONCE() instead of READ_ONCE()] Signed-off-by: Zefan Li <lizefan@huawei.com>