git.ipfire.org Git - thirdparty/kernel/stable.git/log

SUNRPC: xs_reset_transport must mark the connection as disconnected

commit 0c78789e3a030615c6650fde89546cadf40ec2cc upstream.

In case the reconnection attempt fails.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

SUNRPC: Fix a thinko in xs_connect()

commit 99b1a4c32ad22024ac6198a4337aaec5ea23168f upstream.

It is rather pointless to test the value of transport->inet after
calling xs_reset_transport(), since it will always be zero, and
so we will never see any exponential back off behaviour.
Also don't force early connections for SOFTCONN tasks. If the server
disconnects us, we should respect the exponential backoff.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: sunrpc: fix tracepoint Warning: unknown op '->'

commit 051ac3848a94f21cfdec899cc9c65ce7f9f116fa upstream.

`perf stat -e sunrpc:svc_xprt_do_enqueue true` results in

Warning: unknown op '->'
Warning: [sunrpc:svc_xprt_do_enqueue] unknown op '->'

Similar warning for svc_handle_xprt as well.

Actually TP_printk() should never dereference an address saved in the ring
buffer that points somewhere in the kernel. There's no guarantee that that
object still exists (with the exception of static strings).

Therefore change all the arguments for TP_printk(), so that it references
values existing in the ring buffer only.

While doing that, also fix another possible bug when argument xprt could be
NULL and TP_fast_assign() tries to access it's elements.

Signed-off-by: Pratyush Anand <panand@redhat.com>
Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Fixes: 83a712e0afef "sunrpc: add some tracepoints around ..."
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

svcrdma: Change maximum server payload back to RPCSVC_MAXPAYLOAD

commit cc9a903d915c21626b6b2fbf8ed0ff16a7f82210 upstream.

Both commit 0380a3f375 ("svcrdma: Add a separate "max data segs"
macro for svcrdma") and commit 7e5be28827bf ("svcrdma: advertise
the correct max payload") are incorrect. This commit reverts both
changes, restoring the server's maximum payload size to 1MB.

Commit 7e5be28827bf based the server's maximum payload on the
_client's_ RPCRDMA_MAX_DATA_SEGS value. That was wrong.

Commit 0380a3f375 tried to fix this so that the client maximum
payload size could be raised without affecting the server, but
managed to confuse matters more on the server side.

More importantly, limiting the advertised maximum payload size was
meant to be a workaround, not the actual fix. We need to revisit

https://bugzilla.linux-nfs.org/show_bug.cgi?id=270

A Linux client on a platform with 64KB pages can overrun and crash
an x86_64 NFS/RDMA server when the r/wsize is 1MB. An x86/64 Linux
client seems to work fine using 1MB reads and writes when the Linux
server's maximum payload size is restored to 1MB.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=270
Fixes: 0380a3f375 ("svcrdma: Add a separate "max data segs" macro")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "NFSv4: Remove incorrect check in can_open_delegated()"

commit 36319608e28701c07cad80ae3be8b0fdfb1ab40f upstream.

This reverts commit 4e379d36c050b0117b5d10048be63a44f5036115.

This commit opens up a race between the recovery code and the open code.

Reported-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfs: Fix truncated client owner id without proto type

commit 4a70316caef7d158445e672e146eb9f1b8c1aeee upstream.

The length of "Linux NFSv4.0 " is 14, not 10.

Without this patch, I get a truncated client owner id as,
"Linux NFSv4.0 ::1/::1"

With this patch,
"Linux NFSv4.0 ::1/::1 tcp"

Fixes: a319268891 ("nfs: make nfs4_init_nonuniform_client_string use a dynamically allocated buffer")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSv4.1: Fix a protocol issue with CLOSE stateids

commit 4a1e2feb9d246775dee0f78ed5b18826bae2b1c5 upstream.

According to RFC5661 Section 18.2.4, CLOSE is supposed to return
the zero stateid. This means that nfs_clear_open_stateid_locked()
cannot assume that the result stateid will always match the 'other'
field of the existing open stateid when trying to determine a race
with a parallel OPEN.

Instead, we look at the argument, and check for matches.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSv4.1/flexfiles: Fix a protocol error in layoutreturn

commit d13549074cf066d6d5bb29903d044beffea342d3 upstream.

According to the flexfiles protocol, the layoutreturn should specify an
array of errors in the following format:

struct ff_ioerr4 {
offset4        ffie_offset;
length4        ffie_length;
stateid4       ffie_stateid;
device_error4  ffie_errors<>;
};

This patch fixes up the code to ensure that our ffie_errors is indeed
encoded as an array (albeit with only a single entry).

Reported-by: Tom Haynes <thomas.haynes@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFS41/flexfiles: zero out DS write wcc

commit 5420401079e152ff68a8024f6a375804b1c21505 upstream.

We do not want to update inode attributes with DS values.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSv4: Force a post-op attribute update when holding a delegation

commit aaae3f00d3f67f681a1f3cb7af999e976e8a24ce upstream.

If the ctime or mtime or change attribute have changed because
of an operation we initiated, we should make sure that we force
an attribute update. However we do not want to mark the page cache
for revalidation.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFS41/flexfiles: update inode after write finishes

commit 69f230d907e8c1ca3f9bd528993eeb98f712b0dd upstream.

Otherwise we break fstest case tests/read_write/mctime.t

Does files layout need the same fix as well?

Cc: Anna Schumaker <anna.schumaker@netapp.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFS: nfs_set_pgio_error sometimes misses errors

commit e9ae58aeee8842a50f7e199d602a5ccb2e41a95f upstream.

We should ensure that we always set the pgio_header's error field
if a READ or WRITE RPC call returns an error. The current code depends
on 'hdr->good_bytes' always being initialised to a large value, which
is not always done correctly by callers.
When this happens, applications may end up missing important errors.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFS: Fix a NULL pointer dereference of migration recovery ops for v4.2 client

commit 18e3b739fdc826481c6a1335ce0c5b19b3d415da upstream.

---Steps to Reproduce--
<nfs-server>
# cat /etc/exports
/nfs/referal  *(rw,insecure,no_subtree_check,no_root_squash,crossmnt)
/nfs/old      *(ro,insecure,subtree_check,root_squash,crossmnt)

<nfs-client>
# mount -t nfs nfs-server:/nfs/ /mnt/
# ll /mnt/*/

<nfs-server>
# cat /etc/exports
/nfs/referal   *(rw,insecure,no_subtree_check,no_root_squash,crossmnt,refer=/nfs/old/@nfs-server)
/nfs/old       *(ro,insecure,subtree_check,root_squash,crossmnt)
# service nfs restart

<nfs-client>
# ll /mnt/*/    --->>>>> oops here

[ 5123.102925] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 5123.103363] IP: [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4]
[ 5123.103752] PGD 587b9067 PUD 3cbf5067 PMD 0
[ 5123.104131] Oops: 0000 [#1]
[ 5123.104529] Modules linked in: nfsv4(OE) nfs(OE) fscache(E) nfsd(OE) xfs libcrc32c iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev vmw_balloon parport_pc parport i2c_piix4 shpchp auth_rpcgss nfs_acl vmw_vmci lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi serio_raw scsi_transport_spi e1000 mptscsih mptbase ata_generic pata_acpi [last unloaded: nfsd]
[ 5123.105887] CPU: 0 PID: 15853 Comm: ::1-manager Tainted: G           OE   4.2.0-rc6+ #214
[ 5123.106358] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[ 5123.106860] task: ffff88007620f300 ti: ffff88005877c000 task.ti: ffff88005877c000
[ 5123.107363] RIP: 0010:[<ffffffffa03ed38b>]  [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4]
[ 5123.107909] RSP: 0018:ffff88005877fdb8  EFLAGS: 00010246
[ 5123.108435] RAX: ffff880053f3bc00 RBX: ffff88006ce6c908 RCX: ffff880053a0d240
[ 5123.108968] RDX: ffffea0000e6d940 RSI: ffff8800399a0000 RDI: ffff88006ce6c908
[ 5123.109503] RBP: ffff88005877fe28 R08: ffffffff81c708a0 R09: 0000000000000000
[ 5123.110045] R10: 00000000000001a2 R11: ffff88003ba7f5c8 R12: ffff880054c55800
[ 5123.110618] R13: 0000000000000000 R14: ffff880053a0d240 R15: ffff880053a0d240
[ 5123.111169] FS:  0000000000000000(0000) GS:ffffffff81c27000(0000) knlGS:0000000000000000
[ 5123.111726] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5123.112286] CR2: 0000000000000000 CR3: 0000000054cac000 CR4: 00000000001406f0
[ 5123.112888] Stack:
[ 5123.113458]  ffffea0000e6d940 ffff8800399a0000 00000000000167d0 0000000000000000
[ 5123.114049]  0000000000000000 0000000000000000 0000000000000000 00000000a7ec82c6
[ 5123.114662]  ffff88005877fe18 ffffea0000e6d940 ffff8800399a0000 ffff880054c55800
[ 5123.115264] Call Trace:
[ 5123.115868]  [<ffffffffa03fb44b>] nfs4_try_migration+0xbb/0x220 [nfsv4]
[ 5123.116487]  [<ffffffffa03fcb3b>] nfs4_run_state_manager+0x4ab/0x7b0 [nfsv4]
[ 5123.117104]  [<ffffffffa03fc690>] ? nfs4_do_reclaim+0x510/0x510 [nfsv4]
[ 5123.117813]  [<ffffffff810a4527>] kthread+0xd7/0xf0
[ 5123.118456]  [<ffffffff810a4450>] ? kthread_worker_fn+0x160/0x160
[ 5123.119108]  [<ffffffff816d9cdf>] ret_from_fork+0x3f/0x70
[ 5123.119723]  [<ffffffff810a4450>] ? kthread_worker_fn+0x160/0x160
[ 5123.120329] Code: 4c 8b 6a 58 74 17 eb 52 48 8d 55 a8 89 c6 4c 89 e7 e8 4a b5 ff ff 8b 45 b0 85 c0 74 1c 4c 89 f9 48 8b 55 90 48 8b 75 98 48 89 df <41> ff 55 00 3d e8 d8 ff ff 41 89 c6 74 cf 48 8b 4d c8 65 48 33
[ 5123.121643] RIP  [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4]
[ 5123.122308]  RSP <ffff88005877fdb8>
[ 5123.122942] CR2: 0000000000000000

Fixes: ec011fe847 ("NFS: Introduce a vector of migration recovery ops")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSv4.1/pNFS: Fix borken function _same_data_server_addrs_locked()

commit 6f536936b79bd4b5cea8fb0e5b8b0bce8cd1ea4a upstream.

- Switch back to using list_for_each_entry(). Fixes an incorrect test
for list NULL termination.
- Do not assume that lists are sorted.
- Finally, consider an existing entry to match if it consists of a subset
of the addresses in the new entry.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFS: Don't let the ctime override attribute barriers.

commit 7c2dad99d60c86ec686b3bfdcb787c450a7ea89f upstream.

Chuck reports seeing cases where a GETATTR that happens to race
with an asynchronous WRITE is overriding the file size, despite
the attribute barrier being set by the writeback code.

The culprit turns out to be the check in nfs_ctime_need_update(),
which sees that the ctime is newer than the cached ctime, and
assumes that it is safe to override the attribute barrier.
This patch removes that override, and ensures that attribute
barriers are always respected.

Reported-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: a08a8cd375db9 ("NFS: Add attribute update barriers to NFS writebacks")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSv4: don't set SETATTR for O_RDONLY|O_EXCL

commit efcbc04e16dfa95fef76309f89710dd1d99a5453 upstream.

It is unusual to combine the open flags O_RDONLY and O_EXCL, but
it appears that libre-office does just that.

[pid 3250] stat("/home/USER/.config", {st_mode=S_IFDIR|0700, st_size=8192, ...}) = 0
[pid 3250] open("/home/USER/.config/libreoffice/4-suse/user/extensions/buildid", O_RDONLY|O_EXCL <unfinished ...>

NFSv4 takes O_EXCL as a sign that a setattr command should be sent,
probably to reset the timestamps.

When it was an O_RDONLY open, the SETATTR command does not
identify any actual attributes to change.
If no delegation was provided to the open, the SETATTR uses the
all-zeros stateid and the request is accepted (at least by the
Linux NFS server - no harm, no foul).

If a read-delegation was provided, this is used in the SETATTR
request, and a Netapp filer will justifiably claim
NFS4ERR_BAD_STATEID, which the Linux client takes as a sign
to retry - indefinitely.

So only treat O_EXCL specially if O_CREAT was also given.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSv4.1/pnfs: Fix atomicity of commit list updates

commit 86d80f973434de24d8a807a92cd59d5ced7bd519 upstream.

pnfs_layout_mark_request_commit() needs to ensure that it adds the
request to the commit list atomically with all the other updates
in order to prevent corruption to buckets[ds_commit_idx].wlseg
due to races with pnfs_generic_clear_request_commit().

Fixes: 338d00cfef07d ("pnfs: Refactor the *_layout_mark_request_commit...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd: ensure that delegation stateid hash references are only put once

commit 3fcbbd244ed1d20dc0eb7d48d729503992fa9b7d upstream.

It's possible that a DELEGRETURN could race with (e.g.) client expiry,
in which case we could end up putting the delegation hash reference more
than once.

Have unhash_delegation_locked return a bool that indicates whether it
was already unhashed. In the case of destroy_delegation we only
conditionally put the hash reference if that returns true.

The other callers of unhash_delegation_locked call it while walking
list_heads that shouldn't yet be detached. If we find that it doesn't
return true in those cases, then throw a WARN_ON as that indicates that
we have a partially hashed delegation, and that something is likely very
wrong.

Tested-by: Andrew W Elble <aweits@rit.edu>
Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd: ensure that the ol stateid hash reference is only put once

commit e85687393f3ee0a77ccca016f903d1558bb69258 upstream.

When an open or lock stateid is hashed, we take an extra reference to
it. When we unhash it, we drop that reference. The code however does
not properly account for the case where we have two callers concurrently
trying to unhash the stateid. This can lead to list corruption and the
hash reference being put more than once.

Fix this by having unhash_ol_stateid use list_del_init on the st_perfile
list_head, and then testing to see if that list_head is empty before
releasing the hash reference. This means that some of the unhashing
wrappers now become bool return functions so we can test to see whether
the stateid was unhashed before we put the reference.

Reported-by: Andrew W Elble <aweits@rit.edu>
Tested-by: Andrew W Elble <aweits@rit.edu>
Reported-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd: Fix an FS_LAYOUT_TYPES/LAYOUT_TYPES encode bug

commit 6896f15aabde505b35888039af93d1d182a0108a upstream.

Currently we'll respond correctly to a request for either
FS_LAYOUT_TYPES or LAYOUT_TYPES, but not to a request for both
attributes simultaneously.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Btrfs: check if previous transaction aborted to avoid fs corruption

commit 1f9b8c8fbc9a4d029760b16f477b9d15500e3a34 upstream.

While we are committing a transaction, it's possible the previous one is
still finishing its commit and therefore we wait for it to finish first.
However we were not checking if that previous transaction ended up getting
aborted after we waited for it to commit, so we ended up committing the
current transaction which can lead to fs corruption because the new
superblock can point to trees that have had one or more nodes/leafs that
were never durably persisted.
The following sequence diagram exemplifies how this is possible:

          CPU 0                                                        CPU 1

  transaction N starts

  (...)

  btrfs_commit_transaction(N)

    cur_trans->state = TRANS_STATE_COMMIT_START;
    (...)
    cur_trans->state = TRANS_STATE_COMMIT_DOING;
    (...)

    cur_trans->state = TRANS_STATE_UNBLOCKED;
    root->fs_info->running_transaction = NULL;

                                                              btrfs_start_transaction()
                                                                 --> starts transaction N + 1

    btrfs_write_and_wait_transaction(trans, root);
      --> starts writing all new or COWed ebs created
          at transaction N

                                                              creates some new ebs, COWs some
                                                              existing ebs but doesn't COW or
                                                              deletes eb X

                                                              btrfs_commit_transaction(N + 1)
                                                                (...)
                                                                cur_trans->state = TRANS_STATE_COMMIT_START;
                                                                (...)
                                                                wait_for_commit(root, prev_trans);
                                                                  --> prev_trans == transaction N

    btrfs_write_and_wait_transaction() continues
    writing ebs
       --> fails writing eb X, we abort transaction N
           and set bit BTRFS_FS_STATE_ERROR on
           fs_info->fs_state, so no new transactions
           can start after setting that bit

       cleanup_transaction()
         btrfs_cleanup_one_transaction()
           wakes up task at CPU 1

                                                                continues, doesn't abort because
                                                                cur_trans->aborted (transaction N + 1)
                                                                is zero, and no checks for bit
                                                                BTRFS_FS_STATE_ERROR in fs_info->fs_state
                                                                are made

                                                                btrfs_write_and_wait_transaction(trans, root);
                                                                  --> succeeds, no errors during writeback

                                                                write_ctree_super(trans, root, 0);
                                                                  --> succeeds
                                                                  --> we have now a superblock that points us
                                                                      to some root that uses eb X, which was
                                                                      never written to disk

In this scenario future attempts to read eb X from disk results in an
error message like "parent transid verify failed on X wanted Y found Z".

So fix this by aborting the current transaction if after waiting for the
previous transaction we verify that it was aborted.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

v4l: omap3isp: Fix async notifier registration order

commit 5d479386983c5f1bb1aff4f88a027b6143f88a39 upstream.

The async notifier was registered before the v4l2_device was registered and
before the notifier callbacks were set. This could lead to missing the
bound() and complete() callbacks and to attempting to spin_lock() and
uninitialised spin lock.

Also fix unregistering the async notifier in the case of an error --- the
function may not fail anymore after the notifier is registered.

Fixes: da7f3843d2c7 ("[media] omap3isp: Add support for the Device Tree")
Signed-off-by: Sakari Ailus <sakari.ailus@iki.fi>
Reviewed-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

v4l: xilinx: missing error code

commit e31f8f00bfc081ec1881d92a2dd192aeddf1d9d7 upstream.

We should set "ret" on this error path instead of returning success.

Fixes: df3305156f98 ('[media] v4l: xilinx: Add Xilinx Video IP core')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Hyun Kwon <hyun.kwon@xilinx.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: am437x-vpfe: Fix a race condition during release

commit c99235fa3ef833c3c23926085f2bb68851c8460a upstream.

There was a race condition where during cleanup/release operation
on-going streaming would cause a kernel panic because the hardware
module was disabled prematurely with IRQ still pending.

Fixes: 417d2e507edc ("[media] media: platform: add VPFE capture driver support for AM437X")
Signed-off-by: Benoit Parrot <bparrot@ti.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: am437x-vpfe: Requested frame size and fmt overwritten by current sensor setting

commit f47c9045643f91e76d8a9030828b9fe1cf4a6bcf upstream.

Upon a S_FMT the input/requested frame size and pixel format is
overwritten by the current sub-device settings.
Fix this so application can actually set the frame size and format.

Fixes: 417d2e507edc ("[media] media: platform: add VPFE capture driver support for AM437X")
Signed-off-by: Benoit Parrot <bparrot@ti.com>
Acked-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

v4l: omap3isp: Fix sub-device power management code

commit 9d39f05490115bf145e5ea03c0b7ec9d3d015b01 upstream.

Commit 813f5c0ac5cc ("media: Change media device link_notify behaviour")
modified the media controller link setup notification API and updated the
OMAP3 ISP driver accordingly. As a side effect it introduced a bug by
turning power on after setting the link instead of before. This results in
sub-devices not being powered down in some cases when they should be. Fix
it.

Fixes: 813f5c0ac5cc [media] media: Change media device link_notify behaviour
Signed-off-by: Sakari Ailus <sakari.ailus@iki.fi>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rc-core: fix remove uevent generation

commit a66b0c41ad277ae62a3ae6ac430a71882f899557 upstream.

The input_dev is already gone when the rc device is being unregistered
so checking for its presence only means that no remove uevent will be
generated.

Signed-off-by: David Härdeman <david@hardeman.nu>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cxl: Don't remove AFUs/vPHBs in cxl_reset

commit 4e1efb403c1c016ae831bd9988a7d2e5e0af41a0 upstream.

If the driver doesn't participate in EEH, the AFUs will be removed
by cxl_remove, which will be invoked by EEH.

If the driver does particpate in EEH, the vPHB needs to stick around
so that the it can particpate.

In both cases, we shouldn't remove the AFU/vPHB.

Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

i915: Set ddi_pll_sel in DP MST path

commit 6fa2d197936ba0b8936e813d0adecefac160062b upstream.

The DP MST encoder config function never sets ddi_pll_sel, even though
its value is programmed in its ->pre_enable() hook. That used to work
because a new pipe_config was kzalloc'ed at every modeset, and the value
of zero selects the highest clock for the PLL. Starting with the commit
below, the value of ddi_pll_sel is preserved through modesets, and since
the correct value wasn't properly setup by the MST code, it could lead
to warnings and blank screens.

commit 8504c74c7ae48b4b8ed1f1c0acf67482a7f45c93
Author: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Date: Fri May 15 11:51:50 2015 +0300

drm/i915: Preserve ddi_pll_sel when allocating new pipe_config

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91628
Cc: Timo Aaltonen <tjaalton@ubuntu.com>
Cc: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/i915: Don't use link_bw for PLL setup

commit 7e6313a2516dbcd168f4ae36f0abe1a9227106b5 upstream.

Use port_clock instead of link_bw when picking the PLL parameters for
DP. link_bw may be zero with an eDP 1.4 sink that supports
DP_LINK_RATE_SET so we shouldn't use it for anything other than feed it
to the sink appropriately.

v2: Fix typo in commit message (Sivakumar)

Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[Jani: cherry-picked from future.]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/mm: Initialize pmd_idx in page_table_range_init_count()

commit 9962eea9e55f797f05f20ba6448929cab2a9f018 upstream.

The variable pmd_idx is not initialized for the first iteration of the
for loop.

Assign the proper value which indexes the start address.

Fixes: 719272c45b82 'x86, mm: only call early_ioremap_page_table_range_init() once'
Signed-off-by: Minfei Huang <mnfhuang@gmail.com>
Cc: tony.luck@intel.com
Cc: wangnan0@huawei.com
Cc: david.vrabel@citrix.com
Reviewed-by: yinghai@kernel.org
Link: http://lkml.kernel.org/r/1436703522-29552-1-git-send-email-mhuang@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm: check if section present during memory block registering

commit 04697858d89e4bf2650364f8d6956e2554e8ef88 upstream.

Tony Luck found on his setup, if memory block size 512M will cause crash
during booting.

  BUG: unable to handle kernel paging request at ffffea0074000020
  IP: get_nid_for_pfn+0x17/0x40
  PGD 128ffcb067 PUD 128ffc9067 PMD 0
  Oops: 0000 [#1] SMP
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc8 #1
  ...
  Call Trace:
     ? register_mem_sect_under_node+0x66/0xe0
     register_one_node+0x17b/0x240
     ? pci_iommu_alloc+0x6e/0x6e
     topology_init+0x3c/0x95
     do_one_initcall+0xcd/0x1f0

The system has non continuous RAM address:
BIOS-e820: [mem 0x0000001300000000-0x0000001cffffffff] usable
BIOS-e820: [mem 0x0000001d70000000-0x0000001ec7ffefff] usable
BIOS-e820: [mem 0x0000001f00000000-0x0000002bffffffff] usable
BIOS-e820: [mem 0x0000002c18000000-0x0000002d6fffefff] usable
BIOS-e820: [mem 0x0000002e00000000-0x00000039ffffffff] usable

So there are start sections in memory block not present.  For example:

    memory block : [0x2c18000000, 0x2c20000000) 512M

first three sections are not present.

The current register_mem_sect_under_node() assume first section is
present, but memory block section number range [start_section_nr,
end_section_nr] would include not present section.

For arch that support vmemmap, we don't setup memmap for struct page
area within not present sections area.

So skip the pfn range that belong to absent section.

[akpm@linux-foundation.org: simplification]
[rientjes@google.com: more simplification]
Fixes: bdee237c0343 ("x86: mm: Use 2GB memory block size on large memory x86-64 systems")
Fixes: 982792c782ef ("x86, mm: probe memory block size for generic x86 64bit")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Cc: Greg KH <greg@kroah.com>
Cc: Ingo Molnar <mingo@elte.hu>
Tested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Add radeon suspend/resume quirk for HP Compaq dc5750.

commit 09bfda10e6efd7b65bcc29237bee1765ed779657 upstream.

With the radeon driver loaded the HP Compaq dc5750
Small Form Factor machine fails to resume from suspend.
Adding a quirk similar to other devices avoids
the problem and the system resumes properly.

Signed-off-by: Jeffery Miller <jmiller@neverware.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

CIFS: fix type confusion in copy offload ioctl

commit 4c17a6d56bb0cad3066a714e94f7185a24b40f49 upstream.

This might lead to local privilege escalation (code execution as
kernel) for systems where the following conditions are met:

- CONFIG_CIFS_SMB2 and CONFIG_CIFS_POSIX are enabled
- a cifs filesystem is mounted where:
- the mount option "vers" was used and set to a value >=2.0
- the attacker has write access to at least one file on the filesystem

To attack this, an attacker would have to guess the target_tcon
pointer (but guessing wrong doesn't cause a crash, it just returns an
error code) and win a narrow race.

Signed-off-by: Jann Horn <jann@thejh.net>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc/mm: Recompute hash value after a failed update

commit 36b35d5d807b7e57aff7d08e63de8b17731ee211 upstream.

If we had secondary hash flag set, we ended up modifying hash value in
the updatepp code path. Hence with a failed updatepp we will be using
a wrong hash value for the following hash insert. Fix this by
recomputing hash before insert.

Without this patch we can end up with using wrong slot number in linux
pte. That can result in us missing an hash pte update or invalidate
which can cause memory corruption or even machine check.

Fixes: 6d492ecc6489 ("powerpc/THP: Add code to handle HPTE faults for hugepages")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc/boot: Specify ABI v2 when building an LE boot wrapper

commit 655471f54c2e395ba29ae4156ba0f49928177cc1 upstream.

The kernel does it, not the boot wrapper, which breaks with some
cross compilers that still default to ABI v1.

Fixes: 147c05168fc8 ("powerpc/boot: Add support for 64bit little endian wrapper")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc/pseries: Release DRC when configure_connector fails

commit daebaabb5cfbe4a6f09ca0e0f8b7673efc704960 upstream.

Commit f32393c943e2 ("powerpc/pseries: Correct cpu affinity for
dlpar added cpus") moved dlpar_acquire_drc() call to before
dlpar_configure_connector() call in dlpar_cpu_probe(), but missed
to release the DRC if dlpar_configure_connector() failed.
During CPU hotplug, if configure-connector fails for any reason,
then this will result in subsequent CPU hotplug attempts to fail.

Release the acquired DRC if dlpar_configure_connector() call fails
so that the DRC is left in right isolation and allocation state
for the subsequent hotplug operation to succeed.

Fixes: f32393c943e2 ("powerpc/pseries: Correct cpu affinity for dlpar added cpus")
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc/powernv/pci-ioda: fix kdump with non-power-of-2 crashkernel=

commit fa14486979b3a47307bcdb10f8b5baa875a5cf68 upstream.

The 32-bit TCE table initialization relies on the DMA window having a
size equal to a power of 2 (and checks for it explicitly). But
crashkernel= has no constraint that requires a power-of-2 be specified.
This causes the kdump kernel to fail to boot as none of the PCI devices
(including the disk controller) are successfully initialized.

After this change, the PCI devices successfully set up the 32-bit TCE
table and kdump succeeds.

Fixes: aca6913f5551 ("powerpc/powernv/ioda2: Introduce helpers to allocate TCE pages")
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc/powernv/pci-ioda: fix 32-bit TCE table init in kdump kernel

commit bb0054552d080dd929907c5925d4bedc8bf6def7 upstream.

When attempting to kdump with the 4.2 kernel, we see for each PCI
device:

pci 0003:01     : [PE# 000] Assign DMA32 space
pci 0003:01     : [PE# 000] Setting up 32-bit TCE table at 0..80000000
pci 0003:01     : [PE# 000] Failed to create 32-bit TCE table, err -22
PCI: Domain 0004 has 8 available 32-bit DMA segments
PCI: 4 PE# for a total weight of 70
pci 0004:01     : [PE# 002] Assign DMA32 space
pci 0004:01     : [PE# 002] Setting up 32-bit TCE table at 0..80000000
pci 0004:01     : [PE# 002] Failed to create 32-bit TCE table, err -22
pci 0004:0d     : [PE# 005] Assign DMA32 space
pci 0004:0d     : [PE# 005] Setting up 32-bit TCE table at 0..80000000
pci 0004:0d     : [PE# 005] Failed to create 32-bit TCE table, err -22
pci 0004:0e     : [PE# 006] Assign DMA32 space
pci 0004:0e     : [PE# 006] Setting up 32-bit TCE table at 0..80000000
pci 0004:0e     : [PE# 006] Failed to create 32-bit TCE table, err -22
pci 0004:10     : [PE# 008] Assign DMA32 space
pci 0004:10     : [PE# 008] Setting up 32-bit TCE table at 0..80000000
pci 0004:10     : [PE# 008] Failed to create 32-bit TCE table, err -22

and eventually the kdump kernel fails to boot as none of the PCI devices
(including the disk controller) are successfully initialized.

The EINVAL response is because the DMA window (the 2GB base window) is
larger than the kdump kernel's reserved memory (crashkernel=, in this
case specified to be 1024M). The check in question,

if ((window_size > memory_hotplug_max()) || !is_power_of_2(window_size))

is a valid sanity check for pnv_pci_ioda2_table_alloc_pages(), so adjust
the caller to pass in a smaller window size if our maximum memory value
is smaller than the DMA window.

After this change, the PCI devices successfully set up the 32-bit TCE
table and kdump succeeds.

The problem was seen on a Firestone machine originally.

Fixes: aca6913f5551 ("powerpc/powernv/ioda2: Introduce helpers to allocate TCE pages")
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[mpe: Coding style pedantry, use u64, change the indentation]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: vmx - Adding enable_kernel_vsx() to access VSX instructions

commit 2d6f0600b2cd755959527230ef5a6fba97bb762a upstream.

vmx-crypto driver make use of some VSX instructions which are
only available if VSX is enabled. Running in cases where VSX
are not enabled vmx-crypto fails in a VSX exception.

In order to fix this enable_kernel_vsx() was added to turn on
VSX instructions for vmx-crypto.

Signed-off-by: Leonidas S. Barbosa <leosilva@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc: Uncomment and make enable_kernel_vsx() routine available

commit 72cd7b44bc99376b3f3c93cedcd052663fcdf705 upstream.

enable_kernel_vsx() function was commented since anything was using
it. However, vmx-crypto driver uses VSX instructions which are
only available if VSX is enable. Otherwise it rises an exception oops.

This patch uncomment enable_kernel_vsx() routine and makes it available.

Signed-off-by: Leonidas S. Barbosa <leosilva@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers

commit 1c2cb594441d02815d304cccec9742ff5c707495 upstream.

The EPOW interrupt handler uses rtas_get_sensor(), which in turn
uses rtas_busy_delay() to wait for RTAS becoming ready in case it
is necessary. But rtas_busy_delay() is annotated with might_sleep()
and thus may not be used by interrupts handlers like the EPOW handler!
This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is
enabled:

BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:496
in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6
Call Trace:
[c00000007ffe7b90] [c000000000807670] dump_stack+0xa0/0xdc (unreliable)
[c00000007ffe7bc0] [c0000000000e1f14] ___might_sleep+0x134/0x180
[c00000007ffe7c20] [c00000000002aec0] rtas_busy_delay+0x30/0xd0
[c00000007ffe7c50] [c00000000002bde4] rtas_get_sensor+0x74/0xe0
[c00000007ffe7ce0] [c000000000083264] ras_epow_interrupt+0x44/0x450
[c00000007ffe7d90] [c000000000120260] handle_irq_event_percpu+0xa0/0x300
[c00000007ffe7e70] [c000000000120524] handle_irq_event+0x64/0xc0
[c00000007ffe7eb0] [c000000000124dbc] handle_fasteoi_irq+0xec/0x260
[c00000007ffe7ef0] [c00000000011f4f0] generic_handle_irq+0x50/0x80
[c00000007ffe7f20] [c000000000010f3c] __do_irq+0x8c/0x200
[c00000007ffe7f90] [c0000000000236cc] call_do_irq+0x14/0x24
[c00000007e6f39e0] [c000000000011144] do_IRQ+0x94/0x110
[c00000007e6f3a30] [c000000000002594] hardware_interrupt_common+0x114/0x180

Fix this issue by introducing a new rtas_get_sensor_fast() function
that does not use rtas_busy_delay() - and thus can only be used for
sensors that do not cause a BUSY condition - known as "fast" sensors.

The EPOW sensor is defined to be "fast" in sPAPR - mpe.

Fixes: 587f83e8dd50 ("powerpc/pseries: Use rtas_get_sensor in RAS code")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc/mm: Fix pte_pagesize_index() crash on 4K w/64K hash

commit 74b5037baa2011a2799e2c43adde7d171b072f9e upstream.

The powerpc kernel can be built to have either a 4K PAGE_SIZE or a 64K
PAGE_SIZE.

However when built with a 4K PAGE_SIZE there is an additional config
option which can be enabled, PPC_HAS_HASH_64K, which means the kernel
also knows how to hash a 64K page even though the base PAGE_SIZE is 4K.

This is used in one obscure configuration, to support 64K pages for SPU
local store on the Cell processor when the rest of the kernel is using
4K pages.

In this configuration, pte_pagesize_index() is defined to just pass
through its arguments to get_slice_psize(). However pte_pagesize_index()
is called for both user and kernel addresses, whereas get_slice_psize()
only knows how to handle user addresses.

This has been broken forever, however until recently it happened to
work. That was because in get_slice_psize() the large kernel address
would cause the right shift of the slice mask to return zero.

However in commit 7aa0727f3302 ("powerpc/mm: Increase the slice range to
64TB"), the get_slice_psize() code was changed so that instead of a
right shift we do an array lookup based on the address. When passed a
kernel address this means we index way off the end of the slice array
and return random junk.

That is only fatal if we happen to hit something non-zero, but when we
do return a non-zero value we confuse the MMU code and eventually cause
a check stop.

This fix is ugly, but simple. When we're called for a kernel address we
return 4K, which is always correct in this configuration, otherwise we
use the slice mask.

Fixes: 7aa0727f3302 ("powerpc/mm: Increase the slice range to 64TB")
Reported-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc/eeh: Fix fenced PHB caused by eeh_slot_error_detail()

commit 259800135c654a098d9f0adfdd3d1f20eef1f231 upstream.

The config space of some PCI devices can't be accessed when their
PEs are in frozen state. Otherwise, fenced PHB might be seen.
Those PEs are identified with flag EEH_PE_CFG_RESTRICTED, meaing
EEH_PE_CFG_BLOCKED is set automatically when the PE is put to
frozen state (EEH_PE_ISOLATED). eeh_slot_error_detail() restores
PCI device BARs with eeh_pe_restore_bars(), which then calls
eeh_ops->restore_config() to reinitialize the PCI device in
(OPAL) firmware. eeh_ops->restore_config() produces PCI config
access that causes fenced PHB. The problem was reported on below
adapter:

   0001:01:00.0 0200: 14e4:168e (rev 10)
   0001:01:00.0 Ethernet controller: Broadcom Corporation \
                NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)

This fixes the issue by skipping eeh_pe_restore_bars() in
eeh_slot_error_detail() when EEH_PE_CFG_BLOCKED is set for the PE.

Fixes: b6541db1 ("powerpc/eeh: Block PCI config access upon frozen PE")
Reported-by: Manvanthara B. Puttashankar <mputtash@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc/eeh: Probe after unbalanced kref check

commit e642d11bdbfe8eb10116ab3959a2b5d75efda832 upstream.

In the complete hotplug case, EEH PEs are supposed to be released
and set to NULL. Normally, this is done by eeh_remove_device(),
which is called from pcibios_release_device().

However, if something is holding a kref to the device, it will not
be released, and the PE will remain. eeh_add_device_late() has
a check for this which will explictly destroy the PE in this case.

This check in eeh_add_device_late() occurs after a call to
eeh_ops->probe(). On PowerNV, probe is a pointer to pnv_eeh_probe(),
which will exit without probing if there is an existing PE.

This means that on PowerNV, devices with outstanding krefs will not
be rediscovered by EEH correctly after a complete hotplug. This is
affecting CXL (CAPI) devices in the field.

Put the probe after the kref check so that the PE is destroyed
and affected devices are correctly rediscovered by EEH.

Fixes: d91dafc02f42 ("powerpc/eeh: Delay probing EEH device during hotplug")
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc/pseries: Fix corrupted pdn list

commit 590c7567a2895f939525ead57b0334c6d47986f0 upstream.

Commit cca87d30 ("powerpc/pci: Refactor pci_dn") introduced pdn
list for SRIOV VFs. It means the pdn is be put into the child list
of its parent pdn when the pdn is created. When doing PCI hot
unplugging on pSeries, the PCI device node as well as its pdn are
released through procfs entry "powerpc/ofdt". Some one else grabs
the memory chunk of the pdn and update it accordingly. At the same
time, the pdn is still tracked in the child list of parent pdn. It
leads to corrupted child list in the parent pdn.

This fixes above issue by removing the pdn from the child list of
its parent pdn when the device node is detached from the system.
Note the pdn is free'd when the device node is released if the
device node is dynamic one. Otherwise, the device node as well
as the pdn won't be released.

Fixes: cca87d30 ("powerpc/pci: Refactor pci_dn")
Reported-by: Santwana Samantray <santwana.samantray@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

pinctrl: at91: fix null pointer dereference

commit 1ab36387ea4face01aac3560b396b1e2ce07c4ff upstream.

Not all gpio banks are necessarily enabled, in the current code this can
lead to null pointer dereferences.

[   51.130000] Unable to handle kernel NULL pointer dereference at virtual address 00000058
[   51.130000] pgd = dee04000
[   51.130000] [00000058] *pgd=3f66d831, *pte=00000000, *ppte=00000000
[   51.140000] Internal error: Oops: 17 [#1] ARM
[   51.140000] Modules linked in:
[   51.140000] CPU: 0 PID: 1664 Comm: cat Not tainted 4.1.1+ #6
[   51.140000] Hardware name: Atmel SAMA5
[   51.140000] task: df6dd880 ti: dec60000 task.ti: dec60000
[   51.140000] PC is at at91_pinconf_get+0xb4/0x200
[   51.140000] LR is at at91_pinconf_get+0xb4/0x200
[   51.140000] pc : [<c01e71a0>]    lr : [<c01e71a0>]    psr: 600f0013
sp : dec61e48  ip : 600f0013  fp : df522538
[   51.140000] r10: df52250c  r9 : 00000058  r8 : 00000068
[   51.140000] r7 : 00000000  r6 : df53c910  r5 : 00000000  r4 : dec61e7c
[   51.140000] r3 : 00000000  r2 : c06746d4  r1 : 00000000  r0 : 00000003
[   51.140000] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[   51.140000] Control: 10c53c7d  Table: 3ee04059  DAC: 00000015
[   51.140000] Process cat (pid: 1664, stack limit = 0xdec60208)
[   51.140000] Stack: (0xdec61e48 to 0xdec62000)
[   51.140000] 1e40:                   00000358 00000000 df522500 ded15f80 c05a9d08 ded15f80
[   51.140000] 1e60: 0000048c 00000061 df522500 ded15f80 c05a9d08 c01e7304 ded15f80 00000000
[   51.140000] 1e80: c01e6008 00000060 0000048c c01e6034 c01e5f6c ded15f80 dec61ec0 00000000
[   51.140000] 1ea0: 00020000 ded6f280 dec61f80 00000001 00000001 c00ae0b8 b6e80000 ded15fb0
[   51.140000] 1ec0: 00000000 00000000 df4bc974 00000055 00000800 ded6f280 b6e80000 ded6f280
[   51.140000] 1ee0: ded6f280 00020000 b6e80000 00000000 00020000 c0090dec c0671e1c dec61fb0
[   51.140000] 1f00: b6f8b510 00000001 00004201 c000924c 00000000 00000003 00000003 00000000
[   51.140000] 1f20: df4bc940 00022000 00000022 c066e188 b6e7f000 c00836f4 000b6e7f ded6f280
[   51.140000] 1f40: ded6f280 b6e80000 dec61f80 ded6f280 00020000 c0091508 00000000 00000003
[   51.140000] 1f60: 00022000 00000000 00000000 ded6f280 ded6f280 00020000 b6e80000 c0091d9c
[   51.140000] 1f80: 00000000 00000000 ffffffff 00020000 00020000 b6e80000 00000003 c000f124
[   51.140000] 1fa0: dec60000 c000efa0 00020000 00020000 00000003 b6e80000 00020000 000271c4
[   51.140000] 1fc0: 00020000 00020000 b6e80000 00000003 7fffe000 00000000 00000000 00020000
[   51.140000] 1fe0: 00000000 bef50b64 00013835 b6f29c76 400f0030 00000003 00000000 00000000
[   51.140000] [<c01e71a0>] (at91_pinconf_get) from [<c01e7304>] (at91_pinconf_dbg_show+0x18/0x2c0)
[   51.140000] [<c01e7304>] (at91_pinconf_dbg_show) from [<c01e6034>] (pinconf_pins_show+0xc8/0xf8)
[   51.140000] [<c01e6034>] (pinconf_pins_show) from [<c00ae0b8>] (seq_read+0x1a0/0x464)
[   51.140000] [<c00ae0b8>] (seq_read) from [<c0090dec>] (__vfs_read+0x20/0xd0)
[   51.140000] [<c0090dec>] (__vfs_read) from [<c0091508>] (vfs_read+0x7c/0x108)
[   51.140000] [<c0091508>] (vfs_read) from [<c0091d9c>] (SyS_read+0x40/0x94)
[   51.140000] [<c0091d9c>] (SyS_read) from [<c000efa0>] (ret_fast_syscall+0x0/0x3c)
[   51.140000] Code: eb010ec2 e30a0d08 e34c005a eb0ae5a7 (e5993000)
[   51.150000] ---[ end trace fb3c370da3ea4794 ]---

Fixes: a0b957f306fa ("pinctrl: at91: allow to have disabled gpio bank")
Signed-off-by: David Dueck <davidcdueck@googlemail.com>
Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

pinctrl: mediatek: Fix multiple registration issue.

commit d48c2c02645392483f2b88b050d21ce1db6997b3 upstream.

Since our common driver need support main chip and PMU
at the same time, that means it will register two
pinctrl device, and the pinctrl_desc structure should
be used two times.

But pinctrl_desc use global static definition, then
the latest registered pinctrl device will overwrite
the old one's, all members in pinctrl_desc will set to
the new one's, such as name, pins and pins numbers, etc.
This is a bug.

Move pinctrl_desc into mtk_pinctrl, assign new value for
each pinctrl device to fix it.

Signed-off-by: Hongzhou Yang <hongzhou.yang@mediatek.com>
Reviewed-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda - Fix white noise on Dell M3800

commit 467e1436ba85f78b8c4610c4549eb255a8211c42 upstream.

The M3800 is very minor workstation variant of the XPS 15 which has
already been patched for this issue. I figured it's probably more
important for this version of the laptop to be patched than the
regular XPS as Dell sells is pre-configured with Ubuntu to be used as
a Linux workstation. I have tested the patch on my the hardware on
Linux 4.2.0.

Signed-off-by: Niranjan Sivakumar <ns253@cornell.edu>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda - Add some FIXUP quirks for white noise on Dell laptop.

commit 1adecc6755e1e4193b5618ddb2e107f6d6e88f4b upstream.

Dell laptop has a series model to use the same codec but different subsystem ID.
At the same time they happens the white noise by login screen and headphone;
for fixing them together, I only can add these IDs to FIXUP function ALC292_FIXUP_DISABLE_AAMIX,
then try to solve such the similar issues.

Codec: Realtek ALC3235
Vendor Id: 0x10ec0293
Subsystem Id: 0x102806dd
Subsystem Id: 0x102806df
Subsystem Id: 0x102806e0

BugLink: https://bugs.launchpad.net/bugs/1492132
Signed-off-by: Woodrow Shen <woodrow.shen@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda - Use ALC880_FIXUP_FUJITSU for FSC Amilo M1437

commit a161574e200ae63a5042120e0d8c36830e81bde3 upstream.

It turned out that the machine has a bass speaker, so take a correct
fixup entry.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=102501
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda - Enable headphone jack detect on old Fujitsu laptops

commit bb148bdeb0ab16fc0ae8009799471e4d7180073b upstream.

According to the bug report, FSC Amilo laptops with ALC880 can detect
the headphone jack but currently the driver disables it. It's partly
intentionally, as non-working jack detect was reported in the past.
Let's enable now.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=102501
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda - Fix missing inline for dummy snd_hdac_set_codec_wakeup()

commit 6869de380e8c11c31b608bb2502dcacd634eda13 upstream.

This seems overlooked.

Fixes: 98d8fc6c5d36 ('ALSA: hda - Move hda_i915.c from sound/pci/hda to sound/hda')
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: usb-audio: correct the value cache check.

commit 6aa6925cad06159dc6e25857991bbc4960821242 upstream.

The check of cval->cached should be zero-based (including master channel).

Signed-off-by: Yao-Wen Mao <yaowen@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Input: evdev - do not report errors form flush()

commit eb38f3a4f6e86f8bb10a3217ebd85ecc5d763aae upstream.

We've got bug reports showing the old systemd-logind (at least
system-210) aborting unexpectedly, and this turned out to be because
of an invalid error code from close() call to evdev devices.  close()
is supposed to return only either EINTR or EBADFD, while the device
returned ENODEV.  logind was overreacting to it and decided to kill
itself when an unexpected error code was received.  What a tragedy.

The bad error code comes from flush fops, and actually evdev_flush()
returns ENODEV when device is disconnected or client's access to it is
revoked. But in these cases the fact that flush did not actually happen is
not an error, but rather normal behavior. For non-disconnected devices
result of flush is also not that interesting as there is no potential of
data loss and even if it fails application has no way of handling the
error. Because of that we are better off always returning success from
evdev_flush().

Also returning EINTR from flush()/close() is discouraged (as it is not
clear how application should handle this error), so let's stop taking
evdev->mutex interruptibly.

Bugzilla: http://bugzilla.suse.com/show_bug.cgi?id=939834
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: KVM: Disable virtual timer even if the guest is not using it

commit c4cbba9fa078f55d9f6d081dbb4aec7cf969e7c7 upstream.

When running a guest with the architected timer disabled (with QEMU and
the kernel_irqchip=off option, for example), it is important to make
sure the timer gets turned off. Otherwise, the guest may try to
enable it anyway, leading to a screaming HW interrupt.

The fix is to unconditionally turn off the virtual timer on guest
exit.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: arm64: add workaround for Cortex-A57 erratum #852523

commit 43297dda0a51e4ffed0888ce727c218cfb7474b6 upstream.

When restoring the system register state for an AArch32 guest at EL2,
writes to DACR32_EL2 may not be correctly synchronised by Cortex-A57,
which can lead to the guest effectively running with junk in the DACR
and running into unexpected domain faults.

This patch works around the issue by re-ordering our restoration of the
AArch32 register aliases so that they happen before the AArch64 system
registers. Ensuring that the registers are restored in this order
guarantees that they will be correctly synchronised by the core.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm/arm64: KVM: vgic: Check for !irqchip_in_kernel() when mapping resources

commit c2f58514cfb374d5368c9da945f1765cd48eb0da upstream.

Until b26e5fdac43c ("arm/arm64: KVM: introduce per-VM ops"),
kvm_vgic_map_resources() used to include a check on irqchip_in_kernel(),
and vgic_v2_map_resources() still has it.

But now vm_ops are not initialized until we call kvm_vgic_create().
Therefore kvm_vgic_map_resources() can being called without a VGIC,
and we die because vm_ops.map_resources is NULL.

Fixing this restores QEMU's kernel-irqchip=off option to a working state,
allowing to use GIC emulation in userspace.

Fixes: b26e5fdac43c ("arm/arm64: KVM: introduce per-VM ops")
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
[maz: reworked commit message]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: errata: add module build workaround for erratum #843419

commit df057cc7b4fa59e9b55f07ffdb6c62bf02e99a00 upstream.

Cortex-A53 processors <= r0p4 are affected by erratum #843419 which can
lead to a memory access using an incorrect address in certain sequences
headed by an ADRP instruction.

There is a linker fix to generate veneers for ADRP instructions, but
this doesn't work for kernel modules which are built as unlinked ELF
objects.

This patch adds a new config option for the erratum which, when enabled,
builds kernel modules with the mcmodel=large flag. This uses absolute
addressing for all kernel symbols, thereby removing the use of ADRP as
a PC-relative form of addressing. The ADRP relocs are removed from the
module loader so that we fail to load any potentially affected modules.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: head.S: initialise mdcr_el2 in el2_setup

commit d10bcd473301888f957ec4b6b12aa3621be78d59 upstream.

When entering the kernel at EL2, we fail to initialise the MDCR_EL2
register which controls debug access and PMU capabilities at EL1.

This patch ensures that the register is initialised so that all traps
are disabled and all the PMU counters are available to the host. When a
guest is scheduled, KVM takes care to configure trapping appropriately.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: compat: fix vfp save/restore across signal handlers in big-endian

commit bdec97a855ef1e239f130f7a11584721c9a1bf04 upstream.

When saving/restoring the VFP registers from a compat (AArch32)
signal frame, we rely on the compat registers forming a prefix of the
native register file and therefore make use of copy_{to,from}_user to
transfer between the native fpsimd_state and the compat_vfp_sigframe.

Unfortunately, this doesn't work so well in a big-endian environment.
Our fpsimd save/restore code operates directly on 128-bit quantities
(Q registers) whereas the compat_vfp_sigframe represents the registers
as an array of 64-bit (D) registers. The architecture packs the compat D
registers into the Q registers, with the least significant bytes holding
the lower register. Consequently, we need to swap the 64-bit halves when
converting between these two representations on a big-endian machine.

This patch replaces the __copy_{to,from}_user invocations in our
compat VFP signal handling code with explicit __put_user loops that
operate on 64-bit values and swap them accordingly.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: set MAX_MEMBLOCK_ADDR according to linear region size

commit 34ba2c4247e5c4b1542b1106e156af324660c4f0 upstream.

The linear region size of a 39-bit VA kernel is only 256 GB, which
may be insufficient to cover all of system RAM, even on platforms
that have much less than 256 GB of memory but which is laid out
very sparsely.

So make sure we clip the memory we will not be able to map before
installing it into the memblock memory table, by setting
MAX_MEMBLOCK_ADDR accordingly.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

of/fdt: make memblock maximum physical address arch configurable

commit 8eafeb48022816513abc4f440bdad4c350fe81a3 upstream.

When parsing the memory nodes to populate the memblock memory
table, we check against high and low limits and clip any memory
that exceeds either one of them.

However, for arm64, the high limit of (phys_addr_t)~0 is not very
meaningful, since phys_addr_t is 64 bits (i.e., no limit) but there
may be other constraints that limit the memory ranges that we can
support.

So rename MAX_PHYS_ADDR to MAX_MEMBLOCK_ADDR (for clarity) and only
define it if the arch does not supply a definition of its own.

Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: flush FP/SIMD state correctly after execve()

commit 674c242c9323d3c293fc4f9a3a3a619fe3063290 upstream.

When a task calls execve(), its FP/SIMD state is flushed so that
none of the original program state is observeable by the incoming
program.

However, since this flushing consists of setting the in-memory copy
of the FP/SIMD state to all zeroes, the CPU field is set to CPU 0 as
well, which indicates to the lazy FP/SIMD preserve/restore code that
the FP/SIMD state does not need to be reread from memory if the task
is scheduled again on CPU 0 without any other tasks having entered
userland (or used the FP/SIMD in kernel mode) on the same CPU in the
mean time. If this happens, the FP/SIMD state of the old program will
still be present in the registers when the new program starts.

So set the CPU field to the invalid value of NR_CPUS when performing
the flush, by calling fpsimd_flush_task_state().

Reported-by: Chunyan Zhang <chunyan.zhang@spreadtrum.com>
Reported-by: Janet Liu <janet.liu@spreadtrum.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: entry: always restore x0 from the stack on syscall return

commit 412fcb6cebd758d080cacd5a41a0cbc656ea5fce upstream.

We have a micro-optimisation on the fast syscall return path where we
take care to keep x0 live with the return value from the syscall so that
we can avoid restoring it from the stack. The benefit of doing this is
fairly suspect, since we will be restoring x1 from the stack anyway
(which lives adjacent in the pt_regs structure) and the only additional
cost is saving x0 back to pt_regs after the syscall handler, which could
be seen as a poor man's prefetch.

More importantly, this causes issues with the context tracking code.

The ct_user_enter macro ends up branching into C code, which is free to
use x0 as a scratch register and consequently leads to us returning junk
back to userspace as the syscall return value. Rather than special case
the context-tracking code, this patch removes the questionable
optimisation entirely.

Cc: Larry Bassel <larry.bassel@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: kconfig: Move LIST_POISON to a safe value

commit bf0c4e04732479f650ff59d1ee82de761c0071f0 upstream.

Move the poison pointer offset to 0xdead000000000000, a
recognized value that is not mappable by user-space exploits.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Thierry Strudel <tstrudel@google.com>
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "ext4: remove block_device_ejected"

commit bdfe0cbd746aa9b2509c2f6d6be17193cf7facd7 upstream.

This reverts commit 08439fec266c3cc5702953b4f54bdf5649357de0.

Unfortunately we still need to test for bdi->dev to avoid a crash when a
USB stick is yanked out while a file system is mounted:

   usb 2-2: USB disconnect, device number 2
   Buffer I/O error on dev sdb1, logical block 15237120, lost sync page write
   JBD2: Error -5 detected when updating journal superblock for sdb1-8.
   BUG: unable to handle kernel paging request at 34beb000
   IP: [<c136ce88>] __percpu_counter_add+0x18/0xc0
   *pdpt = 0000000023db9001 *pde = 0000000000000000
   Oops: 0000 [#1] SMP
   CPU: 0 PID: 4083 Comm: umount Tainted: G     U     OE   4.1.1-040101-generic #201507011435
   Hardware name: LENOVO 7675CTO/7675CTO, BIOS 7NETC2WW (2.22 ) 03/22/2011
   task: ebf06b50 ti: ebebc000 task.ti: ebebc000
   EIP: 0060:[<c136ce88>] EFLAGS: 00010082 CPU: 0
   EIP is at __percpu_counter_add+0x18/0xc0
   EAX: f21c8e88 EBX: f21c8e88 ECX: 00000000 EDX: 00000001
   ESI: 00000001 EDI: 00000000 EBP: ebebde60 ESP: ebebde40
    DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
   CR0: 8005003b CR2: 34beb000 CR3: 33354200 CR4: 000007f0
   Stack:
    c1abe100 edcb0098 edcb00ec ffffffff f21c8e68 ffffffff f21c8e68 f286d160
    ebebde84 c1160454 00000010 00000282 f72a77f8 00000984 f72a77f8 f286d160
    f286d170 ebebdea0 c11e613f 00000000 00000282 f72a77f8 edd7f4d0 00000000
   Call Trace:
    [<c1160454>] account_page_dirtied+0x74/0x110
    [<c11e613f>] __set_page_dirty+0x3f/0xb0
    [<c11e6203>] mark_buffer_dirty+0x53/0xc0
    [<c124a0cb>] ext4_commit_super+0x17b/0x250
    [<c124ac71>] ext4_put_super+0xc1/0x320
    [<c11f04ba>] ? fsnotify_unmount_inodes+0x1aa/0x1c0
    [<c11cfeda>] ? evict_inodes+0xca/0xe0
    [<c11b925a>] generic_shutdown_super+0x6a/0xe0
    [<c10a1df0>] ? prepare_to_wait_event+0xd0/0xd0
    [<c1165a50>] ? unregister_shrinker+0x40/0x50
    [<c11b92f6>] kill_block_super+0x26/0x70
    [<c11b94f5>] deactivate_locked_super+0x45/0x80
    [<c11ba007>] deactivate_super+0x47/0x60
    [<c11d2b39>] cleanup_mnt+0x39/0x80
    [<c11d2bc0>] __cleanup_mnt+0x10/0x20
    [<c1080b51>] task_work_run+0x91/0xd0
    [<c1011e3c>] do_notify_resume+0x7c/0x90
    [<c1720da5>] work_notify
   Code: 8b 55 e8 e9 f4 fe ff ff 90 90 90 90 90 90 90 90 90 90 90 55 89 e5 83 ec 20 89 5d f4 89 c3 89 75 f8 89 d6 89 7d fc 89 cf 8b 48 14 <64> 8b 01 89 45 ec 89 c2 8b 45 08 c1 fa 1f 01 75 ec 89 55 f0 89
   EIP: [<c136ce88>] __percpu_counter_add+0x18/0xc0 SS:ESP 0068:ebebde40
   CR2: 0000000034beb000
   ---[ end trace dd564a7bea834ecd ]---

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=101011

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: don't manipulate recovery flag when freezing no-journal fs

commit c642dc9e1aaed953597e7092d7df329e6234096e upstream.

At some point along this sequence of changes:

f6e63f9 ext4: fold ext4_nojournal_sops into ext4_sops
bb04457 ext4: support freezing ext2 (nojournal) file systems
9ca9238 ext4: Use separate super_operations structure for no_journal filesystems

ext4 started setting needs_recovery on filesystems without journals
when they are unfrozen. This makes no sense, and in fact confuses
blkid to the point where it doesn't recognize the filesystem at all.

(freeze ext2; unfreeze ext2; run blkid; see no output; run dumpe2fs,
see needs_recovery set on fs w/ no journal).

To fix this, don't manipulate the INCOMPAT_RECOVER feature on
filesystems without journals.

Reported-by: Stu Mark <smark@datto.com>
Reviewed-by: Jan Kara <jack@suse.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cxl: Fix unbalanced pci_dev_get in cxl_probe

commit 2925c2fdf1e0eb642482f5b30577e9435aaa8edb upstream.

Currently the first thing we do in cxl_probe is to grab a reference
on the pci device. Later on, we call device_register on our adapter.
In our remove path, we call device_unregister, but we never call
pci_dev_put. We therefore leak the device every time we do a
reflash.

device_register/unregister is sufficient to hold the reference.
Therefore, drop the call to pci_dev_get.

Here's why this is safe.
The proposed cxl_probe(pdev) calls cxl_adapter_init:
    a) init calls cxl_adapter_alloc, which creates a struct cxl,
       conventionally called adapter. This struct contains a
       device entry, adapter->dev.

    b) init calls cxl_configure_adapter, where we set
       adapter->dev.parent = &dev->dev (here dev is the pci dev)

So at this point, the cxl adapter's device's parent is the PCI
device that I want to be refcounted properly.

    c) init calls cxl_register_adapter
       *) cxl_register_adapter calls device_register(&adapter->dev)

So now we're in device_register, where dev is the adapter device, and
we want to know if the PCI device is safe after we return.

device_register(&adapter->dev) calls device_initialize() and then
device_add().

device_add() does a get_device(). device_add() also explicitly grabs
the device's parent, and calls get_device() on it:

         parent = get_device(dev->parent);

So therefore, device_register() takes a lock on the parent PCI dev,
which is what pci_dev_get() was guarding. pci_dev_get() can therefore
be safely removed.

Fixes: f204e0b8cedd ("cxl: Driver code for powernv PCIe based cards for userspace access")
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cxl: Remove racy attempt to force EEH invocation in reset

commit 9d8e27673c45927fee9e7d8992ffb325a6b0b0e4 upstream.

cxl_reset currently PERSTs the slot, and then repeatedly tries to
read MMIO space in order to kick off EEH.

There are 2 problems with this: it's unnecessary, and it's racy.

It's unnecessary because the PERST will bring down the PHB link.
That will be picked up by the CAPP, which will send out an HMI.
Skiboot, noticing an HMI from the CAPP, will send an OPAL
notification to the kernel, which will trigger EEH recovery.

It's also racy: the EEH recovery triggered by the CAPP will
eventually cause the MMIO space to have its mapping invalidated
and the pointer NULLed out. This races with our attempt to read
the MMIO space. This is causing OOPSes in testing.

Simply drop all the attempts to force EEH detection, and trust
that Skiboot will send the notification and that we'll act on it.
The Skiboot code to send the EEH notification has been in Skiboot
for as long as CAPP recovery has been supported, so we don't need
to worry about breaking obscure setups with ancient firmware.

Cc: Ryan Grimm <grimm@linux.vnet.ibm.com>
Fixes: 62fa19d4b4fd ("cxl: Add ability to reset the card")
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cxl: Allow release of contexts which have been OPENED but not STARTED

commit 7c26b9cf5347c24272152438cdd9675183804425 upstream.

If we open a context but do not start it (either because we do not attempt
to start it, or because it fails to start for some reason), we are left
with a context in state OPENED. Previously, cxl_release_context() only
allowed releasing contexts in state CLOSED, so attempting to release an
OPENED context would fail.

In particular, this bug causes available contexts to run out after some EEH
failures, where drivers attempt to release contexts that have failed to
start.

Allow releasing contexts in any state with a value lower than STARTED, i.e.
OPENED or CLOSED (we can't release a STARTED context as it's currently
using the hardware, and we assume that contexts in any new states which may
be added in future with a value higher than STARTED are also unsafe to
release).

Fixes: 6f7f0b3df6d4 ("cxl: Add AFU virtual PHB and kernel API")
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mac80211: enable assoc check for mesh interfaces

commit 3633ebebab2bbe88124388b7620442315c968e8f upstream.

We already set a station to be associated when peering completes, both
in user space and in the kernel. Thus we should always have an
associated sta before sending data frames to that station.

Failure to check assoc state can cause crashes in the lower-level driver
due to transmitting unicast data frames before driver sta structures
(e.g. ampdu state in ath9k) are initialized. This occurred when
forwarding in the presence of fixed mesh paths: frames were transmitted
to stations with whom we hadn't yet completed peering.

Reported-by: Alexis Green <agreen@cococorp.com>
Tested-by: Jesse Jones <jjones@cococorp.com>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MIPS: math-emu: Emulate missing BC1{EQ,NE}Z instructions

commit c909ca718e8f50cf484ef06a8dd935e738e8e53d upstream.

Commit c8a34581ec09 ("MIPS: Emulate the BC1{EQ,NE}Z FPU instructions")
added support for emulating the new R6 BC1{EQ,NE}Z branches but it missed
the case where the instruction that caused the exception was not on a DS.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Fixes: c8a34581ec09 ("MIPS: Emulate the BC1{EQ,NE}Z FPU instructions")
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10738/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MIPS: math-emu: Allow m{f,t}hc emulation on MIPS R6

commit e8f80cc1a6d80587136b015e989a12827e1fcfe5 upstream.

The mfhc/mthc instructions are supported on MIPS R6 so emulate
them if needed.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10737/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MIPS: CPS: use 32b accesses to GCRs

commit 90996511187d6282db6d02d3f97006b4dbb5c457 upstream.

Commit b677bc03d757 ("MIPS: cps-vec: Use macros for various arithmetics
and memory operations") replaced various load & store instructions
through cps-vec.S with the PTR_L & PTR_S macros. However it was somewhat
overzealous in doing so for CM GCR accesses, since the bit width of the
CM doesn't necessarily match that of the CPU. The registers accessed
(GCR_CL_COHERENCE & GCR_CL_ID) should be safe to simply always access
using 32b instructions, so do so in order to avoid issues when using a
32b CM with a 64b CPU.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: James Hogan <james.hogan@imgtec.com>
Patchwork: https://patchwork.linux-mips.org/patch/10864/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tg3: Fix temperature reporting

commit d3d11fe08ccc9bff174fc958722b5661f0932486 upstream.

The temperature registers appear to report values in degrees Celsius
while the hwmon API mandates values to be exposed in millidegrees
Celsius. Do the conversion so that the values reported by "sensors"
are correct.

Fixes: aed93e0bf493 ("tg3: Add hwmon support for temperature")
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Prashant Sreedharan <prashant@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

igb: Fix oops caused by missing queue pairing

commit 72ddef0506da852dc82f078f37ced8ef4d74a2bf upstream.

When initializing igb driver (e.g. 82576, I350), IGB_FLAG_QUEUE_PAIRS is
set if adapter->rss_queues exceeds half of max_rss_queues in
igb_init_queue_configuration().
On the other hand, IGB_FLAG_QUEUE_PAIRS is not set even if the number of
queues exceeds half of max_combined in igb_set_channels() when changing
the number of queues by "ethtool -L".
In this case, if numvecs is larger than MAX_MSIX_ENTRIES (10), the size
of adapter->msix_entries[], an overflow can occur in
igb_set_interrupt_capability(), which in turn leads to an oops.

Fix this problem as follows:
- When changing the number of queues by "ethtool -L", set
IGB_FLAG_QUEUE_PAIRS in the same way as initializing igb driver.
- When increasing the size of q_vector, reallocate it appropriately.
(With IGB_FLAG_QUEUE_PAIRS set, the size of q_vector gets larger.)

Another possible way to fix this problem is to cap the queues at its
initial number, which is the number of the initial online cpus. But this
is not the optimal way because we cannot increase queues when another
cpu becomes online.

Note that before commit cd14ef54d25b ("igb: Change to use statically
allocated array for MSIx entries"), this problem did not cause oops
but just made the number of queues become 1 because of entering msi_only
mode in igb_set_interrupt_capability().

Fixes: 907b7835799f ("igb: Add ethtool support to configure number of channels")
Signed-off-by: Shota Suzuki <suzuki_shota_t3@lab.ntt.co.jp>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rtlwifi: rtl8821ae: Fix an expression that is always false

commit 251086f588720277a6f5782020a648ce32c4e00b upstream.

In routine _rtl8821ae_set_media_status(), an incorrect mask results in a test
for AP status to always be false. Similar bugs were fixed in rtl8192cu and
rtl8192de, but this instance was missed at that time.

Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: David Binderman <dcb314@hotmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rtlwifi: rtl8192cu: Add new device ID

commit 1642d09fb9b128e8e538b2a4179962a34f38dff9 upstream.

The v2 of NetGear WNA1000M uses a different idProduct: USB ID 0846:9043

Signed-off-by: Adrien Schildknecht <adrien+dev@schischi.me>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

unshare: Unsharing a thread does not require unsharing a vm

commit 12c641ab8270f787dfcce08b5f20ce8b65008096 upstream.

In the logic in the initial commit of unshare made creating a new
thread group for a process, contingent upon creating a new memory
address space for that process. That is wrong. Two separate
processes in different thread groups can share a memory address space
and clone allows creation of such proceses.

This is significant because it was observed that mm_users > 1 does not
mean that a process is multi-threaded, as reading /proc/PID/maps
temporarily increments mm_users, which allows other processes to
(accidentally) interfere with unshare() calls.

Correct the check in check_unshare_flags() to test for
!thread_group_empty() for CLONE_THREAD, CLONE_SIGHAND, and CLONE_VM.
For sighand->count > 1 for CLONE_SIGHAND and CLONE_VM.
For !current_is_single_threaded instead of mm_users > 1 for CLONE_VM.

By using the correct checks in unshare this removes the possibility of
an accidental denial of service attack.

Additionally using the correct checks in unshare ensures that only an
explicit unshare(CLONE_VM) can possibly trigger the slow path of
current_is_single_threaded(). As an explict unshare(CLONE_VM) is
pointless it is not expected there are many applications that make
that call.

Fixes: b2e0d98705e60e45bbb3c0032c48824ad7ae0704 userns: Implement unshare of the user namespace
Reported-by: Ricky Zhou <rickyz@chromium.org>
Reported-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

blk-mq: fix race between timeout and freeing request

commit 0048b4837affd153897ed1222283492070027aa9 upstream.

Inside timeout handler, blk_mq_tag_to_rq() is called
to retrieve the request from one tag. This way is obviously
wrong because the request can be freed any time and some
fiedds of the request can't be trusted, then kernel oops
might be triggered[1].

Currently wrt. blk_mq_tag_to_rq(), the only special case is
that the flush request can share same tag with the request
cloned from, and the two requests can't be active at the same
time, so this patch fixes the above issue by updating tags->rqs[tag]
with the active request(either flush rq or the request cloned
from) of the tag.

Also blk_mq_tag_to_rq() gets much simplified with this patch.

Given blk_mq_tag_to_rq() is mainly for drivers and the caller must
make sure the request can't be freed, so in bt_for_each() this
helper is replaced with tags->rqs[tag].

[1] kernel oops log
[  439.696220] BUG: unable to handle kernel NULL pointer dereference at 0000000000000158^M
[  439.697162] IP: [<ffffffff812d89ba>] blk_mq_tag_to_rq+0x21/0x6e^M
[  439.700653] PGD 7ef765067 PUD 7ef764067 PMD 0 ^M
[  439.700653] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC ^M
[  439.700653] Dumping ftrace buffer:^M
[  439.700653]    (ftrace buffer empty)^M
[  439.700653] Modules linked in: nbd ipv6 kvm_intel kvm serio_raw^M
[  439.700653] CPU: 6 PID: 2779 Comm: stress-ng-sigfd Not tainted 4.2.0-rc5-next-20150805+ #265^M
[  439.730500] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011^M
[  439.730500] task: ffff880605308000 ti: ffff88060530c000 task.ti: ffff88060530c000^M
[  439.730500] RIP: 0010:[<ffffffff812d89ba>]  [<ffffffff812d89ba>] blk_mq_tag_to_rq+0x21/0x6e^M
[  439.730500] RSP: 0018:ffff880819203da0  EFLAGS: 00010283^M
[  439.730500] RAX: ffff880811b0e000 RBX: ffff8800bb465f00 RCX: 0000000000000002^M
[  439.730500] RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000^M
[  439.730500] RBP: ffff880819203db0 R08: 0000000000000002 R09: 0000000000000000^M
[  439.730500] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000202^M
[  439.730500] R13: ffff880814104800 R14: 0000000000000002 R15: ffff880811a2ea00^M
[  439.730500] FS:  00007f165b3f5740(0000) GS:ffff880819200000(0000) knlGS:0000000000000000^M
[  439.730500] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b^M
[  439.730500] CR2: 0000000000000158 CR3: 00000007ef766000 CR4: 00000000000006e0^M
[  439.730500] Stack:^M
[  439.730500]  0000000000000008 ffff8808114eed90 ffff880819203e00 ffffffff812dc104^M
[  439.755663]  ffff880819203e40 ffffffff812d9f5e 0000020000000000 ffff8808114eed80^M
[  439.755663] Call Trace:^M
[  439.755663]  <IRQ> ^M
[  439.755663]  [<ffffffff812dc104>] bt_for_each+0x6e/0xc8^M
[  439.755663]  [<ffffffff812d9f5e>] ? blk_mq_rq_timed_out+0x6a/0x6a^M
[  439.755663]  [<ffffffff812d9f5e>] ? blk_mq_rq_timed_out+0x6a/0x6a^M
[  439.755663]  [<ffffffff812dc1b3>] blk_mq_tag_busy_iter+0x55/0x5e^M
[  439.755663]  [<ffffffff812d88b4>] ? blk_mq_bio_to_request+0x38/0x38^M
[  439.755663]  [<ffffffff812d8911>] blk_mq_rq_timer+0x5d/0xd4^M
[  439.755663]  [<ffffffff810a3e10>] call_timer_fn+0xf7/0x284^M
[  439.755663]  [<ffffffff810a3d1e>] ? call_timer_fn+0x5/0x284^M
[  439.755663]  [<ffffffff812d88b4>] ? blk_mq_bio_to_request+0x38/0x38^M
[  439.755663]  [<ffffffff810a46d6>] run_timer_softirq+0x1ce/0x1f8^M
[  439.755663]  [<ffffffff8104c367>] __do_softirq+0x181/0x3a4^M
[  439.755663]  [<ffffffff8104c76e>] irq_exit+0x40/0x94^M
[  439.755663]  [<ffffffff81031482>] smp_apic_timer_interrupt+0x33/0x3e^M
[  439.755663]  [<ffffffff815559a4>] apic_timer_interrupt+0x84/0x90^M
[  439.755663]  <EOI> ^M
[  439.755663]  [<ffffffff81554350>] ? _raw_spin_unlock_irq+0x32/0x4a^M
[  439.755663]  [<ffffffff8106a98b>] finish_task_switch+0xe0/0x163^M
[  439.755663]  [<ffffffff8106a94d>] ? finish_task_switch+0xa2/0x163^M
[  439.755663]  [<ffffffff81550066>] __schedule+0x469/0x6cd^M
[  439.755663]  [<ffffffff8155039b>] schedule+0x82/0x9a^M
[  439.789267]  [<ffffffff8119b28b>] signalfd_read+0x186/0x49a^M
[  439.790911]  [<ffffffff8106d86a>] ? wake_up_q+0x47/0x47^M
[  439.790911]  [<ffffffff811618c2>] __vfs_read+0x28/0x9f^M
[  439.790911]  [<ffffffff8117a289>] ? __fget_light+0x4d/0x74^M
[  439.790911]  [<ffffffff811620a7>] vfs_read+0x7a/0xc6^M
[  439.790911]  [<ffffffff8116292b>] SyS_read+0x49/0x7f^M
[  439.790911]  [<ffffffff81554c17>] entry_SYSCALL_64_fastpath+0x12/0x6f^M
[  439.790911] Code: 48 89 e5 e8 a9 b8 e7 ff 5d c3 0f 1f 44 00 00 55 89
f2 48 89 e5 41 54 41 89 f4 53 48 8b 47 60 48 8b 1c d0 48 8b 7b 30 48 8b
53 38 <48> 8b 87 58 01 00 00 48 85 c0 75 09 48 8b 97 88 0c 00 00 eb 10
^M
[  439.790911] RIP  [<ffffffff812d89ba>] blk_mq_tag_to_rq+0x21/0x6e^M
[  439.790911]  RSP <ffff880819203da0>^M
[  439.790911] CR2: 0000000000000158^M
[  439.790911] ---[ end trace d40af58949325661 ]---^M

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

blk-mq: fix buffer overflow when reading sysfs file of 'pending'

commit 596f5aad2a704b72934e5abec1b1b4114c16f45b upstream.

There may be lots of pending requests so that the buffer of PAGE_SIZE
can't hold them at all.

One typical example is scsi-mq, the queue depth(.can_queue) of
scsi_host and blk-mq is quite big but scsi_device's queue_depth
is a bit small(.cmd_per_lun), then it is quite easy to have lots
of pending requests in hw queue.

This patch fixes the following warning and the related memory
destruction.

[  359.025101] fill_read_buffer: blk_mq_hw_sysfs_show+0x0/0x7d returned bad count^M
[  359.055595] irq event stamp: 15537^M
[  359.055606] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC ^M
[  359.055614] Dumping ftrace buffer:^M
[  359.055660]    (ftrace buffer empty)^M
[  359.055672] Modules linked in: nbd ipv6 kvm_intel kvm serio_raw^M
[  359.055678] CPU: 4 PID: 21631 Comm: stress-ng-sysfs Not tainted 4.2.0-rc5-next-20150805 #434^M
[  359.055679] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011^M
[  359.055682] task: ffff8802161cc000 ti: ffff88021b4a8000 task.ti: ffff88021b4a8000^M
[  359.055693] RIP: 0010:[<ffffffff811541c5>]  [<ffffffff811541c5>] __kmalloc+0xe8/0x152^M

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfc: nci: hci: Add check on skb nci_hci_send_cmd parameter

commit 5a9e0ffc0f128ecdf7c770f76c268e4f9f3c9118 upstream.

skb can be NULL and may lead to a NULL pointer error.

Add a check condition before setting HCI rx buffer.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfc: netlink: Warning fix

commit adca3c38d807b341a965d0aba8721d0784d8471b upstream.

When NFC_ATTR_VENDOR_DATA is not set, data_len is 0 and data is NULL.

Fixes the following warning:

net/nfc/netlink.c:1536:3: warning: 'data' may be used uninitialized
+in this function [-Wmaybe-uninitialized]
return cmd->doit(dev, data, data_len);

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfc: netlink: Add check on NFC_ATTR_VENDOR_DATA

commit fe202fe95564023223ce1910c9e352f391abb1d5 upstream.

NFC_ATTR_VENDOR_DATA is an optional vendor_cmd argument.
The current code was potentially using a non existing argument
leading to potential catastrophic results.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfc: st-nci: Free data with irrelevant NDLC PCB_SYNC value

commit 8b706884eac958ec16518315053f77e052627084 upstream.

PCB_SYNC different than PCB_TYPE_SUPERVISOR or PCB_TYPE_DATAFRAME
should be discarded.

Irrelevant data may be forwarded up to the ndlc state machine by
phys like spi to prevent missing potential data during "write"
transactions.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfc: st-nci: Remove data from ack_pending_q when receiving a SYNC_ACK

commit 1d816b6eb513498aa28a0ff1e4db7632bded1707 upstream.

When receiving a NDLC PCB_SYNC_ACK the pending data was never
removed from ack_pending_q and cleared.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFC: st-nci: fix use of uninitialized variables in error path

commit daaf1e1f1640eb11259954d1d847d8a72ab5b938 upstream.

st_nci_hci_load_session() calls kfree_skb() on unitialized
variables skb_pipe_info and skb_pipe_list if the call to
nci_hci_connect_gate() failed. Reword the error path to not use
these variables when they are not initialized. While at it, there
seemed to be a memory leak because skb_pipe_info was only freed
once, after the for-loop, even though several ones were created
by nci_hci_send_cmd.

Acked-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFC: st21nfca: fix use of uninitialized variables in error path

commit 5a3570061a131309143a49e4bbdbce7e23f261e7 upstream.

st21nfca_hci_load_session() calls kfree_skb() on unitialized
variables skb_pipe_info and skb_pipe_list if the call to
nfc_hci_connect_gate() failed. Reword the error path to not use
these variables when they are not initialized. While at it, there
seemed to be a memory leak because skb_pipe_info was only freed
once, after the for-loop, even though several ones were created
by nfc_hci_send_cmd.

Fixes: ec03ff1a8f9a
("NFC: st21nfca: Remove skb_pipe_list and skb_pipe_info
useless allocation")

Acked-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfc: st-nci: Fix non accurate comment for st_nci_i2c_read

commit e7723b33077b04648213f043bc22654c54e375e4 upstream.

Due to a copy and paste error st_nci_i2c_read still contains
st21nfca header comment.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfc: st-nci: Fix typo when changing from st21nfcb to st-nci

commit 30458aac63c89771d19f023083d64d018562812e upstream.

Replace ST21NFCB with ST_NCI or st21nfcb with st_nci as it
was forgotten in commit "nfc: st-nci: Rename st21nfcb to st-nci"
ed06aeefdac348cfb91a3db5fe1067e3202afd70

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfc: st-nci: Remove duplicate file platform_data/st_nci.h

commit 76b733d15874128ee2d0365b4cbe7d51decd8d37 upstream.

commit "nfc: st-nci: Rename st21nfcb to st-nci" adds
include/linux/platform_data/st_nci.h duplicated with
include/linux/platform_data/st-nci.h.

Only drivers/nfc/st-nci/i2c.c uses platform_data/st_nci.h.

Reported-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Linux 4.2.1

ARM: rockchip: fix broken build

commit cb8cc37f4d38d96552f2c52deb15e511cdacf906 upstream.

The following was seen in branch[0] build.

arch/arm/mach-rockchip/platsmp.c:154:23: error:
'rockchip_secondary_startup' undeclared (first use in this function)

branch[0]:
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip.git
v4.3-armsoc/soc

The broken build is caused by the commit fe4407c0dc58
("ARM: rockchip: fix the CPU soft reset").

Signed-off-by: Caesar Wang <wxt@rock-chips.com>
The breakage was a result of it being wrongly merged in my branch with
the cache invalidation rework from Russell 02b4e2756e01c
("ARM: v7 setup function should invalidate L1 cache").

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fs: create and use seq_show_option for escaping

commit a068acf2ee77693e0bf39d6e07139ba704f461c3 upstream.

Many file systems that implement the show_options hook fail to correctly
escape their output which could lead to unescaped characters (e.g.  new
lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files.  This
could lead to confusion, spoofed entries (resulting in things like
systemd issuing false d-bus "mount" notifications), and who knows what
else.  This looks like it would only be the root user stepping on
themselves, but it's possible weird things could happen in containers or
in other situations with delegated mount privileges.

Here's an example using overlay with setuid fusermount trusting the
contents of /proc/mounts (via the /etc/mtab symlink).  Imagine the use
of "sudo" is something more sneaky:

  $ BASE="ovl"
  $ MNT="$BASE/mnt"
  $ LOW="$BASE/lower"
  $ UP="$BASE/upper"
  $ WORK="$BASE/work/ 0 0
  none /proc fuse.pwn user_id=1000"
  $ mkdir -p "$LOW" "$UP" "$WORK"
  $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
  $ cat /proc/mounts
  none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
  none /proc fuse.pwn user_id=1000 0 0
  $ fusermount -u /proc
  $ cat /proc/mounts
  cat: /proc/mounts: No such file or directory

This fixes the problem by adding new seq_show_option and
seq_show_option_n helpers, and updating the vulnerable show_option
handlers to use them as needed.  Some, like SELinux, need to be open
coded due to unusual existing escape mechanisms.

[akpm@linux-foundation.org: add lost chunk, per Kees]
[keescook@chromium.org: seq_show_option should be using const parameters]
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Jan Kara <jack@suse.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Cc: J. R. Okajima <hooanon05g@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

memory-hotplug: add hot-added memory ranges to memblock before allocate node_data for a node.

commit 7f36e3e56db1ae75d1e157011b3cb2e0957f0a7e upstream.

Commit f9126ab9241f ("memory-hotplug: fix wrong edge when hot add a new
node") hot-added memory range to memblock, after creating pgdat for new
node.

But there is a problem:

  add_memory()
  |--> hotadd_new_pgdat()
       |--> free_area_init_node()
            |--> get_pfn_range_for_nid()
                 |--> find start_pfn and end_pfn in memblock
  |--> ......
  |--> memblock_add_node(start, size, nid)    --------    Here, just too late.

get_pfn_range_for_nid() will find that start_pfn and end_pfn are both 0.
As a result, when adding memory, dmesg will give the following wrong
message.

  Initmem setup node 5 [mem 0x0000000000000000-0xffffffffffffffff]
  On node 5 totalpages: 0
  Built 5 zonelists in Node order, mobility grouping on.  Total pages: 32588823
  Policy zone: Normal
  init_memory_mapping: [mem 0x60000000000-0x607ffffffff]

The solution is simple, just add the memory range to memblock a little
earlier, before hotadd_new_pgdat().

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ocfs2: direct write will call ocfs2_rw_unlock() twice when doing aio+dio

commit aa1057b3dec478b20c77bad07442318ae36d893c upstream.

ocfs2_file_write_iter() is usng the wrong return value ('written'). This
will cause ocfs2_rw_unlock() be called both in write_iter & end_io,
triggering a BUG_ON.

This issue was introduced by commit 7da839c47589 ("ocfs2: use
__generic_file_write_iter()").

Orabug: 21612107
Fixes: 7da839c47589 ("ocfs2: use __generic_file_write_iter()")
Signed-off-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hpfs: update ctime and mtime on directory modification

commit f49a26e7718dd30b49e3541e3e25aecf5e7294e2 upstream.

Update ctime and mtime when a directory is modified. (though OS/2 doesn't
update them anyway)

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fs: Set the size of empty dirs to 0.

commit 4b75de8615050c1b0dd8d7794838c42f74ed36ba upstream.

Before the make_empty_dir_inode calls were introduce into proc, sysfs,
and sysctl those directories when stated reported an i_size of 0.
make_empty_dir_inode started reporting an i_size of 2. At least one
userspace application depended on stat returning i_size of 0. So
modify make_empty_dir_inode to cause an i_size of 0 to be reported for
these directories.

Reported-by: Tejun Heo <tj@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dm cache: fix leaking of deferred bio prison cells

commit 9153df7405ae04c1b0466de720e0a685cfea1a3a upstream.

There were two cases where dm_cell_visit_release() was being called,
which removes the cell from the prison's rbtree, but the callers didn't
also return the cell to the mempool. Fix this by having them call
free_prison_cell().

This leak manifested as the 'kmalloc-96' slab growing until OOM.

Fixes: 651f5fa2a3 ("dm cache: defer whole cells")
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>