cgroup_threadgroup_rwsem is acquired in read mode during process exit
and fork. It is also grabbed in write mode during
__cgroups_proc_write(). I've recently run into a scenario with lots
of memory pressure and OOM and I am beginning to see
This thread is waiting on the reader of cgroup_threadgroup_rwsem to
exit. The reader itself is under memory pressure and has gone into
reclaim after fork. There are times the reader also ends up waiting on
oom_lock as well.
In the meanwhile, all processes exiting/forking are blocked almost
stalling the system.
This patch moves the threadgroup_change_begin from before
cgroup_fork() to just before cgroup_canfork(). There is no nee to
worry about threadgroup changes till the task is actually added to the
threadgroup. This avoids having to call reclaim with
cgroup_threadgroup_rwsem held.
After arbitrary bio size was introduced, the incoming bio may
be very big. We have to split the bio into small bios so that
each holds at most BIO_MAX_PAGES bvecs for safety reason, such
as bio_clone().
The issue can be reproduced by the following steps:
- create one raid1 over two virtio-blk
- build bcache device over the above raid1 and another cache device
and bucket size is set as 2Mbytes
- set cache mode as writeback
- run random write over ext4 on the bcache device
Fixes: 54efd50(block: make generic_make_request handle arbitrarily sized bios) Reported-by: Sebastian Roesner <sroesner-kernelorg@roesner-online.de> Reported-by: Eric Wheeler <bcache@lists.ewheeler.net> Cc: Shaohua Li <shli@fb.com> Acked-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
blk_set_queue_dying() can be called while another thread is
submitting I/O or changing queue flags, e.g. through dm_stop_queue().
Hence protect the QUEUE_FLAG_DYING flag change with locking.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We temporally change checksum fields in buffers of some types of
metadata into '0' for verifying the checksum values. By doing this
without locking the buffer, some metadata's checksums, which are
being committed or written back to the storage, could be damaged.
In our test, several metadata blocks were found with damaged metadata
checksum value during recovery process. When we only verify the
checksum value, we have to avoid modifying checksum fields directly.
Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: Török Edwin <edwin@etorok.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we need to move xattrs into external xattr block, we call
ext4_xattr_block_set() from ext4_expand_extra_isize_ea(). That may end
up calling ext4_mark_inode_dirty() again which will recurse back into
the inode expansion code leading to deadlocks.
Protect from recursion using EXT4_STATE_NO_EXPAND inode flag and move
its management into ext4_expand_extra_isize_ea() since its manipulation
is safe there (due to xattr_sem) from possible races with
ext4_xattr_set_handle() which plays with it as well.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We did not count with the padding of xattr value when computing desired
shift of xattrs in the inode when expanding i_extra_isize. As a result
we could create unaligned start of inline xattrs. Account for alignment
properly.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When multiple xattrs need to be moved out of inode, we did not properly
recompute total size of xattr headers in the inode and the new header
position. Thus when moving the second and further xattr we asked
ext4_xattr_shift_entries() to move too much and from the wrong place,
resulting in possible xattr value corruption or general memory
corruption.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The code in ext4_expand_extra_isize_ea() treated new_extra_isize
argument sometimes as the desired target i_extra_isize and sometimes as
the amount by which we need to grow current i_extra_isize. These happen
to coincide when i_extra_isize is 0 which used to be the common case and
so nobody noticed this until recently when we added i_projid to the
inode and so i_extra_isize now needs to grow from 28 to 32 bytes.
The result of these bugs was that we sometimes unnecessarily decided to
move xattrs out of inode even if there was enough space and we often
ended up corrupting in-inode xattrs because arguments to
ext4_xattr_shift_entries() were just wrong. This could demonstrate
itself as BUG_ON in ext4_xattr_shift_entries() triggering.
Fix the problem by introducing new isize_diff variable and use it where
appropriate.
Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Using INVALID_[UG]ID for the LSM file creation context doesn't
make sense, so return an error if the inode passed to
set_create_file_as() has an invalid id.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Filesystem uids which don't map into a user namespace may result
in inode->i_uid being INVALID_UID. A symlink and its parent
could have different owners in the filesystem can both get
mapped to INVALID_UID, which may result in following a symlink
when this would not have otherwise been permitted when protected
symlinks are enabled.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The capability check should not be audited since it is only being used
to determine the inode permissions. A failed check does not indicate a
violation of security policy but, when an LSM is enabled, a denial audit
message was being generated.
The denial audit message caused confusion for some application authors
because root-running Go applications always triggered the denial. To
prevent this confusion, the capability check in net_ctl_permissions() is
switched to the noaudit variant.
When checking the current cred for a capability in a specific user
namespace, it isn't always desirable to have the LSMs audit the check.
This patch adds a noaudit variant of ns_capable() for when those
situations arise.
The common logic between ns_capable() and the new ns_capable_noaudit()
is moved into a single, shared function to keep duplicated code to a
minimum and ease maintainability.
When finding a child profile via an rcu critical section, the profile
may be put and scheduled for deletion after the child is found but
before its refcount is incremented.
Protect against this by repeating the lookup if the profiles refcount
is 0 and is one its way to deletion.
Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Seth Arnold <seth.arnold@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If there were less than 2 entries in the multipath list, then
xprt_iter_next_entry_multiple() would never advance beyond the
first entry, which is correct for round robin behaviour, but not
for the list iteration.
The end result would be infinite looping in rpc_clnt_iterate_for_each_xprt()
as we would never see the xprt == NULL condition fulfilled.
Reported-by: Oleg Drokin <green@linuxhacker.ru> Fixes: 80b14d5e61ca ("SUNRPC: Add a structure to track multiple transports") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: Jason L Tibbitts III <tibbs@math.uh.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Attributes declared with __ATTR_PREALLOC use sysfs_kf_read() which returns
zero bytes for non-zero offset. This breaks script checkarray in mdadm tool
in debian where /bin/sh is 'dash' because its builtin 'read' reads only one
byte at a time. Script gets 'i' instead of 'idle' when reads current action
from /sys/block/$dev/md/sync_action and as a result does nothing.
This patch adds trivial implementation of partial read: generate whole
string and move required part into buffer head.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Fixes: 4ef67a8c95f3 ("sysfs/kernfs: make read requests on pre-alloc files use the buffer.") Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=787950 Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit e647b532275b ("ACPI: Add early device probing infrastructure")
introduced code that allows inserting driver specific
struct acpi_probe_entry probe entries into ACPI linker sections
(one per-subsystem, eg irqchip, clocksource) that are then walked
to retrieve the data and function hooks required to probe the
respective kernel components.
Probing for all entries in a section is triggered through
the __acpi_probe_device_table() function, that in turn, according
to the table ID a given probe entry reports parses the table
with the function retrieved from the respective section structures
(ie struct acpi_probe_entry). Owing to the current ACPI table
parsing implementation, the __acpi_probe_device_table() function
has to share global variables with the acpi_match_madt() function, so
in order to guarantee mutual exclusion locking is required
between the two functions.
Current kernel code implements the locking through the acpi_probe_lock
spinlock; this has the side effect of requiring all code called
within the lock (ie struct acpi_probe_entry.probe_{table/subtbl} hooks)
not to sleep.
However, kernel subsystems that make use of the early probing
infrastructure are relying on kernel APIs that may sleep (eg
irq_domain_alloc_fwnode(), among others) in the function calls
pointed at by struct acpi_probe_entry.{probe_table/subtbl} entries
(eg gic_v2_acpi_init()), which is a bug.
Since __acpi_probe_device_table() is called from context
that is allowed to sleep the acpi_probe_lock spinlock can be replaced
with a mutex; this fixes the issue whilst still guaranteeing
mutual exclusion.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Fixes: e647b532275b (ACPI: Add early device probing infrastructure) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the ACPI_DECLARE_PROBE_ENTRY macro was added in
commit e647b532275b ("ACPI: Add early device probing infrastructure"),
a stub macro adding an unused entry was added for the !CONFIG_ACPI
Kconfig option case to make sure kernel code making use of the
macro did not require to be guarded within CONFIG_ACPI in order to
be compiled.
The stub macro was never used since all kernel code that defines
ACPI_DECLARE_PROBE_ENTRY entries is currently guarded within
CONFIG_ACPI; it contains a typo that should be nonetheless fixed.
Fix the typo in the stub (ie !CONFIG_ACPI) ACPI_DECLARE_PROBE_ENTRY()
macro so that it can actually be used if needed.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Fixes: e647b532275b (ACPI: Add early device probing infrastructure) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit ebb657babfa9 ("staging: comedi: ni_mio_common: clarify the
cmd->start_arg validation and use") introduced a backwards compatibility
issue in the use of asynchronous commands on the AO subdevice when
`start_src` is `TRIG_EXT`. Valid values for `start_src` are `TRIG_INT`
(for internal, software trigger), and `TRIG_EXT` (for external trigger).
When set to `TRIG_EXT`. In both cases, the driver relies on an
internal, software trigger to set things up (allowing the user
application to write sufficient samples to the data buffer before the
trigger), so it acts as a software "pre-trigger" in the `TRIG_EXT` case.
The software trigger is handled by `ni_ao_inttrig()`.
Prior to the above change, when `start_src` was `TRIG_INT`, `start_arg`
was required to be 0, and `ni_ao_inttrig()` checked that the software
trigger number was also 0. After the above change, when `start_src` was
`TRIG_INT`, any value was allowed for `start_arg`, and `ni_ao_inttrig()`
checked that the software trigger number matched this `start_arg` value.
The backwards compatibility issue is that the internal trigger number
now has to match `start_arg` when `start_src` is `TRIG_EXT` when it
previously had to be 0.
Fix the backwards compatibility issue in `ni_ao_inttrig()` by always
allowing software trigger number 0 when `start_src` is something other
than `TRIG_INT`.
Thanks to Spencer Olson for reporting the issue.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Reported-by: Spencer Olson <olsonse@umich.edu> Fixes: ebb657babfa9 ("staging: comedi: ni_mio_common: clarify the cmd->start_arg validation and use") Reviewed-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 73e0e4dfed4c ("staging: comedi: comedi_test: fix timer lock-up")
fixed a lock-up in the timer routine `waveform_ai_timer()` (which was
called `waveform_ai_interrupt()` at the time) caused by
commit 240512474424 ("staging: comedi: comedi_test: use
comedi_handle_events()"). However, it introduced a race condition that
can result in the timer routine misbehaving, such as accessing freed
memory or dereferencing a NULL pointer.
73e0... changed the timer routine to do nothing unless a
`WAVEFORM_AI_RUNNING` flag was set, and changed `waveform_ai_cancel()`
to clear the flag and replace a call to `del_timer_sync()` with a call
to `del_timer()`. `waveform_ai_cancel()` may be called from the timer
routine itself (via `comedi_handle_events()`), or from `do_cancel()`.
(`do_cancel()` is called as a result of a file operation (usually a
`COMEDI_CANCEL` ioctl command, or a release), or during device removal.)
When called from `do_cancel()`, the call to `waveform_ai_cancel()` is
followed by a call to `do_become_nonbusy()`, which frees up stuff for
the current asynchronous command under the assumption that it is now
safe to do so. The race condition occurs when the timer routine
`waveform_ai_timer()` checks the `WAVEFORM_AI_RUNNING` flag just before
it is cleared by `waveform_ai_cancel()`, and is still running during the
call to `do_become_nonbusy()`. In particular, it can lead to a NULL
pointer dereference:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffc0c63add>] waveform_ai_timer+0x17d/0x290 [comedi_test]
That corresponds to this line in `waveform_ai_timer()`:
unsigned int chanspec = cmd->chanlist[async->cur_chan];
but `do_become_nonbusy()` frees `cmd->chanlist` and sets it to `NULL`.
Fix the race by calling `del_timer_sync()` instead of `del_timer()` in
`waveform_ai_cancel()` when not in an interrupt context. The only time
`waveform_ai_cancel()` is called in an interrupt context is when it is
called from the timer routine itself, via `comedi_handle_events()`.
There is no longer any need for the `WAVEFORM_AI_RUNNING` flag, so get
rid of it.
The bug was copied from the AI subdevice to the AO when support for
commands on the AO subdevice was added by commit 0cf55bbef2f9 ("staging:
comedi: comedi_test: implement commands on AO subdevice"). That
involves the timer routine `waveform_ao_timer()`, the comedi "cancel"
routine `waveform_ao_cancel()`, and the flag `WAVEFORM_AO_RUNNING`. Fix
it in the same way as for the AI subdevice.
`daqboard2000_find_boardinfo()` is supposed to check if the
DaqBoard/2000 series model is supported, based on the PCI subvendor and
subdevice ID. The current code is wrong as it is comparing the PCI
device's subdevice ID to an expected, fixed value for the subvendor ID.
It should be comparing the PCI device's subvendor ID to this fixed
value. Correct it.
Fixes: 7e8401b23e7f ("staging: comedi: daqboard2000: add back subsystem_device check") Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Right now, if it's an open of a negative dentry, a race is possible
with several openers who all try to instantiate/rehash the same
dentry and would hit a BUG_ON in d_add.
But in fact if we got a negative dentry in atomic_open, that means
we just revalidated it so no point in talking to MDS at all,
just return ENOENT and make the race go away completely.
When the controller is configured to be dual role and it's in host mode,
if bind udc and gadgt driver, those gadget operations will do gadget
disconnect and finally pull down DP line, which will break host function.
Signed-off-by: Li Jun <jun.li@nxp.com> Signed-off-by: Peter Chen <peter.chen@nxp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
UBSAN complains about a left shift by -1 in proc_do_submiturb(). This
can occur when an URB is submitted for a bulk or control endpoint on
a high-speed device, since the code doesn't bother to check the
endpoint type; normally only interrupt or isochronous endpoints have
a nonzero bInterval value.
Aside from the fact that the operation is illegal, it shouldn't matter
because the result isn't used. Still, in theory it could cause a
hardware exception or other problem, so we should work around it.
This patch avoids doing the left shift unless the shift amount is >= 0.
The same piece of code has another problem. When checking the device
speed (the exponential encoding for interrupt endpoints is used only
by high-speed or faster devices), we need to look for speed >=
USB_SPEED_SUPER as well as speed == USB_SPEED HIGH. The patch adds
this check.
The USB-DMAC's interruption happens even if the CHCR.DE is not set to 1
because CHCR.NULLE is set to 1. So, this driver should call
usb_dmac_isr_transfer_end() if the DE bit is set to 1 only. Otherwise,
the desc is possible to be NULL in the usb_dmac_isr_transfer_end().
The commit 4097461897df ("Input: i8042 - break load dependency ...")
correctly set up ps2_cmd_mutex pointer for the KBD port but forgot to do
the same for AUX port(s), which results in communication on KBD and AUX
ports to clash with each other.
Fixes: 4097461897df ("Input: i8042 - break load dependency ...") Reported-by: Bruno Wolff III <bruno@wolff.to> Tested-by: Bruno Wolff III <bruno@wolff.to> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As explained in 1407814240-4275-1-git-send-email-decui@microsoft.com we
have a hard load dependency between i8042 and atkbd which prevents
keyboard from working on Gen2 Hyper-V VMs.
> hyperv_keyboard invokes serio_interrupt(), which needs a valid serio
> driver like atkbd.c. atkbd.c depends on libps2.c because it invokes
> ps2_command(). libps2.c depends on i8042.c because it invokes
> i8042_check_port_owner(). As a result, hyperv_keyboard actually
> depends on i8042.c.
>
> For a Generation 2 Hyper-V VM (meaning no i8042 device emulated), if a
> Linux VM (like Arch Linux) happens to configure CONFIG_SERIO_I8042=m
> rather than =y, atkbd.ko can't load because i8042.ko can't load(due to
> no i8042 device emulated) and finally hyperv_keyboard can't work and
> the user can't input: https://bugs.archlinux.org/task/39820
> (Ubuntu/RHEL/SUSE aren't affected since they use CONFIG_SERIO_I8042=y)
To break the dependency we move away from using i8042_check_port_owner()
and instead allow serio port owner specify a mutex that clients should use
to serialize PS/2 command stream.
Reported-by: Mark Laws <mdl@60hz.org> Tested-by: Mark Laws <mdl@60hz.org> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The map_offset variable is specific to the register and needs to be reset
in the loop. Otherwise, subsequent register's subpacket maps will have
their bits set at the wrong index.
Signed-off-by: Andrew Duggan <aduggan@synaptics.com> Tested-by: Nitin Chaudhary <nitinchaudhary1289@gmail.com> Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit fe6b0dfaba68 ("Input: tegra-kbc - use reset framework")
accidentally converted _deassert to _assert, so there is no code
to wake up this hardware.
commit 909c3a22da3 (Btrfs: fix loading of orphan roots leading to BUG_ON)
avoids the BUG_ON but can add an aliased root to the dead_roots list or
leak the root.
Since we've already been loading roots into the radix tree, we should
use it before looking the root up on disk.
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The qgroup_flags field is overloaded such that it reflects the on-disk
status of qgroups and the runtime state. The BTRFS_QGROUP_STATUS_FLAG_RESCAN
flag is used to indicate that a rescan operation is in progress, but if
the file system is unmounted while a rescan is running, the rescan
operation is paused. If the file system is then mounted read-only,
the flag will still be present but the rescan operation will not have
been resumed. When we go to umount, btrfs_qgroup_wait_for_completion
will see the flag and interpret it to mean that the rescan worker is
still running and will wait for a completion that will never come.
This patch uses a separate flag to indicate when the worker is
running. The locking and state surrounding the qgroup rescan worker
needs a lot of attention beyond this patch but this is enough to
avoid a hung umount.
Signed-off-by; Jeff Mahoney <jeffm@suse.com> Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Chris Mason <clm@fb.com>
We wait on qgroup rescan completion in three places: file system
shutdown, the quota disable ioctl, and the rescan wait ioctl. If the
user sends a signal while we're waiting, we continue happily along. This
is expected behavior for the rescan wait ioctl. It's racy in the shutdown
path but mostly works due to other unrelated synchronization points.
In the quota disable path, it Oopses the kernel pretty much immediately.
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For DAX inodes we need to be careful to never have page cache pages in
the mapping->page_tree. This radix tree should be composed only of DAX
exceptional entries and zero pages.
ltp's readahead02 test was triggering a warning because we were trying
to insert a DAX exceptional entry but found that a page cache page had
already been inserted into the tree. This page was being inserted into
the radix tree in response to a readahead(2) call.
Readahead doesn't make sense for DAX inodes, but we don't want it to
report a failure either. Instead, we just return success and don't do
any work.
Link: http://lkml.kernel.org/r/20160824221429.21158-1-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reported-by: Jeff Moyer <jmoyer@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jan Kara <jack@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The data offset for a dax region needs to account for a reservation in
the resource range. Otherwise, device-dax is allowing mappings directly
into the memmap or device-info-block area with crash signatures like the
following:
While adding proper userfaultfd_wp support with bits in pagetable and
swap entry to avoid false positives WP userfaults through swap/fork/
KSM/etc, I've been adding a framework that mostly mirrors soft dirty.
So I noticed in one place I had to add uffd_wp support to the pagetables
that wasn't covered by soft_dirty and I think it should have.
Example: in the THP migration code migrate_misplaced_transhuge_page()
pmd_mkdirty is called unconditionally after mk_huge_pmd.
That sets soft dirty too (it's a false positive for soft dirty, the soft
dirty bit could be more finegrained and transfer the bit like uffd_wp
will do.. pmd/pte_uffd_wp() enforces the invariant that when it's set
pmd/pte_write is not set).
However in the THP split there's no unconditional pmd_mkdirty after
mk_huge_pmd and pte_swp_mksoft_dirty isn't called after the migration
entry is created. The code sets the dirty bit in the struct page
instead of setting it in the pagetable (which is fully equivalent as far
as the real dirty bit is concerned, as the whole point of pagetable bits
is to be eventually flushed out of to the page, but that is not
equivalent for the soft-dirty bit that gets lost in translation).
This was found by code review only and totally untested as I'm working
to actually replace soft dirty and I don't have time to test potential
soft dirty bugfixes as well :).
Transfer the soft_dirty from pmd to pte during THP splits.
This fix avoids losing the soft_dirty bit and avoids userland memory
corruption in the checkpoint.
Fixes: eef1b3ba053aa6 ("thp: implement split_huge_pmd()") Link: http://lkml.kernel.org/r/1471610515-30229-2-git-send-email-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Pavel Emelyanov <xemul@virtuozzo.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
seq_read() is a nasty piece of work, not to mention buggy.
It has (I think) an old bug which allows unprivileged userspace to read
beyond the end of m->buf.
I was getting these:
BUG: KASAN: slab-out-of-bounds in seq_read+0xcd2/0x1480 at addr ffff880116889880
Read of size 2713 by task trinity-c2/1329
CPU: 2 PID: 1329 Comm: trinity-c2 Not tainted 4.8.0-rc1+ #96
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
Call Trace:
kasan_object_err+0x1c/0x80
kasan_report_error+0x2cb/0x7e0
kasan_report+0x4e/0x80
check_memory_region+0x13e/0x1a0
kasan_check_read+0x11/0x20
seq_read+0xcd2/0x1480
proc_reg_read+0x10b/0x260
do_loop_readv_writev.part.5+0x140/0x2c0
do_readv_writev+0x589/0x860
vfs_readv+0x7b/0xd0
do_readv+0xd8/0x2c0
SyS_readv+0xb/0x10
do_syscall_64+0x1b3/0x4b0
entry_SYSCALL64_slow_path+0x25/0x25
Object at ffff880116889100, in cache kmalloc-4096 size: 4096
Allocated:
PID = 1329
save_stack_trace+0x26/0x80
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
__kmalloc+0x1aa/0x4a0
seq_buf_alloc+0x35/0x40
seq_read+0x7d8/0x1480
proc_reg_read+0x10b/0x260
do_loop_readv_writev.part.5+0x140/0x2c0
do_readv_writev+0x589/0x860
vfs_readv+0x7b/0xd0
do_readv+0xd8/0x2c0
SyS_readv+0xb/0x10
do_syscall_64+0x1b3/0x4b0
return_from_SYSCALL_64+0x0/0x6a
Freed:
PID = 0
(stack is not available)
Memory state around the buggy address: ffff88011688a000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff88011688a080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88011688a100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^ ffff88011688a180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88011688a200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Disabling lock debugging due to kernel taint
This seems to be the same thing that Dave Jones was seeing here:
https://lkml.org/lkml/2016/8/12/334
There are multiple issues here:
1) If we enter the function with a non-empty buffer, there is an attempt
to flush it. But it was not clearing m->from after doing so, which
means that if we try to do this flush twice in a row without any call
to traverse() in between, we are going to be reading from the wrong
place -- the splat above, fixed by this patch.
2) If there's a short write to userspace because of page faults, the
buffer may already contain multiple lines (i.e. pos has advanced by
more than 1), but we don't save the progress that was made so the
next call will output what we've already returned previously. Since
that is a much less serious issue (and I have a headache after
staring at seq_read() for the past 8 hours), I'll leave that for now.
Link: http://lkml.kernel.org/r/1471447270-32093-1-git-send-email-vegard.nossum@oracle.com Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Reported-by: Dave Jones <davej@codemonkey.org.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The UserMode (UM) Linux build was failing in gpiolib-of as it requires
ioremap()/iounmap() to exist, which is absent from UM. The non-existence
of IO memory is negatively defined as CONFIG_NO_IOMEM which means we
need to depend on HAS_IOMEM.
In case of error, the function usb_get_phy() returns ERR_PTR() and never
returns NULL. The NULL test in the return value check should be replaced
with IS_ERR().
Fixes: b5a2875605ca ("usb: renesas_usbhs: Allow an OTG PHY driver to
provide VBUS") Acked-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ 187.235190] scsi host2: Avago SAS based MegaRAID driver
[ 191.112365] megaraid_sas 0000:89:00.0: BAR 0: can't reserve [io 0x0000-0x00ff]
[ 191.120548] megaraid_sas 0000:89:00.0: IO memory region busy!
and the card has resource like,
[ 125.097714] pci 0000:89:00.0: [1000:005d] type 00 class 0x010400
[ 125.104446] pci 0000:89:00.0: reg 0x10: [io 0x0000-0x00ff]
[ 125.110686] pci 0000:89:00.0: reg 0x14: [mem 0xce400000-0xce40ffff 64bit]
[ 125.118286] pci 0000:89:00.0: reg 0x1c: [mem 0xce300000-0xce3fffff 64bit]
[ 125.125891] pci 0000:89:00.0: reg 0x30: [mem 0xce200000-0xce2fffff pref]
that does not io port resource allocated from BIOS, and kernel can not
assign one as io port shortage.
The driver is only looking for MEM, and should not fail.
It turns out megasas_init_fw() etc are using bar index as mask. index 1
is used as mask 1, so that pci_request_selected_regions() is trying to
request BAR0 instead of BAR1.
Fix all related reference.
Fixes: b6d5d8808b4c ("megaraid_sas: Use lowest memory bar for SR-IOV VF support") Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mpt3sas crashes on resume after suspend with WarpDrive flash cards. The
reply_post_host_index array is not set back up after the resume, and we
deference a stale pointer in _base_interrupt().
Move the reply_post_host_index array setup into
mpt3sas_base_map_resources(), which is also in the resume path.
Signed-off-by: Greg Edwards <gedwards@fireweed.org> Acked-by: Chaitra P B <chaitra.basappa@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cros_ec_cmd_xfer returns success status if the command transport
completes successfully, but the execution result is incorrectly ignored.
In many cases, the execution result is assumed to be successful, leading
to ignored errors and operating on uninitialized data.
We've recently introduced the cros_ec_cmd_xfer_status() helper to avoid these
problems. Let's use it.
[Regarding the 'Fixes' tag; there is significant refactoring since the driver's
introduction, but the underlying logical error exists throughout I believe]
Fixes: 9d230c9e4f4e ("i2c: ChromeOS EC tunnel driver") Signed-off-by: Brian Norris <briannorris@chromium.org> Reviewed-by: Javier Martinez Canillas <javier@osg.samsung.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In aacraid's ioctl_send_fib() we do two fetches from userspace, one the
get the fib header's size and one for the fib itself. Later we use the
size field from the second fetch to further process the fib. If for some
reason the size from the second fetch is different than from the first
fix, we may encounter an out-of- bounds access in aac_fib_send(). We
also check the sender size to insure it is not out of bounds. This was
reported in https://bugzilla.kernel.org/show_bug.cgi?id=116751 and was
assigned CVE-2016-6480.
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com> Fixes: 7c00ffa31 '[SCSI] 2.6 aacraid: Variable FIB size (updated patch)' Signed-off-by: Dave Carroll <david.carroll@microsemi.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
trace_hardirqs_on_caller() in lockdep.c expects to be called before, not
after interrupts are actually enabled.
The following comment in kernel/locking/lockdep.c substantiates this
claim:
"
/*
* We're enabling irqs and according to our state above irqs weren't
* already enabled, yet we find the hardware thinks they are in fact
* enabled.. someone messed up their IRQ state tracing.
*/
"
An example can be found in include/linux/irqflags.h:
do { trace_hardirqs_on(); raw_local_irq_enable(); } while (0)
Without this change, we hit the following DEBUG_LOCKS_WARN_ON.
| CC mm/memory.o
| In file included from ../mm/memory.c:53:0:
| ../include/linux/pfn_t.h: In function ‘pfn_t_pte’:
| ../include/linux/pfn_t.h:78:2: error: conversion to non-scalar type requested
| return pfn_pte(pfn_t_to_pfn(pfn), pgprot);
With STRICT_MM_TYPECHECKS pte_t is a struct and the offending code
forces a cast which ends up shifting a struct and hence the gcc warning.
Note that in recent past some of the arches (aarch64, s390) made
STRICT_MM_TYPECHECKS default, but we don't for ARC as this leads to slightly
worse generated code, given ARC ABI definition of returning structs
(which pte_t would become)
Quoting from ARC ABI...
"Results of type struct are returned in a caller-supplied temporary
variable whose address is passed in r0.
For such functions, the arguments are shifted so that they are
passed in r1 and up."
So
- struct to be returned would be allocated on stack requiring extra
code at call sites
- callee updates stack memory to facilitate the return (vs. simple
MOV into return reg r0)
Hence STRICT_MM_TYPECHECKS is not enabled by default for ARC
User mode callee regs are explicitly collected before signal delivery or
breakpoint trap. r25 is special for kernel as it serves as task pointer,
so user mode value is clobbered very early. It is saved in pt_regs where
generally only scratch (aka caller saved) regs are saved.
The code to access the corresponding pt_regs location had a subtle bug as
it was using load/store with scaling of offset, whereas the offset was already
byte wise correct. So fix this by replacing LD.AS with a standard LD
Unfortunately, there's two situations where we lose hpd right now:
- Runtime suspend
- When we've shut off all of the power wells on Valleyview/Cherryview
While it would be nice if this didn't cause issues, this has the
ability to get us in some awkward states where a user won't be able to
get their display to turn on. For instance; if we boot a Valleyview
system without any monitors connected, it won't need any of it's power
wells and thus shut them off. Since this causes us to lose HPD, this
means that unless the user knows how to ssh into their machine and do a
manual reprobe for monitors, none of the monitors they connect after
booting will actually work.
Eventually we should come up with a better fix then having to enable
polling for this, since this makes rpm a lot less useful, but for now
the infrastructure in i915 just isn't there yet to get hpd in these
situations.
Changes since v1:
- Add comment explaining the addition of the if
(!mode_config->poll_running) in intel_hpd_init()
- Remove unneeded if (!dev->mode_config.poll_enabled) in
i915_hpd_poll_init_work()
- Call to drm_helper_hpd_irq_event() after we disable polling
- Add cancel_work_sync() call to intel_hpd_cancel_work()
Changes since v2:
- Apparently dev->mode_config.poll_running doesn't actually reflect
whether or not a poll is currently in progress, and is actually used
for dynamic module paramter enabling/disabling. So now we instead
keep track of our own poll_running variable in dev_priv->hotplug
- Clean i915_hpd_poll_init_work() a little bit
Changes since v3:
- Remove the now-redundant connector loop in intel_hpd_init(), just
rely on intel_hpd_poll_enable() for setting connector->polled
correctly on each connector
- Get rid of poll_running
- Don't assign enabled in i915_hpd_poll_init_work before we actually
lock dev->mode_config.mutex
- Wrap enabled assignment in i915_hpd_poll_init_work() in READ_ONCE()
for doc purposes
- Do the same for dev_priv->hotplug.poll_enabled with WRITE_ONCE in
intel_hpd_poll_enable()
- Add some comments about racing not mattering in intel_hpd_poll_enable
Changes since v4:
- Rename intel_hpd_poll_enable() to intel_hpd_poll_init()
- Drop the bool argument from intel_hpd_poll_init()
- Remove redundant calls to intel_hpd_poll_init()
- Rename poll_enable_work to poll_init_work
- Add some kerneldoc for intel_hpd_poll_init()
- Cross-reference intel_hpd_poll_init() in intel_hpd_init()
- Just copy the loop from intel_hpd_init() in intel_hpd_poll_init()
Changes since v5:
- Minor kerneldoc nitpicks
Cc: stable@vger.kernel.org Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Lyude <cpaul@redhat.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit 19625e85c6ec56038368aa72c44f5f55b221f0fc) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
One of the things preventing us from using polling is the fact that
calling valleyview_crt_detect_hotplug() when there's a VGA cable
connected results in sending another hotplug. With polling enabled when
HPD is disabled, this results in a scenario like this:
- We enable power wells and reset the ADPA
- output_poll_exec does force probe on VGA, triggering a hpd
- HPD handler waits for poll to unlock dev->mode_config.mutex
- output_poll_exec shuts off the ADPA, unlocks dev->mode_config.mutex
- HPD handler runs, resets ADPA and brings us back to the start
This results in an endless irq storm getting sent from the ADPA
whenever a VGA connector gets detected in the middle of polling.
Somewhat based off of the "drm/i915: Disable CRT HPD around force
trigger" patch Ville Syrjälä sent a while back
Cc: stable@vger.kernel.org Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Lyude <cpaul@redhat.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit b236d7c8421969ac0693fc571e47ee5c2a62fb90) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While VGA hotplugging worked(ish) before, it looks like that was mainly
because we'd unintentionally enable it in
valleyview_crt_detect_hotplug() when we did a force trigger. This
doesn't work reliably enough because whenever the display powerwell on
vlv gets disabled, the values set in VLV_ADPA get cleared and
consequently VGA hotplugging gets disabled. This causes bugs such as one
we found on an Intel NUC, where doing the following sequence of
hotplugs:
Would result in VGA hotplugging becoming disabled, due to the powerwells
getting toggled in the process of connecting HDMI.
Changes since v3:
- Expose intel_crt_reset() through intel_drv.h and call that in
vlv_display_power_well_init() instead of
encoder->base.funcs->reset(&encoder->base);
Changes since v2:
- Use intel_encoder structs instead of drm_encoder structs
Changes since v1:
- Instead of handling the register writes ourself, we just reuse
intel_crt_detect()
- Instead of resetting the ADPA during display IRQ installation, we now
reset them in vlv_display_power_well_init()
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Lyude <cpaul@redhat.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
[danvet: Rebase over dev_priv/drm_device embedding.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit 9504a89247595b6c066c68aea0c34af1fc78d021) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On Haswell/Broadwell, the HD-Audio block is inside the HDMI/display
power well and so the sna-hda audio codec acquires the display power
well while it is operational. However, Skylake separates the powerwells
again, but yet we still need the audio powerwell to setup the registers.
(But then the hardware uses those registers even while powered off???)
Acquiring the powerwell around setting the chicken bits when setting up
the audio channel does at least silence the WARNs from touching our
registers whilst unpowered. We silence our own test cases, but maybe
there is a latent bug in using the audio channel?
v2: Grab both rpm wakelock and audio wakelock
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96214 Fixes: 03b135cebc47 "ALSA: hda - remove dependency on i915 power well for SKL") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Libin Yang <libin.yang@intel.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Marius Vlad <marius.c.vlad@intel.com> Tested-by: Hans de Goede <hdegoede@redhat.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470240540-29004-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit d838a110f0b310d408ebe6b5a97e36ec27555ebf) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bspec says:
"For DDIA with x4 capability (DDI_BUF_CTL DDIA Lane Capability Control =
DDIA x4), the I_boost value has to be programmed in both
tx_blnclegsctl_0 and tx_blnclegsctl_4."
Currently we only program tx_blnclegsctl_0. Let's do the other one as
well.
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Workaround for bug:
https://bugs.freedesktop.org/show_bug.cgi?id=97460
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When looking up the connector type make sure the index
is valid. Avoids a later crash if we read past the end
of the array.
Workaround for bug:
https://bugs.freedesktop.org/show_bug.cgi?id=97460
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Adding a BO can make it the insertion point for larger sizes as well.
v2: add a comment about the guard structure.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This bug seems to be present for a very long time.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The GART aperture size can be bigger than 4GB. Therefore the offset
used in amdgpu_gart_bind and amdgpu_gart_unbind must be 64-bit.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When using CONFIG_DEBUG_ATOMIC_SLEEP, the scheduler nicely points out
that we're calling sleeping primitives within the wait_event loop, which
means we might clobber the task state:
[ 10.831289] do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffc00026b610>]
[ 10.845531] ------------[ cut here ]------------
[ 10.850161] WARNING: at kernel/sched/core.c:7630
...
[ 12.164333] ---[ end trace 45409966a9a76438 ]---
[ 12.168942] Call trace:
[ 12.171391] [<ffffffc00024ed44>] __might_sleep+0x64/0x90
[ 12.176699] [<ffffffc000954774>] mutex_lock_nested+0x50/0x3fc
[ 12.182440] [<ffffffc0007b9424>] iio_kfifo_buf_data_available+0x28/0x4c
[ 12.189043] [<ffffffc0007b76ac>] iio_buffer_ready+0x60/0xe0
[ 12.194608] [<ffffffc0007b7834>] iio_buffer_read_first_n_outer+0x108/0x1a8
[ 12.201474] [<ffffffc000370d48>] __vfs_read+0x58/0x114
[ 12.206606] [<ffffffc000371740>] vfs_read+0x94/0x118
[ 12.211564] [<ffffffc0003720f8>] SyS_read+0x64/0xb4
[ 12.216436] [<ffffffc000203cb4>] el0_svc_naked+0x24/0x28
To avoid this, we should (a la https://lwn.net/Articles/628628/) use the
wait_woken() function, which avoids the nested sleeping while still
handling races between waiting / wake-events.
Signed-off-by: Brian Norris <briannorris@chromium.org> Reviewed-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The called of_graph_get_next_endpoint() already decrements the refcount
of the prev node, so it is wrong to do it again in the calling function.
Use the for_each_endpoint_of_node() helper to interate through the
endpoint OF nodes, which already does the right thing and simplifies
the code a bit.
Fixes: 8ccd0d0ca041
(of: add helper for getting endpoint node of specific identifiers) Reported-by: David Jander <david@protonic.nl> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes to make the resume from cpu_suspend() code behave more like
secondary boot caused debug exceptions to be unmasked early by
__cpu_setup(). We then go on to restore mdscr_el1 in cpu_do_resume(),
potentially taking break or watch points based on uninitialised registers.
Mask debug exceptions in cpu_do_resume(), which is specific to resume
from cpu_suspend(). Debug exceptions will be restored to their original
state by local_dbg_restore() in cpu_suspend(), which runs after
hw_breakpoint_restore() has re-initialised the other registers.
Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Fixes: cabe1c81ea5b ("arm64: Change cpu_resume() to enable mmu early then access sleep_sp by va") Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When CONFIG_RANDOMIZE_BASE is selected, we modify the page tables to remap the
kernel at a newly-chosen VA range. We do this with the MMU disabled, but do not
invalidate TLBs prior to re-enabling the MMU with the new tables. Thus the old
mappings entries may still live in TLBs, and we risk violating
Break-Before-Make requirements, leading to TLB conflicts and/or other issues.
We invalidate TLBs when we uninsall the idmap in early setup code, but prior to
this we are subject to issues relating to the Break-Before-Make violation.
Avoid these issues by invalidating the TLBs before the new mappings can be
used by the hardware.
Fixes: f80fb3a3d508 ("arm64: add support for kernel ASLR") Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Literal loads of virtual addresses are subject to runtime relocation when
CONFIG_RELOCATABLE=y, and given that the relocation routines run with the
MMU and caches enabled, literal loads of relocated values performed with
the MMU off are not guaranteed to return the latest value unless the
memory covering the literal is cleaned to the PoC explicitly.
So defer the literal load until after the MMU has been enabled, just like
we do for primary_switch() and secondary_switch() in head.S.
Fixes: 1e48ef7fcc37 ("arm64: add support for building vmlinux as a relocatable PIE binary") Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The code currently assumes that buffered multicast PS frames don't have
a pending ACK frame for tx status reporting.
However, hostapd sends a broadcast deauth frame on teardown for which tx
status is requested. This can lead to the "Have pending ack frames"
warning on module reload.
Fix this by using ieee80211_free_txskb/ieee80211_purge_tx_queue.
Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a device is in a status where CIO has killed all I/O by itself the
interrupt for a clear request may not contain an irb to determine the
clear function. Instead it contains an error pointer -EIO.
This was ignored by the DASD int_handler leading to a hanging device
waiting for a clear interrupt.
Handle -EIO error pointer correctly for requests that are clear pending and
treat the clear as successful.
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com> Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On Intel Xeon Phi Knights Landing processor family the channels of the
memory controller have untypical arrangement - MC0 is mapped to CH3,4,5
and MC1 is mapped to CH0,1,2. This causes the EDAC driver to report the
channel name incorrectly.
We missed this change earlier, so the code already contains similar
comment, but the translation function is incorrect.
Without this patch:
errors in DIMM_A and DIMM_D were reported in DIMM_D
errors in DIMM_B and DIMM_E were reported in DIMM_E
errors in DIMM_C and DIMM_F were reported in DIMM_F
Correct this.
Hubert Chrzaniuk:
- rebased to 4.8
- comments and code cleanup
We also need to revert the dynamic OF change, so we get a consistent
state again. Otherwise, we might have two devices enabled e.g. after
pinctrl setup fails.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the function amd_gpio_irq_enable() and
amd_gpio_direction_input(), remove the code which is setting
the default de-bounce time to 2.75ms.
The driver code shall use the same settings as specified in
BIOS. Any default assignment impacts TouchPad behaviour when
the LevelTrig is set to EDGE FALLING.
The disable_bypass cmdline option changes the SMMUv3 driver to put down
faulting stream table entries by default, as opposed to bypassing
transactions from unconfigured devices.
In this mode of operation, it is entirely expected to see aborting
entries in the stream table if and when we come to installing a valid
translation, so don't trigger a BUG() as a result of misdiagnosing these
entries as stream table corruption.
Fixes: 48ec83bcbcf5 ("iommu/arm-smmu: Add initial driver support for ARM SMMUv3 devices") Tested-by: Robin Murphy <robin.murphy@arm.com> Reported-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Enabling stalling faults can result in hardware deadlock on poorly
designed systems, particularly those with a PCI root complex upstream of
the SMMU.
Although it's not really Linux's job to save hardware integrators from
their own misfortune, it *is* our job to stop userspace (e.g. VFIO
clients) from hosing the system for everybody else, even if they might
already be required to have elevated privileges.
Given that the fault handling code currently executes entirely in IRQ
context, there is nothing that can sensibly be done to recover from
things like page faults anyway, so let's rip this code out for now and
avoid the potential for deadlock.
Fixes: 48ec83bcbcf5 ("iommu/arm-smmu: Add initial driver support for ARM SMMUv3 devices") Reported-by: Matt Evans <matt.evans@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the unlikely event of a global command queue error, the ARM SMMUv3
driver attempts to convert the problematic command into a CMD_SYNC and
resume the command queue. Unfortunately, this code is pretty badly
broken:
1. It uses the index into the error string table as the CMDQ index,
so we probably read the wrong entry out of the queue
2. The arguments to queue_write are the wrong way round, so we end up
writing from the queue onto the stack.
These happily cancel out, so the kernel is likely to stay alive, but
the command queue will probably fault again when we resume.
This patch fixes the error handling code to use the correct queue index
and write back the CMD_SYNC to the faulting entry.
Fixes: 48ec83bcbcf5 ("iommu/arm-smmu: Add initial driver support for ARM SMMUv3 devices") Reported-by: Diwakar Subraveti <Diwakar.Subraveti@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>