Fixes an error in having the iosf build as 'default m'. On X86 SoC's the iosf
sideband is the only way to access information for some registers, as opposed to
through MSR's on other Intel architectures. While selecting IOSF_MBI is
preferred, it does mean carrying extra code on non-SoC architectures. This
exports the selection to the user, allowing those driver writers to compile out
iosf code if it's not being built.
[Note the upstream one of this patch requires applying full GICv3 support
but it's out of the scope of stable kernel. So this patch has a huge
modification for stable kernel comparing to the upstream one.]
There is an interesting bug in the vgic code, which manifests itself
when the KVM run loop has a signal pending or needs a vmid generation
rollover after having disabled interrupts but before actually switching
to the guest.
In this case, we flush the vgic as usual, but we sync back the vgic
state and exit to userspace before entering the guest. The consequence
is that we will be syncing the list registers back to the software model
using the GICH_ELRSR and GICH_EISR from the last execution of the guest,
potentially overwriting a list register containing an interrupt.
This showed up during migration testing where we would capture a state
where the VM has masked the arch timer but there were no interrupts,
resulting in a hung test.
Cc: Marc Zyngier <marc.zyngier@arm.com> Reported-by: Alex Bennee <alex.bennee@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Since we don't backport commit c647355 (KVM: arm: Add initial dirty page
locking support) for linux-3.14.y, there is no stage2_wp_range in
arch/arm/kvm/mmu.c. So ignore the change in stage2_wp_range introduced
by this patch.]
The kernel's pgd_index macro is designed to index a normal, page
sized array. KVM is a bit diffferent, as we can use concatenated
pages to have a bigger address space (for example 40bit IPA with
4kB pages gives us an 8kB PGD.
In the above case, the use of pgd_index will always return an index
inside the first 4kB, which makes a guest that has memory above
0x8000000000 rather unhappy, as it spins forever in a page fault,
whist the host happilly corrupts the lower pgd.
The obvious fix is to get our own kvm_pgd_index that does the right
thing(tm).
Tested on X-Gene with a hacked kvmtool that put memory at a stupidly
high address.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit b856a59141b1 (arm/arm64: KVM: Reset the HCR on each vcpu
when resetting the vcpu) moved the init of the HCR register to
happen later in the init of a vcpu, but left out the fixup
done in kvm_reset_vcpu when preparing for a 32bit guest.
As a result, the 32bit guest is run as a 64bit guest, but the
rest of the kernel still manages it as a 32bit. Fun follows.
Moving the fixup to vcpu_reset_hcr solves the problem for good.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It took about two years for someone to notice that the IPA passed
to TLBI IPAS2E1IS must be shifted by 12 bits. Clearly our reviewing
is not as good as it should be...
Paper bag time for me.
Reported-by: Mario Smarduch <m.smarduch@samsung.com> Tested-by: Mario Smarduch <m.smarduch@samsung.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Note this patch is a bit different from the original one as the names of
vgic_initialized and kvm_vgic_init are different.]
It is curently possible to run a VM with architected timers support
without creating an in-kernel VGIC, which will result in interrupts from
the virtual timer going nowhere.
To address this issue, move the architected timers initialization to the
time when we run a VCPU for the first time, and then only initialize
(and enable) the architected timers if we have a properly created and
initialized in-kernel VGIC.
When injecting interrupts from the virtual timer to the vgic, the
current setup should ensure that this never calls an on-demand init of
the VGIC, which is the only call path that could return an error from
kvm_vgic_inject_irq(), so capture the return value and raise a warning
if there's an error there.
We also change the kvm_timer_init() function from returning an int to be
a void function, since the function always succeeds.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Limit the mounts fs_fully_visible considers to locked mounts.
Unlocked can always be unmounted so considering them adds hassle
but no security benefit.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The warning message in prepend_path is unclear and outdated. It was
added as a warning that the mechanism for generating names of pseudo
files had been removed from prepend_path and d_dname should be used
instead. Unfortunately the warning reads like a general warning,
making it unclear what to do with it.
Remove the warning. The transition it was added to warn about is long
over, and I added code several years ago which in rare cases causes
the warning to fire on legitimate code, and the warning is now firing
and scaring people for no good reason.
Reported-by: Ivan Delalande <colona@arista.com> Reported-by: Omar Sandoval <osandov@osandov.com> Fixes: f48cfddc6729e ("vfs: In d_path don't call d_dname on a mount point") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
file_remove_suid() could mistakenly set S_NOSEC inode bit when root was
modifying the file. As a result following writes to the file by ordinary
user would avoid clearing suid or sgid bits.
Fix the bug by checking actual mode bits before setting S_NOSEC.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Writes were a bit racy, but hard to turn into a bug at the same time.
(Particularly because modern Linux doesn't use this feature anymore.)
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[Actually the next patch makes it much, much easier to trigger the race
so I'm including this one for stable@ as well. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
KVM guest kernels for trap & emulate run in user mode, with a modified
set of kernel memory segments. However the fixmap address is still in
the normal KSeg3 region at 0xfffe0000 regardless, causing problems when
cache alias handling makes use of them when handling copy on write.
Therefore define FIXADDR_TOP as 0x7ffe0000 in the guest kernel mapped
region when CONFIG_KVM_GUEST is defined.
The Foxconn K8M890-8237A has two PCI host bridges, and we can't assign
resources correctly without the information from _CRS that tells us which
address ranges are claimed by which bridge. In the bugs mentioned below,
we incorrectly assign a sound card address (this example is from 1033299):
We enable _CRS on all systems from 2008 and later. On older systems, we
ignore _CRS and assume the whole physical address space (excluding RAM and
other devices) is available for PCI devices, but on systems that support
physical address spaces larger than 4GB, it's doubtful that the area above
4GB is really available for PCI.
After d56dbf5bab8c ("PCI: Allocate 64-bit BARs above 4G when possible"), we
try to use that space above 4GB *first*, so we're more likely to put a
device there.
On Juan's Toshiba Satellite Pro U200, BIOS left the graphics, sound, 1394,
and card reader devices unassigned (but only after Windows had been
booted). Only the sound device had a 64-bit BAR, so it was the only device
placed above 4GB, and hence the only device that didn't work.
Keep _CRS enabled even on pre-2008 systems if they support physical address
space larger than 4GB.
Fixes: d56dbf5bab8c ("PCI: Allocate 64-bit BARs above 4G when possible") Reported-and-tested-by: Juan Dayer <jdayer@outlook.com> Reported-and-tested-by: Alan Horsfield <alan@hazelgarth.co.uk> Link: https://bugzilla.kernel.org/show_bug.cgi?id=99221 Link: https://bugzilla.opensuse.org/show_bug.cgi?id=907092 Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we take a PMU exception or a software event we call
perf_read_regs(). This overloads regs->result with a boolean that
describes if we should use the sampled instruction address register
(SIAR) or the regs.
If the exception is in kernel, we start with the kernel regs and
backtrace through the kernel stack. At this point we switch to the
userspace regs and backtrace the user stack with perf_callchain_user().
Unfortunately these regs have not got the perf_read_regs() treatment,
so regs->result could be anything. If it is non zero,
perf_instruction_pointer() decides to use the SIAR, and we get issues
like this:
On VM entry, we disable access to the VFP registers in order to
perform a lazy save/restore of these registers.
On VM exit, we restore access, test if we did enable them before,
and save/restore the guest/host registers if necessary. In this
sequence, the FPEXC register is always accessed, irrespective
of the trapping configuration.
If the guest didn't touch the VFP registers, then the HCPTR access
has now enabled such access, but we're missing a barrier to ensure
architectural execution of the new HCPTR configuration. If the HCPTR
access has been delayed/reordered, the subsequent access to FPEXC
will cause a trap, which we aren't prepared to handle at all.
The same condition exists when trapping to enable VFP for the guest.
The fix is to introduce a barrier after enabling VFP access. In the
vmexit case, it can be relaxed to only takes place if the guest hasn't
accessed its view of the VFP registers, making the access to FPEXC safe.
The set_hcptr macro is modified to deal with both vmenter/vmexit and
vmtrap operations, and now takes an optional label that is branched to
when the guest hasn't touched the VFP registers.
Commit 007bea098b86 (intel_pstate: Add setting voltage value for
baytrail P states.) introduced byt_set_pstate() with the assumption that
it would always be run by the CPU whose MSR is to be written by it. It
turns out, however, that is not always the case in practice, so modify
byt_set_pstate() to enforce the MSR write done by it to always happen on
the right CPU.
Fixes: 007bea098b86 (intel_pstate: Add setting voltage value for baytrail P states.) Signed-off-by: Joe Konno <joe.konno@intel.com> Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The conversion to be16_add_cpu() is incorrect in case cryptlen is
negative due to premature (i.e. before addition / subtraction)
implicit conversion of cryptlen (int -> u16) leading to sign loss.
Fixes: 1d11911a8c57 ("crypto: talitos - fix warning: 'alg' may be used uninitialized in this function") Signed-off-by: Horia Geanta <horia.geanta@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is NULL pointer dereference possible during statistics update if the route
used for OOTB responce is removed at unfortunate time. If the route exists when
we receive OOTB packet and we finally jump into sctp_packet_transmit() to send
ABORT, but in the meantime route is removed under our feet, we take "no_route"
path and try to update stats with IP_INC_STATS(sock_net(asoc->base.sk), ...).
But sctp_ootb_pkt_new() used to prepare responce packet doesn't call
sctp_transport_set_owner() and therefore there is no asoc associated with this
packet. Probably temporary asoc just for OOTB responces is overkill, so just
introduce a check like in all other places in sctp_packet_transmit(), where
"asoc" is dereferenced.
To reproduce this, one needs to
0. ensure that sctp module is loaded (otherwise ABORT is not generated)
1. remove default route on the machine
2. while true; do
ip route del [interface-specific route]
ip route add [interface-specific route]
done
3. send enough OOTB packets (i.e. HB REQs) from another host to trigger ABORT
responce
When limiting phy link speed using "max-speed" to 100mbps or less on a
giga bit phy, phy never completes auto negotiation and phy state
machine is held in PHY_AN. Fixing this issue by comparing the giga
bit advertise though phydev->supported doesn't have it but phy has
BMSR_ESTATEN set. So that auto negotiation is restarted as old and
new advertise are different and link comes up fine.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
tcp_fastopen_reset_cipher really cannot be called from interrupt
context. It allocates the tcp_fastopen_context with GFP_KERNEL and
calls crypto_alloc_cipher, which allocates all kind of stuff with
GFP_KERNEL.
Thus, we might sleep when the key-generation is triggered by an
incoming TFO cookie-request which would then happen in interrupt-
context, as shown by enabling CONFIG_DEBUG_ATOMIC_SLEEP:
This patch moves the call to tcp_fastopen_init_key_once to the places
where a listener socket creates its TFO-state, which always happens in
user-context (either from the setsockopt, or implicitly during the
listen()-call)
Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Fixes: 222e83d2e0ae ("tcp: switch tcp_fastopen key generation to net_get_random_once") Signed-off-by: Christoph Paasch <cpaasch@apple.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The lockless lookups can return entry that is unlinked.
Sometimes they get reference before last neigh_cleanup_and_release,
sometimes they do not need reference. Later, any
modification attempts may result in the following problems:
1. entry is not destroyed immediately because neigh_update
can start the timer for dead entry, eg. on change to NUD_REACHABLE
state. As result, entry lives for some time but is invisible
and out of control.
2. __neigh_event_send can run in parallel with neigh_destroy
while refcnt=0 but if timer is started and expired refcnt can
reach 0 for second time leading to second neigh_destroy and
possible crash.
Thanks to Eric Dumazet and Ying Xue for their work and analyze
on the __neigh_event_send change.
Fixes: 767e97e1e0db ("neigh: RCU conversion of struct neighbour") Fixes: a263b3093641 ("ipv4: Make neigh lookups directly in output packet path.") Fixes: 6fd6ce2056de ("ipv6: Do not depend on rt->n in ip6_finish_output2().") Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ying Xue <ying.xue@windriver.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
PACKET_FANOUT_LB computes f->rr_cur such that it is modulo
f->num_members. It returns the old value unconditionally, but
f->num_members may have changed since the last store. Ensure
that the return value is always < num.
When modifying the logic, simplify it further by replacing the loop
with an unconditional atomic increment.
Fixes: dc99f600698d ("packet: Add fanout support.") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We need to tell compiler it must not read f->num_members multiple
times. Otherwise testing if num is not zero is flaky, and we could
attempt an invalid divide by 0 in fanout_demux_cpu()
Note bug was present in packet_rcv_fanout_hash() and
packet_rcv_fanout_lb() but final 3.1 had a simple location
after commit 95ec3eb417115fb ("packet: Add 'cpu' fanout policy.")
Fixes: dc99f600698dc ("packet: Add fanout support.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After the ->set() spinlocks were removed br_stp_set_bridge_priority
was left running without any protection when used via sysfs. It can
race with port add/del and could result in use-after-free cases and
corrupted lists. Tested by running port add/del in a loop with stp
enabled while setting priority in a loop, crashes are easily
reproducible.
The spinlocks around sysfs ->set() were removed in commit: 14f98f258f19 ("bridge: range check STP parameters")
There's also a race condition in the netlink priority support that is
fixed by this change, but it was introduced recently and the fixes tag
covers it, just in case it's needed the commit is: af615762e972 ("bridge: add ageing_time, stp_state, priority over netlink")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Fixes: 14f98f258f19 ("bridge: range check STP parameters") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
->auto_asconf_splist is per namespace and mangled by functions like
sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization.
Also, the call to inet_sk_copy_descendant() was backuping
->auto_asconf_list through the copy but was not honoring
->do_auto_asconf, which could lead to list corruption if it was
different between both sockets.
This commit thus fixes the list handling by using ->addr_wq_lock
spinlock to protect the list. A special handling is done upon socket
creation and destruction for that. Error handlig on sctp_init_sock()
will never return an error after having initialized asconf, so
sctp_destroy_sock() can be called without addrq_wq_lock. The lock now
will be take on sctp_close_sock(), before locking the socket, so we
don't do it in inverse order compared to sctp_addr_wq_timeout_handler().
Instead of taking the lock on sctp_sock_migrate() for copying and
restoring the list values, it's preferred to avoid rewritting it by
implementing sctp_copy_descendant().
Issue was found with a test application that kept flipping sysctl
default_auto_asconf on and off, but one could trigger it by issuing
simultaneous setsockopt() calls on multiple sockets or by
creating/destroying sockets fast enough. This is only triggerable
locally.
Fixes: 9f7d653b67ae ("sctp: Add Auto-ASCONF support (core).") Reported-by: Ji Jianwen <jiji@redhat.com> Suggested-by: Neil Horman <nhorman@tuxdriver.com> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We saw excessive direct memory compaction triggered by skb_page_frag_refill.
This causes performance issues and add latency. Commit 5640f7685831e0
introduces the order-3 allocation. According to the changelog, the order-3
allocation isn't a must-have but to improve performance. But direct memory
compaction has high overhead. The benefit of order-3 allocation can't
compensate the overhead of direct memory compaction.
This patch makes the order-3 page allocation atomic. If there is no memory
pressure and memory isn't fragmented, the alloction will still success, so we
don't sacrifice the order-3 benefit here. If the atomic allocation fails,
direct memory compaction will not be triggered, skb_page_frag_refill will
fallback to order-0 immediately, hence the direct memory compaction overhead is
avoided. In the allocation failure case, kswapd is waken up and doing
compaction, so chances are allocation could success next time.
alloc_skb_with_frags is the same.
The mellanox driver does similar thing, if this is accepted, we must fix
the driver too.
V3: fix the same issue in alloc_skb_with_frags as pointed out by Eric
V2: make the changelog clearer
Cc: Eric Dumazet <edumazet@google.com> Cc: Chris Mason <clm@fb.com> Cc: Debabrata Banerjee <dbavatar@gmail.com> Signed-off-by: Shaohua Li <shli@fb.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since the addition of sysfs multicast router support if one set
multicast_router to "2" more than once, then the port would be added to
the hlist every time and could end up linking to itself and thus causing an
endless loop for rlist walkers.
So to reproduce just do:
echo 2 > multicast_router; echo 2 > multicast_router;
in a bridge port and let some igmp traffic flow, for me it hangs up
in br_multicast_flood().
Fix this by adding a check in br_multicast_add_router() if the port is
already linked.
The reason this didn't happen before the addition of multicast_router
sysfs entries is because there's a !hlist_unhashed check that prevents
it.
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Fixes: 0909e11758bd ("bridge: Add multicast_router sysfs entries") Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since it is possible for vnet_event_napi to end up doing
vnet_control_pkt_engine -> ... -> vnet_send_attr ->
vnet_port_alloc_tx_ring -> ldc_alloc_exp_dring -> kzalloc()
(i.e., in softirq context), kzalloc() should be called with
GFP_ATOMIC from ldc_alloc_exp_dring.
When the vgic initializes its internal state it does so based on the
number of VCPUs available at the time. If we allow KVM to create more
VCPUs after the VGIC has been initialized, we are likely to error out in
unfortunate ways later, perform buffer overflows etc.
Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Introduce a new function to unmap user RAM regions in the stage2 page
tables. This is needed on reboot (or when the guest turns off the MMU)
to ensure we fault in pages again and make the dcache, RAM, and icache
coherent.
Using unmap_stage2_range for the whole guest physical range does not
work, because that unmaps IO regions (such as the GIC) which will not be
recreated or in the best case faulted in on a page-by-page basis.
Call this function on secondary and subsequent calls to the
KVM_ARM_VCPU_INIT ioctl so that a reset VCPU will detect the guest
Stage-1 MMU is off when faulting in pages and make the caches coherent.
Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When userspace resets the vcpu using KVM_ARM_VCPU_INIT, we should also
reset the HCR, because we now modify the HCR dynamically to
enable/disable trapping of guest accesses to the VM registers.
This is crucial for reboot of VMs working since otherwise we will not be
doing the necessary cache maintenance operations when faulting in pages
with the guest MMU off.
Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The implementation of KVM_ARM_VCPU_INIT is currently not doing what
userspace expects, namely making sure that a vcpu which may have been
turned off using PSCI is returned to its initial state, which would be
powered on if userspace does not set the KVM_ARM_VCPU_POWER_OFF flag.
Implement the expected functionality and clarify the ABI.
Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a VCPU was originally started with power off (typically to be brought
up by PSCI in SMP configurations), there is no need to clear the
POWER_OFF flag in the kernel, as this flag is only tested during the
init ioctl itself.
Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Instead of using kvm_is_mmio_pfn() to decide whether a host region
should be stage 2 mapped with device attributes, add a new static
function kvm_is_device_pfn() that disregards RAM pages with the
reserved bit set, as those should usually not be mapped as device
memory.
Some of the macros defined in kvm_arm.h are useful in assembly files, but are
not compatible with the assembler. Change any C language integer constant
definitions using appended U, UL, or ULL to the UL() preprocessor macro. Also,
add a preprocessor include of the asm/memory.h file which defines the UL()
macro.
Fixes build errors like these when using kvm_arm.h in assembly
source files:
Error: unexpected characters following instruction at operand 3 -- `and x0,x1,#((1U<<25)-1)'
Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we detect another vCPU is running we just exit and return 0 as if we
succesfully created the VGIC, but the VGIC wouldn't actual be created.
This shouldn't break in-kernel behavior because the kernel will not
observe the failed the attempt to create the VGIC, but userspace could
be rightfully confused.
Cc: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently if using a 48-bit VA, tearing down the hyp page tables (which
can happen in the absence of a GICH or GICV resource) results in the
rather nasty splat below, evidently becasue we access a table that
doesn't actually exist.
Commit 38f791a4e499792e (arm64: KVM: Implement 48 VA support for KVM EL2
and Stage-2) added a pgd_none check to __create_hyp_mappings to account
for the additional level of tables, but didn't add a corresponding check
to unmap_range, and this seems to be the source of the problem.
This patch adds the missing pgd_none check, ensuring we don't try to
access tables that don't exist.
[Since we don't backport commit 8eef912 (arm/arm64: KVM: map MMIO regions
at creation time) for linux-3.14.y, the context of this patch is
different, while the change itself is same.]
When creating or moving a memslot, make sure the IPA space is within the
addressable range of the guest. Otherwise, user space can create too
large a memslot and KVM would try to access potentially unallocated page
table entries when inserting entries in the Stage-2 page tables.
Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On some platforms with no power management capabilities, the hotplug
implementation is allowed to return from a smp_ops.cpu_die() call as a
function return. Upon a CPU onlining event, the KVM CPU notifier tries
to reinstall the hyp stub, which fails on platform where no reset took
place following a hotplug event, with the message:
CPU1: smp_ops.cpu_die() returned, trying to resuscitate
CPU1: Booted secondary processor
Kernel panic - not syncing: unexpected prefetch abort in Hyp mode at: 0x80409540
unexpected data abort in Hyp mode at: 0x80401fe8
unexpected HVC/SVC trap in Hyp mode at: 0x805c6170
since KVM code is trying to reinstall the stub on a system where it is
already configured.
To prevent this issue, this patch adds a check in the KVM hotplug
notifier that detects if the HYP stub really needs re-installing when a
CPU is onlined and skips the installation call if the stub is already in
place, which means that the CPU has not been reset.
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current aarch64 calculation for VTTBR_BADDR_MASK masks only 39 bits
and not all the bits in the PA range. This is clearly a bug that
manifests itself on systems that allocate memory in the higher address
space range.
[ Modified from Joel's original patch to be based on PHYS_MASK_SHIFT
instead of a hard-coded value and to move the alignment check of the
allocation to mmu.c. Also added a comment explaining why we hardcode
the IPA range and changed the stage-2 pgd allocation to be based on
the 40 bit IPA range instead of the maximum possible 48 bit PA range.
- Christoffer ]
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Joel Schopp <joel.schopp@amd.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The sgi values calculated in read_set_clear_sgi_pend_reg() and
write_set_clear_sgi_pend_reg() were horribly incorrectly multiplied by 4
with catastrophic results in that subfunctions ended up overwriting
memory not allocated for the expected purpose.
This showed up as bugs in kfree() and the kernel complaining a lot of
you turn on memory debugging.
This addresses: http://marc.info/?l=kvm&m=141164910007868&w=2
[Since we don't backport commit 227844f (arm/arm64: KVM: Rename irq_state
to irq_pending) for linux-3.14.y, here we still use vgic_update_irq_state
instead of vgic_update_irq_pending.]
As it stands, nothing prevents userspace from injecting an interrupt
before the guest's GIC is actually initialized.
This goes unnoticed so far (as everything is pretty much statically
allocated), but ends up exploding in a spectacular way once we switch
to a more dynamic allocation (the GIC data structure isn't there yet).
The fix is to test for the "ready" flag in the VGIC distributor before
trying to inject the interrupt. Note that in order to avoid breaking
userspace, we have to ignore what is essentially an error.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Since we don't backport commit 9804788 (arm/arm64: KVM: Support
KVM_CAP_READONLY_MEM), ingore the changes in kvm_handle_guest_abort
introduced by this patch.]
The ISS encoding for an exception from a Data Abort has a WnR
bit[6] that indicates whether the Data Abort was caused by a
read or a write instruction. While there are several fields
in the encoding that are only valid if the ISV bit[24] is set,
WnR is not one of them, so we can read it unconditionally.
Instead of fixing both implementations of kvm_is_write_fault()
in place, reimplement it just once using kvm_vcpu_dabt_iswrite(),
which already does the right thing with respect to the WnR bit.
Also fix up the callers to pass 'vcpu'
Acked-by: Laszlo Ersek <lersek@redhat.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Until now, the mvebu-mbus was guessing by itself whether hardware I/O
coherency was available or not by poking into the Device Tree to see
if the coherency fabric Device Tree node was present or not.
However, on some upcoming SoCs, the presence or absence of the
coherency fabric DT node isn't sufficient: in CONFIG_SMP, the
coherency can be enabled, but not in !CONFIG_SMP.
In order to clean this up, the mvebu_mbus_dt_init() function is
extended to get a boolean argument telling whether coherency is
enabled or not. Therefore, the logic to decide whether coherency is
available or not now belongs to the core SoC code instead of the
mvebu-mbus driver itself, which is much better.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Link: https://lkml.kernel.org/r/1397483228-25625-4-git-send-email-thomas.petazzoni@free-electrons.com Signed-off-by: Jason Cooper <jason@lakedaemon.net>
[ Greg Ungerer: back ported to linux-3.14.y
Back port necessary due to large code differences in affected files.
This change in combination with commit e553554536 ("ARM: mvebu: disable
I/O coherency on non-SMP situations on Armada 370/375/38x/XP") is
critical to the hardware I/O coherency being set correctly by both the
mbus driver and all peripheral hardware drivers. Without this change
drivers will incorrectly enable I/O coherency window attributes and
this causes rare unreliable system behavior including oops. ]
If hardware doesn't support DecodeAssist - a feature that provides
more information about the intercept in the VMCB, KVM decodes the
instruction and then updates the next_rip vmcb control field.
However, NRIP support itself depends on cpuid Fn8000_000A_EDX[NRIPS].
Since skip_emulated_instruction() doesn't verify nrip support
before accepting control.next_rip as valid, avoid writing this
field if support isn't present.
Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We need to check the position and size of file writes against various
limits, using generic_write_check(). This was not being done for
the splice write path. It was fixed upstream by commit 8d0207652cbe
("->splice_write() via ->write_iter()") but we can't apply that.
CVE-2014-7822
Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Vinson Lee <vlee@twopensource.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For VXLAN/NVGRE encapsulation, the current HW doesn't support offloading
both the outer UDP TX checksum and the inner TCP/UDP TX checksum.
The driver doesn't advertize SKB_GSO_UDP_TUNNEL_CSUM, however we are wrongly
telling the HW to offload the outer UDP checksum for encapsulated packets,
fix that.
Fixes: 837052d0ccc5 ('net/mlx4_en: Add netdev support for TCP/IP
offloads of vxlan tunneling') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Replacing a xattr consists of doing a lookup for its existing value, delete
the current value from the respective leaf, release the search path and then
finally insert the new value. This leaves a time window where readers (getxattr,
listxattrs) won't see any value for the xattr. Xattrs are used to store ACLs,
so this has security implications.
This change also fixes 2 other existing issues which were:
*) Deleting the old xattr value without verifying first if the new xattr will
fit in the existing leaf item (in case multiple xattrs are packed in the
same item due to name hash collision);
*) Returning -EEXIST when the flag XATTR_CREATE is given and the xattr doesn't
exist but we have have an existing item that packs muliple xattrs with
the same name hash as the input xattr. In this case we should return ENOSPC.
A test case for xfstests follows soon.
Thanks to Alexandre Oliva for reporting the non-atomicity of the xattr replace
implementation.
Reported-by: Alexandre Oliva <oliva@gnu.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mc_saved_tmp is a static array allocated on the stack, we need to make
sure mc_saved_count stays within its bounds, otherwise we're overflowing
the stack in _save_mc(). A specially crafted microcode header could lead
to a kernel crash or potentially kernel execution.
Add a call to pci_set_master(...) missing in the previous
patch "hpsa: refine the pci enable/disable handling".
Found thanks to Rob Elliot.
Signed-off-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Robert Elliott <elliott@hp.com> Tested-by: Robert Elliott <elliott@hp.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
nfnl_cthelper_parse_tuple() is called from nfnl_cthelper_new(),
nfnl_cthelper_get() and nfnl_cthelper_del(). In each case they pass
a pointer to an nf_conntrack_tuple data structure local variable:
struct nf_conntrack_tuple tuple;
...
ret = nfnl_cthelper_parse_tuple(&tuple, tb[NFCTH_TUPLE]);
The problem is that this local variable is not initialized, and
nfnl_cthelper_parse_tuple() only initializes two fields: src.l3num and
dst.protonum. This leaves all other fields with undefined values
based on whatever is on the stack:
The symptom observed was that when the rpc and tns helpers were added
then traffic to port 1536 was being sent to user-space.
Signed-off-by: Ian Wilson <iwilson@brocade.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a second(kdump) kernel starts and the hard reset method is used
the driver calls pci_disable_device without previously enabling it,
so the kernel shows a warning -
[ 16.876248] WARNING: at drivers/pci/pci.c:1431 pci_disable_device+0x84/0x90()
[ 16.882686] Device hpsa
disabling already-disabled device
...
This patch fixes it, in addition to this I tried to balance also some other pairs
of enable/disable device in the driver.
Unfortunately I wasn't able to verify the functionality for the case of a sw reset,
because of a lack of proper hw.
Signed-off-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The related code can be simplified, and also can avoid related warnings
(with allmodconfig under parisc):
CC [M] net/netfilter/nfnetlink_cthelper.o
net/netfilter/nfnetlink_cthelper.c: In function ‘nfnl_cthelper_from_nlattr’:
net/netfilter/nfnetlink_cthelper.c:97:9: warning: passing argument 1 o ‘memcpy’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-array-qualifiers]
memcpy(&help->data, nla_data(attr), help->helper->data_len);
^
In file included from include/linux/string.h:17:0,
from include/uapi/linux/uuid.h:25,
from include/linux/uuid.h:23,
from include/linux/mod_devicetable.h:12,
from ./arch/parisc/include/asm/hardware.h:4,
from ./arch/parisc/include/asm/processor.h:15,
from ./arch/parisc/include/asm/spinlock.h:6,
from ./arch/parisc/include/asm/atomic.h:21,
from include/linux/atomic.h:4,
from ./arch/parisc/include/asm/bitops.h:12,
from include/linux/bitops.h:36,
from include/linux/kernel.h:10,
from include/linux/list.h:8,
from include/linux/module.h:9,
from net/netfilter/nfnetlink_cthelper.c:11:
./arch/parisc/include/asm/string.h:8:8: note: expected ‘void *’ but argument is of type ‘const char (*)[]’
void * memcpy(void * dest,const void *src,size_t count);
^
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@soleta.eu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A huge amount of NIC drivers use the DMA API, however if
compiled under 32-bit an very important part of the DMA API can
be ommitted leading to the drivers not working at all
(especially if used with 'swiotlb=force iommu=soft').
As Prashant Sreedharan explains it: "the driver [tg3] uses
DEFINE_DMA_UNMAP_ADDR(), dma_unmap_addr_set() to keep a copy of
the dma "mapping" and dma_unmap_addr() to get the "mapping"
value. On most of the platforms this is a no-op, but ... with
"iommu=soft and swiotlb=force" this house keeping is required,
... otherwise we pass 0 while calling pci_unmap_/pci_dma_sync_
instead of the DMA address."
As such enable this even when using 32-bit kernels.
Reported-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Prashant Sreedharan <prashant@broadcom.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Chan <mchan@broadcom.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boris.ostrovsky@oracle.com Cc: cascardo@linux.vnet.ibm.com Cc: david.vrabel@citrix.com Cc: sanjeevb@broadcom.com Cc: siva.kallam@broadcom.com Cc: vyasevich@gmail.com Cc: xen-devel@lists.xensource.com Link: http://lkml.kernel.org/r/20150417190448.GA9462@l.oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On x86-64, __copy_instruction() always returns 0 (error) if the
instruction uses %rip-relative addressing. This is because
kernel_insn_init() is called the second time for 'insn' instance
in such cases and sets all its fields to 0.
Because of this, trying to place a kprobe on such instruction
will fail, register_kprobe() will return -EINVAL.
Buffers allocated by dma_alloc_coherent() are always zeroed on Alpha,
ARM (32bit), MIPS, PowerPC, x86/x86_64 and probably other architectures.
It turned out that some drivers rely on this 'feature'. Allocated buffer
might be also exposed to userspace with dma_mmap() call, so clearing it
is desired from security point of view to avoid exposing random memory
to userspace. This patch unifies dma_alloc_coherent() behavior on ARM64
architecture with other implementations by unconditionally zeroing
allocated buffer.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
[will: ported to 3.14.y] Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
is_valid_cache returns true if the specified cache is valid.
Unfortunately, if the parameter passed it out of range, we return
-ENOENT, which ends up as true leading to potential hilarity.
This patch returns false on the failure path instead.
Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Running sparse results in a bunch of noisy address space mismatches
thanks to the broken __percpu annotation on kvm_get_running_vcpus.
This function returns a pcpu pointer to a pointer, not a pointer to a
pcpu pointer. This patch fixes the annotation, which kills the warnings
from sparse.
Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sparse kicks up about a type mismatch for kvm_target_cpu:
arch/arm64/kvm/guest.c:271:25: error: symbol 'kvm_target_cpu' redeclared with different type (originally declared at ./arch/arm64/include/asm/kvm_host.h:45) - different modifiers
so fix this by adding the missing const attribute to the function
declaration.
Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
esr_el2 field of struct kvm_vcpu_fault_info has u32 type.
It should be stored as word. Current code works in LE case
because existing puts least significant word of x1 into
esr_el2, and it puts most significant work of x1 into next
field, which accidentally is OK because it is updated again
by next instruction. But existing code breaks in BE case.
Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
HSCTLR.EE is defined as bit[25] referring to arm manual
DDI0606C.b(p1590).
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Li Liu <john.liuli@huawei.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I suspect this is a -ECUTPASTE fault from the initial implementation. If
we don't declare the register ID to be KVM_REG_ARM64 the KVM_GET_ONE_REG
implementation kvm_arm_get_reg() returns -EINVAL and hilarity ensues.
The kvm/api.txt document describes all arm64 registers as starting with
0x60xx... (i.e KVM_REG_ARM64).
Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A userspace process can map device MMIO memory via VFIO or /dev/mem,
e.g., for platform device passthrough support in QEMU.
During early development, we found the PAGE_S2 memory type being used
for MMIO mappings. This patch corrects that by using the more strongly
ordered memory type for device MMIO mappings: PAGE_S2_DEVICE.
Signed-off-by: Kim Phillips <kim.phillips@linaro.org> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently when a KVM region is deleted or moved after
KVM_SET_USER_MEMORY_REGION ioctl, the corresponding
intermediate physical memory is not unmapped.
This patch corrects this and unmaps the region's IPA range
in kvm_arch_commit_memory_region using unmap_stage2_range.
Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
unmap_range() was utterly broken, to quote Marc, and broke in all sorts
of situations. It was also quite complicated to follow and didn't
follow the usual scheme of having a separate iterating function for each
level of page tables.
Address this by refactoring the code and introduce a pgd_clear()
function.
Reviewed-by: Jungseok Lee <jays.lee@samsung.com> Reviewed-by: Mario Smarduch <m.smarduch@samsung.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add a memory barrier to ensure the valid bit is read before
any of the cqe payload is read. This fixes an issue seen
on Power where the cqe payload was getting loaded before
the valid bit. When this occurred, we saw an iotag out of
range error when a command completed, but since the iotag
looked invalid the command didn't get completed to scsi core.
Later we hit the command timeout, attempted to abort the command,
then waited for the aborted command to get returned. Since the
adapter already returned the command, we timeout waiting,
and end up escalating EEH all the way to host reset. This
patch fixes this issue.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ben Hutchings [Tue, 16 Jun 2015 21:11:06 +0000 (22:11 +0100)]
pipe: iovec: Fix memory corruption when retrying atomic copy as non-atomic
pipe_iov_copy_{from,to}_user() may be tried twice with the same iovec,
the first time atomically and the second time not. The second attempt
needs to continue from the iovec position, pipe buffer offset and
remaining length where the first attempt failed, but currently the
pipe buffer offset and remaining length are reset. This will corrupt
the piped data (possibly also leading to an information leak between
processes) and may also corrupt kernel memory.
This was fixed upstream by commits f0d1bec9d58d ("new helper:
copy_page_from_iter()") and 637b58c2887e ("switch pipe_read() to
copy_page_to_iter()"), but those aren't suitable for stable. This fix
for older kernel versions was made by Seth Jennings for RHEL and I
have extracted it from their update.
CVE-2015-1805
References: https://bugzilla.redhat.com/show_bug.cgi?id=1202855 Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
BugLink: https://bugs.launchpad.net/bugs/1427680
This device requires new firmware files
AthrBT_0x11020100.dfu and ramps_0x11020100_40.dfu added to
/lib/firmware/ar3k/ that are not included in linux-firmware yet.
BugLink: https://bugs.launchpad.net/bugs/1462614
This device requires new firmware files
AthrBT_0x11020100.dfu and ramps_0x11020100_40.dfu added to
/lib/firmware/ar3k/ that are not included in linux-firmware yet.
Turns out 1366x768 does not in fact work on this hardware.
Signed-off-by: Adam Jackson <ajax@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Worse yet, reading the error message (the filter again) it says that
there was no error, when there clearly was. The issue is that the
code that checks the input does not check for balanced ops. That is,
having an op between a closed parenthesis and the next token.
This would only cause a warning, and fail out before doing any real
harm, but it should still not caues a warning, and the error reported
should work:
The hwrng output buffers (2) are cast inside of a a struct (caam_rng_ctx)
allocated in one DMA-tagged region. While the kernel's heap allocator
should place the overall struct on a cacheline aligned boundary, the 2
buffers contained within may not necessarily align. Consenquently, the ends
of unaligned buffers may not fully flush, and if so, stale data will be left
behind, resulting in small repeating patterns.
This fix aligns the buffers inside the struct.
Note that not all of the data inside caam_rng_ctx necessarily needs to be
DMA-tagged, only the buffers themselves require this. However, a fix would
incur the expense of error-handling bloat in the case of allocation failure.
Signed-off-by: Steve Cornelius <steve.cornelius@freescale.com> Signed-off-by: Victoria Milhoan <vicki.milhoan@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Orphans in the fs tree are cleaned up via open_ctree and subvolume
orphans are cleaned via btrfs_lookup_dentry -- except when a default
subvolume is in use. The name for the default subvolume uses a manual
lookup that doesn't trigger orphan cleanup and needs to trigger it
manually as well. This doesn't apply to the remount case since the
subvolumes are cleaned up by walking the root radix tree.
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fiemap_fill_next_extent returns 0 on success, -errno on error, 1 if this was
the last extent that will fit in user array. If 1 is returned, the return
value may eventually returned to user space, which should not happen, according
to manpage of ioctl.
Signed-off-by: Chengyu Song <csong84@gatech.edu> Reviewed-by: David Sterba <dsterba@suse.cz> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Until recently, mac80211 overwrote all the statistics it could
provide when getting called, but it now relies on the struct
having been zeroed by the caller. This was always the case in
nl80211, but wext used a static struct which could even cause
values from one device leak to another.
Using a static struct is OK (as even documented in a comment)
since the whole usage of this function and its return value is
always locked under RTNL. Not clearing the struct for calling
the driver has always been wrong though, since drivers were
free to only fill values they could report, so calling this
for one device and then for another would always have leaked
values from one to the other.
Fix this by initializing the structure in question before the
driver method call.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=99691
Reported-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Reported-by: Alexander Kaltsas <alexkaltsas@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This seems an use-after-free problem, and the root cause is
zone->wait_table was not set to *NULL* after free it in
try_offline_node.
When hot re-add a node, we will reuse the pgdat of it, so does the zone
struct, and when add pages to the target zone, it will init the zone
first (including the wait_table) if the zone is not initialized. The
judgement of zone initialized is based on zone->wait_table:
static inline bool zone_is_initialized(struct zone *zone)
{
return !!zone->wait_table;
}
so if we do not set the zone->wait_table to *NULL* after free it, the
memory hotplug routine will skip the init of new zone when hot re-add
the node, and the wait_table still points to the freed memory, then we
will access the invalid address when trying to wake up the waiting
people after the i/o operation with the page is done, such as mentioned
above.
The driver configures the IDLE condition to interrupt the SDMA engine.
Since the SDMA UART ROM script doesn't clear the IDLE bit itself, this
caused repeated 1-byte DMA transfers, regardless of available data in the
RX FIFO. Also, when returning due to the IDLE condition, the UART ROM
script already increased its counter, causing residue to be off by one.
This patch clears the IDLE condition to avoid repeated 1-byte DMA transfers
and decreases count by when the DMA transfer was aborted due to the IDLE
condition, fixing serial transfers using DMA on i.MX6Q.
Reported-by: Peter Seiderer <ps.report@gmx.net> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Tested-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Laptop with Turks/Thames GPU will freeze if dpm is enabled. It seems
the SMC engine is relying on some state inside the CP engine. CP needs
to chew at least one packet for it to get in good state for dynamic
power management.
This patch simply disabled and re-enable DPM after the ring test which
is enough to avoid the freeze.
Passive DP->DVI/HDMI dongles on DP++ ports show up to the system as HDMI
devices, as they do not have a sink device in them to respond to any AUX
traffic. When probing these dongles over the DDC, sometimes they will
NAK the first attempt even though the transaction is valid and they
support the DDC protocol. The retry loop inside of
drm_do_probe_ddc_edid() would normally catch this case and try the
transaction again, resulting in success.
drm: give up on edid retries when i2c bus is not responding
This added code to exit immediately if the return code from the
i2c_transfer function was -ENXIO in order to reduce the amount of time
spent in waiting for unresponsive or disconnected devices. That was
possible because the underlying i2c bit banging algorithm had retries of
its own (which, of course, were part of the reason for the bug the
commit fixes).
we've been flipping back and forth enabling the GMBUS transfers, but
we've settled since then. The GMBUS implementation does not do any
retries, however, bailing out of the drm_do_probe_ddc_edid() retry loop
on first encounter of -ENXIO. This, combined with Eugeni's commit, broke
the retry on -ENXIO.
Retry GMBUS once on -ENXIO on first message to mitigate the issues with
passive adapters.
This patch is based on the work, and commit message, by Todd Previte
<tprevite@gmail.com>.
According to the HSW b-spec we need to try clock divisors of 63
and 72, each 3 or more times, when attempting DP AUX channel
communication on a server chipset. This actually wasn't happening
due to a short-circuit that only checked the DP_AUX_CH_CTL_DONE bit
in status rather than checking that the operation was done and
that DP_AUX_CH_CTL_TIME_OUT_ERROR was not set.
[v2] Implemented alternate solution suggested by Jani Nikula.
Signed-off-by: Jim Bride <jim.bride@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The subtraction here was using a signed integer and did not have any
bounds checking at all. This commit adds proper bounds checking, made
easy by use of an unsigned integer. This way, a single packet won't be
able to remotely trigger a massive loop, locking up the system for a
considerable amount of time. A PoC follows below, which requires
ozprotocol.h from this module.
static int hex2num(char c)
{
if (c >= '0' && c <= '9')
return c - '0';
if (c >= 'a' && c <= 'f')
return c - 'a' + 10;
if (c >= 'A' && c <= 'F')
return c - 'A' + 10;
return -1;
}
static int hwaddr_aton(const char *txt, uint8_t *addr)
{
int i;
for (i = 0; i < 6; i++) {
int a, b;
a = hex2num(*txt++);
if (a < 0)
return -1;
b = hex2num(*txt++);
if (b < 0)
return -1;
*addr++ = (a << 4) | b;
if (i < 5 && *txt++ != ':')
return -1;
}
return 0;
}
int main(int argc, char *argv[])
{
if (argc < 3) {
fprintf(stderr, "Usage: %s interface destination_mac\n", argv[0]);
return 1;
}
uint8_t dest_mac[6];
if (hwaddr_aton(argv[2], dest_mac)) {
fprintf(stderr, "Invalid mac address.\n");
return 1;
}
int sockfd = socket(AF_PACKET, SOCK_RAW, IPPROTO_RAW);
if (sockfd < 0) {
perror("socket");
return 1;
}
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A network supplied parameter was not checked before division, leading to
a divide-by-zero. Since this happens in the softirq path, it leads to a
crash. A PoC follows below, which requires the ozprotocol.h file from
this module.
static int hex2num(char c)
{
if (c >= '0' && c <= '9')
return c - '0';
if (c >= 'a' && c <= 'f')
return c - 'a' + 10;
if (c >= 'A' && c <= 'F')
return c - 'A' + 10;
return -1;
}
static int hwaddr_aton(const char *txt, uint8_t *addr)
{
int i;
for (i = 0; i < 6; i++) {
int a, b;
a = hex2num(*txt++);
if (a < 0)
return -1;
b = hex2num(*txt++);
if (b < 0)
return -1;
*addr++ = (a << 4) | b;
if (i < 5 && *txt++ != ':')
return -1;
}
return 0;
}
int main(int argc, char *argv[])
{
if (argc < 3) {
fprintf(stderr, "Usage: %s interface destination_mac\n", argv[0]);
return 1;
}
uint8_t dest_mac[6];
if (hwaddr_aton(argv[2], dest_mac)) {
fprintf(stderr, "Invalid mac address.\n");
return 1;
}
int sockfd = socket(AF_PACKET, SOCK_RAW, IPPROTO_RAW);
if (sockfd < 0) {
perror("socket");
return 1;
}
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since elt->length is a u8, we can make this variable a u8. Then we can
do proper bounds checking more easily. Without this, a potentially
negative value is passed to the memcpy inside oz_hcd_get_desc_cnf,
resulting in a remotely exploitable heap overflow with network
supplied data.
This could result in remote code execution. A PoC which obtains DoS
follows below. It requires the ozprotocol.h file from this module.
static int hex2num(char c)
{
if (c >= '0' && c <= '9')
return c - '0';
if (c >= 'a' && c <= 'f')
return c - 'a' + 10;
if (c >= 'A' && c <= 'F')
return c - 'A' + 10;
return -1;
}
static int hwaddr_aton(const char *txt, uint8_t *addr)
{
int i;
for (i = 0; i < 6; i++) {
int a, b;
a = hex2num(*txt++);
if (a < 0)
return -1;
b = hex2num(*txt++);
if (b < 0)
return -1;
*addr++ = (a << 4) | b;
if (i < 5 && *txt++ != ':')
return -1;
}
return 0;
}
int main(int argc, char *argv[])
{
if (argc < 3) {
fprintf(stderr, "Usage: %s interface destination_mac\n", argv[0]);
return 1;
}
uint8_t dest_mac[6];
if (hwaddr_aton(argv[2], dest_mac)) {
fprintf(stderr, "Invalid mac address.\n");
return 1;
}
int sockfd = socket(AF_PACKET, SOCK_RAW, IPPROTO_RAW);
if (sockfd < 0) {
perror("socket");
return 1;
}
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 334c86c494b9 ("MIPS: IRQ: Add stackoverflow detection") added
kernel stack overflow detection, however it only enabled it conditional
upon the preprocessor definition DEBUG_STACKOVERFLOW, which is never
actually defined. The Kconfig option is called DEBUG_STACKOVERFLOW,
which manifests to the preprocessor as CONFIG_DEBUG_STACKOVERFLOW, so
switch it to using that definition instead.
The early_idt_handlers asm code generates an array of entry
points spaced nine bytes apart. It's not really clear from that
code or from the places that reference it what's going on, and
the code only works in the first place because GAS never
generates two-byte JMP instructions when jumping to global
labels.
Clean up the code to generate the correct array stride (member size)
explicitly. This should be considerably more robust against
screw-ups, as GAS will warn if a .fill directive has a negative
count. Using '. =' to advance would have been even more robust
(it would generate an actual error if it tried to move
backwards), but it would pad with nulls, confusing anyone who
tries to disassemble the code. The new scheme should be much
clearer to future readers.
While we're at it, improve the comments and rename the array and
common code.
Binutils may start relaxing jumps to non-weak labels. If so,
this change will fix our build, and we may need to backport this
change.