Forcing in_interrupt() to return true if we're not in a bona fide
interrupt confuses the softirq code. This fixes warnings like:
NOHZ: local_softirq_pending 282
... which can happen when running things like selftests/x86.
This will change perf's static percpu buffer usage in IST context.
I think this is okay, and it's changing the behavior to match
historical (pre-4.0) behavior.
Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 959274753857 ("x86, traps: Track entry into and exit from IST context") Link: http://lkml.kernel.org/r/cdc215f94d118d691d73df35275022331156fb45.1464130360.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Christian Borntraeger reported a kernel panic after corrupt page counts,
and it turned out to be a regression introduced with commit aa88b68c3b1d
("thp: keep huge zero page pinned until tlb flush"), at least on s390.
put_huge_zero_page() was moved over from zap_huge_pmd() to
release_pages(), and it was replaced by tlb_remove_page(). However,
release_pages() might not always be triggered by (the arch-specific)
tlb_remove_page().
On s390 we call free_page_and_swap_cache() from tlb_remove_page(), and
not tlb_flush_mmu() -> free_pages_and_swap_cache() like the generic
version, because we don't use the MMU-gather logic. Although both
functions have very similar names, they are doing very unsimilar things,
in particular free_page_xxx is just doing a put_page(), while
free_pages_xxx calls release_pages().
This of course results in very harmful put_page()s on the huge zero
page, on architectures where tlb_remove_page() is implemented in this
way. It seems to affect only s390 and sh, but sh doesn't have THP
support, so the problem (currently) probably only exists on s390.
The following quick hack fixed the issue:
Link: http://lkml.kernel.org/r/20160602172141.75c006a9@thinkpad Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
iwpriv app uses iw_point structure to send data to Kernel. The iw_point
structure holds a pointer. For compatibility Kernel converts the pointer
as required for WEXT IOCTLs (SIOCIWFIRST to SIOCIWLAST). Some drivers
may use iw_handler_def.private_args to populate iwpriv commands instead
of iw_handler_def.private. For those case, the IOCTLs from
SIOCIWFIRSTPRIV to SIOCIWLASTPRIV will follow the path ndo_do_ioctl().
Accordingly when the filled up iw_point structure comes from 32 bit
iwpriv to 64 bit Kernel, Kernel will not convert the pointer and sends
it to driver. So, the driver may get the invalid data.
The pointer conversion for the IOCTLs (SIOCIWFIRSTPRIV to
SIOCIWLASTPRIV), which follow the path ndo_do_ioctl(), is mandatory.
This patch adds pointer conversion from 32 bit to 64 bit and vice versa,
if the ioctl comes from 32 bit iwpriv to 64 bit Kernel.
Signed-off-by: Prasun Maiti <prasunmaiti87@gmail.com> Signed-off-by: Ujjal Roy <royujjal@gmail.com> Tested-by: Dibyajyoti Ghosh <dibyajyotig@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This prevents users from triggering a stack overflow through a recursive
invocation of pagefault handling that involves mapping procfs files into
virtual memory.
memcg_offline_kmem() may be called from memcg_free_kmem() after a css
init failure. memcg_free_kmem() is a ->css_free callback which is
called without cgroup_mutex and memcg_offline_kmem() ends up using
css_for_each_descendant_pre() without any locking. Fix it by adding rcu
read locking around it.
mkdir: cannot create directory `65530': No space left on device
===============================
[ INFO: suspicious RCU usage. ]
4.6.0-work+ #321 Not tainted
-------------------------------
kernel/cgroup.c:4008 cgroup_mutex or RCU read lock required!
[ 527.243970] other info that might help us debug this:
[ 527.244715]
rcu_scheduler_active = 1, debug_locks = 0
2 locks held by kworker/0:5/1664:
#0: ("cgroup_destroy"){.+.+..}, at: [<ffffffff81060ab5>] process_one_work+0x165/0x4a0
#1: ((&css->destroy_work)#3){+.+...}, at: [<ffffffff81060ab5>] process_one_work+0x165/0x4a0
[ 527.248098] stack backtrace:
CPU: 0 PID: 1664 Comm: kworker/0:5 Not tainted 4.6.0-work+ #321
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
Workqueue: cgroup_destroy css_free_work_fn
Call Trace:
dump_stack+0x68/0xa1
lockdep_rcu_suspicious+0xd7/0x110
css_next_descendant_pre+0x7d/0xb0
memcg_offline_kmem.part.44+0x4a/0xc0
mem_cgroup_css_free+0x1ec/0x200
css_free_work_fn+0x49/0x5e0
process_one_work+0x1c5/0x4a0
worker_thread+0x49/0x490
kthread+0xea/0x100
ret_from_fork+0x1f/0x40
Link: http://lkml.kernel.org/r/20160526203018.GG23194@mtj.duckdns.org Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This means the userspace program clock_adjtime called the clock_adjtime()
syscall and then crashed inside the compat_get_timex() function.
Syscalls should never crash programs, but instead return EFAULT.
The IIR register contains the executed instruction, which disassebles
into "ldw 0(sr3,r5),r9".
This load-word instruction is part of __get_user() which tried to read the word
at %r5/IOR (0xfa6f7fff). This means the unaligned handler jumped in. The
unaligned handler is able to emulate all ldw instructions, but it fails if it
fails to read the source e.g. because of page fault.
int main(void) {
/* allocate 8k */
char *ptr = mmap(NULL, 2*4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
/* free second half (upper 4k) and make it invalid. */
munmap(ptr+4096, 4096);
/* syscall where first int is unaligned and clobbers into invalid memory region */
/* syscall should return EFAULT */
return syscall(__NR_clock_adjtime, 0, ptr+4095);
}
To fix this issue we simply need to check if the faulting instruction address
is in the exception fixup table when the unaligned handler failed. If it
is, call the fixup routine instead of crashing.
While looking at the unaligned handler I found another issue as well: The
target register should not be modified if the handler was unsuccessful.
When a dual-edge irq is triggered, an incorrect irq will be reported on
condition that the external signal is not stable and this incorrect irq
has been registered.
Correct the register offset.
When we converted the asm routines to C functions, we missed updating
HPTE_R_R based on _PAGE_ACCESSED. ASM code used to copy over the lower
bits from pte via.
andi. r3,r30,0x1fe /* Get basic set of flags */
We also update the code such that we won't update the Change bit ('C'
bit) always. This was added by commit c5cf0e30bf3d8 ("powerpc: Fix
buglet with MMU hash management").
With hash64, we need to make sure that hardware doesn't do a pte update
directly. This is because we do end up with entries in TLB with no hash
page table entry. This happens because when we find a hash bucket full,
we "evict" a more/less random entry from it. When we do that we don't
invalidate the TLB (hpte_remove) because we assume the old translation
is still technically "valid". For more info look at commit 0608d692463("powerpc/mm: Always invalidate tlb on hpte invalidate and
update").
Thus it's critical that valid hash PTEs always have reference bit set
and writeable ones have change bit set. We do this by hashing a
non-dirty linux PTE as read-only and always setting _PAGE_ACCESSED (and
thus R) when hashing anything else in. Any attempt by Linux at clearing
those bits also removes the corresponding hash entry.
Commit 5cf0e30bf3d8 did that for 'C' bit by enabling 'C' bit always.
We don't really need to do that because we never map a RW pte entry
without setting 'C' bit. On READ fault on a RW pte entry, we still map
it READ only, hence a store update in the page will still cause a hash
pte fault.
This patch reverts the part of commit c5cf0e30bf3d8 ("[PATCH] powerpc:
Fix buglet with MMU hash management") and retain the updatepp part.
- If we hit the updatepp path on native, the old code without that
commit, would fail to set C bcause native_hpte_updatepp()
was implemented to filter the same bits as H_PROTECT and not let C
through thus we would "upgrade" a RO HPTE to RW without setting C
thus causing the bug. So the real fix in that commit was the change
to native_hpte_updatepp
Fixes: 89ff725051d1 ("powerpc/mm: Convert __hash_page_64K to C") Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we do not provide the PVR for POWER8NVL, a guest on this system
currently ends up in PowerISA 2.06 compatibility mode on KVM, since QEMU
does not provide a generic PowerISA 2.07 mode yet. So some new
instructions from POWER8 (like "mtvsrd") get disabled for the guest,
resulting in crashes when using code compiled explicitly for
POWER8 (e.g. with the "-mcpu=power8" option of GCC).
Fixes: ddee09c099c3 ("powerpc: Add PVR for POWER8NVL processor") Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We are already using the privileged versions of MMCR0, MMCR1
and MMCRA in the kernel, so for MMCR2, we should better use
the privileged versions, too, to be consistent.
Fixes: 240686c13687 ("powerpc: Initialise PMU related regs on Power8") Suggested-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Thomas Huth <thuth@redhat.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The SIAR and SDAR registers are available twice, one time as SPRs
780 / 781 (unprivileged, but read-only), and one time as the SPRs
796 / 797 (privileged, but read and write). The Linux kernel code
currently uses the unprivileged SPRs - while this is OK for reading,
writing to that register of course does not work.
Since the KVM code tries to write to this register, too (see the mtspr
in book3s_hv_rmhandlers.S), the contents of this register sometimes get
lost for the guests, e.g. during migration of a VM.
To fix this issue, simply switch to the privileged SPR numbers instead.
Signed-off-by: Thomas Huth <thuth@redhat.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the "ibm,configure-pe" and "ibm,configure-bridge" RTAS calls, the
spec states that values of 9900-9905 can be returned, indicating that
software should delay for 10^x (where x is the last digit, i.e. 990x)
milliseconds and attempt the call again. Currently, the kernel doesn't
know about this, and respecting it fixes some PCI failures when the
hypervisor is busy.
The delay is capped at 0.2 seconds.
Signed-off-by: Russell Currey <ruscur@russell.cc> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for
hardware AF/DBM") ensured that pte flags are updated atomically in the
face of potential concurrent, hardware-assisted updates. However, Alex
reports that:
| This patch breaks swapping for me.
| In the broken case, you'll see either systemd cpu time spike (because
| it's stuck in a page fault loop) or the system hang (because the
| application owning the screen is stuck in a page fault loop).
It turns out that this is because the 'dirty' argument to
ptep_set_access_flags is always 0 for read faults, and so we can't use
it to set PTE_RDONLY. The failing sequence is:
1. We put down a PTE_WRITE | PTE_DIRTY | PTE_AF pte
2. Memory pressure -> pte_mkold(pte) -> clear PTE_AF
3. A read faults due to the missing access flag
4. ptep_set_access_flags is called with dirty = 0, due to the read fault
5. pte is then made PTE_WRITE | PTE_DIRTY | PTE_AF | PTE_RDONLY (!)
6. A write faults, but pte_write is true so we get stuck
The solution is to check the new page table entry (as would be done by
the generic, non-atomic definition of ptep_set_access_flags that just
calls set_pte_at) to establish the dirty state.
Fixes: 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM") Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Alexander Graf <agraf@suse.de> Tested-by: Alexander Graf <agraf@suse.de> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ccp-crypto module for AES XTS support has a bug that can allow requests
greater than 4096 bytes in size to be passed to the CCP hardware. The CCP
hardware does not support request sizes larger than 4096, resulting in
incorrect output. The request should actually be handled by the fallback
mechanism instantiated by the ccp-crypto module.
Add a check to insure the request size is less than or equal to the maximum
supported size and use the fallback mechanism if it is not.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In some rare randconfig builds, we can end up with
ASYMMETRIC_PUBLIC_KEY_SUBTYPE enabled but CRYPTO_AKCIPHER disabled,
which fails to link because of the reference to crypto_alloc_akcipher:
crypto/built-in.o: In function `public_key_verify_signature':
:(.text+0x110e4): undefined reference to `crypto_alloc_akcipher'
This adds a Kconfig 'select' statement to ensure the dependency
is always there.
The s390 BFP compiler currently uses relative branch instructions
that only support jumps up to 64 KB. Examples are "j", "jnz", "cgrj",
etc. Currently the maximum size of s390 BPF programs is set
to 0x7ffff. If branches over 64 KB are generated the, kernel can
crash due to incorrect code.
So fix this an reduce the maximum size to 64 KB. Programs larger than
that will be interpreted.
Fixes: ce2b6ad9c185 ("s390/bpf: increase BPF_SIZE_MAX") Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
gpiolib relies on the reference counters to clean up the gpio_device
structure.
Although the number of get/put is properly aligned on gpiolib.c
itself, it does not take into consideration how the referece counters
are affected by other external functions such as cdev_add and device_add.
Because of this, after the last call to put_device, the reference counter
has a value of +3, therefore never calling gpiodevice_release.
Due to the fact that some of the device has already been cleaned on
gpiochip_remove, the library will end up OOPsing the kernel (e.g. a call
to of_gpiochip_find_and_xlate).
The bcm_kona_gpio_reset() calls bcm_kona_gpio_write_lock_regs()
with what looks like the wrong parameter. The write_lock_regs
function takes a pointer to the registers, not the bcm_kona_gpio
structure.
Fix the warning, and probably bug by changing the function to
pass reg_base instead of kona_gpio, fixing the following warning:
drivers/gpio/gpio-bcm-kona.c:550:47: warning: incorrect type in argument 1
(different address spaces)
expected void [noderef] <asn:2>*reg_base
got struct bcm_kona_gpio *kona_gpio
warning: incorrect type in argument 1 (different address spaces)
expected void [noderef] <asn:2>*reg_base
got struct bcm_kona_gpio *kona_gpio
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Acked-by: Ray Jui <ray.jui@broadcom.com> Reviewed-by: Markus Mayer <mmayer@broadcom.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In fdeb8e1547cb9dd39d5d7223b33f3565cf86c28e
("gpio: reflect base and ngpio into gpio_device")
assumed that GPIO descriptors are either valid or error
pointers, but gpiod_get_[index_]optional() actually return
NULL descriptors and then all subsequent calls should just
bail out.
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Andrew Lunn <andrew@lunn.ch> Fixes: fdeb8e1547cb ("gpio: reflect base and ngpio into gpio_device") Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
PTRACE_SETVFPREGS fails to properly mark the VFP register set to be
reloaded, because it undoes one of the effects of vfp_flush_hwstate().
Specifically vfp_flush_hwstate() sets thread->vfpstate.hard.cpu to
an invalid CPU number, but vfp_set() overwrites this with the original
CPU number, thereby rendering the hardware state as apparently "valid",
even though the software state is more recent.
Fix this by reverting the previous change.
Fixes: 8130b9d7b9d8 ("ARM: 7308/1: vfp: flush thread hwstate before copying ptrace registers") Acked-by: Will Deacon <will.deacon@arm.com> Tested-by: Simon Marchi <simon.marchi@ericsson.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
That is some different register for ALC255 and ALC256.
ALC256 can't fit with some ALC255 register.
This issue is cause from LDO output voltage control.
This patch is updated the right LDO register value.
The global variable __oprofile_cpu_pmu is set before the PMU is fully
initialized. If an error occurs before the end of the initialization,
the PMU will be freed and the variable will contain an invalid pointer.
This will result in a kernel crash when perf will be used.
Fix it by moving the setting of __oprofile_cpu_pmu when the PMU is fully
initialized (i.e when it is no longer possible to fail).
Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
MOV to DR6 or DR7 causes a #GP if an attempt is made to write a 1 to
any of bits 63:32. However, this is not detected at KVM_SET_DEBUGREGS
time, and the next KVM_RUN oopses:
When saving the state of the list registers, it is critical to
reset them zero, as we could otherwise leave unexpected EOI
interrupts pending for virtual level interrupts.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When saving the state of the list registers, it is critical to
reset them zero, as we could otherwise leave unexpected EOI
interrupts pending for virtual level interrupts.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
At present we perform an xfrm_lookup() for each UDPv6 message we
send. The lookup involves querying the flow cache (flow_cache_lookup)
and, in case of a cache miss, creating an XFRM bundle.
If we miss the flow cache, we can end up creating a new bundle and
deriving the path MTU (xfrm_init_pmtu) from on an already transformed
dst_entry, which we pass from the socket cache (sk->sk_dst_cache) down
to xfrm_lookup(). This can happen only if we're caching the dst_entry
in the socket, that is when we're using a connected UDP socket.
To put it another way, the path MTU shrinks each time we miss the flow
cache, which later on leads to incorrectly fragmented payload. It can
be observed with ESPv6 in transport mode:
1) Set up a transformation and lower the MTU to trigger fragmentation
# ip xfrm policy add dir out src ::1 dst ::1 \
tmpl src ::1 dst ::1 proto esp spi 1
# ip xfrm state add src ::1 dst ::1 \
proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b
# ip link set dev lo mtu 1500
2) Monitor the packet flow and set up an UDP sink
# tcpdump -ni lo -ttt &
# socat udp6-listen:12345,fork /dev/null &
4) Compare it to a non-connected socket
# perl -e 'print "@" x 1500' | socat - udp6-sendto:[::1]:12345
00:00:40.535488 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x6), length 1448
00:00:00.000010 IP6 ::1 > ::1: frag (1448|64)
What happens in step (3) is:
1) when connecting the socket in __ip6_datagram_connect(), we
perform an XFRM lookup, miss the flow cache, create an XFRM
bundle, and cache the destination,
2) afterwards, when sending the datagram, we perform an XFRM lookup,
again, miss the flow cache (due to mismatch of flowi6_iif and
flowi6_oif, which is an issue of its own), and recreate an XFRM
bundle based on the cached (and already transformed) destination.
To prevent the recreation of an XFRM bundle, avoid an XFRM lookup
altogether whenever we already have a destination entry cached in the
socket. This prevents the path MTU shrinkage and brings us on par with
UDPv4.
The fix also benefits connected PINGv6 sockets, another user of
ip6_sk_dst_lookup_flow(), who also suffer messages being transformed
twice.
Joint work with Hannes Frederic Sowa.
Reported-by: Jan Tluka <jtluka@redhat.com> Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Unused fields of udp_cfg must be all zeros. Otherwise
setup_udp_tunnel_sock() fills ->gro_receive and ->gro_complete
callbacks with garbage, eventually resulting in panic when used by
udp_gro_receive().
v2: use empty initialiser instead of "{ NULL }" to avoid relying on
first field's type.
Fixes: 38fd2af24fcf ("udp: Add socket based GRO and config") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The missing br_vlan_should_use() test caused creation of an unneeded
local fdb entry on changing mac address of a bridge device when there is
a vlan which is configured on a bridge port but not on the bridge
device.
Fixes: 2594e9064a57 ("bridge: vlan: add per-vlan struct and move to rhashtables") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In bpf_perf_event_read() and bpf_perf_event_output(), we must use
READ_ONCE() for fetching the struct file pointer, which could get
updated concurrently, so we must prevent the compiler from potential
refetching.
We already do this with tail calls for fetching the related bpf_prog,
but not so on stored perf events. Semantics for both are the same
with regards to updates.
Fixes: a43eec304259 ("bpf: introduce bpf_perf_event_output() helper") Fixes: 35578d798400 ("bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since NAPI works by shutting down event interrupts when theres
work and turning them on when theres none, the net driver must
make sure that interrupts are disabled when it reschedules polling.
By calling napi_reschedule, the driver switches to polling mode,
therefor there should be no interrupt interference.
Any received packets will be handled in nps_enet_poll by polling the HW
indication of received packet until all packets are handled.
Signed-off-by: Elad Kanfi <eladkan@mellanox.com> Acked-by: Noam Camus <noamca@mellanox.com> Tested-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When create a new vxlan link, example:
ip link add vtap mtu 1440 type vxlan vni 1 dev eth0
The argument "mtu" has no effect, because it is not set to conf->mtu. The
default value is used in vxlan_dev_configure function.
This problem was introduced by commit 0dfbdf4102b9 (vxlan: Factor out device
configuration).
Fixes: 0dfbdf4102b9 (vxlan: Factor out device configuration) Signed-off-by: Chen Haiquan <oc@yunify.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The memcpy() currently copies mdio_bus_data into new_bus->irq, which
makes no sense, since the mdio_bus_data structure contains more than
just irqs. The code was likely supposed to copy mdio_bus_data->irqs
into the new_bus->irq instead, so fix this.
Fixes: e7f4dc3536a4 ("mdio: Move allocation of interrupts into core") Signed-off-by: Marek Vasut <marex@denx.de> Cc: David S. Miller <davem@davemloft.net> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch follows Eric Dumazet's commit 7b70176421 for Atheros
atl1c driver to fix one exactly same bug in alx driver, that the
network link will be lost in 1-5 minutes after the device is up.
My laptop Lenovo Y580 with Atheros AR8161 ethernet device hit the
same problem with kernel 4.4, and it will be cured by Jarod Wilson's
commit c406700c for alx driver which get merged in 4.5. But there
are still some alx devices can't function well even with Jarod's
patch, while this patch could make them work fine. More details on
https://bugzilla.kernel.org/show_bug.cgi?id=70761
The debug shows the issue is very likely to be related with the RX
DMA address, specifically 0x...f80, if RX buffer get 0x...f80 several
times, their will be RX overflow error and device will stop working.
For kernel 4.5.0 with Jarod's patch which works fine with my
AR8161/Lennov Y580, if I made some change to the
__netdev_alloc_skb
--> __alloc_page_frag()
to make the allocated buffer can get an address with 0x...f80,
then the same error happens. If I make it to 0x...f40 or 0x....fc0,
everything will be still fine. So I tend to believe that the
0x..f80 address cause the silicon to behave abnormally.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=70761 Cc: Eric Dumazet <edumazet@google.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Jarod Wilson <jarod@redhat.com> Signed-off-by: Feng Tang <feng.tang@intel.com> Tested-by: Ole Lukoie <olelukoie@mail.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The team_device_event() notifier calls team_compute_features() to fix
vlan_features under team->lock to protect team->port_list. The problem is
that subsequent __team_compute_features() calls netdev_change_features()
to propagate vlan_features to upper vlan devices while team->lock is still
taken. This can lead to deadlock when NETIF_F_LRO is modified on lower
devices or team device itself.
Example:
The team0 as active backup with eth0 and eth1 NICs. Both eth0 & eth1 are
LRO capable and LRO is enabled. Thus LRO is also enabled on team0.
The command 'ethtool -K team0 lro off' now hangs due to this deadlock:
The bug is present in team from the beginning but it appeared after the commit fd867d5 (net/core: generic support for disabling netdev features down stack)
that adds synchronization of features with lower devices.
Fixes: fd867d5 (net/core: generic support for disabling netdev features down stack) Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Otherwise, if we fail to allocate new PIO buffers, our TXQs will try to
use the old ones, which aren't there any more.
Fixes: 183233bec810 "sfc: Allocate and link PIO buffers; map them with write-combining" Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The spinlock used by the hwbm functions must be initialized by the
network driver. This commit fixes this lack and the following erros when
lockdep is enabled:
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
[<c010ff80>] (unwind_backtrace) from [<c010bd08>] (show_stack+0x10/0x14)
[<c010bd08>] (show_stack) from [<c032913c>] (dump_stack+0xb4/0xe0)
[<c032913c>] (dump_stack) from [<c01670e4>] (__lock_acquire+0x1f58/0x2060)
[<c01670e4>] (__lock_acquire) from [<c0167dec>] (lock_acquire+0xa4/0xd0)
[<c0167dec>] (lock_acquire) from [<c06f6650>] (_raw_spin_lock_irqsave+0x54/0x68)
[<c06f6650>] (_raw_spin_lock_irqsave) from [<c058e830>] (hwbm_pool_add+0x1c/0xdc)
[<c058e830>] (hwbm_pool_add) from [<c043f4e8>] (mvneta_bm_pool_use+0x338/0x490)
[<c043f4e8>] (mvneta_bm_pool_use) from [<c0443198>] (mvneta_probe+0x654/0x1284)
[<c0443198>] (mvneta_probe) from [<c03b894c>] (platform_drv_probe+0x4c/0xb0)
[<c03b894c>] (platform_drv_probe) from [<c03b7158>] (driver_probe_device+0x214/0x2c0)
[<c03b7158>] (driver_probe_device) from [<c03b72c4>] (__driver_attach+0xc0/0xc4)
[<c03b72c4>] (__driver_attach) from [<c03b5440>] (bus_for_each_dev+0x68/0x9c)
[<c03b5440>] (bus_for_each_dev) from [<c03b65b8>] (bus_add_driver+0x1a0/0x218)
[<c03b65b8>] (bus_add_driver) from [<c03b79cc>] (driver_register+0x78/0xf8)
[<c03b79cc>] (driver_register) from [<c01018f4>] (do_one_initcall+0x90/0x1dc)
[<c01018f4>] (do_one_initcall) from [<c0900de4>] (kernel_init_freeable+0x15c/0x1fc)
[<c0900de4>] (kernel_init_freeable) from [<c06eed90>] (kernel_init+0x8/0x114)
[<c06eed90>] (kernel_init) from [<c0107910>] (ret_from_fork+0x14/0x24)
Fixes: baa11ebc0c76 ("net: mvneta: Use the new hwbm framework") Reported-by: Russell King <rmk+kernel@armlinux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Follow-up to commit e27f4a942a0e ("bpf: Use mount_nodev not mount_ns
to mount the bpf filesystem"), which removes the FS_USERNS_MOUNT flag.
The original idea was to have a per mountns instance instead of a
single global fs instance, but that didn't work out and we had to
switch to mount_nodev() model. The intent of that middle ground was
that we avoid users who don't play nice to create endless instances
of bpf fs which are difficult to control and discover from an admin
point of view, but at the same time it would have allowed us to be
more flexible with regard to namespaces.
Therefore, since we now did the switch to mount_nodev() as a fix
where individual instances are created, we also need to remove userns
mount flag along with it to avoid running into mentioned situation.
I don't expect any breakage at this early point in time with removing
the flag and we can revisit this later should the requirement for
this come up with future users. This and commit e27f4a942a0e have
been split to facilitate tracking should any of them run into the
unlikely case of causing a regression.
Fixes: b2197755b263 ("bpf: add support for persistent maps/progs") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit fa50d974d104 ("ipv4: Namespaceify ip_default_ttl sysctl knob")
moves the default TTL assignment, and as side-effect IPv4 TTL now
has a default value only if sysctl support is enabled (CONFIG_SYSCTL=y).
The sysctl_ip_default_ttl is fundamental for IP to work properly,
as it provides the TTL to be used as default. The defautl TTL may be
used in ip_selected_ttl, through the following flow:
This commit fixes the issue by assigning net->ipv4.sysctl_ip_default_ttl
in net_init_net, called during ipv4's initialization.
Without this commit, a kernel built without sysctl support will send
all IP packets with zero TTL (unless a TTL is explicitly set, e.g.
with setsockopt).
Given a similar issue might appear on the other knobs that were
namespaceify, this commit also moves them.
Fixes: fa50d974d104 ("ipv4: Namespaceify ip_default_ttl sysctl knob") Signed-off-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
These structures are defined only if __USE_MISC is set in glibc net/if.h
headers, ie when _BSD_SOURCE or _SVID_SOURCE are defined.
CC: Jan Engelhardt <jengelh@inai.de> CC: Josh Boyer <jwboyer@fedoraproject.org> CC: Stephen Hemminger <shemming@brocade.com> CC: Waldemar Brodkorb <mail@waldemar-brodkorb.de> CC: Gabriel Laskar <gabriel@lse.epita.fr> CC: Mikko Rapeli <mikko.rapeli@iki.fi> Fixes: 4a91cb61bb99 ("uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In case we find a socket with encapsulation enabled we should call
the encap_recv function even if just a udp header without payload is
available. The callbacks are responsible for correctly verifying and
dropping the packets.
Also, in case the header validation fails for geneve and vxlan we
shouldn't put the skb back into the socket queue, no one will pick
them up there. Instead we can simply discard them in the respective
encap_recv functions.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While reviewing the filesystems that set FS_USERNS_MOUNT I spotted the
bpf filesystem. Looking at the code I saw a broken usage of mount_ns
with current->nsproxy->mnt_ns. As the code does not acquire a
reference to the mount namespace it can not possibly be correct to
store the mount namespace on the superblock as it does.
Replace mount_ns with mount_nodev so that each mount of the bpf
filesystem returns a distinct instance, and the code is not buggy.
In discussion with Hannes Frederic Sowa it was reported that the use
of mount_ns was an attempt to have one bpf instance per mount
namespace, in an attempt to keep resources that pin resources from
hiding. That intent simply does not work, the vfs is not built to
allow that kind of behavior. Which means that the bpf filesystem
really is buggy both semantically and in it's implemenation as it does
not nor can it implement the original intent.
This change is userspace visible, but my experience with similar
filesystems leads me to believe nothing will break with a model of each
mount of the bpf filesystem is distinct from all others.
Fixes: b2197755b263 ("bpf: add support for persistent maps/progs") Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We used to check dev->reg_state against NETREG_REGISTERED after each
time we are woke up. But after commit 9e641bdcfa4e ("net-tun:
restructure tun_do_read for better sleep/wakeup efficiency"), it uses
skb_recv_datagram() which does not check dev->reg_state. This will
result if we delete a tun/tap device after a process is blocked in the
reading. The device will wait for the reference count which was held
by that process for ever.
Fixes this by using RCV_SHUTDOWN which will be checked during
sk_recv_datagram() before trying to wake up the process during uninit.
Fixes: 9e641bdcfa4e ("net-tun: restructure tun_do_read for better
sleep/wakeup efficiency") Cc: Eric Dumazet <edumazet@google.com> Cc: Xi Wang <xii@google.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In my last commit I replaced MACSEC_SA_ATTR_KEYID by
MACSEC_SA_ATTR_KEY.
Fixes: 8acca6acebd0 ("macsec: key identifier is 128 bits, not 64") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The problem is that fib_info->nh is [0] so the struct fib_info
allocation size depends on number of nexthops. If we just copy fib_info,
we do not copy the nexthops info and driver accesses memory which is not
ours.
Given the fact that fib4 does not defer operations and therefore it does
not need copy, just pass the pointer down to drivers as it was done
before.
Fixes: 850d0cbc91 ("switchdev: remove pointers from switchdev objects") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The publication field of the old netlink API should contain the
publication key and not the publication reference.
Fixes: 44a8ae94fd55 (tipc: convert legacy nl name table dump to nl compat) Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we free cb->skb after a dump, we do it after releasing the
lock. This means that a new dump could have started in the time
being and we'll end up freeing their skb instead of ours.
This patch saves the skb and module before we unlock so we free
the right memory.
Fixes: 16b304f3404f ("netlink: Eliminate kmalloc in netlink dump operation.") Reported-by: Baozeng Ding <sploving1@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make sure the socket for which the user is listing publication exists
before parsing the socket netlink attributes.
Prior to this patch a call without any socket caused a NULL pointer
dereference in tipc_nl_publ_dump().
Tested-and-reported-by: Baozeng Ding <sploving1@gmail.com> Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Acked-by: Jon Maloy <jon.maloy@ericsson.cm> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix this by suppressing VPD inquiry for this device.
Signed-off-by: Ewan D. Milne <emilne@redhat.com> Reported-by: Jan Stancek <jstancek@redhat.com> Tested-by: Jan Stancek <jstancek@redhat.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When SCSI was written, all commands coming from the filesystem
(REQ_TYPE_FS commands) had data. This meant that our signal for needing
to complete the command was the number of bytes completed being equal to
the number of bytes in the request. Unfortunately, with the advent of
flush barriers, we can now get zero length REQ_TYPE_FS commands, which
confuse this logic because they satisfy the condition every time. This
means they never get retried even for retryable conditions, like UNIT
ATTENTION because we complete them early assuming they're done. Fix
this by special casing the early completion condition to recognise zero
length commands with errors and let them drop through to the retry code.
Reported-by: Sebastian Parschauer <s.parschauer@gmx.de> Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com> Tested-by: Jack Wang <jinpu.wang@profitbricks.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 5e3ca2b349b1 ("regulator: Try to resolve regulators supplies on
registration") added a call to regulator_resolve_supply() within
regulator_register() where the regulator_list_mutex is held. This causes
a deadlock to occur on the Tegra114 Dalmore board when the palmas PMIC
is registered because regulator_register_resolve_supply() calls
regulator_dev_lookup() which may try to acquire the regulator_list_mutex
again.
Fix this by releasing the mutex before calling
regulator_register_resolve_supply() and update the error exit path to
ensure the mutex is released on an error.
[Made commit message more legible -- broonie]
Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit b9b06cb6feda
("IB/hfi1: Fix missing lock/unlock in verbs drain callback")
added a spin lock.
Unfortunately, the new lock code can be called from a base
level interrupt state, and an interrupt that can get stacked
will attempt to get the same lock.
Fix by using the flag save/restore spin lock variation.
Cc: stable@vger.kernel.org # 4.6+ Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A recent cleanup removed the only user of the 'kms' variable in
msm_preclose(), causing a harmless compiler warning:
drivers/gpu/drm/msm/msm_drv.c: In function 'msm_preclose':
drivers/gpu/drm/msm/msm_drv.c:468:18: error: unused variable 'kms' [-Werror=unused-variable]
This removes the variable as well.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 4016260ba47a ("drm/msm: fix bug after preclose removal") Signed-off-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We don't write back stale inodes so we should skip them in
xfs_iflush_cluster, too.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some careless idiot(*) wrote crap code in commit 1a3e8f3 ("xfs:
convert inode cache lookups to use RCU locking") back in late 2010,
and so xfs_iflush_cluster checks the wrong inode for whether it is
still valid under RCU protection. Fix it to lock and check the
correct inode.
(*) Careless-idiot: Dave Chinner <dchinner@redhat.com>
Discovered-by: Brain Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a failure due to an inode buffer occurs, the error handling
fails to abort the inode writeback correctly. This can result in the
inode being reclaimed whilst still in the AIL, leading to
use-after-free situations as well as filesystems that cannot be
unmounted as the inode log items left in the AIL never get removed.
Fix this by ensuring fatal errors from xfs_imap_to_bp() result in
the inode flush being aborted correctly.
Reported-by: Shyam Kaushik <shyam@zadarastorage.com> Diagnosed-by: Shyam Kaushik <shyam@zadarastorage.com> Tested-by: Shyam Kaushik <shyam@zadarastorage.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Joe Lawrence reported a list_add corruption with 4.6-rc1 when
testing some custom md administration code that made it's own
block device nodes for the md array. The simple test loop of:
for i in {0..100}; do
mknod --mode=0600 $tmp/tmp_node b $MAJOR $MINOR
mdadm --detail --export $tmp/tmp_node > /dev/null
rm -f $tmp/tmp_node
done
Would produce this warning in bd_acquire() when mdadm opened the
device node:
This is a regression caused by commit c19b3b05 ("xfs: mode di_mode
to vfs inode"). The issue is that xfs_inactive() frees the
unlinked inode, and the above commit meant that this freeing zeroed
the mode in the struct inode. The problem is that after evict() has
called ->evict_inode, it expects the i_mode to be intact so that it
can call bd_forget() or cd_forget() to drop the reference to the
block device inode attached to the XFS inode.
In reality, the only thing we do in xfs_fs_evict_inode() that is not
generic is call xfs_inactive(). We can move the xfs_inactive() call
to xfs_fs_destroy_inode() without any problems at all, and this
will leave the VFS inode intact until it is completely done with it.
So, remove xfs_fs_evict_inode(), and do the work it used to do in
->destroy_inode instead.
Reported-by: Joe Lawrence <joe.lawrence@stratus.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 96f859d ("libxfs: pack the agfl header structure so
XFS_AGFL_SIZE is correct") allowed the freelist to use the empty
slot at the end of the freelist on 64 bit systems that was not
being used due to sizeof() rounding up the structure size.
This has caused versions of xfs_repair prior to 4.5.0 (which also
has the fix) to report this as a corruption once the filesystem has
been grown. Older kernels can also have problems (seen from a whacky
container/vm management environment) mounting filesystems grown on a
system with a newer kernel than the vm/container it is deployed on.
To avoid this problem, change the initial free list indexes not to
wrap across the end of the AGFL, hence avoiding the initialisation
of agf_fllast to the last index in the AGFL.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Today, a kernel which refuses to mount a filesystem read-write
due to unknown ro-compat features can still transition to read-write
via the remount path. The old kernel is most likely none the wiser,
because it's unaware of the new feature, and isn't using it. However,
writing to the filesystem may well corrupt metadata related to that
new feature, and moving to a newer kernel which understand the feature
will have problems.
Right now the only ro-compat feature we have is the free inode btree,
which showed up in v3.16. It would be good to push this back to
all the active stable kernels, I think, so that if anyone is using
newer mkfs (which enables the finobt feature) with older kernel
releases, they'll be protected.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Enabling CONFIG_GCOV_PROFILE_ALL produces us a lot of warnings like
lib/lz4/lz4hc_compress.c: In function 'lz4_compresshcctx':
lib/lz4/lz4hc_compress.c:514:1: warning: the frame size of 1504 bytes is larger than 1024 bytes [-Wframe-larger-than=]
After some investigation, I found that this behavior started with gcc-4.9,
and opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69702.
A suggested workaround for it is to use the -fno-tree-loop-im
flag that turns off one of the optimization stages in gcc, so the
code runs a little slower but does not use excessive amounts
of stack.
We could make this conditional on the gcc version, but I could not
find an easy way to do this in Kbuild and the benefit would be
fairly small, given that most of the gcc version in production are
affected now.
I'm marking this for 'stable' backports because it addresses a bug
with code generation in gcc that exists in all kernel versions
with the affected gcc releases.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Michal Marek <mmarek@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If page_move_anon_rmap() is refiling a pmd-splitted THP mapped in a tail
page from a pte, the "address" must be THP aligned in order for the
page->index bugcheck to pass in the CONFIG_DEBUG_VM=y builds.
Link: http://lkml.kernel.org/r/1464253620-106404-1-git-send-email-kirill.shutemov@linux.intel.com Fixes: 6d0a07edd17c ("mm: thp: calculate the mapcount correctly for THP pages during WP faults") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After commit 21a59991ce0c ("scripts/package/Makefile: rpmbuild is needed
for rpm targets"), it is no longer possible to specify RPMOPTS.
For example, we can no longer able to control _topdir using the following
make command.
make RPMOPTS="--define '_topdir /home/xyz/workspace/'" binrpm-pkg
Fixes: 21a59991ce0c ("scripts/package/Makefile: rpmbuild is needed for rpm targets") Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Michal Marek <mmarek@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With netconsole (at least) the pr_err("... disablingn") call can
recurse back into the dma-debug code, where it'll try to grab
free_entries_lock again. Avoid the problem by doing the printk after
dropping the lock.
The PM runtime will be left disabled for the device if its
.suspend_late() callback fails and async suspend is not allowed
for this device. In this case device will not be added in
dpm_late_early_list and dpm_resume_early() will ignore this
device, as result PM runtime will be disabled for it forever
(side effect: after 8 subsequent failures for the same device
the PM runtime will be reenabled due to disable_depth overflow).
To fix this problem, add devices to dpm_late_early_list regardless
of whether or not device_suspend_late() returns errors for them.
That will ensure failures in there to be handled consistently for
all devices regardless of their async suspend/resume status.
Reported-by: Grygorii Strashko <grygorii.strashko@ti.com> Tested-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since the patch "NFS: Allow multiple commit requests in flight per file"
we can run multiple simultaneous commits on the same inode. This
introduced a race over collecting pages to commit that made it possible
to call nfs_init_commit() with an empty list - which causes crashes like
the one below.
The fix is to catch this race and avoid calling nfs_init_commit and
initiate_commit when there is no work to do.
Currently, in ext4_mb_init(), there's a loop like the following:
do {
...
offset += 1 << (sb->s_blocksize_bits - i);
i++;
} while (i <= sb->s_blocksize_bits + 1);
Note that the updated offset is used in the loop's next iteration only.
However, at the last iteration, that is at i == sb->s_blocksize_bits + 1,
the shift count becomes equal to (unsigned)-1 > 31 (c.f. C99 6.5.7(3))
and UBSAN reports
Observe that the mentioned shift exponent, 4294967295, equals (unsigned)-1.
Unless compilers start to do some fancy transformations (which at least
GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the
such calculated value of offset is never used again.
Silence UBSAN by introducing another variable, offset_incr, holding the
next increment to apply to offset and adjust that one by right shifting it
by one position per loop iteration.
Unless compilers start to do some fancy transformations (which at least
GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the
such calculated value of bb is never used again.
Silence UBSAN by introducing another variable, bb_incr, holding the next
increment to apply to bb and adjust that one by right shifting it by one
position per loop iteration.
When filesystem is corrupted in the right way, it can happen
ext4_mark_iloc_dirty() in ext4_orphan_add() returns error and we
subsequently remove inode from the in-memory orphan list. However this
deletion is done with list_del(&EXT4_I(inode)->i_orphan) and thus we
leave i_orphan list_head with a stale content. Later we can look at this
content causing list corruption, oops, or other issues. The reported
trace looked like:
Instead of just printing warning messages, if the orphan list is
corrupted, declare the file system is corrupted. If there are any
reserved inodes in the orphaned inode list, declare the file system
corrupted and stop right away to avoid doing more potential damage to
the file system.
If the orphaned inode list contains inode #5, ext4_iget() returns a
bad inode (since the bootloader inode should never be referenced
directly). Because of the bad inode, we end up processing the inode
repeatedly and this hangs the machine.
(But don't do this if you are using an unpatched kernel if you care
about the system staying functional. :-)
This bug was found by the port of American Fuzzy Lop into the kernel
to find file system problems[1]. (Since it *only* happens if inode #5
shows up on the orphan list --- 3, 7, 8, etc. won't do it, it's not
surprising that AFL needed two hours before it found it.)
Huang has reported that in his powerfail testing he is seeing stale
block contents in some of recently allocated blocks although he mounts
ext4 in data=ordered mode. After some investigation I have found out
that indeed when delayed allocation is used, we don't add inode to
transaction's list of inodes needing flushing before commit. Originally
we were doing that but commit f3b59291a69d removed the logic with a
flawed argument that it is not needed.
The problem is that although for delayed allocated blocks we write their
contents immediately after allocating them, there is no guarantee that
the IO scheduler or device doesn't reorder things and thus transaction
allocating blocks and attaching them to inode can reach stable storage
before actual block contents. Actually whenever we attach freshly
allocated blocks to inode using a written extent, we should add inode to
transaction's ordered inode list to make sure we properly wait for block
contents to be written before committing the transaction. So that is
what we do in this patch. This also handles other cases where stale data
exposure was possible - like filling hole via mmap in
data=ordered,nodelalloc mode.
The only exception to the above rule are extending direct IO writes where
blkdev_direct_IO() waits for IO to complete before increasing i_size and
thus stale data exposure is not possible. For now we don't complicate
the code with optimizing this special case since the overhead is pretty
low. In case this is observed to be a performance problem we can always
handle it using a special flag to ext4_map_blocks().
Pass the current crtc state, not the old crtc state, to the
.update_plane() hook.
Noticed on BSW when PRIMSIZE was getting programmed to a stale value
which produced utter garbage on screen eg. wwhen going from 1920x1080
to 1024x768.
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Fixes: a758e6845825 ("drm/i915: Do not use commit_plane for sprite planes.") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1457543247-13987-3-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The component master driver imx-drm-core matches component devices using
their of_node. Since commit 950b410dd1ab ("gpu: ipu-v3: Fix imx-ipuv3-crtc
module autoloading"), the imx-ipuv3-crtc dev->of_node is not set during
probing. Before that, of_node was set and caused an of: modalias to be
used instead of the platform: modalias, which broke module autoloading.
On the other hand, if dev->of_node is not set yet when the imx-ipuv3-crtc
probe function calls component_add, component matching in imx-drm-core
fails. While dev->of_node will be set once the next component tries to
bring up the component master, imx-drm-core component binding will never
succeed if one of the crtc devices is probed last.
Add of_node to the component platform data and match against the
pdata->of_node instead of dev->of_node in imx-drm-core to work around
this problem.
Add a helper which aids in the identification of DP dual mode
(aka. DP++) adaptors. There are several types of adaptors
specified: type 1 DVI, type 1 HDMI, type 2 DVI, type 2 HDMI
Type 1 adaptors have a max TMDS clock limit of 165MHz, type 2 adaptors
may go as high as 300MHz and they provide a register informing the
source device what the actual limit is. Supposedly also type 1 adaptors
may optionally implement this register. This TMDS clock limit is the
main reason why we need to identify these adaptors.
Type 1 adaptors provide access to their internal registers and the sink
DDC bus through I2C. Type 2 adaptors provide this access both via I2C
and I2C-over-AUX. A type 2 source device may choose to implement either
of these methods. If a source device implements the I2C-over-AUX
method, then the driver will obviously need specific support for such
adaptors since the port is driven like an HDMI port, but DDC
communication happes over the AUX channel.
This helper should be enough to identify the adaptor type (some
type 1 DVI adaptors may be a slight exception) and the maximum TMDS
clock limit. Another feature that may be available is control over
the TMDS output buffers on the adaptor, possibly allowing for some
power saving when the TMDS link is down.
Other user controllable features that may be available in the adaptors
are downstream i2c bus speed control when using i2c-over-aux, and
some control over the CEC pin. I chose not to provide any helper
functions for those since I have no use for them in i915 at this time.
The rest of the registers in the adaptor are mostly just information,
eg. IEEE OUI, hardware and firmware revision, etc.
v2: Pass adaptor type to helper functions to ease driver implementation
Fix a bunch of typoes (Paulo)
Add DRM_DP_DUAL_MODE_UNKNOWN for the case where we don't (yet) know
the type (Paulo)
Reject 0x00 and 0xff DP_DUAL_MODE_MAX_TMDS_CLOCK values (Paulo)
Adjust drm_dp_dual_mode_detect() type2 vs. type1 detection to
ease future LSPCON enabling
Remove the unused DP_DUAL_MODE_LAST_RESERVED define
v3: Fix kernel doc function argument descriptions (Jani)
s/NONE/UNKNOWN/ in drm_dp_dual_mode_detect() docs
Add kernel doc for enum drm_dp_dual_mode_type
Actually build the docs
Fix more typoes
v4: Adjust code indentation of type2 adaptor detection (Shashank)
Add debug messages for failurs cases (Shashank)
v5: EXPORT_SYMBOL(drm_dp_dual_mode_read) (Paulo)
Cc: Tore Anderson <tore@fud.no> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Shashank Sharma <shashank.sharma@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Shashank Sharma <shashank.sharma@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Shashank Sharma <shashank.sharma@intel.com> (v4) Link: http://patchwork.freedesktop.org/patch/msgid/1462542412-25533-1-git-send-email-ville.syrjala@linux.intel.com
(cherry picked from commit ede53344dbfd1dd43bfd73eb6af743d37c56a7c3) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 92826fcdfc14 ("drm/i915: Calculate watermark related members in the crtc_state, v4.")
broke thigns by removing the pre vs. post wm update distinction. We also
lost the pre plane wm update entirely for VLV/CHV from the crtc enable
path.
This caused underruns on modeset and plane enable/disable on CHV,
and often those can lead to a dead pipe.
So let's bring back the pre vs. post thing, and let's toss in an
explicit wm update to valleyview_crtc_enable() to avoid having to
put it into the common code.
This is more or less a partial revert of the offending commit.
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: drm-intel-fixes@lists.freedesktop.org Fixes: 92826fcdfc14 ("drm/i915: Calculate watermark related members in the crtc_state, v4.") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1457543247-13987-4-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we read out the watermark state from the hardware we're supposed to
transfer that into the active watermarks, but currently we fail to any
part of the active watermarks that isn't explicitly written. Let's clear
it all upfront.
Looks like this has been like this since the beginning, when I added the
readout. No idea why I didn't clear it up.
Cc: Matt Roper <matthew.d.roper@intel.com> Fixes: 243e6a44b9ca ("drm/i915: Init HSW watermark tracking in intel_modeset_setup_hw_state()") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1463151318-14719-2-git-send-email-ville.syrjala@linux.intel.com
(cherry picked from commit 15606534bf0a65d8a74a90fd57b8712d147dbca6) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To save a bit of power, let's try to turn off the TMDS output buffers
in DP++ adaptors when we're not driving the port.
v2: Let's not forget DDI, toss in a debug message while at it
v3: Just do the TMDS output control based on adaptor type. With the
helper getting passed the type, we wouldn't actually have to
check at all in the driver, but the check eliminates the debug
output more honest
Try to detect the max TMDS clock limit for the DP++ adaptor (if any)
and take it into account when checking the port clock.
Note that as with the sink (HDMI vs. DVI) TMDS clock limit we'll ignore
the adaptor TMDS clock limit in the modeset path, in case users are
already "overclocking" their TMDS links. One subtle change here is that
we'll have to respect the adaptor TMDS clock limit when we decide whether
to do 12bpc or 8bpc, otherwise we might end up picking 12bpc and
accidentally driving the TMDS link out of spec even when the user chose
a mode that fits wihting the limits at 8bpc. This means you can't
"overclock" your DP++ dongle at 12bpc anymore, but you can continue to
do so at 8bpc.
Note that for simplicity we'll use the I2C access method for all dual
mode adaptors including type 2. Otherwise we'd have to start mixing
DP AUX and HDMI together. In the future we may need to do that if we
come across any board designs that don't hook up the DDC pins to the
DP++ connectors. Such boards would obviously only work with type 2
dual mode adaptors, and not type 1.
v2: Store adaptor type under indel_hdmi->dp_dual_mode
Deal with DRM_DP_DUAL_MODE_UNKNOWN
Pass adaptor type to drm_dp_dual_mode_max_tmds_clock(),
and use it for type1 adaptors as well
Reported-by: Tore Anderson <tore@fud.no> Fixes: 7a0baa623446 ("Revert "drm/i915: Disable 12bpc hdmi for now"") Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Shashank Sharma <shashank.sharma@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1462216105-20881-3-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Shashank Sharma <shashank.sharma@intel.com>
(cherry picked from commit b1ba124d8e95cca48d33502a4a76b1ed09d213ce) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The default of 0 is 500us of link training, but that's not enough for
some platforms. Decoding this correctly means we're using 2.5ms of
link training on these platforms, which fixes flickering issues
associated with enabling PSR.
v2: Unbotch the math a bit.
v3: Drop debug hunk.
v4: Improve commit message.
Tested-by: Lyude <cpaul@redhat.com> Cc: Lyude <cpaul@redhat.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95176 Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Sonika Jindal <sonika.jindal@intel.com> Cc: Durgadoss R <durgadoss.r@intel.com> Cc: "Pandiyan, Dhinakaran" <dhinakaran.pandiyan@intel.com> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Tested-by: fritsch@kodi.tv Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1463590036-17824-2-git-send-email-daniel.vetter@ffwll.ch
(cherry picked from commit 50db139018f9c94376d5f4db94a3bae65fdfac14) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The memcpy of ipv6 header destination address to the skb control block
(sbk->cb) in header_create() results in currupted memory when bt_xmit()
is issued. The skb->cb is "released" in the return of header_create()
making room for lower layer to minipulate the skb->cb.
The value retrieved in bt_xmit is not persistent across header creation
and sending, and the lower layer will overwrite portions of skb->cb,
making the copied destination address wrong.
The memory corruption will lead to non-working multicast as the first 4
bytes of the copied destination address is replaced by a value that
resolves into a non-multicast prefix.
This fix removes the dependency on the skb control block between header
creation and send, by moving the destination address memcpy to the send
function path (setup_create, which is called from bt_xmit).
Lyude [Tue, 31 May 2016 16:49:07 +0000 (12:49 -0400)]
drm/atomic: Verify connector->funcs != NULL when clearing states
Unfortunately since we don't have Dave's connector refcounting patch
here yet, it's very possible that drm_atomic_state_default_clear() could
get called by intel_display_resume() when
intel_dp_mst_destroy_connector() isn't completely finished destroying an
mst connector, but has already finished setting connector->funcs to
NULL. As such, we need to treat the connector like it's already been
destroyed and just skip it, otherwise we'll end up dereferencing a NULL
pointer.
This fix is only required for 4.6 and below. David Airlie's patchseries
for 4.7 to add connector reference counting provides a more proper fix
for this.
Changes since v1:
- Fix leftover whitespace
Upstream fix: 0552f7651bc2 ("drm/i915/mst: use reference counted
connectors. (v3)") Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Lyude <cpaul@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lyude [Tue, 31 May 2016 16:49:06 +0000 (12:49 -0400)]
drm/i915: Discard previous atomic state on resume if connectors change
If an MST device is disconnected while the machine is suspended, the
number of connectors will change as well after we call
intel_dp_mst_resume(). This means that any previous atomic state we had
before suspending is no longer valid, since it'll still be pointing to
missing connectors. We need to check for this before committing the
state, otherwise we'll kernel panic on resume whenever if any MST
display was disconnected before we started resuming:
Changes since v1:
- Move drm_atomic_state_free() call down so we're holding the
appropriate locks when destroying the atomic state
Changes since v2:
- Check that state != NULL before we start accessing it's members
This fix is only required for 4.6 and below. David Airlie's patchseries
for 4.7 to add connector reference counting provides a more proper fix
for this.
Upstream fix: 0552f7651bc2 ("drm/i915/mst: use reference counted
connectors. (v3)") Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Lyude <cpaul@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
During boot, MST hotplugs are generally expected (even if no physical
hotplugging occurs) and result in DRM's connector topology changing.
This means that using num_connector from the current mode configuration
can lead to the number of connectors changing under us. This can lead to
some nasty scenarios in fbcon:
- We allocate an array to the size of dev->mode_config.num_connectors.
- MST hotplug occurs, dev->mode_config.num_connectors gets incremented.
- We try to loop through each element in the array using the new value
of dev->mode_config.num_connectors, and end up going out of bounds
since dev->mode_config.num_connectors is now larger then the array we
allocated.
fb_helper->connector_count however, will always remain consistent while
we do a modeset in fb_helper.
Note: This is just polish for 4.7, Dave Airlie's drm_connector
refcounting fixed these bugs for real. But it's good enough duct-tape
for stable kernel backporting, since backporting the refcounting
changes is way too invasive.
During boot time, MST devices usually send a ton of hotplug events
irregardless of whether or not any physical hotplugs actually occurred.
Hotplugs mean connectors being created/destroyed, and the number of DRM
connectors changing under us. This isn't a problem if we use
fb_helper->connector_count since we only set it once in the code,
however if we use num_connector from struct drm_mode_config we risk it's
value changing under us. On top of that, there's even a chance that
dev->mode_config.num_connector != fb_helper->connector_count. If the
number of connectors happens to increase under us, we'll end up using
the wrong array size for memcpy and start writing beyond the actual
length of the array, occasionally resulting in kernel panics.
Note: This is just polish for 4.7, Dave Airlie's drm_connector
refcounting fixed these bugs for real. But it's good enough duct-tape
for stable kernel backporting, since backporting the refcounting
changes is way too invasive.
When porting the hdmi deep color detection code from
radeon-kms to amdgpu-kms apparently some kind of
copy and paste error happened, attaching an else
branch to the wrong if statement.
The result is that hdmi deep color mode is always
disabled, regardless of gpu and display capabilities and
user wishes, as the code mistakenly thinks that the display
doesn't provide the required max_tmds_clock limit and falls
back to 8 bpc.
This patch fixes deep color support, as tested on a
R9 380 Tonga Pro + suitable display, and should be
backported to all kernels with amdgpu-kms support.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some global KMS state that is elsewhere protected by the mode_config
mutex here needs to be protected with a local mutex. Remove corresponding
lockdep checks and introduce a new driver-private global_kms_state_mutex,
and make sure its locking order is *after* the crtc locks in order to
avoid having to release those when the new mutex is taken.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix possible out of bounds read, by adding missing comma.
The code may read pass the end of the dsi_errors array
when the most significant bit (bit #31) in the intr_stat register
is set.
This bug has been detected using CppCheck (static analysis tool).