Before that, the user had to use the sysfs attribute "port_add" of an FCP
device (adapter) to add and open remote (target) ports, even for the remote
peer port in point-to-point topology. That code path did a proper port open
recovery trigger taking the erp_lock.
Since above commit, a new helper function zfcp_erp_open_ptp_port()
performed an UNlocked port open recovery trigger. This can race with other
parallel recovery triggers. In zfcp_erp_action_enqueue() this could corrupt
e.g. adapter->erp_total_count or adapter->erp_ready_head.
As already found for fabric topology in v4.17 commit fa89adba1941 ("scsi:
zfcp: fix infinite iteration on ERP ready list"), there was an endless loop
during tracing of rport (un)block. A subsequent v4.18 commit 9e156c54ace3
("scsi: zfcp: assert that the ERP lock is held when tracing a recovery
trigger") introduced a lockdep assertion for that case.
As a side effect, that lockdep assertion now uncovered the unlocked code
path for PtP. It is from within an adapter ERP action:
zfcp_erp_strategy[1479] intentionally DROPs erp lock around
zfcp_erp_strategy_do_action()
zfcp_erp_strategy_do_action[1441] NO erp lock
zfcp_erp_adapter_strategy[876] NO erp lock
zfcp_erp_adapter_strategy_open[855] NO erp lock
zfcp_erp_adapter_strategy_open_fsf[806]NO erp lock
zfcp_erp_adapter_strat_fsf_xconf[772] erp lock only around
zfcp_erp_action_to_running(),
BUT *_not_* around
zfcp_erp_enqueue_ptp_port()
zfcp_erp_enqueue_ptp_port[728] BUG: *_not_* taking erp lock
_zfcp_erp_port_reopen[432] assumes to be called with erp lock
zfcp_erp_action_enqueue[314] assumes to be called with erp lock
zfcp_dbf_rec_trig[288] _checks_ to be called with erp lock:
lockdep_assert_held(&adapter->erp_lock);
It causes the following lockdep warning:
WARNING: CPU: 2 PID: 775 at drivers/s390/scsi/zfcp_dbf.c:288
zfcp_dbf_rec_trig+0x16a/0x188
no locks held by zfcperp0.0.17c0/775.
Fix this by using the proper locked recovery trigger helper function.
Fix below kmemleak detected in verity_fec_ctr. output_pool is
allocated for each dm-verity-fec device. But it is not freed when
dm-table for the verity target is removed. Hence free the output
mempool in destructor function verity_fec_dtr.
This patch replaces the size + 1 value introduced with the recent fix for 1
byte allocs with a constant value.
The idea here is to reduce code overhead as the previous logic would have
to read size into a register, then increment it, and write it back to
whatever field was being used. By using a constant we can avoid those
memory reads and arithmetic operations in favor of just encoding the
maximum value into the operation itself.
Fixes: 2c2ade81741c ("mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs") Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 0161a94e2d1c7 ("tools: gpio: Correctly add make dependencies for
gpio_utils") added a make rule for gpio-utils-in.o but used $(output)
instead of the correct $(OUTPUT) for the output directory, breaking
out-of-tree build (O=xx) with the following error:
No rule to make target 'out/tools/gpio/gpio-utils-in.o', needed by 'out/tools/gpio/lsgpio-in.o'. Stop.
This fix addresses https://bugzilla.kernel.org/show_bug.cgi?id=201071
Commit 5025f7f7d506 wrongly relied on __dev_change_flags to notify users of
dev flag changes in the case when dev->rtnl_link_state = RTNL_LINK_INITIALIZED.
Fix it by indicating flag changes explicitly to __dev_notify_flags.
Fixes: 5025f7f7d506 ("rtnetlink: add rtnl_link_state check in rtnl_configure_link") Reported-By: Liam mcbirnie <liam.mcbirnie@boeing.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The recent commit 98081ca62cba ("ALSA: hda - Record the current power
state before suspend/resume calls") made the HD-audio driver to store
the PM state in power_state field. This forgot, however, the
initialization at power up. Although the codec drivers usually don't
need to refer to this field in the normal operation, let's initialize
it properly for consistency.
Fixes: 98081ca62cba ("ALSA: hda - Record the current power state before suspend/resume calls") Signed-off-by: Takashi Iwai <tiwai@suse.de> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The incorrect traversal of the scatterlist, during the linearization phase
lead to computing the hash value of the wrong input buffer.
New implementation uses scatterwalk_map_and_copy()
to address this issue.
If we have an error while building the backref tree in relocation we'll
process all the pending edges and then free the node. However if we
integrated some edges into the cache we'll lose our link to those edges
by simply freeing this node, which means we'll leak memory and
references to any roots that we've found.
Instead we need to use remove_backref_node(), which walks through all of
the edges that are still linked to this node and free's them up and
drops any root references we may be holding.
CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch is used to fix the bug in collect_uncached_read_data()
that rc is automatically converted from a signed number to an
unsigned number when the CIFS asynchronous read fails.
It will cause ctx->rc is error.
Example:
Share a directory and create a file on the Windows OS.
Mount the directory to the Linux OS using CIFS.
On the CIFS client of the Linux OS, invoke the pread interface to
deliver the read request.
The size of the read length plus offset of the read request is greater
than the maximum file size.
In this case, the CIFS server on the Windows OS returns a failure
message (for example, the return value of
smb2.nt_status is STATUS_INVALID_PARAMETER).
After receiving the response message, the CIFS client parses
smb2.nt_status to STATUS_INVALID_PARAMETER
and converts it to the Linux error code (rdata->result=-22).
Then the CIFS client invokes the collect_uncached_read_data function to
assign the value of rdata->result to rc, that is, rc=rdata->result=-22.
The type of the ctx->total_len variable is unsigned integer,
the type of the rc variable is integer, and the type of
the ctx->rc variable is ssize_t.
Therefore, during the ternary operation, the value of rc is
automatically converted to an unsigned number. The final result is
ctx->rc=4294967274. However, the expected result is ctx->rc=-22.
Signed-off-by: Yilu Lin <linyilu@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If KVM wasn't used at all before we crash the cleanup procedure fails with
BUG: unable to handle page fault for address: ffffffffffffffc8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 23215067 P4D 23215067 PUD 23217067 PMD 0
Oops: 0000 [#8] SMP PTI
CPU: 0 PID: 3542 Comm: bash Kdump: loaded Tainted: G D 5.6.0-rc2+ #823
RIP: 0010:crash_vmclear_local_loaded_vmcss.cold+0x19/0x51 [kvm_intel]
The root cause is that loaded_vmcss_on_cpu list is not yet initialized,
we initialize it in hardware_enable() but this only happens when we start
a VM.
Previously, we used to have a bitmap with enabled CPUs and that was
preventing [masking] the issue.
Initialized loaded_vmcss_on_cpu list earlier, right before we assign
crash_vmclear_loaded_vmcss pointer. blocked_vcpu_on_cpu list and
blocked_vcpu_on_cpu_lock are moved altogether for consistency.
Fixes: 31603d4fc2bb ("KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200401081348.1345307-1-vkuznets@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
VMCLEAR all in-use VMCSes during a crash, even if kdump's NMI shootdown
interrupted a KVM update of the percpu in-use VMCS list.
Because NMIs are not blocked by disabling IRQs, it's possible that
crash_vmclear_local_loaded_vmcss() could be called while the percpu list
of VMCSes is being modified, e.g. in the middle of list_add() in
vmx_vcpu_load_vmcs(). This potential corner case was called out in the
original commit[*], but the analysis of its impact was wrong.
Skipping the VMCLEARs is wrong because it all but guarantees that a
loaded, and therefore cached, VMCS will live across kexec and corrupt
memory in the new kernel. Corruption will occur because the CPU's VMCS
cache is non-coherent, i.e. not snooped, and so the writeback of VMCS
memory on its eviction will overwrite random memory in the new kernel.
The VMCS will live because the NMI shootdown also disables VMX, i.e. the
in-progress VMCLEAR will #UD, and existing Intel CPUs do not flush the
VMCS cache on VMXOFF.
Furthermore, interrupting list_add() and list_del() is safe due to
crash_vmclear_local_loaded_vmcss() using forward iteration. list_add()
ensures the new entry is not visible to forward iteration unless the
entire add completes, via WRITE_ONCE(prev->next, new). A bad "prev"
pointer could be observed if the NMI shootdown interrupted list_del() or
list_add(), but list_for_each_entry() does not consume ->prev.
In addition to removing the temporary disabling of VMCLEAR, open code
loaded_vmcs_init() in __loaded_vmcs_clear() and reorder VMCLEAR so that
the VMCS is deleted from the list only after it's been VMCLEAR'd.
Deleting the VMCS before VMCLEAR would allow a race where the NMI
shootdown could arrive between list_del() and vmcs_clear() and thus
neither flow would execute a successful VMCLEAR. Alternatively, more
code could be moved into loaded_vmcs_init(), but that gets rather silly
as the only other user, alloc_loaded_vmcs(), doesn't need the smp_wmb()
and would need to work around the list_del().
Update the smp_*() comments related to the list manipulation, and
opportunistically reword them to improve clarity.
Fixes: 8f536b7697a0 ("KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321193751.24985-2-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reallocate a rmap array and recalcuate large page compatibility when
moving an existing memslot to correctly handle the alignment properties
of the new memslot. The number of rmap entries required at each level
is dependent on the alignment of the memslot's base gfn with respect to
that level, e.g. moving a large-page aligned memslot so that it becomes
unaligned will increase the number of rmap entries needed at the now
unaligned level.
Not updating the rmap array is the most obvious bug, as KVM accesses
garbage data beyond the end of the rmap. KVM interprets the bad data as
pointers, leading to non-canonical #GPs, unexpected #PFs, etc...
The disallow_lpage tracking is more subtle. Failure to update results
in KVM creating large pages when it shouldn't, either due to stale data
or again due to indexing beyond the end of the metadata arrays, which
can lead to memory corruption and/or leaking data to guest/userspace.
Note, the arrays for the old memslot are freed by the unconditional call
to kvm_free_memslot() in __kvm_set_memory_region().
Fixes: 05da45583de9b ("KVM: MMU: large page support") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Whenever we get an -EFAULT, we failed to read in guest 2 physical
address space. Such addressing exceptions are reported via a program
intercept to the nested hypervisor.
We faked the intercept, we have to return to guest 2. Instead, right
now we would be returning -EFAULT from the intercept handler, eventually
crashing the VM.
the correct thing to do is to return 1 as rc == 1 is the internal
representation of "we have to go back into g2".
Addressing exceptions can only happen if the g2->g3 page tables
reference invalid g2 addresses (say, either a table or the final page is
not accessible - so something that basically never happens in sane
environments.
Identified by manual code inspection.
Fixes: a3508fbe9dc6 ("KVM: s390: vsie: initial support for nested virtualization") Cc: <stable@vger.kernel.org> # v4.8+ Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200403153050.20569-3-david@redhat.com Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
[borntraeger@de.ibm.com: fix patch description] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In case we have a region 1 the following calculation
(31 + ((gmap->asce & _ASCE_TYPE_MASK) >> 2)*11)
results in 64. As shifts beyond the size are undefined the compiler is
free to use instructions like sllg. sllg will only use 6 bits of the
shift value (here 64) resulting in no shift at all. That means that ALL
addresses will be rejected.
The can result in endless loops, e.g. when prefix cannot get mapped.
Fixes: 4be130a08420 ("s390/mm: add shadow gmap support") Tested-by: Janosch Frank <frankja@linux.ibm.com> Reported-by: Janosch Frank <frankja@linux.ibm.com> Cc: <stable@vger.kernel.org> # v4.8+ Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200403153050.20569-2-david@redhat.com Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
[borntraeger@de.ibm.com: fix patch description, remove WARN_ON_ONCE] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Return true for vmx_interrupt_allowed() if the vCPU is in L2 and L1 has
external interrupt exiting enabled. IRQs are never blocked in hardware
if the CPU is in the guest (L2 from L1's perspective) when IRQs trigger
VM-Exit.
The new check percolates up to kvm_vcpu_ready_for_interrupt_injection()
and thus vcpu_run(), and so KVM will exit to userspace if userspace has
requested an interrupt window (to inject an IRQ into L1).
Remove the @external_intr param from vmx_check_nested_events(), which is
actually an indicator that userspace wants an interrupt window, e.g.
it's named @req_int_win further up the stack. Injecting a VM-Exit into
L1 to try and bounce out to L0 userspace is all kinds of broken and is
no longer necessary.
Remove the hack in nested_vmx_vmexit() that attempted to workaround the
breakage in vmx_check_nested_events() by only filling interrupt info if
there's an actual interrupt pending. The hack actually made things
worse because it caused KVM to _never_ fill interrupt info when the
LAPIC resides in userspace (kvm_cpu_has_interrupt() queries
interrupt.injected, which is always cleared by prepare_vmcs12() before
reaching the hack in nested_vmx_vmexit()).
Fixes: 6550c4df7e50 ("KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit"") Cc: stable@vger.kernel.org Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Replace the 32bit exec_id with a 64bit exec_id to make it impossible
to wrap the exec_id counter. With care an attacker can cause exec_id
wrap and send arbitrary signals to a newly exec'd parent. This
bypasses the signal sending checks if the parent changes their
credentials during exec.
The severity of this problem can been seen that in my limited testing
of a 32bit exec_id it can take as little as 19s to exec 65536 times.
Which means that it can take as little as 14 days to wrap a 32bit
exec_id. Adam Zabrocki has succeeded wrapping the self_exe_id in 7
days. Even my slower timing is in the uptime of a typical server.
Which means self_exec_id is simply a speed bump today, and if exec
gets noticably faster self_exec_id won't even be a speed bump.
Extending self_exec_id to 64bits introduces a problem on 32bit
architectures where reading self_exec_id is no longer atomic and can
take two read instructions. Which means that is is possible to hit
a window where the read value of exec_id does not match the written
value. So with very lucky timing after this change this still
remains expoiltable.
I have updated the update of exec_id on exec to use WRITE_ONCE
and the read of exec_id in do_notify_parent to use READ_ONCE
to make it clear that there is no locking between these two
locations.
When TPC is disabled IEEE80211_CONF_CHANGE_POWER event can be handled to
reconfigure HW's maximum txpower.
This fixes 0dBm txpower setting when user attaches to an interface for
the first time with the following scenario:
ieee80211_do_open()
ath9k_add_interface()
ath9k_set_txpower() /* Set TX power with not yet initialized
sc->hw->conf.power_level */
ieee80211_hw_config() /* Iniatilize sc->hw->conf.power_level and
raise IEEE80211_CONF_CHANGE_POWER */
ath9k_config() /* IEEE80211_CONF_CHANGE_POWER is ignored */
This issue can be reproduced with the following:
$ modprobe -r ath9k
$ modprobe ath9k
$ wpa_supplicant -i wlan0 -c /tmp/wpa.conf &
$ iw dev /* Here TX power is either 0 or 3 depending on RF chain */
$ killall wpa_supplicant
$ iw dev /* TX power goes back to calibrated value and subsequent
calls will be fine */
Fixes: 283dd11994cde ("ath9k: add per-vif TX power capability") Cc: stable@vger.kernel.org Signed-off-by: Remi Pommarel <repk@triplefau.lt> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Clear its own IRQs before the parent IRQ get enabled, so that the
remaining IRQs do not accidentally interrupt the parent IRQ controller.
This patch also fixes a reboot bug on OX820 SoC, where the remaining
rps-timer IRQ raises a GIC interrupt that is left pending. After that,
the rps-timer IRQ is cleared during driver initialization, and there's
no IRQ left in rps-irq when local_irq_enable() is called, which evokes
an error message "unexpected IRQ trap".
Fixes: bdd272cbb97a ("irqchip: versatile FPGA: support cascaded interrupts from DT") Signed-off-by: Sungbo Eo <mans0n@gorani.run> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200321133842.2408823-1-mans0n@gorani.run Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit a08bf91ce28e ("KEYS: allow reaching the keys quotas exactly"),
we can reach max bytes/keys in key_alloc, but we forget to remove this
limit when we reserver space for payload in key_payload_reserve. So we
can only reach max keys but not max bytes when having delta between plen
and type->def_datalen. Remove this limit when instantiating the key, so we
can keep consistent with key_alloc.
Also, fix the similar problem in keyctl_chown_key().
Fixes: 0b77f5bfb45c ("keys: make the keyring quotas controllable through /proc/sys") Fixes: a08bf91ce28e ("KEYS: allow reaching the keys quotas exactly") Cc: stable@vger.kernel.org # 5.0.x Cc: Eric Biggers <ebiggers@google.com> Signed-off-by: Yang Xu <xuyang2018.jy@cn.fujitsu.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
pci-epc-mem uses a bitmap to manage the Endpoint outbound (OB) address
region. This address region will be shared by multiple endpoint
functions (in the case of multi function endpoint) and it has to be
protected from concurrent access to avoid updating an inconsistent state.
Use a mutex to protect bitmap updates to prevent the memory
allocation API from returning incorrect addresses.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In pcie_config_aspm_l1ss(), we cleared the wrong bits when enabling ASPM L1
Substates. Instead of the L1.x enable bits (PCI_L1SS_CTL1_L1SS_MASK, 0xf), we
cleared the Link Activation Interrupt Enable bit (PCI_L1SS_CAP_L1_PM_SS,
0x10).
Clear the L1.x enable bits before writing the new L1.x configuration.
The original patch was to resolve the lldd being able to be unloaded
while being used to talk to the boot device of the system. However, the
end result of the original patch is that any driver unload while a nvme
controller is live via the lldd is now being prohibited. Given the module
reference, the module teardown routine can't be called, thus there's no
way, other than manual actions to terminate the controllers.
Fixes: 863fbae929c7 ("nvme_fc: add module to ops template to allow module references") Cc: <stable@vger.kernel.org> # v5.4+ Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When CONFIG_DEVFREQ_THERMAL is disabled all functions except
of_devfreq_cooling_register_power() were already inlined. Also inline
the last function to avoid compile errors when multiple drivers call
of_devfreq_cooling_register_power() when CONFIG_DEVFREQ_THERMAL is not
set. Compilation failed with the following message:
multiple definition of `of_devfreq_cooling_register_power'
(which then lists all usages of of_devfreq_cooling_register_power())
Thomas Zimmermann reported this problem [0] on a kernel config with
CONFIG_DRM_LIMA={m,y}, CONFIG_DRM_PANFROST={m,y} and
CONFIG_DEVFREQ_THERMAL=n after both, the lima and panfrost drivers
gained devfreq cooling support.
The value in "new" is constructed from "old" such that all bits defined
as reserved by the ACPI spec[1] are left untouched. But if those bits
do not happen to be all zero, "new < 3" will not evaluate to true.
The firmware of the laptop(s) Medion MD63490 / Akoya P15648 comes with
garbage inside the "FACS" ACPI table. The starting value is
old=0x4944454d, therefore new=0x4944454e, which is >= 3. Mask off
the reserved bits.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206553 Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Realtek PC Beep Hidden Register[1] is currently set by
patch_realtek.c in two different places:
In alc_fill_eapd_coef(), it's set to the value 0x5757, corresponding to
non-beep input on 1Ah and no 1Ah loopback to either headphones or
speakers. (Although, curiously, the loopback amp is still enabled.) This
write was added fairly recently by commit e3743f431143 ("ALSA:
hda/realtek - Dell headphone has noise on unmute for ALC236") and is a
safe default. However, it happens in the wrong place:
alc_fill_eapd_coef() runs on module load and cold boot but not on S3
resume, meaning the register loses its value after suspend.
Conversely, in alc256_init(), the register is updated to unset bit 13
(disable speaker loopback) and set bit 5 (set non-beep input on 1Ah).
Although this write does run on S3 resume, it's not quite enough to fix
up the register's default value of 0x3717. What's missing is a set of
bit 14 to disable headphone loopback. Without that, we end up with a
feedback loop where the headphone jack is being driven by amplified
samples of itself[2].
This change eliminates the update in alc256_init() and replaces it with
the 0x5757 write from alc_fill_eapd_coef(). Kailang says that 0x5757 is
supposed to be the codec's default value, so using it will make
debugging easier for Realtek.
Affects the ALC255, ALC256, ALC257, ALC235, and ALC236 codecs.
[1] Newly documented in Documentation/sound/hd-audio/realtek-pc-beep.rst
[2] Setting the "Headphone Mic Boost" control from userspace changes
this feedback loop and has been a widely-shared workaround for headphone
noise on laptops like the Dell XPS 13 9350. This commit eliminates the
feedback loop and makes the workaround unnecessary.
This codec (among others) has a hidden set of audio routes, apparently
designed to allow PC Beep output without a mixer widget on the output
path, which are controlled by an undocumented Realtek vendor register.
The default configuration of these routes means that certain inputs
aren't accessible, necessitating driver control of the register.
However, Realtek has provided no documentation of the register, instead
opting to fix issues by providing magic numbers, most of which have been
at least somewhat erroneous. These magic numbers then get copied by
others into model-specific fixups, leading to a fragmented and buggy set
of configurations.
To get out of this situation, I've reverse engineered the register by
flipping bits and observing how the codec's behavior changes. This
commit documents my findings. It does not change any code.
The recent fix for the OOB access in PCM OSS plugins (commit f2ecf903ef06: "ALSA: pcm: oss: Avoid plugin buffer overflow") caused a
regression on OSS applications. The patch introduced the size check
in client and slave size calculations to limit to each plugin's buffer
size, but I overlooked that some code paths call those without
allocating the buffer but just for estimation.
This patch fixes the bug by skipping the size check for those code
paths while keeping checking in the actual transfer calls.
The access to Analog Capture Source control value implemented in
prodigy_hifi.c is wrong, as caught by the recently introduced sanity
check; it should be accessing value.enumerated.item[] instead of
value.integer.value[]. This patch corrects the wrong access pattern.
The beep control helper function blindly stores the values in two
stereo channels no matter whether the actual control is mono or
stereo. This is practically harmless, but it annoys the recently
introduced sanity check, resulting in an error when the checker is
enabled.
This patch corrects the behavior to store only on the defined array
member.
The recent AMD platform exposes an HD-audio bus but without any actual
codecs, which is internally tied with a USB-audio device, supposedly.
It results in "no codecs" error of HD-audio bus driver, and it's
nothing but a waste of resources.
This patch introduces a static blacklist table for skipping such a
known bogus PCI SSID entry. As of writing this patch, the known SSIDs
are:
* 1043:874f - ASUS ROG Zenith II / Strix
* 1462:cb59 - MSI TRX40 Creator
* 1462:cb60 - MSI TRX40
Some recent boards (supposedly with a new AMD platform) contain the
USB audio class 2 device that is often tied with HD-audio. The device
exposes an Input Gain Pad control (id=19, control=12) but this node
doesn't behave correctly, returning an error for each inquiry of
GET_MIN and GET_MAX that should have been mandatory.
As a workaround, simply ignore this node by adding a usbmix_name_map
table entry. The currently known devices are:
* 0414:a002 - Gigabyte TRX40 Aorus Pro WiFi
* 0b05:1916 - ASUS ROG Zenith II
* 0b05:1917 - ASUS ROG Strix
* 0db0:0d64 - MSI TRX40 Creator
* 0db0:543d - MSI TRX40
Different configuration/condition may draw different power. Inform the
controller driver of the change so it can respond properly (e.g.
GET_STATUS request). This fixes an issue with setting MaxPower from
configfs. The composite driver doesn't check this value when setting
self-powered.
Cc: stable@vger.kernel.org Fixes: 88af8bbe4ef7 ("usb: gadget: the start of the configfs interface") Signed-off-by: Thinh Nguyen <thinhn@synopsys.com> Signed-off-by: Felipe Balbi <balbi@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In AIO case, the request is freed up if ep_queue fails.
However, io_data->req still has the reference to this freed
request. In the case of this failure if there is aio_cancel
call on this io_data it will lead to an invalid dequeue
operation and a potential use after free issue.
Fix this by setting the io_data->req to NULL when the request
is freed as part of queue failure.
soc_compr_trigger_fe() allows start or stop after pause_push.
In dpcm_be_dai_trigger(), however, only pause_release is allowed
command after pause_push.
So, start or stop after pause in compress offload is always
returned as error if the compress offload is used with dpcm.
To fix the problem, SND_SOC_DPCM_STATE_PAUSED should be allowed
for start or stop command.
Signed-off-by: Gyeongtaek Lee <gt82.lee@samsung.com> Reviewed-by: Vinod Koul <vkoul@kernel.org> Link: https://lore.kernel.org/r/004d01d607c1$7a3d5250$6eb7f6f0$@samsung.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since a virtual mixer has no backing registers
to decide which path to connect,
it will try to match with initial state.
This is to ensure that the default mixer choice will be
correctly powered up during initialization.
Invert flag is used to select initial state of the virtual switch.
Since actual hardware can't be disconnected by virtual switch,
connected is better choice as initial state in many cases.
If regwshift is 32 and the selected architecture compiles '<<' operator
for signed int literal into rotating shift, '1<<regwshift' became 1 and
it makes regwmask to 0x0.
The literal is set to unsigned long to get intended regwmask.
Under CONFIG_SLAB_FREELIST_HARDENED=y, the obfuscation was relatively weak
in that the ptr and ptr address were usually so close that the first XOR
would result in an almost entirely 0-byte value[1], leaving most of the
"secret" number ultimately being stored after the third XOR. A single
blind memory content exposure of the freelist was generally sufficient to
learn the secret.
Add a swab() call to mix bits a little more. This is a cheap way (1
cycle) to make attacks need more than a single exposure to learn the
secret (or to know _where_ the exposure is in memory).
We always search the commit root of the extent tree for looking up back
references, however we track the reloc roots based on their current
bytenr.
This is wrong, if we commit the transaction between relocating tree
blocks we could end up in this code in build_backref_tree
if (key.objectid == key.offset) {
/*
* Only root blocks of reloc trees use backref
* pointing to itself.
*/
root = find_reloc_root(rc, cur->bytenr);
ASSERT(root);
cur->root = root;
break;
}
find_reloc_root() is looking based on the bytenr we had in the commit
root, but if we've COWed this reloc root we will not find that bytenr,
and we will trip over the ASSERT(root).
Fix this by using the commit_root->start bytenr for indexing the commit
root. Then we change the __update_reloc_root() caller to be used when
we switch the commit root for the reloc root during commit.
This fixes the panic I was seeing when we started throttling relocation
for delayed refs.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
This was pretty subtle, we default to reloc roots having 0 root refs, so
if we crash in the middle of the relocation they can just be deleted.
If we successfully complete the relocation operations we'll set our root
refs to 1 in prepare_to_merge() and then go on to merge_reloc_roots().
At prepare_to_merge() time if any of the reloc roots have a 0 reference
still, we will remove that reloc root from our reloc root rb tree, and
then clean it up later.
However this only happens if we successfully start a transaction. If
we've aborted previously we will skip this step completely, and only
have reloc roots with a reference count of 0, but were never properly
removed from the reloc control's rb tree.
This isn't a problem per-se, our references are held by the list the
reloc roots are on, and by the original root the reloc root belongs to.
If we end up in this situation all the reloc roots will be added to the
dirty_reloc_list, and then properly dropped at that point. The reloc
control will be free'd and the rb tree is no longer used.
There were two options when fixing this, one was to remove the BUG_ON(),
the other was to make prepare_to_merge() handle the case where we
couldn't start a trans handle.
IMO this is the cleaner solution. I started with handling the error in
prepare_to_merge(), but it turned out super ugly. And in the end this
BUG_ON() simply doesn't matter, the cleanup was happening properly, we
were just panicing because this BUG_ON() only matters in the success
case. So I've opted to just remove it and add a comment where it was.
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In bfq_idle_slice_timer func, bfqq = bfqd->in_service_queue is
not in bfqd-lock critical section. The bfqq, which is not
equal to NULL in bfq_idle_slice_timer, may be freed after passing
to bfq_idle_slice_timer_body. So we will access the freed memory.
In addition, considering the bfqq may be in race, we should
firstly check whether bfqq is in service before doing something
on it in bfq_idle_slice_timer_body func. If the bfqq in race is
not in service, it means the bfqq has been expired through
__bfq_bfqq_expire func, and wait_request flags has been cleared in
__bfq_bfqd_reset_in_service func. So we do not need to re-clear the
wait_request of bfqq which is not in service.
KASAN log is given as follows:
[13058.354613] ==================================================================
[13058.354640] BUG: KASAN: use-after-free in bfq_idle_slice_timer+0xac/0x290
[13058.354644] Read of size 8 at addr ffffa02cf3e63f78 by task fork13/19767
[13058.354646]
[13058.354655] CPU: 96 PID: 19767 Comm: fork13
[13058.354661] Call trace:
[13058.354667] dump_backtrace+0x0/0x310
[13058.354672] show_stack+0x28/0x38
[13058.354681] dump_stack+0xd8/0x108
[13058.354687] print_address_description+0x68/0x2d0
[13058.354690] kasan_report+0x124/0x2e0
[13058.354697] __asan_load8+0x88/0xb0
[13058.354702] bfq_idle_slice_timer+0xac/0x290
[13058.354707] __hrtimer_run_queues+0x298/0x8b8
[13058.354710] hrtimer_interrupt+0x1b8/0x678
[13058.354716] arch_timer_handler_phys+0x4c/0x78
[13058.354722] handle_percpu_devid_irq+0xf0/0x558
[13058.354731] generic_handle_irq+0x50/0x70
[13058.354735] __handle_domain_irq+0x94/0x110
[13058.354739] gic_handle_irq+0x8c/0x1b0
[13058.354742] el1_irq+0xb8/0x140
[13058.354748] do_wp_page+0x260/0xe28
[13058.354752] __handle_mm_fault+0x8ec/0x9b0
[13058.354756] handle_mm_fault+0x280/0x460
[13058.354762] do_page_fault+0x3ec/0x890
[13058.354765] do_mem_abort+0xc0/0x1b0
[13058.354768] el0_da+0x24/0x28
[13058.354770]
[13058.354773] Allocated by task 19731:
[13058.354780] kasan_kmalloc+0xe0/0x190
[13058.354784] kasan_slab_alloc+0x14/0x20
[13058.354788] kmem_cache_alloc_node+0x130/0x440
[13058.354793] bfq_get_queue+0x138/0x858
[13058.354797] bfq_get_bfqq_handle_split+0xd4/0x328
[13058.354801] bfq_init_rq+0x1f4/0x1180
[13058.354806] bfq_insert_requests+0x264/0x1c98
[13058.354811] blk_mq_sched_insert_requests+0x1c4/0x488
[13058.354818] blk_mq_flush_plug_list+0x2d4/0x6e0
[13058.354826] blk_flush_plug_list+0x230/0x548
[13058.354830] blk_finish_plug+0x60/0x80
[13058.354838] read_pages+0xec/0x2c0
[13058.354842] __do_page_cache_readahead+0x374/0x438
[13058.354846] ondemand_readahead+0x24c/0x6b0
[13058.354851] page_cache_sync_readahead+0x17c/0x2f8
[13058.354858] generic_file_buffered_read+0x588/0xc58
[13058.354862] generic_file_read_iter+0x1b4/0x278
[13058.354965] ext4_file_read_iter+0xa8/0x1d8 [ext4]
[13058.354972] __vfs_read+0x238/0x320
[13058.354976] vfs_read+0xbc/0x1c0
[13058.354980] ksys_read+0xdc/0x1b8
[13058.354984] __arm64_sys_read+0x50/0x60
[13058.354990] el0_svc_common+0xb4/0x1d8
[13058.354994] el0_svc_handler+0x50/0xa8
[13058.354998] el0_svc+0x8/0xc
[13058.354999]
[13058.355001] Freed by task 19731:
[13058.355007] __kasan_slab_free+0x120/0x228
[13058.355010] kasan_slab_free+0x10/0x18
[13058.355014] kmem_cache_free+0x288/0x3f0
[13058.355018] bfq_put_queue+0x134/0x208
[13058.355022] bfq_exit_icq_bfqq+0x164/0x348
[13058.355026] bfq_exit_icq+0x28/0x40
[13058.355030] ioc_exit_icq+0xa0/0x150
[13058.355035] put_io_context_active+0x250/0x438
[13058.355038] exit_io_context+0xd0/0x138
[13058.355045] do_exit+0x734/0xc58
[13058.355050] do_group_exit+0x78/0x220
[13058.355054] __wake_up_parent+0x0/0x50
[13058.355058] el0_svc_common+0xb4/0x1d8
[13058.355062] el0_svc_handler+0x50/0xa8
[13058.355066] el0_svc+0x8/0xc
[13058.355067]
[13058.355071] The buggy address belongs to the object at ffffa02cf3e63e70#012 which belongs to the cache bfq_queue of size 464
[13058.355075] The buggy address is located 264 bytes inside of#012 464-byte region [ffffa02cf3e63e70, ffffa02cf3e64040)
[13058.355077] The buggy address belongs to the page:
[13058.355083] page:ffff7e80b3cf9800 count:1 mapcount:0 mapping:ffff802db5c90780 index:0xffffa02cf3e606f0 compound_mapcount: 0
[13058.366175] flags: 0x2ffffe0000008100(slab|head)
[13058.370781] raw: 2ffffe0000008100ffff7e80b53b1408ffffa02d730c1c90ffff802db5c90780
[13058.370787] raw: ffffa02cf3e606f0000000000037002300000001ffffffff0000000000000000
[13058.370789] page dumped because: kasan: bad access detected
[13058.370791]
[13058.370792] Memory state around the buggy address:
[13058.370797] ffffa02cf3e63e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fb fb
[13058.370801] ffffa02cf3e63e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[13058.370805] >ffffa02cf3e63f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[13058.370808] ^
[13058.370811] ffffa02cf3e63f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[13058.370815] ffffa02cf3e64000: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[13058.370817] ==================================================================
[13058.370820] Disabling lock debugging due to kernel taint
Here, we directly pass the bfqd to bfq_idle_slice_timer_body func.
--
V2->V3: rewrite the comment as suggested by Paolo Valente
V1->V2: add one comment, and add Fixes and Reported-by tag.
Fixes: aee69d78d ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler") Acked-by: Paolo Valente <paolo.valente@linaro.org> Reported-by: Wang Wang <wangwang2@huawei.com> Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Signed-off-by: Feilong Lin <linfeilong@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
The warning got triggered because lockdep_count_forward_deps() call
__bfs() without current->lockdep_recursion being set, as a result
a lockdep internal function (__bfs()) is checked by lockdep, which is
unexpected, and the inconsistency between the irq-off state and the
state traced by lockdep caused the warning.
Apart from this warning, lockdep internal functions like __bfs() should
always be protected by current->lockdep_recursion to avoid potential
deadlocks and data inconsistency, therefore add the
current->lockdep_recursion on-and-off section to protect __bfs() in both
lockdep_count_forward_deps() and lockdep_count_backward_deps()
This is happening because we have a pending doorbell that requires
retrigger. As SW retriggering is done in a tasklet, we trigger the
circular dependency above.
The easy cop-out is to provide a retrigger callback that doesn't
require acquiring any extra lock.
In certain circumstances, the XHCI SuperSpeed instance in park mode
can fail to recover, thus on Amlogic G12A/G12B/SM1 SoCs when there is high
load on the single XHCI SuperSpeed instance, the controller can crash like:
xhci-hcd xhci-hcd.0.auto: xHCI host not responding to stop endpoint command.
xhci-hcd xhci-hcd.0.auto: Host halt failed, -110
xhci-hcd xhci-hcd.0.auto: xHCI host controller not responding, assume dead
xhci-hcd xhci-hcd.0.auto: xHCI host not responding to stop endpoint command.
hub 2-1.1:1.0: hub_ext_port_status failed (err = -22)
xhci-hcd xhci-hcd.0.auto: HC died; cleaning up
usb 2-1.1-port1: cannot reset (err = -22)
Setting the PARKMODE_DISABLE_SS bit in the DWC3_USB3_GUCTL1 mitigates
the issue. The bit is described as :
"When this bit is set to '1' all SS bus instances in park mode are disabled"
Synopsys explains:
The GUCTL1.PARKMODE_DISABLE_SS is only available in
dwc_usb3 controller running in host mode.
This should not be set for other IPs.
This can be disabled by default based on IP, but I recommend to have a
property to enable this feature for devices that need this.
CC: Dongjin Kim <tobetter@gmail.com> Cc: Jianxin Pan <jianxin.pan@amlogic.com> Cc: Thinh Nguyen <thinhn@synopsys.com> Cc: Jun Li <lijun.kernel@gmail.com> Reported-by: Tim <elatllat@gmail.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: Felipe Balbi <balbi@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
There is a potential race between ioc_release_fn() and
ioc_clear_queue() as shown below, due to which below kernel
crash is observed. It also can result into use-after-free
issue.
context#1: context#2:
ioc_release_fn() __ioc_clear_queue() gets the same icq
->spin_lock(&ioc->lock); ->spin_lock(&ioc->lock);
->ioc_destroy_icq(icq);
->list_del_init(&icq->q_node);
->call_rcu(&icq->__rcu_head,
icq_free_icq_rcu);
->spin_unlock(&ioc->lock);
->ioc_destroy_icq(icq);
->hlist_del_init(&icq->ioc_node);
This results into below crash as this memory
is now used by icq->__rcu_head in context#1.
There is a chance that icq could be free'd
as well.
Fix this by adding a new ICQ_DESTROYED flag in ioc_destroy_icq() to
indicate this icq is once marked as destroyed. Also, ensure
__ioc_clear_queue() is accessing icq within rcu_read_lock/unlock so
that icq doesn't get free'd up while it is still using it.
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Co-developed-by: Pradeep P V K <ppvk@codeaurora.org> Signed-off-by: Pradeep P V K <ppvk@codeaurora.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
irq_domain_alloc_irqs_hierarchy() has 3 call sites in the compilation unit
but only one of them checks for the pointer which is being dereferenced
inside the called function. Move the check into the function. This allows
for catching the error instead of the following crash:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
PC is at 0x0
LR is at gpiochip_hierarchy_irq_domain_alloc+0x11f/0x140
...
[<c06c23ff>] (gpiochip_hierarchy_irq_domain_alloc)
[<c0462a89>] (__irq_domain_alloc_irqs)
[<c0462dad>] (irq_create_fwspec_mapping)
[<c06c2251>] (gpiochip_to_irq)
[<c06c1c9b>] (gpiod_to_irq)
[<bf973073>] (gpio_irqs_init [gpio_irqs])
[<bf974048>] (gpio_irqs_exit+0xecc/0xe84 [gpio_irqs])
Code: bad PC value
3a6b6c6fb23667fa ("efi: Make EFI_MEMORY_ATTRIBUTES_TABLE initialization common across all architectures")
moved the call to efi_memattr_init() from ARM specific to the generic
EFI init code, in order to be able to apply the restricted permissions
described in that table on x86 as well.
We never enabled this feature fully on i386, and so mapping and
reserving this table is pointless. However, due to the early call to
memblock_reserve(), the memory bookkeeping gets confused to the point
where it produces the splat below when we try to map the memory later
on:
Let's work around this by disregarding the memory attributes table
altogether on i386, which does not result in a loss of functionality
or protection, given that we never consumed the contents.
Before this patch, run_queue would demote glocks based on whether
there are any more holders. But if the glock has pending revokes that
haven't been written to the media, giving up the glock might end in
file system corruption if the revokes never get written due to
io errors, node crashes and fences, etc. In that case, another node
will replay the metadata blocks associated with the glock, but
because the revoke was never written, it could replay that block
even though the glock had since been granted to another node who
might have made changes.
This patch changes the logic in run_queue so that it never demotes
a glock until its count of pending revokes reaches zero.
Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
There is also refcount issue, as well:
WARNING: CPU: 1 PID: 1 at lib/refcount.c:28 refcount_warn_saturate+0xf8/0x170
The issue is that we make an erroneous extra call to scsi_host_put()
for that host:
So in ahci_init_one()->ata_host_alloc_pinfo()->ata_host_alloc(), we setup
a device release method - ata_devres_release() - which intends to release
the SCSI hosts:
static void ata_devres_release(struct device *gendev, void *res)
{
...
for (i = 0; i < host->n_ports; i++) {
struct ata_port *ap = host->ports[i];
if (!ap)
continue;
if (ap->scsi_host)
scsi_host_put(ap->scsi_host);
}
...
}
However in the ata_scsi_add_hosts() error path, we also call
scsi_host_put() for the SCSI hosts.
Fix by removing the the scsi_host_put() calls in ata_scsi_add_hosts() and
leave this to ata_devres_release().
Fixes: f31871951b38 ("libata: separate out ata_host_alloc() and ata_host_register()") Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
The same benchmark is running in group C & F, no other tasks are
running, the benchmark is capable to consumed all the CPUs.
We suppose the group C will win more CPU resources since it could
enjoy all the shares of group A, but it's F who wins much more.
The reason is because we have group B with shares as 2, since
A->cfs_rq.load.weight == B->se.load.weight == B->shares/nr_cpus,
so A->cfs_rq.load.weight become very small.
And in calc_group_shares() we calculate shares as:
Since the 'cfs_rq->load.weight' is too small, the load become 0
after scale down, although 'tg_shares' is 102400, shares of the se
which stand for group A on root cfs_rq become 2.
While the se of D on root cfs_rq is far more bigger than 2, so it
wins the battle.
Thus when scale_load_down() scale real weight down to 0, it's no
longer telling the real story, the caller will have the wrong
information and the calculation will be buggy.
This patch add check in scale_load_down(), so the real weight will
be >= MIN_SHARES after scale, after applied the group C wins as
expected.
Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/38e8e212-59a1-64b2-b247-b6d0b52d8dc1@linux.alibaba.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Enclose the chained handler with chained_irq_{enter,exit}(), so that the
muxed interrupts get properly acked.
This patch also fixes a reboot bug on OX820 SoC, where the jiffies timer
interrupt is never acked. The kernel waits a clock tick forever in
calibrate_delay_converge(), which leads to a boot hang.
Field bdi->io_pages added in commit 9491ae4aade6 ("mm: don't cap request
size based on read-ahead setting") removes unneeded split of read requests.
Stacked drivers do not call blk_queue_max_hw_sectors(). Instead they set
limits of their devices by blk_set_stacking_limits() + disk_stack_limits().
Field bio->io_pages stays zero until user set max_sectors_kb via sysfs.
This patch updates io_pages after merging limits in disk_stack_limits().
Commit c6d6e9b0f6b4 ("dm: do not allow readahead to limit IO size") fixed
the same problem for device-mapper devices, this one fixes MD RAIDs.
Fixes: 9491ae4aade6 ("mm: don't cap request size based on read-ahead setting") Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When SEV or SME is enabled and active, vm_get_page_prot() typically
returns with the encryption bit set. This means that users of
pgprot_modify(, vm_get_page_prot()) (mprotect_fixup(), do_mmap()) end up
with a value of vma->vm_pg_prot that is not consistent with the intended
protection of the PTEs.
This is also important for fault handlers that rely on the VMA
vm_page_prot to set the page protection. Fix this by not allowing
pgprot_modify() to change the encryption bit, similar to how it's done
for PAT bits.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lkml.kernel.org/r/20200304114527.3636-2-thomas_os@shipmail.org Signed-off-by: Sasha Levin <sashal@kernel.org>
It will start failing randomly including IO to unrelated zones because of
->error "reuse". Trigger can be partition detection as well if test is not
run immediately which is even more entertaining.
The fix is of course to clear ->error where necessary.
If null_add_dev() fails then null_del_dev() is called with a NULL argument.
Make null_del_dev() handle this scenario correctly. This patch fixes the
following KASAN complaint:
null-ptr-deref in null_del_dev+0x28/0x280 [null_blk]
Read of size 8 at addr 0000000000000000 by task find/1062
printk in macro vxge_debug_ll uses __VA_ARGS__ without "##" prefix,
it causes a build error when there is no variable
arguments(e.g. only fmt is specified.).
Signed-off-by: Zheng Wei <wei.zheng@vivo.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
When doing a 16-bit read that returns data in the MSB byte, the
RSB_DATA register will keep the MSB byte unchanged when doing
the following 8-bit read. sunxi_rsb_read() will then return
a result that contains high byte from 16-bit read mixed with
the 8-bit result.
The consequence is that after this happens the PMIC's regmap will
look like this: (0x33 is the high byte from the 16-bit read)
Fix this by masking the result of the read with the correct mask
based on the size of the read. There are no 16-bit users in the
mainline kernel, so this doesn't need to get into the stable tree.
Signed-off-by: Ondrej Jirman <megous@megous.com> Acked-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Signed-off-by: Sasha Levin <sashal@kernel.org>
[subject was: drm/msm: shake fist angrily at dma-mapping]
So, using dma_sync_* for our cache needs works out w/ dma iommu ops, but
it falls appart with dma direct ops. The problem is that, depending on
display generation, we can have either set of dma ops (mdp4 and dpu have
iommu wired to mdss node, which maps to toplevel drm device, but mdp5
has iommu wired up to the mdp sub-node within mdss).
Fixes: 0036bc73ccbe (drm/msm: stop abusing dma_map/unmap for cache) Fixes: 449fa54d6815 (dma-direct: correct the physical addr in dma_direct_sync_sg_for_cpu/device) Tested-by: Stephan Gerhold <stephan@gerhold.net> Signed-off-by: Rob Clark <robdclark@chromium.org>
[seanpaul changed subject to something more desriptive] Signed-off-by: Sean Paul <seanpaul@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20190730214633.17820-1-robdclark@gmail.com Cc: nobuhiro1.iwamatsu@toshiba.co.jp Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts
commit 6a4290cc28be1 ("usb: dwc3: gadget: set the OTG flag in dwc3 gadget driver.")
We don't yet support any of the OTG mechanisms (HNP/SRP/ADP)
and are not setting gadget->otg_caps, so don't set gadget->is_otg
flag.
If we do then we end up publishing a OTG1.0 descriptor in
the gadget descriptor which causes device enumeration to fail
if we are connected to a host with CONFIG_USB_OTG enabled.
Host side log without this patch
[ 96.720453] usb 1-1: new high-speed USB device number 2 using xhci-hcd
[ 96.901391] usb 1-1: Dual-Role OTG device on non-HNP port
[ 96.907552] usb 1-1: set a_alt_hnp_support failed: -32
[ 97.060447] usb 1-1: new high-speed USB device number 3 using xhci-hcd
[ 97.241378] usb 1-1: Dual-Role OTG device on non-HNP port
[ 97.247536] usb 1-1: set a_alt_hnp_support failed: -32
[ 97.253606] usb usb1-port1: attempt power cycle
[ 97.960449] usb 1-1: new high-speed USB device number 4 using xhci-hcd
[ 98.141383] usb 1-1: Dual-Role OTG device on non-HNP port
[ 98.147540] usb 1-1: set a_alt_hnp_support failed: -32
[ 98.300453] usb 1-1: new high-speed USB device number 5 using xhci-hcd
[ 98.481391] usb 1-1: Dual-Role OTG device on non-HNP port
[ 98.487545] usb 1-1: set a_alt_hnp_support failed: -32
[ 98.493532] usb usb1-port1: unable to enumerate USB device
Signed-off-by: Roger Quadros <rogerq@ti.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is possible for the chunk sizes coming from the non RPM remote procs
to not be word aligned. Remove the alignment warning and continue to
read from the FIFO so execution is not stalled.
Signed-off-by: Chris Lew <clew@codeaurora.org> Signed-off-by: Arun Kumar Neelakantam <aneela@codeaurora.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
__early_cpu_boot_status is of type long. Use quad
assembler directive to allocate proper size.
Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Arun KS <arunks@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Not quite sure what triggered that, but we really shouldn't be abusing
dma_{map,unmap}_sg() for cache maint.
Cc: Stephen Boyd <sboyd@kernel.org> Tested-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Sean Paul <seanpaul@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20190630124735.27786-1-robdclark@gmail.com Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 11189c1089da "acpi/nfit: Fix command-supported detection" broke
ND_CMD_CALL for bus-level commands. The "func = cmd" assumption is only
valid for:
syzbot reported that 4fbc0c711b24 ("ceph: remove the extra slashes in
the server path") had caused a regression where an allocation could be
done under a spinlock -- compare_mount_options() is called by sget_fc()
with sb_lock held.
We don't really need the supplied server path, so canonicalize it
in place and compare it directly. To make this work, the leading
slash is kept around and the logic in ceph_real_mount() to skip it
is restored. CEPH_MSG_CLIENT_SESSION now reports the same (i.e.
canonicalized) path, with the leading slash of course.
Fixes: 4fbc0c711b24 ("ceph: remove the extra slashes in the server path") Reported-by: syzbot+98704a51af8e3d9425a9@syzkaller.appspotmail.com Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Luis Henriques <lhenriques@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
And then when creating any new file or directory under the mount
point, we can hit the following BUG_ON in ceph_fill_trace():
BUG_ON(ceph_snap(dir) != dvino.snap);
Have the client ignore the extra slashes in the server path when
mounting. This will also canonicalize the path, so that identical mounts
can be consilidated.
Regardless of the internal treatment of these paths, the kernel still
stores the original string including the leading '/' for presentation
to userland.
URL: https://tracker.ceph.com/issues/42771 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Luis Henriques <lhenriques@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
- Releasing dd->per_sdma[i].kobject in hfi1_unregister_sysfs().
- This will fix the memory leak.
- Calling kobject_put() to unwind operations only for those entries in
dd->per_sdma[] whose operations have succeeded (including the current
one that has just failed) in hfi1_verbs_register_sysfs().
Cc: <stable@vger.kernel.org> Fixes: 0cb2aa690c7e ("IB/hfi1: Add sysfs interface for affinity setup") Link: https://lore.kernel.org/r/20200326163807.21129.27371.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When kobject_init_and_add() returns an error in the function
hfi1_create_port_files(), the function kobject_put() is not called for the
corresponding kobject, which potentially leads to memory leak.
This patch fixes the issue by calling kobject_put() even if
kobject_init_and_add() fails.
Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200326163813.21129.44280.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make sure that the rngc interrupt is masked if the rngc self test fails.
Self test failure means that probe fails as well. Interrupts should be
masked in this case, regardless of the error.
Cc: stable@vger.kernel.org Fixes: 1d5449445bd0 ("hwrng: mx-rngc - add a driver for Freescale RNGC") Reviewed-by: PrasannaKumar Muralidharan <prasannatsmkumar@gmail.com> Signed-off-by: Martin Kaiser <martin@kaiser.cx> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A recent change to the netlink code: 6e237d099fac ("netlink: Relax attr
validation for fixed length types") logs a warning when programs send
messages with invalid attributes (e.g., wrong length for a u32). Yafang
reported this error message for tools/accounting/getdelays.c.
send_cmd() is wrongly adding 1 to the attribute length. As noted in
include/uapi/linux/netlink.h nla_len should be NLA_HDRLEN + payload
length, so drop the +1.
It turns out that RDRAND is pretty slow. Comparing these two
constructions:
for (i = 0; i < CHACHA_BLOCK_SIZE; i += sizeof(ret))
arch_get_random_long(&ret);
and
long buf[CHACHA_BLOCK_SIZE / sizeof(long)];
extract_crng((u8 *)buf);
it amortizes out to 352 cycles per long for the top one and 107 cycles
per long for the bottom one, on Coffee Lake Refresh, Intel Core i9-9880H.
And importantly, the top one has the drawback of not benefiting from the
real rng, whereas the bottom one has all the nice benefits of using our
own chacha rng. As get_random_u{32,64} gets used in more places (perhaps
beyond what it was originally intended for when it was introduced as
get_random_{int,long} back in the md5 monstrosity era), it seems like it
might be a good thing to strengthen its posture a tiny bit. Doing this
should only be stronger and not any weaker because that pool is already
initialized with a bunch of rdrand data (when available). This way, we
get the benefits of the hardware rng as well as our own rng.
Another benefit of this is that we no longer hit pitfalls of the recent
stream of AMD bugs in RDRAND. One often used code pattern for various
things is:
do {
val = get_random_u32();
} while (hash_table_contains_key(val));
That recent AMD bug rendered that pattern useless, whereas we're really
very certain that chacha20 output will give pretty distributed numbers,
no matter what.
So, this simplification seems better both from a security perspective
and from a performance perspective.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20200221201037.30231-1-Jason@zx2c4.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The handler for FLOW_ACTION_VLAN_MANGLE ends by returning whatever the
lower-level function that it calls returns. If there are more actions lined
up after this action, those are never offloaded. Fix by only bailing out
when the called function returns an error.
Fixes: a150201a70da ("mlxsw: spectrum: Add support for vlan modify TC action") Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
struct can_frame contains some padding which is not explicitly zeroed in
slc_bump. This uninitialized data will then be transmitted if the stack
initialization hardening feature is not enabled (CONFIG_INIT_STACK_ALL).
This commit just zeroes the whole struct including the padding.
Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com> Fixes: a1044e36e457 ("can: add slcan driver for serial/USB-serial CAN adapters") Reviewed-by: Kees Cook <keescook@chromium.org> Cc: linux-can@vger.kernel.org Cc: netdev@vger.kernel.org Cc: security@kernel.org Cc: wg@grandegger.com Cc: mkl@pengutronix.de Cc: davem@davemloft.net Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 9463c4455900 ("net: stmmac: dwmac1000: Clear unused address
entries") cleared the unused mac address entries, but introduced an
out-of bounds mac address register programming bug -- After setting
the secondary unicast mac addresses, the "reg" value has reached
netdev_uc_count() + 1, thus we should only clear address entries
if (addr < perfect_addr_number)
Fixes: 9463c4455900 ("net: stmmac: dwmac1000: Clear unused address entries") Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After the power-down bit is cleared, the chip internally triggers a
global reset. According to the KSZ9031 documentation, we have to wait at
least 1ms for the reset to finish.
If the chip is accessed during reset, read will return 0xffff, while
write will be ignored. Depending on the system performance and MDIO bus
speed, we may or may not run in to this issue.
This bug was discovered on an iMX6QP system with KSZ9031 PHY and
attached PHY interrupt line. If IRQ was used, the link status update was
lost. In polling mode, the link status update was always correct.
The investigation showed, that during a read-modify-write access, the
read returned 0xffff (while the chip was still in reset) and
corresponding write hit the chip _after_ reset and triggered (due to the
0xffff) another reset in an undocumented bit (register 0x1f, bit 1),
resulting in the next write being lost due to the new reset cycle.
This patch fixes the issue by adding a 1...2 ms sleep after the
genphy_resume().
Fixes: 836384d2501d ("net: phy: micrel: Add specific suspend") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the bcm_sf2 was converted into a proper platform device driver and
used the new dsa_register_switch() interface, we would still be parsing
the legacy DSA node that contained all the port information since the
platform firmware has intentionally maintained backward and forward
compatibility to client programs. Ensure that we do parse the correct
node, which is "ports" per the revised DSA binding.
Fixes: d9338023fb8e ("net: dsa: bcm_sf2: Make it a real platform device driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bonding slave and team port devices should not have link-local addresses
automatically added to them, as it can interfere with openvswitch being
able to properly add tc ingress.
Basic reproducer, courtesy of Marcelo:
$ ip link add name bond0 type bond
$ ip link set dev ens2f0np0 master bond0
$ ip link set dev ens2f1np2 master bond0
$ ip link set dev bond0 up
$ ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens2f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc
mq master bond0 state UP group default qlen 1000
link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
5: ens2f1np2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc
mq master bond0 state DOWN group default qlen 1000
link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc
noqueue state UP group default qlen 1000
link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
inet6 fe80::20f:53ff:fe2f:ea40/64 scope link
valid_lft forever preferred_lft forever
$ ip a l ens2f0np0
2: ens2f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc
mq master bond0 state UP group default qlen 1000
link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
inet6 fe80::20f:53ff:fe2f:ea40/64 scope link tentative
valid_lft forever preferred_lft forever
$ ip a l ens2f1np2
5: ens2f1np2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc
mq master bond0 state DOWN group default qlen 1000
link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
inet6 fe80::20f:53ff:fe2f:ea40/64 scope link tentative
valid_lft forever preferred_lft forever
Looks like addrconf_sysctl_addr_gen_mode() bypasses the original "is
this a slave interface?" check added by commit c2edacf80e15, and
results in an address getting added, while w/the proposed patch added,
no address gets added. This simply adds the same gating check to another
code path, and thus should prevent the same devices from erroneously
obtaining an ipv6 link-local address.
Fixes: d35a00b8e33d ("net/ipv6: allow sysctl to change link-local address generation mode") Reported-by: Moshe Levi <moshele@mellanox.com> CC: Stephen Hemminger <stephen@networkplumber.org> CC: Marcelo Ricardo Leitner <mleitner@redhat.com> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>