The "MiB" result of the IMC free-running bandwidth events,
uncore_imc_free_running/read/ and uncore_imc_free_running/write/ are 16
times too small.
The "MiB" value equals the raw IMC free-running bandwidth counter value
times a "scale" which is inaccurate.
The IMC free-running bandwidth events should be incremented per 64B
cache line, not DWs (4 bytes). The "scale" should be 6.103515625e-5.
Fix the "scale" for both Snow Ridge and Ice Lake.
Fixes: 2b3b76b5ec67 ("perf/x86/intel/uncore: Add Ice Lake server uncore support") Fixes: ee49532b38dd ("perf/x86/intel/uncore: Add IMC uncore support for Snow Ridge") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200928133240.12977-1-kan.liang@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
There are some updates for the Icelake model specific uncore performance
monitors. (The updates can be found in the 10th Generation Intel Core
Processor Families Specification Update, Revision 004, ICL068.)
1) Counter 0 of ARB uncore unit is not available for software use
2) The global 'enable bit' (bit 29) and 'freeze bit' (bit 31) of
MSR_UNC_PERF_GLOBAL_CTRL cannot be used to control counter behavior.
The local enable bit in the event select MSR needs to be used instead.
Accesses to the modified bits/registers will be ignored by HW. Users may
observe inaccurate results with the current code.
The changes to MSR_UNC_PERF_GLOBAL_CTRL imply that groups cannot be
read atomically anymore. Although the error of the result for a group
becomes a bit bigger, it is still far lower than not using a group.
Group support is therefore kept; only the *_box() related
implementation is removed.
Since counter 0 of the ARB uncore unit is not available, update the MSR
address for the ARB uncore unit.
There is no change for the IMC uncore unit, which only includes free-running
counters.
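For illustration, per-counter ("local") enabling in the generic MSR uncore
helpers looks roughly like this (a hedged sketch, not the literal patch; the
enable-bit macro name varies by platform):

    static void icl_uncore_msr_enable_event(struct intel_uncore_box *box,
                                            struct perf_event *event)
    {
            struct hw_perf_event *hwc = &event->hw;

            /* local enable bit in the event select MSR, instead of
             * relying on MSR_UNC_PERF_GLOBAL_CTRL */
            wrmsrl(hwc->config_base, hwc->config | SNB_UNC_CTL_EN);
    }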
Writes to the PMXEVTYPER_EL0 register are not self-synchronising. In
armv8pmu_enable_event(), the PE can reorder configuring the event type
after we have enabled the counter and the interrupt. This can lead to an
interrupt being asserted because of the previous event type that we were
counting using the same counter, not the one that we've just configured.
The same rationale applies to writes to the PMINTENSET_EL1 register. The PE
can reorder enabling the interrupt at any point in the future after we have
enabled the event.
Prevent both situations from happening by adding an ISB just before we
enable the event counter.
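The resulting helper is essentially (a sketch, assuming the armv8pmu helpers
of the time):

    static inline void armv8pmu_enable_counter(u32 mask)
    {
            /*
             * Make sure event configuration register writes are visible
             * before we enable the counter.
             */
            isb();
            write_sysreg(mask, pmcntenset_el0);
    }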
Fixes: 030896885ade ("arm64: Performance counters support") Reported-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox) Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20200924110706.254996-2-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Something is wrong. In find_busiest_group(), we are checking whether
src has a higher load; however, in task_numa_find_cpu(), we are
checking whether dst will have a higher load after balancing. It does
not seem sensible to check src.
This may produce a wrong imbalance value. For example,
if dst_running = env->dst_stats.nr_running + 1 results in 3 or
above, and src_running = env->src_stats.nr_running - 1 results
in 1, the current code computes the imbalance as 0, since src_running is
smaller than 2. This is inconsistent with the load balancer.
Basically, in find_busiest_group(), the NUMA imbalance is ignored if moving
a task "from an almost idle domain" to a "domain with spare capacity". This
patch forbids movement "from a misplaced domain" to "an almost idle domain"
as that is closer to what the CPU load balancer expects.
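A minimal sketch of the change in task_numa_find_cpu(), reusing the variables
named above (illustrative; see the actual patch for the full context):

    src_running = env->src_stats.nr_running - 1;
    dst_running = env->dst_stats.nr_running + 1;
    imbalance = max(0, dst_running - src_running);
    /* check the destination, as the CPU load balancer does */
    imbalance = adjust_numa_imbalance(imbalance, dst_running);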
This patch is not a universal win. The old behaviour was intended to allow
a task from an almost idle NUMA node to migrate to its preferred node if
the destination had capacity but there are corner cases. For example,
a NAS compute load could be parallelised to use 1/3rd of available CPUs
but not all those potential tasks are active at all times allowing this
logic to trigger. An obvious example is specjbb 2005 running various
numbers of warehouses on a 2 socket box with 80 cpus.
The performance at the mid-point is better but not universally better. The
patch is a mixed bag depending on the workload, machine and overall
levels of utilisation. Sometimes it's better (sometimes much better),
other times it is worse (sometimes much worse). Given that there isn't a
universally good decision in this area, and more people seem to prefer
the patch, it may be best to keep the LB decisions consistent and
revisit imbalance handling when the load balancer code changes settle down.
Jirka Hladky added the following observation.
Our results are mostly in line with what you see. We observe
big gains (20-50%) when the system is loaded to 1/3 of the
maximum capacity and mixed results at the full load - some
workloads benefit from the patch at the full load, others not,
but performance changes at the full load are mostly within the
noise of results (+/-5%). Overall, we think this patch is helpful.
[mgorman@techsingularity.net: Rewrote changelog] Fixes: fb86f5b211 ("sched/numa: Use similar logic to the load balancer for moving between domains with spare capacity") Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200921221849.GI3179@techsingularity.net Signed-off-by: Sasha Levin <sashal@kernel.org>
We've met problems in our production environment where tasks with a full
cpumask (e.g. by putting them into a cpuset or setting full affinity)
were occasionally migrated to our isolated cpus.
After some analysis, we found that this is because the current
select_idle_smt() does not consider the sched_domain mask.
Steps to reproduce on my 31-CPU hyper-threaded machine:
1. with boot parameter: "isolcpus=domain,2-31"
(thread lists: 0,16 and 1,17)
2. cgcreate -g cpu:test; cgexec -g cpu:test "test_threads"
3. some threads will be migrated to the isolated cpu16~17.
Fix it by checking the valid domain mask in select_idle_smt().
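A sketch close to the actual change, adding the domain-span test to the SMT
scan:

    for_each_cpu(cpu, cpu_smt_mask(target)) {
            if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
                !cpumask_test_cpu(cpu, sched_domain_span(sd)))
                    continue;
            if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
                    return cpu;
    }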
Fixes: 10e2f1acd010 ("sched/core: Rewrite and improve select_idle_siblings()) Reported-by: Wetp Zhang <wetp.zy@linux.alibaba.com> Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/1600930127-76857-1-git-send-email-xlpang@linux.alibaba.com Signed-off-by: Sasha Levin <sashal@kernel.org>
In tx2_uncore_pmu_init_dev(), a call to acpi_dev_get_resources() is used
to create a list of _CRS resources which is searched for the device base
address. There is an error check following this:
    if (!rentry->res)
            return NULL;
In no case will rentry->res be NULL, so the test is useless. Even
if the test worked, it comes before the resource list memory is
freed. None of this really matters as long as the ACPI table has
the memory resource. Let's clean it up so that it makes sense and
will give a meaningful error should firmware leave out the memory
resource.
The xgene PMU driver has a similar problem, caused by its use of an
uninitialized local resource struct. The thunderx2_pmu driver avoids this by using the resource list
constructed by acpi_dev_get_resources() rather than using a callback from
that function. The callback in the xgene driver didn't fully initialize
the resource. So get rid of the callback and search the resource list as
done by thunderx2.
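A hedged sketch of the thunderx2-style search over the resource list (names
follow the generic ACPI resource-list API):

    struct resource_entry *rentry;
    struct resource res;
    bool found = false;

    list_for_each_entry(rentry, &resource_list, node) {
            if (resource_type(rentry->res) == IORESOURCE_MEM) {
                    res = *rentry->res;
                    found = true;
                    break;
            }
    }
    acpi_dev_free_resource_list(&resource_list);

    if (!found)
            return NULL;    /* firmware left out the memory resource */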
Fixes: 832c927d119b ("perf: xgene: Add APM X-Gene SoC Performance Monitoring Unit driver") Signed-off-by: Mark Salter <msalter@redhat.com> Link: https://lore.kernel.org/r/20200915204110.326138-1-msalter@redhat.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently the ARMv8.3-PAuth combined branch instructions (braa, retaa,
etc.) are not simulated for out-of-line execution with a handler. Hence
uprobing such instructions leads to kernel warnings in a loop, as they are
not explicitly checked and fall into the INSN_GOOD category. Other combined
instructions like LDRAA and LDRAB can be probed.
The issue with the combined branch instructions is fixed by adding
group definitions for all such instructions and rejecting their probes.
The instruction groups added are br_auth(braa, brab, braaz and brabz),
blr_auth(blraa, blrab, blraaz and blrabz), ret_auth(retaa and retab) and
eret_auth(eretaa and eretab).
0c2a3913d6f5 ("x86/fpu: Parse clearcpuid= as early XSAVE argument")
changed clearcpuid parsing from __setup() to cmdline_find_option().
While the __setup() function would have been called for each clearcpuid=
parameter on the command line, cmdline_find_option() will only return
the last one, so the change effectively made it impossible to disable
more than one bit.
Allow a comma-separated list of bit numbers as the argument for
clearcpuid to allow multiple bits to be disabled again. Log the bits
being disabled for informational purposes.
Also fix the check on the return value of cmdline_find_option(). It
returns -1 when the option is not found, so testing as a boolean is
incorrect.
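A sketch of the intended parsing (illustrative; the local variable names are
assumptions, not necessarily the patch's):

    char arg[128], *opt = arg, *str;
    unsigned int bit;

    /* cmdline_find_option() returns -1 when the option is absent */
    if (cmdline_find_option(boot_command_line, "clearcpuid",
                            arg, sizeof(arg)) >= 0) {
            while ((str = strsep(&opt, ","))) {
                    if (kstrtouint(str, 10, &bit) == 0 &&
                        bit < NCAPINTS * 32) {
                            pr_info("clearing CPUID bit %u\n", bit);
                            setup_clear_cpu_cap(bit);
                    }
            }
    }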
In the non-overflow context, e.g., context switch, with large PEBS, perf
may stop an event twice. An example is below.
//max_samples_per_tick is adjusted to 2
//NMI is triggered
intel_pmu_handle_irq()
   handle_pmi_common()
     drain_pebs()
       __intel_pmu_pebs_event()
         perf_event_overflow()
           __perf_event_account_interrupt()
             hwc->interrupts = 1
             return 0
//A context switch happens right after the NMI.
//In the same tick, the perf_throttled_seq is not changed.
perf_event_task_sched_out()
   perf_pmu_sched_task()
     intel_pmu_drain_pebs_buffer()
       __intel_pmu_pebs_event()
         perf_event_overflow()
           __perf_event_account_interrupt()
             ++hwc->interrupts >= max_samples_per_tick
             return 1
         x86_pmu_stop();  # First stop
perf_event_context_sched_out()
   task_ctx_sched_out()
     ctx_sched_out()
       event_sched_out()
         x86_pmu_del()
           x86_pmu_stop();  # Second stop and trigger the warning
Perf should only invoke the perf_event_overflow() in the overflow
context.
Current drain_pebs() is called from:
- handle_pmi_common() -- overflow context
- intel_pmu_pebs_sched_task() -- non-overflow context
- intel_pmu_pebs_disable() -- non-overflow context
- intel_pmu_auto_reload_read() -- possible overflow context
With PERF_SAMPLE_READ + PERF_FORMAT_GROUP, the function may be
invoked in the NMI handler. But, before calling the function, the
PEBS buffer has already been drained. The __intel_pmu_pebs_event()
will not be called in the possible overflow context.
To fix the issue, an indicator is required to distinguish between the
overflow context aka handle_pmi_common() and other cases.
The dummy regs pointer can be used as the indicator.
In the non-overflow context, perf should treat the last record the same
as other PEBS records, and not invoke the generic overflow handler.
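A hedged sketch of the indicator, simplified from __intel_pmu_pebs_event():

    /* callers in non-overflow context pass iregs == NULL */
    if (!iregs)
            iregs = &dummy_iregs;
    ...
    if (iregs == &dummy_iregs) {
            /* non-overflow context: treat the last record like the
             * others, skip the generic overflow handler */
            perf_event_output(event, &data, regs);
    } else if (perf_event_overflow(event, &data, regs)) {
            x86_pmu_stop(event, 0);
    }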
Fixes: 21509084f999 ("perf/x86/intel: Handle multiple records in the PEBS buffer") Reported-by: Like Xu <like.xu@linux.intel.com> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Like Xu <like.xu@linux.intel.com> Link: https://lkml.kernel.org/r/20200902210649.2743-1-kan.liang@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
platform_get_irq() returns a negative error number on error. In such a
case, a comparison to 0 would incorrectly pass the check, so properly
check whether the return value is negative.
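The correct pattern (a generic sketch, equally applicable to the Aspeed fix
that follows):

    irq = platform_get_irq(pdev, 0);
    if (irq < 0)
            return irq;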
[ bp: Massage commit message. ]
Fixes: 86a18ee21e5e ("EDAC, ti: Add support for TI keystone and DRA7xx EDAC") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tero Kristo <t-kristo@ti.com> Link: https://lkml.kernel.org/r/20200827070743.26628-2-krzk@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
platform_get_irq() returns a negative error number on error. In such a
case, a comparison to 0 would incorrectly pass the check, so properly
check whether the return value is negative.
[ bp: Massage commit message. ]
Fixes: 9b7e6242ee4e ("EDAC, aspeed: Add an Aspeed AST2500 EDAC driver") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Stefan Schaeckeler <schaecsn@gmx.net> Link: https://lkml.kernel.org/r/20200827070743.26628-1-krzk@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
When pci_get_device_func() fails, the driver doesn't need to execute
pci_dev_put(). mci should still be freed, though, to prevent a memory
leak. When pci_enable_device() fails, the error injection PCI device
"einj" doesn't need to be disabled either.
[ bp: Massage commit message, rename label to "bail_mc_free". ]
Fixes: 52608ba205461 ("i5100_edac: probe for device 19 function 0") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200826121437.31606-1-dinghao.liu@zju.edu.cn Signed-off-by: Sasha Levin <sashal@kernel.org>
In find_energy_efficient_cpu() 'cpu_cap' could be less than 'util'.
This can happen because of RT or DL tasks (i.e. sched classes higher than
CFS), or because of irq or thermal pressure, all of which reduce the
capacity value.
In such a situation the result of 'cpu_cap - util' might be negative but
stored in an unsigned long. It might then be compared with other unsigned
longs when uclamp_rq_util_with() has reduced 'util' such that it passes the
fits_capacity() check.
Prevent this situation and make the arithmetic safer.
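A sketch of the safer arithmetic using the fair-class helper lsub_positive()
(illustrative, not the full hunk):

    util = uclamp_rq_util_with(cpu_rq(cpu), util, p);
    if (!fits_capacity(util, cpu_cap))
            continue;

    /* clamp the subtraction at zero instead of underflowing */
    spare_cap = cpu_cap;
    lsub_positive(&spare_cap, util);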
Fixes: 1d42509e475cd ("sched/fair: Make EAS wakeup placement consider uclamp restrictions") Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20200810083004.26420-1-lukasz.luba@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
The CAAM accelerator only supports XTS-AES-128 and XTS-AES-256, since
it adheres strictly to the standard. All other key lengths
are accepted and processed through a fallback, as long as they pass
the xts_verify_key() checks.
Fixes: b189817cf789 ("crypto: caam/qi - add ablkcipher and authenc algorithms") Cc: <stable@vger.kernel.org> # v4.12+ Signed-off-by: Andrei Botila <andrei.botila@nxp.com> Reviewed-by: Horia Geantฤ <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A hardware limitation exists for CAAM until Era 9 which restricts
the accelerator to IVs of only 8 bytes. When CAAM has a lower era,
a fallback is necessary to process 16-byte IVs.
Fixes: b189817cf789 ("crypto: caam/qi - add ablkcipher and authenc algorithms") Cc: <stable@vger.kernel.org> # v4.12+ Signed-off-by: Andrei Botila <andrei.botila@nxp.com> Reviewed-by: Horia Geantฤ <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The async path cannot use MAY_BACKLOG because it is not meant to
block, which is what MAY_BACKLOG does. On the other hand, both
the sync and async paths can make use of MAY_SLEEP.
Fixes: 83094e5e9e49 ("crypto: af_alg - add async support to...") Cc: <stable@vger.kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Errors returned by crypto_shash_update() are not checked in
ima_calc_boot_aggregate_tfm() and thus can be overwritten at the next
iteration of the loop. This patch adds a check after calling
crypto_shash_update() and returns immediately if the result is not zero.
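A minimal sketch of the added check, assuming the PCR loop in
ima_calc_boot_aggregate_tfm():

    for (i = TPM_PCR0; i < TPM_PCR8; i++) {
            ima_pcrread(i, &d);
            /* now check the result instead of ignoring it */
            rc = crypto_shash_update(shash, d.digest,
                                     crypto_shash_digestsize(tfm));
            if (rc != 0)
                    return rc;
    }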
Cc: stable@vger.kernel.org Fixes: 3323eec921efd ("integrity: IMA as an integrity service provider") Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The function amd_ir_set_vcpu_affinity makes use of the parameter struct
amd_iommu_pi_data.prev_ga_tag to determine if it should delete struct
amd_iommu_pi_data from a list when not running in AVIC mode.
However, prev_ga_tag is initialized only when AVIC is enabled. A non-zero
uninitialized value can cause an unintended code path to be taken, which
ends up making use of struct vcpu_svm.ir_list and ir_list_lock without
their being initialized (since they are intended only for the AVIC case).
This triggers a NULL pointer dereference bug in the function vm_ir_list_del
with the following call trace:
Unconditionally intercept changes to CR4.LA57 so that KVM correctly
injects a #GP fault if the guest attempts to set CR4.LA57 when it's
supported in hardware but not exposed to the guest.
Long term, KVM needs to properly handle CR4 bits that can be under guest
control but also may be reserved from the guest's perspective. But, KVM
currently sets the CR4 guest/host mask only during vCPU creation, and
reworking flows to change that will take a bit of elbow grease.
Even if/when generic support for intercepting reserved bits exists, it's
probably not worth letting the guest set CR4.LA57 directly. LA57 can't
be toggled while long mode is enabled, thus it's all but guaranteed to
be set once (maybe twice, e.g. by BIOS and kernel) during boot and never
touched again. On the flip side, letting the guest own CR4.LA57 may
incur extra VMREADs. In other words, this temporary "hack" is probably
also the right long term fix.
Fixes: fd8cb433734e ("KVM: MMU: Expose the LA57 feature to VM.") Cc: stable@vger.kernel.org Cc: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
[sean: rewrote changelog] Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200930041659.28181-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Call kvm_mmu_commit_zap_page() after exiting the "prepare zap" loop in
kvm_recover_nx_lpages() to finish zapping pages in the unlikely event
that the loop exited due to lpage_disallowed_mmu_pages being empty.
Because the recovery thread drops mmu_lock() when rescheduling, it's
possible that lpage_disallowed_mmu_pages could be emptied by a different
thread without to_zap reaching zero despite to_zap being derived from
the number of disallowed lpages.
Fixes: 1aa9b9572b105 ("kvm: x86: mmu: Recovery of shattered NX large pages") Cc: Junaid Shahid <junaids@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923183735.584-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Explicitly reset the segment cache after stuffing guest segment regs in
prepare_vmcs02_rare(). Although the cache is reset when switching to
vmcs02, there is nothing that prevents KVM from re-populating the cache
prior to writing vmcs02 with vmcs12's values. E.g. if the vCPU is
preempted after switching to vmcs02 but before prepare_vmcs02_rare(),
kvm_arch_vcpu_put() will dereference GUEST_SS_AR_BYTES via .get_cpl()
and cache the stale vmcs02 value. While the current code base only
caches stale data in the preemption case, it's theoretically possible
future code could read a segment register during the nested flow itself,
i.e. this isn't technically illegal behavior in kvm_arch_vcpu_put(),
although it did introduce the bug.
This manifests as an unexpected nested VM-Enter failure when running
with unrestricted guest disabled if the above preemption case coincides
with L1 switching L2's CPL, e.g. when switching from an L2 vCPU at CPL3
to an L2 vCPU at CPL0. stack_segment_valid() will see the new SS_SEL
but the old SS_AR_BYTES and incorrectly mark the guest state as invalid
due to SS.dpl != SS.rpl.
Don't bother updating the cache even though prepare_vmcs02_rare() writes
every segment. With unrestricted guest, guest segments are almost never
read, let alone L2 guest segments. On the other hand, populating the
cache requires a large number of memory writes, i.e. it's unlikely to be
a net win. Updating the cache would be a win when unrestricted guest is
not supported, as guest_state_valid() will immediately cache all segment
registers. But, nested virtualization without unrestricted guest is
dirt slow, saving some VMREADs won't change that, and every CPU
manufactured in the last decade supports unrestricted guest. In other
words, the extra (minor) complexity isn't worth the trouble.
Note, kvm_arch_vcpu_put() may see stale data when querying guest CPL
depending on when preemption occurs. This is "ok" in that the usage is
imperfect by nature, i.e. it's used heuristically to improve performance
but doesn't affect functionality. kvm_arch_vcpu_put() could be "fixed"
by also disabling preemption while loading segments, but that's
pointless and misleading as reading state from kvm_sched_{in,out}() is
guaranteed to see stale data in one form or another. E.g. even if all
the usage of regs_avail is fixed to call kvm_register_mark_available()
after the associated state is set, the individual state might still be
stale with respect to the overall vCPU state. I.e. making functional
decisions in an asynchronous hook is doomed from the get go. Thankfully
KVM doesn't do that.
Fixes: de63ad4cf4973 ("KVM: X86: implement the logic for spinlock optimization") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923184452.980-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On successful nested VM-Enter, check for pending interrupts and convert
the highest priority interrupt to a pending posted interrupt if it
matches L2's notification vector. If the vCPU receives a notification
interrupt before nested VM-Enter (assuming L1 disables IRQs before doing
VM-Enter), the pending interrupt (for L1) should be recognized and
processed as a posted interrupt when interrupts become unblocked after
VM-Enter to L2.
This fixes a bug where L1/L2 will get stuck in an infinite loop if L1 is
trying to inject an interrupt into L2 by setting the appropriate bit in
L2's PIR and sending a self-IPI prior to VM-Enter (as opposed to KVM's
method of manually moving the vector from PIR->vIRR/RVI). KVM will
observe the IPI while the vCPU is in L1 context and so won't immediately
morph it to a posted interrupt for L2. The pending interrupt will be
seen by vmx_check_nested_events(), cause KVM to force an immediate exit
after nested VM-Enter, and eventually be reflected to L1 as a VM-Exit.
After handling the VM-Exit, L1 will see that L2 has a pending interrupt
in PIR, send another IPI, and repeat until L2 is killed.
Note, posted interrupts require virtual interrupt delivery, and virtual
interrupt delivery requires exit-on-interrupt, ergo interrupts will be
unconditionally unmasked on VM-Enter if posted interrupts are enabled.
When mounting with modefromsid mount option, it was possible to
get the error on stat of a fifo or char or block device:
"cannot stat <filename>: Operation not supported"
Special devices can be stored as reparse points by some servers
(e.g. Windows NFS server and when using the SMB3.1.1 POSIX
Extensions) but when the modefromsid mount option is used
the client attempts to get the ACL for the file which requires
opening with OPEN_REPARSE_POINT create option.
Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For servers which do not support directory leases (e.g. Samba),
it is wasteful to try open_shroot (i.e. to attempt to cache the
root directory handle). Skip the open_shroot attempt when the
server does not indicate support for directory leases.
Cuts the number of requests on mount from 17 to 15, and
cuts the number of requests on stat of the root directory
from 4 to 3.
Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> # v5.1+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We were setting the uid/gid to the default in each dir entry
in the parsing of the POSIX query dir response, rather
than attempting to map the user and group SIDs returned by
the server to well known SIDs (or upcall if not found).
CC: Stable <stable@vger.kernel.org> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The TCP server info field server->total_read is modified in parallel by
the demultiplex thread and the decrypt offload worker thread.
server->total_read is used in the calculation to discard the remaining
data of a PDU which is not read into memory.
Because of the parallel modification, server->total_read can get corrupted,
which can result in discarding the valid data of the next PDU.
Signed-off-by: Rohith Surabattula <rohiths@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org> #5.4+ Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In crypt_message, when smb2_get_enc_key returns error, we need to
return the error back to the caller. If not, we end up processing
the message further, causing a kernel oops due to unwarranted access
of memory.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The "end" pointer is either NULL or it points to the next byte to parse.
If there isn't a next byte then dereferencing "end" is an off-by-one out
of bounds error. And, of course, if it's NULL that leads to an Oops.
Printing "*end" doesn't seem very useful so let's delete this code.
Also for the last debug statement, I noticed that it should be printing
"sequence_end" instead of "end" so fix that as well.
Reported-by: Dominik Maier <dmaier@sect.tu-berlin.de> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ASUS D700SA desktop's audio (1043:2390) with ALC887 cannot detect
the headset microphone and another headphone jack until
ALC887_FIXUP_ASUS_HMIC and ALC887_FIXUP_ASUS_AUDIO quirks are applied.
NID 0x15 maps to the headset microphone and NID 0x19 maps to the other
headphone jack. A function like alc887_fixup_asus_jack() is also needed
to enable the audio jacks.
Signed-off-by: Jian-Hong Pan <jhp@endlessos.org> Signed-off-by: Kailang Yang <kailang@realtek.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20201007052224.22611-1-jhp@endlessos.org Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After installing Arch Linux, the mute LED and micmute LED are not working
at all. This patch fixes the issue by applying a fixup from a similar
model. These mute LEDs are confirmed working on the HP EliteBook 845 G7.
Recently we enabled an HP AIO machine and found that the mic on the
machine couldn't record any sound, nor could it detect plugging and
unplugging.
Through debugging we found the mic is set to manual detect mode; after
setting it to auto detect mode, it could detect plugging and
unplugging and could record sound.
The recently released Line6 Pod Go requires the static clock rate quirk
to make its USB audio interface work. Add its USB ID to the list of
similar Line6 devices.
If the cb function is already registered, the function should return a
pointer to the hda_jack_callback structure that contains this cb func,
but instead it returns NULL.
Fix it by replacing func_is_already_in_callback_list() with
find_callback_from_list().
Fixes: f4794c6064a8 ("ALSA: hda - Don't register a cb func if it is registered already") Reported-and-suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Hui Wang <hui.wang@canonical.com> Link: https://lore.kernel.org/r/20201022030221.22393-1-hui.wang@canonical.com Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the caller of enable_callback_mst() passes a cb func, the callee
allocates memory and links this cb func into the list unconditionally.
This introduces a problem if the caller is in hda_codec_ops.init(), since
init() is called repeatedly during codec runtime resume.
So far, patch_hdmi.c and patch_ca0132.c call enable_callback_mst()
in hda_codec_ops.init().
In the header prediction fast path for a bulk data receiver, if no
data is newly acknowledged then we do not call tcp_ack() and do not
call tcp_ack_update_window(). This means that a bulk receiver that
receives large amounts of data can have the incoming sequence numbers
wrap, so that the check in tcp_may_update_window fails:
after(ack_seq, tp->snd_wl1)
If the incoming receive windows are zero in this state, and then the
connection that was a bulk data receiver later wants to send data,
that connection can find itself persistently rejecting the window
updates in incoming ACKs. This means the connection can persistently
fail to discover that the receive window has opened, which in turn
means that the connection is unable to send anything, and the
connection's sending process can get permanently "stuck".
The fix is to update snd_wl1 in the header prediction fast path for a
bulk data receiver, so that it keeps up and does not see wrapping
problems.
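Concretely, this amounts to one line in the fast path (a sketch;
tcp_update_wl() just records the segment's sequence number in snd_wl1):

    /* bulk receiver fast path: keep snd_wl1 up to date */
    tcp_update_wl(tp, TCP_SKB_CB(skb)->seq);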
This fix is based on a very nice and thorough analysis and diagnosis
by Apollon Oikonomopoulos (see link below).
This is a stable candidate but there is no Fixes tag here since the
bug predates current git history. Just for fun: looks like the bug
dates back to when header prediction was added in Linux v2.1.8 in Nov
1996. In that version tcp_rcv_established() was added, and the code
only updates snd_wl1 in tcp_ack(), and in the new "Bulk data transfer:
receiver" code path it does not call tcp_ack(). This fix seems to
apply cleanly at least as far back as v3.2.
The kci_test_encap_fou() test from kci_test_encap() in rtnetlink.sh
needs the fou module to work. Otherwise it will fail with:
$ ip netns exec "$testns" ip fou add port 7777 ipproto 47
RTNETLINK answers: No such file or directory
Error talking to the kernel
Add CONFIG_NET_FOU to the config file as well; it needs to be set at
least as a loadable module.
Fixes: 6227efc1a20b ("selftests: rtnetlink.sh: add vxlan and fou test cases") Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Link: https://lore.kernel.org/r/20201019030928.9859-1-po-hsu.lin@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When 'rp_filter' is configured in strict mode (1) the tests fail because
packets received from the macvlan netdevs would not be forwarded through
them on the reverse path.
Fix this by disabling the 'rp_filter', meaning no source validation is
performed.
Fixes: 1538812e0880 ("selftests: forwarding: Add a test for VXLAN asymmetric routing") Fixes: 438a4f5665b2 ("selftests: forwarding: Add a test for VXLAN symmetric routing") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reported-by: Hangbin Liu <liuhangbin@gmail.com> Tested-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20201015084525.135121-1-idosch@idosch.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For several network drivers it was reported that using
__napi_schedule_irqoff() is unsafe with forced threading. One way to
fix this is switching back to __napi_schedule, but then we lose the
benefit of the irqoff version in general. As stated by Eric it doesn't
make sense to make the minimal hard irq handlers in drivers using NAPI
a thread. Therefore ensure that the hard irq handler is never
thread-ified.
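One way to achieve that (a hedged illustration, not necessarily the patch's
exact mechanism) is to request the interrupt with IRQF_NO_THREAD, which
exempts the handler from forced threading:

    /* drv_hardirq_handler is a hypothetical NAPI-scheduling handler */
    ret = request_irq(pdev->irq, drv_hardirq_handler, IRQF_NO_THREAD,
                      dev->name, dev);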
Check that the NFC_ATTR_FIRMWARE_NAME attributes are provided by
the netlink client prior to accessing them. This prevents potential
unhandled NULL pointer dereference exceptions, which can be triggered
by malicious user-mode programs if they omit one or both of these
attributes.
Similar to commit a0323b979f81 ("nfc: Ensure presence of required attributes in the activate_target handler").
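A sketch of the required guard at the top of the handler, following the
pattern of the cited activate_target fix:

    if (!info->attrs[NFC_ATTR_DEVICE_INDEX] ||
        !info->attrs[NFC_ATTR_FIRMWARE_NAME])
            return -EINVAL;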
Since nexthops are always deleted under RTNL, synchronize_net() can be
used instead. It will call synchronize_rcu_expedited() which only blocks
for several microseconds as opposed to multiple milliseconds like
synchronize_rcu().
With this patch deletion of 16k nexthops takes less than a second:
# time -p ip link set dev dummy10 down
real 0.12
user 0.00
sys 0.04
Tested with fib_nexthops.sh which includes torture tests that prompted
the initial change:
Memory state around the buggy address:
 ffff88813f5f1b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88813f5f1c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88813f5f1c80: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc
                                              ^
 ffff88813f5f1d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88813f5f1d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
When using IPv6 tunnels, act_tunnel_key allocates a fixed amount of memory
for the tunnel metadata, but then expects additional bytes to store
tunnel-specific metadata with tunnel_key_copy_opts().
Fix the arguments of __ipv6_tun_set_dst(), so that 'md_size' contains the
size previously computed by tunnel_key_get_opts_len(), like it's done for
IPv4 tunnels.
Fixes: 0ed5269f9e41 ("net/sched: add tunnel option support to act_tunnel_key") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://lore.kernel.org/r/36ebe969f6d13ff59912d6464a4356fe6f103766.1603231100.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In setsockopt(SO_MAX_PACING_RATE) on 64-bit systems, sk_max_pacing_rate,
after being extended from 'u32' to 'unsigned long', takes an unintentionally
hiked value whenever assigned from an 'int' value with MSB=1, due to
binary sign extension in promoting s32 to u64; e.g. 0x80000000 becomes
0xFFFFFFFF80000000.
The inflated sk_max_pacing_rate causes a subsequent getsockopt to return
~0U unexpectedly. It may also result in an increased pacing rate.
Fix by explicitly casting the 'int' value to 'unsigned int' before
assigning it to sk_max_pacing_rate, for zero extension to happen.
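A sketch of the fix in sock_setsockopt() (hedged; 'val' is the int read from
user space):

    case SO_MAX_PACING_RATE:
    {
            /* zero-extend the 32-bit value instead of sign-extending */
            unsigned long ulval = (val == ~0) ? ~0UL : (unsigned int)val;

            sk->sk_max_pacing_rate = ulval;
            break;
    }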
Fixes: 76a9ebe811fb ("net: extend sk_pacing_rate to unsigned long") Signed-off-by: Ji Li <jli@akamai.com> Signed-off-by: Ke Li <keli@akamai.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20201022064146.79873-1-keli@akamai.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This driver calls ether_setup to set up the network device.
The ether_setup function would add the IFF_TX_SKB_SHARING flag to the
device. This flag indicates that it is safe to transmit shared skbs to
the device.
However, this is not true. This driver may pad the frame (in eth_tx)
before transmission, so the skb may be modified.
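The fix pattern in the device setup path (a sketch):

    ether_setup(dev);
    /* frames may be padded in eth_tx(), so shared skbs are not safe */
    dev->priv_flags &= ~IFF_TX_SKB_SHARING;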
Fixes: 550fd08c2ceb ("net: Audit drivers to identify those needing IFF_TX_SKB_SHARING cleared") Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Krzysztof Halasa <khc@pm.waw.pl> Signed-off-by: Xie He <xie.he.0141@gmail.com> Link: https://lore.kernel.org/r/20201020063420.187497-1-xie.he.0141@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The hdlc_rcv function is used as hdlc_packet_type.func to process any
skb received in the kernel with skb->protocol == htons(ETH_P_HDLC).
The purpose of this function is to provide second-stage processing for
skbs not assigned a "real" L3 skb->protocol value in the first stage.
This function assumes the device from which the skb is received is an
HDLC device (a device created by this module). It assumes that
netdev_priv(dev) returns a pointer to "struct hdlc_device".
However, it is possible that some driver in the kernel (not necessarily
in our control) submits a received skb with skb->protocol ==
htons(ETH_P_HDLC), from a non-HDLC device. In this case, the skb would
still be received by hdlc_rcv. This will cause problems.
hdlc_rcv should be able to recognize and drop invalid skbs. It should
first make sure "dev" is actually an HDLC device, before starting its
processing. This patch adds this check to hdlc_rcv.
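A sketch close to the actual check, using the HDLC device flag:

    static int hdlc_rcv(struct sk_buff *skb, struct net_device *dev,
                        struct packet_type *p, struct net_device *orig_dev)
    {
            struct hdlc_device *hdlc;

            /* only accept skbs from devices created by this module */
            if (!(dev->priv_flags & IFF_WAN_HDLC)) {
                    kfree_skb(skb);
                    return NET_RX_SUCCESS;
            }

            hdlc = dev_to_hdlc(dev);
            ...
    }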
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: Krzysztof Halasa <khc@pm.waw.pl> Signed-off-by: Xie He <xie.he.0141@gmail.com> Link: https://lore.kernel.org/r/20201020013152.89259-1-xie.he.0141@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The new HW arbitration feature on the Aspeed AST2600 will cause MAC TX to
hang when handling scatter-gather DMA. Disable the problematic feature
by setting bits 28 and 27 of MAC register 0x58.
Fixes: 39bfab8844a0 ("net: ftgmac100: Add support for DT phy-handle property") Signed-off-by: Dylan Hung <dylan_hung@aspeedtech.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Keyu Man reported that the ICMP rate limiter could be used
by attackers to get useful signal. Details will be provided
in an upcoming academic publication.
Our solution is to add some noise, so that the attackers
no longer can get help from the predictable token bucket limiter.
Fixes: 4cdf507d5452 ("icmp: add a global rate limitation") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Keyu Man <kman001@ucr.edu> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After a MAC address change request completes successfully, the new MAC
address needs to be saved to adapter->mac_addr as well as
netdev->dev_addr. Otherwise, adapter->mac_addr still holds the old
data.
Fixes: 62740e97881c ("net/ibmvnic: Update MAC address settings after adapter reset") Signed-off-by: Lijun Pan <ljp@linux.ibm.com> Link: https://lore.kernel.org/r/20201020223919.46106-1-ljp@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a chtls_sock *csk is freed, the same memory can be allocated
to a different csk in chtls_sock_create().
The csk->cdev = NULL; statement might end up modifying the wrong
csk, eventually causing a kernel panic.
Remove the (csk->cdev = NULL) statement as it is not required.
Add the logic to compare net_device returned by ip_dev_find()
with the net_device list in cdev->ports[] array and return
net_device if matched else NULL.
Fixes: 6abde0b24122 ("crypto/chtls: IPv6 support for inline TLS") Signed-off-by: Venkatesh Ellapu <venkatesh.e@chelsio.com> Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The netdev is filled in egress_dev when a connection is established.
If the connection is closed before establishment, then egress_dev
is NULL. Fix it by using ip_dev_find() rather than extracting the
netdev from egress_dev.
Fixes: 6abde0b24122 ("crypto/chtls: IPv6 support for inline TLS") Signed-off-by: Venkatesh Ellapu <venkatesh.e@chelsio.com> Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit 16ad3f4022bb
("tipc: introduce variable window congestion control"), we applied
an algorithm to select the window size, from the minimum window to the
configured maximum window, for unicast links, and chose to keep the
window size for the broadcast link unchanged and fixed (i.e. a fixed
window of 50).
However, when setting the maximum window variable via command, the window
variable was re-initialized to an unexpected value (i.e. 32).
Fix this by updating the fixed window for broadcast as stated.
Fixes: 16ad3f4022bb ("tipc: introduce variable window congestion control") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The queue limit of the broadcast link is calculated based on the initial
MTU. However, when the MTU value changes (e.g. manually changing the MTU
on the NIC device, MTU negotiation etc.), we do not re-calculate the queue
limit, so the throughput does not reflect the change.
Fix it by calling the function to re-calculate the queue limit of the
broadcast link.
Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A race exists between closing a PCM and update of ELD data. In
hdmi_pcm_close(), hinfo->nid value is modified without taking
spec->pcm_lock. If this happens concurrently while processing an ELD
update in hdmi_pcm_setup_pin(), converter assignment may be done
incorrectly.
This bug was found by hitting a WARN_ON in snd_hda_spdif_ctls_assign()
in a HDMI receiver connection stress test:
In case the HDA controller becomes active but the codec is runtime
suspended, jack detection is not successful and no interrupt is raised.
This has been observed with multiple Realtek codecs and HDA controllers
from different vendors. The bug does not occur if both codec and
controller are active, or both are in suspend. The bug can easily be hit
on desktop systems with no built-in speaker.
The problem can be fixed by powering up the codec once after every
controller runtime resume. Even if codec goes back to suspend later, the
jack detection will continue to work. Add a flag to 'hda_codec' to
describe codecs that require this flow from the controller driver.
Modify __azx_runtime_resume() to use pm_request_resume() to make the
intent clearer.
Mark all Realtek codecs with the new forced_resume flag.
When releasing a thread's todo list while tearing down
a binder_proc, the following race was possible, which
could result in a use-after-free:
1. Thread 1: enter binder_release_work from binder_thread_release
2. Thread 2: binder_update_ref_for_handle() -> binder_dec_node_ilocked()
3. Thread 2: dec nodeA --> 0 (will free node)
4. Thread 1: ACQ inner_proc_lock
5. Thread 2: block on inner_proc_lock
6. Thread 1: dequeue work (BINDER_WORK_NODE, part of nodeA)
7. Thread 1: REL inner_proc_lock
8. Thread 2: ACQ inner_proc_lock
9. Thread 2: todo list cleanup, but work was already dequeued
10. Thread 2: free node
11. Thread 2: REL inner_proc_lock
12. Thread 1: deref w->type (UAF)
The problem was that for a BINDER_WORK_NODE, the binder_work element
must not be accessed after releasing the inner_proc_lock while
processing the todo list elements since another thread might be
handling a deref on the node containing the binder_work element
leading to the node being freed.
Petr reported that after resume from suspend RTL8402 partially
truncates incoming packets, and re-initializing register RxConfig
before the actual chip re-initialization sequence is needed to avoid
the issue.
Reported-by: Petr Tesarik <ptesarik@suse.cz> Proposed-by: Petr Tesarik <ptesarik@suse.cz> Tested-by: Petr Tesarik <ptesarik@suse.cz> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
All TC actions call tcf_action_check_ctrlact() to validate the
goto chain, so this check in tcf_action_init_1() is actually
redundant. Remove it to avoid the trouble of leaking memory.
Documentation/networking/ip-sysctl.txt:46 says:
ip_forward_use_pmtu - BOOLEAN
By default we don't trust protocol path MTUs while forwarding
because they could be easily forged and can lead to unwanted
fragmentation by the router.
You only need to enable this if you have user-space software
which tries to discover path mtus by itself and depends on the
kernel honoring this information. This is normally not the case.
Default: 0 (disabled)
Possible values:
0 - disabled
1 - enabled
Which makes it pretty clear that setting it to 1 is a potential
security/safety/DoS issue, and yet it is entirely reasonable to want
forwarded traffic to honour explicitly administrator configured
route mtus (instead of defaulting to device mtu).
Indeed, I can't think of a single reason why you wouldn't want to.
Since you configured a route mtu you probably know better...
It is pretty common to have a higher device mtu to allow receiving
large (jumbo) frames, while having some routes via that interface
(potentially including the default route to the internet) specify
a lower mtu.
Note that ipv6 forwarding uses device mtu unless the route is locked
(in which case it will use the route mtu).
This approach is not usable for IPv4 where an 'mtu lock' on a route
also has the side effect of disabling TCP path mtu discovery via
disabling the IPv4 DF (don't frag) bit on all outgoing frames.
I'm not aware of a way to lock a route from an IPv6 RA, so that also
potentially seems wrong.
Signed-off-by: Maciej Żenczykowski <maze@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Sunmeet Gill (Sunny) <sgill@quicinc.com> Cc: Vinay Paradkar <vparadka@qti.qualcomm.com> Cc: Tyler Wear <twear@quicinc.com> Cc: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes an uninit-value warning:
BUG: KMSAN: uninit-value in can_receive+0x26b/0x630 net/can/af_can.c:650
Reported-and-tested-by: syzbot+3f3837e61a48d32b495f@syzkaller.appspotmail.com Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Cc: Robin van der Gracht <robin@protonic.nl> Cc: Oleksij Rempel <linux@rempel-privat.de> Cc: Pengutronix Kernel Team <kernel@pengutronix.de> Cc: Oliver Hartkopp <socketcan@hartkopp.net> Cc: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://lore.kernel.org/r/20201008061821.24663-1-xiyou.wangcong@gmail.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 0704c5743694 ("can: m_can_platform: remove unnecessary
m_can_class_resume() call") removed the m_can_class_resume() call in the
runtime resume path to get rid of an infinite recursion, so the runtime
resume now only handles the device clocks.
Unfortunately it did not remove the complementary m_can_class_suspend() call
in the runtime suspend function, so those paths are now unbalanced, which
causes the pinctrl state to get stuck on the "sleep" state, which breaks all
CAN functionality on SoCs where this state is defined. Remove the
m_can_class_suspend() call to fix this.
Fixes: 0704c5743694 ("can: m_can_platform: remove unnecessary m_can_class_resume() call") Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Link: https://lore.kernel.org/r/20200811081545.19921-1-l.stach@pengutronix.de Acked-by: Dan Murphy <dmurphy@ti.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
SOCK_TSTAMP_NEW (timespec64 instead of timespec) is also used for
hardware time stamps (configured via SO_TIMESTAMPING_NEW).
User space (ptp4l) first configures hardware time stamping via
SO_TIMESTAMPING_NEW which sets SOCK_TSTAMP_NEW. In the next step, ptp4l
disables SO_TIMESTAMPNS(_NEW) (software time stamps), but this must not
switch hardware time stamps back to "32 bit mode".
This problem happens on 32 bit platforms where the libc has already
switched to struct timespec64 (from SO_TIMExxx_OLD to SO_TIMExxx_NEW
socket options). ptp4l complains with "missing timestamp on transmitted
peer delay request" because the wrong format is received (and
discarded).
Fixes: 887feae36aee ("socket: Add SO_TIMESTAMP[NS]_NEW") Fixes: 783da70e8396 ("net: add sock_enable_timestamps") Signed-off-by: Christian Eggers <ceggers@arri.de> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The comparison of optname with SO_TIMESTAMPING_NEW is the wrong way
around, so SOCK_TSTAMP_NEW will first be set and then reset again.
Additionally, move it out of the test for SOF_TIMESTAMPING_RX_SOFTWARE,
as it seems unrelated.
This problem happens on 32 bit platforms where the libc has already
switched to struct timespec64 (from SO_TIMExxx_OLD to SO_TIMExxx_NEW
socket options). ptp4l complains with "missing timestamp on transmitted
peer delay request" because the wrong format is received (and
discarded).
Fixes: 9718475e6908 ("socket: Add SO_TIMESTAMPING_NEW") Signed-off-by: Christian Eggers <ceggers@arri.de> Reviewed-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Reviewed-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
skb_unshare() drops a reference count on the old skb unconditionally,
so in the failure case, we end up freeing the skb twice here.
And because the skb is allocated as an fclone and cloned by the caller
tipc_msg_reassemble(), the consequence is actually freeing the
original skb too, thus triggering the UAF reported by syzbot.
Fix this by replacing this skb_unshare() with skb_cloned()+skb_copy().
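The substitution in tipc_buf_append(), sketched:

    /* was: frag = skb_unshare(frag, GFP_ATOMIC); */
    if (skb_cloned(frag))
            frag = skb_copy(frag, GFP_ATOMIC);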
Fixes: ff48b6222e65 ("tipc: use skb_unshare() instead in tipc_buf_append()") Reported-and-tested-by: syzbot+e96a7ba46281824cc46a@syzkaller.appspotmail.com Cc: Jon Maloy <jmaloy@redhat.com> Cc: Ying Xue <ying.xue@windriver.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When sendpage first gets called with more data to follow, 'more' in
tls_push_data() gets set, which later sets pending_open_record_frags. But
when there is no more data left in the file and tls_push_data() gets
called for the last time, pending_open_record_frags doesn't get reset.
Later, when 2 bytes of encrypted alert come in as sendmsg, the code first
checks pending_open_record_frags, and since it is set, creates a record
with 0 data bytes to encrypt, meaning the record length is prepend_size +
tag_size only, which causes problems.
Set/reset pending_open_record_frags based on the 'more' bit.
Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
SMCD_DMBE_SIZES should cover all valid DMBE buffer sizes, so the
correct value is 6, which means 1MB. With 7, the registration of an ISM
buffer would always fail because of the invalid size requested.
Fix that and set the value to 6.
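For reference (hedged: DMBE sizes are powers of two starting at 16KB, so the
compressed value 6 corresponds to 1MB):

    #define SMCD_DMBE_SIZES	6	/* 16KB << 6 == 1MB max DMBE size */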
Fixes: c6ba7c9ba43d ("net/smc: add base infrastructure for SMC-D and ISM") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a delayed event is enqueued, the event worker will send this
event the next time it is running and no other flow is currently
active. The event handler is called for the delayed event, and the
pointer to the event remains set in lgr->delayed_event. This pointer is
cleared later in the processing by smc_llc_flow_start().
This can lead to a use-after-free condition when the processing does not
reach smc_llc_flow_start(), but frees the event because of an error
situation. Then the delayed_event pointer is still set but the event is
freed.
Fix this by always clearing the delayed event pointer when the event is
provided to the event handler for processing, and remove the code to
clear it in smc_llc_flow_start().
The access of tcf_tunnel_info() produces the following splat, so fix it
by dereferencing the tcf_tunnel_key_params pointer with the marker that
the internal tcfa_lock is held.
=============================
WARNING: suspicious RCU usage
5.9.0+ #1 Not tainted
-----------------------------
include/net/tc_act/tc_tunnel_key.h:59 suspicious rcu_dereference_protected() usage!
other info that might help us debug this:
Fixes: 3ebaf6da0716 ("net: sched: Do not assume RTNL is held in tunnel key action helpers") Fixes: 7a47281439ba ("net: sched: lock action when translating it to flow_action infra") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Using packetdrill it's possible to observe the same MPTCP DSN being acked
by different subflows with DACK4 and DACK8. This is in contrast with what
is specified in RFC 8684 §3.3.2: if an MPTCP endpoint transmits a 64-bit
wide DSN, it MUST be acknowledged with a 64-bit wide DACK. Fix the
'use_64bit_ack' variable to make it a property of MPTCP sockets, not TCP
subflows.
Fixes: a0c1d0eafd1e ("mptcp: Use 32-bit DATA_ACK when possible") Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When processing a system suspend request we suspend modem endpoints
if they are enabled, and call ipa_cmd_tag_process() (which issues
IPA commands) to ensure the IPA pipeline is cleared. It is an error
to attempt to issue an IPA command before setup is complete, so this
is clearly a bug. But we also shouldn't suspend or resume any
endpoints that have not been set up.
Have ipa_endpoint_suspend() and ipa_endpoint_resume() immediately
return if setup hasn't completed, to avoid any attempt to configure
endpoints or issue IPA commands in that case.
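A sketch of the early-return guards, assuming a 'setup_complete' flag in
struct ipa:

    void ipa_endpoint_suspend(struct ipa *ipa)
    {
            if (!ipa->setup_complete)
                    return;
            ...
    }

    void ipa_endpoint_resume(struct ipa *ipa)
    {
            if (!ipa->setup_complete)
                    return;
            ...
    }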
Commit 4fc427e05158 ("ipv6_route_seq_next should increase position index")
tried to fix the issue where seq_file pos is not increased
if a NULL element is returned with seq_ops->next(). See bug
https://bugzilla.kernel.org/show_bug.cgi?id=206283
The commit effectively does:
- increase pos for all seq_ops->start()
- increase pos for all seq_ops->next()
For ipv6_route, increasing pos for all seq_ops->next() is correct.
But increasing pos for seq_ops->start() is not correct
since pos is used to determine how many items to skip during
seq_ops->start():
iter->skip = *pos;
seq_ops->start() just fetches the *current* pos item.
The item can be skipped only after seq_ops->show() which essentially
is the beginning of seq_ops->next().
In the above, I specify buffer size 4096, so all records can be returned
to user space with a single trip to the kernel.
If I use buffer size 128, since each record is 149 bytes, the kernel's
seq_read() will internally read one 149-byte record into its buffer and
return the data to user space in two read() syscalls. The next user read()
syscall will then trigger the next seq_ops->start(). Since the current
implementation increased pos even for seq_ops->start(), it will skip
records #2, #4 and #6, assuming the first record is #1.
To fix the problem, create a fake pos pointer so seq_ops->start()
won't actually increase seq_file pos. With this fix, the
above `dd` command with `bs=128` will show correct result.
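A sketch of the fake pos in ipv6_route_seq_start(), close to the actual
change:

    static void *ipv6_route_seq_start(struct seq_file *seq, loff_t *pos)
    {
            ...
            iter->skip = *pos;
            if (iter->tbl) {
                    loff_t p = 0;   /* fake pos: keep the real *pos unchanged */

                    ipv6_route_seq_setup_walk(iter, net);
                    return ipv6_route_seq_next(seq, NULL, &p);
            }
            ...
    }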
Fixes: 4fc427e05158 ("ipv6_route_seq_next should increase position index") Cc: Alexei Starovoitov <ast@kernel.org> Suggested-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
phy_reset_after_clk_enable() performs a PHY reset, which means the PHY
loses its register settings. fec_enet_mii_probe() starts the PHY and
makes the necessary calls to configure it via the PHY framework, loading
the correct register settings into the PHY. Therefore,
fec_enet_mii_probe() should be called only after the PHY has been reset,
not before, as it is now.
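In sketch form, the corrected ordering in fec_open() (heavily trimmed,
only the relevant calls shown):

/* clocks are already enabled at this point */
phy_reset_after_clk_enable(ndev->phydev);	/* PHY loses its settings */
ret = fec_enet_mii_probe(ndev);			/* now reprogram the PHY */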
Fixes: 1b0a83ac04e3 ("net: fec: add phy_reset_after_clk_enable() support") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Richard Leitner <richard.leitner@skidata.com> Signed-off-by: Marek Vasut <marex@denx.de> Cc: Christoph Niedermaier <cniedermaier@dh-electronics.com> Cc: David S. Miller <davem@davemloft.net> Cc: NXP Linux Team <linux-imx@nxp.com> Cc: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
phy_reset_after_clk_enable() is always called with ndev->phydev, but
that pointer may be NULL even though the PHY device instance
already exists and is sufficient to perform the PHY reset.
This condition happens in fec_open(), where the clock must be enabled
first, then the PHY must be reset, and then the PHY IDs can be read
out of the PHY.
If the PHY is not yet bound to the MAC, but an OF PHY node and a
matching PHY device instance already exist, use the OF PHY node to
obtain the PHY device instance, and then use that instance when
triggering the PHY reset.
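A sketch of the fallback lookup, assuming the driver keeps its OF PHY
node in fep->phy_node (error handling and the reference drop trimmed):

struct phy_device *phy_dev = ndev->phydev;

/* PHY not attached yet: look up the device behind the OF node;
 * of_phy_find_device() takes a reference on the device. */
if (!phy_dev && fep->phy_node)
	phy_dev = of_phy_find_device(fep->phy_node);

if (phy_dev)
	phy_reset_after_clk_enable(phy_dev);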
Fixes: 1b0a83ac04e3 ("net: fec: add phy_reset_after_clk_enable() support") Signed-off-by: Marek Vasut <marex@denx.de> Cc: Christoph Niedermaier <cniedermaier@dh-electronics.com> Cc: David S. Miller <davem@davemloft.net> Cc: NXP Linux Team <linux-imx@nxp.com> Cc: Richard Leitner <richard.leitner@skidata.com> Cc: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Between queuing the delayed work and finishing the setup of the dsa
ports, the process may sleep in request_module() (via
phy_device_create()) and the queued work may be executed prior to the
switch net devices being registered. In ksz_mib_read_work(), a NULL
dereference will happen within netif_carrier_ok(dp->slave).
Not queuing the delayed work in ksz_init_mib_timer() makes things even
worse because the work will then be queued for immediate execution
(instead of after 2000 ms) in ksz_mac_link_down() via
dsa_port_link_register_of().
Call tree:
ksz9477_i2c_probe()
\--ksz9477_switch_register()
\--ksz_switch_register()
+--dsa_register_switch()
| \--dsa_switch_probe()
| \--dsa_tree_setup()
| \--dsa_tree_setup_switches()
| +--dsa_switch_setup()
| | +--ksz9477_setup()
| | | \--ksz_init_mib_timer()
| | | |--/* Start the timer 2 seconds later. */
| | | \--schedule_delayed_work(&dev->mib_read, msecs_to_jiffies(2000));
| | \--__mdiobus_register()
| | \--mdiobus_scan()
| | \--get_phy_device()
| | +--get_phy_id()
| | \--phy_device_create()
| | |--/* sleeping, ksz_mib_read_work() can be called meanwhile */
| | \--request_module()
| |
| \--dsa_port_setup()
| +--/* Called for non-CPU ports */
| +--dsa_slave_create()
| | +--/* Too late, ksz_mib_read_work() may be called beforehand */
| | \--port->slave = ...
| ...
| +--/* Called for CPU port */
| \--dsa_port_link_register_of()
| \--ksz_mac_link_down()
| +--/* mib_read must be initialized here */
| +--/* work is already scheduled, so it will be executed after 2000 ms */
| \--schedule_delayed_work(&dev->mib_read, 0);
\-- /* here port->slave is setup properly, scheduling the delayed work should be safe */
Solution:
1. Do not queue (only initialize) delayed work in ksz_init_mib_timer().
2. Only queue delayed work in ksz_mac_link_down() if init is completed.
3. Queue work once in ksz_switch_register(), after dsa_register_switch()
has completed.
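In sketch form (signatures simplified, bodies trimmed; the
init-completed flag name is illustrative):

/* 1. only initialize the work, do not queue it yet */
void ksz_init_mib_timer(struct ksz_device *dev)
{
	INIT_DELAYED_WORK(&dev->mib_read, ksz_mib_read_work);
}

/* 2. queue from the link-down path only once init has completed */
void ksz_mac_link_down(struct ksz_device *dev)
{
	if (dev->mib_init_done)		/* illustrative flag */
		schedule_delayed_work(&dev->mib_read, 0);
}

/* 3. first periodic kick only after dsa_register_switch() returned */
int ksz_switch_register(struct ksz_device *dev)
{
	int ret = dsa_register_switch(dev->ds);

	if (!ret)
		schedule_delayed_work(&dev->mib_read,
				      msecs_to_jiffies(2000));
	return ret;
}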
Fixes: 7c6ff470aa86 ("net: dsa: microchip: add MIB counter reading support") Signed-off-by: Christian Eggers <ceggers@arri.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
netcons calls napi_poll with a budget of 0 to transmit packets.
Handle this by:
- skipping RX processing
- not trying to recycle TX packets to the RX cache
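A sketch of the pattern, with placeholder poll helpers standing in for
the driver's CQ handlers:

static int napi_poll_sketch(struct napi_struct *napi, int budget)
{
	bool busy = poll_tx_cq(napi);	/* placeholder: reap TX work */
	int work_done = 0;

	/* budget == 0 (netpoll): TX completions only -- no RX
	 * processing, and no recycling of completed TX buffers into
	 * the RX cache. */
	if (budget)
		work_done = poll_rx_cq(napi, budget);	/* placeholder */

	if (busy || work_done == budget)
		return budget;		/* more work, stay scheduled */

	napi_complete_done(napi, work_done);
	return work_done;
}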
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tobias reported regressions in IPsec tests following the patch
referenced by the Fixes tag below. The root cause is dropping the
reset of the flowi4_oif after the fib_lookup. Apparently it is
needed for xfrm cases, so restore the oif update to ip_route_output_flow
right before the call to xfrm_lookup_route.
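The restored function then looks roughly like this (close to the
upstream code):

struct rtable *ip_route_output_flow(struct net *net, struct flowi4 *flp4,
				    const struct sock *sk)
{
	struct rtable *rt = __ip_route_output_key(net, flp4);

	if (IS_ERR(rt))
		return rt;

	if (flp4->flowi4_proto) {
		flp4->flowi4_oif = rt->dst.dev->ifindex;	/* restored */
		rt = (struct rtable *)xfrm_lookup_route(net, &rt->dst,
							flowi4_to_flowi(flp4),
							sk, 0);
	}

	return rt;
}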
Fixes: 2fbc6e89b2f1 ("ipv4: Update exception handling for multipath routes via same device") Reported-by: Tobias Brunner <tobias@strongswan.org> Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ingress large send packets are identified either by the
IBMVETH_RXQ_LRG_PKT flag in the receive buffer or by a -1 placed in the
IP header checksum. The method used depends on the firmware version.
Frame geometry and sufficient header validation are performed by the
hypervisor, eliminating the need for further header checks here.
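A sketch of the detection in the receive path (variable names
simplified; lrg_pkt reflects IBMVETH_RXQ_LRG_PKT for this buffer):

struct iphdr *iph = (struct iphdr *)skb_network_header(skb);

/* -1 in the IP header checksum is the all-ones pattern */
if (lrg_pkt || iph->check == 0xffff)
	ibmveth_rx_mss_helper(skb, mss, lrg_pkt);	/* rebuild gso fields */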
Fixes: 7b5967389f5a ("ibmveth: set correct gso_size and gso_type") Signed-off-by: David Wilder <dwilder@us.ibm.com> Reviewed-by: Thomas Falcon <tlfalcon@linux.ibm.com> Reviewed-by: Cristobal Forno <cris.forno@ibm.com> Reviewed-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ibmveth_rx_csum_helper() must be called after ibmveth_rx_mss_helper(),
as ibmveth_rx_csum_helper() may alter the IP and TCP checksum values.
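In sketch form, the required ordering (condition names illustrative):

/* mss helper first: it relies on the original checksum values */
if (large_send)
	ibmveth_rx_mss_helper(skb, mss, lrg_pkt);

/* csum helper second: it may rewrite the IP/TCP checksums */
if (csum_good)
	ibmveth_rx_csum_helper(skb, adapter);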
Fixes: 66aa0678efc2 ("ibmveth: Support to enable LSO/CSO for Trunk VEA.") Signed-off-by: David Wilder <dwilder@us.ibm.com> Reviewed-by: Thomas Falcon <tlfalcon@linux.ibm.com> Reviewed-by: Cristobal Forno <cris.forno@ibm.com> Reviewed-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The 4-tuple NAT offload via PEDIT always overwrites all the 4-tuple
fields even if they had not been explicitly enabled. If any fields in
the 4-tuple are not enabled, then the hardware overwrites the
disabled fields with zeros, instead of ignoring them.
So, add a parser that can translate the enabled 4-tuple PEDIT fields
to one of the NAT mode combinations supported by the hardware and
hence avoid overwriting disabled fields with zeros. Any rule with an
unsupported NAT mode combination is rejected.
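A sketch of such a parser (the flag bits and mode names below are
illustrative, not the driver's own, and the supported set is trimmed):

#define NAT_F_SIP	0x1
#define NAT_F_SP	0x2
#define NAT_F_DIP	0x4
#define NAT_F_DP	0x8

enum nat_mode { NAT_MODE_NONE, NAT_MODE_DIP, NAT_MODE_DIP_DP,
		NAT_MODE_SIP_SP, NAT_MODE_ALL };

static int fields_to_nat_mode(unsigned int fields, enum nat_mode *mode)
{
	switch (fields) {
	case 0:
		*mode = NAT_MODE_NONE;
		break;
	case NAT_F_DIP:
		*mode = NAT_MODE_DIP;
		break;
	case NAT_F_DIP | NAT_F_DP:
		*mode = NAT_MODE_DIP_DP;
		break;
	case NAT_F_SIP | NAT_F_SP:
		*mode = NAT_MODE_SIP_SP;
		break;
	case NAT_F_SIP | NAT_F_SP | NAT_F_DIP | NAT_F_DP:
		*mode = NAT_MODE_ALL;
		break;
	default:
		return -EOPNOTSUPP;	/* not offloadable, reject rule */
	}
	return 0;
}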
With a suitably crafted reiserfs image and mount command, reiserfs will
crash when trying to verify that the XATTR_ROOT directory can be looked
up in /, as that recurses back into the xattr code.
reiserfs_read_locked_inode() didn't initialize the key length properly.
Use the _make_cpu_key() macro for key initialization so that all key
members are properly initialized.
CC: stable@vger.kernel.org Reported-by: syzbot+d94d02749498bb7bab4b@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There exist many FT2232-based JTAG+UART adapter designs in which
FT2232 Channel A is used for JTAG and Channel B is used for UART.
The best way to handle them in Linux is to have the ftdi_sio driver
create a ttyUSB device only for Channel B and not for Channel A:
a ttyUSB device for Channel A would be bogus and would disappear as
soon as the user runs OpenOCD or other applications that access
Channel A for JTAG from userspace, causing undesirable noise for
users. The ftdi_sio driver already has a dedicated quirk for such
JTAG+UART FT2232 adapters, and it requires assigning custom USB IDs
to such adapters and adding these IDs to the driver with the
ftdi_jtag_quirk applied.
Boutique hardware manufacturer Falconia Partners LLC has created a
couple of JTAG+UART adapter designs (one buffered, one unbuffered)
as part of FreeCalypso project, and this hardware is specifically made
to be used with Linux hosts, with the intent that Channel A will be
accessed only from userspace via appropriate applications, and that
Channel B will be supported by the ftdi_sio kernel driver, presenting
a standard ttyUSB device to userspace. Toward this end the hardware
manufacturer will be programming FT2232 EEPROMs with custom USB IDs,
specifically with the intent that these IDs will be recognized by
the ftdi_sio driver with the ftdi_jtag_quirk applied.
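In sketch form, the additions follow the driver's existing pattern for
such adapters (PID values per the upstream change; treat them as
illustrative here):

/* ftdi_sio_ids.h: custom PIDs under the FTDI VID */
#define FTDI_FALCONIA_JTAG_BUF_PID	0x7150
#define FTDI_FALCONIA_JTAG_UNBUF_PID	0x7151

/* device table: the JTAG quirk keeps Channel A free for userspace */
{ USB_DEVICE(FTDI_VID, FTDI_FALCONIA_JTAG_BUF_PID),
	.driver_info = (kernel_ulong_t)&ftdi_jtag_quirk },
{ USB_DEVICE(FTDI_VID, FTDI_FALCONIA_JTAG_UNBUF_PID),
	.driver_info = (kernel_ulong_t)&ftdi_jtag_quirk },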
Signed-off-by: Mychaela N. Falconia <falcon@freecalypso.org>
[johan: insert in PID order and drop unused define] Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>