Mark Brown [Fri, 29 Jul 2022 19:22:22 +0000 (20:22 +0100)]
Add SPI Driver to HPE GXP Architecture
Merge series from nick.hawkins@hpe.com <nick.hawkins@hpe.com>:
The GXP supports 3 separate SPI interfaces to accommodate the system
flash, core flash, and other functions. The SPI engine supports variable
clock frequency, selectable 3-byte or 4-byte addressing and a
configurable x1, x2, and x4 command/address/data modes. The memory
buffer for reading and writing ranges between 256 bytes and 8KB. This
driver supports access to the core flash and bios part.
Ralph Campbell [Mon, 25 Jul 2022 18:36:14 +0000 (11:36 -0700)]
mm/hmm: fault non-owner device private entries
If hmm_range_fault() is called with the HMM_PFN_REQ_FAULT flag and a
device private PTE is found, the hmm_range::dev_private_owner page is used
to determine if the device private page should not be faulted in.
However, if the device private page is not owned by the caller,
hmm_range_fault() returns an error instead of calling migrate_to_ram() to
fault in the page.
For example, if a page is migrated to GPU private memory and a RDMA fault
capable NIC tries to read the migrated page, without this patch it will
get an error. With this patch, the page will be migrated back to system
memory and the NIC will be able to read the data.
Jaewon Kim [Mon, 25 Jul 2022 09:52:12 +0000 (18:52 +0900)]
page_alloc: fix invalid watermark check on a negative value
There was a report that a task is waiting at the
throttle_direct_reclaim. The pgscan_direct_throttle in vmstat was
increasing.
This is a bug where zone_watermark_fast returns true even when the free
is very low. The commit f27ce0e14088 ("page_alloc: consider highatomic
reserve in watermark fast") changed the watermark fast to consider
highatomic reserve. But it did not handle a negative value case which
can be happened when reserved_highatomic pageblock is bigger than the
actual free.
If watermark is considered as ok for the negative value, allocating
contexts for order-0 will consume all free pages without direct reclaim,
and finally free page may become depleted except highatomic free.
Then allocating contexts may fall into throttle_direct_reclaim. This
symptom may easily happen in a system where wmark min is low and other
reclaimers like kswapd does not make free pages quickly.
Handle the negative case by using MIN.
Link: https://lkml.kernel.org/r/20220725095212.25388-1-jaewon31.kim@samsung.com Fixes: f27ce0e14088 ("page_alloc: consider highatomic reserve in watermark fast") Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com> Reported-by: GyeongHwan Hong <gh21.hong@samsung.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Minchan Kim <minchan@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Yong-Taek Lee <ytk.lee@samsung.com> Cc: <stable@vger.kerenl.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Merge branches 'acpi-video', 'acpi-pci' and 'acpi-docs'
Merge ACPI backlight driver changes, ACPI changes related to PCI and
ACPI documentation changes for v5.20-rc1:
- Use native backlight on Dell Inspiron N4010 (Hans de Goede).
- Use native backlight on some TongFang devices (Werner Sembach).
- Drop X86 dependency from the ACPI backlight driver Kconfig (Riwen
Lu).
- Shorten the quirk list in the ACPI backlight driver by identifying
Clevo by board_name only (Werner Sembach).
- Remove useless NULL pointer checks from 2 ACPI PCI link management
functions (Andrey Strachuk).
- Fix obsolete example in the ACPI EINJ documentation (Qifu Zhang).
- Update links and references to _DSD-related documents (Sudeep Holla).
* acpi-video:
ACPI: video: Use native backlight on Dell Inspiron N4010
ACPI: video: Shortening quirk list by identifying Clevo by board_name only
ACPI: video: Force backlight native for some TongFang devices
ACPI: video: Drop X86 dependency from Kconfig
Merge tag 'perf-tools-fixes-for-v5.19-2022-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix addresses for bss symbols, describing variables used in resolving
data access in tools such as 'perf c2c' and 'perf mem'.
- Skip symbols if SHF_ALLOC flag is not set, a technique used for
listing deprecated symbols, its addresses are zeros, so not useful.
- Remove undefined behavior from bpf_perf_object__next() when dealing
with an empty bpf_objects_list list.
- Make a ARM CoreSight disasm script work with both python2 and
python3.
- Sync x86's cpufeatures header with with the kernel sources.
* tag 'perf-tools-fixes-for-v5.19-2022-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf bpf: Remove undefined behavior from bpf_perf_object__next()
perf symbol: Skip symbols if SHF_ALLOC flag is not set
perf symbol: Correct address for bss symbols
perf scripts python: Let script to be python2 compliant
tools headers cpufeatures: Sync with the kernel sources
Merge branches 'acpi-pm', 'acpi-soc', 'acpi-tables' and 'acpi-resource'
Merge ACPI power management changes, ACPI LPSS driver changes, ACPI
table parsing code changes and ACPI resource handling changes for
v5.20-rc1:
- Save NVS memory during transitions into S3 on Lenovo G40-45 (Manyi
Li).
- Add support for upcoming AMD uPEP device ID AMDI008 to the ACPI
suspend-to-idle driver for x86 platforms (Shyam Sundar S K).
- Clean up checks related to the ACPI_FADT_LOW_POWER_S0 platform flag
in the LPIT table driver and the suspend-to-idle driver for x86
platforms (Rafael Wysocki).
- Print information messages regarding declared LPS0 idle support in
the platform firmware (Rafael Wysocki).
- Fix missing check in register_device_clock() in the ACPI driver for
Intel SoCs (huhai).
- Fix ACS setup in the VIOT table parser (Eric Auger).
- Skip IRQ override on AMD Zen platforms where it's harmful (Chuanhong
Guo).
* acpi-pm:
ACPI: PM: x86: Print messages regarding LPS0 idle support
ACPI: PM: s2idle: Use LPS0 idle if ACPI_FADT_LOW_POWER_S0 is unset
Revert "ACPI / PM: LPIT: Register sysfs attributes based on FADT"
ACPI: PM: s2idle: Add support for upcoming AMD uPEP HID AMDI008
ACPI: PM: save NVS memory for Lenovo G40-45
* acpi-soc:
ACPI: LPSS: Fix missing check in register_device_clock()
* acpi-tables:
ACPI: VIOT: Fix ACS setup
* acpi-resource:
ACPI: resource: skip IRQ override on AMD Zen platforms
The send_signal/send_signal_tracepoint is pretty flaky, with at least
one failure in every ten runs on a few attempts I've tried it:
> test_send_signal_common:PASS:pipe_c2p 0 nsec
> test_send_signal_common:PASS:pipe_p2c 0 nsec
> test_send_signal_common:PASS:fork 0 nsec
> test_send_signal_common:PASS:skel_open_and_load 0 nsec
> test_send_signal_common:PASS:skel_attach 0 nsec
> test_send_signal_common:PASS:pipe_read 0 nsec
> test_send_signal_common:PASS:pipe_write 0 nsec
> test_send_signal_common:PASS:reading pipe 0 nsec
> test_send_signal_common:PASS:reading pipe error: size 0 0 nsec
> test_send_signal_common:FAIL:incorrect result unexpected incorrect result: actual 48 != expected 50
> test_send_signal_common:PASS:pipe_write 0 nsec
> #139/1 send_signal/send_signal_tracepoint:FAIL
The reason does not appear to be a correctness issue in the strict
sense. Rather, we merely do not receive the signal we are waiting for
within the provided timeout.
Let's bump the timeout by a factor of ten. With that change I have not
been able to reproduce the failure in 150+ iterations. I am also sneaking
in a small simplification to the test_progs test selection logic.
Merge branches 'acpi-processor', 'acpi-apei' and 'acpi-ec'
Merge ACPI processor driver changes, APEI changes and ACPI EC driver
changes for v5.20-rc1:
- Drop leftover acpi_processor_get_limit_info() declaration (Riwen Lu).
- Split out thermal initialization from ACPI PSS (Riwen Lu).
- Annotate more functions in the ACPI CPU idle driver to live in the
cpuidle section (Guilherme G. Piccoli).
- Fix _EINJ vs "special purpose" EFI memory regions (Dan Williams).
- Implement a better fix to avoid spamming the console with old error
logs (Tony Luck).
- Fix typo in a comment in the APEI code (Xiang wangx).
- Clean up the ACPI EC driver after previous changes in it (Hans
de Goede).
* acpi-processor:
ACPI: processor: Drop leftover acpi_processor_get_limit_info() declaration
ACPI: processor: Split out thermal initialization from ACPI PSS
ACPI: processor/idle: Annotate more functions to live in cpuidle section
* acpi-apei:
ACPI: APEI: Fix _EINJ vs EFI_MEMORY_SP
ACPI: APEI: Better fix to avoid spamming the console with old error logs
ACPI: APEI: Fix double word in a comment
* acpi-ec:
ACPI: EC: Drop unused ident initializers from dmi_system_id tables
ACPI: EC: Re-use boot_ec when possible even when EC_FLAGS_TRUST_DSDT_GPE is set
ACPI: EC: Drop the EC_FLAGS_IGNORE_DSDT_GPE quirk
ACPI: EC: Remove duplicate ThinkPad X1 Carbon 6th entry from DMI quirks
Merge ACPI device object management changes for v5.20-rc1.
- Use the facilities provided by the driver core and some additional
helpers to handle the children of a given ACPI device object in
multiple places instead of using the children and node list heads in
struct acpi_device which is error prone (Rafael Wysocki).
- Fix ACPI-related device reference counting issue in the hisi_lpc bus
driver (Yang Yingliang).
- Drop the children and node list heads that are not needed any more
from struct acpi_device (Rafael Wysocki).
- Drop driver member from struct acpi_device (Uwe Kleine-König).
- Drop redundant check from acpi_device_remove() (Uwe Kleine-König).
* acpi-bus:
ACPI: bus: Drop unused list heads from struct acpi_device
hisi_lpc: Use acpi_dev_for_each_child()
bus: hisi_lpc: fix missing platform_device_put() in hisi_lpc_acpi_probe()
ACPI: bus: Drop driver member of struct acpi_device
ACPI: bus: Drop redundant check in acpi_device_remove()
mfd: core: Use acpi_dev_for_each_child()
ACPI / MMC: PM: Unify fixing up device power
soundwire: Use acpi_dev_for_each_child()
platform/x86/thinkpad_acpi: Use acpi_dev_for_each_child()
ACPI: scan: Walk ACPI device's children using driver core
ACPI: bus: Introduce acpi_dev_for_each_child_reverse()
ACPI: video: Use acpi_dev_for_each_child()
ACPI: bus: Export acpi_dev_for_each_child() to modules
ACPI: property: Use acpi_dev_for_each_child() for child lookup
ACPI: container: Use acpi_dev_for_each_child()
USB: ACPI: Replace usb_acpi_find_port() with acpi_find_child_by_adr()
thunderbolt: ACPI: Replace tb_acpi_find_port() with acpi_find_child_by_adr()
ACPI: glue: Introduce acpi_find_child_by_adr()
ACPI: glue: Introduce acpi_dev_has_children()
ACPI: glue: Use acpi_dev_for_each_child()
Merge tag 'pm-5.19-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Make some false positive RCU splats resulting from a recent intel_idle
driver change go away (Waiman Long)"
* tag 'pm-5.19-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
intel_idle: Fix false positive RCU splats due to incorrect hardirqs state
Lai Jiangshan [Fri, 29 Jul 2022 09:44:38 +0000 (17:44 +0800)]
workqueue: Avoid a false warning in unbind_workers()
Doing set_cpus_allowed_ptr() with wq_unbound_cpumask can be possible
fails and trigger the false warning.
Use cpu_possible_mask instead when wq_unbound_cpumask has no active CPUs.
It is very easy to trigger the warning:
Set wq_unbound_cpumask to a small set of CPUs.
Offline all the CPUs of wq_unbound_cpumask.
Offline an extra CPU and trigger the warning.
Fixes: 10a5a651e3af ("workqueue: Restrict kworker in the offline CPU pool running on housekeeping CPUs") Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Tejun Heo <tj@kernel.org>
Merge branches 'pm-devfreq', 'pm-qos', 'pm-tools' and 'pm-docs'
Merge devfreq changes, PM QoS change, and power management tools and
documentation changes for v5.20-rc1:
- Add new devfreq driver for Mediatek CCI (Cache Coherent
Interconnect) (Johnson Wang).
- Convert the Samsung Exynos SoC Bus bindings to DT schema of
exynos-bus.c (Krzysztof Kozlowski).
- Address kernel-doc warnings by adding the description for unused
fucntion parameters in devfreq core (Mauro Carvalho Chehab).
- Use NULL to pass a null pointer rather than zero according to the
function propotype in imx-bus.c (Colin Ian King).
- Print error message instead of error interger value in
tegra30-devfreq.c (Dmitry Osipenko).
- Add checks to prevent setting negative frequency QoS limits for
CPUs (Shivnandan Kumar).
- Update the pm-graph suite of utilities to the latest revision 5.9
including multiple improvements (Todd Brandt).
- Drop pme_interrupt reference from the PCI power management
documentation (Mario Limonciello).
* pm-devfreq:
PM / devfreq: tegra30: Add error message for devm_devfreq_add_device()
PM / devfreq: imx-bus: use NULL to pass a null pointer rather than zero
PM / devfreq: shut up kernel-doc warnings
dt-bindings: interconnect: samsung,exynos-bus: convert to dtschema
PM / devfreq: mediatek: Introduce MediaTek CCI devfreq driver
dt-bindings: interconnect: Add MediaTek CCI dt-bindings
* pm-qos:
PM: QoS: Add check to make sure CPU freq is non-negative
* pm-tools:
pm-graph v5.9
* pm-docs:
Documentation: PM: Drop pme_interrupt reference
bpftool: Don't try to return value from void function in skeleton
A skeleton generated by bpftool previously contained a return followed
by an expression in OBJ_NAME__detach(), which has return type void. This
did not hurt, the bpf_object__detach_skeleton() called there returns
void itself anyway, but led to a warning when compiling with e.g.
-pedantic.
Merge branches 'pm-core', 'pm-sleep', 'powercap', 'pm-domains' and 'pm-em'
Merge core device power management changes for v5.20-rc1:
- Extend support for wakeirq to callback wrappers used during system
suspend and resume (Ulf Hansson).
- Defer waiting for device probe before loading a hibernation image
till the first actual device access to avoid possible deadlocks
reported by syzbot (Tetsuo Handa).
- Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP (Bjorn
Helgaas).
- Add Raptor Lake-P to the list of processors supported by the Intel
RAPL driver (George D Sworo).
- Add Alder Lake-N and Raptor Lake-P to the list of processors for
which Power Limit4 is supported in the Intel RAPL driver (Sumeet
Pawnikar).
- Make pm_genpd_remove() check genpd_debugfs_dir against NULL before
attempting to remove it (Hsin-Yi Wang).
- Change the Energy Model code to represent power in micro-Watts and
adjust its users accordingly (Lukasz Luba).
* pm-core:
PM: runtime: Extend support for wakeirq for force_suspend|resume
* pm-sleep:
PM: hibernate: defer device probing when resuming from hibernation
PM: wakeup: Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP
* powercap:
powercap: RAPL: Add Power Limit4 support for Alder Lake-N and Raptor Lake-P
powercap: intel_rapl: Add support for RAPTORLAKE_P
* pm-domains:
PM: domains: Ensure genpd_debugfs_dir exists before remove
* pm-em:
cpufreq: scmi: Support the power scale in micro-Watts in SCMI v3.1
firmware: arm_scmi: Get detailed power scale from perf
Documentation: EM: Switch to micro-Watts scale
PM: EM: convert power field to micro-Watts precision and align drivers
Merge processor power management changes for v5.20-rc1:
- Make cpufreq_show_cpus() more straightforward (Viresh Kumar).
- Drop unnecessary CPU hotplug locking from store() used by cpufreq
sysfs attributes (Viresh Kumar).
- Make the ACPI cpufreq driver support the boost control interface on
Zhaoxin/Centaur processors (Tony W Wang-oc).
- Print a warning message on attempts to free an active cpufreq policy
which should never happen (Viresh Kumar).
- Fix grammar in the Kconfig help text for the loongson2 cpufreq
driver (Randy Dunlap).
- Use cpumask_var_t for an on-stack CPU mask in the ondemand cpufreq
governor (Zhao Liu).
- Add trace points for guest_halt_poll_ns grow/shrink to the haltpoll
cpuidle driver (Eiichi Tsukata).
- Modify intel_idle to treat C1 and C1E as independent idle states on
Sapphire Rapids (Artem Bityutskiy).
* pm-cpufreq:
cpufreq: ondemand: Use cpumask_var_t for on-stack cpu mask
cpufreq: loongson2: fix Kconfig "its" grammar
cpufreq: Warn users while freeing active policy
cpufreq: ACPI: Add Zhaoxin/Centaur turbo boost control interface support
cpufreq: Drop unnecessary cpus locking from store()
cpufreq: Optimize cpufreq_show_cpus()
* pm-cpuidle:
intel_idle: make SPR C1 and C1E be independent
cpuidle: haltpoll: Add trace points for guest_halt_poll_ns grow/shrink
Allow ASPM L1 and its substates. By default this is disabled in the qcom
specific hardware. Enable it explicitly only for controllers belonging to
2_7_0.
This does not affect any link capability registers; it will allow the link
transitions to L1 and its substates only if they are already supported.
Merge tag 'thermal-v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux
Pull thermal control changes for 5.20-rc1 from Daniel Lezcano:
"- Make per cpufreq / devfreq cooling device ops instead of using a
global variable, fix comments and rework the trace information
(Lukasz Luba)
- Add the include/dt-bindings/thermal.h under the area covered by the
thermal maintainer in the MAINTAINERS file (Lukas Bulwahn)
- Improve the error output by giving the sensor identification when a
thermal zone failed to initialize, the DT bindings by changing the
positive logic and adding the r8a779f0 support on the rcar3 (Wolfram
Sang)
- Convert the QCom tsens DT binding to the dtsformat format (Krzysztof
Kozlowski)
- Remove the pointless get_trend() function in the QCom, Ux500 and
tegra thermal drivers, along with the unused DROP_FULL and
RAISE_FULL trends definitions. Simplify the code by using clamp()
macros (Daniel Lezcano)
- Fix ref_table memory leak at probe time on the k3_j72xx bandgap
(Bryan Brattlof)
- Fix array underflow in prep_lookup_table (Dan Carpenter)
- Add static annotation to the k3_j72xx_bandgap_j7* data structure
(Jin Xiaoyun)
- Fix typos in comments detected on sun8i by Coccinelle (Julia Lawall)
- Fix typos in comments on rzg2l (Biju Das)
- Remove as unnecessary call to dev_err() as the error is already
printed by the failing function on u8500 (Yang Li)
- Register the thermal zones as hwmon sensors for the Qcom thermal
sensors (Dmitry Baryshkov)
- Fix 'tmon' tool compilation issue by adding phtread.h include
(Markus Mayer)
- Fix typo in the comments for the 'tmon' tool (Slark Xiao)
- Consolidate the thermal core code by beginning to move the thermal
trip structure from the thermal OF code as a generic structure to be
used by the different sensors when registering a thermal zone
(Daniel Lezcano)"
* tag 'thermal-v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (36 commits)
thermal/of: Initialize trip points separately
thermal/of: Use thermal trips stored in the thermal zone
thermal/core: Add thermal_trip in thermal_zone
thermal/core: Rename 'trips' to 'num_trips'
thermal/core: Move thermal_set_delay_jiffies to static
thermal/core: Remove unneeded EXPORT_SYMBOLS
thermal/of: Move thermal_trip structure to thermal.h
thermal/of: Remove the device node pointer for thermal_trip
thermal/of: Replace device node match with device node search
thermal/core: Remove duplicate information when an error occurs
thermal/core: Avoid calling ->get_trip_temp() unnecessarily
thermal/tools/tmon: Fix typo 'the the' in comment
thermal/tools/tmon: Include pthread and time headers in tmon.h
thermal/ti-soc-thermal: Fix comment typo
thermal/drivers/qcom/spmi-adc-tm5: Register thermal zones as hwmon sensors
thermal/drivers/qcom/temp-alarm: Register thermal zones as hwmon sensors
thermal/drivers/u8500: Remove unnecessary print function dev_err()
thermal/drivers/rzg2l: Fix comments
thermal/drivers/sun8i: Fix typo in comment
thermal/drivers/k3_j72xx_bandgap: Make k3_j72xx_bandgap_j721e_data and k3_j72xx_bandgap_j7200_data static
...
Merge tag 'loongarch-fixes-5.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
- Fix cache size calculation, stack protection attributes, ptrace's
fpr_set and "ROM Size" in boardinfo
- Some cleanups and improvements of assembly
- Some cleanups of unused code and useless code
* tag 'loongarch-fixes-5.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: Fix wrong "ROM Size" of boardinfo
LoongArch: Fix missing fcsr in ptrace's fpr_set
LoongArch: Fix shared cache size calculation
LoongArch: Disable executable stack by default
LoongArch: Remove unused variables
LoongArch: Remove clock setting during cpu hotplug stage
LoongArch: Remove useless header compiler.h
LoongArch: Remove several syntactic sugar macros for branches
LoongArch: Re-tab the assembly files
LoongArch: Simplify "BGT foo, zero" with BGTZ
LoongArch: Simplify "BLT foo, zero" with BLTZ
LoongArch: Simplify "BEQ/BNE foo, zero" with BEQZ/BNEZ
LoongArch: Use the "move" pseudo-instruction where applicable
LoongArch: Use the "jr" pseudo-instruction where applicable
LoongArch: Use ABI names of registers where appropriate
The main feature of the sparc-specific implementation of
pci_mmap_resource_range() is that it allows mapping the entire PCI I/O
space for a PCI host bridge using the /proc/bus/pci interface on a bridge
device.
The generic implementation cannot do this, but it also appears that this
got broken for sparc by commit 9eff02e2042f ("PCI: check mmap range of
/proc/bus/pci files too"), which enforces that each address is part of a
BAR for kernels after 2.6.28.
Remove it all, assuming that the corresponding user space code has already
been changed to access /dev/ioport instead a long time ago. Add
pci_iobar_pfn() to make it possible to map I/O resources. This is adapted
from the powerpc version.
The ARCH_GENERIC_PCI_MMAP_RESOURCE symbol came up in a recent discussion,
and I noticed that this was left behind by an unfinished cleanup from 2017.
The only architecture that still relies on providing its own
pci_mmap_page_range() helper instead of using the generic
pci_mmap_resource_range() is sparc. Presumably the reasons for this have
not changed, but at least this can be simplified by converting sparc to use
the same interface as the others.
The only difference between the two is the device-specific offset that gets
added to or subtracted from vma->vm_pgoff.
Change the only caller of pci_mmap_page_range() in common code to subtract
this offset and call the modern interface, while adding it back in the
sparc implementation to preserve the existing behavior.
This removes the complexities of the dual interfaces from the common code,
and keeps it all specific to the sparc architecture code. According to
David Miller, the sparc code lets user space poke into the VGA I/O port
registers by mmapping the I/O space of the parent bridge device, which is
something that the generic pci_mmap_resource_range() code apparently does
not.
PCI: Stub __pci_ioport_map() for arches that don't support it at all
When building OpenRISC PCI, which has no ioport_map(), we get the following
build error:
lib/pci_iomap.c: In function 'pci_iomap_range':
CC drivers/i2c/i2c-core-base.o
./include/asm-generic/pci_iomap.h:29:41: error: implicit declaration of function 'ioport_map'; did you mean 'ioremap'? [-Werror=implicit-function-declaration]
29 | #define __pci_ioport_map(dev, port, nr) ioport_map((port), (nr))
| ^~~~~~~~~~
lib/pci_iomap.c:44:24: note: in expansion of macro '__pci_ioport_map'
44 | return __pci_ioport_map(dev, start, len);
| ^~~~~~~~~~~~~~~~
Add a NULL definition of __pci_ioport_map() for architectures that do not
support ioport_map().
vsnprintf returns the number of characters which would have been written if
enough space had been available, excluding the terminating null byte. Thus,
the return value of 'len_left' means that the last character has been
dropped.
Merge tag 'powerpc-5.19-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Re-enable the new amdgpu display engine for powerpc, as long as the
compiler is correctly configured.
- Disable stack variable initialisation in prom_init to fix GCC 12
allmodconfig.
Thanks to Dan Horák and Sudip Mukherjee.
* tag 'powerpc-5.19-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
drm/amdgpu: Re-enable DCN for 64-bit powerpc
powerpc/64s: Disable stack variable initialisation for prom_init
perf stat: Add topdown metrics in the default perf stat on the hybrid machine
Topdown metrics are missed in the default perf stat on the hybrid machine,
add Topdown metrics in default perf stat for hybrid systems.
Currently, we support the perf metrics Topdown for the p-core PMU in the
perf stat default, the perf metrics Topdown support for e-core PMU will be
implemented later separately. Refactor the code adds two x86 specific
functions. Widen the size of the event name column by 7 chars, so that all
metrics after the "#" become aligned again.
The perf metrics topdown feature is supported on the cpu_core of ADL. The
dedicated perf metrics counter and the fixed counter 3 are used for the
topdown events. Adding the topdown metrics doesn't trigger multiplexing.
Kan Liang [Thu, 21 Jul 2022 06:57:04 +0000 (14:57 +0800)]
perf evlist: Always use arch_evlist__add_default_attrs()
Current perf stat uses the evlist__add_default_attrs() to add the
generic default attrs, and uses arch_evlist__add_default_attrs() to add
the Arch specific default attrs, e.g., Topdown for x86.
It works well for the non-hybrid platforms. However, for a hybrid
platform, the hard code generic default attrs don't work.
Uses arch_evlist__add_default_attrs() to replace the
evlist__add_default_attrs(). The arch_evlist__add_default_attrs() is
modified to invoke the same __evlist__add_default_attrs() for the
generic default attrs. No functional change.
Add default_null_attrs[] to indicate the arch specific attrs.
No functional change for the arch specific default attrs either.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220721065706.2886112-4-zhengjun.xing@linux.intel.com Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kan Liang [Thu, 21 Jul 2022 06:57:03 +0000 (14:57 +0800)]
perf evsel: Add arch_evsel__hw_name()
The commit 55bcf6ef314ae8ba ("perf: Extend PERF_TYPE_HARDWARE and
PERF_TYPE_HW_CACHE") extends the two types to become PMU aware types for
a hybrid system. However, current evsel__hw_name doesn't take the PMU
type into account. It mistakenly returns the "unknown-hardware" for the
hardware event with a specific PMU type.
Add an arch specific arch_evsel__hw_name() to specially handle the PMU
aware hardware event.
Currently, the extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE is only
supported by X86. Only implement the specific arch_evsel__hw_name() for
X86 in the patch.
Nothing is changed for the other archs.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220721065706.2886112-3-zhengjun.xing@linux.intel.com Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Between this patch and the reverted patch, the commit 6c1912898ed21bef
("perf parse-events: Rename parse_events_error functions") and the
commit 07eafd4e053a41d7 ("perf parse-event: Add init and exit to
parse_event_error") clean up the parse_events_error_*() codes. The
related change is also reverted.
The reverted patch is hard to be extended to support new default events,
e.g., Topdown events, and the existing "--detailed" option on a hybrid
platform.
A new solution will be proposed in the following patch to enable the
perf stat default on a hybrid platform.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220721065706.2886112-2-zhengjun.xing@linux.intel.com Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Nick Hawkins [Thu, 28 Jul 2022 16:14:55 +0000 (11:14 -0500)]
spi: spi-gxp: Add support for HPE GXP SoCs
The GXP supports 3 separate SPI interfaces to accommodate the system
flash, core flash, and other functions. The SPI engine supports variable
clock frequency, selectable 3-byte or 4-byte addressing and a
configurable x1, x2, and x4 command/address/data modes. The memory
buffer for reading and writing ranges between 256 bytes and 8KB. This
driver supports access to the core flash and bios part.
Fix max_rate option in TC, check for proper quanta boundaries.
Check for minimum value provided and if it fits expected 50Mbps
quanta.
Without this patch, iavf could send settings for max_rate limiting
that would be accepted from by PF even the max_rate option is less
than expected 50Mbps quanta. It results in no rate limiting
on traffic as rate limiting will be floored to 0.
Michael Ellerman [Fri, 29 Jul 2022 06:13:55 +0000 (16:13 +1000)]
powerpc/mm: Export memory_add_physaddr_to_nid() for modules
The cxl_pmem module wants to call memory_add_physaddr_to_nid(), so
export the symbol.
Link: http://lore.kernel.org/r/87sfmkbfyg.fsf@mpe.ellerman.id.au Fixes: 04ad63f086d1 ("cxl/region: Introduce cxl_pmem_region objects") Reported-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The current AMD contact info email address is incorrect, so fix it up to
use the correct one.
Cc: Jonathan Corbet <corbet@lwn.net> Cc: Alex Shi <alexs@kernel.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Hu Haowen <src.res@email.cn> Cc: Tom Lendacky <thomas.lendacky@amd.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20220729134517.2284700-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
William Dean [Sat, 23 Jul 2022 06:37:56 +0000 (14:37 +0800)]
wifi: rtw88: check the return value of alloc_workqueue()
The function alloc_workqueue() in rtw_core_init() can fail, but
there is no check of its return value. To fix this bug, its return value
should be checked with new error handling code.
Fixes: fe101716c7c9d ("rtw88: replace tx tasklet with work queue") Reported-by: Hacash Robot <hacashRobot@santino.com> Signed-off-by: William Dean <williamsukatube@gmail.com> Reviewed-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20220723063756.2956189-1-williamsukatube@163.com
Zong-Zhe Yang [Thu, 21 Jul 2022 07:49:52 +0000 (15:49 +0800)]
wifi: rtw89: 8852a: adjust IMR for SER L1
SER (system error recovery) L1 (level 1) has a step-by-step handshake
process with FW. These handshakes still rely on B_AX_HS0ISR_IND_INT_EN.
So, even already during recovery, we enable this bit in IMR.
However when the branch recording is actually done, a different event is
used, as seen in the line:
perf record -o $TMPDIR/... --branch-filter any,save_type,u -- ...
The event is omitted and for "perf record" the default event is cycles,
which is not supported by s390 and this fails when executed on s390:
# perf record --branch-filter any,save_type,u -- /tmp/__perf_test.program.iDSmQ/a.out
Error:
cycles: PMU Hardware or event type doesn't support branch stack sampling.
#
Therefore fix this and use the same event cycles for testing support
and actually running the test.
Output before:
# ./perf test -Fv 95
95: Check branch stack sampling :
--- start ---
Testing user branch stack sampling
---- end ----
Check branch stack sampling: FAILED!
#
wifi: wcn36xx: Move firmware feature bit storage to dedicated firmware.c file
The naming of the get/set/clear firmware feature capability bits doesn't
really follow the established namespace pattern of
wcn36xx_logicalblock_do_something();
The feature bits are accessed by smd.c and main.c. It would be nice to
display the found feature bits in debugfs. To do so though we should tidy
up the namespace a bit.
Move the firmware feature exchange API to its own file - firmware.c giving
us the opportunity to functionally decompose other firmware related
accessors as appropriate in future.
RISC-V: KVM: Add support for Svpbmt inside Guest/VM
The Guest/VM can use Svpbmt in VS-stage page tables when allowed by the
Hypervisor using the henvcfg.PBMTE bit.
We add Svpbmt support for the KVM Guest/VM which can be enabled/disabled
by the KVM user-space (QEMU/KVMTOOL) using the ISA extension ONE_REG
interface.
RISC-V: KVM: Add G-stage ioremap() and iounmap() functions
The in-kernel AIA IMSIC support requires on-demand mapping / unmapping
of Guest IMSIC address to Host IMSIC guest files. To help achieve this,
we add kvm_riscv_stage2_ioremap() and kvm_riscv_stage2_iounmap() functions.
These new functions for updating G-stage page table mappings will be called
in atomic context so we have special "in_atomic" parameter for this purpose.
KVM: Add gfp_custom flag in struct kvm_mmu_memory_cache
The kvm_mmu_topup_memory_cache() always uses GFP_KERNEL_ACCOUNT for
memory allocation which prevents it's use in atomic context. To address
this limitation of kvm_mmu_topup_memory_cache(), we add gfp_custom flag
in struct kvm_mmu_memory_cache. When the gfp_custom flag is set to some
GFP_xyz flags, the kvm_mmu_topup_memory_cache() will use that instead of
GFP_KERNEL_ACCOUNT.
We add an extensible CSR emulation framework which is based upon the
existing system instruction emulation. This will be useful to upcoming
AIA, PMU, Nested and other virtualization features.
The CSR emulation framework also has provision to emulate CSR in user
space but this will be used only in very specific cases such as AIA
IMSIC CSR emulation in user space or vendor specific CSR emulation
in user space.
By default, all CSRs not handled by KVM RISC-V will be redirected back
to Guest VCPU as illegal instruction trap.
Nikolay Borisov [Fri, 29 Jul 2022 11:44:34 +0000 (17:14 +0530)]
RISC-V: KVM: move preempt_disable() call in kvm_arch_vcpu_ioctl_run
local_irq_disable provides stronger guarantees than preempt_disable so
calling the latter is redundant when interrupts are disabled. Instead,
explicitly disable preemption right before interrupts are enabled/disabled
to ensure that the time accounted in guest_timing_exit_irqoff
includes time taken by the guest or interrupts.
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Anup Patel <anup@brainfault.org>
Nikolay Borisov [Fri, 29 Jul 2022 11:44:26 +0000 (17:14 +0530)]
RISC-V: KVM: Make kvm_riscv_guest_timer_init a void function
It can never fail so convey that fact explicitly by making the function
void. Also in kvm_arch_init_vm it makes it clear that there no need
to do any cleanup after kvm_riscv_gstage_vmid_init has been called.
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Anup Patel <anup@brainfault.org>
RISC-V: KVM: Improve ISA extension by using a bitmap
Currently, the every vcpu only stores the ISA extensions in a unsigned long
which is not scalable as number of extensions will continue to grow.
Using a bitmap allows the ISA extension to support any number of
extensions. The CONFIG one reg interface implementation is modified to
support the bitmap as well. But it is meant only for base extensions.
Thus, the first element of the bitmap array is sufficient for that
interface.
In the future, all the new multi-letter extensions must use the
ISA_EXT one reg interface that allows enabling/disabling any extension
now.
val = (u64)NSEC_PER_SEC * LPC18XX_PWM_TIMER_MAX;
do_div(val, lpc18xx_pwm->clk_rate);
lpc18xx_pwm->max_period_ns = val;
is bogus because with NSEC_PER_SEC = 1000000000,
LPC18XX_PWM_TIMER_MAX = 0xffffffff and clk_rate < NSEC_PER_SEC this
overflows the (on lpc18xx (i.e. ARM32) 32 bit wide) unsigned int
.max_period_ns. This results (dependant of the actual clk rate) in an
arbitrary limitation of the maximal period. E.g. for clkrate = 333333333 (Hz) we get max_period_ns = 9 instead of 12884901897.
So make .max_period_ns an u64 and pass period and duty as u64 to not
discard relevant digits. And also make use of mul_u64_u64_div_u64()
which prevents all overflows assuming clk_rate < NSEC_PER_SEC.
This has various upsides:
- It emits the symbolic name of the error code
- It is silent in the EPROBE_DEFER case and properly sets the defer reason
- It reduces the number of code lines slightly
pwm: twl-led: Document some limitations and link to the reference manual
I found these just from reading the reference manual and the driver
source. It's unclear to me if there are glitches when updating the ON
and OFF registers.
Some systems have clocks exposed to external devices. If the clock
controller supports duty-cycle configuration, such clocks can be used as
pwm outputs. In fact PWM and CLK subsystems are interfaced with in a
similar way and an "opposite" driver already exists (clk-pwm). Add a
driver that would enable pwm devices to be used via clk subsystem.
pwm: sifive: Ensure the clk is enabled exactly once per running PWM
.apply() assumes the clk to be for a given PWM iff the PWM is enabled.
So make sure this is the case when .probe() completes. And in .remove()
disable the according number of times.
This fixes a clk enable/disable imbalance, if some PWMs are already running
at probe time.
Fixes: 9e37a53eb051 (pwm: sifive: Add a driver for SiFive SoC PWM) Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
pwm: sifive: Enable clk only after period check in .apply()
For the period check and the initial calculations of register values there
is no hardware access needed. So delay enabling the clk a bit to simplify
the code flow a bit.
pwm: sifive: Reduce time the controller lock is held
The lock is only to serialize access and update to user_count and
approx_period between different PWMs served by the same pwm_chip.
So the lock needs only to be taken during the check if the (chip global)
period can and/or needs to be changed.
pwm: sifive: Fold pwm_sifive_enable() into its only caller
There is only a single caller of pwm_sifive_enable() which only enables
or disables the clk. Put this implementation directly into
pwm_sifive_apply() which allows further simplification in the next
change.
pwm: sifive: Simplify offset calculation for PWMCMP registers
Instead of explicitly using PWM_SIFIVE_PWMCMP0 + pwm->hwpwm *
PWM_SIFIVE_SIZE_PWMCMP for each access to one of the PWMCMP registers,
introduce a macro that takes the hwpwm id as parameter.
For the register definition using a plain 4 instead of the cpp constant
PWM_SIFIVE_SIZE_PWMCMP is easier to read, so define the offset macro
without the constant. The latter can then be dropped as there are no
users left.
selftests: netdevsim: Add test cases for route deletion failure
Add IPv4 and IPv6 test cases that ensure that we are not leaking a
reference on the nexthop device when we are unable to delete its
associated route.
Without the fix in a previous patch ("netdevsim: fib: Fix reference
count leak on route deletion failure") both test cases get stuck,
waiting for the reference to be released from the dummy device [1][2].
[1]
unregister_netdevice: waiting for dummy1 to become free. Usage count = 5
leaked reference.
fib_check_nh+0x275/0x620
fib_create_info+0x237c/0x4d30
fib_table_insert+0x1dd/0x1d20
inet_rtm_newroute+0x11b/0x200
rtnetlink_rcv_msg+0x43b/0xd20
netlink_rcv_skb+0x15e/0x430
netlink_unicast+0x53b/0x800
netlink_sendmsg+0x945/0xe40
____sys_sendmsg+0x747/0x960
___sys_sendmsg+0x11d/0x190
__sys_sendmsg+0x118/0x1e0
do_syscall_64+0x34/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
[2]
unregister_netdevice: waiting for dummy1 to become free. Usage count = 5
leaked reference.
fib6_nh_init+0xc46/0x1ca0
ip6_route_info_create+0x1167/0x19a0
ip6_route_add+0x27/0x150
inet6_rtm_newroute+0x161/0x170
rtnetlink_rcv_msg+0x43b/0xd20
netlink_rcv_skb+0x15e/0x430
netlink_unicast+0x53b/0x800
netlink_sendmsg+0x945/0xe40
____sys_sendmsg+0x747/0x960
___sys_sendmsg+0x11d/0x190
__sys_sendmsg+0x118/0x1e0
do_syscall_64+0x34/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
netdevsim: fib: Add debugfs knob to simulate route deletion failure
The previous patch ("netdevsim: fib: Fix reference count leak on route
deletion failure") fixed a reference count leak that happens on route
deletion failure.
Such failures can only be simulated by injecting slab allocation
failures, which cannot be surgically injected.
In order to be able to specifically test this scenario, add a debugfs
knob that allows user space to fail route deletion requests when
enabled.
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
netdevsim: fib: Fix reference count leak on route deletion failure
As part of FIB offload simulation, netdevsim stores IPv4 and IPv6 routes
and holds a reference on FIB info structures that in turn hold a
reference on the associated nexthop device(s).
In the unlikely case where we are unable to allocate memory to process a
route deletion request, netdevsim will not release the reference from
the associated FIB info structure, thereby preventing the associated
nexthop device(s) from ever being removed [1].
Fix this by scheduling a work item that will flush netdevsim's FIB table
upon route deletion failure. This will cause netdevsim to release its
reference from all the FIB info structures in its table.
Reported by Lucas Leong of Trend Micro Zero Day Initiative.
Fixes: 0ae3eb7b4611 ("netdevsim: fib: Perform the route programming in a non-atomic context") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 29 Jul 2022 11:14:03 +0000 (12:14 +0100)]
Merge branch 'seg6-headend-reduced'
Andrea Mayer says:
====================
seg6: add support for SRv6 Headend Reduced
This patchset adds support for SRv6 Headend behavior with Reduced
Encapsulation. It introduces the H.Encaps.Red and H.L2Encaps.Red versions
of the SRv6 H.Encaps and H.L2Encaps behaviors, according to RFC 8986 [1].
In details, the patchset is made of:
- patch 1/4: add support for SRv6 H.Encaps.Red behavior;
- Patch 2/4: add support for SRv6 H.L2Encaps.Red behavior;
- patch 2/4: add selftest for SRv6 H.Encaps.Red behavior;
- patch 3/4: add selftest for SRv6 H.L2Encaps.Red behavior.
The corresponding iproute2 patch for supporting SRv6 H.Encaps.Red and
H.L2Encaps.Red behaviors is provided in a separated patchset.
- Improve selftests by:
i) adding a random suffix to network namespaces;
ii) creating net devices directly into network namespaces;
iii) using trap EXIT command to properly clean up selftest networks.
Thanks to Paolo Abeni.
v3 -> v4:
- Add selftests to the Makefile, thanks to Jakub Kicinski.
v2 -> v3:
- Keep SRH when HMAC TLV is present;
- Split the support for H.Encaps.Red and H.L2Encaps.Red behaviors in two
patches (respectively, patch 1/4 and patch 2/4);
- Add selftests for SRv6 H.Encaps.Red and H.L2Encaps.Red.
v1 -> v2:
- Fixed sparse warnings;
- memset now uses sizeof() instead of hardcoded value;
- Removed EXPORT_SYMBOL_GPL.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrea Mayer [Wed, 27 Jul 2022 18:54:08 +0000 (20:54 +0200)]
selftests: seg6: add selftest for SRv6 H.L2Encaps.Red behavior
This selftest is designed for testing the H.L2Encaps.Red behavior. It
instantiates a virtual network composed of several nodes: hosts and SRv6
routers. Each node is realized using a network namespace that is
properly interconnected to others through veth pairs.
The test considers SRv6 routers implementing a L2 VPN leveraged by hosts
for communicating with each other. Such routers make use of the SRv6
H.L2Encaps.Red behavior for applying SRv6 policies to L2 traffic coming
from hosts.
The correct execution of the behavior is verified through reachability
tests carried out between hosts belonging to the same VPN.
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrea Mayer [Wed, 27 Jul 2022 18:54:07 +0000 (20:54 +0200)]
selftests: seg6: add selftest for SRv6 H.Encaps.Red behavior
This selftest is designed for testing the H.Encaps.Red behavior. It
instantiates a virtual network composed of several nodes: hosts and SRv6
routers. Each node is realized using a network namespace that is
properly interconnected to others through veth pairs.
The test considers SRv6 routers implementing L3 VPNs leveraged by hosts
for communicating with each other. Such routers make use of the SRv6
H.Encaps.Red behavior for applying SRv6 policies to L3 traffic coming
from hosts.
The correct execution of the behavior is verified through reachability
tests carried out between hosts belonging to the same VPN.
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrea Mayer [Wed, 27 Jul 2022 18:54:06 +0000 (20:54 +0200)]
seg6: add support for SRv6 H.L2Encaps.Red behavior
The SRv6 H.L2Encaps.Red behavior described in [1] is an optimization of
the SRv6 H.L2Encaps behavior [2].
H.L2Encaps.Red reduces the length of the SRH by excluding the first
segment (SID) in the SRH of the pushed IPv6 header. The first SID is
only placed in the IPv6 Destination Address field of the pushed IPv6
header.
When the SRv6 Policy only contains one SID the SRH is omitted, unless
there is an HMAC TLV to be carried.
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Anton Makarov <anton.makarov11235@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrea Mayer [Wed, 27 Jul 2022 18:54:05 +0000 (20:54 +0200)]
seg6: add support for SRv6 H.Encaps.Red behavior
The SRv6 H.Encaps.Red behavior described in [1] is an optimization of
the SRv6 H.Encaps behavior [2].
H.Encaps.Red reduces the length of the SRH by excluding the first
segment (SID) in the SRH of the pushed IPv6 header. The first SID is
only placed in the IPv6 Destination Address field of the pushed IPv6
header.
When the SRv6 Policy only contains one SID the SRH is omitted, unless
there is an HMAC TLV to be carried.
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Anton Makarov <anton.makarov11235@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Commit '2c5a5748105a ("vmxnet3: add support for out of order rx
completion")' added support for out of order rx completion. Within
that patch, an enhancement was done to reschedule napi for processing
rx completions.
However, it can lead to missing an interrupt. So, this patch reverts
that part of the code.
Fixes: 2c5a5748105a ("vmxnet3: add support for out of order rx completion") Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net/af_packet: check len when min_header_len equals to 0
User can use AF_PACKET socket to send packets with the length of 0.
When min_header_len equals to 0, packet_snd will call __dev_queue_xmit
to send packets, and sock->type can be any type.
Reported-by: syzbot+5ea725c25d06fb9114c4@syzkaller.appspotmail.com Fixes: fd1894224407 ("bpf: Don't redirect packets with invalid pkt_len") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Manning [Mon, 25 Jul 2022 18:14:42 +0000 (19:14 +0100)]
net: allow unbound socket for packets in VRF when tcp_l3mdev_accept set
The commit 3c82a21f4320 ("net: allow binding socket in a VRF when
there's an unbound socket") changed the inet socket lookup to avoid
packets in a VRF from matching an unbound socket. This is to ensure the
necessary isolation between the default and other VRFs for routing and
forwarding. VRF-unaware processes running in the default VRF cannot
access another VRF and have to be run with 'ip vrf exec <vrf>'. This is
to be expected with tcp_l3mdev_accept disabled, but could be reallowed
when this sysctl option is enabled. So instead of directly checking dif
and sdif in inet[6]_match, here call inet_sk_bound_dev_eq(). This
allows a match on unbound socket for non-zero sdif i.e. for packets in
a VRF, if tcp_l3mdev_accept is enabled.
For avoiding the potential deadlock via kill_fasync() call, use the
new fasync helpers to defer the invocation from the control API. Note
that it's merely a workaround.
Another note: although we haven't received reports about the deadlock
with the control API, the deadlock is still potentially possible, and
it's better to align the behavior with other core APIs (PCM and
timer); so let's move altogether.
For avoiding the potential deadlock via kill_fasync() call, use the
new fasync helpers to defer the invocation from timer API. Note that
it's merely a workaround.
For avoiding the potential deadlock via kill_fasync() call, use the
new fasync helpers to defer the invocation from PCI API. Note that
it's merely a workaround.