Commit 545fbd0775ba ("rq-qos: fix missed wake-ups in rq_qos_throttle")
tried to fix a problem where a process could be sleeping in rq_qos_wait()
without anyone to wake it up. However, the fix is not complete and the
following race can still happen:
  waker:     completes IOs, inflight decreased
  waiter 1:  prepare_to_wait_exclusive()
  waiter 2:  prepare_to_wait_exclusive()
  waiter 1:  has_sleeper = !wq_has_single_sleeper() -> true as there are two sleepers
  waiter 2:  has_sleeper = !wq_has_single_sleeper() -> true
  waiter 1:  io_schedule()
  waiter 2:  io_schedule()
Deadlock, as now there's nobody left to wake up the two waiters. The logic
of automatically blocking when there are already sleepers is really subtle
and the only way to make it work reliably is that we check whether there
are some waiters in the queue when adding ourselves there. That way, we
are guaranteed that at least the first process to enter the wait queue
will recheck the waiting condition before going to sleep and thus
guarantee forward progress.
Fixes: 545fbd0775ba ("rq-qos: fix missed wake-ups in rq_qos_throttle") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210607112613.25344-1-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are two variables being incremented in that loop (i and j): i
follows the raw data and j follows what is being written into the buffer.
We should compare 'i' to MAX_MEMHEX_BYTES or compare 'j' to HEX_CHARS.
Otherwise, if 'j' grows bigger than HEX_CHARS, it will overflow the
destination buffer.
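To make the pattern concrete, here is a minimal, self-contained userspace sketch (the put_hex() name and the HEX_CHARS value are stand-ins, not the kernel's actual trace code): the loop bound must be checked against the index that tracks the destination buffer.

  #include <stdio.h>

  #define HEX_CHARS 16    /* stand-in destination size */

  /* 'i' walks the raw input, 'j' walks the hex output; the bound check
   * has to be on 'j', which tracks the destination buffer. */
  static size_t put_hex(char *dst, const unsigned char *src, size_t len)
  {
      size_t i, j = 0;

      for (i = 0; i < len; i++) {
          if (j + 2 >= HEX_CHARS)   /* each byte emits two hex digits */
              break;
          j += snprintf(dst + j, HEX_CHARS - j, "%02x", src[i]);
      }
      dst[j] = '\0';
      return j;
  }

  int main(void)
  {
      char buf[HEX_CHARS];
      const unsigned char data[] = "a longer input than fits";

      put_hex(buf, data, sizeof(data));
      printf("%s\n", buf);
      return 0;
  }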
Where feasible, I prefer to have all tests visible on all architectures,
but to have them wired to XFAIL. DOUBLE_FAIL was set up to XFAIL, but
wasn't actually being added to the test list.
Fixes: cea23efb4de2 ("lkdtm/bugs: Make double-fault test always available") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210623203936.3151093-7-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The extcon driver for the Basin Cove PMIC shadows the switch status used by
dwc3 DRD to detect a change in the switch position. This change initializes
that status at probe time.
Commit 529a1101212a ("mfd: syscon: Don't free allocated name
for regmap_config") stopped freeing the allocated name field of struct
regmap_config, which introduced a memory leak. A later commit,
94cc89eb8fa5 ("regmap: debugfs: Fix handling of name string
for debugfs init delays"), fixes the debugfs init issue at its root
cause; with that fix, the name field in struct regmap_debugfs_node
is removed, and the name field of struct regmap_config is no longer
used when initializing debugfs for the syscon driver. So the allocated
name field of struct regmap_config needs to be freed directly after
regmap initialization to avoid the memory leak.
Cc: stable@vger.kernel.org Fixes: 529a1101212a("mfd: syscon: Don't free allocated name for regmap_config") Signed-off-by: Meng Li <Meng.Li@windriver.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Trying to get the AB8500 charging driver working, I ran into a bit
of bitrot: we haven't used the driver for a while, so errors from
refactorings have gone unnoticed.
This one is pretty self-evident: use the argument to the macro or we
end up with a random pointer to something else.
Cc: stable@vger.kernel.org Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Marcus Cooper <codekipper@gmail.com> Fixes: 297d716f6260 ("power_supply: Change ownership from driver to core") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
UBIFS may run into problems with concurrent xattr_{set|get} and
listxattr operations, such as assertion failures, memory corruption,
and stale xattr values [1].
Fix it by introducing a new rw-lock in @ubifs_inode to serialize write
operations on xattrs; concurrent read operations remain effective,
just like in ext4.
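As a rough sketch of the serialization scheme (the struct and function names below are illustrative, not the actual UBIFS ones): writers take an rw_semaphore exclusively while readers take it shared.

  #include <linux/rwsem.h>

  struct example_inode {
      struct rw_semaphore xattr_sem;  /* initialized with init_rwsem() */
      /* ... other inode state ... */
  };

  static void example_xattr_set(struct example_inode *ui)
  {
      down_write(&ui->xattr_sem);     /* exclusive: set/remove an xattr */
      /* ... modify the xattr names/values ... */
      up_write(&ui->xattr_sem);
  }

  static void example_xattr_get(struct example_inode *ui)
  {
      down_read(&ui->xattr_sem);      /* shared: get/list xattrs */
      /* ... read xattr values ... */
      up_read(&ui->xattr_sem);
  }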
The following fixes are done for the tcc sysfs interface:
- TCC is 6 bits only from bit 29-24
- TCC of 0 is valid
- When BIT(31) is set, this register is read only
- Check for invalid tcc value
- Error for negative values
When an IPMI watchdog timer is being stopped in ipmi_close() or
ipmi_ioctl(WDIOS_DISABLECARD), the current watchdog action is updated to
WDOG_TIMEOUT_NONE and _ipmi_set_timeout(IPMI_SET_TIMEOUT_NO_HB) is called
to install this action. The latter function ends up invoking
__ipmi_set_timeout() which makes the actual 'Set Watchdog Timer' IPMI
request.
For IPMI 1.0, this operation results in fully stopping the watchdog timer.
For IPMI >= 1.5, function __ipmi_set_timeout() always specifies the "don't
stop" flag in the prepared 'Set Watchdog Timer' IPMI request. This causes
that the watchdog timer has its action correctly updated to 'none' but the
timer continues to run. A problem is that IPMI firmware can then still log
an expiration event when the configured timeout is reached, which is
unexpected because the watchdog timer was requested to be stopped.
The patch fixes this problem by not setting the "don't stop" flag in
__ipmi_set_timeout() when the current action is WDOG_TIMEOUT_NONE which
results in stopping the watchdog timer. This makes the behaviour for
IPMI >= 1.5 consistent with IPMI 1.0. It also matches the logic in
__ipmi_heartbeat(), which does not allow resetting the watchdog if the
current action is WDOG_TIMEOUT_NONE as that would start the timer.
fw_cfg_showrev() is called by an indirect call in kobj_attr_show(),
which violates clang's CFI checking because fw_cfg_showrev()'s second
parameter is 'struct attribute', whereas the ->show() member of 'struct
kobj_attribute' expects the second parameter to be of type 'struct
kobj_attribute'.
Fix this by converting fw_cfg_rev_attr to 'struct kobj_attribute' where
this would have been caught automatically by the incompatible pointer
types compiler warning. Update fw_cfg_showrev() accordingly.
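For illustration, a minimal sketch of the expected shape (the rev_show/rev_attr names are illustrative; the real driver converts fw_cfg_showrev() and fw_cfg_rev_attr): the show callback takes a struct kobj_attribute and the attribute is declared with __ATTR_RO().

  #include <linux/kobject.h>
  #include <linux/sysfs.h>

  /* Matches the kobj_attribute ->show() prototype, so the indirect call
   * made by kobj_attr_show() is CFI-compatible. */
  static ssize_t rev_show(struct kobject *kobj, struct kobj_attribute *attr,
                          char *buf)
  {
      return sysfs_emit(buf, "%u\n", 3);  /* illustrative value */
  }

  static const struct kobj_attribute rev_attr = __ATTR_RO(rev);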
As reported by Alex Sergeev, the i40e driver is incrementing the PTP
clock at 40Gb speeds when linked at 5Gb. Fix this bug by making
sure that the right multiplier is selected when linked at 5Gb.
Fixes: 3dbdd6c2f70a ("i40e: Add support for 5Gbps cards") Cc: stable@vger.kernel.org Reported-by: Alex Sergeev <asergeev@carbonrobotics.com> Suggested-by: Alex Sergeev <asergeev@carbonrobotics.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We can deadlock when rmmod'ing the driver or going through firmware
reset, because cfg80211_unregister_wdev() has to bring down the link
for us, ... which then grabs the same wiphy lock.
nl80211_del_interface() already handles a very similar case, with a nice
description:
/*
* We hold RTNL, so this is safe, without RTNL opencount cannot
* reach 0, and thus the rdev cannot be deleted.
*
* We need to do it for the dev_close(), since that will call
* the netdev notifiers, and we need to acquire the mutex there
* but don't know if we get there from here or from some other
* place (e.g. "ip link set ... down").
*/
mutex_unlock(&rdev->wiphy.mtx);
...
Do similarly for mwifiex teardown, by ensuring we bring the link down
first.
The driver_name="tegra" is now required by the newer ALSA UCMs, otherwise
Tegra UCMs don't match by the path/name.
All Tegra machine drivers are specifying the card's name, but it has no
effect if model name is specified in the device-tree since it overrides
the card's name. We need to set the driver_name to "tegra" in order to
get a usable lookup path for the updated ALSA UCMs. The new UCM lookup
path has a form of driver_name/card_name.
The old lookup paths that are based on driver module name continue to
work as before. Note that UCM matching never worked for Tegra ASoC drivers
if they were compiled as built-in, this is fixed by supporting the new
naming scheme.
The stratix10-soc driver uses the fpga_mgr_create() function and is therefore
responsible for calling fpga_mgr_free() to release the class driver resources.
Add the missing call to fpga_mgr_free() in the s10_remove() function.
Bad counter reads are experienced sometimes when bit 10 or greater rolls
over. Originally, testing showed that at least 10 lower bits would be
set to the same value during these bad reads. However, some users still
reported time skips.
Wider testing revealed that on some chips, occasionally only the lowest
9 bits would read as the anomalous value. During these reads (which
still happen only when bit 10 rolls over), bit 9 would read as the correct value.
Reduce the mask by one bit to cover these cases as well.
Cc: stable@vger.kernel.org Fixes: c950ca8c35ee ("clocksource/drivers/arch_timer: Workaround for Allwinner A64 timer instability") Reported-by: Roman Stratiienko <r.stratiienko@gmail.com> Signed-off-by: Samuel Holland <samuel@sholland.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210515021439.55316-1-samuel@sholland.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexey and Joshua tried to solve a cpusets related hotplug problem which is
user space visible and results in unexpected behaviour for some time after
a CPU has been plugged in and the corresponding uevent was delivered.
cpusets delegate the hotplug work (rebuilding cpumasks etc.) to a
workqueue. This is done because the cpusets code has already a lock
nesting of cgroups_mutex -> cpu_hotplug_lock. A synchronous callback or
waiting for the work to finish with cpu_hotplug_lock held can and will
deadlock because that results in the reverse lock order.
As a consequence the uevent can be delivered before cpusets have consistent
state which means that a user space invocation of sched_setaffinity() to
move a task to the plugged CPU fails up to the point where the scheduled
work has been processed.
The same is true for CPU unplug, but that does not create user observable
failure (yet).
It's still inconsistent to claim that an operation is finished before it
actually is and that's the real issue at hand. uevents just make it
reliably observable.
Obviously the problem should be fixed in cpusets/cgroups, but untangling
that is pretty much impossible because according to the changelog of the
commit which introduced this 8 years ago:
the lock order cgroups_mutex -> cpu_hotplug_lock is a design decision and
the whole code is built around that.
So bite the bullet and invoke the relevant cpuset function, which waits for
the work to finish, in _cpu_up/down() after dropping cpu_hotplug_lock and
only when tasks are not frozen by suspend/hibernate because that would
obviously wait forever.
Waiting there with cpu_add_remove_lock, which is protecting the present
and possible CPU maps, held is not a problem at all because neither work
queues nor cpusets/cgroups have any lockchains related to that lock.
Waiting in the hotplug machinery is not problematic either because there
are already state callbacks which wait for hardware queues to drain. It
makes the operations slightly slower, but hotplug is slow anyway.
This ensures that state is consistent before returning from a hotplug
up/down operation. It's still inconsistent during the operation, but that's
a different story.
Add a large comment which explains why this is done and why this is not a
dumping ground for the hack of the day to work around half-thought-out locking
schemes. Document also the implications vs. hotplug operations and
serialization or the lack of it.
Thanks to Alexey and Joshua for analyzing why this temporary
sched_setaffinity() failure happened.
The TTL field indicates the level of page table walk holding the *leaf*
entry for the address being invalidated. But currently, the TTL field
may be set to an incorrect value. For example, when we just want to flush a
PTE page but tlb->cleared_pmds is set, we get tlb_level = 2 from the
tlb_get_level() function. This may
cause some unexpected problems.
This patch sets the TTL field to 0 if tlb->freed_tables is set.
tlb->freed_tables indicates that page table pages were freed, not the leaf
entry.
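A rough sketch of the resulting helper (simplified from the arm64 tlb_get_level() logic; the exact upstream code may differ): when page-table pages were freed, report level 0 ("level unknown") instead of a possibly wrong leaf level.

  #include <asm/tlb.h>

  static inline int example_tlb_get_level(struct mmu_gather *tlb)
  {
      /* TTL only describes the *leaf* level; freed tables invalidate it. */
      if (tlb->freed_tables)
          return 0;

      if (tlb->cleared_ptes &&
          !(tlb->cleared_pmds || tlb->cleared_puds || tlb->cleared_p4ds))
          return 3;

      if (tlb->cleared_pmds && !(tlb->cleared_puds || tlb->cleared_p4ds))
          return 2;

      if (tlb->cleared_puds && !tlb->cleared_p4ds)
          return 1;

      return 0;
  }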
Cc: <stable@vger.kernel.org> # 5.9.x Fixes: c4ab2cbc1d87 ("arm64: tlb: Set the TTL field in flush_tlb_range") Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: ZhuRui <zhurui3@huawei.com> Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com> Link: https://lore.kernel.org/r/b80ead47-1f88-3a00-18e1-cacc22f54cc4@huawei.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
DIPM is unsupported or broken on sunxi. Trying to enable the power
management policy med_power_with_dipm on an Allwinner A20 SoC based board
leads to immediate I/O errors and the attached SATA disk disappears from
the /dev filesystem. A reset (power cycle) is required to make the SATA
controller or disk work again. The A10 and A20 SoC data sheets and manuals
don't mention DIPM at all [1], so it's fair to assume that it's simply not
supported. But even if it was, it should be considered broken and best be
disabled in the ahci_sunxi driver.
While initializing an UHS-I SD card, the mmc core first tries to switch to
1.8V I/O voltage, before it continues to change the settings for the bus
speed mode.
However, the current behaviour in the mmc core is inconsistent and doesn't
conform to the SD spec. More precisely, an SD card that supports UHS-I must
set both the SD_OCR_CCS bit and the SD_OCR_S18R bit in the OCR register
response. When switching to 1.8V I/O the mmc core correctly checks both of
the bits, but only the SD_OCR_S18R bit when changing the settings for bus
speed mode.
Rather than actually fixing the code to conform to the SD spec, let's
deliberately deviate from it by requiring only the SD_OCR_S18R bit for both
parts. This enables us to support UHS-I for SDSC cards (outside spec),
which is actually being supported by some existing SDSC cards. Moreover,
this fixes the inconsistent behaviour.
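A minimal sketch of the resulting check (the helper name and host flag are illustrative, not the mmc core's actual code): UHS-I support is keyed on SD_OCR_S18R alone for both parts.

  #include <linux/types.h>
  #include <linux/mmc/sd.h>

  static bool sd_card_wants_uhs(u32 ocr_resp, bool host_has_uhs)
  {
      /* Deliberately out of spec: do not additionally require SD_OCR_CCS,
       * so SDSC cards that advertise 1.8V signaling can use UHS-I too. */
      return host_has_uhs && (ocr_resp & SD_OCR_S18R);
  }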
It might be that something goes wrong during tuning so the MMC core will
immediately trigger a retune. In our case it was:
- we sent a tuning block
- there was an error so we need to send an abort cmd to the eMMC
- the abort cmd had a CRC error
- retune was set by the MMC core
This led to a vicious circle causing a performance regression of 75%.
So, clear retuning flags before we enable retuning to start with a known
cleared state.
Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Suggested-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Fixes: bd11e8bd03ca ("mmc: core: Flag re-tuning is needed on CRC errors") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210624151616.38770-2-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When an eMMC device is being run in HS400 mode, any access to the
RPMB device will cause the error message "mmc1: Invalid UHS-I mode
selected". This happens as a result of tuning being disabled before
RPMB access and then re-enabled after the RPMB access is complete.
When tuning is re-enabled, the system has to switch from HS400
to HS200 to do the tuning and then back to HS400. As part of
sequence to switch from HS400 to HS200 the system is temporarily
put into HS mode. When switching to HS mode, sdhci_get_preset_value()
is called and does not have support for HS mode and prints the warning
message and returns the preset for SDR12. The fix is to add support
for MMC and SD HS modes to sdhci_get_preset_value().
This can be reproduced on any system running eMMC in HS400 mode
(not HS400ES) by using the "mmc" utility to run the following
command: "mmc rpmb read-counter /dev/mmcblk0rpmb".
Signed-off-by: Al Cooper <alcooperx@gmail.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Fixes: 52983382c74f ("mmc: sdhci: enhance preset value function") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210624163045.33651-1-alcooperx@gmail.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On the Toshiba Encore 2 WT8-B the microSD slot always reports the card
being write-protected even though microSD cards do not have a write-protect
switch at all.
Add a new DMI_QUIRK_SD_NO_WRITE_PROTECT quirk entry to sdhci-acpi.c's
DMI quirk table for this.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20210503092157.5689-1-hdegoede@redhat.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
intel_dp_vsc_sdp_unpack() was using a memset() size (36, struct dp_sdp)
larger than the destination (24, struct drm_dp_vsc_sdp), clobbering
fields in struct intel_crtc_state after infoframes.vsc. Use the actual
target size for the memset().
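A small, self-contained sketch of the sizing rule (the structs below are stand-ins with the sizes named above; the real definitions live in the DRM headers): clear the destination with its own size, not the size of the larger source SDP.

  #include <string.h>

  struct dp_sdp { unsigned char raw[36]; };               /* source buffer */
  struct drm_dp_vsc_sdp { unsigned char fields[24]; };    /* unpack target */

  static void unpack_vsc(struct drm_dp_vsc_sdp *vsc, const struct dp_sdp *sdp)
  {
      /* sizeof(*sdp) here would overrun vsc and clobber what follows it */
      memset(vsc, 0, sizeof(*vsc));
      /* ... decode sdp into vsc ... */
      (void)sdp;
  }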
Fixes: 1b404b7dbb10 ("drm/i915/dp: Read out DP SDPs") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210617213301.1824728-1-keescook@chromium.org
(cherry picked from commit c88e2647c5bb45d04dc4302018ebe6ebbf331823) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since the commit "drm: Auto-set allow_fb_modifiers when given modifiers at
plane init", this is done automatically as part of plane init, if drivers
set the modifier list correctly. Which is the case here.
Note that this fixes an inconsistency: We've set the cap everywhere,
but only nv50+ supports modifiers. Hence cc stable, but not further
back than the patch from Paul.
Reviewed-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org # v5.1 + Cc: Pekka Paalanen <pekka.paalanen@collabora.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: nouveau@lists.freedesktop.org Link: https://patchwork.freedesktop.org/patch/msgid/20210427092018.832258-6-daniel.vetter@ffwll.ch Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Setting the cap without the modifier list is very confusing to
userspace. Fix that by listing the ones we support explicitly.
Stable backport so that userspace can rely on this working in a
reasonable way, i.e. that the cap set implies IN_FORMATS is available.
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Cc: Pekka Paalanen <pekka.paalanen@collabora.com> Cc: Rob Clark <robdclark@chromium.org> Cc: Jordan Crouse <jordan@cosmicpenguin.net> Cc: Emil Velikov <emil.velikov@collabora.com> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210427092018.832258-5-daniel.vetter@ffwll.ch Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since the commit "drm: Auto-set allow_fb_modifiers when given modifiers at
plane init", this is done automatically as part of plane init, if drivers
set the modifier list correctly. Which is the case here.
It was slightly inconsistent though, since planes with only linear
modifier support didn't list that explicitly. Fix that, and cc:
stable to allow userspace to rely on this. Again don't backport
further than where Paul's patch got added.
When using a 24-bit panel on a 8-bit serial bus, the pixel clock
requested by the panel has to be multiplied by 3, since the subpixels
are shifted sequentially.
The code (in ingenic_drm_encoder_atomic_check) already computed
crtc_state->adjusted_mode->crtc_clock accordingly, but clk_set_rate()
used crtc_state->adjusted_mode->clock instead.
Fixes: 28ab7d35b6e0 ("drm/ingenic: Properly compute timings when using a 3x8-bit panel") Cc: stable@vger.kernel.org # v5.10 Signed-off-by: Paul Cercueil <paul@crapouillou.net> Tested-by: H. Nikolaus Schaller <hns@goldelico.com> # CI20/jz4780 (HDMI) and Alpha400/jz4730 (LCD) Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20210323144008.166248-1-paul@crapouillou.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Why]
This hasn't been well tested and leads to complete system hangs on DCN1
based systems, possibly others.
The system hang can be reproduced by gesturing the video on the YouTube
Android app on ChromeOS into full screen.
[How]
Reject atomic commits with non-zero drm_plane_state.src_x or src_y values.
v2:
- Add code comment describing the reason we're rejecting non-zero
src_x and src_y
- Drop gerrit Change-Id
- Add stable CC
- Based on amd-staging-drm-next
v3: removed trailing whitespace
Signed-off-by: Harry Wentland <harry.wentland@amd.com> Cc: stable@vger.kernel.org Cc: nicholas.kazlauskas@amd.com Cc: amd-gfx@lists.freedesktop.org Cc: alexander.deucher@amd.com Cc: Roman.Li@amd.com Cc: hersenxs.wu@amd.com Cc: danny.wang@amd.com Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hersen Wu <hersenxs.wu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add device HID AMDI0031 to the AMD GPIO controller driver match table.
This controller can be found on Microsoft Surface Laptop 4 devices and
seems similar enough that we can just copy the existing AMDI0030 entry.
Commit cf6d100dd238 ("drm/rockchip: dsi: add dual mipi support") added
this devcnt field and call to component_del(). However, these both
appear to be erroneous changes left over from an earlier version of the
patch. In the version merged, nothing ever modifies devcnt, meaning
component_del() runs unconditionally and in addition to the
component_del() calls in dw_mipi_dsi_rockchip_host_detach(). The second
call fails to delete anything and produces a warning in dmesg.
If we look at the previous version of the patch[1], however, we see that
it had logic to calculate devcnt and call component_add() in certain
situations. This was removed in v6, and the fact that the deletion code
was not appears to have been an oversight.
While the DP specification isn't entirely clear on if this should be
allowed or not, some branch devices report having downstream ports present
while also reporting a downstream port count of 0. So to avoid breaking
those devices, we need to handle this in drm_dp_read_downstream_info().
So, to do this we assume there's no downstream port info when the
downstream port count is 0.
Signed-off-by: Lyude Paul <lyude@redhat.com> Tested-by: Jérôme de Bretagne <jerome.debretagne@gmail.com>
Bugzilla: https://gitlab.freedesktop.org/drm/intel/-/issues/3416 Fixes: 3d3721ccb18a ("drm/i915/dp: Extract drm_dp_read_downstream_info()") Cc: <stable@vger.kernel.org> # v5.10+ Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210430223428.10514-1-lyude@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since we fixed the hooks to disable the encoder at boot, we now have an
unbalanced clk_disable call at boot since we never enabled them in the
first place.
Let's mimic the state of the hardware and enable the clocks at boot if
the controller is enabled to get the use-count right.
The vc4_set_crtc_possible_masks() function is meant to run over all the encoders
and then set their possible_crtcs mask to their associated pixelvalve.
However, since the commit 39fcb2808376 ("drm/vc4: txp: Turn the TXP into
a CRTC of its own"), the TXP has been turned to a CRTC and encoder of
its own, and while it does indeed register an encoder, it no longer has
an associated pixelvalve. The code will thus run over the TXP encoder
and set a bogus possible_crtcs mask, overriding the one set in the TXP
bind function.
In order to fix this, let's skip any virtual encoder.
Cc: <stable@vger.kernel.org> # v5.9+ Fixes: 39fcb2808376 ("drm/vc4: txp: Turn the TXP into a CRTC of its own") Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20210507150515.257424-3-maxime@cerno.tech Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current code does a bitwise OR on the possible_crtcs variable of the
TXP encoder, while we want to set it to that value instead.
Cc: <stable@vger.kernel.org> # v5.9+ Fixes: 39fcb2808376 ("drm/vc4: txp: Turn the TXP into a CRTC of its own") Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20210507150515.257424-2-maxime@cerno.tech Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
SMU had set all the necessary fields for a link width switch
but the width switch wasn't occurring because the link was idle
in the L1 state. Setting LC_L1_RECONFIG_EN=0x1 will allow width
switches to also be initiated while in L1 instead of waiting until
the link is back in L0.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A lot of NAK-G errors are generated when link width switching happens.
The workaround for this issue is to program the SPC to 4 symbols per clock
during bootup when the native PCIe width is x4.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Without the driver loaded, SDMA0_UTCL1_PAGE.TMZ_ENABLE is set to 1
by default for all ASICs. On Raven/Renoir, the SDMA golden settings
change SDMA0_UTCL1_PAGE.TMZ_ENABLE to 0.
This patch restores SDMA0_UTCL1_PAGE.TMZ_ENABLE to 1.
Signed-off-by: Aaron Liu <aaron.liu@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Navi series GPUs have 2 SIMDs per CU (and then 2 CUs per WGP).
The NV enum headers incorrectly listed this as 4, which later meant
we were incorrectly reporting the number of SIMDs in the HSA
topology. This could cause problems down the line for user-space
applications that want to launch a fixed amount of work to each
SIMD.
Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The kernel handles the NX fault by updating CSB or sending
signal to the process. In multithreaded applications, children can
open VAS windows and can exit without closing them. But the
parent can continue to send NX requests with these windows. To
prevent pid reuse, references are taken on the pid and tgid
when the window is opened and released during window close.
The current code is not releasing the tgid reference, which can
cause a pid leak; this patch fixes the issue.
Fixes: db1c08a740635 ("powerpc/vas: Take reference to PID and mm for user space windows") Cc: stable@vger.kernel.org # 5.8+ Reported-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Haren Myneni <haren@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/6020fc4d444864fe20f7dcdc5edfe53e67480a1c.camel@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A change in clang 13 results in the __lwsync macro being defined as
__builtin_ppc_lwsync, which emits 'lwsync' or 'msync' depending on what
the target supports. This breaks the build because of -Werror in
arch/powerpc, along with thousands of warnings:
In file included from arch/powerpc/kernel/pmc.c:12:
In file included from include/linux/bug.h:5:
In file included from arch/powerpc/include/asm/bug.h:109:
In file included from include/asm-generic/bug.h:20:
In file included from include/linux/kernel.h:12:
In file included from include/linux/bitops.h:32:
In file included from arch/powerpc/include/asm/bitops.h:62:
arch/powerpc/include/asm/barrier.h:49:9: error: '__lwsync' macro redefined [-Werror,-Wmacro-redefined]
#define __lwsync() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
^
<built-in>:308:9: note: previous definition is here
#define __lwsync __builtin_ppc_lwsync
^
1 error generated.
Undefine this macro so that the runtime patching introduced by
commit 2d1b2027626d ("powerpc: Fixup lwsync at runtime") continues to
work properly with clang and the build no longer breaks.
The powerpc kernel is not prepared to handle exec faults from kernel.
Especially, the function is_exec_fault() will return 'false' when an
exec fault is taken by kernel, because the check is based on reading
current->thread.regs->trap which contains the trap from user.
For instance, when provoking a LKDTM EXEC_USERSPACE test,
current->thread.regs->trap is set to SYSCALL trap (0xc00), and
the fault taken by the kernel is not seen as an exec fault by
set_access_flags_filter().
Commit d7df2443cd5f ("powerpc/mm: Fix spurious segfaults on radix
with autonuma") made it clear and handled it properly. But later on
commit d3ca587404b3 ("powerpc/mm: Fix reporting of kernel execute
faults") removed that handling, introducing test based on error_code.
And here is the problem, because on the 603 all upper bits of SRR1
get cleared when the TLB instruction miss handler bails out to ISI.
Until commit cbd7e6ca0210 ("powerpc/fault: Avoid heavy
search_exception_tables() verification"), an exec fault from kernel
at a userspace address was indirectly caught by the lack of entry for
that address in the exception tables. But after that commit the
kernel mainly relies on KUAP or on core mm handling to catch wrong
user accesses. Here the access is not wrong, so mm handles it.
It is a minor fault because PAGE_EXEC is not set,
set_access_flags_filter() should set PAGE_EXEC and voila.
But as is_exec_fault() returns false as explained in the beginning,
set_access_flags_filter() bails out without setting PAGE_EXEC flag,
which leads to a forever minor exec fault.
As the kernel is not prepared to handle such exec faults, the thing to
do is to fire in bad_kernel_fault() for any exec fault taken by the
kernel, as it was prior to commit d3ca587404b3.
To avoid a race between rmap walk and mremap, mremap does
take_rmap_locks(). The lock was taken to ensure that rmap walk don't miss
a page table entry due to PTE moves via move_pagetables(). The kernel
does further optimization of this lock such that if we are going to find
the newly added vma after the old vma, the rmap lock is not taken. This
is because the rmap walk would find the vmas in the same order and if we don't
find the page table attached to the older vma we would find it with the new
vma, which we would iterate later.
As explained in commit eb66ae030829 ("mremap: properly flush TLB before
releasing the page") mremap is special in that it doesn't take ownership
of the page. The optimized version for PUD/PMD aligned mremap also
doesn't hold the ptl lock. This can result in stale TLB entries.
This patch updates the rmap locking requirement in mremap to handle this
race condition with the optimized mremap path.
Resctrl test suite accepts command line argument "-t" to specify the
unit tests to run in the test list (e.g., -t mbm,mba,cmt,cat) as
documented in the help.
When calling strtok() to parse the option, the incorrect delimiters
argument ":\t" is used. As a result, passing "-t mbm,mba,cmt,cat" throws
an invalid option error.
Fix this by using delimiters argument "," instead of ":\t" for parsing
of unit tests list. At the same time, remove the unnecessary "spaces"
between the unit tests in help documentation to prevent confusion.
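For illustration, a self-contained sketch of the parsing change (variable names are illustrative, not the test suite's actual code): split the -t argument on commas.

  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      char arg[] = "mbm,mba,cmt,cat";
      char *tok = strtok(arg, ",");   /* was: strtok(arg, ":\t") */

      while (tok) {
          printf("run test: %s\n", tok);
          tok = strtok(NULL, ",");
      }
      return 0;
  }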
Commit 275e88b06a27 ("PCI: tegra: Fix host link initialization") broke
host initialization during resume, as it misses calling
dw_pcie_setup_rc(), which is required for host and MSI initialization.
Link: https://lore.kernel.org/r/20210504172157.29712-1-vidyas@nvidia.com Fixes: 275e88b06a27 ("PCI: tegra: Fix host link initialization") Tested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Clang's integrated assembler only accepts these instructions when the
cpu is set to mips32r5. With this change, we can assemble
malta_defconfig with Clang via `make LLVM_IAS=1`.
Link: https://github.com/ClangBuiltLinux/linux/issues/763 Reported-by: Dmitry Golovin <dima@golovin.in> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
The cause of the problem is as follows:
1. when cat /sys/devices/system/memory/memory0/valid_zones,
test_pages_in_a_zone() will be called.
2. test_pages_in_a_zone() finds the zone according to start_pfn = 0.
The smallest pfn of the numa node in the mips architecture is 128,
and the page corresponding to the previous 0~127 pfn is not
initialized (page->flags is 0xFFFFFFFF)
3. The nid and zonenum obtained using page_zone(pfn_to_page(0)) are out
of bounds in the corresponding array,
&NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)],
access to the out-of-bounds zone member variables appear abnormal,
resulting in Oops.
Therefore, it is necessary to keep the page between 0 and the minimum
pfn to prevent Oops from appearing.
The first chunk in a packet is ensured to be present at the beginning of
sctp_rcv(), as a packet needs to have at least 1 chunk. But the second
one, may not be completely available and ch->length can be over
uninitialized memory.
Fix here is by only trying to walk on the next chunk if there is enough to
hold at least the header, and then proceed with the ch->length validation
that is already there.
Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Ilja reported that, simply putting it, nothing was validating that
from_addr_param functions were operating on initialized memory. That is,
the parameter itself was being validated by sctp_walk_params, but it
doesn't check for types and their specific sizes and it could be a 0-length
one, causing from_addr_param to potentially work over the next parameter or
even uninitialized memory.
The fix here is to, in all calls to from_addr_param, check if enough space
is there for the wanted IP address type.
Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
"action" should not be NULL when it is referenced.
Signed-off-by: gushengxian <13145886936@163.com> Signed-off-by: gushengxian <gushengxian@yulong.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
kmemleak scans struct page, but it does not scan the page content. If we
allocate some memory with kmalloc(), then allocate page with alloc_page(),
and if we put the kmalloc pointer somewhere inside that page, kmemleak will
report the kmalloc pointer as a false positive.
We can instruct kmemleak to scan the memory area by calling kmemleak_alloc()
and kmemleak_free(), but part of struct bpf_ringbuf is mmaped to user space,
and if struct bpf_ringbuf changes we would have to revisit and review size
argument in kmemleak_alloc(), because we do not want kmemleak to scan the
user space memory. Let's simplify things and use kmemleak_not_leak() here.
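A rough sketch of the idea (not the actual bpf_ringbuf allocation code; the helper below is purely illustrative): mark the partially user-mmapped area so kmemleak neither scans it nor reports pointers stored in it.

  #include <linux/kmemleak.h>
  #include <linux/vmalloc.h>

  static void *alloc_user_mapped_area(unsigned long size)
  {
      void *area = vmalloc_user(size);    /* later mmapped to user space */

      if (area)
          kmemleak_not_leak(area);        /* don't scan or report it */
      return area;
  }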
For posterity, also adding additional prior analysis from Andrii:
I think either kmemleak or syzbot are misreporting this. I've added a
bunch of printks around all allocations performed by BPF ringbuf. [...]
On repro side I get these two warnings:
[vmuser@archvm bpf]$ sudo ./repro
BUG: memory leak
unreferenced object 0xffff88810d538c00 (size 64):
comm "repro", pid 2140, jiffies 4294692933 (age 14.540s)
hex dump (first 32 bytes):
00 af 19 04 00 ea ff ff c0 ae 19 04 00 ea ff ff ................
80 ae 19 04 00 ea ff ff c0 29 2e 04 00 ea ff ff .........)......
backtrace:
[<0000000077bfbfbd>] __bpf_map_area_alloc+0x31/0xc0
[<00000000587fa522>] ringbuf_map_alloc.cold.4+0x48/0x218
[<0000000044d49e96>] __do_sys_bpf+0x359/0x1d90
[<00000000f601d565>] do_syscall_64+0x2d/0x40
[<0000000043d3112a>] entry_SYSCALL_64_after_hwframe+0x44/0xae
BUG: memory leak
unreferenced object 0xffff88810d538c80 (size 64):
comm "repro", pid 2143, jiffies 4294699025 (age 8.448s)
hex dump (first 32 bytes):
80 aa 19 04 00 ea ff ff 00 ab 19 04 00 ea ff ff ................
c0 ab 19 04 00 ea ff ff 80 44 28 04 00 ea ff ff .........D(.....
backtrace:
[<0000000077bfbfbd>] __bpf_map_area_alloc+0x31/0xc0
[<00000000587fa522>] ringbuf_map_alloc.cold.4+0x48/0x218
[<0000000044d49e96>] __do_sys_bpf+0x359/0x1d90
[<00000000f601d565>] do_syscall_64+0x2d/0x40
[<0000000043d3112a>] entry_SYSCALL_64_after_hwframe+0x44/0xae
Note that both reported leaks (ffff88810d538c80 and ffff88810d538c00)
correspond to pages array bpf_ringbuf is allocating and tracking properly
internally. Note also that syzbot repro doesn't close FD of created BPF
ringbufs, and even when ./repro itself exits with error, there are still
two forked processes hanging around in my system. So clearly ringbuf maps
are alive at that point. So reporting any memory leak looks weird at that
point, because that memory is being used by active referenced BPF ringbuf.
It's also a question why repro doesn't clean up its forks. But if I do a
`pkill repro`, I do see that all the allocated memory is /properly/ cleaned
up [and the] "leaks" are deallocated properly.
BTW, if I add close() right after bpf() syscall in syzbot repro, I see that
everything is immediately deallocated, like designed. And no memory leak
is reported. So I don't think the problem is anywhere in bpf_ringbuf code,
rather in the leak detection and/or repro itself.
The _sum and _avg values are in general sync together with the PELT
divider. They are however not always completely in perfect sync,
resulting in situations where _sum gets to zero while _avg stays
positive. Such situations are undesirable.
This comes from the fact that PELT will increase period_contrib, also
increasing the PELT divider, without updating _sum and _avg values to
stay in perfect sync where (_sum == _avg * divider). However, such PELT
change will never lower _sum, making it impossible to end up in a
situation where _sum is zero and _avg is not.
Therefore, we need to ensure that, when subtracting load outside PELT,
_avg is also set to zero when _sum reaches zero. This occurs when
(_sum < _avg * divider), i.e. the subtracted (_avg * divider) is bigger
than or equal to the current _sum, while the subtracted _avg is smaller
than the current _avg.
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Odin Ugedal <odin@uged.al> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20210624111815.57937-1-odin@uged.al Signed-off-by: Sasha Levin <sashal@kernel.org>
This is a BT SoC timing issue: after the host starts downloading the BT
firmware, ep2 needs time to switch from function acl to function dfu, so
the host adds a 20ms delay as a workaround.
When the Get Device Flags command fails, it returns the error status
with the parameters filled with garbage values. Although the
parameters are not used, it is better to fill them with zero than with
random values.
mSBC frames do not need to be aligned to the SCO packet
boundary. Using USB alt setting 3 allows an HCI payload >= 60 bytes,
which lets mSBC data fit in 60 bytes, avoids unaligned payloads and fixes
a no-audio issue with some headsets.
Supporting USB alt setting 3 also requires HFP to support a transparent MTU of 72 bytes.
This patch adds the 0cf3:e500 Bluetooth device (from a QCA9377 board) as a
QCA_ROME device. It appears to be functionally identical to another device
ID, also from a QCA9377 board, which was previously marked as QCA_ROME in 0a03f98b98c201191e3ba15a0e33f46d8660e1fd
("Bluetooth: Add a new 04ca:3015 QCA_ROME device").
Without this patch, the WiFi side of the QCA9377 board is slow or unusable
when the Bluetooth side is in use.
See https://askubuntu.com/a/1137852 for another report of QCA_ROME fixing
this issue for this device ID.
Rfkill block and unblock of the Intel USB Bluetooth [8087:0026] device may
make it stop working:
[ 509.691509] Bluetooth: hci0: HCI reset during shutdown failed
[ 514.897584] Bluetooth: hci0: MSFT filter_enable is already on
[ 530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
[ 545.660350] usb 3-10: device descriptor read/64, error -110
[ 561.283530] usb 3-10: device descriptor read/64, error -110
[ 561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
[ 566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500
[ 568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113)
[ 578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110)
[ 586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113)
[ 596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110)
0x2B, 0x31 and 0x33 are reserved for future use but were not present in
the HCI to MGMT conversion table; this caused the conversion to be
incorrect for HCI status codes greater than 0x2A.
Fix a memory leak when rdma_resolve_route() is called more than once on
the same rdma_cm_id.
This is possible if cma_query_handler() triggers the
RDMA_CM_EVENT_ROUTE_ERROR flow which puts the state machine back and
allows rdma_resolve_route() to be called again.
Dave observed number of machines hitting OOM on the UDP send
path. The workload seems to be sending large UDP packets over
loopback. Since loopback has MTU of 64k kernel will try to
allocate an skb with up to 64k of head space. This has a good
chance of failing under memory pressure. What's worse, if
the message length is <32k the allocation may trigger the
OOM killer.
This is entirely avoidable, we can use an skb with page frags.
af_unix solves a similar problem by limiting the head
length to SKB_MAX_ALLOC. This seems like a good and simple
approach. It means that UDP messages > 16kB will now
use fragments if underlying device supports SG, if extra
allocator pressure causes regressions in real workloads
we can switch to trying the large allocation first and
falling back.
v4: pre-calculate all the additions to alloclen so
we can be sure it won't go over order-2
Reported-by: Dave Jones <dsj@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
It's getting more common to run nested container environments for
testing cloud software. One of such examples is Kind [1] which runs a
Kubernetes cluster in Docker containers on a single host. Each container
acts as a Kubernetes node, and thus can run any Pod (aka container)
inside the former. This approach simplifies testing a lot, as it
eliminates complicated VM setups.
Unfortunately, such a setup breaks some functionality when cgroupv2 BPF
programs are used for load-balancing. The load-balancer BPF program
needs to detect whether a request originates from the host netns or a
container netns in order to allow some access, e.g. to a service via a
loopback IP address. Typically, the programs detect this by comparing
netns cookies with the one of the init ns via a call to
bpf_get_netns_cookie(NULL). However, in nested environments the latter
cannot be used given the Kubernetes node's netns is outside the init ns.
To fix this, we need to pass the Kubernetes node netns cookie to the
program in a different way: by extending getsockopt() with a
SO_NETNS_COOKIE option, the orchestrator which runs in the Kubernetes
node netns can retrieve the cookie and pass it to the program instead.
Thus, this is following up on Eric's commit 3d368ab87cf6 ("net:
initialize net->net_cookie at netns setup") to allow retrieval via
SO_NETNS_COOKIE. This is also in line in how we retrieve socket cookie
via SO_COOKIE.
[1] https://kind.sigs.k8s.io/
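A minimal userspace sketch of retrieving the cookie (the fallback option value comes from the asm-generic uapi header; architectures with their own socket.h may differ):

  #include <stdio.h>
  #include <stdint.h>
  #include <sys/socket.h>

  #ifndef SO_NETNS_COOKIE
  #define SO_NETNS_COOKIE 71      /* value in asm-generic/socket.h */
  #endif

  int main(void)
  {
      uint64_t cookie = 0;
      socklen_t len = sizeof(cookie);
      int fd = socket(AF_INET, SOCK_DGRAM, 0);

      if (fd < 0 ||
          getsockopt(fd, SOL_SOCKET, SO_NETNS_COOKIE, &cookie, &len) < 0) {
          perror("SO_NETNS_COOKIE");
          return 1;
      }
      printf("netns cookie of this socket: %llu\n",
             (unsigned long long)cookie);
      return 0;
  }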
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Martynas Pumputis <m@lambda.lt> Cc: Eric Dumazet <edumazet@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The syscall bpf(BPF_PROG_QUERY, &attr) should use the prog_cnt field to
see how many entries user space provided and return ENOSPC if there are
more programs than that. Before this patch, this is not checked and
ENOSPC is never returned.
Note that one lirc device is limited to 64 bpf programs, and user space
I'm aware of -- ir-keytable -- always gives enough space for 64 entries
already. However, we should not copy more program ids than are requested.
Since the Linux iSER initiator's default max I/O size is set to 512KB, and since
there is no handshake procedure for this size in the iSER protocol, set the
default max I/O size of the target to 512KB as well.
For changing the default values, there is a module parameter for both
drivers.
Link: https://lore.kernel.org/r/20210524085215.29005-1-mgurtovoy@nvidia.com Reviewed-by: Alaa Hleihel <alaa@nvidia.com> Reviewed-by: Israel Rukshin <israelr@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
1. The device driver requests HW restart.
2. A scan is requested from user space and is propagated
to the driver. During this flow HW_SCANNING flag is set.
3. The thread that handles the HW restart is scheduled,
and before starting the actual reconfiguration it
checks that HW_SCANNING is not set. The flow does so
without acquiring any lock, and thus the WARN fires.
Fix this by checking that HW_SCANNING is on only after RTNL is
acquired, i.e., user space scan request handling is no longer
in transit.
If we have been keeping per-CPU statistics, consider them
regardless of USES_RSS, because we may not actually fill
those, for example in non-fast-RX cases when the connection
is not compatible with fast-RX. If we didn't fill them, the
additional data will be zero and not affect anything, and
if we did fill them then it's more correct to consider them.
This fixes an issue in mesh mode where some statistics are
not updated due to USES_RSS being set, but fast-RX isn't
used.
In the 2 GHz band, an HE STA can only support HT and HE, but not VHT.
In this case, the default HE tx bitrate mask isn't filled when we use iw to
set bitrates without any parameter.
net/wireless/wext-spy.c:178:2: warning: 'memcpy' offset [25, 28] from the object at 'threshold' is out of the bounds of referenced subobject 'low' with type 'struct iw_quality' at offset 20 [-Warray-bounds]
The problem is that the original code is trying to copy data into a
couple of struct members adjacent to each other in a single call to
memcpy(). This causes a legitimate compiler warning because memcpy()
overruns the length of &threshold.low and &spydata->spy_thr_low. As
these are just a couple of struct members, fix this by using direct
assignments, instead of memcpy().
This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().
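A self-contained sketch of the pattern (stand-in types; the real ones are struct iw_quality and the wext spy data): copy the two adjacent members with two assignments instead of one over-long memcpy().

  struct quality { unsigned char qual, level, noise, updated; };

  struct spy_data {
      struct quality spy_thr_low;
      struct quality spy_thr_high;
  };

  struct threshold {
      struct quality low;
      struct quality high;
  };

  static void set_thresholds(struct spy_data *spy, const struct threshold *thr)
  {
      /* instead of: memcpy(&spy->spy_thr_low, &thr->low, 2 * sizeof(struct quality)) */
      spy->spy_thr_low = thr->low;
      spy->spy_thr_high = thr->high;
  }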
If SRIOV cannot be disabled during device removal or module unloading,
return an error code so it can be logged properly in the calling function.
Note that this can only happen if any VF is currently attached to a
guest using Xen, but not with vfio/KVM. Although in that case the
VFs won't work properly with the PF removed and/or the module unloaded, I
have left it as is because I don't know what side effects changing it may
have, and it also seems to be the same thing other drivers do in this
situation.
In the case of being called during SRIOV reconfiguration, the behavior
hasn't changed because the function is called with force=false.
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
If pci_remove was called for a PF with VFs, the removal of the VFs was
called twice from efx_ef10_sriov_fini: once directly with pci_driver->remove
and again implicitly by calling pci_disable_sriov, which also performs
the VF removal. This was leading to a kernel crash on the second
attempt.
Given that pci_disable_sriov already calls the PCI remove function, get
rid of the direct call to pci_driver->remove from the driver.
2 different ways to trigger the bug:
- Create one or more VFs, then attach the PF to a virtual machine (at
least with qemu/KVM)
- Create one or more VFs, then remove the PF with:
echo 1 > /sys/bus/pci/devices/PF_PCI_ID/remove
Removing sfc module does not trigger the error, at least for me, because
it removes the VF first, and then the PF.
Example of a log with the error:
list_del corruption, ffff967fd20a8ad0->next is LIST_POISON1 (dead000000000100)
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:47!
[...trimmed...]
RIP: 0010:__list_del_entry_valid.cold.1+0x12/0x4c
[...trimmed...]
Call Trace:
efx_dissociate+0x1f/0x140 [sfc]
efx_pci_remove+0x27/0x150 [sfc]
pci_device_remove+0x3b/0xc0
device_release_driver_internal+0x103/0x1f0
pci_stop_bus_device+0x69/0x90
pci_stop_and_remove_bus_device+0xe/0x20
pci_iov_remove_virtfn+0xba/0x120
sriov_disable+0x2f/0xe0
efx_ef10_pci_sriov_disable+0x52/0x80 [sfc]
? pcie_aer_is_native+0x12/0x40
efx_ef10_sriov_fini+0x72/0x110 [sfc]
efx_pci_remove+0x62/0x150 [sfc]
pci_device_remove+0x3b/0xc0
device_release_driver_internal+0x103/0x1f0
unbind_store+0xf6/0x130
kernfs_fop_write+0x116/0x190
vfs_write+0xa5/0x1a0
ksys_write+0x4f/0xb0
do_syscall_64+0x5b/0x1a0
entry_SYSCALL_64_after_hwframe+0x65/0xca
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
8821CE with ASPM cannot work properly on Protempo Ltd L116HTN6SPW. Add a
quirk to disable the cap.
The reporter describes the symptom is that this module (driver) causes
frequent freezes, randomly but usually within a few minutes of running
(thus very soon after boot): screen display remains frozen, no response
to either keyboard or mouse input. All I can do is to hold the power
button to power off, then reboot.
After the firmware is alive, iwl_trans_pcie_gen2_fw_alive() is called
to free the context info. However, on gen3 that will then free
the context info with the wrong size.
Since we free this allocation later, let it stick around until
the device is stopped for now; freeing some of it earlier is a
separate change.
When the session protection ends and the Driver is not
associated or a beacon was not heard, the Driver
prints "No beacons heard...".
That's confusing for the case where we're not associated.
Change the print when not associated to "Not associated...".
SMPS requests may differ per interfaces due to e.g. Bluetooth
only interfering on 2.4 GHz, so if that's the case we should,
in the case of multiple PHY contexts, still allow RX diversity
on PHY context that have no interfaces with SMPS requests.
Fix the code to pass through the PHY context in question and
skip interfaces with non-matching PHY context while iterating.
When we have a P2P Device active, we attempt to only change the
PHY context it uses when we get a new remain-on-channel, if the
P2P Device is the only user of the PHY context.
This is fine if we're switching within a band, but if we're
switching bands then the switch implies a removal and re-add
of the PHY context, which isn't permitted by the firmware while
it's bound to an interface.
Fix the code to skip the unbind/release/... cycle only if the
band doesn't change (or we have old devices that can switch the
band on the fly as well.)
rxe_mr_init_user() always returns the fixed -EINVAL when ib_umem_get()
fails, so it's hard for the user to know which actual error happened in
ib_umem_get(). For example, ib_umem_get() will return -EOPNOTSUPP when
trying to pin pages on a DAX file.
The client's sk_state will be set to TCP_ESTABLISHED if the server
replies to the client's connect request.
However, if the client has a pending signal, its sk_state will be set
to TCP_CLOSE without notifying the server, so the server will hold the
corrupted connection.
client server
1. sk_state=TCP_SYN_SENT |
2. call ->connect() |
3. wait reply |
| 4. sk_state=TCP_ESTABLISHED
| 5. insert to connected list
| 6. reply to the client
7. sk_state=TCP_ESTABLISHED |
8. insert to connected list |
9. *signal pending* <--------------------- the user kill client
10. sk_state=TCP_CLOSE |
client is exiting... |
11. call ->release() |
virtio_transport_close
if (!(sk->sk_state == TCP_ESTABLISHED ||
sk->sk_state == TCP_CLOSING))
return true; *return at here, the server cannot notice the connection is corrupt*
So the client should notify the peer in this case.
Cc: David S. Miller <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jorgen Hansen <jhansen@vmware.com> Cc: Norbert Slusarek <nslusarek@gmx.net> Cc: Andra Paraschiv <andraprs@amazon.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: David Brazdil <dbrazdil@google.com> Cc: Alexander Popov <alex.popov@linux.com> Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lkml.org/lkml/2021/5/17/418 Signed-off-by: lixianming <lixianming5@huawei.com> Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Because the error handling is sequential, resources should be acquired
in the same order in which the error handling releases them, so the
registration of the interrupt handler should be moved earlier, so as not
to free an unregistered interrupt handler during error handling.
When 'nicstar_init_one' fails, 'ns_init_card_error' will be executed for
error handling, but the correct memory free function should be used,
otherwise it will cause an error. Since 'card->rsq.org' and
'card->tsq.org' are allocated using 'dma_alloc_coherent' function, they
should be freed using 'dma_free_coherent'.
Fix this by using 'dma_free_coherent' instead of 'kfree'.
As we know, AVB is enabled by default, and the ENET IP design uses
queue 0 for best effort and queues 1&2 for AVB Class A&B. The bandwidth of
each of queues 1&2 is set to 50% in the driver, so TX bandwidth fluctuates
when tx queues are selected randomly with the FEC_QUIRK_HAS_AVB quirk
available.
This patch adds ndo_select_queue callback to select queues for
transmitting to fix this issue. It will always return queue 0 if this is
not a vlan packet, and return queue 1 or 2 based on priority of vlan
packet.
You may complain that in fact we only use a single queue for transmitting
if we are not targeting VLAN. Yes, but it seems we have no choice: since
AVB is enabled when the driver is probed, we can't switch this feature
dynamically. Comparing multiple queues to a single queue, TX throughput
shows almost no improvement.
One way we could implement this is to configure the driver for multiple
queues with a round-robin scheme by default, and then add an ndo_setup_tc
callback to enable/disable the AVB feature for users. Unfortunately, the
ENET AVB IP does not seem to follow the standard 802.1Qav spec. We can
only program the DMAnCFG[IDLE_SLOPE] field to calculate the bandwidth
fraction, and the idle slope is restricted to certain values (a total of
19). That is far away from the CBS qdisc implemented in the Linux TC
framework. If you strongly suggest doing this, I think we could only
support a limited number of bandwidths and reject others, but that is
really ugly and weird.
With this patch, VLAN tagged packets route to queue 0/1/2 based on vlan
priority; VLAN untagged packets route to queue 0.
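A rough sketch of the selection rule (simplified; not the exact FEC implementation, and the priority-to-queue mapping is illustrative):

  #include <linux/netdevice.h>
  #include <linux/if_vlan.h>

  static u16 example_select_queue(struct net_device *ndev, struct sk_buff *skb,
                                  struct net_device *sb_dev)
  {
      u16 vlan_tci;

      if (vlan_get_tag(skb, &vlan_tci))
          return 0;                       /* untagged: best-effort queue 0 */

      /* map the VLAN priority bits to the two AVB queues */
      return (vlan_tci >> VLAN_PRIO_SHIFT) < 4 ? 2 : 1;
  }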
Tested-by: Frieder Schrempf <frieder.schrempf@kontron.de> Reported-by: Frieder Schrempf <frieder.schrempf@kontron.de> Signed-off-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Frieder Schrempf reported a TX throughput issue [1]; it happens quite often
that the measured bandwidth in TX direction drops from its expected/nominal
value to something like ~50% (for 100M) or ~67% (for 1G) connections.
The issue becomes clear after digging into it: the net core selects
queues when transmitting packets. Since the FEC driver has not implemented
the ndo_select_queue callback yet, it will call netdev_pick_tx to select
queues randomly.
For the i.MX6SX ENET IP with AVB support, the driver enables this feature
by default. According to the setting of the QOS/RCMRn/DMAnCFG registers, AVB
is configured to a credit-based scheme with 50% bandwidth for each of
queues 1&2.
The following tests made me think more:
1) With the FEC_QUIRK_HAS_AVB quirk, the TX bandwidth fluctuation issue can be reproduced.
2) Without the FEC_QUIRK_HAS_AVB quirk, the TX bandwidth fluctuation issue can't be reproduced.
The relevant difference with or without the FEC_QUIRK_HAS_AVB quirk is
whether we program the FTYPE field of the TxBD or not. As described above,
the AVB feature is enabled by default. With the FEC_QUIRK_HAS_AVB quirk,
frames in queue 0 are marked as non-AVB, and frames in queues 1&2 are
marked as AVB Class A&B. That is unreasonable if frames in queues 1&2 are
not required to be time-sensitive. So when the net core selects tx queues
randomly, the credit-based scheme kicks in and leads to fluctuating TX
bandwidth. On the other hand, without the FEC_QUIRK_HAS_AVB quirk, frames
in queues 1&2 are all marked as non-AVB, so the credit-based scheme does
not take effect.
So how can we fix this TX throughput issue? You can remove the
FEC_QUIRK_HAS_AVB quirk if you suffer from it with time-nonsensitive
networking. However, this quirk is used to identify i.MX6SX, and other
settings depend on it. So this patch adds a new quirk,
FEC_QUIRK_HAS_MULTI_QUEUES, to represent i.MX6SX, making it safe to remove
the FEC_QUIRK_HAS_AVB quirk now.
The FEC_QUIRK_HAS_AVB quirk is set by default in the driver, and users may
not know much about driver details; they would waste effort finding the
root cause, which is not what we want. The following patch is an
implementation that fixes it without users needing to modify the driver.
Tested-by: Frieder Schrempf <frieder.schrempf@kontron.de> Reported-by: Frieder Schrempf <frieder.schrempf@kontron.de> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>