If debugging on i.MX is enabled DEBUG_IMX_UART_PORT defines which UART
is used for the debug output. If however debugging is off don't only
hide the then unused config item but drop it completely by using a
dependency instead of a conditional prompt.
This fixes DEBUG_IMX_UART_PORT being present in the kernel config even
if DEBUG_LL is disabled.
On the other hand, synthetic event creation and deletion paths
call trace_add_event_call() and trace_remove_event_call()
which acquires event_mutex. In that case, if we keep the
synth_event_mutex locked while registering/unregistering synthetic
events, its dependency will be inversed.
To avoid this issue, current synthetic event is using a 2 phase
process to create/delete events. For example, it searches existing
events under synth_event_mutex to check for event-name conflicts, and
unlocks synth_event_mutex, then registers a new event under event_mutex
locked. Finally, it locks synth_event_mutex and tries to add the
new event to the list. But it can introduce complexity and a chance
for name conflicts.
To solve this simpler, this introduces trace_add_event_call_nolock()
and trace_remove_event_call_nolock() which don't acquire
event_mutex inside. synthetic event can lock event_mutex before
synth_event_mutex to solve the lock dependency issue simpler.
Link: http://lkml.kernel.org/r/154140844377.17322.13781091165954002713.stgit@devbox Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
This sets the partition information on the SQ201 to be read
out from the RedBoot partition table, removes the static
partition table and sets our boot options to mount root from
/dev/mtdblock2 where the squashfs+JFFS2 resides.
When dif and first burst is used in a write command wqe, the driver was not
properly setting fields in the io command request. This resulted in no dif
bytes being sent and invalid xfer_rdy's, resulting in the io being aborted
by the hardware.
Correct the wqe initializaton when both dif and first burst are used.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Driver is hitting null pring pointers in lpfc_do_work().
Pointer assignment occurs based on SLI-revision. If recovering after an
error, its possible the sli revision for the port was cleared, making the
lpfc_phba_elsring() not return a ring pointer, thus the null pointer.
Add SLI revision checking to lpfc_phba_elsring() and status checking to all
callers.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
This patch does not change any functionality but avoids that sparse
complains about the queue_cmd_ring() function and its callers.
Fixes: 6fd0ce79724d ("tcmu: prep queue_cmd_ring to be used by unmap wq") Reviewed-by: David Disseldorp <ddiss@suse.de> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Cc: Mike Christie <mchristi@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The owner member of struct pwm_ops must be set to THIS_MODULE to
increase the reference count of the module such that the module cannot
be removed while its code is in use.
In the first 5 minutes after boot (time of INITIAL_JIFFIES),
ieee80211_sta_last_active() returns zero if last_ack is zero. This
leads to "inactive time" showing jiffies_to_msecs(jiffies).
# iw wlan0 station get fc:ec:da:64:a6:dd
Station fc:ec:da:64:a6:dd (on wlan0)
inactive time: 4294894049 ms
.
.
connected time: 70 seconds
The "read-modify-write register index" function is declared with a
confusing prototype: the "mask" and "reg" arguments are swapped.
Fortunately, this does not affect callers so far. Both arguments are
u32, and the wrapper macros (ocelot_rmw_ix etc) have the arguments in
the correct order (the one from ocelot_io.c).
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The phy_init_hw() function may reset the PHY to a configuration
that does not match manual network settings stored in the phydev
structure. If the phy state machine is polled rather than event
driven this can create a timing hazard where the phy state machine
might alter the settings stored in the phydev structure from the
value read from the BMCR.
This commit follows invocations of phy_init_hw() by the bcmgenet
driver with invocations of the genphy_config_aneg() function to
ensure that the BMCR is written to match the settings held in the
phydev structure. This prevents the risk of manual settings being
accidentally altered.
Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file") Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
As noted in commit 28c2d1a7a0bf ("net: bcmgenet: enable loopback
during UniMAC sw_reset") the UniMAC must be clocked while sw_reset
is asserted for its state machines to reset cleanly.
The transmit and receive clocks used by the UniMAC are derived from
the signals used on its PHY interface. The bcmgenet MAC can be
configured to work with different PHY interfaces including MII,
GMII, RGMII, and Reverse MII on internal and external interfaces.
Unfortunately for the UniMAC, when configured for MII the Tx clock
is always driven from the PHY which places it outside of the direct
control of the MAC.
The earlier commit enabled a local loopback mode within the UniMAC
so that the receive clock would be derived from the transmit clock
which addressed the observed issue with an external GPHY disabling
it's Rx clock. However, when a Tx clock is not available this
loopback is insufficient.
This commit implements a workaround that leverages the fact that
the MAC can reliably generate all of its necessary clocking by
enterring the external GPHY RGMII interface mode with the UniMAC in
local loopback during the sw_reset interval. Unfortunately, this
has the undesirable side efect of the RGMII GTXCLK signal being
driven during the same window.
In most configurations this is a benign side effect as the signal
is either not routed to a pin or is already expected to drive the
pin. The one exception is when an external MII PHY is expected to
drive the same pin with its TX_CLK output creating output driver
contention.
This commit exploits the IEEE 802.3 clause 22 standard defined
isolate mode to force an external MII PHY to present a high
impedance on its TX_CLK output during the window to prevent any
contention at the pin.
The MII interface is used internally with the 40nm internal EPHY
which agressively disables its clocks for power savings leading to
incomplete resets of the UniMAC and many instabilities observed
over the years. The workaround of this commit is expected to put
an end to those problems.
Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file") Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
gcc's -freorder-blocks-and-partition option makes it group frequently
and infrequently used code in .text.hot and .text.unlikely sections
respectively. At least when building modules on s390, this option is
used by default.
gdb assumes that all code is located in .text section, and that .text
section is located at module load address. With such modules this is no
longer the case: there is code in .text.hot and .text.unlikely, and
either of them might precede .text.
Fix by explicitly telling gdb the addresses of code sections.
It might be tempting to do this for all sections, not only the ones in
the white list. Unfortunately, gdb appears to have an issue, when
telling it about e.g. loadable .note.gnu.build-id section causes it to
think that non-loadable .note.Linux section is loaded at address 0,
which in turn causes NULL pointers to be resolved to bogus symbols. So
keep using the white list approach for the time being.
The left time value is wrong when we get it by sysfs. The left time value
should be equal to preset timeout value minus elapsed time value. According
to the Meson-GXB/GXL datasheets which can be found at [0], the timeout value
is saved to BIT[0-15] of the WATCHDOG_TCNT, and elapsed time value is saved
to BIT[16-31] of the WATCHDOG_TCNT.
In mcp251x_restart_work_handler() the variable to stop the interrupt
handler (priv->force_quit) is reset after the chip is restarted and thus
a interrupt might occur.
This patch fixes the potential race condition by resetting force_quit
before enabling interrupts.
Signed-off-by: Timo Schlüßler <schluessler@krause.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
The call to can_rx_offload_queue_sorted() may fail and return an error
(in the current implementation due to resource shortage). The passed skb
is consumed.
This patch adds incrementing of the appropriate error counters to let
the device statistics reflect that there's a problem.
Reported-by: Martin Hundebøll <martin@geanix.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
In case of a resource shortage, i.e. the rx_offload queue will overflow
or a skb fails to be allocated (due to OOM),
can_rx_offload_offload_one() will call mailbox_read() to discard the
mailbox and return an ERR_PTR.
If the hardware FIFO is empty can_rx_offload_offload_one() will return
NULL.
In case a CAN frame was read from the hardware,
can_rx_offload_offload_one() returns the skb containing it.
Without this patch can_rx_offload_irq_offload_fifo() bails out if no skb
returned, regardless of the reason.
Similar to can_rx_offload_irq_offload_timestamp() in case of a resource
shortage the whole FIFO should be discarded, to avoid an IRQ storm and
give the system some time to recover. However if the FIFO is empty the
loop can be left.
With this patch the loop is left in case of empty FIFO, but not on
errors.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
In case of a resource shortage, i.e. the rx_offload queue will overflow
or a skb fails to be allocated (due to OOM),
can_rx_offload_offload_one() will call mailbox_read() to discard the
mailbox and return an ERR_PTR.
However can_rx_offload_irq_offload_timestamp() bails out in the error
case. In case of a resource shortage all mailboxes should be discarded,
to avoid an IRQ storm and give the system some time to recover.
Since can_rx_offload_irq_offload_timestamp() is typically called from a
while loop, all message will eventually be discarded. So let's continue
on error instead to discard them directly.
The skb_queue is a linked list, holding the skb to be processed in the
next NAPI call.
Without this patch, the queue length in can_rx_offload_offload_one() is
limited to skb_queue_len_max + 1. As the skb_queue is a linked list, no
array or other resources are accessed out-of-bound, however this
behaviour is counterintuitive.
This patch limits the rx-offload skb_queue length to skb_queue_len_max.
Fixes: d254586c3453 ("can: rx-offload: Add support for HW fifo based irq offloading") Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the rx-offload skb_queue is full can_rx_offload_queue_tail() will not
queue the skb and return with an error.
This patch frees the skb in case of a full queue, which brings
can_rx_offload_queue_tail() in line with the
can_rx_offload_queue_sorted() function, which has been adjusted in the
previous patch.
The return value is adjusted to -ENOBUFS to better reflect the actual
problem.
The device stats handling is left to the caller.
Fixes: d254586c3453 ("can: rx-offload: Add support for HW fifo based irq offloading") Reported-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
When the CAN interface is closed it the hardwre is put in power down
mode, but does not reset the error counters / state. Reset the D_CAN on
open, so the reported state and the actual state match.
According to [1], the C_CAN module doesn't have the software reset.
While the state changes are reported when the error counters increase
and decrease, there is no event when the bus recovers and the error
counters decrease again. So add those as well.
Change the state going downward to be ERROR_PASSIVE -> ERROR_WARNING ->
ERROR_ACTIVE instead of directly to ERROR_ACTIVE again.
Commit 3d8598fb9c5a ("clk: ti: clkctrl: use fallback udelay approach if
timekeeping is suspended") added handling for cases when timekeeping is
suspended. But looks like we can still get occasional "failed to enable"
errors on the PM runtime resume path with udelay() returning faster than
expected.
With ti-sysc interconnect target module driver this leads into device
failure with PM runtime failing with "failed to enable" clkctrl error.
Let's fix the issue with a delay of two times the desired delay as in
often done for udelay() to account for the inaccuracy.
Fixes: 3d8598fb9c5a ("clk: ti: clkctrl: use fallback udelay approach if timekeeping is suspended") Cc: Keerthy <j-keerthy@ti.com> Cc: Tero Kristo <t-kristo@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Link: https://lkml.kernel.org/r/20190930154001.46581-1-tony@atomide.com Tested-by: Keerthy <j-keerthy@ti.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When a mon group is being deleted, rdtgrp->flags is set to RDT_DELETED
in rdtgroup_rmdir_mon() firstly. The structure of rdtgrp will be freed
until rdtgrp->waitcount is dropped to 0 in rdtgroup_kn_unlock() later.
During the window of deleting a mon group, if an application calls
rdtgroup_mondata_show() to read mondata under this mon group,
'rdtgrp' returned from rdtgroup_kn_lock_live() is a NULL pointer when
rdtgrp->flags is RDT_DELETED. And then 'rdtgrp' is passed in this path:
rdtgroup_mondata_show() --> mon_event_read() --> mon_event_count().
Thus it results in NULL pointer dereference in mon_event_count().
Check 'rdtgrp' in rdtgroup_mondata_show(), and return -ENOENT
immediately when reading mondata during the window of deleting a mon
group.
Attempting to allocate an entry at 0xffffffff when one is already
present would succeed in allocating one at 2^32, which would confuse
everything. Return -ENOSPC in this case, as expected.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
If there is an entry at INT_MAX then idr_for_each_entry() will increment
id after handling it. This is undefined behaviour, and is caught by
UBSAN. Adding 1U to id forces the operation to be carried out as an
unsigned addition which (when assigned to id) will result in INT_MIN.
Since there is never an entry stored at INT_MIN, idr_get_next() will
return NULL, ending the loop as expected.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
We have seen many crashes on powerpc hosts while loading bpf programs.
The problem here is that bpf_int_jit_compile() does a first pass
to compute the program length.
Then it allocates memory to store the generated program and
calls bpf_jit_build_body() a second time (and a third time
later)
What I have observed is that the second bpf_jit_build_body()
could end up using few more words than expected.
If bpf_jit_binary_alloc() put the space for the program
at the end of the allocated page, we then write on
a non mapped memory.
It appears that bpf_jit_emit_tail_call() calls
bpf_jit_emit_common_epilogue() while ctx->seen might not
be stable.
Only after the second pass we can be sure ctx->seen wont be changed.
Trying to avoid a second pass seems quite complex and probably
not worth it.
Fixes: ce0761419faef ("powerpc/bpf: Implement support for tail calls") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com> Cc: Sandipan Das <sandipan@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20191101033444.143741-1-edumazet@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
The zero'ing of bits 16 and 18 is incorrect. Currently the code
is masking with the bitwise-and of BIT(16) & BIT(18) which is
0, so the updated value for val is always zero. Fix this by bitwise
and-ing value with the correct mask that will zero bits 16 and 18.
Addresses-Coverity: (" Suspicious &= or |= constant expression") Fixes: b8eb71dcdd08 ("clk: sunxi-ng: Add A80 CCU") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
r375326 in Clang exposes an issue with operator precedence in
sunxi_div_clk_setup:
drivers/clk/sunxi/clk-sunxi.c:1083:30: warning: operator '?:' has lower
precedence than '|'; '|' will be evaluated first
[-Wbitwise-conditional-parentheses]
data->div[i].critical ?
~~~~~~~~~~~~~~~~~~~~~ ^
drivers/clk/sunxi/clk-sunxi.c:1083:30: note: place parentheses around
the '|' expression to silence this warning
data->div[i].critical ?
^
)
drivers/clk/sunxi/clk-sunxi.c:1083:30: note: place parentheses around
the '?:' expression to evaluate it first
data->div[i].critical ?
^
(
1 warning generated.
It appears that the intention was for ?: to be evaluated first so that
CLK_IS_CRITICAL could be added to clkflags if the critical boolean was
set; right now, | is being evaluated first. Add parentheses around the
?: block to have it be evaluated first.
Fixes: 9919d44ff297 ("clk: sunxi: Use CLK_IS_CRITICAL flag for critical clks") Link: https://github.com/ClangBuiltLinux/linux/issues/745 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
It is not allowed to sleep to early in the boot process and this may lead
to kernel issues if the bootloader didn't prepare the slow clock and main
clock.
This results in the following error and dump stack on the AriettaG25:
bad: scheduling from the idle thread!
Ensure it is possible to sleep, else simply have a delay.
The MMA8451 interrupt triggers as low level, so the GPIO6_IO31 pin
needs to activate its pull up, otherwise it will stay always at low level
generating multiple interrupts.
The current device tree does not configure the IOMUX for this pin, so
it uses whathever comes configured from the bootloader.
The IOMUXC_SW_PAD_CTL_PAD_EIM_BCLK register value comes as 0x8000 from
the bootloader, which has PKE bit cleared, hence disabling the
pull-up.
Instead of relying on a previous configuration from the bootloader,
configure the GPIO6_IO31 pin with pull-up enabled in order to fix
this problem.
Keeping the IRQ chip definition static shares it with multiple instances
of the GPIO chip in the system. This is bad and now we get this warning
from GPIO library:
"detected irqchip that is shared with multiple gpiochips: please fix the driver."
Hence, move the IRQ chip definition from being driver static into the struct
intel_pinctrl. So a unique IRQ chip is used for each GPIO chip instance.
This patch is heavily based on the attachment to the bug by Christoph Marz.
Properly save and restore all top PLL related configuration registers
during suspend/resume cycle. So far driver only handled EPLL and RPLL
clocks, all other were reset to default values after suspend/resume cycle.
This caused for example lower G3D (MALI Panfrost) performance after system
resume, even if performance governor has been selected.
Reported-by: Reported-by: Marian Mihailescu <mihailescu2m@gmail.com> Fixes: 773424326b51 ("clk: samsung: exynos5420: add more registers to restore list") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The devm conversion of kirkwood was incorrect; on removal, devm takes
effect after the "remove" function has returned. So, the effect of
the conversion was to change the order during remove from:
- clk_disable_unprepare() - while the device may still be active
- snd_soc_unregister_component()
- cleanup resources
Hence, it introduces a bug, where the internal clock for the device
may be shut down before the device itself has been shut down. It is
known that Marvell SoCs, including Dove, locks up if registers for a
peripheral that has its clocks disabled are accessed.
Fixes: f98fc0f8154e ("ASoC: kirkwood: replace platform to component") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1iNGyP-0004oN-BA@rmk-PC.armlinux.org.uk Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When our call to get the external clock fails, we forget to clean up
the enabled internal clock correctly. Enable the clock after we have
obtained all our resources.
Fixes: 84aac6c79bfd ("ASoC: kirkwood: fix loss of external clock at probe time") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1iNGyK-0004oF-6A@rmk-PC.armlinux.org.uk Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Memory allocated for 'struct reset_control_array' in
of_reset_control_array_get() is never freed in
reset_control_array_put() resulting in kmemleak showing
the following backtrace.
Fixes: 17c82e206d2a ("reset: Add APIs to manage array of resets") Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
According to the PM8916 Hardware Register Description,
CDC_D_CDC_CONN_HPHR_DAC_CTL has only a single bit (RX_SEL)
to switch between RX1 (0) and RX2 (1). It is not possible to
disable it entirely to achieve the "ZERO" state.
However, at the moment the "RDAC2 MUX" mixer defines three possible
values ("ZERO", "RX2" and "RX1"). Setting the mixer to "ZERO"
actually configures it to RX1. Setting the mixer to "RX1" has
(seemingly) no effect.
Remove "ZERO" and replace it with "RX1" to fix this.
Fixes: 585e881e5b9e ("ASoC: codecs: Add msm8916-wcd analog codec") Signed-off-by: Stephan Gerhold <stephan@gerhold.net> Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20191020153007.206070-1-stephan@gerhold.net Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When timer_of_init fails, it cleans up after itself by undoing
everything it did during the initialization function.
mtk_syst_init and mtk_gpt_init both call timer_of_cleanup if
timer_of_init fails. timer_of_cleanup try to release the resource
taken. Since these resources have already been cleaned up by
timer_of_init, we end up getting a few warnings printed:
The meson-saradc driver manually sets the input clock for
sar_adc_clk_sel. Update the GXBB clock driver (which is used on GXBB,
GXL and GXM) so the rate settings on sar_adc_clk_div are propagated up
to sar_adc_clk_sel which will let the common clock framework select the
best matching parent clock if we want that.
This makes sar_adc_clk_div consistent with the axg-aoclk and g12a-aoclk
drivers, which both also specify CLK_SET_RATE_PARENT.
Fixes: 33d0fcdfe0e870 ("clk: gxbb: add the SAR ADC clocks and expose them") Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
A bit unexpectedly (but still documented), request_module may
return a positive value, in case of a modprobe error.
This is currently causing issues in the devfreq framework.
When a request_module exits with a positive value, we currently
return that via ERR_PTR. However, because the value is positive,
it's not a ERR_VALUE proper, and is therefore treated as a
valid struct devfreq_governor pointer, leading to a kernel oops.
Fix this by returning -EINVAL if request_module returns a positive
value.
On some systems that are vulnerable to Spectre v2, it is up to
software to flush the link stack (return address stack), in order to
protect against Spectre-RSB.
When exiting from a guest we do some house keeping and then
potentially exit to C code which is several stack frames deep in the
host kernel. We will then execute a series of returns without
preceeding calls, opening up the possiblity that the guest could have
poisoned the link stack, and direct speculative execution of the host
to a gadget of some sort.
To prevent this we add a flush of the link stack on exit from a guest.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[dja: straightforward backport to v4.19] Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit ee13cb249fab ("powerpc/64s: Add support for software count
cache flush"), I added support for software to flush the count
cache (indirect branch cache) on context switch if firmware told us
that was the required mitigation for Spectre v2.
As part of that code we also added a software flush of the link
stack (return address stack), which protects against Spectre-RSB
between user processes.
That is all correct for CPUs that activate that mitigation, which is
currently Power9 Nimbus DD2.3.
What I got wrong is that on older CPUs, where firmware has disabled
the count cache, we also need to flush the link stack on context
switch.
To fix it we create a new feature bit which is not set by firmware,
which tells us we need to flush the link stack. We set that when
firmware tells us that either of the existing Spectre v2 mitigations
are enabled.
Then we adjust the patching code so that if we see that feature bit we
enable the link stack flush. If we're also told to flush the count
cache in software then we fall through and do that also.
On the older CPUs we don't need to do do the software count cache
flush, firmware has disabled it, so in that case we patch in an early
return after the link stack flush.
The naming of some of the functions is awkward after this patch,
because they're called "count cache" but they also do link stack. But
we'll fix that up in a later commit to ease backporting.
This is the fix for CVE-2019-18660.
Reported-by: Anthony Steinhauser <asteinhauser@google.com> Fixes: ee13cb249fab ("powerpc/64s: Add support for software count cache flush") Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add support for disabling the kernel implemented spectre v2 mitigation
(count cache flush on context switch) via the nospectre_v2 and
mitigations=off cmdline options.
Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190524024647.381-1-cmr@informatik.wtf Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The userspace comedilib function 'get_cmd_generic_timed' fills
the cmd structure with an informed guess and then calls the
function 'usbduxfast_ai_cmdtest' in this driver repeatedly while
'usbduxfast_ai_cmdtest' is modifying the cmd struct until it
no longer changes. However, because of rounding errors this never
converged because 'steps = (cmd->convert_arg * 30) / 1000' and then
back to 'cmd->convert_arg = (steps * 1000) / 30' won't be the same
because of rounding errors. 'Steps' should only be converted back to
the 'convert_arg' if 'steps' has actually been modified. In addition
the case of steps being 0 wasn't checked which is also now done.
These are the Foxconn-branded variants of the Dell DW5821e modules,
same USB layout as those. The device exposes AT, NMEA and DIAG ports
in both USB configurations.
The driver was setting the device remote-wakeup feature during probe in
violation of the USB specification (which says it should only be set
just prior to suspending the device). This could potentially waste
power during suspend as well as lead to spurious wakeups.
Note that USB core would clear the remote-wakeup feature at first
resume.
The driver was setting the device remote-wakeup feature during probe in
violation of the USB specification (which says it should only be set
just prior to suspending the device). This could potentially waste
power during suspend as well as lead to spurious wakeups.
Note that USB core would clear the remote-wakeup feature at first
resume.
Fixes: 0f64478cbc7a ("USB: add USB serial mos7720 driver") Cc: stable <stable@vger.kernel.org> # 2.6.19 Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add USB ID for MOXA UPort 2210. This device contains mos7820 but
it passes GPIO0 check implemented by driver and it's detected as
mos7840. Hence product id check is added to force mos7820 mode.
Signed-off-by: Pavel Löbl <pavel@loebl.cz> Cc: stable <stable@vger.kernel.org>
[ johan: rename id defines and add vendor-id check ] Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
1. stale memory left over from the last transfer
the actual length of the data transfered needs to be checked
2. memory already freed
the error handling in appledisplay_probe() needs
to cancel the work in that case
In case of a timeout or if a signal aborts a read
communication with the device needs to be ended
lest we overwrite an active URB the next time we
do IO to the device, as the URB may still be active.
Smatch reported that nents is not initialized and used in
stub_recv_cmd_submit(). nents is currently initialized by sgl_alloc()
and used to allocate multiple URBs when host controller doesn't
support scatter-gather DMA. The use of uninitialized nents means that
buf_len is zero and use_sg is true. But buffer length should not be
zero when an URB uses scatter-gather DMA.
To prevent this situation, add the conditional that checks buf_len
and use_sg. And move the use of nents right after the sgl_alloc() to
avoid the use of uninitialized nents.
If the error occurs, it adds SDEV_EVENT_ERROR_MALLOC and stub_priv
will be released by stub event handler and connection will be shut
down.
Fixes: ea44d190764b ("usbip: Implement SG support to vhci-hcd and stub driver") Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Suwan Kim <suwan.kim027@gmail.com> Acked-by: Shuah Khan <skhan@linuxfoundation.org> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20191111141035.27788-1-suwan.kim027@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 780bc7903a32 ("virtio_ring: Support DMA APIs") makes
virtqueue_add() return -EIO when we fail to map our I/O buffers. This is
a very realistic scenario for guests with encrypted memory, as swiotlb
may run out of space, depending on it's size and the I/O load.
The virtio-blk driver interprets -EIO form virtqueue_add() as an IO
error, despite the fact that swiotlb full is in absence of bugs a
recoverable condition.
Let us change the return code to -ENOMEM, and make the block layer
recover form these failures when virtio-blk encounters the condition
described above.
Cc: stable@vger.kernel.org Fixes: 780bc7903a32 ("virtio_ring: Support DMA APIs") Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Tested-by: Michael Mueller <mimu@linux.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The touch timer is set up in intf1. If the second interface does not exist,
the timer and touch input device are not setup and we get the following
error, when touch events are reported via intf0.
Local variable description: ----ircode@cxusb_rc_query
Variable was created at:
cxusb_rc_query+0x4d/0x360 drivers/media/usb/dvb-usb/cxusb.c:543
dvb_usb_read_remote_control+0xf9/0x290 drivers/media/usb/dvb-usb/dvb-usb-remote.c:261
Signed-off-by: Vito Caputo <vcaputo@pengaru.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When parsing the UVC control descriptors fails, the error path tries to
cleanup a media device that hasn't been initialised, potentially
resulting in a crash. Fix this by initialising the media device before
the error handling path can be reached.
Visual inspection of the usbvision driver shows that it suffers from
three races between its open, close, and disconnect handlers. In
particular, the driver is careful to update its usbvision->user and
usbvision->remove_pending flags while holding the private mutex, but:
usbvision_v4l2_close() and usbvision_radio_close() don't hold
the mutex while they check the value of
usbvision->remove_pending;
usbvision_disconnect() doesn't hold the mutex while checking
the value of usbvision->user; and
also, usbvision_v4l2_open() and usbvision_radio_open() don't
check whether the device has been unplugged before allowing
the user to open the device files.
Each of these can potentially lead to usbvision_release() being called
twice and use-after-free errors.
This patch fixes the races by reading the flags while the mutex is
still held and checking for pending removes before allowing an open to
succeed.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> CC: <stable@vger.kernel.org> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is the same incorrect approach to locking implemented in
vivid_stop_generating_vid_cap(), vivid_stop_generating_vid_out() and
sdr_cap_stop_streaming().
These functions are called during streaming stopping with vivid_dev.mutex
locked. And they all do the same mistake while stopping their kthreads,
which need to lock this mutex as well. See the example from
vivid_stop_generating_vid_cap():
/* shutdown control thread */
vivid_grab_controls(dev, false);
mutex_unlock(&dev->mutex);
kthread_stop(dev->kthread_vid_cap);
dev->kthread_vid_cap = NULL;
mutex_lock(&dev->mutex);
But when this mutex is unlocked, another vb2_fop_read() can lock it
instead of vivid_thread_vid_cap() and manipulate the buffer queue.
That causes a use-after-free access later.
To fix those issues let's:
1. avoid unlocking the mutex in vivid_stop_generating_vid_cap(),
vivid_stop_generating_vid_out() and sdr_cap_stop_streaming();
2. use mutex_trylock() with schedule_timeout_uninterruptible() in
the loops of the vivid kthread handlers.
Signed-off-by: Alexander Popov <alex.popov@linux.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: <stable@vger.kernel.org> # for v3.18 and up Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When vbi stream is started, followed by video streaming,
the vid_cap_streaming and vid_out_streaming were not being set to true,
which would cause the video stream to stop when vbi stream is stopped.
This patch allows to set vid_cap_streaming and vid_out_streaming to true.
According to Hans Verkuil it appears that these 'if (dev->kthread_vid_cap)'
checks are a left-over from the original vivid development and should never
have been there.
Signed-off-by: Vandana BN <bnvandana@gmail.com> Cc: <stable@vger.kernel.org> # for v3.18 and up Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If starting the transfer of a command suceeds but the transfer for the reply
fails, it is not enough to initiate killing the transfer for the
command may still be running. You need to wait for the killing to finish
before you can reuse URB and buffer.
Reported-and-tested-by: syzbot+711468aa5c3a1eabf863@syzkaller.appspotmail.com Signed-off-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
snd_usb_mixer_controls_badd() that parses UAC3 BADD profiles misses a
NULL check for the given interfaces. When a malformed USB descriptor
is passed, this may lead to an Oops, as spotted by syzkaller.
Skip the iteration if the interface doesn't exist for avoiding the
crash.
Robust futexes utilize the robust_list mechanism to allow the kernel to
release futexes which are held when a task exits. The exit can be voluntary
or caused by a signal or fault. This prevents that waiters block forever.
The futex operations in user space store a pointer to the futex they are
either locking or unlocking in the op_pending member of the per task robust
list.
After a lock operation has succeeded the futex is queued in the robust list
linked list and the op_pending pointer is cleared.
After an unlock operation has succeeded the futex is removed from the
robust list linked list and the op_pending pointer is cleared.
The robust list exit code checks for the pending operation and any futex
which is queued in the linked list. It carefully checks whether the futex
value is the TID of the exiting task. If so, it sets the OWNER_DIED bit and
tries to wake up a potential waiter.
This is race free for the lock operation but unlock has two race scenarios
where waiters might not be woken up. These issues can be observed with
regular robust pthread mutexes. PI aware pthread mutexes are not affected.
(1) Unlocking task is killed after unlocking the futex value in user space
before being able to wake a waiter.
pthread_mutex_unlock()
|
V
atomic_exchange_rel (&mutex->__data.__lock, 0)
<------------------------killed
lll_futex_wake () |
|
|(__lock = 0)
|(enter kernel)
|
V
do_exit()
exit_mm()
mm_release()
exit_robust_list()
handle_futex_death()
|
|(__lock = 0)
|(uval = 0)
|
V
if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr))
return 0;
The sanity check which ensures that the user space futex is owned by
the exiting task prevents the wakeup of waiters which in consequence
block infinitely.
(2) Waiting task is killed after a wakeup and before it can acquire the
futex in user space.
OWNER WAITER
futex_wait()
pthread_mutex_unlock() |
| |
|(__lock = 0) |
| |
V |
futex_wake() ------------> wakeup()
|
|(return to userspace)
|(__lock = 0)
|
V
oldval = mutex->__data.__lock
<-----------------killed
atomic_compare_and_exchange_val_acq (&mutex->__data.__lock, |
id | assume_other_futex_waiters, 0) |
|
|
(enter kernel)|
|
V
do_exit()
|
|
V
handle_futex_death()
|
|(__lock = 0)
|(uval = 0)
|
V
if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr))
return 0;
The sanity check which ensures that the user space futex is owned
by the exiting task prevents the wakeup of waiters, which seems to
be correct as the exiting task does not own the futex value, but
the consequence is that other waiters wont be woken up and block
infinitely.
In both scenarios the following conditions are true:
- task->robust_list->list_op_pending != NULL
- user space futex value == 0
- Regular futex (not PI)
If these conditions are met then it is reasonably safe to wake up a
potential waiter in order to prevent the above problems.
As this might be a false positive it can cause spurious wakeups, but the
waiter side has to handle other types of unrelated wakeups, e.g. signals
gracefully anyway. So such a spurious wakeup will not affect the
correctness of these operations.
This workaround must not touch the user space futex value and cannot set
the OWNER_DIED bit because the lock value is 0, i.e. uncontended. Setting
OWNER_DIED in this case would result in inconsistent state and subsequently
in malfunction of the owner died handling in user space.
The rest of the user space state is still consistent as no other task can
observe the list_op_pending entry in the exiting tasks robust list.
The eventually woken up waiter will observe the uncontended lock value and
take it over.
[ tglx: Massaged changelog and comment. Made the return explicit and not
depend on the subsequent check and added constants to hand into
handle_futex_death() instead of plain numbers. Fixed a few coding
style issues. ]
We are going to share the compat_sys_futex() handler between 64-bit
architectures and 32-bit architectures that need to deal with both 32-bit
and 64-bit time_t, and this is easier if both entry points are in the
same file.
In fact, most other system call handlers do the same thing these days, so
let's follow the trend here and merge all of futex_compat.c into futex.c.
In the process, a few minor changes have to be done to make sure everything
still makes sense: handle_futex_death() and futex_cmpxchg_enabled() become
local symbol, and the compat version of the fetch_robust_entry() function
gets renamed to compat_fetch_robust_entry() to avoid a symbol clash.
This is intended as a purely cosmetic patch, no behavior should
change.
In nbd_add_socket when krealloc succeeds, if nsock's allocation fail the
reallocted memory is leak. The correct behaviour should be assigning the
reallocted memory to config->socks right after success.
Since MDS and TAA mitigations are inter-related for processors that are
affected by both vulnerabilities, the followiing confusing messages can
be printed in the kernel log:
MDS: Vulnerable
MDS: Mitigation: Clear CPU buffers
To avoid the first incorrect message, defer the printing of MDS
mitigation after the TAA mitigation selection has been done. However,
that has the side effect of printing TAA mitigation first before MDS
mitigation.
[ bp: Check box is affected/mitigations are disabled first before
printing and massage. ]
Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Mark Gross <mgross@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191115161445.30809-3-longman@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For MDS vulnerable processors with TSX support, enabling either MDS or
TAA mitigations will enable the use of VERW to flush internal processor
buffers at the right code path. IOW, they are either both mitigated
or both not. However, if the command line options are inconsistent,
the vulnerabilites sysfs files may not report the mitigation status
correctly.
For example, with only the "mds=off" option:
vulnerabilities/mds:Vulnerable; SMT vulnerable
vulnerabilities/tsx_async_abort:Mitigation: Clear CPU buffers; SMT vulnerable
The mds vulnerabilities file has wrong status in this case. Similarly,
the taa vulnerability file will be wrong with mds mitigation on, but
taa off.
Change taa_select_mitigation() to sync up the two mitigation status
and have them turned off if both "mds=off" and "tsx_async_abort=off"
are present.
Update documentation to emphasize the fact that both "mds=off" and
"tsx_async_abort=off" have to be specified together for processors that
are affected by both TAA and MDS to be effective.
[ bp: Massage and add kernel-parameters.txt change too. ]
Fixes: 1b42f017415b ("x86/speculation/taa: Add mitigation for TSX Async Abort") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: linux-doc@vger.kernel.org Cc: Mark Gross <mgross@linux.intel.com> Cc: <stable@vger.kernel.org> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191115161445.30809-2-longman@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
gawk 5.0.1 generates the following regexp warnings:
GEN /home/sasha/torvalds/tools/objtool/arch/x86/lib/inat-tables.c
awk: ../arch/x86/tools/gen-insn-attr-x86.awk:260: warning: regexp escape sequence `\:' is not a known regexp operator
awk: ../arch/x86/tools/gen-insn-attr-x86.awk:350: (FILENAME=../arch/x86/lib/x86-opcode-map.txt FNR=41) warning: regexp escape sequence `\&' is not a known regexp operator
Ealier versions of gawk are not known to generate these warnings. The
gawk manual referenced below does not list characters ':' and '&' as
needing escaping, so 'unescape' them. See
8-letter strings representing ARC perf events are stores in two
32-bit registers as ASCII characters like that: "IJMP", "IALL", "IJMPTAK" etc.
And the same order of bytes in the word is used regardless CPU endianness.
Which means in case of big-endian CPU core we need to swap bytes to get
the same order as if it was on little-endian CPU.
Otherwise we're seeing the following error message on boot:
------------------------->8----------------------
ARC perf : 8 counters (32 bits), 40 conditions, [overflow IRQ support]
sysfs: cannot create duplicate filename '/devices/arc_pct/events/pmji'
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.18 #3
Stack Trace:
arc_unwind_core+0xd4/0xfc
dump_stack+0x64/0x80
sysfs_warn_dup+0x46/0x58
sysfs_add_file_mode_ns+0xb2/0x168
create_files+0x70/0x2a0
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at kernel/events/core.c:12144 perf_event_sysfs_init+0x70/0xa0
Failed to register pmu: arc_pct, reason -17
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.18 #3
Stack Trace:
arc_unwind_core+0xd4/0xfc
dump_stack+0x64/0x80
__warn+0x9c/0xd4
warn_slowpath_fmt+0x22/0x2c
perf_event_sysfs_init+0x70/0xa0
---[ end trace a75fb9a9837bd1ec ]---
------------------------->8----------------------
What happens here we're trying to register more than one raw perf event
with the same name "PMJI". Why? Because ARC perf events are 4 to 8 letters
and encoded into two 32-bit words. In this particular case we deal with 2
events:
* "IJMP____" which counts all jump & branch instructions
* "IJMPC___" which counts only conditional jumps & branches
Those strings are split in two 32-bit words this way "IJMP" + "____" &
"IJMP" + "C___" correspondingly. Now if we read them swapped due to CPU core
being big-endian then we read "PMJI" + "____" & "PMJI" + "___C".
And since we interpret read array of ASCII letters as a null-terminated string
on big-endian CPU we end up with 2 events of the same name "PMJI".
adjust_lowmem_bounds() checks every memblocks in order to find the boundary
between lowmem and highmem. However some memblocks could be marked as NOMAP
so they are not used by kernel, which should be skipped while calculating
the boundary.
Signed-off-by: Chester Lin <clin@suse.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Remove ocfs2_is_o2cb_active(). We have similar functions to identify
which cluster stack is being used via osb->osb_cluster_stack.
Secondly, the current implementation of ocfs2_is_o2cb_active() is not
totally safe. Based on the design of stackglue, we need to get
ocfs2_stack_lock before using ocfs2_stack related data structures, and
that active_stack pointer can be NULL in the case of mount failure.
Link: http://lkml.kernel.org/r/1495441079-11708-1-git-send-email-ghe@suse.com Signed-off-by: Gang He <ghe@suse.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Reviewed-by: Eric Ren <zren@suse.com> Acked-by: Changwei Ge <ge.changwei@h3c.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After reset SGMII Autoneg timer is set to 2us (bits 6 and 5 are 01).
That is not enough to finalize autonegatiation on some devices.
Increase this timer duration to maximum supported 16ms.
Signed-off-by: Max Uvarov <muvarov@gmail.com> Cc: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ adapted for kernels without phy_modify_mmd ] Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For supporting 10Mps speed in SGMII mode DP83867_10M_SGMII_RATE_ADAPT bit
of DP83867_10M_SGMII_CFG register has to be cleared by software.
That does not affect speeds 100 and 1000 so can be done on init.
Signed-off-by: Max Uvarov <muvarov@gmail.com> Cc: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ adapted for kernels without phy_modify_mmd ] Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Let's limit shrinking to !ZONE_DEVICE so we can fix the current code.
We should never try to touch the memmap of offline sections where we
could have uninitialized memmaps and could trigger BUGs when calling
page_to_nid() on poisoned pages.
There is no reliable way to distinguish an uninitialized memmap from an
initialized memmap that belongs to ZONE_DEVICE, as we don't have
anything like SECTION_IS_ONLINE we can use similar to
pfn_to_online_section() for !ZONE_DEVICE memory.
E.g., set_zone_contiguous() similarly relies on pfn_to_online_section()
and will therefore never set a ZONE_DEVICE zone consecutive. Stopping
to shrink the ZONE_DEVICE therefore results in no observable changes,
besides /proc/zoneinfo indicating different boundaries - something we
can totally live with.
Before commit d0dc12e86b31 ("mm/memory_hotplug: optimize memory
hotplug"), the memmap was initialized with 0 and the node with the right
value. So the zone might be wrong but not garbage. After that commit,
both the zone and the node will be garbage when touching uninitialized
memmaps.
Toshiki reported a BUG (race between delayed initialization of
ZONE_DEVICE memmaps without holding the memory hotplug lock and
concurrent zone shrinking).
https://lkml.org/lkml/2019/11/14/1040
"Iteration of create and destroy namespace causes the panic as below:
While creating a namespace and initializing memmap, if you destroy the
namespace and shrink the zone, it will initialize the memmap outside
the zone and trigger VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page),
pfn), page) in set_pfnblock_flags_mask()."
This BUG is also mitigated by this commit, where we for now stop to
shrink the ZONE_DEVICE zone until we can do it in a safe and clean way.
Link: http://lkml.kernel.org/r/20191006085646.5768-5-david@redhat.com Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319] Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reported-by: Toshiki Fukasawa <t-fukasawa@vx.jp.nec.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Damian Tometzki <damian.tometzki@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Halil Pasic <pasic@linux.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jun Yao <yaojun8558363@gmail.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Pankaj Gupta <pagupta@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qian Cai <cai@lca.pw> Cc: Rich Felker <dalias@libc.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Wei Yang <richardw.yang@linux.intel.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Yu Zhao <yuzhao@google.com> Cc: <stable@vger.kernel.org> [4.13+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Due to unneeded multiplication in the out_free_pages portion of
r10buf_pool_alloc(), when using a 3-copy raid10 layout, it is
possible to access a resync_pages offset that has not been
initialized. This access translates into a crash of the system
within resync_free_pages() while passing a bad pointer to
put_page(). Remove the multiplication, preventing access to the
uninitialized area.
Fixes: f0250618361db ("md: raid10: don't use bio's vec table to manage resync pages") Cc: stable@vger.kernel.org # 4.12+ Signed-off-by: John Pittman <jpittman@redhat.com> Suggested-by: David Jeffery <djeffery@redhat.com> Reviewed-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, data variable in ar9003_hw_thermo_cal_apply() could be
uninitialized if ar9300_otp_read_word() will fail to read the value.
Initialize data variable with 0 to prevent an undefined behavior. This
will be enough to handle error case when ar9300_otp_read_word() fails.
Fixes: 80fe43f2bbd5 ("ath9k_hw: Read and configure thermocal for AR9462") Cc: Rajkumar Manoharan <rmanohar@qca.qualcomm.com> Cc: John W. Linville <linville@tuxdriver.com> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: stable@vger.kernel.org Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The `ar_usb` field of `ath10k_usb_pipe_usb_pipe` objects
are initialized to point to the containing `ath10k_usb` object
according to endpoint descriptors read from the device side, as shown
below in `ath10k_usb_setup_pipe_resources`:
for (i = 0; i < iface_desc->desc.bNumEndpoints; ++i) {
endpoint = &iface_desc->endpoint[i].desc;
// get the address from endpoint descriptor
pipe_num = ath10k_usb_get_logical_pipe_num(ar_usb,
endpoint->bEndpointAddress,
&urbcount);
......
// select the pipe object
pipe = &ar_usb->pipes[pipe_num];
// initialize the ar_usb field
pipe->ar_usb = ar_usb;
}
The driver assumes that the addresses reported in endpoint
descriptors from device side to be complete. If a device is
malicious and does not report complete addresses, it may trigger
NULL-ptr-deref `ath10k_usb_alloc_urb_from_pipe` and
`ath10k_usb_free_urb_to_pipe`.
This patch fixes the bug by preventing potential NULL-ptr-deref.
Signed-off-by: Hui Peng <benquike@gmail.com> Reported-by: Hui Peng <benquike@gmail.com> Reported-by: Mathias Payer <mathias.payer@nebelwelt.net> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[groeck: Add driver tag to subject, fix build warning] Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Explicitly exempt ZONE_DEVICE pages from kvm_is_reserved_pfn() and
instead manually handle ZONE_DEVICE on a case-by-case basis. For things
like page refcounts, KVM needs to treat ZONE_DEVICE pages like normal
pages, e.g. put pages grabbed via gup(). But for flows such as setting
A/D bits or shifting refcounts for transparent huge pages, KVM needs to
to avoid processing ZONE_DEVICE pages as the flows in question lack the
underlying machinery for proper handling of ZONE_DEVICE pages.
This fixes a hang reported by Adam Borowski[*] in dev_pagemap_cleanup()
when running a KVM guest backed with /dev/dax memory, as KVM straight up
doesn't put any references to ZONE_DEVICE pages acquired by gup().
Note, Dan Williams proposed an alternative solution of doing put_page()
on ZONE_DEVICE pages immediately after gup() in order to simplify the
auditing needed to ensure is_zone_device_page() is called if and only if
the backing device is pinned (via gup()). But that approach would break
kvm_vcpu_{un}map() as KVM requires the page to be pinned from map() 'til
unmap() when accessing guest memory, unlike KVM's secondary MMU, which
coordinates with mmu_notifier invalidations to avoid creating stale
page references, i.e. doesn't rely on pages being pinned.
Reported-by: Adam Borowski <kilobyte@angband.pl> Analyzed-by: David Hildenbrand <david@redhat.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: stable@vger.kernel.org Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[sean: backport to 4.x; resolve conflict in mmu.c] Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>