Stefan Wahren [Mon, 13 Jan 2020 18:56:16 +0000 (19:56 +0100)]
thermal: Add BCM2711 thermal driver
This adds the thermal sensor driver for the Broadcom BCM2711 SoC,
which is placed on the Raspberry Pi 4. The driver only provides
SoC temperature reading so far.
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Tested-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/1578941778-23321-3-git-send-email-stefan.wahren@i2se.com
Stefan Wahren [Mon, 13 Jan 2020 18:56:15 +0000 (19:56 +0100)]
dt-bindings: Add Broadcom AVS RO thermal
Since the BCM2711 doesn't have a AVS TMON block, the thermal information
must be retrieved from the AVS ring oscillator block. This block is part
of the AVS monitor which contains a bunch of raw sensors.
Yangtao Li [Sun, 12 Jan 2020 18:09:25 +0000 (18:09 +0000)]
thermal: sun8i: Remove unused variable and unneeded macros
The cp_ft_flag variable is not used after initialization, so delete
it. After that, THS_EFUSE_CP_FT_MASK, THS_EFUSE_CP_FT_BIT and
THS_CALIBRATION_IN_FT are not needed, so delete them.
thermal: exynos: Rename Samsung and Exynos to lowercase
Fix up inconsistent usage of upper and lowercase letters in "Samsung"
and "Exynos" names.
"SAMSUNG" and "EXYNOS" are not abbreviations but regular trademarked
names. Therefore they should be written with lowercase letters starting
with capital letter.
The lowercase "Exynos" name is promoted by its manufacturer Samsung
Electronics Co., Ltd., in advertisement materials and on website.
Although advertisement materials usually use uppercase "SAMSUNG", the
lowercase version is used in all legal aspects (e.g. on Wikipedia and in
privacy/legal statements on
https://www.samsung.com/semiconductor/privacy-global/).
thermal: generic-adc: silence info message for IIO_TEMP channels
Since commit d36e2fa0253875 ("thermal: generic-adc: make lookup table
optional") "generic-adc-thermal" can be used with an IIO_TEMP channel.
In this case the following message is logged at probe time:
no lookup table, assuming DAC channel returns milliCelcius
Silence this info message if the channel type is known to be in
milli celsius. Keep this message when the channel type is unknown or not
of type temperature.
thermal: generic-adc: silence "no lookup table" on deferred probe
A "generic-adc-thermal" without "temperature-lookup-table" is perfectly
valid since commit d36e2fa0253875 ("thermal: generic-adc: make lookup
table optional"). On deferred probe the message "no lookup table,
assuming DAC channel returns milliCelcius" is still logged.
Prevent this message on deferred probe of the IIO channel by first
looking up the IIO channel.
Yangtao Li [Thu, 19 Dec 2019 17:28:17 +0000 (09:28 -0800)]
thermal/drivers/sun8i: Add thermal driver for H6/H5/H3/A64/A83T/R40
This patch adds the support for allwinner thermal sensor, within
allwinner SoC. It will register sensors for thermal framework
and use device tree to bind cooling device.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Ondrej Jirman <megous@megous.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com> Acked-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20191219172823.1652600-2-anarsoul@gmail.com
Daniel Lezcano [Thu, 19 Dec 2019 22:53:17 +0000 (23:53 +0100)]
thermal/drivers/cpu_cooling: Rename to cpufreq_cooling
As we introduced the idle injection cooling device called
cpuidle_cooling, let's be consistent and rename the cpu_cooling to
cpufreq_cooling as this one mitigates with OPPs changes.
Daniel Lezcano [Thu, 19 Dec 2019 22:53:16 +0000 (23:53 +0100)]
thermal/drivers/cpu_cooling: Introduce the cpu idle cooling driver
The cpu idle cooling device offers a new method to cool down a CPU by
injecting idle cycles at runtime.
It has some similarities with the intel power clamp driver but it is
actually designed to be more generic and relying on the idle injection
powercap framework.
The idle injection duration is fixed while the running duration is
variable. That allows to have control on the device reactivity for the
user experience.
An idle state powering down the CPU or the cluster will allow to drop
the static leakage, thus restoring the heat capacity of the SoC. It
can be set with a trip point between the hot and the critical points,
giving the opportunity to prevent a hard reset of the system when the
cpufreq cooling fails to cool down the CPU.
With more sophisticated boards having a per core sensor, the idle
cooling device allows to cool down a single core without throttling
the compute capacity of several cpus belonging to the same clock line,
so it could be used in collaboration with the cpufreq cooling device.
Provide some documentation for the idle injection cooling effect in
order to let people to understand the rational of the approach for the
idle injection CPU cooling device.
Daniel Lezcano [Wed, 4 Dec 2019 15:39:27 +0000 (16:39 +0100)]
thermal/drivers/Kconfig: Convert the CPU cooling device to a choice
The next changes will add a new way to cool down a CPU by injecting
idle cycles. With the current configuration, a CPU cooling device is
the cpufreq cooling device. As we want to add a new CPU cooling
device, let's convert the CPU cooling to a choice giving a list of CPU
cooling devices. At this point, there is obviously only one CPU
cooling device.
Andrey Smirnov [Tue, 10 Dec 2019 16:41:50 +0000 (08:41 -0800)]
thermal: qoriq: Enable all sensors before registering them
Tmu_get_temp will get called as a part of sensor registration via
devm_thermal_zone_of_sensor_register(). To prevent it from retruning
bogus data we need to enable sensor monitoring before that. Looking at
the datasheet (i.MX8MQ RM) there doesn't seem to be any harm in
enabling them all, so, for the sake of simplicity, change the code to
do just that.
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> Tested-by: Lucas Stach <l.stach@pengutronix.de> Cc: Chris Healy <cphealy@gmail.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Eduardo Valentin <edubezval@gmail.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Angus Ainslie (Purism) <angus@akkea.ca> Cc: linux-imx@nxp.com Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20191210164153.10463-10-andrew.smirnov@gmail.com
Andrey Smirnov [Tue, 10 Dec 2019 16:41:49 +0000 (08:41 -0800)]
thermal: qoriq: Convert driver to use regmap API
Convert driver to use regmap API, drop custom LE/BE IO helpers and
simplify bit manipulation using regmap_update_bits(). This also allows
us to convert some register initialization to use loops and adds
convenient debug access to TMU registers via debugfs.
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Tested-by: Lucas Stach <l.stach@pengutronix.de> Cc: Chris Healy <cphealy@gmail.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Eduardo Valentin <edubezval@gmail.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Angus Ainslie (Purism) <angus@akkea.ca> Cc: linux-imx@nxp.com Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20191210164153.10463-9-andrew.smirnov@gmail.com
Andrey Smirnov [Tue, 10 Dec 2019 16:41:47 +0000 (08:41 -0800)]
thermal: qoriq: Pass data to qoriq_tmu_calibration() directly
We can simplify error cleanup code if instead of passing a "struct
platform_device *" to qoriq_tmu_calibration() and deriving a bunch of
pointers from it, we pass those pointers directly. This way we won't
be force to call platform_set_drvdata() as early in qoriq_tmu_probe()
and need to have "platform_set_drvdata(pdev, NULL);" in error path.
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Tested-by: Lucas Stach <l.stach@pengutronix.de> Cc: Chris Healy <cphealy@gmail.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Eduardo Valentin <edubezval@gmail.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Angus Ainslie (Purism) <angus@akkea.ca> Cc: linux-imx@nxp.com Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20191210164153.10463-7-andrew.smirnov@gmail.com
Andrey Smirnov [Tue, 10 Dec 2019 16:41:46 +0000 (08:41 -0800)]
thermal: qoriq: Pass data to qoriq_tmu_register_tmu_zone() directly
Pass all necessary data to qoriq_tmu_register_tmu_zone() directly
instead of passing a platform device and then deriving it. This is
done as a first step to simplify resource deallocation code.
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Tested-by: Lucas Stach <l.stach@pengutronix.de> Cc: Chris Healy <cphealy@gmail.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Eduardo Valentin <edubezval@gmail.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Angus Ainslie (Purism) <angus@akkea.ca> Cc: linux-imx@nxp.com Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20191210164153.10463-6-andrew.smirnov@gmail.com
Andrey Smirnov [Tue, 10 Dec 2019 16:41:45 +0000 (08:41 -0800)]
thermal: qoriq: Embed per-sensor data into struct qoriq_tmu_data
Embed per-sensor data into struct qoriq_tmu_data so we can drop the
code allocating it. This also allows us to get rid of per-sensor back
reference to struct qoriq_tmu_data since now its address can be
calculated using container_of().
Amit Kucheria [Wed, 20 Nov 2019 15:45:19 +0000 (21:15 +0530)]
thermal: amlogic: Appease the kernel-doc deity
Fix up the following warning when compiled with make W=1:
linux.git/drivers/thermal/amlogic_thermal.c:78: warning: Function parameter or member 'A' not described in 'amlogic_thermal_soc_calib_data'
linux.git/drivers/thermal/amlogic_thermal.c:78: warning: Function parameter or member 'B' not described in 'amlogic_thermal_soc_calib_data'
linux.git/drivers/thermal/amlogic_thermal.c:78: warning: Function parameter or member 'm' not described in 'amlogic_thermal_soc_calib_data'
linux.git/drivers/thermal/amlogic_thermal.c:78: warning: Function parameter or member 'n' not described in 'amlogic_thermal_soc_calib_data'
Amit Kucheria [Wed, 20 Nov 2019 15:45:18 +0000 (21:15 +0530)]
thermal: tegra: Appease the kernel-doc deity
Fix up the following warning when compiled with make W=1:
linux.git/drivers/thermal/tegra/soctherm.c:369: warning: Function parameter or member 'value' not described in 'ccroc_writel'
linux.git/drivers/thermal/tegra/soctherm.c:369: warning: Excess function parameter 'v' description in 'ccroc_writel'
linux.git/drivers/thermal/tegra/soctherm.c:447: warning: Function parameter or member 'dev' not described in 'enforce_temp_range'
linux.git/drivers/thermal/tegra/soctherm.c:772: warning: Function parameter or member 'sg' not described in 'tegra_soctherm_set_hwtrips'
linux.git/drivers/thermal/tegra/soctherm.c:772: warning: Function parameter or member 'tz' not described in 'tegra_soctherm_set_hwtrips'
linux.git/drivers/thermal/tegra/soctherm.c:944: warning: Function parameter or member 'ts' not described in 'soctherm_oc_intr_enable'
linux.git/drivers/thermal/tegra/soctherm.c:1167: warning: Function parameter or member 'data' not described in 'soctherm_oc_irq_disable'
linux.git/drivers/thermal/tegra/soctherm.c:1167: warning: Excess function parameter 'irq_data' description in 'soctherm_oc_irq_disable'
linux.git/drivers/thermal/tegra/soctherm.c:1224: warning: Function parameter or member 'ctrlr' not described in 'soctherm_irq_domain_xlate_twocell'
linux.git/drivers/thermal/tegra/soctherm.c:1686: warning: Function parameter or member 'pdev' not described in 'soctherm_init_hw_throt_cdev'
linux.git/drivers/thermal/tegra/soctherm.c:1764: warning: Function parameter or member 'ts' not described in 'throttlectl_cpu_level_cfg'
linux.git/drivers/thermal/tegra/soctherm.c:1812: warning: Function parameter or member 'ts' not described in 'throttlectl_cpu_level_select'
linux.git/drivers/thermal/tegra/soctherm.c:1855: warning: Function parameter or member 'ts' not described in 'throttlectl_cpu_mn'
linux.git/drivers/thermal/tegra/soctherm.c:1886: warning: Function parameter or member 'ts' not described in 'throttlectl_gpu_level_select'
linux.git/drivers/thermal/tegra/soctherm.c:1928: warning: Function parameter or member 'ts' not described in 'soctherm_throttle_program'
Amit Kucheria [Wed, 20 Nov 2019 15:45:17 +0000 (21:15 +0530)]
thermal: samsung: Appease the kernel-doc deity
Fix up the following warning when compiled with make W=1:
linux.git/drivers/thermal/samsung/exynos_tmu.c:141: warning: bad
line: driver
linux.git/drivers/thermal/samsung/exynos_tmu.c:203: warning: Function
parameter or member 'tzd' not described in 'exynos_tmu_data'
linux.git/drivers/thermal/samsung/exynos_tmu.c:203: warning: Function
parameter or member 'tmu_set_trip_temp' not described in
'exynos_tmu_data'
linux.git/drivers/thermal/samsung/exynos_tmu.c:203: warning: Function
parameter or member 'tmu_set_trip_hyst' not described in
'exynos_tmu_data'
Amit Kucheria [Wed, 20 Nov 2019 15:45:16 +0000 (21:15 +0530)]
thermal: rockchip: Appease the kernel-doc deity
Replace a comment starting with /** by simply /* to avoid having it
interpreted as a kernel-doc comment. Describe missing function
parameters where needed.
Fixes up the following warnings when compiled with make W=1:
linux.git/drivers/thermal/rockchip_thermal.c:27: warning: cannot
understand function prototype: 'enum tshut_mode '
linux.git/drivers/thermal/rockchip_thermal.c:37: warning: cannot
understand function prototype: 'enum tshut_polarity '
linux.git/drivers/thermal/rockchip_thermal.c:46: warning: cannot
understand function prototype: 'enum sensor_id '
linux.git/drivers/thermal/rockchip_thermal.c:56: warning: cannot
understand function prototype: 'enum adc_sort_mode '
linux.git/drivers/thermal/rockchip_thermal.c:123: warning: Function
parameter or member 'chn_id' not described in 'rockchip_tsadc_chip'
linux.git/drivers/thermal/rockchip_thermal.c:123: warning: Function
parameter or member 'control' not described in 'rockchip_tsadc_chip'
linux.git/drivers/thermal/rockchip_thermal.c:167: warning: Function
parameter or member 'sensors' not described in 'rockchip_thermal_data'
linux.git/drivers/thermal/rockchip_thermal.c:608: warning: Function
parameter or member 'grf' not described in 'rk_tsadcv2_initialize'
linux.git/drivers/thermal/rockchip_thermal.c:608: warning: Function
parameter or member 'regs' not described in 'rk_tsadcv2_initialize'
linux.git/drivers/thermal/rockchip_thermal.c:608: warning: Function
parameter or member 'tshut_polarity' not described in
'rk_tsadcv2_initialize'
linux.git/drivers/thermal/rockchip_thermal.c:644: warning: Function
parameter or member 'grf' not described in 'rk_tsadcv3_initialize'
linux.git/drivers/thermal/rockchip_thermal.c:644: warning: Function
parameter or member 'regs' not described in 'rk_tsadcv3_initialize'
linux.git/drivers/thermal/rockchip_thermal.c:644: warning: Function
parameter or member 'tshut_polarity' not described in
'rk_tsadcv3_initialize'
linux.git/drivers/thermal/rockchip_thermal.c:732: warning: Function
parameter or member 'regs' not described in 'rk_tsadcv3_control'
linux.git/drivers/thermal/rockchip_thermal.c:732: warning: Function
parameter or member 'enable' not described in 'rk_tsadcv3_control'
linux.git/drivers/thermal/rockchip_thermal.c:1211: warning: Function
parameter or member 'reset' not described in
'rockchip_thermal_reset_controller'
Amit Kucheria [Wed, 20 Nov 2019 15:45:15 +0000 (21:15 +0530)]
thermal: mediatek: Appease the kernel-doc deity
Replace a comment starting with /** by simply /* to avoid having it
interpreted as a kernel-doc comment. Describe missing function
parameters where needed.
Fixes up the following warnings when compiled with make W=1:
linux.git/drivers/thermal/mtk_thermal.c:374: warning: cannot understand
function prototype: 'const struct mtk_thermal_data mt8173_thermal_data =
'
linux.git/drivers/thermal/mtk_thermal.c:413: warning: cannot understand
function prototype: 'const struct mtk_thermal_data mt2701_thermal_data =
'
linux.git/drivers/thermal/mtk_thermal.c:443: warning: cannot understand
function prototype: 'const struct mtk_thermal_data mt2712_thermal_data =
'
linux.git/drivers/thermal/mtk_thermal.c:499: warning: cannot understand
function prototype: 'const struct mtk_thermal_data mt8183_thermal_data =
'
linux.git/drivers/thermal/mtk_thermal.c:529: warning: Function parameter
or member 'sensno' not described in 'raw_to_mcelsius'
Amit Kucheria [Wed, 20 Nov 2019 15:45:13 +0000 (21:15 +0530)]
thermal: devfreq_cooling: Appease the kernel-doc deity
Fix up the following warnings with make W=1:
linux.git/drivers/thermal/devfreq_cooling.c:68: warning: Function
parameter or member 'capped_state' not described in
'devfreq_cooling_device'
linux.git/drivers/thermal/devfreq_cooling.c:593: warning: Function
parameter or member 'cdev' not described in 'devfreq_cooling_unregister'
linux.git/drivers/thermal/devfreq_cooling.c:593: warning: Excess
function parameter 'dfc' description in 'devfreq_cooling_unregister'
Amit Kucheria [Wed, 20 Nov 2019 15:45:12 +0000 (21:15 +0530)]
thermal: step_wise: Appease the kernel-doc deity
Replace - with : to appease the kernel-doc gods and fix warnings such as
the following when compiled with make W=1:
linux-amit.git/drivers/thermal/step_wise.c:187: warning: Function
parameter or member 'tz' not described in 'step_wise_throttle'
linux-amit.git/drivers/thermal/step_wise.c:187: warning: Function
parameter or member 'trip' not described in 'step_wise_throttle'
linux.git/drivers/thermal/fair_share.c:79: warning: Function parameter
or member 'tz' not described in 'fair_share_throttle'
linux.git/drivers/thermal/fair_share.c:79: warning: Function parameter
or member 'trip' not described in 'fair_share_throttle'
Amit Kucheria [Wed, 20 Nov 2019 15:45:10 +0000 (21:15 +0530)]
thermal: of-thermal: Appease the kernel-doc deity
Replace a comment starting with /** by simply /* to avoid having
it interpreted as a kernel-doc comment.
Fixes the following warning when compile with make W=1:
linux.git/drivers/thermal/of-thermal.c:761: warning: cannot understand function prototype: 'const char *trip_types[] = '
Linus Walleij [Tue, 19 Nov 2019 07:46:50 +0000 (08:46 +0100)]
thermal: db8500: Depromote debug print
We are not interested in getting this debug print on our
console all the time.
Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Stephan Gerhold <stephan@gerhold.net> Fixes: 6c375eccded4 ("thermal: db8500: Rewrite to be a pure OF sensor") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Stephan Gerhold <stephan@gerhold.net> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20191119074650.2664-1-linus.walleij@linaro.org
Linus Torvalds [Sun, 26 Jan 2020 20:23:04 +0000 (12:23 -0800)]
Merge tag 'io_uring-5.5-2020-01-26' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Fix for two regressions in this cycle, both reported by the postgresql
use case.
One removes the added restriction on who can submit IO, making it
possible for rings shared across forks to do so. The other fixes an
issue for the same kind of use case, where one exiting process would
cancel all IO"
* tag 'io_uring-5.5-2020-01-26' of git://git.kernel.dk/linux-block:
io_uring: don't cancel all work on process exit
Revert "io_uring: only allow submit from owning task"
Linus Torvalds [Sun, 26 Jan 2020 18:39:09 +0000 (10:39 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two last minute fixes, both in drivers.
The fnic one is a highly unlikely condition, but the RDMA one is a
recently introduced regression that causes a kernel warning to trigger
in every RDMA logon, which would be unsightly if it got into the final
release"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: RDMA/isert: Fix a recently introduced regression related to logout
scsi: fnic: do not queue commands during fwreset
Jens Axboe [Sun, 26 Jan 2020 17:17:12 +0000 (10:17 -0700)]
io_uring: don't cancel all work on process exit
If we're sharing the ring across forks, then one process exiting means
that we cancel ALL work and prevent future work. This is overly
restrictive. As long as we cancel the work associated with the files
from the current task, it's safe to let others persist. Normal fd close
on exit will still wait (and cancel) pending work.
Fixes: fcb323cc53e2 ("io_uring: io_uring: add support for async work inheriting files") Reported-by: Andres Freund <andres@anarazel.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
block: allow partitions on host aware zone devices
Host-aware SMR drives can be used with the commands to explicitly manage
zone state, but they can also be used as normal disks. In the former
case it makes perfect sense to allow partitions on them, in the latter
it does not, just like for host managed devices. Add a check to
add_partition to allow partitions on host aware devices, but give
up any zone management capabilities in that case, which also catches
the previously missed case of adding a partition vs just scanning it.
Because sd can rescan the attribute at runtime it needs to check if
a disk has partitions, for which a new helper is added to genhd.h.
Fixes: 5eac3eb30c9a ("block: Remove partition support for zoned block devices") Reported-by: Borislav Petkov <bp@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Sun, 26 Jan 2020 16:53:12 +0000 (09:53 -0700)]
Revert "io_uring: only allow submit from owning task"
This ends up being too restrictive for tasks that willingly fork and
share the ring between forks. Andres reports that this breaks his
postgresql work. Since we're close to 5.5 release, revert this change
for now.
Cc: stable@vger.kernel.org Fixes: 44d282796f81 ("io_uring: only allow submit from owning task") Reported-by: Andres Freund <andres@anarazel.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Sun, 26 Jan 2020 01:02:53 +0000 (01:02 +0000)]
afs: Fix characters allowed into cell names
The afs filesystem needs to prohibit certain characters from cell names,
such as '/', as these are used to form filenames in procfs, leading to
the following warning being generated:
WARNING: CPU: 0 PID: 3489 at fs/proc/generic.c:178
Fix afs_alloc_cell() to disallow nonprintable characters, '/', '@' and
names that begin with a dot.
Remove the check for "@cell" as that is then redundant.
This can be tested by running:
echo add foo/.bar 1.2.3.4 >/proc/fs/afs/cells
Note that we will also need to deal with:
- Names ending in ".invalid" shouldn't be passed to the DNS.
- Names that contain non-valid domainname chars shouldn't be passed to
the DNS.
- DNS replies that say "your-dns-needs-immediate-attention.<gTLD>" and
replies containing A records that say 127.0.53.53 should be
considered invalid.
[https://www.icann.org/en/system/files/files/name-collision-mitigation-01aug14-en.pdf]
but these need to be dealt with by the kafs-client DNS program rather
than the kernel.
Linus Torvalds [Sat, 25 Jan 2020 22:32:51 +0000 (14:32 -0800)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
- fix ftrace relocation type filtering
- relax arch timer version check
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8955/1: virt: Relax arch timer version check during early boot
ARM: 8950/1: ftrace/recordmcount: filter relocation types
1) Off by one in mt76 airtime calculation, from Dan Carpenter.
2) Fix TLV fragment allocation loop condition in iwlwifi, from Luca
Coelho.
3) Don't confirm neigh entries when doing ipsec pmtu updates, from Xu
Wang.
4) More checks to make sure we only send TSO packets to lan78xx chips
that they can actually handle. From James Hughes.
5) Fix ip_tunnel namespace move, from William Dauchy.
6) Fix unintended packet reordering due to cooperation between
listification done by GRO and non-GRO paths. From Maxim
Mikityanskiy.
7) Add Jakub Kicincki formally as networking co-maintainer.
8) Info leak in airo ioctls, from Michael Ellerman.
9) IFLA_MTU attribute needs validation during rtnl_create_link(), from
Eric Dumazet.
10) Use after free during reload in mlxsw, from Ido Schimmel.
11) Dangling pointers are possible in tp->highest_sack, fix from Eric
Dumazet.
12) Missing *pos++ in various networking seq_next handlers, from Vasily
Averin.
13) CHELSIO_GET_MEM operation neds CAP_NET_ADMIN check, from Michael
Ellerman.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (109 commits)
firestream: fix memory leaks
net: cxgb3_main: Add CAP_NET_ADMIN check to CHELSIO_GET_MEM
net: bcmgenet: Use netif_tx_napi_add() for TX NAPI
tipc: change maintainer email address
net: stmmac: platform: fix probe for ACPI devices
net/mlx5e: kTLS, Do not send decrypted-marked SKBs via non-accel path
net/mlx5e: kTLS, Remove redundant posts in TX resync flow
net/mlx5e: kTLS, Fix corner-case checks in TX resync flow
net/mlx5e: Clear VF config when switching modes
net/mlx5: DR, use non preemptible call to get the current cpu number
net/mlx5: E-Switch, Prevent ingress rate configuration of uplink rep
net/mlx5: DR, Enable counter on non-fwd-dest objects
net/mlx5: Update the list of the PCI supported devices
net/mlx5: Fix lowest FDB pool size
net: Fix skb->csum update in inet_proto_csum_replace16().
netfilter: nf_tables: autoload modules from the abort path
netfilter: nf_tables: add __nft_chain_type_get()
netfilter: nf_tables_offload: fix check the chain offload flag
netfilter: conntrack: sctp: use distinct states for new SCTP connections
ipv6_route_seq_next should increase position index
...
Linus Torvalds [Sat, 25 Jan 2020 22:08:43 +0000 (14:08 -0800)]
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Olof Johansson:
"A couple of fixes have come in that would be good to include in this
release:
- A fix for amount of memory on Beaglebone Black. Surfaced now since
GRUB2 doesn't update memory size in the booted kernel.
- A fix to make SPI interfaces work on am43x-epos-evm.
- Small Kconfig fix for OPTEE (adds a depend on MMU) to avoid build
failures"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
ARM: dts: am43x-epos-evm: set data pin directions for spi0 and spi1
tee: optee: Fix compilation issue with nommu
ARM: dts: am335x-boneblack-common: fix memory size
Wenwen Wang [Sat, 25 Jan 2020 14:33:29 +0000 (14:33 +0000)]
firestream: fix memory leaks
In fs_open(), 'vcc' is allocated through kmalloc() and assigned to
'atm_vcc->dev_data.' In the following execution, if an error occurs, e.g.,
there is no more free channel, an error code EBUSY or ENOMEM will be
returned. However, 'vcc' is not deallocated, leading to memory leaks. Note
that, in normal cases where fs_open() returns 0, 'vcc' will be deallocated
in fs_close(). But, if fs_open() fails, there is no guarantee that
fs_close() will be invoked.
To fix this issue, deallocate 'vcc' before the error code is returned.
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 25 Jan 2020 18:55:24 +0000 (10:55 -0800)]
Merge tag 'for-5.5-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"Here's a last minute fix for a regression introduced in this
development cycle.
There's a small chance of a silent corruption when device replace and
NOCOW data writes happen at the same time in one block group. Metadata
or COW data writes are unaffected.
The extra fixup patch is there to silence an unnecessary warning"
* tag 'for-5.5-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: dev-replace: remove warning for unknown return codes when finished
btrfs: scrub: Require mandatory block group RO for dev-replace
Linus Torvalds [Sat, 25 Jan 2020 18:46:07 +0000 (10:46 -0800)]
Merge tag 'pinctrl-v5.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fix from Linus Walleij:
"A single fix for the Intel Sunrisepoint pin controller that makes the
interrupts work properly on it"
* tag 'pinctrl-v5.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: sunrisepoint: Add missing Interrupt Status register offset
Please pull and let me know if there is any problem.
Merge conflict: once merge with net-next, a contextual conflict will
appear in drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
since the code moved in net-next.
To resolve, just delete ALL of the conflicting hunk from net.
So sorry for the small mess ..
For -stable v5.4:
('net/mlx5: Update the list of the PCI supported devices')
('net/mlx5: Fix lowest FDB pool size')
('net/mlx5e: kTLS, Fix corner-case checks in TX resync flow')
('net/mlx5e: kTLS, Do not send decrypted-marked SKBs via non-accel path')
('net/mlx5: Eswitch, Prevent ingress rate configuration of uplink rep')
('net/mlx5e: kTLS, Remove redundant posts in TX resync flow')
('net/mlx5: DR, Enable counter on non-fwd-dest objects')
('net/mlx5: DR, use non preemptible call to get the current cpu number')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
caused by the previous patch ("btrfs: scrub: Require mandatory block
group RO for dev-replace"). Hitting ENOSPC is possible and could happen
when the block group is set read-only, preventing NOCOW writes to the
area that's being accessed by dev-replace.
This has happend with scratch devices of size 12G but not with 5G and
20G, so this is depends on timing and other activity on the filesystem.
The whole replace operation is restartable, the space state should be
examined by the user in any case.
The error code is propagated back to the ioctl caller so the kernel
warning is causing false alerts.
Michael Ellerman [Fri, 24 Jan 2020 09:41:44 +0000 (20:41 +1100)]
net: cxgb3_main: Add CAP_NET_ADMIN check to CHELSIO_GET_MEM
The cxgb3 driver for "Chelsio T3-based gigabit and 10Gb Ethernet
adapters" implements a custom ioctl as SIOCCHIOCTL/SIOCDEVPRIVATE in
cxgb_extension_ioctl().
One of the subcommands of the ioctl is CHELSIO_GET_MEM, which appears
to read memory directly out of the adapter and return it to userspace.
It's not entirely clear what the contents of the adapter memory
contains, but the assumption is that it shouldn't be accessible to all
users.
So add a CAP_NET_ADMIN check to the CHELSIO_GET_MEM case. Put it after
the is_offload() check, which matches two of the other subcommands in
the same function which also check for is_offload() and CAP_NET_ADMIN.
Found by Ilja by code inspection, not tested as I don't have the
required hardware.
Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 23 Jan 2020 17:49:34 +0000 (09:49 -0800)]
net: bcmgenet: Use netif_tx_napi_add() for TX NAPI
Before commit 7587935cfa11 ("net: bcmgenet: move NAPI initialization to
ring initialization") moved the code, this used to be
netif_tx_napi_add(), but we lost that small semantic change in the
process, restore that.
Fixes: 7587935cfa11 ("net: bcmgenet: move NAPI initialization to ring initialization") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Doug Berger <opendmb@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Olof Johansson [Fri, 24 Jan 2020 20:05:32 +0000 (12:05 -0800)]
Merge tag 'omap-for-fixes-whenever-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes
Few minor fixes for omaps
Looks like we have wrong default memory size for beaglebone black,
it has at least 512 MB of RAM and not 256 MB. This causes an issue
when booted with GRUB2 that does not seem to pass memory info to
the kernel.
And for am43x-epos-evm the SPI pin directions need to be configured
for SPI to work.
* tag 'omap-for-fixes-whenever-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: am43x-epos-evm: set data pin directions for spi0 and spi1
ARM: dts: am335x-boneblack-common: fix memory size
Tariq Toukan [Mon, 20 Jan 2020 11:42:00 +0000 (13:42 +0200)]
net/mlx5e: kTLS, Do not send decrypted-marked SKBs via non-accel path
When TCP out-of-order is identified (unexpected tcp seq mismatch), driver
analyzes the packet and decides what handling should it get:
1. go to accelerated path (to be encrypted in HW),
2. go to regular xmit path (send w/o encryption),
3. drop.
Packets marked with skb->decrypted by the TLS stack in the TX flow skips
SW encryption, and rely on the HW offload.
Verify that such packets are never sent un-encrypted on the wire.
Add a WARN to catch such bugs, and prefer dropping the packet in these cases.
Tariq Toukan [Sun, 12 Jan 2020 14:22:14 +0000 (16:22 +0200)]
net/mlx5e: kTLS, Fix corner-case checks in TX resync flow
There are the following cases:
1. Packet ends before start marker: bypass offload.
2. Packet starts before start marker and ends after it: drop,
not supported, breaks contract with kernel.
3. packet ends before tls record info starts: drop,
this packet was already acknowledged and its record info
was released.
Add the above as comment in code.
Mind possible wraparounds of the TCP seq, replace the simple comparison
with a call to the TCP before() method.
In addition, remove logic that handles negative sync_len values,
as it became impossible.
Dmytro Linkin [Mon, 13 Jan 2020 11:46:13 +0000 (13:46 +0200)]
net/mlx5e: Clear VF config when switching modes
Currently VF in LEGACY mode are not able to go up. Also in OFFLOADS
mode, when switching to it first time, VF can go up independently to
his representor, which is not expected.
Perform clearing of VF config when switching modes and set link state
to AUTO as default value. Also, when switching to OFFLOADS mode set
link state to DOWN, which allow VF link state to be controlled by its
REP.
Fixes: 1ab2068a4c66 ("net/mlx5: Implement vports admin state backup/restore") Fixes: 556b9d16d3f5 ("net/mlx5: Clear VF's configuration on disabling SRIOV") Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Erez Shitrit [Sun, 12 Jan 2020 06:57:59 +0000 (08:57 +0200)]
net/mlx5: DR, use non preemptible call to get the current cpu number
Use raw_smp_processor_id instead of smp_processor_id() otherwise we will
get the following trace in debug-kernel:
BUG: using smp_processor_id() in preemptible [00000000] code: devlink
caller is dr_create_cq.constprop.2+0x31d/0x970 [mlx5_core]
Call Trace:
dump_stack+0x9a/0xf0
debug_smp_processor_id+0x1f3/0x200
dr_create_cq.constprop.2+0x31d/0x970
genl_family_rcv_msg+0x5fd/0x1170
genl_rcv_msg+0xb8/0x160
netlink_rcv_skb+0x11e/0x340
Fixes: 297cccebdc5a ("net/mlx5: DR, Expose an internal API to issue RDMA operations") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eli Cohen [Sun, 12 Jan 2020 11:43:37 +0000 (13:43 +0200)]
net/mlx5: E-Switch, Prevent ingress rate configuration of uplink rep
Since the implementation relies on limiting the VF transmit rate to
simulate ingress rate limiting, and since either uplink representor or
ecpf are not associated with a VF, we limit the rate limit configuration
for those ports.
Fixes: fcb64c0f5640 ("net/mlx5: E-Switch, add ingress rate support") Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Meir Lichtinger [Thu, 12 Dec 2019 14:09:33 +0000 (16:09 +0200)]
net/mlx5: Update the list of the PCI supported devices
Add the upcoming ConnectX-7 device ID.
Fixes: 85327a9c4150 ("net/mlx5: Update the list of the PCI supported devices") Signed-off-by: Meir Lichtinger <meirl@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Paul Blakey [Tue, 31 Dec 2019 15:04:15 +0000 (17:04 +0200)]
net/mlx5: Fix lowest FDB pool size
The pool sizes represent the pool sizes in the fw. when we request
a pool size from fw, it will return the next possible group.
We track how many pools the fw has left and start requesting groups
from the big to the small.
When we start request 4k group, which doesn't exists in fw, fw
wants to allocate the next possible size, 64k, but will fail since
its exhausted. The correct smallest pool size in fw is 128 and not 4k.
Fixes: e52c28024008 ("net/mlx5: E-Switch, Add chains and priorities") Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
net: Fix skb->csum update in inet_proto_csum_replace16().
skb->csum is updated incorrectly, when manipulation for
NF_NAT_MANIP_SRC\DST is done on IPV6 packet.
Fix:
There is no need to update skb->csum in inet_proto_csum_replace16(),
because update in two fields a.) IPv6 src/dst address and b.) L4 header
checksum cancels each other for skb->csum calculation. Whereas
inet_proto_csum_replace4 function needs to update skb->csum, because
update in 3 fields a.) IPv4 src/dst address, b.) IPv4 Header checksum
and c.) L4 header checksum results in same diff as L4 Header checksum
for skb->csum calculation.
[ pablo@netfilter.org: a few comestic documentation edits ] Signed-off-by: Praveen Chaudhary <pchaudhary@linkedin.com> Signed-off-by: Zhenggen Xu <zxu@linkedin.com> Signed-off-by: Andy Stracner <astracner@linkedin.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_tables: autoload modules from the abort path
This patch introduces a list of pending module requests. This new module
list is composed of nft_module_request objects that contain the module
name and one status field that tells if the module has been already
loaded (the 'done' field).
In the first pass, from the preparation phase, the netlink command finds
that a module is missing on this list. Then, a module request is
allocated and added to this list and nft_request_module() returns
-EAGAIN. This triggers the abort path with the autoload parameter set on
from nfnetlink, request_module() is called and the module request enters
the 'done' state. Since the mutex is released when loading modules from
the abort phase, the module list is zapped so this is iteration occurs
over a local list. Therefore, the request_module() calls happen when
object lists are in consistent state (after fulling aborting the
transaction) and the commit list is empty.
On the second pass, the netlink command will find that it already tried
to load the module, so it does not request it again and
nft_request_module() returns 0. Then, there is a look up to find the
object that the command was missing. If the module was successfully
loaded, the command proceeds normally since it finds the missing object
in place, otherwise -ENOENT is reported to userspace.
This patch also updates nfnetlink to include the reason to enter the
abort phase, which is required for this new autoload module rationale.
Fixes: ec7470b834fe ("netfilter: nf_tables: store transaction list locally while requesting module") Reported-by: syzbot+29125d208b3dae9a7019@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This new helper function validates that unknown family and chain type
coming from userspace do not trigger an out-of-bound array access. Bail
out in case __nft_chain_type_get() returns NULL from
nft_chain_parse_hook().
Fixes: 9370761c56b6 ("netfilter: nf_tables: convert built-in tables/chains to chain types") Reported-by: syzbot+156a04714799b1d480bc@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Linus Torvalds [Fri, 24 Jan 2020 17:56:59 +0000 (09:56 -0800)]
Merge tag 'iommu-fixes-v5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
"Two fixes:
- Fix NULL-ptr dereference bug in Intel IOMMU driver
- Properly save and restore AMD IOMMU performance counter registers
when testing if they are writable"
* tag 'iommu-fixes-v5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Fix IOMMU perf counter clobbering during init
iommu/vt-d: Call __dmar_remove_one_dev_info with valid pointer
Linus Torvalds [Fri, 24 Jan 2020 17:49:20 +0000 (09:49 -0800)]
Merge tag 'powerpc-5.5-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Some more powerpc fixes for 5.5:
- Fix our hash MMU code to avoid having overlapping ids between user
and kernel, which isn't as bad as it sounds but led to crashes on
some machines.
- A fix for the Power9 XIVE interrupt code, which could return the
wrong interrupt state in obscure error conditions.
- A minor Kconfig fix for the recently added CONFIG_PPC_UV code.
Thanks to Aneesh Kumar K.V, Bharata B Rao, Cédric Le Goater, Frederic
Barrat"
* tag 'powerpc-5.5-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm/hash: Fix sharing context ids between kernel & userspace
powerpc/xive: Discard ESB load value when interrupt is invalid
powerpc: Ultravisor: Fix the dependencies for CONFIG_PPC_UV
Linus Torvalds [Fri, 24 Jan 2020 17:38:04 +0000 (09:38 -0800)]
Merge tag 'drm-fixes-2020-01-24' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"This one has a core mst fix and two i915 fixes. amdgpu just enables
some hw outside experimental.
The panfrost fix is a little bigger than I'd like at this stage but it
fixes a fairly fundamental problem with global shared buffers in that
driver, and since it's confined to that driver and I've taken a look
at it, I think it's fine to get into the tree now, so it can get
stable propagated as well.
core/mst:
- Fix SST branch device handling
amdgpu:
- enable renoir outside experimental
i915:
- Avoid overflow with huge userptr objects
- uAPI fix to correctly handle negative values in
engine->uabi_class/instance (cc: stable)
panfrost:
- Fix mapping of globally visible BO's (Boris)"
* tag 'drm-fixes-2020-01-24' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: remove the experimental flag for renoir
drm/panfrost: Add the panfrost_gem_mapping concept
drm/i915: Align engine->uabi_class/instance with i915_drm.h
drm/i915/userptr: fix size calculation
drm/dp_mst: Handle SST-only branch device case
Christophe Leroy [Thu, 23 Jan 2020 08:34:18 +0000 (08:34 +0000)]
lib: Reduce user_access_begin() boundaries in strncpy_from_user() and strnlen_user()
The range passed to user_access_begin() by strncpy_from_user() and
strnlen_user() starts at 'src' and goes up to the limit of userspace
although reads will be limited by the 'count' param.
On 32 bits powerpc (book3s/32) access has to be granted for each
256Mbytes segment and the cost increases with the number of segments to
unlock.
Jiri Wiesner [Sat, 18 Jan 2020 12:10:50 +0000 (13:10 +0100)]
netfilter: conntrack: sctp: use distinct states for new SCTP connections
The netlink notifications triggered by the INIT and INIT_ACK chunks
for a tracked SCTP association do not include protocol information
for the corresponding connection - SCTP state and verification tags
for the original and reply direction are missing. Since the connection
tracking implementation allows user space programs to receive
notifications about a connection and then create a new connection
based on the values received in a notification, it makes sense that
INIT and INIT_ACK notifications should contain the SCTP state
and verification tags available at the time when a notification
is sent. The missing verification tags cause a newly created
netfilter connection to fail to verify the tags of SCTP packets
when this connection has been created from the values previously
received in an INIT or INIT_ACK notification.
A PROTOINFO event is cached in sctp_packet() when the state
of a connection changes. The CLOSED and COOKIE_WAIT state will
be used for connections that have seen an INIT and INIT_ACK chunk,
respectively. The distinct states will cause a connection state
change in sctp_packet().
Signed-off-by: Jiri Wiesner <jwiesner@suse.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jerry Snitselaar [Wed, 22 Jan 2020 00:34:26 +0000 (17:34 -0700)]
iommu/vt-d: Call __dmar_remove_one_dev_info with valid pointer
It is possible for archdata.iommu to be set to
DEFER_DEVICE_DOMAIN_INFO or DUMMY_DEVICE_DOMAIN_INFO so check for
those values before calling __dmar_remove_one_dev_info. Without a
check it can result in a null pointer dereference. This has been seen
while booting a kdump kernel on an HP dl380 gen9.
Cc: Joerg Roedel <joro@8bytes.org> Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: stable@vger.kernel.org # 5.3+ Cc: linux-kernel@vger.kernel.org Fixes: ae23bfb68f28 ("iommu/vt-d: Detach domain before using a private one") Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
Qu Wenruo [Thu, 23 Jan 2020 23:58:20 +0000 (07:58 +0800)]
btrfs: scrub: Require mandatory block group RO for dev-replace
[BUG]
For dev-replace test cases with fsstress, like btrfs/06[45] btrfs/071,
looped runs can lead to random failure, where scrub finds csum error.
The possibility is not high, around 1/20 to 1/100, but it's causing data
corruption.
The bug is observable after commit b12de52896c0 ("btrfs: scrub: Don't
check free space before marking a block group RO")
[CAUSE]
Dev-replace has two source of writes:
- Write duplication
All writes to source device will also be duplicated to target device.
Content: Not yet persisted data/meta
- Scrub copy
Dev-replace reused scrub code to iterate through existing extents, and
copy the verified data to target device.
Content: Previously persisted data and metadata
The difference in contents makes the following race possible:
Regular Writer | Dev-replace
-----------------------------------------------------------------
^ |
| Preallocate one data extent |
| at bytenr X, len 1M |
v |
^ Commit transaction |
| Now extent [X, X+1M) is in |
v commit root |
================== Dev replace starts =========================
| ^
| | Scrub extent [X, X+1M)
| | Read [X, X+1M)
| | (The content are mostly garbage
| | since it's preallocated)
^ | v
| Write back happens for |
| extent [X, X+512K) |
| New data writes to both |
| source and target dev. |
v |
| ^
| | Scrub writes back extent [X, X+1M)
| | to target device.
| | This will over write the new data in
| | [X, X+512K)
| v
This race can only happen for nocow writes. Thus metadata and data cow
writes are safe, as COW will never overwrite extents of previous
transaction (in commit root).
This behavior can be confirmed by disabling all fallocate related calls
in fsstress (*), then all related tests can pass a 2000 run loop.
*: FSSTRESS_AVOID="-f fallocate=0 -f allocsp=0 -f zero=0 -f insert=0 \
-f collapse=0 -f punch=0 -f resvsp=0"
I didn't expect resvsp ioctl will fallback to fallocate in VFS...
[FIX]
Make dev-replace to require mandatory block group RO, and wait for current
nocow writes before calling scrub_chunk().
This patch will mostly revert commit 76a8efa171bf ("btrfs: Continue replace
when set_block_ro failed") for dev-replace path.
The side effect is, dev-replace can be more strict on avaialble space, but
definitely worth to avoid data corruption.
Reported-by: Filipe Manana <fdmanana@suse.com> Fixes: 76a8efa171bf ("btrfs: Continue replace when set_block_ro failed") Fixes: b12de52896c0 ("btrfs: scrub: Don't check free space before marking a block group RO") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
====================
netdev: seq_file .next functions should increase position index
In Aug 2018 NeilBrown noticed
commit 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface")
"Some ->next functions do not increment *pos when they return NULL...
Note that such ->next functions are buggy and should be fixed.
A simple demonstration is
dd if=/proc/swaps bs=1000 skip=1
Choose any block size larger than the size of /proc/swaps. This will
always show the whole last line of /proc/swaps"
Described problem is still actual. If you make lseek into middle of last output line
following read will output end of last line and whole last line once again.
$ dd if=/proc/swaps bs=1 # usual output
Filename Type Size Used Priority
/dev/dm-0 partition 4194812 97536 -2
104+0 records in
104+0 records out
104 bytes copied
$ dd if=/proc/swaps bs=40 skip=1 # last line was generated twice
dd: /proc/swaps: cannot skip to specified offset
v/dm-0 partition 4194812 97536 -2
/dev/dm-0 partition 4194812 97536 -2
3+1 records in
3+1 records out
131 bytes copied
There are lot of other affected files, I've found 30+ including
/proc/net/ip_tables_matches and /proc/sysvipc/*
Vasily Averin [Thu, 23 Jan 2020 07:12:06 +0000 (10:12 +0300)]
ipv6_route_seq_next should increase position index
if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.
https://bugzilla.kernel.org/show_bug.cgi?id=206283 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vasily Averin [Thu, 23 Jan 2020 07:11:35 +0000 (10:11 +0300)]
rt_cpu_seq_next should increase position index
if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.
https://bugzilla.kernel.org/show_bug.cgi?id=206283 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vasily Averin [Thu, 23 Jan 2020 07:11:28 +0000 (10:11 +0300)]
neigh_stat_seq_next() should increase position index
if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.
https://bugzilla.kernel.org/show_bug.cgi?id=206283 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vasily Averin [Thu, 23 Jan 2020 07:11:20 +0000 (10:11 +0300)]
vcc_seq_next should increase position index
if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.
https://bugzilla.kernel.org/show_bug.cgi?id=206283 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vasily Averin [Thu, 23 Jan 2020 07:11:13 +0000 (10:11 +0300)]
l2t_seq_next should increase position index
if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.
https://bugzilla.kernel.org/show_bug.cgi?id=206283 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vasily Averin [Thu, 23 Jan 2020 07:11:08 +0000 (10:11 +0300)]
seq_tab_next() should increase position index
if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.
https://bugzilla.kernel.org/show_bug.cgi?id=206283 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 23 Jan 2020 05:03:00 +0000 (21:03 -0800)]
tcp: do not leave dangling pointers in tp->highest_sack
Latest commit 853697504de0 ("tcp: Fix highest_sack and highest_sack_seq")
apparently allowed syzbot to trigger various crashes in TCP stack [1]
I believe this commit only made things easier for syzbot to find
its way into triggering use-after-frees. But really the bugs
could lead to bad TCP behavior or even plain crashes even for
non malicious peers.
I have audited all calls to tcp_rtx_queue_unlink() and
tcp_rtx_queue_unlink_and_free() and made sure tp->highest_sack would be updated
if we are removing from rtx queue the skb that tp->highest_sack points to.
These updates were missing in three locations :
1) tcp_clean_rtx_queue() [This one seems quite serious,
I have no idea why this was not caught earlier]
2) tcp_rtx_queue_purge() [Probably not a big deal for normal operations]
3) tcp_send_synack() [Probably not a big deal for normal operations]
[1]
BUG: KASAN: use-after-free in tcp_highest_sack_seq include/net/tcp.h:1864 [inline]
BUG: KASAN: use-after-free in tcp_highest_sack_seq include/net/tcp.h:1856 [inline]
BUG: KASAN: use-after-free in tcp_check_sack_reordering+0x33c/0x3a0 net/ipv4/tcp_input.c:891
Read of size 4 at addr ffff8880a488d068 by task ksoftirqd/1/16
Memory state around the buggy address: ffff8880a488cf00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8880a488cf80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8880a488d000: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
^ ffff8880a488d080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880a488d100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Fixes: 853697504de0 ("tcp: Fix highest_sack and highest_sack_seq") Fixes: 50895b9de1d3 ("tcp: highest_sack fix") Fixes: 737ff314563c ("tcp: use sequence distance to detect reordering") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cambda Zhu <cambda@linux.alibaba.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>