]> git.ipfire.org Git - people/ms/linux.git/log
people/ms/linux.git
3 years agopinctrl: qcom: sm8250: Fix PDC map
Jianhua Lu [Wed, 3 Aug 2022 01:56:45 +0000 (09:56 +0800)] 
pinctrl: qcom: sm8250: Fix PDC map

Fix the PDC mapping for SM8250, gpio39 is mapped to irq73(not irq37).

Fixes: b41efeed507a("pinctrl: qcom: sm8250: Specify PDC map.")
Signed-off-by: Jianhua Lu <lujianhua000@gmail.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@somainline.org>
Link: https://lore.kernel.org/r/20220803015645.22388-1-lujianhua000@gmail.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
3 years agopinctrl: amd: Fix an unused variable
Mario Limonciello [Mon, 1 Aug 2022 14:49:52 +0000 (09:49 -0500)] 
pinctrl: amd: Fix an unused variable

`char *output_enable` is no longer used once switching to unicode
output.

Fixes: e8129a076a50 ("pinctrl: amd: Use unicode for debugfs output")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20220801144952.141-1-mario.limonciello@amd.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
3 years agoMerge tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache
Linus Torvalds [Wed, 3 Aug 2022 17:35:43 +0000 (10:35 -0700)] 
Merge tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache

Pull folio updates from Matthew Wilcox:

 - Fix an accounting bug that made NR_FILE_DIRTY grow without limit
   when running xfstests

 - Convert more of mpage to use folios

 - Remove add_to_page_cache() and add_to_page_cache_locked()

 - Convert find_get_pages_range() to filemap_get_folios()

 - Improvements to the read_cache_page() family of functions

 - Remove a few unnecessary checks of PageError

 - Some straightforward filesystem conversions to use folios

 - Split PageMovable users out from address_space_operations into
   their own movable_operations

 - Convert aops->migratepage to aops->migrate_folio

 - Remove nobh support (Christoph Hellwig)

* tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache: (78 commits)
  fs: remove the NULL get_block case in mpage_writepages
  fs: don't call ->writepage from __mpage_writepage
  fs: remove the nobh helpers
  jfs: stop using the nobh helper
  ext2: remove nobh support
  ntfs3: refactor ntfs_writepages
  mm/folio-compat: Remove migration compatibility functions
  fs: Remove aops->migratepage()
  secretmem: Convert to migrate_folio
  hugetlb: Convert to migrate_folio
  aio: Convert to migrate_folio
  f2fs: Convert to filemap_migrate_folio()
  ubifs: Convert to filemap_migrate_folio()
  btrfs: Convert btrfs_migratepage to migrate_folio
  mm/migrate: Add filemap_migrate_folio()
  mm/migrate: Convert migrate_page() to migrate_folio()
  nfs: Convert to migrate_folio
  btrfs: Convert btree_migratepage to migrate_folio
  mm/migrate: Convert expected_page_refs() to folio_expected_refs()
  mm/migrate: Convert buffer_migrate_page() to buffer_migrate_folio()
  ...

3 years agotools/thermal: Fix possible path truncations
Florian Fainelli [Mon, 25 Jul 2022 17:37:54 +0000 (10:37 -0700)] 
tools/thermal: Fix possible path truncations

A build with -D_FORTIFY_SOURCE=2 enabled will produce the following warnings:

sysfs.c:63:30: warning: '%s' directive output may be truncated writing up to 255 bytes into a region of size between 0 and 255 [-Wformat-truncation=]
  snprintf(filepath, 256, "%s/%s", path, filename);
                              ^~
Bump up the buffer to PATH_MAX which is the limit and account for all of
the possible NUL and separators that could lead to exceeding the
allocated buffer sizes.

Fixes: 94f69966faf8 ("tools/thermal: Introduce tmon, a tool for thermal subsystem")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
3 years agothermal: Drop obsolete dependency on COMPILE_TEST
Jean Delvare [Sun, 31 Jul 2022 12:13:52 +0000 (14:13 +0200)] 
thermal: Drop obsolete dependency on COMPILE_TEST

Since commit 0166dc11be91 ("of: make CONFIG_OF user selectable"), it
is possible to test-build any driver which depends on OF on any
architecture by explicitly selecting OF. Therefore depending on
COMPILE_TEST as an alternative is no longer needed.

It is actually better to always build such drivers with OF enabled,
so that the test builds are closer to how each driver will actually be
built on its intended target. Building them without OF may not test
much as the compiler will optimize out potentially large parts of the
code. In the worst case, this could even pop false positive warnings.
Dropping COMPILE_TEST here improves the quality of our testing and
avoids wasting time on non-existent issues.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
3 years agothermal: sysfs: Fix cooling_device_stats_setup() error code path
Rafael J. Wysocki [Fri, 29 Jul 2022 15:39:07 +0000 (17:39 +0200)] 
thermal: sysfs: Fix cooling_device_stats_setup() error code path

If cooling_device_stats_setup() fails to create the stats object, it
must clear the last slot in cooling_device_attr_groups that was
initially empty (so as to make it possible to add stats attributes to
the cooling device attribute groups).

Failing to do so may cause the stats attributes to be created by
mistake for a device that doesn't have a stats object, because the
slot in question might be populated previously during the registration
of another cooling device.

Fixes: 8ea229511e06 ("thermal: Add cooling device's statistics in sysfs")
Reported-by: Di Shen <di.shen@unisoc.com>
Tested-by: Di Shen <di.shen@unisoc.com>
Cc: 4.17+ <stable@vger.kernel.org> # 4.17+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
3 years agothermal: intel: Add TCC cooling support for Alder Lake-N and Raptor Lake-P
Sumeet Pawnikar [Thu, 28 Jul 2022 17:54:56 +0000 (23:24 +0530)] 
thermal: intel: Add TCC cooling support for Alder Lake-N and Raptor Lake-P

Add Alder Lake-N and Raptor Lake-P to the list of processor models
supported by the Intel TCC cooling driver.

Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
3 years agoMerge tag 'xarray-6.0' of git://git.infradead.org/users/willy/xarray
Linus Torvalds [Wed, 3 Aug 2022 17:02:28 +0000 (10:02 -0700)] 
Merge tag 'xarray-6.0' of git://git.infradead.org/users/willy/xarray

Pull XArray/IDR updates from Matthew Wilcox:

 - Add appropriate might_alloc() annotations to the XArray APIs

 - Document that the IDR is deprecated

* tag 'xarray-6.0' of git://git.infradead.org/users/willy/xarray:
  IDR: Note that the IDR API is deprecated
  XArray: Add calls to might_alloc()

3 years agoMerge tag 'cgroup-for-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Linus Torvalds [Wed, 3 Aug 2022 16:45:08 +0000 (09:45 -0700)] 
Merge tag 'cgroup-for-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:
 "Several core optimizations:

   - threadgroup_rwsem write locking is skipped when configuring
     controllers in empty subtrees.

     Combined with CLONE_INTO_CGROUP, this allows the common static
     usage pattern to not grab threadgroup_rwsem at all (glibc still
     doesn't seem ready for CLONE_INTO_CGROUP unfortunately).

   - threadgroup_rwsem used to be put into non-percpu mode by default
     due to latency concerns in specific use cases. There's no reason
     for everyone else to pay for it. Make the behavior optional.

   - psi no longer allocates memory when disabled.

  ... along with some code cleanups"

* tag 'cgroup-for-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Skip subtree root in cgroup_update_dfl_csses()
  cgroup: remove "no" prefixed mount options
  cgroup: Make !percpu threadgroup_rwsem operations optional
  cgroup: Add "no" prefixed mount options
  cgroup: Elide write-locking threadgroup_rwsem when updating csses on an empty subtree
  cgroup.c: remove redundant check for mixable cgroup in cgroup_migrate_vet_dst
  cgroup.c: add helper __cset_cgroup_from_root to cleanup duplicated codes
  psi: dont alloc memory for psi by default

3 years agoperf stat: Refactor __run_perf_stat() common code
Adrián Herrera Arcila [Fri, 29 Jul 2022 16:12:43 +0000 (16:12 +0000)] 
perf stat: Refactor __run_perf_stat() common code

This extracts common code from the branches of the forks if-then-else.

enable_counters(), which was at the beginning of both branches of the
conditional, is now unconditional; evlist__start_workload() is extracted
to a different if, which enables making the common clocking code
unconditional.

Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Adrián Herrera Arcila <adrian.herrera@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/r/20220729161244.10522-1-adrian.herrera@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agocpuidle: Add cpu_idle_miss trace event
Kajetan Puchalski [Tue, 26 Jul 2022 10:24:04 +0000 (11:24 +0100)] 
cpuidle: Add cpu_idle_miss trace event

Add a trace event for cpuidle to track missed (too deep or too shallow)
wakeups.

After each wakeup, CPUIdle already computes whether the entered state was
optimal, above or below the desired one and updates the relevant
counters. This patch makes it possible to trace those events in addition
to just reading the counters.

The patterns of types and percentages of misses across different
workloads appear to be very consistent. This makes the trace event very
useful for comparing the relative correctness of different CPUIdle
governors for different types of workloads, or for finding the
optimal governor for a given device.

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
3 years agoMerge tag 'opp-updates-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Rafael J. Wysocki [Wed, 3 Aug 2022 15:49:38 +0000 (17:49 +0200)] 
Merge tag 'opp-updates-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm

Pull operating performance points (OPP) updates for 5.20-rc1 from Viresh
Kumar:

"- Make dev_pm_opp_set_regulators() accept NULL terminated list (Viresh
   Kumar).

 - Add dev_pm_opp_set_config() and friends and migrate other
   users/helpers to using them (Viresh Kumar).

 - Add support for multiple clocks for a device (Viresh Kumar and
   Krzysztof Kozlowski).

 - Configure resources before adding OPP table for Venus (Stanimir
   Varbanov).

 - Keep reference count up for opp->np and opp_table->np while they are
   still in use (Liang He).

 - Minor cleanups (Viresh Kumar and Yang Li)."

* tag 'opp-updates-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: (43 commits)
  venus: pm_helpers: Fix warning in OPP during probe
  OPP: Don't drop opp->np reference while it is still in use
  OPP: Don't drop opp_table->np reference while it is still in use
  OPP: Remove dev{m}_pm_opp_of_add_table_noclk()
  PM / devfreq: tegra30: Register config_clks helper
  OPP: Allow config_clks helper for single clk case
  OPP: Provide a simple implementation to configure multiple clocks
  OPP: Assert clk_count == 1 for single clk helpers
  OPP: Add key specific assert() method to key finding helpers
  OPP: Compare bandwidths for all paths in _opp_compare_key()
  OPP: Allow multiple clocks for a device
  dt-bindings: opp: accept array of frequencies
  OPP: Make dev_pm_opp_set_opp() independent of frequency
  OPP: Reuse _opp_compare_key() in _opp_add_static_v2()
  OPP: Remove rate_not_available parameter to _opp_add()
  OPP: Use consistent names for OPP table instances
  OPP: Use generic key finding helpers for bandwidth key
  OPP: Use generic key finding helpers for level key
  OPP: Add generic key finding helpers and use them for freq APIs
  OPP: Remove dev_pm_opp_find_freq_ceil_by_volt()
  ...

3 years agoMerge tag 'cpufreq-arm-updates-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Rafael J. Wysocki [Wed, 3 Aug 2022 15:47:45 +0000 (17:47 +0200)] 
Merge tag 'cpufreq-arm-updates-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm

Pull cpufreq/ARM updates for 5.20-rc1 from Viresh Kumar:

"- Fix return error code in mtk_cpu_dvfs_info_init (Yang Yingliang).

 - Minor cleanups and support for new boards for Qcom cpufreq drivers
   (Bryan O'Donoghue, Konrad Dybcio, Pierre Gondois, and Yicong Yang).

 - Fix sparse warnings for Tegra driver (Viresh Kumar)."

* tag 'cpufreq-arm-updates-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  cpufreq: tegra194: Staticize struct tegra_cpufreq_soc instances
  dt-bindings: cpufreq: cpufreq-qcom-hw: Add SM6375 compatible
  dt-bindings: opp: Add msm8939 to the compatible list
  dt-bindings: opp: Add missing compat devices
  dt-bindings: opp: opp-v2-kryo-cpu: Fix example binding checks
  cpufreq: Change order of online() CB and policy->cpus modification
  cpufreq: qcom-hw: Remove deprecated irq_set_affinity_hint() call
  cpufreq: qcom-hw: Disable LMH irq when disabling policy
  cpufreq: qcom-hw: Reset cancel_throttle when policy is re-enabled
  cpufreq: qcom-cpufreq-hw: use HZ_PER_KHZ macro in units.h
  cpufreq: mediatek: fix error return code in mtk_cpu_dvfs_info_init()

3 years agofs/ntfs3: Make ni_ins_new_attr return error
Konstantin Komarov [Wed, 13 Jul 2022 16:18:19 +0000 (19:18 +0300)] 
fs/ntfs3: Make ni_ins_new_attr return error

Function ni_ins_new_attr now returns ERR_PTR(err),
so we check it now in other functions like ni_expand_mft_list

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 years agofs/ntfs3: Create MFT zone only if length is large enough
Konstantin Komarov [Wed, 13 Jul 2022 16:11:15 +0000 (19:11 +0300)] 
fs/ntfs3: Create MFT zone only if length is large enough

Also removed uninformative print

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 years agofs/ntfs3: Refactoring attr_insert_range to restore after errors
Konstantin Komarov [Wed, 13 Jul 2022 15:44:24 +0000 (18:44 +0300)] 
fs/ntfs3: Refactoring attr_insert_range to restore after errors

Added done and undo labels for restoring after errors

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 years agofs/ntfs3: Refactoring attr_punch_hole to restore after errors
Konstantin Komarov [Wed, 13 Jul 2022 15:18:48 +0000 (18:18 +0300)] 
fs/ntfs3: Refactoring attr_punch_hole to restore after errors

Added comments to code
Added new function run_clone to make a copy of run
Added done and undo labels for restoring after errors

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 years agofs/ntfs3: Refactoring attr_set_size to restore after errors
Konstantin Komarov [Wed, 13 Jul 2022 15:00:03 +0000 (18:00 +0300)] 
fs/ntfs3: Refactoring attr_set_size to restore after errors

Added comments to code
Added two undo labels for restoring after errors

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 years agofs/ntfs3: New function ntfs_bad_inode
Konstantin Komarov [Wed, 13 Jul 2022 14:55:27 +0000 (17:55 +0300)] 
fs/ntfs3: New function ntfs_bad_inode

There are repetitive steps in case of bad inode
This commit wraps them in function

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 years agofs/ntfs3: Make MFT zone less fragmented
Konstantin Komarov [Thu, 7 Jul 2022 16:25:43 +0000 (19:25 +0300)] 
fs/ntfs3: Make MFT zone less fragmented

Now we take free space after the MFT zone if the MFT zone shrinks.

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 years agofs/ntfs3: Check possible errors in run_pack in advance
Konstantin Komarov [Wed, 6 Jul 2022 17:17:28 +0000 (20:17 +0300)] 
fs/ntfs3: Check possible errors in run_pack in advance

Checking in advance speeds things up in some cases.

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 years agofs/ntfs3: Added comments to frecord functions
Konstantin Komarov [Wed, 6 Jul 2022 17:01:11 +0000 (20:01 +0300)] 
fs/ntfs3: Added comments to frecord functions

Added some comments in frecord.c for more context.
Also changed run_lookup to static because it's an internal function.

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 years agofs/ntfs3: Fill duplicate info in ni_add_name
Konstantin Komarov [Fri, 1 Jul 2022 12:18:29 +0000 (15:18 +0300)] 
fs/ntfs3: Fill duplicate info in ni_add_name

Work with names must be completed in ni_add_name

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 years agofs/ntfs3: Make static function attr_load_runs
Konstantin Komarov [Fri, 1 Jul 2022 12:15:32 +0000 (15:15 +0300)] 
fs/ntfs3: Make static function attr_load_runs

attr_load_runs is an internal function

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 years agofs/ntfs3: Add new argument is_mft to ntfs_mark_rec_free
Konstantin Komarov [Thu, 30 Jun 2022 16:14:43 +0000 (19:14 +0300)] 
fs/ntfs3: Add new argument is_mft to ntfs_mark_rec_free

This argument helps in avoiding double locking

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 years agofs/ntfs3: Remove unused mi_mark_free
Konstantin Komarov [Thu, 30 Jun 2022 15:49:24 +0000 (18:49 +0300)] 
fs/ntfs3: Remove unused mi_mark_free

Cleaning up dead code
Fix wrong comments

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 years agofs/ntfs3: Fix very fragmented case in attr_punch_hole
Konstantin Komarov [Thu, 30 Jun 2022 15:31:26 +0000 (18:31 +0300)] 
fs/ntfs3: Fix very fragmented case in attr_punch_hole

In some cases we need to ni_find_attr attr_b

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 years agofs/ntfs3: Fix work with fragmented xattr
Konstantin Komarov [Fri, 13 May 2022 16:54:23 +0000 (19:54 +0300)] 
fs/ntfs3: Fix work with fragmented xattr

In some cases xattr is too fragmented,
so we need to load it before writing.

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 years agofs/ntfs3: Make ntfs_fallocate return -ENOSPC instead of -EFBIG
Konstantin Komarov [Fri, 13 May 2022 16:21:54 +0000 (19:21 +0300)] 
fs/ntfs3: Make ntfs_fallocate return -ENOSPC instead of -EFBIG

In some cases we need to return ENOSPC
Fixes xfstest generic/213
Fixes: 114346978cf6 ("fs/ntfs3: Check new size for limits")
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 years agofs/ntfs3: extend ni_insert_nonresident to return inserted ATTR_LIST_ENTRY
Konstantin Komarov [Fri, 13 May 2022 15:25:04 +0000 (18:25 +0300)] 
fs/ntfs3: extend ni_insert_nonresident to return inserted ATTR_LIST_ENTRY

Fixes xfstest generic/300
Fixes: 4534a70b7056 ("fs/ntfs3: Add headers and misc files")
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 years agofs/ntfs3: Check reserved size for maximum allowed
Konstantin Komarov [Thu, 12 May 2022 16:18:11 +0000 (19:18 +0300)] 
fs/ntfs3: Check reserved size for maximum allowed

Also don't mask EFBIG
Fixes xfstest generic/485
Fixes: 4342306f0f0d ("fs/ntfs3: Add file operations and implementation")
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 years agofs/ntfs3: Do not change mode if ntfs_set_ea failed
Konstantin Komarov [Thu, 12 May 2022 16:08:40 +0000 (19:08 +0300)] 
fs/ntfs3: Do not change mode if ntfs_set_ea failed

ntfs_set_ea can fail with NOSPC, so we don't need to
change mode in this situation.
Fixes xfstest generic/449
Fixes: be71b5cba2e6 ("fs/ntfs3: Add attrib operations")
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 years agomailbox: imx: clear pending interrupts
Peng Fan [Wed, 3 Aug 2022 07:53:26 +0000 (15:53 +0800)] 
mailbox: imx: clear pending interrupts

During MU initialization, there maybe pending GSR and RSR pending
interrupt, clear them to avoid unexpected kernel dump when requesting
mailbox channel

Reviewed-by: Jacky Bai <ping.bai@nxp.com>
Reviewed-by: Ye Li <ye.li@nxp.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
3 years agoio_uring: pass correct parameters to io_req_set_res
Ming Lei [Wed, 3 Aug 2022 12:07:57 +0000 (20:07 +0800)] 
io_uring: pass correct parameters to io_req_set_res

The two parameters of 'res' and 'cflags' are swapped, so fix it.
Without this fix, 'ublk del' hangs forever.

Cc: Pavel Begunkov <asml.silence@gmail.com>
Fixes: de23077eda61f ("io_uring: set completion results upfront")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220803120757.1668278-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agomodpost: use more reliable way to get fromsec in section_rel(a)()
Masahiro Yamada [Sat, 30 Jul 2022 17:36:35 +0000 (02:36 +0900)] 
modpost: use more reliable way to get fromsec in section_rel(a)()

The section name of Rel and Rela starts with ".rel" and ".rela"
respectively (but, I do not know whether this is specification or
convention).

For example, ".rela.text" holds relocation entries applied to the
".text" section.

So, the code chops the ".rel" or ".rela" prefix to get the name of
the section to which the relocation applies.

However, I do not like to skip 4 or 5 bytes blindly because it is
potential memory overrun.

The ELF specification provides a more reliable way to do this.

 - The sh_info field holds extra information, whose interpretation
   depends on the section type

 - If the section type is SHT_REL or SHT_RELA, the sh_info field holds
   the section header index of the section to which the relocation
   applies.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
3 years agomodpost: add array range check to sec_name()
Masahiro Yamada [Sat, 30 Jul 2022 17:36:34 +0000 (02:36 +0900)] 
modpost: add array range check to sec_name()

The section index is always positive, so the argument, secindex, should
be unsigned.

Also, inserted the array range check.

If sym->st_shndx is a special section index (between SHN_LORESERVE and
SHN_HIRESERVE), there is no corresponding section header.

For example, if a symbol specifies an absolute value, sym->st_shndx is
SHN_ABS (=0xfff1).

The current users do not cause the out-of-range access of
info->sechddrs[], but it is better to avoid such a pitfall.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
3 years agomodpost: refactor get_secindex()
Masahiro Yamada [Wed, 3 Aug 2022 13:50:13 +0000 (22:50 +0900)] 
modpost: refactor get_secindex()

SPECIAL() is only used in get_secindex(). Squash it.

Make the code more readable with more comments.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
3 years agokbuild: set EXIT trap before creating temporary directory
Masahiro Yamada [Thu, 28 Jul 2022 03:14:33 +0000 (12:14 +0900)] 
kbuild: set EXIT trap before creating temporary directory

Swap the order of 'mkdir' and 'trap' just in case the subshell is
interrupted between 'mkdir' and 'trap' although the effect might be
subtle.

This does not intend to make the cleanup perfect. There are more cases
that miss to remove the tmp directory, for example:

 - When interrupted, dash does not invoke the EXIT trap (bash does)

 - 'rm' command might be interrupted before removing the directory

I am not addressing all the cases since the tmp directory is harmless
after all.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
3 years agovideo: fbdev: i740fb: Check the argument of i740_calc_vclk()
Zheyu Ma [Wed, 3 Aug 2022 09:24:19 +0000 (17:24 +0800)] 
video: fbdev: i740fb: Check the argument of i740_calc_vclk()

Since the user can control the arguments of the ioctl() from the user
space, under special arguments that may result in a divide-by-zero bug.

If the user provides an improper 'pixclock' value that makes the argumet
of i740_calc_vclk() less than 'I740_RFREQ_FIX', it will cause a
divide-by-zero bug in:
    drivers/video/fbdev/i740fb.c:353 p_best = min(15, ilog2(I740_MAX_VCO_FREQ / (freq / I740_RFREQ_FIX)));

The following log can reveal it:

divide error: 0000 [#1] PREEMPT SMP KASAN PTI
RIP: 0010:i740_calc_vclk drivers/video/fbdev/i740fb.c:353 [inline]
RIP: 0010:i740fb_decode_var drivers/video/fbdev/i740fb.c:646 [inline]
RIP: 0010:i740fb_set_par+0x163f/0x3b70 drivers/video/fbdev/i740fb.c:742
Call Trace:
 fb_set_var+0x604/0xeb0 drivers/video/fbdev/core/fbmem.c:1034
 do_fb_ioctl+0x234/0x670 drivers/video/fbdev/core/fbmem.c:1110
 fb_ioctl+0xdd/0x130 drivers/video/fbdev/core/fbmem.c:1189

Fix this by checking the argument of i740_calc_vclk() first.

Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
3 years agovideo: fbdev: arkfb: Fix a divide-by-zero bug in ark_set_pixclock()
Zheyu Ma [Wed, 3 Aug 2022 09:23:12 +0000 (17:23 +0800)] 
video: fbdev: arkfb: Fix a divide-by-zero bug in ark_set_pixclock()

Since the user can control the arguments of the ioctl() from the user
space, under special arguments that may result in a divide-by-zero bug
in:
  drivers/video/fbdev/arkfb.c:784: ark_set_pixclock(info, (hdiv * info->var.pixclock) / hmul);
with hdiv=1, pixclock=1 and hmul=2 you end up with (1*1)/2 = (int) 0.
and then in:
  drivers/video/fbdev/arkfb.c:504: rv = dac_set_freq(par->dac, 0, 1000000000 / pixclock);
we'll get a division-by-zero.

The following log can reveal it:

divide error: 0000 [#1] PREEMPT SMP KASAN PTI
RIP: 0010:ark_set_pixclock drivers/video/fbdev/arkfb.c:504 [inline]
RIP: 0010:arkfb_set_par+0x10fc/0x24c0 drivers/video/fbdev/arkfb.c:784
Call Trace:
 fb_set_var+0x604/0xeb0 drivers/video/fbdev/core/fbmem.c:1034
 do_fb_ioctl+0x234/0x670 drivers/video/fbdev/core/fbmem.c:1110
 fb_ioctl+0xdd/0x130 drivers/video/fbdev/core/fbmem.c:1189

Fix this by checking the argument of ark_set_pixclock() first.

Fixes: 681e14730c73 ("arkfb: new framebuffer driver for ARK Logic cards")
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
3 years agox86/speculation: Add LFENCE to RSB fill sequence
Pawan Gupta [Tue, 2 Aug 2022 22:47:02 +0000 (15:47 -0700)] 
x86/speculation: Add LFENCE to RSB fill sequence

RSB fill sequence does not have any protection for miss-prediction of
conditional branch at the end of the sequence. CPU can speculatively
execute code immediately after the sequence, while RSB filling hasn't
completed yet.

  #define __FILL_RETURN_BUFFER(reg, nr, sp)       \
          mov     $(nr/2), reg;                   \
  771:                                            \
          ANNOTATE_INTRA_FUNCTION_CALL;           \
          call    772f;                           \
  773:    /* speculation trap */                  \
          UNWIND_HINT_EMPTY;                      \
          pause;                                  \
          lfence;                                 \
          jmp     773b;                           \
  772:                                            \
          ANNOTATE_INTRA_FUNCTION_CALL;           \
          call    774f;                           \
  775:    /* speculation trap */                  \
          UNWIND_HINT_EMPTY;                      \
          pause;                                  \
          lfence;                                 \
          jmp     775b;                           \
  774:                                            \
          add     $(BITS_PER_LONG/8) * 2, sp;     \
          dec     reg;                            \
          jnz     771b;        <----- CPU can miss-predict here.

Before RSB is filled, RETs that come in program order after this macro
can be executed speculatively, making them vulnerable to RSB-based
attacks.

Mitigate it by adding an LFENCE after the conditional branch to prevent
speculation while RSB is being filled.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
3 years agolibceph: clean up ceph_osdc_start_request prototype
Jeff Layton [Thu, 30 Jun 2022 20:21:50 +0000 (16:21 -0400)] 
libceph: clean up ceph_osdc_start_request prototype

This function always returns 0, and ignores the nofail boolean. Drop the
nofail argument, make the function void return and fix up the callers.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
3 years agomodpost: remove unused Elf_Sword macro
Masahiro Yamada [Tue, 26 Jul 2022 16:52:04 +0000 (01:52 +0900)] 
modpost: remove unused Elf_Sword macro

Commit 9ad21c3f3ecf ("kbuild: try harder to find symbol names in
modpost") added Elf_Sword (in a wrong way), but did not use it at all.

BTW, the current code looks weird.

The fix for the 32-bit part would be:

    Elf64_Sword  -->  Elf32_Sword

(inconsistet prefix, Elf32_ vs Elf64_)

The fix for the 64-bit part would be:

    Elf64_Sxword  -->  Elf64_Sword

(the size is different between Sword and Sxword)

Note:

    Elf32_Sword   ==  Elf64_Sword   ==  int32_t
    Elf32_Sxword  ==  Elf64_Sxword  ==  int64_t

Anyway, let's drop unused code instead of fixing it.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
3 years agoMakefile.extrawarn: re-enable -Wformat for clang
Justin Stitt [Wed, 20 Jul 2022 23:23:32 +0000 (16:23 -0700)] 
Makefile.extrawarn: re-enable -Wformat for clang

There's been an ongoing mission to re-enable the -Wformat warning for
Clang. A previous attempt at enabling the warning showed that there were
many instances of this warning throughout the codebase. The sheer amount
of these warnings really polluted builds and thus -Wno-format was added
to _temporarily_ toggle them off.

After many patches the warning has largely been eradicated for x86,
x86_64, arm, and arm64 on a variety of configs. The time to enable the
warning has never been better as it seems for the first time we are
ahead of them and can now solve them as they appear rather than tackling
from a backlog.

As to the root cause of this large backlog of warnings, Clang seems to
pickup on some more nuanced cases of format warnings caused by implicit
integer conversion as well as default argument promotions from
printf-like functions.

Link: https://github.com/ClangBuiltLinux/linux/issues/378
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
3 years agox86/numa: Use cpumask_available instead of hardcoded NULL check
Siddh Raman Pant [Sun, 31 Jul 2022 16:09:13 +0000 (21:39 +0530)] 
x86/numa: Use cpumask_available instead of hardcoded NULL check

GCC-12 started triggering a new warning:

  arch/x86/mm/numa.c: In function ‘cpumask_of_node’:
  arch/x86/mm/numa.c:916:39: warning: the comparison will always evaluate as ‘false’ for the address of ‘node_to_cpumask_map’ will never be NULL [-Waddress]
    916 |         if (node_to_cpumask_map[node] == NULL) {
        |                                       ^~

node_to_cpumask_map is of type cpumask_var_t[].

When CONFIG_CPUMASK_OFFSTACK is set, cpumask_var_t is typedef'd to a
pointer for dynamic allocation, else to an array of one element. The
"wicked game" can be checked on line 700 of include/linux/cpumask.h.

The original code in debug_cpumask_set_cpu() and cpumask_of_node() were
probably written by the original authors with CONFIG_CPUMASK_OFFSTACK=y
(i.e. dynamic allocation) in mind, checking if the cpumask was available
via a direct NULL check.

When CONFIG_CPUMASK_OFFSTACK is not set, GCC gives the above warning
while compiling the kernel.

Fix that by using cpumask_available(), which does the NULL check when
CONFIG_CPUMASK_OFFSTACK is set, otherwise returns true. Use it wherever
such checks are made.

Conditional definitions of cpumask_available() can be found along with
the definition of cpumask_var_t. Check the cpumask.h reference mentioned
above.

Fixes: c032ef60d1aa ("cpumask: convert node_to_cpumask_map[] to cpumask_var_t")
Fixes: de2d9445f162 ("x86: Unify node_to_cpumask_map handling between 32 and 64bit")
Signed-off-by: Siddh Raman Pant <code@siddh.me>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20220731160913.632092-1-code@siddh.me
3 years agox86/speculation: Add RSB VM Exit protections
Daniel Sneddon [Tue, 2 Aug 2022 22:47:01 +0000 (15:47 -0700)] 
x86/speculation: Add RSB VM Exit protections

tl;dr: The Enhanced IBRS mitigation for Spectre v2 does not work as
documented for RET instructions after VM exits. Mitigate it with a new
one-entry RSB stuffing mechanism and a new LFENCE.

== Background ==

Indirect Branch Restricted Speculation (IBRS) was designed to help
mitigate Branch Target Injection and Speculative Store Bypass, i.e.
Spectre, attacks. IBRS prevents software run in less privileged modes
from affecting branch prediction in more privileged modes. IBRS requires
the MSR to be written on every privilege level change.

To overcome some of the performance issues of IBRS, Enhanced IBRS was
introduced.  eIBRS is an "always on" IBRS, in other words, just turn
it on once instead of writing the MSR on every privilege level change.
When eIBRS is enabled, more privileged modes should be protected from
less privileged modes, including protecting VMMs from guests.

== Problem ==

Here's a simplification of how guests are run on Linux' KVM:

void run_kvm_guest(void)
{
// Prepare to run guest
VMRESUME();
// Clean up after guest runs
}

The execution flow for that would look something like this to the
processor:

1. Host-side: call run_kvm_guest()
2. Host-side: VMRESUME
3. Guest runs, does "CALL guest_function"
4. VM exit, host runs again
5. Host might make some "cleanup" function calls
6. Host-side: RET from run_kvm_guest()

Now, when back on the host, there are a couple of possible scenarios of
post-guest activity the host needs to do before executing host code:

* on pre-eIBRS hardware (legacy IBRS, or nothing at all), the RSB is not
touched and Linux has to do a 32-entry stuffing.

* on eIBRS hardware, VM exit with IBRS enabled, or restoring the host
IBRS=1 shortly after VM exit, has a documented side effect of flushing
the RSB except in this PBRSB situation where the software needs to stuff
the last RSB entry "by hand".

IOW, with eIBRS supported, host RET instructions should no longer be
influenced by guest behavior after the host retires a single CALL
instruction.

However, if the RET instructions are "unbalanced" with CALLs after a VM
exit as is the RET in #6, it might speculatively use the address for the
instruction after the CALL in #3 as an RSB prediction. This is a problem
since the (untrusted) guest controls this address.

Balanced CALL/RET instruction pairs such as in step #5 are not affected.

== Solution ==

The PBRSB issue affects a wide variety of Intel processors which
support eIBRS. But not all of them need mitigation. Today,
X86_FEATURE_RSB_VMEXIT triggers an RSB filling sequence that mitigates
PBRSB. Systems setting RSB_VMEXIT need no further mitigation - i.e.,
eIBRS systems which enable legacy IBRS explicitly.

However, such systems (X86_FEATURE_IBRS_ENHANCED) do not set RSB_VMEXIT
and most of them need a new mitigation.

Therefore, introduce a new feature flag X86_FEATURE_RSB_VMEXIT_LITE
which triggers a lighter-weight PBRSB mitigation versus RSB_VMEXIT.

The lighter-weight mitigation performs a CALL instruction which is
immediately followed by a speculative execution barrier (INT3). This
steers speculative execution to the barrier -- just like a retpoline
-- which ensures that speculation can never reach an unbalanced RET.
Then, ensure this CALL is retired before continuing execution with an
LFENCE.

In other words, the window of exposure is opened at VM exit where RET
behavior is troublesome. While the window is open, force RSB predictions
sampling for RET targets to a dead end at the INT3. Close the window
with the LFENCE.

There is a subset of eIBRS systems which are not vulnerable to PBRSB.
Add these systems to the cpu_vuln_whitelist[] as NO_EIBRS_PBRSB.
Future systems that aren't vulnerable will set ARCH_CAP_PBRSB_NO.

  [ bp: Massage, incorporate review comments from Andy Cooper. ]

Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
3 years agosched/rt: Fix Sparse warnings due to undefined rt.c declarations
Ben Dooks [Thu, 21 Jul 2022 14:51:55 +0000 (15:51 +0100)] 
sched/rt: Fix Sparse warnings due to undefined rt.c declarations

There are several symbols defined in kernel/sched/sched.h but get wrapped
in CONFIG_CGROUP_SCHED, even though dummy versions get built in rt.c and
therefore trigger Sparse warnings:

  kernel/sched/rt.c:309:6: warning: symbol 'unregister_rt_sched_group' was not declared. Should it be static?
  kernel/sched/rt.c:311:6: warning: symbol 'free_rt_sched_group' was not declared. Should it be static?
  kernel/sched/rt.c:313:5: warning: symbol 'alloc_rt_sched_group' was not declared. Should it be static?

Fix this by moving them outside the CONFIG_CGROUP_SCHED block.

[ mingo: Refreshed to the latest scheduler tree, tweaked changelog. ]

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20220721145155.358366-1-ben-linux@fluff.org
3 years agovideo:backlight: remove reference to AVR32 architecture in ltv350qv
Hans-Christian Noren Egtvedt [Sat, 20 Oct 2018 10:40:16 +0000 (12:40 +0200)] 
video:backlight: remove reference to AVR32 architecture in ltv350qv

The AVR32 architecture does no longer exist in the Linux kernel, hence
remove a reference to also non-existing Linux BSP CD from Atmel.

Signed-off-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
3 years agovideo: remove support for non-existing atmel,at32ap-lcdc in atmel_lcdfb
Hans-Christian Noren Egtvedt [Sun, 5 Nov 2017 09:43:11 +0000 (10:43 +0100)] 
video: remove support for non-existing atmel,at32ap-lcdc in atmel_lcdfb

The AVR32 architecture has been removed from the kernel in commit
26202873bb51fafdaa51be3e8de7aab9beb49f70, hence clean out the
atmel,at32ap-lcdc parts in the atmel_lcdfb.c video driver.

AVR32 architecture never supported device tree, hence this code was not
used by anybody.

Signed-off-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
3 years agousb:udc: remove reference to AVR32 architecture in Atmel USBA Kconfig
Hans-Christian Noren Egtvedt [Sat, 20 Oct 2018 10:39:09 +0000 (12:39 +0200)] 
usb:udc: remove reference to AVR32 architecture in Atmel USBA Kconfig

The AVR32 architecture does no longer exist in the Linux kernel, hence
remove a reference to it in Kconfig help text to avoid confusion.

Signed-off-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
3 years agosound:spi: remove reference to AVR32 in Atmel AT73C213 DAC driver
Hans-Christian Noren Egtvedt [Sat, 20 Oct 2018 10:48:57 +0000 (12:48 +0200)] 
sound:spi: remove reference to AVR32 in Atmel AT73C213 DAC driver

The AVR32 architecture does no longer exist in the Linux kernel, hence
remove a reference to it in Kconfig help text to avoid confusion.

Signed-off-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
3 years agonet: remove cdns,at32ap7000-macb device tree entry
Hans-Christian Noren Egtvedt [Sun, 5 Nov 2017 10:19:53 +0000 (11:19 +0100)] 
net: remove cdns,at32ap7000-macb device tree entry

The AVR32 architecture has been removed from the kernel in commit
26202873bb51fafdaa51be3e8de7aab9beb49f70, hence clean out the
cdns,at32ap7000-macb compatible entry in Cadence macb Ethernet driver.

AVR32 architecture never supported device tree, hence this code was not
used by anybody.

Updated documentation to match the default entry, no users of
cdns,at32ap7000-macb in the kernel tree.

Signed-off-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
3 years agomisc: update maintainer email address and description for atmel-ssc
Hans-Christian Noren Egtvedt [Sat, 20 Oct 2018 10:33:59 +0000 (12:33 +0200)] 
misc: update maintainer email address and description for atmel-ssc

I have changed my overall maintainer email address to the samfundet.no
domain, hence update the atmel-ssc module to reflect that.

Also remove the AVR32 reference, since the AVR32 architecture no longer
exist in the Linux kernel.

Signed-off-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
3 years agomfd: remove reference to AVR32 architecture in atmel-smc.c
Hans-Christian Noren Egtvedt [Sat, 20 Oct 2018 10:30:17 +0000 (12:30 +0200)] 
mfd: remove reference to AVR32 architecture in atmel-smc.c

The AVR32 architecture does no longer exist in the Linux kernel, hence
remove a reference to it in comments to avoid confusion.

Signed-off-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
3 years agodma:dw: remove reference to AVR32 architecture in core.c
Hans-Christian Noren Egtvedt [Sat, 20 Oct 2018 10:30:17 +0000 (12:30 +0200)] 
dma:dw: remove reference to AVR32 architecture in core.c

The AVR32 architecture does no longer exist in the Linux kernel, hence
remove a reference to it in comments to avoid confusion.

Signed-off-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
3 years agoexit: Fix typo in comment: s/sub-theads/sub-threads
Ingo Molnar [Wed, 3 Aug 2022 08:43:42 +0000 (10:43 +0200)] 
exit: Fix typo in comment: s/sub-theads/sub-threads

Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
3 years agosched, cpuset: Fix dl_cpu_busy() panic due to empty cs->cpus_allowed
Waiman Long [Wed, 3 Aug 2022 01:54:51 +0000 (21:54 -0400)] 
sched, cpuset: Fix dl_cpu_busy() panic due to empty cs->cpus_allowed

With cgroup v2, the cpuset's cpus_allowed mask can be empty indicating
that the cpuset will just use the effective CPUs of its parent. So
cpuset_can_attach() can call task_can_attach() with an empty mask.
This can lead to cpumask_any_and() returns nr_cpu_ids causing the call
to dl_bw_of() to crash due to percpu value access of an out of bound
CPU value. For example:

[80468.182258] BUG: unable to handle page fault for address: ffffffff8b6648b0
  :
[80468.191019] RIP: 0010:dl_cpu_busy+0x30/0x2b0
  :
[80468.207946] Call Trace:
[80468.208947]  cpuset_can_attach+0xa0/0x140
[80468.209953]  cgroup_migrate_execute+0x8c/0x490
[80468.210931]  cgroup_update_dfl_csses+0x254/0x270
[80468.211898]  cgroup_subtree_control_write+0x322/0x400
[80468.212854]  kernfs_fop_write_iter+0x11c/0x1b0
[80468.213777]  new_sync_write+0x11f/0x1b0
[80468.214689]  vfs_write+0x1eb/0x280
[80468.215592]  ksys_write+0x5f/0xe0
[80468.216463]  do_syscall_64+0x5c/0x80
[80468.224287]  entry_SYSCALL_64_after_hwframe+0x44/0xae

Fix that by using effective_cpus instead. For cgroup v1, effective_cpus
is the same as cpus_allowed. For v2, effective_cpus is the real cpumask
to be used by tasks within the cpuset anyway.

Also update task_can_attach()'s 2nd argument name to cs_effective_cpus to
reflect the change. In addition, a check is added to task_can_attach()
to guard against the possibility that cpumask_any_and() may return a
value >= nr_cpu_ids.

Fixes: 7f51412a415d ("sched/deadline: Fix bandwidth check/update when migrating tasks between exclusive cpusets")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/r/20220803015451.2219567-1-longman@redhat.com
3 years agoMAINTAINERS: Use Lee Jones' kernel.org address for Backlight submissions
Lee Jones [Mon, 1 Aug 2022 15:53:51 +0000 (16:53 +0100)] 
MAINTAINERS: Use Lee Jones' kernel.org address for Backlight submissions

Going forward, I'll be using my kernel.org for upstream work.

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Lee Jones <lee@kernel.org>
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Paolo Abeni [Wed, 3 Aug 2022 06:50:42 +0000 (08:50 +0200)] 
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Conflicts:

net/ax25/af_ax25.c
  d7c4c9e075f8c ("ax25: fix incorrect dev_tracker usage")
  d62607c3fe459 ("net: rename reference+tracking helpers")

drivers/net/netdevsim/fib.c
  180a6a3ee60a ("netdevsim: fib: Fix reference count leak on route deletion failure")
  012ec02ae441 ("netdevsim: convert driver to use unlocked devlink API during init/fini")

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 years agopowerpc/64e: Fix kexec build error
Michael Ellerman [Wed, 3 Aug 2022 06:29:41 +0000 (16:29 +1000)] 
powerpc/64e: Fix kexec build error

When building ppc64_book3e_allmodconfig the build fails with:

  arch/powerpc/kexec/file_load_64.c:1063:14: error: implicit declaration of function ‘firmware_has_feature’
   1063 |         if (!firmware_has_feature(FW_FEATURE_LPAR))
        |              ^~~~~~~~~~~~~~~~~~~~

Add a direct include of asm/firmware.h to fix the error.

Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220803063152.1249270-1-mpe@ellerman.id.au
3 years agotty: serial: qcom-geni-serial: Fix %lu -> %u in print statements
Douglas Anderson [Tue, 2 Aug 2022 20:23:09 +0000 (13:23 -0700)] 
tty: serial: qcom-geni-serial: Fix %lu -> %u in print statements

When we multiply an unsigned int by a u32 we still end up with an
unsigned int. That means we should specify "%u" not "%lu" in the
format code.

NOTE: this fix was chosen instead of somehow promoting the value to
"unsigned long" since the max baud rate from the earlier call to
uart_get_baud_rate() is 4000000 and the max sampling rate is 32.
4000000 * 32 = 0x07a12000, not even close to overflowing 32-bits.

Fixes: c474c775716e ("tty: serial: qcom-geni-serial: Fix get_clk_div_rate() which otherwise could return a sub-optimal clock rate.")
Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20220802132250.1.Iea061e14157a17e114dbe2eca764568a02d6b889@changeid
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoxfrm: clone missing x->lastused in xfrm_do_migrate
Antony Antony [Wed, 27 Jul 2022 15:41:22 +0000 (17:41 +0200)] 
xfrm: clone missing x->lastused in xfrm_do_migrate

x->lastused was not cloned in xfrm_do_migrate. Add it to clone during
migrate.

Fixes: 80c9abaabf42 ("[XFRM]: Extension for dynamic update of endpoint address(es)")
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
3 years agoxfrm: fix XFRMA_LASTUSED comment
Antony Antony [Wed, 27 Jul 2022 15:40:53 +0000 (17:40 +0200)] 
xfrm: fix XFRMA_LASTUSED comment

It is a __u64, internally time64_t.

Fixes: bf825f81b454 ("xfrm: introduce basic mark infrastructure")
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
3 years agoRevert "xfrm: update SA curlft.use_time"
Antony Antony [Wed, 27 Jul 2022 15:38:35 +0000 (17:38 +0200)] 
Revert "xfrm: update SA curlft.use_time"

This reverts commit af734a26a1a95a9fda51f2abb0c22a7efcafd5ca.

The abvoce commit is a regression according RFC 2367. A better fix would be
use x->lastused. Which will be propsed later.

according to RFC 2367 use_time == sadb_lifetime_usetime.

"sadb_lifetime_usetime
                   For CURRENT, the time, in seconds, when association
                   was first used. For HARD and SOFT, the number of
                   seconds after the first use of the association until
                   it expires."

Fixes: af734a26a1a9 ("xfrm: update SA curlft.use_time")
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
3 years agodoc: sfp-phylink: Fix a broken reference
Christophe JAILLET [Sun, 31 Jul 2022 05:59:00 +0000 (07:59 +0200)] 
doc: sfp-phylink: Fix a broken reference

The commit in Fixes: has changed a .txt file into a .yaml file. Update the
documentation accordingly.

While at it add some `` around some file names to improve the output.

Fixes: 70991f1e6858 ("dt-bindings: net: convert sff,sfp to dtschema")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/be3c7e87ca7f027703247eccfe000b8e34805094.1659247114.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge tag 'amd-drm-next-5.20-2022-07-29' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Wed, 3 Aug 2022 04:00:18 +0000 (14:00 +1000)] 
Merge tag 'amd-drm-next-5.20-2022-07-29' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-5.20-2022-07-29:

amdgpu:
- Misc spelling and grammar fixes
- DC whitespace cleanups
- ACP smatch fix
- GFX 11.0 updates
- PSP 13.0 updates
- VCN 4.0 updates
- DC FP fix for PPC64
- Misc bug fixes

amdkfd:
- SVM fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220729202742.6636-1-alexander.deucher@amd.com
3 years agoext4: add ioctls to get/set the ext4 superblock uuid
Jeremy Bongio [Thu, 21 Jul 2022 22:44:22 +0000 (15:44 -0700)] 
ext4: add ioctls to get/set the ext4 superblock uuid

This fixes a race between changing the ext4 superblock uuid and operations
like mounting, resizing, changing features, etc.

Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jeremy Bongio <bongiojp@gmail.com>
Link: https://lore.kernel.org/r/20220721224422.438351-1-bongiojp@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: avoid resizing to a partial cluster size
Kiselev, Oleg [Wed, 20 Jul 2022 04:27:48 +0000 (04:27 +0000)] 
ext4: avoid resizing to a partial cluster size

This patch avoids an attempt to resize the filesystem to an
unaligned cluster boundary.  An online resize to a size that is not
integral to cluster size results in the last iteration attempting to
grow the fs by a negative amount, which trips a BUG_ON and leaves the fs
with a corrupted in-memory superblock.

Signed-off-by: Oleg Kiselev <okiselev@amazon.com>
Link: https://lore.kernel.org/r/0E92A0AB-4F16-4F1A-94B7-702CC6504FDE@amazon.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: reduce computation of overhead during resize
Kiselev, Oleg [Wed, 20 Jul 2022 04:26:22 +0000 (04:26 +0000)] 
ext4: reduce computation of overhead during resize

This patch avoids doing an O(n**2)-complexity walk through every flex group.
Instead, it uses the already computed overhead information for the newly
allocated space, and simply adds it to the previously calculated
overhead stored in the superblock.  This drastically reduces the time
taken to resize very large bigalloc filesystems (from 3+ hours for a
64TB fs down to milliseconds).

Signed-off-by: Oleg Kiselev <okiselev@amazon.com>
Link: https://lore.kernel.org/r/CE4F359F-4779-45E6-B6A9-8D67FDFF5AE2@amazon.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agojbd2: fix assertion 'jh->b_frozen_data == NULL' failure when journal aborted
Zhihao Cheng [Fri, 15 Jul 2022 12:51:52 +0000 (20:51 +0800)] 
jbd2: fix assertion 'jh->b_frozen_data == NULL' failure when journal aborted

Following process will fail assertion 'jh->b_frozen_data == NULL' in
jbd2_journal_dirty_metadata():

                   jbd2_journal_commit_transaction
unlink(dir/a)
 jh->b_transaction = trans1
 jh->b_jlist = BJ_Metadata
                    journal->j_running_transaction = NULL
                    trans1->t_state = T_COMMIT
unlink(dir/b)
 handle->h_trans = trans2
 do_get_write_access
  jh->b_modified = 0
  jh->b_frozen_data = frozen_buffer
  jh->b_next_transaction = trans2
 jbd2_journal_dirty_metadata
  is_handle_aborted
   is_journal_aborted // return false

           --> jbd2 abort <--

                     while (commit_transaction->t_buffers)
                      if (is_journal_aborted)
                       jbd2_journal_refile_buffer
                        __jbd2_journal_refile_buffer
                         WRITE_ONCE(jh->b_transaction,
jh->b_next_transaction)
                         WRITE_ONCE(jh->b_next_transaction, NULL)
                         __jbd2_journal_file_buffer(jh, BJ_Reserved)
        J_ASSERT_JH(jh, jh->b_frozen_data == NULL) // assertion failure !

The reproducer (See detail in [Link]) reports:
 ------------[ cut here ]------------
 kernel BUG at fs/jbd2/transaction.c:1629!
 invalid opcode: 0000 [#1] PREEMPT SMP
 CPU: 2 PID: 584 Comm: unlink Tainted: G        W
 5.19.0-rc6-00115-g4a57a8400075-dirty #697
 RIP: 0010:jbd2_journal_dirty_metadata+0x3c5/0x470
 RSP: 0018:ffffc90000be7ce0 EFLAGS: 00010202
 Call Trace:
  <TASK>
  __ext4_handle_dirty_metadata+0xa0/0x290
  ext4_handle_dirty_dirblock+0x10c/0x1d0
  ext4_delete_entry+0x104/0x200
  __ext4_unlink+0x22b/0x360
  ext4_unlink+0x275/0x390
  vfs_unlink+0x20b/0x4c0
  do_unlinkat+0x42f/0x4c0
  __x64_sys_unlink+0x37/0x50
  do_syscall_64+0x35/0x80

After journal aborting, __jbd2_journal_refile_buffer() is executed with
holding @jh->b_state_lock, we can fix it by moving 'is_handle_aborted()'
into the area protected by @jh->b_state_lock.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216251
Fixes: 470decc613ab20 ("[PATCH] jbd2: initial copy of files from jbd")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Link: https://lore.kernel.org/r/20220715125152.4022726-1-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: block range must be validated before use in ext4_mb_clear_bb()
Lukas Czerner [Thu, 14 Jul 2022 16:59:03 +0000 (18:59 +0200)] 
ext4: block range must be validated before use in ext4_mb_clear_bb()

Block range to free is validated in ext4_free_blocks() using
ext4_inode_block_valid() and then it's passed to ext4_mb_clear_bb().
However in some situations on bigalloc file system the range might be
adjusted after the validation in ext4_free_blocks() which can lead to
troubles on corrupted file systems such as one found by syzkaller that
resulted in the following BUG

kernel BUG at fs/ext4/ext4.h:3319!
PREEMPT SMP NOPTI
CPU: 28 PID: 4243 Comm: repro Kdump: loaded Not tainted 5.19.0-rc6+ #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
RIP: 0010:ext4_free_blocks+0x95e/0xa90
Call Trace:
 <TASK>
 ? lock_timer_base+0x61/0x80
 ? __es_remove_extent+0x5a/0x760
 ? __mod_timer+0x256/0x380
 ? ext4_ind_truncate_ensure_credits+0x90/0x220
 ext4_clear_blocks+0x107/0x1b0
 ext4_free_data+0x15b/0x170
 ext4_ind_truncate+0x214/0x2c0
 ? _raw_spin_unlock+0x15/0x30
 ? ext4_discard_preallocations+0x15a/0x410
 ? ext4_journal_check_start+0xe/0x90
 ? __ext4_journal_start_sb+0x2f/0x110
 ext4_truncate+0x1b5/0x460
 ? __ext4_journal_start_sb+0x2f/0x110
 ext4_evict_inode+0x2b4/0x6f0
 evict+0xd0/0x1d0
 ext4_enable_quotas+0x11f/0x1f0
 ext4_orphan_cleanup+0x3de/0x430
 ? proc_create_seq_private+0x43/0x50
 ext4_fill_super+0x295f/0x3ae0
 ? snprintf+0x39/0x40
 ? sget_fc+0x19c/0x330
 ? ext4_reconfigure+0x850/0x850
 get_tree_bdev+0x16d/0x260
 vfs_get_tree+0x25/0xb0
 path_mount+0x431/0xa70
 __x64_sys_mount+0xe2/0x120
 do_syscall_64+0x5b/0x80
 ? do_user_addr_fault+0x1e2/0x670
 ? exc_page_fault+0x70/0x170
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fdf4e512ace

Fix it by making sure that the block range is properly validated before
used every time it changes in ext4_free_blocks() or ext4_mb_clear_bb().

Link: https://syzkaller.appspot.com/bug?id=5266d464285a03cee9dbfda7d2452a72c3c2ae7c
Reported-by: syzbot+15cd994e273307bf5cfa@syzkaller.appspotmail.com
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: Tadeusz Struk <tadeusz.struk@linaro.org>
Tested-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Link: https://lore.kernel.org/r/20220714165903.58260-1-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agombcache: automatically delete entries from cache on freeing
Jan Kara [Tue, 12 Jul 2022 10:54:29 +0000 (12:54 +0200)] 
mbcache: automatically delete entries from cache on freeing

Use the fact that entries with elevated refcount are not removed from
the hash and just move removal of the entry from the hash to the entry
freeing time. When doing this we also change the generic code to hold
one reference to the cache entry, not two of them, which makes code
somewhat more obvious.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220712105436.32204-10-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agombcache: Remove mb_cache_entry_delete()
Jan Kara [Tue, 12 Jul 2022 10:54:28 +0000 (12:54 +0200)] 
mbcache: Remove mb_cache_entry_delete()

Nobody uses mb_cache_entry_delete() anymore. Remove it.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220712105436.32204-9-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext2: avoid deleting xattr block that is being reused
Jan Kara [Tue, 12 Jul 2022 10:54:27 +0000 (12:54 +0200)] 
ext2: avoid deleting xattr block that is being reused

Currently when we decide to reuse xattr block we detect the case when
the last reference to xattr block is being dropped at the same time and
cancel the reuse attempt. Convert ext2 to a new scheme when as soon as
matching mbcache entry is found, we wait with dropping the last xattr
block reference until mbcache entry reference is dropped (meaning either
the xattr block reference is increased or we decided not to reuse the
block).

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220712105436.32204-8-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext2: unindent codeblock in ext2_xattr_set()
Jan Kara [Tue, 12 Jul 2022 10:54:26 +0000 (12:54 +0200)] 
ext2: unindent codeblock in ext2_xattr_set()

Replace one else in ext2_xattr_set() with a goto. This makes following
code changes simpler to follow. No functional changes.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220712105436.32204-7-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext2: factor our freeing of xattr block reference
Jan Kara [Tue, 12 Jul 2022 10:54:25 +0000 (12:54 +0200)] 
ext2: factor our freeing of xattr block reference

Free of xattr block reference is opencode in two places. Factor it out
into a separate function and use it.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220712105436.32204-6-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: fix race when reusing xattr blocks
Jan Kara [Tue, 12 Jul 2022 10:54:24 +0000 (12:54 +0200)] 
ext4: fix race when reusing xattr blocks

When ext4_xattr_block_set() decides to remove xattr block the following
race can happen:

CPU1                                    CPU2
ext4_xattr_block_set()                  ext4_xattr_release_block()
  new_bh = ext4_xattr_block_cache_find()

                                          lock_buffer(bh);
                                          ref = le32_to_cpu(BHDR(bh)->h_refcount);
                                          if (ref == 1) {
                                            ...
                                            mb_cache_entry_delete();
                                            unlock_buffer(bh);
                                            ext4_free_blocks();
                                              ...
                                              ext4_forget(..., bh, ...);
                                                jbd2_journal_revoke(..., bh);

  ext4_journal_get_write_access(..., new_bh, ...)
    do_get_write_access()
      jbd2_journal_cancel_revoke(..., new_bh);

Later the code in ext4_xattr_block_set() finds out the block got freed
and cancels reusal of the block but the revoke stays canceled and so in
case of block reuse and journal replay the filesystem can get corrupted.
If the race works out slightly differently, we can also hit assertions
in the jbd2 code.

Fix the problem by making sure that once matching mbcache entry is
found, code dropping the last xattr block reference (or trying to modify
xattr block in place) waits until the mbcache entry reference is
dropped. This way code trying to reuse xattr block is protected from
someone trying to drop the last reference to xattr block.

Reported-and-tested-by: Ritesh Harjani <ritesh.list@gmail.com>
CC: stable@vger.kernel.org
Fixes: 82939d7999df ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220712105436.32204-5-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: unindent codeblock in ext4_xattr_block_set()
Jan Kara [Tue, 12 Jul 2022 10:54:23 +0000 (12:54 +0200)] 
ext4: unindent codeblock in ext4_xattr_block_set()

Remove unnecessary else (and thus indentation level) from a code block
in ext4_xattr_block_set(). It will also make following code changes
easier. No functional changes.

CC: stable@vger.kernel.org
Fixes: 82939d7999df ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220712105436.32204-4-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: remove EA inode entry from mbcache on inode eviction
Jan Kara [Tue, 12 Jul 2022 10:54:22 +0000 (12:54 +0200)] 
ext4: remove EA inode entry from mbcache on inode eviction

Currently we remove EA inode from mbcache as soon as its xattr refcount
drops to zero. However there can be pending attempts to reuse the inode
and thus refcount handling code has to handle the situation when
refcount increases from zero anyway. So save some work and just keep EA
inode in mbcache until it is getting evicted. At that moment we are sure
following iget() of EA inode will fail anyway (or wait for eviction to
finish and load things from the disk again) and so removing mbcache
entry at that moment is fine and simplifies the code a bit.

CC: stable@vger.kernel.org
Fixes: 82939d7999df ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220712105436.32204-3-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agombcache: add functions to delete entry if unused
Jan Kara [Tue, 12 Jul 2022 10:54:21 +0000 (12:54 +0200)] 
mbcache: add functions to delete entry if unused

Add function mb_cache_entry_delete_or_get() to delete mbcache entry if
it is unused and also add a function to wait for entry to become unused
- mb_cache_entry_wait_unused(). We do not share code between the two
deleting function as one of them will go away soon.

CC: stable@vger.kernel.org
Fixes: 82939d7999df ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220712105436.32204-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agombcache: don't reclaim used entries
Jan Kara [Tue, 12 Jul 2022 10:54:20 +0000 (12:54 +0200)] 
mbcache: don't reclaim used entries

Do not reclaim entries that are currently used by somebody from a
shrinker. Firstly, these entries are likely useful. Secondly, we will
need to keep such entries to protect pending increment of xattr block
refcount.

CC: stable@vger.kernel.org
Fixes: 82939d7999df ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220712105436.32204-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: make sure ext4_append() always allocates new block
Lukas Czerner [Mon, 4 Jul 2022 14:27:21 +0000 (16:27 +0200)] 
ext4: make sure ext4_append() always allocates new block

ext4_append() must always allocate a new block, otherwise we run the
risk of overwriting existing directory block corrupting the directory
tree in the process resulting in all manner of problems later on.

Add a sanity check to see if the logical block is already allocated and
error out if it is.

Cc: stable@kernel.org
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20220704142721.157985-2-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: check if directory block is within i_size
Lukas Czerner [Mon, 4 Jul 2022 14:27:20 +0000 (16:27 +0200)] 
ext4: check if directory block is within i_size

Currently ext4 directory handling code implicitly assumes that the
directory blocks are always within the i_size. In fact ext4_append()
will attempt to allocate next directory block based solely on i_size and
the i_size is then appropriately increased after a successful
allocation.

However, for this to work it requires i_size to be correct. If, for any
reason, the directory inode i_size is corrupted in a way that the
directory tree refers to a valid directory block past i_size, we could
end up corrupting parts of the directory tree structure by overwriting
already used directory blocks when modifying the directory.

Fix it by catching the corruption early in __ext4_read_dirblock().

Addresses Red-Hat-Bugzilla: #2070205
CVE: CVE-2022-1184
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20220704142721.157985-1-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: reflect mb_optimize_scan value in options file
Ojaswin Mujoo [Mon, 4 Jul 2022 05:46:03 +0000 (11:16 +0530)] 
ext4: reflect mb_optimize_scan value in options file

Add support to display the mb_optimize_scan value in
/proc/fs/ext4/<dev>/options file. The option is only
displayed when the value is non default.

Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20220704054603.21462-1-ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: avoid remove directory when directory is corrupted
Ye Bin [Wed, 22 Jun 2022 09:02:23 +0000 (17:02 +0800)] 
ext4: avoid remove directory when directory is corrupted

Now if check directoy entry is corrupted, ext4_empty_dir may return true
then directory will be removed when file system mounted with "errors=continue".
In order not to make things worse just return false when directory is corrupted.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220622090223.682234-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: aligned '*' in comments
Jiang Jian [Tue, 21 Jun 2022 06:15:31 +0000 (14:15 +0800)] 
ext4: aligned '*' in comments

The '*' in the comment is not aligned.

Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com>
Link: https://lore.kernel.org/r/20220621061531.19669-1-jiangjian@cdjrlc.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoDocumentation: ext4: fix cell spacing of table heading on blockmap table
Bagas Sanjaya [Sun, 19 Jun 2022 07:29:39 +0000 (14:29 +0700)] 
Documentation: ext4: fix cell spacing of table heading on blockmap table

Commit 3103084afcf234 ("ext4, doc: remove unnecessary escaping") removes
redundant underscore escaping, however the cell spacing in heading row of
blockmap table became not aligned anymore, hence triggers malformed table
warning:

Documentation/filesystems/ext4/blockmap.rst:3: WARNING: Malformed table.

+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| i.i_block Offset   | Where It Points                                                                                                                                                                                                              |
<snipped>...

The warning caused the table not being loaded.

Realign the heading row cell by adding missing space at the first cell
to fix the warning.

Fixes: 3103084afcf234 ("ext4, doc: remove unnecessary escaping")
Cc: stable@kernel.org
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Wang Jianjian <wangjianjian3@huawei.com>
Cc: linux-ext4@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20220619072938.7334-1-bagasdotme@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: recover csum seed of tmp_inode after migrating to extents
Li Lingfeng [Fri, 17 Jun 2022 06:25:15 +0000 (14:25 +0800)] 
ext4: recover csum seed of tmp_inode after migrating to extents

When migrating to extents, the checksum seed of temporary inode
need to be replaced by inode's, otherwise the inode checksums
will be incorrect when swapping the inodes data.

However, the temporary inode can not match it's checksum to
itself since it has lost it's own checksum seed.

mkfs.ext4 -F /dev/sdc
mount /dev/sdc /mnt/sdc
xfs_io -fc "pwrite 4k 4k" -c "fsync" /mnt/sdc/testfile
chattr -e /mnt/sdc/testfile
chattr +e /mnt/sdc/testfile
umount /dev/sdc
fsck -fn /dev/sdc

========
...
Pass 1: Checking inodes, blocks, and sizes
Inode 13 passes checks, but checksum does not match inode.  Fix? no
...
========

The fix is simple, save the checksum seed of temporary inode, and
recover it after migrating to extents.

Fixes: e81c9302a6c3 ("ext4: set csum seed in tmp inode while migrating to extents")
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220617062515.2113438-1-lilingfeng3@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: fix warning in ext4_iomap_begin as race between bmap and write
Ye Bin [Fri, 17 Jun 2022 01:39:35 +0000 (09:39 +0800)] 
ext4: fix warning in ext4_iomap_begin as race between bmap and write

We got issue as follows:
------------[ cut here ]------------
WARNING: CPU: 3 PID: 9310 at fs/ext4/inode.c:3441 ext4_iomap_begin+0x182/0x5d0
RIP: 0010:ext4_iomap_begin+0x182/0x5d0
RSP: 0018:ffff88812460fa08 EFLAGS: 00010293
RAX: ffff88811f168000 RBX: 0000000000000000 RCX: ffffffff97793c12
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: ffff88812c669160 R08: ffff88811f168000 R09: ffffed10258cd20f
R10: ffff88812c669077 R11: ffffed10258cd20e R12: 0000000000000001
R13: 00000000000000a4 R14: 000000000000000c R15: ffff88812c6691ee
FS:  00007fd0d6ff3740(0000) GS:ffff8883af180000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd0d6dda290 CR3: 0000000104a62000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 iomap_apply+0x119/0x570
 iomap_bmap+0x124/0x150
 ext4_bmap+0x14f/0x250
 bmap+0x55/0x80
 do_vfs_ioctl+0x952/0xbd0
 __x64_sys_ioctl+0xc6/0x170
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Above issue may happen as follows:
          bmap                    write
bmap
  ext4_bmap
    iomap_bmap
      ext4_iomap_begin
                            ext4_file_write_iter
      ext4_buffered_write_iter
        generic_perform_write
  ext4_da_write_begin
    ext4_da_write_inline_data_begin
      ext4_prepare_inline_data
        ext4_create_inline_data
  ext4_set_inode_flag(inode,
EXT4_INODE_INLINE_DATA);
      if (WARN_ON_ONCE(ext4_has_inline_data(inode))) ->trigger bug_on

To solved above issue hold inode lock in ext4_bamp.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20220617013935.397596-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
3 years agoext4: correct the misjudgment in ext4_iget_extra_inode
Baokun Li [Thu, 16 Jun 2022 02:13:58 +0000 (10:13 +0800)] 
ext4: correct the misjudgment in ext4_iget_extra_inode

Use the EXT4_INODE_HAS_XATTR_SPACE macro to more accurately
determine whether the inode have xattr space.

Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220616021358.2504451-5-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: correct max_inline_xattr_value_size computing
Baokun Li [Thu, 16 Jun 2022 02:13:57 +0000 (10:13 +0800)] 
ext4: correct max_inline_xattr_value_size computing

If the ext4 inode does not have xattr space, 0 is returned in the
get_max_inline_xattr_value_size function. Otherwise, the function returns
a negative value when the inode does not contain EXT4_STATE_XATTR.

Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220616021358.2504451-4-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: fix use-after-free in ext4_xattr_set_entry
Baokun Li [Thu, 16 Jun 2022 02:13:56 +0000 (10:13 +0800)] 
ext4: fix use-after-free in ext4_xattr_set_entry

Hulk Robot reported a issue:
==================================================================
BUG: KASAN: use-after-free in ext4_xattr_set_entry+0x18ab/0x3500
Write of size 4105 at addr ffff8881675ef5f4 by task syz-executor.0/7092

CPU: 1 PID: 7092 Comm: syz-executor.0 Not tainted 4.19.90-dirty #17
Call Trace:
[...]
 memcpy+0x34/0x50 mm/kasan/kasan.c:303
 ext4_xattr_set_entry+0x18ab/0x3500 fs/ext4/xattr.c:1747
 ext4_xattr_ibody_inline_set+0x86/0x2a0 fs/ext4/xattr.c:2205
 ext4_xattr_set_handle+0x940/0x1300 fs/ext4/xattr.c:2386
 ext4_xattr_set+0x1da/0x300 fs/ext4/xattr.c:2498
 __vfs_setxattr+0x112/0x170 fs/xattr.c:149
 __vfs_setxattr_noperm+0x11b/0x2a0 fs/xattr.c:180
 __vfs_setxattr_locked+0x17b/0x250 fs/xattr.c:238
 vfs_setxattr+0xed/0x270 fs/xattr.c:255
 setxattr+0x235/0x330 fs/xattr.c:520
 path_setxattr+0x176/0x190 fs/xattr.c:539
 __do_sys_lsetxattr fs/xattr.c:561 [inline]
 __se_sys_lsetxattr fs/xattr.c:557 [inline]
 __x64_sys_lsetxattr+0xc2/0x160 fs/xattr.c:557
 do_syscall_64+0xdf/0x530 arch/x86/entry/common.c:298
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x459fe9
RSP: 002b:00007fa5e54b4c08 EFLAGS: 00000246 ORIG_RAX: 00000000000000bd
RAX: ffffffffffffffda RBX: 000000000051bf60 RCX: 0000000000459fe9
RDX: 00000000200003c0 RSI: 0000000020000180 RDI: 0000000020000140
RBP: 000000000051bf60 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000001009 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffc73c93fc0 R14: 000000000051bf60 R15: 00007fa5e54b4d80
[...]
==================================================================

Above issue may happen as follows:
-------------------------------------
ext4_xattr_set
  ext4_xattr_set_handle
    ext4_xattr_ibody_find
      >> s->end < s->base
      >> no EXT4_STATE_XATTR
      >> xattr_check_inode is not executed
    ext4_xattr_ibody_set
      ext4_xattr_set_entry
       >> size_t min_offs = s->end - s->base
       >> UAF in memcpy

we can easily reproduce this problem with the following commands:
    mkfs.ext4 -F /dev/sda
    mount -o debug_want_extra_isize=128 /dev/sda /mnt
    touch /mnt/file
    setfattr -n user.cat -v `seq -s z 4096|tr -d '[:digit:]'` /mnt/file

In ext4_xattr_ibody_find, we have the following assignment logic:
  header = IHDR(inode, raw_inode)
         = raw_inode + EXT4_GOOD_OLD_INODE_SIZE + i_extra_isize
  is->s.base = IFIRST(header)
             = header + sizeof(struct ext4_xattr_ibody_header)
  is->s.end = raw_inode + s_inode_size

In ext4_xattr_set_entry
  min_offs = s->end - s->base
           = s_inode_size - EXT4_GOOD_OLD_INODE_SIZE - i_extra_isize -
     sizeof(struct ext4_xattr_ibody_header)
  last = s->first
  free = min_offs - ((void *)last - s->base) - sizeof(__u32)
       = s_inode_size - EXT4_GOOD_OLD_INODE_SIZE - i_extra_isize -
         sizeof(struct ext4_xattr_ibody_header) - sizeof(__u32)

In the calculation formula, all values except s_inode_size and
i_extra_size are fixed values. When i_extra_size is the maximum value
s_inode_size - EXT4_GOOD_OLD_INODE_SIZE, min_offs is -4 and free is -8.
The value overflows. As a result, the preceding issue is triggered when
memcpy is executed.

Therefore, when finding xattr or setting xattr, check whether
there is space for storing xattr in the inode to resolve this issue.

Cc: stable@kernel.org
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220616021358.2504451-3-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: add EXT4_INODE_HAS_XATTR_SPACE macro in xattr.h
Baokun Li [Thu, 16 Jun 2022 02:13:55 +0000 (10:13 +0800)] 
ext4: add EXT4_INODE_HAS_XATTR_SPACE macro in xattr.h

When adding an xattr to an inode, we must ensure that the inode_size is
not less than EXT4_GOOD_OLD_INODE_SIZE + extra_isize + pad. Otherwise,
the end position may be greater than the start position, resulting in UAF.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220616021358.2504451-2-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: fix extent status tree race in writeback error recovery path
Eric Whitney [Wed, 15 Jun 2022 16:05:30 +0000 (12:05 -0400)] 
ext4: fix extent status tree race in writeback error recovery path

A race can occur in the unlikely event ext4 is unable to allocate a
physical cluster for a delayed allocation in a bigalloc file system
during writeback.  Failure to allocate a cluster forces error recovery
that includes a call to mpage_release_unused_pages().  That function
removes any corresponding delayed allocated blocks from the extent
status tree.  If a new delayed write is in progress on the same cluster
simultaneously, resulting in the addition of an new extent containing
one or more blocks in that cluster to the extent status tree, delayed
block accounting can be thrown off if that delayed write then encounters
a similar cluster allocation failure during future writeback.

Write lock the i_data_sem in mpage_release_unused_pages() to fix this
problem.  Ext4's block/cluster accounting code for bigalloc relies on
i_data_sem for mutual exclusion, as is found in the delayed write path,
and the locking in mpage_release_unused_pages() is missing.

Cc: stable@kernel.org
Reported-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Link: https://lore.kernel.org/r/20220615160530.1928801-1-enwlinux@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agojbd2: fix outstanding credits assert in jbd2_journal_commit_transaction()
Zhang Yi [Sat, 11 Jun 2022 13:04:26 +0000 (21:04 +0800)] 
jbd2: fix outstanding credits assert in jbd2_journal_commit_transaction()

We catch an assert problem in jbd2_journal_commit_transaction() when
doing fsstress and request falut injection tests. The problem is
happened in a race condition between jbd2_journal_commit_transaction()
and ext4_end_io_end(). Firstly, ext4_writepages() writeback dirty pages
and start reserved handle, and then the journal was aborted due to some
previous metadata IO error, jbd2_journal_abort() start to commit current
running transaction, the committing procedure could be raced by
ext4_end_io_end() and lead to subtract j_reserved_credits twice from
commit_transaction->t_outstanding_credits, finally the
t_outstanding_credits is mistakenly smaller than t_nr_buffers and
trigger assert.

kjournald2           kworker

jbd2_journal_commit_transaction()
 write_unlock(&journal->j_state_lock);
 atomic_sub(j_reserved_credits, t_outstanding_credits); //sub once

                   jbd2_journal_start_reserved()
                    start_this_handle()  //detect aborted journal
                    jbd2_journal_free_reserved()  //get running transaction
                       read_lock(&journal->j_state_lock)
                      __jbd2_journal_unreserve_handle()
                     atomic_sub(j_reserved_credits, t_outstanding_credits);
                       //sub again
                       read_unlock(&journal->j_state_lock);

 journal->j_running_transaction = NULL;
 J_ASSERT(t_nr_buffers <= t_outstanding_credits) //bomb!!!

Fix this issue by using journal->j_state_lock to protect the subtraction
in jbd2_journal_commit_transaction().

Fixes: 96f1e0974575 ("jbd2: avoid long hold times of j_state_lock while committing a transaction")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220611130426.2013258-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agojbd2: unexport jbd2_log_start_commit()
Jan Kara [Wed, 8 Jun 2022 11:23:50 +0000 (13:23 +0200)] 
jbd2: unexport jbd2_log_start_commit()

jbd2_log_start_commit() is not used outside of jbd2 so unexport it. Also
make __jbd2_log_start_commit() static when we are at it.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Link: https://lore.kernel.org/r/20220608112355.4397-4-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agojbd2: remove unused exports for jbd2 debugging
Jan Kara [Wed, 8 Jun 2022 11:23:49 +0000 (13:23 +0200)] 
jbd2: remove unused exports for jbd2 debugging

Jbd2 exports jbd2_journal_enable_debug and __jbd2_debug() depite the
first is used only in fs/jbd2/journal.c and the second only within jbd2
code. Remove the pointless exports make jbd2_journal_enable_debug
static.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Link: https://lore.kernel.org/r/20220608112355.4397-3-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agojbd2: rename jbd_debug() to jbd2_debug()
Jan Kara [Wed, 8 Jun 2022 11:23:48 +0000 (13:23 +0200)] 
jbd2: rename jbd_debug() to jbd2_debug()

The name of jbd_debug() is confusing as all functions inside jbd2 have
jbd2_ prefix. Rename jbd_debug() to jbd2_debug(). No functional changes.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Link: https://lore.kernel.org/r/20220608112355.4397-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: use ext4_debug() instead of jbd_debug()
Jan Kara [Wed, 8 Jun 2022 11:23:47 +0000 (13:23 +0200)] 
ext4: use ext4_debug() instead of jbd_debug()

We use jbd_debug() in some places in ext4. It seems a bit strange to use
jbd2 debugging output function for ext4 code. Also these days
ext4_debug() uses dynamic printk so each debug message can be enabled /
disabled on its own so the time when it made some sense to have these
combined (to allow easier common selecting of messages to report) has
passed. Just convert all jbd_debug() uses in ext4 to ext4_debug().

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Link: https://lore.kernel.org/r/20220608112355.4397-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: reuse order and buddy in mb_mark_used when buddy split
hanjinke [Mon, 6 Jun 2022 15:53:05 +0000 (23:53 +0800)] 
ext4: reuse order and buddy in mb_mark_used when buddy split

After each buddy split, mb_mark_used will search the proper order
for the block which may consume some loop in mb_find_order_for_block.
In fact, we can reuse the order and buddy generated by the buddy split.

Reviewed by: lei.rao@intel.com
Signed-off-by: hanjinke <hanjinke.666@bytedance.com>
Link: https://lore.kernel.org/r/20220606155305.74146-1-hanjinke.666@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>