Liang He [Tue, 19 Jul 2022 09:52:15 +0000 (17:52 +0800)]
mmc: cavium-octeon: Add of_node_put() when breaking out of loop
In octeon_mmc_probe(), we should call of_node_put() when breaking
out of for_each_child_of_node() which has increased and decreased
the refcount during each iteration.
Fixes: 01d95843335c ("mmc: cavium: Add MMC support for Octeon SOCs.") Signed-off-by: Liang He <windhl@126.com> Acked-by: Robert Richter <rric@kernel.org> Link: https://lore.kernel.org/r/20220719095216.1241601-1-windhl@126.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Liang He [Tue, 19 Jul 2022 09:10:51 +0000 (17:10 +0800)]
mmc: core: quirks: Add of_node_put() when breaking out of loop
In mmc_fixup_of_compatible_match(), we should call of_node_put()
when breaking out of for_each_child_of_node() which will increase
and decrease the refcount during one iteration.
Dan Williams [Sat, 21 May 2022 23:24:14 +0000 (16:24 -0700)]
cxl/core: Define a 'struct cxl_endpoint_decoder'
Previously the target routing specifics of switch decoders and platform
CXL window resource tracking of root decoders were factored out of
'struct cxl_decoder'. While switch decoders translate from SPA to
downstream ports, endpoint decoders translate from SPA to DPA.
This patch, 3 of 3, adds a 'struct cxl_endpoint_decoder' that tracks an
endpoint-specific Device Physical Address (DPA) resource. For now this
just defines ->dpa_res, a follow-on patch will handle requesting DPA
resource ranges from a device-DPA resource tree.
Dan Williams [Wed, 13 Jul 2022 01:38:26 +0000 (18:38 -0700)]
cxl/core: Define a 'struct cxl_root_decoder'
Previously the target routing specifics of switch decoders were factored
out of 'struct cxl_decoder' into 'struct cxl_switch_decoder'.
This patch, 2 of 3, adds a 'struct cxl_root_decoder' as a superset of a
switch decoder that also track the associated CXL window platform
resource.
Note that the reason the resource for a given root decoder needs to be
looked up after the fact (i.e. after cxl_parse_cfmws() and
add_cxl_resource()) is because add_cxl_resource() may have merged CXL
windows in order to keep them at the top of the resource tree / decode
hierarchy.
Dan Williams [Wed, 13 Jul 2022 01:37:54 +0000 (18:37 -0700)]
cxl/acpi: Track CXL resources in iomem_resource
Recall that CXL capable address ranges, on ACPI platforms, are published
in the CEDT.CFMWS (CXL Early Discovery Table: CXL Fixed Memory Window
Structures). These windows represent both the actively mapped capacity
and the potential address space that can be dynamically assigned to a
new CXL decode configuration (region / interleave-set).
CXL endpoints like DDR DIMMs can be mapped at any physical address
including 0 and legacy ranges.
There is an expectation and requirement that the /proc/iomem interface
and the iomem_resource tree in the kernel reflect the full set of
platform address ranges. I.e. that every address range that platform
firmware and bus drivers enumerate be reflected as an iomem_resource
entry. The hard requirement to do this for CXL arises from the fact that
facilities like CONFIG_DEVICE_PRIVATE expect to be able to treat empty
iomem_resource ranges as free for software to use as proxy address
space. Without CXL publishing its potential address ranges in
iomem_resource, the CONFIG_DEVICE_PRIVATE mechanism may inadvertently
steal capacity reserved for runtime provisioning of new CXL regions.
So, iomem_resource needs to know about both active and potential CXL
resource ranges. The active CXL resources might already be reflected in
iomem_resource as "System RAM". insert_resource_expand_to_fit() handles
re-parenting "System RAM" underneath a CXL window.
The "_expand_to_fit()" behavior handles cases where a CXL window is not
a strict superset of an existing entry in the iomem_resource tree. The
"_expand_to_fit()" behavior is acceptable from the perspective of
resource allocation. The expansion happens because a conflicting
resource range is already populated, which means the resource boundary
expansion does not result in any additional free CXL address space being
made available. CXL address space allocation is always bounded by the
orginal unexpanded address range.
However, the potential for expansion does mean that something like
walk_iomem_res_desc(IORES_DESC_CXL...) can only return fuzzy answers on
corner case platforms that cause the resource tree to expand a CXL
window resource over a range that is not decoded by CXL. This would be
an odd platform configuration, but if it becomes a problem in practice
the CXL subsytem could just publish an API that returns definitive
answers.
Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/165784325943.1758207.5310344844375305118.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Thu, 19 May 2022 00:52:23 +0000 (17:52 -0700)]
cxl/core: Define a 'struct cxl_switch_decoder'
Currently 'struct cxl_decoder' contains the superset of attributes
needed for all decoder types. Before more type-specific attributes are
added to the common definition, reorganize 'struct cxl_decoder' into type
specific objects.
This patch, the first of three, factors out a cxl_switch_decoder type.
See the new kdoc for what a 'struct cxl_switch_decoder' represents in a
CXL topology.
ACPI: resource: skip IRQ override on AMD Zen platforms
IRQ override isn't needed on modern AMD Zen systems.
There's an active low keyboard IRQ on AMD Ryzen 6000 and it will stay
this way on newer platforms. This IRQ override breaks keyboards for
almost all Ryzen 6000 laptops currently on the market.
Skip this IRQ override for all AMD Zen platforms because this IRQ
override is supposed to be a workaround for buggy ACPI DSDT and we can't
have a long list of all future AMD CPUs/Laptops in the kernel code.
If a device with buggy ACPI DSDT shows up, a separated list containing
just them should be created.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216118 Suggested-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Chuanhong Guo <gch981213@gmail.com> Acked-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: XiaoYan Li <lxy.lixiaoyan@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since commit 488dac0c9237 ("libfs: fix error cast of negative value in
simple_attr_write()"), the EINJ debugfs interface no longer accepts
negative values as input. Attempt to do so will result in EINVAL.
Fixes: 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()") Signed-off-by: Qifu Zhang <zhangqifu@bytedance.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
erofs: record the longest decompressed size in this round
Currently, `pcl->length' records the longest decompressed length
as long as the pcluster itself isn't reclaimed. However, such
number is unneeded for the general cases since it doesn't indicate
the exact decompressed size in this round.
Instead, let's record the decompressed size for this round instead,
thus `pcl->nr_pages' can be completely dropped and pageofs_out is
also designed to be kept in sync with `pcl->length'.
Let's introduce struct z_erofs_decompress_backend in order to pass
on the decompression backend context between helper functions more
easier and avoid too many arguments.
In order to introduce multi-reference pclusters for compressed data
deduplication, let's get rid of the global page array for now since
it needs to be re-designed then at least.
`enum z_erofs_collectmode' is really ambiguous, but I'm not quite
sure if there are better naming, basically it's used to judge whether
inplace I/O can be used due to the current status of pclusters in
the chain.
Since all decompressed offsets have been integrated to bvecs[], this
patch avoids all sub-indexes so that page->private only includes a
part count and an eio flag, thus in the future folio->private can have
the same meaning.
In addition, PG_error will not be used anymore after this patch and
we're heading to use page->private (later folio->private) and
page->mapping (later folio->mapping) only.
`z_erofs_decompress_pcluster()' is too long therefore it'd be better
to introduce another helper to parse compressed pages (or laterly,
compressed bvecs.)
BTW, since `compressed_bvecs' is too long as a part of the function
name, `in_bvecs' is used here instead.
erofs: introduce bufvec to store decompressed buffers
For each pcluster, the total compressed buffers are determined in
advance, yet the number of decompressed buffers actually vary. Too
many decompressed pages can be recorded if one pcluster is highly
compressed or its pcluster size is large. That takes extra memory
footprints compared to uncompressed filesystems, especially a lot of
I/O in flight on low-ended devices.
Therefore, similar to inplace I/O, pagevec was introduced to reuse
page cache to store these pointers in the time-sharing way since
these pages are actually unused before decompressing.
In order to make it more flexable, a cleaner bufvec is used to
replace the old pagevec stuffs so that
- Decompressed offsets can be stored inline, thus it can be used
for the upcoming feature like compressed data deduplication.
It's calculated by `page_offset(page) - map->m_la';
- Towards supporting large folios for compressed inodes since
our final goal is to completely avoid page->private but use
folio->private only for all page cache pages.
`z_erofs_decompress_pcluster()' is too long therefore it'd be better
to introduce another helper to parse decompressed pages (or laterly,
decompressed bvecs.)
BTW, since `decompressed_bvecs' is too long as a part of the function
name, `out_bvecs' is used instead.
Bob Pearson [Thu, 14 Jul 2022 20:46:20 +0000 (15:46 -0500)]
RDMA/rxe: Fix mw bind to allow any consumer key portion
The current implementation of rxe_check_bind_mw() in rxe_mw.c is incorrect
since it requires the new key portion provided by the mw consumer to be
different than the previous key portion. This is not required by the
IBA. Remove the test.
hwmon: (tps23861) fix byte order in current and voltage registers
Trying to use this driver on a big-endian machine results in garbage
values for voltage and current. The tps23861 registers are little-
endian, and regmap_read_bulk() does not do byte order conversion. Thus
on BE machines, the most significant bytes got modified, and were
trimmed by the VOLTAGE_CURRENT_MASK.
To resolve this use uint16_t values, and convert them to host byte
order using le16_to_cpu(). This results in correct readings on MIPS.
Paul Fertser [Thu, 14 Jul 2022 14:23:44 +0000 (17:23 +0300)]
hwmon: (aspeed-pwm-tacho) increase fan tach period (again)
The old value allows measuring fan speeds down to about 970 RPM and
gives timeout for anything less than that. It is problematic because it
can also be used as an indicator for fan failure or absence.
Despite having read the relevant section of "ASPEED AST2500/AST2520 A2
Datasheet â V1.7" multiple times I wasn't able to figure out what
exactly "fan tach period" and "fan tach falling point of period" mean
(both are set by the driver from the constant this patch is amending).
Experimentation with a Tioga Pass OCP board (AST2500 BMC) showed that
value of 0x0108 gives time outs for speeds below 1500 RPM and the value
offered by the patch is good for at least 750 RPM (the fans can't spin
any slower so the lower bound is unknown). Measuring with the fans
spinning takes about 30 ms, sometimes down to 18 ms, so about the same
as with the previous value.
This constant was last changed with commit 762b1e888013 ("hwmon:
(aspeed-pwm-tacho) increase fan tach period")
Dave Marchevsky [Mon, 11 Jul 2022 17:48:08 +0000 (10:48 -0700)]
fuse: Add module param for CAP_SYS_ADMIN access bypassing allow_other
Since commit 73f03c2b4b52 ("fuse: Restrict allow_other to the superblock's
namespace or a descendant"), access to allow_other FUSE filesystems has
been limited to users in the mounting user namespace or descendants. This
prevents a process that is privileged in its userns - but not its parent
namespaces - from mounting a FUSE fs w/ allow_other that is accessible to
processes in parent namespaces.
While this restriction makes sense overall it breaks a legitimate usecase:
I have a tracing daemon which needs to peek into process' open files in
order to symbolicate - similar to 'perf'. The daemon is a privileged
process in the root userns, but is unable to peek into FUSE filesystems
mounted by processes in child namespaces.
This patch adds a module param, allow_sys_admin_access, to act as an escape
hatch for this descendant userns logic and for the allow_other mount option
in general. Setting allow_sys_admin_access allows processes with
CAP_SYS_ADMIN in the initial userns to access FUSE filesystems irrespective
of the mounting userns or whether allow_other was set. A sysadmin setting
this param must trust FUSEs on the host to not DoS processes as described
in 73f03c2b4b52.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
The commit 15c8e72e88e0 ("fuse: allow skipping control interface and forced
unmount") tries to remove the control interface for virtio-fs since it does
not support aborting requests which are being processed. But it doesn't
work now.
This patch fixes it by skipping creating the control interface if
fuse_conn->no_control is set.
Fixes: 15c8e72e88e0 ("fuse: allow skipping control interface and forced unmount") Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Overlayfs may fail to complete updates when a filesystem lacks
fileattr/xattr syscall support and responds with an ENOSYS error code,
resulting in an unexpected "Function not implemented" error.
This bug may occur with FUSE filesystems, such as davfs2.
# when "some-file" exists in the lowerdir, this fails with "Function
# not implemented", with dmesg showing "overlayfs: failed to retrieve
# lower fileattr (/some-file, err=-38)"
touch /test/mnt/some-file
The underlying cause of this regresion is actually in FUSE, which fails to
translate the ENOSYS error code returned by userspace filesystem (which
means that the ioctl operation is not supported) to ENOTTY.
fuse: fix deadlock between atomic O_TRUNC and page invalidation
fuse_finish_open() will be called with FUSE_NOWRITE set in case of atomic
O_TRUNC open(), so commit 76224355db75 ("fuse: truncate pagecache on
atomic_o_trunc") replaced invalidate_inode_pages2() by truncate_pagecache()
in such a case to avoid the A-A deadlock. However, we found another A-B-B-A
deadlock related to the case above, which will cause the xfstests
generic/464 testcase hung in our virtio-fs test environment.
For example, consider two processes concurrently open one same file, one
with O_TRUNC and another without O_TRUNC. The deadlock case is described
below, if open(O_TRUNC) is already set_nowrite(acquired A), and is trying
to lock a page (acquiring B), open() could have held the page lock
(acquired B), and waiting on the page writeback (acquiring A). This would
lead to deadlocks.
Besides this case, all calls of invalidate_inode_pages2() and
invalidate_inode_pages2_range() in fuse code also can deadlock with
open(O_TRUNC).
Fix by moving the truncate_pagecache() call outside the nowrite protected
region. The nowrite protection is only for delayed writeback
(writeback_cache) case, where inode lock does not protect against
truncation racing with writes on the server. Write syscalls racing with
page cache truncation still get the inode lock protection.
This patch also changes the order of filemap_invalidate_lock()
vs. fuse_set_nowrite() in fuse_open_common(). This new order matches the
order found in fuse_file_fallocate() and fuse_do_setattr().
A race between write(2) and close(2) allows pages to be dirtied after
fuse_flush -> write_inode_now(). If these pages are not flushed from
fuse_release(), then there might not be a writable open file later. So any
remaining dirty pages must be written back before the file is released.
This is a partial revert of the blamed commit.
Reported-by: syzbot+6e1efbd8efaaa6860e91@syzkaller.appspotmail.com Fixes: 36ea23374d1f ("fuse: write inode in fuse_vma_close() instead of fuse_release()") Cc: <stable@vger.kernel.org> # v5.16 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Antonio Borneo [Tue, 19 Jul 2022 12:28:31 +0000 (14:28 +0200)]
scripts/gdb: fix 'lx-dmesg' on 32 bits arch
The type atomic_long_t can have size 4 or 8 bytes, depending on
CONFIG_64BIT; it's only content, the field 'counter', is either an
int or a s64 value.
Current code incorrectly uses the fixed size utils.read_u64() to
read the field 'counter' inside atomic_long_t.
On 32 bits architectures reading the last element 'tail_id' of the
struct prb_desc_ring:
struct prb_desc_ring {
...
atomic_long_t tail_id;
};
causes the utils.read_u64() to access outside the boundary of the
struct and the gdb command 'lx-dmesg' exits with error:
Python Exception <class 'IndexError'>: index out of range
Error occurred in Python: index out of range
Query the really used atomic_long_t counter type size.
Merge tag 'qcom-arm64-defconfig-for-5.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/defconfig
Qualcomm ARM64 defconfig more updates for v5.20
This enables a few of the core drivers needed to boot the 8cx Gen 3
platform and demotes the Qualcomm USB PHY drivers to modules, as they
don't need to be builtin.
* tag 'qcom-arm64-defconfig-for-5.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
arm64: defconfig: Demote Qualcomm USB PHYs to modules
arm64: defconfig: Enable Qualcomm SC8280XP providers
bpf: Check attach_func_proto more carefully in check_helper_call
Syzkaller found a problem similar to d1a6edecc1fd ("bpf: Check
attach_func_proto more carefully in check_return_code") where
attach_func_proto might be NULL:
Merge tag 'at91-soc-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into arm/soc
AT91 SoC for 5.20
It contains updates for integration with OP-TEE by having a
dummy outer_cache.write_sec function to avoid triggering
exception when Linux tries to update secure registers.
* tag 'at91-soc-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
ARM: at91: setup outer cache .write_sec() callback if needed
ARM: at91: add sam_linux_is_optee_available() function
Merge tag 'at91-fixes-5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into arm/fixes
AT91 fixes for 5.19 #3
It contains one fix for LAN966 based SoCs fixing the frequency of
sys_clk. sys_clk is feeding different IPs so having proper frequency
for it in DT is necessary for proper working of different drivers.
* tag 'at91-fixes-5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
ARM: dts: lan966x: fix sys_clk frequency
Shengjiu Wang [Thu, 7 Jul 2022 03:00:29 +0000 (11:00 +0800)]
dmaengine: imx-sdma: Add FIFO stride support for multi FIFO script
The peripheral may have several FIFOs, but some case just select
some FIFOs from them for data transfer, which means FIFO0 and FIFO2
may be selected. So add FIFO address stride support, 0 means all FIFOs
are continuous, 1 means 1 word stride between FIFOs. All stride between
FIFOs should be same.
Another option words_per_fifo means how many audio channel data copied
to one FIFO one time, 1 means one channel per FIFO, 2 means 2 channels
per FIFO.
If 'n_fifos_src = 4' and 'words_per_fifo = 2', it means the first two
words(channels) fetch from FIFO0 and then jump to FIFO1 for next two words,
and so on after the last FIFO3 fetched, roll back to FIFO0.
Merge tag 'qcom-arm64-for-5.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/dt
More Qualcomm ARM64 DTS updates for v5.20
Related to SDM845, the Xiaomi Mi Mix2s is introduced, the DB845c on
SDM845 gains support for the second GPI DMA controller and has the GENI
I2C and SPI instances wired up to their respective GPI DMA controller.
QCS404 USB controller and PHY assignment is corrected and IPQ8074 gains
APCS definition to handle outgoing IPC interrupts.
Lastly a range of Devicetree validation issues are addressed.
Merge tag 'qcom-dts-for-5.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/dt
More Qualcomm DTS updates for v5.20
This adds an additional GSBI, hwclock, smem and tsens nodes for IPQ8064,
in addition to fixing up and improving the existing descriptions of the
platform.
USB interrupts are reordered to please the Devicetree binding.
The Light Pulse Generator is defined for PM8941 and LEDs are defined for
the FairPhone2, Nexus 5 and Sony Xperia devices.
* tag 'qcom-dts-for-5.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
ARM: dts: qcom: add rpmcc missing clocks for apq/ipq8064 and msm8660
ARM: dts: qcom: msm8974: Disable remoteprocs by default
ARM: dts: qcom: ipq8064: add missing smem compatible
ARM: dts: qcom: ipq8064: add missing hwlock
ARM: dts: qcom: ipq8064: add speedbin efuse nvmem node
ARM: dts: qcom: ipq8064: fix and add some missing gsbi node
ARM: dts: qcom: ipq8064: reduce pci IO size to 64K
ARM: dts: qcom: ipq8064: disable usb phy by default
ARM: dts: qcom: ipq8064: add missing snps,dwmac compatible for gmac
ARM: dts: qcom: ipq8064: add specific dtsi with smb208 rpm regulators
ARM: dts: qcom: ipq8064: add gsbi6 missing definition
ARM: dts: qcom: ipq8064: add multiple missing pin definition
ARM: dts: qcom: msm8974-hammerhead: Add notification LED
ARM: dts: qcom: msm8974-FP2: Add notification LED
ARM: dts: qcom: msm8974-sony: Enable LPG
ARM: dts: qcom: Add LPG node to pm8941
ARM: dts: qcom: sdx65: reorder USB interrupts
ARM: dts: qcom: apq8064: create tsens device node
Some IAX operation code nomenclatures are misleading or don't match with
others:
1. Operation code 0x4c is Zero Compress 32. IAX_OPCODE_DECOMP_32 is a
misleading name. Change it to IAX_OPCODE_ZERO_COMP_32.
2. Operation code 0x4d is Zero Compress 16. IAX_OPCODE_DECOMP_16 is a
misleading name. Change it to IAX_OPCODE_ZERO_COMP_16.
3. IAX_OPCDE_FIND_UNIQUE is corrected to match with other nomenclatures.
Co-developed-by: Li Zhang <li4.zhang@intel.com> Signed-off-by: Li Zhang <li4.zhang@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20220707002052.1546361-1-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
Ben Dooks [Fri, 8 Jul 2022 17:01:53 +0000 (18:01 +0100)]
dmaengine: dw-axi-dmac: ignore interrupt if no descriptor
If the channel has no descriptor and the interrupt is raised then the
kernel will OOPS. Check the result of vchan_next_desc() in the handler
axi_chan_block_xfer_complete() to avoid the error happening.
Ben Dooks [Fri, 8 Jul 2022 17:01:52 +0000 (18:01 +0100)]
dmaengine: dw-axi-dmac: do not print NULL LLI during error
During debugging we have seen an issue where axi_chan_dump_lli()
is passed a NULL LLI pointer which ends up causing an OOPS due
to trying to get fields from it. Simply print NULL LLI and exit
to avoid this.
Dan Carpenter [Tue, 19 Jul 2022 09:53:01 +0000 (12:53 +0300)]
libbpf: Fix str_has_sfx()'s return value
The return from strcmp() is inverted so it wrongly returns true instead
of false and vice versa.
Fixes: a1c9d61b19cb ("libbpf: Improve library identification for uprobe binary path resolution") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Cc: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/bpf/YtZ+/dAA195d99ak@kili
Dan Carpenter [Tue, 19 Jul 2022 09:49:34 +0000 (12:49 +0300)]
libbpf: Fix sign expansion bug in btf_dump_get_enum_value()
The code here is supposed to take a signed int and store it in a signed
long long. Unfortunately, the way that the type promotion works with
this conditional statement is that it takes a signed int, type promotes
it to a __u32, and then stores that as a signed long long. The result is
never negative.
This is from static analysis, but I made a little test program just to
test it before I sent the patch:
#include <stdio.h>
int main(void)
{
unsigned long long src = -1ULL;
signed long long dst1, dst2;
int is_signed = 1;
dst1 = is_signed ? *(int *)&src : *(unsigned int *)0;
dst2 = is_signed ? (signed long long)*(int *)&src : *(unsigned int *)0;
printf("%lld\n", dst1);
printf("%lld\n", dst2);
return 0;
}
Fixes: d90ec262b35b ("libbpf: Add enum64 support for btf_dump") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/YtZ+LpgPADm7BeEd@kili
Jason Gerecke [Fri, 15 Jul 2022 23:05:19 +0000 (16:05 -0700)]
HID: wacom: Force pen out of prox if no events have been received in a while
Prox-out events may not be reliably sent by some AES firmware. This can
cause problems for users, particularly due to arbitration logic disabling
touch input while the pen is in prox.
This commit adds a timer which is reset every time a new prox event is
received. When the timer expires we check to see if the pen is still in
prox and force it out if necessary. This is patterend off of the same
solution used by 'hid-letsketch' driver which has a similar problem.
Newer AMD SOCs use SFH1.1 memory access with new PCI-id. Hence add new
sfh1_1 sub directory to implement SFH1.1 functionality by defining new
PCI id, interface functions, descriptor functions and handlers which
invokes sfh1.1.
HID: core: remove unneeded assignment in hid_process_report()
Commit bebcc522fbee ("HID: core: for input reports, process the usages by
priority list") split the iteration into two distinct loops in
hid_process_report().
After this change, the variable field is only used while iterating in the
second loop and the assignment of values to this variable in the first loop
is simply not needed.
Remove the unneeded assignment during retrieval. No functional change and
no change in the resulting object code.
This was discovered as a dead store with clang-analyzer.
net/cdc_ncm: Increase NTB max RX/TX values to 64kb
DisplayLink ethernet devices require NTB buffers larger then 32kb
in order to run with highest performance.
This patch is changing upper limit of the rx and tx buffers.
Those buffers are initialized with CDC_NCM_NTB_DEF_SIZE_RX and
CDC_NCM_NTB_DEF_SIZE_TX which is 16kb so by default no device is
affected by increased limit.
Rx and tx buffer is increased under two conditions:
- Device need to advertise that it supports higher buffer size in
dwNtbMaxInMaxSize and dwNtbMaxOutMaxSize.
- cdc_ncm/rx_max and cdc_ncm/tx_max driver parameters must be adjusted
with udev rule or ethtool.
Summary of testing and performance results:
Tests were performed on following devices:
- DisplayLink DL-3xxx family device
- DisplayLink DL-6xxx family device
- ASUS USB-C2500 2.5G USB3 ethernet adapter
- Plugable USB3 1G USB3 ethernet adapter
- EDIMAX EU-4307 USB-C ethernet adapter
- Dell DBQBCBC064 USB-C ethernet adapter
Performance measurements were done with:
- iperf3 between two linux boxes
- http://openspeedtest.com/ instance running on local test machine
Insights from tests results:
- All except one from third party usb adapters were not affected by
increased buffer size to their advertised dwNtbOutMaxSize and
dwNtbInMaxSize.
Devices were generally reaching 912-940Mbps both download and upload.
Only EDIMAX adapter experienced decreased download size from
929Mbps to 827Mbps with iper3, with openspeedtest decrease was from
968Mbps to 886Mbps.
- DisplayLink DL-3xxx family devices experienced performance increase
with iperf3 download from 300Mbps to 870Mbps and
upload from 782Mbps to 844Mbps.
With openspeedtest download increased from 556Mbps to 873Mbps
and upload from 727Mbps to 973Mbps
- DiplayLink DL-6xxx family devices are not affected by
increased buffer size.
Jiang Jian [Tue, 21 Jun 2022 12:27:51 +0000 (20:27 +0800)]
ID: intel-ish-hid: hid-client: drop unexpected word "the" in the comments
there is an unexpected word "the" in the comments that need to be dropped
file: drivers/hid/intel-ish-hid/ishtp-hid-client.c
line: 331
* @device: Pointer to the the ishtp client device for which this message
changed to
* @device: Pointer to the ishtp client device for which this message
HID: mcp2221: prevent a buffer overflow in mcp_smbus_write()
Smatch Warning:
drivers/hid/hid-mcp2221.c:388 mcp_smbus_write() error: __memcpy()
'&mcp->txbuf[5]' too small (59 vs 255)
drivers/hid/hid-mcp2221.c:388 mcp_smbus_write() error: __memcpy() 'buf'
too small (34 vs 255)
The 'len' variable can take a value between 0-255 as it can come from
data->block[0] and it is user data. So add an bound check to prevent a
buffer overflow in memcpy().
Fixes: 67a95c21463d ("HID: mcp2221: add usb to i2c-smbus host bridge") Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Yang Xu [Thu, 14 Jul 2022 06:11:28 +0000 (14:11 +0800)]
ceph: rely on vfs for setgid stripping
Now that we finished moving setgid stripping for regular files in setgid
directories into the vfs, individual filesystem don't need to manually
strip the setgid bit anymore. Drop the now unneeded code from ceph.
Link: https://lore.kernel.org/r/1657779088-2242-4-git-send-email-xuyang2018.jy@fujitsu.com Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Christian Brauner (Microsoft)<brauner@kernel.org> Reviewed-and-Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Yang Xu [Thu, 14 Jul 2022 06:11:27 +0000 (14:11 +0800)]
fs: move S_ISGID stripping into the vfs_*() helpers
Move setgid handling out of individual filesystems and into the VFS
itself to stop the proliferation of setgid inheritance bugs.
Creating files that have both the S_IXGRP and S_ISGID bit raised in
directories that themselves have the S_ISGID bit set requires additional
privileges to avoid security issues.
When a filesystem creates a new inode it needs to take care that the
caller is either in the group of the newly created inode or they have
CAP_FSETID in their current user namespace and are privileged over the
parent directory of the new inode. If any of these two conditions is
true then the S_ISGID bit can be raised for an S_IXGRP file and if not
it needs to be stripped.
However, there are several key issues with the current implementation:
* S_ISGID stripping logic is entangled with umask stripping.
If a filesystem doesn't support or enable POSIX ACLs then umask
stripping is done directly in the vfs before calling into the
filesystem.
If the filesystem does support POSIX ACLs then unmask stripping may be
done in the filesystem itself when calling posix_acl_create().
Since umask stripping has an effect on S_ISGID inheritance, e.g., by
stripping the S_IXGRP bit from the file to be created and all relevant
filesystems have to call posix_acl_create() before inode_init_owner()
where we currently take care of S_ISGID handling S_ISGID handling is
order dependent. IOW, whether or not you get a setgid bit depends on
POSIX ACLs and umask and in what order they are called.
Note that technically filesystems are free to impose their own
ordering between posix_acl_create() and inode_init_owner() meaning
that there's additional ordering issues that influence S_SIGID
inheritance.
* Filesystems that don't rely on inode_init_owner() don't get S_ISGID
stripping logic.
While that may be intentional (e.g. network filesystems might just
defer setgid stripping to a server) it is often just a security issue.
This is not just ugly it's unsustainably messy especially since we do
still have bugs in this area years after the initial round of setgid
bugfixes.
So the current state is quite messy and while we won't be able to make
it completely clean as posix_acl_create() is still a filesystem specific
call we can improve the S_SIGD stripping situation quite a bit by
hoisting it out of inode_init_owner() and into the vfs creation
operations. This means we alleviate the burden for filesystems to handle
S_ISGID stripping correctly and can standardize the ordering between
S_ISGID and umask stripping in the vfs.
We add a new helper vfs_prepare_mode() so S_ISGID handling is now done
in the VFS before umask handling. This has S_ISGID handling is
unaffected unaffected by whether umask stripping is done by the VFS
itself (if no POSIX ACLs are supported or enabled) or in the filesystem
in posix_acl_create() (if POSIX ACLs are supported).
The vfs_prepare_mode() helper is called directly in vfs_*() helpers that
create new filesystem objects. We need to move them into there to make
sure that filesystems like overlayfs hat have callchains like:
get S_ISGID stripping done when calling into lower filesystems via
vfs_*() creation helpers. Moving vfs_prepare_mode() into e.g.
vfs_mknod() takes care of that. This is in any case semantically cleaner
because S_ISGID stripping is VFS security requirement.
Security hooks so far have seen the mode with the umask applied but
without S_ISGID handling done. The relevant hooks are called outside of
vfs_*() creation helpers so by calling vfs_prepare_mode() from vfs_*()
helpers the security hooks would now see the mode without umask
stripping applied. For now we fix this by passing the mode with umask
settings applied to not risk any regressions for LSM hooks. IOW, nothing
changes for LSM hooks. It is worth pointing out that security hooks
never saw the mode that is seen by the filesystem when actually creating
the file. They have always been completely misplaced for that to work.
The following filesystems use inode_init_owner() and thus relied on
S_ISGID stripping: spufs, 9p, bfs, btrfs, ext2, ext4, f2fs, hfsplus,
hugetlbfs, jfs, minix, nilfs2, ntfs3, ocfs2, omfs, overlayfs, ramfs,
reiserfs, sysv, ubifs, udf, ufs, xfs, zonefs, bpf, tmpfs.
All of the above filesystems end up calling inode_init_owner() when new
filesystem objects are created through the ->mkdir(), ->mknod(),
->create(), ->tmpfile(), ->rename() inode operations.
Since directories always inherit the S_ISGID bit with the exception of
xfs when irix_sgid_inherit mode is turned on S_ISGID stripping doesn't
apply. The ->symlink() and ->link() inode operations trivially inherit
the mode from the target and the ->rename() inode operation inherits the
mode from the source inode. All other creation inode operations will get
S_ISGID handling via vfs_prepare_mode() when called from their relevant
vfs_*() helpers.
In addition to this there are filesystems which allow the creation of
filesystem objects through ioctl()s or - in the case of spufs -
circumventing the vfs in other ways. If filesystem objects are created
through ioctl()s the vfs doesn't know about it and can't apply regular
permission checking including S_ISGID logic. Therfore, a filesystem
relying on S_ISGID stripping in inode_init_owner() in their ioctl()
callpath will be affected by moving this logic into the vfs. We audited
those filesystems:
* btrfs allows the creation of filesystem objects through various
ioctls(). Snapshot creation literally takes a snapshot and so the mode
is fully preserved and S_ISGID stripping doesn't apply.
Creating a new subvolum relies on inode_init_owner() in
btrfs_new_subvol_inode() but only creates directories and doesn't
raise S_ISGID.
* ocfs2 has a peculiar implementation of reflinks. In contrast to e.g.
xfs and btrfs FICLONE/FICLONERANGE ioctl() that is only concerned with
the actual extents ocfs2 uses a separate ioctl() that also creates the
target file.
Iow, ocfs2 circumvents the vfs entirely here and did indeed rely on
inode_init_owner() to strip the S_ISGID bit. This is the only place
where a filesystem needs to call mode_strip_sgid() directly but this
is self-inflicted pain.
* spufs doesn't go through the vfs at all and doesn't use ioctl()s
either. Instead it has a dedicated system call spufs_create() which
allows the creation of filesystem objects. But spufs only creates
directories and doesn't allo S_SIGID bits, i.e. it specifically only
allows 0777 bits.
* bpf uses vfs_mkobj() but also doesn't allow S_ISGID bits to be created.
The patch will have an effect on ext2 when the EXT2_MOUNT_GRPID mount
option is used, on ext4 when the EXT4_MOUNT_GRPID mount option is used,
and on xfs when the XFS_FEAT_GRPID mount option is used. When any of
these filesystems are mounted with their respective GRPID option then
newly created files inherit the parent directories group
unconditionally. In these cases non of the filesystems call
inode_init_owner() and thus did never strip the S_ISGID bit for newly
created files. Moving this logic into the VFS means that they now get
the S_ISGID bit stripped. This is a user visible change. If this leads
to regressions we will either need to figure out a better way or we need
to revert. However, given the various setgid bugs that we found just in
the last two years this is a regression risk we should take.
Associated with this change is a new set of fstests to enforce the
semantics for all new filesystems.
Link: https://lore.kernel.org/ceph-devel/20220427092201.wvsdjbnc7b4dttaw@wittgenstein
Link: e014f37db1a2 ("xfs: use setattr_copy to set vfs inode attributes") [2]
Link: 01ea173e103e ("xfs: fix up non-directory creation in SGID directories") [3]
Link: fd84bfdddd16 ("ceph: fix up non-directory creation in SGID directories") [4] Link: https://lore.kernel.org/r/1657779088-2242-3-git-send-email-xuyang2018.jy@fujitsu.com Suggested-by: Dave Chinner <david@fromorbit.com> Suggested-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-and-Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
[<brauner@kernel.org>: rewrote commit message] Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Peter Zijlstra [Fri, 17 Jun 2022 14:52:06 +0000 (16:52 +0200)]
x86/extable: Fix ex_handler_msr() print condition
On Fri, Jun 17, 2022 at 02:08:52PM +0300, Stephane Eranian wrote:
> Some changes to the way invalid MSR accesses are reported by the
> kernel is causing some problems with messages printed on the
> console.
>
> We have seen several cases of ex_handler_msr() printing invalid MSR
> accesses once but the callstack multiple times causing confusion on
> the console.
> The problem here is that another earlier commit (5.13):
>
> a358f40600b3 ("once: implement DO_ONCE_LITE for non-fast-path "do once" functionality")
>
> Modifies all the pr_*_once() calls to always return true claiming
> that no caller is ever checking the return value of the functions.
>
> This is why we are seeing the callstack printed without the
> associated printk() msg.
Extract the ONCE_IF(cond) part into __ONCE_LTE_IF() and use that to
implement DO_ONCE_LITE_IF() and fix the extable code.
Cruz Zhao [Tue, 28 Jun 2022 07:57:23 +0000 (15:57 +0800)]
sched/core: Fix the bug that task won't enqueue into core tree when update cookie
In function sched_core_update_cookie(), a task will enqueue into the
core tree only when it enqueued before, that is, if an uncookied task
is cookied, it will not enqueue into the core tree until it enqueue
again, which will result in unnecessary force idle.
Here follows the scenario:
CPU x and CPU y are a pair of SMT siblings.
1. Start task a running on CPU x without sleeping, and task b and
task c running on CPU y without sleeping.
2. We create a cookie and share it to task a and task b, and then
we create another cookie and share it to task c.
3. Simpling core_forceidle_sum of task a and b from /proc/PID/sched
And we will find out that core_forceidle_sum of task a takes 30%
time of the sampling period, which shouldn't happen as task a and b
have the same cookie.
Then we migrate task a to CPU x', migrate task b and c to CPU y', where
CPU x' and CPU y' are a pair of SMT siblings, and sampling again, we
will found out that core_forceidle_sum of task a and b are almost zero.
To solve this problem, we enqueue the task into the core tree if it's
on rq.
nohz/full, sched/rt: Fix missed tick-reenabling bug in dequeue_task_rt()
dequeue_task_rt() only decrements 'rt_rq->rt_nr_running' after having
called sched_update_tick_dependency() preventing it from re-enabling the
tick on systems that no longer have pending SCHED_RT tasks but have
multiple runnable SCHED_OTHER tasks:
Every other scheduler class performs the operation in the opposite
order, and sched_update_tick_dependency() expects the values to be
updated as such. So avoid the misbehaviour by inverting the order in
which the above operations are performed in the RT scheduler.
Fixes: 76d92ac305f2 ("sched: Migrate sched to use new tick dependency mask model") Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Phil Auld <pauld@redhat.com> Link: https://lore.kernel.org/r/20220628092259.330171-1-nsaenzju@redhat.com
Juri Lelli [Thu, 14 Jul 2022 15:19:08 +0000 (17:19 +0200)]
sched/deadline: Fix BUG_ON condition for deboosted tasks
Tasks the are being deboosted from SCHED_DEADLINE might enter
enqueue_task_dl() one last time and hit an erroneous BUG_ON condition:
since they are not boosted anymore, the if (is_dl_boosted()) branch is
not taken, but the else if (!dl_prio) is and inside this one we
BUG_ON(!is_dl_boosted), which is of course false (BUG_ON triggered)
otherwise we had entered the if branch above. Long story short, the
current condition doesn't make sense and always leads to triggering of a
BUG.
Fix this by only checking enqueue flags, properly: ENQUEUE_REPLENISH has
to be present, but additional flags are not a problem.
Fixes: 64be6f1f5f71 ("sched/deadline: Don't replenish from a !SCHED_DEADLINE entity") Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20220714151908.533052-1-juri.lelli@redhat.com
====================
net: ipa: move configuration data files
This series moves the "ipa_data-vX.Y.c" files into a subdirectory.
The first patch adds a Makefile variable containing the list of
supported IPA versions, and uses it to simplify the way these files
are specified.
====================
Alex Elder [Tue, 19 Jul 2022 15:08:26 +0000 (10:08 -0500)]
net: ipa: list supported IPA versions in the Makefile
Create a variable in the Makefile listing the IPA versions supported
by the driver. Use that to create the list of configuration data
object files used (rather than listing them all individually).
Add a SPDX license comment.
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 21 Jul 2022 04:05:07 +0000 (21:05 -0700)]
Merge branch 'net-ipa-small-transaction-updates'
Alex Elder says:
====================
net: ipa: small transaction updates
Version 2 of this series corrects a misspelling of "outstanding"
pointed out by the netdev test bots. (For some reason I don't see
that when I run "checkpatch".) I found and fixed a second instance
of that word being misspelled as well.
This series includes three changes to the transaction code. The
first adds a new transaction list that represents a distinct state
that has not been maintained. The second moves a field in the
transaction information structure, and reorders its initialization
a bit. The third skips a function call when it is known not to be
necessary.
The last two are very small "leftover" patches.
====================
Alex Elder [Tue, 19 Jul 2022 18:10:20 +0000 (13:10 -0500)]
net: ipa: fix an outdated comment
Since commit 8797972afff3d ("net: ipa: remove command info pool"),
we don't allocate "command info" entries for command channel
transactions. Fix a comment that seems to suggest we still do.
(Even before that commit, the comment was out of place.)
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Tue, 19 Jul 2022 18:10:19 +0000 (13:10 -0500)]
net: ipa: report when the driver has been removed
When the IPA driver has completed its initialization and setup
stages, it emits a brief message to the log. Add a small message
that reports when it has been removed.
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Tue, 19 Jul 2022 18:10:18 +0000 (13:10 -0500)]
net: ipa: skip some cleanup for unused transactions
In gsi_trans_free(), there's no point in ipa_gsi_trans_release() if
a transaction is unused. No used TREs means no IPA layer resources
to clean up. So only call ipa_gsi_trans_release() if at least one
TRE was used.
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Tue, 19 Jul 2022 18:10:17 +0000 (13:10 -0500)]
net: ipa: rearrange transaction initialization
The transaction map is really associated with the transaction pool;
move its definition earlier in the gsi_trans_info structure.
Rearrange initialization in gsi_channel_trans_init() so it
sets the tre_avail value first, then initializes the transaction
pool, and finally allocating the transaction map.
Update comments.
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>