The current implementation accesses the `child` fwnode handle outside of
fwnode_for_each_available_child_node() without incrementing its
refcount. Add the missing call to `fwnode_handle_get(child)`.
The cleanup process where `child` is accessed is not right either
because a single call to `fwnode_handle_put()` is carried out in case of
an error, ignoring unasigned nodes at the point when the error happens.
Keep `child` inside of the first loop, and use the helper pointer that
receives references via `fwnode_handle_get()` to handle the child nodes
within the second loop.
Moreover, the iterated nodes are direct children of the device node,
and the `device_for_each_child_node()` macro accounts for child node
availability. By restricting `child` to live within that loop, the
scoped version of it can be used to simplify the error handling.
`fwnode_for_each_available_child_node()` is meant to access the child
nodes of an fwnode, and therefore not direct child nodes of the device
node.
Use `device_for_each_child_node_scoped()` to indicate device's direct
child nodes.
Fix seveal odd-looking places in SM8550's dispcc driver:
- duplicate entries in disp_cc_parent_map_4 and disp_cc_parent_map_5
- using &disp_cc_mdss_dptx0_link_div_clk_src as a source for
disp_cc_mdss_dptx1_usb_router_link_intf_clk
Similar to DCLK_LCDC on RK3328, the DCLK_VOP on RK3228 is typically
parented by the hdmiphy clk and it is expected that the DCLK_VOP and
hdmiphy clk rate are kept in sync.
Use CLK_SET_RATE_PARENT and CLK_SET_RATE_NO_REPARENT flags, same as used
on RK3328, to make full use of all possible supported display modes.
Fixes: 0a9d4ac08ebc ("clk: rockchip: set the clock ids for RK3228 VOP") Fixes: 307a2e9ac524 ("clk: rockchip: add clock controller for rk3228") Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Link: https://lore.kernel.org/r/20240615170417.3134517-3-jonas@kwiboo.se Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
Initialize workqueue before requesting mailbox channel, otherwise if
mailbox interrupt comes before workqueue ready, the imx_rproc_rx_callback
will trigger issue.
The initialization order of SCU clocks affects the sequence of SCU clock
resume. If there are no other effects, the earlier the initialization,
the earlier the resume. During SCU clock resume, the clock rate is
restored. As SCFW guidelines, configure the parent clock rate before
configuring the child rate.
Fixes: babfaa9556d7 ("clk: imx: scu: add more scu clocks") Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Abel Vesa <abel.vesa@linaro.org> Link: https://lore.kernel.org/r/20240607133347.3291040-15-peng.fan@oss.nxp.com Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The initialization order of SCU clocks affects the sequence of SCU clock
resume. If there are no other effects, the earlier the initialization,
the earlier the resume. During SCU clock resume, the clock rate is
restored. As SCFW guidelines, configure the parent clock rate before
configuring the child rate.
Fixes: 91e916771de0 ("clk: imx: scu: remove legacy scu clock binding support") Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Abel Vesa <abel.vesa@linaro.org> Link: https://lore.kernel.org/r/20240607133347.3291040-14-peng.fan@oss.nxp.com Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
On the i.MX8M*, the TF-A exposes a SiP (Silicon Provider) service
for DDR frequency scaling. The imx8m-ddrc-devfreq driver calls the
SiP and then does clk_set_parent on the DDR muxes to synchronize
the clock tree.
since commit 936c383673b9 ("clk: imx: fix composite peripheral flags"),
these TF-A managed muxes have SET_PARENT_GATE set, which results
in imx8m-ddrc-devfreq's clk_set_parent after SiP failing with -EBUSY:
commit 926bf91248dd
("clk: imx8m: fix clock tree update of TF-A managed clocks") adds this
method and enables 8mm, 8mn and 8mq. i.MX8MP also needs it.
This is safe to do, because updating the Linux clock tree to reflect
reality will always be glitch-free.
Another reason to this patch is that powersave image BT music
requires dram to be 400MTS, so clk_set_parent(dram_alt_src,
sys1_pll_800m); is required. Without this patch, it will not succeed.
Fixes: 936c383673b9 ("clk: imx: fix composite peripheral flags") Signed-off-by: Zhipeng Wang <zhipeng.wang_1@nxp.com> Reviewed-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Abel Vesa <abel.vesa@linaro.org> Link: https://lore.kernel.org/r/20240607133347.3291040-7-peng.fan@oss.nxp.com Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fractional part of PLL gets lost after re-enabling the PLL. the
MFN can NOT be automatically loaded when doing frac PLL enable/disable,
So when re-enable PLL, configure mfn explicitly.
Fixes: 1b26cb8a77a4 ("clk: imx: support fracn gppll") Signed-off-by: Pengfei Li <pengfei.li_1@nxp.com> Reviewed-by: Jacky Bai <ping.bai@nxp.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Abel Vesa <abel.vesa@linaro.org> Link: https://lore.kernel.org/r/20240607133347.3291040-5-peng.fan@oss.nxp.com Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When some module is disabled by fuse, its PCC PR bit is default 0 and
PCC is not operational. Any write to this PCC will cause SError.
Fixes: b40ba8065347 ("clk: imx: Update the compsite driver to support imx8ulp") Reviewed-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Ye Li <ye.li@nxp.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Abel Vesa <abel.vesa@linaro.org> Link: https://lore.kernel.org/r/20240607133347.3291040-4-peng.fan@oss.nxp.com Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Previously we assumed that the root clock slice is enabled
by default when kernel boot up. But the bootloader may disable
the clocks before jump into kernel. The gate ops should be registered
rather than NULL to make sure the disabled clock can be enabled
when kernel boot up. Refine the code to skip disable the clock
if mcore booted.
Fixes: a740d7350ff7 ("clk: imx: imx93: add mcore_booted module paratemter") Signed-off-by: Jacky Bai <ping.bai@nxp.com> Reviewed-by: Peng Fan <peng.fan@nxp.com> Tested-by: Chancel Liu <chancel.liu@nxp.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Abel Vesa <abel.vesa@linaro.org> Link: https://lore.kernel.org/r/20240607133347.3291040-3-peng.fan@oss.nxp.com Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Bootloader might disable some CCM ROOT Slices. So if mcore_booted set with
display CCM ROOT disabled by Bootloader, kernel display BLK CTRL driver
imx8m_blk_ctrl_driver_init may hang the system because the BUS clk is
disabled.
Add back gate ops, but with disable doing nothing, then the CCM ROOT
will be enabled when used.
Fixes: bb7e897b002a ("clk: imx8m: check mcore_booted before register clk") Reviewed-by: Ye Li <ye.li@nxp.com> Reviewed-by: Jacky Bai <ping.bai@nxp.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Abel Vesa <abel.vesa@linaro.org> Link: https://lore.kernel.org/r/20240607133347.3291040-2-peng.fan@oss.nxp.com Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The function “kfree” was called in up to three cases
by the function “__imx8m_clk_hw_composite” during error handling
even if the passed variables contained a null pointer.
Adjust jump targets according to the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Peng Fan <peng.fan@nxp.com> Link: https://lore.kernel.org/r/147ca1e6-69f3-4586-b5b3-b69f9574a862@web.de Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
Stable-dep-of: 8f32e9dd0916 ("clk: imx: composite-8m: Enable gate clk with mcore_booted") Signed-off-by: Sasha Levin <sashal@kernel.org>
The clk_set_parent for "enet1_ref_sel" and "enet2_ref_sel" are
incorrect, therefore the original requirements to have "enet_clk_ref" as
output sourced by iMX ENET PLL as a default config is not met.
Only "enet[1,2]_ref_125m" "enet[1,2]_ref_pad" are possible parents for
"enet1_ref_sel" and "enet2_ref_sel".
This was observed as a regression using a custom device tree which was
expecting this default config.
This can be fixed at the device tree level but having a default config
matching the original behavior (before refclock mux) will avoid breaking
existing configs.
Switch strtoul to strtoull as an unsigned long in 32-bit build isn't
64-bits.
Fixes: c284d669a20d408b ("perf tools: Move parse_nsec_time to time-utils.c") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: David Ahern <dsa@cumulusnetworks.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/r/20240831070415.506194-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If sched_in event for current task is not recorded, sched_in timestamp
will be set to end_time of time window interest, causing an error in
timestamp show. In this case, we choose to ignore this event.
Test scenario:
perf[1229608] does not record the first sched_in event, run time and sch delay are both 0
# perf sched timehist
Samples of sched_switch event do not have callchains.
time cpu task name wait time sch delay run time
[tid/pid] (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- --------- 2090450.763231 [0000] perf[1229608] 0.000 0.000 0.000 2090450.763235 [0000] migration/0[15] 0.000 0.001 0.003 2090450.763263 [0001] perf[1229608] 0.000 0.000 0.000 2090450.763268 [0001] migration/1[21] 0.000 0.001 0.004 2090450.763302 [0002] perf[1229608] 0.000 0.000 0.000 2090450.763309 [0002] migration/2[27] 0.000 0.001 0.007 2090450.763338 [0003] perf[1229608] 0.000 0.000 0.000 2090450.763343 [0003] migration/3[33] 0.000 0.001 0.004
Before:
arbitrarily specify a time window of interest, timestamp will be set to an incorrect value
# perf sched timehist --time 100,200
Samples of sched_switch event do not have callchains.
time cpu task name wait time sch delay run time
[tid/pid] (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- ---------
200.000000 [0000] perf[1229608] 0.000 0.000 0.000
200.000000 [0001] perf[1229608] 0.000 0.000 0.000
200.000000 [0002] perf[1229608] 0.000 0.000 0.000
200.000000 [0003] perf[1229608] 0.000 0.000 0.000
200.000000 [0004] perf[1229608] 0.000 0.000 0.000
200.000000 [0005] perf[1229608] 0.000 0.000 0.000
200.000000 [0006] perf[1229608] 0.000 0.000 0.000
200.000000 [0007] perf[1229608] 0.000 0.000 0.000
After:
# perf sched timehist --time 100,200
Samples of sched_switch event do not have callchains.
time cpu task name wait time sch delay run time
[tid/pid] (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- ---------
Fixes: 853b74071110bed3 ("perf sched timehist: Add option to specify time window of interest") Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsa@cumulusnetworks.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240819024720.2405244-1-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently we'll only print metric headers for metric leader in
aggregration mode. This will make `perf iostat` header not shown
since it'll aggregrated globally but don't have metric events:
root@ubuntu204:/home/yang/linux/tools/perf# ./perf stat --iostat --timeout 1000
Performance counter stats for 'system wide':
port
0000:00 0 0 0 0
0000:80 0 0 0 0
[...]
Fix this by excluding the iostat in the check of printing metric
headers. Then we can see the headers:
When perf_time__parse_str() fails in perf_sched__timehist(),
need to free session that was previously created, fix it.
Fixes: 853b74071110bed3 ("perf sched timehist: Add option to specify time window of interest") Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsa@cumulusnetworks.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240806023533.1316348-1-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Some fields in the 'struct annotation' are only used with 'struct
annotated_source' so better to be moved there in order to reduce memory
consumption for other symbols.
Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231103191907.54531-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Stable-dep-of: 3ef44458071a ("perf report: Fix --total-cycles --stdio output error") Signed-off-by: Sasha Levin <sashal@kernel.org>
The cycles info is only meaningful when sample has branch stacks. To
save the memory for normal cases, move those fields to a new 'struct
annotated_branch' and dynamically allocate it when needed. Also move
cycles_hist from annotated_source as it's related here.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231103191907.54531-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Stable-dep-of: 3ef44458071a ("perf report: Fix --total-cycles --stdio output error") Signed-off-by: Sasha Levin <sashal@kernel.org>
The processing of leader samples would turn an individual sample with
a group of read values into multiple samples. 'perf inject' would pass
through the additional samples increasing the output data file size:
This behavior is incorrect as in the case above 'perf inject' should
have done nothing. Fix this behavior by disabling separating samples
for a tool that requests it. Only request this for `perf inject` so as
to not affect other perf tools. With the patch and the test above
there are no differences between the orig.txt and new.txt.
Fixes: e4caec0d1af3d608 ("perf evsel: Add PERF_SAMPLE_READ sample related processing") Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240729220620.2957754-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
For all non-tracing helpers which formerly had ARG_PTR_TO_{LONG,INT} as input
arguments, zero the value for the case of an error as otherwise it could leak
memory. For tracing, it is not needed given CAP_PERFMON can already read all
kernel memory anyway hence bpf_get_func_arg() and bpf_get_func_ret() is skipped
in here.
Also, the MTU helpers mtu_len pointer value is being written but also read.
Technically, the MEM_UNINIT should not be there in order to always force init.
Removing MEM_UNINIT needs more verifier rework though: MEM_UNINIT right now
implies two things actually: i) write into memory, ii) memory does not have
to be initialized. If we lift MEM_UNINIT, it then becomes: i) read into memory,
ii) memory must be initialized. This means that for bpf_*_check_mtu() we're
readding the issue we're trying to fix, that is, it would then be able to
write back into things like .rodata BPF maps. Follow-up work will rework the
MEM_UNINIT semantics such that the intent can be better expressed. For now
just clear the *mtu_len on error path which can be lifted later again.
When checking malformed helper function signatures, also take other argument
types into account aside from just ARG_PTR_TO_UNINIT_MEM.
This concerns (formerly) ARG_PTR_TO_{INT,LONG} given uninitialized memory can
be passed there, too.
The func proto sanity check goes back to commit 435faee1aae9 ("bpf, verifier:
add ARG_PTR_TO_RAW_STACK type"), and its purpose was to detect wrong func protos
which had more than just one MEM_UNINIT-tagged type as arguments.
The reason more than one is currently not supported is as we mark stack slots with
STACK_MISC in check_helper_call() in case of raw mode based on meta.access_size to
allow uninitialized stack memory to be passed to helpers when they just write into
the buffer.
Probing for base type as well as MEM_UNINIT tagging ensures that other types do not
get missed (as it used to be the case for ARG_PTR_TO_{INT,LONG}).
Lonial found an issue that despite user- and BPF-side frozen BPF map
(like in case of .rodata), it was still possible to write into it from
a BPF program side through specific helpers having ARG_PTR_TO_{LONG,INT}
as arguments.
In check_func_arg() when the argument is as mentioned, the meta->raw_mode
is never set. Later, check_helper_mem_access(), under the case of
PTR_TO_MAP_VALUE as register base type, it assumes BPF_READ for the
subsequent call to check_map_access_type() and given the BPF map is
read-only it succeeds.
The helpers really need to be annotated as ARG_PTR_TO_{LONG,INT} | MEM_UNINIT
when results are written into them as opposed to read out of them. The
latter indicates that it's okay to pass a pointer to uninitialized memory
as the memory is written to anyway.
However, ARG_PTR_TO_{LONG,INT} is a special case of ARG_PTR_TO_FIXED_SIZE_MEM
just with additional alignment requirement. So it is better to just get
rid of the ARG_PTR_TO_{LONG,INT} special cases altogether and reuse the
fixed size memory types. For this, add MEM_ALIGNED to additionally ensure
alignment given these helpers write directly into the args via *<ptr> = val.
The .arg*_size has been initialized reflecting the actual sizeof(*<ptr>).
MEM_ALIGNED can only be used in combination with MEM_FIXED_SIZE annotated
argument types, since in !MEM_FIXED_SIZE cases the verifier does not know
the buffer size a priori and therefore cannot blindly write *<ptr> = val.
The bpf_strtol() and bpf_strtoul() helpers are currently broken on 32bit:
The argument type ARG_PTR_TO_LONG is BPF-side "long", not kernel-side "long"
and therefore always considered fixed 64bit no matter if 64 or 32bit underlying
architecture.
This contract breaks in case of the two mentioned helpers since their BPF_CALL
definition for the helpers was added with {unsigned,}long *res. Meaning, the
transition from BPF-side "long" (BPF program) to kernel-side "long" (BPF helper)
breaks here.
Both helpers call __bpf_strtoll() with "long long" correctly, but later assigning
the result into 32-bit "*(long *)" on 32bit architectures. From a BPF program
point of view, this means upper bits will be seen as uninitialised.
Therefore, fix both BPF_CALL signatures to {s,u}64 types to fix this situation.
Now, changing also uapi/bpf.h helper documentation which generates bpf_helper_defs.h
for BPF programs is tricky: Changing signatures there to __{s,u}64 would trigger
compiler warnings (incompatible pointer types passing 'long *' to parameter of type
'__s64 *' (aka 'long long *')) for existing BPF programs.
Leaving the signatures as-is would be fine as from BPF program point of view it is
still BPF-side "long" and thus equivalent to __{s,u}64 on 64 or 32bit underlying
architectures.
Note that bpf_strtol() and bpf_strtoul() are the only helpers with this issue.
The function nilfs_btree_check_delete(), which checks whether degeneration
to direct mapping occurs before deleting a b-tree entry, causes memory
access outside the block buffer when retrieving the maximum key if the
root node has no entries.
This does not usually happen because b-tree mappings with 0 child nodes
are never created by mkfs.nilfs2 or nilfs2 itself. However, it can happen
if the b-tree root node read from a device is configured that way, so fix
this potential issue by adding a check for that case.
Due to the nature of b-trees, nilfs2 itself and admin tools such as
mkfs.nilfs2 will never create an intermediate b-tree node block with 0
child nodes, nor will they delete (key, pointer)-entries that would result
in such a state. However, it is possible that a b-tree node block is
corrupted on the backing device and is read with 0 child nodes.
Because operation is not guaranteed if the number of child nodes is 0 for
intermediate node blocks other than the root node, modify
nilfs_btree_node_broken(), which performs sanity checks when reading a
b-tree node block, so that such cases will be judged as metadata
corruption.
Patch series "nilfs2: fix potential issues with empty b-tree nodes".
This series addresses three potential issues with empty b-tree nodes that
can occur with corrupted filesystem images, including one recently
discovered by syzbot.
This patch (of 3):
If a b-tree is broken on the device, and the b-tree height is greater than
2 (the level of the root node is greater than 1) even if the number of
child nodes of the b-tree root is 0, a NULL pointer dereference occurs in
nilfs_btree_prepare_insert(), which is called from nilfs_btree_insert().
This is because, when the number of child nodes of the b-tree root is 0,
nilfs_btree_do_lookup() does not set the block buffer head in any of
path[x].bp_bh, leaving it as the initial value of NULL, but if the level
of the b-tree root node is greater than 1, nilfs_btree_get_nonroot_node(),
which accesses the buffer memory of path[x].bp_bh, is called.
Fix this issue by adding a check to nilfs_btree_root_broken(), which
performs sanity checks when reading the root node from the device, to
detect this inconsistency.
Thanks to Lizhi Xu for trying to solve the bug and clarifying the cause
early on.
Problem statement:
Since commit fc137c0ddab2 ("sched/numa: enhance vma scanning logic"), the
Numa vma scan overhead has been reduced a lot. Meanwhile, the reducing of
the vma scan might create less Numa page fault information. The
insufficient information makes it harder for the Numa balancer to make
decision. Later, commit b7a5b537c55c08 ("sched/numa: Complete scanning of
partial VMAs regardless of PID activity") and commit 84db47ca7146d7
("sched/numa: Fix mm numa_scan_seq based unconditional scan") are found to
bring back part of the performance.
Recently when running SPECcpu omnetpp_r on a 320 CPUs/2 Sockets system, a
long duration of remote Numa node read was observed by PMU events: A few
cores having ~500MB/s remote memory access for ~20 seconds. It causes
high core-to-core variance and performance penalty. After the
investigation, it is found that many vmas are skipped due to the active
PID check. According to the trace events, in most cases,
vma_is_accessed() returns false because the history access info stored in
pids_active array has been cleared.
Proposal:
The main idea is to adjust vma_is_accessed() to let it return true easier.
Thus compare the diff between mm->numa_scan_seq and
vma->numab_state->prev_scan_seq. If the diff has exceeded the threshold,
scan the vma.
This patch especially helps the cases where there are small number of
threads, like the process-based SPECcpu. Without this patch, if the
SPECcpu process access the vma at the beginning, then sleeps for a long
time, the pid_active array will be cleared. A a result, if this process
is woken up again, it never has a chance to set prot_none anymore.
Because only the first 2 times of access is granted for vma scan:
(current->mm->numa_scan_seq) - vma->numab_state->start_scan_seq) < 2 to be
worse, no other threads within the task can help set the prot_none. This
causes information lost.
Raghavendra helped test current patch and got the positive result
on the AMD platform:
VMAs are skipped if there is no recent fault activity but this represents
a chicken-and-egg problem as there may be no fault activity if the PTEs
are never updated to trap NUMA hints. There is an indirect reliance on
scanning to be forced early in the lifetime of a task but this may fail
to detect changes in phase behaviour. Force inactive VMAs to be scanned
when all other eligible VMAs have been updated within the same scan
sequence.
Test results in general look good with some changes in performance, both
negative and positive, depending on whether the additional scanning and
faulting was beneficial or not to the workload. The autonuma benchmark
workload NUMA01_THREADLOCAL was picked for closer examination. The workload
creates two processes with numerous threads and thread-local storage that
is zero-filled in a loop. It exercises the corner case where unrelated
threads may skip VMAs that are thread-local to another thread and still
has some VMAs that inactive while the workload executes.
The VMA skipping activity frequency with and without the patch:
Note that with the patch applied, the PID activity is ignored
(ignore_pid_inactive) to ensure a VMA with some activity is completely
scanned. In addition, a small number of VMAs are scanned when no other
eligible VMA is available during a single scan window (seq_completed).
The number of times a VMA is skipped due to no PID activity from the
scanning task (pid_inactive) drops dramatically. It is expected that
this will increase the number of PTEs updated for NUMA hinting faults
as well as hinting faults but these represent PTEs that would otherwise
have been missed. The tradeoff is scan+fault overhead versus improving
locality due to migration.
On a 2-socket Cascade Lake test machine, the time to complete the
workload is as follows;
The time to complete the workload is reduced by almost 30%:
6.6.0-rc2 6.6.0-rc2
sched-numabtrace-v1 sched-numabselective-v1 /
Duration User 91201.80 63506.64
Duration System 2015.53 1819.78
Duration Elapsed 1234.77 868.37
In this specific case, system CPU time was not increased but it's not
universally true.
From vmstat, the NUMA scanning and fault activity is as follows;
6.6.0-rc2 6.6.0-rc2
sched-numabtrace-v1 sched-numabselective-v1
Ops NUMA base-page range updates 64272.00 26374386.00
Ops NUMA PTE updates 36624.00 55538.00
Ops NUMA PMD updates 54.00 51404.00
Ops NUMA hint faults 15504.00 75786.00
Ops NUMA hint local faults % 14860.00 56763.00
Ops NUMA hint local percent 95.85 74.90
Ops NUMA pages migrated 1629.00 6469222.00
Both the number of PTE updates and hint faults is dramatically
increased. While this is superficially unfortunate, it represents
ranges that were simply skipped without the patch. As a result
of the scanning and hinting faults, many more pages were also
migrated but as the time to completion is reduced, the overhead
is offset by the gain.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Raghavendra K T <raghavendra.kt@amd.com> Link: https://lore.kernel.org/r/20231010083143.19593-7-mgorman@techsingularity.net
Stable-dep-of: f22cde4371f3 ("sched/numa: Fix the vma scan starving issue") Signed-off-by: Sasha Levin <sashal@kernel.org>
NUMA Balancing skips VMAs when the current task has not trapped a NUMA
fault within the VMA. If the VMA is skipped then mm->numa_scan_offset
advances and a task that is trapping faults within the VMA may never
fully update PTEs within the VMA.
Force tasks to update PTEs for partially scanned PTEs. The VMA will
be tagged for NUMA hints by some task but this removes some of the
benefit of tracking PID activity within a VMA. A follow-on patch
will mitigate this problem.
The test cases and machines evaluated did not trigger the corner case so
the performance results are neutral with only small changes within the
noise from normal test-to-test variance. However, the next patch makes
the corner case easier to trigger.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Raghavendra K T <raghavendra.kt@amd.com> Link: https://lore.kernel.org/r/20231010083143.19593-6-mgorman@techsingularity.net
Stable-dep-of: f22cde4371f3 ("sched/numa: Fix the vma scan starving issue") Signed-off-by: Sasha Levin <sashal@kernel.org>
Recent NUMA hinting faulting activity is reset approximately every
VMA_PID_RESET_PERIOD milliseconds. However, if the current task has not
accessed a VMA then the reset check is missed and the reset is potentially
deferred forever. Check if the PID activity information should be reset
before checking if the current task recently trapped a NUMA hinting fault.
NUMA balancing skips or scans VMAs for a variety of reasons. In preparation
for completing scans of VMAs regardless of PID access, trace the reasons
why a VMA was skipped. In a later patch, the tracing will be used to track
if a VMA was forcibly scanned.
The access_pids[] field name is somewhat ambiguous as no PIDs are accessed.
Similarly, it's not clear that next_pid_reset is related to access_pids[].
Rename the fields to more accurately reflect their purpose.
We disable stripe size in __ext4_fill_super if it is not a multiple of
the cluster ratio however this check is missed when trying to remount.
This can leave us with cases where stripe < cluster_ratio after
remount:set making EXT4_B2C(sbi->s_stripe) become 0 that can cause some
unforeseen bugs like divide by 0.
Fix that by adding the check in remount path as well.
When looking up for an entry in an inlined directory, if e_value_offs is
changed underneath the filesystem by some change in the block device, it
will lead to an out-of-bounds access that KASAN detects as an UAF.
EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 r/w without journal. Quota mode: none.
loop0: detected capacity change from 2048 to 2047
==================================================================
BUG: KASAN: use-after-free in ext4_search_dir+0xf2/0x1c0 fs/ext4/namei.c:1500
Read of size 1 at addr ffff88803e91130f by task syz-executor269/5103
Calling ext4_xattr_ibody_find right after reading the inode with
ext4_get_inode_loc will lead to a check of the validity of the xattrs,
avoiding this problem.
Reported-by: syzbot+0c2508114d912a54ee79@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=0c2508114d912a54ee79 Fixes: e8e948e7802a ("ext4: let ext4_find_entry handle inline data") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Link: https://patch.msgid.link/20240821152324.3621860-5-cascardo@igalia.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
In case of errors when reading an inode from disk or traversing inline
directory entries, return an error-encoded ERR_PTR instead of returning
NULL. ext4_find_inline_entry only caller, __ext4_find_entry already returns
such encoded errors.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Link: https://patch.msgid.link/20240821152324.3621860-3-cascardo@igalia.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Stable-dep-of: c6b72f5d82b1 ("ext4: avoid OOB when system.data xattr changes underneath the filesystem") Signed-off-by: Sasha Levin <sashal@kernel.org>
min_clusters is signed integer and will be converted to unsigned
integer when compared with unsigned number stats.free_clusters.
If min_clusters is negative, it will be converted to a huge unsigned
value in which case all groups may not meet the actual desired free
clusters.
Set negative min_clusters to 0 to avoid unexpected behavior.
If a group is marked EXT4_GROUP_INFO_IBITMAP_CORRUPT after it's inode
bitmap buffer_head was successfully verified, then __ext4_new_inode()
will get a valid inode_bitmap_bh of a corrupted group from
ext4_read_inode_bitmap() in which case inode_bitmap_bh misses a release.
Hnadle "IS_ERR(inode_bitmap_bh)" and group corruption separately like
how ext4_free_inode() does to avoid buffer_head leak.
Fixes: 9008a58e5dce ("ext4: make the bitmap read routines return real error codes") Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://patch.msgid.link/20240820132234.2759926-3-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
Release inode_bitmap_bh from ext4_read_inode_bitmap() in
ext4_mark_inode_used() to avoid buffer_head leak.
By the way, remove unneeded goto for invalid ino when inode_bitmap_bh
is NULL.
In the `smk_set_cipso` function, the `skp->smk_netlabel.attr.mls.cat`
field is directly assigned to a new value without using the appropriate
RCU pointer assignment functions. According to RCU usage rules, this is
illegal and can lead to unpredictable behavior, including data
inconsistencies and impossible-to-diagnose memory corruption issues.
This possible bug was identified using a static analysis tool developed
by myself, specifically designed to detect RCU-related issues.
To address this, the assignment is now done using rcu_assign_pointer(),
which ensures that the pointer assignment is done safely, with the
necessary memory barriers and synchronization. This change prevents
potential RCU dereference issues by ensuring that the `cat` field is
safely updated while still adhering to RCU's requirements.
Fixes: 0817534ff9ea ("smackfs: Fix use-after-free in netlbl_catmap_walk()") Signed-off-by: Jiawei Ye <jiawei.ye@foxmail.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 3d56b8d2c74c ("ext4: Speed up FITRIM by recording flags in
ext4_group_info") speed up fstrim by skipping trim trimmed group. We
also has the chance to clear trimmed once there exists some block free
for this group(mount without discard), and the next trim for this group
will work well too.
For mount with discard, we will issue dicard when we free blocks, so
leave trimmed flag keep alive to skip useless trim trigger from
userspace seems reasonable. But for some case like ext4 build on
dm-thinpool(ext4 blocksize 4K, pool blocksize 128K), discard from ext4
maybe unaligned for dm thinpool, and thinpool will just finish this
discard(see process_discard_bio when begein equals to end) without
actually process discard. For this case, trim from userspace can really
help us to free some thinpool block.
So convert to clear trimmed flag for all case no matter mounted with
discard or not.
Fixes: 3d56b8d2c74c ("ext4: Speed up FITRIM by recording flags in ext4_group_info") Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240817085510.2084444-1-yangerkun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
When analyzing a kernel waring message, Peter pointed out that there is a
race condition when the kworker is being frozen and falls into
try_to_freeze() with TASK_INTERRUPTIBLE, which could trigger a
might_sleep() warning in try_to_freeze(). Although the root cause is not
related to freeze()[1], it is still worthy to fix this issue ahead.
In 2018, a dependency on <linux/crc32poly.h> was added to avoid
duplicating the same constant in multiple files. Two months later it was
found to be a bad idea and the definition of CRC32_POLY_LE macro was moved
into xz_private.h to avoid including <linux/crc32poly.h>.
xz_private.h is a wrong place for it too. Revert back to the upstream
version which has the poly in xz_crc32_init() in xz_crc32.c.
Link: https://lkml.kernel.org/r/20240721133633.47721-10-lasse.collin@tukaani.org Fixes: faa16bc404d7 ("lib: Use existing define with polynomial") Fixes: 242cdad873a7 ("lib/xz: Put CRC32_POLY_LE in xz_private.h") Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Reviewed-by: Sam James <sam@gentoo.org> Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Joel Stanley <joel@jms.id.au> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Emil Renner Berthing <emil.renner.berthing@canonical.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Jubin Zhong <zhongjubin@huawei.com> Cc: Jules Maselbas <jmaselbas@zdiv.net> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Rui Li <me@lirui.org> Cc: Simon Glass <sjg@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In case of malformed relocation record of kind BPF_CORE_TYPE_ID_LOCAL
referencing a non-existing BTF type, function bpf_core_calc_relo_insn
would cause a null pointer deference.
Fix this by adding a proper check upper in call stack, as malformed
relocation records could be passed from user space.
Simplest reproducer is a program:
r0 = 0
exit
With a single relocation record:
.insn_off = 0, /* patch first instruction */
.type_id = 100500, /* this type id does not exist */
.access_str_off = 6, /* offset of string "0" */
.kind = BPF_CORE_TYPE_ID_LOCAL,
See the link for original reproducer or next commit for a test case.
Currently, compiling the bpf programs will result the compilation errors
with the cf-protection option as follows in arm64 and loongarch64 machine
when using gcc 12.3.1 and clang 17.0.6. This commit fixes the compilation
errors by limited the cf-protection option only used in x86 platform.
[root@localhost linux]# make M=samples/bpf
......
CLANG-bpf samples/bpf/xdp2skb_meta_kern.o
error: option 'cf-protection=return' cannot be specified on this target
error: option 'cf-protection=branch' cannot be specified on this target
2 errors generated.
CLANG-bpf samples/bpf/syscall_tp_kern.o
error: option 'cf-protection=return' cannot be specified on this target
error: option 'cf-protection=branch' cannot be specified on this target
2 errors generated.
......
Fixes: 34f6e38f58db ("samples/bpf: fix warning with ignored-attributes") Reported-by: Jiangshan Yi <yijiangshan@kylinos.cn> Signed-off-by: Jiangshan Yi <yijiangshan@kylinos.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Qiang Wang <wangqiang1@kylinos.cn> Link: https://lore.kernel.org/bpf/20240815135524.140675-1-13667453960@163.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Linux 5.1 implemented 64-bit time types and related syscalls to address the
Y2038 problem generally across archs. Userspace handling of Y2038 varies
with the libc however. While musl libc uses 64-bit time across all 32-bit
and 64-bit platforms, GNU glibc uses 64-bit time on 64-bit platforms but
defaults to 32-bit time on 32-bit platforms unless they "opt-in" to 64-bit
time or explicitly use 64-bit syscalls and time structures.
One specific area is the standard setsockopt() call, SO_TIMESTAMPNS option
used for timestamping, and the related output 'struct timespec'. GNU glibc
defaults as above, also exposing the SO_TIMESTAMPNS_NEW flag to explicitly
use a 64-bit call and 'struct __kernel_timespec'. Since these are not
exposed or needed with musl libc, their use in tc_redirect.c leads to
compile errors building for mips64el/musl:
tc_redirect.c: In function 'rcv_tstamp':
tc_redirect.c:425:32: error: 'SO_TIMESTAMPNS_NEW' undeclared (first use in this function); did you mean 'SO_TIMESTAMPNS'?
425 | cmsg->cmsg_type == SO_TIMESTAMPNS_NEW)
| ^~~~~~~~~~~~~~~~~~
| SO_TIMESTAMPNS
tc_redirect.c:425:32: note: each undeclared identifier is reported only once for each function it appears in
tc_redirect.c: In function 'test_inet_dtime':
tc_redirect.c:491:49: error: 'SO_TIMESTAMPNS_NEW' undeclared (first use in this function); did you mean 'SO_TIMESTAMPNS'?
491 | err = setsockopt(listen_fd, SOL_SOCKET, SO_TIMESTAMPNS_NEW,
| ^~~~~~~~~~~~~~~~~~
| SO_TIMESTAMPNS
However, using SO_TIMESTAMPNS_NEW isn't strictly needed, nor is Y2038 being
explicitly tested. The timestamp checks in tc_redirect.c are simple: the
packet receive timestamp is non-zero and processed/handled in less than 5
seconds.
Switch to using the standard setsockopt() call and SO_TIMESTAMPNS option to
ensure compatibility across glibc and musl libc. In the worst-case, there
is a 5-second window 14 years from now where tc_redirect tests may fail on
32-bit systems. However, we should reasonably expect glibc to adopt a
64-bit mandate rather than the current "opt-in" policy before the Y2038
roll-over.
Fixes: ce6f6cffaeaa ("selftests/bpf: Wait for the netstamp_needed_key static key to be turned on") Fixes: c803475fd8dd ("bpf: selftests: test skb->tstamp in redirect_neigh") Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/031d656c058b4e55ceae56ef49c4e1729b5090f3.1722244708.git.tony.ambardar@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Include GNU <execinfo.h> header only with glibc and provide weak, stubbed
backtrace functions as a fallback in test_progs.c. This allows for non-GNU
replacements while avoiding compile errors (e.g. with musl libc) like:
test_progs.c:13:10: fatal error: execinfo.h: No such file or directory
13 | #include <execinfo.h> /* backtrace */
| ^~~~~~~~~~~~
test_progs.c: In function 'crash_handler':
test_progs.c:1034:14: error: implicit declaration of function 'backtrace' [-Werror=implicit-function-declaration]
1034 | sz = backtrace(bt, ARRAY_SIZE(bt));
| ^~~~~~~~~
test_progs.c:1045:9: error: implicit declaration of function 'backtrace_symbols_fd' [-Werror=implicit-function-declaration]
1045 | backtrace_symbols_fd(bt, sz, STDERR_FILENO);
| ^~~~~~~~~~~~~~~~~~~~
Compiling lwt_reroute.c with GCC 12.3 for mips64el/musl-libc yields errors:
In file included from .../include/arpa/inet.h:9,
from ./test_progs.h:18,
from tools/testing/selftests/bpf/prog_tests/lwt_helpers.h:11,
from tools/testing/selftests/bpf/prog_tests/lwt_reroute.c:52:
.../include/netinet/in.h:23:8: error: redefinition of 'struct in6_addr'
23 | struct in6_addr {
| ^~~~~~~~
In file included from .../include/linux/icmp.h:24,
from tools/testing/selftests/bpf/prog_tests/lwt_helpers.h:9:
.../include/linux/in6.h:33:8: note: originally defined here
33 | struct in6_addr {
| ^~~~~~~~
.../include/netinet/in.h:34:8: error: redefinition of 'struct sockaddr_in6'
34 | struct sockaddr_in6 {
| ^~~~~~~~~~~~
.../include/linux/in6.h:50:8: note: originally defined here
50 | struct sockaddr_in6 {
| ^~~~~~~~~~~~
.../include/netinet/in.h:42:8: error: redefinition of 'struct ipv6_mreq'
42 | struct ipv6_mreq {
| ^~~~~~~~~
.../include/linux/in6.h:60:8: note: originally defined here
60 | struct ipv6_mreq {
| ^~~~~~~~~
These errors occur because <linux/in6.h> is included before <netinet/in.h>,
bypassing the Linux uapi/libc compat mechanism's partial musl support. As
described in [1] and [2], fix these errors by including <netinet/in.h> in
lwt_reroute.c before any uapi headers.
One possibility is that both lwt_redirect and lwt_reroute create
netns with the same name "ns_lwt" which may cause conflict. I tried
the following example:
$ sudo ip netns add abc
$ echo $?
0
$ sudo ip netns add abc
Cannot create namespace file "/var/run/netns/abc": File exists
$ echo $?
1
$
The return code for above netns_create() is 256. The internet search
suggests that the return value for 'ip netns add ns_lwt' is 1, which
matches the above 'sudo ip netns add abc' example.
This patch tried to use different netns names for two tests to avoid
'ip netns add <name>' failure.
I ran './test_progs -j' 10 times and all succeeded with
lwt_redirect/lwt_reroute tests.
While building, bpftool makes a skeleton from test_core_extern.c, which
itself includes <stdbool.h> and uses the 'bool' type. However, the skeleton
test_core_extern.skel.h generated *does not* include <stdbool.h> or use the
'bool' type, instead using the C-only '_Bool' type. Compiling test_cpp.cpp
with g++ 12.3 for mips64el/musl-libc then fails with error:
In file included from test_cpp.cpp:9:
test_core_extern.skel.h:45:17: error: '_Bool' does not name a type
45 | _Bool CONFIG_BOOL;
| ^~~~~
This was likely missed previously because glibc uses a GNU extension for
<stdbool.h> with C++ (#define _Bool bool), not supported by musl libc.
Normally, a C fragment would include <stdbool.h> and use the 'bool' type,
and thus cleanly work after import by C++. The ideal fix would be for
'bpftool gen skeleton' to output the correct type/include supporting C++,
but in the meantime add a conditional define as above.
Although the post-increment in macro 'CPU_SET(next++, &cpuset)' seems safe,
the sequencing can raise compile errors, so move the increment outside the
macro. This avoids an error seen using gcc 12.3.0 for mips64el/musl-libc:
In file included from test_lru_map.c:11:
test_lru_map.c: In function 'sched_next_online':
test_lru_map.c:129:29: error: operation on 'next' may be undefined [-Werror=sequence-point]
129 | CPU_SET(next++, &cpuset);
| ^
cc1: all warnings being treated as errors
where logic assumes the 'state' var can distinguish between first and
subsequent strtok_r() calls, and adjusts parameters accordingly. However,
'state' is strictly internal context for strtok_r() and no such assumptions
are supported in the man page. Moreover, the exact behaviour of 'state'
depends on the libc implementation, making the above code fragile.
Indeed, invoking "./test_progs -t <test_name>" on mips64el/musl will hang,
with the above code in an infinite loop.
Similarly, we see strange behaviour running 'veristat' on mips64el/musl:
$ ./veristat -e file,prog,verdict,insns -C two-ok add-failure
Can't specify more than 9 stats
Rewrite code using a counter to distinguish between strtok_r() calls.
Remove a redundant include of '<asm/types.h>', whose needed definitions are
already included (via '<linux/types.h>') in cg_storage_multi_egress_only.c,
cg_storage_multi_isolated.c, and cg_storage_multi_shared.c. This avoids
redefinition errors seen compiling for mips64el/musl-libc like:
In file included from progs/cg_storage_multi_egress_only.c:13:
In file included from progs/cg_storage_multi.h:6:
In file included from /usr/mips64el-linux-gnuabi64/include/asm/types.h:23:
/usr/include/asm-generic/int-l64.h:29:25: error: typedef redefinition with different types ('long' vs 'long long')
29 | typedef __signed__ long __s64;
| ^
/usr/include/asm-generic/int-ll64.h:30:44: note: previous definition is here
30 | __extension__ typedef __signed__ long long __s64;
| ^
Remove a redundant include of '<linux/in6.h>', whose needed definitions are
already provided by 'test_progs.h'. This avoids errors seen compiling for
mips64el/musl-libc:
In file included from .../arpa/inet.h:9,
from ./test_progs.h:17,
from prog_tests/decap_sanity.c:9:
.../netinet/in.h:23:8: error: redefinition of 'struct in6_addr'
23 | struct in6_addr {
| ^~~~~~~~
In file included from decap_sanity.c:7:
.../linux/in6.h:33:8: note: originally defined here
33 | struct in6_addr {
| ^~~~~~~~
.../netinet/in.h:34:8: error: redefinition of 'struct sockaddr_in6'
34 | struct sockaddr_in6 {
| ^~~~~~~~~~~~
.../linux/in6.h:50:8: note: originally defined here
50 | struct sockaddr_in6 {
| ^~~~~~~~~~~~
.../netinet/in.h:42:8: error: redefinition of 'struct ipv6_mreq'
42 | struct ipv6_mreq {
| ^~~~~~~~~
.../linux/in6.h:60:8: note: originally defined here
60 | struct ipv6_mreq {
| ^~~~~~~~~
Remove a redundant include of '<linux/icmp.h>' which is already provided in
'lwt_helpers.h'. This avoids errors seen compiling for mips64el/musl-libc:
In file included from .../arpa/inet.h:9,
from lwt_redirect.c:51:
.../netinet/in.h:23:8: error: redefinition of 'struct in6_addr'
23 | struct in6_addr {
| ^~~~~~~~
In file included from .../linux/icmp.h:24,
from lwt_redirect.c:50:
.../linux/in6.h:33:8: note: originally defined here
33 | struct in6_addr {
| ^~~~~~~~
.../netinet/in.h:34:8: error: redefinition of 'struct sockaddr_in6'
34 | struct sockaddr_in6 {
| ^~~~~~~~~~~~
.../linux/in6.h:50:8: note: originally defined here
50 | struct sockaddr_in6 {
| ^~~~~~~~~~~~
.../netinet/in.h:42:8: error: redefinition of 'struct ipv6_mreq'
42 | struct ipv6_mreq {
| ^~~~~~~~~
.../linux/in6.h:60:8: note: originally defined here
60 | struct ipv6_mreq {
| ^~~~~~~~~
The type 'loff_t' is a GNU extension and not exposed by the musl 'fcntl.h'
header unless _GNU_SOURCE is defined. Add this definition to fix errors
seen compiling for mips64el/musl-libc:
In file included from tools/testing/selftests/bpf/prog_tests/core_reloc.c:4:
./bpf_testmod/bpf_testmod.h:10:9: error: unknown type name 'loff_t'
10 | loff_t off;
| ^~~~~~
./bpf_testmod/bpf_testmod.h:16:9: error: unknown type name 'loff_t'
16 | loff_t off;
| ^~~~~~
The GNU version of 'struct tcp_info' in 'netinet/tcp.h' is not exposed by
musl headers unless _GNU_SOURCE is defined.
Add this definition to fix errors seen compiling for mips64el/musl-libc:
tcp_rtt.c: In function 'wait_for_ack':
tcp_rtt.c:24:25: error: storage size of 'info' isn't known
24 | struct tcp_info info;
| ^~~~
tcp_rtt.c:24:25: error: unused variable 'info' [-Werror=unused-variable]
cc1: all warnings being treated as errors
The GNU version of 'struct tcphdr' has members 'doff', 'source' and 'dest',
which are not exposed by musl libc headers unless _GNU_SOURCE is defined.
Add this definition to fix errors seen compiling for mips64el/musl-libc:
flow_dissector.c:118:30: error: 'struct tcphdr' has no member named 'doff'
118 | .tcp.doff = 5,
| ^~~~
flow_dissector.c:119:30: error: 'struct tcphdr' has no member named 'source'
119 | .tcp.source = 80,
| ^~~~~~
flow_dissector.c:120:30: error: 'struct tcphdr' has no member named 'dest'
120 | .tcp.dest = 8080,
| ^~~~
The GNU version of 'struct tcphdr' with member 'doff' is not exposed by
musl headers unless _GNU_SOURCE is defined. Add this definition to fix
errors seen compiling for mips64el/musl-libc:
In file included from kfree_skb.c:2:
kfree_skb.c: In function 'on_sample':
kfree_skb.c:45:30: error: 'struct tcphdr' has no member named 'doff'
45 | if (CHECK(pkt_v6->tcp.doff != 5, "check_tcp",
| ^
The GNU version of 'struct tcphdr', with members 'doff' and 'urg_ptr', is
not exposed by musl headers unless _GNU_SOURCE is defined.
Add this definition to fix errors seen compiling for mips64el/musl-libc:
parse_tcp_hdr_opt.c:18:21: error: 'struct tcphdr' has no member named 'urg_ptr'
18 | .pk6_v6.tcp.urg_ptr = 123,
| ^~~~~~~
parse_tcp_hdr_opt.c:19:21: error: 'struct tcphdr' has no member named 'doff'
19 | .pk6_v6.tcp.doff = 9, /* 16 bytes of options */
| ^~~~
Update ns_current_pid_tgid.c to use '#include <fcntl.h>' and avoid compile
error against mips64el/musl libc:
In file included from .../prog_tests/ns_current_pid_tgid.c:14:
.../include/sys/fcntl.h:1:2: error: #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>
| ^~~~~~~
cc1: all warnings being treated as errors
Refactor some functions in both user space code and bpf program
as these functions are used by later cgroup/sk_msg tests.
Another change is to mark tp program optional loading as later
patches will use optional loading as well since they have quite
different attachment and testing logic.
Replace CHECK in selftest ns_current_pid_tgid with recommended ASSERT_* style.
I also shortened subtest name as the prefix of subtest name is covered
by the test name already.
This patch does fix a testing issue. Currently even if bss->user_{pid,tgid}
is not correct, the test still passed since the clone func returns 0.
I fixed it to return a non-zero value if bss->user_{pid,tgid} is incorrect.
Explicitly include '<linux/build_bug.h>' to fix errors seen compiling with
gcc targeting mips64el/musl-libc:
user_ringbuf.c: In function 'test_user_ringbuf_loop':
user_ringbuf.c:426:9: error: implicit declaration of function 'BUILD_BUG_ON' [-Werror=implicit-function-declaration]
426 | BUILD_BUG_ON(total_samples <= c_max_entries);
| ^~~~~~~~~~~~
cc1: all warnings being treated as errors
Include <limits.h> in 'bench.h' to provide a UINT_MAX definition and avoid
multiple compile errors against mips64el/musl-libc like:
benchs/bench_local_storage.c: In function 'parse_arg':
benchs/bench_local_storage.c:40:38: error: 'UINT_MAX' undeclared (first use in this function)
40 | if (ret < 1 || ret > UINT_MAX) {
| ^~~~~~~~
benchs/bench_local_storage.c:11:1: note: 'UINT_MAX' is defined in header '<limits.h>'; did you forget to '#include <limits.h>'?
10 | #include <test_btf.h>
+++ |+#include <limits.h>
11 |
seen with bench_local_storage.c, bench_local_storage_rcu_tasks_trace.c, and
bench_bpf_hashmap_lookup.c.
Add a "bpf_util.h" include to avoid the following error seen compiling for
mips64el with musl libc:
bench.c: In function 'find_benchmark':
bench.c:590:25: error: implicit declaration of function 'ARRAY_SIZE' [-Werror=implicit-function-declaration]
590 | for (i = 0; i < ARRAY_SIZE(benchs); i++) {
| ^~~~~~~~~~
cc1: all warnings being treated as errors
The addition of general support for unprivileged tests in test_loader.c
breaks building test_verifier on non-glibc (e.g. musl) systems, due to the
inclusion of glibc extension '<error.h>' in 'unpriv_helpers.c'. However,
the header is actually not needed, so remove it to restore building.
Similarly for sk_lookup.c and flow_dissector.c, error.h is not necessary
and causes problems, so drop them.
Fixes: 1d56ade032a4 ("selftests/bpf: Unprivileged tests for test_loader.c") Fixes: 0ab5539f8584 ("selftests/bpf: Tests for BPF_SK_LOOKUP attach point") Fixes: 0905beec9f52 ("selftests/bpf: run flow dissector tests in skb-less mode") Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/5664367edf5fea4f3f4b4aec3b182bcfc6edff9c.1721713597.git.tony.ambardar@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Introduce a new function called get_hw_size that retrieves both the
current and maximum size of the interface and stores this information
in the 'ethtool_ringparam' structure.
Remove ethtool_channels struct from xdp_hw_metadata.c due to redefinition
error. Remove unused linux/if.h include from flow_dissector BPF test to
address CI pipeline failure.
Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20240402114529.545475-4-tushar.vyavahare@intel.com
Stable-dep-of: 69f409469c9b ("selftests/bpf: Drop unneeded error.h includes") Signed-off-by: Sasha Levin <sashal@kernel.org>
Existing code calls getsockname() with a 'struct sockaddr_in6 *' argument
where a 'struct sockaddr *' argument is declared, yielding compile errors
when building for mips64el/musl-libc:
bpf_iter_setsockopt.c: In function 'get_local_port':
bpf_iter_setsockopt.c:98:30: error: passing argument 2 of 'getsockname' from incompatible pointer type [-Werror=incompatible-pointer-types]
98 | if (!getsockname(fd, &addr, &addrlen))
| ^~~~~
| |
| struct sockaddr_in6 *
In file included from .../netinet/in.h:10,
from .../arpa/inet.h:9,
from ./test_progs.h:17,
from bpf_iter_setsockopt.c:5:
.../sys/socket.h:391:23: note: expected 'struct sockaddr * restrict' but argument is of type 'struct sockaddr_in6 *'
391 | int getsockname (int, struct sockaddr *__restrict, socklen_t *__restrict);
| ^
cc1: all warnings being treated as errors
This compiled under glibc only because the argument is declared to be a
"funky" transparent union which includes both types above. Explicitly cast
the argument to allow compiling for both musl and glibc.
Cast 'rlim_t' argument to match expected type of printf() format and avoid
compile errors seen building for mips64el/musl-libc:
In file included from map_tests/sk_storage_map.c:20:
map_tests/sk_storage_map.c: In function 'test_sk_storage_map_stress_free':
map_tests/sk_storage_map.c:414:56: error: format '%lu' expects argument of type 'long unsigned int', but argument 2 has type 'rlim_t' {aka 'long long unsigned int'} [-Werror=format=]
414 | CHECK(err, "setrlimit(RLIMIT_NOFILE)", "rlim_new:%lu errno:%d",
| ^~~~~~~~~~~~~~~~~~~~~~~
415 | rlim_new.rlim_cur, errno);
| ~~~~~~~~~~~~~~~~~
| |
| rlim_t {aka long long unsigned int}
./test_maps.h:12:24: note: in definition of macro 'CHECK'
12 | printf(format); \
| ^~~~~~
map_tests/sk_storage_map.c:414:68: note: format string is defined here
414 | CHECK(err, "setrlimit(RLIMIT_NOFILE)", "rlim_new:%lu errno:%d",
| ~~^
| |
| long unsigned int
| %llu
cc1: all warnings being treated as errors
Use pid_t rather than __pid_t when allocating memory for 'worker_pids' in
'struct test_env', as this is its declared type and also avoids compile
errors seen building against musl libc on mipsel64:
test_progs.c:1738:49: error: '__pid_t' undeclared (first use in this function); did you mean 'pid_t'?
1738 | env.worker_pids = calloc(sizeof(__pid_t), env.workers);
| ^~~~~~~
| pid_t
test_progs.c:1738:49: note: each undeclared identifier is reported only once for each function it appears in
Actually use previously defined LDFLAGS during build and add support for
LDLIBS to link extra standalone libraries e.g. 'argp' which is not provided
by musl libc.
Make log output incorrectly shows 'test_maps' as the binary name for every
'CLNG-BPF' build step, apparently picking up the last value defined for the
$(TRUNNER_BINARY) variable. Update the 'CLANG_BPF_BUILD_RULE' variants to
fix this confusing output.
Fixes: a5d0c26a2784 ("selftests/bpf: Add a cpuv4 test runner for cpu=v4 testing") Fixes: 89ad7420b25c ("selftests/bpf: Drop the need for LLVM's llc") Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20240720052535.2185967-1-tony.ambardar@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
The vmlinux.h file generated by bpftool makes use of compiler pragmas
in order to install the CO-RE preserve_access_index in all the struct
types derived from the BTF info:
[... type definitions generated from kernel BTF ... ]
#ifndef BPF_NO_PRESERVE_ACCESS_INDEX
#pragma clang attribute pop
#endif
The `clang attribute push/pop' pragmas are specific to clang/llvm and
are not supported by GCC.
At the moment the BTF dumping services in libbpf do not support
dicriminating between types dumped because they are directly referred
and types dumped because they are dependencies. A suitable API is
being worked now. See [1] and [2].
In the interim, this patch changes the selftests/bpf Makefile so it
passes -DBPF_NO_PRESERVE_ACCESS_INDEX to GCC when it builds the
selftests. This workaround is temporary, and may have an impact on
the results of the GCC-built tests.
This patch modifies selftests/bpf/Makefile to pass -Wno-attributes to
GCC. This is because of the following attributes which are ignored:
- btf_decl_tag
- btf_type_tag
There are many of these. At the moment none of these are
recognized/handled by gcc-bpf.
We are aware that btf_decl_tag is necessary for some of the
selftest harness to communicate test failure/success. Support for
it is in progress in GCC upstream:
However, the GCC master branch is not yet open, so the series
above (currently under review upstream) wont be able to make it
there until 14.1 gets released, probably mid next week.
As for btf_type_tag, more extensive work will be needed in GCC
upstream to support it in both BTF and DWARF. We have a WIP big
patch for that, but that is not needed to compile/build the
selftests.
- used
There are SEC macros defined in the selftests as:
#define SEC(N) __attribute__((section(N),used))
The SEC macro is used for both functions and global variables.
According to the GCC documentation `used' attribute is really only
meaningful for functions, and it warns when the attribute is used
for other global objects, like for example ctl_array in
test_xdp_noinline.c.
GCC warns that it is ignoring this attribute, because it is not
implemented by GCC.
I think ignoring this attribute in GCC is benign, because according
to the clang documentation [1] its purpose seems to be merely
declarative and doesn't seem to translate into extra checks at
run-time, only to perhaps better optimized code ("runtime behavior
is undefined if the pointed memory object is not aligned to the
specified alignment").
Certain BPF selftests contain code that, albeit being legal C, trigger
warnings in GCC that cannot be disabled. This is the case for example
for the tests
which contain struct type declarations inside function parameter
lists. This is problematic, because:
- The BPF selftests are built with -Werror.
- The Clang and GCC compilers sometimes differ when it comes to handle
warnings. in the handling of warnings. One compiler may emit
warnings for code that the other compiles compiles silently, and one
compiler may offer the possibility to disable certain warnings, while
the other doesn't.
In order to overcome this problem, this patch modifies the
tools/testing/selftests/bpf/Makefile in order to:
1. Enable the possibility of specifing per-source-file extra CFLAGS.
This is done by defining a make variable like:
<source-filename>-CFLAGS := <whateverflags>
And then modifying the proper Make rule in order to use these flags
when compiling <source-filename>.
2. Use the mechanism above to add -Wno-error to CFLAGS for the
following selftests:
Note the corresponding -CFLAGS variables for these files are
defined only if the selftests are being built with GCC.
Note that, while compiler pragmas can generally be used to disable
particular warnings per file, this 1) is only possible for warning
that actually can be disabled in the command line, i.e. that have
-Wno-FOO options, and 2) doesn't apply to -Wno-error.
Linking uprobe_multi.c on mips64el fails due to relocation overflows, when
the GOT entries required exceeds the default maximum. Add a specific CFLAGS
(-mxgot) for uprobe_multi.c on MIPS that allows using a larger GOT and
avoids errors such as:
/tmp/ccBTNQzv.o: in function `bench':
uprobe_multi.c:49:(.text+0x1d7720): relocation truncated to fit: R_MIPS_GOT_DISP against `uprobe_multi_func_08188'
uprobe_multi.c:49:(.text+0x1d7730): relocation truncated to fit: R_MIPS_GOT_DISP against `uprobe_multi_func_08189'
...
collect2: error: ld returned 1 exit status
According to the cgroup hierarchy, A should preempt B. But current
check_preempt_wakeup_fair() treats cgroup se and task separately, so B
will preempt A unexpectedly.
Unify the wakeup logic by {c,p}se_is_idle only. This makes SCHED_IDLE of
a task a relative policy that is effective only within its own cgroup,
similar to the behavior of NICE.
Also fix se_is_idle() definition when !CONFIG_FAIR_GROUP_SCHED.
Fixes: 304000390f88 ("sched: Cgroup SCHED_IDLE support") Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Josh Don <joshdon@google.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20240626023505.1332596-1-dtcccc@linux.alibaba.com Signed-off-by: Sasha Levin <sashal@kernel.org>
tpm_dev_transmit prepares the TPM space before attempting command
transmission. However if the command fails no rollback of this
preparation is done. This can result in transient handles being leaked
if the device is subsequently closed with no further commands performed.
Fix this by flushing the space in the event of command transmission
failure.
Fixes: 745b361e989a ("tpm: infrastructure for TPM spaces") Signed-off-by: Jonathan McDowell <noodles@meta.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The allocated size in xen_swiotlb_alloc_coherent() and
xen_swiotlb_free_coherent() is calculated wrong for the case of
XEN_PAGE_SIZE not matching PAGE_SIZE. Fix that.
Fixes: 7250f422da04 ("xen-swiotlb: use actually allocated size on check physical continuous") Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When checking a memory buffer to be consecutive in machine memory,
the alignment needs to be checked, too. Failing to do so might result
in DMA memory not being aligned according to its requested size,
leading to error messages like:
4xxx 0000:2b:00.0: enabling device (0140 -> 0142)
4xxx 0000:2b:00.0: Ring address not aligned
4xxx 0000:2b:00.0: Failed to initialise service qat_crypto
4xxx 0000:2b:00.0: Resetting device qat_dev0
4xxx: probe of 0000:2b:00.0 failed with error -14
In order to minimize required special handling for running as Xen PV
dom0, the memory layout is modified to match that of the host. This
requires to have only RAM at the locations where Xen allocated memory
is living. Unfortunately there seem to be some machines, where ACPI
NVS is located at 64 MB, resulting in a conflict with the loaded
kernel or the initial page tables built by Xen.
Avoid this conflict by swapping the ACPI NVS area in the memory map
with unused RAM. This is possible via modification of the dom0 P2M map.
Accesses to the ACPI NVS area are done either for saving and restoring
it across suspend operations (this will work the same way as before),
or by ACPI code when NVS memory is referenced from other ACPI tables.
The latter case is handled by a Xen specific indirection of
acpi_os_ioremap().
While the E820 map can (and should) be modified right away, the P2M
map can be updated only after memory allocation is working, as the P2M
map might need to be extended.
Fixes: 808fdb71936c ("xen: check for kernel memory conflicting with memory layout") Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When running as a Xen PV dom0 it can happen that the kernel is being
loaded to a guest physical address conflicting with the host memory
map.
In order to be able to resolve this conflict, add the capability to
remap non-RAM areas to different guest PFNs. A function to use this
remapping information for other purposes than doing the remap will be
added when needed.
As the number of conflicts should be rather low (currently only
machines with max. 1 conflict are known), save the remap data in a
small statically allocated array.
Instead of having max_pfn as a local variable of xen_memory_setup(),
make it a static variable in setup.c instead. This avoids having to
pass it to subfunctions, which will be needed in more cases in future.
Rename it to ini_nr_pages, as the value denotes the currently usable
number of memory pages as passed from the hypervisor at boot time.
When booting as a Xen PV dom0 the memory layout of the dom0 is
modified to match that of the host, as this requires less changes in
the kernel for supporting Xen.
There are some cases, though, which are problematic, as it is the Xen
hypervisor selecting the kernel's load address plus some other data,
which might conflict with the host's memory map.
These conflicts are detected at boot time and result in a boot error.
In order to support handling at least some of these conflicts in
future, introduce a generic helper function which will later gain the
ability to adapt the memory layout when possible.
Add the missing check for the xen_start_info area.
Note that possible p2m map and initrd memory conflicts are handled
already by copying the data to memory areas not conflicting with the
memory map. The initial stack allocated by Xen doesn't need to be
checked, as early boot code is switching to the statically allocated
initial kernel stack. Initial page tables and the kernel itself will
be handled later.
We have some very fancy min/max macros that have tons of sanity checking
to warn about mixed signedness etc.
This is all things that a sane compiler should warn about, but there are
no sane compiler interfaces for this, and '-Wsign-compare' is broken [1]
and not useful.
So then we compensate (some would say over-compensate) by doing the
checks manually with some truly horrid macro games.
And no, we can't just use __builtin_types_compatible_p(), because the
whole question of "does it make sense to compare these two values" is a
lot more complicated than that.
For example, it makes a ton of sense to compare unsigned values with
simple constants like "5", even if that is indeed a signed type. So we
have these very strange macros to try to make sensible type checking
decisions on the arguments to 'min()' and 'max()'.
But that can cause enormous code expansion if the min()/max() macros are
used with complicated expressions, and particularly if you nest these
things so that you get the first big expansion then expanded again.
The xen setup.c file ended up ballooning to over 50MB of preprocessed
noise that takes 15s to compile (obviously depending on the build host),
largely due to one single line.
So let's split that one single line to just be simpler. I think it ends
up being more legible to humans too at the same time. Now that single
file compiles in under a second.
When ata_qc_complete() schedules a command for EH using
ata_qc_schedule_eh(), blk_abort_request() will be called, which leads to
req->q->mq_ops->timeout() / scsi_timeout() being called.
scsi_timeout(), if the LLDD has no abort handler (libata has no abort
handler), will set host byte to DID_TIME_OUT, and then call
scsi_eh_scmd_add() to add the command to EH.
Thus, when commands first enter libata's EH strategy_handler, all the
commands that have been added to EH will have DID_TIME_OUT set.
libata has its own flag (AC_ERR_TIMEOUT), that it sets for commands that
have not received a completion at the time of entering EH.
Thus, libata doesn't really care about DID_TIME_OUT at all, and currently
clears the host byte at the end of EH, in ata_scsi_qc_complete(), before
scsi_eh_finish_cmd() is called.
However, this clearing in ata_scsi_qc_complete() is currently only done
for commands that are not ATA passthrough commands.
Since the host byte is visible in the completion that we return to user
space for ATA passthrough commands, for ATA passthrough commands that got
completed via EH (commands with sense data), the user will incorrectly see:
ATA pass-through(16): transport error: Host_status=0x03 [DID_TIME_OUT]
Fix this by moving the clearing of the host byte (which is currently only
done for commands that are not ATA passthrough commands) from
ata_scsi_qc_complete() to the start of EH (regardless if the command is
ATA passthrough or not).
While at it, use the proper helper function to clear the host byte, rather
than open coding the clearing.
This will make sure that we:
-Correctly clear DID_TIME_OUT for both ATA passthrough commands and
commands that are not ATA passthrough commands.
-Do not needlessly clear the host byte for commands that did not go via EH.
ata_scsi_qc_complete() is called both for commands that are completed
normally (without going via EH), and for commands that went via EH,
however, only commands that went via EH will have DID_TIME_OUT set.
Fixes: 24aeebbf8ea9 ("scsi: ata: libata: Change ata_eh_request_sense() to not set CHECK_CONDITION") Reported-by: Igor Pylypiv <ipylypiv@google.com> Closes: https://lore.kernel.org/linux-ide/ZttIN8He8TOZ7Lct@google.com/ Signed-off-by: Niklas Cassel <cassel@kernel.org> Tested-by: Igor Pylypiv <ipylypiv@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The driver currently assumes that the first sequence number it will see
is going to be 0. This is not a realiable assumption and can break if,
for example, the tablet has already been running for some time prior to
the kernel driver connecting to the device. This commit initializes the
expected sequence number to -1 and will only print the "Dropped" warning
the it has been updated to a non-negative value.
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com> Tested-by: Joshua Dickens <joshua.dickens@wacom.com> Fixes: 6d09085b38e5 ("HID: wacom: Adding Support for new usages") Signed-off-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>