By now all the functions fun_xdp_tx() calls are shared with the skb path
and thus are fragment-capable. Update fun_xdp_tx(), that up to now has
been passing just one buffer, to check for fragments and call
accordingly. This makes XDP_TX and ndo_xdp_xmit fragment-capable.
Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of passing an skb to the mapping function pass an
skb_shared_info plus an additional address/length pair. This makes it
usable for both skbs and XDP. Call it from the XDP path and adjust the
skb path.
Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Current XDP unmapping is a subset of its skb analog, dealing with
only one buffer. In preparation for multi-frag XDP rename the skb
function and use it also for XDP. The XDP version is removed.
Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Bonzini [Mon, 1 Aug 2022 11:27:18 +0000 (07:27 -0400)]
KVM: x86/mmu: remove unused variable
The last use of 'pfn' went away with the same-named argument to
host_pfn_mapping_level; now that the hugepage level is obtained
exclusively from the host page tables, kvm_mmu_zap_collapsible_spte
does not need to know host pfns at all.
Fixes: a8ac499bb6ab ("KVM: x86/mmu: Don't require refcounted "struct page" to create huge SPTEs") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
net: devlink: enable parallel ops on netlink interface
As the devlink_mutex was removed and all devlink instances are protected
individually by devlink->lock mutex, allow the netlink ops to run
in parallel and therefore allow user to execute commands on multiple
devlink instances simultaneously.
Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
All accesses to devlink structure from userspace and drivers are locked
with devlink->lock instance mutex. Also, devlinks xa_array iteration is
taken care of by iteration helpers taking devlink reference.
Therefore, remove devlink_mutex as it is no longer needed.
Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: devlink: convert reload command to take implicit devlink->lock
Convert reload command to behave the same way as the rest of the
commands and let if be called with devlink->lock held. Remove the
temporary devl_lock taking from drivers. As the DEVLINK_NL_FLAG_NO_LOCK
flag is no longer used, remove it alongside.
Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: devlink: introduce "unregistering" mark and use it during devlinks iteration
Add new mark called "unregistering" to be set at the beginning of
devlink_unregister() function. Check this mark during devlinks
iteration in order to prevent getting a reference of devlink which is
being currently unregistered.
Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
udp: Remove redundant __udp_sysctl_init() call from udp_init().
__udp_sysctl_init() is called for init_net via udp_sysctl_ops.
While at it, we can rename __udp_sysctl_init() to udp_sysctl_init().
Fixes: 1e8029515816 ("udp: Move the udp sysctl to namespace.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
this is a pull request of 36 patches for net-next/master.
The 1st patch is by me and fixes a typo in the mcp251xfd driver.
Vincent Mailhol contributes a series of 9 patches, which clean up the
drivers to make use of KBUILD_MODNAME instead of hard coded names and
remove DRV_VERSION.
Followed by 3 patches by Vincent Mailhol that directly set the
ethtool_ops in instead of calling a function in the slcan, c_can and
flexcan driver.
Vincent Mailhol contributes a KBUILD_MODNAME and pr_fmt cleanup patch
for the slcan driver. Dario Binacchi contributes 6 patches to clean up
the driver and remove the legacy driver infrastructure.
The next 14 patches are by Vincent Mailhol and target the various
drivers, they add ethtool support and reporting of timestamping
capabilities.
Another patch by Vincent Mailhol for the etas_es58x driver to remove
useless calls to usb_fill_bulk_urb().
The last patch is by Christophe JAILLET and fixes a broken link to
Documentation in the can327 driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Yingliang [Fri, 29 Jul 2022 07:49:35 +0000 (15:49 +0800)]
cifs: fix wrong unlock before return from cifs_tree_connect()
It should unlock 'tcon->tc_lock' before return from cifs_tree_connect().
Fixes: fe67bd563ec2 ("cifs: avoid use of global locks for high contention data") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
cifs: avoid use of global locks for high contention data
During analysis of multichannel perf, it was seen that
the global locks cifs_tcp_ses_lock and GlobalMid_Lock, which
were shared between various data structures were causing a
lot of contention points.
With this change, we're breaking down the use of these locks
by introducing new locks at more granular levels. i.e.
server->srv_lock, ses->ses_lock and tcon->tc_lock to protect
the unprotected fields of server, session and tcon structs;
and server->mid_lock to protect mid related lists and entries
at server level.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
If we hit the 'index == next_cached' case, we leak a refcount on the
struct page. Fix this by using readahead_folio() which takes care of
the refcount for you.
Fixes: 0174ee9947bd ("cifs: Implement cache I/O by accessing the cache directly") Cc: David Howells <dhowells@redhat.com> Cc: Jeff Layton <jlayton@kernel.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Colin Ian King [Tue, 28 Jun 2022 21:32:29 +0000 (22:32 +0100)]
cifs: remove redundant initialization to variable mnt_sign_enabled
Variable mnt_sign_enabled is being initialized with a value that
is never read, it is being reassigned later on with a different
value. The initialization is redundant and can be removed.
Cleans up clang scan-build warning:
fs/cifs/cifssmb.c:465:7: warning: Value stored to 'mnt_sign_enabled
during its initialization is never read
Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Steve French [Tue, 12 Jul 2022 16:43:44 +0000 (11:43 -0500)]
smb3: check xattr value length earlier
Coverity complains about assigning a pointer based on
value length before checking that value length goes
beyond the end of the SMB. Although this is even more
unlikely as value length is a single byte, and the
pointer is not dereferenced until laterm, it is clearer
to check the lengths first.
Addresses-Coverity: 1467704 ("Speculative execution data leak") Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Juergen Gross [Wed, 22 Jun 2022 06:38:38 +0000 (08:38 +0200)]
xen: don't require virtio with grants for non-PV guests
Commit fa1f57421e0b ("xen/virtio: Enable restricted memory access using
Xen grant mappings") introduced a new requirement for using virtio
devices: the backend now needs to support the VIRTIO_F_ACCESS_PLATFORM
feature.
This is an undue requirement for non-PV guests, as those can be operated
with existing backends without any problem, as long as those backends
are running in dom0.
Per default allow virtio devices without grant support for non-PV
guests.
On Arm require VIRTIO_F_ACCESS_PLATFORM for devices having been listed
in the device tree to use grants.
Add a new config item to always force use of grants for virtio.
Juergen Gross [Wed, 22 Jun 2022 06:38:36 +0000 (08:38 +0200)]
virtio: replace restricted mem access flag with callback
Instead of having a global flag to require restricted memory access
for all virtio devices, introduce a callback which can select that
requirement on a per-device basis.
For convenience add a common function returning always true, which can
be used for use cases like SEV.
Per default use a callback always returning false.
As the callback needs to be set in early init code already, add a
virtio anchor which is builtin in case virtio is enabled.
Ross Lagerwall [Mon, 27 Jun 2022 14:28:22 +0000 (15:28 +0100)]
xen/manage: Use orderly_reboot() to reboot
Currently when the toolstack issues a reboot, it gets translated into a
call to ctrl_alt_del(). But tying reboot to ctrl-alt-del means rebooting
may fail if e.g. the user has masked the ctrl-alt-del.target under
systemd.
A previous attempt to fix this issue made a change that sets the
kernel.ctrl-alt-del sysctl to 1 before ctrl_alt_del() is called.
However, this doesn't give userspace the opportunity to block rebooting
or even do any cleanup or syncing.
Instead, call orderly_reboot() which will call the "reboot" command,
giving userspace the opportunity to block it or perform the usual reboot
process while being independent of the ctrl-alt-del behaviour. It also
matches what happens in the shutdown case.
Hyunchul Lee [Thu, 28 Jul 2022 14:41:51 +0000 (23:41 +0900)]
ksmbd: prevent out of bound read for SMB2_WRITE
OOB read memory can be written to a file,
if DataOffset is 0 and Length is too large
in SMB2_WRITE request of compound request.
To prevent this, when checking the length of
the data area of SMB2_WRITE in smb2_get_data_area_len(),
let the minimum of DataOffset be the size of
SMB2 header + the size of SMB2_WRITE header.
ksmbd: fix racy issue while destroying session on multichannel
After multi-channel connection with windows, Several channels of
session are connected. Among them, if there is a problem in one channel,
Windows connects again after disconnecting the channel. In this process,
the session is released and a kernel oop can occurs while processing
requests to other channels. When the channel is disconnected, if other
channels still exist in the session after deleting the channel from
the channel list in the session, the session should not be released.
Finally, the session will be released after all channels are disconnected.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
ksmbd: use wait_event instead of schedule_timeout()
ksmbd threads eating masses of cputime when connection is disconnected.
If connection is disconnected, ksmbd thread waits for pending requests
to be processed using schedule_timeout. schedule_timeout() incorrectly
is used, and it is more efficient to use wait_event/wake_up than to check
r_count every time with timeout.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Guo Ren [Mon, 1 Aug 2022 02:34:24 +0000 (22:34 -0400)]
csky: abiv1: Fixup compile error
LD vmlinux.o
arch/csky/lib/string.o: In function `memmove':
string.c:(.text+0x108): multiple definition of `memmove'
lib/string.o:string.c:(.text+0x7e8): first defined here
arch/csky/lib/string.o: In function `memset':
string.c:(.text+0x148): multiple definition of `memset'
lib/string.o:string.c:(.text+0x2ac): first defined here
scripts/Makefile.vmlinux_o:68: recipe for target 'vmlinux.o' failed
make[4]: *** [vmlinux.o] Error 1
Fixes: e4df2d5e852a ("csky: Add C based string functions") Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Cc: <stable@vger.kernel.org>
exfat: Drop superfluous new line for error messages
exfat_err() adds the new line at the end of the message by itself,
hence the passed string shouldn't contain a new line. Drop the
superfluous newline letters in the error messages in a few places that
have been put mistakenly.
exfat: Downgrade ENAMETOOLONG error message to debug messages
The ENAMETOOLONG error message is printed at each time when user tries
to operate with a too long name, and this can flood the kernel logs
easily, as every user can trigger this. Let's downgrade this error
message level to a debug message for suppressing the superfluous
logs.
exfat: Expand exfat_err() and co directly to pr_*() macro
Currently the error and info messages handled by exfat_err() and co
are tossed to exfat_msg() function that does nothing but passes the
strings with printk() invocation. Not only that this is more overhead
by the indirect calls, but also this makes harder to extend for the
debug print usage; because of the direct printk() call, you cannot
make it for dynamic debug or without debug like the standard helpers
such as pr_debug() or dev_dbg().
For addressing the problem, this patch replaces exfat_*() macro to
expand to pr_*() directly. Along with it, add the new exfat_debug()
macro that is expanded to pr_debug() (which output can be gracefully
suppressed via dyndbg).
NLS_NAME_* are bit flags although they are currently defined as enum;
it's casually working so far (from 0 to 2), but it's error-prone and
may bring a problem when we want to add more flag.
This patch changes the definitions of NLS_NAME_* explicitly being bit
flags.
exfat: Return ENAMETOOLONG consistently for oversized paths
LTP has a test for oversized file path renames and it expects the
return value to be ENAMETOOLONG. However, exfat returns EINVAL
unexpectedly in some cases, hence LTP test fails. The further
investigation indicated that the problem happens only when iocharset
isn't set to utf8.
The difference comes from that, in the case of utf8,
exfat_utf8_to_utf16() returns the error -ENAMETOOLONG directly and
it's treated as the final error code. Meanwhile, on other iocharsets,
exfat_nls_to_ucs2() returns the max path size but it sets
NLS_NAME_OVERLEN to lossy flag instead; the caller side checks only
whether lossy flag is set or not, resulting in always -EINVAL
unconditionally.
This patch aligns the return code for both cases by checking the lossy
flag bit and returning ENAMETOOLONG when NLS_NAME_OVERLEN bit is set.
Yuezhang Mo [Wed, 29 Jun 2022 03:05:51 +0000 (11:05 +0800)]
exfat: remove duplicate write inode for extending dir/file
Since the timestamps need to be updated, the directory entries
will be updated by mark_inode_dirty() whether or not a new
cluster is allocated for the file or directory, so there is no
need to use __exfat_write_inode() to update the directory entries
when allocating a new cluster for a file or directory.
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Yuezhang Mo [Wed, 29 Jun 2022 02:39:59 +0000 (10:39 +0800)]
exfat: remove duplicate write inode for truncating file
This commit moves updating file attributes and timestamps before
calling __exfat_write_inode(), so that all updates of the inode
had been written by __exfat_write_inode(), mark_inode_dirty() is
unneeded.
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Andreas Schwab [Tue, 26 Jul 2022 08:01:22 +0000 (10:01 +0200)]
rtla: Define syscall numbers for riscv
RISC-V uses the same (generic) syscall numbers as ARM64.
Link: https://lkml.kernel.org/r/mvma68wl2ul.fsf@suse.de Signed-off-by: Andreas Schwab <schwab@suse.de> Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Andreas Schwab [Mon, 25 Jul 2022 15:12:18 +0000 (17:12 +0200)]
rtla: Fix double free
Avoid double free by making trace_instance_destroy indempotent. When
trace_instance_init fails, it calls trace_instance_destroy, but its only
caller osnoise_destroy_tool calls it again.
Link: https://lkml.kernel.org/r/mvmilnlkyzx.fsf_-_@suse.de Fixes: 0605bf009f18 ("rtla: Add osnoise tool") Signed-off-by: Andreas Schwab <schwab@suse.de> Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
tracing: Use a struct alignof to determine trace event field alignment
alignof() gives an alignment of types as they would be as standalone
variables. But alignment in structures might be different, and when
building the fields of events, the alignment must be the actual
alignment otherwise the field offsets may not match what they actually
are.
This caused trace-cmd to crash, as libtraceevent did not check if the
field offset was bigger than the event. The write_msr and read_msr
events on 32 bit had their fields incorrect, because it had a u64 field
between two ints. alignof(u64) would give 8, but the u64 field was at a
4 byte alignment.
Define a macro as:
ALIGN_STRUCTFIELD(type) ((int)(offsetof(struct {char a; type b;}, b)))
which gives the actual alignment of types in a structure.
Link: https://lkml.kernel.org/r/20220731015928.7ab3a154@rorschach.local.home Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: stable@vger.kernel.org Fixes: 04ae87a52074e ("ftrace: Rework event_create_dir()") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Since commit 482a4360c56a ("docs: networking: convert netdevices.txt to
ReST"), Documentation/networking/netdevices.txt has been replaced by
Documentation/networking/netdevices.rst.
Update the comment accordingly to avoid a 'make htmldocs' warning.
Merge tag 'x86_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Update the 'mitigations=' kernel param documentation
- Check the IBPB feature flag before enabling IBPB in firmware calls
because cloud vendors' fantasy when it comes to creating guest
configurations is unlimited
- Unexport sev_es_ghcb_hv_call() before 5.19 releases now that HyperV
doesn't need it anymore
- Remove dead CONFIG_* items
* tag 'x86_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
docs/kernel-parameters: Update descriptions for "mitigations=" param with retbleed
x86/bugs: Do not enable IBPB at firmware entry when IBPB is not available
Revert "x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperV"
x86/configs: Update configs in x86_debug.config
delete extra space and tab in blank line, there is no functional change.
Reported-by: Hacash Robot <hacashRobot@santino.com> Signed-off-by: Xie Shaowen <studentxswpy@163.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Fix this by locking i_lock before xfs_ifork_ptr().
Fixes: abbf9e8a4507 ("xfs: rewrite getbmap using the xfs_iext_* helpers") Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> Signed-off-by: Guo Xuenan <guoxuenan@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: added fixes tag] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Merge tag 'locking_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Borislav Petkov:
- Avoid rwsem lockups in certain situations when handling the handoff
bit
* tag 'locking_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter
Merge tag 'edac_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:
- Relax the condition under which the DIMM label in ghes_edac is set in
order to accomodate an HPE BIOS which sets only the device but not
the bank
- Two forgotten fixes to synopsys_edac when handling error interrupts
* tag 'edac_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/ghes: Set the DIMM label unconditionally
EDAC/synopsys: Re-enable the error interrupts on v3 hw
EDAC/synopsys: Use the correct register to disable the error interrupt on v3 hw
Hongnan Li [Fri, 22 Jul 2022 08:27:32 +0000 (16:27 +0800)]
erofs: update ctx->pos for every emitted dirent
erofs_readdir update ctx->pos after filling a batch of dentries
and it may cause dir/files duplication for NFS readdirplus which
depends on ctx->pos to fill dir correctly. So update ctx->pos for
every emitted dirent in erofs_fill_dentries to fix it.
Also fix the update of ctx->pos when the initial file position has
exceeded nameoff.
Enable qspinlock by the requirements mentioned in a8ad07e5240c9
("asm-generic: qspinlock: Indicate the use of mixed-size atomics").
C-SKY only has "ldex/stex" for all atomic operations. So csky give a
strong forward guarantee for "ldex/stex." That means when ldex grabbed
the cache line into $L1, it would block other cores from snooping the
address with several cycles. The atomic_fetch_add & xchg16 has the same
forward guarantee level in C-SKY.
Qspinlock has better code size and performance in a fast path.
Phillip Potter [Sat, 30 Jul 2022 23:59:10 +0000 (00:59 +0100)]
staging: r8188eu: fix potential uninitialised variable use in rtw_pwrctrl.c
Set ret to 0 (success) before entering first if statement, thereby
assuring that even if the device is not associated and further checks
pass, we do not then end up returning the uninitialized value of ret.
This assignment is deliberately now directly before the if statement, in
order to keep it clear what is happening as opposed to having it as an
initialization at the start of the function like it was originally.
Also add a comment to make it clear this first if block is currently a
success path. As a side note, smatch does not trigger warnings for this
change, for me at least.
Within core/rtw_pwrctrl.c in the rtw_pwr_wakeup function, I previously
dropped the initialization of 'ret' (int ret = 0;) in favour of its
assignment which happens inside the first if block directly before its
corresponding goto. This was the cause of this bug, and was introduced
by: commit f3a76018dd55 ("staging: r8188eu: remove initializer from ret
in rtw_pwr_wakeup").
Fixes: f3a76018dd55 ("staging: r8188eu: remove initializer from ret in rtw_pwr_wakeup") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Link: https://lore.kernel.org/r/20220730235910.1145-1-phil@philpotter.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The root cause is segment type is invalid, so in f2fs_do_replace_block(),
f2fs accesses f2fs_sm_info::curseg_array with out-of-range segment type,
result in accessing invalid curseg->sum_blk during memcpy in
f2fs_update_meta_page(). Fix this by adding sanity check on segment type
in build_sit_entries().
Reported-by: Wenqing Liu <wenqingliu0120@gmail.com> Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
NAT entry and nat bitmap can be inconsistent, e.g. one nid is free
in nat bitmap, and blkaddr in its NAT entry is not NULL_ADDR, it
may trigger BUG_ON() in f2fs_new_node_page(), fix it.
Reported-by: Dipanjan Das <mail.dipanjan.das@gmail.com> Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Liu [Mon, 25 Jul 2022 10:16:33 +0000 (18:16 +0800)]
f2fs: fix to remove F2FS_COMPR_FL and tag F2FS_NOCOMP_FL at the same time
If the inode has the compress flag, it will fail to use
'chattr -c +m' to remove its compress flag and tag no compress flag.
However, the same command will be successful when executed again,
as shown below:
$ touch foo.txt
$ chattr +c foo.txt
$ chattr -c +m foo.txt
chattr: Invalid argument while setting flags on foo.txt
$ chattr -c +m foo.txt
$ f2fs_io getflags foo.txt
get a flag on foo.txt ret=0, flags=nocompression,inline_data
Fix this by removing some checks in f2fs_setflags_common()
that do not affect the original logic. I go through all the
possible scenarios, and the results are as follows. Bold is
the only thing that has changed.
+---------------+-----------+-----------+----------+
| | file flags |
+ command +-----------+-----------+----------+
| | no flag | compr | nocompr |
+---------------+-----------+-----------+----------+
| chattr +c | compr | compr | -EINVAL |
| chattr -c | no flag | no flag | nocompr |
| chattr +m | nocompr | -EINVAL | nocompr |
| chattr -m | no flag | compr | no flag |
| chattr +c +m | -EINVAL | -EINVAL | -EINVAL |
| chattr +c -m | compr | compr | compr |
| chattr -c +m | nocompr | *nocompr* | nocompr |
| chattr -c -m | no flag | no flag | no flag |
+---------------+-----------+-----------+----------+
introduce the below 4 new sysfs node for atomic write statistics.
- current_atomic_write: the total current atomic write block count,
which is not committed yet.
- peak_atomic_write: the peak value of total current atomic write block
count after boot.
- committed_atomic_block: the accumulated total committed atomic write
block count after boot.
- revoked_atomic_block: the accumulated total revoked atomic write block
count after boot.
Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
f2fs_gc returns -EINVAL via f2fs_balance_fs when there is enough free
secs after write checkpoint, but with gc_merge enabled, it will cause
the sleep time of gc thread to be set to no_gc_sleep_time even if there
are many dirty segments can be selected.
f2fs: invalidate meta pages only for post_read required inode
After commit e3b49ea36802 ("f2fs: invalidate META_MAPPING before
IPU/DIO write"), invalidate_mapping_pages() will be called to
avoid race condition in between IPU/DIO and readahead for GC.
However, readahead flow is only used for post_read required inode,
so this patch adds check condition to avoids unnecessary page cache
invalidating for non-post_read inode.
Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
f2fs: Delete f2fs_copy_page() and replace with memcpy_page()
f2fs_copy_page() is a wrapper around two kmap() + one memcpy() from/to
the mapped pages. It unnecessarily duplicates a kernel API and it makes
use of kmap(), which is being deprecated in favor of kmap_local_page().
Two main problems with kmap(): (1) It comes with an overhead as mapping
space is restricted and protected by a global lock for synchronization and
(2) it also requires global TLB invalidation when the kmap’s pool wraps
and it might block when the mapping space is fully utilized until a slot
becomes available.
With kmap_local_page() the mappings are per thread, CPU local, can take
page faults, and can be called from any context (including interrupts).
It is faster than kmap() in kernels with HIGHMEM enabled. Therefore, its
use in __clone_blkaddrs() is safe and should be preferred.
Delete f2fs_copy_page() and use a plain memcpy_page() in the only one
site calling the removed function. memcpy_page() avoids open coding two
kmap_local_page() + one memcpy() between the two kernel virtual addresses.
Suggested-by: Christoph Hellwig <hch@infradead.org> Suggested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
f2fs: fix to invalidate META_MAPPING before DIO write
Quoted from commit e3b49ea36802 ("f2fs: invalidate META_MAPPING before
IPU/DIO write")
"
Encrypted pages during GC are read and cached in META_MAPPING.
However, due to cached pages in META_MAPPING, there is an issue where
newly written pages are lost by IPU or DIO writes.
(a) In phase 3 of f2fs_gc(), up-to-date page is read from storage and
cached in META_MAPPING.
(b) In thread B, writing new data by IPU or DIO write on same blkaddr as
read in (a). cached page in META_MAPPING become out-dated.
(c) In phase 4 of f2fs_gc(), out-dated page in META_MAPPING is copied to
new blkaddr. In conclusion, the newly written data in (b) is lost.
To address this issue, invalidating pages in META_MAPPING before IPU or
DIO write.
"
In previous commit, we missed to cover extent cache hit case, and passed
wrong value for parameter @end of invalidate_mapping_pages(), fix both
issues.
Fixes: 6aa58d8ad20a ("f2fs: readahead encrypted block during GC") Fixes: e3b49ea36802 ("f2fs: invalidate META_MAPPING before IPU/DIO write") Cc: Hyeong-Jun Kim <hj514.kim@samsung.com> Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Daeho Jeong [Mon, 20 Jun 2022 17:38:42 +0000 (10:38 -0700)]
f2fs: introduce memory mode
Introduce memory mode to supports "normal" and "low" memory modes.
"low" mode is to support low memory devices. Because of the nature of
low memory devices, in this mode, f2fs will try to save memory sometimes
by sacrificing performance. "normal" mode is the default mode and same
as before.
The Multicolor PWM LED uses max-brigthness property (in the example and
in the driver), so document it to fixi dt_binding_check warning like:
leds/leds-pwm-multicolor.example.dtb:
led-controller: multi-led: Unevaluated properties are not allowed ('max-brightness' was unexpected)
Reported-by: Rob Herring <robh@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Pavel Machek <pavel@ucw.cz>
The driver core supports the ability to handle the creation and removal
of device-specific sysfs files in a race-free manner. Take advantage of
that by converting this driver to use this by moving the sysfs
attributes into a group and assigning the dev_groups pointer to it.
Cc: Pavel Machek <pavel@ucw.cz> Cc: linux-leds@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Marek Behún <kabel@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Pavel Machek <pavel@ucw.cz>
The wakeup in preemptive (wip) monitor verifies if the
wakeup events always take place with preemption disabled:
|
|
v
#==================#
H preemptive H <+
#==================# |
| |
| preempt_disable | preempt_enable
v |
sched_waking +------------------+ |
+--------------- | | |
| | non_preemptive | |
+--------------> | | -+
+------------------+
The wakeup event always takes place with preemption disabled because
of the scheduler synchronization. However, because the preempt_count
and its trace event are not atomic with regard to interrupts, some
inconsistencies might happen.
The documentation illustrates one of these cases.
Link: https://lkml.kernel.org/r/c98ca678df81115fddc04921b3c79720c836b18f.1659052063.git.bristot@kernel.org Cc: Wim Van Sebroeck <wim@linux-watchdog.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Gabriele Paoloni <gpaoloni@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Tao Zhou <tao.zhou@linux.dev> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
rv/monitor: Add the wip monitor skeleton created by dot2k
THIS CODE IS NOT LINKED TO THE MAKEFILE.
This model does not compile because it lacks the instrumentation
part, which will be added next. In the typical case, there will be
only one patch, but it was split into two patches for educational
purposes.
This is the direct output this command line:
$ dot2k -d tools/verification/models/wip.dot -t per_cpu
Link: https://lkml.kernel.org/r/5eb7a9118917e8a814c5e49853a72fc62be0a101.1659052063.git.bristot@kernel.org Cc: Wim Van Sebroeck <wim@linux-watchdog.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Gabriele Paoloni <gpaoloni@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Tao Zhou <tao.zhou@linux.dev> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>