From: Masatake YAMATO
Date: Fri, 5 Jan 2024 05:31:46 +0000 (+0900)
Subject: lsfd: extend nodev table to decode "btrfs" on SOURCE column
X-Git-Tag: v2.42-start~502^2~1
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=1b2ab467e7257ce2526f1282a867bb764950285c;p=thirdparty%2Futil-linux.git

lsfd: extend nodev table to decode "btrfs" on SOURCE column

When filling the SOURCE column, lsfd decodes the name of the device
where the file object resides. If the file object comes from a file
system, lsfd fills the column with the file system's name.

As reported in #2349 and #2308, lsfd could not decode the name
correctly if the file system is btrfs. This change, together with its
preceding changes, fixes this bug.

The "devnum offset" causes the trouble: on btrfs, the device number
reported by the stat system call differs from the one reported via
procfs. For details of the "devnum offset", see "Mechanism behind the
devnum offset". For how this change fixes it, see "How to adjust the
output of lsfd".

Without this change:

  $ ./lsfd -Q '(ASSOC == "exe")' -p $$
  COMMAND   PID   USER ASSOC  XMODE TYPE SOURCE MNTID  INODE NAME
  zsh     19318 yamato   exe ------  REG   0:38     0 589767 /usr/bin/zsh

With this change:

  $ ./lsfd -Q '(ASSOC == "exe")' -p $$
  COMMAND   PID   USER ASSOC  XMODE TYPE SOURCE MNTID  INODE NAME
  zsh     19318 yamato   exe ------  REG  btrfs     0 589767 /usr/bin/zsh

Mechanism behind the devnum offset
----------------------------------
Both the stat command and the inotify field in fdinfo refer to an inode.
filename_lookup (https://elixir.bootlin.com/linux/v6.2.9/source/fs/namei.c#L2495)
is the function that gets the inode for a given file name. filename_lookup
returns a struct path; via path->dentry->d_inode, the caller of
filename_lookup can get the inode.

The stat command calls the statx system call, and statx eventually
calls filename_lookup.

The inotify_add_watch system call takes a file name. It eventually
calls filename_lookup to get the inode for that file name.
The inode number that inotify_add_watch gets via filename_lookup is
printed in the inotify field in fdinfo.

The device number, the subject of this issue, can be obtained via
path->dentry->d_inode->i_sb->s_dev.

Both the stat command and the inotify field in fdinfo use
filename_lookup to get the path. If they use the same function, why
don't the device numbers match?

I monitored the device numbers obtained via
path->dentry->d_inode->i_sb->s_dev by inserting a systemtap probe into
filename_lookup, and saw that the numbers matched. However, the number
monitored via systemtap did not match the number printed by the stat
command.

The statx system call does not use path->dentry->d_inode->i_sb->s_dev,
the value obtained via filename_lookup, directly. statx calls
vfs_statx. After calling filename_lookup, vfs_statx calls vfs_getattr
to fill struct kstat. vfs_getattr calls inode->i_op->getattr, a
file-system-specific method for filling struct kstat, if it is
available. btrfs implements this method as btrfs_getattr
(https://elixir.bootlin.com/linux/v6.2.9/source/fs/btrfs/inode.c#L9007):

  stat->dev = BTRFS_I(inode)->root->anon_dev;

The dev member is overwritten with a btrfs-specific value.

How to adjust the output of lsfd
--------------------------------
lsfd already reads mountinfo files.

1. Get the "rawnum" and the mount point

   The device numbers in a mountinfo file are raw; the btrfs-specific
   value is not reflected there. Let's call this number "rawnum" here.
   When reading the mountinfo file, lsfd can learn the mount points of
   btrfs file systems:

     $ grep btrfs /proc/self/mountinfo
     72 1 0:35 /root / rw,relatime shared:1 - btrfs

2. Get the "cookednum"

   By calling the stat system call on the mount point obtained in
   step 1, lsfd can learn the device number that btrfs customizes with
   its getattr method. Let's call this device number "cookednum".

3. Make a table mapping "rawnum" to "cookednum".

4. Look up the table when printing inodes.
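The four steps above can be sketched in plain C for the caller's own mount namespace. This is a hypothetical illustration, not the code in the patch: struct devmap, parse_mountinfo_line(), cook_devnum(), build_devmap() and lookup_raw() are made-up names, the table is a fixed-size array instead of lsfd's per-namespace list, other mount namespaces (which the patch enters with setns()) are ignored, and octal escapes such as \040 in mount points are not decoded.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Hypothetical table entry for step 3: rawnum -> cookednum. */
struct devmap {
	dev_t raw;    /* major:minor as printed in mountinfo */
	dev_t cooked; /* st_dev returned by stat(2) on the mount point */
};

/* Step 1: take "rawnum" and the mount point from one mountinfo line.
 * Octal escapes such as \040 in the mount point are NOT decoded.
 * Returns 0 on success, -1 if the line cannot be parsed. */
static int parse_mountinfo_line(const char *line, dev_t *raw,
				char *mntpoint, size_t mntpoint_size)
{
	unsigned int maj, min;
	int start = 0, end = 0;

	assert(line && raw && mntpoint && mntpoint_size > 0);
	/* mount ID, parent ID, major:minor, root, mount point, ...
	 * %n records the offsets around the mount point field. */
	if (sscanf(line, "%*d %*d %u:%u %*s %n%*s%n",
		   &maj, &min, &start, &end) != 2 || end <= start)
		return -1;
	if ((size_t)(end - start) >= mntpoint_size)
		return -1;
	memcpy(mntpoint, line + start, (size_t)(end - start));
	mntpoint[end - start] = '\0';
	*raw = makedev(maj, min);
	return 0;
}

/* Step 2: get "cookednum" by calling stat(2) on the mount point;
 * this goes through the file system's getattr method. */
static int cook_devnum(const char *mntpoint, dev_t *cooked)
{
	struct stat sb;

	if (stat(mntpoint, &sb) != 0)
		return -1;
	*cooked = sb.st_dev;
	return 0;
}

/* Steps 1-3 over a whole mountinfo stream. Entries whose raw and cooked
 * numbers agree are dropped, in the spirit of dedup_cooked_bdevs() in
 * the patch. Returns the number of entries stored in MAP. */
static size_t build_devmap(FILE *mountinfo, struct devmap *map, size_t cap)
{
	char line[4096];
	char mntpoint[4096];
	size_t n = 0;

	while (n < cap && fgets(line, sizeof(line), mountinfo)) {
		dev_t raw, cooked;

		if (parse_mountinfo_line(line, &raw, mntpoint, sizeof(mntpoint)) != 0)
			continue;
		if (cook_devnum(mntpoint, &cooked) != 0 || cooked == raw)
			continue;
		map[n].raw = raw;
		map[n].cooked = cooked;
		n++;
	}
	return n;
}

/* Step 4: given the st_dev of a file (a cooked number), find the raw
 * number that mountinfo uses. Returns 0 if found, -1 otherwise. */
static int lookup_raw(const struct devmap *map, size_t n, dev_t cooked,
		      dev_t *raw)
{
	for (size_t i = 0; i < n; i++) {
		if (map[i].cooked == cooked) {
			*raw = map[i].raw;
			return 0;
		}
	}
	return -1;
}
```

With such a table, the st_dev of an open file (fstat(2) also goes through vfs_getattr) can be passed to lookup_raw() to recover the number mountinfo reports. Dropping entries whose two numbers agree keeps the table limited to file systems, such as btrfs, that actually rewrite the device number.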

Signed-off-by: Masatake YAMATO
---

diff --git a/misc-utils/lsfd.c b/misc-utils/lsfd.c
index 503f24158..d4d2a99bc 100644
--- a/misc-utils/lsfd.c
+++ b/misc-utils/lsfd.c
@@ -102,12 +102,23 @@ static struct nodev_table nodev_table;
 struct mnt_namespace {
 	bool read_mountinfo;
 	ino_t id;
+	struct list_head cooked_bdevs;
 };
 
 static struct mnt_namespace *find_mnt_ns(ino_t id);
 static struct mnt_namespace *add_mnt_ns(ino_t id);
 static void *mnt_namespaces; /* for tsearch/tfind */
 
+struct cooked_bdev {
+	struct list_head cooked_bdevs;
+	dev_t cooked;
+	dev_t raw;
+	char *filesystem;
+};
+
+static ino_t self_mntns_id;
+static int self_mntns_fd = -1;
+
 struct name_manager {
 	struct idcache *cache;
 	unsigned long next_id;
@@ -1067,18 +1078,95 @@ static void collect_namespace_files_bottomhalf(struct path_cxt *pc, struct proc
 					   false);
 }
 
+static void reset_cooked_bdev(struct cooked_bdev *bdev, dev_t raw, const char *filesystem)
+{
+	bdev->raw = raw;
+	free(bdev->filesystem);
+	bdev->filesystem = xstrdup(filesystem);
+}
+
+static struct cooked_bdev *new_cooked_bdev(dev_t cooked, dev_t raw, const char *filesystem)
+{
+	struct cooked_bdev *bdev = xmalloc(sizeof(*bdev));
+
+	INIT_LIST_HEAD(&bdev->cooked_bdevs);
+	bdev->cooked = cooked;
+	bdev->raw = raw;
+	if (major(cooked) == 0) {
+		bdev->filesystem = NULL;
+		xasprintf(&bdev->filesystem, "%s:%lu",
+			  filesystem, (unsigned long)minor(cooked));
+	} else
+		bdev->filesystem = xstrdup(filesystem);
+
+	return bdev;
+}
+
+static void free_cooked_bdev(struct cooked_bdev* bdev)
+{
+	if (bdev->filesystem)
+		free(bdev->filesystem);
+	free(bdev);
+}
+
+static void add_cooked_bdev(struct mnt_namespace *mnt_ns, dev_t cooked, dev_t raw, const char *filesystem)
+{
+	struct cooked_bdev *bdev;
+
+	struct list_head *n;
+	list_for_each (n, &mnt_ns->cooked_bdevs) {
+		bdev = list_entry(n, struct cooked_bdev, cooked_bdevs);
+		if (bdev->cooked == cooked) {
+			reset_cooked_bdev (bdev, raw, filesystem);
+			return;
+		}
+	}
+
+	bdev = new_cooked_bdev(cooked, raw, filesystem);
+	list_add_tail(&bdev->cooked_bdevs, &mnt_ns->cooked_bdevs);
+}
+
+static void dedup_cooked_bdevs(struct mnt_namespace *mnt_ns)
+{
+	struct list_head *n, *nnext;
+
+	list_for_each_safe(n, nnext, &mnt_ns->cooked_bdevs) {
+		struct cooked_bdev *bdev = list_entry(n, struct cooked_bdev,
+						      cooked_bdevs);
+		if (bdev->cooked == bdev->raw) {
+			list_del(n);
+			free_cooked_bdev(bdev);
+		}
+	}
+
+#if 0
+	list_for_each(n, &mnt_ns->cooked_bdevs) {
+		struct cooked_bdev *bdev = list_entry(n, struct cooked_bdev,
+						      cooked_bdevs);
+		fprintf(stderr, "mntns: %lu (major: %u, minor: %u) => (major: %u, minor: %u)\n",
+			mnt_ns->id,
+			major(bdev->cooked), minor(bdev->cooked),
+			major(bdev->raw), minor(bdev->raw));
+	}
+#endif
+}
+
 static struct mnt_namespace *new_mnt_ns(ino_t id)
 {
 	struct mnt_namespace *mnt_ns = xmalloc(sizeof(*mnt_ns));
 
 	mnt_ns->id = id;
 	mnt_ns->read_mountinfo = false;
+	INIT_LIST_HEAD(&mnt_ns->cooked_bdevs);
 	return mnt_ns;
 }
 
 static void free_mnt_ns(void *mnt_ns)
 {
+	list_free(&((struct mnt_namespace *)mnt_ns)->cooked_bdevs,
+		  struct cooked_bdev, cooked_bdevs, free_cooked_bdev);
+
 	free(mnt_ns);
 }
 
@@ -1143,15 +1231,24 @@ void add_nodev(unsigned long minor, const char *filesystem)
 static void initialize_nodevs(void)
 {
 	int i;
+	struct stat sb;
 
 	for (i = 0; i < NODEV_TABLE_SIZE; i++)
 		INIT_LIST_HEAD(&nodev_table.tables[i]);
+
+	if (stat("/proc/self/ns/mnt", &sb) == 0) {
+		self_mntns_id = sb.st_ino;
+		self_mntns_fd = open("/proc/self/ns/mnt", O_RDONLY);
+	}
 }
 
 static void finalize_nodevs(void)
 {
 	int i;
 
+	if (self_mntns_fd >= 0)
+		close(self_mntns_fd);
+
 	for (i = 0; i < NODEV_TABLE_SIZE; i++)
 		list_free(&nodev_table.tables[i],
 			  struct nodev, nodevs, free_nodev);
@@ -1171,9 +1268,29 @@ const char *get_nodev_filesystem(unsigned long minor)
 	return NULL;
 }
 
-static void process_mountinfo_entry(unsigned long major, unsigned long minor,
-				    const char *filesystem)
+static void add_nodevs_from_cooked_bdevs(struct mnt_namespace *mnt_ns)
 {
+	struct list_head *n;
+	list_for_each(n, &mnt_ns->cooked_bdevs) {
+		struct cooked_bdev *bdev = list_entry(n, struct cooked_bdev,
+						      cooked_bdevs);
+		if (major(bdev->cooked) == 0
+		    && get_nodev_filesystem(minor(bdev->cooked)) == NULL)
+			add_nodev(minor(bdev->cooked), bdev->filesystem);
+	}
+}
+
+static void process_mountinfo_entry(unsigned long major, unsigned long minor,
+				    const char *filesystem,
+				    const char *mntpoint_filename,
+				    struct mnt_namespace *mnt_ns)
+{
+	if (mnt_ns != NULL) {
+		struct stat sb;
+		if (stat(mntpoint_filename, &sb) == 0)
+			add_cooked_bdev(mnt_ns, sb.st_dev, makedev(major, minor), filesystem);
+	}
+
 	if (major != 0)
 		return;
 	if (get_nodev_filesystem(minor))
@@ -1182,7 +1299,7 @@ static void process_mountinfo_entry(unsigned long major, unsigned long minor,
 	add_nodev(minor, filesystem);
 }
 
-static void read_mountinfo(FILE *mountinfo)
+static void read_mountinfo(FILE *mountinfo, struct mnt_namespace *mnt_ns)
 {
 	/* This can be very long. A line in mountinfo can have more than 3
 	 * paths. */
@@ -1191,19 +1308,50 @@ static void read_mountinfo(FILE *mountinfo)
 	while (fgets(line, sizeof(line), mountinfo)) {
 		unsigned long major, minor;
 		char filesystem[256];
+		int mntpoint_offset, mntpoint_end_offset;
+		int scan_offset;
 
-		/* 23 61 0:22 / /sys rw,nosuid,nodev,noexec,relatime shared:2 - sysfs sysfs rw,seclabel */
-		if(sscanf(line, "%*d %*d %lu:%lu %*s %*s %*s %*[^-] - %255s %*[^\n]",
-			  &major, &minor, filesystem) != 3)
-			/* 1600 1458 0:55 / / rw,nodev,relatime - overlay overlay rw,context="s... */
-			if (sscanf(line, "%*d %*d %lu:%lu %*s %*s %*s - %255s %*[^\n]",
-				   &major, &minor, filesystem) != 3)
+		if(sscanf(line, "%*d %*d %lu:%lu %*s %n%*s%n %*s %n", &major, &minor,
+			  &mntpoint_offset, &mntpoint_end_offset, &scan_offset) != 2)
+			continue;
+
+		/* 23 61 0:22 / /sys rw,nosuid,nodev,noexec,relatime shared:2 - sysfs sysfs rw,seclabel
+		 * --------------------------------------------------^
+		 */
+		if(sscanf(line + scan_offset, "%*[^-] - %255s %*[^\n]",
+			  filesystem) != 1)
+			/* 1600 1458 0:55 / / rw,nodev,relatime - overlay overlay rw,context="s...
+			 * -------------------------------------^
+			 */
+			if (sscanf(line + scan_offset, "- %255s %*[^\n]",
+				   filesystem) != 1)
 				continue;
 
-		process_mountinfo_entry(major, minor, filesystem);
+		line[mntpoint_end_offset] = '\0';
+		process_mountinfo_entry(major, minor, filesystem,
+					line + mntpoint_offset, mnt_ns);
+	}
+
+	if (mnt_ns) {
+		dedup_cooked_bdevs(mnt_ns);
+		add_nodevs_from_cooked_bdevs(mnt_ns);
 	}
 }
 
+static void read_mountinfo_in_mntns(FILE *mountinfo, struct mnt_namespace *mnt_ns,
+				    int mntns_fd)
+{
+	if (mntns_fd >= 0 && setns(mntns_fd, CLONE_NEWNS) < 0) {
+		mntns_fd = -1;
+		mnt_ns = NULL;
+	}
+
+	read_mountinfo(mountinfo, mnt_ns);
+
+	if (mntns_fd >= 0)
+		setns(self_mntns_fd, CLONE_NEWNS);
+}
+
 static void initialize_ipc_table(void)
 {
 	for (int i = 0; i < IPC_TABLE_SIZE; i++)
@@ -1810,7 +1958,12 @@ static void read_process(struct lsfd_control *ctl, struct path_cxt *pc,
 	if (proc->mnt_ns == NULL || !proc->mnt_ns->read_mountinfo) {
 		FILE *mountinfo = ul_path_fopen(pc, "r", "mountinfo");
 		if (mountinfo) {
-			read_mountinfo(mountinfo);
+			int mntns_fd = -1;
+			if (proc->mnt_ns && (self_mntns_id != proc->mnt_ns->id))
+				mntns_fd = ul_path_open(pc, O_RDONLY, "ns/mnt");
+			read_mountinfo_in_mntns(mountinfo, proc->mnt_ns, mntns_fd);
+			if (mntns_fd >= 0)
+				close(mntns_fd);
 			if (proc->mnt_ns)
 				proc->mnt_ns->read_mountinfo = true;
 			fclose(mountinfo);
diff --git a/tests/expected/lsfd/column-source-btrfs b/tests/expected/lsfd/column-source-btrfs
new file mode 100644
index 000000000..be2360403
--- /dev/null
+++ b/tests/expected/lsfd/column-source-btrfs
@@ -0,0 +1,2 @@
+SOURCE: 0
+SOURCE == EXPECTED
diff --git a/tests/ts/lsfd/column-source-btrfs b/tests/ts/lsfd/column-source-btrfs
new file mode 100755
index 000000000..52999281c
--- /dev/null
+++ b/tests/ts/lsfd/column-source-btrfs
@@ -0,0 +1,74 @@
+#!/bin/bash
+#
+# Copyright (C) 2024 Masatake YAMATO
+#
+# This file is part of util-linux.
+#
+# This file is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This file is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+TS_TOPDIR="${0%/*}/../.."
+TS_DESC="SOURCE column for fd opening a file on btrfs"
+
+. "$TS_TOPDIR"/functions.sh
+ts_init "$*"
+ts_skip_nonroot
+
+ts_check_test_command "$TS_CMD_LSFD"
+ts_check_prog "stat"
+ts_check_prog "sed"
+ts_check_prog "mkfs.btrfs"
+ts_check_prog "dd"
+
+ts_cd "$TS_OUTDIR"
+
+PID=
+FD=3
+
+IMG=img-column-source-btrfs.btrfs
+MNTPNT=mntpnt-column-source-btrfs
+FILE=${MNTPNT}/afile
+
+mkdir -p $MNTPNT
+dd if=/dev/zero of=$IMG bs=114294784 count=1 status=none
+if ! mkfs.btrfs -q $IMG; then
+    ts_skip "failed to make a btrfs image: $IMG"
+fi
+if ! mount $IMG $MNTPNT; then
+    ts_skip "failed to mount a btrfs image, $IMG to $MNTPNT"
+fi
+trap "umount $MNTPNT; rm -f $IMG" EXIT
+
+if ! touch $FILE; then
+    ts_skip "failed to touch a file on a btrfs filesystem: $FILE"
+fi
+
+# The major number may be 0. So we can assume the device number is the
+# same as that of minor number.
+EXPECTED="btrfs:$(stat -c %d $FILE)"
+
+{
+    coproc MKFDS { "$TS_HELPER_MKFDS" ro-regular-file $FD file=$FILE; }
+    if read -u ${MKFDS[0]} PID; then
+	EXPR='(PID == '"${PID}"') and (FD == '"$FD"')'
+	SOURCE=$(${TS_CMD_LSFD} -n --raw -o SOURCE -Q "${EXPR}")
+	echo "SOURCE": $?
+	if [[ "$SOURCE" == "$EXPECTED" ]]; then
+	    echo "SOURCE == EXPECTED"
+	else
+	    echo "SOURCE: $SOURCE"
+	    echo "EXPECTED: $EXPECTED"
+	fi
+	echo DONE >&"${MKFDS[1]}"
+    fi
+    wait "${MKFDS_PID}"
+} > "$TS_OUTPUT" 2>&1
+
+ts_finalize
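The comment in the new test relies on how dev_t is encoded: with a major number of 0, the decimal st_dev printed by `stat -c %d` is taken to equal the minor number, which is what new_cooked_bdev() formats after "btrfs:". The minimal C check below explores that arithmetic. It assumes the glibc (or musl) makedev() layout, in which the low 8 bits of the minor stay in bits 0-7 and the remaining minor bits are shifted left by 12. The helper name devnum_equals_minor() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Hypothetical helper: does the whole st_dev value (what `stat -c %d`
 * prints) equal the minor number when the major number is 0? */
static bool devnum_equals_minor(unsigned int minor_num)
{
	dev_t dev = makedev(0, minor_num);

	/* minor() must round-trip regardless of the bit layout. */
	assert(minor(dev) == minor_num);
	return dev == (dev_t)minor_num;
}
```

Under that layout devnum_equals_minor(38) is true, while devnum_equals_minor(256) is false because makedev(0, 256) evaluates to 1048576. The comparison in the test therefore holds only while the anonymous minor number assigned to the btrfs subvolume stays below 256.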