From: Paul Eggert
Date: Thu, 27 Nov 2025 04:14:08 +0000 (-0800)
Subject: Bring back placeholders
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=f83a120c580fa5a15e4eeac3dea4cb12bae1ce6a;p=thirdparty%2Ftar.git

Bring back placeholders

They can still be useful if -h is used. See Pavel Cahyna in:
https://lists.gnu.org/r/bug-tar/2025-11/msg00026.html
While we’re at it, bring them back if -P is used, as they can still
be useful there too.
* src/extract.c (HAVE_BIRTHTIME, BIRTHTIME_EQ): Bring back these macros.
(struct delayed_link, struct string_list): Bring back these structs.
(delayed_link_table, delayed_link_head, delayed_link_tail):
Bring back these static vars.
(dl_hash, dl_compare, find_direct_ancestor)
(find_delayed_link_source, create_placeholder_file)
(apply_delayed_link, apply_delayed_links):
Bring back these static functions.
(mark_metadata_set): Rename from mark_after_links. All uses changed.
(extract_link, extract_symlink): Create placeholders as before,
except only if -P or -h are used.
(extract_finish): Deal with delayed links, as before.
---
diff --git a/NEWS b/NEWS
index c5f921d9..c78ae9ca 100644
--- a/NEWS
+++ b/NEWS
@@ -1,4 +1,4 @@
-GNU tar NEWS - User visible changes. 2025-11-13
+GNU tar NEWS - User visible changes. 2025-11-26
 Please send GNU tar bug reports to
 
 version 1.35.90 (git)
@@ -62,10 +62,11 @@ option.
 
 ** Sparse files are now read and written with larger blocksizes.
 
-** When extracting, tar no longer creates empty placeholder files
+** When extracting and neither --absolute-names (-P) nor --dereference
+   (-h) is used, tar no longer creates empty placeholder files
    that are later replaced by symbolic links. The placeholders are no
-   longer needed now that tar no longer follows symbolic links to
-   targets outside the working directory.
+   longer needed now that tar by default no longer follows symbolic
+   links to targets outside the working directory.
 
 version 1.35 - Sergey Poznyakoff, 2023-07-18
 
diff --git a/doc/tar.texi b/doc/tar.texi
index a6302b41..28158e25 100644
--- a/doc/tar.texi
+++ b/doc/tar.texi
@@ -2662,7 +2662,8 @@ directories until the end of extraction. @xref{Directory Modification Times and
 When reading or writing a file to be archived, @command{tar} accesses
 the file that a symbolic link points to, rather than the symlink
+itself. This is a dangerous option, as it can cause @command{tar} to
+access files outside the working directory. @xref{dereference}.
 
 @opsummary{directory}
 @item --directory=@var{dir}
@@ -9527,7 +9528,7 @@ The interpretation of options in file lists is disabled by
 @cindex file names, absolute
 
 By default, @GNUTAR{} drops a leading @samp{/} on
-input or output, and complains about file names containing a @file{..}
+input or output, and complains about file names containing a @samp{..}
 component. There is an option that turns off this behavior:
 
 @table @option
@@ -9535,7 +9536,8 @@ component. There is an option that turns off this behavior:
 @item --absolute-names
 @itemx -P
 Do not strip leading slashes from file names, and permit file names
-containing a @file{..} file name component.
+containing a @samp{..} file name component, or that escape
+the extraction directory.
 @end table
 
 When @command{tar} extracts archive members from an archive, it strips any
@@ -9547,7 +9549,7 @@ in the archive. For example, if the archive member has the name
 @file{/etc/passwd}, @command{tar} will extract it as if the name were
 really @file{etc/passwd}.
 
-File names containing @file{..} can cause problems when extracting, so
+File names containing @samp{..} can cause problems when extracting, so
 @command{tar} normally warns you about such files when creating an
 archive, and prevents attempts to extract such files if that would
 affect files outside the working directory.
@@ -9569,45 +9571,14 @@ for the information on how to handle this case.}.
 If you use the @option{--absolute-names} (@option{-P}) option,
 @command{tar} will do none of these transformations.
 
-To archive or extract files relative to the root directory, specify
-the @option{--absolute-names} (@option{-P}) option.
-
-Normally, @command{tar} acts on files relative to the working
-directory---ignoring superior directory names when archiving, and
-ignoring leading slashes when extracting.
-
-When you specify @option{--absolute-names} (@option{-P}),
-@command{tar} stores file names including all superior directory
-names, and preserves leading slashes. If you only invoked
-@command{tar} from the root directory you would never need the
-@option{--absolute-names} option, but using this option
-may be more convenient than switching to root.
-
 @FIXME{Should be an example in the tutorial/wizardry section using this
 to transfer files between systems.}
 
-@table @option
-@item --absolute-names
-Preserves full file names (including superior directory names) when
-archiving and extracting files.
-
-@end table
-
-@command{tar} prints out a message about removing the @samp{/} from
+By default @command{tar} prints out a message about removing the @samp{/} from
 file names. This message appears once per @GNUTAR{} invocation. It
 represents something which ought to be told; ignoring what it means can
 cause very serious surprises, later.
-
-Some people, nevertheless, do not want to see this message. Wanting to
-play really dangerously, one may of course redirect @command{tar} standard
-error to the sink. For example, under @command{sh}:
-
-@smallexample
-$ @kbd{tar -c -f archive.tar /home 2> /dev/null}
-@end smallexample
-
-@noindent
-Another solution, both nicer and simpler, would be to change to
+However, to suppress this message, change to
 the @file{/} directory first, and then avoid absolute notation.
 
 For example:
@@ -9615,8 +9586,15 @@ For example:
 $ @kbd{tar -c -f archive.tar -C / home}
 @end smallexample
 
+If you use the dangerous options @option{--absolute-names}
+(@option{-P}) or @option{--dereference} (@option{-h}),
+symbolic links containing @samp{..} or leading @samp{/} can cause
+problems when extracting, so @command{tar} extracts them last;
+it may create empty files as placeholders during extraction.
+Although these placeholders prevent problems if you are extracting
+into an empty directory, they do not suffice for nonempty directories.
 @xref{Integrity}, for some of the security-related implications
-of using this option.
+of using these dangerous options.
 
 @include parse-datetime.texi
 
@@ -10429,10 +10407,12 @@ When @option{--dereference} (@option{-h}) is used with
 symbolic links point to, instead of the links themselves.
 
-When creating portable archives, use @option{--dereference}
+When creating a portable archive from a directory that adversaries
+cannot modify, consider using @option{--dereference}
 (@option{-h}): some systems do not support symbolic links, and
 moreover, your distribution might be unusable if it contains
 unresolved symbolic links.
+@xref{dereference}.
 
 When reading from an archive, the @option{--dereference} (@option{-h})
 option causes @command{tar} to follow an already-existing symbolic
@@ -10442,7 +10422,8 @@ remove the link before writing a new file. @xref{Dealing with Old Files}.
 
 The @option{--dereference} option is unsafe if an untrusted user can
-modify directories while @command{tar} is running. @xref{Security}.
+modify directories while @command{tar} is running, or if extracting
+from an untrusted archive into a nonempty directory. @xref{Security}.
 
 @node hard links
 @subsection Hard Links
@@ -13131,7 +13112,7 @@ directory and run @command{tar} in that directory.
 You can use the @option{--directory} (@option{-C}) option to specify
 the working directory (@pxref{directory}).
 
-When extracting from an archive, @command{tar} rejects attempts to
+When extracting from an archive, @command{tar} by default rejects attempts to
 modify files outside the working directory. For example, if a symbolic
 link points outside the working directory, @command{tar} refuses to
 follow the link, regardless of whether the
@@ -13147,11 +13128,13 @@ ordinarily follow symbolic links even if they escape the working directory.
 
 If you use the @option{--absolute-names} (@option{-P}) option when
 extracting, @command{tar} respects any file names in the archive, even
-file names that begin with @file{/}, contain @file{..}, or that follow
-a symbolic link to escape the extraction directory. As this lets the
-archive overwrite any file in your system that you can write,
-the @option{--absolute-names} (@option{-P}) option should be used only
-for trusted archives.
+file names that begin with @samp{/}, contain @samp{..}, or that follow
+a symbolic link to escape the extraction directory.
+If you use the @option{--dereference} (@option{-h}) option when extracting,
+@command{tar} follows any existing symbolic link that is the last component of
+a file name, even if that link escapes the extraction directory.
+These two options should be used only for trusted archives, as they
+can let an archive overwrite any file in your system that you can write.
 
 Conversely, with the @option{--keep-old-files} (@option{-k}) and
 @option{--skip-old-files} options, @command{tar} refuses to replace
diff --git a/src/extract.c b/src/extract.c
index 0e099efd..4345b528 100644
--- a/src/extract.c
+++ b/src/extract.c
@@ -47,6 +47,21 @@ static mode_t const all_mode_bits = ~ (mode_t) 0;
 # define fchown(fd, uid, gid) (errno = ENOSYS, -1)
 #endif
 
+#if (defined HAVE_STRUCT_STAT_ST_BIRTHTIMESPEC_TV_NSEC \
+     || defined HAVE_STRUCT_STAT_ST_BIRTHTIM_TV_NSEC \
+     || defined HAVE_STRUCT_STAT_ST_BIRTHTIMENSEC \
+     || (defined _WIN32 && ! defined __CYGWIN__))
+# define HAVE_BIRTHTIME 1
+#else
+# define HAVE_BIRTHTIME 0
+#endif
+
+#if HAVE_BIRTHTIME
+# define BIRTHTIME_EQ(a, b) (timespec_cmp (a, b) == 0)
+#else
+# define BIRTHTIME_EQ(a, b) true
+#endif
+
 /* Return true if an error number ERR means the system call is
    supported in this case. */
 static bool
@@ -58,9 +73,17 @@ implemented (int err)
 }
 
 /* List of directories whose statuses we need to extract after we've
-   finished extracting their subsidiary files. The head of the list
-   has the longest name, and each non-head element is an ancestor (in
-   the directory hierarchy) of the preceding element. */
+   finished extracting their subsidiary files. Ordinarily the head of
+   the list has the longest name, and each non-head element is an
+   ancestor (in the directory hierarchy) of the preceding element.
+   However, if --absolute-names (-P) or --dereference (-h) is used,
+   things get more complicated: if you consider each
+   contiguous subsequence of elements of the form [D]?[^D]*, where [D]
+   represents an element where METADATA_SET and [^D]
+   represents an element where !METADATA_SET, then the head
+   of the subsequence has the longest name, and each non-head element
+   in the subsequence is an ancestor (in the directory hierarchy) of the
+   preceding element. */
 
 struct delayed_set_stat
   {
@@ -68,7 +91,6 @@ struct delayed_set_stat
     struct delayed_set_stat *next;
 
     /* Metadata for this directory. */
-    bool metadata_set;
     dev_t st_dev;
     ino_t st_ino;
     mode_t mode; /* The desired mode is MODE & ~ current_umask. */
@@ -77,6 +99,10 @@ struct delayed_set_stat
     struct timespec atime;
     struct timespec mtime;
 
+    /* Whether the metadata are set. If true, do not set the status
+       of this directory until after any delayed links are created. */
+    bool metadata_set;
+
     /* An estimate of the directory's current mode, along with a mask
        specifying which bits of this estimate are known to be correct.
        If CURRENT_MODE_MASK is zero, CURRENT_MODE's value doesn't
If CURRENT_MODE_MASK is zero, CURRENT_MODE's value doesn't @@ -114,6 +140,90 @@ static struct delayed_set_stat *delayed_set_stat_head; /* Table of delayed stat updates hashed by path; null if none. */ static Hash_table *delayed_set_stat_table; +/* A link whose creation we have delayed. */ +struct delayed_link + { + /* The next in a list of delayed links that should be made after + this delayed link. */ + struct delayed_link *next; + + /* The device, inode number and birthtime of the placeholder. + birthtime.tv_nsec is negative if the birthtime is not available. + Don't use mtime as this would allow for false matches if some + other process removes the placeholder. Don't use ctime as + this would cause race conditions and other screwups, e.g., + when restoring hard-linked symlinks. */ + dev_t st_dev; + ino_t st_ino; +#if HAVE_BIRTHTIME + struct timespec birthtime; +#endif + + /* True if the link is symbolic. */ + bool is_symlink; + + /* The desired metadata, valid only the link is symbolic. */ + mode_t mode; + uid_t uid; + gid_t gid; + struct timespec atime; + struct timespec mtime; + + /* The directory that the sources and target are relative to. */ + idx_t change_dir; + + /* A list of sources for this link. The sources are all to be + hard-linked together. */ + struct string_list *sources; + + /* SELinux context */ + char *cntx_name; + + /* ACLs */ + char *acls_a_ptr; + idx_t acls_a_len; + char *acls_d_ptr; + idx_t acls_d_len; + + struct xattr_map xattr_map; + + /* The desired target of the desired link. */ + char target[FLEXIBLE_ARRAY_MEMBER]; + }; + +/* Table of delayed links hashed by device and inode; null if none. */ +static Hash_table *delayed_link_table; + +/* A list of the delayed links in tar file order, + and the tail of that list. 
*/ +static struct delayed_link *delayed_link_head; +static struct delayed_link **delayed_link_tail = &delayed_link_head; + +struct string_list + { + struct string_list *next; + char string[FLEXIBLE_ARRAY_MEMBER]; + }; + +static size_t +dl_hash (void const *entry, size_t table_size) +{ + struct delayed_link const *dl = entry; + uintmax_t n = dl->st_dev; + int nshift = TYPE_WIDTH (n) - TYPE_WIDTH (dl->st_dev); + if (0 < nshift) + n <<= nshift; + n ^= dl->st_ino; + return n % table_size; +} + +static bool +dl_compare (void const *a, void const *b) +{ + struct delayed_link const *da = a, *db = b; + return PSAME_INODE (da, db); +} + static size_t ds_hash (void const *entry, size_t table_size) { @@ -369,10 +479,29 @@ set_stat (char const *file_name, xattrs_selinux_set (st, file_name, typeflag); } -/* For each entry H in the entries in HEAD, mark H and fill in its dev - and ino members. Assume HEAD. */ +/* Find the direct ancestor of FILE_NAME in the delayed_set_stat list. */ +static struct delayed_set_stat * +find_direct_ancestor (char const *file_name) +{ + struct delayed_set_stat *h = delayed_set_stat_head; + while (h) + { + if (! h->metadata_set + && strncmp (file_name, h->file_name, h->file_name_len) == 0 + && ISSLASH (file_name[h->file_name_len]) + && (last_component (file_name + h->file_name_len + 1) + == file_name + h->file_name_len + 1)) + break; + h = h->next; + } + return h; +} + +/* For each entry H in the leading prefix of entries in HEAD that do + not have metadata_set marked, mark H and fill in its dev and ino + members. Assume HEAD && ! HEAD->metadata_set. 
*/ static void -mark_after_links (struct delayed_set_stat *head) +mark_metadata_set (struct delayed_set_stat *head) { struct delayed_set_stat *h = head; @@ -502,7 +631,7 @@ delay_set_stat (char const *file_name, struct tar_stat_info const *st, if (st) xattr_map_copy (&data->xattr_map, &st->xattr_map); if (must_be_dot_or_slash (file_name)) - mark_after_links (data); + mark_metadata_set (data); } /* If DIR is an intermediate directory created earlier, update its @@ -536,8 +665,8 @@ update_interdir_set_stat (char const *dir) /* Update the delayed_set_stat info for an intermediate directory created within the file name of DIR. The intermediate directory turned - out to be the same as this directory, e.g. due to ".." or symbolic - links. *DIR_STAT_INFO is the status of the directory. */ + out to be the same as this directory, e.g., due to ".." or symbolic links. + *DIR_STAT_INFO is the status of the directory. */ static void repair_delayed_set_stat (char const *dir, struct stat const *dir_stat_info) @@ -877,7 +1006,8 @@ set_xattr (MAYBE_UNUSED char const *file_name, /* Fix the statuses of all directories whose statuses need fixing, and which are not ancestors of FILE_NAME. If METADATA_SET, do this for all such directories; otherwise, stop at the - first directory with metadata already determined. */ + first directory that is marked to be fixed up only after delayed + links are applied. */ static void apply_nonancestor_delayed_set_stat (char const *file_name, bool metadata_set) { @@ -1287,6 +1417,140 @@ extract_file (char *file_name, char typeflag) return status == 0; } +/* Return true if NAME is a delayed link. This can happen only if the link + placeholder file has been created. Therefore, try to stat the NAME + first. If it doesn't exist, there is no matching entry in the table. + Otherwise, look for the entry in the table that has the matching dev + and ino numbers. Return false if not found. 
+
+   Do not rely on comparing file names, which may differ for
+   various reasons (e.g., relative vs. absolute file names). */
+static bool
+find_delayed_link_source (char const *name)
+{
+  struct stat st;
+
+  if (!delayed_link_table)
+    return false;
+
+  struct fdbase f = fdbase (name);
+  if (f.fd == BADFD || fstatat (f.fd, f.base, &st, AT_SYMLINK_NOFOLLOW) < 0)
+    {
+      if (errno != ENOENT)
+        stat_error (name);
+      return false;
+    }
+
+  struct delayed_link dl;
+  dl.st_dev = st.st_dev;
+  dl.st_ino = st.st_ino;
+  return hash_lookup (delayed_link_table, &dl) != NULL;
+}
+
+/* Create a placeholder file with name FILE_NAME, which will be
+   replaced after other extraction is done by a symbolic link if
+   IS_SYMLINK is true, and by a hard link otherwise. Set
+   *INTERDIR_MADE if an intermediate directory is made in the
+   process. */
+
+static bool
+create_placeholder_file (char *file_name, bool is_symlink, bool *interdir_made)
+{
+  int fd;
+  struct stat st;
+
+  for (;;)
+    {
+      struct fdbase f = fdbase (file_name);
+      if (f.fd != BADFD)
+        {
+          fd = openat (f.fd, f.base, O_WRONLY | O_CREAT | O_EXCL, 0);
+          if (0 <= fd)
+            break;
+        }
+
+      if (errno == EEXIST && find_delayed_link_source (file_name))
+        {
+          /* The placeholder file has already been created. This means
+             that the link being extracted is a duplicate of an already
+             processed one. Skip it. */
+          return true;
+        }
+
+      switch (maybe_recoverable (file_name, false, interdir_made))
+        {
+        case RECOVER_OK:
+          continue;
+
+        case RECOVER_SKIP:
+          return true;
+
+        case RECOVER_NO:
+          open_error (file_name);
+          return false;
+        }
+    }
+
+  if (fstat (fd, &st) < 0)
+    {
+      stat_error (file_name);
+      close (fd);
+    }
+  else if (close (fd) < 0)
+    close_error (file_name);
+  else
+    {
+      struct delayed_set_stat *h;
+      struct delayed_link *p =
+        xmalloc (FLEXNSIZEOF (struct delayed_link, target,
+                              strlen (current_stat_info.link_name) + 1));
+      p->next = NULL;
+      p->st_dev = st.st_dev;
+      p->st_ino = st.st_ino;
+#if HAVE_BIRTHTIME
+      p->birthtime = get_stat_birthtime (&st);
+#endif
+      p->is_symlink = is_symlink;
+      if (is_symlink)
+        {
+          p->mode = current_stat_info.stat.st_mode;
+          p->uid = current_stat_info.stat.st_uid;
+          p->gid = current_stat_info.stat.st_gid;
+          p->atime = current_stat_info.atime;
+          p->mtime = current_stat_info.mtime;
+        }
+      p->change_dir = chdir_current;
+      p->sources = xmalloc (FLEXNSIZEOF (struct string_list, string,
+                                         strlen (file_name) + 1));
+      p->sources->next = 0;
+      strcpy (p->sources->string, file_name);
+      p->cntx_name = NULL;
+      assign_string_or_null (&p->cntx_name, current_stat_info.cntx_name);
+      p->acls_a_ptr = NULL;
+      p->acls_a_len = 0;
+      p->acls_d_ptr = NULL;
+      p->acls_d_len = 0;
+      xattr_map_init (&p->xattr_map);
+      xattr_map_copy (&p->xattr_map, &current_stat_info.xattr_map);
+      strcpy (p->target, current_stat_info.link_name);
+
+      *delayed_link_tail = p;
+      delayed_link_tail = &p->next;
+      if (! ((delayed_link_table
+              || (delayed_link_table = hash_initialize (0, 0, dl_hash,
+                                                        dl_compare, free)))
+             && hash_insert (delayed_link_table, p)))
+        xalloc_die ();
+
+      if ((h = find_direct_ancestor (file_name)) != NULL)
+        mark_metadata_set (h);
+
+      return true;
+    }
+
+  return false;
+}
+
 static bool
 extract_link (char *file_name, MAYBE_UNUSED char typeflag)
 {
@@ -1296,6 +1560,11 @@ extract_link (char *file_name, MAYBE_UNUSED char typeflag)
 
   link_name = current_stat_info.link_name;
 
+  if (absolute_names_option | dereference_option
+      && ((! absolute_names_option && contains_dot_dot (link_name))
+          || find_delayed_link_source (link_name)))
+    return create_placeholder_file (file_name, false, &interdir_made);
+
   do
     {
       struct stat st, st1;
@@ -1312,7 +1581,28 @@ extract_link (char *file_name, MAYBE_UNUSED char typeflag)
         }
 
       if (status == 0)
-        return true;
+        {
+          if (delayed_link_table
+              && fstatat (f1.fd, f1.base, &st1, AT_SYMLINK_NOFOLLOW) == 0)
+            {
+              struct delayed_link dl1;
+              dl1.st_ino = st1.st_ino;
+              dl1.st_dev = st1.st_dev;
+              struct delayed_link *ds = hash_lookup (delayed_link_table, &dl1);
+              if (ds && ds->change_dir == chdir_current
+                  && BIRTHTIME_EQ (ds->birthtime, get_stat_birthtime (&st1)))
+                {
+                  struct string_list *p
+                    = xmalloc (FLEXNSIZEOF (struct string_list,
+                                            string, strlen (file_name) + 1));
+                  strcpy (p->string, file_name);
+                  p->next = ds->sources;
+                  ds->sources = p;
+                }
+            }
+
+          return true;
+        }
 
       int e = errno;
       if ((e == EEXIST && streq (link_name, file_name))
@@ -1341,6 +1631,11 @@ extract_symlink (char *file_name, MAYBE_UNUSED char typeflag)
 {
   bool interdir_made = false;
 
+  if (!absolute_names_option & dereference_option
+      && (IS_ABSOLUTE_FILE_NAME (current_stat_info.link_name)
+          || contains_dot_dot (current_stat_info.link_name)))
+    return create_placeholder_file (file_name, true, &interdir_made);
+
   for (struct fdbase f;
        ((f = fdbase (file_name)).fd == BADFD
         || symlinkat (current_stat_info.link_name, f.fd, f.base) < 0);
@@ -1621,11 +1916,116 @@ extract_archive (void)
     undo_last_backup ();
 }
 
+/* Extract the link DS whose final extraction was delayed. */
+static void
+apply_delayed_link (struct delayed_link *ds)
+{
+  char const *valid_source = NULL;
+
+  chdir_do (ds->change_dir);
+
+  for (struct string_list *sources = ds->sources;
+       sources;
+       sources = sources->next)
+    {
+      char const *source = sources->string;
+      struct stat st;
+
+      /* Make sure the placeholder file is still there. If not,
+         don't create a link, as the placeholder was probably
+         removed by a later extraction. */
+      struct fdbase f = fdbase (source);
+      if (f.fd != BADFD && fstatat (f.fd, f.base, &st, AT_SYMLINK_NOFOLLOW) == 0
+          && SAME_INODE (st, *ds)
+          && BIRTHTIME_EQ (get_stat_birthtime (&st), ds->birthtime))
+        {
+          /* Unlink the placeholder, then create a hard link if possible,
+             a symbolic link otherwise. */
+          struct fdbase f1;
+          if (unlinkat (f.fd, f.base, 0) < 0)
+            unlink_error (source);
+          else if (valid_source
+                   && ((f1 = f.fd == BADFD ? f : fdbase1 (valid_source)).fd
+                       != BADFD)
+                   && linkat (f1.fd, f1.base, f.fd, f.base, 0) == 0)
+            ;
+          else if (!ds->is_symlink)
+            {
+              f1 = f.fd == BADFD ? f : fdbase1 (ds->target);
+              if (f1.fd == BADFD
+                  || linkat (f1.fd, f1.base, f.fd, f.base, 0) < 0)
+                link_error (ds->target, source);
+            }
+          else if (symlinkat (ds->target, f.fd, f.base) < 0)
+            symlink_error (ds->target, source);
+          else
+            {
+              struct tar_stat_info st1;
+              st1.stat.st_mode = ds->mode;
+              st1.stat.st_uid = ds->uid;
+              st1.stat.st_gid = ds->gid;
+              st1.atime = ds->atime;
+              st1.mtime = ds->mtime;
+              st1.cntx_name = ds->cntx_name;
+              st1.acls_a_ptr = ds->acls_a_ptr;
+              st1.acls_a_len = ds->acls_a_len;
+              st1.acls_d_ptr = ds->acls_d_ptr;
+              st1.acls_d_len = ds->acls_d_len;
+              st1.xattr_map = ds->xattr_map;
+              set_stat (source, &st1, -1, 0, 0, SYMTYPE,
+                        false, AT_SYMLINK_NOFOLLOW);
+              valid_source = source;
+            }
+        }
+    }
+
+  /* There is little point to freeing, as we are about to exit,
+     and freeing is more likely to cause than cure trouble. */
+  if (false)
+    {
+      for (struct string_list *sources = ds->sources; sources; )
+        {
+          struct string_list *next = sources->next;
+          free (sources);
+          sources = next;
+        }
+
+      xattr_map_free (&ds->xattr_map);
+      free (ds->cntx_name);
+    }
+}
+
+/* Extract the links whose final extraction was delayed. */
+static void
+apply_delayed_links (void)
+{
+  for (struct delayed_link *ds = delayed_link_head; ds; ds = ds->next)
+    apply_delayed_link (ds);
+
+  if (false && delayed_link_table)
+    {
+      /* There is little point to freeing, as we are about to exit,
+         and freeing is more likely to cause than cure trouble.
+         Also, the above code has not bothered to free the list
+         in delayed_link_head. */
+      hash_free (delayed_link_table);
+      delayed_link_table = NULL;
+    }
+}
+
 /* Finish the extraction of an archive. */
 void
 extract_finish (void)
 {
-  /* Fix the status of ordinary directories that need fixing. */
+  /* First, fix the status of ordinary directories that need fixing. */
+  apply_nonancestor_delayed_set_stat ("", false);
+
+  /* Then, apply delayed links, so that they don't affect delayed
+     directory status-setting for ordinary directories. */
+  apply_delayed_links ();
+
+  /* Finally, fix the status of directories that are ancestors
+     of delayed links. */
   apply_nonancestor_delayed_set_stat ("", true);
 
   /* This table should be empty after apply_nonancestor_delayed_set_stat. */