# Welcome to libarchive!

The libarchive project develops a portable, efficient C library that
can read and write streaming archives in a variety of formats. It
also includes implementations of the common `tar`, `cpio`, and `zcat`
command-line tools that use the libarchive library.

## Questions? Issues?

* https://www.libarchive.org is the home for ongoing
  libarchive development, including documentation,
  and links to the libarchive mailing lists.
* To report an issue, use the issue tracker at
  https://github.com/libarchive/libarchive/issues
* To submit an enhancement to libarchive, please
  submit a pull request via GitHub: https://github.com/libarchive/libarchive/pulls

## Contents of the Distribution

This distribution bundle includes the following major components:

* **libarchive**: a library for reading and writing streaming archives
* **tar**: the 'bsdtar' program is a full-featured 'tar' implementation built on libarchive
* **cpio**: the 'bsdcpio' program is a different interface to essentially the same functionality
* **cat**: the 'bsdcat' program is a simple replacement tool for zcat, bzcat, xzcat, and such
* **unzip**: the 'bsdunzip' program is a simple replacement tool for Info-ZIP's unzip
* **examples**: Some small example programs that you may find useful.
* **examples/minitar**: a compact sample demonstrating use of libarchive.
* **contrib**: Various items sent to me by third parties; please contact the authors with any questions.

The top-level directory contains the following information files:

* **NEWS** - highlights of recent changes
* **COPYING** - what you can do with this
* **INSTALL** - installation instructions
* **README** - this file
* **CMakeLists.txt** - input for "cmake" build tool, see INSTALL
* **configure** - configuration script, see INSTALL for details. If your copy of the source lacks a `configure` script, you can try to construct it by running the script in `build/autogen.sh` (or use `cmake`).

The following files in the top-level directory are used by the 'configure' script:

* `Makefile.am`, `aclocal.m4`, `configure.ac` - used to build this distribution, only needed by maintainers
* `Makefile.in`, `config.h.in` - templates used by configure script

## Documentation

In addition to the informational articles and documentation
in the online [libarchive Wiki](https://github.com/libarchive/libarchive/wiki),
the distribution also includes a number of manual pages:

* bsdtar.1 explains the use of the bsdtar program
* bsdcpio.1 explains the use of the bsdcpio program
* bsdcat.1 explains the use of the bsdcat program
* libarchive.3 gives an overview of the library as a whole
* archive_read.3, archive_write.3, archive_write_disk.3, and
  archive_read_disk.3 provide detailed calling sequences for the read
  and write APIs
* archive_entry.3 details the "struct archive_entry" utility class
* archive_internals.3 provides some insight into libarchive's
  internal structure and operation.
* libarchive-formats.5 documents the file formats supported by the library
* cpio.5, mtree.5, and tar.5 provide detailed information about these
  popular archive formats, including hard-to-find details about
  modern cpio and tar variants.

The manual pages above are provided in the 'doc' directory in
a number of different formats.

You should also read the copious comments in `archive.h` and the
source code for the sample programs for more details. Please let us
know about any errors or omissions you find.

## Supported Formats

Currently, the library automatically detects and reads the following formats:

* Old V7 tar archives
* POSIX ustar
* GNU tar format (including GNU long filenames, long link names, and sparse files)
* Solaris 9 extended tar format (including ACLs)
* POSIX pax interchange format
* POSIX octet-oriented cpio
* SVR4 ASCII cpio
* Binary cpio (big-endian or little-endian)
* PWB binary cpio
* ISO9660 CD-ROM images (with optional Rockridge or Joliet extensions)
* ZIP archives (with uncompressed or "deflate" compressed entries, including support for encrypted Zip archives)
* ZIPX archives (with support for bzip2, ppmd8, lzma and xz compressed entries)
* GNU and BSD 'ar' archives
* 'mtree' format
* 7-Zip archives (including archives that use zstandard compression)
* Microsoft CAB format
* LHA and LZH archives
* RAR and RAR 5.0 archives (with some limitations due to RAR's proprietary status)
* XAR archives

The library also detects and handles any of the following before evaluating the archive:

* uuencoded files
* files with RPM wrapper
* gzip compression
* bzip2 compression
* compress/LZW compression
* lzma, lzip, and xz compression
* lz4 compression
* lzop compression
* zstandard compression

The library can create archives in any of the following formats:

* POSIX ustar
* POSIX pax interchange format
* "restricted" pax format, which will create ustar archives except for
  entries that require pax extensions (for long filenames, ACLs, etc.).
* Old GNU tar format
* Old V7 tar format
* POSIX octet-oriented cpio
* SVR4 "newc" cpio
* Binary cpio (little-endian)
* PWB binary cpio
* shar archives
* ZIP archives (with uncompressed or "deflate" compressed entries)
* GNU and BSD 'ar' archives
* 'mtree' format
* ISO9660 format
* 7-Zip archives
* XAR archives

When creating archives, the result can be filtered with any of the following:

* uuencode
* gzip compression
* bzip2 compression
* compress/LZW compression
* lzma, lzip, and xz compression
* lz4 compression
* lzop compression
* zstandard compression

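The automatic detection of compression filters on read, mentioned earlier in this section, can be pictured as a comparison of the first bytes of a stream against well-known magic numbers. The sketch below is illustrative only: `guess_filter()` and its table are hypothetical names invented for this example, the code is not taken from libarchive, and libarchive's own detection logic may work differently. Formats without a reliable signature, such as raw lzma streams, are left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table (not libarchive code): leading "magic" bytes of
 * several compressed stream formats, with explicit lengths because some
 * signatures contain NUL bytes. */
static const struct {
    const char *name;
    const char *magic;
    size_t len;
} filters[] = {
    { "gzip",      "\x1f\x8b",               2 },
    { "bzip2",     "BZh",                    3 },
    { "compress",  "\x1f\x9d",               2 },
    { "xz",        "\xfd" "7zXZ\x00",        6 },
    { "lzip",      "LZIP",                   4 },
    { "lz4",       "\x04\x22\x4d\x18",       4 },
    { "lzop",      "\x89LZO\x00\r\n\x1a\n",  9 },
    { "zstandard", "\x28\xb5\x2f\xfd",       4 },
};

/* Return the name of the first filter whose magic bytes match the start
 * of buf, or "none" if nothing matches or the buffer is too short. */
const char *guess_filter(const void *buf, size_t size)
{
    size_t i;

    for (i = 0; i < sizeof(filters) / sizeof(filters[0]); i++) {
        if (size >= filters[i].len &&
            memcmp(buf, filters[i].magic, filters[i].len) == 0)
            return filters[i].name;
    }
    return "none";
}
```

For example, `guess_filter("BZh9", 4)` returns `"bzip2"`, while a buffer that matches no entry yields `"none"`. In an application that uses libarchive there is no need for such a helper: enabling the filters with `archive_read_support_filter_all()` leaves the detection to the library.
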
## Notes about the Library Design

The following notes address many of the most common
questions we are asked about libarchive:

* This is a heavily stream-oriented system. That means that
  it is optimized to read or write the archive in a single
  pass from beginning to end. For example, this allows
  libarchive to process archives too large to store on disk
  by processing them on-the-fly as they are read from or
  written to a network or tape drive. This also makes
  libarchive useful for tools that need to produce
  archives on-the-fly (such as webservers that provide
  archived contents of a user's account).

* In-place modification and random access to the contents
  of an archive are not directly supported. For some formats,
  this is not an issue: For example, tar.gz archives are not
  designed for random access. In some other cases, libarchive
  can re-open an archive and scan it from the beginning quickly
  enough to provide the needed abilities even without true
  random access. Of course, some applications do require true
  random access; those applications should consider alternatives
  to libarchive.

* The library is designed to be extended with new compression and
  archive formats. The only requirement is that the format be
  readable or writable as a stream and that each archive entry be
  independent. There are articles on the libarchive Wiki explaining
  how to extend libarchive.

* On read, compression and format are always detected automatically.

* The same API is used for all formats; it should be very
  easy for software using libarchive to transparently handle
  any of libarchive's archiving formats.

* Libarchive's automatic support for decompression can be used
  without archiving by explicitly selecting the "raw" and "empty"
  formats.

* I've attempted to minimize static link pollution. If you don't
  explicitly invoke a particular feature (such as support for a
  particular compression or format), it won't get pulled in to
  statically-linked programs. In particular, if you don't explicitly
  enable a particular compression or decompression support, you won't
  need to link against the corresponding compression or decompression
  libraries. This also reduces the size of statically-linked
  binaries in environments where that matters.

* The library is generally _thread safe_ depending on the platform:
  it does not define any global variables of its own. However, some
  platforms do not provide fully thread-safe versions of key C library
  functions. On those platforms, libarchive will use the non-thread-safe
  functions. Patches to improve this are of great interest to us.

* The function `archive_write_disk_header()` is _not_ thread safe on
  POSIX machines and could lead to a security issue resulting in
  world-writable directories. The calling code must therefore
  serialize calls to it with a mutex. The cause is the call
  `umask(oldumask = umask(0))`, which sets the umask of the whole
  process to 0 for a short time. If another thread calls the same
  function in parallel, the two calls can interleave and leave the
  process with umask=0 for the rest of its execution. Implicitly
  created directories would then get 777 permissions without the
  sticky bit.

* In particular, libarchive's modules to read or write a directory
  tree do use `chdir()` to optimize the directory traversals. This
  can cause problems for programs that expect to do disk access from
  multiple threads. Of course, those modules are completely
  optional and you can use the rest of libarchive without them.

* The library is _not_ thread aware, however. It does no locking
  or thread management of any kind. If you create a libarchive
  object and need to access it from multiple threads, you will
  need to provide your own locking.

* On read, the library accepts whatever blocks you hand it.
  Your read callback is free to pass the library a byte at a time
  or mmap the entire archive and give it to the library at once.
  On write, the library always produces correctly-blocked output.

* The object-style approach allows you to have multiple archive streams
  open at once. bsdtar uses this in its "@archive" extension.

* The archive itself is read/written using callback functions.
  You can read an archive directly from an in-memory buffer or
  write it to a socket, if you wish. There are some utility
  functions to provide easy-to-use "open file," etc, capabilities.

* The read/write APIs are designed to allow individual entries
  to be read or written to any data source: You can create
  a block of data in memory and add it to a tar archive without
  first writing a temporary file. You can also read an entry from
  an archive and write the data directly to a socket. If you want
  to read/write entries to disk, there are convenience functions to
  make this especially easy.

* Note: The "pax interchange format" is a POSIX standard extended tar
  format that should be used when the older _ustar_ format is not
  appropriate. It has many advantages over other tar formats
  (including the legacy GNU tar format) and is widely supported by
  current tar implementations.

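The note above about `archive_write_disk_header()` asks the calling code to put a mutex around the call. A minimal sketch of such a wrapper follows, under these assumptions: the names `locked_write_header()`, `write_header_fn`, and `demo_header()` are hypothetical, and the function to protect is passed in as a pointer so that the sketch compiles without libarchive. In a real program the pointer would presumably be `archive_write_header`, called on a handle from `archive_write_disk_new()`. `demo_header()` is only a stand-in that repeats the umask idiom quoted in the note.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

struct archive;        /* opaque libarchive types; only pointers are used */
struct archive_entry;

/* Assumed to match the shape of archive_write_header(). */
typedef int (*write_header_fn)(struct archive *, struct archive_entry *);

/* One process-wide lock shared by every thread that writes disk headers. */
static pthread_mutex_t disk_header_lock = PTHREAD_MUTEX_INITIALIZER;

/* Run write_header(disk, entry) while holding the lock, so that two
 * threads can never be inside the umask window at the same time. */
int locked_write_header(write_header_fn write_header,
    struct archive *disk, struct archive_entry *entry)
{
    int r;

    pthread_mutex_lock(&disk_header_lock);
    r = write_header(disk, entry);
    pthread_mutex_unlock(&disk_header_lock);
    return r;
}

/* Stand-in for the real call (hypothetical): it reads the umask by
 * setting it to 0 and restoring it, which is the idiom the note
 * describes. Between the two umask() calls the whole process runs
 * with umask 0. */
int demo_header(struct archive *disk, struct archive_entry *entry)
{
    mode_t oldumask;

    (void)disk;
    (void)entry;
    umask(oldumask = umask(0));
    return 0;
}
```

The lock only helps if every thread that writes disk headers goes through the same wrapper. It does not protect unrelated code that creates files while the umask is temporarily 0.
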
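The notes above on read callbacks say that the library accepts whatever block sizes the callback hands it. The sketch below shows one such callback that serves an in-memory archive in small pieces. The names `chunk_source` and `chunk_read()` are hypothetical. The signature is written to match libarchive's read callback as I understand it (return the number of bytes made available through `*buffer`, 0 at end of input, a negative value on error), with `ssize_t` assumed as the return type on POSIX systems. Only a forward declaration of `struct archive` is used, so the block compiles without the libarchive headers.

```c
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

struct archive;          /* opaque libarchive handle; unused in this sketch */

/* Hypothetical client state: a buffer that is handed out in small chunks. */
struct chunk_source {
    const unsigned char *data;  /* start of the in-memory archive */
    size_t size;                /* total number of bytes */
    size_t offset;              /* bytes already handed out */
    size_t chunk;               /* maximum bytes per callback, must be > 0 */
};

/* Point *buffer at the next block and return its length; a return value
 * of 0 signals end of input. No data is copied. */
ssize_t chunk_read(struct archive *a, void *client_data, const void **buffer)
{
    struct chunk_source *src = client_data;
    size_t remaining = src->size - src->offset;
    size_t n = remaining < src->chunk ? remaining : src->chunk;

    (void)a;
    *buffer = src->data + src->offset;
    src->offset += n;
    return (ssize_t)n;
}
```

With libarchive available, a callback like this would be registered through `archive_read_open(a, &src, NULL, chunk_read, NULL)` after `archive_read_new()` and the `archive_read_support_*` calls. Setting `chunk` to 1 feeds the library a byte at a time, and setting it to `size` hands over the whole buffer at once.
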