Commit | Line | Data |
---|---|---|
e2d91bed | 1 | # Welcome to libarchive! |
7b7c3284 TK |
2 | |
3 | The libarchive project develops a portable, efficient C library that | |
4 | can read and write streaming archives in a variety of formats. It | |
5 | also includes implementations of the common `tar`, `cpio`, and `zcat` | |
6 | command-line tools that use the libarchive library. | |
7 | ||
e2d91bed TK |
8 | ## Questions? Issues? |
9 | ||
91bb3583 | 10 | * https://www.libarchive.org is the home for ongoing |
e2d91bed TK |
11 | libarchive development, including documentation, |
12 | and links to the libarchive mailing lists. | |
13 | * To report an issue, use the issue tracker at | |
14 | https://github.com/libarchive/libarchive/issues | |
15 | * To submit an enhancement to libarchive, please | |
16 | submit a pull request via GitHub: https://github.com/libarchive/libarchive/pulls | |
17 | ||
18 | ## Contents of the Distribution | |
19 | ||
20 | This distribution bundle includes the following major components: | |
7b7c3284 | 21 | |
7b7c3284 TK |
22 | * **libarchive**: a library for reading and writing streaming archives |
23 | * **tar**: the 'bsdtar' program is a full-featured 'tar' implementation built on libarchive | |
24 | * **cpio**: the 'bsdcpio' program is a different interface to essentially the same functionality | |
25 | * **cat**: the 'bsdcat' program is a simple replacement tool for zcat, bzcat, xzcat, and such | |
c157e4ce | 26 | * **unzip**: the 'bsdunzip' program is a simple replacement tool for Info-ZIP's unzip |
7b7c3284 TK |
27 | * **examples**: Some small example programs that you may find useful. |
28 | * **examples/minitar**: a compact sample demonstrating use of libarchive. | |
29 | * **contrib**: Various items sent to me by third parties; please contact the authors with any questions. | |
30 | ||
31 | The top-level directory contains the following information files: | |
e2d91bed | 32 | |
7b7c3284 TK |
33 | * **NEWS** - highlights of recent changes |
34 | * **COPYING** - what you can do with this | |
35 | * **INSTALL** - installation instructions | |
36 | * **README** - this file | |
7b7c3284 | 37 | * **CMakeLists.txt** - input for "cmake" build tool, see INSTALL |
e2d91bed | 38 | * **configure** - configuration script, see INSTALL for details. If your copy of the source lacks a `configure` script, you can try to construct it by running the script in `build/autogen.sh` (or use `cmake`). |
7b7c3284 | 39 | |
e2d91bed | 40 | The following files in the top-level directory are used by the 'configure' script: |
287d0c26 | 41 | |
e2d91bed TK |
42 | * `Makefile.am`, `aclocal.m4`, `configure.ac` - used to build this distribution, only needed by maintainers |
43 | * `Makefile.in`, `config.h.in` - templates used by configure script | |
44 | ||
45 | ## Documentation | |
46 | ||
47 | In addition to the informational articles and documentation | |
48 | in the online [libarchive Wiki](https://github.com/libarchive/libarchive/wiki), | |
49 | the distribution also includes a number of manual pages: | |
7b7c3284 | 50 | |
7b7c3284 TK |
51 | * bsdtar.1 explains the use of the bsdtar program |
52 | * bsdcpio.1 explains the use of the bsdcpio program | |
53 | * bsdcat.1 explains the use of the bsdcat program | |
54 | * libarchive.3 gives an overview of the library as a whole | |
55 | * archive_read.3, archive_write.3, archive_write_disk.3, and | |
56 | archive_read_disk.3 provide detailed calling sequences for the read | |
57 | and write APIs | |
58 | * archive_entry.3 details the "struct archive_entry" utility class | |
59 | * archive_internals.3 provides some insight into libarchive's | |
60 | internal structure and operation. | |
61 | * libarchive-formats.5 documents the file formats supported by the library | |
62 | * cpio.5, mtree.5, and tar.5 provide detailed information about these | |
63 | popular archive formats, including hard-to-find details about | |
64 | modern cpio and tar variants. | |
e2d91bed | 65 | |
7b7c3284 TK |
66 | The manual pages above are provided in the 'doc' directory in |
67 | a number of different formats. | |
68 | ||
e2d91bed | 69 | You should also read the copious comments in `archive.h` and the |
7b7c3284 TK |
70 | source code for the sample programs for more details. Please let us |
71 | know about any errors or omissions you find. | |
72 | ||
e2d91bed TK |
73 | ## Supported Formats |
74 | ||
c37ac23c | 75 | Currently, the library automatically detects and reads the following formats: |
287d0c26 | 76 | |
7b7c3284 TK |
77 | * Old V7 tar archives |
78 | * POSIX ustar | |
e2d91bed TK |
79 | * GNU tar format (including GNU long filenames, long link names, and sparse files) |
80 | * Solaris 9 extended tar format (including ACLs) | |
7b7c3284 TK |
81 | * POSIX pax interchange format |
82 | * POSIX octet-oriented cpio | |
83 | * SVR4 ASCII cpio | |
7b7c3284 | 84 | * Binary cpio (big-endian or little-endian) |
85f0c98c | 85 | * PWB binary cpio |
7b7c3284 | 86 | * ISO9660 CD-ROM images (with optional Rock Ridge or Joliet extensions) |
e2d91bed | 87 | * ZIP archives (with uncompressed or "deflate" compressed entries, including support for encrypted Zip archives) |
614110e7 | 88 | * ZIPX archives (with support for bzip2, ppmd8, lzma and xz compressed entries) |
7b7c3284 TK |
89 | * GNU and BSD 'ar' archives |
90 | * 'mtree' format | |
5f329a3a | 91 | * 7-Zip archives (including archives that use zstandard compression) |
7b7c3284 TK |
92 | * Microsoft CAB format |
93 | * LHA and LZH archives | |
a6e1e9db | 94 | * RAR and RAR 5.0 archives (with some limitations due to RAR's proprietary status) |
7b7c3284 TK |
95 | * XAR archives |
96 | ||
97 | The library also detects and handles any of the following before evaluating the archive: | |
287d0c26 | 98 | |
7b7c3284 TK |
99 | * uuencoded files |
100 | * files with RPM wrapper | |
101 | * gzip compression | |
102 | * bzip2 compression | |
103 | * compress/LZW compression | |
104 | * lzma, lzip, and xz compression | |
105 | * lz4 compression | |
106 | * lzop compression | |
07fbaa20 | 107 | * zstandard compression |
7b7c3284 TK |
108 | |
109 | The library can create archives in any of the following formats: | |
287d0c26 | 110 | |
7b7c3284 TK |
111 | * POSIX ustar |
112 | * POSIX pax interchange format | |
113 | * "restricted" pax format, which will create ustar archives except for | |
114 | entries that require pax extensions (for long filenames, ACLs, etc). | |
115 | * Old GNU tar format | |
116 | * Old V7 tar format | |
117 | * POSIX octet-oriented cpio | |
118 | * SVR4 "newc" cpio | |
85f0c98c TIH |
119 | * Binary cpio (little-endian) |
120 | * PWB binary cpio | |
7b7c3284 TK |
121 | * shar archives |
122 | * ZIP archives (with uncompressed or "deflate" compressed entries) | |
123 | * GNU and BSD 'ar' archives | |
124 | * 'mtree' format | |
125 | * ISO9660 format | |
126 | * 7-Zip archives | |
127 | * XAR archives | |
128 | ||
129 | When creating archives, the result can be filtered with any of the following: | |
287d0c26 | 130 | |
7b7c3284 TK |
131 | * uuencode |
132 | * gzip compression | |
133 | * bzip2 compression | |
134 | * compress/LZW compression | |
135 | * lzma, lzip, and xz compression | |
136 | * lz4 compression | |
137 | * lzop compression | |
07fbaa20 | 138 | * zstandard compression |
7b7c3284 | 139 | |
e2d91bed TK |
140 | ## Notes about the Library Design |
141 | ||
24e2f6ba TK |
142 | The following notes address many of the most common |
143 | questions we are asked about libarchive: | |
144 | ||
e2d91bed TK |
145 | * This is a heavily stream-oriented system. That means that |
146 | it is optimized to read or write the archive in a single | |
147 | pass from beginning to end. For example, this allows | |
148 | libarchive to process archives too large to store on disk | |
149 | by processing them on-the-fly as they are read from or | |
9a790ea8 TK |
150 | written to a network or tape drive. This also makes |
151 | libarchive useful for tools that need to produce | |
152 | archives on-the-fly (such as webservers that provide | |
153 | archived contents of a user's account). | |
154 | ||
155 | * In-place modification and random access to the contents | |
156 | of an archive are not directly supported. For some formats, | |
157 | this is not an issue: For example, tar.gz archives are not | |
158 | designed for random access. In some other cases, libarchive | |
159 | can re-open an archive and scan it from the beginning quickly | |
160 | enough to provide the needed abilities even without true | |
161 | random access. Of course, some applications do require true | |
162 | random access; those applications should consider alternatives | |
163 | to libarchive. | |
e2d91bed TK |
164 | |
165 | * The library is designed to be extended with new compression and | |
166 | archive formats. The only requirement is that the format be | |
167 | readable or writable as a stream and that each archive entry be | |
168 | independent. There are articles on the libarchive Wiki explaining | |
169 | how to extend libarchive. | |
170 | ||
171 | * On read, compression and format are always detected automatically. | |
172 | ||
07fbaa20 | 173 | * The same API is used for all formats; it should be very |
e2d91bed TK |
174 | easy for software using libarchive to transparently handle |
175 | any of libarchive's archiving formats. | |
176 | ||
177 | * Libarchive's automatic support for decompression can be used | |
178 | without archiving by explicitly selecting the "raw" and "empty" | |
179 | formats. | |
180 | ||
181 | * I've attempted to minimize static link pollution. If you don't | |
182 | explicitly invoke a particular feature (such as support for a | |
183 | particular compression or format), it won't get pulled into | |
184 | statically-linked programs. In particular, if you don't explicitly | |
185 | enable a particular compression or decompression support, you won't | |
186 | need to link against the corresponding compression or decompression | |
187 | libraries. This also reduces the size of statically-linked | |
188 | binaries in environments where that matters. | |
189 | ||
24e2f6ba TK |
190 | * The library is generally _thread safe_ depending on the platform: |
191 | it does not define any global variables of its own. However, some | |
192 | platforms do not provide fully thread-safe versions of key C library | |
193 | functions. On those platforms, libarchive will use the non-thread-safe | |
194 | functions. Patches to improve this are of great interest to us. | |
195 | ||
27bad820 PK |
196 | * The function `archive_write_disk_header()` is _not_ thread safe on |
197 | POSIX machines and could lead to a security issue resulting in | |
198 | world-writable directories, so the calling code must serialize it with a mutex. | |
199 | This is due to calling `umask(oldumask = umask(0))`, which sets the | |
200 | umask for the whole process to 0 for a short time frame. | |
201 | If another thread calls the same function in parallel, the two calls | |
202 | can interleave and leave the process with umask=0 for the | |
203 | remaining execution. | |
204 | Implicitly created directories would then get 777 | |
205 | permissions without the sticky bit. | |
206 | ||
24e2f6ba TK |
207 | * In particular, libarchive's modules to read or write a directory |
208 | tree do use `chdir()` to optimize the directory traversals. This | |
209 | can cause problems for programs that expect to do disk access from | |
6fd58302 TK |
210 | multiple threads. Of course, those modules are completely |
211 | optional and you can use the rest of libarchive without them. | |
24e2f6ba TK |
212 | |
213 | * The library is _not_ thread aware, however. It does no locking | |
214 | or thread management of any kind. If you create a libarchive | |
215 | object and need to access it from multiple threads, you will | |
216 | need to provide your own locking. | |
217 | ||
e2d91bed TK |
218 | * On read, the library accepts whatever blocks you hand it. |
219 | Your read callback is free to pass the library a byte at a time | |
220 | or mmap the entire archive and give it to the library at once. | |
221 | On write, the library always produces correctly-blocked output. | |
222 | ||
223 | * The object-style approach allows you to have multiple archive streams | |
224 | open at once. bsdtar uses this in its "@archive" extension. | |
225 | ||
226 | * The archive itself is read/written using callback functions. | |
227 | You can read an archive directly from an in-memory buffer or | |
228 | write it to a socket, if you wish. There are some utility | |
229 | functions to provide easy-to-use "open file," etc, capabilities. | |
230 | ||
231 | * The read/write APIs are designed to allow individual entries | |
232 | to be read or written to any data source: You can create | |
233 | a block of data in memory and add it to a tar archive without | |
234 | first writing a temporary file. You can also read an entry from | |
235 | an archive and write the data directly to a socket. If you want | |
236 | to read/write entries to disk, there are convenience functions to | |
237 | make this especially easy. | |
238 | ||
6fd58302 TK |
239 | * Note: The "pax interchange format" is a POSIX standard extended tar |
240 | format that should be used when the older _ustar_ format is not | |
241 | appropriate. It has many advantages over other tar formats | |
242 | (including the legacy GNU tar format) and is widely supported by | |
243 | current tar implementations. | |
244 |
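
The note above on `archive_write_disk_header()` asks the calling code to serialize that call with a mutex. The following is a minimal sketch of the pattern in C, assuming POSIX threads are available. None of the names below come from libarchive: `read_umask_unsafe()` is a hypothetical stand-in that only reproduces the `umask(oldumask = umask(0))` idiom, and `umask_lock`, `read_umask_locked()`, `worker()`, and `run_workers()` are illustrative helpers.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/stat.h>
#include <sys/types.h>

/* One process-wide lock guarding every code path that touches the umask. */
static pthread_mutex_t umask_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical stand-in (NOT a libarchive function): reads the umask by
 * setting it to 0 and restoring it, like the idiom described in the note.
 */
static mode_t read_umask_unsafe(void)
{
    mode_t old = umask(0);  /* the whole process has umask 0 from here... */
    umask(old);             /* ...until the previous value is restored */
    return old;
}

/* Serialized wrapper: only one thread at a time runs the unsafe call. */
static mode_t read_umask_locked(void)
{
    mode_t old;

    pthread_mutex_lock(&umask_lock);
    old = read_umask_unsafe();
    pthread_mutex_unlock(&umask_lock);
    return old;
}

/* Thread body: call the wrapper many times, note if 0 was ever observed. */
static void *worker(void *arg)
{
    int *saw_zero = arg;

    for (int i = 0; i < 10000; i++)
        if (read_umask_locked() == 0)
            *saw_zero = 1;
    return NULL;
}

/* Runs up to 16 workers; returns 1 if any of them observed umask 0. */
static int run_workers(int nthreads)
{
    pthread_t tid[16];
    int saw_zero[16] = {0};
    int any = 0;

    if (nthreads > 16)
        nthreads = 16;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, &saw_zero[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++) {
        pthread_join(tid[i], NULL);
        any |= saw_zero[i];
    }
    return any;
}
```

In a real program, the same lock would presumably wrap each call into the disk-writing code, and any other code in the process that changes the umask would need to take it as well. The lock only helps if every such path uses it.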