# libCODY: COmpiler DYnamism<sup><a href="#1">1</a></sup>

Copyright (C) 2020 Nathan Sidwell, nathan@acm.org

libCODY is an implementation of a communication protocol between
compilers and build systems.

**WARNING:** This is preliminary software.

In addition to supporting C++ modules, this may also support LTO
requirements and could also deal with generated #include files
and feed the compiler with prepruned include paths and whatnot. (The
system calls involved in include searches can be quite expensive on
some build infrastructures.)

* Client and Server objects
* Direct connection for in-process use
* Testing with Joust (that means nothing to you, doesn't it!)


## Problem Being Solved

The origin is in C++20 modules:
```
import foo;
```

At that import, the compiler needs<sup><a href="#2">2</a></sup> to
load up the compiled serialization of module `foo`. Where is that
file? Does it even exist? Unless the build system already knows the
dependency graph, this might be a completely unknown module. Now, the
build system knows how to build things, but it might not have complete
information about the dependencies. The ultimate source of
dependencies is the source code being compiled, and specifying the
same thing in multiple places is a recipe for build skew.

Hence, a protocol by which a compiler can query a build system. This
was originally described in
<a href="https://wg21.link/p1184r1">p1184r1:A Module Mapper</a>, along
with a proof-of-concept hack in GNU Make, described in
<a href="https://wg21.link/p1602">p1602:Make Me A Module</a>. The current
implementation has evolved and an update to p1184 will be forthcoming.

## Packet Encoding

The protocol is turn-based. The compiler sends a block of one or more
requests to the builder, then waits for a block of responses to all of
those requests. If the builder needs to compile something to satisfy
a request, there may be some time before the response. A builder may
service multiple compilers concurrently, each as a separate connection.

When multiple requests are in a block, the responses are also in a
block, and in corresponding order. The responses must not be
commenced eagerly -- they must wait until the incoming block has ended
(as mentioned above, it is turn-based). To do otherwise risks
deadlock, as there is no requirement for a sending end of the
communication to listen for incoming responses (or new requests) until
it has completed sending its current block.

Every request has a response.

Requests and responses are user-readable text. It is not intended as
a transmission medium to send large binary objects (such as compiled
modules). It is presumed the builder and the compiler share a file
system, for that kind of thing.<sup><a href="#3">3</a></sup>

Message characters are encoded in UTF-8.

Messages are a sequence of octets ending with a NEWLINE (0xa). The lines
consist of a sequence of words, separated by WHITESPACE (0x20 or 0x9).
Words themselves do not contain WHITESPACE. Lines consisting solely
of WHITESPACE (or empty) are ignored.

To encode a block of multiple messages, non-final messages end with a
single word of SEMICOLON (0x3b), immediately before the NEWLINE. Thus
a serial connection can determine whether a block is complete without
decoding the messages.
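
As a rough illustration of that last point, the check below decides whether a receive buffer holds a complete block by looking only at its tail. It is a minimal sketch, not libCODY's implementation: `BlockComplete` is a hypothetical name, and the function assumes the buffer does not end in blank lines.

```cpp
#include <string>

// Hypothetical helper (not part of libCODY): true when `buffer` ends with the
// final message of a block, i.e. its last line is NEWLINE-terminated and does
// not end with a lone ';' word.
bool BlockComplete(const std::string &buffer)
{
  std::size_t n = buffer.size();
  if (n == 0 || buffer[n - 1] != '\n')
    return false;               // last line not fully received yet
  if (n < 2 || buffer[n - 2] != ';')
    return true;                // no continuation marker before the NEWLINE
  if (n == 2)
    return false;               // the whole buffer is ";\n"
  // The ';' is a continuation marker only when it is a word by itself.
  char before = buffer[n - 3];
  bool lone = before == ' ' || before == '\t' || before == '\n';
  return !lone;
}
```

A quoted word such as `'foo;'` ends with an APOSTROPHE, so it is never mistaken for the marker.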

Words consisting only of characters in the set [-+_/%.A-Za-z0-9] need
not be quoted. Words containing characters outside that set should be
quoted. A zero-length word is written `''`.

Quoted words begin and end with APOSTROPHE (0x27). Within the quoted
word, BACKSLASH (0x5c) is used as an escape mechanism, with the
following meanings:

* \\n - NEWLINE (0xa)
* \\t - TAB (0x9)
* \\' - APOSTROPHE (')
* \\\\ - BACKSLASH (\\)

Characters in the range [0x00, 0x20) and 0x7f are encoded as a
BACKSLASH followed by one or two lowercase hex characters. Octets in
the range [0x80,0xff) are UTF8 encodings of unicode characters outside
the traditional ASCII set and passed as such.
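
The encoding rules above can be sketched as a small function. This is an illustration under stated assumptions rather than the library's own code: `EncodeWord` and `IsPlain` are hypothetical names, and control characters are always written with two hex digits, which is one of the permitted forms.

```cpp
#include <string>

// Hypothetical helper: true for characters that never require quoting.
static bool IsPlain(unsigned char c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
         || (c >= '0' && c <= '9') || c == '-' || c == '+' || c == '_'
         || c == '/' || c == '%' || c == '.';
}

// Hypothetical helper: encode one word for transmission.
std::string EncodeWord(const std::string &word)
{
  static const char hex[] = "0123456789abcdef";
  bool quote = word.empty();    // a zero-length word must be written ''
  for (unsigned char c : word)
    if (!IsPlain(c))
      quote = true;
  if (!quote)
    return word;

  std::string out = "'";
  for (unsigned char c : word)
    switch (c)
      {
      case '\n': out += "\\n"; break;
      case '\t': out += "\\t"; break;
      case '\'': out += "\\'"; break;
      case '\\': out += "\\\\"; break;
      default:
        if (c < 0x20 || c == 0x7f)
          {
            // Assumption: always emit two hex digits.
            out += '\\';
            out += hex[c >> 4];
            out += hex[c & 0xf];
          }
        else
          out += static_cast<char>(c);  // includes UTF-8 octets >= 0x80
      }
  out += '\'';
  return out;
}
```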

Decoding should be more relaxed. Unquoted words containing characters
in the range [0x20,0xff] other than BACKSLASH or APOSTROPHE should be
accepted. In a quoted sequence, `\` followed by one or two lower case
hex characters decode to that octet. Further, words can be
constructed from a mixture of abutted quoted and unquoted sequences.
For instance `FOO' 'bar` would decode to the word `FOO bar`.

Notice that the block continuation marker of `;` is not a valid
encoding of the word `;`, which would be `';'`.
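
A relaxed decoder along these lines might look as follows. It is a sketch only: `DecodeLine` and `HexValue` are hypothetical names, the input is one line without its NEWLINE, and the caller is assumed to have already removed any trailing `;` continuation marker.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: value of a lowercase hex digit, or -1.
static int HexValue(char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

// Hypothetical helper: split `line` into decoded words.
// Returns false on malformed input (stray backslash, bad escape,
// unterminated quote).
bool DecodeLine(const std::string &line, std::vector<std::string> &words)
{
  std::size_t i = 0, n = line.size();
  while (i < n)
    {
      if (line[i] == ' ' || line[i] == '\t') { ++i; continue; }
      std::string word;
      while (i < n && line[i] != ' ' && line[i] != '\t')
        {
          if (line[i] == '\\') return false;   // escapes only inside quotes
          if (line[i] != '\'') { word += line[i++]; continue; }
          for (++i;;)                          // quoted sequence
            {
              if (i >= n) return false;        // unterminated quote
              char c = line[i++];
              if (c == '\'') break;
              if (c != '\\') { word += c; continue; }
              if (i >= n) return false;
              c = line[i++];
              if (c == 'n') word += '\n';
              else if (c == 't') word += '\t';
              else if (c == '\'' || c == '\\') word += c;
              else
                {
                  int v = HexValue(c);         // one or two hex digits
                  if (v < 0) return false;
                  if (i < n && HexValue(line[i]) >= 0)
                    v = v * 16 + HexValue(line[i++]);
                  word += static_cast<char>(v);
                }
            }
        }
      words.push_back(word);    // abutted sequences form a single word
    }
  return true;
}
```

Because a hex escape may be one or two digits, this sketch greedily consumes a second hex digit when one follows.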

It is recommended that words are separated by single SPACE characters.

## Messages

The message descriptions use `$metavariable` examples.

The request messages are specific to a particular action. The response
messages are more generic, describing their value types, but not their
meaning. Message consumers need to know the request to decode them.
Notice the `Packet::GetRequest()` method records in response packets
what the request being responded to was. Do not confuse this with the
`Packet::GetCode ()` method.

### Responses

The simplest response is a single:

`OK`

This indicates the request was successful.

An error response is:

`ERROR $message`

The message is a human-readable string. It indicates failure of the request.

Pathnames are encoded with:

`PATHNAME $pathname`

Boolean responses use:

`BOOL `(`TRUE`|`FALSE`)

### Handshake Request

The first message is a handshake:

`HELLO $version $compiler $ident`

The `$version` is a numeric value, currently `1`. `$compiler` identifies
the compiler &mdash; builders may need to keep compiled modules from
different compilers separate. `$ident` is an identifier the builder
might use to identify the compilation it is communicating with.

Responses are:

`HELLO $version $builder [$flags]`

A successful handshake. The communication is now connected and other
messages may be exchanged. An ERROR response indicates an unsuccessful
handshake. The communication remains unconnected.

There is nothing restricting a handshake to its own message block. Of
course, if the handshake fails, subsequent non-handshake messages in
the block will fail (producing error responses).

The `$flags` word, if present, allows a server to control what requests
might be given. See below.

### C++ Module Requests

A set of requests is specific to C++ modules:

#### Flags

Several requests and one response have an optional `$flags` word.
These are the `Cody::Flags` value pertaining to that request. If
omitted the value 0 is implied. The following flags are available:

* `0`, `None`: No flags.

* `1<<0`, `NameOnly`: The request is for the name only, and not the
  CMI contents.

The `NameOnly` flag may be provided in a handshake response, and
indicates that the server is interested in requests only for their
implied dependency information. It may be provided on a request to
indicate that only the CMI name is required, not its contents (for
instance, when preprocessing). Note that a compiler may still make
`NameOnly` requests even if the server did not ask for such.
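
To make the "omitted means 0" rule concrete, here is a small sketch of reading an optional `$flags` word from an already-decoded message. The enum below is a stand-in that mirrors the values listed above, not the library's `Cody::Flags` definition; `FlagsAt` and `HasFlag` are hypothetical helpers, and the flags word is assumed to be a decimal integer.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Stand-in for the flag values described above (not the library's own type).
enum class Flags : unsigned
{
  None = 0,
  NameOnly = 1u << 0
};

// Hypothetical helper: read the optional flags word at `index`.
// A missing word implies the value 0.
// Assumption: the word is a decimal integer.
Flags FlagsAt(const std::vector<std::string> &words, std::size_t index)
{
  if (index >= words.size())
    return Flags::None;
  return static_cast<Flags>(std::strtoul(words[index].c_str(), nullptr, 10));
}

// Hypothetical helper: test whether `flag` is set in `value`.
bool HasFlag(Flags value, Flags flag)
{
  return (static_cast<unsigned>(value) & static_cast<unsigned>(flag)) != 0;
}
```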

#### Repository

All relative CMI file names are relative to a repository. (There are
usually no absolute CMI files.) The repository may be determined
with:

`MODULE-REPO`

A PATHNAME response is expected. The `$pathname` may be an empty
word, which is equivalent to `.`. When the response is a relative
pathname, it must be relative to the client's current working
directory (which might be a process on a different host to the
server). You may set the repository to `/`, if you wish to use paths
relative to the root directory.

#### Exporting

A compilation of a module interface, partition or header unit can
inform the builder with:

`MODULE-EXPORT $module [$flags]`

This will result in a PATHNAME response naming the Compiled Module
Interface pathname to write.

The `MODULE-EXPORT` request does not indicate the module has been
successfully compiled. At most one `MODULE-EXPORT` is to be made, and
as the connection is for a single compilation, the builder may infer
dependency relationships between the module being generated and import
requests made.

Named module names and header unit names are distinguished by making
the latter unambiguously look like file names. Firstly, they must be
fully resolved according to the compiler's usual include path. If
that results in an absolute file name (beginning with `/`, or
certain other OS-specific sequences), all is well. Otherwise a
relative file name must be prefixed by `./` to be distinguished from a
similarly named named module. This prefixing must occur, even if the
header-unit's name contains characters that cannot appear in a named
module's name.

It is expected that absolute header-unit names convert to relative CMI
names, to keep all CMIs within the CMI repository. This means that
steps must be taken to distinguish the CMIs for `/here` from `./here`,
and this can be achieved by replacing the leading `./` directory with
`,/`, which is visually similar but does not have the self-reference
semantics of dot. Likewise, header-unit names containing `..`
directories, can be remapped to `,,`. (When symlinks are involved
`bob/dob/..` might not be `bob`, of course.) C++ header-unit
semantics are such that there is no need to resolve multiple ways of
spelling a particular header-unit to a unique CMI file.
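
One possible reading of that remapping is sketched below. It is not the library's resolver: `HeaderUnitToCmiName` is a hypothetical name, the choice to make `/here` relative by prepending `.` is an assumption (the text only requires that the result be relative and distinct from the CMI for `./here`), and no CMI file suffix is added.

```cpp
#include <string>

// Hypothetical helper: map a module or header-unit name to a relative
// CMI name inside the repository.
std::string HeaderUnitToCmiName(const std::string &name)
{
  std::string mapped;
  if (!name.empty() && name[0] == '/')
    mapped = "." + name;              // assumption: "/here" becomes "./here"
  else if (name.compare(0, 2, "./") == 0)
    mapped = "," + name.substr(1);    // "./here" becomes ",/here"
  else
    return name;                      // a named module, left untouched

  // Remap every ".." directory component to ",,".
  std::string out;
  std::size_t start = 0;
  while (start <= mapped.size())
    {
      std::size_t end = mapped.find('/', start);
      if (end == std::string::npos)
        end = mapped.size();
      std::string part = mapped.substr(start, end - start);
      out += (part == "..") ? ",," : part;
      if (end < mapped.size())
        out += '/';
      start = end + 1;
    }
  return out;
}
```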

Successful compilation of an interface is indicated with a subsequent:

`MODULE-COMPILED $module [$flags]`

request. This indicates the CMI file has been written to disk, so
that any other compilations waiting on it may proceed. Depending on
compiler implementation, the CMI may be written before the compilation
completes. A single OK response is expected.

Compilation failure can be inferred by lack of a `MODULE-COMPILED`
request. It is presumed the builder can determine this, as it is also
responsible for launching and reaping the compiler invocations
themselves.

#### Importing

Importation, including that of header-units, uses:

`MODULE-IMPORT $module [$flags]`

A PATHNAME response names the CMI file to be read. Should the builder
have to invoke a compilation to produce the CMI, the response should
be delayed until that occurs. If such a compilation fails, an error
response should be provided to the requestor &mdash; which will then
presumably fail in some manner.

#### Include Translation

Include translation can be determined with:

`INCLUDE-TRANSLATE $header [$flags]`

The header name, `$header`, is the fully resolved header name, in the
above-mentioned unambiguous filename form. The response will either
be a BOOL response indicating textual inclusion, or a PATHNAME
response naming the CMI for such translation. The BOOL value is TRUE,
if the header is known to be a textual header, and FALSE if nothing is
known about it -- the latter might cause diagnostics about incomplete
knowledge.
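
The three outcomes can be told apart from the decoded response words alone. The following is a sketch under assumptions, not the library's API: `IncludeAction` and `ClassifyIncludeResponse` are hypothetical names, and the response is assumed to have been split into words already.

```cpp
#include <string>
#include <vector>

// Hypothetical classification of an INCLUDE-TRANSLATE response.
enum class IncludeAction
{
  Textual,   // BOOL TRUE: known to be a textual header
  Unknown,   // BOOL FALSE: nothing is known about the header
  Import,    // PATHNAME: translate the include to an import of this CMI
  Error      // ERROR or anything unexpected
};

// Hypothetical helper: on Import, `cmi` receives the CMI pathname.
IncludeAction ClassifyIncludeResponse(const std::vector<std::string> &words,
                                      std::string &cmi)
{
  if (words.size() == 2 && words[0] == "BOOL")
    {
      if (words[1] == "TRUE")
        return IncludeAction::Textual;
      if (words[1] == "FALSE")
        return IncludeAction::Unknown;
    }
  if (words.size() == 2 && words[0] == "PATHNAME")
    {
      cmi = words[1];
      return IncludeAction::Import;
    }
  return IncludeAction::Error;
}
```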

### GCC LTO Messages

This set of requests is used for GCC LTO jobserver integration with GNU Make.

## Building libCody

Libcody is written in C++11. (It's intended for compilers, so
there'd be a bootstrapping problem if it used the latest and greatest.)

### Using configure and make

It supports the usual `configure`, `make`, `make check` & `make install`
sequence. It does not support building in the source directory --
that just didn't drop out, and it's not how I build things (because,
again, for compilers). Excitingly it uses my own `joust` test
harness, so you'll need to build and install that somewhere, if you
want the comfort of testing.

The following configure options are available, in addition to the usual set:

* `--enable-checking` Compile with assert-like checking. Defaults to on.

* `--with-tooldir=DIR` Prepend `DIR` to `PATH` when building (`DIR`
  need not already include the trailing `/bin`, and the right things
  happen). Use this if you need to point to non-standard tools that
  you usually don't have in your path. This path is also used when
  the configure script searches for programs.

* `--with-toolinc=DIR`, `--with-toollib=DIR`, include path and library
  path variants of `--with-tooldir`. If these are siblings of the
  tool bin directory, they'll be found automatically.

* `--with-compiler=NAME` Specify a particular compiler to use.
  Usually what configure finds is sufficiently usable.

* `--with-bugurl=URL` Override the bugreporting URL. Do this if
  you're providing libcody as part of a package that /you/ are
  supporting.

* `--enable-maintainer-mode` Specify that rules to rebuild things like
  `configure` (with `autoconf`) should be enabled. When not enabled,
  you'll get a message if these appear out of date, but that can
  happen naturally after an update or clone as `git`, in common with
  other VCs, doesn't preserve the relative ordering of file
  modifications. You can use `make MAINTAINER=touch` to shut make up,
  if this occurs (or manually execute the `autoconf` and related
  commands).

When building, you can override the default optimization flags with
`CXXFLAGS=$flags`. I often build a debuggable library with `make
CXXFLAGS=-g3`.

The `Makefile` will also parallelize according to the number of CPUs,
unless you specify explicitly with a `-j` option. This is a little
clunky, as it's not possible to figure out inside the makefile whether
the user provided `-j`. (Or at least I've not figured out how.)

### Using cmake and make

#### In the clang/LLVM project

The primary motivation for a cmake implementation is to allow building
libcody "in tree" in clang/LLVM. In that case, a checkout of libcody
can be placed (or symbolically linked) into clang/tools. This will
configure and build the library along with other LLVM dependencies.

*NOTE* This is not treated as an installable entity (it is present only
for use by the project).

*NOTE* The testing targets would not be appropriate in this configuration;
it is expected that lit-based testing of the required functionality will be
done by the code using the library.

#### Stand-alone

For use on platforms that don't support configure & make effectively, it
is possible to use the cmake & make process in stand-alone mode (similar
to the configure & make process above).

An example use:
```
cmake -DCMAKE_INSTALL_PREFIX=/path/to/installation -DCMAKE_CXX_COMPILER=clang++ /path/to/libcody/source
make
make install
```

Supported flags (additions to the usual cmake ones).

* `-DCODY_CHECKING=ON,OFF`: Compile with assert-like checking. (defaults ON)

* `-DCODY_WITHEXCEPTIONS=ON,OFF`: Compile with C++ exceptions and RTTI enabled.
  (defaults OFF, to be compatible with GCC and LLVM).

*TODO*: At present there is no support for `ctest` integration (this should be
feasible, provided that `joust` is installed and can be discovered by `cmake`).

## API

The library defines entities in the `::Cody` namespace.

There are 4 user-visible classes:

* `Packet`: Responses to requests are `Packets`. These have a code,
  indicating the response kind, and a payload.

* `Client`: The compiler-end of a connection. Requests may be made
  and responses are returned.

* `Server`: The builder-end of a connection. Requests may be waited
  for, and responses made. Builders that serve multiple concurrent
  connections and spawn compilations to resolve dependencies may need
  to derive from this class to provide response queuing.

* `Resolver`: The processing engine of the builder side. User code is
  expected to derive from this class and provide virtual function
  overriders to affect the semantics of the resolver.

In addition there are a number of helpers to set up connections.

Logically the Client and the Server communicate via a sequential
channel. The channel may be provided by:

* two pipes, with different file descriptors for reading and writing
  at each end.

* a socket, which will use the same file descriptor for reading and
  writing. The socket can be created in a number of ways, including
  Unix domain and IPv6 TCP, for which helpers are provided.

* a direct, in-process, connection, using buffer swapping.

The communication channel is presumed reliable.

Refer to the (currently very sparse) doxygen-generated documentation
for details of the API.

## Examples

To create an in-process resolver, use the following boilerplate:

```
// Derive publicly, so the Server can be handed a Cody::Resolver *.
class MyResolver : public Cody::Resolver { ... stuff here ... };

Cody::Client *MakeClient (char const *maybe_ident)
{
  auto *r = new MyResolver (...);
  auto *s = new Cody::Server (r);
  auto *c = new Cody::Client (s);   // direct, in-process connection

  // Perform the handshake.
  auto t = c->ConnectRequest ("ME", maybe_ident);
  if (t.GetCode () == Cody::Client::TC_CONNECT)
    ;// Yay!
  else if (t.GetCode () == Cody::Client::TC_ERROR)
    report_error (t.GetString ());

  return c;
}

```

For a remotely connecting client:
```
Cody::Client *MakeClient (char const *name, int port, char const *maybe_ident)
{
  char const *err = nullptr;
  int fd = Cody::OpenInet6 (&err, name, port);   // err describes any failure
  if (fd < 0)
    { ... error... return nullptr;}

  auto *c = new Cody::Client (fd);   // same descriptor for reading and writing

  // Perform the handshake.
  auto t = c->ConnectRequest ("ME", maybe_ident);
  if (t.GetCode () == Cody::Client::TC_CONNECT)
    ;// Yay!
  else if (t.GetCode () == Cody::Client::TC_ERROR)
    report_error (t.GetString ());

  return c;
}
```

## Future Directions

* Current Directory. There is no mechanism to check the builder and
  the compiler have the same working directory. Perhaps that should
  be addressed.

* Include path canonization and/or header file lookup. This can be
  expensive, particularly with many `-I` options, due to the system
  calls. Perhaps using a common resource would be cheaper?

* Generated header file lookup/construction. This is essentially the
  same problem as importing a module, and build systems are crap at
  dealing with this.

* Link-time compilations. Another place the compiler would like to
  ask the build system to do things.

* C++20 API entrypoints &mdash; `std::string_view` would be nice

* Exception-safety audit. Exceptions are not used, but memory
  exhaustion could happen. And perhaps user's resolver code employs
  exceptions?

<a name="1">1</a>: Or a small town in Wyoming

<a name="2">2</a>: This describes one common implementation technique.
The std itself doesn't require such serializations, but the ability to
create them is kind of the point. Also, 'compiler' is used where we
mean any consumer of a module, and 'build system' where we mean any
producer of a module.

<a name="3">3</a>: Even when the builder is managing a distributed set
of compilations, the builder must have a mechanism to get source files
to, and object files from, the compilations. That scheme can also
transfer the CMI files.