]>
Commit | Line | Data |
---|---|---|
0a35513e AH |
1 | \input texinfo @c -*-texinfo-*- |
2 | ||
3 | @c %**start of header | |
4 | @setfilename libitm.info | |
5 | @settitle GNU libitm | |
6 | @c %**end of header | |
7 | ||
8 | ||
9 | @copying | |
98db73df | 10 | Copyright @copyright{} 2011-2014 Free Software Foundation, Inc. |
0a35513e AH |
11 | |
12 | Permission is granted to copy, distribute and/or modify this document | |
13 | under the terms of the GNU Free Documentation License, Version 1.2 or | |
14 | any later version published by the Free Software Foundation; with no | |
15 | Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. | |
16 | A copy of the license is included in the section entitled ``GNU | |
17 | Free Documentation License''. | |
18 | @end copying | |
19 | ||
20 | @ifinfo | |
21 | @dircategory GNU Libraries | |
22 | @direntry | |
23 | * libitm: (libitm). GNU Transactional Memory Library | |
24 | @end direntry | |
25 | ||
26 | This manual documents the GNU Transactional Memory Library. | |
27 | ||
28 | @insertcopying | |
29 | @end ifinfo | |
30 | ||
31 | ||
32 | @setchapternewpage odd | |
33 | ||
34 | @titlepage | |
35 | @title The GNU Transactional Memory Library | |
36 | @page | |
37 | @vskip 0pt plus 1filll | |
38 | @comment For the @value{version-GCC} Version* | |
39 | @sp 1 | |
40 | @insertcopying | |
41 | @end titlepage | |
42 | ||
43 | @summarycontents | |
44 | @contents | |
45 | @page | |
46 | ||
47 | ||
48 | @node Top | |
49 | @top Introduction | |
50 | @cindex Introduction | |
51 | ||
52 | This manual documents the usage and internals of libitm, the GNU Transactional | |
53 | Memory Library. It provides transaction support for accesses to a process' | |
54 | memory, enabling easy-to-use synchronization of accesses to shared memory by | |
55 | several threads. | |
56 | ||
57 | ||
58 | @comment | |
59 | @comment When you add a new menu item, please keep the right hand | |
60 | @comment aligned to the same column. Do not use tabs. This provides | |
61 | @comment better formatting. | |
62 | @comment | |
63 | @menu | |
64 | * Enabling libitm:: How to enable libitm for your applications. | |
65 | * C/C++ Language Constructs for TM:: | |
66 | Notes on the language-level interface supported | |
67 | by gcc. | |
68 | * The libitm ABI:: Notes on the external ABI provided by libitm. | |
69 | * Internals:: Notes on libitm's internal synchronization. | |
70 | * GNU Free Documentation License:: | |
71 | How you can copy and share this manual. | |
c288ec8f | 72 | * Library Index:: Index of this documentation. |
0a35513e AH |
73 | @end menu |
74 | ||
75 | ||
76 | @c --------------------------------------------------------------------- | |
77 | @c Enabling libitm | |
78 | @c --------------------------------------------------------------------- | |
79 | ||
80 | @node Enabling libitm | |
81 | @chapter Enabling libitm | |
82 | ||
83 | To activate support for TM in C/C++, the compile-time flag @option{-fgnu-tm} | |
84 | must be specified. This enables TM language-level constructs such as | |
034209bc TR |
85 | transaction statements (e.g., @code{__transaction_atomic}, @pxref{C/C++ |
86 | Language Constructs for TM} for details). | |
0a35513e AH |
87 | |
88 | @c --------------------------------------------------------------------- | |
89 | @c C/C++ Language Constructs for TM | |
90 | @c --------------------------------------------------------------------- | |
91 | ||
92 | @node C/C++ Language Constructs for TM | |
93 | @chapter C/C++ Language Constructs for TM | |
94 | ||
034209bc TR |
95 | Transactions are supported in C++ and C in the form of transaction statements, |
96 | transaction expressions, and function transactions. In the following example, | |
97 | both @code{a} and @code{b} will be read and the difference will be written to | |
98 | @code{c}, all atomically and isolated from other transactions: | |
99 | ||
100 | @example | |
101 | __transaction_atomic @{ c = a - b; @} | |
102 | @end example | |
103 | ||
104 | Therefore, another thread can use the following code to concurrently update | |
105 | @code{b} without ever causing @code{c} to hold a negative value (and without | |
106 | having to use other synchronization constructs such as locks or C++11 | |
107 | atomics): | |
108 | ||
109 | @example | |
110 | __transaction_atomic @{ if (a > b) b++; @} | |
111 | @end example | |
112 | ||
113 | GCC follows the @uref{https://sites.google.com/site/tmforcplusplus/, Draft | |
114 | Specification of Transactional Language Constructs for C++ (v1.1)} in its | |
115 | implementation of transactions. | |
116 | ||
117 | The precise semantics of transactions are defined in terms of the C++11/C11 | |
118 | memory model (see the specification). Roughly, transactions provide | |
119 | synchronization guarantees that are similar to what would be guaranteed when | |
120 | using a single global lock as a guard for all transactions. Note that like | |
121 | other synchronization constructs in C/C++, transactions rely on a | |
122 | data-race-free program (e.g., a nontransactional write that is concurrent | |
123 | with a transactional read to the same memory location is a data race). | |
0a35513e AH |
124 | |
125 | @c --------------------------------------------------------------------- | |
126 | @c The libitm ABI | |
127 | @c --------------------------------------------------------------------- | |
128 | ||
129 | @node The libitm ABI | |
130 | @chapter The libitm ABI | |
131 | ||
132 | The ABI provided by libitm is basically equal to the Linux variant of Intel's | |
133 | current TM ABI specification document (Revision 1.1, May 6 2009) but with the | |
134 | differences listed in this chapter. It would be good if these changes would | |
135 | eventually be merged into a future version of this specification. To ease | |
136 | look-up, the following subsections mirror the structure of this specification. | |
137 | ||
138 | @section [No changes] Objectives | |
139 | @section [No changes] Non-objectives | |
140 | ||
141 | @section Library design principles | |
142 | @subsection [No changes] Calling conventions | |
143 | @subsection [No changes] TM library algorithms | |
144 | @subsection [No changes] Optimized load and store routines | |
145 | @subsection [No changes] Aligned load and store routines | |
146 | ||
147 | @subsection Data logging functions | |
148 | ||
149 | The memory locations accessed with transactional loads and stores and the | |
150 | memory locations whose values are logged must not overlap. This required | |
151 | separation only extends to the scope of the execution of one transaction | |
152 | including all the executions of all nested transactions. | |
153 | ||
154 | The compiler must be consistent (within the scope of a single transaction) | |
155 | about which memory locations are shared and which are not shared with other | |
156 | threads (i.e., data must be accessed either transactionally or | |
157 | nontransactionally). Otherwise, non-write-through TM algorithms would not work. | |
158 | ||
eb00e959 TR |
159 | For memory locations on the stack, this requirement extends to only the |
160 | lifetime of the stack frame that the memory location belongs to (or the | |
161 | lifetime of the transaction, whichever is shorter). Thus, memory that is | |
162 | reused for several stack frames could be target of both data logging and | |
163 | transactional accesses; however, this is harmless because these stack frames' | |
164 | lifetimes will end before the transaction finishes. | |
165 | ||
0a35513e AH |
166 | @subsection [No changes] Scatter/gather calls |
167 | @subsection [No changes] Serial and irrevocable mode | |
168 | @subsection [No changes] Transaction descriptor | |
169 | @subsection Store allocation | |
170 | ||
171 | There is no @code{getTransaction} function. | |
172 | ||
173 | @subsection [No changes] Naming conventions | |
174 | ||
175 | @subsection Function pointer encryption | |
176 | ||
177 | Currently, this is not implemented. | |
178 | ||
179 | ||
180 | @section Types and macros list | |
181 | ||
182 | @code{_ITM_codeProperties} has changed, @pxref{txn-code-properties,,Starting a | |
183 | transaction}. | |
184 | @code{_ITM_srcLocation} is not used. | |
185 | ||
186 | ||
187 | @section Function list | |
188 | ||
189 | @subsection Initialization and finalization functions | |
190 | These functions are not part of the ABI. | |
191 | ||
192 | @subsection [No changes] Version checking | |
193 | @subsection [No changes] Error reporting | |
194 | @subsection [No changes] inTransaction call | |
195 | ||
196 | @subsection State manipulation functions | |
197 | There is no @code{getTransaction} function. Transaction identifiers for | |
198 | nested transactions will be ordered but not necessarily sequential (i.e., for | |
199 | a nested transaction's identifier @var{IN} and its enclosing transaction's | |
200 | identifier @var{IE}, it is guaranteed that @math{IN >= IE}). | |
201 | ||
202 | @subsection [No changes] Source locations | |
203 | ||
204 | @subsection Starting a transaction | |
205 | ||
206 | @subsubsection Transaction code properties | |
207 | ||
208 | @anchor{txn-code-properties} | |
209 | The bit @code{hasNoXMMUpdate} is instead called @code{hasNoVectorUpdate}. | |
210 | Iff it is set, vector register save/restore is not necessary for any target | |
211 | machine. | |
212 | ||
213 | The @code{hasNoFloatUpdate} bit (@code{0x0010}) is new. Iff it is set, floating | |
214 | point register save/restore is not necessary for any target machine. | |
215 | ||
216 | @code{undoLogCode} is not supported and a fatal runtime error will be raised | |
217 | if this bit is set. It is not properly defined in the ABI why barriers | |
218 | other than undo logging are not present; Are they not necessary (e.g., a | |
219 | transaction operating purely on thread-local data) or have they been omitted by | |
220 | the compiler because it thinks that some kind of global synchronization | |
221 | (e.g., serial mode) might perform better? The specification suggests that the | |
222 | latter might be the case, but the former seems to be more useful. | |
223 | ||
224 | The @code{readOnly} bit (@code{0x4000}) is new. @strong{TODO} Lexical or dynamic | |
225 | scope? | |
226 | ||
227 | @code{hasNoRetry} is not supported. If this bit is not set, but | |
228 | @code{hasNoAbort} is set, the library can assume that transaction | |
229 | rollback will not be requested. | |
230 | ||
231 | It would be useful if the absence of externally-triggered rollbacks would be | |
232 | reported for the dynamic scope as well, not just for the lexical scope | |
233 | (@code{hasNoAbort}). Without this, a library cannot exploit this together | |
234 | with flat nesting. | |
235 | ||
236 | @code{exceptionBlock} is not supported because exception blocks are not used. | |
237 | ||
238 | @subsubsection [No changes] Windows exception state | |
239 | @subsubsection [No changes] Other machine state | |
240 | ||
241 | @subsubsection [No changes] Results from beginTransaction | |
242 | ||
243 | @subsection Aborting a transaction | |
244 | ||
245 | @code{_ITM_rollbackTransaction} is not supported. @code{_ITM_abortTransaction} | |
246 | is supported but the abort reasons @code{exceptionBlockAbort}, | |
247 | @code{TMConflict}, and @code{userRetry} are not supported. There are no | |
248 | exception blocks in general, so the related cases also do not have to be | |
249 | considered. To encode @code{__transaction_cancel [[outer]]}, compilers must | |
250 | set the new @code{outerAbort} bit (@code{0x10}) additionally to the | |
251 | @code{userAbort} bit in the abort reason. | |
252 | ||
253 | @subsection Committing a transaction | |
254 | ||
255 | The exception handling (EH) scheme is different. The Intel ABI requires the | |
256 | @code{_ITM_tryCommitTransaction} function that will return even when the | |
257 | commit failed and will have to be matched with calls to either | |
258 | @code{_ITM_abortTransaction} or @code{_ITM_commitTransaction}. In contrast, | |
259 | gcc relies on transactional wrappers for the functions of the Exception | |
260 | Handling ABI and on one additional commit function (shown below). This allows | |
261 | the TM to keep track of EH internally and thus it does not have to embed the | |
262 | cleanup of EH state into the existing EH code in the program. | |
263 | @code{_ITM_tryCommitTransaction} is not supported. | |
264 | @code{_ITM_commitTransactionToId} is also not supported because the | |
265 | propagation of thrown exceptions will not bypass commits of nested | |
266 | transactions. | |
267 | ||
268 | @example | |
269 | void _ITM_commitTransactionEH(void *exc_ptr) ITM_REGPARM; | |
270 | void *_ITM_cxa_allocate_exception (size_t); | |
271 | void _ITM_cxa_throw (void *obj, void *tinfo, void *dest); | |
272 | void *_ITM_cxa_begin_catch (void *exc_ptr); | |
273 | void _ITM_cxa_end_catch (void); | |
274 | @end example | |
275 | ||
276 | @code{_ITM_commitTransactionEH} must be called to commit a transaction if an | |
277 | exception could be in flight at this position in the code. @code{exc_ptr} is | |
278 | the current exception or zero if there is no current exception. | |
279 | The @code{_ITM_cxa...} functions are transactional wrappers for the respective | |
280 | @code{__cxa...} functions and must be called instead of these in transactional | |
281 | code. | |
282 | ||
283 | To support this EH scheme, libstdc++ needs to provide one additional function | |
284 | (@code{_cxa_tm_cleanup}), which is used by the TM to clean up the exception | |
285 | handling state while rolling back a transaction: | |
286 | ||
287 | @example | |
288 | void __cxa_tm_cleanup (void *unthrown_obj, void *cleanup_exc, | |
289 | unsigned int caught_count); | |
290 | @end example | |
291 | ||
292 | @code{unthrown_obj} is non-null if the program called | |
293 | @code{__cxa_allocate_exception} for this exception but did not yet called | |
294 | @code{__cxa_throw} for it. @code{cleanup_exc} is non-null if the program is | |
295 | currently processing a cleanup along an exception path but has not caught this | |
296 | exception yet. @code{caught_count} is the nesting depth of | |
297 | @code{__cxa_begin_catch} within the transaction (which can be counted by the TM | |
298 | using @code{_ITM_cxa_begin_catch} and @code{_ITM_cxa_end_catch}); | |
299 | @code{__cxa_tm_cleanup} then performs rollback by essentially performing | |
300 | @code{__cxa_end_catch} that many times. | |
301 | ||
302 | ||
303 | ||
304 | @subsection Exception handling support | |
305 | ||
306 | Currently, there is no support for functionality like | |
307 | @code{__transaction_cancel throw} as described in the C++ TM specification. | |
308 | Supporting this should be possible with the EH scheme explained previously | |
309 | because via the transactional wrappers for the EH ABI, the TM is able to | |
310 | observe and intercept EH. | |
311 | ||
312 | ||
313 | @subsection [No changes] Transition to serial--irrevocable mode | |
314 | @subsection [No changes] Data transfer functions | |
315 | @subsection [No changes] Transactional memory copies | |
316 | ||
317 | @subsection Transactional versions of memmove | |
318 | ||
319 | If either the source or destination memory region is to be accessed | |
320 | nontransactionally, then source and destination regions must not be | |
321 | overlapping. The respective @code{_ITM_memmove} functions are still | |
322 | available but a fatal runtime error will be raised if such regions do overlap. | |
323 | To support this functionality, the ABI would have to specify how the | |
324 | intersection of the regions has to be accessed (i.e., transactionally or | |
325 | nontransactionally). | |
326 | ||
327 | @subsection [No changes] Transactional versions of memset | |
328 | @subsection [No changes] Logging functions | |
329 | ||
330 | @subsection User-registered commit and undo actions | |
331 | ||
332 | Commit actions will get executed in the same order in which the respective | |
333 | calls to @code{_ITM_addUserCommitAction} happened. Only | |
334 | @code{_ITM_noTransactionId} is allowed as value for the | |
335 | @code{resumingTransactionId} argument. Commit actions get executed after | |
336 | privatization safety has been ensured. | |
337 | ||
338 | Undo actions will get executed in reverse order compared to the order in which | |
339 | the respective calls to @code{_ITM_addUserUndoAction} happened. The ordering of | |
340 | undo actions w.r.t. the roll-back of other actions (e.g., data transfers or | |
341 | memory allocations) is undefined. | |
342 | ||
343 | @code{_ITM_getThreadnum} is not supported currently because its only purpose | |
344 | is to provide a thread ID that matches some assumed performance tuning output, | |
345 | but this output is not part of the ABI nor further defined by it. | |
346 | ||
347 | @code{_ITM_dropReferences} is not supported currently because its semantics and | |
348 | the intention behind it is not entirely clear. The | |
349 | specification suggests that this function is necessary because of certain | |
350 | orderings of data transfer undos and the releasing of memory regions (i.e., | |
351 | privatization). However, this ordering is never defined, nor is the ordering of | |
352 | dropping references w.r.t. other events. | |
353 | ||
354 | @subsection [New] Transactional indirect calls | |
355 | ||
356 | Indirect calls (i.e., calls through a function pointer) within transactions | |
357 | should execute the transactional clone of the original function (i.e., a clone | |
358 | of the original that has been fully instrumented to use the TM runtime), if | |
359 | such a clone is available. The runtime provides two functions to | |
360 | register/deregister clone tables: | |
361 | ||
362 | @example | |
363 | struct clone_entry | |
364 | @{ | |
365 | void *orig, *clone; | |
366 | @}; | |
367 | ||
368 | void _ITM_registerTMCloneTable (clone_entry *table, size_t entries); | |
369 | void _ITM_deregisterTMCloneTable (clone_entry *table); | |
370 | @end example | |
371 | ||
372 | Registered tables must be writable by the TM runtime, and must be live | |
373 | throughout the life-time of the TM runtime. | |
374 | ||
375 | @strong{TODO} The intention was always to drop the registration functions | |
376 | entirely, and create a new ELF Phdr describing the linker-sorted table. Much | |
377 | like what currently happens for @code{PT_GNU_EH_FRAME}. | |
378 | This work kept getting bogged down in how to represent the @var{N} different | |
379 | code generation variants. We clearly needed at least two---SW and HW | |
380 | transactional clones---but there was always a suggestion of more variants for | |
381 | different TM assumptions/invariants. | |
382 | ||
383 | The compiler can then use two TM runtime functions to perform indirect calls in | |
384 | transactions: | |
385 | @example | |
386 | void *_ITM_getTMCloneOrIrrevocable (void *function) ITM_REGPARM; | |
387 | void *_ITM_getTMCloneSafe (void *function) ITM_REGPARM; | |
388 | @end example | |
389 | ||
390 | If there is a registered clone for supplied function, both will return a | |
391 | pointer to the clone. If not, the first runtime function will attempt to switch | |
392 | to serial--irrevocable mode and return the original pointer, whereas the second | |
393 | will raise a fatal runtime error. | |
394 | ||
395 | @subsection [New] Transactional dynamic memory management | |
396 | ||
397 | @example | |
398 | void *_ITM_malloc (size_t) | |
399 | __attribute__((__malloc__)) ITM_PURE; | |
400 | void *_ITM_calloc (size_t, size_t) | |
401 | __attribute__((__malloc__)) ITM_PURE; | |
402 | void _ITM_free (void *) ITM_PURE; | |
403 | @end example | |
404 | ||
405 | These functions are essentially transactional wrappers for @code{malloc}, | |
406 | @code{calloc}, and @code{free}. Within transactions, the compiler should | |
407 | replace calls to the original functions with calls to the wrapper functions. | |
408 | ||
409 | ||
410 | @section [No changes] Future Enhancements to the ABI | |
411 | ||
412 | @section Sample code | |
413 | ||
414 | The code examples might not be correct w.r.t. the current version of the ABI, | |
415 | especially everything related to exception handling. | |
416 | ||
417 | ||
418 | @section [New] Memory model | |
419 | ||
420 | The ABI should define a memory model and the ordering that is guaranteed for | |
421 | data transfers and commit/undo actions, or at least refer to another memory | |
422 | model that needs to be preserved. Without that, the compiler cannot ensure the | |
423 | memory model specified on the level of the programming language (e.g., by the | |
424 | C++ TM specification). | |
425 | ||
426 | For example, if a transactional load is ordered before another load/store, then | |
427 | the TM runtime must also ensure this ordering when accessing shared state. If | |
428 | not, this might break the kind of publication safety used in the C++ TM | |
429 | specification. Likewise, the TM runtime must ensure privatization safety. | |
430 | ||
431 | ||
432 | ||
433 | @c --------------------------------------------------------------------- | |
434 | @c Internals | |
435 | @c --------------------------------------------------------------------- | |
436 | ||
437 | @node Internals | |
438 | @chapter Internals | |
439 | ||
440 | @section TM methods and method groups | |
441 | ||
442 | libitm supports several ways of synchronizing transactions with each other. | |
443 | These TM methods (or TM algorithms) are implemented in the form of | |
444 | subclasses of @code{abi_dispatch}, which provide methods for | |
445 | transactional loads and stores as well as callbacks for rollback and commit. | |
446 | All methods that are compatible with each other (i.e., that let concurrently | |
447 | running transactions still synchronize correctly even if different methods | |
448 | are used) belong to the same TM method group. Pointers to TM methods can be | |
449 | obtained using the factory methods prefixed with @code{dispatch_} in | |
450 | @file{libitm_i.h}. There are two special methods, @code{dispatch_serial} and | |
451 | @code{dispatch_serialirr}, that are compatible with all methods because they | |
452 | run transactions completely in serial mode. | |
453 | ||
454 | @subsection TM method life cycle | |
455 | ||
456 | The state of TM methods does not change after construction, but they do alter | |
457 | the state of transactions that use this method. However, because | |
458 | per-transaction data gets used by several methods, @code{gtm_thread} is | |
459 | responsible for setting an initial state that is useful for all methods. | |
460 | After that, methods are responsible for resetting/clearing this state on each | |
461 | rollback or commit (of outermost transactions), so that the transaction | |
462 | executed next is not affected by the previous transaction. | |
463 | ||
464 | There is also global state associated with each method group, which is | |
465 | initialized and shut down (@code{method_group::init()} and @code{fini()}) | |
466 | when switching between method groups (see @file{retry.cc}). | |
467 | ||
468 | @subsection Selecting the default method | |
469 | ||
470 | The default method that libitm uses for freshly started transactions (but | |
471 | not necessarily for restarted transactions) can be set via an environment | |
472 | variable (@env{ITM_DEFAULT_METHOD}), whose value should be equal to the name | |
473 | of one of the factory methods returning abi_dispatch subclasses but without | |
474 | the "dispatch_" prefix (e.g., "serialirr" instead of | |
475 | @code{GTM::dispatch_serialirr()}). | |
476 | ||
477 | Note that this environment variable is only a hint for libitm and might not | |
478 | be supported in the future. | |
479 | ||
480 | ||
481 | @section Nesting: flat vs. closed | |
482 | ||
483 | We support two different kinds of nesting of transactions. In the case of | |
484 | @emph{flat nesting}, the nesting structure is flattened and all nested | |
485 | transactions are subsumed by the enclosing transaction. In contrast, | |
486 | with @emph{closed nesting}, nested transactions that have not yet committed | |
487 | can be rolled back separately from the enclosing transactions; when they | |
488 | commit, they are subsumed by the enclosing transaction, and their effects | |
489 | will be finally committed when the outermost transaction commits. | |
490 | @emph{Open nesting} (where nested transactions can commit independently of the | |
491 | enclosing transactions) are not supported. | |
492 | ||
493 | Flat nesting is the default nesting mode, but closed nesting is supported and | |
494 | used when transactions contain user-controlled aborts | |
495 | (@code{__transaction_cancel} statements). We assume that user-controlled | |
496 | aborts are rare in typical code and used mostly in exceptional situations. | |
497 | Thus, it makes more sense to use flat nesting by default to avoid the | |
498 | performance overhead of the additional checkpoints required for closed | |
499 | nesting. User-controlled aborts will correctly abort the innermost enclosing | |
500 | transaction, whereas the whole (i.e., outermost) transaction will be restarted | |
501 | otherwise (e.g., when a transaction encounters data conflicts during | |
502 | optimistic execution). | |
503 | ||
504 | ||
505 | @section Locking conventions | |
506 | ||
507 | This section documents the locking scheme and rules for all uses of locking | |
508 | in libitm. We have to support serial(-irrevocable) mode, which is implemented | |
509 | using a global lock as explained next (called the @emph{serial lock}). To | |
510 | simplify the overall design, we use the same lock as catch-all locking | |
511 | mechanism for other infrequent tasks such as (de)registering clone tables or | |
512 | threads. Besides the serial lock, there are @emph{per-method-group locks} that | |
513 | are managed by specific method groups (i.e., groups of similar TM concurrency | |
514 | control algorithms), and lock-like constructs for quiescence-based operations | |
515 | such as ensuring privatization safety. | |
516 | ||
517 | Thus, the actions that participate in the libitm-internal locking are either | |
518 | @emph{active transactions} that do not run in serial mode, @emph{serial | |
519 | transactions} (which (are about to) run in serial mode), and management tasks | |
520 | that do not execute within a transaction but have acquired the serial mode | |
521 | like a serial transaction would do (e.g., to be able to register threads with | |
522 | libitm). Transactions become active as soon as they have successfully used the | |
523 | serial lock to announce this globally (@pxref{serial-lock-impl,,Serial lock | |
524 | implementation}). Likewise, transactions become serial transactions as soon as | |
525 | they have acquired the exclusive rights provided by the serial lock (i.e., | |
526 | serial mode, which also means that there are no other concurrent active or | |
527 | serial transactions). Note that active transactions can become serial | |
528 | transactions when they enter serial mode during the runtime of the | |
529 | transaction. | |
530 | ||
531 | @subsection State-to-lock mapping | |
532 | ||
533 | Application data is protected by the serial lock if there is a serial | |
534 | transaction and no concurrently running active transaction (i.e., non-serial). | |
535 | Otherwise, application data is protected by the currently selected method | |
536 | group, which might use per-method-group locks or other mechanisms. Also note | |
537 | that application data that is about to be privatized might not be allowed to be | |
538 | accessed by nontransactional code until privatization safety has been ensured; | |
539 | the details of this are handled by the current method group. | |
540 | ||
541 | libitm-internal state is either protected by the serial lock or accessed | |
542 | through custom concurrent code. The latter applies to the public/shared part | |
543 | of a transaction object and most typical method-group-specific state. | |
544 | ||
545 | The former category (protected by the serial lock) includes: | |
546 | @itemize @bullet | |
547 | @item The list of active threads that have used transactions. | |
548 | @item The tables that map functions to their transactional clones. | |
549 | @item The current selection of which method group to use. | |
550 | @item Some method-group-specific data, or invariants of this data. For example, | |
551 | resetting a method group to its initial state is handled by switching to the | |
552 | same method group, so the serial lock protects such resetting as well. | |
553 | @end itemize | |
554 | In general, such state is immutable whenever there exists an active | |
555 | (non-serial) transaction. If there is no active transaction, a serial | |
556 | transaction (or a thread that is not currently executing a transaction but has | |
557 | acquired the serial lock) is allowed to modify this state (but must of course | |
558 | be careful to not surprise the current method group's implementation with such | |
559 | modifications). | |
560 | ||
561 | @subsection Lock acquisition order | |
562 | ||
563 | To prevent deadlocks, locks acquisition must happen in a globally agreed-upon | |
564 | order. Note that this applies to other forms of blocking too, but does not | |
565 | necessarily apply to lock acquisitions that do not block (e.g., trylock() | |
566 | calls that do not get retried forever). Note that serial transactions are | |
567 | never return back to active transactions until the transaction has committed. | |
568 | Likewise, active transactions stay active until they have committed. | |
569 | Per-method-group locks are typically also not released before commit. | |
570 | ||
571 | Lock acquisition / blocking rules: | |
572 | @itemize @bullet | |
573 | ||
574 | @item Transactions must become active or serial before they are allowed to | |
575 | use method-group-specific locks or blocking (i.e., the serial lock must be | |
576 | acquired before those other locks, either in serial or nonserial mode). | |
577 | ||
578 | @item Any number of threads that do not currently run active transactions can | |
579 | block while trying to get the serial lock in exclusive mode. Note that active | |
580 | transactions must not block when trying to upgrade to serial mode unless there | |
581 | is no other transaction that is trying that (the latter is ensured by the | |
582 | serial lock implementation. | |
583 | ||
584 | @item Method groups must prevent deadlocks on their locks. In particular, they | |
585 | must also be prepared for another active transaction that has acquired | |
586 | method-group-specific locks but is blocked during an attempt to upgrade to | |
587 | being a serial transaction. See below for details. | |
588 | ||
589 | @item Serial transactions can acquire method-group-specific locks because there | |
590 | will be no other active nor serial transaction. | |
591 | ||
592 | @end itemize | |
593 | ||
594 | There is no single rule for per-method-group blocking because this depends on | |
595 | when a TM method might acquire locks. If no active transaction can upgrade to | |
596 | being a serial transaction after it has acquired per-method-group locks (e.g., | |
597 | when those locks are only acquired during an attempt to commit), then the TM | |
598 | method does not need to consider a potential deadlock due to serial mode. | |
599 | ||
600 | If there can be upgrades to serial mode after the acquisition of | |
601 | per-method-group locks, then TM methods need to avoid those deadlocks: | |
602 | @itemize @bullet | |
603 | @item When upgrading to a serial transaction, after acquiring exclusive rights | |
604 | to the serial lock but before waiting for concurrent active transactions to | |
605 | finish (@pxref{serial-lock-impl,,Serial lock implementation} for details), | |
606 | we have to wake up all active transactions waiting on the upgrader's | |
607 | per-method-group locks. | |
608 | @item Active transactions blocking on per-method-group locks need to check the | |
609 | serial lock and abort if there is a pending serial transaction. | |
610 | @item Lost wake-ups have to be prevented (e.g., by changing a bit in each | |
611 | per-method-group lock before doing the wake-up, and only blocking on this lock | |
612 | using a futex if this bit is not group). | |
613 | @end itemize | |
614 | ||
615 | @strong{TODO}: Can reuse serial lock for gl-*? And if we can, does it make | |
616 | sense to introduce further complexity in the serial lock? For gl-*, we can | |
617 | really only avoid an abort if we do -wb and -vbv. | |
618 | ||
619 | ||
620 | @subsection Serial lock implementation | |
621 | @anchor{serial-lock-impl} | |
622 | ||
623 | The serial lock implementation is optimized towards assuming that serial | |
624 | transactions are infrequent and not the common case. However, the performance | |
625 | of entering serial mode can matter because when only few transactions are run | |
626 | concurrently or if there are few threads, then it can be efficient to run | |
627 | transactions serially. | |
628 | ||
629 | The serial lock is similar to a multi-reader-single-writer lock in that there | |
630 | can be several active transactions but only one serial transaction. However, | |
631 | we do want to avoid contention (in the lock implementation) between active | |
632 | transactions, so we split up the reader side of the lock into per-transaction | |
633 | flags that are true iff the transaction is active. The exclusive writer side | |
634 | remains a shared single flag, which is acquired using a CAS, for example. | |
635 | On the fast-path, the serial lock then works similar to Dekker's algorithm but | |
636 | with several reader flags that a serial transaction would have to check. | |
637 | A serial transaction thus requires a list of all threads with potentially | |
638 | active transactions; we can use the serial lock itself to protect this list | |
639 | (i.e., only threads that have acquired the serial lock can modify this list). | |
640 | ||
641 | We want starvation-freedom for the serial lock to allow for using it to ensure | |
642 | progress for potentially starved transactions (@pxref{progress-guarantees,, | |
643 | Progress Guarantees} for details). However, this is currently not enforced by | |
644 | the implementation of the serial lock. | |
645 | ||
646 | Here is pseudo-code for the read/write fast paths of acquiring the serial | |
647 | lock (read-to-write upgrade is similar to write_lock: | |
648 | @example | |
649 | // read_lock: | |
650 | tx->shared_state |= active; | |
651 | __sync_synchronize(); // or STLD membar, or C++0x seq-cst fence | |
652 | while (!serial_lock.exclusive) | |
653 | if (spinning_for_too_long) goto slowpath; | |
654 | ||
655 | // write_lock: | |
656 | if (CAS(&serial_lock.exclusive, 0, this) != 0) | |
657 | goto slowpath; // writer-writer contention | |
658 | // need a membar here, but CAS already has full membar semantics | |
659 | bool need_blocking = false; | |
660 | for (t: all txns) | |
661 | @{ | |
662 | for (;t->shared_state & active;) | |
663 | if (spinning_for_too_long) @{ need_blocking = true; break; @} | |
664 | @} | |
665 | if (need_blocking) goto slowpath; | |
666 | @end example | |
667 | ||
668 | Releasing a lock in this spin-lock version then just consists of resetting | |
669 | @code{tx->shared_state} to inactive or clearing @code{serial_lock.exclusive}. | |
670 | ||
671 | However, we can't rely on a pure spinlock because we need to get the OS | |
672 | involved at some time (e.g., when there are more threads than CPUs to run on). | |
673 | Therefore, the real implementation falls back to a blocking slow path, either | |
674 | based on pthread mutexes or Linux futexes. | |
675 | ||
676 | ||
677 | @subsection Reentrancy | |
678 | ||
679 | libitm has to consider the following cases of reentrancy: | |
680 | @itemize @bullet | |
681 | ||
682 | @item Transaction calls unsafe code that starts a new transaction: The outer | |
683 | transaction will become a serial transaction before executing unsafe code. | |
684 | Therefore, nesting within serial transactions must work, even if the nested | |
685 | transaction is called from within uninstrumented code. | |
686 | ||
687 | @item Transaction calls either a transactional wrapper or safe code, which in | |
688 | turn starts a new transaction: It is not yet defined in the specification | |
689 | whether this is allowed. Thus, it is undefined whether libitm supports this. | |
690 | ||
691 | @item Code that starts new transactions might be called from within any part | |
692 | of libitm: This kind of reentrancy would likely be rather complex and can | |
693 | probably be avoided. Therefore, it is not supported. | |
694 | ||
695 | @end itemize | |
696 | ||
697 | @subsection Privatization safety | |
698 | ||
699 | Privatization safety is ensured by libitm using a quiescence-based approach. | |
700 | Basically, a privatizing transaction waits until all concurrent active | |
701 | transactions will either have finished (are not active anymore) or operate on | |
702 | a sufficiently recent snapshot to not access the privatized data anymore. This | |
703 | happens after the privatizing transaction has stopped being an active | |
704 | transaction, so waiting for quiescence does not contribute to deadlocks. | |
705 | ||
706 | In method groups that need to ensure publication safety explicitly, active | |
707 | transactions maintain a flag or timestamp in the public/shared part of the | |
708 | transaction descriptor. Before blocking, privatizers need to let the other | |
709 | transactions know that they should wake up the privatizer. | |
710 | ||
711 | @strong{TODO} Ho to implement the waiters? Should those flags be | |
712 | per-transaction or at a central place? We want to avoid one wake/wait call | |
713 | per active transactions, so we might want to use either a tree or combining | |
714 | to reduce the syscall overhead, or rather spin for a long amount of time | |
715 | instead of doing blocking. Also, it would be good if only the last transaction | |
716 | that the privatizer waits for would do the wake-up. | |
717 | ||
718 | @subsection Progress guarantees | |
719 | @anchor{progress-guarantees} | |
720 | ||
721 | Transactions that do not make progress when using the current TM method will | |
722 | eventually try to execute in serial mode. Thus, the serial lock's progress | |
723 | guarantees determine the progress guarantees of the whole TM. Obviously, we at | |
724 | least need deadlock-freedom for the serial lock, but it would also be good to | |
725 | provide starvation-freedom (informally, all threads will finish executing a | |
726 | transaction eventually iff they get enough cycles). | |
727 | ||
728 | However, the scheduling of transactions (e.g., thread scheduling by the OS) | |
729 | also affects the handling of progress guarantees by the TM. First, the TM | |
730 | can only guarantee deadlock-freedom if threads do not get stopped. Likewise, | |
731 | low-priority threads can starve if they do not get scheduled when other | |
732 | high-priority threads get those cycles instead. | |
733 | ||
734 | If all threads get scheduled eventually, correct lock implementations will | |
735 | provide deadlock-freedom, but might not provide starvation-freedom. We can | |
736 | either enforce the latter in the TM's lock implementation, or assume that | |
737 | the scheduling is sufficiently random to yield a probabilistic guarantee that | |
738 | no thread will starve (because eventually, a transaction will encounter a | |
739 | scheduling that will allow it to run). This can indeed work well in practice | |
740 | but is not necessarily guaranteed to work (e.g., simple spin locks can be | |
741 | pretty efficient). | |
742 | ||
743 | Because enforcing stronger progress guarantees in the TM has a higher runtime | |
744 | overhead, we focus on deadlock-freedom right now and assume that the threads | |
745 | will get scheduled eventually by the OS (but don't consider threads with | |
746 | different priorities). We should support starvation-freedom for serial | |
747 | transactions in the future. Everything beyond that is highly related to proper | |
748 | contention management across all of the TM (including with TM method to | |
749 | choose), and is future work. | |
750 | ||
751 | @strong{TODO} Handling thread priorities: We want to avoid priority inversion | |
752 | but it's unclear how often that actually matters in practice. Workloads that | |
753 | have threads with different priorities will likely also require lower latency | |
754 | or higher throughput for high-priority threads. Therefore, it probably makes | |
755 | not that much sense (except for eventual progress guarantees) to use | |
756 | priority inheritance until the TM has priority-aware contention management. | |
757 | ||
758 | ||
759 | @c --------------------------------------------------------------------- | |
760 | @c GNU Free Documentation License | |
761 | @c --------------------------------------------------------------------- | |
762 | ||
763 | @include fdl.texi | |
764 | ||
765 | @c --------------------------------------------------------------------- | |
766 | @c Index | |
767 | @c --------------------------------------------------------------------- | |
768 | ||
c288ec8f JM |
769 | @node Library Index |
770 | @unnumbered Library Index | |
0a35513e AH |
771 | |
772 | @printindex cp | |
773 | ||
774 | @bye |