libitm/doc/locking-conventions.rst

   1 ..
   2   Copyright 1988-2022 Free Software Foundation, Inc.
   3   This is part of the GCC manual.
   4   For copying conditions, see the copyright.rst file.
   5
   6 Locking conventions
   7 *******************
   8
   9 This section documents the locking scheme and rules for all uses of locking
  10 in libitm. We have to support serial(-irrevocable) mode, which is implemented
  11 using a global lock as explained next (called the *serial lock*). To
  12 simplify the overall design, we use the same lock as catch-all locking
  13 mechanism for other infrequent tasks such as (de)registering clone tables or
  14 threads. Besides the serial lock, there are *per-method-group locks* that
  15 are managed by specific method groups (i.e., groups of similar TM concurrency
  16 control algorithms), and lock-like constructs for quiescence-based operations
  17 such as ensuring privatization safety.
  18
  19 Thus, the actions that participate in the libitm-internal locking are either
  20 *active transactions* that do not run in serial mode, *serial
  21 transactions* (which (are about to) run in serial mode), and management tasks
  22 that do not execute within a transaction but have acquired the serial mode
  23 like a serial transaction would do (e.g., to be able to register threads with
  24 libitm). Transactions become active as soon as they have successfully used the
  25 serial lock to announce this globally (see :ref:`serial-lock-impl`). Likewise, transactions become serial transactions as soon as
  26 they have acquired the exclusive rights provided by the serial lock (i.e.,
  27 serial mode, which also means that there are no other concurrent active or
  28 serial transactions). Note that active transactions can become serial
  29 transactions when they enter serial mode during the runtime of the
  30 transaction.
  31
  32 State-to-lock mapping
  33 ^^^^^^^^^^^^^^^^^^^^^
  34
  35 Application data is protected by the serial lock if there is a serial
  36 transaction and no concurrently running active transaction (i.e., non-serial).
  37 Otherwise, application data is protected by the currently selected method
  38 group, which might use per-method-group locks or other mechanisms. Also note
  39 that application data that is about to be privatized might not be allowed to be
  40 accessed by nontransactional code until privatization safety has been ensured;
  41 the details of this are handled by the current method group.
  42
  43 libitm-internal state is either protected by the serial lock or accessed
  44 through custom concurrent code. The latter applies to the public/shared part
  45 of a transaction object and most typical method-group-specific state.
  46
  47 The former category (protected by the serial lock) includes:
  48
  49 * The list of active threads that have used transactions.
  50
  51 * The tables that map functions to their transactional clones.
  52
  53 * The current selection of which method group to use.
  54
  55 * Some method-group-specific data, or invariants of this data. For example,
  56   resetting a method group to its initial state is handled by switching to the
  57   same method group, so the serial lock protects such resetting as well.
  58
  59 In general, such state is immutable whenever there exists an active
  60 (non-serial) transaction. If there is no active transaction, a serial
  61 transaction (or a thread that is not currently executing a transaction but has
  62 acquired the serial lock) is allowed to modify this state (but must of course
  63 be careful to not surprise the current method group's implementation with such
  64 modifications).
  65
  66 Lock acquisition order
  67 ^^^^^^^^^^^^^^^^^^^^^^
  68
  69 To prevent deadlocks, locks acquisition must happen in a globally agreed-upon
  70 order. Note that this applies to other forms of blocking too, but does not
  71 necessarily apply to lock acquisitions that do not block (e.g., trylock()
  72 calls that do not get retried forever). Note that serial transactions are
  73 never return back to active transactions until the transaction has committed.
  74 Likewise, active transactions stay active until they have committed.
  75 Per-method-group locks are typically also not released before commit.
  76
  77 Lock acquisition / blocking rules:
  78
  79 * Transactions must become active or serial before they are allowed to
  80   use method-group-specific locks or blocking (i.e., the serial lock must be
  81   acquired before those other locks, either in serial or nonserial mode).
  82
  83 * Any number of threads that do not currently run active transactions can
  84   block while trying to get the serial lock in exclusive mode. Note that active
  85   transactions must not block when trying to upgrade to serial mode unless there
  86   is no other transaction that is trying that (the latter is ensured by the
  87   serial lock implementation.
  88
  89 * Method groups must prevent deadlocks on their locks. In particular, they
  90   must also be prepared for another active transaction that has acquired
  91   method-group-specific locks but is blocked during an attempt to upgrade to
  92   being a serial transaction. See below for details.
  93
  94 * Serial transactions can acquire method-group-specific locks because there
  95   will be no other active nor serial transaction.
  96
  97 There is no single rule for per-method-group blocking because this depends on
  98 when a TM method might acquire locks. If no active transaction can upgrade to
  99 being a serial transaction after it has acquired per-method-group locks (e.g.,
 100 when those locks are only acquired during an attempt to commit), then the TM
 101 method does not need to consider a potential deadlock due to serial mode.
 102
 103 If there can be upgrades to serial mode after the acquisition of
 104 per-method-group locks, then TM methods need to avoid those deadlocks:
 105
 106 * When upgrading to a serial transaction, after acquiring exclusive rights
 107   to the serial lock but before waiting for concurrent active transactions to
 108   finish (see :ref:`serial-lock-impl` for details),
 109   we have to wake up all active transactions waiting on the upgrader's
 110   per-method-group locks.
 111
 112 * Active transactions blocking on per-method-group locks need to check the
 113   serial lock and abort if there is a pending serial transaction.
 114
 115 * Lost wake-ups have to be prevented (e.g., by changing a bit in each
 116   per-method-group lock before doing the wake-up, and only blocking on this lock
 117   using a futex if this bit is not group).
 118
 119 .. todo:: Can reuse serial lock for gl-\*? And if we can, does it make
 120   sense to introduce further complexity in the serial lock? For gl-\*, we can
 121   really only avoid an abort if we do -wb and -vbv.
 122
 123 .. _serial-lock-impl:
 124
 125 Serial lock implementation
 126 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 127
 128 The serial lock implementation is optimized towards assuming that serial
 129 transactions are infrequent and not the common case. However, the performance
 130 of entering serial mode can matter because when only few transactions are run
 131 concurrently or if there are few threads, then it can be efficient to run
 132 transactions serially.
 133
 134 The serial lock is similar to a multi-reader-single-writer lock in that there
 135 can be several active transactions but only one serial transaction. However,
 136 we do want to avoid contention (in the lock implementation) between active
 137 transactions, so we split up the reader side of the lock into per-transaction
 138 flags that are true iff the transaction is active. The exclusive writer side
 139 remains a shared single flag, which is acquired using a CAS, for example.
 140 On the fast-path, the serial lock then works similar to Dekker's algorithm but
 141 with several reader flags that a serial transaction would have to check.
 142 A serial transaction thus requires a list of all threads with potentially
 143 active transactions; we can use the serial lock itself to protect this list
 144 (i.e., only threads that have acquired the serial lock can modify this list).
 145
 146 We want starvation-freedom for the serial lock to allow for using it to ensure
 147 progress for potentially starved transactions (see :ref:`progress-guarantees` for details). However, this is currently not enforced by
 148 the implementation of the serial lock.
 149
 150 Here is pseudo-code for the read/write fast paths of acquiring the serial
 151 lock (read-to-write upgrade is similar to write_lock:
 152
 153 .. code-block:: c++
 154
 155   // read_lock:
 156   tx->shared_state |= active;
 157   __sync_synchronize(); // or STLD membar, or C++0x seq-cst fence
 158   while (!serial_lock.exclusive)
 159     if (spinning_for_too_long) goto slowpath;
 160
 161   // write_lock:
 162   if (CAS(&serial_lock.exclusive, 0, this) != 0)
 163     goto slowpath; // writer-writer contention
 164   // need a membar here, but CAS already has full membar semantics
 165   bool need_blocking = false;
 166   for (t: all txns)
 167     {
 168       for (;t->shared_state & active;)
 169         if (spinning_for_too_long) { need_blocking = true; break; }
 170     }
 171   if (need_blocking) goto slowpath;
 172
 173 Releasing a lock in this spin-lock version then just consists of resetting
 174 ``tx->shared_state`` to inactive or clearing ``serial_lock.exclusive``.
 175
 176 However, we can't rely on a pure spinlock because we need to get the OS
 177 involved at some time (e.g., when there are more threads than CPUs to run on).
 178 Therefore, the real implementation falls back to a blocking slow path, either
 179 based on pthread mutexes or Linux futexes.
 180
 181 Reentrancy
 182 ^^^^^^^^^^
 183
 184 libitm has to consider the following cases of reentrancy:
 185
 186 * Transaction calls unsafe code that starts a new transaction: The outer
 187   transaction will become a serial transaction before executing unsafe code.
 188   Therefore, nesting within serial transactions must work, even if the nested
 189   transaction is called from within uninstrumented code.
 190
 191 * Transaction calls either a transactional wrapper or safe code, which in
 192   turn starts a new transaction: It is not yet defined in the specification
 193   whether this is allowed. Thus, it is undefined whether libitm supports this.
 194
 195 * Code that starts new transactions might be called from within any part
 196   of libitm: This kind of reentrancy would likely be rather complex and can
 197   probably be avoided. Therefore, it is not supported.
 198
 199 Privatization safety
 200 ^^^^^^^^^^^^^^^^^^^^
 201
 202 Privatization safety is ensured by libitm using a quiescence-based approach.
 203 Basically, a privatizing transaction waits until all concurrent active
 204 transactions will either have finished (are not active anymore) or operate on
 205 a sufficiently recent snapshot to not access the privatized data anymore. This
 206 happens after the privatizing transaction has stopped being an active
 207 transaction, so waiting for quiescence does not contribute to deadlocks.
 208
 209 In method groups that need to ensure publication safety explicitly, active
 210 transactions maintain a flag or timestamp in the public/shared part of the
 211 transaction descriptor. Before blocking, privatizers need to let the other
 212 transactions know that they should wake up the privatizer.
 213
 214 .. todo:: How to implement the waiters? Should those flags be
 215   per-transaction or at a central place? We want to avoid one wake/wait call
 216   per active transactions, so we might want to use either a tree or combining
 217   to reduce the syscall overhead, or rather spin for a long amount of time
 218   instead of doing blocking. Also, it would be good if only the last transaction
 219   that the privatizer waits for would do the wake-up.
 220
 221 .. _progress-guarantees:
 222
 223 Progress guarantees
 224 ^^^^^^^^^^^^^^^^^^^
 225
 226 Transactions that do not make progress when using the current TM method will
 227 eventually try to execute in serial mode. Thus, the serial lock's progress
 228 guarantees determine the progress guarantees of the whole TM. Obviously, we at
 229 least need deadlock-freedom for the serial lock, but it would also be good to
 230 provide starvation-freedom (informally, all threads will finish executing a
 231 transaction eventually iff they get enough cycles).
 232
 233 However, the scheduling of transactions (e.g., thread scheduling by the OS)
 234 also affects the handling of progress guarantees by the TM. First, the TM
 235 can only guarantee deadlock-freedom if threads do not get stopped. Likewise,
 236 low-priority threads can starve if they do not get scheduled when other
 237 high-priority threads get those cycles instead.
 238
 239 If all threads get scheduled eventually, correct lock implementations will
 240 provide deadlock-freedom, but might not provide starvation-freedom. We can
 241 either enforce the latter in the TM's lock implementation, or assume that
 242 the scheduling is sufficiently random to yield a probabilistic guarantee that
 243 no thread will starve (because eventually, a transaction will encounter a
 244 scheduling that will allow it to run). This can indeed work well in practice
 245 but is not necessarily guaranteed to work (e.g., simple spin locks can be
 246 pretty efficient).
 247
 248 Because enforcing stronger progress guarantees in the TM has a higher runtime
 249 overhead, we focus on deadlock-freedom right now and assume that the threads
 250 will get scheduled eventually by the OS (but don't consider threads with
 251 different priorities). We should support starvation-freedom for serial
 252 transactions in the future. Everything beyond that is highly related to proper
 253 contention management across all of the TM (including with TM method to
 254 choose), and is future work.
 255
 256 **TODO** Handling thread priorities: We want to avoid priority inversion
 257 but it's unclear how often that actually matters in practice. Workloads that
 258 have threads with different priorities will likely also require lower latency
 259 or higher throughput for high-priority threads. Therefore, it probably makes
 260 not that much sense (except for eventual progress guarantees) to use
 261 priority inheritance until the TM has priority-aware contention management.