Notes on the external ABI presented by libgomp.  This ought to get
transformed into proper documentation at some point.

Implementing MASTER construct

    if (omp_get_thread_num () == 0)
      block

  Alternatively, we generate two copies of the parallel subfunction
  and only include this in the version run by the master thread.
  Surely that's not worthwhile though...

Implementing CRITICAL construct

  Without a specified name,

    void GOMP_critical_start (void);
    void GOMP_critical_end (void);

  so that we don't get COPY relocations from libgomp to the main
  application.

  With a specified name, use omp_set_lock and omp_unset_lock, with
  the name being transformed into a variable declared like

    omp_lock_t gomp_critical_user_<name>
      __attribute__((common))

  Ideally the ABI would specify that all-zeros is a valid unlocked
  state, so that we wouldn't actually need to initialize this at
  startup.

Implementing ATOMIC construct

  The target should implement the __sync builtins.

  Failing that, we could add

    void GOMP_atomic_enter (void)
    void GOMP_atomic_exit (void)

  which reuse the regular lock code, but with yet another lock
  object private to the library.

Implementing FLUSH construct

  Expands to the __sync_synchronize builtin.

Implementing BARRIER construct

  void GOMP_barrier (void)

Implementing THREADPRIVATE construct

  In _most_ cases we can map this directly to __thread, except
  that OMP allows constructors for C++ objects.  We can either
  refuse to support this (how often is it used?) or we can
  implement something akin to .ctors.

  Even more ideally, this ctor feature would be handled by extensions
  to the main pthreads library.  Failing that, we can have a set
  of entry points to register ctor functions to be called.

Implementing PRIVATE clause

  In association with a PARALLEL, or within the lexical extent
  of a PARALLEL block, the variable becomes a local variable in
  the parallel subfunction.

  In association with FOR or SECTIONS blocks, create a new
  automatic variable within the current function.  This preserves
  the semantics of new variable creation.

Implementing FIRSTPRIVATE, LASTPRIVATE, COPYIN, COPYPRIVATE clauses

  Seems simple enough for PARALLEL blocks.  Create a private
  struct for communicating between the parent and the subfunction.
  In the parent, copy in values for scalars and "small" structs;
  copy in addresses for other TREE_ADDRESSABLE types.  In the
  subfunction, copy the value into the local variable.

  It's not clear at all what to do with bare FOR or SECTION blocks.
  The only thing I can figure is that we do something like

    #pragma omp for firstprivate(x) lastprivate(y)
    for (int i = 0; i < n; ++i)
      body;

  =>

    {
      int x = x, y;

      // for stuff

      if (i == n)
        y = y;
    }

  where the "x=x" and "y=y" assignments actually have different
  uids for the two variables, i.e. not something you could write
  directly in C.  Presumably this only makes sense if the "outer"
  x and y are global variables.

  COPYPRIVATE would work the same way, except the structure
  broadcast would have to happen via SINGLE machinery instead.

Implementing REDUCTION clause

  The private struct mentioned above should have a pointer to
  an array of the type of the variable, indexed by the thread's
  team_id.  The thread stores its final value into the array,
  and after the barrier the master thread iterates over the
  array to collect the values.

Implementing PARALLEL construct

    #pragma omp parallel
    {
      body;
    }

  =>

    void subfunction (void *data)
    {
      use data;
      body;
    }

    setup data;
    GOMP_parallel_start (subfunction, &data, num_threads);
    subfunction (&data);
    GOMP_parallel_end ();

  void GOMP_parallel_start (void (*fn)(void *), void *data,
                            unsigned num_threads)

    The FN argument is the subfunction to be run in parallel.

    The DATA argument is a pointer to a structure used to
    communicate data in and out of the subfunction, as discussed
    above wrt FIRSTPRIVATE et al.

    The NUM_THREADS argument is 1 if an IF clause is present
    and false, or the value of the NUM_THREADS clause, if
    present, or 0.

    The function needs to create the appropriate number of
    threads and/or launch them from the dock (the pool of idle
    threads).  It needs to create the team structure and assign
    team ids.

  void GOMP_parallel_end (void)

    Tears down the team and returns us to the previous
    omp_in_parallel () state.

Implementing FOR construct

    #pragma omp parallel for
    for (i = lb; i <= ub; i++)
      body;

  =>

    void subfunction (void *data)
    {
      long _s0, _e0;
      while (GOMP_loop_static_next (&_s0, &_e0))
        {
          long _e1 = _e0, i;
          for (i = _s0; i < _e1; i++)
            body;
        }
      GOMP_loop_end_nowait ();
    }

    GOMP_parallel_loop_static (subfunction, NULL, 0, lb, ub+1, 1, 0);
    subfunction (NULL);
    GOMP_parallel_end ();

    #pragma omp for schedule(runtime)
    for (i = 0; i < n; i++)
      body;

  =>

    {
      long i, _s0, _e0;
      if (GOMP_loop_runtime_start (0, n, 1, &_s0, &_e0))
        do
          {
            long _e1 = _e0;
            for (i = _s0; i < _e1; i++)
              body;
          }
        while (GOMP_loop_runtime_next (&_s0, &_e0));
      GOMP_loop_end ();
    }

  Note that while it looks like there is trickiness to propagating
  a non-constant STEP, there isn't really.  We're explicitly allowed
  to evaluate it as many times as we want, and any variables involved
  should automatically be handled as PRIVATE or SHARED like any other
  variables.  So the expression should remain evaluable in the
  subfunction.  We can also pull it into a local variable if we like,
  but since it's supposed to remain unchanged, we can also not if we like.

  If we have SCHEDULE(STATIC), and no ORDERED, then we ought to be
  able to get away with no work-sharing context at all, since we can
  simply perform the arithmetic directly in each thread to divide up
  the iterations.  Which would mean that we wouldn't need to call any
  of these routines.

  There are separate routines for handling loops with an ORDERED
  clause.  Bookkeeping for that is non-trivial...

Implementing ORDERED construct

  void GOMP_ordered_start (void)
  void GOMP_ordered_end (void)

Implementing SECTIONS construct

    #pragma omp sections
    {
      #pragma omp section
      stmt1;
      #pragma omp section
      stmt2;
      #pragma omp section
      stmt3;
    }

  =>

    for (i = GOMP_sections_start (3); i != 0; i = GOMP_sections_next ())
      switch (i)
        {
        case 1:
          stmt1;
          break;
        case 2:
          stmt2;
          break;
        case 3:
          stmt3;
          break;
        }
    GOMP_barrier ();

Implementing SINGLE construct

    #pragma omp single
    {
      body;
    }

  =>

    if (GOMP_single_start ())
      body;
    GOMP_barrier ();

    #pragma omp single copyprivate(x)
      body;

  =>

    datap = GOMP_single_copy_start ();
    if (datap == NULL)
      {
        body;
        data.x = x;
        GOMP_single_copy_end (&data);
      }
    else
      x = datap->x;
    GOMP_barrier ();