Notes on the external ABI presented by libgomp.  This ought to get
transformed into proper documentation at some point.

Implementing MASTER construct

    if (omp_get_thread_num () == 0)
      block

  Alternatively, we generate two copies of the parallel subfunction
  and only include this in the version run by the master thread.
  Surely that's not worthwhile though...

Implementing CRITICAL construct

  Without a specified name,

    void GOMP_critical_start (void);
    void GOMP_critical_end (void);

  so that we don't get COPY relocations from libgomp to the main
  application.

  With a specified name, use omp_set_lock and omp_unset_lock, with
  the name being transformed into a variable declared like

    omp_lock_t gomp_critical_user_<name>
      __attribute__((common))

  Ideally the ABI would specify that all-zeros is a valid unlocked
  state, so that we wouldn't actually need to initialize this at
  startup.

Implementing ATOMIC construct

  The target should implement the __sync builtins.

  Failing that, we could add

    void GOMP_atomic_enter (void)
    void GOMP_atomic_exit (void)

  which reuse the regular lock code, but with yet another lock
  object private to the library.

Implementing FLUSH construct

  Expands to the __sync_synchronize builtin.

Implementing BARRIER construct

  void GOMP_barrier (void)

Implementing THREADPRIVATE construct

  In _most_ cases we can map this directly to __thread, except
  that OMP allows constructors for C++ objects.  We can either
  refuse to support this (how often is it used?) or we can
  implement something akin to .ctors.

  Even more ideally, this ctor feature would be handled by extensions
  to the main pthreads library.  Failing that, we can have a set
  of entry points to register ctor functions to be called.

Implementing PRIVATE clause

  In association with a PARALLEL, or within the lexical extent
  of a PARALLEL block, the variable becomes a local variable in
  the parallel subfunction.

  In association with FOR or SECTIONS blocks, create a new
  automatic variable within the current function.  This preserves
  the semantics of new variable creation.

Implementing FIRSTPRIVATE, LASTPRIVATE, COPYIN, COPYPRIVATE clauses

  Seems simple enough for PARALLEL blocks.  Create a private
  struct for communicating between the parent and the subfunction.
  In the parent, copy in values for scalars and "small" structs;
  copy in addresses for other TREE_ADDRESSABLE types.  In the
  subfunction, copy the value into the local variable.

  It's not clear at all what to do with bare FOR or SECTION blocks.
  The only thing I can figure is that we do something like

    #pragma omp for firstprivate(x) lastprivate(y)
    for (int i = 0; i < n; ++i)
      body;

  =>

    {
      int x = x, y;

      // for stuff

      if (i == n)
        y = y;
    }

  where the "x=x" and "y=y" assignments actually have different
  uids for the two variables, i.e. not something you could write
  directly in C.  Presumably this only makes sense if the "outer"
  x and y are global variables.

  COPYPRIVATE would work the same way, except the structure
  broadcast would have to happen via SINGLE machinery instead.

Implementing REDUCTION clause

  The private struct mentioned above should have a pointer to
  an array of the type of the variable, indexed by the thread's
  team_id.  The thread stores its final value into the array,
  and after the barrier the master thread iterates over the
  array to collect the values.

Implementing PARALLEL construct

    #pragma omp parallel
    {
      body;
    }

  =>

    void subfunction (void *data)
    {
      use data;
      body;
    }

    setup data;
    GOMP_parallel_start (subfunction, &data, num_threads);
    subfunction (&data);
    GOMP_parallel_end ();

  void GOMP_parallel_start (void (*fn)(void *), void *data,
                            unsigned num_threads)

    The FN argument is the subfunction to be run in parallel.

    The DATA argument is a pointer to a structure used to
    communicate data in and out of the subfunction, as discussed
    above wrt FIRSTPRIVATE et al.

    The NUM_THREADS argument is 1 if an IF clause is present
    and false, or the value of the NUM_THREADS clause, if
    present, or 0.

    The function needs to create the appropriate number of
    threads and/or launch them from the dock (the pool of idle
    threads).  It needs to create the team structure and assign
    team ids.

  void GOMP_parallel_end (void)

    Tears down the team and returns us to the previous
    omp_in_parallel () state.

Implementing FOR construct

    #pragma omp parallel for
    for (i = lb; i <= ub; i++)
      body;

  =>

    void subfunction (void *data)
    {
      long _s0, _e0;
      while (GOMP_loop_static_next (&_s0, &_e0))
        {
          long _e1 = _e0, i;
          for (i = _s0; i < _e1; i++)
            body;
        }
      GOMP_loop_end_nowait ();
    }

    GOMP_parallel_loop_static (subfunction, NULL, 0, lb, ub+1, 1, 0);
    subfunction (NULL);
    GOMP_parallel_end ();

    #pragma omp for schedule(runtime)
    for (i = 0; i < n; i++)
      body;

  =>

    {
      long i, _s0, _e0;
      if (GOMP_loop_runtime_start (0, n, 1, &_s0, &_e0))
        do
          {
            long _e1 = _e0;
            for (i = _s0; i < _e1; i++)
              body;
          }
        while (GOMP_loop_runtime_next (&_s0, &_e0));
      GOMP_loop_end ();
    }

  Note that while it looks like there is trickiness to propagating
  a non-constant STEP, there isn't really.  We're explicitly allowed
  to evaluate it as many times as we want, and any variables involved
  should automatically be handled as PRIVATE or SHARED like any other
  variables.  So the expression should remain evaluable in the
  subfunction.  We can also pull it into a local variable if we like,
  but since it's supposed to remain unchanged, we can also not if we like.

  If we have SCHEDULE(STATIC), and no ORDERED, then we ought to be
  able to get away with no work-sharing context at all, since we can
  simply perform the arithmetic directly in each thread to divide up
  the iterations.  Which would mean that we wouldn't need to call any
  of these routines.

  There are separate routines for handling loops with an ORDERED
  clause.  Bookkeeping for that is non-trivial...

Implementing ORDERED construct

  void GOMP_ordered_start (void)
  void GOMP_ordered_end (void)

Implementing SECTIONS construct

    #pragma omp sections
    {
      #pragma omp section
      stmt1;
      #pragma omp section
      stmt2;
      #pragma omp section
      stmt3;
    }

  =>

    for (i = GOMP_sections_start (3); i != 0; i = GOMP_sections_next ())
      switch (i)
        {
        case 1:
          stmt1;
          break;
        case 2:
          stmt2;
          break;
        case 3:
          stmt3;
          break;
        }
    GOMP_barrier ();

Implementing SINGLE construct

    #pragma omp single
    {
      body;
    }

  =>

    if (GOMP_single_start ())
      body;
    GOMP_barrier ();

    #pragma omp single copyprivate(x)
      body;

  =>

    datap = GOMP_single_copy_start ();
    if (datap == NULL)
      {
        body;
        data.x = x;
        GOMP_single_copy_end (&data);
      }
    else
      x = datap->x;
    GOMP_barrier ();