]> git.ipfire.org Git - thirdparty/glibc.git/blob - README.libm
Update.
[thirdparty/glibc.git] / README.libm
1 The following functions for the `long double' versions of the libm
2 function have to be written:
3
4 e_acosl.c
5 e_asinl.c
6 e_atan2l.c
7 e_expl.c
8 e_fmodl.c
9 e_hypotl.c
10 e_j0l.c
11 e_j1l.c
12 e_jnl.c
13 e_lgammal_r.c
14 e_logl.c
15 e_log10l.c
16 e_powl.c
17 e_rem_pio2l.c
18 e_sinhl.c
19 e_sqrtl.c
20
21 k_cosl.c
22 k_rem_pio2l.c
23 k_sinl.c
24 k_tanl.c
25
26 s_atanl.c
27 s_erfl.c
28 s_expm1l.c
29 s_log1pl.c
30
31 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
32 Methods
33 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
34 arcsin
35 ~~~~~~
36 * Since asin(x) = x + x^3/6 + x^5*3/40 + x^7*15/336 + ...
37 * we approximate asin(x) on [0,0.5] by
38 * asin(x) = x + x*x^2*R(x^2)
39 * where
40 * R(x^2) is a rational approximation of (asin(x)-x)/x^3
41 * and its remez error is bounded by
42 * |(asin(x)-x)/x^3 - R(x^2)| < 2^(-58.75)
43 *
44 * For x in [0.5,1]
45 * asin(x) = pi/2-2*asin(sqrt((1-x)/2))
46 * Let y = (1-x), z = y/2, s := sqrt(z), and pio2_hi+pio2_lo=pi/2;
47 * then for x>0.98
48 * asin(x) = pi/2 - 2*(s+s*z*R(z))
49 * = pio2_hi - (2*(s+s*z*R(z)) - pio2_lo)
50 * For x<=0.98, let pio4_hi = pio2_hi/2, then
51 * f = hi part of s;
52 * c = sqrt(z) - f = (z-f*f)/(s+f) ...f+c=sqrt(z)
53 * and
54 * asin(x) = pi/2 - 2*(s+s*z*R(z))
55 * = pio4_hi+(pio4-2s)-(2s*z*R(z)-pio2_lo)
56 * = pio4_hi+(pio4-2f)-(2s*z*R(z)-(pio2_lo+2c))
57 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
58 arccos
59 ~~~~~~
60 * Method :
61 * acos(x) = pi/2 - asin(x)
62 * acos(-x) = pi/2 + asin(x)
63 * For |x|<=0.5
64 * acos(x) = pi/2 - (x + x*x^2*R(x^2)) (see asin.c)
65 * For x>0.5
66 * acos(x) = pi/2 - (pi/2 - 2asin(sqrt((1-x)/2)))
67 * = 2asin(sqrt((1-x)/2))
68 * = 2s + 2s*z*R(z) ...z=(1-x)/2, s=sqrt(z)
69 * = 2f + (2c + 2s*z*R(z))
70 * where f=hi part of s, and c = (z-f*f)/(s+f) is the correction term
71 * for f so that f+c ~ sqrt(z).
72 * For x<-0.5
73 * acos(x) = pi - 2asin(sqrt((1-|x|)/2))
74 * = pi - 0.5*(s+s*z*R(z)), where z=(1-|x|)/2,s=sqrt(z)
75 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
76 atan2
77 ~~~~~
78 * Method :
79 * 1. Reduce y to positive by atan2(y,x)=-atan2(-y,x).
80 * 2. Reduce x to positive by (if x and y are unexceptional):
81 * ARG (x+iy) = arctan(y/x) ... if x > 0,
82 * ARG (x+iy) = pi - arctan[y/(-x)] ... if x < 0,
83 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
84 atan
85 ~~~~
86 * Method
87 * 1. Reduce x to positive by atan(x) = -atan(-x).
88 * 2. According to the integer k=4t+0.25 chopped, t=x, the argument
89 * is further reduced to one of the following intervals and the
90 * arctangent of t is evaluated by the corresponding formula:
91 *
92 * [0,7/16] atan(x) = t-t^3*(a1+t^2*(a2+...(a10+t^2*a11)...)
93 * [7/16,11/16] atan(x) = atan(1/2) + atan( (t-0.5)/(1+t/2) )
94 * [11/16.19/16] atan(x) = atan( 1 ) + atan( (t-1)/(1+t) )
95 * [19/16,39/16] atan(x) = atan(3/2) + atan( (t-1.5)/(1+1.5t) )
96 * [39/16,INF] atan(x) = atan(INF) + atan( -1/t )
97 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
98 exp
99 ~~~
100 * Method
101 * 1. Argument reduction:
102 * Reduce x to an r so that |r| <= 0.5*ln2 ~ 0.34658.
103 * Given x, find r and integer k such that
104 *
105 * x = k*ln2 + r, |r| <= 0.5*ln2.
106 *
107 * Here r will be represented as r = hi-lo for better
108 * accuracy.
109 *
110 * 2. Approximation of exp(r) by a special rational function on
111 * the interval [0,0.34658]:
112 * Write
113 * R(r**2) = r*(exp(r)+1)/(exp(r)-1) = 2 + r*r/6 - r**4/360 + ...
114 * We use a special Reme algorithm on [0,0.34658] to generate
115 * a polynomial of degree 5 to approximate R. The maximum error
116 * of this polynomial approximation is bounded by 2**-59. In
117 * other words,
118 * R(z) ~ 2.0 + P1*z + P2*z**2 + P3*z**3 + P4*z**4 + P5*z**5
119 * (where z=r*r, and the values of P1 to P5 are listed below)
120 * and
121 * | 5 | -59
122 * | 2.0+P1*z+...+P5*z - R(z) | <= 2
123 * | |
124 * The computation of exp(r) thus becomes
125 * 2*r
126 * exp(r) = 1 + -------
127 * R - r
128 * r*R1(r)
129 * = 1 + r + ----------- (for better accuracy)
130 * 2 - R1(r)
131 * where
132 * 2 4 10
133 * R1(r) = r - (P1*r + P2*r + ... + P5*r ).
134 *
135 * 3. Scale back to obtain exp(x):
136 * From step 1, we have
137 * exp(x) = 2^k * exp(r)
138 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
139 hypot
140 ~~~~~
141 * If (assume round-to-nearest) z=x*x+y*y
142 * has error less than sqrt(2)/2 ulp, than
143 * sqrt(z) has error less than 1 ulp (exercise).
144 *
145 * So, compute sqrt(x*x+y*y) with some care as
146 * follows to get the error below 1 ulp:
147 *
148 * Assume x>y>0;
149 * (if possible, set rounding to round-to-nearest)
150 * 1. if x > 2y use
151 * x1*x1+(y*y+(x2*(x+x1))) for x*x+y*y
152 * where x1 = x with lower 32 bits cleared, x2 = x-x1; else
153 * 2. if x <= 2y use
154 * t1*y1+((x-y)*(x-y)+(t1*y2+t2*y))
155 * where t1 = 2x with lower 32 bits cleared, t2 = 2x-t1,
156 * y1= y with lower 32 bits chopped, y2 = y-y1.
157 *
158 * NOTE: scaling may be necessary if some argument is too
159 * large or too tiny
160 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
161 j0/y0
162 ~~~~~
163 * Method -- j0(x):
164 * 1. For tiny x, we use j0(x) = 1 - x^2/4 + x^4/64 - ...
165 * 2. Reduce x to |x| since j0(x)=j0(-x), and
166 * for x in (0,2)
167 * j0(x) = 1-z/4+ z^2*R0/S0, where z = x*x;
168 * (precision: |j0-1+z/4-z^2R0/S0 |<2**-63.67 )
169 * for x in (2,inf)
170 * j0(x) = sqrt(2/(pi*x))*(p0(x)*cos(x0)-q0(x)*sin(x0))
171 * where x0 = x-pi/4. It is better to compute sin(x0),cos(x0)
172 * as follow:
173 * cos(x0) = cos(x)cos(pi/4)+sin(x)sin(pi/4)
174 * = 1/sqrt(2) * (cos(x) + sin(x))
175 * sin(x0) = sin(x)cos(pi/4)-cos(x)sin(pi/4)
176 * = 1/sqrt(2) * (sin(x) - cos(x))
177 * (To avoid cancellation, use
178 * sin(x) +- cos(x) = -cos(2x)/(sin(x) -+ cos(x))
179 * to compute the worse one.)
180 *
181 * Method -- y0(x):
182 * 1. For x<2.
183 * Since
184 * y0(x) = 2/pi*(j0(x)*(ln(x/2)+Euler) + x^2/4 - ...)
185 * therefore y0(x)-2/pi*j0(x)*ln(x) is an even function.
186 * We use the following function to approximate y0,
187 * y0(x) = U(z)/V(z) + (2/pi)*(j0(x)*ln(x)), z= x^2
188 * where
189 * U(z) = u00 + u01*z + ... + u06*z^6
190 * V(z) = 1 + v01*z + ... + v04*z^4
191 * with absolute approximation error bounded by 2**-72.
192 * Note: For tiny x, U/V = u0 and j0(x)~1, hence
193 * y0(tiny) = u0 + (2/pi)*ln(tiny), (choose tiny<2**-27)
194 * 2. For x>=2.
195 * y0(x) = sqrt(2/(pi*x))*(p0(x)*cos(x0)+q0(x)*sin(x0))
196 * where x0 = x-pi/4. It is better to compute sin(x0),cos(x0)
197 * by the method mentioned above.
198 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
199 j1/y1
200 ~~~~~
201 * Method -- j1(x):
202 * 1. For tiny x, we use j1(x) = x/2 - x^3/16 + x^5/384 - ...
203 * 2. Reduce x to |x| since j1(x)=-j1(-x), and
204 * for x in (0,2)
205 * j1(x) = x/2 + x*z*R0/S0, where z = x*x;
206 * (precision: |j1/x - 1/2 - R0/S0 |<2**-61.51 )
207 * for x in (2,inf)
208 * j1(x) = sqrt(2/(pi*x))*(p1(x)*cos(x1)-q1(x)*sin(x1))
209 * y1(x) = sqrt(2/(pi*x))*(p1(x)*sin(x1)+q1(x)*cos(x1))
210 * where x1 = x-3*pi/4. It is better to compute sin(x1),cos(x1)
211 * as follow:
212 * cos(x1) = cos(x)cos(3pi/4)+sin(x)sin(3pi/4)
213 * = 1/sqrt(2) * (sin(x) - cos(x))
214 * sin(x1) = sin(x)cos(3pi/4)-cos(x)sin(3pi/4)
215 * = -1/sqrt(2) * (sin(x) + cos(x))
216 * (To avoid cancellation, use
217 * sin(x) +- cos(x) = -cos(2x)/(sin(x) -+ cos(x))
218 * to compute the worse one.)
219 *
220 * Method -- y1(x):
221 * 1. screen out x<=0 cases: y1(0)=-inf, y1(x<0)=NaN
222 * 2. For x<2.
223 * Since
224 * y1(x) = 2/pi*(j1(x)*(ln(x/2)+Euler)-1/x-x/2+5/64*x^3-...)
225 * therefore y1(x)-2/pi*j1(x)*ln(x)-1/x is an odd function.
226 * We use the following function to approximate y1,
227 * y1(x) = x*U(z)/V(z) + (2/pi)*(j1(x)*ln(x)-1/x), z= x^2
228 * where for x in [0,2] (abs err less than 2**-65.89)
229 * U(z) = U0[0] + U0[1]*z + ... + U0[4]*z^4
230 * V(z) = 1 + v0[0]*z + ... + v0[4]*z^5
231 * Note: For tiny x, 1/x dominate y1 and hence
232 * y1(tiny) = -2/pi/tiny, (choose tiny<2**-54)
233 * 3. For x>=2.
234 * y1(x) = sqrt(2/(pi*x))*(p1(x)*sin(x1)+q1(x)*cos(x1))
235 * where x1 = x-3*pi/4. It is better to compute sin(x1),cos(x1)
236 * by method mentioned above.
237 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
238 jn/yn
239 ~~~~~
240 * Note 2. About jn(n,x), yn(n,x)
241 * For n=0, j0(x) is called,
242 * for n=1, j1(x) is called,
243 * for n<x, forward recursion us used starting
244 * from values of j0(x) and j1(x).
245 * for n>x, a continued fraction approximation to
246 * j(n,x)/j(n-1,x) is evaluated and then backward
247 * recursion is used starting from a supposed value
248 * for j(n,x). The resulting value of j(0,x) is
249 * compared with the actual value to correct the
250 * supposed value of j(n,x).
251 *
252 * yn(n,x) is similar in all respects, except
253 * that forward recursion is used for all
254 * values of n>1.
255
256 jn:
257 /* (x >> n**2)
258 * Jn(x) = cos(x-(2n+1)*pi/4)*sqrt(2/x*pi)
259 * Yn(x) = sin(x-(2n+1)*pi/4)*sqrt(2/x*pi)
260 * Let s=sin(x), c=cos(x),
261 * xn=x-(2n+1)*pi/4, sqt2 = sqrt(2),then
262 *
263 * n sin(xn)*sqt2 cos(xn)*sqt2
264 * ----------------------------------
265 * 0 s-c c+s
266 * 1 -s-c -c+s
267 * 2 -s+c -c-s
268 * 3 s+c c-s
269 ...
270 /* x is tiny, return the first Taylor expansion of J(n,x)
271 * J(n,x) = 1/n!*(x/2)^n - ...
272 ...
273 /* use backward recurrence */
274 /* x x^2 x^2
275 * J(n,x)/J(n-1,x) = ---- ------ ------ .....
276 * 2n - 2(n+1) - 2(n+2)
277 *
278 * 1 1 1
279 * (for large x) = ---- ------ ------ .....
280 * 2n 2(n+1) 2(n+2)
281 * -- - ------ - ------ -
282 * x x x
283 *
284 * Let w = 2n/x and h=2/x, then the above quotient
285 * is equal to the continued fraction:
286 * 1
287 * = -----------------------
288 * 1
289 * w - -----------------
290 * 1
291 * w+h - ---------
292 * w+2h - ...
293 *
294 * To determine how many terms needed, let
295 * Q(0) = w, Q(1) = w(w+h) - 1,
296 * Q(k) = (w+k*h)*Q(k-1) - Q(k-2),
297 * When Q(k) > 1e4 good for single
298 * When Q(k) > 1e9 good for double
299 * When Q(k) > 1e17 good for quadruple
300
301 ...
302 /* estimate log((2/x)^n*n!) = n*log(2/x)+n*ln(n)
303 * Hence, if n*(log(2n/x)) > ...
304 * single 8.8722839355e+01
305 * double 7.09782712893383973096e+02
306 * long double 1.1356523406294143949491931077970765006170e+04
307 * then recurrent value may overflow and the result is
308 * likely underflow to zero
309
310 yn:
311 /* (x >> n**2)
312 * Jn(x) = cos(x-(2n+1)*pi/4)*sqrt(2/x*pi)
313 * Yn(x) = sin(x-(2n+1)*pi/4)*sqrt(2/x*pi)
314 * Let s=sin(x), c=cos(x),
315 * xn=x-(2n+1)*pi/4, sqt2 = sqrt(2),then
316 *
317 * n sin(xn)*sqt2 cos(xn)*sqt2
318 * ----------------------------------
319 * 0 s-c c+s
320 * 1 -s-c -c+s
321 * 2 -s+c -c-s
322 * 3 s+c c-s
323 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
324 lgamma
325 ~~~~~~
326 * Method:
327 * 1. Argument Reduction for 0 < x <= 8
328 * Since gamma(1+s)=s*gamma(s), for x in [0,8], we may
329 * reduce x to a number in [1.5,2.5] by
330 * lgamma(1+s) = log(s) + lgamma(s)
331 * for example,
332 * lgamma(7.3) = log(6.3) + lgamma(6.3)
333 * = log(6.3*5.3) + lgamma(5.3)
334 * = log(6.3*5.3*4.3*3.3*2.3) + lgamma(2.3)
335 * 2. Polynomial approximation of lgamma around its
336 * minimun ymin=1.461632144968362245 to maintain monotonicity.
337 * On [ymin-0.23, ymin+0.27] (i.e., [1.23164,1.73163]), use
338 * Let z = x-ymin;
339 * lgamma(x) = -1.214862905358496078218 + z^2*poly(z)
340 * where
341 * poly(z) is a 14 degree polynomial.
342 * 2. Rational approximation in the primary interval [2,3]
343 * We use the following approximation:
344 * s = x-2.0;
345 * lgamma(x) = 0.5*s + s*P(s)/Q(s)
346 * with accuracy
347 * |P/Q - (lgamma(x)-0.5s)| < 2**-61.71
348 * Our algorithms are based on the following observation
349 *
350 * zeta(2)-1 2 zeta(3)-1 3
351 * lgamma(2+s) = s*(1-Euler) + --------- * s - --------- * s + ...
352 * 2 3
353 *
354 * where Euler = 0.5771... is the Euler constant, which is very
355 * close to 0.5.
356 *
357 * 3. For x>=8, we have
358 * lgamma(x)~(x-0.5)log(x)-x+0.5*log(2pi)+1/(12x)-1/(360x**3)+....
359 * (better formula:
360 * lgamma(x)~(x-0.5)*(log(x)-1)-.5*(log(2pi)-1) + ...)
361 * Let z = 1/x, then we approximation
362 * f(z) = lgamma(x) - (x-0.5)(log(x)-1)
363 * by
364 * 3 5 11
365 * w = w0 + w1*z + w2*z + w3*z + ... + w6*z
366 * where
367 * |w - f(z)| < 2**-58.74
368 *
369 * 4. For negative x, since (G is gamma function)
370 * -x*G(-x)*G(x) = pi/sin(pi*x),
371 * we have
372 * G(x) = pi/(sin(pi*x)*(-x)*G(-x))
373 * since G(-x) is positive, sign(G(x)) = sign(sin(pi*x)) for x<0
374 * Hence, for x<0, signgam = sign(sin(pi*x)) and
375 * lgamma(x) = log(|Gamma(x)|)
376 * = log(pi/(|x*sin(pi*x)|)) - lgamma(-x);
377 * Note: one should avoid compute pi*(-x) directly in the
378 * computation of sin(pi*(-x)).
379 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
380 log
381 ~~~
382 * Method :
383 * 1. Argument Reduction: find k and f such that
384 * x = 2^k * (1+f),
385 * where sqrt(2)/2 < 1+f < sqrt(2) .
386 *
387 * 2. Approximation of log(1+f).
388 * Let s = f/(2+f) ; based on log(1+f) = log(1+s) - log(1-s)
389 * = 2s + 2/3 s**3 + 2/5 s**5 + .....,
390 * = 2s + s*R
391 * We use a special Reme algorithm on [0,0.1716] to generate
392 * a polynomial of degree 14 to approximate R The maximum error
393 * of this polynomial approximation is bounded by 2**-58.45. In
394 * other words,
395 * 2 4 6 8 10 12 14
396 * R(z) ~ Lg1*s +Lg2*s +Lg3*s +Lg4*s +Lg5*s +Lg6*s +Lg7*s
397 * (the values of Lg1 to Lg7 are listed in the program)
398 * and
399 * | 2 14 | -58.45
400 * | Lg1*s +...+Lg7*s - R(z) | <= 2
401 * | |
402 * Note that 2s = f - s*f = f - hfsq + s*hfsq, where hfsq = f*f/2.
403 * In order to guarantee error in log below 1ulp, we compute log
404 * by
405 * log(1+f) = f - s*(f - R) (if f is not too large)
406 * log(1+f) = f - (hfsq - s*(hfsq+R)). (better accuracy)
407 *
408 * 3. Finally, log(x) = k*ln2 + log(1+f).
409 * = k*ln2_hi+(f-(hfsq-(s*(hfsq+R)+k*ln2_lo)))
410 * Here ln2 is split into two floating point number:
411 * ln2_hi + ln2_lo,
412 * where n*ln2_hi is always exact for |n| < 2000.
413 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
414 log10
415 ~~~~~
416 * Method :
417 * Let log10_2hi = leading 40 bits of log10(2) and
418 * log10_2lo = log10(2) - log10_2hi,
419 * ivln10 = 1/log(10) rounded.
420 * Then
421 * n = ilogb(x),
422 * if(n<0) n = n+1;
423 * x = scalbn(x,-n);
424 * log10(x) := n*log10_2hi + (n*log10_2lo + ivln10*log(x))
425 *
426 * Note 1:
427 * To guarantee log10(10**n)=n, where 10**n is normal, the rounding
428 * mode must set to Round-to-Nearest.
429 * Note 2:
430 * [1/log(10)] rounded to 53 bits has error .198 ulps;
431 * log10 is monotonic at all binary break points.
432 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
433 pow
434 ~~~
435 * Method: Let x = 2 * (1+f)
436 * 1. Compute and return log2(x) in two pieces:
437 * log2(x) = w1 + w2,
438 * where w1 has 53-24 = 29 bit trailing zeros.
439 * 2. Perform y*log2(x) = n+y' by simulating muti-precision
440 * arithmetic, where |y'|<=0.5.
441 * 3. Return x**y = 2**n*exp(y'*log2)
442 *
443 * Special cases:
444 * 1. (anything) ** 0 is 1
445 * 2. (anything) ** 1 is itself
446 * 3. (anything) ** NAN is NAN
447 * 4. NAN ** (anything except 0) is NAN
448 * 5. +-(|x| > 1) ** +INF is +INF
449 * 6. +-(|x| > 1) ** -INF is +0
450 * 7. +-(|x| < 1) ** +INF is +0
451 * 8. +-(|x| < 1) ** -INF is +INF
452 * 9. +-1 ** +-INF is NAN
453 * 10. +0 ** (+anything except 0, NAN) is +0
454 * 11. -0 ** (+anything except 0, NAN, odd integer) is +0
455 * 12. +0 ** (-anything except 0, NAN) is +INF
456 * 13. -0 ** (-anything except 0, NAN, odd integer) is +INF
457 * 14. -0 ** (odd integer) = -( +0 ** (odd integer) )
458 * 15. +INF ** (+anything except 0,NAN) is +INF
459 * 16. +INF ** (-anything except 0,NAN) is +0
460 * 17. -INF ** (anything) = -0 ** (-anything)
461 * 18. (-anything) ** (integer) is (-1)**(integer)*(+anything**integer)
462 * 19. (-anything except 0 and inf) ** (non-integer) is NAN
463 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
464 rem_pio2 return the remainder of x rem pi/2 in y[0]+y[1]
465 ~~~~~~~~
466 This is one of the basic functions which is written with highest accuracy
467 in mind.
468 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
469 sinh
470 ~~~~
471 * Method :
472 * mathematically sinh(x) if defined to be (exp(x)-exp(-x))/2
473 * 1. Replace x by |x| (sinh(-x) = -sinh(x)).
474 * 2.
475 * E + E/(E+1)
476 * 0 <= x <= 22 : sinh(x) := --------------, E=expm1(x)
477 * 2
478 *
479 * 22 <= x <= lnovft : sinh(x) := exp(x)/2
480 * lnovft <= x <= ln2ovft: sinh(x) := exp(x/2)/2 * exp(x/2)
481 * ln2ovft < x : sinh(x) := x*shuge (overflow)
482 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
483 sqrt
484 ~~~~
485 * Method:
486 * Bit by bit method using integer arithmetic. (Slow, but portable)
487 * 1. Normalization
488 * Scale x to y in [1,4) with even powers of 2:
489 * find an integer k such that 1 <= (y=x*2^(2k)) < 4, then
490 * sqrt(x) = 2^k * sqrt(y)
491 * 2. Bit by bit computation
492 * Let q = sqrt(y) truncated to i bit after binary point (q = 1),
493 * i 0
494 * i+1 2
495 * s = 2*q , and y = 2 * ( y - q ). (1)
496 * i i i i
497 *
498 * To compute q from q , one checks whether
499 * i+1 i
500 *
501 * -(i+1) 2
502 * (q + 2 ) <= y. (2)
503 * i
504 * -(i+1)
505 * If (2) is false, then q = q ; otherwise q = q + 2 .
506 * i+1 i i+1 i
507 *
508 * With some algebric manipulation, it is not difficult to see
509 * that (2) is equivalent to
510 * -(i+1)
511 * s + 2 <= y (3)
512 * i i
513 *
514 * The advantage of (3) is that s and y can be computed by
515 * i i
516 * the following recurrence formula:
517 * if (3) is false
518 *
519 * s = s , y = y ; (4)
520 * i+1 i i+1 i
521 *
522 * otherwise,
523 * -i -(i+1)
524 * s = s + 2 , y = y - s - 2 (5)
525 * i+1 i i+1 i i
526 *
527 * One may easily use induction to prove (4) and (5).
528 * Note. Since the left hand side of (3) contain only i+2 bits,
529 * it does not necessary to do a full (53-bit) comparison
530 * in (3).
531 * 3. Final rounding
532 * After generating the 53 bits result, we compute one more bit.
533 * Together with the remainder, we can decide whether the
534 * result is exact, bigger than 1/2ulp, or less than 1/2ulp
535 * (it will never equal to 1/2ulp).
536 * The rounding mode can be detected by checking whether
537 * huge + tiny is equal to huge, and whether huge - tiny is
538 * equal to huge for some floating point number "huge" and "tiny".
539 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
540 cos
541 ~~~
542 * kernel cos function on [-pi/4, pi/4], pi/4 ~ 0.785398164
543 * Input x is assumed to be bounded by ~pi/4 in magnitude.
544 * Input y is the tail of x.
545 *
546 * Algorithm
547 * 1. Since cos(-x) = cos(x), we need only to consider positive x.
548 * 2. if x < 2^-27 (hx<0x3e400000 0), return 1 with inexact if x!=0.
549 * 3. cos(x) is approximated by a polynomial of degree 14 on
550 * [0,pi/4]
551 * 4 14
552 * cos(x) ~ 1 - x*x/2 + C1*x + ... + C6*x
553 * where the remez error is
554 *
555 * | 2 4 6 8 10 12 14 | -58
556 * |cos(x)-(1-.5*x +C1*x +C2*x +C3*x +C4*x +C5*x +C6*x )| <= 2
557 * | |
558 *
559 * 4 6 8 10 12 14
560 * 4. let r = C1*x +C2*x +C3*x +C4*x +C5*x +C6*x , then
561 * cos(x) = 1 - x*x/2 + r
562 * since cos(x+y) ~ cos(x) - sin(x)*y
563 * ~ cos(x) - x*y,
564 * a correction term is necessary in cos(x) and hence
565 * cos(x+y) = 1 - (x*x/2 - (r - x*y))
566 * For better accuracy when x > 0.3, let qx = |x|/4 with
567 * the last 32 bits mask off, and if x > 0.78125, let qx = 0.28125.
568 * Then
569 * cos(x+y) = (1-qx) - ((x*x/2-qx) - (r-x*y)).
570 * Note that 1-qx and (x*x/2-qx) is EXACT here, and the
571 * magnitude of the latter is at least a quarter of x*x/2,
572 * thus, reducing the rounding error in the subtraction.
573 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
574 sin
575 ~~~
576 * kernel sin function on [-pi/4, pi/4], pi/4 ~ 0.7854
577 * Input x is assumed to be bounded by ~pi/4 in magnitude.
578 * Input y is the tail of x.
579 * Input iy indicates whether y is 0. (if iy=0, y assume to be 0).
580 *
581 * Algorithm
582 * 1. Since sin(-x) = -sin(x), we need only to consider positive x.
583 * 2. if x < 2^-27 (hx<0x3e400000 0), return x with inexact if x!=0.
584 * 3. sin(x) is approximated by a polynomial of degree 13 on
585 * [0,pi/4]
586 * 3 13
587 * sin(x) ~ x + S1*x + ... + S6*x
588 * where
589 *
590 * |sin(x) 2 4 6 8 10 12 | -58
591 * |----- - (1+S1*x +S2*x +S3*x +S4*x +S5*x +S6*x )| <= 2
592 * | x |
593 *
594 * 4. sin(x+y) = sin(x) + sin'(x')*y
595 * ~ sin(x) + (1-x*x/2)*y
596 * For better accuracy, let
597 * 3 2 2 2 2
598 * r = x *(S2+x *(S3+x *(S4+x *(S5+x *S6))))
599 * then 3 2
600 * sin(x) = x + (S1*x + (x *(r-y/2)+y))
601 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
602 tan
603 ~~~
604 * kernel tan function on [-pi/4, pi/4], pi/4 ~ 0.7854
605 * Input x is assumed to be bounded by ~pi/4 in magnitude.
606 * Input y is the tail of x.
607 * Input k indicates whether tan (if k=1) or
608 * -1/tan (if k= -1) is returned.
609 *
610 * Algorithm
611 * 1. Since tan(-x) = -tan(x), we need only to consider positive x.
612 * 2. if x < 2^-28 (hx<0x3e300000 0), return x with inexact if x!=0.
613 * 3. tan(x) is approximated by a odd polynomial of degree 27 on
614 * [0,0.67434]
615 * 3 27
616 * tan(x) ~ x + T1*x + ... + T13*x
617 * where
618 *
619 * |tan(x) 2 4 26 | -59.2
620 * |----- - (1+T1*x +T2*x +.... +T13*x )| <= 2
621 * | x |
622 *
623 * Note: tan(x+y) = tan(x) + tan'(x)*y
624 * ~ tan(x) + (1+x*x)*y
625 * Therefore, for better accuracy in computing tan(x+y), let
626 * 3 2 2 2 2
627 * r = x *(T2+x *(T3+x *(...+x *(T12+x *T13))))
628 * then
629 * 3 2
630 * tan(x+y) = x + (T1*x + (x *(r+y)+y))
631 *
632 * 4. For x in [0.67434,pi/4], let y = pi/4 - x, then
633 * tan(x) = tan(pi/4-y) = (1-tan(y))/(1+tan(y))
634 * = 1 - 2*(tan(y) - (tan(y)^2)/(1+tan(y)))
635 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
636 atan
637 ~~~~
638 * Method
639 * 1. Reduce x to positive by atan(x) = -atan(-x).
640 * 2. According to the integer k=4t+0.25 chopped, t=x, the argument
641 * is further reduced to one of the following intervals and the
642 * arctangent of t is evaluated by the corresponding formula:
643 *
644 * [0,7/16] atan(x) = t-t^3*(a1+t^2*(a2+...(a10+t^2*a11)...)
645 * [7/16,11/16] atan(x) = atan(1/2) + atan( (t-0.5)/(1+t/2) )
646 * [11/16.19/16] atan(x) = atan( 1 ) + atan( (t-1)/(1+t) )
647 * [19/16,39/16] atan(x) = atan(3/2) + atan( (t-1.5)/(1+1.5t) )
648 * [39/16,INF] atan(x) = atan(INF) + atan( -1/t )
649 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
650 erf
651 ~~~
652 * x
653 * 2 |\
654 * erf(x) = --------- | exp(-t*t)dt
655 * sqrt(pi) \|
656 * 0
657 *
658 * erfc(x) = 1-erf(x)
659 * Note that
660 * erf(-x) = -erf(x)
661 * erfc(-x) = 2 - erfc(x)
662 *
663 * Method:
664 * 1. For |x| in [0, 0.84375]
665 * erf(x) = x + x*R(x^2)
666 * erfc(x) = 1 - erf(x) if x in [-.84375,0.25]
667 * = 0.5 + ((0.5-x)-x*R) if x in [0.25,0.84375]
668 * where R = P/Q where P is an odd poly of degree 8 and
669 * Q is an odd poly of degree 10.
670 * -57.90
671 * | R - (erf(x)-x)/x | <= 2
672 *
673 *
674 * Remark. The formula is derived by noting
675 * erf(x) = (2/sqrt(pi))*(x - x^3/3 + x^5/10 - x^7/42 + ....)
676 * and that
677 * 2/sqrt(pi) = 1.128379167095512573896158903121545171688
678 * is close to one. The interval is chosen because the fix
679 * point of erf(x) is near 0.6174 (i.e., erf(x)=x when x is
680 * near 0.6174), and by some experiment, 0.84375 is chosen to
681 * guarantee the error is less than one ulp for erf.
682 *
683 * 2. For |x| in [0.84375,1.25], let s = |x| - 1, and
684 * c = 0.84506291151 rounded to single (24 bits)
685 * erf(x) = sign(x) * (c + P1(s)/Q1(s))
686 * erfc(x) = (1-c) - P1(s)/Q1(s) if x > 0
687 * 1+(c+P1(s)/Q1(s)) if x < 0
688 * |P1/Q1 - (erf(|x|)-c)| <= 2**-59.06
689 * Remark: here we use the taylor series expansion at x=1.
690 * erf(1+s) = erf(1) + s*Poly(s)
691 * = 0.845.. + P1(s)/Q1(s)
692 * That is, we use rational approximation to approximate
693 * erf(1+s) - (c = (single)0.84506291151)
694 * Note that |P1/Q1|< 0.078 for x in [0.84375,1.25]
695 * where
696 * P1(s) = degree 6 poly in s
697 * Q1(s) = degree 6 poly in s
698 *
699 * 3. For x in [1.25,1/0.35(~2.857143)],
700 * erfc(x) = (1/x)*exp(-x*x-0.5625+R1/S1)
701 * erf(x) = 1 - erfc(x)
702 * where
703 * R1(z) = degree 7 poly in z, (z=1/x^2)
704 * S1(z) = degree 8 poly in z
705 *
706 * 4. For x in [1/0.35,28]
707 * erfc(x) = (1/x)*exp(-x*x-0.5625+R2/S2) if x > 0
708 * = 2.0 - (1/x)*exp(-x*x-0.5625+R2/S2) if -6<x<0
709 * = 2.0 - tiny (if x <= -6)
710 * erf(x) = sign(x)*(1.0 - erfc(x)) if x < 6, else
711 * erf(x) = sign(x)*(1.0 - tiny)
712 * where
713 * R2(z) = degree 6 poly in z, (z=1/x^2)
714 * S2(z) = degree 7 poly in z
715 *
716 * Note1:
717 * To compute exp(-x*x-0.5625+R/S), let s be a single
718 * precision number and s := x; then
719 * -x*x = -s*s + (s-x)*(s+x)
720 * exp(-x*x-0.5626+R/S) =
721 * exp(-s*s-0.5625)*exp((s-x)*(s+x)+R/S);
722 * Note2:
723 * Here 4 and 5 make use of the asymptotic series
724 * exp(-x*x)
725 * erfc(x) ~ ---------- * ( 1 + Poly(1/x^2) )
726 * x*sqrt(pi)
727 * We use rational approximation to approximate
728 * g(s)=f(1/x^2) = log(erfc(x)*x) - x*x + 0.5625
729 * Here is the error bound for R1/S1 and R2/S2
730 * |R1/S1 - f(x)| < 2**(-62.57)
731 * |R2/S2 - f(x)| < 2**(-61.52)
732 *
733 * 5. For inf > x >= 28
734 * erf(x) = sign(x) *(1 - tiny) (raise inexact)
735 * erfc(x) = tiny*tiny (raise underflow) if x > 0
736 * = 2 - tiny if x<0
737 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
738 expm1 Returns exp(x)-1, the exponential of x minus 1
739 ~~~~~
740 * Method
741 * 1. Argument reduction:
742 * Given x, find r and integer k such that
743 *
744 * x = k*ln2 + r, |r| <= 0.5*ln2 ~ 0.34658
745 *
746 * Here a correction term c will be computed to compensate
747 * the error in r when rounded to a floating-point number.
748 *
749 * 2. Approximating expm1(r) by a special rational function on
750 * the interval [0,0.34658]:
751 * Since
752 * r*(exp(r)+1)/(exp(r)-1) = 2+ r^2/6 - r^4/360 + ...
753 * we define R1(r*r) by
754 * r*(exp(r)+1)/(exp(r)-1) = 2+ r^2/6 * R1(r*r)
755 * That is,
756 * R1(r**2) = 6/r *((exp(r)+1)/(exp(r)-1) - 2/r)
757 * = 6/r * ( 1 + 2.0*(1/(exp(r)-1) - 1/r))
758 * = 1 - r^2/60 + r^4/2520 - r^6/100800 + ...
759 * We use a special Reme algorithm on [0,0.347] to generate
760 * a polynomial of degree 5 in r*r to approximate R1. The
761 * maximum error of this polynomial approximation is bounded
762 * by 2**-61. In other words,
763 * R1(z) ~ 1.0 + Q1*z + Q2*z**2 + Q3*z**3 + Q4*z**4 + Q5*z**5
764 * where Q1 = -1.6666666666666567384E-2,
765 * Q2 = 3.9682539681370365873E-4,
766 * Q3 = -9.9206344733435987357E-6,
767 * Q4 = 2.5051361420808517002E-7,
768 * Q5 = -6.2843505682382617102E-9;
769 * (where z=r*r, and the values of Q1 to Q5 are listed below)
770 * with error bounded by
771 * | 5 | -61
772 * | 1.0+Q1*z+...+Q5*z - R1(z) | <= 2
773 * | |
774 *
775 * expm1(r) = exp(r)-1 is then computed by the following
776 * specific way which minimize the accumulation rounding error:
777 * 2 3
778 * r r [ 3 - (R1 + R1*r/2) ]
779 * expm1(r) = r + --- + --- * [--------------------]
780 * 2 2 [ 6 - r*(3 - R1*r/2) ]
781 *
782 * To compensate the error in the argument reduction, we use
783 * expm1(r+c) = expm1(r) + c + expm1(r)*c
784 * ~ expm1(r) + c + r*c
785 * Thus c+r*c will be added in as the correction terms for
786 * expm1(r+c). Now rearrange the term to avoid optimization
787 * screw up:
788 * ( 2 2 )
789 * ({ ( r [ R1 - (3 - R1*r/2) ] ) } r )
790 * expm1(r+c)~r - ({r*(--- * [--------------------]-c)-c} - --- )
791 * ({ ( 2 [ 6 - r*(3 - R1*r/2) ] ) } 2 )
792 * ( )
793 *
794 * = r - E
795 * 3. Scale back to obtain expm1(x):
796 * From step 1, we have
797 * expm1(x) = either 2^k*[expm1(r)+1] - 1
798 * = or 2^k*[expm1(r) + (1-2^-k)]
799 * 4. Implementation notes:
800 * (A). To save one multiplication, we scale the coefficient Qi
801 * to Qi*2^i, and replace z by (x^2)/2.
802 * (B). To achieve maximum accuracy, we compute expm1(x) by
803 * (i) if x < -56*ln2, return -1.0, (raise inexact if x!=inf)
804 * (ii) if k=0, return r-E
805 * (iii) if k=-1, return 0.5*(r-E)-0.5
806 * (iv) if k=1 if r < -0.25, return 2*((r+0.5)- E)
807 * else return 1.0+2.0*(r-E);
808 * (v) if (k<-2||k>56) return 2^k(1-(E-r)) - 1 (or exp(x)-1)
809 * (vi) if k <= 20, return 2^k((1-2^-k)-(E-r)), else
810 * (vii) return 2^k(1-((E+2^-k)-r))
811 *
812 * Special cases:
813 * expm1(INF) is INF, expm1(NaN) is NaN;
814 * expm1(-INF) is -1, and
815 * for finite argument, only expm1(0)=0 is exact.
816 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
817 log1p
818 ~~~~~
819 * Method :
820 * 1. Argument Reduction: find k and f such that
821 * 1+x = 2^k * (1+f),
822 * where sqrt(2)/2 < 1+f < sqrt(2) .
823 *
824 * Note. If k=0, then f=x is exact. However, if k!=0, then f
825 * may not be representable exactly. In that case, a correction
826 * term is need. Let u=1+x rounded. Let c = (1+x)-u, then
827 * log(1+x) - log(u) ~ c/u. Thus, we proceed to compute log(u),
828 * and add back the correction term c/u.
829 * (Note: when x > 2**53, one can simply return log(x))
830 *
831 * 2. Approximation of log1p(f).
832 * Let s = f/(2+f) ; based on log(1+f) = log(1+s) - log(1-s)
833 * = 2s + 2/3 s**3 + 2/5 s**5 + .....,
834 * = 2s + s*R
835 * We use a special Reme algorithm on [0,0.1716] to generate
836 * a polynomial of degree 14 to approximate R The maximum error
837 * of this polynomial approximation is bounded by 2**-58.45. In
838 * other words,
839 * 2 4 6 8 10 12 14
840 * R(z) ~ Lp1*s +Lp2*s +Lp3*s +Lp4*s +Lp5*s +Lp6*s +Lp7*s
841 * (the values of Lp1 to Lp7 are listed in the program)
842 * and
843 * | 2 14 | -58.45
844 * | Lp1*s +...+Lp7*s - R(z) | <= 2
845 * | |
846 * Note that 2s = f - s*f = f - hfsq + s*hfsq, where hfsq = f*f/2.
847 * In order to guarantee error in log below 1ulp, we compute log
848 * by
849 * log1p(f) = f - (hfsq - s*(hfsq+R)).
850 *
851 * 3. Finally, log1p(x) = k*ln2 + log1p(f).
852 * = k*ln2_hi+(f-(hfsq-(s*(hfsq+R)+k*ln2_lo)))
853 * Here ln2 is split into two floating point number:
854 * ln2_hi + ln2_lo,
855 * where n*ln2_hi is always exact for |n| < 2000.
856 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~