Rely on (X + (Y - 1)) / Y being the same as ceil(X / Y) when operating on
non-negative integers with truncating division (here Y = 3).
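
The identity can be checked by brute force over a small range. The sketch
below is illustrative only and not part of the patch; the helper names
ceil_div_ref(), ceil_div_add() and ceil_div_forms_agree() are made up for
this example, and it assumes X + (Y - 1) does not wrap around.

```c
#include <stddef.h>

/* Reference form: quotient, plus one if there is a remainder. */
static size_t ceil_div_ref(size_t x, size_t y)
{
	return x / y + (x % y > 0);
}

/* Branch-free form. Assumes x + (y - 1) does not wrap around. */
static size_t ceil_div_add(size_t x, size_t y)
{
	return (x + (y - 1)) / y;
}

/* Returns 1 if both forms agree for all 0 <= x < max_x, 1 <= y < max_y. */
static int ceil_div_forms_agree(size_t max_x, size_t max_y)
{
	for (size_t y = 1; y < max_y; y++)
		for (size_t x = 0; x < max_x; x++)
			if (ceil_div_ref(x, y) != ceil_div_add(x, y))
				return 0;
	return 1;
}
```

Calling ceil_div_forms_agree(1000, 50) is expected to return 1. Values of X
close to the maximum of the type are deliberately left out, since the
addition would wrap there.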
This has several benefits over the previous expression:
1) the size argument is evaluated only once
2) the generated code is simpler (no conditional instructions)
3) the generated code is smaller
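
Point 1 can be illustrated with an argument that has a side effect. This is
only a sketch: the OLD_/NEW_ macro names and the counting helper
input_len() are hypothetical and exist purely for this example; the macro
bodies are copied from the diff.

```c
#include <stddef.h>

/* Expression before this change: expands (size) twice. */
#define OLD_MAX_BASE64_ENCODED_SIZE(size) \
	(((size) / 3 + ((size) % 3 > 0)) * 4)
/* Expression after this change: expands (size) once. */
#define NEW_MAX_BASE64_ENCODED_SIZE(size) \
	((((size) + 2) / 3) * 4)

static unsigned int input_len_calls;

/* Hypothetical size source that counts how often it is called. */
static size_t input_len(void)
{
	input_len_calls++;
	return 10;
}

/* Number of input_len() calls caused by one use of the old macro. */
static unsigned int calls_with_old_macro(void)
{
	input_len_calls = 0;
	(void)OLD_MAX_BASE64_ENCODED_SIZE(input_len());
	return input_len_calls;
}

/* Number of input_len() calls caused by one use of the new macro. */
static unsigned int calls_with_new_macro(void)
{
	input_len_calls = 0;
	(void)NEW_MAX_BASE64_ENCODED_SIZE(input_len());
	return input_len_calls;
}
```

With the old expression input_len() runs twice per use, with the new one
only once, while both yield the same size for the same input.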
The generated code shrinks in terms of both bytes and instruction count.
The following table lists the number of bytes (B) and instructions (I) used
by the code before and after this change on an assortment of architectures
when the input is not known at compile time. Unless otherwise noted, the
results are based on clang 6.0.1 output.

          | before  | after  | delta
----------+---------+--------+-------------
aarch64   | 32B  8I | 24B 6I | -25%B -25%I
amd64     | 38B 10I | 25B 5I | -34%B -50%I
amd64 [1] | 43B 10I | 31B 6I | -28%B -40%I
armv7     | 36B  9I | 24B 6I | -33%B -33%I
i386      | 32B 12I | 20B 6I | -38%B -50%I
i386 [1]  | 35B 11I | 25B 7I | -29%B -36%I
ppc32     | 44B 11I | 20B 5I | -55%B -55%I
ppc64     | 52B 13I | 32B 8I | -38%B -38%I
s390x     | 74B 16I | 26B 5I | -65%B -69%I
sparcv9   | 36B  9I | 12B 3I | -66%B -66%I

[1] gcc 8.2.0

/* max. buffer size required for base64_encode() */
#define MAX_BASE64_ENCODED_SIZE(size) \
- (((size) / 3 + ((size) % 3 > 0)) * 4)
+ ((((size) + 2) / 3) * 4)
/* max. buffer size required for base64_decode() */
#define MAX_BASE64_DECODED_SIZE(size) \
((size) / 4 * 3 + 3)