Default curl unity builds make a single unit for each target. It means
all target sources are batched together and built in a single compiler
invocation. With multi-core CPUs this doesn't always result in the best
possible performance. This patch enables smaller batches for jobs where
this resulted in shorter build times. These jobs are Cygwin, MSYS2,
MinGW, running on the Windows runners.
Use batch of 30 (meaning 30 sources batched into units), and 32 for
Cygwin/MSYS2 to avoid a unity fallout that's subject to a different PR.
(CMake allows to set the number of sources per unit, not the number
of units, though the latter may be more practical to max out CPU cores.)
Also override to not batch the `curlu` target because batching lost
a little bit of time there, due to the already existing parallelism when
building the `testdeps` targets.
For jobs on the macOS and Linux runners jobs were already mostly single
digit or below teen seconds, and batching didn't improve on them
noticeably. On VM jobs, the virtual CPUs are limited, so I didn't
make a try. In AppVeyor and GHA vcpkg jobs (using msbuild), batching
didn't result in conclusive or any gains.
Build times in seconds (curl + testdeps):
Job | Before | After w curlu=0 | Gain
:--------------------| :-------------- | :-------------- | :---
cygwin, CM | 19 + 32 = 51 | 12 + 32 = 44 | 7
msys2, CM | 7 + 15 = 22 | 5 + 14 = 19 | 3
mingw gcc U, CM | 19 + 30 = 49 | 13 + 32 = 45 | 4
mingw ucrt, CM | 32 + 42 = 74 | 15 + 43 = 58 | 16
mingw clang, CM | 15 + 21 = 36 | 8 + 21 = 29 | 7
mingw uwp, CM | 30 + 40 = 70 | 14 + 40 = 54 | 16
mingw gcc, CM | 20 + 31 = 51 | 12 + 31 = 43 | 8
mingw x86, CM | 35 + 40 = 75 | 15 + 38 = 53 | 22
dl-mingw, CM 9.5.0 | 88 + 99 = 187 | 42 + 101 = 143 | 44
dl-mingw, CM 7.3.0 U | 24 + 32 = 56 | 17 + 35 = 52 | 4
Total | | | 131
Total gain per GHA/windows workflow runs: 2m11s
Runs:
Before: https://github.com/curl/curl/actions/runs/
13220256084/job/
36904342259
After: https://github.com/curl/curl/actions/runs/
13220383702/job/
36904602981
https://github.com/curl/curl/actions/runs/
13220613141/job/
36905170104
https://github.com/curl/curl/actions/runs/
13222019443/job/
36908358550
With curlu tweak: https://github.com/curl/curl/actions/runs/
13222239255/job/
36908782462
Ref:
116950a25066257f86461f9d1dfa5f787f55e73c #16265
Closes #16272
if [ '${{ matrix.build }}' = 'cmake' ]; then
PATH="/usr/bin:$(cygpath "${SYSTEMROOT}")/System32"
cmake -B bld -G Ninja ${options} \
- -DCMAKE_UNITY_BUILD=ON -DCURL_TEST_BUNDLES=ON \
+ -DCMAKE_UNITY_BUILD=ON -DCMAKE_UNITY_BUILD_BATCH_SIZE=32 -DCURL_TEST_BUNDLES=ON \
-DCURL_WERROR=ON \
${{ matrix.config }}
else
cmake -B bld -G Ninja ${options} \
-DCMAKE_C_FLAGS="${{ matrix.cflags }} ${CFLAGS_CMAKE} ${CPPFLAGS}" \
-DCMAKE_BUILD_TYPE='${{ matrix.type }}' \
- -DCMAKE_UNITY_BUILD=ON -DCURL_TEST_BUNDLES=ON \
+ -DCMAKE_UNITY_BUILD=ON -DCMAKE_UNITY_BUILD_BATCH_SIZE=32 -DCURL_TEST_BUNDLES=ON \
-DCURL_WERROR=ON \
${{ matrix.config }}
else
cmake -B bld -G 'MSYS Makefiles' ${options} \
-DCMAKE_C_COMPILER=gcc \
-DCMAKE_BUILD_TYPE='${{ matrix.type }}' \
- -DCMAKE_UNITY_BUILD=ON -DCURL_TEST_BUNDLES=ON \
+ -DCMAKE_UNITY_BUILD=ON -DCMAKE_UNITY_BUILD_BATCH_SIZE=30 -DCURL_TEST_BUNDLES=ON \
-DCURL_WERROR=ON \
-DCURL_USE_LIBPSL=OFF \
${{ matrix.config }}
)
target_compile_definitions(curlu PUBLIC "UNITTESTS" "CURL_STATICLIB")
target_link_libraries(curlu PRIVATE ${CURL_LIBS})
+ # There is plenty of parallelism when building the testdeps target.
+ # Override the curlu batch size with the maximum to optimize performance.
+ set_target_properties(curlu PROPERTIES UNITY_BUILD_BATCH_SIZE 0)
endif()
if(ENABLE_CURLDEBUG)