For large arrays the capacity growth was linear, so a long run of
single-element appends (+1) would take quadratic time in total. I have
disliked that for a long time, and now I finally have a use case where
it makes a large difference.
The growth strategy from GCC looks good to me (in theory), and it has
surely seen lots of practical deployment.
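
To illustrate why linear growth hurts, here is a minimal sketch that counts how many elements get moved when pushing one element at a time. It is not the code from this change and not GCC's actual formula: the step of 64, the roughly 1.5x factor, and all function names are assumptions made up for the example, and every reallocation is pessimistically assumed to move the whole array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy: grow capacity by a fixed step (linear growth). */
static size_t next_cap_linear(size_t cap)
{
	return cap + 64;
}

/* Hypothetical policy: grow capacity by roughly 1.5x (geometric growth). */
static size_t next_cap_geometric(size_t cap)
{
	return cap + cap / 2 + 1;
}

/* Count the total number of element moves when n elements are pushed
 * one by one, assuming each reallocation moves the whole array. */
static size_t copies_for_pushes(size_t n, size_t (*next_cap)(size_t))
{
	assert(next_cap != NULL);
	size_t cap = 0, len = 0, copies = 0;
	while (len < n) {
		if (len == cap) {
			copies += len; /* worst case: realloc moves everything */
			cap = next_cap(cap);
		}
		++len;
	}
	return copies;
}
```

With the fixed step, the number of reallocations is proportional to n and each one moves up to n elements, hence the quadratic total. With the geometric policy the moves form a geometric series, so the total stays below 3n in this sketch. For small arrays the fixed step can actually move fewer elements, which is why the difference only shows up at large sizes.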
CI scan-build: I still have no idea what causes these array allocation
errors; I have given up and believe they are false alarms.