Commit
27346b01a ("OPTIM: tools: optimize my_ffsl() for x86_64") optimized
my_ffsl() for intensive use cases in the scheduler, but, as happens half
of the time, I got it wrong, so it counted bits in the reverse order. This
doesn't matter for the scheduler nor the FD cache, but it broke cpu-map
with threads, which heavily relies on proper bit ordering.
We should probably consider dropping support for gcc < 3.4 and switching
to the builtins for these functions, even though they are often just as
ambiguous.
No backport is needed.
unsigned long cnt;
#if defined(__x86_64__)
- __asm__("bsr %1,%0\n" : "=r" (cnt) : "rm" (a));
+ __asm__("bsf %1,%0\n" : "=r" (cnt) : "rm" (a));
cnt++;
#else