I was doing some performance testing on our 32-way servers and found that in the simple case, where the test program returns the data straight away, about 80% of the time was being spent in the kernel in a spinlock. While debugging this in the coprocess code, I traced it to the following line in coprocess.cc:
setbuf(d_fp,0); // no buffering please, confuses select
If this line is removed, performance in my particular test case goes from 2000 qps with PowerDNS running at about 2000% CPU to 10000 qps with PowerDNS using about 300% CPU.
The comment implies that simply removing the line is not a permanent solution. I guess that if the timeout is specified as 0, the select code won't be executed, so the setbuf call could easily be skipped in that case. If the timeout is wanted, though, perhaps you could set an alarm() rather than using select?
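
A third possibility, different from both the setbuf removal and the alarm() idea, would be to keep the timeout but stop mixing stdio buffering with readiness checks: read from the raw file descriptor into a private buffer and only wait on the descriptor when that buffer holds no complete line. The following is only a rough sketch of that idea, not the actual coprocess code. The class name TimedLineReader and its interface are hypothetical, and it uses poll() rather than select() purely for brevity.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <poll.h>
#include <unistd.h>

// Hypothetical helper, not part of PowerDNS: reads newline-terminated lines
// from a file descriptor using its own buffer, so the readiness check only
// runs when no complete line is already buffered.
class TimedLineReader {
public:
  explicit TimedLineReader(int fd) : d_fd(fd) { assert(fd >= 0); }

  // Returns 1 and fills 'line' (without the newline) on success,
  // 0 on timeout, -1 on EOF or error. A negative timeoutMs waits forever.
  // Note: the timeout applies to each wait, so it restarts after a partial read.
  int getLine(std::string& line, int timeoutMs) {
    for (;;) {
      // Serve from the private buffer first; no system call needed.
      std::string::size_type pos = d_buf.find('\n');
      if (pos != std::string::npos) {
        line.assign(d_buf, 0, pos);
        d_buf.erase(0, pos + 1);
        return 1;
      }

      // Buffer has no full line: wait for the descriptor to become readable.
      struct pollfd pfd;
      pfd.fd = d_fd;
      pfd.events = POLLIN;
      pfd.revents = 0;
      int ready = poll(&pfd, 1, timeoutMs);
      if (ready < 0) {
        if (errno == EINTR) continue;  // interrupted by a signal, retry
        return -1;
      }
      if (ready == 0) return 0;        // timed out

      // Read a large chunk at once instead of one byte per call.
      char chunk[4096];
      ssize_t got = read(d_fd, chunk, sizeof chunk);
      if (got < 0) {
        if (errno == EINTR) continue;
        return -1;
      }
      if (got == 0) return -1;         // EOF: the other end closed the pipe
      d_buf.append(chunk, static_cast<size_t>(got));
    }
  }

private:
  int d_fd;
  std::string d_buf;
};
```

With something like this, the data the kernel has already handed over is never hidden inside a FILE buffer, so the wait cannot block on a line that has in fact already arrived. Whether this fits the existing coprocess design is something I have not checked.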