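When sending messages to clients, ctdb checks for the EAGAIN error code and
schedules the next write for a subsequent event loop iteration. Using
sys_write in these places causes ctdb to loop hard until a client is able
to read from the socket: sys_write itself retries on EAGAIN/EWOULDBLOCK,
so the caller never sees EAGAIN and never backs off. With real-time
scheduling, the ctdb daemon spins, consuming 100% of a CPU while trying to
write to the client sockets. This can be quite harmful when running under
VMs or on machines with a single CPU, where the spinning daemon can starve
the very client it is waiting for.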
This regression was introduced when all read/write calls were replaced
with the sys_read/sys_write wrappers (commit
c1558adeaa980fb4bd6177d36250ec8262e9b9fe).
The existing code backs off on EAGAIN failures and waits for the event
loop to process the write again. This should give ctdb clients a chance
to get scheduled and to read from the ctdb socket.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Feb 24 12:29:30 CET 2015 on sn-devel-104
(cherry picked from commit 04a061e4d19d5bdbd8179fb0fab8b0875eec243e)
BUG: https://bugzilla.samba.org/show_bug.cgi?id=11124
CTDB daemon is non responsive and consuming 100% CPU
struct ctdb_queue_pkt *pkt = queue->out_queue;
ssize_t n;
if (queue->ctdb->flags & CTDB_FLAG_TORTURE) {
- n = sys_write(queue->fd, pkt->data, 1);
+ n = write(queue->fd, pkt->data, 1);
} else {
- n = sys_write(queue->fd, pkt->data, pkt->length);
+ n = write(queue->fd, pkt->data, pkt->length);
}
if (n == -1 && errno != EAGAIN && errno != EWOULDBLOCK) {
queue overhead. This relies on non-blocking sockets */
if (queue->out_queue == NULL && queue->fd != -1 &&
!(queue->ctdb->flags & CTDB_FLAG_TORTURE)) {
- ssize_t n = sys_write(queue->fd, data, length2);
+ ssize_t n = write(queue->fd, data, length2);
if (n == -1 && errno != EAGAIN && errno != EWOULDBLOCK) {
talloc_free(queue->fde);
queue->fde = NULL;