From: Christian Brauner
Date: Wed, 18 Oct 2017 12:19:31 +0000 (+0200)
Subject: ringbuf: implement simple and efficient ringbuffer
X-Git-Tag: lxc-3.0.0.beta1~210^2~8
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=f3d05ee66dbf0e8283a2eeb6321e1bc7dfcb3034;p=thirdparty%2Flxc.git

ringbuf: implement simple and efficient ringbuffer

liblxc will use a ringbuffer implementation that employs mmap()ed memory.
Specifically, the ringbuffer will create an anonymous memory mapping twice the
requested size for the ringbuffer. Afterwards, an in-memory file of the
requested size for the ringbuffer will be created. This in-memory file will
then be memory mapped twice into the previously established anonymous memory
mapping, thereby effectively splitting the anonymous memory mapping into two
halves of equal size. This will allow the ringbuffer to get rid of any complex
boundary and wrap-around calculation logic. Since the underlying physical
memory is the same in both halves of the memory mapping, each read from and
each write to the ringbuffer needs only a single memcpy() call.

Design Notes:
- Since we're using MAP_FIXED memory mappings to map the same in-memory file
  twice into the anonymous memory mapping, the kernel requires us to always
  operate on properly aligned pages. To guarantee proper page alignment the
  size of the ringbuffer must always be a multiple of the kernel's page size.
  This also implies that the size of the ringbuffer must be at least one page.
  This additional requirement is reasonably unproblematic: any ringbuffer
  smaller than a single page is very likely useless since the standard page
  size on Linux is 4096 bytes.
- Because liblxc is not able to predict the output a user is going to produce
  (e.g. users could cat binary files onto the console), and because the
  ringbuffer is located in a hotpath and needs to be as performant as
  possible, liblxc will not parse the buffer.
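
The double mapping described above can be sketched in isolation. This is a minimal illustration and not the liblxc code: map_twice() and mirror_demo() are hypothetical names, the tmpfile fallback is left out, and a Linux system whose C library provides memfd_create() is assumed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (not part of liblxc): map the same `size` bytes twice,
 * back to back. `size` is assumed to be a multiple of the page size.
 * Returns the base address of the 2 * size region, or NULL on failure.
 */
static char *map_twice(size_t size)
{
	char *base, *tmp;
	int fd;

	/* Reserve 2 * size bytes of contiguous, inaccessible address space. */
	base = mmap(NULL, size * 2, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (base == MAP_FAILED)
		return NULL;

	/* The in-memory file that backs both halves. */
	fd = memfd_create("ringbuf_sketch", MFD_CLOEXEC);
	if (fd < 0)
		goto on_error;

	if (ftruncate(fd, (off_t)size) < 0)
		goto on_error;

	/* First half: file offset 0 at base. */
	tmp = mmap(base, size, PROT_READ | PROT_WRITE,
		   MAP_FIXED | MAP_SHARED, fd, 0);
	if (tmp == MAP_FAILED)
		goto on_error;

	/* Second half: the same file offset 0 at base + size. */
	tmp = mmap(base + size, size, PROT_READ | PROT_WRITE,
		   MAP_FIXED | MAP_SHARED, fd, 0);
	if (tmp == MAP_FAILED)
		goto on_error;

	/* The mappings keep the file alive; the descriptor is not needed. */
	close(fd);
	return base;

on_error:
	if (fd >= 0)
		close(fd);
	munmap(base, size * 2);
	return NULL;
}

/* Copy six bytes across the boundary between the two halves with a single
 * memcpy() and check that the tail shows up at the start of the buffer.
 */
static void mirror_demo(void)
{
	size_t size = (size_t)sysconf(_SC_PAGESIZE);
	char *buf = map_twice(size);

	assert(buf != NULL);
	memcpy(buf + size - 3, "abcdef", 6);
	assert(memcmp(buf, "def", 3) == 0);
	assert(memcmp(buf + size - 3, "abcdef", 6) == 0);
	munmap(buf, size * 2);
}
```

Calling mirror_demo() from a test program exercises the property the commit message relies on: a copy that runs past the end of the first half lands in the same physical pages as the start of the buffer, so no split copy is needed.
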
Use Case:
The ringbuffer is needed by liblxc in order to safely log the output of
write-intensive callers that produce unpredictable output or unpredictable
amounts of output. The console output created by a booting system and the user
is one of those cases. If a container were allowed to log the console's output
to a file, a malicious user could fill up the host filesystem by producing
random output on the container's console whenever quota support is either not
enabled or not available for the underlying filesystem. Using a ringbuffer is a
reliable and secure way to ensure a fixed-size log.

Closes #1857.

Signed-off-by: Christian Brauner
---

diff --git a/src/lxc/Makefile.am b/src/lxc/Makefile.am
index b71992d75..fff32ae4f 100644
--- a/src/lxc/Makefile.am
+++ b/src/lxc/Makefile.am
@@ -116,6 +116,7 @@ liblxc_la_SOURCES = \
 	log.c log.h \
 	attach.c attach.h \
 	criu.c criu.h \
+	ringbuf.c ringbuf.h \
 	\
 	network.c network.h \
 	nl.c nl.h \
diff --git a/src/lxc/ringbuf.c b/src/lxc/ringbuf.c
new file mode 100644
index 000000000..1299fe709
--- /dev/null
+++ b/src/lxc/ringbuf.c
@@ -0,0 +1,145 @@
+/* liblxcapi
+ *
+ * Copyright © 2017 Christian Brauner .
+ * Copyright © 2017 Canonical Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#define _GNU_SOURCE
+#define __STDC_FORMAT_MACROS
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "ringbuf.h"
+#include "utils.h"
+
+int lxc_ringbuf_create(struct lxc_ringbuf *buf, size_t size)
+{
+	char *tmp;
+	int ret;
+	int memfd = -1;
+
+	buf->size = size;
+	buf->r_off = 0;
+	buf->w_off = 0;
+
+	/* verify that we are at least given the multiple of a page size */
+	if (buf->size % lxc_getpagesize())
+		return -EINVAL;
+
+	buf->addr = mmap(NULL, buf->size * 2, PROT_NONE,
+			 MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	if (buf->addr == MAP_FAILED)
+		return -EINVAL;
+
+	memfd = memfd_create(".lxc_ringbuf", MFD_CLOEXEC);
+	if (memfd < 0) {
+		if (errno != ENOSYS)
+			goto on_error;
+
+		memfd = lxc_make_tmpfile((char *){P_tmpdir"/.lxc_ringbuf_XXXXXX"}, true);
+	}
+	if (memfd < 0)
+		goto on_error;
+
+	ret = ftruncate(memfd, buf->size);
+	if (ret < 0)
+		goto on_error;
+
+	tmp = mmap(buf->addr, buf->size, PROT_READ | PROT_WRITE,
+		   MAP_FIXED | MAP_SHARED, memfd, 0);
+	if (tmp == MAP_FAILED || tmp != buf->addr)
+		goto on_error;
+
+	tmp = mmap(buf->addr + buf->size, buf->size, PROT_READ | PROT_WRITE,
+		   MAP_FIXED | MAP_SHARED, memfd, 0);
+	if (tmp == MAP_FAILED || tmp != (buf->addr + buf->size))
+		goto on_error;
+
+	close(memfd);
+
+	return 0;
+
+on_error:
+	lxc_ringbuf_release(buf);
+	if (memfd >= 0)
+		close(memfd);
+	return -1;
+}
+
+void lxc_ringbuf_move_read_addr(struct lxc_ringbuf *buf, size_t len)
+{
+	buf->r_off += len;
+
+	if (buf->r_off < buf->size)
+		return;
+
+	/* wrap around */
+	buf->r_off -= buf->size;
+	buf->w_off -= buf->size;
+}
+
+/**
+ * lxc_ringbuf_write - write a message to the ringbuffer
+ * - The size of the message should never be greater than the size of the whole
+ *   ringbuffer.
+ * - The write method will always succeed i.e. it will always advance the r_off
+ *   if it detects that there's not enough space available to write the
+ *   message.
+ */
+int lxc_ringbuf_write(struct lxc_ringbuf *buf, const char *msg, size_t len)
+{
+	char *w_addr;
+	uint64_t free;
+
+	/* sanity check: a write should never exceed the ringbuffer's total size */
+	if (len > buf->size)
+		return -EFBIG;
+
+	free = lxc_ringbuf_free(buf);
+
+	/* not enough space left so advance read address */
+	if (len > free)
+		lxc_ringbuf_move_read_addr(buf, len);
+	w_addr = lxc_ringbuf_get_write_addr(buf);
+	memcpy(w_addr, msg, len);
+	lxc_ringbuf_move_write_addr(buf, len);
+	return 0;
+}
+
+int lxc_ringbuf_read(struct lxc_ringbuf *buf, char *out, size_t *len)
+{
+	uint64_t used;
+
+	/* there's nothing to read */
+	if (buf->r_off == buf->w_off)
+		return -ENODATA;
+
+	/* read maximum amount available */
+	used = lxc_ringbuf_used(buf);
+	if (used < *len)
+		*len = used;
+
+	/* copy data to reader but don't advance addr */
+	memcpy(out, lxc_ringbuf_get_read_addr(buf), *len);
+	out[*len - 1] = '\0';
+	return 0;
+}
diff --git a/src/lxc/ringbuf.h b/src/lxc/ringbuf.h
new file mode 100644
index 000000000..0e8e7922f
--- /dev/null
+++ b/src/lxc/ringbuf.h
@@ -0,0 +1,90 @@
+/* liblxcapi
+ *
+ * Copyright © 2017 Christian Brauner .
+ * Copyright © 2017 Canonical Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#ifndef __LXC_RINGBUF_H
+#define __LXC_RINGBUF_H
+
+#include
+#include
+#include
+#include
+
+/**
+ * lxc_ringbuf - Implements a simple and efficient memory mapped ringbuffer.
+ * - The "addr" field of struct lxc_ringbuf is considered immutable. Instead the
+ *   read and write offsets r_off and w_off are used to calculate the current
+ *   read and write addresses. There should never be a need to use any of those
+ *   fields directly. Instead use the appropriate helpers below.
+ * - Callers are expected to synchronize read and write accesses to the
+ *   ringbuffer.
+ */
+struct lxc_ringbuf {
+	char *addr;     /* start address of the ringbuffer */
+	uint64_t size;  /* total size of the ringbuffer in bytes */
+	uint64_t r_off; /* read offset */
+	uint64_t w_off; /* write offset */
+};
+
+/**
+ * lxc_ringbuf_create - Initialize a new ringbuffer.
+ *
+ * @param[in] size	Size of the new ringbuffer as a power of 2.
+ */
+extern int lxc_ringbuf_create(struct lxc_ringbuf *buf, size_t size);
+extern void lxc_ringbuf_move_read_addr(struct lxc_ringbuf *buf, size_t len);
+extern int lxc_ringbuf_write(struct lxc_ringbuf *buf, const char *msg, size_t len);
+extern int lxc_ringbuf_read(struct lxc_ringbuf *buf, char *out, size_t *len);
+
+static inline void lxc_ringbuf_release(struct lxc_ringbuf *buf)
+{
+	munmap(buf->addr, buf->size * 2);
+}
+
+static inline void lxc_ringbuf_clear(struct lxc_ringbuf *buf)
+{
+	buf->r_off = 0;
+	buf->w_off = 0;
+}
+
+static inline uint64_t lxc_ringbuf_used(struct lxc_ringbuf *buf)
+{
+	return buf->w_off - buf->r_off;
+}
+
+static inline uint64_t lxc_ringbuf_free(struct lxc_ringbuf *buf)
+{
+	return buf->size - lxc_ringbuf_used(buf);
+}
+
+static inline char *lxc_ringbuf_get_read_addr(struct lxc_ringbuf *buf)
+{
+	return buf->addr + buf->r_off;
+}
+
+static inline char *lxc_ringbuf_get_write_addr(struct lxc_ringbuf *buf)
+{
+	return buf->addr + buf->w_off;
+}
+
+static inline void lxc_ringbuf_move_write_addr(struct lxc_ringbuf *buf, size_t len)
+{
+	buf->w_off += len;
+}
+
+#endif /* __LXC_RINGBUF_H */
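
The offset bookkeeping in the helpers above can be traced without any memory mapping. The following is a rough model only: struct offsets and the offsets_*() functions are hypothetical stand-ins for the r_off/w_off handling in the patch, and the model assumes the caller never consumes more bytes than are currently in use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the size and offset fields of struct lxc_ringbuf. */
struct offsets {
	uint64_t size;
	uint64_t r_off;
	uint64_t w_off;
};

static uint64_t offsets_used(const struct offsets *o)
{
	return o->w_off - o->r_off;
}

static uint64_t offsets_free(const struct offsets *o)
{
	return o->size - offsets_used(o);
}

/* Same arithmetic as lxc_ringbuf_move_write_addr(). */
static void offsets_produce(struct offsets *o, uint64_t len)
{
	o->w_off += len;
}

/* Same arithmetic as lxc_ringbuf_move_read_addr(): once the read offset
 * leaves the first half, both offsets are shifted back by one buffer size.
 * Assumes len <= offsets_used(o).
 */
static void offsets_consume(struct offsets *o, uint64_t len)
{
	o->r_off += len;

	if (o->r_off < o->size)
		return;

	o->r_off -= o->size;
	o->w_off -= o->size;
}

static void offsets_demo(void)
{
	struct offsets o = { .size = 4096, .r_off = 0, .w_off = 0 };

	offsets_produce(&o, 3000);
	offsets_consume(&o, 1000);
	assert(offsets_used(&o) == 2000 && offsets_free(&o) == 2096);

	/* The write offset may run into the second half of the mapping. */
	offsets_produce(&o, 2000);
	assert(o.w_off == 5000 && offsets_used(&o) == 4000);

	/* Crossing the first half shifts both offsets back by size. */
	offsets_consume(&o, 3500);
	assert(o.r_off == 404 && o.w_off == 904);
	assert(offsets_used(&o) == 500 && offsets_free(&o) == 3596);
}
```

In this model the write offset can temporarily exceed the buffer size while the read offset stays inside the first half, which is the situation the second mapping of the in-memory file is there to absorb.
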