From: Christian Brauner Date: Wed, 30 Aug 2017 10:26:10 +0000 (+0200) Subject: Revert "cgfsng: try to delete parent cgroups" X-Git-Tag: lxc-2.1.0~13^2~1 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=308a6c946d5226c043953e74f35f74c332e6764d;p=thirdparty%2Flxc.git Revert "cgfsng: try to delete parent cgroups" This reverts commit 92c590ae1ea40bc094603ab49c20b785cc88bb1d. Problem: Commit 92c590ae1ea40bc094603ab49c20b785cc88bb1d introduced the following behavior: > cgfsng: try to delete parent cgroups > > Say we have > > lxc.uts.name = c1 > lxc.cgroup.dir = lxd/a/b/c > > the path for the container's cgroup would be > > lxd/a/b/c/c1 > > When the container is shutdown we should not just try to delete "c1" we > should also try to delete "c", "b", "a", and "lxd". This is to ensure > that we don't leave empty cgroups around thereby increasing the chance > that we run into trouble with cgroup limits. The algorithm for this isn't > too costly since we can simply stop walking upwards at the first rmdir() > failure. The algorithm employs recursive_destroy() which opens each directory specified in lxc.cgroup.dir and tries to delete each directory within that directory. For example, assume "/sys/fs/cgroup/memory/lxd/a/b/c" only contains the cgroup "c1" for container "c1". Assume that "c1" calls recursive_destroy() to cleanup it's cgroups. It will first delete "c1" and anything underneath it. This is perfectly fine since anything underneath that cgroup is under its control. The new algorithm will then tell it to "recurse upwards". So recursive_destroy() will try to delete "/sys/fs/cgroup/lxd/a/b/c" next. Now assume that a second container "c2" has "lxc.cgroup.dir = lxd/a/b/c" set in its config file and calls cgroup_create(). This will create the *empty* cgroup "/sys/fs/cgroup/memory/lxd/a/b/c/c2". Now assume that after having created "c2" container "c1"'s call to recursive_destroy() reaches "/sys/fs/cgroup/memory/lxd/a/b/c/c2" before it is populated. Then the cgroup "c2" will be removed. Now "c2" calls cgroup_enter() to enter its created cgroup. This will fail since c1 deleted the cgroup "c2". (As a sidenote: This is in the set of the few race conditions that are actually easy to describe.) Possible Solution: Instead of calling recursive_destroy() on all cgroups specified in lxc.cgroup.dir we only call recursive_destroy() on the container's own cgroup "/sys/fs/cgroup/memory/lxd/a/b/c/c1". When we start to recurse upwards we only call unlinkat(AT_FDCWD, path, AT_REMOVEDIR). This should avoid the race described above. My argument is as follows. Assume that the container c1 has created the cgroup "/sys/fs/cgroup/lxd/a/b/c/c1" for itself. Now c1 calls cgroup_destroy(). First, recursive_destroy() will be called on the cgroup "c1" which will delete any emtpy cgroup directories underneath "c1" and finally "c1" itself. This is fine since everything under "c1" is the container's c1 sole property. Now container c1 will call unlinkat() on "/sys/fs/cgroup/memory/lxd/a/b/c/c1": - Assume that in the meantime container c2 has created the cgroup "/sys/fs/cgroup/memory/lxd/a/b/c/c2". Then c1's unlinkat() will fail. This will stop c1 from recursing upwards. So c2's cgroup_enter() call will find all its cgroups intact and well. unlinkat() will come with the appropriate in-kernel locking which will stop it from racing with mkdir(). - There's still a subtle race left. c2 might be calling an implementation of mkdir -p to try and create e.g. the cgroup "/sys/fs/cgroup/memory/lxd/a/b". Let's assume "b" exists then c2 will receive EEXIST on "b" and move on to create "c". Let's further assume c1 has already deleted "c". c1 will now be able to delete "/sys/fs/cgroup/memory/lxd/a/b/" and c2's call to create "c" will fail. The latter subtle race makes me rethink this approach. For now we'll just leave empty cgroups behind since I don't want to start locking stuff. Signed-off-by: Christian Brauner --- diff --git a/src/lxc/cgroups/cgfsng.c b/src/lxc/cgroups/cgfsng.c index 390923eca..fe3fd7062 100644 --- a/src/lxc/cgroups/cgfsng.c +++ b/src/lxc/cgroups/cgfsng.c @@ -1272,91 +1272,35 @@ static int rmdir_wrapper(void *data) return cgroup_rmdir(path); } -int recursive_destroy(char *path, struct lxc_conf *conf) +void recursive_destroy(char *path, struct lxc_conf *conf) { int r; - if (conf && !lxc_list_empty(&conf->id_map)) r = userns_exec_1(conf, rmdir_wrapper, path, "rmdir_wrapper"); else r = cgroup_rmdir(path); + if (r < 0) ERROR("Error destroying %s", path); - - return r; } static void cgfsng_destroy(void *hdata, struct lxc_conf *conf) { - int i; - char *clean_parent, *clean_fullcgpath; - char **fields; - size_t recurse_upwards = 0; struct cgfsng_handler_data *d = hdata; if (!d) return; - if (!d->container_cgroup || !hierarchies) - return; - - if (d->cgroup_meta.dir) - clean_parent = d->cgroup_meta.dir; - else - clean_parent = d->cgroup_pattern; - fields = lxc_normalize_path(clean_parent); - if (fields) { - recurse_upwards = lxc_array_len((void **)fields); - if (recurse_upwards > 0 && clean_parent == d->cgroup_pattern) - recurse_upwards--; - lxc_free_array((void **)fields, free); - } - - for (i = 0; hierarchies[i]; i++) { - int ret; - size_t j; - struct hierarchy *h = hierarchies[i]; - - if (!h->fullcgpath) - continue; - - clean_fullcgpath = lxc_deslashify(h->fullcgpath); - if (!clean_fullcgpath) - clean_fullcgpath = h->fullcgpath; - - /* Delete the container's cgroup */ - ret = recursive_destroy(clean_fullcgpath, conf); - if (ret < 0) - goto next; - - if (h->fullcgpath == clean_fullcgpath) - goto next; - - /* Delete parent cgroups as specified in the containers config - * file. This takes care of not having useless empty cgroups - * around. - */ - for (j = 0; j < recurse_upwards; j++) { - char *s = clean_fullcgpath; - - s = strrchr(s, '/'); - if (!s) - break; - *s = '\0'; - - /* If we fail to delete a cgroup we know that any parent - * cgroup also cannot be removed. - */ - ret = recursive_destroy(clean_fullcgpath, conf); - if (ret < 0) - break; + if (d->container_cgroup && hierarchies) { + int i; + for (i = 0; hierarchies[i]; i++) { + struct hierarchy *h = hierarchies[i]; + if (h->fullcgpath) { + recursive_destroy(h->fullcgpath, conf); + free(h->fullcgpath); + h->fullcgpath = NULL; + } } - -next: - if (h->fullcgpath != clean_fullcgpath) - free(clean_fullcgpath); - free(h->fullcgpath); - h->fullcgpath = NULL; } free_handler_data(d); diff --git a/src/lxc/confile.c b/src/lxc/confile.c index e66bae314..62337289e 100644 --- a/src/lxc/confile.c +++ b/src/lxc/confile.c @@ -1431,6 +1431,9 @@ static int set_config_cgroup_dir(const char *key, const char *value, if (lxc_config_value_empty(value)) return clr_config_cgroup_dir(key, lxc_conf, NULL); + if (lxc_conf->cgroup_meta.dir) + clr_config_cgroup_dir(key, lxc_conf, NULL); + return set_config_string_item(&lxc_conf->cgroup_meta.dir, value); }