From: Willy Tarreau Date: Thu, 23 May 2024 18:06:25 +0000 (+0200) Subject: BUG/MEDIUM: stick-tables: make sure never to create two same remote entries X-Git-Tag: v3.0-dev13~24 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=77f286e8bcd345abbae252c2af2cee808cb91c32;p=thirdparty%2Fhaproxy.git BUG/MEDIUM: stick-tables: make sure never to create two same remote entries In GH issue #2552, Christian Ruppert reported an increase in crashes with recent 3.0-dev versions, always related with stick-tables and peers. One particularity of his config is that it has a lot of peers. While trying to reproduce, it empirically was found that firing 10 load generators at 10 different haproxy instances tracking a random key among 100k against a table of max 5k entries, on 8 threads and between a total of 50 parallel peers managed to reproduce the crashes in seconds, very often in ebtree deletion or insertion code, but not only. The debugging revealed that the crashes are often caused by a parent node being corrupted while delete/insert tries to update it regarding a recently inserted/removed node, and that that corrupted node had always been proven to be deleted, then immediately freed, so it ought not be visited in the tree from functions enclosed between a pair of lock/unlock. As such the only possibility was that it had experienced unexpected inserts. Also, running with pool integrity checking would 90% of the time cause crashes during allocation based on corrupted contents in the node, likely because it was found at two places in the same tree and still present as a parent of a node being deleted or inserted (hence the __stksess_free and stktable_trash_oldest callers being visible on these items). Indeed the issue is in fact related to the test set (occasionally redundant keys, many peers). What happens is that sometimes, a same key is learned from two different peers. When it is learned for the first time, we end up in stktable_touch_with_exp() in the "else" branch, where the test for existence is made before taking the lock (since commit cfeca3a3a3 ("MEDIUM: stick-table: touch updates under an upgradable read lock") that was merged in 2.9), and from there the entry is added. But is one of the threads manages to insert it before the other thread takes the lock, then the second thread will try to insert this node again. And inserting an already inserted node will corrupt the tree (note that we never switched to enforcing a check in insertion code on this due to API history that would break various code parts). Here the solution is simple, it requires to recheck leaf_p after getting the lock, to avoid touching anything if the entry has already been inserted in the mean time. Many thanks to Christian Ruppert for testing this and for his invaluable help on this hard-to-trigger issue. This fix needs to be backported to 2.9. --- diff --git a/src/stick_table.c b/src/stick_table.c index 98afbe8736..00f300f0fe 100644 --- a/src/stick_table.c +++ b/src/stick_table.c @@ -607,8 +607,16 @@ void stktable_touch_with_exp(struct stktable *t, struct stksess *ts, int local, do_wakeup = 1; } else { - /* If this entry is not in the tree */ - + /* Note: we land here when learning new entries from + * remote peers. We hold one ref_cnt so the entry + * cannot vanish under us, however if two peers create + * the same key at the exact same time, we must be + * careful not to perform two parallel inserts! Hence + * we need to first check leaf_p to know if the entry + * is new, then lock the tree and check the entry again + * (since another thread could have created it in the + * mean time). + */ if (!ts->upd.node.leaf_p) { /* Time to upgrade the read lock to write lock if needed */ if (!use_wrlock) { @@ -617,13 +625,14 @@ void stktable_touch_with_exp(struct stktable *t, struct stksess *ts, int local, } /* here we're write-locked */ - - ts->seen = 0; - ts->upd.key= (++t->update)+(2147483648U); - eb = eb32_insert(&t->updates, &ts->upd); - if (eb != &ts->upd) { - eb32_delete(eb); - eb32_insert(&t->updates, &ts->upd); + if (!ts->upd.node.leaf_p) { + ts->seen = 0; + ts->upd.key= (++t->update)+(2147483648U); + eb = eb32_insert(&t->updates, &ts->upd); + if (eb != &ts->upd) { + eb32_delete(eb); + eb32_insert(&t->updates, &ts->upd); + } } } }