From: Aurelien DARRAGON <adarragon@haproxy.com>
Date: Wed, 15 May 2024 08:02:27 +0000 (+0200)
Subject: MEDIUM: hlua: take nbthread into account in hlua_get_nb_instruction()
X-Git-Tag: v3.0-dev12~47
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=231d3d32beed5c2d0d090a89b36f7995983a4463;p=thirdparty%2Fhaproxy.git

MEDIUM: hlua: take nbthread into account in hlua_get_nb_instruction()

Based on Willy's idea (from 3.0-dev6 announcement message): in this patch
we try to reduce the max latency that can be caused by running lua scripts
with default settings.

Indeed, by default, the hlua engine is allowed to process up to 10k
instructions per batch. While this value was found to be the optimal one
for a single thread, it turns out that keeping a thread busy for 10k lua
instructions could increase thread contention. This is especially true
when the script is loaded with 'lua-load', because in that case the
current thread owns the main lua lock and prevents other threads from
making any progress if they're also waiting on the main lock.

Thanks to Thierry Fournier's work, we know that performance stays close
to optimal between 500 and 10k instructions per batch. Given that, when
the script is loaded using 'lua-load' and no "tune.lua.forced-yield" was
set by the user, we automatically divide the default value (10k) by the
number of threads haproxy can use, in order to reduce thread contention
(given that all threads could compete for the main lua lock). However, we
make sure not to return a value below 500, because Thierry's work showed
that this would come with a significant performance loss.

The historical behavior may still be enforced by setting
"tune.lua.forced-yield" to 10000 in the global config section.
---
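
Note: as a rough illustration of the rule described above, here is a
minimal standalone sketch of the computation. It is not the code from the
patch: the function name, its parameters and the two constant names are
hypothetical, and the thread count is passed explicitly instead of being
read from global.nbthread.

```c
/* Hypothetical helper mirroring the default described in the commit
 * message. <user_value> stands for an explicit "tune.lua.forced-yield"
 * setting (0 = not set), <shared_state> is non-zero for a 'lua-load'
 * script (main lua state, shared lock) and zero for a
 * 'lua-load-per-thread' one. The caller must pass nbthread > 0.
 */
static unsigned int sketch_forced_yield(unsigned int user_value,
                                        int shared_state, int nbthread)
{
	const int max_insn = 10000; /* above 10k, no significant gain */
	const int min_insn = 500;   /* below 500, significant loss */
	int scaled;

	if (user_value)
		return user_value;  /* value enforced by the user wins */

	if (!shared_state)
		return max_insn;    /* per-thread state: no shared lock */

	/* shared state: split the budget between threads, but never go
	 * below the lower bound
	 */
	scaled = max_insn / nbthread;
	return scaled > min_insn ? scaled : min_insn;
}
```

Under these assumptions, a shared 'lua-load' state gets 10000 instructions
with 1 thread, 2500 with 4 threads and 500 from 20 threads upwards, while a
'lua-load-per-thread' state always gets 10000 unless the user set a value.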

diff --git a/doc/configuration.txt b/doc/configuration.txt
index 7732843427..27a616e6ba 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -3456,10 +3456,17 @@ tune.lua.forced-yield <number>
   This directive forces the Lua engine to execute a yield each <number> of
   instructions executed. This permits interrupting a long script and allows the
   HAProxy scheduler to process other tasks like accepting connections or
-  forwarding traffic. The default value is 10000 instructions. If HAProxy often
-  executes some Lua code but more responsiveness is required, this value can be
-  lowered. If the Lua code is quite long and its result is absolutely required
-  to process the data, the <number> can be increased.
+  forwarding traffic. The default value is 10000 instructions for scripts loaded
+  using "lua-load-per-thread" and MAX(500, 10000 / nbthread) instructions for
+  scripts loaded using "lua-load" (it was found to be an optimal value for
+  performance while taking care of not creating thread contention with multiple
+  threads competing for the global lua lock).
+
+  If HAProxy often executes some Lua code but more responsiveness is required,
+  this value can be lowered. If the Lua code is quite long and its result is
+  absolutely required to process the data, the <number> can be increased, but
+  the value should be set wisely, as in a multithreaded context it could
+  increase contention.
 
 tune.lua.maxmem <number>
   Sets the maximum amount of RAM in megabytes per process usable by Lua. By
diff --git a/src/hlua.c b/src/hlua.c
index 6fe64be85a..098107f7ae 100644
--- a/src/hlua.c
+++ b/src/hlua.c
@@ -516,7 +516,15 @@ static inline int hlua_timer_check(const struct hlua_timer *timer)
 
 /* Interrupts the Lua processing each "hlua_nb_instruction" instructions.
  * it is used for preventing infinite loops.
+ */
+static unsigned int hlua_nb_instruction = 0;
+
+/* Wrapper to retrieve the number of instructions between two interrupts
+ * depending on user settings and current hlua context. If not already
+ * explicitly set, we compute the ideal value using hard limits revealed
+ * by Thierry Fournier's work, whose original notes may be found below:
  *
+ * --
  * I test the scheer with an infinite loop containing one incrementation
  * and one test. I run this loop between 10 seconds, I raise a ceil of
  * 710M loops from one interrupt each 9000 instructions, so I fix the value
@@ -537,16 +545,41 @@ static inline int hlua_timer_check(const struct hlua_timer *timer)
  *  10000         | 710
  *  100000        | 710
  *  1000000       | 710
+ * --
  *
- */
-static unsigned int hlua_nb_instruction = 10000;
-
-/* Wrapper to retrieve the number of instructions between two interrupts
- * depending on user settings and current hlua context.
+ * Thanks to his work, we know we can safely use values between 500 and 10000
+ * without a significant impact on performance.
  */
 static inline unsigned int hlua_get_nb_instruction(struct hlua *hlua)
 {
-	return hlua_nb_instruction;
+	int ceil = 10000; /* above 10k, no significant performance gain */
+	int floor = 500;  /* below 500, significant performance loss */
+
+	if (hlua_nb_instruction) {
+		/* value enforced by user */
+		return hlua_nb_instruction;
+	}
+
+	/* not set, assign automatic value */
+	if (hlua->state_id == 0) {
+		/* this function is expected to be called during runtime (after config
+		 * parsing), thus global.nbthread is expected to be set.
+		 */
+		BUG_ON(global.nbthread == 0);
+
+		/* main lua stack (shared global lock), take number of threads into
+		 * account in an attempt to reduce thread contention
+		 */
+		return MAX(floor, ceil / global.nbthread);
+	}
+	else {
+		/* per-thread lua stack, less contention is expected (no global lock),
+		 * allow up to the maximum number of instructions and hope that the
+		 * user manually yields after heavy (lock dependent) work from lua
+		 * script (e.g.: map manipulation).
+		 */
+		return ceil;
+	}
 }
 
 /* Descriptor for the memory allocation state. The limit is pre-initialised to