--- /dev/null
- <meta name="Copyright" content="Copyright (C) 2005-2015, Mike Pall">
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
+<html>
+<head>
+<title>Profiler</title>
+<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
+<meta name="Author" content="Mike Pall">
- Copyright © 2005-2015 Mike Pall
++<meta name="Copyright" content="Copyright (C) 2005-2016, Mike Pall">
+<meta name="Language" content="en">
+<link rel="stylesheet" type="text/css" href="bluequad.css" media="screen">
+<link rel="stylesheet" type="text/css" href="bluequad-print.css" media="print">
+</head>
+<body>
+<div id="site">
+<a href="http://luajit.org"><span>Lua<span id="logo">JIT</span></span></a>
+</div>
+<div id="head">
+<h1>Profiler</h1>
+</div>
+<div id="nav">
+<ul><li>
+<a href="luajit.html">LuaJIT</a>
+<ul><li>
+<a href="http://luajit.org/download.html">Download <span class="ext">»</span></a>
+</li><li>
+<a href="install.html">Installation</a>
+</li><li>
+<a href="running.html">Running</a>
+</li></ul>
+</li><li>
+<a href="extensions.html">Extensions</a>
+<ul><li>
+<a href="ext_ffi.html">FFI Library</a>
+<ul><li>
+<a href="ext_ffi_tutorial.html">FFI Tutorial</a>
+</li><li>
+<a href="ext_ffi_api.html">ffi.* API</a>
+</li><li>
+<a href="ext_ffi_semantics.html">FFI Semantics</a>
+</li></ul>
+</li><li>
+<a href="ext_jit.html">jit.* Library</a>
+</li><li>
+<a href="ext_c_api.html">Lua/C API</a>
+</li><li>
+<a class="current" href="ext_profiler.html">Profiler</a>
+</li></ul>
+</li><li>
+<a href="status.html">Status</a>
+<ul><li>
+<a href="changes.html">Changes</a>
+</li></ul>
+</li><li>
+<a href="faq.html">FAQ</a>
+</li><li>
+<a href="http://luajit.org/performance.html">Performance <span class="ext">»</span></a>
+</li><li>
+<a href="http://wiki.luajit.org/">Wiki <span class="ext">»</span></a>
+</li><li>
+<a href="http://luajit.org/list.html">Mailing List <span class="ext">»</span></a>
+</li></ul>
+</div>
+<div id="main">
+<p>
+LuaJIT has an integrated statistical profiler with very low overhead. It
+allows sampling the currently executing stack and other parameters in
+regular intervals.
+</p>
+<p>
+The integrated profiler can be accessed from three levels:
+</p>
+<ul>
+<li>The <a href="#hl_profiler">bundled high-level profiler</a>, invoked by the
+<a href="#j_p"><tt>-jp</tt></a> command line option.</li>
+<li>A <a href="#ll_lua_api">low-level Lua API</a> to control the profiler.</li>
+<li>A <a href="#ll_c_api">low-level C API</a> to control the profiler.</li>
+</ul>
+
+<h2 id="hl_profiler">High-Level Profiler</h2>
+<p>
+The bundled high-level profiler offers basic profiling functionality. It
+generates simple textual summaries or source code annotations. It can be
+accessed with the <a href="#j_p"><tt>-jp</tt></a> command line option
+or from Lua code by loading the underlying <tt>jit.p</tt> module.
+</p>
+<p>
+To cut to the chase — run this to get a CPU usage profile by
+function name:
+</p>
+<pre class="code">
+luajit -jp myapp.lua
+</pre>
+<p>
+It's <em>not</em> a stated goal of the bundled profiler to add every
+possible option or to cater for special profiling needs. The low-level
+profiler APIs are documented below. They may be used by third-party
+authors to implement advanced functionality, e.g. IDE integration or
+graphical profilers.
+</p>
+<p>
+Note: Sampling works for both interpreted and JIT-compiled code. The
+results for JIT-compiled code may sometimes be surprising. LuaJIT
+heavily optimizes and inlines Lua code — there's no simple
+one-to-one correspondence between source code lines and the sampled
+machine code.
+</p>
+
+<h3 id="j_p"><tt>-jp=[options[,output]]</tt></h3>
+<p>
+The <tt>-jp</tt> command line option starts the high-level profiler.
+When the application run by the command line terminates, the profiler
+stops and writes the results to <tt>stdout</tt> or to the specified
+<tt>output</tt> file.
+</p>
+<p>
+The <tt>options</tt> argument specifies how the profiling is to be
+performed:
+</p>
+<ul>
+<li><tt>f</tt> — Stack dump: function name, otherwise module:line.
+This is the default mode.</li>
+<li><tt>F</tt> — Stack dump: ditto, but dump module:name.</li>
+<li><tt>l</tt> — Stack dump: module:line.</li>
+<li><tt><number></tt> — stack dump depth (callee ←
+caller). Default: 1.</li>
+<li><tt>-<number></tt> — Inverse stack dump depth (caller
+→ callee).</li>
+<li><tt>s</tt> — Split stack dump after first stack level. Implies
+depth ≥ 2 or depth ≤ -2.</li>
+<li><tt>p</tt> — Show full path for module names.</li>
+<li><tt>v</tt> — Show VM states.</li>
+<li><tt>z</tt> — Show <a href="#jit_zone">zones</a>.</li>
+<li><tt>r</tt> — Show raw sample counts. Default: show percentages.</li>
+<li><tt>a</tt> — Annotate excerpts from source code files.</li>
+<li><tt>A</tt> — Annotate complete source code files.</li>
+<li><tt>G</tt> — Produce raw output suitable for graphical tools.</li>
+<li><tt>m<number></tt> — Minimum sample percentage to be shown.
+Default: 3%.</li>
+<li><tt>i<number></tt> — Sampling interval in milliseconds.
+Default: 10ms.<br>
+Note: The actual sampling precision is OS-dependent.</li>
+</ul>
+<p>
+The default output for <tt>-jp</tt> is a list of the most CPU consuming
+spots in the application. Increasing the stack dump depth with (say)
+<tt>-jp=2</tt> may help to point out the main callers or callees of
+hotspots. But sample aggregation is still flat per unique stack dump.
+</p>
+<p>
+To get a two-level view (split view) of callers/callees, use
+<tt>-jp=s</tt> or <tt>-jp=-s</tt>. The percentages shown for the second
+level are relative to the first level.
+</p>
+<p>
+To see how much time is spent in each line relative to a function, use
+<tt>-jp=fl</tt>.
+</p>
+<p>
+To see how much time is spent in different VM states or
+<a href="#jit_zone">zones</a>, use <tt>-jp=v</tt> or <tt>-jp=z</tt>.
+</p>
+<p>
+Combinations of <tt>v/z</tt> with <tt>f/F/l</tt> produce two-level
+views, e.g. <tt>-jp=vf</tt> or <tt>-jp=fv</tt>. This shows the time
+spent in a VM state or zone vs. hotspots. This can be used to answer
+questions like "Which time consuming functions are only interpreted?" or
+"What's the garbage collector overhead for a specific function?".
+</p>
+<p>
+Multiple options can be combined — but not all combinations make
+sense, see above. E.g. <tt>-jp=3si4m1</tt> samples three stack levels
+deep in 4ms intervals and shows a split view of the CPU consuming
+functions and their callers with a 1% threshold.
+</p>
+<p>
+Source code annotations produced by <tt>-jp=a</tt> or <tt>-jp=A</tt> are
+always flat and at the line level. Obviously, the source code files need
+to be readable by the profiler script.
+</p>
+<p>
+The high-level profiler can also be started and stopped from Lua code with:
+</p>
+<pre class="code">
+require("jit.p").start(options, output)
+...
+require("jit.p").stop()
+</pre>
+
+<h3 id="jit_zone"><tt>jit.zone</tt> — Zones</h3>
+<p>
+Zones can be used to provide information about different parts of an
+application to the high-level profiler. E.g. a game could make use of an
+<tt>"AI"</tt> zone, a <tt>"PHYS"</tt> zone, etc. Zones are hierarchical,
+organized as a stack.
+</p>
+<p>
+The <tt>jit.zone</tt> module needs to be loaded explicitly:
+</p>
+<pre class="code">
+local zone = require("jit.zone")
+</pre>
+<ul>
+<li><tt>zone("name")</tt> pushes a named zone to the zone stack.</li>
+<li><tt>zone()</tt> pops the current zone from the zone stack and
+returns its name.</li>
+<li><tt>zone:get()</tt> returns the current zone name or <tt>nil</tt>.</li>
+<li><tt>zone:flush()</tt> flushes the zone stack.</li>
+</ul>
+<p>
+To show the time spent in each zone use <tt>-jp=z</tt>. To show the time
+spent relative to hotspots use e.g. <tt>-jp=zf</tt> or <tt>-jp=fz</tt>.
+</p>
+
+<h2 id="ll_lua_api">Low-level Lua API</h2>
+<p>
+The <tt>jit.profile</tt> module gives access to the low-level API of the
+profiler from Lua code. This module needs to be loaded explicitly:
+<pre class="code">
+local profile = require("jit.profile")
+</pre>
+<p>
+This module can be used to implement your own higher-level profiler.
+A typical profiling run starts the profiler, captures stack dumps in
+the profiler callback, adds them to a hash table to aggregate the number
+of samples, stops the profiler and then analyzes all of the captured
+stack dumps. Other parameters can be sampled in the profiler callback,
+too. But it's important not to spend too much time in the callback,
+since this may skew the statistics.
+</p>
+
+<h3 id="profile_start"><tt>profile.start(mode, cb)</tt>
+— Start profiler</h3>
+<p>
+This function starts the profiler. The <tt>mode</tt> argument is a
+string holding options:
+</p>
+<ul>
+<li><tt>f</tt> — Profile with precision down to the function level.</li>
+<li><tt>l</tt> — Profile with precision down to the line level.</li>
+<li><tt>i<number></tt> — Sampling interval in milliseconds (default
+10ms).</br>
+Note: The actual sampling precision is OS-dependent.
+</li>
+</ul>
+<p>
+The <tt>cb</tt> argument is a callback function which is called with
+three arguments: <tt>(thread, samples, vmstate)</tt>. The callback is
+called on a separate coroutine, the <tt>thread</tt> argument is the
+state that holds the stack to sample for profiling. Note: do
+<em>not</em> modify the stack of that state or call functions on it.
+</p>
+<p>
+<tt>samples</tt> gives the number of accumulated samples since the last
+callback (usually 1).
+</p>
+<p>
+<tt>vmstate</tt> holds the VM state at the time the profiling timer
+triggered. This may or may not correspond to the state of the VM when
+the profiling callback is called. The state is either <tt>'N'</tt>
+native (compiled) code, <tt>'I'</tt> interpreted code, <tt>'C'</tt>
+C code, <tt>'G'</tt> the garbage collector, or <tt>'J'</tt> the JIT
+compiler.
+</p>
+
+<h3 id="profile_stop"><tt>profile.stop()</tt>
+— Stop profiler</h3>
+<p>
+This function stops the profiler.
+</p>
+
+<h3 id="profile_dump"><tt>dump = profile.dumpstack([thread,] fmt, depth)</tt>
+— Dump stack </h3>
+<p>
+This function allows taking stack dumps in an efficient manner. It
+returns a string with a stack dump for the <tt>thread</tt> (coroutine),
+formatted according to the <tt>fmt</tt> argument:
+</p>
+<ul>
+<li><tt>p</tt> — Preserve the full path for module names. Otherwise
+only the file name is used.</li>
+<li><tt>f</tt> — Dump the function name if it can be derived. Otherwise
+use module:line.</li>
+<li><tt>F</tt> — Ditto, but dump module:name.</li>
+<li><tt>l</tt> — Dump module:line.</li>
+<li><tt>Z</tt> — Zap the following characters for the last dumped
+frame.</li>
+<li>All other characters are added verbatim to the output string.</li>
+</ul>
+<p>
+The <tt>depth</tt> argument gives the number of frames to dump, starting
+at the topmost frame of the thread. A negative number dumps the frames in
+inverse order.
+</p>
+<p>
+The first example prints a list of the current module names and line
+numbers of up to 10 frames in separate lines. The second example prints
+semicolon-separated function names for all frames (up to 100) in inverse
+order:
+</p>
+<pre class="code">
+print(profile.dumpstack(thread, "l\n", 10))
+print(profile.dumpstack(thread, "lZ;", -100))
+</pre>
+
+<h2 id="ll_c_api">Low-level C API</h2>
+<p>
+The profiler can be controlled directly from C code, e.g. for
+use by IDEs. The declarations are in <tt>"luajit.h"</tt> (see
+<a href="ext_c_api.html">Lua/C API</a> extensions).
+</p>
+
+<h3 id="luaJIT_profile_start"><tt>luaJIT_profile_start(L, mode, cb, data)</tt>
+— Start profiler</h3>
+<p>
+This function starts the profiler. <a href="#profile_start">See
+above</a> for a description of the <tt>mode</tt> argument.
+</p>
+<p>
+The <tt>cb</tt> argument is a callback function with the following
+declaration:
+</p>
+<pre class="code">
+typedef void (*luaJIT_profile_callback)(void *data, lua_State *L,
+ int samples, int vmstate);
+</pre>
+<p>
+<tt>data</tt> is available for use by the callback. <tt>L</tt> is the
+state that holds the stack to sample for profiling. Note: do
+<em>not</em> modify this stack or call functions on this stack —
+use a separate coroutine for this purpose. <a href="#profile_start">See
+above</a> for a description of <tt>samples</tt> and <tt>vmstate</tt>.
+</p>
+
+<h3 id="luaJIT_profile_stop"><tt>luaJIT_profile_stop(L)</tt>
+— Stop profiler</h3>
+<p>
+This function stops the profiler.
+</p>
+
+<h3 id="luaJIT_profile_dumpstack"><tt>p = luaJIT_profile_dumpstack(L, fmt, depth, len)</tt>
+— Dump stack </h3>
+<p>
+This function allows taking stack dumps in an efficient manner.
+<a href="#profile_dump">See above</a> for a description of <tt>fmt</tt>
+and <tt>depth</tt>.
+</p>
+<p>
+This function returns a <tt>const char *</tt> pointing to a
+private string buffer of the profiler. The <tt>int *len</tt>
+argument returns the length of the output string. The buffer is
+overwritten on the next call and deallocated when the profiler stops.
+You either need to consume the content immediately or copy it for later
+use.
+</p>
+<br class="flush">
+</div>
+<div id="foot">
+<hr class="hide">
++Copyright © 2005-2016 Mike Pall
+<span class="noprint">
+·
+<a href="contact.html">Contact</a>
+</span>
+</div>
+</body>
+</html>
--- /dev/null
- ** Copyright (C) 2005-2015 Mike Pall. All rights reserved.
+/*
+** DynASM ARM64 encoding engine.
++** Copyright (C) 2005-2016 Mike Pall. All rights reserved.
+** Released under the MIT license. See dynasm.lua for full copyright notice.
+*/
+
+#include <stddef.h>
+#include <stdarg.h>
+#include <string.h>
+#include <stdlib.h>
+
+#define DASM_ARCH "arm64"
+
+#ifndef DASM_EXTERN
+#define DASM_EXTERN(a,b,c,d) 0
+#endif
+
+/* Action definitions. */
+enum {
+ DASM_STOP, DASM_SECTION, DASM_ESC, DASM_REL_EXT,
+ /* The following actions need a buffer position. */
+ DASM_ALIGN, DASM_REL_LG, DASM_LABEL_LG,
+ /* The following actions also have an argument. */
+ DASM_REL_PC, DASM_LABEL_PC,
+ DASM_IMM, DASM_IMM6, DASM_IMM12, DASM_IMM13W, DASM_IMM13X, DASM_IMML,
+ DASM__MAX
+};
+
+/* Maximum number of section buffer positions for a single dasm_put() call. */
+#define DASM_MAXSECPOS 25
+
+/* DynASM encoder status codes. Action list offset or number are or'ed in. */
+#define DASM_S_OK 0x00000000
+#define DASM_S_NOMEM 0x01000000
+#define DASM_S_PHASE 0x02000000
+#define DASM_S_MATCH_SEC 0x03000000
+#define DASM_S_RANGE_I 0x11000000
+#define DASM_S_RANGE_SEC 0x12000000
+#define DASM_S_RANGE_LG 0x13000000
+#define DASM_S_RANGE_PC 0x14000000
+#define DASM_S_RANGE_REL 0x15000000
+#define DASM_S_UNDEF_LG 0x21000000
+#define DASM_S_UNDEF_PC 0x22000000
+
+/* Macros to convert positions (8 bit section + 24 bit index). */
+#define DASM_POS2IDX(pos) ((pos)&0x00ffffff)
+#define DASM_POS2BIAS(pos) ((pos)&0xff000000)
+#define DASM_SEC2POS(sec) ((sec)<<24)
+#define DASM_POS2SEC(pos) ((pos)>>24)
+#define DASM_POS2PTR(D, pos) (D->sections[DASM_POS2SEC(pos)].rbuf + (pos))
+
+/* Action list type. */
+typedef const unsigned int *dasm_ActList;
+
+/* Per-section structure. */
+typedef struct dasm_Section {
+ int *rbuf; /* Biased buffer pointer (negative section bias). */
+ int *buf; /* True buffer pointer. */
+ size_t bsize; /* Buffer size in bytes. */
+ int pos; /* Biased buffer position. */
+ int epos; /* End of biased buffer position - max single put. */
+ int ofs; /* Byte offset into section. */
+} dasm_Section;
+
+/* Core structure holding the DynASM encoding state. */
+struct dasm_State {
+ size_t psize; /* Allocated size of this structure. */
+ dasm_ActList actionlist; /* Current actionlist pointer. */
+ int *lglabels; /* Local/global chain/pos ptrs. */
+ size_t lgsize;
+ int *pclabels; /* PC label chains/pos ptrs. */
+ size_t pcsize;
+ void **globals; /* Array of globals (bias -10). */
+ dasm_Section *section; /* Pointer to active section. */
+ size_t codesize; /* Total size of all code sections. */
+ int maxsection; /* 0 <= sectionidx < maxsection. */
+ int status; /* Status code. */
+ dasm_Section sections[1]; /* All sections. Alloc-extended. */
+};
+
+/* The size of the core structure depends on the max. number of sections. */
+#define DASM_PSZ(ms) (sizeof(dasm_State)+(ms-1)*sizeof(dasm_Section))
+
+
+/* Initialize DynASM state. */
+void dasm_init(Dst_DECL, int maxsection)
+{
+ dasm_State *D;
+ size_t psz = 0;
+ int i;
+ Dst_REF = NULL;
+ DASM_M_GROW(Dst, struct dasm_State, Dst_REF, psz, DASM_PSZ(maxsection));
+ D = Dst_REF;
+ D->psize = psz;
+ D->lglabels = NULL;
+ D->lgsize = 0;
+ D->pclabels = NULL;
+ D->pcsize = 0;
+ D->globals = NULL;
+ D->maxsection = maxsection;
+ for (i = 0; i < maxsection; i++) {
+ D->sections[i].buf = NULL; /* Need this for pass3. */
+ D->sections[i].rbuf = D->sections[i].buf - DASM_SEC2POS(i);
+ D->sections[i].bsize = 0;
+ D->sections[i].epos = 0; /* Wrong, but is recalculated after resize. */
+ }
+}
+
+/* Free DynASM state. */
+void dasm_free(Dst_DECL)
+{
+ dasm_State *D = Dst_REF;
+ int i;
+ for (i = 0; i < D->maxsection; i++)
+ if (D->sections[i].buf)
+ DASM_M_FREE(Dst, D->sections[i].buf, D->sections[i].bsize);
+ if (D->pclabels) DASM_M_FREE(Dst, D->pclabels, D->pcsize);
+ if (D->lglabels) DASM_M_FREE(Dst, D->lglabels, D->lgsize);
+ DASM_M_FREE(Dst, D, D->psize);
+}
+
+/* Setup global label array. Must be called before dasm_setup(). */
+void dasm_setupglobal(Dst_DECL, void **gl, unsigned int maxgl)
+{
+ dasm_State *D = Dst_REF;
+ D->globals = gl - 10; /* Negative bias to compensate for locals. */
+ DASM_M_GROW(Dst, int, D->lglabels, D->lgsize, (10+maxgl)*sizeof(int));
+}
+
+/* Grow PC label array. Can be called after dasm_setup(), too. */
+void dasm_growpc(Dst_DECL, unsigned int maxpc)
+{
+ dasm_State *D = Dst_REF;
+ size_t osz = D->pcsize;
+ DASM_M_GROW(Dst, int, D->pclabels, D->pcsize, maxpc*sizeof(int));
+ memset((void *)(((unsigned char *)D->pclabels)+osz), 0, D->pcsize-osz);
+}
+
+/* Setup encoder. */
+void dasm_setup(Dst_DECL, const void *actionlist)
+{
+ dasm_State *D = Dst_REF;
+ int i;
+ D->actionlist = (dasm_ActList)actionlist;
+ D->status = DASM_S_OK;
+ D->section = &D->sections[0];
+ memset((void *)D->lglabels, 0, D->lgsize);
+ if (D->pclabels) memset((void *)D->pclabels, 0, D->pcsize);
+ for (i = 0; i < D->maxsection; i++) {
+ D->sections[i].pos = DASM_SEC2POS(i);
+ D->sections[i].ofs = 0;
+ }
+}
+
+
+#ifdef DASM_CHECKS
+#define CK(x, st) \
+ do { if (!(x)) { \
+ D->status = DASM_S_##st|(p-D->actionlist-1); return; } } while (0)
+#define CKPL(kind, st) \
+ do { if ((size_t)((char *)pl-(char *)D->kind##labels) >= D->kind##size) { \
+ D->status = DASM_S_RANGE_##st|(p-D->actionlist-1); return; } } while (0)
+#else
+#define CK(x, st) ((void)0)
+#define CKPL(kind, st) ((void)0)
+#endif
+
+static int dasm_imm12(unsigned int n)
+{
+ if ((n >> 12) == 0)
+ return n;
+ else if ((n & 0xff000fff) == 0)
+ return (n >> 12) | 0x1000;
+ else
+ return -1;
+}
+
+static int dasm_ffs(unsigned long long x)
+{
+ int n = -1;
+ while (x) { x >>= 1; n++; }
+ return n;
+}
+
+static int dasm_imm13(int lo, int hi)
+{
+ int inv = 0, w = 64, s = 0xfff, xa, xb;
+ unsigned long long n = (((unsigned long long)hi) << 32) | (unsigned int)lo;
+ unsigned long long m = 1ULL, a, b, c;
+ if (n & 1) { n = ~n; inv = 1; }
+ a = n & -n; b = (n+a)&-(n+a); c = (n+a-b)&-(n+a-b);
+ xa = dasm_ffs(a); xb = dasm_ffs(b);
+ if (c) {
+ w = dasm_ffs(c) - xa;
+ if (w == 32) m = 0x0000000100000001UL;
+ else if (w == 16) m = 0x0001000100010001UL;
+ else if (w == 8) m = 0x0101010101010101UL;
+ else if (w == 4) m = 0x1111111111111111UL;
+ else if (w == 2) m = 0x5555555555555555UL;
+ else return -1;
+ s = (-2*w & 0x3f) - 1;
+ } else if (!a) {
+ return -1;
+ } else if (xb == -1) {
+ xb = 64;
+ }
+ if ((b-a) * m != n) return -1;
+ if (inv) {
+ return ((w - xb) << 6) | (s+w+xa-xb);
+ } else {
+ return ((w - xa) << 6) | (s+xb-xa);
+ }
+ return -1;
+}
+
+/* Pass 1: Store actions and args, link branches/labels, estimate offsets. */
+void dasm_put(Dst_DECL, int start, ...)
+{
+ va_list ap;
+ dasm_State *D = Dst_REF;
+ dasm_ActList p = D->actionlist + start;
+ dasm_Section *sec = D->section;
+ int pos = sec->pos, ofs = sec->ofs;
+ int *b;
+
+ if (pos >= sec->epos) {
+ DASM_M_GROW(Dst, int, sec->buf, sec->bsize,
+ sec->bsize + 2*DASM_MAXSECPOS*sizeof(int));
+ sec->rbuf = sec->buf - DASM_POS2BIAS(pos);
+ sec->epos = (int)sec->bsize/sizeof(int) - DASM_MAXSECPOS+DASM_POS2BIAS(pos);
+ }
+
+ b = sec->rbuf;
+ b[pos++] = start;
+
+ va_start(ap, start);
+ while (1) {
+ unsigned int ins = *p++;
+ unsigned int action = (ins >> 16);
+ if (action >= DASM__MAX) {
+ ofs += 4;
+ } else {
+ int *pl, n = action >= DASM_REL_PC ? va_arg(ap, int) : 0;
+ switch (action) {
+ case DASM_STOP: goto stop;
+ case DASM_SECTION:
+ n = (ins & 255); CK(n < D->maxsection, RANGE_SEC);
+ D->section = &D->sections[n]; goto stop;
+ case DASM_ESC: p++; ofs += 4; break;
+ case DASM_REL_EXT: break;
+ case DASM_ALIGN: ofs += (ins & 255); b[pos++] = ofs; break;
+ case DASM_REL_LG:
+ n = (ins & 2047) - 10; pl = D->lglabels + n;
+ /* Bkwd rel or global. */
+ if (n >= 0) { CK(n>=10||*pl<0, RANGE_LG); CKPL(lg, LG); goto putrel; }
+ pl += 10; n = *pl;
+ if (n < 0) n = 0; /* Start new chain for fwd rel if label exists. */
+ goto linkrel;
+ case DASM_REL_PC:
+ pl = D->pclabels + n; CKPL(pc, PC);
+ putrel:
+ n = *pl;
+ if (n < 0) { /* Label exists. Get label pos and store it. */
+ b[pos] = -n;
+ } else {
+ linkrel:
+ b[pos] = n; /* Else link to rel chain, anchored at label. */
+ *pl = pos;
+ }
+ pos++;
+ break;
+ case DASM_LABEL_LG:
+ pl = D->lglabels + (ins & 2047) - 10; CKPL(lg, LG); goto putlabel;
+ case DASM_LABEL_PC:
+ pl = D->pclabels + n; CKPL(pc, PC);
+ putlabel:
+ n = *pl; /* n > 0: Collapse rel chain and replace with label pos. */
+ while (n > 0) { int *pb = DASM_POS2PTR(D, n); n = *pb; *pb = pos;
+ }
+ *pl = -pos; /* Label exists now. */
+ b[pos++] = ofs; /* Store pass1 offset estimate. */
+ break;
+ case DASM_IMM:
+ CK((n & ((1<<((ins>>10)&31))-1)) == 0, RANGE_I);
+ n >>= ((ins>>10)&31);
+#ifdef DASM_CHECKS
+ if ((ins & 0x8000))
+ CK(((n + (1<<(((ins>>5)&31)-1)))>>((ins>>5)&31)) == 0, RANGE_I);
+ else
+ CK((n>>((ins>>5)&31)) == 0, RANGE_I);
+#endif
+ b[pos++] = n;
+ break;
+ case DASM_IMM6:
+ CK((n >> 6) == 0, RANGE_I);
+ b[pos++] = n;
+ break;
+ case DASM_IMM12:
+ CK(dasm_imm12((unsigned int)n) != -1, RANGE_I);
+ b[pos++] = n;
+ break;
+ case DASM_IMM13W:
+ CK(dasm_imm13(n, n) != -1, RANGE_I);
+ b[pos++] = n;
+ break;
+ case DASM_IMM13X: {
+ int m = va_arg(ap, int);
+ CK(dasm_imm13(n, m) != -1, RANGE_I);
+ b[pos++] = n;
+ b[pos++] = m;
+ break;
+ }
+ case DASM_IMML: {
+#ifdef DASM_CHECKS
+ int scale = (p[-2] >> 30);
+ CK((!(n & ((1<<scale)-1)) && (unsigned int)(n>>scale) < 4096) ||
+ (unsigned int)(n+256) < 512, RANGE_I);
+#endif
+ b[pos++] = n;
+ break;
+ }
+ }
+ }
+ }
+stop:
+ va_end(ap);
+ sec->pos = pos;
+ sec->ofs = ofs;
+}
+#undef CK
+
+/* Pass 2: Link sections, shrink aligns, fix label offsets. */
+int dasm_link(Dst_DECL, size_t *szp)
+{
+ dasm_State *D = Dst_REF;
+ int secnum;
+ int ofs = 0;
+
+#ifdef DASM_CHECKS
+ *szp = 0;
+ if (D->status != DASM_S_OK) return D->status;
+ {
+ int pc;
+ for (pc = 0; pc*sizeof(int) < D->pcsize; pc++)
+ if (D->pclabels[pc] > 0) return DASM_S_UNDEF_PC|pc;
+ }
+#endif
+
+ { /* Handle globals not defined in this translation unit. */
+ int idx;
+ for (idx = 20; idx*sizeof(int) < D->lgsize; idx++) {
+ int n = D->lglabels[idx];
+ /* Undefined label: Collapse rel chain and replace with marker (< 0). */
+ while (n > 0) { int *pb = DASM_POS2PTR(D, n); n = *pb; *pb = -idx; }
+ }
+ }
+
+ /* Combine all code sections. No support for data sections (yet). */
+ for (secnum = 0; secnum < D->maxsection; secnum++) {
+ dasm_Section *sec = D->sections + secnum;
+ int *b = sec->rbuf;
+ int pos = DASM_SEC2POS(secnum);
+ int lastpos = sec->pos;
+
+ while (pos != lastpos) {
+ dasm_ActList p = D->actionlist + b[pos++];
+ while (1) {
+ unsigned int ins = *p++;
+ unsigned int action = (ins >> 16);
+ switch (action) {
+ case DASM_STOP: case DASM_SECTION: goto stop;
+ case DASM_ESC: p++; break;
+ case DASM_REL_EXT: break;
+ case DASM_ALIGN: ofs -= (b[pos++] + ofs) & (ins & 255); break;
+ case DASM_REL_LG: case DASM_REL_PC: pos++; break;
+ case DASM_LABEL_LG: case DASM_LABEL_PC: b[pos++] += ofs; break;
+ case DASM_IMM: case DASM_IMM6: case DASM_IMM12: case DASM_IMM13W:
+ case DASM_IMML: pos++; break;
+ case DASM_IMM13X: pos += 2; break;
+ }
+ }
+ stop: (void)0;
+ }
+ ofs += sec->ofs; /* Next section starts right after current section. */
+ }
+
+ D->codesize = ofs; /* Total size of all code sections */
+ *szp = ofs;
+ return DASM_S_OK;
+}
+
+#ifdef DASM_CHECKS
+#define CK(x, st) \
+ do { if (!(x)) return DASM_S_##st|(p-D->actionlist-1); } while (0)
+#else
+#define CK(x, st) ((void)0)
+#endif
+
+/* Pass 3: Encode sections. */
+int dasm_encode(Dst_DECL, void *buffer)
+{
+ dasm_State *D = Dst_REF;
+ char *base = (char *)buffer;
+ unsigned int *cp = (unsigned int *)buffer;
+ int secnum;
+
+ /* Encode all code sections. No support for data sections (yet). */
+ for (secnum = 0; secnum < D->maxsection; secnum++) {
+ dasm_Section *sec = D->sections + secnum;
+ int *b = sec->buf;
+ int *endb = sec->rbuf + sec->pos;
+
+ while (b != endb) {
+ dasm_ActList p = D->actionlist + *b++;
+ while (1) {
+ unsigned int ins = *p++;
+ unsigned int action = (ins >> 16);
+ int n = (action >= DASM_ALIGN && action < DASM__MAX) ? *b++ : 0;
+ switch (action) {
+ case DASM_STOP: case DASM_SECTION: goto stop;
+ case DASM_ESC: *cp++ = *p++; break;
+ case DASM_REL_EXT:
+ n = DASM_EXTERN(Dst, (unsigned char *)cp, (ins&2047), !(ins&2048));
+ goto patchrel;
+ case DASM_ALIGN:
+ ins &= 255; while ((((char *)cp - base) & ins)) *cp++ = 0xe1a00000;
+ break;
+ case DASM_REL_LG:
+ CK(n >= 0, UNDEF_LG);
+ case DASM_REL_PC:
+ CK(n >= 0, UNDEF_PC);
+ n = *DASM_POS2PTR(D, n) - (int)((char *)cp - base) + 4;
+ patchrel:
+ if (!(ins & 0xf800)) { /* B, BL */
+ CK((n & 3) == 0 && ((n+0x08000000) >> 28) == 0, RANGE_REL);
+ cp[-1] |= ((n >> 2) & 0x03ffffff);
+ } else if ((ins & 0x800)) { /* B.cond, CBZ, CBNZ, LDR* literal */
+ CK((n & 3) == 0 && ((n+0x00100000) >> 21) == 0, RANGE_REL);
+ cp[-1] |= ((n << 3) & 0x00ffffe0);
+ } else if ((ins & 0x3000) == 0x2000) { /* ADR */
+ CK(((n+0x00100000) >> 21) == 0, RANGE_REL);
+ cp[-1] |= ((n << 3) & 0x00ffffe0) | ((n & 3) << 29);
+ } else if ((ins & 0x3000) == 0x3000) { /* ADRP */
+ cp[-1] |= ((n >> 9) & 0x00ffffe0) | (((n >> 12) & 3) << 29);
+ } else if ((ins & 0x1000)) { /* TBZ, TBNZ */
+ CK((n & 3) == 0 && ((n+0x00008000) >> 16) == 0, RANGE_REL);
+ cp[-1] |= ((n << 3) & 0x0007ffe0);
+ }
+ break;
+ case DASM_LABEL_LG:
+ ins &= 2047; if (ins >= 20) D->globals[ins-10] = (void *)(base + n);
+ break;
+ case DASM_LABEL_PC: break;
+ case DASM_IMM:
+ cp[-1] |= (n & ((1<<((ins>>5)&31))-1)) << (ins&31);
+ break;
+ case DASM_IMM6:
+ cp[-1] |= ((n&31) << 19) | ((n&32) << 26);
+ break;
+ case DASM_IMM12:
+ cp[-1] |= (dasm_imm12((unsigned int)n) << 10);
+ break;
+ case DASM_IMM13W:
+ cp[-1] |= (dasm_imm13(n, n) << 10);
+ break;
+ case DASM_IMM13X:
+ cp[-1] |= (dasm_imm13(n, *b++) << 10);
+ break;
+ case DASM_IMML: {
+ int scale = (p[-2] >> 30);
+ cp[-1] |= (!(n & ((1<<scale)-1)) && (unsigned int)(n>>scale) < 4096) ?
+ ((n << (10-scale)) | 0x01000000) : ((n & 511) << 12);
+ break;
+ }
+ default: *cp++ = ins; break;
+ }
+ }
+ stop: (void)0;
+ }
+ }
+
+ if (base + D->codesize != (char *)cp) /* Check for phase errors. */
+ return DASM_S_PHASE;
+ return DASM_S_OK;
+}
+#undef CK
+
+/* Get PC label offset. */
+int dasm_getpclabel(Dst_DECL, unsigned int pc)
+{
+ dasm_State *D = Dst_REF;
+ if (pc*sizeof(int) < D->pcsize) {
+ int pos = D->pclabels[pc];
+ if (pos < 0) return *DASM_POS2PTR(D, -pos);
+ if (pos > 0) return -1; /* Undefined. */
+ }
+ return -2; /* Unused or out of range. */
+}
+
+#ifdef DASM_CHECKS
+/* Optional sanity checker to call between isolated encoding steps. */
+int dasm_checkstep(Dst_DECL, int secmatch)
+{
+ dasm_State *D = Dst_REF;
+ if (D->status == DASM_S_OK) {
+ int i;
+ for (i = 1; i <= 9; i++) {
+ if (D->lglabels[i] > 0) { D->status = DASM_S_UNDEF_LG|i; break; }
+ D->lglabels[i] = 0;
+ }
+ }
+ if (D->status == DASM_S_OK && secmatch >= 0 &&
+ D->section != &D->sections[secmatch])
+ D->status = DASM_S_MATCH_SEC|(D->section-D->sections);
+ return D->status;
+}
+#endif
+
--- /dev/null
- -- Copyright (C) 2005-2015 Mike Pall. All rights reserved.
+------------------------------------------------------------------------------
+-- DynASM ARM64 module.
+--
++-- Copyright (C) 2005-2016 Mike Pall. All rights reserved.
+-- See dynasm.lua for full copyright notice.
+------------------------------------------------------------------------------
+
+-- Module information:
+local _info = {
+ arch = "arm",
+ description = "DynASM ARM64 module",
+ version = "1.4.0",
+ vernum = 10400,
+ release = "2015-10-18",
+ author = "Mike Pall",
+ license = "MIT",
+}
+
+-- Exported glue functions for the arch-specific module.
+local _M = { _info = _info }
+
+-- Cache library functions.
+local type, tonumber, pairs, ipairs = type, tonumber, pairs, ipairs
+local assert, setmetatable, rawget = assert, setmetatable, rawget
+local _s = string
+local sub, format, byte, char = _s.sub, _s.format, _s.byte, _s.char
+local match, gmatch, gsub = _s.match, _s.gmatch, _s.gsub
+local concat, sort, insert = table.concat, table.sort, table.insert
+local bit = bit or require("bit")
+local band, shl, shr, sar = bit.band, bit.lshift, bit.rshift, bit.arshift
+local ror, tohex = bit.ror, bit.tohex
+
+-- Inherited tables and callbacks.
+local g_opt, g_arch
+local wline, werror, wfatal, wwarn
+
+-- Action name list.
+-- CHECK: Keep this in sync with the C code!
+local action_names = {
+ "STOP", "SECTION", "ESC", "REL_EXT",
+ "ALIGN", "REL_LG", "LABEL_LG",
+ "REL_PC", "LABEL_PC", "IMM", "IMM6", "IMM12", "IMM13W", "IMM13X", "IMML",
+}
+
+-- Maximum number of section buffer positions for dasm_put().
+-- CHECK: Keep this in sync with the C code!
+local maxsecpos = 25 -- Keep this low, to avoid excessively long C lines.
+
+-- Action name -> action number.
+local map_action = {}
+for n,name in ipairs(action_names) do
+ map_action[name] = n-1
+end
+
+-- Action list buffer.
+local actlist = {}
+
+-- Argument list for next dasm_put(). Start with offset 0 into action list.
+local actargs = { 0 }
+
+-- Current number of section buffer positions for dasm_put().
+local secpos = 1
+
+------------------------------------------------------------------------------
+
+-- Dump action names and numbers.
+local function dumpactions(out)
+ out:write("DynASM encoding engine action codes:\n")
+ for n,name in ipairs(action_names) do
+ local num = map_action[name]
+ out:write(format(" %-10s %02X %d\n", name, num, num))
+ end
+ out:write("\n")
+end
+
+-- Write action list buffer as a huge static C array.
+local function writeactions(out, name)
+ local nn = #actlist
+ if nn == 0 then nn = 1; actlist[0] = map_action.STOP end
+ out:write("static const unsigned int ", name, "[", nn, "] = {\n")
+ for i = 1,nn-1 do
+ assert(out:write("0x", tohex(actlist[i]), ",\n"))
+ end
+ assert(out:write("0x", tohex(actlist[nn]), "\n};\n\n"))
+end
+
+------------------------------------------------------------------------------
+
+-- Add word to action list.
+local function wputxw(n)
+ assert(n >= 0 and n <= 0xffffffff and n % 1 == 0, "word out of range")
+ actlist[#actlist+1] = n
+end
+
+-- Add action to list with optional arg. Advance buffer pos, too.
+local function waction(action, val, a, num)
+ local w = assert(map_action[action], "bad action name `"..action.."'")
+ wputxw(w * 0x10000 + (val or 0))
+ if a then actargs[#actargs+1] = a end
+ if a or num then secpos = secpos + (num or 1) end
+end
+
+-- Flush action list (intervening C code or buffer pos overflow).
+local function wflush(term)
+ if #actlist == actargs[1] then return end -- Nothing to flush.
+ if not term then waction("STOP") end -- Terminate action list.
+ wline(format("dasm_put(Dst, %s);", concat(actargs, ", ")), true)
+ actargs = { #actlist } -- Actionlist offset is 1st arg to next dasm_put().
+ secpos = 1 -- The actionlist offset occupies a buffer position, too.
+end
+
+-- Put escaped word.
+local function wputw(n)
+ if n <= 0x000fffff then waction("ESC") end
+ wputxw(n)
+end
+
+-- Reserve position for word.
+local function wpos()
+ local pos = #actlist+1
+ actlist[pos] = ""
+ return pos
+end
+
+-- Store word to reserved position.
+local function wputpos(pos, n)
+ assert(n >= 0 and n <= 0xffffffff and n % 1 == 0, "word out of range")
+ if n <= 0x000fffff then
+ insert(actlist, pos+1, n)
+ n = map_action.ESC * 0x10000
+ end
+ actlist[pos] = n
+end
+
+------------------------------------------------------------------------------
+
+-- Global label name -> global label number. With auto assignment on 1st use.
+local next_global = 20
+local map_global = setmetatable({}, { __index = function(t, name)
+ if not match(name, "^[%a_][%w_]*$") then werror("bad global label") end
+ local n = next_global
+ if n > 2047 then werror("too many global labels") end
+ next_global = n + 1
+ t[name] = n
+ return n
+end})
+
+-- Dump global labels.
+local function dumpglobals(out, lvl)
+ local t = {}
+ for name, n in pairs(map_global) do t[n] = name end
+ out:write("Global labels:\n")
+ for i=20,next_global-1 do
+ out:write(format(" %s\n", t[i]))
+ end
+ out:write("\n")
+end
+
+-- Write global label enum.
+local function writeglobals(out, prefix)
+ local t = {}
+ for name, n in pairs(map_global) do t[n] = name end
+ out:write("enum {\n")
+ for i=20,next_global-1 do
+ out:write(" ", prefix, t[i], ",\n")
+ end
+ out:write(" ", prefix, "_MAX\n};\n")
+end
+
+-- Write global label names.
+local function writeglobalnames(out, name)
+ local t = {}
+ for name, n in pairs(map_global) do t[n] = name end
+ out:write("static const char *const ", name, "[] = {\n")
+ for i=20,next_global-1 do
+ out:write(" \"", t[i], "\",\n")
+ end
+ out:write(" (const char *)0\n};\n")
+end
+
+------------------------------------------------------------------------------
+
+-- Extern label name -> extern label number. With auto assignment on 1st use.
+local next_extern = 0
+local map_extern_ = {}
+local map_extern = setmetatable({}, { __index = function(t, name)
+ -- No restrictions on the name for now.
+ local n = next_extern
+ if n > 2047 then werror("too many extern labels") end
+ next_extern = n + 1
+ t[name] = n
+ map_extern_[n] = name
+ return n
+end})
+
+-- Dump extern labels.
+local function dumpexterns(out, lvl)
+ out:write("Extern labels:\n")
+ for i=0,next_extern-1 do
+ out:write(format(" %s\n", map_extern_[i]))
+ end
+ out:write("\n")
+end
+
+-- Write extern label names.
+local function writeexternnames(out, name)
+ out:write("static const char *const ", name, "[] = {\n")
+ for i=0,next_extern-1 do
+ out:write(" \"", map_extern_[i], "\",\n")
+ end
+ out:write(" (const char *)0\n};\n")
+end
+
+------------------------------------------------------------------------------
+
+-- Arch-specific maps.
+
+-- Ext. register name -> int. name.
+local map_archdef = { xzr = "@x31", wzr = "@w31", lr = "x30", }
+
+-- Int. register name -> ext. name.
+local map_reg_rev = { ["@x31"] = "xzr", ["@w31"] = "wzr", x30 = "lr", }
+
+local map_type = {} -- Type name -> { ctype, reg }
+local ctypenum = 0 -- Type number (for Dt... macros).
+
+-- Reverse defines for registers.
+function _M.revdef(s)
+ return map_reg_rev[s] or s
+end
+
+local map_shift = { lsl = 0, lsr = 1, asr = 2, }
+
+local map_extend = {
+ uxtb = 0, uxth = 1, uxtw = 2, uxtx = 3,
+ sxtb = 4, sxth = 5, sxtw = 6, sxtx = 7,
+}
+
+local map_cond = {
+ eq = 0, ne = 1, cs = 2, cc = 3, mi = 4, pl = 5, vs = 6, vc = 7,
+ hi = 8, ls = 9, ge = 10, lt = 11, gt = 12, le = 13, al = 14,
+ hs = 2, lo = 3,
+}
+
+------------------------------------------------------------------------------
+
+local parse_reg_type
+
+local function parse_reg(expr)
+ if not expr then werror("expected register name") end
+ local tname, ovreg = match(expr, "^([%w_]+):(@?%l%d+)$")
+ local tp = map_type[tname or expr]
+ if tp then
+ local reg = ovreg or tp.reg
+ if not reg then
+ werror("type `"..(tname or expr).."' needs a register override")
+ end
+ expr = reg
+ end
+ local ok31, rt, r = match(expr, "^(@?)([xwqdshb])([123]?[0-9])$")
+ if r then
+ r = tonumber(r)
+ if r <= 30 or (r == 31 and ok31 ~= "" or (rt ~= "w" and rt ~= "x")) then
+ if not parse_reg_type then
+ parse_reg_type = rt
+ elseif parse_reg_type ~= rt then
+ werror("register size mismatch")
+ end
+ return r, tp
+ end
+ end
+ werror("bad register name `"..expr.."'")
+end
+
+local function parse_reg_base(expr)
+ if expr == "sp" then return 0x3e0 end
+ local base, tp = parse_reg(expr)
+ if parse_reg_type ~= "x" then werror("bad register type") end
+ parse_reg_type = false
+ return shl(base, 5), tp
+end
+
+local parse_ctx = {}
+
+local loadenv = setfenv and function(s)
+ local code = loadstring(s, "")
+ if code then setfenv(code, parse_ctx) end
+ return code
+end or function(s)
+ return load(s, "", nil, parse_ctx)
+end
+
+-- Try to parse simple arithmetic, too, since some basic ops are aliases.
+local function parse_number(n)
+ local x = tonumber(n)
+ if x then return x end
+ local code = loadenv("return "..n)
+ if code then
+ local ok, y = pcall(code)
+ if ok then return y end
+ end
+ return nil
+end
+
+local function parse_imm(imm, bits, shift, scale, signed)
+ imm = match(imm, "^#(.*)$")
+ if not imm then werror("expected immediate operand") end
+ local n = parse_number(imm)
+ if n then
+ local m = sar(n, scale)
+ if shl(m, scale) == n then
+ if signed then
+ local s = sar(m, bits-1)
+ if s == 0 then return shl(m, shift)
+ elseif s == -1 then return shl(m + shl(1, bits), shift) end
+ else
+ if sar(m, bits) == 0 then return shl(m, shift) end
+ end
+ end
+ werror("out of range immediate `"..imm.."'")
+ else
+ waction("IMM", (signed and 32768 or 0)+scale*1024+bits*32+shift, imm)
+ return 0
+ end
+end
+
+local function parse_imm12(imm)
+ imm = match(imm, "^#(.*)$")
+ if not imm then werror("expected immediate operand") end
+ local n = parse_number(imm)
+ if n then
+ if shr(n, 12) == 0 then
+ return shl(n, 10)
+ elseif band(n, 0xff000fff) == 0 then
+ return shr(n, 2) + 0x00400000
+ end
+ werror("out of range immediate `"..imm.."'")
+ else
+ waction("IMM12", 0, imm)
+ return 0
+ end
+end
+
+local function parse_imm13(imm)
+ imm = match(imm, "^#(.*)$")
+ if not imm then werror("expected immediate operand") end
+ local n = parse_number(imm)
+ local r64 = parse_reg_type == "x"
+ if n and n % 1 == 0 and n >= 0 and n <= 0xffffffff then
+ local inv = false
+ if band(n, 1) == 1 then n = bit.bnot(n); inv = true end
+ local t = {}
+ for i=1,32 do t[i] = band(n, 1); n = shr(n, 1) end
+ local b = table.concat(t)
+ b = b..(r64 and (inv and "1" or "0"):rep(32) or b)
+ local p0, p1, p0a, p1a = b:match("^(0+)(1+)(0*)(1*)")
+ if p0 then
+ local w = p1a == "" and (r64 and 64 or 32) or #p1+#p0a
+ if band(w, w-1) == 0 and b == b:sub(1, w):rep(64/w) then
+ local s = band(-2*w, 0x3f) - 1
+ if w == 64 then s = s + 0x1000 end
+ if inv then
+ return shl(w-#p1-#p0, 16) + shl(s+w-#p1, 10)
+ else
+ return shl(w-#p0, 16) + shl(s+#p1, 10)
+ end
+ end
+ end
+ werror("out of range immediate `"..imm.."'")
+ elseif r64 then
+ waction("IMM13X", 0, format("(unsigned int)(%s)", imm))
+ actargs[#actargs+1] = format("(unsigned int)((unsigned long long)(%s)>>32)", imm)
+ return 0
+ else
+ waction("IMM13W", 0, imm)
+ return 0
+ end
+end
+
+local function parse_imm6(imm)
+ imm = match(imm, "^#(.*)$")
+ if not imm then werror("expected immediate operand") end
+ local n = parse_number(imm)
+ if n then
+ if n >= 0 and n <= 63 then
+ return shl(band(n, 0x1f), 19) + (n >= 32 and 0x80000000 or 0)
+ end
+ werror("out of range immediate `"..imm.."'")
+ else
+ waction("IMM6", 0, imm)
+ return 0
+ end
+end
+
+local function parse_imm_load(imm, scale)
+ local n = parse_number(imm)
+ if n then
+ local m = sar(n, scale)
+ if shl(m, scale) == n and m >= 0 and m < 0x1000 then
+ return shl(m, 10) + 0x01000000 -- Scaled, unsigned 12 bit offset.
+ elseif n >= -256 and n < 256 then
+ return shl(band(n, 511), 12) -- Unscaled, signed 9 bit offset.
+ end
+ werror("out of range immediate `"..imm.."'")
+ else
+ waction("IMML", 0, imm)
+ return 0
+ end
+end
+
+local function parse_fpimm(imm)
+ imm = match(imm, "^#(.*)$")
+ if not imm then werror("expected immediate operand") end
+ local n = parse_number(imm)
+ if n then
+ local m, e = math.frexp(n)
+ local s, e2 = 0, band(e-2, 7)
+ if m < 0 then m = -m; s = 0x00100000 end
+ m = m*32-16
+ if m % 1 == 0 and m >= 0 and m <= 15 and sar(shl(e2, 29), 29)+2 == e then
+ return s + shl(e2, 17) + shl(m, 13)
+ end
+ werror("out of range immediate `"..imm.."'")
+ else
+ werror("NYI fpimm action")
+ end
+end
+
+local function parse_shift(expr)
+ local s, s2 = match(expr, "^(%S+)%s*(.*)$")
+ s = map_shift[s]
+ if not s then werror("expected shift operand") end
+ return parse_imm(s2, 6, 10, 0, false) + shl(s, 22)
+end
+
+local function parse_lslx16(expr)
+ local n = match(expr, "^lsl%s*#(%d+)$")
+ n = tonumber(n)
+ if not n then werror("expected shift operand") end
+ if band(n, parse_reg_type == "x" and 0xffffffcf or 0xffffffef) ~= 0 then
+ werror("bad shift amount")
+ end
+ return shl(n, 17)
+end
+
+local function parse_extend(expr)
+ local s, s2 = match(expr, "^(%S+)%s*(.*)$")
+ if s == "lsl" then
+ s = parse_reg_type == "x" and 3 or 2
+ else
+ s = map_extend[s]
+ end
+ if not s then werror("expected extend operand") end
+ return (s2 == "" and 0 or parse_imm(s2, 3, 10, 0, false)) + shl(s, 13)
+end
+
+local function parse_cond(expr, inv)
+ local c = map_cond[expr]
+ if not c then werror("expected condition operand") end
+ return shl(bit.bxor(c, inv), 12)
+end
+
+local function parse_load(params, nparams, n, op)
+ if params[n+2] then werror("too many operands") end
+ local pn, p2 = params[n], params[n+1]
+ local p1, wb = match(pn, "^%[%s*(.-)%s*%](!?)$")
+ if not p1 then
+ if not p2 then
+ local reg, tailr = match(pn, "^([%w_:]+)%s*(.*)$")
+ if reg and tailr ~= "" then
+ local base, tp = parse_reg_base(reg)
+ if tp then
+ waction("IMML", 0, format(tp.ctypefmt, tailr))
+ return op + base
+ end
+ end
+ end
+ werror("expected address operand")
+ end
+ local scale = shr(op, 30)
+ if p2 then
+ if wb == "!" then werror("bad use of '!'") end
+ op = op + parse_reg_base(p1) + parse_imm(p2, 9, 12, 0, true) + 0x400
+ elseif wb == "!" then
+ local p1a, p2a = match(p1, "^([^,%s]*)%s*,%s*(.*)$")
+ if not p1a then werror("bad use of '!'") end
+ op = op + parse_reg_base(p1a) + parse_imm(p2a, 9, 12, 0, true) + 0xc00
+ else
+ local p1a, p2a = match(p1, "^([^,%s]*)%s*(.*)$")
+ op = op + parse_reg_base(p1a)
+ if p2a ~= "" then
+ local imm = match(p2a, "^,%s*#(.*)$")
+ if imm then
+ op = op + parse_imm_load(imm, scale)
+ else
+ local p2b, p3b, p3s = match(p2a, "^,%s*([^,%s]*)%s*,?%s*(%S*)%s*(.*)$")
+ op = op + shl(parse_reg(p2b), 16) + 0x00200800
+ if parse_reg_type ~= "x" and parse_reg_type ~= "w" then
+ werror("bad index register type")
+ end
+ if p3b == "" then
+ if parse_reg_type ~= "x" then werror("bad index register type") end
+ op = op + 0x6000
+ else
+ if p3s == "" or p3s == "#0" then
+ elseif p3s == "#"..scale then
+ op = op + 0x1000
+ else
+ werror("bad scale")
+ end
+ if parse_reg_type == "x" then
+ if p3b == "lsl" and p3s ~= "" then op = op + 0x6000
+ elseif p3b == "sxtx" then op = op + 0xe000
+ else
+ werror("bad extend/shift specifier")
+ end
+ else
+ if p3b == "uxtw" then op = op + 0x4000
+ elseif p3b == "sxtw" then op = op + 0xc000
+ else
+ werror("bad extend/shift specifier")
+ end
+ end
+ end
+ end
+ else
+ if wb == "!" then werror("bad use of '!'") end
+ op = op + 0x01000000
+ end
+ end
+ return op
+end
+
+local function parse_load_pair(params, nparams, n, op)
+ if params[n+2] then werror("too many operands") end
+ local pn, p2 = params[n], params[n+1]
+ local scale = shr(op, 30) == 0 and 2 or 3
+ local p1, wb = match(pn, "^%[%s*(.-)%s*%](!?)$")
+ if not p1 then
+ if not p2 then
+ local reg, tailr = match(pn, "^([%w_:]+)%s*(.*)$")
+ if reg and tailr ~= "" then
+ local base, tp = parse_reg_base(reg)
+ if tp then
+ waction("IMM", 32768+7*32+15+scale*1024, format(tp.ctypefmt, tailr))
+ return op + base + 0x01000000
+ end
+ end
+ end
+ werror("expected address operand")
+ end
+ if p2 then
+ if wb == "!" then werror("bad use of '!'") end
+ op = op + 0x00800000
+ else
+ local p1a, p2a = match(p1, "^([^,%s]*)%s*,%s*(.*)$")
+ if p1a then p1, p2 = p1a, p2a else p2 = "#0" end
+ op = op + (wb == "!" and 0x01800000 or 0x01000000)
+ end
+ return op + parse_reg_base(p1) + parse_imm(p2, 7, 15, scale, true)
+end
+
+local function parse_label(label, def)
+ local prefix = sub(label, 1, 2)
+ -- =>label (pc label reference)
+ if prefix == "=>" then
+ return "PC", 0, sub(label, 3)
+ end
+ -- ->name (global label reference)
+ if prefix == "->" then
+ return "LG", map_global[sub(label, 3)]
+ end
+ if def then
+ -- [1-9] (local label definition)
+ if match(label, "^[1-9]$") then
+ return "LG", 10+tonumber(label)
+ end
+ else
+ -- [<>][1-9] (local label reference)
+ local dir, lnum = match(label, "^([<>])([1-9])$")
+ if dir then -- Fwd: 1-9, Bkwd: 11-19.
+ return "LG", lnum + (dir == ">" and 0 or 10)
+ end
+ -- extern label (extern label reference)
+ local extname = match(label, "^extern%s+(%S+)$")
+ if extname then
+ return "EXT", map_extern[extname]
+ end
+ end
+ werror("bad label `"..label.."'")
+end
+
+local function branch_type(op)
+ if band(op, 0x7c000000) == 0x14000000 then return 0 -- B, BL
+ elseif shr(op, 24) == 0x54 or band(op, 0x7e000000) == 0x34000000 or
+ band(op, 0x3b000000) == 0x18000000 then
+ return 0x800 -- B.cond, CBZ, CBNZ, LDR* literal
+ elseif band(op, 0x7e000000) == 0x36000000 then return 0x1000 -- TBZ, TBNZ
+ elseif band(op, 0x9f000000) == 0x10000000 then return 0x2000 -- ADR
+ elseif band(op, 0x9f000000) == band(0x90000000) then return 0x3000 -- ADRP
+ else
+ assert(false, "unknown branch type")
+ end
+end
+
+------------------------------------------------------------------------------
+
+local map_op, op_template
+
+local function op_alias(opname, f)
+ return function(params, nparams)
+ if not params then return "-> "..opname:sub(1, -3) end
+ f(params, nparams)
+ op_template(params, map_op[opname], nparams)
+ end
+end
+
+local function alias_bfx(p)
+ p[4] = "#("..p[3]:sub(2)..")+("..p[4]:sub(2)..")-1"
+end
+
+local function alias_bfiz(p)
+ parse_reg(p[1])
+ if parse_reg_type == "w" then
+ p[3] = "#-("..p[3]:sub(2)..")%32"
+ p[4] = "#("..p[4]:sub(2)..")-1"
+ else
+ p[3] = "#-("..p[3]:sub(2)..")%64"
+ p[4] = "#("..p[4]:sub(2)..")-1"
+ end
+end
+
+local alias_lslimm = op_alias("ubfm_4", function(p)
+ parse_reg(p[1])
+ local sh = p[3]:sub(2)
+ if parse_reg_type == "w" then
+ p[3] = "#-("..sh..")%32"
+ p[4] = "#31-("..sh..")"
+ else
+ p[3] = "#-("..sh..")%64"
+ p[4] = "#63-("..sh..")"
+ end
+end)
+
+-- Template strings for ARM instructions.
+map_op = {
+ -- Basic data processing instructions.
+ add_3 = "0b000000DNMg|11000000pDpNIg|8b206000pDpNMx",
+ add_4 = "0b000000DNMSg|0b200000DNMXg|8b200000pDpNMXx|8b200000pDpNxMwX",
+ adds_3 = "2b000000DNMg|31000000DpNIg|ab206000DpNMx",
+ adds_4 = "2b000000DNMSg|2b200000DNMXg|ab200000DpNMXx|ab200000DpNxMwX",
+ cmn_2 = "2b00001fNMg|3100001fpNIg|ab20601fpNMx",
+ cmn_3 = "2b00001fNMSg|2b20001fNMXg|ab20001fpNMXx|ab20001fpNxMwX",
+
+ sub_3 = "4b000000DNMg|51000000pDpNIg|cb206000pDpNMx",
+ sub_4 = "4b000000DNMSg|4b200000DNMXg|cb200000pDpNMXx|cb200000pDpNxMwX",
+ subs_3 = "6b000000DNMg|71000000DpNIg|eb206000DpNMx",
+ subs_4 = "6b000000DNMSg|6b200000DNMXg|eb200000DpNMXx|eb200000DpNxMwX",
+ cmp_2 = "6b00001fNMg|7100001fpNIg|eb20601fpNMx",
+ cmp_3 = "6b00001fNMSg|6b20001fNMXg|eb20001fpNMXx|eb20001fpNxMwX",
+
+ neg_2 = "4b0003e0DMg",
+ neg_3 = "4b0003e0DMSg",
+ negs_2 = "6b0003e0DMg",
+ negs_3 = "6b0003e0DMSg",
+
+ adc_3 = "1a000000DNMg",
+ adcs_3 = "3a000000DNMg",
+ sbc_3 = "5a000000DNMg",
+ sbcs_3 = "7a000000DNMg",
+ ngc_2 = "5a0003e0DMg",
+ ngcs_2 = "7a0003e0DMg",
+
+ and_3 = "0a000000DNMg|12000000pDNig",
+ and_4 = "0a000000DNMSg",
+ orr_3 = "2a000000DNMg|32000000pDNig",
+ orr_4 = "2a000000DNMSg",
+ eor_3 = "4a000000DNMg|52000000pDNig",
+ eor_4 = "4a000000DNMSg",
+ ands_3 = "6a000000DNMg|72000000DNig",
+ ands_4 = "6a000000DNMSg",
+ tst_2 = "6a00001fNMg|7200001fNig",
+ tst_3 = "6a00001fNMSg",
+
+ bic_3 = "0a200000DNMg",
+ bic_4 = "0a200000DNMSg",
+ orn_3 = "2a200000DNMg",
+ orn_4 = "2a200000DNMSg",
+ eon_3 = "4a200000DNMg",
+ eon_4 = "4a200000DNMSg",
+ bics_3 = "6a200000DNMg",
+ bics_4 = "6a200000DNMSg",
+
+ movn_2 = "12800000DWg",
+ movn_3 = "12800000DWRg",
+ movz_2 = "52800000DWg",
+ movz_3 = "52800000DWRg",
+ movk_2 = "72800000DWg",
+ movk_3 = "72800000DWRg",
+
+ -- TODO: this doesn't cover all valid immediates for mov reg, #imm.
+ mov_2 = "2a0003e0DMg|52800000DW|320003e0pDig|11000000pDpNg",
+ mov_3 = "2a0003e0DMSg",
+ mvn_2 = "2a2003e0DMg",
+ mvn_3 = "2a2003e0DMSg",
+
+ adr_2 = "10000000DBx",
+ adrp_2 = "90000000DBx",
+
+ csel_4 = "1a800000DNMCg",
+ csinc_4 = "1a800400DNMCg",
+ csinv_4 = "5a800000DNMCg",
+ csneg_4 = "5a800400DNMCg",
+ cset_2 = "1a9f07e0Dcg",
+ csetm_2 = "5a9f03e0Dcg",
+ cinc_3 = "1a800400DNmcg",
+ cinv_3 = "5a800000DNmcg",
+ cneg_3 = "5a800400DNmcg",
+
+ ccmn_4 = "3a400000NMVCg|3a400800N5VCg",
+ ccmp_4 = "7a400000NMVCg|7a400800N5VCg",
+
+ madd_4 = "1b000000DNMAg",
+ msub_4 = "1b008000DNMAg",
+ mul_3 = "1b007c00DNMg",
+ mneg_3 = "1b00fc00DNMg",
+
+ smaddl_4 = "9b200000DxNMwAx",
+ smsubl_4 = "9b208000DxNMwAx",
+ smull_3 = "9b207c00DxNMw",
+ smnegl_3 = "9b20fc00DxNMw",
+ smulh_3 = "9b407c00DNMx",
+ umaddl_4 = "9ba00000DxNMwAx",
+ umsubl_4 = "9ba08000DxNMwAx",
+ umull_3 = "9ba07c00DxNMw",
+ umnegl_3 = "9ba0fc00DxNMw",
+ umulh_3 = "9bc07c00DNMx",
+
+ udiv_3 = "1ac00800DNMg",
+ sdiv_3 = "1ac00c00DNMg",
+
+ -- Bit operations.
+ sbfm_4 = "13000000DN12w|93400000DN12x",
+ bfm_4 = "33000000DN12w|b3400000DN12x",
+ ubfm_4 = "53000000DN12w|d3400000DN12x",
+ extr_4 = "13800000DNM2w|93c00000DNM2x",
+
+ sxtb_2 = "13001c00DNw|93401c00DNx",
+ sxth_2 = "13003c00DNw|93403c00DNx",
+ sxtw_2 = "93407c00DxNw",
+ uxtb_2 = "53001c00DNw",
+ uxth_2 = "53003c00DNw",
+
+ sbfx_4 = op_alias("sbfm_4", alias_bfx),
+ bfxil_4 = op_alias("bfm_4", alias_bfx),
+ ubfx_4 = op_alias("ubfm_4", alias_bfx),
+ sbfiz_4 = op_alias("sbfm_4", alias_bfiz),
+ bfi_4 = op_alias("bfm_4", alias_bfiz),
+ ubfiz_4 = op_alias("ubfm_4", alias_bfiz),
+
+ lsl_3 = function(params, nparams)
+ if params and params[3]:byte() == 35 then
+ return alias_lslimm(params, nparams)
+ else
+ return op_template(params, "1ac02000DNMg", nparams)
+ end
+ end,
+ lsr_3 = "1ac02400DNMg|53007c00DN1w|d340fc00DN1x",
+ asr_3 = "1ac02800DNMg|13007c00DN1w|9340fc00DN1x",
+ ror_3 = "1ac02c00DNMg|13800000DNm2w|93c00000DNm2x",
+
+ clz_2 = "5ac01000DNg",
+ cls_2 = "5ac01400DNg",
+ rbit_2 = "5ac00000DNg",
+ rev_2 = "5ac00800DNw|dac00c00DNx",
+ rev16_2 = "5ac00400DNg",
+ rev32_2 = "dac00800DNx",
+
+ -- Loads and stores.
+ ["strb_*"] = "38000000DwL",
+ ["ldrb_*"] = "38400000DwL",
+ ["ldrsb_*"] = "38c00000DwL|38800000DxL",
+ ["strh_*"] = "78000000DwL",
+ ["ldrh_*"] = "78400000DwL",
+ ["ldrsh_*"] = "78c00000DwL|78800000DxL",
+ ["str_*"] = "b8000000DwL|f8000000DxL|bc000000DsL|fc000000DdL",
+ ["ldr_*"] = "18000000DwB|58000000DxB|1c000000DsB|5c000000DdB|b8400000DwL|f8400000DxL|bc400000DsL|fc400000DdL",
+ ["ldrsw_*"] = "98000000DxB|b8800000DxL",
+ -- NOTE: ldur etc. are handled by ldr et al.
+
+ ["stp_*"] = "28000000DAwP|a8000000DAxP|2c000000DAsP|6c000000DAdP",
+ ["ldp_*"] = "28400000DAwP|a8400000DAxP|2c400000DAsP|6c400000DAdP",
+ ["ldpsw_*"] = "68400000DAxP",
+
+ -- Branches.
+ b_1 = "14000000B",
+ bl_1 = "94000000B",
+ blr_1 = "d63f0000Nx",
+ br_1 = "d61f0000Nx",
+ ret_0 = "d65f03c0",
+ ret_1 = "d65f0000Nx",
+ -- b.cond is added below.
+ cbz_2 = "34000000DBg",
+ cbnz_2 = "35000000DBg",
+ tbz_3 = "36000000DTBw|36000000DTBx",
+ tbnz_3 = "37000000DTBw|37000000DTBx",
+
+ -- Miscellaneous instructions.
+ -- TODO: hlt, hvc, smc, svc, eret, dcps[123], drps, mrs, msr
+ -- TODO: sys, sysl, ic, dc, at, tlbi
+ -- TODO: hint, yield, wfe, wfi, sev, sevl
+ -- TODO: clrex, dsb, dmb, isb
+ nop_0 = "d503201f",
+ brk_0 = "d4200000",
+ brk_1 = "d4200000W",
+
+ -- Floating point instructions.
+ fmov_2 = "1e204000DNf|1e260000DwNs|1e270000DsNw|9e660000DxNd|9e670000DdNx|1e201000DFf",
+ fabs_2 = "1e20c000DNf",
+ fneg_2 = "1e214000DNf",
+ fsqrt_2 = "1e21c000DNf",
+
+ fcvt_2 = "1e22c000DdNs|1e624000DsNd",
+
+ -- TODO: half-precision and fixed-point conversions.
+ fcvtas_2 = "1e240000DwNs|9e240000DxNs|1e640000DwNd|9e640000DxNd",
+ fcvtau_2 = "1e250000DwNs|9e250000DxNs|1e650000DwNd|9e650000DxNd",
+ fcvtms_2 = "1e300000DwNs|9e300000DxNs|1e700000DwNd|9e700000DxNd",
+ fcvtmu_2 = "1e310000DwNs|9e310000DxNs|1e710000DwNd|9e710000DxNd",
+ fcvtns_2 = "1e200000DwNs|9e200000DxNs|1e600000DwNd|9e600000DxNd",
+ fcvtnu_2 = "1e210000DwNs|9e210000DxNs|1e610000DwNd|9e610000DxNd",
+ fcvtps_2 = "1e280000DwNs|9e280000DxNs|1e680000DwNd|9e680000DxNd",
+ fcvtpu_2 = "1e290000DwNs|9e290000DxNs|1e690000DwNd|9e690000DxNd",
+ fcvtzs_2 = "1e380000DwNs|9e380000DxNs|1e780000DwNd|9e780000DxNd",
+ fcvtzu_2 = "1e390000DwNs|9e390000DxNs|1e790000DwNd|9e790000DxNd",
+
+ scvtf_2 = "1e220000DsNw|9e220000DsNx|1e620000DdNw|9e620000DdNx",
+ ucvtf_2 = "1e230000DsNw|9e230000DsNx|1e630000DdNw|9e630000DdNx",
+
+ frintn_2 = "1e244000DNf",
+ frintp_2 = "1e24c000DNf",
+ frintm_2 = "1e254000DNf",
+ frintz_2 = "1e25c000DNf",
+ frinta_2 = "1e264000DNf",
+ frintx_2 = "1e274000DNf",
+ frinti_2 = "1e27c000DNf",
+
+ fadd_3 = "1e202800DNMf",
+ fsub_3 = "1e203800DNMf",
+ fmul_3 = "1e200800DNMf",
+ fnmul_3 = "1e208800DNMf",
+ fdiv_3 = "1e201800DNMf",
+
+ fmadd_4 = "1f000000DNMAf",
+ fmsub_4 = "1f008000DNMAf",
+ fnmadd_4 = "1f200000DNMAf",
+ fnmsub_4 = "1f208000DNMAf",
+
+ fmax_3 = "1e204800DNMf",
+ fmaxnm_3 = "1e206800DNMf",
+ fmin_3 = "1e205800DNMf",
+ fminnm_3 = "1e207800DNMf",
+
+ fcmp_2 = "1e202000NMf|1e202008NZf",
+ fcmpe_2 = "1e202010NMf|1e202018NZf",
+
+ fccmp_4 = "1e200400NMVCf",
+ fccmpe_4 = "1e200410NMVCf",
+
+ fcsel_4 = "1e200c00DNMCf",
+
+ -- TODO: crc32*, aes*, sha*, pmull
+ -- TODO: SIMD instructions.
+}
+
+for cond,c in pairs(map_cond) do
+ map_op["b"..cond.."_1"] = tohex(0x54000000+c).."B"
+end
+
+------------------------------------------------------------------------------
+
+-- Handle opcodes defined with template strings.
+local function parse_template(params, template, nparams, pos)
+ local op = tonumber(sub(template, 1, 8), 16)
+ local n = 1
+ local rtt = {}
+
+ parse_reg_type = false
+
+ -- Process each character.
+ for p in gmatch(sub(template, 9), ".") do
+ local q = params[n]
+ if p == "D" then
+ op = op + parse_reg(q); n = n + 1
+ elseif p == "N" then
+ op = op + shl(parse_reg(q), 5); n = n + 1
+ elseif p == "M" then
+ op = op + shl(parse_reg(q), 16); n = n + 1
+ elseif p == "A" then
+ op = op + shl(parse_reg(q), 10); n = n + 1
+ elseif p == "m" then
+ op = op + shl(parse_reg(params[n-1]), 16)
+
+ elseif p == "p" then
+ if q == "sp" then params[n] = "@x31" end
+ elseif p == "g" then
+ if parse_reg_type == "x" then
+ op = op + 0x80000000
+ elseif parse_reg_type ~= "w" then
+ werror("bad register type")
+ end
+ parse_reg_type = false
+ elseif p == "f" then
+ if parse_reg_type == "d" then
+ op = op + 0x00400000
+ elseif parse_reg_type ~= "s" then
+ werror("bad register type")
+ end
+ parse_reg_type = false
+ elseif p == "x" or p == "w" or p == "d" or p == "s" then
+ if parse_reg_type ~= p then
+ werror("register size mismatch")
+ end
+ parse_reg_type = false
+
+ elseif p == "L" then
+ op = parse_load(params, nparams, n, op)
+ elseif p == "P" then
+ op = parse_load_pair(params, nparams, n, op)
+
+ elseif p == "B" then
+ local mode, v, s = parse_label(q, false); n = n + 1
+ local m = branch_type(op)
+ waction("REL_"..mode, v+m, s, 1)
+
+ elseif p == "I" then
+ op = op + parse_imm12(q); n = n + 1
+ elseif p == "i" then
+ op = op + parse_imm13(q); n = n + 1
+ elseif p == "W" then
+ op = op + parse_imm(q, 16, 5, 0, false); n = n + 1
+ elseif p == "T" then
+ op = op + parse_imm6(q); n = n + 1
+ elseif p == "1" then
+ op = op + parse_imm(q, 6, 16, 0, false); n = n + 1
+ elseif p == "2" then
+ op = op + parse_imm(q, 6, 10, 0, false); n = n + 1
+ elseif p == "5" then
+ op = op + parse_imm(q, 5, 16, 0, false); n = n + 1
+ elseif p == "V" then
+ op = op + parse_imm(q, 4, 0, 0, false); n = n + 1
+ elseif p == "F" then
+ op = op + parse_fpimm(q); n = n + 1
+ elseif p == "Z" then
+ if q ~= "#0" and q ~= "#0.0" then werror("expected zero immediate") end
+ n = n + 1
+
+ elseif p == "S" then
+ op = op + parse_shift(q); n = n + 1
+ elseif p == "X" then
+ op = op + parse_extend(q); n = n + 1
+ elseif p == "R" then
+ op = op + parse_lslx16(q); n = n + 1
+ elseif p == "C" then
+ op = op + parse_cond(q, 0); n = n + 1
+ elseif p == "c" then
+ op = op + parse_cond(q, 1); n = n + 1
+
+ else
+ assert(false)
+ end
+ end
+ wputpos(pos, op)
+end
+
+function op_template(params, template, nparams)
+ if not params then return template:gsub("%x%x%x%x%x%x%x%x", "") end
+
+ -- Limit number of section buffer positions used by a single dasm_put().
+ -- A single opcode needs a maximum of 3 positions.
+ if secpos+3 > maxsecpos then wflush() end
+ local pos = wpos()
+ local lpos, apos, spos = #actlist, #actargs, secpos
+
+ local ok, err
+ for t in gmatch(template, "[^|]+") do
+ ok, err = pcall(parse_template, params, t, nparams, pos)
+ if ok then return end
+ secpos = spos
+ actlist[lpos+1] = nil
+ actlist[lpos+2] = nil
+ actlist[lpos+3] = nil
+ actargs[apos+1] = nil
+ actargs[apos+2] = nil
+ actargs[apos+3] = nil
+ end
+ error(err, 0)
+end
+
+map_op[".template__"] = op_template
+
+------------------------------------------------------------------------------
+
+-- Pseudo-opcode to mark the position where the action list is to be emitted.
+map_op[".actionlist_1"] = function(params)
+ if not params then return "cvar" end
+ local name = params[1] -- No syntax check. You get to keep the pieces.
+ wline(function(out) writeactions(out, name) end)
+end
+
+-- Pseudo-opcode to mark the position where the global enum is to be emitted.
+map_op[".globals_1"] = function(params)
+ if not params then return "prefix" end
+ local prefix = params[1] -- No syntax check. You get to keep the pieces.
+ wline(function(out) writeglobals(out, prefix) end)
+end
+
+-- Pseudo-opcode to mark the position where the global names are to be emitted.
+map_op[".globalnames_1"] = function(params)
+ if not params then return "cvar" end
+ local name = params[1] -- No syntax check. You get to keep the pieces.
+ wline(function(out) writeglobalnames(out, name) end)
+end
+
+-- Pseudo-opcode to mark the position where the extern names are to be emitted.
+map_op[".externnames_1"] = function(params)
+ if not params then return "cvar" end
+ local name = params[1] -- No syntax check. You get to keep the pieces.
+ wline(function(out) writeexternnames(out, name) end)
+end
+
+------------------------------------------------------------------------------
+
+-- Label pseudo-opcode (converted from trailing colon form).
+map_op[".label_1"] = function(params)
+ if not params then return "[1-9] | ->global | =>pcexpr" end
+ if secpos+1 > maxsecpos then wflush() end
+ local mode, n, s = parse_label(params[1], true)
+ if mode == "EXT" then werror("bad label definition") end
+ waction("LABEL_"..mode, n, s, 1)
+end
+
+------------------------------------------------------------------------------
+
+-- Pseudo-opcodes for data storage.
+map_op[".long_*"] = function(params)
+ if not params then return "imm..." end
+ for _,p in ipairs(params) do
+ local n = tonumber(p)
+ if not n then werror("bad immediate `"..p.."'") end
+ if n < 0 then n = n + 2^32 end
+ wputw(n)
+ if secpos+2 > maxsecpos then wflush() end
+ end
+end
+
+-- Alignment pseudo-opcode.
+map_op[".align_1"] = function(params)
+ if not params then return "numpow2" end
+ if secpos+1 > maxsecpos then wflush() end
+ local align = tonumber(params[1])
+ if align then
+ local x = align
+ -- Must be a power of 2 in the range (2 ... 256).
+ for i=1,8 do
+ x = x / 2
+ if x == 1 then
+ waction("ALIGN", align-1, nil, 1) -- Action byte is 2**n-1.
+ return
+ end
+ end
+ end
+ werror("bad alignment")
+end
+
+------------------------------------------------------------------------------
+
+-- Pseudo-opcode for (primitive) type definitions (map to C types).
+map_op[".type_3"] = function(params, nparams)
+ if not params then
+ return nparams == 2 and "name, ctype" or "name, ctype, reg"
+ end
+ local name, ctype, reg = params[1], params[2], params[3]
+ if not match(name, "^[%a_][%w_]*$") then
+ werror("bad type name `"..name.."'")
+ end
+ local tp = map_type[name]
+ if tp then
+ werror("duplicate type `"..name.."'")
+ end
+ -- Add #type to defines. A bit unclean to put it in map_archdef.
+ map_archdef["#"..name] = "sizeof("..ctype..")"
+ -- Add new type and emit shortcut define.
+ local num = ctypenum + 1
+ map_type[name] = {
+ ctype = ctype,
+ ctypefmt = format("Dt%X(%%s)", num),
+ reg = reg,
+ }
+ wline(format("#define Dt%X(_V) (int)(ptrdiff_t)&(((%s *)0)_V)", num, ctype))
+ ctypenum = num
+end
+map_op[".type_2"] = map_op[".type_3"]
+
+-- Dump type definitions.
+local function dumptypes(out, lvl)
+ local t = {}
+ for name in pairs(map_type) do t[#t+1] = name end
+ sort(t)
+ out:write("Type definitions:\n")
+ for _,name in ipairs(t) do
+ local tp = map_type[name]
+ local reg = tp.reg or ""
+ out:write(format(" %-20s %-20s %s\n", name, tp.ctype, reg))
+ end
+ out:write("\n")
+end
+
+------------------------------------------------------------------------------
+
+-- Set the current section.
+function _M.section(num)
+ waction("SECTION", num)
+ wflush(true) -- SECTION is a terminal action.
+end
+
+------------------------------------------------------------------------------
+
+-- Dump architecture description.
+function _M.dumparch(out)
+ out:write(format("DynASM %s version %s, released %s\n\n",
+ _info.arch, _info.version, _info.release))
+ dumpactions(out)
+end
+
+-- Dump all user defined elements.
+function _M.dumpdef(out, lvl)
+ dumptypes(out, lvl)
+ dumpglobals(out, lvl)
+ dumpexterns(out, lvl)
+end
+
+------------------------------------------------------------------------------
+
+-- Pass callbacks from/to the DynASM core.
+function _M.passcb(wl, we, wf, ww)
+ wline, werror, wfatal, wwarn = wl, we, wf, ww
+ return wflush
+end
+
+-- Setup the arch-specific module.
+function _M.setup(arch, opt)
+ g_arch, g_opt = arch, opt
+end
+
+-- Merge the core maps and the arch-specific maps.
+function _M.mergemaps(map_coreop, map_def)
+ setmetatable(map_op, { __index = map_coreop })
+ setmetatable(map_def, { __index = map_archdef })
+ return map_op, map_def
+end
+
+return _M
+
+------------------------------------------------------------------------------
+
/*
-** DynASM PPC encoding engine.
+** DynASM PPC/PPC64 encoding engine.
- ** Copyright (C) 2005-2015 Mike Pall. All rights reserved.
+ ** Copyright (C) 2005-2016 Mike Pall. All rights reserved.
** Released under the MIT license. See dynasm.lua for full copyright notice.
*/
------------------------------------------------------------------------------
--- DynASM PPC module.
+-- DynASM PPC/PPC64 module.
--
- -- Copyright (C) 2005-2015 Mike Pall. All rights reserved.
+ -- Copyright (C) 2005-2016 Mike Pall. All rights reserved.
-- See dynasm.lua for full copyright notice.
+--
+-- Support for various extensions contributed by Caio Souza Oliveira.
------------------------------------------------------------------------------
-- Module information:
--- /dev/null
- -- Copyright (C) 2005-2015 Mike Pall. All rights reserved.
+----------------------------------------------------------------------------
+-- Lua script to dump the bytecode of the library functions written in Lua.
+-- The resulting 'buildvm_libbc.h' is used for the build process of LuaJIT.
+----------------------------------------------------------------------------
++-- Copyright (C) 2005-2016 Mike Pall. All rights reserved.
+-- Released under the MIT license. See Copyright Notice in luajit.h
+----------------------------------------------------------------------------
+
+local ffi = require("ffi")
+local bit = require("bit")
+local vmdef = require("jit.vmdef")
+local bcnames = vmdef.bcnames
+
+local format = string.format
+
+local isbe = (string.byte(string.dump(function() end), 5) % 2 == 1)
+
+local function usage(arg)
+ io.stderr:write("Usage: ", arg and arg[0] or "genlibbc",
+ " [-o buildvm_libbc.h] lib_*.c\n")
+ os.exit(1)
+end
+
+local function parse_arg(arg)
+ local outfile = "-"
+ if not (arg and arg[1]) then
+ usage(arg)
+ end
+ if arg[1] == "-o" then
+ outfile = arg[2]
+ if not outfile then usage(arg) end
+ table.remove(arg, 1)
+ table.remove(arg, 1)
+ end
+ return outfile
+end
+
+local function read_files(names)
+ local src = ""
+ for _,name in ipairs(names) do
+ local fp = assert(io.open(name))
+ src = src .. fp:read("*a")
+ fp:close()
+ end
+ return src
+end
+
+local function transform_lua(code)
+ local fixup = {}
+ local n = -30000
+ code = string.gsub(code, "CHECK_(%w*)%((.-)%)", function(tp, var)
+ n = n + 1
+ fixup[n] = { "CHECK", tp }
+ return format("%s=%d", var, n)
+ end)
+ code = string.gsub(code, "PAIRS%((.-)%)", function(var)
+ fixup.PAIRS = true
+ return format("nil, %s, 0", var)
+ end)
+ return "return "..code, fixup
+end
+
+local function read_uleb128(p)
+ local v = p[0]; p = p + 1
+ if v >= 128 then
+ local sh = 7; v = v - 128
+ repeat
+ local r = p[0]
+ v = v + bit.lshift(bit.band(r, 127), sh)
+ sh = sh + 7
+ p = p + 1
+ until r < 128
+ end
+ return p, v
+end
+
+-- ORDER LJ_T
+local name2itype = {
+ str = 5, func = 9, tab = 12, int = 14, num = 15
+}
+
+local BC = {}
+for i=0,#bcnames/6-1 do
+ BC[string.gsub(string.sub(bcnames, i*6+1, i*6+6), " ", "")] = i
+end
+local xop, xra = isbe and 3 or 0, isbe and 2 or 1
+local xrc, xrb = isbe and 1 or 2, isbe and 0 or 3
+
+local function fixup_dump(dump, fixup)
+ local buf = ffi.new("uint8_t[?]", #dump+1, dump)
+ local p = buf+5
+ local n, sizebc
+ p, n = read_uleb128(p)
+ local start = p
+ p = p + 4
+ p = read_uleb128(p)
+ p = read_uleb128(p)
+ p, sizebc = read_uleb128(p)
+ local rawtab = {}
+ for i=0,sizebc-1 do
+ local op = p[xop]
+ if op == BC.KSHORT then
+ local rd = p[xrc] + 256*p[xrb]
+ rd = bit.arshift(bit.lshift(rd, 16), 16)
+ local f = fixup[rd]
+ if f then
+ if f[1] == "CHECK" then
+ local tp = f[2]
+ if tp == "tab" then rawtab[p[xra]] = true end
+ p[xop] = tp == "num" and BC.ISNUM or BC.ISTYPE
+ p[xrb] = 0
+ p[xrc] = name2itype[tp]
+ else
+ error("unhandled fixup type: "..f[1])
+ end
+ end
+ elseif op == BC.TGETV then
+ if rawtab[p[xrb]] then
+ p[xop] = BC.TGETR
+ end
+ elseif op == BC.TSETV then
+ if rawtab[p[xrb]] then
+ p[xop] = BC.TSETR
+ end
+ elseif op == BC.ITERC then
+ if fixup.PAIRS then
+ p[xop] = BC.ITERN
+ end
+ end
+ p = p + 4
+ end
+ return ffi.string(start, n)
+end
+
+local function find_defs(src)
+ local defs = {}
+ for name, code in string.gmatch(src, "LJLIB_LUA%(([^)]*)%)%s*/%*(.-)%*/") do
+ local env = {}
+ local tcode, fixup = transform_lua(code)
+ local func = assert(load(tcode, "", nil, env))()
+ defs[name] = fixup_dump(string.dump(func, true), fixup)
+ defs[#defs+1] = name
+ end
+ return defs
+end
+
+local function gen_header(defs)
+ local t = {}
+ local function w(x) t[#t+1] = x end
+ w("/* This is a generated file. DO NOT EDIT! */\n\n")
+ w("static const int libbc_endian = ") w(isbe and 1 or 0) w(";\n\n")
+ local s = ""
+ for _,name in ipairs(defs) do
+ s = s .. defs[name]
+ end
+ w("static const uint8_t libbc_code[] = {\n")
+ local n = 0
+ for i=1,#s do
+ local x = string.byte(s, i)
+ w(x); w(",")
+ n = n + (x < 10 and 2 or (x < 100 and 3 or 4))
+ if n >= 75 then n = 0; w("\n") end
+ end
+ w("0\n};\n\n")
+ w("static const struct { const char *name; int ofs; } libbc_map[] = {\n")
+ local m = 0
+ for _,name in ipairs(defs) do
+ w('{"'); w(name); w('",'); w(m) w('},\n')
+ m = m + #defs[name]
+ end
+ w("{NULL,"); w(m); w("}\n};\n\n")
+ return table.concat(t)
+end
+
+local function write_file(name, data)
+ if name == "-" then
+ assert(io.write(data))
+ assert(io.flush())
+ else
+ local fp = io.open(name)
+ if fp then
+ local old = fp:read("*a")
+ fp:close()
+ if data == old then return end
+ end
+ fp = assert(io.open(name, "w"))
+ assert(fp:write(data))
+ assert(fp:close())
+ end
+end
+
+local outfile = parse_arg(arg)
+local src = read_files(arg)
+local defs = find_defs(src)
+local hdr = gen_header(defs)
+write_file(outfile, hdr)
+
--- /dev/null
- -- Copyright (C) 2005-2015 Mike Pall. All rights reserved.
+----------------------------------------------------------------------------
+-- LuaJIT profiler.
+--
++-- Copyright (C) 2005-2016 Mike Pall. All rights reserved.
+-- Released under the MIT license. See Copyright Notice in luajit.h
+----------------------------------------------------------------------------
+--
+-- This module is a simple command line interface to the built-in
+-- low-overhead profiler of LuaJIT.
+--
+-- The lower-level API of the profiler is accessible via the "jit.profile"
+-- module or the luaJIT_profile_* C API.
+--
+-- Example usage:
+--
+-- luajit -jp myapp.lua
+-- luajit -jp=s myapp.lua
+-- luajit -jp=-s myapp.lua
+-- luajit -jp=vl myapp.lua
+-- luajit -jp=G,profile.txt myapp.lua
+--
+-- The following dump features are available:
+--
+-- f Stack dump: function name, otherwise module:line. Default mode.
+-- F Stack dump: ditto, but always prepend module.
+-- l Stack dump: module:line.
+-- <number> stack dump depth (callee < caller). Default: 1.
+-- -<number> Inverse stack dump depth (caller > callee).
+-- s Split stack dump after first stack level. Implies abs(depth) >= 2.
+-- p Show full path for module names.
+-- v Show VM states. Can be combined with stack dumps, e.g. vf or fv.
+-- z Show zones. Can be combined with stack dumps, e.g. zf or fz.
+-- r Show raw sample counts. Default: show percentages.
+-- a Annotate excerpts from source code files.
+-- A Annotate complete source code files.
+-- G Produce raw output suitable for graphical tools (e.g. flame graphs).
+-- m<number> Minimum sample percentage to be shown. Default: 3.
+-- i<number> Sampling interval in milliseconds. Default: 10.
+--
+----------------------------------------------------------------------------
+
+-- Cache some library functions and objects.
+local jit = require("jit")
+assert(jit.version_num == 20100, "LuaJIT core/library version mismatch")
+local profile = require("jit.profile")
+local vmdef = require("jit.vmdef")
+local math = math
+local pairs, ipairs, tonumber, floor = pairs, ipairs, tonumber, math.floor
+local sort, format = table.sort, string.format
+local stdout = io.stdout
+local zone -- Load jit.zone module on demand.
+
+-- Output file handle.
+local out
+
+------------------------------------------------------------------------------
+
+local prof_ud
+local prof_states, prof_split, prof_min, prof_raw, prof_fmt, prof_depth
+local prof_ann, prof_count1, prof_count2, prof_samples
+
+local map_vmmode = {
+ N = "Compiled",
+ I = "Interpreted",
+ C = "C code",
+ G = "Garbage Collector",
+ J = "JIT Compiler",
+}
+
+-- Profiler callback.
+local function prof_cb(th, samples, vmmode)
+ prof_samples = prof_samples + samples
+ local key_stack, key_stack2, key_state
+ -- Collect keys for sample.
+ if prof_states then
+ if prof_states == "v" then
+ key_state = map_vmmode[vmmode] or vmmode
+ else
+ key_state = zone:get() or "(none)"
+ end
+ end
+ if prof_fmt then
+ key_stack = profile.dumpstack(th, prof_fmt, prof_depth)
+ key_stack = key_stack:gsub("%[builtin#(%d+)%]", function(x)
+ return vmdef.ffnames[tonumber(x)]
+ end)
+ if prof_split == 2 then
+ local k1, k2 = key_stack:match("(.-) [<>] (.*)")
+ if k2 then key_stack, key_stack2 = k1, k2 end
+ elseif prof_split == 3 then
+ key_stack2 = profile.dumpstack(th, "l", 1)
+ end
+ end
+ -- Order keys.
+ local k1, k2
+ if prof_split == 1 then
+ if key_state then
+ k1 = key_state
+ if key_stack then k2 = key_stack end
+ end
+ elseif key_stack then
+ k1 = key_stack
+ if key_stack2 then k2 = key_stack2 elseif key_state then k2 = key_state end
+ end
+ -- Coalesce samples in one or two levels.
+ if k1 then
+ local t1 = prof_count1
+ t1[k1] = (t1[k1] or 0) + samples
+ if k2 then
+ local t2 = prof_count2
+ local t3 = t2[k1]
+ if not t3 then t3 = {}; t2[k1] = t3 end
+ t3[k2] = (t3[k2] or 0) + samples
+ end
+ end
+end
+
+------------------------------------------------------------------------------
+
+-- Show top N list.
+local function prof_top(count1, count2, samples, indent)
+ local t, n = {}, 0
+ for k, v in pairs(count1) do
+ n = n + 1
+ t[n] = k
+ end
+ sort(t, function(a, b) return count1[a] > count1[b] end)
+ for i=1,n do
+ local k = t[i]
+ local v = count1[k]
+ local pct = floor(v*100/samples + 0.5)
+ if pct < prof_min then break end
+ if not prof_raw then
+ out:write(format("%s%2d%% %s\n", indent, pct, k))
+ elseif prof_raw == "r" then
+ out:write(format("%s%5d %s\n", indent, v, k))
+ else
+ out:write(format("%s %d\n", k, v))
+ end
+ if count2 then
+ local r = count2[k]
+ if r then
+ prof_top(r, nil, v, (prof_split == 3 or prof_split == 1) and " -- " or
+ (prof_depth < 0 and " -> " or " <- "))
+ end
+ end
+ end
+end
+
+-- Annotate source code
+local function prof_annotate(count1, samples)
+ local files = {}
+ local ms = 0
+ for k, v in pairs(count1) do
+ local pct = floor(v*100/samples + 0.5)
+ ms = math.max(ms, v)
+ if pct >= prof_min then
+ local file, line = k:match("^(.*):(%d+)$")
+ local fl = files[file]
+ if not fl then fl = {}; files[file] = fl; files[#files+1] = file end
+ line = tonumber(line)
+ fl[line] = prof_raw and v or pct
+ end
+ end
+ sort(files)
+ local fmtv, fmtn = " %3d%% | %s\n", " | %s\n"
+ if prof_raw then
+ local n = math.max(5, math.ceil(math.log10(ms)))
+ fmtv = "%"..n.."d | %s\n"
+ fmtn = (" "):rep(n).." | %s\n"
+ end
+ local ann = prof_ann
+ for _, file in ipairs(files) do
+ local f0 = file:byte()
+ if f0 == 40 or f0 == 91 then
+ out:write(format("\n====== %s ======\n[Cannot annotate non-file]\n", file))
+ break
+ end
+ local fp, err = io.open(file)
+ if not fp then
+ out:write(format("====== ERROR: %s: %s\n", file, err))
+ break
+ end
+ out:write(format("\n====== %s ======\n", file))
+ local fl = files[file]
+ local n, show = 1, false
+ if ann ~= 0 then
+ for i=1,ann do
+ if fl[i] then show = true; out:write("@@ 1 @@\n"); break end
+ end
+ end
+ for line in fp:lines() do
+ if line:byte() == 27 then
+ out:write("[Cannot annotate bytecode file]\n")
+ break
+ end
+ local v = fl[n]
+ if ann ~= 0 then
+ local v2 = fl[n+ann]
+ if show then
+ if v2 then show = n+ann elseif v then show = n
+ elseif show+ann < n then show = false end
+ elseif v2 then
+ show = n+ann
+ out:write(format("@@ %d @@\n", n))
+ end
+ if not show then goto next end
+ end
+ if v then
+ out:write(format(fmtv, v, line))
+ else
+ out:write(format(fmtn, line))
+ end
+ ::next::
+ n = n + 1
+ end
+ fp:close()
+ end
+end
+
+------------------------------------------------------------------------------
+
+-- Finish profiling and dump result.
+local function prof_finish()
+ if prof_ud then
+ profile.stop()
+ local samples = prof_samples
+ if samples == 0 then
+ if prof_raw ~= true then out:write("[No samples collected]\n") end
+ return
+ end
+ if prof_ann then
+ prof_annotate(prof_count1, samples)
+ else
+ prof_top(prof_count1, prof_count2, samples, "")
+ end
+ prof_count1 = nil
+ prof_count2 = nil
+ prof_ud = nil
+ end
+end
+
+-- Start profiling.
+local function prof_start(mode)
+ local interval = ""
+ mode = mode:gsub("i%d*", function(s) interval = s; return "" end)
+ prof_min = 3
+ mode = mode:gsub("m(%d+)", function(s) prof_min = tonumber(s); return "" end)
+ prof_depth = 1
+ mode = mode:gsub("%-?%d+", function(s) prof_depth = tonumber(s); return "" end)
+ local m = {}
+ for c in mode:gmatch(".") do m[c] = c end
+ prof_states = m.z or m.v
+ if prof_states == "z" then zone = require("jit.zone") end
+ local scope = m.l or m.f or m.F or (prof_states and "" or "f")
+ local flags = (m.p or "")
+ prof_raw = m.r
+ if m.s then
+ prof_split = 2
+ if prof_depth == -1 or m["-"] then prof_depth = -2
+ elseif prof_depth == 1 then prof_depth = 2 end
+ elseif mode:find("[fF].*l") then
+ scope = "l"
+ prof_split = 3
+ else
+ prof_split = (scope == "" or mode:find("[zv].*[lfF]")) and 1 or 0
+ end
+ prof_ann = m.A and 0 or (m.a and 3)
+ if prof_ann then
+ scope = "l"
+ prof_fmt = "pl"
+ prof_split = 0
+ prof_depth = 1
+ elseif m.G and scope ~= "" then
+ prof_fmt = flags..scope.."Z;"
+ prof_depth = -100
+ prof_raw = true
+ prof_min = 0
+ elseif scope == "" then
+ prof_fmt = false
+ else
+ local sc = prof_split == 3 and m.f or m.F or scope
+ prof_fmt = flags..sc..(prof_depth >= 0 and "Z < " or "Z > ")
+ end
+ prof_count1 = {}
+ prof_count2 = {}
+ prof_samples = 0
+ profile.start(scope:lower()..interval, prof_cb)
+ prof_ud = newproxy(true)
+ getmetatable(prof_ud).__gc = prof_finish
+end
+
+------------------------------------------------------------------------------
+
+local function start(mode, outfile)
+ if not outfile then outfile = os.getenv("LUAJIT_PROFILEFILE") end
+ if outfile then
+ out = outfile == "-" and stdout or assert(io.open(outfile, "w"))
+ else
+ out = stdout
+ end
+ prof_start(mode or "f")
+end
+
+-- Public module functions.
+return {
+ start = start, -- For -j command line option.
+ stop = prof_finish
+}
+
--- /dev/null
- -- Copyright (C) 2005-2015 Mike Pall. All rights reserved.
+----------------------------------------------------------------------------
+-- LuaJIT profiler zones.
+--
++-- Copyright (C) 2005-2016 Mike Pall. All rights reserved.
+-- Released under the MIT license. See Copyright Notice in luajit.h
+----------------------------------------------------------------------------
+--
+-- This module implements a simple hierarchical zone model.
+--
+-- Example usage:
+--
+-- local zone = require("jit.zone")
+-- zone("AI")
+-- ...
+-- zone("A*")
+-- ...
+-- print(zone:get()) --> "A*"
+-- ...
+-- zone()
+-- ...
+-- print(zone:get()) --> "AI"
+-- ...
+-- zone()
+--
+----------------------------------------------------------------------------
+
+local remove = table.remove
+
+return setmetatable({
+ flush = function(t)
+ for i=#t,1,-1 do t[i] = nil end
+ end,
+ get = function(t)
+ return t[#t]
+ end
+}, {
+ __call = function(t, zone)
+ if zone then
+ t[#t+1] = zone
+ else
+ return (assert(remove(t), "empty zone stack"))
+ end
+ end
+})
+
--- /dev/null
- ** Copyright (C) 2005-2015 Mike Pall. See Copyright Notice in luajit.h
+/*
+** Buffer handling.
++** Copyright (C) 2005-2016 Mike Pall. See Copyright Notice in luajit.h
+*/
+
+#define lj_buf_c
+#define LUA_CORE
+
+#include "lj_obj.h"
+#include "lj_gc.h"
+#include "lj_err.h"
+#include "lj_buf.h"
+#include "lj_str.h"
+#include "lj_tab.h"
+#include "lj_strfmt.h"
+
+/* -- Buffer management --------------------------------------------------- */
+
+static void buf_grow(SBuf *sb, MSize sz)
+{
+ MSize osz = sbufsz(sb), len = sbuflen(sb), nsz = osz;
+ char *b;
+ if (nsz < LJ_MIN_SBUF) nsz = LJ_MIN_SBUF;
+ while (nsz < sz) nsz += nsz;
+ b = (char *)lj_mem_realloc(sbufL(sb), sbufB(sb), osz, nsz);
+ setmref(sb->b, b);
+ setmref(sb->p, b + len);
+ setmref(sb->e, b + nsz);
+}
+
+LJ_NOINLINE char *LJ_FASTCALL lj_buf_need2(SBuf *sb, MSize sz)
+{
+ lua_assert(sz > sbufsz(sb));
+ if (LJ_UNLIKELY(sz > LJ_MAX_BUF))
+ lj_err_mem(sbufL(sb));
+ buf_grow(sb, sz);
+ return sbufB(sb);
+}
+
+LJ_NOINLINE char *LJ_FASTCALL lj_buf_more2(SBuf *sb, MSize sz)
+{
+ MSize len = sbuflen(sb);
+ lua_assert(sz > sbufleft(sb));
+ if (LJ_UNLIKELY(sz > LJ_MAX_BUF || len + sz > LJ_MAX_BUF))
+ lj_err_mem(sbufL(sb));
+ buf_grow(sb, len + sz);
+ return sbufP(sb);
+}
+
+void LJ_FASTCALL lj_buf_shrink(lua_State *L, SBuf *sb)
+{
+ char *b = sbufB(sb);
+ MSize osz = (MSize)(sbufE(sb) - b);
+ if (osz > 2*LJ_MIN_SBUF) {
+ MSize n = (MSize)(sbufP(sb) - b);
+ b = lj_mem_realloc(L, b, osz, (osz >> 1));
+ setmref(sb->b, b);
+ setmref(sb->p, b + n);
+ setmref(sb->e, b + (osz >> 1));
+ }
+}
+
+char * LJ_FASTCALL lj_buf_tmp(lua_State *L, MSize sz)
+{
+ SBuf *sb = &G(L)->tmpbuf;
+ setsbufL(sb, L);
+ return lj_buf_need(sb, sz);
+}
+
+/* -- Low-level buffer put operations ------------------------------------- */
+
+SBuf *lj_buf_putmem(SBuf *sb, const void *q, MSize len)
+{
+ char *p = lj_buf_more(sb, len);
+ p = lj_buf_wmem(p, q, len);
+ setsbufP(sb, p);
+ return sb;
+}
+
+SBuf * LJ_FASTCALL lj_buf_putchar(SBuf *sb, int c)
+{
+ char *p = lj_buf_more(sb, 1);
+ *p++ = (char)c;
+ setsbufP(sb, p);
+ return sb;
+}
+
+SBuf * LJ_FASTCALL lj_buf_putstr(SBuf *sb, GCstr *s)
+{
+ MSize len = s->len;
+ char *p = lj_buf_more(sb, len);
+ p = lj_buf_wmem(p, strdata(s), len);
+ setsbufP(sb, p);
+ return sb;
+}
+
+/* -- High-level buffer put operations ------------------------------------ */
+
+SBuf * LJ_FASTCALL lj_buf_putstr_reverse(SBuf *sb, GCstr *s)
+{
+ MSize len = s->len;
+ char *p = lj_buf_more(sb, len), *e = p+len;
+ const char *q = strdata(s)+len-1;
+ while (p < e)
+ *p++ = *q--;
+ setsbufP(sb, p);
+ return sb;
+}
+
+SBuf * LJ_FASTCALL lj_buf_putstr_lower(SBuf *sb, GCstr *s)
+{
+ MSize len = s->len;
+ char *p = lj_buf_more(sb, len), *e = p+len;
+ const char *q = strdata(s);
+ for (; p < e; p++, q++) {
+ uint32_t c = *(unsigned char *)q;
+#if LJ_TARGET_PPC
+ *p = c + ((c >= 'A' && c <= 'Z') << 5);
+#else
+ if (c >= 'A' && c <= 'Z') c += 0x20;
+ *p = c;
+#endif
+ }
+ setsbufP(sb, p);
+ return sb;
+}
+
+SBuf * LJ_FASTCALL lj_buf_putstr_upper(SBuf *sb, GCstr *s)
+{
+ MSize len = s->len;
+ char *p = lj_buf_more(sb, len), *e = p+len;
+ const char *q = strdata(s);
+ for (; p < e; p++, q++) {
+ uint32_t c = *(unsigned char *)q;
+#if LJ_TARGET_PPC
+ *p = c - ((c >= 'a' && c <= 'z') << 5);
+#else
+ if (c >= 'a' && c <= 'z') c -= 0x20;
+ *p = c;
+#endif
+ }
+ setsbufP(sb, p);
+ return sb;
+}
+
+SBuf *lj_buf_putstr_rep(SBuf *sb, GCstr *s, int32_t rep)
+{
+ MSize len = s->len;
+ if (rep > 0 && len) {
+ uint64_t tlen = (uint64_t)rep * len;
+ char *p;
+ if (LJ_UNLIKELY(tlen > LJ_MAX_STR))
+ lj_err_mem(sbufL(sb));
+ p = lj_buf_more(sb, (MSize)tlen);
+ if (len == 1) { /* Optimize a common case. */
+ uint32_t c = strdata(s)[0];
+ do { *p++ = c; } while (--rep > 0);
+ } else {
+ const char *e = strdata(s) + len;
+ do {
+ const char *q = strdata(s);
+ do { *p++ = *q++; } while (q < e);
+ } while (--rep > 0);
+ }
+ setsbufP(sb, p);
+ }
+ return sb;
+}
+
+SBuf *lj_buf_puttab(SBuf *sb, GCtab *t, GCstr *sep, int32_t i, int32_t e)
+{
+ MSize seplen = sep ? sep->len : 0;
+ if (i <= e) {
+ for (;;) {
+ cTValue *o = lj_tab_getint(t, i);
+ char *p;
+ if (!o) {
+ badtype: /* Error: bad element type. */
+ setsbufP(sb, (void *)(intptr_t)i); /* Store failing index. */
+ return NULL;
+ } else if (tvisstr(o)) {
+ MSize len = strV(o)->len;
+ p = lj_buf_wmem(lj_buf_more(sb, len + seplen), strVdata(o), len);
+ } else if (tvisint(o)) {
+ p = lj_strfmt_wint(lj_buf_more(sb, STRFMT_MAXBUF_INT+seplen), intV(o));
+ } else if (tvisnum(o)) {
+ p = lj_buf_more(lj_strfmt_putfnum(sb, STRFMT_G14, numV(o)), seplen);
+ } else {
+ goto badtype;
+ }
+ if (i++ == e) {
+ setsbufP(sb, p);
+ break;
+ }
+ if (seplen) p = lj_buf_wmem(p, strdata(sep), seplen);
+ setsbufP(sb, p);
+ }
+ }
+ return sb;
+}
+
+/* -- Miscellaneous buffer operations ------------------------------------- */
+
+GCstr * LJ_FASTCALL lj_buf_tostr(SBuf *sb)
+{
+ return lj_str_new(sbufL(sb), sbufB(sb), sbuflen(sb));
+}
+
+/* Concatenate two strings. */
+GCstr *lj_buf_cat2str(lua_State *L, GCstr *s1, GCstr *s2)
+{
+ MSize len1 = s1->len, len2 = s2->len;
+ char *buf = lj_buf_tmp(L, len1 + len2);
+ memcpy(buf, strdata(s1), len1);
+ memcpy(buf+len1, strdata(s2), len2);
+ return lj_str_new(L, buf, len1 + len2);
+}
+
+/* Read ULEB128 from buffer. */
+uint32_t LJ_FASTCALL lj_buf_ruleb128(const char **pp)
+{
+ const uint8_t *p = (const uint8_t *)*pp;
+ uint32_t v = *p++;
+ if (LJ_UNLIKELY(v >= 0x80)) {
+ int sh = 0;
+ v &= 0x7f;
+ do { v |= ((*p & 0x7f) << (sh += 7)); } while (*p++ >= 0x80);
+ }
+ *pp = (const char *)p;
+ return v;
+}
+
--- /dev/null
- ** Copyright (C) 2005-2015 Mike Pall. See Copyright Notice in luajit.h
+/*
+** Buffer handling.
++** Copyright (C) 2005-2016 Mike Pall. See Copyright Notice in luajit.h
+*/
+
+#ifndef _LJ_BUF_H
+#define _LJ_BUF_H
+
+#include "lj_obj.h"
+#include "lj_gc.h"
+#include "lj_str.h"
+
+/* Resizable string buffers. Struct definition in lj_obj.h. */
+#define sbufB(sb) (mref((sb)->b, char))
+#define sbufP(sb) (mref((sb)->p, char))
+#define sbufE(sb) (mref((sb)->e, char))
+#define sbufL(sb) (mref((sb)->L, lua_State))
+#define sbufsz(sb) ((MSize)(sbufE((sb)) - sbufB((sb))))
+#define sbuflen(sb) ((MSize)(sbufP((sb)) - sbufB((sb))))
+#define sbufleft(sb) ((MSize)(sbufE((sb)) - sbufP((sb))))
+#define setsbufP(sb, q) (setmref((sb)->p, (q)))
+#define setsbufL(sb, l) (setmref((sb)->L, (l)))
+
+/* Buffer management */
+LJ_FUNC char *LJ_FASTCALL lj_buf_need2(SBuf *sb, MSize sz);
+LJ_FUNC char *LJ_FASTCALL lj_buf_more2(SBuf *sb, MSize sz);
+LJ_FUNC void LJ_FASTCALL lj_buf_shrink(lua_State *L, SBuf *sb);
+LJ_FUNC char * LJ_FASTCALL lj_buf_tmp(lua_State *L, MSize sz);
+
+static LJ_AINLINE void lj_buf_init(lua_State *L, SBuf *sb)
+{
+ setsbufL(sb, L);
+ setmref(sb->p, NULL); setmref(sb->e, NULL); setmref(sb->b, NULL);
+}
+
+static LJ_AINLINE void lj_buf_reset(SBuf *sb)
+{
+ setmrefr(sb->p, sb->b);
+}
+
+static LJ_AINLINE SBuf *lj_buf_tmp_(lua_State *L)
+{
+ SBuf *sb = &G(L)->tmpbuf;
+ setsbufL(sb, L);
+ lj_buf_reset(sb);
+ return sb;
+}
+
+static LJ_AINLINE void lj_buf_free(global_State *g, SBuf *sb)
+{
+ lj_mem_free(g, sbufB(sb), sbufsz(sb));
+}
+
+static LJ_AINLINE char *lj_buf_need(SBuf *sb, MSize sz)
+{
+ if (LJ_UNLIKELY(sz > sbufsz(sb)))
+ return lj_buf_need2(sb, sz);
+ return sbufB(sb);
+}
+
+static LJ_AINLINE char *lj_buf_more(SBuf *sb, MSize sz)
+{
+ if (LJ_UNLIKELY(sz > sbufleft(sb)))
+ return lj_buf_more2(sb, sz);
+ return sbufP(sb);
+}
+
+/* Low-level buffer put operations */
+LJ_FUNC SBuf *lj_buf_putmem(SBuf *sb, const void *q, MSize len);
+LJ_FUNC SBuf * LJ_FASTCALL lj_buf_putchar(SBuf *sb, int c);
+LJ_FUNC SBuf * LJ_FASTCALL lj_buf_putstr(SBuf *sb, GCstr *s);
+
+static LJ_AINLINE char *lj_buf_wmem(char *p, const void *q, MSize len)
+{
+ return (char *)memcpy(p, q, len) + len;
+}
+
+static LJ_AINLINE void lj_buf_putb(SBuf *sb, int c)
+{
+ char *p = lj_buf_more(sb, 1);
+ *p++ = (char)c;
+ setsbufP(sb, p);
+}
+
+/* High-level buffer put operations */
+LJ_FUNCA SBuf * LJ_FASTCALL lj_buf_putstr_reverse(SBuf *sb, GCstr *s);
+LJ_FUNCA SBuf * LJ_FASTCALL lj_buf_putstr_lower(SBuf *sb, GCstr *s);
+LJ_FUNCA SBuf * LJ_FASTCALL lj_buf_putstr_upper(SBuf *sb, GCstr *s);
+LJ_FUNC SBuf *lj_buf_putstr_rep(SBuf *sb, GCstr *s, int32_t rep);
+LJ_FUNC SBuf *lj_buf_puttab(SBuf *sb, GCtab *t, GCstr *sep,
+ int32_t i, int32_t e);
+
+/* Miscellaneous buffer operations */
+LJ_FUNCA GCstr * LJ_FASTCALL lj_buf_tostr(SBuf *sb);
+LJ_FUNC GCstr *lj_buf_cat2str(lua_State *L, GCstr *s1, GCstr *s2);
+LJ_FUNC uint32_t LJ_FASTCALL lj_buf_ruleb128(const char **pp);
+
+static LJ_AINLINE GCstr *lj_buf_str(lua_State *L, SBuf *sb)
+{
+ return lj_str_new(L, sbufB(sb), sbuflen(sb));
+}
+
+#endif
--- /dev/null
- ** Copyright (C) 2005-2015 Mike Pall. See Copyright Notice in luajit.h
+/*
+** Low-overhead profiling.
++** Copyright (C) 2005-2016 Mike Pall. See Copyright Notice in luajit.h
+*/
+
+#define lj_profile_c
+#define LUA_CORE
+
+#include "lj_obj.h"
+
+#if LJ_HASPROFILE
+
+#include "lj_buf.h"
+#include "lj_frame.h"
+#include "lj_debug.h"
+#include "lj_dispatch.h"
+#if LJ_HASJIT
+#include "lj_jit.h"
+#include "lj_trace.h"
+#endif
+#include "lj_profile.h"
+
+#include "luajit.h"
+
+#if LJ_PROFILE_SIGPROF
+
+#include <sys/time.h>
+#include <signal.h>
+#define profile_lock(ps) UNUSED(ps)
+#define profile_unlock(ps) UNUSED(ps)
+
+#elif LJ_PROFILE_PTHREAD
+
+#include <pthread.h>
+#include <time.h>
+#if LJ_TARGET_PS3
+#include <sys/timer.h>
+#endif
+#define profile_lock(ps) pthread_mutex_lock(&ps->lock)
+#define profile_unlock(ps) pthread_mutex_unlock(&ps->lock)
+
+#elif LJ_PROFILE_WTHREAD
+
+#define WIN32_LEAN_AND_MEAN
+#if LJ_TARGET_XBOX360
+#include <xtl.h>
+#include <xbox.h>
+#else
+#include <windows.h>
+#endif
+typedef unsigned int (WINAPI *WMM_TPFUNC)(unsigned int);
+#define profile_lock(ps) EnterCriticalSection(&ps->lock)
+#define profile_unlock(ps) LeaveCriticalSection(&ps->lock)
+
+#endif
+
+/* Profiler state. */
+typedef struct ProfileState {
+ global_State *g; /* VM state that started the profiler. */
+ luaJIT_profile_callback cb; /* Profiler callback. */
+ void *data; /* Profiler callback data. */
+ SBuf sb; /* String buffer for stack dumps. */
+ int interval; /* Sample interval in milliseconds. */
+ int samples; /* Number of samples for next callback. */
+ int vmstate; /* VM state when profile timer triggered. */
+#if LJ_PROFILE_SIGPROF
+ struct sigaction oldsa; /* Previous SIGPROF state. */
+#elif LJ_PROFILE_PTHREAD
+ pthread_mutex_t lock; /* g->hookmask update lock. */
+ pthread_t thread; /* Timer thread. */
+ int abort; /* Abort timer thread. */
+#elif LJ_PROFILE_WTHREAD
+#if LJ_TARGET_WINDOWS
+ HINSTANCE wmm; /* WinMM library handle. */
+ WMM_TPFUNC wmm_tbp; /* WinMM timeBeginPeriod function. */
+ WMM_TPFUNC wmm_tep; /* WinMM timeEndPeriod function. */
+#endif
+ CRITICAL_SECTION lock; /* g->hookmask update lock. */
+ HANDLE thread; /* Timer thread. */
+ int abort; /* Abort timer thread. */
+#endif
+} ProfileState;
+
+/* Sadly, we have to use a static profiler state.
+**
+** The SIGPROF variant needs a static pointer to the global state, anyway.
+** And it would be hard to extend for multiple threads. You can still use
+** multiple VMs in multiple threads, but only profile one at a time.
+*/
+static ProfileState profile_state;
+
+/* Default sample interval in milliseconds. */
+#define LJ_PROFILE_INTERVAL_DEFAULT 10
+
+/* -- Profiler/hook interaction ------------------------------------------- */
+
+#if !LJ_PROFILE_SIGPROF
+void LJ_FASTCALL lj_profile_hook_enter(global_State *g)
+{
+ ProfileState *ps = &profile_state;
+ if (ps->g) {
+ profile_lock(ps);
+ hook_enter(g);
+ profile_unlock(ps);
+ } else {
+ hook_enter(g);
+ }
+}
+
+void LJ_FASTCALL lj_profile_hook_leave(global_State *g)
+{
+ ProfileState *ps = &profile_state;
+ if (ps->g) {
+ profile_lock(ps);
+ hook_leave(g);
+ profile_unlock(ps);
+ } else {
+ hook_leave(g);
+ }
+}
+#endif
+
+/* -- Profile callbacks --------------------------------------------------- */
+
+/* Callback from profile hook (HOOK_PROFILE already cleared). */
+void LJ_FASTCALL lj_profile_interpreter(lua_State *L)
+{
+ ProfileState *ps = &profile_state;
+ global_State *g = G(L);
+ uint8_t mask;
+ profile_lock(ps);
+ mask = (g->hookmask & ~HOOK_PROFILE);
+ if (!(mask & HOOK_VMEVENT)) {
+ int samples = ps->samples;
+ ps->samples = 0;
+ g->hookmask = HOOK_VMEVENT;
+ lj_dispatch_update(g);
+ profile_unlock(ps);
+ ps->cb(ps->data, L, samples, ps->vmstate); /* Invoke user callback. */
+ profile_lock(ps);
+ mask |= (g->hookmask & HOOK_PROFILE);
+ }
+ g->hookmask = mask;
+ lj_dispatch_update(g);
+ profile_unlock(ps);
+}
+
+/* Trigger profile hook. Asynchronous call from OS-specific profile timer. */
+static void profile_trigger(ProfileState *ps)
+{
+ global_State *g = ps->g;
+ uint8_t mask;
+ profile_lock(ps);
+ ps->samples++; /* Always increment number of samples. */
+ mask = g->hookmask;
+ if (!(mask & (HOOK_PROFILE|HOOK_VMEVENT))) { /* Set profile hook. */
+ int st = g->vmstate;
+ ps->vmstate = st >= 0 ? 'N' :
+ st == ~LJ_VMST_INTERP ? 'I' :
+ st == ~LJ_VMST_C ? 'C' :
+ st == ~LJ_VMST_GC ? 'G' : 'J';
+ g->hookmask = (mask | HOOK_PROFILE);
+ lj_dispatch_update(g);
+ }
+ profile_unlock(ps);
+}
+
+/* -- OS-specific profile timer handling ---------------------------------- */
+
+#if LJ_PROFILE_SIGPROF
+
+/* SIGPROF handler. */
+static void profile_signal(int sig)
+{
+ UNUSED(sig);
+ profile_trigger(&profile_state);
+}
+
+/* Start profiling timer. */
+static void profile_timer_start(ProfileState *ps)
+{
+ int interval = ps->interval;
+ struct itimerval tm;
+ struct sigaction sa;
+ tm.it_value.tv_sec = tm.it_interval.tv_sec = interval / 1000;
+ tm.it_value.tv_usec = tm.it_interval.tv_usec = (interval % 1000) * 1000;
+ setitimer(ITIMER_PROF, &tm, NULL);
+ sa.sa_flags = SA_RESTART;
+ sa.sa_handler = profile_signal;
+ sigemptyset(&sa.sa_mask);
+ sigaction(SIGPROF, &sa, &ps->oldsa);
+}
+
+/* Stop profiling timer. */
+static void profile_timer_stop(ProfileState *ps)
+{
+ struct itimerval tm;
+ tm.it_value.tv_sec = tm.it_interval.tv_sec = 0;
+ tm.it_value.tv_usec = tm.it_interval.tv_usec = 0;
+ setitimer(ITIMER_PROF, &tm, NULL);
+ sigaction(SIGPROF, &ps->oldsa, NULL);
+}
+
+#elif LJ_PROFILE_PTHREAD
+
+/* POSIX timer thread. */
+static void *profile_thread(ProfileState *ps)
+{
+ int interval = ps->interval;
+#if !LJ_TARGET_PS3
+ struct timespec ts;
+ ts.tv_sec = interval / 1000;
+ ts.tv_nsec = (interval % 1000) * 1000000;
+#endif
+ while (1) {
+#if LJ_TARGET_PS3
+ sys_timer_usleep(interval * 1000);
+#else
+ nanosleep(&ts, NULL);
+#endif
+ if (ps->abort) break;
+ profile_trigger(ps);
+ }
+ return NULL;
+}
+
+/* Start profiling timer thread. */
+static void profile_timer_start(ProfileState *ps)
+{
+ pthread_mutex_init(&ps->lock, 0);
+ ps->abort = 0;
+ pthread_create(&ps->thread, NULL, (void *(*)(void *))profile_thread, ps);
+}
+
+/* Stop profiling timer thread. */
+static void profile_timer_stop(ProfileState *ps)
+{
+ ps->abort = 1;
+ pthread_join(ps->thread, NULL);
+ pthread_mutex_destroy(&ps->lock);
+}
+
+#elif LJ_PROFILE_WTHREAD
+
+/* Windows timer thread. */
+static DWORD WINAPI profile_thread(void *psx)
+{
+ ProfileState *ps = (ProfileState *)psx;
+ int interval = ps->interval;
+#if LJ_TARGET_WINDOWS
+ ps->wmm_tbp(interval);
+#endif
+ while (1) {
+ Sleep(interval);
+ if (ps->abort) break;
+ profile_trigger(ps);
+ }
+#if LJ_TARGET_WINDOWS
+ ps->wmm_tep(interval);
+#endif
+ return 0;
+}
+
+/* Start profiling timer thread. */
+static void profile_timer_start(ProfileState *ps)
+{
+#if LJ_TARGET_WINDOWS
+ if (!ps->wmm) { /* Load WinMM library on-demand. */
+ ps->wmm = LoadLibraryExA("winmm.dll", NULL, 0);
+ if (ps->wmm) {
+ ps->wmm_tbp = (WMM_TPFUNC)GetProcAddress(ps->wmm, "timeBeginPeriod");
+ ps->wmm_tep = (WMM_TPFUNC)GetProcAddress(ps->wmm, "timeEndPeriod");
+ if (!ps->wmm_tbp || !ps->wmm_tep) {
+ ps->wmm = NULL;
+ return;
+ }
+ }
+ }
+#endif
+ InitializeCriticalSection(&ps->lock);
+ ps->abort = 0;
+ ps->thread = CreateThread(NULL, 0, profile_thread, ps, 0, NULL);
+}
+
+/* Stop profiling timer thread. */
+static void profile_timer_stop(ProfileState *ps)
+{
+ ps->abort = 1;
+ WaitForSingleObject(ps->thread, INFINITE);
+ DeleteCriticalSection(&ps->lock);
+}
+
+#endif
+
+/* -- Public profiling API ------------------------------------------------ */
+
+/* Start profiling. */
+LUA_API void luaJIT_profile_start(lua_State *L, const char *mode,
+ luaJIT_profile_callback cb, void *data)
+{
+ ProfileState *ps = &profile_state;
+ int interval = LJ_PROFILE_INTERVAL_DEFAULT;
+ while (*mode) {
+ int m = *mode++;
+ switch (m) {
+ case 'i':
+ interval = 0;
+ while (*mode >= '0' && *mode <= '9')
+ interval = interval * 10 + (*mode++ - '0');
+ if (interval <= 0) interval = 1;
+ break;
+#if LJ_HASJIT
+ case 'l': case 'f':
+ L2J(L)->prof_mode = m;
+ lj_trace_flushall(L);
+ break;
+#endif
+ default: /* Ignore unknown mode chars. */
+ break;
+ }
+ }
+ if (ps->g) {
+ luaJIT_profile_stop(L);
+ if (ps->g) return; /* Profiler in use by another VM. */
+ }
+ ps->g = G(L);
+ ps->interval = interval;
+ ps->cb = cb;
+ ps->data = data;
+ ps->samples = 0;
+ lj_buf_init(L, &ps->sb);
+ profile_timer_start(ps);
+}
+
+/* Stop profiling. */
+LUA_API void luaJIT_profile_stop(lua_State *L)
+{
+ ProfileState *ps = &profile_state;
+ global_State *g = ps->g;
+ if (G(L) == g) { /* Only stop profiler if started by this VM. */
+ profile_timer_stop(ps);
+ g->hookmask &= ~HOOK_PROFILE;
+ lj_dispatch_update(g);
+#if LJ_HASJIT
+ G2J(g)->prof_mode = 0;
+ lj_trace_flushall(L);
+#endif
+ lj_buf_free(g, &ps->sb);
+ setmref(ps->sb.b, NULL);
+ setmref(ps->sb.e, NULL);
+ ps->g = NULL;
+ }
+}
+
+/* Return a compact stack dump. */
+LUA_API const char *luaJIT_profile_dumpstack(lua_State *L, const char *fmt,
+ int depth, size_t *len)
+{
+ ProfileState *ps = &profile_state;
+ SBuf *sb = &ps->sb;
+ setsbufL(sb, L);
+ lj_buf_reset(sb);
+ lj_debug_dumpstack(L, sb, fmt, depth);
+ *len = (size_t)sbuflen(sb);
+ return sbufB(sb);
+}
+
+#endif
--- /dev/null
- ** Copyright (C) 2005-2015 Mike Pall. See Copyright Notice in luajit.h
+/*
+** Low-overhead profiling.
++** Copyright (C) 2005-2016 Mike Pall. See Copyright Notice in luajit.h
+*/
+
+#ifndef _LJ_PROFILE_H
+#define _LJ_PROFILE_H
+
+#include "lj_obj.h"
+
+#if LJ_HASPROFILE
+
+LJ_FUNC void LJ_FASTCALL lj_profile_interpreter(lua_State *L);
+#if !LJ_PROFILE_SIGPROF
+LJ_FUNC void LJ_FASTCALL lj_profile_hook_enter(global_State *g);
+LJ_FUNC void LJ_FASTCALL lj_profile_hook_leave(global_State *g);
+#endif
+
+#endif
+
+#endif
/*
** String handling.
- ** Copyright (C) 2005-2015 Mike Pall. See Copyright Notice in luajit.h
+ ** Copyright (C) 2005-2016 Mike Pall. See Copyright Notice in luajit.h
-**
-** Portions taken verbatim or adapted from the Lua interpreter.
-** Copyright (C) 1994-2008 Lua.org, PUC-Rio. See Copyright Notice in lua.h
*/
-#include <stdio.h>
-
#define lj_str_c
#define LUA_CORE
--- /dev/null
- ** Copyright (C) 2005-2015 Mike Pall. See Copyright Notice in luajit.h
+/*
+** String formatting.
++** Copyright (C) 2005-2016 Mike Pall. See Copyright Notice in luajit.h
+*/
+
+#include <stdio.h>
+
+#define lj_strfmt_c
+#define LUA_CORE
+
+#include "lj_obj.h"
+#include "lj_buf.h"
+#include "lj_str.h"
+#include "lj_state.h"
+#include "lj_char.h"
+#include "lj_strfmt.h"
+
+/* -- Format parser ------------------------------------------------------- */
+
+static const uint8_t strfmt_map[('x'-'A')+1] = {
+ STRFMT_A,0,0,0,STRFMT_E,STRFMT_F,STRFMT_G,0,0,0,0,0,0,
+ 0,0,0,0,0,0,0,0,0,0,STRFMT_X,0,0,
+ 0,0,0,0,0,0,
+ STRFMT_A,0,STRFMT_C,STRFMT_D,STRFMT_E,STRFMT_F,STRFMT_G,0,STRFMT_I,0,0,0,0,
+ 0,STRFMT_O,STRFMT_P,STRFMT_Q,0,STRFMT_S,0,STRFMT_U,0,0,STRFMT_X
+};
+
+SFormat LJ_FASTCALL lj_strfmt_parse(FormatState *fs)
+{
+ const uint8_t *p = fs->p, *e = fs->e;
+ fs->str = (const char *)p;
+ for (; p < e; p++) {
+ if (*p == '%') { /* Escape char? */
+ if (p[1] == '%') { /* '%%'? */
+ fs->p = ++p+1;
+ goto retlit;
+ } else {
+ SFormat sf = 0;
+ uint32_t c;
+ if (p != (const uint8_t *)fs->str)
+ break;
+ for (p++; (uint32_t)*p - ' ' <= (uint32_t)('0' - ' '); p++) {
+ /* Parse flags. */
+ if (*p == '-') sf |= STRFMT_F_LEFT;
+ else if (*p == '+') sf |= STRFMT_F_PLUS;
+ else if (*p == '0') sf |= STRFMT_F_ZERO;
+ else if (*p == ' ') sf |= STRFMT_F_SPACE;
+ else if (*p == '#') sf |= STRFMT_F_ALT;
+ else break;
+ }
+ if ((uint32_t)*p - '0' < 10) { /* Parse width. */
+ uint32_t width = (uint32_t)*p++ - '0';
+ if ((uint32_t)*p - '0' < 10)
+ width = (uint32_t)*p++ - '0' + width*10;
+ sf |= (width << STRFMT_SH_WIDTH);
+ }
+ if (*p == '.') { /* Parse precision. */
+ uint32_t prec = 0;
+ p++;
+ if ((uint32_t)*p - '0' < 10) {
+ prec = (uint32_t)*p++ - '0';
+ if ((uint32_t)*p - '0' < 10)
+ prec = (uint32_t)*p++ - '0' + prec*10;
+ }
+ sf |= ((prec+1) << STRFMT_SH_PREC);
+ }
+ /* Parse conversion. */
+ c = (uint32_t)*p - 'A';
+ if (LJ_LIKELY(c <= (uint32_t)('x' - 'A'))) {
+ uint32_t sx = strfmt_map[c];
+ if (sx) {
+ fs->p = p+1;
+ return (sf | sx | ((c & 0x20) ? 0 : STRFMT_F_UPPER));
+ }
+ }
+ /* Return error location. */
+ if (*p >= 32) p++;
+ fs->len = (MSize)(p - (const uint8_t *)fs->str);
+ fs->p = fs->e;
+ return STRFMT_ERR;
+ }
+ }
+ }
+ fs->p = p;
+retlit:
+ fs->len = (MSize)(p - (const uint8_t *)fs->str);
+ return fs->len ? STRFMT_LIT : STRFMT_EOF;
+}
+
+/* -- Raw conversions ----------------------------------------------------- */
+
+#define WINT_R(x, sh, sc) \
+ { uint32_t d = (x*(((1<<sh)+sc-1)/sc))>>sh; x -= d*sc; *p++ = (char)('0'+d); }
+
+/* Write integer to buffer. */
+char * LJ_FASTCALL lj_strfmt_wint(char *p, int32_t k)
+{
+ uint32_t u = (uint32_t)k;
+ if (k < 0) { u = (uint32_t)-k; *p++ = '-'; }
+ if (u < 10000) {
+ if (u < 10) goto dig1; if (u < 100) goto dig2; if (u < 1000) goto dig3;
+ } else {
+ uint32_t v = u / 10000; u -= v * 10000;
+ if (v < 10000) {
+ if (v < 10) goto dig5; if (v < 100) goto dig6; if (v < 1000) goto dig7;
+ } else {
+ uint32_t w = v / 10000; v -= w * 10000;
+ if (w >= 10) WINT_R(w, 10, 10)
+ *p++ = (char)('0'+w);
+ }
+ WINT_R(v, 23, 1000)
+ dig7: WINT_R(v, 12, 100)
+ dig6: WINT_R(v, 10, 10)
+ dig5: *p++ = (char)('0'+v);
+ }
+ WINT_R(u, 23, 1000)
+ dig3: WINT_R(u, 12, 100)
+ dig2: WINT_R(u, 10, 10)
+ dig1: *p++ = (char)('0'+u);
+ return p;
+}
+#undef WINT_R
+
+/* Write pointer to buffer. */
+char * LJ_FASTCALL lj_strfmt_wptr(char *p, const void *v)
+{
+ ptrdiff_t x = (ptrdiff_t)v;
+ MSize i, n = STRFMT_MAXBUF_PTR;
+ if (x == 0) {
+ *p++ = 'N'; *p++ = 'U'; *p++ = 'L'; *p++ = 'L';
+ return p;
+ }
+#if LJ_64
+ /* Shorten output for 64 bit pointers. */
+ n = 2+2*4+((x >> 32) ? 2+2*(lj_fls((uint32_t)(x >> 32))>>3) : 0);
+#endif
+ p[0] = '0';
+ p[1] = 'x';
+ for (i = n-1; i >= 2; i--, x >>= 4)
+ p[i] = "0123456789abcdef"[(x & 15)];
+ return p+n;
+}
+
+/* Write ULEB128 to buffer. */
+char * LJ_FASTCALL lj_strfmt_wuleb128(char *p, uint32_t v)
+{
+ for (; v >= 0x80; v >>= 7)
+ *p++ = (char)((v & 0x7f) | 0x80);
+ *p++ = (char)v;
+ return p;
+}
+
+/* Return string or write number to tmp buffer and return pointer to start. */
+const char *lj_strfmt_wstrnum(lua_State *L, cTValue *o, MSize *lenp)
+{
+ SBuf *sb;
+ if (tvisstr(o)) {
+ *lenp = strV(o)->len;
+ return strVdata(o);
+ } else if (tvisint(o)) {
+ sb = lj_strfmt_putint(lj_buf_tmp_(L), intV(o));
+ } else if (tvisnum(o)) {
+ sb = lj_strfmt_putfnum(lj_buf_tmp_(L), STRFMT_G14, o->n);
+ } else {
+ return NULL;
+ }
+ *lenp = sbuflen(sb);
+ return sbufB(sb);
+}
+
+/* -- Unformatted conversions to buffer ----------------------------------- */
+
+/* Add integer to buffer. */
+SBuf * LJ_FASTCALL lj_strfmt_putint(SBuf *sb, int32_t k)
+{
+ setsbufP(sb, lj_strfmt_wint(lj_buf_more(sb, STRFMT_MAXBUF_INT), k));
+ return sb;
+}
+
+#if LJ_HASJIT
+/* Add number to buffer. */
+SBuf * LJ_FASTCALL lj_strfmt_putnum(SBuf *sb, cTValue *o)
+{
+ return lj_strfmt_putfnum(sb, STRFMT_G14, o->n);
+}
+#endif
+
+SBuf * LJ_FASTCALL lj_strfmt_putptr(SBuf *sb, const void *v)
+{
+ setsbufP(sb, lj_strfmt_wptr(lj_buf_more(sb, STRFMT_MAXBUF_PTR), v));
+ return sb;
+}
+
+/* Add quoted string to buffer. */
+SBuf * LJ_FASTCALL lj_strfmt_putquoted(SBuf *sb, GCstr *str)
+{
+ const char *s = strdata(str);
+ MSize len = str->len;
+ lj_buf_putb(sb, '"');
+ while (len--) {
+ uint32_t c = (uint32_t)(uint8_t)*s++;
+ char *p = lj_buf_more(sb, 4);
+ if (c == '"' || c == '\\' || c == '\n') {
+ *p++ = '\\';
+ } else if (lj_char_iscntrl(c)) { /* This can only be 0-31 or 127. */
+ uint32_t d;
+ *p++ = '\\';
+ if (c >= 100 || lj_char_isdigit((uint8_t)*s)) {
+ *p++ = (char)('0'+(c >= 100)); if (c >= 100) c -= 100;
+ goto tens;
+ } else if (c >= 10) {
+ tens:
+ d = (c * 205) >> 11; c -= d * 10; *p++ = (char)('0'+d);
+ }
+ c += '0';
+ }
+ *p++ = (char)c;
+ setsbufP(sb, p);
+ }
+ lj_buf_putb(sb, '"');
+ return sb;
+}
+
+/* -- Formatted conversions to buffer ------------------------------------- */
+
+/* Add formatted char to buffer. */
+SBuf *lj_strfmt_putfchar(SBuf *sb, SFormat sf, int32_t c)
+{
+ MSize width = STRFMT_WIDTH(sf);
+ char *p = lj_buf_more(sb, width > 1 ? width : 1);
+ if ((sf & STRFMT_F_LEFT)) *p++ = (char)c;
+ while (width-- > 1) *p++ = ' ';
+ if (!(sf & STRFMT_F_LEFT)) *p++ = (char)c;
+ setsbufP(sb, p);
+ return sb;
+}
+
+/* Add formatted string to buffer. */
+SBuf *lj_strfmt_putfstr(SBuf *sb, SFormat sf, GCstr *str)
+{
+ MSize len = str->len <= STRFMT_PREC(sf) ? str->len : STRFMT_PREC(sf);
+ MSize width = STRFMT_WIDTH(sf);
+ char *p = lj_buf_more(sb, width > len ? width : len);
+ if ((sf & STRFMT_F_LEFT)) p = lj_buf_wmem(p, strdata(str), len);
+ while (width-- > len) *p++ = ' ';
+ if (!(sf & STRFMT_F_LEFT)) p = lj_buf_wmem(p, strdata(str), len);
+ setsbufP(sb, p);
+ return sb;
+}
+
+/* Add formatted signed/unsigned integer to buffer. */
+SBuf *lj_strfmt_putfxint(SBuf *sb, SFormat sf, uint64_t k)
+{
+ char buf[STRFMT_MAXBUF_XINT], *q = buf + sizeof(buf), *p;
+#ifdef LUA_USE_ASSERT
+ char *ps;
+#endif
+ MSize prefix = 0, len, prec, pprec, width, need;
+
+ /* Figure out signed prefixes. */
+ if (STRFMT_TYPE(sf) == STRFMT_INT) {
+ if ((int64_t)k < 0) {
+ k = (uint64_t)-(int64_t)k;
+ prefix = 256 + '-';
+ } else if ((sf & STRFMT_F_PLUS)) {
+ prefix = 256 + '+';
+ } else if ((sf & STRFMT_F_SPACE)) {
+ prefix = 256 + ' ';
+ }
+ }
+
+ /* Convert number and store to fixed-size buffer in reverse order. */
+ prec = STRFMT_PREC(sf);
+ if ((int32_t)prec >= 0) sf &= ~STRFMT_F_ZERO;
+ if (k == 0) { /* Special-case zero argument. */
+ if (prec != 0 ||
+ (sf & (STRFMT_T_OCT|STRFMT_F_ALT)) == (STRFMT_T_OCT|STRFMT_F_ALT))
+ *--q = '0';
+ } else if (!(sf & (STRFMT_T_HEX|STRFMT_T_OCT))) { /* Decimal. */
+ uint32_t k2;
+ while ((k >> 32)) { *--q = (char)('0' + k % 10); k /= 10; }
+ k2 = (uint32_t)k;
+ do { *--q = (char)('0' + k2 % 10); k2 /= 10; } while (k2);
+ } else if ((sf & STRFMT_T_HEX)) { /* Hex. */
+ const char *hexdig = (sf & STRFMT_F_UPPER) ? "0123456789ABCDEF" :
+ "0123456789abcdef";
+ do { *--q = hexdig[(k & 15)]; k >>= 4; } while (k);
+ if ((sf & STRFMT_F_ALT)) prefix = 512 + ((sf & STRFMT_F_UPPER) ? 'X' : 'x');
+ } else { /* Octal. */
+ do { *--q = (char)('0' + (uint32_t)(k & 7)); k >>= 3; } while (k);
+ if ((sf & STRFMT_F_ALT)) *--q = '0';
+ }
+
+ /* Calculate sizes. */
+ len = (MSize)(buf + sizeof(buf) - q);
+ if ((int32_t)len >= (int32_t)prec) prec = len;
+ width = STRFMT_WIDTH(sf);
+ pprec = prec + (prefix >> 8);
+ need = width > pprec ? width : pprec;
+ p = lj_buf_more(sb, need);
+#ifdef LUA_USE_ASSERT
+ ps = p;
+#endif
+
+ /* Format number with leading/trailing whitespace and zeros. */
+ if ((sf & (STRFMT_F_LEFT|STRFMT_F_ZERO)) == 0)
+ while (width-- > pprec) *p++ = ' ';
+ if (prefix) {
+ if ((char)prefix >= 'X') *p++ = '0';
+ *p++ = (char)prefix;
+ }
+ if ((sf & (STRFMT_F_LEFT|STRFMT_F_ZERO)) == STRFMT_F_ZERO)
+ while (width-- > pprec) *p++ = '0';
+ while (prec-- > len) *p++ = '0';
+ while (q < buf + sizeof(buf)) *p++ = *q++; /* Add number itself. */
+ if ((sf & STRFMT_F_LEFT))
+ while (width-- > pprec) *p++ = ' ';
+
+ lua_assert(need == (MSize)(p - ps));
+ setsbufP(sb, p);
+ return sb;
+}
+
+/* Add number formatted as signed integer to buffer. */
+SBuf *lj_strfmt_putfnum_int(SBuf *sb, SFormat sf, lua_Number n)
+{
+ int64_t k = (int64_t)n;
+ if (checki32(k) && sf == STRFMT_INT)
+ return lj_strfmt_putint(sb, (int32_t)k); /* Shortcut for plain %d. */
+ else
+ return lj_strfmt_putfxint(sb, sf, (uint64_t)k);
+}
+
+/* Add number formatted as unsigned integer to buffer. */
+SBuf *lj_strfmt_putfnum_uint(SBuf *sb, SFormat sf, lua_Number n)
+{
+ int64_t k;
+ if (n >= 9223372036854775808.0)
+ k = (int64_t)(n - 18446744073709551616.0);
+ else
+ k = (int64_t)n;
+ return lj_strfmt_putfxint(sb, sf, (uint64_t)k);
+}
+
+/* -- Conversions to strings ---------------------------------------------- */
+
+/* Convert integer to string. */
+GCstr * LJ_FASTCALL lj_strfmt_int(lua_State *L, int32_t k)
+{
+ char buf[STRFMT_MAXBUF_INT];
+ MSize len = (MSize)(lj_strfmt_wint(buf, k) - buf);
+ return lj_str_new(L, buf, len);
+}
+
+/* Convert integer or number to string. */
+GCstr * LJ_FASTCALL lj_strfmt_number(lua_State *L, cTValue *o)
+{
+ return tvisint(o) ? lj_strfmt_int(L, intV(o)) : lj_strfmt_num(L, o);
+}
+
+#if LJ_HASJIT
+/* Convert char value to string. */
+GCstr * LJ_FASTCALL lj_strfmt_char(lua_State *L, int c)
+{
+ char buf[1];
+ buf[0] = c;
+ return lj_str_new(L, buf, 1);
+}
+#endif
+
+/* Raw conversion of object to string. */
+GCstr * LJ_FASTCALL lj_strfmt_obj(lua_State *L, cTValue *o)
+{
+ if (tvisstr(o)) {
+ return strV(o);
+ } else if (tvisnumber(o)) {
+ return lj_strfmt_number(L, o);
+ } else if (tvisnil(o)) {
+ return lj_str_newlit(L, "nil");
+ } else if (tvisfalse(o)) {
+ return lj_str_newlit(L, "false");
+ } else if (tvistrue(o)) {
+ return lj_str_newlit(L, "true");
+ } else {
+ char buf[8+2+2+16], *p = buf;
+ p = lj_buf_wmem(p, lj_typename(o), (MSize)strlen(lj_typename(o)));
+ *p++ = ':'; *p++ = ' ';
+ if (tvisfunc(o) && isffunc(funcV(o))) {
+ p = lj_buf_wmem(p, "builtin#", 8);
+ p = lj_strfmt_wint(p, funcV(o)->c.ffid);
+ } else {
+ p = lj_strfmt_wptr(p, lj_obj_ptr(o));
+ }
+ return lj_str_new(L, buf, (size_t)(p - buf));
+ }
+}
+
+/* -- Internal string formatting ------------------------------------------ */
+
+/*
+** These functions are only used for lua_pushfstring(), lua_pushvfstring()
+** and for internal string formatting (e.g. error messages). Caveat: unlike
+** string.format(), only a limited subset of formats and flags are supported!
+**
+** LuaJIT has support for a couple more formats than Lua 5.1/5.2:
+** - %d %u %o %x with full formatting, 32 bit integers only.
+** - %f and other FP formats are really %.14g.
+** - %s %c %p without formatting.
+*/
+
+/* Push formatted message as a string object to Lua stack. va_list variant. */
+const char *lj_strfmt_pushvf(lua_State *L, const char *fmt, va_list argp)
+{
+ SBuf *sb = lj_buf_tmp_(L);
+ FormatState fs;
+ SFormat sf;
+ GCstr *str;
+ lj_strfmt_init(&fs, fmt, (MSize)strlen(fmt));
+ while ((sf = lj_strfmt_parse(&fs)) != STRFMT_EOF) {
+ switch (STRFMT_TYPE(sf)) {
+ case STRFMT_LIT:
+ lj_buf_putmem(sb, fs.str, fs.len);
+ break;
+ case STRFMT_INT:
+ lj_strfmt_putfxint(sb, sf, va_arg(argp, int32_t));
+ break;
+ case STRFMT_UINT:
+ lj_strfmt_putfxint(sb, sf, va_arg(argp, uint32_t));
+ break;
+ case STRFMT_NUM:
+ lj_strfmt_putfnum(sb, STRFMT_G14, va_arg(argp, lua_Number));
+ break;
+ case STRFMT_STR: {
+ const char *s = va_arg(argp, char *);
+ if (s == NULL) s = "(null)";
+ lj_buf_putmem(sb, s, (MSize)strlen(s));
+ break;
+ }
+ case STRFMT_CHAR:
+ lj_buf_putb(sb, va_arg(argp, int));
+ break;
+ case STRFMT_PTR:
+ lj_strfmt_putptr(sb, va_arg(argp, void *));
+ break;
+ case STRFMT_ERR:
+ default:
+ lj_buf_putb(sb, '?');
+ lua_assert(0);
+ break;
+ }
+ }
+ str = lj_buf_str(L, sb);
+ setstrV(L, L->top, str);
+ incr_top(L);
+ return strdata(str);
+}
+
+/* Push formatted message as a string object to Lua stack. Vararg variant. */
+const char *lj_strfmt_pushf(lua_State *L, const char *fmt, ...)
+{
+ const char *msg;
+ va_list argp;
+ va_start(argp, fmt);
+ msg = lj_strfmt_pushvf(L, fmt, argp);
+ va_end(argp);
+ return msg;
+}
+
--- /dev/null
- ** Copyright (C) 2005-2015 Mike Pall. See Copyright Notice in luajit.h
+/*
+** String formatting.
++** Copyright (C) 2005-2016 Mike Pall. See Copyright Notice in luajit.h
+*/
+
+#ifndef _LJ_STRFMT_H
+#define _LJ_STRFMT_H
+
+#include "lj_obj.h"
+
+typedef uint32_t SFormat; /* Format indicator. */
+
+/* Format parser state. */
+typedef struct FormatState {
+ const uint8_t *p; /* Current format string pointer. */
+ const uint8_t *e; /* End of format string. */
+ const char *str; /* Returned literal string. */
+ MSize len; /* Size of literal string. */
+} FormatState;
+
+/* Format types (max. 16). */
+typedef enum FormatType {
+ STRFMT_EOF, STRFMT_ERR, STRFMT_LIT,
+ STRFMT_INT, STRFMT_UINT, STRFMT_NUM, STRFMT_STR, STRFMT_CHAR, STRFMT_PTR
+} FormatType;
+
+/* Format subtypes (bits are reused). */
+#define STRFMT_T_HEX 0x0010 /* STRFMT_UINT */
+#define STRFMT_T_OCT 0x0020 /* STRFMT_UINT */
+#define STRFMT_T_FP_A 0x0000 /* STRFMT_NUM */
+#define STRFMT_T_FP_E 0x0010 /* STRFMT_NUM */
+#define STRFMT_T_FP_F 0x0020 /* STRFMT_NUM */
+#define STRFMT_T_FP_G 0x0030 /* STRFMT_NUM */
+#define STRFMT_T_QUOTED 0x0010 /* STRFMT_STR */
+
+/* Format flags. */
+#define STRFMT_F_LEFT 0x0100
+#define STRFMT_F_PLUS 0x0200
+#define STRFMT_F_ZERO 0x0400
+#define STRFMT_F_SPACE 0x0800
+#define STRFMT_F_ALT 0x1000
+#define STRFMT_F_UPPER 0x2000
+
+/* Format indicator fields. */
+#define STRFMT_SH_WIDTH 16
+#define STRFMT_SH_PREC 24
+
+#define STRFMT_TYPE(sf) ((FormatType)((sf) & 15))
+#define STRFMT_WIDTH(sf) (((sf) >> STRFMT_SH_WIDTH) & 255u)
+#define STRFMT_PREC(sf) ((((sf) >> STRFMT_SH_PREC) & 255u) - 1u)
+#define STRFMT_FP(sf) (((sf) >> 4) & 3)
+
+/* Formats for conversion characters. */
+#define STRFMT_A (STRFMT_NUM|STRFMT_T_FP_A)
+#define STRFMT_C (STRFMT_CHAR)
+#define STRFMT_D (STRFMT_INT)
+#define STRFMT_E (STRFMT_NUM|STRFMT_T_FP_E)
+#define STRFMT_F (STRFMT_NUM|STRFMT_T_FP_F)
+#define STRFMT_G (STRFMT_NUM|STRFMT_T_FP_G)
+#define STRFMT_I STRFMT_D
+#define STRFMT_O (STRFMT_UINT|STRFMT_T_OCT)
+#define STRFMT_P (STRFMT_PTR)
+#define STRFMT_Q (STRFMT_STR|STRFMT_T_QUOTED)
+#define STRFMT_S (STRFMT_STR)
+#define STRFMT_U (STRFMT_UINT)
+#define STRFMT_X (STRFMT_UINT|STRFMT_T_HEX)
+#define STRFMT_G14 (STRFMT_G | ((14+1) << STRFMT_SH_PREC))
+
+/* Maximum buffer sizes for conversions. */
+#define STRFMT_MAXBUF_XINT (1+22) /* '0' prefix + uint64_t in octal. */
+#define STRFMT_MAXBUF_INT (1+10) /* Sign + int32_t in decimal. */
+#define STRFMT_MAXBUF_NUM 32 /* Must correspond with STRFMT_G14. */
+#define STRFMT_MAXBUF_PTR (2+2*sizeof(ptrdiff_t)) /* "0x" + hex ptr. */
+
+/* Format parser. */
+LJ_FUNC SFormat LJ_FASTCALL lj_strfmt_parse(FormatState *fs);
+
+static LJ_AINLINE void lj_strfmt_init(FormatState *fs, const char *p, MSize len)
+{
+ fs->p = (const uint8_t *)p;
+ fs->e = (const uint8_t *)p + len;
+ lua_assert(*fs->e == 0); /* Must be NUL-terminated (may have NULs inside). */
+}
+
+/* Raw conversions. */
+LJ_FUNC char * LJ_FASTCALL lj_strfmt_wint(char *p, int32_t k);
+LJ_FUNC char * LJ_FASTCALL lj_strfmt_wptr(char *p, const void *v);
+LJ_FUNC char * LJ_FASTCALL lj_strfmt_wuleb128(char *p, uint32_t v);
+LJ_FUNC const char *lj_strfmt_wstrnum(lua_State *L, cTValue *o, MSize *lenp);
+
+/* Unformatted conversions to buffer. */
+LJ_FUNC SBuf * LJ_FASTCALL lj_strfmt_putint(SBuf *sb, int32_t k);
+#if LJ_HASJIT
+LJ_FUNC SBuf * LJ_FASTCALL lj_strfmt_putnum(SBuf *sb, cTValue *o);
+#endif
+LJ_FUNC SBuf * LJ_FASTCALL lj_strfmt_putptr(SBuf *sb, const void *v);
+LJ_FUNC SBuf * LJ_FASTCALL lj_strfmt_putquoted(SBuf *sb, GCstr *str);
+
+/* Formatted conversions to buffer. */
+LJ_FUNC SBuf *lj_strfmt_putfxint(SBuf *sb, SFormat sf, uint64_t k);
+LJ_FUNC SBuf *lj_strfmt_putfnum_int(SBuf *sb, SFormat sf, lua_Number n);
+LJ_FUNC SBuf *lj_strfmt_putfnum_uint(SBuf *sb, SFormat sf, lua_Number n);
+LJ_FUNC SBuf *lj_strfmt_putfnum(SBuf *sb, SFormat, lua_Number n);
+LJ_FUNC SBuf *lj_strfmt_putfchar(SBuf *sb, SFormat, int32_t c);
+LJ_FUNC SBuf *lj_strfmt_putfstr(SBuf *sb, SFormat, GCstr *str);
+
+/* Conversions to strings. */
+LJ_FUNC GCstr * LJ_FASTCALL lj_strfmt_int(lua_State *L, int32_t k);
+LJ_FUNCA GCstr * LJ_FASTCALL lj_strfmt_num(lua_State *L, cTValue *o);
+LJ_FUNCA GCstr * LJ_FASTCALL lj_strfmt_number(lua_State *L, cTValue *o);
+#if LJ_HASJIT
+LJ_FUNC GCstr * LJ_FASTCALL lj_strfmt_char(lua_State *L, int c);
+#endif
+LJ_FUNC GCstr * LJ_FASTCALL lj_strfmt_obj(lua_State *L, cTValue *o);
+
+/* Internal string formatting. */
+LJ_FUNC const char *lj_strfmt_pushvf(lua_State *L, const char *fmt,
+ va_list argp);
+LJ_FUNC const char *lj_strfmt_pushf(lua_State *L, const char *fmt, ...)
+#ifdef __GNUC__
+ __attribute__ ((format (printf, 2, 3)))
+#endif
+ ;
+
+#endif
--- /dev/null
- ** Copyright (C) 2005-2015 Mike Pall. See Copyright Notice in luajit.h
+/*
+** String formatting for floating-point numbers.
++** Copyright (C) 2005-2016 Mike Pall. See Copyright Notice in luajit.h
+** Contributed by Peter Cawley.
+*/
+
+#include <stdio.h>
+
+#define lj_strfmt_num_c
+#define LUA_CORE
+
+#include "lj_obj.h"
+#include "lj_buf.h"
+#include "lj_str.h"
+#include "lj_strfmt.h"
+
+/* -- Precomputed tables -------------------------------------------------- */
+
+/* Rescale factors to push the exponent of a number towards zero. */
+#define RESCALE_EXPONENTS(P, N) \
+ P(308), P(289), P(270), P(250), P(231), P(212), P(193), P(173), P(154), \
+ P(135), P(115), P(96), P(77), P(58), P(38), P(0), P(0), P(0), N(39), N(58), \
+ N(77), N(96), N(116), N(135), N(154), N(174), N(193), N(212), N(231), \
+ N(251), N(270), N(289)
+
+#define ONE_E_P(X) 1e+0 ## X
+#define ONE_E_N(X) 1e-0 ## X
+static const int16_t rescale_e[] = { RESCALE_EXPONENTS(-, +) };
+static const double rescale_n[] = { RESCALE_EXPONENTS(ONE_E_P, ONE_E_N) };
+#undef ONE_E_N
+#undef ONE_E_P
+
+/*
+** For p in range -70 through 57, this table encodes pairs (m, e) such that
+** 4*2^p <= (uint8_t)m*10^e, and is the smallest value for which this holds.
+*/
+static const int8_t four_ulp_m_e[] = {
+ 34, -21, 68, -21, 14, -20, 28, -20, 55, -20, 2, -19, 3, -19, 5, -19, 9, -19,
+ -82, -18, 35, -18, 7, -17, -117, -17, 28, -17, 56, -17, 112, -16, -33, -16,
+ 45, -16, 89, -16, -78, -15, 36, -15, 72, -15, -113, -14, 29, -14, 57, -14,
+ 114, -13, -28, -13, 46, -13, 91, -12, -74, -12, 37, -12, 73, -12, 15, -11, 3,
+ -11, 59, -11, 2, -10, 3, -10, 5, -10, 1, -9, -69, -9, 38, -9, 75, -9, 15, -7,
+ 3, -7, 6, -7, 12, -6, -17, -7, 48, -7, 96, -7, -65, -6, 39, -6, 77, -6, -103,
+ -5, 31, -5, 62, -5, 123, -4, -11, -4, 49, -4, 98, -4, -60, -3, 4, -2, 79, -3,
+ 16, -2, 32, -2, 63, -2, 2, -1, 25, 0, 5, 1, 1, 2, 2, 2, 4, 2, 8, 2, 16, 2,
+ 32, 2, 64, 2, -128, 2, 26, 2, 52, 2, 103, 3, -51, 3, 41, 4, 82, 4, -92, 4,
+ 33, 4, 66, 4, -124, 5, 27, 5, 53, 5, 105, 6, 21, 6, 42, 6, 84, 6, 17, 7, 34,
+ 7, 68, 7, 2, 8, 3, 8, 6, 8, 108, 9, -41, 9, 43, 10, 86, 9, -84, 10, 35, 10,
+ 69, 10, -118, 11, 28, 11, 55, 12, 11, 13, 22, 13, 44, 13, 88, 13, -80, 13,
+ 36, 13, 71, 13, -115, 14, 29, 14, 57, 14, 113, 15, -30, 15, 46, 15, 91, 15,
+ 19, 16, 37, 16, 73, 16, 2, 17, 3, 17, 6, 17
+};
+
+/* min(2^32-1, 10^e-1) for e in range 0 through 10 */
+static uint32_t ndigits_dec_threshold[] = {
+ 0, 9U, 99U, 999U, 9999U, 99999U, 999999U,
+ 9999999U, 99999999U, 999999999U, 0xffffffffU
+};
+
+/* -- Helper functions ---------------------------------------------------- */
+
+/* Compute the number of digits in the decimal representation of x. */
+static MSize ndigits_dec(uint32_t x)
+{
+ MSize t = ((lj_fls(x | 1) * 77) >> 8) + 1; /* 2^8/77 is roughly log2(10) */
+ return t + (x > ndigits_dec_threshold[t]);
+}
+
+#define WINT_R(x, sh, sc) \
+ { uint32_t d = (x*(((1<<sh)+sc-1)/sc))>>sh; x -= d*sc; *p++ = (char)('0'+d); }
+
+/* Write 9-digit unsigned integer to buffer. */
+static char *lj_strfmt_wuint9(char *p, uint32_t u)
+{
+ uint32_t v = u / 10000, w;
+ u -= v * 10000;
+ w = v / 10000;
+ v -= w * 10000;
+ *p++ = (char)('0'+w);
+ WINT_R(v, 23, 1000)
+ WINT_R(v, 12, 100)
+ WINT_R(v, 10, 10)
+ *p++ = (char)('0'+v);
+ WINT_R(u, 23, 1000)
+ WINT_R(u, 12, 100)
+ WINT_R(u, 10, 10)
+ *p++ = (char)('0'+u);
+ return p;
+}
+#undef WINT_R
+
+/* -- Extended precision arithmetic --------------------------------------- */
+
+/*
+** The "nd" format is a fixed-precision decimal representation for numbers. It
+** consists of up to 64 uint32_t values, with each uint32_t storing a value
+** in the range [0, 1e9). A number in "nd" format consists of three variables:
+**
+** uint32_t nd[64];
+** uint32_t ndlo;
+** uint32_t ndhi;
+**
+** The integral part of the number is stored in nd[0 ... ndhi], the value of
+** which is sum{i in [0, ndhi] | nd[i] * 10^(9*i)}. If the fractional part of
+** the number is zero, ndlo is zero. Otherwise, the fractional part is stored
+** in nd[ndlo ... 63], the value of which is taken to be
+** sum{i in [ndlo, 63] | nd[i] * 10^(9*(i-64))}.
+**
+** If the array part had 128 elements rather than 64, then every double would
+** have an exact representation in "nd" format. With 64 elements, all integral
+** doubles have an exact representation, and all non-integral doubles have
+** enough digits to make both %.99e and %.99f do the right thing.
+*/
+
+#if LJ_64
+#define ND_MUL2K_MAX_SHIFT 29
+#define ND_MUL2K_DIV1E9(val) ((uint32_t)((val) / 1000000000))
+#else
+#define ND_MUL2K_MAX_SHIFT 11
+#define ND_MUL2K_DIV1E9(val) ((uint32_t)((val) >> 9) / 1953125)
+#endif
+
+/* Multiply nd by 2^k and add carry_in (ndlo is assumed to be zero). */
+static uint32_t nd_mul2k(uint32_t* nd, uint32_t ndhi, uint32_t k,
+ uint32_t carry_in, SFormat sf)
+{
+ uint32_t i, ndlo = 0, start = 1;
+ /* Performance hacks. */
+ if (k > ND_MUL2K_MAX_SHIFT*2 && STRFMT_FP(sf) != STRFMT_FP(STRFMT_T_FP_F)) {
+ start = ndhi - (STRFMT_PREC(sf) + 17) / 8;
+ }
+ /* Real logic. */
+ while (k >= ND_MUL2K_MAX_SHIFT) {
+ for (i = ndlo; i <= ndhi; i++) {
+ uint64_t val = ((uint64_t)nd[i] << ND_MUL2K_MAX_SHIFT) | carry_in;
+ carry_in = ND_MUL2K_DIV1E9(val);
+ nd[i] = (uint32_t)val - carry_in * 1000000000;
+ }
+ if (carry_in) {
+ nd[++ndhi] = carry_in; carry_in = 0;
+ if(start++ == ndlo) ++ndlo;
+ }
+ k -= ND_MUL2K_MAX_SHIFT;
+ }
+ if (k) {
+ for (i = ndlo; i <= ndhi; i++) {
+ uint64_t val = ((uint64_t)nd[i] << k) | carry_in;
+ carry_in = ND_MUL2K_DIV1E9(val);
+ nd[i] = (uint32_t)val - carry_in * 1000000000;
+ }
+ if (carry_in) nd[++ndhi] = carry_in;
+ }
+ return ndhi;
+}
+
+/* Divide nd by 2^k (ndlo is assumed to be zero). */
+static uint32_t nd_div2k(uint32_t* nd, uint32_t ndhi, uint32_t k, SFormat sf)
+{
+ uint32_t ndlo = 0, stop1 = ~0, stop2 = ~0;
+ /* Performance hacks. */
+ if (!ndhi) {
+ if (!nd[0]) {
+ return 0;
+ } else {
+ uint32_t s = lj_ffs(nd[0]);
+ if (s >= k) { nd[0] >>= k; return 0; }
+ nd[0] >>= s; k -= s;
+ }
+ }
+ if (k > 18) {
+ if (STRFMT_FP(sf) == STRFMT_FP(STRFMT_T_FP_F)) {
+ stop1 = 63 - (int32_t)STRFMT_PREC(sf) / 9;
+ } else {
+ int32_t floorlog2 = ndhi * 29 + lj_fls(nd[ndhi]) - k;
+ int32_t floorlog10 = (int32_t)(floorlog2 * 0.30102999566398114);
+ stop1 = 62 + (floorlog10 - (int32_t)STRFMT_PREC(sf)) / 9;
+ stop2 = 61 + ndhi - (int32_t)STRFMT_PREC(sf) / 8;
+ }
+ }
+ /* Real logic. */
+ while (k >= 9) {
+ uint32_t i = ndhi, carry = 0;
+ for (;;) {
+ uint32_t val = nd[i];
+ nd[i] = (val >> 9) + carry;
+ carry = (val & 0x1ff) * 1953125;
+ if (i == ndlo) break;
+ i = (i - 1) & 0x3f;
+ }
+ if (ndlo != stop1 && ndlo != stop2) {
+ if (carry) { ndlo = (ndlo - 1) & 0x3f; nd[ndlo] = carry; }
+ if (!nd[ndhi]) { ndhi = (ndhi - 1) & 0x3f; stop2--; }
+ } else if (!nd[ndhi]) {
+ if (ndhi != ndlo) { ndhi = (ndhi - 1) & 0x3f; stop2--; }
+ else return ndlo;
+ }
+ k -= 9;
+ }
+ if (k) {
+ uint32_t mask = (1U << k) - 1, mul = 1000000000 >> k, i = ndhi, carry = 0;
+ for (;;) {
+ uint32_t val = nd[i];
+ nd[i] = (val >> k) + carry;
+ carry = (val & mask) * mul;
+ if (i == ndlo) break;
+ i = (i - 1) & 0x3f;
+ }
+ if (carry) { ndlo = (ndlo - 1) & 0x3f; nd[ndlo] = carry; }
+ }
+ return ndlo;
+}
+
+/* Add m*10^e to nd (assumes ndlo <= e/9 <= ndhi and 0 <= m <= 9). */
+static uint32_t nd_add_m10e(uint32_t* nd, uint32_t ndhi, uint8_t m, int32_t e)
+{
+ uint32_t i, carry;
+ if (e >= 0) {
+ i = (uint32_t)e/9;
+ carry = m * (ndigits_dec_threshold[e - (int32_t)i*9] + 1);
+ } else {
+ int32_t f = (e-8)/9;
+ i = (uint32_t)(64 + f);
+ carry = m * (ndigits_dec_threshold[e - f*9] + 1);
+ }
+ for (;;) {
+ uint32_t val = nd[i] + carry;
+ if (LJ_UNLIKELY(val >= 1000000000)) {
+ val -= 1000000000;
+ nd[i] = val;
+ if (LJ_UNLIKELY(i == ndhi)) {
+ ndhi = (ndhi + 1) & 0x3f;
+ nd[ndhi] = 1;
+ break;
+ }
+ carry = 1;
+ i = (i + 1) & 0x3f;
+ } else {
+ nd[i] = val;
+ break;
+ }
+ }
+ return ndhi;
+}
+
+/* Test whether two "nd" values are equal in their most significant digits. */
+static int nd_similar(uint32_t* nd, uint32_t ndhi, uint32_t* ref, MSize hilen,
+ MSize prec)
+{
+ char nd9[9], ref9[9];
+ if (hilen <= prec) {
+ if (LJ_UNLIKELY(nd[ndhi] != *ref)) return 0;
+ prec -= hilen; ref--; ndhi = (ndhi - 1) & 0x3f;
+ if (prec >= 9) {
+ if (LJ_UNLIKELY(nd[ndhi] != *ref)) return 0;
+ prec -= 9; ref--; ndhi = (ndhi - 1) & 0x3f;
+ }
+ } else {
+ prec -= hilen - 9;
+ }
+ lua_assert(prec < 9);
+ lj_strfmt_wuint9(nd9, nd[ndhi]);
+ lj_strfmt_wuint9(ref9, *ref);
+ return !memcmp(nd9, ref9, prec) && (nd9[prec] < '5') == (ref9[prec] < '5');
+}
+
+/* -- Formatted conversions to buffer ------------------------------------- */
+
+/* Write formatted floating-point number to either sb or p. */
+static char *lj_strfmt_wfnum(SBuf *sb, SFormat sf, lua_Number n, char *p)
+{
+ MSize width = STRFMT_WIDTH(sf), prec = STRFMT_PREC(sf), len;
+ TValue t;
+ t.n = n;
+ if (LJ_UNLIKELY((t.u32.hi << 1) >= 0xffe00000)) {
+ /* Handle non-finite values uniformly for %a, %e, %f, %g. */
+ int prefix = 0, ch = (sf & STRFMT_F_UPPER) ? 0x202020 : 0;
+ if (((t.u32.hi & 0x000fffff) | t.u32.lo) != 0) {
+ ch ^= ('n' << 16) | ('a' << 8) | 'n';
+ if ((sf & STRFMT_F_SPACE)) prefix = ' ';
+ } else {
+ ch ^= ('i' << 16) | ('n' << 8) | 'f';
+ if ((t.u32.hi & 0x80000000)) prefix = '-';
+ else if ((sf & STRFMT_F_PLUS)) prefix = '+';
+ else if ((sf & STRFMT_F_SPACE)) prefix = ' ';
+ }
+ len = 3 + (prefix != 0);
+ if (!p) p = lj_buf_more(sb, width > len ? width : len);
+ if (!(sf & STRFMT_F_LEFT)) while (width-- > len) *p++ = ' ';
+ if (prefix) *p++ = prefix;
+ *p++ = (char)(ch >> 16); *p++ = (char)(ch >> 8); *p++ = (char)ch;
+ } else if (STRFMT_FP(sf) == STRFMT_FP(STRFMT_T_FP_A)) {
+ /* %a */
+ const char *hexdig = (sf & STRFMT_F_UPPER) ? "0123456789ABCDEFPX"
+ : "0123456789abcdefpx";
+ int32_t e = (t.u32.hi >> 20) & 0x7ff;
+ char prefix = 0, eprefix = '+';
+ if (t.u32.hi & 0x80000000) prefix = '-';
+ else if ((sf & STRFMT_F_PLUS)) prefix = '+';
+ else if ((sf & STRFMT_F_SPACE)) prefix = ' ';
+ t.u32.hi &= 0xfffff;
+ if (e) {
+ t.u32.hi |= 0x100000;
+ e -= 1023;
+ } else if (t.u32.lo | t.u32.hi) {
+ /* Non-zero denormal - normalise it. */
+ uint32_t shift = t.u32.hi ? 20-lj_fls(t.u32.hi) : 52-lj_fls(t.u32.lo);
+ e = -1022 - shift;
+ t.u64 <<= shift;
+ }
+ /* abs(n) == t.u64 * 2^(e - 52) */
+ /* If n != 0, bit 52 of t.u64 is set, and is the highest set bit. */
+ if ((int32_t)prec < 0) {
+ /* Default precision: use smallest precision giving exact result. */
+ prec = t.u32.lo ? 13-lj_ffs(t.u32.lo)/4 : 5-lj_ffs(t.u32.hi|0x100000)/4;
+ } else if (prec < 13) {
+ /* Precision is sufficiently low as to maybe require rounding. */
+ t.u64 += (((uint64_t)1) << (51 - prec*4));
+ }
+ if (e < 0) {
+ eprefix = '-';
+ e = -e;
+ }
+ len = 5 + ndigits_dec((uint32_t)e) + prec + (prefix != 0)
+ + ((prec | (sf & STRFMT_F_ALT)) != 0);
+ if (!p) p = lj_buf_more(sb, width > len ? width : len);
+ if (!(sf & (STRFMT_F_LEFT | STRFMT_F_ZERO))) {
+ while (width-- > len) *p++ = ' ';
+ }
+ if (prefix) *p++ = prefix;
+ *p++ = '0';
+ *p++ = hexdig[17]; /* x or X */
+ if ((sf & (STRFMT_F_LEFT | STRFMT_F_ZERO)) == STRFMT_F_ZERO) {
+ while (width-- > len) *p++ = '0';
+ }
+ *p++ = '0' + (t.u32.hi >> 20); /* Usually '1', sometimes '0' or '2'. */
+ if ((prec | (sf & STRFMT_F_ALT))) {
+ /* Emit fractional part. */
+ char *q = p + 1 + prec;
+ *p = '.';
+ if (prec < 13) t.u64 >>= (52 - prec*4);
+ else while (prec > 13) p[prec--] = '0';
+ while (prec) { p[prec--] = hexdig[t.u64 & 15]; t.u64 >>= 4; }
+ p = q;
+ }
+ *p++ = hexdig[16]; /* p or P */
+ *p++ = eprefix; /* + or - */
+ p = lj_strfmt_wint(p, e);
+ } else {
+ /* %e or %f or %g - begin by converting n to "nd" format. */
+ uint32_t nd[64];
+ uint32_t ndhi = 0, ndlo, i;
+ int32_t e = (t.u32.hi >> 20) & 0x7ff, ndebias = 0;
+ char prefix = 0, *q;
+ if (t.u32.hi & 0x80000000) prefix = '-';
+ else if ((sf & STRFMT_F_PLUS)) prefix = '+';
+ else if ((sf & STRFMT_F_SPACE)) prefix = ' ';
+ prec += ((int32_t)prec >> 31) & 7; /* Default precision is 6. */
+ if (STRFMT_FP(sf) == STRFMT_FP(STRFMT_T_FP_G)) {
+ /* %g - decrement precision if non-zero (to make it like %e). */
+ prec--;
+ prec ^= (uint32_t)((int32_t)prec >> 31);
+ }
+ if ((sf & STRFMT_T_FP_E) && prec < 14 && n != 0) {
+ /* Precision is sufficiently low that rescaling will probably work. */
+ if ((ndebias = rescale_e[e >> 6])) {
+ t.n = n * rescale_n[e >> 6];
+ t.u64 -= 2; /* Convert 2ulp below (later we convert 2ulp above). */
+ nd[0] = 0x100000 | (t.u32.hi & 0xfffff);
+ e = ((t.u32.hi >> 20) & 0x7ff) - 1075 - (ND_MUL2K_MAX_SHIFT < 29);
+ goto load_t_lo; rescale_failed:
+ t.n = n;
+ e = (t.u32.hi >> 20) & 0x7ff;
+ ndebias = ndhi = 0;
+ }
+ }
+ nd[0] = t.u32.hi & 0xfffff;
+ if (e == 0) e++; else nd[0] |= 0x100000;
+ e -= 1043;
+ if (t.u32.lo) {
+ e -= 32 + (ND_MUL2K_MAX_SHIFT < 29); load_t_lo:
+#if ND_MUL2K_MAX_SHIFT >= 29
+ nd[0] = (nd[0] << 3) | (t.u32.lo >> 29);
+ ndhi = nd_mul2k(nd, ndhi, 29, t.u32.lo & 0x1fffffff, sf);
+#elif ND_MUL2K_MAX_SHIFT >= 11
+ ndhi = nd_mul2k(nd, ndhi, 11, t.u32.lo >> 21, sf);
+ ndhi = nd_mul2k(nd, ndhi, 11, (t.u32.lo >> 10) & 0x7ff, sf);
+ ndhi = nd_mul2k(nd, ndhi, 11, (t.u32.lo << 1) & 0x7ff, sf);
+#else
+#error "ND_MUL2K_MAX_SHIFT too small"
+#endif
+ }
+ if (e >= 0) {
+ ndhi = nd_mul2k(nd, ndhi, (uint32_t)e, 0, sf);
+ ndlo = 0;
+ } else {
+ ndlo = nd_div2k(nd, ndhi, (uint32_t)-e, sf);
+ if (ndhi && !nd[ndhi]) ndhi--;
+ }
+ /* abs(n) == nd * 10^ndebias (for slightly loose interpretation of ==) */
+ if ((sf & STRFMT_T_FP_E)) {
+ /* %e or %g - assume %e and start by calculating nd's exponent (nde). */
+ char eprefix = '+';
+ int32_t nde = -1;
+ MSize hilen;
+ if (ndlo && !nd[ndhi]) {
+ ndhi = 64; do {} while (!nd[--ndhi]);
+ nde -= 64 * 9;
+ }
+ hilen = ndigits_dec(nd[ndhi]);
+ nde += ndhi * 9 + hilen;
+ if (ndebias) {
+ /*
+ ** Rescaling was performed, but this introduced some error, and might
+ ** have pushed us across a rounding boundary. We check whether this
+ ** error affected the result by introducing even more error (2ulp in
+ ** either direction), and seeing whether a roundary boundary was
+ ** crossed. Having already converted the -2ulp case, we save off its
+ ** most significant digits, convert the +2ulp case, and compare them.
+ */
+ int32_t eidx = e + 70 + (ND_MUL2K_MAX_SHIFT < 29)
+ + (t.u32.lo >= 0xfffffffe && !(~t.u32.hi << 12));
+ const int8_t *m_e = four_ulp_m_e + eidx * 2;
+ lua_assert(0 <= eidx && eidx < 128);
+ nd[33] = nd[ndhi];
+ nd[32] = nd[(ndhi - 1) & 0x3f];
+ nd[31] = nd[(ndhi - 2) & 0x3f];
+ nd_add_m10e(nd, ndhi, (uint8_t)*m_e, m_e[1]);
+ if (LJ_UNLIKELY(!nd_similar(nd, ndhi, nd + 33, hilen, prec + 1))) {
+ goto rescale_failed;
+ }
+ }
+ if ((int32_t)(prec - nde) < (0x3f & -(int32_t)ndlo) * 9) {
+ /* Precision is sufficiently low as to maybe require rounding. */
+ ndhi = nd_add_m10e(nd, ndhi, 5, nde - prec - 1);
+ nde += (hilen != ndigits_dec(nd[ndhi]));
+ }
+ nde += ndebias;
+ if ((sf & STRFMT_T_FP_F)) {
+ /* %g */
+ if ((int32_t)prec >= nde && nde >= -4) {
+ if (nde < 0) ndhi = 0;
+ prec -= nde;
+ goto g_format_like_f;
+ } else if (!(sf & STRFMT_F_ALT) && prec && width > 5) {
+ /* Decrease precision in order to strip trailing zeroes. */
+ char tail[9];
+ uint32_t maxprec = hilen - 1 + ((ndhi - ndlo) & 0x3f) * 9;
+ if (prec >= maxprec) prec = maxprec;
+ else ndlo = (ndhi - (((int32_t)(prec - hilen) + 9) / 9)) & 0x3f;
+ i = prec - hilen - (((ndhi - ndlo) & 0x3f) * 9) + 10;
+ lj_strfmt_wuint9(tail, nd[ndlo]);
+ while (prec && tail[--i] == '0') {
+ prec--;
+ if (!i) {
+ if (ndlo == ndhi) { prec = 0; break; }
+ lj_strfmt_wuint9(tail, nd[++ndlo]);
+ i = 9;
+ }
+ }
+ }
+ }
+ if (nde < 0) {
+ /* Make nde non-negative. */
+ eprefix = '-';
+ nde = -nde;
+ }
+ len = 3 + prec + (prefix != 0) + ndigits_dec((uint32_t)nde) + (nde < 10)
+ + ((prec | (sf & STRFMT_F_ALT)) != 0);
+ if (!p) p = lj_buf_more(sb, (width > len ? width : len) + 5);
+ if (!(sf & (STRFMT_F_LEFT | STRFMT_F_ZERO))) {
+ while (width-- > len) *p++ = ' ';
+ }
+ if (prefix) *p++ = prefix;
+ if ((sf & (STRFMT_F_LEFT | STRFMT_F_ZERO)) == STRFMT_F_ZERO) {
+ while (width-- > len) *p++ = '0';
+ }
+ q = lj_strfmt_wint(p + 1, nd[ndhi]);
+ p[0] = p[1]; /* Put leading digit in the correct place. */
+ if ((prec | (sf & STRFMT_F_ALT))) {
+ /* Emit fractional part. */
+ p[1] = '.'; p += 2;
+ prec -= (q - p); p = q; /* Account for the digits already emitted. */
+ /* Then emit chunks of 9 digits (this may emit 8 digits too many). */
+ for (i = ndhi; (int32_t)prec > 0 && i != ndlo; prec -= 9) {
+ i = (i - 1) & 0x3f;
+ p = lj_strfmt_wuint9(p, nd[i]);
+ }
+ if ((sf & STRFMT_T_FP_F) && !(sf & STRFMT_F_ALT)) {
+ /* %g (and not %#g) - strip trailing zeroes. */
+ p += (int32_t)prec & ((int32_t)prec >> 31);
+ while (p[-1] == '0') p--;
+ if (p[-1] == '.') p--;
+ } else {
+ /* %e (or %#g) - emit trailing zeroes. */
+ while ((int32_t)prec > 0) { *p++ = '0'; prec--; }
+ p += (int32_t)prec;
+ }
+ } else {
+ p++;
+ }
+ *p++ = (sf & STRFMT_F_UPPER) ? 'E' : 'e';
+ *p++ = eprefix; /* + or - */
+ if (nde < 10) *p++ = '0'; /* Always at least two digits of exponent. */
+ p = lj_strfmt_wint(p, nde);
+ } else {
+ /* %f (or, shortly, %g in %f style) */
+ if (prec < (MSize)(0x3f & -(int32_t)ndlo) * 9) {
+ /* Precision is sufficiently low as to maybe require rounding. */
+ ndhi = nd_add_m10e(nd, ndhi, 5, 0 - prec - 1);
+ }
+ g_format_like_f:
+ if ((sf & STRFMT_T_FP_E) && !(sf & STRFMT_F_ALT) && prec && width) {
+ /* Decrease precision in order to strip trailing zeroes. */
+ if (ndlo) {
+ /* nd has a fractional part; we need to look at its digits. */
+ char tail[9];
+ uint32_t maxprec = (64 - ndlo) * 9;
+ if (prec >= maxprec) prec = maxprec;
+ else ndlo = 64 - (prec + 8) / 9;
+ i = prec - ((63 - ndlo) * 9);
+ lj_strfmt_wuint9(tail, nd[ndlo]);
+ while (prec && tail[--i] == '0') {
+ prec--;
+ if (!i) {
+ if (ndlo == 63) { prec = 0; break; }
+ lj_strfmt_wuint9(tail, nd[++ndlo]);
+ i = 9;
+ }
+ }
+ } else {
+ /* nd has no fractional part, so precision goes straight to zero. */
+ prec = 0;
+ }
+ }
+ len = ndhi * 9 + ndigits_dec(nd[ndhi]) + prec + (prefix != 0)
+ + ((prec | (sf & STRFMT_F_ALT)) != 0);
+ if (!p) p = lj_buf_more(sb, (width > len ? width : len) + 8);
+ if (!(sf & (STRFMT_F_LEFT | STRFMT_F_ZERO))) {
+ while (width-- > len) *p++ = ' ';
+ }
+ if (prefix) *p++ = prefix;
+ if ((sf & (STRFMT_F_LEFT | STRFMT_F_ZERO)) == STRFMT_F_ZERO) {
+ while (width-- > len) *p++ = '0';
+ }
+ /* Emit integer part. */
+ p = lj_strfmt_wint(p, nd[ndhi]);
+ i = ndhi;
+ while (i) p = lj_strfmt_wuint9(p, nd[--i]);
+ if ((prec | (sf & STRFMT_F_ALT))) {
+ /* Emit fractional part. */
+ *p++ = '.';
+ /* Emit chunks of 9 digits (this may emit 8 digits too many). */
+ while ((int32_t)prec > 0 && i != ndlo) {
+ i = (i - 1) & 0x3f;
+ p = lj_strfmt_wuint9(p, nd[i]);
+ prec -= 9;
+ }
+ if ((sf & STRFMT_T_FP_E) && !(sf & STRFMT_F_ALT)) {
+ /* %g (and not %#g) - strip trailing zeroes. */
+ p += (int32_t)prec & ((int32_t)prec >> 31);
+ while (p[-1] == '0') p--;
+ if (p[-1] == '.') p--;
+ } else {
+ /* %f (or %#g) - emit trailing zeroes. */
+ while ((int32_t)prec > 0) { *p++ = '0'; prec--; }
+ p += (int32_t)prec;
+ }
+ }
+ }
+ }
+ if ((sf & STRFMT_F_LEFT)) while (width-- > len) *p++ = ' ';
+ return p;
+}
+
+/* Add formatted floating-point number to buffer. */
+SBuf *lj_strfmt_putfnum(SBuf *sb, SFormat sf, lua_Number n)
+{
+ setsbufP(sb, lj_strfmt_wfnum(sb, sf, n, NULL));
+ return sb;
+}
+
+/* -- Conversions to strings ---------------------------------------------- */
+
+/* Convert number to string. */
+GCstr * LJ_FASTCALL lj_strfmt_num(lua_State *L, cTValue *o)
+{
+ char buf[STRFMT_MAXBUF_NUM];
+ MSize len = (MSize)(lj_strfmt_wfnum(NULL, STRFMT_G14, o->n, buf) - buf);
+ return lj_str_new(L, buf, len);
+}
+
--- /dev/null
- ** Copyright (C) 2005-2015 Mike Pall. See Copyright Notice in luajit.h
+/*
+** Definitions for ARM64 CPUs.
++** Copyright (C) 2005-2016 Mike Pall. See Copyright Notice in luajit.h
+*/
+
+#ifndef _LJ_TARGET_ARM64_H
+#define _LJ_TARGET_ARM64_H
+
+/* -- Registers IDs ------------------------------------------------------- */
+
+#define GPRDEF(_) \
+ _(X0) _(X1) _(X2) _(X3) _(X4) _(X5) _(X6) _(X7) \
+ _(X8) _(X9) _(X10) _(X11) _(X12) _(X13) _(X14) _(X15) \
+ _(X16) _(X17) _(X18) _(X19) _(X20) _(X21) _(X22) _(X23) \
+ _(X24) _(X25) _(X26) _(X27) _(X28) _(FP) _(LR) _(SP)
+#define FPRDEF(_) \
+ _(D0) _(D1) _(D2) _(D3) _(D4) _(D5) _(D6) _(D7) \
+ _(D8) _(D9) _(D10) _(D11) _(D12) _(D13) _(D14) _(D15) \
+ _(D16) _(D17) _(D18) _(D19) _(D20) _(D21) _(D22) _(D23) \
+ _(D24) _(D25) _(D26) _(D27) _(D28) _(D29) _(D30) _(D31)
+#define VRIDDEF(_)
+
+#define RIDENUM(name) RID_##name,
+
+enum {
+ GPRDEF(RIDENUM) /* General-purpose registers (GPRs). */
+ FPRDEF(RIDENUM) /* Floating-point registers (FPRs). */
+ RID_MAX,
+ RID_TMP = RID_LR,
+ RID_ZERO = RID_SP,
+
+ /* Calling conventions. */
+ RID_RET = RID_X0,
+ RID_FPRET = RID_D0,
+
+ /* These definitions must match with the *.dasc file(s): */
+ RID_BASE = RID_X19, /* Interpreter BASE. */
+ RID_LPC = RID_X21, /* Interpreter PC. */
+ RID_GL = RID_X22, /* Interpreter GL. */
+ RID_LREG = RID_X23, /* Interpreter L. */
+
+ /* Register ranges [min, max) and number of registers. */
+ RID_MIN_GPR = RID_X0,
+ RID_MAX_GPR = RID_SP+1,
+ RID_MIN_FPR = RID_MAX_GPR,
+ RID_MAX_FPR = RID_D31+1,
+ RID_NUM_GPR = RID_MAX_GPR - RID_MIN_GPR,
+ RID_NUM_FPR = RID_MAX_FPR - RID_MIN_FPR
+};
+
+#define RID_NUM_KREF RID_NUM_GPR
+#define RID_MIN_KREF RID_X0
+
+/* -- Register sets ------------------------------------------------------- */
+
+/* Make use of all registers, except for x18, fp, lr and sp. */
+#define RSET_FIXED \
+ (RID2RSET(RID_X18)|RID2RSET(RID_FP)|RID2RSET(RID_LR)|RID2RSET(RID_SP))
+#define RSET_GPR (RSET_RANGE(RID_MIN_GPR, RID_MAX_GPR) - RSET_FIXED)
+#define RSET_FPR RSET_RANGE(RID_MIN_FPR, RID_MAX_FPR)
+#define RSET_ALL (RSET_GPR|RSET_FPR)
+#define RSET_INIT RSET_ALL
+
+/* lr is an implicit scratch register. */
+#define RSET_SCRATCH_GPR (RSET_RANGE(RID_X0, RID_X17+1))
+#define RSET_SCRATCH_FPR \
+ (RSET_RANGE(RID_D0, RID_D7+1)|RSET_RANGE(RID_D16, RID_D31+1))
+#define RSET_SCRATCH (RSET_SCRATCH_GPR|RSET_SCRATCH_FPR)
+#define REGARG_FIRSTGPR RID_X0
+#define REGARG_LASTGPR RID_X7
+#define REGARG_NUMGPR 8
+#define REGARG_FIRSTFPR RID_D0
+#define REGARG_LASTFPR RID_D7
+#define REGARG_NUMFPR 8
+
+/* -- Instructions -------------------------------------------------------- */
+
+/* Instruction fields. */
+#define A64F_D(r) (r)
+#define A64F_N(r) ((r) << 5)
+#define A64F_A(r) ((r) << 10)
+#define A64F_M(r) ((r) << 16)
+#define A64F_U16(x) ((x) << 5)
+#define A64F_S26(x) (x)
+#define A64F_S19(x) ((x) << 5)
+
+typedef enum A64Ins {
+ A64I_MOVZw = 0x52800000,
+ A64I_MOVZx = 0xd2800000,
+ A64I_LDRLw = 0x18000000,
+ A64I_LDRLx = 0x58000000,
+ A64I_NOP = 0xd503201f,
+ A64I_B = 0x14000000,
+ A64I_BR = 0xd61f0000,
+} A64Ins;
+
+#endif
#include "lua.h"
-#define LUAJIT_VERSION "LuaJIT 2.0.4"
-#define LUAJIT_VERSION_NUM 20004 /* Version 2.0.4 = 02.00.04. */
-#define LUAJIT_VERSION_SYM luaJIT_version_2_0_4
+#define LUAJIT_VERSION "LuaJIT 2.1.0-beta1"
+#define LUAJIT_VERSION_NUM 20100 /* Version 2.1.0 = 02.01.00. */
+#define LUAJIT_VERSION_SYM luaJIT_version_2_1_0_beta1
- #define LUAJIT_COPYRIGHT "Copyright (C) 2005-2015 Mike Pall"
+ #define LUAJIT_COPYRIGHT "Copyright (C) 2005-2016 Mike Pall"
#define LUAJIT_URL "http://luajit.org/"
/* Modes for luaJIT_setmode. */
--- /dev/null
- |// Copyright (C) 2005-2015 Mike Pall. See Copyright Notice in luajit.h
+|// Low-level VM code for ARM64 CPUs.
+|// Bytecode interpreter, fast functions and helper functions.
++|// Copyright (C) 2005-2016 Mike Pall. See Copyright Notice in luajit.h
+|
+|.arch arm64
+|.section code_op, code_sub
+|
+|.actionlist build_actionlist
+|.globals GLOB_
+|.globalnames globnames
+|.externnames extnames
+|
+|// Note: The ragged indentation of the instructions is intentional.
+|// The starting columns indicate data dependencies.
+|
+|//-----------------------------------------------------------------------
+|
+|// ARM64 registers and the AAPCS64 ABI 1.0 at a glance:
+|//
+|// x0-x17 temp, x19-x28 callee-saved, x29 fp, x30 lr
+|// x18 is reserved on most platforms. Don't use it, save it or restore it.
+|// x31 doesn't exist. Register number 31 either means xzr/wzr (zero) or sp,
+|// depending on the instruction.
+|// v0-v7 temp, v8-v15 callee-saved (only d8-d15 preserved), v16-v31 temp
+|//
+|// x0-x7/v0-v7 hold parameters and results.
+|
+|// Fixed register assignments for the interpreter.
+|
+|// The following must be C callee-save.
+|.define BASE, x19 // Base of current Lua stack frame.
+|.define KBASE, x20 // Constants of current Lua function.
+|.define PC, x21 // Next PC.
+|.define GLREG, x22 // Global state.
+|.define LREG, x23 // Register holding lua_State (also in SAVE_L).
+|.define TISNUM, x24 // Constant LJ_TISNUM << 47.
+|.define TISNUMhi, x25 // Constant LJ_TISNUM << 15.
+|.define TISNIL, x26 // Constant -1LL.
+|.define fp, x29 // Yes, we have to maintain a frame pointer.
+|
+|.define ST_INTERP, w26 // Constant -1.
+|
+|// The following temporaries are not saved across C calls, except for RA/RC.
+|.define RA, x27
+|.define RC, x28
+|.define RB, x17
+|.define RAw, w27
+|.define RCw, w28
+|.define RBw, w17
+|.define INS, x16
+|.define INSw, w16
+|.define ITYPE, x15
+|.define TMP0, x8
+|.define TMP1, x9
+|.define TMP2, x10
+|.define TMP3, x11
+|.define TMP0w, w8
+|.define TMP1w, w9
+|.define TMP2w, w10
+|.define TMP3w, w11
+|
+|// Calling conventions. Also used as temporaries.
+|.define CARG1, x0
+|.define CARG2, x1
+|.define CARG3, x2
+|.define CARG4, x3
+|.define CARG5, x4
+|.define CARG1w, w0
+|.define CARG2w, w1
+|.define CARG3w, w2
+|.define CARG4w, w3
+|.define CARG5w, w4
+|
+|.define FARG1, d0
+|.define FARG2, d1
+|
+|.define CRET1, x0
+|.define CRET1w, w0
+|
+|// Stack layout while in interpreter. Must match with lj_frame.h.
+|
+|.define CFRAME_SPACE, 208
+|//----- 16 byte aligned, <-- sp entering interpreter
+|// Unused [sp, #204] // 32 bit values
+|.define SAVE_NRES, [sp, #200]
+|.define SAVE_ERRF, [sp, #196]
+|.define SAVE_MULTRES, [sp, #192]
+|.define TMPD, [sp, #184] // 64 bit values
+|.define SAVE_L, [sp, #176]
+|.define SAVE_PC, [sp, #168]
+|.define SAVE_CFRAME, [sp, #160]
+|.define SAVE_FPR_, 96 // 96+8*8: 64 bit FPR saves
+|.define SAVE_GPR_, 16 // 16+10*8: 64 bit GPR saves
+|.define SAVE_LR, [sp, #8]
+|.define SAVE_FP, [sp]
+|//----- 16 byte aligned, <-- sp while in interpreter.
+|
+|.define TMPDofs, #184
+|
+|.macro save_, gpr1, gpr2, fpr1, fpr2
+| stp d..fpr1, d..fpr2, [sp, # SAVE_FPR_+(fpr1-8)*8]
+| stp x..gpr1, x..gpr2, [sp, # SAVE_GPR_+(gpr1-19)*8]
+|.endmacro
+|.macro rest_, gpr1, gpr2, fpr1, fpr2
+| ldp d..fpr1, d..fpr2, [sp, # SAVE_FPR_+(fpr1-8)*8]
+| ldp x..gpr1, x..gpr2, [sp, # SAVE_GPR_+(gpr1-19)*8]
+|.endmacro
+|
+|.macro saveregs
+| stp fp, lr, [sp, #-CFRAME_SPACE]!
+| add fp, sp, #0
+| stp x19, x20, [sp, # SAVE_GPR_]
+| save_ 21, 22, 8, 9
+| save_ 23, 24, 10, 11
+| save_ 25, 26, 12, 13
+| save_ 27, 28, 14, 15
+|.endmacro
+|.macro restoreregs
+| ldp x19, x20, [sp, # SAVE_GPR_]
+| rest_ 21, 22, 8, 9
+| rest_ 23, 24, 10, 11
+| rest_ 25, 26, 12, 13
+| rest_ 27, 28, 14, 15
+| ldp fp, lr, [sp], # CFRAME_SPACE
+|.endmacro
+|
+|// Type definitions. Some of these are only used for documentation.
+|.type L, lua_State, LREG
+|.type GL, global_State, GLREG
+|.type TVALUE, TValue
+|.type GCOBJ, GCobj
+|.type STR, GCstr
+|.type TAB, GCtab
+|.type LFUNC, GCfuncL
+|.type CFUNC, GCfuncC
+|.type PROTO, GCproto
+|.type UPVAL, GCupval
+|.type NODE, Node
+|.type NARGS8, int
+|.type TRACE, GCtrace
+|.type SBUF, SBuf
+|
+|//-----------------------------------------------------------------------
+|
+|// Trap for not-yet-implemented parts.
+|.macro NYI; brk; .endmacro
+|
+|//-----------------------------------------------------------------------
+|
+|// Access to frame relative to BASE.
+|.define FRAME_FUNC, #-16
+|.define FRAME_PC, #-8
+|
+|.macro decode_RA, dst, ins; ubfx dst, ins, #8, #8; .endmacro
+|.macro decode_RB, dst, ins; ubfx dst, ins, #24, #8; .endmacro
+|.macro decode_RC, dst, ins; ubfx dst, ins, #16, #8; .endmacro
+|.macro decode_RD, dst, ins; ubfx dst, ins, #16, #16; .endmacro
+|.macro decode_RC8RD, dst, src; ubfiz dst, src, #3, #8; .endmacro
+|
+|// Instruction decode+dispatch.
+|.macro ins_NEXT
+| ldr INSw, [PC], #4
+| add TMP1, GL, INS, uxtb #3
+| decode_RA RA, INS
+| ldr TMP0, [TMP1, #GG_G2DISP]
+| decode_RD RC, INS
+| br TMP0
+|.endmacro
+|
+|// Instruction footer.
+|.if 1
+| // Replicated dispatch. Less unpredictable branches, but higher I-Cache use.
+| .define ins_next, ins_NEXT
+| .define ins_next_, ins_NEXT
+|.else
+| // Common dispatch. Lower I-Cache use, only one (very) unpredictable branch.
+| // Affects only certain kinds of benchmarks (and only with -j off).
+| .macro ins_next
+| b ->ins_next
+| .endmacro
+| .macro ins_next_
+| ->ins_next:
+| ins_NEXT
+| .endmacro
+|.endif
+|
+|// Call decode and dispatch.
+|.macro ins_callt
+| // BASE = new base, CARG3 = LFUNC/CFUNC, RC = nargs*8, FRAME_PC(BASE) = PC
+| ldr PC, LFUNC:CARG3->pc
+| ldr INSw, [PC], #4
+| add TMP1, GL, INS, uxtb #3
+| decode_RA RA, INS
+| ldr TMP0, [TMP1, #GG_G2DISP]
+| add RA, BASE, RA, lsl #3
+| br TMP0
+|.endmacro
+|
+|.macro ins_call
+| // BASE = new base, CARG3 = LFUNC/CFUNC, RC = nargs*8, PC = caller PC
+| str PC, [BASE, FRAME_PC]
+| ins_callt
+|.endmacro
+|
+|//-----------------------------------------------------------------------
+|
+|// Macros to check the TValue type and extract the GCobj. Branch on failure.
+|.macro checktp, reg, tp, target
+| asr ITYPE, reg, #47
+| cmn ITYPE, #-tp
+| and reg, reg, #LJ_GCVMASK
+| bne target
+|.endmacro
+|.macro checktp, dst, reg, tp, target
+| asr ITYPE, reg, #47
+| cmn ITYPE, #-tp
+| and dst, reg, #LJ_GCVMASK
+| bne target
+|.endmacro
+|.macro checkstr, reg, target; checktp reg, LJ_TSTR, target; .endmacro
+|.macro checktab, reg, target; checktp reg, LJ_TTAB, target; .endmacro
+|.macro checkfunc, reg, target; checktp reg, LJ_TFUNC, target; .endmacro
+|.macro checkint, reg, target
+| cmp TISNUMhi, reg, lsr #32
+| bne target
+|.endmacro
+|.macro checknum, reg, target
+| cmp TISNUMhi, reg, lsr #32
+| bls target
+|.endmacro
+|.macro checknumber, reg, target
+| cmp TISNUMhi, reg, lsr #32
+| blo target
+|.endmacro
+|
+|.macro mov_false, reg; movn reg, #0x8000, lsl #32; .endmacro
+|.macro mov_true, reg; movn reg, #0x0001, lsl #48; .endmacro
+|
+#define GL_J(field) (GG_OFS(J) + (int)offsetof(jit_State, field))
+|
+#define PC2PROTO(field) ((int)offsetof(GCproto, field)-(int)sizeof(GCproto))
+|
+|.macro hotcheck, delta
+| NYI
+|.endmacro
+|
+|.macro hotloop
+| hotcheck HOTCOUNT_LOOP
+| blo ->vm_hotloop
+|.endmacro
+|
+|.macro hotcall
+| hotcheck HOTCOUNT_CALL
+| blo ->vm_hotcall
+|.endmacro
+|
+|// Set current VM state.
+|.macro mv_vmstate, reg, st; movn reg, #LJ_VMST_..st; .endmacro
+|.macro st_vmstate, reg; str reg, GL->vmstate; .endmacro
+|
+|// Move table write barrier back. Overwrites mark and tmp.
+|.macro barrierback, tab, mark, tmp
+| ldr tmp, GL->gc.grayagain
+| and mark, mark, #~LJ_GC_BLACK // black2gray(tab)
+| str tab, GL->gc.grayagain
+| strb mark, tab->marked
+| str tmp, tab->gclist
+|.endmacro
+|
+|//-----------------------------------------------------------------------
+
+#if !LJ_DUALNUM
+#error "Only dual-number mode supported for ARM64 target"
+#endif
+
+/* Generate subroutines used by opcodes and other parts of the VM. */
+/* The .code_sub section should be last to help static branch prediction. */
+static void build_subroutines(BuildCtx *ctx)
+{
+ |.code_sub
+ |
+ |//-----------------------------------------------------------------------
+ |//-- Return handling ----------------------------------------------------
+ |//-----------------------------------------------------------------------
+ |
+ |->vm_returnp:
+ | // See vm_return. Also: RB = previous base.
+ | tbz PC, #2, ->cont_dispatch // (PC & FRAME_P) == 0?
+ |
+ | // Return from pcall or xpcall fast func.
+ | ldr PC, [RB, FRAME_PC] // Fetch PC of previous frame.
+ | mov_true TMP0
+ | mov BASE, RB
+ | // Prepending may overwrite the pcall frame, so do it at the end.
+ | str TMP0, [RA, #-8]! // Prepend true to results.
+ |
+ |->vm_returnc:
+ | adds RC, RC, #8 // RC = (nresults+1)*8.
+ | mov CRET1, #LUA_YIELD
+ | beq ->vm_unwind_c_eh
+ | str RCw, SAVE_MULTRES
+ | ands CARG1, PC, #FRAME_TYPE
+ | beq ->BC_RET_Z // Handle regular return to Lua.
+ |
+ |->vm_return:
+ | // BASE = base, RA = resultptr, RC/MULTRES = (nresults+1)*8, PC = return
+ | // CARG1 = PC & FRAME_TYPE
+ | and RB, PC, #~FRAME_TYPEP
+ | cmp CARG1, #FRAME_C
+ | sub RB, BASE, RB // RB = previous base.
+ | bne ->vm_returnp
+ |
+ | str RB, L->base
+ | ldrsw CARG2, SAVE_NRES // CARG2 = nresults+1.
+ | mv_vmstate TMP0w, C
+ | sub BASE, BASE, #16
+ | subs TMP2, RC, #8
+ | st_vmstate TMP0w
+ | beq >2
+ |1:
+ | subs TMP2, TMP2, #8
+ | ldr TMP0, [RA], #8
+ | str TMP0, [BASE], #8
+ | bne <1
+ |2:
+ | cmp RC, CARG2, lsl #3 // More/less results wanted?
+ | bne >6
+ |3:
+ | str BASE, L->top // Store new top.
+ |
+ |->vm_leave_cp:
+ | ldr RC, SAVE_CFRAME // Restore previous C frame.
+ | mov CRET1, #0 // Ok return status for vm_pcall.
+ | str RC, L->cframe
+ |
+ |->vm_leave_unw:
+ | restoreregs
+ | ret
+ |
+ |6:
+ | bgt >7 // Less results wanted?
+ | // More results wanted. Check stack size and fill up results with nil.
+ | ldr CARG3, L->maxstack
+ | cmp BASE, CARG3
+ | bhs >8
+ | str TISNIL, [BASE], #8
+ | add RC, RC, #8
+ | b <2
+ |
+ |7: // Less results wanted.
+ | cbz CARG2, <3 // LUA_MULTRET+1 case?
+ | sub CARG1, RC, CARG2, lsl #3
+ | sub BASE, BASE, CARG1 // Shrink top.
+ | b <3
+ |
+ |8: // Corner case: need to grow stack for filling up results.
+ | // This can happen if:
+ | // - A C function grows the stack (a lot).
+ | // - The GC shrinks the stack in between.
+ | // - A return back from a lua_call() with (high) nresults adjustment.
+ | str BASE, L->top // Save current top held in BASE (yes).
+ | mov CARG1, L
+ | bl extern lj_state_growstack // (lua_State *L, int n)
+ | ldr BASE, L->top // Need the (realloced) L->top in BASE.
+ | ldrsw CARG2, SAVE_NRES
+ | b <2
+ |
+ |->vm_unwind_c: // Unwind C stack, return from vm_pcall.
+ | // (void *cframe, int errcode)
+ | mov sp, CARG1
+ | mov CRET1, CARG2
+ |->vm_unwind_c_eh: // Landing pad for external unwinder.
+ | ldr L, SAVE_L
+ | mv_vmstate TMP0w, C
+ | ldr GL, L->glref
+ | st_vmstate TMP0w
+ | b ->vm_leave_unw
+ |
+ |->vm_unwind_ff: // Unwind C stack, return from ff pcall.
+ | // (void *cframe)
+ | and sp, CARG1, #CFRAME_RAWMASK
+ |->vm_unwind_ff_eh: // Landing pad for external unwinder.
+ | ldr L, SAVE_L
+ | movz TISNUM, #(LJ_TISNUM>>1)&0xffff, lsl #48
+ | movz TISNUMhi, #(LJ_TISNUM>>1)&0xffff, lsl #16
+ | movn TISNIL, #0
+ | mov RC, #16 // 2 results: false + error message.
+ | ldr BASE, L->base
+ | ldr GL, L->glref // Setup pointer to global state.
+ | mov_false TMP0
+ | sub RA, BASE, #8 // Results start at BASE-8.
+ | ldr PC, [BASE, FRAME_PC] // Fetch PC of previous frame.
+ | str TMP0, [BASE, #-8] // Prepend false to error message.
+ | st_vmstate ST_INTERP
+ | b ->vm_returnc
+ |
+ |//-----------------------------------------------------------------------
+ |//-- Grow stack for calls -----------------------------------------------
+ |//-----------------------------------------------------------------------
+ |
+ |->vm_growstack_c: // Grow stack for C function.
+ | // CARG1 = L
+ | mov CARG2, #LUA_MINSTACK
+ | b >2
+ |
+ |->vm_growstack_l: // Grow stack for Lua function.
+ | // BASE = new base, RA = BASE+framesize*8, RC = nargs*8, PC = first PC
+ | add RC, BASE, RC
+ | sub RA, RA, BASE
+ | mov CARG1, L
+ | stp BASE, RC, L->base
+ | add PC, PC, #4 // Must point after first instruction.
+ | lsr CARG2, RA, #3
+ |2:
+ | // L->base = new base, L->top = top
+ | str PC, SAVE_PC
+ | bl extern lj_state_growstack // (lua_State *L, int n)
+ | ldp BASE, RC, L->base
+ | ldr LFUNC:CARG3, [BASE, FRAME_FUNC]
+ | sub NARGS8:RC, RC, BASE
+ | and LFUNC:CARG3, CARG3, #LJ_GCVMASK
+ | // BASE = new base, RB = LFUNC/CFUNC, RC = nargs*8, FRAME_PC(BASE) = PC
+ | ins_callt // Just retry the call.
+ |
+ |//-----------------------------------------------------------------------
+ |//-- Entry points into the assembler VM ---------------------------------
+ |//-----------------------------------------------------------------------
+ |
+ |->vm_resume: // Setup C frame and resume thread.
+ | // (lua_State *L, TValue *base, int nres1 = 0, ptrdiff_t ef = 0)
+ | saveregs
+ | mov L, CARG1
+ | ldr GL, L->glref // Setup pointer to global state.
+ | mov BASE, CARG2
+ | str L, SAVE_L
+ | mov PC, #FRAME_CP
+ | str wzr, SAVE_NRES
+ | add TMP0, sp, #CFRAME_RESUME
+ | ldrb TMP1w, L->status
+ | str wzr, SAVE_ERRF
+ | str L, SAVE_PC // Any value outside of bytecode is ok.
+ | str xzr, SAVE_CFRAME
+ | str TMP0, L->cframe
+ | cbz TMP1w, >3
+ |
+ | // Resume after yield (like a return).
+ | str L, GL->cur_L
+ | mov RA, BASE
+ | ldp BASE, CARG1, L->base
+ | movz TISNUM, #(LJ_TISNUM>>1)&0xffff, lsl #48
+ | movz TISNUMhi, #(LJ_TISNUM>>1)&0xffff, lsl #16
+ | ldr PC, [BASE, FRAME_PC]
+ | strb wzr, L->status
+ | movn TISNIL, #0
+ | sub RC, CARG1, BASE
+ | ands CARG1, PC, #FRAME_TYPE
+ | add RC, RC, #8
+ | st_vmstate ST_INTERP
+ | str RCw, SAVE_MULTRES
+ | beq ->BC_RET_Z
+ | b ->vm_return
+ |
+ |->vm_pcall: // Setup protected C frame and enter VM.
+ | // (lua_State *L, TValue *base, int nres1, ptrdiff_t ef)
+ | saveregs
+ | mov PC, #FRAME_CP
+ | str CARG4w, SAVE_ERRF
+ | b >1
+ |
+ |->vm_call: // Setup C frame and enter VM.
+ | // (lua_State *L, TValue *base, int nres1)
+ | saveregs
+ | mov PC, #FRAME_C
+ |
+ |1: // Entry point for vm_pcall above (PC = ftype).
+ | ldr RC, L:CARG1->cframe
+ | str CARG3w, SAVE_NRES
+ | mov L, CARG1
+ | str CARG1, SAVE_L
+ | ldr GL, L->glref // Setup pointer to global state.
+ | mov BASE, CARG2
+ | str CARG1, SAVE_PC // Any value outside of bytecode is ok.
+ | str RC, SAVE_CFRAME
+ | str fp, L->cframe // Add our C frame to cframe chain.
+ |
+ |3: // Entry point for vm_cpcall/vm_resume (BASE = base, PC = ftype).
+ | str L, GL->cur_L
+ | ldp RB, CARG1, L->base // RB = old base (for vmeta_call).
+ | movz TISNUM, #(LJ_TISNUM>>1)&0xffff, lsl #48
+ | movz TISNUMhi, #(LJ_TISNUM>>1)&0xffff, lsl #16
+ | add PC, PC, BASE
+ | movn TISNIL, #0
+ | sub PC, PC, RB // PC = frame delta + frame type
+ | sub NARGS8:RC, CARG1, BASE
+ | st_vmstate ST_INTERP
+ |
+ |->vm_call_dispatch:
+ | // RB = old base, BASE = new base, RC = nargs*8, PC = caller PC
+ | ldr CARG3, [BASE, FRAME_FUNC]
+ | checkfunc CARG3, ->vmeta_call
+ |
+ |->vm_call_dispatch_f:
+ | ins_call
+ | // BASE = new base, CARG3 = func, RC = nargs*8, PC = caller PC
+ |
+ |->vm_cpcall: // Setup protected C frame, call C.
+ | // (lua_State *L, lua_CFunction func, void *ud, lua_CPFunction cp)
+ | saveregs
+ | mov L, CARG1
+ | ldr RA, L:CARG1->stack
+ | str CARG1, SAVE_L
+ | ldr GL, L->glref // Setup pointer to global state.
+ | ldr RB, L->top
+ | str CARG1, SAVE_PC // Any value outside of bytecode is ok.
+ | ldr RC, L->cframe
+ | sub RA, RA, RB // Compute -savestack(L, L->top).
+ | str RAw, SAVE_NRES // Neg. delta means cframe w/o frame.
+ | str wzr, SAVE_ERRF // No error function.
+ | str RC, SAVE_CFRAME
+ | str fp, L->cframe // Add our C frame to cframe chain.
+ | str L, GL->cur_L
+ | blr CARG4 // (lua_State *L, lua_CFunction func, void *ud)
+ | mov BASE, CRET1
+ | mov PC, #FRAME_CP
+ | cbnz BASE, <3 // Else continue with the call.
+ | b ->vm_leave_cp // No base? Just remove C frame.
+ |
+ |//-----------------------------------------------------------------------
+ |//-- Metamethod handling ------------------------------------------------
+ |//-----------------------------------------------------------------------
+ |
+ |//-- Continuation dispatch ----------------------------------------------
+ |
+ |->cont_dispatch:
+ | // BASE = meta base, RA = resultptr, RC = (nresults+1)*8
+ | ldr LFUNC:CARG3, [RB, FRAME_FUNC]
+ | ldr CARG1, [BASE, #-32] // Get continuation.
+ | mov CARG4, BASE
+ | mov BASE, RB // Restore caller BASE.
+ | and LFUNC:CARG3, CARG3, #LJ_GCVMASK
+ |.if FFI
+ | cmp CARG1, #1
+ |.endif
+ | ldr PC, [CARG4, #-24] // Restore PC from [cont|PC].
+ | ldr CARG3, LFUNC:CARG3->pc
+ | add TMP0, RA, RC
+ | str TISNIL, [TMP0, #-8] // Ensure one valid arg.
+ |.if FFI
+ | bls >1
+ |.endif
+ | ldr KBASE, [CARG3, #PC2PROTO(k)]
+ | // BASE = base, RA = resultptr, CARG4 = meta base
+ | br CARG1
+ |
+ |.if FFI
+ |1:
+ | beq ->cont_ffi_callback // cont = 1: return from FFI callback.
+ | // cont = 0: tailcall from C function.
+ | sub CARG4, CARG4, #32
+ | sub RC, CARG4, BASE
+ | b ->vm_call_tail
+ |.endif
+ |
+ |->cont_cat: // RA = resultptr, CARG4 = meta base
+ | ldr INSw, [PC, #-4]
+ | sub CARG2, CARG4, #32
+ | ldr TMP0, [RA]
+ | str BASE, L->base
+ | decode_RB RB, INS
+ | decode_RA RA, INS
+ | add TMP1, BASE, RB, lsl #3
+ | subs TMP1, CARG2, TMP1
+ | beq >1
+ | str TMP0, [CARG2]
+ | lsr CARG3, TMP1, #3
+ | b ->BC_CAT_Z
+ |
+ |1:
+ | str TMP0, [BASE, RA, lsl #3]
+ | b ->cont_nop
+ |
+ |//-- Table indexing metamethods -----------------------------------------
+ |
+ |->vmeta_tgets1:
+ | movn CARG4, #~LJ_TSTR
+ | add CARG2, BASE, RB, lsl #3
+ | add CARG4, STR:RC, CARG4, lsl #47
+ | b >2
+ |
+ |->vmeta_tgets:
+ | movk CARG2, #(LJ_TTAB>>1)&0xffff, lsl #48
+ | str CARG2, GL->tmptv
+ | add CARG2, GL, #offsetof(global_State, tmptv)
+ |2:
+ | add CARG3, sp, TMPDofs
+ | str CARG4, TMPD
+ | b >1
+ |
+ |->vmeta_tgetb: // RB = table, RC = index
+ | add RC, RC, TISNUM
+ | add CARG2, BASE, RB, lsl #3
+ | add CARG3, sp, TMPDofs
+ | str RC, TMPD
+ | b >1
+ |
+ |->vmeta_tgetv: // RB = table, RC = key
+ | add CARG2, BASE, RB, lsl #3
+ | add CARG3, BASE, RC, lsl #3
+ |1:
+ | str BASE, L->base
+ | mov CARG1, L
+ | str PC, SAVE_PC
+ | bl extern lj_meta_tget // (lua_State *L, TValue *o, TValue *k)
+ | // Returns TValue * (finished) or NULL (metamethod).
+ | cbz CRET1, >3
+ | ldr TMP0, [CRET1]
+ | str TMP0, [BASE, RA, lsl #3]
+ | ins_next
+ |
+ |3: // Call __index metamethod.
+ | // BASE = base, L->top = new base, stack = cont/func/t/k
+ | sub TMP1, BASE, #FRAME_CONT
+ | ldr BASE, L->top
+ | mov NARGS8:RC, #16 // 2 args for func(t, k).
+ | ldr LFUNC:CARG3, [BASE, FRAME_FUNC] // Guaranteed to be a function here.
+ | str PC, [BASE, #-24] // [cont|PC]
+ | sub PC, BASE, TMP1
+ | and LFUNC:CARG3, CARG3, #LJ_GCVMASK
+ | b ->vm_call_dispatch_f
+ |
+ |->vmeta_tgetr:
+ | sxtw CARG2, TMP1w
+ | bl extern lj_tab_getinth // (GCtab *t, int32_t key)
+ | // Returns cTValue * or NULL.
+ | mov TMP0, TISNIL
+ | cbz CRET1, ->BC_TGETR_Z
+ | ldr TMP0, [CRET1]
+ | b ->BC_TGETR_Z
+ |
+ |//-----------------------------------------------------------------------
+ |
+ |->vmeta_tsets1:
+ | movn CARG4, #~LJ_TSTR
+ | add CARG2, BASE, RB, lsl #3
+ | add CARG4, STR:RC, CARG4, lsl #47
+ | b >2
+ |
+ |->vmeta_tsets:
+ | movk CARG2, #(LJ_TTAB>>1)&0xffff, lsl #48
+ | str CARG2, GL->tmptv
+ | add CARG2, GL, #offsetof(global_State, tmptv)
+ |2:
+ | add CARG3, sp, TMPDofs
+ | str CARG4, TMPD
+ | b >1
+ |
+ |->vmeta_tsetb: // RB = table, RC = index
+ | add RC, RC, TISNUM
+ | add CARG2, BASE, RB, lsl #3
+ | add CARG3, sp, TMPDofs
+ | str RC, TMPD
+ | b >1
+ |
+ |->vmeta_tsetv:
+ | add CARG2, BASE, RB, lsl #3
+ | add CARG3, BASE, RC, lsl #3
+ |1:
+ | str BASE, L->base
+ | mov CARG1, L
+ | str PC, SAVE_PC
+ | bl extern lj_meta_tset // (lua_State *L, TValue *o, TValue *k)
+ | // Returns TValue * (finished) or NULL (metamethod).
+ | ldr TMP0, [BASE, RA, lsl #3]
+ | cbz CRET1, >3
+ | // NOBARRIER: lj_meta_tset ensures the table is not black.
+ | str TMP0, [CRET1]
+ | ins_next
+ |
+ |3: // Call __newindex metamethod.
+ | // BASE = base, L->top = new base, stack = cont/func/t/k/(v)
+ | sub TMP1, BASE, #FRAME_CONT
+ | ldr BASE, L->top
+ | mov NARGS8:RC, #24 // 3 args for func(t, k, v).
+ | ldr LFUNC:CARG3, [BASE, FRAME_FUNC] // Guaranteed to be a function here.
+ | str TMP0, [BASE, #16] // Copy value to third argument.
+ | str PC, [BASE, #-24] // [cont|PC]
+ | sub PC, BASE, TMP1
+ | and LFUNC:CARG3, CARG3, #LJ_GCVMASK
+ | b ->vm_call_dispatch_f
+ |
+ |->vmeta_tsetr:
+ | sxtw CARG3, TMP1w
+ | str BASE, L->base
+ | str PC, SAVE_PC
+ | bl extern lj_tab_setinth // (lua_State *L, GCtab *t, int32_t key)
+ | // Returns TValue *.
+ | b ->BC_TSETR_Z
+ |
+ |//-- Comparison metamethods ---------------------------------------------
+ |
+ |->vmeta_comp:
+ | add CARG2, BASE, RA, lsl #3
+ | sub PC, PC, #4
+ | add CARG3, BASE, RC, lsl #3
+ | str BASE, L->base
+ | mov CARG1, L
+ | str PC, SAVE_PC
+ | uxtb CARG4w, INSw
+ | bl extern lj_meta_comp // (lua_State *L, TValue *o1, *o2, int op)
+ | // Returns 0/1 or TValue * (metamethod).
+ |3:
+ | cmp CRET1, #1
+ | bhi ->vmeta_binop
+ |4:
+ | ldrh RBw, [PC, #2]
+ | add PC, PC, #4
+ | add RB, PC, RB, lsl #2
+ | sub RB, RB, #0x20000
+ | csel PC, PC, RB, lo
+ |->cont_nop:
+ | ins_next
+ |
+ |->cont_ra: // RA = resultptr
+ | ldr INSw, [PC, #-4]
+ | ldr TMP0, [RA]
+ | decode_RA TMP1, INS
+ | str TMP0, [BASE, TMP1, lsl #3]
+ | b ->cont_nop
+ |
+ |->cont_condt: // RA = resultptr
+ | ldr TMP0, [RA]
+ | mov_true TMP1
+ | cmp TMP1, TMP0 // Branch if result is true.
+ | b <4
+ |
+ |->cont_condf: // RA = resultptr
+ | ldr TMP0, [RA]
+ | mov_false TMP1
+ | cmp TMP0, TMP1 // Branch if result is false.
+ | b <4
+ |
+ |->vmeta_equal:
+ | // CARG2, CARG3, CARG4 are already set by BC_ISEQV/BC_ISNEV.
+ | and TAB:CARG3, CARG3, #LJ_GCVMASK
+ | sub PC, PC, #4
+ | str BASE, L->base
+ | mov CARG1, L
+ | str PC, SAVE_PC
+ | bl extern lj_meta_equal // (lua_State *L, GCobj *o1, *o2, int ne)
+ | // Returns 0/1 or TValue * (metamethod).
+ | b <3
+ |
+ |->vmeta_equal_cd:
+ |.if FFI
+ | sub PC, PC, #4
+ | str BASE, L->base
+ | mov CARG1, L
+ | mov CARG2, INS
+ | str PC, SAVE_PC
+ | bl extern lj_meta_equal_cd // (lua_State *L, BCIns op)
+ | // Returns 0/1 or TValue * (metamethod).
+ | b <3
+ |.endif
+ |
+ |->vmeta_istype:
+ | sub PC, PC, #4
+ | str BASE, L->base
+ | mov CARG1, L
+ | mov CARG2, RA
+ | mov CARG3, RC
+ | str PC, SAVE_PC
+ | bl extern lj_meta_istype // (lua_State *L, BCReg ra, BCReg tp)
+ | b ->cont_nop
+ |
+ |//-- Arithmetic metamethods ---------------------------------------------
+ |
+ |->vmeta_arith_vn:
+ | add CARG3, BASE, RB, lsl #3
+ | add CARG4, KBASE, RC, lsl #3
+ | b >1
+ |
+ |->vmeta_arith_nv:
+ | add CARG4, BASE, RB, lsl #3
+ | add CARG3, KBASE, RC, lsl #3
+ | b >1
+ |
+ |->vmeta_unm:
+ | add CARG3, BASE, RC, lsl #3
+ | mov CARG4, CARG3
+ | b >1
+ |
+ |->vmeta_arith_vv:
+ | add CARG3, BASE, RB, lsl #3
+ | add CARG4, BASE, RC, lsl #3
+ |1:
+ | uxtb CARG5w, INSw
+ | add CARG2, BASE, RA, lsl #3
+ | str BASE, L->base
+ | mov CARG1, L
+ | str PC, SAVE_PC
+ | bl extern lj_meta_arith // (lua_State *L, TValue *ra,*rb,*rc, BCReg op)
+ | // Returns NULL (finished) or TValue * (metamethod).
+ | cbz CRET1, ->cont_nop
+ |
+ | // Call metamethod for binary op.
+ |->vmeta_binop:
+ | // BASE = old base, CRET1 = new base, stack = cont/func/o1/o2
+ | sub TMP1, CRET1, BASE
+ | str PC, [CRET1, #-24] // [cont|PC]
+ | add PC, TMP1, #FRAME_CONT
+ | mov BASE, CRET1
+ | mov NARGS8:RC, #16 // 2 args for func(o1, o2).
+ | b ->vm_call_dispatch
+ |
+ |->vmeta_len:
+ | add CARG2, BASE, RC, lsl #3
+#if LJ_52
+ | mov TAB:RC, TAB:CARG1 // Save table (ignored for other types).
+#endif
+ | str BASE, L->base
+ | mov CARG1, L
+ | str PC, SAVE_PC
+ | bl extern lj_meta_len // (lua_State *L, TValue *o)
+ | // Returns NULL (retry) or TValue * (metamethod base).
+#if LJ_52
+ | cbnz CRET1, ->vmeta_binop // Binop call for compatibility.
+ | mov TAB:CARG1, TAB:RC
+ | b ->BC_LEN_Z
+#else
+ | b ->vmeta_binop // Binop call for compatibility.
+#endif
+ |
+ |//-- Call metamethod ----------------------------------------------------
+ |
+ |->vmeta_call: // Resolve and call __call metamethod.
+ | // RB = old base, BASE = new base, RC = nargs*8
+ | mov CARG1, L
+ | str RB, L->base // This is the callers base!
+ | sub CARG2, BASE, #16
+ | str PC, SAVE_PC
+ | add CARG3, BASE, NARGS8:RC
+ | bl extern lj_meta_call // (lua_State *L, TValue *func, TValue *top)
+ | ldr LFUNC:CARG3, [BASE, FRAME_FUNC] // Guaranteed to be a function here.
+ | add NARGS8:RC, NARGS8:RC, #8 // Got one more argument now.
+ | and LFUNC:CARG3, CARG3, #LJ_GCVMASK
+ | ins_call
+ |
+ |->vmeta_callt: // Resolve __call for BC_CALLT.
+ | // BASE = old base, RA = new base, RC = nargs*8
+ | mov CARG1, L
+ | str BASE, L->base
+ | sub CARG2, RA, #16
+ | str PC, SAVE_PC
+ | add CARG3, RA, NARGS8:RC
+ | bl extern lj_meta_call // (lua_State *L, TValue *func, TValue *top)
+ | ldr TMP1, [RA, FRAME_FUNC] // Guaranteed to be a function here.
+ | ldr PC, [BASE, FRAME_PC]
+ | add NARGS8:RC, NARGS8:RC, #8 // Got one more argument now.
+ | and LFUNC:CARG3, TMP1, #LJ_GCVMASK
+ | b ->BC_CALLT2_Z
+ |
+ |//-- Argument coercion for 'for' statement ------------------------------
+ |
+ |->vmeta_for:
+ | mov CARG1, L
+ | str BASE, L->base
+ | mov CARG2, RA
+ | str PC, SAVE_PC
+ | bl extern lj_meta_for // (lua_State *L, TValue *base)
+ | ldr INSw, [PC, #-4]
+ |.if JIT
+ | uxtb TMP0, INS
+ |.endif
+ | decode_RA RA, INS
+ | decode_RD RC, INS
+ |.if JIT
+ | cmp TMP0, #BC_JFORI
+ | beq =>BC_JFORI
+ |.endif
+ | b =>BC_FORI
+ |
+ |//-----------------------------------------------------------------------
+ |//-- Fast functions -----------------------------------------------------
+ |//-----------------------------------------------------------------------
+ |
+ |.macro .ffunc, name
+ |->ff_ .. name:
+ |.endmacro
+ |
+ |.macro .ffunc_1, name
+ |->ff_ .. name:
+ | ldr CARG1, [BASE]
+ | cmp NARGS8:RC, #8
+ | blo ->fff_fallback
+ |.endmacro
+ |
+ |.macro .ffunc_2, name
+ |->ff_ .. name:
+ | ldp CARG1, CARG2, [BASE]
+ | cmp NARGS8:RC, #16
+ | blo ->fff_fallback
+ |.endmacro
+ |
+ |.macro .ffunc_n, name
+ | .ffunc name
+ | ldr CARG1, [BASE]
+ | cmp NARGS8:RC, #8
+ | ldr FARG1, [BASE]
+ | blo ->fff_fallback
+ | checknum CARG1, ->fff_fallback
+ |.endmacro
+ |
+ |.macro .ffunc_nn, name
+ | .ffunc name
+ | ldp CARG1, CARG2, [BASE]
+ | cmp NARGS8:RC, #16
+ | ldp FARG1, FARG2, [BASE]
+ | blo ->fff_fallback
+ | checknum CARG1, ->fff_fallback
+ | checknum CARG2, ->fff_fallback
+ |.endmacro
+ |
+ |// Inlined GC threshold check. Caveat: uses CARG1 and CARG2.
+ |.macro ffgccheck
+ | ldp CARG1, CARG2, GL->gc.total // Assumes threshold follows total.
+ | cmp CARG1, CARG2
+ | blt >1
+ | bl ->fff_gcstep
+ |1:
+ |.endmacro
+ |
+ |//-- Base library: checks -----------------------------------------------
+ |
+ |.ffunc_1 assert
+ | ldr PC, [BASE, FRAME_PC]
+ | mov_false TMP1
+ | cmp CARG1, TMP1
+ | bhs ->fff_fallback
+ | str CARG1, [BASE, #-16]
+ | sub RB, BASE, #8
+ | subs RA, NARGS8:RC, #8
+ | add RC, NARGS8:RC, #8 // Compute (nresults+1)*8.
+ | cbz RA, ->fff_res // Done if exactly 1 argument.
+ |1:
+ | ldr CARG1, [RB, #16]
+ | sub RA, RA, #8
+ | str CARG1, [RB], #8
+ | cbnz RA, <1
+ | b ->fff_res
+ |
+ |.ffunc_1 type
+ | mov TMP0, #~LJ_TISNUM
+ | asr ITYPE, CARG1, #47
+ | cmn ITYPE, #~LJ_TISNUM
+ | csinv TMP1, TMP0, ITYPE, lo
+ | add TMP1, TMP1, #offsetof(GCfuncC, upvalue)/8
+ | ldr CARG1, [CFUNC:CARG3, TMP1, lsl #3]
+ | b ->fff_restv
+ |
+ |//-- Base library: getters and setters ---------------------------------
+ |
+ |.ffunc_1 getmetatable
+ | asr ITYPE, CARG1, #47
+ | cmn ITYPE, #-LJ_TTAB
+ | ccmn ITYPE, #-LJ_TUDATA, #4, ne
+ | and TAB:CARG1, CARG1, #LJ_GCVMASK
+ | bne >6
+ |1: // Field metatable must be at same offset for GCtab and GCudata!
+ | ldr TAB:RB, TAB:CARG1->metatable
+ |2:
+ | mov CARG1, TISNIL
+ | ldr STR:RC, GL->gcroot[GCROOT_MMNAME+MM_metatable]
+ | cbz TAB:RB, ->fff_restv
+ | ldr TMP1w, TAB:RB->hmask
+ | ldr TMP2w, STR:RC->hash
+ | ldr NODE:CARG3, TAB:RB->node
+ | and TMP1w, TMP1w, TMP2w // idx = str->hash & tab->hmask
+ | add TMP1, TMP1, TMP1, lsl #1
+ | movn CARG4, #~LJ_TSTR
+ | add NODE:CARG3, NODE:CARG3, TMP1, lsl #3 // node = tab->node + idx*3*8
+ | add CARG4, STR:RC, CARG4, lsl #47 // Tagged key to look for.
+ |3: // Rearranged logic, because we expect _not_ to find the key.
+ | ldp CARG1, TMP0, NODE:CARG3->val
+ | ldr NODE:CARG3, NODE:CARG3->next
+ | cmp TMP0, CARG4
+ | beq >5
+ | cbnz NODE:CARG3, <3
+ |4:
+ | mov CARG1, RB // Use metatable as default result.
+ | movk CARG1, #(LJ_TTAB>>1)&0xffff, lsl #48
+ | b ->fff_restv
+ |5:
+ | cmp TMP0, TISNIL
+ | bne ->fff_restv
+ | b <4
+ |
+ |6:
+ | movn TMP0, #~LJ_TISNUM
+ | cmp ITYPE, TMP0
+ | csel ITYPE, ITYPE, TMP0, hs
+ | sub TMP1, GL, ITYPE, lsl #3
+ | ldr TAB:RB, [TMP1, #offsetof(global_State, gcroot[GCROOT_BASEMT])-8]
+ | b <2
+ |
+ |.ffunc_2 setmetatable
+ | // Fast path: no mt for table yet and not clearing the mt.
+ | checktp TMP1, CARG1, LJ_TTAB, ->fff_fallback
+ | ldr TAB:TMP0, TAB:TMP1->metatable
+ | asr ITYPE, CARG2, #47
+ | ldrb TMP2w, TAB:TMP1->marked
+ | cmn ITYPE, #-LJ_TTAB
+ | and TAB:CARG2, CARG2, #LJ_GCVMASK
+ | ccmp TAB:TMP0, #0, #0, eq
+ | bne ->fff_fallback
+ | str TAB:CARG2, TAB:TMP1->metatable
+ | tbz TMP2w, #2, ->fff_restv // isblack(table)
+ | barrierback TAB:TMP1, TMP2w, TMP0
+ | b ->fff_restv
+ |
+ |.ffunc rawget
+ | ldr CARG2, [BASE]
+ | cmp NARGS8:RC, #16
+ | blo ->fff_fallback
+ | checktab CARG2, ->fff_fallback
+ | mov CARG1, L
+ | add CARG3, BASE, #8
+ | bl extern lj_tab_get // (lua_State *L, GCtab *t, cTValue *key)
+ | // Returns cTValue *.
+ | ldr CARG1, [CRET1]
+ | b ->fff_restv
+ |
+ |//-- Base library: conversions ------------------------------------------
+ |
+ |.ffunc tonumber
+ | // Only handles the number case inline (without a base argument).
+ | ldr CARG1, [BASE]
+ | cmp NARGS8:RC, #8
+ | bne ->fff_fallback
+ | checknumber CARG1, ->fff_fallback
+ | b ->fff_restv
+ |
+ |.ffunc_1 tostring
+ | // Only handles the string or number case inline.
+ | asr ITYPE, CARG1, #47
+ | cmn ITYPE, #-LJ_TSTR
+ | // A __tostring method in the string base metatable is ignored.
+ | beq ->fff_restv
+ | // Handle numbers inline, unless a number base metatable is present.
+ | ldr TMP1, GL->gcroot[GCROOT_BASEMT_NUM]
+ | str BASE, L->base
+ | cmn ITYPE, #-LJ_TISNUM
+ | ccmp TMP1, #0, #0, ls
+ | str PC, SAVE_PC // Redundant (but a defined value).
+ | bne ->fff_fallback
+ | ffgccheck
+ | mov CARG1, L
+ | mov CARG2, BASE
+ | bl extern lj_strfmt_number // (lua_State *L, cTValue *o)
+ | // Returns GCstr *.
+ | movn TMP1, #~LJ_TSTR
+ | ldr BASE, L->base
+ | add CARG1, CARG1, TMP1, lsl #47
+ | b ->fff_restv
+ |
+ |//-- Base library: iterators -------------------------------------------
+ |
+ |.ffunc_1 next
+ | checktp CARG2, CARG1, LJ_TTAB, ->fff_fallback
+ | str TISNIL, [BASE, NARGS8:RC] // Set missing 2nd arg to nil.
+ | ldr PC, [BASE, FRAME_PC]
+ | stp BASE, BASE, L->base // Add frame since C call can throw.
+ | mov CARG1, L
+ | add CARG3, BASE, #8
+ | str PC, SAVE_PC
+ | bl extern lj_tab_next // (lua_State *L, GCtab *t, TValue *key)
+ | // Returns 0 at end of traversal.
+ | str TISNIL, [BASE, #-16]
+ | cbz CRET1, ->fff_res1 // End of traversal: return nil.
+ | ldp CARG1, CARG2, [BASE, #8] // Copy key and value to results.
+ | mov RC, #(2+1)*8
+ | stp CARG1, CARG2, [BASE, #-16]
+ | b ->fff_res
+ |
+ |.ffunc_1 pairs
+ | checktp TMP1, CARG1, LJ_TTAB, ->fff_fallback
+#if LJ_52
+ | ldr TAB:CARG2, TAB:TMP1->metatable
+#endif
+ | ldr CFUNC:CARG4, CFUNC:CARG3->upvalue[0]
+ | ldr PC, [BASE, FRAME_PC]
+#if LJ_52
+ | cbnz TAB:CARG2, ->fff_fallback
+#endif
+ | mov RC, #(3+1)*8
+ | stp CARG1, TISNIL, [BASE, #-8]
+ | str CFUNC:CARG4, [BASE, #-16]
+ | b ->fff_res
+ |
+ |.ffunc_2 ipairs_aux
+ | checktab CARG1, ->fff_fallback
+ | checkint CARG2, ->fff_fallback
+ | ldr TMP1w, TAB:CARG1->asize
+ | ldr CARG3, TAB:CARG1->array
+ | ldr TMP0w, TAB:CARG1->hmask
+ | add CARG2w, CARG2w, #1
+ | cmp CARG2w, TMP1w
+ | ldr PC, [BASE, FRAME_PC]
+ | add TMP2, CARG2, TISNUM
+ | mov RC, #(0+1)*8
+ | str TMP2, [BASE, #-16]
+ | bhs >2 // Not in array part?
+ | ldr TMP0, [CARG3, CARG2, lsl #3]
+ |1:
+ | mov TMP1, #(2+1)*8
+ | cmp TMP0, TISNIL
+ | str TMP0, [BASE, #-8]
+ | csel RC, RC, TMP1, eq
+ | b ->fff_res
+ |2: // Check for empty hash part first. Otherwise call C function.
+ | cbz TMP0w, ->fff_res
+ | bl extern lj_tab_getinth // (GCtab *t, int32_t key)
+ | // Returns cTValue * or NULL.
+ | cbz CRET1, ->fff_res
+ | ldr TMP0, [CRET1]
+ | b <1
+ |
+ |.ffunc_1 ipairs
+ | checktp TMP1, CARG1, LJ_TTAB, ->fff_fallback
+#if LJ_52
+ | ldr TAB:CARG2, TAB:TMP1->metatable
+#endif
+ | ldr CFUNC:CARG4, CFUNC:CARG3->upvalue[0]
+ | ldr PC, [BASE, FRAME_PC]
+#if LJ_52
+ | cbnz TAB:CARG2, ->fff_fallback
+#endif
+ | mov RC, #(3+1)*8
+ | stp CARG1, TISNUM, [BASE, #-8]
+ | str CFUNC:CARG4, [BASE, #-16]
+ | b ->fff_res
+ |
+ |//-- Base library: catch errors ----------------------------------------
+ |
+ |.ffunc pcall
+ | ldrb TMP0w, GL->hookmask
+ | subs NARGS8:RC, NARGS8:RC, #8
+ | blo ->fff_fallback
+ | mov RB, BASE
+ | add BASE, BASE, #16
+ | ubfx TMP0w, TMP0w, #HOOK_ACTIVE_SHIFT, #1
+ | add PC, TMP0, #16+FRAME_PCALL
+ | beq ->vm_call_dispatch
+ |1:
+ | add TMP2, BASE, NARGS8:RC
+ |2:
+ | ldr TMP0, [TMP2, #-16]
+ | str TMP0, [TMP2, #-8]!
+ | cmp TMP2, BASE
+ | bne <2
+ | b ->vm_call_dispatch
+ |
+ |.ffunc xpcall
+ | ldp CARG1, CARG2, [BASE]
+ | ldrb TMP0w, GL->hookmask
+ | subs NARGS8:RC, NARGS8:RC, #16
+ | blo ->fff_fallback
+ | mov RB, BASE
+ | add BASE, BASE, #24
+ | asr ITYPE, CARG2, #47
+ | ubfx TMP0w, TMP0w, #HOOK_ACTIVE_SHIFT, #1
+ | cmn ITYPE, #-LJ_TFUNC
+ | add PC, TMP0, #24+FRAME_PCALL
+ | bne ->fff_fallback // Traceback must be a function.
+ | stp CARG2, CARG1, [RB] // Swap function and traceback.
+ | cbz NARGS8:RC, ->vm_call_dispatch
+ | b <1
+ |
+ |//-- Coroutine library --------------------------------------------------
+ |
+ |.macro coroutine_resume_wrap, resume
+ |.if resume
+ |.ffunc_1 coroutine_resume
+ | checktp CARG1, LJ_TTHREAD, ->fff_fallback
+ |.else
+ |.ffunc coroutine_wrap_aux
+ | ldr L:CARG1, CFUNC:CARG3->upvalue[0].gcr
+ | and L:CARG1, CARG1, #LJ_GCVMASK
+ |.endif
+ | ldr PC, [BASE, FRAME_PC]
+ | str BASE, L->base
+ | ldp RB, CARG2, L:CARG1->base
+ | ldrb TMP1w, L:CARG1->status
+ | add TMP0, CARG2, TMP1
+ | str PC, SAVE_PC
+ | cmp TMP0, RB
+ | beq ->fff_fallback
+ | cmp TMP1, #LUA_YIELD
+ | add TMP0, CARG2, #8
+ | csel CARG2, CARG2, TMP0, hs
+ | ldr CARG4, L:CARG1->maxstack
+ | add CARG3, CARG2, NARGS8:RC
+ | ldr RB, L:CARG1->cframe
+ | ccmp CARG3, CARG4, #2, ls
+ | ccmp RB, #0, #2, ls
+ | bhi ->fff_fallback
+ |.if resume
+ | sub CARG3, CARG3, #8 // Keep resumed thread in stack for GC.
+ | add BASE, BASE, #8
+ | sub NARGS8:RC, NARGS8:RC, #8
+ |.endif
+ | str CARG3, L:CARG1->top
+ | str BASE, L->top
+ | cbz NARGS8:RC, >3
+ |2: // Move args to coroutine.
+ | ldr TMP0, [BASE, RB]
+ | cmp RB, NARGS8:RC
+ | str TMP0, [CARG2, RB]
+ | add RB, RB, #8
+ | bne <2
+ |3:
+ | mov CARG3, #0
+ | mov L:RA, L:CARG1
+ | mov CARG4, #0
+ | bl ->vm_resume // (lua_State *L, TValue *base, 0, 0)
+ | // Returns thread status.
+ |4:
+ | ldp CARG3, CARG4, L:RA->base
+ | cmp CRET1, #LUA_YIELD
+ | ldr BASE, L->base
+ | str L, GL->cur_L
+ | st_vmstate ST_INTERP
+ | bhi >8
+ | sub RC, CARG4, CARG3
+ | ldr CARG1, L->maxstack
+ | add CARG2, BASE, RC
+ | cbz RC, >6 // No results?
+ | cmp CARG2, CARG1
+ | mov RB, #0
+ | bhi >9 // Need to grow stack?
+ |
+ | sub CARG4, RC, #8
+ | str CARG3, L:RA->top // Clear coroutine stack.
+ |5: // Move results from coroutine.
+ | ldr TMP0, [CARG3, RB]
+ | cmp RB, CARG4
+ | str TMP0, [BASE, RB]
+ | add RB, RB, #8
+ | bne <5
+ |6:
+ |.if resume
+ | mov_true TMP1
+ | add RC, RC, #16
+ |7:
+ | str TMP1, [BASE, #-8] // Prepend true/false to results.
+ | sub RA, BASE, #8
+ |.else
+ | mov RA, BASE
+ | add RC, RC, #8
+ |.endif
+ | ands CARG1, PC, #FRAME_TYPE
+ | str PC, SAVE_PC
+ | str RCw, SAVE_MULTRES
+ | beq ->BC_RET_Z
+ | b ->vm_return
+ |
+ |8: // Coroutine returned with error (at co->top-1).
+ |.if resume
+ | ldr TMP0, [CARG4, #-8]!
+ | mov_false TMP1
+ | mov RC, #(2+1)*8
+ | str CARG4, L:RA->top // Remove error from coroutine stack.
+ | str TMP0, [BASE] // Copy error message.
+ | b <7
+ |.else
+ | mov CARG1, L
+ | mov CARG2, L:RA
+ | bl extern lj_ffh_coroutine_wrap_err // (lua_State *L, lua_State *co)
+ | // Never returns.
+ |.endif
+ |
+ |9: // Handle stack expansion on return from yield.
+ | mov CARG1, L
+ | lsr CARG2, RC, #3
+ | bl extern lj_state_growstack // (lua_State *L, int n)
+ | mov CRET1, #0
+ | b <4
+ |.endmacro
+ |
+ | coroutine_resume_wrap 1 // coroutine.resume
+ | coroutine_resume_wrap 0 // coroutine.wrap
+ |
+ |.ffunc coroutine_yield
+ | ldr TMP0, L->cframe
+ | add TMP1, BASE, NARGS8:RC
+ | mov CRET1, #LUA_YIELD
+ | stp BASE, TMP1, L->base
+ | tbz TMP0, #0, ->fff_fallback
+ | str xzr, L->cframe
+ | strb CRET1w, L->status
+ | b ->vm_leave_unw
+ |
+ |//-- Math library -------------------------------------------------------
+ |
+ |.macro math_round, func, round
+ | .ffunc math_ .. func
+ | ldr CARG1, [BASE]
+ | cmp NARGS8:RC, #8
+ | ldr d0, [BASE]
+ | blo ->fff_fallback
+ | cmp TISNUMhi, CARG1, lsr #32
+ | beq ->fff_restv
+ | blo ->fff_fallback
+ | round d0, d0
+ | b ->fff_resn
+ |.endmacro
+ |
+ | math_round floor, frintm
+ | math_round ceil, frintp
+ |
+ |.ffunc_1 math_abs
+ | checknumber CARG1, ->fff_fallback
+ | and CARG1, CARG1, #U64x(7fffffff,ffffffff)
+ | bne ->fff_restv
+ | eor CARG2w, CARG1w, CARG1w, asr #31
+ | movz CARG3, #0x41e0, lsl #48 // 2^31.
+ | subs CARG1w, CARG2w, CARG1w, asr #31
+ | add CARG1, CARG1, TISNUM
+ | csel CARG1, CARG1, CARG3, pl
+ | // Fallthrough.
+ |
+ |->fff_restv:
+ | // CARG1 = TValue result.
+ | ldr PC, [BASE, FRAME_PC]
+ | str CARG1, [BASE, #-16]
+ |->fff_res1:
+ | // PC = return.
+ | mov RC, #(1+1)*8
+ |->fff_res:
+ | // RC = (nresults+1)*8, PC = return.
+ | ands CARG1, PC, #FRAME_TYPE
+ | str RCw, SAVE_MULTRES
+ | sub RA, BASE, #16
+ | bne ->vm_return
+ | ldr INSw, [PC, #-4]
+ | decode_RB RB, INS
+ |5:
+ | cmp RC, RB, lsl #3 // More results expected?
+ | blo >6
+ | decode_RA TMP1, INS
+ | // Adjust BASE. KBASE is assumed to be set for the calling frame.
+ | sub BASE, RA, TMP1, lsl #3
+ | ins_next
+ |
+ |6: // Fill up results with nil.
+ | add TMP1, RA, RC
+ | add RC, RC, #8
+ | str TISNIL, [TMP1, #-8]
+ | b <5
+ |
+ |.macro math_extern, func
+ | .ffunc_n math_ .. func
+ | bl extern func
+ | b ->fff_resn
+ |.endmacro
+ |
+ |.macro math_extern2, func
+ | .ffunc_nn math_ .. func
+ | bl extern func
+ | b ->fff_resn
+ |.endmacro
+ |
+ |.ffunc_n math_sqrt
+ | fsqrt d0, d0
+ |->fff_resn:
+ | ldr PC, [BASE, FRAME_PC]
+ | str d0, [BASE, #-16]
+ | b ->fff_res1
+ |
+ |.ffunc math_log
+ | ldr CARG1, [BASE]
+ | cmp NARGS8:RC, #8
+ | ldr FARG1, [BASE]
+ | bne ->fff_fallback // Need exactly 1 argument.
+ | checknum CARG1, ->fff_fallback
+ | bl extern log
+ | b ->fff_resn
+ |
+ | math_extern log10
+ | math_extern exp
+ | math_extern sin
+ | math_extern cos
+ | math_extern tan
+ | math_extern asin
+ | math_extern acos
+ | math_extern atan
+ | math_extern sinh
+ | math_extern cosh
+ | math_extern tanh
+ | math_extern2 pow
+ | math_extern2 atan2
+ | math_extern2 fmod
+ |
+ |.ffunc_2 math_ldexp
+ | ldr FARG1, [BASE]
+ | checknum CARG1, ->fff_fallback
+ | checkint CARG2, ->fff_fallback
+ | sxtw CARG1, CARG2w
+ | bl extern ldexp // (double x, int exp)
+ | b ->fff_resn
+ |
+ |.ffunc_n math_frexp
+ | add CARG1, sp, TMPDofs
+ | bl extern frexp
+ | ldr CARG2w, TMPD
+ | ldr PC, [BASE, FRAME_PC]
+ | str d0, [BASE, #-16]
+ | mov RC, #(2+1)*8
+ | add CARG2, CARG2, TISNUM
+ | str CARG2, [BASE, #-8]
+ | b ->fff_res
+ |
+ |.ffunc_n math_modf
+ | sub CARG1, BASE, #16
+ | ldr PC, [BASE, FRAME_PC]
+ | bl extern modf
+ | mov RC, #(2+1)*8
+ | str d0, [BASE, #-8]
+ | b ->fff_res
+ |
+ |.macro math_minmax, name, cond, fcond
+ | .ffunc_1 name
+ | add RB, BASE, RC
+ | add RA, BASE, #8
+ | checkint CARG1, >4
+ |1: // Handle integers.
+ | ldr CARG2, [RA]
+ | cmp RA, RB
+ | bhs ->fff_restv
+ | checkint CARG2, >3
+ | cmp CARG1w, CARG2w
+ | add RA, RA, #8
+ | csel CARG1, CARG2, CARG1, cond
+ | b <1
+ |3: // Convert intermediate result to number and continue below.
+ | scvtf d0, CARG1w
+ | blo ->fff_fallback
+ | ldr d1, [RA]
+ | b >6
+ |
+ |4:
+ | ldr d0, [BASE]
+ | blo ->fff_fallback
+ |5: // Handle numbers.
+ | ldr CARG2, [RA]
+ | ldr d1, [RA]
+ | cmp RA, RB
+ | bhs ->fff_resn
+ | checknum CARG2, >7
+ |6:
+ | fcmp d0, d1
+ | add RA, RA, #8
+ | fcsel d0, d1, d0, fcond
+ | b <5
+ |7: // Convert integer to number and continue above.
+ | scvtf d1, CARG2w
+ | blo ->fff_fallback
+ | b <6
+ |.endmacro
+ |
+ | math_minmax math_min, gt, hi
+ | math_minmax math_max, lt, lo
+ |
+ |//-- String library -----------------------------------------------------
+ |
+ |.ffunc string_byte // Only handle the 1-arg case here.
+ | ldp PC, CARG1, [BASE, FRAME_PC]
+ | cmp NARGS8:RC, #8
+ | asr ITYPE, CARG1, #47
+ | ccmn ITYPE, #-LJ_TSTR, #0, eq
+ | and STR:CARG1, CARG1, #LJ_GCVMASK
+ | bne ->fff_fallback
+ | ldrb TMP0w, STR:CARG1[1] // Access is always ok (NUL at end).
+ | ldr CARG3w, STR:CARG1->len
+ | add TMP0, TMP0, TISNUM
+ | str TMP0, [BASE, #-16]
+ | mov RC, #(0+1)*8
+ | cbz CARG3, ->fff_res
+ | b ->fff_res1
+ |
+ |.ffunc string_char // Only handle the 1-arg case here.
+ | ffgccheck
+ | ldp PC, CARG1, [BASE, FRAME_PC]
+ | cmp CARG1w, #255
+ | ccmp NARGS8:RC, #8, #0, ls // Need exactly 1 argument.
+ | bne ->fff_fallback
+ | checkint CARG1, ->fff_fallback
+ | mov CARG3, #1
+ | mov CARG2, BASE // Points to stack. Little-endian.
+ |->fff_newstr:
+ | // CARG2 = str, CARG3 = len.
+ | str BASE, L->base
+ | mov CARG1, L
+ | str PC, SAVE_PC
+ | bl extern lj_str_new // (lua_State *L, char *str, size_t l)
+ |->fff_resstr:
+ | // Returns GCstr *.
+ | ldr BASE, L->base
+ | movn TMP1, #~LJ_TSTR
+ | add CARG1, CARG1, TMP1, lsl #47
+ | b ->fff_restv
+ |
+ |.ffunc string_sub
+ | ffgccheck
+ | ldr CARG1, [BASE]
+ | ldr CARG3, [BASE, #16]
+ | cmp NARGS8:RC, #16
+ | movn RB, #0
+ | beq >1
+ | blo ->fff_fallback
+ | checkint CARG3, ->fff_fallback
+ | sxtw RB, CARG3w
+ |1:
+ | ldr CARG2, [BASE, #8]
+ | checkstr CARG1, ->fff_fallback
+ | ldr TMP1w, STR:CARG1->len
+ | checkint CARG2, ->fff_fallback
+ | sxtw CARG2, CARG2w
+ | // CARG1 = str, TMP1 = str->len, CARG2 = start, RB = end
+ | add TMP2, RB, TMP1
+ | cmp RB, #0
+ | add TMP0, CARG2, TMP1
+ | csinc RB, RB, TMP2, ge // if (end < 0) end += len+1
+ | cmp CARG2, #0
+ | csinc CARG2, CARG2, TMP0, ge // if (start < 0) start += len+1
+ | cmp RB, #0
+ | csel RB, RB, xzr, ge // if (end < 0) end = 0
+ | cmp CARG2, #1
+ | csinc CARG2, CARG2, xzr, ge // if (start < 1) start = 1
+ | cmp RB, TMP1
+ | csel RB, RB, TMP1, le // if (end > len) end = len
+ | add CARG1, STR:CARG1, #sizeof(GCstr)-1
+ | subs CARG3, RB, CARG2 // len = end - start
+ | add CARG2, CARG1, CARG2
+ | add CARG3, CARG3, #1 // len += 1
+ | bge ->fff_newstr
+ | add STR:CARG1, GL, #offsetof(global_State, strempty)
+ | movn TMP1, #~LJ_TSTR
+ | add CARG1, CARG1, TMP1, lsl #47
+ | b ->fff_restv
+ |
+ |.macro ffstring_op, name
+ | .ffunc string_ .. name
+ | ffgccheck
+ | ldr CARG2, [BASE]
+ | cmp NARGS8:RC, #8
+ | asr ITYPE, CARG2, #47
+ | ccmn ITYPE, #-LJ_TSTR, #0, hs
+ | and STR:CARG2, CARG2, #LJ_GCVMASK
+ | bne ->fff_fallback
+ | ldr TMP0, GL->tmpbuf.b
+ | add SBUF:CARG1, GL, #offsetof(global_State, tmpbuf)
+ | str BASE, L->base
+ | str PC, SAVE_PC
+ | str L, GL->tmpbuf.L
+ | str TMP0, GL->tmpbuf.p
+ | bl extern lj_buf_putstr_ .. name
+ | bl extern lj_buf_tostr
+ | b ->fff_resstr
+ |.endmacro
+ |
+ |ffstring_op reverse
+ |ffstring_op lower
+ |ffstring_op upper
+ |
+ |//-- Bit library --------------------------------------------------------
+ |
+ |// FP number to bit conversion for soft-float. Clobbers CARG1-CARG3
+ |->vm_tobit_fb:
+ | bls ->fff_fallback
+ | add CARG2, CARG1, CARG1
+ | mov CARG3, #1076
+ | sub CARG3, CARG3, CARG2, lsr #53
+ | cmp CARG3, #53
+ | bhi >1
+ | and CARG2, CARG2, #U64x(001fffff,ffffffff)
+ | orr CARG2, CARG2, #U64x(00200000,00000000)
+ | cmp CARG1, #0
+ | lsr CARG2, CARG2, CARG3
+ | cneg CARG1w, CARG2w, mi
+ | br lr
+ |1:
+ | mov CARG1w, #0
+ | br lr
+ |
+ |.macro .ffunc_bit, name
+ | .ffunc_1 bit_..name
+ | adr lr, >1
+ | checkint CARG1, ->vm_tobit_fb
+ |1:
+ |.endmacro
+ |
+ |.macro .ffunc_bit_op, name, ins
+ | .ffunc_bit name
+ | mov RA, #8
+ | mov TMP0w, CARG1w
+ | adr lr, >2
+ |1:
+ | ldr CARG1, [BASE, RA]
+ | cmp RA, NARGS8:RC
+ | add RA, RA, #8
+ | bge >9
+ | checkint CARG1, ->vm_tobit_fb
+ |2:
+ | ins TMP0w, TMP0w, CARG1w
+ | b <1
+ |.endmacro
+ |
+ |.ffunc_bit_op band, and
+ |.ffunc_bit_op bor, orr
+ |.ffunc_bit_op bxor, eor
+ |
+ |.ffunc_bit tobit
+ | mov TMP0w, CARG1w
+ |9: // Label reused by .ffunc_bit_op users.
+ | add CARG1, TMP0, TISNUM
+ | b ->fff_restv
+ |
+ |.ffunc_bit bswap
+ | rev TMP0w, CARG1w
+ | add CARG1, TMP0, TISNUM
+ | b ->fff_restv
+ |
+ |.ffunc_bit bnot
+ | mvn TMP0w, CARG1w
+ | add CARG1, TMP0, TISNUM
+ | b ->fff_restv
+ |
+ |.macro .ffunc_bit_sh, name, ins, shmod
+ | .ffunc bit_..name
+ | ldp TMP0, CARG1, [BASE]
+ | cmp NARGS8:RC, #16
+ | blo ->fff_fallback
+ | adr lr, >1
+ | checkint CARG1, ->vm_tobit_fb
+ |1:
+ |.if shmod == 0
+ | mov TMP1, CARG1
+ |.else
+ | neg TMP1, CARG1
+ |.endif
+ | mov CARG1, TMP0
+ | adr lr, >2
+ | checkint CARG1, ->vm_tobit_fb
+ |2:
+ | ins TMP0w, CARG1w, TMP1w
+ | add CARG1, TMP0, TISNUM
+ | b ->fff_restv
+ |.endmacro
+ |
+ |.ffunc_bit_sh lshift, lsl, 0
+ |.ffunc_bit_sh rshift, lsr, 0
+ |.ffunc_bit_sh arshift, asr, 0
+ |.ffunc_bit_sh rol, ror, 1
+ |.ffunc_bit_sh ror, ror, 0
+ |
+ |//-----------------------------------------------------------------------
+ |
+ |->fff_fallback: // Call fast function fallback handler.
+ | // BASE = new base, RC = nargs*8
+ | ldp CFUNC:CARG3, PC, [BASE, FRAME_FUNC] // Fallback may overwrite PC.
+ | ldr TMP2, L->maxstack
+ | add TMP1, BASE, NARGS8:RC
+ | stp BASE, TMP1, L->base
+ | and CFUNC:CARG3, CARG3, #LJ_GCVMASK
+ | add TMP1, TMP1, #8*LUA_MINSTACK
+ | ldr CARG3, CFUNC:CARG3->f
+ | str PC, SAVE_PC // Redundant (but a defined value).
+ | cmp TMP1, TMP2
+ | mov CARG1, L
+ | bhi >5 // Need to grow stack.
+ | blr CARG3 // (lua_State *L)
+ | // Either throws an error, or recovers and returns -1, 0 or nresults+1.
+ | ldr BASE, L->base
+ | cmp CRET1w, #0
+ | lsl RC, CRET1, #3
+ | sub RA, BASE, #16
+ | bgt ->fff_res // Returned nresults+1?
+ |1: // Returned 0 or -1: retry fast path.
+ | ldr CARG1, L->top
+ | ldr CFUNC:CARG3, [BASE, FRAME_FUNC]
+ | sub NARGS8:RC, CARG1, BASE
+ | bne ->vm_call_tail // Returned -1?
+ | and CFUNC:CARG3, CARG3, #LJ_GCVMASK
+ | ins_callt // Returned 0: retry fast path.
+ |
+ |// Reconstruct previous base for vmeta_call during tailcall.
+ |->vm_call_tail:
+ | ands TMP0, PC, #FRAME_TYPE
+ | and TMP1, PC, #~FRAME_TYPEP
+ | bne >3
+ | ldrb RAw, [PC, #-3]
+ | lsl RA, RA, #3
+ | add TMP1, RA, #16
+ |3:
+ | sub RB, BASE, TMP1
+ | b ->vm_call_dispatch // Resolve again for tailcall.
+ |
+ |5: // Grow stack for fallback handler.
+ | mov CARG2, #LUA_MINSTACK
+ | bl extern lj_state_growstack // (lua_State *L, int n)
+ | ldr BASE, L->base
+ | cmp CARG1, CARG1 // Set zero-flag to force retry.
+ | b <1
+ |
+ |->fff_gcstep: // Call GC step function.
+ | // BASE = new base, RC = nargs*8
+ | add CARG2, BASE, NARGS8:RC // Calculate L->top.
+ | mov RA, lr
+ | stp BASE, CARG2, L->base
+ | str PC, SAVE_PC // Redundant (but a defined value).
+ | mov CARG1, L
+ | bl extern lj_gc_step // (lua_State *L)
+ | ldp BASE, CARG2, L->base
+ | ldr CFUNC:CARG3, [BASE, FRAME_FUNC]
+ | mov lr, RA // Help return address predictor.
+ | sub NARGS8:RC, CARG2, BASE // Calculate nargs*8.
+ | and CFUNC:CARG3, CARG3, #LJ_GCVMASK
+ | ret
+ |
+ |//-----------------------------------------------------------------------
+ |//-- Special dispatch targets -------------------------------------------
+ |//-----------------------------------------------------------------------
+ |
+ |->vm_record: // Dispatch target for recording phase.
+ | NYI
+ |
+ |->vm_rethook: // Dispatch target for return hooks.
+ | ldrb TMP2w, GL->hookmask
+ | tbz TMP2w, #HOOK_ACTIVE_SHIFT, >1 // Hook already active?
+ |5: // Re-dispatch to static ins.
+ | ldr TMP0, [TMP1, #GG_G2DISP+GG_DISP2STATIC]
+ | br TMP0
+ |
+ |->vm_inshook: // Dispatch target for instr/line hooks.
+ | ldrb TMP2w, GL->hookmask
+ | ldr TMP3w, GL->hookcount
+ | tbnz TMP2w, #HOOK_ACTIVE_SHIFT, <5 // Hook already active?
+ | tst TMP2w, #LUA_MASKLINE|LUA_MASKCOUNT
+ | beq <5
+ | sub TMP3w, TMP3w, #1
+ | str TMP3w, GL->hookcount
+ | cbz TMP3w, >1
+ | tbz TMP2w, #LUA_HOOKLINE, <5
+ |1:
+ | mov CARG1, L
+ | str BASE, L->base
+ | mov CARG2, PC
+ | // SAVE_PC must hold the _previous_ PC. The callee updates it with PC.
+ | bl extern lj_dispatch_ins // (lua_State *L, const BCIns *pc)
+ |3:
+ | ldr BASE, L->base
+ |4: // Re-dispatch to static ins.
+ | ldr INSw, [PC, #-4]
+ | add TMP1, GL, INS, uxtb #3
+ | decode_RA RA, INS
+ | ldr TMP0, [TMP1, #GG_G2DISP+GG_DISP2STATIC]
+ | decode_RD RC, INS
+ | br TMP0
+ |
+ |->cont_hook: // Continue from hook yield.
+ | ldr CARG1, [CARG4, #-40]
+ | add PC, PC, #4
+ | str CARG1w, SAVE_MULTRES // Restore MULTRES for *M ins.
+ | b <4
+ |
+ |->vm_hotloop: // Hot loop counter underflow.
+ | NYI
+ |
+ |->vm_callhook: // Dispatch target for call hooks.
+ | mov CARG2, PC
+ |.if JIT
+ | b >1
+ |.endif
+ |
+ |->vm_hotcall: // Hot call counter underflow.
+ |.if JIT
+ | orr CARG2, PC, #1
+ |1:
+ |.endif
+ | add TMP1, BASE, NARGS8:RC
+ | str PC, SAVE_PC
+ | mov CARG1, L
+ | sub RA, RA, BASE
+ | stp BASE, TMP1, L->base
+ | bl extern lj_dispatch_call // (lua_State *L, const BCIns *pc)
+ | // Returns ASMFunction.
+ | ldp BASE, TMP1, L->base
+ | str xzr, SAVE_PC // Invalidate for subsequent line hook.
+ | ldr LFUNC:CARG3, [BASE, FRAME_FUNC]
+ | add RA, BASE, RA
+ | sub NARGS8:RC, TMP1, BASE
+ | ldr INSw, [PC, #-4]
+ | and LFUNC:CARG3, CARG3, #LJ_GCVMASK
+ | br CRET1
+ |
+ |->cont_stitch: // Trace stitching.
+ | NYI
+ |
+ |->vm_profhook: // Dispatch target for profiler hook.
+#if LJ_HASPROFILE
+ | mov CARG1, L
+ | str BASE, L->base
+ | mov CARG2, PC
+ | bl extern lj_dispatch_profile // (lua_State *L, const BCIns *pc)
+ | // HOOK_PROFILE is off again, so re-dispatch to dynamic instruction.
+ | ldr BASE, L->base
+ | sub PC, PC, #4
+ | b ->cont_nop
+#endif
+ |
+ |//-----------------------------------------------------------------------
+ |//-- Trace exit handler -------------------------------------------------
+ |//-----------------------------------------------------------------------
+ |
+ |->vm_exit_handler:
+ | NYI
+ |->vm_exit_interp:
+ | NYI
+ |
+ |//-----------------------------------------------------------------------
+ |//-- Math helper functions ----------------------------------------------
+ |//-----------------------------------------------------------------------
+ |
+ | // int lj_vm_modi(int dividend, int divisor);
+ |->vm_modi:
+ | eor CARG4w, CARG1w, CARG2w
+ | cmp CARG4w, #0
+ | eor CARG3w, CARG1w, CARG1w, asr #31
+ | eor CARG4w, CARG2w, CARG2w, asr #31
+ | sub CARG3w, CARG3w, CARG1w, asr #31
+ | sub CARG4w, CARG4w, CARG2w, asr #31
+ | udiv CARG1w, CARG3w, CARG4w
+ | msub CARG1w, CARG1w, CARG4w, CARG3w
+ | ccmp CARG1w, #0, #4, mi
+ | sub CARG3w, CARG1w, CARG4w
+ | csel CARG1w, CARG1w, CARG3w, eq
+ | eor CARG3w, CARG1w, CARG2w
+ | cmp CARG3w, #0
+ | cneg CARG1w, CARG1w, mi
+ | ret
+ |
+ |//-----------------------------------------------------------------------
+ |//-- Miscellaneous functions --------------------------------------------
+ |//-----------------------------------------------------------------------
+ |
+ |//-----------------------------------------------------------------------
+ |//-- FFI helper functions -----------------------------------------------
+ |//-----------------------------------------------------------------------
+ |
+ |// Handler for callback functions.
+ |// Saveregs already performed. Callback slot number in [sp], g in r12.
+ |->vm_ffi_callback:
+ |.if FFI
+ |.type CTSTATE, CTState, PC
+ | saveregs
+ | ldr CTSTATE, GL:x10->ctype_state
+ | mov GL, x10
+ | add x10, sp, # CFRAME_SPACE
+ | str w9, CTSTATE->cb.slot
+ | stp x0, x1, CTSTATE->cb.gpr[0]
+ | stp d0, d1, CTSTATE->cb.fpr[0]
+ | stp x2, x3, CTSTATE->cb.gpr[2]
+ | stp d2, d3, CTSTATE->cb.fpr[2]
+ | stp x4, x5, CTSTATE->cb.gpr[4]
+ | stp d4, d5, CTSTATE->cb.fpr[4]
+ | stp x6, x7, CTSTATE->cb.gpr[6]
+ | stp d6, d7, CTSTATE->cb.fpr[6]
+ | str x10, CTSTATE->cb.stack
+ | mov CARG1, CTSTATE
+ | str CTSTATE, SAVE_PC // Any value outside of bytecode is ok.
+ | mov CARG2, sp
+ | bl extern lj_ccallback_enter // (CTState *cts, void *cf)
+ | // Returns lua_State *.
+ | ldp BASE, RC, L:CRET1->base
+ | movz TISNUM, #(LJ_TISNUM>>1)&0xffff, lsl #48
+ | movz TISNUMhi, #(LJ_TISNUM>>1)&0xffff, lsl #16
+ | movn TISNIL, #0
+ | mov L, CRET1
+ | ldr LFUNC:CARG3, [BASE, FRAME_FUNC]
+ | sub RC, RC, BASE
+ | st_vmstate ST_INTERP
+ | and LFUNC:CARG3, CARG3, #LJ_GCVMASK
+ | ins_callt
+ |.endif
+ |
+ |->cont_ffi_callback: // Return from FFI callback.
+ |.if FFI
+ | ldr CTSTATE, GL->ctype_state
+ | stp BASE, CARG4, L->base
+ | str L, CTSTATE->L
+ | mov CARG1, CTSTATE
+ | mov CARG2, RA
+ | bl extern lj_ccallback_leave // (CTState *cts, TValue *o)
+ | ldp x0, x1, CTSTATE->cb.gpr[0]
+ | ldp d0, d1, CTSTATE->cb.fpr[0]
+ | b ->vm_leave_unw
+ |.endif
+ |
+ |->vm_ffi_call: // Call C function via FFI.
+ | // Caveat: needs special frame unwinding, see below.
+ |.if FFI
+ | .type CCSTATE, CCallState, x19
+ | stp fp, lr, [sp, #-32]!
+ | add fp, sp, #0
+ | str CCSTATE, [sp, #16]
+ | mov CCSTATE, x0
+ | ldr TMP0w, CCSTATE:x0->spadj
+ | ldrb TMP1w, CCSTATE->nsp
+ | add TMP2, CCSTATE, #offsetof(CCallState, stack)
+ | subs TMP1, TMP1, #1
+ | ldr TMP3, CCSTATE->func
+ | sub sp, fp, TMP0
+ | bmi >2
+ |1: // Copy stack slots
+ | ldr TMP0, [TMP2, TMP1, lsl #3]
+ | str TMP0, [sp, TMP1, lsl #3]
+ | subs TMP1, TMP1, #1
+ | bpl <1
+ |2:
+ | ldp x0, x1, CCSTATE->gpr[0]
+ | ldp d0, d1, CCSTATE->fpr[0]
+ | ldp x2, x3, CCSTATE->gpr[2]
+ | ldp d2, d3, CCSTATE->fpr[2]
+ | ldp x4, x5, CCSTATE->gpr[4]
+ | ldp d4, d5, CCSTATE->fpr[4]
+ | ldp x6, x7, CCSTATE->gpr[6]
+ | ldp d6, d7, CCSTATE->fpr[6]
+ | ldr x8, CCSTATE->retp
+ | blr TMP3
+ | mov sp, fp
+ | stp x0, x1, CCSTATE->gpr[0]
+ | stp d0, d1, CCSTATE->fpr[0]
+ | stp d2, d3, CCSTATE->fpr[2]
+ | ldr CCSTATE, [sp, #16]
+ | ldp fp, lr, [sp], #32
+ | ret
+ |.endif
+ |// Note: vm_ffi_call must be the last function in this object file!
+ |
+ |//-----------------------------------------------------------------------
+}
+
+/* Generate the code for a single instruction. */
+static void build_ins(BuildCtx *ctx, BCOp op, int defop)
+{
+ int vk = 0;
+ |=>defop:
+
+ switch (op) {
+
+ /* -- Comparison ops ---------------------------------------------------- */
+
+ /* Remember: all ops branch for a true comparison, fall through otherwise. */
+
+ case BC_ISLT: case BC_ISGE: case BC_ISLE: case BC_ISGT:
+ | // RA = src1, RC = src2, JMP with RC = target
+ | ldr CARG1, [BASE, RA, lsl #3]
+ | ldrh RBw, [PC, #2]
+ | ldr CARG2, [BASE, RC, lsl #3]
+ | add PC, PC, #4
+ | add RB, PC, RB, lsl #2
+ | sub RB, RB, #0x20000
+ | checkint CARG1, >3
+ | checkint CARG2, >4
+ | cmp CARG1w, CARG2w
+ if (op == BC_ISLT) {
+ | csel PC, RB, PC, lt
+ } else if (op == BC_ISGE) {
+ | csel PC, RB, PC, ge
+ } else if (op == BC_ISLE) {
+ | csel PC, RB, PC, le
+ } else {
+ | csel PC, RB, PC, gt
+ }
+ |1:
+ | ins_next
+ |
+ |3: // RA not int.
+ | ldr FARG1, [BASE, RA, lsl #3]
+ | blo ->vmeta_comp
+ | ldr FARG2, [BASE, RC, lsl #3]
+ | cmp TISNUMhi, CARG2, lsr #32
+ | bhi >5
+ | bne ->vmeta_comp
+ | // RA number, RC int.
+ | scvtf FARG2, CARG2w
+ | b >5
+ |
+ |4: // RA int, RC not int
+ | ldr FARG2, [BASE, RC, lsl #3]
+ | blo ->vmeta_comp
+ | // RA int, RC number.
+ | scvtf FARG1, CARG1w
+ |
+ |5: // RA number, RC number
+ | fcmp FARG1, FARG2
+ | // To preserve NaN semantics GE/GT branch on unordered, but LT/LE don't.
+ if (op == BC_ISLT) {
+ | csel PC, RB, PC, lo
+ } else if (op == BC_ISGE) {
+ | csel PC, RB, PC, hs
+ } else if (op == BC_ISLE) {
+ | csel PC, RB, PC, ls
+ } else {
+ | csel PC, RB, PC, hi
+ }
+ | b <1
+ break;
+
+ case BC_ISEQV: case BC_ISNEV:
+ vk = op == BC_ISEQV;
+ | // RA = src1, RC = src2, JMP with RC = target
+ | ldr CARG1, [BASE, RA, lsl #3]
+ | add RC, BASE, RC, lsl #3
+ | ldrh RBw, [PC, #2]
+ | ldr CARG3, [RC]
+ | add PC, PC, #4
+ | add RB, PC, RB, lsl #2
+ | sub RB, RB, #0x20000
+ | asr ITYPE, CARG3, #47
+ | cmn ITYPE, #-LJ_TISNUM
+ if (vk) {
+ | bls ->BC_ISEQN_Z
+ } else {
+ | bls ->BC_ISNEN_Z
+ }
+ | // RC is not a number.
+ | asr TMP0, CARG1, #47
+ |.if FFI
+ | // Check if RC or RA is a cdata.
+ | cmn ITYPE, #-LJ_TCDATA
+ | ccmn TMP0, #-LJ_TCDATA, #4, ne
+ | beq ->vmeta_equal_cd
+ |.endif
+ | cmp CARG1, CARG3
+ | bne >2
+ | // Tag and value are equal.
+ if (vk) {
+ |->BC_ISEQV_Z:
+ | mov PC, RB // Perform branch.
+ }
+ |1:
+ | ins_next
+ |
+ |2: // Check if the tags are the same and it's a table or userdata.
+ | cmp ITYPE, TMP0
+ | ccmn ITYPE, #-LJ_TISTABUD, #2, eq
+ if (vk) {
+ | bhi <1
+ } else {
+ | bhi ->BC_ISEQV_Z // Reuse code from opposite instruction.
+ }
+ | // Different tables or userdatas. Need to check __eq metamethod.
+ | // Field metatable must be at same offset for GCtab and GCudata!
+ | and TAB:CARG2, CARG1, #LJ_GCVMASK
+ | ldr TAB:TMP2, TAB:CARG2->metatable
+ if (vk) {
+ | cbz TAB:TMP2, <1 // No metatable?
+ | ldrb TMP1w, TAB:TMP2->nomm
+ | mov CARG4, #0 // ne = 0
+ | tbnz TMP1w, #MM_eq, <1 // 'no __eq' flag set: done.
+ } else {
+ | cbz TAB:TMP2, ->BC_ISEQV_Z // No metatable?
+ | ldrb TMP1w, TAB:TMP2->nomm
+ | mov CARG4, #1 // ne = 1.
+ | tbnz TMP1w, #MM_eq, ->BC_ISEQV_Z // 'no __eq' flag set: done.
+ }
+ | b ->vmeta_equal
+ break;
+
+ case BC_ISEQS: case BC_ISNES:
+ vk = op == BC_ISEQS;
+ | // RA = src, RC = str_const (~), JMP with RC = target
+ | ldr CARG1, [BASE, RA, lsl #3]
+ | mvn RC, RC
+ | ldrh RBw, [PC, #2]
+ | ldr CARG2, [KBASE, RC, lsl #3]
+ | add PC, PC, #4
+ | movn TMP0, #~LJ_TSTR
+ |.if FFI
+ | asr ITYPE, CARG1, #47
+ |.endif
+ | add RB, PC, RB, lsl #2
+ | add CARG2, CARG2, TMP0, lsl #47
+ | sub RB, RB, #0x20000
+ |.if FFI
+ | cmn ITYPE, #-LJ_TCDATA
+ | beq ->vmeta_equal_cd
+ |.endif
+ | cmp CARG1, CARG2
+ if (vk) {
+ | csel PC, RB, PC, eq
+ } else {
+ | csel PC, RB, PC, ne
+ }
+ | ins_next
+ break;
+
+ case BC_ISEQN: case BC_ISNEN:
+ vk = op == BC_ISEQN;
+ | // RA = src, RC = num_const (~), JMP with RC = target
+ | ldr CARG1, [BASE, RA, lsl #3]
+ | add RC, KBASE, RC, lsl #3
+ | ldrh RBw, [PC, #2]
+ | ldr CARG3, [RC]
+ | add PC, PC, #4
+ | add RB, PC, RB, lsl #2
+ | sub RB, RB, #0x20000
+ if (vk) {
+ |->BC_ISEQN_Z:
+ } else {
+ |->BC_ISNEN_Z:
+ }
+ | checkint CARG1, >4
+ | checkint CARG3, >6
+ | cmp CARG1w, CARG3w
+ |1:
+ if (vk) {
+ | csel PC, RB, PC, eq
+ |2:
+ } else {
+ |2:
+ | csel PC, RB, PC, ne
+ }
+ |3:
+ | ins_next
+ |
+ |4: // RA not int.
+ |.if FFI
+ | blo >7
+ |.else
+ | blo <2
+ |.endif
+ | ldr FARG1, [BASE, RA, lsl #3]
+ | ldr FARG2, [RC]
+ | cmp TISNUMhi, CARG3, lsr #32
+ | bne >5
+ | // RA number, RC int.
+ | scvtf FARG2, CARG3w
+ |5:
+ | // RA number, RC number.
+ | fcmp FARG1, FARG2
+ | b <1
+ |
+ |6: // RA int, RC number
+ | ldr FARG2, [RC]
+ | scvtf FARG1, CARG1w
+ | fcmp FARG1, FARG2
+ | b <1
+ |
+ |.if FFI
+ |7:
+ | asr ITYPE, CARG1, #47
+ | cmn ITYPE, #-LJ_TCDATA
+ | bne <2
+ | b ->vmeta_equal_cd
+ |.endif
+ break;
+
+ case BC_ISEQP: case BC_ISNEP:
+ vk = op == BC_ISEQP;
+ | // RA = src, RC = primitive_type (~), JMP with RC = target
+ | ldr TMP0, [BASE, RA, lsl #3]
+ | ldrh RBw, [PC, #2]
+ | add PC, PC, #4
+ | add RC, RC, #1
+ | add RB, PC, RB, lsl #2
+ |.if FFI
+ | asr ITYPE, TMP0, #47
+ | cmn ITYPE, #-LJ_TCDATA
+ | beq ->vmeta_equal_cd
+ | cmn RC, ITYPE
+ |.else
+ | cmn RC, TMP0, asr #47
+ |.endif
+ | sub RB, RB, #0x20000
+ if (vk) {
+ | csel PC, RB, PC, eq
+ } else {
+ | csel PC, RB, PC, ne
+ }
+ | ins_next
+ break;
+
+ /* -- Unary test and copy ops ------------------------------------------- */
+
+ case BC_ISTC: case BC_ISFC: case BC_IST: case BC_ISF:
+ | // RA = dst or unused, RC = src, JMP with RC = target
+ | ldrh RBw, [PC, #2]
+ | ldr TMP0, [BASE, RC, lsl #3]
+ | add PC, PC, #4
+ | mov_false TMP1
+ | add RB, PC, RB, lsl #2
+ | cmp TMP0, TMP1
+ | sub RB, RB, #0x20000
+ if (op == BC_ISTC || op == BC_IST) {
+ if (op == BC_ISTC) {
+ | csel RA, RA, RC, lo
+ }
+ | csel PC, RB, PC, lo
+ } else {
+ if (op == BC_ISFC) {
+ | csel RA, RA, RC, hs
+ }
+ | csel PC, RB, PC, hs
+ }
+ if (op == BC_ISTC || op == BC_ISFC) {
+ | str TMP0, [BASE, RA, lsl #3]
+ }
+ | ins_next
+ break;
+
+ case BC_ISTYPE:
+ | // RA = src, RC = -type
+ | ldr TMP0, [BASE, RA, lsl #3]
+ | cmn RC, TMP0, asr #47
+ | bne ->vmeta_istype
+ | ins_next
+ break;
+ case BC_ISNUM:
+ | // RA = src, RC = -(TISNUM-1)
+ | ldr TMP0, [BASE, RA]
+ | checknum TMP0, ->vmeta_istype
+ | ins_next
+ break;
+
+ /* -- Unary ops --------------------------------------------------------- */
+
+ case BC_MOV:
+ | // RA = dst, RC = src
+ | ldr TMP0, [BASE, RC, lsl #3]
+ | str TMP0, [BASE, RA, lsl #3]
+ | ins_next
+ break;
+ case BC_NOT:
+ | // RA = dst, RC = src
+ | ldr TMP0, [BASE, RC, lsl #3]
+ | mov_false TMP1
+ | mov_true TMP2
+ | cmp TMP0, TMP1
+ | csel TMP0, TMP1, TMP2, lo
+ | str TMP0, [BASE, RA, lsl #3]
+ | ins_next
+ break;
+ case BC_UNM:
+ | // RA = dst, RC = src
+ | ldr TMP0, [BASE, RC, lsl #3]
+ | asr ITYPE, TMP0, #47
+ | cmn ITYPE, #-LJ_TISNUM
+ | bhi ->vmeta_unm
+ | eor TMP0, TMP0, #U64x(80000000,00000000)
+ | bne >5
+ | negs TMP0w, TMP0w
+ | movz CARG3, #0x41e0, lsl #48 // 2^31.
+ | add TMP0, TMP0, TISNUM
+ | csel TMP0, TMP0, CARG3, vc
+ |5:
+ | str TMP0, [BASE, RA, lsl #3]
+ | ins_next
+ break;
+ case BC_LEN:
+ | // RA = dst, RC = src
+ | ldr CARG1, [BASE, RC, lsl #3]
+ | asr ITYPE, CARG1, #47
+ | cmn ITYPE, #-LJ_TSTR
+ | and CARG1, CARG1, #LJ_GCVMASK
+ | bne >2
+ | ldr CARG1w, STR:CARG1->len
+ |1:
+ | add CARG1, CARG1, TISNUM
+ | str CARG1, [BASE, RA, lsl #3]
+ | ins_next
+ |
+ |2:
+ | cmn ITYPE, #-LJ_TTAB
+ | bne ->vmeta_len
+#if LJ_52
+ | ldr TAB:CARG2, TAB:CARG1->metatable
+ | cbnz TAB:CARG2, >9
+ |3:
+#endif
+ |->BC_LEN_Z:
+ | bl extern lj_tab_len // (GCtab *t)
+ | // Returns uint32_t (but less than 2^31).
+ | b <1
+ |
+#if LJ_52
+ |9:
+ | ldrb TMP1w, TAB:CARG2->nomm
+ | tbnz TMP1w, #MM_len, <3 // 'no __len' flag set: done.
+ | b ->vmeta_len
+#endif
+ break;
+
+ /* -- Binary ops -------------------------------------------------------- */
+
+ |.macro ins_arithcheck_int, target
+ | checkint CARG1, target
+ | checkint CARG2, target
+ |.endmacro
+ |
+ |.macro ins_arithcheck_num, target
+ | checknum CARG1, target
+ | checknum CARG2, target
+ |.endmacro
+ |
+ |.macro ins_arithcheck_nzdiv, target
+ | cbz CARG2w, target
+ |.endmacro
+ |
+ |.macro ins_arithhead
+ ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
+ ||if (vk == 1) {
+ | and RC, RC, #255
+ | decode_RB RB, INS
+ ||} else {
+ | decode_RB RB, INS
+ | and RC, RC, #255
+ ||}
+ |.endmacro
+ |
+ |.macro ins_arithload, reg1, reg2
+ | // RA = dst, RB = src1, RC = src2 | num_const
+ ||switch (vk) {
+ ||case 0:
+ | ldr reg1, [BASE, RB, lsl #3]
+ | ldr reg2, [KBASE, RC, lsl #3]
+ || break;
+ ||case 1:
+ | ldr reg1, [KBASE, RC, lsl #3]
+ | ldr reg2, [BASE, RB, lsl #3]
+ || break;
+ ||default:
+ | ldr reg1, [BASE, RB, lsl #3]
+ | ldr reg2, [BASE, RC, lsl #3]
+ || break;
+ ||}
+ |.endmacro
+ |
+ |.macro ins_arithfallback, ins
+ ||switch (vk) {
+ ||case 0:
+ | ins ->vmeta_arith_vn
+ || break;
+ ||case 1:
+ | ins ->vmeta_arith_nv
+ || break;
+ ||default:
+ | ins ->vmeta_arith_vv
+ || break;
+ ||}
+ |.endmacro
+ |
+ |.macro ins_arithmod, res, reg1, reg2
+ | fdiv d2, reg1, reg2
+ | frintm d2, d2
+ | fmsub res, d2, reg2, reg1
+ |.endmacro
+ |
+ |.macro ins_arithdn, intins, fpins
+ | ins_arithhead
+ | ins_arithload CARG1, CARG2
+ | ins_arithcheck_int >5
+ |.if "intins" == "smull"
+ | smull CARG1, CARG1w, CARG2w
+ | cmp CARG1, CARG1, sxtw
+ | mov CARG1w, CARG1w
+ | ins_arithfallback bne
+ |.elif "intins" == "ins_arithmodi"
+ | ins_arithfallback ins_arithcheck_nzdiv
+ | bl ->vm_modi
+ |.else
+ | intins CARG1w, CARG1w, CARG2w
+ | ins_arithfallback bvs
+ |.endif
+ | add CARG1, CARG1, TISNUM
+ | str CARG1, [BASE, RA, lsl #3]
+ |4:
+ | ins_next
+ |
+ |5: // FP variant.
+ | ins_arithload FARG1, FARG2
+ | ins_arithfallback ins_arithcheck_num
+ | fpins FARG1, FARG1, FARG2
+ | str FARG1, [BASE, RA, lsl #3]
+ | b <4
+ |.endmacro
+ |
+ |.macro ins_arithfp, fpins
+ | ins_arithhead
+ | ins_arithload CARG1, CARG2
+ | ins_arithload FARG1, FARG2
+ | ins_arithfallback ins_arithcheck_num
+ |.if "fpins" == "fpow"
+ | bl extern pow
+ |.else
+ | fpins FARG1, FARG1, FARG2
+ |.endif
+ | str FARG1, [BASE, RA, lsl #3]
+ | ins_next
+ |.endmacro
+
+ case BC_ADDVN: case BC_ADDNV: case BC_ADDVV:
+ | ins_arithdn adds, fadd
+ break;
+ case BC_SUBVN: case BC_SUBNV: case BC_SUBVV:
+ | ins_arithdn subs, fsub
+ break;
+ case BC_MULVN: case BC_MULNV: case BC_MULVV:
+ | ins_arithdn smull, fmul
+ break;
+ case BC_DIVVN: case BC_DIVNV: case BC_DIVVV:
+ | ins_arithfp fdiv
+ break;
+ case BC_MODVN: case BC_MODNV: case BC_MODVV:
+ | ins_arithdn ins_arithmodi, ins_arithmod
+ break;
+ case BC_POW:
+ | // NYI: (partial) integer arithmetic.
+ | ins_arithfp fpow
+ break;
+
+ case BC_CAT:
+ | decode_RB RB, INS
+ | and RC, RC, #255
+ | // RA = dst, RB = src_start, RC = src_end
+ | str BASE, L->base
+ | sub CARG3, RC, RB
+ | add CARG2, BASE, RC, lsl #3
+ |->BC_CAT_Z:
+ | // RA = dst, CARG2 = top-1, CARG3 = left
+ | mov CARG1, L
+ | str PC, SAVE_PC
+ | bl extern lj_meta_cat // (lua_State *L, TValue *top, int left)
+ | // Returns NULL (finished) or TValue * (metamethod).
+ | ldrb RBw, [PC, #-1]
+ | ldr BASE, L->base
+ | cbnz CRET1, ->vmeta_binop
+ | ldr TMP0, [BASE, RB, lsl #3]
+ | str TMP0, [BASE, RA, lsl #3] // Copy result to RA.
+ | ins_next
+ break;
+
+ /* -- Constant ops ------------------------------------------------------ */
+
+ case BC_KSTR:
+ | // RA = dst, RC = str_const (~)
+ | mvn RC, RC
+ | ldr TMP0, [KBASE, RC, lsl #3]
+ | movn TMP1, #~LJ_TSTR
+ | add TMP0, TMP0, TMP1, lsl #47
+ | str TMP0, [BASE, RA, lsl #3]
+ | ins_next
+ break;
+ case BC_KCDATA:
+ |.if FFI
+ | // RA = dst, RC = cdata_const (~)
+ | mvn RC, RC
+ | ldr TMP0, [KBASE, RC, lsl #3]
+ | movn TMP1, #~LJ_TCDATA
+ | add TMP0, TMP0, TMP1, lsl #47
+ | str TMP0, [BASE, RA, lsl #3]
+ | ins_next
+ |.endif
+ break;
+ case BC_KSHORT:
+ | // RA = dst, RC = int16_literal
+ | sxth RCw, RCw
+ | add TMP0, RC, TISNUM
+ | str TMP0, [BASE, RA, lsl #3]
+ | ins_next
+ break;
+ case BC_KNUM:
+ | // RA = dst, RC = num_const
+ | ldr TMP0, [KBASE, RC, lsl #3]
+ | str TMP0, [BASE, RA, lsl #3]
+ | ins_next
+ break;
+ case BC_KPRI:
+ | // RA = dst, RC = primitive_type (~)
+ | mvn TMP0, RC, lsl #47
+ | str TMP0, [BASE, RA, lsl #3]
+ | ins_next
+ break;
+ case BC_KNIL:
+ | // RA = base, RC = end
+ | add RA, BASE, RA, lsl #3
+ | add RC, BASE, RC, lsl #3
+ | str TISNIL, [RA], #8
+ |1:
+ | cmp RA, RC
+ | str TISNIL, [RA], #8
+ | blt <1
+ | ins_next_
+ break;
+
+ /* -- Upvalue and function ops ------------------------------------------ */
+
+ case BC_UGET:
+ | // RA = dst, RC = uvnum
+ | ldr LFUNC:CARG2, [BASE, FRAME_FUNC]
+ | add RC, RC, #offsetof(GCfuncL, uvptr)/8
+ | and LFUNC:CARG2, CARG2, #LJ_GCVMASK
+ | ldr UPVAL:CARG2, [LFUNC:CARG2, RC, lsl #3]
+ | ldr CARG2, UPVAL:CARG2->v
+ | ldr TMP0, [CARG2]
+ | str TMP0, [BASE, RA, lsl #3]
+ | ins_next
+ break;
+ case BC_USETV:
+ | // RA = uvnum, RC = src
+ | ldr LFUNC:CARG2, [BASE, FRAME_FUNC]
+ | add RA, RA, #offsetof(GCfuncL, uvptr)/8
+ | and LFUNC:CARG2, CARG2, #LJ_GCVMASK
+ | ldr UPVAL:CARG1, [LFUNC:CARG2, RA, lsl #3]
+ | ldr CARG3, [BASE, RC, lsl #3]
+ | ldr CARG2, UPVAL:CARG1->v
+ | ldrb TMP2w, UPVAL:CARG1->marked
+ | ldrb TMP0w, UPVAL:CARG1->closed
+ | asr ITYPE, CARG3, #47
+ | str CARG3, [CARG2]
+ | add ITYPE, ITYPE, #-LJ_TISGCV
+ | tst TMP2w, #LJ_GC_BLACK // isblack(uv)
+ | ccmp TMP0w, #0, #4, ne // && uv->closed
+ | ccmn ITYPE, #-(LJ_TNUMX - LJ_TISGCV), #0, ne // && tvisgcv(v)
+ | bhi >2
+ |1:
+ | ins_next
+ |
+ |2: // Check if new value is white.
+ | and GCOBJ:CARG3, CARG3, #LJ_GCVMASK
+ | ldrb TMP1w, GCOBJ:CARG3->gch.marked
+ | tst TMP1w, #LJ_GC_WHITES // iswhite(str)
+ | beq <1
+ | // Crossed a write barrier. Move the barrier forward.
+ | mov CARG1, GL
+ | bl extern lj_gc_barrieruv // (global_State *g, TValue *tv)
+ | b <1
+ break;
+ case BC_USETS:
+ | // RA = uvnum, RC = str_const (~)
+ | ldr LFUNC:CARG2, [BASE, FRAME_FUNC]
+ | add RA, RA, #offsetof(GCfuncL, uvptr)/8
+ | mvn RC, RC
+ | and LFUNC:CARG2, CARG2, #LJ_GCVMASK
+ | ldr UPVAL:CARG1, [LFUNC:CARG2, RA, lsl #3]
+ | ldr STR:CARG3, [KBASE, RC, lsl #3]
+ | movn TMP0, #~LJ_TSTR
+ | ldr CARG2, UPVAL:CARG1->v
+ | ldrb TMP2w, UPVAL:CARG1->marked
+ | add TMP0, STR:CARG3, TMP0, lsl #47
+ | ldrb TMP1w, STR:CARG3->marked
+ | str TMP0, [CARG2]
+ | tbnz TMP2w, #2, >2 // isblack(uv)
+ |1:
+ | ins_next
+ |
+ |2: // Check if string is white and ensure upvalue is closed.
+ | ldrb TMP0w, UPVAL:CARG1->closed
+ | tst TMP1w, #LJ_GC_WHITES // iswhite(str)
+ | ccmp TMP0w, #0, #0, ne
+ | beq <1
+ | // Crossed a write barrier. Move the barrier forward.
+ | mov CARG1, GL
+ | bl extern lj_gc_barrieruv // (global_State *g, TValue *tv)
+ | b <1
+ break;
+ case BC_USETN:
+ | // RA = uvnum, RC = num_const
+ | ldr LFUNC:CARG2, [BASE, FRAME_FUNC]
+ | add RA, RA, #offsetof(GCfuncL, uvptr)/8
+ | and LFUNC:CARG2, CARG2, #LJ_GCVMASK
+ | ldr UPVAL:CARG2, [LFUNC:CARG2, RA, lsl #3]
+ | ldr TMP0, [KBASE, RC, lsl #3]
+ | ldr CARG2, UPVAL:CARG2->v
+ | str TMP0, [CARG2]
+ | ins_next
+ break;
+ case BC_USETP:
+ | // RA = uvnum, RC = primitive_type (~)
+ | ldr LFUNC:CARG2, [BASE, FRAME_FUNC]
+ | add RA, RA, #offsetof(GCfuncL, uvptr)/8
+ | and LFUNC:CARG2, CARG2, #LJ_GCVMASK
+ | ldr UPVAL:CARG2, [LFUNC:CARG2, RA, lsl #3]
+ | mvn TMP0, RC, lsl #47
+ | ldr CARG2, UPVAL:CARG2->v
+ | str TMP0, [CARG2]
+ | ins_next
+ break;
+
+ case BC_UCLO:
+ | // RA = level, RC = target
+ | ldr CARG3, L->openupval
+ | add RC, PC, RC, lsl #2
+ | str BASE, L->base
+ | sub PC, RC, #0x20000
+ | cbz CARG3, >1
+ | mov CARG1, L
+ | add CARG2, BASE, RA, lsl #3
+ | bl extern lj_func_closeuv // (lua_State *L, TValue *level)
+ | ldr BASE, L->base
+ |1:
+ | ins_next
+ break;
+
+ case BC_FNEW:
+ | // RA = dst, RC = proto_const (~) (holding function prototype)
+ | mvn RC, RC
+ | str BASE, L->base
+ | ldr LFUNC:CARG3, [BASE, FRAME_FUNC]
+ | str PC, SAVE_PC
+ | ldr CARG2, [KBASE, RC, lsl #3]
+ | mov CARG1, L
+ | and LFUNC:CARG3, CARG3, #LJ_GCVMASK
+ | // (lua_State *L, GCproto *pt, GCfuncL *parent)
+ | bl extern lj_func_newL_gc
+ | // Returns GCfuncL *.
+ | ldr BASE, L->base
+ | movn TMP0, #~LJ_TFUNC
+ | add CRET1, CRET1, TMP0, lsl #47
+ | str CRET1, [BASE, RA, lsl #3]
+ | ins_next
+ break;
+
+ /* -- Table ops --------------------------------------------------------- */
+
+ case BC_TNEW:
+ case BC_TDUP:
+ | // RA = dst, RC = (hbits|asize) | tab_const (~)
+ | ldp CARG3, CARG4, GL->gc.total // Assumes threshold follows total.
+ | str BASE, L->base
+ | str PC, SAVE_PC
+ | mov CARG1, L
+ | cmp CARG3, CARG4
+ | bhs >5
+ |1:
+ if (op == BC_TNEW) {
+ | and CARG2, RC, #0x7ff
+ | lsr CARG3, RC, #11
+ | cmp CARG2, #0x7ff
+ | mov TMP0, #0x801
+ | csel CARG2, CARG2, TMP0, ne
+ | bl extern lj_tab_new // (lua_State *L, int32_t asize, uint32_t hbits)
+ | // Returns GCtab *.
+ } else {
+ | mvn RC, RC
+ | ldr CARG2, [KBASE, RC, lsl #3]
+ | bl extern lj_tab_dup // (lua_State *L, Table *kt)
+ | // Returns GCtab *.
+ }
+ | ldr BASE, L->base
+ | movk CRET1, #(LJ_TTAB>>1)&0xffff, lsl #48
+ | str CRET1, [BASE, RA, lsl #3]
+ | ins_next
+ |
+ |5:
+ | bl extern lj_gc_step_fixtop // (lua_State *L)
+ | mov CARG1, L
+ | b <1
+ break;
+
+ case BC_GGET:
+ | // RA = dst, RC = str_const (~)
+ case BC_GSET:
+ | // RA = dst, RC = str_const (~)
+ | ldr LFUNC:CARG1, [BASE, FRAME_FUNC]
+ | mvn RC, RC
+ | and LFUNC:CARG1, CARG1, #LJ_GCVMASK
+ | ldr TAB:CARG2, LFUNC:CARG1->env
+ | ldr STR:RC, [KBASE, RC, lsl #3]
+ if (op == BC_GGET) {
+ | b ->BC_TGETS_Z
+ } else {
+ | b ->BC_TSETS_Z
+ }
+ break;
+
+ case BC_TGETV:
+ | decode_RB RB, INS
+ | and RC, RC, #255
+ | // RA = dst, RB = table, RC = key
+ | ldr CARG2, [BASE, RB, lsl #3]
+ | ldr TMP1, [BASE, RC, lsl #3]
+ | checktab CARG2, ->vmeta_tgetv
+ | checkint TMP1, >9 // Integer key?
+ | ldr CARG3, TAB:CARG2->array
+ | ldr CARG1w, TAB:CARG2->asize
+ | add CARG3, CARG3, TMP1, uxtw #3
+ | cmp TMP1w, CARG1w // In array part?
+ | bhs ->vmeta_tgetv
+ | ldr TMP0, [CARG3]
+ | cmp TMP0, TISNIL
+ | beq >5
+ |1:
+ | str TMP0, [BASE, RA, lsl #3]
+ | ins_next
+ |
+ |5: // Check for __index if table value is nil.
+ | ldr TAB:CARG1, TAB:CARG2->metatable
+ | cbz TAB:CARG1, <1 // No metatable: done.
+ | ldrb TMP1w, TAB:CARG1->nomm
+ | tbnz TMP1w, #MM_index, <1 // 'no __index' flag set: done.
+ | b ->vmeta_tgetv
+ |
+ |9:
+ | asr ITYPE, TMP1, #47
+ | cmn ITYPE, #-LJ_TSTR // String key?
+ | bne ->vmeta_tgetv
+ | and STR:RC, TMP1, #LJ_GCVMASK
+ | b ->BC_TGETS_Z
+ break;
+ case BC_TGETS:
+ | decode_RB RB, INS
+ | and RC, RC, #255
+ | // RA = dst, RB = table, RC = str_const (~)
+ | ldr CARG2, [BASE, RB, lsl #3]
+ | mvn RC, RC
+ | ldr STR:RC, [KBASE, RC, lsl #3]
+ | checktab CARG2, ->vmeta_tgets1
+ |->BC_TGETS_Z:
+ | // TAB:CARG2 = GCtab *, STR:RC = GCstr *, RA = dst
+ | ldr TMP1w, TAB:CARG2->hmask
+ | ldr TMP2w, STR:RC->hash
+ | ldr NODE:CARG3, TAB:CARG2->node
+ | and TMP1w, TMP1w, TMP2w // idx = str->hash & tab->hmask
+ | add TMP1, TMP1, TMP1, lsl #1
+ | movn CARG4, #~LJ_TSTR
+ | add NODE:CARG3, NODE:CARG3, TMP1, lsl #3 // node = tab->node + idx*3*8
+ | add CARG4, STR:RC, CARG4, lsl #47 // Tagged key to look for.
+ |1:
+ | ldp TMP0, CARG1, NODE:CARG3->val
+ | ldr NODE:CARG3, NODE:CARG3->next
+ | cmp CARG1, CARG4
+ | bne >4
+ | cmp TMP0, TISNIL
+ | beq >5
+ |3:
+ | str TMP0, [BASE, RA, lsl #3]
+ | ins_next
+ |
+ |4: // Follow hash chain.
+ | cbnz NODE:CARG3, <1
+ | // End of hash chain: key not found, nil result.
+ | mov TMP0, TISNIL
+ |
+ |5: // Check for __index if table value is nil.
+ | ldr TAB:CARG1, TAB:CARG2->metatable
+ | cbz TAB:CARG1, <3 // No metatable: done.
+ | ldrb TMP1w, TAB:CARG1->nomm
+ | tbnz TMP1w, #MM_index, <3 // 'no __index' flag set: done.
+ | b ->vmeta_tgets
+ break;
+ case BC_TGETB:
+ | decode_RB RB, INS
+ | and RC, RC, #255
+ | // RA = dst, RB = table, RC = index
+ | ldr CARG2, [BASE, RB, lsl #3]
+ | checktab CARG2, ->vmeta_tgetb
+ | ldr CARG3, TAB:CARG2->array
+ | ldr CARG1w, TAB:CARG2->asize
+ | add CARG3, CARG3, RC, lsl #3
+ | cmp RCw, CARG1w // In array part?
+ | bhs ->vmeta_tgetb
+ | ldr TMP0, [CARG3]
+ | cmp TMP0, TISNIL
+ | beq >5
+ |1:
+ | str TMP0, [BASE, RA, lsl #3]
+ | ins_next
+ |
+ |5: // Check for __index if table value is nil.
+ | ldr TAB:CARG1, TAB:CARG2->metatable
+ | cbz TAB:CARG1, <1 // No metatable: done.
+ | ldrb TMP1w, TAB:CARG1->nomm
+ | tbnz TMP1w, #MM_index, <1 // 'no __index' flag set: done.
+ | b ->vmeta_tgetb
+ break;
+ case BC_TGETR:
+ | decode_RB RB, INS
+ | and RC, RC, #255
+ | // RA = dst, RB = table, RC = key
+ | ldr CARG1, [BASE, RB, lsl #3]
+ | ldr TMP1, [BASE, RC, lsl #3]
+ | and TAB:CARG1, CARG1, #LJ_GCVMASK
+ | ldr CARG3, TAB:CARG1->array
+ | ldr TMP2w, TAB:CARG1->asize
+ | add CARG3, CARG3, TMP1w, uxtw #3
+ | cmp TMP1w, TMP2w // In array part?
+ | bhs ->vmeta_tgetr
+ | ldr TMP0, [CARG3]
+ |->BC_TGETR_Z:
+ | str TMP0, [BASE, RA, lsl #3]
+ | ins_next
+ break;
+
+ case BC_TSETV:
+ | decode_RB RB, INS
+ | and RC, RC, #255
+ | // RA = src, RB = table, RC = key
+ | ldr CARG2, [BASE, RB, lsl #3]
+ | ldr TMP1, [BASE, RC, lsl #3]
+ | checktab CARG2, ->vmeta_tsetv
+ | checkint TMP1, >9 // Integer key?
+ | ldr CARG3, TAB:CARG2->array
+ | ldr CARG1w, TAB:CARG2->asize
+ | add CARG3, CARG3, TMP1, uxtw #3
+ | cmp TMP1w, CARG1w // In array part?
+ | bhs ->vmeta_tsetv
+ | ldr TMP1, [CARG3]
+ | ldr TMP0, [BASE, RA, lsl #3]
+ | ldrb TMP2w, TAB:CARG2->marked
+ | cmp TMP1, TISNIL // Previous value is nil?
+ | beq >5
+ |1:
+ | str TMP0, [CARG3]
+ | tbnz TMP2w, #2, >7 // isblack(table)
+ |2:
+ | ins_next
+ |
+ |5: // Check for __newindex if previous value is nil.
+ | ldr TAB:CARG1, TAB:CARG2->metatable
+ | cbz TAB:CARG1, <1 // No metatable: done.
+ | ldrb TMP1w, TAB:CARG1->nomm
+ | tbnz TMP1w, #MM_newindex, <1 // 'no __newindex' flag set: done.
+ | b ->vmeta_tsetv
+ |
+ |7: // Possible table write barrier for the value. Skip valiswhite check.
+ | barrierback TAB:CARG2, TMP2w, TMP1
+ | b <2
+ |
+ |9:
+ | asr ITYPE, TMP1, #47
+ | cmn ITYPE, #-LJ_TSTR // String key?
+ | bne ->vmeta_tsetv
+ | and STR:RC, TMP1, #LJ_GCVMASK
+ | b ->BC_TSETS_Z
+ break;
+ case BC_TSETS:
+ | decode_RB RB, INS
+ | and RC, RC, #255
+ | // RA = dst, RB = table, RC = str_const (~)
+ | ldr CARG2, [BASE, RB, lsl #3]
+ | mvn RC, RC
+ | ldr STR:RC, [KBASE, RC, lsl #3]
+ | checktab CARG2, ->vmeta_tsets1
+ |->BC_TSETS_Z:
+ | // TAB:CARG2 = GCtab *, STR:RC = GCstr *, RA = src
+ | ldr TMP1w, TAB:CARG2->hmask
+ | ldr TMP2w, STR:RC->hash
+ | ldr NODE:CARG3, TAB:CARG2->node
+ | and TMP1w, TMP1w, TMP2w // idx = str->hash & tab->hmask
+ | add TMP1, TMP1, TMP1, lsl #1
+ | movn CARG4, #~LJ_TSTR
+ | add NODE:CARG3, NODE:CARG3, TMP1, lsl #3 // node = tab->node + idx*3*8
+ | add CARG4, STR:RC, CARG4, lsl #47 // Tagged key to look for.
+ | strb wzr, TAB:CARG2->nomm // Clear metamethod cache.
+ |1:
+ | ldp TMP1, CARG1, NODE:CARG3->val
+ | ldr NODE:TMP3, NODE:CARG3->next
+ | ldrb TMP2w, TAB:CARG2->marked
+ | cmp CARG1, CARG4
+ | bne >5
+ | ldr TMP0, [BASE, RA, lsl #3]
+ | cmp TMP1, TISNIL // Previous value is nil?
+ | beq >4
+ |2:
+ | str TMP0, NODE:CARG3->val
+ | tbnz TMP2w, #2, >7 // isblack(table)
+ |3:
+ | ins_next
+ |
+ |4: // Check for __newindex if previous value is nil.
+ | ldr TAB:CARG1, TAB:CARG2->metatable
+ | cbz TAB:CARG1, <2 // No metatable: done.
+ | ldrb TMP1w, TAB:CARG1->nomm
+ | tbnz TMP1w, #MM_newindex, <2 // 'no __newindex' flag set: done.
+ | b ->vmeta_tsets
+ |
+ |5: // Follow hash chain.
+ | mov NODE:CARG3, NODE:TMP3
+ | cbnz NODE:TMP3, <1
+ | // End of hash chain: key not found, add a new one.
+ |
+ | // But check for __newindex first.
+ | ldr TAB:CARG1, TAB:CARG2->metatable
+ | cbz TAB:CARG1, >6 // No metatable: continue.
+ | ldrb TMP1w, TAB:CARG1->nomm
+ | // 'no __newindex' flag NOT set: check.
+ | tbz TMP1w, #MM_newindex, ->vmeta_tsets
+ |6:
+ | movn TMP1, #~LJ_TSTR
+ | str PC, SAVE_PC
+ | add TMP0, STR:RC, TMP1, lsl #47
+ | str BASE, L->base
+ | mov CARG1, L
+ | str TMP0, TMPD
+ | add CARG3, sp, TMPDofs
+ | bl extern lj_tab_newkey // (lua_State *L, GCtab *t, TValue *k)
+ | // Returns TValue *.
+ | ldr BASE, L->base
+ | ldr TMP0, [BASE, RA, lsl #3]
+ | str TMP0, [CRET1]
+ | b <3 // No 2nd write barrier needed.
+ |
+ |7: // Possible table write barrier for the value. Skip valiswhite check.
+ | barrierback TAB:CARG2, TMP2w, TMP1
+ | b <3
+ break;
+ case BC_TSETB:
+ | decode_RB RB, INS
+ | and RC, RC, #255
+ | // RA = src, RB = table, RC = index
+ | ldr CARG2, [BASE, RB, lsl #3]
+ | checktab CARG2, ->vmeta_tsetb
+ | ldr CARG3, TAB:CARG2->array
+ | ldr CARG1w, TAB:CARG2->asize
+ | add CARG3, CARG3, RC, lsl #3
+ | cmp RCw, CARG1w // In array part?
+ | bhs ->vmeta_tsetb
+ | ldr TMP1, [CARG3]
+ | ldr TMP0, [BASE, RA, lsl #3]
+ | ldrb TMP2w, TAB:CARG2->marked
+ | cmp TMP1, TISNIL // Previous value is nil?
+ | beq >5
+ |1:
+ | str TMP0, [CARG3]
+ | tbnz TMP2w, #2, >7 // isblack(table)
+ |2:
+ | ins_next
+ |
+ |5: // Check for __newindex if previous value is nil.
+ | ldr TAB:CARG1, TAB:CARG2->metatable
+ | cbz TAB:CARG1, <1 // No metatable: done.
+ | ldrb TMP1w, TAB:CARG1->nomm
+ | tbnz TMP1w, #MM_newindex, <1 // 'no __newindex' flag set: done.
+ | b ->vmeta_tsetb
+ |
+ |7: // Possible table write barrier for the value. Skip valiswhite check.
+ | barrierback TAB:CARG2, TMP2w, TMP1
+ | b <2
+ break;
+ case BC_TSETR:
+ | decode_RB RB, INS
+ | and RC, RC, #255
+ | // RA = src, RB = table, RC = key
+ | ldr CARG2, [BASE, RB, lsl #3]
+ | ldr TMP1, [BASE, RC, lsl #3]
+ | and TAB:CARG2, CARG2, #LJ_GCVMASK
+ | ldr CARG1, TAB:CARG2->array
+ | ldrb TMP2w, TAB:CARG2->marked
+ | ldr CARG4w, TAB:CARG2->asize
+ | add CARG1, CARG1, TMP1, uxtw #3
+ | tbnz TMP2w, #2, >7 // isblack(table)
+ |2:
+ | cmp TMP1w, CARG4w // In array part?
+ | bhs ->vmeta_tsetr
+ |->BC_TSETR_Z:
+ | ldr TMP0, [BASE, RA, lsl #3]
+ | str TMP0, [CARG1]
+ | ins_next
+ |
+ |7: // Possible table write barrier for the value. Skip valiswhite check.
+ | barrierback TAB:CARG2, TMP2w, TMP0
+ | b <2
+ break;
+
+ case BC_TSETM:
+ | // RA = base (table at base-1), RC = num_const (start index)
+ | add RA, BASE, RA, lsl #3
+ |1:
+ | ldr RBw, SAVE_MULTRES
+ | ldr TAB:CARG2, [RA, #-8] // Guaranteed to be a table.
+ | ldr TMP1, [KBASE, RC, lsl #3] // Integer constant is in lo-word.
+ | sub RB, RB, #8
+ | cbz RB, >4 // Nothing to copy?
+ | and TAB:CARG2, CARG2, #LJ_GCVMASK
+ | ldr CARG1w, TAB:CARG2->asize
+ | add CARG3w, TMP1w, RBw, lsr #3
+ | ldr CARG4, TAB:CARG2->array
+ | cmp CARG3, CARG1
+ | add RB, RA, RB
+ | bhi >5
+ | add TMP1, CARG4, TMP1w, uxtw #3
+ | ldrb TMP2w, TAB:CARG2->marked
+ |3: // Copy result slots to table.
+ | ldr TMP0, [RA], #8
+ | str TMP0, [TMP1], #8
+ | cmp RA, RB
+ | blo <3
+ | tbnz TMP2w, #2, >7 // isblack(table)
+ |4:
+ | ins_next
+ |
+ |5: // Need to resize array part.
+ | str BASE, L->base
+ | mov CARG1, L
+ | str PC, SAVE_PC
+ | bl extern lj_tab_reasize // (lua_State *L, GCtab *t, int nasize)
+ | // Must not reallocate the stack.
+ | b <1
+ |
+ |7: // Possible table write barrier for any value. Skip valiswhite check.
+ | barrierback TAB:CARG2, TMP2w, TMP1
+ | b <4
+ break;
+
+ /* -- Calls and vararg handling ----------------------------------------- */
+
+ case BC_CALLM:
+ | // RA = base, (RB = nresults+1,) RC = extra_nargs
+ | ldr TMP0w, SAVE_MULTRES
+ | decode_RC8RD NARGS8:RC, RC
+ | add NARGS8:RC, NARGS8:RC, TMP0
+ | b ->BC_CALL_Z
+ break;
+ case BC_CALL:
+ | decode_RC8RD NARGS8:RC, RC
+ | // RA = base, (RB = nresults+1,) RC = (nargs+1)*8
+ |->BC_CALL_Z:
+ | mov RB, BASE // Save old BASE for vmeta_call.
+ | add BASE, BASE, RA, lsl #3
+ | ldr CARG3, [BASE]
+ | sub NARGS8:RC, NARGS8:RC, #8
+ | add BASE, BASE, #16
+ | checkfunc CARG3, ->vmeta_call
+ | ins_call
+ break;
+
+ case BC_CALLMT:
+ | // RA = base, (RB = 0,) RC = extra_nargs
+ | ldr TMP0w, SAVE_MULTRES
+ | add NARGS8:RC, TMP0, RC, lsl #3
+ | b ->BC_CALLT1_Z
+ break;
+ case BC_CALLT:
+ | lsl NARGS8:RC, RC, #3
+ | // RA = base, (RB = 0,) RC = (nargs+1)*8
+ |->BC_CALLT1_Z:
+ | add RA, BASE, RA, lsl #3
+ | ldr TMP1, [RA]
+ | sub NARGS8:RC, NARGS8:RC, #8
+ | add RA, RA, #16
+ | checktp CARG3, TMP1, LJ_TFUNC, ->vmeta_callt
+ | ldr PC, [BASE, FRAME_PC]
+ |->BC_CALLT2_Z:
+ | mov RB, #0
+ | ldrb TMP2w, LFUNC:CARG3->ffid
+ | tst PC, #FRAME_TYPE
+ | bne >7
+ |1:
+ | str TMP1, [BASE, FRAME_FUNC] // Copy function down, but keep PC.
+ | cbz NARGS8:RC, >3
+ |2:
+ | ldr TMP0, [RA, RB]
+ | add TMP1, RB, #8
+ | cmp TMP1, NARGS8:RC
+ | str TMP0, [BASE, RB]
+ | mov RB, TMP1
+ | bne <2
+ |3:
+ | cmp TMP2, #1 // (> FF_C) Calling a fast function?
+ | bhi >5
+ |4:
+ | ins_callt
+ |
+ |5: // Tailcall to a fast function with a Lua frame below.
+ | ldrb RAw, [PC, #-3]
+ | sub CARG1, BASE, RA, lsl #3
+ | ldr LFUNC:CARG1, [CARG1, #-32]
+ | and LFUNC:CARG1, CARG1, #LJ_GCVMASK
+ | ldr CARG1, LFUNC:CARG1->pc
+ | ldr KBASE, [CARG1, #PC2PROTO(k)]
+ | b <4
+ |
+ |7: // Tailcall from a vararg function.
+ | eor PC, PC, #FRAME_VARG
+ | tst PC, #FRAME_TYPEP // Vararg frame below?
+ | csel TMP2, RB, TMP2, ne // Clear ffid if no Lua function below.
+ | bne <1
+ | sub BASE, BASE, PC
+ | ldr PC, [BASE, FRAME_PC]
+ | tst PC, #FRAME_TYPE
+ | csel TMP2, RB, TMP2, ne // Clear ffid if no Lua function below.
+ | b <1
+ break;
+
+ case BC_ITERC:
+ | // RA = base, (RB = nresults+1, RC = nargs+1 (2+1))
+ | add RA, BASE, RA, lsl #3
+ | ldr CARG3, [RA, #-24]
+ | mov RB, BASE // Save old BASE for vmeta_call.
+ | ldp CARG1, CARG2, [RA, #-16]
+ | add BASE, RA, #16
+ | mov NARGS8:RC, #16 // Iterators get 2 arguments.
+ | str CARG3, [RA] // Copy callable.
+ | stp CARG1, CARG2, [RA, #16] // Copy state and control var.
+ | checkfunc CARG3, ->vmeta_call
+ | ins_call
+ break;
+
+ case BC_ITERN:
+ | // RA = base, (RB = nresults+1, RC = nargs+1 (2+1))
+ |.if JIT
+ | // NYI: add hotloop, record BC_ITERN.
+ |.endif
+ | add RA, BASE, RA, lsl #3
+ | ldr TAB:RB, [RA, #-16]
+ | ldrh TMP3w, [PC, #2]
+ | ldr CARG1w, [RA, #-8] // Get index from control var.
+ | add PC, PC, #4
+ | add TMP3, PC, TMP3, lsl #2
+ | and TAB:RB, RB, #LJ_GCVMASK
+ | sub TMP3, TMP3, #0x20000
+ | ldr TMP1w, TAB:RB->asize
+ | ldr CARG2, TAB:RB->array
+ |1: // Traverse array part.
+ | subs RC, CARG1, TMP1
+ | add CARG3, CARG2, CARG1, lsl #3
+ | bhs >5 // Index points after array part?
+ | ldr TMP0, [CARG3]
+ | cmp TMP0, TISNIL
+ | cinc CARG1, CARG1, eq // Skip holes in array part.
+ | beq <1
+ | add CARG1, CARG1, TISNUM
+ | stp CARG1, TMP0, [RA]
+ | add CARG1, CARG1, #1
+ |3:
+ | str CARG1w, [RA, #-8] // Update control var.
+ | mov PC, TMP3
+ |4:
+ | ins_next
+ |
+ |5: // Traverse hash part.
+ | ldr TMP2w, TAB:RB->hmask
+ | ldr NODE:RB, TAB:RB->node
+ |6:
+ | add CARG1, RC, RC, lsl #1
+ | cmp RC, TMP2 // End of iteration? Branch to ITERN+1.
+ | add NODE:CARG3, NODE:RB, CARG1, lsl #3 // node = tab->node + idx*3*8
+ | bhi <4
+ | ldp TMP0, CARG1, NODE:CARG3->val
+ | cmp TMP0, TISNIL
+ | add RC, RC, #1
+ | beq <6 // Skip holes in hash part.
+ | stp CARG1, TMP0, [RA]
+ | add CARG1, RC, TMP1
+ | b <3
+ break;
+
+ case BC_ISNEXT:
+ | // RA = base, RC = target (points to ITERN)
+ | add RA, BASE, RA, lsl #3
+ | ldr CFUNC:CARG1, [RA, #-24]
+ | add RC, PC, RC, lsl #2
+ | ldp TAB:CARG3, CARG4, [RA, #-16]
+ | sub RC, RC, #0x20000
+ | checkfunc CFUNC:CARG1, >5
+ | asr TMP0, TAB:CARG3, #47
+ | ldrb TMP1w, CFUNC:CARG1->ffid
+ | cmn TMP0, #-LJ_TTAB
+ | ccmp CARG4, TISNIL, #0, eq
+ | ccmp TMP1w, #FF_next_N, #0, eq
+ | bne >5
+ | mov TMP0w, #0xfffe7fff
+ | lsl TMP0, TMP0, #32
+ | str TMP0, [RA, #-8] // Initialize control var.
+ |1:
+ | mov PC, RC
+ | ins_next
+ |
+ |5: // Despecialize bytecode if any of the checks fail.
+ | mov TMP0, #BC_JMP
+ | mov TMP1, #BC_ITERC
+ | strb TMP0w, [PC, #-4]
+ | strb TMP1w, [RC]
+ | b <1
+ break;
+
+ case BC_VARG:
+ | decode_RB RB, INS
+ | and RC, RC, #255
+ | // RA = base, RB = (nresults+1), RC = numparams
+ | ldr TMP1, [BASE, FRAME_PC]
+ | add RC, BASE, RC, lsl #3
+ | add RA, BASE, RA, lsl #3
+ | add RC, RC, #FRAME_VARG
+ | add TMP2, RA, RB, lsl #3
+ | sub RC, RC, TMP1 // RC = vbase
+ | // Note: RC may now be even _above_ BASE if nargs was < numparams.
+ | sub TMP3, BASE, #16 // TMP3 = vtop
+ | cbz RB, >5
+ | sub TMP2, TMP2, #16
+ |1: // Copy vararg slots to destination slots.
+ | cmp RC, TMP3
+ | ldr TMP0, [RC], #8
+ | csel TMP0, TMP0, TISNIL, lo
+ | cmp RA, TMP2
+ | str TMP0, [RA], #8
+ | blo <1
+ |2:
+ | ins_next
+ |
+ |5: // Copy all varargs.
+ | ldr TMP0, L->maxstack
+ | subs TMP2, TMP3, RC
+ | csel RB, xzr, TMP2, le // MULTRES = (max(vtop-vbase,0)+1)*8
+ | add RB, RB, #8
+ | add TMP1, RA, TMP2
+ | str RBw, SAVE_MULTRES
+ | ble <2 // Nothing to copy.
+ | cmp TMP1, TMP0
+ | bhi >7
+ |6:
+ | ldr TMP0, [RC], #8
+ | str TMP0, [RA], #8
+ | cmp RC, TMP3
+ | blo <6
+ | b <2
+ |
+ |7: // Grow stack for varargs.
+ | lsr CARG2, TMP2, #3
+ | stp BASE, RA, L->base
+ | mov CARG1, L
+ | sub RC, RC, BASE // Need delta, because BASE may change.
+ | str PC, SAVE_PC
+ | bl extern lj_state_growstack // (lua_State *L, int n)
+ | ldp BASE, RA, L->base
+ | add RC, BASE, RC
+ | sub TMP3, BASE, #16
+ | b <6
+ break;
+
+ /* -- Returns ----------------------------------------------------------- */
+
+ case BC_RETM:
+ | // RA = results, RC = extra results
+ | ldr TMP0w, SAVE_MULTRES
+ | ldr PC, [BASE, FRAME_PC]
+ | add RA, BASE, RA, lsl #3
+ | add RC, TMP0, RC, lsl #3
+ | b ->BC_RETM_Z
+ break;
+
+ case BC_RET:
+ | // RA = results, RC = nresults+1
+ | ldr PC, [BASE, FRAME_PC]
+ | lsl RC, RC, #3
+ | add RA, BASE, RA, lsl #3
+ |->BC_RETM_Z:
+ | str RCw, SAVE_MULTRES
+ |1:
+ | ands CARG1, PC, #FRAME_TYPE
+ | eor CARG2, PC, #FRAME_VARG
+ | bne ->BC_RETV2_Z
+ |
+ |->BC_RET_Z:
+ | // BASE = base, RA = resultptr, RC = (nresults+1)*8, PC = return
+ | ldr INSw, [PC, #-4]
+ | subs TMP1, RC, #8
+ | sub CARG3, BASE, #16
+ | beq >3
+ |2:
+ | ldr TMP0, [RA], #8
+ | add BASE, BASE, #8
+ | sub TMP1, TMP1, #8
+ | str TMP0, [BASE, #-24]
+ | cbnz TMP1, <2
+ |3:
+ | decode_RA RA, INS
+ | sub CARG4, CARG3, RA, lsl #3
+ | decode_RB RB, INS
+ | ldr LFUNC:CARG1, [CARG4, FRAME_FUNC]
+ |5:
+ | cmp RC, RB, lsl #3 // More results expected?
+ | blo >6
+ | and LFUNC:CARG1, CARG1, #LJ_GCVMASK
+ | mov BASE, CARG4
+ | ldr CARG2, LFUNC:CARG1->pc
+ | ldr KBASE, [CARG2, #PC2PROTO(k)]
+ | ins_next
+ |
+ |6: // Fill up results with nil.
+ | add BASE, BASE, #8
+ | add RC, RC, #8
+ | str TISNIL, [BASE, #-24]
+ | b <5
+ |
+ |->BC_RETV1_Z: // Non-standard return case.
+ | add RA, BASE, RA, lsl #3
+ |->BC_RETV2_Z:
+ | tst CARG2, #FRAME_TYPEP
+ | bne ->vm_return
+ | // Return from vararg function: relocate BASE down.
+ | sub BASE, BASE, CARG2
+ | ldr PC, [BASE, FRAME_PC]
+ | b <1
+ break;
+
+ case BC_RET0: case BC_RET1:
+ | // RA = results, RC = nresults+1
+ | ldr PC, [BASE, FRAME_PC]
+ | lsl RC, RC, #3
+ | str RCw, SAVE_MULTRES
+ | ands CARG1, PC, #FRAME_TYPE
+ | eor CARG2, PC, #FRAME_VARG
+ | bne ->BC_RETV1_Z
+ | ldr INSw, [PC, #-4]
+ if (op == BC_RET1) {
+ | ldr TMP0, [BASE, RA, lsl #3]
+ }
+ | sub CARG4, BASE, #16
+ | decode_RA RA, INS
+ | sub BASE, CARG4, RA, lsl #3
+ if (op == BC_RET1) {
+ | str TMP0, [CARG4], #8
+ }
+ | decode_RB RB, INS
+ | ldr LFUNC:CARG1, [BASE, FRAME_FUNC]
+ |5:
+ | cmp RC, RB, lsl #3
+ | blo >6
+ | and LFUNC:CARG1, CARG1, #LJ_GCVMASK
+ | ldr CARG2, LFUNC:CARG1->pc
+ | ldr KBASE, [CARG2, #PC2PROTO(k)]
+ | ins_next
+ |
+ |6: // Fill up results with nil.
+ | add RC, RC, #8
+ | str TISNIL, [CARG4], #8
+ | b <5
+ break;
+
+ /* -- Loops and branches ------------------------------------------------ */
+
+ |.define FOR_IDX, [RA]; .define FOR_TIDX, [RA, #4]
+ |.define FOR_STOP, [RA, #8]; .define FOR_TSTOP, [RA, #12]
+ |.define FOR_STEP, [RA, #16]; .define FOR_TSTEP, [RA, #20]
+ |.define FOR_EXT, [RA, #24]; .define FOR_TEXT, [RA, #28]
+
+ case BC_FORL:
+ |.if JIT
+ | hotloop
+ |.endif
+ | // Fall through. Assumes BC_IFORL follows.
+ break;
+
+ case BC_JFORI:
+ case BC_JFORL:
+#if !LJ_HASJIT
+ break;
+#endif
+ case BC_FORI:
+ case BC_IFORL:
+ | // RA = base, RC = target (after end of loop or start of loop)
+ vk = (op == BC_IFORL || op == BC_JFORL);
+ | add RA, BASE, RA, lsl #3
+ | ldp CARG1, CARG2, FOR_IDX // CARG1 = IDX, CARG2 = STOP
+ | ldr CARG3, FOR_STEP // CARG3 = STEP
+ if (op != BC_JFORL) {
+ | add RC, PC, RC, lsl #2
+ | sub RC, RC, #0x20000
+ }
+ | checkint CARG1, >5
+ if (!vk) {
+ | checkint CARG2, ->vmeta_for
+ | checkint CARG3, ->vmeta_for
+ | tbnz CARG3w, #31, >4
+ | cmp CARG1w, CARG2w
+ } else {
+ | adds CARG1w, CARG1w, CARG3w
+ | bvs >2
+ | add TMP0, CARG1, TISNUM
+ | tbnz CARG3w, #31, >4
+ | cmp CARG1w, CARG2w
+ }
+ |1:
+ if (op == BC_FORI) {
+ | csel PC, RC, PC, gt
+ } else if (op == BC_JFORI) {
+ | ldrh RCw, [RC, #-2]
+ } else if (op == BC_IFORL) {
+ | csel PC, RC, PC, le
+ }
+ if (vk) {
+ | str TMP0, FOR_IDX
+ | str TMP0, FOR_EXT
+ } else {
+ | str CARG1, FOR_EXT
+ }
+ if (op == BC_JFORI || op == BC_JFORL) {
+ | ble =>BC_JLOOP
+ }
+ |2:
+ | ins_next
+ |
+ |4: // Invert check for negative step.
+ | cmp CARG2w, CARG1w
+ | b <1
+ |
+ |5: // FP loop.
+ | ldp d0, d1, FOR_IDX
+ | blo ->vmeta_for
+ if (!vk) {
+ | checknum CARG2, ->vmeta_for
+ | checknum CARG3, ->vmeta_for
+ | str d0, FOR_EXT
+ } else {
+ | ldr d2, FOR_STEP
+ | fadd d0, d0, d2
+ }
+ | tbnz CARG3, #63, >7
+ | fcmp d0, d1
+ |6:
+ if (vk) {
+ | str d0, FOR_IDX
+ | str d0, FOR_EXT
+ }
+ if (op == BC_FORI) {
+ | csel PC, RC, PC, hi
+ } else if (op == BC_JFORI) {
+ | ldrh RCw, [RC, #-2]
+ | bls =>BC_JLOOP
+ } else if (op == BC_IFORL) {
+ | csel PC, RC, PC, ls
+ } else {
+ | bls =>BC_JLOOP
+ }
+ | b <2
+ |
+ |7: // Invert check for negative step.
+ | fcmp d1, d0
+ | b <6
+ break;
+
+ case BC_ITERL:
+ |.if JIT
+ | hotloop
+ |.endif
+ | // Fall through. Assumes BC_IITERL follows.
+ break;
+
+ case BC_JITERL:
+#if !LJ_HASJIT
+ break;
+#endif
+ case BC_IITERL:
+ | // RA = base, RC = target
+ | ldr CARG1, [BASE, RA, lsl #3]
+ | add TMP1, BASE, RA, lsl #3
+ | cmp CARG1, TISNIL
+ | beq >1 // Stop if iterator returned nil.
+ if (op == BC_JITERL) {
+ | str CARG1, [TMP1, #-8]
+ | b =>BC_JLOOP
+ } else {
+ | add TMP0, PC, RC, lsl #2 // Otherwise save control var + branch.
+ | sub PC, TMP0, #0x20000
+ | str CARG1, [TMP1, #-8]
+ }
+ |1:
+ | ins_next
+ break;
+
+ case BC_LOOP:
+ | // RA = base, RC = target (loop extent)
+ | // Note: RA/RC is only used by trace recorder to determine scope/extent
+ | // This opcode does NOT jump, it's only purpose is to detect a hot loop.
+ |.if JIT
+ | hotloop
+ |.endif
+ | // Fall through. Assumes BC_ILOOP follows.
+ break;
+
+ case BC_ILOOP:
+ | // RA = base, RC = target (loop extent)
+ | ins_next
+ break;
+
+ case BC_JLOOP:
+ |.if JIT
+ | NYI
+ |.endif
+ break;
+
+ case BC_JMP:
+ | // RA = base (only used by trace recorder), RC = target
+ | add RC, PC, RC, lsl #2
+ | sub PC, RC, #0x20000
+ | ins_next
+ break;
+
+ /* -- Function headers -------------------------------------------------- */
+
+ case BC_FUNCF:
+ |.if JIT
+ | hotcall
+ |.endif
+ case BC_FUNCV: /* NYI: compiled vararg functions. */
+ | // Fall through. Assumes BC_IFUNCF/BC_IFUNCV follow.
+ break;
+
+ case BC_JFUNCF:
+#if !LJ_HASJIT
+ break;
+#endif
+ case BC_IFUNCF:
+ | // BASE = new base, RA = BASE+framesize*8, CARG3 = LFUNC, RC = nargs*8
+ | ldr CARG1, L->maxstack
+ | ldrb TMP1w, [PC, #-4+PC2PROTO(numparams)]
+ | ldr KBASE, [PC, #-4+PC2PROTO(k)]
+ | cmp RA, CARG1
+ | bhi ->vm_growstack_l
+ |2:
+ | cmp NARGS8:RC, TMP1, lsl #3 // Check for missing parameters.
+ | blo >3
+ if (op == BC_JFUNCF) {
+ | decode_RD RC, INS
+ | b =>BC_JLOOP
+ } else {
+ | ins_next
+ }
+ |
+ |3: // Clear missing parameters.
+ | str TISNIL, [BASE, NARGS8:RC]
+ | add NARGS8:RC, NARGS8:RC, #8
+ | b <2
+ break;
+
+ case BC_JFUNCV:
+#if !LJ_HASJIT
+ break;
+#endif
+ | NYI // NYI: compiled vararg functions
+ break; /* NYI: compiled vararg functions. */
+
+ case BC_IFUNCV:
+ | // BASE = new base, RA = BASE+framesize*8, CARG3 = LFUNC, RC = nargs*8
+ | ldr CARG1, L->maxstack
+ | add TMP2, BASE, RC
+ | add RA, RA, RC
+ | add TMP0, RC, #16+FRAME_VARG
+ | str LFUNC:CARG3, [TMP2], #8 // Store (untagged) copy of LFUNC.
+ | ldr KBASE, [PC, #-4+PC2PROTO(k)]
+ | cmp RA, CARG1
+ | str TMP0, [TMP2], #8 // Store delta + FRAME_VARG.
+ | bhs ->vm_growstack_l
+ | sub RC, TMP2, #16
+ | ldrb TMP1w, [PC, #-4+PC2PROTO(numparams)]
+ | mov RA, BASE
+ | mov BASE, TMP2
+ | cbz TMP1, >2
+ |1:
+ | cmp RA, RC // Less args than parameters?
+ | bhs >3
+ | ldr TMP0, [RA]
+ | sub TMP1, TMP1, #1
+ | str TISNIL, [RA], #8 // Clear old fixarg slot (help the GC).
+ | str TMP0, [TMP2], #8
+ | cbnz TMP1, <1
+ |2:
+ | ins_next
+ |
+ |3:
+ | sub TMP1, TMP1, #1
+ | str TISNIL, [TMP2], #8
+ | cbz TMP1, <2
+ | b <3
+ break;
+
+ case BC_FUNCC:
+ case BC_FUNCCW:
+ | // BASE = new base, RA = BASE+framesize*8, CARG3 = CFUNC, RC = nargs*8
+ if (op == BC_FUNCC) {
+ | ldr CARG4, CFUNC:CARG3->f
+ } else {
+ | ldr CARG4, GL->wrapf
+ }
+ | add CARG2, RA, NARGS8:RC
+ | ldr CARG1, L->maxstack
+ | add RC, BASE, NARGS8:RC
+ | cmp CARG2, CARG1
+ | stp BASE, RC, L->base
+ if (op == BC_FUNCCW) {
+ | ldr CARG2, CFUNC:CARG3->f
+ }
+ | mv_vmstate TMP0w, C
+ | mov CARG1, L
+ | bhi ->vm_growstack_c // Need to grow stack.
+ | st_vmstate TMP0w
+ | blr CARG4 // (lua_State *L [, lua_CFunction f])
+ | // Returns nresults.
+ | ldp BASE, TMP1, L->base
+ | str L, GL->cur_L
+ | sbfiz RC, CRET1, #3, #32
+ | st_vmstate ST_INTERP
+ | ldr PC, [BASE, FRAME_PC]
+ | sub RA, TMP1, RC // RA = L->top - nresults*8
+ | b ->vm_returnc
+ break;
+
+ /* ---------------------------------------------------------------------- */
+
+ default:
+ fprintf(stderr, "Error: undefined opcode BC_%s\n", bc_names[op]);
+ exit(2);
+ break;
+ }
+}
+
+static int build_backend(BuildCtx *ctx)
+{
+ int op;
+
+ dasm_growpc(Dst, BC__MAX);
+
+ build_subroutines(ctx);
+
+ |.code_op
+ for (op = 0; op < BC__MAX; op++)
+ build_ins(ctx, (BCOp)op, op);
+
+ return BC__MAX;
+}
+
+/* Emit pseudo frame-info for all assembler functions. */
+static void emit_asm_debug(BuildCtx *ctx)
+{
+ int fcofs = (int)((uint8_t *)ctx->glob[GLOB_vm_ffi_call] - ctx->code);
+ int i, cf = CFRAME_SIZE >> 3;
+ switch (ctx->mode) {
+ case BUILD_elfasm:
+ fprintf(ctx->fp, "\t.section .debug_frame,\"\",%%progbits\n");
+ fprintf(ctx->fp,
+ ".Lframe0:\n"
+ "\t.long .LECIE0-.LSCIE0\n"
+ ".LSCIE0:\n"
+ "\t.long 0xffffffff\n"
+ "\t.byte 0x1\n"
+ "\t.string \"\"\n"
+ "\t.uleb128 0x1\n"
+ "\t.sleb128 -8\n"
+ "\t.byte 30\n" /* Return address is in lr. */
+ "\t.byte 0xc\n\t.uleb128 31\n\t.uleb128 0\n" /* def_cfa sp */
+ "\t.align 3\n"
+ ".LECIE0:\n\n");
+ fprintf(ctx->fp,
+ ".LSFDE0:\n"
+ "\t.long .LEFDE0-.LASFDE0\n"
+ ".LASFDE0:\n"
+ "\t.long .Lframe0\n"
+ "\t.quad .Lbegin\n"
+ "\t.quad %d\n"
+ "\t.byte 0xe\n\t.uleb128 %d\n" /* def_cfa_offset */
+ "\t.byte 0x9d\n\t.uleb128 %d\n" /* offset fp */
+ "\t.byte 0x9e\n\t.uleb128 %d\n", /* offset lr */
+ fcofs, CFRAME_SIZE, cf, cf-1);
+ for (i = 19; i <= 28; i++) /* offset x19-x28 */
+ fprintf(ctx->fp, "\t.byte 0x%x\n\t.uleb128 %d\n", 0x80+i, cf-i+17);
+ for (i = 8; i <= 15; i++) /* offset d8-d15 */
+ fprintf(ctx->fp, "\t.byte 5\n\t.uleb128 0x%x\n\t.uleb128 %d\n",
+ 64+i, cf-i-4);
+ fprintf(ctx->fp,
+ "\t.align 3\n"
+ ".LEFDE0:\n\n");
+#if LJ_HASFFI
+ fprintf(ctx->fp,
+ ".LSFDE1:\n"
+ "\t.long .LEFDE1-.LASFDE1\n"
+ ".LASFDE1:\n"
+ "\t.long .Lframe0\n"
+ "\t.quad lj_vm_ffi_call\n"
+ "\t.quad %d\n"
+ "\t.byte 0xe\n\t.uleb128 32\n" /* def_cfa_offset */
+ "\t.byte 0x9d\n\t.uleb128 4\n" /* offset fp */
+ "\t.byte 0x9e\n\t.uleb128 3\n" /* offset lr */
+ "\t.byte 0x93\n\t.uleb128 2\n" /* offset x19 */
+ "\t.align 3\n"
+ ".LEFDE1:\n\n", (int)ctx->codesz - fcofs);
+#endif
+ fprintf(ctx->fp, "\t.section .eh_frame,\"a\",%%progbits\n");
+ fprintf(ctx->fp,
+ ".Lframe1:\n"
+ "\t.long .LECIE1-.LSCIE1\n"
+ ".LSCIE1:\n"
+ "\t.long 0\n"
+ "\t.byte 0x1\n"
+ "\t.string \"zPR\"\n"
+ "\t.uleb128 0x1\n"
+ "\t.sleb128 -8\n"
+ "\t.byte 30\n" /* Return address is in lr. */
+ "\t.uleb128 6\n" /* augmentation length */
+ "\t.byte 0x1b\n" /* pcrel|sdata4 */
+ "\t.long lj_err_unwind_dwarf-.\n"
+ "\t.byte 0x1b\n" /* pcrel|sdata4 */
+ "\t.byte 0xc\n\t.uleb128 31\n\t.uleb128 0\n" /* def_cfa sp */
+ "\t.align 3\n"
+ ".LECIE1:\n\n");
+ fprintf(ctx->fp,
+ ".LSFDE2:\n"
+ "\t.long .LEFDE2-.LASFDE2\n"
+ ".LASFDE2:\n"
+ "\t.long .LASFDE2-.Lframe1\n"
+ "\t.long .Lbegin-.\n"
+ "\t.long %d\n"
+ "\t.uleb128 0\n" /* augmentation length */
+ "\t.byte 0xe\n\t.uleb128 %d\n" /* def_cfa_offset */
+ "\t.byte 0x9d\n\t.uleb128 %d\n" /* offset fp */
+ "\t.byte 0x9e\n\t.uleb128 %d\n", /* offset lr */
+ fcofs, CFRAME_SIZE, cf, cf-1);
+ for (i = 19; i <= 28; i++) /* offset x19-x28 */
+ fprintf(ctx->fp, "\t.byte 0x%x\n\t.uleb128 %d\n", 0x80+i, cf-i+17);
+ for (i = 8; i <= 15; i++) /* offset d8-d15 */
+ fprintf(ctx->fp, "\t.byte 5\n\t.uleb128 0x%x\n\t.uleb128 %d\n",
+ 64+i, cf-i-4);
+ fprintf(ctx->fp,
+ "\t.align 3\n"
+ ".LEFDE2:\n\n");
+#if LJ_HASFFI
+ fprintf(ctx->fp,
+ ".Lframe2:\n"
+ "\t.long .LECIE2-.LSCIE2\n"
+ ".LSCIE2:\n"
+ "\t.long 0\n"
+ "\t.byte 0x1\n"
+ "\t.string \"zR\"\n"
+ "\t.uleb128 0x1\n"
+ "\t.sleb128 -8\n"
+ "\t.byte 30\n" /* Return address is in lr. */
+ "\t.uleb128 1\n" /* augmentation length */
+ "\t.byte 0x1b\n" /* pcrel|sdata4 */
+ "\t.byte 0xc\n\t.uleb128 31\n\t.uleb128 0\n" /* def_cfa sp */
+ "\t.align 3\n"
+ ".LECIE2:\n\n");
+ fprintf(ctx->fp,
+ ".LSFDE3:\n"
+ "\t.long .LEFDE3-.LASFDE3\n"
+ ".LASFDE3:\n"
+ "\t.long .LASFDE3-.Lframe2\n"
+ "\t.long lj_vm_ffi_call-.\n"
+ "\t.long %d\n"
+ "\t.uleb128 0\n" /* augmentation length */
+ "\t.byte 0xe\n\t.uleb128 32\n" /* def_cfa_offset */
+ "\t.byte 0x9d\n\t.uleb128 4\n" /* offset fp */
+ "\t.byte 0x9e\n\t.uleb128 3\n" /* offset lr */
+ "\t.byte 0x93\n\t.uleb128 2\n" /* offset x19 */
+ "\t.align 3\n"
+ ".LEFDE3:\n\n", (int)ctx->codesz - fcofs);
+#endif
+ break;
+ default:
+ break;
+ }
+}
+
|// Low-level VM code for MIPS CPUs.
|// Bytecode interpreter, fast functions and helper functions.
- |// Copyright (C) 2005-2015 Mike Pall. See Copyright Notice in luajit.h
+ |// Copyright (C) 2005-2016 Mike Pall. See Copyright Notice in luajit.h
+|//
+|// MIPS soft-float support contributed by Djordje Kovacevic and
+|// Stefan Pejic from RT-RK.com, sponsored by Cisco Systems, Inc.
|
|.arch mips
|.section code_op, code_sub
-|// Low-level VM code for PowerPC CPUs.
+|// Low-level VM code for PowerPC 32 bit or 32on64 bit mode.
|// Bytecode interpreter, fast functions and helper functions.
- |// Copyright (C) 2005-2015 Mike Pall. See Copyright Notice in luajit.h
+ |// Copyright (C) 2005-2016 Mike Pall. See Copyright Notice in luajit.h
|
|.arch ppc
|.section code_op, code_sub
--- /dev/null
- |// Copyright (C) 2005-2015 Mike Pall. See Copyright Notice in luajit.h
+|// Low-level VM code for x64 CPUs in LJ_GC64 mode.
+|// Bytecode interpreter, fast functions and helper functions.
++|// Copyright (C) 2005-2016 Mike Pall. See Copyright Notice in luajit.h
+|
+|.arch x64
+|.section code_op, code_sub
+|
+|.actionlist build_actionlist
+|.globals GLOB_
+|.globalnames globnames
+|.externnames extnames
+|
+|//-----------------------------------------------------------------------
+|
+|.if WIN
+|.define X64WIN, 1 // Windows/x64 calling conventions.
+|.endif
+|
+|// Fixed register assignments for the interpreter.
+|// This is very fragile and has many dependencies. Caveat emptor.
+|.define BASE, rdx // Not C callee-save, refetched anyway.
+|.if X64WIN
+|.define KBASE, rdi // Must be C callee-save.
+|.define PC, rsi // Must be C callee-save.
+|.define DISPATCH, rbx // Must be C callee-save.
+|.define KBASEd, edi
+|.define PCd, esi
+|.define DISPATCHd, ebx
+|.else
+|.define KBASE, r15 // Must be C callee-save.
+|.define PC, rbx // Must be C callee-save.
+|.define DISPATCH, r14 // Must be C callee-save.
+|.define KBASEd, r15d
+|.define PCd, ebx
+|.define DISPATCHd, r14d
+|.endif
+|
+|.define RA, rcx
+|.define RAd, ecx
+|.define RAH, ch
+|.define RAL, cl
+|.define RB, rbp // Must be rbp (C callee-save).
+|.define RBd, ebp
+|.define RC, rax // Must be rax.
+|.define RCd, eax
+|.define RCW, ax
+|.define RCH, ah
+|.define RCL, al
+|.define OP, RBd
+|.define RD, RC
+|.define RDd, RCd
+|.define RDW, RCW
+|.define RDL, RCL
+|.define TMPR, r10
+|.define TMPRd, r10d
+|.define ITYPE, r11
+|.define ITYPEd, r11d
+|
+|.if X64WIN
+|.define CARG1, rcx // x64/WIN64 C call arguments.
+|.define CARG2, rdx
+|.define CARG3, r8
+|.define CARG4, r9
+|.define CARG1d, ecx
+|.define CARG2d, edx
+|.define CARG3d, r8d
+|.define CARG4d, r9d
+|.else
+|.define CARG1, rdi // x64/POSIX C call arguments.
+|.define CARG2, rsi
+|.define CARG3, rdx
+|.define CARG4, rcx
+|.define CARG5, r8
+|.define CARG6, r9
+|.define CARG1d, edi
+|.define CARG2d, esi
+|.define CARG3d, edx
+|.define CARG4d, ecx
+|.define CARG5d, r8d
+|.define CARG6d, r9d
+|.endif
+|
+|// Type definitions. Some of these are only used for documentation.
+|.type L, lua_State
+|.type GL, global_State
+|.type TVALUE, TValue
+|.type GCOBJ, GCobj
+|.type STR, GCstr
+|.type TAB, GCtab
+|.type LFUNC, GCfuncL
+|.type CFUNC, GCfuncC
+|.type PROTO, GCproto
+|.type UPVAL, GCupval
+|.type NODE, Node
+|.type NARGS, int
+|.type TRACE, GCtrace
+|.type SBUF, SBuf
+|
+|// Stack layout while in interpreter. Must match with lj_frame.h.
+|//-----------------------------------------------------------------------
+|.if X64WIN // x64/Windows stack layout
+|
+|.define CFRAME_SPACE, aword*5 // Delta for rsp (see <--).
+|.macro saveregs_
+| push rdi; push rsi; push rbx
+| sub rsp, CFRAME_SPACE
+|.endmacro
+|.macro saveregs
+| push rbp; saveregs_
+|.endmacro
+|.macro restoreregs
+| add rsp, CFRAME_SPACE
+| pop rbx; pop rsi; pop rdi; pop rbp
+|.endmacro
+|
+|.define SAVE_CFRAME, aword [rsp+aword*13]
+|.define SAVE_PC, aword [rsp+aword*12]
+|.define SAVE_L, aword [rsp+aword*11]
+|.define SAVE_ERRF, dword [rsp+dword*21]
+|.define SAVE_NRES, dword [rsp+dword*20]
+|//----- 16 byte aligned, ^^^ 32 byte register save area, owned by interpreter
+|.define SAVE_RET, aword [rsp+aword*9] //<-- rsp entering interpreter.
+|.define SAVE_R4, aword [rsp+aword*8]
+|.define SAVE_R3, aword [rsp+aword*7]
+|.define SAVE_R2, aword [rsp+aword*6]
+|.define SAVE_R1, aword [rsp+aword*5] //<-- rsp after register saves.
+|.define ARG5, aword [rsp+aword*4]
+|.define CSAVE_4, aword [rsp+aword*3]
+|.define CSAVE_3, aword [rsp+aword*2]
+|.define CSAVE_2, aword [rsp+aword*1]
+|.define CSAVE_1, aword [rsp] //<-- rsp while in interpreter.
+|//----- 16 byte aligned, ^^^ 32 byte register save area, owned by callee
+|
+|.define ARG5d, dword [rsp+dword*8]
+|.define TMP1, ARG5 // TMP1 overlaps ARG5
+|.define TMP1d, ARG5d
+|.define TMP1hi, dword [rsp+dword*9]
+|.define MULTRES, TMP1d // MULTRES overlaps TMP1d.
+|
+|//-----------------------------------------------------------------------
+|.else // x64/POSIX stack layout
+|
+|.define CFRAME_SPACE, aword*5 // Delta for rsp (see <--).
+|.macro saveregs_
+| push rbx; push r15; push r14
+|.if NO_UNWIND
+| push r13; push r12
+|.endif
+| sub rsp, CFRAME_SPACE
+|.endmacro
+|.macro saveregs
+| push rbp; saveregs_
+|.endmacro
+|.macro restoreregs
+| add rsp, CFRAME_SPACE
+|.if NO_UNWIND
+| pop r12; pop r13
+|.endif
+| pop r14; pop r15; pop rbx; pop rbp
+|.endmacro
+|
+|//----- 16 byte aligned,
+|.if NO_UNWIND
+|.define SAVE_RET, aword [rsp+aword*11] //<-- rsp entering interpreter.
+|.define SAVE_R4, aword [rsp+aword*10]
+|.define SAVE_R3, aword [rsp+aword*9]
+|.define SAVE_R2, aword [rsp+aword*8]
+|.define SAVE_R1, aword [rsp+aword*7]
+|.define SAVE_RU2, aword [rsp+aword*6]
+|.define SAVE_RU1, aword [rsp+aword*5] //<-- rsp after register saves.
+|.else
+|.define SAVE_RET, aword [rsp+aword*9] //<-- rsp entering interpreter.
+|.define SAVE_R4, aword [rsp+aword*8]
+|.define SAVE_R3, aword [rsp+aword*7]
+|.define SAVE_R2, aword [rsp+aword*6]
+|.define SAVE_R1, aword [rsp+aword*5] //<-- rsp after register saves.
+|.endif
+|.define SAVE_CFRAME, aword [rsp+aword*4]
+|.define SAVE_PC, aword [rsp+aword*3]
+|.define SAVE_L, aword [rsp+aword*2]
+|.define SAVE_ERRF, dword [rsp+dword*3]
+|.define SAVE_NRES, dword [rsp+dword*2]
+|.define TMP1, aword [rsp] //<-- rsp while in interpreter.
+|//----- 16 byte aligned
+|
+|.define TMP1d, dword [rsp]
+|.define TMP1hi, dword [rsp+dword*1]
+|.define MULTRES, TMP1d // MULTRES overlaps TMP1d.
+|
+|.endif
+|
+|//-----------------------------------------------------------------------
+|
+|// Instruction headers.
+|.macro ins_A; .endmacro
+|.macro ins_AD; .endmacro
+|.macro ins_AJ; .endmacro
+|.macro ins_ABC; movzx RBd, RCH; movzx RCd, RCL; .endmacro
+|.macro ins_AB_; movzx RBd, RCH; .endmacro
+|.macro ins_A_C; movzx RCd, RCL; .endmacro
+|.macro ins_AND; not RD; .endmacro
+|
+|// Instruction decode+dispatch. Carefully tuned (nope, lodsd is not faster).
+|.macro ins_NEXT
+| mov RCd, [PC]
+| movzx RAd, RCH
+| movzx OP, RCL
+| add PC, 4
+| shr RCd, 16
+| jmp aword [DISPATCH+OP*8]
+|.endmacro
+|
+|// Instruction footer.
+|.if 1
+| // Replicated dispatch. Less unpredictable branches, but higher I-Cache use.
+| .define ins_next, ins_NEXT
+| .define ins_next_, ins_NEXT
+|.else
+| // Common dispatch. Lower I-Cache use, only one (very) unpredictable branch.
+| // Affects only certain kinds of benchmarks (and only with -j off).
+| // Around 10%-30% slower on Core2, a lot more slower on P4.
+| .macro ins_next
+| jmp ->ins_next
+| .endmacro
+| .macro ins_next_
+| ->ins_next:
+| ins_NEXT
+| .endmacro
+|.endif
+|
+|// Call decode and dispatch.
+|.macro ins_callt
+| // BASE = new base, RB = LFUNC, RD = nargs+1, [BASE-8] = PC
+| mov PC, LFUNC:RB->pc
+| mov RAd, [PC]
+| movzx OP, RAL
+| movzx RAd, RAH
+| add PC, 4
+| jmp aword [DISPATCH+OP*8]
+|.endmacro
+|
+|.macro ins_call
+| // BASE = new base, RB = LFUNC, RD = nargs+1
+| mov [BASE-8], PC
+| ins_callt
+|.endmacro
+|
+|//-----------------------------------------------------------------------
+|
+|// Macros to clear or set tags.
+|.macro cleartp, reg; shl reg, 17; shr reg, 17; .endmacro
+|.macro settp, reg, tp
+| mov64 ITYPE, ((int64_t)tp<<47)
+| or reg, ITYPE
+|.endmacro
+|.macro settp, dst, reg, tp
+| mov64 dst, ((int64_t)tp<<47)
+| or dst, reg
+|.endmacro
+|.macro setint, reg
+| settp reg, LJ_TISNUM
+|.endmacro
+|.macro setint, dst, reg
+| settp dst, reg, LJ_TISNUM
+|.endmacro
+|
+|// Macros to test operand types.
+|.macro checktp_nc, reg, tp, target
+| mov ITYPE, reg
+| sar ITYPE, 47
+| cmp ITYPEd, tp
+| jne target
+|.endmacro
+|.macro checktp, reg, tp, target
+| mov ITYPE, reg
+| cleartp reg
+| sar ITYPE, 47
+| cmp ITYPEd, tp
+| jne target
+|.endmacro
+|.macro checktptp, src, tp, target
+| mov ITYPE, src
+| sar ITYPE, 47
+| cmp ITYPEd, tp
+| jne target
+|.endmacro
+|.macro checkstr, reg, target; checktp reg, LJ_TSTR, target; .endmacro
+|.macro checktab, reg, target; checktp reg, LJ_TTAB, target; .endmacro
+|.macro checkfunc, reg, target; checktp reg, LJ_TFUNC, target; .endmacro
+|
+|.macro checknumx, reg, target, jump
+| mov ITYPE, reg
+| sar ITYPE, 47
+| cmp ITYPEd, LJ_TISNUM
+| jump target
+|.endmacro
+|.macro checkint, reg, target; checknumx reg, target, jne; .endmacro
+|.macro checkinttp, src, target; checknumx src, target, jne; .endmacro
+|.macro checknum, reg, target; checknumx reg, target, jae; .endmacro
+|.macro checknumtp, src, target; checknumx src, target, jae; .endmacro
+|.macro checknumber, src, target; checknumx src, target, ja; .endmacro
+|
+|.macro mov_false, reg; mov64 reg, (int64_t)~((uint64_t)1<<47); .endmacro
+|.macro mov_true, reg; mov64 reg, (int64_t)~((uint64_t)2<<47); .endmacro
+|
+|// These operands must be used with movzx.
+|.define PC_OP, byte [PC-4]
+|.define PC_RA, byte [PC-3]
+|.define PC_RB, byte [PC-1]
+|.define PC_RC, byte [PC-2]
+|.define PC_RD, word [PC-2]
+|
+|.macro branchPC, reg
+| lea PC, [PC+reg*4-BCBIAS_J*4]
+|.endmacro
+|
+|// Assumes DISPATCH is relative to GL.
+#define DISPATCH_GL(field) (GG_DISP2G + (int)offsetof(global_State, field))
+#define DISPATCH_J(field) (GG_DISP2J + (int)offsetof(jit_State, field))
+|
+#define PC2PROTO(field) ((int)offsetof(GCproto, field)-(int)sizeof(GCproto))
+|
+|// Decrement hashed hotcount and trigger trace recorder if zero.
+|.macro hotloop, reg
+| mov reg, PCd
+| shr reg, 1
+| and reg, HOTCOUNT_PCMASK
+| sub word [DISPATCH+reg+GG_DISP2HOT], HOTCOUNT_LOOP
+| jb ->vm_hotloop
+|.endmacro
+|
+|.macro hotcall, reg
+| mov reg, PCd
+| shr reg, 1
+| and reg, HOTCOUNT_PCMASK
+| sub word [DISPATCH+reg+GG_DISP2HOT], HOTCOUNT_CALL
+| jb ->vm_hotcall
+|.endmacro
+|
+|// Set current VM state.
+|.macro set_vmstate, st
+| mov dword [DISPATCH+DISPATCH_GL(vmstate)], ~LJ_VMST_..st
+|.endmacro
+|
+|.macro fpop1; fstp st1; .endmacro
+|
+|// Synthesize SSE FP constants.
+|.macro sseconst_abs, reg, tmp // Synthesize abs mask.
+| mov64 tmp, U64x(7fffffff,ffffffff); movd reg, tmp
+|.endmacro
+|
+|.macro sseconst_hi, reg, tmp, val // Synthesize hi-32 bit const.
+| mov64 tmp, U64x(val,00000000); movd reg, tmp
+|.endmacro
+|
+|.macro sseconst_sign, reg, tmp // Synthesize sign mask.
+| sseconst_hi reg, tmp, 80000000
+|.endmacro
+|.macro sseconst_1, reg, tmp // Synthesize 1.0.
+| sseconst_hi reg, tmp, 3ff00000
+|.endmacro
+|.macro sseconst_m1, reg, tmp // Synthesize -1.0.
+| sseconst_hi reg, tmp, bff00000
+|.endmacro
+|.macro sseconst_2p52, reg, tmp // Synthesize 2^52.
+| sseconst_hi reg, tmp, 43300000
+|.endmacro
+|.macro sseconst_tobit, reg, tmp // Synthesize 2^52 + 2^51.
+| sseconst_hi reg, tmp, 43380000
+|.endmacro
+|
+|// Move table write barrier back. Overwrites reg.
+|.macro barrierback, tab, reg
+| and byte tab->marked, (uint8_t)~LJ_GC_BLACK // black2gray(tab)
+| mov reg, [DISPATCH+DISPATCH_GL(gc.grayagain)]
+| mov [DISPATCH+DISPATCH_GL(gc.grayagain)], tab
+| mov tab->gclist, reg
+|.endmacro
+|
+|//-----------------------------------------------------------------------
+
+/* Generate subroutines used by opcodes and other parts of the VM. */
+/* The .code_sub section should be last to help static branch prediction. */
+static void build_subroutines(BuildCtx *ctx)
+{
+ |.code_sub
+ |
+ |//-----------------------------------------------------------------------
+ |//-- Return handling ----------------------------------------------------
+ |//-----------------------------------------------------------------------
+ |
+ |->vm_returnp:
+ | test PCd, FRAME_P
+ | jz ->cont_dispatch
+ |
+ | // Return from pcall or xpcall fast func.
+ | and PC, -8
+ | sub BASE, PC // Restore caller base.
+ | lea RA, [RA+PC-8] // Rebase RA and prepend one result.
+ | mov PC, [BASE-8] // Fetch PC of previous frame.
+ | // Prepending may overwrite the pcall frame, so do it at the end.
+ | mov_true ITYPE
+ | mov aword [BASE+RA], ITYPE // Prepend true to results.
+ |
+ |->vm_returnc:
+ | add RDd, 1 // RD = nresults+1
+ | jz ->vm_unwind_yield
+ | mov MULTRES, RDd
+ | test PC, FRAME_TYPE
+ | jz ->BC_RET_Z // Handle regular return to Lua.
+ |
+ |->vm_return:
+ | // BASE = base, RA = resultofs, RD = nresults+1 (= MULTRES), PC = return
+ | xor PC, FRAME_C
+ | test PCd, FRAME_TYPE
+ | jnz ->vm_returnp
+ |
+ | // Return to C.
+ | set_vmstate C
+ | and PC, -8
+ | sub PC, BASE
+ | neg PC // Previous base = BASE - delta.
+ |
+ | sub RDd, 1
+ | jz >2
+ |1: // Move results down.
+ | mov RB, [BASE+RA]
+ | mov [BASE-16], RB
+ | add BASE, 8
+ | sub RDd, 1
+ | jnz <1
+ |2:
+ | mov L:RB, SAVE_L
+ | mov L:RB->base, PC
+ |3:
+ | mov RDd, MULTRES
+ | mov RAd, SAVE_NRES // RA = wanted nresults+1
+ |4:
+ | cmp RAd, RDd
+ | jne >6 // More/less results wanted?
+ |5:
+ | sub BASE, 16
+ | mov L:RB->top, BASE
+ |
+ |->vm_leave_cp:
+ | mov RA, SAVE_CFRAME // Restore previous C frame.
+ | mov L:RB->cframe, RA
+ | xor eax, eax // Ok return status for vm_pcall.
+ |
+ |->vm_leave_unw:
+ | restoreregs
+ | ret
+ |
+ |6:
+ | jb >7 // Less results wanted?
+ | // More results wanted. Check stack size and fill up results with nil.
+ | cmp BASE, L:RB->maxstack
+ | ja >8
+ | mov aword [BASE-16], LJ_TNIL
+ | add BASE, 8
+ | add RDd, 1
+ | jmp <4
+ |
+ |7: // Less results wanted.
+ | test RAd, RAd
+ | jz <5 // But check for LUA_MULTRET+1.
+ | sub RA, RD // Negative result!
+ | lea BASE, [BASE+RA*8] // Correct top.
+ | jmp <5
+ |
+ |8: // Corner case: need to grow stack for filling up results.
+ | // This can happen if:
+ | // - A C function grows the stack (a lot).
+ | // - The GC shrinks the stack in between.
+ | // - A return back from a lua_call() with (high) nresults adjustment.
+ | mov L:RB->top, BASE // Save current top held in BASE (yes).
+ | mov MULTRES, RDd // Need to fill only remainder with nil.
+ | mov CARG2d, RAd
+ | mov CARG1, L:RB
+ | call extern lj_state_growstack // (lua_State *L, int n)
+ | mov BASE, L:RB->top // Need the (realloced) L->top in BASE.
+ | jmp <3
+ |
+ |->vm_unwind_yield:
+ | mov al, LUA_YIELD
+ | jmp ->vm_unwind_c_eh
+ |
+ |->vm_unwind_c: // Unwind C stack, return from vm_pcall.
+ | // (void *cframe, int errcode)
+ | mov eax, CARG2d // Error return status for vm_pcall.
+ | mov rsp, CARG1
+ |->vm_unwind_c_eh: // Landing pad for external unwinder.
+ | mov L:RB, SAVE_L
+ | mov GL:RB, L:RB->glref
+ | mov dword GL:RB->vmstate, ~LJ_VMST_C
+ | jmp ->vm_leave_unw
+ |
+ |->vm_unwind_rethrow:
+ |.if not X64WIN
+ | mov CARG1, SAVE_L
+ | mov CARG2d, eax
+ | restoreregs
+ | jmp extern lj_err_throw // (lua_State *L, int errcode)
+ |.endif
+ |
+ |->vm_unwind_ff: // Unwind C stack, return from ff pcall.
+ | // (void *cframe)
+ | and CARG1, CFRAME_RAWMASK
+ | mov rsp, CARG1
+ |->vm_unwind_ff_eh: // Landing pad for external unwinder.
+ | mov L:RB, SAVE_L
+ | mov RDd, 1+1 // Really 1+2 results, incr. later.
+ | mov BASE, L:RB->base
+ | mov DISPATCH, L:RB->glref // Setup pointer to dispatch table.
+ | add DISPATCH, GG_G2DISP
+ | mov PC, [BASE-8] // Fetch PC of previous frame.
+ | mov_false RA
+ | mov RB, [BASE]
+ | mov [BASE-16], RA // Prepend false to error message.
+ | mov [BASE-8], RB
+ | mov RA, -16 // Results start at BASE+RA = BASE-16.
+ | set_vmstate INTERP
+ | jmp ->vm_returnc // Increments RD/MULTRES and returns.
+ |
+ |//-----------------------------------------------------------------------
+ |//-- Grow stack for calls -----------------------------------------------
+ |//-----------------------------------------------------------------------
+ |
+ |->vm_growstack_c: // Grow stack for C function.
+ | mov CARG2d, LUA_MINSTACK
+ | jmp >2
+ |
+ |->vm_growstack_v: // Grow stack for vararg Lua function.
+ | sub RD, 16 // LJ_FR2
+ | jmp >1
+ |
+ |->vm_growstack_f: // Grow stack for fixarg Lua function.
+ | // BASE = new base, RD = nargs+1, RB = L, PC = first PC
+ | lea RD, [BASE+NARGS:RD*8-8]
+ |1:
+ | movzx RAd, byte [PC-4+PC2PROTO(framesize)]
+ | add PC, 4 // Must point after first instruction.
+ | mov L:RB->base, BASE
+ | mov L:RB->top, RD
+ | mov SAVE_PC, PC
+ | mov CARG2, RA
+ |2:
+ | // RB = L, L->base = new base, L->top = top
+ | mov CARG1, L:RB
+ | call extern lj_state_growstack // (lua_State *L, int n)
+ | mov BASE, L:RB->base
+ | mov RD, L:RB->top
+ | mov LFUNC:RB, [BASE-16]
+ | cleartp LFUNC:RB
+ | sub RD, BASE
+ | shr RDd, 3
+ | add NARGS:RDd, 1
+ | // BASE = new base, RB = LFUNC, RD = nargs+1
+ | ins_callt // Just retry the call.
+ |
+ |//-----------------------------------------------------------------------
+ |//-- Entry points into the assembler VM ---------------------------------
+ |//-----------------------------------------------------------------------
+ |
+ |->vm_resume: // Setup C frame and resume thread.
+ | // (lua_State *L, TValue *base, int nres1 = 0, ptrdiff_t ef = 0)
+ | saveregs
+ | mov L:RB, CARG1 // Caveat: CARG1 may be RA.
+ | mov SAVE_L, CARG1
+ | mov RA, CARG2
+ | mov PCd, FRAME_CP
+ | xor RDd, RDd
+ | lea KBASE, [esp+CFRAME_RESUME]
+ | mov DISPATCH, L:RB->glref // Setup pointer to dispatch table.
+ | add DISPATCH, GG_G2DISP
+ | mov SAVE_PC, RD // Any value outside of bytecode is ok.
+ | mov SAVE_CFRAME, RD
+ | mov SAVE_NRES, RDd
+ | mov SAVE_ERRF, RDd
+ | mov L:RB->cframe, KBASE
+ | cmp byte L:RB->status, RDL
+ | je >2 // Initial resume (like a call).
+ |
+ | // Resume after yield (like a return).
+ | mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
+ | set_vmstate INTERP
+ | mov byte L:RB->status, RDL
+ | mov BASE, L:RB->base
+ | mov RD, L:RB->top
+ | sub RD, RA
+ | shr RDd, 3
+ | add RDd, 1 // RD = nresults+1
+ | sub RA, BASE // RA = resultofs
+ | mov PC, [BASE-8]
+ | mov MULTRES, RDd
+ | test PCd, FRAME_TYPE
+ | jz ->BC_RET_Z
+ | jmp ->vm_return
+ |
+ |->vm_pcall: // Setup protected C frame and enter VM.
+ | // (lua_State *L, TValue *base, int nres1, ptrdiff_t ef)
+ | saveregs
+ | mov PCd, FRAME_CP
+ | mov SAVE_ERRF, CARG4d
+ | jmp >1
+ |
+ |->vm_call: // Setup C frame and enter VM.
+ | // (lua_State *L, TValue *base, int nres1)
+ | saveregs
+ | mov PCd, FRAME_C
+ |
+ |1: // Entry point for vm_pcall above (PC = ftype).
+ | mov SAVE_NRES, CARG3d
+ | mov L:RB, CARG1 // Caveat: CARG1 may be RA.
+ | mov SAVE_L, CARG1
+ | mov RA, CARG2
+ |
+ | mov DISPATCH, L:RB->glref // Setup pointer to dispatch table.
+ | mov KBASE, L:RB->cframe // Add our C frame to cframe chain.
+ | mov SAVE_CFRAME, KBASE
+ | mov SAVE_PC, L:RB // Any value outside of bytecode is ok.
+ | add DISPATCH, GG_G2DISP
+ | mov L:RB->cframe, rsp
+ |
+ |2: // Entry point for vm_resume/vm_cpcall (RA = base, RB = L, PC = ftype).
+ | mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
+ | set_vmstate INTERP
+ | mov BASE, L:RB->base // BASE = old base (used in vmeta_call).
+ | add PC, RA
+ | sub PC, BASE // PC = frame delta + frame type
+ |
+ | mov RD, L:RB->top
+ | sub RD, RA
+ | shr NARGS:RDd, 3
+ | add NARGS:RDd, 1 // RD = nargs+1
+ |
+ |->vm_call_dispatch:
+ | mov LFUNC:RB, [RA-16]
+ | checkfunc LFUNC:RB, ->vmeta_call // Ensure KBASE defined and != BASE.
+ |
+ |->vm_call_dispatch_f:
+ | mov BASE, RA
+ | ins_call
+ | // BASE = new base, RB = func, RD = nargs+1, PC = caller PC
+ |
+ |->vm_cpcall: // Setup protected C frame, call C.
+ | // (lua_State *L, lua_CFunction func, void *ud, lua_CPFunction cp)
+ | saveregs
+ | mov L:RB, CARG1 // Caveat: CARG1 may be RA.
+ | mov SAVE_L, CARG1
+ | mov SAVE_PC, L:RB // Any value outside of bytecode is ok.
+ |
+ | mov KBASE, L:RB->stack // Compute -savestack(L, L->top).
+ | sub KBASE, L:RB->top
+ | mov DISPATCH, L:RB->glref // Setup pointer to dispatch table.
+ | mov SAVE_ERRF, 0 // No error function.
+ | mov SAVE_NRES, KBASEd // Neg. delta means cframe w/o frame.
+ | add DISPATCH, GG_G2DISP
+ | // Handler may change cframe_nres(L->cframe) or cframe_errfunc(L->cframe).
+ |
+ | mov KBASE, L:RB->cframe // Add our C frame to cframe chain.
+ | mov SAVE_CFRAME, KBASE
+ | mov L:RB->cframe, rsp
+ | mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
+ |
+ | call CARG4 // (lua_State *L, lua_CFunction func, void *ud)
+ | // TValue * (new base) or NULL returned in eax (RC).
+ | test RC, RC
+ | jz ->vm_leave_cp // No base? Just remove C frame.
+ | mov RA, RC
+ | mov PCd, FRAME_CP
+ | jmp <2 // Else continue with the call.
+ |
+ |//-----------------------------------------------------------------------
+ |//-- Metamethod handling ------------------------------------------------
+ |//-----------------------------------------------------------------------
+ |
+ |//-- Continuation dispatch ----------------------------------------------
+ |
+ |->cont_dispatch:
+ | // BASE = meta base, RA = resultofs, RD = nresults+1 (also in MULTRES)
+ | add RA, BASE
+ | and PC, -8
+ | mov RB, BASE
+ | sub BASE, PC // Restore caller BASE.
+ | mov aword [RA+RD*8-8], LJ_TNIL // Ensure one valid arg.
+ | mov RC, RA // ... in [RC]
+ | mov PC, [RB-24] // Restore PC from [cont|PC].
+ | mov RA, qword [RB-32] // May be negative on WIN64 with debug.
+ |.if FFI
+ | cmp RA, 1
+ | jbe >1
+ |.endif
+ | mov LFUNC:KBASE, [BASE-16]
+ | cleartp LFUNC:KBASE
+ | mov KBASE, LFUNC:KBASE->pc
+ | mov KBASE, [KBASE+PC2PROTO(k)]
+ | // BASE = base, RC = result, RB = meta base
+ | jmp RA // Jump to continuation.
+ |
+ |.if FFI
+ |1:
+ | je ->cont_ffi_callback // cont = 1: return from FFI callback.
+ | // cont = 0: Tail call from C function.
+ | sub RB, BASE
+ | shr RBd, 3
+ | lea RDd, [RBd-3]
+ | jmp ->vm_call_tail
+ |.endif
+ |
+ |->cont_cat: // BASE = base, RC = result, RB = mbase
+ | movzx RAd, PC_RB
+ | sub RB, 32
+ | lea RA, [BASE+RA*8]
+ | sub RA, RB
+ | je ->cont_ra
+ | neg RA
+ | shr RAd, 3
+ |.if X64WIN
+ | mov CARG3d, RAd
+ | mov L:CARG1, SAVE_L
+ | mov L:CARG1->base, BASE
+ | mov RC, [RC]
+ | mov [RB], RC
+ | mov CARG2, RB
+ |.else
+ | mov L:CARG1, SAVE_L
+ | mov L:CARG1->base, BASE
+ | mov CARG3d, RAd
+ | mov RA, [RC]
+ | mov [RB], RA
+ | mov CARG2, RB
+ |.endif
+ | jmp ->BC_CAT_Z
+ |
+ |//-- Table indexing metamethods -----------------------------------------
+ |
+ |->vmeta_tgets:
+ | settp STR:RC, LJ_TSTR // STR:RC = GCstr *
+ | mov TMP1, STR:RC
+ | lea RC, TMP1
+ | cmp PC_OP, BC_GGET
+ | jne >1
+ | settp TAB:RA, TAB:RB, LJ_TTAB // TAB:RB = GCtab *
+ | lea RB, [DISPATCH+DISPATCH_GL(tmptv)] // Store fn->l.env in g->tmptv.
+ | mov [RB], TAB:RA
+ | jmp >2
+ |
+ |->vmeta_tgetb:
+ | movzx RCd, PC_RC
+ |.if DUALNUM
+ | setint RC
+ | mov TMP1, RC
+ |.else
+ | cvtsi2sd xmm0, RCd
+ | movsd TMP1, xmm0
+ |.endif
+ | lea RC, TMP1
+ | jmp >1
+ |
+ |->vmeta_tgetv:
+ | movzx RCd, PC_RC // Reload TValue *k from RC.
+ | lea RC, [BASE+RC*8]
+ |1:
+ | movzx RBd, PC_RB // Reload TValue *t from RB.
+ | lea RB, [BASE+RB*8]
+ |2:
+ | mov L:CARG1, SAVE_L
+ | mov L:CARG1->base, BASE // Caveat: CARG2/CARG3 may be BASE.
+ | mov CARG2, RB
+ | mov CARG3, RC
+ | mov L:RB, L:CARG1
+ | mov SAVE_PC, PC
+ | call extern lj_meta_tget // (lua_State *L, TValue *o, TValue *k)
+ | // TValue * (finished) or NULL (metamethod) returned in eax (RC).
+ | mov BASE, L:RB->base
+ | test RC, RC
+ | jz >3
+ |->cont_ra: // BASE = base, RC = result
+ | movzx RAd, PC_RA
+ | mov RB, [RC]
+ | mov [BASE+RA*8], RB
+ | ins_next
+ |
+ |3: // Call __index metamethod.
+ | // BASE = base, L->top = new base, stack = cont/func/t/k
+ | mov RA, L:RB->top
+ | mov [RA-24], PC // [cont|PC]
+ | lea PC, [RA+FRAME_CONT]
+ | sub PC, BASE
+ | mov LFUNC:RB, [RA-16] // Guaranteed to be a function here.
+ | mov NARGS:RDd, 2+1 // 2 args for func(t, k).
+ | cleartp LFUNC:RB
+ | jmp ->vm_call_dispatch_f
+ |
+ |->vmeta_tgetr:
+ | mov CARG1, TAB:RB
+ | mov RB, BASE // Save BASE.
+ | mov CARG2d, RCd // Caveat: CARG2 == BASE
+ | call extern lj_tab_getinth // (GCtab *t, int32_t key)
+ | // cTValue * or NULL returned in eax (RC).
+ | movzx RAd, PC_RA
+ | mov BASE, RB // Restore BASE.
+ | test RC, RC
+ | jnz ->BC_TGETR_Z
+ | mov ITYPE, LJ_TNIL
+ | jmp ->BC_TGETR2_Z
+ |
+ |//-----------------------------------------------------------------------
+ |
+ |->vmeta_tsets:
+ | settp STR:RC, LJ_TSTR // STR:RC = GCstr *
+ | mov TMP1, STR:RC
+ | lea RC, TMP1
+ | cmp PC_OP, BC_GSET
+ | jne >1
+ | settp TAB:RA, TAB:RB, LJ_TTAB // TAB:RB = GCtab *
+ | lea RB, [DISPATCH+DISPATCH_GL(tmptv)] // Store fn->l.env in g->tmptv.
+ | mov [RB], TAB:RA
+ | jmp >2
+ |
+ |->vmeta_tsetb:
+ | movzx RCd, PC_RC
+ |.if DUALNUM
+ | setint RC
+ | mov TMP1, RC
+ |.else
+ | cvtsi2sd xmm0, RCd
+ | movsd TMP1, xmm0
+ |.endif
+ | lea RC, TMP1
+ | jmp >1
+ |
+ |->vmeta_tsetv:
+ | movzx RCd, PC_RC // Reload TValue *k from RC.
+ | lea RC, [BASE+RC*8]
+ |1:
+ | movzx RBd, PC_RB // Reload TValue *t from RB.
+ | lea RB, [BASE+RB*8]
+ |2:
+ | mov L:CARG1, SAVE_L
+ | mov L:CARG1->base, BASE // Caveat: CARG2/CARG3 may be BASE.
+ | mov CARG2, RB
+ | mov CARG3, RC
+ | mov L:RB, L:CARG1
+ | mov SAVE_PC, PC
+ | call extern lj_meta_tset // (lua_State *L, TValue *o, TValue *k)
+ | // TValue * (finished) or NULL (metamethod) returned in eax (RC).
+ | mov BASE, L:RB->base
+ | test RC, RC
+ | jz >3
+ | // NOBARRIER: lj_meta_tset ensures the table is not black.
+ | movzx RAd, PC_RA
+ | mov RB, [BASE+RA*8]
+ | mov [RC], RB
+ |->cont_nop: // BASE = base, (RC = result)
+ | ins_next
+ |
+ |3: // Call __newindex metamethod.
+ | // BASE = base, L->top = new base, stack = cont/func/t/k/(v)
+ | mov RA, L:RB->top
+ | mov [RA-24], PC // [cont|PC]
+ | movzx RCd, PC_RA
+ | // Copy value to third argument.
+ | mov RB, [BASE+RC*8]
+ | mov [RA+16], RB
+ | lea PC, [RA+FRAME_CONT]
+ | sub PC, BASE
+ | mov LFUNC:RB, [RA-16] // Guaranteed to be a function here.
+ | mov NARGS:RDd, 3+1 // 3 args for func(t, k, v).
+ | cleartp LFUNC:RB
+ | jmp ->vm_call_dispatch_f
+ |
+ |->vmeta_tsetr:
+ |.if X64WIN
+ | mov L:CARG1, SAVE_L
+ | mov CARG3d, RCd
+ | mov L:CARG1->base, BASE
+ | xchg CARG2, TAB:RB // Caveat: CARG2 == BASE.
+ |.else
+ | mov L:CARG1, SAVE_L
+ | mov CARG2, TAB:RB
+ | mov L:CARG1->base, BASE
+ | mov RB, BASE // Save BASE.
+ | mov CARG3d, RCd // Caveat: CARG3 == BASE.
+ |.endif
+ | mov SAVE_PC, PC
+ | call extern lj_tab_setinth // (lua_State *L, GCtab *t, int32_t key)
+ | // TValue * returned in eax (RC).
+ | movzx RAd, PC_RA
+ | mov BASE, RB // Restore BASE.
+ | jmp ->BC_TSETR_Z
+ |
+ |//-- Comparison metamethods ---------------------------------------------
+ |
+ |->vmeta_comp:
+ | movzx RDd, PC_RD
+ | movzx RAd, PC_RA
+ | mov L:RB, SAVE_L
+ | mov L:RB->base, BASE // Caveat: CARG2/CARG3 == BASE.
+ |.if X64WIN
+ | lea CARG3, [BASE+RD*8]
+ | lea CARG2, [BASE+RA*8]
+ |.else
+ | lea CARG2, [BASE+RA*8]
+ | lea CARG3, [BASE+RD*8]
+ |.endif
+ | mov CARG1, L:RB // Caveat: CARG1/CARG4 == RA.
+ | movzx CARG4d, PC_OP
+ | mov SAVE_PC, PC
+ | call extern lj_meta_comp // (lua_State *L, TValue *o1, *o2, int op)
+ | // 0/1 or TValue * (metamethod) returned in eax (RC).
+ |3:
+ | mov BASE, L:RB->base
+ | cmp RC, 1
+ | ja ->vmeta_binop
+ |4:
+ | lea PC, [PC+4]
+ | jb >6
+ |5:
+ | movzx RDd, PC_RD
+ | branchPC RD
+ |6:
+ | ins_next
+ |
+ |->cont_condt: // BASE = base, RC = result
+ | add PC, 4
+ | mov ITYPE, [RC]
+ | sar ITYPE, 47
+ | cmp ITYPEd, LJ_TISTRUECOND // Branch if result is true.
+ | jb <5
+ | jmp <6
+ |
+ |->cont_condf: // BASE = base, RC = result
+ | mov ITYPE, [RC]
+ | sar ITYPE, 47
+ | cmp ITYPEd, LJ_TISTRUECOND // Branch if result is false.
+ | jmp <4
+ |
+ |->vmeta_equal:
+ | cleartp TAB:RD
+ | sub PC, 4
+ |.if X64WIN
+ | mov CARG3, RD
+ | mov CARG4d, RBd
+ | mov L:RB, SAVE_L
+ | mov L:RB->base, BASE // Caveat: CARG2 == BASE.
+ | mov CARG2, RA
+ | mov CARG1, L:RB // Caveat: CARG1 == RA.
+ |.else
+ | mov CARG2, RA
+ | mov CARG4d, RBd // Caveat: CARG4 == RA.
+ | mov L:RB, SAVE_L
+ | mov L:RB->base, BASE // Caveat: CARG3 == BASE.
+ | mov CARG3, RD
+ | mov CARG1, L:RB
+ |.endif
+ | mov SAVE_PC, PC
+ | call extern lj_meta_equal // (lua_State *L, GCobj *o1, *o2, int ne)
+ | // 0/1 or TValue * (metamethod) returned in eax (RC).
+ | jmp <3
+ |
+ |->vmeta_equal_cd:
+ |.if FFI
+ | sub PC, 4
+ | mov L:RB, SAVE_L
+ | mov L:RB->base, BASE
+ | mov CARG1, L:RB
+ | mov CARG2d, dword [PC-4]
+ | mov SAVE_PC, PC
+ | call extern lj_meta_equal_cd // (lua_State *L, BCIns ins)
+ | // 0/1 or TValue * (metamethod) returned in eax (RC).
+ | jmp <3
+ |.endif
+ |
+ |->vmeta_istype:
+ | mov L:RB, SAVE_L
+ | mov L:RB->base, BASE // Caveat: CARG2/CARG3 may be BASE.
+ | mov CARG2d, RAd
+ | mov CARG3d, RDd
+ | mov L:CARG1, L:RB
+ | mov SAVE_PC, PC
+ | call extern lj_meta_istype // (lua_State *L, BCReg ra, BCReg tp)
+ | mov BASE, L:RB->base
+ | jmp <6
+ |
+ |//-- Arithmetic metamethods ---------------------------------------------
+ |
+ |->vmeta_arith_vno:
+ |.if DUALNUM
+ | movzx RBd, PC_RB
+ | movzx RCd, PC_RC
+ |.endif
+ |->vmeta_arith_vn:
+ | lea RC, [KBASE+RC*8]
+ | jmp >1
+ |
+ |->vmeta_arith_nvo:
+ |.if DUALNUM
+ | movzx RBd, PC_RB
+ | movzx RCd, PC_RC
+ |.endif
+ |->vmeta_arith_nv:
+ | lea TMPR, [KBASE+RC*8]
+ | lea RC, [BASE+RB*8]
+ | mov RB, TMPR
+ | jmp >2
+ |
+ |->vmeta_unm:
+ | lea RC, [BASE+RD*8]
+ | mov RB, RC
+ | jmp >2
+ |
+ |->vmeta_arith_vvo:
+ |.if DUALNUM
+ | movzx RBd, PC_RB
+ | movzx RCd, PC_RC
+ |.endif
+ |->vmeta_arith_vv:
+ | lea RC, [BASE+RC*8]
+ |1:
+ | lea RB, [BASE+RB*8]
+ |2:
+ | lea RA, [BASE+RA*8]
+ |.if X64WIN
+ | mov CARG3, RB
+ | mov CARG4, RC
+ | movzx RCd, PC_OP
+ | mov ARG5d, RCd
+ | mov L:RB, SAVE_L
+ | mov L:RB->base, BASE // Caveat: CARG2 == BASE.
+ | mov CARG2, RA
+ | mov CARG1, L:RB // Caveat: CARG1 == RA.
+ |.else
+ | movzx CARG5d, PC_OP
+ | mov CARG2, RA
+ | mov CARG4, RC // Caveat: CARG4 == RA.
+ | mov L:CARG1, SAVE_L
+ | mov L:CARG1->base, BASE // Caveat: CARG3 == BASE.
+ | mov CARG3, RB
+ | mov L:RB, L:CARG1
+ |.endif
+ | mov SAVE_PC, PC
+ | call extern lj_meta_arith // (lua_State *L, TValue *ra,*rb,*rc, BCReg op)
+ | // NULL (finished) or TValue * (metamethod) returned in eax (RC).
+ | mov BASE, L:RB->base
+ | test RC, RC
+ | jz ->cont_nop
+ |
+ | // Call metamethod for binary op.
+ |->vmeta_binop:
+ | // BASE = base, RC = new base, stack = cont/func/o1/o2
+ | mov RA, RC
+ | sub RC, BASE
+ | mov [RA-24], PC // [cont|PC]
+ | lea PC, [RC+FRAME_CONT]
+ | mov NARGS:RDd, 2+1 // 2 args for func(o1, o2).
+ | jmp ->vm_call_dispatch
+ |
+ |->vmeta_len:
+ | movzx RDd, PC_RD
+ | mov L:RB, SAVE_L
+ | mov L:RB->base, BASE
+ | lea CARG2, [BASE+RD*8] // Caveat: CARG2 == BASE
+ | mov L:CARG1, L:RB
+ | mov SAVE_PC, PC
+ | call extern lj_meta_len // (lua_State *L, TValue *o)
+ | // NULL (retry) or TValue * (metamethod) returned in eax (RC).
+ | mov BASE, L:RB->base
+#if LJ_52
+ | test RC, RC
+ | jne ->vmeta_binop // Binop call for compatibility.
+ | movzx RDd, PC_RD
+ | mov TAB:CARG1, [BASE+RD*8]
+ | cleartp TAB:CARG1
+ | jmp ->BC_LEN_Z
+#else
+ | jmp ->vmeta_binop // Binop call for compatibility.
+#endif
+ |
+ |//-- Call metamethod ----------------------------------------------------
+ |
+ |->vmeta_call_ra:
+ | lea RA, [BASE+RA*8+16]
+ |->vmeta_call: // Resolve and call __call metamethod.
+ | // BASE = old base, RA = new base, RC = nargs+1, PC = return
+ | mov TMP1d, NARGS:RDd // Save RA, RC for us.
+ | mov RB, RA
+ |.if X64WIN
+ | mov L:TMPR, SAVE_L
+ | mov L:TMPR->base, BASE // Caveat: CARG2 is BASE.
+ | lea CARG2, [RA-16]
+ | lea CARG3, [RA+NARGS:RD*8-8]
+ | mov CARG1, L:TMPR // Caveat: CARG1 is RA.
+ |.else
+ | mov L:CARG1, SAVE_L
+ | mov L:CARG1->base, BASE // Caveat: CARG3 is BASE.
+ | lea CARG2, [RA-16]
+ | lea CARG3, [RA+NARGS:RD*8-8]
+ |.endif
+ | mov SAVE_PC, PC
+ | call extern lj_meta_call // (lua_State *L, TValue *func, TValue *top)
+ | mov RA, RB
+ | mov L:RB, SAVE_L
+ | mov BASE, L:RB->base
+ | mov NARGS:RDd, TMP1d
+ | mov LFUNC:RB, [RA-16]
+ | cleartp LFUNC:RB
+ | add NARGS:RDd, 1
+ | // This is fragile. L->base must not move, KBASE must always be defined.
+ | cmp KBASE, BASE // Continue with CALLT if flag set.
+ | je ->BC_CALLT_Z
+ | mov BASE, RA
+ | ins_call // Otherwise call resolved metamethod.
+ |
+ |//-- Argument coercion for 'for' statement ------------------------------
+ |
+ |->vmeta_for:
+ | mov L:RB, SAVE_L
+ | mov L:RB->base, BASE
+ | mov CARG2, RA // Caveat: CARG2 == BASE
+ | mov L:CARG1, L:RB // Caveat: CARG1 == RA
+ | mov SAVE_PC, PC
+ | call extern lj_meta_for // (lua_State *L, TValue *base)
+ | mov BASE, L:RB->base
+ | mov RCd, [PC-4]
+ | movzx RAd, RCH
+ | movzx OP, RCL
+ | shr RCd, 16
+ | jmp aword [DISPATCH+OP*8+GG_DISP2STATIC] // Retry FORI or JFORI.
+ |
+ |//-----------------------------------------------------------------------
+ |//-- Fast functions -----------------------------------------------------
+ |//-----------------------------------------------------------------------
+ |
+ |.macro .ffunc, name
+ |->ff_ .. name:
+ |.endmacro
+ |
+ |.macro .ffunc_1, name
+ |->ff_ .. name:
+ | cmp NARGS:RDd, 1+1; jb ->fff_fallback
+ |.endmacro
+ |
+ |.macro .ffunc_2, name
+ |->ff_ .. name:
+ | cmp NARGS:RDd, 2+1; jb ->fff_fallback
+ |.endmacro
+ |
+ |.macro .ffunc_n, name, op
+ | .ffunc_1 name
+ | checknumtp [BASE], ->fff_fallback
+ | op xmm0, qword [BASE]
+ |.endmacro
+ |
+ |.macro .ffunc_n, name
+ | .ffunc_n name, movsd
+ |.endmacro
+ |
+ |.macro .ffunc_nn, name
+ | .ffunc_2 name
+ | checknumtp [BASE], ->fff_fallback
+ | checknumtp [BASE+8], ->fff_fallback
+ | movsd xmm0, qword [BASE]
+ | movsd xmm1, qword [BASE+8]
+ |.endmacro
+ |
+ |// Inlined GC threshold check. Caveat: uses label 1.
+ |.macro ffgccheck
+ | mov RB, [DISPATCH+DISPATCH_GL(gc.total)]
+ | cmp RB, [DISPATCH+DISPATCH_GL(gc.threshold)]
+ | jb >1
+ | call ->fff_gcstep
+ |1:
+ |.endmacro
+ |
+ |//-- Base library: checks -----------------------------------------------
+ |
+ |.ffunc_1 assert
+ | mov ITYPE, [BASE]
+ | mov RB, ITYPE
+ | sar ITYPE, 47
+ | cmp ITYPEd, LJ_TISTRUECOND; jae ->fff_fallback
+ | mov PC, [BASE-8]
+ | mov MULTRES, RDd
+ | mov RB, [BASE]
+ | mov [BASE-16], RB
+ | sub RDd, 2
+ | jz >2
+ | mov RA, BASE
+ |1:
+ | add RA, 8
+ | mov RB, [RA]
+ | mov [RA-16], RB
+ | sub RDd, 1
+ | jnz <1
+ |2:
+ | mov RDd, MULTRES
+ | jmp ->fff_res_
+ |
+ |.ffunc_1 type
+ | mov RC, [BASE]
+ | sar RC, 47
+ | mov RBd, LJ_TISNUM
+ | cmp RCd, RBd
+ | cmovb RCd, RBd
+ | not RCd
+ |2:
+ | mov CFUNC:RB, [BASE-16]
+ | cleartp CFUNC:RB
+ | mov STR:RC, [CFUNC:RB+RC*8+((char *)(&((GCfuncC *)0)->upvalue))]
+ | mov PC, [BASE-8]
+ | settp STR:RC, LJ_TSTR
+ | mov [BASE-16], STR:RC
+ | jmp ->fff_res1
+ |
+ |//-- Base library: getters and setters ---------------------------------
+ |
+ |.ffunc_1 getmetatable
+ | mov TAB:RB, [BASE]
+ | mov PC, [BASE-8]
+ | checktab TAB:RB, >6
+ |1: // Field metatable must be at same offset for GCtab and GCudata!
+ | mov TAB:RB, TAB:RB->metatable
+ |2:
+ | test TAB:RB, TAB:RB
+ | mov aword [BASE-16], LJ_TNIL
+ | jz ->fff_res1
+ | settp TAB:RC, TAB:RB, LJ_TTAB
+ | mov [BASE-16], TAB:RC // Store metatable as default result.
+ | mov STR:RC, [DISPATCH+DISPATCH_GL(gcroot)+8*(GCROOT_MMNAME+MM_metatable)]
+ | mov RAd, TAB:RB->hmask
+ | and RAd, STR:RC->hash
+ | settp STR:RC, LJ_TSTR
+ | imul RAd, #NODE
+ | add NODE:RA, TAB:RB->node
+ |3: // Rearranged logic, because we expect _not_ to find the key.
+ | cmp NODE:RA->key, STR:RC
+ | je >5
+ |4:
+ | mov NODE:RA, NODE:RA->next
+ | test NODE:RA, NODE:RA
+ | jnz <3
+ | jmp ->fff_res1 // Not found, keep default result.
+ |5:
+ | mov RB, NODE:RA->val
+ | cmp RB, LJ_TNIL; je ->fff_res1 // Ditto for nil value.
+ | mov [BASE-16], RB // Return value of mt.__metatable.
+ | jmp ->fff_res1
+ |
+ |6:
+ | cmp ITYPEd, LJ_TUDATA; je <1
+ | cmp ITYPEd, LJ_TISNUM; ja >7
+ | mov ITYPEd, LJ_TISNUM
+ |7:
+ | not ITYPEd
+ | mov TAB:RB, [DISPATCH+ITYPE*8+DISPATCH_GL(gcroot[GCROOT_BASEMT])]
+ | jmp <2
+ |
+ |.ffunc_2 setmetatable
+ | mov TAB:RB, [BASE]
+ | mov TAB:TMPR, TAB:RB
+ | checktab TAB:RB, ->fff_fallback
+ | // Fast path: no mt for table yet and not clearing the mt.
+ | cmp aword TAB:RB->metatable, 0; jne ->fff_fallback
+ | mov TAB:RA, [BASE+8]
+ | checktab TAB:RA, ->fff_fallback
+ | mov TAB:RB->metatable, TAB:RA
+ | mov PC, [BASE-8]
+ | mov [BASE-16], TAB:TMPR // Return original table.
+ | test byte TAB:RB->marked, LJ_GC_BLACK // isblack(table)
+ | jz >1
+ | // Possible write barrier. Table is black, but skip iswhite(mt) check.
+ | barrierback TAB:RB, RC
+ |1:
+ | jmp ->fff_res1
+ |
+ |.ffunc_2 rawget
+ |.if X64WIN
+ | mov TAB:RA, [BASE]
+ | checktab TAB:RA, ->fff_fallback
+ | mov RB, BASE // Save BASE.
+ | lea CARG3, [BASE+8]
+ | mov CARG2, TAB:RA // Caveat: CARG2 == BASE.
+ | mov CARG1, SAVE_L
+ |.else
+ | mov TAB:CARG2, [BASE]
+ | checktab TAB:CARG2, ->fff_fallback
+ | mov RB, BASE // Save BASE.
+ | lea CARG3, [BASE+8] // Caveat: CARG3 == BASE.
+ | mov CARG1, SAVE_L
+ |.endif
+ | call extern lj_tab_get // (lua_State *L, GCtab *t, cTValue *key)
+ | // cTValue * returned in eax (RD).
+ | mov BASE, RB // Restore BASE.
+ | // Copy table slot.
+ | mov RB, [RD]
+ | mov PC, [BASE-8]
+ | mov [BASE-16], RB
+ | jmp ->fff_res1
+ |
+ |//-- Base library: conversions ------------------------------------------
+ |
+ |.ffunc tonumber
+ | // Only handles the number case inline (without a base argument).
+ | cmp NARGS:RDd, 1+1; jne ->fff_fallback // Exactly one argument.
+ | mov RB, [BASE]
+ | checknumber RB, ->fff_fallback
+ | mov PC, [BASE-8]
+ | mov [BASE-16], RB
+ | jmp ->fff_res1
+ |
+ |.ffunc_1 tostring
+ | // Only handles the string or number case inline.
+ | mov PC, [BASE-8]
+ | mov STR:RB, [BASE]
+ | checktp_nc STR:RB, LJ_TSTR, >3
+ | // A __tostring method in the string base metatable is ignored.
+ |2:
+ | mov [BASE-16], STR:RB
+ | jmp ->fff_res1
+ |3: // Handle numbers inline, unless a number base metatable is present.
+ | cmp ITYPEd, LJ_TISNUM; ja ->fff_fallback_1
+ | cmp aword [DISPATCH+DISPATCH_GL(gcroot[GCROOT_BASEMT_NUM])], 0
+ | jne ->fff_fallback
+ | ffgccheck // Caveat: uses label 1.
+ | mov L:RB, SAVE_L
+ | mov L:RB->base, BASE // Add frame since C call can throw.
+ | mov SAVE_PC, PC // Redundant (but a defined value).
+ |.if not X64WIN
+ | mov CARG2, BASE // Otherwise: CARG2 == BASE
+ |.endif
+ | mov L:CARG1, L:RB
+ |.if DUALNUM
+ | call extern lj_strfmt_number // (lua_State *L, cTValue *o)
+ |.else
+ | call extern lj_strfmt_num // (lua_State *L, lua_Number *np)
+ |.endif
+ | // GCstr returned in eax (RD).
+ | mov BASE, L:RB->base
+ | settp STR:RB, RD, LJ_TSTR
+ | jmp <2
+ |
+ |//-- Base library: iterators -------------------------------------------
+ |
+ |.ffunc_1 next
+ | je >2 // Missing 2nd arg?
+ |1:
+ |.if X64WIN
+ | mov RA, [BASE]
+ | checktab RA, ->fff_fallback
+ |.else
+ | mov CARG2, [BASE]
+ | checktab CARG2, ->fff_fallback
+ |.endif
+ | mov L:RB, SAVE_L
+ | mov L:RB->base, BASE // Add frame since C call can throw.
+ | mov L:RB->top, BASE // Dummy frame length is ok.
+ | mov PC, [BASE-8]
+ |.if X64WIN
+ | lea CARG3, [BASE+8]
+ | mov CARG2, RA // Caveat: CARG2 == BASE.
+ | mov CARG1, L:RB
+ |.else
+ | lea CARG3, [BASE+8] // Caveat: CARG3 == BASE.
+ | mov CARG1, L:RB
+ |.endif
+ | mov SAVE_PC, PC // Needed for ITERN fallback.
+ | call extern lj_tab_next // (lua_State *L, GCtab *t, TValue *key)
+ | // Flag returned in eax (RD).
+ | mov BASE, L:RB->base
+ | test RDd, RDd; jz >3 // End of traversal?
+ | // Copy key and value to results.
+ | mov RB, [BASE+8]
+ | mov RD, [BASE+16]
+ | mov [BASE-16], RB
+ | mov [BASE-8], RD
+ |->fff_res2:
+ | mov RDd, 1+2
+ | jmp ->fff_res
+ |2: // Set missing 2nd arg to nil.
+ | mov aword [BASE+8], LJ_TNIL
+ | jmp <1
+ |3: // End of traversal: return nil.
+ | mov aword [BASE-16], LJ_TNIL
+ | jmp ->fff_res1
+ |
+ |.ffunc_1 pairs
+ | mov TAB:RB, [BASE]
+ | mov TMPR, TAB:RB
+ | checktab TAB:RB, ->fff_fallback
+#if LJ_52
+ | cmp aword TAB:RB->metatable, 0; jne ->fff_fallback
+#endif
+ | mov CFUNC:RD, [BASE-16]
+ | cleartp CFUNC:RD
+ | mov CFUNC:RD, CFUNC:RD->upvalue[0]
+ | settp CFUNC:RD, LJ_TFUNC
+ | mov PC, [BASE-8]
+ | mov [BASE-16], CFUNC:RD
+ | mov [BASE-8], TMPR
+ | mov aword [BASE], LJ_TNIL
+ | mov RDd, 1+3
+ | jmp ->fff_res
+ |
+ |.ffunc_2 ipairs_aux
+ | mov TAB:RB, [BASE]
+ | checktab TAB:RB, ->fff_fallback
+ |.if DUALNUM
+ | mov RA, [BASE+8]
+ | checkint RA, ->fff_fallback
+ |.else
+ | checknumtp [BASE+8], ->fff_fallback
+ | movsd xmm0, qword [BASE+8]
+ |.endif
+ | mov PC, [BASE-8]
+ |.if DUALNUM
+ | add RAd, 1
+ | setint ITYPE, RA
+ | mov [BASE-16], ITYPE
+ |.else
+ | sseconst_1 xmm1, TMPR
+ | addsd xmm0, xmm1
+ | cvttsd2si RAd, xmm0
+ | movsd qword [BASE-16], xmm0
+ |.endif
+ | cmp RAd, TAB:RB->asize; jae >2 // Not in array part?
+ | mov RD, TAB:RB->array
+ | lea RD, [RD+RA*8]
+ |1:
+ | cmp aword [RD], LJ_TNIL; je ->fff_res0
+ | // Copy array slot.
+ | mov RB, [RD]
+ | mov [BASE-8], RB
+ | jmp ->fff_res2
+ |2: // Check for empty hash part first. Otherwise call C function.
+ | cmp dword TAB:RB->hmask, 0; je ->fff_res0
+ |.if X64WIN
+ | mov TMPR, BASE
+ | mov CARG2d, RAd
+ | mov CARG1, TAB:RB
+ | mov RB, TMPR
+ |.else
+ | mov CARG1, TAB:RB
+ | mov RB, BASE // Save BASE.
+ | mov CARG2d, RAd // Caveat: CARG2 == BASE
+ |.endif
+ | call extern lj_tab_getinth // (GCtab *t, int32_t key)
+ | // cTValue * or NULL returned in eax (RD).
+ | mov BASE, RB
+ | test RD, RD
+ | jnz <1
+ |->fff_res0:
+ | mov RDd, 1+0
+ | jmp ->fff_res
+ |
+ |.ffunc_1 ipairs
+ | mov TAB:RB, [BASE]
+ | mov TMPR, TAB:RB
+ | checktab TAB:RB, ->fff_fallback
+#if LJ_52
+ | cmp aword TAB:RB->metatable, 0; jne ->fff_fallback
+#endif
+ | mov CFUNC:RD, [BASE-16]
+ | cleartp CFUNC:RD
+ | mov CFUNC:RD, CFUNC:RD->upvalue[0]
+ | settp CFUNC:RD, LJ_TFUNC
+ | mov PC, [BASE-8]
+ | mov [BASE-16], CFUNC:RD
+ | mov [BASE-8], TMPR
+ |.if DUALNUM
+ | mov64 RD, ((int64_t)LJ_TISNUM<<47)
+ | mov [BASE], RD
+ |.else
+ | mov qword [BASE], 0
+ |.endif
+ | mov RDd, 1+3
+ | jmp ->fff_res
+ |
+ |//-- Base library: catch errors ----------------------------------------
+ |
+ |.ffunc_1 pcall
+ | lea RA, [BASE+16]
+ | sub NARGS:RDd, 1
+ | mov PCd, 16+FRAME_PCALL
+ |1:
+ | movzx RBd, byte [DISPATCH+DISPATCH_GL(hookmask)]
+ | shr RB, HOOK_ACTIVE_SHIFT
+ | and RB, 1
+ | add PC, RB // Remember active hook before pcall.
+ | // Note: this does a (harmless) copy of the function to the PC slot, too.
+ | mov KBASE, RD
+ |2:
+ | mov RB, [RA+KBASE*8-24]
+ | mov [RA+KBASE*8-16], RB
+ | sub KBASE, 1
+ | ja <2
+ | jmp ->vm_call_dispatch
+ |
+ |.ffunc_2 xpcall
+ | mov LFUNC:RA, [BASE+8]
+ | checktp_nc LFUNC:RA, LJ_TFUNC, ->fff_fallback
+ | mov LFUNC:RB, [BASE] // Swap function and traceback.
+ | mov [BASE], LFUNC:RA
+ | mov [BASE+8], LFUNC:RB
+ | lea RA, [BASE+24]
+ | sub NARGS:RDd, 2
+ | mov PCd, 24+FRAME_PCALL
+ | jmp <1
+ |
+ |//-- Coroutine library --------------------------------------------------
+ |
+ |.macro coroutine_resume_wrap, resume
+ |.if resume
+ |.ffunc_1 coroutine_resume
+ | mov L:RB, [BASE]
+ | cleartp L:RB
+ |.else
+ |.ffunc coroutine_wrap_aux
+ | mov CFUNC:RB, [BASE-16]
+ | cleartp CFUNC:RB
+ | mov L:RB, CFUNC:RB->upvalue[0].gcr
+ | cleartp L:RB
+ |.endif
+ | mov PC, [BASE-8]
+ | mov SAVE_PC, PC
+ | mov TMP1, L:RB
+ |.if resume
+ | checktptp [BASE], LJ_TTHREAD, ->fff_fallback
+ |.endif
+ | cmp aword L:RB->cframe, 0; jne ->fff_fallback
+ | cmp byte L:RB->status, LUA_YIELD; ja ->fff_fallback
+ | mov RA, L:RB->top
+ | je >1 // Status != LUA_YIELD (i.e. 0)?
+ | cmp RA, L:RB->base // Check for presence of initial func.
+ | je ->fff_fallback
+ | mov PC, [RA-8] // Move initial function up.
+ | mov [RA], PC
+ | add RA, 8
+ |1:
+ |.if resume
+ | lea PC, [RA+NARGS:RD*8-16] // Check stack space (-1-thread).
+ |.else
+ | lea PC, [RA+NARGS:RD*8-8] // Check stack space (-1).
+ |.endif
+ | cmp PC, L:RB->maxstack; ja ->fff_fallback
+ | mov L:RB->top, PC
+ |
+ | mov L:RB, SAVE_L
+ | mov L:RB->base, BASE
+ |.if resume
+ | add BASE, 8 // Keep resumed thread in stack for GC.
+ |.endif
+ | mov L:RB->top, BASE
+ |.if resume
+ | lea RB, [BASE+NARGS:RD*8-24] // RB = end of source for stack move.
+ |.else
+ | lea RB, [BASE+NARGS:RD*8-16] // RB = end of source for stack move.
+ |.endif
+ | sub RB, PC // Relative to PC.
+ |
+ | cmp PC, RA
+ | je >3
+ |2: // Move args to coroutine.
+ | mov RC, [PC+RB]
+ | mov [PC-8], RC
+ | sub PC, 8
+ | cmp PC, RA
+ | jne <2
+ |3:
+ | mov CARG2, RA
+ | mov CARG1, TMP1
+ | call ->vm_resume // (lua_State *L, TValue *base, 0, 0)
+ |
+ | mov L:RB, SAVE_L
+ | mov L:PC, TMP1
+ | mov BASE, L:RB->base
+ | mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
+ | set_vmstate INTERP
+ |
+ | cmp eax, LUA_YIELD
+ | ja >8
+ |4:
+ | mov RA, L:PC->base
+ | mov KBASE, L:PC->top
+ | mov L:PC->top, RA // Clear coroutine stack.
+ | mov PC, KBASE
+ | sub PC, RA
+ | je >6 // No results?
+ | lea RD, [BASE+PC]
+ | shr PCd, 3
+ | cmp RD, L:RB->maxstack
+ | ja >9 // Need to grow stack?
+ |
+ | mov RB, BASE
+ | sub RB, RA
+ |5: // Move results from coroutine.
+ | mov RD, [RA]
+ | mov [RA+RB], RD
+ | add RA, 8
+ | cmp RA, KBASE
+ | jne <5
+ |6:
+ |.if resume
+ | lea RDd, [PCd+2] // nresults+1 = 1 + true + results.
+ | mov_true ITYPE // Prepend true to results.
+ | mov [BASE-8], ITYPE
+ |.else
+ | lea RDd, [PCd+1] // nresults+1 = 1 + results.
+ |.endif
+ |7:
+ | mov PC, SAVE_PC
+ | mov MULTRES, RDd
+ |.if resume
+ | mov RA, -8
+ |.else
+ | xor RAd, RAd
+ |.endif
+ | test PCd, FRAME_TYPE
+ | jz ->BC_RET_Z
+ | jmp ->vm_return
+ |
+ |8: // Coroutine returned with error (at co->top-1).
+ |.if resume
+ | mov_false ITYPE // Prepend false to results.
+ | mov [BASE-8], ITYPE
+ | mov RA, L:PC->top
+ | sub RA, 8
+ | mov L:PC->top, RA // Clear error from coroutine stack.
+ | // Copy error message.
+ | mov RD, [RA]
+ | mov [BASE], RD
+ | mov RDd, 1+2 // nresults+1 = 1 + false + error.
+ | jmp <7
+ |.else
+ | mov CARG2, L:PC
+ | mov CARG1, L:RB
+ | call extern lj_ffh_coroutine_wrap_err // (lua_State *L, lua_State *co)
+ | // Error function does not return.
+ |.endif
+ |
+ |9: // Handle stack expansion on return from yield.
+ | mov L:RA, TMP1
+ | mov L:RA->top, KBASE // Undo coroutine stack clearing.
+ | mov CARG2, PC
+ | mov CARG1, L:RB
+ | call extern lj_state_growstack // (lua_State *L, int n)
+ | mov L:PC, TMP1
+ | mov BASE, L:RB->base
+ | jmp <4 // Retry the stack move.
+ |.endmacro
+ |
+ | coroutine_resume_wrap 1 // coroutine.resume
+ | coroutine_resume_wrap 0 // coroutine.wrap
+ |
+ |.ffunc coroutine_yield
+ | mov L:RB, SAVE_L
+ | test aword L:RB->cframe, CFRAME_RESUME
+ | jz ->fff_fallback
+ | mov L:RB->base, BASE
+ | lea RD, [BASE+NARGS:RD*8-8]
+ | mov L:RB->top, RD
+ | xor RDd, RDd
+ | mov aword L:RB->cframe, RD
+ | mov al, LUA_YIELD
+ | mov byte L:RB->status, al
+ | jmp ->vm_leave_unw
+ |
+ |//-- Math library -------------------------------------------------------
+ |
+ | .ffunc_1 math_abs
+ | mov RB, [BASE]
+ |.if DUALNUM
+ | checkint RB, >3
+ | cmp RBd, 0; jns ->fff_resi
+ | neg RBd; js >2
+ |->fff_resbit:
+ |->fff_resi:
+ | setint RB
+ |->fff_resRB:
+ | mov PC, [BASE-8]
+ | mov [BASE-16], RB
+ | jmp ->fff_res1
+ |2:
+ | mov64 RB, U64x(41e00000,00000000) // 2^31.
+ | jmp ->fff_resRB
+ |3:
+ | ja ->fff_fallback
+ |.else
+ | checknum RB, ->fff_fallback
+ |.endif
+ | shl RB, 1
+ | shr RB, 1
+ | mov PC, [BASE-8]
+ | mov [BASE-16], RB
+ | jmp ->fff_res1
+ |
+ |.ffunc_n math_sqrt, sqrtsd
+ |->fff_resxmm0:
+ | mov PC, [BASE-8]
+ | movsd qword [BASE-16], xmm0
+ | // fallthrough
+ |
+ |->fff_res1:
+ | mov RDd, 1+1
+ |->fff_res:
+ | mov MULTRES, RDd
+ |->fff_res_:
+ | test PCd, FRAME_TYPE
+ | jnz >7
+ |5:
+ | cmp PC_RB, RDL // More results expected?
+ | ja >6
+ | // Adjust BASE. KBASE is assumed to be set for the calling frame.
+ | movzx RAd, PC_RA
+ | neg RA
+ | lea BASE, [BASE+RA*8-16] // base = base - (RA+2)*8
+ | ins_next
+ |
+ |6: // Fill up results with nil.
+ | mov aword [BASE+RD*8-24], LJ_TNIL
+ | add RD, 1
+ | jmp <5
+ |
+ |7: // Non-standard return case.
+ | mov RA, -16 // Results start at BASE+RA = BASE-16.
+ | jmp ->vm_return
+ |
+ |.macro math_round, func
+ | .ffunc math_ .. func
+ |.if DUALNUM
+ | mov RB, [BASE]
+ | checknumx RB, ->fff_resRB, je
+ | ja ->fff_fallback
+ |.else
+ | checknumtp [BASE], ->fff_fallback
+ |.endif
+ | movsd xmm0, qword [BASE]
+ | call ->vm_ .. func .. _sse
+ |.if DUALNUM
+ | cvttsd2si RBd, xmm0
+ | cmp RBd, 0x80000000
+ | jne ->fff_resi
+ | cvtsi2sd xmm1, RBd
+ | ucomisd xmm0, xmm1
+ | jp ->fff_resxmm0
+ | je ->fff_resi
+ |.endif
+ | jmp ->fff_resxmm0
+ |.endmacro
+ |
+ | math_round floor
+ | math_round ceil
+ |
+ |.ffunc math_log
+ | cmp NARGS:RDd, 1+1; jne ->fff_fallback // Exactly one argument.
+ | checknumtp [BASE], ->fff_fallback
+ | movsd xmm0, qword [BASE]
+ | mov RB, BASE
+ | call extern log
+ | mov BASE, RB
+ | jmp ->fff_resxmm0
+ |
+ |.macro math_extern, func
+ | .ffunc_n math_ .. func
+ | mov RB, BASE
+ | call extern func
+ | mov BASE, RB
+ | jmp ->fff_resxmm0
+ |.endmacro
+ |
+ |.macro math_extern2, func
+ | .ffunc_nn math_ .. func
+ | mov RB, BASE
+ | call extern func
+ | mov BASE, RB
+ | jmp ->fff_resxmm0
+ |.endmacro
+ |
+ | math_extern log10
+ | math_extern exp
+ | math_extern sin
+ | math_extern cos
+ | math_extern tan
+ | math_extern asin
+ | math_extern acos
+ | math_extern atan
+ | math_extern sinh
+ | math_extern cosh
+ | math_extern tanh
+ | math_extern2 pow
+ | math_extern2 atan2
+ | math_extern2 fmod
+ |
+ |.ffunc_2 math_ldexp
+ | checknumtp [BASE], ->fff_fallback
+ | checknumtp [BASE+8], ->fff_fallback
+ | fld qword [BASE+8]
+ | fld qword [BASE]
+ | fscale
+ | fpop1
+ | mov PC, [BASE-8]
+ | fstp qword [BASE-16]
+ | jmp ->fff_res1
+ |
+ |.ffunc_n math_frexp
+ |.if X64WIN
+ | lea CARG2, TMP1
+ |.else
+ | lea CARG1, TMP1
+ |.endif
+ | mov RB, BASE
+ | call extern frexp
+ | mov BASE, RB
+ | mov RBd, TMP1d
+ | mov PC, [BASE-8]
+ | movsd qword [BASE-16], xmm0
+ |.if DUALNUM
+ | setint RB
+ | mov [BASE-8], RB
+ |.else
+ | cvtsi2sd xmm1, RBd
+ | movsd qword [BASE-8], xmm1
+ |.endif
+ | mov RDd, 1+2
+ | jmp ->fff_res
+ |
+ |.ffunc_n math_modf
+ |.if X64WIN
+ | lea CARG2, [BASE-16]
+ |.else
+ | lea CARG1, [BASE-16]
+ |.endif
+ | mov PC, [BASE-8]
+ | mov RB, BASE
+ | call extern modf
+ | mov BASE, RB
+ | mov PC, [BASE-8]
+ | movsd qword [BASE-8], xmm0
+ | mov RDd, 1+2
+ | jmp ->fff_res
+ |
+ |.macro math_minmax, name, cmovop, sseop
+ | .ffunc name
+ | mov RAd, 2
+ |.if DUALNUM
+ | mov RB, [BASE]
+ | checkint RB, >4
+ |1: // Handle integers.
+ | cmp RAd, RDd; jae ->fff_resRB
+ | mov TMPR, [BASE+RA*8-8]
+ | checkint TMPR, >3
+ | cmp RBd, TMPRd
+ | cmovop RB, TMPR
+ | add RAd, 1
+ | jmp <1
+ |3:
+ | ja ->fff_fallback
+ | // Convert intermediate result to number and continue below.
+ | cvtsi2sd xmm0, RBd
+ | jmp >6
+ |4:
+ | ja ->fff_fallback
+ |.else
+ | checknumtp [BASE], ->fff_fallback
+ |.endif
+ |
+ | movsd xmm0, qword [BASE]
+ |5: // Handle numbers or integers.
+ | cmp RAd, RDd; jae ->fff_resxmm0
+ |.if DUALNUM
+ | mov RB, [BASE+RA*8-8]
+ | checknumx RB, >6, jb
+ | ja ->fff_fallback
+ | cvtsi2sd xmm1, RBd
+ | jmp >7
+ |.else
+ | checknumtp [BASE+RA*8-8], ->fff_fallback
+ |.endif
+ |6:
+ | movsd xmm1, qword [BASE+RA*8-8]
+ |7:
+ | sseop xmm0, xmm1
+ | add RAd, 1
+ | jmp <5
+ |.endmacro
+ |
+ | math_minmax math_min, cmovg, minsd
+ | math_minmax math_max, cmovl, maxsd
+ |
+ |//-- String library -----------------------------------------------------
+ |
+ |.ffunc string_byte // Only handle the 1-arg case here.
+ | cmp NARGS:RDd, 1+1; jne ->fff_fallback
+ | mov STR:RB, [BASE]
+ | checkstr STR:RB, ->fff_fallback
+ | mov PC, [BASE-8]
+ | cmp dword STR:RB->len, 1
+ | jb ->fff_res0 // Return no results for empty string.
+ | movzx RBd, byte STR:RB[1]
+ |.if DUALNUM
+ | jmp ->fff_resi
+ |.else
+ | cvtsi2sd xmm0, RBd; jmp ->fff_resxmm0
+ |.endif
+ |
+ |.ffunc string_char // Only handle the 1-arg case here.
+ | ffgccheck
+ | cmp NARGS:RDd, 1+1; jne ->fff_fallback // *Exactly* 1 arg.
+ |.if DUALNUM
+ | mov RB, [BASE]
+ | checkint RB, ->fff_fallback
+ |.else
+ | checknumtp [BASE], ->fff_fallback
+ | cvttsd2si RBd, qword [BASE]
+ |.endif
+ | cmp RBd, 255; ja ->fff_fallback
+ | mov TMP1d, RBd
+ | mov TMPRd, 1
+ | lea RD, TMP1 // Points to stack. Little-endian.
+ |->fff_newstr:
+ | mov L:RB, SAVE_L
+ | mov L:RB->base, BASE
+ | mov CARG3d, TMPRd // Zero-extended to size_t.
+ | mov CARG2, RD
+ | mov CARG1, L:RB
+ | mov SAVE_PC, PC
+ | call extern lj_str_new // (lua_State *L, char *str, size_t l)
+ |->fff_resstr:
+ | // GCstr * returned in eax (RD).
+ | mov BASE, L:RB->base
+ | mov PC, [BASE-8]
+ | settp STR:RD, LJ_TSTR
+ | mov [BASE-16], STR:RD
+ | jmp ->fff_res1
+ |
+ |.ffunc string_sub
+ | ffgccheck
+ | mov TMPRd, -1
+ | cmp NARGS:RDd, 1+2; jb ->fff_fallback
+ | jna >1
+ |.if DUALNUM
+ | mov TMPR, [BASE+16]
+ | checkint TMPR, ->fff_fallback
+ |.else
+ | checknumtp [BASE+16], ->fff_fallback
+ | cvttsd2si TMPRd, qword [BASE+16]
+ |.endif
+ |1:
+ | mov STR:RB, [BASE]
+ | checkstr STR:RB, ->fff_fallback
+ |.if DUALNUM
+ | mov ITYPE, [BASE+8]
+ | mov RAd, ITYPEd // Must clear hiword for lea below.
+ | sar ITYPE, 47
+ | cmp ITYPEd, LJ_TISNUM
+ | jne ->fff_fallback
+ |.else
+ | checknumtp [BASE+8], ->fff_fallback
+ | cvttsd2si RAd, qword [BASE+8]
+ |.endif
+ | mov RCd, STR:RB->len
+ | cmp RCd, TMPRd // len < end? (unsigned compare)
+ | jb >5
+ |2:
+ | test RAd, RAd // start <= 0?
+ | jle >7
+ |3:
+ | sub TMPRd, RAd // start > end?
+ | jl ->fff_emptystr
+ | lea RD, [STR:RB+RAd+#STR-1]
+ | add TMPRd, 1
+ |4:
+ | jmp ->fff_newstr
+ |
+ |5: // Negative end or overflow.
+ | jl >6
+ | lea TMPRd, [TMPRd+RCd+1] // end = end+(len+1)
+ | jmp <2
+ |6: // Overflow.
+ | mov TMPRd, RCd // end = len
+ | jmp <2
+ |
+ |7: // Negative start or underflow.
+ | je >8
+ | add RAd, RCd // start = start+(len+1)
+ | add RAd, 1
+ | jg <3 // start > 0?
+ |8: // Underflow.
+ | mov RAd, 1 // start = 1
+ | jmp <3
+ |
+ |->fff_emptystr: // Range underflow.
+ | xor TMPRd, TMPRd // Zero length. Any ptr in RD is ok.
+ | jmp <4
+ |
+ |.macro ffstring_op, name
+ | .ffunc_1 string_ .. name
+ | ffgccheck
+ |.if X64WIN
+ | mov STR:TMPR, [BASE]
+ | checkstr STR:TMPR, ->fff_fallback
+ |.else
+ | mov STR:CARG2, [BASE]
+ | checkstr STR:CARG2, ->fff_fallback
+ |.endif
+ | mov L:RB, SAVE_L
+ | lea SBUF:CARG1, [DISPATCH+DISPATCH_GL(tmpbuf)]
+ | mov L:RB->base, BASE
+ |.if X64WIN
+ | mov STR:CARG2, STR:TMPR // Caveat: CARG2 == BASE
+ |.endif
+ | mov RC, SBUF:CARG1->b
+ | mov SBUF:CARG1->L, L:RB
+ | mov SBUF:CARG1->p, RC
+ | mov SAVE_PC, PC
+ | call extern lj_buf_putstr_ .. name
+ | mov CARG1, rax
+ | call extern lj_buf_tostr
+ | jmp ->fff_resstr
+ |.endmacro
+ |
+ |ffstring_op reverse
+ |ffstring_op lower
+ |ffstring_op upper
+ |
+ |//-- Bit library --------------------------------------------------------
+ |
+ |.macro .ffunc_bit, name, kind, fdef
+ | fdef name
+ |.if kind == 2
+ | sseconst_tobit xmm1, RB
+ |.endif
+ |.if DUALNUM
+ | mov RB, [BASE]
+ | checkint RB, >1
+ |.if kind > 0
+ | jmp >2
+ |.else
+ | jmp ->fff_resbit
+ |.endif
+ |1:
+ | ja ->fff_fallback
+ | movd xmm0, RB
+ |.else
+ | checknumtp [BASE], ->fff_fallback
+ | movsd xmm0, qword [BASE]
+ |.endif
+ |.if kind < 2
+ | sseconst_tobit xmm1, RB
+ |.endif
+ | addsd xmm0, xmm1
+ | movd RBd, xmm0
+ |2:
+ |.endmacro
+ |
+ |.macro .ffunc_bit, name, kind
+ | .ffunc_bit name, kind, .ffunc_1
+ |.endmacro
+ |
+ |.ffunc_bit bit_tobit, 0
+ | jmp ->fff_resbit
+ |
+ |.macro .ffunc_bit_op, name, ins
+ | .ffunc_bit name, 2
+ | mov TMPRd, NARGS:RDd // Save for fallback.
+ | lea RD, [BASE+NARGS:RD*8-16]
+ |1:
+ | cmp RD, BASE
+ | jbe ->fff_resbit
+ |.if DUALNUM
+ | mov RA, [RD]
+ | checkint RA, >2
+ | ins RBd, RAd
+ | sub RD, 8
+ | jmp <1
+ |2:
+ | ja ->fff_fallback_bit_op
+ | movd xmm0, RA
+ |.else
+ | checknumtp [RD], ->fff_fallback_bit_op
+ | movsd xmm0, qword [RD]
+ |.endif
+ | addsd xmm0, xmm1
+ | movd RAd, xmm0
+ | ins RBd, RAd
+ | sub RD, 8
+ | jmp <1
+ |.endmacro
+ |
+ |.ffunc_bit_op bit_band, and
+ |.ffunc_bit_op bit_bor, or
+ |.ffunc_bit_op bit_bxor, xor
+ |
+ |.ffunc_bit bit_bswap, 1
+ | bswap RBd
+ | jmp ->fff_resbit
+ |
+ |.ffunc_bit bit_bnot, 1
+ | not RBd
+ |.if DUALNUM
+ | jmp ->fff_resbit
+ |.else
+ |->fff_resbit:
+ | cvtsi2sd xmm0, RBd
+ | jmp ->fff_resxmm0
+ |.endif
+ |
+ |->fff_fallback_bit_op:
+ | mov NARGS:RDd, TMPRd // Restore for fallback
+ | jmp ->fff_fallback
+ |
+ |.macro .ffunc_bit_sh, name, ins
+ |.if DUALNUM
+ | .ffunc_bit name, 1, .ffunc_2
+ | // Note: no inline conversion from number for 2nd argument!
+ | mov RA, [BASE+8]
+ | checkint RA, ->fff_fallback
+ |.else
+ | .ffunc_nn name
+ | sseconst_tobit xmm2, RB
+ | addsd xmm0, xmm2
+ | addsd xmm1, xmm2
+ | movd RBd, xmm0
+ | movd RAd, xmm1
+ |.endif
+ | ins RBd, cl // Assumes RA is ecx.
+ | jmp ->fff_resbit
+ |.endmacro
+ |
+ |.ffunc_bit_sh bit_lshift, shl
+ |.ffunc_bit_sh bit_rshift, shr
+ |.ffunc_bit_sh bit_arshift, sar
+ |.ffunc_bit_sh bit_rol, rol
+ |.ffunc_bit_sh bit_ror, ror
+ |
+ |//-----------------------------------------------------------------------
+ |
+ |->fff_fallback_2:
+ | mov NARGS:RDd, 1+2 // Other args are ignored, anyway.
+ | jmp ->fff_fallback
+ |->fff_fallback_1:
+ | mov NARGS:RDd, 1+1 // Other args are ignored, anyway.
+ |->fff_fallback: // Call fast function fallback handler.
+ | // BASE = new base, RD = nargs+1
+ | mov L:RB, SAVE_L
+ | mov PC, [BASE-8] // Fallback may overwrite PC.
+ | mov SAVE_PC, PC // Redundant (but a defined value).
+ | mov L:RB->base, BASE
+ | lea RD, [BASE+NARGS:RD*8-8]
+ | lea RA, [RD+8*LUA_MINSTACK] // Ensure enough space for handler.
+ | mov L:RB->top, RD
+ | mov CFUNC:RD, [BASE-16]
+ | cleartp CFUNC:RD
+ | cmp RA, L:RB->maxstack
+ | ja >5 // Need to grow stack.
+ | mov CARG1, L:RB
+ | call aword CFUNC:RD->f // (lua_State *L)
+ | mov BASE, L:RB->base
+ | // Either throws an error, or recovers and returns -1, 0 or nresults+1.
+ | test RDd, RDd; jg ->fff_res // Returned nresults+1?
+ |1:
+ | mov RA, L:RB->top
+ | sub RA, BASE
+ | shr RAd, 3
+ | test RDd, RDd
+ | lea NARGS:RDd, [RAd+1]
+ | mov LFUNC:RB, [BASE-16]
+ | jne ->vm_call_tail // Returned -1?
+ | cleartp LFUNC:RB
+ | ins_callt // Returned 0: retry fast path.
+ |
+ |// Reconstruct previous base for vmeta_call during tailcall.
+ |->vm_call_tail:
+ | mov RA, BASE
+ | test PCd, FRAME_TYPE
+ | jnz >3
+ | movzx RBd, PC_RA
+ | neg RB
+ | lea BASE, [BASE+RB*8-16] // base = base - (RB+2)*8
+ | jmp ->vm_call_dispatch // Resolve again for tailcall.
+ |3:
+ | mov RB, PC
+ | and RB, -8
+ | sub BASE, RB
+ | jmp ->vm_call_dispatch // Resolve again for tailcall.
+ |
+ |5: // Grow stack for fallback handler.
+ | mov CARG2d, LUA_MINSTACK
+ | mov CARG1, L:RB
+ | call extern lj_state_growstack // (lua_State *L, int n)
+ | mov BASE, L:RB->base
+ | xor RDd, RDd // Simulate a return 0.
+ | jmp <1 // Dumb retry (goes through ff first).
+ |
+ |->fff_gcstep: // Call GC step function.
+ | // BASE = new base, RD = nargs+1
+ | pop RB // Must keep stack at same level.
+ | mov TMP1, RB // Save return address
+ | mov L:RB, SAVE_L
+ | mov SAVE_PC, PC // Redundant (but a defined value).
+ | mov L:RB->base, BASE
+ | lea RD, [BASE+NARGS:RD*8-8]
+ | mov CARG1, L:RB
+ | mov L:RB->top, RD
+ | call extern lj_gc_step // (lua_State *L)
+ | mov BASE, L:RB->base
+ | mov RD, L:RB->top
+ | sub RD, BASE
+ | shr RDd, 3
+ | add NARGS:RDd, 1
+ | mov RB, TMP1
+ | push RB // Restore return address.
+ | ret
+ |
+ |//-----------------------------------------------------------------------
+ |//-- Special dispatch targets -------------------------------------------
+ |//-----------------------------------------------------------------------
+ |
+ |->vm_record: // Dispatch target for recording phase.
+ |.if JIT
+ | movzx RDd, byte [DISPATCH+DISPATCH_GL(hookmask)]
+ | test RDL, HOOK_VMEVENT // No recording while in vmevent.
+ | jnz >5
+ | // Decrement the hookcount for consistency, but always do the call.
+ | test RDL, HOOK_ACTIVE
+ | jnz >1
+ | test RDL, LUA_MASKLINE|LUA_MASKCOUNT
+ | jz >1
+ | dec dword [DISPATCH+DISPATCH_GL(hookcount)]
+ | jmp >1
+ |.endif
+ |
+ |->vm_rethook: // Dispatch target for return hooks.
+ | movzx RDd, byte [DISPATCH+DISPATCH_GL(hookmask)]
+ | test RDL, HOOK_ACTIVE // Hook already active?
+ | jnz >5
+ | jmp >1
+ |
+ |->vm_inshook: // Dispatch target for instr/line hooks.
+ | movzx RDd, byte [DISPATCH+DISPATCH_GL(hookmask)]
+ | test RDL, HOOK_ACTIVE // Hook already active?
+ | jnz >5
+ |
+ | test RDL, LUA_MASKLINE|LUA_MASKCOUNT
+ | jz >5
+ | dec dword [DISPATCH+DISPATCH_GL(hookcount)]
+ | jz >1
+ | test RDL, LUA_MASKLINE
+ | jz >5
+ |1:
+ | mov L:RB, SAVE_L
+ | mov L:RB->base, BASE
+ | mov CARG2, PC // Caveat: CARG2 == BASE
+ | mov CARG1, L:RB
+ | // SAVE_PC must hold the _previous_ PC. The callee updates it with PC.
+ | call extern lj_dispatch_ins // (lua_State *L, const BCIns *pc)
+ |3:
+ | mov BASE, L:RB->base
+ |4:
+ | movzx RAd, PC_RA
+ |5:
+ | movzx OP, PC_OP
+ | movzx RDd, PC_RD
+ | jmp aword [DISPATCH+OP*8+GG_DISP2STATIC] // Re-dispatch to static ins.
+ |
+ |->cont_hook: // Continue from hook yield.
+ | add PC, 4
+ | mov RA, [RB-40]
+ | mov MULTRES, RAd // Restore MULTRES for *M ins.
+ | jmp <4
+ |
+ |->vm_hotloop: // Hot loop counter underflow.
+ |.if JIT
+ | mov LFUNC:RB, [BASE-16] // Same as curr_topL(L).
+ | cleartp LFUNC:RB
+ | mov RB, LFUNC:RB->pc
+ | movzx RDd, byte [RB+PC2PROTO(framesize)]
+ | lea RD, [BASE+RD*8]
+ | mov L:RB, SAVE_L
+ | mov L:RB->base, BASE
+ | mov L:RB->top, RD
+ | mov CARG2, PC
+ | lea CARG1, [DISPATCH+GG_DISP2J]
+ | mov aword [DISPATCH+DISPATCH_J(L)], L:RB
+ | mov SAVE_PC, PC
+ | call extern lj_trace_hot // (jit_State *J, const BCIns *pc)
+ | jmp <3
+ |.endif
+ |
+ |->vm_callhook: // Dispatch target for call hooks.
+ | mov SAVE_PC, PC
+ |.if JIT
+ | jmp >1
+ |.endif
+ |
+ |->vm_hotcall: // Hot call counter underflow.
+ |.if JIT
+ | mov SAVE_PC, PC
+ | or PC, 1 // Marker for hot call.
+ |1:
+ |.endif
+ | lea RD, [BASE+NARGS:RD*8-8]
+ | mov L:RB, SAVE_L
+ | mov L:RB->base, BASE
+ | mov L:RB->top, RD
+ | mov CARG2, PC
+ | mov CARG1, L:RB
+ | call extern lj_dispatch_call // (lua_State *L, const BCIns *pc)
+ | // ASMFunction returned in eax/rax (RD).
+ | mov SAVE_PC, 0 // Invalidate for subsequent line hook.
+ |.if JIT
+ | and PC, -2
+ |.endif
+ | mov BASE, L:RB->base
+ | mov RA, RD
+ | mov RD, L:RB->top
+ | sub RD, BASE
+ | mov RB, RA
+ | movzx RAd, PC_RA
+ | shr RDd, 3
+ | add NARGS:RDd, 1
+ | jmp RB
+ |
+ |->cont_stitch: // Trace stitching.
+ |.if JIT
+ | // BASE = base, RC = result, RB = mbase
+ | mov ITYPEd, [RB-24] // Save previous trace number.
+ | mov TMPRd, MULTRES
+ | movzx RAd, PC_RA
+ | lea RA, [BASE+RA*8] // Call base.
+ | sub TMPRd, 1
+ | jz >2
+ |1: // Move results down.
+ | mov RB, [RC]
+ | mov [RA], RB
+ | add RC, 8
+ | add RA, 8
+ | sub TMPRd, 1
+ | jnz <1
+ |2:
+ | movzx RCd, PC_RA
+ | movzx RBd, PC_RB
+ | add RC, RB
+ | lea RC, [BASE+RC*8-8]
+ |3:
+ | cmp RC, RA
+ | ja >9 // More results wanted?
+ |
+ | mov RA, [DISPATCH+DISPATCH_J(trace)]
+ | mov TRACE:RD, [RA+ITYPE*8]
+ | test TRACE:RD, TRACE:RD
+ | jz ->cont_nop
+ | movzx RDd, word TRACE:RD->link
+ | cmp RDd, RBd
+ | je ->cont_nop // Blacklisted.
+ | test RDd, RDd
+ | jne =>BC_JLOOP // Jump to stitched trace.
+ |
+ | // Stitch a new trace to the previous trace.
+ | mov [DISPATCH+DISPATCH_J(exitno)], RB
+ | mov L:RB, SAVE_L
+ | mov L:RB->base, BASE
+ | mov CARG2, PC
+ | lea CARG1, [DISPATCH+GG_DISP2J]
+ | mov aword [DISPATCH+DISPATCH_J(L)], L:RB
+ | call extern lj_dispatch_stitch // (jit_State *J, const BCIns *pc)
+ | mov BASE, L:RB->base
+ | jmp ->cont_nop
+ |
+ |9: // Fill up results with nil.
+ | mov aword [RA], LJ_TNIL
+ | add RA, 8
+ | jmp <3
+ |.endif
+ |
+ |->vm_profhook: // Dispatch target for profiler hook.
+#if LJ_HASPROFILE
+ | mov L:RB, SAVE_L
+ | mov L:RB->base, BASE
+ | mov CARG2, PC // Caveat: CARG2 == BASE
+ | mov CARG1, L:RB
+ | call extern lj_dispatch_profile // (lua_State *L, const BCIns *pc)
+ | mov BASE, L:RB->base
+ | // HOOK_PROFILE is off again, so re-dispatch to dynamic instruction.
+ | sub PC, 4
+ | jmp ->cont_nop
+#endif
+ |
+ |//-----------------------------------------------------------------------
+ |//-- Trace exit handler -------------------------------------------------
+ |//-----------------------------------------------------------------------
+ |
+ |// Called from an exit stub with the exit number on the stack.
+ |// The 16 bit exit number is stored with two (sign-extended) push imm8.
+ |->vm_exit_handler:
+ |.if JIT
+ | push r13; push r12
+ | push r11; push r10; push r9; push r8
+ | push rdi; push rsi; push rbp; lea rbp, [rsp+88]; push rbp
+ | push rbx; push rdx; push rcx; push rax
+ | movzx RCd, byte [rbp-8] // Reconstruct exit number.
+ | mov RCH, byte [rbp-16]
+ | mov [rbp-8], r15; mov [rbp-16], r14
+ | // Caveat: DISPATCH is rbx.
+ | mov DISPATCH, [ebp]
+ | mov RA, [DISPATCH+DISPATCH_GL(vmstate)] // Get trace number.
+ | set_vmstate EXIT
+ | mov [DISPATCH+DISPATCH_J(exitno)], RC
+ | mov [DISPATCH+DISPATCH_J(parent)], RA
+ |.if X64WIN
+ | sub rsp, 16*8+4*8 // Room for SSE regs + save area.
+ |.else
+ | sub rsp, 16*8 // Room for SSE regs.
+ |.endif
+ | add rbp, -128
+ | movsd qword [rbp-8], xmm15; movsd qword [rbp-16], xmm14
+ | movsd qword [rbp-24], xmm13; movsd qword [rbp-32], xmm12
+ | movsd qword [rbp-40], xmm11; movsd qword [rbp-48], xmm10
+ | movsd qword [rbp-56], xmm9; movsd qword [rbp-64], xmm8
+ | movsd qword [rbp-72], xmm7; movsd qword [rbp-80], xmm6
+ | movsd qword [rbp-88], xmm5; movsd qword [rbp-96], xmm4
+ | movsd qword [rbp-104], xmm3; movsd qword [rbp-112], xmm2
+ | movsd qword [rbp-120], xmm1; movsd qword [rbp-128], xmm0
+ | // Caveat: RB is rbp.
+ | mov L:RB, [DISPATCH+DISPATCH_GL(cur_L)]
+ | mov BASE, [DISPATCH+DISPATCH_GL(jit_base)]
+ | mov aword [DISPATCH+DISPATCH_J(L)], L:RB
+ | mov L:RB->base, BASE
+ |.if X64WIN
+ | lea CARG2, [rsp+4*8]
+ |.else
+ | mov CARG2, rsp
+ |.endif
+ | lea CARG1, [DISPATCH+GG_DISP2J]
+ | mov dword [DISPATCH+DISPATCH_GL(jit_base)], 0
+ | call extern lj_trace_exit // (jit_State *J, ExitState *ex)
+ | // MULTRES or negated error code returned in eax (RD).
+ | mov RA, L:RB->cframe
+ | and RA, CFRAME_RAWMASK
+ | mov [RA+CFRAME_OFS_L], L:RB // Set SAVE_L (on-trace resume/yield).
+ | mov BASE, L:RB->base
+ | mov PC, [RA+CFRAME_OFS_PC] // Get SAVE_PC.
+ | jmp >1
+ |.endif
+ |->vm_exit_interp:
+ | // RD = MULTRES or negated error code, BASE, PC and DISPATCH set.
+ |.if JIT
+ | // Restore additional callee-save registers only used in compiled code.
+ |.if X64WIN
+ | lea RA, [rsp+10*16+4*8]
+ |1:
+ | movdqa xmm15, [RA-10*16]
+ | movdqa xmm14, [RA-9*16]
+ | movdqa xmm13, [RA-8*16]
+ | movdqa xmm12, [RA-7*16]
+ | movdqa xmm11, [RA-6*16]
+ | movdqa xmm10, [RA-5*16]
+ | movdqa xmm9, [RA-4*16]
+ | movdqa xmm8, [RA-3*16]
+ | movdqa xmm7, [RA-2*16]
+ | mov rsp, RA // Reposition stack to C frame.
+ | movdqa xmm6, [RA-1*16]
+ | mov r15, CSAVE_1
+ | mov r14, CSAVE_2
+ | mov r13, CSAVE_3
+ | mov r12, CSAVE_4
+ |.else
+ | lea RA, [rsp+16]
+ |1:
+ | mov r13, [RA-8]
+ | mov r12, [RA]
+ | mov rsp, RA // Reposition stack to C frame.
+ |.endif
+ | test RDd, RDd; js >9 // Check for error from exit.
+ | mov L:RB, SAVE_L
+ | mov MULTRES, RDd
+ | mov LFUNC:KBASE, [BASE-16]
+ | cleartp LFUNC:KBASE
+ | mov KBASE, LFUNC:KBASE->pc
+ | mov KBASE, [KBASE+PC2PROTO(k)]
+ | mov L:RB->base, BASE
+ | mov dword [DISPATCH+DISPATCH_GL(jit_base)], 0
+ | set_vmstate INTERP
+ | // Modified copy of ins_next which handles function header dispatch, too.
+ | mov RCd, [PC]
+ | movzx RAd, RCH
+ | movzx OP, RCL
+ | add PC, 4
+ | shr RCd, 16
+ | cmp OP, BC_FUNCF // Function header?
+ | jb >3
+ | cmp OP, BC_FUNCC+2 // Fast function?
+ | jae >4
+ |2:
+ | mov RCd, MULTRES // RC/RD holds nres+1.
+ |3:
+ | jmp aword [DISPATCH+OP*8]
+ |
+ |4: // Check frame below fast function.
+ | mov RC, [BASE-8]
+ | test RCd, FRAME_TYPE
+ | jnz <2 // Trace stitching continuation?
+ | // Otherwise set KBASE for Lua function below fast function.
+ | movzx RCd, byte [RC-3]
+ | neg RC
+ | mov LFUNC:KBASE, [BASE+RC*8-24]
+ | cleartp LFUNC:KBASE
+ | mov KBASE, LFUNC:KBASE->pc
+ | mov KBASE, [KBASE+PC2PROTO(k)]
+ | jmp <2
+ |
+ |9: // Rethrow error from the right C frame.
+ | neg RD
+ | mov CARG1, L:RB
+ | mov CARG2, RD
+ | call extern lj_err_throw // (lua_State *L, int errcode)
+ |.endif
+ |
+ |//-----------------------------------------------------------------------
+ |//-- Math helper functions ----------------------------------------------
+ |//-----------------------------------------------------------------------
+ |
+ |// FP value rounding. Called by math.floor/math.ceil fast functions
+ |// and from JIT code. arg/ret is xmm0. xmm0-xmm3 and RD (eax) modified.
+ |.macro vm_round, name, mode, cond
+ |->name:
+ |->name .. _sse:
+ | sseconst_abs xmm2, RD
+ | sseconst_2p52 xmm3, RD
+ | movaps xmm1, xmm0
+ | andpd xmm1, xmm2 // |x|
+ | ucomisd xmm3, xmm1 // No truncation if 2^52 <= |x|.
+ | jbe >1
+ | andnpd xmm2, xmm0 // Isolate sign bit.
+ |.if mode == 2 // trunc(x)?
+ | movaps xmm0, xmm1
+ | addsd xmm1, xmm3 // (|x| + 2^52) - 2^52
+ | subsd xmm1, xmm3
+ | sseconst_1 xmm3, RD
+ | cmpsd xmm0, xmm1, 1 // |x| < result?
+ | andpd xmm0, xmm3
+ | subsd xmm1, xmm0 // If yes, subtract -1.
+ | orpd xmm1, xmm2 // Merge sign bit back in.
+ |.else
+ | addsd xmm1, xmm3 // (|x| + 2^52) - 2^52
+ | subsd xmm1, xmm3
+ | orpd xmm1, xmm2 // Merge sign bit back in.
+ | .if mode == 1 // ceil(x)?
+ | sseconst_m1 xmm2, RD // Must subtract -1 to preserve -0.
+ | cmpsd xmm0, xmm1, 6 // x > result?
+ | .else // floor(x)?
+ | sseconst_1 xmm2, RD
+ | cmpsd xmm0, xmm1, 1 // x < result?
+ | .endif
+ | andpd xmm0, xmm2
+ | subsd xmm1, xmm0 // If yes, subtract +-1.
+ |.endif
+ | movaps xmm0, xmm1
+ |1:
+ | ret
+ |.endmacro
+ |
+ | vm_round vm_floor, 0, 1
+ | vm_round vm_ceil, 1, JIT
+ | vm_round vm_trunc, 2, JIT
+ |
+ |// FP modulo x%y. Called by BC_MOD* and vm_arith.
+ |->vm_mod:
+ |// Args in xmm0/xmm1, return value in xmm0.
+ |// Caveat: xmm0-xmm5 and RC (eax) modified!
+ | movaps xmm5, xmm0
+ | divsd xmm0, xmm1
+ | sseconst_abs xmm2, RD
+ | sseconst_2p52 xmm3, RD
+ | movaps xmm4, xmm0
+ | andpd xmm4, xmm2 // |x/y|
+ | ucomisd xmm3, xmm4 // No truncation if 2^52 <= |x/y|.
+ | jbe >1
+ | andnpd xmm2, xmm0 // Isolate sign bit.
+ | addsd xmm4, xmm3 // (|x/y| + 2^52) - 2^52
+ | subsd xmm4, xmm3
+ | orpd xmm4, xmm2 // Merge sign bit back in.
+ | sseconst_1 xmm2, RD
+ | cmpsd xmm0, xmm4, 1 // x/y < result?
+ | andpd xmm0, xmm2
+ | subsd xmm4, xmm0 // If yes, subtract 1.0.
+ | movaps xmm0, xmm5
+ | mulsd xmm1, xmm4
+ | subsd xmm0, xmm1
+ | ret
+ |1:
+ | mulsd xmm1, xmm0
+ | movaps xmm0, xmm5
+ | subsd xmm0, xmm1
+ | ret
+ |
+ |// Args in xmm0/eax. Ret in xmm0. xmm0-xmm1 and eax modified.
+ |->vm_powi_sse:
+ | cmp eax, 1; jle >6 // i<=1?
+ | // Now 1 < (unsigned)i <= 0x80000000.
+ |1: // Handle leading zeros.
+ | test eax, 1; jnz >2
+ | mulsd xmm0, xmm0
+ | shr eax, 1
+ | jmp <1
+ |2:
+ | shr eax, 1; jz >5
+ | movaps xmm1, xmm0
+ |3: // Handle trailing bits.
+ | mulsd xmm0, xmm0
+ | shr eax, 1; jz >4
+ | jnc <3
+ | mulsd xmm1, xmm0
+ | jmp <3
+ |4:
+ | mulsd xmm0, xmm1
+ |5:
+ | ret
+ |6:
+ | je <5 // x^1 ==> x
+ | jb >7 // x^0 ==> 1
+ | neg eax
+ | call <1
+ | sseconst_1 xmm1, RD
+ | divsd xmm1, xmm0
+ | movaps xmm0, xmm1
+ | ret
+ |7:
+ | sseconst_1 xmm0, RD
+ | ret
+ |
+ |//-----------------------------------------------------------------------
+ |//-- Miscellaneous functions --------------------------------------------
+ |//-----------------------------------------------------------------------
+ |
+ |// int lj_vm_cpuid(uint32_t f, uint32_t res[4])
+ |->vm_cpuid:
+ | mov eax, CARG1d
+ | .if X64WIN; push rsi; mov rsi, CARG2; .endif
+ | push rbx
+ | cpuid
+ | mov [rsi], eax
+ | mov [rsi+4], ebx
+ | mov [rsi+8], ecx
+ | mov [rsi+12], edx
+ | pop rbx
+ | .if X64WIN; pop rsi; .endif
+ | ret
+ |
+ |//-----------------------------------------------------------------------
+ |//-- Assertions ---------------------------------------------------------
+ |//-----------------------------------------------------------------------
+ |
+ |->assert_bad_for_arg_type:
+#ifdef LUA_USE_ASSERT
+ | int3
+#endif
+ | int3
+ |
+ |//-----------------------------------------------------------------------
+ |//-- FFI helper functions -----------------------------------------------
+ |//-----------------------------------------------------------------------
+ |
+ |// Handler for callback functions. Callback slot number in ah/al.
+ |->vm_ffi_callback:
+ |.if FFI
+ |.type CTSTATE, CTState, PC
+ | saveregs_ // ebp/rbp already saved. ebp now holds global_State *.
+ | lea DISPATCH, [ebp+GG_G2DISP]
+ | mov CTSTATE, GL:ebp->ctype_state
+ | movzx eax, ax
+ | mov CTSTATE->cb.slot, eax
+ | mov CTSTATE->cb.gpr[0], CARG1
+ | mov CTSTATE->cb.gpr[1], CARG2
+ | mov CTSTATE->cb.gpr[2], CARG3
+ | mov CTSTATE->cb.gpr[3], CARG4
+ | movsd qword CTSTATE->cb.fpr[0], xmm0
+ | movsd qword CTSTATE->cb.fpr[1], xmm1
+ | movsd qword CTSTATE->cb.fpr[2], xmm2
+ | movsd qword CTSTATE->cb.fpr[3], xmm3
+ |.if X64WIN
+ | lea rax, [rsp+CFRAME_SIZE+4*8]
+ |.else
+ | lea rax, [rsp+CFRAME_SIZE]
+ | mov CTSTATE->cb.gpr[4], CARG5
+ | mov CTSTATE->cb.gpr[5], CARG6
+ | movsd qword CTSTATE->cb.fpr[4], xmm4
+ | movsd qword CTSTATE->cb.fpr[5], xmm5
+ | movsd qword CTSTATE->cb.fpr[6], xmm6
+ | movsd qword CTSTATE->cb.fpr[7], xmm7
+ |.endif
+ | mov CTSTATE->cb.stack, rax
+ | mov CARG2, rsp
+ | mov SAVE_PC, CTSTATE // Any value outside of bytecode is ok.
+ | mov CARG1, CTSTATE
+ | call extern lj_ccallback_enter // (CTState *cts, void *cf)
+ | // lua_State * returned in eax (RD).
+ | set_vmstate INTERP
+ | mov BASE, L:RD->base
+ | mov RD, L:RD->top
+ | sub RD, BASE
+ | mov LFUNC:RB, [BASE-16]
+ | cleartp LFUNC:RB
+ | shr RD, 3
+ | add RD, 1
+ | ins_callt
+ |.endif
+ |
+ |->cont_ffi_callback: // Return from FFI callback.
+ |.if FFI
+ | mov L:RA, SAVE_L
+ | mov CTSTATE, [DISPATCH+DISPATCH_GL(ctype_state)]
+ | mov aword CTSTATE->L, L:RA
+ | mov L:RA->base, BASE
+ | mov L:RA->top, RB
+ | mov CARG1, CTSTATE
+ | mov CARG2, RC
+ | call extern lj_ccallback_leave // (CTState *cts, TValue *o)
+ | mov rax, CTSTATE->cb.gpr[0]
+ | movsd xmm0, qword CTSTATE->cb.fpr[0]
+ | jmp ->vm_leave_unw
+ |.endif
+ |
+ |->vm_ffi_call: // Call C function via FFI.
+ | // Caveat: needs special frame unwinding, see below.
+ |.if FFI
+ | .type CCSTATE, CCallState, rbx
+ | push rbp; mov rbp, rsp; push rbx; mov CCSTATE, CARG1
+ |
+ | // Readjust stack.
+ | mov eax, CCSTATE->spadj
+ | sub rsp, rax
+ |
+ | // Copy stack slots.
+ | movzx ecx, byte CCSTATE->nsp
+ | sub ecx, 1
+ | js >2
+ |1:
+ | mov rax, [CCSTATE+rcx*8+offsetof(CCallState, stack)]
+ | mov [rsp+rcx*8+CCALL_SPS_EXTRA*8], rax
+ | sub ecx, 1
+ | jns <1
+ |2:
+ |
+ | movzx eax, byte CCSTATE->nfpr
+ | mov CARG1, CCSTATE->gpr[0]
+ | mov CARG2, CCSTATE->gpr[1]
+ | mov CARG3, CCSTATE->gpr[2]
+ | mov CARG4, CCSTATE->gpr[3]
+ |.if not X64WIN
+ | mov CARG5, CCSTATE->gpr[4]
+ | mov CARG6, CCSTATE->gpr[5]
+ |.endif
+ | test eax, eax; jz >5
+ | movaps xmm0, CCSTATE->fpr[0]
+ | movaps xmm1, CCSTATE->fpr[1]
+ | movaps xmm2, CCSTATE->fpr[2]
+ | movaps xmm3, CCSTATE->fpr[3]
+ |.if not X64WIN
+ | cmp eax, 4; jbe >5
+ | movaps xmm4, CCSTATE->fpr[4]
+ | movaps xmm5, CCSTATE->fpr[5]
+ | movaps xmm6, CCSTATE->fpr[6]
+ | movaps xmm7, CCSTATE->fpr[7]
+ |.endif
+ |5:
+ |
+ | call aword CCSTATE->func
+ |
+ | mov CCSTATE->gpr[0], rax
+ | movaps CCSTATE->fpr[0], xmm0
+ |.if not X64WIN
+ | mov CCSTATE->gpr[1], rdx
+ | movaps CCSTATE->fpr[1], xmm1
+ |.endif
+ |
+ | mov rbx, [rbp-8]; leave; ret
+ |.endif
+ |// Note: vm_ffi_call must be the last function in this object file!
+ |
+ |//-----------------------------------------------------------------------
+}
+
+/* Generate the code for a single instruction. */
+static void build_ins(BuildCtx *ctx, BCOp op, int defop)
+{
+ int vk = 0;
+ |// Note: aligning all instructions does not pay off.
+ |=>defop:
+
+ switch (op) {
+
+ /* -- Comparison ops ---------------------------------------------------- */
+
+ /* Remember: all ops branch for a true comparison, fall through otherwise. */
+
+ |.macro jmp_comp, lt, ge, le, gt, target
+ ||switch (op) {
+ ||case BC_ISLT:
+ | lt target
+ ||break;
+ ||case BC_ISGE:
+ | ge target
+ ||break;
+ ||case BC_ISLE:
+ | le target
+ ||break;
+ ||case BC_ISGT:
+ | gt target
+ ||break;
+ ||default: break; /* Shut up GCC. */
+ ||}
+ |.endmacro
+
+ case BC_ISLT: case BC_ISGE: case BC_ISLE: case BC_ISGT:
+ | // RA = src1, RD = src2, JMP with RD = target
+ | ins_AD
+ | mov ITYPE, [BASE+RA*8]
+ | mov RB, [BASE+RD*8]
+ | mov RA, ITYPE
+ | mov RD, RB
+ | sar ITYPE, 47
+ | sar RB, 47
+ |.if DUALNUM
+ | cmp ITYPEd, LJ_TISNUM; jne >7
+ | cmp RBd, LJ_TISNUM; jne >8
+ | add PC, 4
+ | cmp RAd, RDd
+ | jmp_comp jge, jl, jg, jle, >9
+ |6:
+ | movzx RDd, PC_RD
+ | branchPC RD
+ |9:
+ | ins_next
+ |
+ |7: // RA is not an integer.
+ | ja ->vmeta_comp
+ | // RA is a number.
+ | cmp RBd, LJ_TISNUM; jb >1; jne ->vmeta_comp
+ | // RA is a number, RD is an integer.
+ | cvtsi2sd xmm0, RDd
+ | jmp >2
+ |
+ |8: // RA is an integer, RD is not an integer.
+ | ja ->vmeta_comp
+ | // RA is an integer, RD is a number.
+ | cvtsi2sd xmm1, RAd
+ | movd xmm0, RD
+ | jmp >3
+ |.else
+ | cmp ITYPEd, LJ_TISNUM; jae ->vmeta_comp
+ | cmp RBd, LJ_TISNUM; jae ->vmeta_comp
+ |.endif
+ |1:
+ | movd xmm0, RD
+ |2:
+ | movd xmm1, RA
+ |3:
+ | add PC, 4
+ | ucomisd xmm0, xmm1
+ | // Unordered: all of ZF CF PF set, ordered: PF clear.
+ | // To preserve NaN semantics GE/GT branch on unordered, but LT/LE don't.
+ |.if DUALNUM
+ | jmp_comp jbe, ja, jb, jae, <9
+ | jmp <6
+ |.else
+ | jmp_comp jbe, ja, jb, jae, >1
+ | movzx RDd, PC_RD
+ | branchPC RD
+ |1:
+ | ins_next
+ |.endif
+ break;
+
+ case BC_ISEQV: case BC_ISNEV:
+ vk = op == BC_ISEQV;
+ | ins_AD // RA = src1, RD = src2, JMP with RD = target
+ | mov RB, [BASE+RD*8]
+ | mov ITYPE, [BASE+RA*8]
+ | add PC, 4
+ | mov RD, RB
+ | mov RA, ITYPE
+ | sar RB, 47
+ | sar ITYPE, 47
+ |.if DUALNUM
+ | cmp RBd, LJ_TISNUM; jne >7
+ | cmp ITYPEd, LJ_TISNUM; jne >8
+ | cmp RDd, RAd
+ if (vk) {
+ | jne >9
+ } else {
+ | je >9
+ }
+ | movzx RDd, PC_RD
+ | branchPC RD
+ |9:
+ | ins_next
+ |
+ |7: // RD is not an integer.
+ | ja >5
+ | // RD is a number.
+ | movd xmm1, RD
+ | cmp ITYPEd, LJ_TISNUM; jb >1; jne >5
+ | // RD is a number, RA is an integer.
+ | cvtsi2sd xmm0, RAd
+ | jmp >2
+ |
+ |8: // RD is an integer, RA is not an integer.
+ | ja >5
+ | // RD is an integer, RA is a number.
+ | cvtsi2sd xmm1, RDd
+ | jmp >1
+ |
+ |.else
+ | cmp RBd, LJ_TISNUM; jae >5
+ | cmp ITYPEd, LJ_TISNUM; jae >5
+ | movd xmm1, RD
+ |.endif
+ |1:
+ | movd xmm0, RA
+ |2:
+ | ucomisd xmm0, xmm1
+ |4:
+ iseqne_fp:
+ if (vk) {
+ | jp >2 // Unordered means not equal.
+ | jne >2
+ } else {
+ | jp >2 // Unordered means not equal.
+ | je >1
+ }
+ iseqne_end:
+ if (vk) {
+ |1: // EQ: Branch to the target.
+ | movzx RDd, PC_RD
+ | branchPC RD
+ |2: // NE: Fallthrough to next instruction.
+ |.if not FFI
+ |3:
+ |.endif
+ } else {
+ |.if not FFI
+ |3:
+ |.endif
+ |2: // NE: Branch to the target.
+ | movzx RDd, PC_RD
+ | branchPC RD
+ |1: // EQ: Fallthrough to next instruction.
+ }
+ if (LJ_DUALNUM && (op == BC_ISEQV || op == BC_ISNEV ||
+ op == BC_ISEQN || op == BC_ISNEN)) {
+ | jmp <9
+ } else {
+ | ins_next
+ }
+ |
+ if (op == BC_ISEQV || op == BC_ISNEV) {
+ |5: // Either or both types are not numbers.
+ |.if FFI
+ | cmp RBd, LJ_TCDATA; je ->vmeta_equal_cd
+ | cmp ITYPEd, LJ_TCDATA; je ->vmeta_equal_cd
+ |.endif
+ | cmp RA, RD
+ | je <1 // Same GCobjs or pvalues?
+ | cmp RBd, ITYPEd
+ | jne <2 // Not the same type?
+ | cmp RBd, LJ_TISTABUD
+ | ja <2 // Different objects and not table/ud?
+ |
+ | // Different tables or userdatas. Need to check __eq metamethod.
+ | // Field metatable must be at same offset for GCtab and GCudata!
+ | cleartp TAB:RA
+ | mov TAB:RB, TAB:RA->metatable
+ | test TAB:RB, TAB:RB
+ | jz <2 // No metatable?
+ | test byte TAB:RB->nomm, 1<<MM_eq
+ | jnz <2 // Or 'no __eq' flag set?
+ if (vk) {
+ | xor RBd, RBd // ne = 0
+ } else {
+ | mov RBd, 1 // ne = 1
+ }
+ | jmp ->vmeta_equal // Handle __eq metamethod.
+ } else {
+ |.if FFI
+ |3:
+ | cmp ITYPEd, LJ_TCDATA
+ if (LJ_DUALNUM && vk) {
+ | jne <9
+ } else {
+ | jne <2
+ }
+ | jmp ->vmeta_equal_cd
+ |.endif
+ }
+ break;
+ case BC_ISEQS: case BC_ISNES:
+ vk = op == BC_ISEQS;
+ | ins_AND // RA = src, RD = str const, JMP with RD = target
+ | mov RB, [BASE+RA*8]
+ | add PC, 4
+ | checkstr RB, >3
+ | cmp RB, [KBASE+RD*8]
+ iseqne_test:
+ if (vk) {
+ | jne >2
+ } else {
+ | je >1
+ }
+ goto iseqne_end;
+ case BC_ISEQN: case BC_ISNEN:
+ vk = op == BC_ISEQN;
+ | ins_AD // RA = src, RD = num const, JMP with RD = target
+ | mov RB, [BASE+RA*8]
+ | add PC, 4
+ |.if DUALNUM
+ | checkint RB, >7
+ | mov RD, [KBASE+RD*8]
+ | checkint RD, >8
+ | cmp RBd, RDd
+ if (vk) {
+ | jne >9
+ } else {
+ | je >9
+ }
+ | movzx RDd, PC_RD
+ | branchPC RD
+ |9:
+ | ins_next
+ |
+ |7: // RA is not an integer.
+ | ja >3
+ | // RA is a number.
+ | mov RD, [KBASE+RD*8]
+ | checkint RD, >1
+ | // RA is a number, RD is an integer.
+ | cvtsi2sd xmm0, RDd
+ | jmp >2
+ |
+ |8: // RA is an integer, RD is a number.
+ | cvtsi2sd xmm0, RBd
+ | movd xmm1, RD
+ | ucomisd xmm0, xmm1
+ | jmp >4
+ |1:
+ | movd xmm0, RD
+ |.else
+ | checknum RB, >3
+ |1:
+ | movsd xmm0, qword [KBASE+RD*8]
+ |.endif
+ |2:
+ | ucomisd xmm0, qword [BASE+RA*8]
+ |4:
+ goto iseqne_fp;
+ case BC_ISEQP: case BC_ISNEP:
+ vk = op == BC_ISEQP;
+ | ins_AND // RA = src, RD = primitive type (~), JMP with RD = target
+ | mov RB, [BASE+RA*8]
+ | sar RB, 47
+ | add PC, 4
+ | cmp RBd, RDd
+ if (!LJ_HASFFI) goto iseqne_test;
+ if (vk) {
+ | jne >3
+ | movzx RDd, PC_RD
+ | branchPC RD
+ |2:
+ | ins_next
+ |3:
+ | cmp RBd, LJ_TCDATA; jne <2
+ | jmp ->vmeta_equal_cd
+ } else {
+ | je >2
+ | cmp RBd, LJ_TCDATA; je ->vmeta_equal_cd
+ | movzx RDd, PC_RD
+ | branchPC RD
+ |2:
+ | ins_next
+ }
+ break;
+
+ /* -- Unary test and copy ops ------------------------------------------- */
+
+ case BC_ISTC: case BC_ISFC: case BC_IST: case BC_ISF:
+ | ins_AD // RA = dst or unused, RD = src, JMP with RD = target
+ | mov ITYPE, [BASE+RD*8]
+ | add PC, 4
+ if (op == BC_ISTC || op == BC_ISFC) {
+ | mov RB, ITYPE
+ }
+ | sar ITYPE, 47
+ | cmp ITYPEd, LJ_TISTRUECOND
+ if (op == BC_IST || op == BC_ISTC) {
+ | jae >1
+ } else {
+ | jb >1
+ }
+ if (op == BC_ISTC || op == BC_ISFC) {
+ | mov [BASE+RA*8], RB
+ }
+ | movzx RDd, PC_RD
+ | branchPC RD
+ |1: // Fallthrough to the next instruction.
+ | ins_next
+ break;
+
+ case BC_ISTYPE:
+ | ins_AD // RA = src, RD = -type
+ | mov RB, [BASE+RA*8]
+ | sar RB, 47
+ | add RBd, RDd
+ | jne ->vmeta_istype
+ | ins_next
+ break;
+ case BC_ISNUM:
+ | ins_AD // RA = src, RD = -(TISNUM-1)
+ | checknumtp [BASE+RA*8], ->vmeta_istype
+ | ins_next
+ break;
+
+ /* -- Unary ops --------------------------------------------------------- */
+
+ case BC_MOV:
+ | ins_AD // RA = dst, RD = src
+ | mov RB, [BASE+RD*8]
+ | mov [BASE+RA*8], RB
+ | ins_next_
+ break;
+ case BC_NOT:
+ | ins_AD // RA = dst, RD = src
+ | mov RB, [BASE+RD*8]
+ | sar RB, 47
+ | mov RCd, 2
+ | cmp RB, LJ_TISTRUECOND
+ | sbb RCd, 0
+ | shl RC, 47
+ | not RC
+ | mov [BASE+RA*8], RC
+ | ins_next
+ break;
+ case BC_UNM:
+ | ins_AD // RA = dst, RD = src
+ | mov RB, [BASE+RD*8]
+ |.if DUALNUM
+ | checkint RB, >5
+ | neg RBd
+ | jo >4
+ | setint RB
+ |9:
+ | mov [BASE+RA*8], RB
+ | ins_next
+ |4:
+ | mov64 RB, U64x(41e00000,00000000) // 2^31.
+ | jmp <9
+ |5:
+ | ja ->vmeta_unm
+ |.else
+ | checknum RB, ->vmeta_unm
+ |.endif
+ | mov64 RD, U64x(80000000,00000000)
+ | xor RB, RD
+ |.if DUALNUM
+ | jmp <9
+ |.else
+ | mov [BASE+RA*8], RB
+ | ins_next
+ |.endif
+ break;
+ case BC_LEN:
+ | ins_AD // RA = dst, RD = src
+ | mov RD, [BASE+RD*8]
+ | checkstr RD, >2
+ |.if DUALNUM
+ | mov RDd, dword STR:RD->len
+ |1:
+ | setint RD
+ | mov [BASE+RA*8], RD
+ |.else
+ | xorps xmm0, xmm0
+ | cvtsi2sd xmm0, dword STR:RD->len
+ |1:
+ | movsd qword [BASE+RA*8], xmm0
+ |.endif
+ | ins_next
+ |2:
+ | cmp ITYPEd, LJ_TTAB; jne ->vmeta_len
+ | mov TAB:CARG1, TAB:RD
+#if LJ_52
+ | mov TAB:RB, TAB:RD->metatable
+ | cmp TAB:RB, 0
+ | jnz >9
+ |3:
+#endif
+ |->BC_LEN_Z:
+ | mov RB, BASE // Save BASE.
+ | call extern lj_tab_len // (GCtab *t)
+ | // Length of table returned in eax (RD).
+ |.if DUALNUM
+ | // Nothing to do.
+ |.else
+ | cvtsi2sd xmm0, RDd
+ |.endif
+ | mov BASE, RB // Restore BASE.
+ | movzx RAd, PC_RA
+ | jmp <1
+#if LJ_52
+ |9: // Check for __len.
+ | test byte TAB:RB->nomm, 1<<MM_len
+ | jnz <3
+ | jmp ->vmeta_len // 'no __len' flag NOT set: check.
+#endif
+ break;
+
+ /* -- Binary ops -------------------------------------------------------- */
+
+ |.macro ins_arithpre, sseins, ssereg
+ | ins_ABC
+ ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
+ ||switch (vk) {
+ ||case 0:
+ | checknumtp [BASE+RB*8], ->vmeta_arith_vn
+ | .if DUALNUM
+ | checknumtp [KBASE+RC*8], ->vmeta_arith_vn
+ | .endif
+ | movsd xmm0, qword [BASE+RB*8]
+ | sseins ssereg, qword [KBASE+RC*8]
+ || break;
+ ||case 1:
+ | checknumtp [BASE+RB*8], ->vmeta_arith_nv
+ | .if DUALNUM
+ | checknumtp [KBASE+RC*8], ->vmeta_arith_nv
+ | .endif
+ | movsd xmm0, qword [KBASE+RC*8]
+ | sseins ssereg, qword [BASE+RB*8]
+ || break;
+ ||default:
+ | checknumtp [BASE+RB*8], ->vmeta_arith_vv
+ | checknumtp [BASE+RC*8], ->vmeta_arith_vv
+ | movsd xmm0, qword [BASE+RB*8]
+ | sseins ssereg, qword [BASE+RC*8]
+ || break;
+ ||}
+ |.endmacro
+ |
+ |.macro ins_arithdn, intins
+ | ins_ABC
+ ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
+ ||switch (vk) {
+ ||case 0:
+ | mov RB, [BASE+RB*8]
+ | mov RC, [KBASE+RC*8]
+ | checkint RB, ->vmeta_arith_vno
+ | checkint RC, ->vmeta_arith_vno
+ | intins RBd, RCd; jo ->vmeta_arith_vno
+ || break;
+ ||case 1:
+ | mov RB, [BASE+RB*8]
+ | mov RC, [KBASE+RC*8]
+ | checkint RB, ->vmeta_arith_nvo
+ | checkint RC, ->vmeta_arith_nvo
+ | intins RCd, RBd; jo ->vmeta_arith_nvo
+ || break;
+ ||default:
+ | mov RB, [BASE+RB*8]
+ | mov RC, [BASE+RC*8]
+ | checkint RB, ->vmeta_arith_vvo
+ | checkint RC, ->vmeta_arith_vvo
+ | intins RBd, RCd; jo ->vmeta_arith_vvo
+ || break;
+ ||}
+ ||if (vk == 1) {
+ | setint RC
+ | mov [BASE+RA*8], RC
+ ||} else {
+ | setint RB
+ | mov [BASE+RA*8], RB
+ ||}
+ | ins_next
+ |.endmacro
+ |
+ |.macro ins_arithpost
+ | movsd qword [BASE+RA*8], xmm0
+ |.endmacro
+ |
+ |.macro ins_arith, sseins
+ | ins_arithpre sseins, xmm0
+ | ins_arithpost
+ | ins_next
+ |.endmacro
+ |
+ |.macro ins_arith, intins, sseins
+ |.if DUALNUM
+ | ins_arithdn intins
+ |.else
+ | ins_arith, sseins
+ |.endif
+ |.endmacro
+
+ | // RA = dst, RB = src1 or num const, RC = src2 or num const
+ case BC_ADDVN: case BC_ADDNV: case BC_ADDVV:
+ | ins_arith add, addsd
+ break;
+ case BC_SUBVN: case BC_SUBNV: case BC_SUBVV:
+ | ins_arith sub, subsd
+ break;
+ case BC_MULVN: case BC_MULNV: case BC_MULVV:
+ | ins_arith imul, mulsd
+ break;
+ case BC_DIVVN: case BC_DIVNV: case BC_DIVVV:
+ | ins_arith divsd
+ break;
+ case BC_MODVN:
+ | ins_arithpre movsd, xmm1
+ |->BC_MODVN_Z:
+ | call ->vm_mod
+ | ins_arithpost
+ | ins_next
+ break;
+ case BC_MODNV: case BC_MODVV:
+ | ins_arithpre movsd, xmm1
+ | jmp ->BC_MODVN_Z // Avoid 3 copies. It's slow anyway.
+ break;
+ case BC_POW:
+ | ins_arithpre movsd, xmm1
+ | mov RB, BASE
+ | call extern pow
+ | movzx RAd, PC_RA
+ | mov BASE, RB
+ | ins_arithpost
+ | ins_next
+ break;
+
+ case BC_CAT:
+ | ins_ABC // RA = dst, RB = src_start, RC = src_end
+ | mov L:CARG1, SAVE_L
+ | mov L:CARG1->base, BASE
+ | lea CARG2, [BASE+RC*8]
+ | mov CARG3d, RCd
+ | sub CARG3d, RBd
+ |->BC_CAT_Z:
+ | mov L:RB, L:CARG1
+ | mov SAVE_PC, PC
+ | call extern lj_meta_cat // (lua_State *L, TValue *top, int left)
+ | // NULL (finished) or TValue * (metamethod) returned in eax (RC).
+ | mov BASE, L:RB->base
+ | test RC, RC
+ | jnz ->vmeta_binop
+ | movzx RBd, PC_RB // Copy result to Stk[RA] from Stk[RB].
+ | movzx RAd, PC_RA
+ | mov RC, [BASE+RB*8]
+ | mov [BASE+RA*8], RC
+ | ins_next
+ break;
+
+ /* -- Constant ops ------------------------------------------------------ */
+
+ case BC_KSTR:
+ | ins_AND // RA = dst, RD = str const (~)
+ | mov RD, [KBASE+RD*8]
+ | settp RD, LJ_TSTR
+ | mov [BASE+RA*8], RD
+ | ins_next
+ break;
+ case BC_KCDATA:
+ |.if FFI
+ | ins_AND // RA = dst, RD = cdata const (~)
+ | mov RD, [KBASE+RD*8]
+ | settp RD, LJ_TCDATA
+ | mov [BASE+RA*8], RD
+ | ins_next
+ |.endif
+ break;
+ case BC_KSHORT:
+ | ins_AD // RA = dst, RD = signed int16 literal
+ |.if DUALNUM
+ | movsx RDd, RDW
+ | setint RD
+ | mov [BASE+RA*8], RD
+ |.else
+ | movsx RDd, RDW // Sign-extend literal.
+ | cvtsi2sd xmm0, RDd
+ | movsd qword [BASE+RA*8], xmm0
+ |.endif
+ | ins_next
+ break;
+ case BC_KNUM:
+ | ins_AD // RA = dst, RD = num const
+ | movsd xmm0, qword [KBASE+RD*8]
+ | movsd qword [BASE+RA*8], xmm0
+ | ins_next
+ break;
+ case BC_KPRI:
+ | ins_AD // RA = dst, RD = primitive type (~)
+ | shl RD, 47
+ | not RD
+ | mov [BASE+RA*8], RD
+ | ins_next
+ break;
+ case BC_KNIL:
+ | ins_AD // RA = dst_start, RD = dst_end
+ | lea RA, [BASE+RA*8+8]
+ | lea RD, [BASE+RD*8]
+ | mov RB, LJ_TNIL
+ | mov [RA-8], RB // Sets minimum 2 slots.
+ |1:
+ | mov [RA], RB
+ | add RA, 8
+ | cmp RA, RD
+ | jbe <1
+ | ins_next
+ break;
+
+ /* -- Upvalue and function ops ------------------------------------------ */
+
+ case BC_UGET:
+ | ins_AD // RA = dst, RD = upvalue #
+ | mov LFUNC:RB, [BASE-16]
+ | cleartp LFUNC:RB
+ | mov UPVAL:RB, [LFUNC:RB+RD*8+offsetof(GCfuncL, uvptr)]
+ | mov RB, UPVAL:RB->v
+ | mov RD, [RB]
+ | mov [BASE+RA*8], RD
+ | ins_next
+ break;
+ case BC_USETV:
+#define TV2MARKOFS \
+ ((int32_t)offsetof(GCupval, marked)-(int32_t)offsetof(GCupval, tv))
+ | ins_AD // RA = upvalue #, RD = src
+ | mov LFUNC:RB, [BASE-16]
+ | cleartp LFUNC:RB
+ | mov UPVAL:RB, [LFUNC:RB+RA*8+offsetof(GCfuncL, uvptr)]
+ | cmp byte UPVAL:RB->closed, 0
+ | mov RB, UPVAL:RB->v
+ | mov RA, [BASE+RD*8]
+ | mov [RB], RA
+ | jz >1
+ | // Check barrier for closed upvalue.
+ | test byte [RB+TV2MARKOFS], LJ_GC_BLACK // isblack(uv)
+ | jnz >2
+ |1:
+ | ins_next
+ |
+ |2: // Upvalue is black. Check if new value is collectable and white.
+ | mov RD, RA
+ | sar RD, 47
+ | sub RDd, LJ_TISGCV
+ | cmp RDd, LJ_TNUMX - LJ_TISGCV // tvisgcv(v)
+ | jbe <1
+ | cleartp GCOBJ:RA
+ | test byte GCOBJ:RA->gch.marked, LJ_GC_WHITES // iswhite(v)
+ | jz <1
+ | // Crossed a write barrier. Move the barrier forward.
+ |.if not X64WIN
+ | mov CARG2, RB
+ | mov RB, BASE // Save BASE.
+ |.else
+ | xchg CARG2, RB // Save BASE (CARG2 == BASE).
+ |.endif
+ | lea GL:CARG1, [DISPATCH+GG_DISP2G]
+ | call extern lj_gc_barrieruv // (global_State *g, TValue *tv)
+ | mov BASE, RB // Restore BASE.
+ | jmp <1
+ break;
+#undef TV2MARKOFS
+ case BC_USETS:
+ | ins_AND // RA = upvalue #, RD = str const (~)
+ | mov LFUNC:RB, [BASE-16]
+ | cleartp LFUNC:RB
+ | mov UPVAL:RB, [LFUNC:RB+RA*8+offsetof(GCfuncL, uvptr)]
+ | mov STR:RA, [KBASE+RD*8]
+ | mov RD, UPVAL:RB->v
+ | settp STR:ITYPE, STR:RA, LJ_TSTR
+ | mov [RD], STR:ITYPE
+ | test byte UPVAL:RB->marked, LJ_GC_BLACK // isblack(uv)
+ | jnz >2
+ |1:
+ | ins_next
+ |
+ |2: // Check if string is white and ensure upvalue is closed.
+ | test byte GCOBJ:RA->gch.marked, LJ_GC_WHITES // iswhite(str)
+ | jz <1
+ | cmp byte UPVAL:RB->closed, 0
+ | jz <1
+ | // Crossed a write barrier. Move the barrier forward.
+ | mov RB, BASE // Save BASE (CARG2 == BASE).
+ | mov CARG2, RD
+ | lea GL:CARG1, [DISPATCH+GG_DISP2G]
+ | call extern lj_gc_barrieruv // (global_State *g, TValue *tv)
+ | mov BASE, RB // Restore BASE.
+ | jmp <1
+ break;
+ case BC_USETN:
+ | ins_AD // RA = upvalue #, RD = num const
+ | mov LFUNC:RB, [BASE-16]
+ | cleartp LFUNC:RB
+ | movsd xmm0, qword [KBASE+RD*8]
+ | mov UPVAL:RB, [LFUNC:RB+RA*8+offsetof(GCfuncL, uvptr)]
+ | mov RA, UPVAL:RB->v
+ | movsd qword [RA], xmm0
+ | ins_next
+ break;
+ case BC_USETP:
+ | ins_AD // RA = upvalue #, RD = primitive type (~)
+ | mov LFUNC:RB, [BASE-16]
+ | cleartp LFUNC:RB
+ | mov UPVAL:RB, [LFUNC:RB+RA*8+offsetof(GCfuncL, uvptr)]
+ | shl RD, 47
+ | not RD
+ | mov RA, UPVAL:RB->v
+ | mov [RA], RD
+ | ins_next
+ break;
+ case BC_UCLO:
+ | ins_AD // RA = level, RD = target
+ | branchPC RD // Do this first to free RD.
+ | mov L:RB, SAVE_L
+ | cmp dword L:RB->openupval, 0
+ | je >1
+ | mov L:RB->base, BASE
+ | lea CARG2, [BASE+RA*8] // Caveat: CARG2 == BASE
+ | mov L:CARG1, L:RB // Caveat: CARG1 == RA
+ | call extern lj_func_closeuv // (lua_State *L, TValue *level)
+ | mov BASE, L:RB->base
+ |1:
+ | ins_next
+ break;
+
+ case BC_FNEW:
+ | ins_AND // RA = dst, RD = proto const (~) (holding function prototype)
+ | mov L:RB, SAVE_L
+ | mov L:RB->base, BASE // Caveat: CARG2/CARG3 may be BASE.
+ | mov CARG3, [BASE-16]
+ | cleartp CARG3
+ | mov CARG2, [KBASE+RD*8] // Fetch GCproto *.
+ | mov CARG1, L:RB
+ | mov SAVE_PC, PC
+ | // (lua_State *L, GCproto *pt, GCfuncL *parent)
+ | call extern lj_func_newL_gc
+ | // GCfuncL * returned in eax (RC).
+ | mov BASE, L:RB->base
+ | movzx RAd, PC_RA
+ | settp LFUNC:RC, LJ_TFUNC
+ | mov [BASE+RA*8], LFUNC:RC
+ | ins_next
+ break;
+
+ /* -- Table ops --------------------------------------------------------- */
+
+ case BC_TNEW:
+ | ins_AD // RA = dst, RD = hbits|asize
+ | mov L:RB, SAVE_L
+ | mov L:RB->base, BASE
+ | mov RA, [DISPATCH+DISPATCH_GL(gc.total)]
+ | cmp RA, [DISPATCH+DISPATCH_GL(gc.threshold)]
+ | mov SAVE_PC, PC
+ | jae >5
+ |1:
+ | mov CARG3d, RDd
+ | and RDd, 0x7ff
+ | shr CARG3d, 11
+ | cmp RDd, 0x7ff
+ | je >3
+ |2:
+ | mov L:CARG1, L:RB
+ | mov CARG2d, RDd
+ | call extern lj_tab_new // (lua_State *L, int32_t asize, uint32_t hbits)
+ | // Table * returned in eax (RC).
+ | mov BASE, L:RB->base
+ | movzx RAd, PC_RA
+ | settp TAB:RC, LJ_TTAB
+ | mov [BASE+RA*8], TAB:RC
+ | ins_next
+ |3: // Turn 0x7ff into 0x801.
+ | mov RDd, 0x801
+ | jmp <2
+ |5:
+ | mov L:CARG1, L:RB
+ | call extern lj_gc_step_fixtop // (lua_State *L)
+ | movzx RDd, PC_RD
+ | jmp <1
+ break;
+ case BC_TDUP:
+ | ins_AND // RA = dst, RD = table const (~) (holding template table)
+ | mov L:RB, SAVE_L
+ | mov RA, [DISPATCH+DISPATCH_GL(gc.total)]
+ | mov SAVE_PC, PC
+ | cmp RA, [DISPATCH+DISPATCH_GL(gc.threshold)]
+ | mov L:RB->base, BASE
+ | jae >3
+ |2:
+ | mov TAB:CARG2, [KBASE+RD*8] // Caveat: CARG2 == BASE
+ | mov L:CARG1, L:RB // Caveat: CARG1 == RA
+ | call extern lj_tab_dup // (lua_State *L, Table *kt)
+ | // Table * returned in eax (RC).
+ | mov BASE, L:RB->base
+ | movzx RAd, PC_RA
+ | settp TAB:RC, LJ_TTAB
+ | mov [BASE+RA*8], TAB:RC
+ | ins_next
+ |3:
+ | mov L:CARG1, L:RB
+ | call extern lj_gc_step_fixtop // (lua_State *L)
+ | movzx RDd, PC_RD // Need to reload RD.
+ | not RD
+ | jmp <2
+ break;
+
+ case BC_GGET:
+ | ins_AND // RA = dst, RD = str const (~)
+ | mov LFUNC:RB, [BASE-16]
+ | cleartp LFUNC:RB
+ | mov TAB:RB, LFUNC:RB->env
+ | mov STR:RC, [KBASE+RD*8]
+ | jmp ->BC_TGETS_Z
+ break;
+ case BC_GSET:
+ | ins_AND // RA = src, RD = str const (~)
+ | mov LFUNC:RB, [BASE-16]
+ | cleartp LFUNC:RB
+ | mov TAB:RB, LFUNC:RB->env
+ | mov STR:RC, [KBASE+RD*8]
+ | jmp ->BC_TSETS_Z
+ break;
+
+ case BC_TGETV:
+ | ins_ABC // RA = dst, RB = table, RC = key
+ | mov TAB:RB, [BASE+RB*8]
+ | mov RC, [BASE+RC*8]
+ | checktab TAB:RB, ->vmeta_tgetv
+ |
+ | // Integer key?
+ |.if DUALNUM
+ | checkint RC, >5
+ |.else
+ | // Convert number to int and back and compare.
+ | checknum RC, >5
+ | movd xmm0, RC
+ | cvttsd2si RCd, xmm0
+ | cvtsi2sd xmm1, RCd
+ | ucomisd xmm0, xmm1
+ | jne ->vmeta_tgetv // Generic numeric key? Use fallback.
+ |.endif
+ | cmp RCd, TAB:RB->asize // Takes care of unordered, too.
+ | jae ->vmeta_tgetv // Not in array part? Use fallback.
+ | shl RCd, 3
+ | add RC, TAB:RB->array
+ | // Get array slot.
+ | mov ITYPE, [RC]
+ | cmp ITYPE, LJ_TNIL // Avoid overwriting RB in fastpath.
+ | je >2
+ |1:
+ | mov [BASE+RA*8], ITYPE
+ | ins_next
+ |
+ |2: // Check for __index if table value is nil.
+ | mov TAB:TMPR, TAB:RB->metatable
+ | test TAB:TMPR, TAB:TMPR
+ | jz <1
+ | test byte TAB:TMPR->nomm, 1<<MM_index
+ | jz ->vmeta_tgetv // 'no __index' flag NOT set: check.
+ | jmp <1
+ |
+ |5: // String key?
+ | cmp ITYPEd, LJ_TSTR; jne ->vmeta_tgetv
+ | cleartp STR:RC
+ | jmp ->BC_TGETS_Z
+ break;
+ case BC_TGETS:
+ | ins_ABC // RA = dst, RB = table, RC = str const (~)
+ | mov TAB:RB, [BASE+RB*8]
+ | not RC
+ | mov STR:RC, [KBASE+RC*8]
+ | checktab TAB:RB, ->vmeta_tgets
+ |->BC_TGETS_Z: // RB = GCtab *, RC = GCstr *
+ | mov TMPRd, TAB:RB->hmask
+ | and TMPRd, STR:RC->hash
+ | imul TMPRd, #NODE
+ | add NODE:TMPR, TAB:RB->node
+ | settp ITYPE, STR:RC, LJ_TSTR
+ |1:
+ | cmp NODE:TMPR->key, ITYPE
+ | jne >4
+ | // Get node value.
+ | mov ITYPE, NODE:TMPR->val
+ | cmp ITYPE, LJ_TNIL
+ | je >5 // Key found, but nil value?
+ |2:
+ | mov [BASE+RA*8], ITYPE
+ | ins_next
+ |
+ |4: // Follow hash chain.
+ | mov NODE:TMPR, NODE:TMPR->next
+ | test NODE:TMPR, NODE:TMPR
+ | jnz <1
+ | // End of hash chain: key not found, nil result.
+ | mov ITYPE, LJ_TNIL
+ |
+ |5: // Check for __index if table value is nil.
+ | mov TAB:TMPR, TAB:RB->metatable
+ | test TAB:TMPR, TAB:TMPR
+ | jz <2 // No metatable: done.
+ | test byte TAB:TMPR->nomm, 1<<MM_index
+ | jnz <2 // 'no __index' flag set: done.
+ | jmp ->vmeta_tgets // Caveat: preserve STR:RC.
+ break;
+ case BC_TGETB:
+ | ins_ABC // RA = dst, RB = table, RC = byte literal
+ | mov TAB:RB, [BASE+RB*8]
+ | checktab TAB:RB, ->vmeta_tgetb
+ | cmp RCd, TAB:RB->asize
+ | jae ->vmeta_tgetb
+ | shl RCd, 3
+ | add RC, TAB:RB->array
+ | // Get array slot.
+ | mov ITYPE, [RC]
+ | cmp ITYPE, LJ_TNIL
+ | je >2
+ |1:
+ | mov [BASE+RA*8], ITYPE
+ | ins_next
+ |
+ |2: // Check for __index if table value is nil.
+ | mov TAB:TMPR, TAB:RB->metatable
+ | test TAB:TMPR, TAB:TMPR
+ | jz <1
+ | test byte TAB:TMPR->nomm, 1<<MM_index
+ | jz ->vmeta_tgetb // 'no __index' flag NOT set: check.
+ | jmp <1
+ break;
+ case BC_TGETR:
+ | ins_ABC // RA = dst, RB = table, RC = key
+ | mov TAB:RB, [BASE+RB*8]
+ | cleartp TAB:RB
+ |.if DUALNUM
+ | mov RCd, dword [BASE+RC*8]
+ |.else
+ | cvttsd2si RCd, qword [BASE+RC*8]
+ |.endif
+ | cmp RCd, TAB:RB->asize
+ | jae ->vmeta_tgetr // Not in array part? Use fallback.
+ | shl RCd, 3
+ | add RC, TAB:RB->array
+ | // Get array slot.
+ |->BC_TGETR_Z:
+ | mov ITYPE, [RC]
+ |->BC_TGETR2_Z:
+ | mov [BASE+RA*8], ITYPE
+ | ins_next
+ break;
+
+ case BC_TSETV:
+ | ins_ABC // RA = src, RB = table, RC = key
+ | mov TAB:RB, [BASE+RB*8]
+ | mov RC, [BASE+RC*8]
+ | checktab TAB:RB, ->vmeta_tsetv
+ |
+ | // Integer key?
+ |.if DUALNUM
+ | checkint RC, >5
+ |.else
+ | // Convert number to int and back and compare.
+ | checknum RC, >5
+ | movd xmm0, RC
+ | cvttsd2si RCd, xmm0
+ | cvtsi2sd xmm1, RCd
+ | ucomisd xmm0, xmm1
+ | jne ->vmeta_tsetv // Generic numeric key? Use fallback.
+ |.endif
+ | cmp RCd, TAB:RB->asize // Takes care of unordered, too.
+ | jae ->vmeta_tsetv
+ | shl RCd, 3
+ | add RC, TAB:RB->array
+ | cmp aword [RC], LJ_TNIL
+ | je >3 // Previous value is nil?
+ |1:
+ | test byte TAB:RB->marked, LJ_GC_BLACK // isblack(table)
+ | jnz >7
+ |2: // Set array slot.
+ | mov RB, [BASE+RA*8]
+ | mov [RC], RB
+ | ins_next
+ |
+ |3: // Check for __newindex if previous value is nil.
+ | mov TAB:TMPR, TAB:RB->metatable
+ | test TAB:TMPR, TAB:TMPR
+ | jz <1
+ | test byte TAB:TMPR->nomm, 1<<MM_newindex
+ | jz ->vmeta_tsetv // 'no __newindex' flag NOT set: check.
+ | jmp <1
+ |
+ |5: // String key?
+ | cmp ITYPEd, LJ_TSTR; jne ->vmeta_tsetv
+ | cleartp STR:RC
+ | jmp ->BC_TSETS_Z
+ |
+ |7: // Possible table write barrier for the value. Skip valiswhite check.
+ | barrierback TAB:RB, TMPR
+ | jmp <2
+ break;
+ case BC_TSETS:
+ | ins_ABC // RA = src, RB = table, RC = str const (~)
+ | mov TAB:RB, [BASE+RB*8]
+ | not RC
+ | mov STR:RC, [KBASE+RC*8]
+ | checktab TAB:RB, ->vmeta_tsets
+ |->BC_TSETS_Z: // RB = GCtab *, RC = GCstr *
+ | mov TMPRd, TAB:RB->hmask
+ | and TMPRd, STR:RC->hash
+ | imul TMPRd, #NODE
+ | mov byte TAB:RB->nomm, 0 // Clear metamethod cache.
+ | add NODE:TMPR, TAB:RB->node
+ | settp ITYPE, STR:RC, LJ_TSTR
+ |1:
+ | cmp NODE:TMPR->key, ITYPE
+ | jne >5
+ | // Ok, key found. Assumes: offsetof(Node, val) == 0
+ | cmp aword [TMPR], LJ_TNIL
+ | je >4 // Previous value is nil?
+ |2:
+ | test byte TAB:RB->marked, LJ_GC_BLACK // isblack(table)
+ | jnz >7
+ |3: // Set node value.
+ | mov ITYPE, [BASE+RA*8]
+ | mov [TMPR], ITYPE
+ | ins_next
+ |
+ |4: // Check for __newindex if previous value is nil.
+ | mov TAB:ITYPE, TAB:RB->metatable
+ | test TAB:ITYPE, TAB:ITYPE
+ | jz <2
+ | test byte TAB:ITYPE->nomm, 1<<MM_newindex
+ | jz ->vmeta_tsets // 'no __newindex' flag NOT set: check.
+ | jmp <2
+ |
+ |5: // Follow hash chain.
+ | mov NODE:TMPR, NODE:TMPR->next
+ | test NODE:TMPR, NODE:TMPR
+ | jnz <1
+ | // End of hash chain: key not found, add a new one.
+ |
+ | // But check for __newindex first.
+ | mov TAB:TMPR, TAB:RB->metatable
+ | test TAB:TMPR, TAB:TMPR
+ | jz >6 // No metatable: continue.
+ | test byte TAB:TMPR->nomm, 1<<MM_newindex
+ | jz ->vmeta_tsets // 'no __newindex' flag NOT set: check.
+ |6:
+ | mov TMP1, ITYPE
+ | mov L:CARG1, SAVE_L
+ | mov L:CARG1->base, BASE
+ | lea CARG3, TMP1
+ | mov CARG2, TAB:RB
+ | mov SAVE_PC, PC
+ | call extern lj_tab_newkey // (lua_State *L, GCtab *t, TValue *k)
+ | // Handles write barrier for the new key. TValue * returned in eax (RC).
+ | mov L:CARG1, SAVE_L
+ | mov BASE, L:CARG1->base
+ | mov TMPR, rax
+ | movzx RAd, PC_RA
+ | jmp <2 // Must check write barrier for value.
+ |
+ |7: // Possible table write barrier for the value. Skip valiswhite check.
+ | barrierback TAB:RB, ITYPE
+ | jmp <3
+ break;
+ case BC_TSETB:
+ | ins_ABC // RA = src, RB = table, RC = byte literal
+ | mov TAB:RB, [BASE+RB*8]
+ | checktab TAB:RB, ->vmeta_tsetb
+ | cmp RCd, TAB:RB->asize
+ | jae ->vmeta_tsetb
+ | shl RCd, 3
+ | add RC, TAB:RB->array
+ | cmp aword [RC], LJ_TNIL
+ | je >3 // Previous value is nil?
+ |1:
+ | test byte TAB:RB->marked, LJ_GC_BLACK // isblack(table)
+ | jnz >7
+ |2: // Set array slot.
+ | mov ITYPE, [BASE+RA*8]
+ | mov [RC], ITYPE
+ | ins_next
+ |
+ |3: // Check for __newindex if previous value is nil.
+ | mov TAB:TMPR, TAB:RB->metatable
+ | test TAB:TMPR, TAB:TMPR
+ | jz <1
+ | test byte TAB:TMPR->nomm, 1<<MM_newindex
+ | jz ->vmeta_tsetb // 'no __newindex' flag NOT set: check.
+ | jmp <1
+ |
+ |7: // Possible table write barrier for the value. Skip valiswhite check.
+ | barrierback TAB:RB, TMPR
+ | jmp <2
+ break;
+ case BC_TSETR:
+ | ins_ABC // RA = src, RB = table, RC = key
+ | mov TAB:RB, [BASE+RB*8]
+ | cleartp TAB:RB
+ |.if DUALNUM
+ | mov RC, [BASE+RC*8]
+ |.else
+ | cvttsd2si RCd, qword [BASE+RC*8]
+ |.endif
+ | test byte TAB:RB->marked, LJ_GC_BLACK // isblack(table)
+ | jnz >7
+ |2:
+ | cmp RCd, TAB:RB->asize
+ | jae ->vmeta_tsetr
+ | shl RCd, 3
+ | add RC, TAB:RB->array
+ | // Set array slot.
+ |->BC_TSETR_Z:
+ | mov ITYPE, [BASE+RA*8]
+ | mov [RC], ITYPE
+ | ins_next
+ |
+ |7: // Possible table write barrier for the value. Skip valiswhite check.
+ | barrierback TAB:RB, TMPR
+ | jmp <2
+ break;
+
+ case BC_TSETM:
+ | ins_AD // RA = base (table at base-1), RD = num const (start index)
+ |1:
+ | mov TMPRd, dword [KBASE+RD*8] // Integer constant is in lo-word.
+ | lea RA, [BASE+RA*8]
+ | mov TAB:RB, [RA-8] // Guaranteed to be a table.
+ | cleartp TAB:RB
+ | test byte TAB:RB->marked, LJ_GC_BLACK // isblack(table)
+ | jnz >7
+ |2:
+ | mov RDd, MULTRES
+ | sub RDd, 1
+ | jz >4 // Nothing to copy?
+ | add RDd, TMPRd // Compute needed size.
+ | cmp RDd, TAB:RB->asize
+ | ja >5 // Doesn't fit into array part?
+ | sub RDd, TMPRd
+ | shl TMPRd, 3
+ | add TMPR, TAB:RB->array
+ |3: // Copy result slots to table.
+ | mov RB, [RA]
+ | add RA, 8
+ | mov [TMPR], RB
+ | add TMPR, 8
+ | sub RDd, 1
+ | jnz <3
+ |4:
+ | ins_next
+ |
+ |5: // Need to resize array part.
+ | mov L:CARG1, SAVE_L
+ | mov L:CARG1->base, BASE // Caveat: CARG2/CARG3 may be BASE.
+ | mov CARG2, TAB:RB
+ | mov CARG3d, RDd
+ | mov L:RB, L:CARG1
+ | mov SAVE_PC, PC
+ | call extern lj_tab_reasize // (lua_State *L, GCtab *t, int nasize)
+ | mov BASE, L:RB->base
+ | movzx RAd, PC_RA // Restore RA.
+ | movzx RDd, PC_RD // Restore RD.
+ | jmp <1 // Retry.
+ |
+ |7: // Possible table write barrier for any value. Skip valiswhite check.
+ | barrierback TAB:RB, RD
+ | jmp <2
+ break;
+
+ /* -- Calls and vararg handling ----------------------------------------- */
+
+ case BC_CALL: case BC_CALLM:
+ | ins_A_C // RA = base, (RB = nresults+1,) RC = nargs+1 | extra_nargs
+ if (op == BC_CALLM) {
+ | add NARGS:RDd, MULTRES
+ }
+ | mov LFUNC:RB, [BASE+RA*8]
+ | checkfunc LFUNC:RB, ->vmeta_call_ra
+ | lea BASE, [BASE+RA*8+16]
+ | ins_call
+ break;
+
+ case BC_CALLMT:
+ | ins_AD // RA = base, RD = extra_nargs
+ | add NARGS:RDd, MULTRES
+ | // Fall through. Assumes BC_CALLT follows and ins_AD is a no-op.
+ break;
+ case BC_CALLT:
+ | ins_AD // RA = base, RD = nargs+1
+ | lea RA, [BASE+RA*8+16]
+ | mov KBASE, BASE // Use KBASE for move + vmeta_call hint.
+ | mov LFUNC:RB, [RA-16]
+ | checktp_nc LFUNC:RB, LJ_TFUNC, ->vmeta_call
+ |->BC_CALLT_Z:
+ | mov PC, [BASE-8]
+ | test PCd, FRAME_TYPE
+ | jnz >7
+ |1:
+ | mov [BASE-16], LFUNC:RB // Copy func+tag down, reloaded below.
+ | mov MULTRES, NARGS:RDd
+ | sub NARGS:RDd, 1
+ | jz >3
+ |2: // Move args down.
+ | mov RB, [RA]
+ | add RA, 8
+ | mov [KBASE], RB
+ | add KBASE, 8
+ | sub NARGS:RDd, 1
+ | jnz <2
+ |
+ | mov LFUNC:RB, [BASE-16]
+ |3:
+ | cleartp LFUNC:RB
+ | mov NARGS:RDd, MULTRES
+ | cmp byte LFUNC:RB->ffid, 1 // (> FF_C) Calling a fast function?
+ | ja >5
+ |4:
+ | ins_callt
+ |
+ |5: // Tailcall to a fast function.
+ | test PCd, FRAME_TYPE // Lua frame below?
+ | jnz <4
+ | movzx RAd, PC_RA
+ | neg RA
+ | mov LFUNC:KBASE, [BASE+RA*8-32] // Need to prepare KBASE.
+ | cleartp LFUNC:KBASE
+ | mov KBASE, LFUNC:KBASE->pc
+ | mov KBASE, [KBASE+PC2PROTO(k)]
+ | jmp <4
+ |
+ |7: // Tailcall from a vararg function.
+ | sub PC, FRAME_VARG
+ | test PCd, FRAME_TYPEP
+ | jnz >8 // Vararg frame below?
+ | sub BASE, PC // Need to relocate BASE/KBASE down.
+ | mov KBASE, BASE
+ | mov PC, [BASE-8]
+ | jmp <1
+ |8:
+ | add PCd, FRAME_VARG
+ | jmp <1
+ break;
+
+ case BC_ITERC:
+ | ins_A // RA = base, (RB = nresults+1,) RC = nargs+1 (2+1)
+ | lea RA, [BASE+RA*8+16] // fb = base+2
+ | mov RB, [RA-32] // Copy state. fb[0] = fb[-4].
+ | mov RC, [RA-24] // Copy control var. fb[1] = fb[-3].
+ | mov [RA], RB
+ | mov [RA+8], RC
+ | mov LFUNC:RB, [RA-40] // Copy callable. fb[-1] = fb[-5]
+ | mov [RA-16], LFUNC:RB
+ | mov NARGS:RDd, 2+1 // Handle like a regular 2-arg call.
+ | checkfunc LFUNC:RB, ->vmeta_call
+ | mov BASE, RA
+ | ins_call
+ break;
+
+ case BC_ITERN:
+ | ins_A // RA = base, (RB = nresults+1, RC = nargs+1 (2+1))
+ |.if JIT
+ | // NYI: add hotloop, record BC_ITERN.
+ |.endif
+ | mov TAB:RB, [BASE+RA*8-16]
+ | cleartp TAB:RB
+ | mov RCd, [BASE+RA*8-8] // Get index from control var.
+ | mov TMPRd, TAB:RB->asize
+ | add PC, 4
+ | mov ITYPE, TAB:RB->array
+ |1: // Traverse array part.
+ | cmp RCd, TMPRd; jae >5 // Index points after array part?
+ | cmp aword [ITYPE+RC*8], LJ_TNIL; je >4
+ |.if not DUALNUM
+ | cvtsi2sd xmm0, RCd
+ |.endif
+ | // Copy array slot to returned value.
+ | mov RB, [ITYPE+RC*8]
+ | mov [BASE+RA*8+8], RB
+ | // Return array index as a numeric key.
+ |.if DUALNUM
+ | setint ITYPE, RC
+ | mov [BASE+RA*8], ITYPE
+ |.else
+ | movsd qword [BASE+RA*8], xmm0
+ |.endif
+ | add RCd, 1
+ | mov [BASE+RA*8-8], RCd // Update control var.
+ |2:
+ | movzx RDd, PC_RD // Get target from ITERL.
+ | branchPC RD
+ |3:
+ | ins_next
+ |
+ |4: // Skip holes in array part.
+ | add RCd, 1
+ | jmp <1
+ |
+ |5: // Traverse hash part.
+ | sub RCd, TMPRd
+ |6:
+ | cmp RCd, TAB:RB->hmask; ja <3 // End of iteration? Branch to ITERL+1.
+ | imul ITYPEd, RCd, #NODE
+ | add NODE:ITYPE, TAB:RB->node
+ | cmp aword NODE:ITYPE->val, LJ_TNIL; je >7
+ | lea TMPRd, [RCd+TMPRd+1]
+ | // Copy key and value from hash slot.
+ | mov RB, NODE:ITYPE->key
+ | mov RC, NODE:ITYPE->val
+ | mov [BASE+RA*8], RB
+ | mov [BASE+RA*8+8], RC
+ | mov [BASE+RA*8-8], TMPRd
+ | jmp <2
+ |
+ |7: // Skip holes in hash part.
+ | add RCd, 1
+ | jmp <6
+ break;
+
+ case BC_ISNEXT:
+ | ins_AD // RA = base, RD = target (points to ITERN)
+ | mov CFUNC:RB, [BASE+RA*8-24]
+ | checkfunc CFUNC:RB, >5
+ | checktptp [BASE+RA*8-16], LJ_TTAB, >5
+ | cmp aword [BASE+RA*8-8], LJ_TNIL; jne >5
+ | cmp byte CFUNC:RB->ffid, FF_next_N; jne >5
+ | branchPC RD
+ | mov64 TMPR, U64x(fffe7fff, 00000000)
+ | mov [BASE+RA*8-8], TMPR // Initialize control var.
+ |1:
+ | ins_next
+ |5: // Despecialize bytecode if any of the checks fail.
+ | mov PC_OP, BC_JMP
+ | branchPC RD
+ | mov byte [PC], BC_ITERC
+ | jmp <1
+ break;
+
+ case BC_VARG:
+ | ins_ABC // RA = base, RB = nresults+1, RC = numparams
+ | lea TMPR, [BASE+RC*8+(16+FRAME_VARG)]
+ | lea RA, [BASE+RA*8]
+ | sub TMPR, [BASE-8]
+ | // Note: TMPR may now be even _above_ BASE if nargs was < numparams.
+ | test RB, RB
+ | jz >5 // Copy all varargs?
+ | lea RB, [RA+RB*8-8]
+ | cmp TMPR, BASE // No vararg slots?
+ | jnb >2
+ |1: // Copy vararg slots to destination slots.
+ | mov RC, [TMPR-16]
+ | add TMPR, 8
+ | mov [RA], RC
+ | add RA, 8
+ | cmp RA, RB // All destination slots filled?
+ | jnb >3
+ | cmp TMPR, BASE // No more vararg slots?
+ | jb <1
+ |2: // Fill up remainder with nil.
+ | mov aword [RA], LJ_TNIL
+ | add RA, 8
+ | cmp RA, RB
+ | jb <2
+ |3:
+ | ins_next
+ |
+ |5: // Copy all varargs.
+ | mov MULTRES, 1 // MULTRES = 0+1
+ | mov RC, BASE
+ | sub RC, TMPR
+ | jbe <3 // No vararg slots?
+ | mov RBd, RCd
+ | shr RBd, 3
+ | add RBd, 1
+ | mov MULTRES, RBd // MULTRES = #varargs+1
+ | mov L:RB, SAVE_L
+ | add RC, RA
+ | cmp RC, L:RB->maxstack
+ | ja >7 // Need to grow stack?
+ |6: // Copy all vararg slots.
+ | mov RC, [TMPR-16]
+ | add TMPR, 8
+ | mov [RA], RC
+ | add RA, 8
+ | cmp TMPR, BASE // No more vararg slots?
+ | jb <6
+ | jmp <3
+ |
+ |7: // Grow stack for varargs.
+ | mov L:RB->base, BASE
+ | mov L:RB->top, RA
+ | mov SAVE_PC, PC
+ | sub TMPR, BASE // Need delta, because BASE may change.
+ | mov TMP1hi, TMPRd
+ | mov CARG2d, MULTRES
+ | sub CARG2d, 1
+ | mov CARG1, L:RB
+ | call extern lj_state_growstack // (lua_State *L, int n)
+ | mov BASE, L:RB->base
+ | movsxd TMPR, TMP1hi
+ | mov RA, L:RB->top
+ | add TMPR, BASE
+ | jmp <6
+ break;
+
+ /* -- Returns ----------------------------------------------------------- */
+
+ case BC_RETM:
+ | ins_AD // RA = results, RD = extra_nresults
+ | add RDd, MULTRES // MULTRES >=1, so RD >=1.
+ | // Fall through. Assumes BC_RET follows and ins_AD is a no-op.
+ break;
+
+ case BC_RET: case BC_RET0: case BC_RET1:
+ | ins_AD // RA = results, RD = nresults+1
+ if (op != BC_RET0) {
+ | shl RAd, 3
+ }
+ |1:
+ | mov PC, [BASE-8]
+ | mov MULTRES, RDd // Save nresults+1.
+ | test PCd, FRAME_TYPE // Check frame type marker.
+ | jnz >7 // Not returning to a fixarg Lua func?
+ switch (op) {
+ case BC_RET:
+ |->BC_RET_Z:
+ | mov KBASE, BASE // Use KBASE for result move.
+ | sub RDd, 1
+ | jz >3
+ |2: // Move results down.
+ | mov RB, [KBASE+RA]
+ | mov [KBASE-16], RB
+ | add KBASE, 8
+ | sub RDd, 1
+ | jnz <2
+ |3:
+ | mov RDd, MULTRES // Note: MULTRES may be >255.
+ | movzx RBd, PC_RB // So cannot compare with RDL!
+ |5:
+ | cmp RBd, RDd // More results expected?
+ | ja >6
+ break;
+ case BC_RET1:
+ | mov RB, [BASE+RA]
+ | mov [BASE-16], RB
+ /* fallthrough */
+ case BC_RET0:
+ |5:
+ | cmp PC_RB, RDL // More results expected?
+ | ja >6
+ default:
+ break;
+ }
+ | movzx RAd, PC_RA
+ | neg RA
+ | lea BASE, [BASE+RA*8-16] // base = base - (RA+2)*8
+ | mov LFUNC:KBASE, [BASE-16]
+ | cleartp LFUNC:KBASE
+ | mov KBASE, LFUNC:KBASE->pc
+ | mov KBASE, [KBASE+PC2PROTO(k)]
+ | ins_next
+ |
+ |6: // Fill up results with nil.
+ if (op == BC_RET) {
+ | mov aword [KBASE-16], LJ_TNIL // Note: relies on shifted base.
+ | add KBASE, 8
+ } else {
+ | mov aword [BASE+RD*8-24], LJ_TNIL
+ }
+ | add RD, 1
+ | jmp <5
+ |
+ |7: // Non-standard return case.
+ | lea RB, [PC-FRAME_VARG]
+ | test RBd, FRAME_TYPEP
+ | jnz ->vm_return
+ | // Return from vararg function: relocate BASE down and RA up.
+ | sub BASE, RB
+ if (op != BC_RET0) {
+ | add RA, RB
+ }
+ | jmp <1
+ break;
+
+ /* -- Loops and branches ------------------------------------------------ */
+
+ |.define FOR_IDX, [RA]
+ |.define FOR_STOP, [RA+8]
+ |.define FOR_STEP, [RA+16]
+ |.define FOR_EXT, [RA+24]
+
+ case BC_FORL:
+ |.if JIT
+ | hotloop RBd
+ |.endif
+ | // Fall through. Assumes BC_IFORL follows and ins_AJ is a no-op.
+ break;
+
+ case BC_JFORI:
+ case BC_JFORL:
+#if !LJ_HASJIT
+ break;
+#endif
+ case BC_FORI:
+ case BC_IFORL:
+ vk = (op == BC_IFORL || op == BC_JFORL);
+ | ins_AJ // RA = base, RD = target (after end of loop or start of loop)
+ | lea RA, [BASE+RA*8]
+ if (LJ_DUALNUM) {
+ | mov RB, FOR_IDX
+ | checkint RB, >9
+ | mov TMPR, FOR_STOP
+ if (!vk) {
+ | checkint TMPR, ->vmeta_for
+ | mov ITYPE, FOR_STEP
+ | test ITYPEd, ITYPEd; js >5
+ | sar ITYPE, 47;
+ | cmp ITYPEd, LJ_TISNUM; jne ->vmeta_for
+ } else {
+#ifdef LUA_USE_ASSERT
+ | checkinttp FOR_STOP, ->assert_bad_for_arg_type
+ | checkinttp FOR_STEP, ->assert_bad_for_arg_type
+#endif
+ | mov ITYPE, FOR_STEP
+ | test ITYPEd, ITYPEd; js >5
+ | add RBd, ITYPEd; jo >1
+ | setint RB
+ | mov FOR_IDX, RB
+ }
+ | cmp RBd, TMPRd
+ | mov FOR_EXT, RB
+ if (op == BC_FORI) {
+ | jle >7
+ |1:
+ |6:
+ | branchPC RD
+ } else if (op == BC_JFORI) {
+ | branchPC RD
+ | movzx RDd, PC_RD
+ | jle =>BC_JLOOP
+ |1:
+ |6:
+ } else if (op == BC_IFORL) {
+ | jg >7
+ |6:
+ | branchPC RD
+ |1:
+ } else {
+ | jle =>BC_JLOOP
+ |1:
+ |6:
+ }
+ |7:
+ | ins_next
+ |
+ |5: // Invert check for negative step.
+ if (!vk) {
+ | sar ITYPE, 47;
+ | cmp ITYPEd, LJ_TISNUM; jne ->vmeta_for
+ } else {
+ | add RBd, ITYPEd; jo <1
+ | setint RB
+ | mov FOR_IDX, RB
+ }
+ | cmp RBd, TMPRd
+ | mov FOR_EXT, RB
+ if (op == BC_FORI) {
+ | jge <7
+ } else if (op == BC_JFORI) {
+ | branchPC RD
+ | movzx RDd, PC_RD
+ | jge =>BC_JLOOP
+ } else if (op == BC_IFORL) {
+ | jl <7
+ } else {
+ | jge =>BC_JLOOP
+ }
+ | jmp <6
+ |9: // Fallback to FP variant.
+ if (!vk) {
+ | jae ->vmeta_for
+ }
+ } else if (!vk) {
+ | checknumtp FOR_IDX, ->vmeta_for
+ }
+ if (!vk) {
+ | checknumtp FOR_STOP, ->vmeta_for
+ } else {
+#ifdef LUA_USE_ASSERT
+ | checknumtp FOR_STOP, ->assert_bad_for_arg_type
+ | checknumtp FOR_STEP, ->assert_bad_for_arg_type
+#endif
+ }
+ | mov RB, FOR_STEP
+ if (!vk) {
+ | checknum RB, ->vmeta_for
+ }
+ | movsd xmm0, qword FOR_IDX
+ | movsd xmm1, qword FOR_STOP
+ if (vk) {
+ | addsd xmm0, qword FOR_STEP
+ | movsd qword FOR_IDX, xmm0
+ | test RB, RB; js >3
+ } else {
+ | jl >3
+ }
+ | ucomisd xmm1, xmm0
+ |1:
+ | movsd qword FOR_EXT, xmm0
+ if (op == BC_FORI) {
+ |.if DUALNUM
+ | jnb <7
+ |.else
+ | jnb >2
+ | branchPC RD
+ |.endif
+ } else if (op == BC_JFORI) {
+ | branchPC RD
+ | movzx RDd, PC_RD
+ | jnb =>BC_JLOOP
+ } else if (op == BC_IFORL) {
+ |.if DUALNUM
+ | jb <7
+ |.else
+ | jb >2
+ | branchPC RD
+ |.endif
+ } else {
+ | jnb =>BC_JLOOP
+ }
+ |.if DUALNUM
+ | jmp <6
+ |.else
+ |2:
+ | ins_next
+ |.endif
+ |
+ |3: // Invert comparison if step is negative.
+ | ucomisd xmm0, xmm1
+ | jmp <1
+ break;
+
+ case BC_ITERL:
+ |.if JIT
+ | hotloop RBd
+ |.endif
+ | // Fall through. Assumes BC_IITERL follows and ins_AJ is a no-op.
+ break;
+
+ case BC_JITERL:
+#if !LJ_HASJIT
+ break;
+#endif
+ case BC_IITERL:
+ | ins_AJ // RA = base, RD = target
+ | lea RA, [BASE+RA*8]
+ | mov RB, [RA]
+ | cmp RB, LJ_TNIL; je >1 // Stop if iterator returned nil.
+ if (op == BC_JITERL) {
+ | mov [RA-8], RB
+ | jmp =>BC_JLOOP
+ } else {
+ | branchPC RD // Otherwise save control var + branch.
+ | mov [RA-8], RB
+ }
+ |1:
+ | ins_next
+ break;
+
+ case BC_LOOP:
+ | ins_A // RA = base, RD = target (loop extent)
+ | // Note: RA/RD is only used by trace recorder to determine scope/extent
+ | // This opcode does NOT jump, it's only purpose is to detect a hot loop.
+ |.if JIT
+ | hotloop RBd
+ |.endif
+ | // Fall through. Assumes BC_ILOOP follows and ins_A is a no-op.
+ break;
+
+ case BC_ILOOP:
+ | ins_A // RA = base, RD = target (loop extent)
+ | ins_next
+ break;
+
+ case BC_JLOOP:
+ |.if JIT
+ | ins_AD // RA = base (ignored), RD = traceno
+ | mov RA, [DISPATCH+DISPATCH_J(trace)]
+ | mov TRACE:RD, [RA+RD*8]
+ | mov RD, TRACE:RD->mcode
+ | mov L:RB, SAVE_L
+ | mov [DISPATCH+DISPATCH_GL(jit_base)], BASE
+ | mov [DISPATCH+DISPATCH_GL(tmpbuf.L)], L:RB
+ | // Save additional callee-save registers only used in compiled code.
+ |.if X64WIN
+ | mov CSAVE_4, r12
+ | mov CSAVE_3, r13
+ | mov CSAVE_2, r14
+ | mov CSAVE_1, r15
+ | mov RA, rsp
+ | sub rsp, 10*16+4*8
+ | movdqa [RA-1*16], xmm6
+ | movdqa [RA-2*16], xmm7
+ | movdqa [RA-3*16], xmm8
+ | movdqa [RA-4*16], xmm9
+ | movdqa [RA-5*16], xmm10
+ | movdqa [RA-6*16], xmm11
+ | movdqa [RA-7*16], xmm12
+ | movdqa [RA-8*16], xmm13
+ | movdqa [RA-9*16], xmm14
+ | movdqa [RA-10*16], xmm15
+ |.else
+ | sub rsp, 16
+ | mov [rsp+16], r12
+ | mov [rsp+8], r13
+ |.endif
+ | jmp RD
+ |.endif
+ break;
+
+ case BC_JMP:
+ | ins_AJ // RA = unused, RD = target
+ | branchPC RD
+ | ins_next
+ break;
+
+ /* -- Function headers -------------------------------------------------- */
+
+ /*
+ ** Reminder: A function may be called with func/args above L->maxstack,
+ ** i.e. occupying EXTRA_STACK slots. And vmeta_call may add one extra slot,
+ ** too. This means all FUNC* ops (including fast functions) must check
+ ** for stack overflow _before_ adding more slots!
+ */
+
+ case BC_FUNCF:
+ |.if JIT
+ | hotcall RBd
+ |.endif
+ case BC_FUNCV: /* NYI: compiled vararg functions. */
+ | // Fall through. Assumes BC_IFUNCF/BC_IFUNCV follow and ins_AD is a no-op.
+ break;
+
+ case BC_JFUNCF:
+#if !LJ_HASJIT
+ break;
+#endif
+ case BC_IFUNCF:
+ | ins_AD // BASE = new base, RA = framesize, RD = nargs+1
+ | mov KBASE, [PC-4+PC2PROTO(k)]
+ | mov L:RB, SAVE_L
+ | lea RA, [BASE+RA*8] // Top of frame.
+ | cmp RA, L:RB->maxstack
+ | ja ->vm_growstack_f
+ | movzx RAd, byte [PC-4+PC2PROTO(numparams)]
+ | cmp NARGS:RDd, RAd // Check for missing parameters.
+ | jbe >3
+ |2:
+ if (op == BC_JFUNCF) {
+ | movzx RDd, PC_RD
+ | jmp =>BC_JLOOP
+ } else {
+ | ins_next
+ }
+ |
+ |3: // Clear missing parameters.
+ | mov aword [BASE+NARGS:RD*8-8], LJ_TNIL
+ | add NARGS:RDd, 1
+ | cmp NARGS:RDd, RAd
+ | jbe <3
+ | jmp <2
+ break;
+
+ case BC_JFUNCV:
+#if !LJ_HASJIT
+ break;
+#endif
+ | int3 // NYI: compiled vararg functions
+ break; /* NYI: compiled vararg functions. */
+
+ case BC_IFUNCV:
+ | ins_AD // BASE = new base, RA = framesize, RD = nargs+1
+ | lea RBd, [NARGS:RD*8+FRAME_VARG+8]
+ | lea RD, [BASE+NARGS:RD*8+8]
+ | mov LFUNC:KBASE, [BASE-16]
+ | mov [RD-8], RB // Store delta + FRAME_VARG.
+ | mov [RD-16], LFUNC:KBASE // Store copy of LFUNC.
+ | mov L:RB, SAVE_L
+ | lea RA, [RD+RA*8]
+ | cmp RA, L:RB->maxstack
+ | ja ->vm_growstack_v // Need to grow stack.
+ | mov RA, BASE
+ | mov BASE, RD
+ | movzx RBd, byte [PC-4+PC2PROTO(numparams)]
+ | test RBd, RBd
+ | jz >2
+ | add RA, 8
+ |1: // Copy fixarg slots up to new frame.
+ | add RA, 8
+ | cmp RA, BASE
+ | jnb >3 // Less args than parameters?
+ | mov KBASE, [RA-16]
+ | mov [RD], KBASE
+ | add RD, 8
+ | mov aword [RA-16], LJ_TNIL // Clear old fixarg slot (help the GC).
+ | sub RBd, 1
+ | jnz <1
+ |2:
+ if (op == BC_JFUNCV) {
+ | movzx RDd, PC_RD
+ | jmp =>BC_JLOOP
+ } else {
+ | mov KBASE, [PC-4+PC2PROTO(k)]
+ | ins_next
+ }
+ |
+ |3: // Clear missing parameters.
+ | mov aword [RD], LJ_TNIL
+ | add RD, 8
+ | sub RBd, 1
+ | jnz <3
+ | jmp <2
+ break;
+
+ case BC_FUNCC:
+ case BC_FUNCCW:
+ | ins_AD // BASE = new base, RA = ins RA|RD (unused), RD = nargs+1
+ | mov CFUNC:RB, [BASE-16]
+ | cleartp CFUNC:RB
+ | mov KBASE, CFUNC:RB->f
+ | mov L:RB, SAVE_L
+ | lea RD, [BASE+NARGS:RD*8-8]
+ | mov L:RB->base, BASE
+ | lea RA, [RD+8*LUA_MINSTACK]
+ | cmp RA, L:RB->maxstack
+ | mov L:RB->top, RD
+ if (op == BC_FUNCC) {
+ | mov CARG1, L:RB // Caveat: CARG1 may be RA.
+ } else {
+ | mov CARG2, KBASE
+ | mov CARG1, L:RB // Caveat: CARG1 may be RA.
+ }
+ | ja ->vm_growstack_c // Need to grow stack.
+ | set_vmstate C
+ if (op == BC_FUNCC) {
+ | call KBASE // (lua_State *L)
+ } else {
+ | // (lua_State *L, lua_CFunction f)
+ | call aword [DISPATCH+DISPATCH_GL(wrapf)]
+ }
+ | // nresults returned in eax (RD).
+ | mov BASE, L:RB->base
+ | mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
+ | set_vmstate INTERP
+ | lea RA, [BASE+RD*8]
+ | neg RA
+ | add RA, L:RB->top // RA = (L->top-(L->base+nresults))*8
+ | mov PC, [BASE-8] // Fetch PC of caller.
+ | jmp ->vm_returnc
+ break;
+
+ /* ---------------------------------------------------------------------- */
+
+ default:
+ fprintf(stderr, "Error: undefined opcode BC_%s\n", bc_names[op]);
+ exit(2);
+ break;
+ }
+}
+
+static int build_backend(BuildCtx *ctx)
+{
+ int op;
+ dasm_growpc(Dst, BC__MAX);
+ build_subroutines(ctx);
+ |.code_op
+ for (op = 0; op < BC__MAX; op++)
+ build_ins(ctx, (BCOp)op, op);
+ return BC__MAX;
+}
+
+/* Emit pseudo frame-info for all assembler functions. */
+static void emit_asm_debug(BuildCtx *ctx)
+{
+ int fcofs = (int)((uint8_t *)ctx->glob[GLOB_vm_ffi_call] - ctx->code);
+ switch (ctx->mode) {
+ case BUILD_elfasm:
+ fprintf(ctx->fp, "\t.section .debug_frame,\"\",@progbits\n");
+ fprintf(ctx->fp,
+ ".Lframe0:\n"
+ "\t.long .LECIE0-.LSCIE0\n"
+ ".LSCIE0:\n"
+ "\t.long 0xffffffff\n"
+ "\t.byte 0x1\n"
+ "\t.string \"\"\n"
+ "\t.uleb128 0x1\n"
+ "\t.sleb128 -8\n"
+ "\t.byte 0x10\n"
+ "\t.byte 0xc\n\t.uleb128 0x7\n\t.uleb128 8\n"
+ "\t.byte 0x80+0x10\n\t.uleb128 0x1\n"
+ "\t.align 8\n"
+ ".LECIE0:\n\n");
+ fprintf(ctx->fp,
+ ".LSFDE0:\n"
+ "\t.long .LEFDE0-.LASFDE0\n"
+ ".LASFDE0:\n"
+ "\t.long .Lframe0\n"
+ "\t.quad .Lbegin\n"
+ "\t.quad %d\n"
+ "\t.byte 0xe\n\t.uleb128 %d\n" /* def_cfa_offset */
+ "\t.byte 0x86\n\t.uleb128 0x2\n" /* offset rbp */
+ "\t.byte 0x83\n\t.uleb128 0x3\n" /* offset rbx */
+ "\t.byte 0x8f\n\t.uleb128 0x4\n" /* offset r15 */
+ "\t.byte 0x8e\n\t.uleb128 0x5\n" /* offset r14 */
+#if LJ_NO_UNWIND
+ "\t.byte 0x8d\n\t.uleb128 0x6\n" /* offset r13 */
+ "\t.byte 0x8c\n\t.uleb128 0x7\n" /* offset r12 */
+#endif
+ "\t.align 8\n"
+ ".LEFDE0:\n\n", fcofs, CFRAME_SIZE);
+#if LJ_HASFFI
+ fprintf(ctx->fp,
+ ".LSFDE1:\n"
+ "\t.long .LEFDE1-.LASFDE1\n"
+ ".LASFDE1:\n"
+ "\t.long .Lframe0\n"
+ "\t.quad lj_vm_ffi_call\n"
+ "\t.quad %d\n"
+ "\t.byte 0xe\n\t.uleb128 16\n" /* def_cfa_offset */
+ "\t.byte 0x86\n\t.uleb128 0x2\n" /* offset rbp */
+ "\t.byte 0xd\n\t.uleb128 0x6\n" /* def_cfa_register rbp */
+ "\t.byte 0x83\n\t.uleb128 0x3\n" /* offset rbx */
+ "\t.align 8\n"
+ ".LEFDE1:\n\n", (int)ctx->codesz - fcofs);
+#endif
+#if !LJ_NO_UNWIND
+#if (defined(__sun__) && defined(__svr4__))
+ fprintf(ctx->fp, "\t.section .eh_frame,\"a\",@unwind\n");
+#else
+ fprintf(ctx->fp, "\t.section .eh_frame,\"a\",@progbits\n");
+#endif
+ fprintf(ctx->fp,
+ ".Lframe1:\n"
+ "\t.long .LECIE1-.LSCIE1\n"
+ ".LSCIE1:\n"
+ "\t.long 0\n"
+ "\t.byte 0x1\n"
+ "\t.string \"zPR\"\n"
+ "\t.uleb128 0x1\n"
+ "\t.sleb128 -8\n"
+ "\t.byte 0x10\n"
+ "\t.uleb128 6\n" /* augmentation length */
+ "\t.byte 0x1b\n" /* pcrel|sdata4 */
+ "\t.long lj_err_unwind_dwarf-.\n"
+ "\t.byte 0x1b\n" /* pcrel|sdata4 */
+ "\t.byte 0xc\n\t.uleb128 0x7\n\t.uleb128 8\n"
+ "\t.byte 0x80+0x10\n\t.uleb128 0x1\n"
+ "\t.align 8\n"
+ ".LECIE1:\n\n");
+ fprintf(ctx->fp,
+ ".LSFDE2:\n"
+ "\t.long .LEFDE2-.LASFDE2\n"
+ ".LASFDE2:\n"
+ "\t.long .LASFDE2-.Lframe1\n"
+ "\t.long .Lbegin-.\n"
+ "\t.long %d\n"
+ "\t.uleb128 0\n" /* augmentation length */
+ "\t.byte 0xe\n\t.uleb128 %d\n" /* def_cfa_offset */
+ "\t.byte 0x86\n\t.uleb128 0x2\n" /* offset rbp */
+ "\t.byte 0x83\n\t.uleb128 0x3\n" /* offset rbx */
+ "\t.byte 0x8f\n\t.uleb128 0x4\n" /* offset r15 */
+ "\t.byte 0x8e\n\t.uleb128 0x5\n" /* offset r14 */
+ "\t.align 8\n"
+ ".LEFDE2:\n\n", fcofs, CFRAME_SIZE);
+#if LJ_HASFFI
+ fprintf(ctx->fp,
+ ".Lframe2:\n"
+ "\t.long .LECIE2-.LSCIE2\n"
+ ".LSCIE2:\n"
+ "\t.long 0\n"
+ "\t.byte 0x1\n"
+ "\t.string \"zR\"\n"
+ "\t.uleb128 0x1\n"
+ "\t.sleb128 -8\n"
+ "\t.byte 0x10\n"
+ "\t.uleb128 1\n" /* augmentation length */
+ "\t.byte 0x1b\n" /* pcrel|sdata4 */
+ "\t.byte 0xc\n\t.uleb128 0x7\n\t.uleb128 8\n"
+ "\t.byte 0x80+0x10\n\t.uleb128 0x1\n"
+ "\t.align 8\n"
+ ".LECIE2:\n\n");
+ fprintf(ctx->fp,
+ ".LSFDE3:\n"
+ "\t.long .LEFDE3-.LASFDE3\n"
+ ".LASFDE3:\n"
+ "\t.long .LASFDE3-.Lframe2\n"
+ "\t.long lj_vm_ffi_call-.\n"
+ "\t.long %d\n"
+ "\t.uleb128 0\n" /* augmentation length */
+ "\t.byte 0xe\n\t.uleb128 16\n" /* def_cfa_offset */
+ "\t.byte 0x86\n\t.uleb128 0x2\n" /* offset rbp */
+ "\t.byte 0xd\n\t.uleb128 0x6\n" /* def_cfa_register rbp */
+ "\t.byte 0x83\n\t.uleb128 0x3\n" /* offset rbx */
+ "\t.align 8\n"
+ ".LEFDE3:\n\n", (int)ctx->codesz - fcofs);
+#endif
+#endif
+ break;
+#if !LJ_NO_UNWIND
+ /* Mental note: never let Apple design an assembler.
+ ** Or a linker. Or a plastic case. But I digress.
+ */
+ case BUILD_machasm: {
+#if LJ_HASFFI
+ int fcsize = 0;
+#endif
+ int i;
+ fprintf(ctx->fp, "\t.section __TEXT,__eh_frame,coalesced,no_toc+strip_static_syms+live_support\n");
+ fprintf(ctx->fp,
+ "EH_frame1:\n"
+ "\t.set L$set$x,LECIEX-LSCIEX\n"
+ "\t.long L$set$x\n"
+ "LSCIEX:\n"
+ "\t.long 0\n"
+ "\t.byte 0x1\n"
+ "\t.ascii \"zPR\\0\"\n"
+ "\t.byte 0x1\n"
+ "\t.byte 128-8\n"
+ "\t.byte 0x10\n"
+ "\t.byte 6\n" /* augmentation length */
+ "\t.byte 0x9b\n" /* indirect|pcrel|sdata4 */
+ "\t.long _lj_err_unwind_dwarf+4@GOTPCREL\n"
+ "\t.byte 0x1b\n" /* pcrel|sdata4 */
+ "\t.byte 0xc\n\t.byte 0x7\n\t.byte 8\n"
+ "\t.byte 0x80+0x10\n\t.byte 0x1\n"
+ "\t.align 3\n"
+ "LECIEX:\n\n");
+ for (i = 0; i < ctx->nsym; i++) {
+ const char *name = ctx->sym[i].name;
+ int32_t size = ctx->sym[i+1].ofs - ctx->sym[i].ofs;
+ if (size == 0) continue;
+#if LJ_HASFFI
+ if (!strcmp(name, "_lj_vm_ffi_call")) { fcsize = size; continue; }
+#endif
+ fprintf(ctx->fp,
+ "%s.eh:\n"
+ "LSFDE%d:\n"
+ "\t.set L$set$%d,LEFDE%d-LASFDE%d\n"
+ "\t.long L$set$%d\n"
+ "LASFDE%d:\n"
+ "\t.long LASFDE%d-EH_frame1\n"
+ "\t.long %s-.\n"
+ "\t.long %d\n"
+ "\t.byte 0\n" /* augmentation length */
+ "\t.byte 0xe\n\t.byte %d\n" /* def_cfa_offset */
+ "\t.byte 0x86\n\t.byte 0x2\n" /* offset rbp */
+ "\t.byte 0x83\n\t.byte 0x3\n" /* offset rbx */
+ "\t.byte 0x8f\n\t.byte 0x4\n" /* offset r15 */
+ "\t.byte 0x8e\n\t.byte 0x5\n" /* offset r14 */
+ "\t.align 3\n"
+ "LEFDE%d:\n\n",
+ name, i, i, i, i, i, i, i, name, size, CFRAME_SIZE, i);
+ }
+#if LJ_HASFFI
+ if (fcsize) {
+ fprintf(ctx->fp,
+ "EH_frame2:\n"
+ "\t.set L$set$y,LECIEY-LSCIEY\n"
+ "\t.long L$set$y\n"
+ "LSCIEY:\n"
+ "\t.long 0\n"
+ "\t.byte 0x1\n"
+ "\t.ascii \"zR\\0\"\n"
+ "\t.byte 0x1\n"
+ "\t.byte 128-8\n"
+ "\t.byte 0x10\n"
+ "\t.byte 1\n" /* augmentation length */
+ "\t.byte 0x1b\n" /* pcrel|sdata4 */
+ "\t.byte 0xc\n\t.byte 0x7\n\t.byte 8\n"
+ "\t.byte 0x80+0x10\n\t.byte 0x1\n"
+ "\t.align 3\n"
+ "LECIEY:\n\n");
+ fprintf(ctx->fp,
+ "_lj_vm_ffi_call.eh:\n"
+ "LSFDEY:\n"
+ "\t.set L$set$yy,LEFDEY-LASFDEY\n"
+ "\t.long L$set$yy\n"
+ "LASFDEY:\n"
+ "\t.long LASFDEY-EH_frame2\n"
+ "\t.long _lj_vm_ffi_call-.\n"
+ "\t.long %d\n"
+ "\t.byte 0\n" /* augmentation length */
+ "\t.byte 0xe\n\t.byte 16\n" /* def_cfa_offset */
+ "\t.byte 0x86\n\t.byte 0x2\n" /* offset rbp */
+ "\t.byte 0xd\n\t.byte 0x6\n" /* def_cfa_register rbp */
+ "\t.byte 0x83\n\t.byte 0x3\n" /* offset rbx */
+ "\t.align 3\n"
+ "LEFDEY:\n\n", fcsize);
+ }
+#endif
+ fprintf(ctx->fp, ".subsections_via_symbols\n");
+ }
+ break;
+#endif
+ default: /* Difficult for other modes. */
+ break;
+ }
+}
+