From: Mike Pall Date: Sun, 20 Aug 2023 19:33:37 +0000 (+0200) Subject: Merge branch 'master' into v2.1 X-Git-Tag: v2.1.ROLLING~2 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=ef587afb2cd7267c0defd04aa642593b76a6b23d;p=thirdparty%2FLuaJIT.git Merge branch 'master' into v2.1 --- ef587afb2cd7267c0defd04aa642593b76a6b23d diff --cc doc/contact.html index 6d609286,7b8cd404..478c4bff --- a/doc/contact.html +++ b/doc/contact.html @@@ -2,8 -2,8 +2,8 @@@ Contact - + - + diff --cc doc/ext_buffer.html index 1c646a71,00000000..35f01c9a mode 100644,000000..100644 --- a/doc/ext_buffer.html +++ b/doc/ext_buffer.html @@@ -1,689 -1,0 +1,689 @@@ + + + +String Buffer Library + - ++ + + + + + + +
+Lua +
+ + +
+

+The string buffer library allows high-performance manipulation of +string-like data. +

+

+Unlike Lua strings, which are constants, string buffers are +mutable sequences of 8-bit (binary-transparent) characters. Data +can be stored, formatted and encoded into a string buffer and later +converted, extracted or decoded. +

+

+The convenient string buffer API simplifies common string manipulation +tasks that would otherwise require creating many intermediate strings. +String buffers improve performance by eliminating redundant memory +copies, object creation, string interning and garbage collection +overhead. In conjunction with the FFI library, they allow zero-copy +operations.

+

+The string buffer library also includes a high-performance +serializer for Lua objects. +

+ +

Using the String Buffer Library

+

+The string buffer library is built into LuaJIT by default, but it's not +loaded by default. Add this to the start of every Lua file that needs +one of its functions: +

+
 +local buffer = require("string.buffer")
 +
+

+The convention for the syntax shown on this page is that buffer +refers to the buffer library and buf refers to an individual +buffer object. +

+

+Please note the difference between a Lua function call, e.g. +buffer.new() (with a dot) and a Lua method call, e.g. +buf:reset() (with a colon). +

+ +

Buffer Objects

+

+A buffer object is a garbage-collected Lua object. After creation with +buffer.new(), it can (and should) be reused for many operations. +When the last reference to a buffer object is gone, it will eventually +be freed by the garbage collector, along with the allocated buffer +space. +

+

+Buffers operate like a FIFO (first-in first-out) data structure. Data +can be appended (written) to the end of the buffer and consumed (read) +from the front of the buffer. These operations may be freely mixed. +
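A minimal sketch of the FIFO behavior, assuming LuaJIT 2.1 with the string buffer library: data is appended at the end, consumed from the front, and the two kinds of operations are interleaved freely.

```lua
local buffer = require("string.buffer")

local buf = buffer.new()
buf:put("abc", "def")      -- append to the end: buffer holds "abcdef"
local head = buf:get(2)    -- consume 2 bytes from the front: "ab"
buf:put("gh")              -- append again: buffer now holds "cdefgh"
local rest = buf:get()     -- consume everything that is left: "cdefgh"
```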

+

+The buffer space that holds the characters is managed automatically: +it grows as needed and already consumed space is recycled. Use +buffer.new(size) and buf:free() if you need more +control.

+

+The maximum size of a single buffer is the same as the maximum size of a +Lua string, which is slightly below two gigabytes. For huge data sizes, +neither strings nor buffers are the right data structure — use the +FFI library to directly map memory or files up to the virtual memory +limit of your OS. +

+ +

Buffer Method Overview

+ + +

Buffer Creation and Management

+ +

local buf = buffer.new([size [,options]])
+local buf = buffer.new([options])

+

+Creates a new buffer object. +

+

+The optional size argument ensures a minimum initial buffer +size. This is strictly an optimization when the required buffer size is +known beforehand. The buffer space will grow as needed, in any case. +

+

+The optional table options sets various +serialization options. +

+ +

buf = buf:reset()

+

+Reset (empty) the buffer. The allocated buffer space is not freed and +may be reused. +

+ +

buf = buf:free()

+

+The buffer space of the buffer object is freed. The object itself +remains intact, empty and may be reused. +

+

+Note: you normally don't need to use this method. The garbage collector +automatically frees the buffer space when the buffer object is +collected. Use this method if you need to free the associated memory +immediately.
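A sketch of the reuse pattern described above, assuming LuaJIT 2.1: one buffer is created once (with an optional size hint), emptied with buf:reset() between uses so its space is recycled, and released early with buf:free(). The "item" strings are placeholder data.

```lua
local buffer = require("string.buffer")

local buf = buffer.new(256)  -- optional size hint; the buffer still grows as needed
local lines = {}
for i = 1, 3 do
  buf:reset()                -- empty the buffer but keep its allocated space
  buf:put("item ", i)
  lines[i] = buf:tostring()
end

buf:free()                   -- release the space now instead of waiting for the GC
buf:put("again")             -- the object itself is still usable after free()
```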

+ +

Buffer Writers

+ +

buf = buf:put([str|num|obj] [,…])

+

+Appends a string str, a number num or any object +obj with a __tostring metamethod to the buffer. +Multiple arguments are appended in the given order. +

+

+Appending a buffer to a buffer is possible and short-circuited +internally, but it still involves a copy. It's better to combine the +buffer writes into a single buffer.
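For illustration, strings, numbers and an object with a __tostring metamethod can be appended in one call (assuming LuaJIT 2.1). The Point type below is a made-up example, not part of the library.

```lua
local buffer = require("string.buffer")

-- Hypothetical object type with a __tostring metamethod.
local Point = {}
Point.__tostring = function(p) return "(" .. p.x .. "," .. p.y .. ")" end
local p = setmetatable({ x = 1, y = 2 }, Point)

local buf = buffer.new()
buf:put("p=", p, " n=", 42)  -- arguments are appended in the given order
local s = buf:get()          -- "p=(1,2) n=42"
```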

+ +

buf = buf:putf(format, …)

+

+Appends the formatted arguments to the buffer. The format +string supports the same options as string.format(). +
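A short sketch of buf:putf(), assuming LuaJIT 2.1. Since the writer methods return the buffer, calls can be chained; the result matches what string.format() produces for the same format.

```lua
local buffer = require("string.buffer")

local buf = buffer.new()
-- Each putf() returns buf, so the calls chain.
buf:putf("%s=%d;", "a", 1):putf("%5.2f", 3.14159)
local s = buf:get()  -- "a=1; 3.14" (%5.2f pads "3.14" to a width of 5)
```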

+ +

buf = buf:putcdata(cdata, len)FFI

+

+Appends the given len number of bytes from the memory pointed +to by the FFI cdata object to the buffer. The object needs to +be convertible to a (constant) pointer. +
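A small FFI sketch, assuming LuaJIT's built-in ffi module: the bytes of a C array are appended directly, without first turning them into a Lua string.

```lua
local ffi = require("ffi")
local buffer = require("string.buffer")

local arr = ffi.new("uint8_t[3]", { 65, 66, 67 })  -- the bytes of "ABC"
local buf = buffer.new()
buf:putcdata(arr, 3)  -- the array converts to a (constant) pointer
local s = buf:get()   -- "ABC"
```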

+ +

buf = buf:set(str)
+buf = buf:set(cdata, len)
FFI

+

+This method allows zero-copy consumption of a string or an FFI cdata +object as a buffer. It stores a reference to the passed string +str or the FFI cdata object in the buffer. Any buffer +space originally allocated is freed. This is not an append +operation, unlike the buf:put*() methods. +

+

+After calling this method, the buffer behaves as if +buf:free():put(str) or buf:free():putcdata(cdata, len) +had been called. However, the data is only referenced and not copied, as +long as the buffer is only consumed.

+

+In case the buffer is written to later on, the referenced data is copied +and the object reference is removed (copy-on-write semantics). +

+

+The stored reference is an anchor for the garbage collector and keeps the +originally passed string or FFI cdata object alive. +
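The following sketch illustrates the copy-on-write behavior, assuming LuaJIT 2.1: consuming part of a referenced string does not copy it, while the first write copies the remaining data into the buffer's own space.

```lua
local buffer = require("string.buffer")

local msg = "header:payload"
local buf = buffer.new()
buf:set(msg)             -- reference msg without copying; the buffer keeps it alive
local head = buf:get(7)  -- "header:" (consuming does not copy the rest)
buf:put("!")             -- first write: remaining data is copied, reference dropped
local rest = buf:get()   -- "payload!"
```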

+ +

ptr, len = buf:reserve(size)FFI
+buf = buf:commit(used)FFI

+

+The reserve method reserves at least size bytes of +write space in the buffer. It returns a uint8_t * FFI +cdata pointer ptr that points to this space.

+

+The available length in bytes is returned in len. This is at +least size bytes, but may be more to facilitate efficient +buffer growth. You can either make use of the additional space or ignore +len and only use size bytes. +

+

+The commit method appends the used bytes of the +previously returned write space to the buffer data. +

+

+This pair of methods allows zero-copy use of C read-style APIs: +

+
 +local MIN_SIZE = 65536
 +repeat
 +  local ptr, len = buf:reserve(MIN_SIZE)
 +  local n = C.read(fd, ptr, len)
 +  if n == 0 then break end -- EOF.
 +  if n < 0 then error("read error") end
 +  buf:commit(n)
 +until false
 +
+

+The reserved write space is not initialized. At least the +used bytes must be written to before calling the +commit method. There's no need to call the commit +method if nothing is added to the buffer (e.g. on error).
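A self-contained variant of the reserve/commit pattern, assuming LuaJIT 2.1. Here ffi.copy() stands in for a C read-style API so that no file descriptor or C declaration is needed; that substitution is purely for illustration.

```lua
local ffi = require("ffi")
local buffer = require("string.buffer")

local buf = buffer.new()
local src = "0123456789"

-- Reserve at least #src bytes; len may be larger than requested.
local ptr, len = buf:reserve(#src)
assert(len >= #src)

-- Stand-in for a C read(): fill the start of the reserved space.
ffi.copy(ptr, src, #src)

-- Only now do the written bytes become part of the buffer data.
buf:commit(#src)
local s = buf:get()  -- "0123456789"
```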

+ +

Buffer Readers

+ +

len = #buf

+

+Returns the current length of the buffer data in bytes. +

+ +

res = str|num|buf .. str|num|buf […]

+

+The Lua concatenation operator .. also accepts buffers, just +like strings or numbers. It always returns a string and not a buffer. +

+

+Note that although this is supported for convenience, this thwarts one +of the main reasons to use buffers, which is to avoid string +allocations. Rewrite it with buf:put() and buf:get(). +

+

+Mixing this with unrelated objects that have a __concat +metamethod may not work, since these probably only expect strings. +
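A sketch contrasting the two approaches, assuming LuaJIT 2.1: the concatenation returns a fresh string and leaves the buffer untouched, while the put()/get() version produces the same result by appending and then consuming.

```lua
local buffer = require("string.buffer")

local buf = buffer.new():put("x=", 1)

-- Convenient: the result is a new string and buf is not consumed.
local s1 = buf .. ";" .. 2        -- "x=1;2"

-- Same result without intermediate strings: append, then consume.
local s2 = buf:put(";", 2):get()  -- "x=1;2", buf is now empty
```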

+ +

buf = buf:skip(len)

+

+Skips (consumes) len bytes from the buffer up to the current +length of the buffer data. +

+ +

str, … = buf:get([len|nil] [,…])

+

+Consumes the buffer data and returns one or more strings. If called +without arguments, the whole buffer data is consumed. If called with a +number, up to len bytes are consumed. A nil argument +consumes the remaining buffer data (this only makes sense as the last +argument). Multiple arguments consume the buffer data in the given +order.

+

+Note: a zero length or no remaining buffer data returns an empty string +and not nil. +
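For example, a hypothetical request line can be split with a single get() call, using byte counts for the fixed-size parts and nil for the remainder (assuming LuaJIT 2.1):

```lua
local buffer = require("string.buffer")

local buf = buffer.new():put("GET /index.html")
-- 3 bytes, then 1 byte, then (nil) everything that is left.
local method, sp, path = buf:get(3, 1, nil)
local empty = buf:get()  -- "" rather than nil: no data is left
```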

+ +

str = buf:tostring()
+str = tostring(buf)

+

+Creates a string from the buffer data, but doesn't consume it. The +buffer remains unchanged. +

+

+Buffer objects also define a __tostring metamethod. This means +buffers can be passed to the global tostring() function and +many other functions that accept objects with a __tostring metamethod in place of strings. The important +internal uses in functions like io.write() are short-circuited +to avoid the creation of an intermediate string object.

+ +

ptr, len = buf:ref()FFI

+

+Returns a uint8_t * FFI cdata pointer ptr that +points to the buffer data. The length of the buffer data in bytes is +returned in len.

+

+The returned pointer can be directly passed to C functions that expect a +buffer and a length. You can also do bytewise reads +(local x = ptr[i]) or writes +(ptr[i] = 0x40) of the buffer data. +

+

+In conjunction with the skip method, this allows zero-copy use +of C write-style APIs: +

+
 +repeat
 +  local ptr, len = buf:ref()
 +  if len == 0 then break end
 +  local n = C.write(fd, ptr, len)
 +  if n < 0 then error("write error") end
 +  buf:skip(n)
 +until n >= len
 +
+

+Unlike Lua strings, buffer data is not implicitly +zero-terminated. It's not safe to pass ptr to C functions that +expect zero-terminated strings. If you're not using len, then +you're doing something wrong. +
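A sketch of direct byte access through buf:ref(), assuming LuaJIT 2.1 with the FFI library. ffi.string() is given the explicit length, so no zero terminator is needed, and buf:skip() then consumes the bytes that were processed.

```lua
local ffi = require("ffi")
local buffer = require("string.buffer")

local buf = buffer.new():put("hello")
local ptr, len = buf:ref()      -- uint8_t * and length in bytes
local first = ptr[0]            -- 104, the byte value of "h"
ptr[0] = 0x48                   -- in-place write: "H"
local s = ffi.string(ptr, len)  -- copies exactly len bytes
buf:skip(len)                   -- consume what was read
```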

+ +

Serialization of Lua Objects

+

+The following functions and methods allow high-speed serialization +(encoding) of a Lua object into a string and decoding it back to a Lua +object. This allows convenient storage and transport of structured +data. +

+

+The encoded data is in an internal binary +format. The data can be stored in files, binary-transparent +databases or transmitted to other LuaJIT instances across threads, +processes or networks. +

+

+Encoding speed can reach up to 1 Gigabyte/second on a modern desktop- or +server-class system, even when serializing many small objects. Decoding +speed is mostly constrained by object creation cost. +

+

+The serializer handles most Lua types, common FFI number types and +nested structures. Functions, thread objects, other FFI cdata and full +userdata cannot be serialized (yet). +

+

+The encoder serializes nested structures as trees. Multiple references +to a single object will be stored separately and create distinct objects +after decoding. Circular references cause an error. +
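A short demonstration of both points, assuming LuaJIT 2.1: a table referenced twice comes back as two separate tables, and a self-referencing table makes the encoder raise an error, caught here with pcall().

```lua
local buffer = require("string.buffer")

local shared = { 1, 2 }
local t = { a = shared, b = shared }
local copy = buffer.decode(buffer.encode(t))
-- copy.a and copy.b are now two distinct tables with equal contents.

local cyclic = {}
cyclic.self = cyclic
local ok = pcall(buffer.encode, cyclic)  -- false: circular reference
```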

+ +

Serialization Functions and Methods

+ +

str = buffer.encode(obj)
+buf = buf:encode(obj)

+

+Serializes (encodes) the Lua object obj. The stand-alone +function returns a string str. The buffer method appends the +encoding to the buffer. +

+

+obj can be any of the supported Lua types — it doesn't +need to be a Lua table. +

+

+This function may throw an error when attempting to serialize +unsupported object types, circular references or deeply nested tables. +
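A sketch, assuming LuaJIT 2.1, showing that scalars round-trip on their own and that an unsupported type (a function, here the global print) makes encoding fail:

```lua
local buffer = require("string.buffer")

-- Any supported type can be the top-level object, not just tables.
local back = buffer.decode(buffer.encode("text"))  -- "text"
local n = buffer.decode(buffer.encode(nil))        -- nil

-- Functions cannot be serialized: catch the error instead of failing hard.
local ok, err = pcall(buffer.encode, print)
```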

+ +

obj = buffer.decode(str)
+obj = buf:decode()

+

+The stand-alone function deserializes (decodes) the string +str, the buffer method deserializes one object from the +buffer. Both return a Lua object obj. +

+

+The returned object may be any of the supported Lua types — +even nil. +

+

+This function may throw an error when fed with malformed or incomplete +encoded data. The stand-alone function throws when there's left-over +data after decoding a single top-level object. The buffer method leaves +any left-over data in the buffer. +

+

+Attempting to deserialize an FFI type will throw an error if the FFI +library is not built-in or has not been loaded yet.
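The difference between the stand-alone function and the buffer method can be seen with two encodings placed back to back (a sketch, assuming LuaJIT 2.1):

```lua
local buffer = require("string.buffer")

local two = buffer.encode(1) .. buffer.encode(2)  -- two top-level objects

-- The stand-alone function rejects left-over data.
local ok = pcall(buffer.decode, two)              -- false

-- The buffer method decodes one object and keeps the rest.
local buf = buffer.new()
buf:set(two)
local a = buf:decode()  -- 1
local b = buf:decode()  -- 2

-- Truncated input is incomplete and also raises an error.
local ok2 = pcall(buffer.decode, two:sub(1, 1))
```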

+ +

Serialization Options

+

+The options table passed to buffer.new() may contain +the following members (all optional): +

+ +

+dict needs to be an array of strings and metatable needs +to be an array of tables. Both must start at index 1 and must not have holes (no +nil in between). The tables are anchored in the buffer object and +internally modified into a two-way index (don't do this yourself, just pass +a plain array). The tables must not be modified after they have been passed +to buffer.new().

+

+The dict and metatable tables used by the encoder and +decoder must be the same. Put the most common entries at the front. Extend +at the end to ensure backwards-compatibility — older encodings can +then still be read. You may also set some indexes to false to +explicitly drop backwards-compatibility. Old encodings that use these +indexes will throw an error when decoded. +

+

+Metatables that are not found in the metatable dictionary are +ignored when encoding. Decoding returns a table with a nil +metatable. +
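A sketch of the dict and metatable options, assuming LuaJIT 2.1. The Point class and its len2() method are hypothetical; a metatable listed in the metatable array is restored on decoding, while an unlisted one is dropped. A single buffer serves both directions here because buf:set() is never called.

```lua
local buffer = require("string.buffer")

-- Hypothetical class used only for this illustration.
local Point = {}
Point.__index = Point
function Point:len2() return self.x * self.x + self.y * self.y end

local buf = buffer.new({
  dict = { "x", "y" },    -- common string keys, stored as dictionary indexes
  metatable = { Point },  -- metatables that survive a round trip
})

local q = buf:encode(setmetatable({ x = 3, y = 4 }, Point)):decode()
-- getmetatable(q) == Point, so q:len2() == 25

local r = buf:encode(setmetatable({}, {})):decode()
-- the unlisted metatable is dropped: getmetatable(r) == nil
```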

+

+Note: parsing and preparation of the options table is somewhat +expensive. Create a buffer object only once and recycle it for multiple +uses. Avoid mixing encoder and decoder buffers, since the +buf:set() method frees the already allocated buffer space: +

+
 +local options = {
 +  dict = { "commonly", "used", "string", "keys" },
 +}
 +local buf_enc = buffer.new(options)
 +local buf_dec = buffer.new(options)
 +
 +local function encode(obj)
 +  return buf_enc:reset():encode(obj):get()
 +end
 +
 +local function decode(str)
 +  return buf_dec:set(str):decode()
 +end
 +
+ +

Streaming Serialization

+

+In some contexts, it's desirable to do piecewise serialization of large +datasets, also known as streaming. +

+

+This serialization format can be safely concatenated and supports streaming. +Multiple encodings can simply be appended to a buffer and later decoded +individually: +

+
 +local buf = buffer.new()
 +buf:encode(obj1)
 +buf:encode(obj2)
 +local copy1 = buf:decode()
 +local copy2 = buf:decode()
 +
+

+Here's how to iterate over a stream: +

+
 +while #buf ~= 0 do
 +  local obj = buf:decode()
 +  -- Do something with obj.
 +end
 +
+

+Since the serialization format doesn't prepend a length to its encoding, +network applications may need to transmit the length, too. +
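One possible framing scheme, sketched under the assumption that both peers agree on a 4-byte little-endian length prefix (LuaJIT 2.1 assumed for the encoder). The frame() and unframe() helpers are hypothetical and not part of the library.

```lua
local buffer = require("string.buffer")

-- Hypothetical helper: prefix an encoding with its length (32-bit little-endian).
local function frame(obj)
  local payload = buffer.encode(obj)
  local n = #payload
  return string.char(n % 256, math.floor(n / 256) % 256,
                     math.floor(n / 65536) % 256,
                     math.floor(n / 16777216) % 256) .. payload
end

-- Hypothetical helper: decode one frame, return the object and the remaining bytes.
local function unframe(s)
  local b1, b2, b3, b4 = s:byte(1, 4)
  local n = b1 + b2 * 256 + b3 * 65536 + b4 * 16777216
  return buffer.decode(s:sub(5, 4 + n)), s:sub(5 + n)
end

local obj, rest = unframe(frame({ 1, 2, 3 }) .. "next frame")
```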

+ +

Serialization Format Specification

+

+This serialization format is designed for internal use by LuaJIT +applications. Serialized data is upwards-compatible and portable across +all supported LuaJIT platforms. +

+

+It's an 8-bit binary format and not human-readable. It uses e.g. +embedded zeroes and stores embedded Lua string objects unmodified, which +are 8-bit-clean, too. Encoded data can be safely concatenated for +streaming and later decoded one top-level object at a time. +

+

+The encoding is reasonably compact, but tuned for maximum performance, +not for minimum space usage. It compresses well with any of the common +byte-oriented data compression algorithms. +

+

+Although documented here for reference, this format is explicitly +not intended to be a 'public standard' for structured data +interchange across computer languages (like JSON or MessagePack). Please +do not use it as such. +

+

+The specification is given below as a context-free grammar with a +top-level object as the starting point. Alternatives are +separated by the | symbol and * indicates repeats. +Grouping is implicit or indicated by {…}. Terminals are +either plain hex numbers, encoded as bytes, or have a .format +suffix. +

+
 +object    → nil | false | true
 +          | null | lightud32 | lightud64
 +          | int | num | tab | tab_mt
 +          | int64 | uint64 | complex
 +          | string
 +
 +nil       → 0x00
 +false     → 0x01
 +true      → 0x02
 +
 +null      → 0x03                            // NULL lightuserdata
 +lightud32 → 0x04 data.I                   // 32 bit lightuserdata
 +lightud64 → 0x05 data.L                   // 64 bit lightuserdata
 +
 +int       → 0x06 int.I                                 // int32_t
 +num       → 0x07 double.L
 +
 +tab       → 0x08                                   // Empty table
 +          | 0x09 h.U h*{object object}          // Key/value hash
 +          | 0x0a a.U a*object                    // 0-based array
 +          | 0x0b a.U a*object h.U h*{object object}      // Mixed
 +          | 0x0c a.U (a-1)*object                // 1-based array
 +          | 0x0d a.U (a-1)*object h.U h*{object object}  // Mixed
 +tab_mt    → 0x0e (index-1).U tab          // Metatable dict entry
 +
 +int64     → 0x10 int.L                             // FFI int64_t
 +uint64    → 0x11 uint.L                           // FFI uint64_t
 +complex   → 0x12 re.L im.L                         // FFI complex
 +
 +string    → (0x20+len).U len*char.B
 +          | 0x0f (index-1).U                 // String dict entry
 +
 +.B = 8 bit
 +.I = 32 bit little-endian
 +.L = 64 bit little-endian
 +.U = prefix-encoded 32 bit unsigned number n:
 +     0x00..0xdf   → n.B
 +     0xe0..0x1fdf → (0xe0|(((n-0xe0)>>8)&0x1f)).B ((n-0xe0)&0xff).B
 +   0x1fe0..       → 0xff n.I
 +
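To make the .U rule concrete, here is a plain-Lua sketch of the prefix encoding and its inverse, written from the grammar above. It uses arithmetic instead of bit operations so that it runs on any Lua version; encode_u(), decode_u() and string_record() are illustrative names, not LuaJIT functions.

```lua
-- Encode an unsigned 32-bit number n in the .U prefix format.
local function encode_u(n)
  if n < 0xe0 then                -- 0x00..0xdf: one byte
    return string.char(n)
  elseif n < 0x1fe0 then          -- 0xe0..0x1fdf: two bytes
    local m = n - 0xe0
    return string.char(0xe0 + math.floor(m / 256) % 32, m % 256)
  else                            -- 0x1fe0 and up: 0xff, then 32 bit little-endian
    return string.char(0xff, n % 256, math.floor(n / 256) % 256,
                       math.floor(n / 65536) % 256,
                       math.floor(n / 16777216) % 256)
  end
end

-- Decode a .U number at byte position pos; returns n and the next position.
local function decode_u(s, pos)
  local b = s:byte(pos)
  if b < 0xe0 then
    return b, pos + 1
  elseif b < 0xff then
    return (b - 0xe0) * 256 + s:byte(pos + 1) + 0xe0, pos + 2
  else
    local b1, b2, b3, b4 = s:byte(pos + 1, pos + 4)
    return b1 + b2 * 256 + b3 * 65536 + b4 * 16777216, pos + 5
  end
end

-- A string is its header (0x20+len).U followed by the raw bytes.
local function string_record(str)
  return encode_u(0x20 + #str) .. str
end
```

Going by the string rule, buffer.encode("hi") in LuaJIT is expected to produce the same three bytes as string_record("hi").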
+ +

Error handling

+

+Many of the buffer methods can throw an error. Out-of-memory or usage +errors are best caught with an outer wrapper for larger parts of code. +There's not much one can do after that, anyway. +

+

+On the other hand, you may want to catch some errors individually. Buffer methods need +to receive the buffer object as the first argument. The Lua colon-syntax +obj:method() does that implicitly. But to wrap a method with +pcall(), the arguments need to be passed like this:

+
 +local ok, err = pcall(buf.encode, buf, obj)
 +if not ok then
 +  -- Handle error in err.
 +end
 +
+ +

FFI caveats

+

+The string buffer library has been designed to work well together with +the FFI library. But due to the low-level nature of the FFI library, +some care needs to be taken: +

+

+First, please remember that FFI pointers are zero-indexed. The space +returned by buf:reserve() and buf:ref() starts at the +returned pointer and ends before len bytes after that. +

+

+I.e. the first valid index is ptr[0] and the last valid index +is ptr[len-1]. If the returned length is zero, there's no valid +index at all. The returned pointer may even be NULL. +

+

+The space pointed to by the returned pointer is only valid as long as +the buffer is not modified in any way (neither append, nor consume, nor +reset, etc.). The pointer is also not a GC anchor for the buffer object +itself. +

+

+Buffer data is only guaranteed to be byte-aligned. Casting the returned +pointer to a data type with higher alignment may cause unaligned +accesses. It depends on the CPU architecture whether this is allowed or +not (it's always OK on x86/x64 and mostly OK on other modern +architectures). +

+

+FFI pointers or references do not count as GC anchors for an underlying +object. E.g. an array allocated with ffi.new() is +anchored by buf:set(array, len), but not by +buf:set(array+offset, len). The addition of the offset +creates a new pointer, even when the offset is zero. In this case, you +need to make sure there's still a reference to the original array as +long as its contents are in use by the buffer. +
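A sketch of the offset case, assuming LuaJIT 2.1 with the FFI library. Because arr + 2 creates a new pointer object, the buffer does not anchor arr; the local variable arr is what keeps the array alive while the buffer reads from it.

```lua
local ffi = require("ffi")
local buffer = require("string.buffer")

local arr = ffi.new("uint8_t[8]")
ffi.copy(arr, "ABCDEFGH", 8)

local buf = buffer.new()
-- arr + 2 is a new pointer cdata, so the buffer does not anchor arr itself.
buf:set(arr + 2, 4)
-- The live local variable arr keeps the array valid while buf uses it.
local s = buf:get()  -- "CDEF"
```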

+

+Even though each LuaJIT VM instance is single-threaded (but you can +create multiple VMs), FFI data structures can be accessed concurrently. +Be careful when reading/writing FFI cdata from/to buffers to avoid +concurrent accesses or modifications. In particular, the memory +referenced by buf:set(cdata, len) must not be modified +while buffer readers are working on it. Shared, but read-only memory +mappings of files are OK, but only if the file does not change. +

+
+
+ + + diff --cc doc/ext_c_api.html index 151d20b5,ee64ec0f..ab72d19a --- a/doc/ext_c_api.html +++ b/doc/ext_c_api.html @@@ -2,8 -2,8 +2,8 @@@ Lua/C API Extensions - + - + diff --cc doc/ext_ffi.html index 5f1e2d7c,c78fef84..e8e5565d --- a/doc/ext_ffi.html +++ b/doc/ext_ffi.html @@@ -2,8 -2,8 +2,8 @@@ FFI Library - + - + diff --cc doc/ext_ffi_api.html index 8e99ff48,570ea4fe..ea03168a --- a/doc/ext_ffi_api.html +++ b/doc/ext_ffi_api.html @@@ -2,8 -2,8 +2,8 @@@ ffi.* API Functions - + - + diff --cc doc/ext_ffi_semantics.html index 603f9950,5ecb2f4e..419ef07a --- a/doc/ext_ffi_semantics.html +++ b/doc/ext_ffi_semantics.html @@@ -2,8 -2,8 +2,8 @@@ FFI Semantics - + - + diff --cc doc/ext_ffi_tutorial.html index ff0c3a9a,94e2f61d..3cf4862a --- a/doc/ext_ffi_tutorial.html +++ b/doc/ext_ffi_tutorial.html @@@ -2,8 -2,8 +2,8 @@@ FFI Tutorial - + - + diff --cc doc/ext_jit.html index 3ff5c05e,908701b6..7bf9f343 --- a/doc/ext_jit.html +++ b/doc/ext_jit.html @@@ -2,8 -2,8 +2,8 @@@ jit.* Library - + - + diff --cc doc/ext_profiler.html index d6e26efd,00000000..18880239 mode 100644,000000..100644 --- a/doc/ext_profiler.html +++ b/doc/ext_profiler.html @@@ -1,359 -1,0 +1,359 @@@ + + + +Profiler + - ++ + + + + + +
+Lua +
+ + +
+

+LuaJIT has an integrated statistical profiler with very low overhead. It +allows sampling the currently executing stack and other parameters in +regular intervals. +

+

+The integrated profiler can be accessed from three levels: +

+ + +

High-Level Profiler

+

+The bundled high-level profiler offers basic profiling functionality. It +generates simple textual summaries or source code annotations. It can be +accessed with the -jp command line option +or from Lua code by loading the underlying jit.p module. +

+

+To cut to the chase — run this to get a CPU usage profile by +function name: +

+
 +luajit -jp myapp.lua
 +
+

+It's not a stated goal of the bundled profiler to add every +possible option or to cater for special profiling needs. The low-level +profiler APIs are documented below. They may be used by third-party +authors to implement advanced functionality, e.g. IDE integration or +graphical profilers. +

+

+Note: Sampling works for both interpreted and JIT-compiled code. The +results for JIT-compiled code may sometimes be surprising. LuaJIT +heavily optimizes and inlines Lua code — there's no simple +one-to-one correspondence between source code lines and the sampled +machine code. +

+ +

-jp=[options[,output]]

+

+The -jp command line option starts the high-level profiler. +When the application run by the command line terminates, the profiler +stops and writes the results to stdout or to the specified +output file. +

+

+The options argument specifies how the profiling is to be +performed: +

+ +

+The default output for -jp is a list of the most CPU-consuming +spots in the application. Increasing the stack dump depth with (say) +-jp=2 may help to point out the main callers or callees of +hotspots. But sample aggregation is still flat per unique stack dump.

+

+To get a two-level view (split view) of callers/callees, use +-jp=s or -jp=-s. The percentages shown for the second +level are relative to the first level. +

+

+To see how much time is spent in each line relative to a function, use +-jp=fl. +

+

+To see how much time is spent in different VM states or +zones, use -jp=v or -jp=z. +

+

+Combinations of v/z with f/F/l produce two-level +views, e.g. -jp=vf or -jp=fv. This shows the time +spent in a VM state or zone vs. hotspots. This can be used to answer +questions like "Which time-consuming functions are only interpreted?" or +"What's the garbage collector overhead for a specific function?". +

+

+Multiple options can be combined, but not all combinations make +sense (see above). E.g. -jp=3si4m1 samples three stack levels +deep in 4ms intervals and shows a split view of the CPU-consuming +functions and their callers with a 1% threshold.

+

+Source code annotations produced by -jp=a or -jp=A are +always flat and at the line level. Obviously, the source code files need +to be readable by the profiler script. +

+

+The high-level profiler can also be started and stopped from Lua code with: +

+
 +require("jit.p").start(options, output)
 +...
 +require("jit.p").stop()
 +
+ +

jit.zone — Zones

+

+Zones can be used to provide information about different parts of an +application to the high-level profiler. E.g. a game could make use of an +"AI" zone, a "PHYS" zone, etc. Zones are hierarchical, +organized as a stack. +

+

+The jit.zone module needs to be loaded explicitly: +

+
 +local zone = require("jit.zone")
 +
+ +

+To show the time spent in each zone use -jp=z. To show the time +spent relative to hotspots use e.g. -jp=zf or -jp=fz. +
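A sketch of zone usage, assuming the call interface of the jit.zone module shipped with LuaJIT: zone(name) pushes a zone, zone() pops the current one, and zone:get() returns the current zone name (or nil). The "AI"/"PHYS" names follow the example above; update_ai() and update_physics() are hypothetical placeholders.

```lua
local zone = require("jit.zone")

-- Hypothetical per-frame work standing in for real subsystems.
local function update_ai()
  local s = 0
  for i = 1, 1000 do s = s + i end
  return s
end
local function update_physics() return 0 end

local function update_frame()
  zone("AI")                  -- push: samples are now attributed to "AI"
  update_ai()
  zone()                      -- pop "AI"

  zone("PHYS")
  local current = zone:get()  -- "PHYS"
  update_physics()
  zone()                      -- pop "PHYS"
  return current
end

local seen = update_frame()
```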

+ +

Low-level Lua API

+

+The jit.profile module gives access to the low-level API of the +profiler from Lua code. This module needs to be loaded explicitly: +

 +local profile = require("jit.profile")
 +
+

+This module can be used to implement your own higher-level profiler. +A typical profiling run starts the profiler, captures stack dumps in +the profiler callback, adds them to a hash table to aggregate the number +of samples, stops the profiler and then analyzes all captured +stack dumps. Other parameters can be sampled in the profiler callback, +too. But it's important not to spend too much time in the callback, +since this may skew the statistics. +
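A compact sketch of such a run, assuming LuaJIT 2.1. The mode string "li1" assumes the letters documented for profile.start() ('l' for line-level sampling, 'i' followed by an interval in milliseconds), and the "lZ;" stack format matches the dumpstack() example further down. The workload is a placeholder, and the number of samples collected depends on timing.

```lua
local profile = require("jit.profile")

local stacks, states = {}, {}

-- Keep the callback cheap: take a short stack dump and count it.
local function on_sample(thread, samples, vmstate)
  local stack = profile.dumpstack(thread, "lZ;", -3)  -- up to 3 frames, inverse order
  stacks[stack] = (stacks[stack] or 0) + samples
  states[vmstate] = (states[vmstate] or 0) + samples
end

-- Hypothetical workload to be profiled.
local function work(n)
  local x = 0
  for i = 1, n do x = x + i % 7 end
  return x
end

profile.start("li1", on_sample)  -- assumed: line-level samples every 1 ms
work(2e7)
profile.stop()

-- Analyze: sort the aggregated stacks by sample count and print the top entries.
local report = {}
for stack, count in pairs(stacks) do
  report[#report + 1] = { stack = stack, count = count }
end
table.sort(report, function(a, b) return a.count > b.count end)
for i = 1, math.min(5, #report) do
  print(report[i].count, report[i].stack)
end
```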

+ +

profile.start(mode, cb) +— Start profiler

+

+This function starts the profiler. The mode argument is a +string holding options: +

+ +

+The cb argument is a callback function which is called with +three arguments: (thread, samples, vmstate). The callback is +called on a separate coroutine; the thread argument is the +state that holds the stack to sample for profiling. Note: do +not modify the stack of that state or call functions on it.

+

+samples gives the number of accumulated samples since the last +callback (usually 1). +

+

+vmstate holds the VM state at the time the profiling timer +triggered. This may or may not correspond to the state of the VM when +the profiling callback is called. The state is either 'N' +native (compiled) code, 'I' interpreted code, 'C' +C code, 'G' the garbage collector, or 'J' the JIT +compiler. +

+ +

profile.stop() +— Stop profiler

+

+This function stops the profiler. +

+ +

dump = profile.dumpstack([thread,] fmt, depth) +— Dump stack

+

+This function allows taking stack dumps in an efficient manner. It +returns a string with a stack dump for the thread (coroutine), +formatted according to the fmt argument: +

+ +

+The depth argument gives the number of frames to dump, starting +at the topmost frame of the thread. A negative number dumps the frames in +inverse order. +

+

+The first example prints a list of the current module names and line +numbers of up to 10 frames in separate lines. The second example prints +semicolon-separated module names and line numbers for all frames (up to 100) in inverse +order:

+
 +print(profile.dumpstack(thread, "l\n", 10))
 +print(profile.dumpstack(thread, "lZ;", -100))
 +
+ +

Low-level C API

+

+The profiler can be controlled directly from C code, e.g. for +use by IDEs. The declarations are in "luajit.h" (see +Lua/C API extensions). +

+ +

luaJIT_profile_start(L, mode, cb, data) +— Start profiler

+

+This function starts the profiler. See +above for a description of the mode argument. +

+

+The cb argument is a callback function with the following +declaration: +

+
 +typedef void (*luaJIT_profile_callback)(void *data, lua_State *L,
 +                                        int samples, int vmstate);
 +
+

+data is available for use by the callback. L is the +state that holds the stack to sample for profiling. Note: do +not modify this stack or call functions on this stack — +use a separate coroutine for this purpose. See +above for a description of samples and vmstate. +

+ +

luaJIT_profile_stop(L) +— Stop profiler

+

+This function stops the profiler. +

+ +

p = luaJIT_profile_dumpstack(L, fmt, depth, len) +— Dump stack

+

+This function allows taking stack dumps in an efficient manner. +See above for a description of fmt +and depth. +

+

+This function returns a const char * pointing to a +private string buffer of the profiler. The int *len +argument returns the length of the output string. The buffer is +overwritten on the next call and deallocated when the profiler stops. +You either need to consume the content immediately or copy it for later +use. +

+
+
+ + + diff --cc doc/extensions.html index f006a6db,fc673ef7..c7ace015 --- a/doc/extensions.html +++ b/doc/extensions.html @@@ -2,8 -2,8 +2,8 @@@ Extensions - + - + diff --cc doc/faq.html index c07fd248,f160fffe..a53a7512 --- a/doc/faq.html +++ b/doc/faq.html @@@ -2,8 -2,8 +2,8 @@@ Frequently Asked Questions (FAQ) - + - + diff --cc doc/install.html index d78dda3e,c960e071..0ccd37aa --- a/doc/install.html +++ b/doc/install.html @@@ -2,8 -2,8 +2,8 @@@ Installation - + - + diff --cc doc/luajit.html index e3a5478d,2895a981..3bb8aaf2 --- a/doc/luajit.html +++ b/doc/luajit.html @@@ -2,8 -2,8 +2,8 @@@ LuaJIT - + - + diff --cc doc/running.html index edc049fb,e8c9b1c6..325cb6bb --- a/doc/running.html +++ b/doc/running.html @@@ -2,8 -2,8 +2,8 @@@ Running LuaJIT - + - + diff --cc doc/status.html index efb1e064,b69a9721..49ced3f9 --- a/doc/status.html +++ b/doc/status.html @@@ -2,8 -2,8 +2,8 @@@ Status - + - + diff --cc dynasm/dasm_arm64.h index dffd64e8,00000000..1c541e5d mode 100644,000000..100644 --- a/dynasm/dasm_arm64.h +++ b/dynasm/dasm_arm64.h @@@ -1,558 -1,0 +1,558 @@@ +/* +** DynASM ARM64 encoding engine. - ** Copyright (C) 2005-2022 Mike Pall. All rights reserved. ++** Copyright (C) 2005-2023 Mike Pall. All rights reserved. +** Released under the MIT license. See dynasm.lua for full copyright notice. +*/ + +#include +#include +#include +#include + +#define DASM_ARCH "arm64" + +#ifndef DASM_EXTERN +#define DASM_EXTERN(a,b,c,d) 0 +#endif + +/* Action definitions. */ +enum { + DASM_STOP, DASM_SECTION, DASM_ESC, DASM_REL_EXT, + /* The following actions need a buffer position. */ + DASM_ALIGN, DASM_REL_LG, DASM_LABEL_LG, + /* The following actions also have an argument. */ + DASM_REL_PC, DASM_LABEL_PC, DASM_REL_A, + DASM_IMM, DASM_IMM6, DASM_IMM12, DASM_IMM13W, DASM_IMM13X, DASM_IMML, + DASM_IMMV, DASM_VREG, + DASM__MAX +}; + +/* Maximum number of section buffer positions for a single dasm_put() call. */ +#define DASM_MAXSECPOS 25 + +/* DynASM encoder status codes. 
Action list offset or number are or'ed in. */ +#define DASM_S_OK 0x00000000 +#define DASM_S_NOMEM 0x01000000 +#define DASM_S_PHASE 0x02000000 +#define DASM_S_MATCH_SEC 0x03000000 +#define DASM_S_RANGE_I 0x11000000 +#define DASM_S_RANGE_SEC 0x12000000 +#define DASM_S_RANGE_LG 0x13000000 +#define DASM_S_RANGE_PC 0x14000000 +#define DASM_S_RANGE_REL 0x15000000 +#define DASM_S_RANGE_VREG 0x16000000 +#define DASM_S_UNDEF_LG 0x21000000 +#define DASM_S_UNDEF_PC 0x22000000 + +/* Macros to convert positions (8 bit section + 24 bit index). */ +#define DASM_POS2IDX(pos) ((pos)&0x00ffffff) +#define DASM_POS2BIAS(pos) ((pos)&0xff000000) +#define DASM_SEC2POS(sec) ((sec)<<24) +#define DASM_POS2SEC(pos) ((pos)>>24) +#define DASM_POS2PTR(D, pos) (D->sections[DASM_POS2SEC(pos)].rbuf + (pos)) + +/* Action list type. */ +typedef const unsigned int *dasm_ActList; + +/* Per-section structure. */ +typedef struct dasm_Section { + int *rbuf; /* Biased buffer pointer (negative section bias). */ + int *buf; /* True buffer pointer. */ + size_t bsize; /* Buffer size in bytes. */ + int pos; /* Biased buffer position. */ + int epos; /* End of biased buffer position - max single put. */ + int ofs; /* Byte offset into section. */ +} dasm_Section; + +/* Core structure holding the DynASM encoding state. */ +struct dasm_State { + size_t psize; /* Allocated size of this structure. */ + dasm_ActList actionlist; /* Current actionlist pointer. */ + int *lglabels; /* Local/global chain/pos ptrs. */ + size_t lgsize; + int *pclabels; /* PC label chains/pos ptrs. */ + size_t pcsize; + void **globals; /* Array of globals. */ + dasm_Section *section; /* Pointer to active section. */ + size_t codesize; /* Total size of all code sections. */ + int maxsection; /* 0 <= sectionidx < maxsection. */ + int status; /* Status code. */ + dasm_Section sections[1]; /* All sections. Alloc-extended. */ +}; + +/* The size of the core structure depends on the max. number of sections. 
*/ +#define DASM_PSZ(ms) (sizeof(dasm_State)+(ms-1)*sizeof(dasm_Section)) + + +/* Initialize DynASM state. */ +void dasm_init(Dst_DECL, int maxsection) +{ + dasm_State *D; + size_t psz = 0; + Dst_REF = NULL; + DASM_M_GROW(Dst, struct dasm_State, Dst_REF, psz, DASM_PSZ(maxsection)); + D = Dst_REF; + D->psize = psz; + D->lglabels = NULL; + D->lgsize = 0; + D->pclabels = NULL; + D->pcsize = 0; + D->globals = NULL; + D->maxsection = maxsection; + memset((void *)D->sections, 0, maxsection * sizeof(dasm_Section)); +} + +/* Free DynASM state. */ +void dasm_free(Dst_DECL) +{ + dasm_State *D = Dst_REF; + int i; + for (i = 0; i < D->maxsection; i++) + if (D->sections[i].buf) + DASM_M_FREE(Dst, D->sections[i].buf, D->sections[i].bsize); + if (D->pclabels) DASM_M_FREE(Dst, D->pclabels, D->pcsize); + if (D->lglabels) DASM_M_FREE(Dst, D->lglabels, D->lgsize); + DASM_M_FREE(Dst, D, D->psize); +} + +/* Setup global label array. Must be called before dasm_setup(). */ +void dasm_setupglobal(Dst_DECL, void **gl, unsigned int maxgl) +{ + dasm_State *D = Dst_REF; + D->globals = gl; + DASM_M_GROW(Dst, int, D->lglabels, D->lgsize, (10+maxgl)*sizeof(int)); +} + +/* Grow PC label array. Can be called after dasm_setup(), too. */ +void dasm_growpc(Dst_DECL, unsigned int maxpc) +{ + dasm_State *D = Dst_REF; + size_t osz = D->pcsize; + DASM_M_GROW(Dst, int, D->pclabels, D->pcsize, maxpc*sizeof(int)); + memset((void *)(((unsigned char *)D->pclabels)+osz), 0, D->pcsize-osz); +} + +/* Setup encoder. 
*/ +void dasm_setup(Dst_DECL, const void *actionlist) +{ + dasm_State *D = Dst_REF; + int i; + D->actionlist = (dasm_ActList)actionlist; + D->status = DASM_S_OK; + D->section = &D->sections[0]; + memset((void *)D->lglabels, 0, D->lgsize); + if (D->pclabels) memset((void *)D->pclabels, 0, D->pcsize); + for (i = 0; i < D->maxsection; i++) { + D->sections[i].pos = DASM_SEC2POS(i); + D->sections[i].rbuf = D->sections[i].buf - D->sections[i].pos; + D->sections[i].ofs = 0; + } +} + + +#ifdef DASM_CHECKS +#define CK(x, st) \ + do { if (!(x)) { \ + D->status = DASM_S_##st|(int)(p-D->actionlist-1); return; } } while (0) +#define CKPL(kind, st) \ + do { if ((size_t)((char *)pl-(char *)D->kind##labels) >= D->kind##size) { \ + D->status = DASM_S_RANGE_##st|(int)(p-D->actionlist-1); return; } } while (0) +#else +#define CK(x, st) ((void)0) +#define CKPL(kind, st) ((void)0) +#endif + +static int dasm_imm12(unsigned int n) +{ + if ((n >> 12) == 0) + return n; + else if ((n & 0xff000fff) == 0) + return (n >> 12) | 0x1000; + else + return -1; +} + +static int dasm_ffs(unsigned long long x) +{ + int n = -1; + while (x) { x >>= 1; n++; } + return n; +} + +static int dasm_imm13(int lo, int hi) +{ + int inv = 0, w = 64, s = 0xfff, xa, xb; + unsigned long long n = (((unsigned long long)hi) << 32) | (unsigned int)lo; + unsigned long long m = 1ULL, a, b, c; + if (n & 1) { n = ~n; inv = 1; } + a = n & (unsigned long long)-(long long)n; + b = (n+a)&(unsigned long long)-(long long)(n+a); + c = (n+a-b)&(unsigned long long)-(long long)(n+a-b); + xa = dasm_ffs(a); xb = dasm_ffs(b); + if (c) { + w = dasm_ffs(c) - xa; + if (w == 32) m = 0x0000000100000001UL; + else if (w == 16) m = 0x0001000100010001UL; + else if (w == 8) m = 0x0101010101010101UL; + else if (w == 4) m = 0x1111111111111111UL; + else if (w == 2) m = 0x5555555555555555UL; + else return -1; + s = (-2*w & 0x3f) - 1; + } else if (!a) { + return -1; + } else if (xb == -1) { + xb = 64; + } + if ((b-a) * m != n) return -1; + if (inv) { + 
return ((w - xb) << 6) | (s+w+xa-xb); + } else { + return ((w - xa) << 6) | (s+xb-xa); + } + return -1; +} + +/* Pass 1: Store actions and args, link branches/labels, estimate offsets. */ +void dasm_put(Dst_DECL, int start, ...) +{ + va_list ap; + dasm_State *D = Dst_REF; + dasm_ActList p = D->actionlist + start; + dasm_Section *sec = D->section; + int pos = sec->pos, ofs = sec->ofs; + int *b; + + if (pos >= sec->epos) { + DASM_M_GROW(Dst, int, sec->buf, sec->bsize, + sec->bsize + 2*DASM_MAXSECPOS*sizeof(int)); + sec->rbuf = sec->buf - DASM_POS2BIAS(pos); + sec->epos = (int)sec->bsize/sizeof(int) - DASM_MAXSECPOS+DASM_POS2BIAS(pos); + } + + b = sec->rbuf; + b[pos++] = start; + + va_start(ap, start); + while (1) { + unsigned int ins = *p++; + unsigned int action = (ins >> 16); + if (action >= DASM__MAX) { + ofs += 4; + } else { + int *pl, n = action >= DASM_REL_PC ? va_arg(ap, int) : 0; + switch (action) { + case DASM_STOP: goto stop; + case DASM_SECTION: + n = (ins & 255); CK(n < D->maxsection, RANGE_SEC); + D->section = &D->sections[n]; goto stop; + case DASM_ESC: p++; ofs += 4; break; + case DASM_REL_EXT: if ((ins & 0x8000)) ofs += 8; break; + case DASM_ALIGN: ofs += (ins & 255); b[pos++] = ofs; break; + case DASM_REL_LG: + n = (ins & 2047) - 10; pl = D->lglabels + n; + /* Bkwd rel or global. */ + if (n >= 0) { CK(n>=10||*pl<0, RANGE_LG); CKPL(lg, LG); goto putrel; } + pl += 10; n = *pl; + if (n < 0) n = 0; /* Start new chain for fwd rel if label exists. */ + goto linkrel; + case DASM_REL_PC: + pl = D->pclabels + n; CKPL(pc, PC); + putrel: + n = *pl; + if (n < 0) { /* Label exists. Get label pos and store it. */ + b[pos] = -n; + } else { + linkrel: + b[pos] = n; /* Else link to rel chain, anchored at label. 
*/ + *pl = pos; + } + pos++; + if ((ins & 0x8000)) ofs += 8; + break; + case DASM_REL_A: + b[pos++] = n; + b[pos++] = va_arg(ap, int); + break; + case DASM_LABEL_LG: + pl = D->lglabels + (ins & 2047) - 10; CKPL(lg, LG); goto putlabel; + case DASM_LABEL_PC: + pl = D->pclabels + n; CKPL(pc, PC); + putlabel: + n = *pl; /* n > 0: Collapse rel chain and replace with label pos. */ + while (n > 0) { int *pb = DASM_POS2PTR(D, n); n = *pb; *pb = pos; + } + *pl = -pos; /* Label exists now. */ + b[pos++] = ofs; /* Store pass1 offset estimate. */ + break; + case DASM_IMM: + CK((n & ((1<<((ins>>10)&31))-1)) == 0, RANGE_I); + n >>= ((ins>>10)&31); +#ifdef DASM_CHECKS + if ((ins & 0x8000)) + CK(((n + (1<<(((ins>>5)&31)-1)))>>((ins>>5)&31)) == 0, RANGE_I); + else + CK((n>>((ins>>5)&31)) == 0, RANGE_I); +#endif + b[pos++] = n; + break; + case DASM_IMM6: + CK((n >> 6) == 0, RANGE_I); + b[pos++] = n; + break; + case DASM_IMM12: + CK(dasm_imm12((unsigned int)n) != -1, RANGE_I); + b[pos++] = n; + break; + case DASM_IMM13W: + CK(dasm_imm13(n, n) != -1, RANGE_I); + b[pos++] = n; + break; + case DASM_IMM13X: { + int m = va_arg(ap, int); + CK(dasm_imm13(n, m) != -1, RANGE_I); + b[pos++] = n; + b[pos++] = m; + break; + } + case DASM_IMML: { +#ifdef DASM_CHECKS + int scale = (ins & 3); + CK((!(n & ((1<<scale)-1)) && (unsigned int)(n>>scale) < 4096) || + (unsigned int)(n+256) < 512, RANGE_I); +#endif + b[pos++] = n; + break; + } + case DASM_IMMV: + ofs += 4; + b[pos++] = n; + break; + case DASM_VREG: + CK(n < 32, RANGE_VREG); + b[pos++] = n; + break; + } + } + } +stop: + va_end(ap); + sec->pos = pos; + sec->ofs = ofs; +} +#undef CK + +/* Pass 2: Link sections, shrink aligns, fix label offsets.
*/ +int dasm_link(Dst_DECL, size_t *szp) +{ + dasm_State *D = Dst_REF; + int secnum; + int ofs = 0; + +#ifdef DASM_CHECKS + *szp = 0; + if (D->status != DASM_S_OK) return D->status; + { + int pc; + for (pc = 0; pc*sizeof(int) < D->pcsize; pc++) + if (D->pclabels[pc] > 0) return DASM_S_UNDEF_PC|pc; + } +#endif + + { /* Handle globals not defined in this translation unit. */ + int idx; + for (idx = 10; idx*sizeof(int) < D->lgsize; idx++) { + int n = D->lglabels[idx]; + /* Undefined label: Collapse rel chain and replace with marker (< 0). */ + while (n > 0) { int *pb = DASM_POS2PTR(D, n); n = *pb; *pb = -idx; } + } + } + + /* Combine all code sections. No support for data sections (yet). */ + for (secnum = 0; secnum < D->maxsection; secnum++) { + dasm_Section *sec = D->sections + secnum; + int *b = sec->rbuf; + int pos = DASM_SEC2POS(secnum); + int lastpos = sec->pos; + + while (pos != lastpos) { + dasm_ActList p = D->actionlist + b[pos++]; + while (1) { + unsigned int ins = *p++; + unsigned int action = (ins >> 16); + switch (action) { + case DASM_STOP: case DASM_SECTION: goto stop; + case DASM_ESC: p++; break; + case DASM_REL_EXT: break; + case DASM_ALIGN: ofs -= (b[pos++] + ofs) & (ins & 255); break; + case DASM_REL_LG: case DASM_REL_PC: pos++; break; + case DASM_LABEL_LG: case DASM_LABEL_PC: b[pos++] += ofs; break; + case DASM_IMM: case DASM_IMM6: case DASM_IMM12: case DASM_IMM13W: + case DASM_IMML: case DASM_IMMV: case DASM_VREG: pos++; break; + case DASM_IMM13X: case DASM_REL_A: pos += 2; break; + } + } + stop: (void)0; + } + ofs += sec->ofs; /* Next section starts right after current section. */ + } + + D->codesize = ofs; /* Total size of all code sections */ + *szp = ofs; + return DASM_S_OK; +} + +#ifdef DASM_CHECKS +#define CK(x, st) \ + do { if (!(x)) return DASM_S_##st|(int)(p-D->actionlist-1); } while (0) +#else +#define CK(x, st) ((void)0) +#endif + +/* Pass 3: Encode sections. 
*/ +int dasm_encode(Dst_DECL, void *buffer) +{ + dasm_State *D = Dst_REF; + char *base = (char *)buffer; + unsigned int *cp = (unsigned int *)buffer; + int secnum; + + /* Encode all code sections. No support for data sections (yet). */ + for (secnum = 0; secnum < D->maxsection; secnum++) { + dasm_Section *sec = D->sections + secnum; + int *b = sec->buf; + int *endb = sec->rbuf + sec->pos; + + while (b != endb) { + dasm_ActList p = D->actionlist + *b++; + while (1) { + unsigned int ins = *p++; + unsigned int action = (ins >> 16); + int n = (action >= DASM_ALIGN && action < DASM__MAX) ? *b++ : 0; + switch (action) { + case DASM_STOP: case DASM_SECTION: goto stop; + case DASM_ESC: *cp++ = *p++; break; + case DASM_REL_EXT: + n = DASM_EXTERN(Dst, (unsigned char *)cp, (ins&2047), !(ins&2048)); + goto patchrel; + case DASM_ALIGN: + ins &= 255; while ((((char *)cp - base) & ins)) *cp++ = 0xd503201f; + break; + case DASM_REL_LG: + if (n < 0) { + ptrdiff_t na = (ptrdiff_t)D->globals[-n-10] - (ptrdiff_t)cp + 4; + n = (int)na; + CK((ptrdiff_t)n == na, RANGE_REL); + goto patchrel; + } + /* fallthrough */ + case DASM_REL_PC: + CK(n >= 0, UNDEF_PC); + n = *DASM_POS2PTR(D, n) - (int)((char *)cp - base) + 4; + patchrel: + if (!(ins & 0xf800)) { /* B, BL */ + CK((n & 3) == 0 && ((n+0x08000000) >> 28) == 0, RANGE_REL); + cp[-1] |= ((n >> 2) & 0x03ffffff); + } else if ((ins & 0x800)) { /* B.cond, CBZ, CBNZ, LDR* literal */ + CK((n & 3) == 0 && ((n+0x00100000) >> 21) == 0, RANGE_REL); + cp[-1] |= ((n << 3) & 0x00ffffe0); + } else if ((ins & 0x3000) == 0x2000) { /* ADR */ + CK(((n+0x00100000) >> 21) == 0, RANGE_REL); + cp[-1] |= ((n << 3) & 0x00ffffe0) | ((n & 3) << 29); + } else if ((ins & 0x3000) == 0x3000) { /* ADRP */ + cp[-1] |= ((n >> 9) & 0x00ffffe0) | (((n >> 12) & 3) << 29); + } else if ((ins & 0x1000)) { /* TBZ, TBNZ */ + CK((n & 3) == 0 && ((n+0x00008000) >> 16) == 0, RANGE_REL); + cp[-1] |= ((n << 3) & 0x0007ffe0); + } else if ((ins & 0x8000)) { /* absolute */ + cp[0] = 
(unsigned int)((ptrdiff_t)cp - 4 + n); + cp[1] = (unsigned int)(((ptrdiff_t)cp - 4 + n) >> 32); + cp += 2; + } + break; + case DASM_REL_A: { + ptrdiff_t na = (((ptrdiff_t)(*b++) << 32) | (unsigned int)n); + if ((ins & 0x3000) == 0x3000) { /* ADRP */ + ins &= ~0x1000; + na = (na >> 12) - (((ptrdiff_t)cp - 4) >> 12); + } else { + na = na - (ptrdiff_t)cp + 4; + } + n = (int)na; + CK((ptrdiff_t)n == na, RANGE_REL); + goto patchrel; + } + case DASM_LABEL_LG: + ins &= 2047; if (ins >= 20) D->globals[ins-20] = (void *)(base + n); + break; + case DASM_LABEL_PC: break; + case DASM_IMM: + cp[-1] |= (n & ((1<<((ins>>5)&31))-1)) << (ins&31); + break; + case DASM_IMM6: + cp[-1] |= ((n&31) << 19) | ((n&32) << 26); + break; + case DASM_IMM12: + cp[-1] |= (dasm_imm12((unsigned int)n) << 10); + break; + case DASM_IMM13W: + cp[-1] |= (dasm_imm13(n, n) << 10); + break; + case DASM_IMM13X: + cp[-1] |= (dasm_imm13(n, *b++) << 10); + break; + case DASM_IMML: { + int scale = (ins & 3); + cp[-1] |= (!(n & ((1<<scale)-1)) && (unsigned int)(n>>scale) < 4096) ? + ((n << (10-scale)) | 0x01000000) : ((n & 511) << 12); + break; + } + case DASM_IMMV: + *cp++ = n; + break; + case DASM_VREG: + cp[-1] |= (n & 0x1f) << (ins & 0x1f); + break; + default: *cp++ = ins; break; + } + } + stop: (void)0; + } + } + + if (base + D->codesize != (char *)cp) /* Check for phase errors. */ + return DASM_S_PHASE; + return DASM_S_OK; +} +#undef CK + +/* Get PC label offset. */ +int dasm_getpclabel(Dst_DECL, unsigned int pc) +{ + dasm_State *D = Dst_REF; + if (pc*sizeof(int) < D->pcsize) { + int pos = D->pclabels[pc]; + if (pos < 0) return *DASM_POS2PTR(D, -pos); + if (pos > 0) return -1; /* Undefined. */ + } + return -2; /* Unused or out of range. */ +} + +#ifdef DASM_CHECKS +/* Optional sanity checker to call between isolated encoding steps.
*/ +int dasm_checkstep(Dst_DECL, int secmatch) +{ + dasm_State *D = Dst_REF; + if (D->status == DASM_S_OK) { + int i; + for (i = 1; i <= 9; i++) { + if (D->lglabels[i] > 0) { D->status = DASM_S_UNDEF_LG|i; break; } + D->lglabels[i] = 0; + } + } + if (D->status == DASM_S_OK && secmatch >= 0 && + D->section != &D->sections[secmatch]) + D->status = DASM_S_MATCH_SEC|(int)(D->section-D->sections); + return D->status; +} +#endif + diff --cc dynasm/dasm_arm64.lua index fee902d5,00000000..e69f8ef3 mode 100644,000000..100644 --- a/dynasm/dasm_arm64.lua +++ b/dynasm/dasm_arm64.lua @@@ -1,1226 -1,0 +1,1226 @@@ +------------------------------------------------------------------------------ +-- DynASM ARM64 module. +-- - -- Copyright (C) 2005-2022 Mike Pall. All rights reserved. ++-- Copyright (C) 2005-2023 Mike Pall. All rights reserved. +-- See dynasm.lua for full copyright notice. +------------------------------------------------------------------------------ + +-- Module information: +local _info = { + arch = "arm", + description = "DynASM ARM64 module", + version = "1.5.0", + vernum = 10500, + release = "2021-05-02", + author = "Mike Pall", + license = "MIT", +} + +-- Exported glue functions for the arch-specific module. +local _M = { _info = _info } + +-- Cache library functions. +local type, tonumber, pairs, ipairs = type, tonumber, pairs, ipairs +local assert, setmetatable, rawget = assert, setmetatable, rawget +local _s = string +local format, byte, char = _s.format, _s.byte, _s.char +local match, gmatch, gsub = _s.match, _s.gmatch, _s.gsub +local concat, sort, insert = table.concat, table.sort, table.insert +local bit = bit or require("bit") +local band, shl, shr, sar = bit.band, bit.lshift, bit.rshift, bit.arshift +local ror, tohex, tobit = bit.ror, bit.tohex, bit.tobit + +-- Inherited tables and callbacks. +local g_opt, g_arch +local wline, werror, wfatal, wwarn + +-- Action name list. +-- CHECK: Keep this in sync with the C code! 
+local action_names = { + "STOP", "SECTION", "ESC", "REL_EXT", + "ALIGN", "REL_LG", "LABEL_LG", + "REL_PC", "LABEL_PC", "REL_A", + "IMM", "IMM6", "IMM12", "IMM13W", "IMM13X", "IMML", "IMMV", + "VREG", +} + +-- Maximum number of section buffer positions for dasm_put(). +-- CHECK: Keep this in sync with the C code! +local maxsecpos = 25 -- Keep this low, to avoid excessively long C lines. + +-- Action name -> action number. +local map_action = {} +for n,name in ipairs(action_names) do + map_action[name] = n-1 +end + +-- Action list buffer. +local actlist = {} + +-- Argument list for next dasm_put(). Start with offset 0 into action list. +local actargs = { 0 } + +-- Current number of section buffer positions for dasm_put(). +local secpos = 1 + +------------------------------------------------------------------------------ + +-- Dump action names and numbers. +local function dumpactions(out) + out:write("DynASM encoding engine action codes:\n") + for n,name in ipairs(action_names) do + local num = map_action[name] + out:write(format(" %-10s %02X %d\n", name, num, num)) + end + out:write("\n") +end + +-- Write action list buffer as a huge static C array. +local function writeactions(out, name) + local nn = #actlist + if nn == 0 then nn = 1; actlist[0] = map_action.STOP end + out:write("static const unsigned int ", name, "[", nn, "] = {\n") + for i = 1,nn-1 do + assert(out:write("0x", tohex(actlist[i]), ",\n")) + end + assert(out:write("0x", tohex(actlist[nn]), "\n};\n\n")) +end + +------------------------------------------------------------------------------ + +-- Add word to action list. +local function wputxw(n) + assert(n >= 0 and n <= 0xffffffff and n % 1 == 0, "word out of range") + actlist[#actlist+1] = n +end + +-- Add action to list with optional arg. Advance buffer pos, too. 
+local function waction(action, val, a, num) + local w = assert(map_action[action], "bad action name `"..action.."'") + wputxw(w * 0x10000 + (val or 0)) + if a then actargs[#actargs+1] = a end + if a or num then secpos = secpos + (num or 1) end +end + +-- Flush action list (intervening C code or buffer pos overflow). +local function wflush(term) + if #actlist == actargs[1] then return end -- Nothing to flush. + if not term then waction("STOP") end -- Terminate action list. + wline(format("dasm_put(Dst, %s);", concat(actargs, ", ")), true) + actargs = { #actlist } -- Actionlist offset is 1st arg to next dasm_put(). + secpos = 1 -- The actionlist offset occupies a buffer position, too. +end + +-- Put escaped word. +local function wputw(n) + if n <= 0x000fffff then waction("ESC") end + wputxw(n) +end + +-- Reserve position for word. +local function wpos() + local pos = #actlist+1 + actlist[pos] = "" + return pos +end + +-- Store word to reserved position. +local function wputpos(pos, n) + assert(n >= 0 and n <= 0xffffffff and n % 1 == 0, "word out of range") + if n <= 0x000fffff then + insert(actlist, pos+1, n) + n = map_action.ESC * 0x10000 + end + actlist[pos] = n +end + +------------------------------------------------------------------------------ + +-- Global label name -> global label number. With auto assignment on 1st use. +local next_global = 20 +local map_global = setmetatable({}, { __index = function(t, name) + if not match(name, "^[%a_][%w_]*$") then werror("bad global label") end + local n = next_global + if n > 2047 then werror("too many global labels") end + next_global = n + 1 + t[name] = n + return n +end}) + +-- Dump global labels. +local function dumpglobals(out, lvl) + local t = {} + for name, n in pairs(map_global) do t[n] = name end + out:write("Global labels:\n") + for i=20,next_global-1 do + out:write(format(" %s\n", t[i])) + end + out:write("\n") +end + +-- Write global label enum. 
+local function writeglobals(out, prefix) + local t = {} + for name, n in pairs(map_global) do t[n] = name end + out:write("enum {\n") + for i=20,next_global-1 do + out:write(" ", prefix, t[i], ",\n") + end + out:write(" ", prefix, "_MAX\n};\n") +end + +-- Write global label names. +local function writeglobalnames(out, name) + local t = {} + for name, n in pairs(map_global) do t[n] = name end + out:write("static const char *const ", name, "[] = {\n") + for i=20,next_global-1 do + out:write(" \"", t[i], "\",\n") + end + out:write(" (const char *)0\n};\n") +end + +------------------------------------------------------------------------------ + +-- Extern label name -> extern label number. With auto assignment on 1st use. +local next_extern = 0 +local map_extern_ = {} +local map_extern = setmetatable({}, { __index = function(t, name) + -- No restrictions on the name for now. + local n = next_extern + if n > 2047 then werror("too many extern labels") end + next_extern = n + 1 + t[name] = n + map_extern_[n] = name + return n +end}) + +-- Dump extern labels. +local function dumpexterns(out, lvl) + out:write("Extern labels:\n") + for i=0,next_extern-1 do + out:write(format(" %s\n", map_extern_[i])) + end + out:write("\n") +end + +-- Write extern label names. +local function writeexternnames(out, name) + out:write("static const char *const ", name, "[] = {\n") + for i=0,next_extern-1 do + out:write(" \"", map_extern_[i], "\",\n") + end + out:write(" (const char *)0\n};\n") +end + +------------------------------------------------------------------------------ + +-- Arch-specific maps. + +-- Ext. register name -> int. name. +local map_archdef = { xzr = "@x31", wzr = "@w31", lr = "x30", } + +-- Int. register name -> ext. name. +local map_reg_rev = { ["@x31"] = "xzr", ["@w31"] = "wzr", x30 = "lr", } + +local map_type = {} -- Type name -> { ctype, reg } +local ctypenum = 0 -- Type number (for Dt... macros). + +-- Reverse defines for registers. 
+function _M.revdef(s) + return map_reg_rev[s] or s +end + +local map_shift = { lsl = 0, lsr = 1, asr = 2, } + +local map_extend = { + uxtb = 0, uxth = 1, uxtw = 2, uxtx = 3, + sxtb = 4, sxth = 5, sxtw = 6, sxtx = 7, +} + +local map_cond = { + eq = 0, ne = 1, cs = 2, cc = 3, mi = 4, pl = 5, vs = 6, vc = 7, + hi = 8, ls = 9, ge = 10, lt = 11, gt = 12, le = 13, al = 14, + hs = 2, lo = 3, +} + +------------------------------------------------------------------------------ + +local parse_reg_type + +local function parse_reg(expr, shift, no_vreg) + if not expr then werror("expected register name") end + local tname, ovreg = match(expr, "^([%w_]+):(@?%l%d+)$") + if not tname then + tname, ovreg = match(expr, "^([%w_]+):(R[xwqdshb]%b())$") + end + local tp = map_type[tname or expr] + if tp then + local reg = ovreg or tp.reg + if not reg then + werror("type `"..(tname or expr).."' needs a register override") + end + expr = reg + end + local ok31, rt, r = match(expr, "^(@?)([xwqdshb])([123]?[0-9])$") + if r then + r = tonumber(r) + if r <= 30 or (r == 31 and ok31 ~= "" or (rt ~= "w" and rt ~= "x")) then + if not parse_reg_type then + parse_reg_type = rt + elseif parse_reg_type ~= rt then + werror("register size mismatch") + end + return shl(r, shift), tp + end + end + local vrt, vreg = match(expr, "^R([xwqdshb])(%b())$") + if vreg then + if not parse_reg_type then + parse_reg_type = vrt + elseif parse_reg_type ~= vrt then + werror("register size mismatch") + end + if not no_vreg then waction("VREG", shift, vreg) end + return 0 + end + werror("bad register name `"..expr.."'") +end + +local function parse_reg_base(expr) + if expr == "sp" then return 0x3e0 end + local base, tp = parse_reg(expr, 5) + if parse_reg_type ~= "x" then werror("bad register type") end + parse_reg_type = false + return base, tp +end + +local parse_ctx = {} + +local loadenv = setfenv and function(s) + local code = loadstring(s, "") + if code then setfenv(code, parse_ctx) end + return code +end or 
function(s) + return load(s, "", nil, parse_ctx) +end + +-- Try to parse simple arithmetic, too, since some basic ops are aliases. +local function parse_number(n) + local x = tonumber(n) + if x then return x end + local code = loadenv("return "..n) + if code then + local ok, y = pcall(code) + if ok and type(y) == "number" then return y end + end + return nil +end + +local function parse_imm(imm, bits, shift, scale, signed) + imm = match(imm, "^#(.*)$") + if not imm then werror("expected immediate operand") end + local n = parse_number(imm) + if n then + local m = sar(n, scale) + if shl(m, scale) == n then + if signed then + local s = sar(m, bits-1) + if s == 0 then return shl(m, shift) + elseif s == -1 then return shl(m + shl(1, bits), shift) end + else + if sar(m, bits) == 0 then return shl(m, shift) end + end + end + werror("out of range immediate `"..imm.."'") + else + waction("IMM", (signed and 32768 or 0)+scale*1024+bits*32+shift, imm) + return 0 + end +end + +local function parse_imm12(imm) + imm = match(imm, "^#(.*)$") + if not imm then werror("expected immediate operand") end + local n = parse_number(imm) + if n then + if shr(n, 12) == 0 then + return shl(n, 10) + elseif band(n, 0xff000fff) == 0 then + return shr(n, 2) + 0x00400000 + end + werror("out of range immediate `"..imm.."'") + else + waction("IMM12", 0, imm) + return 0 + end +end + +local function parse_imm13(imm) + imm = match(imm, "^#(.*)$") + if not imm then werror("expected immediate operand") end + local n = parse_number(imm) + local r64 = parse_reg_type == "x" + if n and n % 1 == 0 and n >= 0 and n <= 0xffffffff then + local inv = false + if band(n, 1) == 1 then n = bit.bnot(n); inv = true end + local t = {} + for i=1,32 do t[i] = band(n, 1); n = shr(n, 1) end + local b = table.concat(t) + b = b..(r64 and (inv and "1" or "0"):rep(32) or b) + local p0, p1, p0a, p1a = b:match("^(0+)(1+)(0*)(1*)") + if p0 then + local w = p1a == "" and (r64 and 64 or 32) or #p1+#p0a + if band(w, w-1) == 0 and b 
== b:sub(1, w):rep(64/w) then + local s = band(-2*w, 0x3f) - 1 + if w == 64 then s = s + 0x1000 end + if inv then + return shl(w-#p1-#p0, 16) + shl(s+w-#p1, 10) + else + return shl(w-#p0, 16) + shl(s+#p1, 10) + end + end + end + werror("out of range immediate `"..imm.."'") + elseif r64 then + waction("IMM13X", 0, format("(unsigned int)(%s)", imm)) + actargs[#actargs+1] = format("(unsigned int)((unsigned long long)(%s)>>32)", imm) + return 0 + else + waction("IMM13W", 0, imm) + return 0 + end +end + +local function parse_imm6(imm) + imm = match(imm, "^#(.*)$") + if not imm then werror("expected immediate operand") end + local n = parse_number(imm) + if n then + if n >= 0 and n <= 63 then + return shl(band(n, 0x1f), 19) + (n >= 32 and 0x80000000 or 0) + end + werror("out of range immediate `"..imm.."'") + else + waction("IMM6", 0, imm) + return 0 + end +end + +local function parse_imm_load(imm, scale) + local n = parse_number(imm) + if n then + local m = sar(n, scale) + if shl(m, scale) == n and m >= 0 and m < 0x1000 then + return shl(m, 10) + 0x01000000 -- Scaled, unsigned 12 bit offset. + elseif n >= -256 and n < 256 then + return shl(band(n, 511), 12) -- Unscaled, signed 9 bit offset. 
+ end + werror("out of range immediate `"..imm.."'") + else + waction("IMML", scale, imm) + return 0 + end +end + +local function parse_fpimm(imm) + imm = match(imm, "^#(.*)$") + if not imm then werror("expected immediate operand") end + local n = parse_number(imm) + if n then + local m, e = math.frexp(n) + local s, e2 = 0, band(e-2, 7) + if m < 0 then m = -m; s = 0x00100000 end + m = m*32-16 + if m % 1 == 0 and m >= 0 and m <= 15 and sar(shl(e2, 29), 29)+2 == e then + return s + shl(e2, 17) + shl(m, 13) + end + werror("out of range immediate `"..imm.."'") + else + werror("NYI fpimm action") + end +end + +local function parse_shift(expr) + local s, s2 = match(expr, "^(%S+)%s*(.*)$") + s = map_shift[s] + if not s then werror("expected shift operand") end + return parse_imm(s2, 6, 10, 0, false) + shl(s, 22) +end + +local function parse_lslx16(expr) + local n = match(expr, "^lsl%s*#(%d+)$") + n = tonumber(n) + if not n then werror("expected shift operand") end + if band(n, parse_reg_type == "x" and 0xffffffcf or 0xffffffef) ~= 0 then + werror("bad shift amount") + end + return shl(n, 17) +end + +local function parse_extend(expr) + local s, s2 = match(expr, "^(%S+)%s*(.*)$") + if s == "lsl" then + s = parse_reg_type == "x" and 3 or 2 + else + s = map_extend[s] + end + if not s then werror("expected extend operand") end + return (s2 == "" and 0 or parse_imm(s2, 3, 10, 0, false)) + shl(s, 13) +end + +local function parse_cond(expr, inv) + local c = map_cond[expr] + if not c then werror("expected condition operand") end + return shl(bit.bxor(c, inv), 12) +end + +local function parse_load(params, nparams, n, op) + if params[n+2] then werror("too many operands") end + local scale = shr(op, 30) + local pn, p2 = params[n], params[n+1] + local p1, wb = match(pn, "^%[%s*(.-)%s*%](!?)$") + if not p1 then + if not p2 then + local reg, tailr = match(pn, "^([%w_:]+)%s*(.*)$") + if reg and tailr ~= "" then + local base, tp = parse_reg_base(reg) + if tp then + waction("IMML", scale, 
format(tp.ctypefmt, tailr)) + return op + base + end + end + end + werror("expected address operand") + end + if p2 then + if wb == "!" then werror("bad use of '!'") end + op = op + parse_reg_base(p1) + parse_imm(p2, 9, 12, 0, true) + 0x400 + elseif wb == "!" then + local p1a, p2a = match(p1, "^([^,%s]*)%s*,%s*(.*)$") + if not p1a then werror("bad use of '!'") end + op = op + parse_reg_base(p1a) + parse_imm(p2a, 9, 12, 0, true) + 0xc00 + else + local p1a, p2a = match(p1, "^([^,%s]*)%s*(.*)$") + op = op + parse_reg_base(p1a) + if p2a ~= "" then + local imm = match(p2a, "^,%s*#(.*)$") + if imm then + op = op + parse_imm_load(imm, scale) + else + local p2b, p3b, p3s = match(p2a, "^,%s*([^,%s]*)%s*,?%s*(%S*)%s*(.*)$") + op = op + parse_reg(p2b, 16) + 0x00200800 + if parse_reg_type ~= "x" and parse_reg_type ~= "w" then + werror("bad index register type") + end + if p3b == "" then + if parse_reg_type ~= "x" then werror("bad index register type") end + op = op + 0x6000 + else + if p3s == "" or p3s == "#0" then + elseif p3s == "#"..scale then + op = op + 0x1000 + else + werror("bad scale") + end + if parse_reg_type == "x" then + if p3b == "lsl" and p3s ~= "" then op = op + 0x6000 + elseif p3b == "sxtx" then op = op + 0xe000 + else + werror("bad extend/shift specifier") + end + else + if p3b == "uxtw" then op = op + 0x4000 + elseif p3b == "sxtw" then op = op + 0xc000 + else + werror("bad extend/shift specifier") + end + end + end + end + else + if wb == "!" 
then werror("bad use of '!'") end + op = op + 0x01000000 + end + end + return op +end + +local function parse_load_pair(params, nparams, n, op) + if params[n+2] then werror("too many operands") end + local pn, p2 = params[n], params[n+1] + local scale = shr(op, 30) == 0 and 2 or 3 + local p1, wb = match(pn, "^%[%s*(.-)%s*%](!?)$") + if not p1 then + if not p2 then + local reg, tailr = match(pn, "^([%w_:]+)%s*(.*)$") + if reg and tailr ~= "" then + local base, tp = parse_reg_base(reg) + if tp then + waction("IMM", 32768+7*32+15+scale*1024, format(tp.ctypefmt, tailr)) + return op + base + 0x01000000 + end + end + end + werror("expected address operand") + end + if p2 then + if wb == "!" then werror("bad use of '!'") end + op = op + 0x00800000 + else + local p1a, p2a = match(p1, "^([^,%s]*)%s*,%s*(.*)$") + if p1a then p1, p2 = p1a, p2a else p2 = "#0" end + op = op + (wb == "!" and 0x01800000 or 0x01000000) + end + return op + parse_reg_base(p1) + parse_imm(p2, 7, 15, scale, true) +end + +local function parse_label(label, def) + local prefix = label:sub(1, 2) + -- =>label (pc label reference) + if prefix == "=>" then + return "PC", 0, label:sub(3) + end + -- ->name (global label reference) + if prefix == "->" then + return "LG", map_global[label:sub(3)] + end + if def then + -- [1-9] (local label definition) + if match(label, "^[1-9]$") then + return "LG", 10+tonumber(label) + end + else + -- [<>][1-9] (local label reference) + local dir, lnum = match(label, "^([<>])([1-9])$") + if dir then -- Fwd: 1-9, Bkwd: 11-19. 
+ return "LG", lnum + (dir == ">" and 0 or 10) + end + -- extern label (extern label reference) + local extname = match(label, "^extern%s+(%S+)$") + if extname then + return "EXT", map_extern[extname] + end + -- &expr (pointer) + if label:sub(1, 1) == "&" then + return "A", 0, format("(ptrdiff_t)(%s)", label:sub(2)) + end + end +end + +local function branch_type(op) + if band(op, 0x7c000000) == 0x14000000 then return 0 -- B, BL + elseif shr(op, 24) == 0x54 or band(op, 0x7e000000) == 0x34000000 or + band(op, 0x3b000000) == 0x18000000 then + return 0x800 -- B.cond, CBZ, CBNZ, LDR* literal + elseif band(op, 0x7e000000) == 0x36000000 then return 0x1000 -- TBZ, TBNZ + elseif band(op, 0x9f000000) == 0x10000000 then return 0x2000 -- ADR + elseif band(op, 0x9f000000) == band(0x90000000) then return 0x3000 -- ADRP + else + assert(false, "unknown branch type") + end +end + +------------------------------------------------------------------------------ + +local map_op, op_template + +local function op_alias(opname, f) + return function(params, nparams) + if not params then return "-> "..opname:sub(1, -3) end + f(params, nparams) + op_template(params, map_op[opname], nparams) + end +end + +local function alias_bfx(p) + p[4] = "#("..p[3]:sub(2)..")+("..p[4]:sub(2)..")-1" +end + +local function alias_bfiz(p) + parse_reg(p[1], 0, true) + if parse_reg_type == "w" then + p[3] = "#(32-("..p[3]:sub(2).."))%32" + p[4] = "#("..p[4]:sub(2)..")-1" + else + p[3] = "#(64-("..p[3]:sub(2).."))%64" + p[4] = "#("..p[4]:sub(2)..")-1" + end +end + +local alias_lslimm = op_alias("ubfm_4", function(p) + parse_reg(p[1], 0, true) + local sh = p[3]:sub(2) + if parse_reg_type == "w" then + p[3] = "#(32-("..sh.."))%32" + p[4] = "#31-("..sh..")" + else + p[3] = "#(64-("..sh.."))%64" + p[4] = "#63-("..sh..")" + end +end) + +-- Template strings for ARM instructions. +map_op = { + -- Basic data processing instructions. 
+ add_3 = "0b000000DNMg|11000000pDpNIg|8b206000pDpNMx", + add_4 = "0b000000DNMSg|0b200000DNMXg|8b200000pDpNMXx|8b200000pDpNxMwX", + adds_3 = "2b000000DNMg|31000000DpNIg|ab206000DpNMx", + adds_4 = "2b000000DNMSg|2b200000DNMXg|ab200000DpNMXx|ab200000DpNxMwX", + cmn_2 = "2b00001fNMg|3100001fpNIg|ab20601fpNMx", + cmn_3 = "2b00001fNMSg|2b20001fNMXg|ab20001fpNMXx|ab20001fpNxMwX", + + sub_3 = "4b000000DNMg|51000000pDpNIg|cb206000pDpNMx", + sub_4 = "4b000000DNMSg|4b200000DNMXg|cb200000pDpNMXx|cb200000pDpNxMwX", + subs_3 = "6b000000DNMg|71000000DpNIg|eb206000DpNMx", + subs_4 = "6b000000DNMSg|6b200000DNMXg|eb200000DpNMXx|eb200000DpNxMwX", + cmp_2 = "6b00001fNMg|7100001fpNIg|eb20601fpNMx", + cmp_3 = "6b00001fNMSg|6b20001fNMXg|eb20001fpNMXx|eb20001fpNxMwX", + + neg_2 = "4b0003e0DMg", + neg_3 = "4b0003e0DMSg", + negs_2 = "6b0003e0DMg", + negs_3 = "6b0003e0DMSg", + + adc_3 = "1a000000DNMg", + adcs_3 = "3a000000DNMg", + sbc_3 = "5a000000DNMg", + sbcs_3 = "7a000000DNMg", + ngc_2 = "5a0003e0DMg", + ngcs_2 = "7a0003e0DMg", + + and_3 = "0a000000DNMg|12000000pDNig", + and_4 = "0a000000DNMSg", + orr_3 = "2a000000DNMg|32000000pDNig", + orr_4 = "2a000000DNMSg", + eor_3 = "4a000000DNMg|52000000pDNig", + eor_4 = "4a000000DNMSg", + ands_3 = "6a000000DNMg|72000000DNig", + ands_4 = "6a000000DNMSg", + tst_2 = "6a00001fNMg|7200001fNig", + tst_3 = "6a00001fNMSg", + + bic_3 = "0a200000DNMg", + bic_4 = "0a200000DNMSg", + orn_3 = "2a200000DNMg", + orn_4 = "2a200000DNMSg", + eon_3 = "4a200000DNMg", + eon_4 = "4a200000DNMSg", + bics_3 = "6a200000DNMg", + bics_4 = "6a200000DNMSg", + + movn_2 = "12800000DWg", + movn_3 = "12800000DWRg", + movz_2 = "52800000DWg", + movz_3 = "52800000DWRg", + movk_2 = "72800000DWg", + movk_3 = "72800000DWRg", + + -- TODO: this doesn't cover all valid immediates for mov reg, #imm. 
+ mov_2 = "2a0003e0DMg|52800000DW|320003e0pDig|11000000pDpNg", + mov_3 = "2a0003e0DMSg", + mvn_2 = "2a2003e0DMg", + mvn_3 = "2a2003e0DMSg", + + adr_2 = "10000000DBx", + adrp_2 = "90000000DBx", + + csel_4 = "1a800000DNMCg", + csinc_4 = "1a800400DNMCg", + csinv_4 = "5a800000DNMCg", + csneg_4 = "5a800400DNMCg", + cset_2 = "1a9f07e0Dcg", + csetm_2 = "5a9f03e0Dcg", + cinc_3 = "1a800400DNmcg", + cinv_3 = "5a800000DNmcg", + cneg_3 = "5a800400DNmcg", + + ccmn_4 = "3a400000NMVCg|3a400800N5VCg", + ccmp_4 = "7a400000NMVCg|7a400800N5VCg", + + madd_4 = "1b000000DNMAg", + msub_4 = "1b008000DNMAg", + mul_3 = "1b007c00DNMg", + mneg_3 = "1b00fc00DNMg", + + smaddl_4 = "9b200000DxNMwAx", + smsubl_4 = "9b208000DxNMwAx", + smull_3 = "9b207c00DxNMw", + smnegl_3 = "9b20fc00DxNMw", + smulh_3 = "9b407c00DNMx", + umaddl_4 = "9ba00000DxNMwAx", + umsubl_4 = "9ba08000DxNMwAx", + umull_3 = "9ba07c00DxNMw", + umnegl_3 = "9ba0fc00DxNMw", + umulh_3 = "9bc07c00DNMx", + + udiv_3 = "1ac00800DNMg", + sdiv_3 = "1ac00c00DNMg", + + -- Bit operations. 
+ sbfm_4 = "13000000DN12w|93400000DN12x", + bfm_4 = "33000000DN12w|b3400000DN12x", + ubfm_4 = "53000000DN12w|d3400000DN12x", + extr_4 = "13800000DNM2w|93c00000DNM2x", + + sxtb_2 = "13001c00DNw|93401c00DNx", + sxth_2 = "13003c00DNw|93403c00DNx", + sxtw_2 = "93407c00DxNw", + uxtb_2 = "53001c00DNw", + uxth_2 = "53003c00DNw", + + sbfx_4 = op_alias("sbfm_4", alias_bfx), + bfxil_4 = op_alias("bfm_4", alias_bfx), + ubfx_4 = op_alias("ubfm_4", alias_bfx), + sbfiz_4 = op_alias("sbfm_4", alias_bfiz), + bfi_4 = op_alias("bfm_4", alias_bfiz), + ubfiz_4 = op_alias("ubfm_4", alias_bfiz), + + lsl_3 = function(params, nparams) + if params and params[3]:byte() == 35 then + return alias_lslimm(params, nparams) + else + return op_template(params, "1ac02000DNMg", nparams) + end + end, + lsr_3 = "1ac02400DNMg|53007c00DN1w|d340fc00DN1x", + asr_3 = "1ac02800DNMg|13007c00DN1w|9340fc00DN1x", + ror_3 = "1ac02c00DNMg|13800000DNm2w|93c00000DNm2x", + + clz_2 = "5ac01000DNg", + cls_2 = "5ac01400DNg", + rbit_2 = "5ac00000DNg", + rev_2 = "5ac00800DNw|dac00c00DNx", + rev16_2 = "5ac00400DNg", + rev32_2 = "dac00800DNx", + + -- Loads and stores. + ["strb_*"] = "38000000DwL", + ["ldrb_*"] = "38400000DwL", + ["ldrsb_*"] = "38c00000DwL|38800000DxL", + ["strh_*"] = "78000000DwL", + ["ldrh_*"] = "78400000DwL", + ["ldrsh_*"] = "78c00000DwL|78800000DxL", + ["str_*"] = "b8000000DwL|f8000000DxL|bc000000DsL|fc000000DdL", + ["ldr_*"] = "18000000DwB|58000000DxB|1c000000DsB|5c000000DdB|b8400000DwL|f8400000DxL|bc400000DsL|fc400000DdL", + ["ldrsw_*"] = "98000000DxB|b8800000DxL", + -- NOTE: ldur etc. are handled by ldr et al. + + ["stp_*"] = "28000000DAwP|a8000000DAxP|2c000000DAsP|6c000000DAdP", + ["ldp_*"] = "28400000DAwP|a8400000DAxP|2c400000DAsP|6c400000DAdP", + ["ldpsw_*"] = "68400000DAxP", + + -- Branches. + b_1 = "14000000B", + bl_1 = "94000000B", + blr_1 = "d63f0000Nx", + br_1 = "d61f0000Nx", + ret_0 = "d65f03c0", + ret_1 = "d65f0000Nx", + -- b.cond is added below. 
+ cbz_2 = "34000000DBg", + cbnz_2 = "35000000DBg", + tbz_3 = "36000000DTBw|36000000DTBx", + tbnz_3 = "37000000DTBw|37000000DTBx", + + -- ARM64e: Pointer authentication codes (PAC). + blraaz_1 = "d63f081fNx", + braa_2 = "d71f0800NDx", + braaz_1 = "d61f081fNx", + pacibsp_0 = "d503237f", + retab_0 = "d65f0fff", + + -- Miscellaneous instructions. + -- TODO: hlt, hvc, smc, svc, eret, dcps[123], drps, mrs, msr + -- TODO: sys, sysl, ic, dc, at, tlbi + -- TODO: hint, yield, wfe, wfi, sev, sevl + -- TODO: clrex, dsb, dmb, isb + nop_0 = "d503201f", + brk_0 = "d4200000", + brk_1 = "d4200000W", + + -- Floating point instructions. + fmov_2 = "1e204000DNf|1e260000DwNs|1e270000DsNw|9e660000DxNd|9e670000DdNx|1e201000DFf", + fabs_2 = "1e20c000DNf", + fneg_2 = "1e214000DNf", + fsqrt_2 = "1e21c000DNf", + + fcvt_2 = "1e22c000DdNs|1e624000DsNd", + + -- TODO: half-precision and fixed-point conversions. + fcvtas_2 = "1e240000DwNs|9e240000DxNs|1e640000DwNd|9e640000DxNd", + fcvtau_2 = "1e250000DwNs|9e250000DxNs|1e650000DwNd|9e650000DxNd", + fcvtms_2 = "1e300000DwNs|9e300000DxNs|1e700000DwNd|9e700000DxNd", + fcvtmu_2 = "1e310000DwNs|9e310000DxNs|1e710000DwNd|9e710000DxNd", + fcvtns_2 = "1e200000DwNs|9e200000DxNs|1e600000DwNd|9e600000DxNd", + fcvtnu_2 = "1e210000DwNs|9e210000DxNs|1e610000DwNd|9e610000DxNd", + fcvtps_2 = "1e280000DwNs|9e280000DxNs|1e680000DwNd|9e680000DxNd", + fcvtpu_2 = "1e290000DwNs|9e290000DxNs|1e690000DwNd|9e690000DxNd", + fcvtzs_2 = "1e380000DwNs|9e380000DxNs|1e780000DwNd|9e780000DxNd", + fcvtzu_2 = "1e390000DwNs|9e390000DxNs|1e790000DwNd|9e790000DxNd", + + scvtf_2 = "1e220000DsNw|9e220000DsNx|1e620000DdNw|9e620000DdNx", + ucvtf_2 = "1e230000DsNw|9e230000DsNx|1e630000DdNw|9e630000DdNx", + + frintn_2 = "1e244000DNf", + frintp_2 = "1e24c000DNf", + frintm_2 = "1e254000DNf", + frintz_2 = "1e25c000DNf", + frinta_2 = "1e264000DNf", + frintx_2 = "1e274000DNf", + frinti_2 = "1e27c000DNf", + + fadd_3 = "1e202800DNMf", + fsub_3 = "1e203800DNMf", + fmul_3 = "1e200800DNMf", + 
fnmul_3 = "1e208800DNMf", + fdiv_3 = "1e201800DNMf", + + fmadd_4 = "1f000000DNMAf", + fmsub_4 = "1f008000DNMAf", + fnmadd_4 = "1f200000DNMAf", + fnmsub_4 = "1f208000DNMAf", + + fmax_3 = "1e204800DNMf", + fmaxnm_3 = "1e206800DNMf", + fmin_3 = "1e205800DNMf", + fminnm_3 = "1e207800DNMf", + + fcmp_2 = "1e202000NMf|1e202008NZf", + fcmpe_2 = "1e202010NMf|1e202018NZf", + + fccmp_4 = "1e200400NMVCf", + fccmpe_4 = "1e200410NMVCf", + + fcsel_4 = "1e200c00DNMCf", + + -- TODO: crc32*, aes*, sha*, pmull + -- TODO: SIMD instructions. +} + +for cond,c in pairs(map_cond) do + map_op["b"..cond.."_1"] = tohex(0x54000000+c).."B" +end + +------------------------------------------------------------------------------ + +-- Handle opcodes defined with template strings. +local function parse_template(params, template, nparams, pos) + local op = tonumber(template:sub(1, 8), 16) + local n = 1 + local rtt = {} + + parse_reg_type = false + + -- Process each character. + for p in gmatch(template:sub(9), ".") do + local q = params[n] + if p == "D" then + op = op + parse_reg(q, 0); n = n + 1 + elseif p == "N" then + op = op + parse_reg(q, 5); n = n + 1 + elseif p == "M" then + op = op + parse_reg(q, 16); n = n + 1 + elseif p == "A" then + op = op + parse_reg(q, 10); n = n + 1 + elseif p == "m" then + op = op + parse_reg(params[n-1], 16) + + elseif p == "p" then + if q == "sp" then params[n] = "@x31" end + elseif p == "g" then + if parse_reg_type == "x" then + op = op + 0x80000000 + elseif parse_reg_type ~= "w" then + werror("bad register type") + end + parse_reg_type = false + elseif p == "f" then + if parse_reg_type == "d" then + op = op + 0x00400000 + elseif parse_reg_type ~= "s" then + werror("bad register type") + end + parse_reg_type = false + elseif p == "x" or p == "w" or p == "d" or p == "s" then + if parse_reg_type ~= p then + werror("register size mismatch") + end + parse_reg_type = false + + elseif p == "L" then + op = parse_load(params, nparams, n, op) + elseif p == "P" then + op = 
parse_load_pair(params, nparams, n, op) + + elseif p == "B" then + local mode, v, s = parse_label(q, false); n = n + 1 + if not mode then werror("bad label `"..q.."'") end + local m = branch_type(op) + if mode == "A" then + waction("REL_"..mode, v+m, format("(unsigned int)(%s)", s)) + actargs[#actargs+1] = format("(unsigned int)((%s)>>32)", s) + else + waction("REL_"..mode, v+m, s, 1) + end + + elseif p == "I" then + op = op + parse_imm12(q); n = n + 1 + elseif p == "i" then + op = op + parse_imm13(q); n = n + 1 + elseif p == "W" then + op = op + parse_imm(q, 16, 5, 0, false); n = n + 1 + elseif p == "T" then + op = op + parse_imm6(q); n = n + 1 + elseif p == "1" then + op = op + parse_imm(q, 6, 16, 0, false); n = n + 1 + elseif p == "2" then + op = op + parse_imm(q, 6, 10, 0, false); n = n + 1 + elseif p == "5" then + op = op + parse_imm(q, 5, 16, 0, false); n = n + 1 + elseif p == "V" then + op = op + parse_imm(q, 4, 0, 0, false); n = n + 1 + elseif p == "F" then + op = op + parse_fpimm(q); n = n + 1 + elseif p == "Z" then + if q ~= "#0" and q ~= "#0.0" then werror("expected zero immediate") end + n = n + 1 + + elseif p == "S" then + op = op + parse_shift(q); n = n + 1 + elseif p == "X" then + op = op + parse_extend(q); n = n + 1 + elseif p == "R" then + op = op + parse_lslx16(q); n = n + 1 + elseif p == "C" then + op = op + parse_cond(q, 0); n = n + 1 + elseif p == "c" then + op = op + parse_cond(q, 1); n = n + 1 + + else + assert(false) + end + end + wputpos(pos, op) +end + +function op_template(params, template, nparams) + if not params then return template:gsub("%x%x%x%x%x%x%x%x", "") end + + -- Limit number of section buffer positions used by a single dasm_put(). + -- A single opcode needs a maximum of 4 positions. 
+ if secpos+4 > maxsecpos then wflush() end + local pos = wpos() + local lpos, apos, spos = #actlist, #actargs, secpos + + local ok, err + for t in gmatch(template, "[^|]+") do + ok, err = pcall(parse_template, params, t, nparams, pos) + if ok then return end + secpos = spos + actlist[lpos+1] = nil + actlist[lpos+2] = nil + actlist[lpos+3] = nil + actlist[lpos+4] = nil + actargs[apos+1] = nil + actargs[apos+2] = nil + actargs[apos+3] = nil + actargs[apos+4] = nil + end + error(err, 0) +end + +map_op[".template__"] = op_template + +------------------------------------------------------------------------------ + +-- Pseudo-opcode to mark the position where the action list is to be emitted. +map_op[".actionlist_1"] = function(params) + if not params then return "cvar" end + local name = params[1] -- No syntax check. You get to keep the pieces. + wline(function(out) writeactions(out, name) end) +end + +-- Pseudo-opcode to mark the position where the global enum is to be emitted. +map_op[".globals_1"] = function(params) + if not params then return "prefix" end + local prefix = params[1] -- No syntax check. You get to keep the pieces. + wline(function(out) writeglobals(out, prefix) end) +end + +-- Pseudo-opcode to mark the position where the global names are to be emitted. +map_op[".globalnames_1"] = function(params) + if not params then return "cvar" end + local name = params[1] -- No syntax check. You get to keep the pieces. + wline(function(out) writeglobalnames(out, name) end) +end + +-- Pseudo-opcode to mark the position where the extern names are to be emitted. +map_op[".externnames_1"] = function(params) + if not params then return "cvar" end + local name = params[1] -- No syntax check. You get to keep the pieces. + wline(function(out) writeexternnames(out, name) end) +end + +------------------------------------------------------------------------------ + +-- Label pseudo-opcode (converted from trailing colon form). 
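The `map_op` template strings above start with eight hex digits holding the base opcode. The operand letters after that are consumed in order by `parse_template`. For the register-only letters, `D`, `N`, `A` and `M` place a register number at bits 0, 5, 10 and 16, and `g` sets bit 31 when the operands are 64-bit `x` registers.

The following is a minimal C sketch of just those five letters. It assumes the register operands have already been reduced to numbers. The function name `encode_template` is hypothetical and not part of DynASM. The sketch ORs each field into place where the Lua code adds it. That gives the same result for these templates, because the base opcodes keep those fields zero.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: encode a register-only AArch64 template such as
 * "0b000000DNMg". regs[] holds register numbers in operand order and
 * is64 selects x (64-bit) versus w (32-bit) registers.
 * Returns 0 for any operand letter this sketch does not handle. */
uint32_t encode_template(const char *tmpl, const int *regs, int is64)
{
  char hex[9];
  uint32_t op;
  const char *p;
  int n = 0;

  if (strlen(tmpl) < 8) return 0;
  memcpy(hex, tmpl, 8);
  hex[8] = '\0';
  op = (uint32_t)strtoul(hex, NULL, 16);  /* Base opcode from the first 8 hex digits. */

  for (p = tmpl + 8; *p; p++) {
    switch (*p) {
    case 'D': op |= ((uint32_t)regs[n++] & 31u);       break;  /* Rd, bits 0-4 */
    case 'N': op |= ((uint32_t)regs[n++] & 31u) << 5;  break;  /* Rn, bits 5-9 */
    case 'A': op |= ((uint32_t)regs[n++] & 31u) << 10; break;  /* Ra, bits 10-14 */
    case 'M': op |= ((uint32_t)regs[n++] & 31u) << 16; break;  /* Rm, bits 16-20 */
    case 'g': if (is64) op |= 0x80000000u;             break;  /* sf bit for x registers */
    default:  return 0;  /* Immediates, shifts, labels, etc. are out of scope here. */
    }
  }
  return op;
}
```

For example, with registers 0, 1 and 2, `encode_template("0b000000DNMg", regs, 0)` yields 0x0b020020, which is `add w0, w1, w2`. Passing `is64 = 1` yields 0x8b020020, which is `add x0, x1, x2`.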
+map_op[".label_1"] = function(params) + if not params then return "[1-9] | ->global | =>pcexpr" end + if secpos+1 > maxsecpos then wflush() end + local mode, n, s = parse_label(params[1], true) + if not mode or mode == "EXT" then werror("bad label definition") end + waction("LABEL_"..mode, n, s, 1) +end + +------------------------------------------------------------------------------ + +-- Pseudo-opcodes for data storage. +local function op_data(params) + if not params then return "imm..." end + local sz = params.op == ".long" and 4 or 8 + for _,p in ipairs(params) do + local imm = parse_number(p) + if imm then + local n = tobit(imm) + if n == imm or (n < 0 and n + 2^32 == imm) then + wputw(n < 0 and n + 2^32 or n) + if sz == 8 then + wputw(imm < 0 and 0xffffffff or 0) + end + elseif sz == 4 then + werror("bad immediate `"..p.."'") + else + imm = nil + end + end + if not imm then + local mode, v, s = parse_label(p, false) + if sz == 4 then + if mode then werror("label does not fit into .long") end + waction("IMMV", 0, p) + elseif mode and mode ~= "A" then + waction("REL_"..mode, v+0x8000, s, 1) + else + if mode == "A" then p = s end + waction("IMMV", 0, format("(unsigned int)(%s)", p)) + waction("IMMV", 0, format("(unsigned int)((unsigned long long)(%s)>>32)", p)) + end + end + if secpos+2 > maxsecpos then wflush() end + end +end +map_op[".long_*"] = op_data +map_op[".quad_*"] = op_data +map_op[".addr_*"] = op_data + +-- Alignment pseudo-opcode. +map_op[".align_1"] = function(params) + if not params then return "numpow2" end + if secpos+1 > maxsecpos then wflush() end + local align = tonumber(params[1]) + if align then + local x = align + -- Must be a power of 2 in the range (2 ... 256). + for i=1,8 do + x = x / 2 + if x == 1 then + waction("ALIGN", align-1, nil, 1) -- Action byte is 2**n-1. 
+ return + end + end + end + werror("bad alignment") +end + +------------------------------------------------------------------------------ + +-- Pseudo-opcode for (primitive) type definitions (map to C types). +map_op[".type_3"] = function(params, nparams) + if not params then + return nparams == 2 and "name, ctype" or "name, ctype, reg" + end + local name, ctype, reg = params[1], params[2], params[3] + if not match(name, "^[%a_][%w_]*$") then + werror("bad type name `"..name.."'") + end + local tp = map_type[name] + if tp then + werror("duplicate type `"..name.."'") + end + -- Add #type to defines. A bit unclean to put it in map_archdef. + map_archdef["#"..name] = "sizeof("..ctype..")" + -- Add new type and emit shortcut define. + local num = ctypenum + 1 + map_type[name] = { + ctype = ctype, + ctypefmt = format("Dt%X(%%s)", num), + reg = reg, + } + wline(format("#define Dt%X(_V) (int)(ptrdiff_t)&(((%s *)0)_V)", num, ctype)) + ctypenum = num +end +map_op[".type_2"] = map_op[".type_3"] + +-- Dump type definitions. +local function dumptypes(out, lvl) + local t = {} + for name in pairs(map_type) do t[#t+1] = name end + sort(t) + out:write("Type definitions:\n") + for _,name in ipairs(t) do + local tp = map_type[name] + local reg = tp.reg or "" + out:write(format(" %-20s %-20s %s\n", name, tp.ctype, reg)) + end + out:write("\n") +end + +------------------------------------------------------------------------------ + +-- Set the current section. +function _M.section(num) + waction("SECTION", num) + wflush(true) -- SECTION is a terminal action. +end + +------------------------------------------------------------------------------ + +-- Dump architecture description. +function _M.dumparch(out) + out:write(format("DynASM %s version %s, released %s\n\n", + _info.arch, _info.version, _info.release)) + dumpactions(out) +end + +-- Dump all user defined elements. 
+function _M.dumpdef(out, lvl) + dumptypes(out, lvl) + dumpglobals(out, lvl) + dumpexterns(out, lvl) +end + +------------------------------------------------------------------------------ + +-- Pass callbacks from/to the DynASM core. +function _M.passcb(wl, we, wf, ww) + wline, werror, wfatal, wwarn = wl, we, wf, ww + return wflush +end + +-- Setup the arch-specific module. +function _M.setup(arch, opt) + g_arch, g_opt = arch, opt +end + +-- Merge the core maps and the arch-specific maps. +function _M.mergemaps(map_coreop, map_def) + setmetatable(map_op, { __index = map_coreop }) + setmetatable(map_def, { __index = map_archdef }) + return map_op, map_def +end + +return _M + +------------------------------------------------------------------------------ + diff --cc dynasm/dasm_mips.lua index 6f893fe0,e2ff17f0..1c605b68 --- a/dynasm/dasm_mips.lua +++ b/dynasm/dasm_mips.lua @@@ -1,7 -1,7 +1,7 @@@ ------------------------------------------------------------------------------ --- DynASM MIPS module. +-- DynASM MIPS32/MIPS64 module. -- - -- Copyright (C) 2005-2022 Mike Pall. All rights reserved. + -- Copyright (C) 2005-2023 Mike Pall. All rights reserved. -- See dynasm.lua for full copyright notice. ------------------------------------------------------------------------------ diff --cc dynasm/dasm_mips64.lua index b4f8707d,00000000..c97d666b mode 100644,000000..100644 --- a/dynasm/dasm_mips64.lua +++ b/dynasm/dasm_mips64.lua @@@ -1,12 -1,0 +1,12 @@@ +------------------------------------------------------------------------------ +-- DynASM MIPS64 module. +-- - -- Copyright (C) 2005-2022 Mike Pall. All rights reserved. ++-- Copyright (C) 2005-2023 Mike Pall. All rights reserved. +-- See dynasm.lua for full copyright notice. +------------------------------------------------------------------------------ +-- This module just sets 64 bit mode for the combined MIPS/MIPS64 module. +-- All the interesting stuff is there. 
+------------------------------------------------------------------------------ + +mips64 = true -- Using a global is an ugly, but effective solution. +return require("dasm_mips") diff --cc dynasm/dasm_ppc.h index 14db019d,e2d6f1fc..4c7d7289 --- a/dynasm/dasm_ppc.h +++ b/dynasm/dasm_ppc.h @@@ -1,6 -1,6 +1,6 @@@ /* -** DynASM PPC encoding engine. +** DynASM PPC/PPC64 encoding engine. - ** Copyright (C) 2005-2022 Mike Pall. All rights reserved. + ** Copyright (C) 2005-2023 Mike Pall. All rights reserved. ** Released under the MIT license. See dynasm.lua for full copyright notice. */ diff --cc dynasm/dasm_ppc.lua index 3624e882,b4f5cea4..d66ae4a0 --- a/dynasm/dasm_ppc.lua +++ b/dynasm/dasm_ppc.lua @@@ -1,10 -1,8 +1,10 @@@ ------------------------------------------------------------------------------ --- DynASM PPC module. +-- DynASM PPC/PPC64 module. -- - -- Copyright (C) 2005-2022 Mike Pall. All rights reserved. + -- Copyright (C) 2005-2023 Mike Pall. All rights reserved. -- See dynasm.lua for full copyright notice. +-- +-- Support for various extensions contributed by Caio Souza Oliveira. ------------------------------------------------------------------------------ -- Module information: diff --cc src/host/genlibbc.lua index ba18812c,00000000..3621c3f5 mode 100644,000000..100644 --- a/src/host/genlibbc.lua +++ b/src/host/genlibbc.lua @@@ -1,225 -1,0 +1,225 @@@ +---------------------------------------------------------------------------- +-- Lua script to dump the bytecode of the library functions written in Lua. +-- The resulting 'buildvm_libbc.h' is used for the build process of LuaJIT. +---------------------------------------------------------------------------- - -- Copyright (C) 2005-2022 Mike Pall. All rights reserved. ++-- Copyright (C) 2005-2023 Mike Pall. All rights reserved. +-- Released under the MIT license. 
See Copyright Notice in luajit.h +---------------------------------------------------------------------------- + +local ffi = require("ffi") +local bit = require("bit") +local vmdef = require("jit.vmdef") +local bcnames = vmdef.bcnames + +local format = string.format + +local isbe = (string.byte(string.dump(function() end), 5) % 2 == 1) + +local function usage(arg) + io.stderr:write("Usage: ", arg and arg[0] or "genlibbc", + " [-o buildvm_libbc.h] lib_*.c\n") + os.exit(1) +end + +local function parse_arg(arg) + local outfile = "-" + if not (arg and arg[1]) then + usage(arg) + end + if arg[1] == "-o" then + outfile = arg[2] + if not outfile then usage(arg) end + table.remove(arg, 1) + table.remove(arg, 1) + end + return outfile +end + +local function read_files(names) + local src = "" + for _,name in ipairs(names) do + local fp = assert(io.open(name)) + src = src .. fp:read("*a") + fp:close() + end + return src +end + +local function transform_lua(code) + local fixup = {} + local n = -30000 + code = string.gsub(code, "CHECK_(%w*)%((.-)%)", function(tp, var) + n = n + 1 + fixup[n] = { "CHECK", tp } + return format("%s=%d", var, n) + end) + code = string.gsub(code, "PAIRS%((.-)%)", function(var) + fixup.PAIRS = true + return format("nil, %s, 0x4dp80", var) + end) + return "return "..code, fixup +end + +local function read_uleb128(p) + local v = p[0]; p = p + 1 + if v >= 128 then + local sh = 7; v = v - 128 + repeat + local r = p[0] + v = v + bit.lshift(bit.band(r, 127), sh) + sh = sh + 7 + p = p + 1 + until r < 128 + end + return p, v +end + +-- ORDER LJ_T +local name2itype = { + str = 5, func = 9, tab = 12, int = 14, num = 15 +} + +local BC, BCN = {}, {} +for i=0,#bcnames/6-1 do + local name = bcnames:sub(i*6+1, i*6+6):gsub(" ", "") + BC[name] = i + BCN[i] = name +end +local xop, xra = isbe and 3 or 0, isbe and 2 or 1 +local xrc, xrb = isbe and 1 or 2, isbe and 0 or 3 + +local function fixup_dump(dump, fixup) + local buf = ffi.new("uint8_t[?]", #dump+1, dump) + local 
p = buf+5 + local n, sizebc + p, n = read_uleb128(p) + local start = p + p = p + 4 + p = read_uleb128(p) + p = read_uleb128(p) + p, sizebc = read_uleb128(p) + local startbc = tonumber(p - start) + local rawtab = {} + for i=0,sizebc-1 do + local op = p[xop] + if op == BC.KSHORT then + local rd = p[xrc] + 256*p[xrb] + rd = bit.arshift(bit.lshift(rd, 16), 16) + local f = fixup[rd] + if f then + if f[1] == "CHECK" then + local tp = f[2] + if tp == "tab" then rawtab[p[xra]] = true end + p[xop] = tp == "num" and BC.ISNUM or BC.ISTYPE + p[xrb] = 0 + p[xrc] = name2itype[tp] + else + error("unhandled fixup type: "..f[1]) + end + end + elseif op == BC.TGETV then + if rawtab[p[xrb]] then + p[xop] = BC.TGETR + end + elseif op == BC.TSETV then + if rawtab[p[xrb]] then + p[xop] = BC.TSETR + end + elseif op == BC.ITERC then + if fixup.PAIRS then + p[xop] = BC.ITERN + end + end + p = p + 4 + end + local ndump = ffi.string(start, n) + -- Fixup hi-part of 0x4dp80 to LJ_KEYINDEX. + ndump = ndump:gsub("\x80\x80\xcd\xaa\x04", "\xff\xff\xf9\xff\x0f") + return { dump = ndump, startbc = startbc, sizebc = sizebc } +end + +local function find_defs(src) + local defs = {} + for name, code in string.gmatch(src, "LJLIB_LUA%(([^)]*)%)%s*/%*(.-)%*/") do + local env = {} + local tcode, fixup = transform_lua(code) + local func = assert(load(tcode, "", nil, env))() + defs[name] = fixup_dump(string.dump(func, true), fixup) + defs[#defs+1] = name + end + return defs +end + +local function gen_header(defs) + local t = {} + local function w(x) t[#t+1] = x end + w("/* This is a generated file. DO NOT EDIT! */\n\n") + w("static const int libbc_endian = ") w(isbe and 1 or 0) w(";\n\n") + local s, sb = "", "" + for i,name in ipairs(defs) do + local d = defs[name] + s = s .. d.dump + sb = sb .. string.char(i) .. ("\0"):rep(d.startbc - 1) + .. (isbe and "\0\0\0\255" or "\255\0\0\0"):rep(d.sizebc) + .. 
("\0"):rep(#d.dump - d.startbc - d.sizebc*4) + end + w("static const uint8_t libbc_code[] = {\n") + local n = 0 + for i=1,#s do + local x = string.byte(s, i) + local xb = string.byte(sb, i) + if xb == 255 then + local name = BCN[x] + local m = #name + 4 + if n + m > 78 then n = 0; w("\n") end + n = n + m + w("BC_"); w(name) + else + local m = x < 10 and 2 or (x < 100 and 3 or 4) + if xb == 0 then + if n + m > 78 then n = 0; w("\n") end + else + local name = defs[xb]:gsub("_", ".") + if n ~= 0 then w("\n") end + w("/* "); w(name); w(" */ ") + n = #name + 7 + end + n = n + m + w(x) + end + w(",") + end + w("\n0\n};\n\n") + w("static const struct { const char *name; int ofs; } libbc_map[] = {\n") + local m = 0 + for _,name in ipairs(defs) do + w('{"'); w(name); w('",'); w(m) w('},\n') + m = m + #defs[name].dump + end + w("{NULL,"); w(m); w("}\n};\n\n") + return table.concat(t) +end + +local function write_file(name, data) + if name == "-" then + assert(io.write(data)) + assert(io.flush()) + else + local fp = io.open(name) + if fp then + local old = fp:read("*a") + fp:close() + if data == old then return end + end + fp = assert(io.open(name, "w")) + assert(fp:write(data)) + assert(fp:close()) + end +end + +local outfile = parse_arg(arg) +local src = read_files(arg) +local defs = find_defs(src) +local hdr = gen_header(defs) +write_file(outfile, hdr) + diff --cc src/jit/dis_arm64.lua index 531584a1,00000000..b10e2fb1 mode 100644,000000..100644 --- a/src/jit/dis_arm64.lua +++ b/src/jit/dis_arm64.lua @@@ -1,1216 -1,0 +1,1216 @@@ +---------------------------------------------------------------------------- +-- LuaJIT ARM64 disassembler module. +-- - -- Copyright (C) 2005-2022 Mike Pall. All rights reserved. ++-- Copyright (C) 2005-2023 Mike Pall. All rights reserved. +-- Released under the MIT license. See Copyright Notice in luajit.h +-- +-- Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com. +-- Sponsored by Cisco Systems, Inc. 
+---------------------------------------------------------------------------- +-- This is a helper module used by the LuaJIT machine code dumper module. +-- +-- It disassembles most user-mode AArch64 instructions. +-- NYI: Advanced SIMD and VFP instructions. +------------------------------------------------------------------------------ + +local type = type +local sub, byte, format = string.sub, string.byte, string.format +local match, gmatch, gsub = string.match, string.gmatch, string.gsub +local concat = table.concat +local bit = require("bit") +local band, bor, bxor, tohex = bit.band, bit.bor, bit.bxor, bit.tohex +local lshift, rshift, arshift = bit.lshift, bit.rshift, bit.arshift +local ror = bit.ror + +------------------------------------------------------------------------------ +-- Opcode maps +------------------------------------------------------------------------------ + +local map_adr = { -- PC-relative addressing. + shift = 31, mask = 1, + [0] = "adrDBx", "adrpDBx" +} + +local map_addsubi = { -- Add/subtract immediate. + shift = 29, mask = 3, + [0] = "add|movDNIg", "adds|cmnD0NIg", "subDNIg", "subs|cmpD0NIg", +} + +local map_logi = { -- Logical immediate. + shift = 31, mask = 1, + [0] = { + shift = 22, mask = 1, + [0] = { + shift = 29, mask = 3, + [0] = "andDNig", "orr|movDN0ig", "eorDNig", "ands|tstD0Nig" + }, + false -- unallocated + }, + { + shift = 29, mask = 3, + [0] = "andDNig", "orr|movDN0ig", "eorDNig", "ands|tstD0Nig" + } +} + +local map_movwi = { -- Move wide immediate. + shift = 31, mask = 1, + [0] = { + shift = 22, mask = 1, + [0] = { + shift = 29, mask = 3, + [0] = "movnDWRg", false, "movz|movDYRg", "movkDWRg" + }, false -- unallocated + }, + { + shift = 29, mask = 3, + [0] = "movnDWRg", false, "movz|movDYRg", "movkDWRg" + }, +} + +local map_bitf = { -- Bitfield. 
+ shift = 31, mask = 1, + [0] = { + shift = 22, mask = 1, + [0] = { + shift = 29, mask = 3, + [0] = "sbfm|sbfiz|sbfx|asr|sxtw|sxth|sxtbDN12w", + "bfm|bfi|bfxilDN13w", + "ubfm|ubfiz|ubfx|lsr|lsl|uxth|uxtbDN12w" + } + }, + { + shift = 22, mask = 1, + { + shift = 29, mask = 3, + [0] = "sbfm|sbfiz|sbfx|asr|sxtw|sxth|sxtbDN12x", + "bfm|bfi|bfxilDN13x", + "ubfm|ubfiz|ubfx|lsr|lsl|uxth|uxtbDN12x" + } + } +} + +local map_datai = { -- Data processing - immediate. + shift = 23, mask = 7, + [0] = map_adr, map_adr, map_addsubi, false, + map_logi, map_movwi, map_bitf, + { + shift = 15, mask = 0x1c0c1, + [0] = "extr|rorDNM4w", [0x10080] = "extr|rorDNM4x", + [0x10081] = "extr|rorDNM4x" + } +} + +local map_logsr = { -- Logical, shifted register. + shift = 31, mask = 1, + [0] = { + shift = 15, mask = 1, + [0] = { + shift = 29, mask = 3, + [0] = { + shift = 21, mask = 7, + [0] = "andDNMSg", "bicDNMSg", "andDNMSg", "bicDNMSg", + "andDNMSg", "bicDNMSg", "andDNMg", "bicDNMg" + }, + { + shift = 21, mask = 7, + [0] ="orr|movDN0MSg", "orn|mvnDN0MSg", "orr|movDN0MSg", "orn|mvnDN0MSg", + "orr|movDN0MSg", "orn|mvnDN0MSg", "orr|movDN0Mg", "orn|mvnDN0Mg" + }, + { + shift = 21, mask = 7, + [0] = "eorDNMSg", "eonDNMSg", "eorDNMSg", "eonDNMSg", + "eorDNMSg", "eonDNMSg", "eorDNMg", "eonDNMg" + }, + { + shift = 21, mask = 7, + [0] = "ands|tstD0NMSg", "bicsDNMSg", "ands|tstD0NMSg", "bicsDNMSg", + "ands|tstD0NMSg", "bicsDNMSg", "ands|tstD0NMg", "bicsDNMg" + } + }, + false -- unallocated + }, + { + shift = 29, mask = 3, + [0] = { + shift = 21, mask = 7, + [0] = "andDNMSg", "bicDNMSg", "andDNMSg", "bicDNMSg", + "andDNMSg", "bicDNMSg", "andDNMg", "bicDNMg" + }, + { + shift = 21, mask = 7, + [0] = "orr|movDN0MSg", "orn|mvnDN0MSg", "orr|movDN0MSg", "orn|mvnDN0MSg", + "orr|movDN0MSg", "orn|mvnDN0MSg", "orr|movDN0Mg", "orn|mvnDN0Mg" + }, + { + shift = 21, mask = 7, + [0] = "eorDNMSg", "eonDNMSg", "eorDNMSg", "eonDNMSg", + "eorDNMSg", "eonDNMSg", "eorDNMg", "eonDNMg" + }, + { + shift = 21, mask = 7, + [0] = 
"ands|tstD0NMSg", "bicsDNMSg", "ands|tstD0NMSg", "bicsDNMSg", + "ands|tstD0NMSg", "bicsDNMSg", "ands|tstD0NMg", "bicsDNMg" + } + } +} + +local map_assh = { + shift = 31, mask = 1, + [0] = { + shift = 15, mask = 1, + [0] = { + shift = 29, mask = 3, + [0] = { + shift = 22, mask = 3, + [0] = "addDNMSg", "addDNMSg", "addDNMSg", "addDNMg" + }, + { + shift = 22, mask = 3, + [0] = "adds|cmnD0NMSg", "adds|cmnD0NMSg", + "adds|cmnD0NMSg", "adds|cmnD0NMg" + }, + { + shift = 22, mask = 3, + [0] = "sub|negDN0MSg", "sub|negDN0MSg", "sub|negDN0MSg", "sub|negDN0Mg" + }, + { + shift = 22, mask = 3, + [0] = "subs|cmp|negsD0N0MzSg", "subs|cmp|negsD0N0MzSg", + "subs|cmp|negsD0N0MzSg", "subs|cmp|negsD0N0Mzg" + }, + }, + false -- unallocated + }, + { + shift = 29, mask = 3, + [0] = { + shift = 22, mask = 3, + [0] = "addDNMSg", "addDNMSg", "addDNMSg", "addDNMg" + }, + { + shift = 22, mask = 3, + [0] = "adds|cmnD0NMSg", "adds|cmnD0NMSg", "adds|cmnD0NMSg", + "adds|cmnD0NMg" + }, + { + shift = 22, mask = 3, + [0] = "sub|negDN0MSg", "sub|negDN0MSg", "sub|negDN0MSg", "sub|negDN0Mg" + }, + { + shift = 22, mask = 3, + [0] = "subs|cmp|negsD0N0MzSg", "subs|cmp|negsD0N0MzSg", + "subs|cmp|negsD0N0MzSg", "subs|cmp|negsD0N0Mzg" + } + } +} + +local map_addsubsh = { -- Add/subtract, shifted register. + shift = 22, mask = 3, + [0] = map_assh, map_assh, map_assh +} + +local map_addsubex = { -- Add/subtract, extended register. + shift = 22, mask = 3, + [0] = { + shift = 29, mask = 3, + [0] = "addDNMXg", "adds|cmnD0NMXg", "subDNMXg", "subs|cmpD0NMzXg", + } +} + +local map_addsubc = { -- Add/subtract, with carry. + shift = 10, mask = 63, + [0] = { + shift = 29, mask = 3, + [0] = "adcDNMg", "adcsDNMg", "sbc|ngcDN0Mg", "sbcs|ngcsDN0Mg", + } +} + +local map_ccomp = { + shift = 4, mask = 1, + [0] = { + shift = 10, mask = 3, + [0] = { -- Conditional compare register. + shift = 29, mask = 3, + "ccmnNMVCg", false, "ccmpNMVCg", + }, + [2] = { -- Conditional compare immediate. 
+ shift = 29, mask = 3, + "ccmnN5VCg", false, "ccmpN5VCg", + } + } +} + +local map_csel = { -- Conditional select. + shift = 11, mask = 1, + [0] = { + shift = 10, mask = 1, + [0] = { + shift = 29, mask = 3, + [0] = "cselDNMzCg", false, "csinv|cinv|csetmDNMcg", false, + }, + { + shift = 29, mask = 3, + [0] = "csinc|cinc|csetDNMcg", false, "csneg|cnegDNMcg", false, + } + } +} + +local map_data1s = { -- Data processing, 1 source. + shift = 29, mask = 1, + [0] = { + shift = 31, mask = 1, + [0] = { + shift = 10, mask = 0x7ff, + [0] = "rbitDNg", "rev16DNg", "revDNw", false, "clzDNg", "clsDNg" + }, + { + shift = 10, mask = 0x7ff, + [0] = "rbitDNg", "rev16DNg", "rev32DNx", "revDNx", "clzDNg", "clsDNg" + } + } +} + +local map_data2s = { -- Data processing, 2 sources. + shift = 29, mask = 1, + [0] = { + shift = 10, mask = 63, + false, "udivDNMg", "sdivDNMg", false, false, false, false, "lslDNMg", + "lsrDNMg", "asrDNMg", "rorDNMg" + } +} + +local map_data3s = { -- Data processing, 3 sources. + shift = 29, mask = 7, + [0] = { + shift = 21, mask = 7, + [0] = { + shift = 15, mask = 1, + [0] = "madd|mulDNMA0g", "msub|mnegDNMA0g" + } + }, false, false, false, + { + shift = 15, mask = 1, + [0] = { + shift = 21, mask = 7, + [0] = "madd|mulDNMA0g", "smaddl|smullDxNMwA0x", "smulhDNMx", false, + false, "umaddl|umullDxNMwA0x", "umulhDNMx" + }, + { + shift = 21, mask = 7, + [0] = "msub|mnegDNMA0g", "smsubl|smneglDxNMwA0x", false, false, + false, "umsubl|umneglDxNMwA0x" + } + } +} + +local map_datar = { -- Data processing, register. 
+ shift = 28, mask = 1, + [0] = { + shift = 24, mask = 1, + [0] = map_logsr, + { + shift = 21, mask = 1, + [0] = map_addsubsh, map_addsubex + } + }, + { + shift = 21, mask = 15, + [0] = map_addsubc, false, map_ccomp, false, map_csel, false, + { + shift = 30, mask = 1, + [0] = map_data2s, map_data1s + }, + false, map_data3s, map_data3s, map_data3s, map_data3s, map_data3s, + map_data3s, map_data3s, map_data3s + } +} + +local map_lrl = { -- Load register, literal. + shift = 26, mask = 1, + [0] = { + shift = 30, mask = 3, + [0] = "ldrDwB", "ldrDxB", "ldrswDxB" + }, + { + shift = 30, mask = 3, + [0] = "ldrDsB", "ldrDdB" + } +} + +local map_lsriind = { -- Load/store register, immediate pre/post-indexed. + shift = 30, mask = 3, + [0] = { + shift = 26, mask = 1, + [0] = { + shift = 22, mask = 3, + [0] = "strbDwzL", "ldrbDwzL", "ldrsbDxzL", "ldrsbDwzL" + } + }, + { + shift = 26, mask = 1, + [0] = { + shift = 22, mask = 3, + [0] = "strhDwzL", "ldrhDwzL", "ldrshDxzL", "ldrshDwzL" + } + }, + { + shift = 26, mask = 1, + [0] = { + shift = 22, mask = 3, + [0] = "strDwzL", "ldrDwzL", "ldrswDxzL" + }, + { + shift = 22, mask = 3, + [0] = "strDszL", "ldrDszL" + } + }, + { + shift = 26, mask = 1, + [0] = { + shift = 22, mask = 3, + [0] = "strDxzL", "ldrDxzL" + }, + { + shift = 22, mask = 3, + [0] = "strDdzL", "ldrDdzL" + } + } +} + +local map_lsriro = { + shift = 21, mask = 1, + [0] = { -- Load/store register immediate. + shift = 10, mask = 3, + [0] = { -- Unscaled immediate. + shift = 26, mask = 1, + [0] = { + shift = 30, mask = 3, + [0] = { + shift = 22, mask = 3, + [0] = "sturbDwK", "ldurbDwK" + }, + { + shift = 22, mask = 3, + [0] = "sturhDwK", "ldurhDwK" + }, + { + shift = 22, mask = 3, + [0] = "sturDwK", "ldurDwK" + }, + { + shift = 22, mask = 3, + [0] = "sturDxK", "ldurDxK" + } + } + }, map_lsriind, false, map_lsriind + }, + { -- Load/store register, register offset. 
+ shift = 10, mask = 3, + [2] = { + shift = 26, mask = 1, + [0] = { + shift = 30, mask = 3, + [0] = { + shift = 22, mask = 3, + [0] = "strbDwO", "ldrbDwO", "ldrsbDxO", "ldrsbDwO" + }, + { + shift = 22, mask = 3, + [0] = "strhDwO", "ldrhDwO", "ldrshDxO", "ldrshDwO" + }, + { + shift = 22, mask = 3, + [0] = "strDwO", "ldrDwO", "ldrswDxO" + }, + { + shift = 22, mask = 3, + [0] = "strDxO", "ldrDxO" + } + }, + { + shift = 30, mask = 3, + [2] = { + shift = 22, mask = 3, + [0] = "strDsO", "ldrDsO" + }, + [3] = { + shift = 22, mask = 3, + [0] = "strDdO", "ldrDdO" + } + } + } + } +} + +local map_lsp = { -- Load/store register pair, offset. + shift = 22, mask = 1, + [0] = { + shift = 30, mask = 3, + [0] = { + shift = 26, mask = 1, + [0] = "stpDzAzwP", "stpDzAzsP", + }, + { + shift = 26, mask = 1, + "stpDzAzdP" + }, + { + shift = 26, mask = 1, + [0] = "stpDzAzxP" + } + }, + { + shift = 30, mask = 3, + [0] = { + shift = 26, mask = 1, + [0] = "ldpDzAzwP", "ldpDzAzsP", + }, + { + shift = 26, mask = 1, + [0] = "ldpswDAxP", "ldpDzAzdP" + }, + { + shift = 26, mask = 1, + [0] = "ldpDzAzxP" + } + } +} + +local map_ls = { -- Loads and stores. + shift = 24, mask = 0x31, + [0x10] = map_lrl, [0x30] = map_lsriro, + [0x20] = { + shift = 23, mask = 3, + map_lsp, map_lsp, map_lsp + }, + [0x21] = { + shift = 23, mask = 3, + map_lsp, map_lsp, map_lsp + }, + [0x31] = { + shift = 26, mask = 1, + [0] = { + shift = 30, mask = 3, + [0] = { + shift = 22, mask = 3, + [0] = "strbDwzU", "ldrbDwzU" + }, + { + shift = 22, mask = 3, + [0] = "strhDwzU", "ldrhDwzU" + }, + { + shift = 22, mask = 3, + [0] = "strDwzU", "ldrDwzU" + }, + { + shift = 22, mask = 3, + [0] = "strDxzU", "ldrDxzU" + } + }, + { + shift = 30, mask = 3, + [2] = { + shift = 22, mask = 3, + [0] = "strDszU", "ldrDszU" + }, + [3] = { + shift = 22, mask = 3, + [0] = "strDdzU", "ldrDdzU" + } + } + }, +} + +local map_datafp = { -- Data processing, SIMD and FP. 
+ shift = 28, mask = 7, + { -- 001 + shift = 24, mask = 1, + [0] = { + shift = 21, mask = 1, + { + shift = 10, mask = 3, + [0] = { + shift = 12, mask = 1, + [0] = { + shift = 13, mask = 1, + [0] = { + shift = 14, mask = 1, + [0] = { + shift = 15, mask = 1, + [0] = { -- FP/int conversion. + shift = 31, mask = 1, + [0] = { + shift = 16, mask = 0xff, + [0x20] = "fcvtnsDwNs", [0x21] = "fcvtnuDwNs", + [0x22] = "scvtfDsNw", [0x23] = "ucvtfDsNw", + [0x24] = "fcvtasDwNs", [0x25] = "fcvtauDwNs", + [0x26] = "fmovDwNs", [0x27] = "fmovDsNw", + [0x28] = "fcvtpsDwNs", [0x29] = "fcvtpuDwNs", + [0x30] = "fcvtmsDwNs", [0x31] = "fcvtmuDwNs", + [0x38] = "fcvtzsDwNs", [0x39] = "fcvtzuDwNs", + [0x60] = "fcvtnsDwNd", [0x61] = "fcvtnuDwNd", + [0x62] = "scvtfDdNw", [0x63] = "ucvtfDdNw", + [0x64] = "fcvtasDwNd", [0x65] = "fcvtauDwNd", + [0x68] = "fcvtpsDwNd", [0x69] = "fcvtpuDwNd", + [0x70] = "fcvtmsDwNd", [0x71] = "fcvtmuDwNd", + [0x78] = "fcvtzsDwNd", [0x79] = "fcvtzuDwNd" + }, + { + shift = 16, mask = 0xff, + [0x20] = "fcvtnsDxNs", [0x21] = "fcvtnuDxNs", + [0x22] = "scvtfDsNx", [0x23] = "ucvtfDsNx", + [0x24] = "fcvtasDxNs", [0x25] = "fcvtauDxNs", + [0x28] = "fcvtpsDxNs", [0x29] = "fcvtpuDxNs", + [0x30] = "fcvtmsDxNs", [0x31] = "fcvtmuDxNs", + [0x38] = "fcvtzsDxNs", [0x39] = "fcvtzuDxNs", + [0x60] = "fcvtnsDxNd", [0x61] = "fcvtnuDxNd", + [0x62] = "scvtfDdNx", [0x63] = "ucvtfDdNx", + [0x64] = "fcvtasDxNd", [0x65] = "fcvtauDxNd", + [0x66] = "fmovDxNd", [0x67] = "fmovDdNx", + [0x68] = "fcvtpsDxNd", [0x69] = "fcvtpuDxNd", + [0x70] = "fcvtmsDxNd", [0x71] = "fcvtmuDxNd", + [0x78] = "fcvtzsDxNd", [0x79] = "fcvtzuDxNd" + } + } + }, + { -- FP data-processing, 1 source. 
+ shift = 31, mask = 1, + [0] = { + shift = 22, mask = 3, + [0] = { + shift = 15, mask = 63, + [0] = "fmovDNf", "fabsDNf", "fnegDNf", + "fsqrtDNf", false, "fcvtDdNs", false, false, + "frintnDNf", "frintpDNf", "frintmDNf", "frintzDNf", + "frintaDNf", false, "frintxDNf", "frintiDNf", + }, + { + shift = 15, mask = 63, + [0] = "fmovDNf", "fabsDNf", "fnegDNf", + "fsqrtDNf", "fcvtDsNd", false, false, false, + "frintnDNf", "frintpDNf", "frintmDNf", "frintzDNf", + "frintaDNf", false, "frintxDNf", "frintiDNf", + } + } + } + }, + { -- FP compare. + shift = 31, mask = 1, + [0] = { + shift = 14, mask = 3, + [0] = { + shift = 23, mask = 1, + [0] = { + shift = 0, mask = 31, + [0] = "fcmpNMf", [8] = "fcmpNZf", + [16] = "fcmpeNMf", [24] = "fcmpeNZf", + } + } + } + } + }, + { -- FP immediate. + shift = 31, mask = 1, + [0] = { + shift = 5, mask = 31, + [0] = { + shift = 23, mask = 1, + [0] = "fmovDFf" + } + } + } + }, + { -- FP conditional compare. + shift = 31, mask = 1, + [0] = { + shift = 23, mask = 1, + [0] = { + shift = 4, mask = 1, + [0] = "fccmpNMVCf", "fccmpeNMVCf" + } + } + }, + { -- FP data-processing, 2 sources. + shift = 31, mask = 1, + [0] = { + shift = 23, mask = 1, + [0] = { + shift = 12, mask = 15, + [0] = "fmulDNMf", "fdivDNMf", "faddDNMf", "fsubDNMf", + "fmaxDNMf", "fminDNMf", "fmaxnmDNMf", "fminnmDNMf", + "fnmulDNMf" + } + } + }, + { -- FP conditional select. + shift = 31, mask = 1, + [0] = { + shift = 23, mask = 1, + [0] = "fcselDNMCf" + } + } + } + }, + { -- FP data-processing, 3 sources. + shift = 31, mask = 1, + [0] = { + shift = 15, mask = 1, + [0] = { + shift = 21, mask = 5, + [0] = "fmaddDNMAf", "fnmaddDNMAf" + }, + { + shift = 21, mask = 5, + [0] = "fmsubDNMAf", "fnmsubDNMAf" + } + } + } + } +} + +local map_br = { -- Branches, exception generating and system instructions. + shift = 29, mask = 7, + [0] = "bB", + { -- Compare & branch, immediate. 
+ shift = 24, mask = 3, + [0] = "cbzDBg", "cbnzDBg", "tbzDTBw", "tbnzDTBw" + }, + { -- Conditional branch, immediate. + shift = 24, mask = 3, + [0] = { + shift = 4, mask = 1, + [0] = { + shift = 0, mask = 15, + [0] = "beqB", "bneB", "bhsB", "bloB", "bmiB", "bplB", "bvsB", "bvcB", + "bhiB", "blsB", "bgeB", "bltB", "bgtB", "bleB", "balB" + } + } + }, false, "blB", + { -- Compare & branch, immediate. + shift = 24, mask = 3, + [0] = "cbzDBg", "cbnzDBg", "tbzDTBx", "tbnzDTBx" + }, + { + shift = 24, mask = 3, + [0] = { -- Exception generation. + shift = 0, mask = 0xe0001f, + [0x200000] = "brkW" + }, + { -- System instructions. + shift = 0, mask = 0x3fffff, + [0x03201f] = "nop" + }, + { -- Unconditional branch, register. + shift = 0, mask = 0xfffc1f, + [0x1f0000] = "brNx", [0x3f0000] = "blrNx", + [0x5f0000] = "retNx" + }, + } +} + +local map_init = { + shift = 25, mask = 15, + [0] = false, false, false, false, map_ls, map_datar, map_ls, map_datafp, + map_datai, map_datai, map_br, map_br, map_ls, map_datar, map_ls, map_datafp +} + +------------------------------------------------------------------------------ + +local map_regs = { x = {}, w = {}, d = {}, s = {} } + +for i=0,30 do + map_regs.x[i] = "x"..i + map_regs.w[i] = "w"..i + map_regs.d[i] = "d"..i + map_regs.s[i] = "s"..i +end +map_regs.x[31] = "sp" +map_regs.w[31] = "wsp" +map_regs.d[31] = "d31" +map_regs.s[31] = "s31" + +local map_cond = { + [0] = "eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc", + "hi", "ls", "ge", "lt", "gt", "le", "al", +} + +local map_shift = { [0] = "lsl", "lsr", "asr", } + +local map_extend = { + [0] = "uxtb", "uxth", "uxtw", "uxtx", "sxtb", "sxth", "sxtw", "sxtx", +} + +------------------------------------------------------------------------------ + +-- Output a nicely formatted line with an opcode and operands. 
+local function putop(ctx, text, operands) + local pos = ctx.pos + local extra = "" + if ctx.rel then + local sym = ctx.symtab[ctx.rel] + if sym then + extra = "\t->"..sym + end + end + if ctx.hexdump > 0 then + ctx.out(format("%08x %s %-5s %s%s\n", + ctx.addr+pos, tohex(ctx.op), text, concat(operands, ", "), extra)) + else + ctx.out(format("%08x %-5s %s%s\n", + ctx.addr+pos, text, concat(operands, ", "), extra)) + end + ctx.pos = pos + 4 +end + +-- Fallback for unknown opcodes. +local function unknown(ctx) + return putop(ctx, ".long", { "0x"..tohex(ctx.op) }) +end + +local function match_reg(p, pat, regnum) + return map_regs[match(pat, p.."%w-([xwds])")][regnum] +end + +local function fmt_hex32(x) + if x < 0 then + return tohex(x) + else + return format("%x", x) + end +end + +local imm13_rep = { 0x55555555, 0x11111111, 0x01010101, 0x00010001, 0x00000001 } + +local function decode_imm13(op) + local imms = band(rshift(op, 10), 63) + local immr = band(rshift(op, 16), 63) + if band(op, 0x00400000) == 0 then + local len = 5 + if imms >= 56 then + if imms >= 60 then len = 1 else len = 2 end + elseif imms >= 48 then len = 3 elseif imms >= 32 then len = 4 end + local l = lshift(1, len)-1 + local s = band(imms, l) + local r = band(immr, l) + local imm = ror(rshift(-1, 31-s), r) + if len ~= 5 then imm = band(imm, lshift(1, l)-1) + rshift(imm, 31-l) end + imm = imm * imm13_rep[len] + local ix = fmt_hex32(imm) + if rshift(op, 31) ~= 0 then + return ix..tohex(imm) + else + return ix + end + else + local lo, hi = -1, 0 + if imms < 32 then lo = rshift(-1, 31-imms) else hi = rshift(-1, 63-imms) end + if immr ~= 0 then + lo, hi = ror(lo, immr), ror(hi, immr) + local x = immr == 32 and 0 or band(bxor(lo, hi), lshift(-1, 32-immr)) + lo, hi = bxor(lo, x), bxor(hi, x) + if immr >= 32 then lo, hi = hi, lo end + end + if hi ~= 0 then + return fmt_hex32(hi)..tohex(lo) + else + return fmt_hex32(lo) + end + end +end + +local function parse_immpc(op, name) + if name == "b" or name == "bl" 
then + return arshift(lshift(op, 6), 4) + elseif name == "adr" or name == "adrp" then + local immlo = band(rshift(op, 29), 3) + local immhi = lshift(arshift(lshift(op, 8), 13), 2) + return bor(immhi, immlo) + elseif name == "tbz" or name == "tbnz" then + return lshift(arshift(lshift(op, 13), 18), 2) + else + return lshift(arshift(lshift(op, 8), 13), 2) + end +end + +local function parse_fpimm8(op) + local sign = band(op, 0x100000) == 0 and 1 or -1 + local exp = bxor(rshift(arshift(lshift(op, 12), 5), 24), 0x80) - 131 + local frac = 16+band(rshift(op, 13), 15) + return sign * frac * 2^exp +end + +local function prefer_bfx(sf, uns, imms, immr) + if imms < immr or imms == 31 or imms == 63 then + return false + end + if immr == 0 then + if sf == 0 and (imms == 7 or imms == 15) then + return false + end + if sf ~= 0 and uns == 0 and (imms == 7 or imms == 15 or imms == 31) then + return false + end + end + return true +end + +-- Disassemble a single instruction. +local function disass_ins(ctx) + local pos = ctx.pos + local b0, b1, b2, b3 = byte(ctx.code, pos+1, pos+4) + local op = bor(lshift(b3, 24), lshift(b2, 16), lshift(b1, 8), b0) + local operands = {} + local suffix = "" + local last, name, pat + local map_reg + ctx.op = op + ctx.rel = nil + last = nil + local opat + opat = map_init[band(rshift(op, 25), 15)] + while type(opat) ~= "string" do + if not opat then return unknown(ctx) end + opat = opat[band(rshift(op, opat.shift), opat.mask)] or opat._ + end + name, pat = match(opat, "^([a-z0-9]*)(.*)") + local altname, pat2 = match(pat, "|([a-z0-9_.|]*)(.*)") + if altname then pat = pat2 end + if sub(pat, 1, 1) == "." 
then + local s2, p2 = match(pat, "^([a-z0-9.]*)(.*)") + suffix = suffix..s2 + pat = p2 + end + + local rt = match(pat, "[gf]") + if rt then + if rt == "g" then + map_reg = band(op, 0x80000000) ~= 0 and map_regs.x or map_regs.w + else + map_reg = band(op, 0x400000) ~= 0 and map_regs.d or map_regs.s + end + end + + local second0, immr + + for p in gmatch(pat, ".") do + local x = nil + if p == "D" then + local regnum = band(op, 31) + x = rt and map_reg[regnum] or match_reg(p, pat, regnum) + elseif p == "N" then + local regnum = band(rshift(op, 5), 31) + x = rt and map_reg[regnum] or match_reg(p, pat, regnum) + elseif p == "M" then + local regnum = band(rshift(op, 16), 31) + x = rt and map_reg[regnum] or match_reg(p, pat, regnum) + elseif p == "A" then + local regnum = band(rshift(op, 10), 31) + x = rt and map_reg[regnum] or match_reg(p, pat, regnum) + elseif p == "B" then + local addr = ctx.addr + pos + parse_immpc(op, name) + ctx.rel = addr + x = "0x"..tohex(addr) + elseif p == "T" then + x = bor(band(rshift(op, 26), 32), band(rshift(op, 19), 31)) + elseif p == "V" then + x = band(op, 15) + elseif p == "C" then + x = map_cond[band(rshift(op, 12), 15)] + elseif p == "c" then + local rn = band(rshift(op, 5), 31) + local rm = band(rshift(op, 16), 31) + local cond = band(rshift(op, 12), 15) + local invc = bxor(cond, 1) + x = map_cond[cond] + if altname and cond ~= 14 and cond ~= 15 then + local a1, a2 = match(altname, "([^|]*)|(.*)") + if rn == rm then + local n = #operands + operands[n] = nil + x = map_cond[invc] + if rn ~= 31 then + if a1 then name = a1 else name = altname end + else + operands[n-1] = nil + name = a2 + end + end + end + elseif p == "W" then + x = band(rshift(op, 5), 0xffff) + elseif p == "Y" then + x = band(rshift(op, 5), 0xffff) + local hw = band(rshift(op, 21), 3) + if altname and (hw == 0 or x ~= 0) then + name = altname + end + elseif p == "L" then + local rn = map_regs.x[band(rshift(op, 5), 31)] + local imm9 = arshift(lshift(op, 11), 23) + if 
band(op, 0x800) ~= 0 then + x = "["..rn..", #"..imm9.."]!" + else + x = "["..rn.."], #"..imm9 + end + elseif p == "U" then + local rn = map_regs.x[band(rshift(op, 5), 31)] + local sz = band(rshift(op, 30), 3) + local imm12 = lshift(arshift(lshift(op, 10), 20), sz) + if imm12 ~= 0 then + x = "["..rn..", #"..imm12.."]" + else + x = "["..rn.."]" + end + elseif p == "K" then + local rn = map_regs.x[band(rshift(op, 5), 31)] + local imm9 = arshift(lshift(op, 11), 23) + if imm9 ~= 0 then + x = "["..rn..", #"..imm9.."]" + else + x = "["..rn.."]" + end + elseif p == "O" then + local rn, rm = map_regs.x[band(rshift(op, 5), 31)] + local m = band(rshift(op, 13), 1) + if m == 0 then + rm = map_regs.w[band(rshift(op, 16), 31)] + else + rm = map_regs.x[band(rshift(op, 16), 31)] + end + x = "["..rn..", "..rm + local opt = band(rshift(op, 13), 7) + local s = band(rshift(op, 12), 1) + local sz = band(rshift(op, 30), 3) + -- extension to be applied + if opt == 3 then + if s == 0 then x = x.."]" + else x = x..", lsl #"..sz.."]" end + elseif opt == 2 or opt == 6 or opt == 7 then + if s == 0 then x = x..", "..map_extend[opt].."]" + else x = x..", "..map_extend[opt].." #"..sz.."]" end + else + x = x.."]" + end + elseif p == "P" then + local opcv, sh = rshift(op, 26), 2 + if opcv >= 0x2a then sh = 4 elseif opcv >= 0x1b then sh = 3 end + local imm7 = lshift(arshift(lshift(op, 10), 25), sh) + local rn = map_regs.x[band(rshift(op, 5), 31)] + local ind = band(rshift(op, 23), 3) + if ind == 1 then + x = "["..rn.."], #"..imm7 + elseif ind == 2 then + if imm7 == 0 then + x = "["..rn.."]" + else + x = "["..rn..", #"..imm7.."]" + end + elseif ind == 3 then + x = "["..rn..", #"..imm7.."]!" 
+ end + elseif p == "I" then + local shf = band(rshift(op, 22), 3) + local imm12 = band(rshift(op, 10), 0x0fff) + local rn, rd = band(rshift(op, 5), 31), band(op, 31) + if altname == "mov" and shf == 0 and imm12 == 0 and (rn == 31 or rd == 31) then + name = altname + x = nil + elseif shf == 0 then + x = imm12 + elseif shf == 1 then + x = imm12..", lsl #12" + end + elseif p == "i" then + x = "#0x"..decode_imm13(op) + elseif p == "1" then + immr = band(rshift(op, 16), 63) + x = immr + elseif p == "2" then + x = band(rshift(op, 10), 63) + if altname then + local a1, a2, a3, a4, a5, a6 = + match(altname, "([^|]*)|([^|]*)|([^|]*)|([^|]*)|([^|]*)|(.*)") + local sf = band(rshift(op, 26), 32) + local uns = band(rshift(op, 30), 1) + if prefer_bfx(sf, uns, x, immr) then + name = a2 + x = x - immr + 1 + elseif immr == 0 and x == 7 then + local n = #operands + operands[n] = nil + if sf ~= 0 then + operands[n-1] = gsub(operands[n-1], "x", "w") + end + last = operands[n-1] + name = a6 + x = nil + elseif immr == 0 and x == 15 then + local n = #operands + operands[n] = nil + if sf ~= 0 then + operands[n-1] = gsub(operands[n-1], "x", "w") + end + last = operands[n-1] + name = a5 + x = nil + elseif x == 31 or x == 63 then + if x == 31 and immr == 0 and name == "sbfm" then + name = a4 + local n = #operands + operands[n] = nil + if sf ~= 0 then + operands[n-1] = gsub(operands[n-1], "x", "w") + end + last = operands[n-1] + else + name = a3 + end + x = nil + elseif band(x, 31) ~= 31 and immr == x+1 and name == "ubfm" then + name = a4 + last = "#"..(sf+32 - immr) + operands[#operands] = last + x = nil + elseif x < immr then + name = a1 + last = "#"..(sf+32 - immr) + operands[#operands] = last + x = x + 1 + end + end + elseif p == "3" then + x = band(rshift(op, 10), 63) + if altname then + local a1, a2 = match(altname, "([^|]*)|(.*)") + if x < immr then + name = a1 + local sf = band(rshift(op, 26), 32) + last = "#"..(sf+32 - immr) + operands[#operands] = last + x = x + 1 + else + name = 
 a2 + x = x - immr + 1 + end + end + elseif p == "4" then + x = band(rshift(op, 10), 63) + local rn = band(rshift(op, 5), 31) + local rm = band(rshift(op, 16), 31) + if altname and rn == rm then + local n = #operands + operands[n] = nil + last = operands[n-1] + name = altname + end + elseif p == "5" then + x = band(rshift(op, 16), 31) + elseif p == "S" then + x = band(rshift(op, 10), 63) + if x == 0 then x = nil + else x = map_shift[band(rshift(op, 22), 3)].." #"..x end + elseif p == "X" then + local opt = band(rshift(op, 13), 7) + -- Width specifier <R>. + if opt ~= 3 and opt ~= 7 then + last = map_regs.w[band(rshift(op, 16), 31)] + operands[#operands] = last + end + x = band(rshift(op, 10), 7) + -- Extension. + if opt == 2 + band(rshift(op, 31), 1) and + band(rshift(op, second0 and 5 or 0), 31) == 31 then + if x == 0 then x = nil + else x = "lsl #"..x end + else + if x == 0 then x = map_extend[band(rshift(op, 13), 7)] + else x = map_extend[band(rshift(op, 13), 7)].." #"..x end + end + elseif p == "R" then + x = band(rshift(op,21), 3) + if x == 0 then x = nil + else x = "lsl #"..x*16 end + elseif p == "z" then + local n = #operands + if operands[n] == "sp" then operands[n] = "xzr" + elseif operands[n] == "wsp" then operands[n] = "wzr" + end + elseif p == "Z" then + x = 0 + elseif p == "F" then + x = parse_fpimm8(op) + elseif p == "g" or p == "f" or p == "x" or p == "w" or + p == "d" or p == "s" then + -- These are handled in D/N/M/A.
+ elseif p == "0" then + if last == "sp" or last == "wsp" then + local n = #operands + operands[n] = nil + last = operands[n-1] + if altname then + local a1, a2 = match(altname, "([^|]*)|(.*)") + if not a1 then + name = altname + elseif second0 then + name, altname = a2, a1 + else + name, altname = a1, a2 + end + end + end + second0 = true + else + assert(false) + end + if x then + last = x + if type(x) == "number" then x = "#"..x end + operands[#operands+1] = x + end + end + + return putop(ctx, name..suffix, operands) +end + +------------------------------------------------------------------------------ + +-- Disassemble a block of code. +local function disass_block(ctx, ofs, len) + if not ofs then ofs = 0 end + local stop = len and ofs+len or #ctx.code + ctx.pos = ofs + ctx.rel = nil + while ctx.pos < stop do disass_ins(ctx) end +end + +-- Extended API: create a disassembler context. Then call ctx:disass(ofs, len). +local function create(code, addr, out) + local ctx = {} + ctx.code = code + ctx.addr = addr or 0 + ctx.out = out or io.write + ctx.symtab = {} + ctx.disass = disass_block + ctx.hexdump = 8 + return ctx +end + +-- Simple API: disassemble code (a string) at address and output via out. +local function disass(code, addr, out) + create(code, addr, out):disass() +end + +-- Return register name for RID. +local function regname(r) + if r < 32 then return map_regs.x[r] end + return map_regs.d[r-32] +end + +-- Public module functions. +return { + create = create, + disass = disass, + regname = regname +} + diff --cc src/jit/dis_arm64be.lua index 7337f5b7,00000000..f7a56352 mode 100644,000000..100644 --- a/src/jit/dis_arm64be.lua +++ b/src/jit/dis_arm64be.lua @@@ -1,12 -1,0 +1,12 @@@ +---------------------------------------------------------------------------- +-- LuaJIT ARM64BE disassembler wrapper module. +-- - -- Copyright (C) 2005-2022 Mike Pall. All rights reserved. ++-- Copyright (C) 2005-2023 Mike Pall. All rights reserved. 
+-- Released under the MIT license. See Copyright Notice in luajit.h +---------------------------------------------------------------------------- +-- ARM64 instructions are always little-endian. So just forward to the +-- common ARM64 disassembler module. All the interesting stuff is there. +------------------------------------------------------------------------------ + +return require((string.match(..., ".*%.") or "").."dis_arm64") + diff --cc src/jit/dis_mips64.lua index 1236e524,00000000..5f3a4dab mode 100644,000000..100644 --- a/src/jit/dis_mips64.lua +++ b/src/jit/dis_mips64.lua @@@ -1,17 -1,0 +1,17 @@@ +---------------------------------------------------------------------------- +-- LuaJIT MIPS64 disassembler wrapper module. +-- - -- Copyright (C) 2005-2022 Mike Pall. All rights reserved. ++-- Copyright (C) 2005-2023 Mike Pall. All rights reserved. +-- Released under the MIT license. See Copyright Notice in luajit.h +---------------------------------------------------------------------------- +-- This module just exports the big-endian functions from the +-- MIPS disassembler module. All the interesting stuff is there. +------------------------------------------------------------------------------ + +local dis_mips = require((string.match(..., ".*%.") or "").."dis_mips") +return { + create = dis_mips.create, + disass = dis_mips.disass, + regname = dis_mips.regname +} + diff --cc src/jit/dis_mips64el.lua index 7c478d2d,00000000..ea513649 mode 100644,000000..100644 --- a/src/jit/dis_mips64el.lua +++ b/src/jit/dis_mips64el.lua @@@ -1,17 -1,0 +1,17 @@@ +---------------------------------------------------------------------------- +-- LuaJIT MIPS64EL disassembler wrapper module. +-- - -- Copyright (C) 2005-2022 Mike Pall. All rights reserved. ++-- Copyright (C) 2005-2023 Mike Pall. All rights reserved. +-- Released under the MIT license. 
See Copyright Notice in luajit.h +---------------------------------------------------------------------------- +-- This module just exports the little-endian functions from the +-- MIPS disassembler module. All the interesting stuff is there. +------------------------------------------------------------------------------ + +local dis_mips = require((string.match(..., ".*%.") or "").."dis_mips") +return { + create = dis_mips.create_el, + disass = dis_mips.disass_el, + regname = dis_mips.regname +} + diff --cc src/jit/dis_mips64r6.lua index c5789ce4,00000000..1d948411 mode 100644,000000..100644 --- a/src/jit/dis_mips64r6.lua +++ b/src/jit/dis_mips64r6.lua @@@ -1,17 -1,0 +1,17 @@@ +---------------------------------------------------------------------------- +-- LuaJIT MIPS64R6 disassembler wrapper module. +-- - -- Copyright (C) 2005-2022 Mike Pall. All rights reserved. ++-- Copyright (C) 2005-2023 Mike Pall. All rights reserved. +-- Released under the MIT license. See Copyright Notice in luajit.h +---------------------------------------------------------------------------- +-- This module just exports the r6 big-endian functions from the +-- MIPS disassembler module. All the interesting stuff is there. +------------------------------------------------------------------------------ + +local dis_mips = require((string.match(..., ".*%.") or "").."dis_mips") +return { + create = dis_mips.create_r6, + disass = dis_mips.disass_r6, + regname = dis_mips.regname +} + diff --cc src/jit/dis_mips64r6el.lua index f67f6240,00000000..26592e17 mode 100644,000000..100644 --- a/src/jit/dis_mips64r6el.lua +++ b/src/jit/dis_mips64r6el.lua @@@ -1,17 -1,0 +1,17 @@@ +---------------------------------------------------------------------------- +-- LuaJIT MIPS64R6EL disassembler wrapper module. +-- - -- Copyright (C) 2005-2022 Mike Pall. All rights reserved. ++-- Copyright (C) 2005-2023 Mike Pall. All rights reserved. +-- Released under the MIT license. 
 See Copyright Notice in luajit.h +---------------------------------------------------------------------------- +-- This module just exports the r6 little-endian functions from the +-- MIPS disassembler module. All the interesting stuff is there. +------------------------------------------------------------------------------ + +local dis_mips = require((string.match(..., ".*%.") or "").."dis_mips") +return { + create = dis_mips.create_r6_el, + disass = dis_mips.disass_r6_el, + regname = dis_mips.regname +} + diff --cc src/jit/p.lua index f225c312,00000000..3daa9291 mode 100644,000000..100644 --- a/src/jit/p.lua +++ b/src/jit/p.lua @@@ -1,312 -1,0 +1,312 @@@ +---------------------------------------------------------------------------- +-- LuaJIT profiler. +-- - -- Copyright (C) 2005-2022 Mike Pall. All rights reserved. ++-- Copyright (C) 2005-2023 Mike Pall. All rights reserved. +-- Released under the MIT license. See Copyright Notice in luajit.h +---------------------------------------------------------------------------- +-- +-- This module is a simple command line interface to the built-in +-- low-overhead profiler of LuaJIT. +-- +-- The lower-level API of the profiler is accessible via the "jit.profile" +-- module or the luaJIT_profile_* C API. +-- +-- Example usage: +-- +-- luajit -jp myapp.lua +-- luajit -jp=s myapp.lua +-- luajit -jp=-s myapp.lua +-- luajit -jp=vl myapp.lua +-- luajit -jp=G,profile.txt myapp.lua +-- +-- The following dump features are available: +-- +-- f Stack dump: function name, otherwise module:line. Default mode. +-- F Stack dump: ditto, but always prepend module. +-- l Stack dump: module:line. +-- <number> stack dump depth (callee < caller). Default: 1. +-- -<number> Inverse stack dump depth (caller > callee). +-- s Split stack dump after first stack level. Implies abs(depth) >= 2. +-- p Show full path for module names. +-- v Show VM states. Can be combined with stack dumps, e.g. vf or fv. +-- z Show zones. Can be combined with stack dumps, e.g.
 zf or fz. +-- r Show raw sample counts. Default: show percentages. +-- a Annotate excerpts from source code files. +-- A Annotate complete source code files. +-- G Produce raw output suitable for graphical tools (e.g. flame graphs). +-- m<number> Minimum sample percentage to be shown. Default: 3. +-- i<number> Sampling interval in milliseconds. Default: 10. +-- +---------------------------------------------------------------------------- + +-- Cache some library functions and objects. +local jit = require("jit") +assert(jit.version_num == 20100, "LuaJIT core/library version mismatch") +local profile = require("jit.profile") +local vmdef = require("jit.vmdef") +local math = math +local pairs, ipairs, tonumber, floor = pairs, ipairs, tonumber, math.floor +local sort, format = table.sort, string.format +local stdout = io.stdout +local zone -- Load jit.zone module on demand. + +-- Output file handle. +local out + +------------------------------------------------------------------------------ + +local prof_ud +local prof_states, prof_split, prof_min, prof_raw, prof_fmt, prof_depth +local prof_ann, prof_count1, prof_count2, prof_samples + +local map_vmmode = { + N = "Compiled", + I = "Interpreted", + C = "C code", + G = "Garbage Collector", + J = "JIT Compiler", +} + +-- Profiler callback. +local function prof_cb(th, samples, vmmode) + prof_samples = prof_samples + samples + local key_stack, key_stack2, key_state + -- Collect keys for sample.
+ if prof_states then + if prof_states == "v" then + key_state = map_vmmode[vmmode] or vmmode + else + key_state = zone:get() or "(none)" + end + end + if prof_fmt then + key_stack = profile.dumpstack(th, prof_fmt, prof_depth) + key_stack = key_stack:gsub("%[builtin#(%d+)%]", function(x) + return vmdef.ffnames[tonumber(x)] + end) + if prof_split == 2 then + local k1, k2 = key_stack:match("(.-) [<>] (.*)") + if k2 then key_stack, key_stack2 = k1, k2 end + elseif prof_split == 3 then + key_stack2 = profile.dumpstack(th, "l", 1) + end + end + -- Order keys. + local k1, k2 + if prof_split == 1 then + if key_state then + k1 = key_state + if key_stack then k2 = key_stack end + end + elseif key_stack then + k1 = key_stack + if key_stack2 then k2 = key_stack2 elseif key_state then k2 = key_state end + end + -- Coalesce samples in one or two levels. + if k1 then + local t1 = prof_count1 + t1[k1] = (t1[k1] or 0) + samples + if k2 then + local t2 = prof_count2 + local t3 = t2[k1] + if not t3 then t3 = {}; t2[k1] = t3 end + t3[k2] = (t3[k2] or 0) + samples + end + end +end + +------------------------------------------------------------------------------ + +-- Show top N list. 
+local function prof_top(count1, count2, samples, indent) + local t, n = {}, 0 + for k in pairs(count1) do + n = n + 1 + t[n] = k + end + sort(t, function(a, b) return count1[a] > count1[b] end) + for i=1,n do + local k = t[i] + local v = count1[k] + local pct = floor(v*100/samples + 0.5) + if pct < prof_min then break end + if not prof_raw then + out:write(format("%s%2d%% %s\n", indent, pct, k)) + elseif prof_raw == "r" then + out:write(format("%s%5d %s\n", indent, v, k)) + else + out:write(format("%s %d\n", k, v)) + end + if count2 then + local r = count2[k] + if r then + prof_top(r, nil, v, (prof_split == 3 or prof_split == 1) and " -- " or + (prof_depth < 0 and " -> " or " <- ")) + end + end + end +end + +-- Annotate source code +local function prof_annotate(count1, samples) + local files = {} + local ms = 0 + for k, v in pairs(count1) do + local pct = floor(v*100/samples + 0.5) + ms = math.max(ms, v) + if pct >= prof_min then + local file, line = k:match("^(.*):(%d+)$") + if not file then file = k; line = 0 end + local fl = files[file] + if not fl then fl = {}; files[file] = fl; files[#files+1] = file end + line = tonumber(line) + fl[line] = prof_raw and v or pct + end + end + sort(files) + local fmtv, fmtn = " %3d%% | %s\n", " | %s\n" + if prof_raw then + local n = math.max(5, math.ceil(math.log10(ms))) + fmtv = "%"..n.."d | %s\n" + fmtn = (" "):rep(n).." 
| %s\n" + end + local ann = prof_ann + for _, file in ipairs(files) do + local f0 = file:byte() + if f0 == 40 or f0 == 91 then + out:write(format("\n====== %s ======\n[Cannot annotate non-file]\n", file)) + break + end + local fp, err = io.open(file) + if not fp then + out:write(format("====== ERROR: %s: %s\n", file, err)) + break + end + out:write(format("\n====== %s ======\n", file)) + local fl = files[file] + local n, show = 1, false + if ann ~= 0 then + for i=1,ann do + if fl[i] then show = true; out:write("@@ 1 @@\n"); break end + end + end + for line in fp:lines() do + if line:byte() == 27 then + out:write("[Cannot annotate bytecode file]\n") + break + end + local v = fl[n] + if ann ~= 0 then + local v2 = fl[n+ann] + if show then + if v2 then show = n+ann elseif v then show = n + elseif show+ann < n then show = false end + elseif v2 then + show = n+ann + out:write(format("@@ %d @@\n", n)) + end + if not show then goto next end + end + if v then + out:write(format(fmtv, v, line)) + else + out:write(format(fmtn, line)) + end + ::next:: + n = n + 1 + end + fp:close() + end +end + +------------------------------------------------------------------------------ + +-- Finish profiling and dump result. +local function prof_finish() + if prof_ud then + profile.stop() + local samples = prof_samples + if samples == 0 then + if prof_raw ~= true then out:write("[No samples collected]\n") end + return + end + if prof_ann then + prof_annotate(prof_count1, samples) + else + prof_top(prof_count1, prof_count2, samples, "") + end + prof_count1 = nil + prof_count2 = nil + prof_ud = nil + if out ~= stdout then out:close() end + end +end + +-- Start profiling. 
+local function prof_start(mode) + local interval = "" + mode = mode:gsub("i%d*", function(s) interval = s; return "" end) + prof_min = 3 + mode = mode:gsub("m(%d+)", function(s) prof_min = tonumber(s); return "" end) + prof_depth = 1 + mode = mode:gsub("%-?%d+", function(s) prof_depth = tonumber(s); return "" end) + local m = {} + for c in mode:gmatch(".") do m[c] = c end + prof_states = m.z or m.v + if prof_states == "z" then zone = require("jit.zone") end + local scope = m.l or m.f or m.F or (prof_states and "" or "f") + local flags = (m.p or "") + prof_raw = m.r + if m.s then + prof_split = 2 + if prof_depth == -1 or m["-"] then prof_depth = -2 + elseif prof_depth == 1 then prof_depth = 2 end + elseif mode:find("[fF].*l") then + scope = "l" + prof_split = 3 + else + prof_split = (scope == "" or mode:find("[zv].*[lfF]")) and 1 or 0 + end + prof_ann = m.A and 0 or (m.a and 3) + if prof_ann then + scope = "l" + prof_fmt = "pl" + prof_split = 0 + prof_depth = 1 + elseif m.G and scope ~= "" then + prof_fmt = flags..scope.."Z;" + prof_depth = -100 + prof_raw = true + prof_min = 0 + elseif scope == "" then + prof_fmt = false + else + local sc = prof_split == 3 and m.f or m.F or scope + prof_fmt = flags..sc..(prof_depth >= 0 and "Z < " or "Z > ") + end + prof_count1 = {} + prof_count2 = {} + prof_samples = 0 + profile.start(scope:lower()..interval, prof_cb) + prof_ud = newproxy(true) + getmetatable(prof_ud).__gc = prof_finish +end + +------------------------------------------------------------------------------ + +local function start(mode, outfile) + if not outfile then outfile = os.getenv("LUAJIT_PROFILEFILE") end + if outfile then + out = outfile == "-" and stdout or assert(io.open(outfile, "w")) + else + out = stdout + end + prof_start(mode or "f") +end + +-- Public module functions. +return { + start = start, -- For -j command line option. 
+ stop = prof_finish +} + diff --cc src/jit/zone.lua index 1308cb74,00000000..55dc76d3 mode 100644,000000..100644 --- a/src/jit/zone.lua +++ b/src/jit/zone.lua @@@ -1,45 -1,0 +1,45 @@@ +---------------------------------------------------------------------------- +-- LuaJIT profiler zones. +-- - -- Copyright (C) 2005-2022 Mike Pall. All rights reserved. ++-- Copyright (C) 2005-2023 Mike Pall. All rights reserved. +-- Released under the MIT license. See Copyright Notice in luajit.h +---------------------------------------------------------------------------- +-- +-- This module implements a simple hierarchical zone model. +-- +-- Example usage: +-- +-- local zone = require("jit.zone") +-- zone("AI") +-- ... +-- zone("A*") +-- ... +-- print(zone:get()) --> "A*" +-- ... +-- zone() +-- ... +-- print(zone:get()) --> "AI" +-- ... +-- zone() +-- +---------------------------------------------------------------------------- + +local remove = table.remove + +return setmetatable({ + flush = function(t) + for i=#t,1,-1 do t[i] = nil end + end, + get = function(t) + return t[#t] + end +}, { + __call = function(t, zone) + if zone then + t[#t+1] = zone + else + return (assert(remove(t), "empty zone stack")) + end + end +}) + diff --cc src/lib_buffer.c index d6ff1346,00000000..e4ec9d9d mode 100644,000000..100644 --- a/src/lib_buffer.c +++ b/src/lib_buffer.c @@@ -1,360 -1,0 +1,360 @@@ +/* +** Buffer library. - ** Copyright (C) 2005-2022 Mike Pall. See Copyright Notice in luajit.h ++** Copyright (C) 2005-2023 Mike Pall. 
See Copyright Notice in luajit.h +*/ + +#define lib_buffer_c +#define LUA_LIB + +#include "lua.h" +#include "lauxlib.h" +#include "lualib.h" + +#include "lj_obj.h" + +#if LJ_HASBUFFER +#include "lj_gc.h" +#include "lj_err.h" +#include "lj_buf.h" +#include "lj_str.h" +#include "lj_tab.h" +#include "lj_udata.h" +#include "lj_meta.h" +#if LJ_HASFFI +#include "lj_ctype.h" +#include "lj_cdata.h" +#include "lj_cconv.h" +#endif +#include "lj_strfmt.h" +#include "lj_serialize.h" +#include "lj_lib.h" + +/* -- Helper functions ---------------------------------------------------- */ + +/* Check that the first argument is a string buffer. */ +static SBufExt *buffer_tobuf(lua_State *L) +{ + if (!(L->base < L->top && tvisbuf(L->base))) + lj_err_argtype(L, 1, "buffer"); + return bufV(L->base); +} + +/* Ditto, but for writers. */ +static LJ_AINLINE SBufExt *buffer_tobufw(lua_State *L) +{ + SBufExt *sbx = buffer_tobuf(L); + setsbufXL_(sbx, L); + return sbx; +} + +#define buffer_toudata(sbx) ((GCudata *)(sbx)-1) + +/* -- Buffer methods ------------------------------------------------------ */ + +#define LJLIB_MODULE_buffer_method + +LJLIB_CF(buffer_method_free) +{ + SBufExt *sbx = buffer_tobuf(L); + lj_bufx_free(L, sbx); + L->top = L->base+1; /* Chain buffer object. */ + return 1; +} + +LJLIB_CF(buffer_method_reset) LJLIB_REC(.) +{ + SBufExt *sbx = buffer_tobuf(L); + lj_bufx_reset(sbx); + L->top = L->base+1; /* Chain buffer object. */ + return 1; +} + +LJLIB_CF(buffer_method_skip) LJLIB_REC(.) +{ + SBufExt *sbx = buffer_tobuf(L); + MSize n = (MSize)lj_lib_checkintrange(L, 2, 0, LJ_MAX_BUF); + MSize len = sbufxlen(sbx); + if (n < len) { + sbx->r += n; + } else if (sbufiscow(sbx)) { + sbx->r = sbx->w; + } else { + sbx->r = sbx->w = sbx->b; + } + L->top = L->base+1; /* Chain buffer object. */ + return 1; +} + +LJLIB_CF(buffer_method_set) LJLIB_REC(.) 
+{ + SBufExt *sbx = buffer_tobuf(L); + GCobj *ref; + const char *p; + MSize len; +#if LJ_HASFFI + if (tviscdata(L->base+1)) { + CTState *cts = ctype_cts(L); + lj_cconv_ct_tv(cts, ctype_get(cts, CTID_P_CVOID), (uint8_t *)&p, + L->base+1, CCF_ARG(2)); + len = (MSize)lj_lib_checkintrange(L, 3, 0, LJ_MAX_BUF); + } else +#endif + { + GCstr *str = lj_lib_checkstrx(L, 2); + p = strdata(str); + len = str->len; + } + lj_bufx_free(L, sbx); + lj_bufx_set_cow(L, sbx, p, len); + ref = gcV(L->base+1); + setgcref(sbx->cowref, ref); + lj_gc_objbarrier(L, buffer_toudata(sbx), ref); + L->top = L->base+1; /* Chain buffer object. */ + return 1; +} + +LJLIB_CF(buffer_method_put) LJLIB_REC(.) +{ + SBufExt *sbx = buffer_tobufw(L); + ptrdiff_t arg, narg = L->top - L->base; + for (arg = 1; arg < narg; arg++) { + cTValue *o = &L->base[arg], *mo = NULL; + retry: + if (tvisstr(o)) { + lj_buf_putstr((SBuf *)sbx, strV(o)); + } else if (tvisint(o)) { + lj_strfmt_putint((SBuf *)sbx, intV(o)); + } else if (tvisnum(o)) { + lj_strfmt_putfnum((SBuf *)sbx, STRFMT_G14, numV(o)); + } else if (tvisbuf(o)) { + SBufExt *sbx2 = bufV(o); + if (sbx2 == sbx) lj_err_arg(L, (int)(arg+1), LJ_ERR_BUFFER_SELF); + lj_buf_putmem((SBuf *)sbx, sbx2->r, sbufxlen(sbx2)); + } else if (!mo && !tvisnil(mo = lj_meta_lookup(L, o, MM_tostring))) { + /* Call __tostring metamethod inline. */ + copyTV(L, L->top++, mo); + copyTV(L, L->top++, o); + lua_call(L, 1, 1); + o = &L->base[arg]; /* The stack may have been reallocated. */ + copyTV(L, &L->base[arg], L->top-1); + L->top = L->base + narg; + goto retry; /* Retry with the result. */ + } else { + lj_err_argtype(L, (int)(arg+1), "string/number/__tostring"); + } + /* Probably not useful to inline other __tostring MMs, e.g. FFI numbers. */ + } + L->top = L->base+1; /* Chain buffer object. */ + lj_gc_check(L); + return 1; +} + +LJLIB_CF(buffer_method_putf) LJLIB_REC(.) 
+{ + SBufExt *sbx = buffer_tobufw(L); + lj_strfmt_putarg(L, (SBuf *)sbx, 2, 2); + L->top = L->base+1; /* Chain buffer object. */ + lj_gc_check(L); + return 1; +} + +LJLIB_CF(buffer_method_get) LJLIB_REC(.) +{ + SBufExt *sbx = buffer_tobuf(L); + ptrdiff_t arg, narg = L->top - L->base; + if (narg == 1) { + narg++; + setnilV(L->top++); /* get() is the same as get(nil). */ + } + for (arg = 1; arg < narg; arg++) { + TValue *o = &L->base[arg]; + MSize n = tvisnil(o) ? LJ_MAX_BUF : + (MSize) lj_lib_checkintrange(L, (int)(arg+1), 0, LJ_MAX_BUF); + MSize len = sbufxlen(sbx); + if (n > len) n = len; + setstrV(L, o, lj_str_new(L, sbx->r, n)); + sbx->r += n; + } + if (sbx->r == sbx->w && !sbufiscow(sbx)) sbx->r = sbx->w = sbx->b; + lj_gc_check(L); + return (int)(narg-1); +} + +#if LJ_HASFFI +LJLIB_CF(buffer_method_putcdata) LJLIB_REC(.) +{ + SBufExt *sbx = buffer_tobufw(L); + const char *p; + MSize len; + if (tviscdata(L->base+1)) { + CTState *cts = ctype_cts(L); + lj_cconv_ct_tv(cts, ctype_get(cts, CTID_P_CVOID), (uint8_t *)&p, + L->base+1, CCF_ARG(2)); + } else { + lj_err_argtype(L, 2, "cdata"); + } + len = (MSize)lj_lib_checkintrange(L, 3, 0, LJ_MAX_BUF); + lj_buf_putmem((SBuf *)sbx, p, len); + L->top = L->base+1; /* Chain buffer object. */ + return 1; +} + +LJLIB_CF(buffer_method_reserve) LJLIB_REC(.) +{ + SBufExt *sbx = buffer_tobufw(L); + MSize sz = (MSize)lj_lib_checkintrange(L, 2, 0, LJ_MAX_BUF); + GCcdata *cd; + lj_buf_more((SBuf *)sbx, sz); + ctype_loadffi(L); + cd = lj_cdata_new_(L, CTID_P_UINT8, CTSIZE_PTR); + *(void **)cdataptr(cd) = sbx->w; + setcdataV(L, L->top++, cd); + setintV(L->top++, sbufleft(sbx)); + return 2; +} + +LJLIB_CF(buffer_method_commit) LJLIB_REC(.) +{ + SBufExt *sbx = buffer_tobuf(L); + MSize len = (MSize)lj_lib_checkintrange(L, 2, 0, LJ_MAX_BUF); + if (len > sbufleft(sbx)) lj_err_arg(L, 2, LJ_ERR_NUMRNG); + sbx->w += len; + L->top = L->base+1; /* Chain buffer object. */ + return 1; +} + +LJLIB_CF(buffer_method_ref) LJLIB_REC(.) 
+{ + SBufExt *sbx = buffer_tobuf(L); + GCcdata *cd; + ctype_loadffi(L); + cd = lj_cdata_new_(L, CTID_P_UINT8, CTSIZE_PTR); + *(void **)cdataptr(cd) = sbx->r; + setcdataV(L, L->top++, cd); + setintV(L->top++, sbufxlen(sbx)); + return 2; +} +#endif + +LJLIB_CF(buffer_method_encode) LJLIB_REC(.) +{ + SBufExt *sbx = buffer_tobufw(L); + cTValue *o = lj_lib_checkany(L, 2); + lj_serialize_put(sbx, o); + lj_gc_check(L); + L->top = L->base+1; /* Chain buffer object. */ + return 1; +} + +LJLIB_CF(buffer_method_decode) LJLIB_REC(.) +{ + SBufExt *sbx = buffer_tobufw(L); + setnilV(L->top++); + sbx->r = lj_serialize_get(sbx, L->top-1); + lj_gc_check(L); + return 1; +} + +LJLIB_CF(buffer_method___gc) +{ + SBufExt *sbx = buffer_tobuf(L); + lj_bufx_free(L, sbx); + return 0; +} + +LJLIB_CF(buffer_method___tostring) LJLIB_REC(.) +{ + SBufExt *sbx = buffer_tobuf(L); + setstrV(L, L->top-1, lj_str_new(L, sbx->r, sbufxlen(sbx))); + lj_gc_check(L); + return 1; +} + +LJLIB_CF(buffer_method___len) LJLIB_REC(.) +{ + SBufExt *sbx = buffer_tobuf(L); + setintV(L->top-1, (int32_t)sbufxlen(sbx)); + return 1; +} + +LJLIB_PUSH("buffer") LJLIB_SET(__metatable) +LJLIB_PUSH(top-1) LJLIB_SET(__index) + +/* -- Buffer library functions -------------------------------------------- */ + +#define LJLIB_MODULE_buffer + +LJLIB_PUSH(top-2) LJLIB_SET(!) /* Set environment. 
*/ + +LJLIB_CF(buffer_new) +{ + MSize sz = 0; + int targ = 1; + GCtab *env, *dict_str = NULL, *dict_mt = NULL; + GCudata *ud; + SBufExt *sbx; + if (L->base < L->top && !tvistab(L->base)) { + targ = 2; + if (!tvisnil(L->base)) + sz = (MSize)lj_lib_checkintrange(L, 1, 0, LJ_MAX_BUF); + } + if (L->base+targ-1 < L->top) { + GCtab *options = lj_lib_checktab(L, targ); + cTValue *opt_dict, *opt_mt; + opt_dict = lj_tab_getstr(options, lj_str_newlit(L, "dict")); + if (opt_dict && tvistab(opt_dict)) { + dict_str = tabV(opt_dict); + lj_serialize_dict_prep_str(L, dict_str); + } + opt_mt = lj_tab_getstr(options, lj_str_newlit(L, "metatable")); + if (opt_mt && tvistab(opt_mt)) { + dict_mt = tabV(opt_mt); + lj_serialize_dict_prep_mt(L, dict_mt); + } + } + env = tabref(curr_func(L)->c.env); + ud = lj_udata_new(L, sizeof(SBufExt), env); + ud->udtype = UDTYPE_BUFFER; + /* NOBARRIER: The GCudata is new (marked white). */ + setgcref(ud->metatable, obj2gco(env)); + setudataV(L, L->top++, ud); + sbx = (SBufExt *)uddata(ud); + lj_bufx_init(L, sbx); + setgcref(sbx->dict_str, obj2gco(dict_str)); + setgcref(sbx->dict_mt, obj2gco(dict_mt)); + if (sz > 0) lj_buf_need2((SBuf *)sbx, sz); + lj_gc_check(L); + return 1; +} + +LJLIB_CF(buffer_encode) LJLIB_REC(.) +{ + cTValue *o = lj_lib_checkany(L, 1); + setstrV(L, L->top++, lj_serialize_encode(L, o)); + lj_gc_check(L); + return 1; +} + +LJLIB_CF(buffer_decode) LJLIB_REC(.) 
+{ + GCstr *str = lj_lib_checkstrx(L, 1); + setnilV(L->top++); + lj_serialize_decode(L, L->top-1, str); + lj_gc_check(L); + return 1; +} + +/* ------------------------------------------------------------------------ */ + +#include "lj_libdef.h" + +int luaopen_string_buffer(lua_State *L) +{ + LJ_LIB_REG(L, NULL, buffer_method); + lua_getfield(L, -1, "__tostring"); + lua_setfield(L, -2, "tostring"); + LJ_LIB_REG(L, NULL, buffer); + return 1; +} + +#endif diff --cc src/lj_asm_arm64.h index c537c514,00000000..34960d7c mode 100644,000000..100644 --- a/src/lj_asm_arm64.h +++ b/src/lj_asm_arm64.h @@@ -1,2070 -1,0 +1,2070 @@@ +/* +** ARM64 IR assembler (SSA IR -> machine code). - ** Copyright (C) 2005-2022 Mike Pall. See Copyright Notice in luajit.h ++** Copyright (C) 2005-2023 Mike Pall. See Copyright Notice in luajit.h +** +** Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com. +** Sponsored by Cisco Systems, Inc. +*/ + +/* -- Register allocator extensions --------------------------------------- */ + +/* Allocate a register with a hint. */ +static Reg ra_hintalloc(ASMState *as, IRRef ref, Reg hint, RegSet allow) +{ + Reg r = IR(ref)->r; + if (ra_noreg(r)) { + if (!ra_hashint(r) && !iscrossref(as, ref)) + ra_sethint(IR(ref)->r, hint); /* Propagate register hint. */ + r = ra_allocref(as, ref, allow); + } + ra_noweak(as, r); + return r; +} + +/* Allocate two source registers for three-operand instructions. 
*/ +static Reg ra_alloc2(ASMState *as, IRIns *ir, RegSet allow) +{ + IRIns *irl = IR(ir->op1), *irr = IR(ir->op2); + Reg left = irl->r, right = irr->r; + if (ra_hasreg(left)) { + ra_noweak(as, left); + if (ra_noreg(right)) + right = ra_allocref(as, ir->op2, rset_exclude(allow, left)); + else + ra_noweak(as, right); + } else if (ra_hasreg(right)) { + ra_noweak(as, right); + left = ra_allocref(as, ir->op1, rset_exclude(allow, right)); + } else if (ra_hashint(right)) { + right = ra_allocref(as, ir->op2, allow); + left = ra_alloc1(as, ir->op1, rset_exclude(allow, right)); + } else { + left = ra_allocref(as, ir->op1, allow); + right = ra_alloc1(as, ir->op2, rset_exclude(allow, left)); + } + return left | (right << 8); +} + +/* -- Guard handling ------------------------------------------------------ */ + +/* Setup all needed exit stubs. */ +static void asm_exitstub_setup(ASMState *as, ExitNo nexits) +{ + ExitNo i; + MCode *mxp = as->mctop; + if (mxp - (nexits + 3 + MCLIM_REDZONE) < as->mclim) + asm_mclimit(as); + /* 1: str lr,[sp]; bl ->vm_exit_handler; movz w0,traceno; bl <1; bl <1; ... */ + for (i = nexits-1; (int32_t)i >= 0; i--) + *--mxp = A64I_LE(A64I_BL | A64F_S26(-3-i)); + *--mxp = A64I_LE(A64I_MOVZw | A64F_U16(as->T->traceno)); + mxp--; + *mxp = A64I_LE(A64I_BL | A64F_S26(((MCode *)(void *)lj_vm_exit_handler-mxp))); + *--mxp = A64I_LE(A64I_STRx | A64F_D(RID_LR) | A64F_N(RID_SP)); + as->mctop = mxp; +} + +static MCode *asm_exitstub_addr(ASMState *as, ExitNo exitno) +{ + /* Keep this in-sync with exitstub_trace_addr(). */ + return as->mctop + exitno + 3; +} + +/* Emit conditional branch to exit for guard. 
*/ +static void asm_guardcc(ASMState *as, A64CC cc) +{ + MCode *target = asm_exitstub_addr(as, as->snapno); + MCode *p = as->mcp; + if (LJ_UNLIKELY(p == as->invmcp)) { + as->loopinv = 1; + *p = A64I_B | A64F_S26(target-p); + emit_cond_branch(as, cc^1, p-1); + return; + } + emit_cond_branch(as, cc, target); +} + +/* Emit test and branch instruction to exit for guard. */ +static void asm_guardtnb(ASMState *as, A64Ins ai, Reg r, uint32_t bit) +{ + MCode *target = asm_exitstub_addr(as, as->snapno); + MCode *p = as->mcp; + if (LJ_UNLIKELY(p == as->invmcp)) { + as->loopinv = 1; + *p = A64I_B | A64F_S26(target-p); + emit_tnb(as, ai^0x01000000u, r, bit, p-1); + return; + } + emit_tnb(as, ai, r, bit, target); +} + +/* Emit compare and branch instruction to exit for guard. */ +static void asm_guardcnb(ASMState *as, A64Ins ai, Reg r) +{ + MCode *target = asm_exitstub_addr(as, as->snapno); + MCode *p = as->mcp; + if (LJ_UNLIKELY(p == as->invmcp)) { + as->loopinv = 1; + *p = A64I_B | A64F_S26(target-p); + emit_cnb(as, ai^0x01000000u, r, p-1); + return; + } + emit_cnb(as, ai, r, target); +} + +/* -- Operand fusion ------------------------------------------------------ */ + +/* Limit linear search to this distance. Avoids O(n^2) behavior. */ +#define CONFLICT_SEARCH_LIM 31 + +static int asm_isk32(ASMState *as, IRRef ref, int32_t *k) +{ + if (irref_isk(ref)) { + IRIns *ir = IR(ref); + if (ir->o == IR_KNULL || !irt_is64(ir->t)) { + *k = ir->i; + return 1; + } else if (checki32((int64_t)ir_k64(ir)->u64)) { + *k = (int32_t)ir_k64(ir)->u64; + return 1; + } + } + return 0; +} + +/* Check if there's no conflicting instruction between curins and ref. */ +static int noconflict(ASMState *as, IRRef ref, IROp conflict) +{ + IRIns *ir = as->ir; + IRRef i = as->curins; + if (i > ref + CONFLICT_SEARCH_LIM) + return 0; /* Give up, ref is too far away. */ + while (--i > ref) + if (ir[i].o == conflict) + return 0; /* Conflict found. */ + return 1; /* Ok, no conflict. 
*/ +} + +/* Fuse the array base of colocated arrays. */ +static int32_t asm_fuseabase(ASMState *as, IRRef ref) +{ + IRIns *ir = IR(ref); + if (ir->o == IR_TNEW && ir->op1 <= LJ_MAX_COLOSIZE && + !neverfuse(as) && noconflict(as, ref, IR_NEWREF)) + return (int32_t)sizeof(GCtab); + return 0; +} + +#define FUSE_REG 0x40000000 + +/* Fuse array/hash/upvalue reference into register+offset operand. */ +static Reg asm_fuseahuref(ASMState *as, IRRef ref, int32_t *ofsp, RegSet allow, + A64Ins ins) +{ + IRIns *ir = IR(ref); + if (ra_noreg(ir->r)) { + if (ir->o == IR_AREF) { + if (mayfuse(as, ref)) { + if (irref_isk(ir->op2)) { + IRRef tab = IR(ir->op1)->op1; + int32_t ofs = asm_fuseabase(as, tab); + IRRef refa = ofs ? tab : ir->op1; + ofs += 8*IR(ir->op2)->i; + if (emit_checkofs(ins, ofs)) { + *ofsp = ofs; + return ra_alloc1(as, refa, allow); + } + } else { + Reg base = ra_alloc1(as, ir->op1, allow); + *ofsp = FUSE_REG|ra_alloc1(as, ir->op2, rset_exclude(allow, base)); + return base; + } + } + } else if (ir->o == IR_HREFK) { + if (mayfuse(as, ref)) { + int32_t ofs = (int32_t)(IR(ir->op2)->op2 * sizeof(Node)); + if (emit_checkofs(ins, ofs)) { + *ofsp = ofs; + return ra_alloc1(as, ir->op1, allow); + } + } + } else if (ir->o == IR_UREFC) { + if (irref_isk(ir->op1)) { + GCfunc *fn = ir_kfunc(IR(ir->op1)); + GCupval *uv = &gcref(fn->l.uvptr[(ir->op2 >> 8)])->uv; + int64_t ofs = glofs(as, &uv->tv); + if (emit_checkofs(ins, ofs)) { + *ofsp = (int32_t)ofs; + return RID_GL; + } + } + } else if (ir->o == IR_TMPREF) { + *ofsp = (int32_t)glofs(as, &J2G(as->J)->tmptv); + return RID_GL; + } + } + *ofsp = 0; + return ra_alloc1(as, ref, allow); +} + +/* Fuse m operand into arithmetic/logic instructions. 
*/ +static uint32_t asm_fuseopm(ASMState *as, A64Ins ai, IRRef ref, RegSet allow) +{ + IRIns *ir = IR(ref); + if (ra_hasreg(ir->r)) { + ra_noweak(as, ir->r); + return A64F_M(ir->r); + } else if (irref_isk(ref)) { + uint32_t m; + int64_t k = get_k64val(as, ref); + if ((ai & 0x1f000000) == 0x0a000000) + m = emit_isk13(k, irt_is64(ir->t)); + else + m = emit_isk12(k); + if (m) + return m; + } else if (mayfuse(as, ref)) { + if ((ir->o >= IR_BSHL && ir->o <= IR_BSAR && irref_isk(ir->op2)) || + (ir->o == IR_ADD && ir->op1 == ir->op2)) { + A64Shift sh = ir->o == IR_BSHR ? A64SH_LSR : + ir->o == IR_BSAR ? A64SH_ASR : A64SH_LSL; + int shift = ir->o == IR_ADD ? 1 : + (IR(ir->op2)->i & (irt_is64(ir->t) ? 63 : 31)); + IRIns *irl = IR(ir->op1); + if (sh == A64SH_LSL && + irl->o == IR_CONV && + irl->op2 == ((IRT_I64<op1, allow); + return A64F_M(m) | A64F_EXSH(A64EX_SXTW, shift); + } else { + Reg m = ra_alloc1(as, ir->op1, allow); + return A64F_M(m) | A64F_SH(sh, shift); + } + } else if (ir->o == IR_CONV && + ir->op2 == ((IRT_I64<op1, allow); + return A64F_M(m) | A64F_EX(A64EX_SXTW); + } + } + return A64F_M(ra_allocref(as, ref, allow)); +} + +/* Fuse XLOAD/XSTORE reference into load/store operand. 
*/ +static void asm_fusexref(ASMState *as, A64Ins ai, Reg rd, IRRef ref, + RegSet allow) +{ + IRIns *ir = IR(ref); + Reg base; + int32_t ofs = 0; + if (ra_noreg(ir->r) && canfuse(as, ir)) { + if (ir->o == IR_ADD) { + if (asm_isk32(as, ir->op2, &ofs) && emit_checkofs(ai, ofs)) { + ref = ir->op1; + } else { + Reg rn, rm; + IRRef lref = ir->op1, rref = ir->op2; + IRIns *irl = IR(lref); + if (mayfuse(as, irl->op1)) { + unsigned int shift = 4; + if (irl->o == IR_BSHL && irref_isk(irl->op2)) { + shift = (IR(irl->op2)->i & 63); + } else if (irl->o == IR_ADD && irl->op1 == irl->op2) { + shift = 1; + } + if ((ai >> 30) == shift) { + lref = irl->op1; + irl = IR(lref); + ai |= A64I_LS_SH; + } + } + if (irl->o == IR_CONV && + irl->op2 == ((IRT_I64<op1; + ai |= A64I_LS_SXTWx; + } else { + ai |= A64I_LS_LSLx; + } + rm = ra_alloc1(as, lref, allow); + rn = ra_alloc1(as, rref, rset_exclude(allow, rm)); + emit_dnm(as, (ai^A64I_LS_R), (rd & 31), rn, rm); + return; + } + } else if (ir->o == IR_STRREF) { + if (asm_isk32(as, ir->op2, &ofs)) { + ref = ir->op1; + } else if (asm_isk32(as, ir->op1, &ofs)) { + ref = ir->op2; + } else { + Reg refk = irref_isk(ir->op1) ? ir->op1 : ir->op2; + Reg refv = irref_isk(ir->op1) ? 
ir->op2 : ir->op1; + Reg rn = ra_alloc1(as, refv, allow); + IRIns *irr = IR(refk); + uint32_t m; + if (irr+1 == ir && !ra_used(irr) && + irr->o == IR_ADD && irref_isk(irr->op2)) { + ofs = sizeof(GCstr) + IR(irr->op2)->i; + if (emit_checkofs(ai, ofs)) { + Reg rm = ra_alloc1(as, irr->op1, rset_exclude(allow, rn)); + m = A64F_M(rm) | A64F_EX(A64EX_SXTW); + goto skipopm; + } + } + m = asm_fuseopm(as, 0, refk, rset_exclude(allow, rn)); + ofs = sizeof(GCstr); + skipopm: + emit_lso(as, ai, rd, rd, ofs); + emit_dn(as, A64I_ADDx^m, rd, rn); + return; + } + ofs += sizeof(GCstr); + if (!emit_checkofs(ai, ofs)) { + Reg rn = ra_alloc1(as, ref, allow); + Reg rm = ra_allock(as, ofs, rset_exclude(allow, rn)); + emit_dnm(as, (ai^A64I_LS_R)|A64I_LS_UXTWx, rd, rn, rm); + return; + } + } + } + base = ra_alloc1(as, ref, allow); + emit_lso(as, ai, (rd & 31), base, ofs); +} + +/* Fuse FP multiply-add/sub. */ +static int asm_fusemadd(ASMState *as, IRIns *ir, A64Ins ai, A64Ins air) +{ + IRRef lref = ir->op1, rref = ir->op2; + IRIns *irm; + if ((as->flags & JIT_F_OPT_FMA) && + lref != rref && + ((mayfuse(as, lref) && (irm = IR(lref), irm->o == IR_MUL) && + ra_noreg(irm->r)) || + (mayfuse(as, rref) && (irm = IR(rref), irm->o == IR_MUL) && + (rref = lref, ai = air, ra_noreg(irm->r))))) { + Reg dest = ra_dest(as, ir, RSET_FPR); + Reg add = ra_hintalloc(as, rref, dest, RSET_FPR); + Reg left = ra_alloc2(as, irm, + rset_exclude(rset_exclude(RSET_FPR, dest), add)); + Reg right = (left >> 8); left &= 255; + emit_dnma(as, ai, (dest & 31), (left & 31), (right & 31), (add & 31)); + return 1; + } + return 0; +} + +/* Fuse BAND + BSHL/BSHR into UBFM. */ +static int asm_fuseandshift(ASMState *as, IRIns *ir) +{ + IRIns *irl = IR(ir->op1); + lj_assertA(ir->o == IR_BAND, "bad usage"); + if (canfuse(as, irl) && irref_isk(ir->op2)) { + uint64_t mask = get_k64val(as, ir->op2); + if (irref_isk(irl->op2) && (irl->o == IR_BSHR || irl->o == IR_BSHL)) { + int32_t shmask = irt_is64(irl->t) ? 
63 : 31; + int32_t shift = (IR(irl->op2)->i & shmask); + int32_t imms = shift; + if (irl->o == IR_BSHL) { + mask >>= shift; + shift = (shmask-shift+1) & shmask; + imms = 0; + } + if (mask && !((mask+1) & mask)) { /* Contiguous 1-bits at the bottom. */ + Reg dest = ra_dest(as, ir, RSET_GPR); + Reg left = ra_alloc1(as, irl->op1, RSET_GPR); + A64Ins ai = shmask == 63 ? A64I_UBFMx : A64I_UBFMw; + imms += 63 - emit_clz64(mask); + if (imms > shmask) imms = shmask; + emit_dn(as, ai | A64F_IMMS(imms) | A64F_IMMR(shift), dest, left); + return 1; + } + } + } + return 0; +} + +/* Fuse BOR(BSHL, BSHR) into EXTR/ROR. */ +static int asm_fuseorshift(ASMState *as, IRIns *ir) +{ + IRIns *irl = IR(ir->op1), *irr = IR(ir->op2); + lj_assertA(ir->o == IR_BOR, "bad usage"); + if (canfuse(as, irl) && canfuse(as, irr) && + ((irl->o == IR_BSHR && irr->o == IR_BSHL) || + (irl->o == IR_BSHL && irr->o == IR_BSHR))) { + if (irref_isk(irl->op2) && irref_isk(irr->op2)) { + IRRef lref = irl->op1, rref = irr->op1; + uint32_t lshift = IR(irl->op2)->i, rshift = IR(irr->op2)->i; + if (irl->o == IR_BSHR) { /* BSHR needs to be the right operand. */ + uint32_t tmp2; + IRRef tmp1 = lref; lref = rref; rref = tmp1; + tmp2 = lshift; lshift = rshift; rshift = tmp2; + } + if (rshift + lshift == (irt_is64(ir->t) ? 64 : 32)) { + A64Ins ai = irt_is64(ir->t) ? A64I_EXTRx : A64I_EXTRw; + Reg dest = ra_dest(as, ir, RSET_GPR); + Reg left = ra_alloc1(as, lref, RSET_GPR); + Reg right = ra_alloc1(as, rref, rset_exclude(RSET_GPR, left)); + emit_dnm(as, ai | A64F_IMMS(rshift), dest, left, right); + return 1; + } + } + } + return 0; +} + +/* -- Calls --------------------------------------------------------------- */ + +/* Generate a call to a C function. 
*/ +static void asm_gencall(ASMState *as, const CCallInfo *ci, IRRef *args) +{ + uint32_t n, nargs = CCI_XNARGS(ci); + int32_t ofs = 0; + Reg gpr, fpr = REGARG_FIRSTFPR; + if (ci->func) + emit_call(as, ci->func); + for (gpr = REGARG_FIRSTGPR; gpr <= REGARG_LASTGPR; gpr++) + as->cost[gpr] = REGCOST(~0u, ASMREF_L); + gpr = REGARG_FIRSTGPR; + for (n = 0; n < nargs; n++) { /* Setup args. */ + IRRef ref = args[n]; + IRIns *ir = IR(ref); + if (ref) { + if (irt_isfp(ir->t)) { + if (fpr <= REGARG_LASTFPR) { + lj_assertA(rset_test(as->freeset, fpr), + "reg %d not free", fpr); /* Must have been evicted. */ + ra_leftov(as, fpr, ref); + fpr++; + } else { + Reg r = ra_alloc1(as, ref, RSET_FPR); + emit_spstore(as, ir, r, ofs + ((LJ_BE && !irt_isnum(ir->t)) ? 4 : 0)); + ofs += 8; + } + } else { + if (gpr <= REGARG_LASTGPR) { + lj_assertA(rset_test(as->freeset, gpr), + "reg %d not free", gpr); /* Must have been evicted. */ + ra_leftov(as, gpr, ref); + gpr++; + } else { + Reg r = ra_alloc1(as, ref, RSET_GPR); + emit_spstore(as, ir, r, ofs + ((LJ_BE && !irt_is64(ir->t)) ? 4 : 0)); + ofs += 8; + } + } + } + } +} + +/* Setup result reg/sp for call. Evict scratch regs. */ +static void asm_setupresult(ASMState *as, IRIns *ir, const CCallInfo *ci) +{ + RegSet drop = RSET_SCRATCH; + int hiop = ((ir+1)->o == IR_HIOP && !irt_isnil((ir+1)->t)); + if (ra_hasreg(ir->r)) + rset_clear(drop, ir->r); /* Dest reg handled below. */ + if (hiop && ra_hasreg((ir+1)->r)) + rset_clear(drop, (ir+1)->r); /* Dest reg handled below. */ + ra_evictset(as, drop); /* Evictions must be performed first. */ + if (ra_used(ir)) { + lj_assertA(!irt_ispri(ir->t), "PRI dest"); + if (irt_isfp(ir->t)) { + if (ci->flags & CCI_CASTU64) { + Reg dest = ra_dest(as, ir, RSET_FPR) & 31; + emit_dn(as, irt_isnum(ir->t) ? 
A64I_FMOV_D_R : A64I_FMOV_S_R, + dest, RID_RET); + } else { + ra_destreg(as, ir, RID_FPRET); + } + } else if (hiop) { + ra_destpair(as, ir); + } else { + ra_destreg(as, ir, RID_RET); + } + } + UNUSED(ci); +} + +static void asm_callx(ASMState *as, IRIns *ir) +{ + IRRef args[CCI_NARGS_MAX*2]; + CCallInfo ci; + IRRef func; + IRIns *irf; + ci.flags = asm_callx_flags(as, ir); + asm_collectargs(as, ir, &ci, args); + asm_setupresult(as, ir, &ci); + func = ir->op2; irf = IR(func); + if (irf->o == IR_CARG) { func = irf->op1; irf = IR(func); } + if (irref_isk(func)) { /* Call to constant address. */ + ci.func = (ASMFunction)(ir_k64(irf)->u64); + } else { /* Need a non-argument register for indirect calls. */ + Reg freg = ra_alloc1(as, func, RSET_RANGE(RID_X8, RID_MAX_GPR)-RSET_FIXED); + emit_n(as, A64I_BLR_AUTH, freg); + ci.func = (ASMFunction)(void *)0; + } + asm_gencall(as, &ci, args); +} + +/* -- Returns ------------------------------------------------------------- */ + +/* Return to lower frame. Guard that it goes to the right spot. */ +static void asm_retf(ASMState *as, IRIns *ir) +{ + Reg base = ra_alloc1(as, REF_BASE, RSET_GPR); + void *pc = ir_kptr(IR(ir->op2)); + int32_t delta = 1+LJ_FR2+bc_a(*((const BCIns *)pc - 1)); + as->topslot -= (BCReg)delta; + if ((int32_t)as->topslot < 0) as->topslot = 0; + irt_setmark(IR(REF_BASE)->t); /* Children must not coalesce with BASE reg. */ + /* Need to force a spill on REF_BASE now to update the stack slot. 
*/ + emit_lso(as, A64I_STRx, base, RID_SP, ra_spill(as, IR(REF_BASE))); + emit_setgl(as, base, jit_base); + emit_addptr(as, base, -8*delta); + asm_guardcc(as, CC_NE); + emit_nm(as, A64I_CMPx, RID_TMP, + ra_allock(as, i64ptr(pc), rset_exclude(RSET_GPR, base))); + emit_lso(as, A64I_LDRx, RID_TMP, base, -8); +} + +/* -- Buffer operations --------------------------------------------------- */ + +#if LJ_HASBUFFER +static void asm_bufhdr_write(ASMState *as, Reg sb) +{ + Reg tmp = ra_scratch(as, rset_exclude(RSET_GPR, sb)); + IRIns irgc; + irgc.ot = IRT(0, IRT_PGC); /* GC type. */ + emit_storeofs(as, &irgc, RID_TMP, sb, offsetof(SBuf, L)); + emit_dn(as, A64I_BFMx | A64F_IMMS(lj_fls(SBUF_MASK_FLAG)) | A64F_IMMR(0), RID_TMP, tmp); + emit_getgl(as, RID_TMP, cur_L); + emit_loadofs(as, &irgc, tmp, sb, offsetof(SBuf, L)); +} +#endif + +/* -- Type conversions ---------------------------------------------------- */ + +static void asm_tointg(ASMState *as, IRIns *ir, Reg left) +{ + Reg tmp = ra_scratch(as, rset_exclude(RSET_FPR, left)); + Reg dest = ra_dest(as, ir, RSET_GPR); + asm_guardcc(as, CC_NE); + emit_nm(as, A64I_FCMPd, (tmp & 31), (left & 31)); + emit_dn(as, A64I_FCVT_F64_S32, (tmp & 31), dest); + emit_dn(as, A64I_FCVT_S32_F64, dest, (left & 31)); +} + +static void asm_tobit(ASMState *as, IRIns *ir) +{ + RegSet allow = RSET_FPR; + Reg left = ra_alloc1(as, ir->op1, allow); + Reg right = ra_alloc1(as, ir->op2, rset_clear(allow, left)); + Reg tmp = ra_scratch(as, rset_clear(allow, right)); + Reg dest = ra_dest(as, ir, RSET_GPR); + emit_dn(as, A64I_FMOV_R_S, dest, (tmp & 31)); + emit_dnm(as, A64I_FADDd, (tmp & 31), (left & 31), (right & 31)); +} + +static void asm_conv(ASMState *as, IRIns *ir) +{ + IRType st = (IRType)(ir->op2 & IRCONV_SRCMASK); + int st64 = (st == IRT_I64 || st == IRT_U64 || st == IRT_P64); + int stfp = (st == IRT_NUM || st == IRT_FLOAT); + IRRef lref = ir->op1; + lj_assertA(irt_type(ir->t) != st, "inconsistent types for CONV"); + if (irt_isfp(ir->t)) { + Reg 
dest = ra_dest(as, ir, RSET_FPR); + if (stfp) { /* FP to FP conversion. */ + emit_dn(as, st == IRT_NUM ? A64I_FCVT_F32_F64 : A64I_FCVT_F64_F32, + (dest & 31), (ra_alloc1(as, lref, RSET_FPR) & 31)); + } else { /* Integer to FP conversion. */ + Reg left = ra_alloc1(as, lref, RSET_GPR); + A64Ins ai = irt_isfloat(ir->t) ? + (((IRT_IS64 >> st) & 1) ? + (st == IRT_I64 ? A64I_FCVT_F32_S64 : A64I_FCVT_F32_U64) : + (st == IRT_INT ? A64I_FCVT_F32_S32 : A64I_FCVT_F32_U32)) : + (((IRT_IS64 >> st) & 1) ? + (st == IRT_I64 ? A64I_FCVT_F64_S64 : A64I_FCVT_F64_U64) : + (st == IRT_INT ? A64I_FCVT_F64_S32 : A64I_FCVT_F64_U32)); + emit_dn(as, ai, (dest & 31), left); + } + } else if (stfp) { /* FP to integer conversion. */ + if (irt_isguard(ir->t)) { + /* Checked conversions are only supported from number to int. */ + lj_assertA(irt_isint(ir->t) && st == IRT_NUM, + "bad type for checked CONV"); + asm_tointg(as, ir, ra_alloc1(as, lref, RSET_FPR)); + } else { + Reg left = ra_alloc1(as, lref, RSET_FPR); + Reg dest = ra_dest(as, ir, RSET_GPR); + A64Ins ai = irt_is64(ir->t) ? + (st == IRT_NUM ? + (irt_isi64(ir->t) ? A64I_FCVT_S64_F64 : A64I_FCVT_U64_F64) : + (irt_isi64(ir->t) ? A64I_FCVT_S64_F32 : A64I_FCVT_U64_F32)) : + (st == IRT_NUM ? + (irt_isint(ir->t) ? A64I_FCVT_S32_F64 : A64I_FCVT_U32_F64) : + (irt_isint(ir->t) ? A64I_FCVT_S32_F32 : A64I_FCVT_U32_F32)); + emit_dn(as, ai, dest, (left & 31)); + } + } else if (st >= IRT_I8 && st <= IRT_U16) { /* Extend to 32 bit integer. */ + Reg dest = ra_dest(as, ir, RSET_GPR); + Reg left = ra_alloc1(as, lref, RSET_GPR); + A64Ins ai = st == IRT_I8 ? A64I_SXTBw : + st == IRT_U8 ? A64I_UXTBw : + st == IRT_I16 ? A64I_SXTHw : A64I_UXTHw; + lj_assertA(irt_isint(ir->t) || irt_isu32(ir->t), "bad type for CONV EXT"); + emit_dn(as, ai, dest, left); + } else { + Reg dest = ra_dest(as, ir, RSET_GPR); + if (irt_is64(ir->t)) { + if (st64 || !(ir->op2 & IRCONV_SEXT)) { + /* 64/64 bit no-op (cast) or 32 to 64 bit zero extension. 
*/ + ra_leftov(as, dest, lref); /* Do nothing, but may need to move regs. */ + } else { /* 32 to 64 bit sign extension. */ + Reg left = ra_alloc1(as, lref, RSET_GPR); + emit_dn(as, A64I_SXTW, dest, left); + } + } else { + if (st64 && !(ir->op2 & IRCONV_NONE)) { + /* This is either a 32 bit reg/reg mov which zeroes the hiword + ** or a load of the loword from a 64 bit address. + */ + Reg left = ra_alloc1(as, lref, RSET_GPR); + emit_dm(as, A64I_MOVw, dest, left); + } else { /* 32/32 bit no-op (cast). */ + ra_leftov(as, dest, lref); /* Do nothing, but may need to move regs. */ + } + } + } +} + +static void asm_strto(ASMState *as, IRIns *ir) +{ + const CCallInfo *ci = &lj_ir_callinfo[IRCALL_lj_strscan_num]; + IRRef args[2]; + Reg dest = 0, tmp; + int destused = ra_used(ir); + int32_t ofs = 0; + ra_evictset(as, RSET_SCRATCH); + if (destused) { + if (ra_hasspill(ir->s)) { + ofs = sps_scale(ir->s); + destused = 0; + if (ra_hasreg(ir->r)) { + ra_free(as, ir->r); + ra_modified(as, ir->r); + emit_spload(as, ir, ir->r, ofs); + } + } else { + dest = ra_dest(as, ir, RSET_FPR); + } + } + if (destused) + emit_lso(as, A64I_LDRd, (dest & 31), RID_SP, 0); + asm_guardcnb(as, A64I_CBZ, RID_RET); + args[0] = ir->op1; /* GCstr *str */ + args[1] = ASMREF_TMP1; /* TValue *n */ + asm_gencall(as, ci, args); + tmp = ra_releasetmp(as, ASMREF_TMP1); + emit_opk(as, A64I_ADDx, tmp, RID_SP, ofs, RSET_GPR); +} + +/* -- Memory references --------------------------------------------------- */ + +/* Store tagged value for ref at base+ofs. 
*/ +static void asm_tvstore64(ASMState *as, Reg base, int32_t ofs, IRRef ref) +{ + RegSet allow = rset_exclude(RSET_GPR, base); + IRIns *ir = IR(ref); + lj_assertA(irt_ispri(ir->t) || irt_isaddr(ir->t) || irt_isinteger(ir->t), + "store of IR type %d", irt_type(ir->t)); + if (irref_isk(ref)) { + TValue k; + lj_ir_kvalue(as->J->L, &k, ir); + emit_lso(as, A64I_STRx, ra_allock(as, k.u64, allow), base, ofs); + } else { + Reg src = ra_alloc1(as, ref, allow); + rset_clear(allow, src); + if (irt_isinteger(ir->t)) { + Reg type = ra_allock(as, (int64_t)irt_toitype(ir->t) << 47, allow); + emit_lso(as, A64I_STRx, RID_TMP, base, ofs); + emit_dnm(as, A64I_ADDx | A64F_EX(A64EX_UXTW), RID_TMP, type, src); + } else { + Reg type = ra_allock(as, (int32_t)irt_toitype(ir->t), allow); + emit_lso(as, A64I_STRx, RID_TMP, base, ofs); + emit_dnm(as, A64I_ADDx | A64F_SH(A64SH_LSL, 47), RID_TMP, src, type); + } + } +} + +/* Get pointer to TValue. */ +static void asm_tvptr(ASMState *as, Reg dest, IRRef ref, MSize mode) +{ + if ((mode & IRTMPREF_IN1)) { + IRIns *ir = IR(ref); + if (irt_isnum(ir->t)) { + if (irref_isk(ref) && !(mode & IRTMPREF_OUT1)) { + /* Use the number constant itself as a TValue. */ + ra_allockreg(as, i64ptr(ir_knum(ir)), dest); + return; + } + emit_lso(as, A64I_STRd, (ra_alloc1(as, ref, RSET_FPR) & 31), dest, 0); + } else { + asm_tvstore64(as, dest, 0, ref); + } + } + /* g->tmptv holds the TValue(s). */ + emit_dn(as, A64I_ADDx^emit_isk12(glofs(as, &J2G(as->J)->tmptv)), dest, RID_GL); +} + +static void asm_aref(ASMState *as, IRIns *ir) +{ + Reg dest = ra_dest(as, ir, RSET_GPR); + Reg idx, base; + if (irref_isk(ir->op2)) { + IRRef tab = IR(ir->op1)->op1; + int32_t ofs = asm_fuseabase(as, tab); + IRRef refa = ofs ? 
tab : ir->op1; + uint32_t k = emit_isk12(ofs + 8*IR(ir->op2)->i); + if (k) { + base = ra_alloc1(as, refa, RSET_GPR); + emit_dn(as, A64I_ADDx^k, dest, base); + return; + } + } + base = ra_alloc1(as, ir->op1, RSET_GPR); + idx = ra_alloc1(as, ir->op2, rset_exclude(RSET_GPR, base)); + emit_dnm(as, A64I_ADDx | A64F_EXSH(A64EX_UXTW, 3), dest, base, idx); +} + +/* Inlined hash lookup. Specialized for key type and for const keys. +** The equivalent C code is: +** Node *n = hashkey(t, key); +** do { +** if (lj_obj_equal(&n->key, key)) return &n->val; +** } while ((n = nextnode(n))); +** return niltv(L); +*/ +static void asm_href(ASMState *as, IRIns *ir, IROp merge) +{ + RegSet allow = RSET_GPR; + int destused = ra_used(ir); + Reg dest = ra_dest(as, ir, allow); + Reg tab = ra_alloc1(as, ir->op1, rset_clear(allow, dest)); + Reg key = 0, tmp = RID_TMP; + Reg ftmp = RID_NONE, type = RID_NONE, scr = RID_NONE, tisnum = RID_NONE; + IRRef refkey = ir->op2; + IRIns *irkey = IR(refkey); + int isk = irref_isk(ir->op2); + IRType1 kt = irkey->t; + uint32_t k = 0; + uint32_t khash; + MCLabel l_end, l_loop, l_next; + rset_clear(allow, tab); + + if (!isk) { + key = ra_alloc1(as, ir->op2, irt_isnum(kt) ? RSET_FPR : allow); + rset_clear(allow, key); + if (!irt_isstr(kt)) { + tmp = ra_scratch(as, allow); + rset_clear(allow, tmp); + } + } else if (irt_isnum(kt)) { + int64_t val = (int64_t)ir_knum(irkey)->u64; + if (!(k = emit_isk12(val))) { + key = ra_allock(as, val, allow); + rset_clear(allow, key); + } + } else if (!irt_ispri(kt)) { + if (!(k = emit_isk12(irkey->i))) { + key = ra_alloc1(as, refkey, allow); + rset_clear(allow, key); + } + } + + /* Allocate constants early. 
*/ + if (irt_isnum(kt)) { + if (!isk) { + tisnum = ra_allock(as, LJ_TISNUM << 15, allow); + ftmp = ra_scratch(as, rset_exclude(RSET_FPR, key)); + rset_clear(allow, tisnum); + } + } else if (irt_isaddr(kt)) { + if (isk) { + int64_t kk = ((int64_t)irt_toitype(kt) << 47) | irkey[1].tv.u64; + scr = ra_allock(as, kk, allow); + } else { + scr = ra_scratch(as, allow); + } + rset_clear(allow, scr); + } else { + lj_assertA(irt_ispri(kt) && !irt_isnil(kt), "bad HREF key type"); + type = ra_allock(as, ~((int64_t)~irt_toitype(kt) << 47), allow); + scr = ra_scratch(as, rset_clear(allow, type)); + rset_clear(allow, scr); + } + + /* Key not found in chain: jump to exit (if merged) or load niltv. */ + l_end = emit_label(as); + as->invmcp = NULL; + if (merge == IR_NE) + asm_guardcc(as, CC_AL); + else if (destused) + emit_loada(as, dest, niltvg(J2G(as->J))); + + /* Follow hash chain until the end. */ + l_loop = --as->mcp; + emit_n(as, A64I_CMPx^A64I_K12^0, dest); + emit_lso(as, A64I_LDRx, dest, dest, offsetof(Node, next)); + l_next = emit_label(as); + + /* Type and value comparison. */ + if (merge == IR_EQ) + asm_guardcc(as, CC_EQ); + else + emit_cond_branch(as, CC_EQ, l_end); + + if (irt_isnum(kt)) { + if (isk) { + /* Assumes -0.0 is already canonicalized to +0.0. 
*/ + if (k) + emit_n(as, A64I_CMPx^k, tmp); + else + emit_nm(as, A64I_CMPx, key, tmp); + emit_lso(as, A64I_LDRx, tmp, dest, offsetof(Node, key.u64)); + } else { + emit_nm(as, A64I_FCMPd, key, ftmp); + emit_dn(as, A64I_FMOV_D_R, (ftmp & 31), (tmp & 31)); + emit_cond_branch(as, CC_LO, l_next); + emit_nm(as, A64I_CMPx | A64F_SH(A64SH_LSR, 32), tisnum, tmp); + emit_lso(as, A64I_LDRx, tmp, dest, offsetof(Node, key.n)); + } + } else if (irt_isaddr(kt)) { + if (isk) { + emit_nm(as, A64I_CMPx, scr, tmp); + emit_lso(as, A64I_LDRx, tmp, dest, offsetof(Node, key.u64)); + } else { + emit_nm(as, A64I_CMPx, tmp, scr); + emit_lso(as, A64I_LDRx, scr, dest, offsetof(Node, key.u64)); + } + } else { + emit_nm(as, A64I_CMPx, scr, type); + emit_lso(as, A64I_LDRx, scr, dest, offsetof(Node, key)); + } + + *l_loop = A64I_BCC | A64F_S19(as->mcp - l_loop) | CC_NE; + if (!isk && irt_isaddr(kt)) { + type = ra_allock(as, (int32_t)irt_toitype(kt), allow); + emit_dnm(as, A64I_ADDx | A64F_SH(A64SH_LSL, 47), tmp, key, type); + rset_clear(allow, type); + } + /* Load main position relative to tab->node into dest. */ + khash = isk ? ir_khash(as, irkey) : 1; + if (khash == 0) { + emit_lso(as, A64I_LDRx, dest, tab, offsetof(GCtab, node)); + } else { + emit_dnm(as, A64I_ADDx | A64F_SH(A64SH_LSL, 3), dest, tmp, dest); + emit_dnm(as, A64I_ADDx | A64F_SH(A64SH_LSL, 1), dest, dest, dest); + emit_lso(as, A64I_LDRx, tmp, tab, offsetof(GCtab, node)); + if (isk) { + Reg tmphash = ra_allock(as, khash, allow); + emit_dnm(as, A64I_ANDw, dest, dest, tmphash); + emit_lso(as, A64I_LDRw, dest, tab, offsetof(GCtab, hmask)); + } else if (irt_isstr(kt)) { + /* Fetch of str->sid is cheaper than ra_allock. */ + emit_dnm(as, A64I_ANDw, dest, dest, tmp); + emit_lso(as, A64I_LDRw, tmp, key, offsetof(GCstr, sid)); + emit_lso(as, A64I_LDRw, dest, tab, offsetof(GCtab, hmask)); + } else { /* Must match with hash*() in lj_tab.c. 
*/ + emit_dnm(as, A64I_ANDw, dest, dest, tmp); + emit_lso(as, A64I_LDRw, tmp, tab, offsetof(GCtab, hmask)); + emit_dnm(as, A64I_SUBw, dest, dest, tmp); + emit_dnm(as, A64I_EXTRw | (A64F_IMMS(32-HASH_ROT3)), tmp, tmp, tmp); + emit_dnm(as, A64I_EORw, dest, dest, tmp); + emit_dnm(as, A64I_EXTRw | (A64F_IMMS(32-HASH_ROT2)), dest, dest, dest); + emit_dnm(as, A64I_SUBw, tmp, tmp, dest); + emit_dnm(as, A64I_EXTRw | (A64F_IMMS(32-HASH_ROT1)), dest, dest, dest); + emit_dnm(as, A64I_EORw, tmp, tmp, dest); + if (irt_isnum(kt)) { + emit_dnm(as, A64I_ADDw, dest, dest, dest); + emit_dn(as, A64I_LSRx | A64F_IMMR(32)|A64F_IMMS(32), dest, dest); + emit_dm(as, A64I_MOVw, tmp, dest); + emit_dn(as, A64I_FMOV_R_D, dest, (key & 31)); + } else { + checkmclim(as); + emit_dm(as, A64I_MOVw, tmp, key); + emit_dnm(as, A64I_EORw, dest, dest, + ra_allock(as, irt_toitype(kt) << 15, allow)); + emit_dn(as, A64I_LSRx | A64F_IMMR(32)|A64F_IMMS(32), dest, dest); + emit_dm(as, A64I_MOVx, dest, key); + } + } + } +} + +static void asm_hrefk(ASMState *as, IRIns *ir) +{ + IRIns *kslot = IR(ir->op2); + IRIns *irkey = IR(kslot->op1); + int32_t ofs = (int32_t)(kslot->op2 * sizeof(Node)); + int32_t kofs = ofs + (int32_t)offsetof(Node, key); + int bigofs = !emit_checkofs(A64I_LDRx, kofs); + Reg dest = (ra_used(ir) || bigofs) ? 
ra_dest(as, ir, RSET_GPR) : RID_NONE; + Reg node = ra_alloc1(as, ir->op1, RSET_GPR); + Reg key, idx = node; + RegSet allow = rset_exclude(RSET_GPR, node); + uint64_t k; + lj_assertA(ofs % sizeof(Node) == 0, "unaligned HREFK slot"); + if (bigofs) { + idx = dest; + rset_clear(allow, dest); + kofs = (int32_t)offsetof(Node, key); + } else if (ra_hasreg(dest)) { + emit_opk(as, A64I_ADDx, dest, node, ofs, allow); + } + asm_guardcc(as, CC_NE); + if (irt_ispri(irkey->t)) { + k = ~((int64_t)~irt_toitype(irkey->t) << 47); + } else if (irt_isnum(irkey->t)) { + k = ir_knum(irkey)->u64; + } else { + k = ((uint64_t)irt_toitype(irkey->t) << 47) | (uint64_t)ir_kgc(irkey); + } + key = ra_scratch(as, allow); + emit_nm(as, A64I_CMPx, key, ra_allock(as, k, rset_exclude(allow, key))); + emit_lso(as, A64I_LDRx, key, idx, kofs); + if (bigofs) + emit_opk(as, A64I_ADDx, dest, node, ofs, rset_exclude(RSET_GPR, node)); +} + +static void asm_uref(ASMState *as, IRIns *ir) +{ + Reg dest = ra_dest(as, ir, RSET_GPR); + if (irref_isk(ir->op1)) { + GCfunc *fn = ir_kfunc(IR(ir->op1)); + MRef *v = &gcref(fn->l.uvptr[(ir->op2 >> 8)])->uv.v; + emit_lsptr(as, A64I_LDRx, dest, v); + } else { + Reg uv = ra_scratch(as, RSET_GPR); + Reg func = ra_alloc1(as, ir->op1, RSET_GPR); + if (ir->o == IR_UREFC) { + asm_guardcc(as, CC_NE); + emit_n(as, (A64I_CMPx^A64I_K12) | A64F_U12(1), RID_TMP); + emit_opk(as, A64I_ADDx, dest, uv, + (int32_t)offsetof(GCupval, tv), RSET_GPR); + emit_lso(as, A64I_LDRB, RID_TMP, uv, (int32_t)offsetof(GCupval, closed)); + } else { + emit_lso(as, A64I_LDRx, dest, uv, (int32_t)offsetof(GCupval, v)); + } + emit_lso(as, A64I_LDRx, uv, func, + (int32_t)offsetof(GCfuncL, uvptr) + 8*(int32_t)(ir->op2 >> 8)); + } +} + +static void asm_fref(ASMState *as, IRIns *ir) +{ + UNUSED(as); UNUSED(ir); + lj_assertA(!ra_used(ir), "unfused FREF"); +} + +static void asm_strref(ASMState *as, IRIns *ir) +{ + RegSet allow = RSET_GPR; + Reg dest = ra_dest(as, ir, allow); + Reg base = ra_alloc1(as, ir->op1, 
allow); + IRIns *irr = IR(ir->op2); + int32_t ofs = sizeof(GCstr); + uint32_t m; + rset_clear(allow, base); + if (irref_isk(ir->op2) && (m = emit_isk12(ofs + irr->i))) { + emit_dn(as, A64I_ADDx^m, dest, base); + } else { + emit_dn(as, (A64I_ADDx^A64I_K12) | A64F_U12(ofs), dest, dest); + emit_dnm(as, A64I_ADDx, dest, base, ra_alloc1(as, ir->op2, allow)); + } +} + +/* -- Loads and stores ---------------------------------------------------- */ + +static A64Ins asm_fxloadins(IRIns *ir) +{ + switch (irt_type(ir->t)) { + case IRT_I8: return A64I_LDRB ^ A64I_LS_S; + case IRT_U8: return A64I_LDRB; + case IRT_I16: return A64I_LDRH ^ A64I_LS_S; + case IRT_U16: return A64I_LDRH; + case IRT_NUM: return A64I_LDRd; + case IRT_FLOAT: return A64I_LDRs; + default: return irt_is64(ir->t) ? A64I_LDRx : A64I_LDRw; + } +} + +static A64Ins asm_fxstoreins(IRIns *ir) +{ + switch (irt_type(ir->t)) { + case IRT_I8: case IRT_U8: return A64I_STRB; + case IRT_I16: case IRT_U16: return A64I_STRH; + case IRT_NUM: return A64I_STRd; + case IRT_FLOAT: return A64I_STRs; + default: return irt_is64(ir->t) ? A64I_STRx : A64I_STRw; + } +} + +static void asm_fload(ASMState *as, IRIns *ir) +{ + Reg dest = ra_dest(as, ir, RSET_GPR); + Reg idx; + A64Ins ai = asm_fxloadins(ir); + int32_t ofs; + if (ir->op1 == REF_NIL) { /* FLOAD from GG_State with offset. */ + idx = RID_GL; + ofs = (ir->op2 << 2) - GG_OFS(g); + } else { + idx = ra_alloc1(as, ir->op1, RSET_GPR); + if (ir->op2 == IRFL_TAB_ARRAY) { + ofs = asm_fuseabase(as, ir->op1); + if (ofs) { /* Turn the t->array load into an add for colocated arrays. 
*/ + emit_dn(as, (A64I_ADDx^A64I_K12) | A64F_U12(ofs), dest, idx); + return; + } + } + ofs = field_ofs[ir->op2]; + } + emit_lso(as, ai, (dest & 31), idx, ofs); +} + +static void asm_fstore(ASMState *as, IRIns *ir) +{ + if (ir->r != RID_SINK) { + Reg src = ra_alloc1(as, ir->op2, RSET_GPR); + IRIns *irf = IR(ir->op1); + Reg idx = ra_alloc1(as, irf->op1, rset_exclude(RSET_GPR, src)); + int32_t ofs = field_ofs[irf->op2]; + emit_lso(as, asm_fxstoreins(ir), (src & 31), idx, ofs); + } +} + +static void asm_xload(ASMState *as, IRIns *ir) +{ + Reg dest = ra_dest(as, ir, irt_isfp(ir->t) ? RSET_FPR : RSET_GPR); + lj_assertA(!(ir->op2 & IRXLOAD_UNALIGNED), "unaligned XLOAD"); + asm_fusexref(as, asm_fxloadins(ir), dest, ir->op1, RSET_GPR); +} + +static void asm_xstore(ASMState *as, IRIns *ir) +{ + if (ir->r != RID_SINK) { + Reg src = ra_alloc1(as, ir->op2, irt_isfp(ir->t) ? RSET_FPR : RSET_GPR); + asm_fusexref(as, asm_fxstoreins(ir), src, ir->op1, + rset_exclude(RSET_GPR, src)); + } +} + +static void asm_ahuvload(ASMState *as, IRIns *ir) +{ + Reg idx, tmp, type; + int32_t ofs = 0; + RegSet gpr = RSET_GPR, allow = irt_isnum(ir->t) ? RSET_FPR : RSET_GPR; + lj_assertA(irt_isnum(ir->t) || irt_ispri(ir->t) || irt_isaddr(ir->t) || + irt_isint(ir->t), + "bad load type %d", irt_type(ir->t)); + if (ra_used(ir)) { + Reg dest = ra_dest(as, ir, allow); + tmp = irt_isnum(ir->t) ? ra_scratch(as, rset_clear(gpr, dest)) : dest; + if (irt_isaddr(ir->t)) { + emit_dn(as, A64I_ANDx^emit_isk13(LJ_GCVMASK, 1), dest, dest); + } else if (irt_isnum(ir->t)) { + emit_dn(as, A64I_FMOV_D_R, (dest & 31), tmp); + } else if (irt_isint(ir->t)) { + emit_dm(as, A64I_MOVw, dest, dest); + } + } else { + tmp = ra_scratch(as, gpr); + } + type = ra_scratch(as, rset_clear(gpr, tmp)); + idx = asm_fuseahuref(as, ir->op1, &ofs, rset_clear(gpr, type), A64I_LDRx); + if (ir->o == IR_VLOAD) ofs += 8 * ir->op2; + /* Always do the type check, even if the load result is unused. */ + asm_guardcc(as, irt_isnum(ir->t) ? 
CC_LS : CC_NE); + if (irt_type(ir->t) >= IRT_NUM) { + lj_assertA(irt_isinteger(ir->t) || irt_isnum(ir->t), + "bad load type %d", irt_type(ir->t)); + emit_nm(as, A64I_CMPx | A64F_SH(A64SH_LSR, 32), + ra_allock(as, LJ_TISNUM << 15, rset_exclude(gpr, idx)), tmp); + } else if (irt_isaddr(ir->t)) { + emit_n(as, (A64I_CMNx^A64I_K12) | A64F_U12(-irt_toitype(ir->t)), type); + emit_dn(as, A64I_ASRx | A64F_IMMR(47), type, tmp); + } else if (irt_isnil(ir->t)) { + emit_n(as, (A64I_CMNx^A64I_K12) | A64F_U12(1), tmp); + } else { + emit_nm(as, A64I_CMPx | A64F_SH(A64SH_LSR, 32), + ra_allock(as, (irt_toitype(ir->t) << 15) | 0x7fff, gpr), tmp); + } + if (ofs & FUSE_REG) + emit_dnm(as, (A64I_LDRx^A64I_LS_R)|A64I_LS_UXTWx|A64I_LS_SH, tmp, idx, (ofs & 31)); + else + emit_lso(as, A64I_LDRx, tmp, idx, ofs); +} + +static void asm_ahustore(ASMState *as, IRIns *ir) +{ + if (ir->r != RID_SINK) { + RegSet allow = RSET_GPR; + Reg idx, src = RID_NONE, tmp = RID_TMP, type = RID_NONE; + int32_t ofs = 0; + if (irt_isnum(ir->t)) { + src = ra_alloc1(as, ir->op2, RSET_FPR); + idx = asm_fuseahuref(as, ir->op1, &ofs, allow, A64I_STRd); + if (ofs & FUSE_REG) + emit_dnm(as, (A64I_STRd^A64I_LS_R)|A64I_LS_UXTWx|A64I_LS_SH, (src & 31), idx, (ofs &31)); + else + emit_lso(as, A64I_STRd, (src & 31), idx, ofs); + } else { + if (!irt_ispri(ir->t)) { + src = ra_alloc1(as, ir->op2, allow); + rset_clear(allow, src); + if (irt_isinteger(ir->t)) + type = ra_allock(as, (uint64_t)(int32_t)LJ_TISNUM << 47, allow); + else + type = ra_allock(as, irt_toitype(ir->t), allow); + } else { + tmp = type = ra_allock(as, ~((int64_t)~irt_toitype(ir->t)<<47), allow); + } + idx = asm_fuseahuref(as, ir->op1, &ofs, rset_exclude(allow, type), + A64I_STRx); + if (ofs & FUSE_REG) + emit_dnm(as, (A64I_STRx^A64I_LS_R)|A64I_LS_UXTWx|A64I_LS_SH, tmp, idx, (ofs & 31)); + else + emit_lso(as, A64I_STRx, tmp, idx, ofs); + if (ra_hasreg(src)) { + if (irt_isinteger(ir->t)) { + emit_dnm(as, A64I_ADDx | A64F_EX(A64EX_UXTW), tmp, type, src); + } else 
{ + emit_dnm(as, A64I_ADDx | A64F_SH(A64SH_LSL, 47), tmp, src, type); + } + } + } + } +} + +static void asm_sload(ASMState *as, IRIns *ir) +{ + int32_t ofs = 8*((int32_t)ir->op1-2); + IRType1 t = ir->t; + Reg dest = RID_NONE, base; + RegSet allow = RSET_GPR; + lj_assertA(!(ir->op2 & IRSLOAD_PARENT), + "bad parent SLOAD"); /* Handled by asm_head_side(). */ + lj_assertA(irt_isguard(t) || !(ir->op2 & IRSLOAD_TYPECHECK), + "inconsistent SLOAD variant"); + if ((ir->op2 & IRSLOAD_CONVERT) && irt_isguard(t) && irt_isint(t)) { + dest = ra_scratch(as, RSET_FPR); + asm_tointg(as, ir, dest); + t.irt = IRT_NUM; /* Continue with a regular number type check. */ + } else if (ra_used(ir)) { + Reg tmp = RID_NONE; + if ((ir->op2 & IRSLOAD_CONVERT)) + tmp = ra_scratch(as, irt_isint(t) ? RSET_FPR : RSET_GPR); + lj_assertA((irt_isnum(t)) || irt_isint(t) || irt_isaddr(t), + "bad SLOAD type %d", irt_type(t)); + dest = ra_dest(as, ir, irt_isnum(t) ? RSET_FPR : allow); + base = ra_alloc1(as, REF_BASE, rset_clear(allow, dest)); + if (irt_isaddr(t)) { + emit_dn(as, A64I_ANDx^emit_isk13(LJ_GCVMASK, 1), dest, dest); + } else if ((ir->op2 & IRSLOAD_CONVERT)) { + if (irt_isint(t)) { + emit_dn(as, A64I_FCVT_S32_F64, dest, (tmp & 31)); + /* If value is already loaded for type check, move it to FPR. */ + if ((ir->op2 & IRSLOAD_TYPECHECK)) + emit_dn(as, A64I_FMOV_D_R, (tmp & 31), dest); + else + dest = tmp; + t.irt = IRT_NUM; /* Check for original type. */ + } else { + emit_dn(as, A64I_FCVT_F64_S32, (dest & 31), tmp); + dest = tmp; + t.irt = IRT_INT; /* Check for original type. 
*/ + } + } else if (irt_isint(t) && (ir->op2 & IRSLOAD_TYPECHECK)) { + emit_dm(as, A64I_MOVw, dest, dest); + } + goto dotypecheck; + } + base = ra_alloc1(as, REF_BASE, allow); +dotypecheck: + rset_clear(allow, base); + if ((ir->op2 & IRSLOAD_TYPECHECK)) { + Reg tmp; + if (ra_hasreg(dest) && rset_test(RSET_GPR, dest)) { + tmp = dest; + } else { + tmp = ra_scratch(as, allow); + rset_clear(allow, tmp); + } + if (ra_hasreg(dest) && tmp != dest) + emit_dn(as, A64I_FMOV_D_R, (dest & 31), tmp); + /* Need type check, even if the load result is unused. */ + asm_guardcc(as, irt_isnum(t) ? CC_LS : CC_NE); + if (irt_type(t) >= IRT_NUM) { + lj_assertA(irt_isinteger(t) || irt_isnum(t), + "bad SLOAD type %d", irt_type(t)); + emit_nm(as, A64I_CMPx | A64F_SH(A64SH_LSR, 32), + ra_allock(as, (ir->op2 & IRSLOAD_KEYINDEX) ? LJ_KEYINDEX : (LJ_TISNUM << 15), allow), tmp); + } else if (irt_isnil(t)) { + emit_n(as, (A64I_CMNx^A64I_K12) | A64F_U12(1), tmp); + } else if (irt_ispri(t)) { + emit_nm(as, A64I_CMPx, + ra_allock(as, ~((int64_t)~irt_toitype(t) << 47) , allow), tmp); + } else { + Reg type = ra_scratch(as, allow); + emit_n(as, (A64I_CMNx^A64I_K12) | A64F_U12(-irt_toitype(t)), type); + emit_dn(as, A64I_ASRx | A64F_IMMR(47), type, tmp); + } + emit_lso(as, A64I_LDRx, tmp, base, ofs); + return; + } + if (ra_hasreg(dest)) { + emit_lso(as, irt_isnum(t) ? A64I_LDRd : + (irt_isint(t) ? A64I_LDRw : A64I_LDRx), (dest & 31), base, + ofs ^ ((LJ_BE && irt_isint(t) ? 
4 : 0))); + } +} + +/* -- Allocations --------------------------------------------------------- */ + +#if LJ_HASFFI +static void asm_cnew(ASMState *as, IRIns *ir) +{ + CTState *cts = ctype_ctsG(J2G(as->J)); + CTypeID id = (CTypeID)IR(ir->op1)->i; + CTSize sz; + CTInfo info = lj_ctype_info(cts, id, &sz); + const CCallInfo *ci = &lj_ir_callinfo[IRCALL_lj_mem_newgco]; + IRRef args[4]; + RegSet allow = (RSET_GPR & ~RSET_SCRATCH); + lj_assertA(sz != CTSIZE_INVALID || (ir->o == IR_CNEW && ir->op2 != REF_NIL), + "bad CNEW/CNEWI operands"); + + as->gcsteps++; + asm_setupresult(as, ir, ci); /* GCcdata * */ + /* Initialize immutable cdata object. */ + if (ir->o == IR_CNEWI) { + int32_t ofs = sizeof(GCcdata); + Reg r = ra_alloc1(as, ir->op2, allow); + lj_assertA(sz == 4 || sz == 8, "bad CNEWI size %d", sz); + emit_lso(as, sz == 8 ? A64I_STRx : A64I_STRw, r, RID_RET, ofs); + } else if (ir->op2 != REF_NIL) { /* Create VLA/VLS/aligned cdata. */ + ci = &lj_ir_callinfo[IRCALL_lj_cdata_newv]; + args[0] = ASMREF_L; /* lua_State *L */ + args[1] = ir->op1; /* CTypeID id */ + args[2] = ir->op2; /* CTSize sz */ + args[3] = ASMREF_TMP1; /* CTSize align */ + asm_gencall(as, ci, args); + emit_loadi(as, ra_releasetmp(as, ASMREF_TMP1), (int32_t)ctype_align(info)); + return; + } + + /* Initialize gct and ctypeid. lj_mem_newgco() already sets marked. */ + { + Reg r = (id < 65536) ? 
RID_X1 : ra_allock(as, id, allow); + emit_lso(as, A64I_STRB, RID_TMP, RID_RET, offsetof(GCcdata, gct)); + emit_lso(as, A64I_STRH, r, RID_RET, offsetof(GCcdata, ctypeid)); + emit_d(as, A64I_MOVZw | A64F_U16(~LJ_TCDATA), RID_TMP); + if (id < 65536) emit_d(as, A64I_MOVZw | A64F_U16(id), RID_X1); + } + args[0] = ASMREF_L; /* lua_State *L */ + args[1] = ASMREF_TMP1; /* MSize size */ + asm_gencall(as, ci, args); + ra_allockreg(as, (int32_t)(sz+sizeof(GCcdata)), + ra_releasetmp(as, ASMREF_TMP1)); +} +#endif + +/* -- Write barriers ------------------------------------------------------ */ + +static void asm_tbar(ASMState *as, IRIns *ir) +{ + Reg tab = ra_alloc1(as, ir->op1, RSET_GPR); + Reg link = ra_scratch(as, rset_exclude(RSET_GPR, tab)); + Reg mark = RID_TMP; + MCLabel l_end = emit_label(as); + emit_lso(as, A64I_STRx, link, tab, (int32_t)offsetof(GCtab, gclist)); + emit_lso(as, A64I_STRB, mark, tab, (int32_t)offsetof(GCtab, marked)); + emit_setgl(as, tab, gc.grayagain); + emit_dn(as, A64I_ANDw^emit_isk13(~LJ_GC_BLACK, 0), mark, mark); + emit_getgl(as, link, gc.grayagain); + emit_cond_branch(as, CC_EQ, l_end); + emit_n(as, A64I_TSTw^emit_isk13(LJ_GC_BLACK, 0), mark); + emit_lso(as, A64I_LDRB, mark, tab, (int32_t)offsetof(GCtab, marked)); +} + +static void asm_obar(ASMState *as, IRIns *ir) +{ + const CCallInfo *ci = &lj_ir_callinfo[IRCALL_lj_gc_barrieruv]; + IRRef args[2]; + MCLabel l_end; + RegSet allow = RSET_GPR; + Reg obj, val, tmp; + /* No need for other object barriers (yet). 
*/ + lj_assertA(IR(ir->op1)->o == IR_UREFC, "bad OBAR type"); + ra_evictset(as, RSET_SCRATCH); + l_end = emit_label(as); + args[0] = ASMREF_TMP1; /* global_State *g */ + args[1] = ir->op1; /* TValue *tv */ + asm_gencall(as, ci, args); + emit_dm(as, A64I_MOVx, ra_releasetmp(as, ASMREF_TMP1), RID_GL); + obj = IR(ir->op1)->r; + tmp = ra_scratch(as, rset_exclude(allow, obj)); + emit_cond_branch(as, CC_EQ, l_end); + emit_n(as, A64I_TSTw^emit_isk13(LJ_GC_BLACK, 0), tmp); + emit_cond_branch(as, CC_EQ, l_end); + emit_n(as, A64I_TSTw^emit_isk13(LJ_GC_WHITES, 0), RID_TMP); + val = ra_alloc1(as, ir->op2, rset_exclude(RSET_GPR, obj)); + emit_lso(as, A64I_LDRB, tmp, obj, + (int32_t)offsetof(GCupval, marked)-(int32_t)offsetof(GCupval, tv)); + emit_lso(as, A64I_LDRB, RID_TMP, val, (int32_t)offsetof(GChead, marked)); +} + +/* -- Arithmetic and logic operations ------------------------------------- */ + +static void asm_fparith(ASMState *as, IRIns *ir, A64Ins ai) +{ + Reg dest = ra_dest(as, ir, RSET_FPR); + Reg right, left = ra_alloc2(as, ir, RSET_FPR); + right = (left >> 8); left &= 255; + emit_dnm(as, ai, (dest & 31), (left & 31), (right & 31)); +} + +static void asm_fpunary(ASMState *as, IRIns *ir, A64Ins ai) +{ + Reg dest = ra_dest(as, ir, RSET_FPR); + Reg left = ra_hintalloc(as, ir->op1, dest, RSET_FPR); + emit_dn(as, ai, (dest & 31), (left & 31)); +} + +static void asm_fpmath(ASMState *as, IRIns *ir) +{ + IRFPMathOp fpm = (IRFPMathOp)ir->op2; + if (fpm == IRFPM_SQRT) { + asm_fpunary(as, ir, A64I_FSQRTd); + } else if (fpm <= IRFPM_TRUNC) { + asm_fpunary(as, ir, fpm == IRFPM_FLOOR ? A64I_FRINTMd : + fpm == IRFPM_CEIL ? A64I_FRINTPd : A64I_FRINTZd); + } else { + asm_callid(as, ir, IRCALL_lj_vm_floor + fpm); + } +} + +static int asm_swapops(ASMState *as, IRRef lref, IRRef rref) +{ + IRIns *ir; + if (irref_isk(rref)) + return 0; /* Don't swap constants to the left. */ + if (irref_isk(lref)) + return 1; /* But swap constants to the right. 
*/ + ir = IR(rref); + if ((ir->o >= IR_BSHL && ir->o <= IR_BSAR) || + (ir->o == IR_ADD && ir->op1 == ir->op2) || + (ir->o == IR_CONV && ir->op2 == ((IRT_I64<<IRCONV_DSH)|IRT_INT|IRCONV_SEXT))) + return 0; /* Don't swap fusable operands to the left. */ + ir = IR(lref); + if ((ir->o >= IR_BSHL && ir->o <= IR_BSAR) || + (ir->o == IR_ADD && ir->op1 == ir->op2) || + (ir->o == IR_CONV && ir->op2 == ((IRT_I64<<IRCONV_DSH)|IRT_INT|IRCONV_SEXT))) + return 1; /* But swap fusable operands to the right. */ + return 0; /* Otherwise don't swap. */ +} + +static void asm_intop(ASMState *as, IRIns *ir, A64Ins ai) +{ + IRRef lref = ir->op1, rref = ir->op2; + Reg left, dest = ra_dest(as, ir, RSET_GPR); + uint32_t m; + if ((ai & ~A64I_S) != A64I_SUBw && asm_swapops(as, lref, rref)) { + IRRef tmp = lref; lref = rref; rref = tmp; + } + left = ra_hintalloc(as, lref, dest, RSET_GPR); + if (irt_is64(ir->t)) ai |= A64I_X; + m = asm_fuseopm(as, ai, rref, rset_exclude(RSET_GPR, left)); + if (irt_isguard(ir->t)) { /* For IR_ADDOV etc. */ + asm_guardcc(as, CC_VS); + ai |= A64I_S; + } + emit_dn(as, ai^m, dest, left); +} + +static void asm_intop_s(ASMState *as, IRIns *ir, A64Ins ai) +{ + if (as->flagmcp == as->mcp) { /* Drop cmp r, #0. */ + as->flagmcp = NULL; + as->mcp++; + ai |= A64I_S; + } + asm_intop(as, ir, ai); +} + +static void asm_intneg(ASMState *as, IRIns *ir) +{ + Reg dest = ra_dest(as, ir, RSET_GPR); + Reg left = ra_hintalloc(as, ir->op1, dest, RSET_GPR); + emit_dm(as, irt_is64(ir->t) ? A64I_NEGx : A64I_NEGw, dest, left); +} + +/* NYI: use add/shift for MUL(OV) with constants. FOLD only does 2^k. */ +static void asm_intmul(ASMState *as, IRIns *ir) +{ + Reg dest = ra_dest(as, ir, RSET_GPR); + Reg left = ra_alloc1(as, ir->op1, rset_exclude(RSET_GPR, dest)); + Reg right = ra_alloc1(as, ir->op2, rset_exclude(RSET_GPR, left)); + if (irt_isguard(ir->t)) { /* IR_MULOV */ + asm_guardcc(as, CC_NE); + emit_dm(as, A64I_MOVw, dest, dest); /* Zero-extend. */ + emit_nm(as, A64I_CMPw | A64F_SH(A64SH_ASR, 31), RID_TMP, dest); + emit_dn(as, A64I_ASRx | A64F_IMMR(32), RID_TMP, dest); + emit_dnm(as, A64I_SMULL, dest, right, left); + } else { + emit_dnm(as, irt_is64(ir->t) ?
A64I_MULx : A64I_MULw, dest, left, right); + } +} + +static void asm_add(ASMState *as, IRIns *ir) +{ + if (irt_isnum(ir->t)) { + if (!asm_fusemadd(as, ir, A64I_FMADDd, A64I_FMADDd)) + asm_fparith(as, ir, A64I_FADDd); + return; + } + asm_intop_s(as, ir, A64I_ADDw); +} + +static void asm_sub(ASMState *as, IRIns *ir) +{ + if (irt_isnum(ir->t)) { + if (!asm_fusemadd(as, ir, A64I_FNMSUBd, A64I_FMSUBd)) + asm_fparith(as, ir, A64I_FSUBd); + return; + } + asm_intop_s(as, ir, A64I_SUBw); +} + +static void asm_mul(ASMState *as, IRIns *ir) +{ + if (irt_isnum(ir->t)) { + asm_fparith(as, ir, A64I_FMULd); + return; + } + asm_intmul(as, ir); +} + +#define asm_addov(as, ir) asm_add(as, ir) +#define asm_subov(as, ir) asm_sub(as, ir) +#define asm_mulov(as, ir) asm_mul(as, ir) + +#define asm_fpdiv(as, ir) asm_fparith(as, ir, A64I_FDIVd) +#define asm_abs(as, ir) asm_fpunary(as, ir, A64I_FABS) + +static void asm_neg(ASMState *as, IRIns *ir) +{ + if (irt_isnum(ir->t)) { + asm_fpunary(as, ir, A64I_FNEGd); + return; + } + asm_intneg(as, ir); +} + +static void asm_band(ASMState *as, IRIns *ir) +{ + A64Ins ai = A64I_ANDw; + if (asm_fuseandshift(as, ir)) + return; + if (as->flagmcp == as->mcp) { + /* Try to drop cmp r, #0. 
*/ + as->flagmcp = NULL; + as->mcp++; + ai = A64I_ANDSw; + } + asm_intop(as, ir, ai); +} + +static void asm_borbxor(ASMState *as, IRIns *ir, A64Ins ai) +{ + IRRef lref = ir->op1, rref = ir->op2; + IRIns *irl = IR(lref), *irr = IR(rref); + if ((canfuse(as, irl) && irl->o == IR_BNOT && !irref_isk(rref)) || + (canfuse(as, irr) && irr->o == IR_BNOT && !irref_isk(lref))) { + Reg left, dest = ra_dest(as, ir, RSET_GPR); + uint32_t m; + if (irl->o == IR_BNOT) { + IRRef tmp = lref; lref = rref; rref = tmp; + } + left = ra_alloc1(as, lref, RSET_GPR); + ai |= A64I_ON; + if (irt_is64(ir->t)) ai |= A64I_X; + m = asm_fuseopm(as, ai, IR(rref)->op1, rset_exclude(RSET_GPR, left)); + emit_dn(as, ai^m, dest, left); + } else { + asm_intop(as, ir, ai); + } +} + +static void asm_bor(ASMState *as, IRIns *ir) +{ + if (asm_fuseorshift(as, ir)) + return; + asm_borbxor(as, ir, A64I_ORRw); +} + +#define asm_bxor(as, ir) asm_borbxor(as, ir, A64I_EORw) + +static void asm_bnot(ASMState *as, IRIns *ir) +{ + A64Ins ai = A64I_MVNw; + Reg dest = ra_dest(as, ir, RSET_GPR); + uint32_t m = asm_fuseopm(as, ai, ir->op1, RSET_GPR); + if (irt_is64(ir->t)) ai |= A64I_X; + emit_d(as, ai^m, dest); +} + +static void asm_bswap(ASMState *as, IRIns *ir) +{ + Reg dest = ra_dest(as, ir, RSET_GPR); + Reg left = ra_alloc1(as, ir->op1, RSET_GPR); + emit_dn(as, irt_is64(ir->t) ? A64I_REVx : A64I_REVw, dest, left); +} + +static void asm_bitshift(ASMState *as, IRIns *ir, A64Ins ai, A64Shift sh) +{ + int32_t shmask = irt_is64(ir->t) ? 63 : 31; + if (irref_isk(ir->op2)) { /* Constant shifts. */ + Reg left, dest = ra_dest(as, ir, RSET_GPR); + int32_t shift = (IR(ir->op2)->i & shmask); + IRIns *irl = IR(ir->op1); + if (shmask == 63) ai += A64I_UBFMx - A64I_UBFMw; + + /* Fuse BSHL + BSHR/BSAR into UBFM/SBFM aka UBFX/SBFX/UBFIZ/SBFIZ. 
*/ + if ((sh == A64SH_LSR || sh == A64SH_ASR) && canfuse(as, irl)) { + if (irl->o == IR_BSHL && irref_isk(irl->op2)) { + int32_t shift2 = (IR(irl->op2)->i & shmask); + shift = ((shift - shift2) & shmask); + shmask -= shift2; + ir = irl; + } + } + + left = ra_alloc1(as, ir->op1, RSET_GPR); + switch (sh) { + case A64SH_LSL: + emit_dn(as, ai | A64F_IMMS(shmask-shift) | + A64F_IMMR((shmask-shift+1)&shmask), dest, left); + break; + case A64SH_LSR: case A64SH_ASR: + emit_dn(as, ai | A64F_IMMS(shmask) | A64F_IMMR(shift), dest, left); + break; + case A64SH_ROR: + emit_dnm(as, ai | A64F_IMMS(shift), dest, left, left); + break; + } + } else { /* Variable-length shifts. */ + Reg dest = ra_dest(as, ir, RSET_GPR); + Reg left = ra_alloc1(as, ir->op1, RSET_GPR); + Reg right = ra_alloc1(as, ir->op2, rset_exclude(RSET_GPR, left)); + emit_dnm(as, (shmask == 63 ? A64I_SHRx : A64I_SHRw) | A64F_BSH(sh), dest, left, right); + } +} + +#define asm_bshl(as, ir) asm_bitshift(as, ir, A64I_UBFMw, A64SH_LSL) +#define asm_bshr(as, ir) asm_bitshift(as, ir, A64I_UBFMw, A64SH_LSR) +#define asm_bsar(as, ir) asm_bitshift(as, ir, A64I_SBFMw, A64SH_ASR) +#define asm_bror(as, ir) asm_bitshift(as, ir, A64I_EXTRw, A64SH_ROR) +#define asm_brol(as, ir) lj_assertA(0, "unexpected BROL") + +static void asm_intmin_max(ASMState *as, IRIns *ir, A64CC cc) +{ + Reg dest = ra_dest(as, ir, RSET_GPR); + Reg left = ra_hintalloc(as, ir->op1, dest, RSET_GPR); + Reg right = ra_alloc1(as, ir->op2, rset_exclude(RSET_GPR, left)); + emit_dnm(as, A64I_CSELw|A64F_CC(cc), dest, left, right); + emit_nm(as, A64I_CMPw, left, right); +} + +static void asm_fpmin_max(ASMState *as, IRIns *ir, A64CC fcc) +{ + Reg dest = (ra_dest(as, ir, RSET_FPR) & 31); + Reg right, left = ra_alloc2(as, ir, RSET_FPR); + right = ((left >> 8) & 31); left &= 31; + emit_dnm(as, A64I_FCSELd | A64F_CC(fcc), dest, right, left); + emit_nm(as, A64I_FCMPd, left, right); +} + +static void asm_min_max(ASMState *as, IRIns *ir, A64CC cc, A64CC fcc) +{ + if 
(irt_isnum(ir->t)) + asm_fpmin_max(as, ir, fcc); + else + asm_intmin_max(as, ir, cc); +} + +#define asm_min(as, ir) asm_min_max(as, ir, CC_LT, CC_PL) +#define asm_max(as, ir) asm_min_max(as, ir, CC_GT, CC_LE) + +/* -- Comparisons --------------------------------------------------------- */ + +/* Map of comparisons to flags. ORDER IR. */ +static const uint8_t asm_compmap[IR_ABC+1] = { + /* op FP swp int cc FP cc */ + /* LT */ CC_GE + (CC_HS << 4), + /* GE x */ CC_LT + (CC_HI << 4), + /* LE */ CC_GT + (CC_HI << 4), + /* GT x */ CC_LE + (CC_HS << 4), + /* ULT x */ CC_HS + (CC_LS << 4), + /* UGE */ CC_LO + (CC_LO << 4), + /* ULE x */ CC_HI + (CC_LO << 4), + /* UGT */ CC_LS + (CC_LS << 4), + /* EQ */ CC_NE + (CC_NE << 4), + /* NE */ CC_EQ + (CC_EQ << 4), + /* ABC */ CC_LS + (CC_LS << 4) /* Same as UGT. */ +}; + +/* FP comparisons. */ +static void asm_fpcomp(ASMState *as, IRIns *ir) +{ + Reg left, right; + A64Ins ai; + int swp = ((ir->o ^ (ir->o >> 2)) & ~(ir->o >> 3) & 1); + if (!swp && irref_isk(ir->op2) && ir_knum(IR(ir->op2))->u64 == 0) { + left = (ra_alloc1(as, ir->op1, RSET_FPR) & 31); + right = 0; + ai = A64I_FCMPZd; + } else { + left = ra_alloc2(as, ir, RSET_FPR); + if (swp) { + right = (left & 31); left = ((left >> 8) & 31); + } else { + right = ((left >> 8) & 31); left &= 31; + } + ai = A64I_FCMPd; + } + asm_guardcc(as, (asm_compmap[ir->o] >> 4)); + emit_nm(as, ai, left, right); +} + +/* Integer comparisons. */ +static void asm_intcomp(ASMState *as, IRIns *ir) +{ + A64CC oldcc, cc = (asm_compmap[ir->o] & 15); + A64Ins ai = irt_is64(ir->t) ? 
A64I_CMPx : A64I_CMPw; + IRRef lref = ir->op1, rref = ir->op2; + Reg left; + uint32_t m; + int cmpprev0 = 0; + lj_assertA(irt_is64(ir->t) || irt_isint(ir->t) || + irt_isu32(ir->t) || irt_isaddr(ir->t) || irt_isu8(ir->t), + "bad comparison data type %d", irt_type(ir->t)); + if (asm_swapops(as, lref, rref)) { + IRRef tmp = lref; lref = rref; rref = tmp; + if (cc >= CC_GE) cc ^= 7; /* LT <-> GT, LE <-> GE */ + else if (cc > CC_NE) cc ^= 11; /* LO <-> HI, LS <-> HS */ + } + oldcc = cc; + if (irref_isk(rref) && get_k64val(as, rref) == 0) { + IRIns *irl = IR(lref); + if (cc == CC_GE) cc = CC_PL; + else if (cc == CC_LT) cc = CC_MI; + else if (cc > CC_NE) goto nocombine; /* Other conds don't work with tst. */ + cmpprev0 = (irl+1 == ir); + /* Combine and-cmp-bcc into tbz/tbnz or and-cmp into tst. */ + if (cmpprev0 && irl->o == IR_BAND && !ra_used(irl)) { + IRRef blref = irl->op1, brref = irl->op2; + uint32_t m2 = 0; + Reg bleft; + if (asm_swapops(as, blref, brref)) { + Reg tmp = blref; blref = brref; brref = tmp; + } + if (irref_isk(brref)) { + uint64_t k = get_k64val(as, brref); + if (k && !(k & (k-1)) && (cc == CC_EQ || cc == CC_NE)) { + asm_guardtnb(as, cc == CC_EQ ? A64I_TBZ : A64I_TBNZ, + ra_alloc1(as, blref, RSET_GPR), emit_ctz64(k)); + return; + } + m2 = emit_isk13(k, irt_is64(irl->t)); + } + bleft = ra_alloc1(as, blref, RSET_GPR); + ai = (irt_is64(irl->t) ? A64I_TSTx : A64I_TSTw); + if (!m2) + m2 = asm_fuseopm(as, ai, brref, rset_exclude(RSET_GPR, bleft)); + asm_guardcc(as, cc); + emit_n(as, ai^m2, bleft); + return; + } + if (cc == CC_EQ || cc == CC_NE) { + /* Combine cmp-bcc into cbz/cbnz. */ + ai = cc == CC_EQ ? A64I_CBZ : A64I_CBNZ; + if (irt_is64(ir->t)) ai |= A64I_X; + asm_guardcnb(as, ai, ra_alloc1(as, lref, RSET_GPR)); + return; + } + } +nocombine: + left = ra_alloc1(as, lref, RSET_GPR); + m = asm_fuseopm(as, ai, rref, rset_exclude(RSET_GPR, left)); + asm_guardcc(as, cc); + emit_n(as, ai^m, left); + /* Signed comparison with zero and referencing previous ins? 
*/ + if (cmpprev0 && (oldcc <= CC_NE || oldcc >= CC_GE)) + as->flagmcp = as->mcp; /* Allow elimination of the compare. */ +} + +static void asm_comp(ASMState *as, IRIns *ir) +{ + if (irt_isnum(ir->t)) + asm_fpcomp(as, ir); + else + asm_intcomp(as, ir); +} + +#define asm_equal(as, ir) asm_comp(as, ir) + +/* -- Split register ops -------------------------------------------------- */ + +/* Hiword op of a split 64/64 bit op. Previous op is the loword op. */ +static void asm_hiop(ASMState *as, IRIns *ir) +{ + /* HIOP is marked as a store because it needs its own DCE logic. */ + int uselo = ra_used(ir-1), usehi = ra_used(ir); /* Loword/hiword used? */ + if (LJ_UNLIKELY(!(as->flags & JIT_F_OPT_DCE))) uselo = usehi = 1; + if (!usehi) return; /* Skip unused hiword op for all remaining ops. */ + switch ((ir-1)->o) { + case IR_CALLN: + case IR_CALLL: + case IR_CALLS: + case IR_CALLXS: + if (!uselo) + ra_allocref(as, ir->op1, RID2RSET(RID_RETLO)); /* Mark lo op as used. */ + break; + default: lj_assertA(0, "bad HIOP for op %d", (ir-1)->o); break; + } +} + +/* -- Profiling ----------------------------------------------------------- */ + +static void asm_prof(ASMState *as, IRIns *ir) +{ + uint32_t k = emit_isk13(HOOK_PROFILE, 0); + lj_assertA(k != 0, "HOOK_PROFILE does not fit in K13"); + UNUSED(ir); + asm_guardcc(as, CC_NE); + emit_n(as, A64I_TSTw^k, RID_TMP); + emit_lsptr(as, A64I_LDRB, RID_TMP, (void *)&J2G(as->J)->hookmask); +} + +/* -- Stack handling ------------------------------------------------------ */ + +/* Check Lua stack size for overflow. Use exit handler as fallback. */ +static void asm_stack_check(ASMState *as, BCReg topslot, + IRIns *irp, RegSet allow, ExitNo exitno) +{ + Reg pbase; + uint32_t k; + if (irp) { + if (!ra_hasspill(irp->s)) { + pbase = irp->r; + lj_assertA(ra_hasreg(pbase), "base reg lost"); + } else if (allow) { + pbase = rset_pickbot(allow); + } else { + pbase = RID_RET; + emit_lso(as, A64I_LDRx, RID_RET, RID_SP, 0); /* Restore temp register. 
*/ + } + } else { + pbase = RID_BASE; + } + emit_cond_branch(as, CC_LS, asm_exitstub_addr(as, exitno)); + k = emit_isk12((8*topslot)); + lj_assertA(k, "slot offset %d does not fit in K12", 8*topslot); + emit_n(as, A64I_CMPx^k, RID_TMP); + emit_dnm(as, A64I_SUBx, RID_TMP, RID_TMP, pbase); + emit_lso(as, A64I_LDRx, RID_TMP, RID_TMP, + (int32_t)offsetof(lua_State, maxstack)); + if (irp) { /* Must not spill arbitrary registers in head of side trace. */ + if (ra_hasspill(irp->s)) + emit_lso(as, A64I_LDRx, pbase, RID_SP, sps_scale(irp->s)); + emit_lso(as, A64I_LDRx, RID_TMP, RID_GL, glofs(as, &J2G(as->J)->cur_L)); + if (ra_hasspill(irp->s) && !allow) + emit_lso(as, A64I_STRx, RID_RET, RID_SP, 0); /* Save temp register. */ + } else { + emit_getgl(as, RID_TMP, cur_L); + } +} + +/* Restore Lua stack from on-trace state. */ +static void asm_stack_restore(ASMState *as, SnapShot *snap) +{ + SnapEntry *map = &as->T->snapmap[snap->mapofs]; +#ifdef LUA_USE_ASSERT + SnapEntry *flinks = &as->T->snapmap[snap_nextofs(as->T, snap)-1-LJ_FR2]; +#endif + MSize n, nent = snap->nent; + /* Store the value of all modified slots to the Lua stack. */ + for (n = 0; n < nent; n++) { + SnapEntry sn = map[n]; + BCReg s = snap_slot(sn); + int32_t ofs = 8*((int32_t)s-1-LJ_FR2); + IRRef ref = snap_ref(sn); + IRIns *ir = IR(ref); + if ((sn & SNAP_NORESTORE)) + continue; + if ((sn & SNAP_KEYINDEX)) { + RegSet allow = rset_exclude(RSET_GPR, RID_BASE); + Reg r = irref_isk(ref) ? 
ra_allock(as, ir->i, allow) : + ra_alloc1(as, ref, allow); + rset_clear(allow, r); + emit_lso(as, A64I_STRw, r, RID_BASE, ofs); + emit_lso(as, A64I_STRw, ra_allock(as, LJ_KEYINDEX, allow), RID_BASE, ofs+4); + } else if (irt_isnum(ir->t)) { + Reg src = ra_alloc1(as, ref, RSET_FPR); + emit_lso(as, A64I_STRd, (src & 31), RID_BASE, ofs); + } else { + asm_tvstore64(as, RID_BASE, ofs, ref); + } + checkmclim(as); + } + lj_assertA(map + nent == flinks, "inconsistent frames in snapshot"); +} + +/* -- GC handling --------------------------------------------------------- */ + +/* Marker to prevent patching the GC check exit. */ +#define ARM64_NOPATCH_GC_CHECK \ + (A64I_ORRx|A64F_D(RID_TMP)|A64F_M(RID_TMP)|A64F_N(RID_TMP)) + +/* Check GC threshold and do one or more GC steps. */ +static void asm_gc_check(ASMState *as) +{ + const CCallInfo *ci = &lj_ir_callinfo[IRCALL_lj_gc_step_jit]; + IRRef args[2]; + MCLabel l_end; + Reg tmp2; + ra_evictset(as, RSET_SCRATCH); + l_end = emit_label(as); + /* Exit trace if in GCSatomic or GCSfinalize. Avoids syncing GC objects. */ + asm_guardcnb(as, A64I_CBNZ, RID_RET); /* Assumes asm_snap_prep() is done. */ + *--as->mcp = ARM64_NOPATCH_GC_CHECK; + args[0] = ASMREF_TMP1; /* global_State *g */ + args[1] = ASMREF_TMP2; /* MSize steps */ + asm_gencall(as, ci, args); + emit_dm(as, A64I_MOVx, ra_releasetmp(as, ASMREF_TMP1), RID_GL); + tmp2 = ra_releasetmp(as, ASMREF_TMP2); + emit_loadi(as, tmp2, as->gcsteps); + /* Jump around GC step if GC total < GC threshold. */ + emit_cond_branch(as, CC_LS, l_end); + emit_nm(as, A64I_CMPx, RID_TMP, tmp2); + emit_getgl(as, tmp2, gc.threshold); + emit_getgl(as, RID_TMP, gc.total); + as->gcsteps = 0; + checkmclim(as); +} + +/* -- Loop handling ------------------------------------------------------- */ + +/* Fixup the loop branch. */ +static void asm_loop_fixup(ASMState *as) +{ + MCode *p = as->mctop; + MCode *target = as->mcp; + if (as->loopinv) { /* Inverted loop branch? 
*/ + uint32_t mask = (p[-2] & 0x7e000000) == 0x36000000 ? 0x3fffu : 0x7ffffu; + ptrdiff_t delta = target - (p - 2); + /* asm_guard* already inverted the bcc/tnb/cnb and patched the final b. */ + p[-2] |= ((uint32_t)delta & mask) << 5; + } else { + ptrdiff_t delta = target - (p - 1); + p[-1] = A64I_B | A64F_S26(delta); + } +} + +/* Fixup the tail of the loop. */ +static void asm_loop_tail_fixup(ASMState *as) +{ + UNUSED(as); /* Nothing to do. */ +} + +/* -- Head of trace ------------------------------------------------------- */ + +/* Reload L register from g->cur_L. */ +static void asm_head_lreg(ASMState *as) +{ + IRIns *ir = IR(ASMREF_L); + if (ra_used(ir)) { + Reg r = ra_dest(as, ir, RSET_GPR); + emit_getgl(as, r, cur_L); + ra_evictk(as); + } +} + +/* Coalesce BASE register for a root trace. */ +static void asm_head_root_base(ASMState *as) +{ + IRIns *ir; + asm_head_lreg(as); + ir = IR(REF_BASE); + if (ra_hasreg(ir->r) && (rset_test(as->modset, ir->r) || irt_ismarked(ir->t))) + ra_spill(as, ir); + ra_destreg(as, ir, RID_BASE); +} + +/* Coalesce BASE register for a side trace. */ +static Reg asm_head_side_base(ASMState *as, IRIns *irp) +{ + IRIns *ir; + asm_head_lreg(as); + ir = IR(REF_BASE); + if (ra_hasreg(ir->r) && (rset_test(as->modset, ir->r) || irt_ismarked(ir->t))) + ra_spill(as, ir); + if (ra_hasspill(irp->s)) { + return ra_dest(as, ir, RSET_GPR); + } else { + Reg r = irp->r; + lj_assertA(ra_hasreg(r), "base reg lost"); + if (r != ir->r && !rset_test(as->freeset, r)) + ra_restore(as, regcost_ref(as->cost[r])); + ra_destreg(as, ir, r); + return r; + } +} + +/* -- Tail of trace ------------------------------------------------------- */ + +/* Fixup the tail code. */ +static void asm_tail_fixup(ASMState *as, TraceNo lnk) +{ + MCode *p = as->mctop; + MCode *target; + /* Undo the sp adjustment in BC_JLOOP when exiting to the interpreter. */ + int32_t spadj = as->T->spadjust + (lnk ? 
0 : sps_scale(SPS_FIXED)); + if (spadj == 0) { + *--p = A64I_LE(A64I_NOP); + as->mctop = p; + } else { + /* Patch stack adjustment. */ + uint32_t k = emit_isk12(spadj); + lj_assertA(k, "stack adjustment %d does not fit in K12", spadj); + p[-2] = (A64I_ADDx^k) | A64F_D(RID_SP) | A64F_N(RID_SP); + } + /* Patch exit branch. */ + target = lnk ? traceref(as->J, lnk)->mcode : (MCode *)lj_vm_exit_interp; + p[-1] = A64I_B | A64F_S26((target-p)+1); +} + +/* Prepare tail of code. */ +static void asm_tail_prep(ASMState *as) +{ + MCode *p = as->mctop - 1; /* Leave room for exit branch. */ + if (as->loopref) { + as->invmcp = as->mcp = p; + } else { + as->mcp = p-1; /* Leave room for stack pointer adjustment. */ + as->invmcp = NULL; + } + *p = 0; /* Prevent load/store merging. */ +} + +/* -- Trace setup --------------------------------------------------------- */ + +/* Ensure there are enough stack slots for call arguments. */ +static Reg asm_setup_call_slots(ASMState *as, IRIns *ir, const CCallInfo *ci) +{ + IRRef args[CCI_NARGS_MAX*2]; + uint32_t i, nargs = CCI_XNARGS(ci); + int nslots = 0, ngpr = REGARG_NUMGPR, nfpr = REGARG_NUMFPR; + asm_collectargs(as, ir, ci, args); + for (i = 0; i < nargs; i++) { + if (args[i] && irt_isfp(IR(args[i])->t)) { + if (nfpr > 0) nfpr--; else nslots += 2; + } else { + if (ngpr > 0) ngpr--; else nslots += 2; + } + } + if (nslots > as->evenspill) /* Leave room for args in stack slots. */ + as->evenspill = nslots; + return REGSP_HINT(RID_RET); +} + +static void asm_setup_target(ASMState *as) +{ + /* May need extra exit for asm_stack_check on side traces. */ + asm_exitstub_setup(as, as->T->nsnap + (as->parent ? 1 : 0)); +} + +#if LJ_BE +/* ARM64 instructions are always little-endian. Swap for ARM64BE. 
*/ +static void asm_mcode_fixup(MCode *mcode, MSize size) +{ + MCode *pe = (MCode *)((char *)mcode + size); + while (mcode < pe) { + MCode ins = *mcode; + *mcode++ = lj_bswap(ins); + } +} +#define LJ_TARGET_MCODE_FIXUP 1 +#endif + +/* -- Trace patching ------------------------------------------------------ */ + +/* Patch exit jumps of existing machine code to a new target. */ +void lj_asm_patchexit(jit_State *J, GCtrace *T, ExitNo exitno, MCode *target) +{ + MCode *p = T->mcode; + MCode *pe = (MCode *)((char *)p + T->szmcode); + MCode *cstart = NULL; + MCode *mcarea = lj_mcode_patch(J, p, 0); + MCode *px = exitstub_trace_addr(T, exitno); + int patchlong = 1; + /* Note: this assumes a trace exit is only ever patched once. */ + for (; p < pe; p++) { + /* Look for exitstub branch, replace with branch to target. */ + ptrdiff_t delta = target - p; + MCode ins = A64I_LE(*p); + if ((ins & 0xff000000u) == 0x54000000u && + ((ins ^ ((px-p)<<5)) & 0x00ffffe0u) == 0) { + /* Patch bcc, if within range. */ + if (A64F_S_OK(delta, 19)) { + *p = A64I_LE((ins & 0xff00001fu) | A64F_S19(delta)); + if (!cstart) cstart = p; + } + } else if ((ins & 0xfc000000u) == 0x14000000u && + ((ins ^ (px-p)) & 0x03ffffffu) == 0) { + /* Patch b. */ + lj_assertJ(A64F_S_OK(delta, 26), "branch target out of range"); + *p = A64I_LE((ins & 0xfc000000u) | A64F_S26(delta)); + if (!cstart) cstart = p; + } else if ((ins & 0x7e000000u) == 0x34000000u && + ((ins ^ ((px-p)<<5)) & 0x00ffffe0u) == 0) { + /* Patch cbz/cbnz, if within range. */ + if (p[-1] == ARM64_NOPATCH_GC_CHECK) { + patchlong = 0; + } else if (A64F_S_OK(delta, 19)) { + *p = A64I_LE((ins & 0xff00001fu) | A64F_S19(delta)); + if (!cstart) cstart = p; + } + } else if ((ins & 0x7e000000u) == 0x36000000u && + ((ins ^ ((px-p)<<5)) & 0x0007ffe0u) == 0) { + /* Patch tbz/tbnz, if within range. 
*/ + if (A64F_S_OK(delta, 14)) { + *p = A64I_LE((ins & 0xfff8001fu) | A64F_S14(delta)); + if (!cstart) cstart = p; + } + } + } + /* Always patch long-range branch in exit stub itself. Except, if we can't. */ + if (patchlong) { + ptrdiff_t delta = target - px; + lj_assertJ(A64F_S_OK(delta, 26), "branch target out of range"); + *px = A64I_B | A64F_S26(delta); + if (!cstart) cstart = px; + } + if (cstart) lj_mcode_sync(cstart, px+1); + lj_mcode_patch(J, mcarea, 1); +} + diff --cc src/lj_assert.c index 4b713b2b,00000000..5c948b41 mode 100644,000000..100644 --- a/src/lj_assert.c +++ b/src/lj_assert.c @@@ -1,28 -1,0 +1,28 @@@ +/* +** Internal assertions. - ** Copyright (C) 2005-2022 Mike Pall. See Copyright Notice in luajit.h ++** Copyright (C) 2005-2023 Mike Pall. See Copyright Notice in luajit.h +*/ + +#define lj_assert_c +#define LUA_CORE + +#if defined(LUA_USE_ASSERT) || defined(LUA_USE_APICHECK) + +#include + +#include "lj_obj.h" + +void lj_assert_fail(global_State *g, const char *file, int line, + const char *func, const char *fmt, ...) +{ + va_list argp; + va_start(argp, fmt); + fprintf(stderr, "LuaJIT ASSERT %s:%d: %s: ", file, line, func); + vfprintf(stderr, fmt, argp); + fputc('\n', stderr); + va_end(argp); + UNUSED(g); /* May be NULL. TODO: optionally dump state. */ + abort(); +} + +#endif diff --cc src/lj_buf.c index cf268af2,00000000..ae2ccd82 mode 100644,000000..100644 --- a/src/lj_buf.c +++ b/src/lj_buf.c @@@ -1,305 -1,0 +1,305 @@@ +/* +** Buffer handling. - ** Copyright (C) 2005-2022 Mike Pall. See Copyright Notice in luajit.h ++** Copyright (C) 2005-2023 Mike Pall. 
See Copyright Notice in luajit.h +*/ + +#define lj_buf_c +#define LUA_CORE + +#include "lj_obj.h" +#include "lj_gc.h" +#include "lj_err.h" +#include "lj_buf.h" +#include "lj_str.h" +#include "lj_tab.h" +#include "lj_strfmt.h" + +/* -- Buffer management --------------------------------------------------- */ + +static void buf_grow(SBuf *sb, MSize sz) +{ + MSize osz = sbufsz(sb), len = sbuflen(sb), nsz = osz; + char *b; + GCSize flag; + if (nsz < LJ_MIN_SBUF) nsz = LJ_MIN_SBUF; + while (nsz < sz) nsz += nsz; + flag = sbufflag(sb); + if ((flag & SBUF_FLAG_COW)) { /* Copy-on-write semantics. */ + lj_assertG_(G(sbufL(sb)), sb->w == sb->e, "bad SBuf COW"); + b = (char *)lj_mem_new(sbufL(sb), nsz); + setsbufflag(sb, flag & ~(GCSize)SBUF_FLAG_COW); + setgcrefnull(sbufX(sb)->cowref); + memcpy(b, sb->b, osz); + } else { + b = (char *)lj_mem_realloc(sbufL(sb), sb->b, osz, nsz); + } + if ((flag & SBUF_FLAG_EXT)) { + sbufX(sb)->r = sbufX(sb)->r - sb->b + b; /* Adjust read pointer, too. */ + } + /* Adjust buffer pointers. */ + sb->b = b; + sb->w = b + len; + sb->e = b + nsz; + if ((flag & SBUF_FLAG_BORROW)) { /* Adjust borrowed buffer pointers. */ + SBuf *bsb = mref(sbufX(sb)->bsb, SBuf); + bsb->b = b; + bsb->w = b + len; + bsb->e = b + nsz; + } +} + +LJ_NOINLINE char *LJ_FASTCALL lj_buf_need2(SBuf *sb, MSize sz) +{ + lj_assertG_(G(sbufL(sb)), sz > sbufsz(sb), "SBuf overflow"); + if (LJ_UNLIKELY(sz > LJ_MAX_BUF)) + lj_err_mem(sbufL(sb)); + buf_grow(sb, sz); + return sb->b; +} + +LJ_NOINLINE char *LJ_FASTCALL lj_buf_more2(SBuf *sb, MSize sz) +{ + if (sbufisext(sb)) { + SBufExt *sbx = (SBufExt *)sb; + MSize len = sbufxlen(sbx); + if (LJ_UNLIKELY(sz > LJ_MAX_BUF || len + sz > LJ_MAX_BUF)) + lj_err_mem(sbufL(sbx)); + if (len + sz > sbufsz(sbx)) { /* Must grow. */ + buf_grow((SBuf *)sbx, len + sz); + } else if (sbufiscow(sb) || sbufxslack(sbx) < (sbufsz(sbx) >> 3)) { + /* Also grow to avoid excessive compactions, if slack < size/8. 
*/ + buf_grow((SBuf *)sbx, sbuflen(sbx) + sz); /* Not sbufxlen! */ + return sbx->w; + } + if (sbx->r != sbx->b) { /* Compact by moving down. */ + memmove(sbx->b, sbx->r, len); + sbx->r = sbx->b; + sbx->w = sbx->b + len; + lj_assertG_(G(sbufL(sbx)), len + sz <= sbufsz(sbx), "bad SBuf compact"); + } + } else { + MSize len = sbuflen(sb); + lj_assertG_(G(sbufL(sb)), sz > sbufleft(sb), "SBuf overflow"); + if (LJ_UNLIKELY(sz > LJ_MAX_BUF || len + sz > LJ_MAX_BUF)) + lj_err_mem(sbufL(sb)); + buf_grow(sb, len + sz); + } + return sb->w; +} + +void LJ_FASTCALL lj_buf_shrink(lua_State *L, SBuf *sb) +{ + char *b = sb->b; + MSize osz = (MSize)(sb->e - b); + if (osz > 2*LJ_MIN_SBUF) { + MSize n = (MSize)(sb->w - b); + b = lj_mem_realloc(L, b, osz, (osz >> 1)); + sb->b = b; + sb->w = b + n; + sb->e = b + (osz >> 1); + } + lj_assertG_(G(sbufL(sb)), !sbufisext(sb), "YAGNI shrink SBufExt"); +} + +char * LJ_FASTCALL lj_buf_tmp(lua_State *L, MSize sz) +{ + SBuf *sb = &G(L)->tmpbuf; + setsbufL(sb, L); + return lj_buf_need(sb, sz); +} + +#if LJ_HASBUFFER && LJ_HASJIT +void lj_bufx_set(SBufExt *sbx, const char *p, MSize len, GCobj *ref) +{ + lua_State *L = sbufL(sbx); + lj_bufx_free(L, sbx); + lj_bufx_set_cow(L, sbx, p, len); + setgcref(sbx->cowref, ref); + lj_gc_objbarrier(L, (GCudata *)sbx - 1, ref); +} + +#if LJ_HASFFI +MSize LJ_FASTCALL lj_bufx_more(SBufExt *sbx, MSize sz) +{ + lj_buf_more((SBuf *)sbx, sz); + return sbufleft(sbx); +} +#endif +#endif + +/* -- Low-level buffer put operations ------------------------------------- */ + +SBuf *lj_buf_putmem(SBuf *sb, const void *q, MSize len) +{ + char *w = lj_buf_more(sb, len); + w = lj_buf_wmem(w, q, len); + sb->w = w; + return sb; +} + +#if LJ_HASJIT || LJ_HASFFI +static LJ_NOINLINE SBuf * LJ_FASTCALL lj_buf_putchar2(SBuf *sb, int c) +{ + char *w = lj_buf_more2(sb, 1); + *w++ = (char)c; + sb->w = w; + return sb; +} + +SBuf * LJ_FASTCALL lj_buf_putchar(SBuf *sb, int c) +{ + char *w = sb->w; + if (LJ_LIKELY(w < sb->e)) { + *w++ = 
(char)c; + sb->w = w; + return sb; + } + return lj_buf_putchar2(sb, c); +} +#endif + +SBuf * LJ_FASTCALL lj_buf_putstr(SBuf *sb, GCstr *s) +{ + MSize len = s->len; + char *w = lj_buf_more(sb, len); + w = lj_buf_wmem(w, strdata(s), len); + sb->w = w; + return sb; +} + +/* -- High-level buffer put operations ------------------------------------ */ + +SBuf * LJ_FASTCALL lj_buf_putstr_reverse(SBuf *sb, GCstr *s) +{ + MSize len = s->len; + char *w = lj_buf_more(sb, len), *e = w+len; + const char *q = strdata(s)+len-1; + while (w < e) + *w++ = *q--; + sb->w = w; + return sb; +} + +SBuf * LJ_FASTCALL lj_buf_putstr_lower(SBuf *sb, GCstr *s) +{ + MSize len = s->len; + char *w = lj_buf_more(sb, len), *e = w+len; + const char *q = strdata(s); + for (; w < e; w++, q++) { + uint32_t c = *(unsigned char *)q; +#if LJ_TARGET_PPC + *w = c + ((c >= 'A' && c <= 'Z') << 5); +#else + if (c >= 'A' && c <= 'Z') c += 0x20; + *w = c; +#endif + } + sb->w = w; + return sb; +} + +SBuf * LJ_FASTCALL lj_buf_putstr_upper(SBuf *sb, GCstr *s) +{ + MSize len = s->len; + char *w = lj_buf_more(sb, len), *e = w+len; + const char *q = strdata(s); + for (; w < e; w++, q++) { + uint32_t c = *(unsigned char *)q; +#if LJ_TARGET_PPC + *w = c - ((c >= 'a' && c <= 'z') << 5); +#else + if (c >= 'a' && c <= 'z') c -= 0x20; + *w = c; +#endif + } + sb->w = w; + return sb; +} + +SBuf *lj_buf_putstr_rep(SBuf *sb, GCstr *s, int32_t rep) +{ + MSize len = s->len; + if (rep > 0 && len) { + uint64_t tlen = (uint64_t)rep * len; + char *w; + if (LJ_UNLIKELY(tlen > LJ_MAX_STR)) + lj_err_mem(sbufL(sb)); + w = lj_buf_more(sb, (MSize)tlen); + if (len == 1) { /* Optimize a common case. 
*/ + uint32_t c = strdata(s)[0]; + do { *w++ = c; } while (--rep > 0); + } else { + const char *e = strdata(s) + len; + do { + const char *q = strdata(s); + do { *w++ = *q++; } while (q < e); + } while (--rep > 0); + } + sb->w = w; + } + return sb; +} + +SBuf *lj_buf_puttab(SBuf *sb, GCtab *t, GCstr *sep, int32_t i, int32_t e) +{ + MSize seplen = sep ? sep->len : 0; + if (i <= e) { + for (;;) { + cTValue *o = lj_tab_getint(t, i); + char *w; + if (!o) { + badtype: /* Error: bad element type. */ + sb->w = (char *)(intptr_t)i; /* Store failing index. */ + return NULL; + } else if (tvisstr(o)) { + MSize len = strV(o)->len; + w = lj_buf_wmem(lj_buf_more(sb, len + seplen), strVdata(o), len); + } else if (tvisint(o)) { + w = lj_strfmt_wint(lj_buf_more(sb, STRFMT_MAXBUF_INT+seplen), intV(o)); + } else if (tvisnum(o)) { + w = lj_buf_more(lj_strfmt_putfnum(sb, STRFMT_G14, numV(o)), seplen); + } else { + goto badtype; + } + if (i++ == e) { + sb->w = w; + break; + } + if (seplen) w = lj_buf_wmem(w, strdata(sep), seplen); + sb->w = w; + } + } + return sb; +} + +/* -- Miscellaneous buffer operations ------------------------------------- */ + +GCstr * LJ_FASTCALL lj_buf_tostr(SBuf *sb) +{ + return lj_str_new(sbufL(sb), sb->b, sbuflen(sb)); +} + +/* Concatenate two strings. */ +GCstr *lj_buf_cat2str(lua_State *L, GCstr *s1, GCstr *s2) +{ + MSize len1 = s1->len, len2 = s2->len; + char *buf = lj_buf_tmp(L, len1 + len2); + memcpy(buf, strdata(s1), len1); + memcpy(buf+len1, strdata(s2), len2); + return lj_str_new(L, buf, len1 + len2); +} + +/* Read ULEB128 from buffer. 
*/ +uint32_t LJ_FASTCALL lj_buf_ruleb128(const char **pp) +{ + const uint8_t *w = (const uint8_t *)*pp; + uint32_t v = *w++; + if (LJ_UNLIKELY(v >= 0x80)) { + int sh = 0; + v &= 0x7f; + do { v |= ((*w & 0x7f) << (sh += 7)); } while (*w++ >= 0x80); + } + *pp = (const char *)w; + return v; +} + diff --cc src/lj_buf.h index 76114201,00000000..744e5747 mode 100644,000000..100644 --- a/src/lj_buf.h +++ b/src/lj_buf.h @@@ -1,198 -1,0 +1,198 @@@ +/* +** Buffer handling. - ** Copyright (C) 2005-2022 Mike Pall. See Copyright Notice in luajit.h ++** Copyright (C) 2005-2023 Mike Pall. See Copyright Notice in luajit.h +*/ + +#ifndef _LJ_BUF_H +#define _LJ_BUF_H + +#include "lj_obj.h" +#include "lj_gc.h" +#include "lj_str.h" + +/* Resizable string buffers. */ + +/* The SBuf struct definition is in lj_obj.h: +** char *w; Write pointer. +** char *e; End pointer. +** char *b; Base pointer. +** MRef L; lua_State, used for buffer resizing. Extension bits in 3 LSB. +*/ + +/* Extended string buffer. */ +typedef struct SBufExt { + SBufHeader; + union { + GCRef cowref; /* Copy-on-write object reference. */ + MRef bsb; /* Borrowed string buffer. */ + }; + char *r; /* Read pointer. */ + GCRef dict_str; /* Serialization string dictionary table. */ + GCRef dict_mt; /* Serialization metatable dictionary table. */ + int depth; /* Remaining recursion depth. */ +} SBufExt; + +#define sbufsz(sb) ((MSize)((sb)->e - (sb)->b)) +#define sbuflen(sb) ((MSize)((sb)->w - (sb)->b)) +#define sbufleft(sb) ((MSize)((sb)->e - (sb)->w)) +#define sbufxlen(sbx) ((MSize)((sbx)->w - (sbx)->r)) +#define sbufxslack(sbx) ((MSize)((sbx)->r - (sbx)->b)) + +#define SBUF_MASK_FLAG (7) +#define SBUF_MASK_L (~(GCSize)SBUF_MASK_FLAG) +#define SBUF_FLAG_EXT 1 /* Extended string buffer. */ +#define SBUF_FLAG_COW 2 /* Copy-on-write buffer. */ +#define SBUF_FLAG_BORROW 4 /* Borrowed string buffer. 
*/ + +#define sbufL(sb) \ + ((lua_State *)(void *)(uintptr_t)(mrefu((sb)->L) & SBUF_MASK_L)) +#define setsbufL(sb, l) (setmref((sb)->L, (l))) +#define setsbufXL(sb, l, flag) \ + (setmrefu((sb)->L, (GCSize)(uintptr_t)(void *)(l) + (flag))) +#define setsbufXL_(sb, l) \ + (setmrefu((sb)->L, (GCSize)(uintptr_t)(void *)(l) | (mrefu((sb)->L) & SBUF_MASK_FLAG))) + +#define sbufflag(sb) (mrefu((sb)->L)) +#define sbufisext(sb) (sbufflag((sb)) & SBUF_FLAG_EXT) +#define sbufiscow(sb) (sbufflag((sb)) & SBUF_FLAG_COW) +#define sbufisborrow(sb) (sbufflag((sb)) & SBUF_FLAG_BORROW) +#define sbufiscoworborrow(sb) (sbufflag((sb)) & (SBUF_FLAG_COW|SBUF_FLAG_BORROW)) +#define sbufX(sb) \ + (lj_assertG_(G(sbufL(sb)), sbufisext(sb), "not an SBufExt"), (SBufExt *)(sb)) +#define setsbufflag(sb, flag) (setmrefu((sb)->L, (flag))) + +#define tvisbuf(o) \ + (LJ_HASBUFFER && tvisudata(o) && udataV(o)->udtype == UDTYPE_BUFFER) +#define bufV(o) check_exp(tvisbuf(o), ((SBufExt *)uddata(udataV(o)))) + +/* Buffer management */ +LJ_FUNC char *LJ_FASTCALL lj_buf_need2(SBuf *sb, MSize sz); +LJ_FUNC char *LJ_FASTCALL lj_buf_more2(SBuf *sb, MSize sz); +LJ_FUNC void LJ_FASTCALL lj_buf_shrink(lua_State *L, SBuf *sb); +LJ_FUNC char * LJ_FASTCALL lj_buf_tmp(lua_State *L, MSize sz); + +static LJ_AINLINE void lj_buf_init(lua_State *L, SBuf *sb) +{ + setsbufL(sb, L); + sb->w = sb->e = sb->b = NULL; +} + +static LJ_AINLINE void lj_buf_reset(SBuf *sb) +{ + sb->w = sb->b; +} + +static LJ_AINLINE SBuf *lj_buf_tmp_(lua_State *L) +{ + SBuf *sb = &G(L)->tmpbuf; + setsbufL(sb, L); + lj_buf_reset(sb); + return sb; +} + +static LJ_AINLINE void lj_buf_free(global_State *g, SBuf *sb) +{ + lj_assertG(!sbufisext(sb), "bad free of SBufExt"); + lj_mem_free(g, sb->b, sbufsz(sb)); +} + +static LJ_AINLINE char *lj_buf_need(SBuf *sb, MSize sz) +{ + if (LJ_UNLIKELY(sz > sbufsz(sb))) + return lj_buf_need2(sb, sz); + return sb->b; +} + +static LJ_AINLINE char *lj_buf_more(SBuf *sb, MSize sz) +{ + if (LJ_UNLIKELY(sz > sbufleft(sb))) 
+ return lj_buf_more2(sb, sz); + return sb->w; +} + +/* Extended buffer management */ +static LJ_AINLINE void lj_bufx_init(lua_State *L, SBufExt *sbx) +{ + memset(sbx, 0, sizeof(SBufExt)); + setsbufXL(sbx, L, SBUF_FLAG_EXT); +} + +static LJ_AINLINE void lj_bufx_set_borrow(lua_State *L, SBufExt *sbx, SBuf *sb) +{ + setsbufXL(sbx, L, SBUF_FLAG_EXT | SBUF_FLAG_BORROW); + setmref(sbx->bsb, sb); + sbx->r = sbx->w = sbx->b = sb->b; + sbx->e = sb->e; +} + +static LJ_AINLINE void lj_bufx_set_cow(lua_State *L, SBufExt *sbx, + const char *p, MSize len) +{ + setsbufXL(sbx, L, SBUF_FLAG_EXT | SBUF_FLAG_COW); + sbx->r = sbx->b = (char *)p; + sbx->w = sbx->e = (char *)p + len; +} + +static LJ_AINLINE void lj_bufx_reset(SBufExt *sbx) +{ + if (sbufiscow(sbx)) { + setmrefu(sbx->L, (mrefu(sbx->L) & ~(GCSize)SBUF_FLAG_COW)); + setgcrefnull(sbx->cowref); + sbx->b = sbx->e = NULL; + } + sbx->r = sbx->w = sbx->b; +} + +static LJ_AINLINE void lj_bufx_free(lua_State *L, SBufExt *sbx) +{ + if (!sbufiscoworborrow(sbx)) lj_mem_free(G(L), sbx->b, sbufsz(sbx)); + setsbufXL(sbx, L, SBUF_FLAG_EXT); + setgcrefnull(sbx->cowref); + sbx->r = sbx->w = sbx->b = sbx->e = NULL; +} + +#if LJ_HASBUFFER && LJ_HASJIT +LJ_FUNC void lj_bufx_set(SBufExt *sbx, const char *p, MSize len, GCobj *o); +#if LJ_HASFFI +LJ_FUNC MSize LJ_FASTCALL lj_bufx_more(SBufExt *sbx, MSize sz); +#endif +#endif + +/* Low-level buffer put operations */ +LJ_FUNC SBuf *lj_buf_putmem(SBuf *sb, const void *q, MSize len); +#if LJ_HASJIT || LJ_HASFFI +LJ_FUNC SBuf * LJ_FASTCALL lj_buf_putchar(SBuf *sb, int c); +#endif +LJ_FUNC SBuf * LJ_FASTCALL lj_buf_putstr(SBuf *sb, GCstr *s); + +static LJ_AINLINE char *lj_buf_wmem(char *p, const void *q, MSize len) +{ + return (char *)memcpy(p, q, len) + len; +} + +static LJ_AINLINE void lj_buf_putb(SBuf *sb, int c) +{ + char *w = lj_buf_more(sb, 1); + *w++ = (char)c; + sb->w = w; +} + +/* High-level buffer put operations */ +LJ_FUNCA SBuf * LJ_FASTCALL lj_buf_putstr_reverse(SBuf *sb, GCstr *s); 
+LJ_FUNCA SBuf * LJ_FASTCALL lj_buf_putstr_lower(SBuf *sb, GCstr *s); +LJ_FUNCA SBuf * LJ_FASTCALL lj_buf_putstr_upper(SBuf *sb, GCstr *s); +LJ_FUNC SBuf *lj_buf_putstr_rep(SBuf *sb, GCstr *s, int32_t rep); +LJ_FUNC SBuf *lj_buf_puttab(SBuf *sb, GCtab *t, GCstr *sep, + int32_t i, int32_t e); + +/* Miscellaneous buffer operations */ +LJ_FUNCA GCstr * LJ_FASTCALL lj_buf_tostr(SBuf *sb); +LJ_FUNC GCstr *lj_buf_cat2str(lua_State *L, GCstr *s1, GCstr *s2); +LJ_FUNC uint32_t LJ_FASTCALL lj_buf_ruleb128(const char **pp); + +static LJ_AINLINE GCstr *lj_buf_str(lua_State *L, SBuf *sb) +{ + return lj_str_new(L, sb->b, sbuflen(sb)); +} + +#endif diff --cc src/lj_emit_arm64.h index 65463a5e,00000000..52d010b8 mode 100644,000000..100644 --- a/src/lj_emit_arm64.h +++ b/src/lj_emit_arm64.h @@@ -1,431 -1,0 +1,431 @@@ +/* +** ARM64 instruction emitter. - ** Copyright (C) 2005-2022 Mike Pall. See Copyright Notice in luajit.h ++** Copyright (C) 2005-2023 Mike Pall. See Copyright Notice in luajit.h +** +** Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com. +** Sponsored by Cisco Systems, Inc. +*/ + +/* -- Constant encoding --------------------------------------------------- */ + +static uint64_t get_k64val(ASMState *as, IRRef ref) +{ + IRIns *ir = IR(ref); + if (ir->o == IR_KINT64) { + return ir_kint64(ir)->u64; + } else if (ir->o == IR_KGC) { + return (uint64_t)ir_kgc(ir); + } else if (ir->o == IR_KPTR || ir->o == IR_KKPTR) { + return (uint64_t)ir_kptr(ir); + } else { + lj_assertA(ir->o == IR_KINT || ir->o == IR_KNULL, + "bad 64 bit const IR op %d", ir->o); + return ir->i; /* Sign-extended. */ + } +} + +/* Encode constant in K12 format for data processing instructions. */ +static uint32_t emit_isk12(int64_t n) +{ + uint64_t k = n < 0 ? ~(uint64_t)n+1u : (uint64_t)n; + uint32_t m = n < 0 ? 
0x40000000 : 0; + if (k < 0x1000) { + return A64I_K12|m|A64F_U12(k); + } else if ((k & 0xfff000) == k) { + return A64I_K12|m|0x400000|A64F_U12(k>>12); + } + return 0; +} + +#define emit_clz64(n) __builtin_clzll(n) +#define emit_ctz64(n) __builtin_ctzll(n) + +/* Encode constant in K13 format for logical data processing instructions. */ +static uint32_t emit_isk13(uint64_t n, int is64) +{ + int inv = 0, w = 128, lz, tz; + if (n & 1) { n = ~n; w = 64; inv = 1; } /* Avoid wrap-around of ones. */ + if (!n) return 0; /* Neither all-zero nor all-ones are allowed. */ + do { /* Find the repeat width. */ + if (is64 && (uint32_t)(n^(n>>32))) break; + n = (uint32_t)n; + if (!n) return 0; /* Ditto when passing n=0xffffffff and is64=0. */ + w = 32; if ((n^(n>>16)) & 0xffff) break; + n = n & 0xffff; w = 16; if ((n^(n>>8)) & 0xff) break; + n = n & 0xff; w = 8; if ((n^(n>>4)) & 0xf) break; + n = n & 0xf; w = 4; if ((n^(n>>2)) & 0x3) break; + n = n & 0x3; w = 2; + } while (0); + lz = emit_clz64(n); + tz = emit_ctz64(n); + if ((int64_t)(n << lz) >> (lz+tz) != -1ll) return 0; /* Non-contiguous? 
*/ + if (inv) + return A64I_K13 | (((lz-w) & 127) << 16) | (((lz+tz-w-1) & 63) << 10); + else + return A64I_K13 | ((w-tz) << 16) | (((63-lz-tz-w-w) & 63) << 10); +} + +static uint32_t emit_isfpk64(uint64_t n) +{ + uint64_t etop9 = ((n >> 54) & 0x1ff); + if ((n << 16) == 0 && (etop9 == 0x100 || etop9 == 0x0ff)) { + return (uint32_t)(((n >> 48) & 0x7f) | ((n >> 56) & 0x80)); + } + return ~0u; +} + +/* -- Emit basic instructions --------------------------------------------- */ + +static void emit_dnma(ASMState *as, A64Ins ai, Reg rd, Reg rn, Reg rm, Reg ra) +{ + *--as->mcp = ai | A64F_D(rd) | A64F_N(rn) | A64F_M(rm) | A64F_A(ra); +} + +static void emit_dnm(ASMState *as, A64Ins ai, Reg rd, Reg rn, Reg rm) +{ + *--as->mcp = ai | A64F_D(rd) | A64F_N(rn) | A64F_M(rm); +} + +static void emit_dm(ASMState *as, A64Ins ai, Reg rd, Reg rm) +{ + *--as->mcp = ai | A64F_D(rd) | A64F_M(rm); +} + +static void emit_dn(ASMState *as, A64Ins ai, Reg rd, Reg rn) +{ + *--as->mcp = ai | A64F_D(rd) | A64F_N(rn); +} + +static void emit_nm(ASMState *as, A64Ins ai, Reg rn, Reg rm) +{ + *--as->mcp = ai | A64F_N(rn) | A64F_M(rm); +} + +static void emit_d(ASMState *as, A64Ins ai, Reg rd) +{ + *--as->mcp = ai | A64F_D(rd); +} + +static void emit_n(ASMState *as, A64Ins ai, Reg rn) +{ + *--as->mcp = ai | A64F_N(rn); +} + +static int emit_checkofs(A64Ins ai, int64_t ofs) +{ + int scale = (ai >> 30) & 3; + if (ofs < 0 || (ofs & ((1<= -256 && ofs <= 255) ? -1 : 0; + } else { + return (ofs < (4096<> 30) & 3; + lj_assertA(ot, "load/store offset %d out of range", ofs); + /* Combine LDR/STR pairs to LDP/STP. 
*/ + if ((sc == 2 || sc == 3) && + (!(ai & 0x400000) || rd != rn) && + as->mcp != as->mcloop) { + uint32_t prev = *as->mcp & ~A64F_D(31); + int ofsm = ofs - (1<>sc)) || + prev == ((ai^A64I_LS_U) | A64F_N(rn) | A64F_S9(ofsm&0x1ff))) { + aip = (A64F_A(rd) | A64F_D(*as->mcp & 31)); + } else if (prev == (ai | A64F_N(rn) | A64F_U12(ofsp>>sc)) || + prev == ((ai^A64I_LS_U) | A64F_N(rn) | A64F_S9(ofsp&0x1ff))) { + aip = (A64F_D(rd) | A64F_A(*as->mcp & 31)); + ofsm = ofs; + } else { + goto nopair; + } + if (ofsm >= (int)((unsigned int)-64<mcp = aip | A64F_N(rn) | (((ofsm >> sc) & 0x7f) << 15) | + (ai ^ ((ai == A64I_LDRx || ai == A64I_STRx) ? 0x50000000 : 0x90000000)); + return; + } + } +nopair: + if (ot == 1) + *--as->mcp = ai | A64F_D(rd) | A64F_N(rn) | A64F_U12(ofs >> sc); + else + *--as->mcp = (ai^A64I_LS_U) | A64F_D(rd) | A64F_N(rn) | A64F_S9(ofs & 0x1ff); +} + +/* -- Emit loads/stores --------------------------------------------------- */ + +/* Prefer rematerialization of BASE/L from global_State over spills. */ +#define emit_canremat(ref) ((ref) <= ASMREF_L) + +/* Try to find an N-step delta relative to other consts with N < lim. */ +static int emit_kdelta(ASMState *as, Reg rd, uint64_t k, int lim) +{ + RegSet work = (~as->freeset & RSET_GPR) | RID2RSET(RID_GL); + if (lim <= 1) return 0; /* Can't beat that. */ + while (work) { + Reg r = rset_picktop(work); + IRRef ref = regcost_ref(as->cost[r]); + lj_assertA(r != rd, "dest reg %d not free", rd); + if (ref < REF_TRUE) { + uint64_t kx = ra_iskref(ref) ? (uint64_t)ra_krefk(as, ref) : + get_k64val(as, ref); + int64_t delta = (int64_t)(k - kx); + if (delta == 0) { + emit_dm(as, A64I_MOVx, rd, r); + return 1; + } else { + uint32_t k12 = emit_isk12(delta < 0 ? (int64_t)(~(uint64_t)delta+1u) : delta); + if (k12) { + emit_dn(as, (delta < 0 ? A64I_SUBx : A64I_ADDx)^k12, rd, r); + return 1; + } + /* Do other ops or multi-step deltas pay off? Probably not. + ** E.g. XOR rarely helps with pointer consts. 
+ */ + } + } + rset_clear(work, r); + } + return 0; /* Failed. */ +} + +static void emit_loadk(ASMState *as, Reg rd, uint64_t u64, int is64) +{ + int i, zeros = 0, ones = 0, neg; + if (!is64) u64 = (int64_t)(int32_t)u64; /* Sign-extend. */ + /* Count homogeneous 16 bit fragments. */ + for (i = 0; i < 4; i++) { + uint64_t frag = (u64 >> i*16) & 0xffff; + zeros += (frag == 0); + ones += (frag == 0xffff); + } + neg = ones > zeros; /* Use MOVN if it pays off. */ + if ((neg ? ones : zeros) < 3) { /* Need 2+ ins. Try shorter K13 encoding. */ + uint32_t k13 = emit_isk13(u64, is64); + if (k13) { + emit_dn(as, (is64|A64I_ORRw)^k13, rd, RID_ZERO); + return; + } + } + if (!emit_kdelta(as, rd, u64, 4 - (neg ? ones : zeros))) { + int shift = 0, lshift = 0; + uint64_t n64 = neg ? ~u64 : u64; + if (n64 != 0) { + /* Find first/last fragment to be filled. */ + shift = (63-emit_clz64(n64)) & ~15; + lshift = emit_ctz64(n64) & ~15; + } + /* MOVK requires the original value (u64). */ + while (shift > lshift) { + uint32_t u16 = (u64 >> shift) & 0xffff; + /* Skip fragments that are correctly filled by MOVN/MOVZ. */ + if (u16 != (neg ? 0xffff : 0)) + emit_d(as, is64 | A64I_MOVKw | A64F_U16(u16) | A64F_LSL16(shift), rd); + shift -= 16; + } + /* But MOVN needs an inverted value (n64). */ + emit_d(as, (neg ? A64I_MOVNx : A64I_MOVZx) | + A64F_U16((n64 >> lshift) & 0xffff) | A64F_LSL16(lshift), rd); + } +} + +/* Load a 32 bit constant into a GPR. */ +#define emit_loadi(as, rd, i) emit_loadk(as, rd, i, 0) + +/* Load a 64 bit constant into a GPR. 
*/ +#define emit_loadu64(as, rd, i) emit_loadk(as, rd, i, A64I_X) + +#define emit_loada(as, r, addr) emit_loadu64(as, (r), (uintptr_t)(addr)) + +#define glofs(as, k) \ + ((intptr_t)((uintptr_t)(k) - (uintptr_t)&J2GG(as->J)->g)) +#define mcpofs(as, k) \ + ((intptr_t)((uintptr_t)(k) - (uintptr_t)(as->mcp - 1))) +#define checkmcpofs(as, k) \ + (A64F_S_OK(mcpofs(as, k)>>2, 19)) + +static Reg ra_allock(ASMState *as, intptr_t k, RegSet allow); + +/* Get/set from constant pointer. */ +static void emit_lsptr(ASMState *as, A64Ins ai, Reg r, void *p) +{ + /* First, check if ip + offset is in range. */ + if ((ai & 0x00400000) && checkmcpofs(as, p)) { + emit_d(as, A64I_LDRLx | A64F_S19(mcpofs(as, p)>>2), r); + } else { + Reg base = RID_GL; /* Next, try GL + offset. */ + int64_t ofs = glofs(as, p); + if (!emit_checkofs(ai, ofs)) { /* Else split up into base reg + offset. */ + int64_t i64 = i64ptr(p); + base = ra_allock(as, (i64 & ~0x7fffull), rset_exclude(RSET_GPR, r)); + ofs = i64 & 0x7fffull; + } + emit_lso(as, ai, r, base, ofs); + } +} + +/* Load 64 bit IR constant into register. */ +static void emit_loadk64(ASMState *as, Reg r, IRIns *ir) +{ + const uint64_t *k = &ir_k64(ir)->u64; + int64_t ofs; + if (r >= RID_MAX_GPR) { + uint32_t fpk = emit_isfpk64(*k); + if (fpk != ~0u) { + emit_d(as, A64I_FMOV_DI | A64F_FP8(fpk), (r & 31)); + return; + } + } + ofs = glofs(as, k); + if (emit_checkofs(A64I_LDRx, ofs)) { + emit_lso(as, r >= RID_MAX_GPR ? A64I_LDRd : A64I_LDRx, + (r & 31), RID_GL, ofs); + } else { + if (r >= RID_MAX_GPR) { + emit_dn(as, A64I_FMOV_D_R, (r & 31), RID_TMP); + r = RID_TMP; + } + if (checkmcpofs(as, k)) + emit_d(as, A64I_LDRLx | A64F_S19(mcpofs(as, k)>>2), r); + else + emit_loadu64(as, r, *k); + } +} + +/* Get/set global_State fields. 
*/ +#define emit_getgl(as, r, field) \ + emit_lsptr(as, A64I_LDRx, (r), (void *)&J2G(as->J)->field) +#define emit_setgl(as, r, field) \ + emit_lsptr(as, A64I_STRx, (r), (void *)&J2G(as->J)->field) + +/* Trace number is determined from pc of exit instruction. */ +#define emit_setvmstate(as, i) UNUSED(i) + +/* -- Emit control-flow instructions -------------------------------------- */ + +/* Label for internal jumps. */ +typedef MCode *MCLabel; + +/* Return label pointing to current PC. */ +#define emit_label(as) ((as)->mcp) + +static void emit_cond_branch(ASMState *as, A64CC cond, MCode *target) +{ + MCode *p = --as->mcp; + ptrdiff_t delta = target - p; + lj_assertA(A64F_S_OK(delta, 19), "branch target out of range"); + *p = A64I_BCC | A64F_S19(delta) | cond; +} + +static void emit_branch(ASMState *as, A64Ins ai, MCode *target) +{ + MCode *p = --as->mcp; + ptrdiff_t delta = target - p; + lj_assertA(A64F_S_OK(delta, 26), "branch target out of range"); + *p = ai | A64F_S26(delta); +} + +static void emit_tnb(ASMState *as, A64Ins ai, Reg r, uint32_t bit, MCode *target) +{ + MCode *p = --as->mcp; + ptrdiff_t delta = target - p; + lj_assertA(bit < 63, "bit number out of range"); + lj_assertA(A64F_S_OK(delta, 14), "branch target out of range"); + if (bit > 31) ai |= A64I_X; + *p = ai | A64F_BIT(bit & 31) | A64F_S14(delta) | r; +} + +static void emit_cnb(ASMState *as, A64Ins ai, Reg r, MCode *target) +{ + MCode *p = --as->mcp; + ptrdiff_t delta = target - p; + lj_assertA(A64F_S_OK(delta, 19), "branch target out of range"); + *p = ai | A64F_S19(delta) | r; +} + +#define emit_jmp(as, target) emit_branch(as, A64I_B, (target)) + +static void emit_call(ASMState *as, ASMFunction target) +{ + MCode *p = --as->mcp; +#if LJ_ABI_PAUTH + char *targetp = ptrauth_auth_data((char *)target, + ptrauth_key_function_pointer, 0); +#else + char *targetp = (char *)target; +#endif + ptrdiff_t delta = targetp - (char *)p; + if (A64F_S_OK(delta>>2, 26)) { + *p = A64I_BL | A64F_S26(delta>>2); + } 
else { /* Target out of range: need indirect call. But don't use R0-R7. */ + Reg r = ra_allock(as, i64ptr(target), + RSET_RANGE(RID_X8, RID_MAX_GPR)-RSET_FIXED); + *p = A64I_BLR_AUTH | A64F_N(r); + } +} + +/* -- Emit generic operations --------------------------------------------- */ + +/* Generic move between two regs. */ +static void emit_movrr(ASMState *as, IRIns *ir, Reg dst, Reg src) +{ + if (dst >= RID_MAX_GPR) { + emit_dn(as, irt_isnum(ir->t) ? A64I_FMOV_D : A64I_FMOV_S, + (dst & 31), (src & 31)); + return; + } + if (as->mcp != as->mcloop) { /* Swap early registers for loads/stores. */ + MCode ins = *as->mcp, swp = (src^dst); + if ((ins & 0xbf800000) == 0xb9000000) { + if (!((ins ^ (dst << 5)) & 0x000003e0)) + *as->mcp = ins ^ (swp << 5); /* Swap N in load/store. */ + if (!(ins & 0x00400000) && !((ins ^ dst) & 0x0000001f)) + *as->mcp = ins ^ swp; /* Swap D in store. */ + } + } + emit_dm(as, A64I_MOVx, dst, src); +} + +/* Generic load of register with base and (small) offset address. */ +static void emit_loadofs(ASMState *as, IRIns *ir, Reg r, Reg base, int32_t ofs) +{ + if (r >= RID_MAX_GPR) + emit_lso(as, irt_isnum(ir->t) ? A64I_LDRd : A64I_LDRs, (r & 31), base, ofs); + else + emit_lso(as, irt_is64(ir->t) ? A64I_LDRx : A64I_LDRw, r, base, ofs); +} + +/* Generic store of register with base and (small) offset address. */ +static void emit_storeofs(ASMState *as, IRIns *ir, Reg r, Reg base, int32_t ofs) +{ + if (r >= RID_MAX_GPR) + emit_lso(as, irt_isnum(ir->t) ? A64I_STRd : A64I_STRs, (r & 31), base, ofs); + else + emit_lso(as, irt_is64(ir->t) ? A64I_STRx : A64I_STRw, r, base, ofs); +} + +/* Emit an arithmetic operation with a constant operand. */ +static void emit_opk(ASMState *as, A64Ins ai, Reg dest, Reg src, + int32_t i, RegSet allow) +{ + uint32_t k = emit_isk12(i); + if (k) + emit_dn(as, ai^k, dest, src); + else + emit_dnm(as, ai, dest, src, ra_allock(as, i, allow)); +} + +/* Add offset to pointer. 
*/ +static void emit_addptr(ASMState *as, Reg r, int32_t ofs) +{ + if (ofs) + emit_opk(as, ofs < 0 ? A64I_SUBx : A64I_ADDx, r, r, + ofs < 0 ? (int32_t)(~(uint32_t)ofs+1u) : ofs, + rset_exclude(RSET_GPR, r)); +} + +#define emit_spsub(as, ofs) emit_addptr(as, RID_SP, -(ofs)) + diff --cc src/lj_emit_mips.h index 0cea5479,57a7a7cd..dda9092d --- a/src/lj_emit_mips.h +++ b/src/lj_emit_mips.h @@@ -1,34 -1,8 +1,34 @@@ /* ** MIPS instruction emitter. - ** Copyright (C) 2005-2022 Mike Pall. See Copyright Notice in luajit.h + ** Copyright (C) 2005-2023 Mike Pall. See Copyright Notice in luajit.h */ +#if LJ_64 +static intptr_t get_k64val(ASMState *as, IRRef ref) +{ + IRIns *ir = IR(ref); + if (ir->o == IR_KINT64) { + return (intptr_t)ir_kint64(ir)->u64; + } else if (ir->o == IR_KGC) { + return (intptr_t)ir_kgc(ir); + } else if (ir->o == IR_KPTR || ir->o == IR_KKPTR) { + return (intptr_t)ir_kptr(ir); + } else if (LJ_SOFTFP && ir->o == IR_KNUM) { + return (intptr_t)ir_knum(ir)->u64; + } else { + lj_assertA(ir->o == IR_KINT || ir->o == IR_KNULL, + "bad 64 bit const IR op %d", ir->o); + return ir->i; /* Sign-extended. */ + } +} +#endif + +#if LJ_64 +#define get_kval(as, ref) get_k64val(as, ref) +#else +#define get_kval(as, ref) (IR((ref))->i) +#endif + /* -- Emit basic instructions --------------------------------------------- */ static void emit_dst(ASMState *as, MIPSIns mi, Reg rd, Reg rs, Reg rt) diff --cc src/lj_prng.c index 01935e57,00000000..326b41e6 mode 100644,000000..100644 --- a/src/lj_prng.c +++ b/src/lj_prng.c @@@ -1,259 -1,0 +1,259 @@@ +/* +** Pseudo-random number generation. - ** Copyright (C) 2005-2022 Mike Pall. See Copyright Notice in luajit.h ++** Copyright (C) 2005-2023 Mike Pall. See Copyright Notice in luajit.h +*/ + +#define lj_prng_c +#define LUA_CORE + +/* To get the syscall prototype. 
*/ +#if defined(__linux__) && !defined(_GNU_SOURCE) +#define _GNU_SOURCE +#endif + +#include "lj_def.h" +#include "lj_arch.h" +#include "lj_prng.h" + +/* -- PRNG step function -------------------------------------------------- */ + +/* This implements a Tausworthe PRNG with period 2^223. Based on: +** Tables of maximally-equidistributed combined LFSR generators, +** Pierre L'Ecuyer, 1991, table 3, 1st entry. +** Full-period ME-CF generator with L=64, J=4, k=223, N1=49. +** +** Important note: This PRNG is NOT suitable for cryptographic use! +** +** But it works fine for math.random(), which has an API that's not +** suitable for cryptography, anyway. +** +** When used as a securely seeded global PRNG, it substantially raises +** the difficulty for various attacks on the VM. +*/ + +/* Update generator i and compute a running xor of all states. */ +#define TW223_GEN(rs, z, r, i, k, q, s) \ + z = rs->u[i]; \ + z = (((z<> (k-s)) ^ ((z&((uint64_t)(int64_t)-1 << (64-k)))<u[i] = z; + +#define TW223_STEP(rs, z, r) \ + TW223_GEN(rs, z, r, 0, 63, 31, 18) \ + TW223_GEN(rs, z, r, 1, 58, 19, 28) \ + TW223_GEN(rs, z, r, 2, 55, 24, 7) \ + TW223_GEN(rs, z, r, 3, 47, 21, 8) + +/* PRNG step function with uint64_t result. */ +LJ_NOINLINE uint64_t LJ_FASTCALL lj_prng_u64(PRNGState *rs) +{ + uint64_t z, r = 0; + TW223_STEP(rs, z, r) + return r; +} + +/* PRNG step function with double in uint64_t result. */ +LJ_NOINLINE uint64_t LJ_FASTCALL lj_prng_u64d(PRNGState *rs) +{ + uint64_t z, r = 0; + TW223_STEP(rs, z, r) + /* Returns a double bit pattern in the range 1.0 <= d < 2.0. */ + return (r & U64x(000fffff,ffffffff)) | U64x(3ff00000,00000000); +} + +/* Condition seed: ensure k[i] MSB of u[i] are non-zero. 
*/ +static LJ_AINLINE void lj_prng_condition(PRNGState *rs) +{ + if (rs->u[0] < (1u << 1)) rs->u[0] += (1u << 1); + if (rs->u[1] < (1u << 6)) rs->u[1] += (1u << 6); + if (rs->u[2] < (1u << 9)) rs->u[2] += (1u << 9); + if (rs->u[3] < (1u << 17)) rs->u[3] += (1u << 17); +} + +/* -- PRNG seeding from OS ------------------------------------------------ */ + +#if LUAJIT_SECURITY_PRNG == 0 + +/* Nothing to define. */ + +#elif LJ_TARGET_XBOX360 + +extern int XNetRandom(void *buf, unsigned int len); + +#elif LJ_TARGET_PS3 + +extern int sys_get_random_number(void *buf, uint64_t len); + +#elif LJ_TARGET_PS4 || LJ_TARGET_PS5 || LJ_TARGET_PSVITA + +extern int sceRandomGetRandomNumber(void *buf, size_t len); + +#elif LJ_TARGET_NX + +#include + +#elif LJ_TARGET_WINDOWS || LJ_TARGET_XBOXONE + +#define WIN32_LEAN_AND_MEAN +#include + +#if LJ_TARGET_UWP || LJ_TARGET_XBOXONE +/* Must use BCryptGenRandom. */ +#include +#pragma comment(lib, "bcrypt.lib") +#else +/* If you wonder about this mess, then search online for RtlGenRandom. */ +typedef BOOLEAN (WINAPI *PRGR)(void *buf, ULONG len); +static PRGR libfunc_rgr; +#endif + +#elif LJ_TARGET_POSIX + +#if LJ_TARGET_LINUX +/* Avoid a dependency on glibc 2.25+ and use the getrandom syscall instead. */ +#include +#else + +#if LJ_TARGET_OSX && !LJ_TARGET_IOS +/* +** In their infinite wisdom Apple decided to disallow getentropy() in the +** iOS App Store. Even though the call is common to all BSD-ish OS, it's +** recommended by Apple in their own security-related docs, and, to top +** off the foolery, /dev/urandom is handled by the same kernel code, +** yet accessing it is actually permitted (but less efficient). 
+*/ +#include +#if __MAC_OS_X_VERSION_MIN_REQUIRED >= 101200 +#define LJ_TARGET_HAS_GETENTROPY 1 +#endif +#elif (LJ_TARGET_BSD && !defined(__NetBSD__)) || LJ_TARGET_SOLARIS || LJ_TARGET_CYGWIN || LJ_TARGET_QNX +#define LJ_TARGET_HAS_GETENTROPY 1 +#endif + +#if LJ_TARGET_HAS_GETENTROPY +extern int getentropy(void *buf, size_t len) +#ifdef __ELF__ + __attribute__((weak)) +#endif +; +#endif + +#endif + +/* For the /dev/urandom fallback. */ +#include +#include + +#endif + +#if LUAJIT_SECURITY_PRNG == 0 + +/* If you really don't care about security, then define +** LUAJIT_SECURITY_PRNG=0. This yields a predictable seed +** and provides NO SECURITY against various attacks on the VM. +** +** BTW: This is NOT the way to get predictable table iteration, +** predictable trace generation, predictable bytecode generation, etc. +*/ +int LJ_FASTCALL lj_prng_seed_secure(PRNGState *rs) +{ + lj_prng_seed_fixed(rs); /* The fixed seed is already conditioned. */ + return 1; +} + +#else + +/* Securely seed PRNG from system entropy. Returns 0 on failure. */ +int LJ_FASTCALL lj_prng_seed_secure(PRNGState *rs) +{ +#if LJ_TARGET_XBOX360 + + if (XNetRandom(rs->u, (unsigned int)sizeof(rs->u)) == 0) + goto ok; + +#elif LJ_TARGET_PS3 + + if (sys_get_random_number(rs->u, sizeof(rs->u)) == 0) + goto ok; + +#elif LJ_TARGET_PS4 || LJ_TARGET_PS5 || LJ_TARGET_PSVITA + + if (sceRandomGetRandomNumber(rs->u, sizeof(rs->u)) == 0) + goto ok; + +#elif LJ_TARGET_NX + + if (getentropy(rs->u, sizeof(rs->u)) == 0) + goto ok; + +#elif LJ_TARGET_UWP || LJ_TARGET_XBOXONE + + if (BCryptGenRandom(NULL, (PUCHAR)(rs->u), (ULONG)sizeof(rs->u), + BCRYPT_USE_SYSTEM_PREFERRED_RNG) >= 0) + goto ok; + +#elif LJ_TARGET_WINDOWS + + /* Keep the library loaded in case multiple VMs are started. 
*/ + if (!libfunc_rgr) { + HMODULE lib = LJ_WIN_LOADLIBA("advapi32.dll"); + if (!lib) return 0; + libfunc_rgr = (PRGR)GetProcAddress(lib, "SystemFunction036"); + if (!libfunc_rgr) return 0; + } + if (libfunc_rgr(rs->u, (ULONG)sizeof(rs->u))) + goto ok; + +#elif LJ_TARGET_POSIX + +#if LJ_TARGET_LINUX && defined(SYS_getrandom) + + if (syscall(SYS_getrandom, rs->u, sizeof(rs->u), 0) == (long)sizeof(rs->u)) + goto ok; + +#elif LJ_TARGET_HAS_GETENTROPY + +#ifdef __ELF__ + if (&getentropy && getentropy(rs->u, sizeof(rs->u)) == 0) + goto ok; +#else + if (getentropy(rs->u, sizeof(rs->u)) == 0) + goto ok; +#endif + +#endif + + /* Fallback to /dev/urandom. This may fail if the device is not + ** existent or accessible in a chroot or container, or if the process + ** or the OS ran out of file descriptors. + */ + { + int fd = open("/dev/urandom", O_RDONLY|O_CLOEXEC); + if (fd != -1) { + ssize_t n = read(fd, rs->u, sizeof(rs->u)); + (void)close(fd); + if (n == (ssize_t)sizeof(rs->u)) + goto ok; + } + } + +#else + + /* Add an elif above for your OS with a secure PRNG seed. + ** Note that fiddling around with rand(), getpid(), time() or coercing + ** ASLR to yield a few bits of randomness is not helpful. + ** If you don't want any security, then don't pretend you have any + ** and simply define LUAJIT_SECURITY_PRNG=0 for the build. + */ +#error "Missing secure PRNG seed for this OS" + +#endif + return 0; /* Fail. */ + +ok: + lj_prng_condition(rs); + (void)lj_prng_u64(rs); + return 1; /* Success. */ +} + +#endif + diff --cc src/lj_prng.h index bdc958ab,00000000..3dd9dbc0 mode 100644,000000..100644 --- a/src/lj_prng.h +++ b/src/lj_prng.h @@@ -1,24 -1,0 +1,24 @@@ +/* +** Pseudo-random number generation. - ** Copyright (C) 2005-2022 Mike Pall. See Copyright Notice in luajit.h ++** Copyright (C) 2005-2023 Mike Pall. 
See Copyright Notice in luajit.h +*/ + +#ifndef _LJ_PRNG_H +#define _LJ_PRNG_H + +#include "lj_def.h" + +LJ_FUNC int LJ_FASTCALL lj_prng_seed_secure(PRNGState *rs); +LJ_FUNC uint64_t LJ_FASTCALL lj_prng_u64(PRNGState *rs); +LJ_FUNC uint64_t LJ_FASTCALL lj_prng_u64d(PRNGState *rs); + +/* This is just the precomputed result of lib_math.c:random_seed(rs, 0.0). */ +static LJ_AINLINE void lj_prng_seed_fixed(PRNGState *rs) +{ + rs->u[0] = U64x(a0d27757,0a345b8c); + rs->u[1] = U64x(764a296c,5d4aa64f); + rs->u[2] = U64x(51220704,070adeaa); + rs->u[3] = U64x(2a2717b5,a7b7b927); +} + +#endif diff --cc src/lj_profile.c index 4a13537d,00000000..8cefd5fb mode 100644,000000..100644 --- a/src/lj_profile.c +++ b/src/lj_profile.c @@@ -1,371 -1,0 +1,371 @@@ +/* +** Low-overhead profiling. - ** Copyright (C) 2005-2022 Mike Pall. See Copyright Notice in luajit.h ++** Copyright (C) 2005-2023 Mike Pall. See Copyright Notice in luajit.h +*/ + +#define lj_profile_c +#define LUA_CORE + +#include "lj_obj.h" + +#if LJ_HASPROFILE + +#include "lj_buf.h" +#include "lj_frame.h" +#include "lj_debug.h" +#include "lj_dispatch.h" +#if LJ_HASJIT +#include "lj_jit.h" +#include "lj_trace.h" +#endif +#include "lj_profile.h" + +#include "luajit.h" + +#if LJ_PROFILE_SIGPROF + +#include +#include +#define profile_lock(ps) UNUSED(ps) +#define profile_unlock(ps) UNUSED(ps) + +#elif LJ_PROFILE_PTHREAD + +#include +#include +#if LJ_TARGET_PS3 +#include +#endif +#define profile_lock(ps) pthread_mutex_lock(&ps->lock) +#define profile_unlock(ps) pthread_mutex_unlock(&ps->lock) + +#elif LJ_PROFILE_WTHREAD + +#define WIN32_LEAN_AND_MEAN +#if LJ_TARGET_XBOX360 +#include +#include +#else +#include +#endif +typedef unsigned int (WINAPI *WMM_TPFUNC)(unsigned int); +#define profile_lock(ps) EnterCriticalSection(&ps->lock) +#define profile_unlock(ps) LeaveCriticalSection(&ps->lock) + +#endif + +/* Profiler state. */ +typedef struct ProfileState { + global_State *g; /* VM state that started the profiler. 
*/ + luaJIT_profile_callback cb; /* Profiler callback. */ + void *data; /* Profiler callback data. */ + SBuf sb; /* String buffer for stack dumps. */ + int interval; /* Sample interval in milliseconds. */ + int samples; /* Number of samples for next callback. */ + int vmstate; /* VM state when profile timer triggered. */ +#if LJ_PROFILE_SIGPROF + struct sigaction oldsa; /* Previous SIGPROF state. */ +#elif LJ_PROFILE_PTHREAD + pthread_mutex_t lock; /* g->hookmask update lock. */ + pthread_t thread; /* Timer thread. */ + int abort; /* Abort timer thread. */ +#elif LJ_PROFILE_WTHREAD +#if LJ_TARGET_WINDOWS + HINSTANCE wmm; /* WinMM library handle. */ + WMM_TPFUNC wmm_tbp; /* WinMM timeBeginPeriod function. */ + WMM_TPFUNC wmm_tep; /* WinMM timeEndPeriod function. */ +#endif + CRITICAL_SECTION lock; /* g->hookmask update lock. */ + HANDLE thread; /* Timer thread. */ + int abort; /* Abort timer thread. */ +#endif +} ProfileState; + +/* Sadly, we have to use a static profiler state. +** +** The SIGPROF variant needs a static pointer to the global state, anyway. +** And it would be hard to extend for multiple threads. You can still use +** multiple VMs in multiple threads, but only profile one at a time. +*/ +static ProfileState profile_state; + +/* Default sample interval in milliseconds. 
*/ +#define LJ_PROFILE_INTERVAL_DEFAULT 10 + +/* -- Profiler/hook interaction ------------------------------------------- */ + +#if !LJ_PROFILE_SIGPROF +void LJ_FASTCALL lj_profile_hook_enter(global_State *g) +{ + ProfileState *ps = &profile_state; + if (ps->g) { + profile_lock(ps); + hook_enter(g); + profile_unlock(ps); + } else { + hook_enter(g); + } +} + +void LJ_FASTCALL lj_profile_hook_leave(global_State *g) +{ + ProfileState *ps = &profile_state; + if (ps->g) { + profile_lock(ps); + hook_leave(g); + profile_unlock(ps); + } else { + hook_leave(g); + } +} +#endif + +/* -- Profile callbacks --------------------------------------------------- */ + +/* Callback from profile hook (HOOK_PROFILE already cleared). */ +void LJ_FASTCALL lj_profile_interpreter(lua_State *L) +{ + ProfileState *ps = &profile_state; + global_State *g = G(L); + uint8_t mask; + profile_lock(ps); + mask = (g->hookmask & ~HOOK_PROFILE); + if (!(mask & HOOK_VMEVENT)) { + int samples = ps->samples; + ps->samples = 0; + g->hookmask = HOOK_VMEVENT; + lj_dispatch_update(g); + profile_unlock(ps); + ps->cb(ps->data, L, samples, ps->vmstate); /* Invoke user callback. */ + profile_lock(ps); + mask |= (g->hookmask & HOOK_PROFILE); + } + g->hookmask = mask; + lj_dispatch_update(g); + profile_unlock(ps); +} + +/* Trigger profile hook. Asynchronous call from OS-specific profile timer. */ +static void profile_trigger(ProfileState *ps) +{ + global_State *g = ps->g; + uint8_t mask; + profile_lock(ps); + ps->samples++; /* Always increment number of samples. */ + mask = g->hookmask; + if (!(mask & (HOOK_PROFILE|HOOK_VMEVENT|HOOK_GC))) { /* Set profile hook. */ + int st = g->vmstate; + ps->vmstate = st >= 0 ? 'N' : + st == ~LJ_VMST_INTERP ? 'I' : + st == ~LJ_VMST_C ? 'C' : + st == ~LJ_VMST_GC ? 
'G' : 'J'; + g->hookmask = (mask | HOOK_PROFILE); + lj_dispatch_update(g); + } + profile_unlock(ps); +} + +/* -- OS-specific profile timer handling ---------------------------------- */ + +#if LJ_PROFILE_SIGPROF + +/* SIGPROF handler. */ +static void profile_signal(int sig) +{ + UNUSED(sig); + profile_trigger(&profile_state); +} + +/* Start profiling timer. */ +static void profile_timer_start(ProfileState *ps) +{ + int interval = ps->interval; + struct itimerval tm; + struct sigaction sa; + tm.it_value.tv_sec = tm.it_interval.tv_sec = interval / 1000; + tm.it_value.tv_usec = tm.it_interval.tv_usec = (interval % 1000) * 1000; + setitimer(ITIMER_PROF, &tm, NULL); +#if LJ_TARGET_QNX + sa.sa_flags = 0; +#else + sa.sa_flags = SA_RESTART; +#endif + sa.sa_handler = profile_signal; + sigemptyset(&sa.sa_mask); + sigaction(SIGPROF, &sa, &ps->oldsa); +} + +/* Stop profiling timer. */ +static void profile_timer_stop(ProfileState *ps) +{ + struct itimerval tm; + tm.it_value.tv_sec = tm.it_interval.tv_sec = 0; + tm.it_value.tv_usec = tm.it_interval.tv_usec = 0; + setitimer(ITIMER_PROF, &tm, NULL); + sigaction(SIGPROF, &ps->oldsa, NULL); +} + +#elif LJ_PROFILE_PTHREAD + +/* POSIX timer thread. */ +static void *profile_thread(ProfileState *ps) +{ + int interval = ps->interval; +#if !LJ_TARGET_PS3 + struct timespec ts; + ts.tv_sec = interval / 1000; + ts.tv_nsec = (interval % 1000) * 1000000; +#endif + while (1) { +#if LJ_TARGET_PS3 + sys_timer_usleep(interval * 1000); +#else + nanosleep(&ts, NULL); +#endif + if (ps->abort) break; + profile_trigger(ps); + } + return NULL; +} + +/* Start profiling timer thread. */ +static void profile_timer_start(ProfileState *ps) +{ + pthread_mutex_init(&ps->lock, 0); + ps->abort = 0; + pthread_create(&ps->thread, NULL, (void *(*)(void *))profile_thread, ps); +} + +/* Stop profiling timer thread. 
*/ +static void profile_timer_stop(ProfileState *ps) +{ + ps->abort = 1; + pthread_join(ps->thread, NULL); + pthread_mutex_destroy(&ps->lock); +} + +#elif LJ_PROFILE_WTHREAD + +/* Windows timer thread. */ +static DWORD WINAPI profile_thread(void *psx) +{ + ProfileState *ps = (ProfileState *)psx; + int interval = ps->interval; +#if LJ_TARGET_WINDOWS && !LJ_TARGET_UWP + ps->wmm_tbp(interval); +#endif + while (1) { + Sleep(interval); + if (ps->abort) break; + profile_trigger(ps); + } +#if LJ_TARGET_WINDOWS && !LJ_TARGET_UWP + ps->wmm_tep(interval); +#endif + return 0; +} + +/* Start profiling timer thread. */ +static void profile_timer_start(ProfileState *ps) +{ +#if LJ_TARGET_WINDOWS && !LJ_TARGET_UWP + if (!ps->wmm) { /* Load WinMM library on-demand. */ + ps->wmm = LJ_WIN_LOADLIBA("winmm.dll"); + if (ps->wmm) { + ps->wmm_tbp = (WMM_TPFUNC)GetProcAddress(ps->wmm, "timeBeginPeriod"); + ps->wmm_tep = (WMM_TPFUNC)GetProcAddress(ps->wmm, "timeEndPeriod"); + if (!ps->wmm_tbp || !ps->wmm_tep) { + ps->wmm = NULL; + return; + } + } + } +#endif + InitializeCriticalSection(&ps->lock); + ps->abort = 0; + ps->thread = CreateThread(NULL, 0, profile_thread, ps, 0, NULL); +} + +/* Stop profiling timer thread. */ +static void profile_timer_stop(ProfileState *ps) +{ + ps->abort = 1; + WaitForSingleObject(ps->thread, INFINITE); + DeleteCriticalSection(&ps->lock); +} + +#endif + +/* -- Public profiling API ------------------------------------------------ */ + +/* Start profiling. 
*/ +LUA_API void luaJIT_profile_start(lua_State *L, const char *mode, + luaJIT_profile_callback cb, void *data) +{ + ProfileState *ps = &profile_state; + int interval = LJ_PROFILE_INTERVAL_DEFAULT; + while (*mode) { + int m = *mode++; + switch (m) { + case 'i': + interval = 0; + while (*mode >= '0' && *mode <= '9') + interval = interval * 10 + (*mode++ - '0'); + if (interval <= 0) interval = 1; + break; +#if LJ_HASJIT + case 'l': case 'f': + L2J(L)->prof_mode = m; + lj_trace_flushall(L); + break; +#endif + default: /* Ignore unknown mode chars. */ + break; + } + } + if (ps->g) { + luaJIT_profile_stop(L); + if (ps->g) return; /* Profiler in use by another VM. */ + } + ps->g = G(L); + ps->interval = interval; + ps->cb = cb; + ps->data = data; + ps->samples = 0; + lj_buf_init(L, &ps->sb); + profile_timer_start(ps); +} + +/* Stop profiling. */ +LUA_API void luaJIT_profile_stop(lua_State *L) +{ + ProfileState *ps = &profile_state; + global_State *g = ps->g; + if (G(L) == g) { /* Only stop profiler if started by this VM. */ + profile_timer_stop(ps); + g->hookmask &= ~HOOK_PROFILE; + lj_dispatch_update(g); +#if LJ_HASJIT + G2J(g)->prof_mode = 0; + lj_trace_flushall(L); +#endif + lj_buf_free(g, &ps->sb); + ps->sb.w = ps->sb.e = NULL; + ps->g = NULL; + } +} + +/* Return a compact stack dump. */ +LUA_API const char *luaJIT_profile_dumpstack(lua_State *L, const char *fmt, + int depth, size_t *len) +{ + ProfileState *ps = &profile_state; + SBuf *sb = &ps->sb; + setsbufL(sb, L); + lj_buf_reset(sb); + lj_debug_dumpstack(L, sb, fmt, depth); + *len = (size_t)sbuflen(sb); + return sb->b; +} + +#endif diff --cc src/lj_profile.h index 3969f8e8,00000000..68bb9a1f mode 100644,000000..100644 --- a/src/lj_profile.h +++ b/src/lj_profile.h @@@ -1,21 -1,0 +1,21 @@@ +/* +** Low-overhead profiling. - ** Copyright (C) 2005-2022 Mike Pall. See Copyright Notice in luajit.h ++** Copyright (C) 2005-2023 Mike Pall. 
See Copyright Notice in luajit.h +*/ + +#ifndef _LJ_PROFILE_H +#define _LJ_PROFILE_H + +#include "lj_obj.h" + +#if LJ_HASPROFILE + +LJ_FUNC void LJ_FASTCALL lj_profile_interpreter(lua_State *L); +#if !LJ_PROFILE_SIGPROF +LJ_FUNC void LJ_FASTCALL lj_profile_hook_enter(global_State *g); +LJ_FUNC void LJ_FASTCALL lj_profile_hook_leave(global_State *g); +#endif + +#endif + +#endif diff --cc src/lj_serialize.c index f7e51828,00000000..83881766 mode 100644,000000..100644 --- a/src/lj_serialize.c +++ b/src/lj_serialize.c @@@ -1,539 -1,0 +1,539 @@@ +/* +** Object de/serialization. - ** Copyright (C) 2005-2022 Mike Pall. See Copyright Notice in luajit.h ++** Copyright (C) 2005-2023 Mike Pall. See Copyright Notice in luajit.h +*/ + +#define lj_serialize_c +#define LUA_CORE + +#include "lj_obj.h" + +#if LJ_HASBUFFER +#include "lj_err.h" +#include "lj_buf.h" +#include "lj_str.h" +#include "lj_tab.h" +#include "lj_udata.h" +#if LJ_HASFFI +#include "lj_ctype.h" +#include "lj_cdata.h" +#endif +#if LJ_HASJIT +#include "lj_ir.h" +#endif +#include "lj_serialize.h" + +/* Tags for internal serialization format. 
*/ +enum { + SER_TAG_NIL, /* 0x00 */ + SER_TAG_FALSE, + SER_TAG_TRUE, + SER_TAG_NULL, + SER_TAG_LIGHTUD32, + SER_TAG_LIGHTUD64, + SER_TAG_INT, + SER_TAG_NUM, + SER_TAG_TAB, /* 0x08 */ + SER_TAG_DICT_MT = SER_TAG_TAB+6, + SER_TAG_DICT_STR, + SER_TAG_INT64, /* 0x10 */ + SER_TAG_UINT64, + SER_TAG_COMPLEX, + SER_TAG_0x13, + SER_TAG_0x14, + SER_TAG_0x15, + SER_TAG_0x16, + SER_TAG_0x17, + SER_TAG_0x18, /* 0x18 */ + SER_TAG_0x19, + SER_TAG_0x1a, + SER_TAG_0x1b, + SER_TAG_0x1c, + SER_TAG_0x1d, + SER_TAG_0x1e, + SER_TAG_0x1f, + SER_TAG_STR, /* 0x20 + str->len */ +}; +LJ_STATIC_ASSERT((SER_TAG_TAB & 7) == 0); + +/* -- Helper functions ---------------------------------------------------- */ + +static LJ_AINLINE char *serialize_more(char *w, SBufExt *sbx, MSize sz) +{ + if (LJ_UNLIKELY(sz > (MSize)(sbx->e - w))) { + sbx->w = w; + w = lj_buf_more2((SBuf *)sbx, sz); + } + return w; +} + +/* Write U124 to buffer. */ +static LJ_NOINLINE char *serialize_wu124_(char *w, uint32_t v) +{ + if (v < 0x1fe0) { + v -= 0xe0; + *w++ = (char)(0xe0 | (v >> 8)); *w++ = (char)v; + } else { + *w++ = (char)0xff; +#if LJ_BE + v = lj_bswap(v); +#endif + memcpy(w, &v, 4); w += 4; + } + return w; +} + +static LJ_AINLINE char *serialize_wu124(char *w, uint32_t v) +{ + if (LJ_LIKELY(v < 0xe0)) { + *w++ = (char)v; + return w; + } else { + return serialize_wu124_(w, v); + } +} + +static LJ_NOINLINE char *serialize_ru124_(char *r, char *w, uint32_t *pv) +{ + uint32_t v = *pv; + if (v != 0xff) { + if (r >= w) return NULL; + v = ((v & 0x1f) << 8) + *(uint8_t *)r + 0xe0; r++; + } else { + if (r + 4 > w) return NULL; + v = lj_getu32(r); r += 4; +#if LJ_BE + v = lj_bswap(v); +#endif + } + *pv = v; + return r; +} + +static LJ_AINLINE char *serialize_ru124(char *r, char *w, uint32_t *pv) +{ + if (LJ_LIKELY(r < w)) { + uint32_t v = *(uint8_t *)r; r++; + *pv = v; + if (LJ_UNLIKELY(v >= 0xe0)) { + r = serialize_ru124_(r, w, pv); + } + return r; + } + return NULL; +} + +/* Prepare string dictionary for use (once). 
*/ +void LJ_FASTCALL lj_serialize_dict_prep_str(lua_State *L, GCtab *dict) +{ + if (!dict->hmask) { /* No hash part means not prepared, yet. */ + MSize i, len = lj_tab_len(dict); + if (!len) return; + lj_tab_resize(L, dict, dict->asize, hsize2hbits(len)); + for (i = 1; i <= len && i < dict->asize; i++) { + cTValue *o = arrayslot(dict, i); + if (tvisstr(o)) { + if (!lj_tab_getstr(dict, strV(o))) { /* Ignore dups. */ + lj_tab_newkey(L, dict, o)->u64 = (uint64_t)(i-1); + } + } else if (!tvisfalse(o)) { + lj_err_caller(L, LJ_ERR_BUFFER_BADOPT); + } + } + } +} + +/* Prepare metatable dictionary for use (once). */ +void LJ_FASTCALL lj_serialize_dict_prep_mt(lua_State *L, GCtab *dict) +{ + if (!dict->hmask) { /* No hash part means not prepared, yet. */ + MSize i, len = lj_tab_len(dict); + if (!len) return; + lj_tab_resize(L, dict, dict->asize, hsize2hbits(len)); + for (i = 1; i <= len && i < dict->asize; i++) { + cTValue *o = arrayslot(dict, i); + if (tvistab(o)) { + if (tvisnil(lj_tab_get(L, dict, o))) { /* Ignore dups. */ + lj_tab_newkey(L, dict, o)->u64 = (uint64_t)(i-1); + } + } else if (!tvisfalse(o)) { + lj_err_caller(L, LJ_ERR_BUFFER_BADOPT); + } + } + } +} + +/* -- Internal serializer ------------------------------------------------- */ + +/* Put serialized object into buffer. */ +static char *serialize_put(char *w, SBufExt *sbx, cTValue *o) +{ + if (LJ_LIKELY(tvisstr(o))) { + const GCstr *str = strV(o); + MSize len = str->len; + w = serialize_more(w, sbx, 5+len); + w = serialize_wu124(w, SER_TAG_STR + len); + w = lj_buf_wmem(w, strdata(str), len); + } else if (tvisint(o)) { + uint32_t x = LJ_BE ? lj_bswap((uint32_t)intV(o)) : (uint32_t)intV(o); + w = serialize_more(w, sbx, 1+4); + *w++ = SER_TAG_INT; memcpy(w, &x, 4); w += 4; + } else if (tvisnum(o)) { + uint64_t x = LJ_BE ? 
lj_bswap64(o->u64) : o->u64; + w = serialize_more(w, sbx, 1+sizeof(lua_Number)); + *w++ = SER_TAG_NUM; memcpy(w, &x, 8); w += 8; + } else if (tvispri(o)) { + w = serialize_more(w, sbx, 1); + *w++ = (char)(SER_TAG_NIL + ~itype(o)); + } else if (tvistab(o)) { + const GCtab *t = tabV(o); + uint32_t narray = 0, nhash = 0, one = 2; + if (sbx->depth <= 0) lj_err_caller(sbufL(sbx), LJ_ERR_BUFFER_DEPTH); + sbx->depth--; + if (t->asize > 0) { /* Determine max. length of array part. */ + ptrdiff_t i; + TValue *array = tvref(t->array); + for (i = (ptrdiff_t)t->asize-1; i >= 0; i--) + if (!tvisnil(&array[i])) + break; + narray = (uint32_t)(i+1); + if (narray && tvisnil(&array[0])) one = 4; + } + if (t->hmask > 0) { /* Count number of used hash slots. */ + uint32_t i, hmask = t->hmask; + Node *node = noderef(t->node); + for (i = 0; i <= hmask; i++) + nhash += !tvisnil(&node[i].val); + } + /* Write metatable index. */ + if (LJ_UNLIKELY(tabref(sbx->dict_mt)) && tabref(t->metatable)) { + TValue mto; + Node *n; + settabV(sbufL(sbx), &mto, tabref(t->metatable)); + n = hashgcref(tabref(sbx->dict_mt), mto.gcr); + do { + if (n->key.u64 == mto.u64) { + uint32_t idx = n->val.u32.lo; + w = serialize_more(w, sbx, 1+5); + *w++ = SER_TAG_DICT_MT; + w = serialize_wu124(w, idx); + break; + } + } while ((n = nextnode(n))); + } + /* Write number of array slots and hash slots. */ + w = serialize_more(w, sbx, 1+2*5); + *w++ = (char)(SER_TAG_TAB + (nhash ? 1 : 0) + (narray ? one : 0)); + if (narray) w = serialize_wu124(w, narray); + if (nhash) w = serialize_wu124(w, nhash); + if (narray) { /* Write array entries. */ + cTValue *oa = tvref(t->array) + (one >> 2); + cTValue *oe = tvref(t->array) + narray; + while (oa < oe) w = serialize_put(w, sbx, oa++); + } + if (nhash) { /* Write hash entries. 
*/ + const Node *node = noderef(t->node) + t->hmask; + GCtab *dict_str = tabref(sbx->dict_str); + if (LJ_UNLIKELY(dict_str)) { + for (;; node--) + if (!tvisnil(&node->val)) { + if (LJ_LIKELY(tvisstr(&node->key))) { + /* Inlined lj_tab_getstr is 30% faster. */ + const GCstr *str = strV(&node->key); + Node *n = hashstr(dict_str, str); + do { + if (tvisstr(&n->key) && strV(&n->key) == str) { + uint32_t idx = n->val.u32.lo; + w = serialize_more(w, sbx, 1+5); + *w++ = SER_TAG_DICT_STR; + w = serialize_wu124(w, idx); + break; + } + n = nextnode(n); + if (!n) { + MSize len = str->len; + w = serialize_more(w, sbx, 5+len); + w = serialize_wu124(w, SER_TAG_STR + len); + w = lj_buf_wmem(w, strdata(str), len); + break; + } + } while (1); + } else { + w = serialize_put(w, sbx, &node->key); + } + w = serialize_put(w, sbx, &node->val); + if (--nhash == 0) break; + } + } else { + for (;; node--) + if (!tvisnil(&node->val)) { + w = serialize_put(w, sbx, &node->key); + w = serialize_put(w, sbx, &node->val); + if (--nhash == 0) break; + } + } + } + sbx->depth++; +#if LJ_HASFFI + } else if (tviscdata(o)) { + CTState *cts = ctype_cts(sbufL(sbx)); + CType *s = ctype_raw(cts, cdataV(o)->ctypeid); + uint8_t *sp = cdataptr(cdataV(o)); + if (ctype_isinteger(s->info) && s->size == 8) { + w = serialize_more(w, sbx, 1+8); + *w++ = (s->info & CTF_UNSIGNED) ? SER_TAG_UINT64 : SER_TAG_INT64; +#if LJ_BE + { uint64_t u = lj_bswap64(*(uint64_t *)sp); memcpy(w, &u, 8); } +#else + memcpy(w, sp, 8); +#endif + w += 8; + } else if (ctype_iscomplex(s->info) && s->size == 16) { + w = serialize_more(w, sbx, 1+16); + *w++ = SER_TAG_COMPLEX; +#if LJ_BE + { /* Only swap the doubles. The re/im order stays the same. 
*/ + uint64_t u = lj_bswap64(((uint64_t *)sp)[0]); memcpy(w, &u, 8); + u = lj_bswap64(((uint64_t *)sp)[1]); memcpy(w+8, &u, 8); + } +#else + memcpy(w, sp, 16); +#endif + w += 16; + } else { + goto badenc; /* NYI other cdata */ + } +#endif + } else if (tvislightud(o)) { + uintptr_t ud = (uintptr_t)lightudV(G(sbufL(sbx)), o); + w = serialize_more(w, sbx, 1+sizeof(ud)); + if (ud == 0) { + *w++ = SER_TAG_NULL; + } else if (LJ_32 || checku32(ud)) { +#if LJ_BE && LJ_64 + ud = lj_bswap64(ud); +#elif LJ_BE + ud = lj_bswap(ud); +#endif + *w++ = SER_TAG_LIGHTUD32; memcpy(w, &ud, 4); w += 4; +#if LJ_64 + } else { +#if LJ_BE + ud = lj_bswap64(ud); +#endif + *w++ = SER_TAG_LIGHTUD64; memcpy(w, &ud, 8); w += 8; +#endif + } + } else { + /* NYI userdata */ +#if LJ_HASFFI + badenc: +#endif + lj_err_callerv(sbufL(sbx), LJ_ERR_BUFFER_BADENC, lj_typename(o)); + } + return w; +} + +/* Get serialized object from buffer. */ +static char *serialize_get(char *r, SBufExt *sbx, TValue *o) +{ + char *w = sbx->w; + uint32_t tp; + r = serialize_ru124(r, w, &tp); if (LJ_UNLIKELY(!r)) goto eob; + if (LJ_LIKELY(tp >= SER_TAG_STR)) { + uint32_t len = tp - SER_TAG_STR; + if (LJ_UNLIKELY(len > (uint32_t)(w - r))) goto eob; + setstrV(sbufL(sbx), o, lj_str_new(sbufL(sbx), r, len)); + r += len; + } else if (tp == SER_TAG_INT) { + if (LJ_UNLIKELY(r + 4 > w)) goto eob; + setintV(o, (int32_t)(LJ_BE ? lj_bswap(lj_getu32(r)) : lj_getu32(r))); + r += 4; + } else if (tp == SER_TAG_NUM) { + if (LJ_UNLIKELY(r + 8 > w)) goto eob; + memcpy(o, r, 8); r += 8; +#if LJ_BE + o->u64 = lj_bswap64(o->u64); +#endif + if (!tvisnum(o)) setnanV(o); /* Fix non-canonical NaNs. 
*/ + } else if (tp <= SER_TAG_TRUE) { + setpriV(o, ~tp); + } else if (tp == SER_TAG_DICT_STR) { + GCtab *dict_str; + uint32_t idx; + r = serialize_ru124(r, w, &idx); if (LJ_UNLIKELY(!r)) goto eob; + idx++; + dict_str = tabref(sbx->dict_str); + if (dict_str && idx < dict_str->asize && tvisstr(arrayslot(dict_str, idx))) + copyTV(sbufL(sbx), o, arrayslot(dict_str, idx)); + else + lj_err_callerv(sbufL(sbx), LJ_ERR_BUFFER_BADDICTX, idx); + } else if (tp >= SER_TAG_TAB && tp <= SER_TAG_DICT_MT) { + uint32_t narray = 0, nhash = 0; + GCtab *t, *mt = NULL; + if (sbx->depth <= 0) lj_err_caller(sbufL(sbx), LJ_ERR_BUFFER_DEPTH); + sbx->depth--; + if (tp == SER_TAG_DICT_MT) { + GCtab *dict_mt; + uint32_t idx; + r = serialize_ru124(r, w, &idx); if (LJ_UNLIKELY(!r)) goto eob; + idx++; + dict_mt = tabref(sbx->dict_mt); + if (dict_mt && idx < dict_mt->asize && tvistab(arrayslot(dict_mt, idx))) + mt = tabV(arrayslot(dict_mt, idx)); + else + lj_err_callerv(sbufL(sbx), LJ_ERR_BUFFER_BADDICTX, idx); + r = serialize_ru124(r, w, &tp); if (LJ_UNLIKELY(!r)) goto eob; + if (!(tp >= SER_TAG_TAB && tp < SER_TAG_DICT_MT)) goto badtag; + } + if (tp >= SER_TAG_TAB+2) { + r = serialize_ru124(r, w, &narray); if (LJ_UNLIKELY(!r)) goto eob; + } + if ((tp & 1)) { + r = serialize_ru124(r, w, &nhash); if (LJ_UNLIKELY(!r)) goto eob; + } + t = lj_tab_new(sbufL(sbx), narray, hsize2hbits(nhash)); + /* NOBARRIER: The table is new (marked white). 
*/ + setgcref(t->metatable, obj2gco(mt)); + settabV(sbufL(sbx), o, t); + if (narray) { + TValue *oa = tvref(t->array) + (tp >= SER_TAG_TAB+4); + TValue *oe = tvref(t->array) + narray; + while (oa < oe) r = serialize_get(r, sbx, oa++); + } + if (nhash) { + do { + TValue k, *v; + r = serialize_get(r, sbx, &k); + v = lj_tab_set(sbufL(sbx), t, &k); + if (LJ_UNLIKELY(!tvisnil(v))) + lj_err_caller(sbufL(sbx), LJ_ERR_BUFFER_DUPKEY); + r = serialize_get(r, sbx, v); + } while (--nhash); + } + sbx->depth++; +#if LJ_HASFFI + } else if (tp >= SER_TAG_INT64 && tp <= SER_TAG_COMPLEX) { + uint32_t sz = tp == SER_TAG_COMPLEX ? 16 : 8; + GCcdata *cd; + if (LJ_UNLIKELY(r + sz > w)) goto eob; + if (LJ_UNLIKELY(!ctype_ctsG(G(sbufL(sbx))))) goto badtag; + cd = lj_cdata_new_(sbufL(sbx), + tp == SER_TAG_INT64 ? CTID_INT64 : + tp == SER_TAG_UINT64 ? CTID_UINT64 : CTID_COMPLEX_DOUBLE, + sz); + memcpy(cdataptr(cd), r, sz); r += sz; +#if LJ_BE + *(uint64_t *)cdataptr(cd) = lj_bswap64(*(uint64_t *)cdataptr(cd)); + if (sz == 16) + ((uint64_t *)cdataptr(cd))[1] = lj_bswap64(((uint64_t *)cdataptr(cd))[1]); +#endif + if (sz == 16) { /* Fix non-canonical NaNs. */ + TValue *cdo = (TValue *)cdataptr(cd); + if (!tvisnum(&cdo[0])) setnanV(&cdo[0]); + if (!tvisnum(&cdo[1])) setnanV(&cdo[1]); + } + setcdataV(sbufL(sbx), o, cd); +#endif + } else if (tp <= (LJ_64 ? SER_TAG_LIGHTUD64 : SER_TAG_LIGHTUD32)) { + uintptr_t ud = 0; + if (tp == SER_TAG_LIGHTUD32) { + if (LJ_UNLIKELY(r + 4 > w)) goto eob; + ud = (uintptr_t)(LJ_BE ? 
lj_bswap(lj_getu32(r)) : lj_getu32(r)); + r += 4; + } +#if LJ_64 + else if (tp == SER_TAG_LIGHTUD64) { + if (LJ_UNLIKELY(r + 8 > w)) goto eob; + memcpy(&ud, r, 8); r += 8; +#if LJ_BE + ud = lj_bswap64(ud); +#endif + } + setrawlightudV(o, lj_lightud_intern(sbufL(sbx), (void *)ud)); +#else + setrawlightudV(o, (void *)ud); +#endif + } else { +badtag: + lj_err_callerv(sbufL(sbx), LJ_ERR_BUFFER_BADDEC, tp); + } + return r; +eob: + lj_err_caller(sbufL(sbx), LJ_ERR_BUFFER_EOB); + return NULL; +} + +/* -- External serialization API ------------------------------------------ */ + +/* Encode to buffer. */ +SBufExt * LJ_FASTCALL lj_serialize_put(SBufExt *sbx, cTValue *o) +{ + sbx->depth = LJ_SERIALIZE_DEPTH; + sbx->w = serialize_put(sbx->w, sbx, o); + return sbx; +} + +/* Decode from buffer. */ +char * LJ_FASTCALL lj_serialize_get(SBufExt *sbx, TValue *o) +{ + sbx->depth = LJ_SERIALIZE_DEPTH; + return serialize_get(sbx->r, sbx, o); +} + +/* Stand-alone encoding, borrowing from global temporary buffer. */ +GCstr * LJ_FASTCALL lj_serialize_encode(lua_State *L, cTValue *o) +{ + SBufExt sbx; + char *w; + memset(&sbx, 0, sizeof(SBufExt)); + lj_bufx_set_borrow(L, &sbx, &G(L)->tmpbuf); + sbx.depth = LJ_SERIALIZE_DEPTH; + w = serialize_put(sbx.w, &sbx, o); + return lj_str_new(L, sbx.b, (size_t)(w - sbx.b)); +} + +/* Stand-alone decoding, copy-on-write from string. */ +void lj_serialize_decode(lua_State *L, TValue *o, GCstr *str) +{ + SBufExt sbx; + char *r; + memset(&sbx, 0, sizeof(SBufExt)); + lj_bufx_set_cow(L, &sbx, strdata(str), str->len); + /* No need to set sbx.cowref here. */ + sbx.depth = LJ_SERIALIZE_DEPTH; + r = serialize_get(sbx.r, &sbx, o); + if (r != sbx.w) lj_err_caller(L, LJ_ERR_BUFFER_LEFTOV); +} + +#if LJ_HASJIT +/* Peek into buffer to find the result IRType for specialization purposes. 
*/ +LJ_FUNC MSize LJ_FASTCALL lj_serialize_peektype(SBufExt *sbx) +{ + uint32_t tp; + if (serialize_ru124(sbx->r, sbx->w, &tp)) { + /* This must match the handling of all tags in the decoder above. */ + switch (tp) { + case SER_TAG_NIL: return IRT_NIL; + case SER_TAG_FALSE: return IRT_FALSE; + case SER_TAG_TRUE: return IRT_TRUE; + case SER_TAG_NULL: case SER_TAG_LIGHTUD32: case SER_TAG_LIGHTUD64: + return IRT_LIGHTUD; + case SER_TAG_INT: return LJ_DUALNUM ? IRT_INT : IRT_NUM; + case SER_TAG_NUM: return IRT_NUM; + case SER_TAG_TAB: case SER_TAG_TAB+1: case SER_TAG_TAB+2: + case SER_TAG_TAB+3: case SER_TAG_TAB+4: case SER_TAG_TAB+5: + case SER_TAG_DICT_MT: + return IRT_TAB; + case SER_TAG_INT64: case SER_TAG_UINT64: case SER_TAG_COMPLEX: + return IRT_CDATA; + case SER_TAG_DICT_STR: + default: + return IRT_STR; + } + } + return IRT_NIL; /* Will fail on actual decode. */ +} +#endif + +#endif diff --cc src/lj_serialize.h index d3f4275a,00000000..da823573 mode 100644,000000..100644 --- a/src/lj_serialize.h +++ b/src/lj_serialize.h @@@ -1,28 -1,0 +1,28 @@@ +/* +** Object de/serialization. - ** Copyright (C) 2005-2022 Mike Pall. See Copyright Notice in luajit.h ++** Copyright (C) 2005-2023 Mike Pall. See Copyright Notice in luajit.h +*/ + +#ifndef _LJ_SERIALIZE_H +#define _LJ_SERIALIZE_H + +#include "lj_obj.h" +#include "lj_buf.h" + +#if LJ_HASBUFFER + +#define LJ_SERIALIZE_DEPTH 100 /* Default depth. 
*/ + +LJ_FUNC void LJ_FASTCALL lj_serialize_dict_prep_str(lua_State *L, GCtab *dict); +LJ_FUNC void LJ_FASTCALL lj_serialize_dict_prep_mt(lua_State *L, GCtab *dict); +LJ_FUNC SBufExt * LJ_FASTCALL lj_serialize_put(SBufExt *sbx, cTValue *o); +LJ_FUNC char * LJ_FASTCALL lj_serialize_get(SBufExt *sbx, TValue *o); +LJ_FUNC GCstr * LJ_FASTCALL lj_serialize_encode(lua_State *L, cTValue *o); +LJ_FUNC void lj_serialize_decode(lua_State *L, TValue *o, GCstr *str); +#if LJ_HASJIT +LJ_FUNC MSize LJ_FASTCALL lj_serialize_peektype(SBufExt *sbx); +#endif + +#endif + +#endif diff --cc src/lj_str.c index a5282da6,7242a8e0..cfdaec6f --- a/src/lj_str.c +++ b/src/lj_str.c @@@ -1,8 -1,13 +1,8 @@@ /* ** String handling. - ** Copyright (C) 2005-2022 Mike Pall. See Copyright Notice in luajit.h + ** Copyright (C) 2005-2023 Mike Pall. See Copyright Notice in luajit.h -** -** Portions taken verbatim or adapted from the Lua interpreter. -** Copyright (C) 1994-2008 Lua.org, PUC-Rio. See Copyright Notice in lua.h */ -#include - #define lj_str_c #define LUA_CORE diff --cc src/lj_strfmt.c index 71ee9f62,00000000..909255db mode 100644,000000..100644 --- a/src/lj_strfmt.c +++ b/src/lj_strfmt.c @@@ -1,606 -1,0 +1,606 @@@ +/* +** String formatting. - ** Copyright (C) 2005-2022 Mike Pall. See Copyright Notice in luajit.h ++** Copyright (C) 2005-2023 Mike Pall. 
See Copyright Notice in luajit.h +*/ + +#include + +#define lj_strfmt_c +#define LUA_CORE + +#include "lj_obj.h" +#include "lj_err.h" +#include "lj_buf.h" +#include "lj_str.h" +#include "lj_meta.h" +#include "lj_state.h" +#include "lj_char.h" +#include "lj_strfmt.h" +#if LJ_HASFFI +#include "lj_ctype.h" +#endif +#include "lj_lib.h" + +/* -- Format parser ------------------------------------------------------- */ + +static const uint8_t strfmt_map[('x'-'A')+1] = { + STRFMT_A,0,0,0,STRFMT_E,STRFMT_F,STRFMT_G,0,0,0,0,0,0, + 0,0,0,0,0,0,0,0,0,0,STRFMT_X,0,0, + 0,0,0,0,0,0, + STRFMT_A,0,STRFMT_C,STRFMT_D,STRFMT_E,STRFMT_F,STRFMT_G,0,STRFMT_I,0,0,0,0, + 0,STRFMT_O,STRFMT_P,STRFMT_Q,0,STRFMT_S,0,STRFMT_U,0,0,STRFMT_X +}; + +SFormat LJ_FASTCALL lj_strfmt_parse(FormatState *fs) +{ + const uint8_t *p = fs->p, *e = fs->e; + fs->str = (const char *)p; + for (; p < e; p++) { + if (*p == '%') { /* Escape char? */ + if (p[1] == '%') { /* '%%'? */ + fs->p = ++p+1; + goto retlit; + } else { + SFormat sf = 0; + uint32_t c; + if (p != (const uint8_t *)fs->str) + break; + for (p++; (uint32_t)*p - ' ' <= (uint32_t)('0' - ' '); p++) { + /* Parse flags. */ + if (*p == '-') sf |= STRFMT_F_LEFT; + else if (*p == '+') sf |= STRFMT_F_PLUS; + else if (*p == '0') sf |= STRFMT_F_ZERO; + else if (*p == ' ') sf |= STRFMT_F_SPACE; + else if (*p == '#') sf |= STRFMT_F_ALT; + else break; + } + if ((uint32_t)*p - '0' < 10) { /* Parse width. */ + uint32_t width = (uint32_t)*p++ - '0'; + if ((uint32_t)*p - '0' < 10) + width = (uint32_t)*p++ - '0' + width*10; + sf |= (width << STRFMT_SH_WIDTH); + } + if (*p == '.') { /* Parse precision. */ + uint32_t prec = 0; + p++; + if ((uint32_t)*p - '0' < 10) { + prec = (uint32_t)*p++ - '0'; + if ((uint32_t)*p - '0' < 10) + prec = (uint32_t)*p++ - '0' + prec*10; + } + sf |= ((prec+1) << STRFMT_SH_PREC); + } + /* Parse conversion. 
*/ + c = (uint32_t)*p - 'A'; + if (LJ_LIKELY(c <= (uint32_t)('x' - 'A'))) { + uint32_t sx = strfmt_map[c]; + if (sx) { + fs->p = p+1; + return (sf | sx | ((c & 0x20) ? 0 : STRFMT_F_UPPER)); + } + } + /* Return error location. */ + if (*p >= 32) p++; + fs->len = (MSize)(p - (const uint8_t *)fs->str); + fs->p = fs->e; + return STRFMT_ERR; + } + } + } + fs->p = p; +retlit: + fs->len = (MSize)(p - (const uint8_t *)fs->str); + return fs->len ? STRFMT_LIT : STRFMT_EOF; +} + +/* -- Raw conversions ----------------------------------------------------- */ + +#define WINT_R(x, sh, sc) \ + { uint32_t d = (x*(((1<<sh)+sc-1)/sc))>>sh; x -= d*sc; *p++ = (char)('0'+d); } + +/* Write integer to buffer. */ +char * LJ_FASTCALL lj_strfmt_wint(char *p, int32_t k) +{ + uint32_t u = (uint32_t)k; + if (k < 0) { u = ~u+1u; *p++ = '-'; } + if (u < 10000) { + if (u < 10) goto dig1; + if (u < 100) goto dig2; + if (u < 1000) goto dig3; + } else { + uint32_t v = u / 10000; u -= v * 10000; + if (v < 10000) { + if (v < 10) goto dig5; + if (v < 100) goto dig6; + if (v < 1000) goto dig7; + } else { + uint32_t w = v / 10000; v -= w * 10000; + if (w >= 10) WINT_R(w, 10, 10) + *p++ = (char)('0'+w); + } + WINT_R(v, 23, 1000) + dig7: WINT_R(v, 12, 100) + dig6: WINT_R(v, 10, 10) + dig5: *p++ = (char)('0'+v); + } + WINT_R(u, 23, 1000) + dig3: WINT_R(u, 12, 100) + dig2: WINT_R(u, 10, 10) + dig1: *p++ = (char)('0'+u); + return p; +} +#undef WINT_R + +/* Write pointer to buffer. */ +char * LJ_FASTCALL lj_strfmt_wptr(char *p, const void *v) +{ + ptrdiff_t x = (ptrdiff_t)v; + MSize i, n = STRFMT_MAXBUF_PTR; + if (x == 0) { + *p++ = 'N'; *p++ = 'U'; *p++ = 'L'; *p++ = 'L'; + return p; + } +#if LJ_64 + /* Shorten output for 64 bit pointers. */ + n = 2+2*4+((x >> 32) ? 2+2*(lj_fls((uint32_t)(x >> 32))>>3) : 0); +#endif + p[0] = '0'; + p[1] = 'x'; + for (i = n-1; i >= 2; i--, x >>= 4) + p[i] = "0123456789abcdef"[(x & 15)]; + return p+n; +} + +/* Write ULEB128 to buffer.
*/ +char * LJ_FASTCALL lj_strfmt_wuleb128(char *p, uint32_t v) +{ + for (; v >= 0x80; v >>= 7) + *p++ = (char)((v & 0x7f) | 0x80); + *p++ = (char)v; + return p; +} + +/* Return string or write number to tmp buffer and return pointer to start. */ +const char *lj_strfmt_wstrnum(lua_State *L, cTValue *o, MSize *lenp) +{ + SBuf *sb; + if (tvisstr(o)) { + *lenp = strV(o)->len; + return strVdata(o); + } else if (tvisbuf(o)) { + SBufExt *sbx = bufV(o); + *lenp = sbufxlen(sbx); + return sbx->r; + } else if (tvisint(o)) { + sb = lj_strfmt_putint(lj_buf_tmp_(L), intV(o)); + } else if (tvisnum(o)) { + sb = lj_strfmt_putfnum(lj_buf_tmp_(L), STRFMT_G14, o->n); + } else { + return NULL; + } + *lenp = sbuflen(sb); + return sb->b; +} + +/* -- Unformatted conversions to buffer ----------------------------------- */ + +/* Add integer to buffer. */ +SBuf * LJ_FASTCALL lj_strfmt_putint(SBuf *sb, int32_t k) +{ + sb->w = lj_strfmt_wint(lj_buf_more(sb, STRFMT_MAXBUF_INT), k); + return sb; +} + +#if LJ_HASJIT +/* Add number to buffer. */ +SBuf * LJ_FASTCALL lj_strfmt_putnum(SBuf *sb, cTValue *o) +{ + return lj_strfmt_putfnum(sb, STRFMT_G14, o->n); +} +#endif + +SBuf * LJ_FASTCALL lj_strfmt_putptr(SBuf *sb, const void *v) +{ + sb->w = lj_strfmt_wptr(lj_buf_more(sb, STRFMT_MAXBUF_PTR), v); + return sb; +} + +/* Add quoted string to buffer. */ +static SBuf *strfmt_putquotedlen(SBuf *sb, const char *s, MSize len) +{ + lj_buf_putb(sb, '"'); + while (len--) { + uint32_t c = (uint32_t)(uint8_t)*s++; + char *w = lj_buf_more(sb, 4); + if (c == '"' || c == '\\' || c == '\n') { + *w++ = '\\'; + } else if (lj_char_iscntrl(c)) { /* This can only be 0-31 or 127. 
*/ + uint32_t d; + *w++ = '\\'; + if (c >= 100 || lj_char_isdigit((uint8_t)*s)) { + *w++ = (char)('0'+(c >= 100)); if (c >= 100) c -= 100; + goto tens; + } else if (c >= 10) { + tens: + d = (c * 205) >> 11; c -= d * 10; *w++ = (char)('0'+d); + } + c += '0'; + } + *w++ = (char)c; + sb->w = w; + } + lj_buf_putb(sb, '"'); + return sb; +} + +#if LJ_HASJIT +SBuf * LJ_FASTCALL lj_strfmt_putquoted(SBuf *sb, GCstr *str) +{ + return strfmt_putquotedlen(sb, strdata(str), str->len); +} +#endif + +/* -- Formatted conversions to buffer ------------------------------------- */ + +/* Add formatted char to buffer. */ +SBuf *lj_strfmt_putfchar(SBuf *sb, SFormat sf, int32_t c) +{ + MSize width = STRFMT_WIDTH(sf); + char *w = lj_buf_more(sb, width > 1 ? width : 1); + if ((sf & STRFMT_F_LEFT)) *w++ = (char)c; + while (width-- > 1) *w++ = ' '; + if (!(sf & STRFMT_F_LEFT)) *w++ = (char)c; + sb->w = w; + return sb; +} + +/* Add formatted string to buffer. */ +static SBuf *strfmt_putfstrlen(SBuf *sb, SFormat sf, const char *s, MSize len) +{ + MSize width = STRFMT_WIDTH(sf); + char *w; + if (len > STRFMT_PREC(sf)) len = STRFMT_PREC(sf); + w = lj_buf_more(sb, width > len ? width : len); + if ((sf & STRFMT_F_LEFT)) w = lj_buf_wmem(w, s, len); + while (width-- > len) *w++ = ' '; + if (!(sf & STRFMT_F_LEFT)) w = lj_buf_wmem(w, s, len); + sb->w = w; + return sb; +} + +#if LJ_HASJIT +SBuf *lj_strfmt_putfstr(SBuf *sb, SFormat sf, GCstr *str) +{ + return strfmt_putfstrlen(sb, sf, strdata(str), str->len); +} +#endif + +/* Add formatted signed/unsigned integer to buffer. */ +SBuf *lj_strfmt_putfxint(SBuf *sb, SFormat sf, uint64_t k) +{ + char buf[STRFMT_MAXBUF_XINT], *q = buf + sizeof(buf), *w; +#ifdef LUA_USE_ASSERT + char *ws; +#endif + MSize prefix = 0, len, prec, pprec, width, need; + + /* Figure out signed prefixes. 
*/ + if (STRFMT_TYPE(sf) == STRFMT_INT) { + if ((int64_t)k < 0) { + k = ~k+1u; + prefix = 256 + '-'; + } else if ((sf & STRFMT_F_PLUS)) { + prefix = 256 + '+'; + } else if ((sf & STRFMT_F_SPACE)) { + prefix = 256 + ' '; + } + } + + /* Convert number and store to fixed-size buffer in reverse order. */ + prec = STRFMT_PREC(sf); + if ((int32_t)prec >= 0) sf &= ~STRFMT_F_ZERO; + if (k == 0) { /* Special-case zero argument. */ + if (prec != 0 || + (sf & (STRFMT_T_OCT|STRFMT_F_ALT)) == (STRFMT_T_OCT|STRFMT_F_ALT)) + *--q = '0'; + } else if (!(sf & (STRFMT_T_HEX|STRFMT_T_OCT))) { /* Decimal. */ + uint32_t k2; + while ((k >> 32)) { *--q = (char)('0' + k % 10); k /= 10; } + k2 = (uint32_t)k; + do { *--q = (char)('0' + k2 % 10); k2 /= 10; } while (k2); + } else if ((sf & STRFMT_T_HEX)) { /* Hex. */ + const char *hexdig = (sf & STRFMT_F_UPPER) ? "0123456789ABCDEF" : + "0123456789abcdef"; + do { *--q = hexdig[(k & 15)]; k >>= 4; } while (k); + if ((sf & STRFMT_F_ALT)) prefix = 512 + ((sf & STRFMT_F_UPPER) ? 'X' : 'x'); + } else { /* Octal. */ + do { *--q = (char)('0' + (uint32_t)(k & 7)); k >>= 3; } while (k); + if ((sf & STRFMT_F_ALT)) *--q = '0'; + } + + /* Calculate sizes. */ + len = (MSize)(buf + sizeof(buf) - q); + if ((int32_t)len >= (int32_t)prec) prec = len; + width = STRFMT_WIDTH(sf); + pprec = prec + (prefix >> 8); + need = width > pprec ? width : pprec; + w = lj_buf_more(sb, need); +#ifdef LUA_USE_ASSERT + ws = w; +#endif + + /* Format number with leading/trailing whitespace and zeros. */ + if ((sf & (STRFMT_F_LEFT|STRFMT_F_ZERO)) == 0) + while (width-- > pprec) *w++ = ' '; + if (prefix) { + if ((char)prefix >= 'X') *w++ = '0'; + *w++ = (char)prefix; + } + if ((sf & (STRFMT_F_LEFT|STRFMT_F_ZERO)) == STRFMT_F_ZERO) + while (width-- > pprec) *w++ = '0'; + while (prec-- > len) *w++ = '0'; + while (q < buf + sizeof(buf)) *w++ = *q++; /* Add number itself. 
*/ + if ((sf & STRFMT_F_LEFT)) + while (width-- > pprec) *w++ = ' '; + + lj_assertX(need == (MSize)(w - ws), "miscalculated format size"); + sb->w = w; + return sb; +} + +/* Add number formatted as signed integer to buffer. */ +SBuf *lj_strfmt_putfnum_int(SBuf *sb, SFormat sf, lua_Number n) +{ + int64_t k = (int64_t)n; + if (checki32(k) && sf == STRFMT_INT) + return lj_strfmt_putint(sb, (int32_t)k); /* Shortcut for plain %d. */ + else + return lj_strfmt_putfxint(sb, sf, (uint64_t)k); +} + +/* Add number formatted as unsigned integer to buffer. */ +SBuf *lj_strfmt_putfnum_uint(SBuf *sb, SFormat sf, lua_Number n) +{ + int64_t k; + if (n >= 9223372036854775808.0) + k = (int64_t)(n - 18446744073709551616.0); + else + k = (int64_t)n; + return lj_strfmt_putfxint(sb, sf, (uint64_t)k); +} + +/* Format stack arguments to buffer. */ +int lj_strfmt_putarg(lua_State *L, SBuf *sb, int arg, int retry) +{ + int narg = (int)(L->top - L->base); + GCstr *fmt = lj_lib_checkstr(L, arg); + FormatState fs; + SFormat sf; + lj_strfmt_init(&fs, strdata(fmt), fmt->len); + while ((sf = lj_strfmt_parse(&fs)) != STRFMT_EOF) { + if (sf == STRFMT_LIT) { + lj_buf_putmem(sb, fs.str, fs.len); + } else if (sf == STRFMT_ERR) { + lj_err_callerv(L, LJ_ERR_STRFMT, + strdata(lj_str_new(L, fs.str, fs.len))); + } else { + TValue *o = &L->base[arg++]; + if (arg > narg) + lj_err_arg(L, arg, LJ_ERR_NOVAL); + switch (STRFMT_TYPE(sf)) { + case STRFMT_INT: + if (tvisint(o)) { + int32_t k = intV(o); + if (sf == STRFMT_INT) + lj_strfmt_putint(sb, k); /* Shortcut for plain %d. 
*/ + else + lj_strfmt_putfxint(sb, sf, k); + break; + } +#if LJ_HASFFI + if (tviscdata(o)) { + GCcdata *cd = cdataV(o); + if (cd->ctypeid == CTID_INT64 || cd->ctypeid == CTID_UINT64) { + lj_strfmt_putfxint(sb, sf, *(uint64_t *)cdataptr(cd)); + break; + } + } +#endif + lj_strfmt_putfnum_int(sb, sf, lj_lib_checknum(L, arg)); + break; + case STRFMT_UINT: + if (tvisint(o)) { + lj_strfmt_putfxint(sb, sf, intV(o)); + break; + } +#if LJ_HASFFI + if (tviscdata(o)) { + GCcdata *cd = cdataV(o); + if (cd->ctypeid == CTID_INT64 || cd->ctypeid == CTID_UINT64) { + lj_strfmt_putfxint(sb, sf, *(uint64_t *)cdataptr(cd)); + break; + } + } +#endif + lj_strfmt_putfnum_uint(sb, sf, lj_lib_checknum(L, arg)); + break; + case STRFMT_NUM: + lj_strfmt_putfnum(sb, sf, lj_lib_checknum(L, arg)); + break; + case STRFMT_STR: { + MSize len; + const char *s; + cTValue *mo; + if (LJ_UNLIKELY(!tvisstr(o) && !tvisbuf(o)) && retry >= 0 && + !tvisnil(mo = lj_meta_lookup(L, o, MM_tostring))) { + /* Call __tostring metamethod once. */ + copyTV(L, L->top++, mo); + copyTV(L, L->top++, o); + lua_call(L, 1, 1); + o = &L->base[arg-1]; /* Stack may have been reallocated. */ + copyTV(L, o, --L->top); /* Replace inline for retry. */ + if (retry < 2) { /* Global buffer may have been overwritten. */ + retry = 1; + break; + } + } + if (LJ_LIKELY(tvisstr(o))) { + len = strV(o)->len; + s = strVdata(o); +#if LJ_HASBUFFER + } else if (tvisbuf(o)) { + SBufExt *sbx = bufV(o); + if (sbx == (SBufExt *)sb) lj_err_arg(L, arg+1, LJ_ERR_BUFFER_SELF); + len = sbufxlen(sbx); + s = sbx->r; +#endif + } else { + GCstr *str = lj_strfmt_obj(L, o); + len = str->len; + s = strdata(str); + } + if ((sf & STRFMT_T_QUOTED)) + strfmt_putquotedlen(sb, s, len); /* No formatting. */ + else + strfmt_putfstrlen(sb, sf, s, len); + break; + } + case STRFMT_CHAR: + lj_strfmt_putfchar(sb, sf, lj_lib_checkint(L, arg)); + break; + case STRFMT_PTR: /* No formatting. 
*/ + lj_strfmt_putptr(sb, lj_obj_ptr(G(L), o)); + break; + default: + lj_assertL(0, "bad string format type"); + break; + } + } + } + return retry; +} + +/* -- Conversions to strings ---------------------------------------------- */ + +/* Convert integer to string. */ +GCstr * LJ_FASTCALL lj_strfmt_int(lua_State *L, int32_t k) +{ + char buf[STRFMT_MAXBUF_INT]; + MSize len = (MSize)(lj_strfmt_wint(buf, k) - buf); + return lj_str_new(L, buf, len); +} + +/* Convert integer or number to string. */ +GCstr * LJ_FASTCALL lj_strfmt_number(lua_State *L, cTValue *o) +{ + return tvisint(o) ? lj_strfmt_int(L, intV(o)) : lj_strfmt_num(L, o); +} + +#if LJ_HASJIT +/* Convert char value to string. */ +GCstr * LJ_FASTCALL lj_strfmt_char(lua_State *L, int c) +{ + char buf[1]; + buf[0] = c; + return lj_str_new(L, buf, 1); +} +#endif + +/* Raw conversion of object to string. */ +GCstr * LJ_FASTCALL lj_strfmt_obj(lua_State *L, cTValue *o) +{ + if (tvisstr(o)) { + return strV(o); + } else if (tvisnumber(o)) { + return lj_strfmt_number(L, o); + } else if (tvisnil(o)) { + return lj_str_newlit(L, "nil"); + } else if (tvisfalse(o)) { + return lj_str_newlit(L, "false"); + } else if (tvistrue(o)) { + return lj_str_newlit(L, "true"); + } else { + char buf[8+2+2+16], *p = buf; + p = lj_buf_wmem(p, lj_typename(o), (MSize)strlen(lj_typename(o))); + *p++ = ':'; *p++ = ' '; + if (tvisfunc(o) && isffunc(funcV(o))) { + p = lj_buf_wmem(p, "builtin#", 8); + p = lj_strfmt_wint(p, funcV(o)->c.ffid); + } else { + p = lj_strfmt_wptr(p, lj_obj_ptr(G(L), o)); + } + return lj_str_new(L, buf, (size_t)(p - buf)); + } +} + +/* -- Internal string formatting ------------------------------------------ */ + +/* +** These functions are only used for lua_pushfstring(), lua_pushvfstring() +** and for internal string formatting (e.g. error messages). Caveat: unlike +** string.format(), only a limited subset of formats and flags are supported! 
+** +** LuaJIT has support for a couple more formats than Lua 5.1/5.2: +** - %d %u %o %x with full formatting, 32 bit integers only. +** - %f and other FP formats are really %.14g. +** - %s %c %p without formatting. +*/ + +/* Push formatted message as a string object to Lua stack. va_list variant. */ +const char *lj_strfmt_pushvf(lua_State *L, const char *fmt, va_list argp) +{ + SBuf *sb = lj_buf_tmp_(L); + FormatState fs; + SFormat sf; + GCstr *str; + lj_strfmt_init(&fs, fmt, (MSize)strlen(fmt)); + while ((sf = lj_strfmt_parse(&fs)) != STRFMT_EOF) { + switch (STRFMT_TYPE(sf)) { + case STRFMT_LIT: + lj_buf_putmem(sb, fs.str, fs.len); + break; + case STRFMT_INT: + lj_strfmt_putfxint(sb, sf, va_arg(argp, int32_t)); + break; + case STRFMT_UINT: + lj_strfmt_putfxint(sb, sf, va_arg(argp, uint32_t)); + break; + case STRFMT_NUM: + lj_strfmt_putfnum(sb, STRFMT_G14, va_arg(argp, lua_Number)); + break; + case STRFMT_STR: { + const char *s = va_arg(argp, char *); + if (s == NULL) s = "(null)"; + lj_buf_putmem(sb, s, (MSize)strlen(s)); + break; + } + case STRFMT_CHAR: + lj_buf_putb(sb, va_arg(argp, int)); + break; + case STRFMT_PTR: + lj_strfmt_putptr(sb, va_arg(argp, void *)); + break; + case STRFMT_ERR: + default: + lj_buf_putb(sb, '?'); + lj_assertL(0, "bad string format near offset %d", fs.len); + break; + } + } + str = lj_buf_str(L, sb); + setstrV(L, L->top, str); + incr_top(L); + return strdata(str); +} + +/* Push formatted message as a string object to Lua stack. Vararg variant. */ +const char *lj_strfmt_pushf(lua_State *L, const char *fmt, ...) +{ + const char *msg; + va_list argp; + va_start(argp, fmt); + msg = lj_strfmt_pushvf(L, fmt, argp); + va_end(argp); + return msg; +} + diff --cc src/lj_strfmt.h index a4529604,00000000..bd17896e mode 100644,000000..100644 --- a/src/lj_strfmt.h +++ b/src/lj_strfmt.h @@@ -1,131 -1,0 +1,131 @@@ +/* +** String formatting. - ** Copyright (C) 2005-2022 Mike Pall. 
See Copyright Notice in luajit.h ++** Copyright (C) 2005-2023 Mike Pall. See Copyright Notice in luajit.h +*/ + +#ifndef _LJ_STRFMT_H +#define _LJ_STRFMT_H + +#include "lj_obj.h" + +typedef uint32_t SFormat; /* Format indicator. */ + +/* Format parser state. */ +typedef struct FormatState { + const uint8_t *p; /* Current format string pointer. */ + const uint8_t *e; /* End of format string. */ + const char *str; /* Returned literal string. */ + MSize len; /* Size of literal string. */ +} FormatState; + +/* Format types (max. 16). */ +typedef enum FormatType { + STRFMT_EOF, STRFMT_ERR, STRFMT_LIT, + STRFMT_INT, STRFMT_UINT, STRFMT_NUM, STRFMT_STR, STRFMT_CHAR, STRFMT_PTR +} FormatType; + +/* Format subtypes (bits are reused). */ +#define STRFMT_T_HEX 0x0010 /* STRFMT_UINT */ +#define STRFMT_T_OCT 0x0020 /* STRFMT_UINT */ +#define STRFMT_T_FP_A 0x0000 /* STRFMT_NUM */ +#define STRFMT_T_FP_E 0x0010 /* STRFMT_NUM */ +#define STRFMT_T_FP_F 0x0020 /* STRFMT_NUM */ +#define STRFMT_T_FP_G 0x0030 /* STRFMT_NUM */ +#define STRFMT_T_QUOTED 0x0010 /* STRFMT_STR */ + +/* Format flags. */ +#define STRFMT_F_LEFT 0x0100 +#define STRFMT_F_PLUS 0x0200 +#define STRFMT_F_ZERO 0x0400 +#define STRFMT_F_SPACE 0x0800 +#define STRFMT_F_ALT 0x1000 +#define STRFMT_F_UPPER 0x2000 + +/* Format indicator fields. */ +#define STRFMT_SH_WIDTH 16 +#define STRFMT_SH_PREC 24 + +#define STRFMT_TYPE(sf) ((FormatType)((sf) & 15)) +#define STRFMT_WIDTH(sf) (((sf) >> STRFMT_SH_WIDTH) & 255u) +#define STRFMT_PREC(sf) ((((sf) >> STRFMT_SH_PREC) & 255u) - 1u) +#define STRFMT_FP(sf) (((sf) >> 4) & 3) + +/* Formats for conversion characters. 
*/ +#define STRFMT_A (STRFMT_NUM|STRFMT_T_FP_A) +#define STRFMT_C (STRFMT_CHAR) +#define STRFMT_D (STRFMT_INT) +#define STRFMT_E (STRFMT_NUM|STRFMT_T_FP_E) +#define STRFMT_F (STRFMT_NUM|STRFMT_T_FP_F) +#define STRFMT_G (STRFMT_NUM|STRFMT_T_FP_G) +#define STRFMT_I STRFMT_D +#define STRFMT_O (STRFMT_UINT|STRFMT_T_OCT) +#define STRFMT_P (STRFMT_PTR) +#define STRFMT_Q (STRFMT_STR|STRFMT_T_QUOTED) +#define STRFMT_S (STRFMT_STR) +#define STRFMT_U (STRFMT_UINT) +#define STRFMT_X (STRFMT_UINT|STRFMT_T_HEX) +#define STRFMT_G14 (STRFMT_G | ((14+1) << STRFMT_SH_PREC)) + +/* Maximum buffer sizes for conversions. */ +#define STRFMT_MAXBUF_XINT (1+22) /* '0' prefix + uint64_t in octal. */ +#define STRFMT_MAXBUF_INT (1+10) /* Sign + int32_t in decimal. */ +#define STRFMT_MAXBUF_NUM 32 /* Must correspond with STRFMT_G14. */ +#define STRFMT_MAXBUF_PTR (2+2*sizeof(ptrdiff_t)) /* "0x" + hex ptr. */ + +/* Format parser. */ +LJ_FUNC SFormat LJ_FASTCALL lj_strfmt_parse(FormatState *fs); + +static LJ_AINLINE void lj_strfmt_init(FormatState *fs, const char *p, MSize len) +{ + fs->p = (const uint8_t *)p; + fs->e = (const uint8_t *)p + len; + /* Must be NUL-terminated. May have NULs inside, too. */ + lj_assertX(*fs->e == 0, "format not NUL-terminated"); +} + +/* Raw conversions. */ +LJ_FUNC char * LJ_FASTCALL lj_strfmt_wint(char *p, int32_t k); +LJ_FUNC char * LJ_FASTCALL lj_strfmt_wptr(char *p, const void *v); +LJ_FUNC char * LJ_FASTCALL lj_strfmt_wuleb128(char *p, uint32_t v); +LJ_FUNC const char *lj_strfmt_wstrnum(lua_State *L, cTValue *o, MSize *lenp); + +/* Unformatted conversions to buffer. */ +LJ_FUNC SBuf * LJ_FASTCALL lj_strfmt_putint(SBuf *sb, int32_t k); +#if LJ_HASJIT +LJ_FUNC SBuf * LJ_FASTCALL lj_strfmt_putnum(SBuf *sb, cTValue *o); +#endif +LJ_FUNC SBuf * LJ_FASTCALL lj_strfmt_putptr(SBuf *sb, const void *v); +#if LJ_HASJIT +LJ_FUNC SBuf * LJ_FASTCALL lj_strfmt_putquoted(SBuf *sb, GCstr *str); +#endif + +/* Formatted conversions to buffer. 
*/ +LJ_FUNC SBuf *lj_strfmt_putfxint(SBuf *sb, SFormat sf, uint64_t k); +LJ_FUNC SBuf *lj_strfmt_putfnum_int(SBuf *sb, SFormat sf, lua_Number n); +LJ_FUNC SBuf *lj_strfmt_putfnum_uint(SBuf *sb, SFormat sf, lua_Number n); +LJ_FUNC SBuf *lj_strfmt_putfnum(SBuf *sb, SFormat, lua_Number n); +LJ_FUNC SBuf *lj_strfmt_putfchar(SBuf *sb, SFormat, int32_t c); +#if LJ_HASJIT +LJ_FUNC SBuf *lj_strfmt_putfstr(SBuf *sb, SFormat, GCstr *str); +#endif +LJ_FUNC int lj_strfmt_putarg(lua_State *L, SBuf *sb, int arg, int retry); + +/* Conversions to strings. */ +LJ_FUNC GCstr * LJ_FASTCALL lj_strfmt_int(lua_State *L, int32_t k); +LJ_FUNCA GCstr * LJ_FASTCALL lj_strfmt_num(lua_State *L, cTValue *o); +LJ_FUNCA GCstr * LJ_FASTCALL lj_strfmt_number(lua_State *L, cTValue *o); +#if LJ_HASJIT +LJ_FUNC GCstr * LJ_FASTCALL lj_strfmt_char(lua_State *L, int c); +#endif +LJ_FUNC GCstr * LJ_FASTCALL lj_strfmt_obj(lua_State *L, cTValue *o); + +/* Internal string formatting. */ +LJ_FUNC const char *lj_strfmt_pushvf(lua_State *L, const char *fmt, + va_list argp); +LJ_FUNC const char *lj_strfmt_pushf(lua_State *L, const char *fmt, ...) +#if defined(__GNUC__) || defined(__clang__) + __attribute__ ((format (printf, 2, 3))) +#endif + ; + +#endif diff --cc src/lj_strfmt_num.c index 3c60695c,00000000..79ec0263 mode 100644,000000..100644 --- a/src/lj_strfmt_num.c +++ b/src/lj_strfmt_num.c @@@ -1,592 -1,0 +1,592 @@@ +/* +** String formatting for floating-point numbers. - ** Copyright (C) 2005-2022 Mike Pall. See Copyright Notice in luajit.h ++** Copyright (C) 2005-2023 Mike Pall. See Copyright Notice in luajit.h +** Contributed by Peter Cawley. +*/ + +#include + +#define lj_strfmt_num_c +#define LUA_CORE + +#include "lj_obj.h" +#include "lj_buf.h" +#include "lj_str.h" +#include "lj_strfmt.h" + +/* -- Precomputed tables -------------------------------------------------- */ + +/* Rescale factors to push the exponent of a number towards zero. 
*/ +#define RESCALE_EXPONENTS(P, N) \ + P(308), P(289), P(270), P(250), P(231), P(212), P(193), P(173), P(154), \ + P(135), P(115), P(96), P(77), P(58), P(38), P(0), P(0), P(0), N(39), N(58), \ + N(77), N(96), N(116), N(135), N(154), N(174), N(193), N(212), N(231), \ + N(251), N(270), N(289) + +#define ONE_E_P(X) 1e+0 ## X +#define ONE_E_N(X) 1e-0 ## X +static const int16_t rescale_e[] = { RESCALE_EXPONENTS(-, +) }; +static const double rescale_n[] = { RESCALE_EXPONENTS(ONE_E_P, ONE_E_N) }; +#undef ONE_E_N +#undef ONE_E_P + +/* +** For p in range -70 through 57, this table encodes pairs (m, e) such that +** 4*2^p <= (uint8_t)m*10^e, and is the smallest value for which this holds. +*/ +static const int8_t four_ulp_m_e[] = { + 34, -21, 68, -21, 14, -20, 28, -20, 55, -20, 2, -19, 3, -19, 5, -19, 9, -19, + -82, -18, 35, -18, 7, -17, -117, -17, 28, -17, 56, -17, 112, -16, -33, -16, + 45, -16, 89, -16, -78, -15, 36, -15, 72, -15, -113, -14, 29, -14, 57, -14, + 114, -13, -28, -13, 46, -13, 91, -12, -74, -12, 37, -12, 73, -12, 15, -11, 3, + -11, 59, -11, 2, -10, 3, -10, 5, -10, 1, -9, -69, -9, 38, -9, 75, -9, 15, -7, + 3, -7, 6, -7, 12, -6, -17, -7, 48, -7, 96, -7, -65, -6, 39, -6, 77, -6, -103, + -5, 31, -5, 62, -5, 123, -4, -11, -4, 49, -4, 98, -4, -60, -3, 4, -2, 79, -3, + 16, -2, 32, -2, 63, -2, 2, -1, 25, 0, 5, 1, 1, 2, 2, 2, 4, 2, 8, 2, 16, 2, + 32, 2, 64, 2, -128, 2, 26, 2, 52, 2, 103, 3, -51, 3, 41, 4, 82, 4, -92, 4, + 33, 4, 66, 4, -124, 5, 27, 5, 53, 5, 105, 6, 21, 6, 42, 6, 84, 6, 17, 7, 34, + 7, 68, 7, 2, 8, 3, 8, 6, 8, 108, 9, -41, 9, 43, 10, 86, 9, -84, 10, 35, 10, + 69, 10, -118, 11, 28, 11, 55, 12, 11, 13, 22, 13, 44, 13, 88, 13, -80, 13, + 36, 13, 71, 13, -115, 14, 29, 14, 57, 14, 113, 15, -30, 15, 46, 15, 91, 15, + 19, 16, 37, 16, 73, 16, 2, 17, 3, 17, 6, 17 +}; + +/* min(2^32-1, 10^e-1) for e in range 0 through 10 */ +static uint32_t ndigits_dec_threshold[] = { + 0, 9U, 99U, 999U, 9999U, 99999U, 999999U, + 9999999U, 99999999U, 999999999U, 0xffffffffU +}; 
+ +/* -- Helper functions ---------------------------------------------------- */ + +/* Compute the number of digits in the decimal representation of x. */ +static MSize ndigits_dec(uint32_t x) +{ + MSize t = ((lj_fls(x | 1) * 77) >> 8) + 1; /* 2^8/77 is roughly log2(10) */ + return t + (x > ndigits_dec_threshold[t]); +} + +#define WINT_R(x, sh, sc) \ + { uint32_t d = (x*(((1<<sh)+sc-1)/sc))>>sh; x -= d*sc; *p++ = (char)('0'+d); } + +/* Write 9-digit unsigned integer to buffer. */ +static char *lj_strfmt_wuint9(char *p, uint32_t u) +{ + uint32_t v = u / 10000, w; + u -= v * 10000; + w = v / 10000; + v -= w * 10000; + *p++ = (char)('0'+w); + WINT_R(v, 23, 1000) + WINT_R(v, 12, 100) + WINT_R(v, 10, 10) + *p++ = (char)('0'+v); + WINT_R(u, 23, 1000) + WINT_R(u, 12, 100) + WINT_R(u, 10, 10) + *p++ = (char)('0'+u); + return p; +} +#undef WINT_R + +/* -- Extended precision arithmetic --------------------------------------- */ + +/* +** The "nd" format is a fixed-precision decimal representation for numbers. It +** consists of up to 64 uint32_t values, with each uint32_t storing a value +** in the range [0, 1e9). A number in "nd" format consists of three variables: +** +** uint32_t nd[64]; +** uint32_t ndlo; +** uint32_t ndhi; +** +** The integral part of the number is stored in nd[0 ... ndhi], the value of +** which is sum{i in [0, ndhi] | nd[i] * 10^(9*i)}. If the fractional part of +** the number is zero, ndlo is zero. Otherwise, the fractional part is stored +** in nd[ndlo ... 63], the value of which is taken to be +** sum{i in [ndlo, 63] | nd[i] * 10^(9*(i-64))}. +** +** If the array part had 128 elements rather than 64, then every double would +** have an exact representation in "nd" format. With 64 elements, all integral +** doubles have an exact representation, and all non-integral doubles have +** enough digits to make both %.99e and %.99f do the right thing.
+*/ + +#if LJ_64 +#define ND_MUL2K_MAX_SHIFT 29 +#define ND_MUL2K_DIV1E9(val) ((uint32_t)((val) / 1000000000)) +#else +#define ND_MUL2K_MAX_SHIFT 11 +#define ND_MUL2K_DIV1E9(val) ((uint32_t)((val) >> 9) / 1953125) +#endif + +/* Multiply nd by 2^k and add carry_in (ndlo is assumed to be zero). */ +static uint32_t nd_mul2k(uint32_t* nd, uint32_t ndhi, uint32_t k, + uint32_t carry_in, SFormat sf) +{ + uint32_t i, ndlo = 0, start = 1; + /* Performance hacks. */ + if (k > ND_MUL2K_MAX_SHIFT*2 && STRFMT_FP(sf) != STRFMT_FP(STRFMT_T_FP_F)) { + start = ndhi - (STRFMT_PREC(sf) + 17) / 8; + } + /* Real logic. */ + while (k >= ND_MUL2K_MAX_SHIFT) { + for (i = ndlo; i <= ndhi; i++) { + uint64_t val = ((uint64_t)nd[i] << ND_MUL2K_MAX_SHIFT) | carry_in; + carry_in = ND_MUL2K_DIV1E9(val); + nd[i] = (uint32_t)val - carry_in * 1000000000; + } + if (carry_in) { + nd[++ndhi] = carry_in; carry_in = 0; + if (start++ == ndlo) ++ndlo; + } + k -= ND_MUL2K_MAX_SHIFT; + } + if (k) { + for (i = ndlo; i <= ndhi; i++) { + uint64_t val = ((uint64_t)nd[i] << k) | carry_in; + carry_in = ND_MUL2K_DIV1E9(val); + nd[i] = (uint32_t)val - carry_in * 1000000000; + } + if (carry_in) nd[++ndhi] = carry_in; + } + return ndhi; +} + +/* Divide nd by 2^k (ndlo is assumed to be zero). */ +static uint32_t nd_div2k(uint32_t* nd, uint32_t ndhi, uint32_t k, SFormat sf) +{ + uint32_t ndlo = 0, stop1 = ~0, stop2 = ~0; + /* Performance hacks. */ + if (!ndhi) { + if (!nd[0]) { + return 0; + } else { + uint32_t s = lj_ffs(nd[0]); + if (s >= k) { nd[0] >>= k; return 0; } + nd[0] >>= s; k -= s; + } + } + if (k > 18) { + if (STRFMT_FP(sf) == STRFMT_FP(STRFMT_T_FP_F)) { + stop1 = 63 - (int32_t)STRFMT_PREC(sf) / 9; + } else { + int32_t floorlog2 = ndhi * 29 + lj_fls(nd[ndhi]) - k; + int32_t floorlog10 = (int32_t)(floorlog2 * 0.30102999566398114); + stop1 = 62 + (floorlog10 - (int32_t)STRFMT_PREC(sf)) / 9; + stop2 = 61 + ndhi - (int32_t)STRFMT_PREC(sf) / 8; + } + } + /* Real logic. 
*/ + while (k >= 9) { + uint32_t i = ndhi, carry = 0; + for (;;) { + uint32_t val = nd[i]; + nd[i] = (val >> 9) + carry; + carry = (val & 0x1ff) * 1953125; + if (i == ndlo) break; + i = (i - 1) & 0x3f; + } + if (ndlo != stop1 && ndlo != stop2) { + if (carry) { ndlo = (ndlo - 1) & 0x3f; nd[ndlo] = carry; } + if (!nd[ndhi]) { ndhi = (ndhi - 1) & 0x3f; stop2--; } + } else if (!nd[ndhi]) { + if (ndhi != ndlo) { ndhi = (ndhi - 1) & 0x3f; stop2--; } + else return ndlo; + } + k -= 9; + } + if (k) { + uint32_t mask = (1U << k) - 1, mul = 1000000000 >> k, i = ndhi, carry = 0; + for (;;) { + uint32_t val = nd[i]; + nd[i] = (val >> k) + carry; + carry = (val & mask) * mul; + if (i == ndlo) break; + i = (i - 1) & 0x3f; + } + if (carry) { ndlo = (ndlo - 1) & 0x3f; nd[ndlo] = carry; } + } + return ndlo; +} + +/* Add m*10^e to nd (assumes ndlo <= e/9 <= ndhi and 0 <= m <= 9). */ +static uint32_t nd_add_m10e(uint32_t* nd, uint32_t ndhi, uint8_t m, int32_t e) +{ + uint32_t i, carry; + if (e >= 0) { + i = (uint32_t)e/9; + carry = m * (ndigits_dec_threshold[e - (int32_t)i*9] + 1); + } else { + int32_t f = (e-8)/9; + i = (uint32_t)(64 + f); + carry = m * (ndigits_dec_threshold[e - f*9] + 1); + } + for (;;) { + uint32_t val = nd[i] + carry; + if (LJ_UNLIKELY(val >= 1000000000)) { + val -= 1000000000; + nd[i] = val; + if (LJ_UNLIKELY(i == ndhi)) { + ndhi = (ndhi + 1) & 0x3f; + nd[ndhi] = 1; + break; + } + carry = 1; + i = (i + 1) & 0x3f; + } else { + nd[i] = val; + break; + } + } + return ndhi; +} + +/* Test whether two "nd" values are equal in their most significant digits. 
*/ +static int nd_similar(uint32_t* nd, uint32_t ndhi, uint32_t* ref, MSize hilen, + MSize prec) +{ + char nd9[9], ref9[9]; + if (hilen <= prec) { + if (LJ_UNLIKELY(nd[ndhi] != *ref)) return 0; + prec -= hilen; ref--; ndhi = (ndhi - 1) & 0x3f; + if (prec >= 9) { + if (LJ_UNLIKELY(nd[ndhi] != *ref)) return 0; + prec -= 9; ref--; ndhi = (ndhi - 1) & 0x3f; + } + } else { + prec -= hilen - 9; + } + lj_assertX(prec < 9, "bad precision %d", prec); + lj_strfmt_wuint9(nd9, nd[ndhi]); + lj_strfmt_wuint9(ref9, *ref); + return !memcmp(nd9, ref9, prec) && (nd9[prec] < '5') == (ref9[prec] < '5'); +} + +/* -- Formatted conversions to buffer ------------------------------------- */ + +/* Write formatted floating-point number to either sb or p. */ +static char *lj_strfmt_wfnum(SBuf *sb, SFormat sf, lua_Number n, char *p) +{ + MSize width = STRFMT_WIDTH(sf), prec = STRFMT_PREC(sf), len; + TValue t; + t.n = n; + if (LJ_UNLIKELY((t.u32.hi << 1) >= 0xffe00000)) { + /* Handle non-finite values uniformly for %a, %e, %f, %g. */ + int prefix = 0, ch = (sf & STRFMT_F_UPPER) ? 0x202020 : 0; + if (((t.u32.hi & 0x000fffff) | t.u32.lo) != 0) { + ch ^= ('n' << 16) | ('a' << 8) | 'n'; + if ((sf & STRFMT_F_SPACE)) prefix = ' '; + } else { + ch ^= ('i' << 16) | ('n' << 8) | 'f'; + if ((t.u32.hi & 0x80000000)) prefix = '-'; + else if ((sf & STRFMT_F_PLUS)) prefix = '+'; + else if ((sf & STRFMT_F_SPACE)) prefix = ' '; + } + len = 3 + (prefix != 0); + if (!p) p = lj_buf_more(sb, width > len ? width : len); + if (!(sf & STRFMT_F_LEFT)) while (width-- > len) *p++ = ' '; + if (prefix) *p++ = prefix; + *p++ = (char)(ch >> 16); *p++ = (char)(ch >> 8); *p++ = (char)ch; + } else if (STRFMT_FP(sf) == STRFMT_FP(STRFMT_T_FP_A)) { + /* %a */ + const char *hexdig = (sf & STRFMT_F_UPPER) ? 
"0123456789ABCDEFPX" + : "0123456789abcdefpx"; + int32_t e = (t.u32.hi >> 20) & 0x7ff; + char prefix = 0, eprefix = '+'; + if (t.u32.hi & 0x80000000) prefix = '-'; + else if ((sf & STRFMT_F_PLUS)) prefix = '+'; + else if ((sf & STRFMT_F_SPACE)) prefix = ' '; + t.u32.hi &= 0xfffff; + if (e) { + t.u32.hi |= 0x100000; + e -= 1023; + } else if (t.u32.lo | t.u32.hi) { + /* Non-zero denormal - normalise it. */ + uint32_t shift = t.u32.hi ? 20-lj_fls(t.u32.hi) : 52-lj_fls(t.u32.lo); + e = -1022 - shift; + t.u64 <<= shift; + } + /* abs(n) == t.u64 * 2^(e - 52) */ + /* If n != 0, bit 52 of t.u64 is set, and is the highest set bit. */ + if ((int32_t)prec < 0) { + /* Default precision: use smallest precision giving exact result. */ + prec = t.u32.lo ? 13-lj_ffs(t.u32.lo)/4 : 5-lj_ffs(t.u32.hi|0x100000)/4; + } else if (prec < 13) { + /* Precision is sufficiently low as to maybe require rounding. */ + t.u64 += (((uint64_t)1) << (51 - prec*4)); + } + if (e < 0) { + eprefix = '-'; + e = -e; + } + len = 5 + ndigits_dec((uint32_t)e) + prec + (prefix != 0) + + ((prec | (sf & STRFMT_F_ALT)) != 0); + if (!p) p = lj_buf_more(sb, width > len ? width : len); + if (!(sf & (STRFMT_F_LEFT | STRFMT_F_ZERO))) { + while (width-- > len) *p++ = ' '; + } + if (prefix) *p++ = prefix; + *p++ = '0'; + *p++ = hexdig[17]; /* x or X */ + if ((sf & (STRFMT_F_LEFT | STRFMT_F_ZERO)) == STRFMT_F_ZERO) { + while (width-- > len) *p++ = '0'; + } + *p++ = '0' + (t.u32.hi >> 20); /* Usually '1', sometimes '0' or '2'. */ + if ((prec | (sf & STRFMT_F_ALT))) { + /* Emit fractional part. */ + char *q = p + 1 + prec; + *p = '.'; + if (prec < 13) t.u64 >>= (52 - prec*4); + else while (prec > 13) p[prec--] = '0'; + while (prec) { p[prec--] = hexdig[t.u64 & 15]; t.u64 >>= 4; } + p = q; + } + *p++ = hexdig[16]; /* p or P */ + *p++ = eprefix; /* + or - */ + p = lj_strfmt_wint(p, e); + } else { + /* %e or %f or %g - begin by converting n to "nd" format. 
*/ + uint32_t nd[64]; + uint32_t ndhi = 0, ndlo, i; + int32_t e = (t.u32.hi >> 20) & 0x7ff, ndebias = 0; + char prefix = 0, *q; + if (t.u32.hi & 0x80000000) prefix = '-'; + else if ((sf & STRFMT_F_PLUS)) prefix = '+'; + else if ((sf & STRFMT_F_SPACE)) prefix = ' '; + prec += ((int32_t)prec >> 31) & 7; /* Default precision is 6. */ + if (STRFMT_FP(sf) == STRFMT_FP(STRFMT_T_FP_G)) { + /* %g - decrement precision if non-zero (to make it like %e). */ + prec--; + prec ^= (uint32_t)((int32_t)prec >> 31); + } + if ((sf & STRFMT_T_FP_E) && prec < 14 && n != 0) { + /* Precision is sufficiently low that rescaling will probably work. */ + if ((ndebias = rescale_e[e >> 6])) { + t.n = n * rescale_n[e >> 6]; + if (LJ_UNLIKELY(!e)) t.n *= 1e10, ndebias -= 10; + t.u64 -= 2; /* Convert 2ulp below (later we convert 2ulp above). */ + nd[0] = 0x100000 | (t.u32.hi & 0xfffff); + e = ((t.u32.hi >> 20) & 0x7ff) - 1075 - (ND_MUL2K_MAX_SHIFT < 29); + goto load_t_lo; rescale_failed: + t.n = n; + e = (t.u32.hi >> 20) & 0x7ff; + ndebias = ndhi = 0; + } + } + nd[0] = t.u32.hi & 0xfffff; + if (e == 0) e++; else nd[0] |= 0x100000; + e -= 1043; + if (t.u32.lo) { + e -= 32 + (ND_MUL2K_MAX_SHIFT < 29); load_t_lo: +#if ND_MUL2K_MAX_SHIFT >= 29 + nd[0] = (nd[0] << 3) | (t.u32.lo >> 29); + ndhi = nd_mul2k(nd, ndhi, 29, t.u32.lo & 0x1fffffff, sf); +#elif ND_MUL2K_MAX_SHIFT >= 11 + ndhi = nd_mul2k(nd, ndhi, 11, t.u32.lo >> 21, sf); + ndhi = nd_mul2k(nd, ndhi, 11, (t.u32.lo >> 10) & 0x7ff, sf); + ndhi = nd_mul2k(nd, ndhi, 11, (t.u32.lo << 1) & 0x7ff, sf); +#else +#error "ND_MUL2K_MAX_SHIFT too small" +#endif + } + if (e >= 0) { + ndhi = nd_mul2k(nd, ndhi, (uint32_t)e, 0, sf); + ndlo = 0; + } else { + ndlo = nd_div2k(nd, ndhi, (uint32_t)-e, sf); + if (ndhi && !nd[ndhi]) ndhi--; + } + /* abs(n) == nd * 10^ndebias (for slightly loose interpretation of ==) */ + if ((sf & STRFMT_T_FP_E)) { + /* %e or %g - assume %e and start by calculating nd's exponent (nde). 
*/ + char eprefix = '+'; + int32_t nde = -1; + MSize hilen; + if (ndlo && !nd[ndhi]) { + ndhi = 64; do {} while (!nd[--ndhi]); + nde -= 64 * 9; + } + hilen = ndigits_dec(nd[ndhi]); + nde += ndhi * 9 + hilen; + if (ndebias) { + /* + ** Rescaling was performed, but this introduced some error, and might + ** have pushed us across a rounding boundary. We check whether this + ** error affected the result by introducing even more error (2ulp in + ** either direction), and seeing whether a rounding boundary was + ** crossed. Having already converted the -2ulp case, we save off its + ** most significant digits, convert the +2ulp case, and compare them. + */ + int32_t eidx = e + 70 + (ND_MUL2K_MAX_SHIFT < 29) + + (t.u32.lo >= 0xfffffffe && !(~t.u32.hi << 12)); + const int8_t *m_e = four_ulp_m_e + eidx * 2; + lj_assertG_(G(sbufL(sb)), 0 <= eidx && eidx < 128, "bad eidx %d", eidx); + nd[33] = nd[ndhi]; + nd[32] = nd[(ndhi - 1) & 0x3f]; + nd[31] = nd[(ndhi - 2) & 0x3f]; + nd_add_m10e(nd, ndhi, (uint8_t)*m_e, m_e[1]); + if (LJ_UNLIKELY(!nd_similar(nd, ndhi, nd + 33, hilen, prec + 1))) { + goto rescale_failed; + } + } + if ((int32_t)(prec - nde) < (0x3f & -(int32_t)ndlo) * 9) { + /* Precision is sufficiently low as to maybe require rounding. */ + ndhi = nd_add_m10e(nd, ndhi, 5, nde - prec - 1); + nde += (hilen != ndigits_dec(nd[ndhi])); + } + nde += ndebias; + if ((sf & STRFMT_T_FP_F)) { + /* %g */ + if ((int32_t)prec >= nde && nde >= -4) { + if (nde < 0) ndhi = 0; + prec -= nde; + goto g_format_like_f; + } else if (!(sf & STRFMT_F_ALT) && prec && width > 5) { + /* Decrease precision in order to strip trailing zeroes. 
*/ + char tail[9]; + uint32_t maxprec = hilen - 1 + ((ndhi - ndlo) & 0x3f) * 9; + if (prec >= maxprec) prec = maxprec; + else ndlo = (ndhi - (((int32_t)(prec - hilen) + 9) / 9)) & 0x3f; + i = prec - hilen - (((ndhi - ndlo) & 0x3f) * 9) + 10; + lj_strfmt_wuint9(tail, nd[ndlo]); + while (prec && tail[--i] == '0') { + prec--; + if (!i) { + if (ndlo == ndhi) { prec = 0; break; } + lj_strfmt_wuint9(tail, nd[++ndlo]); + i = 9; + } + } + } + } + if (nde < 0) { + /* Make nde non-negative. */ + eprefix = '-'; + nde = -nde; + } + len = 3 + prec + (prefix != 0) + ndigits_dec((uint32_t)nde) + (nde < 10) + + ((prec | (sf & STRFMT_F_ALT)) != 0); + if (!p) p = lj_buf_more(sb, (width > len ? width : len) + 5); + if (!(sf & (STRFMT_F_LEFT | STRFMT_F_ZERO))) { + while (width-- > len) *p++ = ' '; + } + if (prefix) *p++ = prefix; + if ((sf & (STRFMT_F_LEFT | STRFMT_F_ZERO)) == STRFMT_F_ZERO) { + while (width-- > len) *p++ = '0'; + } + q = lj_strfmt_wint(p + 1, nd[ndhi]); + p[0] = p[1]; /* Put leading digit in the correct place. */ + if ((prec | (sf & STRFMT_F_ALT))) { + /* Emit fractional part. */ + p[1] = '.'; p += 2; + prec -= (MSize)(q - p); p = q; /* Account for digits already emitted. */ + /* Then emit chunks of 9 digits (this may emit 8 digits too many). */ + for (i = ndhi; (int32_t)prec > 0 && i != ndlo; prec -= 9) { + i = (i - 1) & 0x3f; + p = lj_strfmt_wuint9(p, nd[i]); + } + if ((sf & STRFMT_T_FP_F) && !(sf & STRFMT_F_ALT)) { + /* %g (and not %#g) - strip trailing zeroes. */ + p += (int32_t)prec & ((int32_t)prec >> 31); + while (p[-1] == '0') p--; + if (p[-1] == '.') p--; + } else { + /* %e (or %#g) - emit trailing zeroes. */ + while ((int32_t)prec > 0) { *p++ = '0'; prec--; } + p += (int32_t)prec; + } + } else { + p++; + } + *p++ = (sf & STRFMT_F_UPPER) ? 'E' : 'e'; + *p++ = eprefix; /* + or - */ + if (nde < 10) *p++ = '0'; /* Always at least two digits of exponent. 
*/ + p = lj_strfmt_wint(p, nde); + } else { + /* %f (or, shortly, %g in %f style) */ + if (prec < (MSize)(0x3f & -(int32_t)ndlo) * 9) { + /* Precision is sufficiently low as to maybe require rounding. */ + ndhi = nd_add_m10e(nd, ndhi, 5, 0 - prec - 1); + } + g_format_like_f: + if ((sf & STRFMT_T_FP_E) && !(sf & STRFMT_F_ALT) && prec && width) { + /* Decrease precision in order to strip trailing zeroes. */ + if (ndlo) { + /* nd has a fractional part; we need to look at its digits. */ + char tail[9]; + uint32_t maxprec = (64 - ndlo) * 9; + if (prec >= maxprec) prec = maxprec; + else ndlo = 64 - (prec + 8) / 9; + i = prec - ((63 - ndlo) * 9); + lj_strfmt_wuint9(tail, nd[ndlo]); + while (prec && tail[--i] == '0') { + prec--; + if (!i) { + if (ndlo == 63) { prec = 0; break; } + lj_strfmt_wuint9(tail, nd[++ndlo]); + i = 9; + } + } + } else { + /* nd has no fractional part, so precision goes straight to zero. */ + prec = 0; + } + } + len = ndhi * 9 + ndigits_dec(nd[ndhi]) + prec + (prefix != 0) + + ((prec | (sf & STRFMT_F_ALT)) != 0); + if (!p) p = lj_buf_more(sb, (width > len ? width : len) + 8); + if (!(sf & (STRFMT_F_LEFT | STRFMT_F_ZERO))) { + while (width-- > len) *p++ = ' '; + } + if (prefix) *p++ = prefix; + if ((sf & (STRFMT_F_LEFT | STRFMT_F_ZERO)) == STRFMT_F_ZERO) { + while (width-- > len) *p++ = '0'; + } + /* Emit integer part. */ + p = lj_strfmt_wint(p, nd[ndhi]); + i = ndhi; + while (i) p = lj_strfmt_wuint9(p, nd[--i]); + if ((prec | (sf & STRFMT_F_ALT))) { + /* Emit fractional part. */ + *p++ = '.'; + /* Emit chunks of 9 digits (this may emit 8 digits too many). */ + while ((int32_t)prec > 0 && i != ndlo) { + i = (i - 1) & 0x3f; + p = lj_strfmt_wuint9(p, nd[i]); + prec -= 9; + } + if ((sf & STRFMT_T_FP_E) && !(sf & STRFMT_F_ALT)) { + /* %g (and not %#g) - strip trailing zeroes. */ + p += (int32_t)prec & ((int32_t)prec >> 31); + while (p[-1] == '0') p--; + if (p[-1] == '.') p--; + } else { + /* %f (or %#g) - emit trailing zeroes. 
*/ + while ((int32_t)prec > 0) { *p++ = '0'; prec--; } + p += (int32_t)prec; + } + } + } + } + if ((sf & STRFMT_F_LEFT)) while (width-- > len) *p++ = ' '; + return p; +} + +/* Add formatted floating-point number to buffer. */ +SBuf *lj_strfmt_putfnum(SBuf *sb, SFormat sf, lua_Number n) +{ + sb->w = lj_strfmt_wfnum(sb, sf, n, NULL); + return sb; +} + +/* -- Conversions to strings ---------------------------------------------- */ + +/* Convert number to string. */ +GCstr * LJ_FASTCALL lj_strfmt_num(lua_State *L, cTValue *o) +{ + char buf[STRFMT_MAXBUF_NUM]; + MSize len = (MSize)(lj_strfmt_wfnum(NULL, STRFMT_G14, o->n, buf) - buf); + return lj_str_new(L, buf, len); +} + diff --cc src/lj_target_arm64.h index c9c6b80f,00000000..65a14307 mode 100644,000000..100644 --- a/src/lj_target_arm64.h +++ b/src/lj_target_arm64.h @@@ -1,342 -1,0 +1,342 @@@ +/* +** Definitions for ARM64 CPUs. - ** Copyright (C) 2005-2022 Mike Pall. See Copyright Notice in luajit.h ++** Copyright (C) 2005-2023 Mike Pall. See Copyright Notice in luajit.h +*/ + +#ifndef _LJ_TARGET_ARM64_H +#define _LJ_TARGET_ARM64_H + +/* -- Registers IDs ------------------------------------------------------- */ + +#define GPRDEF(_) \ + _(X0) _(X1) _(X2) _(X3) _(X4) _(X5) _(X6) _(X7) \ + _(X8) _(X9) _(X10) _(X11) _(X12) _(X13) _(X14) _(X15) \ + _(X16) _(X17) _(X18) _(X19) _(X20) _(X21) _(X22) _(X23) \ + _(X24) _(X25) _(X26) _(X27) _(X28) _(FP) _(LR) _(SP) +#define FPRDEF(_) \ + _(D0) _(D1) _(D2) _(D3) _(D4) _(D5) _(D6) _(D7) \ + _(D8) _(D9) _(D10) _(D11) _(D12) _(D13) _(D14) _(D15) \ + _(D16) _(D17) _(D18) _(D19) _(D20) _(D21) _(D22) _(D23) \ + _(D24) _(D25) _(D26) _(D27) _(D28) _(D29) _(D30) _(D31) +#define VRIDDEF(_) + +#define RIDENUM(name) RID_##name, + +enum { + GPRDEF(RIDENUM) /* General-purpose registers (GPRs). */ + FPRDEF(RIDENUM) /* Floating-point registers (FPRs). */ + RID_MAX, + RID_TMP = RID_LR, + RID_ZERO = RID_SP, + + /* Calling conventions. 
*/ + RID_RET = RID_X0, + RID_RETLO = RID_X0, + RID_RETHI = RID_X1, + RID_FPRET = RID_D0, + + /* These definitions must match with the *.dasc file(s): */ + RID_BASE = RID_X19, /* Interpreter BASE. */ + RID_LPC = RID_X21, /* Interpreter PC. */ + RID_GL = RID_X22, /* Interpreter GL. */ + RID_LREG = RID_X23, /* Interpreter L. */ + + /* Register ranges [min, max) and number of registers. */ + RID_MIN_GPR = RID_X0, + RID_MAX_GPR = RID_SP+1, + RID_MIN_FPR = RID_MAX_GPR, + RID_MAX_FPR = RID_D31+1, + RID_NUM_GPR = RID_MAX_GPR - RID_MIN_GPR, + RID_NUM_FPR = RID_MAX_FPR - RID_MIN_FPR +}; + +#define RID_NUM_KREF RID_NUM_GPR +#define RID_MIN_KREF RID_X0 + +/* -- Register sets ------------------------------------------------------- */ + +/* Make use of all registers, except for x18, fp, lr and sp. */ +#define RSET_FIXED \ + (RID2RSET(RID_X18)|RID2RSET(RID_FP)|RID2RSET(RID_LR)|RID2RSET(RID_SP)|\ + RID2RSET(RID_GL)) +#define RSET_GPR (RSET_RANGE(RID_MIN_GPR, RID_MAX_GPR) - RSET_FIXED) +#define RSET_FPR RSET_RANGE(RID_MIN_FPR, RID_MAX_FPR) +#define RSET_ALL (RSET_GPR|RSET_FPR) +#define RSET_INIT RSET_ALL + +/* lr is an implicit scratch register. */ +#define RSET_SCRATCH_GPR (RSET_RANGE(RID_X0, RID_X17+1)) +#define RSET_SCRATCH_FPR \ + (RSET_RANGE(RID_D0, RID_D7+1)|RSET_RANGE(RID_D16, RID_D31+1)) +#define RSET_SCRATCH (RSET_SCRATCH_GPR|RSET_SCRATCH_FPR) +#define REGARG_FIRSTGPR RID_X0 +#define REGARG_LASTGPR RID_X7 +#define REGARG_NUMGPR 8 +#define REGARG_FIRSTFPR RID_D0 +#define REGARG_LASTFPR RID_D7 +#define REGARG_NUMFPR 8 + +/* -- Spill slots --------------------------------------------------------- */ + +/* Spill slots are 32 bit wide. An even/odd pair is used for FPRs. +** +** SPS_FIXED: Available fixed spill slots in interpreter frame. +** This definition must match with the vm_arm64.dasc file. +** Pre-allocate some slots to avoid sp adjust in every root trace. +** +** SPS_FIRST: First spill slot for general use. Reserve min. two 32 bit slots. 
+*/ +#define SPS_FIXED 4 +#define SPS_FIRST 2 + +#define SPOFS_TMP 0 + +#define sps_scale(slot) (4 * (int32_t)(slot)) +#define sps_align(slot) (((slot) - SPS_FIXED + 3) & ~3) + +/* -- Exit state ---------------------------------------------------------- */ + +/* This definition must match with the *.dasc file(s). */ +typedef struct { + lua_Number fpr[RID_NUM_FPR]; /* Floating-point registers. */ + intptr_t gpr[RID_NUM_GPR]; /* General-purpose registers. */ + int32_t spill[256]; /* Spill slots. */ +} ExitState; + +/* Highest exit + 1 indicates stack check. */ +#define EXITSTATE_CHECKEXIT 1 + +/* Return the address of a per-trace exit stub. */ +static LJ_AINLINE uint32_t *exitstub_trace_addr_(uint32_t *p, uint32_t exitno) +{ + while (*p == (LJ_LE ? 0xd503201f : 0x1f2003d5)) p++; /* Skip A64I_NOP. */ + return p + 3 + exitno; +} +/* Avoid dependence on lj_jit.h if only including lj_target.h. */ +#define exitstub_trace_addr(T, exitno) \ + exitstub_trace_addr_((MCode *)((char *)(T)->mcode + (T)->szmcode), (exitno)) + +/* -- Instructions -------------------------------------------------------- */ + +/* ARM64 instructions are always little-endian. Swap for ARM64BE. */ +#if LJ_BE +#define A64I_LE(x) (lj_bswap(x)) +#else +#define A64I_LE(x) (x) +#endif + +/* Instruction fields. 
*/ +#define A64F_D(r) (r) +#define A64F_N(r) ((r) << 5) +#define A64F_A(r) ((r) << 10) +#define A64F_M(r) ((r) << 16) +#define A64F_IMMS(x) ((x) << 10) +#define A64F_IMMR(x) ((x) << 16) +#define A64F_U16(x) ((x) << 5) +#define A64F_U12(x) ((x) << 10) +#define A64F_S26(x) (((uint32_t)(x) & 0x03ffffffu)) +#define A64F_S19(x) (((uint32_t)(x) & 0x7ffffu) << 5) +#define A64F_S14(x) (((uint32_t)(x) & 0x3fffu) << 5) +#define A64F_S9(x) ((x) << 12) +#define A64F_BIT(x) ((x) << 19) +#define A64F_SH(sh, x) (((sh) << 22) | ((x) << 10)) +#define A64F_EX(ex) (A64I_EX | ((ex) << 13)) +#define A64F_EXSH(ex,x) (A64I_EX | ((ex) << 13) | ((x) << 10)) +#define A64F_FP8(x) ((x) << 13) +#define A64F_CC(cc) ((cc) << 12) +#define A64F_LSL16(x) (((x) / 16) << 21) +#define A64F_BSH(sh) ((sh) << 10) + +/* Check for valid field range. */ +#define A64F_S_OK(x, b) ((((x) + (1 << (b-1))) >> (b)) == 0) + +typedef enum A64Ins { + A64I_S = 0x20000000, + A64I_X = 0x80000000, + A64I_EX = 0x00200000, + A64I_ON = 0x00200000, + A64I_K12 = 0x1a000000, + A64I_K13 = 0x18000000, + A64I_LS_U = 0x01000000, + A64I_LS_S = 0x00800000, + A64I_LS_R = 0x01200800, + A64I_LS_SH = 0x00001000, + A64I_LS_UXTWx = 0x00004000, + A64I_LS_SXTWx = 0x0000c000, + A64I_LS_SXTXx = 0x0000e000, + A64I_LS_LSLx = 0x00006000, + + A64I_ADDw = 0x0b000000, + A64I_ADDx = 0x8b000000, + A64I_ADDSw = 0x2b000000, + A64I_ADDSx = 0xab000000, + A64I_NEGw = 0x4b0003e0, + A64I_NEGx = 0xcb0003e0, + A64I_SUBw = 0x4b000000, + A64I_SUBx = 0xcb000000, + A64I_SUBSw = 0x6b000000, + A64I_SUBSx = 0xeb000000, + + A64I_MULw = 0x1b007c00, + A64I_MULx = 0x9b007c00, + A64I_SMULL = 0x9b207c00, + + A64I_ANDw = 0x0a000000, + A64I_ANDx = 0x8a000000, + A64I_ANDSw = 0x6a000000, + A64I_ANDSx = 0xea000000, + A64I_EORw = 0x4a000000, + A64I_EORx = 0xca000000, + A64I_ORRw = 0x2a000000, + A64I_ORRx = 0xaa000000, + A64I_TSTw = 0x6a00001f, + A64I_TSTx = 0xea00001f, + + A64I_CMPw = 0x6b00001f, + A64I_CMPx = 0xeb00001f, + A64I_CMNw = 0x2b00001f, + A64I_CMNx = 0xab00001f, + 
A64I_CCMPw = 0x7a400000, + A64I_CCMPx = 0xfa400000, + A64I_CSELw = 0x1a800000, + A64I_CSELx = 0x9a800000, + + A64I_ASRw = 0x13007c00, + A64I_ASRx = 0x9340fc00, + A64I_LSLx = 0xd3400000, + A64I_LSRx = 0xd340fc00, + A64I_SHRw = 0x1ac02000, + A64I_SHRx = 0x9ac02000, /* lsl/lsr/asr/ror x0, x0, x0 */ + A64I_REVw = 0x5ac00800, + A64I_REVx = 0xdac00c00, + + A64I_EXTRw = 0x13800000, + A64I_EXTRx = 0x93c00000, + A64I_BFMw = 0x33000000, + A64I_BFMx = 0xb3400000, + A64I_SBFMw = 0x13000000, + A64I_SBFMx = 0x93400000, + A64I_SXTBw = 0x13001c00, + A64I_SXTHw = 0x13003c00, + A64I_SXTW = 0x93407c00, + A64I_UBFMw = 0x53000000, + A64I_UBFMx = 0xd3400000, + A64I_UXTBw = 0x53001c00, + A64I_UXTHw = 0x53003c00, + + A64I_MOVw = 0x2a0003e0, + A64I_MOVx = 0xaa0003e0, + A64I_MVNw = 0x2a2003e0, + A64I_MVNx = 0xaa2003e0, + A64I_MOVKw = 0x72800000, + A64I_MOVKx = 0xf2800000, + A64I_MOVZw = 0x52800000, + A64I_MOVZx = 0xd2800000, + A64I_MOVNw = 0x12800000, + A64I_MOVNx = 0x92800000, + + A64I_LDRB = 0x39400000, + A64I_LDRH = 0x79400000, + A64I_LDRw = 0xb9400000, + A64I_LDRx = 0xf9400000, + A64I_LDRLw = 0x18000000, + A64I_LDRLx = 0x58000000, + A64I_STRB = 0x39000000, + A64I_STRH = 0x79000000, + A64I_STRw = 0xb9000000, + A64I_STRx = 0xf9000000, + A64I_STPw = 0x29000000, + A64I_STPx = 0xa9000000, + A64I_LDPw = 0x29400000, + A64I_LDPx = 0xa9400000, + + A64I_B = 0x14000000, + A64I_BCC = 0x54000000, + A64I_BL = 0x94000000, + A64I_BR = 0xd61f0000, + A64I_BLR = 0xd63f0000, + A64I_TBZ = 0x36000000, + A64I_TBNZ = 0x37000000, + A64I_CBZ = 0x34000000, + A64I_CBNZ = 0x35000000, + + A64I_BRAAZ = 0xd61f081f, + A64I_BLRAAZ = 0xd63f081f, + + A64I_NOP = 0xd503201f, + + /* FP */ + A64I_FADDd = 0x1e602800, + A64I_FSUBd = 0x1e603800, + A64I_FMADDd = 0x1f400000, + A64I_FMSUBd = 0x1f408000, + A64I_FNMADDd = 0x1f600000, + A64I_FNMSUBd = 0x1f608000, + A64I_FMULd = 0x1e600800, + A64I_FDIVd = 0x1e601800, + A64I_FNEGd = 0x1e614000, + A64I_FABS = 0x1e60c000, + A64I_FSQRTd = 0x1e61c000, + A64I_LDRs = 0xbd400000, + A64I_LDRd = 
0xfd400000, + A64I_STRs = 0xbd000000, + A64I_STRd = 0xfd000000, + A64I_LDPs = 0x2d400000, + A64I_LDPd = 0x6d400000, + A64I_STPs = 0x2d000000, + A64I_STPd = 0x6d000000, + A64I_FCMPd = 0x1e602000, + A64I_FCMPZd = 0x1e602008, + A64I_FCSELd = 0x1e600c00, + A64I_FRINTMd = 0x1e654000, + A64I_FRINTPd = 0x1e64c000, + A64I_FRINTZd = 0x1e65c000, + + A64I_FCVT_F32_F64 = 0x1e624000, + A64I_FCVT_F64_F32 = 0x1e22c000, + A64I_FCVT_F32_S32 = 0x1e220000, + A64I_FCVT_F64_S32 = 0x1e620000, + A64I_FCVT_F32_U32 = 0x1e230000, + A64I_FCVT_F64_U32 = 0x1e630000, + A64I_FCVT_F32_S64 = 0x9e220000, + A64I_FCVT_F64_S64 = 0x9e620000, + A64I_FCVT_F32_U64 = 0x9e230000, + A64I_FCVT_F64_U64 = 0x9e630000, + A64I_FCVT_S32_F64 = 0x1e780000, + A64I_FCVT_S32_F32 = 0x1e380000, + A64I_FCVT_U32_F64 = 0x1e790000, + A64I_FCVT_U32_F32 = 0x1e390000, + A64I_FCVT_S64_F64 = 0x9e780000, + A64I_FCVT_S64_F32 = 0x9e380000, + A64I_FCVT_U64_F64 = 0x9e790000, + A64I_FCVT_U64_F32 = 0x9e390000, + + A64I_FMOV_S = 0x1e204000, + A64I_FMOV_D = 0x1e604000, + A64I_FMOV_R_S = 0x1e260000, + A64I_FMOV_S_R = 0x1e270000, + A64I_FMOV_R_D = 0x9e660000, + A64I_FMOV_D_R = 0x9e670000, + A64I_FMOV_DI = 0x1e601000, +} A64Ins; + +#define A64I_BR_AUTH (LJ_ABI_PAUTH ? A64I_BRAAZ : A64I_BR) +#define A64I_BLR_AUTH (LJ_ABI_PAUTH ? A64I_BLRAAZ : A64I_BLR) + +typedef enum A64Shift { + A64SH_LSL, A64SH_LSR, A64SH_ASR, A64SH_ROR +} A64Shift; + +typedef enum A64Extend { + A64EX_UXTB, A64EX_UXTH, A64EX_UXTW, A64EX_UXTX, + A64EX_SXTB, A64EX_SXTH, A64EX_SXTW, A64EX_SXTX, +} A64Extend; + +/* ARM condition codes. */ +typedef enum A64CC { + CC_EQ, CC_NE, CC_CS, CC_CC, CC_MI, CC_PL, CC_VS, CC_VC, + CC_HI, CC_LS, CC_GE, CC_LT, CC_GT, CC_LE, CC_AL, + CC_HS = CC_CS, CC_LO = CC_CC +} A64CC; + +#endif diff --cc src/ljamalg.c index cae8356c,f9315d5c..f1dce6a3 --- a/src/ljamalg.c +++ b/src/ljamalg.c @@@ -1,8 -1,18 +1,8 @@@ /* ** LuaJIT core and libraries amalgamation. - ** Copyright (C) 2005-2022 Mike Pall. 
See Copyright Notice in luajit.h + ** Copyright (C) 2005-2023 Mike Pall. See Copyright Notice in luajit.h */ -/* -+--------------------------------------------------------------------------+ -| WARNING: Compiling the amalgamation needs a lot of virtual memory | -| (around 300 MB with GCC 4.x)! If you don't have enough physical memory | -| your machine will start swapping to disk and the compile will not finish | -| within a reasonable amount of time. | -| So either compile on a bigger machine or use the non-amalgamated build. | -+--------------------------------------------------------------------------+ -*/ - #define ljamalg_c #define LUA_CORE diff --cc src/luajit.h index 31f1eb1f,8b09f376..f01771ae --- a/src/luajit.h +++ b/src/luajit.h @@@ -30,10 -30,10 +30,10 @@@ #include "lua.h" -#define LUAJIT_VERSION "LuaJIT 2.0.5" -#define LUAJIT_VERSION_NUM 20005 /* Version 2.0.5 = 02.00.05. */ -#define LUAJIT_VERSION_SYM luaJIT_version_2_0_5 +#define LUAJIT_VERSION "LuaJIT 2.1.0-beta3" +#define LUAJIT_VERSION_NUM 20100 /* Version 2.1.0 = 02.01.00. */ +#define LUAJIT_VERSION_SYM luaJIT_version_2_1_0_beta3 - #define LUAJIT_COPYRIGHT "Copyright (C) 2005-2022 Mike Pall" + #define LUAJIT_COPYRIGHT "Copyright (C) 2005-2023 Mike Pall" #define LUAJIT_URL "https://luajit.org/" /* Modes for luaJIT_setmode. */ diff --cc src/vm_arm64.dasc index effb8d91,00000000..a7a9392c mode 100644,000000..100644 --- a/src/vm_arm64.dasc +++ b/src/vm_arm64.dasc @@@ -1,4201 -1,0 +1,4201 @@@ +|// Low-level VM code for ARM64 CPUs. +|// Bytecode interpreter, fast functions and helper functions. - |// Copyright (C) 2005-2022 Mike Pall. See Copyright Notice in luajit.h ++|// Copyright (C) 2005-2023 Mike Pall. See Copyright Notice in luajit.h +| +|.arch arm64 +|.section code_op, code_sub +| +|.actionlist build_actionlist +|.globals GLOB_ +|.globalnames globnames +|.externnames extnames +| +|// Note: The ragged indentation of the instructions is intentional. 
+|// The starting columns indicate data dependencies. +| +|//----------------------------------------------------------------------- +| +|// ARM64 registers and the AAPCS64 ABI 1.0 at a glance: +|// +|// x0-x17 temp, x19-x28 callee-saved, x29 fp, x30 lr +|// x18 is reserved on most platforms. Don't use it, save it or restore it. +|// x31 doesn't exist. Register number 31 either means xzr/wzr (zero) or sp, +|// depending on the instruction. +|// v0-v7 temp, v8-v15 callee-saved (only d8-d15 preserved), v16-v31 temp +|// +|// x0-x7/v0-v7 hold parameters and results. +| +|// Fixed register assignments for the interpreter. +| +|// The following must be C callee-save. +|.define BASE, x19 // Base of current Lua stack frame. +|.define KBASE, x20 // Constants of current Lua function. +|.define PC, x21 // Next PC. +|.define GLREG, x22 // Global state. +|.define LREG, x23 // Register holding lua_State (also in SAVE_L). +|.define TISNUM, x24 // Constant LJ_TISNUM << 47. +|.define TISNUMhi, x25 // Constant LJ_TISNUM << 15. +|.define TISNIL, x26 // Constant -1LL. +|.define fp, x29 // Yes, we have to maintain a frame pointer. +| +|.define ST_INTERP, w26 // Constant -1. +| +|// The following temporaries are not saved across C calls, except for RA/RC. +|.define RA, x27 +|.define RC, x28 +|.define RB, x17 +|.define RAw, w27 +|.define RCw, w28 +|.define RBw, w17 +|.define INS, x16 +|.define INSw, w16 +|.define ITYPE, x15 +|.define TMP0, x8 +|.define TMP1, x9 +|.define TMP2, x10 +|.define TMP3, x11 +|.define TMP0w, w8 +|.define TMP1w, w9 +|.define TMP2w, w10 +|.define TMP3w, w11 +| +|// Calling conventions. Also used as temporaries. 
+|.define CARG1, x0 +|.define CARG2, x1 +|.define CARG3, x2 +|.define CARG4, x3 +|.define CARG5, x4 +|.define CARG1w, w0 +|.define CARG2w, w1 +|.define CARG3w, w2 +|.define CARG4w, w3 +|.define CARG5w, w4 +| +|.define FARG1, d0 +|.define FARG2, d1 +| +|.define CRET1, x0 +|.define CRET1w, w0 +| +|//----------------------------------------------------------------------- +| +|// ARM64e pointer authentication codes (PAC). +|.if PAUTH +|.macro sp_auth; pacibsp; .endmacro +|.macro br_auth, reg; braaz reg; .endmacro +|.macro blr_auth, reg; blraaz reg; .endmacro +|.macro ret_auth; retab; .endmacro +|.else +|.macro sp_auth; .endmacro +|.macro br_auth, reg; br reg; .endmacro +|.macro blr_auth, reg; blr reg; .endmacro +|.macro ret_auth; ret; .endmacro +|.endif +| +|//----------------------------------------------------------------------- +| +|// Stack layout while in interpreter. Must match with lj_frame.h. +| +|.define CFRAME_SPACE, 208 +|//----- 16 byte aligned, <-- sp entering interpreter +|.define SAVE_FP_LR_, 192 +|.define SAVE_GPR_, 112 // 112+10*8: 64 bit GPR saves +|.define SAVE_FPR_, 48 // 48+8*8: 64 bit FPR saves +|// Unused [sp, #44] // 32 bit values +|.define SAVE_NRES, [sp, #40] +|.define SAVE_ERRF, [sp, #36] +|.define SAVE_MULTRES, [sp, #32] +|.define TMPD, [sp, #24] // 64 bit values +|.define SAVE_L, [sp, #16] +|.define SAVE_PC, [sp, #8] +|.define SAVE_CFRAME, [sp, #0] +|//----- 16 byte aligned, <-- sp while in interpreter. 
+| +|.define TMPDofs, #24 +| +|.macro save_, gpr1, gpr2, fpr1, fpr2 +| stp d..fpr2, d..fpr1, [sp, # SAVE_FPR_+(14-fpr1)*8] +| stp x..gpr2, x..gpr1, [sp, # SAVE_GPR_+(27-gpr1)*8] +|.endmacro +|.macro rest_, gpr1, gpr2, fpr1, fpr2 +| ldp d..fpr2, d..fpr1, [sp, # SAVE_FPR_+(14-fpr1)*8] +| ldp x..gpr2, x..gpr1, [sp, # SAVE_GPR_+(27-gpr1)*8] +|.endmacro +| +|.macro saveregs +| sp_auth +| sub sp, sp, # CFRAME_SPACE +| stp fp, lr, [sp, # SAVE_FP_LR_] +| add fp, sp, # SAVE_FP_LR_ +| stp x20, x19, [sp, # SAVE_GPR_+(27-19)*8] +| save_ 21, 22, 8, 9 +| save_ 23, 24, 10, 11 +| save_ 25, 26, 12, 13 +| save_ 27, 28, 14, 15 +|.endmacro +|.macro restoreregs +| ldp x20, x19, [sp, # SAVE_GPR_+(27-19)*8] +| rest_ 21, 22, 8, 9 +| rest_ 23, 24, 10, 11 +| rest_ 25, 26, 12, 13 +| rest_ 27, 28, 14, 15 +| ldp fp, lr, [sp, # SAVE_FP_LR_] +| add sp, sp, # CFRAME_SPACE +|.endmacro +| +|// Type definitions. Some of these are only used for documentation. +|.type L, lua_State, LREG +|.type GL, global_State, GLREG +|.type TVALUE, TValue +|.type GCOBJ, GCobj +|.type STR, GCstr +|.type TAB, GCtab +|.type LFUNC, GCfuncL +|.type CFUNC, GCfuncC +|.type PROTO, GCproto +|.type UPVAL, GCupval +|.type NODE, Node +|.type NARGS8, int +|.type TRACE, GCtrace +|.type SBUF, SBuf +| +|//----------------------------------------------------------------------- +| +|// Trap for not-yet-implemented parts. +|.macro NYI; brk; .endmacro +| +|//----------------------------------------------------------------------- +| +|// Access to frame relative to BASE. +|.define FRAME_FUNC, #-16 +|.define FRAME_PC, #-8 +| +|// Endian-specific defines. 
+|.if ENDIAN_LE +|.define LO, 0 +|.define OFS_RD, 2 +|.define OFS_RB, 3 +|.define OFS_RA, 1 +|.define OFS_OP, 0 +|.else +|.define LO, 4 +|.define OFS_RD, 0 +|.define OFS_RB, 0 +|.define OFS_RA, 2 +|.define OFS_OP, 3 +|.endif +| +|.macro decode_RA, dst, ins; ubfx dst, ins, #8, #8; .endmacro +|.macro decode_RB, dst, ins; ubfx dst, ins, #24, #8; .endmacro +|.macro decode_RC, dst, ins; ubfx dst, ins, #16, #8; .endmacro +|.macro decode_RD, dst, ins; ubfx dst, ins, #16, #16; .endmacro +|.macro decode_RC8RD, dst, src; ubfiz dst, src, #3, #8; .endmacro +| +|// Instruction decode+dispatch. +|.macro ins_NEXT +| ldr INSw, [PC], #4 +| add TMP1, GL, INS, uxtb #3 +| decode_RA RA, INS +| ldr TMP0, [TMP1, #GG_G2DISP] +| decode_RD RC, INS +| br_auth TMP0 +|.endmacro +| +|// Instruction footer. +|.if 1 +| // Replicated dispatch. Less unpredictable branches, but higher I-Cache use. +| .define ins_next, ins_NEXT +| .define ins_next_, ins_NEXT +|.else +| // Common dispatch. Lower I-Cache use, only one (very) unpredictable branch. +| // Affects only certain kinds of benchmarks (and only with -j off). +| .macro ins_next +| b ->ins_next +| .endmacro +| .macro ins_next_ +| ->ins_next: +| ins_NEXT +| .endmacro +|.endif +| +|// Call decode and dispatch. +|.macro ins_callt +| // BASE = new base, CARG3 = LFUNC/CFUNC, RC = nargs*8, FRAME_PC(BASE) = PC +| ldr PC, LFUNC:CARG3->pc +| ldr INSw, [PC], #4 +| add TMP1, GL, INS, uxtb #3 +| decode_RA RA, INS +| ldr TMP0, [TMP1, #GG_G2DISP] +| add RA, BASE, RA, lsl #3 +| br_auth TMP0 +|.endmacro +| +|.macro ins_call +| // BASE = new base, CARG3 = LFUNC/CFUNC, RC = nargs*8, PC = caller PC +| str PC, [BASE, FRAME_PC] +| ins_callt +|.endmacro +| +|//----------------------------------------------------------------------- +| +|// Macros to check the TValue type and extract the GCobj. Branch on failure. 
+|.macro checktp, reg, tp, target +| asr ITYPE, reg, #47 +| cmn ITYPE, #-tp +| and reg, reg, #LJ_GCVMASK +| bne target +|.endmacro +|.macro checktp, dst, reg, tp, target +| asr ITYPE, reg, #47 +| cmn ITYPE, #-tp +| and dst, reg, #LJ_GCVMASK +| bne target +|.endmacro +|.macro checkstr, reg, target; checktp reg, LJ_TSTR, target; .endmacro +|.macro checktab, reg, target; checktp reg, LJ_TTAB, target; .endmacro +|.macro checkfunc, reg, target; checktp reg, LJ_TFUNC, target; .endmacro +|.macro checkint, reg, target +| cmp TISNUMhi, reg, lsr #32 +| bne target +|.endmacro +|.macro checknum, reg, target +| cmp TISNUMhi, reg, lsr #32 +| bls target +|.endmacro +|.macro checknumber, reg, target +| cmp TISNUMhi, reg, lsr #32 +| blo target +|.endmacro +| +|.macro mov_false, reg; movn reg, #0x8000, lsl #32; .endmacro +|.macro mov_true, reg; movn reg, #0x0001, lsl #48; .endmacro +| +#define GL_J(field) (GG_G2J + (int)offsetof(jit_State, field)) +| +#define PC2PROTO(field) ((int)offsetof(GCproto, field)-(int)sizeof(GCproto)) +| +|.macro hotcheck, delta +| lsr CARG1, PC, #1 +| and CARG1, CARG1, #126 +| add CARG1, CARG1, #GG_G2DISP+GG_DISP2HOT +| ldrh CARG2w, [GL, CARG1] +| subs CARG2, CARG2, #delta +| strh CARG2w, [GL, CARG1] +|.endmacro +| +|.macro hotloop +| hotcheck HOTCOUNT_LOOP +| blo ->vm_hotloop +|.endmacro +| +|.macro hotcall +| hotcheck HOTCOUNT_CALL +| blo ->vm_hotcall +|.endmacro +| +|// Set current VM state. +|.macro mv_vmstate, reg, st; movn reg, #LJ_VMST_..st; .endmacro +|.macro st_vmstate, reg; str reg, GL->vmstate; .endmacro +| +|// Move table write barrier back. Overwrites mark and tmp. 
+|.macro barrierback, tab, mark, tmp +| ldr tmp, GL->gc.grayagain +| and mark, mark, #~LJ_GC_BLACK // black2gray(tab) +| str tab, GL->gc.grayagain +| strb mark, tab->marked +| str tmp, tab->gclist +|.endmacro +| +|//----------------------------------------------------------------------- + +#if !LJ_DUALNUM +#error "Only dual-number mode supported for ARM64 target" +#endif + +/* Generate subroutines used by opcodes and other parts of the VM. */ +/* The .code_sub section should be last to help static branch prediction. */ +static void build_subroutines(BuildCtx *ctx) +{ + |.code_sub + | + |//----------------------------------------------------------------------- + |//-- Return handling ---------------------------------------------------- + |//----------------------------------------------------------------------- + | + |->vm_returnp: + | // See vm_return. Also: RB = previous base. + | tbz PC, #2, ->cont_dispatch // (PC & FRAME_P) == 0? + | + | // Return from pcall or xpcall fast func. + | ldr PC, [RB, FRAME_PC] // Fetch PC of previous frame. + | mov_true TMP0 + | mov BASE, RB + | // Prepending may overwrite the pcall frame, so do it at the end. + | str TMP0, [RA, #-8]! // Prepend true to results. + | + |->vm_returnc: + | adds RC, RC, #8 // RC = (nresults+1)*8. + | mov CRET1, #LUA_YIELD + | beq ->vm_unwind_c_eh + | str RCw, SAVE_MULTRES + | ands CARG1, PC, #FRAME_TYPE + | beq ->BC_RET_Z // Handle regular return to Lua. + | + |->vm_return: + | // BASE = base, RA = resultptr, RC/MULTRES = (nresults+1)*8, PC = return + | // CARG1 = PC & FRAME_TYPE + | and RB, PC, #~FRAME_TYPEP + | cmp CARG1, #FRAME_C + | sub RB, BASE, RB // RB = previous base. + | bne ->vm_returnp + | + | str RB, L->base + | ldrsw CARG2, SAVE_NRES // CARG2 = nresults+1. 
+ | mv_vmstate TMP0w, C + | sub BASE, BASE, #16 + | subs TMP2, RC, #8 + | st_vmstate TMP0w + | beq >2 + |1: + | subs TMP2, TMP2, #8 + | ldr TMP0, [RA], #8 + | str TMP0, [BASE], #8 + | bne <1 + |2: + | cmp RC, CARG2, lsl #3 // More/less results wanted? + | bne >6 + |3: + | str BASE, L->top // Store new top. + | + |->vm_leave_cp: + | ldr RC, SAVE_CFRAME // Restore previous C frame. + | mov CRET1, #0 // Ok return status for vm_pcall. + | str RC, L->cframe + | + |->vm_leave_unw: + | restoreregs + | ret_auth + | + |6: + | bgt >7 // Less results wanted? + | // More results wanted. Check stack size and fill up results with nil. + | ldr CARG3, L->maxstack + | cmp BASE, CARG3 + | bhs >8 + | str TISNIL, [BASE], #8 + | add RC, RC, #8 + | b <2 + | + |7: // Less results wanted. + | cbz CARG2, <3 // LUA_MULTRET+1 case? + | sub CARG1, RC, CARG2, lsl #3 + | sub BASE, BASE, CARG1 // Shrink top. + | b <3 + | + |8: // Corner case: need to grow stack for filling up results. + | // This can happen if: + | // - A C function grows the stack (a lot). + | // - The GC shrinks the stack in between. + | // - A return back from a lua_call() with (high) nresults adjustment. + | str BASE, L->top // Save current top held in BASE (yes). + | mov CARG1, L + | bl extern lj_state_growstack // (lua_State *L, int n) + | ldr BASE, L->top // Need the (realloced) L->top in BASE. + | ldrsw CARG2, SAVE_NRES + | b <2 + | + |->vm_unwind_c: // Unwind C stack, return from vm_pcall. + | // (void *cframe, int errcode) + | mov sp, CARG1 + | mov CRET1, CARG2 + |->vm_unwind_c_eh: // Landing pad for external unwinder. + | ldr L, SAVE_L + | mv_vmstate TMP0w, C + | ldr GL, L->glref + | st_vmstate TMP0w + | b ->vm_leave_unw + | + |->vm_unwind_ff: // Unwind C stack, return from ff pcall. + | // (void *cframe) + | and sp, CARG1, #CFRAME_RAWMASK + |->vm_unwind_ff_eh: // Landing pad for external unwinder. 
+ | ldr L, SAVE_L + | movz TISNUM, #(LJ_TISNUM>>1)&0xffff, lsl #48 + | movz TISNUMhi, #(LJ_TISNUM>>1)&0xffff, lsl #16 + | movn TISNIL, #0 + | mov RC, #16 // 2 results: false + error message. + | ldr BASE, L->base + | ldr GL, L->glref // Setup pointer to global state. + | mov_false TMP0 + | sub RA, BASE, #8 // Results start at BASE-8. + | ldr PC, [BASE, FRAME_PC] // Fetch PC of previous frame. + | str TMP0, [BASE, #-8] // Prepend false to error message. + | st_vmstate ST_INTERP + | b ->vm_returnc + | + |//----------------------------------------------------------------------- + |//-- Grow stack for calls ----------------------------------------------- + |//----------------------------------------------------------------------- + | + |->vm_growstack_c: // Grow stack for C function. + | // CARG1 = L + | mov CARG2, #LUA_MINSTACK + | b >2 + | + |->vm_growstack_l: // Grow stack for Lua function. + | // BASE = new base, RA = BASE+framesize*8, RC = nargs*8, PC = first PC + | add RC, BASE, RC + | sub RA, RA, BASE + | mov CARG1, L + | stp BASE, RC, L->base + | add PC, PC, #4 // Must point after first instruction. + | lsr CARG2, RA, #3 + |2: + | // L->base = new base, L->top = top + | str PC, SAVE_PC + | bl extern lj_state_growstack // (lua_State *L, int n) + | ldp BASE, RC, L->base + | ldr LFUNC:CARG3, [BASE, FRAME_FUNC] + | sub NARGS8:RC, RC, BASE + | and LFUNC:CARG3, CARG3, #LJ_GCVMASK + | // BASE = new base, RB = LFUNC/CFUNC, RC = nargs*8, FRAME_PC(BASE) = PC + | ins_callt // Just retry the call. + | + |//----------------------------------------------------------------------- + |//-- Entry points into the assembler VM --------------------------------- + |//----------------------------------------------------------------------- + | + |->vm_resume: // Setup C frame and resume thread. + | // (lua_State *L, TValue *base, int nres1 = 0, ptrdiff_t ef = 0) + | saveregs + | mov L, CARG1 + | ldr GL, L->glref // Setup pointer to global state. 
+ | mov BASE, CARG2 + | str L, SAVE_L + | mov PC, #FRAME_CP + | str wzr, SAVE_NRES + | add TMP0, sp, #CFRAME_RESUME + | ldrb TMP1w, L->status + | str wzr, SAVE_ERRF + | str L, SAVE_PC // Any value outside of bytecode is ok. + | str xzr, SAVE_CFRAME + | str TMP0, L->cframe + | cbz TMP1w, >3 + | + | // Resume after yield (like a return). + | str L, GL->cur_L + | mov RA, BASE + | ldp BASE, CARG1, L->base + | movz TISNUM, #(LJ_TISNUM>>1)&0xffff, lsl #48 + | movz TISNUMhi, #(LJ_TISNUM>>1)&0xffff, lsl #16 + | ldr PC, [BASE, FRAME_PC] + | strb wzr, L->status + | movn TISNIL, #0 + | sub RC, CARG1, BASE + | ands CARG1, PC, #FRAME_TYPE + | add RC, RC, #8 + | st_vmstate ST_INTERP + | str RCw, SAVE_MULTRES + | beq ->BC_RET_Z + | b ->vm_return + | + |->vm_pcall: // Setup protected C frame and enter VM. + | // (lua_State *L, TValue *base, int nres1, ptrdiff_t ef) + | saveregs + | mov PC, #FRAME_CP + | str CARG4w, SAVE_ERRF + | b >1 + | + |->vm_call: // Setup C frame and enter VM. + | // (lua_State *L, TValue *base, int nres1) + | saveregs + | mov PC, #FRAME_C + | + |1: // Entry point for vm_pcall above (PC = ftype). + | ldr RC, L:CARG1->cframe + | str CARG3w, SAVE_NRES + | mov L, CARG1 + | str CARG1, SAVE_L + | ldr GL, L->glref // Setup pointer to global state. + | mov BASE, CARG2 + | str CARG1, SAVE_PC // Any value outside of bytecode is ok. + | add TMP0, sp, #0 + | str RC, SAVE_CFRAME + | str TMP0, L->cframe // Add our C frame to cframe chain. + | + |3: // Entry point for vm_cpcall/vm_resume (BASE = base, PC = ftype). + | str L, GL->cur_L + | ldp RB, CARG1, L->base // RB = old base (for vmeta_call). 
+ | movz TISNUM, #(LJ_TISNUM>>1)&0xffff, lsl #48 + | movz TISNUMhi, #(LJ_TISNUM>>1)&0xffff, lsl #16 + | add PC, PC, BASE + | movn TISNIL, #0 + | sub PC, PC, RB // PC = frame delta + frame type + | sub NARGS8:RC, CARG1, BASE + | st_vmstate ST_INTERP + | + |->vm_call_dispatch: + | // RB = old base, BASE = new base, RC = nargs*8, PC = caller PC + | ldr CARG3, [BASE, FRAME_FUNC] + | checkfunc CARG3, ->vmeta_call + | + |->vm_call_dispatch_f: + | ins_call + | // BASE = new base, CARG3 = func, RC = nargs*8, PC = caller PC + | + |->vm_cpcall: // Setup protected C frame, call C. + | // (lua_State *L, lua_CFunction func, void *ud, lua_CPFunction cp) + | saveregs + | mov L, CARG1 + | ldr RA, L:CARG1->stack + | str CARG1, SAVE_L + | ldr GL, L->glref // Setup pointer to global state. + | ldr RB, L->top + | str CARG1, SAVE_PC // Any value outside of bytecode is ok. + | ldr RC, L->cframe + | sub RA, RA, RB // Compute -savestack(L, L->top). + | str RAw, SAVE_NRES // Neg. delta means cframe w/o frame. + | str wzr, SAVE_ERRF // No error function. + | add TMP0, sp, #0 + | str RC, SAVE_CFRAME + | str TMP0, L->cframe // Add our C frame to cframe chain. + | str L, GL->cur_L + | blr_auth CARG4 // (lua_State *L, lua_CFunction func, void *ud) + | mov BASE, CRET1 + | mov PC, #FRAME_CP + | cbnz BASE, <3 // Else continue with the call. + | b ->vm_leave_cp // No base? Just remove C frame. + | + |//----------------------------------------------------------------------- + |//-- Metamethod handling ------------------------------------------------ + |//----------------------------------------------------------------------- + | + |//-- Continuation dispatch ---------------------------------------------- + | + |->cont_dispatch: + | // BASE = meta base, RA = resultptr, RC = (nresults+1)*8 + | ldr LFUNC:CARG3, [RB, FRAME_FUNC] + | ldr CARG1, [BASE, #-32] // Get continuation. + | mov CARG4, BASE + | mov BASE, RB // Restore caller BASE. 
+ | and LFUNC:CARG3, CARG3, #LJ_GCVMASK + |.if FFI + | cmp CARG1, #1 + |.endif + | ldr PC, [CARG4, #-24] // Restore PC from [cont|PC]. + | add TMP0, RA, RC + | str TISNIL, [TMP0, #-8] // Ensure one valid arg. + |.if FFI + | bls >1 + |.endif + | ldr CARG3, LFUNC:CARG3->pc + | ldr KBASE, [CARG3, #PC2PROTO(k)] + | // BASE = base, RA = resultptr, CARG4 = meta base + | br_auth CARG1 + | + |.if FFI + |1: + | beq ->cont_ffi_callback // cont = 1: return from FFI callback. + | // cont = 0: tailcall from C function. + | sub CARG4, CARG4, #32 + | sub RC, CARG4, BASE + | b ->vm_call_tail + |.endif + | + |->cont_cat: // RA = resultptr, CARG4 = meta base + | ldr INSw, [PC, #-4] + | sub CARG2, CARG4, #32 + | ldr TMP0, [RA] + | str BASE, L->base + | decode_RB RB, INS + | decode_RA RA, INS + | add TMP1, BASE, RB, lsl #3 + | subs TMP1, CARG2, TMP1 + | beq >1 + | str TMP0, [CARG2] + | lsr CARG3, TMP1, #3 + | b ->BC_CAT_Z + | + |1: + | str TMP0, [BASE, RA, lsl #3] + | b ->cont_nop + | + |//-- Table indexing metamethods ----------------------------------------- + | + |->vmeta_tgets1: + | movn CARG4, #~LJ_TSTR + | add CARG2, BASE, RB, lsl #3 + | add CARG4, STR:RC, CARG4, lsl #47 + | b >2 + | + |->vmeta_tgets: + | movk CARG2, #(LJ_TTAB>>1)&0xffff, lsl #48 + | str CARG2, GL->tmptv + | add CARG2, GL, #offsetof(global_State, tmptv) + |2: + | add CARG3, sp, TMPDofs + | str CARG4, TMPD + | b >1 + | + |->vmeta_tgetb: // RB = table, RC = index + | add RC, RC, TISNUM + | add CARG2, BASE, RB, lsl #3 + | add CARG3, sp, TMPDofs + | str RC, TMPD + | b >1 + | + |->vmeta_tgetv: // RB = table, RC = key + | add CARG2, BASE, RB, lsl #3 + | add CARG3, BASE, RC, lsl #3 + |1: + | str BASE, L->base + | mov CARG1, L + | str PC, SAVE_PC + | bl extern lj_meta_tget // (lua_State *L, TValue *o, TValue *k) + | // Returns TValue * (finished) or NULL (metamethod). + | cbz CRET1, >3 + | ldr TMP0, [CRET1] + | str TMP0, [BASE, RA, lsl #3] + | ins_next + | + |3: // Call __index metamethod. 
+ | // BASE = base, L->top = new base, stack = cont/func/t/k + | sub TMP1, BASE, #FRAME_CONT + | ldr BASE, L->top + | mov NARGS8:RC, #16 // 2 args for func(t, k). + | ldr LFUNC:CARG3, [BASE, FRAME_FUNC] // Guaranteed to be a function here. + | str PC, [BASE, #-24] // [cont|PC] + | sub PC, BASE, TMP1 + | and LFUNC:CARG3, CARG3, #LJ_GCVMASK + | b ->vm_call_dispatch_f + | + |->vmeta_tgetr: + | sxtw CARG2, TMP1w + | bl extern lj_tab_getinth // (GCtab *t, int32_t key) + | // Returns cTValue * or NULL. + | mov TMP0, TISNIL + | cbz CRET1, ->BC_TGETR_Z + | ldr TMP0, [CRET1] + | b ->BC_TGETR_Z + | + |//----------------------------------------------------------------------- + | + |->vmeta_tsets1: + | movn CARG4, #~LJ_TSTR + | add CARG2, BASE, RB, lsl #3 + | add CARG4, STR:RC, CARG4, lsl #47 + | b >2 + | + |->vmeta_tsets: + | movk CARG2, #(LJ_TTAB>>1)&0xffff, lsl #48 + | str CARG2, GL->tmptv + | add CARG2, GL, #offsetof(global_State, tmptv) + |2: + | add CARG3, sp, TMPDofs + | str CARG4, TMPD + | b >1 + | + |->vmeta_tsetb: // RB = table, RC = index + | add RC, RC, TISNUM + | add CARG2, BASE, RB, lsl #3 + | add CARG3, sp, TMPDofs + | str RC, TMPD + | b >1 + | + |->vmeta_tsetv: + | add CARG2, BASE, RB, lsl #3 + | add CARG3, BASE, RC, lsl #3 + |1: + | str BASE, L->base + | mov CARG1, L + | str PC, SAVE_PC + | bl extern lj_meta_tset // (lua_State *L, TValue *o, TValue *k) + | // Returns TValue * (finished) or NULL (metamethod). + | ldr TMP0, [BASE, RA, lsl #3] + | cbz CRET1, >3 + | // NOBARRIER: lj_meta_tset ensures the table is not black. + | str TMP0, [CRET1] + | ins_next + | + |3: // Call __newindex metamethod. + | // BASE = base, L->top = new base, stack = cont/func/t/k/(v) + | sub TMP1, BASE, #FRAME_CONT + | ldr BASE, L->top + | mov NARGS8:RC, #24 // 3 args for func(t, k, v). + | ldr LFUNC:CARG3, [BASE, FRAME_FUNC] // Guaranteed to be a function here. + | str TMP0, [BASE, #16] // Copy value to third argument. 
+ | str PC, [BASE, #-24] // [cont|PC] + | sub PC, BASE, TMP1 + | and LFUNC:CARG3, CARG3, #LJ_GCVMASK + | b ->vm_call_dispatch_f + | + |->vmeta_tsetr: + | sxtw CARG3, TMP1w + | str BASE, L->base + | mov CARG1, L + | str PC, SAVE_PC + | bl extern lj_tab_setinth // (lua_State *L, GCtab *t, int32_t key) + | // Returns TValue *. + | b ->BC_TSETR_Z + | + |//-- Comparison metamethods --------------------------------------------- + | + |->vmeta_comp: + | add CARG2, BASE, RA, lsl #3 + | sub PC, PC, #4 + | add CARG3, BASE, RC, lsl #3 + | str BASE, L->base + | mov CARG1, L + | str PC, SAVE_PC + | uxtb CARG4w, INSw + | bl extern lj_meta_comp // (lua_State *L, TValue *o1, *o2, int op) + | // Returns 0/1 or TValue * (metamethod). + |3: + | cmp CRET1, #1 + | bhi ->vmeta_binop + |4: + | ldrh RBw, [PC, # OFS_RD] + | add PC, PC, #4 + | add RB, PC, RB, lsl #2 + | sub RB, RB, #0x20000 + | csel PC, PC, RB, lo + |->cont_nop: + | ins_next + | + |->cont_ra: // RA = resultptr + | ldr INSw, [PC, #-4] + | ldr TMP0, [RA] + | decode_RA TMP1, INS + | str TMP0, [BASE, TMP1, lsl #3] + | b ->cont_nop + | + |->cont_condt: // RA = resultptr + | ldr TMP0, [RA] + | mov_true TMP1 + | cmp TMP1, TMP0 // Branch if result is true. + | b <4 + | + |->cont_condf: // RA = resultptr + | ldr TMP0, [RA] + | mov_false TMP1 + | cmp TMP0, TMP1 // Branch if result is false. + | b <4 + | + |->vmeta_equal: + | // CARG2, CARG3, CARG4 are already set by BC_ISEQV/BC_ISNEV. + | and TAB:CARG3, CARG3, #LJ_GCVMASK + | sub PC, PC, #4 + | str BASE, L->base + | mov CARG1, L + | str PC, SAVE_PC + | bl extern lj_meta_equal // (lua_State *L, GCobj *o1, *o2, int ne) + | // Returns 0/1 or TValue * (metamethod). + | b <3 + | + |->vmeta_equal_cd: + |.if FFI + | sub PC, PC, #4 + | str BASE, L->base + | mov CARG1, L + | mov CARG2, INS + | str PC, SAVE_PC + | bl extern lj_meta_equal_cd // (lua_State *L, BCIns op) + | // Returns 0/1 or TValue * (metamethod). 
+ | b <3 + |.endif + | + |->vmeta_istype: + | sub PC, PC, #4 + | str BASE, L->base + | mov CARG1, L + | mov CARG2, RA + | mov CARG3, RC + | str PC, SAVE_PC + | bl extern lj_meta_istype // (lua_State *L, BCReg ra, BCReg tp) + | b ->cont_nop + | + |//-- Arithmetic metamethods --------------------------------------------- + | + |->vmeta_arith_vn: + | add CARG3, BASE, RB, lsl #3 + | add CARG4, KBASE, RC, lsl #3 + | b >1 + | + |->vmeta_arith_nv: + | add CARG4, BASE, RB, lsl #3 + | add CARG3, KBASE, RC, lsl #3 + | b >1 + | + |->vmeta_unm: + | add CARG3, BASE, RC, lsl #3 + | mov CARG4, CARG3 + | b >1 + | + |->vmeta_arith_vv: + | add CARG3, BASE, RB, lsl #3 + | add CARG4, BASE, RC, lsl #3 + |1: + | uxtb CARG5w, INSw + | add CARG2, BASE, RA, lsl #3 + | str BASE, L->base + | mov CARG1, L + | str PC, SAVE_PC + | bl extern lj_meta_arith // (lua_State *L, TValue *ra,*rb,*rc, BCReg op) + | // Returns NULL (finished) or TValue * (metamethod). + | cbz CRET1, ->cont_nop + | + | // Call metamethod for binary op. + |->vmeta_binop: + | // BASE = old base, CRET1 = new base, stack = cont/func/o1/o2 + | sub TMP1, CRET1, BASE + | str PC, [CRET1, #-24] // [cont|PC] + | add PC, TMP1, #FRAME_CONT + | mov BASE, CRET1 + | mov NARGS8:RC, #16 // 2 args for func(o1, o2). + | b ->vm_call_dispatch + | + |->vmeta_len: + | add CARG2, BASE, RC, lsl #3 +#if LJ_52 + | mov TAB:RC, TAB:CARG1 // Save table (ignored for other types). +#endif + | str BASE, L->base + | mov CARG1, L + | str PC, SAVE_PC + | bl extern lj_meta_len // (lua_State *L, TValue *o) + | // Returns NULL (retry) or TValue * (metamethod base). +#if LJ_52 + | cbnz CRET1, ->vmeta_binop // Binop call for compatibility. + | mov TAB:CARG1, TAB:RC + | b ->BC_LEN_Z +#else + | b ->vmeta_binop // Binop call for compatibility. +#endif + | + |//-- Call metamethod ---------------------------------------------------- + | + |->vmeta_call: // Resolve and call __call metamethod. 
+ | // RB = old base, BASE = new base, RC = nargs*8 + | mov CARG1, L + | str RB, L->base // This is the callers base! + | sub CARG2, BASE, #16 + | str PC, SAVE_PC + | add CARG3, BASE, NARGS8:RC + | bl extern lj_meta_call // (lua_State *L, TValue *func, TValue *top) + | ldr LFUNC:CARG3, [BASE, FRAME_FUNC] // Guaranteed to be a function here. + | add NARGS8:RC, NARGS8:RC, #8 // Got one more argument now. + | and LFUNC:CARG3, CARG3, #LJ_GCVMASK + | ins_call + | + |->vmeta_callt: // Resolve __call for BC_CALLT. + | // BASE = old base, RA = new base, RC = nargs*8 + | mov CARG1, L + | str BASE, L->base + | sub CARG2, RA, #16 + | str PC, SAVE_PC + | add CARG3, RA, NARGS8:RC + | bl extern lj_meta_call // (lua_State *L, TValue *func, TValue *top) + | ldr TMP1, [RA, FRAME_FUNC] // Guaranteed to be a function here. + | ldr PC, [BASE, FRAME_PC] + | add NARGS8:RC, NARGS8:RC, #8 // Got one more argument now. + | and LFUNC:CARG3, TMP1, #LJ_GCVMASK + | b ->BC_CALLT2_Z + | + |//-- Argument coercion for 'for' statement ------------------------------ + | + |->vmeta_for: + | mov CARG1, L + | str BASE, L->base + | mov CARG2, RA + | str PC, SAVE_PC + | bl extern lj_meta_for // (lua_State *L, TValue *base) + | ldr INSw, [PC, #-4] + |.if JIT + | uxtb TMP0w, INSw + |.endif + | decode_RA RA, INS + | decode_RD RC, INS + |.if JIT + | cmp TMP0, #BC_JFORI + | beq =>BC_JFORI + |.endif + | b =>BC_FORI + | + |//----------------------------------------------------------------------- + |//-- Fast functions ----------------------------------------------------- + |//----------------------------------------------------------------------- + | + |.macro .ffunc, name + |->ff_ .. name: + |.endmacro + | + |.macro .ffunc_1, name + |->ff_ .. name: + | ldr CARG1, [BASE] + | cmp NARGS8:RC, #8 + | blo ->fff_fallback + |.endmacro + | + |.macro .ffunc_2, name + |->ff_ .. 
name: + | ldp CARG1, CARG2, [BASE] + | cmp NARGS8:RC, #16 + | blo ->fff_fallback + |.endmacro + | + |.macro .ffunc_n, name + | .ffunc name + | ldr CARG1, [BASE] + | cmp NARGS8:RC, #8 + | ldr FARG1, [BASE] + | blo ->fff_fallback + | checknum CARG1, ->fff_fallback + |.endmacro + | + |.macro .ffunc_nn, name + | .ffunc name + | ldp CARG1, CARG2, [BASE] + | cmp NARGS8:RC, #16 + | ldp FARG1, FARG2, [BASE] + | blo ->fff_fallback + | checknum CARG1, ->fff_fallback + | checknum CARG2, ->fff_fallback + |.endmacro + | + |// Inlined GC threshold check. Caveat: uses CARG1 and CARG2. + |.macro ffgccheck + | ldp CARG1, CARG2, GL->gc.total // Assumes threshold follows total. + | cmp CARG1, CARG2 + | blt >1 + | bl ->fff_gcstep + |1: + |.endmacro + | + |//-- Base library: checks ----------------------------------------------- + | + |.ffunc_1 assert + | ldr PC, [BASE, FRAME_PC] + | mov_false TMP1 + | cmp CARG1, TMP1 + | bhs ->fff_fallback + | str CARG1, [BASE, #-16] + | sub RB, BASE, #8 + | subs RA, NARGS8:RC, #8 + | add RC, NARGS8:RC, #8 // Compute (nresults+1)*8. + | cbz RA, ->fff_res // Done if exactly 1 argument. + |1: + | ldr CARG1, [RB, #16] + | sub RA, RA, #8 + | str CARG1, [RB], #8 + | cbnz RA, <1 + | b ->fff_res + | + |.ffunc_1 type + | mov TMP0, #~LJ_TISNUM + | asr ITYPE, CARG1, #47 + | cmn ITYPE, #~LJ_TISNUM + | csinv TMP1, TMP0, ITYPE, lo + | add TMP1, TMP1, #offsetof(GCfuncC, upvalue)/8 + | ldr CARG1, [CFUNC:CARG3, TMP1, lsl #3] + | b ->fff_restv + | + |//-- Base library: getters and setters --------------------------------- + | + |.ffunc_1 getmetatable + | asr ITYPE, CARG1, #47 + | cmn ITYPE, #-LJ_TTAB + | ccmn ITYPE, #-LJ_TUDATA, #4, ne + | and TAB:CARG1, CARG1, #LJ_GCVMASK + | bne >6 + |1: // Field metatable must be at same offset for GCtab and GCudata! 
+ | ldr TAB:RB, TAB:CARG1->metatable + |2: + | mov CARG1, TISNIL + | ldr STR:RC, GL->gcroot[GCROOT_MMNAME+MM_metatable] + | cbz TAB:RB, ->fff_restv + | ldr TMP1w, TAB:RB->hmask + | ldr TMP2w, STR:RC->sid + | ldr NODE:CARG3, TAB:RB->node + | and TMP1w, TMP1w, TMP2w // idx = str->sid & tab->hmask + | add TMP1, TMP1, TMP1, lsl #1 + | movn CARG4, #~LJ_TSTR + | add NODE:CARG3, NODE:CARG3, TMP1, lsl #3 // node = tab->node + idx*3*8 + | add CARG4, STR:RC, CARG4, lsl #47 // Tagged key to look for. + |3: // Rearranged logic, because we expect _not_ to find the key. + | ldp CARG1, TMP0, NODE:CARG3->val + | ldr NODE:CARG3, NODE:CARG3->next + | cmp TMP0, CARG4 + | beq >5 + | cbnz NODE:CARG3, <3 + |4: + | mov CARG1, RB // Use metatable as default result. + | movk CARG1, #(LJ_TTAB>>1)&0xffff, lsl #48 + | b ->fff_restv + |5: + | cmp TMP0, TISNIL + | bne ->fff_restv + | b <4 + | + |6: + | movn TMP0, #~LJ_TISNUM + | cmp ITYPE, TMP0 + | csel ITYPE, ITYPE, TMP0, hs + | sub TMP1, GL, ITYPE, lsl #3 + | ldr TAB:RB, [TMP1, #offsetof(global_State, gcroot[GCROOT_BASEMT])-8] + | b <2 + | + |.ffunc_2 setmetatable + | // Fast path: no mt for table yet and not clearing the mt. + | checktp TMP1, CARG1, LJ_TTAB, ->fff_fallback + | ldr TAB:TMP0, TAB:TMP1->metatable + | asr ITYPE, CARG2, #47 + | ldrb TMP2w, TAB:TMP1->marked + | cmn ITYPE, #-LJ_TTAB + | and TAB:CARG2, CARG2, #LJ_GCVMASK + | ccmp TAB:TMP0, #0, #0, eq + | bne ->fff_fallback + | str TAB:CARG2, TAB:TMP1->metatable + | tbz TMP2w, #2, ->fff_restv // isblack(table) + | barrierback TAB:TMP1, TMP2w, TMP0 + | b ->fff_restv + | + |.ffunc rawget + | ldr CARG2, [BASE] + | cmp NARGS8:RC, #16 + | blo ->fff_fallback + | checktab CARG2, ->fff_fallback + | mov CARG1, L + | add CARG3, BASE, #8 + | bl extern lj_tab_get // (lua_State *L, GCtab *t, cTValue *key) + | // Returns cTValue *. 
+ | ldr CARG1, [CRET1] + | b ->fff_restv + | + |//-- Base library: conversions ------------------------------------------ + | + |.ffunc tonumber + | // Only handles the number case inline (without a base argument). + | ldr CARG1, [BASE] + | cmp NARGS8:RC, #8 + | bne ->fff_fallback + | checknumber CARG1, ->fff_fallback + | b ->fff_restv + | + |.ffunc_1 tostring + | // Only handles the string or number case inline. + | asr ITYPE, CARG1, #47 + | cmn ITYPE, #-LJ_TSTR + | // A __tostring method in the string base metatable is ignored. + | beq ->fff_restv + | // Handle numbers inline, unless a number base metatable is present. + | ldr TMP1, GL->gcroot[GCROOT_BASEMT_NUM] + | str BASE, L->base + | cmn ITYPE, #-LJ_TISNUM + | ccmp TMP1, #0, #0, ls + | str PC, SAVE_PC // Redundant (but a defined value). + | bne ->fff_fallback + | ffgccheck + | mov CARG1, L + | mov CARG2, BASE + | bl extern lj_strfmt_number // (lua_State *L, cTValue *o) + | // Returns GCstr *. + | movn TMP1, #~LJ_TSTR + | ldr BASE, L->base + | add CARG1, CARG1, TMP1, lsl #47 + | b ->fff_restv + | + |//-- Base library: iterators ------------------------------------------- + | + |.ffunc_1 next + | checktp CARG1, LJ_TTAB, ->fff_fallback + | str TISNIL, [BASE, NARGS8:RC] // Set missing 2nd arg to nil. + | ldr PC, [BASE, FRAME_PC] + | add CARG2, BASE, #8 + | sub CARG3, BASE, #16 + | bl extern lj_tab_next // (GCtab *t, cTValue *key, TValue *o) + | // Returns 1=found, 0=end, -1=error. + | mov RC, #(2+1)*8 + | tbnz CRET1w, #31, ->fff_fallback // Invalid key. + | cbnz CRET1, ->fff_res // Found key/value. + | // End of traversal: return nil. 
+ | str TISNIL, [BASE, #-16] + | b ->fff_res1 + | + |.ffunc_1 pairs + | checktp TMP1, CARG1, LJ_TTAB, ->fff_fallback +#if LJ_52 + | ldr TAB:CARG2, TAB:TMP1->metatable +#endif + | ldr CFUNC:CARG4, CFUNC:CARG3->upvalue[0] + | ldr PC, [BASE, FRAME_PC] +#if LJ_52 + | cbnz TAB:CARG2, ->fff_fallback +#endif + | mov RC, #(3+1)*8 + | stp CARG1, TISNIL, [BASE, #-8] + | str CFUNC:CARG4, [BASE, #-16] + | b ->fff_res + | + |.ffunc_2 ipairs_aux + | checktab CARG1, ->fff_fallback + | checkint CARG2, ->fff_fallback + | ldr TMP1w, TAB:CARG1->asize + | ldr CARG3, TAB:CARG1->array + | ldr TMP0w, TAB:CARG1->hmask + | add CARG2w, CARG2w, #1 + | cmp CARG2w, TMP1w + | ldr PC, [BASE, FRAME_PC] + | add TMP2, CARG2, TISNUM + | mov RC, #(0+1)*8 + | str TMP2, [BASE, #-16] + | bhs >2 // Not in array part? + | ldr TMP0, [CARG3, CARG2, lsl #3] + |1: + | mov TMP1, #(2+1)*8 + | cmp TMP0, TISNIL + | str TMP0, [BASE, #-8] + | csel RC, RC, TMP1, eq + | b ->fff_res + |2: // Check for empty hash part first. Otherwise call C function. + | cbz TMP0w, ->fff_res + | bl extern lj_tab_getinth // (GCtab *t, int32_t key) + | // Returns cTValue * or NULL. + | cbz CRET1, ->fff_res + | ldr TMP0, [CRET1] + | b <1 + | + |.ffunc_1 ipairs + | checktp TMP1, CARG1, LJ_TTAB, ->fff_fallback +#if LJ_52 + | ldr TAB:CARG2, TAB:TMP1->metatable +#endif + | ldr CFUNC:CARG4, CFUNC:CARG3->upvalue[0] + | ldr PC, [BASE, FRAME_PC] +#if LJ_52 + | cbnz TAB:CARG2, ->fff_fallback +#endif + | mov RC, #(3+1)*8 + | stp CARG1, TISNUM, [BASE, #-8] + | str CFUNC:CARG4, [BASE, #-16] + | b ->fff_res + | + |//-- Base library: catch errors ---------------------------------------- + | + |.ffunc pcall + | cmp NARGS8:RC, #8 + | ldrb TMP0w, GL->hookmask + | blo ->fff_fallback + | sub NARGS8:RC, NARGS8:RC, #8 + | mov RB, BASE + | add BASE, BASE, #16 + | ubfx TMP0w, TMP0w, #HOOK_ACTIVE_SHIFT, #1 + | add PC, TMP0, #16+FRAME_PCALL + | beq ->vm_call_dispatch + |1: + | add TMP2, BASE, NARGS8:RC + |2: + | ldr TMP0, [TMP2, #-16] + | str TMP0, [TMP2, #-8]! 
+ | cmp TMP2, BASE + | bne <2 + | b ->vm_call_dispatch + | + |.ffunc xpcall + | ldp CARG1, CARG2, [BASE] + | ldrb TMP0w, GL->hookmask + | subs NARGS8:TMP1, NARGS8:RC, #16 + | blo ->fff_fallback + | mov RB, BASE + | asr ITYPE, CARG2, #47 + | ubfx TMP0w, TMP0w, #HOOK_ACTIVE_SHIFT, #1 + | cmn ITYPE, #-LJ_TFUNC + | add PC, TMP0, #24+FRAME_PCALL + | bne ->fff_fallback // Traceback must be a function. + | mov NARGS8:RC, NARGS8:TMP1 + | add BASE, BASE, #24 + | stp CARG2, CARG1, [RB] // Swap function and traceback. + | cbz NARGS8:RC, ->vm_call_dispatch + | b <1 + | + |//-- Coroutine library -------------------------------------------------- + | + |.macro coroutine_resume_wrap, resume + |.if resume + |.ffunc_1 coroutine_resume + | checktp CARG1, LJ_TTHREAD, ->fff_fallback + |.else + |.ffunc coroutine_wrap_aux + | ldr L:CARG1, CFUNC:CARG3->upvalue[0].gcr + | and L:CARG1, CARG1, #LJ_GCVMASK + |.endif + | ldr PC, [BASE, FRAME_PC] + | str BASE, L->base + | ldp RB, CARG2, L:CARG1->base + | ldrb TMP1w, L:CARG1->status + | add TMP0, CARG2, TMP1 + | str PC, SAVE_PC + | cmp TMP0, RB + | beq ->fff_fallback + | cmp TMP1, #LUA_YIELD + | add TMP0, CARG2, #8 + | csel CARG2, CARG2, TMP0, hs + | ldr CARG4, L:CARG1->maxstack + | add CARG3, CARG2, NARGS8:RC + | ldr RB, L:CARG1->cframe + | ccmp CARG3, CARG4, #2, ls + | ccmp RB, #0, #2, ls + | bhi ->fff_fallback + |.if resume + | sub CARG3, CARG3, #8 // Keep resumed thread in stack for GC. + | add BASE, BASE, #8 + | sub NARGS8:RC, NARGS8:RC, #8 + |.endif + | str CARG3, L:CARG1->top + | str BASE, L->top + | cbz NARGS8:RC, >3 + |2: // Move args to coroutine. + | ldr TMP0, [BASE, RB] + | cmp RB, NARGS8:RC + | str TMP0, [CARG2, RB] + | add RB, RB, #8 + | bne <2 + |3: + | mov CARG3, #0 + | mov L:RA, L:CARG1 + | mov CARG4, #0 + | bl ->vm_resume // (lua_State *L, TValue *base, 0, 0) + | // Returns thread status. 
+ |4: + | ldp CARG3, CARG4, L:RA->base + | cmp CRET1, #LUA_YIELD + | ldr BASE, L->base + | str L, GL->cur_L + | st_vmstate ST_INTERP + | bhi >8 + | sub RC, CARG4, CARG3 + | ldr CARG1, L->maxstack + | add CARG2, BASE, RC + | cbz RC, >6 // No results? + | cmp CARG2, CARG1 + | mov RB, #0 + | bhi >9 // Need to grow stack? + | + | sub CARG4, RC, #8 + | str CARG3, L:RA->top // Clear coroutine stack. + |5: // Move results from coroutine. + | ldr TMP0, [CARG3, RB] + | cmp RB, CARG4 + | str TMP0, [BASE, RB] + | add RB, RB, #8 + | bne <5 + |6: + |.if resume + | mov_true TMP1 + | add RC, RC, #16 + |7: + | str TMP1, [BASE, #-8] // Prepend true/false to results. + | sub RA, BASE, #8 + |.else + | mov RA, BASE + | add RC, RC, #8 + |.endif + | ands CARG1, PC, #FRAME_TYPE + | str PC, SAVE_PC + | str RCw, SAVE_MULTRES + | beq ->BC_RET_Z + | b ->vm_return + | + |8: // Coroutine returned with error (at co->top-1). + |.if resume + | ldr TMP0, [CARG4, #-8]! + | mov_false TMP1 + | mov RC, #(2+1)*8 + | str CARG4, L:RA->top // Remove error from coroutine stack. + | str TMP0, [BASE] // Copy error message. + | b <7 + |.else + | mov CARG1, L + | mov CARG2, L:RA + | bl extern lj_ffh_coroutine_wrap_err // (lua_State *L, lua_State *co) + | // Never returns. + |.endif + | + |9: // Handle stack expansion on return from yield. + | mov CARG1, L + | lsr CARG2, RC, #3 + | bl extern lj_state_growstack // (lua_State *L, int n) + | mov CRET1, #0 + | b <4 + |.endmacro + | + | coroutine_resume_wrap 1 // coroutine.resume + | coroutine_resume_wrap 0 // coroutine.wrap + | + |.ffunc coroutine_yield + | ldr TMP0, L->cframe + | add TMP1, BASE, NARGS8:RC + | mov CRET1, #LUA_YIELD + | stp BASE, TMP1, L->base + | tbz TMP0, #0, ->fff_fallback + | str xzr, L->cframe + | strb CRET1w, L->status + | b ->vm_leave_unw + | + |//-- Math library ------------------------------------------------------- + | + |.macro math_round, func, round + | .ffunc math_ .. 
func + | ldr CARG1, [BASE] + | cmp NARGS8:RC, #8 + | ldr d0, [BASE] + | blo ->fff_fallback + | cmp TISNUMhi, CARG1, lsr #32 + | beq ->fff_restv + | blo ->fff_fallback + | round d0, d0 + | b ->fff_resn + |.endmacro + | + | math_round floor, frintm + | math_round ceil, frintp + | + |.ffunc_1 math_abs + | checknumber CARG1, ->fff_fallback + | and CARG1, CARG1, #U64x(7fffffff,ffffffff) + | bne ->fff_restv + | eor CARG2w, CARG1w, CARG1w, asr #31 + | movz CARG3, #0x41e0, lsl #48 // 2^31. + | subs CARG1w, CARG2w, CARG1w, asr #31 + | add CARG1, CARG1, TISNUM + | csel CARG1, CARG1, CARG3, pl + | // Fallthrough. + | + |->fff_restv: + | // CARG1 = TValue result. + | ldr PC, [BASE, FRAME_PC] + | str CARG1, [BASE, #-16] + |->fff_res1: + | // PC = return. + | mov RC, #(1+1)*8 + |->fff_res: + | // RC = (nresults+1)*8, PC = return. + | ands CARG1, PC, #FRAME_TYPE + | str RCw, SAVE_MULTRES + | sub RA, BASE, #16 + | bne ->vm_return + | ldr INSw, [PC, #-4] + | decode_RB RB, INS + |5: + | cmp RC, RB, lsl #3 // More results expected? + | blo >6 + | decode_RA TMP1, INS + | // Adjust BASE. KBASE is assumed to be set for the calling frame. + | sub BASE, RA, TMP1, lsl #3 + | ins_next + | + |6: // Fill up results with nil. + | add TMP1, RA, RC + | add RC, RC, #8 + | str TISNIL, [TMP1, #-8] + | b <5 + | + |.macro math_extern, func + | .ffunc_n math_ .. func + | bl extern func + | b ->fff_resn + |.endmacro + | + |.macro math_extern2, func + | .ffunc_nn math_ .. func + | bl extern func + | b ->fff_resn + |.endmacro + | + |.ffunc_n math_sqrt + | fsqrt d0, d0 + |->fff_resn: + | ldr PC, [BASE, FRAME_PC] + | str d0, [BASE, #-16] + | b ->fff_res1 + | + |.ffunc math_log + | ldr CARG1, [BASE] + | cmp NARGS8:RC, #8 + | ldr FARG1, [BASE] + | bne ->fff_fallback // Need exactly 1 argument. 
+ | checknum CARG1, ->fff_fallback + | bl extern log + | b ->fff_resn + | + | math_extern log10 + | math_extern exp + | math_extern sin + | math_extern cos + | math_extern tan + | math_extern asin + | math_extern acos + | math_extern atan + | math_extern sinh + | math_extern cosh + | math_extern tanh + | math_extern2 pow + | math_extern2 atan2 + | math_extern2 fmod + | + |.ffunc_2 math_ldexp + | ldr FARG1, [BASE] + | checknum CARG1, ->fff_fallback + | checkint CARG2, ->fff_fallback + | sxtw CARG1, CARG2w + | bl extern ldexp // (double x, int exp) + | b ->fff_resn + | + |.ffunc_n math_frexp + | add CARG1, sp, TMPDofs + | bl extern frexp + | ldr CARG2w, TMPD + | ldr PC, [BASE, FRAME_PC] + | str d0, [BASE, #-16] + | mov RC, #(2+1)*8 + | add CARG2, CARG2, TISNUM + | str CARG2, [BASE, #-8] + | b ->fff_res + | + |.ffunc_n math_modf + | sub CARG1, BASE, #16 + | ldr PC, [BASE, FRAME_PC] + | bl extern modf + | mov RC, #(2+1)*8 + | str d0, [BASE, #-8] + | b ->fff_res + | + |.macro math_minmax, name, cond, fcond + | .ffunc_1 name + | add RB, BASE, RC + | add RA, BASE, #8 + | checkint CARG1, >4 + |1: // Handle integers. + | ldr CARG2, [RA] + | cmp RA, RB + | bhs ->fff_restv + | checkint CARG2, >3 + | cmp CARG1w, CARG2w + | add RA, RA, #8 + | csel CARG1, CARG2, CARG1, cond + | b <1 + |3: // Convert intermediate result to number and continue below. + | scvtf d0, CARG1w + | blo ->fff_fallback + | ldr d1, [RA] + | b >6 + | + |4: + | ldr d0, [BASE] + | blo ->fff_fallback + |5: // Handle numbers. + | ldr CARG2, [RA] + | ldr d1, [RA] + | cmp RA, RB + | bhs ->fff_resn + | checknum CARG2, >7 + |6: + | fcmp d0, d1 + | add RA, RA, #8 + | fcsel d0, d1, d0, fcond + | b <5 + |7: // Convert integer to number and continue above. 
+ | scvtf d1, CARG2w + | blo ->fff_fallback + | b <6 + |.endmacro + | + | math_minmax math_min, gt, pl + | math_minmax math_max, lt, le + | + |//-- String library ----------------------------------------------------- + | + |.ffunc string_byte // Only handle the 1-arg case here. + | ldp PC, CARG1, [BASE, FRAME_PC] + | cmp NARGS8:RC, #8 + | asr ITYPE, CARG1, #47 + | ccmn ITYPE, #-LJ_TSTR, #0, eq + | and STR:CARG1, CARG1, #LJ_GCVMASK + | bne ->fff_fallback + | ldrb TMP0w, STR:CARG1[1] // Access is always ok (NUL at end). + | ldr CARG3w, STR:CARG1->len + | add TMP0, TMP0, TISNUM + | str TMP0, [BASE, #-16] + | mov RC, #(0+1)*8 + | cbz CARG3, ->fff_res + | b ->fff_res1 + | + |.ffunc string_char // Only handle the 1-arg case here. + | ffgccheck + | ldp PC, CARG1, [BASE, FRAME_PC] + | cmp CARG1w, #255 + | ccmp NARGS8:RC, #8, #0, ls // Need exactly 1 argument. + | bne ->fff_fallback + | checkint CARG1, ->fff_fallback + | mov CARG3, #1 + | // Point to the char inside the integer in the stack slot. + |.if ENDIAN_LE + | mov CARG2, BASE + |.else + | add CARG2, BASE, #7 + |.endif + |->fff_newstr: + | // CARG2 = str, CARG3 = len. + | str BASE, L->base + | mov CARG1, L + | str PC, SAVE_PC + | bl extern lj_str_new // (lua_State *L, char *str, size_t l) + |->fff_resstr: + | // Returns GCstr *. 
+ | ldr BASE, L->base + | movn TMP1, #~LJ_TSTR + | add CARG1, CARG1, TMP1, lsl #47 + | b ->fff_restv + | + |.ffunc string_sub + | ffgccheck + | ldr CARG1, [BASE] + | ldr CARG3, [BASE, #16] + | cmp NARGS8:RC, #16 + | movn RB, #0 + | beq >1 + | blo ->fff_fallback + | checkint CARG3, ->fff_fallback + | sxtw RB, CARG3w + |1: + | ldr CARG2, [BASE, #8] + | checkstr CARG1, ->fff_fallback + | ldr TMP1w, STR:CARG1->len + | checkint CARG2, ->fff_fallback + | sxtw CARG2, CARG2w + | // CARG1 = str, TMP1 = str->len, CARG2 = start, RB = end + | add TMP2, RB, TMP1 + | cmp RB, #0 + | add TMP0, CARG2, TMP1 + | csinc RB, RB, TMP2, ge // if (end < 0) end += len+1 + | cmp CARG2, #0 + | csinc CARG2, CARG2, TMP0, ge // if (start < 0) start += len+1 + | cmp RB, #0 + | csel RB, RB, xzr, ge // if (end < 0) end = 0 + | cmp CARG2, #1 + | csinc CARG2, CARG2, xzr, ge // if (start < 1) start = 1 + | cmp RB, TMP1 + | csel RB, RB, TMP1, le // if (end > len) end = len + | add CARG1, STR:CARG1, #sizeof(GCstr)-1 + | subs CARG3, RB, CARG2 // len = end - start + | add CARG2, CARG1, CARG2 + | add CARG3, CARG3, #1 // len += 1 + | bge ->fff_newstr + | add STR:CARG1, GL, #offsetof(global_State, strempty) + | movn TMP1, #~LJ_TSTR + | add CARG1, CARG1, TMP1, lsl #47 + | b ->fff_restv + | + |.macro ffstring_op, name + | .ffunc string_ .. name + | ffgccheck + | ldr CARG2, [BASE] + | cmp NARGS8:RC, #8 + | asr ITYPE, CARG2, #47 + | ccmn ITYPE, #-LJ_TSTR, #0, hs + | and STR:CARG2, CARG2, #LJ_GCVMASK + | bne ->fff_fallback + | ldr TMP0, GL->tmpbuf.b + | add SBUF:CARG1, GL, #offsetof(global_State, tmpbuf) + | str BASE, L->base + | str PC, SAVE_PC + | str L, GL->tmpbuf.L + | str TMP0, GL->tmpbuf.w + | bl extern lj_buf_putstr_ .. name + | bl extern lj_buf_tostr + | b ->fff_resstr + |.endmacro + | + |ffstring_op reverse + |ffstring_op lower + |ffstring_op upper + | + |//-- Bit library -------------------------------------------------------- + | + |// FP number to bit conversion for soft-float. 
Clobbers CARG1-CARG3 + |->vm_tobit_fb: + | bls ->fff_fallback + | add CARG2, CARG1, CARG1 + | mov CARG3, #1076 + | sub CARG3, CARG3, CARG2, lsr #53 + | cmp CARG3, #53 + | bhi >1 + | and CARG2, CARG2, #U64x(001fffff,ffffffff) + | orr CARG2, CARG2, #U64x(00200000,00000000) + | cmp CARG1, #0 + | lsr CARG2, CARG2, CARG3 + | cneg CARG1w, CARG2w, mi + | br lr + |1: + | mov CARG1w, #0 + | br lr + | + |.macro .ffunc_bit, name + | .ffunc_1 bit_..name + | adr lr, >1 + | checkint CARG1, ->vm_tobit_fb + |1: + |.endmacro + | + |.macro .ffunc_bit_op, name, ins + | .ffunc_bit name + | mov RA, #8 + | mov TMP0w, CARG1w + | adr lr, >2 + |1: + | ldr CARG1, [BASE, RA] + | cmp RA, NARGS8:RC + | add RA, RA, #8 + | bge >9 + | checkint CARG1, ->vm_tobit_fb + |2: + | ins TMP0w, TMP0w, CARG1w + | b <1 + |.endmacro + | + |.ffunc_bit_op band, and + |.ffunc_bit_op bor, orr + |.ffunc_bit_op bxor, eor + | + |.ffunc_bit tobit + | mov TMP0w, CARG1w + |9: // Label reused by .ffunc_bit_op users. + | add CARG1, TMP0, TISNUM + | b ->fff_restv + | + |.ffunc_bit bswap + | rev TMP0w, CARG1w + | add CARG1, TMP0, TISNUM + | b ->fff_restv + | + |.ffunc_bit bnot + | mvn TMP0w, CARG1w + | add CARG1, TMP0, TISNUM + | b ->fff_restv + | + |.macro .ffunc_bit_sh, name, ins, shmod + | .ffunc bit_..name + | ldp TMP0, CARG1, [BASE] + | cmp NARGS8:RC, #16 + | blo ->fff_fallback + | adr lr, >1 + | checkint CARG1, ->vm_tobit_fb + |1: + |.if shmod == 0 + | mov TMP1, CARG1 + |.else + | neg TMP1, CARG1 + |.endif + | mov CARG1, TMP0 + | adr lr, >2 + | checkint CARG1, ->vm_tobit_fb + |2: + | ins TMP0w, CARG1w, TMP1w + | add CARG1, TMP0, TISNUM + | b ->fff_restv + |.endmacro + | + |.ffunc_bit_sh lshift, lsl, 0 + |.ffunc_bit_sh rshift, lsr, 0 + |.ffunc_bit_sh arshift, asr, 0 + |.ffunc_bit_sh rol, ror, 1 + |.ffunc_bit_sh ror, ror, 0 + | + |//----------------------------------------------------------------------- + | + |->fff_fallback: // Call fast function fallback handler. 
+ | // BASE = new base, RC = nargs*8 + | ldp CFUNC:CARG3, PC, [BASE, FRAME_FUNC] // Fallback may overwrite PC. + | ldr TMP2, L->maxstack + | add TMP1, BASE, NARGS8:RC + | stp BASE, TMP1, L->base + | and CFUNC:CARG3, CARG3, #LJ_GCVMASK + | add TMP1, TMP1, #8*LUA_MINSTACK + | ldr CARG3, CFUNC:CARG3->f + | str PC, SAVE_PC // Redundant (but a defined value). + | cmp TMP1, TMP2 + | mov CARG1, L + | bhi >5 // Need to grow stack. + | blr_auth CARG3 // (lua_State *L) + | // Either throws an error, or recovers and returns -1, 0 or nresults+1. + | ldr BASE, L->base + | cmp CRET1w, #0 + | lsl RC, CRET1, #3 + | sub RA, BASE, #16 + | bgt ->fff_res // Returned nresults+1? + |1: // Returned 0 or -1: retry fast path. + | ldr CARG1, L->top + | ldr CFUNC:CARG3, [BASE, FRAME_FUNC] + | sub NARGS8:RC, CARG1, BASE + | bne ->vm_call_tail // Returned -1? + | and CFUNC:CARG3, CARG3, #LJ_GCVMASK + | ins_callt // Returned 0: retry fast path. + | + |// Reconstruct previous base for vmeta_call during tailcall. + |->vm_call_tail: + | ands TMP0, PC, #FRAME_TYPE + | and TMP1, PC, #~FRAME_TYPEP + | bne >3 + | ldrb RAw, [PC, #-4+OFS_RA] + | lsl RA, RA, #3 + | add TMP1, RA, #16 + |3: + | sub RB, BASE, TMP1 + | b ->vm_call_dispatch // Resolve again for tailcall. + | + |5: // Grow stack for fallback handler. + | mov CARG2, #LUA_MINSTACK + | bl extern lj_state_growstack // (lua_State *L, int n) + | ldr BASE, L->base + | cmp CARG1, CARG1 // Set zero-flag to force retry. + | b <1 + | + |->fff_gcstep: // Call GC step function. + | // BASE = new base, RC = nargs*8 + | sp_auth + | add CARG2, BASE, NARGS8:RC // Calculate L->top. + | mov RA, lr + | stp BASE, CARG2, L->base + | str PC, SAVE_PC // Redundant (but a defined value). + | mov CARG1, L + | bl extern lj_gc_step // (lua_State *L) + | ldp BASE, CARG2, L->base + | ldr CFUNC:CARG3, [BASE, FRAME_FUNC] + | mov lr, RA // Help return address predictor. + | sub NARGS8:RC, CARG2, BASE // Calculate nargs*8. 
+ | and CFUNC:CARG3, CARG3, #LJ_GCVMASK + | ret_auth + | + |//----------------------------------------------------------------------- + |//-- Special dispatch targets ------------------------------------------- + |//----------------------------------------------------------------------- + | + |->vm_record: // Dispatch target for recording phase. + |.if JIT + | ldrb CARG1w, GL->hookmask + | tst CARG1, #HOOK_VMEVENT // No recording while in vmevent. + | bne >5 + | // Decrement the hookcount for consistency, but always do the call. + | ldr CARG2w, GL->hookcount + | tst CARG1, #HOOK_ACTIVE + | bne >1 + | sub CARG2w, CARG2w, #1 + | tst CARG1, #LUA_MASKLINE|LUA_MASKCOUNT + | beq >1 + | str CARG2w, GL->hookcount + | b >1 + |.endif + | + |->vm_rethook: // Dispatch target for return hooks. + | ldrb TMP2w, GL->hookmask + | tbz TMP2w, #HOOK_ACTIVE_SHIFT, >1 // Hook already active? + |5: // Re-dispatch to static ins. + | ldr TMP0, [TMP1, #GG_G2DISP+GG_DISP2STATIC] + | br_auth TMP0 + | + |->vm_inshook: // Dispatch target for instr/line hooks. + | ldrb TMP2w, GL->hookmask + | ldr TMP3w, GL->hookcount + | tbnz TMP2w, #HOOK_ACTIVE_SHIFT, <5 // Hook already active? + | tst TMP2w, #LUA_MASKLINE|LUA_MASKCOUNT + | beq <5 + | sub TMP3w, TMP3w, #1 + | str TMP3w, GL->hookcount + | cbz TMP3w, >1 + | tbz TMP2w, #LUA_HOOKLINE, <5 + |1: + | mov CARG1, L + | str BASE, L->base + | mov CARG2, PC + | // SAVE_PC must hold the _previous_ PC. The callee updates it with PC. + | bl extern lj_dispatch_ins // (lua_State *L, const BCIns *pc) + |3: + | ldr BASE, L->base + |4: // Re-dispatch to static ins. + | ldr INSw, [PC, #-4] + | add TMP1, GL, INS, uxtb #3 + | decode_RA RA, INS + | ldr TMP0, [TMP1, #GG_G2DISP+GG_DISP2STATIC] + | decode_RD RC, INS + | br_auth TMP0 + | + |->cont_hook: // Continue from hook yield. + | ldr CARG1, [CARG4, #-40] + | add PC, PC, #4 + | str CARG1w, SAVE_MULTRES // Restore MULTRES for *M ins. + | b <4 + | + |->vm_hotloop: // Hot loop counter underflow. 
+ |.if JIT + | ldr LFUNC:CARG3, [BASE, FRAME_FUNC] // Same as curr_topL(L). + | add CARG1, GL, #GG_G2DISP+GG_DISP2J + | and LFUNC:CARG3, CARG3, #LJ_GCVMASK + | str PC, SAVE_PC + | ldr CARG3, LFUNC:CARG3->pc + | mov CARG2, PC + | str L, [GL, #GL_J(L)] + | ldrb CARG3w, [CARG3, #PC2PROTO(framesize)] + | str BASE, L->base + | add CARG3, BASE, CARG3, lsl #3 + | str CARG3, L->top + | bl extern lj_trace_hot // (jit_State *J, const BCIns *pc) + | b <3 + |.endif + | + |->vm_callhook: // Dispatch target for call hooks. + | mov CARG2, PC + |.if JIT + | b >1 + |.endif + | + |->vm_hotcall: // Hot call counter underflow. + |.if JIT + | orr CARG2, PC, #1 + |1: + |.endif + | add TMP1, BASE, NARGS8:RC + | str PC, SAVE_PC + | mov CARG1, L + | sub RA, RA, BASE + | stp BASE, TMP1, L->base + | bl extern lj_dispatch_call // (lua_State *L, const BCIns *pc) + | // Returns ASMFunction. + | ldp BASE, TMP1, L->base + | str xzr, SAVE_PC // Invalidate for subsequent line hook. + | ldr LFUNC:CARG3, [BASE, FRAME_FUNC] + | add RA, BASE, RA + | sub NARGS8:RC, TMP1, BASE + | ldr INSw, [PC, #-4] + | and LFUNC:CARG3, CARG3, #LJ_GCVMASK + | br_auth CRET1 + | + |->cont_stitch: // Trace stitching. + |.if JIT + | // RA = resultptr, CARG4 = meta base + | ldr RBw, SAVE_MULTRES + | ldr INSw, [PC, #-4] + | ldr TRACE:CARG3, [CARG4, #-40] // Save previous trace. + | subs RB, RB, #8 + | decode_RA RC, INS // Call base. + | and CARG3, CARG3, #LJ_GCVMASK + | beq >2 + |1: // Move results down. + | ldr CARG1, [RA] + | add RA, RA, #8 + | subs RB, RB, #8 + | str CARG1, [BASE, RC, lsl #3] + | add RC, RC, #1 + | bne <1 + |2: + | decode_RA RA, INS + | decode_RB RB, INS + | add RA, RA, RB + |3: + | cmp RA, RC + | bhi >9 // More results wanted? + | + | ldrh RAw, TRACE:CARG3->traceno + | ldrh RCw, TRACE:CARG3->link + | cmp RCw, RAw + | beq ->cont_nop // Blacklisted. + | cmp RCw, #0 + | bne =>BC_JLOOP // Jump to stitched trace. + | + | // Stitch a new trace to the previous trace. 
+ | mov CARG1, #GL_J(exitno) + | str RAw, [GL, CARG1] + | mov CARG1, #GL_J(L) + | str L, [GL, CARG1] + | str BASE, L->base + | add CARG1, GL, #GG_G2J + | mov CARG2, PC + | bl extern lj_dispatch_stitch // (jit_State *J, const BCIns *pc) + | ldr BASE, L->base + | b ->cont_nop + | + |9: // Fill up results with nil. + | str TISNIL, [BASE, RC, lsl #3] + | add RC, RC, #1 + | b <3 + |.endif + | + |->vm_profhook: // Dispatch target for profiler hook. +#if LJ_HASPROFILE + | mov CARG1, L + | str BASE, L->base + | mov CARG2, PC + | bl extern lj_dispatch_profile // (lua_State *L, const BCIns *pc) + | // HOOK_PROFILE is off again, so re-dispatch to dynamic instruction. + | ldr BASE, L->base + | sub PC, PC, #4 + | b ->cont_nop +#endif + | + |//----------------------------------------------------------------------- + |//-- Trace exit handler ------------------------------------------------- + |//----------------------------------------------------------------------- + | + |.macro savex_, a, b + | stp d..a, d..b, [sp, #a*8] + | stp x..a, x..b, [sp, #32*8+a*8] + |.endmacro + | + |->vm_exit_handler: + |.if JIT + | sub sp, sp, #(64*8) + | savex_, 0, 1 + | savex_, 2, 3 + | savex_, 4, 5 + | savex_, 6, 7 + | savex_, 8, 9 + | savex_, 10, 11 + | savex_, 12, 13 + | savex_, 14, 15 + | savex_, 16, 17 + | savex_, 18, 19 + | savex_, 20, 21 + | savex_, 22, 23 + | savex_, 24, 25 + | savex_, 26, 27 + | savex_, 28, 29 + | stp d30, d31, [sp, #30*8] + | ldr CARG1, [sp, #64*8] // Load original value of lr. + | add CARG3, sp, #64*8 // Recompute original value of sp. + | mv_vmstate CARG4w, EXIT + | stp xzr, CARG3, [sp, #62*8] // Store 0/sp in RID_LR/RID_SP. + | sub CARG1, CARG1, lr + | ldr L, GL->cur_L + | lsr CARG1, CARG1, #2 + | ldr BASE, GL->jit_base + | sub CARG1, CARG1, #2 + | ldr CARG2w, [lr] // Load trace number. 
+ | st_vmstate CARG4w + |.if ENDIAN_BE + | rev32 CARG2, CARG2 + |.endif + | str BASE, L->base + | ubfx CARG2w, CARG2w, #5, #16 + | str CARG1w, [GL, #GL_J(exitno)] + | str CARG2w, [GL, #GL_J(parent)] + | str L, [GL, #GL_J(L)] + | str xzr, GL->jit_base + | add CARG1, GL, #GG_G2J + | mov CARG2, sp + | bl extern lj_trace_exit // (jit_State *J, ExitState *ex) + | // Returns MULTRES (unscaled) or negated error code. + | ldr CARG2, L->cframe + | ldr BASE, L->base + | and sp, CARG2, #CFRAME_RAWMASK + | ldr PC, SAVE_PC // Get SAVE_PC. + | str L, SAVE_L // Set SAVE_L (on-trace resume/yield). + | b >1 + |.endif + | + |->vm_exit_interp: + | // CARG1 = MULTRES or negated error code, BASE, PC and GL set. + |.if JIT + | ldr L, SAVE_L + |1: + | cmn CARG1w, #LUA_ERRERR + | bhs >9 // Check for error from exit. + | lsl RC, CARG1, #3 + | ldr LFUNC:CARG2, [BASE, FRAME_FUNC] + | movz TISNUM, #(LJ_TISNUM>>1)&0xffff, lsl #48 + | movz TISNUMhi, #(LJ_TISNUM>>1)&0xffff, lsl #16 + | movn TISNIL, #0 + | and LFUNC:CARG2, CARG2, #LJ_GCVMASK + | str RCw, SAVE_MULTRES + | str BASE, L->base + | ldr CARG2, LFUNC:CARG2->pc + | str xzr, GL->jit_base + | mv_vmstate CARG4w, INTERP + | ldr KBASE, [CARG2, #PC2PROTO(k)] + | // Modified copy of ins_next which handles function header dispatch, too. + | ldrb RBw, [PC, # OFS_OP] + | ldr INSw, [PC], #4 + | st_vmstate CARG4w + | cmn CARG1w, #17 // Static dispatch? + | beq >5 + | cmp RBw, #BC_FUNCC+2 // Fast function? + | add TMP1, GL, INS, uxtb #3 + | bhs >4 + |2: + | cmp RBw, #BC_FUNCF // Function header? + | add TMP0, GL, RB, uxtb #3 + | ldr RB, [TMP0, #GG_G2DISP] + | decode_RA RA, INS + | lsr TMP0, INS, #16 + | csel RC, TMP0, RC, lo + | blo >3 + | ldr CARG3, [BASE, FRAME_FUNC] + | sub RC, RC, #8 + | add RA, BASE, RA, lsl #3 // Yes: RA = BASE+framesize*8, RC = nargs*8 + | and LFUNC:CARG3, CARG3, #LJ_GCVMASK + |3: + | br_auth RB + | + |4: // Check frame below fast function. 
+ | ldr CARG1, [BASE, FRAME_PC] + | ands CARG2, CARG1, #FRAME_TYPE + | bne <2 // Trace stitching continuation? + | // Otherwise set KBASE for Lua function below fast function. + | ldr CARG3w, [CARG1, #-4] + | decode_RA CARG1, CARG3 + | sub CARG2, BASE, CARG1, lsl #3 + | ldr LFUNC:CARG3, [CARG2, #-32] + | and LFUNC:CARG3, CARG3, #LJ_GCVMASK + | ldr CARG3, LFUNC:CARG3->pc + | ldr KBASE, [CARG3, #PC2PROTO(k)] + | b <2 + | + |5: // Dispatch to static entry of original ins replaced by BC_JLOOP. + | ldr RA, [GL, #GL_J(trace)] + | decode_RD RC, INS + | ldr TRACE:RA, [RA, RC, lsl #3] + | ldr INSw, TRACE:RA->startins + | add TMP0, GL, INS, uxtb #3 + | decode_RA RA, INS + | ldr RB, [TMP0, #GG_G2DISP+GG_DISP2STATIC] + | decode_RD RC, INS + | br_auth RB + | + |9: // Rethrow error from the right C frame. + | neg CARG2w, CARG1w + | mov CARG1, L + | bl extern lj_err_trace // (lua_State *L, int errcode) + |.endif + | + |//----------------------------------------------------------------------- + |//-- Math helper functions ---------------------------------------------- + |//----------------------------------------------------------------------- + | + | // int lj_vm_modi(int dividend, int divisor); + |->vm_modi: + | eor CARG4w, CARG1w, CARG2w + | cmp CARG4w, #0 + | eor CARG3w, CARG1w, CARG1w, asr #31 + | eor CARG4w, CARG2w, CARG2w, asr #31 + | sub CARG3w, CARG3w, CARG1w, asr #31 + | sub CARG4w, CARG4w, CARG2w, asr #31 + | udiv CARG1w, CARG3w, CARG4w + | msub CARG1w, CARG1w, CARG4w, CARG3w + | ccmp CARG1w, #0, #4, mi + | sub CARG3w, CARG1w, CARG4w + | csel CARG1w, CARG1w, CARG3w, eq + | eor CARG3w, CARG1w, CARG2w + | cmp CARG3w, #0 + | cneg CARG1w, CARG1w, mi + | ret + | + |//----------------------------------------------------------------------- + |//-- Miscellaneous functions -------------------------------------------- + |//----------------------------------------------------------------------- + | + |.define NEXT_TAB, TAB:CARG1 + |.define NEXT_RES, CARG1 + |.define NEXT_IDX, 
CARG2w + |.define NEXT_LIM, CARG3w + |.define NEXT_TMP0, TMP0 + |.define NEXT_TMP0w, TMP0w + |.define NEXT_TMP1, TMP1 + |.define NEXT_TMP1w, TMP1w + |.define NEXT_RES_PTR, sp + |.define NEXT_RES_VAL, [sp] + |.define NEXT_RES_KEY, [sp, #8] + | + |// TValue *lj_vm_next(GCtab *t, uint32_t idx) + |// Next idx returned in CRET2w. + |->vm_next: + |.if JIT + | ldr NEXT_LIM, NEXT_TAB->asize + | ldr NEXT_TMP1, NEXT_TAB->array + |1: // Traverse array part. + | subs NEXT_TMP0w, NEXT_IDX, NEXT_LIM + | bhs >5 // Index points after array part? + | ldr NEXT_TMP0, [NEXT_TMP1, NEXT_IDX, uxtw #3] + | cmn NEXT_TMP0, #-LJ_TNIL + | cinc NEXT_IDX, NEXT_IDX, eq + | beq <1 // Skip holes in array part. + | str NEXT_TMP0, NEXT_RES_VAL + | movz NEXT_TMP0w, #(LJ_TISNUM>>1)&0xffff, lsl #16 + | stp NEXT_IDX, NEXT_TMP0w, NEXT_RES_KEY + | add NEXT_IDX, NEXT_IDX, #1 + | mov NEXT_RES, NEXT_RES_PTR + |4: + | ret + | + |5: // Traverse hash part. + | ldr NEXT_TMP1w, NEXT_TAB->hmask + | ldr NODE:NEXT_RES, NEXT_TAB->node + | add NEXT_TMP0w, NEXT_TMP0w, NEXT_TMP0w, lsl #1 + | add NEXT_LIM, NEXT_LIM, NEXT_TMP1w + | add NODE:NEXT_RES, NODE:NEXT_RES, NEXT_TMP0w, uxtw #3 + |6: + | cmp NEXT_IDX, NEXT_LIM + | bhi >9 + | ldr NEXT_TMP0, NODE:NEXT_RES->val + | cmn NEXT_TMP0, #-LJ_TNIL + | add NEXT_IDX, NEXT_IDX, #1 + | bne <4 + | // Skip holes in hash part. + | add NODE:NEXT_RES, NODE:NEXT_RES, #sizeof(Node) + | b <6 + | + |9: // End of iteration. Set the key to nil (not the value). + | movn NEXT_TMP0, #0 + | str NEXT_TMP0, NEXT_RES_KEY + | mov NEXT_RES, NEXT_RES_PTR + | ret + |.endif + | + |//----------------------------------------------------------------------- + |//-- FFI helper functions ----------------------------------------------- + |//----------------------------------------------------------------------- + | + |// Handler for callback functions. + |// Saveregs already performed. Callback slot number in [sp], g in r12. 
+ |->vm_ffi_callback: + |.if FFI + |.type CTSTATE, CTState, PC + | saveregs + | ldr CTSTATE, GL:x10->ctype_state + | mov GL, x10 + | add x10, sp, # CFRAME_SPACE + | str w9, CTSTATE->cb.slot + | stp x0, x1, CTSTATE->cb.gpr[0] + | stp d0, d1, CTSTATE->cb.fpr[0] + | stp x2, x3, CTSTATE->cb.gpr[2] + | stp d2, d3, CTSTATE->cb.fpr[2] + | stp x4, x5, CTSTATE->cb.gpr[4] + | stp d4, d5, CTSTATE->cb.fpr[4] + | stp x6, x7, CTSTATE->cb.gpr[6] + | stp d6, d7, CTSTATE->cb.fpr[6] + | str x10, CTSTATE->cb.stack + | mov CARG1, CTSTATE + | str CTSTATE, SAVE_PC // Any value outside of bytecode is ok. + | mov CARG2, sp + | bl extern lj_ccallback_enter // (CTState *cts, void *cf) + | // Returns lua_State *. + | ldp BASE, RC, L:CRET1->base + | movz TISNUM, #(LJ_TISNUM>>1)&0xffff, lsl #48 + | movz TISNUMhi, #(LJ_TISNUM>>1)&0xffff, lsl #16 + | movn TISNIL, #0 + | mov L, CRET1 + | ldr LFUNC:CARG3, [BASE, FRAME_FUNC] + | sub RC, RC, BASE + | st_vmstate ST_INTERP + | and LFUNC:CARG3, CARG3, #LJ_GCVMASK + | ins_callt + |.endif + | + |->cont_ffi_callback: // Return from FFI callback. + |.if FFI + | ldr CTSTATE, GL->ctype_state + | stp BASE, CARG4, L->base + | str L, CTSTATE->L + | mov CARG1, CTSTATE + | mov CARG2, RA + | bl extern lj_ccallback_leave // (CTState *cts, TValue *o) + | ldp x0, x1, CTSTATE->cb.gpr[0] + | ldp d0, d1, CTSTATE->cb.fpr[0] + | b ->vm_leave_unw + |.endif + | + |->vm_ffi_call: // Call C function via FFI. + | // Caveat: needs special frame unwinding, see below. + |.if FFI + | .type CCSTATE, CCallState, x19 + | sp_auth + | stp x20, CCSTATE, [sp, #-32]! 
+ | stp fp, lr, [sp, #16] + | add fp, sp, #16 + | mov CCSTATE, x0 + | ldr TMP0w, CCSTATE:x0->spadj + | ldrb TMP1w, CCSTATE->nsp + | add TMP2, CCSTATE, #offsetof(CCallState, stack) + | subs TMP1, TMP1, #1 + | ldr TMP3, CCSTATE->func + | sub sp, sp, TMP0 + | bmi >2 + |1: // Copy stack slots + | ldr TMP0, [TMP2, TMP1, lsl #3] + | str TMP0, [sp, TMP1, lsl #3] + | subs TMP1, TMP1, #1 + | bpl <1 + |2: + | ldp x0, x1, CCSTATE->gpr[0] + | ldp d0, d1, CCSTATE->fpr[0] + | ldp x2, x3, CCSTATE->gpr[2] + | ldp d2, d3, CCSTATE->fpr[2] + | ldp x4, x5, CCSTATE->gpr[4] + | ldp d4, d5, CCSTATE->fpr[4] + | ldp x6, x7, CCSTATE->gpr[6] + | ldp d6, d7, CCSTATE->fpr[6] + | ldr x8, CCSTATE->retp + | blr_auth TMP3 + | sub sp, fp, #16 + | stp x0, x1, CCSTATE->gpr[0] + | stp d0, d1, CCSTATE->fpr[0] + | stp d2, d3, CCSTATE->fpr[2] + | ldp fp, lr, [sp, #16] + | ldp x20, CCSTATE, [sp], #32 + | ret_auth + |.endif + |// Note: vm_ffi_call must be the last function in this object file! + | + |//----------------------------------------------------------------------- +} + +/* Generate the code for a single instruction. */ +static void build_ins(BuildCtx *ctx, BCOp op, int defop) +{ + int vk = 0; + |=>defop: + + switch (op) { + + /* -- Comparison ops ---------------------------------------------------- */ + + /* Remember: all ops branch for a true comparison, fall through otherwise. */ + + case BC_ISLT: case BC_ISGE: case BC_ISLE: case BC_ISGT: + | // RA = src1, RC = src2, JMP with RC = target + | ldr CARG1, [BASE, RA, lsl #3] + | ldrh RBw, [PC, # OFS_RD] + | ldr CARG2, [BASE, RC, lsl #3] + | add PC, PC, #4 + | add RB, PC, RB, lsl #2 + | sub RB, RB, #0x20000 + | checkint CARG1, >3 + | checkint CARG2, >4 + | cmp CARG1w, CARG2w + if (op == BC_ISLT) { + | csel PC, RB, PC, lt + } else if (op == BC_ISGE) { + | csel PC, RB, PC, ge + } else if (op == BC_ISLE) { + | csel PC, RB, PC, le + } else { + | csel PC, RB, PC, gt + } + |1: + | ins_next + | + |3: // RA not int. 
+ | ldr FARG1, [BASE, RA, lsl #3] + | blo ->vmeta_comp + | ldr FARG2, [BASE, RC, lsl #3] + | cmp TISNUMhi, CARG2, lsr #32 + | bhi >5 + | bne ->vmeta_comp + | // RA number, RC int. + | scvtf FARG2, CARG2w + | b >5 + | + |4: // RA int, RC not int + | ldr FARG2, [BASE, RC, lsl #3] + | blo ->vmeta_comp + | // RA int, RC number. + | scvtf FARG1, CARG1w + | + |5: // RA number, RC number + | fcmp FARG1, FARG2 + | // To preserve NaN semantics GE/GT branch on unordered, but LT/LE don't. + if (op == BC_ISLT) { + | csel PC, RB, PC, lo + } else if (op == BC_ISGE) { + | csel PC, RB, PC, hs + } else if (op == BC_ISLE) { + | csel PC, RB, PC, ls + } else { + | csel PC, RB, PC, hi + } + | b <1 + break; + + case BC_ISEQV: case BC_ISNEV: + vk = op == BC_ISEQV; + | // RA = src1, RC = src2, JMP with RC = target + | ldr CARG1, [BASE, RA, lsl #3] + | add RC, BASE, RC, lsl #3 + | ldrh RBw, [PC, # OFS_RD] + | ldr CARG3, [RC] + | add PC, PC, #4 + | add RB, PC, RB, lsl #2 + | sub RB, RB, #0x20000 + | asr ITYPE, CARG3, #47 + | cmn ITYPE, #-LJ_TISNUM + if (vk) { + | bls ->BC_ISEQN_Z + } else { + | bls ->BC_ISNEN_Z + } + | // RC is not a number. + | asr TMP0, CARG1, #47 + |.if FFI + | // Check if RC or RA is a cdata. + | cmn ITYPE, #-LJ_TCDATA + | ccmn TMP0, #-LJ_TCDATA, #4, ne + | beq ->vmeta_equal_cd + |.endif + | cmp CARG1, CARG3 + | bne >2 + | // Tag and value are equal. + if (vk) { + |->BC_ISEQV_Z: + | mov PC, RB // Perform branch. + } + |1: + | ins_next + | + |2: // Check if the tags are the same and it's a table or userdata. + | cmp ITYPE, TMP0 + | ccmn ITYPE, #-LJ_TISTABUD, #2, eq + if (vk) { + | bhi <1 + } else { + | bhi ->BC_ISEQV_Z // Reuse code from opposite instruction. + } + | // Different tables or userdatas. Need to check __eq metamethod. + | // Field metatable must be at same offset for GCtab and GCudata! + | and TAB:CARG2, CARG1, #LJ_GCVMASK + | ldr TAB:TMP2, TAB:CARG2->metatable + if (vk) { + | cbz TAB:TMP2, <1 // No metatable? 
+ | ldrb TMP1w, TAB:TMP2->nomm + | mov CARG4, #0 // ne = 0 + | tbnz TMP1w, #MM_eq, <1 // 'no __eq' flag set: done. + } else { + | cbz TAB:TMP2, ->BC_ISEQV_Z // No metatable? + | ldrb TMP1w, TAB:TMP2->nomm + | mov CARG4, #1 // ne = 1. + | tbnz TMP1w, #MM_eq, ->BC_ISEQV_Z // 'no __eq' flag set: done. + } + | b ->vmeta_equal + break; + + case BC_ISEQS: case BC_ISNES: + vk = op == BC_ISEQS; + | // RA = src, RC = str_const (~), JMP with RC = target + | ldr CARG1, [BASE, RA, lsl #3] + | mvn RC, RC + | ldrh RBw, [PC, # OFS_RD] + | ldr CARG2, [KBASE, RC, lsl #3] + | add PC, PC, #4 + | movn TMP0, #~LJ_TSTR + |.if FFI + | asr ITYPE, CARG1, #47 + |.endif + | add RB, PC, RB, lsl #2 + | add CARG2, CARG2, TMP0, lsl #47 + | sub RB, RB, #0x20000 + |.if FFI + | cmn ITYPE, #-LJ_TCDATA + | beq ->vmeta_equal_cd + |.endif + | cmp CARG1, CARG2 + if (vk) { + | csel PC, RB, PC, eq + } else { + | csel PC, RB, PC, ne + } + | ins_next + break; + + case BC_ISEQN: case BC_ISNEN: + vk = op == BC_ISEQN; + | // RA = src, RC = num_const (~), JMP with RC = target + | ldr CARG1, [BASE, RA, lsl #3] + | add RC, KBASE, RC, lsl #3 + | ldrh RBw, [PC, # OFS_RD] + | ldr CARG3, [RC] + | add PC, PC, #4 + | add RB, PC, RB, lsl #2 + | sub RB, RB, #0x20000 + if (vk) { + |->BC_ISEQN_Z: + } else { + |->BC_ISNEN_Z: + } + | checkint CARG1, >4 + | checkint CARG3, >6 + | cmp CARG1w, CARG3w + |1: + if (vk) { + | csel PC, RB, PC, eq + |2: + } else { + |2: + | csel PC, RB, PC, ne + } + |3: + | ins_next + | + |4: // RA not int. + |.if FFI + | blo >7 + |.else + | blo <2 + |.endif + | ldr FARG1, [BASE, RA, lsl #3] + | ldr FARG2, [RC] + | cmp TISNUMhi, CARG3, lsr #32 + | bne >5 + | // RA number, RC int. + | scvtf FARG2, CARG3w + |5: + | // RA number, RC number. 
+ | fcmp FARG1, FARG2 + | b <1 + | + |6: // RA int, RC number + | ldr FARG2, [RC] + | scvtf FARG1, CARG1w + | fcmp FARG1, FARG2 + | b <1 + | + |.if FFI + |7: + | asr ITYPE, CARG1, #47 + | cmn ITYPE, #-LJ_TCDATA + | bne <2 + | b ->vmeta_equal_cd + |.endif + break; + + case BC_ISEQP: case BC_ISNEP: + vk = op == BC_ISEQP; + | // RA = src, RC = primitive_type (~), JMP with RC = target + | ldr TMP0, [BASE, RA, lsl #3] + | ldrh RBw, [PC, # OFS_RD] + | add PC, PC, #4 + | add RC, RC, #1 + | add RB, PC, RB, lsl #2 + |.if FFI + | asr ITYPE, TMP0, #47 + | cmn ITYPE, #-LJ_TCDATA + | beq ->vmeta_equal_cd + | cmn RC, ITYPE + |.else + | cmn RC, TMP0, asr #47 + |.endif + | sub RB, RB, #0x20000 + if (vk) { + | csel PC, RB, PC, eq + } else { + | csel PC, RB, PC, ne + } + | ins_next + break; + + /* -- Unary test and copy ops ------------------------------------------- */ + + case BC_ISTC: case BC_ISFC: case BC_IST: case BC_ISF: + | // RA = dst or unused, RC = src, JMP with RC = target + | ldrh RBw, [PC, # OFS_RD] + | ldr TMP0, [BASE, RC, lsl #3] + | add PC, PC, #4 + | mov_false TMP1 + | add RB, PC, RB, lsl #2 + | cmp TMP0, TMP1 + | sub RB, RB, #0x20000 + if (op == BC_ISTC || op == BC_IST) { + if (op == BC_ISTC) { + | csel RA, RA, RC, lo + } + | csel PC, RB, PC, lo + } else { + if (op == BC_ISFC) { + | csel RA, RA, RC, hs + } + | csel PC, RB, PC, hs + } + if (op == BC_ISTC || op == BC_ISFC) { + | str TMP0, [BASE, RA, lsl #3] + } + | ins_next + break; + + case BC_ISTYPE: + | // RA = src, RC = -type + | ldr TMP0, [BASE, RA, lsl #3] + | cmn RC, TMP0, asr #47 + | bne ->vmeta_istype + | ins_next + break; + case BC_ISNUM: + | // RA = src, RC = -(TISNUM-1) + | ldr TMP0, [BASE, RA] + | checknum TMP0, ->vmeta_istype + | ins_next + break; + + /* -- Unary ops --------------------------------------------------------- */ + + case BC_MOV: + | // RA = dst, RC = src + | ldr TMP0, [BASE, RC, lsl #3] + | str TMP0, [BASE, RA, lsl #3] + | ins_next + break; + case BC_NOT: + | // RA = dst, RC = src + | ldr 
TMP0, [BASE, RC, lsl #3] + | mov_false TMP1 + | mov_true TMP2 + | cmp TMP0, TMP1 + | csel TMP0, TMP1, TMP2, lo + | str TMP0, [BASE, RA, lsl #3] + | ins_next + break; + case BC_UNM: + | // RA = dst, RC = src + | ldr TMP0, [BASE, RC, lsl #3] + | asr ITYPE, TMP0, #47 + | cmn ITYPE, #-LJ_TISNUM + | bhi ->vmeta_unm + | eor TMP0, TMP0, #U64x(80000000,00000000) + | bne >5 + | negs TMP0w, TMP0w + | movz CARG3, #0x41e0, lsl #48 // 2^31. + | add TMP0, TMP0, TISNUM + | csel TMP0, TMP0, CARG3, vc + |5: + | str TMP0, [BASE, RA, lsl #3] + | ins_next + break; + case BC_LEN: + | // RA = dst, RC = src + | ldr CARG1, [BASE, RC, lsl #3] + | asr ITYPE, CARG1, #47 + | cmn ITYPE, #-LJ_TSTR + | and CARG1, CARG1, #LJ_GCVMASK + | bne >2 + | ldr CARG1w, STR:CARG1->len + |1: + | add CARG1, CARG1, TISNUM + | str CARG1, [BASE, RA, lsl #3] + | ins_next + | + |2: + | cmn ITYPE, #-LJ_TTAB + | bne ->vmeta_len +#if LJ_52 + | ldr TAB:CARG2, TAB:CARG1->metatable + | cbnz TAB:CARG2, >9 + |3: +#endif + |->BC_LEN_Z: + | bl extern lj_tab_len // (GCtab *t) + | // Returns uint32_t (but less than 2^31). + | b <1 + | +#if LJ_52 + |9: + | ldrb TMP1w, TAB:CARG2->nomm + | tbnz TMP1w, #MM_len, <3 // 'no __len' flag set: done. 
+ | b ->vmeta_len +#endif + break; + + /* -- Binary ops -------------------------------------------------------- */ + + |.macro ins_arithcheck_int, target + | checkint CARG1, target + | checkint CARG2, target + |.endmacro + | + |.macro ins_arithcheck_num, target + | checknum CARG1, target + | checknum CARG2, target + |.endmacro + | + |.macro ins_arithcheck_nzdiv, target + | cbz CARG2w, target + |.endmacro + | + |.macro ins_arithhead + ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN); + ||if (vk == 1) { + | and RC, RC, #255 + | decode_RB RB, INS + ||} else { + | decode_RB RB, INS + | and RC, RC, #255 + ||} + |.endmacro + | + |.macro ins_arithload, reg1, reg2 + | // RA = dst, RB = src1, RC = src2 | num_const + ||switch (vk) { + ||case 0: + | ldr reg1, [BASE, RB, lsl #3] + | ldr reg2, [KBASE, RC, lsl #3] + || break; + ||case 1: + | ldr reg1, [KBASE, RC, lsl #3] + | ldr reg2, [BASE, RB, lsl #3] + || break; + ||default: + | ldr reg1, [BASE, RB, lsl #3] + | ldr reg2, [BASE, RC, lsl #3] + || break; + ||} + |.endmacro + | + |.macro ins_arithfallback, ins + ||switch (vk) { + ||case 0: + | ins ->vmeta_arith_vn + || break; + ||case 1: + | ins ->vmeta_arith_nv + || break; + ||default: + | ins ->vmeta_arith_vv + || break; + ||} + |.endmacro + | + |.macro ins_arithmod, res, reg1, reg2 + | fdiv d2, reg1, reg2 + | frintm d2, d2 + | // Cannot use fmsub, because FMA is not enabled by default. + | fmul d2, d2, reg2 + | fsub res, reg1, d2 + |.endmacro + | + |.macro ins_arithdn, intins, fpins + | ins_arithhead + | ins_arithload CARG1, CARG2 + | ins_arithcheck_int >5 + |.if "intins" == "smull" + | smull CARG1, CARG1w, CARG2w + | cmp CARG1, CARG1, sxtw + | mov CARG1w, CARG1w + | ins_arithfallback bne + |.elif "intins" == "ins_arithmodi" + | ins_arithfallback ins_arithcheck_nzdiv + | bl ->vm_modi + |.else + | intins CARG1w, CARG1w, CARG2w + | ins_arithfallback bvs + |.endif + | add CARG1, CARG1, TISNUM + | str CARG1, [BASE, RA, lsl #3] + |4: + | ins_next + | + |5: // FP variant. 
+ | ins_arithload FARG1, FARG2 + | ins_arithfallback ins_arithcheck_num + | fpins FARG1, FARG1, FARG2 + | str FARG1, [BASE, RA, lsl #3] + | b <4 + |.endmacro + | + |.macro ins_arithfp, fpins + | ins_arithhead + | ins_arithload CARG1, CARG2 + | ins_arithload FARG1, FARG2 + | ins_arithfallback ins_arithcheck_num + |.if "fpins" == "fpow" + | bl extern pow + |.else + | fpins FARG1, FARG1, FARG2 + |.endif + | str FARG1, [BASE, RA, lsl #3] + | ins_next + |.endmacro + + case BC_ADDVN: case BC_ADDNV: case BC_ADDVV: + | ins_arithdn adds, fadd + break; + case BC_SUBVN: case BC_SUBNV: case BC_SUBVV: + | ins_arithdn subs, fsub + break; + case BC_MULVN: case BC_MULNV: case BC_MULVV: + | ins_arithdn smull, fmul + break; + case BC_DIVVN: case BC_DIVNV: case BC_DIVVV: + | ins_arithfp fdiv + break; + case BC_MODVN: case BC_MODNV: case BC_MODVV: + | ins_arithdn ins_arithmodi, ins_arithmod + break; + case BC_POW: + | // NYI: (partial) integer arithmetic. + | ins_arithfp fpow + break; + + case BC_CAT: + | decode_RB RB, INS + | and RC, RC, #255 + | // RA = dst, RB = src_start, RC = src_end + | str BASE, L->base + | sub CARG3, RC, RB + | add CARG2, BASE, RC, lsl #3 + |->BC_CAT_Z: + | // RA = dst, CARG2 = top-1, CARG3 = left + | mov CARG1, L + | str PC, SAVE_PC + | bl extern lj_meta_cat // (lua_State *L, TValue *top, int left) + | // Returns NULL (finished) or TValue * (metamethod). + | ldrb RBw, [PC, #-4+OFS_RB] + | ldr BASE, L->base + | cbnz CRET1, ->vmeta_binop + | ldr TMP0, [BASE, RB, lsl #3] + | str TMP0, [BASE, RA, lsl #3] // Copy result to RA. 
+ | ins_next + break; + + /* -- Constant ops ------------------------------------------------------ */ + + case BC_KSTR: + | // RA = dst, RC = str_const (~) + | mvn RC, RC + | ldr TMP0, [KBASE, RC, lsl #3] + | movn TMP1, #~LJ_TSTR + | add TMP0, TMP0, TMP1, lsl #47 + | str TMP0, [BASE, RA, lsl #3] + | ins_next + break; + case BC_KCDATA: + |.if FFI + | // RA = dst, RC = cdata_const (~) + | mvn RC, RC + | ldr TMP0, [KBASE, RC, lsl #3] + | movn TMP1, #~LJ_TCDATA + | add TMP0, TMP0, TMP1, lsl #47 + | str TMP0, [BASE, RA, lsl #3] + | ins_next + |.endif + break; + case BC_KSHORT: + | // RA = dst, RC = int16_literal + | sxth RCw, RCw + | add TMP0, RC, TISNUM + | str TMP0, [BASE, RA, lsl #3] + | ins_next + break; + case BC_KNUM: + | // RA = dst, RC = num_const + | ldr TMP0, [KBASE, RC, lsl #3] + | str TMP0, [BASE, RA, lsl #3] + | ins_next + break; + case BC_KPRI: + | // RA = dst, RC = primitive_type (~) + | mvn TMP0, RC, lsl #47 + | str TMP0, [BASE, RA, lsl #3] + | ins_next + break; + case BC_KNIL: + | // RA = base, RC = end + | add RA, BASE, RA, lsl #3 + | add RC, BASE, RC, lsl #3 + | str TISNIL, [RA], #8 + |1: + | cmp RA, RC + | str TISNIL, [RA], #8 + | blt <1 + | ins_next_ + break; + + /* -- Upvalue and function ops ------------------------------------------ */ + + case BC_UGET: + | // RA = dst, RC = uvnum + | ldr LFUNC:CARG2, [BASE, FRAME_FUNC] + | add RC, RC, #offsetof(GCfuncL, uvptr)/8 + | and LFUNC:CARG2, CARG2, #LJ_GCVMASK + | ldr UPVAL:CARG2, [LFUNC:CARG2, RC, lsl #3] + | ldr CARG2, UPVAL:CARG2->v + | ldr TMP0, [CARG2] + | str TMP0, [BASE, RA, lsl #3] + | ins_next + break; + case BC_USETV: + | // RA = uvnum, RC = src + | ldr LFUNC:CARG2, [BASE, FRAME_FUNC] + | add RA, RA, #offsetof(GCfuncL, uvptr)/8 + | and LFUNC:CARG2, CARG2, #LJ_GCVMASK + | ldr UPVAL:CARG1, [LFUNC:CARG2, RA, lsl #3] + | ldr CARG3, [BASE, RC, lsl #3] + | ldr CARG2, UPVAL:CARG1->v + | ldrb TMP2w, UPVAL:CARG1->marked + | ldrb TMP0w, UPVAL:CARG1->closed + | asr ITYPE, CARG3, #47 + | str CARG3, 
[CARG2] + | add ITYPE, ITYPE, #-LJ_TISGCV + | tst TMP2w, #LJ_GC_BLACK // isblack(uv) + | ccmp TMP0w, #0, #4, ne // && uv->closed + | ccmn ITYPE, #-(LJ_TNUMX - LJ_TISGCV), #0, ne // && tvisgcv(v) + | bhi >2 + |1: + | ins_next + | + |2: // Check if new value is white. + | and GCOBJ:CARG3, CARG3, #LJ_GCVMASK + | ldrb TMP1w, GCOBJ:CARG3->gch.marked + | tst TMP1w, #LJ_GC_WHITES // iswhite(str) + | beq <1 + | // Crossed a write barrier. Move the barrier forward. + | mov CARG1, GL + | bl extern lj_gc_barrieruv // (global_State *g, TValue *tv) + | b <1 + break; + case BC_USETS: + | // RA = uvnum, RC = str_const (~) + | ldr LFUNC:CARG2, [BASE, FRAME_FUNC] + | add RA, RA, #offsetof(GCfuncL, uvptr)/8 + | mvn RC, RC + | and LFUNC:CARG2, CARG2, #LJ_GCVMASK + | ldr UPVAL:CARG1, [LFUNC:CARG2, RA, lsl #3] + | ldr STR:CARG3, [KBASE, RC, lsl #3] + | movn TMP0, #~LJ_TSTR + | ldr CARG2, UPVAL:CARG1->v + | ldrb TMP2w, UPVAL:CARG1->marked + | add TMP0, STR:CARG3, TMP0, lsl #47 + | ldrb TMP1w, STR:CARG3->marked + | str TMP0, [CARG2] + | tbnz TMP2w, #2, >2 // isblack(uv) + |1: + | ins_next + | + |2: // Check if string is white and ensure upvalue is closed. + | ldrb TMP0w, UPVAL:CARG1->closed + | tst TMP1w, #LJ_GC_WHITES // iswhite(str) + | ccmp TMP0w, #0, #4, ne + | beq <1 + | // Crossed a write barrier. Move the barrier forward. 
+ | mov CARG1, GL + | bl extern lj_gc_barrieruv // (global_State *g, TValue *tv) + | b <1 + break; + case BC_USETN: + | // RA = uvnum, RC = num_const + | ldr LFUNC:CARG2, [BASE, FRAME_FUNC] + | add RA, RA, #offsetof(GCfuncL, uvptr)/8 + | and LFUNC:CARG2, CARG2, #LJ_GCVMASK + | ldr UPVAL:CARG2, [LFUNC:CARG2, RA, lsl #3] + | ldr TMP0, [KBASE, RC, lsl #3] + | ldr CARG2, UPVAL:CARG2->v + | str TMP0, [CARG2] + | ins_next + break; + case BC_USETP: + | // RA = uvnum, RC = primitive_type (~) + | ldr LFUNC:CARG2, [BASE, FRAME_FUNC] + | add RA, RA, #offsetof(GCfuncL, uvptr)/8 + | and LFUNC:CARG2, CARG2, #LJ_GCVMASK + | ldr UPVAL:CARG2, [LFUNC:CARG2, RA, lsl #3] + | mvn TMP0, RC, lsl #47 + | ldr CARG2, UPVAL:CARG2->v + | str TMP0, [CARG2] + | ins_next + break; + + case BC_UCLO: + | // RA = level, RC = target + | ldr CARG3, L->openupval + | add RC, PC, RC, lsl #2 + | str BASE, L->base + | sub PC, RC, #0x20000 + | cbz CARG3, >1 + | mov CARG1, L + | add CARG2, BASE, RA, lsl #3 + | bl extern lj_func_closeuv // (lua_State *L, TValue *level) + | ldr BASE, L->base + |1: + | ins_next + break; + + case BC_FNEW: + | // RA = dst, RC = proto_const (~) (holding function prototype) + | mvn RC, RC + | str BASE, L->base + | ldr LFUNC:CARG3, [BASE, FRAME_FUNC] + | str PC, SAVE_PC + | ldr CARG2, [KBASE, RC, lsl #3] + | mov CARG1, L + | and LFUNC:CARG3, CARG3, #LJ_GCVMASK + | // (lua_State *L, GCproto *pt, GCfuncL *parent) + | bl extern lj_func_newL_gc + | // Returns GCfuncL *. + | ldr BASE, L->base + | movn TMP0, #~LJ_TFUNC + | add CRET1, CRET1, TMP0, lsl #47 + | str CRET1, [BASE, RA, lsl #3] + | ins_next + break; + + /* -- Table ops --------------------------------------------------------- */ + + case BC_TNEW: + case BC_TDUP: + | // RA = dst, RC = (hbits|asize) | tab_const (~) + | ldp CARG3, CARG4, GL->gc.total // Assumes threshold follows total. 
+ | str BASE, L->base + | str PC, SAVE_PC + | mov CARG1, L + | cmp CARG3, CARG4 + | bhs >5 + |1: + if (op == BC_TNEW) { + | and CARG2, RC, #0x7ff + | lsr CARG3, RC, #11 + | cmp CARG2, #0x7ff + | mov TMP0, #0x801 + | csel CARG2, CARG2, TMP0, ne + | bl extern lj_tab_new // (lua_State *L, int32_t asize, uint32_t hbits) + | // Returns GCtab *. + } else { + | mvn RC, RC + | ldr CARG2, [KBASE, RC, lsl #3] + | bl extern lj_tab_dup // (lua_State *L, Table *kt) + | // Returns GCtab *. + } + | ldr BASE, L->base + | movk CRET1, #(LJ_TTAB>>1)&0xffff, lsl #48 + | str CRET1, [BASE, RA, lsl #3] + | ins_next + | + |5: + | bl extern lj_gc_step_fixtop // (lua_State *L) + | mov CARG1, L + | b <1 + break; + + case BC_GGET: + | // RA = dst, RC = str_const (~) + case BC_GSET: + | // RA = src, RC = str_const (~) + | ldr LFUNC:CARG1, [BASE, FRAME_FUNC] + | mvn RC, RC + | and LFUNC:CARG1, CARG1, #LJ_GCVMASK + | ldr TAB:CARG2, LFUNC:CARG1->env + | ldr STR:RC, [KBASE, RC, lsl #3] + if (op == BC_GGET) { + | b ->BC_TGETS_Z + } else { + | b ->BC_TSETS_Z + } + break; + + case BC_TGETV: + | decode_RB RB, INS + | and RC, RC, #255 + | // RA = dst, RB = table, RC = key + | ldr CARG2, [BASE, RB, lsl #3] + | ldr TMP1, [BASE, RC, lsl #3] + | checktab CARG2, ->vmeta_tgetv + | checkint TMP1, >9 // Integer key? + | ldr CARG3, TAB:CARG2->array + | ldr CARG1w, TAB:CARG2->asize + | add CARG3, CARG3, TMP1, uxtw #3 + | cmp TMP1w, CARG1w // In array part? + | bhs ->vmeta_tgetv + | ldr TMP0, [CARG3] + | cmp TMP0, TISNIL + | beq >5 + |1: + | str TMP0, [BASE, RA, lsl #3] + | ins_next + | + |5: // Check for __index if table value is nil. + | ldr TAB:CARG1, TAB:CARG2->metatable + | cbz TAB:CARG1, <1 // No metatable: done. + | ldrb TMP1w, TAB:CARG1->nomm + | tbnz TMP1w, #MM_index, <1 // 'no __index' flag set: done. + | b ->vmeta_tgetv + | + |9: + | asr ITYPE, TMP1, #47 + | cmn ITYPE, #-LJ_TSTR // String key? 
+ | bne ->vmeta_tgetv + | and STR:RC, TMP1, #LJ_GCVMASK + | b ->BC_TGETS_Z + break; + case BC_TGETS: + | decode_RB RB, INS + | and RC, RC, #255 + | // RA = dst, RB = table, RC = str_const (~) + | ldr CARG2, [BASE, RB, lsl #3] + | mvn RC, RC + | ldr STR:RC, [KBASE, RC, lsl #3] + | checktab CARG2, ->vmeta_tgets1 + |->BC_TGETS_Z: + | // TAB:CARG2 = GCtab *, STR:RC = GCstr *, RA = dst + | ldr TMP1w, TAB:CARG2->hmask + | ldr TMP2w, STR:RC->sid + | ldr NODE:CARG3, TAB:CARG2->node + | and TMP1w, TMP1w, TMP2w // idx = str->sid & tab->hmask + | add TMP1, TMP1, TMP1, lsl #1 + | movn CARG4, #~LJ_TSTR + | add NODE:CARG3, NODE:CARG3, TMP1, lsl #3 // node = tab->node + idx*3*8 + | add CARG4, STR:RC, CARG4, lsl #47 // Tagged key to look for. + |1: + | ldp TMP0, CARG1, NODE:CARG3->val + | ldr NODE:CARG3, NODE:CARG3->next + | cmp CARG1, CARG4 + | bne >4 + | cmp TMP0, TISNIL + | beq >5 + |3: + | str TMP0, [BASE, RA, lsl #3] + | ins_next + | + |4: // Follow hash chain. + | cbnz NODE:CARG3, <1 + | // End of hash chain: key not found, nil result. + | mov TMP0, TISNIL + | + |5: // Check for __index if table value is nil. + | ldr TAB:CARG1, TAB:CARG2->metatable + | cbz TAB:CARG1, <3 // No metatable: done. + | ldrb TMP1w, TAB:CARG1->nomm + | tbnz TMP1w, #MM_index, <3 // 'no __index' flag set: done. + | b ->vmeta_tgets + break; + case BC_TGETB: + | decode_RB RB, INS + | and RC, RC, #255 + | // RA = dst, RB = table, RC = index + | ldr CARG2, [BASE, RB, lsl #3] + | checktab CARG2, ->vmeta_tgetb + | ldr CARG3, TAB:CARG2->array + | ldr CARG1w, TAB:CARG2->asize + | add CARG3, CARG3, RC, lsl #3 + | cmp RCw, CARG1w // In array part? + | bhs ->vmeta_tgetb + | ldr TMP0, [CARG3] + | cmp TMP0, TISNIL + | beq >5 + |1: + | str TMP0, [BASE, RA, lsl #3] + | ins_next + | + |5: // Check for __index if table value is nil. + | ldr TAB:CARG1, TAB:CARG2->metatable + | cbz TAB:CARG1, <1 // No metatable: done. + | ldrb TMP1w, TAB:CARG1->nomm + | tbnz TMP1w, #MM_index, <1 // 'no __index' flag set: done. 
+ | b ->vmeta_tgetb + break; + case BC_TGETR: + | decode_RB RB, INS + | and RC, RC, #255 + | // RA = dst, RB = table, RC = key + | ldr CARG1, [BASE, RB, lsl #3] + | ldr TMP1, [BASE, RC, lsl #3] + | and TAB:CARG1, CARG1, #LJ_GCVMASK + | ldr CARG3, TAB:CARG1->array + | ldr TMP2w, TAB:CARG1->asize + | add CARG3, CARG3, TMP1w, uxtw #3 + | cmp TMP1w, TMP2w // In array part? + | bhs ->vmeta_tgetr + | ldr TMP0, [CARG3] + |->BC_TGETR_Z: + | str TMP0, [BASE, RA, lsl #3] + | ins_next + break; + + case BC_TSETV: + | decode_RB RB, INS + | and RC, RC, #255 + | // RA = src, RB = table, RC = key + | ldr CARG2, [BASE, RB, lsl #3] + | ldr TMP1, [BASE, RC, lsl #3] + | checktab CARG2, ->vmeta_tsetv + | checkint TMP1, >9 // Integer key? + | ldr CARG3, TAB:CARG2->array + | ldr CARG1w, TAB:CARG2->asize + | add CARG3, CARG3, TMP1, uxtw #3 + | cmp TMP1w, CARG1w // In array part? + | bhs ->vmeta_tsetv + | ldr TMP1, [CARG3] + | ldr TMP0, [BASE, RA, lsl #3] + | ldrb TMP2w, TAB:CARG2->marked + | cmp TMP1, TISNIL // Previous value is nil? + | beq >5 + |1: + | str TMP0, [CARG3] + | tbnz TMP2w, #2, >7 // isblack(table) + |2: + | ins_next + | + |5: // Check for __newindex if previous value is nil. + | ldr TAB:CARG1, TAB:CARG2->metatable + | cbz TAB:CARG1, <1 // No metatable: done. + | ldrb TMP1w, TAB:CARG1->nomm + | tbnz TMP1w, #MM_newindex, <1 // 'no __newindex' flag set: done. + | b ->vmeta_tsetv + | + |7: // Possible table write barrier for the value. Skip valiswhite check. + | barrierback TAB:CARG2, TMP2w, TMP1 + | b <2 + | + |9: + | asr ITYPE, TMP1, #47 + | cmn ITYPE, #-LJ_TSTR // String key? 
+ | bne ->vmeta_tsetv + | and STR:RC, TMP1, #LJ_GCVMASK + | b ->BC_TSETS_Z + break; + case BC_TSETS: + | decode_RB RB, INS + | and RC, RC, #255 + | // RA = dst, RB = table, RC = str_const (~) + | ldr CARG2, [BASE, RB, lsl #3] + | mvn RC, RC + | ldr STR:RC, [KBASE, RC, lsl #3] + | checktab CARG2, ->vmeta_tsets1 + |->BC_TSETS_Z: + | // TAB:CARG2 = GCtab *, STR:RC = GCstr *, RA = src + | ldr TMP1w, TAB:CARG2->hmask + | ldr TMP2w, STR:RC->sid + | ldr NODE:CARG3, TAB:CARG2->node + | and TMP1w, TMP1w, TMP2w // idx = str->sid & tab->hmask + | add TMP1, TMP1, TMP1, lsl #1 + | movn CARG4, #~LJ_TSTR + | add NODE:CARG3, NODE:CARG3, TMP1, lsl #3 // node = tab->node + idx*3*8 + | add CARG4, STR:RC, CARG4, lsl #47 // Tagged key to look for. + | strb wzr, TAB:CARG2->nomm // Clear metamethod cache. + |1: + | ldp TMP1, CARG1, NODE:CARG3->val + | ldr NODE:TMP3, NODE:CARG3->next + | ldrb TMP2w, TAB:CARG2->marked + | cmp CARG1, CARG4 + | bne >5 + | ldr TMP0, [BASE, RA, lsl #3] + | cmp TMP1, TISNIL // Previous value is nil? + | beq >4 + |2: + | str TMP0, NODE:CARG3->val + | tbnz TMP2w, #2, >7 // isblack(table) + |3: + | ins_next + | + |4: // Check for __newindex if previous value is nil. + | ldr TAB:CARG1, TAB:CARG2->metatable + | cbz TAB:CARG1, <2 // No metatable: done. + | ldrb TMP1w, TAB:CARG1->nomm + | tbnz TMP1w, #MM_newindex, <2 // 'no __newindex' flag set: done. + | b ->vmeta_tsets + | + |5: // Follow hash chain. + | mov NODE:CARG3, NODE:TMP3 + | cbnz NODE:TMP3, <1 + | // End of hash chain: key not found, add a new one. + | + | // But check for __newindex first. + | ldr TAB:CARG1, TAB:CARG2->metatable + | cbz TAB:CARG1, >6 // No metatable: continue. + | ldrb TMP1w, TAB:CARG1->nomm + | // 'no __newindex' flag NOT set: check. 
+ | tbz TMP1w, #MM_newindex, ->vmeta_tsets + |6: + | movn TMP1, #~LJ_TSTR + | str PC, SAVE_PC + | add TMP0, STR:RC, TMP1, lsl #47 + | str BASE, L->base + | mov CARG1, L + | str TMP0, TMPD + | add CARG3, sp, TMPDofs + | bl extern lj_tab_newkey // (lua_State *L, GCtab *t, TValue *k) + | // Returns TValue *. + | ldr BASE, L->base + | ldr TMP0, [BASE, RA, lsl #3] + | str TMP0, [CRET1] + | b <3 // No 2nd write barrier needed. + | + |7: // Possible table write barrier for the value. Skip valiswhite check. + | barrierback TAB:CARG2, TMP2w, TMP1 + | b <3 + break; + case BC_TSETB: + | decode_RB RB, INS + | and RC, RC, #255 + | // RA = src, RB = table, RC = index + | ldr CARG2, [BASE, RB, lsl #3] + | checktab CARG2, ->vmeta_tsetb + | ldr CARG3, TAB:CARG2->array + | ldr CARG1w, TAB:CARG2->asize + | add CARG3, CARG3, RC, lsl #3 + | cmp RCw, CARG1w // In array part? + | bhs ->vmeta_tsetb + | ldr TMP1, [CARG3] + | ldr TMP0, [BASE, RA, lsl #3] + | ldrb TMP2w, TAB:CARG2->marked + | cmp TMP1, TISNIL // Previous value is nil? + | beq >5 + |1: + | str TMP0, [CARG3] + | tbnz TMP2w, #2, >7 // isblack(table) + |2: + | ins_next + | + |5: // Check for __newindex if previous value is nil. + | ldr TAB:CARG1, TAB:CARG2->metatable + | cbz TAB:CARG1, <1 // No metatable: done. + | ldrb TMP1w, TAB:CARG1->nomm + | tbnz TMP1w, #MM_newindex, <1 // 'no __newindex' flag set: done. + | b ->vmeta_tsetb + | + |7: // Possible table write barrier for the value. Skip valiswhite check. + | barrierback TAB:CARG2, TMP2w, TMP1 + | b <2 + break; + case BC_TSETR: + | decode_RB RB, INS + | and RC, RC, #255 + | // RA = src, RB = table, RC = key + | ldr CARG2, [BASE, RB, lsl #3] + | ldr TMP1, [BASE, RC, lsl #3] + | and TAB:CARG2, CARG2, #LJ_GCVMASK + | ldr CARG1, TAB:CARG2->array + | ldrb TMP2w, TAB:CARG2->marked + | ldr CARG4w, TAB:CARG2->asize + | add CARG1, CARG1, TMP1, uxtw #3 + | tbnz TMP2w, #2, >7 // isblack(table) + |2: + | cmp TMP1w, CARG4w // In array part? 
+ | bhs ->vmeta_tsetr + |->BC_TSETR_Z: + | ldr TMP0, [BASE, RA, lsl #3] + | str TMP0, [CARG1] + | ins_next + | + |7: // Possible table write barrier for the value. Skip valiswhite check. + | barrierback TAB:CARG2, TMP2w, TMP0 + | b <2 + break; + + case BC_TSETM: + | // RA = base (table at base-1), RC = num_const (start index) + | add RA, BASE, RA, lsl #3 + |1: + | ldr RBw, SAVE_MULTRES + | ldr TAB:CARG2, [RA, #-8] // Guaranteed to be a table. + | ldr TMP1, [KBASE, RC, lsl #3] // Integer constant is in lo-word. + | sub RB, RB, #8 + | cbz RB, >4 // Nothing to copy? + | and TAB:CARG2, CARG2, #LJ_GCVMASK + | ldr CARG1w, TAB:CARG2->asize + | add CARG3w, TMP1w, RBw, lsr #3 + | ldr CARG4, TAB:CARG2->array + | cmp CARG3, CARG1 + | add RB, RA, RB + | bhi >5 + | add TMP1, CARG4, TMP1w, uxtw #3 + | ldrb TMP2w, TAB:CARG2->marked + |3: // Copy result slots to table. + | ldr TMP0, [RA], #8 + | str TMP0, [TMP1], #8 + | cmp RA, RB + | blo <3 + | tbnz TMP2w, #2, >7 // isblack(table) + |4: + | ins_next + | + |5: // Need to resize array part. + | str BASE, L->base + | mov CARG1, L + | str PC, SAVE_PC + | bl extern lj_tab_reasize // (lua_State *L, GCtab *t, int nasize) + | // Must not reallocate the stack. + | b <1 + | + |7: // Possible table write barrier for any value. Skip valiswhite check. + | barrierback TAB:CARG2, TMP2w, TMP1 + | b <4 + break; + + /* -- Calls and vararg handling ----------------------------------------- */ + + case BC_CALLM: + | // RA = base, (RB = nresults+1,) RC = extra_nargs + | ldr TMP0w, SAVE_MULTRES + | decode_RC8RD NARGS8:RC, RC + | add NARGS8:RC, NARGS8:RC, TMP0 + | b ->BC_CALL_Z + break; + case BC_CALL: + | decode_RC8RD NARGS8:RC, RC + | // RA = base, (RB = nresults+1,) RC = (nargs+1)*8 + |->BC_CALL_Z: + | mov RB, BASE // Save old BASE for vmeta_call. 
+ | add BASE, BASE, RA, lsl #3 + | ldr CARG3, [BASE] + | sub NARGS8:RC, NARGS8:RC, #8 + | add BASE, BASE, #16 + | checkfunc CARG3, ->vmeta_call + | ins_call + break; + + case BC_CALLMT: + | // RA = base, (RB = 0,) RC = extra_nargs + | ldr TMP0w, SAVE_MULTRES + | add NARGS8:RC, TMP0, RC, lsl #3 + | b ->BC_CALLT1_Z + break; + case BC_CALLT: + | lsl NARGS8:RC, RC, #3 + | // RA = base, (RB = 0,) RC = (nargs+1)*8 + |->BC_CALLT1_Z: + | add RA, BASE, RA, lsl #3 + | ldr TMP1, [RA] + | sub NARGS8:RC, NARGS8:RC, #8 + | add RA, RA, #16 + | checktp CARG3, TMP1, LJ_TFUNC, ->vmeta_callt + | ldr PC, [BASE, FRAME_PC] + |->BC_CALLT2_Z: + | mov RB, #0 + | ldrb TMP2w, LFUNC:CARG3->ffid + | tst PC, #FRAME_TYPE + | bne >7 + |1: + | str TMP1, [BASE, FRAME_FUNC] // Copy function down, but keep PC. + | cbz NARGS8:RC, >3 + |2: + | ldr TMP0, [RA, RB] + | add TMP1, RB, #8 + | cmp TMP1, NARGS8:RC + | str TMP0, [BASE, RB] + | mov RB, TMP1 + | bne <2 + |3: + | cmp TMP2, #1 // (> FF_C) Calling a fast function? + | bhi >5 + |4: + | ins_callt + | + |5: // Tailcall to a fast function with a Lua frame below. + | ldrb RAw, [PC, #-4+OFS_RA] + | sub CARG1, BASE, RA, lsl #3 + | ldr LFUNC:CARG1, [CARG1, #-32] + | and LFUNC:CARG1, CARG1, #LJ_GCVMASK + | ldr CARG1, LFUNC:CARG1->pc + | ldr KBASE, [CARG1, #PC2PROTO(k)] + | b <4 + | + |7: // Tailcall from a vararg function. + | eor PC, PC, #FRAME_VARG + | tst PC, #FRAME_TYPEP // Vararg frame below? + | csel TMP2, RB, TMP2, ne // Clear ffid if no Lua function below. + | bne <1 + | sub BASE, BASE, PC + | ldr PC, [BASE, FRAME_PC] + | tst PC, #FRAME_TYPE + | csel TMP2, RB, TMP2, ne // Clear ffid if no Lua function below. + | b <1 + break; + + case BC_ITERC: + | // RA = base, (RB = nresults+1, RC = nargs+1 (2+1)) + | add RA, BASE, RA, lsl #3 + | ldr CARG3, [RA, #-24] + | mov RB, BASE // Save old BASE for vmeta_call. + | ldp CARG1, CARG2, [RA, #-16] + | add BASE, RA, #16 + | mov NARGS8:RC, #16 // Iterators get 2 arguments. + | str CARG3, [RA] // Copy callable. 
+ | stp CARG1, CARG2, [RA, #16] // Copy state and control var. + | checkfunc CARG3, ->vmeta_call + | ins_call + break; + + case BC_ITERN: + |.if JIT + | hotloop + |.endif + |->vm_IITERN: + | // RA = base, (RB = nresults+1, RC = nargs+1 (2+1)) + | add RA, BASE, RA, lsl #3 + | ldr TAB:RB, [RA, #-16] + | ldrh TMP3w, [PC, # OFS_RD] + | ldr CARG1w, [RA, #-8+LO] // Get index from control var. + | add PC, PC, #4 + | add TMP3, PC, TMP3, lsl #2 + | and TAB:RB, RB, #LJ_GCVMASK + | sub TMP3, TMP3, #0x20000 + | ldr TMP1w, TAB:RB->asize + | ldr CARG2, TAB:RB->array + |1: // Traverse array part. + | subs RC, CARG1, TMP1 + | add CARG3, CARG2, CARG1, lsl #3 + | bhs >5 // Index points after array part? + | ldr TMP0, [CARG3] + | cmp TMP0, TISNIL + | cinc CARG1, CARG1, eq // Skip holes in array part. + | beq <1 + | add CARG1, CARG1, TISNUM + | stp CARG1, TMP0, [RA] + | add CARG1, CARG1, #1 + |3: + | str CARG1w, [RA, #-8+LO] // Update control var. + | mov PC, TMP3 + |4: + | ins_next + | + |5: // Traverse hash part. + | ldr TMP2w, TAB:RB->hmask + | ldr NODE:RB, TAB:RB->node + |6: + | add CARG1, RC, RC, lsl #1 + | cmp RC, TMP2 // End of iteration? Branch to ITERN+1. + | add NODE:CARG3, NODE:RB, CARG1, lsl #3 // node = tab->node + idx*3*8 + | bhi <4 + | ldp TMP0, CARG1, NODE:CARG3->val + | cmp TMP0, TISNIL + | add RC, RC, #1 + | beq <6 // Skip holes in hash part. + | stp CARG1, TMP0, [RA] + | add CARG1, RC, TMP1 + | b <3 + break; + + case BC_ISNEXT: + | // RA = base, RC = target (points to ITERN) + | add RA, BASE, RA, lsl #3 + | ldr CFUNC:CARG1, [RA, #-24] + | add RC, PC, RC, lsl #2 + | ldp TAB:CARG3, CARG4, [RA, #-16] + | sub RC, RC, #0x20000 + | checkfunc CFUNC:CARG1, >5 + | asr TMP0, TAB:CARG3, #47 + | ldrb TMP1w, CFUNC:CARG1->ffid + | cmn TMP0, #-LJ_TTAB + | ccmp CARG4, TISNIL, #0, eq + | ccmp TMP1w, #FF_next_N, #0, eq + | bne >5 + | mov TMP0w, #0xfffe7fff // LJ_KEYINDEX + | lsl TMP0, TMP0, #32 + | str TMP0, [RA, #-8] // Initialize control var. 
+ |1: + | mov PC, RC + | ins_next + | + |5: // Despecialize bytecode if any of the checks fail. + |.if JIT + | ldrb TMP2w, [RC, # OFS_OP] + |.endif + | mov TMP0, #BC_JMP + | mov TMP1, #BC_ITERC + | strb TMP0w, [PC, #-4+OFS_OP] + |.if JIT + | cmp TMP2w, #BC_ITERN + | bne >6 + |.endif + | strb TMP1w, [RC, # OFS_OP] + | b <1 + |.if JIT + |6: // Unpatch JLOOP. + | ldr RA, [GL, #GL_J(trace)] + | ldrh TMP2w, [RC, # OFS_RD] + | ldr TRACE:RA, [RA, TMP2, lsl #3] + | ldr TMP2w, TRACE:RA->startins + | bfxil TMP2w, TMP1w, #0, #8 + | str TMP2w, [RC] + | b <1 + |.endif + break; + + case BC_VARG: + | decode_RB RB, INS + | and RC, RC, #255 + | // RA = base, RB = (nresults+1), RC = numparams + | ldr TMP1, [BASE, FRAME_PC] + | add RC, BASE, RC, lsl #3 + | add RA, BASE, RA, lsl #3 + | add RC, RC, #FRAME_VARG + | add TMP2, RA, RB, lsl #3 + | sub RC, RC, TMP1 // RC = vbase + | // Note: RC may now be even _above_ BASE if nargs was < numparams. + | sub TMP3, BASE, #16 // TMP3 = vtop + | cbz RB, >5 + | sub TMP2, TMP2, #16 + |1: // Copy vararg slots to destination slots. + | cmp RC, TMP3 + | ldr TMP0, [RC], #8 + | csel TMP0, TMP0, TISNIL, lo + | cmp RA, TMP2 + | str TMP0, [RA], #8 + | blo <1 + |2: + | ins_next + | + |5: // Copy all varargs. + | ldr TMP0, L->maxstack + | subs TMP2, TMP3, RC + | csel RB, xzr, TMP2, le // MULTRES = (max(vtop-vbase,0)+1)*8 + | add RB, RB, #8 + | add TMP1, RA, TMP2 + | str RBw, SAVE_MULTRES + | ble <2 // Nothing to copy. + | cmp TMP1, TMP0 + | bhi >7 + |6: + | ldr TMP0, [RC], #8 + | str TMP0, [RA], #8 + | cmp RC, TMP3 + | blo <6 + | b <2 + | + |7: // Grow stack for varargs. + | lsr CARG2, TMP2, #3 + | stp BASE, RA, L->base + | mov CARG1, L + | sub RC, RC, BASE // Need delta, because BASE may change. 
+ | str PC, SAVE_PC + | bl extern lj_state_growstack // (lua_State *L, int n) + | ldp BASE, RA, L->base + | add RC, BASE, RC + | sub TMP3, BASE, #16 + | b <6 + break; + + /* -- Returns ----------------------------------------------------------- */ + + case BC_RETM: + | // RA = results, RC = extra results + | ldr TMP0w, SAVE_MULTRES + | ldr PC, [BASE, FRAME_PC] + | add RA, BASE, RA, lsl #3 + | add RC, TMP0, RC, lsl #3 + | b ->BC_RETM_Z + break; + + case BC_RET: + | // RA = results, RC = nresults+1 + | ldr PC, [BASE, FRAME_PC] + | lsl RC, RC, #3 + | add RA, BASE, RA, lsl #3 + |->BC_RETM_Z: + | str RCw, SAVE_MULTRES + |1: + | ands CARG1, PC, #FRAME_TYPE + | eor CARG2, PC, #FRAME_VARG + | bne ->BC_RETV2_Z + | + |->BC_RET_Z: + | // BASE = base, RA = resultptr, RC = (nresults+1)*8, PC = return + | ldr INSw, [PC, #-4] + | subs TMP1, RC, #8 + | sub CARG3, BASE, #16 + | beq >3 + |2: + | ldr TMP0, [RA], #8 + | add BASE, BASE, #8 + | sub TMP1, TMP1, #8 + | str TMP0, [BASE, #-24] + | cbnz TMP1, <2 + |3: + | decode_RA RA, INS + | sub CARG4, CARG3, RA, lsl #3 + | decode_RB RB, INS + | ldr LFUNC:CARG1, [CARG4, FRAME_FUNC] + |5: + | cmp RC, RB, lsl #3 // More results expected? + | blo >6 + | and LFUNC:CARG1, CARG1, #LJ_GCVMASK + | mov BASE, CARG4 + | ldr CARG2, LFUNC:CARG1->pc + | ldr KBASE, [CARG2, #PC2PROTO(k)] + | ins_next + | + |6: // Fill up results with nil. + | add BASE, BASE, #8 + | add RC, RC, #8 + | str TISNIL, [BASE, #-24] + | b <5 + | + |->BC_RETV1_Z: // Non-standard return case. + | add RA, BASE, RA, lsl #3 + |->BC_RETV2_Z: + | tst CARG2, #FRAME_TYPEP + | bne ->vm_return + | // Return from vararg function: relocate BASE down. 
+ | sub BASE, BASE, CARG2 + | ldr PC, [BASE, FRAME_PC] + | b <1 + break; + + case BC_RET0: case BC_RET1: + | // RA = results, RC = nresults+1 + | ldr PC, [BASE, FRAME_PC] + | lsl RC, RC, #3 + | str RCw, SAVE_MULTRES + | ands CARG1, PC, #FRAME_TYPE + | eor CARG2, PC, #FRAME_VARG + | bne ->BC_RETV1_Z + | ldr INSw, [PC, #-4] + if (op == BC_RET1) { + | ldr TMP0, [BASE, RA, lsl #3] + } + | sub CARG4, BASE, #16 + | decode_RA RA, INS + | sub BASE, CARG4, RA, lsl #3 + if (op == BC_RET1) { + | str TMP0, [CARG4], #8 + } + | decode_RB RB, INS + | ldr LFUNC:CARG1, [BASE, FRAME_FUNC] + |5: + | cmp RC, RB, lsl #3 + | blo >6 + | and LFUNC:CARG1, CARG1, #LJ_GCVMASK + | ldr CARG2, LFUNC:CARG1->pc + | ldr KBASE, [CARG2, #PC2PROTO(k)] + | ins_next + | + |6: // Fill up results with nil. + | add RC, RC, #8 + | str TISNIL, [CARG4], #8 + | b <5 + break; + + /* -- Loops and branches ------------------------------------------------ */ + + |.define FOR_IDX, [RA]; .define FOR_TIDX, [RA, #4] + |.define FOR_STOP, [RA, #8]; .define FOR_TSTOP, [RA, #12] + |.define FOR_STEP, [RA, #16]; .define FOR_TSTEP, [RA, #20] + |.define FOR_EXT, [RA, #24]; .define FOR_TEXT, [RA, #28] + + case BC_FORL: + |.if JIT + | hotloop + |.endif + | // Fall through. Assumes BC_IFORL follows. 
+ break; + + case BC_JFORI: + case BC_JFORL: +#if !LJ_HASJIT + break; +#endif + case BC_FORI: + case BC_IFORL: + | // RA = base, RC = target (after end of loop or start of loop) + vk = (op == BC_IFORL || op == BC_JFORL); + | add RA, BASE, RA, lsl #3 + | ldp CARG1, CARG2, FOR_IDX // CARG1 = IDX, CARG2 = STOP + | ldr CARG3, FOR_STEP // CARG3 = STEP + if (op != BC_JFORL) { + | add RC, PC, RC, lsl #2 + | sub RC, RC, #0x20000 + } + | checkint CARG1, >5 + if (!vk) { + | checkint CARG2, ->vmeta_for + | checkint CARG3, ->vmeta_for + | tbnz CARG3w, #31, >4 + | cmp CARG1w, CARG2w + } else { + | adds CARG1w, CARG1w, CARG3w + | bvs >2 + | add TMP0, CARG1, TISNUM + | tbnz CARG3w, #31, >4 + | cmp CARG1w, CARG2w + } + |1: + if (op == BC_FORI) { + | csel PC, RC, PC, gt + } else if (op == BC_JFORI) { + | mov PC, RC + | ldrh RCw, [RC, #-4+OFS_RD] + } else if (op == BC_IFORL) { + | csel PC, RC, PC, le + } + if (vk) { + | str TMP0, FOR_IDX + | str TMP0, FOR_EXT + } else { + | str CARG1, FOR_EXT + } + if (op == BC_JFORI || op == BC_JFORL) { + | ble =>BC_JLOOP + } + |2: + | ins_next + | + |4: // Invert check for negative step. + | cmp CARG2w, CARG1w + | b <1 + | + |5: // FP loop. + | ldp d0, d1, FOR_IDX + | blo ->vmeta_for + if (!vk) { + | checknum CARG2, ->vmeta_for + | checknum CARG3, ->vmeta_for + | str d0, FOR_EXT + } else { + | ldr d2, FOR_STEP + | fadd d0, d0, d2 + } + | tbnz CARG3, #63, >7 + | fcmp d0, d1 + |6: + if (vk) { + | str d0, FOR_IDX + | str d0, FOR_EXT + } + if (op == BC_FORI) { + | csel PC, RC, PC, hi + } else if (op == BC_JFORI) { + | ldrh RCw, [RC, #-4+OFS_RD] + | bls =>BC_JLOOP + } else if (op == BC_IFORL) { + | csel PC, RC, PC, ls + } else { + | bls =>BC_JLOOP + } + | b <2 + | + |7: // Invert check for negative step. + | fcmp d1, d0 + | b <6 + break; + + case BC_ITERL: + |.if JIT + | hotloop + |.endif + | // Fall through. Assumes BC_IITERL follows. 
+ break;
+
+ case BC_JITERL:
+#if !LJ_HASJIT
+ break;
+#endif
+ case BC_IITERL:
+ | // RA = base, RC = target
+ | ldr CARG1, [BASE, RA, lsl #3]
+ | add TMP1, BASE, RA, lsl #3
+ | cmp CARG1, TISNIL
+ | beq >1 // Stop if iterator returned nil.
+ if (op == BC_JITERL) {
+ | str CARG1, [TMP1, #-8]
+ | b =>BC_JLOOP
+ } else {
+ | add TMP0, PC, RC, lsl #2 // Otherwise save control var + branch.
+ | sub PC, TMP0, #0x20000
+ | str CARG1, [TMP1, #-8]
+ }
+ |1:
+ | ins_next
+ break;
+
+ case BC_LOOP:
+ | // RA = base, RC = target (loop extent)
+ | // Note: RA/RC is only used by trace recorder to determine scope/extent
+ | // This opcode does NOT jump, its only purpose is to detect a hot loop.
+ |.if JIT
+ | hotloop
+ |.endif
+ | // Fall through. Assumes BC_ILOOP follows.
+ break;
+
+ case BC_ILOOP:
+ | // RA = base, RC = target (loop extent)
+ | ins_next
+ break;
+
+ case BC_JLOOP:
+ |.if JIT
+ | // RA = base (ignored), RC = traceno
+ | ldr CARG1, [GL, #GL_J(trace)]
+ | mov CARG2w, #0 // Traces on ARM64 don't store the trace #, so use 0.
+ | ldr TRACE:RC, [CARG1, RC, lsl #3]
+ | st_vmstate CARG2w
+ |.if PAUTH
+ | ldr RA, TRACE:RC->mcauth
+ |.else
+ | ldr RA, TRACE:RC->mcode
+ |.endif
+ | str BASE, GL->jit_base
+ | str L, GL->tmpbuf.L
+ | sub sp, sp, #16 // See SPS_FIXED. Avoids sp adjust in every root trace.
+ |.if PAUTH
+ | braa RA, RC
+ |.else
+ | br RA
+ |.endif
+ |.endif
+ break;
+
+ case BC_JMP:
+ | // RA = base (only used by trace recorder), RC = target
+ | add RC, PC, RC, lsl #2
+ | sub PC, RC, #0x20000
+ | ins_next
+ break;
+
+ /* -- Function headers -------------------------------------------------- */
+
+ case BC_FUNCF:
+ |.if JIT
+ | hotcall
+ |.endif
+ case BC_FUNCV: /* NYI: compiled vararg functions. */
+ | // Fall through. Assumes BC_IFUNCF/BC_IFUNCV follow.
+ break;
+
+ case BC_JFUNCF:
+#if !LJ_HASJIT
+ break;
+#endif
+ case BC_IFUNCF:
+ | // BASE = new base, RA = BASE+framesize*8, CARG3 = LFUNC, RC = nargs*8
+ | ldr CARG1, L->maxstack
+ | ldrb TMP1w, [PC, #-4+PC2PROTO(numparams)]
+ | ldr KBASE, [PC, #-4+PC2PROTO(k)]
+ | cmp RA, CARG1
+ | bhi ->vm_growstack_l
+ |2:
+ | cmp NARGS8:RC, TMP1, lsl #3 // Check for missing parameters.
+ | blo >3
+ if (op == BC_JFUNCF) {
+ | decode_RD RC, INS
+ | b =>BC_JLOOP
+ } else {
+ | ins_next
+ }
+ |
+ |3: // Clear missing parameters.
+ | str TISNIL, [BASE, NARGS8:RC]
+ | add NARGS8:RC, NARGS8:RC, #8
+ | b <2
+ break;
+
+ case BC_JFUNCV:
+#if !LJ_HASJIT
+ break;
+#endif
+ | NYI // NYI: compiled vararg functions
+ break; /* NYI: compiled vararg functions. */
+
+ case BC_IFUNCV:
+ | // BASE = new base, RA = BASE+framesize*8, CARG3 = LFUNC, RC = nargs*8
+ | ldr CARG1, L->maxstack
+ | movn TMP0, #~LJ_TFUNC
+ | add TMP2, BASE, RC
+ | add LFUNC:CARG3, CARG3, TMP0, lsl #47
+ | add RA, RA, RC
+ | add TMP0, RC, #16+FRAME_VARG
+ | str LFUNC:CARG3, [TMP2], #8 // Store (tagged) copy of LFUNC.
+ | ldr KBASE, [PC, #-4+PC2PROTO(k)]
+ | cmp RA, CARG1
+ | str TMP0, [TMP2], #8 // Store delta + FRAME_VARG.
+ | bhs ->vm_growstack_l
+ | sub RC, TMP2, #16
+ | ldrb TMP1w, [PC, #-4+PC2PROTO(numparams)]
+ | mov RA, BASE
+ | mov BASE, TMP2
+ | cbz TMP1, >2
+ |1:
+ | cmp RA, RC // Fewer args than parameters?
+ | bhs >3
+ | ldr TMP0, [RA]
+ | sub TMP1, TMP1, #1
+ | str TISNIL, [RA], #8 // Clear old fixarg slot (help the GC).
+ | str TMP0, [TMP2], #8 + | cbnz TMP1, <1 + |2: + | ins_next + | + |3: + | sub TMP1, TMP1, #1 + | str TISNIL, [TMP2], #8 + | cbz TMP1, <2 + | b <3 + break; + + case BC_FUNCC: + case BC_FUNCCW: + | // BASE = new base, RA = BASE+framesize*8, CARG3 = CFUNC, RC = nargs*8 + if (op == BC_FUNCC) { + | ldr CARG4, CFUNC:CARG3->f + } else { + | ldr CARG4, GL->wrapf + } + | add CARG2, RA, NARGS8:RC + | ldr CARG1, L->maxstack + | add RC, BASE, NARGS8:RC + | cmp CARG2, CARG1 + | stp BASE, RC, L->base + if (op == BC_FUNCCW) { + | ldr CARG2, CFUNC:CARG3->f + } + | mv_vmstate TMP0w, C + | mov CARG1, L + | bhi ->vm_growstack_c // Need to grow stack. + | st_vmstate TMP0w + | blr_auth CARG4 // (lua_State *L [, lua_CFunction f]) + | // Returns nresults. + | ldp BASE, TMP1, L->base + | str L, GL->cur_L + | sbfiz RC, CRET1, #3, #32 + | st_vmstate ST_INTERP + | ldr PC, [BASE, FRAME_PC] + | sub RA, TMP1, RC // RA = L->top - nresults*8 + | b ->vm_returnc + break; + + /* ---------------------------------------------------------------------- */ + + default: + fprintf(stderr, "Error: undefined opcode BC_%s\n", bc_names[op]); + exit(2); + break; + } +} + +static int build_backend(BuildCtx *ctx) +{ + int op; + + dasm_growpc(Dst, BC__MAX); + + build_subroutines(ctx); + + |.code_op + for (op = 0; op < BC__MAX; op++) + build_ins(ctx, (BCOp)op, op); + + return BC__MAX; +} + +/* Emit pseudo frame-info for all assembler functions. */ +static void emit_asm_debug(BuildCtx *ctx) +{ + int fcofs = (int)((uint8_t *)ctx->glob[GLOB_vm_ffi_call] - ctx->code); + int i; + switch (ctx->mode) { + case BUILD_elfasm: + fprintf(ctx->fp, "\t.section .debug_frame,\"\",%%progbits\n"); + fprintf(ctx->fp, + ".Lframe0:\n" + "\t.long .LECIE0-.LSCIE0\n" + ".LSCIE0:\n" + "\t.long 0xffffffff\n" + "\t.byte 0x1\n" + "\t.string \"\"\n" + "\t.uleb128 0x1\n" + "\t.sleb128 -8\n" + "\t.byte 30\n" /* Return address is in lr. 
*/ + "\t.byte 0xc\n\t.uleb128 29\n\t.uleb128 16\n" /* def_cfa fp 16 */ + "\t.align 3\n" + ".LECIE0:\n\n"); + fprintf(ctx->fp, + ".LSFDE0:\n" + "\t.long .LEFDE0-.LASFDE0\n" + ".LASFDE0:\n" + "\t.long .Lframe0\n" + "\t.quad .Lbegin\n" + "\t.quad %d\n" + "\t.byte 0x9e\n\t.uleb128 1\n" /* offset lr */ + "\t.byte 0x9d\n\t.uleb128 2\n", /* offset fp */ + fcofs); + for (i = 19; i <= 28; i++) /* offset x19-x28 */ + fprintf(ctx->fp, "\t.byte 0x%x\n\t.uleb128 %d\n", 0x80+i, i+(3-19)); + for (i = 8; i <= 15; i++) /* offset d8-d15 */ + fprintf(ctx->fp, "\t.byte 5\n\t.uleb128 0x%x\n\t.uleb128 %d\n", + 64+i, i+(3+(28-19+1)-8)); + fprintf(ctx->fp, + "\t.align 3\n" + ".LEFDE0:\n\n"); +#if LJ_HASFFI + fprintf(ctx->fp, + ".LSFDE1:\n" + "\t.long .LEFDE1-.LASFDE1\n" + ".LASFDE1:\n" + "\t.long .Lframe0\n" + "\t.quad lj_vm_ffi_call\n" + "\t.quad %d\n" + "\t.byte 0x9e\n\t.uleb128 1\n" /* offset lr */ + "\t.byte 0x9d\n\t.uleb128 2\n" /* offset fp */ + "\t.byte 0x93\n\t.uleb128 3\n" /* offset x19 */ + "\t.byte 0x94\n\t.uleb128 4\n" /* offset x20 */ + "\t.align 3\n" + ".LEFDE1:\n\n", (int)ctx->codesz - fcofs); +#endif +#if !LJ_NO_UNWIND + fprintf(ctx->fp, "\t.section .eh_frame,\"a\",%%progbits\n"); + fprintf(ctx->fp, + ".Lframe1:\n" + "\t.long .LECIE1-.LSCIE1\n" + ".LSCIE1:\n" + "\t.long 0\n" + "\t.byte 0x1\n" + "\t.string \"zPR\"\n" + "\t.uleb128 0x1\n" + "\t.sleb128 -8\n" + "\t.byte 30\n" /* Return address is in lr. 
*/ + "\t.uleb128 6\n" /* augmentation length */ + "\t.byte 0x1b\n" /* pcrel|sdata4 */ + "\t.long lj_err_unwind_dwarf-.\n" + "\t.byte 0x1b\n" /* pcrel|sdata4 */ + "\t.byte 0xc\n\t.uleb128 29\n\t.uleb128 16\n" /* def_cfa fp 16 */ + "\t.align 3\n" + ".LECIE1:\n\n"); + fprintf(ctx->fp, + ".LSFDE2:\n" + "\t.long .LEFDE2-.LASFDE2\n" + ".LASFDE2:\n" + "\t.long .LASFDE2-.Lframe1\n" + "\t.long .Lbegin-.\n" + "\t.long %d\n" + "\t.uleb128 0\n" /* augmentation length */ + "\t.byte 0x9e\n\t.uleb128 1\n" /* offset lr */ + "\t.byte 0x9d\n\t.uleb128 2\n", /* offset fp */ + fcofs); + for (i = 19; i <= 28; i++) /* offset x19-x28 */ + fprintf(ctx->fp, "\t.byte 0x%x\n\t.uleb128 %d\n", 0x80+i, i+(3-19)); + for (i = 8; i <= 15; i++) /* offset d8-d15 */ + fprintf(ctx->fp, "\t.byte 5\n\t.uleb128 0x%x\n\t.uleb128 %d\n", + 64+i, i+(3+(28-19+1)-8)); + fprintf(ctx->fp, + "\t.align 3\n" + ".LEFDE2:\n\n"); +#if LJ_HASFFI + fprintf(ctx->fp, + ".Lframe2:\n" + "\t.long .LECIE2-.LSCIE2\n" + ".LSCIE2:\n" + "\t.long 0\n" + "\t.byte 0x1\n" + "\t.string \"zR\"\n" + "\t.uleb128 0x1\n" + "\t.sleb128 -8\n" + "\t.byte 30\n" /* Return address is in lr. 
*/ + "\t.uleb128 1\n" /* augmentation length */ + "\t.byte 0x1b\n" /* pcrel|sdata4 */ + "\t.byte 0xc\n\t.uleb128 29\n\t.uleb128 16\n" /* def_cfa fp 16 */ + "\t.align 3\n" + ".LECIE2:\n\n"); + fprintf(ctx->fp, + ".LSFDE3:\n" + "\t.long .LEFDE3-.LASFDE3\n" + ".LASFDE3:\n" + "\t.long .LASFDE3-.Lframe2\n" + "\t.long lj_vm_ffi_call-.\n" + "\t.long %d\n" + "\t.uleb128 0\n" /* augmentation length */ + "\t.byte 0x9e\n\t.uleb128 1\n" /* offset lr */ + "\t.byte 0x9d\n\t.uleb128 2\n" /* offset fp */ + "\t.byte 0x93\n\t.uleb128 3\n" /* offset x19 */ + "\t.byte 0x94\n\t.uleb128 4\n" /* offset x20 */ + "\t.align 3\n" + ".LEFDE3:\n\n", (int)ctx->codesz - fcofs); +#endif +#endif + break; +#if !LJ_NO_UNWIND + case BUILD_machasm: { +#if LJ_HASFFI + int fcsize = 0; +#endif + int j; + fprintf(ctx->fp, "\t.section __TEXT,__eh_frame,coalesced,no_toc+strip_static_syms+live_support\n"); + fprintf(ctx->fp, + "EH_frame1:\n" + "\t.set L$set$x,LECIEX-LSCIEX\n" + "\t.long L$set$x\n" + "LSCIEX:\n" + "\t.long 0\n" + "\t.byte 0x1\n" + "\t.ascii \"zPR\\0\"\n" + "\t.uleb128 0x1\n" + "\t.sleb128 -8\n" + "\t.byte 30\n" /* Return address is in lr. 
*/ + "\t.uleb128 6\n" /* augmentation length */ + "\t.byte 0x9b\n" /* indirect|pcrel|sdata4 */ + "\t.long _lj_err_unwind_dwarf@GOT-.\n" + "\t.byte 0x1b\n" /* pcrel|sdata4 */ + "\t.byte 0xc\n\t.uleb128 29\n\t.uleb128 16\n" /* def_cfa fp 16 */ + "\t.align 3\n" + "LECIEX:\n\n"); + for (j = 0; j < ctx->nsym; j++) { + const char *name = ctx->sym[j].name; + int32_t size = ctx->sym[j+1].ofs - ctx->sym[j].ofs; + if (size == 0) continue; +#if LJ_HASFFI + if (!strcmp(name, "_lj_vm_ffi_call")) { fcsize = size; continue; } +#endif + fprintf(ctx->fp, + "LSFDE%d:\n" + "\t.set L$set$%d,LEFDE%d-LASFDE%d\n" + "\t.long L$set$%d\n" + "LASFDE%d:\n" + "\t.long LASFDE%d-EH_frame1\n" + "\t.long %s-.\n" + "\t.long %d\n" + "\t.uleb128 0\n" /* augmentation length */ + "\t.byte 0x9e\n\t.uleb128 1\n" /* offset lr */ + "\t.byte 0x9d\n\t.uleb128 2\n", /* offset fp */ + j, j, j, j, j, j, j, name, size); + for (i = 19; i <= 28; i++) /* offset x19-x28 */ + fprintf(ctx->fp, "\t.byte 0x%x\n\t.uleb128 %d\n", 0x80+i, i+(3-19)); + for (i = 8; i <= 15; i++) /* offset d8-d15 */ + fprintf(ctx->fp, "\t.byte 5\n\t.uleb128 0x%x\n\t.uleb128 %d\n", + 64+i, i+(3+(28-19+1)-8)); + fprintf(ctx->fp, + "\t.align 3\n" + "LEFDE%d:\n\n", j); + } +#if LJ_HASFFI + if (fcsize) { + fprintf(ctx->fp, + "EH_frame2:\n" + "\t.set L$set$y,LECIEY-LSCIEY\n" + "\t.long L$set$y\n" + "LSCIEY:\n" + "\t.long 0\n" + "\t.byte 0x1\n" + "\t.ascii \"zR\\0\"\n" + "\t.uleb128 0x1\n" + "\t.sleb128 -8\n" + "\t.byte 30\n" /* Return address is in lr. 
*/ + "\t.uleb128 1\n" /* augmentation length */ + "\t.byte 0x1b\n" /* pcrel|sdata4 */ + "\t.byte 0xc\n\t.uleb128 29\n\t.uleb128 16\n" /* def_cfa fp 16 */ + "\t.align 3\n" + "LECIEY:\n\n"); + fprintf(ctx->fp, + "LSFDEY:\n" + "\t.set L$set$yy,LEFDEY-LASFDEY\n" + "\t.long L$set$yy\n" + "LASFDEY:\n" + "\t.long LASFDEY-EH_frame2\n" + "\t.long _lj_vm_ffi_call-.\n" + "\t.long %d\n" + "\t.uleb128 0\n" /* augmentation length */ + "\t.byte 0x9e\n\t.uleb128 1\n" /* offset lr */ + "\t.byte 0x9d\n\t.uleb128 2\n" /* offset fp */ + "\t.byte 0x93\n\t.uleb128 3\n" /* offset x19 */ + "\t.byte 0x94\n\t.uleb128 4\n" /* offset x20 */ + "\t.align 3\n" + "LEFDEY:\n\n", fcsize); + } +#endif + fprintf(ctx->fp, ".subsections_via_symbols\n"); + } + break; +#endif + default: + break; + } +} + diff --cc src/vm_mips.dasc index bfdcfc1e,f6f801f2..94a878b9 --- a/src/vm_mips.dasc +++ b/src/vm_mips.dasc @@@ -1,9 -1,6 +1,9 @@@ |// Low-level VM code for MIPS CPUs. |// Bytecode interpreter, fast functions and helper functions. - |// Copyright (C) 2005-2022 Mike Pall. See Copyright Notice in luajit.h + |// Copyright (C) 2005-2023 Mike Pall. See Copyright Notice in luajit.h +|// +|// MIPS soft-float support contributed by Djordje Kovacevic and +|// Stefan Pejic from RT-RK.com, sponsored by Cisco Systems, Inc. | |.arch mips |.section code_op, code_sub diff --cc src/vm_mips64.dasc index 801087b3,00000000..f8e181ee mode 100644,000000..100644 --- a/src/vm_mips64.dasc +++ b/src/vm_mips64.dasc @@@ -1,5557 -1,0 +1,5557 @@@ +|// Low-level VM code for MIPS64 CPUs. +|// Bytecode interpreter, fast functions and helper functions. - |// Copyright (C) 2005-2022 Mike Pall. See Copyright Notice in luajit.h ++|// Copyright (C) 2005-2023 Mike Pall. See Copyright Notice in luajit.h +|// +|// Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com. +|// Sponsored by Cisco Systems, Inc. 
+| +|.arch mips64 +|.section code_op, code_sub +| +|.actionlist build_actionlist +|.globals GLOB_ +|.globalnames globnames +|.externnames extnames +| +|// Note: The ragged indentation of the instructions is intentional. +|// The starting columns indicate data dependencies. +| +|//----------------------------------------------------------------------- +| +|// Fixed register assignments for the interpreter. +|// Don't use: r0 = 0, r26/r27 = reserved, r28 = gp, r29 = sp, r31 = ra +| +|.macro .FPU, a, b +|.if FPU +| a, b +|.endif +|.endmacro +| +|// The following must be C callee-save (but BASE is often refetched). +|.define BASE, r16 // Base of current Lua stack frame. +|.define KBASE, r17 // Constants of current Lua function. +|.define PC, r18 // Next PC. +|.define DISPATCH, r19 // Opcode dispatch table. +|.define LREG, r20 // Register holding lua_State (also in SAVE_L). +|.define MULTRES, r21 // Size of multi-result: (nresults+1)*8. +| +|.define JGL, r30 // On-trace: global_State + 32768. +| +|// Constants for type-comparisons, stores and conversions. C callee-save. +|.define TISNIL, r30 +|.define TISNUM, r22 +|.if FPU +|.define TOBIT, f30 // 2^52 + 2^51. +|.endif +| +|// The following temporaries are not saved across C calls, except for RA. +|.define RA, r23 // Callee-save. +|.define RB, r8 +|.define RC, r9 +|.define RD, r10 +|.define INS, r11 +| +|.define AT, r1 // Assembler temporary. +|.define TMP0, r12 +|.define TMP1, r13 +|.define TMP2, r14 +|.define TMP3, r15 +| +|// MIPS n64 calling convention. 
+|.define CFUNCADDR, r25 +|.define CARG1, r4 +|.define CARG2, r5 +|.define CARG3, r6 +|.define CARG4, r7 +|.define CARG5, r8 +|.define CARG6, r9 +|.define CARG7, r10 +|.define CARG8, r11 +| +|.define CRET1, r2 +|.define CRET2, r3 +| +|.if FPU +|.define FARG1, f12 +|.define FARG2, f13 +|.define FARG3, f14 +|.define FARG4, f15 +|.define FARG5, f16 +|.define FARG6, f17 +|.define FARG7, f18 +|.define FARG8, f19 +| +|.define FRET1, f0 +|.define FRET2, f2 +| +|.define FTMP0, f20 +|.define FTMP1, f21 +|.define FTMP2, f22 +|.endif +| +|// Stack layout while in interpreter. Must match with lj_frame.h. +|.if FPU // MIPS64 hard-float. +| +|.define CFRAME_SPACE, 192 // Delta for sp. +| +|//----- 16 byte aligned, <-- sp entering interpreter +|.define SAVE_ERRF, 188(sp) // 32 bit values. +|.define SAVE_NRES, 184(sp) +|.define SAVE_CFRAME, 176(sp) // 64 bit values. +|.define SAVE_L, 168(sp) +|.define SAVE_PC, 160(sp) +|//----- 16 byte aligned +|.define SAVE_GPR_, 80 // .. 80+10*8: 64 bit GPR saves. +|.define SAVE_FPR_, 16 // .. 16+8*8: 64 bit FPR saves. +| +|.else // MIPS64 soft-float +| +|.define CFRAME_SPACE, 128 // Delta for sp. +| +|//----- 16 byte aligned, <-- sp entering interpreter +|.define SAVE_ERRF, 124(sp) // 32 bit values. +|.define SAVE_NRES, 120(sp) +|.define SAVE_CFRAME, 112(sp) // 64 bit values. +|.define SAVE_L, 104(sp) +|.define SAVE_PC, 96(sp) +|//----- 16 byte aligned +|.define SAVE_GPR_, 16 // .. 16+10*8: 64 bit GPR saves. +| +|.endif +| +|.define TMPX, 8(sp) // Unused by interpreter, temp for JIT code. 
+|.define TMPD, 0(sp) +|//----- 16 byte aligned +| +|.define TMPD_OFS, 0 +| +|.define SAVE_MULTRES, TMPD +| +|//----------------------------------------------------------------------- +| +|.macro saveregs +| daddiu sp, sp, -CFRAME_SPACE +| sd ra, SAVE_GPR_+9*8(sp) +| sd r30, SAVE_GPR_+8*8(sp) +| .FPU sdc1 f31, SAVE_FPR_+7*8(sp) +| sd r23, SAVE_GPR_+7*8(sp) +| .FPU sdc1 f30, SAVE_FPR_+6*8(sp) +| sd r22, SAVE_GPR_+6*8(sp) +| .FPU sdc1 f29, SAVE_FPR_+5*8(sp) +| sd r21, SAVE_GPR_+5*8(sp) +| .FPU sdc1 f28, SAVE_FPR_+4*8(sp) +| sd r20, SAVE_GPR_+4*8(sp) +| .FPU sdc1 f27, SAVE_FPR_+3*8(sp) +| sd r19, SAVE_GPR_+3*8(sp) +| .FPU sdc1 f26, SAVE_FPR_+2*8(sp) +| sd r18, SAVE_GPR_+2*8(sp) +| .FPU sdc1 f25, SAVE_FPR_+1*8(sp) +| sd r17, SAVE_GPR_+1*8(sp) +| .FPU sdc1 f24, SAVE_FPR_+0*8(sp) +| sd r16, SAVE_GPR_+0*8(sp) +|.endmacro +| +|.macro restoreregs_ret +| ld ra, SAVE_GPR_+9*8(sp) +| ld r30, SAVE_GPR_+8*8(sp) +| ld r23, SAVE_GPR_+7*8(sp) +| .FPU ldc1 f31, SAVE_FPR_+7*8(sp) +| ld r22, SAVE_GPR_+6*8(sp) +| .FPU ldc1 f30, SAVE_FPR_+6*8(sp) +| ld r21, SAVE_GPR_+5*8(sp) +| .FPU ldc1 f29, SAVE_FPR_+5*8(sp) +| ld r20, SAVE_GPR_+4*8(sp) +| .FPU ldc1 f28, SAVE_FPR_+4*8(sp) +| ld r19, SAVE_GPR_+3*8(sp) +| .FPU ldc1 f27, SAVE_FPR_+3*8(sp) +| ld r18, SAVE_GPR_+2*8(sp) +| .FPU ldc1 f26, SAVE_FPR_+2*8(sp) +| ld r17, SAVE_GPR_+1*8(sp) +| .FPU ldc1 f25, SAVE_FPR_+1*8(sp) +| ld r16, SAVE_GPR_+0*8(sp) +| .FPU ldc1 f24, SAVE_FPR_+0*8(sp) +| jr ra +| daddiu sp, sp, CFRAME_SPACE +|.endmacro +| +|// Type definitions. Some of these are only used for documentation. +|.type L, lua_State, LREG +|.type GL, global_State +|.type TVALUE, TValue +|.type GCOBJ, GCobj +|.type STR, GCstr +|.type TAB, GCtab +|.type LFUNC, GCfuncL +|.type CFUNC, GCfuncC +|.type PROTO, GCproto +|.type UPVAL, GCupval +|.type NODE, Node +|.type NARGS8, int +|.type TRACE, GCtrace +|.type SBUF, SBuf +| +|//----------------------------------------------------------------------- +| +|// Trap for not-yet-implemented parts. 
+|.macro NYI; .long 0xec1cf0f0; .endmacro +| +|// Macros to mark delay slots. +|.macro ., a; a; .endmacro +|.macro ., a,b; a,b; .endmacro +|.macro ., a,b,c; a,b,c; .endmacro +|.macro ., a,b,c,d; a,b,c,d; .endmacro +| +|.define FRAME_PC, -8 +|.define FRAME_FUNC, -16 +| +|//----------------------------------------------------------------------- +| +|// Endian-specific defines. +|.if ENDIAN_LE +|.define HI, 4 +|.define LO, 0 +|.define OFS_RD, 2 +|.define OFS_RA, 1 +|.define OFS_OP, 0 +|.else +|.define HI, 0 +|.define LO, 4 +|.define OFS_RD, 0 +|.define OFS_RA, 2 +|.define OFS_OP, 3 +|.endif +| +|// Instruction decode. +|.macro decode_OP1, dst, ins; andi dst, ins, 0xff; .endmacro +|.macro decode_OP8a, dst, ins; andi dst, ins, 0xff; .endmacro +|.macro decode_OP8b, dst; sll dst, dst, 3; .endmacro +|.macro decode_RC8a, dst, ins; srl dst, ins, 13; .endmacro +|.macro decode_RC8b, dst; andi dst, dst, 0x7f8; .endmacro +|.macro decode_RD4b, dst; sll dst, dst, 2; .endmacro +|.macro decode_RA8a, dst, ins; srl dst, ins, 5; .endmacro +|.macro decode_RA8b, dst; andi dst, dst, 0x7f8; .endmacro +|.macro decode_RB8a, dst, ins; srl dst, ins, 21; .endmacro +|.macro decode_RB8b, dst; andi dst, dst, 0x7f8; .endmacro +|.macro decode_RD8a, dst, ins; srl dst, ins, 16; .endmacro +|.macro decode_RD8b, dst; sll dst, dst, 3; .endmacro +|.macro decode_RDtoRC8, dst, src; andi dst, src, 0x7f8; .endmacro +| +|// Instruction fetch. +|.macro ins_NEXT1 +| lw INS, 0(PC) +| daddiu PC, PC, 4 +|.endmacro +|// Instruction decode+dispatch. +|.macro ins_NEXT2 +| decode_OP8a TMP1, INS +| decode_OP8b TMP1 +| daddu TMP0, DISPATCH, TMP1 +| decode_RD8a RD, INS +| ld AT, 0(TMP0) +| decode_RA8a RA, INS +| decode_RD8b RD +| jr AT +| decode_RA8b RA +|.endmacro +|.macro ins_NEXT +| ins_NEXT1 +| ins_NEXT2 +|.endmacro +| +|// Instruction footer. +|.if 1 +| // Replicated dispatch. Less unpredictable branches, but higher I-Cache use. 
+| .define ins_next, ins_NEXT +| .define ins_next_, ins_NEXT +| .define ins_next1, ins_NEXT1 +| .define ins_next2, ins_NEXT2 +|.else +| // Common dispatch. Lower I-Cache use, only one (very) unpredictable branch. +| // Affects only certain kinds of benchmarks (and only with -j off). +| .macro ins_next +| b ->ins_next +| .endmacro +| .macro ins_next1 +| .endmacro +| .macro ins_next2 +| b ->ins_next +| .endmacro +| .macro ins_next_ +| ->ins_next: +| ins_NEXT +| .endmacro +|.endif +| +|// Call decode and dispatch. +|.macro ins_callt +| // BASE = new base, RB = LFUNC/CFUNC, RC = nargs*8, FRAME_PC(BASE) = PC +| ld PC, LFUNC:RB->pc +| lw INS, 0(PC) +| daddiu PC, PC, 4 +| decode_OP8a TMP1, INS +| decode_RA8a RA, INS +| decode_OP8b TMP1 +| decode_RA8b RA +| daddu TMP0, DISPATCH, TMP1 +| ld TMP0, 0(TMP0) +| jr TMP0 +| daddu RA, RA, BASE +|.endmacro +| +|.macro ins_call +| // BASE = new base, RB = LFUNC/CFUNC, RC = nargs*8, PC = caller PC +| sd PC, FRAME_PC(BASE) +| ins_callt +|.endmacro +| +|//----------------------------------------------------------------------- +| +|.macro branch_RD +| srl TMP0, RD, 1 +| lui AT, (-(BCBIAS_J*4 >> 16) & 65535) +| addu TMP0, TMP0, AT +| daddu PC, PC, TMP0 +|.endmacro +| +|// Assumes DISPATCH is relative to GL. +#define DISPATCH_GL(field) (GG_DISP2G + (int)offsetof(global_State, field)) +#define DISPATCH_J(field) (GG_DISP2J + (int)offsetof(jit_State, field)) +#define GG_DISP2GOT (GG_OFS(got) - GG_OFS(dispatch)) +#define DISPATCH_GOT(name) (GG_DISP2GOT + sizeof(void*)*LJ_GOT_##name) +| +#define PC2PROTO(field) ((int)offsetof(GCproto, field)-(int)sizeof(GCproto)) +| +|.macro load_got, func +| ld CFUNCADDR, DISPATCH_GOT(func)(DISPATCH) +|.endmacro +|// Much faster. Sadly, there's no easy way to force the required code layout. 
+|// .macro call_intern, func; bal extern func; .endmacro +|.macro call_intern, func; jalr CFUNCADDR; .endmacro +|.macro call_extern; jalr CFUNCADDR; .endmacro +|.macro jmp_extern; jr CFUNCADDR; .endmacro +| +|.macro hotcheck, delta, target +| dsrl TMP1, PC, 1 +| andi TMP1, TMP1, 126 +| daddu TMP1, TMP1, DISPATCH +| lhu TMP2, GG_DISP2HOT(TMP1) +| addiu TMP2, TMP2, -delta +| bltz TMP2, target +|. sh TMP2, GG_DISP2HOT(TMP1) +|.endmacro +| +|.macro hotloop +| hotcheck HOTCOUNT_LOOP, ->vm_hotloop +|.endmacro +| +|.macro hotcall +| hotcheck HOTCOUNT_CALL, ->vm_hotcall +|.endmacro +| +|// Set current VM state. Uses TMP0. +|.macro li_vmstate, st; li TMP0, ~LJ_VMST_..st; .endmacro +|.macro st_vmstate; sw TMP0, DISPATCH_GL(vmstate)(DISPATCH); .endmacro +| +|// Move table write barrier back. Overwrites mark and tmp. +|.macro barrierback, tab, mark, tmp, target +| ld tmp, DISPATCH_GL(gc.grayagain)(DISPATCH) +| andi mark, mark, ~LJ_GC_BLACK & 255 // black2gray(tab) +| sd tab, DISPATCH_GL(gc.grayagain)(DISPATCH) +| sb mark, tab->marked +| b target +|. sd tmp, tab->gclist +|.endmacro +| +|// Clear type tag. Isolate lowest 14+32+1=47 bits of reg. +|.macro cleartp, reg; dextm reg, reg, 0, 14; .endmacro +|.macro cleartp, dst, reg; dextm dst, reg, 0, 14; .endmacro +| +|// Set type tag: Merge 17 type bits into bits [15+32=47, 31+32+1=64) of dst. +|.macro settp, dst, tp; dinsu dst, tp, 15, 31; .endmacro +| +|// Extract (negative) type tag. +|.macro gettp, dst, src; dsra dst, src, 47; .endmacro +| +|// Macros to check the TValue type and extract the GCobj. Branch on failure. +|.macro checktp, reg, tp, target +| gettp AT, reg +| daddiu AT, AT, tp +| bnez AT, target +|. cleartp reg +|.endmacro +|.macro checktp, dst, reg, tp, target +| gettp AT, reg +| daddiu AT, AT, tp +| bnez AT, target +|. 
cleartp dst, reg +|.endmacro +|.macro checkstr, reg, target; checktp reg, -LJ_TSTR, target; .endmacro +|.macro checktab, reg, target; checktp reg, -LJ_TTAB, target; .endmacro +|.macro checkfunc, reg, target; checktp reg, -LJ_TFUNC, target; .endmacro +|.macro checkint, reg, target // Caveat: has delay slot! +| gettp AT, reg +| bne AT, TISNUM, target +|.endmacro +|.macro checknum, reg, target // Caveat: has delay slot! +| gettp AT, reg +| sltiu AT, AT, LJ_TISNUM +| beqz AT, target +|.endmacro +| +|.macro mov_false, reg +| lu reg, 0x8000 +| dsll reg, reg, 32 +| not reg, reg +|.endmacro +|.macro mov_true, reg +| li reg, 0x0001 +| dsll reg, reg, 48 +| not reg, reg +|.endmacro +| +|//----------------------------------------------------------------------- + +/* Generate subroutines used by opcodes and other parts of the VM. */ +/* The .code_sub section should be last to help static branch prediction. */ +static void build_subroutines(BuildCtx *ctx) +{ + |.code_sub + | + |//----------------------------------------------------------------------- + |//-- Return handling ---------------------------------------------------- + |//----------------------------------------------------------------------- + | + |->vm_returnp: + | // See vm_return. Also: TMP2 = previous base. + | andi AT, PC, FRAME_P + | beqz AT, ->cont_dispatch + | + | // Return from pcall or xpcall fast func. + |. mov_true TMP1 + | ld PC, FRAME_PC(TMP2) // Fetch PC of previous frame. + | move BASE, TMP2 // Restore caller base. + | // Prepending may overwrite the pcall frame, so do it at the end. + | sd TMP1, -8(RA) // Prepend true to results. + | daddiu RA, RA, -8 + | + |->vm_returnc: + | addiu RD, RD, 8 // RD = (nresults+1)*8. + | andi TMP0, PC, FRAME_TYPE + | beqz RD, ->vm_unwind_c_eh + |. li CRET1, LUA_YIELD + | beqz TMP0, ->BC_RET_Z // Handle regular return to Lua. + |. 
move MULTRES, RD + | + |->vm_return: + | // BASE = base, RA = resultptr, RD/MULTRES = (nresults+1)*8, PC = return + | // TMP0 = PC & FRAME_TYPE + | li TMP2, -8 + | xori AT, TMP0, FRAME_C + | and TMP2, PC, TMP2 + | bnez AT, ->vm_returnp + | dsubu TMP2, BASE, TMP2 // TMP2 = previous base. + | + | addiu TMP1, RD, -8 + | sd TMP2, L->base + | li_vmstate C + | lw TMP2, SAVE_NRES + | daddiu BASE, BASE, -16 + | st_vmstate + | beqz TMP1, >2 + |. sll TMP2, TMP2, 3 + |1: + | addiu TMP1, TMP1, -8 + | ld CRET1, 0(RA) + | daddiu RA, RA, 8 + | sd CRET1, 0(BASE) + | bnez TMP1, <1 + |. daddiu BASE, BASE, 8 + | + |2: + | bne TMP2, RD, >6 + |3: + |. sd BASE, L->top // Store new top. + | + |->vm_leave_cp: + | ld TMP0, SAVE_CFRAME // Restore previous C frame. + | move CRET1, r0 // Ok return status for vm_pcall. + | sd TMP0, L->cframe + | + |->vm_leave_unw: + | restoreregs_ret + | + |6: + | ld TMP1, L->maxstack + | slt AT, TMP2, RD + | bnez AT, >7 // Less results wanted? + | // More results wanted. Check stack size and fill up results with nil. + |. slt AT, BASE, TMP1 + | beqz AT, >8 + |. nop + | sd TISNIL, 0(BASE) + | addiu RD, RD, 8 + | b <2 + |. daddiu BASE, BASE, 8 + | + |7: // Less results wanted. + | subu TMP0, RD, TMP2 + | dsubu TMP0, BASE, TMP0 // Either keep top or shrink it. + |.if MIPSR6 + | selnez TMP0, TMP0, TMP2 // LUA_MULTRET+1 case? + | seleqz BASE, BASE, TMP2 + | b <3 + |. or BASE, BASE, TMP0 + |.else + | b <3 + |. movn BASE, TMP0, TMP2 // LUA_MULTRET+1 case? + |.endif + | + |8: // Corner case: need to grow stack for filling up results. + | // This can happen if: + | // - A C function grows the stack (a lot). + | // - The GC shrinks the stack in between. + | // - A return back from a lua_call() with (high) nresults adjustment. + | load_got lj_state_growstack + | move MULTRES, RD + | srl CARG2, TMP2, 3 + | call_intern lj_state_growstack // (lua_State *L, int n) + |. move CARG1, L + | lw TMP2, SAVE_NRES + | ld BASE, L->top // Need the (realloced) L->top in BASE. 
+ | move RD, MULTRES + | b <2 + |. sll TMP2, TMP2, 3 + | + |->vm_unwind_c: // Unwind C stack, return from vm_pcall. + | // (void *cframe, int errcode) + | move sp, CARG1 + | move CRET1, CARG2 + |->vm_unwind_c_eh: // Landing pad for external unwinder. + | ld L, SAVE_L + | li TMP0, ~LJ_VMST_C + | ld GL:TMP1, L->glref + | b ->vm_leave_unw + |. sw TMP0, GL:TMP1->vmstate + | + |->vm_unwind_ff: // Unwind C stack, return from ff pcall. + | // (void *cframe) + | li AT, -4 + | and sp, CARG1, AT + |->vm_unwind_ff_eh: // Landing pad for external unwinder. + | ld L, SAVE_L + | .FPU lui TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float). + | li TISNIL, LJ_TNIL + | li TISNUM, LJ_TISNUM + | ld BASE, L->base + | ld DISPATCH, L->glref // Setup pointer to dispatch table. + | .FPU mtc1 TMP3, TOBIT + | mov_false TMP1 + | li_vmstate INTERP + | ld PC, FRAME_PC(BASE) // Fetch PC of previous frame. + | .FPU cvt.d.s TOBIT, TOBIT + | daddiu RA, BASE, -8 // Results start at BASE-8. + | daddiu DISPATCH, DISPATCH, GG_G2DISP + | sd TMP1, 0(RA) // Prepend false to error message. + | st_vmstate + | b ->vm_returnc + |. li RD, 16 // 2 results: false + error message. + | + |->vm_unwind_stub: // Jump to exit stub from unwinder. + | jr CARG1 + |. move ra, CARG2 + | + |//----------------------------------------------------------------------- + |//-- Grow stack for calls ----------------------------------------------- + |//----------------------------------------------------------------------- + | + |->vm_growstack_c: // Grow stack for C function. + | b >2 + |. li CARG2, LUA_MINSTACK + | + |->vm_growstack_l: // Grow stack for Lua function. + | // BASE = new base, RA = BASE+framesize*8, RC = nargs*8, PC = first PC + | daddu RC, BASE, RC + | dsubu RA, RA, BASE + | sd BASE, L->base + | daddiu PC, PC, 4 // Must point after first instruction. 
+ | sd RC, L->top + | srl CARG2, RA, 3 + |2: + | // L->base = new base, L->top = top + | load_got lj_state_growstack + | sd PC, SAVE_PC + | call_intern lj_state_growstack // (lua_State *L, int n) + |. move CARG1, L + | ld BASE, L->base + | ld RC, L->top + | ld LFUNC:RB, FRAME_FUNC(BASE) + | dsubu RC, RC, BASE + | cleartp LFUNC:RB + | // BASE = new base, RB = LFUNC/CFUNC, RC = nargs*8, FRAME_PC(BASE) = PC + | ins_callt // Just retry the call. + | + |//----------------------------------------------------------------------- + |//-- Entry points into the assembler VM --------------------------------- + |//----------------------------------------------------------------------- + | + |->vm_resume: // Setup C frame and resume thread. + | // (lua_State *L, TValue *base, int nres1 = 0, ptrdiff_t ef = 0) + | saveregs + | move L, CARG1 + | ld DISPATCH, L->glref // Setup pointer to dispatch table. + | move BASE, CARG2 + | lbu TMP1, L->status + | sd L, SAVE_L + | li PC, FRAME_CP + | daddiu TMP0, sp, CFRAME_RESUME + | daddiu DISPATCH, DISPATCH, GG_G2DISP + | sw r0, SAVE_NRES + | sw r0, SAVE_ERRF + | sd CARG1, SAVE_PC // Any value outside of bytecode is ok. + | sd r0, SAVE_CFRAME + | beqz TMP1, >3 + |. sd TMP0, L->cframe + | + | // Resume after yield (like a return). + | sd L, DISPATCH_GL(cur_L)(DISPATCH) + | move RA, BASE + | ld BASE, L->base + | ld TMP1, L->top + | ld PC, FRAME_PC(BASE) + | .FPU lui TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float). + | dsubu RD, TMP1, BASE + | .FPU mtc1 TMP3, TOBIT + | sb r0, L->status + | .FPU cvt.d.s TOBIT, TOBIT + | li_vmstate INTERP + | daddiu RD, RD, 8 + | st_vmstate + | move MULTRES, RD + | andi TMP0, PC, FRAME_TYPE + | li TISNIL, LJ_TNIL + | beqz TMP0, ->BC_RET_Z + |. li TISNUM, LJ_TISNUM + | b ->vm_return + |. nop + | + |->vm_pcall: // Setup protected C frame and enter VM. + | // (lua_State *L, TValue *base, int nres1, ptrdiff_t ef) + | saveregs + | sw CARG4, SAVE_ERRF + | b >1 + |. 
li PC, FRAME_CP + | + |->vm_call: // Setup C frame and enter VM. + | // (lua_State *L, TValue *base, int nres1) + | saveregs + | li PC, FRAME_C + | + |1: // Entry point for vm_pcall above (PC = ftype). + | ld TMP1, L:CARG1->cframe + | move L, CARG1 + | sw CARG3, SAVE_NRES + | ld DISPATCH, L->glref // Setup pointer to dispatch table. + | sd CARG1, SAVE_L + | move BASE, CARG2 + | daddiu DISPATCH, DISPATCH, GG_G2DISP + | sd CARG1, SAVE_PC // Any value outside of bytecode is ok. + | sd TMP1, SAVE_CFRAME + | sd sp, L->cframe // Add our C frame to cframe chain. + | + |3: // Entry point for vm_cpcall/vm_resume (BASE = base, PC = ftype). + | sd L, DISPATCH_GL(cur_L)(DISPATCH) + | ld TMP2, L->base // TMP2 = old base (used in vmeta_call). + | .FPU lui TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float). + | ld TMP1, L->top + | .FPU mtc1 TMP3, TOBIT + | daddu PC, PC, BASE + | dsubu NARGS8:RC, TMP1, BASE + | li TISNUM, LJ_TISNUM + | dsubu PC, PC, TMP2 // PC = frame delta + frame type + | .FPU cvt.d.s TOBIT, TOBIT + | li_vmstate INTERP + | li TISNIL, LJ_TNIL + | st_vmstate + | + |->vm_call_dispatch: + | // TMP2 = old base, BASE = new base, RC = nargs*8, PC = caller PC + | ld LFUNC:RB, FRAME_FUNC(BASE) + | checkfunc LFUNC:RB, ->vmeta_call + | + |->vm_call_dispatch_f: + | ins_call + | // BASE = new base, RB = func, RC = nargs*8, PC = caller PC + | + |->vm_cpcall: // Setup protected C frame, call C. + | // (lua_State *L, lua_CFunction func, void *ud, lua_CPFunction cp) + | saveregs + | move L, CARG1 + | ld TMP0, L:CARG1->stack + | sd CARG1, SAVE_L + | ld TMP1, L->top + | ld DISPATCH, L->glref // Setup pointer to dispatch table. + | sd CARG1, SAVE_PC // Any value outside of bytecode is ok. + | dsubu TMP0, TMP0, TMP1 // Compute -savestack(L, L->top). + | ld TMP1, L->cframe + | daddiu DISPATCH, DISPATCH, GG_G2DISP + | sw TMP0, SAVE_NRES // Neg. delta means cframe w/o frame. + | sw r0, SAVE_ERRF // No error function. 
+ | sd TMP1, SAVE_CFRAME + | sd sp, L->cframe // Add our C frame to cframe chain. + | sd L, DISPATCH_GL(cur_L)(DISPATCH) + | jalr CARG4 // (lua_State *L, lua_CFunction func, void *ud) + |. move CFUNCADDR, CARG4 + | move BASE, CRET1 + | bnez CRET1, <3 // Else continue with the call. + |. li PC, FRAME_CP + | b ->vm_leave_cp // No base? Just remove C frame. + |. nop + | + |//----------------------------------------------------------------------- + |//-- Metamethod handling ------------------------------------------------ + |//----------------------------------------------------------------------- + | + |// The lj_meta_* functions (except for lj_meta_cat) don't reallocate the + |// stack, so BASE doesn't need to be reloaded across these calls. + | + |//-- Continuation dispatch ---------------------------------------------- + | + |->cont_dispatch: + | // BASE = meta base, RA = resultptr, RD = (nresults+1)*8 + | ld TMP0, -32(BASE) // Continuation. + | move RB, BASE + | move BASE, TMP2 // Restore caller BASE. + | ld LFUNC:TMP1, FRAME_FUNC(TMP2) + |.if FFI + | sltiu AT, TMP0, 2 + |.endif + | ld PC, -24(RB) // Restore PC from [cont|PC]. + | cleartp LFUNC:TMP1 + | daddu TMP2, RA, RD + |.if FFI + | bnez AT, >1 + |.endif + |. sd TISNIL, -8(TMP2) // Ensure one valid arg. + | ld TMP1, LFUNC:TMP1->pc + | // BASE = base, RA = resultptr, RB = meta base + | jr TMP0 // Jump to continuation. + |. ld KBASE, PC2PROTO(k)(TMP1) + | + |.if FFI + |1: + | bnez TMP0, ->cont_ffi_callback // cont = 1: return from FFI callback. + | // cont = 0: tailcall from C function. + |. daddiu TMP1, RB, -32 + | b ->vm_call_tail + |. dsubu RC, TMP1, BASE + |.endif + | + |->cont_cat: // RA = resultptr, RB = meta base + | lw INS, -4(PC) + | daddiu CARG2, RB, -32 + | ld CRET1, 0(RA) + | decode_RB8a MULTRES, INS + | decode_RA8a RA, INS + | decode_RB8b MULTRES + | decode_RA8b RA + | daddu TMP1, BASE, MULTRES + | sd BASE, L->base + | dsubu CARG3, CARG2, TMP1 + | bne TMP1, CARG2, ->BC_CAT_Z + |. 
sd CRET1, 0(CARG2) + | daddu RA, BASE, RA + | b ->cont_nop + |. sd CRET1, 0(RA) + | + |//-- Table indexing metamethods ----------------------------------------- + | + |->vmeta_tgets1: + | daddiu CARG3, DISPATCH, DISPATCH_GL(tmptv) + | li TMP0, LJ_TSTR + | settp STR:RC, TMP0 + | b >1 + |. sd STR:RC, 0(CARG3) + | + |->vmeta_tgets: + | daddiu CARG2, DISPATCH, DISPATCH_GL(tmptv) + | li TMP0, LJ_TTAB + | li TMP1, LJ_TSTR + | settp TAB:RB, TMP0 + | daddiu CARG3, DISPATCH, DISPATCH_GL(tmptv2) + | sd TAB:RB, 0(CARG2) + | settp STR:RC, TMP1 + | b >1 + |. sd STR:RC, 0(CARG3) + | + |->vmeta_tgetb: // TMP0 = index + | daddiu CARG3, DISPATCH, DISPATCH_GL(tmptv) + | settp TMP0, TISNUM + | sd TMP0, 0(CARG3) + | + |->vmeta_tgetv: + |1: + | load_got lj_meta_tget + | sd BASE, L->base + | sd PC, SAVE_PC + | call_intern lj_meta_tget // (lua_State *L, TValue *o, TValue *k) + |. move CARG1, L + | // Returns TValue * (finished) or NULL (metamethod). + | beqz CRET1, >3 + |. daddiu TMP1, BASE, -FRAME_CONT + | ld CARG1, 0(CRET1) + | ins_next1 + | sd CARG1, 0(RA) + | ins_next2 + | + |3: // Call __index metamethod. + | // BASE = base, L->top = new base, stack = cont/func/t/k + | ld BASE, L->top + | sd PC, -24(BASE) // [cont|PC] + | dsubu PC, BASE, TMP1 + | ld LFUNC:RB, FRAME_FUNC(BASE) // Guaranteed to be a function here. + | cleartp LFUNC:RB + | b ->vm_call_dispatch_f + |. li NARGS8:RC, 16 // 2 args for func(t, k). + | + |->vmeta_tgetr: + | load_got lj_tab_getinth + | call_intern lj_tab_getinth // (GCtab *t, int32_t key) + |. nop + | // Returns cTValue * or NULL. + | beqz CRET1, ->BC_TGETR_Z + |. move CARG2, TISNIL + | b ->BC_TGETR_Z + |. ld CARG2, 0(CRET1) + | + |//----------------------------------------------------------------------- + | + |->vmeta_tsets1: + | daddiu CARG3, DISPATCH, DISPATCH_GL(tmptv) + | li TMP0, LJ_TSTR + | settp STR:RC, TMP0 + | b >1 + |. 
sd STR:RC, 0(CARG3) + | + |->vmeta_tsets: + | daddiu CARG2, DISPATCH, DISPATCH_GL(tmptv) + | li TMP0, LJ_TTAB + | li TMP1, LJ_TSTR + | settp TAB:RB, TMP0 + | daddiu CARG3, DISPATCH, DISPATCH_GL(tmptv2) + | sd TAB:RB, 0(CARG2) + | settp STR:RC, TMP1 + | b >1 + |. sd STR:RC, 0(CARG3) + | + |->vmeta_tsetb: // TMP0 = index + | daddiu CARG3, DISPATCH, DISPATCH_GL(tmptv) + | settp TMP0, TISNUM + | sd TMP0, 0(CARG3) + | + |->vmeta_tsetv: + |1: + | load_got lj_meta_tset + | sd BASE, L->base + | sd PC, SAVE_PC + | call_intern lj_meta_tset // (lua_State *L, TValue *o, TValue *k) + |. move CARG1, L + | // Returns TValue * (finished) or NULL (metamethod). + | beqz CRET1, >3 + |. ld CARG1, 0(RA) + | // NOBARRIER: lj_meta_tset ensures the table is not black. + | ins_next1 + | sd CARG1, 0(CRET1) + | ins_next2 + | + |3: // Call __newindex metamethod. + | // BASE = base, L->top = new base, stack = cont/func/t/k/(v) + | daddiu TMP1, BASE, -FRAME_CONT + | ld BASE, L->top + | sd PC, -24(BASE) // [cont|PC] + | dsubu PC, BASE, TMP1 + | ld LFUNC:RB, FRAME_FUNC(BASE) // Guaranteed to be a function here. + | cleartp LFUNC:RB + | sd CARG1, 16(BASE) // Copy value to third argument. + | b ->vm_call_dispatch_f + |. li NARGS8:RC, 24 // 3 args for func(t, k, v) + | + |->vmeta_tsetr: + | load_got lj_tab_setinth + | sd BASE, L->base + | sd PC, SAVE_PC + | call_intern lj_tab_setinth // (lua_State *L, GCtab *t, int32_t key) + |. move CARG1, L + | // Returns TValue *. + | b ->BC_TSETR_Z + |. nop + | + |//-- Comparison metamethods --------------------------------------------- + | + |->vmeta_comp: + | // RA/RD point to o1/o2. + | move CARG2, RA + | move CARG3, RD + | load_got lj_meta_comp + | daddiu PC, PC, -4 + | sd BASE, L->base + | sd PC, SAVE_PC + | decode_OP1 CARG4, INS + | call_intern lj_meta_comp // (lua_State *L, TValue *o1, *o2, int op) + |. move CARG1, L + | // Returns 0/1 or TValue * (metamethod). 
+ |3: + | sltiu AT, CRET1, 2 + | beqz AT, ->vmeta_binop + | negu TMP2, CRET1 + |4: + | lhu RD, OFS_RD(PC) + | daddiu PC, PC, 4 + | lui TMP1, (-(BCBIAS_J*4 >> 16) & 65535) + | sll RD, RD, 2 + | addu RD, RD, TMP1 + | and RD, RD, TMP2 + | daddu PC, PC, RD + |->cont_nop: + | ins_next + | + |->cont_ra: // RA = resultptr + | lbu TMP1, -4+OFS_RA(PC) + | ld CRET1, 0(RA) + | sll TMP1, TMP1, 3 + | daddu TMP1, BASE, TMP1 + | b ->cont_nop + |. sd CRET1, 0(TMP1) + | + |->cont_condt: // RA = resultptr + | ld TMP0, 0(RA) + | gettp TMP0, TMP0 + | sltiu AT, TMP0, LJ_TISTRUECOND + | b <4 + |. negu TMP2, AT // Branch if result is true. + | + |->cont_condf: // RA = resultptr + | ld TMP0, 0(RA) + | gettp TMP0, TMP0 + | sltiu AT, TMP0, LJ_TISTRUECOND + | b <4 + |. addiu TMP2, AT, -1 // Branch if result is false. + | + |->vmeta_equal: + | // CARG1/CARG2 point to o1/o2. TMP0 is set to 0/1. + | load_got lj_meta_equal + | cleartp LFUNC:CARG3, CARG2 + | cleartp LFUNC:CARG2, CARG1 + | move CARG4, TMP0 + | daddiu PC, PC, -4 + | sd BASE, L->base + | sd PC, SAVE_PC + | call_intern lj_meta_equal // (lua_State *L, GCobj *o1, *o2, int ne) + |. move CARG1, L + | // Returns 0/1 or TValue * (metamethod). + | b <3 + |. nop + | + |->vmeta_equal_cd: + |.if FFI + | load_got lj_meta_equal_cd + | move CARG2, INS + | daddiu PC, PC, -4 + | sd BASE, L->base + | sd PC, SAVE_PC + | call_intern lj_meta_equal_cd // (lua_State *L, BCIns op) + |. move CARG1, L + | // Returns 0/1 or TValue * (metamethod). + | b <3 + |. nop + |.endif + | + |->vmeta_istype: + | load_got lj_meta_istype + | daddiu PC, PC, -4 + | sd BASE, L->base + | srl CARG2, RA, 3 + | srl CARG3, RD, 3 + | sd PC, SAVE_PC + | call_intern lj_meta_istype // (lua_State *L, BCReg ra, BCReg tp) + |. move CARG1, L + | b ->cont_nop + |. 
nop + | + |//-- Arithmetic metamethods --------------------------------------------- + | + |->vmeta_unm: + | move RC, RB + | + |->vmeta_arith: + | load_got lj_meta_arith + | sd BASE, L->base + | move CARG2, RA + | sd PC, SAVE_PC + | move CARG3, RB + | move CARG4, RC + | decode_OP1 CARG5, INS // CARG5 == RB. + | call_intern lj_meta_arith // (lua_State *L, TValue *ra,*rb,*rc, BCReg op) + |. move CARG1, L + | // Returns NULL (finished) or TValue * (metamethod). + | beqz CRET1, ->cont_nop + |. nop + | + | // Call metamethod for binary op. + |->vmeta_binop: + | // BASE = old base, CRET1 = new base, stack = cont/func/o1/o2 + | dsubu TMP1, CRET1, BASE + | sd PC, -24(CRET1) // [cont|PC] + | move TMP2, BASE + | daddiu PC, TMP1, FRAME_CONT + | move BASE, CRET1 + | b ->vm_call_dispatch + |. li NARGS8:RC, 16 // 2 args for func(o1, o2). + | + |->vmeta_len: + | // CARG2 already set by BC_LEN. +#if LJ_52 + | move MULTRES, CARG1 +#endif + | load_got lj_meta_len + | sd BASE, L->base + | sd PC, SAVE_PC + | call_intern lj_meta_len // (lua_State *L, TValue *o) + |. move CARG1, L + | // Returns NULL (retry) or TValue * (metamethod base). +#if LJ_52 + | bnez CRET1, ->vmeta_binop // Binop call for compatibility. + |. nop + | b ->BC_LEN_Z + |. move CARG1, MULTRES +#else + | b ->vmeta_binop // Binop call for compatibility. + |. nop +#endif + | + |//-- Call metamethod ---------------------------------------------------- + | + |->vmeta_call: // Resolve and call __call metamethod. + | // TMP2 = old base, BASE = new base, RC = nargs*8 + | load_got lj_meta_call + | sd TMP2, L->base // This is the callers base! + | daddiu CARG2, BASE, -16 + | sd PC, SAVE_PC + | daddu CARG3, BASE, RC + | move MULTRES, NARGS8:RC + | call_intern lj_meta_call // (lua_State *L, TValue *func, TValue *top) + |. move CARG1, L + | ld LFUNC:RB, FRAME_FUNC(BASE) // Guaranteed to be a function here. + | daddiu NARGS8:RC, MULTRES, 8 // Got one more argument now. 
+ | cleartp LFUNC:RB + | ins_call + | + |->vmeta_callt: // Resolve __call for BC_CALLT. + | // BASE = old base, RA = new base, RC = nargs*8 + | load_got lj_meta_call + | sd BASE, L->base + | daddiu CARG2, RA, -16 + | sd PC, SAVE_PC + | daddu CARG3, RA, RC + | move MULTRES, NARGS8:RC + | call_intern lj_meta_call // (lua_State *L, TValue *func, TValue *top) + |. move CARG1, L + | ld RB, FRAME_FUNC(RA) // Guaranteed to be a function here. + | ld TMP1, FRAME_PC(BASE) + | daddiu NARGS8:RC, MULTRES, 8 // Got one more argument now. + | b ->BC_CALLT_Z + |. cleartp LFUNC:CARG3, RB + | + |//-- Argument coercion for 'for' statement ------------------------------ + | + |->vmeta_for: + | load_got lj_meta_for + | sd BASE, L->base + | move CARG2, RA + | sd PC, SAVE_PC + | move MULTRES, INS + | call_intern lj_meta_for // (lua_State *L, TValue *base) + |. move CARG1, L + |.if JIT + | decode_OP1 TMP0, MULTRES + | li AT, BC_JFORI + |.endif + | decode_RA8a RA, MULTRES + | decode_RD8a RD, MULTRES + | decode_RA8b RA + |.if JIT + | beq TMP0, AT, =>BC_JFORI + |. decode_RD8b RD + | b =>BC_FORI + |. nop + |.else + | b =>BC_FORI + |. decode_RD8b RD + |.endif + | + |//----------------------------------------------------------------------- + |//-- Fast functions ----------------------------------------------------- + |//----------------------------------------------------------------------- + | + |.macro .ffunc, name + |->ff_ .. name: + |.endmacro + | + |.macro .ffunc_1, name + |->ff_ .. name: + | beqz NARGS8:RC, ->fff_fallback + |. ld CARG1, 0(BASE) + |.endmacro + | + |.macro .ffunc_2, name + |->ff_ .. name: + | sltiu AT, NARGS8:RC, 16 + | ld CARG1, 0(BASE) + | bnez AT, ->fff_fallback + |. ld CARG2, 8(BASE) + |.endmacro + | + |.macro .ffunc_n, name // Caveat: has delay slot! + |->ff_ .. name: + | ld CARG1, 0(BASE) + | beqz NARGS8:RC, ->fff_fallback + | // Either ldc1 or the 1st instruction of checknum is in the delay slot. 
+ | .FPU ldc1 FARG1, 0(BASE) + | checknum CARG1, ->fff_fallback + |.endmacro + | + |.macro .ffunc_nn, name // Caveat: has delay slot! + |->ff_ .. name: + | ld CARG1, 0(BASE) + | sltiu AT, NARGS8:RC, 16 + | ld CARG2, 8(BASE) + | bnez AT, ->fff_fallback + |. gettp TMP0, CARG1 + | gettp TMP1, CARG2 + | sltiu TMP0, TMP0, LJ_TISNUM + | sltiu TMP1, TMP1, LJ_TISNUM + | .FPU ldc1 FARG1, 0(BASE) + | and TMP0, TMP0, TMP1 + | .FPU ldc1 FARG2, 8(BASE) + | beqz TMP0, ->fff_fallback + |.endmacro + | + |// Inlined GC threshold check. Caveat: uses TMP0 and TMP1 and has delay slot! + |// MIPSR6: no delay slot, but a forbidden slot. + |.macro ffgccheck + | ld TMP0, DISPATCH_GL(gc.total)(DISPATCH) + | ld TMP1, DISPATCH_GL(gc.threshold)(DISPATCH) + | dsubu AT, TMP0, TMP1 + |.if MIPSR6 + | bgezalc AT, ->fff_gcstep + |.else + | bgezal AT, ->fff_gcstep + |.endif + |.endmacro + | + |//-- Base library: checks ----------------------------------------------- + |.ffunc_1 assert + | gettp AT, CARG1 + | sltiu AT, AT, LJ_TISTRUECOND + | beqz AT, ->fff_fallback + |. daddiu RA, BASE, -16 + | ld PC, FRAME_PC(BASE) + | addiu RD, NARGS8:RC, 8 // Compute (nresults+1)*8. + | daddu TMP2, RA, RD + | daddiu TMP1, BASE, 8 + | beq BASE, TMP2, ->fff_res // Done if exactly 1 argument. + |. sd CARG1, 0(RA) + |1: + | ld CRET1, 0(TMP1) + | sd CRET1, -16(TMP1) + | bne TMP1, TMP2, <1 + |. daddiu TMP1, TMP1, 8 + | b ->fff_res + |. nop + | + |.ffunc_1 type + | gettp TMP0, CARG1 + | sltu TMP1, TISNUM, TMP0 + | not TMP2, TMP0 + | li TMP3, ~LJ_TISNUM + |.if MIPSR6 + | selnez TMP2, TMP2, TMP1 + | seleqz TMP3, TMP3, TMP1 + | or TMP2, TMP2, TMP3 + |.else + | movz TMP2, TMP3, TMP1 + |.endif + | dsll TMP2, TMP2, 3 + | daddu TMP2, CFUNC:RB, TMP2 + | b ->fff_restv + |. 
ld CARG1, CFUNC:TMP2->upvalue + | + |//-- Base library: getters and setters --------------------------------- + | + |.ffunc_1 getmetatable + | gettp TMP2, CARG1 + | daddiu TMP0, TMP2, -LJ_TTAB + | daddiu TMP1, TMP2, -LJ_TUDATA + |.if MIPSR6 + | selnez TMP0, TMP1, TMP0 + |.else + | movn TMP0, TMP1, TMP0 + |.endif + | bnez TMP0, >6 + |. cleartp TAB:CARG1 + |1: // Field metatable must be at same offset for GCtab and GCudata! + | ld TAB:RB, TAB:CARG1->metatable + |2: + | ld STR:RC, DISPATCH_GL(gcroot[GCROOT_MMNAME+MM_metatable])(DISPATCH) + | beqz TAB:RB, ->fff_restv + |. li CARG1, LJ_TNIL + | lw TMP0, TAB:RB->hmask + | lw TMP1, STR:RC->sid + | ld NODE:TMP2, TAB:RB->node + | and TMP1, TMP1, TMP0 // idx = str->sid & tab->hmask + | dsll TMP0, TMP1, 5 + | dsll TMP1, TMP1, 3 + | dsubu TMP1, TMP0, TMP1 + | daddu NODE:TMP2, NODE:TMP2, TMP1 // node = tab->node + (idx*32-idx*8) + | li CARG4, LJ_TSTR + | settp STR:RC, CARG4 // Tagged key to look for. + |3: // Rearranged logic, because we expect _not_ to find the key. + | ld TMP0, NODE:TMP2->key + | ld CARG1, NODE:TMP2->val + | ld NODE:TMP2, NODE:TMP2->next + | beq RC, TMP0, >5 + |. li AT, LJ_TTAB + | bnez NODE:TMP2, <3 + |. nop + |4: + | move CARG1, RB + | b ->fff_restv // Not found, keep default result. + |. settp CARG1, AT + |5: + | bne CARG1, TISNIL, ->fff_restv + |. nop + | b <4 // Ditto for nil value. + |. nop + | + |6: + | sltiu AT, TMP2, LJ_TISNUM + |.if MIPSR6 + | selnez TMP0, TISNUM, AT + | seleqz AT, TMP2, AT + | or TMP2, TMP0, AT + |.else + | movn TMP2, TISNUM, AT + |.endif + | dsll TMP2, TMP2, 3 + | dsubu TMP0, DISPATCH, TMP2 + | b <2 + |. ld TAB:RB, DISPATCH_GL(gcroot[GCROOT_BASEMT])-8(TMP0) + | + |.ffunc_2 setmetatable + | // Fast path: no mt for table yet and not clearing the mt. 
+ | checktp TMP1, CARG1, -LJ_TTAB, ->fff_fallback + | gettp TMP3, CARG2 + | ld TAB:TMP0, TAB:TMP1->metatable + | lbu TMP2, TAB:TMP1->marked + | daddiu AT, TMP3, -LJ_TTAB + | cleartp TAB:CARG2 + | or AT, AT, TAB:TMP0 + | bnez AT, ->fff_fallback + |. andi AT, TMP2, LJ_GC_BLACK // isblack(table) + | beqz AT, ->fff_restv + |. sd TAB:CARG2, TAB:TMP1->metatable + | barrierback TAB:TMP1, TMP2, TMP0, ->fff_restv + | + |.ffunc rawget + | ld CARG2, 0(BASE) + | sltiu AT, NARGS8:RC, 16 + | load_got lj_tab_get + | gettp TMP0, CARG2 + | cleartp CARG2 + | daddiu TMP0, TMP0, -LJ_TTAB + | or AT, AT, TMP0 + | bnez AT, ->fff_fallback + |. daddiu CARG3, BASE, 8 + | call_intern lj_tab_get // (lua_State *L, GCtab *t, cTValue *key) + |. move CARG1, L + | b ->fff_restv + |. ld CARG1, 0(CRET1) + | + |//-- Base library: conversions ------------------------------------------ + | + |.ffunc tonumber + | // Only handles the number case inline (without a base argument). + | ld CARG1, 0(BASE) + | xori AT, NARGS8:RC, 8 // Exactly one number argument. + | gettp TMP1, CARG1 + | sltu TMP0, TISNUM, TMP1 + | or AT, AT, TMP0 + | bnez AT, ->fff_fallback + |. nop + | b ->fff_restv + |. nop + | + |.ffunc_1 tostring + | // Only handles the string or number case inline. + | gettp TMP0, CARG1 + | daddiu AT, TMP0, -LJ_TSTR + | // A __tostring method in the string base metatable is ignored. + | beqz AT, ->fff_restv // String key? + | // Handle numbers inline, unless a number base metatable is present. + |. ld TMP1, DISPATCH_GL(gcroot[GCROOT_BASEMT_NUM])(DISPATCH) + | sltu TMP0, TISNUM, TMP0 + | or TMP0, TMP0, TMP1 + | bnez TMP0, ->fff_fallback + |. sd BASE, L->base // Add frame since C call can throw. + |.if MIPSR6 + | sd PC, SAVE_PC // Redundant (but a defined value). + | ffgccheck + |.else + | ffgccheck + |. sd PC, SAVE_PC // Redundant (but a defined value). + |.endif + | load_got lj_strfmt_number + | move CARG1, L + | call_intern lj_strfmt_number // (lua_State *L, cTValue *o) + |. 
move CARG2, BASE + | // Returns GCstr *. + | li AT, LJ_TSTR + | settp CRET1, AT + | b ->fff_restv + |. move CARG1, CRET1 + | + |//-- Base library: iterators ------------------------------------------- + | + |.ffunc_1 next + | checktp CARG1, -LJ_TTAB, ->fff_fallback + | daddu TMP2, BASE, NARGS8:RC + | sd TISNIL, 0(TMP2) // Set missing 2nd arg to nil. + | load_got lj_tab_next + | ld PC, FRAME_PC(BASE) + | daddiu CARG2, BASE, 8 + | call_intern lj_tab_next // (GCtab *t, cTValue *key, TValue *o) + |. daddiu CARG3, BASE, -16 + | // Returns 1=found, 0=end, -1=error. + | daddiu RA, BASE, -16 + | bgtz CRET1, ->fff_res // Found key/value. + |. li RD, (2+1)*8 + | beqz CRET1, ->fff_restv // End of traversal: return nil. + |. move CARG1, TISNIL + | ld CFUNC:RB, FRAME_FUNC(BASE) + | cleartp CFUNC:RB + | b ->fff_fallback // Invalid key. + |. li RC, 2*8 + | + |.ffunc_1 pairs + | checktp TAB:TMP1, CARG1, -LJ_TTAB, ->fff_fallback + | ld PC, FRAME_PC(BASE) +#if LJ_52 + | ld TAB:TMP2, TAB:TMP1->metatable + | ld TMP0, CFUNC:RB->upvalue[0] + | bnez TAB:TMP2, ->fff_fallback +#else + | ld TMP0, CFUNC:RB->upvalue[0] +#endif + |. daddiu RA, BASE, -16 + | sd TISNIL, 0(BASE) + | sd CARG1, -8(BASE) + | sd TMP0, 0(RA) + | b ->fff_res + |. li RD, (3+1)*8 + | + |.ffunc_2 ipairs_aux + | checktab CARG1, ->fff_fallback + | checkint CARG2, ->fff_fallback + |. lw TMP0, TAB:CARG1->asize + | ld TMP1, TAB:CARG1->array + | ld PC, FRAME_PC(BASE) + | sextw TMP2, CARG2 + | addiu TMP2, TMP2, 1 + | sltu AT, TMP2, TMP0 + | daddiu RA, BASE, -16 + | zextw TMP0, TMP2 + | settp TMP0, TISNUM + | beqz AT, >2 // Not in array part? + |. sd TMP0, 0(RA) + | dsll TMP3, TMP2, 3 + | daddu TMP3, TMP1, TMP3 + | ld TMP1, 0(TMP3) + |1: + | beq TMP1, TISNIL, ->fff_res // End of iteration, return 0 results. + |. li RD, (0+1)*8 + | sd TMP1, -8(BASE) + | b ->fff_res + |. li RD, (2+1)*8 + |2: // Check for empty hash part first. Otherwise call C function. 
+ | lw TMP0, TAB:CARG1->hmask + | load_got lj_tab_getinth + | beqz TMP0, ->fff_res + |. li RD, (0+1)*8 + | call_intern lj_tab_getinth // (GCtab *t, int32_t key) + |. move CARG2, TMP2 + | // Returns cTValue * or NULL. + | beqz CRET1, ->fff_res + |. li RD, (0+1)*8 + | b <1 + |. ld TMP1, 0(CRET1) + | + |.ffunc_1 ipairs + | checktp TAB:TMP1, CARG1, -LJ_TTAB, ->fff_fallback + | ld PC, FRAME_PC(BASE) +#if LJ_52 + | ld TAB:TMP2, TAB:TMP1->metatable + | ld CFUNC:TMP0, CFUNC:RB->upvalue[0] + | bnez TAB:TMP2, ->fff_fallback +#else + | ld TMP0, CFUNC:RB->upvalue[0] +#endif + | daddiu RA, BASE, -16 + | dsll AT, TISNUM, 47 + | sd CARG1, -8(BASE) + | sd AT, 0(BASE) + | sd CFUNC:TMP0, 0(RA) + | b ->fff_res + |. li RD, (3+1)*8 + | + |//-- Base library: catch errors ---------------------------------------- + | + |.ffunc pcall + | daddiu NARGS8:RC, NARGS8:RC, -8 + | lbu TMP3, DISPATCH_GL(hookmask)(DISPATCH) + | bltz NARGS8:RC, ->fff_fallback + |. move TMP2, BASE + | daddiu BASE, BASE, 16 + | // Remember active hook before pcall. + | srl TMP3, TMP3, HOOK_ACTIVE_SHIFT + | andi TMP3, TMP3, 1 + | daddiu PC, TMP3, 16+FRAME_PCALL + | beqz NARGS8:RC, ->vm_call_dispatch + |1: + |. daddu TMP0, BASE, NARGS8:RC + |2: + | ld TMP1, -16(TMP0) + | sd TMP1, -8(TMP0) + | daddiu TMP0, TMP0, -8 + | bne TMP0, BASE, <2 + |. nop + | b ->vm_call_dispatch + |. nop + | + |.ffunc xpcall + | daddiu NARGS8:TMP0, NARGS8:RC, -16 + | ld CARG1, 0(BASE) + | ld CARG2, 8(BASE) + | bltz NARGS8:TMP0, ->fff_fallback + |. lbu TMP1, DISPATCH_GL(hookmask)(DISPATCH) + | gettp AT, CARG2 + | daddiu AT, AT, -LJ_TFUNC + | bnez AT, ->fff_fallback // Traceback must be a function. + |. move TMP2, BASE + | move NARGS8:RC, NARGS8:TMP0 + | daddiu BASE, BASE, 24 + | // Remember active hook before pcall. + | srl TMP3, TMP3, HOOK_ACTIVE_SHIFT + | sd CARG2, 0(TMP2) // Swap function and traceback. + | andi TMP3, TMP3, 1 + | sd CARG1, 8(TMP2) + | beqz NARGS8:RC, ->vm_call_dispatch + |. daddiu PC, TMP3, 24+FRAME_PCALL + | b <1 + |. 
nop + | + |//-- Coroutine library -------------------------------------------------- + | + |.macro coroutine_resume_wrap, resume + |.if resume + |.ffunc_1 coroutine_resume + | checktp CARG1, CARG1, -LJ_TTHREAD, ->fff_fallback + |.else + |.ffunc coroutine_wrap_aux + | ld L:CARG1, CFUNC:RB->upvalue[0].gcr + | cleartp L:CARG1 + |.endif + | lbu TMP0, L:CARG1->status + | ld TMP1, L:CARG1->cframe + | ld CARG2, L:CARG1->top + | ld TMP2, L:CARG1->base + | addiu AT, TMP0, -LUA_YIELD + | daddu CARG3, CARG2, TMP0 + | daddiu TMP3, CARG2, 8 + |.if MIPSR6 + | seleqz CARG2, CARG2, AT + | selnez TMP3, TMP3, AT + | bgtz AT, ->fff_fallback // st > LUA_YIELD? + |. or CARG2, TMP3, CARG2 + |.else + | bgtz AT, ->fff_fallback // st > LUA_YIELD? + |. movn CARG2, TMP3, AT + |.endif + | xor TMP2, TMP2, CARG3 + | bnez TMP1, ->fff_fallback // cframe != 0? + |. or AT, TMP2, TMP0 + | ld TMP0, L:CARG1->maxstack + | beqz AT, ->fff_fallback // base == top && st == 0? + |. ld PC, FRAME_PC(BASE) + | daddu TMP2, CARG2, NARGS8:RC + | sltu AT, TMP0, TMP2 + | bnez AT, ->fff_fallback // Stack overflow? + |. sd PC, SAVE_PC + | sd BASE, L->base + |1: + |.if resume + | daddiu BASE, BASE, 8 // Keep resumed thread in stack for GC. + | daddiu NARGS8:RC, NARGS8:RC, -8 + | daddiu TMP2, TMP2, -8 + |.endif + | sd TMP2, L:CARG1->top + | daddu TMP1, BASE, NARGS8:RC + | move CARG3, CARG2 + | sd BASE, L->top + |2: // Move args to coroutine. + | ld CRET1, 0(BASE) + | sltu AT, BASE, TMP1 + | beqz AT, >3 + |. daddiu BASE, BASE, 8 + | sd CRET1, 0(CARG3) + | b <2 + |. daddiu CARG3, CARG3, 8 + |3: + | bal ->vm_resume // (lua_State *L, TValue *base, 0, 0) + |. move L:RA, L:CARG1 + | // Returns thread status. + |4: + | ld TMP2, L:RA->base + | sltiu AT, CRET1, LUA_YIELD+1 + | ld TMP3, L:RA->top + | li_vmstate INTERP + | ld BASE, L->base + | sd L, DISPATCH_GL(cur_L)(DISPATCH) + | st_vmstate + | beqz AT, >8 + |. dsubu RD, TMP3, TMP2 + | ld TMP0, L->maxstack + | beqz RD, >6 // No results? + |. 
daddu TMP1, BASE, RD + | sltu AT, TMP0, TMP1 + | bnez AT, >9 // Need to grow stack? + |. daddu TMP3, TMP2, RD + | sd TMP2, L:RA->top // Clear coroutine stack. + | move TMP1, BASE + |5: // Move results from coroutine. + | ld CRET1, 0(TMP2) + | daddiu TMP2, TMP2, 8 + | sltu AT, TMP2, TMP3 + | sd CRET1, 0(TMP1) + | bnez AT, <5 + |. daddiu TMP1, TMP1, 8 + |6: + | andi TMP0, PC, FRAME_TYPE + |.if resume + | mov_true TMP1 + | daddiu RA, BASE, -8 + | sd TMP1, -8(BASE) // Prepend true to results. + | daddiu RD, RD, 16 + |.else + | move RA, BASE + | daddiu RD, RD, 8 + |.endif + |7: + | sd PC, SAVE_PC + | beqz TMP0, ->BC_RET_Z + |. move MULTRES, RD + | b ->vm_return + |. nop + | + |8: // Coroutine returned with error (at co->top-1). + |.if resume + | daddiu TMP3, TMP3, -8 + | mov_false TMP1 + | ld CRET1, 0(TMP3) + | sd TMP3, L:RA->top // Remove error from coroutine stack. + | li RD, (2+1)*8 + | sd TMP1, -8(BASE) // Prepend false to results. + | daddiu RA, BASE, -8 + | sd CRET1, 0(BASE) // Copy error message. + | b <7 + |. andi TMP0, PC, FRAME_TYPE + |.else + | load_got lj_ffh_coroutine_wrap_err + | move CARG2, L:RA + | call_intern lj_ffh_coroutine_wrap_err // (lua_State *L, lua_State *co) + |. move CARG1, L + |.endif + | + |9: // Handle stack expansion on return from yield. + | load_got lj_state_growstack + | srl CARG2, RD, 3 + | call_intern lj_state_growstack // (lua_State *L, int n) + |. move CARG1, L + | b <4 + |. li CRET1, 0 + |.endmacro + | + | coroutine_resume_wrap 1 // coroutine.resume + | coroutine_resume_wrap 0 // coroutine.wrap + | + |.ffunc coroutine_yield + | ld TMP0, L->cframe + | daddu TMP1, BASE, NARGS8:RC + | sd BASE, L->base + | andi TMP0, TMP0, CFRAME_RESUME + | sd TMP1, L->top + | beqz TMP0, ->fff_fallback + |. li CRET1, LUA_YIELD + | sd r0, L->cframe + | b ->vm_leave_unw + |. 
sb CRET1, L->status + | + |//-- Math library ------------------------------------------------------- + | + |.ffunc_1 math_abs + | gettp CARG2, CARG1 + | daddiu AT, CARG2, -LJ_TISNUM + | bnez AT, >1 + |. sextw TMP1, CARG1 + | sra TMP0, TMP1, 31 // Extract sign. + | xor TMP1, TMP1, TMP0 + | dsubu CARG1, TMP1, TMP0 + | dsll TMP3, CARG1, 32 + | bgez TMP3, ->fff_restv + |. settp CARG1, TISNUM + | li CARG1, 0x41e0 // 2^31 as a double. + | b ->fff_restv + |. dsll CARG1, CARG1, 48 + |1: + | sltiu AT, CARG2, LJ_TISNUM + | beqz AT, ->fff_fallback + |. dextm CARG1, CARG1, 0, 30 + |// fallthrough + | + |->fff_restv: + | // CARG1 = TValue result. + | ld PC, FRAME_PC(BASE) + | daddiu RA, BASE, -16 + | sd CARG1, -16(BASE) + |->fff_res1: + | // RA = results, PC = return. + | li RD, (1+1)*8 + |->fff_res: + | // RA = results, RD = (nresults+1)*8, PC = return. + | andi TMP0, PC, FRAME_TYPE + | bnez TMP0, ->vm_return + |. move MULTRES, RD + | lw INS, -4(PC) + | decode_RB8a RB, INS + | decode_RB8b RB + |5: + | sltu AT, RD, RB + | bnez AT, >6 // More results expected? + |. decode_RA8a TMP0, INS + | decode_RA8b TMP0 + | ins_next1 + | // Adjust BASE. KBASE is assumed to be set for the calling frame. + | dsubu BASE, RA, TMP0 + | ins_next2 + | + |6: // Fill up results with nil. + | daddu TMP1, RA, RD + | daddiu RD, RD, 8 + | b <5 + |. sd TISNIL, -8(TMP1) + | + |.macro math_extern, func + | .ffunc_n math_ .. func + | load_got func + | call_extern + |. nop + | b ->fff_resn + |. nop + |.endmacro + | + |.macro math_extern2, func + | .ffunc_nn math_ .. func + |. load_got func + | call_extern + |. nop + | b ->fff_resn + |. nop + |.endmacro + | + |// TODO: Return integer type if result is integer (own sf implementation). + |.macro math_round, func + |->ff_math_ .. func: + | ld CARG1, 0(BASE) + | beqz NARGS8:RC, ->fff_fallback + |. gettp TMP0, CARG1 + | beq TMP0, TISNUM, ->fff_restv + |. sltu AT, TMP0, TISNUM + | beqz AT, ->fff_fallback + |.if FPU + |. ldc1 FARG1, 0(BASE) + | bal ->vm_ .. func + |. 
nop + |.else + |. load_got func + | call_extern + |. nop + |.endif + | b ->fff_resn + |. nop + |.endmacro + | + | math_round floor + | math_round ceil + | + |.ffunc math_log + | li AT, 8 + | bne NARGS8:RC, AT, ->fff_fallback // Exactly 1 argument. + |. ld CARG1, 0(BASE) + | checknum CARG1, ->fff_fallback + |. load_got log + |.if FPU + | call_extern + |. ldc1 FARG1, 0(BASE) + |.else + | call_extern + |. nop + |.endif + | b ->fff_resn + |. nop + | + | math_extern log10 + | math_extern exp + | math_extern sin + | math_extern cos + | math_extern tan + | math_extern asin + | math_extern acos + | math_extern atan + | math_extern sinh + | math_extern cosh + | math_extern tanh + | math_extern2 pow + | math_extern2 atan2 + | math_extern2 fmod + | + |.if FPU + |.ffunc_n math_sqrt + |. sqrt.d FRET1, FARG1 + |// fallthrough to ->fff_resn + |.else + | math_extern sqrt + |.endif + | + |->fff_resn: + | ld PC, FRAME_PC(BASE) + | daddiu RA, BASE, -16 + | b ->fff_res1 + |.if FPU + |. sdc1 FRET1, 0(RA) + |.else + |. sd CRET1, 0(RA) + |.endif + | + | + |.ffunc_2 math_ldexp + | checknum CARG1, ->fff_fallback + | checkint CARG2, ->fff_fallback + |. load_got ldexp + | .FPU ldc1 FARG1, 0(BASE) + | call_extern + |. lw CARG2, 8+LO(BASE) + | b ->fff_resn + |. nop + | + |.ffunc_n math_frexp + | load_got frexp + | ld PC, FRAME_PC(BASE) + | call_extern + |. daddiu CARG2, DISPATCH, DISPATCH_GL(tmptv) + | lw TMP1, DISPATCH_GL(tmptv)(DISPATCH) + | daddiu RA, BASE, -16 + |.if FPU + | mtc1 TMP1, FARG2 + | sdc1 FRET1, 0(RA) + | cvt.d.w FARG2, FARG2 + | sdc1 FARG2, 8(RA) + |.else + | sd CRET1, 0(RA) + | zextw TMP1, TMP1 + | settp TMP1, TISNUM + | sd TMP1, 8(RA) + |.endif + | b ->fff_res + |. li RD, (2+1)*8 + | + |.ffunc_n math_modf + | load_got modf + | ld PC, FRAME_PC(BASE) + | call_extern + |. daddiu CARG2, BASE, -16 + | daddiu RA, BASE, -16 + |.if FPU + | sdc1 FRET1, -8(BASE) + |.else + | sd CRET1, -8(BASE) + |.endif + | b ->fff_res + |. 
li RD, (2+1)*8 + | + |.macro math_minmax, name, intins, intinsc, fpins + | .ffunc_1 name + | daddu TMP3, BASE, NARGS8:RC + | checkint CARG1, >5 + |. daddiu TMP2, BASE, 8 + |1: // Handle integers. + | beq TMP2, TMP3, ->fff_restv + |. ld CARG2, 0(TMP2) + | checkint CARG2, >3 + |. sextw CARG1, CARG1 + | lw CARG2, LO(TMP2) + |. slt AT, CARG1, CARG2 + |.if MIPSR6 + | intins TMP1, CARG2, AT + | intinsc CARG1, CARG1, AT + | or CARG1, CARG1, TMP1 + |.else + | intins CARG1, CARG2, AT + |.endif + | daddiu TMP2, TMP2, 8 + | zextw CARG1, CARG1 + | b <1 + |. settp CARG1, TISNUM + | + |3: // Convert intermediate result to number and continue with number loop. + | checknum CARG2, ->fff_fallback + |.if FPU + |. mtc1 CARG1, FRET1 + | cvt.d.w FRET1, FRET1 + | b >7 + |. ldc1 FARG1, 0(TMP2) + |.else + |. nop + | bal ->vm_sfi2d_1 + |. nop + | b >7 + |. nop + |.endif + | + |5: + | .FPU ldc1 FRET1, 0(BASE) + | checknum CARG1, ->fff_fallback + |6: // Handle numbers. + |. ld CARG2, 0(TMP2) + | beq TMP2, TMP3, ->fff_resn + |.if FPU + | ldc1 FARG1, 0(TMP2) + |.else + | move CRET1, CARG1 + |.endif + | checknum CARG2, >8 + |. nop + |7: + |.if FPU + |.if MIPSR6 + | fpins FRET1, FRET1, FARG1 + |.else + |.if fpins // ismax + | c.olt.d FARG1, FRET1 + |.else + | c.olt.d FRET1, FARG1 + |.endif + | movf.d FRET1, FARG1 + |.endif + |.else + |.if fpins // ismax + | bal ->vm_sfcmpogt + |.else + | bal ->vm_sfcmpolt + |.endif + |. nop + |.if MIPSR6 + | seleqz AT, CARG2, CRET1 + | selnez CARG1, CARG1, CRET1 + | or CARG1, CARG1, AT + |.else + | movz CARG1, CARG2, CRET1 + |.endif + |.endif + | b <6 + |. daddiu TMP2, TMP2, 8 + | + |8: // Convert integer to number and continue with number loop. + | checkint CARG2, ->fff_fallback + |.if FPU + |. lwc1 FARG1, LO(TMP2) + | b <7 + |. cvt.d.w FARG1, FARG1 + |.else + |. lw CARG2, LO(TMP2) + | bal ->vm_sfi2d_2 + |. nop + | b <7 + |. 
nop + |.endif + | + |.endmacro + | + |.if MIPSR6 + | math_minmax math_min, seleqz, selnez, min.d + | math_minmax math_max, selnez, seleqz, max.d + |.else + | math_minmax math_min, movz, _, 0 + | math_minmax math_max, movn, _, 1 + |.endif + | + |//-- String library ----------------------------------------------------- + | + |.ffunc string_byte // Only handle the 1-arg case here. + | ld CARG1, 0(BASE) + | gettp TMP0, CARG1 + | xori AT, NARGS8:RC, 8 + | daddiu TMP0, TMP0, -LJ_TSTR + | or AT, AT, TMP0 + | bnez AT, ->fff_fallback // Need exactly 1 string argument. + |. cleartp STR:CARG1 + | lw TMP0, STR:CARG1->len + | daddiu RA, BASE, -16 + | ld PC, FRAME_PC(BASE) + | sltu RD, r0, TMP0 + | lbu TMP1, STR:CARG1[1] // Access is always ok (NUL at end). + | addiu RD, RD, 1 + | sll RD, RD, 3 // RD = ((str->len != 0)+1)*8 + | settp TMP1, TISNUM + | b ->fff_res + |. sd TMP1, 0(RA) + | + |.ffunc string_char // Only handle the 1-arg case here. + | ffgccheck + |.if not MIPSR6 + |. nop + |.endif + | ld CARG1, 0(BASE) + | gettp TMP0, CARG1 + | xori AT, NARGS8:RC, 8 // Exactly 1 argument. + | daddiu TMP0, TMP0, -LJ_TISNUM // Integer. + | li TMP1, 255 + | sextw CARG1, CARG1 + | or AT, AT, TMP0 + | sltu TMP1, TMP1, CARG1 // !(255 < n). + | or AT, AT, TMP1 + | bnez AT, ->fff_fallback + |. li CARG3, 1 + | daddiu CARG2, sp, TMPD_OFS + | sb CARG1, TMPD + |->fff_newstr: + | load_got lj_str_new + | sd BASE, L->base + | sd PC, SAVE_PC + | call_intern lj_str_new // (lua_State *L, char *str, size_t l) + |. move CARG1, L + | // Returns GCstr *. + | ld BASE, L->base + |->fff_resstr: + | li AT, LJ_TSTR + | settp CRET1, AT + | b ->fff_restv + |. move CARG1, CRET1 + | + |.ffunc string_sub + | ffgccheck + |.if not MIPSR6 + |. nop + |.endif + | addiu AT, NARGS8:RC, -16 + | ld TMP0, 0(BASE) + | bltz AT, ->fff_fallback + |. gettp TMP3, TMP0 + | cleartp STR:CARG1, TMP0 + | ld CARG2, 8(BASE) + | beqz AT, >1 + |. li CARG4, -1 + | ld CARG3, 16(BASE) + | checkint CARG3, ->fff_fallback + |. 
sextw CARG4, CARG3 + |1: + | checkint CARG2, ->fff_fallback + |. li AT, LJ_TSTR + | bne TMP3, AT, ->fff_fallback + |. sextw CARG3, CARG2 + | lw CARG2, STR:CARG1->len + | // STR:CARG1 = str, CARG2 = str->len, CARG3 = start, CARG4 = end + | slt AT, CARG4, r0 + | addiu TMP0, CARG2, 1 + | addu TMP1, CARG4, TMP0 + | slt TMP3, CARG3, r0 + |.if MIPSR6 + | seleqz CARG4, CARG4, AT + | selnez TMP1, TMP1, AT + | or CARG4, TMP1, CARG4 // if (end < 0) end += len+1 + |.else + | movn CARG4, TMP1, AT // if (end < 0) end += len+1 + |.endif + | addu TMP1, CARG3, TMP0 + |.if MIPSR6 + | selnez TMP1, TMP1, TMP3 + | seleqz CARG3, CARG3, TMP3 + | or CARG3, TMP1, CARG3 // if (start < 0) start += len+1 + | li TMP2, 1 + | slt AT, CARG4, r0 + | slt TMP3, r0, CARG3 + | seleqz CARG4, CARG4, AT // if (end < 0) end = 0 + | selnez CARG3, CARG3, TMP3 + | seleqz TMP2, TMP2, TMP3 + | or CARG3, TMP2, CARG3 // if (start < 1) start = 1 + | slt AT, CARG2, CARG4 + | seleqz CARG4, CARG4, AT + | selnez CARG2, CARG2, AT + | or CARG4, CARG2, CARG4 // if (end > len) end = len + |.else + | movn CARG3, TMP1, TMP3 // if (start < 0) start += len+1 + | li TMP2, 1 + | slt AT, CARG4, r0 + | slt TMP3, r0, CARG3 + | movn CARG4, r0, AT // if (end < 0) end = 0 + | movz CARG3, TMP2, TMP3 // if (start < 1) start = 1 + | slt AT, CARG2, CARG4 + | movn CARG4, CARG2, AT // if (end > len) end = len + |.endif + | daddu CARG2, STR:CARG1, CARG3 + | subu CARG3, CARG4, CARG3 // len = end - start + | daddiu CARG2, CARG2, sizeof(GCstr)-1 + | bgez CARG3, ->fff_newstr + |. addiu CARG3, CARG3, 1 // len++ + |->fff_emptystr: // Return empty string. + | li AT, LJ_TSTR + | daddiu STR:CARG1, DISPATCH, DISPATCH_GL(strempty) + | b ->fff_restv + |. settp CARG1, AT + | + |.macro ffstring_op, name + | .ffunc string_ .. name + | ffgccheck + |. nop + | beqz NARGS8:RC, ->fff_fallback + |. ld CARG2, 0(BASE) + | checkstr STR:CARG2, ->fff_fallback + | daddiu SBUF:CARG1, DISPATCH, DISPATCH_GL(tmpbuf) + | load_got lj_buf_putstr_ .. 
name + | ld TMP0, SBUF:CARG1->b + | sd L, SBUF:CARG1->L + | sd BASE, L->base + | sd TMP0, SBUF:CARG1->w + | call_intern extern lj_buf_putstr_ .. name + |. sd PC, SAVE_PC + | load_got lj_buf_tostr + | call_intern lj_buf_tostr + |. move SBUF:CARG1, SBUF:CRET1 + | b ->fff_resstr + |. ld BASE, L->base + |.endmacro + | + |ffstring_op reverse + |ffstring_op lower + |ffstring_op upper + | + |//-- Bit library -------------------------------------------------------- + | + |->vm_tobit_fb: + | beqz TMP1, ->fff_fallback + |.if FPU + |. ldc1 FARG1, 0(BASE) + | add.d FARG1, FARG1, TOBIT + | mfc1 CRET1, FARG1 + | jr ra + |. zextw CRET1, CRET1 + |.else + |// FP number to bit conversion for soft-float. + |->vm_tobit: + | dsll TMP0, CARG1, 1 + | li CARG3, 1076 + | dsrl AT, TMP0, 53 + | dsubu CARG3, CARG3, AT + | sltiu AT, CARG3, 54 + | beqz AT, >1 + |. dextm TMP0, TMP0, 0, 20 + | dinsu TMP0, AT, 21, 21 + | slt AT, CARG1, r0 + | dsrlv CRET1, TMP0, CARG3 + | dsubu TMP0, r0, CRET1 + |.if MIPSR6 + | selnez TMP0, TMP0, AT + | seleqz CRET1, CRET1, AT + | or CRET1, CRET1, TMP0 + |.else + | movn CRET1, TMP0, AT + |.endif + | jr ra + |. zextw CRET1, CRET1 + |1: + | jr ra + |. move CRET1, r0 + | + |// FP number to int conversion with a check for soft-float. + |// Modifies CARG1, CRET1, CRET2, TMP0, AT. + |->vm_tointg: + |.if JIT + | dsll CRET2, CARG1, 1 + | beqz CRET2, >2 + |. li TMP0, 1076 + | dsrl AT, CRET2, 53 + | dsubu TMP0, TMP0, AT + | sltiu AT, TMP0, 54 + | beqz AT, >1 + |. dextm CRET2, CRET2, 0, 20 + | dinsu CRET2, AT, 21, 21 + | slt AT, CARG1, r0 + | dsrlv CRET1, CRET2, TMP0 + | dsubu CARG1, r0, CRET1 + |.if MIPSR6 + | seleqz CRET1, CRET1, AT + | selnez CARG1, CARG1, AT + | or CRET1, CRET1, CARG1 + |.else + | movn CRET1, CARG1, AT + |.endif + | li CARG1, 64 + | subu TMP0, CARG1, TMP0 + | dsllv CRET2, CRET2, TMP0 // Integer check. + | sextw AT, CRET1 + | xor AT, CRET1, AT // Range check. + |.if MIPSR6 + | seleqz AT, AT, CRET2 + | selnez CRET2, CRET2, CRET2 + | jr ra + |. 
or CRET2, AT, CRET2 + |.else + | jr ra + |. movz CRET2, AT, CRET2 + |.endif + |1: + | jr ra + |. li CRET2, 1 + |2: + | jr ra + |. move CRET1, r0 + |.endif + |.endif + | + |.macro .ffunc_bit, name + | .ffunc_1 bit_..name + | gettp TMP0, CARG1 + | beq TMP0, TISNUM, >6 + |. zextw CRET1, CARG1 + | bal ->vm_tobit_fb + |. sltiu TMP1, TMP0, LJ_TISNUM + |6: + |.endmacro + | + |.macro .ffunc_bit_op, name, bins + | .ffunc_bit name + | daddiu TMP2, BASE, 8 + | daddu TMP3, BASE, NARGS8:RC + |1: + | beq TMP2, TMP3, ->fff_resi + |. ld CARG1, 0(TMP2) + | gettp TMP0, CARG1 + |.if FPU + | bne TMP0, TISNUM, >2 + |. daddiu TMP2, TMP2, 8 + | zextw CARG1, CARG1 + | b <1 + |. bins CRET1, CRET1, CARG1 + |2: + | ldc1 FARG1, -8(TMP2) + | sltiu AT, TMP0, LJ_TISNUM + | beqz AT, ->fff_fallback + |. add.d FARG1, FARG1, TOBIT + | mfc1 CARG1, FARG1 + | zextw CARG1, CARG1 + | b <1 + |. bins CRET1, CRET1, CARG1 + |.else + | beq TMP0, TISNUM, >2 + |. move CRET2, CRET1 + | bal ->vm_tobit_fb + |. sltiu TMP1, TMP0, LJ_TISNUM + | move CARG1, CRET2 + |2: + | zextw CARG1, CARG1 + | bins CRET1, CRET1, CARG1 + | b <1 + |. daddiu TMP2, TMP2, 8 + |.endif + |.endmacro + | + |.ffunc_bit_op band, and + |.ffunc_bit_op bor, or + |.ffunc_bit_op bxor, xor + | + |.ffunc_bit bswap + | dsrl TMP0, CRET1, 8 + | dsrl TMP1, CRET1, 24 + | andi TMP2, TMP0, 0xff00 + | dins TMP1, CRET1, 24, 31 + | dins TMP2, TMP0, 16, 23 + | b ->fff_resi + |. or CRET1, TMP1, TMP2 + | + |.ffunc_bit bnot + | not CRET1, CRET1 + | b ->fff_resi + |. zextw CRET1, CRET1 + | + |.macro .ffunc_bit_sh, name, shins, shmod + | .ffunc_2 bit_..name + | gettp TMP0, CARG1 + | beq TMP0, TISNUM, >1 + |. nop + | bal ->vm_tobit_fb + |. sltiu TMP1, TMP0, LJ_TISNUM + | move CARG1, CRET1 + |1: + | gettp TMP0, CARG2 + | bne TMP0, TISNUM, ->fff_fallback + |. zextw CARG2, CARG2 + | sextw CARG1, CARG1 + |.if shmod == 1 + | negu CARG2, CARG2 + |.endif + | shins CRET1, CARG1, CARG2 + | b ->fff_resi + |. 
zextw CRET1, CRET1 + |.endmacro + | + |.ffunc_bit_sh lshift, sllv, 0 + |.ffunc_bit_sh rshift, srlv, 0 + |.ffunc_bit_sh arshift, srav, 0 + |.ffunc_bit_sh rol, rotrv, 1 + |.ffunc_bit_sh ror, rotrv, 0 + | + |.ffunc_bit tobit + |->fff_resi: + | ld PC, FRAME_PC(BASE) + | daddiu RA, BASE, -16 + | settp CRET1, TISNUM + | b ->fff_res1 + |. sd CRET1, -16(BASE) + | + |//----------------------------------------------------------------------- + |->fff_fallback: // Call fast function fallback handler. + | // BASE = new base, RB = CFUNC, RC = nargs*8 + | ld TMP3, CFUNC:RB->f + | daddu TMP1, BASE, NARGS8:RC + | ld PC, FRAME_PC(BASE) // Fallback may overwrite PC. + | daddiu TMP0, TMP1, 8*LUA_MINSTACK + | ld TMP2, L->maxstack + | sd PC, SAVE_PC // Redundant (but a defined value). + | sltu AT, TMP2, TMP0 + | sd BASE, L->base + | sd TMP1, L->top + | bnez AT, >5 // Need to grow stack. + |. move CFUNCADDR, TMP3 + | jalr TMP3 // (lua_State *L) + |. move CARG1, L + | // Either throws an error, or recovers and returns -1, 0 or nresults+1. + | ld BASE, L->base + | sll RD, CRET1, 3 + | bgtz CRET1, ->fff_res // Returned nresults+1? + |. daddiu RA, BASE, -16 + |1: // Returned 0 or -1: retry fast path. + | ld LFUNC:RB, FRAME_FUNC(BASE) + | ld TMP0, L->top + | cleartp LFUNC:RB + | bnez CRET1, ->vm_call_tail // Returned -1? + |. dsubu NARGS8:RC, TMP0, BASE + | ins_callt // Returned 0: retry fast path. + | + |// Reconstruct previous base for vmeta_call during tailcall. + |->vm_call_tail: + | andi TMP0, PC, FRAME_TYPE + | li AT, -4 + | bnez TMP0, >3 + |. and TMP1, PC, AT + | lbu TMP1, OFS_RA(PC) + | sll TMP1, TMP1, 3 + | addiu TMP1, TMP1, 16 + |3: + | b ->vm_call_dispatch // Resolve again for tailcall. + |. dsubu TMP2, BASE, TMP1 + | + |5: // Grow stack for fallback handler. + | load_got lj_state_growstack + | li CARG2, LUA_MINSTACK + | call_intern lj_state_growstack // (lua_State *L, int n) + |. move CARG1, L + | ld BASE, L->base + | b <1 + |. li CRET1, 0 // Force retry. 
+ | + |->fff_gcstep: // Call GC step function. + | // BASE = new base, RC = nargs*8 + | move MULTRES, ra + | load_got lj_gc_step + | sd BASE, L->base + | daddu TMP0, BASE, NARGS8:RC + | sd PC, SAVE_PC // Redundant (but a defined value). + | sd TMP0, L->top + | call_intern lj_gc_step // (lua_State *L) + |. move CARG1, L + | ld BASE, L->base + | move ra, MULTRES + | ld TMP0, L->top + | ld CFUNC:RB, FRAME_FUNC(BASE) + | cleartp CFUNC:RB + | jr ra + |. dsubu NARGS8:RC, TMP0, BASE + | + |//----------------------------------------------------------------------- + |//-- Special dispatch targets ------------------------------------------- + |//----------------------------------------------------------------------- + | + |->vm_record: // Dispatch target for recording phase. + |.if JIT + | lbu TMP3, DISPATCH_GL(hookmask)(DISPATCH) + | andi AT, TMP3, HOOK_VMEVENT // No recording while in vmevent. + | bnez AT, >5 + | // Decrement the hookcount for consistency, but always do the call. + |. lw TMP2, DISPATCH_GL(hookcount)(DISPATCH) + | andi AT, TMP3, HOOK_ACTIVE + | bnez AT, >1 + |. addiu TMP2, TMP2, -1 + | andi AT, TMP3, LUA_MASKLINE|LUA_MASKCOUNT + | beqz AT, >1 + |. nop + | b >1 + |. sw TMP2, DISPATCH_GL(hookcount)(DISPATCH) + |.endif + | + |->vm_rethook: // Dispatch target for return hooks. + | lbu TMP3, DISPATCH_GL(hookmask)(DISPATCH) + | andi AT, TMP3, HOOK_ACTIVE // Hook already active? + | beqz AT, >1 + |5: // Re-dispatch to static ins. + |. ld AT, GG_DISP2STATIC(TMP0) // Assumes TMP0 holds DISPATCH+OP*4. + | jr AT + |. nop + | + |->vm_inshook: // Dispatch target for instr/line hooks. + | lbu TMP3, DISPATCH_GL(hookmask)(DISPATCH) + | lw TMP2, DISPATCH_GL(hookcount)(DISPATCH) + | andi AT, TMP3, HOOK_ACTIVE // Hook already active? + | bnez AT, <5 + |. andi AT, TMP3, LUA_MASKLINE|LUA_MASKCOUNT + | beqz AT, <5 + |. addiu TMP2, TMP2, -1 + | beqz TMP2, >1 + |. sw TMP2, DISPATCH_GL(hookcount)(DISPATCH) + | andi AT, TMP3, LUA_MASKLINE + | beqz AT, <5 + |1: + |. 
load_got lj_dispatch_ins + | sw MULTRES, SAVE_MULTRES + | move CARG2, PC + | sd BASE, L->base + | // SAVE_PC must hold the _previous_ PC. The callee updates it with PC. + | call_intern lj_dispatch_ins // (lua_State *L, const BCIns *pc) + |. move CARG1, L + |3: + | ld BASE, L->base + |4: // Re-dispatch to static ins. + | lw INS, -4(PC) + | decode_OP8a TMP1, INS + | decode_OP8b TMP1 + | daddu TMP0, DISPATCH, TMP1 + | decode_RD8a RD, INS + | ld AT, GG_DISP2STATIC(TMP0) + | decode_RA8a RA, INS + | decode_RD8b RD + | jr AT + | decode_RA8b RA + | + |->cont_hook: // Continue from hook yield. + | daddiu PC, PC, 4 + | b <4 + |. lw MULTRES, -24+LO(RB) // Restore MULTRES for *M ins. + | + |->vm_hotloop: // Hot loop counter underflow. + |.if JIT + | ld LFUNC:TMP1, FRAME_FUNC(BASE) + | daddiu CARG1, DISPATCH, GG_DISP2J + | cleartp LFUNC:TMP1 + | sd PC, SAVE_PC + | ld TMP1, LFUNC:TMP1->pc + | move CARG2, PC + | sd L, DISPATCH_J(L)(DISPATCH) + | lbu TMP1, PC2PROTO(framesize)(TMP1) + | load_got lj_trace_hot + | sd BASE, L->base + | dsll TMP1, TMP1, 3 + | daddu TMP1, BASE, TMP1 + | call_intern lj_trace_hot // (jit_State *J, const BCIns *pc) + |. sd TMP1, L->top + | b <3 + |. nop + |.endif + | + | + |->vm_callhook: // Dispatch target for call hooks. + |.if JIT + | b >1 + |.endif + |. move CARG2, PC + | + |->vm_hotcall: // Hot call counter underflow. + |.if JIT + | ori CARG2, PC, 1 + |1: + |.endif + | load_got lj_dispatch_call + | daddu TMP0, BASE, RC + | sd PC, SAVE_PC + | sd BASE, L->base + | dsubu RA, RA, BASE + | sd TMP0, L->top + | call_intern lj_dispatch_call // (lua_State *L, const BCIns *pc) + |. move CARG1, L + | // Returns ASMFunction. + | ld BASE, L->base + | ld TMP0, L->top + | sd r0, SAVE_PC // Invalidate for subsequent line hook. + | dsubu NARGS8:RC, TMP0, BASE + | daddu RA, BASE, RA + | ld LFUNC:RB, FRAME_FUNC(BASE) + | cleartp LFUNC:RB + | jr CRET1 + |. lw INS, -4(PC) + | + |->cont_stitch: // Trace stitching. 
+ |.if JIT + | // RA = resultptr, RB = meta base + | lw INS, -4(PC) + | ld TRACE:TMP2, -40(RB) // Save previous trace. + | decode_RA8a RC, INS + | daddiu AT, MULTRES, -8 + | cleartp TRACE:TMP2 + | decode_RA8b RC + | beqz AT, >2 + |. daddu RC, BASE, RC // Call base. + |1: // Move results down. + | ld CARG1, 0(RA) + | daddiu AT, AT, -8 + | daddiu RA, RA, 8 + | sd CARG1, 0(RC) + | bnez AT, <1 + |. daddiu RC, RC, 8 + |2: + | decode_RA8a RA, INS + | decode_RB8a RB, INS + | decode_RA8b RA + | decode_RB8b RB + | daddu RA, RA, RB + | daddu RA, BASE, RA + |3: + | sltu AT, RC, RA + | bnez AT, >9 // More results wanted? + |. nop + | + | lhu TMP3, TRACE:TMP2->traceno + | lhu RD, TRACE:TMP2->link + | beq RD, TMP3, ->cont_nop // Blacklisted. + |. load_got lj_dispatch_stitch + | bnez RD, =>BC_JLOOP // Jump to stitched trace. + |. sll RD, RD, 3 + | + | // Stitch a new trace to the previous trace. + | sw TMP3, DISPATCH_J(exitno)(DISPATCH) + | sd L, DISPATCH_J(L)(DISPATCH) + | sd BASE, L->base + | daddiu CARG1, DISPATCH, GG_DISP2J + | call_intern lj_dispatch_stitch // (jit_State *J, const BCIns *pc) + |. move CARG2, PC + | b ->cont_nop + |. ld BASE, L->base + | + |9: + | sd TISNIL, 0(RC) + | b <3 + |. daddiu RC, RC, 8 + |.endif + | + |->vm_profhook: // Dispatch target for profiler hook. +#if LJ_HASPROFILE + | load_got lj_dispatch_profile + | sw MULTRES, SAVE_MULTRES + | move CARG2, PC + | sd BASE, L->base + | call_intern lj_dispatch_profile // (lua_State *L, const BCIns *pc) + |. move CARG1, L + | // HOOK_PROFILE is off again, so re-dispatch to dynamic instruction. + | daddiu PC, PC, -4 + | b ->cont_nop + |. 
ld BASE, L->base +#endif + | + |//----------------------------------------------------------------------- + |//-- Trace exit handler ------------------------------------------------- + |//----------------------------------------------------------------------- + | + |.macro savex_, a, b + |.if FPU + | sdc1 f..a, a*8(sp) + | sdc1 f..b, b*8(sp) + | sd r..a, 32*8+a*8(sp) + | sd r..b, 32*8+b*8(sp) + |.else + | sd r..a, a*8(sp) + | sd r..b, b*8(sp) + |.endif + |.endmacro + | + |->vm_exit_handler: + |.if JIT + |.if FPU + | daddiu sp, sp, -(32*8+32*8) + |.else + | daddiu sp, sp, -(32*8) + |.endif + | savex_ 0, 1 + | savex_ 2, 3 + | savex_ 4, 5 + | savex_ 6, 7 + | savex_ 8, 9 + | savex_ 10, 11 + | savex_ 12, 13 + | savex_ 14, 15 + | savex_ 16, 17 + | savex_ 18, 19 + | savex_ 20, 21 + | savex_ 22, 23 + | savex_ 24, 25 + | savex_ 26, 27 + | savex_ 28, 30 + |.if FPU + | sdc1 f29, 29*8(sp) + | sdc1 f31, 31*8(sp) + | sd r0, 32*8+31*8(sp) // Clear RID_TMP. + | daddiu TMP2, sp, 32*8+32*8 // Recompute original value of sp. + | sd TMP2, 32*8+29*8(sp) // Store sp in RID_SP + |.else + | sd r0, 31*8(sp) // Clear RID_TMP. + | daddiu TMP2, sp, 32*8 // Recompute original value of sp. + | sd TMP2, 29*8(sp) // Store sp in RID_SP + |.endif + | li_vmstate EXIT + | daddiu DISPATCH, JGL, -GG_DISP2G-32768 + | lw TMP1, 0(TMP2) // Load exit number. + | st_vmstate + | ld L, DISPATCH_GL(cur_L)(DISPATCH) + | ld BASE, DISPATCH_GL(jit_base)(DISPATCH) + | load_got lj_trace_exit + | sd L, DISPATCH_J(L)(DISPATCH) + | sw ra, DISPATCH_J(parent)(DISPATCH) // Store trace number. + | sd BASE, L->base + | sw TMP1, DISPATCH_J(exitno)(DISPATCH) // Store exit number. + | daddiu CARG1, DISPATCH, GG_DISP2J + | sd r0, DISPATCH_GL(jit_base)(DISPATCH) + | call_intern lj_trace_exit // (jit_State *J, ExitState *ex) + |. move CARG2, sp + | // Returns MULTRES (unscaled) or negated error code. + | ld TMP1, L->cframe + | li AT, -4 + | ld BASE, L->base + | and sp, TMP1, AT + | ld PC, SAVE_PC // Get SAVE_PC. + | b >1 + |. 
sd L, SAVE_L // Set SAVE_L (on-trace resume/yield). + |.endif + |->vm_exit_interp: + |.if JIT + | // CRET1 = MULTRES or negated error code, BASE, PC and JGL set. + | ld L, SAVE_L + | daddiu DISPATCH, JGL, -GG_DISP2G-32768 + | sd BASE, L->base + |1: + | sltiu TMP0, CRET1, -LUA_ERRERR // Check for error from exit. + | beqz TMP0, >9 + |. ld LFUNC:RB, FRAME_FUNC(BASE) + | .FPU lui TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float). + | dsll MULTRES, CRET1, 3 + | cleartp LFUNC:RB + | sw MULTRES, SAVE_MULTRES + | li TISNIL, LJ_TNIL + | li TISNUM, LJ_TISNUM // Setup type comparison constants. + | .FPU mtc1 TMP3, TOBIT + | ld TMP1, LFUNC:RB->pc + | sd r0, DISPATCH_GL(jit_base)(DISPATCH) + | ld KBASE, PC2PROTO(k)(TMP1) + | .FPU cvt.d.s TOBIT, TOBIT + | // Modified copy of ins_next which handles function header dispatch, too. + | lw INS, 0(PC) + | addiu CRET1, CRET1, 17 // Static dispatch? + | // Assumes TISNIL == ~LJ_VMST_INTERP == -1 + | sw TISNIL, DISPATCH_GL(vmstate)(DISPATCH) + | decode_RD8a RD, INS + | beqz CRET1, >5 + |. daddiu PC, PC, 4 + | decode_OP8a TMP1, INS + | decode_OP8b TMP1 + | daddu TMP0, DISPATCH, TMP1 + | sltiu TMP2, TMP1, BC_FUNCF*8 + | ld AT, 0(TMP0) + | decode_RA8a RA, INS + | beqz TMP2, >2 + |. decode_RA8b RA + | jr AT + |. decode_RD8b RD + |2: + | sltiu TMP2, TMP1, (BC_FUNCC+2)*8 // Fast function? + | bnez TMP2, >3 + |. ld TMP1, FRAME_PC(BASE) + | // Check frame below fast function. + | andi TMP0, TMP1, FRAME_TYPE + | bnez TMP0, >3 // Trace stitching continuation? + |. nop + | // Otherwise set KBASE for Lua function below fast function. + | lw TMP2, -4(TMP1) + | decode_RA8a TMP0, TMP2 + | decode_RA8b TMP0 + | dsubu TMP1, BASE, TMP0 + | ld LFUNC:TMP2, -32(TMP1) + | cleartp LFUNC:TMP2 + | ld TMP1, LFUNC:TMP2->pc + | ld KBASE, PC2PROTO(k)(TMP1) + |3: + | daddiu RC, MULTRES, -8 + | jr AT + |. daddu RA, RA, BASE + | + |5: // Dispatch to static entry of original ins replaced by BC_JLOOP. 
+ | ld TMP0, DISPATCH_J(trace)(DISPATCH) + | decode_RD8b RD + | daddu TMP0, TMP0, RD + | ld TRACE:TMP2, 0(TMP0) + | lw INS, TRACE:TMP2->startins + | decode_OP8a TMP1, INS + | decode_OP8b TMP1 + | daddu TMP0, DISPATCH, TMP1 + | decode_RD8a RD, INS + | ld AT, GG_DISP2STATIC(TMP0) + | decode_RA8a RA, INS + | decode_RD8b RD + | jr AT + |. decode_RA8b RA + | + |9: // Rethrow error from the right C frame. + | load_got lj_err_trace + | sub CARG2, r0, CRET1 + | call_intern lj_err_trace // (lua_State *L, int errcode) + |. move CARG1, L + |.endif + | + |//----------------------------------------------------------------------- + |//-- Math helper functions ---------------------------------------------- + |//----------------------------------------------------------------------- + | + |// Hard-float round to integer. + |// Modifies AT, TMP0, FRET1, FRET2, f4. Keeps all others incl. FARG1. + |// MIPSR6: Modifies FTMP1, too. + |.macro vm_round_hf, func + | lui TMP0, 0x4330 // Hiword of 2^52 (double). + | dsll TMP0, TMP0, 32 + | dmtc1 TMP0, f4 + | abs.d FRET2, FARG1 // |x| + | dmfc1 AT, FARG1 + |.if MIPSR6 + | cmp.lt.d FTMP1, FRET2, f4 + | add.d FRET1, FRET2, f4 // (|x| + 2^52) - 2^52 + | bc1eqz FTMP1, >1 // Truncate only if |x| < 2^52. + |.else + | c.olt.d 0, FRET2, f4 + | add.d FRET1, FRET2, f4 // (|x| + 2^52) - 2^52 + | bc1f 0, >1 // Truncate only if |x| < 2^52. + |.endif + |. sub.d FRET1, FRET1, f4 + | slt AT, AT, r0 + |.if "func" == "ceil" + | lui TMP0, 0xbff0 // Hiword of -1 (double). Preserves -0. + |.else + | lui TMP0, 0x3ff0 // Hiword of +1 (double). + |.endif + |.if "func" == "trunc" + | dsll TMP0, TMP0, 32 + | dmtc1 TMP0, f4 + |.if MIPSR6 + | cmp.lt.d FTMP1, FRET2, FRET1 // |x| < result? + | sub.d FRET2, FRET1, f4 + | sel.d FTMP1, FRET1, FRET2 // If yes, subtract +1. + | dmtc1 AT, FRET1 + | neg.d FRET2, FTMP1 + | jr ra + |. sel.d FRET1, FTMP1, FRET2 // Merge sign bit back in. + |.else + | c.olt.d 0, FRET2, FRET1 // |x| < result? 
+ | sub.d FRET2, FRET1, f4 + | movt.d FRET1, FRET2, 0 // If yes, subtract +1. + | neg.d FRET2, FRET1 + | jr ra + |. movn.d FRET1, FRET2, AT // Merge sign bit back in. + |.endif + |.else + | neg.d FRET2, FRET1 + | dsll TMP0, TMP0, 32 + | dmtc1 TMP0, f4 + |.if MIPSR6 + | dmtc1 AT, FTMP1 + | sel.d FTMP1, FRET1, FRET2 + |.if "func" == "ceil" + | cmp.lt.d FRET1, FTMP1, FARG1 // x > result? + |.else + | cmp.lt.d FRET1, FARG1, FTMP1 // x < result? + |.endif + | sub.d FRET2, FTMP1, f4 // If yes, subtract +-1. + | jr ra + |. sel.d FRET1, FTMP1, FRET2 + |.else + | movn.d FRET1, FRET2, AT // Merge sign bit back in. + |.if "func" == "ceil" + | c.olt.d 0, FRET1, FARG1 // x > result? + |.else + | c.olt.d 0, FARG1, FRET1 // x < result? + |.endif + | sub.d FRET2, FRET1, f4 // If yes, subtract +-1. + | jr ra + |. movt.d FRET1, FRET2, 0 + |.endif + |.endif + |1: + | jr ra + |. mov.d FRET1, FARG1 + |.endmacro + | + |.macro vm_round, func + |.if FPU + | vm_round_hf, func + |.endif + |.endmacro + | + |->vm_floor: + | vm_round floor + |->vm_ceil: + | vm_round ceil + |->vm_trunc: + |.if JIT + | vm_round trunc + |.endif + | + |// Soft-float integer to number conversion. + |.macro sfi2d, ARG + |.if not FPU + | beqz ARG, >9 // Handle zero first. + |. sra TMP0, ARG, 31 + | xor TMP1, ARG, TMP0 + | dsubu TMP1, TMP1, TMP0 // Absolute value in TMP1. + | dclz ARG, TMP1 + | addiu ARG, ARG, -11 + | li AT, 0x3ff+63-11-1 + | dsllv TMP1, TMP1, ARG // Align mantissa left with leading 1. + | subu ARG, AT, ARG // Exponent - 1. + | ins ARG, TMP0, 11, 11 // Sign | Exponent. + | dsll ARG, ARG, 52 // Align left. + | jr ra + |. daddu ARG, ARG, TMP1 // Add mantissa, increment exponent. + |9: + | jr ra + |. nop + |.endif + |.endmacro + | + |// Input CARG1. Output: CARG1. Temporaries: AT, TMP0, TMP1. + |->vm_sfi2d_1: + | sfi2d CARG1 + | + |// Input CARG2. Output: CARG2. Temporaries: AT, TMP0, TMP1. + |->vm_sfi2d_2: + | sfi2d CARG2 + | + |// Soft-float comparison. Equivalent to c.eq.d. + |// Input: CARG*. 
Output: CRET1. Temporaries: AT, TMP0, TMP1. + |->vm_sfcmpeq: + |.if not FPU + | dsll AT, CARG1, 1 + | dsll TMP0, CARG2, 1 + | or TMP1, AT, TMP0 + | beqz TMP1, >8 // Both args +-0: return 1. + |. lui TMP1, 0xffe0 + | dsll TMP1, TMP1, 32 + | sltu AT, TMP1, AT + | sltu TMP0, TMP1, TMP0 + | or TMP1, AT, TMP0 + | bnez TMP1, >9 // Either arg is NaN: return 0; + |. xor AT, CARG1, CARG2 + | jr ra + |. sltiu CRET1, AT, 1 // Same values: return 1. + |8: + | jr ra + |. li CRET1, 1 + |9: + | jr ra + |. li CRET1, 0 + |.endif + | + |// Soft-float comparison. Equivalent to c.ult.d and c.olt.d. + |// Input: CARG1, CARG2. Output: CRET1. Temporaries: AT, TMP0, TMP1, CRET2. + |->vm_sfcmpult: + |.if not FPU + | b >1 + |. li CRET2, 1 + |.endif + | + |->vm_sfcmpolt: + |.if not FPU + | li CRET2, 0 + |1: + | dsll AT, CARG1, 1 + | dsll TMP0, CARG2, 1 + | or TMP1, AT, TMP0 + | beqz TMP1, >8 // Both args +-0: return 0. + |. lui TMP1, 0xffe0 + | dsll TMP1, TMP1, 32 + | sltu AT, TMP1, AT + | sltu TMP0, TMP1, TMP0 + | or TMP1, AT, TMP0 + | bnez TMP1, >9 // Either arg is NaN: return 0 or 1; + |. and AT, CARG1, CARG2 + | bltz AT, >5 // Both args negative? + |. nop + | jr ra + |. slt CRET1, CARG1, CARG2 + |5: // Swap conditions if both operands are negative. + | jr ra + |. slt CRET1, CARG2, CARG1 + |8: + | jr ra + |. li CRET1, 0 + |9: + | jr ra + |. move CRET1, CRET2 + |.endif + | + |->vm_sfcmpogt: + |.if not FPU + | dsll AT, CARG2, 1 + | dsll TMP0, CARG1, 1 + | or TMP1, AT, TMP0 + | beqz TMP1, >8 // Both args +-0: return 0. + |. lui TMP1, 0xffe0 + | dsll TMP1, TMP1, 32 + | sltu AT, TMP1, AT + | sltu TMP0, TMP1, TMP0 + | or TMP1, AT, TMP0 + | bnez TMP1, >9 // Either arg is NaN: return 0 or 1; + |. and AT, CARG2, CARG1 + | bltz AT, >5 // Both args negative? + |. nop + | jr ra + |. slt CRET1, CARG2, CARG1 + |5: // Swap conditions if both operands are negative. + | jr ra + |. slt CRET1, CARG1, CARG2 + |8: + | jr ra + |. li CRET1, 0 + |9: + | jr ra + |. 
li CRET1, 0 + |.endif + | + |// Soft-float comparison. Equivalent to c.ole.d a, b or c.ole.d b, a. + |// Input: CARG1, CARG2, TMP3. Output: CRET1. Temporaries: AT, TMP0, TMP1. + |->vm_sfcmpolex: + |.if not FPU + | dsll AT, CARG1, 1 + | dsll TMP0, CARG2, 1 + | or TMP1, AT, TMP0 + | beqz TMP1, >8 // Both args +-0: return 1. + |. lui TMP1, 0xffe0 + | dsll TMP1, TMP1, 32 + | sltu AT, TMP1, AT + | sltu TMP0, TMP1, TMP0 + | or TMP1, AT, TMP0 + | bnez TMP1, >9 // Either arg is NaN: return 0; + |. and AT, CARG1, CARG2 + | xor AT, AT, TMP3 + | bltz AT, >5 // Both args negative? + |. nop + | jr ra + |. slt CRET1, CARG2, CARG1 + |5: // Swap conditions if both operands are negative. + | jr ra + |. slt CRET1, CARG1, CARG2 + |8: + | jr ra + |. li CRET1, 1 + |9: + | jr ra + |. li CRET1, 0 + |.endif + | + |.macro sfmin_max, name, fpcall + |->vm_sf .. name: + |.if JIT and not FPU + | move TMP2, ra + | bal ->fpcall + |. nop + | move ra, TMP2 + | move TMP0, CRET1 + | move CRET1, CARG1 + |.if MIPSR6 + | selnez CRET1, CRET1, TMP0 + | seleqz TMP0, CARG2, TMP0 + | jr ra + |. or CRET1, CRET1, TMP0 + |.else + | jr ra + |. movz CRET1, CARG2, TMP0 + |.endif + |.endif + |.endmacro + | + | sfmin_max min, vm_sfcmpolt + | sfmin_max max, vm_sfcmpogt + | + |//----------------------------------------------------------------------- + |//-- Miscellaneous functions -------------------------------------------- + |//----------------------------------------------------------------------- + | + |.define NEXT_TAB, TAB:CARG1 + |.define NEXT_IDX, CARG2 + |.define NEXT_ASIZE, CARG3 + |.define NEXT_NIL, CARG4 + |.define NEXT_TMP0, r12 + |.define NEXT_TMP1, r13 + |.define NEXT_TMP2, r14 + |.define NEXT_RES_VK, CRET1 + |.define NEXT_RES_IDX, CRET2 + |.define NEXT_RES_PTR, sp + |.define NEXT_RES_VAL, 0(sp) + |.define NEXT_RES_KEY, 8(sp) + | + |// TValue *lj_vm_next(GCtab *t, uint32_t idx) + |// Next idx returned in CRET2. 
+ |->vm_next: + |.if JIT and ENDIAN_LE + | lw NEXT_ASIZE, NEXT_TAB->asize + | ld NEXT_TMP0, NEXT_TAB->array + | li NEXT_NIL, LJ_TNIL + |1: // Traverse array part. + | sltu AT, NEXT_IDX, NEXT_ASIZE + | sll NEXT_TMP1, NEXT_IDX, 3 + | beqz AT, >5 + |. daddu NEXT_TMP1, NEXT_TMP0, NEXT_TMP1 + | li AT, LJ_TISNUM + | ld NEXT_TMP2, 0(NEXT_TMP1) + | dsll AT, AT, 47 + | or NEXT_TMP1, NEXT_IDX, AT + | beq NEXT_TMP2, NEXT_NIL, <1 + |. addiu NEXT_IDX, NEXT_IDX, 1 + | sd NEXT_TMP2, NEXT_RES_VAL + | sd NEXT_TMP1, NEXT_RES_KEY + | move NEXT_RES_VK, NEXT_RES_PTR + | jr ra + |. move NEXT_RES_IDX, NEXT_IDX + | + |5: // Traverse hash part. + | subu NEXT_RES_IDX, NEXT_IDX, NEXT_ASIZE + | ld NODE:NEXT_RES_VK, NEXT_TAB->node + | sll NEXT_TMP2, NEXT_RES_IDX, 5 + | lw NEXT_TMP0, NEXT_TAB->hmask + | sll AT, NEXT_RES_IDX, 3 + | subu AT, NEXT_TMP2, AT + | daddu NODE:NEXT_RES_VK, NODE:NEXT_RES_VK, AT + |6: + | sltu AT, NEXT_TMP0, NEXT_RES_IDX + | bnez AT, >8 + |. nop + | ld NEXT_TMP2, NODE:NEXT_RES_VK->val + | bne NEXT_TMP2, NEXT_NIL, >9 + |. addiu NEXT_RES_IDX, NEXT_RES_IDX, 1 + | // Skip holes in hash part. + | b <6 + |. daddiu NODE:NEXT_RES_VK, NODE:NEXT_RES_VK, sizeof(Node) + | + |8: // End of iteration. Set the key to nil (not the value). + | sd NEXT_NIL, NEXT_RES_KEY + | move NEXT_RES_VK, NEXT_RES_PTR + |9: + | jr ra + |. addu NEXT_RES_IDX, NEXT_RES_IDX, NEXT_ASIZE + |.endif + | + |//----------------------------------------------------------------------- + |//-- FFI helper functions ----------------------------------------------- + |//----------------------------------------------------------------------- + | + |// Handler for callback functions. Callback slot number in r1, g in r2. 
+ |->vm_ffi_callback: + |.if FFI + |.type CTSTATE, CTState, PC + | saveregs + | ld CTSTATE, GL:r2->ctype_state + | daddiu DISPATCH, r2, GG_G2DISP + | load_got lj_ccallback_enter + | sw r1, CTSTATE->cb.slot + | sd CARG1, CTSTATE->cb.gpr[0] + | .FPU sdc1 FARG1, CTSTATE->cb.fpr[0] + | sd CARG2, CTSTATE->cb.gpr[1] + | .FPU sdc1 FARG2, CTSTATE->cb.fpr[1] + | sd CARG3, CTSTATE->cb.gpr[2] + | .FPU sdc1 FARG3, CTSTATE->cb.fpr[2] + | sd CARG4, CTSTATE->cb.gpr[3] + | .FPU sdc1 FARG4, CTSTATE->cb.fpr[3] + | sd CARG5, CTSTATE->cb.gpr[4] + | .FPU sdc1 FARG5, CTSTATE->cb.fpr[4] + | sd CARG6, CTSTATE->cb.gpr[5] + | .FPU sdc1 FARG6, CTSTATE->cb.fpr[5] + | sd CARG7, CTSTATE->cb.gpr[6] + | .FPU sdc1 FARG7, CTSTATE->cb.fpr[6] + | sd CARG8, CTSTATE->cb.gpr[7] + | .FPU sdc1 FARG8, CTSTATE->cb.fpr[7] + | daddiu TMP0, sp, CFRAME_SPACE + | sd TMP0, CTSTATE->cb.stack + | sd r0, SAVE_PC // Any value outside of bytecode is ok. + | move CARG2, sp + | call_intern lj_ccallback_enter // (CTState *cts, void *cf) + |. move CARG1, CTSTATE + | // Returns lua_State *. + | ld BASE, L:CRET1->base + | ld RC, L:CRET1->top + | move L, CRET1 + | .FPU lui TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float). + | ld LFUNC:RB, FRAME_FUNC(BASE) + | .FPU mtc1 TMP3, TOBIT + | li TISNIL, LJ_TNIL + | li TISNUM, LJ_TISNUM + | li_vmstate INTERP + | subu RC, RC, BASE + | cleartp LFUNC:RB + | st_vmstate + | .FPU cvt.d.s TOBIT, TOBIT + | ins_callt + |.endif + | + |->cont_ffi_callback: // Return from FFI callback. + |.if FFI + | load_got lj_ccallback_leave + | ld CTSTATE, DISPATCH_GL(ctype_state)(DISPATCH) + | sd BASE, L->base + | sd RB, L->top + | sd L, CTSTATE->L + | move CARG2, RA + | call_intern lj_ccallback_leave // (CTState *cts, TValue *o) + |. move CARG1, CTSTATE + | .FPU ldc1 FRET1, CTSTATE->cb.fpr[0] + | ld CRET1, CTSTATE->cb.gpr[0] + | .FPU ldc1 FRET2, CTSTATE->cb.fpr[1] + | b ->vm_leave_unw + |. ld CRET2, CTSTATE->cb.gpr[1] + |.endif + | + |->vm_ffi_call: // Call C function via FFI. 
+ | // Caveat: needs special frame unwinding, see below. + |.if FFI + | .type CCSTATE, CCallState, CARG1 + | lw TMP1, CCSTATE->spadj + | lbu CARG2, CCSTATE->nsp + | move TMP2, sp + | dsubu sp, sp, TMP1 + | sd ra, -8(TMP2) + | sll CARG2, CARG2, 3 + | sd r16, -16(TMP2) + | sd CCSTATE, -24(TMP2) + | move r16, TMP2 + | daddiu TMP1, CCSTATE, offsetof(CCallState, stack) + | move TMP2, sp + | beqz CARG2, >2 + |. daddu TMP3, TMP1, CARG2 + |1: + | ld TMP0, 0(TMP1) + | daddiu TMP1, TMP1, 8 + | sltu AT, TMP1, TMP3 + | sd TMP0, 0(TMP2) + | bnez AT, <1 + |. daddiu TMP2, TMP2, 8 + |2: + | ld CFUNCADDR, CCSTATE->func + | .FPU ldc1 FARG1, CCSTATE->gpr[0] + | ld CARG2, CCSTATE->gpr[1] + | .FPU ldc1 FARG2, CCSTATE->gpr[1] + | ld CARG3, CCSTATE->gpr[2] + | .FPU ldc1 FARG3, CCSTATE->gpr[2] + | ld CARG4, CCSTATE->gpr[3] + | .FPU ldc1 FARG4, CCSTATE->gpr[3] + | ld CARG5, CCSTATE->gpr[4] + | .FPU ldc1 FARG5, CCSTATE->gpr[4] + | ld CARG6, CCSTATE->gpr[5] + | .FPU ldc1 FARG6, CCSTATE->gpr[5] + | ld CARG7, CCSTATE->gpr[6] + | .FPU ldc1 FARG7, CCSTATE->gpr[6] + | ld CARG8, CCSTATE->gpr[7] + | .FPU ldc1 FARG8, CCSTATE->gpr[7] + | jalr CFUNCADDR + |. ld CARG1, CCSTATE->gpr[0] // Do this last, since CCSTATE is CARG1. + | ld CCSTATE:TMP1, -24(r16) + | ld TMP2, -16(r16) + | ld ra, -8(r16) + | sd CRET1, CCSTATE:TMP1->gpr[0] + | sd CRET2, CCSTATE:TMP1->gpr[1] + |.if FPU + | sdc1 FRET1, CCSTATE:TMP1->fpr[0] + | sdc1 FRET2, CCSTATE:TMP1->fpr[1] + |.else + | sd CARG1, CCSTATE:TMP1->gpr[2] // 2nd FP struct field for soft-float. + |.endif + | move sp, r16 + | jr ra + |. move r16, TMP2 + |.endif + |// Note: vm_ffi_call must be the last function in this object file! + | + |//----------------------------------------------------------------------- +} + +/* Generate the code for a single instruction. 
*/ +static void build_ins(BuildCtx *ctx, BCOp op, int defop) +{ + int vk = 0; + |=>defop: + + switch (op) { + + /* -- Comparison ops ---------------------------------------------------- */ + + /* Remember: all ops branch for a true comparison, fall through otherwise. */ + + case BC_ISLT: case BC_ISGE: case BC_ISLE: case BC_ISGT: + | // RA = src1*8, RD = src2*8, JMP with RD = target + |.macro bc_comp, FRA, FRD, ARGRA, ARGRD, movop, fmovop, fcomp, sfcomp + | daddu RA, BASE, RA + | daddu RD, BASE, RD + | ld ARGRA, 0(RA) + | ld ARGRD, 0(RD) + | lhu TMP2, OFS_RD(PC) + | gettp CARG3, ARGRA + | gettp CARG4, ARGRD + | bne CARG3, TISNUM, >2 + |. daddiu PC, PC, 4 + | bne CARG4, TISNUM, >5 + |. decode_RD4b TMP2 + | sextw ARGRA, ARGRA + | sextw ARGRD, ARGRD + | lui TMP3, (-(BCBIAS_J*4 >> 16) & 65535) + | slt AT, CARG1, CARG2 + | addu TMP2, TMP2, TMP3 + |.if MIPSR6 + | movop TMP2, TMP2, AT + |.else + | movop TMP2, r0, AT + |.endif + |1: + | daddu PC, PC, TMP2 + | ins_next + | + |2: // RA is not an integer. + | sltiu AT, CARG3, LJ_TISNUM + | beqz AT, ->vmeta_comp + |. lui TMP3, (-(BCBIAS_J*4 >> 16) & 65535) + | sltiu AT, CARG4, LJ_TISNUM + | beqz AT, >4 + |. decode_RD4b TMP2 + |.if FPU + | ldc1 FRA, 0(RA) + | ldc1 FRD, 0(RD) + |.endif + |3: // RA and RD are both numbers. + |.if FPU + |.if MIPSR6 + | fcomp FTMP0, FTMP0, FTMP2 + | addu TMP2, TMP2, TMP3 + | mfc1 TMP3, FTMP0 + | b <1 + |. fmovop TMP2, TMP2, TMP3 + |.else + | fcomp FTMP0, FTMP2 + | addu TMP2, TMP2, TMP3 + | b <1 + |. fmovop TMP2, r0 + |.endif + |.else + | bal sfcomp + |. addu TMP2, TMP2, TMP3 + | b <1 + |.if MIPSR6 + |. movop TMP2, TMP2, CRET1 + |.else + |. movop TMP2, r0, CRET1 + |.endif + |.endif + | + |4: // RA is a number, RD is not a number. + | bne CARG4, TISNUM, ->vmeta_comp + | // RA is a number, RD is an integer. Convert RD to a number. + |.if FPU + |. lwc1 FRD, LO(RD) + | ldc1 FRA, 0(RA) + | b <3 + |. cvt.d.w FRD, FRD + |.else + |.if "ARGRD" == "CARG1" + |. sextw CARG1, CARG1 + | bal ->vm_sfi2d_1 + |. 
nop + |.else + |. sextw CARG2, CARG2 + | bal ->vm_sfi2d_2 + |. nop + |.endif + | b <3 + |. nop + |.endif + | + |5: // RA is an integer, RD is not an integer + | sltiu AT, CARG4, LJ_TISNUM + | beqz AT, ->vmeta_comp + |. lui TMP3, (-(BCBIAS_J*4 >> 16) & 65535) + | // RA is an integer, RD is a number. Convert RA to a number. + |.if FPU + | lwc1 FRA, LO(RA) + | ldc1 FRD, 0(RD) + | b <3 + | cvt.d.w FRA, FRA + |.else + |.if "ARGRA" == "CARG1" + | bal ->vm_sfi2d_1 + |. sextw CARG1, CARG1 + |.else + | bal ->vm_sfi2d_2 + |. sextw CARG2, CARG2 + |.endif + | b <3 + |. nop + |.endif + |.endmacro + | + |.if MIPSR6 + if (op == BC_ISLT) { + | bc_comp FTMP0, FTMP2, CARG1, CARG2, selnez, selnez, cmp.lt.d, ->vm_sfcmpolt + } else if (op == BC_ISGE) { + | bc_comp FTMP0, FTMP2, CARG1, CARG2, seleqz, seleqz, cmp.lt.d, ->vm_sfcmpolt + } else if (op == BC_ISLE) { + | bc_comp FTMP2, FTMP0, CARG2, CARG1, seleqz, seleqz, cmp.ult.d, ->vm_sfcmpult + } else { + | bc_comp FTMP2, FTMP0, CARG2, CARG1, selnez, selnez, cmp.ult.d, ->vm_sfcmpult + } + |.else + if (op == BC_ISLT) { + | bc_comp FTMP0, FTMP2, CARG1, CARG2, movz, movf, c.olt.d, ->vm_sfcmpolt + } else if (op == BC_ISGE) { + | bc_comp FTMP0, FTMP2, CARG1, CARG2, movn, movt, c.olt.d, ->vm_sfcmpolt + } else if (op == BC_ISLE) { + | bc_comp FTMP2, FTMP0, CARG2, CARG1, movn, movt, c.ult.d, ->vm_sfcmpult + } else { + | bc_comp FTMP2, FTMP0, CARG2, CARG1, movz, movf, c.ult.d, ->vm_sfcmpult + } + |.endif + break; + + case BC_ISEQV: case BC_ISNEV: + vk = op == BC_ISEQV; + | // RA = src1*8, RD = src2*8, JMP with RD = target + | daddu RA, BASE, RA + | daddiu PC, PC, 4 + | daddu RD, BASE, RD + | ld CARG1, 0(RA) + | lhu TMP2, -4+OFS_RD(PC) + | ld CARG2, 0(RD) + | gettp CARG3, CARG1 + | gettp CARG4, CARG2 + | sltu AT, TISNUM, CARG3 + | sltu TMP1, TISNUM, CARG4 + | or AT, AT, TMP1 + if (vk) { + | beqz AT, ->BC_ISEQN_Z + } else { + | beqz AT, ->BC_ISNEN_Z + } + | // Either or both types are not numbers. 
+ | lui TMP3, (-(BCBIAS_J*4 >> 16) & 65535) + |.if FFI + |. li AT, LJ_TCDATA + | beq CARG3, AT, ->vmeta_equal_cd + |.endif + | decode_RD4b TMP2 + |.if FFI + | beq CARG4, AT, ->vmeta_equal_cd + |. nop + |.endif + | bne CARG1, CARG2, >2 + |. addu TMP2, TMP2, TMP3 + | // Tag and value are equal. + if (vk) { + |->BC_ISEQV_Z: + | daddu PC, PC, TMP2 + } + |1: + | ins_next + | + |2: // Check if the tags are the same and it's a table or userdata. + | xor AT, CARG3, CARG4 // Same type? + | sltiu TMP0, CARG3, LJ_TISTABUD+1 // Table or userdata? + |.if MIPSR6 + | seleqz TMP0, TMP0, AT + |.else + | movn TMP0, r0, AT + |.endif + if (vk) { + | beqz TMP0, <1 + } else { + | beqz TMP0, ->BC_ISEQV_Z // Reuse code from opposite instruction. + } + | // Different tables or userdatas. Need to check __eq metamethod. + | // Field metatable must be at same offset for GCtab and GCudata! + |. cleartp TAB:TMP1, CARG1 + | ld TAB:TMP3, TAB:TMP1->metatable + if (vk) { + | beqz TAB:TMP3, <1 // No metatable? + |. nop + | lbu TMP3, TAB:TMP3->nomm + | andi TMP3, TMP3, 1<<MM_eq + | bnez TMP3, >1 // Or 'no __eq' flag set? + } else { + | beqz TAB:TMP3,->BC_ISEQV_Z // No metatable? + |. nop + | lbu TMP3, TAB:TMP3->nomm + | andi TMP3, TMP3, 1<<MM_eq + | bnez TMP3, ->BC_ISEQV_Z // Or 'no __eq' flag set? + } + |. nop + | b ->vmeta_equal // Handle __eq metamethod. + |. li TMP0, 1-vk // ne = 0 or 1. + break; + + case BC_ISEQS: case BC_ISNES: + vk = op == BC_ISEQS; + | // RA = src*8, RD = str_const*8 (~), JMP with RD = target + | daddu RA, BASE, RA + | daddiu PC, PC, 4 + | ld CARG1, 0(RA) + | dsubu RD, KBASE, RD + | lhu TMP2, -4+OFS_RD(PC) + | ld CARG2, -8(RD) // KBASE-8-str_const*8 + |.if FFI + | gettp TMP0, CARG1 + | li AT, LJ_TCDATA + |.endif + | li TMP1, LJ_TSTR + | decode_RD4b TMP2 + |.if FFI + | beq TMP0, AT, ->vmeta_equal_cd + |.endif + |.
settp CARG2, TMP1 + | lui TMP3, (-(BCBIAS_J*4 >> 16) & 65535) + | xor TMP1, CARG1, CARG2 + | addu TMP2, TMP2, TMP3 + |.if MIPSR6 + if (vk) { + | seleqz TMP2, TMP2, TMP1 + } else { + | selnez TMP2, TMP2, TMP1 + } + |.else + if (vk) { + | movn TMP2, r0, TMP1 + } else { + | movz TMP2, r0, TMP1 + } + |.endif + | daddu PC, PC, TMP2 + | ins_next + break; + + case BC_ISEQN: case BC_ISNEN: + vk = op == BC_ISEQN; + | // RA = src*8, RD = num_const*8, JMP with RD = target + | daddu RA, BASE, RA + | daddu RD, KBASE, RD + | ld CARG1, 0(RA) + | ld CARG2, 0(RD) + | lhu TMP2, OFS_RD(PC) + | gettp CARG3, CARG1 + | gettp CARG4, CARG2 + | daddiu PC, PC, 4 + | lui TMP3, (-(BCBIAS_J*4 >> 16) & 65535) + if (vk) { + |->BC_ISEQN_Z: + } else { + |->BC_ISNEN_Z: + } + | bne CARG3, TISNUM, >3 + |. decode_RD4b TMP2 + | bne CARG4, TISNUM, >6 + |. addu TMP2, TMP2, TMP3 + | xor AT, CARG1, CARG2 + |.if MIPSR6 + if (vk) { + | seleqz TMP2, TMP2, AT + |1: + | daddu PC, PC, TMP2 + |2: + } else { + | selnez TMP2, TMP2, AT + |1: + |2: + | daddu PC, PC, TMP2 + } + |.else + if (vk) { + | movn TMP2, r0, AT + |1: + | daddu PC, PC, TMP2 + |2: + } else { + | movz TMP2, r0, AT + |1: + |2: + | daddu PC, PC, TMP2 + } + |.endif + | ins_next + | + |3: // RA is not an integer. + | sltu AT, CARG3, TISNUM + |.if FFI + | beqz AT, >8 + |.else + | beqz AT, <2 + |.endif + |. addu TMP2, TMP2, TMP3 + | sltu AT, CARG4, TISNUM + |.if FPU + | ldc1 FTMP0, 0(RA) + | ldc1 FTMP2, 0(RD) + |.endif + | beqz AT, >5 + |. nop + |4: // RA and RD are both numbers. + |.if FPU + |.if MIPSR6 + | cmp.eq.d FTMP0, FTMP0, FTMP2 + | dmfc1 TMP1, FTMP0 + | b <1 + if (vk) { + |. selnez TMP2, TMP2, TMP1 + } else { + |. seleqz TMP2, TMP2, TMP1 + } + |.else + | c.eq.d FTMP0, FTMP2 + | b <1 + if (vk) { + |. movf TMP2, r0 + } else { + |. movt TMP2, r0 + } + |.endif + |.else + | bal ->vm_sfcmpeq + |. nop + | b <1 + |.if MIPSR6 + if (vk) { + |. selnez TMP2, TMP2, CRET1 + } else { + |. seleqz TMP2, TMP2, CRET1 + } + |.else + if (vk) { + |. 
movz TMP2, r0, CRET1 + } else { + |. movn TMP2, r0, CRET1 + } + |.endif + |.endif + | + |5: // RA is a number, RD is not a number. + |.if FFI + | bne CARG4, TISNUM, >9 + |.else + | bne CARG4, TISNUM, <2 + |.endif + | // RA is a number, RD is an integer. Convert RD to a number. + |.if FPU + |. lwc1 FTMP2, LO(RD) + | b <4 + |. cvt.d.w FTMP2, FTMP2 + |.else + |. sextw CARG2, CARG2 + | bal ->vm_sfi2d_2 + |. nop + | b <4 + |. nop + |.endif + | + |6: // RA is an integer, RD is not an integer + | sltu AT, CARG4, TISNUM + |.if FFI + | beqz AT, >9 + |.else + | beqz AT, <2 + |.endif + | // RA is an integer, RD is a number. Convert RA to a number. + |.if FPU + |. lwc1 FTMP0, LO(RA) + | ldc1 FTMP2, 0(RD) + | b <4 + | cvt.d.w FTMP0, FTMP0 + |.else + |. sextw CARG1, CARG1 + | bal ->vm_sfi2d_1 + |. nop + | b <4 + |. nop + |.endif + | + |.if FFI + |8: + | li AT, LJ_TCDATA + | bne CARG3, AT, <2 + |. nop + | b ->vmeta_equal_cd + |. nop + |9: + | li AT, LJ_TCDATA + | bne CARG4, AT, <2 + |. nop + | b ->vmeta_equal_cd + |. nop + |.endif + break; + + case BC_ISEQP: case BC_ISNEP: + vk = op == BC_ISEQP; + | // RA = src*8, RD = primitive_type*8 (~), JMP with RD = target + | daddu RA, BASE, RA + | srl TMP1, RD, 3 + | ld TMP0, 0(RA) + | lhu TMP2, OFS_RD(PC) + | not TMP1, TMP1 + | gettp TMP0, TMP0 + | daddiu PC, PC, 4 + |.if FFI + | li AT, LJ_TCDATA + | beq TMP0, AT, ->vmeta_equal_cd + |.endif + |. 
xor TMP0, TMP0, TMP1 + | decode_RD4b TMP2 + | lui TMP3, (-(BCBIAS_J*4 >> 16) & 65535) + | addu TMP2, TMP2, TMP3 + |.if MIPSR6 + if (vk) { + | seleqz TMP2, TMP2, TMP0 + } else { + | selnez TMP2, TMP2, TMP0 + } + |.else + if (vk) { + | movn TMP2, r0, TMP0 + } else { + | movz TMP2, r0, TMP0 + } + |.endif + | daddu PC, PC, TMP2 + | ins_next + break; + + /* -- Unary test and copy ops ------------------------------------------- */ + + case BC_ISTC: case BC_ISFC: case BC_IST: case BC_ISF: + | // RA = dst*8 or unused, RD = src*8, JMP with RD = target + | daddu RD, BASE, RD + | lhu TMP2, OFS_RD(PC) + | ld TMP0, 0(RD) + | daddiu PC, PC, 4 + | gettp TMP0, TMP0 + | sltiu TMP0, TMP0, LJ_TISTRUECOND + if (op == BC_IST || op == BC_ISF) { + | decode_RD4b TMP2 + | lui TMP3, (-(BCBIAS_J*4 >> 16) & 65535) + | addu TMP2, TMP2, TMP3 + |.if MIPSR6 + if (op == BC_IST) { + | selnez TMP2, TMP2, TMP0; + } else { + | seleqz TMP2, TMP2, TMP0; + } + |.else + if (op == BC_IST) { + | movz TMP2, r0, TMP0 + } else { + | movn TMP2, r0, TMP0 + } + |.endif + | daddu PC, PC, TMP2 + } else { + | ld CRET1, 0(RD) + if (op == BC_ISTC) { + | beqz TMP0, >1 + } else { + | bnez TMP0, >1 + } + |. daddu RA, BASE, RA + | decode_RD4b TMP2 + | lui TMP3, (-(BCBIAS_J*4 >> 16) & 65535) + | addu TMP2, TMP2, TMP3 + | sd CRET1, 0(RA) + | daddu PC, PC, TMP2 + |1: + } + | ins_next + break; + + case BC_ISTYPE: + | // RA = src*8, RD = -type*8 + | daddu TMP2, BASE, RA + | srl TMP1, RD, 3 + | ld TMP0, 0(TMP2) + | ins_next1 + | gettp TMP0, TMP0 + | daddu AT, TMP0, TMP1 + | bnez AT, ->vmeta_istype + |. ins_next2 + break; + case BC_ISNUM: + | // RA = src*8, RD = -(TISNUM-1)*8 + | daddu TMP2, BASE, RA + | ld TMP0, 0(TMP2) + | ins_next1 + | checknum TMP0, ->vmeta_istype + |. 
ins_next2 + break; + + /* -- Unary ops --------------------------------------------------------- */ + + case BC_MOV: + | // RA = dst*8, RD = src*8 + | daddu RD, BASE, RD + | daddu RA, BASE, RA + | ld CRET1, 0(RD) + | ins_next1 + | sd CRET1, 0(RA) + | ins_next2 + break; + case BC_NOT: + | // RA = dst*8, RD = src*8 + | daddu RD, BASE, RD + | daddu RA, BASE, RA + | ld TMP0, 0(RD) + | li AT, LJ_TTRUE + | gettp TMP0, TMP0 + | sltu TMP0, AT, TMP0 + | addiu TMP0, TMP0, 1 + | dsll TMP0, TMP0, 47 + | not TMP0, TMP0 + | ins_next1 + | sd TMP0, 0(RA) + | ins_next2 + break; + case BC_UNM: + | // RA = dst*8, RD = src*8 + | daddu RB, BASE, RD + | ld CARG1, 0(RB) + | daddu RA, BASE, RA + | gettp CARG3, CARG1 + | bne CARG3, TISNUM, >2 + |. lui TMP1, 0x8000 + | sextw CARG1, CARG1 + | beq CARG1, TMP1, ->vmeta_unm // Meta handler deals with -2^31. + |. negu CARG1, CARG1 + | zextw CARG1, CARG1 + | settp CARG1, TISNUM + |1: + | ins_next1 + | sd CARG1, 0(RA) + | ins_next2 + |2: + | sltiu AT, CARG3, LJ_TISNUM + | beqz AT, ->vmeta_unm + |. dsll TMP1, TMP1, 32 + | b <1 + |. xor CARG1, CARG1, TMP1 + break; + case BC_LEN: + | // RA = dst*8, RD = src*8 + | daddu CARG2, BASE, RD + | daddu RA, BASE, RA + | ld TMP0, 0(CARG2) + | gettp TMP1, TMP0 + | daddiu AT, TMP1, -LJ_TSTR + | bnez AT, >2 + |. cleartp STR:CARG1, TMP0 + | lw CRET1, STR:CARG1->len + |1: + | settp CRET1, TISNUM + | ins_next1 + | sd CRET1, 0(RA) + | ins_next2 + |2: + | daddiu AT, TMP1, -LJ_TTAB + | bnez AT, ->vmeta_len + |. nop +#if LJ_52 + | ld TAB:TMP2, TAB:CARG1->metatable + | bnez TAB:TMP2, >9 + |. nop + |3: +#endif + |->BC_LEN_Z: + | load_got lj_tab_len + | call_intern lj_tab_len // (GCtab *t) + |. nop + | // Returns uint32_t (but less than 2^31). + | b <1 + |. nop +#if LJ_52 + |9: + | lbu TMP0, TAB:TMP2->nomm + | andi TMP0, TMP0, 1<<MM_len + | bnez TMP0, <3 // 'no __len' flag set: done. + |. nop + | b ->vmeta_len + |. nop +#endif + break; + + /* -- Binary ops -------------------------------------------------------- */ + + |.macro fpmod, a, b, c + | bal ->vm_floor // floor(b/c) + |.
div.d FARG1, b, c + | mul.d a, FRET1, c + | sub.d a, b, a // b - floor(b/c)*c + |.endmacro + + |.macro sfpmod + | daddiu sp, sp, -16 + | + | load_got __divdf3 + | sd CARG1, 0(sp) + | call_extern + |. sd CARG2, 8(sp) + | + | load_got floor + | call_extern + |. move CARG1, CRET1 + | + | load_got __muldf3 + | move CARG1, CRET1 + | call_extern + |. ld CARG2, 8(sp) + | + | load_got __subdf3 + | ld CARG1, 0(sp) + | call_extern + |. move CARG2, CRET1 + | + | daddiu sp, sp, 16 + |.endmacro + + |.macro ins_arithpre, label + ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN); + | // RA = dst*8, RB = src1*8, RC = src2*8 | num_const*8 + ||switch (vk) { + ||case 0: + | decode_RB8a RB, INS + | decode_RB8b RB + | decode_RDtoRC8 RC, RD + | // RA = dst*8, RB = src1*8, RC = num_const*8 + | daddu RB, BASE, RB + |.if "label" ~= "none" + | b label + |.endif + |. daddu RC, KBASE, RC + || break; + ||case 1: + | decode_RB8a RC, INS + | decode_RB8b RC + | decode_RDtoRC8 RB, RD + | // RA = dst*8, RB = num_const*8, RC = src1*8 + | daddu RC, BASE, RC + |.if "label" ~= "none" + | b label + |.endif + |. daddu RB, KBASE, RB + || break; + ||default: + | decode_RB8a RB, INS + | decode_RB8b RB + | decode_RDtoRC8 RC, RD + | // RA = dst*8, RB = src1*8, RC = src2*8 + | daddu RB, BASE, RB + |.if "label" ~= "none" + | b label + |.endif + |. daddu RC, BASE, RC + || break; + ||} + |.endmacro + | + |.macro ins_arith, intins, fpins, fpcall, label + | ins_arithpre none + | + |.if "label" ~= "none" + |label: + |.endif + | + |// Used in 5. + | ld CARG1, 0(RB) + | ld CARG2, 0(RC) + | gettp TMP0, CARG1 + | gettp TMP1, CARG2 + | + |.if "intins" ~= "div" + | + | // Check for two integers. + | sextw CARG3, CARG1 + | bne TMP0, TISNUM, >5 + |. sextw CARG4, CARG2 + | bne TMP1, TISNUM, >5 + | + |.if "intins" == "addu" + |. intins CRET1, CARG3, CARG4 + | xor TMP1, CRET1, CARG3 // ((y^a) & (y^b)) < 0: overflow. + | xor TMP2, CRET1, CARG4 + | and TMP1, TMP1, TMP2 + | bltz TMP1, ->vmeta_arith + |. 
daddu RA, BASE, RA + |.elif "intins" == "subu" + |. intins CRET1, CARG3, CARG4 + | xor TMP1, CRET1, CARG3 // ((y^a) & (a^b)) < 0: overflow. + | xor TMP2, CARG3, CARG4 + | and TMP1, TMP1, TMP2 + | bltz TMP1, ->vmeta_arith + |. daddu RA, BASE, RA + |.elif "intins" == "mult" + |.if MIPSR6 + |. nop + | mul CRET1, CARG3, CARG4 + | muh TMP2, CARG3, CARG4 + |.else + |. intins CARG3, CARG4 + | mflo CRET1 + | mfhi TMP2 + |.endif + | sra TMP1, CRET1, 31 + | bne TMP1, TMP2, ->vmeta_arith + |. daddu RA, BASE, RA + |.else + |. load_got lj_vm_modi + | beqz CARG4, ->vmeta_arith + |. daddu RA, BASE, RA + | move CARG1, CARG3 + | call_extern + |. move CARG2, CARG4 + |.endif + | + | zextw CRET1, CRET1 + | settp CRET1, TISNUM + | ins_next1 + | sd CRET1, 0(RA) + |3: + | ins_next2 + | + |.endif + | + |5: // Check for two numbers. + | .FPU ldc1 FTMP0, 0(RB) + | sltu AT, TMP0, TISNUM + | sltu TMP0, TMP1, TISNUM + | .FPU ldc1 FTMP2, 0(RC) + | and AT, AT, TMP0 + | beqz AT, ->vmeta_arith + |. daddu RA, BASE, RA + | + |.if FPU + | fpins FRET1, FTMP0, FTMP2 + |.elif "fpcall" == "sfpmod" + | sfpmod + |.else + | load_got fpcall + | call_extern + |. nop + |.endif + | + | ins_next1 + |.if "intins" ~= "div" + | b <3 + |.endif + |.if FPU + |. sdc1 FRET1, 0(RA) + |.else + |. 
sd CRET1, 0(RA) + |.endif + |.if "intins" == "div" + | ins_next2 + |.endif + | + |.endmacro + + case BC_ADDVN: case BC_ADDNV: case BC_ADDVV: + | ins_arith addu, add.d, __adddf3, none + break; + case BC_SUBVN: case BC_SUBNV: case BC_SUBVV: + | ins_arith subu, sub.d, __subdf3, none + break; + case BC_MULVN: case BC_MULNV: case BC_MULVV: + | ins_arith mult, mul.d, __muldf3, none + break; + case BC_DIVVN: + | ins_arith div, div.d, __divdf3, ->BC_DIVVN_Z + break; + case BC_DIVNV: case BC_DIVVV: + | ins_arithpre ->BC_DIVVN_Z + break; + case BC_MODVN: + | ins_arith modi, fpmod, sfpmod, ->BC_MODVN_Z + break; + case BC_MODNV: case BC_MODVV: + | ins_arithpre ->BC_MODVN_Z + break; + case BC_POW: + | ins_arithpre none + | ld CARG1, 0(RB) + | ld CARG2, 0(RC) + | gettp TMP0, CARG1 + | gettp TMP1, CARG2 + | sltiu TMP0, TMP0, LJ_TISNUM + | sltiu TMP1, TMP1, LJ_TISNUM + | and AT, TMP0, TMP1 + | load_got pow + | beqz AT, ->vmeta_arith + |. daddu RA, BASE, RA + |.if FPU + | ldc1 FARG1, 0(RB) + | ldc1 FARG2, 0(RC) + |.endif + | call_extern + |. nop + | ins_next1 + |.if FPU + | sdc1 FRET1, 0(RA) + |.else + | sd CRET1, 0(RA) + |.endif + | ins_next2 + break; + + case BC_CAT: + | // RA = dst*8, RB = src_start*8, RC = src_end*8 + | decode_RB8a RB, INS + | decode_RB8b RB + | decode_RDtoRC8 RC, RD + | dsubu CARG3, RC, RB + | sd BASE, L->base + | daddu CARG2, BASE, RC + | move MULTRES, RB + |->BC_CAT_Z: + | load_got lj_meta_cat + | srl CARG3, CARG3, 3 + | sd PC, SAVE_PC + | call_intern lj_meta_cat // (lua_State *L, TValue *top, int left) + |. move CARG1, L + | // Returns NULL (finished) or TValue * (metamethod). + | bnez CRET1, ->vmeta_binop + |. 
ld BASE, L->base + | daddu RB, BASE, MULTRES + | ld CRET1, 0(RB) + | daddu RA, BASE, RA + | ins_next1 + | sd CRET1, 0(RA) + | ins_next2 + break; + + /* -- Constant ops ------------------------------------------------------ */ + + case BC_KSTR: + | // RA = dst*8, RD = str_const*8 (~) + | dsubu TMP1, KBASE, RD + | ins_next1 + | li TMP2, LJ_TSTR + | ld TMP0, -8(TMP1) // KBASE-8-str_const*8 + | daddu RA, BASE, RA + | settp TMP0, TMP2 + | sd TMP0, 0(RA) + | ins_next2 + break; + case BC_KCDATA: + |.if FFI + | // RA = dst*8, RD = cdata_const*8 (~) + | dsubu TMP1, KBASE, RD + | ins_next1 + | ld TMP0, -8(TMP1) // KBASE-8-cdata_const*8 + | li TMP2, LJ_TCDATA + | daddu RA, BASE, RA + | settp TMP0, TMP2 + | sd TMP0, 0(RA) + | ins_next2 + |.endif + break; + case BC_KSHORT: + | // RA = dst*8, RD = int16_literal*8 + | sra RD, INS, 16 + | daddu RA, BASE, RA + | zextw RD, RD + | ins_next1 + | settp RD, TISNUM + | sd RD, 0(RA) + | ins_next2 + break; + case BC_KNUM: + | // RA = dst*8, RD = num_const*8 + | daddu RD, KBASE, RD + | daddu RA, BASE, RA + | ld CRET1, 0(RD) + | ins_next1 + | sd CRET1, 0(RA) + | ins_next2 + break; + case BC_KPRI: + | // RA = dst*8, RD = primitive_type*8 (~) + | daddu RA, BASE, RA + | dsll TMP0, RD, 44 + | not TMP0, TMP0 + | ins_next1 + | sd TMP0, 0(RA) + | ins_next2 + break; + case BC_KNIL: + | // RA = base*8, RD = end*8 + | daddu RA, BASE, RA + | sd TISNIL, 0(RA) + | daddiu RA, RA, 8 + | daddu RD, BASE, RD + |1: + | sd TISNIL, 0(RA) + | slt AT, RA, RD + | bnez AT, <1 + |. 
daddiu RA, RA, 8 + | ins_next_ + break; + + /* -- Upvalue and function ops ------------------------------------------ */ + + case BC_UGET: + | // RA = dst*8, RD = uvnum*8 + | ld LFUNC:RB, FRAME_FUNC(BASE) + | daddu RA, BASE, RA + | cleartp LFUNC:RB + | daddu RD, RD, LFUNC:RB + | ld UPVAL:RB, LFUNC:RD->uvptr + | ins_next1 + | ld TMP1, UPVAL:RB->v + | ld CRET1, 0(TMP1) + | sd CRET1, 0(RA) + | ins_next2 + break; + case BC_USETV: + | // RA = uvnum*8, RD = src*8 + | ld LFUNC:RB, FRAME_FUNC(BASE) + | daddu RD, BASE, RD + | cleartp LFUNC:RB + | daddu RA, RA, LFUNC:RB + | ld UPVAL:RB, LFUNC:RA->uvptr + | ld CRET1, 0(RD) + | lbu TMP3, UPVAL:RB->marked + | ld CARG2, UPVAL:RB->v + | andi TMP3, TMP3, LJ_GC_BLACK // isblack(uv) + | lbu TMP0, UPVAL:RB->closed + | gettp TMP2, CRET1 + | sd CRET1, 0(CARG2) + | li AT, LJ_GC_BLACK|1 + | or TMP3, TMP3, TMP0 + | beq TMP3, AT, >2 // Upvalue is closed and black? + |. daddiu TMP2, TMP2, -(LJ_TNUMX+1) + |1: + | ins_next + | + |2: // Check if new value is collectable. + | sltiu AT, TMP2, LJ_TISGCV - (LJ_TNUMX+1) + | beqz AT, <1 // tvisgcv(v) + |. cleartp GCOBJ:CRET1, CRET1 + | lbu TMP3, GCOBJ:CRET1->gch.marked + | andi TMP3, TMP3, LJ_GC_WHITES // iswhite(v) + | beqz TMP3, <1 + |. load_got lj_gc_barrieruv + | // Crossed a write barrier. Move the barrier forward. + | call_intern lj_gc_barrieruv // (global_State *g, TValue *tv) + |. daddiu CARG1, DISPATCH, GG_DISP2G + | b <1 + |. nop + break; + case BC_USETS: + | // RA = uvnum*8, RD = str_const*8 (~) + | ld LFUNC:RB, FRAME_FUNC(BASE) + | dsubu TMP1, KBASE, RD + | cleartp LFUNC:RB + | daddu RA, RA, LFUNC:RB + | ld UPVAL:RB, LFUNC:RA->uvptr + | ld STR:TMP1, -8(TMP1) // KBASE-8-str_const*8 + | lbu TMP2, UPVAL:RB->marked + | ld CARG2, UPVAL:RB->v + | lbu TMP3, STR:TMP1->marked + | andi AT, TMP2, LJ_GC_BLACK // isblack(uv) + | lbu TMP2, UPVAL:RB->closed + | li TMP0, LJ_TSTR + | settp TMP1, TMP0 + | bnez AT, >2 + |. 
sd TMP1, 0(CARG2) + |1: + | ins_next + | + |2: // Check if string is white and ensure upvalue is closed. + | beqz TMP2, <1 + |. andi AT, TMP3, LJ_GC_WHITES // iswhite(str) + | beqz AT, <1 + |. load_got lj_gc_barrieruv + | // Crossed a write barrier. Move the barrier forward. + | call_intern lj_gc_barrieruv // (global_State *g, TValue *tv) + |. daddiu CARG1, DISPATCH, GG_DISP2G + | b <1 + |. nop + break; + case BC_USETN: + | // RA = uvnum*8, RD = num_const*8 + | ld LFUNC:RB, FRAME_FUNC(BASE) + | daddu RD, KBASE, RD + | cleartp LFUNC:RB + | daddu RA, RA, LFUNC:RB + | ld UPVAL:RB, LFUNC:RA->uvptr + | ld CRET1, 0(RD) + | ld TMP1, UPVAL:RB->v + | ins_next1 + | sd CRET1, 0(TMP1) + | ins_next2 + break; + case BC_USETP: + | // RA = uvnum*8, RD = primitive_type*8 (~) + | ld LFUNC:RB, FRAME_FUNC(BASE) + | dsll TMP0, RD, 44 + | cleartp LFUNC:RB + | daddu RA, RA, LFUNC:RB + | not TMP0, TMP0 + | ld UPVAL:RB, LFUNC:RA->uvptr + | ins_next1 + | ld TMP1, UPVAL:RB->v + | sd TMP0, 0(TMP1) + | ins_next2 + break; + + case BC_UCLO: + | // RA = level*8, RD = target + | ld TMP2, L->openupval + | branch_RD // Do this first since RD is not saved. + | load_got lj_func_closeuv + | sd BASE, L->base + | beqz TMP2, >1 + |. move CARG1, L + | call_intern lj_func_closeuv // (lua_State *L, TValue *level) + |. daddu CARG2, BASE, RA + | ld BASE, L->base + |1: + | ins_next + break; + + case BC_FNEW: + | // RA = dst*8, RD = proto_const*8 (~) (holding function prototype) + | load_got lj_func_newL_gc + | dsubu TMP1, KBASE, RD + | ld CARG3, FRAME_FUNC(BASE) + | ld CARG2, -8(TMP1) // KBASE-8-tab_const*8 + | sd BASE, L->base + | sd PC, SAVE_PC + | cleartp CARG3 + | // (lua_State *L, GCproto *pt, GCfuncL *parent) + | call_intern lj_func_newL_gc + |. move CARG1, L + | // Returns GCfuncL *. 
+ | li TMP0, LJ_TFUNC + | ld BASE, L->base + | ins_next1 + | settp CRET1, TMP0 + | daddu RA, BASE, RA + | sd CRET1, 0(RA) + | ins_next2 + break; + + /* -- Table ops --------------------------------------------------------- */ + + case BC_TNEW: + case BC_TDUP: + | // RA = dst*8, RD = (hbits|asize)*8 | tab_const*8 (~) + | ld TMP0, DISPATCH_GL(gc.total)(DISPATCH) + | ld TMP1, DISPATCH_GL(gc.threshold)(DISPATCH) + | sd BASE, L->base + | sd PC, SAVE_PC + | sltu AT, TMP0, TMP1 + | beqz AT, >5 + |1: + if (op == BC_TNEW) { + | load_got lj_tab_new + | srl CARG2, RD, 3 + | andi CARG2, CARG2, 0x7ff + | li TMP0, 0x801 + | addiu AT, CARG2, -0x7ff + | srl CARG3, RD, 14 + |.if MIPSR6 + | seleqz TMP0, TMP0, AT + | selnez CARG2, CARG2, AT + | or CARG2, CARG2, TMP0 + |.else + | movz CARG2, TMP0, AT + |.endif + | // (lua_State *L, int32_t asize, uint32_t hbits) + | call_intern lj_tab_new + |. move CARG1, L + | // Returns Table *. + } else { + | load_got lj_tab_dup + | dsubu TMP1, KBASE, RD + | move CARG1, L + | call_intern lj_tab_dup // (lua_State *L, Table *kt) + |. ld CARG2, -8(TMP1) // KBASE-8-str_const*8 + | // Returns Table *. + } + | li TMP0, LJ_TTAB + | ld BASE, L->base + | ins_next1 + | daddu RA, BASE, RA + | settp CRET1, TMP0 + | sd CRET1, 0(RA) + | ins_next2 + |5: + | load_got lj_gc_step_fixtop + | move MULTRES, RD + | call_intern lj_gc_step_fixtop // (lua_State *L) + |. move CARG1, L + | b <1 + |. move RD, MULTRES + break; + + case BC_GGET: + | // RA = dst*8, RD = str_const*8 (~) + case BC_GSET: + | // RA = src*8, RD = str_const*8 (~) + | ld LFUNC:TMP2, FRAME_FUNC(BASE) + | dsubu TMP1, KBASE, RD + | ld STR:RC, -8(TMP1) // KBASE-8-str_const*8 + | cleartp LFUNC:TMP2 + | ld TAB:RB, LFUNC:TMP2->env + if (op == BC_GGET) { + | b ->BC_TGETS_Z + } else { + | b ->BC_TSETS_Z + } + |. 
daddu RA, BASE, RA + break; + + case BC_TGETV: + | // RA = dst*8, RB = table*8, RC = key*8 + | decode_RB8a RB, INS + | decode_RB8b RB + | decode_RDtoRC8 RC, RD + | daddu CARG2, BASE, RB + | daddu CARG3, BASE, RC + | ld TAB:RB, 0(CARG2) + | ld TMP2, 0(CARG3) + | daddu RA, BASE, RA + | checktab TAB:RB, ->vmeta_tgetv + | gettp TMP3, TMP2 + | bne TMP3, TISNUM, >5 // Integer key? + |. lw TMP0, TAB:RB->asize + | sextw TMP2, TMP2 + | ld TMP1, TAB:RB->array + | sltu AT, TMP2, TMP0 + | sll TMP2, TMP2, 3 + | beqz AT, ->vmeta_tgetv // Integer key and in array part? + |. daddu TMP2, TMP1, TMP2 + | ld AT, 0(TMP2) + | beq AT, TISNIL, >2 + |. ld CRET1, 0(TMP2) + |1: + | ins_next1 + | sd CRET1, 0(RA) + | ins_next2 + | + |2: // Check for __index if table value is nil. + | ld TAB:TMP2, TAB:RB->metatable + | beqz TAB:TMP2, <1 // No metatable: done. + |. nop + | lbu TMP0, TAB:TMP2->nomm + | andi TMP0, TMP0, 1<<MM_index + | bnez TMP0, <1 // 'no __index' flag set: done. + |. nop + | b ->vmeta_tgetv + |. nop + | + |5: + | li AT, LJ_TSTR + | bne TMP3, AT, ->vmeta_tgetv + |. cleartp RC, TMP2 + | b ->BC_TGETS_Z // String key? + |. nop + break; + case BC_TGETS: + | // RA = dst*8, RB = table*8, RC = str_const*8 (~) + | decode_RB8a RB, INS + | decode_RB8b RB + | decode_RC8a RC, INS + | daddu CARG2, BASE, RB + | decode_RC8b RC + | ld TAB:RB, 0(CARG2) + | dsubu CARG3, KBASE, RC + | daddu RA, BASE, RA + | ld STR:RC, -8(CARG3) // KBASE-8-str_const*8 + | checktab TAB:RB, ->vmeta_tgets1 + |->BC_TGETS_Z: + | // TAB:RB = GCtab *, STR:RC = GCstr *, RA = dst*8 + | lw TMP0, TAB:RB->hmask + | lw TMP1, STR:RC->sid + | ld NODE:TMP2, TAB:RB->node + | and TMP1, TMP1, TMP0 // idx = str->sid & tab->hmask + | sll TMP0, TMP1, 5 + | sll TMP1, TMP1, 3 + | subu TMP1, TMP0, TMP1 + | li TMP3, LJ_TSTR + | daddu NODE:TMP2, NODE:TMP2, TMP1 // node = tab->node + (idx*32-idx*8) + | settp STR:RC, TMP3 // Tagged key to look for. + |1: + | ld CARG1, NODE:TMP2->key + | ld CRET1, NODE:TMP2->val + | ld NODE:TMP1, NODE:TMP2->next + | bne CARG1, RC, >4 + |.
ld TAB:TMP3, TAB:RB->metatable + | beq CRET1, TISNIL, >5 // Key found, but nil value? + |. nop + |3: + | ins_next1 + | sd CRET1, 0(RA) + | ins_next2 + | + |4: // Follow hash chain. + | bnez NODE:TMP1, <1 + |. move NODE:TMP2, NODE:TMP1 + | // End of hash chain: key not found, nil result. + | + |5: // Check for __index if table value is nil. + | beqz TAB:TMP3, <3 // No metatable: done. + |. move CRET1, TISNIL + | lbu TMP0, TAB:TMP3->nomm + | andi TMP0, TMP0, 1<<MM_index + | bnez TMP0, <3 // 'no __index' flag set: done. + |. nop + | b ->vmeta_tgets + |. nop + break; + case BC_TGETB: + | // RA = dst*8, RB = table*8, RC = index*8 + | decode_RB8a RB, INS + | decode_RB8b RB + | daddu CARG2, BASE, RB + | decode_RDtoRC8 RC, RD + | ld TAB:RB, 0(CARG2) + | daddu RA, BASE, RA + | srl TMP0, RC, 3 + | checktab TAB:RB, ->vmeta_tgetb + | lw TMP1, TAB:RB->asize + | ld TMP2, TAB:RB->array + | sltu AT, TMP0, TMP1 + | beqz AT, ->vmeta_tgetb + |. daddu RC, TMP2, RC + | ld AT, 0(RC) + | beq AT, TISNIL, >5 + |. ld CRET1, 0(RC) + |1: + | ins_next1 + | sd CRET1, 0(RA) + | ins_next2 + | + |5: // Check for __index if table value is nil. + | ld TAB:TMP2, TAB:RB->metatable + | beqz TAB:TMP2, <1 // No metatable: done. + |. nop + | lbu TMP1, TAB:TMP2->nomm + | andi TMP1, TMP1, 1<<MM_index + | bnez TMP1, <1 // 'no __index' flag set: done. + |. nop + | b ->vmeta_tgetb // Caveat: preserve TMP0 and CARG2! + |. nop + break; + case BC_TGETR: + | // RA = dst*8, RB = table*8, RC = key*8 + | decode_RB8a RB, INS + | decode_RB8b RB + | decode_RDtoRC8 RC, RD + | daddu RB, BASE, RB + | daddu RC, BASE, RC + | ld TAB:CARG1, 0(RB) + | lw CARG2, LO(RC) + | daddu RA, BASE, RA + | cleartp TAB:CARG1 + | lw TMP0, TAB:CARG1->asize + | ld TMP1, TAB:CARG1->array + | sltu AT, CARG2, TMP0 + | sll TMP2, CARG2, 3 + | beqz AT, ->vmeta_tgetr // In array part? + |.
daddu CRET1, TMP1, TMP2 + | ld CARG2, 0(CRET1) + |->BC_TGETR_Z: + | ins_next1 + | sd CARG2, 0(RA) + | ins_next2 + break; + + case BC_TSETV: + | // RA = src*8, RB = table*8, RC = key*8 + | decode_RB8a RB, INS + | decode_RB8b RB + | decode_RDtoRC8 RC, RD + | daddu CARG2, BASE, RB + | daddu CARG3, BASE, RC + | ld RB, 0(CARG2) + | ld TMP2, 0(CARG3) + | daddu RA, BASE, RA + | checktab RB, ->vmeta_tsetv + | checkint TMP2, >5 + |. sextw RC, TMP2 + | lw TMP0, TAB:RB->asize + | ld TMP1, TAB:RB->array + | sltu AT, RC, TMP0 + | sll TMP2, RC, 3 + | beqz AT, ->vmeta_tsetv // Integer key and in array part? + |. daddu TMP1, TMP1, TMP2 + | ld TMP0, 0(TMP1) + | lbu TMP3, TAB:RB->marked + | beq TMP0, TISNIL, >3 + |. ld CRET1, 0(RA) + |1: + | andi AT, TMP3, LJ_GC_BLACK // isblack(table) + | bnez AT, >7 + |. sd CRET1, 0(TMP1) + |2: + | ins_next + | + |3: // Check for __newindex if previous value is nil. + | ld TAB:TMP2, TAB:RB->metatable + | beqz TAB:TMP2, <1 // No metatable: done. + |. nop + | lbu TMP2, TAB:TMP2->nomm + | andi TMP2, TMP2, 1<<MM_newindex + | bnez TMP2, <1 // 'no __newindex' flag set: done. + |. nop + | b ->vmeta_tsetv + |. nop + | + |5: + | gettp AT, TMP2 + | daddiu AT, AT, -LJ_TSTR + | bnez AT, ->vmeta_tsetv + |. nop + | b ->BC_TSETS_Z // String key? + |. cleartp STR:RC, TMP2 + | + |7: // Possible table write barrier for the value. Skip valiswhite check. + | barrierback TAB:RB, TMP3, TMP0, <2 + break; + case BC_TSETS: + | // RA = src*8, RB = table*8, RC = str_const*8 (~) + | decode_RB8a RB, INS + | decode_RB8b RB + | daddu CARG2, BASE, RB + | decode_RC8a RC, INS + | ld TAB:RB, 0(CARG2) + | decode_RC8b RC + | dsubu CARG3, KBASE, RC + | ld RC, -8(CARG3) // KBASE-8-str_const*8 + | daddu RA, BASE, RA + | cleartp STR:RC + | checktab TAB:RB, ->vmeta_tsets1 + |->BC_TSETS_Z: + | // TAB:RB = GCtab *, STR:RC = GCstr *, RA = BASE+src*8 + | lw TMP0, TAB:RB->hmask + | lw TMP1, STR:RC->sid + | ld NODE:TMP2, TAB:RB->node + | sb r0, TAB:RB->nomm // Clear metamethod cache.
+ | and TMP1, TMP1, TMP0 // idx = str->sid & tab->hmask + | sll TMP0, TMP1, 5 + | sll TMP1, TMP1, 3 + | subu TMP1, TMP0, TMP1 + | li TMP3, LJ_TSTR + | daddu NODE:TMP2, NODE:TMP2, TMP1 // node = tab->node + (idx*32-idx*8) + | settp STR:RC, TMP3 // Tagged key to look for. + |.if FPU + | ldc1 FTMP0, 0(RA) + |.else + | ld CRET1, 0(RA) + |.endif + |1: + | ld TMP0, NODE:TMP2->key + | ld CARG2, NODE:TMP2->val + | ld NODE:TMP1, NODE:TMP2->next + | bne TMP0, RC, >5 + |. lbu TMP3, TAB:RB->marked + | beq CARG2, TISNIL, >4 // Key found, but nil value? + |. ld TAB:TMP0, TAB:RB->metatable + |2: + | andi AT, TMP3, LJ_GC_BLACK // isblack(table) + | bnez AT, >7 + |.if FPU + |. sdc1 FTMP0, NODE:TMP2->val + |.else + |. sd CRET1, NODE:TMP2->val + |.endif + |3: + | ins_next + | + |4: // Check for __newindex if previous value is nil. + | beqz TAB:TMP0, <2 // No metatable: done. + |. nop + | lbu TMP0, TAB:TMP0->nomm + | andi TMP0, TMP0, 1<<MM_newindex + | bnez TMP0, <2 // 'no __newindex' flag set: done. + |. nop + | b ->vmeta_tsets + |. nop + | + |5: // Follow hash chain. + | bnez NODE:TMP1, <1 + |. move NODE:TMP2, NODE:TMP1 + | // End of hash chain: key not found, add a new one + | + | // But check for __newindex first. + | ld TAB:TMP2, TAB:RB->metatable + | beqz TAB:TMP2, >6 // No metatable: continue. + |. daddiu CARG3, DISPATCH, DISPATCH_GL(tmptv) + | lbu TMP0, TAB:TMP2->nomm + | andi TMP0, TMP0, 1<<MM_newindex + | beqz TMP0, ->vmeta_tsets // 'no __newindex' flag NOT set: check. + |6: + | load_got lj_tab_newkey + | sd RC, 0(CARG3) + | sd BASE, L->base + | move CARG2, TAB:RB + | sd PC, SAVE_PC + | call_intern lj_tab_newkey // (lua_State *L, GCtab *t, TValue *k) + |. move CARG1, L + | // Returns TValue *. + | ld BASE, L->base + |.if FPU + | b <3 // No 2nd write barrier needed. + |. sdc1 FTMP0, 0(CRET1) + |.else + | ld CARG1, 0(RA) + | b <3 // No 2nd write barrier needed. + |. sd CARG1, 0(CRET1) + |.endif + | + |7: // Possible table write barrier for the value. Skip valiswhite check.
+ | barrierback TAB:RB, TMP3, TMP0, <3 + break; + case BC_TSETB: + | // RA = src*8, RB = table*8, RC = index*8 + | decode_RB8a RB, INS + | decode_RB8b RB + | daddu CARG2, BASE, RB + | decode_RDtoRC8 RC, RD + | ld TAB:RB, 0(CARG2) + | daddu RA, BASE, RA + | srl TMP0, RC, 3 + | checktab RB, ->vmeta_tsetb + | lw TMP1, TAB:RB->asize + | ld TMP2, TAB:RB->array + | sltu AT, TMP0, TMP1 + | beqz AT, ->vmeta_tsetb + |. daddu RC, TMP2, RC + | ld TMP1, 0(RC) + | lbu TMP3, TAB:RB->marked + | beq TMP1, TISNIL, >5 + |1: + |. ld CRET1, 0(RA) + | andi AT, TMP3, LJ_GC_BLACK // isblack(table) + | bnez AT, >7 + |. sd CRET1, 0(RC) + |2: + | ins_next + | + |5: // Check for __newindex if previous value is nil. + | ld TAB:TMP2, TAB:RB->metatable + | beqz TAB:TMP2, <1 // No metatable: done. + |. nop + | lbu TMP1, TAB:TMP2->nomm + | andi TMP1, TMP1, 1<<MM_newindex + | bnez TMP1, <1 // 'no __newindex' flag set: done. + |. nop + | b ->vmeta_tsetb // Caveat: preserve TMP0 and CARG2! + |. nop + | + |7: // Possible table write barrier for the value. Skip valiswhite check. + | barrierback TAB:RB, TMP3, TMP0, <2 + break; + case BC_TSETR: + | // RA = dst*8, RB = table*8, RC = key*8 + | decode_RB8a RB, INS + | decode_RB8b RB + | decode_RDtoRC8 RC, RD + | daddu CARG1, BASE, RB + | daddu CARG3, BASE, RC + | ld TAB:CARG2, 0(CARG1) + | lw CARG3, LO(CARG3) + | cleartp TAB:CARG2 + | lbu TMP3, TAB:CARG2->marked + | lw TMP0, TAB:CARG2->asize + | ld TMP1, TAB:CARG2->array + | andi AT, TMP3, LJ_GC_BLACK // isblack(table) + | bnez AT, >7 + |. daddu RA, BASE, RA + |2: + | sltu AT, CARG3, TMP0 + | sll TMP2, CARG3, 3 + | beqz AT, ->vmeta_tsetr // In array part? + |. daddu CRET1, TMP1, TMP2 + |->BC_TSETR_Z: + | ld CARG1, 0(RA) + | ins_next1 + | sd CARG1, 0(CRET1) + | ins_next2 + | + |7: // Possible table write barrier for the value. Skip valiswhite check.
+ | barrierback TAB:CARG2, TMP3, CRET1, <2 + break; + + case BC_TSETM: + | // RA = base*8 (table at base-1), RD = num_const*8 (start index) + | daddu RA, BASE, RA + |1: + | daddu TMP3, KBASE, RD + | ld TAB:CARG2, -8(RA) // Guaranteed to be a table. + | addiu TMP0, MULTRES, -8 + | lw TMP3, LO(TMP3) // Integer constant is in lo-word. + | beqz TMP0, >4 // Nothing to copy? + |. srl CARG3, TMP0, 3 + | cleartp CARG2 + | addu CARG3, CARG3, TMP3 + | lw TMP2, TAB:CARG2->asize + | sll TMP1, TMP3, 3 + | lbu TMP3, TAB:CARG2->marked + | ld CARG1, TAB:CARG2->array + | sltu AT, TMP2, CARG3 + | bnez AT, >5 + |. daddu TMP2, RA, TMP0 + | daddu TMP1, TMP1, CARG1 + | andi TMP0, TMP3, LJ_GC_BLACK // isblack(table) + |3: // Copy result slots to table. + | ld CRET1, 0(RA) + | daddiu RA, RA, 8 + | sltu AT, RA, TMP2 + | sd CRET1, 0(TMP1) + | bnez AT, <3 + |. daddiu TMP1, TMP1, 8 + | bnez TMP0, >7 + |. nop + |4: + | ins_next + | + |5: // Need to resize array part. + | load_got lj_tab_reasize + | sd BASE, L->base + | sd PC, SAVE_PC + | move BASE, RD + | call_intern lj_tab_reasize // (lua_State *L, GCtab *t, int nasize) + |. move CARG1, L + | // Must not reallocate the stack. + | move RD, BASE + | b <1 + |. ld BASE, L->base // Reload BASE for lack of a saved register. + | + |7: // Possible table write barrier for any value. Skip valiswhite check. + | barrierback TAB:CARG2, TMP3, TMP0, <4 + break; + + /* -- Calls and vararg handling ----------------------------------------- */ + + case BC_CALLM: + | // RA = base*8, (RB = (nresults+1)*8,) RC = extra_nargs*8 + | decode_RDtoRC8 NARGS8:RC, RD + | b ->BC_CALL_Z + |. 
addu NARGS8:RC, NARGS8:RC, MULTRES + break; + case BC_CALL: + | // RA = base*8, (RB = (nresults+1)*8,) RC = (nargs+1)*8 + | decode_RDtoRC8 NARGS8:RC, RD + |->BC_CALL_Z: + | move TMP2, BASE + | daddu BASE, BASE, RA + | ld LFUNC:RB, 0(BASE) + | daddiu BASE, BASE, 16 + | addiu NARGS8:RC, NARGS8:RC, -8 + | checkfunc RB, ->vmeta_call + | ins_call + break; + + case BC_CALLMT: + | // RA = base*8, (RB = 0,) RC = extra_nargs*8 + | addu NARGS8:RD, NARGS8:RD, MULTRES // BC_CALLT gets RC from RD. + | // Fall through. Assumes BC_CALLT follows. + break; + case BC_CALLT: + | // RA = base*8, (RB = 0,) RC = (nargs+1)*8 + | daddu RA, BASE, RA + | ld RB, 0(RA) + | move NARGS8:RC, RD + | ld TMP1, FRAME_PC(BASE) + | daddiu RA, RA, 16 + | addiu NARGS8:RC, NARGS8:RC, -8 + | checktp CARG3, RB, -LJ_TFUNC, ->vmeta_callt + |->BC_CALLT_Z: + | andi TMP0, TMP1, FRAME_TYPE // Caveat: preserve TMP0 until the 'or'. + | lbu TMP3, LFUNC:CARG3->ffid + | bnez TMP0, >7 + |. xori TMP2, TMP1, FRAME_VARG + |1: + | sd RB, FRAME_FUNC(BASE) // Copy function down, but keep PC. + | sltiu AT, TMP3, 2 // (> FF_C) Calling a fast function? + | move TMP2, BASE + | move RB, CARG3 + | beqz NARGS8:RC, >3 + |. move TMP3, NARGS8:RC + |2: + | ld CRET1, 0(RA) + | daddiu RA, RA, 8 + | addiu TMP3, TMP3, -8 + | sd CRET1, 0(TMP2) + | bnez TMP3, <2 + |. daddiu TMP2, TMP2, 8 + |3: + | or TMP0, TMP0, AT + | beqz TMP0, >5 + |. nop + |4: + | ins_callt + | + |5: // Tailcall to a fast function with a Lua frame below. + | lw INS, -4(TMP1) + | decode_RA8a RA, INS + | decode_RA8b RA + | dsubu TMP1, BASE, RA + | ld TMP1, -32(TMP1) + | cleartp LFUNC:TMP1 + | ld TMP1, LFUNC:TMP1->pc + | b <4 + |. ld KBASE, PC2PROTO(k)(TMP1) // Need to prepare KBASE. + | + |7: // Tailcall from a vararg function. + | andi AT, TMP2, FRAME_TYPEP + | bnez AT, <1 // Vararg frame below? + |. dsubu TMP2, BASE, TMP2 // Relocate BASE down. + | move BASE, TMP2 + | ld TMP1, FRAME_PC(TMP2) + | b <1 + |. 
andi TMP0, TMP1, FRAME_TYPE + break; + + case BC_ITERC: + | // RA = base*8, (RB = (nresults+1)*8, RC = (nargs+1)*8 ((2+1)*8)) + | move TMP2, BASE // Save old BASE for vmeta_call. + | daddu BASE, BASE, RA + | ld RB, -24(BASE) + | ld CARG1, -16(BASE) + | ld CARG2, -8(BASE) + | li NARGS8:RC, 16 // Iterators get 2 arguments. + | sd RB, 0(BASE) // Copy callable. + | sd CARG1, 16(BASE) // Copy state. + | sd CARG2, 24(BASE) // Copy control var. + | daddiu BASE, BASE, 16 + | checkfunc RB, ->vmeta_call + | ins_call + break; + + case BC_ITERN: + |.if JIT and ENDIAN_LE + | hotloop + |.endif + |->vm_IITERN: + | // RA = base*8, (RB = (nresults+1)*8, RC = (nargs+1)*8 (2+1)*8) + | daddu RA, BASE, RA + | ld TAB:RB, -16(RA) + | lw RC, -8+LO(RA) // Get index from control var. + | cleartp TAB:RB + | daddiu PC, PC, 4 + | lw TMP0, TAB:RB->asize + | ld TMP1, TAB:RB->array + | dsll CARG3, TISNUM, 47 + |1: // Traverse array part. + | sltu AT, RC, TMP0 + | beqz AT, >5 // Index points after array part? + |. sll TMP3, RC, 3 + | daddu TMP3, TMP1, TMP3 + | ld CARG1, 0(TMP3) + | lhu RD, -4+OFS_RD(PC) + | or TMP2, RC, CARG3 + | beq CARG1, TISNIL, <1 // Skip holes in array part. + |. addiu RC, RC, 1 + | sd TMP2, 0(RA) + | sd CARG1, 8(RA) + | lui TMP3, (-(BCBIAS_J*4 >> 16) & 65535) + | decode_RD4b RD + | daddu RD, RD, TMP3 + | sw RC, -8+LO(RA) // Update control var. + | daddu PC, PC, RD + |3: + | ins_next + | + |5: // Traverse hash part. + | lw TMP1, TAB:RB->hmask + | subu RC, RC, TMP0 + | ld TMP2, TAB:RB->node + |6: + | sltu AT, TMP1, RC // End of iteration? Branch to ITERL+1. + | bnez AT, <3 + |. sll TMP3, RC, 5 + | sll RB, RC, 3 + | subu TMP3, TMP3, RB + | daddu NODE:TMP3, TMP3, TMP2 + | ld CARG1, 0(NODE:TMP3) + | lhu RD, -4+OFS_RD(PC) + | beq CARG1, TISNIL, <6 // Skip holes in hash part. + |. 
addiu RC, RC, 1 + | ld CARG2, NODE:TMP3->key + | lui TMP3, (-(BCBIAS_J*4 >> 16) & 65535) + | sd CARG1, 8(RA) + | addu RC, RC, TMP0 + | decode_RD4b RD + | addu RD, RD, TMP3 + | sd CARG2, 0(RA) + | daddu PC, PC, RD + | b <3 + |. sw RC, -8+LO(RA) // Update control var. + break; + + case BC_ISNEXT: + | // RA = base*8, RD = target (points to ITERN) + | daddu RA, BASE, RA + | srl TMP0, RD, 1 + | ld CFUNC:CARG1, -24(RA) + | daddu TMP0, PC, TMP0 + | ld CARG2, -16(RA) + | ld CARG3, -8(RA) + | lui TMP2, (-(BCBIAS_J*4 >> 16) & 65535) + | checkfunc CFUNC:CARG1, >5 + | gettp CARG2, CARG2 + | daddiu CARG2, CARG2, -LJ_TTAB + | lbu TMP1, CFUNC:CARG1->ffid + | daddiu CARG3, CARG3, -LJ_TNIL + | or AT, CARG2, CARG3 + | daddiu TMP1, TMP1, -FF_next_N + | or AT, AT, TMP1 + | bnez AT, >5 + |. lui TMP1, (LJ_KEYINDEX >> 16) + | daddu PC, TMP0, TMP2 + | ori TMP1, TMP1, (LJ_KEYINDEX & 0xffff) + | dsll TMP1, TMP1, 32 + | sd TMP1, -8(RA) + |1: + | ins_next + |5: // Despecialize bytecode if any of the checks fail. + | li TMP3, BC_JMP + | li TMP1, BC_ITERC + | sb TMP3, -4+OFS_OP(PC) + | daddu PC, TMP0, TMP2 + |.if JIT + | lb TMP0, OFS_OP(PC) + | li AT, BC_ITERN + | bne TMP0, AT, >6 + |. lhu TMP2, OFS_RD(PC) + |.endif + | b <1 + |. sb TMP1, OFS_OP(PC) + |.if JIT + |6: // Unpatch JLOOP. + | ld TMP0, DISPATCH_J(trace)(DISPATCH) + | sll TMP2, TMP2, 3 + | daddu TMP0, TMP0, TMP2 + | ld TRACE:TMP2, 0(TMP0) + | lw TMP0, TRACE:TMP2->startins + | li AT, -256 + | and TMP0, TMP0, AT + | or TMP0, TMP0, TMP1 + | b <1 + |. sw TMP0, 0(PC) + |.endif + break; + + case BC_VARG: + | // RA = base*8, RB = (nresults+1)*8, RC = numparams*8 + | ld TMP0, FRAME_PC(BASE) + | decode_RDtoRC8 RC, RD + | decode_RB8a RB, INS + | daddu RC, BASE, RC + | decode_RB8b RB + | daddu RA, BASE, RA + | daddiu RC, RC, FRAME_VARG + | daddu TMP2, RA, RB + | daddiu TMP3, BASE, -16 // TMP3 = vtop + | dsubu RC, RC, TMP0 // RC = vbase + | // Note: RC may now be even _above_ BASE if nargs was < numparams. + | beqz RB, >5 // Copy all varargs? 
+ |. dsubu TMP1, TMP3, RC + | daddiu TMP2, TMP2, -16 + |1: // Copy vararg slots to destination slots. + | ld CARG1, 0(RC) + | sltu AT, RC, TMP3 + | daddiu RC, RC, 8 + |.if MIPSR6 + | selnez CARG1, CARG1, AT + | seleqz AT, TISNIL, AT + | or CARG1, CARG1, AT + |.else + | movz CARG1, TISNIL, AT + |.endif + | sd CARG1, 0(RA) + | sltu AT, RA, TMP2 + | bnez AT, <1 + |. daddiu RA, RA, 8 + |3: + | ins_next + | + |5: // Copy all varargs. + | ld TMP0, L->maxstack + | blez TMP1, <3 // No vararg slots? + |. li MULTRES, 8 // MULTRES = (0+1)*8 + | daddu TMP2, RA, TMP1 + | sltu AT, TMP0, TMP2 + | bnez AT, >7 + |. daddiu MULTRES, TMP1, 8 + |6: + | ld CRET1, 0(RC) + | daddiu RC, RC, 8 + | sd CRET1, 0(RA) + | sltu AT, RC, TMP3 + | bnez AT, <6 // More vararg slots? + |. daddiu RA, RA, 8 + | b <3 + |. nop + | + |7: // Grow stack for varargs. + | load_got lj_state_growstack + | sd RA, L->top + | dsubu RA, RA, BASE + | sd BASE, L->base + | dsubu BASE, RC, BASE // Need delta, because BASE may change. + | sd PC, SAVE_PC + | srl CARG2, TMP1, 3 + | call_intern lj_state_growstack // (lua_State *L, int n) + |. move CARG1, L + | move RC, BASE + | ld BASE, L->base + | daddu RA, BASE, RA + | daddu RC, BASE, RC + | b <6 + |. daddiu TMP3, BASE, -16 + break; + + /* -- Returns ----------------------------------------------------------- */ + + case BC_RETM: + | // RA = results*8, RD = extra_nresults*8 + | addu RD, RD, MULTRES // MULTRES >= 8, so RD >= 8. + | // Fall through. Assumes BC_RET follows. + break; + + case BC_RET: + | // RA = results*8, RD = (nresults+1)*8 + | ld PC, FRAME_PC(BASE) + | daddu RA, BASE, RA + | move MULTRES, RD + |1: + | andi TMP0, PC, FRAME_TYPE + | bnez TMP0, ->BC_RETV_Z + |. 
xori TMP1, PC, FRAME_VARG + | + |->BC_RET_Z: + | // BASE = base, RA = resultptr, RD = (nresults+1)*8, PC = return + | lw INS, -4(PC) + | daddiu TMP2, BASE, -16 + | daddiu RC, RD, -8 + | decode_RA8a TMP0, INS + | decode_RB8a RB, INS + | decode_RA8b TMP0 + | decode_RB8b RB + | daddu TMP3, TMP2, RB + | beqz RC, >3 + |. dsubu BASE, TMP2, TMP0 + |2: + | ld CRET1, 0(RA) + | daddiu RA, RA, 8 + | daddiu RC, RC, -8 + | sd CRET1, 0(TMP2) + | bnez RC, <2 + |. daddiu TMP2, TMP2, 8 + |3: + | daddiu TMP3, TMP3, -8 + |5: + | sltu AT, TMP2, TMP3 + | bnez AT, >6 + |. ld LFUNC:TMP1, FRAME_FUNC(BASE) + | ins_next1 + | cleartp LFUNC:TMP1 + | ld TMP1, LFUNC:TMP1->pc + | ld KBASE, PC2PROTO(k)(TMP1) + | ins_next2 + | + |6: // Fill up results with nil. + | sd TISNIL, 0(TMP2) + | b <5 + |. daddiu TMP2, TMP2, 8 + | + |->BC_RETV_Z: // Non-standard return case. + | andi TMP2, TMP1, FRAME_TYPEP + | bnez TMP2, ->vm_return + |. nop + | // Return from vararg function: relocate BASE down. + | dsubu BASE, BASE, TMP1 + | b <1 + |. ld PC, FRAME_PC(BASE) + break; + + case BC_RET0: case BC_RET1: + | // RA = results*8, RD = (nresults+1)*8 + | ld PC, FRAME_PC(BASE) + | daddu RA, BASE, RA + | move MULTRES, RD + | andi TMP0, PC, FRAME_TYPE + | bnez TMP0, ->BC_RETV_Z + |. xori TMP1, PC, FRAME_VARG + | lw INS, -4(PC) + | daddiu TMP2, BASE, -16 + if (op == BC_RET1) { + | ld CRET1, 0(RA) + } + | decode_RB8a RB, INS + | decode_RA8a RA, INS + | decode_RB8b RB + | decode_RA8b RA + | dsubu BASE, TMP2, RA + if (op == BC_RET1) { + | sd CRET1, 0(TMP2) + } + |5: + | sltu AT, RD, RB + | bnez AT, >6 + |. ld TMP1, FRAME_FUNC(BASE) + | ins_next1 + | cleartp LFUNC:TMP1 + | ld TMP1, LFUNC:TMP1->pc + | ld KBASE, PC2PROTO(k)(TMP1) + | ins_next2 + | + |6: // Fill up results with nil. + | daddiu TMP2, TMP2, 8 + | daddiu RD, RD, 8 + | b <5 + if (op == BC_RET1) { + |. sd TISNIL, 0(TMP2) + } else { + |. 
sd TISNIL, -8(TMP2) + } + break; + + /* -- Loops and branches ------------------------------------------------ */ + + case BC_FORL: + |.if JIT + | hotloop + |.endif + | // Fall through. Assumes BC_IFORL follows. + break; + + case BC_JFORI: + case BC_JFORL: +#if !LJ_HASJIT + break; +#endif + case BC_FORI: + case BC_IFORL: + | // RA = base*8, RD = target (after end of loop or start of loop) + vk = (op == BC_IFORL || op == BC_JFORL); + | daddu RA, BASE, RA + | ld CARG1, FORL_IDX*8(RA) // IDX CARG1 - CARG3 type + | gettp CARG3, CARG1 + if (op != BC_JFORL) { + | srl RD, RD, 1 + | lui TMP2, (-(BCBIAS_J*4 >> 16) & 65535) + | daddu TMP2, RD, TMP2 + } + if (!vk) { + | ld CARG2, FORL_STOP*8(RA) // STOP CARG2 - CARG4 type + | ld CRET1, FORL_STEP*8(RA) // STEP CRET1 - CRET2 type + | gettp CARG4, CARG2 + | bne CARG3, TISNUM, >5 + |. gettp CRET2, CRET1 + | bne CARG4, TISNUM, ->vmeta_for + |. sextw CARG3, CARG1 + | bne CRET2, TISNUM, ->vmeta_for + |. sextw CARG2, CARG2 + | dext AT, CRET1, 31, 0 + | slt CRET1, CARG2, CARG3 + | slt TMP1, CARG3, CARG2 + |.if MIPSR6 + | selnez TMP1, TMP1, AT + | seleqz CRET1, CRET1, AT + | or CRET1, CRET1, TMP1 + |.else + | movn CRET1, TMP1, AT + |.endif + } else { + | bne CARG3, TISNUM, >5 + |. ld CARG2, FORL_STEP*8(RA) // STEP CARG2 - CARG4 type + | ld CRET1, FORL_STOP*8(RA) // STOP CRET1 - CRET2 type + | sextw TMP3, CARG1 + | sextw CARG2, CARG2 + | sextw CRET1, CRET1 + | addu CARG1, TMP3, CARG2 + | xor TMP0, CARG1, TMP3 + | xor TMP1, CARG1, CARG2 + | and TMP0, TMP0, TMP1 + | slt TMP1, CARG1, CRET1 + | slt CRET1, CRET1, CARG1 + | slt AT, CARG2, r0 + | slt TMP0, TMP0, r0 // ((y^a) & (y^b)) < 0: overflow. 
+ |.if MIPSR6 + | selnez TMP1, TMP1, AT + | seleqz CRET1, CRET1, AT + | or CRET1, CRET1, TMP1 + |.else + | movn CRET1, TMP1, AT + |.endif + | or CRET1, CRET1, TMP0 + | zextw CARG1, CARG1 + | settp CARG1, TISNUM + } + |1: + if (op == BC_FORI) { + |.if MIPSR6 + | selnez TMP2, TMP2, CRET1 + |.else + | movz TMP2, r0, CRET1 + |.endif + | daddu PC, PC, TMP2 + } else if (op == BC_JFORI) { + | daddu PC, PC, TMP2 + | lhu RD, -4+OFS_RD(PC) + } else if (op == BC_IFORL) { + |.if MIPSR6 + | seleqz TMP2, TMP2, CRET1 + |.else + | movn TMP2, r0, CRET1 + |.endif + | daddu PC, PC, TMP2 + } + if (vk) { + | sd CARG1, FORL_IDX*8(RA) + } + | ins_next1 + | sd CARG1, FORL_EXT*8(RA) + |2: + if (op == BC_JFORI) { + | beqz CRET1, =>BC_JLOOP + |. decode_RD8b RD + } else if (op == BC_JFORL) { + | beqz CRET1, =>BC_JLOOP + } + | ins_next2 + | + |5: // FP loop. + |.if FPU + if (!vk) { + | ldc1 f0, FORL_IDX*8(RA) + | ldc1 f2, FORL_STOP*8(RA) + | sltiu TMP0, CARG3, LJ_TISNUM + | sltiu TMP1, CARG4, LJ_TISNUM + | sltiu AT, CRET2, LJ_TISNUM + | ld TMP3, FORL_STEP*8(RA) + | and TMP0, TMP0, TMP1 + | and AT, AT, TMP0 + | beqz AT, ->vmeta_for + |. slt TMP3, TMP3, r0 + |.if MIPSR6 + | dmtc1 TMP3, FTMP2 + | cmp.lt.d FTMP0, f0, f2 + | cmp.lt.d FTMP1, f2, f0 + | sel.d FTMP2, FTMP1, FTMP0 + | b <1 + |. dmfc1 CRET1, FTMP2 + |.else + | c.ole.d 0, f0, f2 + | c.ole.d 1, f2, f0 + | li CRET1, 1 + | movt CRET1, r0, 0 + | movt AT, r0, 1 + | b <1 + |. 
movn CRET1, AT, TMP3 + |.endif + } else { + | ldc1 f0, FORL_IDX*8(RA) + | ldc1 f4, FORL_STEP*8(RA) + | ldc1 f2, FORL_STOP*8(RA) + | ld TMP3, FORL_STEP*8(RA) + | add.d f0, f0, f4 + |.if MIPSR6 + | slt TMP3, TMP3, r0 + | dmtc1 TMP3, FTMP2 + | cmp.lt.d FTMP0, f0, f2 + | cmp.lt.d FTMP1, f2, f0 + | sel.d FTMP2, FTMP1, FTMP0 + | dmfc1 CRET1, FTMP2 + if (op == BC_IFORL) { + | seleqz TMP2, TMP2, CRET1 + | daddu PC, PC, TMP2 + } + |.else + | c.ole.d 0, f0, f2 + | c.ole.d 1, f2, f0 + | slt TMP3, TMP3, r0 + | li CRET1, 1 + | li AT, 1 + | movt CRET1, r0, 0 + | movt AT, r0, 1 + | movn CRET1, AT, TMP3 + if (op == BC_IFORL) { + | movn TMP2, r0, CRET1 + | daddu PC, PC, TMP2 + } + |.endif + | sdc1 f0, FORL_IDX*8(RA) + | ins_next1 + | b <2 + |. sdc1 f0, FORL_EXT*8(RA) + } + |.else + if (!vk) { + | sltiu TMP0, CARG3, LJ_TISNUM + | sltiu TMP1, CARG4, LJ_TISNUM + | sltiu AT, CRET2, LJ_TISNUM + | and TMP0, TMP0, TMP1 + | and AT, AT, TMP0 + | beqz AT, ->vmeta_for + |. nop + | bal ->vm_sfcmpolex + |. lw TMP3, FORL_STEP*8+HI(RA) + | b <1 + |. nop + } else { + | load_got __adddf3 + | call_extern + |. sw TMP2, TMPD + | ld CARG2, FORL_STOP*8(RA) + | move CARG1, CRET1 + if ( op == BC_JFORL ) { + | lhu RD, -4+OFS_RD(PC) + | decode_RD8b RD + } + | bal ->vm_sfcmpolex + |. lw TMP3, FORL_STEP*8+HI(RA) + | b <1 + |. lw TMP2, TMPD + } + |.endif + break; + + case BC_ITERL: + |.if JIT + | hotloop + |.endif + | // Fall through. Assumes BC_IITERL follows. + break; + + case BC_JITERL: +#if !LJ_HASJIT + break; +#endif + case BC_IITERL: + | // RA = base*8, RD = target + | daddu RA, BASE, RA + | ld TMP1, 0(RA) + | beq TMP1, TISNIL, >1 // Stop if iterator returned nil. + |. nop + if (op == BC_JITERL) { + | b =>BC_JLOOP + |. sd TMP1, -8(RA) + } else { + | branch_RD // Otherwise save control var + branch. 
+ | sd TMP1, -8(RA) + } + |1: + | ins_next + break; + + case BC_LOOP: + | // RA = base*8, RD = target (loop extent) + | // Note: RA/RD is only used by trace recorder to determine scope/extent + | // This opcode does NOT jump, its only purpose is to detect a hot loop. + |.if JIT + | hotloop + |.endif + | // Fall through. Assumes BC_ILOOP follows. + break; + + case BC_ILOOP: + | // RA = base*8, RD = target (loop extent) + | ins_next + break; + + case BC_JLOOP: + |.if JIT + | // RA = base*8 (ignored), RD = traceno*8 + | ld TMP1, DISPATCH_J(trace)(DISPATCH) + | li AT, 0 + | daddu TMP1, TMP1, RD + | // Traces on MIPS don't store the trace number, so use 0. + | sd AT, DISPATCH_GL(vmstate)(DISPATCH) + | ld TRACE:TMP2, 0(TMP1) + | sd BASE, DISPATCH_GL(jit_base)(DISPATCH) + | ld TMP2, TRACE:TMP2->mcode + | sd L, DISPATCH_GL(tmpbuf.L)(DISPATCH) + | jr TMP2 + |. daddiu JGL, DISPATCH, GG_DISP2G+32768 + |.endif + break; + + case BC_JMP: + | // RA = base*8 (only used by trace recorder), RD = target + | branch_RD + | ins_next + break; + + /* -- Function headers -------------------------------------------------- */ + + case BC_FUNCF: + |.if JIT + | hotcall + |.endif + case BC_FUNCV: /* NYI: compiled vararg functions. */ + | // Fall through. Assumes BC_IFUNCF/BC_IFUNCV follow. + break; + + case BC_JFUNCF: +#if !LJ_HASJIT + break; +#endif + case BC_IFUNCF: + | // BASE = new base, RA = BASE+framesize*8, RB = LFUNC, RC = nargs*8 + | ld TMP2, L->maxstack + | lbu TMP1, -4+PC2PROTO(numparams)(PC) + | ld KBASE, -4+PC2PROTO(k)(PC) + | sltu AT, TMP2, RA + | bnez AT, ->vm_growstack_l + |. sll TMP1, TMP1, 3 + if (op != BC_JFUNCF) { + | ins_next1 + } + |2: + | sltu AT, NARGS8:RC, TMP1 // Check for missing parameters. + | bnez AT, >3 + |. daddu AT, BASE, NARGS8:RC + if (op == BC_JFUNCF) { + | decode_RD8a RD, INS + | b =>BC_JLOOP + |. decode_RD8b RD + } else { + | ins_next2 + } + | + |3: // Clear missing parameters. + | sd TISNIL, 0(AT) + | b <2 + |. 
addiu NARGS8:RC, NARGS8:RC, 8 + break; + + case BC_JFUNCV: +#if !LJ_HASJIT + break; +#endif + | NYI // NYI: compiled vararg functions + break; /* NYI: compiled vararg functions. */ + + case BC_IFUNCV: + | // BASE = new base, RA = BASE+framesize*8, RB = LFUNC, RC = nargs*8 + | li TMP0, LJ_TFUNC + | daddu TMP1, BASE, RC + | ld TMP2, L->maxstack + | settp LFUNC:RB, TMP0 + | daddu TMP0, RA, RC + | sd LFUNC:RB, 0(TMP1) // Store (tagged) copy of LFUNC. + | daddiu TMP3, RC, 16+FRAME_VARG + | sltu AT, TMP0, TMP2 + | ld KBASE, -4+PC2PROTO(k)(PC) + | beqz AT, ->vm_growstack_l + |. sd TMP3, 8(TMP1) // Store delta + FRAME_VARG. + | lbu TMP2, -4+PC2PROTO(numparams)(PC) + | move RA, BASE + | move RC, TMP1 + | ins_next1 + | beqz TMP2, >3 + |. daddiu BASE, TMP1, 16 + |1: + | ld TMP0, 0(RA) + | sltu AT, RA, RC // Less args than parameters? + | move CARG1, TMP0 + |.if MIPSR6 + | selnez TMP0, TMP0, AT + | seleqz TMP3, TISNIL, AT + | or TMP0, TMP0, TMP3 + | seleqz TMP3, CARG1, AT + | selnez CARG1, TISNIL, AT + | or CARG1, CARG1, TMP3 + |.else + | movz TMP0, TISNIL, AT // Clear missing parameters. + | movn CARG1, TISNIL, AT // Clear old fixarg slot (help the GC). + |.endif + | addiu TMP2, TMP2, -1 + | sd TMP0, 16(TMP1) + | daddiu TMP1, TMP1, 8 + | sd CARG1, 0(RA) + | bnez TMP2, <1 + |. daddiu RA, RA, 8 + |3: + | ins_next2 + break; + + case BC_FUNCC: + case BC_FUNCCW: + | // BASE = new base, RA = BASE+framesize*8, RB = CFUNC, RC = nargs*8 + if (op == BC_FUNCC) { + | ld CFUNCADDR, CFUNC:RB->f + } else { + | ld CFUNCADDR, DISPATCH_GL(wrapf)(DISPATCH) + } + | daddu TMP1, RA, NARGS8:RC + | ld TMP2, L->maxstack + | daddu RC, BASE, NARGS8:RC + | sd BASE, L->base + | sltu AT, TMP2, TMP1 + | sd RC, L->top + | li_vmstate C + if (op == BC_FUNCCW) { + | ld CARG2, CFUNC:RB->f + } + | bnez AT, ->vm_growstack_c // Need to grow stack. + |. move CARG1, L + | jalr CFUNCADDR // (lua_State *L [, lua_CFunction f]) + |. st_vmstate + | // Returns nresults. 
+ | ld BASE, L->base + | sll RD, CRET1, 3 + | ld TMP1, L->top + | li_vmstate INTERP + | ld PC, FRAME_PC(BASE) // Fetch PC of caller. + | dsubu RA, TMP1, RD // RA = L->top - nresults*8 + | sd L, DISPATCH_GL(cur_L)(DISPATCH) + | b ->vm_returnc + |. st_vmstate + break; + + /* ---------------------------------------------------------------------- */ + + default: + fprintf(stderr, "Error: undefined opcode BC_%s\n", bc_names[op]); + exit(2); + break; + } +} + +static int build_backend(BuildCtx *ctx) +{ + int op; + + dasm_growpc(Dst, BC__MAX); + + build_subroutines(ctx); + + |.code_op + for (op = 0; op < BC__MAX; op++) + build_ins(ctx, (BCOp)op, op); + + return BC__MAX; +} + +/* Emit pseudo frame-info for all assembler functions. */ +static void emit_asm_debug(BuildCtx *ctx) +{ + int fcofs = (int)((uint8_t *)ctx->glob[GLOB_vm_ffi_call] - ctx->code); + int i; + switch (ctx->mode) { + case BUILD_elfasm: + fprintf(ctx->fp, "\t.section .debug_frame,\"\",@progbits\n"); + fprintf(ctx->fp, + ".Lframe0:\n" + "\t.4byte .LECIE0-.LSCIE0\n" + ".LSCIE0:\n" + "\t.4byte 0xffffffff\n" + "\t.byte 0x1\n" + "\t.string \"\"\n" + "\t.uleb128 0x1\n" + "\t.sleb128 -4\n" + "\t.byte 31\n" + "\t.byte 0xc\n\t.uleb128 29\n\t.uleb128 0\n" + "\t.align 2\n" + ".LECIE0:\n\n"); + fprintf(ctx->fp, + ".LSFDE0:\n" + "\t.4byte .LEFDE0-.LASFDE0\n" + ".LASFDE0:\n" + "\t.4byte .Lframe0\n" + "\t.8byte .Lbegin\n" + "\t.8byte %d\n" + "\t.byte 0xe\n\t.uleb128 %d\n" + "\t.byte 0x9f\n\t.sleb128 2*5\n" + "\t.byte 0x9e\n\t.sleb128 2*6\n", + fcofs, CFRAME_SIZE); + for (i = 23; i >= 16; i--) + fprintf(ctx->fp, "\t.byte %d\n\t.uleb128 %d\n", 0x80+i, 2*(30-i)); +#if !LJ_SOFTFP + for (i = 31; i >= 24; i--) + fprintf(ctx->fp, "\t.byte %d\n\t.uleb128 %d\n", 0x80+32+i, 2*(46-i)); +#endif + fprintf(ctx->fp, + "\t.align 2\n" + ".LEFDE0:\n\n"); +#if LJ_HASFFI + fprintf(ctx->fp, + ".LSFDE1:\n" + "\t.4byte .LEFDE1-.LASFDE1\n" + ".LASFDE1:\n" + "\t.4byte .Lframe0\n" + "\t.4byte lj_vm_ffi_call\n" + "\t.4byte %d\n" + "\t.byte 
0x9f\n\t.uleb128 2*1\n" + "\t.byte 0x90\n\t.uleb128 2*2\n" + "\t.byte 0xd\n\t.uleb128 0x10\n" + "\t.align 2\n" + ".LEFDE1:\n\n", (int)ctx->codesz - fcofs); +#endif +#if !LJ_NO_UNWIND + /* NYI */ +#endif + break; + default: + break; + } +} + diff --cc src/vm_ppc.dasc index 73d60ae4,61ebbb04..73a70a00 --- a/src/vm_ppc.dasc +++ b/src/vm_ppc.dasc @@@ -1,6 -1,6 +1,6 @@@ -|// Low-level VM code for PowerPC CPUs. +|// Low-level VM code for PowerPC 32 bit or 32on64 bit mode. |// Bytecode interpreter, fast functions and helper functions. - |// Copyright (C) 2005-2022 Mike Pall. See Copyright Notice in luajit.h + |// Copyright (C) 2005-2023 Mike Pall. See Copyright Notice in luajit.h | |.arch ppc |.section code_op, code_sub diff --cc src/vm_x64.dasc index 5983eeed,00000000..a8649b4e mode 100644,000000..100644 --- a/src/vm_x64.dasc +++ b/src/vm_x64.dasc @@@ -1,4946 -1,0 +1,4946 @@@ +|// Low-level VM code for x64 CPUs in LJ_GC64 mode. +|// Bytecode interpreter, fast functions and helper functions. - |// Copyright (C) 2005-2022 Mike Pall. See Copyright Notice in luajit.h ++|// Copyright (C) 2005-2023 Mike Pall. See Copyright Notice in luajit.h +| +|.arch x64 +|.section code_op, code_sub +| +|.actionlist build_actionlist +|.globals GLOB_ +|.globalnames globnames +|.externnames extnames +| +|//----------------------------------------------------------------------- +| +|.if WIN +|.define X64WIN, 1 // Windows/x64 calling conventions. +|.endif +| +|// Fixed register assignments for the interpreter. +|// This is very fragile and has many dependencies. Caveat emptor. +|.define BASE, rdx // Not C callee-save, refetched anyway. +|.if X64WIN +|.define KBASE, rdi // Must be C callee-save. +|.define PC, rsi // Must be C callee-save. +|.define DISPATCH, rbx // Must be C callee-save. +|.define KBASEd, edi +|.define PCd, esi +|.define DISPATCHd, ebx +|.else +|.define KBASE, r15 // Must be C callee-save. +|.define PC, rbx // Must be C callee-save. 
+|.define DISPATCH, r14 // Must be C callee-save. +|.define KBASEd, r15d +|.define PCd, ebx +|.define DISPATCHd, r14d +|.endif +| +|.define RA, rcx +|.define RAd, ecx +|.define RAH, ch +|.define RAL, cl +|.define RB, rbp // Must be rbp (C callee-save). +|.define RBd, ebp +|.define RC, rax // Must be rax. +|.define RCd, eax +|.define RCW, ax +|.define RCH, ah +|.define RCL, al +|.define OP, RBd +|.define RD, RC +|.define RDd, RCd +|.define RDW, RCW +|.define RDL, RCL +|.define TMPR, r10 +|.define TMPRd, r10d +|.define ITYPE, r11 +|.define ITYPEd, r11d +| +|.if X64WIN +|.define CARG1, rcx // x64/WIN64 C call arguments. +|.define CARG2, rdx +|.define CARG3, r8 +|.define CARG4, r9 +|.define CARG1d, ecx +|.define CARG2d, edx +|.define CARG3d, r8d +|.define CARG4d, r9d +|.else +|.define CARG1, rdi // x64/POSIX C call arguments. +|.define CARG2, rsi +|.define CARG3, rdx +|.define CARG4, rcx +|.define CARG5, r8 +|.define CARG6, r9 +|.define CARG1d, edi +|.define CARG2d, esi +|.define CARG3d, edx +|.define CARG4d, ecx +|.define CARG5d, r8d +|.define CARG6d, r9d +|.endif +| +|// Type definitions. Some of these are only used for documentation. +|.type L, lua_State +|.type GL, global_State +|.type TVALUE, TValue +|.type GCOBJ, GCobj +|.type STR, GCstr +|.type TAB, GCtab +|.type LFUNC, GCfuncL +|.type CFUNC, GCfuncC +|.type PROTO, GCproto +|.type UPVAL, GCupval +|.type NODE, Node +|.type NARGS, int +|.type TRACE, GCtrace +|.type SBUF, SBuf +| +|// Stack layout while in interpreter. Must match with lj_frame.h. +|//----------------------------------------------------------------------- +|.if X64WIN // x64/Windows stack layout +| +|.define CFRAME_SPACE, aword*5 // Delta for rsp (see <--). 
+|.macro saveregs_ +| push rdi; push rsi; push rbx +| sub rsp, CFRAME_SPACE +|.endmacro +|.macro saveregs +| push rbp; saveregs_ +|.endmacro +|.macro restoreregs +| add rsp, CFRAME_SPACE +| pop rbx; pop rsi; pop rdi; pop rbp +|.endmacro +| +|.define SAVE_CFRAME, aword [rsp+aword*13] +|.define SAVE_PC, aword [rsp+aword*12] +|.define SAVE_L, aword [rsp+aword*11] +|.define SAVE_ERRF, dword [rsp+dword*21] +|.define SAVE_NRES, dword [rsp+dword*20] +|//----- 16 byte aligned, ^^^ 32 byte register save area, owned by interpreter +|.define SAVE_RET, aword [rsp+aword*9] //<-- rsp entering interpreter. +|.define SAVE_R4, aword [rsp+aword*8] +|.define SAVE_R3, aword [rsp+aword*7] +|.define SAVE_R2, aword [rsp+aword*6] +|.define SAVE_R1, aword [rsp+aword*5] //<-- rsp after register saves. +|.define ARG5, aword [rsp+aword*4] +|.define CSAVE_4, aword [rsp+aword*3] +|.define CSAVE_3, aword [rsp+aword*2] +|.define CSAVE_2, aword [rsp+aword*1] +|.define CSAVE_1, aword [rsp] //<-- rsp while in interpreter. +|//----- 16 byte aligned, ^^^ 32 byte register save area, owned by callee +| +|.define ARG5d, dword [rsp+dword*8] +|.define TMP1, ARG5 // TMP1 overlaps ARG5 +|.define TMP1d, ARG5d +|.define TMP1hi, dword [rsp+dword*9] +|.define MULTRES, TMP1d // MULTRES overlaps TMP1d. +| +|//----------------------------------------------------------------------- +|.else // x64/POSIX stack layout +| +|.define CFRAME_SPACE, aword*5 // Delta for rsp (see <--). +|.macro saveregs_ +| push rbx; push r15; push r14 +|.if NO_UNWIND +| push r13; push r12 +|.endif +| sub rsp, CFRAME_SPACE +|.endmacro +|.macro saveregs +| push rbp; saveregs_ +|.endmacro +|.macro restoreregs +| add rsp, CFRAME_SPACE +|.if NO_UNWIND +| pop r12; pop r13 +|.endif +| pop r14; pop r15; pop rbx; pop rbp +|.endmacro +| +|//----- 16 byte aligned, +|.if NO_UNWIND +|.define SAVE_RET, aword [rsp+aword*11] //<-- rsp entering interpreter. 
+|.define SAVE_R4, aword [rsp+aword*10] +|.define SAVE_R3, aword [rsp+aword*9] +|.define SAVE_R2, aword [rsp+aword*8] +|.define SAVE_R1, aword [rsp+aword*7] +|.define SAVE_RU2, aword [rsp+aword*6] +|.define SAVE_RU1, aword [rsp+aword*5] //<-- rsp after register saves. +|.else +|.define SAVE_RET, aword [rsp+aword*9] //<-- rsp entering interpreter. +|.define SAVE_R4, aword [rsp+aword*8] +|.define SAVE_R3, aword [rsp+aword*7] +|.define SAVE_R2, aword [rsp+aword*6] +|.define SAVE_R1, aword [rsp+aword*5] //<-- rsp after register saves. +|.endif +|.define SAVE_CFRAME, aword [rsp+aword*4] +|.define SAVE_PC, aword [rsp+aword*3] +|.define SAVE_L, aword [rsp+aword*2] +|.define SAVE_ERRF, dword [rsp+dword*3] +|.define SAVE_NRES, dword [rsp+dword*2] +|.define TMP1, aword [rsp] //<-- rsp while in interpreter. +|//----- 16 byte aligned +| +|.define TMP1d, dword [rsp] +|.define TMP1hi, dword [rsp+dword*1] +|.define MULTRES, TMP1d // MULTRES overlaps TMP1d. +| +|.endif +| +|//----------------------------------------------------------------------- +| +|// Instruction headers. +|.macro ins_A; .endmacro +|.macro ins_AD; .endmacro +|.macro ins_AJ; .endmacro +|.macro ins_ABC; movzx RBd, RCH; movzx RCd, RCL; .endmacro +|.macro ins_AB_; movzx RBd, RCH; .endmacro +|.macro ins_A_C; movzx RCd, RCL; .endmacro +|.macro ins_AND; not RD; .endmacro +| +|// Instruction decode+dispatch. Carefully tuned (nope, lodsd is not faster). +|.macro ins_NEXT +| mov RCd, [PC] +| movzx RAd, RCH +| movzx OP, RCL +| add PC, 4 +| shr RCd, 16 +| jmp aword [DISPATCH+OP*8] +|.endmacro +| +|// Instruction footer. +|.if 1 +| // Replicated dispatch. Less unpredictable branches, but higher I-Cache use. +| .define ins_next, ins_NEXT +| .define ins_next_, ins_NEXT +|.else +| // Common dispatch. Lower I-Cache use, only one (very) unpredictable branch. +| // Affects only certain kinds of benchmarks (and only with -j off). +| // Around 10%-30% slower on Core2, a lot more slower on P4. 
+| .macro ins_next +| jmp ->ins_next +| .endmacro +| .macro ins_next_ +| ->ins_next: +| ins_NEXT +| .endmacro +|.endif +| +|// Call decode and dispatch. +|.macro ins_callt +| // BASE = new base, RB = LFUNC, RD = nargs+1, [BASE-8] = PC +| mov PC, LFUNC:RB->pc +| mov RAd, [PC] +| movzx OP, RAL +| movzx RAd, RAH +| add PC, 4 +| jmp aword [DISPATCH+OP*8] +|.endmacro +| +|.macro ins_call +| // BASE = new base, RB = LFUNC, RD = nargs+1 +| mov [BASE-8], PC +| ins_callt +|.endmacro +| +|//----------------------------------------------------------------------- +| +|// Macros to clear or set tags. +|.macro cleartp, reg; shl reg, 17; shr reg, 17; .endmacro +|.macro settp, reg, tp +| mov64 ITYPE, ((uint64_t)tp<<47) +| or reg, ITYPE +|.endmacro +|.macro settp, dst, reg, tp +| mov64 dst, ((uint64_t)tp<<47) +| or dst, reg +|.endmacro +|.macro setint, reg +| settp reg, LJ_TISNUM +|.endmacro +|.macro setint, dst, reg +| settp dst, reg, LJ_TISNUM +|.endmacro +| +|// Macros to test operand types. +|.macro checktp_nc, reg, tp, target +| mov ITYPE, reg +| sar ITYPE, 47 +| cmp ITYPEd, tp +| jne target +|.endmacro +|.macro checktp, reg, tp, target +| mov ITYPE, reg +| cleartp reg +| sar ITYPE, 47 +| cmp ITYPEd, tp +| jne target +|.endmacro +|.macro checktptp, src, tp, target +| mov ITYPE, src +| sar ITYPE, 47 +| cmp ITYPEd, tp +| jne target +|.endmacro +|.macro checkstr, reg, target; checktp reg, LJ_TSTR, target; .endmacro +|.macro checktab, reg, target; checktp reg, LJ_TTAB, target; .endmacro +|.macro checkfunc, reg, target; checktp reg, LJ_TFUNC, target; .endmacro +| +|.macro checknumx, reg, target, jump +| mov ITYPE, reg +| sar ITYPE, 47 +| cmp ITYPEd, LJ_TISNUM +| jump target +|.endmacro +|.macro checkint, reg, target; checknumx reg, target, jne; .endmacro +|.macro checkinttp, src, target; checknumx src, target, jne; .endmacro +|.macro checknum, reg, target; checknumx reg, target, jae; .endmacro +|.macro checknumtp, src, target; checknumx src, target, jae; .endmacro +|.macro 
checknumber, src, target; checknumx src, target, ja; .endmacro +| +|.macro mov_false, reg; mov64 reg, (int64_t)~((uint64_t)1<<47); .endmacro +|.macro mov_true, reg; mov64 reg, (int64_t)~((uint64_t)2<<47); .endmacro +| +|// These operands must be used with movzx. +|.define PC_OP, byte [PC-4] +|.define PC_RA, byte [PC-3] +|.define PC_RB, byte [PC-1] +|.define PC_RC, byte [PC-2] +|.define PC_RD, word [PC-2] +| +|.macro branchPC, reg +| lea PC, [PC+reg*4-BCBIAS_J*4] +|.endmacro +| +|// Assumes DISPATCH is relative to GL. +#define DISPATCH_GL(field) (GG_DISP2G + (int)offsetof(global_State, field)) +#define DISPATCH_J(field) (GG_DISP2J + (int)offsetof(jit_State, field)) +| +#define PC2PROTO(field) ((int)offsetof(GCproto, field)-(int)sizeof(GCproto)) +| +|// Decrement hashed hotcount and trigger trace recorder if zero. +|.macro hotloop, reg +| mov reg, PCd +| shr reg, 1 +| and reg, HOTCOUNT_PCMASK +| sub word [DISPATCH+reg+GG_DISP2HOT], HOTCOUNT_LOOP +| jb ->vm_hotloop +|.endmacro +| +|.macro hotcall, reg +| mov reg, PCd +| shr reg, 1 +| and reg, HOTCOUNT_PCMASK +| sub word [DISPATCH+reg+GG_DISP2HOT], HOTCOUNT_CALL +| jb ->vm_hotcall +|.endmacro +| +|// Set current VM state. +|.macro set_vmstate, st +| mov dword [DISPATCH+DISPATCH_GL(vmstate)], ~LJ_VMST_..st +|.endmacro +| +|.macro fpop1; fstp st1; .endmacro +| +|// Synthesize SSE FP constants. +|.macro sseconst_abs, reg, tmp // Synthesize abs mask. +| mov64 tmp, U64x(7fffffff,ffffffff); movd reg, tmp +|.endmacro +| +|.macro sseconst_hi, reg, tmp, val // Synthesize hi-32 bit const. +| mov64 tmp, U64x(val,00000000); movd reg, tmp +|.endmacro +| +|.macro sseconst_sign, reg, tmp // Synthesize sign mask. +| sseconst_hi reg, tmp, 80000000 +|.endmacro +|.macro sseconst_1, reg, tmp // Synthesize 1.0. +| sseconst_hi reg, tmp, 3ff00000 +|.endmacro +|.macro sseconst_2p52, reg, tmp // Synthesize 2^52. +| sseconst_hi reg, tmp, 43300000 +|.endmacro +|.macro sseconst_tobit, reg, tmp // Synthesize 2^52 + 2^51. 
+| sseconst_hi reg, tmp, 43380000 +|.endmacro +| +|// Move table write barrier back. Overwrites reg. +|.macro barrierback, tab, reg +| and byte tab->marked, (uint8_t)~LJ_GC_BLACK // black2gray(tab) +| mov reg, [DISPATCH+DISPATCH_GL(gc.grayagain)] +| mov [DISPATCH+DISPATCH_GL(gc.grayagain)], tab +| mov tab->gclist, reg +|.endmacro +| +|//----------------------------------------------------------------------- + +/* Generate subroutines used by opcodes and other parts of the VM. */ +/* The .code_sub section should be last to help static branch prediction. */ +static void build_subroutines(BuildCtx *ctx) +{ + |.code_sub + | + |//----------------------------------------------------------------------- + |//-- Return handling ---------------------------------------------------- + |//----------------------------------------------------------------------- + | + |->vm_returnp: + | test PCd, FRAME_P + | jz ->cont_dispatch + | + | // Return from pcall or xpcall fast func. + | and PC, -8 + | sub BASE, PC // Restore caller base. + | lea RA, [RA+PC-8] // Rebase RA and prepend one result. + | mov PC, [BASE-8] // Fetch PC of previous frame. + | // Prepending may overwrite the pcall frame, so do it at the end. + | mov_true ITYPE + | mov aword [BASE+RA], ITYPE // Prepend true to results. + | + |->vm_returnc: + | add RDd, 1 // RD = nresults+1 + | jz ->vm_unwind_yield + | mov MULTRES, RDd + | test PC, FRAME_TYPE + | jz ->BC_RET_Z // Handle regular return to Lua. + | + |->vm_return: + | // BASE = base, RA = resultofs, RD = nresults+1 (= MULTRES), PC = return + | xor PC, FRAME_C + | test PCd, FRAME_TYPE + | jnz ->vm_returnp + | + | // Return to C. + | set_vmstate C + | and PC, -8 + | sub PC, BASE + | neg PC // Previous base = BASE - delta. + | + | sub RDd, 1 + | jz >2 + |1: // Move results down. 
+ | mov RB, [BASE+RA] + | mov [BASE-16], RB + | add BASE, 8 + | sub RDd, 1 + | jnz <1 + |2: + | mov L:RB, SAVE_L + | mov L:RB->base, PC + |3: + | mov RDd, MULTRES + | mov RAd, SAVE_NRES // RA = wanted nresults+1 + |4: + | cmp RAd, RDd + | jne >6 // More/less results wanted? + |5: + | sub BASE, 16 + | mov L:RB->top, BASE + | + |->vm_leave_cp: + | mov RA, SAVE_CFRAME // Restore previous C frame. + | mov L:RB->cframe, RA + | xor eax, eax // Ok return status for vm_pcall. + | + |->vm_leave_unw: + | restoreregs + | ret + | + |6: + | jb >7 // Less results wanted? + | // More results wanted. Check stack size and fill up results with nil. + | cmp BASE, L:RB->maxstack + | ja >8 + | mov aword [BASE-16], LJ_TNIL + | add BASE, 8 + | add RDd, 1 + | jmp <4 + | + |7: // Less results wanted. + | test RAd, RAd + | jz <5 // But check for LUA_MULTRET+1. + | sub RA, RD // Negative result! + | lea BASE, [BASE+RA*8] // Correct top. + | jmp <5 + | + |8: // Corner case: need to grow stack for filling up results. + | // This can happen if: + | // - A C function grows the stack (a lot). + | // - The GC shrinks the stack in between. + | // - A return back from a lua_call() with (high) nresults adjustment. + | mov L:RB->top, BASE // Save current top held in BASE (yes). + | mov MULTRES, RDd // Need to fill only remainder with nil. + | mov CARG2d, RAd + | mov CARG1, L:RB + | call extern lj_state_growstack // (lua_State *L, int n) + | mov BASE, L:RB->top // Need the (realloced) L->top in BASE. + | jmp <3 + | + |->vm_unwind_yield: + | mov al, LUA_YIELD + | jmp ->vm_unwind_c_eh + | + |->vm_unwind_c: // Unwind C stack, return from vm_pcall. + | // (void *cframe, int errcode) + | mov eax, CARG2d // Error return status for vm_pcall. + | mov rsp, CARG1 + |->vm_unwind_c_eh: // Landing pad for external unwinder. 
+ | mov L:RB, SAVE_L + | mov GL:RB, L:RB->glref + | mov dword GL:RB->vmstate, ~LJ_VMST_C + | jmp ->vm_leave_unw + | + |->vm_unwind_rethrow: + |.if not X64WIN + | mov CARG1, SAVE_L + | mov CARG2d, eax + | restoreregs + | jmp extern lj_err_throw // (lua_State *L, int errcode) + |.endif + | + |->vm_unwind_ff: // Unwind C stack, return from ff pcall. + | // (void *cframe) + | and CARG1, CFRAME_RAWMASK + | mov rsp, CARG1 + |->vm_unwind_ff_eh: // Landing pad for external unwinder. + | mov L:RB, SAVE_L + | mov RDd, 1+1 // Really 1+2 results, incr. later. + | mov BASE, L:RB->base + | mov DISPATCH, L:RB->glref // Setup pointer to dispatch table. + | add DISPATCH, GG_G2DISP + | mov PC, [BASE-8] // Fetch PC of previous frame. + | mov_false RA + | mov RB, [BASE] + | mov [BASE-16], RA // Prepend false to error message. + | mov [BASE-8], RB + | mov RA, -16 // Results start at BASE+RA = BASE-16. + | set_vmstate INTERP + | jmp ->vm_returnc // Increments RD/MULTRES and returns. + | + |//----------------------------------------------------------------------- + |//-- Grow stack for calls ----------------------------------------------- + |//----------------------------------------------------------------------- + | + |->vm_growstack_c: // Grow stack for C function. + | mov CARG2d, LUA_MINSTACK + | jmp >2 + | + |->vm_growstack_v: // Grow stack for vararg Lua function. + | sub RD, 16 // LJ_FR2 + | jmp >1 + | + |->vm_growstack_f: // Grow stack for fixarg Lua function. + | // BASE = new base, RD = nargs+1, RB = L, PC = first PC + | lea RD, [BASE+NARGS:RD*8-8] + |1: + | movzx RAd, byte [PC-4+PC2PROTO(framesize)] + | add PC, 4 // Must point after first instruction. 
+ | mov L:RB->base, BASE + | mov L:RB->top, RD + | mov SAVE_PC, PC + | mov CARG2, RA + |2: + | // RB = L, L->base = new base, L->top = top + | mov CARG1, L:RB + | call extern lj_state_growstack // (lua_State *L, int n) + | mov BASE, L:RB->base + | mov RD, L:RB->top + | mov LFUNC:RB, [BASE-16] + | cleartp LFUNC:RB + | sub RD, BASE + | shr RDd, 3 + | add NARGS:RDd, 1 + | // BASE = new base, RB = LFUNC, RD = nargs+1 + | ins_callt // Just retry the call. + | + |//----------------------------------------------------------------------- + |//-- Entry points into the assembler VM --------------------------------- + |//----------------------------------------------------------------------- + | + |->vm_resume: // Setup C frame and resume thread. + | // (lua_State *L, TValue *base, int nres1 = 0, ptrdiff_t ef = 0) + | saveregs + | mov L:RB, CARG1 // Caveat: CARG1 may be RA. + | mov SAVE_L, CARG1 + | mov RA, CARG2 + | mov PCd, FRAME_CP + | xor RDd, RDd + | lea KBASE, [esp+CFRAME_RESUME] + | mov DISPATCH, L:RB->glref // Setup pointer to dispatch table. + | add DISPATCH, GG_G2DISP + | mov SAVE_PC, RD // Any value outside of bytecode is ok. + | mov SAVE_CFRAME, RD + | mov SAVE_NRES, RDd + | mov SAVE_ERRF, RDd + | mov L:RB->cframe, KBASE + | cmp byte L:RB->status, RDL + | je >2 // Initial resume (like a call). + | + | // Resume after yield (like a return). + | mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB + | set_vmstate INTERP + | mov byte L:RB->status, RDL + | mov BASE, L:RB->base + | mov RD, L:RB->top + | sub RD, RA + | shr RDd, 3 + | add RDd, 1 // RD = nresults+1 + | sub RA, BASE // RA = resultofs + | mov PC, [BASE-8] + | mov MULTRES, RDd + | test PCd, FRAME_TYPE + | jz ->BC_RET_Z + | jmp ->vm_return + | + |->vm_pcall: // Setup protected C frame and enter VM. + | // (lua_State *L, TValue *base, int nres1, ptrdiff_t ef) + | saveregs + | mov PCd, FRAME_CP + | mov SAVE_ERRF, CARG4d + | jmp >1 + | + |->vm_call: // Setup C frame and enter VM. 
+ | // (lua_State *L, TValue *base, int nres1) + | saveregs + | mov PCd, FRAME_C + | + |1: // Entry point for vm_pcall above (PC = ftype). + | mov SAVE_NRES, CARG3d + | mov L:RB, CARG1 // Caveat: CARG1 may be RA. + | mov SAVE_L, CARG1 + | mov RA, CARG2 + | + | mov DISPATCH, L:RB->glref // Setup pointer to dispatch table. + | mov KBASE, L:RB->cframe // Add our C frame to cframe chain. + | mov SAVE_CFRAME, KBASE + | mov SAVE_PC, L:RB // Any value outside of bytecode is ok. + | add DISPATCH, GG_G2DISP + | mov L:RB->cframe, rsp + | + |2: // Entry point for vm_resume/vm_cpcall (RA = base, RB = L, PC = ftype). + | mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB + | set_vmstate INTERP + | mov BASE, L:RB->base // BASE = old base (used in vmeta_call). + | add PC, RA + | sub PC, BASE // PC = frame delta + frame type + | + | mov RD, L:RB->top + | sub RD, RA + | shr NARGS:RDd, 3 + | add NARGS:RDd, 1 // RD = nargs+1 + | + |->vm_call_dispatch: + | mov LFUNC:RB, [RA-16] + | checkfunc LFUNC:RB, ->vmeta_call // Ensure KBASE defined and != BASE. + | + |->vm_call_dispatch_f: + | mov BASE, RA + | ins_call + | // BASE = new base, RB = func, RD = nargs+1, PC = caller PC + | + |->vm_cpcall: // Setup protected C frame, call C. + | // (lua_State *L, lua_CFunction func, void *ud, lua_CPFunction cp) + | saveregs + | mov L:RB, CARG1 // Caveat: CARG1 may be RA. + | mov SAVE_L, CARG1 + | mov SAVE_PC, L:RB // Any value outside of bytecode is ok. + | + | mov KBASE, L:RB->stack // Compute -savestack(L, L->top). + | sub KBASE, L:RB->top + | mov DISPATCH, L:RB->glref // Setup pointer to dispatch table. + | mov SAVE_ERRF, 0 // No error function. + | mov SAVE_NRES, KBASEd // Neg. delta means cframe w/o frame. + | add DISPATCH, GG_G2DISP + | // Handler may change cframe_nres(L->cframe) or cframe_errfunc(L->cframe). + | + | mov KBASE, L:RB->cframe // Add our C frame to cframe chain. 
+ | mov SAVE_CFRAME, KBASE + | mov L:RB->cframe, rsp + | mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB + | + | call CARG4 // (lua_State *L, lua_CFunction func, void *ud) + | // TValue * (new base) or NULL returned in eax (RC). + | test RC, RC + | jz ->vm_leave_cp // No base? Just remove C frame. + | mov RA, RC + | mov PCd, FRAME_CP + | jmp <2 // Else continue with the call. + | + |//----------------------------------------------------------------------- + |//-- Metamethod handling ------------------------------------------------ + |//----------------------------------------------------------------------- + | + |//-- Continuation dispatch ---------------------------------------------- + | + |->cont_dispatch: + | // BASE = meta base, RA = resultofs, RD = nresults+1 (also in MULTRES) + | add RA, BASE + | and PC, -8 + | mov RB, BASE + | sub BASE, PC // Restore caller BASE. + | mov aword [RA+RD*8-8], LJ_TNIL // Ensure one valid arg. + | mov RC, RA // ... in [RC] + | mov PC, [RB-24] // Restore PC from [cont|PC]. + | mov RA, qword [RB-32] // May be negative on WIN64 with debug. + |.if FFI + | cmp RA, 1 + | jbe >1 + |.endif + | mov LFUNC:KBASE, [BASE-16] + | cleartp LFUNC:KBASE + | mov KBASE, LFUNC:KBASE->pc + | mov KBASE, [KBASE+PC2PROTO(k)] + | // BASE = base, RC = result, RB = meta base + | jmp RA // Jump to continuation. + | + |.if FFI + |1: + | je ->cont_ffi_callback // cont = 1: return from FFI callback. + | // cont = 0: Tail call from C function. 
+ | sub RB, BASE + | shr RBd, 3 + | lea RDd, [RBd-3] + | jmp ->vm_call_tail + |.endif + | + |->cont_cat: // BASE = base, RC = result, RB = mbase + | movzx RAd, PC_RB + | sub RB, 32 + | lea RA, [BASE+RA*8] + | sub RA, RB + | je ->cont_ra + | neg RA + | shr RAd, 3 + |.if X64WIN + | mov CARG3d, RAd + | mov L:CARG1, SAVE_L + | mov L:CARG1->base, BASE + | mov RC, [RC] + | mov [RB], RC + | mov CARG2, RB + |.else + | mov L:CARG1, SAVE_L + | mov L:CARG1->base, BASE + | mov CARG3d, RAd + | mov RA, [RC] + | mov [RB], RA + | mov CARG2, RB + |.endif + | jmp ->BC_CAT_Z + | + |//-- Table indexing metamethods ----------------------------------------- + | + |->vmeta_tgets: + | settp STR:RC, LJ_TSTR // STR:RC = GCstr * + | mov TMP1, STR:RC + | lea RC, TMP1 + | cmp PC_OP, BC_GGET + | jne >1 + | settp TAB:RA, TAB:RB, LJ_TTAB // TAB:RB = GCtab * + | lea RB, [DISPATCH+DISPATCH_GL(tmptv)] // Store fn->l.env in g->tmptv. + | mov [RB], TAB:RA + | jmp >2 + | + |->vmeta_tgetb: + | movzx RCd, PC_RC + |.if DUALNUM + | setint RC + | mov TMP1, RC + |.else + | cvtsi2sd xmm0, RCd + | movsd TMP1, xmm0 + |.endif + | lea RC, TMP1 + | jmp >1 + | + |->vmeta_tgetv: + | movzx RCd, PC_RC // Reload TValue *k from RC. + | lea RC, [BASE+RC*8] + |1: + | movzx RBd, PC_RB // Reload TValue *t from RB. + | lea RB, [BASE+RB*8] + |2: + | mov L:CARG1, SAVE_L + | mov L:CARG1->base, BASE // Caveat: CARG2/CARG3 may be BASE. + | mov CARG2, RB + | mov CARG3, RC + | mov L:RB, L:CARG1 + | mov SAVE_PC, PC + | call extern lj_meta_tget // (lua_State *L, TValue *o, TValue *k) + | // TValue * (finished) or NULL (metamethod) returned in eax (RC). + | mov BASE, L:RB->base + | test RC, RC + | jz >3 + |->cont_ra: // BASE = base, RC = result + | movzx RAd, PC_RA + | mov RB, [RC] + | mov [BASE+RA*8], RB + | ins_next + | + |3: // Call __index metamethod. 
+ | // BASE = base, L->top = new base, stack = cont/func/t/k + | mov RA, L:RB->top + | mov [RA-24], PC // [cont|PC] + | lea PC, [RA+FRAME_CONT] + | sub PC, BASE + | mov LFUNC:RB, [RA-16] // Guaranteed to be a function here. + | mov NARGS:RDd, 2+1 // 2 args for func(t, k). + | cleartp LFUNC:RB + | jmp ->vm_call_dispatch_f + | + |->vmeta_tgetr: + | mov CARG1, TAB:RB + | mov RB, BASE // Save BASE. + | mov CARG2d, RCd // Caveat: CARG2 == BASE + | call extern lj_tab_getinth // (GCtab *t, int32_t key) + | // cTValue * or NULL returned in eax (RC). + | movzx RAd, PC_RA + | mov BASE, RB // Restore BASE. + | test RC, RC + | jnz ->BC_TGETR_Z + | mov ITYPE, LJ_TNIL + | jmp ->BC_TGETR2_Z + | + |//----------------------------------------------------------------------- + | + |->vmeta_tsets: + | settp STR:RC, LJ_TSTR // STR:RC = GCstr * + | mov TMP1, STR:RC + | lea RC, TMP1 + | cmp PC_OP, BC_GSET + | jne >1 + | settp TAB:RA, TAB:RB, LJ_TTAB // TAB:RB = GCtab * + | lea RB, [DISPATCH+DISPATCH_GL(tmptv)] // Store fn->l.env in g->tmptv. + | mov [RB], TAB:RA + | jmp >2 + | + |->vmeta_tsetb: + | movzx RCd, PC_RC + |.if DUALNUM + | setint RC + | mov TMP1, RC + |.else + | cvtsi2sd xmm0, RCd + | movsd TMP1, xmm0 + |.endif + | lea RC, TMP1 + | jmp >1 + | + |->vmeta_tsetv: + | movzx RCd, PC_RC // Reload TValue *k from RC. + | lea RC, [BASE+RC*8] + |1: + | movzx RBd, PC_RB // Reload TValue *t from RB. + | lea RB, [BASE+RB*8] + |2: + | mov L:CARG1, SAVE_L + | mov L:CARG1->base, BASE // Caveat: CARG2/CARG3 may be BASE. + | mov CARG2, RB + | mov CARG3, RC + | mov L:RB, L:CARG1 + | mov SAVE_PC, PC + | call extern lj_meta_tset // (lua_State *L, TValue *o, TValue *k) + | // TValue * (finished) or NULL (metamethod) returned in eax (RC). + | mov BASE, L:RB->base + | test RC, RC + | jz >3 + | // NOBARRIER: lj_meta_tset ensures the table is not black. 
+ | movzx RAd, PC_RA + | mov RB, [BASE+RA*8] + | mov [RC], RB + |->cont_nop: // BASE = base, (RC = result) + | ins_next + | + |3: // Call __newindex metamethod. + | // BASE = base, L->top = new base, stack = cont/func/t/k/(v) + | mov RA, L:RB->top + | mov [RA-24], PC // [cont|PC] + | movzx RCd, PC_RA + | // Copy value to third argument. + | mov RB, [BASE+RC*8] + | mov [RA+16], RB + | lea PC, [RA+FRAME_CONT] + | sub PC, BASE + | mov LFUNC:RB, [RA-16] // Guaranteed to be a function here. + | mov NARGS:RDd, 3+1 // 3 args for func(t, k, v). + | cleartp LFUNC:RB + | jmp ->vm_call_dispatch_f + | + |->vmeta_tsetr: + |.if X64WIN + | mov L:CARG1, SAVE_L + | mov CARG3d, RCd + | mov L:CARG1->base, BASE + | xchg CARG2, TAB:RB // Caveat: CARG2 == BASE. + |.else + | mov L:CARG1, SAVE_L + | mov CARG2, TAB:RB + | mov L:CARG1->base, BASE + | mov RB, BASE // Save BASE. + | mov CARG3d, RCd // Caveat: CARG3 == BASE. + |.endif + | mov SAVE_PC, PC + | call extern lj_tab_setinth // (lua_State *L, GCtab *t, int32_t key) + | // TValue * returned in eax (RC). + | movzx RAd, PC_RA + | mov BASE, RB // Restore BASE. + | jmp ->BC_TSETR_Z + | + |//-- Comparison metamethods --------------------------------------------- + | + |->vmeta_comp: + | movzx RDd, PC_RD + | movzx RAd, PC_RA + | mov L:RB, SAVE_L + | mov L:RB->base, BASE // Caveat: CARG2/CARG3 == BASE. + |.if X64WIN + | lea CARG3, [BASE+RD*8] + | lea CARG2, [BASE+RA*8] + |.else + | lea CARG2, [BASE+RA*8] + | lea CARG3, [BASE+RD*8] + |.endif + | mov CARG1, L:RB // Caveat: CARG1/CARG4 == RA. + | movzx CARG4d, PC_OP + | mov SAVE_PC, PC + | call extern lj_meta_comp // (lua_State *L, TValue *o1, *o2, int op) + | // 0/1 or TValue * (metamethod) returned in eax (RC). 
+ |3: + | mov BASE, L:RB->base + | cmp RC, 1 + | ja ->vmeta_binop + |4: + | lea PC, [PC+4] + | jb >6 + |5: + | movzx RDd, PC_RD + | branchPC RD + |6: + | ins_next + | + |->cont_condt: // BASE = base, RC = result + | add PC, 4 + | mov ITYPE, [RC] + | sar ITYPE, 47 + | cmp ITYPEd, LJ_TISTRUECOND // Branch if result is true. + | jb <5 + | jmp <6 + | + |->cont_condf: // BASE = base, RC = result + | mov ITYPE, [RC] + | sar ITYPE, 47 + | cmp ITYPEd, LJ_TISTRUECOND // Branch if result is false. + | jmp <4 + | + |->vmeta_equal: + | cleartp TAB:RD + | sub PC, 4 + |.if X64WIN + | mov CARG3, RD + | mov CARG4d, RBd + | mov L:RB, SAVE_L + | mov L:RB->base, BASE // Caveat: CARG2 == BASE. + | mov CARG2, RA + | mov CARG1, L:RB // Caveat: CARG1 == RA. + |.else + | mov CARG2, RA + | mov CARG4d, RBd // Caveat: CARG4 == RA. + | mov L:RB, SAVE_L + | mov L:RB->base, BASE // Caveat: CARG3 == BASE. + | mov CARG3, RD + | mov CARG1, L:RB + |.endif + | mov SAVE_PC, PC + | call extern lj_meta_equal // (lua_State *L, GCobj *o1, *o2, int ne) + | // 0/1 or TValue * (metamethod) returned in eax (RC). + | jmp <3 + | + |->vmeta_equal_cd: + |.if FFI + | sub PC, 4 + | mov L:RB, SAVE_L + | mov L:RB->base, BASE + | mov CARG1, L:RB + | mov CARG2d, dword [PC-4] + | mov SAVE_PC, PC + | call extern lj_meta_equal_cd // (lua_State *L, BCIns ins) + | // 0/1 or TValue * (metamethod) returned in eax (RC). + | jmp <3 + |.endif + | + |->vmeta_istype: + | mov L:RB, SAVE_L + | mov L:RB->base, BASE // Caveat: CARG2/CARG3 may be BASE. 
+ | mov CARG2d, RAd + | mov CARG3d, RDd + | mov L:CARG1, L:RB + | mov SAVE_PC, PC + | call extern lj_meta_istype // (lua_State *L, BCReg ra, BCReg tp) + | mov BASE, L:RB->base + | jmp <6 + | + |//-- Arithmetic metamethods --------------------------------------------- + | + |->vmeta_arith_vno: + |.if DUALNUM + | movzx RBd, PC_RB + | movzx RCd, PC_RC + |.endif + |->vmeta_arith_vn: + | lea RC, [KBASE+RC*8] + | jmp >1 + | + |->vmeta_arith_nvo: + |.if DUALNUM + | movzx RBd, PC_RB + | movzx RCd, PC_RC + |.endif + |->vmeta_arith_nv: + | lea TMPR, [KBASE+RC*8] + | lea RC, [BASE+RB*8] + | mov RB, TMPR + | jmp >2 + | + |->vmeta_unm: + | lea RC, [BASE+RD*8] + | mov RB, RC + | jmp >2 + | + |->vmeta_arith_vvo: + |.if DUALNUM + | movzx RBd, PC_RB + | movzx RCd, PC_RC + |.endif + |->vmeta_arith_vv: + | lea RC, [BASE+RC*8] + |1: + | lea RB, [BASE+RB*8] + |2: + | lea RA, [BASE+RA*8] + |.if X64WIN + | mov CARG3, RB + | mov CARG4, RC + | movzx RCd, PC_OP + | mov ARG5d, RCd + | mov L:RB, SAVE_L + | mov L:RB->base, BASE // Caveat: CARG2 == BASE. + | mov CARG2, RA + | mov CARG1, L:RB // Caveat: CARG1 == RA. + |.else + | movzx CARG5d, PC_OP + | mov CARG2, RA + | mov CARG4, RC // Caveat: CARG4 == RA. + | mov L:CARG1, SAVE_L + | mov L:CARG1->base, BASE // Caveat: CARG3 == BASE. + | mov CARG3, RB + | mov L:RB, L:CARG1 + |.endif + | mov SAVE_PC, PC + | call extern lj_meta_arith // (lua_State *L, TValue *ra,*rb,*rc, BCReg op) + | // NULL (finished) or TValue * (metamethod) returned in eax (RC). + | mov BASE, L:RB->base + | test RC, RC + | jz ->cont_nop + | + | // Call metamethod for binary op. + |->vmeta_binop: + | // BASE = base, RC = new base, stack = cont/func/o1/o2 + | mov RA, RC + | sub RC, BASE + | mov [RA-24], PC // [cont|PC] + | lea PC, [RC+FRAME_CONT] + | mov NARGS:RDd, 2+1 // 2 args for func(o1, o2). 
+ | jmp ->vm_call_dispatch + | + |->vmeta_len: + | movzx RDd, PC_RD + | mov L:RB, SAVE_L + | mov L:RB->base, BASE + | lea CARG2, [BASE+RD*8] // Caveat: CARG2 == BASE + | mov L:CARG1, L:RB + | mov SAVE_PC, PC + | call extern lj_meta_len // (lua_State *L, TValue *o) + | // NULL (retry) or TValue * (metamethod) returned in eax (RC). + | mov BASE, L:RB->base +#if LJ_52 + | test RC, RC + | jne ->vmeta_binop // Binop call for compatibility. + | movzx RDd, PC_RD + | mov TAB:CARG1, [BASE+RD*8] + | cleartp TAB:CARG1 + | jmp ->BC_LEN_Z +#else + | jmp ->vmeta_binop // Binop call for compatibility. +#endif + | + |//-- Call metamethod ---------------------------------------------------- + | + |->vmeta_call_ra: + | lea RA, [BASE+RA*8+16] + |->vmeta_call: // Resolve and call __call metamethod. + | // BASE = old base, RA = new base, RC = nargs+1, PC = return + | mov TMP1d, NARGS:RDd // Save RA, RC for us. + | mov RB, RA + |.if X64WIN + | mov L:TMPR, SAVE_L + | mov L:TMPR->base, BASE // Caveat: CARG2 is BASE. + | lea CARG2, [RA-16] + | lea CARG3, [RA+NARGS:RD*8-8] + | mov CARG1, L:TMPR // Caveat: CARG1 is RA. + |.else + | mov L:CARG1, SAVE_L + | mov L:CARG1->base, BASE // Caveat: CARG3 is BASE. + | lea CARG2, [RA-16] + | lea CARG3, [RA+NARGS:RD*8-8] + |.endif + | mov SAVE_PC, PC + | call extern lj_meta_call // (lua_State *L, TValue *func, TValue *top) + | mov RA, RB + | mov L:RB, SAVE_L + | mov BASE, L:RB->base + | mov NARGS:RDd, TMP1d + | mov LFUNC:RB, [RA-16] + | add NARGS:RDd, 1 + | // This is fragile. L->base must not move, KBASE must always be defined. + | cmp KBASE, BASE // Continue with CALLT if flag set. + | je ->BC_CALLT_Z + | cleartp LFUNC:RB + | mov BASE, RA + | ins_call // Otherwise call resolved metamethod. 
+ | + |//-- Argument coercion for 'for' statement ------------------------------ + | + |->vmeta_for: + | mov L:RB, SAVE_L + | mov L:RB->base, BASE + | mov CARG2, RA // Caveat: CARG2 == BASE + | mov L:CARG1, L:RB // Caveat: CARG1 == RA + | mov SAVE_PC, PC + | call extern lj_meta_for // (lua_State *L, TValue *base) + | mov BASE, L:RB->base + | mov RCd, [PC-4] + | movzx RAd, RCH + | movzx OP, RCL + | shr RCd, 16 + | jmp aword [DISPATCH+OP*8+GG_DISP2STATIC] // Retry FORI or JFORI. + | + |//----------------------------------------------------------------------- + |//-- Fast functions ----------------------------------------------------- + |//----------------------------------------------------------------------- + | + |.macro .ffunc, name + |->ff_ .. name: + |.endmacro + | + |.macro .ffunc_1, name + |->ff_ .. name: + | cmp NARGS:RDd, 1+1; jb ->fff_fallback + |.endmacro + | + |.macro .ffunc_2, name + |->ff_ .. name: + | cmp NARGS:RDd, 2+1; jb ->fff_fallback + |.endmacro + | + |.macro .ffunc_n, name, op + | .ffunc_1 name + | checknumtp [BASE], ->fff_fallback + | op xmm0, qword [BASE] + |.endmacro + | + |.macro .ffunc_n, name + | .ffunc_n name, movsd + |.endmacro + | + |.macro .ffunc_nn, name + | .ffunc_2 name + | checknumtp [BASE], ->fff_fallback + | checknumtp [BASE+8], ->fff_fallback + | movsd xmm0, qword [BASE] + | movsd xmm1, qword [BASE+8] + |.endmacro + | + |// Inlined GC threshold check. Caveat: uses label 1. 
+ |.macro ffgccheck + | mov RB, [DISPATCH+DISPATCH_GL(gc.total)] + | cmp RB, [DISPATCH+DISPATCH_GL(gc.threshold)] + | jb >1 + | call ->fff_gcstep + |1: + |.endmacro + | + |//-- Base library: checks ----------------------------------------------- + | + |.ffunc_1 assert + | mov ITYPE, [BASE] + | mov RB, ITYPE + | sar ITYPE, 47 + | cmp ITYPEd, LJ_TISTRUECOND; jae ->fff_fallback + | mov PC, [BASE-8] + | mov MULTRES, RDd + | mov RB, [BASE] + | mov [BASE-16], RB + | sub RDd, 2 + | jz >2 + | mov RA, BASE + |1: + | add RA, 8 + | mov RB, [RA] + | mov [RA-16], RB + | sub RDd, 1 + | jnz <1 + |2: + | mov RDd, MULTRES + | jmp ->fff_res_ + | + |.ffunc_1 type + | mov RC, [BASE] + | sar RC, 47 + | mov RBd, LJ_TISNUM + | cmp RCd, RBd + | cmovb RCd, RBd + | not RCd + |2: + | mov CFUNC:RB, [BASE-16] + | cleartp CFUNC:RB + | mov STR:RC, [CFUNC:RB+RC*8+((char *)(&((GCfuncC *)0)->upvalue))] + | mov PC, [BASE-8] + | settp STR:RC, LJ_TSTR + | mov [BASE-16], STR:RC + | jmp ->fff_res1 + | + |//-- Base library: getters and setters --------------------------------- + | + |.ffunc_1 getmetatable + | mov TAB:RB, [BASE] + | mov PC, [BASE-8] + | checktab TAB:RB, >6 + |1: // Field metatable must be at same offset for GCtab and GCudata! + | mov TAB:RB, TAB:RB->metatable + |2: + | test TAB:RB, TAB:RB + | mov aword [BASE-16], LJ_TNIL + | jz ->fff_res1 + | settp TAB:RC, TAB:RB, LJ_TTAB + | mov [BASE-16], TAB:RC // Store metatable as default result. + | mov STR:RC, [DISPATCH+DISPATCH_GL(gcroot)+8*(GCROOT_MMNAME+MM_metatable)] + | mov RAd, TAB:RB->hmask + | and RAd, STR:RC->sid + | settp STR:RC, LJ_TSTR + | imul RAd, #NODE + | add NODE:RA, TAB:RB->node + |3: // Rearranged logic, because we expect _not_ to find the key. + | cmp NODE:RA->key, STR:RC + | je >5 + |4: + | mov NODE:RA, NODE:RA->next + | test NODE:RA, NODE:RA + | jnz <3 + | jmp ->fff_res1 // Not found, keep default result. + |5: + | mov RB, NODE:RA->val + | cmp RB, LJ_TNIL; je ->fff_res1 // Ditto for nil value. 
+ | mov [BASE-16], RB // Return value of mt.__metatable. + | jmp ->fff_res1 + | + |6: + | cmp ITYPEd, LJ_TUDATA; je <1 + | cmp ITYPEd, LJ_TISNUM; ja >7 + | mov ITYPEd, LJ_TISNUM + |7: + | not ITYPEd + | mov TAB:RB, [DISPATCH+ITYPE*8+DISPATCH_GL(gcroot[GCROOT_BASEMT])] + | jmp <2 + | + |.ffunc_2 setmetatable + | mov TAB:RB, [BASE] + | mov TAB:TMPR, TAB:RB + | checktab TAB:RB, ->fff_fallback + | // Fast path: no mt for table yet and not clearing the mt. + | cmp aword TAB:RB->metatable, 0; jne ->fff_fallback + | mov TAB:RA, [BASE+8] + | checktab TAB:RA, ->fff_fallback + | mov TAB:RB->metatable, TAB:RA + | mov PC, [BASE-8] + | mov [BASE-16], TAB:TMPR // Return original table. + | test byte TAB:RB->marked, LJ_GC_BLACK // isblack(table) + | jz >1 + | // Possible write barrier. Table is black, but skip iswhite(mt) check. + | barrierback TAB:RB, RC + |1: + | jmp ->fff_res1 + | + |.ffunc_2 rawget + |.if X64WIN + | mov TAB:RA, [BASE] + | checktab TAB:RA, ->fff_fallback + | mov RB, BASE // Save BASE. + | lea CARG3, [BASE+8] + | mov CARG2, TAB:RA // Caveat: CARG2 == BASE. + | mov CARG1, SAVE_L + |.else + | mov TAB:CARG2, [BASE] + | checktab TAB:CARG2, ->fff_fallback + | mov RB, BASE // Save BASE. + | lea CARG3, [BASE+8] // Caveat: CARG3 == BASE. + | mov CARG1, SAVE_L + |.endif + | call extern lj_tab_get // (lua_State *L, GCtab *t, cTValue *key) + | // cTValue * returned in eax (RD). + | mov BASE, RB // Restore BASE. + | // Copy table slot. + | mov RB, [RD] + | mov PC, [BASE-8] + | mov [BASE-16], RB + | jmp ->fff_res1 + | + |//-- Base library: conversions ------------------------------------------ + | + |.ffunc tonumber + | // Only handles the number case inline (without a base argument). + | cmp NARGS:RDd, 1+1; jne ->fff_fallback // Exactly one argument. + | mov RB, [BASE] + | checknumber RB, ->fff_fallback + | mov PC, [BASE-8] + | mov [BASE-16], RB + | jmp ->fff_res1 + | + |.ffunc_1 tostring + | // Only handles the string or number case inline. 
+ | mov PC, [BASE-8] + | mov STR:RB, [BASE] + | checktp_nc STR:RB, LJ_TSTR, >3 + | // A __tostring method in the string base metatable is ignored. + |2: + | mov [BASE-16], STR:RB + | jmp ->fff_res1 + |3: // Handle numbers inline, unless a number base metatable is present. + | cmp ITYPEd, LJ_TISNUM; ja ->fff_fallback_1 + | cmp aword [DISPATCH+DISPATCH_GL(gcroot[GCROOT_BASEMT_NUM])], 0 + | jne ->fff_fallback + | ffgccheck // Caveat: uses label 1. + | mov L:RB, SAVE_L + | mov L:RB->base, BASE // Add frame since C call can throw. + | mov SAVE_PC, PC // Redundant (but a defined value). + |.if not X64WIN + | mov CARG2, BASE // Otherwise: CARG2 == BASE + |.endif + | mov L:CARG1, L:RB + |.if DUALNUM + | call extern lj_strfmt_number // (lua_State *L, cTValue *o) + |.else + | call extern lj_strfmt_num // (lua_State *L, lua_Number *np) + |.endif + | // GCstr returned in eax (RD). + | mov BASE, L:RB->base + | settp STR:RB, RD, LJ_TSTR + | jmp <2 + | + |//-- Base library: iterators ------------------------------------------- + | + |.ffunc_1 next + | je >2 // Missing 2nd arg? + |1: + | mov CARG1, [BASE] + | mov PC, [BASE-8] + | checktab CARG1, ->fff_fallback + | mov RB, BASE // Save BASE. + |.if X64WIN + | lea CARG3, [BASE-16] + | lea CARG2, [BASE+8] // Caveat: CARG2 == BASE. + |.else + | lea CARG2, [BASE+8] + | lea CARG3, [BASE-16] // Caveat: CARG3 == BASE. + |.endif + | call extern lj_tab_next // (GCtab *t, cTValue *key, TValue *o) + | // 1=found, 0=end, -1=error returned in eax (RD). + | mov BASE, RB // Restore BASE. + | test RDd, RDd; jg ->fff_res2 // Found key/value. + | js ->fff_fallback_2 // Invalid key. + | // End of traversal: return nil. + | mov aword [BASE-16], LJ_TNIL + | jmp ->fff_res1 + |2: // Set missing 2nd arg to nil. 
+ | mov aword [BASE+8], LJ_TNIL + | jmp <1 + | + |.ffunc_1 pairs + | mov TAB:RB, [BASE] + | mov TMPR, TAB:RB + | checktab TAB:RB, ->fff_fallback +#if LJ_52 + | cmp aword TAB:RB->metatable, 0; jne ->fff_fallback +#endif + | mov CFUNC:RD, [BASE-16] + | cleartp CFUNC:RD + | mov CFUNC:RD, CFUNC:RD->upvalue[0] + | settp CFUNC:RD, LJ_TFUNC + | mov PC, [BASE-8] + | mov [BASE-16], CFUNC:RD + | mov [BASE-8], TMPR + | mov aword [BASE], LJ_TNIL + | mov RDd, 1+3 + | jmp ->fff_res + | + |.ffunc_2 ipairs_aux + | mov TAB:RB, [BASE] + | checktab TAB:RB, ->fff_fallback + |.if DUALNUM + | mov RA, [BASE+8] + | checkint RA, ->fff_fallback + |.else + | checknumtp [BASE+8], ->fff_fallback + | movsd xmm0, qword [BASE+8] + |.endif + | mov PC, [BASE-8] + |.if DUALNUM + | add RAd, 1 + | setint ITYPE, RA + | mov [BASE-16], ITYPE + |.else + | sseconst_1 xmm1, TMPR + | addsd xmm0, xmm1 + | cvttsd2si RAd, xmm0 + | movsd qword [BASE-16], xmm0 + |.endif + | cmp RAd, TAB:RB->asize; jae >2 // Not in array part? + | mov RD, TAB:RB->array + | lea RD, [RD+RA*8] + |1: + | cmp aword [RD], LJ_TNIL; je ->fff_res0 + | // Copy array slot. + | mov RB, [RD] + | mov [BASE-8], RB + |->fff_res2: + | mov RDd, 1+2 + | jmp ->fff_res + |2: // Check for empty hash part first. Otherwise call C function. + | cmp dword TAB:RB->hmask, 0; je ->fff_res0 + |.if X64WIN + | mov TMPR, BASE + | mov CARG2d, RAd + | mov CARG1, TAB:RB + | mov RB, TMPR + |.else + | mov CARG1, TAB:RB + | mov RB, BASE // Save BASE. + | mov CARG2d, RAd // Caveat: CARG2 == BASE + |.endif + | call extern lj_tab_getinth // (GCtab *t, int32_t key) + | // cTValue * or NULL returned in eax (RD). 
+ | mov BASE, RB + | test RD, RD + | jnz <1 + |->fff_res0: + | mov RDd, 1+0 + | jmp ->fff_res + | + |.ffunc_1 ipairs + | mov TAB:RB, [BASE] + | mov TMPR, TAB:RB + | checktab TAB:RB, ->fff_fallback +#if LJ_52 + | cmp aword TAB:RB->metatable, 0; jne ->fff_fallback +#endif + | mov CFUNC:RD, [BASE-16] + | cleartp CFUNC:RD + | mov CFUNC:RD, CFUNC:RD->upvalue[0] + | settp CFUNC:RD, LJ_TFUNC + | mov PC, [BASE-8] + | mov [BASE-16], CFUNC:RD + | mov [BASE-8], TMPR + |.if DUALNUM + | mov64 RD, ((uint64_t)LJ_TISNUM<<47) + | mov [BASE], RD + |.else + | mov qword [BASE], 0 + |.endif + | mov RDd, 1+3 + | jmp ->fff_res + | + |//-- Base library: catch errors ---------------------------------------- + | + |.ffunc_1 pcall + | lea RA, [BASE+16] + | sub NARGS:RDd, 1 + | mov PCd, 16+FRAME_PCALL + |1: + | movzx RBd, byte [DISPATCH+DISPATCH_GL(hookmask)] + | shr RB, HOOK_ACTIVE_SHIFT + | and RB, 1 + | add PC, RB // Remember active hook before pcall. + | // Note: this does a (harmless) copy of the function to the PC slot, too. + | mov KBASE, RD + |2: + | mov RB, [RA+KBASE*8-24] + | mov [RA+KBASE*8-16], RB + | sub KBASE, 1 + | ja <2 + | jmp ->vm_call_dispatch + | + |.ffunc_2 xpcall + | mov LFUNC:RA, [BASE+8] + | checktp_nc LFUNC:RA, LJ_TFUNC, ->fff_fallback + | mov LFUNC:RB, [BASE] // Swap function and traceback. 
+ | mov [BASE], LFUNC:RA + | mov [BASE+8], LFUNC:RB + | lea RA, [BASE+24] + | sub NARGS:RDd, 2 + | mov PCd, 24+FRAME_PCALL + | jmp <1 + | + |//-- Coroutine library -------------------------------------------------- + | + |.macro coroutine_resume_wrap, resume + |.if resume + |.ffunc_1 coroutine_resume + | mov L:RB, [BASE] + | cleartp L:RB + |.else + |.ffunc coroutine_wrap_aux + | mov CFUNC:RB, [BASE-16] + | cleartp CFUNC:RB + | mov L:RB, CFUNC:RB->upvalue[0].gcr + | cleartp L:RB + |.endif + | mov PC, [BASE-8] + | mov SAVE_PC, PC + | mov TMP1, L:RB + |.if resume + | checktptp [BASE], LJ_TTHREAD, ->fff_fallback + |.endif + | cmp aword L:RB->cframe, 0; jne ->fff_fallback + | cmp byte L:RB->status, LUA_YIELD; ja ->fff_fallback + | mov RA, L:RB->top + | je >1 // Status != LUA_YIELD (i.e. 0)? + | cmp RA, L:RB->base // Check for presence of initial func. + | je ->fff_fallback + | mov PC, [RA-8] // Move initial function up. + | mov [RA], PC + | add RA, 8 + |1: + |.if resume + | lea PC, [RA+NARGS:RD*8-16] // Check stack space (-1-thread). + |.else + | lea PC, [RA+NARGS:RD*8-8] // Check stack space (-1). + |.endif + | cmp PC, L:RB->maxstack; ja ->fff_fallback + | mov L:RB->top, PC + | + | mov L:RB, SAVE_L + | mov L:RB->base, BASE + |.if resume + | add BASE, 8 // Keep resumed thread in stack for GC. + |.endif + | mov L:RB->top, BASE + |.if resume + | lea RB, [BASE+NARGS:RD*8-24] // RB = end of source for stack move. + |.else + | lea RB, [BASE+NARGS:RD*8-16] // RB = end of source for stack move. + |.endif + | sub RB, PC // Relative to PC. + | + | cmp PC, RA + | je >3 + |2: // Move args to coroutine. 
+ | mov RC, [PC+RB] + | mov [PC-8], RC + | sub PC, 8 + | cmp PC, RA + | jne <2 + |3: + | mov CARG2, RA + | mov CARG1, TMP1 + | call ->vm_resume // (lua_State *L, TValue *base, 0, 0) + | + | mov L:RB, SAVE_L + | mov L:PC, TMP1 + | mov BASE, L:RB->base + | mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB + | set_vmstate INTERP + | + | cmp eax, LUA_YIELD + | ja >8 + |4: + | mov RA, L:PC->base + | mov KBASE, L:PC->top + | mov L:PC->top, RA // Clear coroutine stack. + | mov PC, KBASE + | sub PC, RA + | je >6 // No results? + | lea RD, [BASE+PC] + | shr PCd, 3 + | cmp RD, L:RB->maxstack + | ja >9 // Need to grow stack? + | + | mov RB, BASE + | sub RB, RA + |5: // Move results from coroutine. + | mov RD, [RA] + | mov [RA+RB], RD + | add RA, 8 + | cmp RA, KBASE + | jne <5 + |6: + |.if resume + | lea RDd, [PCd+2] // nresults+1 = 1 + true + results. + | mov_true ITYPE // Prepend true to results. + | mov [BASE-8], ITYPE + |.else + | lea RDd, [PCd+1] // nresults+1 = 1 + results. + |.endif + |7: + | mov PC, SAVE_PC + | mov MULTRES, RDd + |.if resume + | mov RA, -8 + |.else + | xor RAd, RAd + |.endif + | test PCd, FRAME_TYPE + | jz ->BC_RET_Z + | jmp ->vm_return + | + |8: // Coroutine returned with error (at co->top-1). + |.if resume + | mov_false ITYPE // Prepend false to results. + | mov [BASE-8], ITYPE + | mov RA, L:PC->top + | sub RA, 8 + | mov L:PC->top, RA // Clear error from coroutine stack. + | // Copy error message. + | mov RD, [RA] + | mov [BASE], RD + | mov RDd, 1+2 // nresults+1 = 1 + false + error. + | jmp <7 + |.else + | mov CARG2, L:PC + | mov CARG1, L:RB + | call extern lj_ffh_coroutine_wrap_err // (lua_State *L, lua_State *co) + | // Error function does not return. + |.endif + | + |9: // Handle stack expansion on return from yield. + | mov L:RA, TMP1 + | mov L:RA->top, KBASE // Undo coroutine stack clearing. 
+ | mov CARG2, PC + | mov CARG1, L:RB + | call extern lj_state_growstack // (lua_State *L, int n) + | mov L:PC, TMP1 + | mov BASE, L:RB->base + | jmp <4 // Retry the stack move. + |.endmacro + | + | coroutine_resume_wrap 1 // coroutine.resume + | coroutine_resume_wrap 0 // coroutine.wrap + | + |.ffunc coroutine_yield + | mov L:RB, SAVE_L + | test aword L:RB->cframe, CFRAME_RESUME + | jz ->fff_fallback + | mov L:RB->base, BASE + | lea RD, [BASE+NARGS:RD*8-8] + | mov L:RB->top, RD + | xor RDd, RDd + | mov aword L:RB->cframe, RD + | mov al, LUA_YIELD + | mov byte L:RB->status, al + | jmp ->vm_leave_unw + | + |//-- Math library ------------------------------------------------------- + | + | .ffunc_1 math_abs + | mov RB, [BASE] + |.if DUALNUM + | checkint RB, >3 + | cmp RBd, 0; jns ->fff_resi + | neg RBd; js >2 + |->fff_resbit: + |->fff_resi: + | setint RB + |->fff_resRB: + | mov PC, [BASE-8] + | mov [BASE-16], RB + | jmp ->fff_res1 + |2: + | mov64 RB, U64x(41e00000,00000000) // 2^31. + | jmp ->fff_resRB + |3: + | ja ->fff_fallback + |.else + | checknum RB, ->fff_fallback + |.endif + | shl RB, 1 + | shr RB, 1 + | mov PC, [BASE-8] + | mov [BASE-16], RB + | jmp ->fff_res1 + | + |.ffunc_n math_sqrt, sqrtsd + |->fff_resxmm0: + | mov PC, [BASE-8] + | movsd qword [BASE-16], xmm0 + | // fallthrough + | + |->fff_res1: + | mov RDd, 1+1 + |->fff_res: + | mov MULTRES, RDd + |->fff_res_: + | test PCd, FRAME_TYPE + | jnz >7 + |5: + | cmp PC_RB, RDL // More results expected? + | ja >6 + | // Adjust BASE. KBASE is assumed to be set for the calling frame. + | movzx RAd, PC_RA + | neg RA + | lea BASE, [BASE+RA*8-16] // base = base - (RA+2)*8 + | ins_next + | + |6: // Fill up results with nil. + | mov aword [BASE+RD*8-24], LJ_TNIL + | add RD, 1 + | jmp <5 + | + |7: // Non-standard return case. + | mov RA, -16 // Results start at BASE+RA = BASE-16. + | jmp ->vm_return + | + |.macro math_round, func + | .ffunc math_ .. 
func + |.if DUALNUM + | mov RB, [BASE] + | checknumx RB, ->fff_resRB, je + | ja ->fff_fallback + |.else + | checknumtp [BASE], ->fff_fallback + |.endif + | movsd xmm0, qword [BASE] + | call ->vm_ .. func .. _sse + |.if DUALNUM + | cvttsd2si RBd, xmm0 + | cmp RBd, 0x80000000 + | jne ->fff_resi + | cvtsi2sd xmm1, RBd + | ucomisd xmm0, xmm1 + | jp ->fff_resxmm0 + | je ->fff_resi + |.endif + | jmp ->fff_resxmm0 + |.endmacro + | + | math_round floor + | math_round ceil + | + |.ffunc math_log + | cmp NARGS:RDd, 1+1; jne ->fff_fallback // Exactly one argument. + | checknumtp [BASE], ->fff_fallback + | movsd xmm0, qword [BASE] + | mov RB, BASE + | call extern log + | mov BASE, RB + | jmp ->fff_resxmm0 + | + |.macro math_extern, func + | .ffunc_n math_ .. func + | mov RB, BASE + | call extern func + | mov BASE, RB + | jmp ->fff_resxmm0 + |.endmacro + | + |.macro math_extern2, func + | .ffunc_nn math_ .. func + | mov RB, BASE + | call extern func + | mov BASE, RB + | jmp ->fff_resxmm0 + |.endmacro + | + | math_extern log10 + | math_extern exp + | math_extern sin + | math_extern cos + | math_extern tan + | math_extern asin + | math_extern acos + | math_extern atan + | math_extern sinh + | math_extern cosh + | math_extern tanh + | math_extern2 pow + | math_extern2 atan2 + | math_extern2 fmod + | + |.ffunc_2 math_ldexp + | checknumtp [BASE], ->fff_fallback + | checknumtp [BASE+8], ->fff_fallback + | fld qword [BASE+8] + | fld qword [BASE] + | fscale + | fpop1 + | mov PC, [BASE-8] + | fstp qword [BASE-16] + | jmp ->fff_res1 + | + |.ffunc_n math_frexp + | mov RB, BASE + |.if X64WIN + | lea CARG2, TMP1 // Caveat: CARG2 == BASE + |.else + | lea CARG1, TMP1 + |.endif + | call extern frexp + | mov BASE, RB + | mov RBd, TMP1d + | mov PC, [BASE-8] + | movsd qword [BASE-16], xmm0 + |.if DUALNUM + | setint RB + | mov [BASE-8], RB + |.else + | cvtsi2sd xmm1, RBd + | movsd qword [BASE-8], xmm1 + |.endif + | mov RDd, 1+2 + | jmp ->fff_res + | + |.ffunc_n math_modf + | mov RB, BASE + |.if 
X64WIN + | lea CARG2, [BASE-16] // Caveat: CARG2 == BASE + |.else + | lea CARG1, [BASE-16] + |.endif + | call extern modf + | mov BASE, RB + | mov PC, [BASE-8] + | movsd qword [BASE-8], xmm0 + | mov RDd, 1+2 + | jmp ->fff_res + | + |.macro math_minmax, name, cmovop, sseop + | .ffunc_1 name + | mov RAd, 2 + |.if DUALNUM + | mov RB, [BASE] + | checkint RB, >4 + |1: // Handle integers. + | cmp RAd, RDd; jae ->fff_resRB + | mov TMPR, [BASE+RA*8-8] + | checkint TMPR, >3 + | cmp RBd, TMPRd + | cmovop RB, TMPR + | add RAd, 1 + | jmp <1 + |3: + | ja ->fff_fallback + | // Convert intermediate result to number and continue below. + | cvtsi2sd xmm0, RBd + | jmp >6 + |4: + | ja ->fff_fallback + |.else + | checknumtp [BASE], ->fff_fallback + |.endif + | + | movsd xmm0, qword [BASE] + |5: // Handle numbers or integers. + | cmp RAd, RDd; jae ->fff_resxmm0 + |.if DUALNUM + | mov RB, [BASE+RA*8-8] + | checknumx RB, >6, jb + | ja ->fff_fallback + | cvtsi2sd xmm1, RBd + | jmp >7 + |.else + | checknumtp [BASE+RA*8-8], ->fff_fallback + |.endif + |6: + | movsd xmm1, qword [BASE+RA*8-8] + |7: + | sseop xmm0, xmm1 + | add RAd, 1 + | jmp <5 + |.endmacro + | + | math_minmax math_min, cmovg, minsd + | math_minmax math_max, cmovl, maxsd + | + |//-- String library ----------------------------------------------------- + | + |.ffunc string_byte // Only handle the 1-arg case here. + | cmp NARGS:RDd, 1+1; jne ->fff_fallback + | mov STR:RB, [BASE] + | checkstr STR:RB, ->fff_fallback + | mov PC, [BASE-8] + | cmp dword STR:RB->len, 1 + | jb ->fff_res0 // Return no results for empty string. + | movzx RBd, byte STR:RB[1] + |.if DUALNUM + | jmp ->fff_resi + |.else + | cvtsi2sd xmm0, RBd; jmp ->fff_resxmm0 + |.endif + | + |.ffunc string_char // Only handle the 1-arg case here. + | ffgccheck + | cmp NARGS:RDd, 1+1; jne ->fff_fallback // *Exactly* 1 arg. 
+ |.if DUALNUM + | mov RB, [BASE] + | checkint RB, ->fff_fallback + |.else + | checknumtp [BASE], ->fff_fallback + | cvttsd2si RBd, qword [BASE] + |.endif + | cmp RBd, 255; ja ->fff_fallback + | mov TMP1d, RBd + | mov TMPRd, 1 + | lea RD, TMP1 // Points to stack. Little-endian. + |->fff_newstr: + | mov L:RB, SAVE_L + | mov L:RB->base, BASE + | mov CARG3d, TMPRd // Zero-extended to size_t. + | mov CARG2, RD + | mov CARG1, L:RB + | mov SAVE_PC, PC + | call extern lj_str_new // (lua_State *L, char *str, size_t l) + |->fff_resstr: + | // GCstr * returned in eax (RD). + | mov BASE, L:RB->base + | mov PC, [BASE-8] + | settp STR:RD, LJ_TSTR + | mov [BASE-16], STR:RD + | jmp ->fff_res1 + | + |.ffunc string_sub + | ffgccheck + | mov TMPRd, -1 + | cmp NARGS:RDd, 1+2; jb ->fff_fallback + | jna >1 + |.if DUALNUM + | mov TMPR, [BASE+16] + | checkint TMPR, ->fff_fallback + |.else + | checknumtp [BASE+16], ->fff_fallback + | cvttsd2si TMPRd, qword [BASE+16] + |.endif + |1: + | mov STR:RB, [BASE] + | checkstr STR:RB, ->fff_fallback + |.if DUALNUM + | mov ITYPE, [BASE+8] + | mov RAd, ITYPEd // Must clear hiword for lea below. + | sar ITYPE, 47 + | cmp ITYPEd, LJ_TISNUM + | jne ->fff_fallback + |.else + | checknumtp [BASE+8], ->fff_fallback + | cvttsd2si RAd, qword [BASE+8] + |.endif + | mov RCd, STR:RB->len + | cmp RCd, TMPRd // len < end? (unsigned compare) + | jb >5 + |2: + | test RAd, RAd // start <= 0? + | jle >7 + |3: + | sub TMPRd, RAd // start > end? + | jl ->fff_emptystr + | lea RD, [STR:RB+RAd+#STR-1] + | add TMPRd, 1 + |4: + | jmp ->fff_newstr + | + |5: // Negative end or overflow. + | jl >6 + | lea TMPRd, [TMPRd+RCd+1] // end = end+(len+1) + | jmp <2 + |6: // Overflow. + | mov TMPRd, RCd // end = len + | jmp <2 + | + |7: // Negative start or underflow. + | je >8 + | add RAd, RCd // start = start+(len+1) + | add RAd, 1 + | jg <3 // start > 0? + |8: // Underflow. + | mov RAd, 1 // start = 1 + | jmp <3 + | + |->fff_emptystr: // Range underflow. 
+ | xor TMPRd, TMPRd // Zero length. Any ptr in RD is ok. + | jmp <4 + | + |.macro ffstring_op, name + | .ffunc_1 string_ .. name + | ffgccheck + |.if X64WIN + | mov STR:TMPR, [BASE] + | checkstr STR:TMPR, ->fff_fallback + |.else + | mov STR:CARG2, [BASE] + | checkstr STR:CARG2, ->fff_fallback + |.endif + | mov L:RB, SAVE_L + | lea SBUF:CARG1, [DISPATCH+DISPATCH_GL(tmpbuf)] + | mov L:RB->base, BASE + |.if X64WIN + | mov STR:CARG2, STR:TMPR // Caveat: CARG2 == BASE + |.endif + | mov RC, SBUF:CARG1->b + | mov SBUF:CARG1->L, L:RB + | mov SBUF:CARG1->w, RC + | mov SAVE_PC, PC + | call extern lj_buf_putstr_ .. name + | mov CARG1, rax + | call extern lj_buf_tostr + | jmp ->fff_resstr + |.endmacro + | + |ffstring_op reverse + |ffstring_op lower + |ffstring_op upper + | + |//-- Bit library -------------------------------------------------------- + | + |.macro .ffunc_bit, name, kind, fdef + | fdef name + |.if kind == 2 + | sseconst_tobit xmm1, RB + |.endif + |.if DUALNUM + | mov RB, [BASE] + | checkint RB, >1 + |.if kind > 0 + | jmp >2 + |.else + | jmp ->fff_resbit + |.endif + |1: + | ja ->fff_fallback + | movd xmm0, RB + |.else + | checknumtp [BASE], ->fff_fallback + | movsd xmm0, qword [BASE] + |.endif + |.if kind < 2 + | sseconst_tobit xmm1, RB + |.endif + | addsd xmm0, xmm1 + | movd RBd, xmm0 + |2: + |.endmacro + | + |.macro .ffunc_bit, name, kind + | .ffunc_bit name, kind, .ffunc_1 + |.endmacro + | + |.ffunc_bit bit_tobit, 0 + | jmp ->fff_resbit + | + |.macro .ffunc_bit_op, name, ins + | .ffunc_bit name, 2 + | mov TMPRd, NARGS:RDd // Save for fallback. 
+ | lea RD, [BASE+NARGS:RD*8-16] + |1: + | cmp RD, BASE + | jbe ->fff_resbit + |.if DUALNUM + | mov RA, [RD] + | checkint RA, >2 + | ins RBd, RAd + | sub RD, 8 + | jmp <1 + |2: + | ja ->fff_fallback_bit_op + | movd xmm0, RA + |.else + | checknumtp [RD], ->fff_fallback_bit_op + | movsd xmm0, qword [RD] + |.endif + | addsd xmm0, xmm1 + | movd RAd, xmm0 + | ins RBd, RAd + | sub RD, 8 + | jmp <1 + |.endmacro + | + |.ffunc_bit_op bit_band, and + |.ffunc_bit_op bit_bor, or + |.ffunc_bit_op bit_bxor, xor + | + |.ffunc_bit bit_bswap, 1 + | bswap RBd + | jmp ->fff_resbit + | + |.ffunc_bit bit_bnot, 1 + | not RBd + |.if DUALNUM + | jmp ->fff_resbit + |.else + |->fff_resbit: + | cvtsi2sd xmm0, RBd + | jmp ->fff_resxmm0 + |.endif + | + |->fff_fallback_bit_op: + | mov NARGS:RDd, TMPRd // Restore for fallback + | jmp ->fff_fallback + | + |.macro .ffunc_bit_sh, name, ins + |.if DUALNUM + | .ffunc_bit name, 1, .ffunc_2 + | // Note: no inline conversion from number for 2nd argument! + | mov RA, [BASE+8] + | checkint RA, ->fff_fallback + |.else + | .ffunc_nn name + | sseconst_tobit xmm2, RB + | addsd xmm0, xmm2 + | addsd xmm1, xmm2 + | movd RBd, xmm0 + | movd RAd, xmm1 + |.endif + | ins RBd, cl // Assumes RA is ecx. + | jmp ->fff_resbit + |.endmacro + | + |.ffunc_bit_sh bit_lshift, shl + |.ffunc_bit_sh bit_rshift, shr + |.ffunc_bit_sh bit_arshift, sar + |.ffunc_bit_sh bit_rol, rol + |.ffunc_bit_sh bit_ror, ror + | + |//----------------------------------------------------------------------- + | + |->fff_fallback_2: + | mov NARGS:RDd, 1+2 // Other args are ignored, anyway. + | jmp ->fff_fallback + |->fff_fallback_1: + | mov NARGS:RDd, 1+1 // Other args are ignored, anyway. + |->fff_fallback: // Call fast function fallback handler. + | // BASE = new base, RD = nargs+1 + | mov L:RB, SAVE_L + | mov PC, [BASE-8] // Fallback may overwrite PC. + | mov SAVE_PC, PC // Redundant (but a defined value). 
+ | mov L:RB->base, BASE + | lea RD, [BASE+NARGS:RD*8-8] + | lea RA, [RD+8*LUA_MINSTACK] // Ensure enough space for handler. + | mov L:RB->top, RD + | mov CFUNC:RD, [BASE-16] + | cleartp CFUNC:RD + | cmp RA, L:RB->maxstack + | ja >5 // Need to grow stack. + | mov CARG1, L:RB + | call aword CFUNC:RD->f // (lua_State *L) + | mov BASE, L:RB->base + | // Either throws an error, or recovers and returns -1, 0 or nresults+1. + | test RDd, RDd; jg ->fff_res // Returned nresults+1? + |1: + | mov RA, L:RB->top + | sub RA, BASE + | shr RAd, 3 + | test RDd, RDd + | lea NARGS:RDd, [RAd+1] + | mov LFUNC:RB, [BASE-16] + | jne ->vm_call_tail // Returned -1? + | cleartp LFUNC:RB + | ins_callt // Returned 0: retry fast path. + | + |// Reconstruct previous base for vmeta_call during tailcall. + |->vm_call_tail: + | mov RA, BASE + | test PCd, FRAME_TYPE + | jnz >3 + | movzx RBd, PC_RA + | neg RB + | lea BASE, [BASE+RB*8-16] // base = base - (RB+2)*8 + | jmp ->vm_call_dispatch // Resolve again for tailcall. + |3: + | mov RB, PC + | and RB, -8 + | sub BASE, RB + | jmp ->vm_call_dispatch // Resolve again for tailcall. + | + |5: // Grow stack for fallback handler. + | mov CARG2d, LUA_MINSTACK + | mov CARG1, L:RB + | call extern lj_state_growstack // (lua_State *L, int n) + | mov BASE, L:RB->base + | xor RDd, RDd // Simulate a return 0. + | jmp <1 // Dumb retry (goes through ff first). + | + |->fff_gcstep: // Call GC step function. + | // BASE = new base, RD = nargs+1 + | pop RB // Must keep stack at same level. + | mov TMP1, RB // Save return address + | mov L:RB, SAVE_L + | mov SAVE_PC, PC // Redundant (but a defined value). + | mov L:RB->base, BASE + | lea RD, [BASE+NARGS:RD*8-8] + | mov CARG1, L:RB + | mov L:RB->top, RD + | call extern lj_gc_step // (lua_State *L) + | mov BASE, L:RB->base + | mov RD, L:RB->top + | sub RD, BASE + | shr RDd, 3 + | add NARGS:RDd, 1 + | mov RB, TMP1 + | push RB // Restore return address. 
+ | ret + | + |//----------------------------------------------------------------------- + |//-- Special dispatch targets ------------------------------------------- + |//----------------------------------------------------------------------- + | + |->vm_record: // Dispatch target for recording phase. + |.if JIT + | movzx RDd, byte [DISPATCH+DISPATCH_GL(hookmask)] + | test RDL, HOOK_VMEVENT // No recording while in vmevent. + | jnz >5 + | // Decrement the hookcount for consistency, but always do the call. + | test RDL, HOOK_ACTIVE + | jnz >1 + | test RDL, LUA_MASKLINE|LUA_MASKCOUNT + | jz >1 + | dec dword [DISPATCH+DISPATCH_GL(hookcount)] + | jmp >1 + |.endif + | + |->vm_rethook: // Dispatch target for return hooks. + | movzx RDd, byte [DISPATCH+DISPATCH_GL(hookmask)] + | test RDL, HOOK_ACTIVE // Hook already active? + | jnz >5 + | jmp >1 + | + |->vm_inshook: // Dispatch target for instr/line hooks. + | movzx RDd, byte [DISPATCH+DISPATCH_GL(hookmask)] + | test RDL, HOOK_ACTIVE // Hook already active? + | jnz >5 + | + | test RDL, LUA_MASKLINE|LUA_MASKCOUNT + | jz >5 + | dec dword [DISPATCH+DISPATCH_GL(hookcount)] + | jz >1 + | test RDL, LUA_MASKLINE + | jz >5 + |1: + | mov L:RB, SAVE_L + | mov L:RB->base, BASE + | mov CARG2, PC // Caveat: CARG2 == BASE + | mov CARG1, L:RB + | // SAVE_PC must hold the _previous_ PC. The callee updates it with PC. + | call extern lj_dispatch_ins // (lua_State *L, const BCIns *pc) + |3: + | mov BASE, L:RB->base + |4: + | movzx RAd, PC_RA + |5: + | movzx OP, PC_OP + | movzx RDd, PC_RD + | jmp aword [DISPATCH+OP*8+GG_DISP2STATIC] // Re-dispatch to static ins. + | + |->cont_hook: // Continue from hook yield. + | add PC, 4 + | mov RA, [RB-40] + | mov MULTRES, RAd // Restore MULTRES for *M ins. + | jmp <4 + | + |->vm_hotloop: // Hot loop counter underflow. + |.if JIT + | mov LFUNC:RB, [BASE-16] // Same as curr_topL(L). 
+ | cleartp LFUNC:RB + | mov RB, LFUNC:RB->pc + | movzx RDd, byte [RB+PC2PROTO(framesize)] + | lea RD, [BASE+RD*8] + | mov L:RB, SAVE_L + | mov L:RB->base, BASE + | mov L:RB->top, RD + | mov CARG2, PC + | lea CARG1, [DISPATCH+GG_DISP2J] + | mov aword [DISPATCH+DISPATCH_J(L)], L:RB + | mov SAVE_PC, PC + | call extern lj_trace_hot // (jit_State *J, const BCIns *pc) + | jmp <3 + |.endif + | + |->vm_callhook: // Dispatch target for call hooks. + | mov SAVE_PC, PC + |.if JIT + | jmp >1 + |.endif + | + |->vm_hotcall: // Hot call counter underflow. + |.if JIT + | mov SAVE_PC, PC + | or PC, 1 // Marker for hot call. + |1: + |.endif + | lea RD, [BASE+NARGS:RD*8-8] + | mov L:RB, SAVE_L + | mov L:RB->base, BASE + | mov L:RB->top, RD + | mov CARG2, PC + | mov CARG1, L:RB + | call extern lj_dispatch_call // (lua_State *L, const BCIns *pc) + | // ASMFunction returned in eax/rax (RD). + | mov SAVE_PC, 0 // Invalidate for subsequent line hook. + |.if JIT + | and PC, -2 + |.endif + | mov BASE, L:RB->base + | mov RA, RD + | mov RD, L:RB->top + | sub RD, BASE + | mov RB, RA + | movzx RAd, PC_RA + | shr RDd, 3 + | add NARGS:RDd, 1 + | jmp RB + | + |->cont_stitch: // Trace stitching. + |.if JIT + | // BASE = base, RC = result, RB = mbase + | mov TRACE:ITYPE, [RB-40] // Save previous trace. + | cleartp TRACE:ITYPE + | mov TMPRd, MULTRES + | movzx RAd, PC_RA + | lea RA, [BASE+RA*8] // Call base. + | sub TMPRd, 1 + | jz >2 + |1: // Move results down. + | mov RB, [RC] + | mov [RA], RB + | add RC, 8 + | add RA, 8 + | sub TMPRd, 1 + | jnz <1 + |2: + | movzx RCd, PC_RA + | movzx RBd, PC_RB + | add RC, RB + | lea RC, [BASE+RC*8-8] + |3: + | cmp RC, RA + | ja >9 // More results wanted? + | + | test TRACE:ITYPE, TRACE:ITYPE + | jz ->cont_nop + | movzx RBd, word TRACE:ITYPE->traceno + | movzx RDd, word TRACE:ITYPE->link + | cmp RDd, RBd + | je ->cont_nop // Blacklisted. + | test RDd, RDd + | jne =>BC_JLOOP // Jump to stitched trace. + | + | // Stitch a new trace to the previous trace. 
+ | mov [DISPATCH+DISPATCH_J(exitno)], RB + | mov L:RB, SAVE_L + | mov L:RB->base, BASE + | mov CARG2, PC + | lea CARG1, [DISPATCH+GG_DISP2J] + | mov aword [DISPATCH+DISPATCH_J(L)], L:RB + | call extern lj_dispatch_stitch // (jit_State *J, const BCIns *pc) + | mov BASE, L:RB->base + | jmp ->cont_nop + | + |9: // Fill up results with nil. + | mov aword [RA], LJ_TNIL + | add RA, 8 + | jmp <3 + |.endif + | + |->vm_profhook: // Dispatch target for profiler hook. +#if LJ_HASPROFILE + | mov L:RB, SAVE_L + | mov L:RB->base, BASE + | mov CARG2, PC // Caveat: CARG2 == BASE + | mov CARG1, L:RB + | call extern lj_dispatch_profile // (lua_State *L, const BCIns *pc) + | mov BASE, L:RB->base + | // HOOK_PROFILE is off again, so re-dispatch to dynamic instruction. + | sub PC, 4 + | jmp ->cont_nop +#endif + | + |//----------------------------------------------------------------------- + |//-- Trace exit handler ------------------------------------------------- + |//----------------------------------------------------------------------- + | + |// Called from an exit stub with the exit number on the stack. + |// The 16 bit exit number is stored with two (sign-extended) push imm8. + |->vm_exit_handler: + |.if JIT + | push r13; push r12 + | push r11; push r10; push r9; push r8 + | push rdi; push rsi; push rbp; lea rbp, [rsp+88]; push rbp + | push rbx; push rdx; push rcx; push rax + | movzx RCd, byte [rbp-8] // Reconstruct exit number. + | mov RCH, byte [rbp-16] + | mov [rbp-8], r15; mov [rbp-16], r14 + | // DISPATCH is preserved on-trace in LJ_GC64 mode. + | mov RAd, [DISPATCH+DISPATCH_GL(vmstate)] // Get trace number. + | set_vmstate EXIT + | mov [DISPATCH+DISPATCH_J(exitno)], RCd + | mov [DISPATCH+DISPATCH_J(parent)], RAd + |.if X64WIN + | sub rsp, 16*8+4*8 // Room for SSE regs + save area. + |.else + | sub rsp, 16*8 // Room for SSE regs. 
+ |.endif + | add rbp, -128 + | movsd qword [rbp-8], xmm15; movsd qword [rbp-16], xmm14 + | movsd qword [rbp-24], xmm13; movsd qword [rbp-32], xmm12 + | movsd qword [rbp-40], xmm11; movsd qword [rbp-48], xmm10 + | movsd qword [rbp-56], xmm9; movsd qword [rbp-64], xmm8 + | movsd qword [rbp-72], xmm7; movsd qword [rbp-80], xmm6 + | movsd qword [rbp-88], xmm5; movsd qword [rbp-96], xmm4 + | movsd qword [rbp-104], xmm3; movsd qword [rbp-112], xmm2 + | movsd qword [rbp-120], xmm1; movsd qword [rbp-128], xmm0 + | // Caveat: RB is rbp. + | mov L:RB, [DISPATCH+DISPATCH_GL(cur_L)] + | mov BASE, [DISPATCH+DISPATCH_GL(jit_base)] + | mov aword [DISPATCH+DISPATCH_J(L)], L:RB + | mov L:RB->base, BASE + |.if X64WIN + | lea CARG2, [rsp+4*8] + |.else + | mov CARG2, rsp + |.endif + | lea CARG1, [DISPATCH+GG_DISP2J] + | mov qword [DISPATCH+DISPATCH_GL(jit_base)], 0 + | call extern lj_trace_exit // (jit_State *J, ExitState *ex) + | // MULTRES or negated error code returned in eax (RD). + | mov RA, L:RB->cframe + | and RA, CFRAME_RAWMASK + | mov [RA+CFRAME_OFS_L], L:RB // Set SAVE_L (on-trace resume/yield). + | mov BASE, L:RB->base + | mov PC, [RA+CFRAME_OFS_PC] // Get SAVE_PC. + | jmp >1 + |.endif + |->vm_exit_interp: + | // RD = MULTRES or negated error code, BASE, PC and DISPATCH set. + |.if JIT + | // Restore additional callee-save registers only used in compiled code. + |.if X64WIN + | lea RA, [rsp+10*16+4*8] + |1: + | movdqa xmm15, [RA-10*16] + | movdqa xmm14, [RA-9*16] + | movdqa xmm13, [RA-8*16] + | movdqa xmm12, [RA-7*16] + | movdqa xmm11, [RA-6*16] + | movdqa xmm10, [RA-5*16] + | movdqa xmm9, [RA-4*16] + | movdqa xmm8, [RA-3*16] + | movdqa xmm7, [RA-2*16] + | mov rsp, RA // Reposition stack to C frame. + | movdqa xmm6, [RA-1*16] + | mov r15, CSAVE_1 + | mov r14, CSAVE_2 + | mov r13, CSAVE_3 + | mov r12, CSAVE_4 + |.else + | lea RA, [rsp+16] + |1: + | mov r13, [RA-8] + | mov r12, [RA] + | mov rsp, RA // Reposition stack to C frame. 
+ |.endif + | cmp RDd, -LUA_ERRERR; jae >9 // Check for error from exit. + | mov L:RB, SAVE_L + | mov MULTRES, RDd + | mov LFUNC:KBASE, [BASE-16] + | cleartp LFUNC:KBASE + | mov KBASE, LFUNC:KBASE->pc + | mov KBASE, [KBASE+PC2PROTO(k)] + | mov L:RB->base, BASE + | mov qword [DISPATCH+DISPATCH_GL(jit_base)], 0 + | set_vmstate INTERP + | // Modified copy of ins_next which handles function header dispatch, too. + | mov RCd, [PC] + | movzx RAd, RCH + | movzx OP, RCL + | add PC, 4 + | shr RCd, 16 + | cmp MULTRES, -17 // Static dispatch? + | je >5 + | cmp OP, BC_FUNCF // Function header? + | jb >3 + | cmp OP, BC_FUNCC+2 // Fast function? + | jae >4 + |2: + | mov RCd, MULTRES // RC/RD holds nres+1. + |3: + | jmp aword [DISPATCH+OP*8] + | + |4: // Check frame below fast function. + | mov RC, [BASE-8] + | test RCd, FRAME_TYPE + | jnz <2 // Trace stitching continuation? + | // Otherwise set KBASE for Lua function below fast function. + | movzx RCd, byte [RC-3] + | neg RC + | mov LFUNC:KBASE, [BASE+RC*8-32] + | cleartp LFUNC:KBASE + | mov KBASE, LFUNC:KBASE->pc + | mov KBASE, [KBASE+PC2PROTO(k)] + | jmp <2 + | + |5: // Dispatch to static entry of original ins replaced by BC_JLOOP. + | mov RA, [DISPATCH+DISPATCH_J(trace)] + | mov TRACE:RA, [RA+RD*8] + | mov RCd, TRACE:RA->startins + | movzx RAd, RCH + | movzx OP, RCL + | shr RCd, 16 + | jmp aword [DISPATCH+OP*8+GG_DISP2STATIC] + | + |9: // Rethrow error from the right C frame. + | mov CARG2d, RDd + | mov CARG1, L:RB + | neg CARG2d + | call extern lj_err_trace // (lua_State *L, int errcode) + |.endif + | + |//----------------------------------------------------------------------- + |//-- Math helper functions ---------------------------------------------- + |//----------------------------------------------------------------------- + | + |// FP value rounding. Called by math.floor/math.ceil fast functions + |// and from JIT code. arg/ret is xmm0. xmm0-xmm3 and RD (eax) modified. 
+ |.macro vm_round, name, mode, cond + |->name: + |->name .. _sse: + | sseconst_abs xmm2, RD + | sseconst_2p52 xmm3, RD + | movaps xmm1, xmm0 + | andpd xmm1, xmm2 // |x| + | ucomisd xmm3, xmm1 // No truncation if 2^52 <= |x|. + | jbe >1 + | andnpd xmm2, xmm0 // Isolate sign bit. + |.if mode == 2 // trunc(x)? + | movaps xmm0, xmm1 + | addsd xmm1, xmm3 // (|x| + 2^52) - 2^52 + | subsd xmm1, xmm3 + | sseconst_1 xmm3, RD + | cmpsd xmm0, xmm1, 1 // |x| < result? + | andpd xmm0, xmm3 + | subsd xmm1, xmm0 // If yes, subtract -1. + | orpd xmm1, xmm2 // Merge sign bit back in. + |.else + | addsd xmm1, xmm3 // (|x| + 2^52) - 2^52 + | subsd xmm1, xmm3 + | orpd xmm1, xmm2 // Merge sign bit back in. + | sseconst_1 xmm3, RD + | .if mode == 1 // ceil(x)? + | cmpsd xmm0, xmm1, 6 // x > result? + | andpd xmm0, xmm3 + | addsd xmm1, xmm0 // If yes, add 1. + | orpd xmm1, xmm2 // Merge sign bit back in (again). + | .else // floor(x)? + | cmpsd xmm0, xmm1, 1 // x < result? + | andpd xmm0, xmm3 + | subsd xmm1, xmm0 // If yes, subtract 1. + | .endif + |.endif + | movaps xmm0, xmm1 + |1: + | ret + |.endmacro + | + | vm_round vm_floor, 0, 1 + | vm_round vm_ceil, 1, JIT + | vm_round vm_trunc, 2, JIT + | + |// FP modulo x%y. Called by BC_MOD* and vm_arith. + |->vm_mod: + |// Args in xmm0/xmm1, return value in xmm0. + |// Caveat: xmm0-xmm5 and RC (eax) modified! + | movaps xmm5, xmm0 + | divsd xmm0, xmm1 + | sseconst_abs xmm2, RD + | sseconst_2p52 xmm3, RD + | movaps xmm4, xmm0 + | andpd xmm4, xmm2 // |x/y| + | ucomisd xmm3, xmm4 // No truncation if 2^52 <= |x/y|. + | jbe >1 + | andnpd xmm2, xmm0 // Isolate sign bit. + | addsd xmm4, xmm3 // (|x/y| + 2^52) - 2^52 + | subsd xmm4, xmm3 + | orpd xmm4, xmm2 // Merge sign bit back in. + | sseconst_1 xmm2, RD + | cmpsd xmm0, xmm4, 1 // x/y < result? + | andpd xmm0, xmm2 + | subsd xmm4, xmm0 // If yes, subtract 1.0. 
+ | movaps xmm0, xmm5 + | mulsd xmm1, xmm4 + | subsd xmm0, xmm1 + | ret + |1: + | mulsd xmm1, xmm0 + | movaps xmm0, xmm5 + | subsd xmm0, xmm1 + | ret + | + |//----------------------------------------------------------------------- + |//-- Miscellaneous functions -------------------------------------------- + |//----------------------------------------------------------------------- + | + |// int lj_vm_cpuid(uint32_t f, uint32_t res[4]) + |->vm_cpuid: + | mov eax, CARG1d + | .if X64WIN; push rsi; mov rsi, CARG2; .endif + | push rbx + | xor ecx, ecx + | cpuid + | mov [rsi], eax + | mov [rsi+4], ebx + | mov [rsi+8], ecx + | mov [rsi+12], edx + | pop rbx + | .if X64WIN; pop rsi; .endif + | ret + | + |.define NEXT_TAB, TAB:CARG1 + |.define NEXT_IDX, CARG2d + |.define NEXT_IDXa, CARG2 + |.define NEXT_PTR, RC + |.define NEXT_PTRd, RCd + |.define NEXT_TMP, CARG3 + |.define NEXT_ASIZE, CARG4d + |.macro NEXT_RES_IDXL, op2; lea edx, [NEXT_IDX+op2]; .endmacro + |.if X64WIN + |.define NEXT_RES_PTR, [rsp+aword*5] + |.macro NEXT_RES_IDX, op2; add NEXT_IDX, op2; .endmacro + |.else + |.define NEXT_RES_PTR, [rsp+aword*1] + |.macro NEXT_RES_IDX, op2; lea edx, [NEXT_IDX+op2]; .endmacro + |.endif + | + |// TValue *lj_vm_next(GCtab *t, uint32_t idx) + |// Next idx returned in edx. + |->vm_next: + |.if JIT + | mov NEXT_ASIZE, NEXT_TAB->asize + |1: // Traverse array part. + | cmp NEXT_IDX, NEXT_ASIZE; jae >5 + | mov NEXT_TMP, NEXT_TAB->array + | mov NEXT_TMP, qword [NEXT_TMP+NEXT_IDX*8] + | cmp NEXT_TMP, LJ_TNIL; je >2 + | lea NEXT_PTR, NEXT_RES_PTR + | mov qword [NEXT_PTR], NEXT_TMP + |.if DUALNUM + | setint NEXT_TMP, NEXT_IDXa + | mov qword [NEXT_PTR+qword*1], NEXT_TMP + |.else + | cvtsi2sd xmm0, NEXT_IDX + | movsd qword [NEXT_PTR+qword*1], xmm0 + |.endif + | NEXT_RES_IDX 1 + | ret + |2: // Skip holes in array part. + | add NEXT_IDX, 1 + | jmp <1 + | + |5: // Traverse hash part. 
+ | sub NEXT_IDX, NEXT_ASIZE + |6: + | cmp NEXT_IDX, NEXT_TAB->hmask; ja >9 + | imul NEXT_PTRd, NEXT_IDX, #NODE + | add NODE:NEXT_PTR, NEXT_TAB->node + | cmp qword NODE:NEXT_PTR->val, LJ_TNIL; je >7 + | NEXT_RES_IDXL NEXT_ASIZE+1 + | ret + |7: // Skip holes in hash part. + | add NEXT_IDX, 1 + | jmp <6 + | + |9: // End of iteration. Set the key to nil (not the value). + | NEXT_RES_IDX NEXT_ASIZE + | lea NEXT_PTR, NEXT_RES_PTR + | mov qword [NEXT_PTR+qword*1], LJ_TNIL + | ret + |.endif + | + |//----------------------------------------------------------------------- + |//-- Assertions --------------------------------------------------------- + |//----------------------------------------------------------------------- + | + |->assert_bad_for_arg_type: +#ifdef LUA_USE_ASSERT + | int3 +#endif + | int3 + | + |//----------------------------------------------------------------------- + |//-- FFI helper functions ----------------------------------------------- + |//----------------------------------------------------------------------- + | + |// Handler for callback functions. Callback slot number in ah/al. + |->vm_ffi_callback: + |.if FFI + |.type CTSTATE, CTState, PC + | saveregs_ // ebp/rbp already saved. ebp now holds global_State *. 
+ | lea DISPATCH, [ebp+GG_G2DISP] + | mov CTSTATE, GL:ebp->ctype_state + | movzx eax, ax + | mov CTSTATE->cb.slot, eax + | mov CTSTATE->cb.gpr[0], CARG1 + | mov CTSTATE->cb.gpr[1], CARG2 + | mov CTSTATE->cb.gpr[2], CARG3 + | mov CTSTATE->cb.gpr[3], CARG4 + | movsd qword CTSTATE->cb.fpr[0], xmm0 + | movsd qword CTSTATE->cb.fpr[1], xmm1 + | movsd qword CTSTATE->cb.fpr[2], xmm2 + | movsd qword CTSTATE->cb.fpr[3], xmm3 + |.if X64WIN + | lea rax, [rsp+CFRAME_SIZE+4*8] + |.else + | lea rax, [rsp+CFRAME_SIZE] + | mov CTSTATE->cb.gpr[4], CARG5 + | mov CTSTATE->cb.gpr[5], CARG6 + | movsd qword CTSTATE->cb.fpr[4], xmm4 + | movsd qword CTSTATE->cb.fpr[5], xmm5 + | movsd qword CTSTATE->cb.fpr[6], xmm6 + | movsd qword CTSTATE->cb.fpr[7], xmm7 + |.endif + | mov CTSTATE->cb.stack, rax + | mov CARG2, rsp + | mov SAVE_PC, CTSTATE // Any value outside of bytecode is ok. + | mov CARG1, CTSTATE + | call extern lj_ccallback_enter // (CTState *cts, void *cf) + | // lua_State * returned in eax (RD). + | set_vmstate INTERP + | mov BASE, L:RD->base + | mov RD, L:RD->top + | sub RD, BASE + | mov LFUNC:RB, [BASE-16] + | cleartp LFUNC:RB + | shr RD, 3 + | add RD, 1 + | ins_callt + |.endif + | + |->cont_ffi_callback: // Return from FFI callback. + |.if FFI + | mov L:RA, SAVE_L + | mov CTSTATE, [DISPATCH+DISPATCH_GL(ctype_state)] + | mov aword CTSTATE->L, L:RA + | mov L:RA->base, BASE + | mov L:RA->top, RB + | mov CARG1, CTSTATE + | mov CARG2, RC + | call extern lj_ccallback_leave // (CTState *cts, TValue *o) + | mov rax, CTSTATE->cb.gpr[0] + | movsd xmm0, qword CTSTATE->cb.fpr[0] + | jmp ->vm_leave_unw + |.endif + | + |->vm_ffi_call: // Call C function via FFI. + | // Caveat: needs special frame unwinding, see below. + |.if FFI + | .type CCSTATE, CCallState, rbx + | push rbp; mov rbp, rsp; push rbx; mov CCSTATE, CARG1 + | + | // Readjust stack. + | mov eax, CCSTATE->spadj + | sub rsp, rax + | + | // Copy stack slots. 
+ | movzx ecx, byte CCSTATE->nsp + | sub ecx, 1 + | js >2 + |1: + | mov rax, [CCSTATE+rcx*8+offsetof(CCallState, stack)] + | mov [rsp+rcx*8+CCALL_SPS_EXTRA*8], rax + | sub ecx, 1 + | jns <1 + |2: + | + | movzx eax, byte CCSTATE->nfpr + | mov CARG1, CCSTATE->gpr[0] + | mov CARG2, CCSTATE->gpr[1] + | mov CARG3, CCSTATE->gpr[2] + | mov CARG4, CCSTATE->gpr[3] + |.if not X64WIN + | mov CARG5, CCSTATE->gpr[4] + | mov CARG6, CCSTATE->gpr[5] + |.endif + | test eax, eax; jz >5 + | movaps xmm0, CCSTATE->fpr[0] + | movaps xmm1, CCSTATE->fpr[1] + | movaps xmm2, CCSTATE->fpr[2] + | movaps xmm3, CCSTATE->fpr[3] + |.if not X64WIN + | cmp eax, 4; jbe >5 + | movaps xmm4, CCSTATE->fpr[4] + | movaps xmm5, CCSTATE->fpr[5] + | movaps xmm6, CCSTATE->fpr[6] + | movaps xmm7, CCSTATE->fpr[7] + |.endif + |5: + | + | call aword CCSTATE->func + | + | mov CCSTATE->gpr[0], rax + | movaps CCSTATE->fpr[0], xmm0 + |.if not X64WIN + | mov CCSTATE->gpr[1], rdx + | movaps CCSTATE->fpr[1], xmm1 + |.endif + | + | mov rbx, [rbp-8]; leave; ret + |.endif + |// Note: vm_ffi_call must be the last function in this object file! + | + |//----------------------------------------------------------------------- +} + +/* Generate the code for a single instruction. */ +static void build_ins(BuildCtx *ctx, BCOp op, int defop) +{ + int vk = 0; + |// Note: aligning all instructions does not pay off. + |=>defop: + + switch (op) { + + /* -- Comparison ops ---------------------------------------------------- */ + + /* Remember: all ops branch for a true comparison, fall through otherwise. */ + + |.macro jmp_comp, lt, ge, le, gt, target + ||switch (op) { + ||case BC_ISLT: + | lt target + ||break; + ||case BC_ISGE: + | ge target + ||break; + ||case BC_ISLE: + | le target + ||break; + ||case BC_ISGT: + | gt target + ||break; + ||default: break; /* Shut up GCC. 
*/ + ||} + |.endmacro + + case BC_ISLT: case BC_ISGE: case BC_ISLE: case BC_ISGT: + | // RA = src1, RD = src2, JMP with RD = target + | ins_AD + | mov ITYPE, [BASE+RA*8] + | mov RB, [BASE+RD*8] + | mov RA, ITYPE + | mov RD, RB + | sar ITYPE, 47 + | sar RB, 47 + |.if DUALNUM + | cmp ITYPEd, LJ_TISNUM; jne >7 + | cmp RBd, LJ_TISNUM; jne >8 + | add PC, 4 + | cmp RAd, RDd + | jmp_comp jge, jl, jg, jle, >9 + |6: + | movzx RDd, PC_RD + | branchPC RD + |9: + | ins_next + | + |7: // RA is not an integer. + | ja ->vmeta_comp + | // RA is a number. + | cmp RBd, LJ_TISNUM; jb >1; jne ->vmeta_comp + | // RA is a number, RD is an integer. + | cvtsi2sd xmm0, RDd + | jmp >2 + | + |8: // RA is an integer, RD is not an integer. + | ja ->vmeta_comp + | // RA is an integer, RD is a number. + | cvtsi2sd xmm1, RAd + | movd xmm0, RD + | jmp >3 + |.else + | cmp ITYPEd, LJ_TISNUM; jae ->vmeta_comp + | cmp RBd, LJ_TISNUM; jae ->vmeta_comp + |.endif + |1: + | movd xmm0, RD + |2: + | movd xmm1, RA + |3: + | add PC, 4 + | ucomisd xmm0, xmm1 + | // Unordered: all of ZF CF PF set, ordered: PF clear. + | // To preserve NaN semantics GE/GT branch on unordered, but LT/LE don't. + |.if DUALNUM + | jmp_comp jbe, ja, jb, jae, <9 + | jmp <6 + |.else + | jmp_comp jbe, ja, jb, jae, >1 + | movzx RDd, PC_RD + | branchPC RD + |1: + | ins_next + |.endif + break; + + case BC_ISEQV: case BC_ISNEV: + vk = op == BC_ISEQV; + | ins_AD // RA = src1, RD = src2, JMP with RD = target + | mov RB, [BASE+RD*8] + | mov ITYPE, [BASE+RA*8] + | add PC, 4 + | mov RD, RB + | mov RA, ITYPE + | sar RB, 47 + | sar ITYPE, 47 + |.if DUALNUM + | cmp RBd, LJ_TISNUM; jne >7 + | cmp ITYPEd, LJ_TISNUM; jne >8 + | cmp RDd, RAd + if (vk) { + | jne >9 + } else { + | je >9 + } + | movzx RDd, PC_RD + | branchPC RD + |9: + | ins_next + | + |7: // RD is not an integer. + | ja >5 + | // RD is a number. + | movd xmm1, RD + | cmp ITYPEd, LJ_TISNUM; jb >1; jne >5 + | // RD is a number, RA is an integer. 
+ | cvtsi2sd xmm0, RAd + | jmp >2 + | + |8: // RD is an integer, RA is not an integer. + | ja >5 + | // RD is an integer, RA is a number. + | cvtsi2sd xmm1, RDd + | jmp >1 + | + |.else + | cmp RBd, LJ_TISNUM; jae >5 + | cmp ITYPEd, LJ_TISNUM; jae >5 + | movd xmm1, RD + |.endif + |1: + | movd xmm0, RA + |2: + | ucomisd xmm0, xmm1 + |4: + iseqne_fp: + if (vk) { + | jp >2 // Unordered means not equal. + | jne >2 + } else { + | jp >2 // Unordered means not equal. + | je >1 + } + iseqne_end: + if (vk) { + |1: // EQ: Branch to the target. + | movzx RDd, PC_RD + | branchPC RD + |2: // NE: Fallthrough to next instruction. + |.if not FFI + |3: + |.endif + } else { + |.if not FFI + |3: + |.endif + |2: // NE: Branch to the target. + | movzx RDd, PC_RD + | branchPC RD + |1: // EQ: Fallthrough to next instruction. + } + if (LJ_DUALNUM && (op == BC_ISEQV || op == BC_ISNEV || + op == BC_ISEQN || op == BC_ISNEN)) { + | jmp <9 + } else { + | ins_next + } + | + if (op == BC_ISEQV || op == BC_ISNEV) { + |5: // Either or both types are not numbers. + |.if FFI + | cmp RBd, LJ_TCDATA; je ->vmeta_equal_cd + | cmp ITYPEd, LJ_TCDATA; je ->vmeta_equal_cd + |.endif + | cmp RA, RD + | je <1 // Same GCobjs or pvalues? + | cmp RBd, ITYPEd + | jne <2 // Not the same type? + | cmp RBd, LJ_TISTABUD + | ja <2 // Different objects and not table/ud? + | + | // Different tables or userdatas. Need to check __eq metamethod. + | // Field metatable must be at same offset for GCtab and GCudata! + | cleartp TAB:RA + | mov TAB:RB, TAB:RA->metatable + | test TAB:RB, TAB:RB + | jz <2 // No metatable? + | test byte TAB:RB->nomm, 1<vmeta_equal // Handle __eq metamethod. 
+ } else { + |.if FFI + |3: + | cmp ITYPEd, LJ_TCDATA + if (LJ_DUALNUM && vk) { + | jne <9 + } else { + | jne <2 + } + | jmp ->vmeta_equal_cd + |.endif + } + break; + case BC_ISEQS: case BC_ISNES: + vk = op == BC_ISEQS; + | ins_AND // RA = src, RD = str const, JMP with RD = target + | mov RB, [BASE+RA*8] + | add PC, 4 + | checkstr RB, >3 + | cmp RB, [KBASE+RD*8] + iseqne_test: + if (vk) { + | jne >2 + } else { + | je >1 + } + goto iseqne_end; + case BC_ISEQN: case BC_ISNEN: + vk = op == BC_ISEQN; + | ins_AD // RA = src, RD = num const, JMP with RD = target + | mov RB, [BASE+RA*8] + | add PC, 4 + |.if DUALNUM + | checkint RB, >7 + | mov RD, [KBASE+RD*8] + | checkint RD, >8 + | cmp RBd, RDd + if (vk) { + | jne >9 + } else { + | je >9 + } + | movzx RDd, PC_RD + | branchPC RD + |9: + | ins_next + | + |7: // RA is not an integer. + | ja >3 + | // RA is a number. + | mov RD, [KBASE+RD*8] + | checkint RD, >1 + | // RA is a number, RD is an integer. + | cvtsi2sd xmm0, RDd + | jmp >2 + | + |8: // RA is an integer, RD is a number. 
+ | cvtsi2sd xmm0, RBd + | movd xmm1, RD + | ucomisd xmm0, xmm1 + | jmp >4 + |1: + | movd xmm0, RD + |.else + | checknum RB, >3 + |1: + | movsd xmm0, qword [KBASE+RD*8] + |.endif + |2: + | ucomisd xmm0, qword [BASE+RA*8] + |4: + goto iseqne_fp; + case BC_ISEQP: case BC_ISNEP: + vk = op == BC_ISEQP; + | ins_AND // RA = src, RD = primitive type (~), JMP with RD = target + | mov RB, [BASE+RA*8] + | sar RB, 47 + | add PC, 4 + | cmp RBd, RDd + if (!LJ_HASFFI) goto iseqne_test; + if (vk) { + | jne >3 + | movzx RDd, PC_RD + | branchPC RD + |2: + | ins_next + |3: + | cmp RBd, LJ_TCDATA; jne <2 + | jmp ->vmeta_equal_cd + } else { + | je >2 + | cmp RBd, LJ_TCDATA; je ->vmeta_equal_cd + | movzx RDd, PC_RD + | branchPC RD + |2: + | ins_next + } + break; + + /* -- Unary test and copy ops ------------------------------------------- */ + + case BC_ISTC: case BC_ISFC: case BC_IST: case BC_ISF: + | ins_AD // RA = dst or unused, RD = src, JMP with RD = target + | mov ITYPE, [BASE+RD*8] + | add PC, 4 + if (op == BC_ISTC || op == BC_ISFC) { + | mov RB, ITYPE + } + | sar ITYPE, 47 + | cmp ITYPEd, LJ_TISTRUECOND + if (op == BC_IST || op == BC_ISTC) { + | jae >1 + } else { + | jb >1 + } + if (op == BC_ISTC || op == BC_ISFC) { + | mov [BASE+RA*8], RB + } + | movzx RDd, PC_RD + | branchPC RD + |1: // Fallthrough to the next instruction. 
+ | ins_next + break; + + case BC_ISTYPE: + | ins_AD // RA = src, RD = -type + | mov RB, [BASE+RA*8] + | sar RB, 47 + | add RBd, RDd + | jne ->vmeta_istype + | ins_next + break; + case BC_ISNUM: + | ins_AD // RA = src, RD = -(TISNUM-1) + | checknumtp [BASE+RA*8], ->vmeta_istype + | ins_next + break; + + /* -- Unary ops --------------------------------------------------------- */ + + case BC_MOV: + | ins_AD // RA = dst, RD = src + | mov RB, [BASE+RD*8] + | mov [BASE+RA*8], RB + | ins_next_ + break; + case BC_NOT: + | ins_AD // RA = dst, RD = src + | mov RB, [BASE+RD*8] + | sar RB, 47 + | mov RCd, 2 + | cmp RB, LJ_TISTRUECOND + | sbb RCd, 0 + | shl RC, 47 + | not RC + | mov [BASE+RA*8], RC + | ins_next + break; + case BC_UNM: + | ins_AD // RA = dst, RD = src + | mov RB, [BASE+RD*8] + |.if DUALNUM + | checkint RB, >5 + | neg RBd + | jo >4 + | setint RB + |9: + | mov [BASE+RA*8], RB + | ins_next + |4: + | mov64 RB, U64x(41e00000,00000000) // 2^31. + | jmp <9 + |5: + | ja ->vmeta_unm + |.else + | checknum RB, ->vmeta_unm + |.endif + | mov64 RD, U64x(80000000,00000000) + | xor RB, RD + |.if DUALNUM + | jmp <9 + |.else + | mov [BASE+RA*8], RB + | ins_next + |.endif + break; + case BC_LEN: + | ins_AD // RA = dst, RD = src + | mov RD, [BASE+RD*8] + | checkstr RD, >2 + |.if DUALNUM + | mov RDd, dword STR:RD->len + |1: + | setint RD + | mov [BASE+RA*8], RD + |.else + | xorps xmm0, xmm0 + | cvtsi2sd xmm0, dword STR:RD->len + |1: + | movsd qword [BASE+RA*8], xmm0 + |.endif + | ins_next + |2: + | cmp ITYPEd, LJ_TTAB; jne ->vmeta_len + | mov TAB:CARG1, TAB:RD +#if LJ_52 + | mov TAB:RB, TAB:RD->metatable + | cmp TAB:RB, 0 + | jnz >9 + |3: +#endif + |->BC_LEN_Z: + | mov RB, BASE // Save BASE. + | call extern lj_tab_len // (GCtab *t) + | // Length of table returned in eax (RD). + |.if DUALNUM + | // Nothing to do. + |.else + | cvtsi2sd xmm0, RDd + |.endif + | mov BASE, RB // Restore BASE. + | movzx RAd, PC_RA + | jmp <1 +#if LJ_52 + |9: // Check for __len. 
+ | test byte TAB:RB->nomm, 1<<MM_len + | jnz <3 + | jmp ->vmeta_len // 'no __len' flag NOT set: check. +#endif + break; + + /* -- Binary ops -------------------------------------------------------- */ + + |.macro ins_arithpre, sseins, ssereg + | ins_ABC + ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN); + ||switch (vk) { + ||case 0: + | checknumtp [BASE+RB*8], ->vmeta_arith_vn + | .if DUALNUM + | checknumtp [KBASE+RC*8], ->vmeta_arith_vn + | .endif + | movsd xmm0, qword [BASE+RB*8] + | sseins ssereg, qword [KBASE+RC*8] + || break; + ||case 1: + | checknumtp [BASE+RB*8], ->vmeta_arith_nv + | .if DUALNUM + | checknumtp [KBASE+RC*8], ->vmeta_arith_nv + | .endif + | movsd xmm0, qword [KBASE+RC*8] + | sseins ssereg, qword [BASE+RB*8] + || break; + ||default: + | checknumtp [BASE+RB*8], ->vmeta_arith_vv + | checknumtp [BASE+RC*8], ->vmeta_arith_vv + | movsd xmm0, qword [BASE+RB*8] + | sseins ssereg, qword [BASE+RC*8] + || break; + ||} + |.endmacro + | + |.macro ins_arithdn, intins + | ins_ABC + ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN); + ||switch (vk) { + ||case 0: + | mov RB, [BASE+RB*8] + | mov RC, [KBASE+RC*8] + | checkint RB, ->vmeta_arith_vno + | checkint RC, ->vmeta_arith_vno + | intins RBd, RCd; jo ->vmeta_arith_vno + || break; + ||case 1: + | mov RB, [BASE+RB*8] + | mov RC, [KBASE+RC*8] + | checkint RB, ->vmeta_arith_nvo + | checkint RC, ->vmeta_arith_nvo + | intins RCd, RBd; jo ->vmeta_arith_nvo + || break; + ||default: + | mov RB, [BASE+RB*8] + | mov RC, [BASE+RC*8] + | checkint RB, ->vmeta_arith_vvo + | checkint RC, ->vmeta_arith_vvo + | intins RBd, RCd; jo ->vmeta_arith_vvo + || break; + ||} + ||if (vk == 1) { + | setint RC + | mov [BASE+RA*8], RC + ||} else { + | setint RB + | mov [BASE+RA*8], RB + ||} + | ins_next + |.endmacro + | + |.macro ins_arithpost + | movsd qword [BASE+RA*8], xmm0 + |.endmacro + | + |.macro ins_arith, sseins + | ins_arithpre sseins, xmm0 + | ins_arithpost + | ins_next + |.endmacro + | + |.macro ins_arith, intins, sseins + |.if DUALNUM + | ins_arithdn
intins + |.else + | ins_arith, sseins + |.endif + |.endmacro + + | // RA = dst, RB = src1 or num const, RC = src2 or num const + case BC_ADDVN: case BC_ADDNV: case BC_ADDVV: + | ins_arith add, addsd + break; + case BC_SUBVN: case BC_SUBNV: case BC_SUBVV: + | ins_arith sub, subsd + break; + case BC_MULVN: case BC_MULNV: case BC_MULVV: + | ins_arith imul, mulsd + break; + case BC_DIVVN: case BC_DIVNV: case BC_DIVVV: + | ins_arith divsd + break; + case BC_MODVN: + | ins_arithpre movsd, xmm1 + |->BC_MODVN_Z: + | call ->vm_mod + | ins_arithpost + | ins_next + break; + case BC_MODNV: case BC_MODVV: + | ins_arithpre movsd, xmm1 + | jmp ->BC_MODVN_Z // Avoid 3 copies. It's slow anyway. + break; + case BC_POW: + | ins_arithpre movsd, xmm1 + | mov RB, BASE + | call extern pow + | movzx RAd, PC_RA + | mov BASE, RB + | ins_arithpost + | ins_next + break; + + case BC_CAT: + | ins_ABC // RA = dst, RB = src_start, RC = src_end + | mov L:CARG1, SAVE_L + | mov L:CARG1->base, BASE + | lea CARG2, [BASE+RC*8] + | mov CARG3d, RCd + | sub CARG3d, RBd + |->BC_CAT_Z: + | mov L:RB, L:CARG1 + | mov SAVE_PC, PC + | call extern lj_meta_cat // (lua_State *L, TValue *top, int left) + | // NULL (finished) or TValue * (metamethod) returned in eax (RC). + | mov BASE, L:RB->base + | test RC, RC + | jnz ->vmeta_binop + | movzx RBd, PC_RB // Copy result to Stk[RA] from Stk[RB]. 
+ | movzx RAd, PC_RA + | mov RC, [BASE+RB*8] + | mov [BASE+RA*8], RC + | ins_next + break; + + /* -- Constant ops ------------------------------------------------------ */ + + case BC_KSTR: + | ins_AND // RA = dst, RD = str const (~) + | mov RD, [KBASE+RD*8] + | settp RD, LJ_TSTR + | mov [BASE+RA*8], RD + | ins_next + break; + case BC_KCDATA: + |.if FFI + | ins_AND // RA = dst, RD = cdata const (~) + | mov RD, [KBASE+RD*8] + | settp RD, LJ_TCDATA + | mov [BASE+RA*8], RD + | ins_next + |.endif + break; + case BC_KSHORT: + | ins_AD // RA = dst, RD = signed int16 literal + |.if DUALNUM + | movsx RDd, RDW + | setint RD + | mov [BASE+RA*8], RD + |.else + | movsx RDd, RDW // Sign-extend literal. + | cvtsi2sd xmm0, RDd + | movsd qword [BASE+RA*8], xmm0 + |.endif + | ins_next + break; + case BC_KNUM: + | ins_AD // RA = dst, RD = num const + | movsd xmm0, qword [KBASE+RD*8] + | movsd qword [BASE+RA*8], xmm0 + | ins_next + break; + case BC_KPRI: + | ins_AD // RA = dst, RD = primitive type (~) + | shl RD, 47 + | not RD + | mov [BASE+RA*8], RD + | ins_next + break; + case BC_KNIL: + | ins_AD // RA = dst_start, RD = dst_end + | lea RA, [BASE+RA*8+8] + | lea RD, [BASE+RD*8] + | mov RB, LJ_TNIL + | mov [RA-8], RB // Sets minimum 2 slots. 
+ |1: + | mov [RA], RB + | add RA, 8 + | cmp RA, RD + | jbe <1 + | ins_next + break; + + /* -- Upvalue and function ops ------------------------------------------ */ + + case BC_UGET: + | ins_AD // RA = dst, RD = upvalue # + | mov LFUNC:RB, [BASE-16] + | cleartp LFUNC:RB + | mov UPVAL:RB, [LFUNC:RB+RD*8+offsetof(GCfuncL, uvptr)] + | mov RB, UPVAL:RB->v + | mov RD, [RB] + | mov [BASE+RA*8], RD + | ins_next + break; + case BC_USETV: +#define TV2MARKOFS \ + ((int32_t)offsetof(GCupval, marked)-(int32_t)offsetof(GCupval, tv)) + | ins_AD // RA = upvalue #, RD = src + | mov LFUNC:RB, [BASE-16] + | cleartp LFUNC:RB + | mov UPVAL:RB, [LFUNC:RB+RA*8+offsetof(GCfuncL, uvptr)] + | cmp byte UPVAL:RB->closed, 0 + | mov RB, UPVAL:RB->v + | mov RA, [BASE+RD*8] + | mov [RB], RA + | jz >1 + | // Check barrier for closed upvalue. + | test byte [RB+TV2MARKOFS], LJ_GC_BLACK // isblack(uv) + | jnz >2 + |1: + | ins_next + | + |2: // Upvalue is black. Check if new value is collectable and white. + | mov RD, RA + | sar RD, 47 + | sub RDd, LJ_TISGCV + | cmp RDd, LJ_TNUMX - LJ_TISGCV // tvisgcv(v) + | jbe <1 + | cleartp GCOBJ:RA + | test byte GCOBJ:RA->gch.marked, LJ_GC_WHITES // iswhite(v) + | jz <1 + | // Crossed a write barrier. Move the barrier forward. + |.if not X64WIN + | mov CARG2, RB + | mov RB, BASE // Save BASE. + |.else + | xchg CARG2, RB // Save BASE (CARG2 == BASE). + |.endif + | lea GL:CARG1, [DISPATCH+GG_DISP2G] + | call extern lj_gc_barrieruv // (global_State *g, TValue *tv) + | mov BASE, RB // Restore BASE. 
+ | jmp <1 + break; +#undef TV2MARKOFS + case BC_USETS: + | ins_AND // RA = upvalue #, RD = str const (~) + | mov LFUNC:RB, [BASE-16] + | cleartp LFUNC:RB + | mov UPVAL:RB, [LFUNC:RB+RA*8+offsetof(GCfuncL, uvptr)] + | mov STR:RA, [KBASE+RD*8] + | mov RD, UPVAL:RB->v + | settp STR:ITYPE, STR:RA, LJ_TSTR + | mov [RD], STR:ITYPE + | test byte UPVAL:RB->marked, LJ_GC_BLACK // isblack(uv) + | jnz >2 + |1: + | ins_next + | + |2: // Check if string is white and ensure upvalue is closed. + | test byte GCOBJ:RA->gch.marked, LJ_GC_WHITES // iswhite(str) + | jz <1 + | cmp byte UPVAL:RB->closed, 0 + | jz <1 + | // Crossed a write barrier. Move the barrier forward. + | mov RB, BASE // Save BASE (CARG2 == BASE). + | mov CARG2, RD + | lea GL:CARG1, [DISPATCH+GG_DISP2G] + | call extern lj_gc_barrieruv // (global_State *g, TValue *tv) + | mov BASE, RB // Restore BASE. + | jmp <1 + break; + case BC_USETN: + | ins_AD // RA = upvalue #, RD = num const + | mov LFUNC:RB, [BASE-16] + | cleartp LFUNC:RB + | movsd xmm0, qword [KBASE+RD*8] + | mov UPVAL:RB, [LFUNC:RB+RA*8+offsetof(GCfuncL, uvptr)] + | mov RA, UPVAL:RB->v + | movsd qword [RA], xmm0 + | ins_next + break; + case BC_USETP: + | ins_AD // RA = upvalue #, RD = primitive type (~) + | mov LFUNC:RB, [BASE-16] + | cleartp LFUNC:RB + | mov UPVAL:RB, [LFUNC:RB+RA*8+offsetof(GCfuncL, uvptr)] + | shl RD, 47 + | not RD + | mov RA, UPVAL:RB->v + | mov [RA], RD + | ins_next + break; + case BC_UCLO: + | ins_AD // RA = level, RD = target + | branchPC RD // Do this first to free RD. 
+ | mov L:RB, SAVE_L + | cmp aword L:RB->openupval, 0 + | je >1 + | mov L:RB->base, BASE + | lea CARG2, [BASE+RA*8] // Caveat: CARG2 == BASE + | mov L:CARG1, L:RB // Caveat: CARG1 == RA + | call extern lj_func_closeuv // (lua_State *L, TValue *level) + | mov BASE, L:RB->base + |1: + | ins_next + break; + + case BC_FNEW: + | ins_AND // RA = dst, RD = proto const (~) (holding function prototype) + | mov L:RB, SAVE_L + | mov L:RB->base, BASE // Caveat: CARG2/CARG3 may be BASE. + | mov CARG3, [BASE-16] + | cleartp CARG3 + | mov CARG2, [KBASE+RD*8] // Fetch GCproto *. + | mov CARG1, L:RB + | mov SAVE_PC, PC + | // (lua_State *L, GCproto *pt, GCfuncL *parent) + | call extern lj_func_newL_gc + | // GCfuncL * returned in eax (RC). + | mov BASE, L:RB->base + | movzx RAd, PC_RA + | settp LFUNC:RC, LJ_TFUNC + | mov [BASE+RA*8], LFUNC:RC + | ins_next + break; + + /* -- Table ops --------------------------------------------------------- */ + + case BC_TNEW: + | ins_AD // RA = dst, RD = hbits|asize + | mov L:RB, SAVE_L + | mov L:RB->base, BASE + | mov RA, [DISPATCH+DISPATCH_GL(gc.total)] + | cmp RA, [DISPATCH+DISPATCH_GL(gc.threshold)] + | mov SAVE_PC, PC + | jae >5 + |1: + | mov CARG3d, RDd + | and RDd, 0x7ff + | shr CARG3d, 11 + | cmp RDd, 0x7ff + | je >3 + |2: + | mov L:CARG1, L:RB + | mov CARG2d, RDd + | call extern lj_tab_new // (lua_State *L, int32_t asize, uint32_t hbits) + | // Table * returned in eax (RC). + | mov BASE, L:RB->base + | movzx RAd, PC_RA + | settp TAB:RC, LJ_TTAB + | mov [BASE+RA*8], TAB:RC + | ins_next + |3: // Turn 0x7ff into 0x801. 
+ | mov RDd, 0x801 + | jmp <2 + |5: + | mov L:CARG1, L:RB + | call extern lj_gc_step_fixtop // (lua_State *L) + | movzx RDd, PC_RD + | jmp <1 + break; + case BC_TDUP: + | ins_AND // RA = dst, RD = table const (~) (holding template table) + | mov L:RB, SAVE_L + | mov RA, [DISPATCH+DISPATCH_GL(gc.total)] + | mov SAVE_PC, PC + | cmp RA, [DISPATCH+DISPATCH_GL(gc.threshold)] + | mov L:RB->base, BASE + | jae >3 + |2: + | mov TAB:CARG2, [KBASE+RD*8] // Caveat: CARG2 == BASE + | mov L:CARG1, L:RB // Caveat: CARG1 == RA + | call extern lj_tab_dup // (lua_State *L, Table *kt) + | // Table * returned in eax (RC). + | mov BASE, L:RB->base + | movzx RAd, PC_RA + | settp TAB:RC, LJ_TTAB + | mov [BASE+RA*8], TAB:RC + | ins_next + |3: + | mov L:CARG1, L:RB + | call extern lj_gc_step_fixtop // (lua_State *L) + | movzx RDd, PC_RD // Need to reload RD. + | not RD + | jmp <2 + break; + + case BC_GGET: + | ins_AND // RA = dst, RD = str const (~) + | mov LFUNC:RB, [BASE-16] + | cleartp LFUNC:RB + | mov TAB:RB, LFUNC:RB->env + | mov STR:RC, [KBASE+RD*8] + | jmp ->BC_TGETS_Z + break; + case BC_GSET: + | ins_AND // RA = src, RD = str const (~) + | mov LFUNC:RB, [BASE-16] + | cleartp LFUNC:RB + | mov TAB:RB, LFUNC:RB->env + | mov STR:RC, [KBASE+RD*8] + | jmp ->BC_TSETS_Z + break; + + case BC_TGETV: + | ins_ABC // RA = dst, RB = table, RC = key + | mov TAB:RB, [BASE+RB*8] + | mov RC, [BASE+RC*8] + | checktab TAB:RB, ->vmeta_tgetv + | + | // Integer key? + |.if DUALNUM + | checkint RC, >5 + |.else + | // Convert number to int and back and compare. + | checknum RC, >5 + | movd xmm0, RC + | cvttsd2si RCd, xmm0 + | cvtsi2sd xmm1, RCd + | ucomisd xmm0, xmm1 + | jne ->vmeta_tgetv // Generic numeric key? Use fallback. + |.endif + | cmp RCd, TAB:RB->asize // Takes care of unordered, too. + | jae ->vmeta_tgetv // Not in array part? Use fallback. + | shl RCd, 3 + | add RC, TAB:RB->array + | // Get array slot. + | mov ITYPE, [RC] + | cmp ITYPE, LJ_TNIL // Avoid overwriting RB in fastpath. 
+ | je >2 + |1: + | mov [BASE+RA*8], ITYPE + | ins_next + | + |2: // Check for __index if table value is nil. + | mov TAB:TMPR, TAB:RB->metatable + | test TAB:TMPR, TAB:TMPR + | jz <1 + | test byte TAB:TMPR->nomm, 1<<MM_index + | jz ->vmeta_tgetv // 'no __index' flag NOT set: check. + | jmp <1 + | + |5: // String key? + | cmp ITYPEd, LJ_TSTR; jne ->vmeta_tgetv + | cleartp STR:RC + | jmp ->BC_TGETS_Z + break; + case BC_TGETS: + | ins_ABC // RA = dst, RB = table, RC = str const (~) + | mov TAB:RB, [BASE+RB*8] + | not RC + | mov STR:RC, [KBASE+RC*8] + | checktab TAB:RB, ->vmeta_tgets + |->BC_TGETS_Z: // RB = GCtab *, RC = GCstr * + | mov TMPRd, TAB:RB->hmask + | and TMPRd, STR:RC->sid + | imul TMPRd, #NODE + | add NODE:TMPR, TAB:RB->node + | settp ITYPE, STR:RC, LJ_TSTR + |1: + | cmp NODE:TMPR->key, ITYPE + | jne >4 + | // Get node value. + | mov ITYPE, NODE:TMPR->val + | cmp ITYPE, LJ_TNIL + | je >5 // Key found, but nil value? + |2: + | mov [BASE+RA*8], ITYPE + | ins_next + | + |4: // Follow hash chain. + | mov NODE:TMPR, NODE:TMPR->next + | test NODE:TMPR, NODE:TMPR + | jnz <1 + | // End of hash chain: key not found, nil result. + | mov ITYPE, LJ_TNIL + | + |5: // Check for __index if table value is nil. + | mov TAB:TMPR, TAB:RB->metatable + | test TAB:TMPR, TAB:TMPR + | jz <2 // No metatable: done. + | test byte TAB:TMPR->nomm, 1<<MM_index + | jnz <2 + | jmp ->vmeta_tgets // Caveat: preserve STR:RC. + break; + case BC_TGETB: + | ins_ABC // RA = dst, RB = table, RC = byte literal + | mov TAB:RB, [BASE+RB*8] + | checktab TAB:RB, ->vmeta_tgetb + | cmp RCd, TAB:RB->asize + | jae ->vmeta_tgetb + | shl RCd, 3 + | add RC, TAB:RB->array + | // Get array slot. + | mov ITYPE, [RC] + | cmp ITYPE, LJ_TNIL + | je >2 + |1: + | mov [BASE+RA*8], ITYPE + | ins_next + | + |2: // Check for __index if table value is nil. + | mov TAB:TMPR, TAB:RB->metatable + | test TAB:TMPR, TAB:TMPR + | jz <1 + | test byte TAB:TMPR->nomm, 1<<MM_index + | jz ->vmeta_tgetb // 'no __index' flag NOT set: check.
+ | jmp <1 + break; + case BC_TGETR: + | ins_ABC // RA = dst, RB = table, RC = key + | mov TAB:RB, [BASE+RB*8] + | cleartp TAB:RB + |.if DUALNUM + | mov RCd, dword [BASE+RC*8] + |.else + | cvttsd2si RCd, qword [BASE+RC*8] + |.endif + | cmp RCd, TAB:RB->asize + | jae ->vmeta_tgetr // Not in array part? Use fallback. + | shl RCd, 3 + | add RC, TAB:RB->array + | // Get array slot. + |->BC_TGETR_Z: + | mov ITYPE, [RC] + |->BC_TGETR2_Z: + | mov [BASE+RA*8], ITYPE + | ins_next + break; + + case BC_TSETV: + | ins_ABC // RA = src, RB = table, RC = key + | mov TAB:RB, [BASE+RB*8] + | mov RC, [BASE+RC*8] + | checktab TAB:RB, ->vmeta_tsetv + | + | // Integer key? + |.if DUALNUM + | checkint RC, >5 + |.else + | // Convert number to int and back and compare. + | checknum RC, >5 + | movd xmm0, RC + | cvttsd2si RCd, xmm0 + | cvtsi2sd xmm1, RCd + | ucomisd xmm0, xmm1 + | jne ->vmeta_tsetv // Generic numeric key? Use fallback. + |.endif + | cmp RCd, TAB:RB->asize // Takes care of unordered, too. + | jae ->vmeta_tsetv + | shl RCd, 3 + | add RC, TAB:RB->array + | cmp aword [RC], LJ_TNIL + | je >3 // Previous value is nil? + |1: + | test byte TAB:RB->marked, LJ_GC_BLACK // isblack(table) + | jnz >7 + |2: // Set array slot. + | mov RB, [BASE+RA*8] + | mov [RC], RB + | ins_next + | + |3: // Check for __newindex if previous value is nil. + | mov TAB:TMPR, TAB:RB->metatable + | test TAB:TMPR, TAB:TMPR + | jz <1 + | test byte TAB:TMPR->nomm, 1<<MM_newindex + | jz ->vmeta_tsetv // 'no __newindex' flag NOT set: check. + | jmp <1 + | + |5: // String key? + | cmp ITYPEd, LJ_TSTR; jne ->vmeta_tsetv + | cleartp STR:RC + | jmp ->BC_TSETS_Z + | + |7: // Possible table write barrier for the value. Skip valiswhite check.
+ | barrierback TAB:RB, TMPR + | jmp <2 + break; + case BC_TSETS: + | ins_ABC // RA = src, RB = table, RC = str const (~) + | mov TAB:RB, [BASE+RB*8] + | not RC + | mov STR:RC, [KBASE+RC*8] + | checktab TAB:RB, ->vmeta_tsets + |->BC_TSETS_Z: // RB = GCtab *, RC = GCstr * + | mov TMPRd, TAB:RB->hmask + | and TMPRd, STR:RC->sid + | imul TMPRd, #NODE + | mov byte TAB:RB->nomm, 0 // Clear metamethod cache. + | add NODE:TMPR, TAB:RB->node + | settp ITYPE, STR:RC, LJ_TSTR + |1: + | cmp NODE:TMPR->key, ITYPE + | jne >5 + | // Ok, key found. Assumes: offsetof(Node, val) == 0 + | cmp aword [TMPR], LJ_TNIL + | je >4 // Previous value is nil? + |2: + | test byte TAB:RB->marked, LJ_GC_BLACK // isblack(table) + | jnz >7 + |3: // Set node value. + | mov ITYPE, [BASE+RA*8] + | mov [TMPR], ITYPE + | ins_next + | + |4: // Check for __newindex if previous value is nil. + | mov TAB:ITYPE, TAB:RB->metatable + | test TAB:ITYPE, TAB:ITYPE + | jz <2 + | test byte TAB:ITYPE->nomm, 1<<MM_newindex + | jz ->vmeta_tsets // 'no __newindex' flag NOT set: check. + | jmp <2 + | + |5: // Follow hash chain. + | mov NODE:TMPR, NODE:TMPR->next + | test NODE:TMPR, NODE:TMPR + | jnz <1 + | // End of hash chain: key not found, add a new one. + | + | // But check for __newindex first. + | mov TAB:TMPR, TAB:RB->metatable + | test TAB:TMPR, TAB:TMPR + | jz >6 // No metatable: continue. + | test byte TAB:TMPR->nomm, 1<<MM_newindex + | jz ->vmeta_tsets // 'no __newindex' flag NOT set: check. + |6: + | mov TMP1, ITYPE + | mov L:CARG1, SAVE_L + | mov L:CARG1->base, BASE + | lea CARG3, TMP1 + | mov CARG2, TAB:RB + | mov SAVE_PC, PC + | call extern lj_tab_newkey // (lua_State *L, GCtab *t, TValue *k) + | // Handles write barrier for the new key. TValue * returned in eax (RC). + | mov L:CARG1, SAVE_L + | mov BASE, L:CARG1->base + | mov TMPR, rax + | movzx RAd, PC_RA + | jmp <2 // Must check write barrier for value. + | + |7: // Possible table write barrier for the value. Skip valiswhite check.
+ | barrierback TAB:RB, ITYPE + | jmp <3 + break; + case BC_TSETB: + | ins_ABC // RA = src, RB = table, RC = byte literal + | mov TAB:RB, [BASE+RB*8] + | checktab TAB:RB, ->vmeta_tsetb + | cmp RCd, TAB:RB->asize + | jae ->vmeta_tsetb + | shl RCd, 3 + | add RC, TAB:RB->array + | cmp aword [RC], LJ_TNIL + | je >3 // Previous value is nil? + |1: + | test byte TAB:RB->marked, LJ_GC_BLACK // isblack(table) + | jnz >7 + |2: // Set array slot. + | mov ITYPE, [BASE+RA*8] + | mov [RC], ITYPE + | ins_next + | + |3: // Check for __newindex if previous value is nil. + | mov TAB:TMPR, TAB:RB->metatable + | test TAB:TMPR, TAB:TMPR + | jz <1 + | test byte TAB:TMPR->nomm, 1<<MM_newindex + | jz ->vmeta_tsetb // 'no __newindex' flag NOT set: check. + | jmp <1 + | + |7: // Possible table write barrier for the value. Skip valiswhite check. + | barrierback TAB:RB, TMPR + | jmp <2 + break; + case BC_TSETR: + | ins_ABC // RA = src, RB = table, RC = key + | mov TAB:RB, [BASE+RB*8] + | cleartp TAB:RB + |.if DUALNUM + | mov RC, [BASE+RC*8] + |.else + | cvttsd2si RCd, qword [BASE+RC*8] + |.endif + | test byte TAB:RB->marked, LJ_GC_BLACK // isblack(table) + | jnz >7 + |2: + | cmp RCd, TAB:RB->asize + | jae ->vmeta_tsetr + | shl RCd, 3 + | add RC, TAB:RB->array + | // Set array slot. + |->BC_TSETR_Z: + | mov ITYPE, [BASE+RA*8] + | mov [RC], ITYPE + | ins_next + | + |7: // Possible table write barrier for the value. Skip valiswhite check. + | barrierback TAB:RB, TMPR + | jmp <2 + break; + + case BC_TSETM: + | ins_AD // RA = base (table at base-1), RD = num const (start index) + |1: + | mov TMPRd, dword [KBASE+RD*8] // Integer constant is in lo-word. + | lea RA, [BASE+RA*8] + | mov TAB:RB, [RA-8] // Guaranteed to be a table. + | cleartp TAB:RB + | test byte TAB:RB->marked, LJ_GC_BLACK // isblack(table) + | jnz >7 + |2: + | mov RDd, MULTRES + | sub RDd, 1 + | jz >4 // Nothing to copy? + | add RDd, TMPRd // Compute needed size. + | cmp RDd, TAB:RB->asize + | ja >5 // Doesn't fit into array part?
+ | sub RDd, TMPRd + | shl TMPRd, 3 + | add TMPR, TAB:RB->array + |3: // Copy result slots to table. + | mov RB, [RA] + | add RA, 8 + | mov [TMPR], RB + | add TMPR, 8 + | sub RDd, 1 + | jnz <3 + |4: + | ins_next + | + |5: // Need to resize array part. + | mov L:CARG1, SAVE_L + | mov L:CARG1->base, BASE // Caveat: CARG2/CARG3 may be BASE. + | mov CARG2, TAB:RB + | mov CARG3d, RDd + | mov L:RB, L:CARG1 + | mov SAVE_PC, PC + | call extern lj_tab_reasize // (lua_State *L, GCtab *t, int nasize) + | mov BASE, L:RB->base + | movzx RAd, PC_RA // Restore RA. + | movzx RDd, PC_RD // Restore RD. + | jmp <1 // Retry. + | + |7: // Possible table write barrier for any value. Skip valiswhite check. + | barrierback TAB:RB, RD + | jmp <2 + break; + + /* -- Calls and vararg handling ----------------------------------------- */ + + case BC_CALL: case BC_CALLM: + | ins_A_C // RA = base, (RB = nresults+1,) RC = nargs+1 | extra_nargs + if (op == BC_CALLM) { + | add NARGS:RDd, MULTRES + } + | mov LFUNC:RB, [BASE+RA*8] + | checkfunc LFUNC:RB, ->vmeta_call_ra + | lea BASE, [BASE+RA*8+16] + | ins_call + break; + + case BC_CALLMT: + | ins_AD // RA = base, RD = extra_nargs + | add NARGS:RDd, MULTRES + | // Fall through. Assumes BC_CALLT follows and ins_AD is a no-op. + break; + case BC_CALLT: + | ins_AD // RA = base, RD = nargs+1 + | lea RA, [BASE+RA*8+16] + | mov KBASE, BASE // Use KBASE for move + vmeta_call hint. + | mov LFUNC:RB, [RA-16] + | checktp_nc LFUNC:RB, LJ_TFUNC, ->vmeta_call + |->BC_CALLT_Z: + | mov PC, [BASE-8] + | test PCd, FRAME_TYPE + | jnz >7 + |1: + | mov [BASE-16], LFUNC:RB // Copy func+tag down, reloaded below. + | mov MULTRES, NARGS:RDd + | sub NARGS:RDd, 1 + | jz >3 + |2: // Move args down. + | mov RB, [RA] + | add RA, 8 + | mov [KBASE], RB + | add KBASE, 8 + | sub NARGS:RDd, 1 + | jnz <2 + | + | mov LFUNC:RB, [BASE-16] + |3: + | cleartp LFUNC:RB + | mov NARGS:RDd, MULTRES + | cmp byte LFUNC:RB->ffid, 1 // (> FF_C) Calling a fast function? 
+ | ja >5 + |4: + | ins_callt + | + |5: // Tailcall to a fast function. + | test PCd, FRAME_TYPE // Lua frame below? + | jnz <4 + | movzx RAd, PC_RA + | neg RA + | mov LFUNC:KBASE, [BASE+RA*8-32] // Need to prepare KBASE. + | cleartp LFUNC:KBASE + | mov KBASE, LFUNC:KBASE->pc + | mov KBASE, [KBASE+PC2PROTO(k)] + | jmp <4 + | + |7: // Tailcall from a vararg function. + | sub PC, FRAME_VARG + | test PCd, FRAME_TYPEP + | jnz >8 // Vararg frame below? + | sub BASE, PC // Need to relocate BASE/KBASE down. + | mov KBASE, BASE + | mov PC, [BASE-8] + | jmp <1 + |8: + | add PCd, FRAME_VARG + | jmp <1 + break; + + case BC_ITERC: + | ins_A // RA = base, (RB = nresults+1,) RC = nargs+1 (2+1) + | lea RA, [BASE+RA*8+16] // fb = base+2 + | mov RB, [RA-32] // Copy state. fb[0] = fb[-4]. + | mov RC, [RA-24] // Copy control var. fb[1] = fb[-3]. + | mov [RA], RB + | mov [RA+8], RC + | mov LFUNC:RB, [RA-40] // Copy callable. fb[-2] = fb[-5] + | mov [RA-16], LFUNC:RB + | mov NARGS:RDd, 2+1 // Handle like a regular 2-arg call. + | checkfunc LFUNC:RB, ->vmeta_call + | mov BASE, RA + | ins_call + break; + + case BC_ITERN: + |.if JIT + | hotloop RBd + |.endif + |->vm_IITERN: + | ins_A // RA = base, (RB = nresults+1, RC = nargs+1 (2+1)) + | mov TAB:RB, [BASE+RA*8-16] + | cleartp TAB:RB + | mov RCd, [BASE+RA*8-8] // Get index from control var. + | mov TMPRd, TAB:RB->asize + | add PC, 4 + | mov ITYPE, TAB:RB->array + |1: // Traverse array part. + | cmp RCd, TMPRd; jae >5 // Index points after array part? + | cmp aword [ITYPE+RC*8], LJ_TNIL; je >4 + |.if not DUALNUM + | cvtsi2sd xmm0, RCd + |.endif + | // Copy array slot to returned value. + | mov RB, [ITYPE+RC*8] + | mov [BASE+RA*8+8], RB + | // Return array index as a numeric key. + |.if DUALNUM + | setint ITYPE, RC + | mov [BASE+RA*8], ITYPE + |.else + | movsd qword [BASE+RA*8], xmm0 + |.endif + | add RCd, 1 + | mov [BASE+RA*8-8], RCd // Update control var. + |2: + | movzx RDd, PC_RD // Get target from ITERL. 
+ | branchPC RD + |3: + | ins_next + | + |4: // Skip holes in array part. + | add RCd, 1 + | jmp <1 + | + |5: // Traverse hash part. + | sub RCd, TMPRd + |6: + | cmp RCd, TAB:RB->hmask; ja <3 // End of iteration? Branch to ITERL+1. + | imul ITYPEd, RCd, #NODE + | add NODE:ITYPE, TAB:RB->node + | cmp aword NODE:ITYPE->val, LJ_TNIL; je >7 + | lea TMPRd, [RCd+TMPRd+1] + | // Copy key and value from hash slot. + | mov RB, NODE:ITYPE->key + | mov RC, NODE:ITYPE->val + | mov [BASE+RA*8], RB + | mov [BASE+RA*8+8], RC + | mov [BASE+RA*8-8], TMPRd + | jmp <2 + | + |7: // Skip holes in hash part. + | add RCd, 1 + | jmp <6 + break; + + case BC_ISNEXT: + | ins_AD // RA = base, RD = target (points to ITERN) + | mov CFUNC:RB, [BASE+RA*8-24] + | checkfunc CFUNC:RB, >5 + | checktptp [BASE+RA*8-16], LJ_TTAB, >5 + | cmp aword [BASE+RA*8-8], LJ_TNIL; jne >5 + | cmp byte CFUNC:RB->ffid, FF_next_N; jne >5 + | branchPC RD + | mov64 TMPR, ((uint64_t)LJ_KEYINDEX << 32) + | mov [BASE+RA*8-8], TMPR // Initialize control var. + |1: + | ins_next + |5: // Despecialize bytecode if any of the checks fail. + | mov PC_OP, BC_JMP + | branchPC RD + |.if JIT + | cmp byte [PC], BC_ITERN + | jne >6 + |.endif + | mov byte [PC], BC_ITERC + | jmp <1 + |.if JIT + |6: // Unpatch JLOOP. + | mov RA, [DISPATCH+DISPATCH_J(trace)] + | movzx RCd, word [PC+2] + | mov TRACE:RA, [RA+RC*8] + | mov eax, TRACE:RA->startins + | mov al, BC_ITERC + | mov dword [PC], eax + | jmp <1 + |.endif + break; + + case BC_VARG: + | ins_ABC // RA = base, RB = nresults+1, RC = numparams + | lea TMPR, [BASE+RC*8+(16+FRAME_VARG)] + | lea RA, [BASE+RA*8] + | sub TMPR, [BASE-8] + | // Note: TMPR may now be even _above_ BASE if nargs was < numparams. + | test RB, RB + | jz >5 // Copy all varargs? + | lea RB, [RA+RB*8-8] + | cmp TMPR, BASE // No vararg slots? + | jnb >2 + |1: // Copy vararg slots to destination slots. + | mov RC, [TMPR-16] + | add TMPR, 8 + | mov [RA], RC + | add RA, 8 + | cmp RA, RB // All destination slots filled? 
+ | jnb >3 + | cmp TMPR, BASE // No more vararg slots? + | jb <1 + |2: // Fill up remainder with nil. + | mov aword [RA], LJ_TNIL + | add RA, 8 + | cmp RA, RB + | jb <2 + |3: + | ins_next + | + |5: // Copy all varargs. + | mov MULTRES, 1 // MULTRES = 0+1 + | mov RC, BASE + | sub RC, TMPR + | jbe <3 // No vararg slots? + | mov RBd, RCd + | shr RBd, 3 + | add RBd, 1 + | mov MULTRES, RBd // MULTRES = #varargs+1 + | mov L:RB, SAVE_L + | add RC, RA + | cmp RC, L:RB->maxstack + | ja >7 // Need to grow stack? + |6: // Copy all vararg slots. + | mov RC, [TMPR-16] + | add TMPR, 8 + | mov [RA], RC + | add RA, 8 + | cmp TMPR, BASE // No more vararg slots? + | jb <6 + | jmp <3 + | + |7: // Grow stack for varargs. + | mov L:RB->base, BASE + | mov L:RB->top, RA + | mov SAVE_PC, PC + | sub TMPR, BASE // Need delta, because BASE may change. + | mov TMP1hi, TMPRd + | mov CARG2d, MULTRES + | sub CARG2d, 1 + | mov CARG1, L:RB + | call extern lj_state_growstack // (lua_State *L, int n) + | mov BASE, L:RB->base + | movsxd TMPR, TMP1hi + | mov RA, L:RB->top + | add TMPR, BASE + | jmp <6 + break; + + /* -- Returns ----------------------------------------------------------- */ + + case BC_RETM: + | ins_AD // RA = results, RD = extra_nresults + | add RDd, MULTRES // MULTRES >=1, so RD >=1. + | // Fall through. Assumes BC_RET follows and ins_AD is a no-op. + break; + + case BC_RET: case BC_RET0: case BC_RET1: + | ins_AD // RA = results, RD = nresults+1 + if (op != BC_RET0) { + | shl RAd, 3 + } + |1: + | mov PC, [BASE-8] + | mov MULTRES, RDd // Save nresults+1. + | test PCd, FRAME_TYPE // Check frame type marker. + | jnz >7 // Not returning to a fixarg Lua func? + switch (op) { + case BC_RET: + |->BC_RET_Z: + | mov KBASE, BASE // Use KBASE for result move. + | sub RDd, 1 + | jz >3 + |2: // Move results down. + | mov RB, [KBASE+RA] + | mov [KBASE-16], RB + | add KBASE, 8 + | sub RDd, 1 + | jnz <2 + |3: + | mov RDd, MULTRES // Note: MULTRES may be >255. 
+ | movzx RBd, PC_RB // So cannot compare with RDL!
+ |5:
+ | cmp RBd, RDd // More results expected?
+ | ja >6
+ break;
+ case BC_RET1:
+ | mov RB, [BASE+RA]
+ | mov [BASE-16], RB
+ /* fallthrough */
+ case BC_RET0:
+ |5:
+ | cmp PC_RB, RDL // More results expected?
+ | ja >6
+ default:
+ break;
+ }
+ | movzx RAd, PC_RA
+ | neg RA
+ | lea BASE, [BASE+RA*8-16] // base = base - (RA+2)*8
+ | mov LFUNC:KBASE, [BASE-16]
+ | cleartp LFUNC:KBASE
+ | mov KBASE, LFUNC:KBASE->pc
+ | mov KBASE, [KBASE+PC2PROTO(k)]
+ | ins_next
+ |
+ |6: // Fill up results with nil.
+ if (op == BC_RET) {
+ | mov aword [KBASE-16], LJ_TNIL // Note: relies on shifted base.
+ | add KBASE, 8
+ } else {
+ | mov aword [BASE+RD*8-24], LJ_TNIL
+ }
+ | add RD, 1
+ | jmp <5
+ |
+ |7: // Non-standard return case.
+ | lea RB, [PC-FRAME_VARG]
+ | test RBd, FRAME_TYPEP
+ | jnz ->vm_return
+ | // Return from vararg function: relocate BASE down and RA up.
+ | sub BASE, RB
+ if (op != BC_RET0) {
+ | add RA, RB
+ }
+ | jmp <1
+ break;
+
+ /* -- Loops and branches ------------------------------------------------ */
+
+ |.define FOR_IDX, [RA]
+ |.define FOR_STOP, [RA+8]
+ |.define FOR_STEP, [RA+16]
+ |.define FOR_EXT, [RA+24]
+
+ case BC_FORL:
+ |.if JIT
+ | hotloop RBd
+ |.endif
+ | // Fall through. Assumes BC_IFORL follows and ins_AJ is a no-op.
+ break;
+
+ case BC_JFORI:
+ case BC_JFORL:
+#if !LJ_HASJIT
+ break;
+#endif
+ case BC_FORI:
+ case BC_IFORL:
+ vk = (op == BC_IFORL || op == BC_JFORL);
+ | ins_AJ // RA = base, RD = target (after end of loop or start of loop)
+ | lea RA, [BASE+RA*8]
+ if (LJ_DUALNUM) {
+ | mov RB, FOR_IDX
+ | checkint RB, >9
+ | mov TMPR, FOR_STOP
+ if (!vk) {
+ | checkint TMPR, ->vmeta_for
+ | mov ITYPE, FOR_STEP
+ | test ITYPEd, ITYPEd; js >5
+ | sar ITYPE, 47;
+ | cmp ITYPEd, LJ_TISNUM; jne ->vmeta_for
+ } else {
+#ifdef LUA_USE_ASSERT
+ | checkinttp FOR_STOP, ->assert_bad_for_arg_type
+ | checkinttp FOR_STEP, ->assert_bad_for_arg_type
+#endif
+ | mov ITYPE, FOR_STEP
+ | test ITYPEd, ITYPEd; js >5
+ | add RBd, ITYPEd; jo >1
+ | setint RB
+ | mov FOR_IDX, RB
+ }
+ | cmp RBd, TMPRd
+ | mov FOR_EXT, RB
+ if (op == BC_FORI) {
+ | jle >7
+ |1:
+ |6:
+ | branchPC RD
+ } else if (op == BC_JFORI) {
+ | branchPC RD
+ | movzx RDd, PC_RD
+ | jle =>BC_JLOOP
+ |1:
+ |6:
+ } else if (op == BC_IFORL) {
+ | jg >7
+ |6:
+ | branchPC RD
+ |1:
+ } else {
+ | jle =>BC_JLOOP
+ |1:
+ |6:
+ }
+ |7:
+ | ins_next
+ |
+ |5: // Invert check for negative step.
+ if (!vk) {
+ | sar ITYPE, 47;
+ | cmp ITYPEd, LJ_TISNUM; jne ->vmeta_for
+ } else {
+ | add RBd, ITYPEd; jo <1
+ | setint RB
+ | mov FOR_IDX, RB
+ }
+ | cmp RBd, TMPRd
+ | mov FOR_EXT, RB
+ if (op == BC_FORI) {
+ | jge <7
+ } else if (op == BC_JFORI) {
+ | branchPC RD
+ | movzx RDd, PC_RD
+ | jge =>BC_JLOOP
+ } else if (op == BC_IFORL) {
+ | jl <7
+ } else {
+ | jge =>BC_JLOOP
+ }
+ | jmp <6
+ |9: // Fallback to FP variant.
+ if (!vk) {
+ | jae ->vmeta_for
+ }
+ } else if (!vk) {
+ | checknumtp FOR_IDX, ->vmeta_for
+ }
+ if (!vk) {
+ | checknumtp FOR_STOP, ->vmeta_for
+ } else {
+#ifdef LUA_USE_ASSERT
+ | checknumtp FOR_STOP, ->assert_bad_for_arg_type
+ | checknumtp FOR_STEP, ->assert_bad_for_arg_type
+#endif
+ }
+ | mov RB, FOR_STEP
+ if (!vk) {
+ | checknum RB, ->vmeta_for
+ }
+ | movsd xmm0, qword FOR_IDX
+ | movsd xmm1, qword FOR_STOP
+ if (vk) {
+ | addsd xmm0, qword FOR_STEP
+ | movsd qword FOR_IDX, xmm0
+ | test RB, RB; js >3
+ } else {
+ | jl >3
+ }
+ | ucomisd xmm1, xmm0
+ |1:
+ | movsd qword FOR_EXT, xmm0
+ if (op == BC_FORI) {
+ |.if DUALNUM
+ | jnb <7
+ |.else
+ | jnb >2
+ | branchPC RD
+ |.endif
+ } else if (op == BC_JFORI) {
+ | branchPC RD
+ | movzx RDd, PC_RD
+ | jnb =>BC_JLOOP
+ } else if (op == BC_IFORL) {
+ |.if DUALNUM
+ | jb <7
+ |.else
+ | jb >2
+ | branchPC RD
+ |.endif
+ } else {
+ | jnb =>BC_JLOOP
+ }
+ |.if DUALNUM
+ | jmp <6
+ |.else
+ |2:
+ | ins_next
+ |.endif
+ |
+ |3: // Invert comparison if step is negative.
+ | ucomisd xmm0, xmm1
+ | jmp <1
+ break;
+
+ case BC_ITERL:
+ |.if JIT
+ | hotloop RBd
+ |.endif
+ | // Fall through. Assumes BC_IITERL follows and ins_AJ is a no-op.
+ break;
+
+ case BC_JITERL:
+#if !LJ_HASJIT
+ break;
+#endif
+ case BC_IITERL:
+ | ins_AJ // RA = base, RD = target
+ | lea RA, [BASE+RA*8]
+ | mov RB, [RA]
+ | cmp RB, LJ_TNIL; je >1 // Stop if iterator returned nil.
+ if (op == BC_JITERL) {
+ | mov [RA-8], RB
+ | jmp =>BC_JLOOP
+ } else {
+ | branchPC RD // Otherwise save control var + branch.
+ | mov [RA-8], RB
+ }
+ |1:
+ | ins_next
+ break;
+
+ case BC_LOOP:
+ | ins_A // RA = base, RD = target (loop extent)
+ | // Note: RA/RD is only used by trace recorder to determine scope/extent
+ | // This opcode does NOT jump, it's only purpose is to detect a hot loop.
+ |.if JIT
+ | hotloop RBd
+ |.endif
+ | // Fall through. Assumes BC_ILOOP follows and ins_A is a no-op.
+ break;
+
+ case BC_ILOOP:
+ | ins_A // RA = base, RD = target (loop extent)
+ | ins_next
+ break;
+
+ case BC_JLOOP:
+ |.if JIT
+ | ins_AD // RA = base (ignored), RD = traceno
+ | mov RA, [DISPATCH+DISPATCH_J(trace)]
+ | mov TRACE:RD, [RA+RD*8]
+ | mov RD, TRACE:RD->mcode
+ | mov L:RB, SAVE_L
+ | mov [DISPATCH+DISPATCH_GL(jit_base)], BASE
+ | mov [DISPATCH+DISPATCH_GL(tmpbuf.L)], L:RB
+ | // Save additional callee-save registers only used in compiled code.
+ |.if X64WIN
+ | mov CSAVE_4, r12
+ | mov CSAVE_3, r13
+ | mov CSAVE_2, r14
+ | mov CSAVE_1, r15
+ | mov RA, rsp
+ | sub rsp, 10*16+4*8
+ | movdqa [RA-1*16], xmm6
+ | movdqa [RA-2*16], xmm7
+ | movdqa [RA-3*16], xmm8
+ | movdqa [RA-4*16], xmm9
+ | movdqa [RA-5*16], xmm10
+ | movdqa [RA-6*16], xmm11
+ | movdqa [RA-7*16], xmm12
+ | movdqa [RA-8*16], xmm13
+ | movdqa [RA-9*16], xmm14
+ | movdqa [RA-10*16], xmm15
+ |.else
+ | sub rsp, 16
+ | mov [rsp+16], r12
+ | mov [rsp+8], r13
+ |.endif
+ | jmp RD
+ |.endif
+ break;
+
+ case BC_JMP:
+ | ins_AJ // RA = unused, RD = target
+ | branchPC RD
+ | ins_next
+ break;
+
+ /* -- Function headers -------------------------------------------------- */
+
+ /*
+ ** Reminder: A function may be called with func/args above L->maxstack,
+ ** i.e. occupying EXTRA_STACK slots. And vmeta_call may add one extra slot,
+ ** too. This means all FUNC* ops (including fast functions) must check
+ ** for stack overflow _before_ adding more slots!
+ */
+
+ case BC_FUNCF:
+ |.if JIT
+ | hotcall RBd
+ |.endif
+ case BC_FUNCV: /* NYI: compiled vararg functions. */
+ | // Fall through. Assumes BC_IFUNCF/BC_IFUNCV follow and ins_AD is a no-op.
+ break;
+
+ case BC_JFUNCF:
+#if !LJ_HASJIT
+ break;
+#endif
+ case BC_IFUNCF:
+ | ins_AD // BASE = new base, RA = framesize, RD = nargs+1
+ | mov KBASE, [PC-4+PC2PROTO(k)]
+ | mov L:RB, SAVE_L
+ | lea RA, [BASE+RA*8] // Top of frame.
+ | cmp RA, L:RB->maxstack
+ | ja ->vm_growstack_f
+ | movzx RAd, byte [PC-4+PC2PROTO(numparams)]
+ | cmp NARGS:RDd, RAd // Check for missing parameters.
+ | jbe >3
+ |2:
+ if (op == BC_JFUNCF) {
+ | movzx RDd, PC_RD
+ | jmp =>BC_JLOOP
+ } else {
+ | ins_next
+ }
+ |
+ |3: // Clear missing parameters.
+ | mov aword [BASE+NARGS:RD*8-8], LJ_TNIL
+ | add NARGS:RDd, 1
+ | cmp NARGS:RDd, RAd
+ | jbe <3
+ | jmp <2
+ break;
+
+ case BC_JFUNCV:
+#if !LJ_HASJIT
+ break;
+#endif
+ | int3 // NYI: compiled vararg functions
+ break; /* NYI: compiled vararg functions. */
+
+ case BC_IFUNCV:
+ | ins_AD // BASE = new base, RA = framesize, RD = nargs+1
+ | lea RBd, [NARGS:RD*8+FRAME_VARG+8]
+ | lea RD, [BASE+NARGS:RD*8+8]
+ | mov LFUNC:KBASE, [BASE-16]
+ | mov [RD-8], RB // Store delta + FRAME_VARG.
+ | mov [RD-16], LFUNC:KBASE // Store copy of LFUNC.
+ | mov L:RB, SAVE_L
+ | lea RA, [RD+RA*8]
+ | cmp RA, L:RB->maxstack
+ | ja ->vm_growstack_v // Need to grow stack.
+ | mov RA, BASE
+ | mov BASE, RD
+ | movzx RBd, byte [PC-4+PC2PROTO(numparams)]
+ | test RBd, RBd
+ | jz >2
+ | add RA, 8
+ |1: // Copy fixarg slots up to new frame.
+ | add RA, 8
+ | cmp RA, BASE
+ | jnb >3 // Less args than parameters?
+ | mov KBASE, [RA-16]
+ | mov [RD], KBASE
+ | add RD, 8
+ | mov aword [RA-16], LJ_TNIL // Clear old fixarg slot (help the GC).
+ | sub RBd, 1
+ | jnz <1
+ |2:
+ if (op == BC_JFUNCV) {
+ | movzx RDd, PC_RD
+ | jmp =>BC_JLOOP
+ } else {
+ | mov KBASE, [PC-4+PC2PROTO(k)]
+ | ins_next
+ }
+ |
+ |3: // Clear missing parameters.
+ | mov aword [RD], LJ_TNIL
+ | add RD, 8
+ | sub RBd, 1
+ | jnz <3
+ | jmp <2
+ break;
+
+ case BC_FUNCC:
+ case BC_FUNCCW:
+ | ins_AD // BASE = new base, RA = ins RA|RD (unused), RD = nargs+1
+ | mov CFUNC:RB, [BASE-16]
+ | cleartp CFUNC:RB
+ | mov KBASE, CFUNC:RB->f
+ | mov L:RB, SAVE_L
+ | lea RD, [BASE+NARGS:RD*8-8]
+ | mov L:RB->base, BASE
+ | lea RA, [RD+8*LUA_MINSTACK]
+ | cmp RA, L:RB->maxstack
+ | mov L:RB->top, RD
+ if (op == BC_FUNCC) {
+ | mov CARG1, L:RB // Caveat: CARG1 may be RA.
+ } else {
+ | mov CARG2, KBASE
+ | mov CARG1, L:RB // Caveat: CARG1 may be RA.
+ }
+ | ja ->vm_growstack_c // Need to grow stack.
+ | set_vmstate C
+ if (op == BC_FUNCC) {
+ | call KBASE // (lua_State *L)
+ } else {
+ | // (lua_State *L, lua_CFunction f)
+ | call aword [DISPATCH+DISPATCH_GL(wrapf)]
+ }
+ | // nresults returned in eax (RD).
+ | mov BASE, L:RB->base
+ | mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
+ | set_vmstate INTERP
+ | lea RA, [BASE+RD*8]
+ | neg RA
+ | add RA, L:RB->top // RA = (L->top-(L->base+nresults))*8
+ | mov PC, [BASE-8] // Fetch PC of caller.
+ | jmp ->vm_returnc
+ break;
+
+ /* ---------------------------------------------------------------------- */
+
+ default:
+ fprintf(stderr, "Error: undefined opcode BC_%s\n", bc_names[op]);
+ exit(2);
+ break;
+ }
+}
+
+static int build_backend(BuildCtx *ctx)
+{
+ int op;
+ dasm_growpc(Dst, BC__MAX);
+ build_subroutines(ctx);
+ |.code_op
+ for (op = 0; op < BC__MAX; op++)
+ build_ins(ctx, (BCOp)op, op);
+ return BC__MAX;
+}
+
+/* Emit pseudo frame-info for all assembler functions.
*/
+static void emit_asm_debug(BuildCtx *ctx)
+{
+ int fcofs = (int)((uint8_t *)ctx->glob[GLOB_vm_ffi_call] - ctx->code);
+ switch (ctx->mode) {
+ case BUILD_elfasm:
+ fprintf(ctx->fp, "\t.section .debug_frame,\"\",@progbits\n");
+ fprintf(ctx->fp,
+ ".Lframe0:\n"
+ "\t.long .LECIE0-.LSCIE0\n"
+ ".LSCIE0:\n"
+ "\t.long 0xffffffff\n"
+ "\t.byte 0x1\n"
+ "\t.string \"\"\n"
+ "\t.uleb128 0x1\n"
+ "\t.sleb128 -8\n"
+ "\t.byte 0x10\n"
+ "\t.byte 0xc\n\t.uleb128 0x7\n\t.uleb128 8\n"
+ "\t.byte 0x80+0x10\n\t.uleb128 0x1\n"
+ "\t.align 8\n"
+ ".LECIE0:\n\n");
+ fprintf(ctx->fp,
+ ".LSFDE0:\n"
+ "\t.long .LEFDE0-.LASFDE0\n"
+ ".LASFDE0:\n"
+ "\t.long .Lframe0\n"
+ "\t.quad .Lbegin\n"
+ "\t.quad %d\n"
+ "\t.byte 0xe\n\t.uleb128 %d\n" /* def_cfa_offset */
+ "\t.byte 0x86\n\t.uleb128 0x2\n" /* offset rbp */
+ "\t.byte 0x83\n\t.uleb128 0x3\n" /* offset rbx */
+ "\t.byte 0x8f\n\t.uleb128 0x4\n" /* offset r15 */
+ "\t.byte 0x8e\n\t.uleb128 0x5\n" /* offset r14 */
+#if LJ_NO_UNWIND
+ "\t.byte 0x8d\n\t.uleb128 0x6\n" /* offset r13 */
+ "\t.byte 0x8c\n\t.uleb128 0x7\n" /* offset r12 */
+#endif
+ "\t.align 8\n"
+ ".LEFDE0:\n\n", fcofs, CFRAME_SIZE);
+#if LJ_HASFFI
+ fprintf(ctx->fp,
+ ".LSFDE1:\n"
+ "\t.long .LEFDE1-.LASFDE1\n"
+ ".LASFDE1:\n"
+ "\t.long .Lframe0\n"
+ "\t.quad lj_vm_ffi_call\n"
+ "\t.quad %d\n"
+ "\t.byte 0xe\n\t.uleb128 16\n" /* def_cfa_offset */
+ "\t.byte 0x86\n\t.uleb128 0x2\n" /* offset rbp */
+ "\t.byte 0xd\n\t.uleb128 0x6\n" /* def_cfa_register rbp */
+ "\t.byte 0x83\n\t.uleb128 0x3\n" /* offset rbx */
+ "\t.align 8\n"
+ ".LEFDE1:\n\n", (int)ctx->codesz - fcofs);
+#endif
+#if !LJ_NO_UNWIND
+#if LJ_TARGET_SOLARIS
+ fprintf(ctx->fp, "\t.section .eh_frame,\"a\",@unwind\n");
+#else
+ fprintf(ctx->fp, "\t.section .eh_frame,\"a\",@progbits\n");
+#endif
+ fprintf(ctx->fp,
+ ".Lframe1:\n"
+ "\t.long .LECIE1-.LSCIE1\n"
+ ".LSCIE1:\n"
+ "\t.long 0\n"
+ "\t.byte 0x1\n"
+ "\t.string \"zPR\"\n"
+ "\t.uleb128 0x1\n"
+ "\t.sleb128 -8\n"
+ "\t.byte 0x10\n"
+ "\t.uleb128 6\n"
/* augmentation length */
+ "\t.byte 0x1b\n" /* pcrel|sdata4 */
+ "\t.long lj_err_unwind_dwarf-.\n"
+ "\t.byte 0x1b\n" /* pcrel|sdata4 */
+ "\t.byte 0xc\n\t.uleb128 0x7\n\t.uleb128 8\n"
+ "\t.byte 0x80+0x10\n\t.uleb128 0x1\n"
+ "\t.align 8\n"
+ ".LECIE1:\n\n");
+ fprintf(ctx->fp,
+ ".LSFDE2:\n"
+ "\t.long .LEFDE2-.LASFDE2\n"
+ ".LASFDE2:\n"
+ "\t.long .LASFDE2-.Lframe1\n"
+ "\t.long .Lbegin-.\n"
+ "\t.long %d\n"
+ "\t.uleb128 0\n" /* augmentation length */
+ "\t.byte 0xe\n\t.uleb128 %d\n" /* def_cfa_offset */
+ "\t.byte 0x86\n\t.uleb128 0x2\n" /* offset rbp */
+ "\t.byte 0x83\n\t.uleb128 0x3\n" /* offset rbx */
+ "\t.byte 0x8f\n\t.uleb128 0x4\n" /* offset r15 */
+ "\t.byte 0x8e\n\t.uleb128 0x5\n" /* offset r14 */
+ "\t.align 8\n"
+ ".LEFDE2:\n\n", fcofs, CFRAME_SIZE);
+#if LJ_HASFFI
+ fprintf(ctx->fp,
+ ".Lframe2:\n"
+ "\t.long .LECIE2-.LSCIE2\n"
+ ".LSCIE2:\n"
+ "\t.long 0\n"
+ "\t.byte 0x1\n"
+ "\t.string \"zR\"\n"
+ "\t.uleb128 0x1\n"
+ "\t.sleb128 -8\n"
+ "\t.byte 0x10\n"
+ "\t.uleb128 1\n" /* augmentation length */
+ "\t.byte 0x1b\n" /* pcrel|sdata4 */
+ "\t.byte 0xc\n\t.uleb128 0x7\n\t.uleb128 8\n"
+ "\t.byte 0x80+0x10\n\t.uleb128 0x1\n"
+ "\t.align 8\n"
+ ".LECIE2:\n\n");
+ fprintf(ctx->fp,
+ ".LSFDE3:\n"
+ "\t.long .LEFDE3-.LASFDE3\n"
+ ".LASFDE3:\n"
+ "\t.long .LASFDE3-.Lframe2\n"
+ "\t.long lj_vm_ffi_call-.\n"
+ "\t.long %d\n"
+ "\t.uleb128 0\n" /* augmentation length */
+ "\t.byte 0xe\n\t.uleb128 16\n" /* def_cfa_offset */
+ "\t.byte 0x86\n\t.uleb128 0x2\n" /* offset rbp */
+ "\t.byte 0xd\n\t.uleb128 0x6\n" /* def_cfa_register rbp */
+ "\t.byte 0x83\n\t.uleb128 0x3\n" /* offset rbx */
+ "\t.align 8\n"
+ ".LEFDE3:\n\n", (int)ctx->codesz - fcofs);
+#endif
+#endif
+ break;
+#if !LJ_NO_UNWIND
+ /* Mental note: never let Apple design an assembler.
+ ** Or a linker. Or a plastic case. But I digress.
*/
+ case BUILD_machasm: {
+#if LJ_HASFFI
+ int fcsize = 0;
+#endif
+ int i;
+ fprintf(ctx->fp, "\t.section __TEXT,__eh_frame,coalesced,no_toc+strip_static_syms+live_support\n");
+ fprintf(ctx->fp,
+ "EH_frame1:\n"
+ "\t.set L$set$x,LECIEX-LSCIEX\n"
+ "\t.long L$set$x\n"
+ "LSCIEX:\n"
+ "\t.long 0\n"
+ "\t.byte 0x1\n"
+ "\t.ascii \"zPR\\0\"\n"
+ "\t.byte 0x1\n"
+ "\t.byte 128-8\n"
+ "\t.byte 0x10\n"
+ "\t.byte 6\n" /* augmentation length */
+ "\t.byte 0x9b\n" /* indirect|pcrel|sdata4 */
+ "\t.long _lj_err_unwind_dwarf+4@GOTPCREL\n"
+ "\t.byte 0x1b\n" /* pcrel|sdata4 */
+ "\t.byte 0xc\n\t.byte 0x7\n\t.byte 8\n"
+ "\t.byte 0x80+0x10\n\t.byte 0x1\n"
+ "\t.align 3\n"
+ "LECIEX:\n\n");
+ for (i = 0; i < ctx->nsym; i++) {
+ const char *name = ctx->sym[i].name;
+ int32_t size = ctx->sym[i+1].ofs - ctx->sym[i].ofs;
+ if (size == 0) continue;
+#if LJ_HASFFI
+ if (!strcmp(name, "_lj_vm_ffi_call")) { fcsize = size; continue; }
+#endif
+ fprintf(ctx->fp,
+ "%s.eh:\n"
+ "LSFDE%d:\n"
+ "\t.set L$set$%d,LEFDE%d-LASFDE%d\n"
+ "\t.long L$set$%d\n"
+ "LASFDE%d:\n"
+ "\t.long LASFDE%d-EH_frame1\n"
+ "\t.long %s-.\n"
+ "\t.long %d\n"
+ "\t.byte 0\n" /* augmentation length */
+ "\t.byte 0xe\n\t.byte %d\n" /* def_cfa_offset */
+ "\t.byte 0x86\n\t.byte 0x2\n" /* offset rbp */
+ "\t.byte 0x83\n\t.byte 0x3\n" /* offset rbx */
+ "\t.byte 0x8f\n\t.byte 0x4\n" /* offset r15 */
+ "\t.byte 0x8e\n\t.byte 0x5\n" /* offset r14 */
+ "\t.align 3\n"
+ "LEFDE%d:\n\n",
+ name, i, i, i, i, i, i, i, name, size, CFRAME_SIZE, i);
+ }
+#if LJ_HASFFI
+ if (fcsize) {
+ fprintf(ctx->fp,
+ "EH_frame2:\n"
+ "\t.set L$set$y,LECIEY-LSCIEY\n"
+ "\t.long L$set$y\n"
+ "LSCIEY:\n"
+ "\t.long 0\n"
+ "\t.byte 0x1\n"
+ "\t.ascii \"zR\\0\"\n"
+ "\t.byte 0x1\n"
+ "\t.byte 128-8\n"
+ "\t.byte 0x10\n"
+ "\t.byte 1\n" /* augmentation length */
+ "\t.byte 0x1b\n" /* pcrel|sdata4 */
+ "\t.byte 0xc\n\t.byte 0x7\n\t.byte 8\n"
+ "\t.byte 0x80+0x10\n\t.byte 0x1\n"
+ "\t.align 3\n"
+ "LECIEY:\n\n");
+ fprintf(ctx->fp,
+ "_lj_vm_ffi_call.eh:\n"
+ "LSFDEY:\n"
+ "\t.set L$set$yy,LEFDEY-LASFDEY\n"
+ "\t.long L$set$yy\n"
+ "LASFDEY:\n"
+ "\t.long LASFDEY-EH_frame2\n"
+ "\t.long _lj_vm_ffi_call-.\n"
+ "\t.long %d\n"
+ "\t.byte 0\n" /* augmentation length */
+ "\t.byte 0xe\n\t.byte 16\n" /* def_cfa_offset */
+ "\t.byte 0x86\n\t.byte 0x2\n" /* offset rbp */
+ "\t.byte 0xd\n\t.byte 0x6\n" /* def_cfa_register rbp */
+ "\t.byte 0x83\n\t.byte 0x3\n" /* offset rbx */
+ "\t.align 3\n"
+ "LEFDEY:\n\n", fcsize);
+ }
+#endif
+ fprintf(ctx->fp, ".subsections_via_symbols\n");
+ }
+ break;
+#endif
+ default: /* Difficult for other modes. */
+ break;
+ }
+}
+