From: drh
Lemon is an LALR(1) parser generator for C.
@@ -23,7 +24,37 @@ or embedded controllers. This document is an introduction to the Lemon
parser generator. The language parser code created by Lemon is very robust and
is well-suited for use in internet-facing applications that need to
@@ -43,26 +74,29 @@ To summarize: The main goal of Lemon is to translate a context free grammar (CFG)
+ Lemon is a computer program that translates a context free grammar (CFG)
for a particular language into C code that implements a parser for
that language.
-The program has two inputs:
The Lemon Parser Generator
Security Note
+
+1.0 Table of Contents
+
+
+
+
+
+2.0 Security Note
Theory of Operation
+
+3.0 Theory of Operation
-
Typically, only the grammar specification is supplied by the programmer.
-Lemon comes with a default parser template which works fine for most
-applications. But the user is free to substitute a different parser
-template if desired.
+Lemon comes with a default parser template
+("lempar.c")
+that works fine for most applications. But the user is free to substitute
+a different parser template if desired.
Depending on command-line options, Lemon will generate up to three output files.
The behavior of Lemon can be modified using command-line options. You can obtain a list of the available command-line options together @@ -134,7 +169,8 @@ Use file as the template for the generated C-code parser implementation. Print the Lemon version number. -
Lemon doesn't generate a complete, working program. It only generates a few subroutines that implement a parser. This section describes @@ -275,7 +311,61 @@ or calls an action routine. Each such message is prefaced using the text given by zPrefix. This debugging output can be turned off by calling ParseTrace() again with a first argument of NULL (0).
-If all calls to the Parse() interface are made from within
+%code directives, then the parse
+object can be allocated from the stack rather than from the heap.
+These are the steps:
+
+
The following code illustrates how this is done:
+
+
+ ParseFile(){
+   yyParser x;
+   ParseInit( &x );
+   while( GetNextToken(pTokenizer,&hTokenId, &sToken) ){
+     Parse(&x, hTokenId, sToken);
+   }
+   Parse(&x, 0, sToken);
+   ParseFinalize( &x );
+ }
+
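The calling sequence in ParseFile() can be exercised without running Lemon by substituting stand-ins. In the sketch below, the yyParser struct, ParseInit(), Parse(), ParseFinalize() and GetNextToken() are hypothetical stubs that only record what they are asked to do. A real Lemon-generated parser supplies its own definitions, and its Parse() takes a token value of type ParseTOKENTYPE rather than the string used here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the Lemon-generated parser object.
** A real yyParser holds the parse stack; this one only records calls. */
typedef struct yyParser {
  int nToken;      /* tokens passed to Parse() with a nonzero code */
  int finished;    /* set when the end-of-input code 0 arrives */
  int finalized;   /* set by ParseFinalize() */
} yyParser;

/* Stub: the real ParseInit() initializes the object in place. */
static void ParseInit(yyParser *p){
  assert( p!=NULL );
  p->nToken = 0;
  p->finished = 0;
  p->finalized = 0;
}

/* Stub: the real Parse() advances the LALR(1) automaton by one token. */
static void Parse(yyParser *p, int tokenCode, const char *token){
  (void)token;
  if( tokenCode==0 ) p->finished = 1;
  else p->nToken++;
}

/* Stub: the real ParseFinalize() frees substructure, not the object. */
static void ParseFinalize(yyParser *p){
  p->finalized = 1;
}

/* Hypothetical tokenizer: yields three tokens, then reports end of input. */
static int GetNextToken(int *pIdx, int *pCode, const char **pText){
  static const char *azText[] = { "x", "+", "y" };
  if( *pIdx>=3 ) return 0;
  *pCode = *pIdx + 1;          /* real token codes are nonzero */
  *pText = azText[*pIdx];
  (*pIdx)++;
  return 1;
}

/* The calling pattern from the text, with the parser on the C stack.
** Returns the number of tokens parsed, or -1 if a step was skipped. */
static int ParseFile(void){
  yyParser x;                  /* local variable: no heap allocation */
  int idx = 0, code = 0;
  const char *text = "";
  ParseInit(&x);               /* initialize in place */
  while( GetNextToken(&idx, &code, &text) ){
    Parse(&x, code, text);     /* pass a pointer to x on every call */
  }
  Parse(&x, 0, text);          /* token code 0 marks end of input */
  ParseFinalize(&x);           /* release substructure */
  return (x.finished && x.finalized) ? x.nToken : -1;
}
```

Here ParseFile() returns 3: three tokens are passed with nonzero codes, then the code-0 call signals end of input. Because x lives on the C stack, no ParseAlloc()/ParseFree() pair is involved.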
Here is a quick overview of the C-language interface to a +Lemon-generated parser:
+
+void *ParseAlloc( void*(*malloc)(size_t) );
+void ParseFree(void *pParser, void(*free)(void*) );
+void Parse(void *pParser, int tokenCode, ParseTOKENTYPE token, ...);
+void ParseTrace(FILE *stream, char *zPrefix);
+
Notes:
+Programmers who have previously used the yacc or bison parser generator will notice several important differences between yacc and/or @@ -296,10 +386,39 @@ believe that the Lemon way of doing things is better.
Updated as of 2016-02-16: The text above was written in the 1990s.
We are told that Bison has lately been enhanced to support the
-tokenizer-calls-parser paradigm used by Lemon, and to obviate the
+tokenizer-calls-parser paradigm used by Lemon, eliminating the
need for global variables.
-The "lemon" or "lemon.exe" program is built from a single file
+of C-code named
+"lemon.c".
+The Lemon source code is generic C89 code that uses
+no unusual or non-standard libraries. Any
+reasonable C compiler should suffice to compile the lemon program.
+A command-line like the following will usually work:
+
+cc -o lemon lemon.c
+
+On Windows machines with Visual C++ installed, bring up a
+"VS20NN x64 Native Tools Command Prompt" window and enter:
+
+cl lemon.c
+
Compiling Lemon really is that simple.
+Additional compiler options such as
+"-O2" or "-g" or "-Wall" can be added if desired, but they are not
+necessary.
+ + + +The main purpose of the grammar specification file for Lemon is to define the grammar for the parser. But the input file also @@ -313,7 +432,8 @@ declaration can occur at any point in the file. Lemon ignores whitespace (except where it is needed to separate tokens), and it honors the same commenting conventions as C and C++.
-A terminal symbol (token) is any string of alphanumeric and/or underscore characters @@ -338,7 +458,8 @@ this: ')' or '$'. Lemon does not allow this alternative form for terminal symbols. With Lemon, all symbols, terminals and nonterminals, must have alphanumeric names.
-The main component of a Lemon grammar file is a sequence of grammar rules. @@ -423,7 +544,7 @@ allocated by the values of terminals and nonterminals on the right-hand side of a rule.
-Lemon resolves parsing ambiguities in exactly the same way as yacc and bison. A shift-reduce conflict is resolved in favor @@ -539,7 +660,8 @@ as follows:
appears first in the grammar, and report a parsing conflict. -The input grammar to Lemon consists of grammar rules and special directives. We've described all the grammar rules, so now we'll @@ -586,7 +708,7 @@ other than that, the order of directives in Lemon is arbitrary.
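When a grammar declares precedence with %left, %right and %nonassoc, a shift-reduce conflict is settled by comparing the precedence of the lookahead token with the precedence of the rule that could be reduced. The function below sketches that decision as yacc-style generators, Lemon included, apply it. The enum names and the convention that a negative value means "no precedence assigned" are assumptions of this example, not Lemon's internal representation.

```c
#include <assert.h>

typedef enum { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONASSOC } Assoc;
typedef enum { ACT_SHIFT, ACT_REDUCE, ACT_ERROR, ACT_CONFLICT } Resolution;

/* Decide a shift-reduce conflict.
**   tokenPrec, tokenAssoc: precedence and associativity of the lookahead
**   rulePrec:              precedence of the rule that could be reduced
** A negative precedence means none was declared. ACT_CONFLICT means the
** parser shifts but the generator reports a parsing conflict. */
static Resolution resolve_shift_reduce(int tokenPrec, Assoc tokenAssoc, int rulePrec){
  assert( tokenAssoc==ASSOC_LEFT || tokenAssoc==ASSOC_RIGHT
       || tokenAssoc==ASSOC_NONASSOC );
  if( tokenPrec<0 || rulePrec<0 ) return ACT_CONFLICT; /* not enough info */
  if( tokenPrec>rulePrec ) return ACT_SHIFT;           /* token binds tighter */
  if( tokenPrec<rulePrec ) return ACT_REDUCE;          /* rule binds tighter */
  switch( tokenAssoc ){                                /* equal: use associativity */
    case ASSOC_LEFT:  return ACT_REDUCE;
    case ASSOC_RIGHT: return ACT_SHIFT;
    default:          return ACT_ERROR;                /* nonassoc: syntax error */
  }
}
```

For example, if "%left PLUS." is declared before "%left TIMES.", TIMES has the higher precedence. On input a+b*c, with expr PLUS expr on the stack and TIMES as lookahead, resolve_shift_reduce(2, ASSOC_LEFT, 1) chooses to shift. On a+b+c the precedences are equal and left associativity chooses to reduce.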
following sections: -The %code directive is used to specify additional C code that is added to the end of the main output file. This is similar to @@ -597,8 +719,11 @@ the %include directive except that a tokenizer or even the "main()" function as part of the output file.
+There can be multiple %code directives. The arguments of +all %code directives are concatenated.
+ -The %default_destructor directive specifies a destructor to use for non-terminals that do not have their own destructor @@ -612,14 +737,14 @@ a convenient way to specify the same destructor for all those non-terminals using a single statement.
-The %default_type directive specifies the data type of non-terminal symbols that do not have their own data type defined using a separate %type directive.
-The %destructor directive is used to specify a destructor for a non-terminal symbol. @@ -669,7 +794,7 @@ allocated objects when they go out of scope. To do the same using yacc or bison is much more difficult.
-The %extra_argument directive instructs Lemon to add a 4th parameter to the parameter list of the Parse() function it generates. Lemon @@ -691,7 +816,7 @@ is passed in on the ParseAlloc() or ParseInit() routines instead of on Parse().
-The %extra_context directive instructs Lemon to add a 2nd parameter to the parameter list of the ParseAlloc() and ParseInit() functions. Lemon @@ -711,7 +836,7 @@ a variable named "pAbc" that is the value of that 2nd parameter.
is passed in on the Parse() routine instead of on ParseAlloc()/ParseInit(). -The %fallback directive specifies an alternative meaning for one or more tokens. The alternative meaning is tried if the original token @@ -741,7 +866,7 @@ arguments are tokens which fall back to the token identified by the first argument.
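Conceptually, a fallback is a per-token lookup that the parser consults only when the original token cannot be used in the current state. The sketch below models that with a small table indexed by token code, in the spirit of the yyFallback array in Lemon's parser template. The token codes, the table contents (equivalent to "%fallback ID ABORT.") and the single-state stateAccepts() predicate are hypothetical.

```c
#include <assert.h>

/* Hypothetical token codes; 0 is end of input. */
enum { TK_EOF = 0, TK_ID = 1, TK_ABORT = 2, TK_SEMI = 3, NTOKEN = 4 };

/* Entry i is the token that token i falls back to, or 0 for none.
** Here ABORT falls back to ID. The table is assumed to have no cycles. */
static const int aFallback[NTOKEN] = { 0, 0, TK_ID, 0 };

/* Hypothetical parser state in which only an identifier is valid. */
static int stateAccepts(int token){
  return token==TK_ID;
}

/* Return the token the parser actually uses, or -1 for a syntax error. */
static int resolveToken(int token){
  assert( token>=0 && token<NTOKEN );
  while( !stateAccepts(token) ){
    int fb = aFallback[token];
    if( fb==0 ) return -1;   /* no alternative meaning: syntax error */
    token = fb;              /* try the fallback token instead */
  }
  return token;
}
```

With this table resolveToken(TK_ABORT) yields TK_ID, so the keyword is accepted where an identifier is expected. TK_SEMI has no fallback and is a syntax error in this state.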
-The %if, %ifdef, %ifndef, %else, and %endif directives @@ -772,7 +897,7 @@ intended to be a single preprocessor symbol name, not a general expression. Use the "%if" directive for general expressions.
-The %include directive specifies C code that is included at the top of the generated parser. You can include any text you want — @@ -796,7 +921,7 @@ grammar call functions that are prototyped in unistd.h.
the end of the generated parser. -By default, the functions generated by Lemon all begin with the five-character string "Parse". You can change this string to something @@ -848,7 +973,7 @@ functions named
parsers and link them all into the same executable. -This directive is used to assign non-associative precedence to one or more terminal symbols. See the section on @@ -857,7 +982,7 @@ or on the %left directive for additional information.
-The %parse_accept directive specifies a block of C code that is executed whenever the parser accepts its input string. To "accept" @@ -873,7 +998,7 @@ without error.
-The %parse_failure directive specifies a block of C code that is executed whenever the parser fails to complete. This code is not @@ -888,7 +1013,7 @@ only invoked when parsing is unable to continue.
-This directive is used to assign right-associative precedence to one or more terminal symbols. See the section on @@ -896,7 +1021,7 @@ one or more terminal symbols. See the section on or on the %left directive for additional information.
-The %stack_overflow directive specifies a block of C code that is executed if the parser's internal stack ever overflows. Typically @@ -925,7 +1050,7 @@ For example, do rules like this:
-If stack overflow is a problem and you can't resolve the trouble by using left-recursion, then you might want to increase the size @@ -938,7 +1063,7 @@ with a stack of the requested size. The default value is 100.
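The advice to prefer left recursion can be made concrete by counting stack entries. The sketch below is a simplified model, not Lemon's actual automaton. It tracks only the grammar-symbol entries an LR parser pushes while reading n ITEM tokens: once for the left-recursive rules "list ::= list ITEM." and "list ::= ITEM.", and once for the right-recursive rules "list ::= ITEM list." and "list ::= ITEM.". It ignores the parser's initial stack entry.

```c
#include <assert.h>

/* Left recursion: each ITEM is folded into "list" right after it is
** shifted, so the stack never holds more than "list ITEM". */
static int peakDepthLeftRecursive(int n){
  int depth = 0, peak = 0, i;
  assert( n>=0 );
  for(i=0; i<n; i++){
    depth++;                        /* shift ITEM */
    if( depth>peak ) peak = depth;
    /* i==0: list ::= ITEM pops 1, pushes 1 (depth unchanged).
    ** i>0:  list ::= list ITEM pops 2, pushes 1. */
    if( i>0 ) depth--;
  }
  return peak;
}

/* Right recursion: nothing can be reduced until the last ITEM is seen,
** so every ITEM stays on the stack. */
static int peakDepthRightRecursive(int n){
  int depth = 0, peak = 0, i;
  assert( n>=0 );
  for(i=0; i<n; i++){
    depth++;                        /* shift ITEM */
    if( depth>peak ) peak = depth;
  }
  if( n>0 ){
    /* At end of input: list ::= ITEM once (depth unchanged), then
    ** list ::= ITEM list n-1 times (each pops 2, pushes 1). */
    for(i=1; i<n; i++) depth--;
  }
  assert( n==0 || depth==1 );       /* a single "list" remains */
  return peak;
}
```

In this model, 150 items need 150 entries with right recursion. That exceeds the default stack size of 100. Left recursion never needs more than 2, regardless of input length.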
-By default, the start symbol for the grammar that Lemon generates is the first non-terminal that appears in the grammar file. But you @@ -950,18 +1075,18 @@ can choose a different start symbol using the -
See Error Processing.
-Undocumented. Appears to be related to the MULTITERMINAL concept. Implementation.
-The %destructor directive assigns a destructor to a non-terminal symbol. (See the description of the @@ -977,7 +1102,7 @@ Other than that, the token destructor works just like the non-terminal destructors.
-Lemon generates #defines that assign small integer constants to each terminal symbol in the grammar. If desired, Lemon will @@ -1004,7 +1129,7 @@ to each of the #defines it generates.
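As an illustration of what %token_prefix changes, the function below formats #define lines in the way the text describes: it prepends a prefix to each terminal name and numbers the terminals from 1. This is a sketch. The function name and the plain "%s%s %d" layout are assumptions, since Lemon's own output pads names into columns. Code 0 is left unused because, as the final Parse(&x, 0, sToken) call in the ParseFile() example shows, 0 marks end of input.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one "#define <prefix><NAME> <code>" line per terminal into buf.
** Terminals are numbered from 1 in the order given. A NULL prefix
** behaves like a grammar with no %token_prefix directive.
** Returns the number of bytes written, or -1 if buf is too small. */
static int formatTokenDefines(char *buf, size_t nBuf, const char *prefix,
                              const char *const *azName, int nName){
  size_t used = 0;
  int i;
  assert( buf!=NULL && nBuf>0 );
  if( prefix==NULL ) prefix = "";
  buf[0] = '\0';
  for(i=0; i<nName; i++){
    int n;
    assert( azName[i]!=NULL && strlen(azName[i])>0 );  /* names are nonempty */
    n = snprintf(buf+used, nBuf-used, "#define %s%s %d\n",
                 prefix, azName[i], i+1);
    if( n<0 || (size_t)n>=nBuf-used ) return -1;         /* output truncated */
    used += (size_t)n;
  }
  return (int)used;
}
```

Called with the prefix "TK_" and the names SEMI and ID, the function produces "#define TK_SEMI 1" and "#define TK_ID 2".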
-These directives are used to specify the data types for values on the parser's stack associated with terminal and non-terminal @@ -1041,7 +1166,7 @@ entry parser stack will require 100K of heap space. If you are willing and able to pay that price, fine. You just need to know.
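The memory cost described above comes from the way the stack stores values. Each entry carries one slot that must be able to hold a value of any declared type. The sketch below models that slot as a C union, loosely following the stack entry that a Lemon-generated parser uses. The Token and BigValue types, the field widths and the entry layout are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical value types a grammar might declare with %token_type/%type. */
typedef struct { const char *z; int n; } Token;   /* small */
typedef struct { char aBuf[1024]; } BigValue;     /* about 1K bytes */

/* One value slot per stack entry, sized for the largest declared type. */
typedef union {
  Token tok;
  int iVal;
  BigValue big;
} MinorType;

typedef struct {
  unsigned short stateno;  /* parser state number */
  unsigned short major;    /* symbol code */
  MinorType minor;         /* semantic value of the symbol */
} StackEntry;

/* Bytes needed for a parser stack with nEntry entries. */
static size_t stackBytes(size_t nEntry){
  assert( nEntry>0 );
  return nEntry * sizeof(StackEntry);
}
```

Because every entry is as large as BigValue, stackBytes(100) comes out slightly above 100K, which matches the estimate in the text. One way to avoid this cost is to declare the type as a pointer to the large value instead of the value itself.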
-The %wildcard directive is followed by a single token name and a period. This directive specifies that the identified token should @@ -1052,7 +1177,7 @@ the wildcard token and some other token, the other token is always used. The wildcard token is only matched if there are no alternatives.
-After extensive experimentation over several years, it has been discovered that the error recovery strategy used by yacc is about @@ -1075,5 +1200,41 @@ to begin parsing a new file. This is what will happen at the very first syntax error, of course, if there are no instances of the "error" non-terminal in your grammar.
+ +Lemon was originally written by Richard Hipp sometime in the late +1980s on a Sun4 Workstation using K&R C. +There was a companion LL(1) parser generator program named "Lime", the +source code to which has been lost.
+ +The lemon.c source file was originally many separate files that were +compiled together to generate the "lemon" executable. Sometime in the +1990s, the individual source code files were combined together into +the current single large "lemon.c" source file. You can still see traces +of original filenames in the code.
+ +Since 2001, Lemon has been part of the +SQLite project and the source code +to Lemon has been managed as a part of the +SQLite source tree in the following +files:
+ +All of the source code to Lemon, including the template parser file +"lempar.c" and this documentation file ("lemon.html") are in the public +domain. You can use the code for any purpose and without attribution.
+ +The code comes with no warranty. If it breaks, you get to keep both +pieces.
+ diff --git a/manifest b/manifest index 1b57c1c173..37b2099bb2 100644 --- a/manifest +++ b/manifest @@ -1,5 +1,5 @@ -C Improvements\sto\sthe\sIN-early-out\soptimization\sso\sthat\sit\sworks\smore\nefficiently\swhen\sthere\sare\stwo\sor\smore\sindexed\sIN\sclauses\son\sa\ssingle\stable. -D 2020-09-01T01:52:03.629 +C Lemon\supdates:\s\s(1)\sinclude\sthe\s#defines\sfor\sall\stokens\sin\sthe\sgenerated\sC\nfile,\sso\sthat\sthe\sC-file\scan\sbe\sstand-alone.\s\s(2)\sIf\sthe\sgrammar\sbegins\swith\na\s%include\s{...}\sdirective\son\sline\sone,\smake\sthat\sdirective\sthe\sheader\sfor\nthe\sgenerated\sC\sfile.\s\s(3)\sEnhance\sthe\slemon.html\sdocumentation. +D 2020-09-01T11:20:03.785 F .fossil-settings/empty-dirs dbb81e8fc0401ac46a1491ab34a7f2c7c0452f2f06b54ebb845d024ca8283ef1 F .fossil-settings/ignore-glob 35175cdfcf539b2318cb04a9901442804be81cd677d8b889fcc9149c21f239ea F LICENSE.md df5091916dbb40e6e9686186587125e1b2ff51f022cc334e886c19a0e9982724 @@ -38,7 +38,7 @@ F configure 63af83d31b9fdf304f2dbb1e1638530d4ceff31702d1e19550d1fbf3bdf9471e x F configure.ac 40d01e89cb325c28b33f5957e61fede0bd17da2b5e37d9b223a90c8a318e88d4 F contrib/sqlitecon.tcl 210a913ad63f9f991070821e599d600bd913e0ad F doc/F2FS.txt c1d4a0ae9711cfe0e1d8b019d154f1c29e0d3abfe820787ba1e9ed7691160fcd -F doc/lemon.html 5155bf346e59385ac8d14da0c1e895d8dbc5d225a7d93d3f8249cbfb3c938f55 +F doc/lemon.html c5d8ba85ac1daef7be8c2d389899480eb62451ff5c09b0c28ff8157bb8770746 F doc/pager-invariants.txt 27fed9a70ddad2088750c4a2b493b63853da2710 F doc/trusted-schema.md 33625008620e879c7bcfbbfa079587612c434fa094d338b08242288d358c3e8a F doc/vfs-shm.txt e101f27ea02a8387ce46a05be2b1a902a021d37a @@ -524,7 +524,7 @@ F src/os_win.c a2149ff0a85c1c3f9cc102a46c673ce87e992396ba3411bfb53db66813b32f1d F src/os_win.h 7b073010f1451abe501be30d12f6bc599824944a F src/pager.c 3700a1c55427a3d4168ad1f1b8a8b0cb9ace1d107e4506e30a8f1e66d8a1195e F src/pager.h 4bf9b3213a4b2bebbced5eaa8b219cf25d4a82f385d093cd64b7e93e5285f66f -F src/parse.y 
2ca57a8383e9cf9e1140706a85a4b357d6c09cfea7ba9098746a28bc8212441a +F src/parse.y 9ce4dfb772608ed5bd3c32f33e943e021e3b06cfd2c01932d4280888fdd2ebed F src/pcache.c 385ff064bca69789d199a98e2169445dc16e4291fa807babd61d4890c3b34177 F src/pcache.h 4f87acd914cef5016fae3030343540d75f5b85a1877eed1a2a19b9f284248586 F src/pcache1.c 6596e10baf3d8f84cc1585d226cf1ab26564a5f5caf85a15757a281ff977d51a @@ -1798,8 +1798,8 @@ F tool/genfkey.test b6afd7b825d797a1e1274f519ab5695373552ecad5cd373530c63533638a F tool/getlock.c f4c39b651370156cae979501a7b156bdba50e7ce F tool/index_usage.c f62a0c701b2c7ff2f3e21d206f093c123f222dbf07136a10ffd1ca15a5c706c5 F tool/kvtest-speed.sh 4761a9c4b3530907562314d7757995787f7aef8f -F tool/lemon.c 600a58b9d1b8ec5419373982428e927ca208826edacb91ca42ab94514d006039 -F tool/lempar.c e8899b28488f060d0ff931539ea6311b16b22dce068c086c788a06d5e8d01ab7 +F tool/lemon.c 5206111b82f279115c1bfd25a2d859e2b99ab068fc6cddd124d93efd7112cc20 +F tool/lempar.c dc1f5e8a0847c2257b0b069c61e290227062c4d75f5b5a0797b75b08b1c00405 F tool/libvers.c caafc3b689638a1d88d44bc5f526c2278760d9b9 F tool/loadfts.c c3c64e4d5e90e8ba41159232c2189dba4be7b862 F tool/logest.c 11346aa019e2e77a00902aa7d0cabd27bd2e8cca @@ -1879,7 +1879,7 @@ F vsixtest/vsixtest.tcl 6a9a6ab600c25a91a7acc6293828957a386a8a93 F vsixtest/vsixtest.vcxproj.data 2ed517e100c66dc455b492e1a33350c1b20fbcdc F vsixtest/vsixtest.vcxproj.filters 37e51ffedcdb064aad6ff33b6148725226cd608e F vsixtest/vsixtest_TemporaryKey.pfx e5b1b036facdb453873e7084e1cae9102ccc67a0 -P 3ca0b7d54d73d07cd6b32e650a809174bb1cd66ce5ecdb36f65b70899ea05824 -R 16504c659945ee05da548d177d28a416 +P 35505c68c1945c35babd2496e02bc4907a15c8e7b8d77f05f230bd0e9d4891d7 +R ca40e65faf80d0ec5a9ea286af461844 U drh -Z d1eb95f49e8d2ff17d6f9cd7b555126f +Z b58ed847c13aa05b57f422755df0e3ad diff --git a/manifest.uuid b/manifest.uuid index 880e8c42ce..6136c16e36 100644 --- a/manifest.uuid +++ b/manifest.uuid @@ -1 +1 @@ -35505c68c1945c35babd2496e02bc4907a15c8e7b8d77f05f230bd0e9d4891d7 \ No 
newline at end of file +84d54eb35716174195ee7e5ac846f47308e5dbb0056e8ff568daa133860bab74 \ No newline at end of file diff --git a/src/parse.y b/src/parse.y index c44d6563a4..d3ec2b3da6 100644 --- a/src/parse.y +++ b/src/parse.y @@ -1,5 +1,6 @@ +%include { /* -** 2001 September 15 +** 2001-09-15 ** ** The author disclaims copyright to this source code. In place of ** a legal notice, here is a blessing: @@ -9,11 +10,16 @@ ** May you share freely, never taking more than you give. ** ************************************************************************* -** This file contains SQLite's grammar for SQL. Process this file -** using the lemon parser generator to generate C code that runs -** the parser. Lemon will also generate a header file containing -** numeric codes for all of the tokens. +** This file contains SQLite's SQL parser. +** +** The canonical source code to this file ("parse.y") is a Lemon grammar +** file that specifies the input grammar and actions to take while parsing. +** That input file is processed by Lemon to generate a C-language +** implementation of a parser for the given grammer. You might be reading +** this comment as part of the translated C-code. Edits should be made +** to the original parse.y sources. 
*/ +} // All token codes are small integers with #defines that begin with "TK_" %token_prefix TK_ diff --git a/tool/lemon.c b/tool/lemon.c index 40e4e2894f..97e5fab440 100644 --- a/tool/lemon.c +++ b/tool/lemon.c @@ -2638,8 +2638,10 @@ static void parseonetoken(struct pstate *psp) } nOld = lemonStrlen(zOld); n = nOld + nNew + 20; - addLineMacro = !psp->gp->nolinenosflag && psp->insertLineMacro && - (psp->decllinenoslot==0 || psp->decllinenoslot[0]!=0); + addLineMacro = !psp->gp->nolinenosflag + && psp->insertLineMacro + && psp->tokenlineno>1 + && (psp->decllinenoslot==0 || psp->decllinenoslot[0]!=0); if( addLineMacro ){ for(z=psp->filename, nBack=0; *z; z++){ if( *z=='\\' ) nBack++; @@ -3617,6 +3619,16 @@ PRIVATE void tplt_xfer(char *name, FILE *in, FILE *out, int *lineno) } } +/* Skip forward past the header of the template file to the first "%%" +*/ +PRIVATE void tplt_skip_header(FILE *in, int *lineno) +{ + char line[LINESIZE]; + while( fgets(line,LINESIZE,in) && (line[0]!='%' || line[1]!='%') ){ + (*lineno)++; + } +} + /* The next function finds the template file and opens it, returning ** a pointer to the opened file. 
*/ PRIVATE FILE *tplt_open(struct lemon *lemp) @@ -4287,6 +4299,7 @@ void ReportTable( int mnTknOfst, mxTknOfst; int mnNtOfst, mxNtOfst; struct axset *ax; + char *prefix; lemp->minShiftReduce = lemp->nstate; lemp->errAction = lemp->minShiftReduce + lemp->nrule; @@ -4375,7 +4388,22 @@ void ReportTable( fprintf(sql, "COMMIT;\n"); } lineno = 1; - tplt_xfer(lemp->name,in,out,&lineno); + + /* The first %include directive begins with a C-language comment, + ** then skip over the header comment of the template file + */ + if( lemp->include==0 ) lemp->include = ""; + for(i=0; ISSPACE(lemp->include[i]); i++){ + if( lemp->include[i]=='\n' ){ + lemp->include += i+1; + i = -1; + } + } + if( lemp->include[0]=='/' ){ + tplt_skip_header(in,&lineno); + }else{ + tplt_xfer(lemp->name,in,out,&lineno); + } /* Generate the include code, if any */ tplt_print(out,lemp,lemp->include,&lineno); @@ -4387,17 +4415,19 @@ void ReportTable( tplt_xfer(lemp->name,in,out,&lineno); /* Generate #defines for all tokens */ + if( lemp->tokenprefix ) prefix = lemp->tokenprefix; + else prefix = ""; if( mhflag ){ const char *prefix; fprintf(out,"#if INTERFACE\n"); lineno++; - if( lemp->tokenprefix ) prefix = lemp->tokenprefix; - else prefix = ""; - for(i=1; i
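The ReportTable() change above decides which header the generated C file receives. The grammar's first %include text is examined after any leading blank lines are dropped. If it then begins with '/', it is taken to be a C comment, and the template's own header is skipped up to the first "%%" line. The standalone sketch below mirrors that logic outside of lemon.c. skipBlankLines(), includeSuppliesHeader() and skipTemplateHeader() are hypothetical names, and the FILE-based interface stands in for Lemon's template-reading code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define TEMPLATE_LINESIZE 1000   /* buffer size chosen for this sketch */

/* Drop leading lines that contain only whitespace. Indentation on the
** first non-blank line is kept, as in the loop in the diff. */
static const char *skipBlankLines(const char *z){
  int i;
  if( z==NULL ) return "";       /* no %include at all */
  for(i=0; isspace((unsigned char)z[i]); i++){
    if( z[i]=='\n' ){
      z += i+1;                  /* discard the whole blank line */
      i = -1;                    /* restart the scan on the next line */
    }
  }
  return z;
}

/* Nonzero if the %include text supplies its own leading comment. */
static int includeSuppliesHeader(const char *zInclude){
  return skipBlankLines(zInclude)[0]=='/';
}

/* Consume template lines up to and including the first line that starts
** with "%%". Only the lines before it are counted, as in tplt_skip_header. */
static void skipTemplateHeader(FILE *in, int *lineno){
  char line[TEMPLATE_LINESIZE];
  assert( in!=NULL && lineno!=NULL );
  while( fgets(line, TEMPLATE_LINESIZE, in)
      && (line[0]!='%' || line[1]!='%') ){
    (*lineno)++;
  }
}
```

Note that, like the code in the diff, this sketch tests only the first character. A comment indented by spaces therefore does not count as a header.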