shess [Thu, 26 Oct 2006 00:41:51 +0000 (00:41 +0000)]
Empty queries should get no results. My recent change
( http://www.sqlite.org/cvstrac/chngview?cn=3486 ) broke test fts2a-5.3.
This change should make the expected result more obvious. (CVS 3489)
shess [Thu, 26 Oct 2006 00:04:31 +0000 (00:04 +0000)]
Make memset() uses less error-prone.
http://www.sqlite.org/cvstrac/tktview?tn=2036,35 describes some cases
where we were passing memset() a length which was the size of a
pointer, rather than the structure pointed to. Instead, wrap this
idiom up in CLEAR() and SCRAMBLE() macros. (CVS 3488)
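
The pitfall and the macro idiom can be sketched roughly as follows. This is
an illustration only, not the fts2 source: the macro bodies, the 0x55 fill
byte, the Demo structure and demoIsCleared() are all assumptions made for
the example.

```c
#include <assert.h>
#include <string.h>

/* Assumed macro bodies: sizeof(*(p)) is the size of the object pointed
** to, so the length cannot silently become the size of the pointer. */
#define CLEAR(p)    memset((p), 0, sizeof(*(p)))
#define SCRAMBLE(p) memset((p), 0x55, sizeof(*(p)))  /* assumed fill byte */

/* Hypothetical structure that is larger than any pointer. */
typedef struct Demo {
  char zName[64];
  int nData;
} Demo;

/* Return 1 if every byte of *p is zero after CLEAR(), else 0. */
int demoIsCleared(Demo *p){
  const unsigned char *z = (const unsigned char *)p;
  size_t i;
  assert( p!=0 );
  SCRAMBLE(p);   /* fill with a recognizable non-zero pattern */
  CLEAR(p);      /* zero the whole structure, not sizeof(p) bytes */
  for(i=0; i<sizeof(*p); i++){
    if( z[i]!=0 ) return 0;
  }
  return 1;
}
```

The error-prone form would be memset(p, 0, sizeof(p)), which clears only as
many bytes as a pointer occupies and leaves the rest of the structure
untouched.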
shess [Wed, 25 Oct 2006 21:00:09 +0000 (21:00 +0000)]
Replace the DocList and DocListReader structures. The new structures
distinguish reading from a static buffer from writing to a dynamic
buffer. This allows n-way doclist merging, and in-place merging of
segment leaf nodes, which together cut segment merge times in half. (CVS 3486)
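
A much-simplified sketch of n-way merging over read-only inputs is shown
below. It merges plain sorted docid arrays and ignores position data, and
it writes into a caller-supplied fixed buffer rather than a growable one.
DemoReader and demoMergeN() are hypothetical names, not the structures this
change introduces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reader over a static, read-only buffer of sorted docids. */
typedef struct DemoReader {
  const long long *aDocid;  /* input buffer, never modified */
  size_t n;                 /* number of docids in aDocid */
  size_t i;                 /* current read position */
} DemoReader;

/* Merge nReader sorted inputs into aOut in a single pass.  A docid that
** appears in several inputs is emitted once.  Returns the count written. */
size_t demoMergeN(DemoReader *aReader, int nReader,
                  long long *aOut, size_t nOutMax){
  size_t nOut = 0;
  for(;;){
    int iBest = -1;
    int j;
    long long v;
    /* Find the reader currently positioned on the smallest docid. */
    for(j=0; j<nReader; j++){
      DemoReader *p = &aReader[j];
      if( p->i>=p->n ) continue;
      if( iBest<0
       || p->aDocid[p->i] < aReader[iBest].aDocid[aReader[iBest].i] ){
        iBest = j;
      }
    }
    if( iBest<0 ) break;  /* every input is exhausted */
    v = aReader[iBest].aDocid[aReader[iBest].i];
    /* Advance every reader that is positioned on v. */
    for(j=0; j<nReader; j++){
      DemoReader *p = &aReader[j];
      while( p->i<p->n && p->aDocid[p->i]==v ) p->i++;
    }
    assert( nOut<nOutMax );
    aOut[nOut++] = v;
  }
  return nOut;
}
```

Merging all inputs in one pass avoids the intermediate doclists that
repeated two-way merges would build.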
shess [Wed, 25 Oct 2006 05:21:55 +0000 (05:21 +0000)]
Don't store empty segments. When inserting empty strings, the code
was writing out a segment made up of a single leaf node containing the
\0 header. LeafReader assumed that leaf nodes always contained at
least one term, so assertions would fail.
While it would be possible to support reading and merging empty
segments, there's no reason to do so. This change could have been
made in writeZeroSegment(), but I put it in leafWriterFlush() so that
it also works if segmentMerge() creates an empty segment, which
could happen with future changes to how deleted documents are handled. (CVS 3484)
shess [Thu, 12 Oct 2006 23:15:24 +0000 (23:15 +0000)]
Convert fts2 to store data in a way which allows for much faster
updates. Groups of documents form segments which are encoded in a
btree layered over a table of blocks, with various tricks to make
merges fast. This performs 20x-25x faster than fts1 when loading the
Enron corpus, and is only slightly slower for queries. (CVS 3474)
shess [Thu, 5 Oct 2006 21:48:56 +0000 (21:48 +0000)]
Fix incorrect doclist initialization in term_select_all().
docListRestrictColumn() generates a DL_POSITIONS doclist, which means
that after the first doclist is processed, the second doclist is
initialized as DL_POSITIONS, but with DL_POSITIONS_OFFSETS data.
(Note that DL_DEFAULT is now DL_POSITIONS, which masks this bug.) (CVS 3467)
drh [Tue, 3 Oct 2006 19:05:18 +0000 (19:05 +0000)]
Report the error SQLITE_CORRUPT instead of SQLITE_IOERR if unable
to roll back a hot journal that was damaged (for example) by filesystem
corruption following a power failure. (CVS 3460)
drh [Sun, 1 Oct 2006 18:58:31 +0000 (18:58 +0000)]
Remove one non-working test case from the Porter stemmer tests and add
an acknowledgement for the source of the test data (Martin Porter himself). (CVS 3453)
Be sure to ignore PRAGMA encoding pragmas if the encoding has already been
set for a database. Ticket #1987. This patch also includes some cleanup
of the schema parser and initialization logic. (CVS 3436)
We handle an UPDATE to a row by performing an UPDATE on the content table and by building new position lists for each term which appears in either the old or new versions of the row. We write these position lists all at once; this is presumably more efficient than a delete followed by an insert (which would first write empty position lists, then new position lists). (CVS 3434)
When gathering a doclist for querying, don't discard empty position lists until the end; this allows empty position lists to override non-empty lists encountered later in the gathering process. This fixes #1982, which was caused by the fact that for all-column queries we weren't discarding empty position lists at all. (CVS 3433)
Convert all names to lower case before sending them to the xFindFunction
method of a virtual table. In FTS1, use strcmp instead of strcasecmp.
Ticket #1981. (CVS 3429)
Convert all names to lower case before sending them to the xFindFunction
method of a virtual table. In FTS1, use strcmp instead of strcasecmp.
Ticket #1981. (CVS 3428)
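
The division of labour described here can be sketched as below, assuming
the caller folds the name once so that the callee can use an exact
comparison. demoLowerName() and demoIsOffsetsFunction() are hypothetical
helpers, not SQLite or FTS1 functions.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Copy zIn into zOut (capacity nOut bytes, terminator included),
** folding each character to lower case. */
void demoLowerName(const char *zIn, char *zOut, size_t nOut){
  size_t i;
  assert( nOut>0 );
  for(i=0; zIn[i] && i+1<nOut; i++){
    zOut[i] = (char)tolower((unsigned char)zIn[i]);
  }
  zOut[i] = '\0';
}

/* Hypothetical callee: the name arrives already folded, so a plain
** strcmp() is enough and strcasecmp() is not needed. */
int demoIsOffsetsFunction(const char *zLowerName){
  return strcmp(zLowerName, "offsets")==0;
}
```

Folding once at the call site also removes the dependency on strcasecmp(),
which is not part of standard C.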
Modify FTS1 so that the "magic" column has the same name as the virtual
table. Offsets are retrieved using a special "offsets" function whose
first argument is the magic column. Snippets will ultimately be retrieved
in the same way. (CVS 3427)
Add support for extended result codes - additional result information
carried in the higher bits of the integer return codes. This must be
enabled using the sqlite3_extended_result_code() API. Only a few extra
result codes are currently defined. (CVS 3422)
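
How extra information can ride in the higher bits of a return code is
sketched below. The DEMO_ names and helper functions are stand-ins invented
for this example, and the sketch assumes the primary code occupies the low
8 bits.

```c
#include <assert.h>

/* Illustrative stand-ins, not definitions taken from sqlite3.h. */
#define DEMO_IOERR       10                      /* primary result code */
#define DEMO_IOERR_READ  (DEMO_IOERR | (1<<8))   /* extended variant */

/* Recover the primary code by masking off the higher bits
** (assumes the primary code lives in the low 8 bits). */
int demoPrimaryCode(int rc){
  assert( rc>=0 );
  return rc & 0xff;
}

/* True if rc carries any extended information in the higher bits. */
int demoIsExtended(int rc){
  assert( rc>=0 );
  return (rc >> 8)!=0;
}
```

Code that only tests the primary value keeps working if it masks the return
code first, which is presumably why the extended codes must be enabled
explicitly.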
The FTS1 tables have a new automatic column named "offset" that returns
a string containing byte offset information for all matching terms.
Also added a large test case based on SQLite mailing list entries. (CVS 3417)
Module spec parser enhancements for FTS1. Now able to cope with column
names in the spec that are SQL keywords or have special characters, etc.
Also added support for additional control lines. Column names can be
followed by a type specifier (which is ignored.) (CVS 3410)
Allow virtual tables to contain multiple full-text-indexed columns. Added a magic column "_all" which can be used for querying all columns in a table at once.
For now, each posting list stores position/offset information for multiple columns. We may implement separate posting lists for separate columns at some future point. (CVS 3408)
Re-use deleted rowids for new segments. This has a somewhat
surprising impact on performance, I believe because it keeps the index
smaller (by keeping rowids smaller), and also because it improves
locality in the table (deleting a row means we've already touched the
pages leading to that rowid). (CVS 3405)
Add a rudimentary tokenizer and parser to FTS1 for parsing the module
arguments during initialization. Recognized arguments include a
tokenizer selector and a list of virtual table columns. (CVS 3403)
Add pzErr parameters to the xConnect and xCreate methods of virtual tables
in order to provide better error reporting. This is an interface change
for virtual tables. Prior virtual table implementations will need to be
modified and recompiled. (CVS 3402)
Add a new zErrMsg field to the sqlite3_vtab structure to support returning
error messages from virtual table constructors. This change means that
virtual table implementations compiled as loadable extensions for version
3.3.7 will need to be recompiled for version 3.3.8 and will not be usable
by both versions at once. The virtual table mechanism is still considered
experimental so we feel justified in breaking backwards compatibility
in this way. Additional interface changes might occur in the future. (CVS 3401)
Write doclists using a segmented technique to amortize costs better.
New items for a term are merged with the term's segment 0 doclist,
until that doclist exceeds CHUNK_MAX. Then the segments are merged in
exponential fashion, so that segment 1 contains approximately
2*CHUNK_MAX data, segment 2 4*CHUNK_MAX, and so on. (CVS 3398)
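
The exponential scheme can be modelled with a small simulation that tracks
only byte counts per segment. The value of DEMO_CHUNK_MAX, the number of
levels, and the cascade rule (an over-full segment is merged into the next
one up) are assumptions made for this sketch, not details taken from the
change itself.

```c
#include <assert.h>

#define DEMO_CHUNK_MAX 256  /* assumed value; the real CHUNK_MAX may differ */
#define DEMO_LEVELS    8    /* assumed number of segment levels */

/* Approximate amount of data segment iLevel is meant to hold:
** CHUNK_MAX for segment 0, 2*CHUNK_MAX for 1, 4*CHUNK_MAX for 2, ... */
long demoSegmentTarget(int iLevel){
  assert( iLevel>=0 && iLevel<DEMO_LEVELS );
  return (long)DEMO_CHUNK_MAX << iLevel;
}

/* Add nBytes of new doclist data to segment 0, then push any segment
** that exceeds its target into the next segment up. */
void demoAppend(long *aSize, long nBytes){
  int i = 0;
  aSize[0] += nBytes;
  while( i+1<DEMO_LEVELS && aSize[i]>demoSegmentTarget(i) ){
    aSize[i+1] += aSize[i];  /* merge segment i into segment i+1 */
    aSize[i] = 0;
    i++;
  }
}
```

Because each level holds about twice as much as the one below it, a given
byte is rewritten only once per level it passes through, which is where the
amortized saving comes from.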
Add HAVE_GMTIME_R and HAVE_LOCALTIME_R flags and use them if defined.
Unable to modify the configure script to test for gmtime_r and
localtime_r, however, because on my SuSE 10.2 system, autoconf generates
a configure script that does not work. Bummer. Ticket #1906 (CVS 3397)
Bug fix in date/time computations. Ticket #1964.
Some unrelated comment typo fixes were accidentally
checked in at the same time. (CVS 3396)