shess [Wed, 29 Nov 2006 01:02:03 +0000 (01:02 +0000)]
Delta-encode terms in interior nodes. While experiments have shown
that this is of marginal utility when encoding terms resulting from
regular English text, it turns out to be very useful when encoding
inputs with very large terms. (CVS 3520)
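
A minimal sketch of what delta-encoding a term against its predecessor can look like. The function names and the one-byte length fields are assumptions made for illustration (a real encoder would more likely use variable-length integers); this is not the fts2 on-disk format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the common prefix of a[0..na) and b[0..nb). */
static size_t commonPrefix(const char *a, size_t na,
                           const char *b, size_t nb){
  size_t n = 0;
  while( n<na && n<nb && a[n]==b[n] ) n++;
  return n;
}

/* Write zTerm[0..nTerm) to pOut, delta-encoded against zPrev[0..nPrev).
** Hypothetical layout: 1 byte shared-prefix length, 1 byte suffix
** length, then the suffix bytes.  Returns the number of bytes written,
** or 0 if a length does not fit in one byte or pOut is too small. */
static size_t deltaEncodeTerm(const char *zPrev, size_t nPrev,
                              const char *zTerm, size_t nTerm,
                              unsigned char *pOut, size_t nOut){
  size_t nPrefix = commonPrefix(zPrev, nPrev, zTerm, nTerm);
  size_t nSuffix = nTerm - nPrefix;
  assert( pOut!=0 );
  if( nPrefix>255 || nSuffix>255 || nOut<2+nSuffix ) return 0;
  pOut[0] = (unsigned char)nPrefix;
  pOut[1] = (unsigned char)nSuffix;
  memcpy(pOut+2, zTerm+nPrefix, nSuffix);
  return 2+nSuffix;
}
```

With very large terms that share long prefixes, only the differing tail is stored, which is where the saving comes from.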
shess [Sat, 18 Nov 2006 00:12:44 +0000 (00:12 +0000)]
Store minimal terms in interior nodes. Whenever there's a break
between leaf nodes, instead of storing the entire leftmost term of the
rightmost child, store only that portion of the leftmost term
necessary to distinguish it from the rightmost term of the leftmost
child. (CVS 3513)
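
As a rough illustration of the idea, the sketch below computes how many leading bytes of the right-hand term are needed to separate it from the left-hand term. The function name is hypothetical, and it assumes the left term sorts strictly before the right term in byte order.

```c
#include <assert.h>
#include <stddef.h>

/* Return the number of leading bytes of zRight needed to distinguish
** it from zLeft.  Assumes zLeft sorts strictly before zRight, so
** zRight always has a byte at the first position where they differ. */
static size_t minimalTermLength(const char *zLeft, size_t nLeft,
                                const char *zRight, size_t nRight){
  size_t n = 0;
  while( n<nLeft && n<nRight && zLeft[n]==zRight[n] ) n++;
  assert( n<nRight );
  return n+1;   /* shared prefix plus the first differing byte */
}
```

For example, with "apple" as the rightmost term of the left child and "apricot" as the leftmost term of the right child, only "apr" would need to be stored.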
shess [Fri, 17 Nov 2006 21:12:15 +0000 (21:12 +0000)]
Refactoring groundwork for coming work on interior nodes. Change
LeafWriter to use empty data buffer (instead of empty term) to detect
an empty block. Code to validate interior nodes. Moderate revisions
to leaf-node and doclist validation. Recast leafWriterStep() in terms
of LeafWriterStepMerge(). (CVS 3512)
shess [Mon, 13 Nov 2006 21:00:54 +0000 (21:00 +0000)]
Require a minimum fanout for interior nodes. This prevents cases
where excessively large terms keep the tree from finding a single
root. A downside is that this could result in large interior nodes in
the presence of large terms, which may be prone to fragmentation,
though if the nodes were smaller that would translate into more levels
in the tree, which would also have that problem. (CVS 3510)
aswift [Sat, 11 Nov 2006 01:31:58 +0000 (01:31 +0000)]
allocateUnixFile passed the uninitialized file descriptor from the unixFile structure to sqlite3DetectLockingStyle, rather than the file descriptor it was given. This made locking detection on NFS file systems behave somewhat randomly, with the result that locks were not respected and data loss could occur. (CVS 3508)
drh [Thu, 9 Nov 2006 00:24:53 +0000 (00:24 +0000)]
First cut at adding the sqlite3_prepare_v2() API. Test cases added, but
more testing would be useful. Still need to update the documentation. (CVS 3506)
drh [Mon, 6 Nov 2006 21:20:25 +0000 (21:20 +0000)]
Use the difference between the SQLITE_IOERR_SHORT_READ and SQLITE_IOERR_READ
returns from sqlite3OsRead() to make decisions about what to do with the
error. (CVS 3503)
drh [Tue, 31 Oct 2006 21:16:48 +0000 (21:16 +0000)]
Change the default prefix for temporary files so that it no longer
contains the text "sqlite". In this way, perhaps we will not get so
many false bug reports such as tickets #2049, #1989, and #1841. (CVS 3498)
drh [Thu, 26 Oct 2006 18:15:42 +0000 (18:15 +0000)]
Bring CSV output into more commonly accepted practice. Tickets #2030, #1573.
Add the -bail command-line option and the ".bail" command. Default behavior is
to continue after encountering an error. Ticket #2045. (CVS 3491)
drh [Thu, 26 Oct 2006 14:25:58 +0000 (14:25 +0000)]
Command-line shell enhancements. Bail out when errors are seen in
non-interactive mode. Override isatty() using -interactive or -batch
command-line options. Report line number in error messages.
Tickets #2009, #2045. (CVS 3490)
shess [Thu, 26 Oct 2006 00:41:51 +0000 (00:41 +0000)]
Empty queries should get no results. My recent change
( http://www.sqlite.org/cvstrac/chngview?cn=3486 ) broke test fts2a-5.3.
This change should make the expected result more obvious. (CVS 3489)
shess [Thu, 26 Oct 2006 00:04:31 +0000 (00:04 +0000)]
Make memset() uses less error-prone.
http://www.sqlite.org/cvstrac/tktview?tn=2036,35 describes some cases
where we were passing memset() a length which was the sizeof a
pointer, rather than the structure pointed to. Instead, wrap this
idiom up in CLEAR() and SCRAMBLE() macros. (CVS 3488)
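
A sketch of how such macros might be written, so that the length always comes from the pointed-to object rather than from the pointer. The macro bodies, the 0x55 fill byte, and the DemoBuffer type are assumptions for illustration; a production build might also compile SCRAMBLE() out when assertions are disabled.

```c
#include <string.h>

/* Zero the object that b points to.  sizeof(*(b)) is the size of the
** pointed-to object, not the size of the pointer. */
#define CLEAR(b) memset((b), 0, sizeof(*(b)))

/* Overwrite the object with a recognizable junk pattern, to make
** use-after-destroy bugs easier to spot while debugging. */
#define SCRAMBLE(b) memset((b), 0x55, sizeof(*(b)))

/* Hypothetical structure, used only to demonstrate the macros. */
typedef struct DemoBuffer {
  char *pData;
  int nData;
  char aInline[32];
} DemoBuffer;
```

Writing memset(p, 0, sizeof(p)) with a DemoBuffer pointer would clear only pointer-sized bytes; CLEAR(p) clears the whole structure.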
shess [Wed, 25 Oct 2006 21:00:09 +0000 (21:00 +0000)]
Replace the DocList and DocListReader structures. The new structures
distinguish reading from a static buffer from writing to a dynamic
buffer. This allows n-way doclist merging, and in-place merging of
segment leaf nodes, which together cut segment merge times in half. (CVS 3486)
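
To illustrate the n-way merge idea in isolation, here is a simplified sketch that merges several sorted docid arrays, each read through its own read-only cursor, into one output array. The DocidReader type and mergeDocids() are hypothetical; real doclists also carry position data, which this sketch ignores.

```c
#include <stddef.h>

/* Read-only cursor over a sorted, static array of docids. */
typedef struct DocidReader {
  const long long *a;   /* docids in ascending order */
  size_t n;             /* number of entries in a[] */
  size_t i;             /* index of the next unread entry */
} DocidReader;

/* Merge nReader sorted inputs into aOut, dropping duplicate docids.
** aOut must have room for the total number of input entries.
** Returns the number of docids written. */
static size_t mergeDocids(DocidReader *aReader, size_t nReader,
                          long long *aOut){
  size_t nOut = 0;
  for(;;){
    size_t iBest = nReader;   /* nReader means "no reader chosen yet" */
    size_t j;
    long long v;
    for(j=0; j<nReader; j++){
      if( aReader[j].i>=aReader[j].n ) continue;   /* exhausted */
      if( iBest==nReader
       || aReader[j].a[aReader[j].i] < aReader[iBest].a[aReader[iBest].i] ){
        iBest = j;
      }
    }
    if( iBest==nReader ) break;   /* every input is exhausted */
    v = aReader[iBest].a[aReader[iBest].i++];
    if( nOut==0 || aOut[nOut-1]!=v ) aOut[nOut++] = v;
  }
  return nOut;
}
```

Merging all inputs in one pass avoids the intermediate results that repeated pairwise merges would produce. The linear scan for the smallest head is fine for a handful of inputs; a heap would suit larger fan-in.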
shess [Wed, 25 Oct 2006 05:21:55 +0000 (05:21 +0000)]
Don't store empty segments. When inserting empty strings, the code
was writing out a segment made up of a single leaf node containing the
\0 header. LeafReader assumed that leaf nodes always contained at
least one term, so assertions would fail.
While it would be possible to support reading and merging empty
segments, there's no reason to do so. While this change could have
been done in writeZeroSegment(), I put it in leafWriterFlush() so that
it would work right if segmentMerge() created an empty segment, which
could happen with future changes to how deleted documents are handled. (CVS 3484)
shess [Thu, 12 Oct 2006 23:15:24 +0000 (23:15 +0000)]
Convert fts2 to store data in a way which allows for much faster
updates. Groups of documents form segments which are encoded in a
btree layered over a table of blocks, with various tricks to make
merges fast. This performs 20x-25x faster than fts1 when loading the
Enron corpus, and is only slightly slower for queries. (CVS 3474)
shess [Thu, 5 Oct 2006 21:48:56 +0000 (21:48 +0000)]
Fix incorrect doclist initialization in term_select_all().
docListRestrictColumn() generates a DL_POSITIONS doclist, which means
that after the first doclist is processed, the second doclist is
initialized as DL_POSITIONS, but with DL_POSITIONS_OFFSETS data.
(Note that DL_DEFAULT is now DL_POSITIONS, which masks this bug.) (CVS 3467)
drh [Tue, 3 Oct 2006 19:05:18 +0000 (19:05 +0000)]
Report the error SQLITE_CORRUPT instead of SQLITE_IOERR if unable
to rollback a hot journal that was damaged (for example) by filesystem
corruption following a power failure. (CVS 3460)
drh [Sun, 1 Oct 2006 18:58:31 +0000 (18:58 +0000)]
Remove one non-working test case from the Porter stemmer tests and add
an acknowledgement for the source of the test data (Martin Porter himself). (CVS 3453)
Be sure to ignore PRAGMA encoding pragmas if the encoding has already been
set for a database. Ticket #1987. This patch also includes some cleanup
of the schema parser and initialization logic. (CVS 3436)
We handle an UPDATE to a row by performing an UPDATE on the content table and by building new position lists for each term which appears in either the old or new versions of the row. We write these position lists all at once; this is presumably more efficient than a delete followed by an insert (which would first write empty position lists, then new position lists). (CVS 3434)
When gathering a doclist for querying, don't discard empty position lists until the end; this allows empty position lists to override non-empty lists encountered later in the gathering process. This fixes #1982, which was caused by the fact that for all-column queries we weren't discarding empty position lists at all. (CVS 3433)
Convert all names to lower case before sending them to the xFindFunction
method of a virtual table. In FTS1, use strcmp instead of strcasecmp.
Ticket #1981. (CVS 3429)
Convert all names to lower case before sending them to the xFindFunction
method of a virtual table. In FTS1, use strcmp instead of strcasecmp.
Ticket #1981. (CVS 3428)
Modify FTS1 so that the "magic" column has the same name as the virtual
table. Offsets are retrieved using a special "offsets" function whose
first argument is the magic column. Snippets will ultimately be retrieved
in the same way. (CVS 3427)
Add support for extended result codes - additional result information
carried in the higher bits of the integer return codes. This must be
enabled using the sqlite3_extended_result_codes() API. Only a few extra
result codes are currently defined. (CVS 3422)
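
A small sketch of how a result code with extra information in its higher bits can be split back into its primary code. The DEMO_ names are hypothetical stand-ins defined here so the example is self-contained, and the sketch assumes the primary code occupies the low 8 bits.

```c
/* Hypothetical stand-ins for a primary code and two extended codes
** built on it.  The extra information lives above the low 8 bits. */
#define DEMO_IOERR             10
#define DEMO_IOERR_READ        (DEMO_IOERR | (1<<8))
#define DEMO_IOERR_SHORT_READ  (DEMO_IOERR | (2<<8))

/* Recover the primary result code from an extended one. */
static int primaryCode(int rc){
  return rc & 0xff;
}

/* Return the extra information carried in the higher bits. */
static int extendedInfo(int rc){
  return rc >> 8;
}
```

Callers that only understand primary codes can mask with primaryCode(), while callers that opt in can tell a short read apart from other read errors.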