From: drh
Date: Thu, 6 May 2010 11:55:56 +0000 (+0000)
Subject: Add two text files containing pager design notes to the doc/ subfolder.
X-Git-Tag: version-3.7.2~422
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=24e39711977854135526ed4988f9c624611b826e;p=thirdparty%2Fsqlite.git

Add two text files containing pager design notes to the doc/ subfolder.

FossilOrigin-Name: ed817fc893e7162ae0ff4022591f7e9e3b81d622
---

diff --git a/doc/pager-invariants.txt b/doc/pager-invariants.txt
new file mode 100644
index 0000000000..c6deda7a69
--- /dev/null
+++ b/doc/pager-invariants.txt
@@ -0,0 +1,76 @@
+ *** Throughout this document, a page is deemed to have been synced
+     automatically as soon as it is written when PRAGMA synchronous=OFF.
+     Otherwise, the page is not synced until the xSync method of the VFS
+     is called successfully on the file containing the page.
+
+ *** Definition: A page of the database file is said to be "overwriteable" if
+     one or more of the following are true about the page:
+
+     (a) The original content of the page as it was at the beginning of
+         the transaction has been written into the rollback journal and
+         synced.
+
+     (b) The page was a freelist leaf page at the start of the transaction.
+
+     (c) The page number is greater than the largest page that existed in
+         the database file at the start of the transaction.
+
+ (1) A page of the database file is never overwritten unless one of the
+     following is true:
+
+     (a) The page and all other pages on the same sector are overwriteable.
+
+     (b) The atomic page write optimization is enabled, and the entire
+         transaction other than the update of the transaction sequence
+         number consists of a single page change.
+
+ (2) The content of a page written into the rollback journal exactly matches
+     both the content in the database when the rollback journal was written
+     and the content in the database at the beginning of the current
+     transaction.
+
+ (3) Writes to the database file are an integer multiple of the page size
+     in length and are aligned to a page boundary.
+
+ (4) Reads from the database file are either aligned on a page boundary and
+     an integer multiple of the page size in length or are taken from the
+     first 100 bytes of the database file.
+
+ (5) All writes to the database file are synced prior to the rollback journal
+     being deleted, truncated, or zeroed.
+
+ (6) If a master journal file is used, then all writes to the database file
+     are synced prior to the master journal being deleted.
+
+ *** Definition: Two databases (or the same database at two points in time)
+     are said to be "logically equivalent" if they give the same answer to
+     all queries. Note in particular that the content of freelist leaf
+     pages can be changed arbitrarily without affecting the logical equivalence
+     of the database.
+
+ (7) At any time, if any subset, including the empty set and the total set,
+     of the unsynced changes to a rollback journal are removed and the
+     journal is rolled back, the resulting database file will be logically
+     equivalent to the database file at the beginning of the transaction.
+
+ (8) When a transaction is rolled back, the xTruncate method of the VFS
+     is called to restore the database file to the same size it was at
+     the beginning of the transaction. (In some VFSes, the xTruncate
+     method is a no-op, but that does not change the fact that SQLite will
+     invoke it.)
+
+ (9) Whenever the database file is modified, at least one bit in the range
+     of bytes from 24 through 39 inclusive will be changed prior to releasing
+     the EXCLUSIVE lock.
+
+(10) The pattern of bits in bytes 24 through 39 shall not repeat in fewer
+     than one billion transactions.
+
+(11) A database file is well-formed at the beginning and at the conclusion
+     of every transaction.
+
+(12) An EXCLUSIVE lock must be held on the database file before making
+     any changes to the database file.
+
+(13) A SHARED lock must be held on the database file before reading any
+     content out of the database file.
diff --git a/doc/vfs-shm.txt b/doc/vfs-shm.txt
new file mode 100644
index 0000000000..3e8efb8061
--- /dev/null
+++ b/doc/vfs-shm.txt
@@ -0,0 +1,125 @@
+The 5 states of an historical rollback lock as implemented by the
+xLock, xUnlock, and xCheckReservedLock methods of the sqlite3_io_methods
+object are:
+
+   UNLOCKED
+   SHARED
+   RESERVED
+   PENDING
+   EXCLUSIVE
+
+The wal-index file has a similar locking hierarchy implemented using
+the xShmLock method of the sqlite3_vfs object, but with 7
+states. Each connection to a wal-index file must be in one of
+the following 7 states:
+
+   UNLOCKED
+   READ
+   READ_FULL
+   WRITE
+   PENDING
+   CHECKPOINT
+   RECOVER
+
+These roughly correspond to the 5 states of a rollback lock except
+that SHARED is split out into 2 states: READ and READ_FULL and
+there is an extra RECOVER state used for wal-index reconstruction.
+
+The meanings of the various wal-index locking states are as follows:
+
+   UNLOCKED    - The wal-index is not in use.
+
+   READ        - Some prefix of the wal-index is being read. Additional
+                 wal-index information can be appended at any time. The
+                 newly appended content will be ignored by the holder of
+                 the READ lock.
+
+   READ_FULL   - The entire wal-index is being read. No new information
+                 can be added to the wal-index. The holder of a READ_FULL
+                 lock promises never to read pages from the database file
+                 that are available anywhere in the wal-index.
+
+   WRITE       - It is OK to append to the wal-index file and to adjust
+                 the header to indicate the new "last valid frame".
+
+   PENDING     - Waiting on all READ locks to clear so that a
+                 CHECKPOINT lock can be acquired.
+
+   CHECKPOINT  - It is OK to write any WAL data into the database file
+                 and zero the last valid frame field of the wal-index
+                 header. The wal-index file itself may not be changed
+                 other than to zero the last valid frame field in the
+                 header.
+
+   RECOVER     - Held during wal-index recovery. Used to prevent a
+                 race if multiple clients try to recover a wal-index at
+                 the same time.
+
+
+A particular lock manager implementation may coalesce one or more of
+the wal-index locking states, though with a reduction in concurrency.
+For example, an implementation might implement only exclusive locking,
+in which case all states would be equivalent to CHECKPOINT, meaning that
+only one reader or one writer or one checkpointer could be active at a
+time. Or, an implementation might combine READ and READ_FULL into
+a single state equivalent to READ, meaning that a writer could
+coexist with a reader, but no readers or writers could coexist with a
+checkpointer.
+
+The lock manager must obey the following rules:
+
+(1) A READ cannot coexist with CHECKPOINT.
+(2) A READ_FULL cannot coexist with WRITE.
+(3) None of WRITE, PENDING, CHECKPOINT, or RECOVER can coexist.
+
+The SQLite core will obey the next set of rules. These rules are
+assertions on the behavior of the SQLite core which might be verified
+during testing using an instrumented lock manager.
+
+(5) No part of the wal-index will be read without holding either some
+    kind of SHM lock or an EXCLUSIVE lock on the original database.
+(6) A holder of a READ_FULL will never read any page of the database
+    file that is contained anywhere in the wal-index.
+(7) No part of the wal-index other than the header will be written nor
+    will the size of the wal-index grow without holding a WRITE.
+(8) The wal-index header will not be written without holding one of
+    WRITE, CHECKPOINT, or RECOVER.
+(9) A CHECKPOINT or RECOVER must be held in order to reset the last valid
+    frame counter in the header of the wal-index back to zero.
+(10) A WRITE can only increase the last valid frame pointer in the header.
+
+The SQLite core will only ever send requests for UNLOCK, READ, WRITE,
+CHECKPOINT, or RECOVER to the lock manager.
+The SQLite core will never request a READ_FULL or PENDING lock, though
+the lock manager may deliver those locking states in response to READ
+and CHECKPOINT requests, respectively, if and only if the requested
+READ or CHECKPOINT cannot be delivered.
+
+The following are the allowed lock transitions:
+
+       Original-State     Request         New-State
+       --------------     ----------      ----------
+(11a)  UNLOCK             READ            READ
+(11b)  UNLOCK             READ            READ_FULL
+(11c)  UNLOCK             CHECKPOINT      PENDING
+(11d)  UNLOCK             CHECKPOINT      CHECKPOINT
+(11e)  READ               UNLOCK          UNLOCK
+(11f)  READ               WRITE           WRITE
+(11g)  READ               RECOVER         RECOVER
+(11h)  READ_FULL          UNLOCK          UNLOCK
+(11i)  READ_FULL          WRITE           WRITE
+(11j)  READ_FULL          RECOVER         RECOVER
+(11k)  WRITE              READ            READ
+(11l)  PENDING            UNLOCK          UNLOCK
+(11m)  PENDING            CHECKPOINT      CHECKPOINT
+(11n)  CHECKPOINT         UNLOCK          UNLOCK
+(11o)  CHECKPOINT         RECOVER         RECOVER
+(11p)  RECOVER            READ            READ
+(11q)  RECOVER            CHECKPOINT      CHECKPOINT
+
+These 17 transitions are all that need to be supported. The lock
+manager implementation can assert that fact. The other 25 possible
+transitions among the 7 locking states will never occur.
+
+The rules above are sufficient for correctness. For maximum concurrency,
+the following additional considerations apply:
diff --git a/manifest b/manifest
index 12e762b0ca..098b33ebf8 100644
--- a/manifest
+++ b/manifest
@@ -1,5 +1,8 @@
-C Add\stest\scases\sto\stest\sthe\slibraries\shandling\sof\scorrupt\swal-index\sheaders.
-D 2010-05-06T11:32:09
+-----BEGIN PGP SIGNED MESSAGE-----
+Hash: SHA1
+
+C Add\stwo\stext\sfiles\scontaining\spager\sdesign\snotes\sto\sthe\sdoc/\ssubfolder.
+D 2010-05-06T11:55:57
 F Makefile.arm-wince-mingw32ce-gcc fcd5e9cd67fe88836360bb4f9ef4cb7f8e2fb5a0
 F Makefile.in d83a0ffef3dcbfb08b410a6c6dd6c009ec9167fb
 F Makefile.linux-gcc d53183f4aa6a9192d249731c90dbdffbd2c68654
@@ -23,6 +26,8 @@ F configure 72c0ad7c8cfabbffeaf8ca61e1d24143cf857eb2 x
 F configure.ac 14740970ddb674d92a9f5da89083dff1179014ff
 F contrib/sqlitecon.tcl 210a913ad63f9f991070821e599d600bd913e0ad
 F doc/lemon.html f0f682f50210928c07e562621c3b7e8ab912a538
+F doc/pager-invariants.txt 870107036470d7c419e93768676fae2f8749cf9e
+F doc/vfs-shm.txt db230538d9d2170d838b7a79493bbc24a5c39788
 F ext/README.txt 913a7bd3f4837ab14d7e063304181787658b14e1
 F ext/async/README.txt 0c541f418b14b415212264cbaaf51c924ec62e5b
 F ext/async/sqlite3async.c 676066c2a111a8b3107aeb59bdbbbf335c348f4a
@@ -811,7 +816,14 @@ F tool/speedtest2.tcl ee2149167303ba8e95af97873c575c3e0fab58ff
 F tool/speedtest8.c 2902c46588c40b55661e471d7a86e4dd71a18224
 F tool/speedtest8inst1.c 293327bc76823f473684d589a8160bde1f52c14e
 F tool/vdbe-compress.tcl d70ea6d8a19e3571d7ab8c9b75cba86d1173ff0f
-P fbbcacb137e8f5246b88ad09331236aaa1900f60
-R c23cc19ecc9b2f6c130b1979d4249711
-U dan
-Z 7712493005da602cd936a05bd6144f71
+P 9465b267d420120c050bbe4f143ac824146a9e4a
+R 089c4fb1e4c5e2af451ed447b05621f8
+U drh
+Z 1e94a2049aa9a831f97a2f9882e08469
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v1.4.6 (GNU/Linux)
+
+iD8DBQFL4q5RoxKgR168RlERAgnEAJ9AxQEr7Uk8mFQoqD+OX/obdL89jACfTG+r
+XdjbjjPNKUjZXT97fSSLXx4=
+=ENzL
+-----END PGP SIGNATURE-----
diff --git a/manifest.uuid b/manifest.uuid
index f114d97282..24dfae9f4d 100644
--- a/manifest.uuid
+++ b/manifest.uuid
@@ -1 +1 @@
-9465b267d420120c050bbe4f143ac824146a9e4a
\ No newline at end of file
+ed817fc893e7162ae0ff4022591f7e9e3b81d622
\ No newline at end of file
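
Illustrative note (not part of the commit): doc/vfs-shm.txt says a lock manager "can assert" that only transitions (11a) through (11q) occur and that it must obey coexistence rules (1) through (3). The sketch below shows one way an instrumented lock manager used in testing might encode those two tables. It is a minimal model under stated assumptions, not SQLite code: the names STATES, TRANSITIONS, is_allowed_transition and can_coexist are hypothetical, the state that the table writes as UNLOCK is spelled UNLOCKED here to match the state list (with UNLOCK kept as the request name), and rule (3) is read as "at most one connection may hold any of WRITE, PENDING, CHECKPOINT, or RECOVER at a time".

```python
from itertools import permutations

# The 7 wal-index locking states listed in doc/vfs-shm.txt.
STATES = ("UNLOCKED", "READ", "READ_FULL", "WRITE",
          "PENDING", "CHECKPOINT", "RECOVER")

# Transitions (11a)-(11q) as (original state, request, new state).
# Assumption: "UNLOCK" in the Original-State and New-State columns
# means the UNLOCKED state.
TRANSITIONS = frozenset({
    ("UNLOCKED",   "READ",       "READ"),        # 11a
    ("UNLOCKED",   "READ",       "READ_FULL"),   # 11b
    ("UNLOCKED",   "CHECKPOINT", "PENDING"),     # 11c
    ("UNLOCKED",   "CHECKPOINT", "CHECKPOINT"),  # 11d
    ("READ",       "UNLOCK",     "UNLOCKED"),    # 11e
    ("READ",       "WRITE",      "WRITE"),       # 11f
    ("READ",       "RECOVER",    "RECOVER"),     # 11g
    ("READ_FULL",  "UNLOCK",     "UNLOCKED"),    # 11h
    ("READ_FULL",  "WRITE",      "WRITE"),       # 11i
    ("READ_FULL",  "RECOVER",    "RECOVER"),     # 11j
    ("WRITE",      "READ",       "READ"),        # 11k
    ("PENDING",    "UNLOCK",     "UNLOCKED"),    # 11l
    ("PENDING",    "CHECKPOINT", "CHECKPOINT"),  # 11m
    ("CHECKPOINT", "UNLOCK",     "UNLOCKED"),    # 11n
    ("CHECKPOINT", "RECOVER",    "RECOVER"),     # 11o
    ("RECOVER",    "READ",       "READ"),        # 11p
    ("RECOVER",    "CHECKPOINT", "CHECKPOINT"),  # 11q
})

# States covered by rule (3). Assumption: no two connections may hold
# locks from this group at once, including two locks of the same state.
EXCLUSIVE_GROUP = frozenset({"WRITE", "PENDING", "CHECKPOINT", "RECOVER"})


def is_allowed_transition(old, request, new):
    """Return True if (old, request, new) is one of (11a)-(11q)."""
    return (old, request, new) in TRANSITIONS


def can_coexist(a, b):
    """Return True if two connections may hold states a and b together."""
    pair = {a, b}
    if pair == {"READ", "CHECKPOINT"}:                    # rule (1)
        return False
    if pair == {"READ_FULL", "WRITE"}:                    # rule (2)
        return False
    if a in EXCLUSIVE_GROUP and b in EXCLUSIVE_GROUP:     # rule (3)
        return False
    return True


# State-to-state moves among distinct states that the table does not list.
listed_moves = {(old, new) for old, _request, new in TRANSITIONS}
unlisted_moves = [m for m in permutations(STATES, 2) if m not in listed_moves]
```

With 7 states there are 7 * 6 = 42 ordered pairs of distinct states, so the 17 listed transitions leave the 25 unlisted ones that the text mentions; in this model that is len(unlisted_moves). A test harness could call is_allowed_transition on every state change it observes and can_coexist on every pair of locks held concurrently, failing the test when either returns False.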