From: dan Date: Fri, 30 Mar 2018 20:42:25 +0000 (+0000) Subject: Update and add further detail to README-server-edition.html. X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=bddcb53614cb8036f42d2036ba3e1e92d98cd5d7;p=thirdparty%2Fsqlite.git Update and add further detail to README-server-edition.html. FossilOrigin-Name: 337a0b67e30f1030fdc59f712e5914f4801b0e9e4ae19a1e82c10b73eb3f4773 --- diff --git a/README-server-edition.html b/README-server-edition.html index 65a5e91a8b..d6eff66323 100644 --- a/README-server-edition.html +++ b/README-server-edition.html @@ -5,35 +5,48 @@

-The "server-process-edition" branch contains two modifications to stock SQLite -that work together to provide concurrent read/write transactions using -page-level-locking provided that: +The "server-process-edition" branch contains two modifications to stock +SQLite that work together to provide concurrent read/write transactions +using pessimistic page-level-locking. The system runs in two modes:

+

The system is designed to be most efficient when used with + + "PRAGMA synchronous=OFF", although it does not require this. +

Up to 16 simultaneous read/write transactions controlled by page-level-locking -are possible. Additionally, there may be any number of read-only transactions -started using "BEGIN READONLY" commands. Read-only transactions do not block -read-write transactions, and read-write transactions do not block read-only -transactions. +are possible. Additionally, in single-process mode there may be any number of +read-only transactions started using the "BEGIN READONLY" command. Read-only +transactions do not block read-write transactions, and read-write transactions +do not block read-only transactions. Read-only transactions access a consistent +snapshot of the database - writes committed by other clients after the +transaction has started are never visible to read-only transactions. In +multi-process mode, the "BEGIN READONLY" command is equivalent to a stock +"BEGIN".

The two features on this branch are:

    -
  1. An alternative layout for the database free-page list. This is intended - to reduce contention between writers when allocating new database pages, - either from the free-list or by extending the database file. - -

  2. The "server-mode" extension, which provides the read/write - page-level-locking and read-only MVCC concurrency mentioned above. +

  1. An + alternative layout for the database free-page list. + This is intended to reduce contention between writers when allocating + new database pages, either from the free-list or by extending the + database file. + +

  2. The "server-mode" extension, which + provides read/write page-level-locking concurrency and (in + single-process mode) read-only MVCC concurrency mentioned above.

-

Alternative Free-List Format

+

1.0 Alternative Free-List Format

The alternative free-list format is very similar to the current format. It @@ -80,15 +93,19 @@ completely empty. Which, as the implementation ensures that a free-list that uses the alternative format is never completely emptied, effectively precludes changing the format from 2 (alternative) to 1 (legacy). -

Page level locking - "Server Mode"

+

+For databases that use the "alternative" free-list format, the read and write +versions in the database header (byte offsets 18 and 19) are set to 3 for +rollback mode or 4 for wal mode (instead of 1 and 2 respectively). + +
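The header check described above can be sketched in C. This is a minimal illustration, not code from the branch: the enum and function names are hypothetical, and it assumes that both version bytes carry the same class of value (1 or 2 for the legacy format, 3 or 4 for the alternative format).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; not identifiers from the SQLite source. */
enum FreeListFormat {
  FORMAT_UNKNOWN = 0,
  FORMAT_LEGACY = 1,        /* version bytes are 1 (rollback) or 2 (wal) */
  FORMAT_ALTERNATIVE = 2    /* version bytes are 3 (rollback) or 4 (wal) */
};

/* Return non-zero if v is one of the two version values a or b. */
static int versionIsOneOf(unsigned char v, unsigned char a, unsigned char b){
  return v==a || v==b;
}

/* Classify a database header by the version bytes at offsets 18 and 19.
** aHdr must point to at least the first 20 bytes of the database file.
** Assumption: both bytes must agree on the format class. */
static enum FreeListFormat headerFreeListFormat(const unsigned char *aHdr){
  assert( aHdr!=NULL );
  if( versionIsOneOf(aHdr[18], 1, 2) && versionIsOneOf(aHdr[19], 1, 2) ){
    return FORMAT_LEGACY;
  }
  if( versionIsOneOf(aHdr[18], 3, 4) && versionIsOneOf(aHdr[19], 3, 4) ){
    return FORMAT_ALTERNATIVE;
  }
  return FORMAT_UNKNOWN;
}
```

A caller would read the first 20 bytes of the database file into a buffer and pass it to headerFreeListFormat().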

2.0 Page level locking - "Server Mode"

-A database client automatically enters "server mode" if (a) it is using a VFS -that takes a process-wide exclusive lock on the db file (like "unix-excl" -does), and (b) there exists a directory named "<database>-journal" in the -file system alongside the database file "<database>" There is currently no -provision for creating this directory, although it could be safely done for -a database in rollback mode using something like: +A database client automatically enters "server mode" if there exists a +directory named "<database>-journal" in the file system alongside +the database file "<database>". There is currently no provision for +creating this directory, although it could be safely done for a database in +rollback mode using something like:

   PRAGMA journal_mode = off;
@@ -97,49 +114,238 @@ a database in rollback mode using something like:
   END;
 
-

-To check the status of these two conditions, a new file-control is added - -SQLITE_FCNTL_SERVER_MODE. SQLite invokes this file-control as part of the -procedure for detecting a hot journal (after it has established that there is a -file-system entry named <database>-journal and that no other process -holds a RESERVED lock). If the VFS does support an exclusive process-wide lock -and if the directory is present, the VFS indicates that the client should enter -server mode. If the VFS does not indicate this, or if it returns -SQLITE_NOTFOUND, then SQLite proceeds with the hot-journal rollback. +

As well as signalling new clients that they should enter server-mode, +creating a directory named "<database>-journal" has the helpful +side-effect of preventing legacy clients from accessing the database file at +all. + +

If the VFS is one that takes an exclusive lock on the db file (to +guarantee that no other process accesses the db file), then the system +automatically enters single-process mode. Otherwise, multi-process mode. + +

In both single and multi-process modes, page-level-locking is managed +by allocating a fixed-size array of "locking slots". Each locking slot is +32-bits in size. By default, the array contains 262144 (2^18) slots. Pages are +assigned to locking slots using the formula (pgno % 262144) - so pages 1, +262145, 524289 etc. share a single locking slot. + +
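The slot assignment formula above can be written as a small C helper. This is only a sketch: the macro and function names are hypothetical, and the slot count is the default quoted in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Default number of locking slots (2^18). The macro name is hypothetical. */
#define N_LOCKING_SLOT 262144

/* Map a page number to the index of its locking slot using
** (pgno % 262144). Page numbers start at 1. */
static uint32_t lockingSlotIndex(uint32_t pgno){
  assert( pgno>=1 );
  return pgno % N_LOCKING_SLOT;
}
```

With this mapping, lockingSlotIndex(1), lockingSlotIndex(262145) and lockingSlotIndex(524289) all return the same index, so a lock on any one of those pages conflicts with locks on the others.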

In single-process mode, the array of locking slots is allocated on +the process heap and access is protected by a mutex. In multi-process mode, it +is created by memory-mapping a file on disk (similar to the *-shm file in +SQLite wal mode) and access is performed using +atomic CAS + primitives exclusively. + +
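To illustrate what updating a 32-bit locking slot with compare-and-swap might look like, here is a minimal sketch using C11 atomics. It is not the code in src/server.c: the function names are hypothetical, and the bit layout is deliberately left to the caller, who passes in the mask of conflicting bits and the mask of bits to set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: atomically set the bits in setMask, but only if
** none of the bits in conflictMask are currently set. Returns 1 if the
** bits were set, or 0 if a conflicting bit was found. */
static int slotTrySetBits(_Atomic uint32_t *pSlot,
                          uint32_t conflictMask, uint32_t setMask){
  uint32_t v;
  assert( pSlot!=0 );
  v = atomic_load(pSlot);
  do{
    if( v & conflictMask ) return 0;      /* conflicting lock is held */
    /* On failure the CAS reloads v with the current slot value. */
  }while( !atomic_compare_exchange_weak(pSlot, &v, v | setMask) );
  return 1;
}

/* Hypothetical helper: atomically clear the bits in clearMask, again
** using only a CAS loop. */
static void slotClearBits(_Atomic uint32_t *pSlot, uint32_t clearMask){
  uint32_t v;
  assert( pSlot!=0 );
  v = atomic_load(pSlot);
  while( !atomic_compare_exchange_weak(pSlot, &v, v & ~clearMask) ){
    /* v now holds the latest value; retry */
  }
}
```

Because the check and the update happen in a single compare-and-swap, two clients racing for conflicting bits cannot both succeed. This is the property a mutex provides in single-process mode.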

Each time a read/write transaction is opened, the client assumes a client +id between 0 and 15 for the duration of the transaction. Client ids are unique +at any point in time - concurrently executing transactions must use different +client ids. So there may exist a maximum of 16 concurrent read/write +transactions at any one time. + +
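One simple way to hand out unique client ids in the range 0 to 15 is a 16-bit in-use mask. The sketch below is an assumption for illustration only; the names are hypothetical, and it assumes the caller already holds whatever mutex protects the mask.

```c
#include <assert.h>
#include <stdint.h>

#define N_CLIENT 16   /* maximum concurrent read/write transactions */

/* Hypothetical helper: claim the lowest free client id recorded in
** *pInUse. Returns the id (0..15), or -1 if all 16 ids are in use.
** Not thread-safe on its own; the caller is assumed to hold a mutex. */
static int clientIdAcquire(uint16_t *pInUse){
  int i;
  for(i=0; i<N_CLIENT; i++){
    if( (*pInUse & (1u<<i))==0 ){
      *pInUse |= (uint16_t)(1u<<i);
      return i;
    }
  }
  return -1;
}

/* Hypothetical helper: release a client id when its transaction ends. */
static void clientIdRelease(uint16_t *pInUse, int id){
  assert( id>=0 && id<N_CLIENT );
  *pInUse &= (uint16_t)~(1u<<id);
}
```

When clientIdAcquire() returns -1, the seventeenth concurrent read/write transaction cannot start until another one finishes and releases its id.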

Read/write transactions in server-mode are similar to regular SQLite +transactions in rollback mode. The most significant differences are that: -

-There is also a new file-control named SQLITE_FCNTL_FILEID, which requests a -128-bit value that uniquely identifies an open file on disk from the VFS. This -is used to ensure that all connections to the same database from within a -process use the same shared state, even if they connect to the db using -different file-system paths. +

-

-Write transactions use a journal file stored in the <database>-journal -directory. Journal files are named "<id>-journal", where <id> is an -integer value between 0 and 15, inclusive. A client may use multiple different -journal files throughout its lifetime. +

Each locking slot is 32-bits in size. A locking slot may simultaneously +support a single write-lock, up to 16 read-locks from read/write clients, and +(in single process mode) up to 1024 read-locks from "BEGIN READONLY" clients. +Locking slot bits are used as follows: + +

-

Before database pages are overwritten in server-mode, entries are added to -an in-memory hash table containing the old page content. These entries are -used by read-only transactions to ensure that they access a consistent snapshot -of the database. Hash table entries are automatically removed when they are -no longer required. +

Currently, if a client requests a lock that cannot be granted due to +a conflicting lock, SQLITE_BUSY is returned to the caller and either the +entire transaction or statement transaction must be rolled back. See +Problems and Issues below for more details. -

-It is not difficult to extend the kind of page level locking used by read/write -transactions to clients in multiple processes. It might be more difficult to -extend the read-only MVCC capability though. +

2.1 Single-Process Mode

+ +

Single process mode is simpler than multi-process mode because it does +not have to deal with runtime client failure - it is assumed that if one +client fails mid-transaction the entire process crashes. As a result the +only time hot-journal rollback is required in single-process mode is as +part of startup. The first client to connect to a database in single-process +mode attempts to open and roll back all 16 potential hot journal files. + +

But, in order to support non-blocking "BEGIN READONLY" transactions, it is +also in some ways more complicated than multi-process mode. "BEGIN READONLY" +support works as follows: + +

+ +

2.2 Multi-Process Mode

+ +

Multi-process mode differs from single-process mode in two important ways: + +

+ +

Unlike single-process mode clients, which may be assigned a different +client-id for each transaction, clients in multi-process mode are assigned a +client-id when they connect to the database and do not relinquish it until +they disconnect. As such, a database in multi-process server-mode supports +at most 16 concurrent client connections. + +

As well as the array of locking slots, the shared-memory mapping used +by clients in multi-process mode contains 16 "client slots". When a client +connects, it takes a POSIX WRITE lock on the client slot that corresponds +to its client id. This lock is not released until the client disconnects. +Additionally, whenever a client starts a transaction, it sets the value +in its client locking slot to 1, and clears it again after the transaction +is concluded. + +

This assists with handling client failure mid-transaction in two ways: + +

+ +

2.3 Required VFS Support

+ +

The server-mode extension requires that the VFS support various special +file-control commands. Currently support is limited to the "unix" VFS. + +

+
SQLITE_FCNTL_SERVER_MODE +

This is used by SQLite to query the VFS as to whether the + connection should use single-process server-mode, multi-process server-mode, + or continue in legacy mode. + +

SQLite invokes this file-control as part of the procedure for detecting a + hot journal (after it has established that there is a file-system entry named + <database>-journal and that no other process holds a RESERVED lock). + If the <database>-journal directory is present in the file-system and + the current VFS takes an exclusive lock on the database file (i.e. is + "unix-excl"), then this file-control indicates that the connection should use + single-process server-mode. Or, if the directory exists but the VFS does not + take an exclusive lock on the database file, that the connection should use + multi-process server-mode. Or, if there is no directory of the required name, + that the connection should use legacy mode. + +

SQLITE_FCNTL_FILEID +

Return a 128-bit value that uniquely identifies an open file on disk + from the VFS. This is used to ensure that all connections to the same + database from within a process use the same shared state, even if they + connect to the db using different file-system paths. + +

SQLITE_FCNTL_SHMOPEN +
+ +
SQLITE_FCNTL_SHMOPEN2 +
+ +
SQLITE_FCNTL_SHMLOCK +
+ +
SQLITE_FCNTL_SHMCLOSE +
+
+ + +
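The three outcomes described for SQLITE_FCNTL_SERVER_MODE reduce to a small decision function. The sketch below only restates that rule in C. The enum, the function and its two boolean inputs are hypothetical, and how a real VFS determines those inputs is not shown.

```c
#include <stdbool.h>

/* Hypothetical names; not the values used by the actual file-control. */
enum ServerModeKind {
  MODE_LEGACY,           /* no "<database>-journal" directory */
  MODE_SINGLE_PROCESS,   /* directory exists, VFS holds an exclusive lock */
  MODE_MULTI_PROCESS     /* directory exists, VFS is not exclusive */
};

/* Choose the mode from the two conditions named in the text:
** whether the "<database>-journal" directory exists, and whether the
** VFS takes an exclusive lock on the database file (like "unix-excl"). */
static enum ServerModeKind chooseServerMode(bool journalDirExists,
                                            bool vfsIsExclusive){
  if( !journalDirExists ) return MODE_LEGACY;
  return vfsIsExclusive ? MODE_SINGLE_PROCESS : MODE_MULTI_PROCESS;
}
```

Note that the exclusivity of the VFS only matters once the directory exists. Without the directory the connection stays in legacy mode regardless of the VFS.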

3.0 Problems and Issues

+ + -

Performance Test

+

4.0 Performance Test

The test uses a single table with the following schema: diff --git a/manifest b/manifest index 988be04004..2f4c8bc992 100644 --- a/manifest +++ b/manifest @@ -1,11 +1,11 @@ -C Update\sthis\sbranch\swith\slatest\strunk\schanges. -D 2018-03-28T15:41:57.525 +C Update\sand\sadd\sfurther\sdetail\sto\sREADME-server-edition.html. +D 2018-03-30T20:42:25.654 F .fossil-settings/empty-dirs dbb81e8fc0401ac46a1491ab34a7f2c7c0452f2f06b54ebb845d024ca8283ef1 F .fossil-settings/ignore-glob 35175cdfcf539b2318cb04a9901442804be81cd677d8b889fcc9149c21f239ea F Makefile.in 7016fc56c6b9bfe5daac4f34be8be38d8c0b5fab79ccbfb764d3b23bf1c6fff3 F Makefile.linux-gcc 7bc79876b875010e8c8f9502eb935ca92aa3c434 F Makefile.msc bdcad21b027a56a73e54a1121cfb9edd0a35c0abfa53aa12c2f996006ff99960 -F README-server-edition.html 0c6bc6f55191b6900595fe37470bbe5772953ab5c64dae967d07a5d58a0c3508 +F README-server-edition.html 2065bc7f89b84ec9e4199aeae3786399a3bb88cd8ed3f7398067d010d7c4cf8b F README.md 1d5342ebda97420f114283e604e5fe99b0da939d63b76d492eabbaae23488276 F VERSION cdf91ac446255ecf3d8f6d8c3ee40d64123235ae5b3cef29d344e61b45ec3759 F aclocal.m4 a5c22d164aff7ed549d53a90fa56d56955281f50 @@ -493,7 +493,7 @@ F src/random.c 80f5d666f23feb3e6665a6ce04c7197212a88384 F src/resolve.c 66c73fcb7719b8ff0e841b58338f13604ff3e2b50a723f9b8f383595735262f6 F src/rowset.c 7b7e7e479212e65b723bf40128c7b36dc5afdfac F src/select.c e51efe5479d1cb4f48defe0b97cdba7391df42a755ba9592b9159510d03cf738 -F src/server.c 9af69ec201823023bfa6f52b2b8262611f2e14698cb7d5e79e7791f0e7fd7139 +F src/server.c 70421e6acbb2279878606be160b45c7db78933d6ec320317a2e939218496deb9 F src/server.h f46be129ffe407cac9b7018e6d4851b04e685d59b6837c73a1fb69e6aab52e3a F src/shell.c.in d6a07811aa9f3b10200c15ab8dd4b6b998849a3b0c8b125bfa980329a33c26a6 F src/sqlite.h.in 45150a75c20ad6f9d914cd6e59caf36453206b0f824d514f194b56236f2d63d7 @@ -1729,7 +1729,7 @@ F vsixtest/vsixtest.tcl 6a9a6ab600c25a91a7acc6293828957a386a8a93 F vsixtest/vsixtest.vcxproj.data 
2ed517e100c66dc455b492e1a33350c1b20fbcdc F vsixtest/vsixtest.vcxproj.filters 37e51ffedcdb064aad6ff33b6148725226cd608e F vsixtest/vsixtest_TemporaryKey.pfx e5b1b036facdb453873e7084e1cae9102ccc67a0 -P 1b3df8ffc551df0e4d8bcb633e1549ba769d8866cfcffea4cb62949ecf5c4c99 d282f064698782cf7b584138549a6b27befa0b945ae96b52a3ef6f8a13448077 -R fc0391b911f4aaa8e82b08fa65e12c60 +P df52e89fff54dbb650cd1fb2b7afe0467acea96a0056728ef48e0c3fea40eeb2 +R 7d4ea8fb7a75fedab49e62a0da9de0ed U dan -Z 0026b429d274d1865a2fb3f0ce193a91 +Z c5379dcb0a548050240d2cfa8c04d0d8 diff --git a/manifest.uuid b/manifest.uuid index 0fb2380e0b..6ea9d6f82b 100644 --- a/manifest.uuid +++ b/manifest.uuid @@ -1 +1 @@ -df52e89fff54dbb650cd1fb2b7afe0467acea96a0056728ef48e0c3fea40eeb2 \ No newline at end of file +337a0b67e30f1030fdc59f712e5914f4801b0e9e4ae19a1e82c10b73eb3f4773 \ No newline at end of file diff --git a/src/server.c b/src/server.c index 7e663d599b..e01e2e74f2 100644 --- a/src/server.c +++ b/src/server.c @@ -140,6 +140,9 @@ struct Server { Server *pNext; /* Next in pCommit or pReader list */ }; +/* +** Global variables used by this module. +*/ struct ServerGlobal { ServerDb *pDb; /* Linked list of all ServerDb objects */ }; @@ -161,10 +164,6 @@ typedef struct ServerFcntlArg ServerFcntlArg; #define SERVER_TRANS_READONLY 1 #define SERVER_TRANS_READWRITE 2 -#define SERVER_WRITE_LOCK 3 -#define SERVER_READ_LOCK 2 -#define SERVER_NO_LOCK 1 - /* ** Global mutex functions used by code in this file. */ @@ -224,12 +223,20 @@ static int serverFindDatabase(Server *pNew, i64 *aFileId){ return rc; } +/* +** Roll back journal iClient. This is a hot-journal rollback - the +** connection passed as the first argument does not currently have an +** open transaction that uses the journal (although it may have an +** open transaction that uses some other journal). 
+*/ static int serverClientRollback(Server *p, int iClient){ ServerDb *pDb = p->pDb; ServerJournal *pJ = &pDb->aJrnl[iClient]; int bExist = 1; int rc = SQLITE_OK; + /* If it exists on disk but is not already open, open the + ** journal file in question. */ if( fdOpen(pJ->jfd)==0 ){ bExist = 0; rc = sqlite3OsAccess(pDb->pVfs, pJ->zJournal, SQLITE_ACCESS_EXISTS,&bExist);