From: dan Date: Mon, 31 Jul 2017 19:55:56 +0000 (+0000) Subject: Add documentation file ./README-server-edition.html. X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=0a1c5dc414268dd8c3029c1d6148aa9ab556124c;p=thirdparty%2Fsqlite.git Add documentation file ./README-server-edition.html. FossilOrigin-Name: fbc4f4ad259cfe04b28d698ef025f179b6708ac788939976c401db2caf8f92c1 --- diff --git a/README-server-edition.html b/README-server-edition.html new file mode 100644 index 0000000000..65a5e91a8b --- /dev/null +++ b/README-server-edition.html @@ -0,0 +1,231 @@ + + +
+

The "server-process-edition" Branch

+
+ +

+The "server-process-edition" branch contains two modifications to stock SQLite +that work together to provide concurrent read/write transactions using +page-level-locking provided that: + +

+ +

+Up to 16 simultaneous read/write transactions controlled by page-level-locking +are possible. Additionally, there may be any number of read-only transactions +started using "BEGIN READONLY" commands. Read-only transactions do not block +read-write transactions, and read-write transactions do not block read-only +transactions. + +

+The two features on this branch are: +

    +
  1. An alternative layout for the database free-page list. This is intended + to reduce contention between writers when allocating new database pages, + either from the free-list or by extending the database file. + +

  2. The "server-mode" extension, which provides the read/write + page-level-locking and read-only MVCC concurrency mentioned above. +

+ + +

Alternative Free-List Format

+ +

+The alternative free-list format is very similar to the current format. It +differs in the following respects: + +

+ +

+This effectively means that a database has N free-lists instead of just one. To +allocate a free page, a writer only needs to lock one such free-list, and so up +to N transactions that allocate new pages may proceed concurrently. + +

+Allocating pages from the end of the db file still locks out all other +read/write transactions (because it involves writing to page 1, which every +transaction needs to read). To minimize the frequency with which this occurs, +when a page must be allocated from the end of the database file, the file is +extended by 2048 pages. These are distributed equally between 16 free-lists +(children of the free-list node page). Additionally, the first trunk page in +each free list is never reused. Doing so would require writing to the +free-list node page - effectively an exclusive lock on the entire +page-allocation system. + +

+The current format used for the free-list can be modified or queried using a +new pragma: + +

+  PRAGMA [database.]freelist_format;
+  PRAGMA [database.]freelist_format = 1|2;
+
+ +

+At present, the free-list format may only be modified when the free-list is +completely empty. Which, as the implementation ensures that a free-list that +uses the alternative format is never completely emptied, effectively precludes +changing the format from 2 (alternative) to 1 (legacy). + +

Page level locking - "Server Mode"

+ +

+A database client automatically enters "server mode" if (a) it is using a VFS +that takes a process-wide exclusive lock on the db file (like "unix-excl" +does), and (b) there exists a directory named "<database>-journal" in the +file system alongside the database file "<database>" There is currently no +provision for creating this directory, although it could be safely done for +a database in rollback mode using something like: + +

+  PRAGMA journal_mode = off;
+  BEGIN EXCLUSIVE;
+    <create directory>
+  END;
+
+ +

+To check the status of these two conditions, a new file-control is added - +SQLITE_FCNTL_SERVER_MODE. SQLite invokes this file-control as part of the +procedure for detecting a hot journal (after it has established that there is a +file-system entry named <database>-journal and that no other process +holds a RESERVED lock). If the VFS does support an exclusive process-wide lock +and if the directory is present, the VFS indicates that the client should enter +server mode. If the VFS does not indicate this, or if it returns +SQLITE_NOTFOUND, then SQLite proceeds with the hot-journal rollback. + +

+There is also a new file-control named SQLITE_FCNTL_FILEID, which requests a +128-bit value that uniquely identifies an open file on disk from the VFS. This +is used to ensure that all connections to the same database from within a +process use the same shared state, even if they connect to the db using +different file-system paths. + +

+The heap-memory data structure shared between all connections to the same +database is protected by a mutex. Clients take and release the mutex each +time a transaction is opened or closed, and each time a read or write lock +is taken on a specific database page. Ordinary read/write transactions lock +each page that they access - each page can support any number of concurrent +read locks or a single write lock. + +

+Write transactions use a journal file stored in the <database>-journal +directory. Journal files are named "<id>-journal", where <id> is an +integer value betwen 0 and 15, inclusive. A client may use multiple different +journal files throughout its lifetime. + +

Before database pages are overwritten in server-mode, entries are added to +an in-memory hash table containing the old page content. These entries are +used by read-only transactions to ensure that they access a consistent snapshot +of the database. Hash table entries are automatically removed when they are +no longer required. + +

+It is not difficult to extend the kind of page level locking used by read/write +transactions to clients in multiple processes. It might be more difficult to +extend the read-only MVCC capability though. + +

Performance Test

+ +

+The test uses a single table with the following schema: + +

+  CREATE TABLE t1(a INTEGER PRIMARY KEY, b BLOB(16), c BLOB(16), d BLOB(400));
+  CREATE INDEX i1 ON t1(b);
+  CREATE INDEX i2 ON t1(c);
+
+ +

+The database initially contains 5,000,000 rows. Values for column "a" are +between 1 and 5,000,000, inclusive. Other columns are populated with randomly +generated blob values, each 16, 16, and 400 bytes in size, respectively. + +

+Read/write transactions used by the test take the following form. Each such +transaction modifies approximately 25 pages (5 in the main table and 10 in each +index), not accounting for tree rebalancing operations. + +

+  BEGIN;
+    REPLACE INTO t1 VALUES(abs(random() % 5000000), randomblob(16), randomblob(16), randomblob(400));
+    REPLACE INTO t1 VALUES(abs(random() % 5000000), randomblob(16), randomblob(16), randomblob(400));
+    REPLACE INTO t1 VALUES(abs(random() % 5000000), randomblob(16), randomblob(16), randomblob(400));
+    REPLACE INTO t1 VALUES(abs(random() % 5000000), randomblob(16), randomblob(16), randomblob(400));
+    REPLACE INTO t1 VALUES(abs(random() % 5000000), randomblob(16), randomblob(16), randomblob(400));
+  COMMIT;
+
+ +

+Read-only transactions are as follows: + +

+  BEGIN READONLY;
+    SELECT * FROM t1 WHERE a>abs((random()%5000000)) LIMIT 10;
+    SELECT * FROM t1 WHERE a>abs((random()%5000000)) LIMIT 10;
+    SELECT * FROM t1 WHERE a>abs((random()%5000000)) LIMIT 10;
+    SELECT * FROM t1 WHERE a>abs((random()%5000000)) LIMIT 10;
+    SELECT * FROM t1 WHERE a>abs((random()%5000000)) LIMIT 10;
+  END;
+
+ +

+The performance test features one or more clients executing read/write +transactions as fast as possible, and zero or more clients executing read-only +transactions, also as fast as possible. All tests use the "unix-excl" VFS and +all clients execute in a separate thread within the same process. The database +is located on a tmpfs file-system. + +

+In the table below "rw:" refers to the number of read-write clients, and "ro:" +the number of read-only clients used by the test. The TPS values in brackets +are the number of read-only transactions per second. All other values are +read-write transactions per second. + +

+The collision rate (percentage of attempted transactions that failed due to a +page-level locking conflict) in all tests was between 1 and 2%. Failed +transactions are not included in the TPS counts below. + +

+ + + +
Configuration TPS per client TPS total +
3.20.0, journal_mode=persist, rw:1, ro:0 6886 6886 +
3.20.0, journal_mode=wal, rw:1, ro:0 5997 5997 +
begin-concurrent, rw:1, ro:05925 5925 +
begin-concurrent, rw:2, ro:04172 8345 +
begin-concurrent, rw:3, ro:02943 8831 +
begin-concurrent, rw:2, ro:13781 7562 (17473) +
server-mode, rw:1, ro:0 6333 6333 +
server-mode, rw:2, ro:0 5693 11386 +
server-mode, rw:3, ro:05132 15398 +
server-mode, rw:2, ro:15303 10607 (28300) +
+ diff --git a/manifest b/manifest index e2abab2c19..aa980ac04d 100644 --- a/manifest +++ b/manifest @@ -1,8 +1,9 @@ -C Merge\slatest\strunk\schanges\swith\sthis\sbranch. -D 2017-07-29T17:10:37.200 +C Add\sdocumentation\sfile\s./README-server-edition.html. +D 2017-07-31T19:55:56.996 F Makefile.in d9873c9925917cca9990ee24be17eb9613a668012c85a343aef7e5536ae266e8 F Makefile.linux-gcc 7bc79876b875010e8c8f9502eb935ca92aa3c434 F Makefile.msc 02b469e9dcd5b7ee63fc1fb05babc174260ee4cfa4e0ef2e48c3c6801567a016 +F README-server-edition.html 0c6bc6f55191b6900595fe37470bbe5772953ab5c64dae967d07a5d58a0c3508 F README.md f5c87359573c4d255425e588a56554b50fdcc2afba4e017a2e02a43701456afd F VERSION f81232df28e2d3ff049feefad5fbd5489cc33697f6bd2ecf61af7f0dde3b83d0 F aclocal.m4 a5c22d164aff7ed549d53a90fa56d56955281f50 @@ -1646,7 +1647,7 @@ F vsixtest/vsixtest.tcl 6a9a6ab600c25a91a7acc6293828957a386a8a93 F vsixtest/vsixtest.vcxproj.data 2ed517e100c66dc455b492e1a33350c1b20fbcdc F vsixtest/vsixtest.vcxproj.filters 37e51ffedcdb064aad6ff33b6148725226cd608e F vsixtest/vsixtest_TemporaryKey.pfx e5b1b036facdb453873e7084e1cae9102ccc67a0 -P 8299bdb7cbede30c665dda131bdcbd1d260b3ae9bd16d9b414d8c3776b08f1b3 bcec155e0d6c6b17ae09d5a366c080723d01ff40dbc1a0ad0bb669a91db1b850 -R 77ed1cc2fd3110f4b8329af1b4d34a6f +P b42c87790e2dfdc4e52c42967257decf4b44ded909403ea4ac47fe41acfd7077 +R 0649fc2cf3ba1c03f8e547dfbc16afc9 U dan -Z 4686c3424ce70d61f90ed6c1d0cd236d +Z c48da8fd6893915c8bae54b12ad68f68 diff --git a/manifest.uuid b/manifest.uuid index 7741847729..16ed4d7fdd 100644 --- a/manifest.uuid +++ b/manifest.uuid @@ -1 +1 @@ -b42c87790e2dfdc4e52c42967257decf4b44ded909403ea4ac47fe41acfd7077 \ No newline at end of file +fbc4f4ad259cfe04b28d698ef025f179b6708ac788939976c401db2caf8f92c1 \ No newline at end of file