Overview
Comment:      Bring this branch into closer alignment with begin-concurrent.
SHA3-256:     6433d366175c5be586a0e11b540e76a2
User & Date:  drh 2019-03-26 13:28:15.115
Context
2019-04-15
15:07  Merge the latest enhancements from trunk. (check-in: 6caaaf21c9 user: drh tags: begin-concurrent-pnu)
2019-03-26
13:28  Bring this branch into closer alignment with begin-concurrent. (check-in: 6433d36617 user: drh tags: begin-concurrent-pnu)
12:16  Merge the latest trunk changes. (check-in: 51e3e83549 user: drh tags: begin-concurrent-pnu)
Changes
Added doc/begin_concurrent.md.
Begin Concurrent
================

## Overview

Usually, SQLite allows at most one writer to proceed concurrently. The
BEGIN CONCURRENT enhancement allows multiple writers to process write
transactions simultaneously if the database is in "wal" or "wal2" mode,
although the system still serializes COMMIT commands.

When a write-transaction is opened with "BEGIN CONCURRENT", actually locking
the database is deferred until a COMMIT is executed. This means that any
number of transactions started with BEGIN CONCURRENT may proceed concurrently.
The system uses optimistic page-level-locking to prevent conflicting
concurrent transactions from being committed.

When a BEGIN CONCURRENT transaction is committed, the system checks whether
or not any of the database pages that the transaction has read have been
modified since the BEGIN CONCURRENT was opened. In other words, it asks if
the transaction being committed operates on a different set of data than all
other concurrently executing transactions. If the answer is "yes, this
transaction did not read or modify any data modified by any concurrent
transaction", then the transaction is committed as normal. Otherwise, if the
transaction does conflict, it cannot be committed and an SQLITE_BUSY_SNAPSHOT
error is returned. At this point, all the client can do is ROLLBACK the
transaction.

If SQLITE_BUSY_SNAPSHOT is returned, messages are output via the sqlite3_log
mechanism indicating the page and table or index on which the conflict
occurred. This can be useful when optimizing concurrency.

## Application Programming Notes

In order to serialize COMMIT processing, SQLite takes a lock on the database
as part of each COMMIT command and releases it before returning. At most one
writer may hold this lock at any one time. If a writer cannot obtain the
lock, it uses SQLite's busy-handler to pause and retry for a while:

  <a href=https://www.sqlite.org/c3ref/busy_handler.html>
  https://www.sqlite.org/c3ref/busy_handler.html
  </a>

If there is significant contention for the writer lock, this mechanism can be
inefficient. In this case it is better for the application to use a mutex or
some other mechanism that supports blocking to ensure that at most one writer
is attempting to COMMIT a BEGIN CONCURRENT transaction at a time. This is
usually easier if all writers are part of the same operating system process.

If all database clients (readers and writers) are located in the same OS
process, and if that OS is a Unix variant, then it can be more efficient to
use the built-in VFS "unix-excl" instead of the default "unix". This is
because it uses more efficient locking primitives.

The key to maximizing concurrency using BEGIN CONCURRENT is to ensure that
there are a large number of non-conflicting transactions. In SQLite, each
table and each index is stored as a separate b-tree, each of which is
distributed over a discrete set of database pages. This means that:

  * Two transactions that write to different sets of tables never conflict,
    and that

  * Two transactions that write to the same tables or indexes only conflict
    if the values of the keys (either primary keys or indexed rows) are
    fairly close together.

For example, given a large table with the schema:

  <pre>CREATE TABLE t1(a INTEGER PRIMARY KEY, b BLOB);</pre>

writing two rows with adjacent values for "a" probably will cause a conflict
(as the two keys are stored on the same page), but writing two rows with
vastly different values for "a" will not (as the keys will likely be stored
on different pages).

Note that, in SQLite, if values are not explicitly supplied for an INTEGER
PRIMARY KEY, as for example in:

  > INSERT INTO t1(b) VALUES(<blob-value>);

then monotonically increasing values are assigned automatically. This is
terrible for concurrency, as it all but ensures that all new rows are added
to the same database page. In such situations, it is better to explicitly
assign random values to INTEGER PRIMARY KEY fields.

This problem also comes up for non-WITHOUT ROWID tables that do not have an
explicit INTEGER PRIMARY KEY column. In these cases each table has an
implicit INTEGER PRIMARY KEY column that is assigned increasing values,
leading to the same problem as omitting to assign a value to an explicit
INTEGER PRIMARY KEY column.

For both explicit and implicit INTEGER PRIMARY KEYs, it is possible to have
SQLite assign values at random (instead of the monotonically increasing
values) by writing a row with a rowid equal to the largest possible signed
64-bit integer to the table. For example:

  INSERT INTO t1(a) VALUES(9223372036854775807);

Applications should take care not to malfunction due to the presence of such
rows.

The nature of some types of indexes, for example indexes on timestamp fields,
can also cause problems (as concurrent transactions may assign similar
timestamps that will be stored on the same db page to new records). In these
cases the database schema may need to be rethought to increase the
concurrency provided by page-level-locking.
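The document above recommends retrying after SQLITE_BUSY_SNAPSHOT and
assigning random values to INTEGER PRIMARY KEY columns. The C fragment below
is a minimal sketch of that usage pattern only and is not part of this
check-in: it assumes a build that includes the BEGIN CONCURRENT enhancement
(such as this branch), a connection whose database is already in wal or wal2
mode, and the example table t1 from the document; the function name
insertWithRetry and the five-attempt limit are illustrative.

  #include "sqlite3.h"

  /*
  ** Illustrative sketch only (not from this check-in). Insert one row into
  ** "t1" inside a BEGIN CONCURRENT transaction, retrying a few times if the
  ** transaction conflicts with another writer. A conflict surfaces as
  ** SQLITE_BUSY_SNAPSHOT, whose primary result code is SQLITE_BUSY.
  **
  ** Assumed: CREATE TABLE t1(a INTEGER PRIMARY KEY, b BLOB) exists and the
  ** database is already in wal (or wal2) mode.
  */
  int insertWithRetry(sqlite3 *db){
    int rc = SQLITE_OK;
    int nAttempt;

    for(nAttempt=0; nAttempt<5; nAttempt++){
      sqlite3_int64 iKey;
      char *zSql;

      rc = sqlite3_exec(db, "BEGIN CONCURRENT", 0, 0, 0);
      if( rc!=SQLITE_OK ) break;

      /* Use a random INTEGER PRIMARY KEY value so that concurrent writers
      ** are unlikely to touch the same b-tree page. Key collisions are not
      ** handled here. */
      sqlite3_randomness(sizeof(iKey), &iKey);
      zSql = sqlite3_mprintf(
          "INSERT INTO t1(a, b) VALUES(%lld, randomblob(100))", iKey);
      if( zSql==0 ){
        rc = SQLITE_NOMEM;
      }else{
        rc = sqlite3_exec(db, zSql, 0, 0, 0);
        sqlite3_free(zSql);
      }

      if( rc==SQLITE_OK ){
        rc = sqlite3_exec(db, "COMMIT", 0, 0, 0);
        if( rc==SQLITE_OK ) break;   /* committed successfully */
      }

      /* The write or the COMMIT failed. The only option now is ROLLBACK.
      ** If the failure was a conflict (SQLITE_BUSY / SQLITE_BUSY_SNAPSHOT),
      ** run the whole transaction again; otherwise give up. */
      sqlite3_exec(db, "ROLLBACK", 0, 0, 0);
      if( (rc & 0xFF)!=SQLITE_BUSY ) break;
    }
    return rc;
  }

Wrapping the retry loop around the whole transaction matters because, as
described above, once COMMIT fails with SQLITE_BUSY_SNAPSHOT the only option
is to ROLLBACK and re-execute every statement of the transaction.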
Changes to src/btree.c.
︙

(new text of the hunk; the setMempageRoot() call is the line added here)

  pgno = get4byte(pRight);
  while( 1 ){
    rc = getAndInitPage(pBt, pgno, &apOld[i], 0, 0);
    if( rc ){
      memset(apOld, 0, (i+1)*sizeof(MemPage*));
      goto balance_cleanup;
    }
    setMempageRoot(apOld[i], pgnoRoot);
    if( apOld[i]->nFree<0 ){
      rc = btreeComputeFreeSpace(apOld[i]);
      if( rc ){
        memset(apOld, 0, (i)*sizeof(MemPage*));
        goto balance_cleanup;
      }
    }
    if( (i--)==0 ) break;

    if( pParent->nOverflow && i+nxDiv==pParent->aiOvfl[0] ){
      apDiv[i] = pParent->apOvfl[0];
      pgno = get4byte(apDiv[i]);
      szNew[i] = pParent->xCellSize(pParent, apDiv[i]);
      pParent->nOverflow = 0;

︙
Changes to src/pager.c.
︙

(new text of the hunk; the sqlite3WalUndo() call now passes an extra argument
indicating whether the transaction being rolled back is a CONCURRENT
transaction)

  ** been written (but not committed) to the log file, do one of the
  ** following:
  **
  **   + Discard the cached page (if refcount==0), or
  **   + Reload page content from the database (if refcount>0).
  */
  pPager->dbSize = pPager->dbOrigSize;
  rc = sqlite3WalUndo(pPager->pWal, pagerUndoCallback, (void *)pPager,
#ifdef SQLITE_OMIT_CONCURRENT
      0
#else
      pPager->pAllRead!=0
#endif
  );
  pList = sqlite3PcacheDirtyList(pPager->pPCache);

#ifndef SQLITE_OMIT_CONCURRENT
  /* If this is an CONCURRENT transaction, then page 1 must be reread from
  ** the db file, even if it is not dirty. This is because the b-tree layer
  ** may have already zeroed the nFree and iTrunk header fields. */
  if( rc==SQLITE_OK && (pList==0 || pList->pgno!=1) && pPager->pAllRead ){

︙
Changes to src/sqliteInt.h.
︙

(new text of the hunk; the SQLITE_NoopUpdate flag is the line added here)

#define SQLITE_Fts3Tokenizer  0x00400000 /* Enable fts3_tokenizer(2) */
#define SQLITE_EnableQPSG     0x00800000 /* Query Planner Stability Guarantee*/
#define SQLITE_TriggerEQP     0x01000000 /* Show trigger EXPLAIN QUERY PLAN */
#define SQLITE_ResetDatabase  0x02000000 /* Reset the database */
#define SQLITE_LegacyAlter    0x04000000 /* Legacy ALTER TABLE behaviour */
#define SQLITE_NoSchemaError  0x08000000 /* Do not report schema parse errors*/
#define SQLITE_Defensive      0x10000000 /* Input SQL is likely hostile */
#define SQLITE_NoopUpdate     0x20000000 /* UPDATE operations are no-ops */

/* Flags used only if debugging */
#define HI(X) ((u64)(X)<<32)
#ifdef SQLITE_DEBUG
#define SQLITE_SqlTrace       HI(0x0001) /* Debug print SQL as it executes */
#define SQLITE_VdbeListing    HI(0x0002) /* Debug listings of VDBE progs */
#define SQLITE_VdbeTrace      HI(0x0004) /* True to trace VDBE execution */
#define SQLITE_VdbeAddopTrace HI(0x0008) /* Trace sqlite3VdbeAddOp() calls */

︙
Changes to src/wal.c.
︙

(new text of the hunk; sqlite3WalUndo() gains a bConcurrent parameter)

** to the WAL since the start of the transaction. If the callback returns
** other than SQLITE_OK, it is not invoked again and the error code is
** returned to the caller.
**
** Otherwise, if the callback function does not return an error, this
** function returns SQLITE_OK.
*/
int sqlite3WalUndo(
  Wal *pWal,
  int (*xUndo)(void *, Pgno),
  void *pUndoCtx,
  int bConcurrent          /* True if this is a CONCURRENT transaction */
){
  int rc = SQLITE_OK;
  if( pWal->writeLock ){
    Pgno iMax = pWal->hdr.mxFrame;
    Pgno iFrame;

    /* Restore the clients cache of the wal-index header to the state it
    ** was in before the client began writing to the database.
    */
    memcpy(&pWal->hdr, (void *)walIndexHdr(pWal), sizeof(WalIndexHdr));

#ifndef SQLITE_OMIT_CONCURRENT
    if( bConcurrent ){
      pWal->hdr.aCksum[0]++;
    }
#else
    UNUSED_PARAMETER(bConcurrent);
#endif

    for(iFrame=pWal->hdr.mxFrame+1;
        ALWAYS(rc==SQLITE_OK) && iFrame<=iMax;
        iFrame++
    ){
      /* This call cannot fail. Unless the page for which the page number
      ** is passed as the second argument is (a) in the cache and

︙
Changes to src/wal.h.
︙

(new text of the first hunk; the sqlite3WalUndo() no-op macro now takes four
arguments)

# define sqlite3WalLimit(x,y)
# define sqlite3WalClose(v,w,x,y,z)                0
# define sqlite3WalBeginReadTransaction(y,z)       0
# define sqlite3WalEndReadTransaction(z)
# define sqlite3WalDbsize(y)                       0
# define sqlite3WalBeginWriteTransaction(y)        0
# define sqlite3WalEndWriteTransaction(x)          0
# define sqlite3WalUndo(w,x,y,z)                   0
# define sqlite3WalSavepoint(y,z)
# define sqlite3WalSavepointUndo(y,z)              0
# define sqlite3WalFrames(u,v,w,x,y,z)             0
# define sqlite3WalCheckpoint(q,r,s,t,u,v,w,x,y,z) 0
# define sqlite3WalCallback(z)                     0
# define sqlite3WalExclusiveMode(y,z)              0
# define sqlite3WalHeapMemory(z)                   0

︙

(new text of the second hunk; the sqlite3WalUndo() prototype gains a fourth
parameter)

Pgno sqlite3WalDbsize(Wal *pWal);

/* Obtain or release the WRITER lock. */
int sqlite3WalBeginWriteTransaction(Wal *pWal);
int sqlite3WalEndWriteTransaction(Wal *pWal);

/* Undo any frames written (but not committed) to the log */
int sqlite3WalUndo(Wal *pWal, int (*xUndo)(void *, Pgno), void *pUndoCtx, int);

/* Return an integer that records the current (uncommitted) write
** position in the WAL */
void sqlite3WalSavepoint(Wal *pWal, u32 *aWalData);

/* Move the write position of the WAL back to iFrame. Called in
** response to a ROLLBACK TO command. */

︙
Changes to test/concfault.test.
︙

(new text of the hunk; a fault-injection test is added that COMMITs a BEGIN
CONCURRENT transaction after a second connection has written to the database)

  }
} -test {
  faultsim_test_result {0 {}}
  catchsql { ROLLBACK }
  faultsim_integrity_check
}

#-------------------------------------------------------------------------
reset_db
do_execsql_test 2.0 {
  PRAGMA auto_vacuum = 0;
  PRAGMA journal_mode = wal;
  CREATE TABLE t1(a PRIMARY KEY, b);
  CREATE TABLE t2(a PRIMARY KEY, b);
  INSERT INTO t1 VALUES(randomblob(1000), randomblob(100));
  INSERT INTO t1 SELECT randomblob(1000), randomblob(1000) FROM t1;
  INSERT INTO t1 SELECT randomblob(1000), randomblob(1000) FROM t1;
  INSERT INTO t1 SELECT randomblob(1000), randomblob(1000) FROM t1;
  INSERT INTO t1 SELECT randomblob(1000), randomblob(1000) FROM t1;
  DELETE FROM t1 WHERE rowid%2;
} {wal}
faultsim_save_and_close

do_faultsim_test 1 -prep {
  faultsim_restore_and_reopen
  execsql {
    SELECT * FROM t1;
    BEGIN CONCURRENT;
    INSERT INTO t2 VALUES(1, 2);
  }
  sqlite3 db2 test.db
  execsql {
    PRAGMA journal_size_limit = 10000;
    INSERT INTO t1 VALUES(randomblob(1000), randomblob(1000));
  } db2
  db2 close
} -body {
  execsql { COMMIT }
} -test {
  faultsim_test_result {0 {}}
  catchsql { ROLLBACK }
  set res [catchsql { SELECT count(*) FROM t1 }]
  if {$res!="0 9"} { error "expected {0 9} got {$res}" }
  faultsim_integrity_check
}

finish_test
Changes to test/permutations.test.
︙

(new text of the hunk; the ext/expert test directory is added to the glob)

# $allquicktests
#
set alltests [list]
foreach f [glob $testdir/*.test] { lappend alltests [file tail $f] }
foreach f [glob -nocomplain \
  $testdir/../ext/rtree/*.test \
  $testdir/../ext/fts5/test/*.test \
  $testdir/../ext/expert/*.test \
  $testdir/../ext/lsm1/test/*.test \
] {
  lappend alltests $f
}
foreach f [glob -nocomplain $testdir/../ext/session/*.test] {
  lappend alltests $f
}

︙