Index: pages/docsdata.tcl
==================================================================
--- pages/docsdata.tcl
+++ pages/docsdata.tcl
@@ -279,10 +279,15 @@
   in place of XML or JSON or a "pile-of-file".
 }
 doc {Well Known Users} {famous.html} {
   This page lists a small subset of the many thousands of devices and
   application programs that make use of SQLite.
+}
+doc {35% Faster Than The Filesystem} {fasterthanfs.html} {
+  This article points out that reading blobs out of an SQLite database
+  is often faster than reading the same blobs from individual files in
+  the filesystem.
 }
 ###############################################################################
 heading {Technical and Design Documentation} technical {

ADDED pages/fasterthanfs.in
Index: pages/fasterthanfs.in
==================================================================
--- /dev/null
+++ pages/fasterthanfs.in
@@ -0,0 +1,166 @@

35% Faster Than The Filesystem
hd_keywords {faster than the filesystem}

Summary


Small blobs (for example, thumbnail images) can be read out of an SQLite
database about 35% faster than they can be read from individual files on
disk.

Furthermore, a single SQLite database holding 10-kilobyte blobs uses about
20% less disk space than storing the blobs in individual files.

The performance difference arises (we believe) because when reading from an
SQLite database the open() and close() system calls are invoked only once,
whereas open() and close() are invoked once for each blob when reading the
blobs from individual files.  It appears that the overhead of calling open()
and close() is greater than the overhead of using the database.  The size
reduction arises from the fact that individual files are padded out to the
next multiple of the filesystem block size, whereas the blobs are packed more
tightly into an SQLite database.
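To make the open()/close() argument concrete, the following sketch (not part
of kvtest.c) shows the two read paths side by side.  It assumes a hypothetical
table created as "CREATE TABLE kv(k INTEGER PRIMARY KEY, v BLOB)"; kvtest's
actual schema and error handling differ.

/*
** Sketch: reading one blob from a file versus from an already-open database.
** The filesystem path repeats open()/read()/close() for every blob, while the
** database path reuses one connection and one prepared statement, e.g.
** sqlite3_prepare_v2(db, "SELECT v FROM kv WHERE k=?1", -1, &pStmt, 0).
*/
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include "sqlite3.h"

static int read_blob_from_file(const char *zPath, unsigned char *aBuf, int nBuf){
  int fd = open(zPath, O_RDONLY);      /* one open() per blob */
  int nRead;
  if( fd<0 ) return -1;
  nRead = (int)read(fd, aBuf, nBuf);
  close(fd);                           /* one close() per blob */
  return nRead;
}

static int read_blob_from_db(sqlite3_stmt *pStmt, sqlite3_int64 key,
                             unsigned char *aBuf, int nBuf){
  int nRead = -1;
  sqlite3_bind_int64(pStmt, 1, key);
  if( sqlite3_step(pStmt)==SQLITE_ROW ){
    nRead = sqlite3_column_bytes(pStmt, 0);
    if( nRead>nBuf ) nRead = nBuf;
    if( nRead>0 ) memcpy(aBuf, sqlite3_column_blob(pStmt, 0), nRead);
  }
  sqlite3_reset(pStmt);                /* statement is reused; no open()/close() */
  return nRead;
}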

How These Measurements Are Made


The performance comparison is accomplished using the
[https://www.sqlite.org/src/file/test/kvtest.c|kvtest.c] program found in the
SQLite source tree.  To compile the test program, first gather the kvtest.c
source file into a directory with the [amalgamation|SQLite amalgamation]
source files "sqlite3.c" and "sqlite3.h".  Then on unix, run a command like
the following:

gcc -Os -I. -DSQLITE_DIRECT_OVERFLOW_READ kvtest.c sqlite3.c \
  -o kvtest -ldl -lpthread

Or on Windows with MSVC:

cl -I. -DSQLITE_DIRECT_OVERFLOW_READ kvtest.c sqlite3.c

Use the resulting "kvtest" program to generate a test database with 100,000
random blobs, each 10,000 bytes in size, using a command like this:

./kvtest init test1.db --count 100k --size 10k

Next, make copies of all the blobs into individual files in a directory using
commands like these:

mkdir test1.dir
./kvtest export test1.db test1.dir

At this point, you can measure the amount of disk space used by the test1.db
database and the space used by the test1.dir directory and all of its content.
On a standard Ubuntu Linux desktop, the database file will be 1,024,512,000
bytes in size and the test1.dir directory will use 1,228,800,000 bytes of
space (according to "du -k"), about 20% more than the database.  The directory
overhead is what block-size padding predicts: 1,228,800,000 bytes works out to
12,288 bytes, or three 4,096-byte filesystem blocks, for each 10,000-byte blob.

Measure the performance for reading blobs from the database and from
individual files using these commands:

./kvtest run test1.db --count 100k --blob-api
./kvtest run test1.dir --count 100k

Depending on your platform, you should see that reads from the test1.db
database file are about 35% faster than reads from individual files in the
test1.dir folder.

Variations


The [-DSQLITE_DIRECT_OVERFLOW_READ] compile-time option causes SQLite to
bypass its page cache when reading content from overflow pages.  This helps
database reads of 10K blobs run a little faster, though not dramatically so.
SQLite still holds a speed advantage over direct filesystem reads even
without the SQLITE_DIRECT_OVERFLOW_READ compile-time option.

Other compile-time options, such as using -O3 instead of -Os, or using
[-DSQLITE_THREADSAFE=0] and/or some of the other
[recommended compile-time options], might help SQLite to run even faster
relative to direct filesystem reads.

When constructing the test data, try varying the size of the blobs.  The
performance advantage shifts toward direct filesystem reads as the blob size
increases, since the cost of invoking open() and close() is amortized over
more bytes transferred by read().  The break-even point, the point at which
it becomes faster to read directly from the filesystem, will vary from one
system to another.  In the other direction, reducing the blob size gives
database reads an even bigger advantage.  With a 5 KB blob size, reading from
the database is twice as fast, and the database uses 60% less space than the
same blobs stored as individual files.

The --blob-api option causes database reads to occur using the
[sqlite3_blob_open()], [sqlite3_blob_reopen()], and [sqlite3_blob_read()]
interfaces instead of using SQL statements.  Without the --blob-api option,
a separate SQL statement is run to read each blob and the performance of
reading from the database is approximately the same as the performance of
reading directly from files.  This is still a significant finding, since few
people would expect a [full-featured SQL] database to run as fast as direct
file reads, and yet SQLite does.
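In application code, the --blob-api read path looks roughly like the sketch
below.  The table and column names ("kv" and "v") are the same assumptions
used in the earlier sketch, not kvtest.c's actual schema, and error handling
is abbreviated.

/*
** Sketch of reading a range of blobs with the incremental-blob interfaces.
** Assumes the hypothetical kv(k INTEGER PRIMARY KEY, v BLOB) table.
*/
#include <stdlib.h>
#include "sqlite3.h"

static int read_blobs_by_handle(sqlite3 *db, sqlite3_int64 iFirst,
                                sqlite3_int64 iLast){
  sqlite3_blob *pBlob = 0;
  sqlite3_int64 k;
  int rc = SQLITE_OK;

  for(k=iFirst; k<=iLast && rc==SQLITE_OK; k++){
    if( pBlob==0 ){
      /* First row: open a read-only blob handle on column "v". */
      rc = sqlite3_blob_open(db, "main", "kv", "v", k, 0, &pBlob);
    }else{
      /* Later rows: repoint the same handle, which is cheaper than
      ** opening a new handle for every blob. */
      rc = sqlite3_blob_reopen(pBlob, k);
    }
    if( rc==SQLITE_OK ){
      int nByte = sqlite3_blob_bytes(pBlob);
      unsigned char *aData = malloc( nByte>0 ? nByte : 1 );
      if( aData==0 ){
        rc = SQLITE_NOMEM;
      }else{
        rc = sqlite3_blob_read(pBlob, aData, nByte, 0);
        /* ...process the blob content here... */
        free(aData);
      }
    }
  }
  sqlite3_blob_close(pBlob);
  return rc;
}

The loop above assumes consecutive keys; kvtest's --random mode would supply
the keys in shuffled order instead.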

The --random option on the "run" command causes the blobs to be read in a
random order, which decreases the performance of database reads.  The reason
is that the blobs are tightly packed in the database, rather than being
padded out to the next block size as they are when stored in the filesystem,
so some database pages contain parts of adjacent blobs.  When the blobs are
read sequentially, each such page is read into memory once, cached, and then
used to reconstruct the adjacent blobs.  But when the blobs are read in a
random order, pages that hold parts of two or more blobs tend to be read
multiple times, leading to decreased performance.

The "--mmap SIZE" option on the "run" command causes the database file to be
accessed using mmap() instead of read().  The SIZE argument is the size of
the memory-mapped region, and should be at least the size of the database
file for maximum performance.  Using "--mmap 1G" causes the database reads to
be almost twice as fast as disk reads, even when the --random option is used.
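kvtest arranges the memory mapping internally.  In your own programs, the
same effect comes from setting PRAGMA mmap_size on the connection before
reading.  The sketch below assumes a 1 GiB mapping to mirror the "--mmap 1G"
example above.

/* Sketch: open a database and request memory-mapped I/O before reading.
** SQLite caps the requested mapping at the SQLITE_MAX_MMAP_SIZE compile-time
** limit, so an over-large request is harmless. */
#include "sqlite3.h"

static int open_db_with_mmap(const char *zFilename, sqlite3 **ppDb){
  int rc = sqlite3_open(zFilename, ppDb);
  if( rc==SQLITE_OK ){
    rc = sqlite3_exec(*ppDb, "PRAGMA mmap_size=1073741824;", 0, 0, 0);
  }
  return rc;
}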

When --random is used and both --blob-api and --mmap are omitted, reading
directly from files on disk is generally a little faster, but reads from the
database are still competitive.

Other Considerations


Some other SQL database engines advise developers to store blobs in separate
files and then store the filename in the database.  In that case, where the
database must first be consulted to find the filename before opening and
reading the file, simply storing the entire blob in the database gives much
faster read performance with SQLite.  See the [Internal Versus External BLOBs]
article for more information.
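For comparison, the filename-in-the-database pattern looks roughly like the
sketch below; every read pays for a query and for an open()/read()/close()
round trip.  The "docs(id, path)" table is purely illustrative.

/* Sketch of the "external blob" pattern: look up a filename in the database,
** then read the file.  Each read incurs both a query and open()/close().
** The docs(id INTEGER PRIMARY KEY, path TEXT) table is hypothetical. */
#include <fcntl.h>
#include <unistd.h>
#include "sqlite3.h"

static int read_external_blob(sqlite3 *db, sqlite3_int64 id,
                              unsigned char *aBuf, int nBuf){
  sqlite3_stmt *pStmt = 0;
  int nRead = -1;
  int rc = sqlite3_prepare_v2(db, "SELECT path FROM docs WHERE id=?1",
                              -1, &pStmt, 0);
  if( rc!=SQLITE_OK ) return -1;
  sqlite3_bind_int64(pStmt, 1, id);
  if( sqlite3_step(pStmt)==SQLITE_ROW ){
    const char *zPath = (const char*)sqlite3_column_text(pStmt, 0);
    int fd = zPath ? open(zPath, O_RDONLY) : -1;
    if( fd>=0 ){
      nRead = (int)read(fd, aBuf, nBuf);
      close(fd);
    }
  }
  sqlite3_finalize(pStmt);
  return nRead;
}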

This report only looks at the performance of reads, not writes.  Because
SQLite implements [atomic commit|power-safe ACID transactions], we expect
that write performance into SQLite will be slower than writing directly to
individual files.  However, if ACID transactions are disabled via
[PRAGMA journal_mode|PRAGMA journal_mode=OFF] (thus putting SQLite on equal
footing with the filesystem) and the [sqlite3_blob_write()] interface is
used, SQLite might well be competitive with or even faster than writes to
separate files on disk.  That is an experiment we have not yet run.
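If you want to try that experiment yourself, the write path would look
roughly like the sketch below.  It reuses the hypothetical
kv(k INTEGER PRIMARY KEY, v BLOB) table from the earlier sketches; we have
not benchmarked this code.

/* Sketch of the write experiment described above: disable the journal to
** match the filesystem's durability guarantees, insert a zero-filled blob of
** the right size, then fill it with sqlite3_blob_write(). */
#include "sqlite3.h"

static int write_blob(sqlite3 *db, sqlite3_int64 key,
                      const unsigned char *aData, int nData){
  sqlite3_stmt *pStmt = 0;
  sqlite3_blob *pBlob = 0;
  int rc;

  sqlite3_exec(db, "PRAGMA journal_mode=OFF;", 0, 0, 0);

  /* Reserve space for the blob without copying the content through SQL. */
  rc = sqlite3_prepare_v2(db,
          "REPLACE INTO kv(k,v) VALUES(?1, zeroblob(?2))", -1, &pStmt, 0);
  if( rc!=SQLITE_OK ) return rc;
  sqlite3_bind_int64(pStmt, 1, key);
  sqlite3_bind_int(pStmt, 2, nData);
  rc = sqlite3_step(pStmt);
  sqlite3_finalize(pStmt);
  if( rc!=SQLITE_DONE ) return rc;

  /* Overwrite the zero-filled region with the real content. */
  rc = sqlite3_blob_open(db, "main", "kv", "v", key, 1, &pBlob);
  if( rc==SQLITE_OK ) rc = sqlite3_blob_write(pBlob, aData, nData, 0);
  sqlite3_blob_close(pBlob);
  return rc;
}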

Remember that the relative performance of database reads and reads from the
filesystem will depend on both the hardware and the operating system.  Please
try the tests above on your own system.  If you encounter cases where
database reads do not compare favorably with filesystem reads, please report
your findings on the [mailing lists|SQLite mailing list].