Documentation Source Text

Check-in [15b43e7cdb]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:In the testing document, use <codeblock> instead of <pre> and add a section on mutation testing.
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 15b43e7cdb48e06ea03a3822b3095430a5347cfb
User & Date: drh 2016-09-05 16:51:45
Context
2016-09-05
17:18
More mutation testing documentation. check-in: 34514986f4 user: drh tags: trunk
16:51
In the testing document, use <codeblock> instead of <pre> and add a section on mutation testing. check-in: 15b43e7cdb user: drh tags: trunk
2016-09-03
19:43
Improved hyperlinks in the history of releases. check-in: a5156889f3 user: drh tags: trunk
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to pages/testing.in.

484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
...
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
...
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
...
688
689
690
691
692
693
694




























































695
696
697
698
699
700
701
coverage measures the number of machine-code branch instructions that
are evaluated at least once on both directions.</p>

<p>To illustrate the difference between statement coverage and
branch coverage, consider the following hypothetical
line of C code:</p>

<blockquote><pre>
if( a>b && c!=25 ){ d++; }
</pre></blockquote>

<p>Such a line of C code might generate a dozen separate machine code
instructions.  If any one of those instructions is ever evaluated, then
we say that the statement has been tested.  So, for example, it might
be the case that the conditional expression is
always false and the "d" variable is
never incremented.  Even so, statement coverage counts this line of
................................................................................
macros called ALWAYS() and NEVER().   The ALWAYS() macro
surrounds conditions
which are expected to always evaluate as true and NEVER() surrounds
conditions that are always evaluated to false.  These macros serve as
comments to indicate that the conditions are defensive code.
In release builds, these macros are pass-throughs:</p>

<blockquote><pre>
#define ALWAYS(X)  (X)
#define NEVER(X)   (X)
</pre></blockquote>

<p>During most testing, however, these macros will throw an assertion
fault if their argument does not have the expected truth value.  This
alerts the developers quickly to incorrect design assumptions.

<blockquote><pre>
#define ALWAYS(X)  ((X)?1:assert(0),0)
#define NEVER(X)   ((X)?assert(0),1:0)
</pre></blockquote>

<p>When measuring test coverage, these macros are defined to be constant
truth values so that they do not generate assembly language branch
instructions, and hence do not come into play when calculating the
branch coverage:</p>

<blockquote><pre>
#define ALWAYS(X)  (1)
#define NEVER(X)   (0)
</pre></blockquote>

<p>The test suite is designed to be run three times, once for each of
the ALWAYS() and NEVER() definitions shown above.  All three test runs
should yield exactly the same result.  There is a run-time test using
the [sqlite3_test_control]([SQLITE_TESTCTRL_ALWAYS], ...) interface that
can be used to verify that the macros are correctly set to the first
form (the pass-through form) for deployment.</p>
................................................................................

<p>Another macro used in conjunction with test coverage measurement is
the <tt>testcase()</tt> macro.  The argument is a condition for which
we want test cases that evaluate to both true and false.
In non-coverage builds (that is to say, in release builds) the
<tt>testcase()</tt> macro is a no-op:</p>

<blockquote><pre>
#define testcase(X)
</pre></blockquote>

<p>But in a coverage measuring build, the <tt>testcase()</tt> macro
generates code that evaluates the conditional expression in its argument.  
Then during analysis, a check
is made to ensure tests exist that evaluate the conditional to both true
and false.  <tt>Testcase()</tt> macros are used, for example, to help verify
that boundary values are tested.  For example:</p>

<blockquote><pre>
testcase( a==b );
testcase( a==b+1 );
if( a>b && c!=25 ){ d++; }
</pre></blockquote>

<p>Testcase macros are also used when two or more cases of a switch
statement go to the same block of code, to make sure that the code was
reached for all cases:</p>

<blockquote><pre>
switch( op ){
  case OP_Add:
  case OP_Subtract: {
    testcase( op==OP_Add );
    testcase( op==OP_Subtract );
    /* ... */
    break;
  }
  /* ... */
}
</pre></blockquote>

<p>For bitmask tests, <tt>testcase()</tt> macros are used to verify that every
bit of the bitmask affects the outcome.  For example, in the following block
of code, the condition is true if the mask contains either of two bits
indicating either a MAIN_DB or a TEMP_DB is being opened.  
The <tt>testcase()</tt>
macros that precede the if statement verify that both cases are tested:</p>

<blockquote><pre>
testcase( mask & SQLITE_OPEN_MAIN_DB );
testcase( mask & SQLITE_OPEN_TEMP_DB );
if( (mask & (SQLITE_OPEN_MAIN_DB|SQLITE_OPEN_TEMP_DB))!=0 ){ ... }
</pre></blockquote>

<p>The SQLite source code contains <tcl>N {$stat(nTestcase)}</tcl>
uses of the <tt>testcase()</tt> macro.</p>

<tcl>hd_fragment {mcdc} *MC/DC {MC/DC testing}</tcl>
<h2>Branch coverage versus MC/DC</h2>

................................................................................
differences in output indicate either the use of undefined or
indeterminate behavior in the SQLite code (and hence a bug), 
or a bug in the compiler.
Note that SQLite has, over the previous decade, encountered bugs
in each of GCC, Clang, and MSVC.  Compiler bugs, while rare, do happen,
which is why it is so important to test the code in an as-delivered
configuration.





























































<tcl>hd_fragment thoughts1</tcl>
<h2>Experience with full test coverage</h2>

<p>The developers of SQLite have found that full coverage testing is an
extremely effective method for locating and preventing bugs.
Because every single branch







|

|







 







|


|





|


|






|


|







 







|

|








|



|





|










|








|



|







 







>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
...
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
...
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
...
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
coverage measures the number of machine-code branch instructions that
are evaluated at least once on both directions.</p>

<p>To illustrate the difference between statement coverage and
branch coverage, consider the following hypothetical
line of C code:</p>

<codeblock>
if( a>b && c!=25 ){ d++; }
</codeblock>

<p>Such a line of C code might generate a dozen separate machine code
instructions.  If any one of those instructions is ever evaluated, then
we say that the statement has been tested.  So, for example, it might
be the case that the conditional expression is
always false and the "d" variable is
never incremented.  Even so, statement coverage counts this line of
................................................................................
macros called ALWAYS() and NEVER().   The ALWAYS() macro
surrounds conditions
which are expected to always evaluate as true and NEVER() surrounds
conditions that are always evaluated to false.  These macros serve as
comments to indicate that the conditions are defensive code.
In release builds, these macros are pass-throughs:</p>

<codeblock>
#define ALWAYS(X)  (X)
#define NEVER(X)   (X)
</codeblock>

<p>During most testing, however, these macros will throw an assertion
fault if their argument does not have the expected truth value.  This
alerts the developers quickly to incorrect design assumptions.

<codeblock>
#define ALWAYS(X)  ((X)?1:assert(0),0)
#define NEVER(X)   ((X)?assert(0),1:0)
</codeblock>

<p>When measuring test coverage, these macros are defined to be constant
truth values so that they do not generate assembly language branch
instructions, and hence do not come into play when calculating the
branch coverage:</p>

<codeblock>
#define ALWAYS(X)  (1)
#define NEVER(X)   (0)
</codeblock>

<p>The test suite is designed to be run three times, once for each of
the ALWAYS() and NEVER() definitions shown above.  All three test runs
should yield exactly the same result.  There is a run-time test using
the [sqlite3_test_control]([SQLITE_TESTCTRL_ALWAYS], ...) interface that
can be used to verify that the macros are correctly set to the first
form (the pass-through form) for deployment.</p>
................................................................................

<p>Another macro used in conjunction with test coverage measurement is
the <tt>testcase()</tt> macro.  The argument is a condition for which
we want test cases that evaluate to both true and false.
In non-coverage builds (that is to say, in release builds) the
<tt>testcase()</tt> macro is a no-op:</p>

<codeblock>
#define testcase(X)
</codeblock>

<p>But in a coverage measuring build, the <tt>testcase()</tt> macro
generates code that evaluates the conditional expression in its argument.  
Then during analysis, a check
is made to ensure tests exist that evaluate the conditional to both true
and false.  <tt>Testcase()</tt> macros are used, for example, to help verify
that boundary values are tested.  For example:</p>

<codeblock>
testcase( a==b );
testcase( a==b+1 );
if( a>b && c!=25 ){ d++; }
</codeblock>

<p>Testcase macros are also used when two or more cases of a switch
statement go to the same block of code, to make sure that the code was
reached for all cases:</p>

<codeblock>
switch( op ){
  case OP_Add:
  case OP_Subtract: {
    testcase( op==OP_Add );
    testcase( op==OP_Subtract );
    /* ... */
    break;
  }
  /* ... */
}
</codeblock>

<p>For bitmask tests, <tt>testcase()</tt> macros are used to verify that every
bit of the bitmask affects the outcome.  For example, in the following block
of code, the condition is true if the mask contains either of two bits
indicating either a MAIN_DB or a TEMP_DB is being opened.  
The <tt>testcase()</tt>
macros that precede the if statement verify that both cases are tested:</p>

<codeblock>
testcase( mask & SQLITE_OPEN_MAIN_DB );
testcase( mask & SQLITE_OPEN_TEMP_DB );
if( (mask & (SQLITE_OPEN_MAIN_DB|SQLITE_OPEN_TEMP_DB))!=0 ){ ... }
</codeblock>

<p>The SQLite source code contains <tcl>N {$stat(nTestcase)}</tcl>
uses of the <tt>testcase()</tt> macro.</p>

<tcl>hd_fragment {mcdc} *MC/DC {MC/DC testing}</tcl>
<h2>Branch coverage versus MC/DC</h2>

................................................................................
differences in output indicate either the use of undefined or
indeterminate behavior in the SQLite code (and hence a bug), 
or a bug in the compiler.
Note that SQLite has, over the previous decade, encountered bugs
in each of GCC, Clang, and MSVC.  Compiler bugs, while rare, do happen,
which is why it is so important to test the code in an as-delivered
configuration.

<tcl>hd_fragment mutationtests</tcl>
<h2>Mutation testing</h2>

<p>Using gcov (or similar) to show that every branch instruction is taken
at least once in both directions is good measure of test suite quality.
But even better is showing that every branch instruction makes
a difference in the output.  In other words, we want to show 
not only that every branch instruction both jumps and falls through but also
that every branch is doing useful work and that the test suite is able
to detect and verify that work.  When a branch is found that does not
make a difference in the output, that suggests that the code associated 
the branch can be removed (reducing the size of the library and perhaps
making it run faster) or that the test suite is inadequately testing the
feature that the branch implements.

<p>SQLite strives to verify that every branch instruction makes a difference
using [https://en.wikipedia.org/wiki/Mutation_testing|mutation testing].
A script first compiles the SQLite source code into assembly language
(using, for example, the -S option to gcc).  Then the script steps through
the generated assembly language and, one by one, changes each branch 
instruction into either an unconditional jump or a no-op, compiles the 
result, and verifies that the test suite catches the mutation.

<p>
Unfortunately, SQLite contains many branch instructions that
help the code run faster without changing the output.
Such branches generate false-positives during mutation testing.
As an example, consider the following 
[https://www.sqlite.org/src/artifact/55b5fb474?ln=55-62 | hash function]
used to accelerate table-name lookup:

<codeblock>
55  static unsigned int strHash(const char *z){
56    unsigned int h = 0;
57    unsigned char c;
58    while( (c = (unsigned char)*z++)!=0 ){     /*OPTIMIZATION-IF-TRUE*/
59      h = (h&lt;&lt;3) &#94; h &#94; sqlite3UpperToLower&#91;c&#93;;
60    }
61    return h;
62  }
</codeblock>

<p>
If the branch instruction that implements the "c!=0" test on line 58
is changed into a no-op, then the while-loop will loop forever and the
test suite will fail with a time-out.  But if that branch is changed
into an unconditional jump, then the hash function will always return 0.
The problem is that 0 is a valid hash.  A hash function that always
returns 0 still works in the sense that SQLite still always gets the correct
answer.  The table-name hash table degenerates into a linked-list
and so the table-name lookups that occur while parsing SQL statements 
might be a little slower, but the end result will be the same.

<p>
To work around this problem, comments of the form
"<code>/*OPTIMIZATION-IF-TRUE*/</code>" and
"<code>/*OPTIMIZATION-IF-FALSE*/</code>" are inserted into the SQLite
source code to tell the mutation testing script to ignore some branch
instructions.

<tcl>hd_fragment thoughts1</tcl>
<h2>Experience with full test coverage</h2>

<p>The developers of SQLite have found that full coverage testing is an
extremely effective method for locating and preventing bugs.
Because every single branch