/ Changes On Branch query-planner-tweaks
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Changes In Branch query-planner-tweaks Excluding Merge-Ins

This is equivalent to a diff from b90c28be38 to d1248165e3

2011-08-08
17:18
Add code to actually use the sqlite_stat2.cnt field in the query planner. This changes some plans resulting in a few failures in analyze5.test. (Closed-Leaf check-in: d1248165e3 user: drh tags: query-planner-tweaks)
2011-08-07
01:31
Remove relevant elements from the sqlite_stat2 table when doing a DROP INDEX or DROP TABLE. (check-in: 3c8f97ae52 user: drh tags: trunk)
00:21
The ANALYZE command adds the sqlite_stat2.cnt column if it does not already exist. (check-in: 794fde6f91 user: drh tags: query-planner-tweaks)
2011-08-05
21:13
Add a column to the sqlite_stat2 table that contains the number of entries with exactly the same key as the sample. We do not yet do anything with this extra value. Some tests in analyze2.test are failing. (check-in: eb43422827 user: drh tags: query-planner-tweaks)
2011-08-03
22:06
Merge the winopen-retry-logic branch into trunk. The biggest change here is to test scripts, which should now use such as copy_file and delete_file from tester.tcl rather than the raw file commands of TCL. (check-in: b90c28be38 user: drh tags: trunk)
16:40
Update the OP_Move opcode to shift the pScopyFrom pointer of aliases when compiled with SQLITE_DEBUG. Ticket [d63523637517386191]. (check-in: a2135ad130 user: drh tags: trunk)
2011-08-02
23:45
Add explanatory comment to the win32lock-2.2 test case. (Closed-Leaf check-in: 4cb17881d9 user: mistachkin tags: winopen-retry-logic)

Changes to src/analyze.c.

     6      6   **
     7      7   **    May you do good and not evil.
     8      8   **    May you find forgiveness for yourself and forgive others.
     9      9   **    May you share freely, never taking more than you give.
    10     10   **
    11     11   *************************************************************************
    12     12   ** This file contains code associated with the ANALYZE command.
           13  +**
           14  +** The ANALYZE command gather statistics about the content of tables
           15  +** and indices.  These statistics are made available to the query planner
           16  +** to help it make better decisions about the best way to implement a
           17  +** query.
           18  +**
           19  +** Two system tables are created as follows:
           20  +**
           21  +**    CREATE TABLE sqlite_stat1(tbl, idx, stat);
           22  +**    CREATE TABLE sqlite_stat2(tbl, idx, sampleno, sample, cnt);
           23  +**
           24  +** Additional tables might be added in future releases of SQLite.
           25  +** The sqlite_stat2 table is only created and used if SQLite is
           26  +** compiled with SQLITE_ENABLE_STAT2.  Older versions of SQLite
           27  +** omit the sqlite_stat2.cnt column.  Newer versions of SQLite are
           28  +** able to use older versions of the stat2 table that lack the cnt
           29  +** column.
           30  +**
           31  +** Format of sqlite_stat1:
           32  +**
           33  +** There is normally one row per index, with the index identified by the
           34  +** name in the idx column.  The tbl column is the name of the table to
           35  +** which the index belongs.  In each such row, the stat column will be
           36  +** a string consisting of a list of integers.  The first integer in this
           37  +** list is the number of rows in the index and in the table.  The second
           38  +** integer is the average number of rows in the index that have the same
           39  +** value in the first column of the index.  The third integer is the average
           40  +** number of rows in the index that have the same value for the first two
           41  +** columns.  The N-th integer (for N>1) is the average number of rows in 
           42  +** the index which have the same value for the first N-1 columns.  For
           43  +** a K-column index, there will be K+1 integers in the stat column.  If
           44  +** the index is unique, then the last integer will be 1.
           45  +**
           46  +** The list of integers in the stat column can optionally be followed
           47  +** by the keyword "unordered".  The "unordered" keyword, if it is present,
           48  +** must be separated from the last integer by a single space.  If the
           49  +** "unordered" keyword is present, then the query planner assumes that
           50  +** the index is unordered and will not use the index for a range query.
           51  +** 
           52  +** If the sqlite_stat1.idx column is NULL, then the sqlite_stat1.stat
           53  +** column contains a single integer which is the (estimated) number of
           54  +** rows in the table identified by sqlite_stat1.tbl.
           55  +**
           56  +** Format of sqlite_stat2:
           57  +**
           58  +** The sqlite_stat2 is only created and is only used if SQLite is compiled
           59  +** with SQLITE_ENABLE_STAT2.  The "stat2" table contains additional information
           60  +** about the key distribution within an index.  The index is identified by
           61  +** the "idx" column and the "tbl" column is the name of the table to which
           62  +** the index belongs.  There are usually multiple rows in the sqlite_stat2
           63  +** table for each index.
           64  +**
           65  +** The sqlite_stat2 entires for an index that have sampleno>=0 are
           66  +** sampled key values for the first column of the index taken at
           67  +** intervals along the index.  The sqlite_stat2.sample column holds
           68  +** the value of the key in the left-most column of the index.
           69  +**
           70  +** The samples are numbered from 0 to S-1
           71  +** where S is 10 by default.  The number of samples created by the
           72  +** ANALYZE command can be adjusted at compile-time using the
           73  +** SQLITE_INDEX_SAMPLES macro.  The maximum number of samples is
           74  +** SQLITE_MAX_SAMPLES, currently set to 100.  There are places in the
           75  +** code that use an unsigned character to count samples, so an upper
           76  +** bound on SQLITE_MAX_SAMPLES is 255.
           77  +**
           78  +** Suppose the index contains C rows.  And let the number
           79  +** of samples be S.  SQLite assumes that the samples are taken from the
           80  +** following rows for i between 0 and S-1:
           81  +**
           82  +**     rownumber = (i*C*2 + C)/(S*2)
           83  +**
           84  +** Conceptually, the index is divided into S bins and the sample is
           85  +** taken from the middle of each bin.  The ANALYZE will not attempt
           86  +** to populate sqlite_stat2 for an index that holds fewer than S*2
           87  +** entries.
           88  +**
           89  +** If the key value for a sample (the sqlite_stat2.sample column) is a 
           90  +** large string or blob, SQLite will only use the first 255 bytes of 
           91  +** that string or blob.
           92  +** 
           93  +** The sqlite_stat2.cnt column contains the number of entries in the
           94  +** index for which sqlite_stat2.sample matches the left-most column
           95  +** of the index.  In other words, sqlite_stat2.cnt holds the number of
           96  +** times the sqlite_stat2.sample value appears in the index..  Many 
           97  +** older versions of SQLite omit the sqlite_stat2.cnt column.
           98  +**
           99  +** If the sqlite_stat2.sampleno value is -1, then that row holds a first-
          100  +** column key that is a frequently used key in the index.  The
          101  +** sqlite_stat2.cnt column will hold the number of occurrances of that key.
          102  +** This information is useful to the query planner in cases where a
          103  +** large percentage of the rows in indexed field have one of a small
          104  +** handful of value but the balance of the rows in the index have
          105  +** distinct or nearly distinct keys.
    13    106   */
    14    107   #ifndef SQLITE_OMIT_ANALYZE
    15    108   #include "sqliteInt.h"
    16    109   
    17    110   /*
    18    111   ** This routine generates code that opens the sqlite_stat1 table for
    19    112   ** writing with cursor iStatCur. If the library was built with the
................................................................................
    39    132   ){
    40    133     static const struct {
    41    134       const char *zName;
    42    135       const char *zCols;
    43    136     } aTable[] = {
    44    137       { "sqlite_stat1", "tbl,idx,stat" },
    45    138   #ifdef SQLITE_ENABLE_STAT2
    46         -    { "sqlite_stat2", "tbl,idx,sampleno,sample" },
          139  +    { "sqlite_stat2", "tbl,idx,sampleno,sample,cnt" },
    47    140   #endif
    48    141     };
    49    142   
    50    143     int aRoot[] = {0, 0};
    51    144     u8 aCreateTbl[] = {0, 0};
    52    145   
    53    146     int i;
................................................................................
    82    175           sqlite3NestedParse(pParse,
    83    176              "DELETE FROM %Q.%s WHERE %s=%Q", pDb->zName, zTab, zWhereType, zWhere
    84    177           );
    85    178         }else{
    86    179           /* The sqlite_stat[12] table already exists.  Delete all rows. */
    87    180           sqlite3VdbeAddOp2(v, OP_Clear, aRoot[i], iDb);
    88    181         }
          182  +#ifdef SQLITE_ENABLE_STAT2
          183  +      if( i==1 && iDb!=1 && pStat->nCol==4 ){
          184  +        sqlite3NestedParse(pParse,
          185  +           "UPDATE %Q.sqlite_master SET sql='CREATE TABLE sqlite_stat2(%s)'"
          186  +           " WHERE name='sqlite_stat2'", pDb->zName, aTable[i].zCols
          187  +        );
          188  +        sqlite3ChangeCookie(pParse, iDb);
          189  +      }
          190  +#endif
    89    191       }
    90    192     }
    91    193   
    92    194     /* Open the sqlite_stat[12] tables for writing. */
    93    195     for(i=0; i<ArraySize(aTable); i++){
    94    196       sqlite3VdbeAddOp3(v, OP_OpenWrite, iStatCur+i, aRoot[i], iDb);
    95    197       sqlite3VdbeChangeP4(v, -1, (char *)3, P4_INT32);
    96    198       sqlite3VdbeChangeP5(v, aCreateTbl[i]);
          199  +    VdbeComment((v, "%s", aTable[i].zName));
    97    200     }
    98    201   }
    99    202   
   100    203   /*
   101    204   ** Generate code to do an analysis of all indices associated with
   102    205   ** a single table.
   103    206   */
................................................................................
   115    218     int i;                       /* Loop counter */
   116    219     int topOfLoop;               /* The top of the loop */
   117    220     int endOfLoop;               /* The end of the loop */
   118    221     int jZeroRows = -1;          /* Jump from here if number of rows is zero */
   119    222     int iDb;                     /* Index of database containing pTab */
   120    223     int regTabname = iMem++;     /* Register containing table name */
   121    224     int regIdxname = iMem++;     /* Register containing index name */
   122         -  int regSampleno = iMem++;    /* Register containing next sample number */
   123         -  int regCol = iMem++;         /* Content of a column analyzed table */
          225  +  int regSampleno = iMem++;    /* Sampleno (stat2) or stat (stat1) */
          226  +#ifdef SQLITE_ENABLE_STAT2
          227  +  int regSample = iMem++;      /* The next sample value */
          228  +  int regSampleCnt = iMem++;   /* Number of occurrances of regSample value */
          229  +  int shortJump = 0;           /* Instruction address */
          230  +  int addrStoreStat2 = 0;      /* Address of subroutine to wrote to stat2 */
          231  +  int regNext = iMem++;        /* Index of next sample to record */
          232  +  int regSampleIdx = iMem++;   /* Index of next sample */
          233  +  int regReady = iMem++;       /* True if ready to store a stat2 entry */
          234  +  int regGosub = iMem++;       /* Register holding subroutine return addr */
          235  +  int regSample2 = iMem++;     /* Number of samples to acquire times 2 */
          236  +  int regCount = iMem++;       /* Number of rows in the table */
          237  +  int regCount2 = iMem++;      /* regCount*2 */
          238  +  int once = 1;                /* One-time initialization */
          239  +#endif
          240  +  int regCol = iMem++;         /* Content of a column in analyzed table */
   124    241     int regRec = iMem++;         /* Register holding completed record */
   125    242     int regTemp = iMem++;        /* Temporary use register */
   126    243     int regRowid = iMem++;       /* Rowid for the inserted record */
   127    244   
   128         -#ifdef SQLITE_ENABLE_STAT2
   129         -  int addr = 0;                /* Instruction address */
   130         -  int regTemp2 = iMem++;       /* Temporary use register */
   131         -  int regSamplerecno = iMem++; /* Index of next sample to record */
   132         -  int regRecno = iMem++;       /* Current sample index */
   133         -  int regLast = iMem++;        /* Index of last sample to record */
   134         -  int regFirst = iMem++;       /* Index of first sample to record */
   135         -#endif
   136    245   
   137    246     v = sqlite3GetVdbe(pParse);
   138    247     if( v==0 || NEVER(pTab==0) ){
   139    248       return;
   140    249     }
   141    250     if( pTab->tnum==0 ){
   142    251       /* Do not gather statistics on views or virtual tables */
................................................................................
   161    270     sqlite3TableLock(pParse, iDb, pTab->tnum, 0, pTab->zName);
   162    271   
   163    272     iIdxCur = pParse->nTab++;
   164    273     sqlite3VdbeAddOp4(v, OP_String8, 0, regTabname, 0, pTab->zName, 0);
   165    274     for(pIdx=pTab->pIndex; pIdx; pIdx=pIdx->pNext){
   166    275       int nCol;
   167    276       KeyInfo *pKey;
          277  +    int addrIfNot;               /* address of OP_IfNot */
          278  +    int *aChngAddr;              /* Array of jump instruction addresses */
   168    279   
   169    280       if( pOnlyIdx && pOnlyIdx!=pIdx ) continue;
          281  +    VdbeNoopComment((v, "Begin analysis of %s", pIdx->zName));
   170    282       nCol = pIdx->nColumn;
   171    283       pKey = sqlite3IndexKeyinfo(pParse, pIdx);
   172    284       if( iMem+1+(nCol*2)>pParse->nMem ){
   173    285         pParse->nMem = iMem+1+(nCol*2);
   174    286       }
          287  +    aChngAddr = sqlite3DbMallocRaw(db, sizeof(int)*pIdx->nColumn);
          288  +    if( aChngAddr==0 ) continue;
   175    289   
   176    290       /* Open a cursor to the index to be analyzed. */
   177    291       assert( iDb==sqlite3SchemaToIndex(db, pIdx->pSchema) );
   178    292       sqlite3VdbeAddOp4(v, OP_OpenRead, iIdxCur, pIdx->tnum, iDb,
   179    293           (char *)pKey, P4_KEYINFO_HANDOFF);
   180    294       VdbeComment((v, "%s", pIdx->zName));
   181    295   
................................................................................
   183    297       sqlite3VdbeAddOp4(v, OP_String8, 0, regIdxname, 0, pIdx->zName, 0);
   184    298   
   185    299   #ifdef SQLITE_ENABLE_STAT2
   186    300   
   187    301       /* If this iteration of the loop is generating code to analyze the
   188    302       ** first index in the pTab->pIndex list, then register regLast has
   189    303       ** not been populated. In this case populate it now.  */
   190         -    if( pTab->pIndex==pIdx ){
   191         -      sqlite3VdbeAddOp2(v, OP_Integer, SQLITE_INDEX_SAMPLES, regSamplerecno);
   192         -      sqlite3VdbeAddOp2(v, OP_Integer, SQLITE_INDEX_SAMPLES*2-1, regTemp);
   193         -      sqlite3VdbeAddOp2(v, OP_Integer, SQLITE_INDEX_SAMPLES*2, regTemp2);
   194         -
   195         -      sqlite3VdbeAddOp2(v, OP_Count, iIdxCur, regLast);
   196         -      sqlite3VdbeAddOp2(v, OP_Null, 0, regFirst);
   197         -      addr = sqlite3VdbeAddOp3(v, OP_Lt, regSamplerecno, 0, regLast);
   198         -      sqlite3VdbeAddOp3(v, OP_Divide, regTemp2, regLast, regFirst);
   199         -      sqlite3VdbeAddOp3(v, OP_Multiply, regLast, regTemp, regLast);
   200         -      sqlite3VdbeAddOp2(v, OP_AddImm, regLast, SQLITE_INDEX_SAMPLES*2-2);
   201         -      sqlite3VdbeAddOp3(v, OP_Divide,  regTemp2, regLast, regLast);
   202         -      sqlite3VdbeJumpHere(v, addr);
   203         -    }
   204         -
   205         -    /* Zero the regSampleno and regRecno registers. */
          304  +    if( once ){
          305  +      once = 0;
          306  +      sqlite3VdbeAddOp2(v, OP_Integer, SQLITE_INDEX_SAMPLES*2, regSample2);
          307  +
          308  +      sqlite3VdbeAddOp2(v, OP_Count, iIdxCur, regCount);
          309  +      sqlite3VdbeAddOp3(v, OP_Add, regCount, regCount, regCount2);
          310  +
          311  +
          312  +      /* Generate code for a subroutine that store the most recent sample
          313  +      ** in the sqlite_stat2 table
          314  +      */
          315  +      shortJump = sqlite3VdbeAddOp0(v, OP_Goto);
          316  +      sqlite3VdbeAddOp4(v, OP_MakeRecord, regTabname, 5, regRec, "aaaba", 0);
          317  +      VdbeComment((v, "begin stat2 write subroutine"));
          318  +      sqlite3VdbeAddOp2(v, OP_NewRowid, iStatCur+1, regRowid);
          319  +      sqlite3VdbeAddOp3(v, OP_Insert, iStatCur+1, regRec, regRowid);
          320  +      sqlite3VdbeAddOp2(v, OP_AddImm, regSampleno, 1);
          321  +      sqlite3VdbeAddOp2(v, OP_AddImm, regReady, -1);
          322  +      addrStoreStat2 = sqlite3VdbeAddOp2(v, OP_IfPos, regReady, shortJump+1);
          323  +      sqlite3VdbeAddOp1(v, OP_Return, regGosub);
          324  +      VdbeComment((v, "end stat2 write subroutine"));
          325  +      sqlite3VdbeJumpHere(v, shortJump);
          326  +    }
          327  +    /* Reset state registers */
          328  +    sqlite3VdbeAddOp2(v, OP_Copy, regCount2, regNext);
          329  +    shortJump = sqlite3VdbeAddOp3(v, OP_Lt, regSample2, 0, regCount);
          330  +    sqlite3VdbeAddOp3(v, OP_Divide, regSample2, regCount, regNext);
          331  +    sqlite3VdbeJumpHere(v, shortJump);
   206    332       sqlite3VdbeAddOp2(v, OP_Integer, 0, regSampleno);
   207         -    sqlite3VdbeAddOp2(v, OP_Integer, 0, regRecno);
   208         -    sqlite3VdbeAddOp2(v, OP_Copy, regFirst, regSamplerecno);
   209         -#endif
          333  +    sqlite3VdbeAddOp2(v, OP_Integer, 0, regSampleIdx);
          334  +    sqlite3VdbeAddOp2(v, OP_Integer, 0, regReady);
          335  +
          336  +#endif /* SQLITE_ENABLE_STAT2 */
   210    337   
   211    338       /* The block of memory cells initialized here is used as follows.
   212    339       **
   213    340       **    iMem:                
   214    341       **        The total number of rows in the table.
   215    342       **
   216    343       **    iMem+1 .. iMem+nCol: 
................................................................................
   232    359       }
   233    360   
   234    361       /* Start the analysis loop. This loop runs through all the entries in
   235    362       ** the index b-tree.  */
   236    363       endOfLoop = sqlite3VdbeMakeLabel(v);
   237    364       sqlite3VdbeAddOp2(v, OP_Rewind, iIdxCur, endOfLoop);
   238    365       topOfLoop = sqlite3VdbeCurrentAddr(v);
   239         -    sqlite3VdbeAddOp2(v, OP_AddImm, iMem, 1);
          366  +    sqlite3VdbeAddOp2(v, OP_AddImm, iMem, 1);  /* Increment row counter */
   240    367   
   241    368       for(i=0; i<nCol; i++){
   242    369         CollSeq *pColl;
   243    370         sqlite3VdbeAddOp3(v, OP_Column, iIdxCur, i, regCol);
   244    371         if( i==0 ){
   245         -#ifdef SQLITE_ENABLE_STAT2
   246         -        /* Check if the record that cursor iIdxCur points to contains a
   247         -        ** value that should be stored in the sqlite_stat2 table. If so,
   248         -        ** store it.  */
   249         -        int ne = sqlite3VdbeAddOp3(v, OP_Ne, regRecno, 0, regSamplerecno);
   250         -        assert( regTabname+1==regIdxname 
   251         -             && regTabname+2==regSampleno
   252         -             && regTabname+3==regCol
   253         -        );
   254         -        sqlite3VdbeChangeP5(v, SQLITE_JUMPIFNULL);
   255         -        sqlite3VdbeAddOp4(v, OP_MakeRecord, regTabname, 4, regRec, "aaab", 0);
   256         -        sqlite3VdbeAddOp2(v, OP_NewRowid, iStatCur+1, regRowid);
   257         -        sqlite3VdbeAddOp3(v, OP_Insert, iStatCur+1, regRec, regRowid);
   258         -
   259         -        /* Calculate new values for regSamplerecno and regSampleno.
   260         -        **
   261         -        **   sampleno = sampleno + 1
   262         -        **   samplerecno = samplerecno+(remaining records)/(remaining samples)
   263         -        */
   264         -        sqlite3VdbeAddOp2(v, OP_AddImm, regSampleno, 1);
   265         -        sqlite3VdbeAddOp3(v, OP_Subtract, regRecno, regLast, regTemp);
   266         -        sqlite3VdbeAddOp2(v, OP_AddImm, regTemp, -1);
   267         -        sqlite3VdbeAddOp2(v, OP_Integer, SQLITE_INDEX_SAMPLES, regTemp2);
   268         -        sqlite3VdbeAddOp3(v, OP_Subtract, regSampleno, regTemp2, regTemp2);
   269         -        sqlite3VdbeAddOp3(v, OP_Divide, regTemp2, regTemp, regTemp);
   270         -        sqlite3VdbeAddOp3(v, OP_Add, regSamplerecno, regTemp, regSamplerecno);
   271         -
   272         -        sqlite3VdbeJumpHere(v, ne);
   273         -        sqlite3VdbeAddOp2(v, OP_AddImm, regRecno, 1);
   274         -#endif
   275         -
   276    372           /* Always record the very first row */
   277         -        sqlite3VdbeAddOp1(v, OP_IfNot, iMem+1);
          373  +        addrIfNot = sqlite3VdbeAddOp1(v, OP_IfNot, iMem+1);
   278    374         }
   279    375         assert( pIdx->azColl!=0 );
   280    376         assert( pIdx->azColl[i]!=0 );
   281    377         pColl = sqlite3LocateCollSeq(pParse, pIdx->azColl[i]);
   282         -      sqlite3VdbeAddOp4(v, OP_Ne, regCol, 0, iMem+nCol+i+1,
   283         -                       (char*)pColl, P4_COLLSEQ);
          378  +      aChngAddr[i] = sqlite3VdbeAddOp4(v, OP_Ne, regCol, 0, iMem+nCol+i+1,
          379  +                                      (char*)pColl, P4_COLLSEQ);
   284    380         sqlite3VdbeChangeP5(v, SQLITE_NULLEQ);
   285         -    }
   286         -    if( db->mallocFailed ){
   287         -      /* If a malloc failure has occurred, then the result of the expression 
   288         -      ** passed as the second argument to the call to sqlite3VdbeJumpHere() 
   289         -      ** below may be negative. Which causes an assert() to fail (or an
   290         -      ** out-of-bounds write if SQLITE_DEBUG is not defined).  */
   291         -      return;
          381  +      VdbeComment((v, "jump if column %d changed", i));
          382  +#ifdef SQLITE_ENABLE_STAT2
          383  +      if( i==0 && addrStoreStat2 ){
          384  +        sqlite3VdbeAddOp2(v, OP_AddImm, regSampleCnt, 1);
          385  +        VdbeComment((v, "incr repeat count"));
          386  +      }
          387  +#endif
   292    388       }
   293    389       sqlite3VdbeAddOp2(v, OP_Goto, 0, endOfLoop);
   294    390       for(i=0; i<nCol; i++){
   295         -      int addr2 = sqlite3VdbeCurrentAddr(v) - (nCol*2);
          391  +      sqlite3VdbeJumpHere(v, aChngAddr[i]);  /* Set jump dest for the OP_Ne */
   296    392         if( i==0 ){
   297         -        sqlite3VdbeJumpHere(v, addr2-1);  /* Set jump dest for the OP_IfNot */
          393  +        sqlite3VdbeJumpHere(v, addrIfNot);   /* Jump dest for OP_IfNot */
          394  +#ifdef SQLITE_ENABLE_STAT2
          395  +        sqlite3VdbeAddOp2(v, OP_Gosub, regGosub, addrStoreStat2);
          396  +        sqlite3VdbeAddOp2(v, OP_Integer, 1, regSampleCnt);
          397  +#endif        
   298    398         }
   299         -      sqlite3VdbeJumpHere(v, addr2);      /* Set jump dest for the OP_Ne */
   300    399         sqlite3VdbeAddOp2(v, OP_AddImm, iMem+i+1, 1);
   301    400         sqlite3VdbeAddOp3(v, OP_Column, iIdxCur, i, iMem+nCol+i+1);
   302    401       }
          402  +    sqlite3DbFree(db, aChngAddr);
   303    403   
   304         -    /* End of the analysis loop. */
          404  +    /* Always jump here after updating the iMem+1...iMem+1+nCol counters */
   305    405       sqlite3VdbeResolveLabel(v, endOfLoop);
          406  +
          407  +#ifdef SQLITE_ENABLE_STAT2
          408  +    /* Check if the record that cursor iIdxCur points to contains a
          409  +    ** value that should be stored in the sqlite_stat2 table. If so,
          410  +    ** store it. 
          411  +    */
          412  +    int ne = sqlite3VdbeAddOp3(v, OP_Le, regNext, 0, iMem);
          413  +    VdbeComment((v, "jump if not a sample"));
          414  +    shortJump = sqlite3VdbeAddOp1(v, OP_If, regReady);
          415  +    sqlite3VdbeAddOp2(v, OP_Copy, iMem+nCol+1, regSample);
          416  +    sqlite3VdbeJumpHere(v, shortJump);
          417  +    sqlite3VdbeAddOp2(v, OP_AddImm, regReady, 1);
          418  +
          419  +    /* Calculate new values for regNextSample.  Where N is the number
          420  +    ** of rows in the table and S is the number of samples to take:
          421  +    **
          422  +    **   nextSample = (sampleNumber*N*2 + N)/(2*S)
          423  +    */
          424  +    sqlite3VdbeAddOp2(v, OP_AddImm, regSampleIdx, 1);
          425  +    sqlite3VdbeAddOp3(v, OP_Multiply, regSampleIdx, regCount2, regNext);
          426  +    sqlite3VdbeAddOp3(v, OP_Add, regNext, regCount, regNext);
          427  +    sqlite3VdbeAddOp3(v, OP_Divide, regSample2, regNext, regNext);
          428  +    sqlite3VdbeJumpHere(v, ne);
          429  +#endif
          430  +
   306    431       sqlite3VdbeAddOp2(v, OP_Next, iIdxCur, topOfLoop);
   307    432       sqlite3VdbeAddOp1(v, OP_Close, iIdxCur);
          433  +#ifdef SQLITE_ENABLE_STAT2
          434  +    sqlite3VdbeAddOp2(v, OP_Gosub, regGosub, addrStoreStat2);
          435  +#endif        
   308    436   
   309    437       /* Store the results in sqlite_stat1.
   310    438       **
   311    439       ** The result is a single row of the sqlite_stat1 table.  The first
   312    440       ** two columns are the names of the table and index.  The third column
   313    441       ** is a string composed of a list of integer statistics about the
   314    442       ** index.  The first integer in the list is the total number of entries
................................................................................
   484    612           if( (pIdx = sqlite3FindIndex(db, z, zDb))!=0 ){
   485    613             analyzeTable(pParse, pIdx->pTable, pIdx);
   486    614           }else if( (pTab = sqlite3LocateTable(pParse, 0, z, zDb))!=0 ){
   487    615             analyzeTable(pParse, pTab, 0);
   488    616           }
   489    617           sqlite3DbFree(db, z);
   490    618         }
   491         -    }   
          619  +    }
   492    620     }
   493    621   }
   494    622   
   495    623   /*
   496    624   ** Used to pass information from the analyzer reader through to the
   497    625   ** callback routine.
   498    626   */
................................................................................
   552    680         pIndex->bUnordered = 1;
   553    681         break;
   554    682       }
   555    683     }
   556    684     return 0;
   557    685   }
   558    686   
          687  +#if SQLITE_ENABLE_STAT2
          688  +/*
          689  +** Delete an array of IndexSample objects
          690  +*/
          691  +static void deleteIndexSampleArray(
          692  +  sqlite3 *db,                 /* The database connection */
          693  +  IndexSampleArray *pArray     /* Array of IndexSample objects */
          694  +){
          695  +  int j;
          696  +  if( pArray->a==0 ) return;
          697  +  for(j=0; j<pArray->n; j++){
          698  +    IndexSample *p = &pArray->a[j];
          699  +    if( p->eType==SQLITE_TEXT || p->eType==SQLITE_BLOB ){
          700  +      sqlite3_free(p->u.z);
          701  +    }
          702  +  }
          703  +  sqlite3_free(pArray->a);
          704  +  memset(pArray, 0, sizeof(*pArray));
          705  +}
          706  +#endif
          707  +
   559    708   /*
   560         -** If the Index.aSample variable is not NULL, delete the aSample[] array
   561         -** and its contents.
          709  +** Delete the sample and common-key arrays from the index.
   562    710   */
   563    711   void sqlite3DeleteIndexSamples(sqlite3 *db, Index *pIdx){
   564    712   #ifdef SQLITE_ENABLE_STAT2
   565         -  if( pIdx->aSample ){
   566         -    int j;
   567         -    for(j=0; j<SQLITE_INDEX_SAMPLES; j++){
   568         -      IndexSample *p = &pIdx->aSample[j];
   569         -      if( p->eType==SQLITE_TEXT || p->eType==SQLITE_BLOB ){
   570         -        sqlite3DbFree(db, p->u.z);
   571         -      }
   572         -    }
   573         -    sqlite3DbFree(db, pIdx->aSample);
   574         -  }
          713  +  deleteIndexSampleArray(db, &pIdx->sample);
          714  +  deleteIndexSampleArray(db, &pIdx->comkey);
   575    715   #else
   576    716     UNUSED_PARAMETER(db);
   577    717     UNUSED_PARAMETER(pIdx);
   578    718   #endif
   579    719   }
   580    720   
          721  +#ifdef SQLITE_ENABLE_STAT2
          722  +/*
          723  +** Enlarge an array of IndexSample objects.
          724  +*/
          725  +static IndexSample *allocIndexSample(
          726  +  sqlite3 *db,              /* Database connection to malloc against */
          727  +  IndexSampleArray *pArray, /* The array to enlarge */
          728  +  int i                     /* Return this element */
          729  +){
          730  +  IndexSample *p;
          731  +  if( i>=pArray->nAlloc ){
          732  +    int szNew = i+1;
          733  +    p = (IndexSample*)sqlite3_realloc(pArray->a, szNew*sizeof(IndexSample));
          734  +    if( p==0 ) return 0;
          735  +    pArray->a = p;
          736  +    memset(&pArray->a[pArray->n], 0, (szNew-(pArray->n))*sizeof(IndexSample));
          737  +    pArray->nAlloc = szNew;
          738  +  }
          739  +  if( i>=pArray->n ) pArray->n = i+1;
          740  +  return &pArray->a[i];
          741  +}
          742  +#endif
          743  +
   581    744   /*
   582    745   ** Load the content of the sqlite_stat1 and sqlite_stat2 tables. The
   583    746   ** contents of sqlite_stat1 are used to populate the Index.aiRowEst[]
   584    747   ** arrays. The contents of sqlite_stat2 are used to populate the
   585         -** Index.aSample[] arrays.
          748  +** Index.sample and Index.comkey arrays.
   586    749   **
   587    750   ** If the sqlite_stat1 table is not present in the database, SQLITE_ERROR
   588    751   ** is returned. In this case, even if SQLITE_ENABLE_STAT2 was defined 
   589    752   ** during compilation and the sqlite_stat2 table is present, no data is 
   590    753   ** read from it.
   591    754   **
   592    755   ** If SQLITE_ENABLE_STAT2 was defined during compilation and the 
................................................................................
   599    762   ** code may be ignored.
   600    763   */
   601    764   int sqlite3AnalysisLoad(sqlite3 *db, int iDb){
   602    765     analysisInfo sInfo;
   603    766     HashElem *i;
   604    767     char *zSql;
   605    768     int rc;
          769  +  Table *pTab;    /* Stat1 or Stat2 table */
   606    770   
   607    771     assert( iDb>=0 && iDb<db->nDb );
   608    772     assert( db->aDb[iDb].pBt!=0 );
   609    773   
   610    774     /* Clear any prior statistics */
   611    775     assert( sqlite3SchemaMutexHeld(db, iDb, 0) );
   612    776     for(i=sqliteHashFirst(&db->aDb[iDb].pSchema->idxHash);i;i=sqliteHashNext(i)){
   613    777       Index *pIdx = sqliteHashData(i);
   614    778       sqlite3DefaultRowEst(pIdx);
   615    779       sqlite3DeleteIndexSamples(db, pIdx);
   616         -    pIdx->aSample = 0;
   617    780     }
   618    781   
   619    782     /* Check to make sure the sqlite_stat1 table exists */
   620    783     sInfo.db = db;
   621    784     sInfo.zDatabase = db->aDb[iDb].zName;
   622         -  if( sqlite3FindTable(db, "sqlite_stat1", sInfo.zDatabase)==0 ){
          785  +  if( (pTab=sqlite3FindTable(db, "sqlite_stat1", sInfo.zDatabase))==0 ){
   623    786       return SQLITE_ERROR;
   624    787     }
   625    788   
   626    789     /* Load new statistics out of the sqlite_stat1 table */
   627    790     zSql = sqlite3MPrintf(db, 
   628    791         "SELECT tbl, idx, stat FROM %Q.sqlite_stat1", sInfo.zDatabase);
   629    792     if( zSql==0 ){
................................................................................
   632    795       rc = sqlite3_exec(db, zSql, analysisLoader, &sInfo, 0);
   633    796       sqlite3DbFree(db, zSql);
   634    797     }
   635    798   
   636    799   
   637    800     /* Load the statistics from the sqlite_stat2 table. */
   638    801   #ifdef SQLITE_ENABLE_STAT2
   639         -  if( rc==SQLITE_OK && !sqlite3FindTable(db, "sqlite_stat2", sInfo.zDatabase) ){
          802  +  if( rc==SQLITE_OK 
          803  +    && (pTab=sqlite3FindTable(db, "sqlite_stat2", sInfo.zDatabase))==0 ){
   640    804       rc = SQLITE_ERROR;
   641    805     }
   642    806     if( rc==SQLITE_OK ){
   643    807       sqlite3_stmt *pStmt = 0;
   644    808   
   645    809       zSql = sqlite3MPrintf(db, 
   646         -        "SELECT idx,sampleno,sample FROM %Q.sqlite_stat2", sInfo.zDatabase);
          810  +        "SELECT idx, sampleno, sample, %s FROM %Q.sqlite_stat2"
          811  +        " ORDER BY rowid DESC",
          812  +        pTab->nCol>=5 ? "cnt" : "0", sInfo.zDatabase);
   647    813       if( !zSql ){
   648    814         rc = SQLITE_NOMEM;
   649    815       }else{
   650    816         rc = sqlite3_prepare(db, zSql, -1, &pStmt, 0);
   651    817         sqlite3DbFree(db, zSql);
   652    818       }
   653    819   
   654    820       if( rc==SQLITE_OK ){
   655    821         while( sqlite3_step(pStmt)==SQLITE_ROW ){
   656    822           char *zIndex;   /* Index name */
   657    823           Index *pIdx;    /* Pointer to the index object */
          824  +        int iSample;
          825  +        int eType;
          826  +        IndexSample *pSample;
   658    827   
   659    828           zIndex = (char *)sqlite3_column_text(pStmt, 0);
   660         -        pIdx = zIndex ? sqlite3FindIndex(db, zIndex, sInfo.zDatabase) : 0;
   661         -        if( pIdx ){
   662         -          int iSample = sqlite3_column_int(pStmt, 1);
   663         -          if( iSample<SQLITE_INDEX_SAMPLES && iSample>=0 ){
   664         -            int eType = sqlite3_column_type(pStmt, 2);
   665         -
   666         -            if( pIdx->aSample==0 ){
   667         -              static const int sz = sizeof(IndexSample)*SQLITE_INDEX_SAMPLES;
   668         -              pIdx->aSample = (IndexSample *)sqlite3DbMallocRaw(0, sz);
   669         -              if( pIdx->aSample==0 ){
   670         -                db->mallocFailed = 1;
   671         -                break;
   672         -              }
   673         -	      memset(pIdx->aSample, 0, sz);
   674         -            }
   675         -
   676         -            assert( pIdx->aSample );
   677         -            {
   678         -              IndexSample *pSample = &pIdx->aSample[iSample];
   679         -              pSample->eType = (u8)eType;
   680         -              if( eType==SQLITE_INTEGER || eType==SQLITE_FLOAT ){
   681         -                pSample->u.r = sqlite3_column_double(pStmt, 2);
   682         -              }else if( eType==SQLITE_TEXT || eType==SQLITE_BLOB ){
   683         -                const char *z = (const char *)(
   684         -                    (eType==SQLITE_BLOB) ?
   685         -                    sqlite3_column_blob(pStmt, 2):
   686         -                    sqlite3_column_text(pStmt, 2)
   687         -                );
   688         -                int n = sqlite3_column_bytes(pStmt, 2);
   689         -                if( n>24 ){
   690         -                  n = 24;
   691         -                }
   692         -                pSample->nByte = (u8)n;
   693         -                if( n < 1){
   694         -                  pSample->u.z = 0;
   695         -                }else{
   696         -                  pSample->u.z = sqlite3DbStrNDup(0, z, n);
   697         -                  if( pSample->u.z==0 ){
   698         -                    db->mallocFailed = 1;
   699         -                    break;
   700         -                  }
   701         -                }
   702         -              }
          829  +        if( zIndex==0 ) continue;
          830  +        pIdx = sqlite3FindIndex(db, zIndex, sInfo.zDatabase);
          831  +        if( pIdx==0 ) continue;
          832  +        iSample = sqlite3_column_int(pStmt, 1);
          833  +        if( iSample>=SQLITE_MAX_SAMPLES ) continue;
          834  +        if( iSample<0 ){
          835  +          pSample = allocIndexSample(db, &pIdx->comkey, pIdx->comkey.n);
          836  +        }else{
          837  +          pSample = allocIndexSample(db, &pIdx->sample, iSample);
          838  +        }
          839  +        if( pSample==0 ) break;
          840  +        eType = sqlite3_column_type(pStmt, 2);
          841  +        pSample->eType = (u8)eType;
          842  +        pSample->nCopy = sqlite3_column_int(pStmt, 3);
          843  +        if( eType==SQLITE_INTEGER ){
          844  +          pSample->u.i = sqlite3_column_int64(pStmt, 2);
          845  +        }else if( eType==SQLITE_FLOAT ){
          846  +          pSample->u.r = sqlite3_column_double(pStmt, 2);
          847  +        }else if( eType==SQLITE_TEXT || eType==SQLITE_BLOB ){
          848  +          const char *z = (const char *)(
          849  +             (eType==SQLITE_BLOB) ?
          850  +              sqlite3_column_blob(pStmt, 2):
          851  +              sqlite3_column_text(pStmt, 2)
          852  +          );
          853  +          int n = sqlite3_column_bytes(pStmt, 2);
          854  +          if( n>255 ) n = 255;
          855  +          pSample->nByte = (u8)n;
          856  +          if( n < 1){
          857  +            pSample->u.z = 0;
          858  +          }else{
          859  +            pSample->u.z = sqlite3DbStrNDup(0, z, n);
          860  +            if( pSample->u.z==0 ){
          861  +              db->mallocFailed = 1;
          862  +              break;
   703    863               }
   704    864             }
   705    865           }
   706    866         }
   707    867         rc = sqlite3_finalize(pStmt);
   708    868       }
   709    869     }

Changes to src/sqliteInt.h.

    77     77   #endif
    78     78   
    79     79   /*
    80     80   ** The number of samples of an index that SQLite takes in order to 
    81     81   ** construct a histogram of the table content when running ANALYZE
    82     82   ** and with SQLITE_ENABLE_STAT2
    83     83   */
    84         -#define SQLITE_INDEX_SAMPLES 10
           84  +#ifndef SQLITE_INDEX_SAMPLES
           85  +# define SQLITE_INDEX_SAMPLES 10
           86  +#endif
           87  +#define SQLITE_MAX_SAMPLES 100
    85     88   
    86     89   /*
    87     90   ** The following macros are used to cast pointers to integers and
    88     91   ** integers to pointers.  The way you do this varies from one compiler
    89     92   ** to the next, so we have developed the following set of #if statements
    90     93   ** to generate appropriate macros for a wide range of compilers.
    91     94   **
................................................................................
   608    611   typedef struct FKey FKey;
   609    612   typedef struct FuncDestructor FuncDestructor;
   610    613   typedef struct FuncDef FuncDef;
   611    614   typedef struct FuncDefHash FuncDefHash;
   612    615   typedef struct IdList IdList;
   613    616   typedef struct Index Index;
   614    617   typedef struct IndexSample IndexSample;
          618  +typedef struct IndexSampleArray IndexSampleArray;
   615    619   typedef struct KeyClass KeyClass;
   616    620   typedef struct KeyInfo KeyInfo;
   617    621   typedef struct Lookaside Lookaside;
   618    622   typedef struct LookasideSlot LookasideSlot;
   619    623   typedef struct Module Module;
   620    624   typedef struct NameContext NameContext;
   621    625   typedef struct Parse Parse;
................................................................................
  1443   1447   #define UNPACKED_NEED_FREE     0x0001  /* Memory is from sqlite3Malloc() */
  1444   1448   #define UNPACKED_NEED_DESTROY  0x0002  /* apMem[]s should all be destroyed */
  1445   1449   #define UNPACKED_IGNORE_ROWID  0x0004  /* Ignore trailing rowid on key1 */
  1446   1450   #define UNPACKED_INCRKEY       0x0008  /* Make this key an epsilon larger */
  1447   1451   #define UNPACKED_PREFIX_MATCH  0x0010  /* A prefix match is considered OK */
  1448   1452   #define UNPACKED_PREFIX_SEARCH 0x0020  /* A prefix match is considered OK */
  1449   1453   
         1454  +/*
         1455  +** Each sample stored in the sqlite_stat2 table is represented in memory 
         1456  +** using a structure of this type.
         1457  +*/
         1458  +struct IndexSample {
         1459  +  union {
         1460  +    char *z;        /* Value if eType is SQLITE_TEXT or SQLITE_BLOB */
         1461  +    double r;       /* Value if eType is SQLITE_FLOAT */
         1462  +    i64 i;          /* Value if eType is SQLITE_INTEGER */
         1463  +  } u;
         1464  +  u8 eType;         /* SQLITE_NULL, SQLITE_INTEGER ... etc. */
         1465  +  u8 nByte;         /* Size in byte of text or blob. */
         1466  +  u32 nCopy;        /* How many copies of this sample are in the database */
         1467  +};
         1468  +
         1469  +/*
         1470  +** An array of IndexSample elements is as follows:
         1471  +*/
         1472  +struct IndexSampleArray {
         1473  +  u16 n;            /* Number of elements in the array */
         1474  +  u16 nAlloc;       /* Space allocated to a[] */
         1475  +  IndexSample *a;   /* The samples */
         1476  +};
         1477  +
  1450   1478   /*
  1451   1479   ** Each SQL index is represented in memory by an
  1452   1480   ** instance of the following structure.
  1453   1481   **
  1454   1482   ** The columns of the table that are to be indexed are described
  1455   1483   ** by the aiColumn[] field of this structure.  For example, suppose
  1456   1484   ** we have the following table and index:
................................................................................
  1484   1512     u8 autoIndex;    /* True if is automatically created (ex: by UNIQUE) */
  1485   1513     u8 bUnordered;   /* Use this index for == or IN queries only */
  1486   1514     char *zColAff;   /* String defining the affinity of each column */
  1487   1515     Index *pNext;    /* The next index associated with the same table */
  1488   1516     Schema *pSchema; /* Schema containing this index */
  1489   1517     u8 *aSortOrder;  /* Array of size Index.nColumn. True==DESC, False==ASC */
  1490   1518     char **azColl;   /* Array of collation sequence names for index */
  1491         -  IndexSample *aSample;    /* Array of SQLITE_INDEX_SAMPLES samples */
  1492         -};
  1493         -
  1494         -/*
  1495         -** Each sample stored in the sqlite_stat2 table is represented in memory 
  1496         -** using a structure of this type.
  1497         -*/
  1498         -struct IndexSample {
  1499         -  union {
  1500         -    char *z;        /* Value if eType is SQLITE_TEXT or SQLITE_BLOB */
  1501         -    double r;       /* Value if eType is SQLITE_FLOAT or SQLITE_INTEGER */
  1502         -  } u;
  1503         -  u8 eType;         /* SQLITE_NULL, SQLITE_INTEGER ... etc. */
  1504         -  u8 nByte;         /* Size in byte of text or blob. */
         1519  +#ifdef SQLITE_ENABLE_STAT2
         1520  +  IndexSampleArray sample;  /* Sampled histogram for the first column */
         1521  +  IndexSampleArray comkey;  /* The most common keys */
         1522  +#endif
  1505   1523   };
  1506   1524   
  1507   1525   /*
  1508   1526   ** Each token coming out of the lexer is an instance of
  1509   1527   ** this structure.  Tokens are also used as part of an expression.
  1510   1528   **
  1511   1529   ** Note if Token.z==0 then Token.dyn and Token.n are undefined and

Changes to src/vdbe.c.

  1049   1049   */
  1050   1050   case OP_Copy: {             /* in1, out2 */
  1051   1051     pIn1 = &aMem[pOp->p1];
  1052   1052     pOut = &aMem[pOp->p2];
  1053   1053     assert( pOut!=pIn1 );
  1054   1054     sqlite3VdbeMemShallowCopy(pOut, pIn1, MEM_Ephem);
  1055   1055     Deephemeralize(pOut);
  1056         -  REGISTER_TRACE(pOp->p2, pOut);
  1057   1056     break;
  1058   1057   }
  1059   1058   
  1060   1059   /* Opcode: SCopy P1 P2 * * *
  1061   1060   **
  1062   1061   ** Make a shallow copy of register P1 into register P2.
  1063   1062   **
................................................................................
  1536   1535   ** To force any register to be an integer, just add 0.
  1537   1536   */
  1538   1537   case OP_AddImm: {            /* in1 */
  1539   1538     pIn1 = &aMem[pOp->p1];
  1540   1539     memAboutToChange(p, pIn1);
  1541   1540     sqlite3VdbeMemIntegerify(pIn1);
  1542   1541     pIn1->u.i += pOp->p2;
         1542  +  REGISTER_TRACE(pOp->p1, pIn1);
  1543   1543     break;
  1544   1544   }
  1545   1545   
  1546   1546   /* Opcode: MustBeInt P1 P2 * * *
  1547   1547   ** 
  1548   1548   ** Force the value in register P1 to be an integer.  If the value
  1549   1549   ** in P1 is not an integer and cannot be converted into an integer
................................................................................
  2464   2464     memAboutToChange(p, pOut);
  2465   2465   
  2466   2466     /* Loop through the elements that will make up the record to figure
  2467   2467     ** out how much space is required for the new record.
  2468   2468     */
  2469   2469     for(pRec=pData0; pRec<=pLast; pRec++){
  2470   2470       assert( memIsValid(pRec) );
         2471  +    REGISTER_TRACE((pRec-pData0)+pOp->p1, pRec);
  2471   2472       if( zAffinity ){
  2472   2473         applyAffinity(pRec, zAffinity[pRec-pData0], encoding);
  2473   2474       }
  2474   2475       if( pRec->flags&MEM_Zero && pRec->n>0 ){
  2475   2476         sqlite3VdbeMemExpandBlob(pRec);
  2476   2477       }
  2477   2478       serial_type = sqlite3VdbeSerialType(pRec, file_format);

Changes to src/where.c.

  2418   2418     */
  2419   2419     bestOrClauseIndex(pParse, pWC, pSrc, notReady, notValid, pOrderBy, pCost);
  2420   2420   }
  2421   2421   #endif /* SQLITE_OMIT_VIRTUALTABLE */
  2422   2422   
  2423   2423   /*
  2424   2424   ** Argument pIdx is a pointer to an index structure that has an array of
  2425         -** SQLITE_INDEX_SAMPLES evenly spaced samples of the first indexed column
  2426         -** stored in Index.aSample. These samples divide the domain of values stored
  2427         -** the index into (SQLITE_INDEX_SAMPLES+1) regions.
  2428         -** Region 0 contains all values less than the first sample value. Region
  2429         -** 1 contains values between the first and second samples.  Region 2 contains
  2430         -** values between samples 2 and 3.  And so on.  Region SQLITE_INDEX_SAMPLES
  2431         -** contains values larger than the last sample.
         2425  +** pIdx->sample.n (hereafter "S") evenly spaced samples of the first indexed
         2426  +** column stored in Index.sample. These samples divide the domain of values
         2427  +** stored the index into S+1 regions.  Region 0 contains all values less than
         2428  +** the first sample value. Region 1 contains values between the first and
         2429  +** second samples.  Region 2 contains values between samples 2 and 3.  And so
         2430  +** on.  Region S contains values larger than the last sample.
         2431  +**
         2432  +** Note that samples are computed as being centered on S buckets where each
         2433  +** bucket contains the nearly same number of rows.  This routine takes samples
         2434  +** to be dividers between regions, though.  Hence, region 0 and region S
         2435  +** contain half as many rows as the interior regions.
  2432   2436   **
  2433   2437   ** If the index contains many duplicates of a single value, then it is
  2434   2438   ** possible that two or more adjacent samples can hold the same value.
  2435   2439   ** When that is the case, the smallest possible region code is returned
  2436   2440   ** when roundUp is false and the largest possible region code is returned
  2437   2441   ** when roundUp is true.
  2438   2442   **
  2439   2443   ** If successful, this function determines which of the regions value 
  2440   2444   ** pVal lies in, sets *piRegion to the region index (a value between 0
  2441         -** and SQLITE_INDEX_SAMPLES+1, inclusive) and returns SQLITE_OK.
         2445  +** and S, inclusive) and returns SQLITE_OK.
  2442   2446   ** Or, if an OOM occurs while converting text values between encodings,
  2443   2447   ** SQLITE_NOMEM is returned and *piRegion is undefined.
  2444   2448   */
  2445   2449   #ifdef SQLITE_ENABLE_STAT2
  2446   2450   static int whereRangeRegion(
  2447   2451     Parse *pParse,              /* Database connection */
  2448   2452     Index *pIdx,                /* Index to consider domain of */
  2449   2453     sqlite3_value *pVal,        /* Value to consider */
  2450   2454     int roundUp,                /* Return largest valid region if true */
  2451         -  int *piRegion               /* OUT: Region of domain in which value lies */
         2455  +  int *piRegion,              /* OUT: Region of domain in which value lies */
         2456  +  u32 *pnCopy                 /* OUT: Number of rows with pVal, or -1 if unk */
  2452   2457   ){
  2453   2458     assert( roundUp==0 || roundUp==1 );
  2454   2459     if( ALWAYS(pVal) ){
  2455         -    IndexSample *aSample = pIdx->aSample;
         2460  +    IndexSample *aSample = pIdx->sample.a;
         2461  +    int nSample = pIdx->sample.n;
  2456   2462       int i = 0;
  2457   2463       int eType = sqlite3_value_type(pVal);
  2458   2464   
  2459         -    if( eType==SQLITE_INTEGER || eType==SQLITE_FLOAT ){
         2465  +    assert( nSample>0 );
         2466  +    if( eType==SQLITE_INTEGER ){
         2467  +      i64 x = sqlite3_value_int64(pVal);
         2468  +      for(i=0; i<nSample; i++){
         2469  +        if( aSample[i].eType==SQLITE_NULL ) continue;
         2470  +        if( aSample[i].eType>=SQLITE_TEXT ) break;
         2471  +        if( aSample[i].u.i==x ) *pnCopy = aSample[i].nCopy;
         2472  +        if( roundUp ){
         2473  +          if( aSample[i].u.i>x ) break;
         2474  +        }else{
         2475  +          if( aSample[i].u.i>=x ) break;
         2476  +        }
         2477  +      }
         2478  +    }else if( eType==SQLITE_FLOAT ){
  2460   2479         double r = sqlite3_value_double(pVal);
  2461         -      for(i=0; i<SQLITE_INDEX_SAMPLES; i++){
         2480  +      for(i=0; i<nSample; i++){
  2462   2481           if( aSample[i].eType==SQLITE_NULL ) continue;
  2463   2482           if( aSample[i].eType>=SQLITE_TEXT ) break;
         2483  +        if( aSample[i].u.r==r ) *pnCopy = aSample[i].nCopy;
  2464   2484           if( roundUp ){
  2465   2485             if( aSample[i].u.r>r ) break;
  2466   2486           }else{
  2467   2487             if( aSample[i].u.r>=r ) break;
  2468   2488           }
  2469   2489         }
  2470   2490       }else if( eType==SQLITE_NULL ){
  2471   2491         i = 0;
         2492  +      if( aSample[0].eType==SQLITE_NULL ) *pnCopy = aSample[0].nCopy;
  2472   2493         if( roundUp ){
  2473         -        while( i<SQLITE_INDEX_SAMPLES && aSample[i].eType==SQLITE_NULL ) i++;
         2494  +        while( i<nSample && aSample[i].eType==SQLITE_NULL ) i++;
  2474   2495         }
  2475   2496       }else{ 
  2476   2497         sqlite3 *db = pParse->db;
  2477   2498         CollSeq *pColl;
  2478   2499         const u8 *z;
  2479   2500         int n;
  2480   2501   
  2481         -      /* pVal comes from sqlite3ValueFromExpr() so the type cannot be NULL */
  2482   2502         assert( eType==SQLITE_TEXT || eType==SQLITE_BLOB );
  2483         -
  2484   2503         if( eType==SQLITE_BLOB ){
  2485   2504           z = (const u8 *)sqlite3_value_blob(pVal);
  2486   2505           pColl = db->pDfltColl;
  2487   2506           assert( pColl->enc==SQLITE_UTF8 );
  2488   2507         }else{
  2489   2508           pColl = sqlite3GetCollSeq(db, SQLITE_UTF8, 0, *pIdx->azColl);
  2490   2509           if( pColl==0 ){
................................................................................
  2496   2515           if( !z ){
  2497   2516             return SQLITE_NOMEM;
  2498   2517           }
  2499   2518           assert( z && pColl && pColl->xCmp );
  2500   2519         }
  2501   2520         n = sqlite3ValueBytes(pVal, pColl->enc);
  2502   2521   
  2503         -      for(i=0; i<SQLITE_INDEX_SAMPLES; i++){
         2522  +      for(i=0; i<nSample; i++){
  2504   2523           int c;
  2505   2524           int eSampletype = aSample[i].eType;
  2506   2525           if( eSampletype==SQLITE_NULL || eSampletype<eType ) continue;
  2507   2526           if( (eSampletype!=eType) ) break;
  2508   2527   #ifndef SQLITE_OMIT_UTF16
  2509   2528           if( pColl->enc!=SQLITE_UTF8 ){
  2510   2529             int nSample;
................................................................................
  2518   2537             c = pColl->xCmp(pColl->pUser, nSample, zSample, n, z);
  2519   2538             sqlite3DbFree(db, zSample);
  2520   2539           }else
  2521   2540   #endif
  2522   2541           {
  2523   2542             c = pColl->xCmp(pColl->pUser, aSample[i].nByte, aSample[i].u.z, n, z);
  2524   2543           }
         2544  +        if( c==0 ) *pnCopy = aSample[i].nCopy;
  2525   2545           if( c-roundUp>=0 ) break;
  2526   2546         }
  2527   2547       }
  2528   2548   
  2529         -    assert( i>=0 && i<=SQLITE_INDEX_SAMPLES );
         2549  +    assert( i>=0 && i<=pIdx->sample.n );
  2530   2550       *piRegion = i;
  2531   2551     }
  2532   2552     return SQLITE_OK;
  2533   2553   }
  2534   2554   #endif   /* #ifdef SQLITE_ENABLE_STAT2 */
  2535   2555   
  2536   2556   /*
................................................................................
  2617   2637     WhereTerm *pUpper,   /* Upper bound on the range. ex: "x<455" Might be NULL */
  2618   2638     int *piEst           /* OUT: Return value */
  2619   2639   ){
  2620   2640     int rc = SQLITE_OK;
  2621   2641   
  2622   2642   #ifdef SQLITE_ENABLE_STAT2
  2623   2643   
  2624         -  if( nEq==0 && p->aSample ){
         2644  +  if( nEq==0 && p->sample.a ){
  2625   2645       sqlite3_value *pLowerVal = 0;
  2626   2646       sqlite3_value *pUpperVal = 0;
  2627   2647       int iEst;
  2628   2648       int iLower = 0;
  2629         -    int iUpper = SQLITE_INDEX_SAMPLES;
         2649  +    int nSample = p->sample.n;
         2650  +    int iUpper = p->sample.n;
  2630   2651       int roundUpUpper = 0;
  2631   2652       int roundUpLower = 0;
         2653  +    u32 nC = 0;
  2632   2654       u8 aff = p->pTable->aCol[p->aiColumn[0]].affinity;
  2633   2655   
  2634   2656       if( pLower ){
  2635   2657         Expr *pExpr = pLower->pExpr->pRight;
  2636   2658         rc = valueFromExpr(pParse, pExpr, aff, &pLowerVal);
  2637   2659         assert( pLower->eOperator==WO_GT || pLower->eOperator==WO_GE );
  2638   2660         roundUpLower = (pLower->eOperator==WO_GT) ?1:0;
................................................................................
  2645   2667       }
  2646   2668   
  2647   2669       if( rc!=SQLITE_OK || (pLowerVal==0 && pUpperVal==0) ){
  2648   2670         sqlite3ValueFree(pLowerVal);
  2649   2671         sqlite3ValueFree(pUpperVal);
  2650   2672         goto range_est_fallback;
  2651   2673       }else if( pLowerVal==0 ){
  2652         -      rc = whereRangeRegion(pParse, p, pUpperVal, roundUpUpper, &iUpper);
         2674  +      rc = whereRangeRegion(pParse, p, pUpperVal, roundUpUpper, &iUpper, &nC);
  2653   2675         if( pLower ) iLower = iUpper/2;
  2654   2676       }else if( pUpperVal==0 ){
  2655         -      rc = whereRangeRegion(pParse, p, pLowerVal, roundUpLower, &iLower);
  2656         -      if( pUpper ) iUpper = (iLower + SQLITE_INDEX_SAMPLES + 1)/2;
         2677  +      rc = whereRangeRegion(pParse, p, pLowerVal, roundUpLower, &iLower, &nC);
         2678  +      if( pUpper ) iUpper = (iLower + p->sample.n + 1)/2;
  2657   2679       }else{
  2658         -      rc = whereRangeRegion(pParse, p, pUpperVal, roundUpUpper, &iUpper);
         2680  +      rc = whereRangeRegion(pParse, p, pUpperVal, roundUpUpper, &iUpper, &nC);
  2659   2681         if( rc==SQLITE_OK ){
  2660         -        rc = whereRangeRegion(pParse, p, pLowerVal, roundUpLower, &iLower);
         2682  +        rc = whereRangeRegion(pParse, p, pLowerVal, roundUpLower, &iLower, &nC);
  2661   2683         }
  2662   2684       }
  2663   2685       WHERETRACE(("range scan regions: %d..%d\n", iLower, iUpper));
  2664         -
  2665   2686       iEst = iUpper - iLower;
  2666         -    testcase( iEst==SQLITE_INDEX_SAMPLES );
  2667         -    assert( iEst<=SQLITE_INDEX_SAMPLES );
         2687  +    testcase( iEst==nSample );
         2688  +    assert( iEst<=nSample );
         2689  +    assert( nSample>0 );
  2668   2690       if( iEst<1 ){
  2669         -      *piEst = 50/SQLITE_INDEX_SAMPLES;
         2691  +      *piEst = 50/nSample;
  2670   2692       }else{
  2671         -      *piEst = (iEst*100)/SQLITE_INDEX_SAMPLES;
         2693  +      *piEst = (iEst*100)/nSample;
  2672   2694       }
  2673   2695       sqlite3ValueFree(pLowerVal);
  2674   2696       sqlite3ValueFree(pUpperVal);
  2675   2697       return rc;
  2676   2698     }
  2677   2699   range_est_fallback:
  2678   2700   #else
................................................................................
  2712   2734     double *pnRow        /* Write the revised row estimate here */
  2713   2735   ){
  2714   2736     sqlite3_value *pRhs = 0;  /* VALUE on right-hand side of pTerm */
  2715   2737     int iLower, iUpper;       /* Range of histogram regions containing pRhs */
  2716   2738     u8 aff;                   /* Column affinity */
  2717   2739     int rc;                   /* Subfunction return code */
  2718   2740     double nRowEst;           /* New estimate of the number of rows */
         2741  +  u32 nC = 0;               /* Key copy count */
  2719   2742   
  2720         -  assert( p->aSample!=0 );
         2743  +  assert( p->sample.a!=0 );
         2744  +  assert( p->sample.n>0 );
  2721   2745     aff = p->pTable->aCol[p->aiColumn[0]].affinity;
  2722   2746     if( pExpr ){
  2723   2747       rc = valueFromExpr(pParse, pExpr, aff, &pRhs);
  2724   2748       if( rc ) goto whereEqualScanEst_cancel;
  2725   2749     }else{
  2726   2750       pRhs = sqlite3ValueNew(pParse->db);
  2727   2751     }
  2728   2752     if( pRhs==0 ) return SQLITE_NOTFOUND;
  2729         -  rc = whereRangeRegion(pParse, p, pRhs, 0, &iLower);
         2753  +  rc = whereRangeRegion(pParse, p, pRhs, 0, &iLower, &nC);
  2730   2754     if( rc ) goto whereEqualScanEst_cancel;
  2731         -  rc = whereRangeRegion(pParse, p, pRhs, 1, &iUpper);
  2732         -  if( rc ) goto whereEqualScanEst_cancel;
  2733         -  WHERETRACE(("equality scan regions: %d..%d\n", iLower, iUpper));
  2734         -  if( iLower>=iUpper ){
  2735         -    nRowEst = p->aiRowEst[0]/(SQLITE_INDEX_SAMPLES*2);
  2736         -    if( nRowEst<*pnRow ) *pnRow = nRowEst;
         2755  +  if( nC==0 ){
         2756  +    rc = whereRangeRegion(pParse, p, pRhs, 1, &iUpper, &nC);
         2757  +    if( rc ) goto whereEqualScanEst_cancel;
         2758  +  }
         2759  +  if( nC ){
         2760  +    WHERETRACE(("equality scan count: %u\n", nC));
         2761  +    *pnRow = nC;
  2737   2762     }else{
  2738         -    nRowEst = (iUpper-iLower)*p->aiRowEst[0]/SQLITE_INDEX_SAMPLES;
  2739         -    *pnRow = nRowEst;
         2763  +    WHERETRACE(("equality scan regions: %d..%d\n", iLower, iUpper));
         2764  +    if( iLower>=iUpper ){
         2765  +      nRowEst = p->aiRowEst[0]/(p->sample.n*3);
         2766  +      if( nRowEst<*pnRow ) *pnRow = nRowEst;
         2767  +    }else{
         2768  +      nRowEst = (iUpper-iLower)*p->aiRowEst[0]/p->sample.n;
         2769  +      *pnRow = nRowEst;
         2770  +    }
  2740   2771     }
  2741   2772   
  2742   2773   whereEqualScanEst_cancel:
  2743   2774     sqlite3ValueFree(pRhs);
  2744   2775     return rc;
  2745   2776   }
  2746   2777   #endif /* defined(SQLITE_ENABLE_STAT2) */
................................................................................
  2772   2803     int iLower, iUpper;       /* Range of histogram regions containing pRhs */
  2773   2804     u8 aff;                   /* Column affinity */
  2774   2805     int rc = SQLITE_OK;       /* Subfunction return code */
  2775   2806     double nRowEst;           /* New estimate of the number of rows */
  2776   2807     int nSpan = 0;            /* Number of histogram regions spanned */
  2777   2808     int nSingle = 0;          /* Histogram regions hit by a single value */
  2778   2809     int nNotFound = 0;        /* Count of values that are not constants */
  2779         -  int i;                               /* Loop counter */
  2780         -  u8 aSpan[SQLITE_INDEX_SAMPLES+1];    /* Histogram regions that are spanned */
  2781         -  u8 aSingle[SQLITE_INDEX_SAMPLES+1];  /* Histogram regions hit once */
         2810  +  int i;                             /* Loop counter */
         2811  +  u32 nC;                            /* Exact count of rows for a key */
         2812  +  int nSample = p->sample.n;         /* Number of samples */
         2813  +  u8 aSpan[SQLITE_MAX_SAMPLES+1];    /* Histogram regions that are spanned */
         2814  +  u8 aSingle[SQLITE_MAX_SAMPLES+1];  /* Histogram regions hit once */
  2782   2815   
  2783         -  assert( p->aSample!=0 );
         2816  +  assert( p->sample.a!=0 );
         2817  +  assert( nSample>0 );
  2784   2818     aff = p->pTable->aCol[p->aiColumn[0]].affinity;
  2785         -  memset(aSpan, 0, sizeof(aSpan));
  2786         -  memset(aSingle, 0, sizeof(aSingle));
         2819  +  memset(aSpan, 0, nSample+1);
         2820  +  memset(aSingle, 0, nSample+1);
  2787   2821     for(i=0; i<pList->nExpr; i++){
  2788   2822       sqlite3ValueFree(pVal);
  2789   2823       rc = valueFromExpr(pParse, pList->a[i].pExpr, aff, &pVal);
  2790   2824       if( rc ) break;
  2791   2825       if( pVal==0 || sqlite3_value_type(pVal)==SQLITE_NULL ){
  2792   2826         nNotFound++;
  2793   2827         continue;
  2794   2828       }
  2795         -    rc = whereRangeRegion(pParse, p, pVal, 0, &iLower);
         2829  +    rc = whereRangeRegion(pParse, p, pVal, 0, &iLower, &nC);
  2796   2830       if( rc ) break;
  2797         -    rc = whereRangeRegion(pParse, p, pVal, 1, &iUpper);
         2831  +    rc = whereRangeRegion(pParse, p, pVal, 1, &iUpper, &nC);
  2798   2832       if( rc ) break;
  2799   2833       if( iLower>=iUpper ){
  2800   2834         aSingle[iLower] = 1;
  2801   2835       }else{
  2802         -      assert( iLower>=0 && iUpper<=SQLITE_INDEX_SAMPLES );
         2836  +      assert( iLower>=0 && iUpper<=nSample );
  2803   2837         while( iLower<iUpper ) aSpan[iLower++] = 1;
  2804   2838       }
  2805   2839     }
  2806   2840     if( rc==SQLITE_OK ){
  2807         -    for(i=nSpan=0; i<=SQLITE_INDEX_SAMPLES; i++){
         2841  +    for(i=nSpan=0; i<=nSample; i++){
  2808   2842         if( aSpan[i] ){
  2809   2843           nSpan++;
  2810   2844         }else if( aSingle[i] ){
  2811   2845           nSingle++;
  2812   2846         }
  2813   2847       }
  2814         -    nRowEst = (nSpan*2+nSingle)*p->aiRowEst[0]/(2*SQLITE_INDEX_SAMPLES)
         2848  +    nRowEst = (nSpan*3+nSingle)*p->aiRowEst[0]/(3*nSample)
  2815   2849                  + nNotFound*p->aiRowEst[1];
  2816   2850       if( nRowEst > p->aiRowEst[0] ) nRowEst = p->aiRowEst[0];
  2817   2851       *pnRow = nRowEst;
  2818   2852       WHERETRACE(("IN row estimate: nSpan=%d, nSingle=%d, nNotFound=%d, est=%g\n",
  2819   2853                    nSpan, nSingle, nNotFound, nRowEst));
  2820   2854     }
  2821   2855     sqlite3ValueFree(pVal);
................................................................................
  3025   3059             /* "x IN (value, value, ...)" */
  3026   3060             nInMul *= pExpr->x.pList->nExpr;
  3027   3061           }
  3028   3062         }else if( pTerm->eOperator & WO_ISNULL ){
  3029   3063           wsFlags |= WHERE_COLUMN_NULL;
  3030   3064         }
  3031   3065   #ifdef SQLITE_ENABLE_STAT2
  3032         -      if( nEq==0 && pProbe->aSample ) pFirstTerm = pTerm;
         3066  +      if( nEq==0 && pProbe->sample.a ) pFirstTerm = pTerm;
  3033   3067   #endif
  3034   3068         used |= pTerm->prereqRight;
  3035   3069       }
  3036   3070   
  3037   3071       /* Determine the value of estBound. */
  3038   3072       if( nEq<pProbe->nColumn && pProbe->bUnordered==0 ){
  3039   3073         int j = pProbe->aiColumn[nEq];

Changes to test/analyze2.test.

    67     67   do_test analyze2-1.1 {
    68     68     execsql { CREATE TABLE t1(x PRIMARY KEY) }
    69     69     for {set i 0} {$i < 1000} {incr i} {
    70     70       execsql { INSERT INTO t1 VALUES($i) }
    71     71     }
    72     72     execsql { 
    73     73       ANALYZE;
    74         -    SELECT * FROM sqlite_stat2;
           74  +    SELECT tbl, idx, sampleno, sample FROM sqlite_stat2;
    75     75     }
    76     76   } [list t1 sqlite_autoindex_t1_1 0 50  \
    77         -        t1 sqlite_autoindex_t1_1 1 149 \
    78         -        t1 sqlite_autoindex_t1_1 2 249 \
    79         -        t1 sqlite_autoindex_t1_1 3 349 \
    80         -        t1 sqlite_autoindex_t1_1 4 449 \
    81         -        t1 sqlite_autoindex_t1_1 5 549 \
    82         -        t1 sqlite_autoindex_t1_1 6 649 \
    83         -        t1 sqlite_autoindex_t1_1 7 749 \
    84         -        t1 sqlite_autoindex_t1_1 8 849 \
    85         -        t1 sqlite_autoindex_t1_1 9 949 \
           77  +        t1 sqlite_autoindex_t1_1 1 150 \
           78  +        t1 sqlite_autoindex_t1_1 2 250 \
           79  +        t1 sqlite_autoindex_t1_1 3 350 \
           80  +        t1 sqlite_autoindex_t1_1 4 450 \
           81  +        t1 sqlite_autoindex_t1_1 5 550 \
           82  +        t1 sqlite_autoindex_t1_1 6 650 \
           83  +        t1 sqlite_autoindex_t1_1 7 750 \
           84  +        t1 sqlite_autoindex_t1_1 8 850 \
           85  +        t1 sqlite_autoindex_t1_1 9 950 \
    86     86   ]
    87     87   
    88     88   do_test analyze2-1.2 {
    89     89     execsql {
    90         -    DELETE FROM t1 WHERe x>9;
           90  +    DELETE FROM t1 WHERE x>20;
    91     91       ANALYZE;
    92     92       SELECT tbl, idx, group_concat(sample, ' ') FROM sqlite_stat2;
    93     93     }
    94         -} {t1 sqlite_autoindex_t1_1 {0 1 2 3 4 5 6 7 8 9}}
           94  +} {t1 sqlite_autoindex_t1_1 {1 3 5 7 9 11 13 15 17 19}}
    95     95   do_test analyze2-1.3 {
    96     96     execsql {
    97     97       DELETE FROM t1 WHERE x>8;
    98     98       ANALYZE;
    99     99       SELECT * FROM sqlite_stat2;
   100    100     }
   101    101   } {}
................................................................................
   185    185     execsql ANALYZE
   186    186     execsql { 
   187    187       SELECT tbl,idx,group_concat(sample,' ') 
   188    188       FROM sqlite_stat2 
   189    189       WHERE idx = 't1_x' 
   190    190       GROUP BY tbl,idx
   191    191     }
   192         -} {t1 t1_x {100 299 499 699 899 ajj cjj ejj gjj ijj}}
          192  +} {t1 t1_x {100 300 500 700 900 baa daa faa haa jaa}}
   193    193   do_test analyze2-3.2 {
   194    194     execsql { 
   195    195       SELECT tbl,idx,group_concat(sample,' ') 
   196    196       FROM sqlite_stat2 
   197    197       WHERE idx = 't1_y' 
   198    198       GROUP BY tbl,idx
   199    199     }
   200         -} {t1 t1_y {100 299 499 699 899 ajj cjj ejj gjj ijj}}
          200  +} {t1 t1_y {100 300 500 700 900 baa daa faa haa jaa}}
   201    201   
   202    202   do_eqp_test 3.3 {
   203    203     SELECT * FROM t1 WHERE x BETWEEN 100 AND 500 AND y BETWEEN 'a' AND 'b'
   204    204   } {
   205         -  0 0 0 {SEARCH TABLE t1 USING INDEX t1_y (y>? AND y<?) (~50 rows)}
          205  +  0 0 0 {SEARCH TABLE t1 USING INDEX t1_y (y>? AND y<?) (~25 rows)}
   206    206   }
   207    207   do_eqp_test 3.4 {
   208    208     SELECT * FROM t1 WHERE x BETWEEN 100 AND 400 AND y BETWEEN 'a' AND 'h'
   209    209   } {
   210    210     0 0 0 {SEARCH TABLE t1 USING INDEX t1_x (x>? AND x<?) (~100 rows)}
   211    211   }
   212    212   do_eqp_test 3.5 {
   213    213     SELECT * FROM t1 WHERE x<'a' AND y>'h'
   214    214   } {
   215         -  0 0 0 {SEARCH TABLE t1 USING INDEX t1_y (y>?) (~66 rows)}
          215  +  0 0 0 {SEARCH TABLE t1 USING INDEX t1_y (y>?) (~133 rows)}
   216    216   }
   217    217   do_eqp_test 3.6 {
   218    218     SELECT * FROM t1 WHERE x<444 AND y>'h'
   219    219   } {
   220         -  0 0 0 {SEARCH TABLE t1 USING INDEX t1_y (y>?) (~66 rows)}
          220  +  0 0 0 {SEARCH TABLE t1 USING INDEX t1_y (y>?) (~133 rows)}
   221    221   }
   222    222   do_eqp_test 3.7 {
   223    223     SELECT * FROM t1 WHERE x<221 AND y>'g'
   224    224   } {
   225    225     0 0 0 {SEARCH TABLE t1 USING INDEX t1_x (x<?) (~66 rows)}
   226    226   }
   227    227   
................................................................................
   245    245       PRAGMA automatic_index=OFF;
   246    246       SELECT tbl,idx,group_concat(sample,' ') 
   247    247       FROM sqlite_stat2 
   248    248       WHERE idx = 't3a' 
   249    249       GROUP BY tbl,idx;
   250    250       PRAGMA automatic_index=ON;
   251    251     }
   252         -} {t3 t3a {AfA bEj CEj dEj EEj fEj GEj hEj IEj jEj}}
          252  +} {t3 t3a {AfA bfA CfA dfA EfA ffA GfA hfA IfA jfA}}
   253    253   do_test analyze2-4.3 {
   254    254     execsql { 
   255    255       SELECT tbl,idx,group_concat(sample,' ') 
   256    256       FROM sqlite_stat2 
   257    257       WHERE idx = 't3b' 
   258    258       GROUP BY tbl,idx
   259    259     }
   260         -} {t3 t3b {AbA CIj EIj GIj IIj bIj dIj fIj hIj jIj}}
          260  +} {t3 t3b {AbA CbA EbA GbA IbA bbA dbA fbA hbA jbA}}
   261    261   
   262    262   do_eqp_test 4.4 {
   263    263     SELECT * FROM t3 WHERE a > 'A' AND a < 'C' AND b > 'A' AND b < 'C'
   264    264   } {
   265    265     0 0 0 {SEARCH TABLE t3 USING INDEX t3b (b>? AND b<?) (~11 rows)}
   266    266   }
   267    267   do_eqp_test 4.5 {
................................................................................
   293    293     do_test analyze2-5.2 {
   294    294       execsql { 
   295    295         SELECT tbl,idx,group_concat(sample,' ') 
   296    296         FROM sqlite_stat2 
   297    297         WHERE tbl = 't4' 
   298    298         GROUP BY tbl,idx
   299    299       }
   300         -  } {t4 t4x {afa bej cej dej eej fej gej hej iej jej}}
          300  +  } {t4 t4x {afa bfa cfa dfa efa ffa gfa hfa ifa jfa}}
   301    301     do_eqp_test 5.3 {
   302    302       SELECT * FROM t4 WHERE x>'ccc'
   303    303     } {0 0 0 {SEARCH TABLE t4 USING COVERING INDEX t4x (x>?) (~800 rows)}}
   304    304     do_eqp_test 5.4 {
   305    305       SELECT * FROM t4 AS t41, t4 AS t42 WHERE t41.x>'ccc' AND t42.x>'ggg'
   306    306     } {
   307    307       0 0 1 {SEARCH TABLE t4 AS t42 USING COVERING INDEX t4x (x>?) (~300 rows)} 

Changes to test/analyze5.test.

   124    124      47  {z>=-100 AND z<=0.0}  t1z  400
   125    125      48  {z>=-100 AND z<0.0}   t1z   50
   126    126      49  {z>=-100 AND z<=1.0}  t1z  700
   127    127      50  {z>=-100 AND z<2.0}   t1z  700
   128    128      51  {z>=-100 AND z<=2.0}  t1z  900
   129    129      52  {z>=-100 AND z<3.0}   t1z  900
   130    130     
   131         -  101  {z=-1}                t1z   50
          131  +  101  {z=-1}                t1z   33
   132    132     102  {z=0}                 t1z  400
   133    133     103  {z=1}                 t1z  300
   134    134     104  {z=2}                 t1z  200
   135    135     105  {z=3}                 t1z  100
   136         -  106  {z=4}                 t1z   50
   137         -  107  {z=-10.0}             t1z   50
          136  +  106  {z=4}                 t1z   33
          137  +  107  {z=-10.0}             t1z   33
   138    138     108  {z=0.0}               t1z  400
   139    139     109  {z=1.0}               t1z  300
   140    140     110  {z=2.0}               t1z  200
   141    141     111  {z=3.0}               t1z  100
   142         -  112  {z=4.0}               t1z   50
   143         -  113  {z=1.5}               t1z   50
   144         -  114  {z=2.5}               t1z   50
          142  +  112  {z=4.0}               t1z   33
          143  +  113  {z=1.5}               t1z   33
          144  +  114  {z=2.5}               t1z   33
   145    145     
   146         -  201  {z IN (-1)}           t1z   50
          146  +  201  {z IN (-1)}           t1z   33
   147    147     202  {z IN (0)}            t1z  400
   148    148     203  {z IN (1)}            t1z  300
   149    149     204  {z IN (2)}            t1z  200
   150    150     205  {z IN (3)}            t1z  100
   151         -  206  {z IN (4)}            t1z   50
   152         -  207  {z IN (0.5)}          t1z   50
          151  +  206  {z IN (4)}            t1z   33
          152  +  207  {z IN (0.5)}          t1z   33
   153    153     208  {z IN (0,1)}          t1z  700
   154    154     209  {z IN (0,1,2)}        t1z  900
   155    155     210  {z IN (0,1,2,3)}      {}   100
   156    156     211  {z IN (0,1,2,3,4,5)}  {}   100
   157    157     212  {z IN (1,2)}          t1z  500
   158    158     213  {z IN (2,3)}          t1z  300
   159    159     214  {z=3 OR z=2}          t1z  300
   160         -  215  {z IN (-1,3)}         t1z  150
   161         -  216  {z=-1 OR z=3}         t1z  150
          160  +  215  {z IN (-1,3)}         t1z  133
          161  +  216  {z=-1 OR z=3}         t1z  133
   162    162   
   163    163     300  {y=0}                 {}   100
   164         -  301  {y=1}                 t1y   50
   165         -  302  {y=0.1}               t1y   50
          164  +  301  {y=1}                 t1y   33
          165  +  302  {y=0.1}               t1y   33
   166    166   
   167    167     400  {x IS NULL}           t1x  400
   168    168   
   169    169   } {
   170    170     # Verify that the expected index is used with the expected row count
   171    171     do_test analyze5-1.${testid}a {
   172    172       set x [lindex [eqp "SELECT * FROM t1 WHERE $where"] 3]
................................................................................
   201    201      ANALYZE;
   202    202   }
   203    203   
   204    204   # Verify that range queries generate the correct row count estimates
   205    205   #
   206    206   foreach {testid where index rows} {
   207    207     500  {x IS NULL AND u='charlie'}         t1u  20
   208         -  501  {x=1 AND u='charlie'}               t1x   5
          208  +  501  {x=1 AND u='charlie'}               t1x   3
   209    209     502  {x IS NULL}                          {} 100
   210         -  503  {x=1}                               t1x  50
          210  +  503  {x=1}                               t1x  33
   211    211     504  {x IS NOT NULL}                     t1x  25
   212    212     505  {+x IS NOT NULL}                     {} 500
   213    213     506  {upper(x) IS NOT NULL}               {} 500
   214    214   
   215    215   } {
   216    216     # Verify that the expected index is used with the expected row count
   217    217     do_test analyze5-1.${testid}a {

Changes to test/analyze7.test.

    95     95       execsql {EXPLAIN QUERY PLAN SELECT * FROM t1 WHERE c=2;}
    96     96     } {0 0 0 {SEARCH TABLE t1 USING INDEX t1cd (c=?) (~86 rows)}}
    97     97   }
    98     98   do_test analyze7-3.3 {
    99     99     execsql {EXPLAIN QUERY PLAN SELECT * FROM t1 WHERE a=123 AND b=123}
   100    100   } {0 0 0 {SEARCH TABLE t1 USING INDEX t1a (a=?) (~1 rows)}}
   101    101   do_test analyze7-3.4 {
   102         -  execsql {EXPLAIN QUERY PLAN SELECT * FROM t1 WHERE c=123 AND b=123}
   103         -} {0 0 0 {SEARCH TABLE t1 USING INDEX t1b (b=?) (~2 rows)}}
          102  +  set x [execsql {EXPLAIN QUERY PLAN SELECT * FROM t1 WHERE c=123 AND b=123}]
          103  +  regsub -all {[bcd]+} $x {x} x
          104  +  set x
          105  +} {0 0 0 {SEARCH TABLE t1 USING INDEX t1x (x=?) (~2 rows)}}
   104    106   do_test analyze7-3.5 {
   105    107     execsql {EXPLAIN QUERY PLAN SELECT * FROM t1 WHERE a=123 AND c=123}
   106    108   } {0 0 0 {SEARCH TABLE t1 USING INDEX t1a (a=?) (~1 rows)}}
   107    109   do_test analyze7-3.6 {
   108    110     execsql {EXPLAIN QUERY PLAN SELECT * FROM t1 WHERE c=123 AND d=123 AND b=123}
   109    111   } {0 0 0 {SEARCH TABLE t1 USING INDEX t1cd (c=? AND d=?) (~1 rows)}}
   110    112   
   111    113   finish_test

Added test/analyze8.test.

            1  +# 2011 August 5
            2  +#
            3  +# The author disclaims copyright to this source code.  In place of
            4  +# a legal notice, here is a blessing:
            5  +#
            6  +#    May you do good and not evil.
            7  +#    May you find forgiveness for yourself and forgive others.
            8  +#    May you share freely, never taking more than you give.
            9  +#
           10  +#***********************************************************************
           11  +# This file implements regression tests for SQLite library.
           12  +# This file implements tests for the ANALYZE command under STAT2.
           13  +# Testing the logic that computes the number of copies of each sample.
           14  +#
           15  +
           16  +set testdir [file dirname $argv0]
           17  +source $testdir/tester.tcl
           18  +
           19  +# There is nothing to test if ANALYZE is disable for this build.
           20  +#
           21  +ifcapable {!analyze||!vtab||!stat2} {
           22  +  finish_test
           23  +  return
           24  +}
           25  +
           26  +# Generate some test data
           27  +#
           28  +do_test analyze8-1.0 {
           29  +  set x 100
           30  +  set y 1
           31  +  set ycnt 0
           32  +  set yinc 10
           33  +  execsql {
           34  +    CREATE TABLE t1(x,y);
           35  +    ANALYZE;
           36  +    BEGIN;
           37  +    CREATE INDEX t1x ON t1(x);
           38  +    CREATE INDEX t1y ON t1(y);
           39  +  }
           40  +  for {set i 0} {$i<20} {incr i} {
           41  +    for {set j 0} {$j<300} {incr j} {
           42  +      execsql {INSERT INTO t1 VALUES($x,$y)}
           43  +      incr ycnt
           44  +      if {$ycnt>=$yinc} {set ycnt 0; incr y}
           45  +    }
           46  +    for {set j 0} {$j<100} {incr j} {
           47  +      incr x
           48  +      execsql {INSERT INTO t1 VALUES($x,$y)}
           49  +      incr ycnt
           50  +      if {$ycnt>=$yinc} {set ycnt 0; incr y}
           51  +    }
           52  +  }
           53  +  execsql {
           54  +    COMMIT;
           55  +    ANALYZE;
           56  +    SELECT sample, cnt FROM sqlite_stat2 WHERE idx='t1x' ORDER BY sampleno;
           57  +  }
           58  +} {200 301 400 301 600 301 800 301 1000 301 1200 301 1400 301 1600 301 1800 301 2000 301}
           59  +do_test analyze8-1.1 {
           60  +  execsql {
           61  +    SELECT count(*) FROM t1 WHERE x=200;
           62  +  }
           63  +} {301}
           64  +
           65  +do_test analyze8-2.0 {
           66  +  execsql {
           67  +    BEGIN;
           68  +    DELETE FROM t1;
           69  +  }
           70  +  for {set x 1} {$x<200} {incr x} {
           71  +    execsql {INSERT INTO t1 VALUES($x,$x)}
           72  +  }
           73  +  for {set i 0} {$i<200} {incr i} {
           74  +    execsql {INSERT INTO t1 VALUES(999,999)}
           75  +  }
           76  +  execsql {
           77  +    COMMIT;
           78  +    ANALYZE;
           79  +    SELECT sample, cnt FROM sqlite_stat2 WHERE idx='t1x' ORDER BY sampleno;
           80  +  }
           81  +} {20 1 60 1 100 1 140 1 180 1 999 200 999 200 999 200 999 200 999 200}
           82  +do_test analyze8-2.1 {
           83  +  for {set i 0} {$i<200} {incr i} {
           84  +    execsql {INSERT INTO t1 VALUES(0,999)}
           85  +  }
           86  +  execsql {
           87  +    ANALYZE;
           88  +    SELECT sample, cnt FROM sqlite_stat2 WHERE idx='t1x' ORDER BY sampleno;
           89  +  }
           90  +} {0 200 0 200 0 200 10 1 70 1 130 1 190 1 999 200 999 200 999 200}
           91  +
           92  +do_test analyze8-3.0 {
           93  +  execsql {
           94  +    BEGIN;
           95  +    DROP TABLE t1;
           96  +    CREATE TABLE t1(a,b);
           97  +    CREATE INDEX t1all ON t1(a,b);
           98  +    INSERT INTO t1 VALUES(0,1);
           99  +    INSERT INTO t1 VALUES(0,2);
          100  +    INSERT INTO t1 VALUES(0,3);
          101  +    INSERT INTO t1 VALUES(1,4);
          102  +    INSERT INTO t1 SELECT a+2, b+4 FROM t1;
          103  +    INSERT INTO t1 SELECT a+4, b+8 FROM t1;
          104  +    INSERT INTO t1 SELECT a+8, b+16 FROM t1;
          105  +    COMMIT;
          106  +    ANALYZE;
          107  +    SELECT sample, cnt FROM sqlite_stat2 WHERE idx='t1all' ORDER BY sampleno;
          108  +  }
          109  +} {0 3 2 3 4 3 5 1 6 3 8 3 10 3 12 3 13 1 14 3}
          110  +do_test analyze8-3.1 {
          111  +  execsql {
          112  +    DELETE FROM t1;
          113  +    INSERT INTO t1 VALUES(1,1);
          114  +    INSERT INTO t1 VALUES(2,2);
          115  +    INSERT INTO t1 SELECT a+2, b+2 FROM t1;
          116  +    INSERT INTO t1 SELECT a+4, b+4 FROM t1;
          117  +    INSERT INTO t1 SELECT a+8, b+8 FROM t1;
          118  +    INSERT INTO t1 SELECT a+16, b+16 FROM t1;
          119  +    DELETE FROM t1 WHERE a>21;
          120  +    ANALYZE;
          121  +    SELECT sample, cnt FROM sqlite_stat2 WHERE idx='t1all' ORDER BY sampleno;
          122  +  }
          123  +} {2 1 4 1 6 1 8 1 10 1 12 1 14 1 16 1 18 1 20 1}
          124  +do_test analyze8-3.2 {
          125  +  execsql {
          126  +    UPDATE t1 SET a=123;
          127  +    ANALYZE;
          128  +    SELECT sample, cnt FROM sqlite_stat2 WHERE idx='t1all' ORDER BY sampleno;
          129  +  }
          130  +} {123 21 123 21 123 21 123 21 123 21 123 21 123 21 123 21 123 21 123 21}
          131  +do_test analyze8-3.3 {
          132  +  execsql {
          133  +    DELETE FROM t1 WHERE b=1 OR b=2;
          134  +    ANALYZE;
          135  +    SELECT count(*) FROM t1;
          136  +    SELECT sample, cnt FROM sqlite_stat2 WHERE idx='t1all' ORDER BY sampleno;
          137  +  }
          138  +} {19}
          139  +do_test analyze8-3.4 {
          140  +  execsql {
          141  +    UPDATE t1 SET a=b;
          142  +    INSERT INTO t1 VALUES(1,1);
          143  +    INSERT INTO t1 VALUES(2,2);
          144  +    INSERT INTO t1 SELECT a, b FROM t1;
          145  +    ANALYZE;
          146  +    SELECT sample, cnt FROM sqlite_stat2 WHERE idx='t1all' ORDER BY sampleno;
          147  +  }
          148  +} {2 2 4 2 6 2 8 2 10 2 12 2 14 2 16 2 18 2 20 2}
          149  +do_test analyze8-3.5 {
          150  +  execsql {
          151  +    UPDATE t1 SET a=1 WHERE b<20;
          152  +    ANALYZE;
          153  +    SELECT sample, cnt FROM sqlite_stat2 WHERE idx='t1all' ORDER BY sampleno;
          154  +  }
          155  +} {1 38 1 38 1 38 1 38 1 38 1 38 1 38 1 38 1 38 20 2}
          156  +do_test analyze8-3.6 {
          157  +  execsql {
          158  +    UPDATE t1 SET a=b;
          159  +    UPDATE t1 SET a=20 WHERE b>2;
          160  +    ANALYZE;
          161  +    SELECT sample, cnt FROM sqlite_stat2 WHERE idx='t1all' ORDER BY sampleno;
          162  +  }
          163  +} {2 2 20 38 20 38 20 38 20 38 20 38 20 38 20 38 20 38 20 38}
          164  +
          165  +
          166  +
          167  +# Verify that the 5th "cnt" column is added to the sqlite_stat2 table
          168  +# on a full ANALYZE if the column is not already present.
          169  +#
          170  +do_test analyze8-4.0 {
          171  +  execsql {
          172  +    UPDATE t1 SET a=b;
          173  +    ANALYZE;
          174  +    PRAGMA writable_schema=ON;
          175  +    UPDATE sqlite_master 
          176  +       SET sql='CREATE TABLE sqlite_stat2(tbl,idx,sampleno,sample)'
          177  +     WHERE name='sqlite_stat2';
          178  +  }
          179  +  db close
          180  +  sqlite3 db test.db
          181  +  execsql {
          182  +    SELECT sample FROM sqlite_stat2 WHERE idx='t1all' ORDER BY sampleno
          183  +  }
          184  +} {2 4 6 8 10 12 14 16 18 20}
          185  +do_test analyze8-4.1 {
          186  +  catchsql {SELECT sample, cnt FROM sqlite_stat2}
          187  +} {1 {no such column: cnt}}
          188  +do_test analyze8-4.2 {
          189  +  execsql {
          190  +    ANALYZE;
          191  +  }
          192  +  db close;
          193  +  sqlite3 db test.db
          194  +  execsql {
          195  +    SELECT sample, +cnt FROM sqlite_stat2 WHERE idx='t1all' ORDER BY sampleno;
          196  +  }
          197  +} {2 2 4 2 6 2 8 2 10 2 12 2 14 2 16 2 18 2 20 2}
          198  +
          199  +
          200  +
          201  +finish_test

Changes to test/tkt-cbd054fa6b.test.

    31     31       INSERT INTO t1 VALUES (NULL, 'C');
    32     32       INSERT INTO t1 VALUES (NULL, 'D');
    33     33       INSERT INTO t1 VALUES (NULL, 'E');
    34     34       INSERT INTO t1 VALUES (NULL, 'F');
    35     35       INSERT INTO t1 VALUES (NULL, 'G');
    36     36       INSERT INTO t1 VALUES (NULL, 'H');
    37     37       INSERT INTO t1 VALUES (NULL, 'I');
           38  +    INSERT INTO t1 VALUES (NULL, 'J');
           39  +    INSERT INTO t1 VALUES (NULL, 'K');
           40  +    INSERT INTO t1 VALUES (NULL, 'L');
           41  +    INSERT INTO t1 VALUES (NULL, 'M');
           42  +    INSERT INTO t1 VALUES (NULL, 'N');
           43  +    INSERT INTO t1 VALUES (NULL, 'O');
           44  +    INSERT INTO t1 VALUES (NULL, 'P');
           45  +    INSERT INTO t1 VALUES (NULL, 'Q');
           46  +    INSERT INTO t1 VALUES (NULL, 'R');
           47  +    INSERT INTO t1 VALUES (NULL, 'S');
    38     48       SELECT count(*) FROM t1;
    39     49     }
    40         -} {10}
           50  +} {20}
    41     51   do_test tkt-cbd05-1.2 {
    42     52     db eval {
    43     53       ANALYZE;
    44     54     }
    45     55   } {}
    46     56   do_test tkt-cbd05-1.3 {
    47     57     execsql { 
    48     58       SELECT tbl,idx,group_concat(sample,' ') 
    49     59       FROM sqlite_stat2 
    50     60       WHERE idx = 't1_x' 
    51     61       GROUP BY tbl,idx
    52     62     }
    53         -} {t1 t1_x { A B C D E F G H I}}
           63  +} {t1 t1_x {A C E G I K M O Q S}}
    54     64   
    55     65   do_test tkt-cbd05-2.1 {
    56     66     db eval {
    57     67       DROP TABLE t1;
    58     68       CREATE TABLE t1(a INTEGER PRIMARY KEY, b BLOB UNIQUE NOT NULL);
    59     69       CREATE INDEX t1_x ON t1(b);
    60     70       INSERT INTO t1 VALUES(NULL, X'');
................................................................................
    63     73       INSERT INTO t1 VALUES(NULL, X'43');
    64     74       INSERT INTO t1 VALUES(NULL, X'44');
    65     75       INSERT INTO t1 VALUES(NULL, X'45');
    66     76       INSERT INTO t1 VALUES(NULL, X'46');
    67     77       INSERT INTO t1 VALUES(NULL, X'47');
    68     78       INSERT INTO t1 VALUES(NULL, X'48');
    69     79       INSERT INTO t1 VALUES(NULL, X'49');
           80  +    INSERT INTO t1 VALUES(NULL, X'4A');
           81  +    INSERT INTO t1 VALUES(NULL, X'4B');
           82  +    INSERT INTO t1 VALUES(NULL, X'4C');
           83  +    INSERT INTO t1 VALUES(NULL, X'4D');
           84  +    INSERT INTO t1 VALUES(NULL, X'4E');
           85  +    INSERT INTO t1 VALUES(NULL, X'4F');
           86  +    INSERT INTO t1 VALUES(NULL, X'50');
           87  +    INSERT INTO t1 VALUES(NULL, X'51');
           88  +    INSERT INTO t1 VALUES(NULL, X'52');
           89  +    INSERT INTO t1 VALUES(NULL, X'53');
    70     90       SELECT count(*) FROM t1;
    71     91     }
    72         -} {10}
           92  +} {20}
    73     93   do_test tkt-cbd05-2.2 {
    74     94     db eval {
    75     95       ANALYZE;
    76     96     }
    77     97   } {}
    78     98   do_test tkt-cbd05-2.3 {
    79     99     execsql { 
    80    100       SELECT tbl,idx,group_concat(sample,' ') 
    81    101       FROM sqlite_stat2 
    82    102       WHERE idx = 't1_x' 
    83    103       GROUP BY tbl,idx
    84    104     }
    85         -} {t1 t1_x { A B C D E F G H I}}
          105  +} {t1 t1_x {A C E G I K M O Q S}}
    86    106   
    87    107   finish_test