PROVE IT !!If you know it then PROVE IT !! Skill Proficiency Test

Stats !! Stats !! Stats !! | Teradata 14.0 new syntax: SHOW STATISTICS

Teradata STATS – Statistics is important in Teradata. What's more important is to get into details of statistics.

Earlier releases of Teradata provided the only way to observe collected statistics at the table level by using HELP STATISTICS:

HELP STATISTICS table1;
 *** Help information returned. 2 rows.
 *** Total elapsed time was 1 second.

Date     Time     Unique Values        Column Names
-------- -------- -------------------- ------------------------------------
12/05/11 15:24:24                    8 a
03/24/12 12:02:33                   14 b

This information is simple and straightforward. But sometimes it's not enough when we want to know more like what the MAX value it is. That's why in Teradata 14.0 we introduced a new syntax: SHOW STATISTICS.

In order to demonstrate the different usages for this new syntax, let's first set up a test database.

 

CREATE DATABASE ShowStatSyntax AS PERM=1.0e7;
DATABASE ShowStatSyntax;

CREATE TABLE t1
(  a                   INTEGER      ,
   b                   INTEGER      ,
   c                   INTEGER      ,
   d                   INTEGER COMPRESS 1,
   e                   CHAR(8) COMPRESS 'CScs' ,
   f                   DATE         ,
   g                   DECIMAL(8,3) ,
   h                   FLOAT   COMPRESS 19     ,
   i                   BYTEINT COMPRESS 4,
   j                   SMALLINT COMPRESS 10

) PRIMARY INDEX (a)
INDEX (b)
INDEX (c );

INSERT INTO t1 VALUES (1,10,101,1,'CScs', '61/06/05', 100.333, 19, 4, 10)
; INSERT INTO t1 VALUES (2,20,102,1,'CScs', '61/06/05', 200.333, 19, 4, 10)
; INSERT INTO t1 VALUES (3,30,103,1,'CScs', '61/06/05', 300.333, 19, 4, 10)
; INSERT INTO t1 VALUES (4,40,104,1,'CScs', '61/06/05', 100.333, 19, 4, 10)
; INSERT INTO t1 VALUES (5,50,105,1,'CScs', '61/06/05', 200.333, 19, 4, 10)
; INSERT INTO t1 VALUES (6,60,106,1,'CScs', '63/02/03', 300.333, 19, 4, 10)
; INSERT INTO t1 VALUES (7,70,107,1,'CScs', '63/02/03', 100.333, 19, 4, 10)
; INSERT INTO t1 VALUES (8,80,108,1,'CScs', '63/02/03', 200.333, 19, 4, 10)
; INSERT INTO t1 VALUES (9,90,109,1,'CScs', '63/02/03', 300.333, 19, 4, 10)
; INSERT INTO t1 VALUES (10,100,110,1,'CScs', '63/02/03', 100.333, 19, 4, 10)
;

Now let's collect some statistics.

COLLECT STATISTICS COLUMN c ON t1;
COLLECT STATISTICS COLUMN (c,e) ON t1;

Bear in mind that although the above two statements are issued to collect two column level statistics, table level statistics is always collected (after 1st statement) and updated (after 2nd statement), implicitly.

Now we're ready to SHOW them. There are two flavors of SHOW STATISTICS output. One is in plain text, the other is in XML. Let's focus on the plain text one first.

1. Table level statistics

In Teradata 14.0, table level statistics is introduced to better capture the data volatility. You cannot collect statistics directly against a table. However each time column statistics is collected the table level statistics will be collected or updated.

SHOW SUMMARY STATISTICS ON t1;

 *** Text of DDL statement returned.
 *** Total elapsed time was 1 second.

---------------------------------------------------------------------------
COLLECT SUMMARY STATISTICS
                ON SHOWSTATSYNTAX.t1 ;

That is a "brief" overview of statistics for the table. All it tells us is that the table t1 does have table level statistics available. Till now there's nothing much different than previous HELP STATISTICS output. But the following example will show some new stuff, thanks to the VALUES keyword:

SHOW SUMMARY STATISTICS VALUES ON t1;

 *** Text of DDL statement returned.
 *** Total elapsed time was 1 second.

---------------------------------------------------------------------------
COLLECT SUMMARY STATISTICS
                ON SHOWSTATSYNTAX.t1
            VALUES
(
 /** TableLevelSummary **/
 /* Version           */ 5,
 /* NumOfRecords      */ 2,
 /* Reserved          */ 0.000000,
 /* Reserved          */ 0.000000,
 /* SummaryRecord[1] */
 /* Temperature           */ 0,
 /* TimeStamp             */ TIMESTAMP '2012-05-10 16:22:29',
 /* NumOfAMPs             */ 4,
 /* OneAMPSampleEst       */ 20,
 /* AllAMPSampleEst       */ 10,
 /* RowCount              */ 10,
 /* DelRowCount           */ 0,
 /* PhyRowCount           */ 10,
 /* AvgRowsPerBlock       */ 1.000000,
 /* AvgBlockSize (bytes)  */ 512.000000,
 /* BLCPctCompressed      */ 0.00,
 /* BLCBlkUcpuCost        */ 0.000000,
 /* BLCBlkURatio          */ 0.000000,
 /* AvgRowSize            */ 56.000000,
 /* Reserved              */ 0.000000,
 /* Reserved              */ 0.000000,
 /* Reserved              */ 0.000000,
 /* SummaryRecord[2] */
 /* Temperature           */ 0,
 /* TimeStamp             */ TIMESTAMP '2012-05-10 16:22:25',
 /* NumOfAMPs             */ 4,
 /* OneAMPSampleEst       */ 20,
 /* AllAMPSampleEst       */ 10,
 /* RowCount              */ 10,
 /* DelRowCount           */ 0,
 /* PhyRowCount           */ 10,
 /* AvgRowsPerBlock       */ 1.000000,
 /* AvgBlockSize (bytes)  */ 512.000000,
 /* BLCPctCompressed      */ 0.00,
 /* BLCBlkUcpuCost        */ 0.000000,
 /* BLCBlkURatio          */ 0.000000,
 /* AvgRowSize            */ 56.000000,
 /* Reserved              */ 0.000000,
 /* Reserved              */ 0.000000,
 /* Reserved              */ 0.000000
);

Remember I mentioned earlier that we collected and updated the table level statistics implicitly? That's why we have two records here.

Most entries are self-explanatory. Note that they only reflect table level statistics demographics, nothing to do with any column.

2. Column level statistics

Both single-column statistics and multi-column statistics are supported. Similar to table level statistics, column level statistics also provides a "brief" output and a "detailed" output.

SHOW STATISTICS COLUMN c ON t1;

 *** Text of DDL statement returned.
 *** Total elapsed time was 1 second.

---------------------------------------------------------------------------
COLLECT STATISTICS
            COLUMN ( c )
                ON SHOWSTATSYNTAX.t1 ;

SHOW STATISTICS COLUMN (c,e) ON t1;

 *** Text of DDL statement returned.
 *** Total elapsed time was 1 second.

---------------------------------------------------------------------------
COLLECT STATISTICS
            COLUMN ( c ,e )
                ON SHOWSTATSYNTAX.t1 ;

 

SHOW STATISTICS VALUES COLUMN c ON t1;

 *** Text of DDL statement returned.
 *** Total elapsed time was 1 second.

---------------------------------------------------------------------------
COLLECT STATISTICS
            COLUMN ( c )
                ON SHOWSTATSYNTAX.t1
            VALUES
(
 /** SummaryInfo **/
 /* Data Type and Length: 'I:4' */
 /* TimeStamp             */ TIMESTAMP '2012-05-10 16:22:25',
 /* Version               */ 5,
 /* OriginalVersion       */ 5,
 /* DBSVersion            */ '14.00.01.0DR157227MAY4',
 /* UsageType             */ 'S',
 /* ComplexStatInfo       */ 'ComplexStatInfo',
 /* NumOfBiasedValues     */ 10,
 /* NumOfEHIntervals      */ 0,
 /* NumOfHistoryRecords   */ 0,
 /* SamplePercent         */ 0.00,
 /* NumOfNulls            */ 0,
 /* NumOfAllNulls         */ 0,
 /* NumOfPartialNullVals  */ 0,
 /* PartialNullHMF        */ 0,
 /* AvgAmpRPV             */ 1.000000,
 /* MinVal                */ 101,
 /* MaxVal                */ 110,
 /* ModeVal               */ 101,
 /* HighModeFreq          */ 1,
 /* NumOfDistinctVals     */ 10,
 /* NumOfRows             */ 10,
 /* CPUUsage              */ 0.000000,
 /* IOUsage               */ 0.000000,
 /* Reserved              */ 0,
 /* Reserved              */ 0,
 /* Reserved              */ 0.000000,
 /* Reserved              */ 0.000000,
 /* Reserved              */ '',
 /** Biased: Value, Frequency **/
 /*   1 */   101, 1,
 /*   2 */   102, 1,
 /*   3 */   103, 1,
 /*   4 */   104, 1,
 /*   5 */   105, 1,
 /*   6 */   106, 1,
 /*   7 */   107, 1,
 /*   8 */   108, 1,
 /*   9 */   109, 1,
 /*  10 */   110, 1
);

SHOW STATISTICS VALUES COLUMN (c,e) ON t1;

 *** Text of DDL statement returned.
 *** Total elapsed time was 1 second.

---------------------------------------------------------------------------
COLLECT STATISTICS
            COLUMN ( c ,e )
                ON SHOWSTATSYNTAX.t1
            VALUES
(
 /** SummaryInfo **/
 /* Data Type and Length: 'I:4', 'CF:8' */
 /* TimeStamp             */ TIMESTAMP '2012-05-10 16:22:29',
 /* Version               */ 5,
 /* OriginalVersion       */ 5,
 /* DBSVersion            */ '14.00.01.0DR157227MAY4',
 /* UsageType             */ 'S',
 /* ComplexStatInfo       */ 'ComplexStatInfo',
 /* NumOfBiasedValues     */ 10,
 /* NumOfEHIntervals      */ 0,
 /* NumOfHistoryRecords   */ 0,
 /* SamplePercent         */ 0.00,
 /* NumOfNulls            */ 0,
 /* NumOfAllNulls         */ 0,
 /* NumOfPartialNullVals  */ 0,
 /* PartialNullHMF        */ 0,
 /* AvgAmpRPV             */ 0.000000,
 /* MinVal                */ 101, 'CSCS    ',
 /* MaxVal                */ 110, 'CSCS    ',
 /* ModeVal               */ 101, 'CSCS    ',
 /* HighModeFreq          */ 1,
 /* NumOfDistinctVals     */ 10,
 /* NumOfRows             */ 10,
 /* CPUUsage              */ 0.000000,
 /* IOUsage               */ 0.000000,
 /* Reserved              */ 0,
 /* Reserved              */ 0,
 /* Reserved              */ 0.000000,
 /* Reserved              */ 0.000000,
 /* Reserved              */ '',
 /** Biased: Value[2], Frequency **/
 /*   1 */   101, 'CSCS    ', 1,
 /*   2 */   102, 'CSCS    ', 1,
 /*   3 */   103, 'CSCS    ', 1,
 /*   4 */   104, 'CSCS    ', 1,
 /*   5 */   105, 'CSCS    ', 1,
 /*   6 */   106, 'CSCS    ', 1,
 /*   7 */   107, 'CSCS    ', 1,
 /*   8 */   108, 'CSCS    ', 1,
 /*   9 */   109, 'CSCS    ', 1,
 /*  10 */   110, 'CSCS    ', 1
);

When you have a more diversified data set, you will see more Biased values and Equal-height intervals (not shown in this example). When you collect statistics again on a column (or multiple columns), it will create History records too (not shown in this example), which is used for trend analysis.

3. Bring'em together

You can also get table level statistics and column level statistics together by using only one statement. It will list all the available statistics on the specified table.

 

SHOW STATISTICS ON t1;

 *** Text of DDL statement returned.
 *** Total elapsed time was 1 second.

---------------------------------------------------------------------------
COLLECT STATISTICS
            COLUMN ( c ) ,
            COLUMN ( c ,e )
                ON SHOWSTATSYNTAX.t1 ;

 

SHOW STATISTICS VALUES ON t1;

 *** Text of DDL statement returned.
 *** Total elapsed time was 1 second.

---------------------------------------------------------------------------
COLLECT SUMMARY STATISTICS
                ON SHOWSTATSYNTAX.t1
            VALUES
(
 /** TableLevelSummary **/
 /* Version           */ 5,
 /* NumOfRecords      */ 2,
...
...
);
COLLECT STATISTICS
            COLUMN ( c )
                ON SHOWSTATSYNTAX.t1
            VALUES
(
 /** SummaryInfo **/
 /* Data Type and Length: 'I:4' */
 /* TimeStamp             */ TIMESTAMP '2012-05-10 16:22:25',
 /* Version               */ 5,
...
...
);
COLLECT STATISTICS
            COLUMN ( c ,e )
                ON SHOWSTATSYNTAX.t1
            VALUES
(
 /** SummaryInfo **/
 /* Data Type and Length: 'I:4', 'CF:8' */
 /* TimeStamp             */ TIMESTAMP '2012-05-10 16:22:29',
 /* Version               */ 5,
...
...
);

All the above COLLECT STATISTICS statement output can be submitted back to the database as is. That is very useful for statistics backup and restore.

4. XML

The XML output is mainly used by client tools (vs. plain text output read by people), like ODBC, JDBC and so on. The usage is very simple: put the keyword "IN XML" in between "SHOW" and "(SUMMARY) STATISTICS". All the above syntax's have an XML flavor. Here let me show some simple examples.

5. New HELP STATISTICS

Now it becomes obvious that HELP STATISTICS should also be enhanced since the concept of "table level statistics" is introduced.

 

HELP STATISTICS ON t1;

 *** Help information returned. 3 rows.
 *** Total elapsed time was 1 second.

Date     Time     Unique Values        Column Names
-------- -------- -------------------- ------------------------------------
12/05/10 16:22:29                   10 *
12/05/10 16:22:25                   10 c
12/05/10 16:22:29                   10 c,e

The star stands for table level statistics.

Now you've got a glimpse of the versatile SHOW STATISTICS in Teradata 14.0.

Add a Comment

Your email address will not be published. Required fields are marked *