This course is designed for associates who have a working knowledge of Teradata 13.0 and need to know what is new or changed in Teradata 13.10.

For each key feature, it includes a brief description, its business value, how it works, how to install or configure it, how to use it, and how to support it.

Module details:

 Module 1: Database Enhancements

  • DBQL Rule Enhancements
  • ARC Archives dbc.DBQLRuleTbl Table
  • Statement Independence
  • Merge Datablocks
  • Fast Export with No Spooling
  • Block Level Compression

 Module 2: Temporal Overview

 Module 3: Usability Features

  • SQL User Defined Functions
  • Utility Functions
  • User Defined Ordered Analytic Functions
  • Algorithmic Compression
  • Character Partitioned Primary Index
  • Moving Current Date/Timestamp in Join Index
  • Moving Current Date in PPI
  • Time Series Expansion

 

Module 1: Database Enhancements

 

1.1 DBQL Rule Enhancements

Database Query Logging (DBQL) enhancements include the following:

  • Allowing the LIMIT THRESHOLD in combination with SQL, STEPINFO and OBJECTS logging options.
  • Type Of Use column in DBQLObjTbl will be populated.
  • New REPLACE QUERY LOGGING command

Business Value

Allowing THRESHOLD with other logging options will help you prevent unnecessary queries from being logged. Populating the TypeOfUse column helps businesses understand the usage of the various objects involved in a query. The REPLACE QUERY LOGGING statement allows you to modify your query logging for active users without having to end query logging and begin query logging. This helps prevent missing logging data between the end logging and begin logging sequence of statements.

Technical Overview

When to Use This Feature

This feature is useful under many circumstances, and does not have special usage conditions.

Requirements

There are no specific requirements for this feature.

Trade-Offs

There are no trade-offs in using this feature.

Limitations

The THRESHOLD limit option is not supported by EXPLAIN and XMLPLAN.

Installation & Configuration

The THRESHOLD option has an internal DBSControl parameter named ‘DBQLTempObjCacheLimit’.

  • The default value is 10.
  • The max value is 200.

Usage

  • Allowing the LIMIT THRESHOLD in combination with SQL, STEPINFO and OBJECTS logging options
  • Filtering logging with THRESHOLD lowers the performance overhead of detailed logging for unnecessary queries.
  • For example, to log queries that run beyond 1 CPU second into DBQLogTbl, DBQLSqlTbl, DBQLStepTbl, and DBQLObjTbl, and to summarize all other queries in DBQLSummaryTbl:

BEGIN QUERY LOGGING WITH SQL, STEPINFO, OBJECTS LIMIT THRESHOLD=100 CPUTIME ON ALL;

The REPLACE QUERY LOGGING command can be used to avoid end query logging and begin query logging statement pairs and the unnecessary flushing of DBQL caches.

Submitting REPLACE QUERY LOGGING WITH SQL ON User1; will replace any existing rule for User1. If no rule existed, it will be created.

Syntax and Options

{ BEGIN | REPLACE } QUERY LOGGING
    { WITH NONE |
      [ WITH logging-option [ … ,logging-option ] ]
      [ LIMIT limit-option [ AND limit-option ] ] }
    ON { ALL [ ACCOUNT = (‘account id’ [ … ,’account id’ ] ) ]       |
         user-name [ ACCOUNT = (‘account id’ [ … ,’account id’ ] ) ] |
         user-name [ … ,user-name ]                                  |
         APPLNAME = ‘appname’ } ;

where:

  [ ] – indicates an optional item
  { } – indicates one of these is required
  ( ) – indicates a set of parentheses in the syntax

  • REPLACE QUERY LOGGING takes all the same options as BEGIN QUERY LOGGING.

TypeOfUse column in DBQLObjTbl will be populated

The TypeOfUse column contains the following numeric values:

  • 1 = Found in the resolver
  • 2 = Accessed during query processing
  • 4 = Found in a conditional context
  • 8 = Found in inner join condition
  • 16 = Found in outer join condition
  • 32 = Found in a sum node
  • 64 = Found in a full outer join condition
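As an illustration of the new command and the populated column, a hedged example (the user name, threshold value, and query ID below are hypothetical):

REPLACE QUERY LOGGING WITH SQL, OBJECTS LIMIT THRESHOLD=500 CPUTIME ON User1;

SELECT ObjectDatabaseName, ObjectTableName, TypeOfUse
FROM DBC.DBQLObjTbl
WHERE QueryID = 123456789;

The first statement swaps in a new rule for User1 without an end/begin pair; the second inspects how each object was used by a logged query, using the TypeOfUse values listed above.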

Support

This feature does not require any specific support information.

1.2 ARC Archives dbc.DBQLRuleTbl Table

ARC now archives the DBQLRuleTbl and DBQLRuleCountTbl tables.

Business Value

The ARC utility now includes the DBQLRuleTbl and DBQLRuleCountTbl tables in archive and restore operations. This feature helps you avoid manual restoration of DBQL rules.

Technical Overview

DBQL rules are stored in dbc.DBQLRuleTbl and the number of DBQL rules is stored in dbc.DBQLRuleCountTbl. These tables were not previously backed up by the ARC utility. With RFC 98408, in Teradata 13.10 and later the ARC utility automatically backs up dbc.DBQLRuleTbl and dbc.DBQLRuleCountTbl as part of the dictionary table backup.

Example: When you want to port the database to a new, larger database, the ARC utility’s automatic backup and restore of dictionary tables eliminates the need for manual scripting to restore DBQL rules from the old database to the new one. In other words, the DBQL rules get ported to the new database seamlessly.

When to Use This Feature

This feature is useful under many circumstances, and does not have special usage conditions.

Requirements

When the ARC utility is used to restore DBQL rules on an active system, make sure to issue ‘Diagnostic spoilq’. This is how DBQL is triggered to look for changes in the rules due to the DBQL restore by the ARC utility. This needs to be done only once after a restore to ensure the rules caches are flushed so any new rules will be applied appropriately and old rules will be removed from use.

Trade-Offs

There are no trade-offs in using this feature.

Limitations

There are no limitations when using this feature.

Installation & Configuration

This feature does not require any special installation or configuration instructions.

Usage

When configured properly, the system uses this feature automatically. No special usage instructions are required.

Support

This feature does not require any specific support information.

1.3 Statement Independence

The Statement Independence feature handles multiple errors in the same request.

Business Value

This feature particularly improves the performance of TPump and other data loading client utilities that use multi-statement INSERT requests and/or array (or iterated) INSERT.

Technical Overview

The Teradata client software has been enhanced to support this new Teradata Database 13.10 capability for the JDBC interface.

Background information

  • The previous error handling in Teradata, for the TPump utility in particular, was not very efficient because individual errors were handled one at a time. Multiple errors in a request resulted in multiple rollbacks and resubmissions, potentially affecting performance of the job.
  • Since the data-related errors are handled by statement errors, where possible, the multiple rollbacks on the server and the multiple resubmissions by the client software can be completely avoided.
  • If there are any data-related errors detected in the server, only the associated statements are rolled back. Other statements that completed successfully in the same request are not rolled back.
  • A new error parcel called STMTERROR is used to respond back to the client for each of the failing statements of the request. An OK/SUCCESS parcel is sent back to the client for each of the successful statements of the same request.
  • The server will send one statement error parcel back to the client for each statement that encountered an error, and one success parcel back to the client for each statement that completed successfully.

When to Use This Feature

  • This feature is used automatically by the system for multi-statement or iterated INSERT requests and does not have special usage conditions.

Requirements

  • There are no specific requirements for this feature.

Trade-Offs

There are no trade-offs in using this feature.

Limitations

  • In Teradata Database 13.10, this feature is supported by the JDBC interface only.
  • Therefore, any application or request that is not using JDBC does not use statement independence.
  • Please refer to the Teradata JDBC Driver User Guide (B035-2403-088A) for more details on enabling and using the feature.

Statement independence is not enabled for INSERT requests into tables defined with the following options:

  • Triggers
  • Hash indexes
  • Join indexes

Statement independence is also not supported for the following SQL features:

  • INSERT … SELECT requests
  • SQL stored procedures

The statement error capability is only used when requested by the client software. Only new client and server software (Teradata Database 13.10) can enable the Statement Independence feature. This feature can be enabled only for iterated INSERT statements (also known as array INSERT) or a multi-statement request consisting of only simple INSERT statements. Statement Independence supports the following data error types for INSERT statements:

  • Column-level CHECK constraint violations
  • Duplicate row errors for SET tables
  • Primary index uniqueness violations
  • Secondary index uniqueness violations
  • Referential integrity violations
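
For illustration, the kind of request this feature covers is a multi-statement request made up of simple INSERTs and submitted as a single request (through the JDBC driver in Teradata 13.10); the table, columns, and values below are hypothetical:

INSERT INTO Orders (Order_ID, Cust_ID, Order_Amt) VALUES (1001, 501, 250.00);
INSERT INTO Orders (Order_ID, Cust_ID, Order_Amt) VALUES (1002, 502, 125.00);
INSERT INTO Orders (Order_ID, Cust_ID, Order_Amt) VALUES (1001, 503, 75.00);

If Order_ID is the unique primary index, only the third INSERT (a duplicate of 1001) is rolled back and reported with a STMTERROR parcel; the first two complete and return success parcels, so nothing needs to be resubmitted.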

Installation & Configuration

This feature does not require any special installation or configuration instructions.

Usage

This feature is used automatically by the system for JDBC data load operations containing iterated requests or multi-statement requests, with certain limitations.

Support

This feature does not require any specific support information.

1.4 Merge Datablocks

This Teradata Database 13.10 feature automatically searches for “small” data blocks within a table and combines (merges) these small datablocks into a single larger block.

Business Value

Over time, modifications to a table (especially DELETEs of data rows) can result in a table having blocks that are less than 50% of the maximum datablock size. This File System feature combines these small blocks into a larger block. The benefit is simply that future full-table operations (full table scans and/or full table updates) will perform faster because fewer I/Os are performed. By having larger blocks in the Teradata file system, the selection of target rows can also be more efficient.

Technical Overview

  • How does a table get to the point of having many small blocks?
  • Assume that a table is initially loaded via FastLoad. The datablock size of the table will be the maximum block size (e.g., 127 KB).
  • INSERT and UPDATE operations on a table will cause the table to end up with many datablocks that are between 50% and 100% of the maximum datablock size. This is a natural occurrence because of datablock splits within the cylinder. In this scenario, typical datablock sizes may average 96 KB. Blocks that are 50% or greater of the maximum multi-row datablock size (63.5 KB in this example) are not considered to be small blocks. Small blocks are less than 50% of the maximum datablock size.
  • However, over time, DELETEs from this table can cause blocks to permanently shrink to a much smaller size unless a large amount of data is added again. It is possible to have a number of data blocks that are much smaller than 50% of the maximum datablock size.
  • At this point, the performance of a full table operation suffers because of too many I/Os. This feature can help reduce this performance impact by combining the small blocks into larger blocks.

How have businesses resolved this problem before Teradata 13.10?

  • The ALTER TABLE command can be used to re-block a table. This technique can be time consuming and requires an exclusive table lock.
    – ALTER TABLE DATABLOCKSIZE = <value> IMMEDIATE
  • This technique is still available with Teradata 13.10.

If this feature is enabled, the merge of small data blocks into a larger block runs automatically during full-table SQL operations. This feature can merge datablocks for the primary/fallback subtables and all of the index subtables. The feature runs automatically when the following SQL operations are executed:

  • INSERT-SELECT
  • UPDATE-WHERE
  • DELETE-WHERE (used on both permanent table and permanent journal datablocks)
  • During the DELETE phase of the Reconfig utility on source AMPs

The merge of multiple small blocks into a larger block is limited by cylinder boundaries – it does not occur between cylinders. A maximum of 7 logically adjacent preceding blocks can be merged into a target block when the target block is updated. Therefore, a maximum of 8 total blocks can be merged together.

Why are logically following blocks NOT merged together?

  • The File System software does not know if following blocks are going to be immediately updated
  • Reduces performance impact during dense sequential updates
  • The merge of following datablocks may be added in a future release.

Situations where the merge does not occur:

  •  If a block split occurs (datablock has reached its maximum datablock size), the merge does not occur. Split datablocks are still eligible for future merges.
  •  If the blocks to be merged cannot be locked (locks cannot be granted), then the merge does not occur. The file system software uses no-wait lock requests to reduce response time impact and will only perform a single retry.

Additional characteristics of the Merge Datablocks feature include:

  • A block does not have to contain target rows in order to be combined with other blocks in the cylinder.
  • Blocks are not merged for read-only work (SELECT), Global Temporary, and Volatile tables.
  • Teradata’s File System software manages and chooses combinations of blocks to merge together.
  • Merges are performed simultaneously with updates; invisible to end user.
  • This process runs in the foreground, not as a background task.
  • Small datablocks are not merged during initial table population (e.g., FastLoad or MultiLoad), or when Mini-Cylpack is done.
  • Small datablocks are also merged during SQL transaction rollbacks that use full table interface calls.

How to Use This Feature

Defaults for this feature can be set at the system level via DBSControl settings and can be overridden with table-level attributes. The CREATE TABLE and ALTER TABLE commands have options to enable or disable this feature for a specific table. The key parameter that controls this feature is MergeBlockRatio. This parameter can be set at the system level and also as a table-level attribute. MergeBlockRatio has the following characteristics:

  • Limits the resulting size of a merged block.
  • Reduces the chances that a merged block will split again soon after it is merged, defeating the feature’s purpose.
  • Computed as a percentage of the maximum multi-row datablock size for the associated table.
  • Candidate merged block must be smaller than this computed size after all target row updates are completed.
  • Source blocks are counted up as eligible until the size limit is reached (zero to 8 blocks can be merged together).
  • The default system level percentage is 60% and can be changed.

CREATE TABLE or ALTER TABLE options

  • DEFAULT MERGEBLOCKRATIO – Use the DBSControl system default value for full-table modification operations. This is the default option on all CREATE TABLE statements.
  • MERGEBLOCKRATIO = integer [PERCENT] – Fixed MergeBlockRatio used for full-table modification operations; overrides the system default value.
  • NO MERGEBLOCKRATIO – Disables merges completely for the table.

A sketch showing these options in use follows the When to Use This Feature list below.

DBSControl FILESYS Group parameters

  • 25. DisableMergeBlocks (TRUE/FALSE, default FALSE) – Disables the feature completely across the system, even for tables with a defined MergeBlockRatio table-level attribute. Effective immediately – does not require a Teradata restart (tpareset).
  • 26. MergeBlockRatio (1-100%, default 60%) – Default setting for any table; this can be overridden at the table level by using the table-level attribute MergeBlockRatio. Ignored when DisableMergeBlocks is TRUE (FILESYS Flag #25). This value is not stored in or copied to the table header. Effective immediately without a tpareset.

When to Use This Feature

  • For most tables, the system defaults will work fine. This feature can be implemented on a table by table basis.
  • This feature can be used to increase the average block size for tables that historically have very small average block sizes.
  • If a NUPI has a large number of duplicate values, primary index updates on this column may be slower if the hash values are split between multiple blocks. This feature can help reduce the number of I/Os to perform this type of update. This case only applies when the blocks are small enough to be merged.
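As promised above, a sketch of the table-level options (the table, columns, and values are hypothetical, and the syntax should be checked against the SQL DDL reference):

CREATE TABLE Sales_History, MERGEBLOCKRATIO = 40 PERCENT
  ( Sale_ID INTEGER,
    Sale_Dt DATE,
    Amount DECIMAL(12,2) )
PRIMARY INDEX (Sale_ID);

ALTER TABLE Sales_History, DEFAULT MERGEBLOCKRATIO;   /* revert to the DBSControl system default */

ALTER TABLE Sales_History, NO MERGEBLOCKRATIO;        /* disable merges completely for this table */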

Requirements

There are no specific requirements for this feature.

Trade-Offs

When the File System software is merging smaller blocks into a larger block, there can be some additional CPU and I/O overhead.

Limitations

  • This feature is NOT in effect during non-SQL type updates such as MultiLoad and TPump without Array Insert.
  • When a full-table modification operation is executed, there can be a performance impact because of the overhead of merging small blocks into larger blocks. This overhead will be diminished when the larger block sizes are achieved.
  • The locking protocol of Teradata does not support simultaneous merging of small datablocks between cylinders.
  • This feature will not help read-only tables until the table is modified again.
  • Users cannot specifically force a merge for any DB ranges.

Installation & Configuration

  • This feature is automatically part of the Teradata Database 13.10 software when this release is installed on a system.
  • This feature is automatically enabled on a new Teradata 13.10 system (sysinit). There are no special installation or configuration instructions for a new system.
  • For systems that are upgraded to Teradata 13.10 from a previous release, this feature is NOT automatically enabled.
  • It is not automatically enabled because of REDO implications on a backdown (to Teradata 13.0) caused by the new WAL records in 13.10.
  • This feature is only enabled when the DBSControl Internal field No13dot0Backdown is set to TRUE.

Usage

  • The system uses this feature automatically. No special usage instructions are required.
  • Merges are transparent to end users and occur without intervention.

Support

If this feature appears to introduce unexpected performance or functional issues, you can do one of the following:

  • Disable the feature for a specific table.
    – ALTER TABLE with the NO MERGEBLOCKRATIO option
  • Disable the feature for the system.
    – Set DisableMergeBlocks to TRUE in the DBSControl FILESYS group.
    – This flag disables the feature system-wide, and it is not necessary to search for tables that have set the MergeBlockRatio attribute.

1.5 Fast Export with No Spooling

The NoSpool option in FastExport was implemented for the following reasons:

  • A need to export the contents of a table as quickly and efficiently as possible.
  • An overriding requirement of performance and minimum resource utilization.
  • The “direct without spooling” method was chosen with the expectation it would be the best way to extract tables as quickly and efficiently as possible.

Business Value

Depending on the number of rows being exported, this can result in a faster export because the spooling phase is eliminated.

Solution:

  • Provide a FastExport mode that reads the rows into in-memory buffers and begins export of the buffers immediately.

Benefits:

  • No export spool file
  • One pass over the data
  • Export of blocks begins immediately
  • Provides a more efficient mechanism for exporting the contents of a table

Technical Overview

Spooling Architecture:

  • Reads all rows into a spool file
  • Evenly distributes blocks across all AMPs
  • After distribution, notifies client to begin requesting blocks
  • Client requests n blocks

No Spooling Architecture:

  • Reads rows into in-memory buffers
  • Notifies client to begin requesting blocks when buffers are created
  • Begins exporting blocks to client from in-memory buffers while data blocks are still being read
  • Client requests blocks until it receives the end-of-blocks message
When to Use This Feature

Use No Spool for simple SELECT requests. Some possible scenarios for taking advantage of the NOSPOOL mode include:
  • Copy the table to another system in preparation for replication or to recover from a synchronization fault in a dual system environment.
  • Copy the table to another system during a system migration.
  • Copy the table or a subset of the table to a test/development system – this is where SAMPLE is very important.
  • Copy (possibly a subset of) the table out to populate a data mart or other data store (e.g. out to SAS, MOLAP tools, etc.).

Requirements

There are no specific requirements for this feature.

Trade-Offs

  • Ordering requires spooling – so there will be no guarantee of the order of returned data (some users expect a consistency in the order of the returned data even without an ORDER BY clause)
  • A table lock is maintained through the entire export process
  • Character set translation and data type conversion can occur any time during the export (errors were previously detected during the spooling phase, but now may not be detected until the block is read). The application must be prepared to handle errors during row fetching.

Error Messages

  • 9112 No more FastExport blocks on the session.
  • 9256 FastExport No Spooling mode is not supported for SELECT statement.

Installation & Configuration

This feature does not require any special installation or configuration instructions.

Usage

  • This feature is used when NoSpool or NoSpoolOnly commands are executed.
  • Existing FastExport utility scripts will run the same as in previous releases.
  • The NoSpool mode exports the contents of a table as quickly as possible without reading the table into a spool file or distributing the file to all AMPs before extracting it. The three options for spooling with FastExport are:

  • Spool – spool the data (Default)
  • NoSpool – use No Spooling if supported by the SELECT statement; otherwise spool the data
  • NoSpoolOnly – return an error if the SELECT statement is not supported for no spooling

NoSpool Mode Supports

  • Simple SELECT statements
  • Single Retrieve or Sample step
  • Single, few, or ALL AMPs
  • 1 MB response buffer

NoSpool Mode Does Not Support

  • ORDER BY
  • HAVING
  • WITH
  • Joins
  • Stat functions
  • Multiple SELECT statements

Supported Statements

SELECT statements with a single RETRIEVE step:

SELECT * FROM tbl001_i_nupi
WHERE ((case when I1=1 then SI1 when I2=1 then SI2 else null end) + nullifzero(I1)) * (case when I1=1 then I1 else SI2 end) is null;

SELECT statements with a single SAMPLING step:

SELECT CUST_AUTO_TBL.AUTO_SEQ_NBR, CUST_AUTO_TBL.ACCT_ID, CUST_AUTO_TBL.SERVICE_DT
FROM CUST_AUTO_TBL
SAMPLE 300,900,1200;

STRING functions:

SELECT * FROM t5_a WHERE substring(c3 from 1 for 2) like 'A%';

Arithmetic operators, CASE expressions:

SELECT (I1/(SI1*SI2))(decimal(4,2)) ff, CASE when 1=1 then ff end from t50_a;

PPI tables:

SELECT i2,i3 FROM ppi_exp001u_t01 WHERE i2 >= 0 AND i3 > 0;

Unsupported Statements

Contains a SORT:

SELECT * FROM t5000_a ORDER BY 1;

Has a JOIN, SUM, or STAT step, or multiple RETRIEVE steps:

SELECT VC1, VC2,
       MIN(SI1) OVER (PARTITION BY VC1 ORDER BY VC1
                      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM T50_a
WHERE VC1 LIKE 'AA%'
GROUP BY VC1, VC2, SI1;

Has both a RETRIEVE and a SAMPLING step:

SELECT AUTO_SEQ_NBR, ACCT_ID, SERVICE_DT, AUTO_MAKE
FROM CUST_AUTO_TBL
SAMPLE WITH REPLACEMENT RANDOMIZED ALLOCATION 30,40,50;

Spoolmode NOSPOOL

This option tells FastExport to try to use the NoSpool method.

  • If the SELECT is supported by NOSPOOL mode, FastExport will use NOSPOOL mode.
  • If the NoSpool method is not supported, FastExport issues a warning and then uses the Spool method.

.logon system/fexp_u1,fexp_u1;
.begin export SPOOLMODE NOSPOOL sessions 4;
.export outfile nosp10.out format fastload mode record;
SELECT * FROM CUST_AUTO_TBL;
.end export;
.logoff;

Spoolmode NOSPOOLONLY

This option tells FastExport to use the NoSpool method only. If the NoSpool method is not supported, the job terminates with an error.

  • If the SELECT is supported by NOSPOOL mode, FastExport will use NOSPOOL mode.
  • If the NOSPOOL method is not supported, the job terminates with an error: RDBMS failure, 9256: FastExport No Spooling mode is not supported for SELECT statement.

.logon system/fexp_u1,fexp_u1;
.begin export SPOOLMODE NOSPOOLONLY sessions 4;
.export outfile nosp10.out format fastload mode record;
SELECT * FROM CUST_AUTO_TBL ORDER BY 1;
.end export;
.logoff;

Syntax and Usage – with JDBC

  • Uses the JDBC term FastExport "Direct Mode"
  • No change to existing syntax: String conStr = "TYPE=FASTEXPORT";
  • There is no customer option to specify with or without spooling
  • "WITH NO SPOOL" is used for a single-statement SELECT with no GROUP BY and no ORDER BY clause

  • In all other cases, "WITH SPOOL" is used
  • The driver notifies the application that Direct Mode is used by adding SQLWarning 1298 to the warning chain

Syntax and Usage – with TPT Export Operator

  • The Export Operator introduces a new optional attribute called SpoolMode.

  • The syntax for the SpoolMode attribute is:
    – VARCHAR SpoolMode = '<mode>'
  • Valid values are:
    – 'Spool' = spool the answer set (Default)
    – 'NoSpool' = use the NoSpool method
    – 'NoSpoolOnly' = use the NoSpoolOnly method
  • New notify event called "Block Count"
    – Event code of 40
    – Returns a four-byte unsigned integer of the block count
    – Available for notification levels greater than Low
    – The event is available after all of the rows are exported

Support

This feature does not require any specific support information.

1.6 Block Level Compression

This feature provides the capability to perform compression on whole data blocks at the file system level before the data blocks are actually written to storage. However, there is a CPU cost to perform the act of compression or decompression on whole data blocks, which must be considered.

Business Value

Block level compression (BLC) reduces the actual storage required for storing the data and significantly reduces the I/O required to read the data. It allows for much larger table sizes, though it comes at a CPU cost. It might be more effective at compressing data when compared to MVC and ALC because it operates at the block level. For example, with very small rows MVC and ALC cannot do much, but if a block contains, say, 6000 small rows, block level compression may be helpful.

Technical Overview

With block level compression, the following applies:

  • In order to enable block level compression, you must first commit to not backing down to Teradata Database 13.0. (This is covered in the Installation & Configuration section.)
  • Compressing the data uses more CPU than uncompressing the data.
  • The compression is done at the block level on a table basis.
  • When data is compressed/uncompressed, primary, fallback, and CLOB data are operated on. Primary and fallback data are either both compressed or both not compressed.
  • Secondary indexes (USIs and NUSIs) are never compressed.
  • JI and HASH indexes can be compressed.

When to Use This Feature

This feature is useful in the following circumstances:

  • If a table is very large or you are running out of space.
  • If a table is not accessed or updated very often.
  • The CPU cost is acceptable given the pattern of usage.

Requirements

In order to enable block level compression, you must first commit to not backing down to Teradata 13.0. See the Installation & Configuration section below for more details.

Trade-Offs

You must consider the trade-off between reduced storage size and potentially decreased performance. Also, prior to enabling this feature, you must commit to not backing down to Teradata 13.0.

Limitations

The following limitations apply:

  • Cannot independently compress primary/fallback data.

Platforms and Performance

There is a CPU cost to perform block level compression (as well as algorithmic compression). Although it can be used on any platform, we have the following general recommendations:

  • The newly released appliance platforms (e.g., 2650), which have a richer ratio of CPU to I/O, will have more CPU headroom available for doing compression than EDW-class platforms (5xxx).
  • We discourage the use of BLC on a 5xxx system, as the reduced storage size is unlikely to be worth the trade-off in decreased performance.

Installation & Configuration

In order to enable block level compression, you must first commit to not backing down to Teradata 13.0. You do this by setting the following General DBS Control field:

  65. No13dot0Backdown = TRUE

When this flag is set to TRUE, you can enable block level compression by using the DBS Control Compression field BlockLevelCompression (see below).

Note: Once the No13dot0Backdown flag is set, it cannot be reversed except by a Teradata 13.0 sysinit.

DBS Control: Compression Fields

Use the following DBS Control fields to enable block level compression and define how it should be used.

BlockLevelCompression

This field controls whether block level compression is enabled globally or not.

  • On: Compression of DBs is enabled. DBs from different categories of table may or may not be compressed, subject to the other BLC compression settings. This field can only be set to ON if the No13dot0Backdown flag has been set; otherwise it has a default setting of OFF. If this field is ON, compression is enabled or disabled for individual table types as specified by their corresponding tunables.
  • Off: Default. Compression is disabled for all categories of table data, including block compression initiated by query bands when tables were loaded. Newly created data blocks will not be compressed. If this field is OFF, it supersedes all table-type-specific settings.

CompressionAlgorithm

Specifies the algorithm that is used to compress DB data.

  • ZLIB: Default. It is the only compression algorithm that is currently supported.

CompressLevel

Determines whether compression operations favor processing speed or degree of data compression. It is a parameter used by the selected compression algorithm.

  • For ZLIB, the valid range is 1 through 9.
  • A value of 1 indicates preference for maximum compression speed.
  • Each higher value sacrifices more speed for a better compression ratio.
  • The highest value of 9 optimizes for minimal compressed size.
  • The default is 6, which provides a compromise midway between speed of compression operation and degree of data compression.

Note: For average data (a mix of numeric and character data in rows of under a couple of hundred bytes), changing the CompressLevel has very little effect.

CompressGlobalTempDBs

This field is used by the File System to determine how to handle Global Temp DBs with respect to compression. This setting and all other compression attributes specified for the base table of this table type also apply to sort and index maintenance work tables corresponding to this table type.

  • ONLYIFQBYES: The DBs of a new table will not be compressed unless a Query Band option used at load time specified to compress them.
  • UNLESSQBNO: The DBs of a new table will be compressed unless a Query Band option used at load time specified not to compress them.
  • NEVER: Default. The DBs of a new table will not be compressed regardless of any Query Band option used at load time.

CompressPermDBs

This field is used by the File System to determine how to handle Perm DBs with respect to compression. System data, dictionary data, table headers, LOBs, Snapshot Logs, and Restore Logs are always excluded from compression. This setting and all other compression attributes specified for the base table of this table type also apply to sort and index maintenance work tables corresponding to this table type.

  • ONLYIFQBYES: Default. The DBs of a new table will not be compressed unless a Query Band used at load time specified to compress them.
  • UNLESSQBNO: The DBs of a new table will be compressed unless a Query Band option used at load time specified not to compress them.
  • NEVER: The DBs of a new table will not be compressed regardless of any Query Band option used at load time.

CompressPJDBs

This field is used by the File System to determine how to handle Permanent Journal (PJ) DBs with respect to compression. This setting and all other compression attributes specified for the base table of this table type also apply to sort and index maintenance work tables corresponding to this table type.

  • ALWAYS: The DBs of a new PJ table will be compressed.
  • NEVER: Default. The DBs of a new PJ table will not be compressed.
CompressSpoolDBs

This field is used by the File System to determine how to handle Spool DBs with respect to compression. This setting and all other compression attributes specified for the base table of this table type also apply to sort and index maintenance work tables corresponding to this table type.

  • ALWAYS: The DBs of a new Spool table will be compressed.
  • NEVER: Default. The DBs of a new Spool table will not be compressed.
  • IFNOTCACHED: The DBs of a new Spool table will be compressed if they are not being cached.

CompressMloadworkDBs

This field is used by the File System to determine how to handle MultiLoad sort and index maintenance work tables with respect to compression.

  • ONLYIFQBYES: The DBs of a new table will not be compressed unless a Query Band option used at load time specified to compress them.
  • UNLESSQBNO: The DBs of a new table will be compressed unless a Query Band option used at load time specified not to compress them.
  • NEVER: Default. The DBs of a new table will not be compressed regardless of any Query Band used at load time.

MinDBSectsToCompress

A parameter used by the compression algorithm to determine if the Data Block occupies enough sectors to warrant compression. If not enough sectors exist in the Data Block, the Data Block will not be saved in the compressed format.

  • Valid range is 2 through 255 sectors.
  • The default is 16.

MinPercentCompressReduction

A parameter used by the compression algorithm to determine if compression of the Data Block has reduced the size a sufficient amount. If the size reduction is not adequate, the Data Block will not be saved in the compressed format and will be stored uncompressed.

  • Valid range is 0 through 99 percent.
  • The default is 20.

UncompressReservedSpace

Specifies the minimum percentage of free cylinders (system disk space for each AMP) that must remain available after DBs are decompressed using the Ferret UNCOMPRESS command. The uncompress operation is terminated at the point where this threshold would be exceeded, and the table is left in a partially compressed state (but still usable).

  • Valid range is 1 through 90 percent.
  • The default is 20.

CompressClobDBs

Controls whether CLOBs are normally compressed or not. This option applies to CLOBs in Global Temp Tables and Perm Tables.

  • ONLYIFQBYES: Default. The DBs of a new table will not be compressed unless a Query Band used at load time specified to compress them.
  • UNLESSQBNO: The DBs of a new table will be compressed unless a Query Band option used at load time specified not to compress them.
  • NEVER: The DBs of a new table will not be compressed regardless of any Query Band option used at load time.

Usage

You can compress data when it is first loaded into an empty table, or you can later use FERRET to compress/uncompress data in an existing table. If the table is empty, compression is determined by a combination of the DBSControl settings and the Query Band (QB) options. If the table is populated, compression is determined by the master DBS Control field, BlockLevelCompression, as well as the settings in the table-specific fields (CompressGlobalTempDBs, CompressPermDBs), but only if it is set to NEVER. If you choose to compress a table, make sure the effective block size for the table is the maximum allowed (255 sectors). This can be done either as a table-level attribute (with CREATE or ALTER TABLE) or, for tables with no table-level attribute, as the system default (PermDBSize in DBS Control). This will allow for the best compression ratios.
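Putting this together, a minimal sketch of compressing data as it is loaded into an empty table (table names are hypothetical; assumes BlockLevelCompression is ON and CompressPermDBs is at its ONLYIFQBYES default; the BLOCKCOMPRESSION query band is described under Query Band Options below):

SET QUERY_BAND = 'BLOCKCOMPRESSION=YES;' UPDATE FOR SESSION;

INSERT INTO Sales_History        /* empty target table, so its new data blocks are eligible for BLC */
SELECT * FROM Sales_Staging;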
FERRET Changes

Ferret contains new commands to compress and uncompress a table, as well as slightly enhanced displays.

  • COMPRESS u0 u1: Compresses Primary, Fallback, and CLOB data blocks of a specified table, where u0 and u1 are unique values of the tableid.
  • UNCOMPRESS u0 u1: Uncompresses Primary, Fallback, and CLOB data blocks of a specified table, where u0 and u1 are unique values of the tableid.
  • The following optional clauses are available for the COMPRESS/UNCOMPRESS command:
    – WITHOUT CLOBS – [un]compress primary/fallback but leave the CLOBs as is.
    – ONLY CLOBS – [un]compress the CLOBs and leave primary/fallback as is.
    – If no option is specified, CLOB data will not be compressed.
  • SHOWBLOCKS: Displays data block size and the number of rows per data block, or the data block size only, for a defined scope. Uses a # sign to indicate that some or all of the data blocks associated with the subtables are in compressed format.
    – /S: For each primary data subtable, displays a histogram of block sizes and the minimum, average, and maximum block size per subtable. This is the default display.
    – /M: Same as the /S option, but displays the statistics for all subtables.
    – /L: For each subtable, for each block size, displays the number of blocks and the minimum, average, and maximum number of rows per data block size. Displays the statistics for all subtables.

Query Band Options

A new query band option, BLOCKCOMPRESSION, has been added. For example:

SET QUERY_BAND = 'BLOCKCOMPRESSION=YES;' UPDATE FOR SESSION;

It indicates whether the data being loaded into an empty table should be compressed at the data block level. It can also be specified on a CREATE USER statement. If it is not present, the DBSControl settings apply.

Support

Potential Problems and Solutions

  • Interrupted COMPRESS / UNCOMPRESS
    – The table is partially compressed, but still usable.
    – Reissue the command to finish up.
  • When you use FERRET to COMPRESS / UNCOMPRESS, space accounting is not updated automatically.
    – You have to manually run the updatespace utility (from DBW/CNS) to see the space accounting reflected correctly.
  • COMPRESS and UNCOMPRESS will merge small DBs as part of the process.
  • UNCOMPRESS of a large table could run the system out of space. Use the UncompressReservedSpace setting to reserve space and prevent out-of-space situations.

Troubleshooting

New Error Message: ERRFILCOMPFAIL – got an error from the compression library

New Standalone Utility – blkutil: For cases where you can’t start FILER or it will not run, you can use this utility to try to fix the table enough to get FILER started again. blkutil:

  • Takes a gdb/coroner ascii hex display as input
    – Currently supports a format like x/200xw
  • Can uncompress a compressed DB
  • Can do formatted displays of the block
    – hex, short, long, etc.
  • Runs with the Teradata Database down

Workarounds

The following are workarounds you can try if you run into issues:

  • Turn compression off via the DBS Control Compression field, BlockLevelCompression.
  • Uncompress a problematic table via the Ferret UNCOMPRESS command.

Module 2: Temporal Overview

What is a Temporal Database?

A temporal database is a database that includes temporal tables. Temporal tables are tables that provide special support for the time dimension. They provide special facilities for storing, querying, and updating historical and/or future data. For example, they can store the history of a stock or the history of employee movement within an organization. A temporal DBMS provides a temporal version of SQL, including enhancements to the data definition language (DDL), constraint specifications and their enforcement, data types, data manipulation language (DML), and query language for temporal tables.

Conventional database systems – including all of the existing mainstream products – are not temporal in that they provide no special support for temporal data. For example, a conventional database cannot directly support historical queries about past status and cannot represent inherently retroactive or proactive changes. Without built-in temporal table support from the DBMS, applications are forced to use complex and often manual methods to manage and maintain temporal information.

The difference between a temporal database and a conventional database is that a temporal database maintains data with respect to time and allows time-based reasoning and querying, whereas a conventional database captures only a current snapshot of reality.

The Need for a Temporal Database

Most applications need to design and build databases where information changes over time. It may also be a requirement to capture the state of data at specific points in time: past, present, and future. For example, we might wish to see what a specific table looked like two years ago. Doing this without temporal table support is possible, although complex. Consider an application for an insurance company that uses a Policy table whose definition looks like this:

CREATE TABLE Policy(
  Policy_ID INTEGER,
  Customer_ID INTEGER,
  Policy_Type CHAR(2),
  Policy_Details CHAR(40)
) UNIQUE PRIMARY INDEX(Policy_ID);

Suppose the application needs to record when rows in the Policy table became valid. Without temporal table support, one approach that the application can take is to add a DATE column to the Policy table called Start_Date. Suppose the application also needs to know when rows in the table are no longer valid. Another DATE column called End_Date can accomplish this. The new definition of the table looks like this:

CREATE TABLE Policy(
  Policy_ID INTEGER,
  Customer_ID INTEGER,
  Policy_Type CHAR(2),
  Policy_Details CHAR(40),
  Start_Date DATE,
  End_Date DATE
) UNIQUE PRIMARY INDEX(Policy_ID);
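With this design, every policy change has to be maintained by the application itself; a sketch of that manual pattern, with hypothetical values, might be:

/* Close out the currently open row for policy 1001 (assumes a NULL End_Date marks the current row) */
UPDATE Policy
SET End_Date = CURRENT_DATE
WHERE Policy_ID = 1001 AND End_Date IS NULL;

/* Insert a new row carrying the changed conditions from today onward */
INSERT INTO Policy (Policy_ID, Customer_ID, Policy_Type, Policy_Details, Start_Date, End_Date)
VALUES (1001, 7042, 'AU', 'Collision deductible raised to 1000', CURRENT_DATE, NULL);

The second statement immediately collides with the UNIQUE PRIMARY INDEX on Policy_ID, one of the complications discussed next.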

Several complications are now evident. For example, if a customer makes a change to their policy during the life of the policy, a new row will need to be created to store the new policy conditions that are in effect from that time until the end of the policy. But the policy conditions prior to the change are also likely to be important to retain for historical reasons. The original row represents the conditions that were in effect for the beginning portion of the policy, but the End_Date will need to be updated to reflect when the policy conditions were changed. Additionally, because of these types of changes, it becomes likely that more than one row will now have the same value for Policy_ID, so the primary index for the table would need to change. All modifications to the table must now consider changing the Start_Date and End_Date columns. Queries will be more complicated.

The presence of a DATE column in a table does not make the table a temporal table, nor make the database a temporal database. A temporal database must record the time-varying nature of the information managed by the enterprise. Rather than using approaches such as adding DATE columns to traditional tables, Teradata Database provides support to effectively create, query, and modify time-varying tables in a completely different manner.

Business Value

The Temporal feature permits you to add a time dimension qualifier to the database objects that are created or altered. This is useful for those who need to capture current, historical, or future states of different aspects of their business. Temporal databases increase the depth of your business intelligence activities by allowing you to:

  • Exploit an enhanced time dimension, enabling ‘chain of events‘ and ‘point in time’ analytics.
  • Uncover previously undetected business value from historical data.
  • Easily reconstruct historical transaction details.
  • Discover new ways to perform and compete.

Temporal databases also provide technical benefits by simplifying coding:

  • Reduces the cost of developing queries and maintaining temporal data.
  • Improves development and maintenance productivity.
  • Diminishes the cost and effort of monitoring and maintaining temporal data chains.
  • Many companies already maintain temporal versioning using custom-built scripts, but a DBMS implementation provides ease of use and superior performance in many cases.
  • The primary benefit is simplified modifications based on qualifier statements instead of elaborate application scripts.
  • Users can specify temporal constraints that are enforced by the DBMS.
  • Both of the preceding provide better data quality.

For an active data warehouse:

  • Unique Join Index features provide two-AMP access plans.
  • Temporal forms of JI will speed queries on temporal tables.

For Teradata CRM, BI tools, and analytical applications, it makes new insights possible.

Technical Overview

  • Temporal tables store and maintain information with respect to time. Using temporal tables, Teradata Database can perform operations and queries that include time-based reasoning.
  • Temporal tables include one or two special columns, which store time information:
  • A transaction-time column records and maintains the time period for which Teradata Database was aware of the information in the row. Teradata Database automatically enters and maintains the transaction-time column data, and consequently automatically tracks the history of such information.
  • A valid-time column models the real world, and stores information such as the time an insurance policy or product warranty is valid, the length of employment of an employee, or other information that is important to track and manipulate in a time-aware fashion.

When you add a new row to this type of table, you use the valid-time column to specify the time period for which the row information is valid. This is called the Period of Validity (PV) of the information in the row. As rows are changed in temporal tables, the database automatically creates new rows as necessary to maintain the time dimensions. For example, if a row in a table with transaction time is modified, the row is automatically split into two rows:

  • One row represents the row as it existed before the modification. The ending bound of the transaction time period value is set to the time of the modification. This row is “closed” and becomes a history row, because the information it contains before the modification is no longer true. However, the row is not physically deleted from the database. It remains to serve as a historical record of how the row existed before the modification was made.
  • A new row is created to represent the changed information that exists following the modification. The beginning bound of its transaction time period value is set to the time of the modification, and its ending bound is left open (set to UNTIL_CLOSED). As far as the database is concerned, the information in the new row is true until it changes again or until the row is deleted.

Modifications to rows in a table with a valid-time column are more flexible. When a row is modified in a table with a valid-time column, you can specify the time period for which the modification applies. This is called the Period of Applicability (PA) of the modification. Depending on the relationship between the PV of the row and the PA of the modification, Teradata Database may split the modified row into multiple rows. For example, if the modification is applicable only to a brief period that lies within the PV of the row, three rows will result from a modification:

  • One row has the original information, and a valid-time period that covers the time from the beginning of the original PV of the row until the modification happened.
  • The second row has the modified information, and a valid-time period that matches the PA of the modification statement.
  • The third row has the original information, like the first row, but has a valid-time period that covers the time from the end of the modification’s PA through the end time of the PV of the original row.

If the PA of the modification overlaps, but does not lie within, the PV of the row, the modification will split the row into only two rows, similar to the example for a TransactionTime table. Transaction time and valid time are considered independent time dimensions, and their columns serve different purposes, so a table can have both a valid-time column and a transaction-time column. Such a dual-purpose temporal table is called a bi-temporal table.

When to Use This Feature

This feature is useful when it is desirable to create tables and other objects to be used for time-dimensioned queries with a minimum of complexity in the coding.

  • This feature provides better insight into a business, with a focus on the time dimension of changing business metrics.
  • ValidTime tables can retain the evolution of business data and enable business analytics:
  • Determine seat pricing for a flight based on historical pricing versus sales – for example, how did my airline seat capacity change by hour in the week preceding a flight 4 years ago, versus 3 years ago, versus 1 year ago, to determine current pricing?
  • ValidTime tables can enable representation of future data.
  • TransactionTime tables will enable auditing. For example, the Canadian Revenue Agency requires that all answers given to a taxpayer must be retained for future reference. A TransactionTime table can be used with AS OF queries to reconstruct the answer for a given time period at a later time.

Requirements

The temporal feature must be purchased and enabled in order to use temporal tables.

Trade-Offs

Temporal tables add value for satisfying time-dimension data capture and reporting requirements, both in terms of ease of use and ease of creation. Because rows are typically not physically deleted in temporal tables, there will be an expected increase in the storage requirements for these kept rows. Because temporal DML operations such as insert, update, and delete have ramifications for other related rows in the temporal database, these operations can require more system resources in terms of CPU and I/O.

Temporal Time

The two new types of time that are introduced with temporal databases are ValidTime and TransactionTime:

  • ValidTime is supported as a Period data type with begin and end element types of DATE or TIMESTAMP. It models the user world and denotes the time period during which a fact (row) is true (valid) with respect to the real world.
  • TransactionTime is supported as a Period data type with begin and end element types of TIMESTAMP. TransactionTime is the time period beginning when a fact was first known to (or recorded in) the database and ending when superseded by an update to the fact or when the fact is deleted.

Note that the ValidTime and TransactionTime period values can both exist as separate columns in the same row and are often different. A fact recorded in the database may take place before or after the real world change occurs. A salary increase which is scheduled to start next month may be input to the database today.

Temporal Tables

Temporal databases may have several types of tables, as seen in the following list.

  • ValidTime table – A table that has a ValidTime column (but not a TransactionTime column).
  • TransactionTime table – A table that has a TransactionTime column (but not a ValidTime column).
  • Bi-temporal table – A table that has both types of columns.
  • Temporal table – A table that has one or both.
  • Nontemporal table – A table that has neither.

Temporal Qualifiers

There are several general forms of qualifiers that can be applied to temporal statements. Each affects in a different way how the data will be processed.

  • Current – Considers a current snapshot of the entity. This is what a nontemporal table stores. Because current is the default qualifier, if temporal qualifiers are not used in a query, the results produced are the same as would be returned by a traditional nontemporal table.
  • Sequenced – Considers the state of an entity at each moment in time over a period of applicability.
  • NonSequenced – Treats the temporal columns with no special meaning. The time dimension is effectively ignored.
  • AS OF – Considers a snapshot in time based on the AS OF date/timestamp, retrieving rows whose valid or transaction time periods overlap the specified AS OF expression.

Each of these forms may be applied to each of the two different time dimensions, ValidTime and TransactionTime.

ValidTime options are: Current in ValidTime, Sequenced in ValidTime, Nonsequenced in ValidTime, ValidTime AS OF.

TransactionTime options are: Current in TransactionTime, Nonsequenced in TransactionTime, TransactionTime AS OF.

The combination tells us how a specific operation will be processed:

  • Which form is used (e.g., Current)
  • Which time dimension is applied (e.g., ValidTime)

Temporal Rows

The following descriptors are used to categorize rows in their valid time or transaction time dimension. Note these are not technically precise definitions, but they are intended to convey the essence of each type of row.

ValidTime tables:

  • Current row – The current date falls within the valid time period.
  • History row – The current date falls beyond the valid time period.
  • Future row – The current date falls before the valid time period.

TransactionTime tables:

  • Open row – The transaction time period has the reserved word UNTIL_CLOSED as its end bound (or the equivalent timestamp literal, which is TIMESTAMP ‘9999-12-31 23:59:59.999999+00:00’). This indicates that this row (fact) is something the database currently knows to be true, since it is not closed yet.
  • Closed row – The transaction time period has a not-null date less than UNTIL_CLOSED as its end bound. This indicates that the row has been closed out and is only kept for historical purposes.
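To make the qualifiers concrete, here is a hedged sketch of a ValidTime version of the earlier Policy table and a few qualified statements; the general shape follows the Teradata 13.10 temporal syntax, but the names, dates, and exact clauses are illustrative and should be verified against the temporal documentation:

CREATE MULTISET TABLE Policy_VT(
  Policy_ID INTEGER,
  Customer_ID INTEGER,
  Policy_Type CHAR(2),
  Policy_Details CHAR(40),
  Policy_Duration PERIOD(DATE) NOT NULL AS VALIDTIME
) PRIMARY INDEX(Policy_ID);

/* Current rows only (the default, nontemporal-looking view of the data) */
CURRENT VALIDTIME
SELECT * FROM Policy_VT WHERE Policy_ID = 1001;

/* Sequenced update applied only to a stated period of applicability */
SEQUENCED VALIDTIME PERIOD '(2010-01-01, 2010-07-01)'
UPDATE Policy_VT
SET Policy_Type = 'AU'
WHERE Policy_ID = 1001;

/* Snapshot of the table as it was valid on a past date */
VALIDTIME AS OF DATE '2009-06-30'
SELECT * FROM Policy_VT;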

Module 3: Usability Features

3.1 SQL User Defined Functions

You can now write user-defined functions (UDFs) using the SQL programming language.

  • SQL UDFs (SUDFs) provide a simple mechanism that allows you to define your own expressions and encapsulate them within a function concept.
  • SQL UDFs are functions written using regular SQL expressions and are used like standard SQL functions.
  • SUDFs can be used as a way to represent expressions that may be used in many instances of a query as well as in different queries. Conceptually, they are like macros.

In general, anywhere you can define a UDF you can define a SUDF.

 Business Value

 This feature:

·         Simplifies writing complex SQL expressions that are used frequently.

·         Enables the reuse of complex SQL expressions.

·         Reduces client-server network traffic because requests written using SQL UDFs are smaller than those written in other languages.

·          Does not require writing a UDF in C, C++, Java, or other languages.

·          Lays the foundation of a more general feature (User Defined Operators).

Technical Overview

UDFs help to improve the coding and maintenance of queries. Also, using SUDFs instead of UDFs allows for great improvements in performance:

 ·         Parsing time is similar to that of UDFs.

 ·         Approximately 2 times faster than unprotected UDFs and 3.33 times faster than protected UDFs for processing a table with 100,000 rows.

·         Approximately 30 times faster than unprotected UDFs and 51 times faster than protected UDFs for processing a table with 10,100,000 rows.

The CREATE FUNCTION and REPLACE FUNCTION statements have been enhanced to enable writing UDFs that are SQL expressions. If required, such SQL UDFs can reference external UDFs to provide more complex processing capabilities.

 When to Use This Feature

Use this feature any time common SQL expressions need to be represented within a query or set of queries.

Do not use this feature for SQL expressions that are very simple or will be rarely used.

 Requirements

There are no specific requirements for this feature.

Trade-Offs

There are no trade-offs in using this feature.

 Limitations

See the Usage section below for more information.

 Installation & Configuration

 A new internal DBS Control field allows you to enable or disable the reparsing of SQL UDFs. After the UDF reference is replaced by its definition, the query can be further validated by parsing it again. This flag is used for debugging purposes and may introduce a small performance degradation. The default is FALSE so that reparsing is off.

 Usage

The CREATE/REPLACE FUNCTION statement has been modified to include a RETURN clause and a new value, SQL, for the LANGUAGE clause.

 The following rules apply to the RETURN statement.

·         RETURN is an SQL control statement that can only be executed from within an SQL UDF.

·         You cannot specify more than one RETURN statement per SQL UDF.

·         The data type of value_expression must match the data type specified by the RETURNS clause of the SQL UDF definition, or it must be capable of being cast implicitly to the data type specified in the RETURNS clause.

 The specified data types can be any valid Teradata type.

 ·         value_expression cannot contain references to tables.

·         value_expression cannot contain scalar subqueries such as those made by DML statements.

 ·         value_expression can contain references to parameters, constants, SQL UDFs, external UDFs, and methods.

 ·         value_expression cannot be a conditional expression nor have a Boolean return type.

 ·         value_expression can contain arithmetic values and functions, string functions, DateTime functions, and SQL operators that define a scalar result.

 ·         value_expression cannot contain aggregate or ordered analytic functions.

 ·         If you specify a reference to an SQL UDF in the RETURN statement, it cannot be any of the following types of reference:

 ·         Self-references

·         Circular references

·         Forward references

 A forward reference is a case where an SQL UDF references another SQL UDF that does not yet exist.

 Example:

 The following CREATE FUNCTION request performs an addition operation if the first argument submitted to calc is the literal ‘A’, a subtraction operation if the first argument is ‘S’, a multiplication operation if the first argument is ‘M’, and a division operation if the first argument is anything else.

CREATE FUNCTION calc (func CHARACTER(1), A INTEGER, B INTEGER)
RETURNS INTEGER
LANGUAGE SQL
DETERMINISTIC
CONTAINS SQL
SQL SECURITY DEFINER
COLLATION INVOKER
INLINE TYPE 1
RETURN CASE
         WHEN func = 'A' THEN A + B
         WHEN func = 'S' THEN A - B
         WHEN func = 'M' THEN A * B
         ELSE A / B
       END;

 

The following SELECT request returns a result value of 5.

SELECT calc (CAST('A' AS CHAR(1)), 2, 3);

 

The following SELECT request returns a result value of 6.

SELECT calc (CAST('M' AS CHAR(1)), 2, 3);

 

Support

Tip :- When passing strings as arguments make sure the formal parameter is VARCHAR. Do not use CHAR, even if it’s just a single character.
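For illustration, a minimal sketch of this tip, patterned on the calc example above (the function name and logic are hypothetical, not part of the feature documentation); the only point being made is that the string parameter is declared as VARCHAR rather than CHAR:

REPLACE FUNCTION yes_no_label (flag VARCHAR(1))
RETURNS VARCHAR(3)
LANGUAGE SQL
DETERMINISTIC
CONTAINS SQL
SQL SECURITY DEFINER
COLLATION INVOKER
INLINE TYPE 1
RETURN CASE WHEN flag = 'Y' THEN 'Yes' ELSE 'No' END;

SELECT yes_no_label('Y');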

  3.2 Utility Functions

 New SQL User Defined Functions (UDFs) and Domain Specific functions have been added that enhance the usability of date/time operations. For example, given a date value, return the day_of_week.

 Business Value

 These new functions make it easier to write sequenced aggregation and normalize aggregation queries. These functions have the following benefits.

 They are simple to use.

 They avoid complex joins.

 They improve performance when compared with equivalent SQLs.

 

Technical Overview

 When to Use This Feature

This feature is useful under many circumstances, and does not have special usage conditions.

 Requirements

There are no specific requirements for this feature.

Trade-Offs

There are no trade-offs in using this feature.

Limitations

There are no limitations when using this feature.

 Installation & Configuration

This feature does not require any special installation or configuration instructions.

 Usage

The section below lists and describes the functions added.

 Domain Specific Table Functions

 TD_NORMALIZE_OVERLAP

 Combines (normalizes) the rows that overlap on period values. The resulting normalized row would contain the earliest begin and latest end of all the rows involved. For example: grp1 has 3 distinct period values, i.e., (t1-t4, t2 – t8 and t6 – t9) then the resulting normalized row will have period (t1- t9)

Note: A normalized row is an outcome of a normalize operation on two or more rows whose period values either meet or overlap.

 Input: Grouping column list and a period(Date/Timestamp/Timestamp with TZ) column.

 Output: Grouping column list, a normalized period(Date/Timestamp/Timestamp with TZ) column and an optional integer count column.

You have to specify output columns that are the same as the columns mentioned in the input arguments. An additional integer  Column is also provided if you request it. Otherwise, only the grouping column output and normalized period columns are returned as output.

 Example:

WITH SUBTBL( flight_id, duration ) AS
  (SELECT flight_id, duration FROM FlightExp)
SELECT * FROM TABLE (
  TD_SYSFNLIB.TD_NORMALIZE_OVERLAP(
      NEW VARIANT_TYPE(SUBTBL.flight_id),
      SUBTBL.duration )
  RETURNS ( $_flight_id INT,
            $_duration PERIOD(TIMESTAMP(6) WITH TIME ZONE),
            NrmCount INT )
  -- Note: Output parameter names must be different from
  -- input parameter names of the table function.
  HASH BY flight_id LOCAL ORDER BY flight_id, duration
) AS DT(flight_id, duration, NrmCount) ORDER BY 1,2;

 

TD_NORMALIZE_MEET

 Combines (normalizes) the rows that meet on period values. The resulting normalized row would contain the earliest begin and latest end of all the rows involved. For example: grp1 has 3 distinct period values, i.e., (t1-t4, t4 – t8 and t8 – t9) then the resulting normalized row will have period (t1- t9).

 Input: Grouping column list and a period (Date/Timestamp/Timestamp with TZ) column. 

Output: Grouping column list, a normalized period(Date/Timestamp/Timestamp with TZ) column and an optional integer count column.

You have to specify output columns that are the same as the columns mentioned in the input arguments. An additional integer column is also provided if you request it. Otherwise, only the grouping column output and normalized period columns are returned as output.
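For illustration, a call pattern for TD_NORMALIZE_MEET sketched by analogy with the TD_NORMALIZE_OVERLAP example above; it assumes the same FlightExp table and is not taken from the feature documentation:

WITH SUBTBL( flight_id, duration ) AS
  (SELECT flight_id, duration FROM FlightExp)
SELECT * FROM TABLE (
  TD_SYSFNLIB.TD_NORMALIZE_MEET(
      NEW VARIANT_TYPE(SUBTBL.flight_id),
      SUBTBL.duration )
  RETURNS ( $_flight_id INT,
            $_duration PERIOD(TIMESTAMP(6) WITH TIME ZONE),
            NrmCount INT )
  HASH BY flight_id LOCAL ORDER BY flight_id, duration
) AS DT(flight_id, duration, NrmCount) ORDER BY 1,2;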

 TD_SUM_NORMALIZE_OVERLAP

Finds the sum of a column for all the rows that are normalized on overlap in a group.

 Input: Grouping column list, a single numeric column on which sum() is requested, and a period (Date/Timestamp/Timestamp with TZ) column. 

Output: Grouping column list, the numeric result column and a normalized period(Date/Timestamp/Timestamp with TZ) column.

You have to specify output columns that are the same as the columns mentioned in the input arguments. For the numeric output column, you have to supply the same datatype of source column. If there is an overflow during the math computation, the function returns an overflow error. It is your responsibility to cast the input numeric column to wide numeric column to get the correct result.

 Example:

WITH SUBTBL( flight_id, charges, duration ) AS
  (SELECT flight_id, charges, duration FROM FlightExp)
SELECT * FROM TABLE (
  TD_SYSFNLIB.TD_SUM_NORMALIZE_OVERLAP(
      NEW VARIANT_TYPE(SUBTBL.flight_id),
      NEW VARIANT_TYPE(SUBTBL.charges),
      SUBTBL.duration )
  RETURNS ( $_flight_id INT,
            $_charges FLOAT,
            $_duration PERIOD(TIMESTAMP(6) WITH TIME ZONE) )
  -- Note: Output parameter names must be different from
  -- input parameter names of the table function.
  HASH BY flight_id LOCAL ORDER BY flight_id, duration
) AS DT(flight_id, charges, duration) ORDER BY 1,3;

 

TD_SUM_NORMALIZE_MEET

Find the sum of a column for all the rows that are normalized on meet in a group.

 Input: Grouping column list, a single numeric column on which sum() is requested, and a period (Date/Timestamp/Timestamp with TZ) column.

 Output: Grouping column list, the numeric result column and a normalized period(Date/Timestamp/Timestamp with TZ) column.

You have to specify output columns that are the same as the columns mentioned in the input arguments. For the numeric output column, you have to supply the same datatype of source column. If there is an overflow during the math computation, the function returns an overflow error. It is your responsibility to cast the input numeric column to wide numeric column to get the correct result.

 TD_SUM_NORMALIZE_OVERLAP_MEET

Finds the sum of a column for all the rows that are normalized on either overlap or meet in a group.

 Input: Grouping column list, a single numeric column on which sum() is requested, and a period (Date/Timestamp/Timestamp with TZ) column.

 Output: Grouping column list, the numeric result column and a normalized period(Date/Timestamp/Timestamp with TZ) column.

 You have to specify output columns that are the same as the columns mentioned in the input arguments. For the numeric output column, you have to supply the same datatype of source column. If there is an overflow during the math computation, the function returns an overflow error. It is your responsibility to cast the input numeric column to wide numeric column to get the correct result.

 TD_SEQUENCED_SUM

Finds sum of a column for all adjacent periods in normalized rows whose period values either meet or overlap.

 Input: Grouping column list, a single numeric column on which sum() is requested, and a period (Date/Timestamp/Timestamp with TZ) column.

Output: Grouping column list, the numeric result column and a sequenced aggregation result period(Date/Timestamp/Timestamp with TZ) column.

You have to specify output columns that are the same as the columns mentioned in the input arguments. For the numeric output column, you have to supply the same datatype of source column. If there is an overflow during the math computation, the function returns an overflow error. It is your responsibility to cast the input numeric column to wide numeric column to get the correct result.

 Example:

WITH SUBTBL( flight_id, charges, duration ) AS
  (SELECT flight_id, charges, duration FROM FlightExp)
SELECT * FROM TABLE (
  TD_SYSFNLIB.TD_SEQUENCED_SUM(
      NEW VARIANT_TYPE(SUBTBL.flight_id),
      NEW VARIANT_TYPE(SUBTBL.charges),
      SUBTBL.duration )
  RETURNS ( $_flight_id INT,
            $_charges FLOAT,
            $_duration PERIOD(TIMESTAMP(6) WITH TIME ZONE) )
  -- Note: Output parameter names must be different from
  -- input parameter names of the table function.
  HASH BY flight_id LOCAL ORDER BY flight_id, duration
) AS DT(flight_id, charges, duration) ORDER BY 1,3;

 

TD_SEQUENCED_AVG

Finds average of a column for all adjacent periods in normalized rows whose period values either meet or overlap.

 Input: Grouping column list, a single numeric column on which sum() is requested, and a period (Date/Timestamp/Timestamp with TZ) column.

Output: Grouping column list, the numeric result column and a sequenced aggregation result period(Date/Timestamp/Timestamp with TZ) column. You have to specify output columns that are the same as the columns mentioned in the input arguments. For the numeric output column, you have to supply REAL data type to hold the average value. If you specify that the output column is the same as the input column that is not of type real/decimal, then truncation of value happens. An AVERAGE always returns a real value. If there is an overflow during the math computation, the function returns an overflow error. It is your responsibility to cast the input numeric column to wide numeric column to get the correct result.

 

TD_SEQUENCED_COUNT

Finds the count of a column for all adjacent periods in normalized rows whose period values either meet or overlap. 

Input: Grouping column list and a period(Date/Timestamp/Timestamp with TZ) column.

Output: Grouping column list, an integer count column and a sequenced aggregation result period(Date/Timestamp/Timestamp with TZ) column.

You have to specify output columns that are the same as the columns mentioned in the input arguments. For the numeric output column, you have to supply INT as the result column for the count(). If there is an overflow during the math computation, the function returns an overflow error. It is your responsibility to cast the input numeric column to wide numeric column to get the correct result.

Example:

WITH SUBTBL( flight_id, duration ) AS
  (SELECT flight_id, duration FROM FlightExp)
SELECT * FROM TABLE (
  TD_SYSFNLIB.TD_SEQUENCED_COUNT(
      NEW VARIANT_TYPE(SUBTBL.flight_id),
      SUBTBL.duration )
  RETURNS ( $_flight_id INT,
            $_duration PERIOD(TIMESTAMP(6) WITH TIME ZONE) )
  -- Note: Output parameter names must be different from
  -- input parameter names of the table function.
  HASH BY flight_id LOCAL ORDER BY flight_id, duration
) AS DT(flight_id, duration) ORDER BY 1,2;

 SQL UDFs

The following SQL UDFs have been added:

·         day_of_calendar: Gets the day number from the beginning of the calendar.

·         day_of_month: Gets the day number from the beginning of the month.

·         day_of_year: Gets the day number from the beginning of the year.

·         day_of_week: Gets the day number from the beginning of the week.

·         weekday_of_month: Gets the week number from the beginning of the month.

·         week_of_month: Gets the full week number from the beginning of the month.

·         week_of_year: Gets the week number from the beginning of the year.

·         week_of_calendar: Gets the week number from the beginning of the calendar.

·         month_of_quarter: Gets the month number from the beginning of the quarter.

·         month_of_year: Gets the month number from the beginning of the year.

·         month_of_calendar: Gets the month number from the beginning of the calendar.

·         quarter_of_year: Gets the quarter number from the beginning of the year.

·         quarter_of_calendar: Gets the quarter number from the beginning of the calendar.

·         year_of_calendar: Gets the year number from the beginning of the calendar.

They each have the following inputs and outputs:

·         Input: Date, Timestamp or Timestamp with zone

·         Output: Integer

·         Example: sel day_of_calendar(current_date);
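For illustration, these UDFs can also be applied to date columns in ordinary queries. A small sketch, assuming the standard sys_calendar.calendar view is available on the system:

SELECT calendar_date,
       day_of_week(calendar_date)     AS dow,
       week_of_year(calendar_date)    AS woy,
       quarter_of_year(calendar_date) AS qtr
FROM sys_calendar.calendar
WHERE calendar_date BETWEEN DATE '2010-08-01' AND DATE '2010-08-07'
ORDER BY calendar_date;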

Support

This feature does not require any specific support information.

 3.3 User Defined Ordered Analytic Functions               

The User Defined Ordered Analytic (UDOA) feature provides the ability to specify

User Defined Aggregates (UDAs) within an ordered analytic window clause.

Recall that there are three types of User Defined Functions (UDFs): scalar, aggregate and table functions. UDAs are simply UDFs of the aggregate type which can now be written to work in conjunction with ordered analytic processing.

Thus, when the term UDA is encountered, think of it as a special purpose UDF.

 

Business Value

Businesses that need to perform analytics on ordered data sets can benefit from UDOAs. UDOAs can improve performance over built-in ordered analytic functions in cases that require a RESET WHEN clause. They enable tools and procedural language to be embedded in database analytics.

 Oracle’s First/Last value functions are simple to express and perform faster than RESET WHEN. Teradata UDOA enables us to compete in performance with the Oracle feature.

 Teradata UDOA processing uses shared nothing parallelism for moving window processing even in the case of single partition or skewed data.

Teradata UDOA can reduce system resource consumption over built-in ordered analytic functions which use the RESET WHEN clause. Actual customer test cases showed a 2.5X reduction in CPU and I/O.

 Technical Overview

 The 13.10 release is the first release of UDOAs. It contains the following:

• Support for User Defined Aggregates (UDAs) in a window clause

• New AGR_MOVINGTRAIL Phase

• FNC_Context_t area values describing window characteristics.

 When to Use This Feature

This feature is useful for allowing customization when processing an ordered stream of rows.

Requirements

There are no specific requirements for this feature.

 Trade-Offs

There are no trade-offs in using this feature.

 Limitations

The following limitations apply:

• No Java language support.

• No range partitioning for group/cumulative window types.

• The database system does not validate if UDOA is used in a specific window type. The UDOA writer can build window type checking into the function.

• UDAs used in the window clause have the same set of functionality and limits as an existing UDF. Likewise, the window clause containing a UDA has the same set of limitations and functionality as an existing window clause. This feature does not introduce any further restrictions on the analytic functions.

• The context area is limited to 64,000 bytes, the same as for UDAs.

• The following capabilities are not available in built-in ordered analytic functions:

Sessionization

Exponential Weighted Moving Averages

Dense Rank

 Installation & Configuration

This feature does not require any special installation or configuration instructions. This feature does not impact any other feature nor does it affect the installation, reporting, operation, or other Teradata processes.

 Usage

The syntax for the window support for UDA functions is exactly the same as the support that is available for aggregates like SUM, COUNT, and AVG that can be used in a window clause.

Ordered analytic functions operate on a window of rows:

CREATE TABLE t (id INTEGER, v INTEGER);

-- REPORTING WINDOW
SELECT id, v, MYSUM(v) OVER (PARTITION BY id ORDER BY v) FROM t;

-- CUMULATIVE WINDOW
SELECT id, v, MYSUM(v) OVER (PARTITION BY id ORDER BY v ROWS UNBOUNDED PRECEDING) FROM t;

-- MOVING WINDOW
SELECT id, v, MYSUM(v) OVER (PARTITION BY id ORDER BY v ROWS BETWEEN 2 PRECEDING AND 3 FOLLOWING) FROM t;

 Options and Syntax

 • DDL: There is no new CREATE FUNCTION DDL clause syntax. UDOAs use the same syntax as aggregate UDFs.

 Result Type of UDOA : Defined in create DDL.

Result Format : Default format of the return type

• Note, most of the built-in window functions return floating point data types even though the window aggregate function’s  argument may be integer.

(This is to prevent overflows.)

• UDOAs are different since the UDOA writer explicitly specifies the return type. The UDOA writer should consider allowable window sizes and select an appropriate return data type to avoid overflows.

• DML: Supported window types

Reporting, i.e. ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

Cumulative, i.e. ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW and ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING

Moving, i.e.

o    ROWS BETWEEN X PRECEDING AND Y FOLLOWING

o    ROWS BETWEEN X PRECEDING AND Y PRECEDING

o    ROWS BETWEEN X FOLLOWING AND Y FOLLOWING

o    ROWS BETWEEN X PRECEDING AND CURRENT ROW

o    ROWS BETWEEN CURRENT ROW AND Y FOLLOWING

 

• UDOAs do not support unbounded preceding or unbounded following moving windows, i.e.

ROWS BETWEEN UNBOUNDED PRECEDING AND X FOLLOWING

ROWS BETWEEN X PRECEDING AND UNBOUNDED FOLLOWING

• Programming Context Area. The following fields of FNC_Context_t are set up by the database. They are only valid in AGR_INIT:

pre_window_size: Value specified as part of the PRECEDING clause. The value of this field is negative if it points to a row that precedes the current row. This field is not applicable for the cumulative and reporting window types; it is initialized to zero in these cases.

post_window_size: Value specified as part of the FOLLOWING clause. The value of this field is positive if it points to a row that follows the current row. This field is not applicable for the cumulative and reporting window types; it is initialized to zero in these cases.

window_size: For cumulative window types, the value is -1. For the reporting window type, the value is -2. For the moving window type, the value is set as (post_window_size - pre_window_size + 1). Note, the +1 is for the current row.

• Programming Phases.

The AGR_INIT phase is triggered once per partition at the start of a new group of the UDOA evaluation.

The AGR_DETAIL phase is triggered every time the forward row progresses.

A new phase AGR_MOVINGTRAIL has been introduced. This phase is applicable for the moving window type (non-cumulative,

non-reporting window type). This phase is triggered only for the last few rows of a moving window when the forward pointer to the

window reaches the end of the partition. The phase does not provide any new values to the UDOA but indicates to the UDOA

that processing has reached the end of the partition. It is expected that the UDOA uses this phase to adjust necessary internal count

and offset values to reflect the actual size as the window diminishes towards the end of the partition.

The AGR_FINAL phase is invoked at the time the final evaluated result needs to be moved into the result row produced by the

UDOA.

• Partitioning (Hash or Range)

Range partitioning strategy is mainly in place to avoid hot-AMP situations for skewed PARTITION BY column values.

Range and/or hash partitioning is supported for moving window types.

Hash partitioning only is supported for reporting/cumulative window types. This stems from potential ambiguities that could arise when the user tries to reference previous values assuming a specific ordering within window types like reporting/cumulative, which are semantically not order dependent.

It is expected that the SQL writer uses an appropriate set of values for the PARTITION BY column to avoid potential skew situations for the reporting/cumulative aggregate case.

 

Customer Example: Dense Rank

CREATE MULTISET TABLE t (id INTEGER, v INTEGER);

INSERT INTO t VALUES (1,1);

INSERT INTO t VALUES (1,2);

INSERT INTO t VALUES (1,2);

INSERT INTO t VALUES (1,4);

INSERT INTO t VALUES (1,5);

INSERT INTO t VALUES (1,5);

INSERT INTO t VALUES (1,5);

INSERT INTO t VALUES (1,8);

INSERT INTO t VALUES (1, NULL);

 

The dense_rank function is an example of a UDOA. It is used to produce a denser ranking than the traditional RANK function. Traditional rank handles ties by assigning the same rank to tied values, but counts tied slots on the ranking list for subsequent ranked values. For example, a traditional ranking might produce the following values:

Value Rank

1001 1

1002 2

1002 2

1003 4

The final row is assigned a rank of 4 because the slot for 3 was used by one of the tied values.

The dense rank function disregards the tied slots in determining subsequent rankings, thus the dense rank of the above list would be:

Value Rank

1001 1

1002 2

1002 2

1003 3

 

REPLACE FUNCTION dense_rank

(x INTEGER)

RETURNS INTEGER

CLASS AGGREGATE (1000)

LANGUAGE C

PARAMETER STYLE SQL

NO SQL

DETERMINISTIC

CALLED ON NULL INPUT

EXTERNAL;

#define SQL_TEXT Latin_Text
#include <sqltypes_td.h>
#include <string.h>
#include <stdio.h>

typedef struct agr_storage {
    int cr; // current rank
    int pv; // previous value
} AGR_Storage;

void dense_rank( FNC_Phase phase,
                 FNC_Context_t *fctx,
                 INTEGER *x,
                 INTEGER *result,
                 int *x_i,
                 int *result_i,
                 char sqlstate[6],
                 SQL_TEXT fncname[129],
                 SQL_TEXT sfncname[129],
                 SQL_TEXT error_message[257] )
{
    AGR_Storage *s1 = fctx->interim1; // pointer to intermediate storage

    switch (phase)
    {
    case AGR_INIT:
        if (fctx->window_size != -1) // window case not supported
        {
            strcpy(error_message, "Only cumulative window type supported");
            strcpy(sqlstate, "U0001");
            return;
        }
        if ( (s1 = FNC_DefMem(sizeof(AGR_Storage))) == NULL )
        {
            strcpy(sqlstate, "U0002");
            return;
        }
        s1->cr = 1;
        s1->pv = *x;
        // Fall through to detail phase
    case AGR_DETAIL:
        if (*x != s1->pv)
            s1->cr++;
        s1->pv = *x;
        break;
    case AGR_FINAL:
        *result = s1->cr;
        break;
    case AGR_COMBINE:     // generate an error for any undefined phases
    case AGR_MOVINGTRAIL:
    default:
        sprintf(error_message, "phase is %d", phase);
        strcpy(sqlstate, "U0005");
        return;
    }
}

SELECT v,
       dense_rank(v) OVER (PARTITION BY id ORDER BY v
                           ROWS UNBOUNDED PRECEDING) AS DenseRank,
       RANK() OVER (PARTITION BY id ORDER BY v) AS "Normal Rank"
FROM t ORDER BY DenseRank;

*** Query completed. 9 rows found. 3 columns returned.

*** Total elapsed time was 1 second.

v            DenseRank   Normal Rank

———– ———– ———–

?               1          1

1               2          2

2               3          3

2               3          3

4               4          5

5               5          6

5               5          6

5               5          6

8               6          9

 

 

Support

 

Potential Problems

Any AMP error should be specific to the UDOA feature, either rooted in the UDOA itself or a base issue exposed by the UDOA. For a parser error, problem isolation is facilitated by executing the specified UDOA DML query as a UDA DML query. If the parser error still shows up, it is rooted in the base UDA code; otherwise it is UDOA specific.

Error Messages

The 3732 error code definition has been extended.

Example:

SELECT my_uda(a) OVER
  (ROWS BETWEEN 1 PRECEDING AND UNBOUNDED FOLLOWING) FROM t;

*** Failure 3732 The facility of specifying UNBOUNDED FOLLOWING
for moving user defined window function has not been implemented yet.

 

 

3.4 Algorithmic Compression

 

Prior to Teradata 13.10, Teradata supported Multi Value Compression (MVC), referred to as Value List Compression (VLC) in prior releases, for most fixed length column data types.

 As part of the compression enhancements for Teradata 13.10, the following capabilities have been added:

·         Support of MVC for variable length column data types

VARCHAR

VARGRAPHIC

VARBYTE

 

·         Extend MVC to be supported on columns of size less than or equal to 510. (Prior to Teradata 13.10, MVC was only permitted on a column whose length is less than or equal to 255. This limit has been increased to 510.)

 ·         New column level compression scheme called Algorithmic Compression(ALC). 

ALC is a new column level compression scheme wherein the users map compression/decompression algorithms implemented as UDFs to columns. The database then uses the UDF logic to compress/ decompress column data.

ALC allows users to apply their own compression/decompression algorithms to compress/decompress data at a column level in a row.

 It is provided as a framework so users can decide the best compression/decompression scheme depending on data type. It minimizes storage capacity for data and may trade off CPU performance in doing so.

The extra cost of compression/decompression depends on the nature of the algorithm chosen and the invocation cost for UDFs. Hence a careful choice needs to be made.

First, the algorithms need to be defined as scalar Domain Specific Functions. (Note: Domain Specific Functions are also a new feature of TD13.10 and are covered elsewhere in this training.) Secondly, they should be specified in the column definition during a CREATE/ALTER TABLE. These algorithms are then invoked internally by the Teradata Database to compress/decompress the column data when the data is moved into the tables or when data is retrieved from the tables.

 

ALC and MVC can co-exist on the same column. In such a case, ALC will only compress those values which are not part of the value list specified by MVC.

To prevent redefining the same algorithms for different column sizes or compatible column types, ALC provides a mechanism to define a generic UDF and UDF function signatures will be adjusted internally when the UDF is mapped to various columns.

Since the defined UDFs are generic UDFs, developers can use the following new FNC calls to determine the size of the input/output data and the result buffer size. The functions also allow the user to pass NULLs as input or output data.

·         FNC_GetOutputBufferSize

Returns the size of the output buffer allocated by Evl.

Used by the UDF to determine the maximum output size.

 ·         FNC_GetByteLength, FNC_GetCharLength, FNC_GetGraphicLength, FNC_GetVarCharLength

Returns the size of the input parameter.

Can be used by the UDF instead of strlen.

Necessary because some strings may contain null terminators.

 ·         FNC_SetVarCharLength

Sets the length of the output string.

When invoked by the UDF, Evl will not call strlen to get the length of the returned string.

Necessary because some strings may contain null terminators.

                       

Business Value

Data Compression provides many advantages; some of them are given below:

·         Reduces the storage capacity needs, which in turn cuts down costs.

·         Reduces the number of disk I/Os.

·         Improves response time because less physical data is being operated upon.

 Technical Overview

The primary motivation for the compression enhancements was to increase the column level compression options provided by Teradata (ALC) and extend existing compression options (MVC) to variable columns.

Without support of MVC on variable length columns, NULL compression cannot be achieved. Currently a variable length column incurs a cost of 2-4 bytes, and this can be significant if the column has a lot of NULLs.

In certain cases, ALC is a better choice than MVC and ALC can extend the benefits of compression to columns where MVC is not very helpful. Such a scenario occurs when the column values are mostly unique and so MVC compression benefits are minimal. In such a case, compression due to ALC, assuming the algorithms are carefully chosen, can yield significantly better compression than MVC.

 Because ALC and MVC can co-exist on the same column, the user is provided the flexibility of an IF-ELSE option wherein you can compress values from a dictionary else use a custom compression algorithm.

Teradata has been an advocate of column level compression as it does not require off-line compression processing and can be handled inline with query processing. In this respect, MVC was almost "free" in the sense that it didn't incur substantial CPU cost, but at the same time it should be noted that it does not actually compress anything; it just maps a dictionary value to a bit pattern. However, ALC involves invoking and executing a UDF to compress/decompress the data and then storing the result. So ALC compression is not "free" and is a trade-off that needs to be clearly understood.

 Compression can still be handled inline with normal processing but will require more time due to increased CPU processing.

 When to Use This Feature

This feature is useful under many circumstances, and does not have special usage conditions.

 Requirements

When defining tables with ALC, it is expected that the UDFs are already created. Both customer-defined and domain specific functions are allowed and are expected to be created in either SYSUDTLIB or TD_SYSFNLIB.

The ARCHIVE utility requires the compress/decompress UDFs to be present on the target system prior to a table restore.

 ALC requires custom compression algorithms at the column level. Compression achieved is a function of algorithm and column data type.

Trade-Offs

The following trade-offs apply:

 ·         This feature helps minimize storage capacity for data but trades off CPU performance in doing so.

·         Also, function invocation costs and algorithm processing time need to be taken into account for ALC and weighed against the space savings.

 

Limitations

Limitations on MVC are applicable to ALC as well:

·         Primary index columns are not compressible

·         Table header is limited to 1 MB

·         Compression is not supported on:

volatile temporary tables

global temporary tables

·         There is a per column limit of 8192 chars for value list in CREATE TABLE

 

In addition, the following limitations are specific to ALC:

·         Compression is not supported on ALC columns in join indexes.

·         When the compression scheme is changed from or to ALC, ALTER TABLE is not allowed when the table has data.

·         ALC is supported on the following data types:

CHAR, GRAPHIC

VARCHAR, VARGRAPHIC

BYTE, VARBYTE

·         In contrast to MVC, the following are allowed with ALC:

There is no limit on the size of the data type that can have ALC.

 

Restrictions

The following restrictions apply:

·         Both customer-defined and domain specific functions are allowed.

Must be scalar, deterministic

Must NOT be Java UDF

 ·         Compress UDF should have:

A single input parameter which is compatible with the column data type.

A single output parameter which is VARBYTE.

 For example if the column datatype is CHAR:

REPLACE FUNCTION MyCompress(P1 VARCHAR(64000))

RETURNS VARBYTE(64000)

·         Decompress UDF should have:

A single input parameter which is VARBYTE.

A single output parameter which is compatible with the column

  data type

For example if the column datatype is CHAR:

REPLACE FUNCTION MyDecompress(P1 VARBYTE(64000))

RETURNS VARCHAR(64000)
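To tie these rules together, a minimal sketch of ALC and MVC co-existing on one column, reusing the MyCompress/MyDecompress signatures above (the table, column, and value list are hypothetical; the attribute order follows the COMPRESS USING / DECOMPRESS USING syntax shown in the Usage section below):

CREATE TABLE orders_alc
  (order_id INTEGER NOT NULL,
   status   VARCHAR(200)
            COMPRESS ('OPEN', 'CLOSED')      -- MVC value list
            COMPRESS USING MyCompress        -- ALC for values not in the list
            DECOMPRESS USING MyDecompress)
PRIMARY INDEX (order_id);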

 

Installation & Configuration

This feature does not require any special installation or configuration instructions.

 Usage

Create Function/Replace Function

Added FOR COMPRESS and FOR DECOMPRESS in function attributes.

CREATE FUNCTION [dbname.]udfname
  ( [ [pname] data_type [ ..., [pname] data_type ] ] )
  RETURNS { data_type [CAST FROM data_type] }
          { TABLE VARYING COLUMNS (integer) }
          { TABLE ( pname data_type [ ..., pname data_type ] ) }
  function_characteristic [ ... function_characteristic ]
  EXTERNAL [ NAME { name } ]
           [      { '[ needed_part [ delim needed_part ] ]' } ]
  [ PARAMETER STYLE [ SQL | TD_GENERAL | JAVA ] ]
  ...
  [ FOR COMPRESS | FOR DECOMPRESS ]
  ...

 

Create/Alter Table

Extended options at the column level to specify ALC.

[NONTEMPORAL]
{ CREATE [SET     ] [ GLOBAL TEMPORARY ] TABLE }
{ CREATE [MULTISET] [ VOLATILE         ]       } [databasename.]tablename
{ CT                                           }
  [ ,option [ ..., option ] ]
  ( { column_declaration | table_level_option } ... ) ... ;

where column_declaration is:
  cname data_type_declaration [ column_attribute [ ..., column_attribute ] ]

and column_attribute is one of the following:
  data_type_attributes
  COMPRESS [ constant | ( {NULL | constant} [ ..., {NULL | constant} ] ) ]
  COMPRESS USING [dbname.]udfname DECOMPRESS USING [dbname.]udfname
  ...

 

Variable MVC

The MVC functionality has been extended to variable length column data types (VARCHAR, VARBYTE, and VARGRAPHIC), and the MVC limit on data type size has been extended from 255 to 510.
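A small sketch of MVC on a variable length column under the extended limits (the table and value list are hypothetical); note that NULL can now be included in the compress list for a VARCHAR column:

CREATE TABLE customer_vmvc
  (cust_id INTEGER NOT NULL,
   city    VARCHAR(300) COMPRESS (NULL, 'New York', 'Los Angeles', 'Chicago'))
PRIMARY INDEX (cust_id);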

 

Unicode Example

One of the use cases for ALC was sites moving from proprietary character sets to Unicode and fearing a huge increase in data size. This was because these sites' national character sets require only 1 byte per character while Unicode requires 2 bytes. ALC could be used to implement algorithms that reduce the Unicode penalty and the impact of the move to Unicode.

In the following example, ALC is being used to convert Unicode column data to UTF8.

Example:

CREATE TABLE addr_alc_city
  (idnum  INTEGER,
   street VARCHAR(100) CHARACTER SET UNICODE,
   city   CHAR(30) CHARACTER SET UNICODE
          COMPRESS USING TransUnicodeToUTF8
          DECOMPRESS USING TransUTF8ToUnicode,
   zip    CHAR(5) CHARACTER SET UNICODE)
PRIMARY INDEX (idnum);

 

Other considerations for this type of table:

• ARC will require UDF to be present on target system prior to table restore.

• FILER will display decompressed data for ALC compressed columns only with the "tab /r" option; otherwise it will display a hex dump of the data.

• CHECKTABLE does some additional checks in case of ALC to check for data integrity.

• Algorithmic Compression currently supports UNICODE ASCII character sets only. ALC support for a true UNICODE character set is planned for a future release.

Support

Potential Problems

During CREATE/ALTER TABLE the user might run into errors that point to the UDF being unavailable. This might be because the UDF does not exist or UDF parameters and column type are not compatible data types.

 ·         Refer to the remedy in the error message description.

·         Check if DIPUDT has been run.

Queries on whether the column is compressed correctly or not would have to be debugged using FILER by dumping the table header/data row and checking the row level context.

CHECKTABLE has been extended to report corruption on tables for MVC and ALC.

Error Messages

The following error messages apply:

·         ERRTEQALCINVDB 9452

·         ERRTEQALCINVDTYPE 9453

·         ERRTEQALCTBLDATA 9454

·         ERRTEQALCINVOPTS 9455

·         ERRTEQDIPUDT 9464          

Workarounds

Use the diagnostic EVLINTERP if the query crashes in GNX mode.

3.5 Character Partitioned Primary Index

This new Teradata feature extends the capabilities and options when defining a PPI for a table or a non-compressed join index. Tables and non-compressed join indexes can now include partitioning on a character column.

Business Value

This feature provides the improved query performance benefits of partitioning when queries have conditions on columns with character (alphanumeric) data.

Before Teradata 13.10, businesses were limited to creating partitioning on tables that did not involve comparison of character data. Partitioning expressions were limited to numeric or date type data.

The Partitioned Primary Index (PPI) feature of Teradata has always allowed a class of queries to access a portion of a large table, instead of the entire table.

This capability has simply been extended to include character data.

The traditional uses of the Primary Index (PI) for data placement and rapid access of the data when the PI values are specified are still retained.

 Technical Overview

 When creating a table or a join index, the PARTITION BY clause (part of PRIMARY INDEX) can now include partitioning on a character column. This allows the comparison of character data.

This feature allows a partitioning expression to involve comparison of character data (CHAR, VARCHAR, GRAPHIC, and VARGRAPHIC) types. A comparison may involve a predicate (=, >, <, >=, <=, <>, BETWEEN, LIKE) or a string function.

·         The use of a character expression in a PPI table may be referred to as CPPI (Character PPI).

The most common partitioning expressions utilize RANGE_N or CASE_N expressions.

Prior to Teradata 13.10, the CASE_N and RANGE_N functions did not allow a PPI definition on character data. This limited the useful partitioning that could be done using character columns, since a standard ordering (collation) of the character data is not preserved.

Both the RANGE_N and CASE_N functions support the definition of character data in Teradata 13.10.

Note: The term “character or char” will be used to represent CHAR, VARCHAR, GRAPHIC, or VARGRAPHIC data types.

 Example 1: Create a PPI table with character RANGE_N partitioning.

In this example, the state codes are divided into 5 partitions. The special partitions of NO RANGE and UNKNOWN are also defined.

If the partitioning column is state_code, partition 1 is defined as having values between 'AA' and 'CT'. Partition 2 is defined as having values between 'DC' and 'HI', etc. When querying data for a particular state, only a fraction of the data in the table needs to be read; this is, in effect, partition elimination.

CREATE TABLE Claim_PPI
  (claim_id   INTEGER NOT NULL,
   claim_date DATE NOT NULL,
   city       VARCHAR(30) NOT NULL,
   state_code CHAR(2) NOT NULL,
   claim_info VARCHAR(256) NOT NULL)
PRIMARY INDEX (claim_id)
PARTITION BY RANGE_N
  (state_code BETWEEN 'AA' AND 'CT',
                      'DC' AND 'HI',
                      'IA' AND 'MT',
                      'NC' AND 'SD',
                      'TN' AND 'WY',
              NO RANGE, UNKNOWN);

 

Why have a range starting with 'AA'? This state code abbreviation of 'AA' doesn't represent a state, but is a valid code for the Armed Forces of America. There are more than 50 valid state codes.

 Note: The special partition UNKNOWN is defined, but is not necessary because the state_code has the NOT NULL attribute and will have a value. The following queries will benefit from this type of partitioning.

SELECT * FROM Claim_PPI
WHERE state_code = 'OH';

SELECT * FROM Claim_PPI
WHERE state_code LIKE 'O%';

SELECT * FROM Claim_PPI
WHERE state_code BETWEEN 'GA' AND 'MA';

SELECT * FROM Claim_PPI
WHERE state_code BETWEEN 'ga' AND 'ma';

 

The session mode when these tables were created and when these queries were executed was Teradata mode (BTET) and Teradata mode defaults to “not case specific”.

Example 2: Create a MLPPI table with character RANGE_N partitioning.

A MLPPI table (Multi-level PPI) is created by specifying two or more partitioning expressions, where each expression must be defined using either a RANGE_N function or a CASE_N function exclusively. The system combines the individual partitioning expressions internally into a single partitioning expression that defines how the data is partitioned on an AMP.

In this example, the Claim table is first partitioned by claim_date. Claim_date is then sub-partitioned by state codes. The special partitions of NO RANGE and UNKNOWN are defined at the claim-date and at the state_code levels.

 

CREATE TABLE Claim_MLPPI
  (claim_id   INTEGER NOT NULL,
   claim_date DATE NOT NULL,
   city       VARCHAR(30) NOT NULL,
   state_code CHAR(2) NOT NULL,
   claim_info VARCHAR(256) NOT NULL)
PRIMARY INDEX (claim_id)
PARTITION BY
  ( RANGE_N
      (claim_date BETWEEN DATE '2004-01-01' AND DATE '2011-12-31'
                  EACH INTERVAL '1' DAY, NO RANGE, UNKNOWN),
    RANGE_N
      (state_code BETWEEN 'AA' AND 'CT',
                          'DC' AND 'HI',
                          'IA' AND 'MT',
                          'NC' AND 'SD',
                          'TN' AND 'WY',
                  NO RANGE, UNKNOWN)
  );

 

The following queries will benefit from this type of partitioning.

 

SELECT * FROM Claim_MLPPI
WHERE claim_date BETWEEN DATE '2010-08-24' AND DATE '2010-08-25';

SELECT * FROM Claim_MLPPI
WHERE state_code = 'OH';

SELECT * FROM Claim_MLPPI
WHERE claim_date = DATE '2010-08-24'
  AND state_code = 'OH';

SELECT * FROM Claim_MLPPI
WHERE claim_date BETWEEN DATE '2010-08-24' AND DATE '2010-08-25'
  AND state_code = 'OH';

 

Example 3: Create a MLPPI table with two levels of character RANGE_N

partitioning.

In this example, the Claim table is first partitioned by claim_date (monthly intervals). Claim_date is then sub-partitioned by state codes. State codes are then sub-partitioned by the first two letters of a city name. The special partitions of NO RANGE and UNKNOWN are defined at the claim-date, state_code, and city levels.

Why is this partitioning example defined with intervals of 1 month for claim_date?

Teradata still has a maximum limit of 65,535 partitions in a table, and defining 8 years of day partitioning with two levels of sub-partitioning would cause this limit to be exceeded.

CREATE TABLE Claim_MLPPI2
  (claim_id   INTEGER NOT NULL,
   claim_date DATE NOT NULL,
   city       VARCHAR(30) NOT NULL,
   state_code CHAR(2) NOT NULL,
   claim_info VARCHAR(256) NOT NULL)
PRIMARY INDEX (claim_id)
PARTITION BY
  ( RANGE_N
      (claim_date BETWEEN DATE '2004-01-01' AND DATE '2011-12-31'
                  EACH INTERVAL '1' MONTH, NO RANGE, UNKNOWN),
    RANGE_N
      (state_code BETWEEN 'AA' AND 'CT',
                          'DC' AND 'HI',
                          'IA' AND 'MT',
                          'NC' AND 'SD',
                          'TN' AND 'WY',
                  NO RANGE, UNKNOWN),
    RANGE_N
      (city BETWEEN 'A',
                    'D',
                    'I',
                    'N',
                    'T' AND 'ZZ',
            NO RANGE, UNKNOWN)
  );

 

The following queries will benefit from this type of partitioning.

 

SELECT * FROM Claim_MLPPI2
WHERE state_code = 'GA' AND city LIKE 'a%';

SELECT * FROM Claim_MLPPI2
WHERE claim_date = DATE '2010-08-24' AND city LIKE 'a%';

 

The test value of a RANGE_N function should be a simple column reference, involving no other functions or expressions. For example, if SUBSTR is added, then static partition elimination will not occur. Keep the partitioning expressions as simple as possible.

RANGE_N (SUBSTR(state_code, 1, 1) BETWEEN 'AK' AND 'CA', ...

This definition will not allow static partition elimination.

 The session collation in effect when the character PPI is created determines the ordering of data used to evaluate the partitioning expression.

 The ascending order of ranges in a character PPI RANGE_N expression is defined by the session collation in effect when the PPI is created or altered, as well as the case sensitivity of the column or expression in the test value. The default case sensitivity of character data for the session transaction semantics in effect when the PPI is created will also determine case sensitivity of comparison unless overridden with an explicit CAST to a specific case sensitivity.

• This is called the PPI collation. All system collations are supported:

– ASCII

– EBCDIC

– CHARSET_COLL

– JIS_COLL

– MULTINATIONAL

• This ordering affects the evaluation of partitioning expression conditions (=, >,<, >=, <=, <>, BETWEEN, LIKE) and string functions when calculating the partition number of a row.

• The default case sensitivity in effect when the character PPI is created will also affect the ordering of character data for the PPI.

– Default case sensitivity of comparisons involving character constants is influenced by the session mode. String literals have a different default CASESPECIFIC attribute depending on the session mode.

• Teradata Mode (BTET) is NOT CASESPECIFIC

• ANSI mode is CASESPECIFIC

 – If any expression in the comparison is case specific, then the    comparison is case sensitive.

 The impact of the session collation and case sensitivities of WHERE/ON clause conditions in the query is that they must match (or be compatible with) the PPI collation and partitioning expression case sensitivity for partition elimination to occur.

§  If ANY condition in the partitioning expression is case sensitive, it is considered case sensitive.

§  If ANY comparison in a WHERE clause condition is case blind, it is considered case blind.

 All character string comparisons involving graphic data are case sensitive.

 Partition Elimination Details

 For static Partition Elimination (PE) to occur, all comparisons in the CPPI must have the same case sensitivity and the case sensitivity of the WHERE clause condition enabling partition elimination must match the PPI’s case sensitivity.

·         If CHARSET_COLL collation is used, the session character set must also match that of the CPPI for static partition elimination to occur. Various forms of Dynamic Partition Elimination (DPE) utilize equality conditions (DPE, rowkey-based merge join, delayed partition elimination). Static partition elimination using an equality condition does not require collations to match unless the query condition is case blind and the PPI collation is MULTINATIONAL or CHARSET_COLL.

 If the condition enabling partition elimination is case blind, all comparisons in the partitioning expression must be case blind. The WHERE clause condition is considered to be case blind if ANY of the comparisons or string functions involving non-constant expressions in the condition is case blind.

 ·         Suggestion is to use case blind partitioning expressions to maximize use of the PPI.

 When to Use This Feature 

This feature is an automatic feature in Teradata 13.10. Some general considerations when using the character PPI feature.

·         Character partitioning columns should use the Unicode server character set if multiple character sets are used on the system. Otherwise the PPI may be ineligible for dynamic forms of partition elimination such as rowkey-based merge join.

– Suggestion is to use Unicode partitioning columns to maximize use of the PPI. 

Requirements

There are no specific requirements for this feature.

Trade-Offs

The usual PPI trade-offs should be considered. PPI tables will use more permanent space because rows are slightly larger (2 bytes each) to include the partition number. Join performance may be slower.

 For example, adding partitioning columns that are not in the primary index may decrease join performance if there are usually join conditions given on the PI columns, but not on the partitioning columns.

Limitations

 • A character PPI may not involve columns using the Kanji1 or KanjiSJIS server character sets.

 • The EACH clause is not supported in a RANGE_N function with character data ranges.

 • Static partition elimination may not occur with CASE_N character partitioning when the query condition or CASE_N function uses the LIKE predicate.

• Use of the concatenation operator (‘||’) in a non-constant expression in a partitioning expression will disallow static partition elimination.

• A character PPI with CHARSET_COLL collation queried in a stored procedure cannot utilize partition elimination.

• The CASESPECIFIC attribute of the Range_N test value is inherited by the range boundaries.

 • The AcceptReplacementCharacters DBSControl flag must be set to false for partition elimination on a CPPI partitioning level to occur.

• Merge join DPE (Dynamic Partition Elimination) is not supported and is not applicable to CPPI.

 • ADD/DROP RANGE of standard ranges not supported for character partitioning levels.

 • Any string literal referenced within a CASE_N or RANGE_N expression must be less than 31000 bytes (otherwise, error 3738 occurs).

• A character PPI RANGE_N expression may be altered to only ADD/DROP special partitions (NO RANGE, and UNKNOWN partitions).

Examples: The following queries will fail.

ALTER TABLE Claim_PPI MODIFY PRIMARY INDEX DROP RANGE BETWEEN 'TN' AND 'WY';

 ALTER TABLE Claim_MLPPI MODIFY PRIMARY INDEX DROP RANGE#L2 NO RANGE;

(If the table is empty, this query will be successful.)

Examples: The following queries will succeed. 

ALTER TABLE Claim_PPI MODIFY PRIMARY INDEX

DROP RANGE NO RANGE, UNKNOWN;

 

ALTER TABLE Claim_MLPPI MODIFY PRIMARY INDEX

ADD RANGE NO RANGE OR UNKNOWN;

 

Installation & Configuration

This feature does not require any special installation or configuration instructions.

This feature is automatically part of the Teradata Database 13.10 software when this release is installed on a system.

Usage

The system uses this feature automatically. No special usage instructions are required.

This feature allows more opportunities to use PPI.

Support 

This feature does not require any specific support information. If it is necessary to disable this feature, there is an internal DBSControl flag that can be set to disable use of CHAR PPI optimizations:

The DisableCharPE field (internal flag #157) can be set to TRUE to disable this feature.

This will not disable partition elimination optimizations on non-character partitioning levels of an MLPPI table.

3.6 Moving Current Date / Timestamp in Join Index

The moving Current Date / Timestamp feature provides a way for you to periodically update the value of the system constant CURRENT_DATE/CURRENT_TIMESTAMP (CD/CT), which is used in the definition of a Join Index. This feature allows the content of a Join Index to be refreshed without having to drop it and recreate it.

Business Value

The implementation of the Moving Current Date / Timestamp feature eliminates the need to drop and recreate a join index and therefore reduces the data dictionary operations and the cost of refreshing the content of a join index. This makes a join index with CD/CT easily usable for a temporal query and enforces data integrity of a temporal table. You can define a CD/CT based Join Index for capturing data with a "moving" window of time without the need of dropping and recreating the whole Join Index, thereby:

·         Reducing the data dictionary operations.

·         Avoiding join index maintenance. Join index maintenance depends on both the join index definition and current data. If it is certain that current data won't affect a join index, there is no join index maintenance required. Otherwise, join index maintenance is needed.

In order to use a join index to rewrite a query, the join index must have the same resolved CD/CT as a temporal query. CD/CT could appear in a SystemDefinedJI that will be refreshed with MoveDate Join Index. SystemDefinedJI is used to enforce data integrity of a temporal table.

For most applications, refreshing a join index with this feature will be much faster.

 

Technical Overview

This feature refreshes the value of the system-defined constant CURRENT_DATE/CURRENT_TIMESTAMP (CD/CT) in the definition of a Join Index and updates its content accordingly when you alter the join index, or when you alter the temporal table if the join index is a system-defined join index.

 There is a uniqueness constraint requirement in CURRENT data to support Temporal Database. This constraint is satisfied by:

·         Allowing a unique primary index in a join index.

·         Period data type and BEGIN and END methods in defining a join index.

·         Refreshment of the value of Current_Date/Current_TimeStamp in the system-defined join index.

Its content is updated automatically when the user alters the temporal table.

 When to Use This Feature

This feature is useful under the following circumstances:

 ·         When you want to refresh a join index without having to drop and recreate it. After a join index is refreshed, it could be used to rewrite a query.

·         When you need to define a Join Index based on Date or TimeStamp and there is a need to refresh the Join Index to catch up with the latest date, on a continuous basis.

·         This feature is also applied automatically in the system to refresh a SystemDefinedJI to enforce the data integrity of a temporal table.

Requirements

There are no specific requirements for this feature.

 Trade-Offs

If the Join Index is defined with complicated CD/CT terms, the performance of refreshing it may not be better than dropping and recreating it manually. This is dictated by the amount of data that needs to be manipulated.

Limitations

The following limitations apply:

·         Any CD/CT used in the projection list and/or WHERE clause of the Join Index definition is moved to the latest date /timestamp and the content of the Join Index is updated accordingly.

·         Any CD/CT used in the partitioning expression of the Join Index is also moved and the partitioning of data in the Join Index is updated accordingly,

·         A Join Index must have reference to CD/CT somewhere in its definition (including the PPI expression).

·         You must have DROP TABLE privileges on the Join Index or on the database containing the Join Index. Furthermore, you must have INDEX or DROP TABLE privileges on the base table(s) on which the CD condition is defined.

 Installation & Configuration

This feature does not require any special installation or configuration instructions.

 Usage

The CURRENT_DATE / CURRENT_TIMESTAMP in a join index is the date that the join index was created. The content of the join index depends on the value of the CURRENT_DATE / CURRENT_TIMESTAMP.

The MoveDate Join Index feature will update the CURRENT_DATE / CURRENT_TIMESTAMP of the Join Index to the latest date and update the content of the join index.
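A minimal sketch of the pattern described above; the sales table and join index are hypothetical, and the refresh is issued with the ALTER ... TO CURRENT request form discussed in this module:

CREATE JOIN INDEX recent_sales_ji AS
  SELECT store_id, sale_date, amount
  FROM sales
  WHERE sale_date >= CURRENT_DATE - INTERVAL '30' DAY;

-- Later, move the resolved CURRENT_DATE forward and refresh the content.
ALTER TABLE recent_sales_ji TO CURRENT;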

 

Support

Error Messages

·         9248

If a Join Index does not have reference to CD/CT anywhere in its definition (including the PPI expression), an error 9248 “Object is not associated with CURRENT_DATE or CURRENT_TIMESTAMP” is reported.

·         9247

If the resolved value of CD/CT in the Join Index is more recent than the current value of the CD/CT, an error 9247 "One or more object reconciled to an earlier or same date or timestamp" is reported.

 Troubleshooting

The following troubleshooting techniques may be helpful:

·         After altering a Join Index, if you suspect that the Join Index doesn't have the correct content because you see incorrect results, use the following steps to isolate the problem:

 

§  Use "show qualified <dml>;" to show the resolved CURRENT_DATE / CURRENT_TIMESTAMP value (a sketch follows the note below).

§  After enabling the Join Index select diagnostic, use "sel * from ji;" to show the actual content of the Join Index.

§  If you suspect that the Join Index is corrupted because of the Optimize method, you can turn on the diagnostic flag “NOMDJIOPT” to disable the optimization. It is possible that the optimize method may make some error in computing the delta between the new Date range and the old Date range.

 Note: The general method, which is to drop everything and rematerialize the JI,  should always work.
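As a concrete illustration of the first step above, a hedged sketch of the SHOW QUALIFIED check against the hypothetical join index definition from the Usage section; the resolved CURRENT_DATE value appears in the returned request text:

SHOW QUALIFIED
SELECT store_id, sale_date, amount
FROM sales
WHERE sale_date >= CURRENT_DATE - INTERVAL '30' DAY;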

3.7 Moving Current Date in PPI

This feature supports the use of CURRENT_DATE and CURRENT_TIMESTAMP built-in functions in a partitioning expression and the ability to reconcile the values of these built-in functions to a newer date or timestamp using ALTER TABLE TO CURRENT by

·         Optimally repartitioning the rows based on the newly resolved values.

·         Reconciling the Partitioned Primary Index (PPI) expression based on the newly resolved values.

 Date based partitioning is typically used for PPI. If a PPI is defined with ‘moving’ current date or current timestamp, the partition that contains the recent data can be as small as possible for efficient access.

 Business Value

This feature allows you to easily define PPI with a ‘moving’ current date and timestamp (instead of requiring a manual redefinition of the PPI expression using constants) and optimally reconcile the values of these built-in functions to a newer date or timestamp using ALTER TABLE TO CURRENT.

Most temporal applications require the data to be partitioned into current and history. This feature provides the facility to define such partitions and helps keep the current data recent very efficiently.

Technical Overview

The performance of this feature depends on how the CURRENT_DATE or CURRENT_TIMESTAMP functions are used in a PPI. Optimization during ALTER TABLE TO CURRENT is provided for the most common and simple expression forms, such as:

col > CURRENT_DATE +/- INTERVAL in CASE_N

col BETWEEN CURRENT_DATE AND CURRENT_DATE +/- INTERVAL expressions in RANGE_N

If the expression is complex, reconciliation using the ALTER TABLE TO CURRENT statement may have unacceptable performance as it may perform an all-rows scan to repartition the data. Moreover, performance may be severely degraded if a large number of rows relative to the size of the table must be moved or deleted. Hence, a PPI that references CURRENT_DATE or/and CURRENT_TIMESTAMP should be designed with care.

When to Use This Feature

Use this feature when you need to define a PPI using a ‘moving’ current date or timestamp, such as with temporal applications.

Requirements

There are no specific requirements for this feature.

 Trade-Offs

Performance may be severely degraded if a large number of rows relative to the size of the table must be moved or deleted. Hence, a PPI that references CURRENT_DATE and/or CURRENT_TIMESTAMP should be designed with care.

 Limitations

The following limitations apply:

·         ADD and DROP of ranges at any level of a multi-level PPI table are not supported if the partitioning expression for that level specifies CURRENT_DATE or CURRENT_TIMESTAMP.

·         If WITH DELETE is specified, any delete triggers on the PPI table must be disabled. If WITH INSERT is specified, any insert triggers on the save table must be disabled; otherwise ALTER TABLE TO CURRENT reports error 5737.

Installation & Configuration

This feature does not require any special installation or configuration instructions.

Usage

General Steps

·         You specify CURRENT_DATE and/or CURRENT_TIMESTAMP in a PPI to use the feature.

·         You should reconcile the table based on the newly resolved values of these functions, using the ALTER TABLE TO CURRENT statement, so that the partitioning reflects recent data.

·         The newly resolved values of CURRENT_DATE or CURRENT_TIMESTAMP should always be greater than the last resolved values when reconciling the table using ALTER TABLE TO CURRENT.

·         You need not drop and recreate PPI tables (or compression join indexes) with moving time when expressions are based on date or timestamp.

The PPI expression should be designed with care when CURRENT_DATE or CURRENT_TIMESTAMP is specified. We recommend that you do not use expressions that reconcile the table by scanning all rows in the table.

Example 1:

CREATE TABLE cust (custid INT, exp_date DATE)
PRIMARY INDEX (custid)
PARTITION BY CASE_N(exp_date = CURRENT_DATE, NO CASE);

In this case, ALTER TABLE TO CURRENT will always scan both partitions of the above PPI: rows in the first partition that no longer match the new date must move to the NO CASE partition, and rows in the NO CASE partition (which holds all non-current rows, including future dates) may now match the new date and must move to the first partition.

Example 2:

CREATE TABLE cust (custid INT, exp_date DATE)
PRIMARY INDEX (custid)
PARTITION BY CASE_N(exp_date >= CURRENT_DATE,
                    exp_date < CURRENT_DATE);

ALTER TABLE TO CURRENT optimizes the reconciliation in this case: it scans only the first partition and relocates stale rows to the second partition.

Additionally:

·         Conditional expressions with contiguous timelines are recommended in CASE_N expressions instead of defining them with time gaps.

·         If an UNKNOWN partition is defined in a CASE_N expression, it is always scanned during reconciliation.

·         If CURRENT_DATE or CURRENT_TIMESTAMP is specified in the start_expr of the first range, and if that start_expr gets altered with a new CURRENT_DATE or CURRENT_TIMESTAMP, then all partitions prior to the new first partition are dropped.

For example, if the partition range is:

PARTITION BY RANGE_N(dt BETWEEN CURRENT_DATE AND
                     CURRENT_DATE + INTERVAL '1' YEAR EACH INTERVAL '1' MONTH,
                     NO RANGE)

where CURRENT_DATE is 01-July-2010 and the table is altered so that the current date becomes 01-Aug-2010, then all July rows are dropped.

·         If CURRENT_DATE or CURRENT_TIMESTAMP is used in an end_expr only, then:

§  If NO RANGE is specified, rows from the NO RANGE partition are scanned to partition them based on the newly resolved date.

§  If NO RANGE is not specified, no rows are scanned. The partitioning expression is changed to accommodate the changed range values.

Options and Syntax

CREATE TABLE/CREATE TABLE AS/CREATE JOIN INDEX:

There is no new syntax introduced for CREATE TABLE. The prior restriction on specifying CURRENT_DATE or CURRENT_TIMESTAMP in the PPI has been lifted with this feature.

New ALTER TABLE syntax:

This new extension reconciles the partitioning of a PPI table or noncompressed join index to a newly resolved date or timestamp when its partitioning is based on the CURRENT_DATE or CURRENT_TIMESTAMP functions.
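
As a sketch of the general form, based on the options discussed in this module (refer to the Teradata SQL reference for the complete syntax diagram; the WITH options correspond to the WITH DELETE and WITH INSERT behavior described under Limitations):

ALTER TABLE table_or_join_index_name TO CURRENT
[ WITH DELETE                      /* discard rows that no longer fit any partition */
| WITH INSERT [INTO] save_table ]; /* copy such rows into a save table instead */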

Sample: Scenario #1:

The table below creates two partitions

(1) A partition with active policies and

(2) A partition with expired policies.

 

CREATE TABLE cust_policy (
  cust_no CHAR(8),
  policy_number INTEGER,
  policy_expiry_date DATE FORMAT 'YYYY/MM/DD'
)
PRIMARY INDEX (cust_no, policy_number)
PARTITION BY CASE_N(policy_expiry_date >= CURRENT_DATE,
                    NO CASE);

 

SHOW TABLE displays the user-specified partitioning expression.

SHOW TABLE cust_policy;

*** Text of DDL statement returned.
*** Total elapsed time was 1 second.

CREATE SET TABLE DB1.cust_policy ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      cust_no CHAR(8) CHARACTER SET LATIN NOT CASESPECIFIC,
      policy_number INTEGER,
      policy_expiry_date DATE FORMAT 'YYYY/MM/DD')
PRIMARY INDEX (cust_no, policy_number)
PARTITION BY CASE_N(
     policy_expiry_date >= CURRENT_DATE,
     NO CASE);

 

A SHOW QUALIFIED request displays the partitioning expression with the last resolved values of CURRENT_DATE or CURRENT_TIMESTAMP.

SHOW QUALIFIED SELECT * FROM cust_policy;

*** Text of DDL statement returned.
*** Total elapsed time was 1 second.

CREATE SET TABLE DB1.cust_policy ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT
     (
      cust_no CHAR(8) CHARACTER SET LATIN NOT CASESPECIFIC,
      policy_number INTEGER,
      policy_expiry_date DATE FORMAT 'YYYY/MM/DD')
PRIMARY INDEX (cust_no, policy_number)
PARTITION BY CASE_N(
     policy_expiry_date >= DATE '2008-01-01',
     NO CASE);

 

The above shows CURRENT_DATE resolved to a constant date value. In this case, the table was created on DATE ‘2008-01-01’ and has not been ALTERED to the new resolved values of CURRENT_DATE or CURRENT_TIMESTAMP. 

Data in the table:

SELECT PARTITION, cust_policy.*
FROM cust_policy
ORDER BY 1;

*** Query completed. 6 rows found. 4 columns returned.
*** Total elapsed time was 1 second.

PARTITION   cust_no   policy_number   policy_expiry_date
---------   -------   -------------   ------------------
        1   104       2003            11/28/2008
        1   101       2002            1/1/2010
        1   101       2001            1/1/2009
        1   102       2003            1/1/2008
        2   105       2004            12/31/2007
        2   103       2004            12/30/2007

 

 

Assuming we reconcile the table on DATE '2009-01-01', the two rows with cust_no 104 and 102 will become inactive policies. This suggests that the reconciliation only needs to scan a single partition of the table. We can further see this with an EXPLAIN of the reconciliation statement.
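
The reconciliation itself is a single statement; as suggested above, prefixing it with EXPLAIN shows whether it uses partition-based access or an all-rows scan:

EXPLAIN ALTER TABLE cust_policy TO CURRENT;  /* verify that only partition 1 is scanned */
ALTER TABLE cust_policy TO CURRENT;          /* reconcile to the new CURRENT_DATE */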

DATA AFTER ALTER TABLE TO CURRENT:

SELECT PARTITION, cust_policy.* FROM cust_policy ORDER BY 1;

*** Query completed. 6 rows found. 4 columns returned.
*** Total elapsed time was 1 second.

PARTITION   cust_no   policy_number   policy_expiry_date
---------   -------   -------------   ------------------
        1   101       2002            1/1/2010
        1   101       2001            1/1/2009
        2   104       2003            11/28/2008
        2   105       2004            12/31/2007
        2   103       2004            12/30/2007
        2   102       2003            1/1/2008

 

Now we can see that the rows for cust_no 104 and 102 were moved to partition 2.

 

 

Another real schema using CURRENT_DATE:

 

CREATE TABLE Orders
  (o_orderkey INTEGER NOT NULL,
   o_custkey INTEGER,
   o_orderstatus CHAR(1) CASESPECIFIC,
   o_totalprice DECIMAL(13,2) NOT NULL,
   o_orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL,
   o_orderpriority CHAR(21),
   document VARCHAR(79))
PRIMARY INDEX (o_orderkey)
PARTITION BY
  CASE_N(o_orderdate >= CURRENT_DATE,                      /* Current Quarter  */
         o_orderdate >= CURRENT_DATE - INTERVAL '3' MONTH AND
         o_orderdate <  CURRENT_DATE,                      /* Previous Quarter */
         o_orderdate <  CURRENT_DATE - INTERVAL '3' MONTH,
         NO CASE);                                         /* Old Quarters     */

 

• The above table partitions the data into current, previous, and old quarters. CURRENT_DATE is valued as the beginning date of the current quarter (see the reconciliation sketch after these notes).

 

• NO CASE optimization: ALTER TABLE TO CURRENT does not scan the NO CASE partition when all the earlier conditional expressions with CURRENT_DATE or CURRENT_TIMESTAMP are contiguous, as is the case in the preceding schema. The NO CASE partition normally contains a huge number of rows, so this optimization gives a significant performance benefit.
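
Putting the two notes above together, the table is reconciled at the beginning of each quarter; a sketch of that recurring step:

/* Run at the start of each quarter so that CURRENT_DATE resolves to the new
   quarter-begin date; rows then move from the current partition to the
   previous partition, and from the previous partition to NO CASE */
ALTER TABLE Orders TO CURRENT;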

Support

• Diagnostics introduced with this feature: DIAGNOSTIC NOMDPPIOPT ON FOR REQUEST/SESSION. Use of this diagnostic is explained in the Hints and Tips section.

 

Hints and Tips

·         Are rows correctly re-partitioned after ALTER TABLE TO CURRENT?

Use CHECK TABLE to determine the rows that are invalid under the new partitioning. The following command can be used for this: CHECK <database_name>.<table_name> AT LEVEL HASH;

·         If rows are not correctly partitioned after ALTER TABLE TO CURRENT (one way to detect this is the CHECK TABLE command above), fix the badly partitioned table using ALTER TABLE REVALIDATE PRIMARY INDEX on the subject table or join index.

§  If the problem of bad re-partitioning recurs every time ALTER TABLE TO CURRENT is issued, issue DIAGNOSTIC NOMDPPIOPT ON FOR SESSION prior to the ALTER TABLE TO CURRENT statement. This diagnostic re-partitions all rows in the table without any optimization.

§  The EXPLAIN of the ALTER TABLE TO CURRENT shows whether the operation is a partition-based access (optimal) or an all rows scan.

§  While debugging a problem with ALTER TABLE TO CURRENT, the partition list that needs to be scanned as part of the reconciliation logic can be printed using gdb and evaluated to verify that the correct partitions were pruned.

 

Relationship to MoveDate Join Index Feature of TD13.10

 

This feature has an interaction with the new TD13.10 MoveDate Join Index feature.

• With the MoveDate in JI feature, the user can specify CURRENT_DATE/CURRENT_TIMESTAMP in the WHERE clause of a join index.

 

• Hence, CURRENT_DATE/CURRENT_TIMESTAMP can be specified in both the PPI and the SELECT’s WHERE clause of a join index. In this case, the ALTER TABLE TO CURRENT logic does the following (see the sketch after this list):

·         Removes those rows in the JI that no longer satisfy the new CURRENT_DATE or CURRENT_TIMESTAMP condition.

·         Moves CURRENT_DATE or CURRENT_TIMESTAMP into the PPI expression and re-partitions the data accordingly.

·         Adds new rows to the JI that are qualified by the new CURRENT_DATE or CURRENT_TIMESTAMP condition. The new rows are built using the new CURRENT_DATE value, so any CURRENT_DATE based expression in the JI’s SELECT has the new date value.
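
As a sketch of such a join index (the index name recent_orders_ji is illustrative; it reuses the Orders table from the preceding schema), CURRENT_DATE appears in both the WHERE clause and the partitioning expression, and a single reconciliation statement covers both:

CREATE JOIN INDEX recent_orders_ji AS
SELECT o_orderkey, o_custkey, o_orderdate
FROM Orders
WHERE o_orderdate >= CURRENT_DATE - INTERVAL '3' MONTH
PRIMARY INDEX (o_orderkey)
PARTITION BY CASE_N(o_orderdate >= CURRENT_DATE,
                    o_orderdate <  CURRENT_DATE);

/* Removes rows that no longer satisfy the WHERE condition, re-partitions the
   remaining rows, and adds newly qualifying rows, as described above */
ALTER TABLE recent_orders_ji TO CURRENT;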

Error Messages

The following are new error messages introduced with this feature:

·         9247 , ‘One or more objects reconciled to an earlier or same date or timestamp’

·         9248 , ‘Object is not associated with CURRENT_DATE or CURRENT_TIMESTAMP’

 ·         9249 , ‘The altering of RANGE_N definition with CURRENT_DATE/CURRENT_TIMESTAMP is not allowed’

3.8 Time Series Expansion 

This feature expands a column having a Period data type, creating a regular time series of rows based on the Period value in the input row. The expansion of a Period column produces one value-equivalent row for each of the time granules in the date or timestamp representation of the specified time granule. You can perform time series expansions only on Period expressions.

When the expand operation is specified on a Period expression in a SELECT query, the Period expression is expanded based on either the interval specified by the user or the smallest granule of the Period column being expanded.

The intent is to expand a Period column and produce value equivalent rows, one each for each of the granules in the period. Each such row is associated with a period representation of the specified time granule.

The time series expand operation is added to the current functionality of the SELECT statement. The expand operation is performed only on a Period expression.

Business Value

Time series analysis is used for many applications such as sales forecasting, stock market analysis, and inventory studies. In all of the above scenarios, without the time series expand operation we may need to use the system calendar in order to generate the sequence of intervals. The time series expand operation avoids the calendar table join. The Time Series Expand clause avoids joins with the System Calendar for some OLAP queries in the following application environments:

·         Active Data Warehouse

·         Teradata CRM

·         BI Tools, Analytic Applications, Teradata Warehouse Miner

 

A period representation of a time series, which is useful for slowly changing data, is a dense representation for a single relational row. Sparse representations of relational data, on the other hand, are useful for describing quickly changing data. A sparse representation of a time series records the same data as a dense representation, but in multiple rows, with each row having a date or timestamp value that is associated with its data. The analysis of densely represented data often requires complex coding in standard SQL, which increases the likelihood of returning a different result than was intended. The EXPAND ON clause is designed to simplify the SQL code required to analyze densely represented data.

EXPAND ON describes a time series expand operation.

 

Technical Overview

The time series expand operation is added to the current functionality of the SELECT statement.

An EXPAND ON clause can be specified anywhere in a SELECT except in the case where the SELECT is specified as a subquery in a search condition or in the case where the SELECT is in a join index.

The expand operation is performed only on a Period expression.

The EXPAND clause enables you to perform three different forms of expansion:

 

·         Interval expansion, where rows are expanded by user-specified intervals such as INTERVAL '1' DAY or INTERVAL '3' MONTH. An example where this form of expansion may be useful is “compute the moving window average of inventory cost by week during the year 2006.”

·         Anchor point expansion, where rows are expanded by user-specified anchor points in a time line. For example, specifying BY ANCHOR MONTH_BEGIN as the anchor point results in multiple expanded rows, one for each beginning value of a month present in the input period. An example query that uses such expansion is “Get the month end average inventory cost during the last quarter of the year 2006.”

·         Anchor period expansion, where rows are expanded by user-specified anchored periods in a time line. This form uses the PERIOD keyword in the syntax to distinguish anchor period expansion from anchor point expansion; for example, BY ANCHOR PERIOD MONTH_BEGIN or BY ANCHOR PERIOD MONDAY. A sample query that uses such expansion is “compute the weekly weighted average of inventory cost.”

 When to Use This Feature

Any business using time series analysis will find this feature useful.

 Requirements

There are no specific requirements for using this feature.

 Trade-Offs

There are no trade-offs in using this feature.

 Limitations

There are no limitations when using this feature.

 Usage

The general form of the EXPAND ON clause of the SELECT statement, as used in the examples that follow, is summarized below.
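
A sketch of the general form, pieced together from the examples and descriptions in this module (the complete syntax diagram is in the Teradata SQL reference):

SELECT ...
FROM ...
[ WHERE ... ]
EXPAND ON period_expression AS expansion_alias
  [ BY { INTERVAL interval_literal         /* interval expansion      */
       | ANCHOR anchor_name                /* anchor point expansion  */
       | ANCHOR PERIOD anchor_name } ]     /* anchor period expansion */
  [ FOR period_expression ];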

 

 

Example of Interval Expansion: The resulting TSP column shows each employee’s job duration broken into multiple rows, each covering at most one year and each falling within the period defined in the FOR clause of the SELECT statement.

 

Employee TABLE

Eid     Ename     Jobduration
----    -------   ---------------------
1001    Xavier    2002-01-10,9999-12-31
1002    Ricci     2007-07-01,9999-12-31
1003    Charlie   2006-02-10,2008-06-01

 

 

SELECT eid, ename, jobduration, tsp
FROM employee
EXPAND ON jobduration AS tsp BY INTERVAL '1' YEAR
FOR PERIOD(DATE '2006-01-01', DATE '2008-01-01');

 

EID     ENAME     JOBDURATION             TSP
----    -------   ---------------------   ---------------------
1001    Xavier    2002-01-10,9999-12-31   2007-01-01,2008-01-01
1001    Xavier    2002-01-10,9999-12-31   2006-01-01,2007-01-01
1002    Ricci     2007-07-01,9999-12-31   2007-07-01,2008-01-01
1003    Charlie   2006-02-10,2008-06-01   2007-02-10,2008-01-01
1003    Charlie   2006-02-10,2008-06-01   2006-02-10,2007-02-10

Example: Expansion on an Anchor Point Using WEEK_DAY

Suppose you create a table named tdate with the following definition.

CREATE SET TABLE tdate (
  id INTEGER,
  quantity INTEGER,
  pd PERIOD(DATE))
PRIMARY INDEX (id);

 

You insert two rows into tdate so its contents are as follows:

 

 

Id    quantity    pd
--    --------    ----------------------
11    110         2005-02-03, 2005-06-20
10    100         2004-01-03, 2004-05-20

 

You now submit a SELECT request against tdate that specifies an EXPAND ON clause anchored by a day of the week, Monday. Each expanded row therefore begins on a Monday, as you specified in the request, and the duration of each expanded period is seven days.

SELECT id, BEGIN(bg)
FROM tdate
WHERE id = 11
EXPAND ON pd AS bg BY ANCHOR MONDAY;

 

Teradata Database returns tdate details for each week of a given period,  beginning on Monday, as you specified in the BY ANCHOR clause of the request.

Because the first row in tdate starts on a Thursday, not a Monday, the expanded row starts on the next sequential Monday date, which is February 7, and then continues in weekly increments.

 

id    begin(bg)
--    ---------
11    2/7/2005
11    2/14/2005
11    2/21/2005
11    2/28/2005
11    3/7/2005
11    3/14/2005
11    3/21/2005
11    3/28/2005
11    4/4/2005
11    4/11/2005
11    4/18/2005
11    4/25/2005
11    5/2/2005
11    5/9/2005
11    5/16/2005
11    5/23/2005
11    5/30/2005
11    6/6/2005
11    6/13/2005

 

Example: EXPAND ON For an Anchored Interval

This example demonstrates the use of an anchored interval for doing anchor period and anchor point expansions. For an anchor period expansion, the expanded period value must overlap the expanding period, while for an anchor point expansion, the begin value of the expanded period value must be contained in the expanding period, which is a more restrictive condition.

First create the sold_products table:

CREATE SET TABLE sold_products, NO FALLBACK (
  product_id INTEGER,
  product_price DECIMAL(10,2),
  product_duration PERIOD(DATE))
PRIMARY INDEX (product_id);

 

Next add these two rows to sold_products:

 

product_id    product_price    product_duration
----------    -------------    ----------------------
1000          100.00           2007-02-15, 2007-08-11
1001          99.99            2007-03-04, 2007-05-01

 

Now submit the following SELECT request, which specifies an anchor period of MONTH_BEGIN. This is an anchor period expansion.

 

SELECT product_id, product_price, product_duration, expd
FROM sold_products
EXPAND ON product_duration AS expd BY ANCHOR PERIOD MONTH_BEGIN;

 

The request returns the following nine rows.

 

product_id   product_price   product_duration        expd
----------   -------------   ---------------------   ---------------------
1000         100             2007-02-15,2007-08-11   2007-02-01,2007-03-01
1000         100             2007-02-15,2007-08-11   2007-03-01,2007-04-01
1000         100             2007-02-15,2007-08-11   2007-04-01,2007-05-01
1000         100             2007-02-15,2007-08-11   2007-05-01,2007-06-01
1000         100             2007-02-15,2007-08-11   2007-06-01,2007-07-01
1000         100             2007-02-15,2007-08-11   2007-07-01,2007-08-01
1000         100             2007-02-15,2007-08-11   2007-08-01,2007-09-01
1001         99.99           2007-03-04,2007-05-01   2007-03-01,2007-04-01
1001         99.99           2007-03-04,2007-05-01   2007-04-01,2007-05-01

 

The rows with expd values of 2007-02-01,2007-03-01 (product 1000) and 2007-03-01,2007-04-01 (product 1001) will not appear in the answer set of the following example.

 

 

Example: EXPAND ON For an Anchored Period

 

Submit the following SELECT request, which differs from the previous request in specifying the BEGIN bound function on the expanded value expd and in omitting the PERIOD keyword from the BY ANCHOR clause. This is an anchor point expansion done on the same data as the previous anchor period expansion.

 

SELECT product_id, product_price, product_duration, BEGIN(expd)
FROM sold_products
EXPAND ON product_duration AS expd BY ANCHOR MONTH_BEGIN;

 

(Note the absence of the word PERIOD in the BY ANCHOR clause)

As you can see, this request returns seven rows rather than nine, with the two rows noted before this example not appearing. This is because the result does not include months whose first day does not fall within the expanded period.

product_id   product_price   product_duration        begin(expd)
----------   -------------   ---------------------   -----------
1000         100             2007-02-15,2007-08-11   3/1/2007
1000         100             2007-02-15,2007-08-11   4/1/2007
1000         100             2007-02-15,2007-08-11   5/1/2007
1000         100             2007-02-15,2007-08-11   6/1/2007
1000         100             2007-02-15,2007-08-11   7/1/2007
1000         100             2007-02-15,2007-08-11   8/1/2007
1001         99.99           2007-03-04,2007-05-01   4/1/2007

 

Customer Examples of Time Series Queries

Find the weekly moving average inventory cost by item by day for 2006 across all stores.

 With Time Series Support

SELECT item, p, AVG(qty*unitprice)
  OVER (PARTITION BY item
        ORDER BY p ROWS 6 PRECEDING)
FROM
  (SELECT item, qty, unitprice,
          BEGIN(expd) AS p
   FROM inventory
   EXPAND ON validity AS expd
   BY INTERVAL '1' DAY
   FOR PERIOD(DATE '2006-01-01', DATE '2007-01-01')
  ) dt;

Note: The first SELECT specifies the operation (moving average) to be performed. The second SELECT (dt) specifies the rows to be worked on and is expansion specific.

 

Without Time Series Support

SELECT item, p, AVG(qty*unitprice)
  OVER (PARTITION BY item
        ORDER BY p ROWS 6 PRECEDING)
FROM
  (SELECT item, qty, unitprice,
          BEGIN(validity) AS p
   FROM inventory i, calendar c
   WHERE i.validity OVERLAPS
         PERIOD(DATE '2006-01-01', DATE '2007-01-01')
     AND c.year = 2006
     AND c.date >= BEGIN(validity)
     AND c.date < END(validity)) dt;

 

Note: This syntax will require a product join with the system calendar and thus will perform less well than the previous example.

 

Time Series With Anchor Point

Give the top 3 high inventory items for this quarter at every week end.

 

SELECT item,
       RANK() OVER
         (PARTITION BY p ORDER BY q DESC) r,
       p
FROM
  (SELECT item, qty*unitprice q,
          BEGIN(expd) AS p
   FROM inventory
   EXPAND ON validity AS expd
   BY ANCHOR FRIDAY
   FOR PERIOD(DATE '2006-01-01', DATE '2006-04-01')
  ) dt
QUALIFY r <= 3;

Other Features 

• An EXPAND ON clause can be specified in the seed statement in a recursive query.

• An EXPAND ON clause can be specified in a derived table including the  case when it is contained in a subquery.

• An EXPAND ON clause can be specified in an INSERT SELECT query.

• An EXPAND ON clause can be specified in the SELECT clause of a CREATE TABLE AS SUBQUERY statement.

• If the optional FOR clause is specified and the period expression in the FOR clause overlaps the period value of the expand expression for a qualifying row, then the period that is expanded for that row is the intersection of the period value of the expand expression and the period value of the period expression in the FOR clause (see the sketch below).
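
For example, a sketch that reuses the employee table from the interval expansion example above; because Charlie’s jobduration begins before the FOR period, only the intersection of the two periods is expanded:

SELECT eid, ename, jobduration, tsp
FROM employee
EXPAND ON jobduration AS tsp BY INTERVAL '1' MONTH
FOR PERIOD(DATE '2007-01-01', DATE '2007-04-01');

/* For Charlie (jobduration 2006-02-10,2008-06-01), the period actually expanded
   is the intersection 2007-01-01,2007-04-01, so his row is expanded month by
   month within that intersection only */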