PROVE IT !!If you know it then PROVE IT !! Skill Proficiency Test

Teradata Surrogate Key / Identity Key Generation

The use of surrogate keys is usually considered when loading different source systems with different natural keys on the same integrated data model. This implies a conversion of the different natural keys into a standard surrogate key that can be used for all source systems. Another important consideration of using surrogate keys as part of the primary index is to be able to improve performance on the joins among tables with the same surrogate key.Surrogate keys can be generated via ETL/ELT tools or inside Teradata.

There are data models that use surrogate keys as part of primary key/primary index definition.
Examples are: Industry and Homegrown models.

Let's talk about three different ways of generating surrogate keys inside Teradata.

  • Identity Columns
  • OLAP Function (CSUM)
  • Hashing Algorithm
  • Identity Columns

This is a good way to generate surrogate keys if you don't have a dual active Teradata environment and the surrogate keys don't need to be generated in sequential order.

The identity column will generate different surrogate keys on different systems.Also, each AMP will have a set of numbers to use and this range of numbers is defined based on the following DBS Control parameter.

IdCol Batch Size

  • Description: Indicates the size of a pool of numbers reserved by a vproc for assigning identity values torows inserted into an identity column table.
  • Valid Range: 1 through 1,000,000
  • Default Value: 100,000

This means AMP 1 will have a pool of numbers from 1 to 100,000 and AMP 2 will have a pool of numbers from 100,001 to 200,000 on a 2 AMP system using the IdCol Batch Size = 100,000.This option can be implemented by defining a key table containing an identity column.A new surrogate key will be generated every time a new record is inserted into this key table and there is no need to pass any value to the identity column when it is defined as follows.The key table also provide redundancy in case of a disaster.

Surrogate Key Table

CREATE MULTISET TABLE DATABASENAME.AR_KEY ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     DATABLOCKSIZE = 130560 BYTES, CHECKSUM = DEFAULT
     (
      SRC_SYS_UNQ_KEY_TX VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
      SRC_SYS_CD CHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
      AR_ID DECIMAL(15,0) GENERATED ALWAYS AS IDENTITY
           (START WITH 1
            INCREMENT BY 1
            MINVALUE -999999999999999
            MAXVALUE 999999999999999
            NO CYCLE),
      CRE_RUN_ID INTEGER NOT NULL,
      ICDW_ISRT_TS TIMESTAMP(0) NOT NULL DEFAULT CURRENT_TIMESTAMP(0))
UNIQUE PRIMARY INDEX ( SRC_SYS_UNQ_KEY_TX ,SRC_SYS_CD )

There are some special considerations when the key table containing an identity column needs to be re-loaded  in case of a disaster.

The key table needs to be re-created with a new START and MINVALUE numbers to exclude the numbers that were generated before to avoid the same number to be generated again and have duplicate numbers on the surrogate key (identity column).
The definition of the identity column needs to be changed from "GENERATED ALWAYS" to "GENERATED BY DEFAULT" to be able to reload the keys that were generated before.
New surrogate keys will be generated only when Nulls (no values) are passed to the identity column.
So, for example a surrogate key table with a highest identity column value of 999,999 needs to be reloaded because of a disaster, then the new surrogate key table needs to be defined as follows.

Surrogate Key Table (Re-Load)

CREATE MULTISET TABLE DATABASENAME.AR_KEY ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     DATABLOCKSIZE = 130560 BYTES, CHECKSUM = DEFAULT
     (
      SRC_SYS_UNQ_KEY_TX VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
      SRC_SYS_CD CHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
      AR_ID DECIMAL(15,0) GENERATED BY DEFAULT AS IDENTITY
           (START WITH 1000000
            INCREMENT BY 1
            MINVALUE 1000000
            MAXVALUE 999999999999999
            NO CYCLE),
      CRE_RUN_ID INTEGER NOT NULL,
      ICDW_ISRT_TS TIMESTAMP(0) NOT NULL DEFAULT CURRENT_TIMESTAMP(0))
UNIQUE PRIMARY INDEX ( SRC_SYS_UNQ_KEY_TX ,SRC_SYS_CD );

OLAP Function (CSUM)

This is a good way to generate surrogate keys in sequential order. It can be used on a dual active environment if a dual load strategy is in place or if the surrogate keys are generated on the primary system and then it is copied over to the secondary system by using Data Mover for example.The CSUM function will generate the next surrogate key number only if the highest surrogate key already generated is provided as part of the equation.This option can be implemented by developing a surrogate key generation process via a stored procedure together with a surrogate key table containing the natural key plus the surrogate key. A new surrogate key is generated by the CSUM function every time a new record is inserted into the surrogate key table.

The key table also provides redundancy in case of a disaster.

Surrogate Key Table

CREATE MULTISET TABLE DATABASENAME.AR_KEY ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     DATABLOCKSIZE = 130560 BYTES, CHECKSUM = DEFAULT
     (
      SRC_SYS_UNQ_KEY_TX VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
      SRC_SYS_CD CHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
      AR_ID DECIMAL(15,0) NOT NULL,
      CRE_RUN_ID INTEGER NOT NULL,
      ICDW_ISRT_TS TIMESTAMP(0) NOT NULL DEFAULT CURRENT_TIMESTAMP(0))
UNIQUE PRIMARY INDEX ( SRC_SYS_UNQ_KEY_TX ,SRC_SYS_CD );

Surrogate Key Stored Procedure

  BEGINTRANSACTION;
— CHECK THE HIGHEST SURROGATE KEY GENERATED
  SETtmpMsg = 'SELECT (COALESCE(MAX(AR_ID),0)) FROM DATABASENAME.AR_KEY;';
  SETcntr = 0;
LOCKING DATABASENAME.AR_KEY FORWRITE
  SELECT(COALESCE(MAX(AR_ID),0))
  INTOcntr
  FROM DATABASENAME.AR_KEY  ;
  IF SQLSTATE = '02000'THEN
     SETcntr = 0;
  ENDIF;
— GENERATING NEW SURROGATE KEYS
  SETtmpMsg = 'INSERT INTO DATABASENAME.AR_KEY…';
  SETcntrkey = 0;
SETstatmnt = '
  INSERT INTO DATABASENAME.AR_KEY
  SELECT
   SRC_SYS_UNQ_KEY_TX
  ,SRC_SYS_CD
  ,(CSUM(1,SRC_SYS_UNQ_KEY_TX,SRC_SYS_CD) +'||  cntr || ' ) AS AR_ID
  ,CRE_RUN_ID
  ,CURRENT_TIMESTAMP(0) AS ICDW_ISRT_TS
  FROM
  (SELECT
   SRC_SYS_UNQ_KEY_TX
  ,SRC_SYS_CD
  ,CRE_RUN_ID
  FROM DATABASENAME.'|| 'AR_'|| sourcecd || '_SA3
  WHERE (SRC_SYS_UNQ_KEY_TX,SRC_SYS_CD) NOT IN
  (SELECT
   SRC_SYS_UNQ_KEY_TX
  ,SRC_SYS_CD
  FROM DATABASENAME.AR_KEY
  WHERE SRC_SYS_CD = '||  ''''|| srcsyscd || ''''|| '
  )
  AND SRC_SYS_CD = '||  ''''|| srcsyscd || ''''|| '
  AND CRE_RUN_ID = '||  crerunid || '
  AND ETL_ACTN_CD = ''I''
  GROUP BY 1,2,3) T0
  ;';   
  CALL DBC.SysExecSQL(statmnt);
  SETcntrkey = ACTIVITY_COUNT;
  IF cntrkey = 0 THEN
SET tmpMsg = 'APP WARNING: NO NEW AR_IDs WERE GENERATED';
SET SQLMSG = TMPMSG;
SET SQLERRCDE = SQLCODE;
SET SQLSTTE = SQLSTATE;
SET SQL_ERR_CDE = SQLCODE;
SET SQL_STATE = SQLSTATE;   
 
INSERT  INTO  DATABASENAME.SPROC_ERR_LOG 
VALUES('DATABASENAME','SPROC_GEN_AR_KEY','W',
:SQL_ERR_CDE,:SQL_STATE,:TMPMSG,:crerunid
,CURRENT_TIMESTAMP(0));
 
ENDIF;
ENDTRANSACTION;

Please notice that the maximum surrogate key value is added to the CSUM expression to generate the next surrogate key.

Please also notice that there are a "BEGIN TRANSACTION;", a " END TRANSACTION;", and a "LOCKING ATABASENAME.AR_KEY FOR WRITE" statements inside of the stored procedure code for concurrency reasons. This allows multiple surrogate key generation processes to run at the same time.

If there are two surrogate key generation processes running a the same time, then the first process to place the write lock on the key table will generate new surrogate keys while the second process will be waiting until the end transaction statement of the first process is done.

There are some special considerations when the key table containing the surrogate keys needs to be re-loaded  in case of a disaster.

The surrogate keys can't be re-generated using the stored procedure because the process could generate new surrogate keys different from the original ones.
The surrogate keys need to be copied from the target tables into the key table or restored from the latest backup.
Continue using the stored procedure after the key table is re-loaded.

Hashing Algorithm

This is a good way to generate surrogate keys if you have a dual active environment and the surrogate keys don't need to be generated in sequential order. The hashing algorithm will generate the same surrogate keys on different Teradata systems but key collisions can happen. Therefore, the hashing algorithm needs to generate a uniqueness value as part of the surrogate key generation process to eliminate key collisions.
This option can be implemented by developing a surrogate key generation process via a stored procedure together with a surrogate key table containing the natural key plus the surrogate key (hashing key plus uniqueness value). The idea here is to generate the surrogate keys just once and store into a key table.  Then, the key table is used to assign the surrogate keys to all tables that require it.
The key table also provides redundancy in case of a disaster.

Surrogate Key Table
CREATE MULTISET TABLE DATABASENAME.AR_KEY ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     DATABLOCKSIZE = 130560 BYTES, CHECKSUM = DEFAULT
     (
      SRC_SYS_UNQ_KEY_TX VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
      SRC_SYS_CD CHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
      AR_ID DECIMAL(18,0) NOT NULL,
      HASH_AR_ID DECIMAL(18,0) NOT NULL,
      UNIQUENESS_VALUE INTEGER NOT NULL,
      CRE_RUN_ID INTEGER NOT NULL,
      ICDW_ISRT_TS TIMESTAMP(0) NOT NULL DEFAULT CURRENT_TIMESTAMP(0))
UNIQUE PRIMARY INDEX ( SRC_SYS_UNQ_KEY_TX ,SRC_SYS_CD );

Hash Key Temporary Table

CREATE MULTISET GLOBAL TEMPORARY TABLE DATABASENAME.HASH_AR_KEY ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     DATABLOCKSIZE = 130560 BYTES, CHECKSUM = DEFAULT,
     LOG
     (
      SRC_SYS_UNQ_KEY_TX VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
      SRC_SYS_CD CHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
      HASH_AR_ID DECIMAL(18,0) NOT NULL,
      CRE_RUN_ID INTEGER NOT NULL,
      ICDW_ISRT_TS TIMESTAMP(0) NOT NULL DEFAULT CURRENT_TIMESTAMP(0))
PRIMARY INDEX ( HASH_AR_ID )
ON COMMIT PRESERVE ROWS;

Surrogate Key Stored Procedure

— GENERATING NEW HASHING KEYS

 SET cntrkey = 0;
 SET tmpMsg = 'INSERT INTO DATABASENAME.HASH_AR_KEY…';
   
 SET statmnt = '
  INSERT INTO DATABASENAME.HASH_AR_KEY
  SELECT
   T1.SRC_SYS_UNQ_KEY_TX
  ,T1.SRC_SYS_CD
  ,CAST(T1.AR_ID AS DECIMAL(18)) AS HASH_AR_ID
  ,T1.CRE_RUN_ID
  ,CURRENT_TIMESTAMP(0) AS ICDW_ISRT_TS
  FROM
  (SELECT
   TRIM(HASHBUCKET('||''''||'00'||''''||'XB || SUBSTR(HASHROW (SRC_SYS_UNQ_KEY_TX,SRC_SYS_CD) ,1,1)  (BYTE(4)) ))||TRIM(HASHBUCKET('||''''||'00'||''''||'XB || SUBSTR(HASHROW (SRC_SYS_UNQ_KEY_TX,SRC_SYS_CD) ,2,1) (BYTE(4)) ))||TRIM(HASHBUCKET('||''''||'00'||''''||'XB || SUBSTR(HASHROW (SRC_SYS_UNQ_KEY_TX,SRC_SYS_CD) ,3,1) (BYTE(4)) ))||TRIM(HASHBUCKET('||''''||'00'||''''||'XB || SUBSTR(HASHROW (SRC_SYS_UNQ_KEY_TX,SRC_SYS_CD) ,4,1) (BYTE(4)) )) AS AR_ID
  ,SRC_SYS_UNQ_KEY_TX
  ,SRC_SYS_CD
  ,CRE_RUN_ID
  FROM DATABASENAME.' || 'AR_' || sourcecd || '_SA3
  WHERE (SRC_SYS_UNQ_KEY_TX,SRC_SYS_CD) NOT IN
  (SELECT
   SRC_SYS_UNQ_KEY_TX
  ,SRC_SYS_CD
  FROM DATABASENAME.AR_KEY
  WHERE SRC_SYS_CD = ' ||  '''' || srcsyscd || ''''|| '
  )
  AND SRC_SYS_CD = ' ||  '''' || srcsyscd || ''''|| '
  AND CRE_RUN_ID = ' ||  crerunid || '
  AND ETL_ACTN_CD = ''I''
  GROUP BY 1,2,3,4) T1
  ;';
    
  CALL DBC.SysExecSQL(statmnt);
  
  SET cntrkey = ACTIVITY_COUNT;
  
  IF cntrkey = 0 THEN
     SET tmpMsg = 'APP WARNING: NO NEW HASH_AR_IDs WERE GENERATED';
     SET SQLMSG = TMPMSG;
     SET SQLERRCDE = SQLCODE;
     SET SQLSTTE = SQLSTATE;
     SET SQL_ERR_CDE = SQLCODE;
     SET SQL_STATE = SQLSTATE;
     INSERT INTO DATABASENAME.SPROC_ERR_LOG
     VALUES ('DATABASENAME','SPROC_GEN_AR_KEY','W',
            :SQL_ERR_CDE,:SQL_STATE,:TMPMSG,
            :crerunid,CURRENT_TIMESTAMP(0));
  END IF;
  
— GENERATING NEW SURROGATE KEYS

 SET cntrkey = 0;
 SET tmpMsg = 'INSERT INTO DATABASENAME.AR_KEY…';
   
 SET statmnt = '
  INSERT INTO DATABASENAME.AR_KEY
  SELECT
   T1.SRC_SYS_UNQ_KEY_TX
  ,T1.SRC_SYS_CD
  ,(CAST((TRIM(T1.HASH_AR_ID)||TRIM(SUM(T1.UNIQUENESS_VALUE) OVER 
   (PARTITION BY T1.HASH_AR_ID ORDER BY T1.HASH_AR_ID ROWS UNBOUNDED PRECEDING))) AS DECIMAL(18))) + COALESCE(T2.UNIQUENESS_VALUE,0) AS AR_ID
  ,CAST(T1.HASH_AR_ID AS DECIMAL(18)) AS HASH_AR_ID
  ,(CAST(SUM(T1.UNIQUENESS_VALUE) OVER 
   (PARTITION BY T1.HASH_AR_ID ORDER BY T1.HASH_AR_ID ROWS UNBOUNDED PRECEDING) AS INTEGER)) + COALESCE(T2.UNIQUENESS_VALUE,0) AS UNIQUENESS_VALUE
  ,T1.CRE_RUN_ID
  ,CURRENT_TIMESTAMP(0) AS ICDW_ISRT_TS
  FROM
  (SELECT
   HASH_AR_ID,
   SRC_SYS_UNQ_KEY_TX,
   SRC_SYS_CD,
   CRE_RUN_ID,
   1 AS UNIQUENESS_VALUE
   FROM DATABASENAME.HASH_AR_KEY
  ) T1
  LEFT OUTER JOIN
  (SELECT
   HASH_AR_ID,
   MAX(UNIQUENESS_VALUE) AS UNIQUENESS_VALUE
   FROM DATABASENAME.AR_KEY
   GROUP BY 1
  ) T2
  ON T1.HASH_AR_ID = T2.HASH_AR_ID
  ;';     
  CALL DBC.SysExecSQL(statmnt);
  
  SET cntrkey = ACTIVITY_COUNT;
  
  IF cntrkey = 0 THEN
     SET tmpMsg = 'APP WARNING: NO NEW AR_IDs WERE GENERATED';
     SET SQLMSG = TMPMSG;
     SET SQLERRCDE = SQLCODE;
     SET SQLSTTE = SQLSTATE;
     SET SQL_ERR_CDE = SQLCODE;
     SET SQL_STATE = SQLSTATE;

INSERT INTO DATABASENAME.SPROC_ERR_LOG
VALUES ('DATABASENAME','SPROC_GEN_AR_KEY','W',:SQL_ERR_CDE,
:SQL_STATE,:TMPMSG,:crerunid,CURRENT_TIMESTAMP(0));

 END IF;

Please notice that the process is split in two steps.

The first one generates new hashing keys and the second one generates new surrogate keys by adding the uniqueness value to the hashing keys.The process is also taking in consideration the highest uniqueness value for any existing hashing key to manage key collisions and to assign the uniqueness value correctly.
There are some special considerations when the key table containing the surrogate keys needs to be re-loaded  in case of a disaster because of the uniqueness value.

The surrogate keys can't be re-generated using the stored procedure because the process could generate new surrogate keys different from the original ones because of the uniqueness value.
The surrogate keys need to be copied from the target tables into the key table or restored from the latest backup.
Continue using the stored procedure after the key table is re-loaded.

Add a Comment

Your email address will not be published. Required fields are marked *