Sunday, September 23, 2012

Lookup Transformation

Lookup Transformation

Use a Lookup transformation in a mapping to look up data in a flat file, relational table, view, or synonym. You can import a lookup definition from any flat file or relational database to which both the PowerCenter Client and Integration Service can connect. You can also create a lookup definition from a source qualifier. You can use multiple Lookup transformations in a mapping.
The Integration Service queries the lookup source based on the lookup ports in the transformation and a lookup condition. The Lookup transformation returns the result of the lookup to the target or another transformation.

Perform the following tasks with a Lookup transformation:
·         Get a related value. Retrieve a value from the lookup table based on a value in the source. For example, the source has an employee ID. Retrieve the employee name from the lookup table.
·         Perform a calculation. Retrieve a value from a lookup table and use it in a calculation. For example, retrieve a sales tax percentage, calculate a tax, and return the tax to a target.
·         Update slowly changing dimension tables. Determine whether rows exist in a target.


Configure the Lookup transformation to perform the following types of lookups:
·         Relational or flat file lookup. Perform a lookup on a flat file or a relational table. When you create a Lookup transformation using a relational table as the lookup source, you can connect to the lookup source using ODBC and import the table definition as the structure for the Lookup transformation. When you create a Lookup transformation using a flat file as a lookup source, the Designer invokes the Flat File Wizard.
·         Pipeline lookup. Perform a lookup on application sources such as a JMS, MSMQ, or SAP. Drag the source into the mapping and associate the Lookup transformation with the source qualifier. Configure partitions to improve performance when the Integration Service retrieves source data for the lookup cache.
·         Connected or unconnected lookup. A connected Lookup transformation receives source data, performs a lookup, and returns data to the pipeline. An unconnected Lookup transformation is not connected to a source or target. A transformation in the pipeline calls the Lookup transformation with a :LKP expression. The unconnected Lookup transformation returns one column to the calling transformation.
·         Cached or uncached lookup. Cache the lookup source to improve performance. If you cache the lookup source, you can use a dynamic or static cache. By default, the lookup cache remains static and does not change during the session. With a dynamic cache, the Integration Service inserts or updates rows in the cache. When you cache the target table as the lookup source, you can look up values in the cache to determine if the values exist in the target. The Lookup transformation marks rows to insert or update the target.

Lookup Source Types
When you create a Lookup transformation, you can choose a relational table, flat file, or a source qualifier as the lookup source.

Relational Lookups
When you create a Lookup transformation using a relational table as a lookup source, you can connect to the lookup source using ODBC and import the table definition as the structure for the Lookup transformation.
Use the following options with relational lookups:
·         Override the default SQL statement to add a WHERE clause or to query multiple tables.
·         Sort null data high or low, based on database support.
·         Perform case-sensitive comparisons based on the database support.


Flat File Lookups
When you create a Lookup transformation using a flat file as a lookup source, select a flat file definition in the repository or import the source when you create the transformation. When you import a flat file lookup source, the Designer invokes the Flat File Wizard.
Use the following options with flat file lookups:
·         Use indirect files as lookup sources by configuring a file list as the lookup file name.
·         Use sorted input for the lookup.
·         Sort null data high or low.
·         Use case-sensitive string comparison with flat file lookups.



Using Sorted Input
When you configure a flat file Lookup transformation for sorted input, the condition columns must be grouped. If the condition columns are not grouped, the Integration Service cannot cache the lookup and fails the session. For optimal caching performance, sort the condition columns.
For example, a Lookup transformation has the following condition:
OrderID = OrderID1

In the following flat file lookup source, the keys are grouped, but not sorted. The Integration Service can cache the data, but performance may not be optimal.CustID = CustID1


Pipeline Lookups
Create a pipeline Lookup transformation to perform a lookup on an application source that is not a relational table or flat file. A pipeline Lookup transformation has a source qualifier as the lookup source. The source qualifier can represent any type of source definition, including JMS and MSMQ. The source definition cannot have more than one group.
When you configure a pipeline Lookup transformation, the lookup source and source qualifier are in a different pipeline from the Lookup transformation. The source and source qualifier are in a partial pipeline that contains no target. The Integration Service reads the source data in this pipeline and passes the data to the Lookup transformation to create the cache. You can create multiple partitions in the partial pipeline to improve performance.



To improve performance when processing relational or flat file lookup sources, create a pipeline Lookup transformation instead of a relational or flat file Lookup transformation. You can create partitions to process the lookup source and pass it to the Lookup transformation.
Create a connected or unconnected pipeline Lookup transformation.



Configuring a Pipeline Lookup Transformation in a Mapping
A mapping that contains a pipeline Lookup transformation includes a partial pipeline that contains the lookup source and source qualifier. The partial pipeline does not include a target. The Integration Service retrieves the lookup source data in this pipeline and passes the data to the lookup cache.
The partial pipeline is in a separate target load order group in session properties. You can create multiple partitions in the pipeline to improve performance. You can not configure the target load order with the partial pipeline.  

The mapping contains the following objects:
·         The lookup source definition and source qualifier are in a separate pipeline. The Integration Service creates a lookup cache after it processes the lookup source data in the pipeline.
·         A flat file source contains new department names by employee number.
·         The pipeline Lookup transformation receives Employee_Number and New_Dept from the source file. The pipeline Lookup performs a lookup on Employee_ID in the lookup cache. It retrieves the employee first and last name from the lookup cache.
·         A flat file target receives the Employee_ID, First_Name, Last_Name, and New_Dept from the Lookup transformation.



Connected and Unconnected Lookups
You can configure a connected Lookup transformation to receive input directly from the mapping pipeline, or you can configure an unconnected Lookup transformation to receive input from the result of an expression in another transformation.
The following table lists the differences between connected and unconnected lookups:
Connected Lookup Transformation
The following steps describe how the Integration Service processes a connected Lookup transformation:
1.       A connected Lookup transformation receives input values directly from another transformation in the pipeline.
2.       For each input row, the Integration Service queries the lookup source or cache based on the lookup ports and the condition in the transformation.
3.       If the transformation is uncached or uses a static cache, the Integration Service returns values from the lookup query. If the transformation uses a dynamic cache, the Integration Service inserts the row into the cache when it does not find the row in the cache. When the Integration Service finds the row in the cache, it updates the row in the cache or leaves it unchanged. It flags the row as insert, update, or no change.
4.       The Integration Service passes return values from the query to the next transformation. If the transformation uses a dynamic cache, you can pass rows to a Filter or Router transformation to filter new rows to the target.

Unconnected Lookup Transformation
An unconnected Lookup transformation receives input values from the result of a :LKP expression in another transformation. You can call the Lookup transformation more than once in a mapping.
A common use for unconnected Lookup transformations is to update slowly changing dimension tables. For more information about slowly changing dimension tables, visit the Informatica Knowledge Base at http://my.informatica.com.
The following steps describe the way the Integration Service processes an unconnected Lookup transformation:
1.       An unconnected Lookup transformation receives input values from the result of a :LKP expression in another transformation, such as an Update Strategy transformation.
2.       The Integration Service queries the lookup source or cache based on the lookup ports and condition in the transformation.
3.       The Integration Service returns one value into the return port of the Lookup transformation.
4.       The Lookup transformation passes the return value into the :LKP expression.

Indexes and a Lookup Table
If you have privileges to modify the database containing a lookup table, you can improve lookup initialization time by adding an index to the lookup table. You can improve performance for very large lookup tables. Since the Integration Service queries, sorts, and compares values in lookup columns, the index needs to include every column in a lookup condition.
You can improve performance by indexing the following types of lookup:

·         Cached lookups. You can improve performance by indexing the columns in the lookup ORDER BY. The session log contains the ORDER BY clause.
·         Uncached lookups. Because the Integration Service issues a SELECT statement for each row passing into the Lookup transformation, you can improve performance by indexing the columns in the lookup condition.

The Lookup transformation also enables an associated ports property that you configure when you use a dynamic cache. The associated port is the input port that contains the data to update the lookup cache.
Use the following guidelines to configure lookup ports:
·         If you delete lookup ports from a flat file lookup, the session fails.
·         You can delete lookup ports from a relational lookup if the mapping does not use the lookup port. This reduces the amount of memory the Integration Service needs to run the session.

Configuring Lookup Properties in a Session
When you configure a session, you can configure lookup properties that are unique to sessions:
·         Flat file lookups. Configure lookup location information, such as the source file directory, file name, and the file type.
·         Relational lookups. You can define $Source and $Target variables in the session properties. You can also override connection information to use the $DBConnectionName or $AppConnectionName session parameter.
·         Pipeline lookups. Configure the lookup source file properties such as the source file directory, file name, and the file type. If the source is a relational table or application source, configure the connection information.
Default Lookup Query
The default lookup query contains the following statements:
·         SELECT. The SELECT statement includes all the lookup ports in the mapping. You can view the SELECT statement by generating SQL using the Lookup SQL Override property. Do not add or delete any columns from the default SQL statement.
·         ORDER BY. The ORDER BY clause orders the columns in the same order they appear in the Lookup transformation. The Integration Service generates the ORDER BY clause. You cannot view this when you generate the default SQL using the Lookup SQL Override property.

Overriding the Lookup Query
The lookup SQL override is similar to entering a custom query in a Source Qualifier transformation. You can override the lookup query for a relational lookup. You can enter the entire override, or you can generate and edit the default SQL statement. When the Designer generates the default SQL statement for the lookup SQL override, it includes the lookup/output ports in the lookup condition and the lookup/return port.
Override the lookup query in the following circumstances:
·         Override the ORDER BY clause. Create the ORDER BY clause with fewer columns to increase performance. When you override the ORDER BY clause, you must suppress the generated ORDER BY clause with a comment notation.
Note: If you use pushdown optimization, you cannot override the ORDER BY clause or suppress the generated ORDER BY clause with a comment notation.
·         A lookup table name or column names contains a reserved word. If the table name or any column name in the lookup query contains a reserved word, you must ensure that all reserved words are enclosed in quotes.
·         Use parameters and variables. Use parameters and variables when you enter a lookup SQL override. Use any parameter or variable type that you can define in the parameter file. You can enter a parameter or variable within the SQL statement, or you can use a parameter or variable as the SQL query. For example, you can use a session parameter, $ParamMyLkpOverride, as the lookup SQL query, and set $ParamMyLkpOverride to the SQL statement in a parameter file.
The Designer cannot expand parameters and variables in the query override and does not validate it when you use a parameter or variable. The Integration Service expands the parameters and variables when you run the session.
·         A lookup column name contains a slash (/) character. When generating the default lookup query, the Designer and Integration Service replace any slash character (/) in the lookup column name with an underscore character. To query lookup column names containing the slash character, override the default lookup query, replace the underscore characters with the slash character, and enclose the column name in double quotes.
·         Add a WHERE clause. Use a lookup SQL override to add a WHERE clause to the default SQL statement. You might want to use the WHERE clause to reduce the number of rows included in the cache. When you add a WHERE clause to a Lookup transformation using a dynamic cache, use a Filter transformation before the Lookup transformation to pass rows into the dynamic cache that match the WHERE clause.
Note: The session fails if you include large object ports in a WHERE clause.
·         Other. Use a lookup SQL override if you want to query lookup data from multiple lookups or if you want to modify the data queried from the lookup table before the Integration Service caches the lookup rows. For example, use TO_CHAR to convert dates to strings.

Overriding the ORDER BY Clause
By default, the Integration Service generates an ORDER BY clause for a cached lookup. The ORDER BY clause contains all lookup ports. To increase performance, you can suppress the default ORDER BY clause and enter an override ORDER BY with fewer columns.

Note: If you use pushdown optimization, you cannot override the ORDER BY clause or suppress the generated ORDER BY clause with a comment notation.
The Integration Service always generates an ORDER BY clause, even if you enter one in the override. Place two dashes ‘--’ after the ORDER BY override to suppress the generated ORDER BY clause. For example, a Lookup transformation uses the following lookup condition:
ITEM_ID = IN_ITEM_ID
PRICE <= IN_PRICE

The Lookup transformation includes three lookup ports used in the mapping, ITEM_ID, ITEM_NAME, and PRICE. When you enter the ORDER BY clause, enter the columns in the same order as the ports in the lookup condition. You must also enclose all database reserved words in quotes. Enter the following lookup query in the lookup SQL override:
SELECT ITEMS_DIM.ITEM_NAME, ITEMS_DIM.PRICE, ITEMS_DIM.ITEM_ID FROM ITEMS_DIM ORDER BY
ITEMS_DIM.ITEM_ID, ITEMS_DIM.PRICE --
To override the default ORDER BY clause for a relational lookup, complete the following steps:
1.       Generate the lookup query in the Lookup transformation.
2.       Enter an ORDER BY clause that contains the condition ports in the same order they appear in the Lookup condition.
3.       Place two dashes ‘--’ as a comment notation after the ORDER BY clause to suppress the ORDER BY clause that the Integration Service generates.
If you override the lookup query with an ORDER BY clause without adding comment notation, the lookup fails.
Note: Sybase has a 16 column ORDER BY limitation. If the Lookup transformation has more than 16 lookup/output ports including the ports in the lookup condition, override the ORDER BY clause or use multiple Lookup transformations to query the lookup table.


Reserved Words
If any lookup name or column name contains a database reserved word, such as MONTH or YEAR, the session fails with database errors when the Integration Service executes SQL against the database. You can create and maintain a reserved words file, reswords.txt, in the Integration Service installation directory. When the Integration Service initializes a session, it searches for reswords.txt. If the file exists, the Integration Service places quotes around matching reserved words when it executes SQL against the database.
You may need to enable some databases, such as Microsoft SQL Server and Sybase, to use SQL-92 standards regarding quoted identifiers. Use connection environment SQL to issue the command. For example, with Microsoft SQL Server, use the following command:
SET QUOTED_IDENTIFIER ON
Note: The reserved words file, reswords.txt, is a file that you create and maintain in the Integration Service installation directory. The Integration Service searches this file and places quotes around reserved words when it executes SQL against source, target, and lookup databases.

Guidelines to Overriding the Lookup Query
Use the following guidelines when you override the lookup SQL query:
·         You can only override the lookup SQL query for relational lookups.
·         Configure the Lookup transformation for caching. If you do not enable caching, the Integration Service does not recognize the override.
·         Generate the default query, and then configure the override. This helps ensure that all the lookup/output ports are included in the query. If you add or subtract ports from the SELECT statement, the session fails.
·         Add a source lookup filter to filter the rows that are added to the lookup cache. This ensures the Integration Service only inserts rows in the dynamic cache and target table that match the WHERE clause.
·         To share the cache, use the same lookup SQL override for each Lookup transformation.
·         If you override the ORDER BY clause, the session fails if the ORDER BY clause does not contain the condition ports in the same order they appear in the Lookup condition or if you do not suppress the generated ORDER BY clause with the comment notation.
·         If you use pushdown optimization, you cannot override the ORDER BY clause or suppress the generated ORDER BY clause with comment notation.
·         If the table name or any column name in the lookup query contains a reserved word, you must enclose all reserved words in quotes.

Lookup Condition
The Integration Service finds data in the lookup source with a lookup condition. The lookup condition is similar to the WHERE clause in an SQL query. When you configure a lookup condition in a Lookup transformation, you compare the value of one or more columns in the source data with values in the lookup source or cache.

For example, the source data contains an employee_number. The lookup source table contains employee_ID, first_name, and last_name. You configure the following lookup condition:
employee_ID = employee_number
For each employee_number, the Integration Service returns the employee_ID, last_name, and first_name column from the lookup source.
The Integration Service can return more than one row from the lookup source. You configure the following lookup condition:
employee_ID > employee_number
The Integration Service returns rows for all employee_ID numbers greater than the source employee number.

Use the following guidelines when you enter a condition for a Lookup transformation:
·         The datatypes for the columns in a lookup condition must match.
·         You must enter a lookup condition in all Lookup transformations.
·         Use one input port for each lookup port in the lookup condition. Use the same input port in more than one condition in a transformation.
·         When you enter multiple conditions, the Integration Service evaluates each condition as an AND, not an OR. The Integration Service returns rows that match all the conditions you configure.
·         If you include multiple conditions, enter the conditions in the following order to optimize lookup performance:
−Equal to (=)
−Less than (<), greater than (>), less than or equal to (<=), greater than or equal to (>=)
−Not equal to (!=)
·         The Integration Service matches null values. For example, if an input lookup condition column is NULL, the Integration Service evaluates the NULL equal to a NULL in the lookup.
·         If you configure a flat file lookup for sorted input, the Integration Service fails the session if the condition columns are not grouped. If the columns are grouped, but not sorted, the Integration Service processes the lookup as if you did not configure sorted input.

The Integration Service processes lookup matches differently depending on whether you configure the transformation for a dynamic cache or an uncached or static cache.

Uncached or Static Cache
Use the following guidelines when you configure a Lookup transformation that has a static lookup cache or an uncached lookup source:
·         Use the following operators when you create the lookup condition:
=, >, <, >=, <=, !=
If you include more than one lookup condition, place the conditions in the following order to optimize lookup performance:
−Equal to (=)
−Less than (<), greater than (>), less than or equal to (<=), greater than or equal to (>=)
−Not equal to (!=)
For example, create the following lookup condition:
ITEM_ID = IN_ITEM_ID
PRICE <= IN_PRICE
·         The input value must meet all conditions for the lookup to return a value.
The condition can match equivalent values or supply a threshold condition. For example, you might look for customers who do not live in California, or employees whose salary is greater than $30,000. Depending on the nature of the source and condition, the lookup might return multiple values.

Dynamic Cache
If you configure a Lookup transformation to use a dynamic cache, you can use only the equality operator (=) in the lookup condition.

Handling Multiple Matches
Lookups find a value based on the conditions you set in the Lookup transformation. If the lookup condition is not based on a unique key, or if the lookup source is denormalized, the Integration Service might find multiple matches in the lookup source or cache.
You can configure a Lookup transformation to handle multiple matches in the following ways:
·         Return the first matching value, or return the last matching value. You can configure the transformation to return the first matching value or the last matching value. The first and last values are the first value and last value found in the lookup cache that match the lookup condition. When you cache the lookup source, the Integration Service generates an ORDER BY clause for each column in the lookup cache to determine the first and last row in the cache. The Integration Service then sorts each lookup source column in ascending order.
The Integration Service sorts numeric columns in ascending numeric order (such as 0 to 10), date/time columns from January to December and from the first of the month to the end of the month, and string columns based on the sort order configured for the session.
·         Return any matching value. You can configure the Lookup transformation to return any value that matches the lookup condition. When you configure the Lookup transformation to return any matching value, the transformation returns the first value that matches the lookup condition. It creates an index based on the key ports rather than all Lookup transformation ports. When you use any matching value, performance can improve because the process of indexing rows is simplified.
·         Return an error. When the Lookup transformation uses a static cache or no cache, the Integration Service marks the row as an error, writes the row to the session log by default, and increases the error count by one. When the Lookup transformation uses a dynamic cache, the Integration Service fails the session when it encounters multiple matches either while caching the lookup table or looking up values in the cache that contain duplicate keys. Also, if you configure the Lookup transformation to output old values on updates, the Lookup transformation returns an error when it encounters multiple matches.


Lookup Caches
You can configure a Lookup transformation to cache the lookup file or table. The Integration Service builds a cache in memory when it processes the first row of data in a cached Lookup transformation. It allocates memory for the cache based on the amount you configure in the transformation or session properties. The Integration Service stores condition values in the index cache and output values in the data cache. The Integration Service queries the cache for each row that enters the transformation.
The Integration Service also creates cache files by default in the $PMCacheDir. If the data does not fit in the memory cache, the Integration Service stores the overflow values in the cache files. When the session completes, the Integration Service releases cache memory and deletes the cache files unless you configure the Lookup transformation to use a persistent cache.
When configuring a lookup cache, you can configure the following options:
·         Persistent cache
·         Recache from lookup source
·         Static cache
·         Dynamic cache
·         Shared cache
·         Pre-build lookup cache
Note: You can use a dynamic cache for relational or flat file lookups.

Creating a Reusable Pipeline Lookup Transformation
Create a reusable pipeline Lookup transformation in the Transformation Developer. When you create the transformation, choose Source for the lookup table location. The Transformation Developer displays a list of source definitions from the repository.
When you choose a source qualifier that represents a relational table or flat file source definition, the Designer creates a relational or flat file Lookup transformation. When the source qualifier represents an application source, the Designer creates a pipeline Lookup transformation. To create a pipeline Lookup transformation for a relational or flat file lookup source, change the source type to Source Qualifier after you create the transformation. Enter the name of the source definition in the Lookup Table property.
When you drag a reusable pipeline Lookup transformation into a mapping, the Mapping Designer adds the source definition from the Lookup Table property to the mapping. The Designer adds the Source Qualifier transformation.
To change the Lookup Table Name to another source qualifier in the mapping, click the Open button in the Lookup Table Name property. Choose a Source Qualifier from the list.

Creating a Non-Reusable Pipeline Lookup Transformation
Create a non-reusable pipeline Lookup transformation in the Mapping Designer. Drag a source definition into a mapping. When you create the transformation in the Mapping Designer, select Source Qualifier as the lookup table location. The Mapping Designer displays a list of the source qualifiers in the mapping. When you select a source qualifier, the Mapping Designer populates the Lookup transformation with port names and attributes from the source qualifier you choose.

Tips
Add an index to the columns used in a lookup condition.
If you have privileges to modify the database containing a lookup table, you can improve performance for both cached and uncached lookups. This is important for very large lookup tables. Since the Integration Service needs to query, sort, and compare values in these columns, the index needs to include every column used in a lookup condition.
Place conditions with an equality operator (=) first.
If you include more than one lookup condition, place the conditions in the following order to optimize lookup performance:
·         Equal to (=)
·         Less than (<), greater than (>), less than or equal to (<=), greater than or equal to (>=)
·         Not equal to (!=)

Cache small lookup tables.
Improve session performance by caching small lookup tables. The result of the lookup query and processing is the same, whether or not you cache the lookup table.
Join tables in the database.
If the lookup table is on the same database as the source table in the mapping and caching is not feasible, join the tables in the source database rather than using a Lookup transformation.

Use a persistent lookup cache for static lookups.
If the lookup source does not change between sessions, configure the Lookup transformation to use a persistent lookup cache. The Integration Service then saves and reuses cache files from session to session, eliminating the time required to read the lookup source.

Call unconnected Lookup transformations with the :LKP reference qualifier.
When you write an expression using the :LKP reference qualifier, you call unconnected Lookup transformations only. If you try to call a connected Lookup transformation, the Designer displays an error and marks the mapping invalid.

Configure a pipeline Lookup transformation to improve performance when processing a relational or flat file lookup source.
You can create partitions to process a relational or flat file lookup source when you define the lookup source as a source qualifier. Configure a non-reusable pipeline Lookup transformation and create partitions in the partial pipeline that processes the lookup source.






 

0 Comments:

Post a Comment

Subscribe to Post Comments [Atom]

<< Home