Monday, September 17, 2012

Aggregator Transformation

Aggregator Transformation
The Aggregator transformation performs aggregate calculations, such as averages and sums. The Integration Service performs aggregate calculations as it reads and stores data group and row data in an aggregate cache. The Aggregator transformation is unlike the Expression transformation, in that you use the Aggregator transformation to perform calculations on groups. The Expression transformation permits you to perform calculations on a row-by-row basis only.
When you use the transformation language to create aggregate expressions, you can use conditional clauses to filter rows, providing more flexibility than SQL language.
After you create a session that includes an Aggregator transformation, you can enable the session option, Incremental Aggregation. When the Integration Service performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform new aggregation calculations incrementally.

Components of the Aggregator Transformation
The Aggregator is an active transformation, changing the number of rows in the pipeline. The Aggregator transformation has the following components and options:
·         Aggregate cache. The Integration Service stores data in the aggregate cache until it completes aggregate calculations. It stores group values in an index cache and row data in the data cache.
·         Aggregate expression. Enter an expression in an output port. The expression can include non-aggregate expressions and conditional clauses.
·         Group by port. Indicate how to create groups. The port can be any input, input/output, output, or variable port. When grouping data, the Aggregator transformation outputs the last row of each group unless otherwise specified.
·         Sorted input. Select this option to improve session performance. To use sorted input, you must pass data to the Aggregator transformation sorted by group by port, in ascending or descending order.


Configuring Aggregator Transformation Ports
To configure ports in the Aggregator transformation, complete the following tasks:
·         Enter an expression in any output port, using conditional clauses or non-aggregate functions in the port.
·         Create multiple aggregate output ports.
·         Configure any input, input/output, output, or variable port as a group by port.
·         Improve performance by connecting only the necessary input/output ports to subsequent transformations, reducing the size of the data cache.
·         Use variable ports for local variables.
·         Create connections to other transformations as you enter an expression.

Aggregate Expressions
The Designer allows aggregate expressions only in the Aggregator transformation. An aggregate expression can include conditional clauses and non-aggregate functions. It can also include one aggregate function nested within another aggregate function, such as:
MAX( COUNT( ITEM ))

The result of an aggregate expression varies depending on the group by ports used in the transformation. For example, when the Integration Service calculates the following aggregate expression with no group by ports defined, it finds the total quantity of items sold:  SUM( QUANTITY )

However, if you use the same expression, and you group by the ITEM port, the Integration Service returns the total quantity of items sold, by item.


Aggregate Functions
Use the following aggregate functions within an Aggregator transformation. You can nest one aggregate function within another aggregate function.
The transformation language includes the following aggregate functions:

·         AVG
·         COUNT
·         FIRST
·         LAST
·         MAX
·         MEDIAN
·         MIN
·         PERCENTILE
·         STDDEV
·         SUM
·         VARIANCE

When you use any of these functions, you must use them in an expression within an Aggregator transformation.


Nested Aggregate Functions
You can include multiple single-level or multiple nested functions in different output ports in an Aggregator transformation. However, you cannot include both single-level and nested functions in an Aggregator transformation. Therefore, if an Aggregator transformation contains a single-level function in any output port, you cannot use a nested function in any other port in that transformation. When you include single-level and nested functions in the same Aggregator transformation, the Designer marks the mapping or mapplet invalid. If you need to create both single-level and nested functions, create separate Aggregator transformations.


Conditional Clauses
Use conditional clauses in the aggregate expression to reduce the number of rows used in the aggregation. The conditional clause can be any clause that evaluates to TRUE or FALSE.
For example, use the following expression to calculate the total commissions of employees who exceeded their quarterly quota: SUM( COMMISSION, COMMISSION > QUOTA )

Non-Aggregate Functions
You can also use non-aggregate functions in the aggregate expression.
The following expression returns the highest number of items sold for each item (grouped by item). If no items were sold, the expression returns 0.
IIF( MAX( QUANTITY ) > 0, MAX( QUANTITY ), 0))


Null Values in Aggregate Functions
When you configure the Integration Service, you can choose how you want the Integration Service to handle null values in aggregate functions. You can choose to treat null values in aggregate functions as NULL or zero. By default, the Integration Service treats null values as NULL in aggregate functions.

Group By Ports
The Aggregator transformation lets you define groups for aggregations, rather than performing the aggregation across all input data. For example, rather than finding the total company sales, you can find the total sales grouped by region.

To define a group for the aggregate expression, select the appropriate input, input/output, output, and variable ports in the Aggregator transformation. You can select multiple group by ports to create a new group for each unique combination. The Integration Service then performs the defined aggregation for each group.
When you group values, the Integration Service produces one row for each group. If you do not group values, the Integration Service returns one row for all input rows. The Integration Service typically returns the last row of each group (or the last row received) with the result of the aggregation. However, if you specify a particular row to be returned (for example, by using the FIRST function), the Integration Service then returns the specified row.
When selecting multiple group by ports in the Aggregator transformation, the Integration Service uses port order to determine the order by which it groups. Since group order can affect the results, order group by ports to ensure the appropriate grouping. For example, the results of grouping by ITEM_ID then QUANTITY can vary from grouping by QUANTITY then ITEM_ID, because the numeric values for quantity are not necessarily unique.
Non-Aggregate Expressions
Use non-aggregate expressions in group by ports to modify or replace groups. For example, if you want to replace ‘AAA battery’ before grouping, you can create a new group by output port, named CORRECTED_ITEM, using the following expression:
IIF( ITEM = 'AAA battery', battery, ITEM )

Using Sorted Input
You can improve Aggregator transformation performance by using the sorted input option. When you use sorted input, the Integration Service assumes all data is sorted by group and it performs aggregate calculations as it reads rows for a group. When necessary, it stores group information in memory. To use the Sorted Input option, you must pass sorted data to the Aggregator transformation. You can gain performance with sorted ports when you configure the session with multiple partitions.
When you do not use sorted input, the Integration Service performs aggregate calculations as it reads. Since the data is not sorted, the Integration Service stores data for each group until it reads the entire source to ensure all aggregate calculations are accurate.
Sorted Input Conditions
Do not use sorted input if either of the following conditions are true:
·         The aggregate expression uses nested aggregate functions.
·         The session uses incremental aggregation.
·         Source data is data driven.
If you use sorted input and do not sort data correctly, the session fails.


Sorting Data
To use sorted input, you pass sorted data through the Aggregator.
Data must be sorted in the following ways:
·         By the Aggregator group by ports, in the order they appear in the Aggregator transformation.
·         Using the same sort order configured for the session. If data is not in strict ascending or descending order based on the session sort order, the Integration Service fails the session. For example, if you configure a session to use a French sort order, data passing into the Aggregator transformation must be sorted using the French sort order.
For relational and file sources, use the Sorter transformation to sort data in the mapping before passing it to the Aggregator transformation. You can place the Sorter transformation anywhere in the mapping prior to the Aggregator if no transformation changes the order of the sorted data. Group by columns in the Aggregator transformation must be in the same order as they appear in the Sorter transformation.
If the session uses relational sources, you can also use the Number of Sorted Ports option in the Source Qualifier transformation to sort group by columns in the source database. Group by columns must be in the same order in both the Aggregator and Source Qualifier transformations.

Tips
Use sorted input to decrease the use of aggregate caches.
Sorted input reduces the amount of data cached during the session and improves session performance. Use this option with the Sorter transformation to pass sorted data to the Aggregator transformation.
Limit connected input/output or output ports.
Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.
Filter the data before aggregating it.
If you use a Filter transformation in the mapping, place the transformation before the Aggregator transformation to reduce unnecessary aggregation.





0 Comments:

Post a Comment

Subscribe to Post Comments [Atom]

<< Home