Wednesday, 27 April 2011

Working with XML files

There are three types of XML transformations, listed below:

XML Source Qualifier Transformation: You can add an XML Source Qualifier transformation to a mapping by dragging an XML source definition to the Mapping Designer workspace or by manually creating one.

  • We can link one XML source definition to one XML Source Qualifier transformation.

  • We cannot link ports from more than one group in an XML Source Qualifier transformation to ports in the same target transformation.

XML Parser Transformation: The XML Parser transformation lets you extract XML data from messaging systems, such as TIBCO or MQ Series, and from other sources, such as files or databases.

  • Used when we need to extract XML data from a TIBCO source and pass the data to relational targets.

  • The XML Parser transformation reads XML data from a single input port and writes data to one or more output ports.

XML Generator Transformation: The XML Generator transformation lets you generate XML and pass it to messaging systems, such as TIBCO and MQ Series, or to other targets, such as files or databases.

  • Used when we need to extract data from relational sources and pass XML data to targets.

  • The XML Generator transformation accepts data from multiple ports and writes XML through a single output port.

Stage 1 (Oracle to XML)

  • We are going to generate an XML file as output, with the Oracle EMP table as the source.

Step 1: Generate the XML target file.

  • Import the same EMP table as the source table.
  • Go to Targets and click Import XML Definition.
  • Then choose the Non-XML source from the left-hand pane.
  • Move the EMP table (the source table) from All Sources to Selected Sources.
  • Then click Open to bring the target table into the Target Designer.
  • The sequential steps to generate the XML target table are shown in the snapshots below.

Fig a: Importing the XML definition

Fig b: Stage 1 of generating the file; just click OK in this window.

Fig c: Specify the name for the target table and click Next.

Fig d: Finally, click Finish in this window.

Step 2: Design the mapping and connect the Source Qualifier (SQ) straight to the target table.


  • Name the mapping as per the naming convention.
  • Save the changes.

Step 3: Create the task and the workflow.


  • Double-click the session in the workflow, go to the Mapping tab, and specify the output file directory (C:/).
  • Run the workflow, then check the C drive for a file named emp.xml (a hypothetical fragment of such a file is sketched below).
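
The exact element names in the generated file depend on the XML definition that PowerCenter creates from the EMP table, but the output should look roughly like this hypothetical fragment (element names and values are illustrative only):

    <EMP>
      <EMPNO>7369</EMPNO>
      <ENAME>SMITH</ENAME>
      <JOB>CLERK</JOB>
      <SAL>800</SAL>
      <DEPTNO>20</DEPTNO>
    </EMP>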

Stage 2 (XML to Oracle)

  • Here the source is the XML file and the target is the Oracle table.

Step 1: Import the source XML file and the target definition.

  • Go to Sources and click Import XML Definition.
  • Browse for the emp.xml file and open it.
  • The first three windows are the same as in the previous case.
  • The target table is the same EMP table.

Step 2: Design the mapping.

  • The connections for this mapping are made in the following way.


  • Save the mapping.

Step 3: Create the task and the workflow.

  • Create the task and the workflow using the naming conventions.
  • Go to the Mapping tab and click the source in the left-hand pane to specify the path for the input file.


Step 4: Preview the output on the target table.


Tuesday, 26 April 2011

SCD Type 3

SCD Type 3:

This method has limited history preservation, and we are going to use skey as the primary key here.

Source table: (01-01-2011)

Empno   Ename   Sal
101     A       1000
102     B       2000
103     C       3000

Target Table: (01-01-2011)

Empno   Ename   C-sal   P-sal
101     A       1000    -
102     B       2000    -
103     C       3000    -

Source Table: (01-02-2011)

Empno   Ename   Sal
101     A       1000
102     B       4566
103     C       3000

Target Table (01-02-2011):

Empno   Ename   C-sal   P-sal
101     A       1000    -
102     B       4566    Null
103     C       3000    -
102     B       4544    4566

I hope the tables above make clear what we are trying to do:

Step 1: In the Mapping Designer, create a mapping as shown below. This mapping uses Lookup, Expression, Filter, and Update Strategy transformations to drive the purpose. Each transformation is explained below.


Step 2: Here we look at the purpose and usage of each transformation used in the above mapping.

Lookup Transformation: The Lookup transformation looks up the target table and compares it with the source table. Based on the lookup condition, it decides whether the data being loaded into the target table needs to be inserted, updated, or deleted.

  • As usual, connect the Empno column from the Source Qualifier to the Lookup transformation. The Lookup transformation itself is created on the target table.
  • Next, specify the lookup condition empno = empno1.
  • Finally, specify the Connection Information (Oracle) and the lookup policy on multiple match (Use Last Value) in the Properties tab.

Expression Transformation:

We use the Expression transformation to logically separate the rows to be inserted from the rows to be updated.

  • Drag all the ports from the Source Qualifier and the Lookup into the Expression transformation.
  • Add two ports and rename them Insert and Update.
  • These two ports are output-only. Specify the conditions below in the Expression Editor for the respective ports.

Insert: isnull(ENO1)

Update: iif(not isnull(ENO1) and decode(SAL,Curr_Sal,1,0)=0,1,0)
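
Reading these two flags back (assuming ENO1 and Curr_Sal are the ports returned by the Lookup and SAL comes from the Source Qualifier), the logic is:

    -- Insert flag: the lookup found no matching Empno, so the row is new
    isnull(ENO1)

    -- Update flag: the row already exists and the salary has changed
    iif( not isnull(ENO1) and decode(SAL, Curr_Sal, 1, 0) = 0, 1, 0 )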


Filter Transformation: We use two Filter transformations to physically split the data into two separate streams, one for the insert process and one for the update process.

Filter 1:

  • Drag the Insert port and the other three ports that came from the Source Qualifier through the Expression into the first filter.
  • In the Properties tab, specify the filter condition as Insert.


Filter 2:

  • Drag the Update port and the other four ports that came from the Lookup through the Expression into the second filter.
  • In the Properties tab, specify the filter condition as Update.


Update Strategy: Finally, we need Update Strategy transformations to insert into or update the target table.

Update Strategy 1: This one is intended to insert into the target table.

  • Drag all the ports except Insert from the first filter into this transformation.
  • In the Properties tab, set the update strategy expression to 0 or DD_INSERT.

Update Strategy 2: This one is intended to update the target table.

  • Drag all the ports except Update from the second filter into this transformation.
  • In the Properties tab, set the update strategy expression to 1 or DD_UPDATE.


Finally, connect both Update Strategy transformations to two instances of the target.

Step 3: Create a session for this mapping and run the workflow.

Step 4: Observe the output; it should be the same as the second target table.


Monday, 25 April 2011

Mapping Templates

Mapping Templates Overview:

A mapping template is a drawing in Visio that represents a PowerCenter mapping. You can configure rules and parameters in a mapping template to specify the transformation logic.

Use the Informatica Stencil and the Informatica toolbar in Mapping Architect for Visio to create a mapping template. The Informatica Stencil contains shapes that represent mapping objects that you can use to create a mapping template. The Informatica toolbar contains buttons for the tasks you can perform on a mapping template.

You can create a mapping template manually, or you can create a mapping template by importing a PowerCenter mapping.

Creating a Mapping Template Manually:

You can use the Informatica Stencil and the Informatica toolbar to create a mapping template. Save and publish a mapping template to create the mapping template files.

To create a mapping template manually, complete the following steps:

1. Start Mapping Architect for Visio.

2. Verify that the Informatica Stencil and Informatica toolbar are available.

3. Drag the mapping objects from the Informatica Stencil to the drawing window. Use the mapping objects to create a visual representation of the mapping.

4. Create links. Create links to connect mapping objects.

5. Configure link rules. Configure rules for each link in the mapping template to indicate how data moves from one mapping object to another. Use parameters to make the rules flexible.

6. Configure the mapping objects. Add a group or expression required by the transformations in the mapping template. To create multiple mappings, set a parameter for the source or target definition.

7. Declare mapping parameters and variables to use when you run sessions in PowerCenter. After you import the mappings created from the mapping template into PowerCenter, you can use the mapping parameters and variables in the session or workflow.

8. Validate the mapping template.

9. Save the mapping template. Save changes to the mapping template drawing file.

10. Publish the mapping template. When you publish the mapping template, Mapping Architect for Visio generates a mapping template XML file and a mapping template parameter file (param.xml). If you edit the mapping template drawing file after you publish it, you need to publish it again. Do not edit the mapping template XML file.

Importing a Mapping Template from a PowerCenter Mapping:

If you have a PowerCenter mapping that you want to use as a basis for a mapping template, export the mapping to a mapping XML file and then use the mapping XML file to create a mapping template.

Note: Export the mapping XML file within the current PowerCenter release. Informatica does not support imported objects from a different release.

To import a mapping template from a PowerCenter mapping, complete the following steps:

1. Export a PowerCenter mapping. In the Designer, select the mapping that you want to base the mapping template on and export it to an XML file.

2. Start Mapping Architect for Visio.

3. Verify that the Informatica stencil and Informatica toolbar are available.

4. Import the mapping. On the Informatica toolbar, click the Create Template from Mapping XML button. Mapping Architect for Visio determines the mapping objects and links included in the mapping and adds the appropriate objects to the drawing window.

5. Verify links. Create or verify links that connect mapping objects.

6. Configure link rules. Configure rules for each link in the mapping template to indicate how data moves from one mapping object to another. Use parameters to make the rules flexible.

7. Configure the mapping objects. Add a group or expression required by the transformations in the mapping template. To create multiple mappings, set a parameter for the source or target definition.

8. Declare mapping parameters and variables to use when you run the session in PowerCenter. After you import the mappings created from the mapping template into PowerCenter, you can use the mapping parameters and variables in the session or workflow.

Note: If the PowerCenter mapping contains mapping parameters and variables, it is possible that the mapping parameters and variables ($$ParameterName) may not work for all mappings you plan to create from the mapping template. Modify or declare new mapping parameters and variables appropriate for running the new mappings created from the mapping template.

9. Validate the mapping template.

10. Save the mapping template. Save changes to the mapping template drawing file.

11. Publish the mapping template. When you publish the mapping template, Mapping Architect for Visio generates a mapping template XML file and a mapping template parameter file (param.xml).

If you make any change to the mapping template after publishing, you need to publish the mapping template again. Do not edit the mapping template XML file.

Note: Mapping Architect for Visio fails to create a mapping template if you import a mapping that includes an unsupported source type, target type, or mapping object.

Grid Processing

Grid Processing Overview:

When a PowerCenter domain contains multiple nodes, you can configure workflows and sessions to run on a grid. When you run a workflow on a grid, the Integration Service runs a service process on each available node of the grid to increase performance and scalability. When you run a session on a grid, the Integration Service distributes session threads to multiple DTM processes on nodes in the grid to increase performance and scalability.

You create the grid and configure the Integration Service in the Administration Console. To run a workflow on a grid, you configure the workflow to run on the Integration Service associated with the grid. To run a session on a grid, configure the session to run on the grid.

The Integration Service distributes workflow tasks and session threads based on how you configure the workflow or session to run:

  • Running workflows on a grid. The Integration Service distributes workflows across the nodes in a grid. It also distributes the Session, Command, and predefined Event-Wait tasks within workflows across the nodes in a grid.
  • Running sessions on a grid. The Integration Service distributes session threads across nodes in a grid.

Note: To run workflows on a grid, you must have the Server grid option. To run sessions on a grid, you must have the Session on Grid option.

Running Workflows on a Grid:

When you run a workflow on a grid, the master service process runs the workflow and all tasks except Session, Command, and predefined Event-Wait tasks, which it may distribute to other nodes. The master service process is the Integration Service process that runs the workflow, monitors service processes running on other nodes, and runs the Load Balancer. The Scheduler runs on the master service process node, so it uses the date and time for the master service process node to start scheduled workflows.

The Load Balancer is the component of the Integration Service that dispatches Session, Command, and predefined Event-Wait tasks to the nodes in the grid. The Load Balancer distributes tasks based on node availability. If the Integration Service is configured to check resources, the Load Balancer also distributes tasks based on resource availability.

For example, a workflow contains a Session task, a Decision task, and a Command task. You specify a resource requirement for the Session task. The grid contains four nodes, and Node 4 is unavailable. The master service process runs the Start and Decision tasks. The Load Balancer distributes the Session and Command tasks to nodes on the grid based on resource availability and node availability.

Running Sessions on a Grid:

When you run a session on a grid, the master service process runs the workflow and all tasks except Session, Command, and predefined Event-Wait tasks as it does when you run a workflow on a grid. The Scheduler runs on the master service process node, so it uses the date and time for the master service process node to start scheduled workflows. In addition, the Load Balancer distributes session threads to DTM processes running on different nodes.

When you run a session on a grid, the Load Balancer distributes session threads based on the following factors:

  • Node availability. The Load Balancer verifies which nodes are currently running, enabled, and available for task dispatch.
  • Resource availability. If the Integration Service is configured to check resources, it identifies nodes that have resources required by mapping objects in the session.
  • Partitioning configuration. The Load Balancer dispatches groups of session threads to separate nodes based on the partitioning configuration.

You might want to configure a session to run on a grid when the workflow contains a session that takes a long time to run.

Grid Connectivity and Recovery

When you run a workflow or session on a grid, service processes and DTM processes run on different nodes. Network failures can cause connectivity loss between processes running on separate nodes. Services may shut down unexpectedly, or you may disable the Integration Service or service processes while a workflow or session is running. The Integration Service failover and recovery behavior in these situations depends on the service process that is disabled, shuts down, or loses connectivity. Recovery behavior also depends on the following factors:

  • High availability option. When you have high availability, workflows fail over to another node if the node or service shuts down. If you do not have high availability, you can manually restart a workflow on another node to recover it.
  • Recovery strategy. You can configure a workflow to suspend on error. You configure a recovery strategy for tasks within the workflow. When a workflow suspends, the recovery behavior depends on the recovery strategy you configure for each task in the workflow.
  • Shutdown mode. When you disable an Integration Service or service process, you can specify that the service completes, aborts, or stops processes running on the service. Behavior differs when you disable the Integration Service or you disable a service process. Behavior also differs when you disable a master service process or a worker service process. The Integration Service or service process may also shut down unexpectedly. In this case, the failover and recovery behavior depend on which service process shuts down and the configured recovery strategy.
  • Running mode. If the workflow runs on a grid, the Integration Service can recover workflows and tasks on another node. If a session runs on a grid, you cannot configure a resume recovery strategy.
  • Operating mode. If the Integration Service runs in safe mode, recovery is disabled for sessions and workflows.

Note: You cannot configure an Integration Service to fail over in safe mode if it runs on a grid.

Workflow Variables

Workflow Variables Overview:

You can create and use variables in a workflow to reference values and record information. For example, use a variable in a Decision task to determine whether the previous task ran properly. If it did, you can run the next task. If not, you can stop the workflow.

Use the following types of workflow variables:

  • Predefined workflow variables. The Workflow Manager provides predefined workflow variables for tasks within a workflow.
  • User-defined workflow variables. You create user-defined workflow variables when you create a workflow.

Use workflow variables when you configure the following types of tasks:

  • Assignment tasks. Use an Assignment task to assign a value to a user-defined workflow variable. For example, you can increment a user-defined counter variable by setting the variable to its current value plus 1.
  • Decision tasks. Decision tasks determine how the Integration Service runs a workflow. For example, use the Status variable to run a second session only if the first session completes successfully.
  • Links. Links connect each workflow task. Use workflow variables in links to create branches in the workflow. For example, after a Decision task, you can create one link to follow when the decision condition evaluates to true, and another link to follow when the decision condition evaluates to false.
  • Timer tasks. Timer tasks specify when the Integration Service begins to run the next task in the workflow. Use a user-defined date/time variable to specify the time the Integration Service starts to run the next task.

Use the following keywords to write expressions for user-defined and predefined workflow variables:

  • AND
  • OR
  • NOT
  • TRUE
  • FALSE
  • NULL
  • SYSDATE

Predefined Workflow Variables:

Each workflow contains a set of predefined variables that you use to evaluate workflow and task conditions. Use the following types of predefined variables:

  • Task-specific variables. The Workflow Manager provides a set of task-specific variables for each task in the workflow. Use task-specific variables in a link condition to control the path the Integration Service takes when running the workflow. The Workflow Manager lists task-specific variables under the task name in the Expression Editor.
  • Built-in variables. Use built-in variables in a workflow to return run-time or system information such as folder name, Integration Service Name, system date, or workflow start time. The Workflow Manager lists built-in variables under the Built-in node in the Expression Editor.

Task-specific variables, the task types they apply to, and their data types:

Condition (Decision; Integer)
Evaluation result of the decision condition expression. If the task fails, the Workflow Manager keeps the condition set to null.
Sample syntax: $Dec_TaskStatus.Condition = <TRUE | FALSE | NULL | any integer>

EndTime (All tasks; Date/Time)
Date and time the associated task ended. Precision is to the second.
Sample syntax: $s_item_summary.EndTime > TO_DATE('11/10/2004 08:13:25')

ErrorCode (All tasks; Integer)
Last error code for the associated task. If there is no error, the Integration Service sets ErrorCode to 0 when the task completes.
Sample syntax: $s_item_summary.ErrorCode = 24013
Note: You might use this variable when a task consistently fails with this final error message.

ErrorMsg (All tasks; Nstring)
Last error message for the associated task. If there is no error, the Integration Service sets ErrorMsg to an empty string when the task completes. Variables of type Nstring can have a maximum length of 600 characters.
Sample syntax: $s_item_summary.ErrorMsg = 'PETL_24013 Session run completed with failure'
Note: You might use this variable when a task consistently fails with this final error message.

FirstErrorCode (Session; Integer)
Error code for the first error message in the session. If there is no error, the Integration Service sets FirstErrorCode to 0 when the session completes.
Sample syntax: $s_item_summary.FirstErrorCode = 7086

FirstErrorMsg (Session; Nstring)
First error message in the session. If there is no error, the Integration Service sets FirstErrorMsg to an empty string when the task completes. Variables of type Nstring can have a maximum length of 600 characters.
Sample syntax: $s_item_summary.FirstErrorMsg = 'TE_7086 Tscrubber: Debug info… Failed to evalWrapUp'

PrevTaskStatus (All tasks; Integer)
Status of the previous task in the workflow that the Integration Service ran. Statuses include: ABORTED, FAILED, STOPPED, SUCCEEDED. Use these keywords when writing expressions to evaluate the status of the previous task.
Sample syntax: $Dec_TaskStatus.PrevTaskStatus = FAILED

SrcFailedRows (Session; Integer)
Total number of rows the Integration Service failed to read from the source.
Sample syntax: $s_dist_loc.SrcFailedRows = 0

SrcSuccessRows (Session; Integer)
Total number of rows successfully read from the sources.
Sample syntax: $s_dist_loc.SrcSuccessRows > 2500

StartTime (All tasks; Date/Time)
Date and time the associated task started. Precision is to the second.
Sample syntax: $s_item_summary.StartTime > TO_DATE('11/10/2004 08:13:25')

Status (All tasks; Integer)
Status of the previous task in the workflow. Statuses include: ABORTED, DISABLED, FAILED, NOTSTARTED, STARTED, STOPPED, SUCCEEDED. Use these keywords when writing expressions to evaluate the status of the current task.
Sample syntax: $s_dist_loc.Status = SUCCEEDED

TgtFailedRows (Session; Integer)
Total number of rows the Integration Service failed to write to the target.
Sample syntax: $s_dist_loc.TgtFailedRows = 0

TgtSuccessRows (Session; Integer)
Total number of rows successfully written to the target.
Sample syntax: $s_dist_loc.TgtSuccessRows > 0

TotalTransErrors (Session; Integer)
Total number of transformation errors.
Sample syntax: $s_dist_loc.TotalTransErrors = 5

User-Defined Workflow Variables:

You can create variables within a workflow. When you create a variable in a workflow, it is valid only in that workflow. Use the variable in tasks within that workflow. You can edit and delete user-defined workflow variables.

Use user-defined variables when you need to make a workflow decision based on criteria you specify. For example, you create a workflow to load data to an orders database nightly. You also need to load a subset of this data to headquarters periodically, every tenth time you update the local orders database. Create separate sessions to update the local database and the one at headquarters.

clip_image002

Use a user-defined variable to determine when to run the session that updates the orders database at headquarters.

To configure user-defined workflow variables, complete the following steps:

1. Create a persistent workflow variable, $$WorkflowCount, to represent the number of times the workflow has run.

2. Add a Start task and both sessions to the workflow.

3. Place a Decision task after the session that updates the local orders database. Set up the decision condition to check whether the number of workflow runs is evenly divisible by 10. Use the modulus (MOD) function to do this, as shown in the sketch after these steps.

4. Create an Assignment task to increment the $$WorkflowCount variable by one.

5. Link the Decision task to the session that updates the database at headquarters when the decision condition evaluates to true. Link it to the Assignment task when the decision condition evaluates to false. When you configure workflow variables using conditions, the session that updates the local database runs every time the workflow runs. The session that updates the database at headquarters runs every 10th time the workflow runs.
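
In expression form, the two pieces of logic above might look like this (a sketch; $$WorkflowCount is the persistent variable from step 1):

    -- Decision task condition: true on every tenth run
    MOD($$WorkflowCount, 10) = 0

    -- Assignment task: expression assigned to $$WorkflowCount
    $$WorkflowCount + 1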

Creating User-Defined Workflow Variables:

You can create workflow variables for a workflow in the workflow properties.

To create a workflow variable:

1. In the Workflow Designer, create a new workflow or edit an existing one.

2. Select the Variables tab.

3. Click Add.

4. Enter the information in the following table and click OK:

Name
Variable name. The correct format is $$VariableName. Workflow variable names are not case sensitive. Do not use a single dollar sign ($) for a user-defined workflow variable; the single dollar sign is reserved for predefined workflow variables.

Data type
Data type of the variable. You can select from the following data types: Date/Time, Double, Integer, Nstring.

Persistent
Whether the variable is persistent. Enable this option if you want the value of the variable retained from one execution of the workflow to the next.

Default Value
Default value of the variable. The Integration Service uses this value for the variable during sessions if you do not set a value for the variable in the parameter file and there is no value stored in the repository.
Variables of type Date/Time can have the following formats:
- MM/DD/RR
- MM/DD/YYYY
- MM/DD/RR HH24:MI
- MM/DD/YYYY HH24:MI
- MM/DD/RR HH24:MI:SS
- MM/DD/YYYY HH24:MI:SS
- MM/DD/RR HH24:MI:SS.MS
- MM/DD/YYYY HH24:MI:SS.MS
- MM/DD/RR HH24:MI:SS.US
- MM/DD/YYYY HH24:MI:SS.US
- MM/DD/RR HH24:MI:SS.NS
- MM/DD/YYYY HH24:MI:SS.NS
You can use the following separators: dash (-), slash (/), backslash (\), colon (:), period (.), and space. The Integration Service ignores extra spaces. You cannot use one- or three-digit values for year or the "HH12" format for hour.
Variables of type Nstring can have a maximum length of 600 characters.

Is Null
Whether the default value of the variable is null. If the default value is null, enable this option.

Description
Description associated with the variable.

5. To validate the default value of the new workflow variable, click the Validate button.

6. Click Apply to save the new workflow variable.

7. Click OK.
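
Because $$WorkflowCount is persistent, the Integration Service normally carries its value in the repository from one run to the next. If you want to seed or reset it, you can also assign it in a workflow parameter file (a sketch; the folder and workflow names are hypothetical):

    [MyFolder.WF:wf_load_orders]
    $$WorkflowCount=0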

Friday, 22 April 2011

Constraint-Based Loading:

Constraint-Based Loading:

In the Workflow Manager, you can specify constraint-based loading for a session. When you select this option, the Integration Service orders the target load on a row-by-row basis. For every row generated by an active source, the Integration Service loads the corresponding transformed row first to the primary key table, then to any foreign key tables. Constraint-based loading depends on the following requirements:

  • Active source. Related target tables must have the same active source.
  • Key relationships. Target tables must have key relationships.
  • Target connection groups. Targets must be in one target connection group.
  • Treat rows as insert. Use this option when you insert into the target. You cannot use updates with constraint-based loading.

Active Source:

When target tables receive rows from different active sources, the Integration Service reverts to normal loading for those tables, but loads all other targets in the session using constraint-based loading when possible. For example, a mapping contains three distinct pipelines. The first two contain a source, source qualifier, and target. Since these two targets receive data from different active sources, the Integration Service reverts to normal loading for both targets. The third pipeline contains a source, Normalizer, and two targets. Since these two targets share a single active source (the Normalizer), the Integration Service performs constraint-based loading: loading the primary key table first, then the foreign key table.

Key Relationships:

When target tables have no key relationships, the Integration Service does not perform constraint-based loading.

Similarly, when target tables have circular key relationships, the Integration Service reverts to a normal load. For example, you have one target containing a primary key and a foreign key related to the primary key in a second target. The second target also contains a foreign key that references the primary key in the first target. The Integration Service cannot enforce constraint-based loading for these tables. It reverts to a normal load.

Target Connection Groups:

The Integration Service enforces constraint-based loading for targets in the same target connection group. If you want to specify constraint-based loading for multiple targets that receive data from the same active source, you must verify the tables are in the same target connection group. If the tables with the primary key-foreign key relationship are in different target connection groups, the Integration Service cannot enforce constraint-based loading when you run the workflow. To verify that all targets are in the same target connection group, complete the following tasks:

  • Verify all targets are in the same target load order group and receive data from the same active source.
  • Use the default partition properties and do not add partitions or partition points.
  • Define the same target type for all targets in the session properties.
  • Define the same database connection name for all targets in the session properties.
  • Choose normal mode for the target load type for all targets in the session properties.

Treat Rows as Insert:

Use constraint-based loading when the session option Treat Source Rows As is set to insert. You might get inconsistent data if you select a different Treat Source Rows As option and you configure the session for constraint-based loading.

When the mapping contains Update Strategy transformations and you need to load data to a primary key table first, split the mapping using one of the following options:

  • Load primary key table in one mapping and dependent tables in another mapping. Use constraint-based loading to load the primary table.
  • Perform inserts in one mapping and updates in another mapping.

Constraint-based loading does not affect the target load ordering of the mapping. Target load ordering defines the order in which the Integration Service reads the sources in each target load order group in the mapping. A target load order group is a collection of source qualifiers, transformations, and targets linked together in a mapping. Constraint-based loading establishes the order in which the Integration Service loads individual targets within a set of targets receiving data from a single source qualifier.

Example

The following mapping is configured to perform constraint-based loading:


In the first pipeline, target T_1 has a primary key, and T_2 and T_3 contain foreign keys referencing the T_1 primary key. T_3 has a primary key that T_4 references as a foreign key.

Since these tables receive records from a single active source, SQ_A, the Integration Service loads rows to the target in the following order:

1. T_1

2. T_2 and T_3 (in no particular order)

3. T_4

The Integration Service loads T_1 first because it has no foreign key dependencies and contains a primary key referenced by T_2 and T_3. The Integration Service then loads T_2 and T_3, but since T_2 and T_3 have no dependencies on each other, they are not loaded in any particular order. The Integration Service loads T_4 last because it has a foreign key that references a primary key in T_3. After loading the first set of targets, the Integration Service begins reading source B. If there are no key relationships between T_5 and T_6, the Integration Service reverts to a normal load for both targets.

If T_6 has a foreign key that references a primary key in T_5, since T_5 and T_6 receive data from a single active source, the Aggregator AGGTRANS, the Integration Service loads rows to the tables in the following order:

  • T_5
  • T_6

T_1, T_2, T_3, and T_4 are in one target connection group if you use the same database connection for each target, and you use the default partition properties. T_5 and T_6 are in another target connection group together if you use the same database connection for each target and you use the default partition properties. The Integration Service includes T_5 and T_6 in a different target connection group because they are in a different target load order group from the first four targets.

Enabling Constraint-Based Loading:

When you enable constraint-based loading, the Integration Service orders the target load on a row-by-row basis. To enable constraint-based loading:

  1. In the General Options settings of the Properties tab, choose Insert for the Treat Source Rows As property.
  2. Click the Config Object tab. In the Advanced settings, select Constraint Based Load Ordering.
  3. Click OK.

Thursday, 21 April 2011

Target Load Order

Target Load Plan

When you use a mapplet in a mapping, the Mapping Designer lets you set the target load plan for sources within the mapplet.

Setting the Target Load Order

You can configure the target load order for a mapping containing any type of target definition. In the Designer, you can set the order in which the Integration Service sends rows to targets in different target load order groups in a mapping. A target load order group is the collection of source qualifiers, transformations, and targets linked together in a mapping. You can set the target load order if you want to maintain referential integrity when inserting, deleting, or updating tables that have the primary key and foreign key constraints.

The Integration Service reads sources in a target load order group concurrently, and it processes target load order groups sequentially.

To specify the order in which the Integration Service sends data to targets, create one source qualifier for each target within a mapping. To set the target load order, you then determine in which order the Integration Service reads each source in the mapping.

The following figure shows two target load order groups in one mapping:


In this mapping, the first target load order group includes ITEMS, SQ_ITEMS, and T_ITEMS. The second target load order group includes all other objects in the mapping, including the TOTAL_ORDERS target. The Integration Service processes the first target load order group, and then the second target load order group.

When it processes the second target load order group, it reads data from both sources at the same time.

To set the target load order:

  1. Create a mapping that contains multiple target load order groups.
  2. Click Mappings > Target Load Plan.
  3. The Target Load Plan dialog box lists all Source Qualifier transformations in the mapping and the targets that receive data from each source qualifier.
  4. Select a source qualifier from the list.
  5. Click the Up and Down buttons to move the source qualifier within the load order.
  6. Repeat steps 4 to 5 for any other source qualifiers you want to reorder, then click OK.

Wednesday, 20 April 2011

Type 2

SCD 2 (Complete):

Let us drive the point home with a simple scenario. For example, in the current month (01-01-2010) we are given a source table with three columns and three rows (Empno, Ename, Sal). In the next month (01-02-2010) a new employee is added and one existing record changes. We will use the SCD Type 2 approach to extract and load the records into the target table.
  • The point to notice here is that if any employee's salary is updated, the history of that employee is kept, with the current date as the start date and the previous date as the end date.

Source Table: (01-01-11)

Emp no   Ename   Sal
101      A       1000
102      B       2000
103      C       3000

Target Table: (01-01-11)

Skey   Emp no   Ename   Sal    S-date     E-date   Ver   Flag
100    101      A       1000   01-01-10   Null     1     1
200    102      B       2000   01-01-10   Null     1     1
300    103      C       3000   01-01-10   Null     1     1

Source Table: (01-02-11)

Emp no   Ename   Sal
101      A       1000
102      B       2500
103      C       3000
104      D       4000

Target Table: (01-02-11)

Skey   Emp no   Ename   Sal    S-date     E-date     Ver   Flag
100    101      A       1000   01-02-10   Null       1     1
200    102      B       2000   01-02-10   Null       1     1
300    103      C       3000   01-02-10   Null       1     1
201    102      B       2500   01-02-10   01-01-10   2     0
400    104      D       4000   01-02-10   Null       1     1

In the second month, one more employee (Ename D) is added to the table, and the salary of employee B is changed from 2000 to 2500.

Step 1: Import the source table and the target table.

  • Create a table named emp_source in Oracle with the three columns shown above.
  • Import the source in the Source Analyzer.
  • Drag the target table twice onto the Mapping Designer to handle the insert and update processes.
  • Go to the Targets menu and click Generate/Execute to create the target tables.
  • A snapshot of the connections using the different kinds of transformations is shown below.


  • In the target table, we add five columns (Skey, Version, Flag, S_date, E_date).

Step 2: Design the mapping and apply the necessary transformations.

  • In this mapping we use four kinds of transformations: Lookup (1), Expression (3), Filter (2), and Sequence Generator (1). The necessity and usage of each transformation is discussed in detail below.

Lookup Transformation: The purpose of this transformation is to look up the target table and compare it with the source using the lookup condition.

  • First, create a Lookup transformation and connect Empno from the Source Qualifier to it.
  • A snapshot of choosing the target table is shown below.


  • Drag the Empno column from the Source Qualifier to the Lookup transformation.
  • Only the Empno1 port should have the Input check box checked.
  • In the Properties tab: (i) Lookup table name -> Emp_Target.

(ii) Lookup policy on multiple match -> Use Last Value.

(iii) Connection Information -> Oracle.

  • In the Condition tab: (i) Click on Add a new condition.

(ii) The lookup table column should be Empno, the transformation port should be Empno1, and the operator should be '='.

Expression Transformation: After the Lookup transformation, we use an Expression transformation to find out whether the data in the source table matches the target table. Here we specify the conditions that decide whether to insert or to update the table. The steps to create the Expression transformation are shown below.

  • Drag all the columns from both the source and the Lookup transformation and drop them onto the Expression transformation.
  • Now double-click the transformation, go to the Ports tab, create two new ports, and name them Insert and Update. Both these ports are output-only, so the Input check box must be unchecked for them.
  • A snapshot of the Edit Transformations window is shown below.


  • The conditions that we want to apply to our output ports are listed below.

Insert: IsNull(EmpNO1)

Update: iif(not isnull(Skey) and decode(SAL,SAL1,1,0)=0, 1, 0)

  • We are all done here. Click Apply and then OK.

Filter Transformation: We need two Filter transformations: the purpose of the first filter is to pass through the records we are going to insert, and the second does the same for the records we are going to update.

  • Filter transformation 1 forwards the rows flagged for insert to Exp 1, and that output appears in the target table as new rows.
  • Filter transformation 2 forwards the rows flagged for update to Exp 2, which then forwards the updated rows to the target table.
  • Go to the Properties tab on the Edit transformation

(i) The value for the filter condition 1 is Insert.

(ii) The value for the filter condition 2 is Update.

  • A closer view of the connections from the Expression to the filters is shown below.


Sequence Generator: We use this to generate an incremental sequence of numbers. Its purpose in our mapping is to increment the Skey in a bandwidth of 100.


  • The Sequence Generator increments the values of the Skey in multiples of 100 (a bandwidth of 100).
  • Connect the output of the Sequence Generator to Exp 1.

Expression Transformation:

Exp 1: This updates the target table with the Skey values. The point to notice here is that the sequence value gets multiplied by 100, and a new row is generated whenever a new employee is added to the list; otherwise no modification is made to the target table.


  • Drag all the columns from filter 1 to Exp 1.
  • Now add a new port named N_skey; its expression is Nextval1*100.
  • Make S_date an output port; its expression is SYSDATE.
  • Flag is also made an output port, and the expression passed through it is 1.
  • Version is also made an output port, and the expression passed through it is 1. A consolidated sketch of these port expressions follows this list.
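
Port by port, Exp 1 looks roughly like this (a sketch; Nextval1 is the port fed from the Sequence Generator):

    -- surrogate key in multiples of 100
    N_skey  = Nextval1 * 100
    -- start date of the newly inserted record
    S_date  = SYSDATE
    -- markers for the current, active version
    Flag    = 1
    Version = 1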

Exp 2: If the same employee is found with any updates in the records, then the Skey is incremented by 1 and the version changes to the next higher number.

  • Drag all the columns from filter 2 to Exp 2.
  • Now add a new port named N_skey; its expression is Skey+1.
  • Both S_date and E_date are set to SYSDATE.


Exp 3: Handles any record in the source table that gets updated; here the port is made output-only.


  • If a change is found, we update the E_date to the S_date.

Update Strategy: This is the place from where the update instruction is set on the target table.

  • The update strategy expression is set to 1 (DD_UPDATE).

Step 3: Create the task and run the workflow.

  • Do not check the Truncate Table option.
  • Change the target load type from Bulk to Normal.
  • Run the workflow from the task.


Step 4: Preview the Output in the target table.


Update Strategy Transformation

            • Active and Connected Transformation

            Till now, we have only inserted rows in our target tables. What if we want to update, delete or reject rows coming from source based on some condition?

            Example: If Address of a CUSTOMER changes, we can update the old address or keep both old and new address. One row is for old and one for new. This way we maintain the historical data.

            Update Strategy is used with a Lookup transformation. In a DWH, we create a Lookup on the target table to determine whether a row already exists or not. Then we insert, update, delete, or reject the source record as per the business need.
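
            When the Update Strategy is driven straight from such a lookup, its expression often reduces to a single flag test (a sketch; lkp_CUSTOMER_ID stands for a hypothetical port returned by the Lookup):

                -- insert when the lookup found no match, otherwise update
                IIF( ISNULL(lkp_CUSTOMER_ID), DD_INSERT, DD_UPDATE )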

            In Power Center, we set the update strategy at two different levels:

            1. Within a session
            2. Within a Mapping

            1. Update Strategy within a session:

            When we configure a session, we can instruct the IS to either treat all rows in the same way or use instructions coded into the session mapping to flag rows for different database operations.

            Session Configuration:

            Edit Session -> Properties -> Treat Source Rows As: (Insert, Update, Delete, Data Driven). Insert is the default.

            Specifying Operations for Individual Target Tables:


            You can set the following update strategy options:

            Insert: Select this option to insert a row into a target table.

            Delete: Select this option to delete a row from a table.

            Update: We have the following options in this situation:

            •  Update as Update. Update each row flagged for update if it exists in the target table.
            •  Update as Insert. Insert each row flagged for update.
            •  Update else Insert. Update the row if it exists. Otherwise, insert it.

            Truncate table: Select this option to truncate the target table before loading data.

            2. Flagging Rows within a Mapping

            Within a mapping, we use the Update Strategy transformation to flag rows for insert, delete, update, or reject.

            Operation   Constant    Numeric Value
            INSERT      DD_INSERT   0
            UPDATE      DD_UPDATE   1
            DELETE      DD_DELETE   2
            REJECT      DD_REJECT   3

            Update Strategy Expressions:

            Frequently, the update strategy expression uses the IIF or DECODE function from the transformation language to test each row to see if it meets a particular condition.

            IIF( ( ENTRY_DATE > APPLY_DATE), DD_REJECT, DD_UPDATE )

            Or

            IIF( ( ENTRY_DATE > APPLY_DATE), 3, 1 )

            • The above expression is written in the Properties tab of the Update Strategy transformation.
            • DD means DATA DRIVEN

            Forwarding Rejected Rows:

            We can configure the Update Strategy transformation to either pass rejected rows to the next transformation or drop them.

            Steps:

            1. Create Update Strategy Transformation
            2. Pass all ports needed to it.
            3. Set the Expression in Properties Tab.
            4. Connect to other transformations or target.

            Performance tuning:

            1. Use the Update Strategy transformation as little as possible in the mapping.
            2. Do not use the Update Strategy transformation if we just want to insert into the target table; instead use a direct mapping, direct filtering, etc.
            3. For updating or deleting rows from the target table, we can use the Update Strategy transformation itself.