Frequently asked Informatica interview questions and answers 2019
Updated: Nov 18, 2022
What is Informatica and Enterprise Data Warehouse?
Informatica is an ETL tool which is used to extract, transform and load data. It plays crucial role in building Enterprise Data Warehouse.
EDW provides user a global view on historical data stored from various department in an organization like finance, sales, labor, marketing etc based on which critical business decisions could be made.
What is repository service, integration service, reporting service, domain and node?
There are basically three types of services which you will find under application services:
Repository service - It is responsible for storing Informatica metadata (source definition, target definition, mappings, mapplets and transformations in the form of tables) and providing access to these repositories for other services. It is also responsible for maintaining connection between client development tool and Informatica repository which keeps metadata up-to-date by updating, inserting and deleting the metadata.
Integration Service - It is responsible for execution of workflow and end-to-end data movement from source to target. It pulls the workflow information from repository, starts the execution of tasks, gathers/combines data from different sources (for example relational database and flat file) and loads into single or multiple targets.
Reporting Service - It enables report generation.
Domain - It is administrative part of Informatica which is used by admins to operate nodes and services. You can define multiple nodes in a domain with a gateway node which is responsible for receiving request and distributing it to worker nodes. Further, nodes are responsible for running services and other Informatica processes. There are basically two types of services in a domain - Service manager and Application services.
Service manager is responsible for logging and login operations. Like authentication, authorization, managing people and groups.
Application services are responsible for managing integration services, repository services and reporting services.
Key properties of domain are as follows:
Database properties - You can define database instance name and port responsible for holding domain. It consist of database type (like Oracle, SQL server etc), host, port, database name and user name.
General Properties - You can define resilience timeout, restart period, restart attempts, dispatch mode etc. For example if services goes down how many seconds application services wait to again connect to respective service depends upon resilience timeout, how long Informatica can try restarting those services depends upon restart period and attempts. How tasks will be distributed to worker nodes from gateway will depend upon dispatch mode like round robin.
What is PowerCenter repository?
It is like a relational database which stores Informatica metadata in the form of tables (underlying database could be Oracle database or SQL server or similar) and it is managed by repository services.
What is Informatica client tool?
Informatica client tool is basically developer tool installed on client machine and it consist of four parts:
Designer (Source, target, mapping, mapplet and transformations designer)
Workflow manager (Task, worklet and workflow designer)
Basic terminology consist of:
Workflow, Worklet, Sessions & Tasks: Workflow consists of one or more session, worklet and task (includes timer, decision, command, event wait, mail, link, assignment, control etc) connected in parallel or sequence. You can run these sessions by using session manager or pmcmd command. Further, you can write pre-post-session commands.
Mappings, Mapplets & Transformations: Mappings are collection of source, target and transformations. Mapplets (designed in mapplet designer) are like re-usable mappings which contains various transformations but no source/target. You can run the mapping in debug mode without creating a session. There are mapping parameters and variables as well. Mapping parameters represent constant values that are defined before running a session while mapping variables can change values during sessions.
What could be the various states of object in Informatica?
Valid - fully syntactically correct object.
Invalid - where syntax and properties are invalid according to Informatica standards. Informatica marks those objects invalid. It could be mapping, mapplet, task, session, workflow or any transformation.
Impacted - where underlying object is invalid. For instance in a mapping suppose underlying transformation has become invalid due to some change.
What happens when user executes the workflow?
User executes workflow
Informatica invokes integration service to pull workflow details from repository
Integration service starts execution of workflow after gathering workflow metadata
Integration service runs all child tasks
Reads and combine data from sources and loads into target
After execution, it updates the status of task as succeeded, failed, unknown or aborted
Workflow and session log is generated
What is the difference between ABORT and STOP?
Abort will kill process after 60 seconds even if data processing hasn't finished. Session will be forcefully terminated and DTM (Data Transformation Manager) process will be killed.
Stop will end the process once DTM processing has finished processing. Although it stops reading data from source as soon as stop command is received.
What are the various types of transformation available in Informatica?
There are basically two categories of transformation in Informatica.
Active - It can change the number of rows that pass through transformation for example filter, sorter transformations. It can also change the row type for example update strategy transformation which can mark row for update, insert, reject or delete. Further it can also change the transaction boundary for example transaction control transformation which can allow commit and rollback for each row based on certain expression evaluation.
Passive - Maintains same number of rows, no change in row type and transaction boundary. Number of output rows will remain always same as number of input rows.
Active transformation includes:
Source Qualifier: Connected to relational source or flat file, converts source data type to Informatica native data types. Performs joins, filter, sort, distinct and you can write custom SQL. You can have multiple source qualifier in single session with multiple targets, in this case you can decide target load order as well.
Joiner: It can join two heterogeneous sources unlike source qualifier which needs common source. It performs normal join, master outer join, detail outer join and full outer join.
Filter: It has single condition - drops records based on filter criteria like SQL where clause, single input and single output.
Router: It has input, output and default group - acts like filter but you can apply multiple conditions for each group like SQL case statement, single input multiple output, more easier and efficient to work as compared to filter.
Aggregator: It performs calculations such as sums, averages etc. It is unlike expression transformation in which one can do calculations in groups. It also provides extra cache files to store transformation values if required.
Sorter: Sorts the data. It has distinct option which can filter duplicate rows.
Lookup: It has input, output, lookup and return port. Explained in next question.
Union: It works like union all SQL.
Update Strategy: Treats source row as "data driven" - DD_UPDATE, DD_DELETE, DD_INSERT and DD_REJECT.
Normalizer: Takes multiple columns and returns few based on normalization.
Passive transformation includes:
Expression: It is used to calculate in single row before writing on the target, basically non-aggregate calculations.
Sequence Generator: For generating primary keys, surrogate keys (NEXTVAL & CURRVAL).
Explain Lookup transformation in detail?
Lookup transformation is used to lookup a flat file, relational database table or view. It has basically four ports - input (I), output (O), lookup (L) and return port (R). Lookup transformations can be connected or unconnected and could act as active/passive transformation.
Connected lookup: It can return multiple output values.
It can have both dynamic and static cache.
It can return more than one column value in output port.
It caches all lookup columns.
Unconnected lookup: It can take multiple input parameters like column1, column2, column3 .. for lookup but output will be just one value.