1. What are Data Marts?

A data mart is a collection of tables focused on a specific business group or department. It may be multi-dimensional or normalized. Data marts are usually built from a larger data warehouse or from operational data.

2. What are the various ETL tools in the market?

Various ETL tools used in the market are:

Informatica
DataStage
Oracle Warehouse Builder
Ab Initio
Data Junction

BusinessObjects Data Integrator is another ETL tool.

3. Explain the definitions of normalized and denormalized views. What are the differences between them?

Normalization is the process of removing redundancies.

Denormalization is the process of allowing redundancies.
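
The difference can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table and column names (department, employee, employee_denorm) are hypothetical and chosen only for illustration. The normalized design stores each department name once, while the denormalized table repeats it on every employee row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: each department name is stored exactly once.
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        emp_name TEXT,
        dept_id  INTEGER REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (1, 'Sales'), (2, 'Finance');
    INSERT INTO employee VALUES (10, 'Ann', 1), (11, 'Bob', 1), (12, 'Cy', 2);

    -- Denormalized: the department name is repeated on every employee row.
    CREATE TABLE employee_denorm AS
        SELECT e.emp_id, e.emp_name, d.dept_name
        FROM employee e JOIN department d ON d.dept_id = e.dept_id;
""")

# Count how many times the string 'Sales' is physically stored in each design.
normalized_copies = conn.execute(
    "SELECT COUNT(*) FROM department WHERE dept_name = 'Sales'"
).fetchone()[0]
denormalized_copies = conn.execute(
    "SELECT COUNT(*) FROM employee_denorm WHERE dept_name = 'Sales'"
).fetchone()[0]

print(normalized_copies, denormalized_copies)
```

In the denormalized table a query needs no join to show the department name, at the cost of storing it redundantly.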

4. What is a surrogate key? Where do we use it? Explain with an example.

A surrogate key is a substitute for the natural primary key.

It is just a unique identifier or number for each row that can be used as the primary key of the table. The only requirement for a surrogate primary key is that it is unique for each row in the table.

Data warehouses typically use a surrogate key (also known as an artificial or identity key) for the primary keys of dimension tables. They can use an Informatica Sequence Generator, an Oracle sequence, or SQL Server identity values for the surrogate key.

It is useful because the natural primary key (e.g. Customer Number in a Customer table) can change, and this makes updates more difficult.

Some tables have columns such as AIRPORT_NAME or CITY_NAME which are stated as the primary keys (according to the business users). Not only can these change, but indexing on a numerical value is probably better, so you could consider creating a surrogate key called, say, AIRPORT_ID. This would be internal to the system, and as far as the client is concerned you may display only the AIRPORT_NAME.
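
As a rough sketch of the AIRPORT_ID idea, the following uses Python's sqlite3 module. The tables airport_dim and flight_fact, the airport names, and the passenger count are hypothetical. It shows that when the natural key (the name) changes, only the dimension row is updated and the fact rows keep pointing at the same surrogate key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE airport_dim (
        airport_id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        airport_name TEXT NOT NULL                       -- natural key, may change
    )
""")
conn.execute("CREATE TABLE flight_fact (airport_id INTEGER, passengers INTEGER)")

# The system generates the surrogate key; the client never supplies it.
cur = conn.execute(
    "INSERT INTO airport_dim (airport_name) VALUES (?)", ("Old Name Airport",)
)
airport_id = cur.lastrowid
conn.execute("INSERT INTO flight_fact VALUES (?, ?)", (airport_id, 120))

# The airport is renamed: only the dimension row changes, not the fact rows.
conn.execute(
    "UPDATE airport_dim SET airport_name = ? WHERE airport_id = ?",
    ("New Name Airport", airport_id),
)

row = conn.execute("""
    SELECT d.airport_name, f.passengers
    FROM flight_fact f JOIN airport_dim d ON d.airport_id = f.airport_id
""").fetchone()
print(row)
```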

2. Adapted from response by Vincent on Thursday, March 13, 2003

Another benefit you can get from surrogate keys (SIDs) is tracking the SCD (Slowly Changing Dimension).

Let me give you a simple, classical example:

On the 1st of January 2002, Employee 'E1' belongs to Business Unit 'BU1' (that's what would be in your Employee Dimension). This employee has a turnover allocated to him on the Business Unit 'BU1'. But on the 2nd of June, the Employee 'E1' is moved from Business Unit 'BU1' to Business Unit 'BU2'. All the new turnover has to belong to the new Business Unit 'BU2', but the old turnover should still belong to the Business Unit 'BU1'.

If you used the natural business key 'E1' for your employee within your data warehouse, everything would be allocated to Business Unit 'BU2', even what actually belongs to 'BU1'.

If you use surrogate keys, you could create on the 2nd of June a new record for the Employee 'E1' in your Employee Dimension with a new surrogate key.

This way, in your fact table, you have your old data (before 2nd of June) with the SID of the Employee 'E1' + 'BU1.' All new data (after 2nd of June) would take the SID of the employee 'E1' + 'BU2.'

You could consider a Slowly Changing Dimension as an enlargement of your natural key: the natural key of the Employee was Employee Code 'E1', but for you it becomes Employee Code + Business Unit, i.e. 'E1' + 'BU1' or 'E1' + 'BU2'. The difference from a plain natural key enlargement is that you might not have every part of the new key within your fact table, so you might not be able to join on the enlarged key, and you therefore need another ID.
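
The E1 / BU1 / BU2 example above can be sketched with Python's sqlite3 module. The table names (employee_dim, turnover_fact), the helper add_version, the sale dates, and the turnover amounts are all hypothetical, made up for illustration. Each change of business unit creates a new dimension row with a new surrogate key, so old facts stay attached to 'BU1'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee_dim (
        employee_sid  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        employee_code TEXT,                               -- natural key
        business_unit TEXT
    );
    CREATE TABLE turnover_fact (employee_sid INTEGER, sale_date TEXT, amount REAL);
""")

def add_version(code, unit):
    """Insert a new dimension row and return its generated surrogate key."""
    cur = conn.execute(
        "INSERT INTO employee_dim (employee_code, business_unit) VALUES (?, ?)",
        (code, unit),
    )
    return cur.lastrowid

# Before the 2nd of June: 'E1' belongs to 'BU1'.
sid_bu1 = add_version("E1", "BU1")
conn.execute("INSERT INTO turnover_fact VALUES (?, '2002-03-15', 100.0)", (sid_bu1,))

# On the 2nd of June 'E1' moves to 'BU2': a new row, a new surrogate key.
sid_bu2 = add_version("E1", "BU2")
conn.execute("INSERT INTO turnover_fact VALUES (?, '2002-07-01', 250.0)", (sid_bu2,))

# Old turnover stays with BU1, new turnover goes to BU2.
turnover_by_unit = dict(conn.execute("""
    SELECT d.business_unit, SUM(f.amount)
    FROM turnover_fact f JOIN employee_dim d ON d.employee_sid = f.employee_sid
    GROUP BY d.business_unit
"""))
print(turnover_by_unit)
```

Joining on the natural key 'E1' instead would match both dimension rows and could not separate the two periods.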

A surrogate key is a system-generated sequential number which acts as a primary key.

5. What is the datatype of the surrogate key?

The datatype of the surrogate key is either integer or numeric. It is always generated by the system because the surrogate key works as the primary key. Surrogate keys help us distinguish between versions of the data and store the data history.

6. What is incremental loading?
What is batch processing?
What is a cross-reference table?
What is an aggregate fact table?

Incremental loading means loading only the ongoing changes from the OLTP system.

An aggregate table contains the measure values aggregated (grouped or summed up) to some level of the hierarchy.
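
One common way to implement incremental loading is a "high-water mark" on a last-modified column. The sketch below uses Python's sqlite3 module and assumes the source table carries an updated_at column. The tables oltp_orders and dw_orders and the function incremental_load are hypothetical names used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE oltp_orders (order_id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT);
    CREATE TABLE dw_orders   (order_id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT);
    INSERT INTO oltp_orders VALUES (1, 10.0, '2024-01-01'), (2, 20.0, '2024-01-02');
""")

def incremental_load(conn):
    """Copy only rows changed since the latest timestamp already loaded."""
    watermark = conn.execute(
        "SELECT COALESCE(MAX(updated_at), '') FROM dw_orders"
    ).fetchone()[0]
    changed = conn.execute(
        "SELECT order_id, amount, updated_at FROM oltp_orders WHERE updated_at > ?",
        (watermark,),
    ).fetchall()
    # INSERT OR REPLACE overwrites rows that already exist in the target.
    conn.executemany("INSERT OR REPLACE INTO dw_orders VALUES (?, ?, ?)", changed)
    return len(changed)

first_run = incremental_load(conn)   # initial load: every source row is new

# Ongoing changes in the OLTP system: one update and one new order.
conn.execute("UPDATE oltp_orders SET amount = 25.0, updated_at = '2024-01-03' WHERE order_id = 2")
conn.execute("INSERT INTO oltp_orders VALUES (3, 30.0, '2024-01-03')")

second_run = incremental_load(conn)  # picks up only the changed rows
print(first_run, second_run)
```

The strict `>` comparison would miss rows that share the watermark's exact timestamp, so a real load would need a finer-grained timestamp or a change-data-capture mechanism.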

Batch processing means executing more than one session in a single run. We can execute these sessions in two ways:
* Linear: executing one session after another
* Parallel: executing more than one session at a time
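
The two execution modes can be sketched in plain Python. This is not how any particular ETL tool schedules its sessions. The function run_session and the session names are hypothetical stand-ins for real load jobs.

```python
from concurrent.futures import ThreadPoolExecutor

def run_session(name):
    """Stand-in for one ETL session; a real session would load data."""
    return f"{name} done"

sessions = ["load_customers", "load_orders", "load_products"]

# Linear: sessions run one after another, in order.
linear_results = [run_session(s) for s in sessions]

# Parallel: sessions are submitted together and may run at the same time.
# pool.map still returns the results in the order of the input list.
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel_results = list(pool.map(run_session, sessions))

print(linear_results == parallel_results)
```

Parallel execution only makes sense when the sessions do not depend on each other's output; dependent sessions have to run linearly.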

8. What is metadata in the context of a data warehouse, and why is it important?

Metadata is data about data. A business analyst or data modeler usually captures information about the data in a data dictionary, a.k.a. metadata: the source (where and how the data originated), the nature of the data (char, varchar, nullable, existence, valid values, etc.) and the behavior of the data (how it is modified or derived, and its life cycle). Metadata is also presented at the data mart level, for subsets, facts and dimensions, the ODS, etc. For a DW user, metadata provides vital information for analysis / DSS.

9. What are static and local variables?

A static variable is not created on the function stack but in the initialized data segment, and hence the variable is shared across multiple calls of the same function. Using static variables within a function is not thread safe.

On the other hand, a local (auto) variable is created on the function stack, is valid only in the context of the function call, and is not shared across function calls.

10. What is the main difference between schemas in an RDBMS and schemas in a data warehouse?

RDBMS Schema
* Used for OLTP systems
* Traditional and old schema
* Normalized
* Difficult to understand and navigate
* Cannot solve extract and complex problems
* Poorly modelled

DWH Schema
* Used for OLAP systems
* New generation schema
* Denormalized
* Easy to understand and navigate
* Extract and complex problems can be easily solved
* Very good model
