1. What is Meta data?

Metadata is data about data. E.g. if in data mart we are receiving any file. Then metadata will contain information like how many columns, file is fix width/limited, ordering of fields, data types of field etc.

2. Briefly state different between data ware house & data mart?

Data warehouse is made up of many datamarts. DWH contain many subject areas. However, data mart focuses on one subject area generally. E.g. If there will be DHW of bank then there can be one data mart for accounts, one for Loans etc. This is high-level definitions.

3. What is galaxy schema?

Galaxy schema is also known as fact constellation scheme. It requires no of fact tables to share dimension tables. In data, wares housing mainly the people are using the conceptual hierarchy.

4. Suppose you are filtering the rows using a filter transformation only the rows meet the condition pass to the target. Tell me where the rows will go that does not meet the condition.

Informatica filter transformation default value is 1 i.e. true. If you place a break point on filter transformation and run the mapping in a debugger mode, you will find these values 1 or 0 for each row passing through filter. If you change 0 to 1, the particular row will be passed to next stage.

6. What is Core Dimension?

Core Dimension is a Dimension table, which is used dedicated for single fact table or Datamart. Conform Dimension is a Dimension table which is used across fact tables or Data marts.

7. How much data hold in one universe.

Universe does not hold any data. However, practically the universe is known to have issues when the objects cross 6000.

8. Can any one explain about Core Dimension, Balanced Dimension, and Dirty Dimension?

Dirty Dimension is nothing but Junk Dimensions. Core Dimensions are dedicated for a fact table or Data mart. Conformed Dimensions are used across fact tables or Data marts.

9. Can any one explain the Hierarchies level Data warehousing.

In Data warehousing, levels are columns available in dimension table. Levels are having attributes. Hierarchies are used for navigational purpose; there are two types of Hierarchies. You can define hierarchies in top down or bottom up.

1. Natural Hierarchy: Best example is Time Dimension - Year, Month, Day etc. In natural Hierarchy definite relationship exists between each level

2. Navigational Hierarchy: You can have levels like

Ex - Production cost of Product, Sales Cost of Product.

Ex - Lead Time defined to procure, Actual Procurement time,

In this, two levels need not to have relationship. This Hierarchy is created for navigational purpose.

10. What is data cleaning? How can we do that?

Data cleaning is a self-explanatory term. Most of the data warehouses in the world source data from multiple systems - systems that were created long before data warehousing was well understood, and hence without the vision to consolidate the same in a single repository of information. In such a scenario, the possibilities of the following are there:

► Missing information for a column from one of the data sources;
► Inconsistent information among different data sources;
► Orphan records;
► Outlier data points;
► Different data types for the same information among various data sources, leading to improper conversion;
► Data breaching business rules

In order to ensure that the data warehouse is not infected by any of these discrepancies, it is important to cleanse the data using a set of business rules, before it makes its way into the data warehouse.

Download Interview PDF

11. What is dimension modeling?

A logical design technique that seeks to present the data in a standard, intuitive framework that allows for high-performance access. There are different data modeling concepts like ER Modeling (Entity Relationship modeling), DM (Dimensional modeling), Hierarchal Modeling, Network modeling. However, popular are ER and DM only.

12. Where the cache files stored?

Caches are stored in Repository.

13. How can you import tables from a database?

In Business Objects Universe Designer you can open Table Browser and select the tables needed then insert them to designer.

14. What is drilling across?

Drill across corresponds to switching from 1 classification in 1 dimension to a different classification in different dimension.

15. How Many different schemas or DW Models can be used in Siebel Analytics. I know Only STAR and SNOW FLAKE and any other model that can be used?

Integrated schema design is also used to define an integrated schema design we have to define the following concepts

► Fact constellation

► Act less fact table

► Onformed dimension

A: A fact constellation is the process of joining two or more fact tables

B: A fact table with out any facts is known as fact less fact table

C:A dimension which is re useful and fixed is known as conformed dimensionA dimension, which is, shared with multiple fact tables known as conformed dimension

16. What is an error log table in Informatica occurs and how to maintain it in mapping?

Error Log in Informatica is a one of output file created by Informatica Server while running the session for error messages. It is created in Informatica home directory.

17. What is loop in Data warehousing?

In DWH loops may exist between the tables. If loops exist, then query generation will take more time, because more than one path is available. It creates ambiguity also. Loops can be avoided by creating aliases of the table or by context.

Example: 4 Tables - Customer, Product, Time, Cost forming a close loop. Create alias for the cost to avoid loop.

18. How many clustered indexes can u create for a table in DWH? In case of truncate and delete command what happens to table, which has unique id.

You can have only one clustered index per table. If you use delete command, you can rollback... it fills your redo log files.

If you do not want records, you may use truncate command, which will be faster and does not fill your redo log file.

19. What is hybrid slowly changing dimension?

Hybrid SCDs are combination of both SCD 1 and SCD 2.It may happen that in a table, some columns are important and we need to track changes for them i.e. capture the historical data for them whereas in some columns even if the data changes, we don't care.For such tables we implement Hybrid SCDs, where in some columns are Type 1 and some are Type 2.You can add that it is not an intelligent key but similar to a sequence number and tied to a timestamp typically!

20. Can a dimension table contain numeric values?

Yes. However, those data type will be char (only the values can numeric/char).Yes, dimensions even contain numerical because these are descriptive elements of our business.

21. How do you create Surrogate Key using Ab Initio?

There are many ways to create Surrogate key but it depends on your business logic. Here you can try these ways.1. Use next in sequence () function in your transform

2. Use Assign key values component (if your GDE is higher than 1.10)

3. Write a stored proc to this and call this store proc wherever you need.Yes, dimension table contains numeric but not contain measures and facts

22. What is the difference between star and snowflake schemas?

Star schema:
A single fact table with N number of DimensionSnowflake schema: Any dimensions with extended dimensions are known as snowflake schema.

23. What is a CUBE in data warehousing concept?

Cubes are logical representation of multidimensional data. The edge of the cube contains dimension members and the body of the cube contains data values.

24. What is the difference between Snowflake and Star Schema? What are situations where Snowflake Schema is better than Star Schema when the opposite is true?

Star schema contains the dimension tables mapped around one or more fact tables.It is a renormalized model and no need to use complicated joins. Also Queries results fast.Snowflake schema is the normalized form of Star schema. It contains in-depth joins, because the tables are spited in to many pieces. We can easily do modification directly in the tables.We have to use complicated joins, since we have more tables. There will be some delay in processing the Query.

Download Interview PDF

25. What is ER Diagram?

The Entity-Relationship (ER) model was originally proposed by Peter in 1976 [Chen76] as a way to unify the network and relational database views. Simply stated the ER model is a conceptual data model that views the real world as entities and relationships. A basic component of the model is the Entity-Relationship diagram, which is used to visually represent data objects. Since Chen wrote his paper the model has been extended and today it is commonly used for database design for the database designer, the utility of the ER model is: it maps well to the relational model. The constructs used in the ER model can easily be transformed into relational tables. It is simple and easy to understand with a minimum of training. Therefore, the database designer to communicate the design to the end user can use the model. In addition, the model can be used as a design plan by the database developer to implement a data model in specific database management software.