1. Tell me do you know why Data Warehouse is used?

For a long time in the past and also even today, Data warehouses are built to facilitate reporting on different key business processes of an organization, known as KPI. Today we often call this whole process of reporting data from data warehouses as "Data Analytics". Data warehouses also help to integrate data from different sources and show a single-point-of-truth values about the business measures (e.g. enabling Master Data Management).

2. Explain me what do you understand by data mart?

Data marts are for the most part intended for a solitary branch of business. They are designed for the individual departments. For example, I used to work for a health insurance provider company which had different departments in it like Finance, Reporting, Sales and so forth.

We had a data warehouse that was holding the information pertaining to all these departments and then we have few data marts built on top of this data warehouse. These DataMart were specific to each department. In simple words, you can say that a DataMart is a subset of a data warehouse.

3. What are conformed dimensions?

A Dimension that is utilized as a part of different areas is called as conformed dimension. It might be utilized with different fact tables in a single database or over numerous data marts/warehouses. For example, if subscriber dimension is connected to two fact tables – billing and claim then the subscriber dimension would be treated as conformed dimension.

4. Tell me what is a fact & a fact table?

Facts represent quantitative data. For example – net amount due is a fact. A fact table contains numerical data and foreign keys from related dimensional tables.

5. Tell me the physical data model?

The physical data model will be showing primary keys, foreign keys, table names, column names and column data types. This view actually elaborates how the model will be actually implemented in the database.

6. Do you know what do you understand by Data Modelling?

Data Modelling is the diagrammatic representation showing how the entities are related to each other. It is the initial step towards database design. We first create the conceptual model, then logical model and finally move to the physical model.

Generally, the data models are created in data analysis & design phase of software development life cycle.

7. Explain me what is a 'Conformed Dimension'?

A conformed dimension is the dimension that is shared across multiple subject area. Consider 'Customer' dimension. Both marketing and sales department may use the same customer dimension table in their reports. Similarly, a 'Time' or 'Date' dimension will be shared by different subject areas. These dimensions are conformed dimension.

Theoretically, two dimensions which are either identical or strict mathematical subsets of one another are said to be conformed.

8. Tell me additive Measures?

Additive measures can be used with any aggregation function like Sum(), Avg() etc. Example is Sales Quantity etc.

9. Explain me what is ER model?

ER model or entity-relationship model is a particular methodology of data modeling wherein the goal of modeling is to normalize the data by reducing redundancy. This is different than dimensional modeling where the main goal is to improve the data retrieval mechanism.

10. Tell me what is data warehouse?

A data warehouse is a electronic storage of an Organization's historical data for the purpose of Data Analytics, such as reporting, analysis and other knowledge discovery activities.

Other than Data Analytics, a data warehouse can also be used for the purpose of data integration, master data management etc.

Download Interview PDF

11. Tell me what is Fact?

A fact is something that is quantifiable (Or measurable). Facts are typically (but not always) numerical values that can be aggregated.

12. Do you know what is junk dimension?

A junk dimension is a grouping of typically low-cardinality attributes (flags, indicators etc.) so that those can be removed from other tables and can be junked into an abstract dimension table.

These junk dimension attributes might not be related. The only purpose of this table is to store all the combinations of the dimensional attributes which you could not fit into the different dimension tables otherwise. Junk dimensions are often used to implement Rapidly Changing Dimensions in data warehouse.

13. Can you throw some light on your experience in Data Modelling with respect to projects you have worked on till date?

This was the very first question in one of my Data Modelling interviews. So, before you step into the interview discussion, you should have a very clear picture of how data modeling fits into the assignments you have worked upon.
I have worked on a project for a health insurance provider company where we have interfaces build in Informatica that transforms and process the data fetched from Facets database and sends out useful information to vendors.

14. Tell me what is junk Dimension?

It is a dimension table comprising of attributes that don't have a place in the fact table or in any of the current dimension tables. Generally, these are the properties like flags or indicators. For example, it can be member eligibility flag set as ‘Y' or ‘N' or any other indicator set as true/false, any specific comments, etc. if we keep all such indicator attributes in the fact table then its size gets increased. So, we combine all such attributes and put in a single dimension table called as junk dimension having unique junk IDs with a possible combination of all the indicator values.

15. Tell us what are the different types of measures?

We have three types of measures

☛ Non- additive measures
☛ Semi-additive measures
☛ Additive measures

Non-additive measures are the ones on top of which no aggregation function can be applied. For example, a ratio or a percentage column; a flag or an indicator column present in fact table holding values like Y/N, etc. is a non-additive measure.

16. Can you explain me what is meant by Data Analytics?

Data analytics (DA) is the science of examining raw data with the purpose of drawing conclusions about that information. A data warehouse is often built to enable Data Analytics

17. Tell us what is a mini dimension?

Mini dimensions can be used to handle rapidly changing dimension scenario. If a dimension has a huge number of rapidly changing attributes it is better to separate those attributes in different table called mini dimension. This is done because if the main dimension table is designed as SCD type 2, the table will soon outgrow in size and create performance issues. It is better to segregate the rapidly changing members in different table thereby keeping the main dimension table small and performing.

18. Please explain what are the different types of dimensions you have come across? Explain each of them in detail with an example?

There are typically five types of dimensions.

1) Conformed dimensions: A Dimension that is utilized as a part of different areas is called as conformed dimension. It might be utilized with different fact tables in a single database or over numerous data marts/warehouses. For example, if subscriber dimension is connected to two fact tables – billing and claim then the subscriber dimension would be treated as conformed dimension.

2) Junk Dimension: It is a dimension table comprising of attributes that don't have a place in the fact table or in any of the current dimension tables. Generally, these are the properties like flags or indicators. For example, it can be member eligibility flag set as ‘Y' or ‘N' or any other indicator set as true/false, any specific comments, etc. if we keep all such indicator attributes in the fact table then its size gets increased. So, we combine all such attributes and put in a single dimension table called as junk dimension having unique junk IDs with a possible combination of all the indicator values.

3) Role Playing Dimension: These are the dimensions which are utilized for multiple purposes in the same database. For example, a date dimension can be used for “Date of Claim”, “Billing date” or “Plan Term date”. So, such a dimension will be called as Role playing dimension. The primary key of Date dimension will be associated with multiple foreign keys in the fact table.

4) Slowly Changing Dimension (SCD): These are most important amongst all the dimensions. These are the dimensions where attribute values vary with time. Below are the varies types of SCDs

19. Tell me what is snow-flake schema?

This is another logical arrangement of tables in dimensional modeling where a centralized fact table references number of other dimension tables; however, those dimension tables are further normalized into multiple related tables.

Consider a fact table that stores sales quantity for each product and customer on a certain time. Sales quantity will be the measure here and keys from customer, product and time dimension tables will flow into the fact table. Additionally all the products can be further grouped under different product families stored in a different table so that primary key of product family tables also goes into the product table as a foreign key. Such construct will be called a snow-flake schema as product table is further snow-flaked into product family.

20. Explain me what is your understanding of different data models?

There are three types of data models – conceptual, logical and physical. The level of complexity and detail increases from conceptual to logical to a physical data model.

The conceptual model shows a very basic high level of design while the physical data model shows a very detailed view of design.

The conceptual model will be just portraying entity names and entity relationships. Figure 1 shown in the later part of this article depicts a conceptual model.

The logical model will be showing up entity names, entity relationships, attributes, primary keys and foreign keys in each entity. Figure 2 shown inside question#4 in this article depicts a logical model.

The physical data model will be showing primary keys, foreign keys, table names, column names and column data types. This view actually elaborates how the model will be actually implemented in the database.

21. Tell me your idea regarding factless fact? And why do we use it?

Factless fact table is a fact table that contains no fact measure in it. It has only the dimension keys in it.

At times, certain situations may arise in the business where you need to have factless fact table. For example, suppose you are maintaining an employee attendance record system, you can have a factless fact table having three keys.

22. What is the logical model?

The logical model will be showing up entity names, entity relationships, attributes, primary keys and foreign keys in each entity. Figure 2 shown inside question#4 in this article depicts a logical model.

23. Tell me what are the different types of dimension?

In a data warehouse model, dimension can be of following types,

☛ Conformed Dimension
☛ Junk Dimension
☛ Degenerated Dimension
☛ Role Playing Dimension


Based on how frequently the data inside a dimension changes, we can further classify dimension as
☛ Unchanging or static dimension (UCD)
☛ Slowly changing dimension (SCD)
☛ Rapidly changing Dimension (RCD)

24. Tell us what are the benefits of data warehouse?

A data warehouse helps to integrate data and store them historically so that we can analyze different aspects of business including, performance analysis, trend, prediction etc. over a given time frame and use the result of our analysis to improve the efficiency of business processes.

Download Interview PDF

25. Explain me non-additive Measures?

Non-additive measures are those which can not be used inside any numeric aggregation function (e.g. SUM(), AVG() etc.). One example of non-additive fact is any kind of ratio or percentage. Example, 5% profit margin, revenue to asset ratio etc. A non-numerical data can also be a non-additive measure when that data is stored in fact tables, e.g. some kind of varchar flags in the fact table.