1. Tell us what is degenerated dimension?
A degenerated dimension is a dimension that is derived from fact table and does not have its own dimension table.
A dimension key, such as transaction number, receipt number, Invoice number etc. does not have any more associated attributes and hence can not be designed as a dimension table.
2. Explain me what do you understand by dimension and attribute?
Dimensions represent qualitative data. For example– plan, product, class are all dimensions. A dimension table contains descriptive or textual attributes. For example, product category & product name are the attributes of product dimension.
3. Please explain semi Additive Measures?
Semi-additive measures are those where only a subset of aggregation function can be applied. Let's say account balance. A sum() function on balance does not give a useful result but max() or min() balance might be useful. Consider price rate or currency rate. Sum is meaningless on rate; however, average function might be useful.
4. Please explain what is dimensional modeling?
Dimensional model consists of dimension and fact tables. Fact tables store different transactional measurements and the foreign keys from dimension tables that qualifies the data. The goal of Dimensional model is not to achieve high degree of normalization but to facilitate easy and faster data retrieval.
5. What is role Playing Dimension?
These are the dimensions which are utilized for multiple purposes in the same database. For example, a date dimension can be used for “Date of Claim”, “Billing date” or “Plan Term date”. So, such a dimension will be called as Role playing dimension. The primary key of Date dimension will be associated with multiple foreign keys in the fact table.
6. What is the conceptual model?
The conceptual model will be just portraying entity names and entity relationships. Figure 1 shown in the later part of this article depicts a conceptual model.
7. Explain me what is the difference between OLTP and OLAP?
OLTP is the transaction system that collects business data. Whereas OLAP is the reporting and analysis system on that data.
OLTP systems are optimized for INSERT, UPDATE operations and therefore highly normalized. On the other hand, OLAP systems are deliberately denormalized for fast data retrieval through SELECT operations.
8. Can you explain me what is dimension?
A dimension is something that qualifies a quantity (measure).
For an example, consider this: If I just say… “20kg”, it does not mean anything. But if I say, "20kg of Rice (Product) is sold to Ramesh (customer) on 5th April (date)", then that gives a meaningful sense. These product, customer and dates are some dimension that qualified the measure - 20kg.
9. Can you tell me what is SCD?
SCD stands for slowly changing dimension, i.e. the dimensions where data is slowly changing. These can be of many types, e.g. Type 0, Type 1, Type 2, Type 3 and Type 6, although Type 1, 2 and 3 are most common.
10. What is slowly Changing Dimension (SCD)?
These are most important amongst all the dimensions. These are the dimensions where attribute values vary with time. Below are the varies types of SCDs
11. Tell me which schema is better – star or snowflake?
The choice of a schema always depends upon the project requirements & scenarios.
Since star schema is in de-normalized form, you require fewer joins for a query. The query is simple and runs faster in a star schema. Coming to the snowflake schema, since it is in normalized form, it will require a number of joins as compared to a star schema, the query will be complex and execution will be slower than star schema.
12. Explain me what is a role-playing dimension?
Dimensions are often reused for multiple applications within the same database with different contextual meaning. For instance, a "Date" dimension can be used for "Date of Sale", as well as "Date of Delivery", or "Date of Hire". This is often referred to as a 'role-playing dimension'
13. Tell us what is data mart?
Data marts are generally designed for a single subject area. An organization may have data pertaining to different departments like Finance, HR, Marketing etc. stored in data warehouse and each department may have separate data marts. These data marts can be built on top of the data warehouse.
14. Can you distinguish between OLTP and OLAP?
OLTP stands for Online Transaction Processing system & OLAP stands for Online Analytical processing system. OLTP maintains the transactional data of the business & is highly normalized generally. On the contrary, OLAP is for analysis and reporting purpose & it is in de-normalized form.
This difference between OLAP and OLTP also gives you the way to choosing the design of schema. If your system is OLTP, you should go with star schema design and if your system is OLAP, you should go with snowflake schema.
15. Tell me what are the different design schemas in Data Modelling? Explain with the example?
There are two different kinds of schemas in data modeling
Star Schema
Snowflake Schema
Now I will be explaining each of these schemas one by one.
The simplest of the schemas is star schema where we have a fact table in the center which references multiple dimension tables around it.
All the dimension tables are connected to the fact table. The primary key in all dimension tables acts as a foreign key in the fact table.
16. Explain me non-additive Measures?
Non-additive measures are those which can not be used inside any numeric aggregation function (e.g. SUM(), AVG() etc.). One example of non-additive fact is any kind of ratio or percentage. Example, 5% profit margin, revenue to asset ratio etc. A non-numerical data can also be a non-additive measure when that data is stored in fact tables, e.g. some kind of varchar flags in the fact table.
17. Tell us what are the benefits of data warehouse?
A data warehouse helps to integrate data and store them historically so that we can analyze different aspects of business including, performance analysis, trend, prediction etc. over a given time frame and use the result of our analysis to improve the efficiency of business processes.
18. Tell me what are the different types of dimension?
In a data warehouse model, dimension can be of following types,
☛ Conformed Dimension
☛ Junk Dimension
☛ Degenerated Dimension
☛ Role Playing Dimension
Based on how frequently the data inside a dimension changes, we can further classify dimension as
☛ Unchanging or static dimension (UCD)
☛ Slowly changing dimension (SCD)
☛ Rapidly changing Dimension (RCD)
19. What is the logical model?
The logical model will be showing up entity names, entity relationships, attributes, primary keys and foreign keys in each entity. Figure 2 shown inside question#4 in this article depicts a logical model.
20. Tell me your idea regarding factless fact? And why do we use it?
Factless fact table is a fact table that contains no fact measure in it. It has only the dimension keys in it.
At times, certain situations may arise in the business where you need to have factless fact table. For example, suppose you are maintaining an employee attendance record system, you can have a factless fact table having three keys.
21. Explain me what is your understanding of different data models?
There are three types of data models – conceptual, logical and physical. The level of complexity and detail increases from conceptual to logical to a physical data model.
The conceptual model shows a very basic high level of design while the physical data model shows a very detailed view of design.
The conceptual model will be just portraying entity names and entity relationships. Figure 1 shown in the later part of this article depicts a conceptual model.
The logical model will be showing up entity names, entity relationships, attributes, primary keys and foreign keys in each entity. Figure 2 shown inside question#4 in this article depicts a logical model.
The physical data model will be showing primary keys, foreign keys, table names, column names and column data types. This view actually elaborates how the model will be actually implemented in the database.
22. Tell me what is snow-flake schema?
This is another logical arrangement of tables in dimensional modeling where a centralized fact table references number of other dimension tables; however, those dimension tables are further normalized into multiple related tables.
Consider a fact table that stores sales quantity for each product and customer on a certain time. Sales quantity will be the measure here and keys from customer, product and time dimension tables will flow into the fact table. Additionally all the products can be further grouped under different product families stored in a different table so that primary key of product family tables also goes into the product table as a foreign key. Such construct will be called a snow-flake schema as product table is further snow-flaked into product family.
There are typically five types of dimensions.
1) Conformed dimensions: A Dimension that is utilized as a part of different areas is called as conformed dimension. It might be utilized with different fact tables in a single database or over numerous data marts/warehouses. For example, if subscriber dimension is connected to two fact tables – billing and claim then the subscriber dimension would be treated as conformed dimension.
2) Junk Dimension: It is a dimension table comprising of attributes that don't have a place in the fact table or in any of the current dimension tables. Generally, these are the properties like flags or indicators. For example, it can be member eligibility flag set as ‘Y' or ‘N' or any other indicator set as true/false, any specific comments, etc. if we keep all such indicator attributes in the fact table then its size gets increased. So, we combine all such attributes and put in a single dimension table called as junk dimension having unique junk IDs with a possible combination of all the indicator values.
3) Role Playing Dimension: These are the dimensions which are utilized for multiple purposes in the same database. For example, a date dimension can be used for “Date of Claim”, “Billing date” or “Plan Term date”. So, such a dimension will be called as Role playing dimension. The primary key of Date dimension will be associated with multiple foreign keys in the fact table.
4) Slowly Changing Dimension (SCD): These are most important amongst all the dimensions. These are the dimensions where attribute values vary with time. Below are the varies types of SCDs
24. Tell us what is a mini dimension?
Mini dimensions can be used to handle rapidly changing dimension scenario. If a dimension has a huge number of rapidly changing attributes it is better to separate those attributes in different table called mini dimension. This is done because if the main dimension table is designed as SCD type 2, the table will soon outgrow in size and create performance issues. It is better to segregate the rapidly changing members in different table thereby keeping the main dimension table small and performing.
25. Can you explain me what is meant by Data Analytics?
Data analytics (DA) is the science of examining raw data with the purpose of drawing conclusions about that information. A data warehouse is often built to enable Data Analytics
26. Tell us what are the different types of measures?
We have three types of measures
☛ Non- additive measures
☛ Semi-additive measures
☛ Additive measures
Non-additive measures are the ones on top of which no aggregation function can be applied. For example, a ratio or a percentage column; a flag or an indicator column present in fact table holding values like Y/N, etc. is a non-additive measure.
27. Tell me what is junk Dimension?
It is a dimension table comprising of attributes that don't have a place in the fact table or in any of the current dimension tables. Generally, these are the properties like flags or indicators. For example, it can be member eligibility flag set as ‘Y' or ‘N' or any other indicator set as true/false, any specific comments, etc. if we keep all such indicator attributes in the fact table then its size gets increased. So, we combine all such attributes and put in a single dimension table called as junk dimension having unique junk IDs with a possible combination of all the indicator values.
This was the very first question in one of my Data Modelling interviews. So, before you step into the interview discussion, you should have a very clear picture of how data modeling fits into the assignments you have worked upon.
I have worked on a project for a health insurance provider company where we have interfaces build in Informatica that transforms and process the data fetched from Facets database and sends out useful information to vendors.
29. Do you know what is junk dimension?
A junk dimension is a grouping of typically low-cardinality attributes (flags, indicators etc.) so that those can be removed from other tables and can be junked into an abstract dimension table.
These junk dimension attributes might not be related. The only purpose of this table is to store all the combinations of the dimensional attributes which you could not fit into the different dimension tables otherwise. Junk dimensions are often used to implement Rapidly Changing Dimensions in data warehouse.
A fact is something that is quantifiable (Or measurable). Facts are typically (but not always) numerical values that can be aggregated.
31. Tell me what is data warehouse?
A data warehouse is a electronic storage of an Organization's historical data for the purpose of Data Analytics, such as reporting, analysis and other knowledge discovery activities.
Other than Data Analytics, a data warehouse can also be used for the purpose of data integration, master data management etc.
32. Explain me what is ER model?
ER model or entity-relationship model is a particular methodology of data modeling wherein the goal of modeling is to normalize the data by reducing redundancy. This is different than dimensional modeling where the main goal is to improve the data retrieval mechanism.
33. Tell me additive Measures?
Additive measures can be used with any aggregation function like Sum(), Avg() etc. Example is Sales Quantity etc.
34. Explain me what is a 'Conformed Dimension'?
A conformed dimension is the dimension that is shared across multiple subject area. Consider 'Customer' dimension. Both marketing and sales department may use the same customer dimension table in their reports. Similarly, a 'Time' or 'Date' dimension will be shared by different subject areas. These dimensions are conformed dimension.
Theoretically, two dimensions which are either identical or strict mathematical subsets of one another are said to be conformed.
35. Do you know what do you understand by Data Modelling?
Data Modelling is the diagrammatic representation showing how the entities are related to each other. It is the initial step towards database design. We first create the conceptual model, then logical model and finally move to the physical model.
Generally, the data models are created in data analysis & design phase of software development life cycle.