1. What is meant by data analytics?
Data analytics (DA) is the science of examining raw data in order to draw conclusions from it. A data warehouse is often built to support data analytics.
2. What is a conformed fact?
A conformed fact is a fact (measure) that is defined and calculated the same way everywhere, so it can be used across multiple data marts in combination with multiple fact tables.
3. What is data cleaning?
As the name implies, data cleaning means removing or correcting bad data in a database: orphan records, data that breaches business rules, inconsistent data and missing information.
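The four kinds of problems named above can each be detected with a simple check. The following is a minimal sketch in Python; the sample rows, the column names, the rule "quantity must be positive" and the list of valid country codes are all assumptions made for illustration.

```python
# Hypothetical sample data: customers (parent) and orders (child).
customers = {1: "Asha", 2: "Ravi"}
orders = [
    {"order_id": 10, "customer_id": 1, "quantity": 5, "country": "IN"},
    {"order_id": 11, "customer_id": 9, "quantity": 2, "country": "IN"},     # orphan: no customer 9
    {"order_id": 12, "customer_id": 2, "quantity": -3, "country": "IN"},    # breaches the business rule
    {"order_id": 13, "customer_id": 2, "quantity": 1, "country": "india"},  # inconsistent coding
    {"order_id": 14, "customer_id": 1, "quantity": None, "country": "IN"},  # missing information
]

VALID_COUNTRIES = {"IN", "US"}  # assumed reference list


def find_problems(order):
    """Return the list of data quality problems found in one order row."""
    problems = []
    if order["customer_id"] not in customers:
        problems.append("orphan record")
    if order["quantity"] is None:
        problems.append("missing information")
    elif order["quantity"] <= 0:
        problems.append("breaches business rule")
    if order["country"] not in VALID_COUNTRIES:
        problems.append("inconsistent data")
    return problems


# Map each order to its problems, then keep only the rows with none.
report = {o["order_id"]: find_problems(o) for o in orders}
clean_orders = [o for o in orders if not report[o["order_id"]]]
```

In practice a flagged row might be corrected or routed to a reject table rather than dropped; the sketch only shows the detection step.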
4. What is real-time data warehousing?
Real-time data warehousing captures business data as soon as it occurs. When a business activity completes, its data enters the flow and becomes available for use right away.
5. Name any five applications of a data warehouse.
Some applications include:
☛ financial services
☛ banking services
☛ consumer goods
☛ retail sectors
☛ controlled manufacturing
6. What are the key columns in fact and dimension tables?
The foreign keys of dimension tables are the primary keys of entity tables. The foreign keys of fact tables are the primary keys of the dimension tables.
7. What is dimensional modeling?
A dimensional model consists of dimension tables and fact tables. Fact tables store transactional measurements together with foreign keys to the dimension tables that qualify those measurements. The goal of a dimensional model is not a high degree of normalization but easy and fast data retrieval.
Ralph Kimball is one of the strongest proponents of this popular data modeling technique, which is used in many enterprise-level data warehouses.
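To make the fact/dimension split concrete, here is a small sketch of a star schema built with Python's standard sqlite3 module. The table names, column names and sample rows are hypothetical; the point is only that the fact table carries measures plus foreign keys, and that a query joins it to a dimension to give those measures meaning.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name  TEXT);
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);

    -- The fact table holds the measure and one foreign key per dimension.
    CREATE TABLE fact_sales (
        product_key  INTEGER REFERENCES dim_product(product_key),
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        quantity_kg  REAL
    );

    INSERT INTO dim_product  VALUES (1, 'Rice'), (2, 'Wheat');
    INSERT INTO dim_customer VALUES (1, 'Ramesh'), (2, 'Sita');
    INSERT INTO fact_sales   VALUES (1, 1, 20), (1, 2, 5), (2, 1, 10);
""")

# Total quantity per product: join the fact table to one dimension and aggregate.
rows = conn.execute("""
    SELECT p.product_name, SUM(f.quantity_kg)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
```

Here `rows` holds one total per product. Because product names live in a single de-normalized dimension table, the query needs only one join.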
8. What are aggregate tables?
Aggregate tables contain existing warehouse data that has been grouped to a certain level of the dimensions. It is easier to retrieve data from an aggregate table than from the original table, which has many more records.
An aggregate table reduces the load on the database server and improves query performance.
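As an illustration, the sketch below builds a monthly aggregate table from a detailed sales table using sqlite3. The table names (`fact_sales`, `agg_sales_monthly`) and the data are assumptions; a real warehouse would also need a strategy for refreshing the aggregate when new detail rows arrive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (sale_date TEXT, product TEXT, amount INTEGER);
    INSERT INTO fact_sales VALUES
        ('2024-01-05', 'Rice', 100),
        ('2024-01-20', 'Rice', 50),
        ('2024-02-03', 'Rice', 70),
        ('2024-02-10', 'Wheat', 30);

    -- Aggregate table: one row per (month, product) instead of one row per sale.
    CREATE TABLE agg_sales_monthly AS
        SELECT substr(sale_date, 1, 7) AS month, product, SUM(amount) AS total_amount
        FROM fact_sales
        GROUP BY substr(sale_date, 1, 7), product;
""")

# Monthly reports can now read the small aggregate table directly.
monthly = conn.execute(
    "SELECT month, product, total_amount FROM agg_sales_monthly ORDER BY month, product"
).fetchall()
```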
9. What needs to be done when the database is shut down?
The following steps are performed when the database is shut down:
☛ Close the database
☛ Dismount the database
☛ Shut down the instance
10. What is VLDB?
VLDB stands for Very Large Database, a database whose size is more than one terabyte. Such databases are typically decision support systems that serve a large number of users.
11. What is meant by dimensional modeling?
Dimensional modeling is a design technique that data warehouse designers use to build a data warehouse. The model is stored in two types of tables: fact tables and dimension tables.
A fact table holds the facts and measurements of the business, and a dimension table holds the context of those measurements.
12. What are the functions of a load manager?
A load manager:
☛ Extracts data from the source system.
☛ Fast-loads the extracted data into a temporary data store.
☛ Performs simple transformations into a structure similar to the one in the data warehouse.
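The three functions above can be sketched in a few lines of Python with sqlite3. This is only an illustration under assumed names: `source_rows` stands in for data already extracted from a source system, `staging_sales` plays the role of the temporary data store, and `sales_prepared` is a table shaped like the warehouse target.

```python
import sqlite3

# Step 1 (assumed already done): rows extracted from a source system as raw strings.
source_rows = [
    ("1001", " rice ", "20.5"),
    ("1002", "WHEAT", "10"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_sales (sale_id TEXT, product TEXT, quantity TEXT)")
conn.execute("CREATE TABLE sales_prepared (sale_id INTEGER, product TEXT, quantity REAL)")

# Step 2: fast load into the temporary store, with no per-row processing.
conn.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", source_rows)

# Step 3: simple transformations (type casts, trimming, consistent case).
conn.execute("""
    INSERT INTO sales_prepared
    SELECT CAST(sale_id AS INTEGER), upper(trim(product)), CAST(quantity AS REAL)
    FROM staging_sales
""")

prepared = conn.execute("SELECT * FROM sales_prepared ORDER BY sale_id").fetchall()
```

Keeping the staging load free of transformations is what makes it fast; the clean-up happens in a separate set-based step afterwards.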
13. What are the different types of data warehousing?
The following are the different types of data warehousing:
☛ Enterprise Data Warehouse
☛ Operational Data Store
☛ Data Mart
14. What is a snowflake schema?
A snowflake schema has a primary dimension table to which one or more further dimension tables are joined. Only the primary dimension table is joined directly to the fact table.
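A short sqlite3 sketch may help here. In this hypothetical schema, `dim_product` is the primary dimension joined to the fact table, and `dim_category` is a further dimension reachable only through `dim_product`. All table and column names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Outer dimension: joined to the primary dimension, never to the fact table.
    CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);

    -- Primary dimension: the only one the fact table references.
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT,
        category_key INTEGER REFERENCES dim_category(category_key)
    );

    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        amount      INTEGER
    );

    INSERT INTO dim_category VALUES (1, 'Grains'), (2, 'Dairy');
    INSERT INTO dim_product  VALUES (1, 'Rice', 1), (2, 'Wheat', 1), (3, 'Milk', 2);
    INSERT INTO fact_sales   VALUES (1, 100), (2, 40), (3, 25);
""")

# Reaching the category requires two joins: fact -> product -> category.
by_category = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_key  = f.product_key
    JOIN dim_category AS c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
```

The extra join is the cost of normalizing the dimension; in a star schema the category name would simply be a column of `dim_product`.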
15. What is ODS?
ODS stands for Operational Data Store. It is a repository of real-time operational data rather than long-term trend data.
16. What is an ER diagram?
ER diagram stands for Entity-Relationship diagram. It illustrates the interrelationships between the entities in a database, showing the structure of each table and the links between the tables.
17. What are the benefits of a data warehouse?
A data warehouse integrates data and stores it historically, so that we can analyze different aspects of the business, such as performance, trends and predictions, over a given time frame, and use the results to improve the efficiency of business processes.
18. What are loops in data warehousing?
In data warehousing, a loop exists when tables are joined in a closed path, so that there is more than one join path between the same tables. When there is a loop, query generation takes more time and the result is ambiguous. It is advisable to avoid loops between tables.
19. How can we load the time dimension?
A time dimension is usually loaded by generating every possible date in the required range, which can be done with a program. For example, 100 years can be represented with one row per day, which is roughly 36,500 rows.
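A program of this kind can be very short. The sketch below generates one row per calendar day using Python's datetime module; the column names and the choice of the years 2000 through 2099 are assumptions for illustration.

```python
from datetime import date, timedelta


def build_date_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),  # e.g. 20240405
            "full_date": current.isoformat(),
            "year": current.year,
            "month": current.month,
            "day": current.day,
            "iso_weekday": current.isoweekday(),  # 1 = Monday ... 7 = Sunday
        })
        current += timedelta(days=1)
    return rows


# 100 years, 2000 through 2099 (the range is an arbitrary choice).
date_dim = build_date_dimension(date(2000, 1, 1), date(2099, 12, 31))
```

The resulting list can then be bulk-inserted into the dimension table. Further attributes such as quarter, fiscal period or holiday flags would be added the same way.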
20. What needs to be done when starting the database?
The following steps are performed to start the database:
☛ Start an Instance
☛ Mount the database
☛ Open the database
21. What is a load manager?
A load manager performs the operations required to extract and load data. Its size and complexity vary from one data warehouse solution to another.
22. What tools are available for ETL?
The following ETL tools are available:
☛ Informatica
☛ DataStage
☛ Oracle Warehouse Builder
☛ Ab Initio
☛ Data Junction
23. What is a core dimension?
A core dimension is a dimension table that is dedicated to a single fact table or data mart.
24. What is the difference between OLTP and OLAP?
The differences between OLTP and OLAP are as follows:
OLTP:
Data is from original data source
Simple queries by users
Normalized small database
Fundamental business tasks
OLAP:
Data is from various data sources
Complex queries by system
De-normalized Large Database
Multi-dimensional business tasks
25. Why is a fact table useful in representing data?
A fact table holds the measurements, that is, the values of the facts. It contains foreign keys that reference the primary keys of the dimension tables, and it sits at the center of a star schema or snowflake schema. Its values are typically additive, and they are analyzed through the attributes of the dimensions. The grain of the table defines the atomic level of data at which each fact is recorded, and each record represents an independent fact that can be aggregated into higher-level data for the user. A fact table is useful for representing data because it is easy to store and takes little memory: each row needs only keys and measures.
26. What are the reasons for partitioning?
Partitioning is done for several reasons: easier management, easier backup and recovery, and better performance.
27. What is ETL?
ETL stands for Extract, Transform and Load. ETL software reads data from a specified data source and extracts a desired subset of it. Next, it transforms the data using rules and lookup tables to convert it to the desired state. Finally, it loads the result into the target database.
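The flow described above can be sketched end to end with the Python standard library. Everything specific here is an assumption for illustration: the in-memory CSV source, the rule "keep only shipped orders", the country lookup table and the target table `orders`.

```python
import csv
import io
import sqlite3

# Hypothetical source: a small CSV export held in memory.
SOURCE_CSV = """order_id,country,amount,status
1,IN,100,shipped
2,US,250,cancelled
3,IN,75,shipped
"""

COUNTRY_LOOKUP = {"IN": "India", "US": "United States"}  # assumed lookup table


def extract(text):
    """Read the source and keep only the desired subset (shipped orders)."""
    return [row for row in csv.DictReader(io.StringIO(text)) if row["status"] == "shipped"]


def transform(rows):
    """Apply type rules and the lookup table to reach the target format."""
    return [(int(r["order_id"]), COUNTRY_LOOKUP[r["country"]], float(r["amount"])) for r in rows]


def load(conn, rows):
    """Insert the transformed rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, country TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
```

Keeping the three steps as separate functions mirrors the three letters of the acronym and makes each step easy to test on its own.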
28. What is the difference between a data warehouse and OLAP?
A data warehouse is the place where all the data is stored for analysis, whereas OLAP is used for analyzing that data, managing aggregations and partitioning information into lower levels of detail.
29. What is the definition of a cube in data warehousing?
Cubes are logical representations of multidimensional data. The edges of the cube hold the dimension members, and the body of the cube contains the data values.
30. What is the use of dimensional modeling in data warehousing?
Dimensional modeling is a set of techniques used to design the overall structure of a data warehouse. It does not necessarily involve a relational database: the same logical design can be implemented in different physical forms. It is used to support user queries and to improve both performance and understandability. It uses facts and dimensions to represent the measures and their context. Facts are the values that can be aggregated, and dimensions are the hierarchies and descriptors that define the facts. Dimensional models are built per business process area, and different process areas share design and operational details with one another.
31. How does a data cube help?
A data cube helps us represent data in multiple dimensions. It is defined by dimensions and facts.
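One way to picture this is a dictionary keyed by dimension values. In the sketch below, each fact row is a tuple of three dimension values followed by a measure, and a `roll_up` helper (a name invented for this example) aggregates the measure over whichever dimensions are left out. The dimensions, members and amounts are hypothetical.

```python
from collections import defaultdict

# Hypothetical facts: (product, region, year) are the dimensions, amount is the measure.
facts = [
    ("Rice", "North", 2023, 100),
    ("Rice", "South", 2023, 60),
    ("Wheat", "North", 2023, 40),
    ("Rice", "North", 2024, 120),
]

DIMENSIONS = ("product", "region", "year")


def roll_up(facts, keep):
    """Aggregate the measure over every dimension not named in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # the measure is the last element
    return dict(totals)


by_product = roll_up(facts, ["product"])
by_region_year = roll_up(facts, ["region", "year"])
grand_total = roll_up(facts, [])
```

Each call looks at the same facts along a different combination of dimensions, which is the kind of slicing a cube is meant to make easy.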
32. What are the stages of data warehousing?
Data warehousing involves four stages:
Offline Operational Database: This is the initial stage. The database of an operational system is simply copied to an offline server, so that the reporting load is processed there and does not affect the performance of the operational system.
Offline Data Warehouse: In the second stage, the warehouse is updated from the operational systems on a regular time cycle, such as daily, weekly, monthly or yearly. The data is stored in a report-oriented data structure.
Real Time Data Warehouse: The warehouse is updated on a transaction or event basis, each time the operational system performs a transaction.
Integrated Data Warehouse: This is the final stage. The warehouse is used to generate activity or transactions, which are passed back to the operational systems for use in daily activity.
33. What does a subject-oriented data warehouse signify?
Subject-oriented signifies that the data warehouse organizes its information around particular subjects such as product, customer or sales.
34. Why is dimensional normalization not required?
Dimensional normalization removes redundant attributes from de-normalized dimensions by splitting a dimension into sub-dimensions that are joined together. It is usually not applied because:
☛ It makes the data structure more complex, and performance can degrade because many tables must be joined while keeping their relationships intact.
☛ The space saved is usually too small to justify the extra complexity.
☛ Query performance suffers when aggregating or retrieving many dimensional values, which calls for careful analysis when operational reports are built.
35. What is a dimension?
A dimension is something that qualifies a quantity (measure).
For example, "20kg" on its own does not mean anything. But "20kg of rice (product) is sold to Ramesh (customer) on 5th April (date)" is meaningful. Here product, customer and date are dimensions that qualify the measure, 20kg.
Dimensions are mutually independent. Technically speaking, a dimension is a data element that categorizes each item in a data set into non-overlapping regions.