1. Explain data exploration.

Having identified the business problem, a data analyst has to go through the data provided by the client to analyse the root cause of the problem.

2. Tell me what the elements of an ERD are.

An entity-relationship diagram (ERD) has three elements: the entities about which information is being sought, the attributes of those entities, and the relationships between the entities.

3. Explain data preparation.

This is the most crucial step of the data analysis process, in which any data anomalies (such as missing values or outliers) have to be detected and handled correctly before the data is modelled.

4. Do you know what cardinality is?

In mathematics, cardinality is the number of elements in a set. In the database world, cardinality describes the counts on each side of a relationship: one-to-one, one-to-many, or many-to-many.
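Both senses can be sketched in a few lines of Python. This is only an illustration: the function name relationship_cardinality and the sample pairs are hypothetical, and the function reports the cardinality observed in the data, not the constraint a schema declares.

```python
from collections import defaultdict

# Mathematical sense: the number of distinct elements in a set.
set_size = len({3, 1, 3, 2})  # duplicates collapse, so three elements remain

def relationship_cardinality(pairs):
    """Classify observed (left, right) pairs of a relationship.

    Hypothetical helper for illustration; it inspects the data only.
    """
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)
    # Does any left item relate to several right items, and vice versa?
    left_has_many = any(len(r) > 1 for r in rights_per_left.values())
    right_has_many = any(len(l) > 1 for l in lefts_per_right.values())
    if left_has_many and right_has_many:
        return "many-to-many"
    if left_has_many:
        return "one-to-many"
    if right_has_many:
        return "many-to-one"
    return "one-to-one"

# Made-up sample relationships.
customer_orders = [("alice", "o1"), ("alice", "o2"), ("bob", "o3")]
student_courses = [("sam", "math"), ("sam", "art"), ("kim", "math")]
person_passport = [("alice", "P1"), ("bob", "P2")]
```

Here customer_orders is classified as one-to-many, student_courses as many-to-many, and person_passport as one-to-one.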

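5. Can you describe the differences between the first through fifth normal forms?

Database candidates should be familiar with most if not all of these without needing to look up definitions. In brief: first normal form (1NF) requires atomic column values with no repeating groups; second (2NF) removes partial dependencies of non-key attributes on part of a composite key; third (3NF) removes transitive dependencies of non-key attributes on other non-key attributes; fourth (4NF) removes non-trivial multi-valued dependencies; and fifth (5NF) removes join dependencies that are not implied by the candidate keys. Some of the other normal forms are less commonly known or used, but could theoretically be asked about. Knowing the difference between second and third is probably a good idea.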

6. Tell us what you know about the interquartile range as a data analyst.

The interquartile range (IQR) is a measure of the dispersion of data, and it is the quantity shown by the box in a box plot. It is the difference between the upper quartile (Q3) and the lower quartile (Q1).
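As a minimal sketch, the interquartile range can be computed with Python's standard statistics module. The sample values are made up, and the "inclusive" quantile method is one of several conventions; other tools may report slightly different quartiles for the same data.

```python
import statistics

def interquartile_range(values):
    """Return (q1, q3, iqr) for a sequence of numbers."""
    # With n=4, quantiles() returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3, q3 - q1

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up data
q1, q3, iqr = interquartile_range(sample)
print(q1, q3, iqr)  # 3.0 7.0 4.0
```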

7. Tell me what a database transaction is.

A transaction is a single logical (atomic) unit of work: either all of the operations in its sequence are executed, or none of them are. A transaction has a defined beginning and end. You can commit or roll back a transaction.
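The all-or-nothing behaviour can be sketched with Python's built-in sqlite3 module. The accounts table, the transfer function, and the balances are hypothetical and exist only for this illustration.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move amount between two accounts as one atomic unit of work."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if it raises an exception.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # committed
failed = transfer(conn, "alice", "bob", 500)  # rolled back, nothing changes
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After both calls, only the first transfer is visible: the debit made by the failed transfer is undone by the rollback.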

8. What is Logistic Regression?

Logistic regression is a statistical method for examining a dataset in which one or more independent variables determine an outcome. The outcome is typically binary, such as yes/no or pass/fail.
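Assuming the question refers to logistic regression with a binary outcome, the idea can be sketched in plain Python. The data (hours studied versus pass/fail), the learning rate, and the epoch count are hypothetical choices for illustration; in practice a statistics or machine learning library would normally be used.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for each sample drives both gradients.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Made-up data: one independent variable (hours) and a binary outcome.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(hours, passed)
p_low = sigmoid(w * 2 + b)   # estimated probability of passing after 2 hours
p_high = sigmoid(w * 7 + b)  # estimated probability of passing after 7 hours
```

The fitted model assigns a low probability to the outcome for small values of the independent variable and a high probability for large ones.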

9. Tell us what the important steps in the data validation process are.

Data validation is performed in two steps:

☛ Data Screening: In this step, various algorithms are used to screen the entire dataset for erroneous or questionable values. Such values need to be examined and handled.

☛ Data Verification: In this step, each suspect value is evaluated on a case-by-case basis, and a decision is made whether to accept it as valid, reject it as invalid, or replace it with a substitute value.
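The two steps above can be sketched as follows. Everything here is an assumption made for illustration: the function names screen, verify, and decide, the range limits, the made-up ages, and the decision rules. Real verification usually involves human review or domain-specific rules.

```python
import statistics

def screen(values, low, high):
    """Screening: return the indices of missing or out-of-range values."""
    return [
        i for i, v in enumerate(values)
        if v is None or not (low <= v <= high)
    ]

def verify(values, suspects, decide):
    """Verification: accept, reject, or replace each suspect value."""
    flagged = set(suspects)
    cleaned = []
    for i, v in enumerate(values):
        if i not in flagged:
            cleaned.append(v)
            continue
        action, new_value = decide(v)
        if action == "accept":
            cleaned.append(v)
        elif action == "replace":
            cleaned.append(new_value)
        # Any other action ("reject") drops the value.
    return cleaned

ages = [34, 29, None, 41, 230, 38, 104]  # made-up data
suspect_idx = screen(ages, low=0, high=100)

# Hypothetical substitute value: the median of the unsuspicious ages.
fill = statistics.median(v for v in ages if v is not None and 0 <= v <= 100)

def decide(value):
    """Hypothetical case-by-case rules for this example only."""
    if value is None:
        return ("replace", fill)  # missing: substitute the median
    if value <= 110:
        return ("accept", None)   # unusual but plausible: keep
    return ("reject", None)       # implausible: discard

cleaned = verify(ages, suspect_idx, decide)
```

Screening flags the missing value, 230, and 104; verification then replaces the missing value with the median, rejects 230, and accepts 104.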

10. Tell me the difference between LEFT JOIN and RIGHT JOIN.

A LEFT JOIN returns all records from the left table, even when they have no match in the right table; for such records, the columns from the right table are NULL. Similarly, a RIGHT JOIN returns all records from the right table, even when they have no match in the left table, and the columns from the left table are then NULL.
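A small sketch with Python's built-in sqlite3 module shows both behaviours. The customers and orders tables and their rows are made up. Because older SQLite versions do not support RIGHT JOIN, the second query expresses the equivalent of "customers RIGHT JOIN orders" as a LEFT JOIN with the table order swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Ben');
    -- Order 11 points at a customer id that does not exist.
    INSERT INTO orders VALUES (10, 1, 'book'), (11, 3, 'pen');
""")

# LEFT JOIN: every customer appears, even Ben, who has no order.
left_rows = conn.execute("""
    SELECT c.name, o.item
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id
""").fetchall()

# Equivalent of customers RIGHT JOIN orders: every order appears,
# even order 11, which has no matching customer.
right_rows = conn.execute("""
    SELECT c.name, o.item
    FROM orders AS o
    LEFT JOIN customers AS c ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

print(left_rows)   # [('Ann', 'book'), ('Ben', None)]
print(right_rows)  # [('Ann', 'book'), (None, 'pen')]
```

Python's sqlite3 module represents SQL NULL as None, so the unmatched side of each join shows up as None in the result rows.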
