1. Explain data Exploration?

Having identified the business problem, a data analyst has to go through the data provided by the client to analyse the root cause of the problem.

2. Tell me what are the elements of an ERD?

The three elements include the entities for which someone is seeking information, the attributes of those entities, and the relationships between the entities.

3. Explain data Preparation?

This is the most crucial step of the data analysis process, wherein any anomalies in the data (such as missing values or outliers) have to be detected and handled so that the data can be modelled in the right direction.

4. Do you know what is cardinality?

Thinking mathematically, it is the number of elements in a set. Thinking in the database world, cardinality has to do with the counts in a relationship, one-to-one, one-to-many, or many-to-many.

5. Can you describe the differences in the first through fifth normalization forms?

Database candidates should be familiar with most, if not all, of these without needing to look up definitions. Some of the other normalization forms are less commonly known or used, but could theoretically be asked about. Knowing the difference between second and third normal form is probably a good idea.

6. Tell us what do you know about the interquartile range as a data analyst?

A measure of the dispersion of data that is shown in a box plot is referred to as the interquartile range. It is the difference between the upper and the lower quartile.
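
As a quick illustration, the quartiles and IQR can be computed directly, for example with NumPy (the numbers below are made up purely for illustration):

```python
import numpy as np

# Made-up sample data for illustration only.
data = [3, 7, 8, 5, 12, 14, 21, 13, 18]

q1, q3 = np.percentile(data, [25, 75])  # lower and upper quartiles
iqr = q3 - q1                           # interquartile range
print(q1, q3, iqr)
```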

7. Tell me what is a database transaction?

A transaction is a single logical (atomic) unit of work, in which a sequence of operations (or none) must be executed. A transaction has a defined beginning and end. You can commit or roll back a transaction.
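
A minimal sketch of commit and rollback using Python's built-in sqlite3 module (the table and values are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO accounts VALUES (1, 100.0), (2, 50.0)")
conn.commit()

try:
    # Both updates belong to one atomic unit of work: a transfer.
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
    conn.commit()    # make the whole transfer permanent
except sqlite3.Error:
    conn.rollback()  # undo everything if any step fails
```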

8. What is Logistic Regression?

Logistic Regression can be defined as:

This is a statistical method for examining a dataset in which one or more independent variables determine an outcome, typically a binary one.
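
For instance, here is a minimal sketch using scikit-learn (assuming it is installed; the data below is invented purely for illustration):

```python
from sklearn.linear_model import LogisticRegression

# Independent variable: hours studied; outcome: fail (0) or pass (1).
hours = [[1], [2], [3], [4], [5], [6], [7], [8]]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

model = LogisticRegression()
model.fit(hours, passed)

# The fitted model maps a value of the independent variable to a probability.
print(model.predict_proba([[4.5]]))  # [[P(fail), P(pass)]]
```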

9. Tell us what are the important steps in data validation process?

Data validation is performed in two steps:

☛ Data Screening – In this step, various algorithms are used to screen the entire dataset for erroneous or questionable values. Such values need to be flagged for closer examination (a small sketch follows this list).

☛ Data Verification – In this step, each suspect value is evaluated on a case-by-case basis, and a decision is made whether the value should be accepted as valid, rejected as invalid, or replaced with an alternative value.
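
As a small illustration of the screening step, out-of-range or missing values can be flagged programmatically. The sketch below assumes a hypothetical pandas DataFrame with an "age" column:

```python
import pandas as pd

# Hypothetical data; ages outside 0-120 and missing values are treated as suspects.
df = pd.DataFrame({"age": [34, -2, 56, None, 240]})

suspects = df[df["age"].isna() | (df["age"] < 0) | (df["age"] > 120)]
print(suspects)  # these rows go on to the verification step
```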

10. Tell me what is the difference between LEFT JOIN and RIGHT JOIN?

A LEFT JOIN returns all records from the left table, even when they do not match in the right table. Missing values become NULL. In a similar manner, a RIGHT JOIN returns all records from the right table, even when they do not match those in the left table. Missing values become NULL.
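
A minimal sketch using Python's sqlite3 module (tables and rows are invented for illustration; recent SQLite versions also support RIGHT JOIN, and in older ones the same result is obtained by swapping the table order in a LEFT JOIN):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO departments VALUES (1, 'Sales'), (2, 'IT');
    INSERT INTO employees VALUES (1, 'Ana', 1), (2, 'Ben', NULL);
""")

# LEFT JOIN: every employee is kept; unmatched departments show up as NULL (None).
rows = conn.execute("""
    SELECT e.name, d.name
    FROM employees e
    LEFT JOIN departments d ON e.dept_id = d.id
    ORDER BY e.id
""").fetchall()
print(rows)  # [('Ana', 'Sales'), ('Ben', None)]
```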

11. Tell us what is involved in typical data analysis?

Typical data analysis involves the collection and organization of data, followed by finding correlations between that data and the rest of the company's and the industry's data. It also entails the ability to spot problems and to initiate preventative measures or solve problems creatively.

12. Explain data Modelling?

The modelling step begins once the data has been prepared. Modelling is an iterative process wherein the model is run repeatedly for improvements. Data modelling ensures that the best possible result is found for a given business problem.

13. Tell me what are the criteria to say whether a developed data model is good or not?

☛ The developed model should have predictable performance.
☛ A good data model can adapt easily to any changes in business requirements.
☛ A good data model should scale well when major changes are made to the data.
☛ A good data model is one that can be easily consumed for actionable results.

14. Tell me what are the differences between primary and foreign keys?

The primary key is the column or set of columns used to uniquely identify the rows in a table. A foreign key is a column (or set of columns) in one table that references the primary key of another table, allowing join operations between the two.
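
A minimal sketch of the two kinds of keys, using made-up tables in sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'id' uniquely identifies each customer: the primary key.
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT
    );
    -- 'customer_id' references customers(id): a foreign key that enables joins.
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount      REAL
    );
""")
```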

15. Explain me what is self JOIN?

Self JOIN is a query that joins a table with itself. This is used to compare the values of a particular column with other values in the same column of the same table. Self JOIN uses aliases to name the original and duplicate tables.
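
A common use case is an employee table that stores each employee's manager as another row of the same table. A sketch with invented data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES (1, 'Ana', NULL), (2, 'Ben', 1), (3, 'Cara', 1);
""")

# The same table appears twice under the aliases e (employee) and m (manager).
rows = conn.execute("""
    SELECT e.name, m.name
    FROM employees e
    JOIN employees m ON e.manager_id = m.id
    ORDER BY e.id
""").fetchall()
print(rows)  # [('Ben', 'Ana'), ('Cara', 'Ana')]
```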

16. Do you know what is a subquery?

A subquery is, as the name suggests, a query within another query. The outer query is called the main query, while the inner query is the subquery. You can think of this in terms of composition of functions. A (non-correlated) subquery is executed first, and its result is then passed on to the main query.
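
For example, a non-correlated subquery can compute an aggregate that the main query then filters against (the data is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    INSERT INTO products VALUES (1, 'pen', 2.0), (2, 'desk', 120.0), (3, 'lamp', 45.0);
""")

# The subquery (average price) is evaluated first; the main query uses its result.
rows = conn.execute("""
    SELECT name, price
    FROM products
    WHERE price > (SELECT AVG(price) FROM products)
""").fetchall()
print(rows)  # [('desk', 120.0)]
```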

17. Tell me how often should you retrain a data model?

A good data analyst is the one who understands how changing business dynamics will affect the efficiency of a predictive model. You must be a valuable consultant who can use analytical skills and business acumen to find the root cause of business problems.

The best way to answer this question would be to say that you would work with the client to define a time period in advance. However, I would also refresh or retrain a model when the company enters a new market, consummates an acquisition or faces emerging competition. As a data analyst, I would retrain the model as quickly as possible to adjust to the changing behaviour of customers or changing market conditions.

18. Tell me what is Data Cleansing?

When answering this question, you should know that the definition of data cleansing is:

Data cleansing (also known as data cleaning) involves a data analyst discovering and eliminating errors and irregularities from the database to enhance data quality.

19. Tell us what is the difference between Data Mining and Data Profiling?

Data Profiling, also referred to as Data Archeology is the process of assessing the data values in a given dataset for uniqueness, consistency and logic. Data profiling cannot identify any incorrect or inaccurate data but can detect only business rules violations or anomalies. The main purpose of data profiling is to find out if the existing data can be used for various other purposes.

Data Mining refers to the analysis of datasets to find relationships that have not been discovered earlier. It focusses on sequenced discoveries or identifying dependencies, bulk analysis, finding various types of attributes, etc.

20. Tell us what was your most difficult data analyst project?

With a question like this, you glean insight into how candidates approach and solve problems. It also gives you a better idea of the type of work they have done.

What to look for in an answer:

☛ Explanation of how challenge(s) were overcome
☛ Lack of blaming others
☛ Discussion of why the project was difficult

21. Do you know what Is INNER JOIN?

INNER JOIN is the most common type of join. It returns all rows for which the join condition is satisfied in both tables. Visually, it's analogous to identifying the overlap, or intersection, between two sets of data.
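
A sketch with the same kind of made-up tables used earlier; only employees that actually match a department are returned:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO departments VALUES (1, 'Sales'), (2, 'IT');
    INSERT INTO employees VALUES (1, 'Ana', 1), (2, 'Ben', NULL);
""")

# INNER JOIN keeps only the intersection: rows with a match on both sides.
rows = conn.execute("""
    SELECT e.name, d.name
    FROM employees e
    INNER JOIN departments d ON e.dept_id = d.id
""").fetchall()
print(rows)  # [('Ana', 'Sales')] -- Ben has no department, so he is dropped
```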

22. Explain me what are the responsibilities of a Data Analyst?

To answer this question, you need to know that such responsibilities include:

☛ Interpret data and analyze results using statistical techniques, and provide reports.
☛ Identify new areas or processes where there are opportunities for improvement.
☛ Acquire data from various sources (primary and secondary) and maintain the data systems.
☛ Filter and clean data from various sources and review computer reports.
☛ Support all aspects of data analysis and make sure customers and staff relate well.

23. Tell us what are the best practices for data cleaning?

☛ Separate the data according to its attributes.
☛ In the case of massive datasets, cleanse stepwise and improve the data with every step until the data quality is good.
☛ For common data cleansing tasks, build a set of scripts, for example one that blanks out every value not matching an expected regex (see the sketch after this list).
☛ Analyse the summary statistics of every column.
☛ Keep a record of all cleaning operations, so that changes can be made when necessary.
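
A minimal sketch of the regex idea from the list above: values that do not match an expected pattern are blanked out (the pattern and data are made up for illustration):

```python
import re

# Hypothetical phone-number column; anything not matching ddd-dddd is blanked out.
pattern = re.compile(r"^\d{3}-\d{4}$")
raw = ["555-1234", "n/a", "5551234", "555-9876"]

cleaned = [value if pattern.match(value) else None for value in raw]
print(cleaned)  # ['555-1234', None, None, '555-9876']
```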

24. Tell us what is the GROUP BY statement used for?

GROUP BY is a statement that divides the result for a particular query into groups of rows. It is often used with aggregate functions such as SUM, AVG, MAX, MIN, and COUNT, which calculate information about each group. The SELECT statement returns one row for each group.
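
A sketch with invented sales data, returning one row per group:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('North', 100), ('North', 150), ('South', 80);
""")

# One output row per region, with aggregates computed within each group.
rows = conn.execute("""
    SELECT region, COUNT(*), SUM(amount)
    FROM sales
    GROUP BY region
    ORDER BY region
""").fetchall()
print(rows)  # [('North', 2, 250.0), ('South', 1, 80.0)]
```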

25. Explain what is a view?

A view is a virtual table that consists of a subset of data from a table. The content of a view is defined by a query. A view takes up very little space because it stores the defining query rather than a copy of the data. Note that a view can also display a combination of data from one or more tables. Views allow you to hide the complexity of large datasets and instead focus on the areas of interest.
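
A sketch of defining and querying a view over made-up data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'Ana', 20.0), (2, 'Ben', 500.0);

    -- The view stores only this query, not a copy of the data.
    CREATE VIEW large_orders AS
        SELECT customer, amount FROM orders WHERE amount > 100;
""")

print(conn.execute("SELECT * FROM large_orders").fetchall())  # [('Ben', 500.0)]
```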

26. Tell us what is your process when you start a new project?

This question lets you measure candidates' organizational skills and how well they plan ahead. It also gives you an opportunity to see if candidates' leadership or work styles are compatible with your company culture.

What to look for in an answer:

☛ Clear steps
☛ Deliberate process
☛ Consideration of deadline

27. Tell me what are your communication strengths?

My greatest communication strength would have to be my ability to relay information. I'm good at speaking in a simple, yet effective manner so that even people who aren't familiar with the terms can grasp the overall concepts. I think communication is extremely valuable in a role like this, specifically when presenting my findings. This is even more important when those findings could be beneficial or detrimental to other departments within the company and you need to make sure that everyone understands the overall message.

28. Tell me what steps are in an analytics project?

The steps involved in an analysis project can be listed as:

☛ Problem identification
☛ Exploration of data
☛ Preparation of data
☛ Modeling
☛ Data Validation
☛ Implementation and tracking

29. Tell me the difference between data profiling and data mining?

The difference between data profiling and data mining is:

Data profiling is aimed at analyzing individual attributes. It yields information on attributes such as their discrete values, value ranges, data types, frequency and length. Data mining, on the other hand, targets the detection of unusual records, cluster analysis, sequence discovery and so on.

30. Tell me in what situations should you use WHERE and HAVING in a statement?

Though both WHERE and HAVING are used to filter records, there is a subtle difference between the two. The WHERE clause is used to filter records from a result, whereas HAVING is used to filter groups. If the two are used together, the WHERE clause is applied first to filter rows; only after grouping finishes is the HAVING clause applied.
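
A sketch showing both clauses in one statement (invented data): WHERE drops individual rows before grouping, and HAVING drops whole groups after aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('North', 100), ('North', 150), ('South', 80), ('South', -5);
""")

rows = conn.execute("""
    SELECT region, SUM(amount)
    FROM sales
    WHERE amount > 0          -- filters rows before grouping
    GROUP BY region
    HAVING SUM(amount) > 100  -- filters groups after aggregation
""").fetchall()
print(rows)  # [('North', 250.0)]
```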

31. Tell me what do data analysts do?

This question is basic but serves an essential function. It weeds out the candidates who lack a rudimentary understanding of data analysis. It also lets you compare how well various candidates understand data analysis.

What to look for in an answer:

☛ Coverage of each step
☛ Mention of soft skills, such as communication
☛ Discussion of how data analysts benefit a company

32. Can you tell us what has been your most difficult analysis to date?

My biggest challenge was making sales predictions during a recession and estimating financial losses for the upcoming quarter. Interpreting the information was a seamless process. However, it was slightly difficult to forecast future trends when the market fluctuated frequently. Usually I analyze and report on events that have already occurred. In this case, I had to research how receding economic conditions impacted varying income groups and then make an inference on the purchasing capacity of each group.

33. Suppose you are assigned a new data analytics project. How will you begin, and what are the steps you will follow?

The purpose of asking this question is that the interviewer wants to understand how you approach a given data problem and what thought process you follow to ensure that you stay organized. You can start answering this question by saying that you will begin by finding the objective of the given problem and defining it, so that there is a solid direction on what needs to be done. The next step would be to do data exploration and familiarise yourself with the entire dataset, which is very important when working with new data. The next step would be to prepare the data for modelling, which would include finding outliers, handling missing values and validating the data. Having validated the data, I would start data modelling until I discover meaningful insights. After this, the final step would be to implement the model and track the output results.

This is the generic data analysis process explained in this answer; however, the answer might change slightly based on the kind of data problem and the tools available at hand.

34. Tell us what are the requirements needed for becoming a data analyst?

This is one of the most commonly asked data analyst interview questions. Listed below are the requirements needed for becoming a data analyst:

☛ Sound knowledge of the statistical packages used for analyzing large datasets, such as Excel, SAS, SPSS and others
☛ Very good knowledge of programming languages and data formats (e.g. JavaScript or XML), ETL frameworks, reporting packages (Business Objects) and databases
☛ Strong technical knowledge in areas like data models, segmentation techniques, data mining and database design
☛ Good skills in the accurate collection, organization, analysis and dissemination of big data

35. Tell me what is CROSS JOIN?

CROSS JOIN defines a Cartesian product on the sets of records from two or more joined tables, where the number of rows in the first table is multiplied by the number of rows in the second table.
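
A sketch with two small made-up tables; 2 sizes and 2 colours yield 2 x 2 = 4 combinations:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sizes (size TEXT);
    CREATE TABLE colours (colour TEXT);
    INSERT INTO sizes VALUES ('S'), ('L');
    INSERT INTO colours VALUES ('red'), ('blue');
""")

# Every size is paired with every colour: the Cartesian product.
rows = conn.execute("SELECT size, colour FROM sizes CROSS JOIN colours").fetchall()
print(rows)  # 4 (size, colour) pairs
```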