1. Do you know what white-box testing is?
White-box testing (also known as clear box testing, glass box testing, transparent box testing, and structural testing) is a method of testing software that tests internal structures or workings of an application, as opposed to its functionality (i.e. black-box testing).
2. Tell us what black-box testing is.
Black-box testing is a method of software testing that examines the functionality of an application without peering into its internal structures or workings. This method of testing can be applied to virtually every level of software testing: unit, integration, system and acceptance.
3. What is Data Mapping?
In computing and data management, data mapping is the process of creating data element mappings between two distinct data models. Data mapping is used as a first step for a wide variety of data integration tasks, including data transformation or data mediation between a data source and a destination.
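As a minimal sketch of the idea, the snippet below maps fields of a source record onto a target model. The field names and the `FIELD_MAP` table are hypothetical, chosen only for illustration.

```python
# Hypothetical mapping from source field names to target field names.
FIELD_MAP = {
    "cust_name": "customer_name",
    "dob": "date_of_birth",
    "zip": "postal_code",
}


def map_record(source: dict) -> dict:
    """Rename mapped source fields to their target names; unmapped fields are dropped."""
    return {target: source[src] for src, target in FIELD_MAP.items() if src in source}
```

A real integration task would usually add type conversion and validation on top of this renaming step.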
4. Explain what Data Integration is.
► The process of combining data from different sources.
► The combined data is provided to the users in a unified view.
► Information from different enterprise domains is integrated - known as Enterprise Information Integration.
► Useful for merging information from different technologies among enterprises.
► The sub areas of data integration are
1. Data Warehousing.
2. Data Migration.
3. Master Data Management.
5. What is a Business Plan?
A business plan is a formal statement of business goals, reasons they are attainable, and plans for reaching them. It may also contain background information about the organization or team attempting to reach those goals.
6. What are the benefits of data integration?
Following are the benefits of data integration:
► Makes reporting, monitoring, and placing customer information across the enterprise flexible and convenient.
► Data usage is efficient.
► Cost Effective.
► Supports risk-adjusted profitability management, as it allows accurate data extraction.
► Allows timely and reliable reporting, as data quality is central to meeting business challenges.
7. Tell me what Risk Management is.
Risk management is the identification, analysis, assessment, control, and avoidance, minimization, or elimination of unacceptable risks. An organization may use risk assumption, risk avoidance, risk retention, risk transfer, or any other strategy (or combination of strategies) in proper management of future events.
8. Describe Physical Data Integration.
► Physical Data Integration is all about creating a new system that replicates data from the source systems.
► This is done to manage the data independently of the original system.
► A data warehouse is an example of Physical Data Integration.
► The benefits of PDI include data version management and the combination of data from various sources, such as mainframes, flat files, and databases.
► A separate system is needed for handling vast data volumes.
9. What is Use Case and Test Case?
Use Case Testing is a functional black-box testing technique that helps testers identify test scenarios that exercise the whole system, transaction by transaction, from start to finish.
10. Explain the Data Integration hierarchy.
The DI hierarchy is as follows:
► Project->JOB->WorkFlow->DataFlow.
► WorkFlow also has scripts.
► Source, Query, Target are under Data Flow and known as Transformations.
► The usage count specifies how many times a workflow, data flow, file, or table is used.
► Objects can be used more than once in Data Integration. These objects are known as reusable objects.
11. What are SDLC methodologies?
To manage the complexity of software development, a number of SDLC models or methodologies have been created, such as "waterfall"; "spiral"; "Agile software development"; "rapid prototyping"; "incremental"; and "synchronize and stabilize". SDLC can be described along a spectrum of agile to iterative to sequential.
12. What is UML?
The Unified Modeling Language (UML) is a general-purpose, developmental modeling language in the field of software engineering that is intended to provide a standard way to visualize the design of a system.
13. What is History Preserving?
► History Preserving produces a new row in the target instead of updating the existing row.
► You indicate the columns whose changes are to be preserved.
► New rows are created when the value of one of those columns changes.
► Each of these rows is flagged as UPDATE.
► The UPDATE flag is applied for the input data set.
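The behaviour described above can be sketched in a few lines. This is an illustration of the general idea only, not Data Integrator's actual implementation; the function name, its parameters, and the list-of-dicts row format are all assumptions.

```python
def preserve_history(target, incoming, key, tracked):
    """Return target rows plus a new row for every change in a tracked column.

    Existing rows are never updated in place (hypothetical helper).
    """
    latest = {}
    for row in target:
        latest[row[key]] = row  # the last row seen per key is the current version

    result = list(target)
    for row in incoming:
        current = latest.get(row[key])
        changed = current is None or any(current[col] != row[col] for col in tracked)
        if changed:
            result.append(dict(row))  # add a new version instead of overwriting
            latest[row[key]] = row
    return result
```

For example, if the target holds `{"id": 1, "city": "Paris"}` and the input carries `{"id": 1, "city": "Rome"}`, the result keeps both rows, so the earlier value remains available.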
14. What is a shell script?
A shell script is a set of shell commands with some programming constructs, e.g. if and for loops, which allows you to automate repetitive tasks. For example, you can write a shell script for daily cleanup of log files, for backing up data for historical use, and for other housekeeping jobs, releases, and monitoring.
15. What is the Open closed design principle?
Open closed is another principle from SOLID, which asserts that a system should be open for extension but closed for modification. This means that if new functionality is required in a stable system, your tried and tested code should not be touched, and the new functionality should be provided by adding new classes only.
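A small sketch may make this concrete. The `Shape`, `Rectangle`, `Circle`, and `total_area` names are hypothetical; the point is that `Circle` is added later without editing `total_area` or `Rectangle`.

```python
import math
from abc import ABC, abstractmethod


class Shape(ABC):
    """Stable abstraction that existing code depends on."""

    @abstractmethod
    def area(self) -> float:
        ...


class Rectangle(Shape):
    def __init__(self, width: float, height: float) -> None:
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height


def total_area(shapes) -> float:
    """Tried and tested code: it never needs to change for new shapes."""
    return sum(shape.area() for shape in shapes)


class Circle(Shape):
    """New functionality arrives as a new class, not as an edit to old code."""

    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2
```

Calling `total_area([Rectangle(2, 3), Circle(1)])` works even though `total_area` was written before `Circle` existed.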
16. What is Hierarchy Flattening?
► Constructing a hierarchy of parent/child relationships is known as Hierarchy Flattening.
► A description of hierarchy in the vertical or horizontal format is produced.
► The hierarchy pattern includes Parent column, Child Column, Parent Attributes and Child Attributes.
► Hierarchy Flattening makes the basic hierarchy of BI easier to understand.
► As the flattening is done in horizontal or vertical format, the sub elements are easily identified.
17. What is a regular expression?
A regular expression is a way to perform pattern matching on text data. It's a very powerful tool for finding something in text, e.g. a character in a long string, or whether a book contains some word. Almost all major programming languages support regular expressions, but Perl is renowned for its extensive capability. Java also supports Perl-like regular expressions through the java.util.regex package. You can use a regular expression to check whether an email address, a phone number, a zip code, or even an SSN is valid. One of the simplest examples of a regular expression is checking whether a String is a number.
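The number check mentioned above might look like the following sketch, written with Python's `re` module rather than java.util.regex. The pattern is one possible definition of "number" (optional sign, digits, optional decimal part) and is an assumption, not the only valid choice.

```python
import re

# Optional sign, one or more digits, optional fractional part.
NUMBER = re.compile(r"[+-]?\d+(\.\d+)?")


def is_number(text: str) -> bool:
    """Return True only if the whole string matches the number pattern."""
    return NUMBER.fullmatch(text) is not None
```

Using `fullmatch` instead of `search` matters here: `search` would accept "12a" because it contains digits.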
18. Explain Pivot - Columns to Rows.
► Data Integrator produces a row in the output data set for every value in the designated pivot column.
► More than one pivot column can be set as per the need of application's data integration.
► Pivot Sequence Column - Data Integrator increments a sequence number for every row created from a pivot column.
► Non-Pivot column - The columns that need to appear in the target.
► Pivot Set - A group of pivot columns, unique data field and header column.
► Data Field Column - It contains the pivot data along with pivot columns values.
► Header Column - Lists the name of the columns.
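The terms above can be illustrated with a short sketch that turns pivot columns into rows. It is not Data Integrator's implementation; the function name, the output column names (`seq`, `header`, `value`), and the choice to restart the sequence for each input row are assumptions.

```python
def pivot_columns_to_rows(rows, pivot_columns,
                          seq_name="seq", header_name="header", data_name="value"):
    """Emit one output row per pivot column of every input row (hypothetical helper)."""
    output = []
    for row in rows:
        # Non-pivot columns are carried into every output row unchanged.
        kept = {k: v for k, v in row.items() if k not in pivot_columns}
        for seq, column in enumerate(pivot_columns, start=1):
            output.append({
                **kept,
                seq_name: seq,            # pivot sequence column
                header_name: column,      # header column: name of the pivoted column
                data_name: row[column],   # data field column: the pivoted value
            })
    return output
```

An input row such as `{"emp": "A", "q1": 10, "q2": 20}` with pivot columns `["q1", "q2"]` yields two rows, one per quarter.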
19. What is Push Back from Business Users?
Often the biggest issue with business users is their time availability. With a good Project Manager and leadership, there should be collaboration between them, your business users, and their management to ensure the time is available; this should be arranged well before the testing period begins. The Project Manager should also be able to define the length of time users will be required to spend testing. As a Business Analyst, I realize the business user's time is very limited. Throughout the requirements lifecycle I try to provide mockups and/or prototypes of everything, get feedback, and make changes where necessary, so that when the business users begin testing it isn't so time consuming and isn't the first time they are seeing the changes. Developing a Test Plan is required to set expectations at the time development begins.
20. Explain Data Integrator Metadata Reports.
► Browser-based analysis and reporting capabilities are provided by Metadata Reports.
► The DI Metadata Reports are generated on metadata associated with:
1. Data Integration jobs.
2. Other BO applications that are associated with Data Integration.
► Three modules are provided by Metadata Reports. They are:
1. Operational Dashboards.
2. Auto Documentation.
3. Impact and Lineage analysis.
21. How do you find large files in UNIX e.g. more than 1GB?
You can easily find big files by using the find command because it provides an option to search for files based on their size. Use this if your file system is full and your Java process is crashing because no space is left. This command lists all files larger than 1GB. You can tweak the size easily, e.g. to find all files larger than 100 MB just use +100M.
find . -type f -size +1G -print
22. How is the SNMP Agent associated with Data Integrator?
► Applications that communicate error events are best supported through the SNMP Agent.
► Errors are monitored better using SNMP.
► The DI SNMP Agent needs to be installed on any Job Server.
► Job Server information is recorded by the DI SNMP Agent while jobs are running.
► Network Management Software (NMS) needs to be configured for applications to communicate with the DI SNMP Agent.
► NMS applications can then monitor the status of Data Integrator jobs.
23. What is test-driven development?
Test-driven development is a popular methodology in which tests are written before any functional code. In effect, the tests drive the structure of your program. Purists never write a single line of application code without first writing a test for it. It greatly improves code quality and is often cited as a trait of rockstar developers.
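A tiny sketch of the test-first order, using a hypothetical `leap_year` function: the test is written first and only then the code that makes it pass.

```python
# Step 1: write the test first. Running it before leap_year exists would fail.
def test_leap_year():
    assert leap_year(2024)
    assert not leap_year(2023)
    assert not leap_year(1900)  # century years are not leap years...
    assert leap_year(2000)      # ...unless divisible by 400


# Step 2: write just enough application code to make the test pass.
def leap_year(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Step 3: run the test; it now passes silently.
test_leap_year()
```

In practice a test runner such as unittest or pytest would collect and run the test; the direct call here just keeps the sketch self-contained.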
24. Explain what Uniform Data Access Integration is.
► UDAI leaves the data in the source systems.
► A set of views is defined to give clients / customers access to the unified view.
► Data can be propagated from the source system with zero latency.
► The consolidated data does not require separate storage space.
► Data history and version management is limited and applies only to similar types of data.
► User access to the data puts extra load on the source systems.
25. Can you give a practical example of a recursive algorithm?
There are lots of places where a recursive algorithm fits, e.g. algorithms related to binary trees and linked lists. A couple of examples of recursive algorithms are reversing a String and calculating the Fibonacci series. Other examples include reversing a linked list, tree traversal, and the quick sort algorithm.
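Two of the examples named above, sketched recursively. These are deliberately simple versions; the naive Fibonacci shown here is exponential in time and only suitable for small inputs.

```python
def reverse(text: str) -> str:
    """Reverse a string: reverse the tail, then append the first character."""
    if len(text) <= 1:  # base case: empty or single-character string
        return text
    return reverse(text[1:]) + text[0]


def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number, with fibonacci(0) == 0 and fibonacci(1) == 1."""
    if n < 2:  # base cases
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)
```

Each function has a base case that stops the recursion and a recursive case that works on a smaller input, which is the structure interviewers usually look for.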
26. How is Full Outer Join implemented in BODI? Explain with examples.
► Full Outer Join is implemented by using SQL Transformation and writing a custom query.
► The following example shows a SQL Transformation that implements Full Outer Join:
select emp.*, dept.deptname, dept.deptno dno, dept.location from scott.employee emp
FULL OUTER JOIN
scott.department dept on (emp.deptno = dept.deptno) ;
► The following steps illustrate an alternative that implements Full Outer Join with query transforms:
1. Drag the EMPLOYEE and DEPARTMENT tables in as sources.
2. Place the query transform for performing the Left Outer Join.
3. Place one more query transform for performing the Right Outer Join.
4. Merge and load them into the target.
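The merge approach in the steps above can be sketched outside BODI as follows. This is an illustration only: the function name and row format are assumptions, department keys are assumed unique, and the right-side pass keeps only unmatched departments so that matched rows are not duplicated by the merge.

```python
def full_outer_join(employees, departments, key="deptno"):
    """Emulate a full outer join as a left outer join plus unmatched right-side rows."""
    dept_by_key = {dept[key]: dept for dept in departments}  # assumes unique keys
    matched = set()
    joined = []

    # Left outer join: every employee, with its department or None.
    for emp in employees:
        dept = dept_by_key.get(emp[key])
        if dept is not None:
            matched.add(emp[key])
        joined.append((emp, dept))

    # Right side: departments that no employee referenced.
    for dept in departments:
        if dept[key] not in matched:
            joined.append((None, dept))
    return joined
```

Each result pair is `(employee, department)`, with `None` standing in for the NULL side of an unmatched row.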
27. Describe how to tune the performance of Data Integrator.
Following are the ways to do this:
► Using array fetch size.
► Ordering the joins.
► Minimizing extracted data.
► Minimizing locale conversion.
► Setting target-based options to optimize the performance.
► Improving throughput.
► Minimizing data type conversion.
28. Explain what a Traceability Matrix is.
A traceability matrix is a document, usually in the form of a table, that correlates any two baselined documents that require a many-to-many relationship to determine the completeness of the relationship.
29. What is the result of 1 XOR 1?
The answer is zero, because XOR returns 1 if the two operands are different and zero if they are the same. For example, 0 XOR 0 is also zero, but 0 XOR 1 or 1 XOR 0 is always 1.
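The full truth table can be checked with Python's bitwise XOR operator `^`. The `xor_table` helper is a hypothetical name used only for this sketch.

```python
def xor_table() -> dict:
    """Map every pair of single-bit operands to the result of a XOR b."""
    return {(a, b): a ^ b for a in (0, 1) for b in (0, 1)}


for (a, b), result in xor_table().items():
    print(f"{a} XOR {b} = {result}")
```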
30. What is System Design Document (SDD)?
A software design description (SDD) is a written description of a software product, that a software designer writes in order to give a software development team overall guidance to the architecture of the software project. An SDD usually accompanies an architecture diagram with pointers to detailed feature specifications of smaller pieces of the design. Practically, the description is required to coordinate a large team under a single vision, needs to be a stable reference, and outline all parts of the software and how they will work.
31. How do we measure progress in Data Integration?
Look for the existence of the following items:-
► Generic Data Models
► An Enterprise Data Platform
► Identify the Data Sources
► Selection of an MDM Product
► Implementation of a Customer Master Index or appropriate alternative
32. Write SQL query to find second highest salary in employee table?
This is one of the classic questions from SQL interviews. Even though it's quite old, it is still interesting and has lots of follow-ups you can use to check the depth of a candidate's knowledge. You can find the second highest salary by using a correlated or non-correlated subquery. You can also use keywords like TOP or LIMIT if you are using SQL Server or MySQL, provided the interviewer allows it. The simplest way to find the 2nd highest salary is the following:
SELECT MAX(Salary) FROM Employee WHERE Salary NOT IN (SELECT MAX(Salary) FROM Employee)
This query first finds the maximum salary, excludes it from the list, and then finds the maximum salary again. The second maximum is the second highest salary.
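To try the query without a database server, here is a sketch that runs it against an in-memory SQLite table from Python. The table layout (`Name`, `Salary`) and the `second_highest` helper are assumptions made for the demonstration.

```python
import sqlite3

QUERY = """
SELECT MAX(Salary) FROM Employee
WHERE Salary NOT IN (SELECT MAX(Salary) FROM Employee)
"""


def second_highest(salaries):
    """Load salaries into a throwaway Employee table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Employee (Name TEXT, Salary INTEGER)")
        conn.executemany(
            "INSERT INTO Employee VALUES (?, ?)",
            [(f"emp{i}", salary) for i, salary in enumerate(salaries)],
        )
        (result,) = conn.execute(QUERY).fetchone()
        return result  # None when there is no second distinct salary
    finally:
        conn.close()
```

Note that ties for the top salary are all excluded, so with salaries 100, 300, 200, 300 the query returns 200.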
This tells me a lot about what they know, what they value, what actual positions they've held on a team, and whether they actually think about what they're doing.
34. What is RUP, Rational Unified Process, implementation?
The Rational Unified Process (RUP) is an iterative software development process framework. RUP is not a single concrete prescriptive process, but rather an adaptable process framework, intended to be tailored by the development organizations and software project teams that will select the elements of the process that are appropriate for their needs. RUP is a specific implementation of the unified process.
35. What are the factors that are addressed to integrate data?
Following are the data integration factors:
► The subset of the available data should be optimal.
► Noise/distortion levels estimated from the sensory/processing conditions at the time of data collection.
► Accuracy, and spatial and spectral resolution of the data.
► Data formats, storage and retrieval mechanisms.
► Efficiency of computation for integrating data sets to reach the goals.