Operational Pathology Interview Questions & Answers:

1. Artificial intelligence and data-mining systems often use "training data sets" and "test data sets." Define these terms and describe briefly how these data sets are used.

Training data sets are presented to a system first so that it learns to produce correct responses. Test data sets have the same structure as the training sets but contain separate records that the system has not seen, and are used to verify the system's performance.
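The split described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_train_test`, the 25% test fraction, and the integer "records" are assumptions made for the example, not part of any particular system.

```python
import random

def split_train_test(records, test_fraction=0.25, seed=0):
    """Shuffle the records and split them into disjoint training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test shuffled records are held out; the rest are used for training.
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(100))  # stand-in for 100 patient records
train, test = split_train_test(records)
print(len(train), len(test))  # 75 25
```

The key property is that no record appears in both sets, so performance on the test set reflects how the system handles data it was not trained on.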

2. Rule-based systems underlie most clinical event monitors (programs that detect important clinical events and notify appropriate medical personnel). Often these systems work in conjunction with data from the clinical pathology LIS. What aspects of clinical pathology make a rule-based system a reasonable approach?

Clinical laboratory databases consist of many discrete test results that have known reference ranges and critical values. Well-established patterns of these results are known to be related to important clinical conditions. Writing rules that detect and alert to these patterns is straightforward.
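A minimal sketch of such a rule might look like the following. The limits in `CRITICAL_LIMITS` are illustrative placeholders only (not clinical guidance), and the function name `check_result` is hypothetical; a real event monitor would take its thresholds from the laboratory's own policies.

```python
# Illustrative critical limits only; real values come from laboratory policy.
CRITICAL_LIMITS = {
    "potassium": (2.5, 6.5),  # mmol/L (assumed for the example)
    "glucose": (40, 500),     # mg/dL (assumed for the example)
}

def check_result(test_name, value):
    """Return an alert message if the value is outside its critical limits, else None."""
    limits = CRITICAL_LIMITS.get(test_name)
    if limits is None:
        return None  # no rule defined for this test
    low, high = limits
    if value < low:
        return f"ALERT: {test_name} critically low ({value})"
    if value > high:
        return f"ALERT: {test_name} critically high ({value})"
    return None

print(check_result("potassium", 7.1))  # ALERT: potassium critically high (7.1)
print(check_result("potassium", 4.0))  # None
```

Because each result is a discrete value with a known limit, the rule is a simple comparison, which is what makes the rule-based approach practical here.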

3. You are working with an intensive care unit (ICU) attending physician on a project to see if you can predict readmission for patients with pancreatitis. You have access to a large database of ICU data (such as cardiac catheter values, vital signs, and respiratory parameters), as well as all of the data that can be gleaned from the LIS. There are approximately 800 measurements of various types for each of 4000 patients. You do not really have any specific ideas about what values would be most predictive; in fact, you think it is likely that the predictors are highly complex combinations of factors. Which of the 3 types of artificial intelligence systems would be most appropriate for this problem, and why?

A neural network is most appropriate, because there is no prior knowledge to allow selection of predictors, the relative weighting of predictors is unknown, a large data set of many discrete potential predictors is available, combinations of predictors may provide better discrimination than individual predictors, and the desired classification is binary (readmission likely or unlikely).

4. What is the Arden syntax?

The Arden syntax is a standard language and format for representing the medical knowledge and algorithms required for making medical decisions. It is used in medical decision support systems.

5. How are neural networks different from Bayesian belief networks along the following dimensions:
(1) inspectability of knowledge,
(2) need for probabilities acquired from "domain" experts,
(3) need for data to train the system, and
(4) ability of the system to make classifications based on input data. (Note: You may find it helpful to make a 2 × 4 table and include a short phrase or two in each cell.)

Bayesian belief networks are inspectable, known probabilities are required, training data are not needed, and they can classify into multiple categories.

Neural networks are not inspectable, they do not need domain expertise or known probabilities, training data are required, and they are best for a binary classification ("yes" or "no").
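To illustrate why a Bayesian belief network needs known probabilities and why its knowledge is inspectable, here is a sketch of the simplest possible case: one disease node and one test node, combined with Bayes' theorem. The prior and the two conditional probabilities are made-up numbers chosen for the example.

```python
def posterior(prior, p_pos_given_disease, p_pos_given_healthy):
    """Return P(disease | positive test) using Bayes' theorem."""
    # Total probability of a positive test across diseased and healthy patients.
    p_pos = p_pos_given_disease * prior + p_pos_given_healthy * (1 - prior)
    return p_pos_given_disease * prior / p_pos

# Assumed values: 1% prevalence, 90% sensitivity, 5% false-positive rate.
p = posterior(prior=0.01, p_pos_given_disease=0.90, p_pos_given_healthy=0.05)
print(f"P(disease | positive) = {p:.3f}")  # P(disease | positive) = 0.154
```

Every number that drives the result is visible and has to be supplied in advance, which contrasts with a neural network, where the equivalent information is learned from training data and spread across its weights.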

6. Most of the artificial intelligence systems we discussed rely on some kind of knowledge representation, with the notable exception of neural networks. Where is the "knowledge" in a neural network stored?

In the weightings between the nodes, or "neurons."
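A single artificial neuron makes this concrete. In the sketch below (hand-set weights chosen for illustration, not learned from data), the same code behaves as a logical AND or a logical OR depending only on the weights and bias it is given.

```python
def neuron(inputs, weights, bias):
    """Fire (return 1) if the weighted sum of the inputs plus the bias is positive."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

# Identical code, different "knowledge": only the numbers change.
and_weights, and_bias = [1.0, 1.0], -1.5
or_weights, or_bias = [1.0, 1.0], -0.5

for inputs in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(inputs,
          neuron(inputs, and_weights, and_bias),
          neuron(inputs, or_weights, or_bias))
```

In a trained network these numbers are set by the training process rather than by hand, which is why the stored knowledge is hard to inspect.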

7. Briefly state what an "entity-relationship" diagram is useful for.

An entity-relationship diagram is a way of illustrating the structure of a relational database in a simple format. It displays the primary "entities" (tables) in the database and the relationships that exist between the data elements in the tables. It is useful as a basis for discussion during database design and in describing existing databases.
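As a rough illustration of what such a diagram describes, the sketch below builds two related tables in an in-memory SQLite database using Python's standard `sqlite3` module. The table and column names (`patient`, `lab_result`, and so on) are hypothetical; the one-to-many link from a patient to that patient's results is the kind of relationship an entity-relationship diagram would draw as a line between two boxes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE lab_result (
    result_id  INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
    test_name  TEXT NOT NULL,
    value      REAL NOT NULL
);
""")

conn.execute("INSERT INTO patient VALUES (1, 'Patient A')")
conn.executemany(
    "INSERT INTO lab_result (patient_id, test_name, value) VALUES (?, ?, ?)",
    [(1, "sodium", 140.0), (1, "potassium", 4.1)],
)

# Follow the relationship: one patient, many lab results.
rows = conn.execute(
    "SELECT p.name, r.test_name, r.value "
    "FROM patient p JOIN lab_result r ON r.patient_id = p.patient_id "
    "ORDER BY r.test_name"
).fetchall()
print(rows)
```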

8. Describe the main difference between the hypothesis-testing and hypothesis-generating approaches to data mining.

In hypothesis testing, data mining is used to determine whether and under what conditions a proposed pattern exists in a large data set. In hypothesis generation, data mining is used to discover patterns in the data without prior knowledge of what kinds of patterns might exist.

9. What are process measures in outcomes research and why are they sometimes used in place of actual outcomes data?

A process measure is a piece of data that is closely related to an outcome, but is easier to measure or more available than the actual outcome data. Thus, it is convenient to use as a surrogate measure for the outcome. For example, to assess the effect of a diabetes health education program, one might measure the number of eye examinations and regular evaluation of glycosylated hemoglobin (i.e., good practices) rather than assessing the actual long-term health of the diabetics.

10. What advantage does a pathologist have over investigators in most other fields in carrying out outcomes or data-mining studies?

Some of the most important and useful data in clinical data mining are derived from pathology services (anatomic pathology diagnoses and laboratory test results). In most places, pathologists manage the systems that contain these key data.

11. Define "association rules" and describe their use in exploratory data mining.

Association rules express the likelihood of co-occurrence of features or events in records in a database (e.g., if a patient has characteristics A and B, he or she has an 80% chance of having characteristic C). Data-mining software can automatically identify associations in large data sets. Although many associations are trivial, some indicate causative or "common cause" relationships. Changing associations over time may also provide useful information.
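The 80% figure in the example above corresponds to what is usually called the confidence of the rule "A and B implies C." A minimal sketch of the calculation, using a small made-up set of records (each record is the set of characteristics one patient has), might be:

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    return sum(1 for r in records if items <= r) / len(records)

def confidence(records, antecedent, consequent):
    """Among records containing the antecedent, the fraction that also contain the consequent."""
    with_antecedent = [r for r in records if antecedent <= r]
    return sum(1 for r in with_antecedent if consequent <= r) / len(with_antecedent)

# Made-up data: 5 patients have both A and B, and 4 of those also have C.
records = [
    {"A", "B", "C"}, {"A", "B", "C"}, {"A", "B", "C"}, {"A", "B", "C"},
    {"A", "B"}, {"A"}, {"C"}, {"B", "C"},
]

print(confidence(records, {"A", "B"}, {"C"}))  # 0.8
print(support(records, {"A", "B", "C"}))       # 0.5
```

Data-mining software performs the same counting across all candidate combinations of features, typically keeping only rules whose support and confidence exceed chosen thresholds.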

12. Outcomes research is often limited in the conclusions that can be drawn because of limitations in the data sources used for the studies. What are the most common data sources and what are their main limitations?

The most common data sources include large local or regional administrative databases from hospitals, insurers, or government agencies. These databases contain very limited clinical information (usually ICD-9 codes), and thus it is difficult to meaningfully stratify patients by the severity of their illness, particular symptoms or test result characteristics, or the details of their therapy.

13. Amyloid reacts with the Prussian blue stains.

False

14. Amyloid reacts with the alcian blue stains.

False

15. Amyloid reacts with the methyl violet stains.

True

16. Amyloid reacts with the thioflavin T stains.

True

17. Amyloid reacts with the Congo red stains.

True

18. Necrosis is a feature in leprosy.

False