1. Artificial intelligence and data-mining systems often use "training data sets" and "test data sets." Define these terms and describe briefly how these data sets are used.

Training data sets are given to systems initially to teach them to make correct responses. Test data sets have the same structure as the training sets but contain separate data that were not used in training; they are used to verify the performance of the systems.
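The split described above can be sketched in a few lines of Python. This is a minimal illustration, not a recommended method: the records, the 25% test fraction, and the simple threshold "model" are all assumptions invented for the example.

```python
import random

def split_train_test(records, test_fraction=0.25, seed=0):
    """Shuffle the records and split them into disjoint training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_threshold(train):
    """'Training': choose the cutoff that classifies the most training cases correctly."""
    candidates = sorted(value for value, _ in train)
    return max(candidates, key=lambda c: sum((v >= c) == label for v, label in train))

def accuracy(threshold, data):
    """Fraction of cases for which the threshold rule matches the known label."""
    return sum((v >= threshold) == label for v, label in data) / len(data)

# Hypothetical (value, is_abnormal) pairs, invented for illustration only.
records = [(v, v >= 50) for v in range(0, 100, 5)]

train, test = split_train_test(records)
threshold = fit_threshold(train)  # learned from the training set only
print("threshold:", threshold)
print("test accuracy:", accuracy(threshold, test))  # verified on unseen data
```

The point of the design is that the test cases never influence the learned threshold, so the test accuracy is an independent check of performance.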

2. Rule-based systems underlie most clinical event monitors (programs that detect important clinical events and notify appropriate medical personnel). Often these systems work in conjunction with data from the clinical pathology LIS. What aspects of clinical pathology make a rule-based system a reasonable approach?

Clinical laboratory databases consist of many discrete test results that have known reference ranges and critical values. Well-established patterns of these results are known to be related to important clinical conditions. Writing rules that detect and alert to these patterns is straightforward.
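As a rough sketch of why such rules are easy to write, the Python below checks discrete results against critical limits and produces alert messages. The test names, limits, and units are hypothetical values chosen for illustration; real critical values are set by each laboratory.

```python
from dataclasses import dataclass

@dataclass
class Result:
    test: str
    value: float

# Hypothetical critical limits (low, high); illustrative only, not clinical guidance.
CRITICAL_LIMITS = {
    "potassium": (2.5, 6.5),    # mmol/L
    "glucose": (40.0, 500.0),   # mg/dL
}

def critical_value_alerts(results):
    """Return one alert message for each result outside its critical limits."""
    alerts = []
    for r in results:
        limits = CRITICAL_LIMITS.get(r.test)
        if limits is None:
            continue  # no rule defined for this test
        low, high = limits
        if r.value < low:
            alerts.append(f"{r.test} critically low: {r.value}")
        elif r.value > high:
            alerts.append(f"{r.test} critically high: {r.value}")
    return alerts

alerts = critical_value_alerts([
    Result("potassium", 7.1),
    Result("glucose", 95.0),
    Result("sodium", 140.0),
])
print(alerts)
```

Each rule is a simple comparison against a known limit, which is why discrete, well-characterized laboratory data suit this approach.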

4. What is the Arden syntax?

The Arden syntax is a standard language and format for representing the medical knowledge and algorithms required for making medical decisions. It is used in medical decision support systems.

5. How are neural networks different from Bayesian belief networks along the following dimensions:
(1) inspectability of knowledge,
(2) need for probabilities acquired from "domain" experts,
(3) need for data to train the system, and
(4) ability of the system to make classifications based on input data. (Note: You may find it helpful to make a 2 × 4 table and include a short phrase or two in each cell.)

Bayesian belief networks are inspectable, require known probabilities, do not need training data, and can classify into multiple categories.

Neural networks are not inspectable, do not need domain expertise or known probabilities, require training data, and are best for binary classification ("yes" or "no").
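To illustrate what "inspectable" and "known probabilities" mean for a Bayesian belief network, here is a minimal two-node sketch (disease, then test result) in Python. All three probabilities are hypothetical numbers standing in for values a domain expert would supply; no training data are involved.

```python
# Hypothetical expert-supplied probabilities; every number here is an assumption.
P_DISEASE = 0.01                # prior probability of the disease
P_POS_GIVEN_DISEASE = 0.95      # probability of a positive test if diseased
P_POS_GIVEN_NO_DISEASE = 0.05   # probability of a positive test if not diseased

def posterior_given_positive(prior, p_pos_disease, p_pos_no_disease):
    """Bayes' rule: P(disease | positive test)."""
    evidence = p_pos_disease * prior + p_pos_no_disease * (1.0 - prior)
    return p_pos_disease * prior / evidence

posterior = posterior_given_positive(
    P_DISEASE, P_POS_GIVEN_DISEASE, P_POS_GIVEN_NO_DISEASE
)
print("P(disease | positive):", posterior)
```

Every quantity that drives the conclusion is a named probability that can be read and questioned. In a trained neural network, the corresponding knowledge is spread across numeric weights that have no such direct interpretation.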

7. State briefly what an "entity-relationship" diagram is useful for.

An entity-relationship diagram is a way of illustrating the structure of a relational database in a simple format. It displays the primary "entities" (tables) in the database and the relationships that exist between the data elements in the tables. It is useful as a basis for discussion during database design and in describing existing databases.
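As an illustration, the two-entity diagram "one patient has many specimens" could correspond to the tables below, built here with Python's standard sqlite3 module. The table and column names are hypothetical and chosen only for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the relationship

# Two entities (tables) and a one-to-many relationship between them.
conn.executescript("""
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE specimen (
    specimen_id  INTEGER PRIMARY KEY,
    patient_id   INTEGER NOT NULL REFERENCES patient(patient_id),
    collected_at TEXT NOT NULL
);
""")

# Hypothetical rows, invented for illustration.
conn.execute("INSERT INTO patient VALUES (1, 'Example Patient')")
conn.execute("INSERT INTO specimen VALUES (10, 1, '2020-01-01')")
conn.execute("INSERT INTO specimen VALUES (11, 1, '2020-01-02')")

# Following the relationship: count the specimens belonging to each patient.
rows = conn.execute("""
    SELECT p.name, COUNT(s.specimen_id)
    FROM patient AS p
    JOIN specimen AS s ON s.patient_id = p.patient_id
    GROUP BY p.patient_id
""").fetchall()
print(rows)
```

The foreign key on specimen.patient_id is the line that would be drawn between the two boxes in the diagram.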

8. Describe the main difference between the hypothesis-testing and hypothesis-generating approaches to data mining.

In hypothesis testing, data mining is used to determine whether and under what conditions a proposed pattern exists in a large data set. In hypothesis generation, data mining is used to discover patterns in the data without prior knowledge of what kinds of patterns might exist.
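The contrast can be sketched with a toy example in Python. The findings and cases are invented for illustration, and the two helper functions are hypothetical names, not part of any data-mining library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sets of abnormal findings, one set per patient (invented data).
cases = [
    {"high_glucose", "high_hba1c"},
    {"high_glucose", "high_hba1c", "low_sodium"},
    {"low_sodium"},
    {"high_glucose", "high_hba1c"},
]

def check_pattern(cases, a, b):
    """Hypothesis testing: among cases with finding a, what fraction also have b?"""
    with_a = [c for c in cases if a in c]
    return sum(b in c for c in with_a) / len(with_a) if with_a else 0.0

def discover_patterns(cases):
    """Hypothesis generation: count every co-occurring pair, most frequent first."""
    pair_counts = Counter()
    for c in cases:
        pair_counts.update(combinations(sorted(c), 2))
    return pair_counts.most_common()

# A proposed pattern is stated first and then checked against the data.
print(check_pattern(cases, "high_glucose", "high_hba1c"))

# No pattern is proposed; candidate patterns emerge from the data.
print(discover_patterns(cases))
```

In the first call the investigator names the pattern in advance; in the second, the code enumerates all pairs and leaves it to the investigator to decide which are worth pursuing.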

9. What are process measures in outcomes research and why are they sometimes used in place of actual outcomes data?

A process measure is a piece of data that is closely related to an outcome but is easier to measure or more readily available than the actual outcome data. Thus, it is convenient to use as a surrogate measure for the outcome. For example, the effect of a diabetes health education program might be assessed by measuring the number of eye examinations and the regular evaluation of glycosylated hemoglobin (i.e., good practices) rather than by assessing the actual long-term health of the diabetic patients.

10. What advantage does a pathologist have over investigators in most other fields in carrying out outcomes or data-mining studies?

Some of the most important and useful data in clinical data mining are derived from pathology services (anatomic pathology diagnoses and laboratory test results). In most places, pathologists manage the systems that contain these key data.
