Training data sets are given to systems initially to teach them to make correct responses. Test data sets are equivalent to the training sets but contain separate data and are used to verify the performance of the systems.
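The separation between training and test data can be sketched in a few lines. This is a minimal illustration only: the function name `split_train_test`, the 25% test fraction, and the integer stand-ins for cases are all assumptions made for the example, not details from the text.

```python
import random


def split_train_test(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and split it into disjoint sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test shuffled records are held out; the rest are for training.
    return shuffled[n_test:], shuffled[:n_test]


cases = list(range(100))  # hypothetical stand-ins for 100 labeled cases
train_set, test_set = split_train_test(cases)
print(len(train_set), len(test_set))
```

The system is fitted only on `train_set`; `test_set` is held back so that performance is verified on data the system has never seen.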
2. Rule-based systems underlie most clinical event monitors (programs that detect important clinical events and notify appropriate medical personnel). Often these systems work in conjunction with data from the clinical pathology LIS. What aspects of clinical pathology make a rule-based system a reasonable approach?
Clinical laboratory databases consist of many discrete test results that have known reference ranges and critical values. Well-established patterns of these results are known to be related to important clinical conditions. Writing rules that detect and alert on these patterns is straightforward.
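A rule of this kind might look like the sketch below. The `Rule` class, the `check_result` function, and the numeric limits are hypothetical and chosen only for illustration; real critical values are set by each laboratory and should not be taken from this example.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    test_name: str
    low: float   # alert when the result is below this value
    high: float  # alert when the result is above this value
    units: str


# Illustrative limits only (assumed for this sketch, not clinical guidance).
CRITICAL_RULES = [
    Rule("potassium", 2.5, 6.5, "mmol/L"),
    Rule("glucose", 40.0, 500.0, "mg/dL"),
]


def check_result(test_name, value, rules=CRITICAL_RULES):
    """Return an alert message if a rule fires, otherwise None."""
    for rule in rules:
        if rule.test_name != test_name:
            continue
        if value < rule.low:
            return f"ALERT: {test_name} {value} {rule.units} is critically low"
        if value > rule.high:
            return f"ALERT: {test_name} {value} {rule.units} is critically high"
    return None  # no rule matched or the value is within limits


print(check_result("potassium", 7.1))
print(check_result("potassium", 4.0))
```

Because each rule is an explicit, readable statement, the logic behind any alert can be inspected and adjusted directly.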
3. You are working with an intensive care unit (ICU) attending physician on a project to see if you can predict readmission for patients with pancreatitis. You have access to a large database of ICU data (such as cardiac catheter values, vital signs, and respiratory parameters), as well as all of the data that can be gleaned from the LIS. There are approximately 800 measurements of various types for each of 4000 patients. You do not really have any specific ideas about what values would be most predictive; in fact, you think it is likely that the predictors are highly complex combinations of factors. Which of the 3 types of artificial intelligence systems would be most appropriate for this problem, and why?
A neural network is most appropriate, because there is no prior knowledge to allow selection of predictors, the relative weighting of predictors is unknown, a large data set of many discrete potential predictors is available, combinations of predictors may provide better discrimination than individual predictors, and the desired classification is binary (readmission likely or unlikely).
The Arden syntax is a standard language and format for representing the medical knowledge and algorithms required for making medical decisions. It is used in medical decision support systems.
5. How do neural networks differ from Bayesian belief networks along the following dimensions:
(1) inspectability of knowledge,
(2) need for probabilities acquired from "domain" experts,
(3) need for data to train the system, and
(4) ability of the system to make classifications based on input data. (Note: You may find it helpful to make a 2 × 4 table and include a short phrase or two in each cell.)
Bayesian belief networks are inspectable, known probabilities are required, training data are not needed, and they can classify into multiple categories.
Neural networks are not inspectable, they do not need domain expertise or known probabilities, training data are required, and they are best for a binary classification ("yes" or "no").
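To make the Bayesian side of this contrast concrete, the sketch below treats the smallest possible belief network: one disease node and one test-result node. The probabilities are made-up placeholders standing in for values a domain expert would supply, and `posterior_given_positive` is a hypothetical helper name.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """Bayes' rule for P(disease | positive test) in a two-node network."""
    true_pos = sensitivity * prior                    # P(+ and disease)
    false_pos = (1.0 - specificity) * (1.0 - prior)   # P(+ and no disease)
    return true_pos / (true_pos + false_pos)


# Assumed, illustrative probabilities (not taken from any real test).
p = posterior_given_positive(prior=0.01, sensitivity=0.90, specificity=0.95)
print(round(p, 3))
```

Every number that drives the result is visible and was supplied in advance, and no training data were needed. This is the sense in which such a network is inspectable.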
6. Most of the artificial intelligence systems we discussed rely on some kind of knowledge representation, with the notable exception of neural networks. Where is the "knowledge" in a neural network stored?
In the weightings between the nodes, or "neurons."
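A single artificial neuron is enough to illustrate this. The sketch below uses the classic perceptron learning rule, chosen here purely for simplicity, to learn the logical AND of two inputs. The function names and the toy data are assumptions made for the example.

```python
def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum of the inputs exceeds zero, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, rate=1.0):
    """Adjust the weights and bias whenever a training sample is misclassified."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


# Toy training set: the logical AND of two binary inputs.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
print(weights, bias)
```

After training, everything the neuron "knows" about AND is held in `weights` and `bias`. Nothing in those numbers reads as an explicit rule, which is why such systems are hard to inspect.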
An entity-relationship diagram is a way of illustrating the structure of a relational database in a simple format. It displays the primary "entities" (tables) in the database and the relationships that exist between the data elements in the tables. It is useful as a basis for discussion during database design and in describing existing databases.
In hypothesis testing, data mining is used to determine whether and under what conditions a proposed pattern exists in a large data set. In hypothesis generation, data mining is used to discover patterns in the data without prior knowledge of what kinds of patterns might exist.
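The two modes can be contrasted on a toy data set. In the sketch below, each record is a set of abnormal-result flags. The records, the flag names, and the helper functions `support` and `frequent_pairs` are all invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Made-up records: each is the set of abnormal flags seen for one patient.
records = [
    {"high_glucose", "high_hba1c"},
    {"high_glucose", "high_hba1c", "low_sodium"},
    {"high_hba1c", "low_sodium"},
    {"high_glucose", "high_hba1c"},
    {"low_sodium"},
]


def support(pattern, data):
    """Hypothesis testing: how often does a proposed pattern occur?"""
    return sum(1 for record in data if pattern <= record) / len(data)


def frequent_pairs(data, min_count=2):
    """Hypothesis generation: count every co-occurring pair of flags."""
    counts = Counter()
    for record in data:
        counts.update(combinations(sorted(record), 2))
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]


print(support({"high_glucose", "high_hba1c"}, records))
print(frequent_pairs(records))
```

`support` starts from a pattern the analyst proposes, whereas `frequent_pairs` enumerates candidate patterns without being told what to look for.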
A process measure is a piece of data that is closely related to an outcome but is easier to measure or more available than the actual outcome data. Thus, it is convenient to use as a surrogate measure for the outcome. For example, the effect of a diabetes health education program can be assessed by the number of eye examinations and regular evaluations of glycosylated hemoglobin (i.e., good practices) rather than by assessing the actual long-term health of the diabetics.
Some of the most important and useful data in clinical data mining are derived from pathology services (anatomic pathology diagnoses and laboratory test results). In most places, pathologists manage the systems that contain these key data.