Knowledge Engineering Requirements for Generic Diagnostic Systems

Andreas Mueller, Ingmar Hofmann, Heiner Oberkampf and Sonja Zillner

Siemens AG, Industry Sector, Advanced Technologies & Standards, Nuremberg, Germany

Siemens AG, Corporate Technology, Munich, Germany

Keywords: Diagnostic Knowledge, Knowledge Representation, Knowledge-based Diagnostics, Domain-specific

Language, Requirements Analysis, Knowledge Engineering, Reasoning, Knowledge Interchange.

Abstract: Diagnostics is the process of determining the nature of malfunctions or faults of systems in various domains.

With regard to the complexity of systems and their composition of different subsystems or subcomponents,

for which different diagnostic approaches are optimal, no means exist for seamless and agile cooperation

and information exchange between currently isolated diagnostic approaches. However, we consider this

essential for an integrated diagnostic mechanism covering complex systems in their entirety. Hence, in this

paper, we show the basic requirements for a generic diagnostic knowledge representation language (DKRL)

by investigating typical diagnostic examples from different domains, namely industry and medicine. DKRL

is intended to facilitate the generic representation, handling, and interchange of diagnostic knowledge

required for performing diagnostics without regard to specific diagnostic approaches.

1 INTRODUCTION

We are concerned with the efficient representation

of diagnostic information.

Being the process of analytically determining the

nature and cause(s) of a malfunction of a technical

or biological system using structural, functional and

causal knowledge about the system, diagnostics is

used in various disciplines, with variation in the use

and representation of the underlying knowledge. When a

malfunction occurs, a useful diagnostic mechanism must

be able to determine defective parts of the system

and also the root cause of these failures, which may

sometimes lie outside the system’s boundaries. This

determination relies on observations delivering more

details on the system’s current behavior.

Various diagnostic mechanisms address isolated

diagnostic problems. However, the seamless and

agile cooperation between them is currently not

possible. A truly integrated diagnostic mechanism

for analyzing complex systems requires a high-level

exchange of diagnostic information across isolated

diagnostic mechanisms. For this, a vast amount of

different information has to be managed in a precise

and standardized manner. These challenges can be

met by a generic diagnostic representation language

providing means for handling diagnostic knowledge.

Current computer-based diagnostic systems use

established reasoning methods, e.g. rule-based

reasoning (Ligeza, 2006), case-based reasoning

(Kolodner, 1993) or probabilistic reasoning (Pearl,

2005). However, there is little common ground

regarding the formalisms for representing the

diagnostic knowledge. In fact, these are tightly

coupled to each approach. A good survey regarding

such knowledge representation is provided in (Van

Harmelen, Lifschitz and Porter, 2008). This tight

coupling, however, causes a semantic gap between the

perception of a diagnostic problem and its formal

representation, and it binds the diagnostic

knowledge to the employed diagnostic

approach. The flexible exchange of information

about the structure of the system being diagnosed,

about malfunctions, and about observations requires an

explicitly and unambiguously specified terminology

for a generic diagnostic process.

Our overall research goal is to develop a generic

diagnostic knowledge representation language

(which will be referred to as DKRL) that addresses

this lack of a coherent diagnostic formalism in the

form of a domain-specific language (DSL). DKRL is

intended to facilitate the representation of diagnostic

problems as such, with all relevant diagnostic

aspects but independent of any reasoning approach.


Mueller A., Hofmann I., Oberkampf H. and Zillner S..

Knowledge Engineering Requirements for Generic Diagnostic Systems.

DOI: 10.5220/0004124601840189

In Proceedings of the International Conference on Knowledge Engineering and Ontology Development (KEOD-2012), pages 184-189

ISBN: 978-989-8565-30-3

 2012 SCITEPRESS (Science and Technology Publications, Lda.)

DKRL could eventually become the basis for

machine-processable diagnostic lexicons for

arbitrary domains. Hence, we regard two aspects as

being equally important: representational

capabilities for the diagnostic knowledge itself as

well as facilities for the efficient handling of

represented knowledge. This paper discusses the

analysis and gathering of requirements that need to

be addressed by such a DSL for diagnostics.

The paper is organized as follows. Section 2

describes previous work. Section 3 illustrates aspects

of diagnostics in different domains. Section 4

investigates diagnostics in the domains and

introduces the requirements for generic knowledge

representation. Section 5 concludes the paper and

provides an outlook.

2 RELATED WORK

As one basic distinction of diagnostic systems, we

have model-based (Lucas, 1998) or first principles

(Reiter, 1987) diagnosis, and heuristic classification

(Lucas, 1998) or heuristic diagnosis (Reiter, 1987).

Model-based diagnosis, in the form of consistency-based

diagnosis and abductive diagnosis (Lucas, 1998; Poole,

1994), has mainly proved useful in the

technical/industrial domain, whereas in the medical

domain, heuristic diagnosis is often used. In

model-based diagnosis we have a description of how the

system is meant to operate, together with

observations. In heuristic diagnosis, information like

“rules of thumb, statistical intuition and past

experience” are more important and “the real world

system being diagnosed is only weakly represented”

(Reiter, 1987). Even in model-based diagnosis, there

are many different formalisms for similar problems.
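
To illustrate the model-based view, the following minimal Python sketch enumerates minimal consistency-based diagnoses in the spirit of (Reiter, 1987). It is only an illustration: the two-inverter circuit, the variable names `a`, `b`, `c` and the function names are hypothetical and do not stem from the cited work, and Reiter's hitting-set algorithm is replaced here by exhaustive search to keep the sketch short.

```python
from itertools import combinations, product

# Hypothetical system description: two inverters in series, a -> inv1 -> b -> inv2 -> c.
# component name -> (input variable, output variable)
COMPONENTS = {"inv1": ("a", "b"), "inv2": ("b", "c")}
VARIABLES = ["a", "b", "c"]


def consistent(abnormal, observations):
    """True if some assignment of the unobserved variables satisfies the
    behaviour of every component that is NOT assumed to be abnormal."""
    free = [v for v in VARIABLES if v not in observations]
    for values in product([0, 1], repeat=len(free)):
        state = dict(observations, **dict(zip(free, values)))
        if all(state[out] == 1 - state[inp]
               for name, (inp, out) in COMPONENTS.items()
               if name not in abnormal):
            return True
    return False


def minimal_diagnoses(observations):
    """Enumerate candidate fault sets by increasing size and keep the minimal
    ones that make the observations consistent with the system description."""
    found = []
    names = sorted(COMPONENTS)
    for size in range(len(names) + 1):
        for candidate in combinations(names, size):
            candidate = set(candidate)
            if any(d <= candidate for d in found):
                continue  # a subset already explains the observations
            if consistent(candidate, observations):
                found.append(candidate)
    return found


if __name__ == "__main__":
    # Input 1 should yield output 1 after two inversions; 0 is observed instead.
    print(minimal_diagnoses({"a": 1, "c": 0}))
```

Note that only the normal behaviour of the components is described; no knowledge about how faults manifest themselves is needed, which is the characteristic feature of this diagnosis type.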

Existing approaches for a generic representation

language for diagnostic knowledge focus on only

one of the diagnosis problems. (Reiter, 1987) and

(Poole, 1994) focus on model-based diagnosis. In

(Poole, 1994), system-driven diagnosis is further divided

into consistency-based diagnosis and

abductive diagnosis. It is shown that for a

certain class of problems both formalisms reach the

same diagnosis. In (Lucas, 1998) an attempt is made

to create a generic diagnosis language. “Evidence

functions” are used to represent the knowledge

common to all diagnostic systems, the interactions

among defects and findings (Lucas, 1998). The

experience-driven (heuristic) approach is realized in

Bayesian networks (probabilistic dependencies),

default logic (rules of thumb) etc. However, there is

still no overall diagnosis representation language

able to represent the full spectrum of different

diagnostic knowledge.

As our overall goal, DKRL is intended to cover

both model-based and heuristics-based diagnosis.

Because they show typical features of the respective

diagnosis types, the industrial and the medical

domains were selected for a requirements analysis.

3 DIAGNOSTICS IN DIFFERENT DOMAINS

As examples, we consider the domains “industry”

and “medicine” since these substantially differ in

complexity and availability of reliable factual and

causal knowledge, yet in both domains reaching a

correct diagnosis quickly is critical.

Approaches to capturing diagnostic reasoning lie

between two extreme poles

of the diagnosis problem (Poole, 1994). Firstly, the

overall aim may be to describe how components are

structured and work normally, while information

on the origin and the manifestation of malfunctions

is missing. This holds true for the industrial domain;

thus, diagnostic algorithms aim to isolate deviations

from normal behavior. Secondly, knowledge about

faults and symptoms may be used to interpret the

relevance of abnormalities. This holds true for the

medical domain: medical diagnostic knowledge is

typically about “incorrect functioning”.

For a comprehensive set of requirements needed

to represent diagnostic knowledge generically, we

analyze typical diagnostic use case scenarios from

the industrial and the medical domains. The

following examples illustrate the aspects relevant in

diagnostic processes in the selected domains and

show the requirements to be met in order to perform

the described diagnostics. When gathering the

requirements, we discussed with experts and

analyzed existing systems to identify roadblocks and

shortcomings. The medical examples are taken from

interviews with our clinical partners.

3.1 Diagnostics in Industry

Typically, the industry domain shows a high degree

of engineered knowledge, with an adequate

understanding of the considered plant or component

and corresponding diagnostic knowledge being

possibly available from the beginning of the

respective lifecycle. Thus, observations can often be performed directly and symptoms can often be treated as directly identifiable causes of observed faults by applying sensors to critical positions in the structure or the process. Also, the causality behind symptoms is usually rather easy to determine.

Table 1: Entities from the industrial domain.

# | Entity | Explaining remark
I1. | Causal relationships | In many cases it is possible to state the potential causes for symptoms.
I2. | Causes and effects (symptoms) | Sensors as well as the human operator’s senses provide information that can be interpreted as a manifestation of a malfunction or fault.
I3. | Context-dependent interpretation | Nominal values for operating parameters need to be interpreted with regard to e.g. the currently selected mode of plant operation or environmental influences.
I4. | Likelihood of occurrence | As shown in the example of the feeder malfunction, of two possible causes one might be less likely to occur than the other.
I5. | Localization | Recognizing components or functions of a plant where a given effect typically occurs is important in order to direct service technicians.
I6. | Probabilistic causality | Causal relations are not necessarily absolute. Instead, there may be more than one possible, but not equally likely, cause for an observed symptom.
I7. | Significance of a symptom | Certain observed symptoms are typical for certain causes, due to the respective system’s structure or functionality.
I8. | Temporal correlation | Some causes and effects become relevant only in correlation to the amount of time passed. Also, due to aspects such as the system’s structure or functionality, the effect of an occurred cause might become visible only after some time has passed.
Additionally important functional aspects
I9. | Extensibility | Knowledge is subject to change due to plant modifications during the lifecycle.
I10. | Incomplete knowledge | A lack of knowledge at the system level and instance level must be handled. Exact probabilistic knowledge about the causal relations is not always available.
I11. | Reusability | Knowledge about symptoms, etc. may be relevant for different faults, thus should be reused.

3.1.1 Example 1: Process Industry

Abdul-Wahab et al. (2007) give examples of

troubleshooting in the domain of multi-stage flash

seawater desalination, with focus on the brine heater

component. Considering the process value of

“condensate conductivity” as a representation of the

salt ratio in low pressure steam after condensation,

the plant operator might observe a gradual increase.

The degree of increase might be a symptom of

leaking tubes in the brine heater, since this would

cause seawater to mix with the condensate, resulting

in an increase of conductivity due to a higher salt

ratio. Hence, the “leakage” needs to be ruled out by

maintenance actions. If the conductivity continues to

rise, automatic valve operation measures would be

taken. Similarly, the conductivity of the distillate

might also show a symptomatic, differently located

increase. Here, a possible cause might be an

increased “top brine temperature”, which can be

responsible for increased brine flashing, causing the

conductivity to rise due to a higher salt ratio.
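
The two observations of this example can be written down as plain symptom-to-cause associations. The following Python sketch is only an illustration under stated assumptions: the names `Association`, `ASSOCIATIONS` and `candidate_causes` are hypothetical and belong neither to DKRL nor to the cited expert system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    symptom: str    # observed deviation of a process value
    location: str   # where in the process the symptom is observed
    cause: str      # candidate cause named in the example


# Content taken from the desalination example; the structure is an assumption.
ASSOCIATIONS = [
    Association("conductivity increase", "condensate",
                "leaking tubes in the brine heater"),
    Association("conductivity increase", "distillate",
                "increased top brine temperature"),
]


def candidate_causes(symptom, location):
    """Return every cause associated with a symptom at a given location."""
    return [a.cause for a in ASSOCIATIONS
            if a.symptom == symptom and a.location == location]


if __name__ == "__main__":
    print(candidate_causes("conductivity increase", "condensate"))
```

The same symptom leads to different candidate causes depending on where it is observed, which is why the location has to be part of the represented knowledge.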

3.1.2 Example 2: Manufacturing Industry

At Siemens in Nuremberg a research facility is

operated, producing a running text made from small

plastic disks using a series of conveyor belts (see

Figure 1) as an exemplary manufacturing process.

Normally, the disks are separated from the rear-side

storage belt and transported onto the right-hand-side

“column belt” (1), where the next required disk

column is prepared according to the control system.

The column is then pushed onto the front-side “text

belt” by a properly triggered proximity switch (PS)-

controlled feeder (2), positioning the column at the

correct distance in relation to the previous one.

Considering the “text belt”, the symptom of

“irregular positioning patterns” might occur.

Figure 1: Siemens research facility for the manufacturing

process of mechanical running text production.

This can be caused by a wrong feeder mode

(continuous instead of on-demand). Currently, the

reason may be either a broken PS cable connection,

so trigger signals can no longer be received, or a

broken PS, which can be treated as less likely.
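
The causal chain of this example, from the symptom back to its possible root causes, can be sketched as a small graph. In the Python sketch below, the dictionary layout, the function name `root_causes` and the ordinal ranks are hypothetical; the ranks only encode "more likely" versus "less likely" as stated in the text and are not measured probabilities.

```python
# effect -> list of (cause, rank); a lower rank means "more likely".
# Ranks are ordinal placeholders, not probabilities.
CAUSES = {
    "irregular positioning patterns": [("wrong feeder mode", 1)],
    "wrong feeder mode": [
        ("broken PS cable connection", 1),
        ("broken PS", 2),
    ],
}


def root_causes(effect):
    """Return the root causes of an effect, most likely first.

    A root cause is a cause for which no further cause is recorded.
    The sketch assumes that the graph contains no cycles.
    """
    roots = []
    for cause, _rank in sorted(CAUSES.get(effect, []), key=lambda item: item[1]):
        deeper = root_causes(cause)
        roots.extend(deeper if deeper else [cause])
    return roots


if __name__ == "__main__":
    print(root_causes("irregular positioning patterns"))
```

Ordering the result by rank gives a service technician the more likely cause, the broken cable connection, to check first.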

3.2 Diagnostics in Medicine

In the medical domain, direct observations are often impossible, so conclusions about the actually desired biological characteristics have to be drawn based on observable characteristics and presumed or confirmed interrelations. Because of the domain’s inherent complexity, medical diagnostic knowledge contains a high degree of uncertainty.

Table 2: Entities from the medical domain.

# | Entity | Explaining remark
M1. | Context-dependent interpretation | Standard intervals for measurements like blood pressure or blood counts need patient-specific interpretation regarding, e.g., the patient’s age.
M2. | Functional dependencies and localization | For instance, an artery functionally supplies blood to a number of organs. Such information allows the physician to anticipate phenomena and to imagine an internal situational picture of the disease he is confronted with.
M3. | Patient history | For many diseases risk factors are known. They influence the probability of certain diseases and allow the clinician to check for such diseases first.
M4. | Probabilistic causality | The disease-symptom relation is typically a probabilistic causal relation; e.g. lymphoma has the associated symptom enlarged lymph nodes in 80% and the symptom enlarged spleen in 30% of all cases (Herold, 2011).
M5. | Relations between symptoms | Symptom descriptions are fuzzy, e.g. elevated temperature vs. mild fever. Also, some symptoms form groups, e.g. the B symptoms: occurring together, they constitute a cardinal symptom for lymphoma.
M6. | Symptom significance | Cardinal symptoms help to focus on certain diseases in the initial diagnosis phase. The additional occurrence of pathognomonic symptoms allows an immediate diagnosis.
M7. | Symptoms | There are symptoms that can be observed on a patient during a diagnostic process without being able to immediately draw conclusions about the actual disease. This situation is typical when recording a patient's medical history, where symptoms are collected as reported by a patient before being interpreted.
M8. | Temporal progression | Diseases may show a characteristic development of symptoms over time. Another temporal factor is the “novelty” of a symptom.
M9. | Urgency of symptoms | Diseases and their symptoms are not equally dangerous for the patient’s health. Thus, highly dangerous diseases and related symptoms have to be checked first.
Additionally important functional aspects
M10. | Extensibility | Knowledge is subject to change, so it must be represented in an extendable way.
M11. | Incomplete knowledge | A lack of knowledge at the system level and instance level must be handled. Exact probabilistic knowledge about the causal relations between diseases and symptoms is not always available. Similarly, the complete patient situation is seldom known.
M12. | Reusability | Diagnostic knowledge may be relevant for various diseases, thus should be reused.

3.2.1 Example 1: Differential Diagnosis

Differential diagnosis is a standard diagnostic

approach in clinical practice. We illustrate the

process along an example: first, the observation of

an initial set of (unspecific) symptoms, say “fever”,

“night sweats”, “feeling weak” and “changes in

bowel patterns”, leads the clinician to suspect a set

of likely diseases which might have caused the

symptoms. Second, the set of likely diseases is turned

into a ranked list based on information about

cardinal symptoms, incidence proportion of a

disease and other factors. Since “changes in bowel

patterns” is a cardinal symptom for “diverticulitis”

and “colorectal cancer”, we may obtain a ranked list

like “diverticulitis”, “colorectal cancer”, “cold”,

“lymphoma” etc. Third, the clinician aims to

differentiate between the likely diseases by checking

for further symptoms that might strengthen or

weaken diagnoses on the list. At first, he will check

other (cardinal) symptoms of top-ranked diseases. In

this case he might identify “weight loss” and

“enlarged lymph nodes” but does not find evidence

for “blood in stool” and “thickened intestinal wall”.

Fourth, a more precise list of likely diseases is

obtained, with “lymphoma” now at the top as both

cardinal symptoms (“enlarged lymph nodes”) and B

symptoms (correlation of “fever”, “night sweats”,

and “weight loss”) are present. “Diverticulitis” and

“colorectal cancer” become less likely as important

symptoms “blood in stool” and “thickened intestinal

wall” are absent. The process continues until a

plausible diagnosis is found.
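
The ranking and re-ranking steps of this example can be sketched in a few lines of Python. The sketch is an illustration under strong assumptions and is not clinical knowledge: the scoring rule (supporting key findings minus excluded key findings), the `incidence_rank` placeholders used to break ties, and the names `Disease`, `score` and `rank` are all hypothetical. Only the symptom assignments are taken from the example itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disease:
    name: str
    incidence_rank: int                  # 0 = most frequent; placeholder values
    key_symptoms: frozenset = frozenset()
    symptom_groups: tuple = ()           # groups count only if completely present


B_SYMPTOMS = frozenset({"fever", "night sweats", "weight loss"})
BOWEL = frozenset({"changes in bowel patterns", "blood in stool",
                   "thickened intestinal wall"})

DISEASES = [
    Disease("cold", 0),
    Disease("diverticulitis", 1, BOWEL),
    Disease("colorectal cancer", 2, BOWEL),
    Disease("lymphoma", 3, frozenset({"enlarged lymph nodes"}), (B_SYMPTOMS,)),
]


def score(disease, present, absent):
    """Supporting key findings minus key findings that were checked and not found."""
    supporting = len(disease.key_symptoms & present)
    supporting += sum(1 for group in disease.symptom_groups if group <= present)
    contradicting = len(disease.key_symptoms & absent)
    return supporting - contradicting


def rank(present, absent=frozenset()):
    """Order diseases by score; ties are broken by the incidence placeholder."""
    ordered = sorted(DISEASES,
                     key=lambda d: (-score(d, present, absent), d.incidence_rank))
    return [d.name for d in ordered]


if __name__ == "__main__":
    initial = {"fever", "night sweats", "feeling weak", "changes in bowel patterns"}
    print(rank(initial))
    found = initial | {"weight loss", "enlarged lymph nodes"}
    ruled_out = {"blood in stool", "thickened intestinal wall"}
    print(rank(found, ruled_out))
```

With the initial symptoms, the two bowel-related diseases lead the list; after the additional findings, lymphoma moves to the top because both its cardinal symptom and the complete group of B symptoms are present.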

3.2.2 Example 2: Lyme Borreliosis

As an infectious disease, Lyme borreliosis has exactly one cause: the patient has been infected by bacteria of the genus Borrelia after a tick bite (Masuhr and Neumann, 1996). Depending on both the part of the body where the infection took place and the time passed since the infection, different symptoms occur. The course of borreliosis is divided into three stages after infection. The cardinal symptom of the first stage (3 days to 3 weeks) is a circular rash called erythema chronicum migrans at the region of the tick bite. In the second stage (1 to 4 months) different body parts will be affected and the patient might show symptoms of a meningopolyneuritis like radicular pain or even facial paralysis. Here, for anamnesis the physician would ask the patient if he remembers a tick bite, but also needs to conduct several tests, since these symptoms might be caused by other diseases. In the third stage (>5 months) untreated patients might show colored areas of skin (acrodermatitis chronica atrophicans) or joint disorders as a possible symptom of Lyme arthritis, and diagnosis is even more complex. The common underlying reason for this diversity of symptoms is the spreading of the bacteria in the body together with the initiated pathomechanism (comprising the effects of “cellular invasion” and consequently “inflammatory reactions”), which progresses at different rates depending on the affected tissues.

Table 3: Requirements and classifications.

# | Requirement | Explaining remark | Based on
Core requirements: mandatory for the intended representation language
R1. | Causality | DKRL must represent causality (cause-effect relationships). | I1, M4
R2. | Causes and effects | DKRL must represent causes and the effects of causes. | I2, M7
R3. | Context-dependent interpretations | DKRL must represent information about the interpretation of measurements taking into consideration DO-specific influences. | I3, M1
R4. | Faults | DKRL must represent faults that may occur or may have occurred on a DO. | I2, M3
R5. | Likeliness under error conditions | DKRL must represent the likeliness of symptoms and faults for each of the DO’s components as well as for the DO as a whole to occur under error conditions. | I4, M6
R6. | Likeliness under nominal operating conditions | DKRL must represent the probability that symptoms and faults occur under nominal operating conditions, since even under optimum conditions there is a possibility of spontaneous malfunctions that might spread throughout the DO. | I6, M4
R7. | Localization (physical) | DKRL must represent where a symptom or fault is located physically. | I5, M2
R8. | Localization (functional) | DKRL must represent where a symptom or fault is located functionally. | I5, M2
R9. | Significance of causal relationships | DKRL must represent that a symptom or cause may have a different significance to a fault or an effect, respectively, than another symptom/cause. | I6, I7, M4
R10. | Symptoms | DKRL must represent symptoms that may occur on a DO. | I2, M7
R11. | Temporal classification | DKRL must represent information that effects might become observable only after some time has passed after the occurrence of a cause. | I8, M8
Application-specific requirements: optional for the intended representation language
R12. | Novelty of symptoms | DKRL must represent information that the new occurrence of symptoms during the course of a fault may be of special importance. |
R13. | Relationships between symptoms/causes/effects | DKRL must represent information that certain symptoms, causes, or effects have a special relationship to other symptoms, causes, or effects, respectively. |
R14. | Temporal relevance of symptoms | DKRL must represent the temporal relevance of symptoms, i.e. the period of time a symptom is of significance for a certain fault. |
R15. | Urgency | DKRL must represent information that certain symptoms need to be investigated prior to others. |
Overall functional requirements: mandatory for effective and efficient knowledge management
R16. | Extensibility | DKRL must represent knowledge in an easily modifiable manner. | I9, M10
R17. | Incompleteness of knowledge | DKRL must represent knowledge so that valid descriptions can be created, even though there might be information missing. | I10, M11
R18. | Reusability | DKRL must represent knowledge in a manner that allows reuse or referencing of knowledge once it has been captured. | I11, M12
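
The temporal structure of the Lyme borreliosis example, one cause with stage-dependent symptoms, can be sketched as a lookup over time windows. The Python sketch below is an illustration only: the tuple layout and the function name `plausible_stages` are hypothetical, and one month is counted as 30 days, which is an assumption made purely for the conversion.

```python
# (stage, first day, last day or None for open-ended, typical symptom).
# Windows follow the example: 3 days to 3 weeks, 1 to 4 months, more than 5 months.
# Assumption: 1 month = 30 days.
STAGES = [
    (1, 3, 21, "erythema chronicum migrans at the region of the tick bite"),
    (2, 30, 120, "radicular pain or facial paralysis (meningopolyneuritis)"),
    (3, 151, None, "acrodermatitis chronica atrophicans or joint disorders"),
]


def plausible_stages(days_since_infection):
    """Return (stage, typical symptom) for every window containing the given day."""
    return [(stage, symptom)
            for stage, first, last, symptom in STAGES
            if days_since_infection >= first
            and (last is None or days_since_infection <= last)]


if __name__ == "__main__":
    print(plausible_stages(10))
    print(plausible_stages(25))   # falls between the stated windows
```

The windows given in the example do not cover every point in time, so the lookup may return an empty list; a representation has to tolerate such gaps rather than force a stage.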

4 REQUIREMENTS ON KNOWLEDGE REPRESENTATION

From these examples, we obtain the aspects to be

represented and derive the requirements for DKRL.

We distinguish between core requirements (most

basic and essential; inherent to all diagnostic

decisions), application-specific requirements (only

relevant within applications for special diagnostic

problems), and overall functional requirements (for

effective and efficient handling of knowledge) (see

Figure 2).

Figure 2: Requirement layers.

KEOD2012-InternationalConferenceonKnowledgeEngineeringandOntologyDevelopment

188

4.1 Relevant Entities from the Considered Domains

In the following sections we describe the knowledge

entities and aspects typical for each domain. The

representation language must be able to represent

these either implicitly or explicitly.

4.1.1 Entities from the Industrial Domain

The examples illustrate that diagnostics in the

industrial domain strongly uses pre-engineered

domain knowledge. If available, systemic

information about the representatives of the

respective domains (i.e. components used in process

technology or in the manufacture of discrete parts)

in terms of structure and functionality is rather

certain and complete. Hence, symptoms and often

their causes are well-known and thus can be directly

captured. We have identified the entities in Table 1

as required to represent this diagnostic knowledge.

4.1.2 Entities from the Medical Domain

The examples illustrate that medical diagnosis is

largely based on the clinician’s experience and

statistical information: he knows from experience

which disease might cause certain general and

cardinal symptoms, and he knows about the

significance of symptoms for certain diseases. Based

on statistics, more frequent diseases will be

considered first. On the other hand, systemic

information about the human organism is less certain

and complete, and processes in the human body are

highly interconnected and not yet understood well

enough to facilitate correct model-based diagnosis.

We consider the entities in Table 2 to be important

in order to represent medical diagnostic knowledge.

4.2 Requirements and Classifications

From the entities listed for each domain, in Table 3

we now derive the requirements that the intended

representation language has to meet (representatives

from the domains are generically referred to as

diagnostic objects, abbreviated DO).
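
To make some of the core requirements more tangible, the following Python sketch shows one possible data model for findings and causal relations. It is a hypothetical illustration and not the DKRL design: the class names `Finding` and `CausalRelation` and all field names are assumptions. Optional fields default to `None` so that descriptions remain valid when information is missing (cf. R17).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    """A symptom or a fault that may occur on a diagnostic object (cf. R4, R10)."""
    name: str
    kind: str                                   # "symptom" or "fault"
    physical_location: Optional[str] = None     # cf. R7
    functional_location: Optional[str] = None   # cf. R8


@dataclass(frozen=True)
class CausalRelation:
    """A cause-effect link between two findings (cf. R1, R2)."""
    cause: Finding
    effect: Finding
    probability: Optional[float] = None   # cf. I6, M4; None means unknown
    delay: Optional[str] = None           # cf. R11; free-text placeholder

    def __post_init__(self):
        # Reject values that cannot be probabilities; missing values are allowed.
        if self.probability is not None and not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must lie in [0, 1]")


if __name__ == "__main__":
    lymphoma = Finding("lymphoma", "fault")
    nodes = Finding("enlarged lymph nodes", "symptom")
    # 80% is the figure quoted for M4 in Table 2.
    print(CausalRelation(lymphoma, nodes, probability=0.8))

    cable = Finding("broken PS cable connection", "fault",
                    physical_location="feeder")
    mode = Finding("wrong feeder mode", "symptom", physical_location="feeder")
    # No probability is known here; the relation is still a valid description.
    print(CausalRelation(cable, mode))
```

Both a medical and an industrial relation fit the same structure, which reflects the intention that the representation is independent of the domain and of the reasoning approach.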

5 CONCLUSIONS AND OUTLOOK

In this paper we have derived the requirements for a

generic diagnostic knowledge representation

language (DKRL) by investigating diagnostic

knowledge from the exemplary domains of industry

and medicine. DKRL is intended for use with any

diagnostic system, for handling diagnostic

knowledge in an easily reusable way. We have

shown that the majority of requirements hold true

in both domains. Application-specific requirements

are mainly induced by the medical domain. Still, this

does not restrict their relevance to the medical

context. In fact, we consider that fulfilling these

requirements step-by-step forms a suitable basis for

gradual extension of the diagnostic functionalities

addressable by DKRL. Hence, the identified

requirements allow for the development of DKRL as

well as a corresponding software infrastructure.

The future development of DKRL will focus on a

prototypical implementation with full coverage of

the core requirements and the overall functional

requirements, followed by adding application-

specific representational capabilities.

ACKNOWLEDGEMENTS

The authors are grateful to Dr. Kristina Bayerlein,

senior physician at the University Hospital of

Erlangen, for details on the interpretation of the

medical terminology.

REFERENCES

Abdul-Wahab, S., Elkamel, A., Al-Weshahi, M. & Al

Yahmadi, A., 2007. Troubleshooting the brine heater

of the MSF plant using a fuzzy logic-based expert

system. In Desalination, vol. 217, no. 1-3, pp. 100-117.

Herold, G., 2011. Innere Medizin, Herold.

Kolodner, J., 1993. Case-Based Reasoning, Morgan

Kaufmann Publishers Inc., San Mateo, CA.

Ligeza, A., 2006. Logical Foundations for Rule-Based

Systems, Springer, Berlin Heidelberg.

Lucas, P., 1998. Analysis of notions of diagnosis. In

Artificial Intelligence, vol. 105, no. 1-2, pp. 295-343.

Masuhr, K., Neumann, M., 1996. Neurologie,

Hippokrates, Stuttgart.

Pearl, J., 2005. Causality, Cambridge University Press,

New York.

Poole, D., 1994. Representing Diagnosis Knowledge. In

Annals of Mathematics and Artificial Intelligence, vol.

11, pp. 33-50.

Reiter, R., 1987. A Theory of Diagnosis from First

Principles. In Artificial Intelligence, vol. 32, no. 1, pp.

57-95.

Van Harmelen, F., Lifschitz, V., Porter, B., 2008.

Handbook of Knowledge Representation, Elsevier

Science.

KnowledgeEngineeringRequirementsforGenericDiagnosticSystems

189