Authors: Alina Corso-Radu 1; Raul Murillo Garcia 1; Andrei Kazarov 2; Giovanna Lehmann Miotto 2; Luca Magnoni 2 and John Erik Sloper 2
Affiliations: 1 University of California, United States; 2 CERN, Russian Federation
Keyword(s): Controls, Expert-system, Knowledge, Verification, Testing, Recovery, ATLAS.
Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Biomedical Engineering; Expert Systems; Health Information Systems; Knowledge Engineering and Ontology Development; Knowledge-Based Systems; Symbolic Systems
Abstract:
The ATLAS Trigger-DAQ (TDAQ) system is composed of O(10000) applications running on ~1500 computers distributed over a network. To maximise the experiment's run efficiency, the TDAQ control system includes advanced verification, diagnostics and complex dynamic error recovery tools, based on an expert system. The error recovery (ER) system is responsible for analysing and recovering from a variety of errors, both software and hardware, without stopping the data-gathering operations. The verification framework allows users to develop and configure tests, at different levels of complexity, for any component in the system. It can be used as a standalone test facility during the general TDAQ initialization procedure, and for diagnosing problems that may occur at run time. A key role in both the recovery and verification frameworks is played by a rule-based expert system, also known as a knowledge-based system, which analyses errors and decides on appropriate recovery actions. The system is composed of a dynamic set of rules that describe the TDAQ system behaviour and an inference engine that decides which actions to perform. The system is currently used on a daily basis for the operation of the ATLAS experiment. The paper describes the architecture and implementation of the TDAQ error-recovery system and verification framework, with emphasis on the latest developments and the experience gained over the first LHC beam runs.
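The split between a set of rules and an inference engine can be illustrated with a minimal forward-chaining sketch. This is not the TDAQ implementation: the `Rule` class, the `infer` function, the component names and the state and action labels below are all hypothetical, chosen only to show how rules describing component states could drive recovery actions until no rule applies.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

# A fact is a (component, state) pair. All names here are hypothetical.
Fact = Tuple[str, str]
Action = Tuple[str, str, str]  # (rule name, action, component)


@dataclass(frozen=True)
class Rule:
    name: str        # label for the rule
    when_state: str  # component state that triggers the rule
    action: str      # recovery or verification action to record
    then_state: str  # state asserted once the action has been taken


def infer(facts: Iterable[Fact], rules: List[Rule],
          max_cycles: int = 100) -> Tuple[List[Action], Set[Fact]]:
    """Fire matching rules repeatedly until no rule applies (forward chaining)."""
    working: Set[Fact] = set(facts)
    actions: List[Action] = []
    for _ in range(max_cycles):  # guard against rule sets that never settle
        fired = False
        for rule in rules:
            # Iterate over a sorted snapshot so the set can be updated safely.
            for component, state in sorted(working):
                if state == rule.when_state:
                    working.discard((component, state))
                    working.add((component, rule.then_state))
                    actions.append((rule.name, rule.action, component))
                    fired = True
        if not fired:
            break
    return actions, working


# Hypothetical rules: restart a crashed application, then test it.
rules = [
    Rule("restart-crashed", "crashed", "restart", "restarting"),
    Rule("verify-restarted", "restarting", "run_test", "running"),
]
actions, final_facts = infer({("app-17", "crashed"), ("app-42", "running")}, rules)
for rule_name, action, component in actions:
    print(f"{rule_name}: {action} {component}")
```

In this sketch the rules are plain data, so the rule set can be changed without touching the engine, which loosely mirrors the "dynamic set of rules" described in the abstract. A production expert system would add pattern matching over richer facts and conflict resolution between competing rules.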