Specifying asynchronous transfer of control
A principal requirement of a safety-critical system is that it should be able to cope with errors and deficiencies in software and hardware. There are two main approaches to handling them: masking and recovery. Masking is usually achieved by replicating hardware or software. One can either adopt strategies such as voting [Avi85] or treat part of the system as a shadow system and activate it when a fault occurs [HAH89]. Even if a subset of the components fails, the entire system can continue to function. The degree of replication depends on the criticality of the unit and the probability of failure. Such a technique cannot be adopted for large systems, as the cost would be prohibitive. Recovery from hardware failures usually involves reassigning the tasks of the failed unit to other units in the system. Recovery from software failures is achieved by transferring control to a recovery unit. The general strategy for recovery can be described as follows. After a unit detects a malfunction, another unit is notified. The notified unit responds to the malfunction as soon as possible by taking appropriate action. The action it takes depends on the nature of the error and could affect other units in the system. [Cri91] describes the various dimensions that are important in fault-tolerant computing. It does not appear to be possible to address all of these issues directly in a single framework. However, one can provide a few primitives which can then be used to code the various detection and recovery techniques that are needed. Asynchronous transfer of control is one such primitive, and it is the focus of this paper. As fault recovery is a high-priority task, the communication between the detection unit and the handler is usually in the form of an interrupt. In this paper we describe a semantic framework for interrupts and show how different kinds of recovery actions can be specified.
The model is an extension of the Action Notation [Mos90, Mos92], which supports various features including distributed computation (asynchronously communicating agents). However, it does not support interrupts (or asynchronous transfer of control). This paper is organized as follows. The next section presents a brief overview of the Action Notation. Section 3 describes our model for interrupts. Section 4 describes the change to the operational semantics of Action Notation. Section 5 presents a few examples using the extended notation.
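
The general recovery strategy described above (a unit detects a malfunction, another unit is notified, and the notified unit interrupts the faulty computation so that a recovery action can run) can be illustrated with a minimal sketch. The sketch is not written in Action Notation and is not the model of this paper; it uses Python's asyncio task cancellation as a stand-in for an interrupt. The names worker, detector, handler and run_demo are hypothetical, and the delays are arbitrary. Note that asyncio delivers a cancellation only at an await point, so this only approximates a truly asynchronous transfer of control.

```python
import asyncio


async def worker(log):
    """Hypothetical unit performing normal computation until interrupted."""
    try:
        while True:
            log.append("work")
            await asyncio.sleep(0.01)
    except asyncio.CancelledError:
        # Control is transferred here when the handler interrupts this unit.
        log.append("recover")
        raise


async def detector(fault):
    """Hypothetical detection unit: reports a malfunction after a short delay."""
    await asyncio.sleep(0.03)
    fault.set()


async def handler(fault, task, log):
    """Notified unit: waits for the fault report, then interrupts the worker."""
    await fault.wait()
    log.append("interrupt")
    task.cancel()


async def main():
    log = []
    fault = asyncio.Event()
    task = asyncio.create_task(worker(log))
    await asyncio.gather(detector(fault), handler(fault, task, log))
    try:
        await task  # let the worker run its recovery action
    except asyncio.CancelledError:
        pass
    return log


def run_demo():
    """Run the sketch and return the recorded sequence of events."""
    return asyncio.run(main())
```

Calling run_demo() returns a list that starts with "work" entries and ends with "interrupt" followed by "recover", showing that the recovery action runs in the interrupted unit only after the handler has been notified.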
Subjects: Fields of Research::280000 Information, Computing and Communication Sciences::280300 Computer Software::280303 Programming languages
- Engineering: Reports