Accident Models and Accident Analysis

Thinking about the performance of complex human-machine systems, indeed thinking about the systems themselves, involves a conceptualisation of how the system is put together and how the interaction between the parts or components should properly be described – in a nutshell, a model of the system. Such models are commonly based on the principle of structural separateness, which means that a human-machine system is seen as being composed of humans, of machines, and of the interaction between them. Although it is undeniably true that humans and machines are physically separate, there are nevertheless good reasons to question the validity of this approach. Firstly, there is a growing number of bionic systems where the distinction between humans and machines is less clear-cut, since the two may be structurally integrated or even fused. Secondly, the decomposed human-machine view has some fundamental limitations, which are turning out to be major obstacles for research and development. Finally, the structure of a system is often less important than its function, and the latter may require a breakdown that does not map easily onto the structure.

The role of a system model is essential in thinking about how systems can malfunction, or in other words in thinking about accidents. A fundamental distinction is whether accidents are due to specific malfunctions or “error mechanisms”, or whether they are due to unfortunate coincidences. Over the years, the efforts to explain and predict accidents have involved a number of stereotypical ways of accounting for how events may take place. Although there are many individual instances of such accident models, they seem to fall into three types, which can be called sequential, epidemiological, and systemic accident models.

Sequential Accident Models

The simplest type describes accidents as the result of a sequence of clearly distinguishable events that occur in a specific order. The classical example of this is the so-called domino model (Heinrich, 1931), which depicts the accident as a set of dominos that tumble because of a single initiating event. In this model the dominos that fall represent the action failures, while the dominos that remain standing represent the normal events. The model is deterministic because the outcome is seen as a necessary consequence of one specific event. Another example is the Accident Evolution and Barrier model (Svenson, 1991), which in contrast to the domino model only describes the events – or rather the barriers – that failed. This model is sequential but not strictly deterministic, since it cannot be assumed that the failure of one barrier leads to the failure of another. Sequential models need not, of course, be limited to a single sequence of events but may include a representation of multiple sequences of events in the form of hierarchies, such as the traditional event tree, and networks, such as Critical Path models (Programme Evaluation and Review Technique or PERT) or Petri nets.
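
To make the logic of a sequential model concrete, the following sketch (in Python) implements a domino-style chain in which one initiating failure deterministically topples every later element. The chain and its labels are purely illustrative – they loosely paraphrase Heinrich's five factors and are not taken from any specific analysis.

```python
# Illustrative sketch of a sequential (domino-type) accident model.
# Labels and the initiating failure are hypothetical; the point is only that
# the outcome follows deterministically from one event in a fixed order.

dominos = ["social environment", "fault of person", "unsafe act", "accident", "injury"]

def propagate(chain, initiating_index):
    """Once one domino falls, every later domino in the chain falls as well."""
    fallen = [False] * len(chain)
    for i in range(initiating_index, len(chain)):
        fallen[i] = True
    return fallen

state = propagate(dominos, dominos.index("unsafe act"))
for name, down in zip(dominos, state):
    print(f"{name:20s} {'fallen' if down else 'standing'}")
```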

Epidemiological Accident Models

Sequential models are attractive because they are easy to understand – and have convenient graphical representations – but they suffer from being oversimplified. An alternative is provided by the epidemiological model, which describes an accident in analogy with a disease, i.e., as the outcome of a combination of factors. Some of these factors are manifest and some are latent, and accidents seem to happen when a sufficient number of factors come together in space and time. The classical example of this is the description of latent conditions (Reason, 1990). Other examples are models that consider barriers and carriers (Hollnagel, 2004), and models of pathological system (organisation) states. Epidemiological models are structurally and functionally underspecified, but they are valuable because they provide a basis for discussing the complexity of accidents that overcomes the limitations of sequential models. Unfortunately, epidemiological models are never stronger than the analogy they use, and they are often difficult to specify in further detail, even though they have been instrumental in developing methods that can be used to characterise the general “health” of a system (Reason, 1997).
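
The coincidence idea behind the epidemiological model can be illustrated with a small simulation: an accident is counted only when a sufficient number of manifest and latent factors happen to be present at the same time. The factor names, probabilities, and threshold below are assumptions made for the sake of the example, not values from any of the cited models.

```python
# Illustrative sketch of the epidemiological view: an unwanted outcome occurs
# only when enough manifest and latent factors coincide. All values are assumed.

import random

factors = {
    "latent: inadequate maintenance": 0.20,
    "latent: unclear procedures": 0.15,
    "manifest: high workload": 0.10,
    "manifest: degraded alarm": 0.05,
}
THRESHOLD = 3          # number of co-occurring factors taken to be "sufficient"
TRIALS = 100_000

random.seed(0)
accidents = 0
for _ in range(TRIALS):
    present = sum(random.random() < p for p in factors.values())
    if present >= THRESHOLD:
        accidents += 1

print(f"Estimated frequency of sufficient coincidences: {accidents / TRIALS:.4f}")
```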

Systemic Accident Models

A third option is the so-called systemic model, which endeavours to describe the characteristic performance on the level of the system as a whole, rather than on the level of specific “cause-effect mechanisms”. The analogical forms of systemic models are “Brownian” movement models and chaos models. More distinct exemplars are found in models based on control theory (Sheridan, 1992), which provide an account of how the complexities of human-machine interaction may lead to function failures. Yet another exemplar is the so-called coincidence model. A token of the latter type is the Swiss cheese analogy (Reason, 1997), although this is not a model in the usual meaning of the term. It is, however, possible to develop coincidence models that are more detailed and precise, and which potentially can yield accurate predictions. The overriding advantage of systemic models is their emphasis that accident analysis must be based on an understanding of the functional characteristics of the system, rather than on assumptions or hypotheses about interactions between structures or internal mechanisms as provided by standard representations of, e.g., information processing or failure pathways.
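
The following sketch illustrates the systemic (coincidence) idea: several functions each vary within their normal range, and an unwanted outcome arises only when the variability of the functions happens to combine, without any single component failing. The distributions and the tolerance limit are assumptions made for the purpose of illustration only.

```python
# Illustrative sketch of a systemic (coincidence) view: no component "fails",
# but normal variations of several functions occasionally line up and exceed
# what the system can absorb. Distributions and tolerance are assumed values.

import random

random.seed(1)
N_FUNCTIONS = 4
TOLERANCE = 6.0        # deviation the system as a whole is assumed to absorb
TRIALS = 100_000

coincidences = 0
for _ in range(TRIALS):
    # each function deviates a little from its nominal performance
    deviations = [random.gauss(0.0, 1.0) for _ in range(N_FUNCTIONS)]
    if sum(deviations) > TOLERANCE:    # normal variations happen to combine
        coincidences += 1

print(f"Trials where normal variability combined into an unwanted outcome: "
      f"{coincidences / TRIALS:.5f}")
```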

From Error Management to Performance Variability

The way we think about systems determines how we respond to the events that manifest themselves as accidents, both in direct interaction and in developing more considered responses. (This is the case not only for technological systems, but also for social systems such as other people, organisations, etc.) Each type of accident model characterised above has consequences for how unwanted performance outcomes are dealt with, and specifically for the measures that are taken to improve system safety during design and operation.

Each type of accident model represents a characteristic approach to how responses to accidents should be determined. The three approaches, as shown in Table 1, can be called “error” management, performance deviation management, and performance variability management respectively. “Error” management is based on the assumption that the development of an accident is deterministic, as in the case of the sequential type of accident models. Consistent with that assumption, it should be possible to identify a clear “root cause” – or set of “root causes” – and therefore to prevent future accidents by eliminating or encapsulating the identified causes.

Table 1: Three approaches to accident management

Management principle               | Accident model                                              | Nature of causes                                           | Response type
"Error" management                 | Accident development is deterministic (cause-effect links)  | Causes can be clearly identified (root cause assumption)   | Eliminating or containing causes will exclude accidents
Performance deviation management   | Accidents have both manifest and latent causes              | Blunt end / sharp end deviations have clear signatures     | Deviations leading to accidents must be suppressed
Performance variability management | Variability can be helpful as well as disruptive            | Sources of variability can be identified and monitored     | Some variability should be amplified, some reduced

The second approach, called performance deviation management, recognises that accidents may have both manifest and latent causes, and corresponds to the epidemiological type of accident models. It is acknowledged that it may be difficult or impossible to find specific “root causes”, and instead the search is for traces or signatures of characteristic types of deviations. The prevention of accidents is achieved by finding ways of eliminating or suppressing the potentially harmful deviations.

While the performance deviation approach is a significant advance over the “error” management approach, it still entails the view that “errors” or deviations are negative and therefore undesirable. As Amalberti (1996) has argued, “errors” or deviations have a positive side since they enable users – and systems – to learn about the nature of accidents. Indeed, deviations from the norm can have a distinctly positive effect and be a source not only of learning but also of innovation. This requires that the system has sufficient resilience to withstand the consequences of the uncommon action, and that it is possible for the users to see what has happened and how.

Performance variability management captures this dual nature of performance deviations. This approach fully acknowledges that unwanted outcomes are usually the result of coincidences, which are the inevitable consequences of the natural variability of a system’s performance. The variability can be found at every level of system description and for every kind of system, from mechanical components to organisations, as well as at every level of subsystem. It is assumed that it is possible to identify the sources of variability, and therefore not only to define their characteristic “signatures” but also to monitor them in some way. The monitoring can be used either to suppress the variability that may lead to unwanted outcomes, or to enhance or amplify the variability that may lead to positive outcomes.
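
As a purely illustrative sketch of what such monitoring might look like, the fragment below computes a crude variability “signature” (the spread of a set of observations) for two hypothetical sources and suggests whether each should be dampened or amplified. The source names, readings, and threshold are invented for the example and do not come from any published method.

```python
# Illustrative sketch of performance variability management: monitor sources of
# variability and respond by dampening potentially disruptive variability and
# amplifying potentially helpful variability. All names and values are assumed.

from statistics import stdev

observations = {
    "procedure adaptation": [0.9, 1.1, 1.4, 0.8, 1.6],
    "handover timing": [2.0, 3.5, 1.0, 4.2, 0.5],
}

def recommend(name, values, disruptive_spread=1.0):
    """Crude 'signature': a large spread is treated as potentially disruptive."""
    spread = stdev(values)
    if spread > disruptive_spread:
        return f"{name}: spread {spread:.2f} -> dampen (monitor and reduce)"
    return f"{name}: spread {spread:.2f} -> amplify (support and learn from it)"

for source, values in observations.items():
    print(recommend(source, values))
```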

Performance variability management accepts that accidents cannot be explained in simplistic cause-effect terms, but that they instead represent the outcome of complex interactions and coincidences which are due to the normal performance variability of the system, rather than to actual failures of components or functions. (One may, of course, consider actual failures as an extreme form of performance variability, i.e., the tail end of a distribution.) To prevent accidents there is therefore a need to be able to describe the characteristic performance variability of a system, how such coincidences may build up, and how they can be detected. This reflects the practical lesson that simply finding one or more “root” causes in order to eliminate or encapsulate them is inadequate to prevent future accidents. Even in relatively simple systems new accidents continue to occur, despite the best efforts to the contrary.

See also the item on Performance Variability Management

References

Amalberti, R. (1996). La conduite des systèmes à risques. Paris: PUF.
Heinrich, H. (1931). Industrial accident prevention. New York: McGraw-Hill.
Hollnagel, E. (2004). Barriers and accident prevention. Aldershot, UK: Ashgate.
Reason, J. (1990). The contribution of latent human failures to the breakdown of complex systems. Philosophical Transactions of the Royal Society of London, Series B, 327, 475-484.
Reason, J. T. (1997). Managing the risks of organizational accidents. Aldershot, UK: Ashgate.
Sheridan, T. B. (1992). Telerobotics, automation, and human supervisory control. Cambridge, MA: MIT Press.  
Svenson, O. (1991). The accident evolution and barrier function (AEB) model applied to incident analysis in the processing industries. Risk Analysis, 11(3), 499-507.

© Erik Hollnagel, 2005
