The Elusiveness of "Human Error"

 

(The text below is based on Hollnagel, E. & Amalberti, R. The Emperor’s New Clothes, or whatever happened to “human error”? Invited keynote presentation at the 4th International Workshop on Human Error, Safety and System Development, Linköping, June 11-12, 2001.)

Introduction

There has for many years been a protracted discussion about the nature and meaning of “human error”. Since the term has been part of daily language for a very long time, many researchers have taken it at face value and gone on to propose an impressive number of models, theories, and methods to deal with “human error”. There is considerable face validity to this approach, since human action (and inaction) clearly plays a major role in a large number of spectacular incidents and accidents – and in an even larger number of seemingly mundane events. Others have for some time argued that the term “human error” as such is ill defined, that its use is bound to be misleading, and that it therefore should not be used at all. The theoretical arguments have little by little been supplemented by a growing realisation that the search for “human error”, and indeed the search for any kind of root cause, is misguided because it corresponds to an oversimplified conception of how events occur.

The Argument From Semantics

In daily life, and in daily language, we use the term “human error” casually on two assumptions: (1) that everyone understands it, and (2) that other people’s understanding is the same as ours. In communication between people it is sometimes a problem that a term may have a common denotation but different connotations. In the case of “human error” the situation is, interestingly enough, that the term has common connotations but different denotations. The fact that everyone responds to the term, because it is in a sense intuitively meaningful, has created the misconception that it is a well-defined term and that everyone understands it in the same way.

 

The fundamental semantic problem is that the term “human error” has at least three different denotations: it can mean the cause of something, the event itself (the action), or the outcome of the action.

The multiple denotations are unfortunately not the only problem with “human error”. Another is that it alludes to the notion of right and wrong, or correct and incorrect, that is, a binary distinction. Yet even if we limit the use of the term “human error” to denote “error as event”, the notion of an action gone wrong is a serious oversimplification. In practice, people may often realise, consciously or subconsciously, that something has gone awry before the consequences have had time to manifest themselves, and may therefore attempt to compensate for or adjust the development of events. Following the proposal of Amalberti (1996), this leads to the following classification.

A common element in the above descriptions is the detection or recognition that the outcome differs from what was expected. In cases where there is an observable outcome, or even some direct feedback from the system, this does not pose any problems. In cases where the recognition happens as the action is carried out, such as in typing or speaking, it raises interesting questions about how such discrepancies can be detected. One explanation is that the brain makes some kind of comparison between actual and intended outcomes on a neural level. Whatever the explanation may be, the fact remains that humans are quite good at detecting when something has gone wrong.

 

The existence of these five categories of action makes it clear that the binary distinction between correct actions and “errors” is an oversimplification and therefore inappropriate. In fact, the whole argument so far leads to the conclusion that it is misleading to consider the specific action as a cause in itself. Furthermore, even if an action was carried out incorrectly, it is not necessarily a bad thing. Failures provide an important opportunity to learn, particularly if the outcome was a near miss or an incident rather than an accident.

The Argument From Philosophy

The argument from philosophy relates to the metaphysical nature of causation. There is, of course, no reason to doubt the reality of causality, since almost everything we do bears witness to it. It is enshrined in the laws of physics, at least outside the world of quantum effects, and if further proof is needed it is sufficient to consider the manifest success in building technological systems and, indeed, in being able to survive in a complex world in the first place. Yet even though it is possible to observe two events, call them A and B, and also to infer with more than reasonable certainty that one is the cause of the other, the determination of a causal relation is the result of reasoning rather than of observation. This was clearly pointed out by David Hume, who noted that the necessary conditions for establishing a causal relation between two events were priority in time, meaning that A should happen before B, and contiguity in space and time, meaning that A should be close to B in both respects. The conditions of priority and contiguity are necessary to conclude that a causal relation exists, but they are not sufficient. Indeed, it is generally acknowledged that causality cannot be attributed solely on the basis of a temporal relation.

 

In the case of “human error”, the issue is even more complicated since it refers to the notion of backward causality, i.e., reasoning from effect to cause. In the simple case, we may observe that event A was followed by event B and conclude that B was the effect of A. In the more complex case that is the subject matter here, we observe event B, assume that it was an effect of something, and then try to find out which event A was its cause. The problem of backward causation is aggravated by two common mistakes. The first is the human tendency to draw conclusions that are not logically valid. Thus, if we know that “If A, then B” is true, we are prone to assume that “B, therefore A” also is true (Wason & Johnson-Laird, 1972). In relation to backward causation this means that we fall into the trap of falsely associating a cause with an effect. This deficiency in the ability to reason in accordance with the rules of logic is exacerbated by the tendency to rely on heuristics in reasoning, as described by, e.g., Tversky & Kahneman (1974).
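
To make the fallacy concrete, the small sketch below (an illustration added here, not part of the original argument) enumerates all truth-value combinations for A and B. The single case in which A is false and B is true satisfies “If A, then B” while refuting “B, therefore A”, which is exactly the trap described above.

    # Enumerate the truth table for A and B and look for a case in which the
    # conditional "if A then B" is true while the reverse inference
    # "B, therefore A" fails. The single counterexample is A = False, B = True.
    from itertools import product

    for a, b in product([True, False], repeat=2):
        if_a_then_b = (not a) or b       # material conditional: false only when A and not B
        b_therefore_a = (not b) or a     # the (invalid) backward inference
        if if_a_then_b and not b_therefore_a:
            print(f"Counterexample: A={a}, B={b}")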

 

The second mistake is the failure to realise that the sequential relation between events is to a considerable extent an artefact of a description based on time. In the search for a cause, such as in accident analysis, it is common practice to represent how the events took place by means of a timeline. While it is certainly very sensible to do so, it should be realised that in such a description events will always follow each other. There will therefore be a contiguity in time (and also in the graphical space of the representation) that is fortuitous, but which nevertheless may affect the search for a cause. In the case of “human error” this is of some importance, since one or more human actions can always be found. The artefactual contiguity in time, combined with the tendency to draw false logical conclusions, therefore strongly predisposes people to find causes where there are none, and in particular to find “human errors” all over the place.

The Argument From Logic

The argument from logic addresses the problem of the stop rule in searching for causes. As pointed out by many authors, the stop rule is always relative rather than absolute. Even though accident investigations ostensibly aim to find the “root cause”, the determination of a cause reflects the interests of stakeholders as much as what actually happened. Finding a cause is thus a case of expediency as much as of logic. There are always practical constraints that limit the search in terms of, e.g., material resources or time. Any analysis must stop at some point, and the criterion is in many cases set by interests that are quite remote from the accident investigation itself. As Woods et al. (1994) have pointed out, a cause is always a judgement made in hindsight and is therefore subject to the common malaise of besserwissen, of knowing better after the fact. More precisely, a cause – or rather, an acceptable “cause” – usually has the following characteristics:

Even if the search for the cause is made as honestly as possible, it is necessary to stop at some point. In the case of hierarchical classification systems, such as the common “error taxonomies”, the stop rule is given by the structure of the taxonomy. Not only that, but the analysis always begins in the same place and goes through a pre-determined number of steps, i.e., it has a fixed direction and a fixed depth. The situation is somewhat better for analysis methods that use a flexible or non-hierarchical classification scheme. In these cases the analysis begins at the most appropriate category, and the direction and depth of the analysis are determined by the context rather than by the number of categories. Yet the stop rule problem exists even here, since the search can only continue as long as there is sufficient information. The contrast between the two kinds of stop rule is sketched below.
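
The following sketch is purely illustrative: the taxonomy levels, the event representation, and the “sufficient information” test are invented for this example and are not taken from any real analysis method. It only shows how a fixed taxonomy dictates both direction and depth, whereas a flexible scheme stops when the event description runs out of information.

    # A hypothetical sketch of the two stop rules discussed above. The taxonomy
    # levels, the event representation, and the "sufficient information" test
    # are invented for this example; they do not come from any specific method.

    FIXED_TAXONOMY = ["outcome", "error mode", "cognitive function", "assumed cause"]

    def hierarchical_analysis(event):
        # The structure of the taxonomy fixes both direction and depth:
        # the analysis always starts at the same level and walks every level.
        return [(level, event.get(level, "unclassified")) for level in FIXED_TAXONOMY]

    def flexible_analysis(event, start):
        # Direction and depth are set by the context: the search continues only
        # as long as the event description supplies information (the stop rule).
        trace, level = [], start
        while level is not None and level in event:
            trace.append((level, event[level]))
            level = event.get("leads to " + level)   # context-dependent next step
        return trace

    # A deliberately sparse, invented event report.
    report = {"outcome": "near miss", "error mode": "mistimed action",
              "leads to outcome": "error mode"}

    print(hierarchical_analysis(report))         # fixed depth: four steps, mostly "unclassified"
    print(flexible_analysis(report, "outcome"))  # stops when the information runs out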

 

The logical problem in searching for causes exists because there can be no absolute limit to the depth of the analysis. Even though in practice there will always be some point where it makes little sense to go on, the stop rule is still subject to the accumulated knowledge and experience encapsulated in the commonly accepted classification schemes. This can be illustrated by considering the development in the categories of causes over the last 50 years or so. The starting point was a distinction between technical failures, “human error” and “other” – the latter being the famous garbage can category for things we either do not know or do not care about. Over the years there has been a proliferation of categories in the “human error” and “other” groups, but less development in the group of technical failures. Indeed, the imagination of analysts and psychologists seems to know few bounds when it comes to inventing new explanations for “human errors”. Relative to the present discussion, the development shows that the determination of a cause is limited by the categories available to the analyst as well as by the uncertainty of the stop rule. This is not least the case for the notion of “human error”, which has undergone several radical changes over the years.

The Argument From Practice

The fourth argument comes from the practical problems in using classifications of “human error”. One indicator of this is the problem with the reliability of categorising or coding the “human error” components in event reports or observations of human performance. For a classification system of “human error” to be useful, it is a fundamental requirement that the categories can be used with sufficiently high reliability. In other words, the quality of analysing or coding an event description should depend on the qualities of the classification scheme rather than on the skills and experience of the person using it. This is commonly expressed in terms of inter-rater reliability, which can be measured in various ways. As an example, Wallace et al. (2001) selected 28 event reports which had previously been analysed and coded as part of the normal handling of the reports. They then asked three experienced analysts to read the event reports and to code them following an established approach with which the analysts were familiar. The new results were compared to the originally assigned codes, and an inter-rater reliability was calculated for each pair of coders using an Index of Concordance measure. The outcome was an average inter-rater reliability of 42%, which is well below the generally accepted norm of 75%. One possible explanation for this result was that the coding system used was rather complex and involved a partly inconsistent terminology. The coding system had in fact been developed incrementally over a number of years, and consequently did not represent a single principle or a consistent underlying model. Wallace et al. therefore repeated the experiment, but this time used a coding system that was based on an articulated human factors or “human error” model. In the second study nine experienced coders were asked to work with another set of twelve randomly selected event reports. The result this time was an inter-rater reliability that ranged between 61% and 81%. Although this is not yet perfect, it was a clear improvement over the old system, and one reason seemed to be the more logical structure of the codes.
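
As an illustration of the measure involved, the sketch below assumes that the Index of Concordance is computed as simple percentage agreement, i.e., the number of items coded identically divided by the total number of items, calculated for each pair of coders and then averaged. The coder names and codes are invented for the example and are not taken from the Wallace et al. study.

    # A minimal sketch of a pairwise Index of Concordance calculation, assuming
    # it is simple percentage agreement between two coders, averaged over all
    # pairs. The coder names and codes below are invented for illustration.
    from itertools import combinations

    def concordance(codes_a, codes_b):
        # Percentage of items on which two coders assigned the same code.
        agreements = sum(a == b for a, b in zip(codes_a, codes_b))
        return 100.0 * agreements / len(codes_a)

    codings = {
        "coder_1": ["slip", "violation", "slip", "lapse"],
        "coder_2": ["slip", "mistake",   "slip", "lapse"],
        "coder_3": ["lapse", "mistake",  "slip", "lapse"],
    }

    pairwise = {(a, b): concordance(codings[a], codings[b])
                for a, b in combinations(codings, 2)}
    average = sum(pairwise.values()) / len(pairwise)

    print(pairwise)                      # agreement for each pair of coders
    print(f"average = {average:.0f}%")   # compare with the 75% norm cited above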

 

The findings from this study are consistent with the general experience from the field, i.e., that the strength of a method may be due to the people who use it rather than to the method itself. There are several reasons for this. One is that the concepts and categories may be incompletely defined and hence open to interpretation. This can be illustrated simply by listing popular concepts such as stress, attention, violation, memory slip, diagnosis, mistake, plan, etc. On a first reading these may seem to be perfectly good terms, but when they have to be applied to describe or characterise important parts of an event, people may use them differently or even disagree about what they really mean. A second reason is that the way in which people understand a description of an event, or even the way in which they notice what other people do, depends on their specific background, experience, and knowledge of the situation. If two different people, for instance a subject matter expert and a human factors expert, are confronted with the same data or descriptions, they may consequently notice or see quite different things. This is a simple consequence of the fact that we pay attention to what we know or consider to be important, and disregard that which we consider irrelevant or do not know about.

 

The practical problems are thus mainly due to the assumption that actions – whether correct or incorrect – can be classified unambiguously by a context-free set of categories. Yet the lesson from practice is that it is very difficult to classify actions without a context. (This obviously goes for “correct” actions as well as “errors”, and for human observers as well as for automated classification systems.) Or rather, the lesson is that a classification always implies a context, but the context of one observer may be quite different from that of another, and different again from that of the person who is acting. It is furthermore impossible within any conceptual framework to define an absolute or reference context relative to which actions can be unequivocally classified as right or wrong. Since the context implied by most “error” taxonomies is actually very sparse, because the supporting theories of human action are usually insufficiently articulated, it follows that it is impossible, both in principle and in practice, to use such taxonomies to classify “errors” in a reliable fashion.

 

In Conclusion

The term “human error” should be used carefully and sparingly – if it is to be used at all. In the long term it may be prudent to refrain from considering actions as being either correct or incorrect, firstly because these distinctions rarely apply to the action in itself but rather to the outcome, and secondly because they imply a differentiation that is hard to make in practice. The alternative is to acknowledge that human performance (as well as the performance of technological systems) is always variable. Sometimes the variability becomes so large that it leads to unexpected and unwanted consequences, which are then called “errors”. Yet regardless of what the outcome is, the basis for the performance variability is the same, and classifying one case as an “error” and the other as not may have little practical value. Instead of trying to look for “human errors” as either causes or events, we should try to find where performance may vary, how it may vary, and how the variations may be detected and – eventually – controlled. (See also the item on Accident Models.)

References

Amalberti, R. (1996). La conduite des systèmes à risques, Paris: PUF.

Tversky, A. & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124-1131.

Wallace, B., Ross, A., Davies, J. B. & Wright, L. (2001). The creation of a new minor event coding system. Cognition, Technology & Work, 3(4), in press.

Wason, P. C. & Johnson-Laird, P. N. (1972). Psychology of reasoning. London: B. T. Batsford.

Woods, D. D., Johannesen, L. J., Cook, R. I. & Sarter, N. B. (1994). Behind human error: Cognitive systems, computers and hindsight. Columbus, Ohio: CSERIAC.

 

© Erik Hollnagel, 2005

 
