Word Sense Disambiguation
Edited by Eneko Agirre and Philip Edmonds


Graeme Hirst

Of the many kinds of ambiguity in language, the two that have received the most attention in computational linguistics are those of word senses and those of syntactic structure, and the reasons for this are clear: these ambiguities are overt, their resolution is seemingly essential for any practical application, and they seem to require a wide variety of methods and knowledge-sources with no pattern apparent in what any particular instance requires.

Right at the birth of artificial intelligence, in his 1950 paper "Computing machinery and intelligence", Alan Turing saw the ability to understand language as an essential test of intelligence, and an essential test of language understanding was an ability to disambiguate; his example involved deciding between the generic and specific readings of the phrase a winter's day. The first generations of AI researchers found it easy to construct examples of ambiguities whose resolution seemed to require vast knowledge and deep understanding of the world and complex inference on this knowledge; for example, Pharmacists dispense with accuracy. The disambiguation problem was, in a way, nothing less than the artificial intelligence problem itself. No use was seen for a disambiguation method that was less than 100% perfect; either it worked or it didn't. Lexical resources, such as they were, were considered secondary to non-linguistic common-sense knowledge of the world.

And because the methods that were developed required a resource whose eventual existence was merely hypothesized - a knowledge base containing everything a typical adult knows - and because there were no test data available, it was not possible to test the methods empirically or to evaluate them or their underlying ideas quantitatively in any serious way. Rather, systems and methods were presented like theorems whose truth or correctness could be demonstrated by a rational argument bolstered by hand-waving and a "toy" demonstration: a knowledge source would be built for a few words and facts, and the system would be run on a few "interesting" constructed examples to show that it did "the right thing". This approach to evaluation was quite normal in the milieu in which this research was carried out and didn't seem to worry anyone at the time: computational linguistics had not yet achieved its empirical orientation.

Contemporary approaches have turned all that upside-down. Statistical and machine-learning methods and methodologies that have been adopted in the last decade have revolutionized our view of ambiguity resolution. It is now understood that imperfect methods that rely on rich lexical resources but limited additional knowledge have great use in the world; and that systems must undergo rigorous evaluation. The present volume demonstrates this in particular for word sense disambiguation - both the strengths and the inherent limitations of these approaches. [Footnote 1: A similar revolution has occurred in parsing and structural disambiguation; see Manning and Schütze (2000, Chaps. 11-12) for an overview.] In particular, contemporary methods are less ambitious and have lower expectations. Unlike the earlier research, they don't worry about case roles, about helping a parser with attachment decisions, or about working with a semantic interpretation process aimed at a deep level of "understanding". Rather than aiming for a complete solution and hypothesizing a resource that this necessitates, they rely on an existing resource and try to see how much can be done with it. And yet they still have enormous application in NLP (see Chap. 11).

One issue that has remained constant is what kinds of information in the text may be drawn upon as cues for disambiguation, and how near in the text to the target word those cues should be. In my own early work (Hirst 1987), restrictions on communication between disambiguating processes arose from two competing principles: any particular word or structural cue for disambiguation has quite a limited sphere of influence, and yet almost anything in a text or discourse is potentially a cue for disambiguation (cf. McRoy 1992). In contemporary systems, the analogous dilemma is in the choice of features and the window size (see Chap. 8).
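To make the window-size dilemma concrete, here is a minimal sketch in Python of one common kind of feature: the bag of words within a fixed distance of the target word. The function name `window_features`, the example sentence, and the two window sizes are assumptions chosen for illustration; they are not taken from any chapter of the book.

```python
def window_features(tokens, target_index, window_size):
    """Return the set of lowercased words within window_size tokens of the target.

    Illustrative only: real systems typically add part-of-speech tags,
    collocations, syntactic relations, and other cues.
    """
    start = max(0, target_index - window_size)
    end = min(len(tokens), target_index + window_size + 1)
    return {tokens[i].lower() for i in range(start, end) if i != target_index}


# Hypothetical example sentence; the target word is "bank".
tokens = "She sat on the bank of the river and watched the water".split()
target = tokens.index("bank")

narrow = window_features(tokens, target, 2)  # two words on each side
wide = window_features(tokens, target, 4)    # four words on each side

print(sorted(narrow))
print(sorted(wide))
```

In this toy sentence the narrow window contains only function words, while the wider one picks up "river", the one word that actually helps. A wider window is not free, though: it also admits more words that have nothing to do with the target.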

The other thing that hasn't changed is how hard the lexical disambiguation problem is. Many sophisticated systems struggle merely to reach the modest accuracy of simple baseline algorithms such as that of Lesk (1986) (see Chap. 5) or just choosing the most-frequent sense. But what is a poor computer to do when humans themselves frequently disagree on what the correct answer is supposed to be (see Chaps. 2-4)?
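For readers who have not met these baselines, the following Python sketch shows how little machinery they involve. It implements a commonly used simplified variant of the Lesk idea, which counts word overlap between each sense's gloss and the context, rather than Lesk's original gloss-to-gloss comparison. When no gloss overlaps the context, it falls back to the first-listed sense, standing in for the most-frequent-sense baseline. The sense identifiers, the glosses, the sense ordering, and the stopword list are all invented for illustration and come from no real dictionary.

```python
# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "of", "or", "and", "in", "on",
             "with", "that", "for", "to", "from"}

# Hypothetical toy sense inventory. The first sense listed is ASSUMED
# to be the most frequent one.
TOY_SENSES = {
    "cone": [
        ("cone_pine", "fruit of a pine or other evergreen tree bearing seeds"),
        ("cone_ice_cream", "crisp wafer holder for a scoop of ice cream"),
    ]
}


def content_words(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def simplified_lesk(word, context):
    """Pick the sense whose gloss shares the most content words with the context.

    Falls back to the first-listed sense when no gloss overlaps the context.
    """
    senses = TOY_SENSES[word]
    context_words = content_words(context) - {word}
    best_sense, best_overlap = senses[0][0], 0
    for sense_id, gloss in senses:
        overlap = len(context_words & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


print(simplified_lesk("cone", "she ordered a cone with one scoop of vanilla ice cream"))
print(simplified_lesk("cone", "the cone fell from the evergreen tree"))
print(simplified_lesk("cone", "he picked up the cone"))
```

The third call has no overlapping words at all, so the answer comes entirely from the assumed sense ordering.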

Although it is an edited volume, this book is not an anthology of "recent advances" papers by individual authors on their own research, requiring each reader to synthesize a view of the overall situation in a research topic. Rather, editors Agirre and Edmonds have enlisted the leading researchers of the field to do the hard work. Each chapter of this book presents an overview and synthesis of one facet of current research. The result is a clear and well-organized presentation of the state of the art in word sense disambiguation that can be read, like a textbook, from start to finish. I commend it to you.

Graeme Hirst is the author of Semantic Interpretation and the Resolution of Ambiguity (Cambridge University Press, 1987), which presents an integrated theory of lexical disambiguation, structural disambiguation, and semantic interpretation.


Hirst, Graeme. 1987. Semantic Interpretation and the Resolution of Ambiguity. Cambridge University Press.

Lesk, Michael. 1986. Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. Proceedings of SIGDOC-86: 5th International Conference on Systems Documentation, Toronto, Canada, 24-26.

Manning, Christopher D. & Hinrich Schütze. 2000. Foundations of Statistical Natural Language Processing. Cambridge, MA: The MIT Press.

McRoy, Susan. 1992. Using multiple knowledge sources for word sense discrimination. Computational Linguistics, 18(1):1-30.

Turing, Alan M. 1950. Computing machinery and intelligence. Mind, 59:433-460. Reprinted in: Stuart Shieber, ed. 2004. The Turing Test: Verbal Behavior as the Hallmark of Intelligence. Cambridge, MA: The MIT Press.

Copyright © 2006 Springer. All rights reserved. Reprinted here by permission.