Word Sense Disambiguation
Edited by Eneko Agirre and Philip Edmonds

Extended contents

Introduction. 1

1.1 Word sense disambiguation. 1

1.2 A brief history of WSD research. 4

1.3 What is a word sense? 8

1.4 Applications of WSD. 10

1.5 Basic approaches to WSD. 12

1.6 State-of-the-art performance. 14

1.7 Promising directions. 15

1.8 Overview of this book. 19

1.9 Further reading. 21

References. 22

2  Word senses. 29

2.1 Introduction. 29

2.2 Lexicographers. 30

2.3 Philosophy. 32

2.3.1 Meaning is something you do. 32

2.3.2 The Fregean tradition and reification. 33

2.3.3 Two incompatible semantics? 33

2.3.4 Implications for word senses. 34

2.4 Lexicalization. 35

2.5 Corpus evidence. 39

2.5.1 Lexicon size. 41

2.5.2 Quotations. 42

2.6 Conclusion. 43

2.7 Further reading. 44

Acknowledgments. 45

References. 45

3  Making sense about sense. 47

3.1 Introduction. 47

3.2 WSD and the lexicographers. 49

3.3 WSD and sense inventories. 51

3.4 NLP applications and WSD. 55

3.5 What level of sense distinctions do we need for NLP, if any? 58

3.6 What now for WSD? 64

3.7 Conclusion. 68

References. 68

4  Evaluation of WSD systems. 75

4.1 Introduction. 75

4.1.1 Terminology. 76

4.1.2 Overview. 80

4.2 Background. 81

4.2.1 WordNet and Semcor. 81

4.2.2 The line and interest corpora. 83

4.2.3 The DSO corpus. 84

4.2.4 Open Mind Word Expert. 85

4.3 Evaluation using pseudo-words. 86

4.4 Senseval evaluation exercises. 86

4.4.1 Senseval-1. 87

Evaluation and scoring. 88

4.4.2 Senseval-2. 88

English all-words task. 89

English lexical sample task. 89

4.4.3 Comparison of tagging exercises. 91

4.5 Sources of inter-annotator disagreement. 92

4.6 Granularity of sense: Groupings for WordNet. 95

4.6.1 Criteria for WordNet sense grouping. 96

4.6.2 Analysis of sense grouping. 97

4.7 Senseval-3. 98

4.8 Discussion. 99

References. 102

5  Knowledge-based methods for WSD. 107

5.1 Introduction. 107

5.2 Lesk algorithm. 108

5.2.1 Variations of the Lesk algorithm. 110

Simulated annealing. 110

Simplified Lesk algorithm. 111

Augmented semantic spaces. 113

Summary. 113

5.3 Semantic similarity. 114

5.3.1 Measures of semantic similarity. 114

5.3.2 Using semantic similarity within a local context. 117

5.3.3 Using semantic similarity within a global context. 118

5.4 Selectional preferences. 119

5.4.1 Preliminaries: Learning word-to-word relations. 120

5.4.2 Learning selectional preferences. 120

5.4.3 Using selectional preferences. 122

5.5 Heuristics for word sense disambiguation. 123

5.5.1 Most frequent sense. 123

5.5.2 One sense per discourse. 124

5.5.3 One sense per collocation. 124

5.6 Knowledge-based methods at Senseval-2. 125

5.7 Conclusions. 126

References. 127

6  Unsupervised corpus-based methods for WSD. 133

6.1 Introduction. 133

6.1.1 Scope. 134

6.1.2 Motivation. 136

Distributional methods. 137

Translational equivalence. 139

6.1.3 Approaches. 140

6.2 Type-based discrimination. 141

6.2.1 Representation of context. 142

6.2.2 Algorithms. 145

Latent Semantic Analysis (LSA). 146

Hyperspace Analogue to Language (HAL). 147

Clustering By Committee (CBC). 148

6.2.3 Discussion. 150

6.3 Token-based discrimination. 150

6.3.1 Representation of context. 151

6.3.2 Algorithms. 151

Context group discrimination. 152

McQuitty's similarity analysis. 154

6.3.3 Discussion. 157

6.4 Translational equivalence. 158

6.4.1 Representation of context. 159

6.4.2 Algorithms. 159

6.4.3 Discussion. 160

6.5 Conclusions and the way forward. 161

Acknowledgements. 162

References. 162

7  Supervised corpus-based methods for WSD. 167

7.1 Introduction to supervised WSD. 167

7.1.1 Machine learning for classification. 168

An example on WSD. 170

7.2 A survey of supervised WSD. 171

7.2.1 Main corpora used. 172

7.2.2 Main sense repositories. 173

7.2.3 Representation of examples by means of features. 174

7.2.4 Main approaches to supervised WSD. 175

Probabilistic methods. 175

Methods based on the similarity of the examples. 176

Methods based on discriminating rules. 177

Methods based on rule combination. 179

Linear classifiers and kernel-based approaches. 179

Discourse properties: The Yarowsky bootstrapping algorithm. 181

7.2.5 Supervised systems in the Senseval evaluations. 183

7.3 An empirical study of supervised algorithms for WSD. 184

7.3.1 Five learning algorithms under study. 185

Naive Bayes (NB). 185

Exemplar-based learning (kNN). 186

Decision lists (DL). 187

AdaBoost (AB). 187

Support Vector Machines (SVM). 189

7.3.2 Empirical evaluation on the DSO corpus. 190

Experiments. 191

7.4 Current challenges of the supervised approach. 195

7.4.1 Right-sized training sets. 195

7.4.2 Porting across corpora. 196

7.4.3 The knowledge acquisition bottleneck. 197

Automatic acquisition of training examples. 198

Active learning. 199

Combining training examples from different words. 199

Parallel corpora. 200

7.4.4 Bootstrapping. 201

7.4.5 Feature selection and parameter optimization. 202

7.4.6 Combination of algorithms and knowledge sources. 203

7.5 Conclusions and future trends. 205

Acknowledgements. 206

References. 207

8  Knowledge sources for WSD. 217

8.1 Introduction. 217

8.2 Knowledge sources relevant to WSD. 218

8.2.1 Syntactic. 219

Part of speech (KS 1). 219

Morphology (KS 2). 219

Collocations (KS 3). 220

Subcategorization (KS 4). 220

8.2.2 Semantic. 220

Frequency of senses (KS 5). 220

Semantic word associations (KS 6). 221

Selectional preferences (KS 7). 221

Semantic roles (KS 8). 222

8.2.3 Pragmatic/Topical. 222

Domain (KS 9). 222

Topical word association (KS 10). 222

Pragmatics (KS 11). 223

8.3 Features and lexical resources. 223

8.3.1 Target-word specific features. 224

8.3.2 Local features. 225

8.3.3 Global features. 227

8.4 Identifying knowledge sources in actual systems. 228

8.4.1 Senseval-2 systems. 229

8.4.2 Senseval-3 systems. 231

8.5 Comparison of experimental results. 231

8.5.1 Senseval results. 232

8.5.2 Yarowsky and Florian (2002). 233

8.5.3 Lee and Ng (2002). 234

8.5.4 Martínez et al. (2002). 237

8.5.5 Agirre and Martínez (2001a). 238

8.5.6 Stevenson and Wilks (2001). 240

8.6 Discussion. 242

8.7 Conclusions. 245

Acknowledgments. 246

References. 247

9  Automatic acquisition of lexical information and examples. 253

9.1 Introduction. 253

9.2 Mining topical knowledge about word senses. 254

9.2.1 Topic signatures. 255

9.2.2 Association of Web directories to word senses. 257

9.3 Automatic acquisition of sense-tagged corpora. 258

9.3.1 Acquisition by direct Web searching. 258

9.3.2 Bootstrapping from seed examples. 261

9.3.3 Acquisition via Web directories. 263

9.3.4 Acquisition via cross-language evidence. 264

9.3.5 Web-based cooperative annotation. 268

9.4 Discussion. 269

Acknowledgements. 271

References. 272

10  Domain-specific WSD. 275

10.1 Introduction. 275

10.2 Approaches to domain-specific WSD. 277

10.2.1 Subject codes. 277

10.2.2 Topic signatures and topic variation. 282

Topic signatures. 282

Topic variation. 283

10.2.3 Domain tuning. 284

Top-down domain tuning. 285

Bottom-up domain tuning. 285

10.3 Domain-specific disambiguation in applications. 288

10.3.1 User-modeling for recommender systems. 288

10.3.2 Cross-lingual information retrieval. 289

10.3.3 The MEANING project. 292

10.4 Conclusions. 295

References. 296

11  WSD in NLP applications. 299

11.1 Introduction. 299

11.2 Why WSD? 300

Argument from faith. 300

Argument by analogy. 301

Argument from specific applications. 302

11.3 Traditional WSD in applications. 303

11.3.1 WSD in traditional information retrieval. 304

11.3.2 WSD in applications related to information retrieval. 307

Cross-language IR. 308

Question answering. 309

Document classification. 312

11.3.3 WSD in traditional machine translation. 313

11.3.4 Sense ambiguity in statistical machine translation. 315

11.3.5 Other emerging applications. 317

11.4 Alternative conceptions of word sense. 320

11.4.1 Richer linguistic representations. 320

11.4.2 Patterns of usage. 321

11.4.3 Cross-language relationships. 323

11.5 Conclusions. 325

Acknowledgments. 325

References. 326

Copyright © 2006 Springer. All rights reserved.