Word Sense Disambiguation
Edited by Eneko Agirre and Philip Edmonds

Chapter 7: Supervised Corpus-Based Methods for WSD

Lluís Màrquez, Gerard Escudero, David Martínez, German Rigau


This chapter presents the supervised approach to word sense disambiguation (WSD), which consists of automatically inducing classification models or rules from annotated examples. We start by introducing the machine learning framework for classification and some important related concepts. We then review the main approaches in the literature, focusing on the following issues: learning paradigms, corpora used, sense repositories, and feature representation. We also describe in more detail five statistical and machine learning algorithms, which are experimentally evaluated and compared on the DSO corpus. The final part of the chapter briefly discusses the current challenges of the supervised learning approach to WSD.




7.1 Introduction to supervised WSD

7.1.1 Machine learning for classification

An example on WSD

7.2 A survey of supervised WSD

7.2.1 Main corpora used

7.2.2 Main sense repositories

7.2.3 Representation of examples by means of features

7.2.4 Main approaches to supervised WSD

Probabilistic methods

Methods based on the similarity of the examples

Methods based on discriminating rules

Methods based on rule combination

Linear classifiers and kernel-based approaches

Discourse properties: The Yarowsky bootstrapping algorithm

7.2.5 Supervised systems in the Senseval evaluations

7.3 An empirical study of supervised algorithms for WSD

7.3.1 Five learning algorithms under study

Naive Bayes (NB)

Exemplar-based learning (kNN)

Decision lists (DL)

AdaBoost (AB)

Support Vector Machines (SVM)

7.3.2 Empirical evaluation on the DSO corpus

Experiments

7.4 Current challenges of the supervised approach

7.4.1 Right-sized training sets

7.4.2 Porting across corpora

7.4.3 The knowledge acquisition bottleneck

Automatic acquisition of training examples

Active learning

Combining training examples from different words

Parallel corpora

7.4.4 Bootstrapping

7.4.5 Feature selection and parameter optimization

7.4.6 Combination of algorithms and knowledge sources

7.5 Conclusions and future trends

Acknowledgements

References

Copyright © 2006 Springer. All rights reserved.