Chapter 6: Unsupervised Corpus-Based Methods for WSD

Ted Pedersen

Abstract

This chapter focuses on unsupervised corpus-based methods of word sense discrimination that are knowledge-lean: they do not rely on external knowledge sources such as machine-readable dictionaries, concept hierarchies, or sense-tagged text. These methods do not assign sense tags to words; rather, they discriminate among word meanings based on information found in unannotated corpora. The chapter reviews distributional approaches that rely on monolingual corpora, as well as methods based on translational equivalence as found in word-aligned parallel corpora. These techniques are organized into type-based and token-based approaches: the former identify sets of related words, while the latter distinguish among the senses of a word used in multiple contexts.

Links

Latent Semantic Analysis (LSA)
Clustering By Committee (CBC)
SenseClusters Perl package

Contents

6.1 Introduction

6.1.1 Scope

6.1.2 Motivation

Distributional methods

Translational equivalence

6.1.3 Approaches

6.2 Type-based discrimination

6.2.1 Representation of context

6.2.2 Algorithms

Latent Semantic Analysis (LSA)

Hyperspace Analogue to Language (HAL)

Clustering By Committee (CBC)

6.2.3 Discussion

6.3 Token-based discrimination

6.3.1 Representation of context

6.3.2 Algorithms

Context group discrimination

McQuitty's similarity analysis

6.3.3 Discussion

6.4 Translational equivalence

6.4.1 Representation of context

6.4.2 Algorithms

6.4.3 Discussion

6.5 Conclusions and the way forward

Acknowledgements

References
