Chapter 9: Automatic Acquisition of Lexical Information and Examples

Julio Gonzalo, Felisa Verdejo

Abstract

One way to address the knowledge acquisition bottleneck in word sense disambiguation (WSD) is to mine very large corpora, most prominently the World Wide Web, to automatically acquire lexical information and examples that can feed supervised learning methods. Although this area of research remains largely unexplored, it has already shown strong potential to improve WSD performance. This chapter reviews the main approaches, initial accomplishments, and open challenges in this area.

Links

sensecorpus (sense-annotated web corpus)
Open Directory Project - dmoz
WordNet-ODP associations

Contents

9.1 Introduction

9.2 Mining topical knowledge about word senses

9.2.1 Topic signatures

9.2.2 Association of Web directories to word senses

9.3 Automatic acquisition of sense-tagged corpora

9.3.1 Acquisition by direct Web searching

9.3.2 Bootstrapping from seed examples

9.3.3 Acquisition via Web directories

9.3.4 Acquisition via cross-language evidence

9.3.5 Web-based cooperative annotation

9.4 Discussion

Acknowledgements

References
