Automatic identification of infrequent word senses

McCarthy, Diana; Koeling, Rob; Weeds, Julie; Carroll, John

File(s) not publicly available

presentation

posted on 2023-06-07, 21:07 authored by Diana McCarthy, Rob Koeling, Julie Weeds, John Carroll

In this paper we show that an unsupervised method for ranking word senses automatically can be used to identify infrequently occurring senses. We demonstrate this using a ranking of noun senses derived from the BNC, evaluated on the sense-tagged text available in both SemCor and the SENSEVAL-2 English all-words task. We show that the method does well at identifying senses that do not occur in a corpus, and that those that are erroneously filtered but do occur typically have a lower frequency than the other senses. This method should be useful for word sense disambiguation systems, allowing effort to be concentrated on more frequent senses; it may also be useful for other tasks such as lexical acquisition. Whilst the results on balanced corpora are promising, our chief motivation for the method is its application to domain-specific text. For text within a particular domain, many senses from a generic inventory will be rare, and possibly redundant. Since a large domain-specific corpus of sense-annotated data is not available, we evaluate our method on domain-specific corpora and demonstrate that sense types identified for removal are predominantly senses from outside the domain.
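The filtering and evaluation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the ranking scores, the relative-share threshold, the sense labels, and the function names below are all hypothetical and chosen only for illustration. The sketch assumes each sense of a word already has a ranking score, removes senses whose share of the total score falls below a threshold, and then checks the removed senses against counts from sense-tagged text.

```python
from typing import Dict, Set, Tuple


def filter_infrequent_senses(
    ranking: Dict[str, float], threshold: float
) -> Tuple[Set[str], Set[str]]:
    """Split senses into (kept, removed) by their share of the total score.

    The relative-share threshold is an assumption made for this sketch.
    """
    total = sum(ranking.values())
    kept: Set[str] = set()
    removed: Set[str] = set()
    for sense, score in ranking.items():
        share = score / total if total > 0 else 0.0
        if share >= threshold:
            kept.add(sense)
        else:
            removed.add(sense)
    return kept, removed


def evaluate_filter(
    removed: Set[str], gold_counts: Dict[str, int]
) -> Tuple[Set[str], Set[str]]:
    """Split removed senses into (unattested, attested) using gold counts.

    Unattested senses never occur in the sense-tagged text, so removing
    them costs nothing; attested ones were filtered erroneously.
    """
    unattested = {s for s in removed if gold_counts.get(s, 0) == 0}
    return unattested, removed - unattested


# Hypothetical ranking scores and gold sense counts for one noun.
ranking = {"pipe#1": 0.55, "pipe#2": 0.35, "pipe#3": 0.07, "pipe#4": 0.03}
gold_counts = {"pipe#1": 12, "pipe#2": 5, "pipe#3": 1}

kept, removed = filter_infrequent_senses(ranking, threshold=0.10)
unattested, attested = evaluate_filter(removed, gold_counts)
print(sorted(kept), sorted(unattested), sorted(attested))
```

In this toy case the one erroneously removed sense (pipe#3) has a lower gold frequency than either kept sense, which is the pattern the abstract reports for filtered senses that do occur.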

History

Publication status

Published

External DOI

https://doi.org/10.3115/1220355.1220532

Page range

1220-1226

Presentation Type

paper

Event name

20th International Conference on Computational Linguistics (COLING)

Event location

Geneva, Switzerland

Event type

conference

ISBN

1-932432-49-3

Department affiliated with

Informatics Publications

Full text available

No

Peer reviewed?

Yes

Legacy Posted Date

2012-02-06

Licence

Copyright not evaluated
