Corpus annotation as a scientific task

Scott, Donia; Barone, Rossano; Koeling, Rob

presentation

posted on 2023-06-08, 11:44 authored by Donia Scott, Rossano Barone, Rob Koeling

Annotation studies in computational linguistics (CL) are generally unscientific: they are mostly not reproducible, make use of too few (and often non-independent) annotators, and use guidelines that are often something of a moving target. Additionally, the notion of ‘expert annotators’ invariably means only that the annotators have linguistic training. While this can be acceptable in some special contexts, it is often far from ideal. This is particularly the case when subtle judgements are required or when, as is increasingly common, one is making use of corpora originating from technical texts that have been produced by, and are intended to be consumed by, technical experts in the field. We outline a more rigorous approach to collecting human annotations, using as our example a study designed to capture judgements on the meaning of hedge words in medical records.

History

Publication status

Published

File Version

Published version

Publisher URL

http://www.lrec-conf.org/proceedings/lrec2012/index.html

Presentation Type

paper

Event name

The Eighth International Conference on Language Resources and Evaluation (LREC'2012)

Event location

Istanbul, Turkey

Event type

conference

Event date

23-27th May, 2012

Department affiliated with

Informatics Publications

Full text available

Yes

Peer reviewed?

Yes

Legacy Posted Date

2012-06-01

First Open Access (FOA) Date

2012-06-01

First Compliant Deposit (FCD) Date

2012-05-31

Keywords

corpus annotation; hedges; electronic patient records

Licence

CC0
