Corpus annotation as a scientific task

Scott, Donia, Barone, Rossano and Koeling, Rob (2012) Corpus annotation as a scientific task. In: The Eighth International Conference on Language Resources and Evaluation (LREC'2012), 23-27th May, 2012, Istanbul, Turkey.

PDF (Corpus annotation applied to linguistic hedges in medical records.) - Published Version
Available under License Creative Commons Public Domain Dedication.


Annotation studies in computational linguistics (CL) are generally unscientific: most are not reproducible, use too few (and often non-independent) annotators, and rely on guidelines that are often something of a moving target. Additionally, the notion of ‘expert annotators’ invariably means only that the annotators have linguistic training. While this can be acceptable in some special contexts, it is often far from ideal. This is particularly the case when subtle judgements are required or when, as is increasingly common, the corpora come from technical texts written by and for experts in the field. We outline a more rigorous approach to collecting human annotations, using as our example a study designed to capture judgements on the meaning of hedge words in medical records.

Item Type: Conference or Workshop Item (Paper)
Keywords: corpus annotation, hedges, electronic patient records
Schools and Departments: School of Engineering and Informatics > Informatics
Subjects: P Language and Literature > P Philology. Linguistics > P0098 Computational linguistics. Natural language processing
Depositing User: Donia Scott
Date Deposited: 01 Jun 2012 08:53
Last Modified: 15 Aug 2012 15:54
