Deciphering clinical text: concept recognition in primary care text notes

Savkov, Aleksandar Dimitrov

thesis

Posted on 2023-06-09, 06:24; authored by Aleksandar Dimitrov Savkov

Electronic patient records, containing data about the health and care of a patient, are a valuable source of information for longitudinal clinical studies. The General Practice Research Database (GPRD) has collected patient records from UK primary care practices since the late 1980s. These records contain both structured data (in the form of codes and numeric values) and free text notes. While the structured data have been used extensively in clinical studies, there are significant practical obstacles in extracting information from the free text notes. The main obstacles are data access restrictions, due to the presence of sensitive information, and the specific language of medical practitioners, which renders standard language processing tools ineffective. The aim of this research is to investigate approaches for computer analysis of free text notes. The research involved designing a primary care text corpus (the Harvey Corpus) annotated with syntactic chunks and clinically-relevant semantic entities, developing a statistical chunking model, and devising a novel method for applying machine learning for entity recognition based on chunk annotation. The tools produced would facilitate reliable information extraction from primary care patient records, needed for the development of clinically-related research. The three medical concept types targeted in this thesis could contribute to epidemiological studies by enhancing the detection of co-morbidities, and better analysing the descriptions of patient experiences and treatments. The main contributions of the research reported in this thesis are: guidelines for chunk and concept annotation of clinical text, an approach to maximising agreement between human annotators, the Harvey Corpus, a method for using a standard part-of-speech tagging model in clinical text chunking, and a novel approach to recognising clinically relevant medical concepts.

History

File Version

Published version

Pages

232

Department affiliated with

Informatics Theses

Qualification level

doctoral

Qualification name

PhD

Language

English

Institution

University of Sussex

Full text available

Yes

Legacy Posted Date

2017-05-25
