University of Sussex

Improved learning for hidden Markov models using penalized training

Presentation
Posted on 2023-06-07, 21:41, authored by Bill Keller and Rudi Lutz
In this paper we investigate the performance of penalized variants of the forward-backward algorithm for training hidden Markov models. Maximum likelihood estimation of model parameters can result in over-fitting and poor generalization. We discuss the use of priors to compute maximum a posteriori (MAP) estimates and describe a number of experiments in which models are trained under different conditions. Our results show that MAP estimation can alleviate over-fitting and help learn better parameter estimates.
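For background, the MAP M-step for a multinomial emission distribution under a Dirichlet prior has a standard closed form. The sketch below uses it for the emission probabilities b_j(k), with E[n_j(k)] denoting the expected count (from the forward-backward E-step) of state j emitting symbol k, and alpha_k >= 1 the Dirichlet parameters; this is the textbook form, not necessarily the paper's exact penalized update:

    \hat{b}_j(k) \;=\; \frac{\mathbb{E}[n_j(k)] + \alpha_k - 1}{\sum_{k'} \bigl( \mathbb{E}[n_j(k')] + \alpha_{k'} - 1 \bigr)}

Setting every alpha_k = 1 recovers the ordinary maximum likelihood re-estimate, so the prior's effect shows up as pseudocounts added to the expected counts before renormalization.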

History

Publication status

  • Published

ISSN

0302-9743

Publisher

Springer-Verlag

Volume

2464

Pages

8

Presentation Type

  • paper

Event name

AICS 02: Proceedings of the 13th Irish International Conference on Artificial Intelligence and Cognitive Science

Event location

Limerick, Ireland

Event type

conference

ISBN

3-540-44184-0

Department affiliated with

  • Informatics Publications

Notes

Originality: This was the first application within NLP of penalised training of hidden Markov models using Dirichlet priors over the emission probabilities of the model.

Rigour: The paper derived the necessary EM update rule incorporating the Dirichlet prior, and described empirical results comparing learning with this prior against several other priors recommended in the literature. The data consisted of the first 5000 POS-tagged sentences from the BNC corpus, split into training and test sets. All results were obtained using 10-fold cross-validation and were shown to be statistically significant.

Significance: The paper showed that the use of Dirichlet priors (with the Dirichlet distribution parameters set proportional to the normalised frequencies of the observation symbols in the training data) consistently enabled the learning of better-performing models. This result was robust across model sizes and variations in initial conditions. Additionally, the results cast doubt on claims by Brand that minimum entropy priors give good results, suggesting the need for further work in this area. Since this paper was written, the use of Dirichlet priors (and more recently Dirichlet process priors) has become widespread.

Outlet: This was a fully refereed (3 referees) international conference.
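To illustrate the approach described above, the following sketch shows how such a penalised M-step might look, assuming Dirichlet parameters of the form alpha_k = 1 + lambda * f_k, where f_k is the normalised training frequency of symbol k and lambda is a hypothetical prior-strength parameter (the paper's exact parameterisation is not reproduced here):

    import numpy as np

    def map_emission_update(expected_counts, symbol_freqs, strength=1.0):
        """Penalised (MAP) M-step for HMM emission probabilities.

        expected_counts : (n_states, n_symbols) expected emission counts
                          from the forward-backward E-step.
        symbol_freqs    : (n_symbols,) normalised frequencies of the
                          observation symbols in the training data.
        strength        : hypothetical prior-strength parameter (lambda).
        """
        # Dirichlet parameters alpha_k = 1 + strength * f_k, so the MAP
        # estimate adds the non-negative pseudocount strength * f_k to
        # each expected count before renormalising.
        pseudocounts = strength * symbol_freqs
        smoothed = expected_counts + pseudocounts  # broadcasts over states
        return smoothed / smoothed.sum(axis=1, keepdims=True)

With strength=0 this reduces to the ordinary maximum likelihood re-estimate, which makes the effect of the prior easy to isolate experimentally.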

Full text available

  • No

Peer reviewed?

  • Yes

Editors

R. F. E. Sutcliffe, M. O'Neill, M. Eaton, C. Ryan, N. J. L. Griffith

Legacy Posted Date

2012-02-06
