Lutz, Rudi (1998) Learning Stochastic Context-Free Grammars from Corpora Using a Genetic Algorithm. In: Unset, Smith G Steele NC and Albrecht RF (Eds) Artificial Neural Nets and Genetic Algorithms. Proceedings of the 3rd International Conference on Artificial Neural Networks and Genetic Algorithms (ICANNGA-97) Norwich UK..
Full text not available from this repository.Abstract
A genetic algorithm for learning stochastic context-free grammars from finite language samples as described. Solutions to the inference problem are evolved by optimising the parameters of a covering grammar for a given language sample. We describe a number of experiments in learning grammars for a range of formal languages. The results of these experiments are encouraging and compare favourably with other approaches to stochastic grammatical inference.
Item Type: | Conference or Workshop Item (Other) |
---|---|
Additional Information: | Originality: This paper (and its earlier conference version) was one of the first papers to apply a genetic algorithm to the problem of inferring stochastic grammar from a corpus. It also used a novel prior over stochastic context free grammars in order to bias the search towrads simpler grammars. It is also the first to compare this technique to use of the Inside-Outside algorithm. It also made use of an entirely novel crossover operator within the GA, which resulted in faster discovery of good grammars. Rigour: the fitness function for the GA is developed within a Bayesian approach to grammar induction. Results for all language learning tasks were given as averages over 20 runs. Languages learnt were the standard ones for the area, plus 3-symbol palindromes which had previously been considered too hard. Significance: We have demonstrated that a genetic algorithm, making use of a prior over grammars can successfully learn SCFGs from corpora. Good results are obtained on a range of formal languages using a prior that is designed to prefer simpler grammars. Grammars found by the GA are both small and accurate in comparison to those produced by the Inside-Outside algorithm. Impact: This paper, and its earlier conference version, have been cited a total of 20 times (Google Scholar) Outlet: A leading journal in its area. Impact factor 1.822 |
Schools and Departments: | School of Engineering and Informatics > Informatics |
Depositing User: | Rudi Lutz |
Date Deposited: | 06 Feb 2012 18:27 |
Last Modified: | 31 May 2012 09:47 |
URI: | http://srodev.sussex.ac.uk/id/eprint/16429 |