Identifying patterns for unsupervised grammar induction

Title Identifying patterns for unsupervised grammar induction
Author(s) Santamaría, Jesús
Araujo, Lourdes
Subject(s) Computer Science
Abstract This paper describes a new method for unsupervised grammar induction based on the automatic extraction of certain patterns in the texts. Our starting hypothesis is that there exist some classes of words that function as separators, marking the beginning or the end of new constituents. Among these separators we distinguish those which trigger new levels in the parse tree. If we are able to detect these separators, we can follow a very simple procedure to identify the constituents of a sentence by taking the classes of words between separators. This paper is devoted to describing the process that we have followed to automatically identify the set of separators from a corpus annotated only with Part-of-Speech (POS) tags. The proposed approach has allowed us to improve on the results of previous proposals when parsing sentences from the Wall Street Journal corpus.
Date 2010-07-15
Source Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pages 38–45, Uppsala, Sweden, 15-16 July 2010
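
The abstract's "very simple procedure" (taking the classes of words between separators) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `split_at_separators` and the separator set `{"DT", "IN"}` are hypothetical, chosen only for illustration, and the sketch treats every separator as marking the beginning of a constituent. It produces flat chunks and does not model the separators that trigger new levels in the parse tree, nor the automatic identification of separators that the paper is actually about.

```python
from typing import List, Set


def split_at_separators(tags: List[str], separators: Set[str]) -> List[List[str]]:
    """Group a POS-tag sequence into flat chunks.

    A new chunk is started whenever a separator tag is encountered,
    so each separator is the first tag of its chunk (an assumption of
    this sketch; the paper also allows separators that end a constituent).
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    for tag in tags:
        # Close the running chunk before starting a new one at a separator.
        if tag in separators and current:
            chunks.append(current)
            current = []
        current.append(tag)
    if current:
        chunks.append(current)
    return chunks


# Hypothetical separator set, not taken from the paper.
SEPARATORS = {"DT", "IN"}

example_tags = ["DT", "NN", "VBD", "DT", "JJ", "NN", "IN", "NNP"]
example_chunks = split_at_separators(example_tags, SEPARATORS)
# example_chunks == [["DT", "NN", "VBD"], ["DT", "JJ", "NN"], ["IN", "NNP"]]
```

In this sketch the separator set is supplied by hand; in the method described above it would instead be learned from a POS-tagged corpus.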
