
Parsing of Open Domain Text with GF


 Krasimir Angelov


So far, GF has been used only for parsing small controlled languages, but improvements in parsing performance over the last few years have made it realistic to consider parsing unrestricted, open-domain text. The resource libraries already contain wide-coverage grammars for many languages, but having a grammar is only part of the solution. However much we improve our grammars, it will always be possible to find syntactic constructions that they do not cover. Another problem is that adding more syntactic constructions to a grammar usually makes it more ambiguous. The solution is to build a parser that is robust and can rank the analyses statistically when the grammar is ambiguous. The first obstacle is that statistical parsers require training data (i.e. a treebank) in order to estimate the probabilities of the different events.
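To make the role of the treebank concrete, here is a minimal sketch of one simple way such probabilities could be estimated and used for ranking. It is an illustration only, not the model used in the talk: it assumes a relative-frequency estimate of each abstract function given its result category, and the tree encoding, the function names and the `floor` value for unseen functions are all hypothetical.

```python
import math
from collections import Counter

# Trees are nested tuples: (function_name, child, child, ...).
# This encoding and all function/category names are assumptions
# made for illustration; they are not taken from the talk.

def functions(tree):
    """Yield every function name used in a tree."""
    yield tree[0]
    for child in tree[1:]:
        yield from functions(child)

def estimate(treebank, category_of):
    """Relative-frequency estimate of P(function | result category)."""
    fun_counts = Counter(f for tree in treebank for f in functions(tree))
    cat_counts = Counter()
    for f, n in fun_counts.items():
        cat_counts[category_of[f]] += n
    return {f: n / cat_counts[category_of[f]] for f, n in fun_counts.items()}

def log_prob(tree, probs, floor=1e-6):
    """Log-probability of a tree; unseen functions get a small floor value."""
    return sum(math.log(probs.get(f, floor)) for f in functions(tree))

def rank(candidates, probs):
    """Order candidate analyses from most to least probable."""
    return sorted(candidates, key=lambda t: log_prob(t, probs), reverse=True)
```

Given a treebank of abstract trees, `estimate` produces the probabilities and `rank` orders the competing analyses of an ambiguous sentence, which is why the treebank has to exist before any ranking can be done.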

In this talk I will present the current state of the conversion of the Penn Treebank to GF abstract syntax trees compatible with the English Resource Grammar. When a syntactic construction is unknown, we simply leave a placeholder in the abstract tree. Currently, we have matched 86% of the constructions with the grammar.

 Date and Time