Introduction to SMT and its standard tools


 Cristina España



This tutorial is intended to provide an introduction to Statistical
Machine Translation (SMT). The statistical paradigm is one of the
predominant approaches in machine translation, possibly because a basic
system is simple to build with free software, a large community stands
behind it and, of course, it achieves good results.

The main objective of the session is to get to know the fundamentals
behind the three modules of a statistical system: the language model,
the translation model and the decoding, or search for the best
translation. Although the presentation is theoretical, it focuses on
understanding how software such as SRILM and Moses works and the logic
behind it, so that the available extensions and modifications are easy
to understand.
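
The interaction of the three modules can be illustrated with a minimal toy sketch. It assumes the noisy-channel formulation, in which the decoder searches for the target sentence e that maximizes P(e) · P(f | e), together with a bigram language model, a word-by-word translation table with a monotone one-to-one alignment, and exhaustive search over all candidates. All probabilities and the names `BIGRAM_LM`, `TRANSLATION_TABLE` and `decode` are hypothetical and unrelated to the actual SRILM or Moses interfaces.

```python
import math
from itertools import product

# Hypothetical toy bigram language model: P(word | previous word).
BIGRAM_LM = {
    ("<s>", "the"): 0.6, ("<s>", "a"): 0.4,
    ("the", "house"): 0.5, ("the", "home"): 0.2,
    ("a", "house"): 0.3, ("a", "home"): 0.1,
    ("house", "</s>"): 0.9, ("home", "</s>"): 0.8,
}

# Hypothetical toy translation table: for each source word f,
# the probability P(f | e) for each candidate target word e.
TRANSLATION_TABLE = {
    "la": {"the": 0.7, "a": 0.4},
    "casa": {"house": 0.6, "home": 0.4},
}


def lm_logprob(words, floor=1e-6):
    """Language model: log P(e) under the bigram model.
    Unseen bigrams get a crude floor value (not a real smoothing method)."""
    tokens = ["<s>"] + words + ["</s>"]
    return sum(math.log(BIGRAM_LM.get(pair, floor))
               for pair in zip(tokens, tokens[1:]))


def tm_logprob(source, target):
    """Translation model: log P(f | e), assuming a monotone word-by-word alignment."""
    return sum(math.log(TRANSLATION_TABLE[f][e]) for f, e in zip(source, target))


def decode(source):
    """Decoding: exhaustive search for argmax_e [log P(e) + log P(f | e)]."""
    options = [list(TRANSLATION_TABLE[f]) for f in source]
    best, best_score = None, -math.inf
    for candidate in product(*options):
        target = list(candidate)
        score = lm_logprob(target) + tm_logprob(source, target)
        if score > best_score:
            best, best_score = target, score
    return best, best_score


print(decode(["la", "casa"]))  # best candidate: ['the', 'house']
```

Exhaustive search grows exponentially with sentence length, which is why real decoders such as Moses work with phrases instead of single words and explore the search space with beam search.
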
We also devote a short time to how these systems, and machine
translation systems in general, are evaluated automatically. Machine
translation evaluation is a delicate topic. Here we put the evaluation
into context, describe the standard metrics in detail and give an
overview of the existing alternatives.
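
BLEU is one widely used standard metric. The sketch below computes a simplified sentence-level version: clipped n-gram precisions for n = 1..4 are combined by a geometric mean and multiplied by a brevity penalty. It assumes a single reference, whitespace tokenization and no smoothing, so it is an illustration rather than a replacement for established implementations; the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Simplified single-reference BLEU without smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        ref_counts = ngrams(reference, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    hyp_len, ref_len = len(hypothesis), len(reference)
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(sum(log_precisions) / max_n)


hyp = "the cat sat on the mat".split()
ref = "the cat sat on a mat".split()
print(sentence_bleu(hyp, ref))  # geometric mean of 5/6, 3/5, 2/4, 1/3
```

At corpus level, BLEU accumulates the clipped counts and lengths over the whole test set before computing the precisions and the brevity penalty, rather than averaging sentence scores.
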

Finally, the second part introduces the standard software and, if time
allows, builds a toy SMT system; otherwise, the main steps for building
one are given.


Part I: SMT background
1 Introduction
2 Basics
3 Components: language model, translation model and the decoding
4 The log-linear model
5 Beyond standard SMT
6 MT Evaluation

Part II: SMT experiments
7 Translation system
8 Evaluation system

 Date and Time