Adding Linguistic Information to Statistical Machine Translation:
Information Packaging for SMT

Early phase of PhD project, 2009. Supervisors: Mark Dras, Robert Dale

Project Background

Machine translation (MT) is the task of automatically translating a written text from one human language to another. In statistical machine translation (SMT), this is accomplished by developing a probabilistic model of the translation process. Intuitively, linguistic information about the sentence should aid translation, but so far the addition of such information to the statistical model has not consistently proven useful. This project investigates how such information can be usefully incorporated into the system.

Information Packaging for SMT

Information packaging refers to the speaker's choice between several possible realisations of the same information. In other words, it is the selection of the order and manner in which information is presented in a sentence. I hypothesised that speakers use information packaging to place emphasis on a particular part of the sentence. To test whether this matters for translation, I conducted a small survey investigating whether native speakers prefer a translation in which the emphasis falls on the same element as in the source sentence over a translation that places the emphasis elsewhere.

Survey: Effect of Information Packaging on Perceived Translation Quality

The survey questions are based on sentences from the freely available German–English Europarl training data from the 2009 Workshop on Statistical Machine Translation (WMT'09).

No formal publications have arisen from this survey.

Project Outcome

After completing my analysis of the survey and exploring the theoretical linguistics work on information packaging, I concluded that it would be difficult to implement an adequate information packaging analysis for SMT within the time frame of the project. My PhD project therefore shifted direction to consider confidence in syntactic information. No further work is planned on information packaging for SMT.