1 Department of Business Communication, Aarhus BSS, Aarhus University
2 Dublin City University
3 Department of Business Communication, Aarhus BSS, Aarhus University
Denoual (2005) discovered that, contrary to popular belief, an Example-Based Machine Translation system trained on heterogeneous data produced significantly better results than a system trained on homogeneous data. Using similar evaluation metrics and a few additional ones, we show in this paper that this does not hold for the automated translation of subtitles. In fact, when trained on homogeneous data, our system shows a relative increase in BLEU of 74% for German-English and 86% for English-German. Furthermore, we show that increasing the amount of heterogeneous data results in ‘bad examples’ being put forward as translation candidates, thus lowering translation quality.