1 Department of Management - Nobelparken, Aarhus BSS, Aarhus University
2 Dublin City University
3 School of Communication and Culture - English Business Communication, School of Communication and Culture, Arts, Aarhus University
4 School of Communication and Culture - English Business Communication, School of Communication and Culture, Arts, Aarhus University
Denoual (2005) discovered that, contrary to popular belief, an Example-Based Machine Translation system trained on heterogeneous data produced significantly better results than a system trained on homogeneous data. In this paper, using similar evaluation metrics and a few additional ones, we show that this finding does not hold for the automated translation of subtitles. In fact, our system, when trained on homogeneous data, shows a relative increase in BLEU of 74% for German-English and 86% for English-German. Furthermore, we show that increasing the amount of heterogeneous data results in ‘bad examples’ being put forward as translation candidates, thus lowering translation quality.