The sentence in parentheses is Google Translate’s interpretation of the title in Serbian. Why the ridiculousness? Sure, the sentence is not a typical one: some linguistic liberties have been taken, some rules neglected, some malice exhibited. But the trap was set in the hope of catching some amusing mistakes (and checking whether the internet would implode), not of spawning this abominable construct of transcription and lies. I won’t be so petty as to point out that a translation tool failed to translate the word “translate” (the far less remarkable but just as absurd equivalent of a pen that can’t write the word “pen”). Instead I’ll focus on a slightly more important aspect of the issue: the machine didn’t say “I can’t do it”, it just did it, poorly. Why? Because it can never really be sure that it can do it properly. It just hopes it can, and that is one of the main problems with machine translation.

Google Translate uses a statistical approach, which basically means that it translates by searching for pairs of sentences in parallel texts and displaying the most common target-language equivalent of the source-language sentence. For languages that have large corpora, this method is superior to rule-based translation, which tries to decipher the deep structure of a sentence and then express it in another language, a task made impossible by the number of variables. It is easy enough to find the corresponding word in the target language, but what about gender, number, case, idioms, phrases, collocations, tense? Often enough these categories are determined by context, which is something that no amount of programming can enable a machine to understand. This is why most translation tools use a statistical approach, sometimes combined with a rule-based one.
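For the curious, the pair-lookup idea described above can be boiled down to a toy sketch. To be clear, this is not Google's code and nowhere near a real statistical system; the tiny corpus, the function names `build_table` and `translate`, and the word-by-word fallback are all assumptions made up for illustration. It does show the awkward part, though: when no pair exists, the program still returns something.

```python
from collections import Counter, defaultdict

# Hypothetical toy parallel corpus of (English, Serbian) pairs.
# A real system would be built from millions of aligned sentences.
PARALLEL_CORPUS = [
    ("good morning", "dobro jutro"),
    ("good morning", "dobro jutro"),
    ("good morning", "dobar dan"),
    ("thank you", "hvala"),
    ("good", "dobro"),
]


def build_table(pairs):
    """Count how often each target text was paired with each source text."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source.lower()][target] += 1
    return table


def translate(sentence, table):
    """Return the most common paired translation, or a word-by-word guess."""
    key = sentence.lower()
    if key in table:
        # Known sentence: pick the most frequently seen equivalent.
        return table[key].most_common(1)[0][0]
    # No sentence pair (assumed fallback): translate each word separately
    # and pass unknown words through untouched. It never says "I can't".
    guessed = []
    for word in key.split():
        if word in table:
            guessed.append(table[word].most_common(1)[0][0])
        else:
            guessed.append(word)
    return " ".join(guessed)


table = build_table(PARALLEL_CORPUS)
print(translate("Good morning", table))  # dobro jutro
print(translate("good night", table))    # dobro night
```

The second call is the interesting one: there is no pair for "good night", so the sketch stitches together a half-translated answer rather than admitting defeat.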

This English-to-Serbian translation of the title is what you get when there is no matching sentence pair to draw on. Serbian is a highly inflected language with a negligible number of parallel texts, and as such it is extremely resistant to machine translation. But since the program can never be sure that it’s right, it can also never be sure that it’s wrong; all it can do is try its best (да испробате своје најбоље). - dissing defenceless machines since 2011. 
