State of the Art: Machine Translation

In a world filled with diversity, language has always been a considerable barrier to expressing and communicating thoughts and ideas. As humans, we sometimes struggle to express accurately what we think or how we feel in a foreign language. The simple fact is that the thoughts we want to convey can take on a different meaning in another language, even when they are translated word for word. Today, tools like Google Translate can help us understand and communicate in different languages, but not without some imperfections.

Machine Translation (MT) is the process, typically involving Artificial Intelligence and Machine Learning (including Deep Learning), by which computer software translates text or speech from one human language to another. At its most basic level, MT mechanically substitutes words in one language for words in another, but that alone rarely produces a good translation: not all words have an equivalent in other languages, and many words have more than one meaning. In addition, two languages may have completely different structures, so the software needs to recognize whole phrases and find their closest counterparts in the target language.
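To make the limits of word-for-word substitution concrete, here is a minimal Python sketch. The three-entry English-to-French dictionary and the function name word_for_word are invented for illustration; they are not taken from any real MT system.

```python
# Toy English-to-French dictionary (illustrative entries only).
WORD_DICT = {
    "i": "je",
    "miss": "manque",
    "you": "tu",
}

def word_for_word(sentence, dictionary):
    """Substitute each word independently, keeping the source word order."""
    translated = []
    for word in sentence.lower().split():
        # Words missing from the dictionary pass through unchanged.
        translated.append(dictionary.get(word, word))
    return " ".join(translated)

literal = word_for_word("I miss you", WORD_DICT)
print(literal)  # je manque tu
```

Every word was replaced by a valid dictionary entry, yet the result is not correct French. The natural translation, "tu me manques", reverses the roles of subject and object, which no amount of word-level lookup can produce.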

Although it is remarkable that a computer can understand what a human says and translate it into a different language with a fair amount of accuracy, translation itself can be really complicated. Words are not precise, in the sense that they sometimes have several meanings. For instance, the word "left" has three different interpretations in the following sentence: "There was no one left on his left after he left the door open". In addition, grammar often differs from one language to another. So when we ask machines to translate, the issue is that they may lack the common sense needed to do the job perfectly. Still, it is important to mention that Machine Translation has improved greatly over the past two decades, moving from Rule-Based MT to Statistical MT to Neural Machine Translation.

According to Systran, a pioneer and global leader in translation solutions, Rule-Based Machine Translation (RBMT) relies on countless built-in linguistic rules and millions of bilingual dictionaries for each language pair. The software uses complex rule sets to transfer the grammatical structure of the source language into the target language. Statistical Machine Translation (SMT) takes a different approach. Unlike RBMT, statistical systems have no knowledge of language rules; they learn to translate by analyzing large amounts of data for each language pair. They can be trained for specific domains or industries (e.g. legal, medical, IT) using additional data relevant to that sector. Compared with RBMT, SMT delivers more fluent-sounding but less consistent translations.
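The idea of learning translations from data alone can be sketched with a classic statistical word-translation model (IBM Model 1, trained by expectation-maximization). This is a textbook technique chosen here for illustration, not a description of how Systran or any particular product works, and the three-sentence German-English corpus is invented; real systems train on millions of sentence pairs.

```python
from collections import defaultdict

# Tiny invented parallel corpus: (source sentence, target sentence).
PARALLEL = [
    ("das haus", "the house"),
    ("das buch", "the book"),
    ("ein buch", "a book"),
]

def train_word_model(pairs, iterations=10):
    """Estimate t[(target, source)]: probability that `source` translates to `target`."""
    pairs = [(src.split(), tgt.split()) for src, tgt in pairs]
    target_vocab = {word for _, tgt in pairs for word in tgt}
    # Start with no knowledge: every target word is equally likely.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in pairs:
            for e in tgt:
                # Split the "credit" for e among the source words in this sentence.
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # Re-estimate the probabilities from the expected counts.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t

def best_translation(t, source_word):
    """Return the most probable target word for `source_word`."""
    candidates = {e: p for (e, f), p in t.items() if f == source_word}
    return max(candidates, key=candidates.get)

model = train_word_model(PARALLEL)
for word in ["das", "haus", "buch", "ein"]:
    print(word, "->", best_translation(model, word))
```

No dictionary or grammar rule is supplied. The model works out that "das" pairs with "the" and "buch" with "book" purely because those words keep appearing together across sentence pairs, which is also why such systems improve when given more data from a specific domain.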

The newest approach to MT, Neural Machine Translation (NMT), learns to translate through one large neural network, a system of simple interconnected processing units loosely modeled on the brain. NMT systems have shown better translation performance than the two earlier approaches for many language pairs. With DeepL and Google Translate as leading tools using NMT, most tech companies are moving towards that technology. A year ago, Facebook introduced an unsupervised MT approach that combines SMT and NMT and requires only large monolingual corpora (collections of linguistic data used for research) instead of bilingual ones.

Despite the progress already made, we have yet to see MT systems that can understand things like humor or sarcasm and convey them accurately in other languages. Scientists are researching more human-like MT systems that would be able to understand the thoughts behind words and translate them into different languages. Achieving this would be major progress in Machine Translation, especially from a business perspective, for those who depend on translations to conduct transactions.
