November 15

Introduction to Machine Translation

What is Machine Translation?

As defined by Amazon, Machine Translation (MT) is the process of using artificial intelligence to automatically translate text from one language to another without human involvement.

Four models of Machine Translation

Statistical MT (SMT) – SMT is based on statistical models built by analyzing enormous volumes of bilingual content. From this data, the model learns patterns and relationships between words in the source language and words in the target language, and uses them to predict how similar text should be translated.

It was initially word-based but evolved into a phrase-based system to capture more of the context around each word.

It is good for basic translation of technical and scientific text. However, it considers only limited local context, which often results in low-quality translation output. A good example of SMT is Google Translate when it was first introduced.

The types of Statistical-based MT are:

  • Hierarchical phrase-based translation
  • Syntax-based translation
  • Phrase-based translation
  • Word-based translation
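
To make the word-based variant concrete, here is a minimal sketch of how a statistical model can learn word translations from bilingual text alone. It uses an expectation-maximization loop in the style of IBM Model 1, a classic word-based approach; the three-sentence German-English corpus and the function names are assumptions made for illustration, not part of any real system.

```python
from collections import defaultdict

def train_word_model(pairs, iterations=20):
    """Estimate t(target | source) with EM, in the style of IBM Model 1."""
    target_vocab = {word for _, target in pairs for word in target}
    # Start from a uniform guess: every target word is equally likely.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected (target, source) pairings
        total = defaultdict(float)  # expected pairings per source word
        for source, target in pairs:
            for e in target:
                # Split each target word's "credit" across the source words
                # in the same sentence, in proportion to current beliefs.
                norm = sum(t[(e, f)] for f in source)
                for f in source:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # Re-estimate the probabilities from the expected counts.
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return t

def best_translation(t, source_word):
    """Return the most probable target word for a source word seen in training."""
    candidates = {e: p for (e, f), p in t.items() if f == source_word}
    return max(candidates, key=candidates.get)

# Hypothetical toy corpus, already split into words.
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
model = train_word_model(corpus)
for word in ["das", "haus", "buch", "ein"]:
    print(word, "->", best_translation(model, word))
```

No dictionary is supplied here: the model works out that "haus" pairs with "house" only because "das" is better explained by "the" in the other sentences. Real SMT systems apply the same idea to millions of sentence pairs and, in the phrase-based variants, to multi-word units.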

Rule-based MT (RBMT) – RBMT translates text based on grammatical rules. It analyzes the grammatical structure of the source text and applies rules for sentence structure, word order, and phraseology in the source and target languages to create the translation output. Relying on dictionaries for the relevant information, it maps each source word to an appropriate translation in the target language.

It is mostly obsolete today, having been largely superseded by data-driven approaches.
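
The dictionary lookup and word-order rules that RBMT relies on can be sketched in a few lines. The example below is a toy illustration only: the five-word English-Spanish lexicon, the part-of-speech tags, and the single reordering rule are all assumptions, and a real RBMT system would need thousands of rules plus morphological analysis.

```python
# Hypothetical toy lexicon: English word -> (Spanish word, part of speech).
LEXICON = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "big": ("grande", "ADJ"),
    "car": ("coche", "NOUN"),
    "dog": ("perro", "NOUN"),
}

def translate_rule_based(sentence):
    """Translate word by word, then apply one word-order rule."""
    # Dictionary step: unknown words pass through unchanged with tag UNK.
    tagged = [LEXICON.get(w, (w, "UNK")) for w in sentence.lower().split()]
    output = []
    i = 0
    while i < len(tagged):
        word, tag = tagged[i]
        next_is_noun = i + 1 < len(tagged) and tagged[i + 1][1] == "NOUN"
        if tag == "ADJ" and next_is_noun:
            # Rule: English "ADJ NOUN" becomes Spanish "NOUN ADJ".
            output.extend([tagged[i + 1][0], word])
            i += 2
        else:
            output.append(word)
            i += 1
    return " ".join(output)

print(translate_rule_based("the red car"))
print(translate_rule_based("the big dog"))
```

Every behavior has to be written by hand: the sketch ignores gender agreement, irregular forms, and any word missing from the lexicon. That maintenance burden is a large part of why rule-based systems fell out of favor.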

Hybrid MT (HMT) – As the term implies, HMT is a mix of SMT and RBMT. It uses translation memory, which makes it more reliable in terms of quality, but its disadvantage is that it requires extensive editing and the involvement of human translators.

Known approaches to HMT:

  • Multi-engine
  • Statistical rule generation
  • Multi-pass
  • Confidence-based
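
As an illustration of the confidence-based approach, the sketch below runs one engine first and falls back to a second engine only when the first reports low confidence. Both engines are hypothetical stubs with canned outputs, and the threshold of 0.7 is an arbitrary assumption; real systems would derive the confidence score from the engine itself.

```python
def confidence_based_translate(text, primary, secondary, threshold=0.7):
    """Use the primary engine unless its confidence falls below the threshold."""
    translation, confidence = primary(text)
    if confidence >= threshold:
        return translation, "primary"
    fallback, _ = secondary(text)
    return fallback, "secondary"

# Hypothetical stub engines: each returns (translation, confidence in [0, 1]).
def statistical_engine(text):
    known = {"good morning": ("buenos días", 0.92)}
    # Unknown input comes back untranslated with zero confidence.
    return known.get(text.lower(), (text, 0.0))

def rule_based_engine(text):
    known = {"thank you": ("gracias", 0.5)}
    return known.get(text.lower(), (text, 0.0))

for phrase in ["good morning", "thank you"]:
    result, engine = confidence_based_translate(
        phrase, statistical_engine, rule_based_engine
    )
    print(f"{phrase!r} -> {result!r} (via {engine} engine)")
```

A multi-engine setup differs mainly in that it runs all engines on every input and then selects or combines their outputs, rather than calling the second engine only on demand.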

Neural MT (NMT) – This type of MT relies on a neural network to learn a statistical model for translation. The network encodes the source text into an internal representation and then decodes it into the target language. It does not simply run a set of pre-defined rules to determine the final translation output. As such, NMT addresses many of the problems faced by SMT and RBMT systems.

NMT’s power lies in its neural network architecture, which facilitates the processing of enormous amounts of data and adaptation to new contexts.

NMT is ideal for translating content quickly, accurately, and flexibly.

Benefits of NMT

High Accuracy: By drawing from extensive data sets and using language modelling, NMT covers and understands the broader context of words and phrases, resulting in more accurate and coherent translations.

Fast Learning: Neural networks can be trained quickly through automated processes, allowing the creation of customized, specialized MT engines.

Simple and flexible: It can be easily integrated using APIs and SDKs into other software.

Customization: The output of the NMT can be customized and updated using specialized terminology databases, brand-specific glossaries, and other data sources to improve the final translation output.

Cost efficiency: NMT produces highly accurate translations quickly at a fraction of the cost of human translation. Although there may still be a gap between NMT output and what human translators produce, this gap can be narrowed by relying on human editors for machine translation post-editing.

Scalability: NMT can scale up to produce an enormous volume of translation output quickly. 
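
As one illustration of the API integration and glossary customization mentioned above, here is a minimal sketch built around the `translate_text` call of Amazon Translate's boto3 client. The wrapper function, the stand-in client, and the glossary name are assumptions made for this example; the real client needs the boto3 package and AWS credentials, so the demonstration at the bottom runs against the offline stand-in instead.

```python
def translate_via_api(client, text, source_lang, target_lang, terminology_names=None):
    """Send one translation request through a client exposing translate_text()."""
    request = {
        "Text": text,
        "SourceLanguageCode": source_lang,
        "TargetLanguageCode": target_lang,
    }
    if terminology_names:
        # Optional custom terminology (brand glossaries and similar).
        request["TerminologyNames"] = list(terminology_names)
    response = client.translate_text(**request)
    return response["TranslatedText"]

def make_aws_client(region_name="us-east-1"):
    """Create a real Amazon Translate client (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the rest of the sketch runs without it
    return boto3.client("translate", region_name=region_name)

class StandInClient:
    """Hypothetical offline stand-in that mimics the response shape."""

    def translate_text(self, **request):
        tag = f"[{request['SourceLanguageCode']}->{request['TargetLanguageCode']}]"
        return {"TranslatedText": f"{tag} {request['Text']}"}

# Offline demonstration; swap in make_aws_client() to call the real service.
client = StandInClient()
print(translate_via_api(client, "Hello, world", "en", "es"))
print(translate_via_api(client, "Hello, world", "en", "de", ["brand-glossary"]))
```

Passing the client in as a parameter keeps the translation call separate from credential handling, which also makes the integration easy to test without network access.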

Advantages and disadvantages of Machine Translation

Advantages

  • Increased output
  • Less human effort
  • Reduced cost, especially for a massive volume of translation
  • Increased consistency
  • Increased speed

Disadvantages

  • Literal translation
  • Poor-quality MT output – implies more effort for human translators
  • Hidden costs for clients – review costs money and takes time
  • Accuracy and ambiguity – MT can mistranslate ambiguous wording; only human translators can reliably resolve it from context

What type of content is best suited for MT?

NMT excels at the following:

  • Translating a massive amount of content within a short timeframe, such as online customer reviews or information required after a natural disaster.
  • Translating highly repetitive content such as manuals, user guides, and reference materials.
  • Translating user-generated content (UGC), such as comments from social media, for social sentiment analysis.
  • Scaling up online customer service, such as Live Chat.

As a rule of thumb, MT is better suited to more structured content such as technical documentation and legal and intellectual property (IP) material. It also works well for internal communications and for materials used for reference and research purposes.

Generally, marketing and branding content, and any content that depends heavily on context, is better handled by human translators.

Machine Translation Post-Editing (MTPE), in which human editors review and correct MT output, may be the most appropriate solution to balance quality, cost, and efficiency.

