The engines


The engines process language assets in the form of individual sentences, paragraphs or text chunks. The output format depends on the processing type. In a standard translation engine, for example, the input is a sentence and the output is its translation.


Each engine includes NLP processes that pre-process the input, post-process the output, and couple the neural network to the rest of the engine.
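
As a rough illustration of how pre-processing and post-processing can wrap a model call, the following Python sketch chains the three stages. All names here (preprocess, postprocess, run_engine, fake_model) are hypothetical, and the specific steps (whitespace normalization, capitalization) are assumptions chosen for brevity; they are not Pangeanic's actual processing.

```python
from typing import Callable


def preprocess(text: str) -> str:
    """Collapse runs of whitespace before the text reaches the model (assumed step)."""
    return " ".join(text.split())


def postprocess(text: str) -> str:
    """Capitalize the first character of the model output (assumed step)."""
    return text[:1].upper() + text[1:]


def run_engine(text: str, model: Callable[[str], str]) -> str:
    """Couple pre-processing, the model call, and post-processing."""
    return postprocess(model(preprocess(text)))


def fake_model(sentence: str) -> str:
    """Stand-in for the neural network: a lookup in a one-entry table."""
    table = {"hello world": "hola mundo"}
    return table.get(sentence.lower(), sentence)


result = run_engine("  Hello   world ", fake_model)
print(result)
```

Passing the model in as a callable keeps the surrounding steps independent of the network itself, so the same wrapper could in principle serve any process type.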

Each engine also includes a small access server that implements a basic REST API. The corporate solution uses this API, but it also allows single engines to be deployed without any extra modules in simple deployment scenarios.
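
The REST API itself is not specified here, so the following Python sketch only shows what a client request to a single engine might look like. The /translate path, the src, tgt and text field names, and the host and port are all assumptions made for illustration, not the documented interface. The request is built but not sent.

```python
import json
from urllib import request


def build_translate_request(base_url: str, text: str, src: str, tgt: str) -> request.Request:
    """Build (but do not send) a POST request for a hypothetical /translate endpoint."""
    # Field names are assumed for illustration; check the real API before use.
    payload = json.dumps({"src": src, "tgt": tgt, "text": text}).encode("utf-8")
    return request.Request(
        url=base_url.rstrip("/") + "/translate",  # hypothetical path
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_translate_request("http://localhost:8080", "Hello world", "en", "es")
print(req.get_method(), req.full_url)
```

Against a running engine, the request could then be sent with request.urlopen(req), after the path and fields have been adjusted to the actual API.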


Pangeanic engines provide the following process types, among others:

  • Generic translation
  • Customized translation
  • Language detection
  • Text categorization
  • Anonymization
  • Summarization
  • Relevance and sentiment analysis

When an NN-based (neural network) engine is installed together with the Online Learning feature, it can be re-trained with user-corrected data to improve its accuracy according to client preferences.

Refer to the Online Learning section of this document for more about the features implemented.

Pangeanic engines are delivered as a Docker package in two flavours: one deploys and runs the engine on one or more CPU cores, and the other deploys and runs it on GPUs for improved performance.

Several engines can share the same host system, but GPU engines require dedicated access to one of the GPUs in the host system.
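
The dedicated-GPU constraint can be checked when planning a host. The Python sketch below is a minimal illustration: the function name, the engine names and the "cpu"/"gpu" flavour labels are hypothetical. It gives each GPU engine its own GPU index and lets CPU engines share the host without one.

```python
from typing import Dict, List, Optional, Tuple


def assign_gpus(engines: List[Tuple[str, str]], gpu_count: int) -> Dict[str, Optional[int]]:
    """Map each engine name to a dedicated GPU index, or None for CPU engines.

    engines: (name, flavour) pairs, where flavour is "cpu" or "gpu".
    Raises ValueError when GPU engines outnumber the available GPUs.
    """
    assignment: Dict[str, Optional[int]] = {}
    next_gpu = 0
    for name, flavour in engines:
        if flavour == "gpu":
            if next_gpu >= gpu_count:
                raise ValueError(f"no free GPU left for engine {name!r}")
            assignment[name] = next_gpu  # this GPU is not reused by another engine
            next_gpu += 1
        else:
            assignment[name] = None  # CPU engines share the host's cores
    return assignment


plan = assign_gpus(
    [("en-es-generic", "gpu"), ("en-fr-legal", "gpu"), ("langdetect", "cpu")],
    gpu_count=2,
)
print(plan)
```

Adding a third GPU engine to this two-GPU host would raise ValueError, which mirrors the rule that a GPU cannot be shared between engines.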

Translation Engines

Pangeanic offers different types of Translation Engines. A Generic Translation Engine (GTE) exists for all of the most widely spoken language combinations (pairs). While GTEs are good enough for most applications, they do not offer near-human translation accuracy because they need to cope with jargon from multiple domains, varied styles, dialogue, legal texts, technical documents, unpredictable writing, etc.

Unlike GTEs, Custom Translation Engines (CTEs) are trained to translate text in a specific domain (legal, medical, finance...). They typically incorporate data generated by the client, whose terminology and style are prioritized. CTEs are trained for the customers who request them, using homogeneous in-domain corpora provided partially or totally by the client.

Anonymization Engines

Anonymization Engines are similar to translation engines: the output can be considered text written in the same language as the source, but with personal data replaced by identifiers.

In many use cases, however, the anonymization engines run on-premises while the remaining language services are accessed as SaaS. Running anonymization on-premises guarantees that texts sent to a third party (the SaaS provider) are free of private or personal data, in order to comply with GDPR-like regulations.
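
To make the idea of replacing personal data with identifiers concrete, here is a minimal Python sketch that could run locally before text is sent to a third party. It is a regex stand-in, not the actual anonymization engine: the two patterns, the identifier format (<EMAIL_1>, <PHONE_1>) and the function name are assumptions for illustration.

```python
import re
from typing import Dict, Tuple

# Illustrative patterns only; a real engine would cover far more entity types.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def anonymize(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace matched personal data with numbered identifiers.

    Returns the cleaned text and a mapping from identifier to original value,
    which stays on-premises.
    """
    mapping: Dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        counter = 0

        def replace(match: "re.Match[str]") -> str:
            nonlocal counter
            counter += 1
            tag = f"<{label}_{counter}>"
            mapping[tag] = match.group(0)
            return tag

        text = pattern.sub(replace, text)
    return text, mapping


clean, mapping = anonymize("Contact Ana at ana.perez@example.com or +34 600 123 456.")
print(clean)
```

Note that the name "Ana" is left untouched, because this sketch only recognizes e-mail addresses and phone numbers. Only the cleaned text would be sent to the SaaS provider, while the mapping remains local.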