Working with glossaries


Glossaries

A glossary is a plain-text file (UTF-8 encoded) containing bilingual alignments that force certain translations. When a glossary is specified and the text to translate contains one or more snippets (of one or more words each) that appear in the first part of a glossary line, their translation is forced to the second part of that line. Consider this example of a Spanish to English glossary:

Pedro Pedro

Juan Juan

Note that the first and second snippets on each line are separated by a TAB character.


This glossary file indicates that any occurrences of the words Pedro and Juan should be kept unchanged during translation instead of being translated as Peter and John.
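
To make the file layout concrete, here is a minimal Python sketch that reads glossary text into (source, target) pairs. The function name `parse_glossary` is hypothetical, and skipping blank lines is an assumption; this only illustrates the one-TAB-per-line format and is not how ECO processes glossaries internally.

```python
def parse_glossary(text):
    """Parse glossary text into a list of (source, target) pairs.

    Assumption: each non-empty line holds exactly two snippets
    separated by a single TAB character; blank lines are skipped.
    """
    pairs = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            raise ValueError(f"line {number}: expected exactly one TAB separator")
        source, target = parts
        pairs.append((source, target))
    return pairs


# The Spanish to English example from above, with real TAB characters.
glossary_text = "Pedro\tPedro\nJuan\tJuan\n"
print(parse_glossary(glossary_text))
# → [('Pedro', 'Pedro'), ('Juan', 'Juan')]
```

A line whose snippets are separated by spaces instead of a TAB raises `ValueError`, which is the most common mistake when a text editor converts tabs to spaces.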

Glossaries are typically used to force a particular translation of proper names, personal names, brands, etc. They are not a replacement for Translation Memories, and their applicability is limited in inflected languages.


Creating a Glossary

Linguists create glossary files using a plain text editor, for instance Sublime Text or Notepad++.

Some important points to take into account:

  • ALWAYS use UTF-8 encoding, especially when dealing with non-standard character sets
  • Use a TAB character to separate the two snippets in each line
  • Snippets are case sensitive; consider using separate lines for differently cased variants of a snippet
  • When finished, save the file with a name you can remember later and a .txt extension, for instance my-first-glossary.txt
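
The points above can also be followed programmatically. The sketch below writes a glossary file from Python; the helper name `write_glossary` and the sample entries are hypothetical, and rejecting empty snippets is an assumption rather than a documented ECO rule.

```python
import tempfile
from pathlib import Path


def write_glossary(path, pairs):
    """Write (source, target) pairs as a UTF-8, TAB-separated glossary file."""
    lines = []
    for source, target in pairs:
        for snippet in (source, target):
            # A TAB or newline inside a snippet would break the file format.
            if not snippet or "\t" in snippet or "\n" in snippet:
                raise ValueError(f"invalid snippet: {snippet!r}")
        lines.append(f"{source}\t{target}")
    # Explicit encoding and newline so the result is the same on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write("\n".join(lines) + "\n")


# Snippets are case sensitive, so each cased variant gets its own line.
entries = [("Pedro", "Pedro"), ("PEDRO", "PEDRO"), ("Peña", "Peña")]

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "my-first-glossary.txt"
    write_glossary(path, entries)
    print(path.read_text(encoding="utf-8"))
```

The example writes into a temporary folder only so that it leaves no files behind; in practice you would pass the path where you want to keep the glossary.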


Managing Glossaries in ECO

Once you have a glossary in a .txt file, upload it to ECO using your admin account. Regular users can use glossaries but cannot manage them.

Use the left menu option Corporate > Dicts/Glossaries

Glos1.png

On that screen you can upload and delete glossaries and assign them to engines.

Glossaries and dictionaries are not used only for translation processes, so for a translation glossary you should always select the GLOSSARY / Translation type.

In order to be able to use a glossary in ECO you need to assign the glossary to one or more engines.

Using a Glossary when processing a File with ECO

The glossary option will appear when you try to process a file and choose an engine that has assigned glossaries. The use of the glossary is optional.

Glos2.png

Using a Glossary with Pangeanic Translation API

The API call that translates an array of snippets can also specify a glossary by its unique identifier (a number).

To use a glossary with the API, you first need the details of the glossaries available to you. You can get them from the glossaries endpoint:


curl --location --request POST 'http://prod.pangeamt.com:8080/NexRelay/v1/corp/glossaries' --header 'Content-Type: application/json' --data-raw '{"apikey":"demo"}'


Note that you have to send your API key, the same one you will use when translating snippets.
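
For reference, the same request can be prepared from Python with the standard library. This is a sketch, not an official client: the function name `build_glossaries_request` is hypothetical, and the URL, header and body are copied from the curl command above. The request is only built here, not sent.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
GLOSSARIES_URL = "http://prod.pangeamt.com:8080/NexRelay/v1/corp/glossaries"


def build_glossaries_request(apikey):
    """Build (but do not send) the POST request that lists glossaries."""
    body = json.dumps({"apikey": apikey}).encode("utf-8")
    return urllib.request.Request(
        GLOSSARIES_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_glossaries_request("demo")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

To actually call the endpoint, pass the request to `urllib.request.urlopen` and decode the response body with `json.loads`.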


The info returned by the API is a list of glossaries:


[
  {
    "id": 3,
    "name": "Automotive",
    "engineids": [15]
  },
  {
    "id": 4,
    "name": "Medical_1",
    "engineids": [12, 10]
  },
  {
    "id": 5,
    "name": "Medical_2",
    "engineids": [12, 10, 934]
  }
]


Each glossary is listed with its id and the ids of the engines the admin assigned it to.
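
Since a glossary can only be used with the engines it is assigned to, it can help to filter the list by engine before translating. A minimal sketch, assuming the response has already been decoded into Python objects; the function name `glossaries_for_engine` is hypothetical and the sample data is the response shown above.

```python
def glossaries_for_engine(glossaries, engine_id):
    """Return the glossaries that are assigned to the given engine id."""
    return [g for g in glossaries if engine_id in g["engineids"]]


# The sample response from above, as decoded JSON.
glossaries = [
    {"id": 3, "name": "Automotive", "engineids": [15]},
    {"id": 4, "name": "Medical_1", "engineids": [12, 10]},
    {"id": 5, "name": "Medical_2", "engineids": [12, 10, 934]},
]

print([g["name"] for g in glossaries_for_engine(glossaries, 12)])
# → ['Medical_1', 'Medical_2']
```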


Now, to translate with engine 12 and the glossary Medical_2 (id 5), send this data body when calling the translate API:

{ "src":"es", "tgt":"en", "apikey": "demo", "engine":"12", "text":["Casos de uso", "Esto es hola mundo!"], "glossary_id": 5 }
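
The body above can be assembled in code. The following Python sketch builds it and checks first that the glossary is assigned to the chosen engine. The function name `build_translate_body` is hypothetical, the field names and types (engine as a string, `glossary_id` as a number) mirror the example above, and the assignment check is a client-side precaution, not documented API behaviour.

```python
import json


def build_translate_body(apikey, src, tgt, engine_id, texts, glossary=None):
    """Build the JSON body for a translate call, optionally with a glossary.

    `glossary` is one entry of the list returned by the glossaries endpoint.
    """
    body = {
        "src": src,
        "tgt": tgt,
        "apikey": apikey,
        "engine": str(engine_id),  # the example above sends the engine as a string
        "text": list(texts),
    }
    if glossary is not None:
        # Precaution: only use glossaries assigned to the chosen engine.
        if engine_id not in glossary["engineids"]:
            raise ValueError(
                f"glossary {glossary['name']!r} is not assigned to engine {engine_id}"
            )
        body["glossary_id"] = glossary["id"]
    return body


medical_2 = {"id": 5, "name": "Medical_2", "engineids": [12, 10, 934]}
body = build_translate_body(
    "demo", "es", "en", 12,
    ["Casos de uso", "Esto es hola mundo!"],
    glossary=medical_2,
)
print(json.dumps(body, ensure_ascii=False))
```

Leaving out the `glossary` argument produces the same body without `glossary_id`.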