Working with glossaries

From Pangeanic
Latest revision as of 09:05, 3 December 2021

Glossaries

A glossary is a plain-text file (UTF-8 encoded) containing bilingual alignments that force specific translations. When a glossary is specified and the text to translate contains one or more snippets (each of one or more words) that appear in the first part of a glossary line, the translation of each such snippet is forced to the second part of that line. Consider this example of a Spanish-to-English glossary:

Pedro	Pedro

Juan	Juan

Note that the first and second snippets on each line are separated by a TAB character.


This glossary file indicates that every occurrence of the words Pedro and Juan should be kept unchanged during translation, instead of being translated as Peter and John.

Glossaries are typically used to force the translation of proper names, personal names, brands, etc. They are not a replacement for Translation Memories, and their applicability is limited in inflected languages.


Creating a Glossary

Linguists create glossary files using a plain text editor, for instance Sublime Text or Notepad++.

Some important points to take into account:

  • Always use UTF-8 encoding, especially when dealing with non-standard character sets
  • Use a TAB character to separate the two snippets in each line
  • Snippets are case sensitive; consider using separate lines for differently cased snippets
  • When finished, save the file with a name you can remember later and a .txt extension, for instance my-first-glossary.txt
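
The points above can be checked mechanically before uploading a file. The following is a minimal sketch, not part of ECO: the function names load_glossary and save_glossary are hypothetical, and skipping blank lines is an assumption, since this page does not say how ECO treats them.

```python
def load_glossary(path):
    """Read a glossary file and return a list of (source, target) pairs."""
    pairs = []
    # "utf-8-sig" reads UTF-8 and also tolerates a leading byte order mark.
    with open(path, encoding="utf-8-sig") as handle:
        for number, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # assumption: blank lines are ignored
            fields = line.split("\t")
            if len(fields) != 2 or not fields[0].strip() or not fields[1].strip():
                raise ValueError(
                    f"line {number}: expected two snippets separated by one TAB"
                )
            pairs.append((fields[0], fields[1]))
    return pairs


def save_glossary(path, pairs):
    """Write (source, target) pairs as UTF-8 text, one TAB-separated pair per line."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for source, target in pairs:
            handle.write(f"{source}\t{target}\n")
```

For instance, save_glossary("my-first-glossary.txt", [("Pedro", "Pedro"), ("Juan", "Juan")]) would write the example glossary from the first section, and load_glossary on the same file would raise a ValueError if any line lacked the TAB separator.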


Managing Glossaries in ECO

Once you have a glossary in a .txt file, upload it to ECO using your admin account. Regular users can use glossaries but cannot manage them.

Use the left menu option Corporate > Dicts/Glossaries


In that screen you can easily upload and delete glossaries and assign them to engines.

Glossaries and Dictionaries are not only used for translation processes, so for a translation glossary you should always use the GLOSSARY / Translation type.

To use a glossary in ECO, you need to assign it to one or more engines.

Using a Glossary when processing a File with ECO

The glossary option appears when you process a file and choose an engine that has glossaries assigned to it. Using the glossary is optional.


Using a Glossary with Pangeanic Translation API

The API used to translate an array of snippets also accepts a glossary, specified by its unique identifier (a number).

To use a glossary with the API, you first need the details of the glossaries available to you. You get them from the glossaries endpoint:


curl --location --request POST 'http://prod.pangeamt.com:8080/NexRelay/v1/corp/glossaries' --header 'Content-Type: application/json' --data-raw '{"apikey":"demo"}'


Note that you have to send your APIKey, the same APIKey you will use when translating snippets.
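
The same request can be sent from a script. The following is a rough Python equivalent of the curl call above, using only the standard library. The function names and the timeout value are hypothetical, and it assumes the endpoint answers with a UTF-8 JSON body, as the example response on this page suggests.

```python
import json
import urllib.request

# URL taken from the curl example on this page.
GLOSSARIES_URL = "http://prod.pangeamt.com:8080/NexRelay/v1/corp/glossaries"


def build_glossaries_request(apikey, url=GLOSSARIES_URL):
    """Build the POST request that lists the glossaries available to an APIKey."""
    body = json.dumps({"apikey": apikey}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_glossaries(apikey, url=GLOSSARIES_URL, timeout=30):
    """Send the request and decode the JSON answer (assumed to be UTF-8)."""
    request = build_glossaries_request(apikey, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_glossaries("demo") would perform the network request; build_glossaries_request is kept separate so the request can be inspected without contacting the server.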


The info returned by the API is a list of glossaries:


[
  {
    "id": 3,
    "name": "Automotive",
    "engineids": [15]
  },
  {
    "id": 4,
    "name": "Medical_1",
    "engineids": [12, 10]
  },
  {
    "id": 5,
    "name": "Medical_2",
    "engineids": [12, 10, 934]
  }
]


Each glossary is listed with its id and the engines the admin assigned it to (engineids).
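
Since a glossary is only useful with the engines it is assigned to, it can help to filter the list by engine. This is a small sketch over the example response shown above; the helper name glossaries_for_engine is hypothetical, and the field names are the ones that appear in that response.

```python
import json


def glossaries_for_engine(glossaries, engine_id):
    """Return the glossaries whose engineids list contains engine_id."""
    return [g for g in glossaries if engine_id in g.get("engineids", [])]


# The example response from this page, pasted as a JSON string.
sample_response = json.loads("""
[
  {"id": 3, "name": "Automotive", "engineids": [15]},
  {"id": 4, "name": "Medical_1", "engineids": [12, 10]},
  {"id": 5, "name": "Medical_2", "engineids": [12, 10, 934]}
]
""")

for glossary in glossaries_for_engine(sample_response, 12):
    print(glossary["id"], glossary["name"])
```

With the example data, engine 12 matches the glossaries Medical_1 (id 4) and Medical_2 (id 5).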


Now, if you want to translate with engine 12 and the glossary Medical_2 (id 5), use this data body when calling the translate API:

{
  "src": "es",
  "tgt": "en",
  "apikey": "demo",
  "engine": "12",
  "text": ["Casos de uso", "Esto es hola mundo!"],
  "glossary_id": 5
}
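
A data body like this one can also be assembled in code. The sketch below builds it from a glossary entry of the kind returned by the glossaries endpoint and refuses to combine a glossary with an engine it is not assigned to. The function name build_translate_body is hypothetical and the check is a client-side convenience, not documented API behaviour; the URL of the translate API is not given on this page, so sending the body is left out.

```python
import json


def build_translate_body(apikey, src, tgt, engine_id, texts, glossary=None):
    """Build the data body for the translate API, optionally with a glossary."""
    body = {
        "src": src,
        "tgt": tgt,
        "apikey": apikey,
        "engine": str(engine_id),  # the example on this page sends the engine as a string
        "text": list(texts),
    }
    if glossary is not None:
        # Client-side sanity check: the glossary must be assigned to the engine.
        if int(engine_id) not in glossary["engineids"]:
            raise ValueError(
                f"glossary {glossary['id']} is not assigned to engine {engine_id}"
            )
        body["glossary_id"] = glossary["id"]
    return body


medical_2 = {"id": 5, "name": "Medical_2", "engineids": [12, 10, 934]}
body = build_translate_body(
    "demo", "es", "en", 12, ["Casos de uso", "Esto es hola mundo!"], medical_2
)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Leaving out the glossary argument produces the same body without the glossary_id field.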