Anonymization: Difference between revisions

From Pangeanic
Revision as of 06:44, 21 April 2022

Pangeanic Anonymization Solution (PAS) is based on Neural Networks (NN). These networks ingest text and output annotated text that identifies the entities found, such as person names and addresses.

However, Neural Networks only behave as they were taught during training, which means that on their own they cannot identify new entities or adapt to specific use cases.

To solve this problem, PAS uses two techniques that refine identification performance:

  • Dictionaries
  • Rules

Anonymization Dictionaries

The simplest way to help the NN identify an entity is to declare that entity's name in a dictionary. We call these dictionaries Anon Dictionaries, as they list texts that can be anonymized.

Imagine you work for a hospital and want to be sure the NN detects and anonymizes every doctor's name that appears in the documents you process. You can create an Anon Dictionary called DoctorsList, a plain text file that simply contains the names on separate lines, and use it when you anonymize.
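As a rough illustration of the idea, the sketch below parses the content of a DoctorsList-style plain text dictionary (one name per line) and finds those names in a document. It is not the PAS implementation or API: the function names parse_anon_dictionary and find_dictionary_matches, the handling of blank and repeated lines, and the case-insensitive whole-word matching are all assumptions made for this example.

```python
import re


def parse_anon_dictionary(text: str) -> list[str]:
    """Return the non-empty, de-duplicated entries of a one-name-per-line dictionary."""
    entries = []
    seen = set()
    for line in text.splitlines():
        name = line.strip()
        key = name.casefold()
        # Skip blank lines and case-insensitive repeats (an assumption of this sketch).
        if name and key not in seen:
            seen.add(key)
            entries.append(name)
    return entries


def find_dictionary_matches(document: str, entries: list[str]) -> list[tuple[int, int, str]]:
    """Return (start, end, matched_text) for every dictionary entry found in the document."""
    if not entries:
        return []
    # Try longer entries first so a full name wins over a shorter name it contains.
    ordered = sorted(entries, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(e) for e in ordered) + r")(?!\w)",
        re.IGNORECASE,
    )
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(document)]


# Hypothetical dictionary content, as it might appear in a DoctorsList file.
doctors_list = """Ana Ruiz
John Smith

ana ruiz
"""

document = "Seen by Dr. John Smith and Ana Ruiz."
entries = parse_anon_dictionary(doctors_list)
matches = find_dictionary_matches(document, entries)
```

Here entries holds the two distinct names, and matches holds one span per occurrence in the document. In a real deployment the dictionary would complement the NN's own detections rather than replace them.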

An Anon Dictionary is linked to an entity type. For instance, our DoctorsList can be linked to the type PER (Person Name), or we can create a new entity type called DOCTOR and assign those names to it.
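The link between a dictionary and an entity type can be pictured with a minimal sketch. It assumes a plain mapping from entity type to dictionary entries and a bracketed tag such as [DOCTOR] as the anonymized output; the annotate function, the mapping layout, and the tag format are hypothetical and do not describe how PAS stores or applies its dictionaries.

```python
import re


def annotate(document: str, dictionaries: dict[str, list[str]]) -> str:
    """Replace every dictionary entry found in the document with its entity type tag."""
    # Keep only entity types that have at least one non-blank entry.
    cleaned = {
        entity_type: [e.strip() for e in entries if e.strip()]
        for entity_type, entries in dictionaries.items()
    }
    types = [t for t, entries in cleaned.items() if entries]
    if not types:
        return document

    # One capture group per entity type; the group that matched tells us the type.
    groups = []
    for entity_type in types:
        ordered = sorted(cleaned[entity_type], key=len, reverse=True)
        groups.append("(" + "|".join(re.escape(e) for e in ordered) + ")")
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(groups) + r")(?!\w)",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: "[" + types[m.lastindex - 1] + "]", document)


# Hypothetical dictionary entries for the hospital example.
doctors = ["Ana Ruiz", "John Smith"]

# The same list can be linked to the existing PER type or to a new DOCTOR type.
as_person = {"PER": doctors}
as_doctor = {"DOCTOR": doctors}

text = "Dr. John Smith referred the patient to Ana Ruiz."
tagged_as_person = annotate(text, as_person)
tagged_as_doctor = annotate(text, as_doctor)
```

With as_doctor, both names in the sample text become [DOCTOR]; with as_person, they become [PER]. In this sketch, when the same entry is listed under several types, the type that appears first in the mapping takes precedence.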