Pangeanic Corporate Solution Architecture
Figure: Pangeanic Architecture Block Diagrams. The diagram shows the different logical blocks.
The solution is implemented on a fully distributed, modular architecture and covers three main areas:
- SaaS - Service Area
- Data Management
- Machine Learning
SaaS - Service Area
This area comprises the technical resources needed to serve user requests. Its main blocks are:
- Interfaces: a web interface and a RESTful API through which users, robots or programs access the solution's functionality (a sketch of such an API call follows this list).
- The HUB, which receives and reroutes user requests. Its physical implementation ranges from a single micro server to a Kubernetes-managed platform with load balancing and autoscaling, serving up to thousands of requests per second.
- The File Processor, in charge of extracting text from formatted documents and rebuilding them with minimal format loss.
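As a minimal sketch of programmatic access, assuming a hypothetical endpoint, payload shape and API-key header (the real API contract is not given in this document), a client call might look like this:

```python
import requests

# Hypothetical endpoint and credentials; the real API contract is not
# specified in this document.
API_URL = "https://example.pangeanic.test/api/v1/process"
API_KEY = "your-api-key"

payload = {
    "src": "en",   # assumed source-language field
    "tgt": "es",   # assumed target-language field
    "text": ["The quick brown fox jumps over the lazy dog."],
}

# The HUB receives this request and reroutes it to an available engine.
response = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```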
Machine Learning Area
This area includes the neural networks and all the resources required to create, train and run them as text processors and NLP units.
- NE Farm: a highly scalable farm of Dockerized servers that scales out to serve processing requests (a sketch of one such worker follows this list).
- Neural Trainer: dedicated GPU servers offered to customers to adapt the language models to their specific needs.
- Model & Dockers Repo: a repository storing base and evolved models.
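The NE Farm bullet can be made concrete with a minimal sketch of a single Dockerized worker: an HTTP service that loads a model once and answers processing requests. The route, payload shape and dummy model are assumptions for illustration, and Flask stands in for whatever server framework the real workers use.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def load_model():
    # Placeholder: a real worker would load a trained neural model here.
    return lambda text: text.upper()  # dummy "NLP unit" for illustration

model = load_model()  # load once at startup, before serving requests

@app.route("/process", methods=["POST"])
def process():
    # Assumed payload shape: {"text": ["segment 1", "segment 2", ...]}
    data = request.get_json(force=True)
    return jsonify({"results": [model(seg) for seg in data.get("text", [])]})

if __name__ == "__main__":
    # Inside a container, the platform starts many replicas of this
    # service and scales the replica count with demand.
    app.run(host="0.0.0.0", port=8000)
```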
Data Management Area
Data is the fuel of machine learning. A neural network may require up to 100 million examples for training, and those examples have to be acquired, processed, cleaned and packaged.
- At Pangeanic we use a Data Lake as the Corpora Repository, and multiple NLP processes interface with it to add data units and to clean, categorize and evaluate them (a sketch of such a cleaning pass follows this list).
- PECAT: a dedicated tool that lets professional and non-professional (crowd) contributors improve, filter, select and categorize the data.
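To make the kind of processing concrete, here is a minimal sketch of a cleaning pass over (source, target) example pairs before they enter the Corpora Repository; the rules shown (empty-segment and length filters, exact deduplication) are generic stand-ins, not Pangeanic's actual pipeline.

```python
def clean_corpus(pairs, min_tokens=1, max_tokens=200):
    """Drop empty, overlong and duplicate (source, target) examples."""
    seen, cleaned = set(), []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not (min_tokens <= len(src.split()) <= max_tokens):
            continue
        if not (min_tokens <= len(tgt.split()) <= max_tokens):
            continue
        if (src, tgt) in seen:  # exact-duplicate filter
            continue
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    return cleaned

raw = [
    ("Hello world.", "Hola mundo."),
    ("Hello world.", "Hola mundo."),  # duplicate, dropped
    ("", ""),                         # empty, dropped
]
print(clean_corpus(raw))  # [('Hello world.', 'Hola mundo.')]
```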
Flow
- Users access the functionality through a variety of client interfaces (described later), such as the PGB, web applications and CAT tools, or programmatically through a RESTful API for integrations.
- The Production Access Server manages user requests and orchestrates the rest of the modules. It requires a standard SQL database to store the data needed to fulfil requests.
- The engines, either local (managed by the organization on its own premises or cloud) or operated by Pangeanic under a SaaS model, perform the actual language processing (a sketch of this dispatch follows this list).
- A file processor handles file and document conversion when this feature is installed.
- An online trainer module evolves the models according to user preferences; it is integrated in the engine package when the online learning option is installed.
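A minimal sketch of the flow above, with the Production Access Server choosing between a local engine and a Pangeanic-operated SaaS engine; the names, the local-first preference rule and the stub processing are assumptions for illustration, not the documented routing logic.

```python
from dataclasses import dataclass

@dataclass
class Engine:
    name: str
    local: bool  # True: on-premises / own cloud; False: Pangeanic SaaS

    def process(self, text: str) -> str:
        # Stub: a real engine performs the actual language processing.
        return f"[{self.name}] {text}"

def dispatch(text, engines, prefer_local=True):
    """Route a request to the preferred available engine."""
    ordered = sorted(engines, key=lambda e: e.local != prefer_local)
    if not ordered:
        raise RuntimeError("no engine available")
    return ordered[0].process(text)

engines = [Engine("pangeanic-saas", local=False), Engine("on-prem", local=True)]
print(dispatch("Process this document.", engines))  # routed to "on-prem"
```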