Improving Maintenance and Performance of a Data Integration / ETL Module
Altoros helped a customer to achieve durability for its data analytics system, improve maintenance, and cut time on support.
Summary
A provider of data analytics solutions wanted to simplify the ETL (Extract, Transform, and Load) processes of its data analytics platform. The results of the cooperation with Altoros include:
- A highly scalable multi-tenant system that can process gigabytes of data uploaded simultaneously by different users.
- Reduced time spent on maintenance and customer support, thanks to monitoring and notifications that rapidly detect changes or failures.
- Durability of all transactions, ensured by a secure replication server.
The customer
The customer provides software for efficiently organizing the governance of external service delivery through the use of technology. The company builds solutions for telecom, banking, finance, and other industries.
The need
Although the customer successfully served clients with its analytics platform, the data integration part of its service needed to be streamlined. For example, the system had overcomplicated chains of ETL transformations, and the existing solution based on Pentaho DI had issues with linear scalability. In many cases, updates to transformation processes required extra workarounds. In addition, changes implemented within the BI part were not visible in the version control system. Finally, there was no monitoring, logging, or notification functionality. The customer therefore turned to Altoros for assistance.
The challenges
Under the project, the team at Altoros had to address the following issues:
- Incoming CSV files had to be validated against customizable templates described in a meta language. All template updates had to be tracked in the version control system (Git).
- The FTP server was deployed in Amazon VPC on simple instances. If an instance failed during a transformation, some of the ETL data was lost, so durability was not ensured.
The solution
After a comparative study of a few data integration systems, Altoros suggested implementing Apache Camel to streamline the ETL process. On top of it, our engineers developed another framework to simplify the work of the customer’s team with Apache Camel.
Altoros’s experts delivered a Java-based validator that checked the CSV files against the template and converted them so that OLAP cubes could then be generated.
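The case study does not describe the template meta language or the validator's internals, so the following is only a minimal sketch of the general idea of checking a CSV file against a column template. The names ColumnRule and CsvTemplateValidator, the three type names, and the plain comma splitting are all assumptions made for illustration, not the delivered implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical rule for one CSV column; the customer's actual template
// meta language is not described in this case study.
final class ColumnRule {
    final String name;      // expected header text
    final String type;      // "integer", "decimal", or "string"
    final boolean required; // whether empty cells are rejected

    ColumnRule(String name, String type, boolean required) {
        this.name = name;
        this.type = type;
        this.required = required;
    }
}

final class CsvTemplateValidator {

    // Returns a list of problems; an empty list means the file matches the template.
    // Fields are split on plain commas, so quoted fields are not supported here.
    static List<String> validate(List<String> lines, List<ColumnRule> template) {
        List<String> errors = new ArrayList<>();
        if (lines.isEmpty()) {
            errors.add("file is empty");
            return errors;
        }
        checkHeader(lines.get(0).split(",", -1), template, errors);
        for (int i = 1; i < lines.size(); i++) {
            String[] cells = lines.get(i).split(",", -1);
            int lineNo = i + 1;
            if (cells.length != template.size()) {
                errors.add("line " + lineNo + ": expected " + template.size()
                        + " fields, found " + cells.length);
                continue;
            }
            for (int c = 0; c < cells.length; c++) {
                checkCell(cells[c].trim(), template.get(c), lineNo, errors);
            }
        }
        return errors;
    }

    private static void checkHeader(String[] header, List<ColumnRule> template,
                                    List<String> errors) {
        if (header.length != template.size()) {
            errors.add("header: expected " + template.size()
                    + " columns, found " + header.length);
            return;
        }
        for (int c = 0; c < header.length; c++) {
            if (!header[c].trim().equals(template.get(c).name)) {
                errors.add("header: column " + (c + 1) + " should be '"
                        + template.get(c).name + "'");
            }
        }
    }

    private static void checkCell(String value, ColumnRule rule, int lineNo,
                                  List<String> errors) {
        if (value.isEmpty()) {
            if (rule.required) {
                errors.add("line " + lineNo + ": '" + rule.name + "' is required");
            }
            return;
        }
        boolean valid;
        switch (rule.type) {
            case "integer": valid = value.matches("-?\\d+"); break;
            case "decimal": valid = value.matches("-?\\d+(\\.\\d+)?"); break;
            default:        valid = true; // "string" accepts any text
        }
        if (!valid) {
            errors.add("line " + lineNo + ": '" + rule.name
                    + "' is not a valid " + rule.type);
        }
    }
}
```

In this sketch, a caller would build the list of ColumnRule objects from a template file kept in Git and pass the file's lines to CsvTemplateValidator.validate; collecting all errors instead of stopping at the first one makes it easier to report every problem in an upload at once.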
To achieve durability, the Altoros team mapped the customer’s local disks to Amazon S3 storage. In case of a failure, data was read from this replication server.
To track changes in the BI module and detect system (or transformation) failures, the project team delivered the monitoring, logging, and notification functionality.
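The source gives no details on how monitoring, logging, and notification were built, so the snippet below is only a rough sketch of one common pattern: wrapping a transformation step so that its outcome is logged and a failure triggers a notification. The class name MonitoredStep and the notifier callback are hypothetical, and java.util.logging is used purely for illustration.

```java
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical wrapper around one ETL step; not the delivered implementation.
final class MonitoredStep {
    private static final Logger LOG = Logger.getLogger(MonitoredStep.class.getName());

    private final String name;               // step name used in logs and alerts
    private final Runnable body;             // the transformation itself
    private final Consumer<String> notifier; // assumed alert channel (email, chat, etc.)

    MonitoredStep(String name, Runnable body, Consumer<String> notifier) {
        this.name = name;
        this.body = body;
        this.notifier = notifier;
    }

    // Runs the step. Returns true on success; on failure, logs the error,
    // sends one notification, and returns false.
    boolean run() {
        long start = System.nanoTime();
        try {
            body.run();
            long millis = (System.nanoTime() - start) / 1_000_000;
            LOG.info("step '" + name + "' succeeded in " + millis + " ms");
            return true;
        } catch (RuntimeException e) {
            LOG.log(Level.SEVERE, "step '" + name + "' failed", e);
            notifier.accept("ETL step '" + name + "' failed: " + e.getMessage());
            return false;
        }
    }
}
```

Passing the notifier in as a callback keeps the step independent of any particular alert channel, which would also make such a wrapper easy to exercise in tests.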
The monitoring and UI modules were implemented as microservices, which simplified maintenance, support, scaling, and debugging.
The outcome
The customer now has a multi-tenant, highly scalable system with improved performance and durability. It can process gigabytes of data uploaded by different users simultaneously.
The solution also enabled the customer to cut time on system maintenance by simplifying ETL processes and making them transparent for developers.
Finally, end-user support was also enhanced: changes and failures are now detected faster, and the system has become more stable.