
Improving Maintenance and Performance of a Data Integration / ETL Module

Altoros helped a customer make its data analytics system durable, simplify maintenance, and cut the time spent on support.

Data-Driven Analytics
Java
MongoDB
MySQL

Summary

A provider of data analytics solutions wanted to simplify the ETL (Extract, Transform, and Load) processes of its data analytics platform. As a result of cooperation with Altoros, the customer gained:

  • A highly scalable multi-tenant system that can process gigabytes of data uploaded simultaneously by different users.
  • Less time spent on maintenance and customer support, thanks to monitoring and notifications that rapidly detect changes or failures.
  • Durability of all transactions, provided by a secure replication server.

The customer

The customer is a provider of software that enables organizations to efficiently govern external service delivery through the use of technology. The company builds solutions for telecom, banking, finance, and other industries.

The need

Although the customer successfully served clients with its analytics platform, the data integration part of its service needed to be streamlined. For example, the system had overcomplicated chains of ETL transformations, and the existing solution based on Pentaho DI did not scale linearly. In many cases, updates to transformation processes required extra workarounds. In addition, changes made within the BI part were not visible in the version control system. Finally, there was no monitoring, logging, or notification functionality. The customer therefore turned to Altoros for assistance.

The challenges

During the project, the team at Altoros had to address the following issues:

  • Incoming CSV files had to be validated against customizable templates described in a meta language. All template updates had to be tracked in the version control system (Git).
  • The FTP server was deployed to Amazon VPC on simple instances. If an instance failed during a transformation, some of the ETL data was lost, so durability was not ensured.

The solution

After a comparative study of a few data integration systems, Altoros suggested implementing Apache Camel to streamline the ETL process. On top of it, our engineers developed a framework that made Apache Camel easier for the customer’s team to work with.

Altoros’s experts delivered a Java-based validator that checked the CSV files against the template and converted them for further generation of OLAP cubes.
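To illustrate what template-driven CSV validation can look like, here is a minimal sketch in plain Java. It is not the validator delivered in the project: the class name `CsvTemplateValidator`, the `ColumnRule` type, and the rule format (a column name, a regular expression, and a "required" flag) are assumptions made for this example, and the customer's actual meta language is not shown in this case study.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: validates CSV lines against a column template. */
final class CsvTemplateValidator {

    /** Assumed rule format: expected header name, value pattern, and whether a value is mandatory. */
    record ColumnRule(String name, Pattern pattern, boolean required) {}

    /** Checks the header row and every data row; returns a list of readable errors (empty if valid). */
    static List<String> validate(List<ColumnRule> template, List<String> lines) {
        List<String> errors = new ArrayList<>();
        if (lines.isEmpty()) {
            errors.add("file is empty");
            return errors;
        }
        // Simplification: a plain comma split, so quoted fields containing commas are not supported.
        String[] header = lines.get(0).split(",", -1);
        if (header.length != template.size()) {
            errors.add("line 1: expected " + template.size() + " columns, found " + header.length);
            return errors;
        }
        for (int c = 0; c < header.length; c++) {
            if (!header[c].trim().equals(template.get(c).name())) {
                errors.add("line 1: column " + (c + 1) + " should be '" + template.get(c).name() + "'");
            }
        }
        for (int r = 1; r < lines.size(); r++) {
            String[] cells = lines.get(r).split(",", -1);
            if (cells.length != template.size()) {
                errors.add("line " + (r + 1) + ": expected " + template.size() + " columns, found " + cells.length);
                continue;
            }
            for (int c = 0; c < cells.length; c++) {
                ColumnRule rule = template.get(c);
                String value = cells[c].trim();
                if (value.isEmpty()) {
                    if (rule.required()) {
                        errors.add("line " + (r + 1) + ": '" + rule.name() + "' is required");
                    }
                } else if (!rule.pattern().matcher(value).matches()) {
                    errors.add("line " + (r + 1) + ": '" + rule.name() + "' has invalid value '" + value + "'");
                }
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<ColumnRule> template = List.of(
                new ColumnRule("id", Pattern.compile("\\d+"), true),
                new ColumnRule("amount", Pattern.compile("-?\\d+(\\.\\d+)?"), false));
        List<String> lines = List.of("id,amount", "1,10.5", "x,abc");
        validate(template, lines).forEach(System.out::println);
    }
}
```

Keeping the rules as data rather than code is what would allow templates to live in Git and change without redeploying the validator.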

To achieve durability, the Altoros team mapped the customer’s local disks to Amazon S3, so that, in case of a failure, data could be read from this replicated storage.
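The idea of reading from a replica after a failure can be sketched as follows. This is an illustration only, not the project's implementation: the `ReplicatedStore` class is hypothetical, and two local directories stand in for the instance's local disk and the S3-backed storage, since the case study does not describe how the mapping was done.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: every write goes to a replica as well, and reads fall back to it. */
final class ReplicatedStore {
    private final Path primary; // stands in for the instance's local disk
    private final Path replica; // stands in for the replicated (for example, S3-backed) storage

    ReplicatedStore(Path primary, Path replica) {
        this.primary = primary;
        this.replica = replica;
    }

    /** Writes to the replica first, so a copy exists even if the instance dies mid-transformation. */
    void write(String name, String content) throws IOException {
        Files.createDirectories(replica);
        Files.createDirectories(primary);
        Files.writeString(replica.resolve(name), content, StandardCharsets.UTF_8);
        Files.writeString(primary.resolve(name), content, StandardCharsets.UTF_8);
    }

    /** Reads from the primary when the file is there, otherwise from the replica. */
    String read(String name) throws IOException {
        Path local = primary.resolve(name);
        Path source = Files.exists(local) ? local : replica.resolve(name);
        return Files.readString(source, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path primary = Files.createTempDirectory("primary");
        Path replica = Files.createTempDirectory("replica");
        ReplicatedStore store = new ReplicatedStore(primary, replica);

        store.write("batch.csv", "id,amount\n1,10.5\n");
        Files.delete(primary.resolve("batch.csv")); // simulate losing the instance's local data
        System.out.print(store.read("batch.csv"));  // still readable from the replica
    }
}
```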

To track changes in the BI module and detect system (or transformation) failures, the project team delivered the monitoring, logging, and notification functionality.
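As a rough illustration of how failure detection and notification can wrap an ETL step, consider the sketch below. The `MonitoredStep` class and the `notifier` callback are assumptions made for this example. The case study does not say which logging or alerting tools were used, so the sketch relies only on `java.util.logging` and a plain callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;
import java.util.logging.Logger;

/** Hypothetical sketch: runs an ETL step, logs its duration, and sends a notification on failure. */
final class MonitoredStep {
    private static final Logger LOG = Logger.getLogger("etl");

    /**
     * Executes the step. On success the duration is logged; on failure the notifier
     * receives a message and the original exception is rethrown to the caller.
     */
    static <T> T run(String stepName, Supplier<T> step, Consumer<String> notifier) {
        long start = System.nanoTime();
        try {
            T result = step.get();
            long millis = (System.nanoTime() - start) / 1_000_000;
            LOG.info("step '" + stepName + "' finished in " + millis + " ms");
            return result;
        } catch (RuntimeException e) {
            LOG.severe("step '" + stepName + "' failed: " + e.getMessage());
            notifier.accept("ETL step '" + stepName + "' failed: " + e.getMessage());
            throw e;
        }
    }

    public static void main(String[] args) {
        List<String> alerts = new ArrayList<>(); // stands in for an e-mail or chat notification channel

        Integer rows = run("load", () -> 42, alerts::add);
        System.out.println("rows loaded: " + rows);

        try {
            MonitoredStep.<String>run("transform", () -> {
                throw new IllegalStateException("malformed row");
            }, alerts::add);
        } catch (IllegalStateException expected) {
            System.out.println("alerts sent: " + alerts);
        }
    }
}
```

Rethrowing after notifying leaves the decision to retry or abort with the calling pipeline, while the failure is still reported.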

The monitoring and UI modules were implemented as microservices, which simplified maintenance, support, scaling, and debugging.

The outcome

The customer gained a multi-tenant, highly scalable system with improved performance and durability. It can now process gigabytes of data uploaded by different users simultaneously.

The solution also enabled the customer to cut time on system maintenance by simplifying ETL processes and making them transparent for developers.

Finally, end-user support was also enhanced: changes and failures are now detected faster, and the system has become more stable.

Technology stack

Client platform / app server: Tomcat
Programming languages: Java, MDX
Frameworks and tools: Pentaho Data Integration, Apache Camel
Databases: MongoDB, MySQL
