“Design and implement a new data source optimized for ultra-fast in-memory loading in Atoti, exploring innovative formats and advanced data processing techniques”
Introduction
ActiveViam has shown that its proprietary software, Atoti, is the leading technology for interactively analysing datasets ranging from gigabytes to terabytes. Atoti can harness the largest machines on cloud platforms and in on-premise data centers, putting all of their CPUs to work at blazing speed on terabytes of RAM filled with client data.
Atoti is delivered to clients as a set of libraries, which are the building blocks for their projects. These libraries cover the setup of the data model (tables and joins) and the creation of new metrics and axes for analysis. They also include utilities for loading data into a project and various ways of querying that data.
Clients can build applications on their own. They often engage ActiveViam consultants to build advanced features, then integrate and maintain those features over the years without further intervention from ActiveViam teams.
In this context, ActiveViam is looking for ways to help its clients deploy, monitor and maintain Atoti applications easily.
Expected Work
During this internship, you will be responsible for creating a new optimized data source for Atoti, with the goal of reducing in-memory loading time until it is limited essentially only by disk read speed. Unlike data sources built on traditional formats, this new data source will be able to bypass several limitations, such as:
- Key duplication checks
- Dynamic data partitioning
- Constraints imposed by linear formats
You will explore different approaches, including testing existing data formats (e.g., Apache Arrow) or designing a custom format better suited to Atoti’s needs. You will also conduct benchmarks on these different solutions to compare performance in terms of loading speed and memory management.
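To give an idea of what a first benchmark baseline might look like, here is a minimal sketch in plain Java. It does not use any Atoti API: the class name RawColumnLoadSketch, its methods and the file layout (a single column of little-endian 64-bit integers with no header) are assumptions made purely for illustration. It writes a column to disk, loads it back through a memory-mapped file with one bulk copy, and reports the observed throughput.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustrative sketch only: a hypothetical raw column file, not an Atoti format. */
public class RawColumnLoadSketch {

    /** Writes the values as consecutive little-endian longs, with no header. */
    static void writeColumn(Path file, long[] values) throws IOException {
        ByteBuffer buffer = ByteBuffer
                .allocate(Math.multiplyExact(values.length, Long.BYTES))
                .order(ByteOrder.LITTLE_ENDIAN);
        // The long view shares the buffer's content; the byte buffer's own
        // position stays at 0, so it is ready to be written in full.
        buffer.asLongBuffer().put(values);
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
        }
    }

    /**
     * Memory-maps the file and copies it into a long[] in one bulk operation.
     * A single mapping is limited to Integer.MAX_VALUE bytes, so a real
     * loader would have to map larger files in several chunks.
     */
    static long[] loadMapped(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            long[] values = new long[Math.toIntExact(size / Long.BYTES)];
            mapped.order(ByteOrder.LITTLE_ENDIAN).asLongBuffer().get(values);
            return values;
        }
    }

    /** Converts a byte count and a duration in nanoseconds to GiB per second. */
    static double gibPerSecond(long bytes, long nanos) {
        return (bytes / (double) (1L << 30)) / (nanos / 1e9);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("column", ".bin");
        file.toFile().deleteOnExit();

        long[] values = new long[1_000_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i;
        }
        writeColumn(file, values);

        long start = System.nanoTime();
        long[] loaded = loadMapped(file);
        long elapsed = System.nanoTime() - start;

        System.out.printf("Loaded %d values at %.2f GiB/s%n",
                loaded.length, gibPerSecond((long) loaded.length * Long.BYTES, elapsed));
    }
}
```

A single timed run like this mostly measures the operating system's page cache, since the file has just been written. A meaningful benchmark would use files larger than RAM or a cold cache, repeat the measurement, and compare against the same data stored in formats such as CSV, Parquet or Arrow.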
Objectives
- Propose and design a new format or data loading technique optimized for maximum performance in Atoti.
- Evaluate existing data formats and benchmark them against current solutions.
- Set up benchmarks to measure the impact of this new data source on loading times and data management.
- Suggest technical improvements based on benchmark results.
Technologies
- Java
- I/O APIs
- File systems
- Data formats (CSV, Parquet, Avro, etc.)
- JVM performance tuning
- Cloud benchmarking
Internship details
This internship is based in Paris and will last 5 to 6 months. It may lead to a full-time position within our R&D team, giving you the opportunity to integrate your work directly into the Atoti platform.
About ActiveViam
ActiveViam provides business users with instant insight into large volumes of fast-moving data for timely and context-aware decision-making.
Founded in 2005, ActiveViam employs over 150 people across its five offices in New York, London, Paris and Singapore. We expect sustained growth in 2025 and will continue hiring top talent from the best schools.
