As data volumes and market complexity have increased, the financial services industry is ever more reliant on technologies that can quickly and seamlessly help them perform data analytics.
Performing calculations on hundreds of millions of rows of fast-moving data in real-time exceeds mainstream database capacities.Traditional databases cannot handle dynamic bucketing, non- linear, multi-step aggregations, statistics over simulations and time series in real time.
What’s more, identifying technologies that can handle these tasks is difficult. Standard analytical benchmarks such as those found on the TPC do not support this.
This makes public benchmarks focused on fintech applications of this complexity extremely useful, and this is also why a recent blog post by Altinity on calculating Value-at- Risk (VaR) caught our eye. In that article, the ClickHouse database is compared to the AWS Redshift database for a VaR aggregation use case – that being non-linear aggregation of historical returns. In order to perform at interactive speed, that type of aggregation requires native support for array aggregation. Redshift is a popular general purpose data warehouse but does not support array aggregation and, therefore, cannot really be compared to more specialized technologies.
ActiveViam has been performing VaR aggregation on terabytes of data for nearly two decades for dozens of multinational banks and use cases ranging from front-office PnL to credit, market and liquidity risk as well as solving for regulatory requirements.
In the blog, Altinity invited some “friendly competition”. Respectfully, we’re up for the challenge.
We ran the exact same benchmark using Atoti+ and found that Atoti+ (as well as its free community edition atoti) is not only faster, but less expensive to run. Specifically:
- The Atoti+ engine appears consistently five times faster than the ClickHouse engine.
- Atoti+ appears to offer similar or better performance than ClickHouse, while running on an eight-times less expensive server.
Atoti+ and atoti, both powered by the same underlying ActivePivot technology, offer advanced data modelling features including native array aggregation and multidimensional cubes. Atoti+ does not require users to install, configure and manage a database like ClickHouse. It can work directly on the data like other popular Python libraries such as pandas.
Now we will show you how we ran the same benchmark referenced in the Altinity blog using Atoti+.
We use the same data generator that produces a dataset with:
- 1,720,000 rows
- One array of 1000 historical returns per row
- About 7GB in size
We load it into Atoti+and define the “VaR” measure. It’s all contained in this small notebook.
(This is a very trivial way to calculate VaR. If you’re interested in more sophisticated analytics and what-if analysis please visit the public atoti/Atoti+ Python repository of fintech use cases at https://github.com/atoti/notebooks/tree/master/notebooks).
Then we benchmark the four queries described in the reference blog post, reusing the VaR measure. Atoti+ is running on an AWS m5.8xlarge instance, in the same condition as the ClickHouse benchmark.
cube.query(m["ValueAtRisk"], levels=[lvl["str0"], lvl["str1"], lvl["int10"], lvl["int11"], lvl["dttime10"], lvl["dttime11"]])
cube.query(m["ValueAtRisk"], levels=[lvl["str0"], lvl["str1"], lvl["str2"],lvl["str3"], lvl["int10"], lvl["int11"], lvl["int12"], lvl["int13"], lvl["dttime10"], lvl["dttime11"], lvl["dttime12"], lvl["dttime13"]])
cube.query(m["PnL at index"], levels=[lvl["str0"], lvl["Scenarios"]], condition=(lvl["str1"] == "KzORBHFRuFFOQm"))
Here the queries are described in the Atoti+ Python API, a convenient way to run the benchmark directly from a notebook. Atoti+ also embeds an online user interface similar to Tableau and PowerBI, so in real life end users would start analysing VaR immediately from their browser, without having any query to write and without any additional software to deploy.
|Q1||720 ms||121 ms||6 x|
|Q2||1040 ms||194 ms||5.4 x|
|Q3||1050 ms||223 ms||4.7 x|
|Q4||450 ms||65 ms||7x|
The Atoti+ engine is consistently five times faster for this use case than the ClickHouse engine, according to the Altinity test.
Behind that performance is special memory management, multicore processing, query compilation and proven performance in working with the largest financial institutions in the world that have been pushing the Atoti+engine in production for years. If you want to know more, you can watch this JavaOne presentation that discloses some of the secrets.
While a traditional database is sticky to its underlying hardware and persistent storage, Atoti+ with its Python library can be used on demand and anywhere, with a clean decoupling between storage and compute.
Running the benchmark with Atoti+on a smaller AWS instance: m5.xlarge (4 CPUs, 16GB RAM, 0.19$/hour)
|Q1||720 ms||663 ms|
|Q2||1040 ms||798 ms|
|Q3||1050 ms||877 ms|
|Q4||450 ms||110 ms|
Atoti+ is faster or equivalent to ClickHouse, while running on an eight-times less expensive server.
Running the benchmark with Atoti+ on the same m5.8xlarge instance than ClickHouse, but duplicating the dataset 6 times:
- 10,000,000 rows
- One array of 1000 historical returns per row
- About 42GB in size
|ClickHouse7GB dataset||Atoti+42GB dataset|
|Q1||720 ms||610 ms|
|Q2||1040 ms||740 ms|
|Q3||1050 ms||770 ms|
|Q4||450 ms||90 ms|
In sum, Atoti+can process a dataset six times larger than ClickHouse in less time and for the same cost. If you are interested, you can easily try it for yourself by reproducing this test with atoti, the free community version of Atoti+.
Contact us if you’d like more information on this use case and on this benchmark.