In this article, we detail how we solved a computationally intensive high-cardinality aggregation problem in asset management. The use case was the interactive calculation of portfolio performance metrics from 8 years-worth of the most granular daily position-level valuations. 

The challenge is not just one of volume: it is about enabling flexible analytics at this scale. Most technologies lose flexibility on a large scale and slow down the process, but we were able to deliver a truly interactive experience.

Key to our success was choosing an appropriate model for the data: we stored daily market values and cash flows as vectors in Atoti+ to use CPUs efficiently, and this way we achieved impressive calculation speed.

Faster queries through data vectorization

The goal of the project was to implement on-the-fly calculations of investment performance metrics, which are mathematically complex and have to be performed on large data sets. We had to demonstrate fast interactive recalculation (see sample queries and execution time below) and efficient usage of available hardware.

The client wanted to display interactively time-weighted returns, excess returns, contribution, tracking errors, money-weighted returns (internal rate of return), and contributions of any individual position or their groups during arbitrary time windows. The computation had to happen on-the-fly from daily market values and cash flows of individual transactions. This was the only way to guarantee the required level of precision of the return calculation and flexibility for analysis. To make the formulae a little more sophisticated still, rounding had to be emulated, and errors allocated to certain positions per industry convention. 

The sample data set consisted of 8 years-worth of daily observations for a portfolio consisting of 100,000 positions. Some of the investments were in funds which necessitated digging into and decomposing the underlying instruments for analysis. This is how we ended up with 60 billion data points that needed to be efficiently processed and aggregated on-the-fly. How did we fare?

Sample queries and execution time

QueryResponse time
Top 10 securities based on contribution to return 1.4 sec
Return – 3M, 6M, since inception – and excess return by portfolio, region and country1.6 sec
Daily return time series chart 1.3 sec

User experience: Fast calculation enables truly interactive analytics

From the top-down portfolio KPIs – and per arbitrary time windows – users can visualize returns computed per various methodologies and benchmarked against desired indices decomposed into individual names and positions and their attributes – all through an interactive dashboard.

In this recording, you can see time-weighted returns for the investments and for the benchmark, then filtered by portfolios. Since the return is computed from the most granular-level data every time we change the view the user is free to evaluate the investment performance where he or she wants and attribute it to instruments, periods and/or investment decisions in an explorative way.

User experience: Fast calculation enables truly interactive analytics

Vectorize the data where possible!

In his post “To vectorize or not to vectorize?”, Sam Brown, ActiveViam Solutions Architect discusses the benefits of vectorizing the data. 

It’s not always obvious however that certain data can be vectorized. Here are a few examples from fintech use cases, and how they can be modelled as vectors:

  • Monte-Carlo simulations, for instance, derivatives pricing and valuation adjustments (xVA) or capital models, such as FRTB IMA DRC – can be vectorized along scenarios
  • Time-series data, backtesting, model eligibility tests, risk factor modellability  can all be vectorized along past dates
  • Liquidity management can be vectorized along future dates
  • Time buckets can be vectorized along tenors 

Later in this post we provide a reference data model for time-series of daily portfolio valuations and explain how it helped us: 

  • Boost the performance of the high-cardinality aggregation
  • Save memory 
  • Efficiently expand funds into underlying constituents .

In order to explain what brings the benefits listed above, let’s discuss each of them briefly. 

Efficient CPU usage

The time-weighted return calculation required multiplying contributions for each of the 2,920 dates – this is an example of high-cardinality aggregation. With a flat representation of the data, the algorithm would be operating with one date at a time.

Since Atoti+ supports operations on vectors, it can leverage modern processor architecture. When the data is vectorized, the aggregation is computed on a set of values simultaneously as opposed to operating on a single value. This leads to a better query time as seen above.

Memory optimization

The easiest way to see how vectorization helps reduce memory footprint of your data is to look at an example. 

Below is the memory usage for the daily observations data before and after we vectorized them over 2,920 historical dates. 

Before

After

With the flat model described below we had around 60 Billion rows of base data taking 72 bytes each, which leads to ~4.32TB of direct memory footprint

With the vectorized model we had 60 Billion / 2920 = 20 Millions rows of base data, containing for each row 24 036 bytes which leads to ~480 GiB of direct memory to store the observations

Flat data model

  • As-of-Date – LocalDate Integer, 12 bytes 
  • Position ID – Integer, 4 bytes
  • Benchmark ID – Integer, 4 bytes
  • Decomposition ID – Integer, 4 bytes
  • Constituent ID – Integer, 4 bytes
  • Market Value EOD – Double, 8 bytes
  • Market Value SOD – Double, 8 bytes
  • Cashflow – Double, 8 bytes
  • 5 References to other store – 1 Integer, reference, 20 bytes

Vectorized data model

  • Position ID – Integer, 4 bytes
  • Benchmark ID – Integer, 4 bytes
  • Decomposition ID – Integer, 4 bytes
  • Constituent ID – Integer, 4 bytes
  • Market Value EOD – DoubleArray 8 bytes * 2920,
  • Cashflow –  DoubleArray 8 bytes * 2920 |
  • 5 References – to other store 1 Integer / reference, 20 bytes 

So by vectorizing the data we have reduced the memory footprint by a factor of 8! 

On-the-fly fund decomposition

Allocating the time-weighted-return to the underlying instruments means that each of the fund inputs (20% of input data) has to be broken down between 100 to 2,000 (1,000 on average) contributors – that equates to 60 bln data points for the instrument-level observation. In our experiment we didn’t store the instrument-level observations. Instead we generated them on-the-fly – only when needed, i.e. when the user wants to analyze the return by an instrument attribute.

Opting for the on-the-fly fund decomposition, we saved a lot of direct memory (800 GB per 3 numerics field). However when the instrument-level decomposition is generated, we still have to deal with 60 bln points – this is where the vector operations described above are particularly advantageous.   

Conclusion

In this article we showed how vectorization has helped to solve a computationally intensive analytical challenge – computing investment return on-the-fly from 60 bln data points. As Atoti+ is optimized for vector operations, we were able to achieve a truly interactive analysis for the end users enabling them to re-evaluate and visualize investment performance metrics for any scope of data they need.

Contact us at sales@activeviam.com if you wish to learn more about this use case and the technology.