In a previous post comparing multidimensional and relational databases we mentioned that the decision making imperatives in the Big Data era were disrupting the clear-cut border between OLTP and OLAP, enabling a new type of mixed workload database that addresses both needs.
This post takes a closer look at mixed workload systems – what they are, how they work, and what are they useful for.
What is a mixed workload database
There used to be a single state of mind resulting from the technological trenches that engineers have built over the years. According to this single state of mind, enterprise data management is handled by distinct systems: OLTP and OLAP. Day to day operational data is handled by Online Transaction Processing (OLTP) systems, which are optimized to support high-volume transaction processing and allow users to retrieve and modify records in real-time. OLTP databases are generally row-stores with row-level locking. Whilst they work well for small updates and look ups, they are simply inefficient for massive scans and aggregations. At the other end of the spectrum are Online Analytical Processing (OLAP) systems, which are designed to support decision making over massive amounts of data. OLAP databases are usually column stores that store, compress and sort the data so that scans are very fast. However they are read-only and you cannot update them in real-time.
Simply put, mixed workload databases were invented to reconcile both worlds.
A mixed workload database is capable of handling both OLAP and OLTP workloads in one system. It has the ability to record data updates through transactions, and at the same time performs large, intensive analytical queries.
The most advanced mixed workload databases can process short transactions, long transactions, short queries and complex queries at the same time. No need to wait for a transaction to be committed before starting a new transaction and vice versa. A significant advantage of such a system is that it removes the latency between the transactional environment where data is created, and the analytical environment where data is explored. As a result, users are able to interactively explore data that is updated incrementally in real-time.
How does a mixed workload solution work
For a mixed workload database to actually deliver on its promise, a mix of a few ingredients is needed.
In-memory database engine, running on multicore CPUs. This enables fast data access and the necessary performance and response time to run analytical queries (OLAP) directly on operational data, in addition to processing load from transactional queries (OLTP).
Multi-threaded algorithms. This maximizes the usage of all computer cores and enables processing both transactions and queries simultaneously to deliver the best possible throughput and response time.
An updateable column store. This stores the underlying facts so that they are easily updateable and quickly accessible for multidimensional queries.
Mixed Workload Sweet Spot and use cases
Mixed workload systems provide the most value for business analytics applications where speed and data freshness are critical. For example, with ActivePivot, our in-memory mixed workload database, we implemented this concept of transactions/analytics in treasury and capital markets – probably the industry with the most demanding users when it comes to real-time decision making on fast moving data sets.
One specific example is a front office risk analytics applications used by traders, where the mixed workload implementation allows traders to analyse their risk exposure at any time across many dimensions (currency, counterparty…) and with various levels of aggregation (at portfolio level or at the most granular level of the trade id). Typically risk intelligence also needs to be propagated at various levels in the organisation in real-time. A mixed workload system will allow both the desk manager and the head of trading to see aggregated risk data across all trading desks and all risk class assets with instant updates as market data change.
Another business segment where speed and data freshness are critical is ecommerce. With real-time analysis capabilities, one of our clients, a large e-commerce retailer, can assess the performance of prices on thousands of references and instantly modify prices throughout the day as a means to optimize margins or grow their revenues. You can see more details in our ecommerce demo.
‘What-if’ simulation is another key benefit of a mixed workload system. In traditional OLAP systems, what-if simulations are usually conducted on a set of test data, but not on live data – since users cannot change the data that is available in the analytics environment (aka the cube). A true mixed workload database allows huge data sets of real-time data to be used as a basis for what-if simulations. This provides a significant advantage for impact analysis and alternative scenario evaluation. For example a trader can perform “pre-deal checking,” a process by which he is able to gauge the impact of a new trade on the overall bank’s exposure before it is executed.
Today, many vendors are jumping on the mixed workload bandwagon and rewriting their software. Very few vendors (Yes, we are one of them) designed their aggregation engine from the ground up as a mixed workload system. To take a deeper look, you’re welcome to view one of the ActivePivot demonstrations.
But isn’t this discussion already outdated? From the moment you perform analytics on real-time data, you naturally want to go one step further: Work with continuous queries that automatically propagates updates and also have a real-time push mechanism that proactively alerts the users whenever certain conditions are met.
So what is needed is in fact is a “mixed-mixed workload” database combining OLAP, OLTP and the CEP (Complex Event Processing) element. That will be the subject of the next post.