Cassandra Materialized Views: A High-Level Overview

In this blog post we will look at:
- The internal implementation of mviews, and the major differences between write operations on a regular table and on a table with a materialized view
- When and when not to use materialized views
Materialized views (MViews):
In Cassandra, mviews are CQL tables built on top of a regular (base) CQL table but with a different partition key. They denormalise data automatically, leaving the maintenance work to the Cassandra server.
A materialized view replaces the manual approach to denormalisation, in which the same data set is written to several shadow tables and application logic has to keep that group of tables in sync.
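To make "a different partition key" concrete, here is a toy in-memory model in Python (not Cassandra code). The names base_table, mview_id and build_view are hypothetical; in Cassandra the view would be declared with CREATE MATERIALIZED VIEW and maintained by the server rather than rebuilt by hand. As in a real view, the base key (id) remains part of the view's key, here as the inner dictionary key.

```python
from collections import defaultdict

# Hypothetical base table keyed by `id` (its partition key).
base_table = {
    "X": {"id": "X", "mview_id": "AA", "name": "alice"},
    "Y": {"id": "Y", "mview_id": "BB", "name": "bob"},
    "Z": {"id": "Z", "mview_id": "AA", "name": "carol"},
}

def build_view(base, view_key):
    """Re-key base rows by `view_key`; the base key stays in the view's key."""
    view = defaultdict(dict)
    for base_id, row in base.items():
        # partition = value of view_key, clustering = original base key
        view[row[view_key]][base_id] = row
    return dict(view)

view = build_view(base_table, "mview_id")
# Querying by mview_id is now a single-partition lookup rather than a scan.
print(sorted(view["AA"]))  # ['X', 'Z']
```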
This sounds great, but how does Cassandra maintain this internally?
Well, let's look at it in detail below.
MView internal implementation details:
Let's compare how an insert/update/delete operation works on a regular table and on a table with an mview. I'll walk through the update procedure; the same applies to the other DML operations.
Table Update procedure
- The application submits an update request to the coordinator with CL=LOCAL_QUORUM and RF=3.
- The coordinator sends the update request to the 3 corresponding replicas.
- The coordinator waits for acknowledgements from 2 of the replicas.
- Once 2 replicas have acknowledged, the coordinator sends a success message to the client application (job done).
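The regular write path above can be sketched as follows, assuming a single data centre so that LOCAL_QUORUM means floor(RF/2) + 1 of the local replicas. The function names are hypothetical, and timeouts, hints and retries are deliberately left out.

```python
def local_quorum(rf: int) -> int:
    """Quorum among the local data centre's replicas: floor(RF / 2) + 1."""
    return rf // 2 + 1

def coordinator_write(replica_acks, rf: int = 3) -> bool:
    """Simulate the coordinator waiting for LOCAL_QUORUM acknowledgements.

    `replica_acks` lists, in arrival order, whether each replica applied
    the write (True) or failed/timed out (False).
    """
    needed = local_quorum(rf)
    acked = 0
    for ok in replica_acks:
        if ok:
            acked += 1
            if acked == needed:
                return True  # reply to the client without waiting for the rest
    return False

print(local_quorum(3))                          # 2
print(coordinator_write([True, False, True]))   # True
print(coordinator_write([True, False, False]))  # False
```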
MView Update procedure
- The application submits an update request to the coordinator with CL=LOCAL_QUORUM and RF=3:
update base_table set mview_id='XX' where id='X';
- The coordinator sends the update request to the 3 corresponding base table replicas.
- Each base table replica acquires a local lock.
- Each base table replica reads the current value for the given update (select * from base_table where id='X') and prepares a batch log with the following content:
delete from mview where mview_id=old_value and id='X';
insert into mview(mview_id, id, ...) values ('XX', 'X', ...);
- Each base table replica applies the update locally and releases its lock; the view updates from the batch log are delivered to the view replicas asynchronously.
- The required number of base replicas (2 for LOCAL_QUORUM with RF=3) send acknowledgements to the coordinator.
- On receiving those acknowledgements, the coordinator sends the response back to the application (job done).
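A minimal sketch of the per-replica steps above, assuming a single view whose partition key is mview_id. The BaseReplica class is hypothetical: a threading.Lock stands in for Cassandra's local lock, a Python list stands in for the batch log, and replay_batchlog() stands in for the later, asynchronous delivery of view updates. Only changes to the view key are modelled.

```python
import threading

class BaseReplica:
    """Hypothetical base-table replica maintaining one view keyed by mview_id."""

    def __init__(self):
        self.base = {}                # base table: id -> mview_id
        self.view = set()             # view rows: (mview_id, id)
        self.batchlog = []            # pending view mutations
        self.lock = threading.Lock()  # stands in for the local lock

    def update(self, base_id, new_mview_id):
        with self.lock:                               # acquire the local lock
            old = self.base.get(base_id)              # read-before-write
            mutations = []
            if old != new_mview_id:
                if old is not None:
                    mutations.append(("delete", (old, base_id)))
                mutations.append(("insert", (new_mview_id, base_id)))
            self.batchlog.extend(mutations)           # record view mutations
            self.base[base_id] = new_mview_id         # apply the base update
        # Lock released here; the view itself is not updated yet.
        return mutations

    def replay_batchlog(self):
        """Apply pending view mutations, as the later batch log replay would."""
        for op, row in self.batchlog:
            if op == "delete":
                self.view.discard(row)
            else:
                self.view.add(row)
        self.batchlog.clear()

replica = BaseReplica()
replica.update("X", "AA")
replica.replay_batchlog()
print(replica.update("X", "XX"))
# [('delete', ('AA', 'X')), ('insert', ('XX', 'X'))]
print(replica.view)   # {('AA', 'X')}  (stale until the batch log is replayed)
replica.replay_batchlog()
print(replica.view)   # {('XX', 'X')}
```

The gap between update() returning and replay_batchlog() running is where a reader of the view can still see the old row.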
Comparing the two procedures above:
- MView-based operations add performance overhead on the base table: every base write now needs a local lock, a read-before-write and a batch log. Write throughput on the base table therefore drops, and this gets worse with each additional mview (roughly an N-fold drop in throughput and increase in latency, where N is the number of mviews).
- Mviews can occupy more space than the base table because of the deletes (tombstones) that each change to the view key generates.
- Batch writes to the base table can hurt performance even more, depending on the size and number of operations in the batch.
- A poorly chosen view partition key with low cardinality creates hot spots, stressing only a few nodes in the cluster.
- Mviews can become inconsistent with the base table because the batch log is replayed asynchronously from the base table replicas.
- When a base table partition accumulates more than 100k tombstones for any reason, Cassandra can no longer sync subsequent updates to the mview, which can lead to OOM errors, long GC pauses and repair issues.
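To see why the cost grows with the number of views, here is a rough counting model (an illustration, not a benchmark) of the work one base update triggers on a base replica. It assumes that when the view key changes, each view needs a delete of the old row plus an insert of the new one; the function name is hypothetical.

```python
def work_per_base_update(num_views: int, key_changed: bool = True) -> dict:
    """Rough count of operations one base-table update triggers on a base replica.

    Toy model: if the view key changes, each view needs a delete of the old
    view row (a tombstone) plus an insert of the new one; otherwise one update.
    """
    per_view = 2 if key_changed else 1
    return {
        "local_reads": 1 if num_views > 0 else 0,  # read-before-write
        "base_writes": 1,
        "view_mutations": num_views * per_view,
        "view_tombstones": num_views if key_changed else 0,
    }

for n in (0, 1, 3):
    w = work_per_base_update(n)
    print(n, w["view_mutations"], w["view_tombstones"])
# 0 0 0
# 1 2 1
# 3 6 3
```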
How select operations work with mviews
- Select operations on mviews work like those on regular tables, but if any of the problems above come into play, expect stale data, latency issues and so on.
When to use mview:
- An mview is a good candidate when the base table column chosen as the view's partition key has high cardinality (ideally one row per partition) and the data sees few or no updates (write once, read many). Even then, account for the additional overhead and performance penalty on the base table. (Please note that this feature is disabled by default in C* 4.0.)
When not to use mview:
- Don't use mviews when the data has low cardinality and is expected to receive many updates.
- As explained above, low-cardinality data can create hot spots in the cluster, and frequent update statements can cause unexpected data growth.
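Before creating a view, one way to check cardinality is to look at how rows would spread across the candidate view partition key. This sketch uses hypothetical sample values and a hypothetical function name; what counts as "too skewed" is a judgement call for your own cluster.

```python
from collections import Counter

def view_partition_skew(candidate_keys):
    """Summarise how rows would spread across view partitions.

    `candidate_keys` holds the value of the proposed view partition key
    column for each base row.
    """
    counts = Counter(candidate_keys)
    if not counts:
        return {"partitions": 0, "rows": 0, "largest_share": 0.0}
    rows = sum(counts.values())
    return {
        "partitions": len(counts),
        "rows": rows,
        "largest_share": max(counts.values()) / rows,
    }

# Low cardinality: a status column with two values -> one huge partition.
statuses = ["active"] * 8 + ["closed"] * 2
print(view_partition_skew(statuses))
# {'partitions': 2, 'rows': 10, 'largest_share': 0.8}

# High cardinality: one row per user id -> rows spread evenly.
user_ids = [f"user{i}" for i in range(10)]
print(view_partition_skew(user_ids))
# {'partitions': 10, 'rows': 10, 'largest_share': 0.1}
```

A large largest_share means most view rows land in one partition, and therefore on the same few replicas.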
Conclusion
Materialized views would be a great feature if they worked without the side effects described above, but they don't. So use mviews with caution (and avoid them if possible). Also note that the materialized views feature in Cassandra is still experimental.
In version 4.0 the feature is disabled by default, under the experimental features section of cassandra.yaml (enable_materialized_views: false). Check this setting before upgrading.
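A small pre-upgrade check along these lines can flag the setting. It assumes the 4.0 key name enable_materialized_views and, as an assumption, materialized_views_enabled for later releases. To stay within the standard library it only scans top-level `key: value` lines instead of fully parsing YAML; the function name is hypothetical.

```python
import re

# 4.0 name first; the second name is assumed for later releases.
MV_KEYS = ("enable_materialized_views", "materialized_views_enabled")

def mv_setting(yaml_text: str):
    """Return True/False if an MV flag is set at top level, else None.

    None means the flag is absent, so the version's default applies
    (disabled by default in 4.0).
    """
    for line in yaml_text.splitlines():
        # Top-level keys start at column 0; comment lines start with '#'.
        match = re.match(r"([A-Za-z_]+)\s*:\s*(\S+)", line)
        if match and match.group(1) in MV_KEYS:
            return match.group(2).lower() == "true"
    return None

sample = """
# Enables materialized view creation on this node.
enable_materialized_views: false
"""
print(mv_setting(sample))                      # False
print(mv_setting("cluster_name: 'Test'\n"))    # None
```

Run it against the cassandra.yaml of each node you plan to upgrade, and compare the result with whether any views already exist in your schema.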