Every so often, the question of wide vs narrow partitions comes up. Typically someone has read that wide partitions can be a problem for performance. Occasionally it goes so far as to trip up the data modeling process: the analyst tries to create a performant data model but is afraid that the partitions on a couple of tables would be too wide, leading to problems down the road.
The generally accepted wisdom is that narrow partitions are more performant than wide ones. The problem is that no one seems to know how much of a factor partition width really is, or which operations it affects.
This entry is all about getting some concrete answers on partition width: how it affects inserting, selecting, updating and deleting data, by how much, and most importantly, how big a deal a table's partition width really is.
In DataStax, all tables have a primary key. Every primary key is composed of a mandatory partition key (containing one or more columns) and optional clustering columns. The entire primary key (partition key + clustering columns) defines a unique row in a table.
When data is written, it’s the partition key which is first used to figure out which node will contain the data – that’s why it’s mandatory. The data in a partition key is hashed (using the Murmur3 algorithm) and the resulting value is used to determine which node will store the data. During read operations, supplying a partition key allows DataStax to again generate a hash and quickly determine which node contains the required data. There are a couple of important things to note here. First, by using clustering columns, multiple rows of data can be associated with a single partition key. Second, all data for a given partition key must be stored together on the same node.
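The routing idea can be sketched in a few lines of Java. This is only an illustration: Cassandra actually uses its own Murmur3 partitioner and a token ring with configurable ownership, so CRC32 and the even three-way split below are stand-ins to keep the sketch self-contained. The point is that hashing is deterministic, so the same partition key always routes to the same node.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class TokenRouting {
    // Hash the partition key to a token. Cassandra uses Murmur3 for this;
    // CRC32 here is only a stand-in so the sketch needs no dependencies.
    static long token(String partitionKey) {
        CRC32 crc = new CRC32();
        crc.update(partitionKey.getBytes(StandardCharsets.UTF_8));
        return crc.getValue(); // always non-negative (0 .. 2^32-1)
    }

    // Map the token onto one of the nodes. A real ring assigns token
    // ranges to nodes; an even modulo split illustrates the same idea.
    static int nodeFor(String partitionKey, int nodeCount) {
        return (int) (token(partitionKey) % nodeCount);
    }

    public static void main(String[] args) {
        // The same partition key always hashes to the same token, so a
        // read can go straight to the owning node without a scan.
        System.out.println(nodeFor("TX", 3) == nodeFor("TX", 3)); // true
    }
}
```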
A narrowly partitioned table is simply a table that either doesn’t have any clustering columns in its primary key (so each partition key value will identify a single row) or where the number of records returned for a given partition key value is fairly small. For example, a table of addresses where the partition key is ‘street_address’ and the clustering column is ‘state’. There would be millions of distinct partition keys, with a few having multiple rows where the same street address exists in more than one state.
A widely partitioned table has at least one clustering column, and the number of records identified by a given partition key is large. For example, a list of U.S. addresses where the partition key is ‘state’ and the clustering column is ‘street_address’. There would be relatively few distinct partition keys, each containing millions of addresses.
The issue isn’t DataStax finding the node with the data – the partition key always does that. It’s how much additional work DataStax has to do in either writing new data into an existing partition or reading through a partition to find specific rows.
In order to test the effect of wide vs narrow partitions, I devised a table called partition_test as follows:
```
CREATE TABLE partition_test (
    id int,
    sub_id int,
    sentence text,
    PRIMARY KEY ((id), sub_id));
```
The primary key for this table has a single column partition key (id) and a single clustering column (sub_id). If every row we add has a unique id, each row will have a unique partition key. If we repeat the id and have a unique sub_id, we will have multiple rows with the same partition key. The text column (sentence) is randomly populated with text data and is simply stored in the partition.
To do the actual test and gather metrics, I wrote a Java program which allows me to vary the number of partition key values to be generated and how many clustering key values should be generated for each partition key. This allows me to try out a number of partition width scenarios at varying dataset sizes.
For each run, the java program did the following:
- Connect to the cluster.
- Drop and create the testing keyspace.
- Create the partition_test table.
- Generate an ArrayList containing the specified combination of ids and sub_ids.
- Time how long it takes to insert each id/sub_id combination.
- Shuffle the ArrayList and time how long it takes to update each id/sub_id combination.
- Shuffle the ArrayList and time how long it takes to select each id/sub_id combination.
- Shuffle the ArrayList and time how long it takes to delete each id/sub_id combination.
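The skeleton of those steps can be sketched as follows. The class and method names here are my own, and the per-row CQL statement (insert, select, update or delete, executed through the driver in the real program) is stubbed out as a callback, since this sketch has no cluster to talk to.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

public class PartitionTestHarness {

    // One id/sub_id combination, mirroring partition_test's primary key.
    static class Combo {
        final int id, subId;
        Combo(int id, int subId) { this.id = id; this.subId = subId; }
    }

    // Generate every id/sub_id combination for the requested widths:
    // partitionKeys distinct ids, each with rowsPerKey clustering values.
    static List<Combo> generateCombos(int partitionKeys, int rowsPerKey) {
        List<Combo> combos = new ArrayList<>(partitionKeys * rowsPerKey);
        for (int id = 0; id < partitionKeys; id++) {
            for (int subId = 0; subId < rowsPerKey; subId++) {
                combos.add(new Combo(id, subId));
            }
        }
        return combos;
    }

    // Shuffle the list, then time how long one phase takes over every
    // combination. The action is where the real program executes CQL.
    static long timePhaseNanos(List<Combo> combos, Consumer<Combo> action) {
        Collections.shuffle(combos);
        long start = System.nanoTime();
        for (Combo c : combos) {
            action.accept(c);
        }
        return System.nanoTime() - start;
    }
}
```

Shuffling before the select, update and delete phases keeps the access pattern random rather than rewarding whatever order the rows were inserted in.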
The program was run a total of 35 times in five sets. Within each set, every run used the same number of total rows; across the sets that total ranged from 10 million to 50 million rows. The runs within a set varied the number of rows per partition key from 1 to 1,000,000.
A three node DataStax cluster was set up on the Microsoft Azure cloud platform along with a single benchmark runner. All machines were placed on the same virtual network and had the following characteristics:
| | Cluster Node | Benchmark Runner |
|---|---|---|
| Size | Standard DS3 v2 | Standard DS5 v2 |
| Cores | 4 vCPU | 16 vCPU |
| Storage | 28GB Local SSD | 112GB Local SSD |
| Operating System | Ubuntu 16.04.3 LTS | Ubuntu 16.04.3 LTS |
Each machine was updated and upgraded via apt-get update and apt-get upgrade and had Oracle Java 1.8.0_144 installed. DataStax Enterprise 5.1.2 was installed on each node of the cluster.
Each of the 5 sets completed without error and the results were charted. Each graph below shows lines for each of the insert, select, update and delete phases. The X axis shows the number of rows for each partition key. The Y axis shows the amount of time to complete each phase in seconds.
10 Million Rows
In this set, 10 million rows were inserted, then 10 million each of selects, updates and deletes performed. For inserts, we can see an increase in processing time as the width of each partition increases. In fact it takes 2.8 times as long to insert the rows when the partitions are their widest. Selects fared better, but still showed a 1.8 times drop in performance as partitions widened. Interestingly, having up to about 10 rows per partition improved performance. Updates and deletes were pretty much identical, and both were largely flat no matter how wide the partition was.
20 Million Rows
In this set, 20 million rows were inserted, then 20 million each of selects, updates and deletes performed. For inserts, we again see an increase in processing time. This time up to 2.75 times slower for the widest partition. Selects showed the same trend as in the 10 million row set. Again an improvement in performance to about 10 rows per partition key, then a steady (if uneven) decline as more rows are placed in each partition key. Updates and Deletes again were pretty much unaffected by partition width until we reached the widest partition where we started seeing a minor drop in performance.
30 Million Rows
In this set, 30 million rows were inserted, then 30 million each of selects, updates and deletes performed. Inserts are again showing the same trend as before, with processing taking up to 2.7 times longer with the widest partition. Selects are also showing the same trend as in earlier sets, with an improvement in performance around 10 rows per partition key, followed by decreasing performance. Updates and deletes are still showing virtually no effect from partition width. Again, they are a little faster at around 100 rows per key and slow down a bit at 1 million rows per key.
40 Million Rows
For this set, 40 million rows were inserted into a table, then 40 million each of selects, updates and deletes performed. Once again, insert performance follows the same trend as before, with inserts taking 2.7 times longer for the widest partition. Selects also show the same performance characteristics as before: running a little faster up to 100 rows, then slowing down. Updates and deletes again appear largely unaffected by partition width except at the very last data point, where there is a slight drop in performance.
50 Million Rows
For the last set, 50 million rows were inserted into a table, then 50 million each of selects, updates and deletes. Inserts continue showing the same trend with processing taking up to 2.6 times longer at the widest partition. Selects also continue the same trend as in earlier sets with an improvement in performance around 100 rows per partition key, followed by decreasing performance. Updates and Deletes are finally showing some differences with a noticeable decrease in performance but only at the narrowest partition. At 10 rows per partition key both are back in line with earlier sets.
There is remarkable consistency between the five sets for all of the phases – insert, select, update and delete. In fact, normalizing the sets down to 10 million rows and graphing each phase yields results that are remarkably similar:
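The normalization here is just a linear rescaling of each phase's time to a 10-million-row equivalent. That is only a fair comparison because, as the sets above show, per-row cost stayed roughly constant as the total row count grew. A minimal sketch (the class and method names are mine):

```java
public class Normalize {
    // Scale a phase time from a larger set down to its 10-million-row
    // equivalent, assuming time grows linearly with total row count --
    // the trend the five sets above suggest.
    static double normalizeTo10M(double phaseSeconds, long totalRows) {
        return phaseSeconds * (10_000_000.0 / totalRows);
    }

    public static void main(String[] args) {
        // e.g. a hypothetical 50-million-row phase that took 2,000 s
        // corresponds to roughly 400 s at 10 million rows.
        System.out.println(normalizeTo10M(2000, 50_000_000L)); // 400.0
    }
}
```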
Insert performance doesn’t appear to be affected by size of the dataset. DataStax maintains consistent insert performance through the full range of data in our sets. Where it does suffer is when there are lots of rows with the same partition key – up to 2.5 times slower. For a write heavy workload that can be a significant performance issue.
Select performance is clearly affected by the size of the data set. We expect that, since each node is being asked to sift through larger amounts of data. But it’s even more affected by the number of rows per partition – up to about 1.8 times slower.
Update performance is really consistent. In fact, the most noticeable spike came on updating in the 50 million row dataset, and only when there was one row per partition. Neither the number of rows per partition key nor the size of the dataset appears to be a factor at all.
Delete performance is pretty much as consistent as updates. It even has the same spike in the same 50 million row result set. Again, partition width and size of dataset don’t seem to have an effect on performance.
Based on these results we would say that partition width should be considered during data modeling but only as a secondary or tertiary concern. Usage requirements of the system and data organization should be considered first. In fact, during a modeling session, we would likely ignore any concern about partition width. Yes, 2.5 times slower select performance is a lot – but let’s face it, we literally had to add a million times more rows per partition key to see it.
Where we would give more consideration to partition width is during performance tuning. A production system at scale may be seeing performance issues related to partition width. It would take very little time to look at data usage patterns and the data model with an eye to partition width and doing so could yield a pretty substantial improvement in performance.