Partitions are a way to divide a large table into smaller, more manageable pieces called partitions. Each partition holds a subset of the table's data based on a specified partition key. This is only a change from the engine perspective. Nothing changes from the end-user perspective, there is still one table that the user can access regardless of partitioning.
The partitioning key could be a range of values (range partitioning), a list of specific values (list partitioning), or a mathematical expression (hash partitioning). One of the typical examples is partitioning based on date or country. We can also build custom partitioning with views, as we’ll see later in this post.
Partitions help improve query performance by allowing the database to eliminate unnecessary data during query execution. When a query specifies a condition that matches the partition key, PostgreSQL can directly access only the relevant partition, rather than scanning the entire table. This can significantly speed up queries on large tables.
Furthermore, partitions can be spread across multiple storage devices or tablespaces, enabling parallelism and efficient disk utilization. They also facilitate data management tasks such as archiving or deleting old data by simply dropping or detaching partitions.
PostgreSQL provides different types of partitioning methods, including range partitioning, list partitioning, and hash partitioning. Each method has its own benefits and is suitable for different use cases.
PostgreSQL offers built-in support for the following forms of partitioning:
- Range Partitioning: The table is partitioned into “ranges” defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions. For example, one might partition by date ranges, or by ranges of identifiers for particular business objects. Each range's bounds are understood as being inclusive at the lower end and exclusive at the upper end. For example, if one partition's range is from 1 to 10, and the next one's range is from 10 to 20, then value 10 belongs to the second partition not the first.
- List Partitioning: The table is partitioned by explicitly listing which key values appear in each partition.
- Hash Partitioning: The table is partitioned by specifying a modulus and a remainder for each partition. Each partition will hold the rows for which the hash value of the partition key divided by the specified modulus will produce the specified remainder.
We can also build partitions using views. We create separate tables manually, and then create a view that joins the data from all the tables.
Let’s see practical examples.
Partitions based on date
Let’s start by creating a partitioned table:
Important part is the last line:
We create a table that partitions the data based on the year of the order date. Once we have that, we need to explicitly create tables for specific values:
We created three partitions for three different years. You can see that we included the year in the tables’ names. This is a typical approach that increases the maintenance.
Let’s now add some data:
We can now query all the orders:
Metis shows that the following tables were read:
We can see all three partitions were accessed.
We can also take orders from a specific table, like this:
Metis shows the following analysis:
We can see only one table has been read. However, if we try the following query:
with the following result:
we get the following execution:
The reason is that the engine doesn’t recognize our filters as the ones for the partitioning. We need to change the query to the following:
Metis shows the following:
We can see that partitions can improve the query performance, but we need to be careful when using them. Metis can help us troubleshoot and optimize database queries easily in this case.
You can also examine the partitions. This query shows the total number of partitions and rows: