Partitioning Made Easy in PostgreSQL

Partitions are a way to divide a large table into smaller, more manageable pieces called partitions. Each partition holds a subset of the table's data based on a specified partition key. This is only a change from the engine perspective. Nothing changes from the end-user perspective, there is still one table that the user can access regardless of partitioning.

The partitioning key could be a range of values (range partitioning), a list of specific values (list partitioning), or a mathematical expression (hash partitioning). One of the typical examples is partitioning based on date or country. We can also build custom partitioning with views, as we’ll see later in this post.

Partitions help improve query performance by allowing the database to eliminate unnecessary data during query execution. When a query specifies a condition that matches the partition key, PostgreSQL can directly access only the relevant partition, rather than scanning the entire table. This can significantly speed up queries on large tables.

Furthermore, partitions can be spread across multiple storage devices or tablespaces, enabling parallelism and efficient disk utilization. They also facilitate data management tasks such as archiving or deleting old data by simply dropping or detaching partitions.

PostgreSQL provides different types of partitioning methods, including range partitioning, list partitioning, and hash partitioning. Each method has its own benefits and is suitable for different use cases.

PostgreSQL offers built-in support for the following forms of partitioning:

Range Partitioning: The table is partitioned into “ranges” defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions. For example, one might partition by date ranges, or by ranges of identifiers for particular business objects. Each range's bounds are understood as being inclusive at the lower end and exclusive at the upper end. For example, if one partition's range is from 1 to 10, and the next one's range is from 10 to 20, then value 10 belongs to the second partition not the first.
List Partitioning: The table is partitioned by explicitly listing which key values appear in each partition.
Hash Partitioning: The table is partitioned by specifying a modulus and a remainder for each partition. Each partition will hold the rows for which the hash value of the partition key divided by the specified modulus will produce the specified remainder.

We can also build partitions using views. We create separate tables manually, and then create a view that joins the data from all the tables.

Let’s see practical examples.

Partitions based on date

‍

Let’s start by creating a partitioned table:

CREATE TABLE orders (
	order_id SERIAL,
	order_date DATE NOT NULL,
	customer_name VARCHAR(255),
	product_name VARCHAR(255),
	quantity INT
) PARTITION BY RANGE(EXTRACT(YEAR FROM order_date));

Important part is the last line:

PARTITION BY RANGE(EXTRACT(YEAR FROM order_date))

We create a table that partitions the data based on the year of the order date. Once we have that, we need to explicitly create tables for specific values:

CREATE TABLE orders_2019 PARTITION OF orders FOR VALUES FROM (2019) TO (2020);
CREATE TABLE orders_2020 PARTITION OF orders FOR VALUES FROM (2020) TO (2021);
CREATE TABLE orders_2021 PARTITION OF orders FOR VALUES FROM (2021) TO (2022);

We created three partitions for three different years. You can see that we included the year in the tables’ names. This is a typical approach that increases the maintenance.

Let’s now add some data:

INSERT INTO orders (order_date, customer_name, product_name, quantity)
VALUES ('2019-01-01', 'John Smith', 'Product A', 10),
	('2019-02-15', 'Jane Doe', 'Product B', 5),
	('2019-04-20', 'Bob Johnson', 'Product C', 2),
	('2019-07-10', 'Alice Brown', 'Product A', 7),
	('2019-12-30', 'Mike Wilson', 'Product B', 8);

INSERT INTO orders (order_date, customer_name, product_name, quantity)
VALUES ('2020-02-14', 'John Smith', 'Product A', 15),
	('2020-03-20', 'Jane Doe', 'Product B', 3),
	('2020-06-05', 'Bob Johnson', 'Product C', 10),
	('2020-08-15', 'Alice Brown', 'Product A', 5),
	('2020-11-25', 'Mike Wilson', 'Product B', 2);

INSERT INTO orders (order_date, customer_name, product_name, quantity)
VALUES ('2021-01-07', 'John Smith', 'Product A', 4),
	('2021-03-15', 'Jane Doe', 'Product B', 12),
	('2021-05-20', 'Bob Johnson', 'Product C', 6),
	('2021-09-01', 'Alice Brown', 'Product A', 3),

We can now query all the orders:

SELECT COUNT(*) FROM orders;

Result:

count
15

Metis shows that the following tables were read:

‍

‍

We can see all three partitions were accessed.

We can also take orders from a specific table, like this:

SELECT COUNT(*) FROM orders_2019;

Result:

count
5

Metis shows the following analysis:

‍

Metis partitioning Query Tables Analysis

We can see only one table has been read. However, if we try the following query:

SELECT COUNT(*) FROM orders
WHERE order_date BETWEEN '2020-01-01' AND '2020-12-31';

with the following result:

count
5

we get the following execution:

‍

Metis partitioning Query Tables Execution

‍

The reason is that the engine doesn’t recognize our filters as the ones for the partitioning. We need to change the query to the following:

SELECT COUNT(*) FROM orders
WHERE order_date BETWEEN '2020-01-01' AND '2020-12-31'
AND EXTRACT(YEAR FROM order_date) = 2020

Metis shows the following:

Changing Query partitioning tables result

We can see that partitions can improve the query performance, but we need to be careful when using them. Metis can help us troubleshoot and optimize database queries easily in this case.

You can also examine the partitions. This query shows the total number of partitions and rows:

SELECT
	pg_inherits.inhparent::regclass AS table_name,
	pg_class.relkind,
	COUNT(pg_inherits.inhrelid::regclass) AS count_partitions,
	SUM(pg_class.relpages) as total_pages,
	SUM(pg_class.reltuples) as total_rows
FROM pg_inherits
JOIN pg_class ON pg_inherits.inhrelid = pg_class.oid
WHERE pg_class.relkind = 'r'
GROUP BY pg_inherits.inhparent::regclass, pg_class.relkind

Output:

table_name	relkind	count_partitions	total_pages	total_rows
orders	r	3	0	-3.0

This query shows each partition with partitioning expression:

SELECT
	pg_inherits.inhparent::regclass AS table_name,
	pg_inherits.inhrelid::regclass AS partition_name,
	pg_class.oid,
	pg_class.relpages,
	pg_class.reltuples,
	pg_get_expr(pg_class.relpartbound, pg_class.oid, true) as partition_expression
FROM pg_inherits
JOIN pg_class ON pg_inherits.inhrelid = pg_class.oid
WHERE pg_class.relkind = 'r'
ORDER BY pg_inherits.inhparent, pg_inherits.inhrelid

Output:

table_name	partition_name	oid	reltuples	partition_expression
orders	orders_2019	16995	-1.0	FOR VALUES FROM ('2019') TO ('2020')
orders	orders_2020	17001	-1.0	FOR VALUES FROM ('2020') TO ('2021')
orders	orders_2019	17007	-1.0	FOR VALUES FROM ('2021') TO ('2022')

Partitions implemented with views

Let’s now recreate the previous example with views. We create the following tables:

CREATE TABLE orders_2019 (
	order_id SERIAL,
	order_date DATE NOT NULL,
	customer_name VARCHAR(255),
	product_name VARCHAR(255),
	quantity INT
);

CREATE TABLE orders_2020 (
	order_id SERIAL,
	order_date DATE NOT NULL,
	customer_name VARCHAR(255),
	product_name VARCHAR(255),
	quantity INT
);

CREATE TABLE orders_2021 (
	order_id SERIAL,
	order_date DATE NOT NULL,
	customer_name VARCHAR(255),
	product_name VARCHAR(255),
	quantity INT
);

Let’s now create a view that will include data from all these tables:

CREATE VIEW orders AS (
	SELECT * FROM orders_2019
	UNION ALL
	SELECT * FROM orders_2020
	UNION ALL
	SELECT * FROM orders_2021
);

Let’s insert data the same way as before:

INSERT INTO orders (order_date, customer_name, product_name, quantity)
VALUES ('2019-01-01', 'John Smith', 'Product A', 10),
	('2019-02-15', 'Jane Doe', 'Product B', 5),
	('2019-04-20', 'Bob Johnson', 'Product C', 2),
	('2019-07-10', 'Alice Brown', 'Product A', 7),
	('2019-12-30', 'Mike Wilson', 'Product B', 8);

INSERT INTO orders (order_date, customer_name, product_name, quantity)
VALUES ('2020-02-14', 'John Smith', 'Product A', 15),
	('2020-03-20', 'Jane Doe', 'Product B', 3),
	('2020-06-05', 'Bob Johnson', 'Product C', 10),
	('2020-08-15', 'Alice Brown', 'Product A', 5),
	('2020-11-25', 'Mike Wilson', 'Product B', 2);

INSERT INTO orders (order_date, customer_name, product_name, quantity)
VALUES ('2021-01-07', 'John Smith', 'Product A', 4),
	('2021-03-15', 'Jane Doe', 'Product B', 12),
	('2021-05-20', 'Bob Johnson', 'Product C', 6),
	('2021-09-01', 'Alice Brown', 'Product A', 3),

Notice that we didn’t insert the data into the view. We had to manually specify the table for the data.

Let’s now query the table:

SELECT COUNT(*) FROM orders

The result is as expected:

count
15

Metis shows this analysis:

‍

Metis partitioning with views Query Tables Analysis

Let’s now query the subset of the data:

SELECT COUNT(*) FROM orders
WHERE order_date BETWEEN '2020-01-01' AND '2020-12-31'

Result:

count
5

Metis shows the following:

‍

Metis partitioning with views Query Tables Execution

‍

We see that all the tables were used. However, we cannot configure index on this view to speed things up, because views do not support such an operation:

CREATE INDEX orders_order_date_idx ON orders(order_date)

Result:

[Code: 0, SQL State: 42809]  ERROR: cannot create index on relation "orders"
  Detail: This operation is not supported for views.

Therefore, we should use built-in partitioning where possible.

Summary

Partitions can improve the query performance and should be in our toolbox for database optimizations. Metis can help troubleshoot performance issues and show slow queries. There are multiple configuration options for partitions, and we can always build views that mimic the partitioning behavior.

How Does Partitioning Work in PostgreSQL and Why Should You Care?

Partitions based on date

Partitions implemented with views

Summary

Keep Your Databases Fast

Harness Your Database Complexity

Prevent Your Data Workflows From Breaking

Never worry about your
database again!

Keep Your Databases Fast

Harness Your Database Complexity

Prevent Your Data Workflows From Breaking

Never worry about your database again!

Contact Us

Never worry about your
database again!