PostgreSQL Table Partitioning Part III – Partition Elimination

Understanding Partition Elimination in PostgreSQL 11

This is Part-III for my series on Postgres Table partitioning. I’ll encourage you to also read Part-I and II

PostgreSQL Table Partitioning Part I – Implementation Using Inheritance

PostgreSQL Table Partitioning Part II – Declarative Partitioning

In this post, lets compare the READ performance between partitioned and an un-partitioned table. Before that lets first review our table schema and data distribution.

Un-Partitioned Data Set
-- Data Distribution un-partitioned tbl
SELECT logdate, COUNT (logdate)
FROM measurement_np
GROUP BY logdate;

Partitioned Data Set
-- Data Distribution partitioned tbl
SELECT logdate, COUNT (logdate)
FROM measurement
GROUP BY logdate;

Comparing Query Plan (EXPLAIN ANALYZE) and Partition Elimination

1st Execution – on cold cache
-- Un-Partitioned tbl
Index Scan using measurement_np_indx_logdate on measurement_np (cost=0.44..416.13 rows=10672 width=16) (actual time=0.031..40.625 rows=10000 loops=1)
Index Cond: (logdate = '2006-04-11'::date)

-- Partitioned tbl
Append (cost=0.43..4.46 rows=1 width=16) (actual time=0.028..2.316 rows=10000 loops=1)
-> Index Scan using measurement_y2006m04_logdate_idx on measurement_y2006m04 (cost=0.43..4.45 rows=1 width=16) (actual time=0.028..1.813 rows=10000 loops=1)
Index Cond: (logdate = '2006-04-11'::date)
2nd Execution – on hot cache
-- Un-Partitioned tbl
Index Scan using measurement_np_indx_logdate on measurement_np (cost=0.44..337.14 rows=8671 width=16) (actual time=0.040..1.750 rows=10000 loops=1)
Index Cond: (logdate = '2006-04-11'::date)

-- Partitioned tbl
Append (cost=0.43..405.61 rows=9519 width=16) (actual time=0.022..1.942 rows=10000 loops=1)
-> Index Scan using measurement_y2006m04_logdate_idx on measurement_y2006m04 (cost=0.43..358.01 rows=9519 width=16) (actual time=0.021..1.426 rows=10000 loops=1)
Index Cond: (logdate = '2006-04-11'::date)
Conclusion

On in both attempts (cold and hot cache), the data retrieval from  ‘partitioned’ tables was faster than the un-partitioned table.

PostgreSQL Table Partitioning Part II – Declarative Partitioning

Starting Postgres 10.x and onward, it is now possible to create declarative partitions.

In my previous post ‘postgresql-table-partitioning-part-i-implementation-using-inheritance‘, I discussed about implementing Partitioning in PostgreSQL using ‘Inheritance’. Up until PostgreSQL 9, it was only way to partition tables in PostgreSQL. It was simple to implement, however had some limitations like:

  • Row INSERT does not automatically propagate data to a child tables (aka partition), instead it uses explicit ‘BEFORE INSERT’ trigger, making them slower
  • INDEXES and constraints have to be separately created on child tables
  • Significant manual work is required to create and maintain child tables ranges

‘Declarative’ partitioning released with Postgres 10 does not have these limitations and requires much less manual work to manage partitions.

Let’s see the implementation of ‘Declarative’ partitioning with example:

— Step 1.
Create a partitioned table using the PARTITION BY clause,                              — which includes the partitioning method (RANGE in this example) and the list of column(s) to use as the partition key

CREATE TABLE measurement (
 	city_id int not null,
 	logdate date not null,
 	peaktemp int,
 	unitsales int
) PARTITION BY RANGE (logdate);

— Step 2.
— Create Index on parent table
— Note: creation of seperate indexes on parent table is not required

CREATE INDEX measurement_indx_logdate ON measurement (logdate);

— Step 3.
— Create Default partition

CREATE TABLE measurement_default PARTITION OF measurement DEFAULT;

— Create partitions with exclusive range and dfeualt partition (catch-all for out of range values)

CREATE TABLE measurement_y2006m02 PARTITION OF measurement 
             FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');
CREATE TABLE measurement_y2006m03 PARTITION OF measurement 
             FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');
CREATE TABLE measurement_y2006m04 PARTITION OF measurement 
             FOR VALUES FROM ('2006-04-01') TO ('2006-05-01');
CREATE TABLE measurement_y2007m11 PARTITION OF measurement 
             FOR VALUES FROM ('2007-11-01') TO ('2007-12-01');

Let’s review our schema now

— Step 4
— Insert sample rows

DO $DECLARE
  BEGIN
    FOR i in 1..10 loop
      INSERT INTO measurement VALUES (1,'2006-02-07',1,1);
    end loop;
  end $;

DO $DECLARE
  BEGIN
    FOR i in 1..1000000 loop
      INSERT INTO measurement VALUES (1,'2006-03-06',1,1);
    end loop;
  end $;

DO $DECLARE
  BEGIN
    FOR i in 1..1000000 loop
      INSERT INTO measurement VALUES (1,'2006-04-09',1,1);
    end loop;
  end $;

DO $DECLARE
  BEGIN
    FOR i in 1..1000000 loop
      INSERT INTO measurement VALUES (1,'2007-11-11',1,1);
    end loop;
  end $;

--Optional Step
ANALYZE measurement;

–Step 5
–Test partition elimination

  SELECT * FROM measurement
    WHERE logdate = '2007-11-11';

Let’s look at the Query Plan

As you can see, the ‘Declarative’ partitioning is much more intuitive and requires less manual steps in declaring  partitions compares to inheritance.

thanks for reading!

PostgreSQL Table Partitioning Part I – Implementation Using Inheritance

In earlier PostgreSQL versions, it was not possible to declare table partitions syntactically. Partitioning can be implemented using table inheritance. The inheritance approach involves creating a single parent table and multiple child tables (aka. Partitions) to hold data in each partition range.

In this post, I’ll discuss the implementation of table partitions using inheritance. However before proceeding, let’s first understand why do we need partitioning?

Why Partition?

The simple answer is to improve the scalability and manageability of large data sets and tables that have varying access patterns.

Typically, you create tables to store information about an entity, such as customers or sales, and each table has attributes that describe only that entity. While a single table for each entity is the easiest to design and understand, these tables are not necessarily optimized for performance, scalability, and manageability, particularly as the table grows larger.

How can partitioning help? 

  1. When tables and indexes become very large, partitioning can help by partitioning the data into smaller and more manageable sections.
  2. It allows you to speed up loading and archiving of data, so that you can perform maintenance operations on individual partitions instead of the whole table, and this in turn improves the query performance.

There is a ton of information published on partitioning, But if you new to partitioning in PostgreSQL, below are some great examples:

Now that we have some insight what table partitioning is, let’s do a real partitioning  using below scripts:

How to Partition a Table?

— Step 1.
— CREATE PARENT & CHILD TABLES

CREATE TABLE parent (
id int,
col_a varchar,
col_b varchar);
--Child Table 1
CREATE TABLE range1() INHERITS (parent);
--Child Table 2
CREATE TABLE range2() INHERITS (parent);
--Child Table 3
CREATE TABLE range3() INHERITS (parent);

Let’s review the schema now

— Step 2.
— CREATE PARTITION FUNCTION THAT WILL HOLD LOGIC OF ‘ON-INSERT’ TRIGGER

CREATE OR REPLACE FUNCTION partition_parent() RETURNS trigger as $$
  BEGIN
    IF (new.id < 10) THEN
         INSERT INTO range1 VALUES (new.*) ;
    ELSEIF (new.id >= 10 AND NEW.id < 20 ) then
         INSERT INTO range2 VALUES (new.*) ;
    ELSEIF (new.id >= 20 AND NEW.id < 30 ) then
         INSERT INTO range3 VALUES (new.*) ;
    ELSE
         RAISE EXCEPTION 'out of range';
END IF;

RETURN NULL;
END;
$$ language plpgsql;

— Step 3.
— CREATE TRIGGER AND ATTACH IT TO THE PARENT TABLE TO BE PARTITIONED

CREATE TRIGGER partition_parent_trigger
    BEFORE INSERT ON parent
        FOR EACH ROW EXECUTE PROCEDURE partition_parent();for each row execute PROCEDURE partition_parent();

— Step 4.
— INSERT SAMPLE ROWS

DO $$DECLARE
  BEGIN
     FOR i in 1..29 LOOP
          INSERT INTO parent(id, col_a, col_b) VALUES (i, 'a', 'b');
     END LOOP;
END $$;

— Optional Step (stats update)

ANALYZE parent;
-- Step 5.
-- TEST PARTITION ELIMINATION (PRUNING)

EXPLAIN ANALYZE
SELECT * from parent
   WHERE id = 5;

Query Plan 

In the next Post in this series, I’ll discuss the new ‘Declarative Partitioning‘ implementation

thanks for reading!