PostgreSQL Table Partitioning Part III – Partition Elimination

Understanding Partition Elimination in PostgreSQL 11

This is Part-III for my series on Postgres Table partitioning. I’ll encourage you to also read Part-I and II

PostgreSQL Table Partitioning Part I – Implementation Using Inheritance

PostgreSQL Table Partitioning Part II – Declarative Partitioning

In this post, lets compare the READ performance between partitioned and an un-partitioned table. Before that lets first review our table schema and data distribution.

Un-Partitioned Data Set
-- Data Distribution un-partitioned tbl
SELECT logdate, COUNT (logdate)
FROM measurement_np
GROUP BY logdate;

Partitioned Data Set
-- Data Distribution partitioned tbl
SELECT logdate, COUNT (logdate)
FROM measurement
GROUP BY logdate;

Comparing Query Plan (EXPLAIN ANALYZE) and Partition Elimination

1st Execution – on cold cache
-- Un-Partitioned tbl
Index Scan using measurement_np_indx_logdate on measurement_np (cost=0.44..416.13 rows=10672 width=16) (actual time=0.031..40.625 rows=10000 loops=1)
Index Cond: (logdate = '2006-04-11'::date)

-- Partitioned tbl
Append (cost=0.43..4.46 rows=1 width=16) (actual time=0.028..2.316 rows=10000 loops=1)
-> Index Scan using measurement_y2006m04_logdate_idx on measurement_y2006m04 (cost=0.43..4.45 rows=1 width=16) (actual time=0.028..1.813 rows=10000 loops=1)
Index Cond: (logdate = '2006-04-11'::date)
2nd Execution – on hot cache
-- Un-Partitioned tbl
Index Scan using measurement_np_indx_logdate on measurement_np (cost=0.44..337.14 rows=8671 width=16) (actual time=0.040..1.750 rows=10000 loops=1)
Index Cond: (logdate = '2006-04-11'::date)

-- Partitioned tbl
Append (cost=0.43..405.61 rows=9519 width=16) (actual time=0.022..1.942 rows=10000 loops=1)
-> Index Scan using measurement_y2006m04_logdate_idx on measurement_y2006m04 (cost=0.43..358.01 rows=9519 width=16) (actual time=0.021..1.426 rows=10000 loops=1)
Index Cond: (logdate = '2006-04-11'::date)
Conclusion

On in both attempts (cold and hot cache), the data retrieval from  ‘partitioned’ tables was faster than the un-partitioned table.

Learning beyond SQL…PostgreSQL – Indexes

It’s been some time since I’ve blogged, even though I’ve been reading a lot all this while. One of the reason is that I couldn’t find enough compelling topics to write and share. Microsoft has been moving too fast with their SQL Server releases (2012, 2014, 2016….and we’re talking  Linux beta right now) and I’ve always been catching up.

However, between all this, something has changed. Due to an ever-growing buzz around Open Source, I haltingly started looking into PostgreSQL.  Truth be told, I’m starting from ground zero (so nothing to loose) and will be writing on topics that might sound too simple for some of you, nevertheless you may still find ’em helpful.

So starting with Indexes in PostgreSQL

PostgreSQL offers several index types:

  • B-tree
  • Hash
  • GiST  and GIN

Each index type uses a different algorithm that is best suited to different types of queries. In this post we’ll talk about B-tree indices.

Why Index?

  • Speed up data retrievals
  • Indexes reference data locations, explicitly, for the indexed column, consequently reducing data retrieval time
  • Without indices, SQL performs sequential table scans in search for data (applies to SELECT and DMLs)
  • B-tree index is sorted in ascending order by default

Hands-on

#Create Index Syntax
CREATE INDEX name ON table USING btree (coulmn);

#Check of the existing indices on  a table (as they can also be created implicitly by PRIMARY or UNIQUE key definition)
SELECT * FROM pg_indexes WHERE schemaname = 'public';
SELECT * FROM pg_stat_all_indexes WHERE schemaname NOT IN ('pg_catalog','pg_toast');

#Query a table without a filter condition and get query plan using EXPLAIN
SELECT * FROM public.film;

Seq Scan on film  (cost=0.00..127.00 rows=2000 width=384)

1-18-2017-11-01-50-am

#Query a table with a filter condition and get query plan using EXPLAIN
SELECT title AS Name, release_year AS year 
FROM public.film
WHERE title in ('Clones Pinocchio','Vanilla Day');

Index Scan using idx_title on film (cost=0.28..21.63 rows=4 width=19)
Index Cond: ((title)::text = ANY (‘{“Clones Pinocchio”,”Vanilla Day”}’::text[]))

1-18-2017-11-02-14-am

Here, after specifying the WHERE condition Postgres Planner (aka Optimizer) decided to choose an index, instead of sequentially scanning the table. Postgres is able to find the targeted rows in an index, and then fetch them from disk selectively.

It’s a very wide topic, so I’ll write more about indexes. However to sum it up, PostgreSQL provides lot of flexibility with B-tree indexes so they can be optimized to suit your requirement and keep your queries snappy.

Identify queries that consume a large amount of log space in SQL Server

One of regular issues DBA’s get are about the T-log growth. Situations, wherein one “bad”  or “poorly-designed” query can eat up entire T-log space, bring the free space to zero and then bring your application down. The cause and remedy of most these issue is discussed in this KB # 317375(I’m  big fan on Microsoft KB’s).

While the KB discussed about the causes and approaches to deal with high T-log growth situations, it also hints about how we can ‘proactively’ find the queries that are consuming your T-log space at any given moment using DMV’s. Taking cue from this, I have written a below T-SQL Code:

Identify queries consuming large T-log space:

Use <your_database_name>
go
-- Description: get queries consuming maximum T-log space
-- Source: based on KB317375
-- Author: varun.dhawan@gmail.com                                               

SELECT dtst.session_id AS 'spid'
, CAST(DB_NAME(dtdt.database_id) AS VARCHAR(20)) AS 'database'
, der.command
, SUBSTRING(st.text, ( der.statement_start_offset / 2 ) + 1
, ((CASE der.statement_end_offset 
WHEN -1 THEN DATALENGTH(st.text) 
ELSE der.statement_end_offset 
END - der.statement_start_offset)/2)+1) AS statement_text
, COALESCE(QUOTENAME(DB_NAME(st.dbid)) + N'.' 
+ QUOTENAME(OBJECT_SCHEMA_NAME(st.objectid, st.dbid)) 
+ N'.' 
+ QUOTENAME(OBJECT_NAME(st.objectid, st.dbid)), '') AS command_text
, der.wait_type
, der.wait_time
, dtdt.database_transaction_log_bytes_used / 1024.0 / 1024.0 AS 'mb used'
, dtdt.database_transaction_log_bytes_used_system / 1024.0 / 1024.0 AS 'mb used system'
, dtdt.database_transaction_log_bytes_reserved / 1024.0 / 1024.0 AS 'mb reserved'
, dtdt.database_transaction_log_bytes_reserved_system / 1024.0 / 1024.0 AS 'mb reserved system'
, dtdt.database_transaction_log_record_count AS 'record count' 
FROM sys.dm_tran_database_transactions dtdt 
JOIN sys.dm_tran_session_transactions dtst 
ON dtdt.transaction_id = dtst.transaction_id 
JOIN sys.dm_exec_requests der 
CROSS apply sys.Dm_exec_sql_text(der.sql_handle) AS st 
ON dtst.session_id = der.session_id 
ORDER BY 8 DESC; 
Hope that this will help you too!

Disclaimer: Everything here, is my personal opinion and is not read or approved by my employer before it is posted. No warranties or other guarantees will be offered as to the quality of the opinions or anything else offered here.