The Phoenix Project – Book Review

A Novel about IT, DevOps, and Helping Your Business Win

Some people are lucky enough to find books that change their lives. While I’m yet to find my life-changing novel, I do come across books that make me think, question my beliefs, and push me to learn. The Phoenix Project is that book for me.

Written by Gene Kim, George Spafford, and Kevin Behr, the book is about a large company’s transformation to a DevOps culture – a transformation driven not just to look cool, but by the necessity of the company’s survival.

The synopsis is simple. Bill, the protagonist, is the Director of Midrange Operations at Parts Unlimited, a US-based $4 billion per year manufacturing and retail company. Bill is swiftly pulled into the spotlight by the CEO and persuaded (read: hoodwinked) into taking up the post of VP of IT Operations. It soon becomes clear that, among his standard responsibilities, Bill and his team are responsible for making the launch of the risky (read: doomed) Project Phoenix a success. Project Phoenix not only seems hugely overscoped for its ambitious – and imminent – timelines, but it also faces enormous pressure elsewhere.

The characters and situations in the book are stereotypical, but that’s not a criticism. The intent is clearly for us to identify with the characters and events, and to relate our own workplaces to the story. So here are my key learnings, spoiler-free:

DevOps is a collaborative working relationship between Development and IT Operations 🤝

The outcome of this collaboration is a fast flow of planned work, while increasing the reliability and stability of the production environment.

The Three Ways principle 📜

The First Way – focuses on maximizing the flow of work from left to right: from the business to development, to IT operations, to the end user.

The Second Way – focuses on amplifying the feedback loop from right to left. The focus is not only on getting feedback but also on how fast we can get it, so that necessary corrections and improvements can be made quickly.

The Third Way – is all about developing and fostering a culture of continuous experimentation and learning.

Speed to Deliver is the key 🚀

Technology is the lifeblood of every business today. It’s imperative that businesses strive to bring their applications to market more quickly so they don’t miss opportunities and can easily adjust to market demands. To achieve these objectives, organizations must adopt the right DevOps practices in their software development processes to reduce time to market.

Overall, the book does a great job of explaining all these ideas with examples and linking them together. It’s a super fun and easy read, and I would definitely recommend it.

Connect to PostgreSQL in VS Code

VS Code has a rich extension API that lets you add languages, debuggers, and tools to your installation to support your development workflow. The PostgreSQL extension allows the following:

  • Connect to PostgreSQL instances
  • View object DDL with ‘Go to Definition’ and ‘Peek Definition’
  • Write queries with IntelliSense
  • Run queries and save results as JSON, CSV, or Excel

Download link: https://marketplace.visualstudio.com/items?itemName=ms-ossdata.vscode-postgresql

Using VS Code PostgreSQL extension

  1. Open the Command Palette: Ctrl + Shift + P (on macOS, use ⌘ + Shift + P)


  2. Search and select PostgreSQL: New Query

  3. In the command palette, select Create Connection Profile. Follow the prompts to enter your Postgres instance hostname, database, username, and password.


 

 

You are now connected to your Postgres database. You can confirm this via the Status Bar (the ribbon at the bottom of the VS Code window). It will show your connected hostname, database, and user.

Now, let’s try querying the database.

  1. Type a query, e.g. SELECT * FROM pg_stat_activity;

  2. Right-click and select Execute Query (or use the keyboard shortcut [⌘M ⌘R]); the results will show in a new window.

  3. You can also save the query results as JSON, CSV, or Excel.

So now you can seamlessly code for PostgreSQL from Microsoft VS Code without switching screens, leverage powerful IntelliSense, and execute queries.

Enjoy Coding!

IMP NOTE: Result windows from queries won’t show up again after being closed. This is a bug in the current version and is being worked on by the dev team. The workaround is to either keep the result window open or close and re-open the VS Code window.

Site Reliability Engineering: How Google Runs Production Systems – Book Review

Essential Read for anyone managing highly available distributed systems at scale

First off – it’s worth letting you know that Google lets you read this entire book online for free on their website. Yes, you read that right: you don’t need to buy the book. Just follow this link – https://landing.google.com/sre/sre-book/toc/index.html – and start reading!

The book starts with a story about a time Margaret Hamilton brought her young daughter with her to NASA, back in the days of the Apollo program. During a simulation mission, her daughter caused the mission to crash by accidentally pressing some keys. Hamilton noticed this defect and proactively submitted a change to add error-checking code to prevent it from happening again; however, the change was rejected because program leadership believed that error should never happen. On the next mission, Apollo 8, that exact error condition occurred, and a potentially fatal problem that could have been prevented with a trivial check took NASA’s engineers 9 hours to resolve. Hence, an early learning from the book:

“Embrace the idea that systems failures are inevitable, and therefore teams should work to optimize to recover quickly through using SRE principles.”

The book is divided into four parts, each comprised of several sections. Each section is authored by a Google engineer.

In Part I, Introduction, the authors introduce Google’s Site Reliability Engineering (SRE) approach to managing global-scale IT services running in datacenters spread across the entire world. (Google’s approach is truly extraordinary.) After a discussion about how SRE is different from DevOps (another hot term of the day), this part introduces the core elements and requirements of SRE, which include the traditional Service Level Objectives (SLOs) and Service Level Agreements (SLAs), management of changing services and requirements, demand forecasting and capacity, provisioning and allocation, etc. Through a sample service, Shakespeare, the authors introduce the core concepts of running a workflow – essentially a collection of IT tasks with inter-dependencies – in the datacenter.

In Part II, Principles, the book focuses on operational and reliability risks, SLO and SLA management, the notion of toil (mundane work that scales linearly and can be automated) and the need to eliminate it (through automation), how to monitor the complex system that is a datacenter, a process for automation as seen at Google, the notion of engineering releases, and, last, an essay on the need for simplicity. This rather disparate collection of notions is very useful, explained for the layman but still with enough technical content to be interesting even for the expert (practitioner or academic).

In Parts III and IV, Practices and Management, respectively, the book discusses a variety of topics: from time-series analysis for anomaly detection, to the practice and management of people on-call, to various ways to prevent and address incidents occurring in the datacenter, to postmortems and root-cause analysis that could help prevent future disasters, to testing for reliability (a notoriously difficult issue), to software engineering the SRE team, to load-balancing and overload management (resource management and scheduling 101), to communication between SRE engineers, and so on, until the predictable call for everyone to use SRE as early as possible and as often as possible. This is where I started getting a much better sense of practical SRE (aha!).

Overall it’s a great read, but it isn’t perfect. The two big downsides for me are: 1) this is one of those books that’s a collection of chapters by different people, so there’s a fair amount of redundancy, and 2) the book takes a one-sided approach to the “Build vs. Buy” dilemma of engineering. At Google scale it will almost always be better to build, but that is rarely true in the real world. Even including the downsides, I’d say this is the most valuable technical book I’ve read this year. If you like these notes, you’ll probably want to read the full book.

Azure Data Studio – Switching from Management Studio (SSMS) to Azure Data Studio (ADS)

Azure Data Studio (formerly SQL Operations Studio) is a free cross-platform DB management tool for Windows, macOS, and Linux. The initial release of Azure Data Studio (ADS) was only compatible with SQL Server; however, Microsoft recently released a PostgreSQL extension for ADS – so now you can also manage your PostgreSQL instances using ADS. For more details on the Azure Data Studio PostgreSQL extension, refer to my earlier posts.

Initially, I was apprehensive about switching to ADS as I did not want to leave the comfort and ease of SSMS – after all, I had been using it for more than a decade. My thoughts changed once I was on it.

Also, did you know ADS is built on Visual Studio Code, which comes with multiple options to customize? In this post I’ll take that feature and share how you can customize and personalize Azure Data Studio.

I. Change the Color theme

  1. Open Settings by clicking the gear icon on the bottom left and selecting Settings
  2. Click on Color Theme
  3. Choose from the number of options (Choose between Light, Dark and High Contrast themes)

II. Change Keyboard Shortcuts

  1. Open Settings by clicking the gear icon on the bottom left and selecting Settings
  2. Click on Keyboard Shortcuts

For example, change Run Query from the default F5 to Ctrl+E:

// Place your key bindings in this file to overwrite the defaults
[
 {
  "key": "ctrl+e",
  "command": "runQueryKeyboardAction"
 },
 {
  "key": "f5",
  "command": "-runQueryKeyboardAction"
 }
]

III. Add Extensions

  1. Click the Extensions icon on the left
  2. Select the Install button on the extension you want from the list
  3. Click the Reload button to activate the installed extension

Refer to my previous post for detailed step-by-step instructions for installing the PostgreSQL extension.

IV. Get ‘Actual’ execution plan

  1. Highlight the query and press [Ctrl] + [M], or click Explain

You can modify the keyboard shortcut to your preference in user settings

V. Open an Integrated Terminal

  1. Use the Ctrl+` keyboard shortcut (with the backtick character)
  2. By default, the terminal on my Windows 10 machine uses PowerShell, while Linux and macOS use $SHELL.
  3. You can customize the terminal by specifying the path of the shell executable in your settings. Below is a list of common shell executables and their default locations.
// Command Prompt
"terminal.integrated.shell.windows": "C:\\Windows\\System32\\cmd.exe"
// PowerShell
"terminal.integrated.shell.windows": "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe"
// Git Bash
"terminal.integrated.shell.windows": "C:\\Program Files\\Git\\bin\\bash.exe"

Isn’t customizing on ADS easy? Let me know what you think.

I’ll share more about Azure Data Studio as I continue the journey – so stay tuned!

Azure Data Studio PostgreSQL Extension – Custom insight dashboard


Azure Data Studio (formerly SQL Operations Studio) is a free cross-platform DB management tool for Windows, macOS, and Linux. Staying true to their promise of offering a unified data management experience for developers, Microsoft recently released a PostgreSQL extension for Azure Data Studio.

So now developers can use the same great GUI features for PostgreSQL databases as they have for SQL Server, like IntelliSense, multiple query windows, and custom insight dashboards. In this blog post I’ll explain how to create a custom insight and add it to the PostgreSQL dashboard as a widget.

For this exercise I’ll use a simple query to display active connections grouped by connection state.

Step 1. Prepare your custom insight

1. Open a new Query Editor (Ctrl+N)

2. Copy/paste the query below

-- ADS custom dashboard - Get Active vs Inactive connections
SELECT state, count(pid)
FROM   pg_stat_activity
GROUP  BY state, datname
HAVING datname = '<app data>' -- replace with your db name
ORDER  BY count(pid) DESC;

3. Save the query as a “connection_stats.sql” file and execute the query (F5)

4. Once you get the result-set, click on View as Chart

5. Customize the chart and click on Create Insight

Step 2. Add the insight to database dashboard

  1. Copy the insight configuration (the JSON data).
  2. Press Ctrl+Comma to open User Settings
  3. Type dashboard in Search Settings
  4. Click Edit for dashboard.database.widgets

  5. Paste the insight configuration JSON into dashboard.database.widgets. A formatted dashboard setting should look something like this
  6. Save the User Settings file
  7. In Server Explorer, right click on your database server name and click Manage
    Similar to the above, you can also add more such “insights” to the default server dashboard. I have created one for checking the top 5 tables by size (using the query below).

    -- Get details of TOP `n` tables in database
    SELECT cl.relname AS objectname,
           pg_total_relation_size(cl.oid)/1024/1024/1024 AS size_in_gb
    FROM   pg_class cl
           LEFT JOIN pg_namespace n ON (n.oid = cl.relnamespace)
           LEFT JOIN pg_stat_user_tables s ON (s.relid = cl.oid)
    WHERE  nspname NOT IN ('pg_catalog', 'information_schema')
           AND cl.relkind <> 'i'
           AND nspname !~ '^pg_toast'
    ORDER  BY pg_total_relation_size(cl.oid) DESC
    LIMIT  5;

Azure Data Studio PostgreSQL Extension – Free data management tool to manage your PostgreSQL databases

Azure Data Studio (formerly SQL Operations Studio) is a free cross-platform DB management tool for Windows, macOS, and Linux. Azure Data Studio was initially released only for managing SQL Server; however, with today’s Microsoft announcement, it is now possible to connect to and manage PostgreSQL databases with the Azure Data Studio PostgreSQL extension (sweet deal!).

In this (and probably the next few) blog posts I’ll be exploring this shiny new Azure Data Studio PostgreSQL extension and sharing my experience. So let’s get started with installation and configuration.

This post is written assuming you already have Azure Data Studio installed on your machine (PC or Mac). If you haven’t installed it already, the steps are simple and available here.

How to Add Azure Data Studio PostgreSQL Extension

  1. Select the extensions icon from the sidebar in Azure Data Studio
  2. Type ‘PostgreSQL‘ into the search bar. Select the PostgreSQL extension
  3. Click “Reload Now”
  4. Extension Install Notification
    Downloading https://pgtoolservice.file.core.windows.net/download/pgsqltoolsservice-win-x64.zip?sv=2018-03-28&ss=bfqt&srt=sco&sp=rwdlacup&se=2019-06-28T01:59:13Z&st=2019-02-15T18:59:13Z&spr=https&sig=LPGNO%2BGA%2FjWCavlMuLFKevGK%2FkbulFXARRbXwORgkp4%3D
    
    (21434 KB)....................Done!
    
    Installing pgSQLToolsService service to C:\XXXX\.azuredatastudio\extensions\microsoft.azuredatastudio-pgsql-0.1.0\out\pgsqltoolsservice\Windows\1.2.0-alpha.22
    
    Installed

    Note: If you do not see the above notification, consider restarting the Azure Data Studio window

  5. Click “New Connection” icon in the SERVERS page
  6. Here, you’ll now be able to select “PostgreSQL” in Connection type list
  7. Specify all the connection details as below and click Connect
    - Server Name: PostgreSQL host name
    - User name: User name for the PostgreSQL
    - Password: Password for the PostgreSQL user
    - Database Name: Database name in PostgreSQL
    - Server Group: <Optional - if you want to create a server group>
    - Name: <Optional - name this connection>

In the next few posts, I’ll share my experience using the Azure Data Studio PostgreSQL extension.

PostgreSQL Table Partitioning Part III – Partition Elimination

Understanding Partition Elimination in PostgreSQL 11

This is Part III of my series on Postgres table partitioning. I encourage you to also read Parts I and II:

PostgreSQL Table Partitioning Part I – Implementation Using Inheritance

PostgreSQL Table Partitioning Part II – Declarative Partitioning

In this post, let’s compare the READ performance between a partitioned and an un-partitioned table. Before that, let’s first review our table schema and data distribution.

Un-Partitioned Data Set
-- Data Distribution un-partitioned tbl
SELECT logdate, COUNT (logdate)
FROM measurement_np
GROUP BY logdate;

Partitioned Data Set
-- Data Distribution partitioned tbl
SELECT logdate, COUNT (logdate)
FROM measurement
GROUP BY logdate;

Comparing Query Plan (EXPLAIN ANALYZE) and Partition Elimination
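Both plans below come from the same point lookup run against each table. For reference, here is a sketch of the statements used (the predicate matches the Index Cond shown in the plans):

-- Same point lookup against the un-partitioned and the partitioned table
EXPLAIN ANALYZE SELECT * FROM measurement_np WHERE logdate = '2006-04-11';
EXPLAIN ANALYZE SELECT * FROM measurement    WHERE logdate = '2006-04-11';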

1st Execution – on cold cache
-- Un-Partitioned tbl
Index Scan using measurement_np_indx_logdate on measurement_np (cost=0.44..416.13 rows=10672 width=16) (actual time=0.031..40.625 rows=10000 loops=1)
Index Cond: (logdate = '2006-04-11'::date)

-- Partitioned tbl
Append (cost=0.43..4.46 rows=1 width=16) (actual time=0.028..2.316 rows=10000 loops=1)
-> Index Scan using measurement_y2006m04_logdate_idx on measurement_y2006m04 (cost=0.43..4.45 rows=1 width=16) (actual time=0.028..1.813 rows=10000 loops=1)
Index Cond: (logdate = '2006-04-11'::date)
2nd Execution – on hot cache
-- Un-Partitioned tbl
Index Scan using measurement_np_indx_logdate on measurement_np (cost=0.44..337.14 rows=8671 width=16) (actual time=0.040..1.750 rows=10000 loops=1)
Index Cond: (logdate = '2006-04-11'::date)

-- Partitioned tbl
Append (cost=0.43..405.61 rows=9519 width=16) (actual time=0.022..1.942 rows=10000 loops=1)
-> Index Scan using measurement_y2006m04_logdate_idx on measurement_y2006m04 (cost=0.43..358.01 rows=9519 width=16) (actual time=0.021..1.426 rows=10000 loops=1)
Index Cond: (logdate = '2006-04-11'::date)
Conclusion

In both attempts (cold and hot cache), data retrieval from the partitioned table was faster than from the un-partitioned table.

PostgreSQL Table Partitioning Part II – Declarative Partitioning

Starting with Postgres 10, it is now possible to create declarative partitions.

In my previous post, ‘postgresql-table-partitioning-part-i-implementation-using-inheritance‘, I discussed implementing partitioning in PostgreSQL using ‘Inheritance’. Before PostgreSQL 10, it was the only way to partition tables in PostgreSQL. It was simple to implement, however it had some limitations:

  • Row INSERTs do not automatically propagate data to the child tables (aka partitions); instead, an explicit ‘BEFORE INSERT’ trigger is needed, which makes inserts slower
  • Indexes and constraints have to be created separately on each child table
  • Significant manual work is required to create and maintain the child table ranges

‘Declarative’ partitioning, released with Postgres 10, does not have these limitations and requires much less manual work to manage partitions.

Let’s see the implementation of ‘Declarative’ partitioning with an example:

— Step 1.
— Create a partitioned table using the PARTITION BY clause, which includes the partitioning method (RANGE in this example) and the list of column(s) to use as the partition key

CREATE TABLE measurement (
 	city_id int not null,
 	logdate date not null,
 	peaktemp int,
 	unitsales int
) PARTITION BY RANGE (logdate);
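As an optional sanity check (a small sketch against the system catalog), you can confirm Postgres registered the table as range-partitioned:

-- 'r' = RANGE strategy; partnatts = number of partition key columns
SELECT partrelid::regclass AS table_name,
       partstrat           AS strategy,
       partnatts           AS key_columns
FROM   pg_partitioned_table
WHERE  partrelid = 'measurement'::regclass;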

— Step 2.
— Create Index on parent table
— Note: separate indexes on each partition are not required; an index created on the parent is automatically created on every partition (PostgreSQL 11+)

CREATE INDEX measurement_indx_logdate ON measurement (logdate);

— Step 3.
— Create Default partition

CREATE TABLE measurement_default PARTITION OF measurement DEFAULT;

— Create partitions with exclusive ranges (the default partition above is the catch-all for out-of-range values)

CREATE TABLE measurement_y2006m02 PARTITION OF measurement 
             FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');
CREATE TABLE measurement_y2006m03 PARTITION OF measurement 
             FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');
CREATE TABLE measurement_y2006m04 PARTITION OF measurement 
             FOR VALUES FROM ('2006-04-01') TO ('2006-05-01');
CREATE TABLE measurement_y2007m11 PARTITION OF measurement 
             FOR VALUES FROM ('2007-11-01') TO ('2007-12-01');

Let’s review our schema now
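If you are working from psql, a quick way to review the layout (a small aside; the output lists the partition key and all attached partitions):

-- psql meta-command: shows the partition key and the partitions attached to the parent
\d+ measurement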

— Step 4
— Insert sample rows

DO $$DECLARE
  BEGIN
    FOR i in 1..10 loop
      INSERT INTO measurement VALUES (1,'2006-02-07',1,1);
    end loop;
  end $$;

DO $$DECLARE
  BEGIN
    FOR i in 1..1000000 loop
      INSERT INTO measurement VALUES (1,'2006-03-06',1,1);
    end loop;
  end $$;

DO $$DECLARE
  BEGIN
    FOR i in 1..1000000 loop
      INSERT INTO measurement VALUES (1,'2006-04-09',1,1);
    end loop;
  end $$;

DO $$DECLARE
  BEGIN
    FOR i in 1..1000000 loop
      INSERT INTO measurement VALUES (1,'2007-11-11',1,1);
    end loop;
  end $$;

--Optional Step
ANALYZE measurement;

— Step 5.
— Test partition elimination

  EXPLAIN ANALYZE
  SELECT * FROM measurement
    WHERE logdate = '2007-11-11';

Let’s look at the Query Plan

As you can see, ‘Declarative’ partitioning is much more intuitive and requires fewer manual steps to declare partitions compared to inheritance.

Thanks for reading!

PostgreSQL Table Partitioning Part I – Implementation Using Inheritance

In earlier PostgreSQL versions, it was not possible to declare table partitions syntactically; instead, partitioning could be implemented using table inheritance. The inheritance approach involves creating a single parent table and multiple child tables (aka partitions) to hold the data in each partition range.

In this post, I’ll discuss the implementation of table partitions using inheritance. However, before proceeding, let’s first understand why we need partitioning.

Why Partition?

The simple answer is to improve the scalability and manageability of large data sets and tables that have varying access patterns.

Typically, you create tables to store information about an entity, such as customers or sales, and each table has attributes that describe only that entity. While a single table for each entity is the easiest to design and understand, these tables are not necessarily optimized for performance, scalability, and manageability, particularly as the table grows larger.

How can partitioning help? 

  1. When tables and indexes become very large, partitioning can help by partitioning the data into smaller and more manageable sections.
  2. It allows you to speed up loading and archiving of data, lets you perform maintenance operations on individual partitions instead of the whole table, and in turn improves query performance.

There is a ton of information published on partitioning, but if you are new to partitioning in PostgreSQL, there are some great examples available online.

Now that we have some insight into what table partitioning is, let’s implement real partitioning using the scripts below:

How to Partition a Table?

— Step 1.
— CREATE PARENT & CHILD TABLES

CREATE TABLE parent (
id int,
col_a varchar,
col_b varchar);
--Child Table 1
CREATE TABLE range1() INHERITS (parent);
--Child Table 2
CREATE TABLE range2() INHERITS (parent);
--Child Table 3
CREATE TABLE range3() INHERITS (parent);

Let’s review the schema now
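If you are using psql, a quick way to review the relationship (a small aside; the children appear under “Child tables”):

-- psql meta-command: range1, range2 and range3 are listed under "Child tables"
\d+ parent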

— Step 2.
— CREATE PARTITION FUNCTION THAT WILL HOLD LOGIC OF ‘ON-INSERT’ TRIGGER

CREATE OR REPLACE FUNCTION partition_parent() RETURNS trigger as $$
  BEGIN
    IF (new.id < 10) THEN
         INSERT INTO range1 VALUES (new.*) ;
    ELSEIF (new.id >= 10 AND NEW.id < 20 ) then
         INSERT INTO range2 VALUES (new.*) ;
    ELSEIF (new.id >= 20 AND NEW.id < 30 ) then
         INSERT INTO range3 VALUES (new.*) ;
    ELSE
         RAISE EXCEPTION 'out of range';
END IF;

RETURN NULL;
END;
$$ language plpgsql;

— Step 3.
— CREATE TRIGGER AND ATTACH IT TO THE PARENT TABLE TO BE PARTITIONED

CREATE TRIGGER partition_parent_trigger
    BEFORE INSERT ON parent
        FOR EACH ROW EXECUTE PROCEDURE partition_parent();

— Step 4.
— INSERT SAMPLE ROWS

DO $$DECLARE
  BEGIN
     FOR i in 1..29 LOOP
          INSERT INTO parent(id, col_a, col_b) VALUES (i, 'a', 'b');
     END LOOP;
END $$;

— Optional Step (stats update)

ANALYZE parent;
-- Step 5.
-- TEST PARTITION ELIMINATION (PRUNING)

EXPLAIN ANALYZE
SELECT * from parent
   WHERE id = 5;

Query Plan 
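A note on the plan above: with the inheritance approach, the planner can only skip child tables if each child carries a CHECK constraint on the partitioning column and constraint_exclusion is enabled (the default setting, ‘partition’, covers inheritance children). If your plan still scans every range table, the sketch below shows the constraints this example would need – the constraint names are just illustrative:

-- Constraint exclusion relies on CHECK constraints on each child table
ALTER TABLE range1 ADD CONSTRAINT range1_id_check CHECK (id < 10);
ALTER TABLE range2 ADD CONSTRAINT range2_id_check CHECK (id >= 10 AND id < 20);
ALTER TABLE range3 ADD CONSTRAINT range3_id_check CHECK (id >= 20 AND id < 30);

-- 'partition' is the default; it applies exclusion only to inheritance/partition children
SET constraint_exclusion = partition;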

In the next post in this series, I’ll discuss the new ‘Declarative Partitioning‘ implementation.

Thanks for reading!

PostgreSQL-Diagnostic-Queries – Dec 2020

psql queries to quickly identify and resolve database performance problems

As a seasoned data store engineer, I often find myself in situations where a production application is down due to some sort of performance issue and I am being asked, “What’s wrong with the database?”. In almost all these situations, the database (along with the DBA) is automatically considered guilty until proven innocent. As a DBA, I need the tools and knowledge to quickly determine the actual problem, if there is one, because maybe there’s nothing wrong with the database or the database server. My favorite approach is to start with data-driven performance analysis using the PostgreSQL system catalogs.

In this post, I am sharing a bunch of PostgreSQL system catalog queries that can be used to troubleshoot database engine performance.

Postgres system catalogs are where the database management system stores schema metadata, such as information about tables and columns, and internal bookkeeping information. PostgreSQL’s system catalogs are regular tables.

Instance level queries

1. Get server IP address, version and port number

-- server IP address, Postgres version, port number, and current timestamp
SELECT Inet_server_addr() AS "Server IP", 
       Version()          AS "Postgres Version", 
       setting            AS "Port Number", 
       current_timestamp :: timestamp 
FROM   pg_settings 
WHERE  name = 'port'; 
2. Get server version
SHOW server_version;

-- release history
| Version | First Release      | Final Release     |
| ------- | ------------------ | ----------------- |
| 13      | September 24, 2020 | November 13, 2025 |
| 12      | October 3, 2019    | November 14, 2024 |
| 11      | October 18, 2018   | November 9, 2023  |
| 10      | October 5, 2017    | November 10, 2022 |
| 9.6     | September 29, 2016 | November 11, 2021 |
| 9.5     | January 7, 2016    | February 11, 2021 |
| 9.4     | December 18, 2014  | February 13, 2020 |
| 9.3     | September 9, 2013  | November 8, 2018  |
| 9.2     | September 10, 2012 | November 9, 2017  |
| 9.1     | September 12, 2011 | October 27, 2016  |
3. Get system info
-- Server IP, port, current database, current user, backend PID, start time, and uptime
SELECT Inet_server_addr()         AS Server_IP, 
       Inet_server_port()         AS Server_Port, 
       Current_database()         AS Current_Database, 
       current_user               AS Current_User, 
       Pg_backend_pid()           AS ProcessID, 
       Pg_postmaster_start_time() AS Server_Start_Time, 
       current_timestamp :: TIMESTAMP - Pg_postmaster_start_time() :: TIMESTAMP AS Running_Since; 
4. Get details of postgres configuration parameters
-- Option 1: PG_SETTINGS
-- This gives you a lot of useful info about postgres instance
SELECT name, unit, setting FROM pg_settings WHERE name ='port'                  
UNION ALL
SELECT name, unit, setting FROM pg_settings WHERE name ='shared_buffers'        -- shared_buffers determines how much memory is dedicated for caching data
UNION ALL
SELECT name, unit, setting FROM pg_settings WHERE name ='work_mem'              -- work memory required for each incoming connection
UNION ALL
SELECT name, unit, setting FROM pg_settings WHERE name ='maintenance_work_mem'  -- work memory for maintenance-type queries (VACUUM, CREATE INDEX, etc.)
UNION ALL
SELECT name, unit, setting FROM pg_settings WHERE name ='wal_buffers'           -- Sets the number of disk-page buffers in shared memory for WAL
UNION ALL           
SELECT name, unit, setting FROM pg_settings WHERE name ='effective_cache_size'  -- used by postgres query planner
UNION ALL
SELECT name, unit, setting FROM pg_settings WHERE name ='TimeZone';             -- server time zone

-- Option 2: SHOW ALL
-- The SHOW ALL command displays all current configuration settings in three columns
SHOW all;

-- Option 3: PG_FILE_SETTINGS
-- To read what is stored in the postgresql.conf file itself, use the view pg_file_settings.
TABLE pg_file_settings ;

5. Get OS information
-- Get OS Version
SELECT version();

| OS      | Wiki References                                       |
| ------- | ----------------------------------------------------- |
| RedHat  | wikipedia.org/wiki/Red_Hat_Enterprise_Linux           |
| Windows | wikipedia.org/wiki/List_of_Microsoft_Windows_versions |
| Mac OS  | wikipedia.org/wiki/MacOS                              |
| Ubuntu  | wikipedia.org/wiki/Ubuntu_version_history             |
6. Get location of data directory (this is where postgres stores the database files)
SELECT NAME, 
       setting 
FROM   pg_settings 
WHERE  NAME = 'data_directory'; 
--OR 
SHOW data_directory;
7. List all databases along with creation date
SELECT datname AS database_name, 
       (Pg_stat_file('base/' 
              ||oid 
              ||'/PG_VERSION')).modification as create_timestamp 
FROM   pg_database 
WHERE  datistemplate = false;
8. Get an overview of current server activity
SELECT
    pid
    , datname
    , usename
    , application_name
    , client_addr
    , to_char(backend_start, 'YYYY-MM-DD HH24:MI:SS TZ') AS backend_start
    , state
    , wait_event_type || ': ' || wait_event AS wait_event
    , pg_blocking_pids(pid) AS blocking_pids
    , query
    , to_char(state_change, 'YYYY-MM-DD HH24:MI:SS TZ') AS state_change
    , to_char(query_start, 'YYYY-MM-DD HH24:MI:SS TZ') AS query_start
    , backend_type
FROM
    pg_stat_activity
ORDER BY pid;

CONNECTION DETAILS

8. Get max_connections configuration
SELECT NAME, 
       setting, 
       short_desc 
FROM   pg_settings 
WHERE  NAME = 'max_connections';
9. Get total count of current user connections
SELECT Count(*) 
FROM   pg_stat_activity; 
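Combining this with the max_connections setting gives a quick view of connection headroom (a small sketch):

-- Current connections vs. the configured maximum
SELECT (SELECT Count(*) FROM pg_stat_activity) AS current_connections, 
       setting::int                            AS max_connections 
FROM   pg_settings 
WHERE  name = 'max_connections'; 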
10. Get active v/s inactive connections
SELECT state, 
       Count(pid) 
FROM   pg_stat_activity 
GROUP  BY state, 
          datname 
HAVING datname = '<your_database_name>' 
ORDER  BY Count(pid) DESC; 

-- One row per server process, showing database OID, database name, process ID, user OID, user name, current query, query's waiting status, time at which the current query began execution
-- Time at which the process was started, and client's address and port number. The columns that report data on the current query are available unless the parameter stats_command_string has been turned off.
-- Furthermore, these columns are only visible if the user examining the view is a superuser or the same as the user owning the process being reported on

Database specific queries

**** Switch to a user database that you are interested in *****
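If you are in psql, switching is a one-liner (the database name below is a placeholder):

-- psql meta-command to connect to the database you want to inspect
\c <your_database_name>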

11. Get database current size (pretty size)
SELECT Current_database(), 
       Pg_size_pretty(Pg_database_size(Current_database())); 
12. Get top 20 objects in database by size
SELECT nspname                                        AS schemaname, 
       cl.relname                                     AS objectname, 
       CASE relkind 
         WHEN 'r' THEN 'table' 
         WHEN 'i' THEN 'index' 
         WHEN 'S' THEN 'sequence' 
         WHEN 'v' THEN 'view' 
         WHEN 'm' THEN 'materialized view' 
         ELSE 'other' 
       end                                            AS type, 
       s.n_live_tup                                   AS total_rows, 
       Pg_size_pretty(Pg_total_relation_size(cl.oid)) AS size 
FROM   pg_class cl 
       LEFT JOIN pg_namespace n 
              ON ( n.oid = cl.relnamespace ) 
       LEFT JOIN pg_stat_user_tables s 
              ON ( s.relid = cl.oid ) 
WHERE  nspname NOT IN ( 'pg_catalog', 'information_schema' ) 
       AND cl.relkind <> 'i' 
       AND nspname !~ '^pg_toast' 
ORDER  BY Pg_total_relation_size(cl.oid) DESC 
LIMIT  20; 
13. Get size of all tables
SELECT *, 
       Pg_size_pretty(total_bytes) AS total, 
       Pg_size_pretty(index_bytes) AS INDEX, 
       Pg_size_pretty(toast_bytes) AS toast, 
       Pg_size_pretty(table_bytes) AS TABLE 
FROM   (SELECT *, 
               total_bytes - index_bytes - COALESCE(toast_bytes, 0) AS 
               table_bytes 
        FROM   (SELECT c.oid, 
                       nspname                               AS table_schema, 
                       relname                               AS TABLE_NAME, 
                       c.reltuples                           AS row_estimate, 
                       Pg_total_relation_size(c.oid)         AS total_bytes, 
                       Pg_indexes_size(c.oid)                AS index_bytes, 
                       Pg_total_relation_size(reltoastrelid) AS toast_bytes 
                FROM   pg_class c 
                       LEFT JOIN pg_namespace n 
                              ON n.oid = c.relnamespace 
                WHERE  relkind = 'r') a) a; 
14. Get table metadata
SELECT relname, 
       relpages, 
       reltuples, 
       relallvisible, 
       relkind, 
       relnatts, 
       relhassubclass, 
       reloptions, 
       Pg_table_size(oid) 
FROM   pg_class 
WHERE  relname = '<table_name_here>'; 
15. Get table structure (i.e. describe table)
SELECT column_name, 
       data_type, 
       character_maximum_length 
FROM   information_schema.columns 
WHERE  table_name = '<table_name_here>'; 
						  
-- Does the table have anything unusual about it?
-- a. contains large objects
-- b. has a large proportion of NULLs in several columns
-- c. receives a large number of UPDATEs or DELETEs regularly
-- d. is growing rapidly
-- e. has many indexes on it
-- f. uses triggers that may be executing database functions, or is calling functions directly

LOCKING

16. Get Lock connection count
SELECT Count(DISTINCT pid) AS count 
FROM   pg_locks 
WHERE  NOT granted; 
17. Get locks_relation_count
SELECT   relation::regclass  AS relname , 
         count(DISTINCT pid) AS count 
FROM     pg_locks 
WHERE    NOT granted 
GROUP BY 1;
18. Get locks_statement_duration
SELECT a.query                                     AS blocking_statement, 
       Extract('epoch' FROM Now() - a.query_start) AS blocking_duration 
FROM   pg_locks bl 
       JOIN pg_stat_activity a 
         ON a.pid = bl.pid 
WHERE  NOT bl.granted; 

INDEXING

19. Get missing indexes
SELECT 
	relname AS TableName
	,seq_scan-idx_scan AS TotalSeqScan
	,CASE WHEN seq_scan-idx_scan > 0 
		THEN 'Missing Index Found' 
		ELSE 'Missing Index Not Found' 
	END AS MissingIndex
	,pg_size_pretty(pg_relation_size(relname::regclass)) AS TableSize
	,idx_scan AS TotalIndexScan
FROM pg_stat_all_tables
WHERE schemaname='public'
	AND pg_relation_size(relname::regclass)>100000 
		ORDER BY 2 DESC;
20. Get Unused Indexes
SELECT indexrelid::regclass AS INDEX , 
       relid::regclass      AS TABLE , 
       'DROP INDEX ' 
              || indexrelid::regclass 
              || ';' AS drop_statement 
FROM   pg_stat_user_indexes 
JOIN   pg_index 
using  (indexrelid) 
WHERE  idx_scan = 0 
AND    indisunique IS false;
21. Get index usage stats
SELECT t.tablename                                                         AS 
       "relation", 
       indexname, 
       c.reltuples                                                         AS 
       num_rows, 
       Pg_size_pretty(Pg_relation_size(Quote_ident(t.tablename) :: text))  AS 
       table_size, 
       Pg_size_pretty(Pg_relation_size(Quote_ident(indexrelname) :: text)) AS 
       index_size, 
       idx_scan                                                            AS 
       number_of_scans, 
       idx_tup_read                                                        AS 
       tuples_read, 
       idx_tup_fetch                                                       AS 
       tuples_fetched 
FROM   pg_tables t 
       left outer join pg_class c 
                    ON t.tablename = c.relname 
       left outer join (SELECT c.relname   AS ctablename, 
                               ipg.relname AS indexname, 
                               x.indnatts  AS number_of_columns, 
                               idx_scan, 
                               idx_tup_read, 
                               idx_tup_fetch, 
                               indexrelname, 
                               indisunique 
                        FROM   pg_index x 
                               join pg_class c 
                                 ON c.oid = x.indrelid 
                               join pg_class ipg 
                                 ON ipg.oid = x.indexrelid 
                               join pg_stat_all_indexes psai 
                                 ON x.indexrelid = psai.indexrelid) AS foo 
                    ON t.tablename = foo.ctablename 
WHERE  t.schemaname = 'public' 
ORDER  BY 1, 2;

QUERY PERFORMANCE

22. Get top 5 costly queries (by total execution time)
SELECT   r.rolname, 
         Round((100 * total_time / Sum(total_time::numeric) OVER ())::numeric, 2) AS percentage_cpu ,
         Round(total_time::numeric, 2)                                            AS total_time, 
         calls, 
         Round(mean_time::numeric, 2) AS mean, 
         Substring(query, 1, 800)     AS short_query 
FROM     pg_stat_statements 
JOIN     pg_roles r 
ON       r.oid = userid 
ORDER BY total_time DESC limit 5;
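Note: this query relies on the pg_stat_statements extension, which does not collect data out of the box. If the query errors out, the setup is roughly as follows (a sketch; changing shared_preload_libraries requires a server restart, and on PostgreSQL 13+ the total_time/mean_time columns are reported as total_exec_time/mean_exec_time):

-- One-time setup for pg_stat_statements (restart required after the ALTER SYSTEM)
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';
-- ...restart the server, then in the database you want to track:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;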

CACHING

23. Get TOP cached tables
SELECT relname 
       AS 
       "relation", 
       heap_blks_read 
       AS heap_read, 
       heap_blks_hit 
       AS heap_hit, 
       ( ( heap_blks_hit * 100 ) / NULLIF(( heap_blks_hit + heap_blks_read ), 0) 
       ) AS 
       ratio 
FROM   pg_statio_user_tables; 
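The same idea applies to indexes; a companion query against pg_statio_user_indexes (a sketch, using the same hit-ratio math as above):

-- Index cache hit ratio per index
SELECT relname       AS "relation", 
       indexrelname  AS index_name, 
       idx_blks_read AS idx_read, 
       idx_blks_hit  AS idx_hit, 
       ( ( idx_blks_hit * 100 ) / NULLIF(( idx_blks_hit + idx_blks_read ), 0) ) AS ratio 
FROM   pg_statio_user_indexes; 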

AUTOVACUUM

24. Last autovacuum run time
SELECT relname                                                  AS "relation", 
       Extract (epoch FROM CURRENT_TIMESTAMP - last_autovacuum) AS since_last_av 
       , 
       autovacuum_count                                         AS 
       av_count, 
       n_tup_ins, 
       n_tup_upd, 
       n_tup_del, 
       n_live_tup, 
       n_dead_tup 
FROM   pg_stat_all_tables 
WHERE  schemaname = 'public' 
ORDER  BY relname; 
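If tables are not being vacuumed when you expect, it is also worth checking the autovacuum configuration itself (a small sketch against pg_settings):

-- Current autovacuum-related settings
SELECT name, 
       setting, 
       unit, 
       short_desc 
FROM   pg_settings 
WHERE  name LIKE 'autovacuum%' 
ORDER  BY name; 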

PARTITIONING

25. List all table partitions (as parent/child relationship)
SELECT nmsp_parent.nspname AS parent_schema, 
       parent.relname      AS parent, 
       child.relname       AS child, 
       CASE child.relkind 
         WHEN 'r' THEN 'table' 
         WHEN 'i' THEN 'index' 
         WHEN 'S' THEN 'sequence' 
         WHEN 'v' THEN 'view' 
         WHEN 'm' THEN 'materialized view' 
         ELSE 'other' 
       END                 AS type, 
       s.n_live_tup        AS total_rows 
FROM   pg_inherits 
       JOIN pg_class parent 
         ON pg_inherits.inhparent = parent.oid 
       JOIN pg_class child 
         ON pg_inherits.inhrelid = child.oid 
       JOIN pg_namespace nmsp_parent 
         ON nmsp_parent.oid = parent.relnamespace 
       JOIN pg_namespace nmsp_child 
         ON nmsp_child.oid = child.relnamespace 
       JOIN pg_stat_user_tables s 
         ON s.relid = child.oid 
WHERE  child.relkind = 'r' 
ORDER  BY parent, 
          child; 
26. Postgres 12 – pg_partition_tree()

Alternatively, you can use the new PG12 function pg_partition_tree() to display information about partitions.

SELECT relid, 
       parentrelid, 
       isleaf, 
       level 
FROM   Pg_partition_tree('<parent_table_name>'); 

Roles and Privileges

27. Check if the connected user is a “superuser”
SELECT usesuper 
FROM   pg_user 
WHERE  usename = CURRENT_USER; 
28. List all users (along with their role attributes) in the current database
SELECT usename AS role_name, 
       CASE 
         WHEN usesuper 
              AND usecreatedb THEN Cast('superuser, create database' AS 
                                   pg_catalog.TEXT) 
         WHEN usesuper THEN Cast('superuser' AS pg_catalog.TEXT) 
         WHEN usecreatedb THEN Cast('create database' AS pg_catalog.TEXT) 
         ELSE Cast('' AS pg_catalog.TEXT) 
       END     role_attributes 
FROM   pg_catalog.pg_user 
ORDER  BY role_name DESC;