PostgreSQL Table Partitioning Part I – Implementation Using Inheritance

In earlier PostgreSQL versions, it was not possible to declare table partitions syntactically; instead, partitioning was implemented using table inheritance. The inheritance approach involves creating a single parent table and multiple child tables (a.k.a. partitions), each holding the data for one partition range.

In this post, I’ll discuss the implementation of table partitions using inheritance. Before proceeding, though, let’s first understand why we need partitioning.

Why Partition?

The simple answer is to improve the scalability and manageability of large data sets and tables that have varying access patterns.

Typically, you create tables to store information about an entity, such as customers or sales, and each table has attributes that describe only that entity. While a single table for each entity is the easiest to design and understand, these tables are not necessarily optimized for performance, scalability, and manageability, particularly as the table grows larger.

How can partitioning help? 

  1. When tables and indexes become very large, partitioning helps by dividing the data into smaller, more manageable sections.
  2. It speeds up loading and archiving of data, lets you perform maintenance operations on individual partitions instead of the whole table, and in turn improves query performance.

There is a ton of information published on partitioning, but if you’re new to partitioning in PostgreSQL, nothing beats a hands-on walkthrough.

Now that we have some insight into what table partitioning is, let’s build a real partitioned table using the scripts below.

How to Partition a Table?

-- Step 1.
-- CREATE PARENT & CHILD TABLES

CREATE TABLE parent (
    id    int,
    col_a varchar,
    col_b varchar
);
--Child Table 1
CREATE TABLE range1() INHERITS (parent);
--Child Table 2
CREATE TABLE range2() INHERITS (parent);
--Child Table 3
CREATE TABLE range3() INHERITS (parent);
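One caveat worth noting up front: for the planner to skip irrelevant children later (Step 5 relies on constraint exclusion), each child table needs a CHECK constraint describing its range. A minimal sketch, matching the ranges used in Step 2 (the constraint names are my own):

-- Without these CHECK constraints, every child is scanned regardless of the WHERE clause
ALTER TABLE range1 ADD CONSTRAINT range1_id_check CHECK (id < 10);
ALTER TABLE range2 ADD CONSTRAINT range2_id_check CHECK (id >= 10 AND id < 20);
ALTER TABLE range3 ADD CONSTRAINT range3_id_check CHECK (id >= 20 AND id < 30);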

Let’s review the schema now

-- Step 2.
-- CREATE PARTITION FUNCTION THAT WILL HOLD THE LOGIC OF THE ON-INSERT TRIGGER

CREATE OR REPLACE FUNCTION partition_parent() RETURNS trigger AS $$
BEGIN
    IF (NEW.id < 10) THEN
        INSERT INTO range1 VALUES (NEW.*);
    ELSIF (NEW.id >= 10 AND NEW.id < 20) THEN
        INSERT INTO range2 VALUES (NEW.*);
    ELSIF (NEW.id >= 20 AND NEW.id < 30) THEN
        INSERT INTO range3 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'out of range';
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

— Step 3.
— CREATE TRIGGER AND ATTACH IT TO THE PARENT TABLE TO BE PARTITIONED

CREATE TRIGGER partition_parent_trigger
    BEFORE INSERT ON parent
    FOR EACH ROW EXECUTE PROCEDURE partition_parent();

-- Step 4.
-- INSERT SAMPLE ROWS

DO $$
BEGIN
    FOR i IN 1..29 LOOP
        INSERT INTO parent(id, col_a, col_b) VALUES (i, 'a', 'b');
    END LOOP;
END $$;

-- Optional step (stats update)

ANALYZE parent;
-- Step 5.
-- TEST PARTITION ELIMINATION (PRUNING)

EXPLAIN ANALYZE
SELECT * from parent
   WHERE id = 5;

Query Plan 
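If the child tables carry the CHECK constraints sketched in Step 1, the plan should show only range1 being scanned. That pruning depends on the standard constraint_exclusion parameter; a quick sanity check:

-- 'partition' (the default) prunes inheritance children; 'on' applies exclusion to all queries
SHOW constraint_exclusion;
SET constraint_exclusion = partition;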

In the next post in this series, I’ll discuss the new ‘Declarative Partitioning’ implementation.

Thanks for reading!

PostgreSQL-Diagnostic-Queries – Dec 2020

psql queries to quickly identify and resolve database performance problems

As a seasoned data store engineer, I often find myself in situations where a production application is down due to some sort of performance issue and I am asked, “What’s wrong with the database?”. In almost all of these situations, the database (along with the DBA) is automatically considered guilty until proven innocent. As a DBA, I need the tools and knowledge to quickly determine the actual problem, if there is one, because maybe there’s nothing wrong with the database or the database server at all. My favorite approach is to start with data-driven performance analysis using the PostgreSQL system catalogs.

In this post, I am sharing a bunch of PostgreSQL system catalog queries that can be used to troubleshoot database engine performance.

The Postgres system catalogs are where the database management system stores schema metadata, such as information about tables and columns, along with internal bookkeeping information. PostgreSQL’s system catalogs are regular tables.

Instance level queries

1. Get server IP address, version and port number

-- query server version (standard major.minor.patch format) 
SELECT Inet_server_addr() AS "Server IP", 
       Version()          AS "Postgres Version", 
       setting            AS "Port Number", 
       current_timestamp :: timestamp 
FROM   pg_settings 
WHERE  name = 'port'; 
2. Get server version
SHOW server_version;
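For scripting, the numeric form is often handier; a small variant using the standard server_version_num setting:

-- e.g. returns 130001 for version 13.1 (easy to compare numerically)
SELECT current_setting('server_version_num');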

-- release history
| Version | First Release      | Final Release     |
| ------- | ------------------ | ----------------- |
| 13      | September 24, 2020 | November 13, 2025 |
| 12      | October 3, 2019    | November 14, 2024 |
| 11      | October 18, 2018   | November 9, 2023  |
| 10      | October 5, 2017    | November 10, 2022 |
| 9.6     | September 29, 2016 | November 11, 2021 |
| 9.5     | January 7, 2016    | February 11, 2021 |
| 9.4     | December 18, 2014  | February 13, 2020 |
| 9.3     | September 9, 2013  | November 8, 2018  |
| 9.2     | September 10, 2012 | November 9, 2017  |
| 9.1     | September 12, 2011 | October 27, 2016  |
3. Get system info
-- Server identity and uptime
SELECT inet_server_addr()         AS server_ip,          -- server IP address
       inet_server_port()         AS server_port,        -- server port
       current_database()         AS current_database,   -- current database
       current_user               AS connected_user,     -- current user
       pg_backend_pid()           AS process_id,         -- current backend PID
       pg_postmaster_start_time() AS server_start_time,  -- last start time
       current_timestamp::timestamp - pg_postmaster_start_time()::timestamp
                                  AS running_since;
4. Get details of postgres configuration parameters
-- Option 1: PG_SETTINGS
-- This gives you a lot of useful info about postgres instance
SELECT name, unit, setting FROM pg_settings WHERE name ='port'                  
UNION ALL
SELECT name, unit, setting FROM pg_settings WHERE name ='shared_buffers'        -- shared_buffers determines how much memory is dedicated for caching data
UNION ALL
SELECT name, unit, setting FROM pg_settings WHERE name ='work_mem'              -- memory per sort/hash operation (a single query can use several multiples of this)
UNION ALL
SELECT name, unit, setting FROM pg_settings WHERE name ='maintenance_work_mem'  -- work memory for maintenance-type operations (VACUUM, CREATE INDEX, etc.)
UNION ALL
SELECT name, unit, setting FROM pg_settings WHERE name ='wal_buffers'           -- Sets the number of disk-page buffers in shared memory for WAL
UNION ALL           
SELECT name, unit, setting FROM pg_settings WHERE name ='effective_cache_size'  -- used by postgres query planner
UNION ALL
SELECT name, unit, setting FROM pg_settings WHERE name ='TimeZone';             -- server time zone

-- Option 2: SHOW ALL
-- The SHOW ALL command displays all current configuration settings in three columns
SHOW all;

-- Option 3: PG_FILE_SETTINGS
-- To read what is stored in the postgresql.conf file itself, use the view pg_file_settings.
TABLE pg_file_settings;

5. Get OS information
-- Get OS Version
SELECT version();

| OS      | Wiki Reference                                        |
| ------- | ----------------------------------------------------- |
| RedHat  | wikipedia.org/wiki/Red_Hat_Enterprise_Linux           |
| Windows | wikipedia.org/wiki/List_of_Microsoft_Windows_versions |
| Mac OS  | wikipedia.org/wiki/MacOS                              |
| Ubuntu  | wikipedia.org/wiki/Ubuntu_version_history             |
6. Get location of data directory (this is where postgres stores the database files)
SELECT NAME, 
       setting 
FROM   pg_settings 
WHERE  NAME = 'data_directory'; 
--OR 
SHOW data_directory;
7. List all databases along with creation date
SELECT datname AS database_name, 
       (Pg_stat_file('base/' 
              ||oid 
              ||'/PG_VERSION')).modification as create_timestamp 
FROM   pg_database 
WHERE  datistemplate = false;
8. Get an overview of current server activity
SELECT
    pid
    , datname
    , usename
    , application_name
    , client_addr
    , to_char(backend_start, 'YYYY-MM-DD HH24:MI:SS TZ') AS backend_start
    , state
    , wait_event_type || ': ' || wait_event AS wait_event
    , pg_blocking_pids(pid) AS blocking_pids
    , query
    , to_char(state_change, 'YYYY-MM-DD HH24:MI:SS TZ') AS state_change
    , to_char(query_start, 'YYYY-MM-DD HH24:MI:SS TZ') AS query_start
    , backend_type
FROM
    pg_stat_activity
ORDER BY pid;
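A follow-up I often run against the same view: filter it down to just the long-running sessions (the 5-minute threshold below is an arbitrary choice of mine):

-- Active sessions running longer than 5 minutes
SELECT pid,
       now() - query_start AS runtime,
       state,
       query
FROM   pg_stat_activity
WHERE  state <> 'idle'
  AND  query_start < now() - interval '5 minutes'
ORDER  BY runtime DESC;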

CONNECTION DETAILS

9. Get max_connections configuration
SELECT NAME, 
       setting, 
       short_desc 
FROM   pg_settings 
WHERE  NAME = 'max_connections';
10. Get total count of current user connections
SELECT Count(*) 
FROM   pg_stat_activity; 
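To put that count in context, compare it against max_connections in one query; a sketch:

-- Connection headroom at a glance
SELECT count(*)                                 AS current_connections,
       current_setting('max_connections')::int  AS max_connections,
       round(100.0 * count(*) / current_setting('max_connections')::int, 1) AS pct_used
FROM   pg_stat_activity;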
11. Get active vs. inactive connections
SELECT state, 
       Count(pid) 
FROM   pg_stat_activity 
WHERE  datname = '<your_database_name>' 
GROUP  BY state 
ORDER  BY Count(pid) DESC; 

-- pg_stat_activity shows one row per server process: database OID and name, process ID, user OID and name,
-- the current query, its wait status, the time the current query began execution, the time the process was started,
-- and the client's address and port number. The query columns are populated only when track_activities is on (the default),
-- and are visible only to superusers and to the user that owns the process being reported on.

Database specific queries

***** Switch to the user database that you are interested in *****

12. Get database current size (pretty size)
SELECT Current_database(), 
       Pg_size_pretty(Pg_database_size(Current_database())); 
13. Get top 20 objects in database by size
SELECT nspname                                        AS schemaname, 
       cl.relname                                     AS objectname, 
       CASE relkind 
         WHEN 'r' THEN 'table' 
         WHEN 'i' THEN 'index' 
         WHEN 'S' THEN 'sequence' 
         WHEN 'v' THEN 'view' 
         WHEN 'm' THEN 'materialized view' 
         ELSE 'other' 
       end                                            AS type, 
       s.n_live_tup                                   AS total_rows, 
       Pg_size_pretty(Pg_total_relation_size(cl.oid)) AS size 
FROM   pg_class cl 
       LEFT JOIN pg_namespace n 
              ON ( n.oid = cl.relnamespace ) 
       LEFT JOIN pg_stat_user_tables s 
              ON ( s.relid = cl.oid ) 
WHERE  nspname NOT IN ( 'pg_catalog', 'information_schema' ) 
       AND cl.relkind <> 'i' 
       AND nspname !~ '^pg_toast' 
ORDER  BY Pg_total_relation_size(cl.oid) DESC 
LIMIT  20; 
14. Get size of all tables
SELECT *, 
       Pg_size_pretty(total_bytes) AS total, 
       Pg_size_pretty(index_bytes) AS INDEX, 
       Pg_size_pretty(toast_bytes) AS toast, 
       Pg_size_pretty(table_bytes) AS TABLE 
FROM   (SELECT *, 
               total_bytes - index_bytes - COALESCE(toast_bytes, 0) AS 
               table_bytes 
        FROM   (SELECT c.oid, 
                       nspname                               AS table_schema, 
                       relname                               AS TABLE_NAME, 
                       c.reltuples                           AS row_estimate, 
                       Pg_total_relation_size(c.oid)         AS total_bytes, 
                       Pg_indexes_size(c.oid)                AS index_bytes, 
                       Pg_total_relation_size(reltoastrelid) AS toast_bytes 
                FROM   pg_class c 
                       LEFT JOIN pg_namespace n 
                              ON n.oid = c.relnamespace 
                WHERE  relkind = 'r') a) a; 
15. Get table metadata
SELECT relname, 
       relpages, 
       reltuples, 
       relallvisible, 
       relkind, 
       relnatts, 
       relhassubclass, 
       reloptions, 
       Pg_table_size(oid) 
FROM   pg_class 
WHERE  relname = '<table_name_here>'; 
16. Get table structure (i.e. describe table)
SELECT column_name, 
       data_type, 
       character_maximum_length 
FROM   information_schema.columns 
WHERE  table_name = '<table_name_here>'; 
-- Does the table have anything unusual about it?
-- a. contains large objects
-- b. has a large proportion of NULLs in several columns
-- c. receives a large number of UPDATEs or DELETEs regularly
-- d. is growing rapidly
-- e. has many indexes on it
-- f. uses triggers that may be executing database functions, or is calling functions directly

LOCKING

17. Get count of sessions waiting on locks
SELECT Count(DISTINCT pid) AS count 
FROM   pg_locks 
WHERE  NOT granted; 
18. Get lock waits per relation
SELECT   relation::regclass  AS relname , 
         count(DISTINCT pid) AS count 
FROM     pg_locks 
WHERE    NOT granted 
GROUP BY 1;
19. Get waiting statements and their duration
SELECT a.query                                     AS waiting_statement, 
       Extract('epoch' FROM Now() - a.query_start) AS waiting_duration 
FROM   pg_locks bl 
       JOIN pg_stat_activity a 
         ON a.pid = bl.pid 
WHERE  NOT bl.granted; 
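On PostgreSQL 9.6 and later, pg_blocking_pids() pairs each waiting session directly with the sessions blocking it, which is usually the fastest way to find the culprit; a sketch:

-- Who is blocked, and by whom
SELECT a.pid                   AS waiting_pid,
       pg_blocking_pids(a.pid) AS blocked_by,
       a.query                 AS waiting_query
FROM   pg_stat_activity a
WHERE  cardinality(pg_blocking_pids(a.pid)) > 0;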

INDEXING

20. Get possibly missing indexes
SELECT 
	relname AS TableName
	,seq_scan-idx_scan AS TotalSeqScan
	,CASE WHEN seq_scan-idx_scan > 0 
		THEN 'Missing Index Found' 
		ELSE 'Missing Index Not Found' 
	END AS MissingIndex
	,pg_size_pretty(pg_relation_size(relname::regclass)) AS TableSize
	,idx_scan AS TotalIndexScan
FROM pg_stat_all_tables
WHERE schemaname = 'public'
	AND pg_relation_size(relname::regclass) > 100000
ORDER BY 2 DESC;
21. Get unused indexes
SELECT indexrelid::regclass AS INDEX , 
       relid::regclass      AS TABLE , 
       'DROP INDEX ' 
              || indexrelid::regclass 
              || ';' AS drop_statement 
FROM   pg_stat_user_indexes 
JOIN   pg_index 
using  (indexrelid) 
WHERE  idx_scan = 0 
AND    indisunique IS false;
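A word of caution before running any of those DROP statements: idx_scan counters are cumulative since the last statistics reset, so make sure they have been accumulating long enough to be meaningful:

-- When were the usage counters last reset?
SELECT datname, stats_reset
FROM   pg_stat_database
WHERE  datname = current_database();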
22. Get index usage stats
SELECT t.tablename                                                         AS 
       "relation", 
       indexname, 
       c.reltuples                                                         AS 
       num_rows, 
       Pg_size_pretty(Pg_relation_size(Quote_ident(t.tablename) :: text))  AS 
       table_size, 
       Pg_size_pretty(Pg_relation_size(Quote_ident(indexrelname) :: text)) AS 
       index_size, 
       idx_scan                                                            AS 
       number_of_scans, 
       idx_tup_read                                                        AS 
       tuples_read, 
       idx_tup_fetch                                                       AS 
       tuples_fetched 
FROM   pg_tables t 
       left outer join pg_class c 
                    ON t.tablename = c.relname 
       left outer join (SELECT c.relname   AS ctablename, 
                               ipg.relname AS indexname, 
                               x.indnatts  AS number_of_columns, 
                               idx_scan, 
                               idx_tup_read, 
                               idx_tup_fetch, 
                               indexrelname, 
                               indisunique 
                        FROM   pg_index x 
                               join pg_class c 
                                 ON c.oid = x.indrelid 
                               join pg_class ipg 
                                 ON ipg.oid = x.indexrelid 
                               join pg_stat_all_indexes psai 
                                 ON x.indexrelid = psai.indexrelid) AS foo 
                    ON t.tablename = foo.ctablename 
WHERE  t.schemaname = 'public' 
ORDER  BY 1, 2;

QUERY PERFORMANCE

23. Get top 10 costly queries
SELECT   r.rolname, 
         Round((100 * total_time / Sum(total_time::numeric) OVER ())::numeric, 2) AS percentage_cpu ,
         Round(total_time::numeric, 2)                                            AS total_time, 
         calls, 
         Round(mean_time::numeric, 2) AS mean, 
         Substring(query, 1, 800)     AS short_query 
FROM     pg_stat_statements 
JOIN     pg_roles r 
ON       r.oid = userid 
ORDER BY total_time DESC LIMIT 10;
-- note: on PostgreSQL 13+, the total_time / mean_time columns are named total_exec_time / mean_exec_time
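This query depends on the pg_stat_statements extension, which must be preloaded and created before it returns anything; a minimal setup sketch:

-- postgresql.conf (requires a restart):
--   shared_preload_libraries = 'pg_stat_statements'
-- then, once per database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;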

CACHING

24. Get top cached tables
SELECT relname 
       AS 
       "relation", 
       heap_blks_read 
       AS heap_read, 
       heap_blks_hit 
       AS heap_hit, 
       ( ( heap_blks_hit * 100 ) / NULLIF(( heap_blks_hit + heap_blks_read ), 0) 
       ) AS 
       ratio 
FROM   pg_statio_user_tables; 
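For an instance-wide view of the same idea, the per-database counters can be rolled up into a single cache hit ratio (the ~99% target often quoted is a rule of thumb, not a hard number):

-- Overall buffer cache hit percentage across all databases
SELECT round(sum(blks_hit) * 100.0 / NULLIF(sum(blks_hit) + sum(blks_read), 0), 2) AS cache_hit_pct
FROM   pg_stat_database;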

AUTOVACUUM & DATA BLOAT

25. Last autovacuum, live & dead tuples
SELECT relname                                                  AS "relation", 
       Extract (epoch FROM CURRENT_TIMESTAMP - last_autovacuum) AS since_last_av 
       , 
       autovacuum_count                                         AS 
       av_count, 
       n_tup_ins, 
       n_tup_upd, 
       n_tup_del, 
       n_live_tup, 
       n_dead_tup 
FROM   pg_stat_all_tables 
WHERE  schemaname = 'public' 
ORDER  BY relname; 
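To spot tables where autovacuum may be falling behind, the same view can rank tables by their dead-tuple ratio (a rough bloat indicator; the thresholds are my own):

-- Tables with the highest share of dead tuples
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct
FROM   pg_stat_all_tables
WHERE  schemaname = 'public'
  AND  n_dead_tup > 0
ORDER  BY dead_pct DESC
LIMIT  10;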

PARTITIONING

26. List all table partitions (as parent/child relationships)
SELECT nmsp_parent.nspname AS parent_schema, 
       parent.relname      AS parent, 
       child.relname       AS child, 
       CASE child.relkind 
         WHEN 'r' THEN 'table' 
         WHEN 'i' THEN 'index' 
         WHEN 'S' THEN 'sequence' 
         WHEN 'v' THEN 'view' 
         WHEN 'm' THEN 'materialized view' 
         ELSE 'other' 
       END                 AS type, 
       s.n_live_tup        AS total_rows 
FROM   pg_inherits 
       JOIN pg_class parent 
         ON pg_inherits.inhparent = parent.oid 
       JOIN pg_class child 
         ON pg_inherits.inhrelid = child.oid 
       JOIN pg_namespace nmsp_parent 
         ON nmsp_parent.oid = parent.relnamespace 
       JOIN pg_namespace nmsp_child 
         ON nmsp_child.oid = child.relnamespace 
       JOIN pg_stat_user_tables s 
         ON s.relid = child.oid 
WHERE  child.relkind = 'r' 
ORDER  BY parent, 
          child; 
27. Postgres 12 – pg_partition_tree()

Alternatively, you can use the new PG12 function pg_partition_tree() to display information about partitions.

SELECT relid, 
       parentrelid, 
       isleaf, 
       level 
FROM   Pg_partition_tree('<parent_table_name>'); 

Roles and Privileges

28. Check whether the connected user is a superuser
SELECT usesuper 
FROM   pg_user 
WHERE  usename = CURRENT_USER; 
29. List all users (along with assigned roles) in the current database
SELECT usename AS role_name, 
       CASE 
         WHEN usesuper 
              AND usecreatedb THEN Cast('superuser, create database' AS 
                                   pg_catalog.TEXT) 
         WHEN usesuper THEN Cast('superuser' AS pg_catalog.TEXT) 
         WHEN usecreatedb THEN Cast('create database' AS pg_catalog.TEXT) 
         ELSE Cast('' AS pg_catalog.TEXT) 
       END     role_attributes 
FROM   pg_catalog.pg_user 
ORDER  BY role_name DESC; 

“Hit Refresh” Book Review – Transforming Microsoft with a growth mindset

“A mind needs books as a sword needs a whetstone, if it is to keep its edge.” – George R.R. Martin

Yes, I admit that I sort of dropped the ball on reading. What was more surprising is that all this while I did almost nothing to correct it. Fortunately, sanity prevailed, and my wife, who is herself a voracious reader and a writer (though her first masterpiece is still in the making), helped remind me of all the good things I was missing by not reading. So I picked up books again!

This blog is to share reviews of whatever little I’ve read and to learn from your thoughts.

So what is my latest read?

Hit Refresh: The Quest to Rediscover Microsoft’s Soul and Imagine a Better Future for Everyone – By Satya Nadella

First, as an ex-Microsoftie, I am absolutely in love with Microsoft, and I take great pride in talking about how the years I spent at MS helped shape my career. While some may disagree, I firmly believe that the impact Microsoft has had on our daily lives is beyond that of any other technology company. Not just through the software: Microsoft was clearly the first company to provide the platform, the tools and, more importantly, the inspiration to dream and design the digital world as we know it today. Over time, other tech companies came of age and outsmarted Microsoft in multiple key areas, including search, social and mobile. Then Satya Nadella came to the helm, and he embarked on a journey to rediscover the lost soul of the company and make it relevant (and cool) again.

‘Hit Refresh’ is primarily divided into three parts:

  1. The first is Satya’s personal story of coming from India and arriving at Microsoft in 1992
  2. The second is the story of Microsoft’s transformation and the steps Satya took to cultivate a ‘Growth Mindset’ in employees
  3. The third (and my favorite) part is where Satya talks about the disruptions and transformations that technology is going to create, which will overcome the existing limits of physics and chemistry and ultimately better mankind

The author is extremely honest and open in talking about the strengths of his competitors, and he acknowledges the need to find smart ways to partner with other companies so each can learn from the other’s perspective. There are multiple places in the book that made me realize Satya is a leader who has no fear of being authentic and vulnerable. He talks openly about his biggest fumble at the Grace Hopper conference, and about how, by working with ‘frenemies’, Microsoft created more growth opportunities, like Office for Mac!

The key takeaway the book offers its readers is to stay optimistic about what’s to come. As technology advances, the world is changing faster than ever. One definite way to master this change is to have a “Growth Mindset”.

“Don’t Be a Know-It-All, Be a Learn-It-All – Satya Nadella”

Learning beyond SQL…PostgreSQL – Indexes

It’s been some time since I’ve blogged, even though I’ve been reading a lot all this while. One of the reasons is that I couldn’t find enough compelling topics to write about and share. Microsoft has been moving fast with their SQL Server releases (2012, 2014, 2016… and we’re talking a Linux beta right now) and I’ve always been catching up.

However, between all this, something has changed. Due to the ever-growing buzz around open source, I haltingly started looking into PostgreSQL. Truth be told, I’m starting from ground zero (so nothing to lose) and will be writing on topics that might sound too simple for some of you; nevertheless, you may still find ’em helpful.

So starting with Indexes in PostgreSQL

PostgreSQL offers several index types:

  • B-tree
  • Hash
  • GiST and GIN

Each index type uses a different algorithm that is best suited to different types of queries. In this post we’ll talk about B-tree indices.

Why Index?

  • Speed up data retrievals
  • Indexes record the locations of values for the indexed column explicitly, which reduces data-retrieval time
  • Without indices, PostgreSQL performs sequential table scans to find data (this applies to SELECTs as well as DMLs)
  • B-tree index is sorted in ascending order by default

Hands-on

-- Create index syntax
CREATE INDEX name ON table USING btree (column);

-- Check the existing indices on a table (they can also be created implicitly by a PRIMARY KEY or UNIQUE constraint)
SELECT * FROM pg_indexes WHERE schemaname = 'public';
SELECT * FROM pg_stat_all_indexes WHERE schemaname NOT IN ('pg_catalog','pg_toast');

-- Query a table without a filter condition and get the query plan using EXPLAIN
EXPLAIN SELECT * FROM public.film;

Seq Scan on film  (cost=0.00..127.00 rows=2000 width=384)


-- Query a table with a filter condition and get the query plan using EXPLAIN
EXPLAIN SELECT title AS name, release_year AS year
FROM public.film
WHERE title IN ('Clones Pinocchio','Vanilla Day');

Index Scan using idx_title on film (cost=0.28..21.63 rows=4 width=19)
  Index Cond: ((title)::text = ANY ('{"Clones Pinocchio","Vanilla Day"}'::text[]))


Here, after the WHERE condition was specified, the Postgres planner (aka optimizer) decided to use the index instead of sequentially scanning the table. Postgres is able to find the targeted rows in the index and then fetch them from disk selectively.

It’s a very wide topic, so I’ll write more about indexes in future posts. To sum it up, though, PostgreSQL provides a lot of flexibility with B-tree indexes, so they can be tailored to your requirements and keep your queries snappy.
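As a teaser for that flexibility, here are two B-tree variants worth knowing, shown against the same sample film table (the index names are my own):

-- A partial index covers only the rows you query most often...
CREATE INDEX idx_film_recent ON film (release_year) WHERE release_year >= 2000;
-- ...and an expression index supports case-insensitive title lookups
CREATE INDEX idx_film_title_lower ON film (lower(title));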

2014 – My Year in Books 📚

“There is more treasure in books than in all the pirate’s loot on Treasure Island” – Walt Disney

2014 has been quite a year for me, both personally and professionally. I’ve never been much of a book reader, as books always reminded me of my school days, when reading a book meant compromising on all the other good things (some of my schoolmates can relate to that). However, things gradually changed (thanks to my wife), and I started to read, ending up with a year far richer than the one I started with.

While this is by any standard a tiny number of books to read in a year, I still decided to go ahead and blog about whatever little I’ve read and share my reviews. This, I believe, will motivate me to read more in 2015, and it could be of some help to my readers.

I mostly read on my Kindle (Fire HD) device, which makes reading portable, light and less expensive (Kindle-format books are much cheaper), though I still have love for the physical paper ones.

Tech reads


#1 Inside Microsoft SQL Server 2008 T-SQL Programming; Itzik Ben-Gan, Dejan Sarka; Coding & Technical; Read-Status: Completed;

Itzik Ben-Gan is unquestionably the greatest T-SQL Guru and this book is one of the best SQL references ever published on T-SQL. I’ve been working on databases since SQL 2000 and Oracle 8i days, and thought I knew ‘reasonably’ enough about T-SQL until I started reading this book.

The book reads more or less like a conversation between the author and the reader, to the point where at times it doesn’t feel like a technical read (when it really is). A few chapters look at SQL beyond the storage engine, exploring other areas like SSDT, CLR, hierarchical/geospatial data, Azure, xVelocity and more.

I would recommend this book any day for a SQL DBA trying to get a deeper handle on T-SQL code and beyond-relational topics. This one, however, is not for beginners, so I would recommend they first take a look at the 1st book in the series, “T-SQL Querying“.


#2 DBA Survivor: Become a Rock Star DBA; Thomas LaRock; Coding & Technical; Read-Status: Completed;

‘DBA Survivor’ is like a SQL DBA Career Guide 101. In my opinion (yes, it’s just mine), this book is not really a technical reference that you’d revisit while troubleshooting a database issue; rather, it’s a book you’d turn to when you feel a little lost on your career path as a DBA, be it as a beginner, an intermediate or an expert. Thomas LaRock is a veteran DBA and also a PASS president, with a great career working in mid-to-large organizations, and what he talks about in the book comes from his own experiences, which I’m sure you’ll be able to relate to.

Now, the best part about ‘DBA Survivor’ is that if you’re a new DBA, it has a clear road map of what you should be doing to make the most of your current role and reach the next one. I repeat, it is not overly technical; instead, it is a navigation map that helps you do the right things at the right time, all in a simple, entertaining voice.

After reading the book, I wished I could have had it eight years back, when I started my career as a DBA. Still, I’m thankful to Tom that I have read it now.


#3 Introducing Microsoft SQL Server 2014; Ross Mistry, Stacia Misner; Coding & Technical; Read-Status: Completed;

Finally, after much wait, SQL Server 2014 was RTMed this year, and this is the first official book covering the new SQL release. A thing to call out is that the book is ‘freely’ available in Kindle format, making it a little more special (yes! you can download one right now and then resume this read).

As expected, the book predominantly takes quick deep dives into the topics that matter most in the new SQL release, like AlwaysOn enhancements, in-memory capabilities, and columnstore indices. Being an introductory reference, none of these chapters covers its topic in minute depth; rather, each provides a high-level overview of the enhancements. In addition, the authors attempt to help you identify a relevant business case for choosing these new features (for example, what kind of workloads will benefit from In-Memory? Or will the buffer pool extension really help you gain that much-desired app performance?). Don’t consider this marketing; instead, it’s a guide to help you map your business needs to the newest SQL release.

Last, it always helps to take the lead and learn something new, so that when your workplace decides to upgrade, the decision makers know whom to talk to.

Non-Tech/Business reads


#4 Delivering Happiness: A Path to Profits, Passion, and Purpose; Tony Hsieh; Business & Money; Read-Status: Completed;

‘Delivering Happiness: A Path to Profits, Passion, and Purpose’ is a great book; actually, it’s the best I’ve read this year. I personally recommend it to all my family and friends. Wherever you are, whatever you do, you should be reading this book, as it talks about the most basic human trait, i.e. being happy. If you’ve been thinking this is just another book from the co-founder of a billion-dollar start-up, with the usual insight into how successful they have been, that’s not quite right. While the book does tell how Tony Hsieh and his co-founder friends started Zappos, it clearly goes beyond that, showing how any individual or organization can ‘really’ be successful.

Profits, Passion and Purpose are the three main elements of the book, which Tony Hsieh explains in a simple (yet uncommon) storytelling way. There are too many things I liked about the book, so in the interest of space I’ll list my top favorites:

  • Everyone is an entrepreneur – Tony Hsieh was into business right from the age of 8, when he started selling customized buttons via mail order. It was not just about earning money; it was about being creative and experimenting. When you read this section, take a pause to go back into your past and ask yourself what you liked doing as a child, and then see if you can relate some of it to what you’re doing today.
  • Profits – Profits are important, cyclical, and much easier to make when you follow a creative, less-taken path.
  • Passion – Profits alone cannot make anyone successful, as they are cyclical and cannot be sustained. Passion, however, is less cyclical. Combining profits with passion makes a bigger impact on overall success.
  • Purpose – Even when you’ve identified your passion and a path to profits (being successful), there’s still a need for a purpose, i.e. the things that finally make you a happier person.
  • Finally, on happiness? It’s beyond my explanation, as it’s so personal to everyone. All I can say is that I’m joining Tony’s happiness movement

The book is filled with wisdom tweets from Tony Hsieh (he refers to them as “tweets to live by”), such as the ones below:

“We either make ourselves miserable, or we make ourselves strong. The amount of work is the same.”

“No matter what your past has been, you have a spotless future.”


#5 The Everything Store: Jeff Bezos and the Age of Amazon; Brad Stone; Business & Money; Read-Status: Completed;

Consider this book a memoir of Amazon and its undisputed leader, Jeff Bezos. Author Brad Stone, I believe, has been able to get the most authentic details of Amazon, including its founders, history, challenges, competition, losses and triumphs. ‘The Everything Store’ primarily focuses on Bezos (pronounced Bay-Zoes) and his awe-inspiring journey founding Amazon. Given the author’s perspective, the overall journey of Amazon reads like Bezos’s personal journey, and every chapter reinforces why Jeff is among the smartest CEOs of recent times. Here are some of my favorite picks from Jeff’s leadership style:

  1. Play Godfather – Make them an offer they can’t refuse: picking up zappos.com, diapers.com and numerous other negotiations
  2. Information only when necessary – Do you really know why Kindle books are cheaper than their hardcover counterparts?
  3. Two-pizza teams – A team shouldn’t be bigger than can be fed with two large pizzas (this has been adopted by Google as well!)
  4. “In the old world, you devoted 30% of your time to building a great service and 70% of your time to shouting about it. In the new world, that inverts”
  5. Customer obsession – Add that empty chair in every meeting to represent your customer, and write all memos as journals

Besides Amazon, the book also tells an inspiring story about the early, formative days of the World Wide Web, covering things like web mail, online retail, and the eventful times of the dotcom boom and bust. So if you have the slightest interest in travelling back in time to understand the evolution of the internet, and then returning to talk about the cloud and e-readers (Kindle), the book won’t disappoint you.

One of my favorite quotes from the book: “For all great innovations it’s necessary to have the willingness to fail and be misunderstood for a long period of time – Jeff B”


#6 Steve Jobs: The Exclusive Biography; Walter Isaacson; Biographies & Memoirs; Read-Status: Completed;

When I started the book, I was unsure whether I’d ever be able to cover it in its entirety (read and understand it), given its length of 600 pages. And then I spent a whole weekend reading without any major breaks and completed it in 2 days! Although the book is a biography of Jobs, the details are exquisite, and they cover the entire history of Apple Inc. (from the Apple II to iPads).

Steve Jobs was a “diva”. He has been among the best CEOs the world has known or will remember. Author Walter Isaacson has done a brilliant job of narrating Jobs’s complex personality in a compelling way. Planning for this book actually began while Jobs was alive, and Jobs took great care to ensure that the book covers not just the brilliant feats but also the monumental failures he had to go through to make Apple what it is today. The book has 3 parts:

The first covers Jobs’s personal life (before Apple came into existence). Jobs was born into a bundle of contradictions. Adopted at birth, he grew up skeptical of the church, then ended up a huge believer in karma, followed Buddhism, and became obsessed with Zen-like simplicity (now you know why the iPhone features just one button).

The second part covers the birth of Apple and the challenges that lay ahead: Steve being turned out of Apple, the founding of NeXT and Pixar, and then the dramatic return to Apple. This is a gripping tale of how an ordinary-looking individual chose to do extraordinary things with childlike simplicity (the iPhone would never have been so successful with a phone keypad). There are many other facets of Jobs’s personality here, like being highly demanding of his colleagues and coworkers, lacking sensitivity towards others, and an over-obsession with details and design.

The third part covers the post-PC (read: Windows) era, detailing the epic rise of Apple products like the iPod, iPhone, iPad, iTunes and iCloud, with Apple becoming the world’s most valuable technology company.

The book is an amazing narrative not just because the subject matter is Jobs, but because it is also well written (kudos to Walter Isaacson). And it is not just for IT nerds or geeks, but for everyone who wants to dig deeper into Steve’s life, beliefs and obsessions that led to the creation of Apple. Here’s my favorite Jobs quote:

“You can’t connect the dots looking forward; you can only connect them looking backwards. So you have to trust that the dots will somehow connect in your future. You have to trust in something – your gut, destiny, life, karma, whatever” – Steve Jobs


#7 How Google Works; Eric Schmidt; Business & Money; Read-Status: Completed;

Google has been among the most disruptive innovators of the 21st century, and the company has consistently been among the ‘best places to work for’. Everyone’s aware of that already; what this book explains is the WHY of it.

The authors, Eric Schmidt and Jonathan Rosenberg, have done an impeccable job of explaining why Google has been so successful: the core philosophy that helped Google become the successful enterprise it is today, how they hire the right people (aka ‘smart creatives’), and practices like allowing 20% of time for personal projects.

There’s been a lot written about Google already, like the search giant’s dominance over the net, their mapping services, driverless cars and Google Glass; however, reading the book I realized that Google is primarily a data company, managing the information of 3 billion users (source: Internet Live Stats) and coming up with new business models to make that data useful to us. Did you know that Google has the ability to track the spread of an epidemic 200x faster than the traditional methods used so far by the US CDC?

Hiring is a big part of Google’s success, and as such it’s given supreme importance within the company. Going through some pages, one actually feels as if the entire company is one huge HR department constantly looking to find and hire smart creatives.

Eric and Jonathan have added tons of examples like the one above that explain the WHY part in an interesting, storytelling way. In a nutshell, the book provides a firsthand account of Google’s success formula, which should be helpful to all professionals and organizations aspiring to succeed in a ‘data-driven’ world.


#8 Hatching Twitter; Nick Bilton; Business & Money; Read-Status: In-Progress;

Even before I start telling you about the book, I just want to say that I’m still in the middle of my read, yet I’m compelled to include it in this year’s review.

Since I picked up this book, I couldn’t put it down. The first section (where I am right now) talks about how Ev Williams, the farm boy, taught himself to code and started Blogger. The book also covers the lives of the other founders and how each of them brought their own unique strengths, perspectives, resources, skills and innovation to the table to make Twitter what it is today. The book so far is all about the people behind Twitter, and it will be interesting to read on to understand how they were able to script a collective success in spite of their individual imperfections. So I’ll come back to complete this review…!

PLUR: Peace Love Unity Respect


SQL-Saturday #116 – Question of the day: When does a GRANT override a DENY in SQL Server?


SQLSaturday #116 – Bangalore

WoW!! What a great day it was to learn and connect with SQL gurus and like-minded folks who carry the same passion for SQL Server as I do. It’s quite an honor to meet and listen to @blakhani, @pinaldave, @banerjeeamit, @vinodk_sql, @kashyapa, Rick @Joes2Pros… You guys are truly inspirational.

Fellow attendees –  For those of you on Twitter, follow #sqlpass and make sure to check out the #sqlsat116 and #sqlsaturday hashtags to stay up to date

As tweeted by @vinodk_sqlRace to First 10 blog post on #SQLSat116. If you attended the event, we want to hear from you. Drop us a nudge 🙂 …”

Here’s the first one to start with –

Fellow attendees – During the session on Security Pitfalls, Vinod gave us this interesting question to reply/tweet later.

Question: When GRANT overrides DENY in #sql server? Was asked during a session today in #sqlsaturday #sqlsat116

Answer: An explicit GRANT on a TABLE::COLUMN takes precedence on DENY TABLE #sqlsat116 #sqlsaturday

And here’s the simple test I did to get my answer.

We need 3 users/logins (one being SA):

-- Here:
-- UserOne is SA on the SQL instance
-- UserTwo has READ/WRITE privileges on the TEST database
-- UserThree is DBO (db_owner) for TESTDB

-- Step 1. Create a sample table, say 'iden_test', with 2 cols (id, name)
USE [test]
GO
CREATE TABLE [dbo].[iden_test](
     [id] [int] IDENTITY(1,1) NOT NULL,
     [name] [varchar](10) NULL
)
GO

-- Step 2. Run the T-SQL below to DENY READ (SELECT) to UserTwo
DENY SELECT ON OBJECT::dbo.iden_test TO UserTwo
GO

-- Step 3. As UserTwo, try running the query below; it should fail with a permissions error
SELECT id FROM dbo.iden_test
GO

Error: Msg 229, Level 14, State 5, Line 1
The SELECT permission was denied on the object 'iden_test', database 'test', schema 'dbo'.

-- Step 4. Log in to the server as UserThree (the DBO) and run the T-SQL below to GRANT column-level SELECT
GRANT SELECT ON OBJECT::dbo.iden_test(id) TO UserTwo
GO

-- Step 5. Log in again as UserTwo and run the SELECT (per Step 3) again
SELECT id FROM dbo.iden_test
GO

I can get the results for the specific column now.

Conclusion – a GRANT on a table column overrides a DENY on the same table.

Varun as iVarund

Identify queries that consume a large amount of log space in SQL Server

One of the regular issues DBAs face is T-log growth: situations wherein one “bad” or poorly designed query can eat up the entire T-log space, bring the free space to zero, and then bring your application down. The cause of and remedy for most of these issues is discussed in KB #317375 (I’m a big fan of Microsoft KBs).

While the KB discusses the causes of high T-log growth and approaches for dealing with it, it also hints at how we can ‘proactively’ find the queries consuming T-log space at any given moment using DMVs. Taking a cue from this, I have written the T-SQL code below.

Identify queries consuming large T-log space:

Use <your_database_name>
go
-- Description: get queries consuming maximum T-log space
-- Source: based on KB317375
-- Author: varun.dhawan@gmail.com                                               

SELECT dtst.session_id AS 'spid'
     , CAST(DB_NAME(dtdt.database_id) AS VARCHAR(20)) AS 'database'
     , der.command
     , SUBSTRING(st.text, (der.statement_start_offset / 2) + 1,
                 ((CASE der.statement_end_offset
                     WHEN -1 THEN DATALENGTH(st.text)
                     ELSE der.statement_end_offset
                   END - der.statement_start_offset) / 2) + 1) AS statement_text
     , COALESCE(QUOTENAME(DB_NAME(st.dbid)) + N'.'
                + QUOTENAME(OBJECT_SCHEMA_NAME(st.objectid, st.dbid))
                + N'.'
                + QUOTENAME(OBJECT_NAME(st.objectid, st.dbid)), '') AS command_text
     , der.wait_type
     , der.wait_time
     , dtdt.database_transaction_log_bytes_used / 1024.0 / 1024.0 AS 'mb used'
     , dtdt.database_transaction_log_bytes_used_system / 1024.0 / 1024.0 AS 'mb used system'
     , dtdt.database_transaction_log_bytes_reserved / 1024.0 / 1024.0 AS 'mb reserved'
     , dtdt.database_transaction_log_bytes_reserved_system / 1024.0 / 1024.0 AS 'mb reserved system'
     , dtdt.database_transaction_log_record_count AS 'record count'
FROM  sys.dm_tran_database_transactions dtdt
      JOIN sys.dm_tran_session_transactions dtst
        ON dtdt.transaction_id = dtst.transaction_id
      JOIN sys.dm_exec_requests der
      CROSS APPLY sys.dm_exec_sql_text(der.sql_handle) AS st
        ON dtst.session_id = der.session_id
ORDER BY 8 DESC;  -- column 8 = 'mb used'
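As a companion check, DBCC SQLPERF(LOGSPACE) gives a quick per-database summary of log size and percentage used, which pairs nicely with the per-query view above:

-- Log size and space used, one row per database
DBCC SQLPERF(LOGSPACE);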
Hope that this will help you too!

Disclaimer: Everything here, is my personal opinion and is not read or approved by my employer before it is posted. No warranties or other guarantees will be offered as to the quality of the opinions or anything else offered here.

What is a columnstore index?

WARNING: The blog-post is based on pre-release software so things could change. For more details on CTP, please refer SQL Server Code-Named “Denali” CTP1 Release Notes

The upcoming SQL Server release introduces a new data warehouse query acceleration feature based on a new type of index called columnstore. Before we move any further in exploring this new feature, I want to take some time to explain the basics behind a columnstore index and how different it is from a traditional (rowstore) index.

What is columnstore? And what is a rowstore?

To understand this, let’s look at the simple illustration below. Here I have a table with 4 columns (First name, Email, Phone, Street Address). Below is a representation of how the index data will be stored, with the associated pros and cons.

[Illustration: the same 4-column table laid out as a rowstore vs. a columnstore]

As opposed to a rowstore, a columnstore index stores each column in a separate set of disk pages, rather than storing multiple rows per page as data traditionally has been stored. So in the example above, the columns (First name, Email, Phone, Street Address) are stored in different groups of pages in a columnstore index.

So what’s BAD with rowstore design?

Say we have to run a query like ‘SELECT first_name, phone FROM emp’. In a rowstore design, the DBMS will transfer ENTIRE ROWS from disk to the memory buffer even though the query requires just 2 attributes. For large read-intensive queries, that means a lot of unnecessary disk I/O, wasting precious disk bandwidth.

And what’s good with columnstore design?

1. Better performance for SELECT’s – only the attributes needed to solve a query are fetched from disk, thereby saving on disk I/O.
2. Better compression ratio – it’s easier to compress the data due to the redundancy of data within a column

Really are they so good?

Wait, “there’s no free lunch”. Due to the change in index storage design, tuple (row) writes are very expensive against a columnstore index. As such, in Denali, tables with columnstore indexes can’t be updated directly using INSERT, UPDATE, DELETE and MERGE statements, or bulk load operations. Hence, to perform DML on the table, we may need to disable/drop the index temporarily and re-create it after the DML activity.
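For reference, creating one is expected to look much like a regular index definition with a COLUMNSTORE keyword; a sketch against the hypothetical emp table from the illustration (the names are illustrative, and the syntax may change before release):

-- Columnstore indexes in Denali are nonclustered
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_emp
ON dbo.emp (first_name, email, phone, street_address);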

Hope this gives you some initial understanding of ROWSTORE vs. COLUMNSTORE. This feature is expected to be available in the next CTP build of Denali, so once we have that build I will be able to share a demo.

Thanks for reading!

Disclaimer: Everything here, is my personal opinion and is not read or approved by my employer before it is posted. No warranties or other guarantees will be offered as to the quality of the opinions or anything else offered here.

Hello world….once again

Hello World !

This is a continuation of my SQL Server blog on MSDN. With this blog, I intend to help the community of fellow database engineers using PostgreSQL / SQL Server as a database platform, with occasional other stuff thrown in related to other RDBMS platforms and their respective integration issues. Thanks to my wife, who’d been pestering me to blog for a year or so.

I’ve spent more than a decade working on various RDBMS platforms, including PostgreSQL, SQL Server, Oracle and MySQL. My current focus area is PostgreSQL and its implementation in a large enterprise landscape, overcoming challenges of scale, manageability and automation.

Thanks for your time reading, and I hope my efforts will help (or at least entertain) you on many levels. I would really appreciate it if you could let me know what you think, good or bad. I always appreciate feedback 🙂

Sincere Thanks!

Varun