Data Science with PostgreSQL – Window Functions Basics

Window functions are a powerful tool that helps to leverage the power of PostgreSQL for Data Analysis. In this blog series, I will explain what window functions are, why you should use them, types of window functions and finally will introduce you to some basic window functions in PostgreSQL. In the next few post, I’ll go through more advanced window functions and demo some scenarios. So let’s get going.

Definitions first – What are window functions?

A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculations that can be done with an aggregate function. However, window functions do not cause rows to become grouped into a single output row like non-window aggregate calls would. Instead, the rows retain their separate identities.
Source: PostgreSQL Docs 3.5. Window Functions

So how are they different from GROUP BY ?

Window functions do aggregation similar to GROUP BY clause. However, rows are not grouped into a single row, instead, each row retains its separate identity. That is, a window function may return a single value for each row. Here’s a simple visualization to understand this.

On left-hand side is GROUP BY , that collapse all 3 rows in one single row based on type of aggregation (SUM(), AVG(), COUNT() etc.). On Right-hand side is Window Function that output all 3 rows with an aggregation value. The example below explain this further.

Comparing GROUP BY and Window Function

Let’s use a simple example to compare and contrast GROUP BY and Window Function. We have an Employee table with columns as employee name, job title, salary, etc. We want to calculate the average salary for each job title.

As you can notice, GROUP BY aggregation on the left-hand side, collapse everything in four rows (i.e. 4 distinct job title) to display the average salary. In contrast, the Window Function retains all original rows. So we can see employees ‘Job Title’, ‘Salary’ and also ‘Average Salary’. This makes it easy to compare each employee salary to the average salary for their job title.

One can argue that it does not make much sense to repeat the same value Avg Salary for every single row. However it becomes quite useful if you were to compute Percentage Salary for all employees compared to their peers with same job title. I’ll show you this later in post, for now lets start with a super simple example.

Let’s explore with a simple example

Create a simple result set of numbers from 1 to 10 using generate_series()

SELECT * 
FROM generate_series(1, 10) AS f(series);

series
------
1
2
3
4
5
6
7
8
9
10

Using window function OVER() display SUM of series against each row.

SELECT series, SUM(series) OVER () 
FROM generate_series(1, 10) AS f(series);

series	sum
------	------
1	    55
2	    55
3	    55
4	    55
5	    55
6	    55
7	    55
8	    55
9	    55
10	    55

Using three OVER() display COUNT, SUM, AVERAGE of series against each row.

SELECT series
	, COUNT(series) OVER ()
	, AVG(series) OVER ()
	, SUM(series) OVER ()
FROM generate_series(1, 10) AS f(series);

series count   avg    sum
----- ------ ------ ------
1	   10	   5.5	   55
2	   10	   5.5	   55
3	   10	   5.5	   55
4	   10	   5.5	   55
5	   10	   5.5	   55
6	   10	   5.5	   55
7	   10	   5.5	   55
8	   10	   5.5	   55
9	   10	   5.5	   55
10	   10	   5.5	   55

At this time, there is no need to memorize the syntax. The only concept I want you to understand is that the window function computes a value for each row in the “window” or “partition“.

Now a “window” can be all rows (like our example above), or an only subset of rows as specified in the PARTITION BY clause. For our employee table example, we may partition by Job Title. In the example below, you can see the job title in a different color. Here, each color represents a different “window” or a “partition”. Let’s see this in real-world scenario.

Display `Sal`, `Avg Sal` and `% Sal` for all employees, group by Job Title

Coming back to employee, job title and salary example. Let’s first check the our base tables here i.e. “employee” and “jobs”.

Sample schema – https://github.com/vrajmohan/pgsql-sample-data

SELECT j.job_title,
e.salary,
AVG(e.salary) OVER (PARTITION BY j.job_title) as "avg sal",
round(e.salary / AVG(e.salary) OVER (PARTITION BY j.job_title) * 100,2) as "sal percent"
FROM employees e 
INNER JOIN jobs j 
	ON e.job_id = j.job_id
	GROUP BY j.job_title, e.salary;

Types of window functions?

There are multiple different window functions in PostgreSQL, that can all be grouped in three broad categories based on their use: Value, Ranking and Aggregate. Here’s the complete list.

You can check the PostgreSQL docs for the full list.

Conclusion

Hopefully you’ve an overview of window functions. These are powerful tools for analyzing the structured data. In the subsequent posts, I’ll deep dive into each of these Window Function categories with real-world some scenarios. To read additional details about postgres window functions, please refer to the PostgreSQL Window Function Docs. Stay tuned!

This is 1st article in four-part series on Window Functions in PostgreSQL

Enjoy what you’re reading?

Join my mailing list, and I’ll let you know whenever I write another post. No spam, I promise! 👨‍💻

Author: Varun Dhawan

I’m Varun. I used to be a Software Engineer building data applications for large corporations like McKinsey and Target. Now, I’m a Product Manager at Microsoft, making Azure PostgreSQL the go-to platform for running mission-critical workloads (and no, I’m not obsessing over every little detail… I swear). When I’m not working, you can find me blogging at data-nerd.blog, where I help fellow data enthusiasts master PostgreSQL, sharpen their coding skills, and navigate their careers with confidence. And if there’s one thing you absolutely need to know about me, it’s that…I'm apparently a great cook—just don’t ask why I’m the only one who eats my food. View all posts by Varun Dhawan

Data Science with PostgreSQL – Window Functions Basics

Definitions first – What are window functions?

So how are they different from GROUP BY ?

Comparing GROUP BY and Window Function

Let’s explore with a simple example

Display `Sal`, `Avg Sal` and `% Sal` for all employees, group by Job Title

Types of window functions?

Conclusion

Enjoy what you’re reading?

Author: Varun Dhawan

2 thoughts on “Data Science with PostgreSQL – Window Functions Basics”

Leave a comment Cancel reply

Definitions first – What are window functions?

So how are they different from GROUP BY ?

Comparing GROUP BY and Window Function

Let’s explore with a simple example

Display Sal, Avg Sal and % Sal for all employees, group by Job Title

Types of window functions?

Conclusion

Enjoy what you’re reading?

Share this:

Related

Author: Varun Dhawan

2 thoughts on “Data Science with PostgreSQL – Window Functions Basics”

Leave a comment Cancel reply

Display `Sal`, `Avg Sal` and `% Sal` for all employees, group by Job Title