Window functions are a powerful tool that helps to leverage the power of PostgreSQL for Data Analysis. In this blog series, I will explain what window functions are, why you should use them, types of window functions and finally will introduce you to some basic window functions in PostgreSQL. In the next few post, I’ll go through more advanced window functions and demo some scenarios. So let’s get going.
Definitions first – What are window functions?
A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculations that can be done with an aggregate function. However, window functions do not cause rows to become grouped into a single output row like non-window aggregate calls would. Instead, the rows retain their separate identities.Source: PostgreSQL Docs 3.5. Window Functions
So how are they different from GROUP BY ?
Window functions do aggregation similar to GROUP BY clause. However, rows are not grouped into a single row, instead, each row retains its separate identity. That is, a window function may return a single value for each row. Here’s a simple visualization to understand this.
On left-hand side is GROUP BY , that collapse all 3 rows in one single row based on type of aggregation (SUM(), AVG(), COUNT() etc.). On Right-hand side is Window Function that output all 3 rows with an aggregation value. The example below explain this further.
Comparing GROUP BY and Window Function
Let’s use a simple example to compare and contrast GROUP BY and Window Function. We have an Employee table with columns as employee name, job title, salary, etc. We want to calculate the average salary for each job title.
As you can notice, GROUP BY aggregation on the left-hand side, collapse everything in four rows (i.e. 4 distinct job title) to display the average salary. In contrast, the Window Function retains all original rows. So we can see employees ‘Job Title’, ‘Salary’ and also ‘Average Salary’. This makes it easy to compare each employee salary to the average salary for their job title.
One can argue that it does not make much sense to repeat the same value
Avg Salary for every single row. However it becomes quite useful if you were to compute
Percentage Salary for all employees compared to their peers with same job title. I’ll show you this later in post, for now lets start with a super simple example.
Let’s explore with a simple example
Create a simple result set of numbers from 1 to 10 using generate_series()
SELECT * FROM generate_series(1, 10) AS f(series); series ------ 1 2 3 4 5 6 7 8 9 10
Using window function OVER() display SUM of series against each row.
SELECT series, SUM(series) OVER () FROM generate_series(1, 10) AS f(series); series sum ------ ------ 1 55 2 55 3 55 4 55 5 55 6 55 7 55 8 55 9 55 10 55
Using three OVER() display COUNT, SUM, AVERAGE of series against each row.
SELECT series , COUNT(series) OVER () , AVG(series) OVER () , SUM(series) OVER () FROM generate_series(1, 10) AS f(series); series count avg sum ----- ------ ------ ------ 1 10 5.5 55 2 10 5.5 55 3 10 5.5 55 4 10 5.5 55 5 10 5.5 55 6 10 5.5 55 7 10 5.5 55 8 10 5.5 55 9 10 5.5 55 10 10 5.5 55
At this time, there is no need to memorize the syntax. The only concept I want you to understand is that the window function computes a value for each row in the “window” or “partition“.
Now a “window” can be all rows (like our example above), or an only subset of rows as specified in the PARTITION BY clause. For our employee table example, we may partition by Job Title. In the example below, you can see the job title in a different color. Here, each color represents a different “window” or a “partition”. Let’s see this in real-world scenario.
Avg Sal and
% Sal for all employees, group by Job Title
Coming back to employee, job title and salary example. Let’s first check the our base tables here i.e. “employee” and “jobs”.
SELECT j.job_title, e.salary, AVG(e.salary) OVER (PARTITION BY j.job_title) as "avg sal", round(e.salary / AVG(e.salary) OVER (PARTITION BY j.job_title) * 100,2) as "sal percent" FROM employees e INNER JOIN jobs j ON e.job_id = j.job_id GROUP BY j.job_title, e.salary;
Types of window functions?
There are multiple different window functions in PostgreSQL, that can all be grouped in three broad categories based on their use: Value, Ranking and Aggregate. Here’s the complete list.
You can check the PostgreSQL docs for the full list.
Hopefully you’ve an overview of window functions. These are powerful tools for analyzing the structured data. In the subsequent posts, I’ll deep dive into each of these Window Function categories with real-world some scenarios. To read additional details about postgres window functions, please refer to the PostgreSQL Window Function Docs. Stay tuned!
This is 1st article in four-part series on Window Functions in PostgreSQL
Enjoy what you’re reading?
Join my mailing list, and I’ll let you know whenever I write another post. No spam, I promise! 👨💻
2 thoughts on “Data Science with PostgreSQL – Window Functions Basics”