This is my 5th (and final) post on Window Function in PostgreSQL series. In previous posts on this topic, I have covered window function basics, using aggregate window functions, ranking window function and value window functions. While you’re here, I’ll recommend to you check the previous posts on this topic.
In this article, we’ll learn a new concept
frame_clause. Let’s jump right in!
What’s a window frame?
Windows function works on a group of rows called partitions (or window). A PARTITION BY clause divides the rows into partitions, so we can choose the specific rows to be included in a computation. A FRAME is a subset of this “current partition” and is defied using a
Syntax for frame_clause
ROWS BETWEEN <starting_row> AND <ending_row>
Below are the possible values that can be specified for <starting_row> and the <ending_row> bounds of a window frame:
UNBOUNDED PRECEDING— all rows before the current row in the partition, i.e. the first row of the partition
[N] PRECEDING— No. of rows before the current row
CURRENT ROW— the current row
[M] FOLLOWING— No. of rows after the current row
UNBOUNDED FOLLOWING— all rows after the current row in the partition, i.e. the last row of the partition
Let’s see this using sample dataset of a car dealership group. Here, we have a table
car_sales_by_month that stores car sales information by months.
CREATE TABLE IF NOT EXISTS car_sales_by_month ( year INT, month INT, brand VARCHAR(250), model VARCHAR(250), type VARCHAR(250), quantity INT, sales INT ); INSERT INTO car_sales_by_month VALUES (2022, 1 ,'Toyota','RAV4','Suv',50,1500000), (2022, 1 ,'Toyota','Corolla','Sedan',10,2010000), (2022, 1 ,'Honda','CRV','Suv',30,8000000), (2022, 2 ,'Honda','CRV','Suv',40,33000000), (2022, 2 ,'Toyota','RAV4','Suv',30,1000000), (2022, 2 ,'Toyota','Corolla','Sedan',20,2050000), (2022, 3 ,'Honda','Civic','Sedan',60,10000000), (2022, 3 ,'Honda','NSX','Hybrid ',25,2004000), (2022, 3 ,'Toyota','Corolla','Sedan',30,1080000), (2022, 4 ,'Honda','Civic','Sedan',60,30000000), (2022, 4 ,'Honda','NSX','Hybrid ',25,2004000), (2022, 4 ,'Toyota','Corolla','Sedan',35,4520000);
Scenario 1. Query to report yearly revenue by make of the car.
SELECT brand, SUM(sales) AS "Total sales $" FROM car_sales_by_month GROUP BY brand;
Scenario 2. Query to report monthly sales revenue changes for a specific car model.
SELECT brand, model, month, sales AS current_month_sales, LAG(sales) OVER ( ORDER BY month) AS previous_month_sales, sales - LAG (sales) OVER (ORDER BY month) AS difference FROM car_sales_by_month WHERE year = 2022 AND model = 'Corolla'
Scenario 3. Query to report the total sales for current month and previous month, alongwith the maximum sales in any individual month throughout the year.
SELECT brand, model, month, sales AS current_month_sales, LAG(sales) OVER (PARTITION BY brand, model ORDER BY month) AS prev_month, MAX(sales) OVER (PARTITION BY brand, model ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS max_year_sales FROM car_sales_by_month WHERE year = 2022
💡 One worthy note is that anytime that you specify an
ORDER BY clause, PostgreSQL will automatically set the default window as ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
The real magic of window functions is actually the frame i.e. subset of records over which the window function works. Here we just learned that a window frame depends on the current row and is defined by over clause.
If you have followed through all my posts, you now have seen most of the common Window functions/patterns. Great Job! Remember, window functions often get discussed during data engineer / scientist job interviews. So if you want to deepen your knowledge, I’d recommend reading PostgreSQL docs. And I hope the details and examples shared in this series will help in acing your next interview. All the best 👍
Enjoy what you’re reading?
Join my mailing list, and I’ll let you know whenever I write another post. No spam, I promise! 🧑🏻💻