Looking Back to Go Forward: 20 years in Software

You know when someone casually mentions they’ve been working for 20 years, and you think, “Wow, that’s a long time” only to realize…wait, you’re that person? Yep, that’s me. This year marks 20 years in software, and it feels surreal.

In 2004, I started as a fresh graduate at McKinsey & Company. I was excited, nervous, clueless, and had no idea where this path would lead me. Two decades later, I’ve worked with some of the best teams at McKinsey, Microsoft, Target, and, yes, Microsoft again—because sometimes you come full circle.

So, here’s my look back at 20 years in software—20 moments, lessons, and memories that shaped this journey:

Continue reading “Looking Back to Go Forward: 20 years in Software”

Mastering Data Manipulation with MERGE Command in PostgreSQL 15

Two years back, I wrote a blog post titled “PostgreSQL – Mastering UPSERT“, in which I explored the nuances of the INSERT ON CONFLICT command that allows conditional inserts or updates of rows.

Continuing its database technology innovation, PostgreSQL 15 has introduced an exciting new feature – the MERGE command. This command offers a more versatile approach to INSERT, UPDATE, or even DELETE rows in a table based on specific conditions. In this post, we’ll delve into the intricacies of this new MERGE command.

Continue reading “Mastering Data Manipulation with MERGE Command in PostgreSQL 15”

PostgreSQL Advanced Pattern Matching – Beyond LIKE Operator

Whenever you are dealing with pattern matching in PostgreSQL, it is often required to write queries to match string patterns. The conventional wisdom to do pattern match was using the LIKE operator, which is not very effective for many reasons. PostgreSQL now offers more advanced pattern matching options such as SIMILAR TO expressions, POSIX regular expressions, and ANY operator. In this article, we’ll compare the LIKE operator with these other advanced pattern matching features.

Continue reading “PostgreSQL Advanced Pattern Matching – Beyond LIKE Operator”

Data Science with PostgreSQL – Using the Window frame_clause

This is my 5th (and final) post on Window Function in PostgreSQL series. In previous posts on this topic, I have covered window function basics, using aggregate window functionsranking window function and value window functions. While you’re here, I’ll recommend to you check the previous posts on this topic.

In this article, we’ll learn a new concept frame_clause. Let’s jump right in!

Continue reading “Data Science with PostgreSQL – Using the Window frame_clause”

Data Science with PostgreSQL – Value Window Functions

This far in the Window Function in PostgreSQL series I have covered window function basics, and how to use aggregate window functions and ranking window function. I suggest you check the previous posts out 🙂

In this 4th post, I’ll show you how to use Value window functions – that can be used to calculate various “value” type aggregations such as Lag, LeadFirst_Value and Last_Value within each group of rows referred here as “window” or “partition”.

Continue reading “Data Science with PostgreSQL – Value Window Functions”

Data Science with PostgreSQL – Window Functions Basics

Window functions are a powerful tool that helps to leverage the power of PostgreSQL for Data Analysis. In this blog series, I will explain what window functions are, why you should use them, types of window functions and finally will introduce you to some basic window functions in PostgreSQL. In the next few post, I’ll go through more advanced window functions and demo some scenarios. So let’s get going.

Continue reading “Data Science with PostgreSQL – Window Functions Basics”

Recursive SQL for querying hierarchical data: Part 2 – Levels and Ancestors

Ok, so you’ve stored hierarchical data in a relational database, and written recursive CTEs to query the data and find relationships. Now the application team wants to query hierarchical levels and print the complete ancestry tree. Time to deep dive into some advance CTE constructs.

This is the second post of the series about the Recursive SQL for querying hierarchical data started in the previous post . If you haven’t read it already, I recommend reading it to understand the key concepts:

  1. What is a hierarchical data?
  2. How to store hierarchical data in a relational database?
  3. And how to query hierarchical data using:
    • Self-Joins
    • Common Table Expressions (CTE)

In this post, we’ll discuss the advanced scenarios like displaying hierarchical levels and printing the “ancestry tree”. Let’s dive in…

Continue reading “Recursive SQL for querying hierarchical data: Part 2 – Levels and Ancestors”

Recursive SQL for querying hierarchical data: Part 1

Recently, I was working on an application that required reading hierarchically structured data. And I thought it might be useful to document multiple ways to store and query such hierarchical data (ex. Org. chart, File-system layout, or Set of tasks in a project) in a database. So, let’s jump right in.

Definitions first – what is hierarchical data?

Hierarchical data is a specific kind of data, characterized by a hierarchical relationship between the data sets.

Think about data sets having multiple levels: something above, something below, and a few at the same level. A typical example of such hierarchical model is an organizational chart like the one below.

Continue reading “Recursive SQL for querying hierarchical data: Part 1”

PostgreSQL – how to UPSERT (Update or Insert into a table)

Most modern-day relational database systems use SQL MERGE (also called UPSERT) statements to INSERT new records or UPDATE existing records if a matching row already exists. UPSERT is a combination of Insert and Update, driven by a “PRIMARY KEY” on the table.

Continue reading “PostgreSQL – how to UPSERT (Update or Insert into a table)”

Why Averages suck and what make Percentiles great

Average(mean), median, mode are core statistical concepts, that are often applied in software engineering. Whether you are new to programming or have multi years of computer science experience, you’ve likely at some point in time will have used these statistical functions, say to calculate system resource utilization, or network traffic or a website latency. In my current role, my team is responsible for running a telemetry platform to help dev teams measure application performance. We do this by collecting point-in-time data points referred as metrics.

A common use case for metrics is to tell the application latency (i.e. amount of time it took between a user action and web app response to that action). For examples, amount of time it took you to click on twitter photo and till it finally showed up on your device screen. So if you have this metrics collected at regular intervals (say every 1s), you can simply average it over a period of time like an hour or a day to calculate latency. Simple right!

Continue reading “Why Averages suck and what make Percentiles great”