Why Averages Suck and What Makes Percentiles Great

Average (mean), median, and mode are core statistical concepts that are often applied in software engineering. Whether you are new to programming or have years of computer science experience, you have likely used these statistical functions at some point, say to calculate system resource utilization, network traffic, or website latency. In my current role, my team is responsible for running a telemetry platform that helps dev teams measure application performance. We do this by collecting point-in-time data points referred to as metrics.

A common use case for metrics is to report application latency, i.e. the amount of time between a user action and the web app’s response to that action. For example, the time between you clicking on a Twitter photo and the photo finally showing up on your device screen. So if you have this metric collected at regular intervals (say every 1s), you can simply average it over a period of time, like an hour or a day, to calculate latency. Simple, right?

Well, it might not be that simple. Averages are bad in this case. Let me explain why.

Disclaimer: By no means do I claim to be an expert on statistics, so please correct me if I’m wrong. 😀

Why do averages suck?

Consider this:

  • You have code that measures how long a user waits for an image to render.
  • You collected 5 data points over a period of time: 3s, 5s, 7s, 4s, and 2s.
  • If you average them, you get 4.2s (i.e. (3+5+7+4+2) / 5).
  • However, 4.2 seconds is NOT representative of your actual users’ experience. From this data, it’s clear that some of your users are having a fast experience (2–3 seconds ✅) and some are having a slow experience (7 seconds ❌), but none of them are having the mathematically average experience. The average isn’t helping.

Are Percentiles better in this case? Yes!

A percentile is a value on a scale of 0 to 100 that indicates the percentage of a distribution that is equal to or below it. For example, the 95th percentile is the value that is greater than or equal to 95% of all observed values. Coming back to our app latency scenario: instead of calculating the average of all observed data points, we calculate the 50th or 90th percentile (P50 or P90).

P50 – 50th Percentile

  • Sort the data points in ascending order: 2s, 3s, 4s, 5s, 7s.
  • P50 is the value at or below which 50% of the data points fall, i.e. the middle of the sorted list: 4s.

P90 – 90th Percentile

  • Sort the data points in ascending order: 2s, 3s, 4s, 5s, 7s.
  • P90 is the value at or below which 90% of the data points fall; with only five points, that is the 5th sorted value: 7s.
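
For the curious, here is a minimal Go sketch of this calculation using the nearest-rank method (the percentile helper and its name are mine, not from any particular library):

package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the p-th percentile (0 < p <= 100) of values using the
// nearest-rank method: sort ascending, then take the value at rank ceil(p/100 * N).
func percentile(values []float64, p float64) float64 {
	sorted := append([]float64(nil), values...) // copy so the caller's slice stays untouched
	sort.Float64s(sorted)
	rank := int(math.Ceil(p / 100 * float64(len(sorted)))) // 1-based rank
	return sorted[rank-1]
}

func main() {
	latencies := []float64{3, 5, 7, 4, 2}          // seconds
	fmt.Println("P50:", percentile(latencies, 50)) // 4
	fmt.Println("P90:", percentile(latencies, 90)) // 7
}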

Using percentiles has these advantages:

  1. Percentiles aren’t skewed by outliers like averages are.
  2. Every percentile data point is an actual user experience, unlike averages.

You can plot percentiles on a time series graph just like averages, and you can also set up threshold alerts on them. So, say, if P90 is greater than 5 seconds (i.e. more than 10% of observed requests take longer than 5 seconds), you can be alerted.
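
Reusing the percentile helper sketched above, such a threshold alert boils down to a simple comparison (the sample window and the 5-second threshold here are made up):

// latency samples from the last evaluation window, in seconds (made-up data)
window := []float64{2, 3, 4, 5, 7, 9, 12}
if p90 := percentile(window, 90); p90 > 5 {
	fmt.Printf("ALERT: P90 latency is %.1fs (threshold: 5s)\n", p90)
}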

As you might have noticed, when you use percentile-based metrics, you get a much better sense of reality.

Some interesting facts about Percentiles

  • Percentiles are commonly referred to in shorthand: p99 (or P99, or P₉₉) means “99th percentile”, p50 means “50th percentile”…you get the drift.
  • P50 is the same as the median (the mid-point of a distribution).
  • And percentiles are NOT percentages!

Conclusion

Now, armed with some basic knowledge about percentiles, hopefully you’ll start seeing your metrics in a whole different way.

SQL Injection and How to Prevent It in Your Golang App

SQL injection!… Is it really a thing?

SQL injection is a code injection technique capable of destroying your database. It is used to attack data-driven applications by inserting malicious SQL statements into an entry field for execution (e.g. to dump the database contents to the attacker).

How Does SQL Injection Work?

SQL injection is a hacking technique that’s been around since at least 1998. It takes advantage of two factors for success: First, web applications often ask users for data; Second, those applications tend to take the user-supplied data and pass it to the database as part of an instruction. Put them together with no code-based guardrails, and a criminal can run the application far off into the weeds.

(image from xkcd.com, used under its “copy and share” license)

As in the comic above, Bobby’s school loses all of its student records when Bobby’s mother decides to name her son “Robert'); DROP TABLE students;” and uses that as an input in the school’s online application form.

How can we avoid it in Golang?

In this post, we’ll learn how to avoid SQL injection attacks in our Golang code by passing user input as query parameters instead of building SQL strings by hand.

To achieve our objective, i.e. avoiding SQL injection in a Golang application, there are some prerequisites. Sticking to the core subject of this blog, I won’t cover them here; however, I’m providing some references:

Install PostgreSQL, setup an instance and create a test database – https://www.postgresqltutorial.com/install-postgresql/

Install Go and configure workspace – https://www.callicoder.com/golang-installation-setup-gopath-workspace/

For this tutorial, I’m using Postgres 12 and Go 1.13.xx

Step 1. SQL Table definition

We’ll use test table EMP with some test data

-- create a sample table EMP
CREATE TABLE emp (  
  empno SERIAL PRIMARY KEY,  
  ename TEXT,
  sal INT CHECK (sal>0),
  email TEXT UNIQUE NOT NULL 
);

-- insert test data
INSERT INTO public.emp (ename, sal, email)
values
('Smith', 1400, 'smith@acme.com'),
('Allen', 2000, 'allen@acme.com'),
('Jones', 3000, 'jones@acme.com'),
('Blake', 4000, 'blake@acme.com');

Step 2. Basic select statement

Let’s assume that we want to SELECT an employee record from the table EMP by passing a filter on the EMPNO column in the WHERE clause.

q := fmt.Sprintf("SELECT ename FROM emp where empno=%s;", "7")
row := db.QueryRow(q)

The empno in the WHERE clause is substituted with the value 7 and the row for that employee number is returned. No big deal. But what happens if someone like Bobby’s mother supplies 7; TRUNCATE TABLE emp;-- as the input value?

q := fmt.Sprintf("SELECT ename FROM emp where empno=%s;", "7; TRUNCATE TABLE emp;--")
row := db.QueryRow(q)

Here:

  • Since the input contains a semicolon, the original SELECT statement is terminated early.
  • Postgres then executes the next statement in the string, TRUNCATE TABLE emp, which truncates the whole emp table.
  • Finally, the -- at the end comments out whatever SQL remains, so the rest of the original query is ignored and no syntax error occurs. The short snippet below shows the resulting SQL.
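
Printing the formatted query makes the problem obvious (a throwaway illustration, not something you would ship):

q := fmt.Sprintf("SELECT ename FROM emp where empno=%s;", "7; TRUNCATE TABLE emp;--")
fmt.Println(q)
// Output: SELECT ename FROM emp where empno=7; TRUNCATE TABLE emp;--;
// Postgres now sees two statements: the harmless SELECT, then the destructive TRUNCATE.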

Step 3. How to deal with this in Golang

The database/sql package from the standard library provides methods for executing parameterized queries either as prepared statements or as one-off queries. For example, you might want to have code that looks roughly like this:

// this is for Postgres driver
// querying for a single record using Go's database/sql package
sqlStatement := `SELECT ename FROM emp WHERE empno=$1;`
row := db.QueryRow(sqlStatement, 7)

The key distinction here is that we aren’t constructing the SQL statement ourselves; instead, we pass the user input as arguments. The database/sql package and the underlying driver either send the parameter values separately from the SQL text or escape them safely, so the input is treated purely as data and never executed as SQL, preventing any dangerous statements from running.
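
The same parameter mechanism protects writes too. A hedged sketch of an INSERT into our emp table (the values here are made up):

// the arguments are sent separately from the SQL text, so they can't change the statement
_, err := db.Exec(
	"INSERT INTO emp (ename, sal, email) VALUES ($1, $2, $3)",
	"Ward", 1250, "ward@acme.com",
)
if err != nil {
	panic(err)
}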

Step 4. Putting this all together

package main
// querying for a single record using Go's database/sql package
import (
	"database/sql"
	"fmt"

	_ "github.com/lib/pq" // blank import: registers the "postgres" driver with database/sql
)

const (
	host = "localhost"
	port = 5432
	user = "postgres"
	password = "****"
	dbname = "connect"
)

func main() {
	psqlInfo := fmt.Sprintf("host=%s port=%d user=%s "+
		"password=%s dbname=%s sslmode=disable",
		host, port, user, password, dbname)

	db, err := sql.Open("postgres", psqlInfo)
	if err != nil {
		panic(err)
	}
	defer db.Close()

	var ename string
	sqlStatement := `SELECT ename FROM emp WHERE empno=$1;`
	row := db.QueryRow(sqlStatement, 46)
	switch err := row.Scan(&ename); err {
	case sql.ErrNoRows:
		fmt.Println("No rows were returned!")
	case nil:
		fmt.Println(ename)
	default:
		panic(err)
	}
}

Conclusion

You should always use the database/sql package to run parameterized queries or prepared statements: the parameters are passed separately when the statement is executed. This is much better than concatenating strings, and it is what protects you from SQL injection. In PostgreSQL, the parameter placeholder is $N, where N is a number; in MySQL it is ?; SQLite accepts either of these. For more best practices on preventing SQL injection, please refer to https://bobby-tables.com/go
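
If you want to prepare a statement explicitly (useful when you will execute it many times), the same emp query looks roughly like this. This is a sketch of the standard database/sql Prepare API, not code from the steps above:

// prepare once, then execute with different parameters as needed
stmt, err := db.Prepare(`SELECT ename FROM emp WHERE empno=$1;`)
if err != nil {
	panic(err)
}
defer stmt.Close()

var ename string
if err := stmt.QueryRow(7).Scan(&ename); err != nil {
	panic(err)
}
fmt.Println(ename)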

If you find this post helpful, I’d be very grateful if you’d help it spread by emailing it to a friend, or sharing it on Twitter or Facebook. Thank you

Querying rows from PostgreSQL in a Go lang project

Introduction

In the last post, Updating/Deleting rows with Go, we learned how to manipulate rows in a PostgreSQL database from a Go project using the database/sql package that ships with Go, along with the github.com/lib/pq Postgres driver. In this post, we’ll learn how to query rows, i.e. SELECT.

Prerequisites

To achieve our objective, i.e. selecting rows from PostgreSQL using Go, there are some prerequisites. Sticking to the core subject of this blog, I won’t cover them here; however, I’m providing some references:

Install PostgreSQL, setup an instance and create a test database – https://www.postgresqltutorial.com/install-postgresql/

Install Go and configure workspace – https://www.callicoder.com/golang-installation-setup-gopath-workspace/

For the purpose of this tutorial, I’m using Postgres 11 and Go 1.13.xx

Objective – Querying rows from a PostgreSQL table using the Go lang database/sql package

In the following steps, I’ll demonstrate how to query rows.

Step 1. SQL Table definition

We’ll reuse our table EMP, which has the following columns:

-- create a sample table EMP
CREATE TABLE emp (  
  empno SERIAL PRIMARY KEY,  
  ename TEXT,
  sal INT,
  email TEXT UNIQUE NOT NULL 
);

Step 2. SELECT statement in SQL (multiple rows)

Let’s first write down our SELECT statement:

-- select rows from the table
SELECT ename, sal
	FROM emp
	ORDER BY sal DESC;

Step 3. Connecting to PostgreSQL instance from Go lang

As always, we first need to make a connection to the PostgreSQL instance. Since the code to connect to the PostgreSQL database is the same as before, I’ll skip repeating it here; if you need a refresher, please visit the earlier post Connecting to PostgreSQL db from Go lang project.

Step 4. Preparing Go lang code

Query rows from the EMP table

For reading rows, we will use the db.Query() method to fetch the ename and sal columns from the EMP table and output the rows sorted by sal. Then we will assign the results to variables, one row at a time, with rows.Scan().

// query rows from a table
var (
	ename string
	sal   int
)

rows, err := db.Query("SELECT ename, sal FROM emp order by sal desc")
if err != nil {
	panic(err)
}
defer rows.Close()

for rows.Next() {
	err := rows.Scan(&ename, &sal)
	if err != nil {
		panic(err)
	}
	fmt.Println("\n", ename, sal)
}
err = rows.Err()
if err != nil {
	panic(err)
}

Let’s understand the code above:

  • We’re using db.Query() to send the query to the database and checking for any error
  • Next we iterate over the rows with rows.Next().
  • Read the columns in each row into variables with rows.Scan() and print them to console with fmt.Println().
  • Finally, we check for errors after we’re done iterating over the rows.
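
If you need to filter which rows come back, pass the filter value as a bind parameter rather than formatting it into the SQL string (the same advice as in the SQL injection post). A quick sketch; the WHERE clause and the minSal value are my own additions:

// only employees earning more than minSal, using a $1 bind parameter
minSal := 2500
rows, err := db.Query("SELECT ename, sal FROM emp WHERE sal > $1 ORDER BY sal DESC", minSal)
if err != nil {
	panic(err)
}
defer rows.Close()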

Step 5. Putting this all together

package main

// querying all rows from the database with Go lang sql package
import (
	"database/sql"
	"fmt"

	_ "github.com/lib/pq"
)

const (
	host = "localhost"
	port = 5432
	user = "postgres"
	password = "postgres"
	dbname = "connect-db"
)

var (
	ename string
	sal int
)

func main() {
	psqlInfo := fmt.Sprintf("host=%s port=%d user=%s "+
		"password=%s dbname=%s sslmode=disable",
		host, port, user, password, dbname)
	
	db, err := sql.Open("postgres", psqlInfo)
	if err != nil {
		panic(err)
	}
	defer db.Close()

	rows, err := db.Query("SELECT ename, sal FROM emp order by sal desc")
	if err != nil {
		panic(err)
	}
	defer rows.Close()

	for rows.Next() {
		err := rows.Scan(&ename, &sal)
		if err != nil {
			panic(err)
		}
		fmt.Println("\n", ename, sal)
	}

	err = rows.Err()
	if err != nil {
		panic(err)
	}
}

What if you only want to query a single row?

There could be a scenario where you only want to query a single row, e.g. looking up the employee with the highest salary. In this case, we can skip the for loop and instead use the shortcut approach below.

var name string
err = db.QueryRow("SELECT ename FROM emp ORDER BY sal DESC LIMIT 1;").Scan(&name)
if err != nil {
	panic(err)
}
fmt.Println(name)

Errors from the query are deferred until Scan() is called, and are then returned from it.
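
One case worth handling explicitly: if the table is empty, Scan() returns sql.ErrNoRows. Here is a sketch that distinguishes “no rows” from a real error, mirroring the pattern used in the SQL injection post:

var name string
err = db.QueryRow("SELECT ename FROM emp ORDER BY sal DESC LIMIT 1;").Scan(&name)
switch {
case err == sql.ErrNoRows:
	fmt.Println("No rows were returned!")
case err != nil:
	panic(err)
default:
	fmt.Println(name)
}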

Conclusion

In this post, we learned how to query rows (single or multiple) from a PostgreSQL table using the Go lang database/sql package that ships with Go, along with the github.com/lib/pq Postgres driver.

Connecting to PostgreSQL from a Go lang project

Introduction

The Go programming language, sometimes referred to as Go lang, is making strong gains in popularity. Chances are that if you are a Go developer, you will have to interact with SQL at some point in your project. This blog post will show how to connect to a PostgreSQL database from Go using the database/sql package.

What is PostgreSQL?

PostgreSQL is a free and open-source relational database management system that uses and extends the SQL language, combined with many features that safely store and scale the most complicated data workloads. PostgreSQL also allows user-defined functions to be written in other languages, including C and Go; for example, you can define your own data types, build custom functions, and even write code in different programming languages without recompiling your database.

If you’re new to the PostgreSQL project, I recommend you refer to the documentation.

What is Go lang?

Go is an open source, statically typed, compiled programming language designed at Google. It is syntactically similar to C, but overcomes many of its limitations. It is often referred to as “Golang” because of its domain name, golang.org, but the proper name is Go. There are multiple areas where Go shines: it is statically and strongly typed with a clean approach to error handling, it compiles down to a single binary, compilation is super fast, it is open source, and above all it is simple.

To learn more about Go, please refer to the Go docs.

Prerequisites

To achieve our objective, i.e. connecting to PostgreSQL from Go, there are some prerequisites. Sticking to the core subject of this blog, I won’t cover them here; however, I’m providing some references:

Install PostgreSQL and setup an instance – https://www.postgresqltutorial.com/install-postgresql/

Install Go and configure workspace – https://www.callicoder.com/golang-installation-setup-gopath-workspace/

For the purpose of this tutorial, I’m using Postgres 11 and Go 1.13.xx

Objective – Connecting to PostgreSQL from Go

In the next few steps, I’ll demonstrate how to build a simple command line tool (CLI) in Go lang that can connect to a PostgreSQL instance. So let’s Go (pun intended).

Step 1. Gather PostgreSQL Instance details

host     = "localhost"
port     =  5432
user     = "postgres"
password = "secure-password"
dbname   = "connect-db"

Step 2. Install the github.com/lib/pq package

go get -u github.com/lib/pq

Go’s standard library was not built to include any specific database drivers, so we need to install a third-party package named lib/pq.

While Go provides us with the database/sql package that we will be utilizing to interact with our database, the standard libraries do not include drivers for every SQL database variant. Instead this is left up to the community, and from my experience the lib/pq package is the best driver for Postgres.

Step 3. Putting together our package pg-connect.go

package main

import (
	"database/sql"
	"fmt"

	_ "github.com/lib/pq" // blank import: registers the "postgres" driver with database/sql
)

Step 4. Configuring database connection string in our code

Inside the main() function, we’re going to create a connection string containing all the information required to connect to our Postgres database.

func main() {
  connStr := "user=postgres dbname=connect-db password=secure-password host=localhost sslmode=disable"
}

Depending on your specific setup, you may want to change sslmode. The lib/pq driver defaults to sslmode=require; for a local instance without SSL configured, set sslmode=disable as we do here.
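
lib/pq also accepts a URL-style connection string, if you prefer that form (same placeholder credentials as above):

connStr := "postgres://postgres:secure-password@localhost:5432/connect-db?sslmode=disable"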

Step 5. Creating a connection to our database

At this point our code is ready for a handshake with the database using the connection string connStr. To make this work, we will use the sql.Open() function, which takes 2 arguments: a driver name and a connection string. At this step, we’ll also add error handling to ensure we panic if anything goes wrong.

db, err := sql.Open("postgres", connStr )
if err != nil {
  panic(err)
}
defer db.Close()

Note that sql.Open() doesn’t actually verify the connection details, so next we’re going to call the Ping() method on the sql.DB object to test our connection. db.Ping() forces a database connection to be opened, confirming whether we can successfully reach the database.

err = db.Ping()
if err != nil {
  panic(err)
}
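
If you want this connectivity check to time out rather than hang, database/sql also provides PingContext. A sketch (it additionally needs the context and time packages imported):

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
if err := db.PingContext(ctx); err != nil {
	panic(err)
}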

Step 6. Putting it all together

package main

// connecting to a PostgreSQL database with Go's database/sql package
import (
	"database/sql"
	"fmt"

	_ "github.com/lib/pq"
)

func main() {
	/*
		variables required for the connection string connStr:

		user     = (using the default user for the postgres database)
		dbname   = (using the default database that comes with postgres)
		password = (password used during initial setup)
		host     = (hostname or IP address of the server)
		sslmode  = (must be set to disable unless using SSL)
	*/
	connStr := "user=postgres dbname=connect-db password=secure-password " +
		"host=localhost sslmode=disable"

	db, err := sql.Open("postgres", connStr)
	if err != nil {
		panic(err)
	}
	defer db.Close()

	err = db.Ping()
	if err != nil {
		panic(err)
	}
	fmt.Printf("\nSuccessfully connected to database!\n")
}

Next steps…

In the next post, we’ll see how to interact with the data within the database using Go lang.

Connect to PostgreSQL in VS Code

VS Code has a rich extension API that lets you add languages, debuggers, and tools to your installation to support easy development. The PostgreSQL extension allows the following:

  • Connect to PostgreSQL instances
  • View object DDL with ‘Go to Definition’ and ‘Peek Definition’
  • Write queries with IntelliSense
  • Run queries and save results as JSON, CSV, or Excel

Download link: https://marketplace.visualstudio.com/items?itemName=ms-ossdata.vscode-postgresql

Using VS Code PostgreSQL extension

  1. Open the Command Palette: Ctrl + Shift + P (on Mac, use ⌘ + Shift + P).

  2. Search for and select PostgreSQL: New Query.

  3. In the command palette, select Create Connection Profile. Follow the prompts to enter your Postgres instance hostname, database, username, and password.


You are now connected to your Postgres database. You can confirm this via the Status Bar (the ribbon at the bottom of the VS Code window). It will show your connected hostname, database, and user.

Now, let’s try to query the database.

  4. Type a query, e.g. SELECT * FROM pg_stat_activity;

  5. Right-click and select Execute Query (keyboard shortcut: ⌘M ⌘R); the results will show in a new window.

  6. You can also save the query results as JSON, CSV, or Excel.

So now you can seamlessly code for PostgreSQL from Microsoft VS Code without switching screens, leverage powerful IntelliSense, and execute queries.

Enjoy Coding!

IMP NOTE: Result windows from queries won’t show up again after being closed. This is a bug in the current version and is being worked on by the dev team. The workaround is either to keep the result window open or to close and re-open the VS Code window.

Site Reliability Engineering: How Google Runs Production Systems – Book Review

An essential read for anyone managing highly available distributed systems at scale

First off – it’s worth letting you know that Google lets you read this entire book online for free on their website. Yes, you read that right: you don’t need to buy the book, just click on the link below – https://landing.google.com/sre/sre-book/toc/index.html – and start reading!

The book starts with a story about a time Margaret Hamilton brought her young daughter with her to NASA, back in the days of the Apollo program. During a simulation mission, her daughter caused the mission to crash by accidentally pressing some keys. Hamilton noticed the defect and proactively submitted a change to add error-checking code to prevent it from happening again; however, the change was rejected because program leadership believed that error could never happen. On the next mission, Apollo 8, that exact error condition occurred, and a potentially fatal problem that could have been prevented with a trivial check took NASA’s engineers 9 hours to resolve. Hence an early lesson from the book:

“Embrace the idea that systems failures are inevitable, and therefore teams should work to optimize to recover quickly through using SRE principles.”

The book is divided into four parts, each comprised of several sections. Each section is authored by a Google engineer.

In Part I, Introduction, the authors introduce Google’s Site Reliability Engineering (SRE) approach to managing global-scale IT services running in datacenters spread across the entire world (Google’s approach is truly extraordinary). After a discussion of how SRE differs from DevOps (another hot term of the day), this part introduces the core elements and requirements of SRE, which include the traditional Service Level Objectives (SLOs) and Service Level Agreements (SLAs), management of changing services and requirements, demand forecasting and capacity planning, provisioning and allocation, etc. Through a sample service, Shakespeare, the authors introduce the core concepts of running a workflow, which is essentially a collection of IT tasks with inter-dependencies, in the datacenter.

In Part II, Principles, the book focuses on operational and reliability risks, SLO and SLA management, the notion of toil (mundane work that scales linearly and can be automated) and the need to eliminate it (through automation), how to monitor the complex system that is a datacenter, a process for automation as seen at Google, the notion of engineering releases, and, last, an essay on the need for simplicity. This rather disparate collection of notions is very useful, explained for the layman but still with enough technical content to be interesting even for the expert (practitioner or academic).

In Parts III and IV, Practices and Management, respectively, the book discusses a variety of topics: time-series analysis for anomaly detection, the practice and management of being on-call, various ways to prevent and address incidents occurring in the datacenter, postmortems and root-cause analysis that can help prevent future disasters, testing for reliability (a notoriously difficult issue), software engineering within the SRE team, load balancing and overload management (resource management and scheduling 101), communication between SRE engineers, and so on, ending with the predictable call for everyone to adopt SRE as early and as often as possible. This is where I started getting a much better sense of practical SRE (a-ha!).

Overall it’s a great read; however, it isn’t perfect. The two big downsides for me are: 1) this is one of those books that is a collection of chapters by different people, so there’s a fair amount of redundancy, and 2) the book takes a one-sided approach to the “Build vs Buy” dilemma of engineering. At Google scale it will almost always be better to build, but that is rarely true in the real world. Even with those downsides, I’d say this is the most valuable technical book I’ve read this year. If you really like these notes, you’ll probably want to read the full book.