Most modern-day relational database systems use SQL MERGE (also called UPSERT) statements to INSERT new records or UPDATE existing records if a matching row already exists. UPSERT is a combination of Insert and Update, driven by a “PRIMARY KEY” on the table.
Say, we have a use case to insert all records from a text file sent by an external app every day into our database table that has a PRIMARY KEY defined. During the insert, if a duplicate record is found, entire insert batch will fail with “duplicate key violation error“. Annoying, right?!
Postgres 9.5 (and later) has introduced UPSERT (INSERT ON CONFLICT) operation, that can allow either updating the data for duplicate row in or just silently skipping the duplicate row, without any error.
Here, let me explain this using coffee shop scenario – coffee makes almost everything easy 😛
Coffee Shop Scenario ☕
I own a coffee shop, and I want to keep a record of drink preferences for all my customers. There’s one problem though, the preferences change over time. So how can I record all drink preference of for my customer in one place, without creating a duplicate row in database. Until now, to solve this, this would require writing a complex stored procedure in SQL, PL/pgSQL, C, Python, etc. In Postgres 9.5 (and later), this could be easily achieved using single line INSERT ON CONFLICT statement. Let’s see how…
Step 1. Create a test table
CREATE TABLE myCoffee ( id serial PRIMARY KEY, name VARCHAR UNIQUE, preference VARCHAR NOT NULL);
myCoffee table consists of three columns: id, name, and preference. The name column has a unique constraint to guarantee the uniqueness of customer names.
Step 2. INSERT sample rows
INSERT INTO myCoffee (name, preference) VALUES ('Tom', 'Cappuccino'), ('Joe', 'Espresso'), ('Varun', 'Frappé');
Step 3. UPSERT in action
Now suppose Joe visits my shop again on a hot summer day and asks for an ‘Iced Tea’ instead of his preferred drink. We have few options here:
Options A. Serve Joe the drink and skip updating his “new” drink preference (we already have his data in our table).
The below INSERT statement has ON CONFLICT DO NOTHING. This will ensure that if a row already exists, it will be skipped. Else, a new row will be inserted.
INSERT INTO myCoffee (NAME, preference) VALUES('Joe','Iced Tea') ON CONFLICT (name) DO NOTHING;
Options B. Or else, we can register ‘Iced Tea’ as his new drink preference using ON CONFLICT DO UPDATE as in the below statement
INSERT INTO myCoffee (name, preference) VALUES('Joe','Iced Tea is my 2nd favorite') ON CONFLICT (name) DO UPDATE SET preference = EXCLUDED.preference || ';' || myCoffee.preference;
|2||Joe||Iced Tea is my 2nd favorite; Espresso|
So, we just learned that UPSERT is a combination of two different SQL statements UPDATE and INSERT. It works on a simple rule that if a new row being inserted does not have any duplicate then insert it, else if there are duplicate rows then either skip insert or update the new column value.
Reference – ON CONFLICT Clause
UPSERT is a helpful feature to handle records that require to be updated frequently. I hope y’all found this post useful.