PostgreSQL Advanced Pattern Matching – Beyond LIKE Operator

Whenever you are dealing with pattern matching in PostgreSQL, it is often required to write queries to match string patterns. The conventional wisdom to do pattern match was using the LIKE operator, which is not very effective for many reasons. PostgreSQL now offers more advanced pattern matching options such as SIMILAR TO expressions, POSIX regular expressions, and ANY operator. In this article, we’ll compare the LIKE operator with these other advanced pattern matching features.

1. LIKE operator

The LIKE expression returns true if the string matches the supplied pattern. For example, here is a query to find users with a “gmail.com” email account. This query uses the wildcard % to find users with emails that end with “@gmail.com”.

SELECT *
FROM   employees
 WHERE  email LIKE '%@gmail.com'; 
first_name last_name email
Steven K steven.k@gmail.com
Neena K neena.k@gmail.com

Now, what if, you have to find users with both “@outlook.com” and “@gmail.com”. Here we can use an OR condition to match against either one of these patterns.

SELECT *
FROM   employees
 WHERE  email LIKE '%@gmail.com'
     OR email LIKE '%@outlook.com'; 
first_name last_name email
Steven K steven.k@gmail.com
Neena K neena.k@gmail.com
David A david.a@outlook.com
Valli P valli.p@outlook.com

Say there are 20 distinct email patterns to match. The more patterns that we match, we add more OR conditions, and the query gets more complicated (and less readable). This is where SIMILAR TO operator comes in handy.

2. SIMILAR TO operator

SIMILAR TO operator lets you provide multiple patterns to match separated by a pipe operator. It is similar to LIKE, except that it interprets the pattern using the SQL standard’s definition of a regular expression. Also like LIKESIMILAR TO uses _ and % as wildcard characters denoting any single character and any string, respectively.

SELECT *
FROM employees
 WHERE email SIMILAR TO '%@gmail.com|%@outlook.com';
first_name last_name email
Steven K steven.k@gmail.com
Neena K neena.k@gmail.com
David A david.a@outlook.com
Valli P valli.p@outlook.com

Using SIMILAR TO you can match against multiple patterns and keep the query simple.

3. Introducing – REGEXP operator

Another way to match against multiple patterns is using the regular expression, or REGEXP. It is well known across many programming languages such as C, C++, .NET, Java that supports regexp. It is largely based on the powerful UNIX grep search tools.

SELECT *
FROM employees
 WHERE email ~ '@gmail\.com$|@outlook\.com$';
first_name last_name email
Steven K steven.k@gmail.com
Neena K neena.k@gmail.com
David A david.a@outlook.com
Valli P valli.p@outlook.com

When using REGEXP you need to take be a bit more cautious. A period “.” will match anything, so to match the period “.” in gmail.com or outlook.com, you need to add the escape character “\.“.

4. ANY to match against array of patterns

PostgreSQL has an ANY operator that is used to compare a scalar value with a set of values returned by a subquery. And in this case, the sub-query is an array for patterns to match.

SELECT *
FROM employees
 WHERE email ~ ANY(ARRAY['@gmail\.com$', '@outlook\.com$']);
first_name last_name email
Steven K steven.k@gmail.com
Neena K neena.k@gmail.com
David A david.a@outlook.com
Valli P valli.p@outlook.com

Here, this query uses the ANY operator to match against an array of email domain. If the email matches any of the patterns, the condition will be true. This approach is easier to work with from a host language such as Python.

Conclusion

  • PostgreSQL has a complete, and extremely tunable regular expression engine built right in.
  • Compared to LIKE, regular expressions offers much more flexibility and performance for pattern matching.
  • Also, everything you learn about PostgreSQL regular expressions is transferable to other programming environments. Regex is everywhere.
  • Finally, To learn more about pattern matching using regular expressions in PostgreSQL, I would recommend you follow the official tutorial.
Enjoy what you’re reading?

Join my mailing list, and I’ll let you know whenever I write another post. No spam, I promise! 🧑🏼‍💻

Author: Varun Dhawan

Hello dear reader, I'm a Product Manager @Microsoft based in MN, US (beautiful land of 10,000 lakes). I am perpetually curious and always willing to learn and engineer systems that can help solve complex problems using data. When I am not engineering or blogging, you’ll find me cooking and spending time with my family. Varun.Dhawan@gmail.com

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: