
Whenever you are dealing with pattern matching in PostgreSQL, it is often required to write queries to match string patterns. The conventional wisdom to do pattern match was using the LIKE operator, which is not very effective for many reasons. PostgreSQL now offers more advanced pattern matching options such as SIMILAR TO expressions, POSIX regular expressions, and ANY operator. In this article, we’ll compare the LIKE operator with these other advanced pattern matching features.
1. LIKE operator
The LIKE expression returns true if the string matches the supplied pattern. For example, here is a query to find users with a “gmail.com” email account. This query uses the wildcard % to find users with emails that end with “@gmail.com”.
SELECT *
FROM employees
WHERE email LIKE '%@gmail.com';
| first_name | last_name | |
|---|---|---|
| Steven | K | steven.k@gmail.com |
| Neena | K | neena.k@gmail.com |
Now, what if, you have to find users with both “@outlook.com” and “@gmail.com”. Here we can use an OR condition to match against either one of these patterns.
SELECT *
FROM employees
WHERE email LIKE '%@gmail.com'
OR email LIKE '%@outlook.com';
| first_name | last_name | |
|---|---|---|
| Steven | K | steven.k@gmail.com |
| Neena | K | neena.k@gmail.com |
| David | A | david.a@outlook.com |
| Valli | P | valli.p@outlook.com |
Say there are 20 distinct email patterns to match. The more patterns that we match, we add more OR conditions, and the query gets more complicated (and less readable). This is where SIMILAR TO operator comes in handy.
2. SIMILAR TO operator
SIMILAR TO operator lets you provide multiple patterns to match separated by a pipe operator. It is similar to LIKE, except that it interprets the pattern using the SQL standard’s definition of a regular expression. Also like LIKE, SIMILAR TO uses _ and % as wildcard characters denoting any single character and any string, respectively.
SELECT *
FROM employees
WHERE email SIMILAR TO '%@gmail.com|%@outlook.com';
| first_name | last_name | |
|---|---|---|
| Steven | K | steven.k@gmail.com |
| Neena | K | neena.k@gmail.com |
| David | A | david.a@outlook.com |
| Valli | P | valli.p@outlook.com |
Using SIMILAR TO you can match against multiple patterns and keep the query simple.
3. Introducing – REGEXP operator
Another way to match against multiple patterns is using the regular expression, or REGEXP. It is well known across many programming languages such as C, C++, .NET, Java that supports regexp. It is largely based on the powerful UNIX grep search tools.
SELECT *
FROM employees
WHERE email ~ '@gmail\.com$|@outlook\.com$';
| first_name | last_name | |
|---|---|---|
| Steven | K | steven.k@gmail.com |
| Neena | K | neena.k@gmail.com |
| David | A | david.a@outlook.com |
| Valli | P | valli.p@outlook.com |
When using REGEXP you need to take be a bit more cautious. A period “.” will match anything, so to match the period “.” in gmail.com or outlook.com, you need to add the escape character “\.“.
4. ANY to match against array of patterns
PostgreSQL has an ANY operator that is used to compare a scalar value with a set of values returned by a subquery. And in this case, the sub-query is an array for patterns to match.
SELECT *
FROM employees
WHERE email ~ ANY(ARRAY['@gmail\.com$', '@outlook\.com$']);
| first_name | last_name | |
|---|---|---|
| Steven | K | steven.k@gmail.com |
| Neena | K | neena.k@gmail.com |
| David | A | david.a@outlook.com |
| Valli | P | valli.p@outlook.com |
Here, this query uses the ANY operator to match against an array of email domain. If the email matches any of the patterns, the condition will be true. This approach is easier to work with from a host language such as Python.
Conclusion
- PostgreSQL has a complete, and extremely tunable regular expression engine built right in.
- Compared to LIKE, regular expressions offers much more flexibility and performance for pattern matching.
- Also, everything you learn about PostgreSQL regular expressions is transferable to other programming environments. Regex is everywhere.
- Finally, To learn more about pattern matching using regular expressions in PostgreSQL, I would recommend you follow the official tutorial.
Enjoy what you’re reading?
Join my mailing list, and I’ll let you know whenever I write another post. No spam, I promise! 🧑🏼💻