MariaDB Crash Course - Ben Forta [25]
Regular expressions are supported in all sorts of programming languages, text editors, operating systems, and more. And savvy programmers and network managers have long regarded regular expressions as a vital component of their technical toolboxes.
Regular expressions are created using the regular expression language, a specialized language designed to do everything that was just discussed and much more. Like any language, regular expressions have a special syntax and instructions that you must learn.
* * *
Note: To Learn More
Full coverage of regular expressions is beyond the scope of this chapter. While the basics are covered here, for a more thorough introduction to regular expressions you might want to obtain a copy of my Sams Teach Yourself Regular Expressions in 10 Minutes (ISBN 0672325667).
* * *
Using Regular Expressions
So what does this have to do with MariaDB? As already explained, all regular expressions do is match text, comparing a pattern (the regular expression) with a string of text. MariaDB provides rudimentary support for regular expressions with WHERE clauses, allowing you to specify regular expressions that are used to filter data retrieved using SELECT.
* * *
Note: Just a Subset of the Regular Expression Language
If you are already familiar with regular expressions, take note. MariaDB supports only a small subset of what is supported in most regular expression implementations, and this chapter covers most of what is supported.
* * *
This will all become much clearer with some examples.
Basic Character Matching
We start with a simple example. The following statement retrieves all rows where column prod_name contains the text 1000:
Input
SELECT prod_name
FROM products
WHERE prod_name REGEXP '1000'
ORDER BY prod_name;
Output
+--------------+
| prod_name |
+--------------+
| JetPack 1000 |
+--------------+
Analysis
This statement looks much like the ones that used LIKE (in Chapter 8, “Using Wildcard Filtering”), except that the keyword LIKE has been replaced with REGEXP. This tells MariaDB that what follows is to be treated as a regular expression (one that just matches the literal text 1000).
So, why bother using a regular expression? Well, in the example just used, regular expressions really add no value (and probably hurt performance), but consider this next example:
Input
SELECT prod_name
FROM products
WHERE prod_name REGEXP '.000'
ORDER BY prod_name;
Output
+--------------+
| prod_name |
+--------------+
| JetPack 1000 |
| JetPack 2000 |
+--------------+
Analysis
Here the regular expression .000 was used. . is a special character in the regular expression language. It means match any single character, and so both 1000 and 2000 matched and were returned.
Of course, this particular example could also have been accomplished using LIKE and wildcards (as seen in Chapter 8).
* * *
Note: LIKE Versus REGEXP
There is one important difference between LIKE and REGEXP. Look at these two statements:
SELECT prod_name
FROM products
WHERE prod_name LIKE '1000'
ORDER BY prod_name;
and
SELECT prod_name
FROM products
WHERE prod_name REGEXP '1000'
ORDER BY prod_name;
If you were to try them both you’d discover that the first returns no data and the second returns one row. Why is this?
As seen in Chapter 8, LIKE matches an entire column. If the text to be matched existed in the middle of a column value, LIKE would not find it and the row would not be returned (unless wildcard characters were used). REGEXP, on the other hand, looks for matches within column values, and so if the text to be matched existed in the middle of a column value, REGEXP would find it and the row would be returned. This is an important distinction.
So can REGEXP be used to match entire column values (so that