MariaDB Crash Course - Ben Forta [27]
Input
SELECT vend_name
FROM vendors
WHERE vend_name REGEXP '.'
ORDER BY vend_name;
Output
+----------------+
| vend_name |
+----------------+
| ACME |
| Anvils R Us |
| Furball Inc. |
| Jet Set |
| Jouets Et Ours |
| LT Supplies |
+----------------+
Analysis
That did not work. . matches any character, and so every row was retrieved.
To match special characters they must be preceded by \\. So, \\- means find – and \\. means find .
Input
SELECT vend_name
FROM vendors
WHERE vend_name REGEXP '\\.'
ORDER BY vend_name;
Output
+--------------+
| vend_name |
+--------------+
| Furball Inc. |
+--------------+
Analysis
That worked. \\. matches ., and so only a single row was retrieved. This process is known as escaping, and all characters that have special significance within regular expressions must be escaped this way. This includes ., |, [], and all the other special characters used thus far.
\\ is also used to refer to metacharacters (characters that have specific meanings), as listed in Table 9.1.
Table 9.1 Whitespace Metacharacters
* * *
Tip: To Match \
To match the backslash character itself (\), you need to use \\\.
* * *
* * *
Note: \ or \\?
Most regular expression implementations use a single backslash to escape special characters to be able to use them as literals. MariaDB, however, requires two backslashes (MariaDB itself interprets one, and the regular expression library interprets the other).
* * *
Matching Character Classes
There are matches that you’ll find yourself using frequently—digits, or all alphabetical characters, or all alphanumerical characters, and so on. To make working with these easier, you may use predefined character sets known as character classes. Table 9.2 lists the character classes and what they mean.
Table 9.2 Character Classes
Matching Multiple Instances
All the regular expressions used thus far attempt to match a single occurrence. If there is a match, the row is retrieved, and if not, nothing is retrieved. But sometimes you require greater control over the number of matches. For example, you might want to locate all numbers regardless of how many digits the number contains, or you might want to locate a word but also be able to accommodate a trailing s if one exists, and so on.
This can be accomplished using the regular expressions repetition metacharacters, listed in Table 9.3.
Table 9.3 Repetition Metacharacters
Following are some examples.
Input
SELECT prod_name
FROM products
WHERE prod_name REGEXP '\\([0-9] sticks?\\)'
ORDER BY prod_name;
Output
+----------------+
| prod_name |
+----------------+
| TNT (1 stick) |
| TNT (5 sticks) |
+----------------+
Analysis
Regular expression \\([0-9] sticks?\\) requires some explanation. \\( matches (, [0-9] matches any digit (1 and 5 in this example), sticks? matches stick and sticks (the ? after the s makes that s optional because ? matches 0 or 1 occurrence of whatever it follows), and \\) matches the closing ). Without ? it would have been difficult to match both stick and sticks.
Here’s another example. This time we try to match four consecutive digits:
Input
SELECT prod_name
FROM products
WHERE prod_name REGEXP '[[:digit:]]{4}'
ORDER BY prod_name;
Output
+--------------+
| prod_name |
+--------------+
| JetPack 1000 |
| JetPack 2000 |
+--------------+
Analysis
As explained previously, [:digit:] matches any digit, and so [[:digit:]] is a set of digits. {4} requires exactly four occurrences of whatever it follows (any digit), and so [[:digit:]]{4} matches any four consecutive digits.
It is worth noting that when using regular expressions there is almost always more than one way to write a specific expression. The previous example could have also been written as follows:
Input
SELECT prod_name
FROM products
WHERE prod_name REGEXP '[0-9][0-9][0-9][0-9]'
ORDER BY prod_name;