Perl Regular Expression Cheat Sheet



Summary: in this tutorial, you will learn how to use the MySQL REGEXP operator to perform complex searches based on regular expressions.

Introduction to regular expressions

Perl Regular Expression Reference

  • Regex resources (taken from rexegg.com’s regex cheat sheet). If you want to learn more, there are a lot of useful resources out there: Rexegg.com has many great articles on most aspects of regex, and Regex101 is a tester for your regex that offers a few different implementations.
  • Python, PHP, and Perl support metacharacters such as \a (alert) and \b (backspace).
  • A RegEx, or regular expression, is a sequence of characters that forms a search pattern. Regular expressions are typically used to find a sequence of characters within a string so you can extract and manipulate it. For example, in Python the pattern 'ac..ve' finds both instances of 'active' in a string that contains the word twice.
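
A minimal Python sketch of this kind of search, using the standard re module. The sample string is hypothetical (the original input was not preserved), and the pattern is written with two dots because 'active' has two characters between 'ac' and 've'.

```python
import re

# Hypothetical sample text containing the word "active" twice.
text = "my active account is no longer inactive"

# Each dot matches exactly one character, so 'ac..ve' matches "active".
pattern = r"ac..ve"

# findall returns every non-overlapping match as a list of strings.
matches = re.findall(pattern, text)
print(matches)
```

Here re.findall scans the whole string, so the second hit comes from inside the word "inactive".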

A regular expression is a special string that describes a search pattern. It is a powerful tool that gives you a concise and flexible way to identify strings of text, such as characters and words, based on patterns.

This is a quick reference to Perl’s regular expressions. For full information see the perlre and perlop manual pages. Operators:

  • =~ determines to which variable the regex is applied. In its absence, $_ is used. Example: $var =~ /foo/;
  • !~ determines to which variable the regex is applied, and negates the result of the match.

For example, you can use regular expressions to search for email addresses, IP addresses, phone numbers, social security numbers, or anything that has a specific pattern.

A regular expression uses its own syntax that can be interpreted by a regular expression processor. Regular expressions are widely used on almost all platforms, from programming languages to databases, including MySQL.

The advantage of using regular expressions is that you are not limited to searching for a string based on a fixed pattern with the percent sign (%) and underscore (_) in the LIKE operator. Regular expressions have more metacharacters for constructing flexible patterns.

The disadvantage of using regular expression is that it is quite difficult to understand and maintain such a complicated pattern. Therefore, you should describe the meaning of the regular expression in the comment of the SQL statement. In addition, the speed of data retrieval, in some cases, is decreased if you use complex patterns in a regular expression.

The abbreviation of regular expressions is regex or regexp.

MySQL REGEXP operator

MySQL versions before 8.0.4 use the regular expression implementation by Henry Spencer; later versions use International Components for Unicode (ICU). MySQL allows you to match patterns directly in SQL statements by using the REGEXP operator.

The REGEXP operator is used in the WHERE clause as follows: SELECT column_list FROM table_name WHERE string_column REGEXP pattern;

This statement performs a pattern match of a string_column against a pattern.

If a value in the string_column matches the pattern, the expression in the WHERE clause returns true; otherwise, it returns false.

If either string_column or pattern is NULL, the result is NULL.

In addition to the REGEXP operator, you can use the RLIKE operator, which is the synonym of the REGEXP operator.

The negation form of the REGEXP operator is NOT REGEXP.
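
The true, false, and NULL outcomes described above can be sketched in Python. This is only an emulation under stated assumptions: regexp and not_regexp are hypothetical helper names, Python's None stands in for NULL, and Python's re engine is not the engine MySQL uses.

```python
import re

def regexp(value, pattern):
    """Rough emulation of `value REGEXP pattern` (hypothetical helper)."""
    if value is None or pattern is None:
        return None  # NULL operand gives a NULL result
    # 1 for a match anywhere in the string, 0 otherwise
    return 1 if re.search(pattern, value) else 0

def not_regexp(value, pattern):
    """Rough emulation of `value NOT REGEXP pattern` (hypothetical helper)."""
    result = regexp(value, pattern)
    return None if result is None else 1 - result

print(regexp("Ford", "^F"), regexp("Ford", "^A"), regexp(None, "^A"))
print(not_regexp("Ford", "^A"), not_regexp("Ford", None))
```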

MySQL REGEXP examples

Suppose you want to find all products whose names start with the character A, B, or C. You can use the pattern '^(A|B|C)' with the REGEXP operator in the WHERE clause of a SELECT statement.

The pattern allows you to find the product whose name begins with A, B, or C.

  • The character ^ means to match from the beginning of the string.
  • The character | means to search for alternatives if one fails to match.
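
As a rough illustration of how ^ and | combine, here is a Python sketch that applies the pattern '^(A|B|C)' to a hypothetical list of product names. The names are made up, and re.IGNORECASE is an assumption that mimics a case-insensitive MySQL collation; the actual query would run on the MySQL server.

```python
import re

# Hypothetical product names, for illustration only.
products = ["Alpine Renault", "Boeing 747", "Corsair", "Diamond T", "1969 Ford Falcon"]

# ^ anchors the match at the start; (A|B|C) tries each alternative in turn.
pattern = re.compile(r"^(A|B|C)", re.IGNORECASE)

matching = [name for name in products if pattern.search(name)]
print(matching)
```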

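The following list shows some commonly used metacharacters and constructs in a regular expression:

  • ^ matches the position at the beginning of the searched string.
  • $ matches the position at the end of the searched string.
  • . matches any single character.
  • [...] matches any character specified inside the square brackets.
  • [^...] matches any character not specified inside the square brackets.
  • p1|p2 matches either of the patterns p1 or p2.
  • * matches the preceding element zero or more times.
  • + matches the preceding element one or more times.
  • {n} matches exactly n instances of the preceding element.
  • {m,n} matches from m to n instances of the preceding element.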

To find products whose names start with the character a, you use the metacharacter '^' to match at the beginning of the name, as in the pattern '^a'.

If you want the REGEXP operator to compare strings in a case-sensitive fashion, you can use the BINARY operator to cast a string to a binary string.

MySQL compares binary strings byte by byte rather than character by character, which makes the string comparison case sensitive.

For example, the pattern '^C' combined with the BINARY operator matches only an uppercase 'C' at the beginning of the product name.
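
The difference between the two comparison modes can be sketched in Python. This is an analogy rather than MySQL behavior itself: re.IGNORECASE stands in for a case-insensitive collation, the default Python matching stands in for a binary comparison, and the product names are hypothetical.

```python
import re

# Hypothetical product names, for illustration only.
products = ["Corsair F4U", "collectable mini", "American Airlines"]

# Case-insensitive: both "Corsair..." and "collectable..." start with c/C.
insensitive = [p for p in products if re.search(r"^C", p, re.IGNORECASE)]

# Case-sensitive: only an uppercase 'C' at the start qualifies.
sensitive = [p for p in products if re.search(r"^C", p)]

print(insensitive)
print(sensitive)
```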

To find the product whose name ends with f, you use 'f$' to match the end of a string.

To find products whose names contain the word 'ford', you use the pattern 'ford' on its own, which matches anywhere in the name.

To find products whose names contain exactly 10 characters, you use '^' and '$' to match the beginning and end of the product name, and repeat any character '.' exactly {10} times in between, which gives the pattern '^.{10}$'.
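
The three patterns above ('f$', 'ford', and '^.{10}$') can be tried out with a short Python sketch. The product names and the select helper are hypothetical, and re.IGNORECASE is an assumption standing in for a case-insensitive MySQL collation.

```python
import re

# Hypothetical product names, for illustration only.
products = ["1957 Ford Thunderbird", "Boeing X-32A JSF", "HMS Bounty", "P-51-D Mustang"]

def select(pattern):
    """Return the names that the pattern matches (hypothetical helper)."""
    return [p for p in products if re.search(pattern, p, re.IGNORECASE)]

ends_with_f = select(r"f$")        # $ anchors the match at the end of the name
contains_ford = select(r"ford")    # no anchors: matches anywhere in the name
exactly_ten = select(r"^.{10}$")   # exactly ten characters between start and end

print(ends_with_f, contains_ford, exactly_ten)
```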

In this tutorial, you have learned how to query data using the MySQL REGEXP operator with regular expressions.

In the previous article we saw how regular characters match themselves and how dot . can match any character.

A character class is something in between those two extremes. A character class is a list of characters that can be matched.

The list is placed in square brackets [].

For example [abc] will match either 'a' or 'b' or 'c'.

Just as a regular character or a . can match exactly one character, so does a character class. Later we are going to learn about quantifiers that will allow us to say how many of something we would like to match, but for now remember that a character class always matches exactly one character. If it cannot fulfill the match, then the whole regex matching fails.

So what if we have a bunch of strings and we would like to make sure only strings containing any of the following will match? #a#, #b#, #c#, #d#, #e#, #f#, #@# or #.# That is, we would like the string to have a # character, followed by 'a', 'b', 'c', 'd', 'e', 'f', '@', or '.', followed by another # character. (We are using # in this example in order to get you used to seeing 'strange' characters that have no special meaning.)

The regex that will match those looks like this: /#[abcdef@.]#/.

It says: match a #, then match any one(!) of the characters in the square bracket, then match another #.

This will match each of the strings listed above, but will not match strings such as '#z#', '##' or '#ab#'.

Two notes:

  • The regex won't match '##' or '#ab#' because the character class must match exactly one character between the two '#' characters.
  • The '.' inside the character class lost its special meaning of 'everything except newline' and can match a single '.' only.
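
The behavior of /#[abcdef@.]#/ can be checked with a small Python sketch. Python's re module treats this character class the same way for these cases; the sample strings beyond those named in the text are assumptions added for illustration.

```python
import re

# Same character class as the Perl regex /#[abcdef@.]#/.
pattern = re.compile(r"#[abcdef@.]#")

# One listed character between two '#' signs, anywhere in the string.
should_match = ["#a#", "#f#", "#@#", "#.#", "xx#c#yy"]

# Zero characters, two characters, or a character outside the list.
should_not_match = ["##", "#ab#", "#z#", "#A#"]

for s in should_match + should_not_match:
    print(s, bool(pattern.search(s)))
```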

In general, most special characters lose their special meaning inside a character class, but there are of course exceptions. There are even characters that gain special meaning inside a character class.

Range in a character class

Programmers are lazy, and typing in all the characters between 'a' and 'f' in the regex /#[abcdef@.]#/ was really tiring. If we had to type in all the characters between 'a' and 'z', that would be even worse, and it would be very error-prone. What if I miss one of the characters? Instead, regexes allow us to define a range of characters from the ASCII table using a dash (-). The previous regex could be written as /#[a-f@.]#/.

So, as you can see, a dash (-), which has no special meaning outside of a character class, takes on a special 'range-making' meaning inside one.

Of course, you will then want to know how to express that one of the characters you'd like to match in the character class is a dash. The answer is that if you place the dash as the first or the last character in the character class, it is just a plain dash. So /#[a-f@.-]#/ will match all of the above and also '#-#'.
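
A short Python sketch comparing the explicit list, the range form, and the variant with a trailing dash. The sample strings are assumptions chosen for illustration; Python's re module handles ranges and a trailing dash in a character class the same way as described here.

```python
import re

explicit = re.compile(r"#[abcdef@.]#")   # every character typed out
ranged = re.compile(r"#[a-f@.]#")        # a-f written as a range
with_dash = re.compile(r"#[a-f@.-]#")    # trailing dash is a literal '-'

samples = ["#a#", "#c#", "#f#", "#@#", "#.#", "#g#", "#-#", "##"]

# The explicit and ranged forms agree on every sample.
same = all(bool(explicit.search(s)) == bool(ranged.search(s)) for s in samples)

# Only the third form accepts a dash between the '#' characters.
dash_only_in_last = ranged.search("#-#") is None and with_dash.search("#-#") is not None

print(same, dash_only_in_last)
```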

Another frequently asked question at this point is how to include a closing square bracket ] in a character class. That's simple too. You just need to 'escape' it by preceding it with a backslash: \].
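
A minimal Python sketch of the escaped closing bracket. The class [a-f\]] is an assumed example, not one taken from the article; it accepts the letters a to f or a literal ']'.

```python
import re

# \] inside the class is a literal closing bracket, not the end of the class.
pattern = re.compile(r"#[a-f\]]#")

print(bool(pattern.search("#]#")))  # literal ']' between the '#' characters
print(bool(pattern.search("#a#")))  # a letter from the a-f range
print(bool(pattern.search("#[#")))  # '[' is not in the class
```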

Negated character class

What if we would like to allow the matching of any character between two '#' characters except 'a', 'b', or 'c'? We would need to construct a character class with all the characters in the world, and Unicode has more than 110,000 characters. That would be a lot of work to type in. Instead, Perl allows us to negate a character class. If we put a caret (^) as the first character in the character class, the character class can match any one character except those mentioned in it. So [^abc] would match exactly one character that is neither 'a', nor 'b', nor 'c'. Our full regex would then look like /#[^abc]#/.

This regex will match strings such as '#d#', '#x#' or '#@#', but will not match '#a#', '#b#' or '#c#'.

Note, it won't match the string '##' or the string '#xyz#', because the negated character class still has to match exactly one character.
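
The negated class can be tried out with a small Python sketch. The sample strings other than '##' and '#xyz#' are assumptions for illustration; Python's re module handles [^abc] the same way for these cases.

```python
import re

# [^abc] matches exactly one character that is not 'a', 'b', or 'c'.
pattern = re.compile(r"#[^abc]#")

matches = ["#d#", "#x#", "#@#", "#^#"]

# Excluded letters, no character at all, or more than one character.
non_matches = ["#a#", "#b#", "#c#", "##", "#xyz#"]

for s in matches + non_matches:
    print(s, bool(pattern.search(s)))
```

Note that in '#^#' the caret is matched as an ordinary character, because it only negates when it is the first character inside the brackets.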

Summary

Published on 2014-11-09
