Regular expressions – Overview

Definition

A regular expression is a special text string that describes a search pattern. A pattern consists of one or more character literals, operators, or constructs. Regular expressions let you describe a pattern very compactly, for example an email address, a phone format, or password rules. They are powerful and widely supported: the same ideas apply whether the code is written in Perl, PHP, Java, a .NET language, or one of many other languages.
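As a small illustration of the idea, the sketch below uses Python's built-in re module to check a phone format. The pattern (three digits, a hyphen, four digits), the names PHONE_PATTERN, is_phone and find_phones, and the sample strings are all assumptions made up for this example; they are not taken from any real phone numbering plan.

```python
import re

# Hypothetical phone format for illustration: three digits, a hyphen, four digits.
PHONE_PATTERN = re.compile(r"[0-9]{3}-[0-9]{4}")

def is_phone(text):
    """Return True when the whole string matches the assumed phone format."""
    return PHONE_PATTERN.fullmatch(text) is not None

def find_phones(text):
    """Return every substring that looks like the assumed phone format."""
    return PHONE_PATTERN.findall(text)

print(is_phone("555-1234"))                       # True
print(is_phone("5551234"))                        # False
print(find_phones("Call 555-1234 or 555-9876."))  # ['555-1234', '555-9876']
```

Validating a whole string (fullmatch) and searching inside a longer text (findall) are the two most common ways a pattern like this gets used.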

1. Special Characters in Regular Expressions and Their Meanings

The characters below have a special meaning inside a pattern. To match one of them literally, escape it with a backslash (\).

Character  Meaning                                       Example
*          Matches zero or more of the previous item     Be* matches B, Be, or Beeee
?          Matches zero or one of the previous item      Colou?r matches Color or Colour but not Colouur
+          Matches one or more of the previous item      He+ matches He or Hee but not H
\          Escapes a special character                   he\? matches he?
.          Wildcard: matches any single character        cat. matches catT and cat2 but not catty
( )        Groups characters                             a(bee)?t matches at or abeet but not abet
[ ]        Matches any one character in a set or range   bee[rft] matches beer, beef, or beet
                                                         [0-9]+ matches one or more digits, such as 7 or 2024
                                                         [A-Za-z] matches one ASCII letter, uppercase or lowercase
                                                         [^0-9] matches any character that is not 0-9
|          Matches the previous OR the next alternative  (Mon|Tues)day matches Monday or Tuesday
                                                         July (first|1st|1) matches July first, July 1st, or July 1 but not July 2
{ }        Matches a given number of the previous item   [0-9]{3} matches exactly three digits: 315 but not 31
                                                         [0-9]{2,4} matches two, three, or four digits: 12, 123, and 1234
                                                         [0-9]{2,} matches two or more digits: 1234567…
^          Start of a string; negation inside [ ]        ^http matches strings that begin with http, such as a URL
                                                         [^0-9] matches any character that is not 0-9
$          End of a string                               cat$ matches any string that ends with cat
                                                         ing$ matches exciting but not ingenious
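The rows of this table can be checked directly. The following is a minimal sketch using Python's built-in re module; the helper name whole_match is an assumption introduced for this example. It relies on re.fullmatch, so a pattern has to account for the entire string, which is the reading under which examples such as "cat. matches cat2 but not catty" hold.

```python
import re

def whole_match(pattern, text):
    """Return True when the pattern matches the entire string."""
    return re.fullmatch(pattern, text) is not None

# Quantifiers: *, ?, +
assert whole_match(r"Be*", "B") and whole_match(r"Be*", "Beeee")
assert whole_match(r"Colou?r", "Color") and not whole_match(r"Colou?r", "Colouur")
assert whole_match(r"He+", "Hee") and not whole_match(r"He+", "H")

# Escaping, wildcard, grouping
assert whole_match(r"he\?", "he?")
assert whole_match(r"cat.", "cat2") and not whole_match(r"cat.", "catty")
assert whole_match(r"a(bee)?t", "abeet") and not whole_match(r"a(bee)?t", "abet")

# Character sets, alternation, counted repetition
assert whole_match(r"bee[rft]", "beef")
assert whole_match(r"(Mon|Tues)day", "Tuesday")
assert whole_match(r"[0-9]{2,4}", "123") and not whole_match(r"[0-9]{3}", "31")

# Anchors matter when searching inside a longer string
assert re.search(r"^http", "http://example.com") is not None
assert re.search(r"ing$", "exciting") and not re.search(r"ing$", "ingenious")

# re.escape adds the backslashes needed to match text literally
assert whole_match(re.escape("he?"), "he?")
```

If every assertion passes, the script finishes silently. Note that (Mon|Tues)day keeps both alternatives inside one group; written as (Mon)|(Tues)day, the alternation would split the whole pattern into "Mon" and "Tuesday".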

2. Perl-Style Metacharacters

Character  Meaning                                                 Example
i          Append to pattern to specify a case-insensitive match   /colou?r/i matches COLOR or Colour
\b         A word boundary: the spot between \w and \W characters  /\bis\b/ matches is but not island
\B         A non-word boundary                                     /fred\B/i matches Frederick but not Fred
\w         A single word character: alphanumeric or underscore     /\w/ matches 1 or _ but not ?
\W         A single non-word character                             /a\Wb/i matches a!b but not a2b
\d         A single digit character                                /b\dc/i matches b2c but not bac
\D         A single non-digit character                            /a\Db/i matches aCb but not a3b
\s         A single whitespace character                           /a\sb/ matches a b but not ab
\S         A single non-whitespace character                       /a\Sb/ matches a2b but not a b
\t         The tab character (ASCII 9)                             /\t/ matches a tab
\n         The newline character (ASCII 10)                        /\n/ matches a newline
\r         The carriage return character (ASCII 13)                /\r/ matches a carriage return
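Most of these metacharacters carry over to other engines. As a hedged sketch, here is how a few of them look in Python's built-in re module, where patterns are ordinary strings rather than /…/ literals and the trailing i modifier becomes the re.IGNORECASE flag. The variable names and sample strings are assumptions chosen for the example.

```python
import re

# Perl's trailing "i" modifier corresponds to the re.IGNORECASE flag in Python.
colour = re.compile(r"colou?r", re.IGNORECASE)
print(bool(colour.fullmatch("COLOR")))    # True
print(bool(colour.fullmatch("Colour")))   # True

# \b marks a word boundary, so the "is" inside "this" or "island" is skipped.
print(re.findall(r"\bis\b", "this is an island"))   # ['is']

# \B requires that there is no word boundary at that position.
fred = re.compile(r"fred\B", re.IGNORECASE)
print(bool(fred.search("Frederick")))   # True
print(bool(fred.search("Fred")))        # False

# Shorthand classes: \d digit, \s whitespace, \w word character.
print(re.findall(r"\d+", "room 12, floor 3"))   # ['12', '3']
print(re.split(r"\s+", "a b\tc\nd"))            # ['a', 'b', 'c', 'd']
print(bool(re.fullmatch(r"\w", "_")))           # True
print(bool(re.fullmatch(r"\w", "?")))           # False
```

In Python 3, \w, \d and \s are Unicode-aware for text patterns; passing re.ASCII restricts them to the ASCII ranges described in the table.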

3. POSIX Character Classes

A POSIX character class is a special metasequence of the form [:…:] that can be used only inside a bracketed expression in #px syntax, so a letter is matched with [[:alpha:]] rather than with [:alpha:] on its own. The supported POSIX classes are:

Character Class  Meaning
[:alpha:]        Any letter, [A-Za-z]
[:upper:]        Any uppercase letter, [A-Z]
[:lower:]        Any lowercase letter, [a-z]
[:digit:]        Any digit, [0-9]
[:alnum:]        Any alphanumeric character, [A-Za-z0-9]
[:xdigit:]       Any hexadecimal digit, [0-9A-Fa-f]
[:space:]        A tab, new line, vertical tab, form feed, carriage return, or space
[:blank:]        A space or a tab
[:print:]        Any printable character, including the space
[:punct:]        Any punctuation character: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
[:graph:]        Any printable character that is not part of the space character class
[:word:]         Any word character: alphanumeric or underscore, same as \w
[:ascii:]        ASCII characters, in the range 0-127
[:cntrl:]        Control characters: ASCII 0 to 31, and 127
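Not every engine understands these classes. Python's built-in re module, for instance, does not support them, so one workaround is to expand each class into its explicit ASCII character list before compiling. The sketch below does that for a subset of the classes; POSIX_CLASSES and expand_posix are hypothetical names invented for this example, not part of any library.

```python
import re

# Hypothetical mapping from POSIX class names to ASCII-only bracket contents.
POSIX_CLASSES = {
    "alpha": "A-Za-z",
    "upper": "A-Z",
    "lower": "a-z",
    "digit": "0-9",
    "alnum": "A-Za-z0-9",
    "xdigit": "0-9A-Fa-f",
    "space": r" \t\n\v\f\r",
    "blank": r" \t",
    "word": "A-Za-z0-9_",
}

def expand_posix(pattern):
    """Replace each [:name:] in the pattern with its explicit character list."""
    return re.sub(
        r"\[:([a-z]+):\]",
        lambda m: POSIX_CLASSES[m.group(1)],
        pattern,
    )

hex_colour = expand_posix(r"#[[:xdigit:]]{6}")
print(hex_colour)                                  # #[0-9A-Fa-f]{6}
print(bool(re.fullmatch(hex_colour, "#1a2B3c")))   # True
print(bool(re.fullmatch(hex_colour, "#12345g")))   # False
```

Because only the inner [:name:] is replaced, the surrounding brackets stay in place and the result is an ordinary character set. A class name missing from the mapping raises KeyError, which keeps typos from passing silently.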

You can practice regular expressions online with a regex tester.
