Regular Expression Syntax, Regular Expression Quick Reference

Regular expressions, also referred to as rule expressions, are text patterns commonly used to search, replace, and manipulate text. A pattern consists of ordinary characters (such as the letters a to z) and special metacharacters. Regular expressions were first popularized by Unix and were later adopted in Scala, PHP, C#, Java, C++, Objective-C, Perl, Swift, VBScript, JavaScript, Ruby, Python, and many other languages. Learning regular expressions largely means learning a flexible way of reasoning about strings, so that text can be controlled with simple and efficient patterns.

Regex Character	Description
\	Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, "`n`" matches the character "`n`". "`\n`" matches a newline. The sequence "`\\`" matches "`\`" and "`\(`" matches "`(`".
^	Matches the position at the start of the input string. If the RegExp object's Multiline property is set, ^ also matches the position after any "`\n`" or "`\r`".
$	Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position before any "`\n`" or "`\r`".
*	Matches the preceding subexpression zero or more times. For example, "`zo*`" can match "`z`" as well as "`zoo`". * is equivalent to {0,}.
+	Matches the preceding subexpression one or more times. For example, "`zo+`" can match "`zo`" as well as "`zoo`", but not "`z`". + is equivalent to {1,}.
?	Matches the preceding subexpression zero or one time. For example, "`do(es)?`" can match "`do`" or "`does`". ? is equivalent to {0,1}.
{n}	n is a non-negative integer. Matches exactly n times. For example, "`o{2}`" cannot match the "`o`" in "`Bob`", but it can match the two o's in "`food`".
{n,}	n is a non-negative integer. Matches at least n times. For example, "`o{2,}`" cannot match the "`o`" in "`Bob`", but it can match all o's in "`foooood`". "`o{1,}`" is equivalent to "`o+`". "`o{0,}`" is equivalent to "`o*`".
{n,m}	m and n are non-negative integers, where n<=m. Matches at least n and at most m times. For example, "`o{1,3}`" will match the first three o's in "`fooooood`". "`o{0,1}`" is equivalent to "`o?`". Note that there must not be a space between the comma and the numbers.
?	When this character follows any of the other quantifiers (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. The non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much of the searched string as possible. For example, in the string "`oooo`", "`o+?`" will match a single "`o`", whereas "`o+`" will match all "`o`"s.
.	Matches any single character except for "`\n`". To match any character including "`\n`", use a pattern like "(.|\n)".
(pattern)	Matches 'pattern' and captures the match. The captured matches can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0…$9 properties in JScript. To match literal parenthesis characters, use "\(" or "\)".
(?:pattern)	Matches 'pattern' but does not capture the match, meaning it is a non-capturing match that is not stored for later use. This is useful when using the alternation character "|" to combine parts of a pattern. For example, "industr(?:y|ies)" is a more concise expression than "industry|industries".
(?=pattern)	Positive lookahead assertion. Matches a search string at any point where a string matching 'pattern' begins. This is a non-capturing match, meaning the match is not stored for later use. For example, "Windows(?=95|98|NT|2000)" matches "Windows" in "Windows2000" but not in "Windows3.1". Lookaheads do not consume characters, meaning after a match occurs, the search for the next match begins immediately after the last match, not after the characters included in the lookahead.
(?!pattern)	Negative lookahead assertion. Matches a search string at any point where a string not matching 'pattern' begins. This is a non-capturing match, meaning the match is not stored for later use. For example, "Windows(?!95|98|NT|2000)" matches "Windows" in "Windows3.1" but not in "Windows2000". Lookaheads do not consume characters, meaning after a match occurs, the search for the next match begins immediately after the last match, not after the characters included in the lookahead.
(?<=pattern)	Positive lookbehind assertion, similar to positive lookahead but in the opposite direction. For example, "(?<=95|98|NT|2000)Windows" matches "Windows" in "2000Windows" but not in "3.1Windows".
(?<!pattern)	Negative lookbehind assertion, similar to negative lookahead but in the opposite direction. For example, "(?<!95|98|NT|2000)Windows" matches "Windows" in "3.1Windows" but not in "2000Windows".
x|y	Matches either 'x' or 'y'. For example, "z|food" matches either "z" or "food". "(z|f)ood" matches "zood" or "food".
[xyz]	Character class. Matches any single character that is included. For example, "[abc]" matches "a" in "plain".
[^xyz]	Negated character class. Matches any single character that is not included. For example, "[^abc]" matches "p" in "plain".
[a-z]	Character range. Matches any single character within the specified range. For example, "[a-z]" matches any lowercase letter from "a" to "z".
[^a-z]	Negated character range. Matches any single character that is not within the specified range. For example, "[^a-z]" matches any character that is not a lowercase letter from "a" to "z".
\b	Matches a word boundary, that is, the position between a word character and a non-word character such as a space. For example, "er\b" matches the "er" in "never", but not the "er" in "verb".
\B	Matches a non-word boundary. For example, "er\B" matches the "er" in "verb", but not the "er" in "never".
\cx	Matches a control character specified by x. For example, \cM matches a Control-M or carriage return. The value of x must be one of A-Z or a-z. Otherwise, c is treated as a literal 'c' character.
\d	Matches a digit character. Equivalent to [0-9].
\D	Matches a non-digit character. Equivalent to [^0-9].
\f	Matches a form feed. Equivalent to \x0c and \cL.
\n	Matches a newline character. Equivalent to \x0a and \cJ.
\r	Matches a carriage return. Equivalent to \x0d and \cM.
\s	Matches any whitespace character, including space, tab, form feed, etc. Equivalent to [ \f\n\r\t\v].
\S	Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t	Matches a horizontal tab. Equivalent to \x09 and \cI.
\v	Matches a vertical tab. Equivalent to \x0b and \cK.
\w	Matches any word character including underscore. Equivalent to "[A-Za-z0-9_]".
\W	Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".
\xn	Matches n, where n is a hexadecimal escape value. The hexadecimal escape value must be exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" followed by "1". This allows ASCII codes to be used in regular expressions.
\num	Matches num, where num is a positive integer. This serves as a reference to a captured match. For example, "(.)\1" matches two consecutive identical characters.
\n	Denotes an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, then n is a backreference. Otherwise, if n is an octal digit (0-7), then n represents an octal escape value.
\nm	Denotes an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, then nm is a backreference. If \nm is preceded by at least n captures, then n followed by the literal m represents a backreference. If neither condition is met, and if n and m are both octal digits (0-7), then \nm matches the octal escape value nm.
\nml	If n is an octal digit (0-3), and both m and l are octal digits (0-7), then it matches the octal escape value nml.
\un	Matches n, where n is a Unicode character represented by four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
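
The entries above can be checked with a short script. The following is a minimal sketch using Python's standard `re` module, which is an assumption of this example: the table itself describes the VBScript/JScript RegExp object, and engines differ in places. For instance, Python's `re` only accepts fixed-width lookbehind, so the lookbehind below uses a single alternative rather than the "95|98|NT|2000" list.

```python
import re

# Greedy vs. non-greedy quantifiers on the string "oooo".
greedy = re.search(r"o+", "oooo").group()    # "oooo"
lazy = re.search(r"o+?", "oooo").group()     # "o"

# Optional group: "do(es)?" accepts "do" and "does", but not "doe".
optional = [re.fullmatch(r"do(es)?", w) is not None
            for w in ("do", "does", "doe")]  # [True, True, False]

# Lookahead asserts what follows without consuming it.
pos = re.search(r"Windows(?=95|98|NT|2000)", "Windows2000")
neg = re.search(r"Windows(?!95|98|NT|2000)", "Windows2000")  # None

# Lookbehind asserts what precedes the match (fixed width in Python's re).
behind = re.search(r"(?<=2000)Windows", "2000Windows")

# Word boundaries: "er\b" finds the "er" ending "never",
# while "er\B" finds the "er" inside "verb".
at_word_end = re.search(r"er\b", "never verb").start()   # 3
inside_word = re.search(r"er\B", "never verb").start()   # 7

# Backreference: "(.)\1" matches two consecutive identical characters.
doubled = re.search(r"(.)\1", "hello").group()           # "ll"

# "." skips newlines unless the engine is told otherwise (re.DOTALL here).
dot_default = re.search(r"a.b", "a\nb")                  # None
dot_all = re.search(r"a.b", "a\nb", re.DOTALL)

print(greedy, lazy, optional, pos.group(), pos.end(), doubled)
```

Note that `pos.end()` is 7: the lookahead confirmed "2000" was present, but the match itself stops right after "Windows".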

Common Regular Expressions

Username	/^[a-z0-9_-]{3,16}$/
Password	/^[a-z0-9_-]{6,18}$/
Strong Password	(?=^.{8,}$)(?=.*\d)(?=.*\W+)(?=.*[A-Z])(?=.*[a-z])(?!.*\n).*$ (Must contain digits, uppercase letters, lowercase letters, and punctuation; at least 8 characters long)
Hexadecimal Value	/^#?([a-f0-9]{6}|[a-f0-9]{3})$/
Email	/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/ or /^[a-z\d]+(\.[a-z\d]+)*@([\da-z](-[\da-z])?)+(\.{1,2}[a-z]+)+$/ or \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
URL	/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/ or [a-zA-Z]+://[^\s]*
IP Address	/((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)/ or /^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/
HTML Tag	/^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/ or <(.*)(.*)>.*<\/\1>|<(.*) \/>
Remove Code Comments	(?<!http:|\S)//.*$
Match Double-Byte Characters (Including Chinese)	[^\x00-\xff]
Chinese Character	[\u4e00-\u9fa5]
Range of Chinese Characters in Unicode	/^[\u2E80-\u9FFF]+$/
Chinese Characters and Full-Width Punctuation	[\u3000-\u301e\ufe10-\ufe19\ufe30-\ufe44\ufe50-\ufe6b\uff01-\uffee]
Date (Year-Month-Day)	(\d{4}|\d{2})-(0?[1-9]|1[0-2])-(0?[1-9]|[12][0-9]|3[01])
Date (Month/Day/Year)	(0?[1-9]|1[0-2])/(0?[1-9]|[12][0-9]|3[01])/(\d{4}|\d{2})
Time (Hour:Minute, 24-Hour Format)	((1|0?)[0-9]|2[0-3]):([0-5][0-9])
Chinese Mainland Fixed Phone Number	(\d{4}-|\d{3}-)?(\d{8}|\d{7})
Chinese Mainland Mobile Phone Number	1\d{10}
Chinese Mainland Postal Code	[1-9]\d{5}
Chinese Mainland ID Card Number (15 or 18 Digits)	\d{15}(\d\d[0-9xX])?
Non-Negative Integer (Positive Integer or Zero)	\d+
Positive Integer	[0-9]*[1-9][0-9]*
Negative Integer	-[0-9]*[1-9][0-9]*
Integer	-?\d+
Decimal Number	(-?\d+)(\.\d+)?
Blank Line	\n\s*\r or \n\n (editplus) or ^[\s\S ]*\n
QQ Number	[1-9]\d{4,}
Word Not Containing "abc"	\b((?!abc)\w)+\b
Match Leading and Trailing Whitespace Characters	^\s*|\s*$
Common Editing Patterns	The following are replacements for special Chinese text (editplus): ^[0-9].*\n ^[^第].*\n ^[Exercise].*\n ^[\s\S ]*\n ^[0-9]*\. ^[\s\S ]*\n <p[^<>]*> href="javascript:if\(confirm\('(.*?)'\)\)window\.location='(.*?)'" <span style=".[^"]*rgb\(255,255,255\)">.[^<>]*</span> <DIV class=xs0>[\s\S]*?</DIV>
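
A few of the patterns above can be exercised as follows. This is a minimal sketch in Python's `re` module, not a definitive validator: the constant names are hypothetical, the JavaScript-style `/…/` delimiters are dropped because Python does not use them, and the date pattern is written with `1[0-2]` and `[12][0-9]` on the assumption that October and days such as 10 and 20 should be accepted.

```python
import re

# Hypothetical names; each pattern is taken from the table without the /.../ delimiters.
USERNAME = re.compile(r"^[a-z0-9_-]{3,16}$")
HEX_VALUE = re.compile(r"^#?([a-f0-9]{6}|[a-f0-9]{3})$")
IP_ADDRESS = re.compile(
    r"^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$"
)
# Assumed widened form of the Year-Month-Day pattern (see the note above).
DATE_YMD = re.compile(r"(\d{4}|\d{2})-(0?[1-9]|1[0-2])-(0?[1-9]|[12][0-9]|3[01])")
TIME_24H = re.compile(r"((1|0?)[0-9]|2[0-3]):([0-5][0-9])")
NO_ABC_WORD = re.compile(r"\b((?!abc)\w)+\b")


def is_valid(pattern: re.Pattern, text: str) -> bool:
    """Return True when the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None


def words_without_abc(text: str) -> list:
    """Collect whole words that do not contain the substring "abc"."""
    return [m.group(0) for m in NO_ABC_WORD.finditer(text)]


if __name__ == "__main__":
    print(is_valid(USERNAME, "john_doe-99"))     # True
    print(is_valid(USERNAME, "John"))            # False: uppercase is not allowed
    print(is_valid(HEX_VALUE, "#1a2b3c"))        # True
    print(is_valid(IP_ADDRESS, "192.168.1.1"))   # True
    print(is_valid(IP_ADDRESS, "256.1.1.1"))     # False: 256 is out of range
    print(is_valid(DATE_YMD, "2024-10-20"))      # True
    print(is_valid(TIME_24H, "24:00"))           # False: hours stop at 23
    print(words_without_abc("cab abcde xabcx dog"))  # ['cab', 'dog']
```

Unanchored patterns such as the date and time entries match anywhere inside a longer string, which is why the sketch uses `fullmatch` when validating a complete value.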

Regular Expression Syntax

The Regular Expression Syntax page is a quick reference for commonly used regular expressions. Use it to look up regular expression syntax and common patterns, review basic and subexpression syntax, check regular expression modifiers, and distinguish greedy from non-greedy matching.