Common Regular Expressions, Regular Expression Numbers

We have collected regular expressions that come up frequently in program development, so you can reuse them quickly instead of writing them from scratch. The expressions below have been tested repeatedly and are updated over time. Because regular expression syntax varies slightly across programs and tools, adjust each pattern to your environment and check it against your own data before relying on it.
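
As one illustration of how engines differ, the following minimal Python sketch shows that `\d` in Python's `re` module matches any Unicode decimal digit by default, whereas with the `re.ASCII` flag it matches only `0`-`9` (as in JavaScript). The sample string is an assumption chosen for the demonstration.

```python
import re

# Arabic-Indic digits: decimal digits in Unicode, but outside the ASCII range.
text = "\u0661\u0662\u0663"

# Default str patterns in Python 3 use Unicode semantics for \d.
unicode_match = re.fullmatch(r"\d+", text)

# re.ASCII restricts \d to [0-9].
ascii_match = re.fullmatch(r"\d+", text, flags=re.ASCII)

print(unicode_match is not None)
print(ascii_match is not None)
```

When a pattern is used for validation, it is worth checking which of the two behaviors the target engine has.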

Description	Regular Expression
Uniform Resource Locator (URL)	[a-zA-Z]+://[^\s]*
IP Address	((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)
Email Address	\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
QQ Number	[1-9]\d{4,}
HTML Tag (Including Content or Self-Closing)	<(.*)(.*)>.*<\/\1>|<(.*) \/>
Password (Combination of Numbers, Uppercase Letters, Lowercase Letters, and Punctuation Marks; All Four Required; Minimum 8 Characters)	(?=^.{8,}$)(?=.*\d)(?=.*\W+)(?=.*[A-Z])(?=.*[a-z])(?!.*\n).*$
Date (Year-Month-Day)	(\d{4}|\d{2})-((1[0-2])|(0?[1-9]))-(([12][0-9])|(3[01])|(0?[1-9]))
Date (Month/Day/Year)	((1[0-2])|(0?[1-9]))/(([12][0-9])|(3[01])|(0?[1-9]))/(\d{4}|\d{2})
Time (Hour:Minute, 24-Hour Format)	((1|0?)[0-9]|2[0-3]):([0-5][0-9])
Chinese Character	[\u4e00-\u9fa5]
Chinese and Full-Width Punctuation Characters	[\u3000-\u301e\ufe10-\ufe19\ufe30-\ufe44\ufe50-\ufe6b\uff01-\uffee]
Fixed-Line Telephone Number in Mainland China	(\d{4}-|\d{3}-)?(\d{8}|\d{7})
Mobile Phone Number in Mainland China	1\d{10}
Postal Code in Mainland China	[1-9]\d{5}
Identity Card Number in Mainland China (15 or 18 Digits)	\d{15}(\d\d[0-9xX])?
Non-Negative Integer (Positive Integer or Zero)	\d+
Positive Integer	[0-9]*[1-9][0-9]*
Negative Integer	-[0-9]*[1-9][0-9]*
Integer	-?\d+
Decimal Number	(-?\d+)(\.\d+)?
Word Not Containing "abc"	\b((?!abc)\w)+\b
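
The patterns in this table are unanchored, so they will also match inside longer strings. A minimal Python sketch of using a few of them for validation, assuming the whole input must match (hence `re.fullmatch`); the constant names and the `is_valid` helper are hypothetical, introduced only for this example:

```python
import re

# Patterns copied from the table, written as Python raw strings.
IPV4 = r"((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)"
TIME_24H = r"((1|0?)[0-9]|2[0-3]):([0-5][0-9])"
DATE_YMD = r"(\d{4}|\d{2})-((1[0-2])|(0?[1-9]))-(([12][0-9])|(3[01])|(0?[1-9]))"


def is_valid(pattern: str, text: str) -> bool:
    """Return True only if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


print(is_valid(IPV4, "192.168.0.1"))    # True
print(is_valid(IPV4, "256.1.1.1"))      # False: 256 is out of range
print(is_valid(TIME_24H, "23:59"))      # True
print(is_valid(TIME_24H, "24:00"))      # False
print(is_valid(DATE_YMD, "2024-13-01")) # False: there is no month 13
print(is_valid(DATE_YMD, "2024-02-30")) # True: only the format is checked
```

The last line shows a limit of this approach: the date pattern checks the shape of the input, not whether the day exists in that month, so a calendar check is still needed where it matters.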

Regular expressions are practical and efficient tools used for string processing, form validation, and other applications. Here, we have collected some commonly used expressions for your reference.

Description	Regular Expression
Username	/^[a-z0-9_-]{3,16}$/
Password	/^[a-z0-9_-]{6,18}$/
Hexadecimal Value	/^#?([a-f0-9]{6}|[a-f0-9]{3})$/
Email	/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
URL	/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
IP Address	/^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/
HTML Tag	/^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/
Unicode Range for Chinese Characters	/^[\u4e00-\u9fa5]{0,}$/
Regex for Matching Chinese Characters	[\u4e00-\u9fa5]
Comment: Matching Chinese characters can be tricky, but with this expression, it becomes much easier.
Match double-byte characters (including Chinese characters)	[^\x00-\xff]
Comment: Can be used to calculate the length of a string (a double-byte character counts as 2, ASCII characters count as 1)
Regular expression for matching blank lines	\n\s*\r
Comment: Can be used to delete blank lines
Regular expression for matching HTML tags	<(\S*?)[^>]*>.*?</\1>|<.*? />
Comment: Many versions found online are unreliable. The pattern above handles only simple cases and cannot cope with complex nested tags
Regular expression for matching leading and trailing whitespace characters	^\s*|\s*$
Comment: Can be used to delete leading and trailing whitespace characters (including spaces, tabs, newline characters, etc.), a very useful expression
Regular expression for matching Email addresses	\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
Comment: Very useful for form validation
Regular expression for matching URLs	[a-zA-Z]+://[^\s]*
Comment: The versions available online have limited functionality; the above can basically meet the requirements
Match if the account is legal (starts with a letter, allows 5-16 bytes, allows letters, digits, and underscores)	^[a-zA-Z][a-zA-Z0-9_]{4,15}$
Comment: Very useful for form validation
Match domestic phone numbers in China	\d{3}-\d{8}|\d{4}-\d{7}
Comment: Matches formats like 0511-4405222 or 021-87888822
Match Tencent QQ numbers	[1-9][0-9]{4,}
Comment: Tencent QQ numbers start from 10000
Match Chinese mainland postal codes	[1-9]\d{5}(?!\d)
Comment: Chinese mainland postal codes are 6 digits
Match ID card	\d{15}|\d{18}
Comment: Chinese mainland ID cards are either 15 or 18 digits
Match IP address	\d+\.\d+\.\d+\.\d+
Comment: Useful when extracting IP addresses
Match specific numbers:
^[1-9]\d*$	//Match positive integers
^-[1-9]\d*$	//Match negative integers
^(0|-?[1-9]\d*)$	//Match integers
^([1-9]\d*|0)$	//Match non-negative integers (positive integers + 0)
^(-[1-9]\d*|0)$	//Match non-positive integers (negative integers + 0)
^([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$	//Match positive floating-point numbers
^-([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$	//Match negative floating-point numbers
^-?([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$	//Match floating-point numbers
^([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$	//Match non-negative floating-point numbers (positive floating-point numbers + 0)
^(-([1-9]\d*\.\d*|0\.\d*[1-9]\d*)|0?\.0+|0)$	//Match non-positive floating-point numbers (negative floating-point numbers + 0)
Comment: Useful when processing large amounts of data, make adjustments as needed for specific applications
Match specific strings:
^[A-Za-z]+$	//Match strings composed of 26 English letters
^[A-Z]+$	//Match strings composed of uppercase English letters
^[a-z]+$	//Match strings composed of lowercase English letters
^[A-Za-z0-9]+$	//Match strings composed of digits and 26 English letters
^\w+$	//Match strings composed of digits, 26 English letters, or underscores
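
The comments in this list describe several uses: trimming whitespace, counting double-byte characters as 2, and validating form fields. A minimal Python sketch of those uses follows. The helper names `trim` and `display_width` and the constant names are hypothetical, and the whitespace pattern is written here with `+` rather than `*` so that it never produces empty matches.

```python
import re

ACCOUNT = re.compile(r"^[a-zA-Z][a-zA-Z0-9_]{4,15}$")
POSITIVE_INT = re.compile(r"^[1-9]\d*$")
DOUBLE_BYTE = re.compile(r"[^\x00-\xff]")
EDGE_WHITESPACE = re.compile(r"^\s+|\s+$")


def trim(text: str) -> str:
    """Delete leading and trailing whitespace (spaces, tabs, newlines)."""
    return EDGE_WHITESPACE.sub("", text)


def display_width(text: str) -> int:
    """Count characters outside \\x00-\\xff as 2 and all others as 1."""
    return len(text) + len(DOUBLE_BYTE.findall(text))


print(repr(trim("  \thello world \n")))        # 'hello world'
print(display_width("abc\u4e2d\u6587"))       # 3 + 2 * 2 = 7
# fullmatch rejects a trailing newline that "$" alone would tolerate in Python.
print(ACCOUNT.fullmatch("user_01") is not None)    # True
print(ACCOUNT.fullmatch("1user") is not None)      # False: must start with a letter
print(POSITIVE_INT.fullmatch("42") is not None)    # True
print(POSITIVE_INT.fullmatch("007") is not None)   # False: leading zeros
```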

Complete Set of Regular Expressions: Regular expressions come in various flavors. The following table lists the common metacharacters and their behavior. The descriptions refer to the RegExp object used by JScript and VBScript; most of them also hold in PCRE (Perl Compatible Regular Expressions), but consult your engine's documentation for the differences.

Character	Description
\	Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, "n" matches the character "n". "\n" matches a newline. The sequence "\\" matches "\" and "\(" matches "(".
^	Matches the position at the start of the input string. If the RegExp object's Multiline property is set, ^ also matches positions after "\n" or "\r".
$	Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches positions before "\n" or "\r".
*	Matches the preceding subexpression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}.
+	Matches the preceding subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?	Matches the preceding subexpression zero or one time. For example, "do(es)?" can match the "do" in "do" or "does". ? is equivalent to {0,1}.
{n}	n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob" but matches the two o's in "food".
{n,}	n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob" but matches all o's in "foooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*".
{n,m}	m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?". Note that there cannot be a space between the comma and the numbers.
?	When this character follows any of the other quantifiers (*, +, ?, {n}, {n,}, {n,m}), the matching behavior is non-greedy. The non-greedy mode matches as few characters as possible, whereas the default greedy mode matches as many characters as possible. For example, with the string "oooo", "o+?" will match a single "o", whereas "o+" will match all "o"s.
.	Matches any single character except "\n". To match any character including "\n", use a pattern like "(.|\n)" or "[\s\S]".
(pattern)	Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection: in VBScript through the SubMatches collection, and in JScript through the $0…$9 properties. To match parenthesis characters, use "\(" or "\)".
(?:pattern)	Matches pattern but does not capture the match, meaning it is a non-capturing match that is not stored for later use. This is useful when using the alternation character “|” to combine parts of a pattern. For example, “industr(?:y|ies)” is a more concise expression than “industry|industries”.
(?=pattern)	Positive lookahead. Matches the search string at any position where a string matching pattern begins. This is a non-capturing match, meaning the match is not stored for later use. For example, “Windows(?=95|98|NT|2000)” matches “Windows” in “Windows2000” but not in “Windows3.1”. Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that make up the lookahead.
(?!pattern)	Negative lookahead. Matches the search string at any position where a string not matching pattern begins. This is a non-capturing match, meaning the match is not stored for later use. For example, “Windows(?!95|98|NT|2000)” matches “Windows” in “Windows3.1” but not in “Windows2000”. Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that make up the lookahead.
x|y	Matches x or y. For example, “z|food” matches “z” or “food”. “(z|f)ood” matches “zood” or “food”.
[xyz]	Character class. Matches any one of the enclosed characters. For example, “[abc]” matches “a” in “plain”.
[^xyz]	Negated character class. Matches any character not enclosed. For example, “[^abc]” matches “p” in “plain”.
[a-z]	Character range. Matches any character within the specified range. For example, “[a-z]” matches any lowercase letter from “a” to “z”.
[^a-z]	Negated character range. Matches any character not within the specified range. For example, “[^a-z]” matches any character not from “a” to “z”.
\b	Matches a word boundary, which is the position between a word and a space. For example, “er\b” matches “er” in “never” but not in “verb”.
\B	Matches a non-word boundary. “er\B” matches “er” in “verb” but not in “never”.
\cx	Matches a control character indicated by x. For example, \cM matches a Control-M or carriage return. The value of x must be A-Z or a-z. Otherwise, c is treated as a literal “c” character.
\d	Matches a digit character. Equivalent to [0-9].
\D	Matches a non-digit character. Equivalent to [^0-9].
\f	Matches a form feed. Equivalent to \x0c and \cL.
\n	Matches a newline character. Equivalent to \x0a and \cJ.
\r	Matches a carriage return. Equivalent to \x0d and \cM.
\s	Matches any whitespace character, including space, tab, form feed, etc. Equivalent to [ \f\n\r\t\v].
\S	Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t	Matches a tab character. Equivalent to \x09 and \cI.
\v	Matches a vertical tab character. Equivalent to \x0b and \cK.
\w	Matches any word character including underscore. Equivalent to “[A-Za-z0-9_]”.
\W	Matches any non-word character. Equivalent to “[^A-Za-z0-9_]”.
\xn	Matches n, where n is a hexadecimal escape value. The hexadecimal escape value must be exactly two digits long. For example, “\x41” matches “A”. “\x041” is equivalent to “\x04” followed by the literal “1”. ASCII codes can be used in regular expressions.
\num	Matches num, where num is a positive integer. This is a reference to a previously captured match. For example, “(.)\1” matches two consecutive identical characters.
\n	Denotes an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, then n is a backreference. Otherwise, if n is an octal digit (0-7), then n is an octal escape value.
\nm	Denotes an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, then nm is a backreference. If \nm is preceded by at least n captures, then n is a backreference followed by the literal m. If neither of the previous conditions is met, and if n and m are octal digits (0-7), then \nm matches the octal escape value nm.
\nml	If n is an octal digit (0-3) and m and l are octal digits (0-7), then matches the octal escape value nml.
\un	Matches n, where n is a Unicode character represented by four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
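
Several of the behaviors described in the table can be checked directly. The following is a minimal sketch using Python's `re` module, which shares these constructs; the variable names are illustrative, and other engines may differ in details.

```python
import re

# Greedy vs. non-greedy quantifiers on the string "oooo".
greedy = re.search(r"o+", "oooo").group()
lazy = re.search(r"o+?", "oooo").group()

# "." does not match "\n" by default; "[\s\S]" matches any character.
dot_only = re.search(r".+", "a\nb").group()
any_char = re.search(r"[\s\S]+", "a\nb").group()

# Positive and negative lookahead: the version suffix is checked, not consumed.
positive = re.search(r"Windows(?=95|98|NT|2000)", "Windows2000")
positive_miss = re.search(r"Windows(?=95|98|NT|2000)", "Windows3.1")
negative = re.search(r"Windows(?!95|98|NT|2000)", "Windows3.1")

# A non-capturing group leaves no captured submatch behind.
plural = re.fullmatch(r"industr(?:y|ies)", "industries")

# Backreference: "(.)\1" finds two consecutive identical characters.
doubled = re.search(r"(.)\1", "food").group()

# Word boundary: "er\b" matches in "never" but not in "verb".
boundary_starts = [m.start() for m in re.finditer(r"er\b", "never verb")]

print(greedy, lazy)                     # oooo o
print(positive.group(), positive.end()) # Windows 7
print(plural.groups())                  # ()
print(doubled, boundary_starts)         # oo [3]
```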