Lazy Greedy Selfish

Repeat Certain Number

Hexadecimal number

\b[a-f0-9]{1,8}\b

  • Regex options: Case insensitive
  • Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Floating-point number

\d*\.\d+(e\d+)?

  • Regex options: Case insensitive
  • Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Lazy quantifiers

1
2
3
4
*?
+?
??
{7,42}?
  1. any times, one after another, the less the better
  2. more than one time, one after another, the less the better
  3. have or not have, not have first
  4. seven to fourtytwo times, the less the better

In default repeat like * + ? {} after match unit are all greedy[greedy is not selfish]

Possessive quantifiers

Possessive quantifiers is greedy and selfish
It’s greedy and selfish, when it matches, it wouldn’t care other expressions’ express.
In other words, it wouldn’t Backtracking when matches.

1
2
3
4
*+
++
?+
{7,42}+
  1. any times, block reduce to match, the more the better, no backtracking
  2. more than one time, the more the better, no backtracking
  3. have or not have, the more the better, no backtracking
  4. seven to fourtytwo times, the more the better, no backtracking

\b\d++\b

  • Regex options: None
  • Regex flavors: Java, PCRE, Perl 5.10, Ruby 1.9

\b(?>\d+)\b

  • Regex options: None
  • Regex flavors: .NET, Java, PCRE, Perl, Ruby

ps. (?>\d+) is an atomic group that essentially is a noncapturing group,
with the extra job of refusing to backtrack.

Example

\w++\d++ can’t mathes “123abc”
Because \w++ selfishly occupy “123abc”, don’t care \d++ failed to match.
When use (\w+)(\d+) to match “123abc”, i found gourp \1 matches “12”
and group \2 matches “3”, so greedy works.
It’s not mean selfish is bad, but only unusual work in simply task.

Prevent Runaway Repetition

Why prevent? because in some complex condition, expression will complex the same.
In these condition Possessive quantifiers is necessary.

Problem

Match a html’s origin code
Solution

1
2
<html>(?>.*?<head>)(?>.*?<title>)(?>.*?</title>)↵
(?>.*?</head>)(?>.*?<body[^>]*>)(?>.*?</body>).*?</html>

  • Regex options: Case insensitive, dot matches line breaks
  • Regex flavors: .NET, Java, PCRE, Perl, Ruby

ps.If some tag properties need be matched, please modify expression.

This regular expression has a worst-case complexity$^3$ of $O(n^7)$
If the file is twice the size, the regex can need up to 128 times
as many steps to figure out it doesn’t match.

Example

(x+x+)+y to matches “xxxxxxxxxx”
It should not matches, watching the operation progress.
This regex is $O(2^n)$, if one more ‘x’ is added, the regex can need up to 2times