Lookahead and lookbehind

(?<=<b>)\w+(?=</b>)

  • Regex options: Case insensitive
  • Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby 1.9

JavaScript (before ES2018) and Ruby 1.8 support the lookahead (?=</b>), but not the lookbehind (?<=<b>).

  • I think of lookbehind as checking the text before (to the left of) the cursor.
  • I think of lookahead as checking the text after (to the right of) the cursor.
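
A minimal sketch of the regex above in JavaScript. Lookbehind was added to JavaScript in ES2018, so this assumes a recent engine (for example a current Node.js); the sample string and variable names are made up for illustration.

```javascript
// Assumes an ES2018+ engine; older engines throw a SyntaxError on (?<=...).
const boldWord = /(?<=<b>)\w+(?=<\/b>)/i;
const sample = "My <b>cat</b> is furry";

const found = boldWord.exec(sample);
// found[0] holds only the word: the tags are checked but are not part of the match.
console.log(found[0], found.index);
```

The match starts right after the opening tag and stops right before the closing one.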

Positive lookaround

  • Essentially, lookaround checks whether certain text can be
    matched without actually matching it.

  • Lookbehind (?<=…) is the only regular expression construct that
    will traverse the text from right to left instead of from left to right.

  • Because lookaround consumes none of the text it checks, lookaround
    constructs are called zero-length assertions.
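
The zero-length behaviour can be observed directly. A small JavaScript sketch (the sample text and variable names are illustrative assumptions):

```javascript
const text = "My <b>cat</b> is furry";

// A bare lookahead succeeds at a position but consumes nothing:
// the match is the empty string located just before "</b>".
const bare = /(?=<\/b>)/.exec(text);
console.log(bare[0].length, bare.index);

// With the g flag, lastIndex shows where the match ended: right after "cat",
// so the closing tag is still available to whatever is matched next.
const wordRe = /\w+(?=<\/b>)/g;
const word = wordRe.exec(text);
console.log(word[0], wordRe.lastIndex);
```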

Negative lookaround

  • (?!...) and (?<!...), with an exclamation point instead of an
    equals sign, are negative lookahead and negative lookbehind.

  • Negative lookaround matches when the regex inside the
    lookaround fails to match.
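
A short JavaScript sketch of both negative forms. The patterns and test strings are invented for illustration, and the lookbehind part assumes an ES2018+ engine.

```javascript
// Negative lookahead: a "q" that is not followed by "u".
const qNoU = /q(?!u)/;
console.log(qNoU.test("Iraq"), qNoU.test("quit"));

// Negative lookbehind: a run of digits not preceded by "$" or by another digit.
// Excluding a preceding digit stops the regex from matching the "0" of "$30".
const plainNumber = /(?<![$\d])\d+/g;
const numbers = "cost $30 for 12 items".match(plainNumber);
console.log(numbers);
```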

Different levels of lookbehind

  • Lookahead is fully compatible across flavors: any regex can go inside it,
    even lookahead or lookbehind nested in the lookahead.

  • Lookbehind is different, because regex engines are designed to traverse
    the text from left to right, but lookbehind needs to go from right to left.


  1. Perl and Python still require lookbehind to have a fixed length.

  2. PCRE and Ruby 1.9 allow alternatives of different lengths inside lookbehind,
    as long as each alternative has a fixed length.
    (Does Notepad++ use the PCRE 7.2 regular expression engine?)

  3. Java takes lookbehind one step further and allows any finite-length
    regular expression inside lookbehind: anything except the unbounded
    quantifiers ‹*›, ‹+›, and ‹{42,}›.

  4. Of these flavors, the .NET Framework is the only one that can actually
    apply a full regular expression from right to left, so it accepts any
    regex inside lookbehind.
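
Which level a given engine reaches can be probed by trying to compile one pattern per level. The sketch below does this in JavaScript; the helper name `compiles` and the four probe patterns are assumptions made for illustration. JavaScript is not in the list above, but engines that implement ES2018 lookbehind are expected to accept all four.

```javascript
// Hypothetical helper: returns true if this engine can compile the pattern.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (err) {
    return false; // typically a SyntaxError for unsupported syntax
  }
}

// One probe per level of lookbehind support described above.
const levels = {
  fixedLength: "(?<=ab)c",
  alternativesOfDifferentLength: "(?<=a|bc)d",
  boundedQuantifier: "(?<=a{1,3})b",
  unboundedQuantifier: "(?<=a+)b",
};

for (const [level, source] of Object.entries(levels)) {
  console.log(level, compiles(source));
}
```

The same idea carries over to other flavors: compile each probe and catch the flavor's own syntax error.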


Lookaround is atomic

(?=(\d+))\w+\1

  • Regex options: None
  • Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

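A capturing group inside lookaround works like any other group:
groups are numbered by their opening parentheses, outer before inner, left to right.
Lookaround is atomic: once it has matched, the engine will not backtrack
into it, so in the regex above (\d+) is never retried with fewer digits.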
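
The effect of atomic lookaround shows up when the regex is run against two made-up subjects (the strings and variable names are illustrative assumptions):

```javascript
const repeatedDigits = /(?=(\d+))\w+\1/;

// At the start of "123x12" the lookahead captures "123". Because lookaround is
// atomic, the group is not retried as "12", so \1 can never be satisfied.
const miss = repeatedDigits.exec("123x12");
console.log(miss);

// In "456x56" the attempt at the second character captures "56",
// \w+ backtracks to "56x", and \1 matches the final "56".
const hit = repeatedDigits.exec("456x56");
console.log(hit[0], hit[1]);
```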

Alternative to Lookbehind

<b>\K\w+(?=</b>)

  • Regex options: Case insensitive
  • Regex flavors: PCRE 7.2, Perl 5.10

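With \K, the text matched before it is kept out of the overall match.
The pattern is still matched as one unit from left to right; the part before \K is
consumed like any other text, so the next match attempt starts after the end of the
previous match instead of stepping back into it.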

For example:
when (?<=a)a is applied to the string ‘aaaa’, it matches three times:
the 2nd, 3rd, and 4th a. At each position the lookbehind only checks the a to the left without consuming it.

But a\Ka matches only twice: the 2nd and the 4th a.
The first attempt consumes the 1st and 2nd a, drops the 1st, and reports the 2nd.
The next attempt starts after that, consumes the 3rd and 4th a, and drops the 3rd.
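
JavaScript has no \K, so the sketch below only emulates a\Ka with a capturing group, which is enough to reproduce the difference in match positions. It assumes an ES2018+ engine for the lookbehind and ES2020 for matchAll; positions are zero-based, so index 1 is the 2nd a.

```javascript
const input = "aaaa";

// (?<=a)a : the lookbehind consumes nothing, so matches can sit side by side.
const viaLookbehind = [...input.matchAll(/(?<=a)a/g)].map((m) => m.index);

// Emulation of a\Ka : the first "a" is consumed, then dropped from the result
// by reporting only group 1, which starts one character after m.index.
const viaKeepOut = [...input.matchAll(/a(a)/g)].map((m) => m.index + 1);

console.log(viaLookbehind, viaKeepOut);
```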

Solution Without Lookbehind

In Ruby 1.8 or JavaScript (before ES2018), lookbehind is not available.
Solution:

  • Match the surrounding text with an ordinary regex, put the parts in groups,
    and pick only the group you want.
  • If a replace operation is needed, capture the part you don’t want changed
    in a group and reinsert it in the replacement text with \1 or \kxxx.
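
Both bullets can be sketched in JavaScript as follows. The sample text and the replacement word are invented for illustration; note that JavaScript writes the group reference in replacement text as $1.

```javascript
const html = "My <b>cat</b> is furry";

// Extraction: match the tags as well, then pick the group that holds the word.
const tagged = /<b>(\w+)<\/b>/i.exec(html);
console.log(tagged[1]);

// Replacement: capture the opening tag and put it back with $1,
// so only the word between the tags changes.
const replaced = html.replace(/(<b>)\w+(?=<\/b>)/i, "$1dog");
console.log(replaced);
```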

Simulate lookbehind

var mainregexp = /\w+(?=<\/b>)/;
var lookbehind = /<b>$/;
// Declare match explicitly instead of creating an implicit global
var match = mainregexp.exec("My <b>cat</b> is furry");
if (match) {
  // Found a word before a closing tag </b>
  var potentialmatch = match[0];
  // Everything to the left of the candidate match
  var leftContext = match.input.substring(0, match.index);
  if (lookbehind.exec(leftContext)) {
    // Lookbehind matched:
    // potentialmatch occurs between a pair of <b> tags
  } else {
    // Lookbehind failed: potentialmatch is no good
  }
} else {
  // Unable to find a word before a closing tag </b>
}

  • First, find the candidate match with the main regex (including its lookahead) and cut off the text to its left.
  • Second, if that left context ends with what the lookbehind expects, here <b>, the lookbehind has matched.
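
The two steps above check only the first candidate. A possible extension, sketched under the assumption that every candidate in the subject should be checked, wraps them in a loop. The function name `matchWithSimulatedLookbehind` and the sample subject are hypothetical.

```javascript
// Hypothetical helper: collects every match of mainRe whose left context
// satisfies behindRe. mainRe must have the g flag; behindRe should end in $
// and must not have the g flag.
function matchWithSimulatedLookbehind(mainRe, behindRe, subject) {
  const results = [];
  mainRe.lastIndex = 0;
  let m;
  while ((m = mainRe.exec(subject)) !== null) {
    // Step 1: cut off the text to the left of the candidate.
    const leftContext = subject.substring(0, m.index);
    // Step 2: the candidate counts only if the left context ends as expected.
    if (behindRe.test(leftContext)) {
      results.push(m[0]);
    }
    // Guard against an infinite loop on zero-length matches.
    if (m[0].length === 0) mainRe.lastIndex++;
  }
  return results;
}

const words = matchWithSimulatedLookbehind(
  /\w+(?=<\/b>)/g,
  /<b>$/,
  "<i>dog</b> and <b>cat</b>"
);
console.log(words);
```

Unlike a true lookbehind, a rejected candidate is skipped whole here, so a match that would start inside it is not retried.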