Match "start" & "end"

Problem

alpha…..↵
alpha…..↵
begin…..↵
…….end↵
….omega↵
….omega↵

  • Match 'alpha' only at the very beginning of the subject text
  • Match 'alpha' at the start of each line (two matches)
  • Match 'omega' only at the very end of the subject text
  • Match 'omega' at the end of each line (two matches)
  • Match 'begin' at the start of a line
  • Match 'end' at the end of a line

Solution

Start of the subject

^alpha

  • Regex options: None (“^ and $ match at line breaks” must not be set; otherwise ‹^alpha› matches both 'alpha' lines)
  • Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python
\Aalpha
  • Regex options: None
  • Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

End of the subject

omega$
  • Regex options: None (“^ and $ match at line breaks” must not be set; otherwise ‹omega$› matches both 'omega' lines)
  • Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python
omega\Z
  • Regex options: None
  • Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

Start of a line

^begin
  • Regex options: ^ and $ match at line breaks
  • Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

End of a line

end$
  • Regex options: ^ and $ match at line breaks
  • Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
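
The six patterns above can be tried out with a short Python sketch. The SUBJECT string and the count helper are made up for illustration: the string is modelled on the Problem text and is assumed to have no trailing line break. re.MULTILINE is Python's name for the “^ and $ match at line breaks” option.

```python
import re

# Hypothetical subject text modelled on the Problem section
# (assumption: no line break after the last line).
SUBJECT = "\n".join([
    "alpha.....",
    "alpha.....",
    "begin.....",
    ".......end",
    "....omega",
    "....omega",
])


def count(pattern, flags=0):
    """Return how many times pattern matches in SUBJECT."""
    return len(re.findall(pattern, SUBJECT, flags))


print(count(r"^alpha"))                 # 1: start of the subject only
print(count(r"^alpha", re.MULTILINE))   # 2: start of every line
print(count(r"\Aalpha", re.MULTILINE))  # 1: \A ignores the multiline option
print(count(r"omega$"))                 # 1: end of the subject only
print(count(r"omega$", re.MULTILINE))   # 2: end of every line
print(count(r"omega\Z", re.MULTILINE))  # 1: \Z ignores the multiline option
print(count(r"^begin"))                 # 0: 'begin' is not at the start of the subject
print(count(r"^begin", re.MULTILINE))   # 1
print(count(r"end$", re.MULTILINE))     # 1
```

Counting matches makes the effect of the option visible: the same ‹^alpha› goes from one match to two once line-break matching is switched on, while ‹\Aalpha› stays at one.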

Discussion

  • JavaScript does not support ‹\A›.
  • The anchor ‹^› is equivalent to ‹\A›, as long as you do not turn on the “^ and $ match at line breaks” option.
  • The anchor ‹$› is equivalent to ‹\Z›, as long as you do not turn on the “^ and $ match at line breaks” option.

    • In Java, this option is Pattern.MULTILINE, documented as:
      /**
      * Enables multiline mode.
      *
      * <p> In multiline mode the expressions <tt>^</tt> and <tt>$</tt> match
      * just after or just before, respectively, a line terminator or the end of
      * the input sequence. By default these expressions only match at the
      * beginning and the end of the entire input sequence.
      *
      * <p> Multiline mode can also be enabled via the embedded flag
      * expression&nbsp;<tt>(?m)</tt>. </p>
      */
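  • The anchors ‹\Z› and ‹\z› always match at the very end of the subject text, after the last character. They differ only when the subject text ends with a line break.
  • ‹\Z› also matches just before a final line break, so you can search for ‹omega\Z› without having to worry about stripping off a trailing line break at the end of your subject text.
  • ‹\Z›: a final \r\n, \r, or \n is ignored.
  • ‹\z›: a final \r\n, \r, or \n is not ignored; the match must be at the very end.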

  • JavaScript does not support ‹\Z› or ‹\z› at all
  • .NET, Java, PCRE, Perl, and Ruby support both ‹\Z› and ‹\z›.
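  • Python supports only ‹\Z›, and there it matches only at the very end of the subject text, the way ‹\z› does in the other flavors.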
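
The trailing-line-break behaviour can be checked with a small Python sketch. It uses a made-up one-line subject that ends in "\n". Note that it shows Python's own semantics: in Python's re module, ‹$› (without re.MULTILINE) still matches just before a final "\n", whereas ‹\Z› does not. The lookahead in the last pattern is only one possible way to imitate the ‹\Z› of the other flavors, not something taken from the notes above.

```python
import re

# Hypothetical subject that ends with a single line break.
text = "....omega\n"

# $ without re.MULTILINE matches at the very end and just before a final "\n".
dollar = re.search(r"omega$", text)

# Python's \Z matches only at the very end, so the trailing "\n" blocks it.
strict_end = re.search(r"omega\Z", text)

# Assumption: allowing one optional "\n" before \Z imitates the
# "ignore the final line break" behaviour described for \Z above.
lenient_end = re.search(r"omega(?=\n?\Z)", text)

print(dollar is not None)       # True
print(strict_end is not None)   # False
print(lenient_end is not None)  # True
```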

Variations

  • .NET, Java, XRegExp, PCRE, Perl, and Python: ‹(?m)› is the inline mode modifier for “^ and $ match at line breaks”.
  • Ruby uses ‹(?m)› to turn on “dot matches line breaks” mode.
  • In Ruby, ‹^› and ‹$› always match at the start and end of each line.
  • Use ‹(?-m)› to turn the option off.
  • ‹(?i)› turns on case-insensitive matching.
  • ‹(?s)› turns on “dot matches line breaks” (except in Ruby, which uses ‹(?m)› for this).
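
A minimal Python sketch of the inline modifiers listed above, using a made-up three-line string. Each modifier is placed at the very start of the pattern, which is where current Python versions require global inline flags to appear.

```python
import re

# Hypothetical three-line subject.
text = "alpha\nBEGIN\nomega"

# Without (?m), ^ matches only at the start of the subject.
first_only = re.findall(r"^\w+", text)

# (?m) makes ^ match at the start of every line.
every_line = re.findall(r"(?m)^\w+", text)

# (?i) ignores case, so lowercase 'begin' finds 'BEGIN'.
ignore_case = re.search(r"(?i)begin", text)

# By default the dot does not match "\n"; (?s) lets it cross the line break.
dot_default = re.search(r"alpha.BEGIN", text)
dot_all = re.search(r"(?s)alpha.BEGIN", text)

print(first_only)               # ['alpha']
print(every_line)               # ['alpha', 'BEGIN', 'omega']
print(ignore_case.group(0))     # BEGIN
print(dot_default is None)      # True
print(dot_all is not None)      # True
```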