raku.gg / regex

Controlling Backtracking

2026-04-03

Backtracking is both the power and the peril of regular expressions. Raku gives you explicit control over when and how the regex engine backtracks, letting you write patterns that are both correct and efficient.

What Is Backtracking?

When a regex engine tries to match a pattern and fails partway through, it backs up and tries alternative match lengths. This is backtracking:
# With backtracking (regex keyword)
"aardvark" ~~ regex { \w+ a };
# \w+ first grabs all of "aardvark"
# 'a' then has nothing left to match, so the engine backtracks:
#   \w+ = "aardvar" -> next char is 'k', not 'a'
#   \w+ = "aardva"  -> next char is 'r', not 'a'
#   \w+ = "aardv"   -> next char is 'a', which matches!
# The full match is "aardva"

The Three Declaration Types

Raku's three pattern declaration types differ in their backtracking behavior:
# regex -- full backtracking (traditional behavior)
my regex with-backtrack { \w+ '.' \w+ }

# token -- ratchet mode (no backtracking)
my token no-backtrack { \w+ '.' \w+ }

# rule -- ratchet mode + significant whitespace
my rule spaced { \w+ '.' \w+ }
Let us see the difference:
# Both succeed here, and no backtracking is even needed:
# '.' is not a word character, so \w+ stops at "hello" on its own,
# the literal '.' matches the dot, and the second \w+ matches "world"
say "hello.world" ~~ regex { \w+ '.' \w+ }; # hello.world
say "hello.world" ~~ token { \w+ '.' \w+ }; # hello.world
The difference shows with patterns where backtracking matters:
say "aaab" ~~ regex { a+ ab }; # "aaab" -- backtracks: a+ gives up one 'a'
say "aaab" ~~ token { a+ ab }; # Nil -- no backtrack: a+ eats every 'a', then 'ab' fails

The :ratchet Adverb

You can enable ratchet mode on any regex with the :ratchet adverb:
say "aaab" ~~ / :ratchet a+ ab /; # Nil -- same as token behavior
say "aaab" ~~ / a+ ab /;          # "aaab" -- default backtracking
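The adverb also has a documented short form, :r, which is convenient for quick one-off patterns:

```raku
# :r is the abbreviation for :ratchet
say "aaab" ~~ / :r a+ ab /; # Nil -- ratcheting, same as :ratchet above
```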

Why Disable Backtracking?

  1. Performance: Backtracking can cause exponential runtime on pathological inputs
  2. Predictability: Ratcheting makes match behavior easier to reason about
  3. Correctness for parsing: When parsing structured text, backtracking often produces wrong results
# Catastrophic backtracking with a traditional regex:
# my $evil = "a" x 30 ~ "c";     # no 'b', so the match must fail
# $evil ~~ / (a+)+ b /;          # exponentially slow: the engine tries
#                                # every way of splitting the a's before giving up
# With token (ratchet), the same failure is instant:
# $evil ~~ token { (a+)+ b };    # a+ eats everything once; no retries

Selective Backtracking with :!ratchet

Inside a token or rule, you can re-enable backtracking for specific portions:
token mostly-ratchet {
    <fixed-part>
    :!ratchet       # Enable backtracking from here
    \w+ '.' \w+
    :ratchet        # Disable it again
    <more-fixed>
}

The : (colon) Backtrack Control

Appending : to a quantifier makes that atom possessive: once it has matched, the engine will not backtrack into it to give characters back:
say "aaab" ~~ / a+  ab /; # "aaab" -- a+ backtracks and gives up one 'a'
say "aaab" ~~ / a+: ab /; # Nil -- a+: keeps every 'a' it matched
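The colon is one of the quantifier backtracking modifiers; Raku also offers ? for frugal (match as little as possible) matching, which rounds out the picture:

```raku
say "aaab" ~~ / a+? /; # "a"   -- frugal: matches as little as possible
say "aaab" ~~ / a+  /; # "aaa" -- greedy (the default): matches as much as possible
say "aaab" ~~ / a+: /; # "aaa" -- greedy and possessive: never gives any back
```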

token vs rule in Practice

The main difference between token and rule is whitespace handling:
token email-token { \w+ '@' \w+ '.' \w+ }
# Whitespace in the pattern is ignored entirely
# Matches: user@host.com

rule email-rule { \w+ '@' \w+ '.' \w+ }
# Each run of whitespace in the pattern becomes an implicit <.ws> call
# The default <ws> matches \s*, but insists on whitespace between two
# word characters -- so this rule matches both "user@host.com" and
# "user @ host . com". The distinction bites in patterns like
# rule { 'if' <identifier> }, where 'if' and the identifier are both
# word sequences and so must be separated by whitespace
Use token for patterns where whitespace is not expected between elements. Use rule for patterns where elements are separated by whitespace (like programming language statements).
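As a concrete sketch (kv-token and kv-rule are illustrative names), the same assignment pattern behaves differently under the two declarators:

```raku
my token kv-token { \w+ '=' \w+ }
my rule  kv-rule  { \w+ '=' \w+ }

say so "x=1"   ~~ /^ <kv-token> $/; # True  -- pattern whitespace is ignored
say so "x = 1" ~~ /^ <kv-token> $/; # False -- the token leaves no room for spaces
say so "x = 1" ~~ /^ <kv-rule> $/;  # True  -- rule whitespace becomes <.ws>
```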

Grammars: Choosing the Right Declaration

grammar ProgramLine {
    # rule for statements (whitespace between keywords and expressions)
    rule TOP { <statement> ';' }

    # rule: 'if', the condition, and '{' are separated by whitespace
    rule statement {
        'if' <condition> '{' <body> '}' | <assignment>
    }

    # token for identifiers (no whitespace inside a name)
    token identifier { <alpha> <alnum>* }

    # token for numbers (no whitespace inside a number)
    token number { '-'? \d+ ['.' \d+]? }

    # rule for assignments (spaces around =)
    rule assignment { <identifier> '=' <expression> }

    # rule for conditions (spaces around the comparator)
    rule condition { <expression> <comparator> <expression> }

    token comparator { '==' | '!=' | '<' | '>' }
    rule expression { <identifier> | <number> }
    rule body { <statement>* % ';' }
}
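Assuming the grammar compiles as sketched above, a quick parse shows how the pieces land in the match tree:

```raku
my $m = ProgramLine.parse("x = 42;");
say $m<statement><assignment><identifier>; # the identifier: x
say $m<statement><assignment><expression>; # the expression: 42
```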

Performance Patterns

Avoid Overlapping Alternatives

# Bad: overlapping alternatives cause backtracking
my regex slow { [ \w+ \s+ ]* \w+ ':' .* }

# Good: a specific pattern with no overlap
my token fast { [ \w+ ]+ % \s+ ':' \N* }

Anchor Your Patterns

# Without anchors, a failed match is retried at every start position
my token unanchored { 'ERROR' .* }

# With anchors, matching starts only at known positions
my token anchored { ^ 'ERROR' .* $ }

Use Possessive Matching

In patterns where you know backtracking will never help, make quantifiers possessive:
# Standard greedy matching (will backtrack if needed)
/ \w+ /

# The token keyword makes everything inside ratcheting (possessive)
token { \w+ }

# You can think of token as making all quantifiers possessive

Debugging Backtracking

Use Grammar::Tracer to visualize what the engine tries:
use Grammar::Tracer;

grammar Test {
    regex TOP { <a> <b> }
    regex a { \w+ }
    token b { 'end' }
}

Test.parse("helloend");
# The tracer shows how <a> backtracks to let <b> match
Understanding backtracking control is essential for writing efficient parsers. As a rule of thumb: use token by default, rule when you want free whitespace handling, and regex only when you specifically need backtracking.