Controlling Backtracking
2026-04-03
Backtracking is both the power and the peril of regular expressions. Raku gives you explicit control over when and how the regex engine backtracks, letting you write patterns that are both correct and efficient.
What Is Backtracking?
When a regex engine tries to match a pattern and fails partway through, it backs up and tries alternative match lengths. This is backtracking:
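# \w+ first takes all of "aardvark", then backs up to "aardv" so the final a can match
say "aardvark" ~~ regex { \w+ a };   # ｢aardva｣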
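To see how far the greedy quantifier had to retreat, one can capture it. This is a minimal sketch; the capture group and the variable name `$m` are additions for illustration and are not part of the example above.

```raku
# \w+ initially grabs the whole word, then gives characters back
# one at a time until a literal 'a' can follow it.
my $m = "aardvark" ~~ / (\w+) a /;

say ~$m;      # aardva  (the overall match)
say ~$m[0];   # aardv   (what \w+ kept after backtracking)
```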
The Three Declaration Types
Raku's three pattern declaration types differ in their backtracking behavior:
my regex with-backtrack { \w+ '.' \w+ }   # backtracks
my token no-backtrack { \w+ '.' \w+ }     # implies :ratchet (no backtracking)
my rule spaced { \w+ '.' \w+ }            # implies :ratchet and :sigspace
On input that needs no backtracking, regex and token give the same result:
say "hello.world" ~~ regex { \w+ '.' \w+ };   # ｢hello.world｣
say "hello.world" ~~ token { \w+ '.' \w+ };   # ｢hello.world｣
The difference shows with patterns where backtracking matters:
say "aaab" ~~ regex { a+ ab };   # ｢aaab｣: a+ gives back one a so that ab can match
say "aaab" ~~ token { a+ ab };   # Nil: a+ keeps every a, so ab can never match
The :ratchet Adverb
You can enable ratchet mode on any regex with the :ratchet adverb:
say "aaab" ~~ / :ratchet a+ ab /;   # Nil (same behavior as a token)
say "aaab" ~~ / a+ ab /;            # ｢aaab｣
Why Disable Backtracking?
- Performance: Backtracking can cause exponential runtime on pathological inputs
- Predictability: Ratcheting makes match behavior easier to reason about
- Correctness for parsing: When parsing structured text, backtracking can make a pattern succeed by re-splitting the input in a way the grammar never intended
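The performance point can be observed with a deliberately pathological pattern. The sketch below is an illustration only: the nested quantifier `[a+]+ b`, the input length of 16, and the variable names are all assumptions chosen for the demonstration, and the measured times will vary by machine and Rakudo version.

```raku
# No 'b' appears in the input, so both patterns must fail.
# The regex tries every way of splitting the a's between the inner
# and outer quantifier before giving up; the token gives up at once.
my $input = 'a' x 16 ~ 'c';

my $t0 = now;
my $slow = $input ~~ / ^ [a+]+ b /;        # backtracking enabled
my $slow-secs = now - $t0;

$t0 = now;
my $fast = $input ~~ / :ratchet ^ [a+]+ b /;   # backtracking disabled
my $fast-secs = now - $t0;

say "backtracking: matched={so $slow}, seconds={$slow-secs}";
say "ratchet:      matched={so $fast}, seconds={$fast-secs}";
```

Each extra `a` in the input roughly doubles the number of splits the backtracking version has to try, while the ratcheted version stays linear.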
Selective Backtracking with :!ratchet
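Inside a token or rule, you can re-enable backtracking for specific portions with :!ratchet. Like other in-pattern adverbs, it stays in effect until the end of the enclosing group or until it is overridden:
token mostly-ratchet {
    <fixed-part>     # ratcheted (the token default)
    :!ratchet        # backtracking allowed from here on
    \w+ '.' \w+
    :ratchet         # back to ratchet mode
    <more-fixed>
}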
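A small runnable comparison may make the effect concrete. The three token names below are hypothetical, and the third variant uses the per-quantifier `!` modifier, which is a related mechanism not shown in the snippet above; treat this as a sketch rather than the only way to do it.

```raku
my token strict-token { a+ ab }              # ratchet on: a+ never gives back an 'a'
my token adverb-token { :!ratchet a+ ab }    # backtracking re-enabled for the rest of the token
my token bang-token   { a+! ab }             # backtracking re-enabled for this one quantifier

say so "aaab" ~~ / ^ <strict-token> $ /;   # False
say so "aaab" ~~ / ^ <adverb-token> $ /;   # True
say so "aaab" ~~ / ^ <bang-token> $ /;     # True
```

The quantifier-level form keeps the rest of the token ratcheted, which is usually the narrower and safer choice.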
The : (colon) Backtrack Control
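Inside a pattern, a : placed directly after a quantifier acts as a commit point for that quantifier. Once the quantified atom has matched, the engine will not backtrack into it:
say "foobar" ~~ / \w+ bar /;    # ｢foobar｣: \w+ gives back "bar"
say "foobar" ~~ / \w+: bar /;   # Nil: \w+: keeps everything it matched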
token vs rule in Practice
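The only difference between token and rule is whitespace handling: a rule also turns on :sigspace, so whitespace after an atom in the pattern becomes an implicit <.ws> call:
my token email-token { \w+ '@' \w+ '.' \w+ }   # no whitespace allowed between the parts
my rule email-rule { \w+ '@' \w+ '.' \w+ }     # whitespace allowed between the parts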
Use token for patterns where whitespace is not expected between elements. Use rule for patterns where elements are separated by whitespace (like programming language statements).
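The whitespace difference is easy to check with a key/value pattern. The names `kv-token` and `kv-rule` are hypothetical and chosen only for this sketch.

```raku
my token kv-token { \w+ '=' \d+ }   # pattern whitespace is ignored
my rule  kv-rule  { \w+ '=' \d+ }   # pattern whitespace becomes <.ws>

say so "x = 1" ~~ / ^ <kv-token> $ /;   # False: the spaces in the input are not allowed
say so "x = 1" ~~ / ^ <kv-rule> $ /;    # True
say so "x=1"   ~~ / ^ <kv-rule> $ /;    # True: <.ws> may match nothing between non-word characters
```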
Grammars: Choosing the Right Declaration
grammar ProgramLine {
    rule  TOP        { <statement> ';' }
    rule  statement  { 'if' <condition> '{' <body> '}' | <assignment> }
    token identifier { <alpha> <alnum>* }
    token number     { '-'? \d+ ['.' \d+]? }
    rule  assignment { <identifier> '=' <expression> }
    # rule, not token: whitespace around the comparator must be allowed
    rule  condition  { <expression> <comparator> <expression> }
    token comparator { '==' | '!=' | '<' | '>' }
    rule  expression { <identifier> | <number> }
    rule  body       { <statement>* % ';' }
}
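The same division of labor can be tried on a much smaller grammar. The `Assign` grammar below is a hypothetical, cut-down sketch written for this illustration: lexical pieces are tokens, and the one production that combines them is a rule.

```raku
grammar Assign {
    rule  TOP        { <identifier> '=' <number> ';' }   # whitespace between parts is fine
    token identifier { <alpha> <alnum>* }                # no whitespace inside a name
    token number     { '-'? \d+ [ '.' \d+ ]? }           # no whitespace inside a number
}

my $m = Assign.parse("total = -3.5;");
say ~$m<identifier>;                        # total
say ~$m<number>;                            # -3.5
say so Assign.parse("to tal = 1;");         # False: an identifier cannot contain a space
```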
Performance Patterns
Avoid Overlapping Alternatives
my regex slow { [\w+\s+]* \w+ ':' .* }                # regex: retries shorter matches when ':' is missing
my token fast { [\w+] ** 1..* % \s+ ':' \N* }         # token: fails at once when ':' is missing
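Both declarations accept the same well-formed lines; they differ only in how much work they do before rejecting a malformed one. The sketch below checks the accept/reject behavior on two sample strings that are assumptions made up for this illustration.

```raku
my regex slow { [\w+\s+]* \w+ ':' .* }
my token fast { [\w+] ** 1..* % \s+ ':' \N* }

for "alpha beta: rest", "alpha beta rest" -> $line {
    my $s = so $line ~~ / ^ <slow> $ /;
    my $f = so $line ~~ / ^ <fast> $ /;
    say "'$line' slow=$s fast=$f";
}
# 'alpha beta: rest' slow=True fast=True
# 'alpha beta rest' slow=False fast=False
```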
Anchor Your Patterns
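my token unanchored { 'ERROR' .* }       # with ~~, can match starting anywhere in the string
my token anchored { ^ 'ERROR' .* $ }     # must start at the beginning and run to the end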
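The effect of the anchors shows up as soon as the keyword is not at the start of the input. The sample log line here is an assumption made up for the sketch.

```raku
my token unanchored { 'ERROR' .* }
my token anchored   { ^ 'ERROR' .* $ }

my $line = "INFO: retry after ERROR 42";

say so $line ~~ / <unanchored> /;        # True: the scan finds ERROR in mid-line
say so $line ~~ / <anchored> /;          # False: ^ only matches at position 0
say so "ERROR 42" ~~ / <anchored> /;     # True
```

With the anchored form the engine can reject a non-matching line after one attempt instead of retrying at every starting position.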
Use Possessive Matching
In patterns where you know backtracking will never help, make quantifiers possessive by appending : to them, as in \d+: instead of \d+.
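A short sketch of the possessive modifier follows. The sample strings are assumptions for illustration; the first pair shows what possessiveness changes, and the last line shows a pattern where it costs nothing because each field is delimited unambiguously.

```raku
my $digits = "12345";

say so $digits ~~ / ^ \d+ 5 $ /;    # True: \d+ gives back the final 5
say so $digits ~~ / ^ \d+: 5 $ /;   # False: \d+: keeps all five digits

# Here no digit run can usefully end early, so possessive quantifiers are safe.
say so "2026-04-03" ~~ / ^ \d+: '-' \d+: '-' \d+: $ /;   # True
```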
Debugging Backtracking
Use Grammar::Tracer (shipped in the Grammar::Debugger distribution) to visualize what the engine tries:
use Grammar::Tracer;
grammar Test {
    regex TOP { <a> <b> }
    regex a   { \w+ }       # first takes "helloend", then backs up to "hello"
    token b   { 'end' }
}
Test.parse("helloend");
Understanding backtracking control is essential for writing efficient parsers. As a rule of thumb: use token by default, rule when you want implicit whitespace handling, and regex only when you specifically need backtracking.