Code Blocks in Regex
Raku lets you embed executable code directly inside regex patterns. This feature turns a regex from a pure pattern matcher into a programmable parser. You can run side effects, perform validations, construct values, and even modify the match behavior dynamically.
Basic Code Blocks
Curly braces inside a regex execute Raku code at that point in the match:

    "hello world" ~~ / (\w+) { say "Matched: $0" } \s+ (\w+) { say "Then: $1" } /;
    # Output:
    # Matched: hello
    # Then: world

The code runs during the matching process, at the point where the engine reaches the block.
Accessing Match State
Inside a code block, you have access to the current match object $/ and all captures:

    "abc123def456" ~~ / (\d+) { say "Found number: $0 at position {$0.from}" } /;
    # Found number: 123 at position 3

Code Blocks for Debugging
Code blocks are incredibly useful for understanding what the regex engine is doing:

    "hello" ~~ /
        { say "Starting match..." }
        h   { say "Matched 'h'" }
        e   { say "Matched 'e'" }
        llo { say "Matched 'llo'" }
    /;

This lets you trace the match process step by step.
Dynamic Assertions with <?{ }>
The <?{ }> construct is a code assertion. It succeeds if the code returns a true value, and fails (causing backtracking) if it returns a false one:

    # Match a number, but only if it's even.
    # The anchors keep the engine from backtracking to a shorter, even prefix such as "4".
    "42" ~~ / ^ (\d+) $ <?{ $0 %% 2 }> /;
    say $/;   # 42
    "43" ~~ / ^ (\d+) $ <?{ $0 %% 2 }> /;
    say $/;   # Nil (43 is odd, assertion failed)

This is like a where clause for regex matches.
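For readers who have not met where clauses, the comparison can be made concrete with a small sketch. The subset name Even is purely illustrative and not part of the examples above:

```raku
# A where clause attaches a runtime check to a type,
# much as <?{ }> attaches one to a point in a pattern.
subset Even of Int where * %% 2;   # Even is a hypothetical name

say 42 ~~ Even;   # True
say 43 ~~ Even;   # False
```

In both cases an ordinary Raku expression decides whether a candidate is accepted.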
Negative Assertions with <!{ }>
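The negative form fails if the code returns true:

    # Match any word that is NOT a reserved keyword.
    # ~$0 turns the capture into a string for the lookup; the anchors stop
    # the engine from settling for a shorter prefix such as "whil".
    my @keywords = <if else for while>;
    "myvar" ~~ / ^ (\w+) $ <!{ ~$0 (elem) @keywords }> /;
    say $/;   # myvar
    "while" ~~ / ^ (\w+) $ <!{ ~$0 (elem) @keywords }> /;
    say $/;   # Nil (while is a keyword)
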
Building Values with make
Code blocks can use make to attach semantic values to the current match:

    my $m = "42" ~~ / (\d+) { make $0.Int * 2 } /;
    say $m.made;   # 84

This is the same mechanism grammar actions use, but inline.
Practical: Validating IP Addresses
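Checking that each octet is in the range 0-255 is awkward to express as a pure pattern. A code assertion states it directly:

    # Each octet gets its own token so that $0 refers to exactly one number
    my token octet      { (\d ** 1..3) <?{ 0 <= $0.Int <= 255 }> }
    my token ip-address { <octet> ** 4 % '.' }
    say "192.168.1.1"   ~~ / <ip-address> /;   # Matches
    say "192.168.1.300" ~~ / <ip-address> /;   # Nil (300 > 255)
    say "10.0.0.1"      ~~ / <ip-address> /;   # Matches
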
Practical: Balanced Parentheses
Check that parentheses are balanced, which is impossible with pure regex:

    my $str = "((a + b) * (c - d))";
    $str ~~ / ^ {
        my $depth = 0;
        my $balanced = True;
        for $str.comb -> $ch {
            if $ch eq '(' { $depth++ }
            elsif $ch eq ')' { $depth--; $balanced = False if $depth < 0 }
        }
        $balanced = False if $depth != 0;   # unclosed parentheses remain
        make $balanced;
    } /;
    say $/.made;   # True

Though for this particular case, a grammar with recursion is cleaner.
Dynamic Patterns
Patterns can also be built at match time. A scalar variable in a regex is interpolated and matched as a literal string:

    my $keyword = "error";
    my $log = "2026-04-09 ERROR disk is full";
    # Interpolate a variable; its contents are matched literally
    $log ~~ / :i $keyword /;
    say $/;   # ERROR (case-insensitive match)

For more complex dynamic patterns, use <{ code }>, which runs the code and treats the returned string as regex source:

    my @patterns = <error warn fatal>;
    my $combined = @patterns.join('|');
    # The joined string is compiled as a pattern, so any regex
    # metacharacters in the data would be interpreted as syntax
    "Found a warning" ~~ / <{ $combined }> /;
    # For plain strings, interpolate the array instead:
    # Raku treats an array in a regex as an alternation of its elements
    "Found a warning" ~~ / @patterns /;

Code Blocks with Backtracking
Be aware that code blocks in a regex (with backtracking) may execute multiple times as the engine explores alternatives:

    "aab" ~~ regex { (a+) { say "Trying: '$0'" } ab };
    # Output:
    # Trying: 'aa'
    # Trying: 'a'
    # The engine first tries a+='aa', fails on 'ab', backtracks to a+='a', succeeds

In a token (ratchet mode), the engine does not backtrack, so a code block is never re-run for a shorter alternative.
Side Effects: Counting Matches

    my $count = 0;
    "banana" ~~ m:g/ (an) { $count++ } /;
    say "Found $count occurrences";   # Found 2 occurrences

Building a Parse Result Inline

    my %data;
    my $config = "host=localhost\nport=8080\ndebug=true";
    $config ~~ m:g/ (\w+) '=' (\N+) { %data{$0} = ~$1 } /;
    say %data;   # {debug => true, host => localhost, port => 8080}

Code Blocks in Grammars
In grammars, code blocks are less common because action classes handle the transformation. But they are useful for inline validation:

    grammar Date {
        token TOP   { <year> '-' <month> '-' <day> }
        token year  { \d ** 4 }
        token month { (\d ** 2) <?{ 1 <= $0.Int <= 12 }> }
        token day   { (\d ** 2) <?{ 1 <= $0.Int <= 31 }> }
    }
    say Date.parse("2026-04-09");   # Matches
    say Date.parse("2026-13-09");   # Nil (month 13 is invalid)
    say Date.parse("2026-04-32");   # Nil (day 32 is invalid)

Performance Considerations
- Code blocks add overhead. For hot loops matching millions of strings, keep blocks simple.
- Assertions (<?{ }>) cause backtracking when they fail, which has performance implications.
- In token mode, failed assertions cause immediate failure without backtracking, which is faster.
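The difference between the two modes can be made visible by counting how often an assertion runs. The sketch below is illustrative only: the counter names are hypothetical, and the assertion is forced to fail so that the engine does as much work as it can.

```raku
my $regex-calls = 0;   # hypothetical counters, for illustration only
my $token-calls = 0;

# Backtracking regex: each time the assertion fails, \d+ gives back one
# digit and the assertion is checked again for the shorter capture.
"12345" ~~ regex { ^ (\d+) <?{ $regex-calls++; False }> };

# Token: \d+ never gives a digit back, so the assertion is checked once.
"12345" ~~ token { ^ (\d+) <?{ $token-calls++; False }> };

say "regex: $regex-calls checks";
say "token: $token-calls checks";
```

The gap grows with the length of the input, so an expensive assertion costs far more in a backtracking regex than in a token.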
Guidelines
- Use code blocks for validation that pure pattern matching cannot express (numeric ranges, lookups, cross-field checks).
- Use <?{ }> for constraints and { } for side effects or make.
- Keep code blocks short. If you need complex logic, move it to a grammar action class.
- Remember that blocks in regex (backtracking) mode may execute more than once.
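As a rough sketch of what moving logic into an action class can look like, the key=value parsing shown earlier could be split into a grammar and a separate class. The names KeyValue and KeyValueActions are hypothetical, chosen only for this illustration:

```raku
# The grammar describes structure only; there are no inline code blocks.
grammar KeyValue {
    token TOP   { <pair>+ % \n }
    token pair  { <key> '=' <value> }
    token key   { \w+ }
    token value { \N+ }
}

# The action class builds the values, one method per rule name.
class KeyValueActions {
    method TOP($/)  { make $<pair>.map(*.made).Hash }
    method pair($/) { make (~$<key> => ~$<value>) }
}

my $result = KeyValue.parse("host=localhost\nport=8080",
                            actions => KeyValueActions);
say $result.made;   # {host => localhost, port => 8080}
```

The pattern stays readable on its own, and the value-building code can grow without cluttering it.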
Code blocks bridge the gap between pattern matching and programming, giving you the full power of Raku right inside your regular expressions.