Regex Subrules
2026-03-28
One of the biggest improvements Raku's regex engine offers over traditional regular expressions is the ability to compose patterns from named, reusable components. These are called subrules, and they turn dense, hard-to-read patterns into readable, maintainable code.
Built-in Character Classes
Raku ships with a rich set of predefined subrules that you can use inside any regex:
say "abc123" ~~ / <alpha>+ /;
say "abc123" ~~ / <digit>+ /;
say "abc123" ~~ / <alnum>+ /;
say " hi " ~~ / <ws> /;
say "abc_123" ~~ / <ident> /;
say "Hello!" ~~ / <upper> /;
say "Hello!" ~~ / <lower>+ /;
say "café" ~~ / <print>+ /;
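These are more readable than hand-written character classes like <[a..zA..Z]>, and more correct too: the named subrules are Unicode-aware, while an explicit ASCII range misses letters such as é.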
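As a rough illustration of that difference (the sample word is only an assumption for the demo), compare the named subrule with an explicit ASCII range:

```raku
my $word = "café";

# <alpha> is Unicode-aware, so it matches all four letters.
my $named = ($word ~~ / <alpha>+ /).Str;

# An explicit ASCII range stops before the accented letter.
my $ascii = ($word ~~ / <[a..zA..Z]>+ /).Str;

say $named;   # café
say $ascii;   # caf
```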
Defining Your Own Subrules
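Use my regex, my token, or my rule to define reusable patterns. A regex backtracks like a traditional pattern, a token does not backtrack, and a rule is a token in which whitespace in the pattern is significant: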
my token number { '-'? \d+ ['.' \d+]? }
my token word { <alpha> <alnum>* }
my token email { <[\w . -]>+ '@' <[\w . -]>+ }
say "Price: 42.50" ~~ / <number> /;
say "Hello World" ~~ / <word> /;
say "user@host.com" ~~ / <email> /;
Once defined, these can be called by name inside angle brackets < >.
Subrules as Building Blocks
The power comes from combining subrules:
my token year { \d ** 4 }
my token month { \d ** 2 }
my token day { \d ** 2 }
my token date { <year> '-' <month> '-' <day> }
my token time { \d ** 2 ':' \d ** 2 ':' \d ** 2 }
my token datetime { <date> 'T' <time> }
my $stamp = "2026-03-28T14:30:00";
if $stamp ~~ / <datetime> / {
say $<datetime><date><year>;
say $<datetime><date><month>;
say $<datetime><time>;
}
Each subrule call creates a named capture, so you get structured data for free.
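Because the captures nest the same way the tokens do, they can be passed straight to other code. A minimal sketch, assuming the date tokens from above and the core Date class; the variable names are illustrative:

```raku
my token year  { \d ** 4 }
my token month { \d ** 2 }
my token day   { \d ** 2 }
my token date  { <year> '-' <month> '-' <day> }

my $m = "2026-03-28" ~~ / <date> /;

# Each named capture is a Match; prefix + converts its text to a number.
my $date = Date.new(+$m<date><year>, +$m<date><month>, +$m<date><day>);

say $date;                    # 2026-03-28
say $date.later(days => 1);   # 2026-03-29
```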
Subrules in Grammars
Grammars are essentially collections of subrules. Every token, rule, and regex in a grammar is a subrule:
grammar URL {
token TOP { <scheme> '://' <authority> <path>? ['?' <query>]? }
token scheme { <alpha>+ }
token authority { <host> [':' <port>]? }
token host { <[\w . -]>+ }
token port { \d+ }
token path { '/' <[\w . / -]>* }
token query { <[\w = & . -]>+ }
}
my $m = URL.parse("https://example.com:8080/api/data?format=json");
say $m<scheme>;
say $m<authority><host>;
say $m<authority><port>;
say $m<path>;
say $m<query>;
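Since every token in a grammar is a subrule in its own right, you can also start a parse from one of them instead of TOP by passing :rule to .parse. A small sketch, using a cut-down, hypothetical HostPort grammar rather than the full URL grammar:

```raku
# HostPort is an illustrative grammar, not part of the URL example above.
grammar HostPort {
    token TOP       { <authority> }
    token authority { <host> [':' <port>]? }
    token host      { <[\w . -]>+ }
    token port      { \d+ }
}

# Start from <authority> instead of TOP.
my $auth = HostPort.parse("example.com:8080", :rule<authority>);
say $auth<host>;   # ｢example.com｣
say $auth<port>;   # ｢8080｣

# A single token can be tested on its own the same way.
my $port = HostPort.parse("8080", :rule<port>);
say $port;         # ｢8080｣
```

This is handy for testing one piece of a larger grammar in isolation.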
Calling Subrules with Arguments
Subrules can be parameterized:
my regex bracketed($open, $close) {
# A character class cannot interpolate $close, so match lazily up to it instead.
$open <( .*? )> $close
}
say "Hello (world) there" ~~ / <bracketed('(', ')')> /;
say "Hello [world] there" ~~ / <bracketed('[', ']')> /;
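Note: Variables are not interpolated inside character classes, so a negated class cannot be built from $close. Parameterized regexes declared as lexical variables have some other limitations too. Grammars provide a more robust way to handle this.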
Lookahead and Lookbehind Subrules
Raku supports zero-width assertions, which check the text before or after the current position without consuming it:
say "foobar" ~~ / foo <?before bar> /;
say "foobar" ~~ / foo <!before baz> /;
say "foobar" ~~ / <?after foo> bar /;
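To make the zero-width part concrete, here is a small sketch that inspects what each match actually covers (the variable names are just for illustration):

```raku
# The lookahead checks for "bar" but leaves it out of the match.
my $ahead = "foobar" ~~ / foo <?before bar> /;
say $ahead.Str;   # foo
say $ahead.to;    # 3

# The lookbehind checks for "foo" but the match starts after it.
my $behind = "foobar" ~~ / <?after foo> bar /;
say $behind.Str;  # bar
say $behind.from; # 3

# If the assertion fails, the whole match fails.
my $failed = "foobaz" ~~ / foo <?before bar> /;
say so $failed;   # False
```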
The <.subrule> Non-capturing Call
Prefix a subrule call with . to use it without capturing:
my $log = "2026-03-28 ERROR disk full";
if $log ~~ / (\S+) <.ws> (\S+) <.ws> (.*) / {
say $0;
say $1;
say $2;
}
This keeps your match object clean, containing only the data you care about.
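A quick way to see the difference is to compare the named captures of the two forms. A minimal sketch with an assumed two-word input:

```raku
my $captured = "a b" ~~ / \w <ws> \w /;
my $plain    = "a b" ~~ / \w <.ws> \w /;

# <ws> adds a named capture; <.ws> matches the same text but records nothing.
say $captured.hash.keys;   # (ws)
say $plain.hash.keys;      # ()
```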
Alternation with Proto Tokens
In grammars, you can use proto token to create extensible subrules:
grammar Literal {
token TOP { <value> }
proto token value {*}
token value:sym<integer> { '-'? \d+ }
token value:sym<float> { '-'? \d+ '.' \d+ }
token value:sym<string> { '"' <( <-["]>* )> '"' }
token value:sym<bool> { 'true' | 'false' }
}
say Literal.parse('42');
say Literal.parse('3.14');
say Literal.parse('"hello"');
say Literal.parse('true');
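Each sym variant is a separate subrule that the proto dispatches to. When more than one variant could match, the longest match wins, which is why '3.14' parses as a float rather than stopping at the integer 3.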
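The "extensible" part shows up when a subclass adds a variant of its own. A sketch under the assumption that we want a null literal; LiteralWithNull and the trimmed-down base grammar are hypothetical names for this demo:

```raku
grammar Literal {
    token TOP { <value> }
    proto token value {*}
    token value:sym<integer> { '-'? \d+ }
    token value:sym<bool>    { 'true' | 'false' }
}

# Add a new variant without touching the original grammar.
grammar LiteralWithNull is Literal {
    token value:sym<null> { 'null' }
}

my $extended = LiteralWithNull.parse('null');
my $base     = Literal.parse('null');

say $extended<value>;   # ｢null｣
say so $base;           # False
```

The inherited variants still work in the subclass, so LiteralWithNull.parse('42') matches as before.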
Composing with Role Grammars
Grammars can compose rules from roles, just like classes:
role NumberRules {
token integer { '-'? \d+ }
token decimal { '-'? \d+ '.' \d+ }
}
role StringRules {
token single-quoted { "'" <( <-[']>* )> "'" }
token double-quoted { '"' <( <-["]>* )> '"' }
}
grammar MyLang does NumberRules does StringRules {
token TOP { <integer> | <decimal> | <single-quoted> | <double-quoted> }
}
say MyLang.parse("42");
say MyLang.parse("3.14");
say MyLang.parse("'hello'");
Practical Example: Log Line Parser
my token ip-addr {
[\d ** 1..3] ** 4 % '.'
}
my token timestamp {
\d ** 4 '-' \d ** 2 '-' \d ** 2
' '
\d ** 2 ':' \d ** 2 ':' \d ** 2
}
my token log-level {
'DEBUG' | 'INFO' | 'WARN' | 'ERROR' | 'FATAL'
}
my token log-line {
'[' <timestamp> ']'
\s+
<log-level>
\s+
<ip-addr>?
\s*
$<message>=[\N+]
}
my $line = "[2026-03-28 14:30:00] ERROR 10.0.0.1 Connection refused";
if $line ~~ / <log-line> / {
say "Time: {$<log-line><timestamp>}";
say "Level: {$<log-line><log-level>}";
say "IP: {$<log-line><ip-addr>}";
say "Msg: {$<log-line><message>}";
}
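One detail worth handling: <ip-addr>? is optional, so its capture is undefined when a line has no address. A reduced sketch of one way to deal with that, using a hypothetical entry token that keeps only the optional address and the message:

```raku
my token ip-addr { [\d ** 1..3] ** 4 % '.' }

# entry is an illustrative, cut-down version of log-line.
my token entry   { <ip-addr>? \s* $<message>=[\N+] }

my @ips;
for "10.0.0.1 Connection refused", "Disk full" -> $text {
    if $text ~~ / ^ <entry> $ / {
        # Fall back to a placeholder when the optional subrule did not match.
        my $ip = $<entry><ip-addr> // 'n/a';
        @ips.push: ~$ip;
        say "IP: {$ip}  Msg: {$<entry><message>}";
    }
}
```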
Subrules transform regexes from write-only strings into modular, self-documenting patterns. Use them anywhere patterns start getting complex.