raku.gg / regex

Regex Subrules

2026-03-28

One of the biggest improvements Raku's regex engine makes over traditional regex engines is the ability to compose patterns from named, reusable components. These components are called subrules, and they turn messy regexes into readable, maintainable code.

Built-in Character Classes

Raku ships with a rich set of predefined subrules that you can use inside any regex:
say "abc123"  ~~ / <alpha>+ /;   # abc -- alphabetic characters
say "abc123"  ~~ / <digit>+ /;   # 123 -- digits
say "abc123"  ~~ / <alnum>+ /;   # abc123 -- alphanumeric
say " hi "    ~~ / <ws> /;       # whitespace
say "abc_123" ~~ / <ident> /;    # abc_123 -- identifier
say "Hello!"  ~~ / <upper> /;    # H
say "Hello!"  ~~ / <lower>+ /;   # ello
say "café"    ~~ / <print>+ /;   # café -- printable characters
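These are more readable than hand-written character classes like <[a..zA..Z]> (Raku's spelling of [a-zA-Z]). They are also more correct, because they are Unicode-aware: <alpha> matches é as well as e. Note that Raku's <alpha> also matches the underscore. The comments show only the matched text; say on a Match actually prints it in ｢corner brackets｣, followed by any named captures.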
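To see the difference, this sketch runs an ASCII-only class and <alpha> over the same string. The sample name is made up for illustration.

```raku
my $name = "Zoë_Smith";

# Hand-written ASCII ranges: the match stops at the first non-ASCII letter
my $ascii = $name ~~ / <[a..zA..Z]>+ /;

# Built-in <alpha>: Unicode-aware, and in Raku it also accepts the underscore
my $unicode = $name ~~ / <alpha>+ /;

say ~$ascii;     # Zo
say ~$unicode;   # Zoë_Smith
```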

Defining Your Own Subrules

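Use my regex, my token, or my rule to define reusable patterns. They differ in their defaults: a regex backtracks, a token never backtracks (it ratchets), and a rule ratchets and also treats whitespace in the pattern as significant, matching it with <.ws>: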
my token number { '-'? \d+ ['.' \d+]? }
my token word   { <alpha> <alnum>* }
my token email  { <[\w . -]>+ '@' <[\w . -]>+ }
say "Price: 42.50"  ~~ / <number> /;   # 42.50
say "Hello World"   ~~ / <word> /;     # Hello
say "user@host.com" ~~ / <email> /;    # user@host.com
Once defined, a subrule is called by name inside angle brackets, as in <number>, and its match is stored under that name.
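The choice of declarator matters even when the pattern body is identical. A minimal sketch comparing backtracking, ratcheting, and significant whitespace; the subrule names are made up for the comparison:

```raku
# Same body, two declarators
my regex backtracking { \w+ 'x' }
my token ratcheting   { \w+ 'x' }

# regex: \w+ first grabs "abcx", then gives back the "x" so 'x' can match
my $r = ("abcx" ~~ / <backtracking> /).so;
# token: \w+ grabs "abcx" and never gives anything back, so 'x' fails
my $t = ("abcx" ~~ / <ratcheting> /).so;

# rule: ratchets like a token, but the spaces in the body match <.ws>
my rule  spaced   { \w+ '=' \w+ }
my token unspaced { \w+ '=' \w+ }
my $s = ("a = b" ~~ / <spaced> /).so;
my $u = ("a = b" ~~ / <unspaced> /).so;

say ($r, $t, $s, $u);   # (True False True False)
```

A common rule of thumb is to default to token for small building blocks and switch to regex only where backtracking is actually needed.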

Subrules as Building Blocks

The power comes from combining subrules:
my token year     { \d ** 4 }
my token month    { \d ** 2 }
my token day      { \d ** 2 }
my token date     { <year> '-' <month> '-' <day> }
my token time     { \d ** 2 ':' \d ** 2 ':' \d ** 2 }
my token datetime { <date> 'T' <time> }
my $stamp = "2026-03-28T14:30:00";
if $stamp ~~ / <datetime> / {
    say $<datetime><date><year>;    # 2026
    say $<datetime><date><month>;   # 03
    say $<datetime><time>;          # 14:30:00
}
Each subrule call creates a named capture, and the captures nest exactly as the calls do, so you get structured data for free.
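Because the captures are structured, turning them into typed values takes one more step. A sketch that feeds the <year>, <month>, and <day> captures into Raku's built-in Date class; the input sentence is made up, and the tokens are redeclared so the block runs on its own:

```raku
my token year  { \d ** 4 }
my token month { \d ** 2 }
my token day   { \d ** 2 }
my token date  { <year> '-' <month> '-' <day> }

my $due;
if "Invoice due 2026-03-28" ~~ / <date> / {
    # Numify each nested capture and build a real Date object
    $due = Date.new(+$<date><year>, +$<date><month>, +$<date><day>);
}
say $due;       # 2026-03-28
say $due + 7;   # 2026-04-04 (date arithmetic now works)
```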

Subrules in Grammars

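Grammars are essentially collections of subrules. Every token, rule, and regex in a grammar is a subrule (technically, a method of the grammar class). Calling .parse starts with the token named TOP and succeeds only if it matches the entire string: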
grammar URL {
    token TOP       { <scheme> '://' <authority> <path>? ['?' <query>]? }
    token scheme    { <alpha>+ }
    token authority { <host> [':' <port>]? }
    token host      { <[\w . -]>+ }
    token port      { \d+ }
    token path      { '/' <[\w . / -]>* }
    token query     { <[\w = & . -]>+ }
}
my $m = URL.parse("https://example.com:8080/api/data?format=json");
say $m<scheme>;             # https
say $m<authority><host>;    # example.com
say $m<authority><port>;    # 8080
say $m<path>;               # /api/data
say $m<query>;              # format=json
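Since each token is a method on the grammar, .parse can also start from a token other than TOP through its :rule named argument, which is handy for testing one piece at a time. The grammar is repeated so the sketch runs standalone; the inputs are illustrative.

```raku
grammar URL {
    token TOP       { <scheme> '://' <authority> <path>? ['?' <query>]? }
    token scheme    { <alpha>+ }
    token authority { <host> [':' <port>]? }
    token host      { <[\w . -]>+ }
    token port      { \d+ }
    token path      { '/' <[\w . / -]>* }
    token query     { <[\w = & . -]>+ }
}

# Any token can serve as the entry point, not just TOP
my $auth = URL.parse("example.com:8080", :rule<authority>);
say ~$auth<host>;   # example.com
say ~$auth<port>;   # 8080

# .parse must consume the whole string, so trailing text makes it fail (Nil)
my $bad = URL.parse("https://example.com/x y");
say $bad.defined;   # False
```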

Calling Subrules with Arguments

Subrules can be parameterized:
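my regex bracketed($open, $close) { $open <( .*? )> $close }
say ~$<bracketed> if "Hello (world) there" ~~ / <bracketed('(', ')')> /;   # world
say ~$<bracketed> if "Hello [world] there" ~~ / <bracketed('[', ']')> /;   # world
Note: inside a regex, a scalar such as $open matches its value as a literal string, so '(' and '[' need no escaping. Variables are not interpolated inside character classes, though: <-[$close]> would mean "any character except $, c, l, o, s, or e", so this version uses a frugal .*? that stops at the first closing delimiter. Because <( and )> trim only the subrule's own match, the text is read from $<bracketed>; the outer match still includes the brackets. For harder cases, such as nested brackets, grammars provide a more robust approach.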
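Inside a grammar, the same idea becomes a token with a signature. Because tokens never backtrack, a frugal .*? would commit to matching nothing and never grow, so this sketch consumes one character at a time while a negative lookahead watches for the closing delimiter. The grammar name Wrapped and the token name wrapped are hypothetical:

```raku
grammar Wrapped {
    # || tries the alternatives in order until one succeeds
    token TOP { <wrapped('(', ')')> || <wrapped('[', ']')> }
    # Take any character as long as the closing delimiter is not next
    token wrapped($open, $close) { $open <( [<!before $close> .]* )> $close }
}

say ~Wrapped.parse('(hello)')<wrapped>;   # hello
say ~Wrapped.parse('[hello]')<wrapped>;   # hello
```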

Lookahead and Lookbehind Subrules

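Raku supports zero-width assertions. Lookaround is written as a call to the built-in before and after subrules: <?...> succeeds without consuming any input, and <!...> succeeds only if the pattern does not match: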
# Lookahead: match only if followed by pattern
say "foobar" ~~ / foo <?before bar> /;   # foo
# Negative lookahead
say "foobar" ~~ / foo <!before baz> /;   # foo
# Lookbehind: match only if preceded by pattern
say "foobar" ~~ / <?after foo> bar /;    # bar
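Lookbehind is especially useful with .comb, which returns every matching substring. A sketch using positive and negative lookbehind on a made-up sentence; the << left word boundary keeps the negative case from matching the tail end of a number:

```raku
my $text = 'Paid $40 for 3 items';

# Digits right after a '$'; the '$' itself is not part of the result
my @prices = $text.comb(/ <?after '$'> \d+ /);

# Whole numbers that are NOT preceded by a '$'
my @counts = $text.comb(/ <!after '$'> << \d+ /);

say @prices;   # [40]
say @counts;   # [3]
```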

The <.subrule> Non-capturing Call

Prefix a subrule call with . to use it without capturing:
my $log = "2026-03-28 ERROR disk full";
# <.ws> matches whitespace but does not capture it
if $log ~~ / (\S+) <.ws> (\S+) <.ws> (.*) / {
    say $0;   # 2026-03-28
    say $1;   # ERROR
    say $2;   # disk full
}
This keeps your match object clean, containing only the data you care about.
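The difference shows up directly in the named captures. A small sketch comparing <ws> and <.ws> on the same made-up input:

```raku
my $s = "key value";

# <ws> is a capturing call: it leaves a "ws" entry among the named captures
my $with    = $s ~~ / (\w+) <ws> (\w+) /;
# <.ws> matches the same whitespace but records nothing
my $without = $s ~~ / (\w+) <.ws> (\w+) /;

say $with.hash.keys;      # (ws)
say $without.hash.keys;   # ()
say ~$without[1];         # value
```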

Alternation with Proto Tokens

In grammars, you can use proto token to create extensible subrules:
grammar Literal {
    token TOP { <value> }
    proto token value {*}
    token value:sym<integer> { '-'? \d+ }
    token value:sym<float>   { '-'? \d+ '.' \d+ }
    token value:sym<string>  { '"' <( <-["]>* )> '"' }
    token value:sym<bool>    { 'true' | 'false' }
}
say Literal.parse('42');        # Matches value:sym<integer>
say Literal.parse('3.14');      # Matches value:sym<float>
say Literal.parse('"hello"');   # Matches value:sym<string>
say Literal.parse('true');      # Matches value:sym<bool>
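Each sym variant is a separate subrule that the proto dispatches to. The proto considers all candidates and, using longest-token matching, picks the one whose declarative prefix matches the most text; that is why '3.14' goes to float instead of stopping after integer matches the 3. New variants can be added later, including in a subclass, without touching the proto.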
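A short sketch of that extensibility, using a trimmed copy of the grammar above and a hypothetical subclass LiteralWithNull that adds a null candidate:

```raku
grammar Literal {
    token TOP { <value> }
    proto token value {*}
    token value:sym<integer> { '-'? \d+ }
    token value:sym<bool>    { 'true' | 'false' }
}

# Hypothetical subclass: adds one more candidate to the inherited proto
grammar LiteralWithNull is Literal {
    token value:sym<null> { 'null' }
}

my $plain    = Literal.parse('null');
my $extended = LiteralWithNull.parse('null');
say $plain.defined;                        # False
say $extended.defined;                     # True
say ~LiteralWithNull.parse('42')<value>;   # 42 (inherited candidates still apply)
```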

Composing with Role Grammars

Grammars can compose rules from roles, just like classes:
role NumberRules {
    token integer { '-'? \d+ }
    token decimal { '-'? \d+ '.' \d+ }
}
role StringRules {
    token single-quoted { "'" <( <-[']>* )> "'" }
    token double-quoted { '"' <( <-["]>* )> '"' }
}
grammar MyLang does NumberRules does StringRules {
    token TOP { <integer> | <decimal> | <single-quoted> | <double-quoted> }
}
say MyLang.parse("42");
say MyLang.parse("3.14");
say MyLang.parse("'hello'");
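The payoff is reuse: the same role can supply tokens to grammars that have nothing else in common. In this sketch the grammar names IntList and IntSpan are hypothetical:

```raku
# One role of number tokens, shared by two unrelated grammars
role NumberRules {
    token integer { '-'? \d+ }
}

grammar IntList does NumberRules {
    token TOP { <integer>+ % ',' }          # comma-separated integers
}

grammar IntSpan does NumberRules {
    token TOP { <integer> '..' <integer> }  # a range like 3..10
}

# A repeated subrule capture is a list of Match objects; numify each one
my @list = IntList.parse('1,-2,3')<integer>.map(+*);
my @span = IntSpan.parse('3..10')<integer>.map(+*);
say @list;   # [1 -2 3]
say @span;   # [3 10]
```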

Practical Example: Log Line Parser

my token ip-addr   { [\d ** 1..3] ** 4 % '.' }
my token timestamp { \d ** 4 '-' \d ** 2 '-' \d ** 2 ' ' \d ** 2 ':' \d ** 2 ':' \d ** 2 }
my token log-level { 'DEBUG' | 'INFO' | 'WARN' | 'ERROR' | 'FATAL' }
my token log-line  {
    '[' <timestamp> ']' \s+ <log-level> \s+ <ip-addr>? \s* $<message>=[\N+]
}
my $line = "[2026-03-28 14:30:00] ERROR 10.0.0.1 Connection refused";
if $line ~~ / <log-line> / {
    say "Time: {$<log-line><timestamp>}";
    say "Level: {$<log-line><log-level>}";
    say "IP: {$<log-line><ip-addr>}";
    say "Msg: {$<log-line><message>}";
}
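The same tokens scale to a batch of lines. A sketch, with made-up sample lines, that anchors each match to the whole line, skips lines that are not log entries, and substitutes a placeholder when the optional <ip-addr> is absent:

```raku
my token ip-addr   { [\d ** 1..3] ** 4 % '.' }
my token timestamp { \d ** 4 '-' \d ** 2 '-' \d ** 2 ' ' \d ** 2 ':' \d ** 2 ':' \d ** 2 }
my token log-level { 'DEBUG' | 'INFO' | 'WARN' | 'ERROR' | 'FATAL' }
my token log-line  {
    '[' <timestamp> ']' \s+ <log-level> \s+ <ip-addr>? \s* $<message>=[\N+]
}

my @lines =
    "[2026-03-28 14:30:00] ERROR 10.0.0.1 Connection refused",
    "[2026-03-28 14:31:05] INFO Service restarted",
    "not a log line";

my @summary;
for @lines -> $line {
    # ^ and $ force the whole line to be a single log entry
    next unless $line ~~ / ^ <log-line> $ /;
    # An unmatched optional capture is false, so fall back to a placeholder
    my $ip = $<log-line><ip-addr> ?? ~$<log-line><ip-addr> !! 'no ip';
    @summary.push: "{$<log-line><log-level>} from {$ip}: {$<log-line><message>}";
}
.say for @summary;
# ERROR from 10.0.0.1: Connection refused
# INFO from no ip: Service restarted
```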
Subrules transform regexes from write-only strings into modular, self-documenting patterns. Use them anywhere patterns start getting complex.