🦋 Grammar Basics

2026-03-18

Raku grammars are one of the language's most powerful features. They let you define full parsers using a clean, declarative syntax built right into the language. If you have ever written a regex and wished it could be more structured, grammars are exactly what you need.

What Is a Grammar?

A grammar is a special kind of class designed for parsing text. It contains named rules that describe the structure of the input you want to parse. Think of it as a collection of related regular expressions that work together.

grammar Color {
    token TOP { <hex-color> | <rgb-color> | <named-color> }
    token hex-color { '#' <xdigit> ** 6 }
    token rgb-color { 'rgb(' <number> ',' <number> ',' <number> ')' }
    token named-color { 'red' | 'green' | 'blue' | 'white' | 'black' }
    token number { \d+ }
}

say Color.parse('#ff00aa');
say Color.parse('rgb(255,128,0)');
say Color.parse('blue');

The TOP rule is the entry point. When you call .parse(), Raku starts matching from TOP and works its way through the referenced rules. The match has to cover the entire input string; if it does not, .parse() returns Nil.
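As a small sketch of how the entry point behaves, the snippet below repeats the Color grammar and contrasts .parse with two related options on Grammar: .subparse, which anchors at the start but may stop early, and the :rule named argument, which starts from a token other than TOP. The input strings are made up for illustration.

```raku
grammar Color {
    token TOP         { <hex-color> | <rgb-color> | <named-color> }
    token hex-color   { '#' <xdigit> ** 6 }
    token rgb-color   { 'rgb(' <number> ',' <number> ',' <number> ')' }
    token named-color { 'red' | 'green' | 'blue' | 'white' | 'black' }
    token number      { \d+ }
}

# .parse must consume the whole string, so trailing text makes it fail.
say so Color.parse('blue sky');           # False

# .subparse anchors at the start but may stop before the end.
say ~Color.subparse('blue sky');          # blue

# :rule starts matching from a token other than TOP.
say ~Color.parse('42', :rule<number>);    # 42
```

Reaching for :rule is also a convenient way to test one token at a time while a grammar is still taking shape.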

Grammar, Token, Rule, and Regex

Grammars give you three kinds of pattern declarations, each with different backtracking behavior:

grammar Example {
    # 'regex' -- full backtracking (like traditional regexes)
    regex flexible { \w+ \s+ \w+ }

    # 'token' -- no backtracking (ratchet mode)
    token strict { \w+ \s+ \w+ }

    # 'rule' -- no backtracking AND whitespace in the pattern is significant
    rule spaced { \w+ \w+ }  # the space between the \w+ atoms becomes an implicit <.ws> call
}

Here is a quick breakdown:

regex allows backtracking. The engine can retry different match lengths.
token uses :ratchet mode. Once a part of the pattern matches, the engine will not go back and try a shorter match.
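rule uses both :ratchet and :sigspace. Whitespace after an atom in your pattern becomes an implicit <.ws> call, which demands whitespace between two word characters and lets it be absent elsewhere.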

For most parsing work, token is your go-to. Use rule when you want whitespace handling for free, and regex only when you truly need backtracking.
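To make the difference concrete, here is a minimal sketch that runs the same kind of pattern under each declarator. The grammar name Backtrack and the assign rule are hypothetical, chosen only for this illustration; flexible, strict, and spaced mirror the names used above.

```raku
grammar Backtrack {
    # \w+ first swallows 'abcx', then gives back the 'x' so the literal can match.
    regex flexible { \w+ 'x' }

    # Ratchet mode: \w+ keeps everything it matched, so 'x' has nothing left.
    token strict   { \w+ 'x' }

    # :sigspace turns the space between the atoms into an implicit <.ws> call.
    rule  spaced   { \w+ \w+ }

    # Around punctuation, <.ws> is allowed to match nothing at all.
    rule  assign   { \w+ '=' \w+ }
}

say so Backtrack.parse('abcx',        :rule<flexible>);   # True
say so Backtrack.parse('abcx',        :rule<strict>);     # False
say so Backtrack.parse('hello world', :rule<spaced>);     # True
say so Backtrack.parse('helloworld',  :rule<spaced>);     # False
say so Backtrack.parse('a = b',       :rule<assign>);     # True
say so Backtrack.parse('a=b',         :rule<assign>);     # True
```

The strict case is the one that tends to surprise people: the pattern is identical to flexible, and only the declarator changes the outcome.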

Building a Simple Parser

Let us parse a simple key-value configuration format:

grammar Config {
    token TOP { <entry>+ % \n }
    token entry { <key> \s* '=' \s* <value> }
    token key { <[a..z A..Z _]> <[a..z A..Z 0..9 _]>* }
    token value { \N+ }
}

my $input = q:to/END/;
host=localhost
port=8080
debug=true
END

my $result = Config.parse($input.trim);

if $result {
    for $result<entry> -> $entry {
        say "Key: {$entry<key>}, Value: {$entry<value>}";
    }
} else {
    say "Parse failed!";
}

Output:

Key: host, Value: localhost
Key: port, Value: 8080
Key: debug, Value: true
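Printing the entries is rarely the end goal. One possible next step, sketched here, is to collect the captures into a Hash for later lookup. The variable name %settings is an assumption for illustration, and the input is inlined rather than read from a heredoc.

```raku
grammar Config {
    token TOP   { <entry>+ % \n }
    token entry { <key> \s* '=' \s* <value> }
    token key   { <[a..z A..Z _]> <[a..z A..Z 0..9 _]>* }
    token value { \N+ }
}

my $result = Config.parse("host=localhost\nport=8080\ndebug=true");

# Stringify each capture with prefix ~ and collect the pairs into a Hash.
my %settings = $result<entry>.map(-> $e { ~$e<key> => ~$e<value> });

say %settings<host>;    # localhost
say %settings<port>;    # 8080
say %settings.elems;    # 3
```

The pointy block avoids any ambiguity between a block and a hash constructor when the body is a single pair.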

The Match Object

When a grammar successfully parses input, it returns a Match object. This object is a tree you can navigate using < > subscripts to access named captures:

grammar Greeting {
    token TOP { <salutation> ',' \s* <name> '!' }
    token salutation { 'Hello' | 'Hi' | 'Hey' }
    token name { \w+ }
}

my $m = Greeting.parse('Hello, World!');

say $m;                   # Full match object
say $m<salutation>;       # ｢Hello｣
say $m<name>;             # ｢World｣
say $m<name>.from;        # 7 (starting position)
say $m<name>.to;          # 12 (position just past the end)
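Note that say prints a Match in its bracketed gist form. When you want the plain text, stringify the capture. The sketch below reuses the Greeting grammar with a made-up input and also lists the named captures available on the top-level match.

```raku
grammar Greeting {
    token TOP        { <salutation> ',' \s* <name> '!' }
    token salutation { 'Hello' | 'Hi' | 'Hey' }
    token name       { \w+ }
}

my $m = Greeting.parse('Hi, Raku!');

say $m<name>;              # ｢Raku｣  (the gist of a Match object)
say ~$m<name>;             # Raku    (prefix ~ gives the matched text)
say $m<name>.Str.chars;    # 4
say $m.hash.keys.sort;     # (name salutation)
```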

The % Separator Modifier

One useful pattern is the % modifier, which lets you specify a separator between repeated elements:

grammar NumberList {
    token TOP { <number>+ % ',' }
    token number { \d+ }
}

my $m = NumberList.parse('10,20,30,40');
say $m<number>;  # List of match objects for each number

This is much cleaner than manually matching commas between items.
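A closely related modifier is %%, which behaves like % but also tolerates one trailing separator. The sketch below compares the two. LooseNumberList is a hypothetical subclass introduced only for this comparison; it inherits the number token and overrides TOP.

```raku
grammar NumberList {
    token TOP    { <number>+ % ',' }
    token number { \d+ }
}

# Hypothetical variant: inherits `number`, overrides TOP to use %%.
grammar LooseNumberList is NumberList {
    token TOP { <number>+ %% ',' }
}

# With %, a trailing comma is left unconsumed, so .parse fails.
say so NumberList.parse('10,20,30,');                           # False

# With %%, the trailing comma is accepted and not captured as a number.
say LooseNumberList.parse('10,20,30,')<number>.elems;           # 3

# Each capture is a Match; convert it before doing arithmetic.
say NumberList.parse('10,20,30,40')<number>.map(*.Int).sum;     # 100
```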

Debugging Grammars

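Grammar::Tracer is a module that traces grammar execution. It is not part of core Raku; it ships in the Grammar::Debugger distribution, which you can install with zef. Load it before declaring your grammar, and every grammar declared in that scope is traced: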

use Grammar::Tracer;

grammar Debug {
    token TOP { <word>+ % \s+ }
    token word { \w+ }
}

Debug.parse('hello world');

This prints a tree of which rules are being tried and whether they succeed or fail. It is invaluable when your grammar is not matching what you expect.

When to Use Grammars

Grammars are the right tool when:

You need to parse structured text (config files, log formats, protocols)
Regular expressions are getting too complex to maintain
You want to build an AST from parsed input
You are implementing a DSL or a small language

They are overkill for simple pattern matching. For that, stick with regular Raku regexes.

In the next grammar post, we will look at action classes, which let you transform the parse tree into useful data structures as you parse.