Grammar Basics
Raku grammars are one of the language's most powerful features. They let you define full parsers using a clean, declarative syntax built right into the language. If you have ever written a regex and wished it could be more structured, grammars are exactly what you need.

What Is a Grammar?
A grammar is a special kind of class designed for parsing text. It contains named rules that describe the structure of the input you want to parse. Think of it as a collection of related regular expressions that work together.

```raku
grammar Color {
    token TOP         { <hex-color> | <rgb-color> | <named-color> }
    token hex-color   { '#' <xdigit> ** 6 }
    token rgb-color   { 'rgb(' <number> ',' <number> ',' <number> ')' }
    token named-color { 'red' | 'green' | 'blue' | 'white' | 'black' }
    token number      { \d+ }
}

say Color.parse('#ff00aa');
say Color.parse('rgb(255,128,0)');
say Color.parse('blue');
```

The TOP rule is the entry point. When you call .parse(), Raku starts matching from TOP and works its way through the referenced rules.
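Two details of .parse() are easy to miss. First, the match has to cover the entire string; if anything is left over, .parse() returns Nil. Second, the named argument :rule lets you start from a rule other than TOP. The sketch below reuses the Color grammar to check both; the test strings are made up for illustration.

```raku
grammar Color {
    token TOP         { <hex-color> | <rgb-color> | <named-color> }
    token hex-color   { '#' <xdigit> ** 6 }
    token rgb-color   { 'rgb(' <number> ',' <number> ',' <number> ')' }
    token named-color { 'red' | 'green' | 'blue' | 'white' | 'black' }
    token number      { \d+ }
}

# .parse anchors at both ends: leftover text means failure (Nil)
say so Color.parse('blue');        # True
say so Color.parse('blue sky');    # False

# :rule selects the starting rule instead of TOP
say Color.parse('#00ff00', :rule<hex-color>).Str;   # #00ff00
say Color.parse('#00ff00', :rule<named-color>);     # Nil
```

Starting from an inner rule like this is also a handy way to test one token of a larger grammar in isolation.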
Grammar, Token, Rule, and Regex
Grammars give you three kinds of pattern declarations, each with different backtracking and whitespace behavior:

```raku
grammar Example {
    # 'regex' -- full backtracking (like traditional regexes)
    regex flexible { \w+ \s+ \w+ }

    # 'token' -- no backtracking (ratchet mode)
    token strict { \w+ \s+ \w+ }

    # 'rule' -- no backtracking AND whitespace is significant
    rule spaced { \w+ \w+ }   # whitespace after each \w+ atom becomes <.ws>
}
```

Here is a quick breakdown:

- regex allows backtracking. The engine can retry different match lengths.
- token uses :ratchet mode. Once a part of the pattern matches, the engine will not go back and try a shorter match.
- rule uses both :ratchet and :sigspace. Whitespace in your pattern becomes a call to the <ws> rule, which by default matches optional whitespace but never splits a word.
For most parsing work, token is your go-to. Use rule when you want whitespace handling for free, and regex only when you truly need backtracking.
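To see the three declarators behave differently, here is a small sketch. The grammar name Demo and its rule names are hypothetical, and each rule is invoked directly through the :rule argument of .parse().

```raku
grammar Demo {
    regex backtracking { \w+ 'b' }       # \w+ may give characters back
    token ratcheting   { \w+ 'b' }       # \w+ keeps everything it grabbed
    rule  spaced       { \w+ '=' \w+ }   # whitespace after each atom becomes <.ws>
}

# regex: \w+ first takes 'aab', then backtracks to 'aa' so 'b' can match
say so Demo.parse('aab', :rule<backtracking>);    # True
# token: \w+ takes 'aab' and never gives it back, so 'b' has nothing left
say so Demo.parse('aab', :rule<ratcheting>);      # False
# rule: <.ws> matches optional whitespace around '='
say so Demo.parse('key = value', :rule<spaced>);  # True
say so Demo.parse('key=value',   :rule<spaced>);  # True
```

The token version failing on 'aab' is the price of ratcheting, and also the reason it is faster and more predictable: the engine never revisits a decision.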
Building a Simple Parser
Let us parse a simple key-value configuration format:

```raku
grammar Config {
    token TOP   { <entry>+ % \n }
    token entry { <key> \s* '=' \s* <value> }
    token key   { <[a..z A..Z _]> <[a..z A..Z 0..9 _]>* }
    token value { \N+ }
}

my $input = q:to/END/;
    host=localhost
    port=8080
    debug=true
    END

# .parse must consume the whole string, so drop the heredoc's trailing newline
my $result = Config.parse($input.trim);

if $result {
    for $result<entry> -> $entry {
        say "Key: {$entry<key>}, Value: {$entry<value>}";
    }
} else {
    say "Parse failed!";
}
```

Output:

```
Key: host, Value: localhost
Key: port, Value: 8080
Key: debug, Value: true
```
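Once the parse succeeds, the list of entry captures is easy to turn into ordinary Raku data. The sketch below builds a Hash from the entries (the %config name is just for illustration) and also shows that a single malformed line makes the whole .parse() call return Nil, because every line must match entry and the match must reach the end of the string.

```raku
grammar Config {
    token TOP   { <entry>+ % \n }
    token entry { <key> \s* '=' \s* <value> }
    token key   { <[a..z A..Z _]> <[a..z A..Z 0..9 _]>* }
    token value { \N+ }
}

my $result = Config.parse("host=localhost\nport=8080\ndebug=true");

# Turn each entry Match into a key => value Pair of plain strings
my %config = $result<entry>.map(-> $e { $e<key>.Str => $e<value>.Str });
say %config<port>;     # 8080
say %config.elems;     # 3

# 'oops' has no '=', so it cannot match <entry> and the parse fails
say so Config.parse("host=localhost\noops");   # False
```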
The Match Object
When a grammar successfully parses input, it returns a Match object. This object is a tree you can navigate using < > subscripts to access named captures:

```raku
grammar Greeting {
    token TOP        { <salutation> ',' \s* <name> '!' }
    token salutation { 'Hello' | 'Hi' | 'Hey' }
    token name       { \w+ }
}

my $m = Greeting.parse('Hello, World!');
say $m;              # Full match object, with its named captures
say $m<salutation>;  # ｢Hello｣
say $m<name>;        # ｢World｣
say $m<name>.from;   # 7 (starting position)
say $m<name>.to;     # 12 (position just past the end)
```
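Captures print with ｢ ｣ brackets because they are Match objects, not strings. The sketch below, using a made-up input, shows how to get plain strings out of them and how .from and .to relate to the original text (available through .orig).

```raku
grammar Greeting {
    token TOP        { <salutation> ',' \s* <name> '!' }
    token salutation { 'Hello' | 'Hi' | 'Hey' }
    token name       { \w+ }
}

my $m = Greeting.parse('Hey, Raku!');

# .Str (or the ~ prefix operator) turns a capture into a plain string
my $reply = "{$m<salutation>.Str} back, {~$m<name>}!";
say $reply;                                                       # Hey back, Raku!

# .from and .to are offsets into the original string, available as .orig
say $m.orig.substr($m<name>.from, $m<name>.to - $m<name>.from);   # Raku

# The hash part holds only the named captures
say $m.hash.keys.sort;                                            # (name salutation)
```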
The % Separator Modifier
One useful pattern is the % modifier, which lets you specify a separator between repeated elements:

```raku
grammar NumberList {
    token TOP    { <number>+ % ',' }
    token number { \d+ }
}

my $m = NumberList.parse('10,20,30,40');
say $m<number>;   # List of match objects for each number
```

This is much cleaner than manually matching commas between items.
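The separator can be any pattern, not just a literal, and the doubled form %% additionally accepts one optional trailing separator. The grammar names Spaced and Trailing below are hypothetical; the sketch contrasts the two modifiers.

```raku
grammar Spaced {
    token TOP  { <item>+ % [ \s* ',' \s* ] }   # separator may be surrounded by spaces
    token item { \w+ }
}

grammar Trailing {
    token TOP  { <item>+ %% ',' }              # %% also allows a trailing separator
    token item { \w+ }
}

say Spaced.parse('a, b ,c')<item>.map(*.Str).join('|');   # a|b|c
say so Spaced.parse('a,b,');      # False: % does not accept a trailing comma
say so Trailing.parse('a,b,');    # True
say so Trailing.parse('a,b');     # True
```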
Debugging Grammars
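For tracing grammars, the Grammar::Tracer module from the Grammar::Debugger distribution is the usual tool. It is not part of the core language, so install it first (for example with zef install Grammar::Debugger), then load it with use before declaring your grammar:

```raku
use Grammar::Tracer;   # affects grammars declared after this line in the same scope

grammar Debug {
    token TOP  { <word>+ % \s+ }
    token word { \w+ }
}

Debug.parse('hello world');
```

This prints a tree of which rules are being tried and whether they succeed or fail. It is invaluable when your grammar is not matching what you expect.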
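When installing a module is not an option, .subparse() offers a rough built-in check of how far a grammar got. Unlike .parse(), it does not require the match to reach the end of the string, so the .to position of its result shows where matching stopped. This is a quick sketch with a made-up input, not a replacement for the tracer.

```raku
grammar Debug {
    token TOP  { <word>+ % \s+ }
    token word { \w+ }
}

my $input = 'hello world!';

say Debug.parse($input);          # Nil: the '!' is not covered by TOP

my $partial = Debug.subparse($input);
say $partial.to;                  # 11
say $input.substr($partial.to);   # !  (the text TOP could not consume)
```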
When to Use Grammars
Grammars are the right tool when:

- You need to parse structured text (config files, log formats, protocols)
- Regular expressions are getting too complex to maintain
- You want to build an AST from parsed input
- You are implementing a DSL or a small language
They are overkill for simple pattern matching. For that, stick with regular Raku regexes.
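For instance, checking whether one string looks like a hex color needs no grammar; a single anchored regex match is enough. The helper name is-hex-color is hypothetical, and the pattern simply mirrors the hex-color token from the Color grammar.

```raku
# Hypothetical helper: a one-off check needs only an anchored regex, no grammar
sub is-hex-color(Str $s --> Bool) {
    so $s ~~ / ^ '#' <xdigit> ** 6 $ /;
}

say is-hex-color('#ff00aa');   # True
say is-hex-color('#ff00a');    # False
say is-hex-color('blue');      # False
```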
In the next grammar post, we will look at action classes, which let you transform the parse tree into useful data structures as you parse.