JSON Parser in 50 Lines
Parsing JSON is a great demonstration of Raku grammar power. The JSON spec is well-defined, the format is universally known, and the grammar plus actions fit in about 50 lines. Let us build it from scratch.

The JSON Grammar
JSON has six value types: objects, arrays, strings, numbers, booleans, and null. Here is the complete grammar:

    grammar JSON {
        rule TOP    { <value> }
        rule value  { <object> | <array> | <string> | <number> | <true> | <false> | <null> }
        rule object { '{' <pair>* % ',' '}' }
        rule pair   { <string> ':' <value> }
        rule array  { '[' <value>* % ',' ']' }

        token string   { '"' <( <str-char>* )> '"' }
        token str-char { <-["\\\x0 .. \x1f]> | '\\' <escape> }
        token escape   { <["\\/bfnrt]> | 'u' <xdigit> ** 4 }
        token number   { '-'? [ '0' | <[1..9]> <digit>* ] [ '.' <digit>+ ]? [ <[eE]> <[+-]>? <digit>+ ]? }
        token true     { 'true' }
        token false    { 'false' }
        token null     { 'null' }
    }

That is the entire grammar: 15 lines covering the full JSON specification. Let us walk through each part.
Breaking It Down
Values
A JSON value can be any of the six types. The rule keyword gives us free whitespace handling:

    rule value { <object> | <array> | <string> | <number> | <true> | <false> | <null> }
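To see what the rule keyword buys, here is a minimal sketch that declares the same pattern once as a token and once as a rule. The grammar name Demo and the names kv-token and kv-rule are made up for this illustration; they are not part of the JSON grammar.

```raku
# Hypothetical demo grammar: the same pattern as a token and as a rule
grammar Demo {
    token kv-token { \w+ ':' \d+ }   # whitespace in the pattern is ignored
    rule  kv-rule  { \w+ ':' \d+ }   # whitespace in the pattern matches optional whitespace
}

say so Demo.parse('age:30',   :rule<kv-token>);  # True
say so Demo.parse('age : 30', :rule<kv-token>);  # False
say so Demo.parse('age:30',   :rule<kv-rule>);   # True
say so Demo.parse('age : 30', :rule<kv-rule>);   # True
```

The token only accepts the tightly packed form, while the rule accepts both, which is why the structural parts of the JSON grammar are declared with rule.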
Objects
An object is curly braces containing zero or more key-value pairs separated by commas:

    rule object { '{' <pair>* % ',' '}' }
    rule pair   { <string> ':' <value> }

The % ',' modifier handles commas between pairs automatically.
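The separator modifier can be tried outside a grammar as well. The following is a small sketch using a plain regex; the variable name $list is only for illustration.

```raku
# One or more integers separated by commas
my $list = / ^ [ \d+ ]+ % ',' $ /;

say so '1,22,333' ~~ $list;                    # True
say so '1,22,'    ~~ $list;                    # False: % does not allow a trailing separator
say so '1,22,'    ~~ / ^ [ \d+ ]+ %% ',' $ /;  # True: %% allows one
```

JSON forbids trailing commas, so % rather than %% is the right choice for objects and arrays.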
Arrays
Similar to objects but with values instead of pairs:

    rule array { '[' <value>* % ',' ']' }
Strings
JSON strings are quoted, with escape sequences:

    token string   { '"' <( <str-char>* )> '"' }
    token str-char { <-["\\\x0 .. \x1f]> | '\\' <escape> }
    token escape   { <["\\/bfnrt]> | 'u' <xdigit> ** 4 }

The <( and )> capture markers exclude the quotes from the captured value.
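The effect of the capture markers is easiest to see on a plain regex. This sketch matches the same quoted word twice, without and with the markers:

```raku
# Without capture markers the quotes are part of the match
'"hello"' ~~ / '"' \w+ '"' /;
say ~$/;   # "hello"

# With <( and )> the quotes must still be present, but are left out of the match
'"hello"' ~~ / '"' <( \w+ )> '"' /;
say ~$/;   # hello
```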
Numbers
JSON numbers follow a specific format:

    token number { '-'? [ '0' | <[1..9]> <digit>* ] [ '.' <digit>+ ]? [ <[eE]> <[+-]>? <digit>+ ]? }
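To check which inputs this pattern accepts, it can be copied into a standalone regex and run against a few samples. The name json-number is an assumption made for this sketch; the pattern body is the same as in the number token.

```raku
# The number pattern as a standalone regex, for experimenting
my regex json-number {
    '-'? [ '0' | <[1..9]> <digit>* ]
    [ '.' <digit>+ ]?
    [ <[eE]> <[+-]>? <digit>+ ]?
}

# The first five samples print True, the last three print False
for '0', '-12', '3.14', '1e10', '2.5E-3', '01', '.5', '1.' -> $s {
    say "$s\t", so $s ~~ / ^ <json-number> $ /;
}
```

The rejected samples show the restrictions of the format: no leading zeros, and at least one digit on each side of the decimal point.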
The Action Class
Now let us transform the parse tree into Raku data structures:

    class JSON-Actions {
        method TOP($/) { make $<value>.made }

        method value($/) {
            make $<object>.made if $<object>;
            make $<array>.made  if $<array>;
            make $<string>.made if $<string>;
            make $<number>.made if $<number>;
            make True  if $<true>;
            make False if $<false>;
            make Any   if $<null>;
        }

        method object($/) { make $<pair>.map(*.made).Hash }
        method pair($/)   { make $<string>.made => $<value>.made }
        method array($/)  { make $<value>.map(*.made).Array }
        method string($/) { make self!unescape($/.Str) }
        method number($/) { make +$/.Str }

        # Replace single-character escapes; anything else (such as \uXXXX) is left as written
        method !unescape(Str $s --> Str) {
            $s.subst(:g, / '\\' (.) /, -> $/ {
                given ~$0 {
                    when '"'  { '"' }
                    when '\\' { '\\' }
                    when '/'  { '/' }
                    when 'b'  { "\b" }
                    when 'f'  { "\f" }
                    when 'n'  { "\n" }
                    when 'r'  { "\r" }
                    when 't'  { "\t" }
                    default   { "\\$_" }
                }
            })
        }
    }
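The make and made pair is the core of any action class: each method attaches a value to its match with make, and the parent method reads the children's values with made. Here is the same mechanism in a deliberately tiny sketch; the names Adder and Adder-Actions are hypothetical and unrelated to the JSON code.

```raku
# Hypothetical mini-grammar: integers joined by '+'
grammar Adder {
    token TOP { <num>+ % '+' }
    token num { \d+ }
}

class Adder-Actions {
    method num($/) { make +$/ }                     # leaf: attach the number as an Int
    method TOP($/) { make [+] $<num>.map(*.made) }  # parent: combine the children's values
}

say Adder.parse('1+20+300', actions => Adder-Actions.new).made;  # 321
```

The JSON action class follows the same bottom-up flow, only with hashes, arrays, and strings in place of a sum.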
Putting It Together
    sub from-json(Str $text) {
        my $result = JSON.parse($text, actions => JSON-Actions.new);
        die "Invalid JSON" unless $result;
        $result.made
    }

    # Test it
    my $json = q:to/END/;
        {
            "name": "Alice",
            "age": 30,
            "active": true,
            "address": {
                "city": "Toronto",
                "country": "Canada"
            },
            "hobbies": ["reading", "coding", "hiking"],
            "notes": null
        }
        END

    my $data = from-json($json);
    say $data<name>;           # Alice
    say $data<age>;            # 30
    say $data<active>;         # True
    say $data<address><city>;  # Toronto
    say $data<hobbies>[1];     # coding
    say $data<notes>.defined;  # False
Testing Edge Cases
    # Empty object and array
    say from-json('{}').raku;             # {}
    say from-json('[]').raku;             # []

    # Nested arrays
    say from-json('[[1,2],[3,4]]').raku;  # [[1, 2], [3, 4]]

    # Numbers
    say from-json('42');                  # 42
    say from-json('-3.14');               # -3.14
    say from-json('1e10');                # 10000000000

    # Escaped strings
    say from-json('"hello\\nworld"');     # hello (newline) world
    say from-json('"say \\"hi\\""');      # say "hi"

    # Boolean and null
    say from-json('true');                # True
    say from-json('false');               # False
    say from-json('null');                # (Any)
Adding a JSON Writer
For completeness, let us add serialization:

    sub to-json($value, :$indent = 0, :$level = 0 --> Str) {
        my $pad       = ' ' x ($indent * $level);
        my $inner-pad = ' ' x ($indent * ($level + 1));
        my $nl = $indent > 0 ?? "\n" !! '';
        my $sp = $indent > 0 ?? ' '  !! '';

        given $value {
            when Hash {
                return '{}' unless $value.elems;
                # Sort by key so the output is stable
                my @pairs = $value.sort(*.key).map: -> $p {
                    "{$inner-pad}\"{$p.key}\":{$sp}{to-json($p.value, :$indent, level => $level + 1)}"
                };
                "\{{$nl}{@pairs.join(",{$nl}")}{$nl}{$pad}\}"
            }
            when Array {
                return '[]' unless $value.elems;
                my @items = $value.map: {
                    "{$inner-pad}{to-json($_, :$indent, level => $level + 1)}"
                };
                "[{$nl}{@items.join(",{$nl}")}{$nl}{$pad}]"
            }
            when Str {
                # Escape backslashes first so the escapes added afterwards are not doubled
                my $escaped = .subst('\\', '\\\\', :g).subst('"', '\\"', :g).subst("\n", '\\n', :g);
                "\"$escaped\""
            }
            # Bool must be tested before Numeric, because a Bool is also Numeric
            when Bool      { $_ ?? 'true' !! 'false' }
            when Numeric   { ~$_ }
            when !.defined { 'null' }
            default        { "\"{$_}\"" }
        }
    }

    my %data = name => "Bob", scores => [95, 87, 92], active => True;
    say to-json(%data, indent => 2);

Output:

    {
      "active": true,
      "name": "Bob",
      "scores": [
        95,
        87,
        92
      ]
    }
Round-Trip Test
    my $original = q:to/END/.trim;
    {"users":[{"name":"Alice","age":30},{"name":"Bob","age":25}],"count":2}
    END

    my $parsed      = from-json($original);
    my $regenerated = to-json($parsed);
    my $re-parsed   = from-json($regenerated);

    say $parsed<users>[0]<name>;              # Alice
    say $re-parsed<users>[0]<name>;           # Alice
    say $parsed<count> == $re-parsed<count>;  # True
Why Build Your Own?
The Raku ecosystem already provides JSON::Fast, which is faster and handles edge cases that this version skips, such as decoding \uXXXX escapes. The point of this exercise is to show that a working JSON parser fits in about 50 lines of grammar plus actions. This demonstrates the power of Raku grammars for real-world parsing tasks.
When you need to parse a custom format, whether it is a configuration file, a log format, a protocol, or a domain-specific language, this same pattern applies: define the grammar, write action methods, and you have a clean, maintainable parser.
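As a rough illustration of that pattern on another format, here is a sketch for a made-up "option = setting" configuration file with one entry per line. The format itself and the names SimpleConfig, SimpleConfig-Actions, entry, option, and setting are all assumptions invented for this example.

```raku
# Hypothetical "option = setting" format, one entry per line
grammar SimpleConfig {
    token TOP     { [ <entry> \n* ]* }
    token entry   { <option> \h* '=' \h* <setting> }
    token option  { \w+ }
    token setting { \N+ }   # everything up to the end of the line
}

class SimpleConfig-Actions {
    method TOP($/)   { make $<entry>.map(*.made).Hash }
    method entry($/) { make ~$<option> => ~$<setting> }
}

my $text   = "host = localhost\nport = 8080\n";
my %config = SimpleConfig.parse($text, actions => SimpleConfig-Actions.new).made;

say %config<host>;   # localhost
say %config<port>;   # 8080
```

The division of labor is the same as in the JSON parser: the grammar describes the shape of the input, and the action class decides what Raku data comes out.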