raku.gg / grammars

CSV Parser with Grammars

2026-04-05

CSV looks simple until you encounter quoted fields, embedded commas, escaped quotes, and newlines inside values. A proper CSV parser needs to handle all of these. Raku grammars make this surprisingly approachable.

The CSV Spec (Simplified)

A CSV file consists of:

- records, separated by newlines
- fields within a record, separated by commas
- plain fields, which may not contain commas, quotes, or newlines
- quoted fields, wrapped in double quotes, which may contain commas and newlines, with literal quotes escaped by doubling ("")

A Minimal CSV Grammar

Let us start with the basics:
grammar CSV {
    token TOP            { <record>+ % \n \n? }
    token record         { <field>+ % ',' }
    token field          { <quoted-field> | <plain-field> }
    token plain-field    { <-[,\n"]>* }
    token quoted-field   { '"' <( <quoted-content> )> '"' }
    token quoted-content { [ <-["]> | '""' ]* }
}

my $csv = q:to/END/.trim;
name,age,city
Alice,30,Toronto
Bob,25,"New York"
END

say CSV.parse($csv);
The <( and )> markers in quoted-field capture only the content between quotes, excluding the quotes themselves.
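To see the markers in isolation, here is a minimal sketch; the grammar name and token are invented for illustration:

```raku
# <( and )> trim the match to the span between them:
# the leading 'v' must be present, but is excluded from the result.
grammar Version {
    token TOP { 'v' <( \d+ [ '.' \d+ ]* )> }
}

say ~Version.parse('v1.2.3');  # 1.2.3 — the 'v' is matched but not captured
```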

Adding Actions for Data Extraction

The grammar parses the structure, but we want usable data:
class CSV-Actions {
    method TOP($/) {
        make $<record>.map(*.made).list;
    }
    method record($/) {
        make $<field>.map(*.made).list;
    }
    method field($/) {
        make $<quoted-field> ?? $<quoted-field>.made !! $<plain-field>.made;
    }
    method plain-field($/) {
        make $/.Str;
    }
    method quoted-field($/) {
        make $<quoted-content>.made;
    }
    method quoted-content($/) {
        # Unescape doubled quotes
        make $/.Str.subst('""', '"', :g);
    }
}

my $data = q:to/END/.trim;
name,age,city
Alice,30,Toronto
Bob,25,"New York"
"Carol ""CJ"" Jones",35,"San Francisco, CA"
END

my $result = CSV.parse($data, actions => CSV-Actions.new);
for $result.made -> @row {
    say @row.join(' | ');
}
Output:
name | age | city
Alice | 30 | Toronto
Bob | 25 | New York
Carol "CJ" Jones | 35 | San Francisco, CA

Handling Edge Cases

Empty Fields

my $tricky = q:to/END/.trim;
a,,c
,b,
,,
END

say CSV.parse($tricky, actions => CSV-Actions.new).made;
# ((a  c) ( b ) (  ))
Empty fields are handled by plain-field matching zero characters.

Quoted Fields with Newlines

For CSV files where quoted fields span multiple lines, you might expect to need a new grammar. In fact, the one above already copes: <-["]> in quoted-content matches newlines, and the \n in TOP is only tried as a record separator after a complete record has matched, so a newline inside a quoted field is consumed by the field and never gets a chance to split a record. The grammar matches structure; there is no line-by-line pass to break. Still, it is worth tightening TOP with explicit anchors so the intent is clear (.parse anchors implicitly, but .subparse does not):

grammar CSV-Full {
    token TOP          { ^ <record>+ % \n $ }
    token record       { <field>+ % ',' }
    token field        { <quoted-field> | <plain-field> }
    token plain-field  { <-[,\n"]>* }
    token quoted-field { '"' <( <quoted-inner> )> '"' }
    token quoted-inner { [ <-["]> | '""' ]* }
}
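To verify that a newline inside quotes does not split a record, here is a quick standalone check (the CSV-Full grammar is repeated so the snippet runs on its own):

```raku
grammar CSV-Full {
    token TOP          { ^ <record>+ % \n $ }
    token record       { <field>+ % ',' }
    token field        { <quoted-field> | <plain-field> }
    token plain-field  { <-[,\n"]>* }
    token quoted-field { '"' <( <quoted-inner> )> '"' }
    token quoted-inner { [ <-["]> | '""' ]* }
}

my $multi = q:to/END/.trim;
name,notes
Alice,"line one
line two"
END

with CSV-Full.parse($multi) -> $m {
    say $m<record>.elems;  # 2 — the embedded newline stayed inside the quoted field
}
```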

A Complete CSV Toolkit

Let us wrap everything in a reusable module-style structure:
grammar CSV-Grammar {
    token TOP          { <record>+ % \n \n? }
    token record       { <field>+ % ',' }
    token field        { <quoted-field> | <plain-field> }
    token plain-field  { <-[,\n"]>* }
    token quoted-field { '"' <( <quoted-inner> )> '"' }
    token quoted-inner { [ <-["]> | '""' ]* }
}

class CSV-To-Arrays {
    method TOP($/)          { make $<record>.map(*.made).list }
    method record($/)       { make $<field>.map(*.made).list }
    method field($/)        { make $<quoted-field> ?? $<quoted-field>.made !! $<plain-field>.made }
    method plain-field($/)  { make $/.Str }
    method quoted-field($/) { make $<quoted-inner>.made }
    method quoted-inner($/) { make $/.Str.subst('""', '"', :g) }
}

class CSV-To-Hashes {
    has @.headers;

    method TOP($/) {
        my @records = $<record>.map(*.made).list;
        @!headers = @records.shift.list;
        make @records.map(-> @row {
            my %hash;
            for @!headers.kv -> $i, $h {
                %hash{$h} = @row[$i] // '';
            }
            %hash
        }).list;
    }
    method record($/)       { make $<field>.map(*.made).list }
    method field($/)        { make $<quoted-field> ?? $<quoted-field>.made !! $<plain-field>.made }
    method plain-field($/)  { make $/.Str }
    method quoted-field($/) { make $<quoted-inner>.made }
    method quoted-inner($/) { make $/.Str.subst('""', '"', :g) }
}

Using the Toolkit

my $csv-data = q:to/END/.trim;
name,age,city,bio
Alice,30,Toronto,"Software developer"
Bob,25,"New York","Loves ""coding"" and coffee"
Carol,35,"San Francisco, CA",Artist
END

# As arrays
my $arrays = CSV-Grammar.parse($csv-data, actions => CSV-To-Arrays.new);
for $arrays.made -> @row {
    say @row.raku;
}

# As hashes (first row = headers)
my $hashes = CSV-Grammar.parse($csv-data, actions => CSV-To-Hashes.new);
for $hashes.made -> %row {
    say "{%row<name>} from {%row<city>}: {%row<bio>}";
}
Output (from the hash loop):
Alice from Toronto: Software developer
Bob from New York: Loves "coding" and coffee
Carol from San Francisco, CA: Artist

Writing CSV

For completeness, here is a CSV writer:
sub to-csv(@rows --> Str) {
    @rows.map(-> @fields {
        @fields.map(-> $f {
            if $f ~~ / <[,"\n]> / {
                '"' ~ $f.subst('"', '""', :g) ~ '"'
            }
            else {
                ~$f
            }
        }).join(',')
    }).join("\n")
}

my @data = (
    [<name age city>],
    ["Alice", 30, "Toronto"],
    ["Bob", 25, "New York"],
    ['Carol "CJ"', 35, "San Francisco, CA"],
);

say to-csv(@data);
Output:
name,age,city
Alice,30,Toronto
Bob,25,New York
"Carol ""CJ""",35,"San Francisco, CA"

Performance Considerations

For small to medium CSV files (up to a few MB), this grammar-based approach works well. For very large files (hundreds of MB), you might want to use a line-by-line approach or the Text::CSV module, which is optimized for throughput. But for correctness and readability, grammars are hard to beat.
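The line-by-line idea can be sketched with a record-level grammar. This is illustrative (CSV-Line is an invented name) and assumes no quoted field spans multiple lines:

```raku
# Parse one record per line instead of the whole file at once.
grammar CSV-Line {
    token TOP          { <field>+ % ',' }
    token field        { <quoted-field> | <plain-field> }
    token plain-field  { <-[,\n"]>* }
    token quoted-field { '"' [ <-["]> | '""' ]* '"' }
}

# In real use this would be $path.IO.lines, streaming from disk;
# a literal string stands in here.
my $big = q:to/END/.trim;
a,b,c
d,"e,e",f
END

for $big.lines -> $line {
    # .Str gives the raw field text; quoted fields keep their quotes,
    # so an action class like the ones above is still needed for clean values.
    say CSV-Line.parse($line)<field>.map(*.Str).join(' | ');
}
```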

This CSV parser demonstrates the real-world value of Raku grammars: they handle complex parsing rules that would be painful with regular expressions alone, and the action classes give you clean data transformation as a bonus.