CSV Parser with Grammars
CSV looks simple until you encounter quoted fields, embedded commas, escaped quotes, and newlines inside values. A proper CSV parser needs to handle all of these. Raku grammars make this surprisingly approachable.

The CSV Spec (Simplified)
A CSV file consists of:

- Rows separated by newlines
- Fields separated by commas
- Fields optionally enclosed in double quotes
- Quoted fields can contain commas, newlines, and escaped quotes (`""`)
A Minimal CSV Grammar
Let us start with the basics:

```raku
grammar CSV {
    token TOP            { <record>+ % \n \n? }
    token record         { <field>+ % ',' }
    token field          { <quoted-field> | <plain-field> }
    token plain-field    { <-[,\n"]>* }
    token quoted-field   { '"' <( <quoted-content> )> '"' }
    token quoted-content { [ <-["]> | '""' ]* }
}

my $csv = q:to/END/.trim;
    name,age,city
    Alice,30,Toronto
    Bob,25,"New York"
    END

say CSV.parse($csv);
```

The `<(` and `)>` markers in `quoted-field` capture only the content between the quotes, excluding the quotes themselves.
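To see the capture markers in isolation, here is a standalone sketch (not part of the grammar above): everything outside `<(` and `)>` must still match, but is excluded from what the match object stringifies to.

```raku
# The quotes are required for the match to succeed,
# but only the word between them ends up in the result.
say '"hello"' ~~ / '"' <( \w+ )> '"' /;   # 「hello」
```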
Adding Actions for Data Extraction
The grammar parses the structure, but we want usable data:

```raku
class CSV-Actions {
    method TOP($/)    { make $<record>.map(*.made).list }
    method record($/) { make $<field>.map(*.made).list }
    method field($/)  {
        make $<quoted-field> ?? $<quoted-field>.made !! $<plain-field>.made;
    }
    method plain-field($/)  { make ~$/ }
    method quoted-field($/) { make $<quoted-content>.made }
    method quoted-content($/) {
        make $/.subst('""', '"', :g);  # Unescape doubled quotes
    }
}

my $data = q:to/END/.trim;
    name,age,city
    Alice,30,Toronto
    Bob,25,"New York"
    "Carol ""CJ"" Jones",35,"San Francisco, CA"
    END

my $result = CSV.parse($data, actions => CSV-Actions.new);
for $result.made -> @row {
    say @row.join(' | ');
}
```

Output:

```
name | age | city
Alice | 30 | Toronto
Bob | 25 | New York
Carol "CJ" Jones | 35 | San Francisco, CA
```
Handling Edge Cases
Empty Fields
Empty fields are handled by plain-field matching zero characters:

```raku
my $tricky = q:to/END/.trim;
    a,,c
    ,b,
    ,,
    END

say CSV.parse($tricky, actions => CSV-Actions.new).made;
# ((a  c) ( b ) ( ))
```
Quoted Fields with Newlines
For CSV files where quoted fields span multiple lines, you might expect to need a new grammar:

```raku
grammar CSV-Multiline {
    token TOP            { <record>+ % \n \n? }
    token record         { <field>+ % ',' }
    token field          { <quoted-field> | <plain-field> }
    token plain-field    { <-[,\n"]>* }
    token quoted-field   { '"' <( <quoted-content> )> '"' }
    token quoted-content { [ <-["]> | '""' ]* }
}
```

Actually, the grammar above already handles embedded newlines, because `<-["]>` in quoted-content matches newlines too. The tricky part is that our TOP rule splits records on `\n`, which could break multi-line fields. Let us anchor the parse explicitly:

```raku
grammar CSV-Full {
    token TOP          { ^ <record>+ % \n $ }
    token record       { <field>+ % ',' }
    token field        { <quoted-field> | <plain-field> }
    token plain-field  { <-[,\n"]>* }
    token quoted-field { '"' <( <quoted-inner> )> '"' }
    token quoted-inner { [ <-["]> | '""' ]* }
}
```
A Complete CSV Toolkit
Let us wrap everything in a reusable module-style structure:

```raku
grammar CSV-Grammar {
    token TOP          { <record>+ % \n \n? }
    token record       { <field>+ % ',' }
    token field        { <quoted-field> | <plain-field> }
    token plain-field  { <-[,\n"]>* }
    token quoted-field { '"' <( <quoted-inner> )> '"' }
    token quoted-inner { [ <-["]> | '""' ]* }
}

class CSV-To-Arrays {
    method TOP($/)    { make $<record>.map(*.made).list }
    method record($/) { make $<field>.map(*.made).list }
    method field($/)  {
        make $<quoted-field> ?? $<quoted-field>.made !! $<plain-field>.made;
    }
    method plain-field($/)  { make ~$/ }
    method quoted-field($/) { make $<quoted-inner>.made }
    method quoted-inner($/) { make $/.subst('""', '"', :g) }
}

class CSV-To-Hashes {
    has @.headers;

    method TOP($/) {
        my @records = $<record>.map(*.made).list;
        @!headers = @records.shift.list;
        make @records.map(-> @row {
            my %hash;
            for @!headers.kv -> $i, $h {
                %hash{$h} = @row[$i] // '';
            }
            %hash
        }).list;
    }
    method record($/) { make $<field>.map(*.made).list }
    method field($/)  {
        make $<quoted-field> ?? $<quoted-field>.made !! $<plain-field>.made;
    }
    method plain-field($/)  { make ~$/ }
    method quoted-field($/) { make $<quoted-inner>.made }
    method quoted-inner($/) { make $/.subst('""', '"', :g) }
}
```
Using the Toolkit
```raku
my $csv-data = q:to/END/.trim;
    name,age,city,bio
    Alice,30,Toronto,"Software developer"
    Bob,25,"New York","Loves ""coding"" and coffee"
    Carol,35,"San Francisco, CA",Artist
    END

# As arrays
my $arrays = CSV-Grammar.parse($csv-data, actions => CSV-To-Arrays.new);
for $arrays.made -> @row {
    say @row.raku;
}

# As hashes (first row = headers)
my $hashes = CSV-Grammar.parse($csv-data, actions => CSV-To-Hashes.new);
for $hashes.made -> %row {
    say "{%row<name>} from {%row<city>}: {%row<bio>}";
}
```

Output (from the hash version):

```
Alice from Toronto: Software developer
Bob from New York: Loves "coding" and coffee
Carol from San Francisco, CA: Artist
```
Writing CSV
For completeness, here is a CSV writer:

```raku
sub to-csv(@rows --> Str) {
    @rows.map(-> @fields {
        @fields.map(-> $f {
            if $f ~~ / <[,"\n]> / {
                # Quote the field and double any embedded quotes
                '"' ~ $f.subst('"', '""', :g) ~ '"'
            }
            else {
                ~$f
            }
        }).join(',')
    }).join("\n")
}

my @data = (
    [<name age city>],
    ["Alice", 30, "Toronto"],
    ["Bob", 25, "New York"],
    ['Carol "CJ"', 35, "San Francisco, CA"],
);

say to-csv(@data);
```

Output:

```
name,age,city
Alice,30,Toronto
Bob,25,New York
"Carol ""CJ""",35,"San Francisco, CA"
```
Performance Considerations
For small to medium CSV files (up to a few megabytes), this grammar-based approach works well. For very large files (hundreds of megabytes), you might want a line-by-line approach or the Text::CSV module, which is optimized for throughput. But for correctness and readability, grammars are hard to beat.
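As a rough sketch of the line-by-line alternative (an assumption, not part of the toolkit above), you can stream a file and parse each physical line against the record rule instead of slurping the whole thing. Note the caveat: this cannot handle quoted fields that span multiple lines, since those cross line boundaries.

```raku
# Assumes the CSV-Grammar and CSV-To-Arrays definitions from above.
# Lazily yields one parsed row at a time instead of holding the
# whole file in memory. Breaks on quoted fields containing newlines.
sub csv-lines(Str $path) {
    gather for $path.IO.lines -> $line {
        my $m = CSV-Grammar.parse($line, :rule<record>,
                                  :actions(CSV-To-Arrays.new));
        take $m.made if $m;
    }
}
```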
This CSV parser demonstrates the real-world value of Raku grammars: they handle complex parsing rules that would be painful with regular expressions alone, and the action classes give you clean data transformation as a bonus.