# Hyper and Race
When you have a list of items to process and each item is independent, Raku's `.hyper` and `.race` methods let you parallelize the work across multiple threads with minimal code changes. They are the easiest way to add parallelism to data-processing pipelines.
## .hyper: Ordered Parallel Processing
`.hyper` processes elements in parallel but preserves the original order of results:
```raku
my @results = (1..20).hyper.map(-> $n {
    sleep 0.1;  # Simulate work
    $n ** 2
});
say @results;  # (1 4 9 16 25 ... 400) -- always in order
```

Despite running on multiple threads, the output order matches the input order.
## .race: Unordered Parallel Processing
`.race` also processes in parallel but does NOT guarantee order. This can be faster because there is no synchronization overhead:
```raku
my @results = (1..20).race.map(-> $n {
    sleep rand * 0.2;  # Variable-time work
    $n ** 2
});
say @results;  # Order may vary! (25 4 1 16 9 ...)
```

Use `.race` when order does not matter and you want maximum throughput.
## Configuring Parallelism
Both methods accept `:batch` and `:degree` parameters:
```raku
# degree: number of worker threads (default: number of CPU cores)
# batch:  how many items each worker processes at once (default: 64)
my @results = (1..1000).hyper(degree => 4, batch => 50).map(-> $n {
    $n ** 2
});
```
- `degree` controls the number of parallel workers
- `batch` controls how many items are sent to each worker at a time
Tuning these can significantly affect performance:
```raku
# Small batches for uneven workloads
(1..100).hyper(batch => 1).map(-> $n {
    sleep $n * 0.01;  # Work time varies a lot
    $n
});

# Large batches for uniform workloads
(1..100000).hyper(batch => 1000).map(-> $n {
    $n ** 2  # Same work for every item
});
```
## Chaining Operations
`.hyper` and `.race` work with the full suite of list operations:
```raku
my @results = (1..1000)
    .hyper
    .grep(*.is-prime)
    .map(* ** 2)
    .grep(* < 10000);
say @results.elems;
say @results.head(10);
```

The entire pipeline runs in parallel, with each stage feeding into the next.
## Practical Example: Parallel File Processing
Count words in multiple files simultaneously:

```raku
my @files = dir(".", test => /\.txt$/);
my @counts = @files.hyper(batch => 1).map(-> $file {
    my $words = $file.slurp.words.elems;
    "{$file.basename}: $words words"
});
.say for @counts;
```

Setting `batch => 1` ensures each file is processed independently, which is ideal when files are different sizes.
## Practical Example: Parallel Data Transformation
```raku
my @raw-data = (1..10000).map({ %(id => $_, value => (rand * 1000).round) });

my @processed = @raw-data.hyper(degree => 8, batch => 100).map(-> %item {
    # Simulate a complex transformation
    my $normalized = %item<value> / 1000;
    my $category = do given %item<value> {
        when 0..333   { "low" }
        when 334..666 { "medium" }
        default       { "high" }
    };
    %( |%item, :$normalized, :$category )
});

say "Processed {+@processed} items";
say "High: {+@processed.grep(*<category> eq 'high')}";
say "Medium: {+@processed.grep(*<category> eq 'medium')}";
say "Low: {+@processed.grep(*<category> eq 'low')}";
```
## Practical Example: Parallel HTTP Checks
```raku
my @hosts = < example.com google.com github.com raku.org perl.org >;

my @results = @hosts.hyper(batch => 1, degree => 5).map(-> $host {
    my $start = now;
    my $proc = run 'curl', '-sS', '-o', '/dev/null', '-w', '%{http_code}',
                   '--max-time', '5', "https://$host", :out, :err;
    my $code = $proc.out.slurp(:close);
    my $time = (now - $start).round(0.01);
    "$host: HTTP $code ({$time}s)"
});
.say for @results;
```
## hyper vs race: When to Use Which
| Feature | .hyper | .race |
|---|---|---|
| Output order | Preserved | Not guaranteed |
| Use when | Order matters | Order does not matter |
| Performance | Slightly slower (sync overhead) | Slightly faster |
| Good for | Reports, sequential output | Aggregation, side effects |
```raku
# hyper: ordered results for a report
my @report = @files.hyper.map(-> $f { "{$f}: {$f.IO.s} bytes" });
.say for @report;  # Files appear in original order

# race: fastest aggregation
my $total = 0;
my $lock = Lock.new;
@files.race.map(-> $f {
    my $size = $f.IO.s;
    $lock.protect({ $total += $size });
});
say "Total: $total bytes";
```
## Error Handling
Exceptions in hyper/race workers propagate to the caller:

```raku
try {
    my @r = (1..10).hyper.map(-> $n {
        die "Error on $n" if $n == 5;
        $n * 2
    });
    say @r;
    CATCH { default { say "Caught: {.message}" } }
}
```

For fault-tolerant processing, handle errors inside the map:
```raku
my @results = (1..10).hyper.map(-> $n {
    # try returns Nil on failure, so // supplies the fallback value
    (try { die "Bad" if $n == 5; $n * 2 }) // "ERROR: $n"
});
say @results;  # (2 4 6 8 ERROR: 5 12 14 16 18 20)
```
## Performance Tips
- Batch size matters: Too small wastes time on overhead; too large reduces parallelism. Start with `batch => 1` for I/O-bound work and larger batches for CPU-bound work.
- Degree matches cores: The default degree is usually your CPU core count, which is a good starting point. For I/O-bound work, you can go higher.
- Avoid shared mutable state: If you must share data, use `Lock` or atomic operations:

  ```raku
  my atomicint $counter = 0;
  (1..10000).race.map({ $counter⚛++ });  # ⚛++ is an atomic increment
  say $counter;  # Exactly 10000 -- a plain $counter++ could lose updates
  ```

- Measure, do not guess: Always benchmark with your actual workload:

  ```raku
  my $start = now;
  my @seq = (1..1000).map(-> $n { $n ** 2; sleep 0.001; $n });
  say "Sequential: {(now - $start).round(0.01)}s";

  $start = now;
  my @par = (1..1000).hyper(batch => 10).map(-> $n { $n ** 2; sleep 0.001; $n });
  say "Parallel: {(now - $start).round(0.01)}s";
  ```
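The tip about raising `degree` above the core count for I/O-bound work can be sketched as follows. This is a minimal illustration, not a recommendation: the `sleep` stands in for network or disk waits, and the specific degree values and item count are arbitrary choices for the demo.

```raku
# Simulated I/O-bound workload: each item spends its time waiting, not computing.
# With 32 quarter-second waits, a higher degree lets more of those waits overlap,
# so the elapsed time should drop noticeably as degree increases.
my @items = 1..32;

for 4, 16 -> $degree {
    my $start = now;
    @items.hyper(:$degree, batch => 1).map(-> $n { sleep 0.25; $n }).eager;
    say "degree => $degree: {(now - $start).round(0.01)}s";
}
```

Note the `.eager` at the end: it forces the pipeline to run to completion so the timing measures the actual work.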
`.hyper` and `.race` are the simplest path to parallelism in Raku. For most data-parallel workloads, adding `.hyper` to your pipeline is all you need to take advantage of multiple CPU cores.