In March I did something slightly unhinged: I wrote the same small bioinformatics pipeline three times, once each in Nextflow, Snakemake, and WDL. The repos are public (simple-nf, simple-smk, simple-wdl). The context is that I was building a workflow execution service that needs to run all three, and I was tired of testing against large production pipelines where failures could mean anything. I needed small, known-good pipelines I fully understood.
The workflow itself is deliberately boring: FastQC on raw reads, trimming with Trimmomatic, alignment with BWA-MEM, sorting and indexing with samtools, and variant calling with bcftools. Five steps: four chained so that each consumes the output of the previous one, plus FastQC, which branches off the raw reads independently. Simple enough to fit in your head, complex enough to exercise each language's dependency resolution.
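To make the shape of that graph concrete, here is a minimal Python sketch of the dependencies described above. The step names are illustrative labels I made up for this sketch, not identifiers from the repos, and graphlib needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose output it consumes.
# Names are illustrative labels, not identifiers from the actual repos.
PIPELINE = {
    "fastqc": {"raw_reads"},
    "trim": {"raw_reads"},
    "align": {"trim"},
    "sort_index": {"align"},
    "call_variants": {"sort_index"},
}


def execution_order(graph):
    """Return one valid order in which the steps could run."""
    return list(TopologicalSorter(graph).static_order())


def ready_after(graph, done):
    """Steps whose inputs are all available, given the set of finished steps."""
    return sorted(
        step for step, deps in graph.items()
        if step not in done and deps <= done
    )


if __name__ == "__main__":
    print(execution_order(PIPELINE))
    print(ready_after(PIPELINE, {"raw_reads"}))
```

Once the raw reads exist, both fastqc and trim are ready at the same time, which is the one place where this otherwise linear pipeline can branch.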
Nextflow: channels and dataflow
Nextflow models data as channels. Each process declares input and output channels, and you wire them together. The TRIM process emits a channel of trimmed FASTQ pairs, the ALIGN process consumes that channel and emits a channel of BAM files, and so on.
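workflow {
    // One channel item per sample: (sample id, [R1, R2])
    reads_ch = Channel.fromFilePairs(params.reads)

    // QC branches off the raw reads; nothing downstream consumes it
    FASTQC(reads_ch)

    // Linear chain: trim -> align -> sort/index -> call variants
    trimmed = TRIM(reads_ch)
    aligned = ALIGN(trimmed, params.reference)
    sorted = SORT(aligned)
    CALL(sorted, params.reference)
}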
Parallelism falls out naturally. If you emit 10 sample pairs into reads_ch, the runtime forks 10 independent execution paths without you writing any parallel code. Each task runs in its own working directory (a hashed directory under work/), and Nextflow stages input files into it via symlinks. That staging model isolates processes by default, which is great for reproducibility, but it also means you can't share intermediate state between steps without explicitly channeling it.
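As a rough mental model of that staging step, here is a small Python sketch. It is not Nextflow's implementation: stage_inputs is a hypothetical helper, and it uses random temp directory names where Nextflow derives the directory from a task hash.

```python
import os
import tempfile
from pathlib import Path


def stage_inputs(inputs, work_root):
    """Create a fresh task directory and symlink each input file into it.

    Hypothetical helper for illustration only. The task sees its inputs by
    name in its own directory, and nothing else from other tasks.
    """
    task_dir = Path(tempfile.mkdtemp(dir=work_root))
    for src in inputs:
        src = Path(src).resolve()
        # Link rather than copy, so large FASTQ/BAM files are not duplicated.
        os.symlink(src, task_dir / src.name)
    return task_dir
```

Two tasks staged from the same input get two separate directories that both point at the same underlying file, so isolation costs almost no disk space.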
The thing that tripped me up: in DSL1, Nextflow queue channels are consumed on read. If you feed reads_ch into both FASTQC and TRIM, you need to fork it first with .into { fastqc_ch; trim_ch }. In DSL2 you just reference it twice and the fork is implicit, which is why the workflow block above works as written. In DSL1, referencing a channel twice does not give the second consumer its own copy of the data. That's a class of bug you won't find in Snakemake or WDL, because it's a consequence of the dataflow model rather than a logic error.
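The consume-once behavior is easier to see in a language where it is familiar. The sketch below is an analogy using plain Python iterators, not Nextflow code: make_channel is a made-up stand-in for a queue channel, and itertools.tee plays the role of an explicit fork.

```python
from itertools import tee


def make_channel(items):
    """A one-shot stream: each item can be taken exactly once."""
    return iter(items)


samples = ["sampleA", "sampleB"]

# Two consumers sharing one stream: the first drains it, the second gets nothing.
ch = make_channel(samples)
fastqc_seen = list(ch)
trim_seen = list(ch)

# Explicit fork: each consumer gets its own independent copy of the stream.
fastqc_ch, trim_ch = tee(make_channel(samples), 2)
fastqc_forked = list(fastqc_ch)
trim_forked = list(trim_ch)
```

Nothing in the first half raises an error, which is what makes this kind of mistake hard to spot: the second consumer simply has no work to do.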
Error handling is per-process via errorStrategy. You can set 'retry' with maxRetries, which is important because bioinformatics tools fail for transient reasons (network filesystem hiccups, memory pressure) more often than for code bugs. The retry re-runs the process in a fresh work directory.
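The retry behavior can be sketched in a few lines of Python. This is an illustration of the idea, not how Nextflow implements it: run_with_retry is a hypothetical helper, RuntimeError stands in for a transient tool failure, and I'm assuming max_retries counts re-runs after the first attempt.

```python
import tempfile
from pathlib import Path


def run_with_retry(task, work_root, max_retries=2):
    """Run task(work_dir) until it succeeds or retries are exhausted.

    Every attempt gets a brand-new directory, so a failed attempt cannot
    leave partial files behind for the next one to trip over.
    Returns (result, list of work directories that were used).
    """
    attempts = []
    for attempt in range(max_retries + 1):
        work_dir = Path(
            tempfile.mkdtemp(prefix=f"attempt{attempt}_", dir=work_root)
        )
        attempts.append(work_dir)
        try:
            return task(work_dir), attempts
        except RuntimeError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last failure
```

Keeping the failed directories around, rather than reusing one, also means the logs from each failed attempt stay available for debugging.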
Snakemake: rules and filename DAGs
Snakemake inverts the model. You declare rules with input/output file patterns, and the engine resolves the DAG backwards from the target output.
rule trim:
    input: "data/{sample}_R1.fastq.gz", "data/{sample}_R2.fastq.gz"
    output: "trimmed/{sample}_R1.fastq.gz", "trimmed/{sample}_R2.fastq.gz"
    shell: "trimmomatic PE {input} {output} ILLUMINACLIP:..."

rule align:
    input: "trimmed/{sample}_R1.fastq.gz", "trimmed/{sample}_R2.fastq.gz"
    output: "aligned/{sample}.bam"
    shell: "bwa mem {config[reference]} {input} | samtools view -bS - > {output}"
You request aligned/sampleA.bam, and Snakemake walks backwards: to build that BAM it needs the trimmed FASTQs, and to build those it needs the raw FASTQs. The entire DAG is inferred from filename patterns. This is elegant for well-structured data. It falls apart when your naming convention is messy, because the DAG is literally constructed from regex matches against filenames. If two rules can produce the same output pattern, Snakemake throws an ambiguity error, and resolving it means adding ruleorder directives or restructuring your output paths.
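Here is a toy Python version of that backward walk, under heavy simplifying assumptions. The RULES table mirrors the two rules above, and pattern_to_regex, find_rule, and plan are names invented for this sketch. Real Snakemake does far more (timestamps, wildcard constraints, ruleorder), so treat it as a picture of the idea only.

```python
import re

# Hypothetical rule table mirroring the trim and align rules above.
RULES = {
    "trim": {
        "inputs": ["data/{sample}_R1.fastq.gz", "data/{sample}_R2.fastq.gz"],
        "outputs": ["trimmed/{sample}_R1.fastq.gz", "trimmed/{sample}_R2.fastq.gz"],
    },
    "align": {
        "inputs": ["trimmed/{sample}_R1.fastq.gz", "trimmed/{sample}_R2.fastq.gz"],
        "outputs": ["aligned/{sample}.bam"],
    },
}


def pattern_to_regex(pattern):
    """Turn 'aligned/{sample}.bam' into a regex with a named group per wildcard."""
    parts = re.split(r"\{(\w+)\}", pattern)  # literal, name, literal, name, ...
    regex = ""
    for i, part in enumerate(parts):
        regex += re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+)"
    return re.compile(regex)


def find_rule(target):
    """Return (rule name, wildcard values) for the rule that can produce target."""
    matches = []
    for name, rule in RULES.items():
        for out in rule["outputs"]:
            m = pattern_to_regex(out).fullmatch(target)
            if m:
                matches.append((name, m.groupdict()))
                break
    if len(matches) > 1:
        names = [name for name, _ in matches]
        raise ValueError(f"ambiguous rules for {target}: {names}")
    return matches[0] if matches else None


def plan(target, existing):
    """Walk backwards from target; return the jobs to run, in execution order."""
    if target in existing:
        return []
    match = find_rule(target)
    if match is None:
        raise FileNotFoundError(f"no rule produces {target} and it does not exist")
    name, wildcards = match
    jobs = []
    for pattern in RULES[name]["inputs"]:
        for job in plan(pattern.format(**wildcards), existing):
            if job not in jobs:
                jobs.append(job)
    jobs.append((name, tuple(sorted(wildcards.items()))))
    return jobs
```

Asking for aligned/sampleA.bam when only the two raw FASTQs exist yields a trim job followed by an align job, both with sample bound to sampleA. The ValueError branch is the toy equivalent of the ambiguity error described above.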
Snakemake runs in the working directory by default (no sandboxed work dirs like Nextflow), so intermediate files from different rules can see each other on disk. This makes ad-hoc debugging easier (just ls the output directory) but means you have to be careful about rules accidentally reading stale files from a previous run. The --forceall flag re-runs everything, but in practice I added a clean rule that wipes output directories.
Parallelism is via --cores N. Snakemake determines which jobs are independent in the DAG and runs as many of them concurrently as fit within N cores. On a cluster, --cluster "sbatch ..." replaces local execution with job submission (Snakemake 8 and later moved this to executor plugins), which is one reason Snakemake is so common in HPC environments where Slurm is the scheduler.
WDL: tasks, types, and scatter-gather
WDL feels the most like writing functions in a normal programming language. You define a task with typed inputs, a command block (a bash template), typed outputs, and a runtime section.
task align {
    input {
        File r1
        File r2
        File reference
        File reference_idx
    }
    command <<<
        bwa mem ~{reference} ~{r1} ~{r2} | samtools view -bS - > aligned.bam
    >>>
    output {
        File bam = "aligned.bam"
    }
    runtime {
        # The image must also provide samtools for the pipe in the command block.
        docker: "biocontainers/bwa:0.7.17"
        memory: "8 GB"
        cpu: 4
    }
}
Tasks compose into a workflow block. Scatter-gather is explicit: scatter (sample in samples) { call align { input: ... } } produces an Array[File] of BAMs. There's no implicit parallelism from the shape of the data; you tell it what to parallelize.
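For readers who think in general-purpose languages, scatter-gather maps fairly directly onto a parallel map. The Python sketch below is an analogy, not WDL semantics: align here is a stand-in that only returns the path it would produce, and scatter_gather is a hypothetical helper.

```python
from concurrent.futures import ThreadPoolExecutor


def align(sample):
    """Stand-in for the align task: returns the BAM path it would produce."""
    return f"aligned/{sample}.bam"


def scatter_gather(task, items, max_workers=4):
    """Run task once per item concurrently; gather results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order, like the gathered Array[File].
        return list(pool.map(task, items))


if __name__ == "__main__":
    print(scatter_gather(align, ["sampleA", "sampleB", "sampleC"]))
```

The point of the analogy is that the parallel region is something you write down explicitly, in contrast to Nextflow, where it follows from how many items are in the channel.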
The type system is WDL's biggest differentiator. It has File, String, Int, Float, Boolean, Array[X], Map[K, V], Pair[A, B], Object, and optional types (File?). Validators such as Cromwell's womtool and miniwdl check can typecheck a workflow before execution, catching errors like passing an Array[File] where a File is expected. For a service that accepts untrusted user workflows (which is what I was building), this is a real advantage. I can reject a malformed WDL at submission time with a useful error message, instead of discovering the problem ten minutes into execution.
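To show what rejecting at submission time looks like in principle, here is a deliberately naive Python sketch. It compares type names as plain strings and ignores the coercions real WDL allows, so it is not a substitute for an actual validator. ALIGN_INPUTS mirrors the align task above, and check_call is a hypothetical helper.

```python
# Declared input types of the align task above, as plain strings.
ALIGN_INPUTS = {
    "r1": "File",
    "r2": "File",
    "reference": "File",
    "reference_idx": "File",
}


def check_call(task_inputs, supplied):
    """Return a list of error messages; an empty list means the call is accepted.

    task_inputs: input name -> declared type (a trailing '?' marks it optional)
    supplied:    input name -> type of the value the caller passes
    """
    errors = []
    for name, declared in task_inputs.items():
        optional = declared.endswith("?")
        base = declared.rstrip("?")
        if name not in supplied:
            if not optional:
                errors.append(f"missing required input '{name}' ({declared})")
            continue
        actual = supplied[name]
        if actual != base:
            errors.append(f"input '{name}': expected {declared}, got {actual}")
    for name in supplied:
        if name not in task_inputs:
            errors.append(f"unknown input '{name}'")
    return errors
```

Passing an Array[File] for r1 produces the message "input 'r1': expected File, got Array[File]" before anything runs, which is the kind of error a submission endpoint can hand straight back to the user.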
WDL's weakness is ecosystem. Nextflow has nf-core (hundreds of curated pipelines). Snakemake has the Snakemake Workflow Catalog. WDL has BioWDL and the Broad Institute's workflows, but the selection is thinner. If you're starting fresh and your tool already exists as an nf-core module, Nextflow is the obvious choice.
What I took away
None of these is "best." Nextflow has the largest community and the richest ecosystem. Snakemake has the gentlest learning curve and works brilliantly for single-machine analysis. WDL has the cleanest spec and the best story for pre-execution validation.
The real takeaway for the execution service I was building: it shouldn't prefer any one of them over the others. Each language carries an execution model, and the researchers picking them have made that choice for domain reasons. My job is to run whatever they send. Writing the same pipeline three times gave me enough familiarity with each model to write correct adapters.