ASR.lab — Béranger Thomas

Context

ASR.lab is a systematic evaluation platform for automatic speech recognition (ASR) engines. It compares models of different architectures and from different organizations (OpenAI, Meta, NVIDIA, Alibaba) under controlled, reproducible acoustic conditions, and is driven entirely by a declarative YAML configuration.

Architecture

The benchmark pipeline chains the configurable stages below and runs every combination of their settings (a Cartesian product):

Audio degradation: applying realistic acoustic conditions (reverb, noise, compression) via VST3 plugins, with named preset management
Audio enhancement (optional): denoising via Demucs or DeepFilterNet, applied to the degraded audio
Loudness normalization: grid search across different LUFS levels conforming to the EBU R128 standard
ASR transcription: the processed audio is transcribed by every enabled engine (Whisper, Wav2Vec2, NeMo, Vosk, SeamlessM4T, Moonshine, SenseVoice)
Metrics computation: WER, CER, MER, WIL, WIP — computed twice for each transcription (raw and normalized text), forming an additional grid search dimension
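
To illustrate why scoring both raw and normalized text matters, here is a minimal sketch of WER computed on the two forms. The normalization rules (lowercasing, stripping ASCII punctuation, collapsing whitespace) and the function names are assumptions made for this example, not ASR.lab's actual implementation, and only WER is shown.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, collapse whitespace (assumed rules)."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


reference = "Hello, world! How are you?"
hypothesis = "hello world how are you"

raw_wer = wer(reference, hypothesis)
normalized_wer = wer(normalize(reference), normalize(hypothesis))
print(raw_wer, normalized_wer)  # 0.8 0.0
```

In this example the engine got every word right, yet the raw score reports four errors out of five words because of casing and punctuation alone. Keeping both scores separates formatting differences from genuine recognition errors.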

Each combination of degradation × enhancement × loudness level × engine generates a distinct entry in the results, scored on both raw and normalized text, enabling exhaustive analysis of the factors affecting recognition performance.
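
The expansion into combinations can be sketched with itertools.product. The grid below is hypothetical: the keys, preset names, and LUFS values are illustrative stand-ins for what a YAML configuration might declare, not ASR.lab's real schema.

```python
from itertools import product

# Hypothetical grid; keys and values are illustrative, not ASR.lab's schema.
grid = {
    "degradation": ["clean", "reverb_hall", "street_noise"],
    "enhancement": [None, "demucs", "deepfilternet"],
    "lufs": [-23.0, -16.0],
    "engine": ["whisper", "vosk"],
    "text_form": ["raw", "normalized"],
}


def expand(grid: dict[str, list]) -> list[dict]:
    """Return one dict per combination of the grid's values."""
    keys = list(grid)
    return [dict(zip(keys, values))
            for values in product(*(grid[k] for k in keys))]


runs = expand(grid)
print(len(runs))  # 3 * 3 * 2 * 2 * 2 = 72
print(runs[0])
```

The run count is the product of the list lengths, so each added option multiplies the total. That is worth keeping in mind when enabling many engines or presets at once.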

Features

Multi-engine: 7 supported and tested frameworks — Whisper, Wav2Vec2, NeMo, Vosk, SeamlessM4T, Moonshine, SenseVoice
Automatic grid search: Cartesian product of all test parameters, configured in YAML
Interactive reports: self-contained HTML with multi-criteria filters, scatter plots, heatmaps, word-level diffs, and a CSV-exportable table, with no client-side Pandas/Plotly dependency
Live demo on Hugging Face Spaces
Extensible: plugin architecture for adding new engines or metrics
Open source under MIT license, Python 3.12+, dependency management via uv
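
As an illustration of the extensibility point, one common way to structure such a plugin mechanism is a name-to-function registry populated by a decorator. This is only a sketch of the general pattern: METRICS, register_metric, and length_ratio are hypothetical names, and ASR.lab's actual plugin interface may look different.

```python
from typing import Callable

# Hypothetical registry: metric name -> function(reference, hypothesis) -> float.
METRICS: dict[str, Callable[[str, str], float]] = {}


def register_metric(name: str):
    """Decorator that adds a metric function to the registry under `name`."""
    def decorator(func: Callable[[str, str], float]):
        if name in METRICS:
            raise ValueError(f"metric {name!r} is already registered")
        METRICS[name] = func
        return func
    return decorator


@register_metric("length_ratio")
def length_ratio(reference: str, hypothesis: str) -> float:
    """Toy metric: hypothesis word count divided by reference word count."""
    return len(hypothesis.split()) / len(reference.split())


# The pipeline can then score a transcription with every registered metric.
scores = {name: fn("the cat sat", "the cat") for name, fn in METRICS.items()}
print(scores)
```

With this pattern, adding a metric means writing one decorated function, and the code that iterates over the registry does not change.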

Impact

ASR.lab gives a concrete answer to the question “which ASR engine, under what conditions, for which language?”, a key decision for any large-scale transcription project. By making evaluation reproducible and exhaustive, it offers a factual basis for comparing open source solutions and guiding technical choices.