Béranger Thomas
Python, ASR, Benchmark, Whisper, Wav2Vec2, Speech Recognition, Open Source

ASR.lab

Benchmarking platform for automatic speech recognition systems: controlled audio degradation, enhancement, normalization and multi-engine comparison with interactive reports.

Context

ASR.lab is a systematic evaluation platform for automatic speech recognition (ASR) engines. It compares models from different architectures and organizations (OpenAI, Meta, NVIDIA, Alibaba) under controlled, reproducible acoustic conditions, and the whole benchmark is described in a declarative YAML configuration.

Architecture

The benchmark pipeline chains five configurable stages, and every combination of their settings is run (a Cartesian product):

  1. Audio degradation: applying realistic acoustic conditions (reverb, noise, compression) via VST3 plugins, with named preset management
  2. Audio enhancement (optional): denoising via Demucs or DeepFilterNet, applied to the degraded audio
  3. Loudness normalization: grid search over several target loudness levels in LUFS, following the EBU R128 standard
  4. ASR transcription: each processed audio file is transcribed by every enabled engine (Whisper, Wav2Vec2, NeMo, Vosk, SeamlessM4T, Moonshine, SenseVoice)
  5. Metrics computation: WER, CER, MER, WIL and WIP, each computed twice per transcription (on raw text and on normalized text), which adds one more grid search dimension
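To illustrate why step 5 scores each transcription twice, here is a minimal sketch of WER and CER computed on raw and on normalized text. It is not ASR.lab's implementation: `normalize_text`, `edit_distance`, `wer` and `cer` are hypothetical helpers, and the normalization rule (lowercase, strip ASCII punctuation, collapse whitespace) is an assumption.

```python
import string
from collections.abc import Sequence


def normalize_text(text: str) -> str:
    """Assumed normalizer: lowercase, drop ASCII punctuation, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


reference = "Hello, world! How are you?"
hypothesis = "hello world how are you"

raw_wer = wer(reference, hypothesis)
norm_wer = wer(normalize_text(reference), normalize_text(hypothesis))
print(raw_wer, norm_wer)  # 0.8 0.0
```

In this example the engine got every word right, yet the raw WER is 0.8 because casing and punctuation differ. Reporting both variants separates formatting differences from genuine recognition errors.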

Each combination of degradation × enhancement × normalization × engine generates a distinct entry in the results, enabling exhaustive analysis of the factors affecting recognition performance.
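The expansion into one results entry per combination can be sketched with `itertools.product`. The stage names and option values below are illustrative assumptions, not ASR.lab's actual presets or configuration keys, and `expand_grid` is a hypothetical helper.

```python
from itertools import product

# Hypothetical options per stage; None means "enhancement disabled".
grid = {
    "degradation": ["clean", "reverb_hall", "street_noise"],
    "enhancement": [None, "demucs", "deepfilternet"],
    "lufs": [-23.0, -16.0],
    "engine": ["whisper", "wav2vec2"],
    "text": ["raw", "normalized"],
}


def expand_grid(grid: dict) -> list[dict]:
    """Return one dict per combination of stage options (Cartesian product)."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*grid.values())]


runs = expand_grid(grid)
print(len(runs))  # 3 * 3 * 2 * 2 * 2 = 72
print(runs[0])
```

The number of runs is the product of the option counts, so each added preset or engine multiplies the benchmark's cost. That is worth keeping in mind when sizing a configuration.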

Features

Impact

ASR.lab gives a concrete answer to the question “which ASR engine, under what conditions, for which language?”, a key decision for any large-scale transcription project. Because the evaluation is reproducible and exhaustive, it offers a factual basis for comparing open source solutions and guiding technical choices.
