Béranger Thomas

Data Scientist & AI Engineer

With over 20 years of experience spanning software engineering and data science, I design and deploy AI-powered solutions and data analysis pipelines. An expert in NLP, LLMs and data pipelines, I combine scientific rigour with a pedagogical approach to turn business challenges into robust, production-ready applications.

Projects

SmartWatch

Scraping pipeline for the opening hours of public facilities in Lyon. Leverages embeddings and an LLM to produce structured, unambiguous output, which is then compared against data.grandlyon.com.

Python, LLM, Web Scraping, Embeddings, NLP

StellaScript

Local Python audio transcription pipeline with speaker diarization, usable in real time (from a microphone) or on audio files. Works offline once the models have been downloaded.

Python, Speech Processing, WhisperX, Diarization, Pyannote, SpeechBrain, Open Source

PRISM

Composable Python library for string similarity matching. Supports edit-distance, sequence-based, token-based, phonetic and semantic similarity through a unified API.

Python, NLP, Embeddings, String Matching, Library

ASR.lab

Benchmarking platform for automatic speech recognition systems: controlled audio degradation, enhancement, normalization and multi-engine comparison with interactive reports.

Python, ASR, Benchmark, Whisper, Wav2Vec2, Speech Recognition, Open Source

ForzaEmbed

Python benchmarking framework for text embedding models: grid search over chunking strategies and similarity metrics, with textual heatmaps and embedding-space visualizations.

Python, NLP, Embeddings, Benchmark, RAG, Chunking, Open Source