Data Scientist & AI Engineer

From business challenges to AI in production.

NLP · LLM · RAG · Machine learning · Audio & speech processing

I design and deploy AI applications, from scoping to production. Twenty-two years of engineering and solid hands-on data practice. I bring technical rigour, domain grounding, and a pedagogical sensibility.

Available immediately
Lyon · France
Email me
LinkedIn
Interactive Resume

Projects

A few projects that summarise my practice:

NLP & RAG

SmartWatch

Stable

June 2025

An automated data pipeline for the Metropolis of Lyon that retrieves, extracts, and compares public facility opening hours using semantic filtering and large language models.

Python LLM Playwright Embeddings NLP CodeCarbon Polars SQLAlchemy Jinja2

View

NLP & RAG

ForzaEmbed

Stable

July 2025

Python benchmarking framework for text embedding models: grid search over chunking strategies and similarity metrics, with textual heatmap and embedding space visualizations.

Python NLP Embeddings Benchmark RAG Chunking

View

Audio & speech processing

ASR.lab

Stable

Oct. 2025

Benchmarking platform for automatic speech recognition systems: controlled audio degradation, enhancement, normalization and multi-engine comparison with interactive reports.

ASR Benchmark Whisper NeMo Speech recognition

View

DevOps & monitoring

Veona

Beta

May 2026

System monitoring platform: lightweight Go agent, TypeScript/Hono server, VictoriaMetrics time-series storage, and an integrated ML engine (anomaly detection, forecasting, health scoring).

Go TypeScript Monitoring VictoriaMetrics Machine Learning

View

Audio & speech processing

StellaScript

Stable

Sept. 2025

Local Python audio transcription pipeline with speaker diarization, usable in real time (from a microphone) or on audio files. Works offline after the initial model download.

Python Speech Processing WhisperX Diarization Pyannote

View

Data visualization

Selma

Active

Apr. 2026

React/TypeScript application for visualizing and navigating hierarchical taxonomies and DAGs in a browser.

DAG TypeScript React Vite React Flow Tailwind i18n

View

AI training & mentoring

School of Statistics

Beta

June 2026

Interactive visualizations for exploring statistical and machine learning concepts.

TypeScript Statistics Dataviz Chart.js D3.js

View

Audio & speech processing

FLAC Toolkit

Stable

Oct. 2025

A command-line utility for low-level validation, automated repair, audio duplicate detection, and ReplayGain normalization of FLAC files.

Python Audio Processing CLI RFC 9639 ReplayGain

View

AI training & mentoring

Immersion IA

Stable

June 2025

Co-design and facilitation of AI training workshops, and production of pedagogical resources (prompting booklet, facilitator guide) for employees of the Metropolis of Lyon.

LLM Prompt Engineering Training Pedagogy

View

Contact

Here are my professional networks, contact details, and interactive resume.

LinkedIn in/berangerthomas

GitHub @berangerthomas

Hugging Face @berangerthomas

Interactive Resume Online Curriculum Vitæ

E-mail beranger.thomas@proton.me