Context
Choosing an embedding model and a text chunking strategy for a RAG pipeline is rarely straightforward: performance varies with language, chunk size, overlap, and the similarity metric in use. ForzaEmbed automates this evaluation by running an exhaustive grid search across all these parameters, and produces interactive reports to visually analyze the quality of the resulting embeddings.
Architecture
The framework is built around three stages:
- Configuration expansion: from a YAML file, ForzaEmbed generates the Cartesian product of all parameters, namely embedding model, chunking strategy (langchain, raw, semchunk, nltk, spacy), chunk size, overlap, and similarity metric (cosine, euclidean, dot_product, etc.). Sentence-based chunkers (nltk, spacy) ignore size parameters, which eliminates up to 40% of redundant combinations.
- Execution and caching: for each combination, the text is chunked, embeddings are computed, and chunks are scored against user-defined keyword themes. Every result is cached in a SQLite database with intelligent quantization (float16 for embeddings, uint16 for similarities). Already-processed combinations are automatically skipped on resume.
- Interactive report: a standalone HTML file is generated with a textual heatmap (spans color-coded by thematic similarity), t-SNE/UMAP/PCA projections of chunk embeddings with original text tooltips, and a draggable floating similarity threshold slider to dim irrelevant passages. Evaluation metrics (silhouette score with intra/inter-cluster decomposition, computation time) are displayed per configuration.
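To make the caching stage more concrete, here is a minimal sketch of how quantized results could be stored in SQLite and skipped on resume. It is not ForzaEmbed's actual code: the `ResultCache` class, the table layout, the `config_key` format, and the linear mapping of similarities from [-1, 1] onto the uint16 range are all assumptions made for illustration, and a single flat vector stands in for the per-chunk embeddings.

```python
import sqlite3
import struct


def pack_float16(values):
    """Pack floats as little-endian IEEE 754 half precision (2 bytes each)."""
    return struct.pack(f"<{len(values)}e", *values)


def unpack_float16(blob):
    return list(struct.unpack(f"<{len(blob) // 2}e", blob))


def pack_uint16(similarities, low=-1.0, high=1.0):
    """Map similarities linearly from [low, high] onto 0..65535 (assumed scheme)."""
    scale = 65535 / (high - low)
    codes = [round((min(max(s, low), high) - low) * scale) for s in similarities]
    return struct.pack(f"<{len(codes)}H", *codes)


def unpack_uint16(blob, low=-1.0, high=1.0):
    codes = struct.unpack(f"<{len(blob) // 2}H", blob)
    return [low + c * (high - low) / 65535 for c in codes]


class ResultCache:
    """Hypothetical cache keyed by a string describing one grid combination."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS results ("
            "config_key TEXT PRIMARY KEY, embedding BLOB, similarities BLOB)"
        )

    def has(self, config_key):
        row = self.conn.execute(
            "SELECT 1 FROM results WHERE config_key = ?", (config_key,)
        ).fetchone()
        return row is not None

    def put(self, config_key, embedding, similarities):
        self.conn.execute(
            "INSERT OR REPLACE INTO results VALUES (?, ?, ?)",
            (config_key, pack_float16(embedding), pack_uint16(similarities)),
        )
        self.conn.commit()

    def get(self, config_key):
        row = self.conn.execute(
            "SELECT embedding, similarities FROM results WHERE config_key = ?",
            (config_key,),
        ).fetchone()
        if row is None:
            return None
        return unpack_float16(row[0]), unpack_uint16(row[1])


cache = ResultCache()
key = "model-a|langchain|256|32|cosine"  # illustrative key format
if not cache.has(key):  # on resume, an existing key would be skipped
    cache.put(key, [0.5, -0.25, 1.0], [1.0, 0.0, -1.0])
```

Under these assumptions each stored value takes 2 bytes instead of the 4 or 8 of a regular float, at the cost of a small rounding error (at most about 1.5e-5 for similarities mapped from [-1, 1]).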
Features
- Multi-backend: sentence-transformers, HuggingFace, FastEmbed, and APIs (OpenAI, Mistral, Voyage)
- Smart grid search: automatic deduplication of redundant combinations for sentence-based chunkers
- Self-contained report: standalone HTML with no client-side Pandas/Plotly dependency, reloadable without rerunning computations
- Live demo on Hugging Face Spaces
- Full documentation on GitHub Pages
- Open source under MIT license, Python 3.13+, dependency management via uv
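The smart grid search feature listed above can be illustrated with a short sketch. This is an assumption about how such deduplication might work, not the project's implementation: the `expand_grid` function and its parameter names are hypothetical, and the only behavior taken from the description is that nltk and spacy ignore the size parameters.

```python
from itertools import product

# Per the description, sentence-based chunkers ignore chunk size and overlap.
SENTENCE_CHUNKERS = {"nltk", "spacy"}


def expand_grid(models, chunkers, chunk_sizes, overlaps, metrics):
    """Return the Cartesian product of parameters, without redundant combinations."""
    seen = set()
    combos = []
    for model, chunker, size, overlap, metric in product(
        models, chunkers, chunk_sizes, overlaps, metrics
    ):
        if chunker in SENTENCE_CHUNKERS:
            # Size parameters have no effect here, so collapse them to one value.
            size, overlap = None, None
        combo = (model, chunker, size, overlap, metric)
        if combo not in seen:
            seen.add(combo)
            combos.append(combo)
    return combos


grid = expand_grid(
    models=["model-a", "model-b"],  # placeholder model names
    chunkers=["langchain", "nltk"],
    chunk_sizes=[256, 512],
    overlaps=[0, 50],
    metrics=["cosine", "dot_product"],
)
print(len(grid))  # 20 instead of the 32 combinations of the full product
```

In this toy grid, the langchain chunker keeps all 16 of its combinations, while nltk collapses from 16 to 4 because chunk size and overlap no longer multiply the count.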
Impact
ForzaEmbed addresses a concrete question in RAG pipeline design: which model/chunking combination maximizes thematic coherence of embeddings on my documents? By making this evaluation systematic and visual, it turns what often remains an intuitive call into an evidence-based decision.