Skip to content

Data Science · One build, frame by frame

FreeRAG

Open RAG, no API tax.

A retrieval-augmented generation system built to run without paid inference APIs — local embeddings, vector retrieval, and grounded responses over your own documents.

Data SciencePythonLangChainHugging FaceFastAPI

01The problem

RAG demos almost always assume a paid LLM/embedding API, which makes them expensive to run and impossible to self-host privately.

02What I built

Wired local Hugging Face embeddings into a vector store with a retrieval + grounding pipeline behind a FastAPI service, keeping every component swappable and offline-capable.

03What changed

A fully self-hostable RAG stack that answers over private documents with citations and zero per-token cost.

See it for yourself

← All work