Open Source

Built in the open

Akshara is fully open source. The website, the OCR extraction pipeline, the scripts that turn crumbling PDFs into readable text — all of it lives on GitHub.

akshara

The website you're reading right now. A Hugo-powered digital library for India's forgotten literary heritage — hand-curated book pages, a clean reading interface, full-text search, and everything needed to make century-old texts feel alive on screen.

Hugo · Cloudflare Workers · Pagefind

akshara-extract

A multi-pass archival digitization pipeline. Takes poorly scanned PDFs from Archive.org and turns them into clean, structured Markdown, using Gemini Flash for OCR extraction, Claude Haiku for assembly planning, and deterministic verification to ensure nothing is lost or hallucinated. The engine behind every book on this site.

Python · Gemini Flash · Claude Haiku · PyMuPDF
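
The deterministic verification step is not spelled out on this page. As a rough sketch of what such a check could look like, assuming the OCR pass yields one text string per page and the assembly pass yields a single Markdown string, the snippet below compares word counts in both directions. The names `token_counts` and `verify_assembly`, and the whitespace-based tokenization, are illustrative assumptions and not the actual akshara-extract code.

```python
import unicodedata
from collections import Counter

# Characters treated as Markdown markup rather than book text (an assumption).
MARKUP_CHARS = "#*_>`-"


def token_counts(text: str) -> Counter:
    """Count whitespace-separated words, ignoring case and Markdown markers."""
    normalized = unicodedata.normalize("NFC", text).lower()
    words = (word.strip(MARKUP_CHARS) for word in normalized.split())
    return Counter(word for word in words if word)


def verify_assembly(page_texts: list[str], assembled_markdown: str) -> dict:
    """Compare OCR page text against the assembled output, word for word.

    'lost' holds words present in the pages but missing from the output;
    'invented' holds words in the output that no page contains.
    """
    source = Counter()
    for page in page_texts:
        source.update(token_counts(page))
    output = token_counts(assembled_markdown)

    lost = source - output        # Counter subtraction keeps positive counts only
    invented = output - source
    return {
        "ok": not lost and not invented,
        "lost": dict(lost),
        "invented": dict(invented),
    }


if __name__ == "__main__":
    pages = ["The old   book", "was found."]
    report = verify_assembly(pages, "# The old book\n\nwas found.")
    print(report["ok"])
```

Because the comparison uses no model calls and no randomness, the same inputs always produce the same report, which is what makes a check like this usable as a gate after the LLM-driven passes.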

Found a bug? Want to contribute a book? Have ideas for improvements?
Pull requests and issues are always welcome.

View all repositories