A couple of months ago, I started working on a passion project called Dhwani. The trigger: discovering just how awesome the public domain is. I came to this realization a little late in life, but since then, I’ve been awestruck at the boundless shared heritage that’s free, open, and accessible to everyone.
The kinds of things in the public domain are hard to wrap your brain around. Some of the greatest literary works, music, cinema, artwork, and letters, among other things, are free, open, and available for anybody to remix and give new life to. Some of the greatest voices, some of the greatest words ever written, some of the greatest artistic moments: all freely accessible to everyone. A common shared inheritance, if you will. It’s an absurd idea. It feels untrue, but it is true.
Discovering the public domain is one of the best things that has ever happened to me. Somewhere along this journey, I came across Project Gutenberg, which is probably one of the most awesome sites on the internet.
And I started thinking: why isn’t there a Project Gutenberg for India?
What annoyed me even more is that we boast endlessly about India being the oldest civilization on earth, about how we’ve given the world science, mathematics, philosophy, medicine, spirituality, not to mention hot yoga. But when it comes to actually protecting our own cultural heritage, especially our written cultural history, we’ve done a horrible job.
There’s a wide chasm between our boasts about the richness of our civilization and the respect we actually have for it.
Why isn’t there a single repository of all the written works in the Indian public domain? Instead, these works are scattered across hundreds of places: Archive.org, Gutenberg, and God knows how many other hard-to-find sites. This bothered me.
I wanted to do something. But my desire to do something was checkmated by my lack of technical ability. Building something like Project Gutenberg is no small undertaking: it’s ridiculously complex, costly, and time-consuming. And I’m not a coder. So it remained an idea in my head.
Then AI coding tools—Claude, Cursor, Codex—popped onto the scene. Suddenly, there was a small but non-trivial probability that I could actually do something.
I was still learning how to use these tools. Since I’m not a technical person, I didn’t have a good sense of their strengths and weaknesses. But as I kept messing about with them, it became clear that even people without any technical background could build useful things.
So as a first step, I built Dhwani, a simple directory that aggregates links to Indian literary and historical works on Archive.org, Gutenberg, and elsewhere. Instead of extracting text from old PDFs and making it readable the way Project Gutenberg does, I just collected the links in one place, hoping some people might find it useful.
How useful is it? I don’t know. But even if only a few people find it useful, I think it’s time well spent.
After I launched Dhwani, I kept messing around with these AI tools. Through a series of happy accidents, I realized that the current generation of LLMs is actually pretty good at extracting clean, well-formatted text from old PDFs. In other words, these models are good at OCR.
So over the last month or so, I started experimenting with OCR models and large language models. It quickly became clear that I could use these tools to make old public-domain books readable. I could take the horridly scanned PDFs on Archive.org and elsewhere, extract clean text, format it, and publish it online, just like Project Gutenberg.
Despite some initial frustrations, I’ve figured out a way to do this. Using Google’s Gemini models (Flash and Pro) and Anthropic’s Claude Haiku, along with tools like PaddleOCR, I can extract clean, well-formatted text. It requires some hacking around and manual intervention, but it works.
And that’s the backstory of my latest passion project: Akshara.
On Akshara, I’m making old Indian literary works readable. You can read them online, or download EPUBs, PDFs, or Markdown files for each book. There are about 15 books right now. I’m slowly working on more.
Since I’m doing this alone, I don’t have a lot of resources. The latest frontier models are pricey, so I can’t afford to use them heavily; instead, I’m looking for the most cost-effective way to do this. And despite the “AI” label, a lot of manual effort is still needed to check for errors, fix formatting, clean up tables and footnotes, and so on. Even a rough check of a 300–400-page book takes 2–3 hours. For now, I can do a few books a week. Hopefully, as these models get better and cheaper, I’ll be able to do more.
But I’m genuinely excited. The wealth of wisdom in the Indian public domain waiting to be unlocked is just ridiculous. While Western thought gets far too much credit, the kind of wisdom that India has given to the world is insane. I’m not saying this as another mystic-India, loudly-beating-the-drum-about-our-rich-cultural-past thing. I’m saying this as someone who started discovering these books a little late in life.
Why do this? Because this should exist. That’s it. I don’t have any other reason. Whether people will find it useful, I don’t know. But I’m genuinely excited about doing this. It’s fun.
And if you find it interesting, I’d appreciate it if you spread the word.
Thank you.