Purpose:
Converts English-language research papers into easy-to-follow audio podcasts in Hindi.
Key Technologies:
PyMuPDF for PDF text extraction.
facebook/bart-large-cnn (CPU-accelerated) and llama3.2:3b (self-hosted via llamafile) for text processing (chunking, summarization, transcript generation).
Krutrim for translation.
Silero TTS for text-to-speech conversion.
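The extraction and chunking steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names extract_text and chunk_text and the 400-word limit are assumptions, chosen so that each chunk stays comfortably within the limited input length of a summarizer such as facebook/bart-large-cnn. Only fitz.open and page.get_text are real PyMuPDF calls.

```python
import re


def extract_text(pdf_path):
    """Return the plain text of every page in a PDF, joined by newlines."""
    # PyMuPDF is imported lazily so the chunker below works without it installed.
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        return "\n".join(page.get_text() for page in doc)


def chunk_text(text, max_words=400):
    """Group whole sentences into chunks of roughly max_words words.

    max_words=400 is an assumed budget, not a value taken from the project.
    A single sentence longer than max_words is kept intact as its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if n == 0:
            continue
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting on sentence boundaries rather than at a fixed character offset keeps each chunk readable on its own, which tends to matter for summarization quality.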
Key Features:
Automated PDF text extraction.
AI-powered text-to-speech generation.
Language translation capabilities.
Lightweight operation (CPU/CUDA compatible).
Fully self-hosted.
Customizable language and voice options.
SSML for enhanced audio quality.
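As an illustration of the SSML feature, the sketch below wraps transcript sentences in basic SSML markup with a pause between them. The helper name to_ssml and the 300 ms default pause are hypothetical, and the exact set of tags the TTS model accepts is an assumption that should be checked against the Silero documentation.

```python
from xml.sax.saxutils import escape


def to_ssml(sentences, pause_ms=300):
    """Wrap sentences in <speak>/<p>/<s> tags with a break between them.

    pause_ms is an assumed default, not a value taken from the project.
    """
    parts = []
    for sentence in sentences:
        sentence = sentence.strip()
        if sentence:
            # Escape &, < and > so the transcript text cannot break the markup.
            parts.append(f"<s>{escape(sentence)}</s>")
    body = f'<break time="{pause_ms}ms"/>'.join(parts)
    return f"<speak><p>{body}</p></speak>"
```

Escaping matters here because research-paper text often contains characters such as & and <, which would otherwise produce malformed SSML.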
Development:
Open-source project released under the GPL-3.0 license.
Workflow:
PDF -> Chunking -> Summarization -> Translation -> Transcript -> Audio.
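The workflow above can be read as a chain of stages, each consuming the previous stage's output. The sketch below shows one way to wire such a chain. Every stage body is a hypothetical placeholder (first-sentence "summaries", a "[hi]" prefix instead of real translation, UTF-8 bytes instead of audio), standing in for the models named earlier.

```python
def run_pipeline(source, stages):
    """Pass source through each (name, func) stage in order.

    Returns the final value and the list of stage names that ran.
    """
    value, trace = source, []
    for name, func in stages:
        value = func(value)
        trace.append(name)
    return value, trace


# Placeholder stages; none of these reflect the project's real implementation.
def chunk(text):
    return text.split("\n\n")  # one chunk per paragraph


def summarize(chunks):
    return [c.split(". ")[0] for c in chunks]  # stand-in: keep first sentence


def translate(summaries):
    return [f"[hi] {s}" for s in summaries]  # stand-in for a translation call


def to_transcript(translated):
    return "\n".join(translated)


def synthesize(transcript):
    return transcript.encode("utf-8")  # stand-in for generated audio bytes


STAGES = [
    ("chunking", chunk),
    ("summarization", summarize),
    ("translation", translate),
    ("transcript", to_transcript),
    ("audio", synthesize),
]

# Text already extracted from a PDF is the pipeline input.
audio, trace = run_pipeline("A one. A two.\n\nB one. B two.", STAGES)
```

Keeping the stages as an ordered list of plain callables makes it straightforward to swap one model for another without touching the rest of the chain.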