Talk
Intermediate

From Crawl to Chat: Optimizing RAG Pipelines with Async Scraping and Hybrid Re-Ranking

Rejected

Building efficient Retrieval-Augmented Generation (RAG) chatbots is a tough problem, especially when you’re dealing with complex, dynamic data sources such as large websites, PDFs with embedded images, and mixed-format tables.

Traditional RAG pipelines often face bottlenecks in:

  • Long ingestion times during data scraping and processing,

  • Irrelevant retrievals due to noisy chunks,

  • High latency and token costs during query response.

In this session, we’ll explore a real-world implementation of a RAG-based chatbot that overcame these performance issues through four key innovations:

  1. Accelerated Ingestion (Asyncio):
    By building an asynchronous scraping layer using asyncio, we reduced total website crawl time from 5 hours 18 minutes to 40 minutes, roughly an 8× speedup in ingestion.

  2. Multi-Modal Chunking:
    We developed a chunking pipeline that intelligently processes text, images, and tables, preserving contextual relationships to improve the embedding and retrieval accuracy.

  3. Hybrid Re-Ranking:
    Instead of relying solely on semantic similarity, our re-ranking model blends semantic relevance with metadata factors (page authority, depth, and domain weight) to surface the most relevant snippets.

  4. Optimized Query Workflow:
    Incoming user queries are classified (general, gibberish, site-specific) and rephrased to ensure optimal recall during the retrieval phase, all while reducing token usage and response latency.
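
The async ingestion idea in step 1 can be sketched in a few lines. This is a minimal illustration, not the talk's actual crawler: the HTTP request is simulated with `asyncio.sleep`, and the URL list and concurrency limit are hypothetical (a real implementation would use an async HTTP client such as aiohttp).

```python
import asyncio

async def fetch_page(url: str, sem: asyncio.Semaphore) -> str:
    # Stand-in for an HTTP request: asyncio.sleep simulates the network
    # round-trip; a real crawler would call an async HTTP client here.
    async with sem:  # cap the number of in-flight requests
        await asyncio.sleep(0.01)
        return f"<html>content of {url}</html>"

async def crawl(urls: list[str], max_concurrency: int = 20) -> list[str]:
    sem = asyncio.Semaphore(max_concurrency)
    # All fetches are scheduled at once; the semaphore bounds concurrency,
    # so total wall time scales with len(urls) / max_concurrency rather
    # than with len(urls), which is where the ingestion speedup comes from.
    return await asyncio.gather(*(fetch_page(u, sem) for u in urls))

if __name__ == "__main__":
    pages = asyncio.run(crawl([f"https://example.com/p{i}" for i in range(100)]))
    print(len(pages))
```

The same pattern extends to parsing and embedding stages, which can be pipelined behind the fetch layer.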

This end-to-end pipeline produced a chatbot that was significantly faster, more cost-efficient, and contextually precise, making it well suited to real-world, web-scale RAG systems.
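The multi-modal chunking step can be illustrated with a toy chunker. This is a hedged sketch under one simple assumption: the page has already been parsed into ordered (modality, content) blocks, and each image or table inherits the preceding text block as context so its embedding stays anchored to the surrounding narrative. The `Chunk` type and `chunk_page` function are illustrative names, not the talk's actual API.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    modality: str   # "text", "image", or "table"
    content: str    # raw text, image caption/alt text, or serialized table
    context: str = ""  # nearby text preserved to anchor the embedding

def chunk_page(blocks: list[tuple[str, str]]) -> list[Chunk]:
    """Attach the most recent text block as context to non-text chunks,
    preserving the contextual relationships the abstract describes."""
    chunks: list[Chunk] = []
    last_text = ""
    for modality, content in blocks:
        if modality == "text":
            last_text = content
            chunks.append(Chunk("text", content))
        else:
            chunks.append(Chunk(modality, content, context=last_text))
    return chunks
```

At retrieval time, embedding `content` together with `context` keeps a bare table or image caption from matching queries it has nothing to do with.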

The talk will walk attendees through:

  • Architecting an async data ingestion pipeline for large-scale websites,

  • Implementing multi-modal preprocessing and chunking,

  • Designing a hybrid scoring function (α·semantic + β·page weight + γ·domain authority) for re-ranking,

  • Integrating the pipeline into a retrieval and response generation workflow with LLMs.
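
The hybrid scoring function above (α·semantic + β·page weight + γ·domain authority) reduces to a small weighted sum. The coefficient values below are illustrative assumptions, not the talk's tuned weights; inputs are assumed normalized to [0, 1].

```python
def hybrid_score(semantic: float, page_weight: float, domain_authority: float,
                 alpha: float = 0.7, beta: float = 0.2, gamma: float = 0.1) -> float:
    """Blend semantic similarity with metadata signals.
    alpha/beta/gamma are placeholder weights; in practice they would be
    tuned on labelled retrieval data."""
    return alpha * semantic + beta * page_weight + gamma * domain_authority

def rerank(candidates: list[tuple[str, float, float, float]]) -> list[tuple[str, float, float, float]]:
    # Each candidate: (snippet_id, semantic, page_weight, domain_authority).
    # Sort descending by the blended score.
    return sorted(candidates, key=lambda c: hybrid_score(c[1], c[2], c[3]), reverse=True)
```

A snippet with modest semantic similarity but high page authority can outrank a near-duplicate from a low-weight page, which is exactly the behaviour pure cosine similarity cannot express.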

By the end, participants will gain a clear blueprint for applying similar optimizations to their own RAG chatbots, achieving both speed and quality without excessive token usage.

  • Build async data ingestion systems using Python’s asyncio for massive speed improvements.

  • Apply multi-modal chunking to preserve context across images, text, and tables.

  • Use hybrid scoring for more intelligent retrieval.

  • Optimize queries to reduce cost and latency while improving accuracy.
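
The query-optimization step (classify, then rephrase before retrieval) can be sketched heuristically. The thresholds and helper names here are hypothetical; a production system would likely use a lightweight classifier or LLM prompt for both stages.

```python
import re

def classify_query(query: str, site_terms: set[str]) -> str:
    """Route a query as 'gibberish', 'site-specific', or 'general' so that
    only meaningful queries reach the vector store. Purely illustrative:
    the 0.5 alphabetic-character threshold is an assumption."""
    tokens = re.findall(r"[a-zA-Z]+", query.lower())
    if not tokens or sum(len(t) for t in tokens) / max(len(query), 1) < 0.5:
        return "gibberish"        # mostly non-alphabetic input
    if any(t in site_terms for t in tokens):
        return "site-specific"    # mentions indexed site vocabulary
    return "general"

def rephrase(query: str) -> str:
    """Toy normalization; a real pipeline would use an LLM rewrite prompt
    to expand or canonicalize the query for better recall."""
    return " ".join(query.strip().split())
```

Filtering gibberish before retrieval saves both embedding calls and LLM tokens, which is where the cost and latency reductions come from.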

Technology architecture
Other

Srikanth Doddi
Architect, OSI DIGITAL PVT LTD
https://www.linkedin.com/in/srikanthdoddi/
