It’s Time to RAG!

I’m not talking about early 20th century dances and Scott Joplin (despite what our young AgileDD developers might think about my age). Instead, I want to discuss chatbots and specifically how the AgileDD chatbot tackles a critical challenge.

When building a chatbot that needs to process tens of thousands of technical documents, data scientists love to discuss sophisticated approaches: the optimal generative AI model, fine-tuning techniques, reinforcement learning, and in-context learning. They might even suggest you take a prompt engineering course to improve your interactions with LLMs!

But there’s one critical algorithm they’re often hesitant to spotlight – Retrieval-Augmented Generation, or RAG. This “ugly duckling” among AI algorithms deserves our attention, as it’s the unsung hero behind effective document intelligence. Let’s explore what makes RAG so valuable, how it works, and how we’re making it more effective at AgileDD.

Understanding the LLM Context Challenge

To appreciate RAG’s importance, we should first understand how large language models operate. As a simplified view, these gigantic neural networks:

  •  Become more capable (and expensive) as they add more parameters
  •  Can be trained on vast document libraries – the largest models ingest the entire Library of Congress, Wikipedia in every language, and billions of web pages
  •  Retain massive amounts of information and infer linguistic patterns
  •  Require context to effectively answer specific questions

This last point is where RAG becomes crucial. Ideally, you’d want to feed all your documents to your favorite LLM and ask it questions about your specific content. It sounds reasonable – if these models can process the entire Library of Congress over breakfast, surely they can handle your organization’s documents.

But that’s not how it works in practice. By the time you’re interacting with an LLM, it has completed its training. What it needs now is a pre-packaged context from which to construct an answer to your question.

The significant limitation is that this context has strict size boundaries:

| LLM | Model Size (Parameters) | Max Context Size (Tokens) | Max Context in Pages* | Max Context in Documents (50-page)* |
|---|---|---|---|---|
| GPT-4 (Standard) | 175B | 4,096 | 6 | 0.1 |
| GPT-4 (Extended) | 175B | 32,000 | 45 | 1 |
| Gemini 1.5 | 200+B | 1M | 1,428 | 29 |
| Gemini Pro | 200+B | 2M | 2,857 | 58 |
| DeepSeek R-1 | 671B | 128,000 | 183 | 4 |
| Llama 70B | 70B | 32,768 | 46 | 1 |
| Grok 3 | 200B | 131,072 | 187 | 4 |

*Approximating 700 tokens per technical page, 50 pages per technical report
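To make the table concrete, here is the back-of-envelope arithmetic behind it as a small Python sketch, using the same assumptions as the footnote (roughly 700 tokens per technical page and 50 pages per report):

```python
# Rough context-capacity arithmetic behind the table above.
# Assumptions (from the footnote): ~700 tokens per technical page,
# ~50 pages per technical report.

TOKENS_PER_PAGE = 700
PAGES_PER_REPORT = 50

def context_capacity(max_context_tokens: int) -> tuple[float, float]:
    """Return (pages, 50-page reports) that fit in one context window."""
    pages = max_context_tokens / TOKENS_PER_PAGE
    reports = pages / PAGES_PER_REPORT
    return pages, reports

for name, tokens in [("GPT-4 (Extended)", 32_000),
                     ("Gemini 1.5", 1_000_000),
                     ("Llama 70B", 32_768)]:
    pages, reports = context_capacity(tokens)
    print(f"{name}: ~{pages:.0f} pages, ~{reports:.1f} reports")
```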

These numbers reveal the fundamental challenge. How can we build a non-hallucinating chatbot when context windows are so constrained? Imagine an engineering team creating a chatbot to query documents for 15,000 North Sea oil wells – that’s potentially 500,000 documents. Even the most advanced LLMs would be overwhelmed.

How RAG Solves the Context Problem

This is where RAG shines. Its concept is elegantly simple: the LLM doesn’t need all documents to answer a question – it only needs the relevant sections. RAG filters your document collection and extracts chunks with high probability of containing information needed for the answer.

For example: You have 50,000 mineral exploration reports from Quebec (available on SIGEOM) and ask your chatbot, “What are the gold concentrations measured around Mount Albert?” The RAG will extract key concepts like “gold concentration” and “Mount Albert,” then search for document chunks addressing those specific topics. If the resulting context is limited to a few pages, even a modestly-sized LLM like Llama 70B can construct an excellent response.
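To make the idea concrete, here is a minimal sketch of that retrieval step in Python. The embedding model and the example chunks are placeholders chosen for illustration, not our production pipeline:

```python
# Minimal sketch of the RAG retrieval step: embed the question and every
# chunk, then keep the top-k most similar chunks as the LLM context.
# Model name and chunks are illustrative placeholders.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "Gold concentrations of 2.3 g/t were measured near Mount Albert...",
    "Drilling program summary for a copper prospect in Abitibi...",
    # ...one entry per chunk extracted from the 50,000 reports
]

question = "What are the gold concentrations measured around Mount Albert?"

chunk_vecs = model.encode(chunks, normalize_embeddings=True)
q_vec = model.encode([question], normalize_embeddings=True)[0]

scores = chunk_vecs @ q_vec            # cosine similarity (vectors are normalized)
top_k = np.argsort(scores)[::-1][:5]   # keep the 5 best-matching chunks

context = "\n\n".join(chunks[i] for i in top_k)
# `context` is then prepended to the question and sent to the LLM.
```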

The RAG Challenge with Broad Queries

While this sounds promising, RAG faces significant challenges with broader questions. If you ask about gold concentrations across all of Quebec, the RAG might return hundreds or thousands of pages. The LLM can only process a portion of this material within its context window, resulting in incomplete answers – despite potentially incurring maximum costs (as LLMs typically charge based on context size).

This limitation is why data scientists encourage specific questions with detailed parameters – they’re hoping to help RAG produce a focused, manageable context.

Various RAG algorithms exist, each with different performance characteristics and tuning options. At their core, they compare a vector representation of your question against vectors for each text chunk in your document collection. Additional techniques such as BM25 scoring and reranking help improve results, but fundamentally, RAG struggles to produce a compact context for general questions posed across massive document collections.
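Continuing the earlier sketch (it reuses `chunks`, `question`, and the dense `scores` computed there), here is roughly how a lexical BM25 signal could be blended with the vector similarities; the 50/50 weighting is arbitrary and would be tuned in practice:

```python
# Hybrid retrieval sketch: blend BM25 (lexical) scores with the dense
# similarities from the previous snippet. A cross-encoder reranker could
# further reorder the top results; weights here are illustrative only.
from rank_bm25 import BM25Okapi
import numpy as np

tokenized_chunks = [c.lower().split() for c in chunks]
bm25 = BM25Okapi(tokenized_chunks)
bm25_scores = np.array(bm25.get_scores(question.lower().split()))

def min_max(x: np.ndarray) -> np.ndarray:
    """Rescale scores to [0, 1] so the two signals are comparable."""
    return (x - x.min()) / (x.max() - x.min() + 1e-9)

hybrid = 0.5 * min_max(scores) + 0.5 * min_max(bm25_scores)
top_k = np.argsort(hybrid)[::-1][:5]   # best chunks under the combined score
```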

AgileDD’s Approach to Better RAG

This is where AgileDD’s approach makes a critical difference. Our platform leverages document metadata to filter content based on categories, authors, attribute values, and more. Instead of processing thousands of documents, RAG can work with dozens, dramatically improving efficiency.

On the AgileDD platform, users control this preliminary filtering through an intuitive interface. You can select precisely which documents to query based on our full-text index and thematic indexes built from captured attribute values.
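As a rough illustration of the idea (the attribute names below are made up for the example, not our actual schema), metadata pre-filtering can be as simple as keeping only the documents whose captured attributes match the user's selection before any chunking or retrieval happens:

```python
# Hypothetical sketch of metadata pre-filtering ahead of retrieval.
# Field names ("region", "doc_type") are illustrative, not AgileDD's schema.
from dataclasses import dataclass, field

@dataclass
class Document:
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)

def prefilter(docs: list[Document], **criteria) -> list[Document]:
    """Keep only documents whose metadata matches every criterion."""
    return [d for d in docs
            if all(d.metadata.get(k) == v for k, v in criteria.items())]

docs = [
    Document("GM-12345", "Gold assay results near Mount Albert...",
             {"region": "Gaspésie", "doc_type": "assessment report"}),
    Document("GM-67890", "Copper exploration drill log...",
             {"region": "Abitibi", "doc_type": "drill log"}),
]

# Only the matching documents are chunked and handed to RAG, so the
# retriever searches dozens of documents instead of thousands.
candidates = prefilter(docs, region="Gaspésie", doc_type="assessment report")
```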

But we’re taking RAG enhancement even further. To maintain manageable context sizes, it’s sometimes necessary to reduce the size of chunks extracted by RAG – which risks losing valuable information. Our solution injects extracted attribute values into RAG-proposed chunks, creating information-dense contexts that preserve critical knowledge while respecting size limitations.
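Conceptually, the enrichment looks something like the sketch below; the attribute names and the size budget are illustrative only, not our actual implementation:

```python
# Hypothetical sketch of attribute-enhanced chunks: prepend the captured
# attribute values of a chunk's source document to the chunk text, so a
# shorter chunk still carries the key facts within a tight token budget.

def enrich_chunk(chunk_text: str, attributes: dict[str, str],
                 max_chars: int = 1200) -> str:
    """Build an information-dense chunk that respects a size limit."""
    header = "; ".join(f"{k}: {v}" for k, v in attributes.items())
    body = chunk_text[: max_chars - len(header) - 2]  # trim text to the budget
    return f"[{header}]\n{body}"

attrs = {"well": "25/11-G-38", "report date": "2019-04-12",
         "formation": "Utsira", "TD": "1,852 m"}
print(enrich_chunk("The logging run confirmed sandstone intervals ...", attrs))
```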

The AgileDD development team is currently refining this attribute-enhanced RAG approach. In the coming weeks, you’ll see this “ugly duckling” transform into the most elegant of swans, delivering more precise and comprehensive answers from your document collections than ever before.