Large language models (LLMs) have transformed natural language processing (NLP), achieving strong results across a wide range of tasks. However, their knowledge is fundamentally bounded by their training data. Retrieval-Augmented Generation (RAG) addresses this limitation by combining LLMs with external knowledge sources, enabling them to access and use information well beyond what they were trained on. This survey examines the seminal paper "Improving language models by retrieving from trillions of tokens" (Borgeaud et al., 2021), which introduced the Retrieval-Enhanced Transformer (RETRO) architecture, and explores subsequent research and advances in the field.
The Core Concept: Bridging Parametric and Non-Parametric Memory
Traditional LLMs rely on parametric memory: knowledge implicitly encoded in the model's learned parameters. RAG adds non-parametric memory: the ability to dynamically retrieve and incorporate information from a vast external knowledge base (e.g., Wikipedia, specialized corpora, or databases) at inference time. This dynamic access lets LLMs draw on information beyond their training data, yielding more accurate, comprehensive, and contextually appropriate responses.
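To make the retrieve-then-generate flow concrete, here is a minimal sketch in Python. The keyword-overlap scorer, the toy KNOWLEDGE_BASE, and the generate placeholder are illustrative assumptions rather than any particular system's API; a production RAG system would use a dense vector index over a large corpus and a real LLM call.

```python
# Minimal RAG sketch: retrieve relevant passages, then condition
# generation on them. The keyword-overlap scorer stands in for a real
# retriever (e.g., BM25 or a dense embedding index), and `generate` is
# a hypothetical placeholder for any LLM completion call.

KNOWLEDGE_BASE = [
    "Parametric memory stores knowledge in a model's learned weights.",
    "Non-parametric memory is looked up at inference time from an external corpus.",
    "RETRO augments a transformer with chunks retrieved from a large text database.",
]

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank passages by word overlap with the query and return the top k."""
    q_words = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call conditioned on the prompt."""
    return f"[LLM response conditioned on a prompt of {len(prompt)} chars]"

def rag_answer(query: str) -> str:
    """Retrieve supporting passages, then generate from the augmented prompt."""
    passages = retrieve(query, KNOWLEDGE_BASE)
    prompt = "Context:\n" + "\n".join(passages) + f"\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)

print(rag_answer("What is non-parametric memory?"))
```

The key design point the sketch illustrates is that the knowledge base is decoupled from the model: updating or swapping the corpus changes what the system can answer without retraining any parameters.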
Analogy: The Researcher and the Library
Imagine a researcher writing a paper. A traditional LLM is like a researcher relying solely on their own notes. A RAG system is like a researcher with access to a vast library: they actively search the library for relevant information (retrieval) and then integrate that knowledge into the paper (generation), producing a significantly stronger final product.