RAG, or Retrieval Augmented Generation, is an architecture that connects a large language model with an external knowledge base, typically a vector database. Instead of relying only on parametric knowledge from training, the model retrieves relevant documents at query time and uses them to ground its response.
The technique was introduced in a 2020 research paper by Lewis et al. at Facebook AI, and by 2026 it had become the dominant enterprise architecture for AI assistants, internal copilots, and domain-specific chatbots.
A typical RAG system has five components: document ingestion and chunking, embedding generation, vector storage (e.g., Pinecone, Weaviate, or Qdrant), similarity search at query time, and prompt construction that injects retrieved passages into the LLM context.
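The ingestion side of that pipeline can be sketched in a few lines of plain Python. This is a toy illustration, not a production recipe: `chunk` uses naive fixed-size word windows where real systems typically split on sentence or token boundaries, and `embed` is a bag-of-words count standing in for a real embedding model's dense vector.

```python
import re

def chunk(text, max_words=50):
    # Naive fixed-size chunking; real pipelines often use
    # sentence- or token-aware splitting with overlap.
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def embed(text):
    # Toy bag-of-words "embedding" standing in for an embedding model.
    counts = {}
    for word in re.findall(r"[a-z]+", text.lower()):
        counts[word] = counts.get(word, 0) + 1
    return counts

# Ingestion: chunk each document, embed each chunk, and keep
# (chunk, vector) pairs in a list -- the role a vector database
# plays at scale.
documents = [
    "Retrieval Augmented Generation grounds LLM answers in retrieved passages."
]
index = [(c, embed(c)) for doc in documents for c in chunk(doc, max_words=5)]
```

At production scale the list comprehension is replaced by writes to a vector database, but the shape of the data (passage text paired with its vector) is the same.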
RAG reduces hallucinations because the model grounds its answer in retrieved source documents (and can cite them), and it enables AI to answer questions about proprietary or recent information that was absent from the base model's training data.
How it works
When a user asks a question, the RAG system converts the query into an embedding, searches a vector database for the most similar passages, and constructs a prompt that includes both the question and retrieved context. The LLM then generates an answer grounded in the provided documents.
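The query-time flow described above can be sketched end to end. Again this is a minimal, self-contained illustration: the word-count `embed` function and the hard-coded passages stand in for a real embedding model and vector database, and `build_prompt` shows one common prompt-construction pattern, not a fixed standard.

```python
import math

def embed(text):
    # Toy word-count embedding standing in for a real embedding model.
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(v * b.get(w, 0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# In-memory stand-in for a vector database: (passage, embedding) pairs.
passages = [
    "RAG retrieves documents at query time to ground the answer.",
    "Vector databases store embeddings for similarity search.",
    "Chunking splits documents into passages before embedding.",
]
store = [(p, embed(p)) for p in passages]

def retrieve(question, k=2):
    # Similarity search: rank stored passages against the query embedding.
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [p for p, _ in ranked[:k]]

def build_prompt(question, k=2):
    # Prompt construction: inject retrieved context ahead of the question.
    context = "\n\n".join(retrieve(question, k))
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")
```

The string returned by `build_prompt` is what gets sent to the LLM; the model then answers from the injected context rather than from memory alone.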
Practical example
A legal firm deploys a RAG system over 10 years of internal case memos. Associates ask questions in natural language and get cited answers with links to source documents. Research time drops from hours to minutes.
Definition by Miss Yera, Leading Woman in Technology in Peru · AI Consultant · Favikon 2025.
Spanish version: /glosario-ia/#what-is-rag