RAG (Retrieval-Augmented Generation) lets you build an AI that answers questions from your own documents. It retrieves relevant information from your data and uses it to generate accurate answers.

Do I need a machine learning background to build RAG?

No. Basic Python knowledge is enough. Using LangChain and Chroma, you can build a working RAG system that answers questions from PDFs in an afternoon.

What is the difference between RAG and fine-tuning?

RAG retrieves information from your documents at query time, so no model training is needed. Fine-tuning permanently modifies a model's weights using your data. RAG is usually cheaper and faster to set up, and it is sufficient for most document Q&A use cases.

Can I build RAG for free?

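Yes. Use open-source tools (LangChain + Chroma) with free API tiers. At the time of writing, the Gemini API free tier (about 1,500 requests/day) is enough for a basic RAG system. Free quotas change, so check the current limits in Google AI Studio.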

Can I build RAG with Indian language documents?

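Yes. Use a multilingual embedding model such as sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2, which was trained on 50+ languages including Hindi. Coverage varies by language, so check the model card for the languages you need (for example Tamil, Telugu, or Bengali) and test retrieval quality on a sample of your own documents before relying on it.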

How much does it cost to build a RAG system in India?

A basic RAG system costs nothing — use free tools like LangChain, Chroma vector database, and Google Colab. For production, expect ₹2,000-₹5,000/month for API costs and a small VPS for the vector database.

Do I need machine learning knowledge for RAG?

No. RAG is essentially a pipeline that retrieves relevant documents and passes them to an AI model. You need basic Python skills and an understanding of APIs. The LangChain framework handles most of the plumbing (loading, splitting, embedding, and retrieval) for you.

RAG for Beginners — Build Your Own — India Guide 2026

Retrieval-Augmented Generation explained, build it yourself

Have you ever wished you could ask ChatGPT a question and have it answer based on YOUR company's documents, YOUR notes, or YOUR codebase? RAG (Retrieval-Augmented Generation) makes this possible — without training a new model, without uploading all your data to OpenAI, and without a machine learning background.

This guide explains what RAG is, why it works, and walks you through building a simple RAG system that can answer questions from a set of PDF documents — in Python, in an afternoon.

What You'll Learn

What RAG is and why it works better than fine-tuning for most use cases
The three components of every RAG system
Step-by-step: building a PDF Q&A system with LangChain + Chroma
How to improve RAG accuracy
Production considerations
Free and paid tools comparison

What Is RAG?

RAG stands for Retrieval-Augmented Generation. It is a technique in which the system first retrieves relevant documents from a collection and then passes them to an AI model as context at query time.

Without RAG:
User: "What is our company's refund policy?"
AI: "I don't have information about your specific company's policies."

With RAG:
User: "What is our company's refund policy?"
System: 1) Search documents for "refund policy" → find relevant sections → pass them to AI
AI: "According to your company policy document, refunds are processed within 7-10 business days..." (with specific, accurate information)

The key insight is that you do not need to train the AI on your data — you just give it relevant context at the time of each question. This is much cheaper, faster, and more flexible than fine-tuning.
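To make "give it relevant context at the time of each question" concrete, here is a minimal sketch of how retrieved snippets might be combined with the question into a single prompt. The function name `build_prompt`, the prompt wording, and the second sample snippet are illustrative assumptions, not something a particular library produces.

```python
def build_prompt(question: str, snippets: list[str]) -> str:
    """Combine retrieved snippets and the user's question into one prompt."""
    # Number each snippet so an answer can point back to its source.
    context = "\n\n".join(
        f"[{i}] {text}" for i, text in enumerate(snippets, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical snippets standing in for what a retriever would return.
snippets = [
    "Refunds are processed within 7-10 business days.",
    "Refund requests must be submitted through the support portal.",
]
prompt = build_prompt("What is our company's refund policy?", snippets)
print(prompt)
```

The model never sees the whole document collection, only the few snippets placed in the prompt, which is why no training step is involved.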

🇮🇳 India Note: RAG is particularly useful for Indian businesses with large document repositories — legal firms with case precedents, CA firms with tax notifications, hospitals with medical records, and government agencies with circular and notification databases. The documents stay on your own infrastructure; only the question and retrieved snippets go to the AI API.

The Three Components of RAG

Every RAG system has three parts:

1. Embedding Model

Converts text into numbers (vectors) that capture semantic meaning. Similar texts get similar vectors. This is how the system finds relevant documents.
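A small sketch of what "similar texts get similar vectors" means in practice, using cosine similarity. The three-dimensional vectors below are made up for illustration; a real embedding model such as all-MiniLM-L6-v2 returns vectors with hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: the first two texts are about the same topic.
refund_policy = [0.9, 0.1, 0.0]
money_back = [0.8, 0.2, 0.1]
office_hours = [0.0, 0.2, 0.9]

sim_related = cosine_similarity(refund_policy, money_back)
sim_unrelated = cosine_similarity(refund_policy, office_hours)
print(f"refund policy vs money back:   {sim_related:.2f}")
print(f"refund policy vs office hours: {sim_unrelated:.2f}")
```

The related pair scores close to 1 and the unrelated pair close to 0, which is the signal the retriever uses to rank chunks.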

Free options:

sentence-transformers/all-MiniLM-L6-v2 (local, free, fast)
Google's text-embedding-004 (free tier via Gemini API)

Paid options:

OpenAI text-embedding-3-small ($0.02/million tokens)

2. Vector Database

Stores the embedded document chunks and allows fast similarity search. When you ask a question, the vector database finds the most similar document chunks.
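Conceptually, a similarity search scores every stored chunk against the question vector and keeps the best few. The sketch below does this with a plain list and a brute-force scan. Real vector databases use indexes to avoid scanning everything, and the texts and vectors here are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k chunk texts whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# A toy "vector store": (chunk text, made-up embedding) pairs.
store = [
    ("Refunds are processed within 7-10 business days.", [0.9, 0.1, 0.0]),
    ("Passwords can be reset from the login page.", [0.1, 0.9, 0.1]),
    ("Working hours are listed in the employee handbook.", [0.0, 0.2, 0.9]),
]

query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "What is the refund policy?"
results = top_k(query_vec, store, k=2)
for text in results:
    print(text)
```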

Options:

Chroma — Local, free, great for development
FAISS — Local, free, used in production at Meta
Pinecone — Cloud, free tier available, easy to use
Supabase — PostgreSQL + pgvector, free tier

3. Language Model (LLM)

The AI that generates the final answer using the retrieved context. Any LLM works — GPT-4, Claude, Gemini.

Step-by-Step: Build a PDF Q&A System

Prerequisites

pip install langchain langchain-community langchain-text-splitters langchain-google-genai chromadb pypdf sentence-transformers

You also need either:

An OpenAI API key (for GPT), OR
A Google AI API key (free via aistudio.google.com)

Step 1: Load and Split Documents

# Loaders live in langchain-community; splitters in langchain-text-splitters
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

def load_documents(pdf_paths: list[str]):
    all_docs = []
    for path in pdf_paths:
        loader = PyPDFLoader(path)
        docs = loader.load()
        all_docs.extend(docs)
    
    # Split into chunks for better retrieval
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,    # Max characters per chunk (roughly 150-200 English words)
        chunk_overlap=200   # Overlap prevents losing context at boundaries
    )
    return splitter.split_documents(all_docs)

chunks = load_documents(["company_policy.pdf", "product_manual.pdf"])
print(f"Created {len(chunks)} chunks from your documents")
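To see what `chunk_overlap` does, here is a simplified fixed-width splitter. It is only a sketch of the overlap idea: `RecursiveCharacterTextSplitter` also tries to break on paragraph and sentence separators, which this toy version (with the hypothetical name `split_with_overlap`) does not attempt.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size pieces where each piece repeats the tail of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


pieces = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
for piece in pieces:
    print(piece)
```

Each piece starts with the last four characters of the one before it, so a sentence that straddles a boundary still appears whole in at least one chunk.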

Step 2: Create Embeddings and Store in Chroma

# Both integrations are provided by the langchain-community package
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

# Use a free local embedding model
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Create vector store and embed all chunks
# This takes 1-5 minutes depending on document size
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"  # Save to disk
)

print("Documents embedded and stored in Chroma!")

Step 3: Build the Q&A Chain

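import os

# Written against LangChain 0.2/0.3-style imports; on LangChain 1.x, legacy chains
# such as RetrievalQA are expected to be imported from langchain_classic.chains instead.
from langchain.chains import RetrievalQA
from langchain_google_genai import ChatGoogleGenerativeAI

# Use a Gemini model (free tier available)
llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    # Read the key from the environment instead of hardcoding it (get one at aistudio.google.com)
    google_api_key=os.environ["GOOGLE_API_KEY"]
)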

# Create retriever — finds top 4 most relevant chunks
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4}
)

# Create QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # Stuff all chunks into context
    retriever=retriever,
    return_source_documents=True  # Show which documents were used
)

Step 4: Ask Questions

def ask(question: str):
    result = qa_chain.invoke({"query": question})  # calling the chain directly is deprecated
    print(f"Answer: {result['result']}")
    print("\nSources:")
    for doc in result['source_documents']:
        print(f"  - {doc.metadata.get('source', 'Unknown')} (page {doc.metadata.get('page', '?')})")

# Test it!
ask("What is the refund policy?")
ask("How do I reset my password?")
ask("What are the working hours?")

💰 Free Deal: Get your Google AI API key at aistudio.google.com. It is free and needs no credit card. At the time of writing, the free tier was listed at 1 million tokens/day with Gemini 2.5 Flash, which is more than enough for a document Q&A system with moderate usage. Quotas change, so confirm the current limits in AI Studio.

Complete Working Example

Here is the complete script in one place:

import os

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA  # on LangChain 1.x, expected under langchain_classic.chains
from langchain_google_genai import ChatGoogleGenerativeAI

# Configuration
PDF_FILES = ["document1.pdf", "document2.pdf"]
GOOGLE_API_KEY = os.environ["GOOGLE_API_KEY"]  # set this in your shell; avoid hardcoding keys

# 1. Load documents
all_docs = []
for pdf in PDF_FILES:
    loader = PyPDFLoader(pdf)
    all_docs.extend(loader.load())

# 2. Split into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(all_docs)

# 3. Create embeddings (free, runs locally)
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = Chroma.from_documents(chunks, embeddings, persist_directory="./db")

# 4. Create LLM and QA chain
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash", google_api_key=GOOGLE_API_KEY)
qa = RetrievalQA.from_chain_type(llm=llm, retriever=vectorstore.as_retriever(search_kwargs={"k": 4}))

# 5. Ask questions
while True:
    question = input("\nAsk a question (or 'quit'): ")
    if question == "quit":
        break
    result = qa.invoke({"query": question})
    print(f"\nAnswer: {result['result']}")

Improving RAG Accuracy

When your RAG system gives wrong answers, these techniques help:

Better chunking: Semantic chunking (splitting at natural paragraph or section boundaries) works better than fixed-size chunking for structured documents.
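A minimal sketch of splitting at paragraph boundaries, assuming paragraphs are separated by blank lines. The helper name `split_on_paragraphs` and the sample text are made up for illustration. It packs whole paragraphs into a chunk until the size limit would be exceeded, so no paragraph is cut in the middle.

```python
def split_on_paragraphs(text: str, max_chars: int) -> list[str]:
    """Group whole paragraphs into chunks of at most max_chars characters."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        # An oversized single paragraph is kept whole rather than cut mid-sentence.
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


sample = (
    "Refund policy.\n\n"
    "Refunds take 7-10 business days.\n\n"
    "Working hours.\n\n"
    "Support is open on weekdays."
)
para_chunks = split_on_paragraphs(sample, max_chars=50)
for chunk in para_chunks:
    print(repr(chunk))
```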

Hybrid search: Combine vector similarity search with keyword search (BM25). This handles cases where exact terminology matters.
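One common way to combine the two result lists is reciprocal rank fusion (RRF), a technique this guide does not name. The sketch below covers only the fusion step. It assumes you already have one ranking from vector search and one from a keyword search such as BM25, and the chunk IDs are hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of IDs; items ranked high in any list rise to the top."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); k dampens the effect of top ranks.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


vector_ranking = ["chunk_a", "chunk_b", "chunk_c"]   # from similarity search
keyword_ranking = ["chunk_c", "chunk_a", "chunk_d"]  # from keyword search
fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
print(fused)
```

Chunks that appear in both lists (`chunk_a`, `chunk_c`) end up ahead of chunks found by only one method.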

Reranking: After retrieving top-K chunks, use a reranker model to sort them by relevance before passing to the LLM.
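The reranking step itself is just "score each retrieved chunk against the question, then sort". In the sketch below, a simple word-overlap function stands in for a real reranker model. That stand-in is an assumption made so the example runs without extra dependencies, and it is far cruder than a trained reranker.

```python
from typing import Callable


def rerank(
    question: str,
    chunks: list[str],
    score: Callable[[str, str], float],
    top_n: int,
) -> list[str]:
    """Sort chunks by score(question, chunk), highest first, and keep top_n."""
    ranked = sorted(chunks, key=lambda chunk: score(question, chunk), reverse=True)
    return ranked[:top_n]


def word_overlap(question: str, chunk: str) -> float:
    """Placeholder scorer: fraction of question words that appear in the chunk."""
    q_words = set(question.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / len(q_words) if q_words else 0.0


# Hypothetical retriever output, in the order the vector search returned it.
retrieved = [
    "Working hours are listed in the employee handbook.",
    "The refund policy allows refunds within 7-10 business days.",
    "Passwords can be reset from the login page.",
]
best = rerank("what is the refund policy", retrieved, word_overlap, top_n=2)
print(best[0])
```

Because the scorer is passed in as a function, swapping the placeholder for a real model only changes the `score` argument.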

Metadata filtering: Add metadata (document date, section, author) to chunks and filter by it. "Answer this from documents published after January 2025 only."
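A small sketch of the filtering idea, independent of any vector database: each chunk carries metadata, and chunks that fail the filter are dropped before (or after) similarity search. The `Chunk` class, the field names, and the file names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Chunk:
    text: str
    source: str
    published: date


def published_after(chunks: list[Chunk], cutoff: date) -> list[Chunk]:
    """Keep only chunks whose document was published after the cutoff date."""
    return [chunk for chunk in chunks if chunk.published > cutoff]


all_chunks = [
    Chunk("Old refund rules.", "policy_2023.pdf", date(2023, 6, 1)),
    Chunk("Current refund rules.", "policy_2025.pdf", date(2025, 3, 15)),
]

# "Documents published after January 2025 only"
recent = published_after(all_chunks, cutoff=date(2025, 1, 31))
for chunk in recent:
    print(chunk.source, chunk.text)
```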

RAG vs Fine-Tuning

| Criteria | RAG | Fine-Tuning |
|----------|-----|-------------|
| Cost | Low ($0-50/month) | High ($500-5,000+) |
| Setup time | Hours | Days to weeks |
| Data freshness | Real-time update | Re-train for updates |
| Accuracy | Good with good retrieval | Higher for specific domains |
| Transparency | Shows sources | Black box |
| Best for | Q&A, search, chat | Specific behavior/style |

Rule of thumb: Use RAG if you need to answer questions from documents. Consider fine-tuning only if RAG quality is insufficient after optimization.

Official Resources

LangChain Documentation — Complete LangChain framework docs
Chroma Documentation — Local vector database
Google AI Studio — Free Gemini API key, no credit card
Pinecone Quickstart — Cloud vector DB with free tier
Sentence Transformers — Free local embedding models


