RAG agent gave great answers. Client asked: 'Which contract did that come from?' Agent couldn't say. Added citation tracking. Now every answer includes source document and page number.
THE PROBLEM WITH BASIC RAG:
Agent answers questions. Doesn't track source documents. Can't verify answers. Can't cite sources. Legal teams need sources for every claim.
THE SOLUTION:
Build citation tracking into the RAG pipeline. Capture document metadata. Store it alongside the embeddings. Return it with answers.
THE EXTENSION:
ORIGINAL RAG FLOW:
Document → Parse → Split → Embed → Store → Query → Answer
EXTENDED RAG FLOW:
Document → Parse → Add Metadata → Split (preserve metadata) → Embed (metadata stored alongside the vector) → Store with source → Query → Retrieve with source → Answer with citation
THE NEW METADATA STRUCTURE:
document_id, document_name, document_type, page_number, section, last_updated, owner
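The schema above, sketched in Python as the object the SET node would produce. This is an illustration under assumptions: in n8n these fields are set in the SET node, not in code, and the `build_metadata` helper, the lower-casing of `document_type`, and the sample values are hypothetical.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class DocumentMetadata:
    """One metadata record per parsed page (field names from the schema above)."""
    document_id: str
    document_name: str
    document_type: str
    page_number: int
    section: str
    last_updated: str  # ISO date string, e.g. "2024-03-01"
    owner: str


def build_metadata(document_id, document_name, document_type,
                   page_number, section, last_updated: date, owner) -> dict:
    """Hypothetical stand-in for the SET node: normalise inputs into a plain dict."""
    record = DocumentMetadata(
        document_id=str(document_id),
        document_name=document_name.strip(),
        document_type=document_type.lower(),  # assumption: types compared case-insensitively
        page_number=int(page_number),
        section=str(section),
        last_updated=last_updated.isoformat(),
        owner=owner,
    )
    return asdict(record)


meta = build_metadata("doc-042", "Enterprise Terms of Service v3.2", "Terms",
                      12, "4.3", date(2024, 3, 1), "legal")
print(meta["last_updated"])  # 2024-03-01
```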
STORED WITH EACH CHUNK:
Every text chunk in vector store includes metadata. When agent retrieves chunks, metadata comes with it.
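One way to picture "metadata comes with it": a minimal splitter, assuming a plain fixed-size character window with overlap (simpler than n8n's Text Splitter) and a hypothetical `chunk_index` field added on top of the schema. Each chunk gets its own copy of the document metadata, so nothing is lost when chunks are stored separately.

```python
def split_with_metadata(text: str, metadata: dict,
                        chunk_size: int = 200, overlap: int = 50) -> list:
    """Split text into overlapping character windows; every chunk carries a metadata copy."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    chunks = []
    for index, start in enumerate(range(0, max(len(text), 1), step)):
        piece = text[start:start + chunk_size]
        if not piece:
            break
        chunks.append({
            "text": piece,
            # New dict per chunk; the caller's metadata is never mutated.
            "metadata": {**metadata, "chunk_index": index},  # chunk_index is hypothetical
        })
        if start + chunk_size >= len(text):
            break  # last window already reaches the end of the text
    return chunks


doc_meta = {"document_name": "Service Agreement", "page_number": 7}
chunks = split_with_metadata("x" * 500, doc_meta)
print(len(chunks))  # 3
```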
AGENT TOOL ENHANCEMENT:
Modified 'Search Documents' tool to return text PLUS source information.
Agent receives: relevant text, source document name, page numbers, similarity score.
Agent includes in answer: 'According to the Service Agreement (page 7), the cancellation policy is...'
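A sketch of the formatting side of the tool, assuming each retrieved hit arrives as a dict with `text`, `metadata` and `score` keys. The layout, the helper names `format_tool_output` and `according_to`, and the two-decimal score are hypothetical; the post does not show the tool's actual output.

```python
def format_tool_output(hits: list) -> str:
    """Render retrieved chunks so the agent sees the text AND where it came from."""
    blocks = []
    for rank, hit in enumerate(hits, start=1):
        meta = hit["metadata"]
        blocks.append(
            f"[{rank}] {hit['text']}\n"
            f"    source: {meta['document_name']}, page {meta['page_number']}"
            f" | score: {hit['score']:.2f}"
        )
    return "\n\n".join(blocks)


def according_to(meta: dict) -> str:
    """Opening phrase the agent can reuse when citing inline."""
    return f"According to the {meta['document_name']} (page {meta['page_number']})"


hit = {"text": "Cancellation requires written notice.",
       "metadata": {"document_name": "Service Agreement", "page_number": 7},
       "score": 0.8123}
print(format_tool_output([hit]))
print(according_to(hit["metadata"]))
```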
THE IMPLEMENTATION:
NODE CHANGES:
1. After Document Parser: SET node creates metadata object
2. Text Splitter: Configure to preserve metadata in chunks
3. Vector Store Insert: Include metadata with embeddings
4. Vector Store Query: Return metadata with results
5. Agent Tool: Format metadata into citation
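Steps 3 and 4 in miniature. This is a toy, not Qdrant: the "embedding" is a bag-of-words count vector compared with cosine similarity, and `InMemoryStore` is a hypothetical class. What it does show is the point of the change: each stored point keeps vector, text and metadata together, and the query hands the metadata back with the score. The sample chunks are made up.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words term counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Hypothetical vector store: every point keeps vector, text and metadata together."""

    def __init__(self):
        self.points = []

    def insert(self, chunks):
        # Step 3: metadata is stored next to the vector, not inside it.
        for chunk in chunks:
            self.points.append({"vector": embed(chunk["text"]),
                                "text": chunk["text"],
                                "metadata": chunk["metadata"]})

    def query(self, question: str, top_k: int = 3):
        # Step 4: results come back with their metadata and a similarity score.
        q = embed(question)
        scored = [{"text": p["text"], "metadata": p["metadata"], "score": cosine(q, p["vector"])}
                  for p in self.points]
        scored.sort(key=lambda hit: hit["score"], reverse=True)
        return scored[:top_k]


store = InMemoryStore()
store.insert([
    {"text": "Enterprise customers can request refunds within 30 days with manager approval.",
     "metadata": {"document_name": "Enterprise Terms of Service v3.2", "page_number": 12, "section": "4.3"}},
    {"text": "Cancellation requires written notice to the other party.",
     "metadata": {"document_name": "Service Agreement", "page_number": 7, "section": "3.2"}},
])
best = store.query("What is our refund policy for enterprise customers?", top_k=1)[0]
print(best["metadata"]["document_name"], best["metadata"]["page_number"])
```

In a real pipeline the embedding model and Qdrant replace `embed` and `InMemoryStore`; the shape of what comes back (text, metadata, score) is the part the agent tool depends on.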
CITATION FORMAT OPTIONS:
Simple: (Source: filename.pdf)
Detailed: (Source: filename.pdf, page 7, section 3.2)
Legal: (Service Agreement v2.1, executed 2024-01-15, section 3.2, page 7)
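The three formats as one function. The fields `filename`, `version` and `executed` are hypothetical additions: the metadata structure listed earlier has none of them, so this sketch assumes the schema has been extended with them, which the Legal format in particular needs.

```python
def cite(meta: dict, style: str = "simple") -> str:
    """Build a citation string in one of the three styles above."""
    if style == "simple":
        return f"(Source: {meta['filename']})"
    if style == "detailed":
        return f"(Source: {meta['filename']}, page {meta['page_number']}, section {meta['section']})"
    if style == "legal":
        # version and executed are hypothetical fields, not part of the base schema
        return (f"({meta['document_name']} v{meta['version']}, executed {meta['executed']}, "
                f"section {meta['section']}, page {meta['page_number']})")
    raise ValueError(f"unknown citation style: {style!r}")


meta = {"filename": "filename.pdf", "document_name": "Service Agreement",
        "version": "2.1", "executed": "2024-01-15", "section": "3.2", "page_number": 7}
print(cite(meta, "legal"))
```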
THE RESULTS:
- Base RAG: Answers without sources
- Extended RAG: Every answer cited
- Legal team: Approved (had rejected basic RAG)
- Audit compliance: 100% traceable
- User trust: Significantly increased
REAL EXAMPLE:
Question: 'What is our refund policy for enterprise customers?'
Basic RAG: 'Enterprise customers can request refunds within 30 days with manager approval.'
Citation-enabled RAG: 'Enterprise customers can request refunds within 30 days with manager approval. (Source: Enterprise Terms of Service v3.2, page 12, section 4.3, last updated 2024-03-01)'
Can verify. Can audit. Can trust.
CONFIGURATION:
Metadata schema: Define in SET node. Consistent across documents.
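"Consistent across documents" is easy to check mechanically. A minimal sketch, assuming metadata records are plain dicts validated before splitting; `REQUIRED_FIELDS` mirrors the schema listed earlier and `schema_problems` is a hypothetical helper.

```python
REQUIRED_FIELDS = ("document_id", "document_name", "document_type",
                   "page_number", "section", "last_updated", "owner")


def schema_problems(records: list) -> list:
    """Report every record that is missing a required field or carries an unexpected one."""
    expected = set(REQUIRED_FIELDS)
    problems = []
    for i, record in enumerate(records):
        keys = set(record)
        missing = expected - keys
        extra = keys - expected
        if missing:
            problems.append(f"record {i}: missing {sorted(missing)}")
        if extra:
            problems.append(f"record {i}: unexpected {sorted(extra)}")
    return problems


good = dict.fromkeys(REQUIRED_FIELDS, "x")
print(schema_problems([good, {"document_id": "1"}]))
```

Failing the ingest run on a non-empty result keeps one malformed document from producing chunks that cannot be cited.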
Vector store: Qdrant metadata filtering. Query specific document types or date ranges.
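What a document-type and date-range filter does, written out in plain Python. This only mimics the logic: in Qdrant the conditions are sent as a payload filter with the search request, so non-matching points are excluded during the search rather than afterwards. The function name, the `metadata` key layout and the ISO date strings are assumptions.

```python
from datetime import date


def filter_points(points, document_type=None, updated_from=None, updated_to=None):
    """Keep points matching a document type and/or an inclusive last_updated range."""
    kept = []
    for point in points:
        meta = point["metadata"]
        if document_type is not None and meta.get("document_type") != document_type:
            continue
        # Only parse the date when a date condition is actually set.
        if updated_from is not None or updated_to is not None:
            updated = date.fromisoformat(meta["last_updated"])
            if updated_from is not None and updated < updated_from:
                continue
            if updated_to is not None and updated > updated_to:
                continue
        kept.append(point)
    return kept


points = [
    {"id": 1, "metadata": {"document_type": "contract", "last_updated": "2024-01-15"}},
    {"id": 2, "metadata": {"document_type": "policy", "last_updated": "2024-03-01"}},
    {"id": 3, "metadata": {"document_type": "contract", "last_updated": "2023-06-30"}},
]
recent_contracts = filter_points(points, document_type="contract", updated_from=date(2024, 1, 1))
print([p["id"] for p in recent_contracts])  # [1]
```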
THE LESSON:
RAG without citations is a demo. RAG with citations is production. Legal and compliance teams require source tracking.
TEMPLATE:
Complete citation-enabled RAG system. Metadata schema, vector store config, agent tool with citations.
How do you handle source tracking in RAG?