Supplementary reading for the research paper
Hallucination Index Results
📌 Additional Context Improves RAG
Supplying the model with additional context has emerged as a key strategy for reducing dependency on vector databases and improving the reliability of retrieval-augmented generation (RAG).
🧮 Impact of Context Length
Context length isn’t just a technical spec—it directly influences:
- Retrieval strategy architecture
- Latency and compute overhead
- Balance between recall breadth and precision focus
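The recall/latency trade-off above can be made concrete with a back-of-envelope calculation: given a context budget, how many retrieved chunks fit? The token figures and chunk sizes below are assumptions for illustration, not numbers from the paper.

```python
# Rough context-budget arithmetic. Assumes fixed-size chunks; a real system
# would use the model's tokenizer rather than hand-picked token counts.

def chunks_that_fit(context_window: int, prompt_tokens: int,
                    reserve_for_output: int, chunk_tokens: int) -> int:
    """How many retrieved chunks fit in the remaining context budget."""
    budget = context_window - prompt_tokens - reserve_for_output
    return max(0, budget // chunk_tokens)

# e.g. a 128k window with a 500-token prompt, 1k reserved for the answer,
# and 512-token chunks leaves room for ~247 chunks: wide recall, but more
# latency and compute per query.
print(chunks_that_fit(128_000, 500, 1_000, 512))
```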
📊 Comparison of Context Length Features

🧠 Hallucination & Evaluation Methodologies
ChainPoll with GPT-4o
- Polls the model multiple times with Chain-of-Thought prompts and aggregates the resulting verdicts
- Used to estimate hallucination frequency and context adherence
- Useful for cross-domain accuracy benchmarking
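A minimal ChainPoll-style sketch, assuming a hypothetical `complete(prompt)` helper that returns one judge completion (e.g. from GPT-4o). The prompt wording and the five-poll default are illustrative, not the paper's exact setup.

```python
# ChainPoll-style scoring: ask the judge model the same CoT question several
# times and report the fraction of runs that flag a hallucination.

JUDGE_PROMPT = """\
Question: {question}
Response: {response}

Think step by step, then end with only "yes" or "no":
does the response contain information unsupported by the context?
"""

def chainpoll_score(question: str, response: str, complete, polls: int = 5) -> float:
    """Fraction of judge runs that flag a hallucination (0.0 = clean)."""
    votes = 0
    for _ in range(polls):
        verdict = complete(JUDGE_PROMPT.format(question=question, response=response))
        if verdict.strip().lower().endswith("yes"):
            votes += 1
    return votes / polls
```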
Needle Chunk
- Tests the model’s ability to find the most relevant data chunk (the “needle”) embedded in broader context
- Used to simulate whether retrieval stays focused on the relevant passage amid distractor content
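A needle-chunk probe can be simulated in a few lines: bury one relevant chunk among distractors and check whether the model's answer surfaces its fact. The function names and the crude string-match check below are a hypothetical illustration, not the benchmark's actual harness.

```python
import random

def build_haystack(needle: str, distractors: list[str], seed: int = 0) -> str:
    """Insert the needle chunk at a random position among distractor chunks."""
    rng = random.Random(seed)
    chunks = distractors[:]
    rng.shuffle(chunks)
    chunks.insert(rng.randrange(len(chunks) + 1), needle)
    return "\n\n".join(chunks)

def needle_found(answer: str, needle_fact: str) -> bool:
    """Crude substring check; real evaluations use an LLM judge or embeddings."""
    return needle_fact.lower() in answer.lower()
```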
Chain-of-X Frameworks
- Chain-of-Note: Creates notes from retrieved docs, enhancing reflection and synthesis
- Chain-of-Thought: Sequential reasoning to reduce leaps and omissions
- Chain-of-Knowledge: Linked knowledge progression to deepen understanding
- Chain-of-Verification: Prompts validation steps after the initial output to refine the answer (a minimal sketch follows this list)
- Chain-of-Explanation: Justifies response logic for interpretability
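The Chain-of-Verification loop lends itself to a short sketch: draft an answer, plan verification questions, answer them independently, then revise. This reuses the same hypothetical `complete(prompt)` helper as above; the prompts and the three-question budget are illustrative.

```python
def chain_of_verification(question: str, complete) -> str:
    """Draft -> plan checks -> verify -> revise, per the bullet above."""
    draft = complete(f"Answer concisely:\n{question}")
    plan = complete(
        "List 3 short fact-check questions, one per line, that would verify "
        f"this answer:\nQ: {question}\nA: {draft}"
    )
    # Answer each verification question independently of the draft.
    checks = []
    for vq in filter(None, (line.strip() for line in plan.splitlines())):
        checks.append(f"{vq}\n-> {complete(vq)}")
    return complete(
        "Revise the answer so it is consistent with the verification results.\n"
        f"Q: {question}\nDraft: {draft}\nVerification:\n" + "\n".join(checks)
    )
```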
SelfCheck-BERTScore
- Samples multiple responses from the same model and uses BERT embeddings (BERTScore) to measure their semantic agreement, with no ground truth required
- Goes beyond n-gram overlap to capture meaning-level consistency
- Flags a response as likely hallucinated when it disagrees with its own resampled variants
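A coarse sketch of the idea using the open-source `bert_score` package. SelfCheckGPT proper scores sentence by sentence against each sample; this simplified version scores whole responses, so treat it as an approximation.

```python
from bert_score import score

def selfcheck_bertscore(response: str, samples: list[str]) -> float:
    """Inconsistency score in [0, 1]; higher suggests hallucination.

    `samples` are extra responses drawn from the same model at
    temperature > 0 for the same prompt.
    """
    cands = [response] * len(samples)
    # BERTScore F1 between the main response and each resampled response.
    _, _, f1 = score(cands, samples, lang="en", verbose=False)
    # SelfCheckGPT uses per-sentence maxima; this coarser version uses
    # 1 minus the mean F1 over whole responses.
    return 1.0 - f1.mean().item()
```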
Other Evaluation Scores
- G-Eval
- Max pseudo-entropy
- GPTScore
- Random Guessing (baseline)