Source¹
Supplementary reading for the research paper

Hallucination Index Results

[Figure: Hallucination Index Graph]

📌 Additional Context Improves RAG

Adding contextual layers has emerged as a key strategy to reduce dependency on vector databases and improve retrieval-augmented generation (RAG) reliability.

🧮 Impact of Context Length

Context length isn't just a technical spec; it directly influences the dimensions compared below.

📊 Comparison of Context Length Features

[Table: Context Length Comparison]

🧠 Hallucination & Evaluation Methodologies

ChainPoll with GPT-4o
  • Runs the model through a multi-prompt sequence using Chain-of-Thought prompting
  • Used to identify hallucination frequency and context adherence
  • Useful for cross-domain accuracy benchmarking
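The polling idea above can be sketched in a few lines. Everything here is an assumption for illustration: `ask_judge` is a hypothetical stand-in for the actual GPT-4o chain-of-thought judge call, and its simulated verdicts are random.

```python
import random

def ask_judge(response: str, context: str) -> bool:
    """Hypothetical stand-in for a GPT-4o chain-of-thought judge call.

    A real implementation would prompt the model to reason step by step
    about whether `response` is supported by `context`, and parse the
    verdict. Here the verdict is simulated at random for illustration.
    """
    return random.random() < 0.3  # True = "hallucinated"

def chainpoll_score(response: str, context: str, n_polls: int = 5) -> float:
    """ChainPoll-style score: poll the judge several times and return
    the fraction of "hallucinated" verdicts (lower is better)."""
    votes = [ask_judge(response, context) for _ in range(n_polls)]
    return votes.count(True) / n_polls

score = chainpoll_score("The Eiffel Tower is in Berlin.",
                        "The Eiffel Tower is in Paris.")
```

Polling the same judge repeatedly and aggregating smooths out single-shot noise in the verdict, which is what makes the score usable as a frequency estimate.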
Needle Chunk
  • Tests the model’s ability to find the most relevant data chunk (the “needle”) embedded in broader context
  • Used to simulate retrieval focus
Chain-of-X Frameworks
  • Chain-of-Note: Creates notes from retrieved docs, enhancing reflection and synthesis
  • Chain-of-Thought: Sequential reasoning to reduce leaps and omissions
  • Chain-of-Knowledge: Linked knowledge progression to deepen understanding
  • Chain-of-Verification: Prompts validation steps after the initial output to refine answers
  • Chain-of-Explanation: Justifies response logic for interpretability
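As one example from the list, Chain-of-Verification can be sketched as a prompt pipeline: draft, derive verification questions, answer them, revise. `call_llm` is a hypothetical stub returning canned text so the sketch runs without an API; it is not a real client library.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat-model call; returns canned
    text here so the sketch is runnable without an API key."""
    return f"<model output for: {prompt[:40]}...>"

def chain_of_verification(question: str) -> str:
    """Chain-of-Verification sketch: draft an answer, derive
    verification questions, answer them independently of the draft,
    then revise the draft against that evidence."""
    draft = call_llm(f"Answer the question: {question}")
    checks = call_llm(f"List fact-check questions for this draft: {draft}")
    evidence = call_llm(f"Answer each verification question: {checks}")
    return call_llm(
        "Revise the draft using the verification answers.\n"
        f"Draft: {draft}\nEvidence: {evidence}"
    )

result = chain_of_verification("Who discovered penicillin?")
```

Answering the verification questions in a separate call, without showing the draft, is the step that keeps the check independent of the original mistake.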
SelfCheck-BERTScore
  • Evaluates semantic similarity to ground truth using BERT embeddings
  • Goes beyond n-gram exactness to capture intent-level matches
  • Also checks internal consistency of generated output
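The consistency-check side can be sketched as follows, using `difflib` string similarity as a crude lexical stand-in for BERT-embedding similarity (a real implementation would compare embeddings, e.g. via the `bert-score` package; the example sentences are invented).

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Crude lexical stand-in for BERTScore; a real implementation
    would compare BERT token embeddings instead of raw characters."""
    return SequenceMatcher(None, a, b).ratio()

def selfcheck_score(answer: str, samples: list[str]) -> float:
    """Average similarity of `answer` to independently sampled answers.
    A low score means the model contradicts itself across samples,
    which is treated as a hallucination signal."""
    return sum(similarity(answer, s) for s in samples) / len(samples)

consistent = selfcheck_score("Paris is the capital of France.",
                             ["Paris is the capital of France.",
                              "The capital of France is Paris."])
inconsistent = selfcheck_score("Paris is the capital of France.",
                               ["Berlin is the capital of Germany.",
                                "Madrid is Spain's capital city."])
assert consistent > inconsistent
```

The design choice worth noting: the check needs no ground truth at all, only multiple samples from the same model, which is what makes it usable when no reference answer exists.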
Other Evaluation Scores
  • G-Eval
  • Max pseudo-entropy
  • GPTScore
  • Random Guessing (baseline)
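One plausible reading of max pseudo-entropy, sketched under the assumption that it takes the maximum per-token Shannon entropy over the generated sequence as the uncertainty score; the report's exact formulation may differ, and the probability lists below are invented.

```python
import math

def token_entropy(probs: list[float]) -> float:
    """Shannon entropy of one token's probability distribution (nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def max_pseudo_entropy(per_token_probs: list[list[float]]) -> float:
    """Assumed formulation: the maximum per-token entropy across the
    sequence, so one highly uncertain token flags the whole output."""
    return max(token_entropy(p) for p in per_token_probs)

# A confident generation vs. one with a near-uniform (uncertain) token.
confident = [[0.97, 0.02, 0.01], [0.99, 0.01]]
uncertain = [[0.4, 0.3, 0.3], [0.5, 0.5]]
assert max_pseudo_entropy(uncertain) > max_pseudo_entropy(confident)
```

Taking the max rather than the mean makes the score sensitive to a single low-confidence token, which is useful when hallucinations are localized to one fabricated span.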

¹ Report from Galileo