Hybrid Retrieval Augmented Generation for Technical Document QA

Maria Santos Perez Poster Presentation

Maria Santos Perez

Co-Presenters: Individual Presentation

College: Hennings College of Science, Mathematics and Technology

Major: M.S. Computer Science

Faculty Research Mentor: Dan Liu

Abstract:

Organizations that rely on large technical manuals often face challenges in quickly locating accurate and authoritative information. Traditional document systems require manual searching across hundreds of pages, which can slow onboarding, reduce efficiency, and increase the risk of misinterpretation. This project presents the design and evaluation of a citation-grounded AI assistant intended to modernize legacy documentation systems while maintaining strict evidence control.

The system implements a retrieval-augmented generation (RAG) architecture combining vector similarity search with keyword-based retrieval in a PostgreSQL database using pgvector. Documents are parsed into structured chunks that preserve section hierarchies, tables, equations, and cross-references. A hybrid ranking strategy dynamically adjusts weights between semantic similarity and exact token matching to improve precision for section numbers, engineering standards, and table lookups. The language model is constrained to generate responses exclusively from retrieved content and must return a deterministic fallback when sufficient evidence is not found.
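
As an illustration of the hybrid ranking and fallback behavior described above, the following is a minimal Python sketch, not the project's actual implementation. It assumes each candidate chunk already carries a semantic score and a keyword score normalized to [0, 1]. The regular expression, the 0.7/0.3 weights, the 0.5 evidence threshold, the fallback wording, and all function names are hypothetical choices made for this example.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical fallback text; the abstract only states that the fallback is deterministic.
FALLBACK = "No sufficient evidence found in the provided documentation."

# Hypothetical pattern for queries that need exact matching:
# section numbers (4.2.1), table references (Table 7), standard IDs (ISO 9001).
EXACT_PATTERN = re.compile(
    r"\b\d+(?:\.\d+)+\b|\b[Tt]able\s+\d+\b|\b[A-Z]{2,}[- ]?\d+\b"
)


@dataclass
class Candidate:
    chunk_id: str
    semantic: float  # assumed: vector similarity normalized to [0, 1]
    keyword: float   # assumed: keyword match score normalized to [0, 1]


def keyword_weight(query: str) -> float:
    """Favor exact token matching when the query names a section, table, or standard."""
    return 0.7 if EXACT_PATTERN.search(query) else 0.3


def hybrid_rank(query: str, candidates: list[Candidate],
                min_score: float = 0.5) -> list[tuple[str, float]]:
    """Blend both scores with a query-dependent weight and drop weak evidence."""
    w = keyword_weight(query)
    scored = [(c.chunk_id, w * c.keyword + (1 - w) * c.semantic) for c in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(cid, score) for cid, score in scored if score >= min_score]


def answer(query: str, candidates: list[Candidate],
           generate: Callable[[str, list[str]], str]) -> str:
    """Call the generator only when evidence survives; otherwise return the fallback."""
    ranked = hybrid_rank(query, candidates)
    if not ranked:
        return FALLBACK
    return generate(query, [cid for cid, _ in ranked])
```

In this sketch, a query such as "What does section 4.2.1 require?" shifts the weight toward the keyword score, while a conceptual question leans on semantic similarity. The `generate` callable stands in for the constrained language model call and receives only the chunk IDs that passed the evidence threshold.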

Evaluation using a structured benchmark set demonstrates strong retrieval coverage and reliable citation grounding, particularly for table-based and cross-referenced queries. This work contributes a practical framework for organizations seeking to upgrade static documentation systems into interactive, AI-powered knowledge tools while minimizing hallucination risk and ensuring traceable, evidence-based outputs. Future work includes expanded benchmarking and automated change-tracking integration.
