Hybrid Retrieval-Augmented Generation for Technical Document QA
Maria Santos Perez
Co-Presenters: Individual Presentation
College: Hennings College of Science, Mathematics and Technology
Major: MS Computer Science
Faculty Research Mentor: Dan Liu
Abstract:
Organizations that rely on large technical manuals often face challenges in quickly locating accurate and authoritative information. Traditional document systems require manual searching across hundreds of pages, which can slow onboarding, reduce efficiency, and increase the risk of misinterpretation. This project presents the design and evaluation of a citation-grounded AI assistant intended to modernize legacy documentation systems while maintaining strict evidence control.
The system implements a retrieval-augmented generation (RAG) architecture that combines vector similarity search with keyword-based retrieval in a PostgreSQL database using pgvector. Documents are parsed into structured chunks that preserve section hierarchies, tables, equations, and cross-references. A hybrid ranking strategy dynamically adjusts the weights given to semantic similarity and exact token matching, improving precision for queries about section numbers, engineering standards, and table lookups. The language model is constrained to generate responses exclusively from retrieved content and must return a deterministic fallback response when the retrieved evidence is insufficient.
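The hybrid ranking and fallback behavior described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the regular expression, the weights 0.7 and 0.3, the 0.5 evidence threshold, the fallback wording, and every name in the code are assumptions chosen for illustration. The sketch also assumes both scores are already normalized to the range 0 to 1; in a PostgreSQL setup they might come from pgvector's cosine distance operator and from full-text ranking, but that mapping is likewise an assumption.

```python
import re
from dataclasses import dataclass

# Hypothetical fixed response returned when the evidence is too weak.
FALLBACK = "No sufficient evidence was found in the indexed documents."

# Assumed heuristic for "exact lookup" queries: dotted section numbers (4.2.1),
# "Table N" / "Section N" references, or standard-like codes (ASTM A36).
EXACT_PATTERN = re.compile(
    r"\b\d+(?:\.\d+)+\b|\b(?i:table|section)\s+\d+|\b[A-Z]{2,}[ -]?[A-Z]?\d+\b"
)


@dataclass
class Candidate:
    chunk_id: str
    vector_sim: float     # semantic similarity, assumed normalized to [0, 1]
    keyword_score: float  # exact-token match score, assumed normalized to [0, 1]


def keyword_weight(query: str) -> float:
    """Shift weight toward keyword matching when the query looks like a lookup."""
    return 0.7 if EXACT_PATTERN.search(query) else 0.3


def hybrid_score(c: Candidate, w_kw: float) -> float:
    """Convex combination of the semantic and keyword scores."""
    return (1.0 - w_kw) * c.vector_sim + w_kw * c.keyword_score


def rank(query: str, candidates: list[Candidate]) -> list[tuple[Candidate, float]]:
    """Score every candidate with the query-dependent weight, best first."""
    w_kw = keyword_weight(query)
    scored = [(c, hybrid_score(c, w_kw)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def select_evidence(query: str, candidates: list[Candidate],
                    min_score: float = 0.5, top_k: int = 3) -> list[str]:
    """Return chunk ids that clear the evidence threshold (may be empty)."""
    return [c.chunk_id for c, s in rank(query, candidates)[:top_k] if s >= min_score]


def respond(query: str, candidates: list[Candidate], generate) -> str:
    """Call the generator only when evidence exists; otherwise return FALLBACK."""
    evidence = select_evidence(query, candidates)
    if not evidence:
        return FALLBACK
    return generate(query, evidence)


if __name__ == "__main__":
    chunks = [Candidate("sec-4.2", 0.9, 0.1), Candidate("table-7", 0.4, 1.0)]
    # Stand-in for the language model call: it only cites the evidence it was given.
    cite = lambda q, ev: "Answer grounded in: " + ", ".join(ev)
    print(respond("What is the torque limit in Table 7?", chunks, cite))
    print(respond("How do I calibrate the sensor?", chunks, cite))
    print(respond("How do I calibrate the sensor?", [Candidate("misc", 0.2, 0.0)], cite))
```

Because the fallback is decided before the generator is called, the same query over the same retrieved chunks always produces the same fallback decision, which is one simple way to keep that behavior deterministic.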
Evaluation using a structured benchmark set demonstrates strong retrieval coverage and reliable citation grounding, particularly for table-based and cross-referenced queries. This work contributes a practical framework for organizations seeking to upgrade static documentation systems into interactive, AI-powered knowledge tools while minimizing hallucination risk and ensuring traceable, evidence-based outputs. Future work includes expanded benchmarking and automated change-tracking integration.