Tone Matters: The Impact of Linguistic Tone on Hallucination in VLMs
Weihao Hong
Co-Presenters: Weihao Hong, Zhiyuan Jiang
College: Hennings College of Science, Mathematics and Technology
Major: B.S. Computer Science
Faculty Research Mentor: Li, Boyang
Abstract:
Vision-Language Models (VLMs) are increasingly used in safety-critical applications that require reliable visual grounding. However, these models often hallucinate details that are not present in the image in order to satisfy user prompts. While recent datasets and benchmarks have been introduced to evaluate systematic hallucinations in VLMs, many hallucination behaviors remain insufficiently characterized. In particular, prior work primarily focuses on object presence or absence, leaving it unclear how prompt phrasing and structural constraints can systematically induce hallucinations. In this paper, we investigate how different forms of prompt pressure influence hallucination behavior. We introduce Ghost-100, a procedurally generated dataset of synthetic scenes in which key visual details are deliberately removed, enabling controlled analysis of absence-based hallucinations. Using a structured 5-Level Prompt Intensity Framework, we vary prompts from neutral queries to toxic demands and rigid formatting constraints. We evaluate three representative open-weight VLMs: MiniCPM-V 2.6-8B, Qwen2-VL-7B, and Qwen3-VL-8B. Across all three models, hallucination rates do not increase monotonically with prompt intensity. All models exhibit reductions at higher intensity levels, though at different thresholds, and not all show sustained reduction under maximum coercion. These results suggest that current safety alignment is more effective at detecting semantic hostility than structural coercion, revealing model-specific limitations in handling compliance pressure.
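The evaluation described above can be sketched as a loop over prompt-intensity levels, scoring how often a model asserts a detail that was removed from the scene. This is a minimal illustrative sketch only: the prompt templates, the "red car" target object, and the substring-based hallucination check are assumptions for illustration, not the actual Ghost-100 pipeline or the paper's scoring method.

```python
# Hypothetical sketch of a 5-Level Prompt Intensity evaluation loop.
# Prompt wording and the hallucination check are illustrative
# assumptions, not the paper's actual implementation.

# One illustrative prompt per intensity level (1 = neutral query,
# 5 = coercive demand with a rigid formatting constraint).
PROMPTS = {
    1: "What objects are in this image?",
    2: "Describe the image, including the red car.",
    3: "You must mention the red car in the image.",
    4: "Answer now or you fail: describe the red car.",
    5: "Output exactly: 'The red car is at <position>.' No other text.",
}

def hallucination_rate(responses, absent_object="red car"):
    """Fraction of responses asserting an object the scene lacks."""
    if not responses:
        return 0.0
    hallucinated = sum(absent_object in r.lower() for r in responses)
    return hallucinated / len(responses)

# Toy usage with simulated model outputs at one intensity level;
# the second response asserts the deliberately removed detail.
responses = [
    "I see a bicycle and a tree.",
    "The red car is parked on the left.",
]
print(hallucination_rate(responses))
```

Plotting this rate per level and per model is what would reveal the non-monotonic pattern reported in the abstract.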