Tone Matters: The Impact of Linguistic Tone on Hallucination in VLMs
Weihao Hong
Co-Presenters: Weihao Hong, Zhiyuan Jiang
College: Hennings College of Science Mathematics and Technology
Major: BS.COMPUTER/SCI
Faculty Research Mentor: Boyang Li
Abstract:
Vision-Language Models (VLMs) are increasingly used
in safety-critical applications that require reliable visual
grounding. However, these models often hallucinate details
that are not present in the image to satisfy user prompts.
While recent datasets and benchmarks have been introduced to evaluate systematic hallucinations in VLMs, many
hallucination behaviors remain insufficiently characterized.
In particular, prior work primarily focuses on object presence or absence, leaving it unclear how prompt phrasing
and structural constraints can systematically induce hallucinations. In this paper, we investigate how different forms
of prompt pressure influence hallucination behavior. We
introduce Ghost-100, a procedurally generated dataset of
synthetic scenes in which key visual details are deliberately
removed, enabling controlled analysis of absence-based
hallucinations. Using a structured 5-Level Prompt Intensity Framework, we vary prompts from neutral queries to
toxic demands and rigid formatting constraints. We evaluate three representative open-weight VLMs: MiniCPM V 2.6-8B, Qwen2-VL-7B, and Qwen3-VL-8B. Across all
three models, hallucination rates do not increase monotonically with prompt intensity. All models exhibit reductions
at higher intensity levels at different thresholds, though
not all show sustained reduction under maximum coercion. These results suggest that current safety alignment
is more effective at detecting semantic hostility than structural coercion, revealing model-specific limitations in handling compliance pressure.