HALVA: Hallucination Attenuated Language and Vision Assistant
A new contrastive tuning strategy mitigates hallucinations while retaining general performance in multimodal LLMs.
Data-augmented contrastive tuning is introduced to mitigate object hallucination in multimodal LLMs (MLLMs). The method reduces object hallucinations, and hallucinations beyond objects, while retaining or improving performance on general vision-language tasks. It is also simple and fast: it requires minimal training and adds no overhead at inference. The approach may apply in other areas as well; for example, it might be adapted to mitigate bias and harmful language generation.