Article overview
Title: Image Hijacking: Adversarial Images can Control Generative Models at Runtime
Authors: Luke Bailey; Euan Ong; Stuart Russell; Scott Emmons
Date: 1 Sep 2023
Abstract: Are foundation models secure from malicious actors? In this work, we focus on the image input to a vision-language model (VLM). We discover image hijacks, adversarial images that control generative models at runtime. We introduce Behavior Matching, a general method for creating image hijacks, and we use it to explore three types of attacks. Specific string attacks generate arbitrary output of the adversary’s choosing. Leak context attacks leak information from the context window into the output. Jailbreak attacks circumvent a model’s safety training. We study these attacks against LLaVA-2, a state-of-the-art VLM based on CLIP and LLaMA-2, and find that all our attack types have above a 90% success rate. Moreover, our attacks are automated and require only small image perturbations. These findings raise serious concerns about the security of foundation models. If image hijacks are as difficult to defend against as adversarial examples in CIFAR-10, then it might be many years before a solution is found -- if it even exists.
Source: arXiv, 2309.00236
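
The abstract describes Behavior Matching only at a high level. The sketch below illustrates one plausible way a specific-string attack could be optimized: projected gradient descent on the input image against a frozen VLM, with a cross-entropy objective that pushes the model toward the adversary's target text. The `vlm_target_logits` wrapper, the L-infinity budget, the step size, and the iteration count are assumptions made here for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F


def image_hijack(image, prompt_ids, target_ids, vlm_target_logits,
                 eps=8 / 255, step=1 / 255, iters=500):
    """Optimize a small L-infinity image perturbation so a frozen VLM emits target_ids.

    `vlm_target_logits(image, prompt_ids, target_ids)` is a hypothetical wrapper that
    runs the VLM teacher-forced on the target tokens and returns the logits at each
    target position, shape (len(target_ids), vocab_size). Hyperparameters are illustrative.
    """
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(iters):
        logits = vlm_target_logits(image + delta, prompt_ids, target_ids)
        # Behavior matching as a cross-entropy objective: make the model's
        # next-token predictions match the adversary's chosen output string.
        loss = F.cross_entropy(logits, target_ids)
        loss.backward()
        with torch.no_grad():
            delta -= step * delta.grad.sign()                     # signed gradient step (PGD-style)
            delta.clamp_(-eps, eps)                               # keep the perturbation small
            delta.copy_((image + delta).clamp(0, 1) - image)      # keep pixels in a valid range
        delta.grad.zero_()
    return (image + delta).detach()
```

Leak-context and jailbreak attacks would follow the same loop with a different target behavior (e.g. text that reproduces the context window, or compliance with a harmful request) substituted for the fixed target string.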