‘Typographic attacks’ bring OpenAI image recognition to its knees

CLIP’s identification of an apple before and after attaching a piece of paper that says ‘iPod’ to it.
Screenshot: OpenAI

Tricking a Terminator into not shooting you could be as simple as holding up a giant sign that says ROBOT, at least until OpenAI, the Elon Musk-backed research lab, trains its image recognition system not to misidentify things based on some scribbles from a Sharpie.

OpenAI researchers published work last week on CLIP, the neural network that serves as the lab’s state-of-the-art system for letting computers recognize the world around them. Neural networks are machine learning systems that can be trained over time to improve at a given task – in CLIP’s case, identifying objects from an image – using a network of interconnected nodes, in ways that are not always immediately clear to the systems’ developers. The research published last week concerns “multimodal neurons”, which exist both in biological systems such as the brain and in artificial ones such as CLIP; they “respond to clusters of abstract concepts centered around a common high-level theme, rather than any specific visual feature.” At its highest layers, CLIP organizes images based on a “semantic collection of ideas”.

For example, the OpenAI team wrote, CLIP has a multimodal “Spider-Man” neuron that fires when it sees an image of a spider, the word “spider”, or an image or drawing of the eponymous superhero. One side effect of multimodal neurons, according to the researchers, is that they can be used to fool CLIP: the research team managed to trick the system into identifying an apple (the fruit) as an iPod (the device made by Apple) simply by taping a piece of paper that says “iPod” to it.

CLIP’s identification of an apple before and after attaching a piece of paper that says ‘iPod’ to it.
Graphic: OpenAI

What’s more, the system was actually more confident that it had correctly identified the item in question when this happened.
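The trick comes down to CLIP’s zero-shot setup, in which a photo is scored against a handful of candidate captions and the highest-scoring caption wins. Below is a minimal sketch using OpenAI’s open-source CLIP package; the filename “apple.jpg” and the two candidate labels are illustrative assumptions, not the exact prompts the researchers used.

import torch
import clip  # OpenAI's open-source CLIP package: pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Candidate captions the photo is scored against.
labels = ["an apple", "an iPod"]
text_tokens = clip.tokenize(labels).to(device)

# "apple.jpg" stands in for a photo of the fruit, with or without the handwritten "iPod" note attached.
image = preprocess(Image.open("apple.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    # CLIP embeds the image and each caption, then scores the pairs by scaled cosine similarity.
    logits_per_image, _ = model(image, text_tokens)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()[0]

for label, p in zip(labels, probs):
    print(f"{label}: {p:.1%}")

With a plain apple, “an apple” should dominate; with the handwritten note in frame, the probability mass shifts toward “an iPod” – which is the typographic attack in a nutshell.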

The research team referred to the flaw as a “typographic attack” because it would be trivial for anyone aware of the issue to exploit it deliberately:

We believe attacks such as those described above are far from simply an academic concern. By exploiting the model’s ability to read text robustly, we find that even photographs of handwritten text can often fool the model.

[…] We also believe that these attacks may take a more subtle, less conspicuous form. An image, given to CLIP, is abstracted in many subtle and sophisticated ways, and these abstractions may over-abstract common patterns – oversimplifying and, as a result, over-generalizing.

This is less a failure of CLIP than an illustration of how complicated the underlying associations it has built up over time really are. Per the Guardian, OpenAI’s research indicates that the conceptual models CLIP constructs are, in many ways, similar to the functioning of a human brain.

The researchers anticipated that the apple/iPod issue was just one obvious example of a problem that could manifest in countless other ways in CLIP, as its multimodal neurons “generalize across the literal and the iconic, which may be a double-edged sword.” For example, the system identifies a piggy bank as the combination of the “finance” and “dolls, toys” neurons. The researchers found that CLIP would accordingly identify an image of a standard poodle as a piggy bank when they forced the finance neuron to fire by drawing dollar signs on it.

The research team noted that the technique is similar to “adversarial images,” which are images crafted to make neural networks see something that isn’t there. But it is generally far cheaper to carry out, since all it requires is paper and some way of writing on it. (As The Register has noted, visual recognition systems are still largely in their infancy and vulnerable to a range of other simple attacks, such as the Tesla Autopilot system that McAfee Labs researchers tricked into reading a 35 mph speed limit sign as an 85 mph sign with a few inches of electrical tape.)
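For comparison, digital adversarial images are typically generated by nudging pixel values in the direction that raises a classifier’s loss. The sketch below shows the classic fast gradient sign method (FGSM); the model, image tensor, and label are generic placeholders rather than anything specific to CLIP.

import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, true_label, epsilon=0.01):
    # Return a copy of `image` perturbed to increase the classifier's loss (FGSM).
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    # Step each pixel slightly in the direction that most increases the loss;
    # the change is imperceptible per pixel but can flip the model's prediction.
    return (image + epsilon * image.grad.sign()).detach().clamp(0, 1)

A typographic attack skips all of this gradient math: a Sharpie and a sticky note do the job.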

CLIP’s associative model, the researchers added, also has the potential to go wrong and generate bigoted or racist conclusions about various types of people:

We observe, for example, a “Middle East” neuron [1895] with an association with terrorism; and an “immigration” neuron [395] that responds to Latin America. We have even found a neuron that fires for both dark-skinned people and gorillas [1257], mirroring earlier photo tagging incidents in other models that we consider unacceptable.

“We believe that these investigations of CLIP only scratch the surface in understanding CLIP’s behavior, and we invite the research community to join us in improving our understanding of CLIP and similar models,” the researchers wrote.

CLIP is not the only project OpenAI is working on. Its GPT-3 text generator, which OpenAI researchers described in 2019 as too dangerous to release, has come a long way and is now capable of generating natural-sounding (but not necessarily convincing) fake news articles. In September 2020, Microsoft acquired an exclusive license to put GPT-3 to work.
