OpenAI’s state-of-the-art machine vision AI is fooled by handwritten notes

Researchers at machine learning lab OpenAI have found that their state-of-the-art computer vision system can be defeated by tools no more sophisticated than a pen and a pad of paper. As illustrated in the image above, simply writing the name of an object and sticking that label onto another object can be enough to deceive the software into misidentifying what it sees.

“We refer to these attacks as typographic attacks,” OpenAI’s researchers write in a blog post. “By exploiting the model’s ability to read text robustly, we find that even photographs of hand-written text can often fool the model.” They note that such attacks are similar to “adversarial images” that can trick commercial machine vision systems, but are far simpler to produce.

Adversarial images pose a real danger to systems that rely on machine vision. Researchers have shown, for example, that they can trick the software in Tesla’s self-driving cars into changing lanes without warning simply by placing certain stickers on the road. Such attacks are a serious threat to a variety of AI applications, from the medical to the military.

But the danger posed by this specific attack is, at least for now, nothing to worry about. The OpenAI software in question is an experimental system called CLIP that is not deployed in any commercial product. Indeed, the very nature of CLIP’s unusual machine learning architecture is what created the weakness that allows this attack to succeed.

“Multimodal neurons” in CLIP respond to photographs of an object as well as to sketches and text.
Image: OpenAI

CLIP is intended to explore how AI systems might learn to identify objects without close supervision, by training on huge databases of image and text pairs. In this case, OpenAI used around 400 million image–text pairs scraped from the internet to train CLIP, which was unveiled in January.
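That training setup is contrastive: an image encoder and a text encoder are trained jointly so that each image scores highest against its own caption. The toy sketch below illustrates the symmetric cross-entropy objective described in the CLIP paper; the tiny linear “encoders” and the random batch are placeholders for illustration, not OpenAI’s actual code.

```python
# Minimal sketch of a CLIP-style contrastive objective (illustrative only).
import torch
import torch.nn.functional as F

batch_size, embed_dim = 8, 64

# Stand-ins for the real image and text encoders.
image_encoder = torch.nn.Linear(2048, embed_dim)   # e.g. pooled vision features
text_encoder = torch.nn.Linear(512, embed_dim)     # e.g. pooled token features

image_batch = torch.randn(batch_size, 2048)        # fake image features
text_batch = torch.randn(batch_size, 512)          # fake matching captions

# Project into a shared embedding space and L2-normalize.
img_emb = F.normalize(image_encoder(image_batch), dim=-1)
txt_emb = F.normalize(text_encoder(text_batch), dim=-1)

# Cosine similarities scaled by a temperature (learned in the real model).
logit_scale = 1 / 0.07
logits = logit_scale * img_emb @ txt_emb.t()       # shape: (batch, batch)

# Symmetric cross-entropy: image i should match caption i, and vice versa.
labels = torch.arange(batch_size)
loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
print(loss.item())
```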

This month, OpenAI researchers published a new paper describing how they opened up CLIP to see how it works. They discovered what they call “multimodal neurons”: individual components of the machine learning network that respond not only to images of objects but also to the associated text. One reason this is exciting is that it appears to mirror how the human brain reacts to stimuli, where individual brain cells have been observed responding to abstract concepts rather than specific examples. OpenAI’s research suggests it may be possible for AI systems to internalize knowledge in the same way humans do.
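One informal way to look for this behavior yourself is to record a single unit’s activation while feeding the model a photo, a drawing, and a rendered word for the same concept. The sketch below does this with a PyTorch forward hook on the open-source CLIP release; the choice of backbone, layer, channel index, and image files is purely illustrative, not the set of neurons OpenAI analyzed.

```python
# Hedged sketch: probing one unit's response to different renditions of a concept.
# Assumes the open-source CLIP package (github.com/openai/CLIP) and PyTorch.
# The layer, channel index, and file names below are illustrative guesses.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50", device=device)

activations = {}

def hook(_module, _inputs, output):
    # Store the spatially averaged activation of every channel in this layer.
    activations["feat"] = output.mean(dim=(2, 3)).detach()

# Arbitrary intermediate layer of the vision backbone.
model.visual.layer4.register_forward_hook(hook)

unit = 123  # hypothetical channel index
for name in ["spider_photo.jpg", "spider_drawing.jpg", "spider_word.jpg"]:
    image = preprocess(Image.open(name)).unsqueeze(0).to(device)
    with torch.no_grad():
        model.encode_image(image)
    print(name, activations["feat"][0, unit].item())
```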

In the future, this could lead to more sophisticated vision systems, but for now such approaches are in their infancy. While any human can tell the difference between an apple and a piece of paper with the word “apple” written on it, software like CLIP cannot. The same ability that allows the program to link words and images at an abstract level creates this unique weakness, which OpenAI describes as the “fallacy of abstraction”.
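The apple demonstration amounts to ordinary zero-shot classification: encode a few candidate labels as text prompts, encode the photo, and pick the label the model scores highest. Below is a minimal sketch using the open-source CLIP package; the image file and prompt wording are placeholders.

```python
# Hedged sketch: zero-shot classification of the kind used to show typographic
# attacks. Requires the open-source CLIP package; the image path and label
# wording are placeholders.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

labels = ["an apple", "an iPod", "a handwritten note"]
text = clip.tokenize([f"a photo of {label}" for label in labels]).to(device)

# e.g. a photo of an apple with a paper label stuck to it (placeholder path).
image = preprocess(Image.open("labelled_apple.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).squeeze(0)

for label, p in zip(labels, probs.tolist()):
    print(f"{label}: {p:.2%}")
```

If the label on the piece of paper dominates the prediction, the typographic attack has succeeded.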

Another example of a typographic attack. Do not rely on AI to put your money in the piggy bank.
Image: OpenAI

Another example given by the lab is the CLIP neuron that identifies piggy banks. This component responds not only to pictures of piggy banks but also to strings of dollar signs. As in the example above, this means you can trick CLIP into identifying a chainsaw as a piggy bank by overlaying it with “$$$” strings, as if it were half-price at your local hardware store.
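A crude version of that overlay can be mocked up with Pillow before running the same zero-shot check as above; the file names, text color, and placement here are arbitrary choices, not OpenAI’s setup.

```python
# Hedged sketch: scattering "$$$" strings over a photo before classification,
# in the spirit of the piggy-bank example. File names and layout are arbitrary.
from PIL import Image, ImageDraw

image = Image.open("chainsaw.jpg").convert("RGB")   # placeholder photo
draw = ImageDraw.Draw(image)

# Scatter a few rows of dollar signs across the picture.
for y in range(20, image.height, 80):
    draw.text((20, y), "$$$ $$$ $$$", fill="green")

image.save("chainsaw_dollars.jpg")
# Score the result against prompts like "a photo of a chainsaw" and
# "a photo of a piggy bank" using the zero-shot snippet shown earlier.
```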

The researchers also found that CLIP’s multimodal neurons encode exactly the kind of biases you might expect to find when sourcing your data from the internet. They note that the model’s “Middle East” neuron is also associated with terrorism, and that they found “a neuron that fires for both dark-skinned people and gorillas.” This replicates an infamous error in Google’s image recognition system, which labeled Black people as gorillas. It is yet another example of how machine intelligence differs from that of humans, and why pulling apart the former to understand how it works is necessary before we trust our lives to AI.
