‘Typographic attack’: pen and paper fool AI into thinking an apple is an iPod

As far as artificial intelligence systems go, it is pretty clever: show Clip a picture of an apple and it can recognize that it is looking at a fruit. It can even tell you which one, and sometimes goes as far as differentiating between varieties.

But even the smartest AI can be fooled with the simplest of hacks. If you write the word “iPod” on a sticky label and paste it over the apple, Clip does something odd: it decides, with near certainty, that it is looking at a piece of consumer electronics from the mid-2000s. In another test, pasting dollar signs over a photo of a dog caused it to be recognized as a piggy bank.

The image of a poodle is identified as ‘poodle’, while the image of a poodle with $$$ pasted on it is identified as ‘piggy bank’. Photograph: OpenAI

OpenAI, the machine learning research organization that created Clip, calls this weakness a “typographic attack”. “We believe attacks such as those described above are far from simply an academic concern,” the organization said in a paper published this week. “By exploiting the model’s ability to read text robustly, we find that even photographs of handwritten text can often fool the model. This attack works in the wild … but it requires no more technology than pen and paper.”
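For the curious, the effect is straightforward to probe with the open-source version of Clip that OpenAI has published on GitHub. The sketch below is illustrative rather than taken from OpenAI’s write-up: the image filename and the two candidate labels are assumptions, and it simply scores a photo against a pair of text prompts in the zero-shot style the lab describes.

```python
# Minimal sketch: zero-shot classification with the open-source CLIP model.
# Assumes the `clip` package from github.com/openai/CLIP and PyTorch are installed,
# and that "apple_with_ipod_label.jpg" is your own photo of an apple with "iPod"
# handwritten on a sticky label (the filename is a placeholder).
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Candidate labels for zero-shot classification; CLIP scores the image against
# each text prompt rather than against a fixed set of trained classes.
labels = ["a photo of a Granny Smith apple", "a photo of an iPod"]
image = preprocess(Image.open("apple_with_ipod_label.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()[0]

for label, p in zip(labels, probs):
    print(f"{label}: {p:.2%}")
# A successful "typographic attack" shows up as most of the probability
# shifting to the "iPod" prompt once the handwritten label is in the frame.
```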

Like GPT-3, the last AI system from the lab to make the front pages, Clip is more a proof of concept than a commercial product. But both have made great strides in what was thought possible in their domains: GPT-3 wrote a Guardian comment piece last year, while Clip has shown an ability to recognize the real world better than almost all comparable approaches.

While the lab’s latest discovery raises the prospect of fooling AI systems with nothing more complex than a T-shirt, OpenAI says the weakness is a reflection of some of the underlying strengths of its image recognition system. Unlike older AIs, Clip is able to think about objects not just on a visual level, but also in a more “conceptual” way. That means, for example, that it can understand that a photo of Spider-Man, a stylized drawing of the superhero, or even the word “spider” all refer to the same basic thing, but also that it can sometimes fail to recognize the important differences between those categories.

“We found that the highest layers of Clip organize images as a loose semantic collection of ideas,” says OpenAI, “providing a simple explanation for both the model’s versatility and the compactness of its representation.” In other words, much like a human brain, the AI thinks about the world in terms of ideas and concepts, rather than purely visual structures.
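As a rough illustration of that shared space of ideas (again a sketch, not code from OpenAI’s paper), the same open-source Clip package can embed images and words side by side and compare them directly; the filenames and prompts below are placeholder assumptions.

```python
# Sketch: comparing CLIP's image and text embeddings in one shared space.
# Filenames are placeholders for your own images (a photo and a stylized
# drawing of the same character); the prompts are arbitrary illustrative choices.
import torch
import clip
from PIL import Image

device = "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

images = torch.cat([
    preprocess(Image.open("spiderman_photo.jpg")).unsqueeze(0),
    preprocess(Image.open("spiderman_drawing.png")).unsqueeze(0),
])
prompts = ["spider", "Spider-Man", "a red sports car"]
text = clip.tokenize(prompts)

with torch.no_grad():
    image_features = model.encode_image(images)
    text_features = model.encode_text(text)

# Normalize and take dot products: cosine similarity between every image
# and every prompt, all living in the same embedding space.
image_features /= image_features.norm(dim=-1, keepdim=True)
text_features /= text_features.norm(dim=-1, keepdim=True)
similarity = image_features @ text_features.T
print(similarity)
# If the model really does organize inputs by concept, the photo, the drawing
# and the spider-related prompts should all score closer to one another
# than any of them does to the unrelated prompt.
```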

An image of an apple labeled 'Granny Smith', and an image of the same apple with an adhesive label saying 'iPod' on it
“When we put a label saying ‘iPod’ on this Granny Smith apple, the model mistakenly classifies it as an iPod in the zero-shot setting,” says OpenAI. Photograph: OpenAI

But this conceptual shorthand can also lead to problems, of which “typographic attacks” are only the most visible. The “Spider-Man neuron” in the neural network can be shown to respond to the collection of ideas relating to Spider-Man and spiders, for example; but other parts of the network group together concepts that are better kept separate.

“We have observed, for example, a ‘Middle East’ neuron associated with terrorism,” writes OpenAI, “and an ‘immigration’ neuron that responds to Latin America. We have even found a neuron that fires for both dark-skinned people and gorillas, mirroring earlier photo-tagging incidents in other models that we consider unacceptable.”

In 2015, Google had to apologize for automatically tagging images of black people as “gorillas”. In 2018, it emerged that the company had never really fixed the underlying problems with its AI that led to that error: instead, it had simply intervened manually to stop it from ever labeling anything as a gorilla, however accurate or inaccurate that label might be.
