OpenAI’s state-of-the-art machine vision AI is fooled by handwritten notes

Researchers from machine learning lab OpenAI have discovered that their state-of-the-art computer vision system can be deceived by tools no more sophisticated than a pen and a pad. As illustrated in the image above, simply writing down the name of an object and sticking it on another can be enough to trick the software into misidentifying what it sees.

“We refer to these attacks as typographic attacks,” write OpenAI’s researchers in a blog post. “By exploiting the model’s ability to read text robustly, we find that even photographs of hand-written text can often fool the model.” They note that such attacks are similar to “adversarial images” that can fool commercial machine vision systems, but far simpler to produce.

Adversarial images present a real danger for systems that rely on machine vision. Researchers have shown, for example, that they can trick the software in Tesla’s self-driving cars into changing lanes without warning simply by placing certain stickers on the road. Such attacks are a serious threat for a variety of AI applications, from the medical to the military.

But the danger posed by this specific attack is, at least for now, nothing to worry about. The OpenAI software in question is an experimental system named CLIP that isn’t deployed in any commercial product. Indeed, the very nature of CLIP’s unusual machine learning architecture created the weakness that enables this attack to succeed.

CLIP is intended to explore how AI systems might learn to identify objects without close supervision by training on huge databases of image and text pairs. In this case, OpenAI used some 400 million image-text pairs scraped from the internet to train CLIP, which was unveiled in January.
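The matching mechanism described above can be illustrated with a minimal sketch. The toy vectors and caption strings below are made up for illustration, not real CLIP embeddings; in the actual system, trained image and text encoders map both inputs into a shared embedding space, and classification picks the caption whose embedding lies closest to the image’s.

```python
# Minimal sketch of CLIP-style zero-shot classification.
# NOTE: the embeddings here are toy 3-D vectors, not real CLIP outputs;
# a real system would encode the image and each caption with trained
# neural encoders into a shared space.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Angle-based similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(image_embedding: np.ndarray, caption_embeddings: dict) -> str:
    """Return the caption whose embedding best matches the image."""
    scores = {caption: cosine_similarity(image_embedding, emb)
              for caption, emb in caption_embeddings.items()}
    return max(scores, key=scores.get)

# Hypothetical shared embedding space (3-D for illustration).
captions = {
    "a photo of an apple": np.array([1.0, 0.1, 0.0]),
    "a photo of an iPod":  np.array([0.0, 1.0, 0.2]),
}

# An ordinary apple photo embeds near the "apple" caption.
apple_image = np.array([0.9, 0.2, 0.1])
print(classify(apple_image, captions))

# A typographic attack: writing "iPod" on the apple drags the image
# embedding toward the written word, flipping the prediction.
apple_with_ipod_note = np.array([0.3, 0.95, 0.2])
print(classify(apple_with_ipod_note, captions))
```

The attack works precisely because the model reads text so well: the written word dominates the image embedding, so the nearest caption changes even though the physical object has not.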

This month, OpenAI researchers published a new paper describing how they’d probed CLIP’s internals to see how it represents what it learns. They discovered what they’re calling “multimodal neurons” — individual components in the machine learning network that respond not only to images of objects but also to sketches, cartoons, and associated text. One of the reasons this is exciting is that it seems to mirror how the human brain reacts to stimuli, where single brain cells have been observed responding to abstract concepts rather than specific examples. OpenAI’s research suggests it may be possible for AI systems to internalize such knowledge the same way humans do.

In the future, this could lead to more sophisticated vision systems, but right now, such approaches are in their infancy. While any human being can tell you the difference between an apple and a piece of paper with the word “apple” written on it, software like CLIP can’t. The same ability that allows the program to link words and images at an abstract level creates this unique weakness, which OpenAI describes as the “fallacy of abstraction.”

Another example given by the lab is the neuron in CLIP that identifies piggy banks. This component responds not only to pictures of piggy banks but also to strings of dollar signs. As in the example above, that means you can fool CLIP into identifying a chainsaw as a piggy bank if you overlay it with “$$$” strings, as if it were half-price at your local hardware store.

The researchers also found that CLIP’s multimodal neurons encoded exactly the sort of biases you might expect to find when sourcing your data from the internet. They note that the neuron for “Middle East” is also associated with terrorism and discovered “a neuron that fires for both dark-skinned people and gorillas.” This replicates an infamous error in Google’s image recognition system, which tagged Black people as gorillas. It’s yet another example of just how different machine intelligence is from that of humans — and why pulling apart the former to understand how it works is necessary before we trust our lives to AI.