AI: Facebook’s new algorithm has been trained on a billion Instagram photos

Facebook researchers have unveiled a new AI model that can learn from any random group of unlabeled images on the internet. (Image: Facebook AI)

Facebook researchers have unveiled a new AI model that can learn from any random group of unlabeled images on the internet, an advance that, although still in its early stages, the team hopes will spark a “revolution” in computer vision.

Nicknamed SEER (SElf-SupERvised), the model was fed a billion publicly available Instagram images that had not been manually curated beforehand. Even without the labels and annotations that normally go into training an algorithm, SEER was able to work through the data set autonomously, learning as it went, and eventually achieved high levels of accuracy on tasks such as object detection.

The method, aptly called self-supervised learning, is already well established in the field of AI: it consists of creating systems that can learn directly from the information they are given, without relying on carefully labeled data sets to teach them how to perform a task such as recognizing an object in a photo or translating a block of text.
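To make the idea concrete, here is a minimal sketch of one common self-supervised setup, a rotation-prediction pretext task, in which the training labels are generated from the images themselves rather than by human annotators. It illustrates the general principle only, not the method SEER uses; the backbone, optimizer settings and the assumed `unlabeled_loader` of raw image tensors are all placeholders.

```python
# Minimal sketch of a self-supervised pretext task (rotation prediction).
# The "labels" are derived from the images themselves, so no human annotation
# is needed. Illustrative only; this is not SEER's training method.
import torch
import torch.nn as nn
import torchvision

def rotate_batch(images: torch.Tensor):
    """Rotate each (square) image by 0/90/180/270 degrees; the angle index is the label."""
    labels = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack([torch.rot90(img, k=int(k), dims=(1, 2))
                           for img, k in zip(images, labels)])
    return rotated, labels

backbone = torchvision.models.resnet18()              # small stand-in backbone, random init
backbone.fc = nn.Linear(backbone.fc.in_features, 4)   # predict one of 4 rotations

optimizer = torch.optim.SGD(backbone.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor) -> float:
    """One optimization step on a batch of unlabeled image tensors (B, 3, H, W)."""
    rotated, targets = rotate_batch(images)
    logits = backbone(rotated)
    loss = criterion(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```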

Self-supervised learning has attracted a lot of scientific attention lately, because it means far less data needs to be labeled by humans – a time-consuming task that most researchers would rather do without. At the same time, without the need for a curated data set, a self-supervised model can work with larger and more diverse data sets.

In some fields, especially natural language processing, the method has already led to breakthroughs: algorithms trained on ever-larger amounts of unlabeled text have enabled advances in applications such as question answering, machine translation, natural language inference and much more.

Computer vision, in contrast, has yet to fully embrace the self-supervised learning revolution. As Priya Goyal, software engineer at Facebook AI Research, explains, SEER is a first in the field. “SEER is the first fully self-supervised computer vision model trained on random images from the internet, compared with existing self-supervised computer vision works that were trained on the highly curated ImageNet data set,” she told ZDNet.

ImageNet, in fact, is a large-scale database of millions of photos that have been labeled by researchers and opened up to the wider computer vision community to advance AI development.

Facebook’s researchers used the project’s database as a benchmark to assess SEER’s performance, and found that the self-supervised model outperformed state-of-the-art supervised AI systems on tasks such as low-shot learning, object detection, segmentation and image classification.

“SEER surpasses existing self-supervised models while training only on random images,” says Goyal. “This result essentially indicates that we do not need highly curated data sets such as ImageNet in computer vision, and that self-supervision on random images produces high-quality models.”

Given the degree of sophistication that self-supervised learning requires, the researchers’ work was not without challenges. When it comes to text, AI models are tasked with assigning meaning to words; with images, however, the algorithm must decide how each pixel corresponds to a concept, while accounting for the various angles, views and shapes that a single concept can take on across different images.

In other words, the researchers needed a lot of data and a model capable of deriving all possible visual concepts from this complex set of information.
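One common way to expose a model to those varying angles and views is to generate several randomly cropped and distorted “views” of each image and train the model to treat them as the same concept. The sketch below shows such a multi-view augmentation pipeline in PyTorch; the crop size and jitter strengths are illustrative, not SEER’s actual settings.

```python
# Sketch of multi-view data augmentation: one image becomes several randomly
# cropped and distorted "views", which a self-supervised model is trained to
# map to similar representations. Parameter values are illustrative only.
from torchvision import transforms

view_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),   # random crop and rescale
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    transforms.RandomGrayscale(p=0.2),
    transforms.ToTensor(),
])

def make_views(pil_image, num_views: int = 2):
    """Return several independently augmented views of the same image."""
    return [view_transform(pil_image) for _ in range(num_views)]
```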

To accomplish the task, Goyal and her team adapted a new algorithm called SwAV, which stemmed from Facebook’s existing work in self-supervised learning and clusters images showing similar concepts into separate groups. The scientists also designed a convolutional neural network – a deep learning architecture loosely modeled on the connectivity patterns of neurons in the brain, which assigns importance to different objects within an image.
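The sketch below gives a simplified picture of the clustering idea behind SwAV: image embeddings are compared against a set of learnable “prototype” vectors to produce soft cluster assignments, and different views of the same image are trained to predict each other’s assignments. The real algorithm adds a balanced-assignment step (Sinkhorn-Knopp) and other details omitted here; the dimensions and temperature are illustrative.

```python
# Simplified sketch of SwAV-style online clustering. Embeddings are scored
# against learnable prototype vectors (soft cluster assignments), and each
# view is trained to predict the other view's assignment. The real SwAV
# algorithm additionally balances assignments with Sinkhorn-Knopp.
import torch
import torch.nn as nn
import torch.nn.functional as F

embed_dim, num_prototypes, temperature = 128, 3000, 0.1
prototypes = nn.Linear(embed_dim, num_prototypes, bias=False)  # learnable cluster centres

def cluster_scores(embeddings: torch.Tensor) -> torch.Tensor:
    """Cosine similarity of L2-normalised embeddings to each prototype."""
    z = F.normalize(embeddings, dim=1)
    w = F.normalize(prototypes.weight, dim=1)
    return z @ w.t()  # shape: (batch, num_prototypes)

def swap_prediction_loss(z_view1: torch.Tensor, z_view2: torch.Tensor) -> torch.Tensor:
    """Each view predicts the (detached) cluster assignment of the other view."""
    p1 = F.log_softmax(cluster_scores(z_view1) / temperature, dim=1)
    p2 = F.log_softmax(cluster_scores(z_view2) / temperature, dim=1)
    q1 = F.softmax(cluster_scores(z_view1).detach() / temperature, dim=1)
    q2 = F.softmax(cluster_scores(z_view2).detach() / temperature, dim=1)
    return -0.5 * ((q2 * p1).sum(dim=1).mean() + (q1 * p2).sum(dim=1).mean())
```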

With a data set of a billion Instagram images, the scale of the system was large, to say the least. The Facebook team used Nvidia V100 GPUs with 32GB of RAM, and as the model grew in size, it still had to fit within the available memory. Goyal explains that further research will be needed to ensure that computing resources keep pace with the new system.

“As we train the model on more and more GPUs, communication between those GPUs needs to be fast to speed up training. This challenge can be addressed by developing new software and research techniques that are efficient for a given memory budget and run time,” she says.
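As an illustration of the kind of techniques involved, the sketch below shows two standard PyTorch approaches for training a large model within a fixed GPU memory budget: mixed-precision training and distributed data parallelism across GPUs. It is not Facebook’s actual SEER training code, and the model is assumed to compute its own self-supervised loss.

```python
# Sketch of standard techniques for training a large model under a fixed GPU
# memory budget: mixed precision plus distributed data parallelism.
# Illustrative only; not Facebook's SEER training code.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train(model, data_loader, local_rank: int, epochs: int = 1):
    # Assumes the script is launched with one process per GPU (e.g. via torchrun).
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)
    model = DDP(model.cuda(local_rank), device_ids=[local_rank])

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    scaler = torch.cuda.amp.GradScaler()   # keeps float16 gradients numerically stable

    for _ in range(epochs):
        for views, _ in data_loader:
            views = views.cuda(local_rank, non_blocking=True)
            with torch.cuda.amp.autocast():  # forward pass in mixed precision to save memory
                loss = model(views)          # assumption: the model returns its own loss
            optimizer.zero_grad()
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()
```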

Although there is still work to be done before SEER can be harnessed for real-world use cases, Goyal argues that the technology’s impact should not be underestimated. “With SEER, we can now make further advances in computer vision by training large models on an abundance of random internet images,” she says.

“This discovery could allow for a self-supervised learning revolution in computer vision similar to what we saw in natural language processing with text.”

At Facebook, SEER could be used for a wide range of computer vision tasks, from automatically generating image descriptions to helping identify content that violates its policies. Outside the company, the technology could also be useful in fields where images and metadata are scarce, such as medical imaging.

The Facebook team called for more work to be done to take SEER to its next stage of development. As part of the research, the team built a PyTorch-based, general-purpose library for self-supervised learning called VISSL, which has been open-sourced to encourage the broader AI community to test the technology.
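Since VISSL’s own interface is not described here, the sketch below uses plain PyTorch to illustrate the downstream pattern the article describes: taking a backbone pretrained with self-supervision and attaching a small classifier for a labeled task (a linear probe). The checkpoint path, backbone choice and class count are placeholders, not VISSL’s API.

```python
# Generic PyTorch sketch of using a self-supervised pretrained backbone for a
# labelled downstream task ("linear probe"). Checkpoint path and class count
# are hypothetical placeholders; this does not show VISSL's actual API.
import torch
import torch.nn as nn
import torchvision

backbone = torchvision.models.resnet50()
# state = torch.load("pretrained_self_supervised_backbone.pth")  # hypothetical checkpoint
# backbone.load_state_dict(state, strict=False)

for param in backbone.parameters():       # freeze the pretrained features
    param.requires_grad = False
backbone.fc = nn.Linear(backbone.fc.in_features, 10)  # e.g. 10 downstream classes

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def probe_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """Train only the new classifier head on a labelled batch."""
    logits = backbone(images)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```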
