Facebook’s next big AI project is training its machines on users’ public videos

Teaching AI systems to understand what’s happening in videos as fully as a human can is one of the hardest challenges – and one of the greatest potential breakthroughs – in the world of machine learning. Today, Facebook announced a new initiative it hopes will give it an edge in this consequential work: training its AI on the public videos of Facebook users.

Access to training data is one of the biggest competitive advantages in AI, and by collecting this resource from millions upon millions of users, tech giants like Facebook, Google, and Amazon have been able to push forward in all sorts of areas. And while Facebook has already trained machine vision models on billions of images collected from Instagram, it hasn’t announced similarly ambitious projects for understanding video.

“Learning from globally available public videos spanning almost every country and hundreds of languages, our AI systems will not only improve accuracy but will also adapt to our rapidly changing world and recognize the nuances and visual cues across different cultures and regions,” the company said in a blog post. The project, titled Learning from Videos, is also part of “Facebook’s broader efforts to build machines that learn like humans.”

The resulting machine learning models will be used to create new content recommendation systems and moderation tools, Facebook says, but they could do much more in the future. AI that understands the content of videos could give Facebook unprecedented insight into users’ lives, letting it analyze their hobbies and interests, their preferences in brands and clothing, and countless other personal details. Facebook already has access to much of this information through its current ad-targeting operation, of course, but being able to parse video through AI would add an incredibly rich (and invasive) source of data to its stores.

Facebook is vague about its future plans for AI models trained on user videos. The company told The Verge that these models could have many uses, from video captioning to creating advanced search functions, but it did not answer a question about whether they would be used to gather information for ad targeting. Likewise, when asked whether users must consent to their videos being used to train Facebook’s AI, or whether they can opt out, the company responded only by noting that its Data Policy states that content uploaded by users can be used for “product research and development.” Facebook also did not answer questions about exactly how much video will be collected to train its AI systems, or how access to that data by the company’s researchers will be overseen.

In its blog post announcing the project, however, the social network pointed to one speculative future use: using AI to retrieve “digital memories” captured by smart glasses.

Facebook plans to launch a pair of consumer smart glasses later this year. Details about the device are vague, but it’s likely that these or future glasses will include built-in cameras to capture the wearer’s point of view. If AI systems can be trained to understand the content of that video, users will be able to search past recordings, just as many photo apps let people search for specific places, objects, or people. (That sort of indexing, notably, has typically been powered by AI systems trained on user data.)

Facebook released images showing prototype pairs of its augmented reality smart glasses.
Image: Facebook

As recording video with smart glasses “becomes the norm,” says Facebook, “people should be able to recall specific moments from their vast bank of digital memories just as easily as they capture them.” It gives the example of a user searching with the phrase “Show me every time we sang happy birthday to Grandma,” then receiving the relevant clips. As the company notes, such a search would require AI systems to draw connections between types of data, teaching them to “combine the phrase ‘happy birthday’ with cakes, candles, people singing various birthday songs, and more.” Like humans, the AI would need to understand rich concepts composed of different types of sensory input.
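To make that idea concrete, here is a minimal sketch of how such a cross-modal search could work, assuming a text encoder and a video encoder already trained to map phrases and clips into a shared embedding space. Facebook hasn’t published its architecture, so the function names, clip IDs, and random embeddings below are hypothetical stand-ins:

```python
# Hedged sketch: rank stored video clips against a text query by cosine
# similarity in a shared embedding space. Real systems would use jointly
# trained text/video encoders; the random vectors below are placeholders.
import numpy as np

def search_memories(query_emb: np.ndarray,
                    clip_embs: np.ndarray,
                    clip_ids: list[str],
                    top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top_k clips whose embeddings best match the query."""
    q = query_emb / np.linalg.norm(query_emb)
    c = clip_embs / np.linalg.norm(clip_embs, axis=1, keepdims=True)
    scores = c @ q  # cosine similarity, since both sides are unit-normalized
    best = np.argsort(scores)[::-1][:top_k]
    return [(clip_ids[i], float(scores[i])) for i in best]

# Toy usage with 4 clips in a 512-dim space. In practice, query_emb would
# come from encoding a phrase like the birthday query above.
rng = np.random.default_rng(0)
clip_embs = rng.normal(size=(4, 512))
query_emb = rng.normal(size=512)
print(search_memories(query_emb, clip_embs,
                      ["bday_2019", "beach_trip", "bday_2021", "picnic"]))
```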

Looking further ahead, the combination of smart glasses and machine learning would enable what’s known as “worldscraping” – capturing granular data about the world by turning smart glasses wearers into roving CCTV cameras. As the practice was described in a report last year in The Guardian: “Each time someone visited a supermarket, their smart glasses would be recording real-time price data, stock levels, and browsing habits; every time they opened a newspaper, their glasses would know which stories they read, which ads they looked at, and which celebrity beach photos their gaze lingered on.”

This is an extreme outcome, and not functionality Facebook says it is pursuing. But it illustrates the potential significance of pairing advanced AI video analysis with smart glasses – something the social network is apparently keen to do.

By comparison, the only use of its new AI video analysis tools that Facebook is currently touting is relatively mundane. Alongside today’s Learning from Videos announcement, Facebook says it has deployed a new content recommendation system based on this video work in its TikTok clone, Reels. “Popular videos often consist of the same music set to the same dance moves, but created and acted out by different people,” says Facebook. By analyzing the content of these videos, Facebook’s AI can suggest similar clips to users, as sketched below.
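One plausible shape for that kind of “more like this” recommendation is a nearest-neighbor lookup over clip embeddings. The sketch below assumes each Reel has already been encoded into a feature vector by some audio/visual model, which Facebook hasn’t detailed; the embeddings here are random stand-ins:

```python
# Hedged sketch: content-based "similar clips" via nearest neighbors in
# embedding space. The random embeddings stand in for real model output.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
reel_embs = rng.normal(size=(1000, 256))  # 1,000 clips, 256-dim features each

# Build a cosine-distance index over all clip embeddings.
index = NearestNeighbors(n_neighbors=6, metric="cosine").fit(reel_embs)

def similar_reels(watched: int, k: int = 5) -> np.ndarray:
    """Return indices of the k clips most similar to the one just watched."""
    _, neighbors = index.kneighbors(reel_embs[watched:watched + 1])
    return neighbors[0][1:k + 1]  # skip index 0: the clip is its own neighbor

print(similar_reels(42))
```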

Such content recommendation algorithms are not without potential problems, though. A recent report from MIT Technology Review highlighted how the social network’s emphasis on growth and user engagement has kept its AI team from fully addressing the ways its algorithms can spread misinformation and encourage political polarization. As the Technology Review article puts it: “The [machine learning] models that maximize engagement also favor controversy, misinformation, and extremism.” This creates a conflict between the duties of Facebook’s AI ethics researchers and the company’s commitment to maximizing growth.

Facebook isn’t the only big tech company pursuing advanced AI video analysis, nor is it the only one leveraging user data to do so. Google, for example, maintains a publicly accessible research dataset of 8 million curated and partially labeled YouTube videos, intended to “help accelerate research on large-scale video understanding.” The search giant’s ad operations could likewise benefit from AI that understands the content of videos, even if the end result is simply serving more relevant ads on YouTube.

Facebook, however, believes it has a particular advantage over its competitors. Not only does it have extensive training data, but it is pushing more and more resources into an AI method known as self-supervised learning.

Usually, when AI models are trained on data, those inputs have to be labeled by humans: tagging the objects in images or transcribing audio recordings, for example. (If you’ve ever solved a CAPTCHA by identifying fire hydrants or crosswalks, you’ve likely labeled data that helped train someone’s AI.) Self-supervised learning does away with labels, speeding up the training process and, some researchers believe, producing deeper and more meaningful analysis as AI systems learn to connect the dots themselves. Facebook is so optimistic about self-supervised learning that it has called it “the dark matter of intelligence.”
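One common form this takes is a contrastive objective: the model sees two augmented “views” of the same clip and learns to pull their embeddings together while pushing apart embeddings of different clips, with no human labels anywhere. Below is a minimal SimCLR-style version of such a loss; Facebook hasn’t published the exact objective it uses here, so treat this purely as an illustration:

```python
# Hedged sketch of a contrastive self-supervised loss (InfoNCE, SimCLR-style).
# z1 and z2 hold embeddings of two augmented views of the same batch of clips;
# each clip's other view serves as its "label", so no human annotation is needed.
import torch
import torch.nn.functional as F

def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor,
                  temperature: float = 0.1) -> torch.Tensor:
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature      # pairwise view similarities
    targets = torch.arange(z1.size(0))      # matching views sit on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage: a batch of 8 clips embedded into 128 dims by some video encoder.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(info_nce_loss(z1, z2))
```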

The company says its future work on AI video analysis will focus on semi-supervised and self-supervised learning methods, and that such techniques “have already improved our computer vision and speech recognition systems.” With so much video content uploaded by Facebook’s 2.8 billion users, skipping the labeling step of AI training certainly makes sense. And if the social network can teach its machine learning models to understand video seamlessly, who knows what they might learn?
