AI upscaling and the future of content delivery

Rumors have recently surfaced about Nintendo’s plans to launch a new version of its hugely popular Switch console in time for the holidays. A faster CPU, more RAM, and an improved OLED screen are practically a given, as you would expect for a mid-generation upgrade. These updated specs will almost certainly come with an inflated price tag too, but given the incredible demand for the current Switch, a $50 or even $100 bump is unlikely to deter many potential buyers.

But according to a report from Bloomberg, the new Switch may have a bit more going on under the hood than you would expect from the technologically conservative Nintendo. The report’s sources say the new system will use an NVIDIA chipset capable of Deep Learning Super Sampling (DLSS), a feature that is currently only available on GeForce RTX 20 and GeForce RTX 30 series GPUs. The technology, which has already been used by several notable PC games over the last few years, uses machine learning to upscale rendered images in real time. So instead of tasking the GPU with producing a native 4K image, the engine can render the game at a lower resolution and let DLSS make up the difference.

The current Nintendo Switch model

The implications of this technology, especially for computationally limited devices, are immense. For the Switch, which doubles as a battery-powered handheld when removed from its dock, DLSS could allow it to produce visuals comparable to those of the much larger and more expensive Xbox and PlayStation systems it competes with. If Nintendo and NVIDIA can prove that DLSS is viable in something as small as the Switch, we will likely see the technology make its way into future smartphones and tablets to compensate for their relatively limited GPUs.

But why stop there? If artificial intelligence systems like DLSS can upscale a video game, it stands to reason that the same techniques could be applied to other forms of content. Rather than saturating your Internet connection with a 16K video stream, will the TVs of the future simply make up the difference themselves, using a machine-learning algorithm trained on popular movies and shows?

How low can you go?

Obviously, you don’t need machine learning to resize an image. You can take a standard-definition video and scale it up to high definition easily enough, and indeed, your TV or Blu-ray player does exactly that when you watch older content. But it doesn’t take a particularly keen eye to spot the difference between a DVD that’s been blown up to fit a high-definition screen and modern content actually produced at that resolution. Taking a 720 x 480 image and stretching it to 1920 x 1080, or even 3840 x 2160 in the case of 4K, leads to some pretty obvious image degradation.
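To make that concrete, here’s a minimal sketch of conventional upscaling using Python’s Pillow imaging library (the file names are placeholders). Interpolation like this can only smear existing pixels around; it can’t invent detail that was never captured:

```python
# Conventional, non-AI upscaling: interpolation fills in new pixels by
# averaging their neighbors, which is exactly why the result looks soft.
from PIL import Image

frame = Image.open("dvd_frame.png")  # e.g. a 720 x 480 DVD frame

# Stretch to full HD. Widescreen DVDs are stored anamorphically, so the
# change in aspect ratio here mirrors what a player does anyway.
upscaled = frame.resize((1920, 1080), Image.BICUBIC)
upscaled.save("dvd_frame_1080p.png")
```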

To get around this fundamental problem, AI-enhanced upscaling actually creates new visual data to fill the gap between the source and target resolutions. In the case of DLSS, NVIDIA trained its neural network by taking low- and high-resolution images of the same game and having its in-house supercomputer analyze the differences. To maximize the results, the high-resolution images were rendered at a level of detail that would be computationally impractical or even impossible to achieve in real time. Combined with motion vector data, the neural network was tasked not only with filling in the visual information needed to make the low-resolution image better approximate its idealized target, but also with predicting what the next frame of animation might look like.
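NVIDIA’s actual network, training data, and loss function are proprietary, but the core recipe of learning from matched low- and high-resolution frames can be illustrated with a toy training step in PyTorch. Everything below is a simplified stand-in, and it leaves out the motion vector and temporal feedback pieces entirely:

```python
# A toy super-resolution training step: show the network low-res/high-res
# pairs of the same frame and penalize the difference. This is a generic
# illustration of the approach, not NVIDIA's architecture.
import torch
import torch.nn as nn

class TinyUpscaler(nn.Module):
    """Learns a 2x upscale: convolutional features plus pixel shuffle."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3 * 4, 3, padding=1),  # 4 = 2x2 upscale factor
            nn.PixelShuffle(2),                  # rearrange channels into 2x resolution
        )

    def forward(self, x):
        return self.net(x)

model = TinyUpscaler()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()

# Stand-ins for a real dataset: low-res frames and their "ground truth"
# high-res counterparts (which, for DLSS, were rendered offline at a level
# of detail impractical to produce in real time).
low_res = torch.rand(8, 3, 180, 320)    # batch of 320 x 180 frames
high_res = torch.rand(8, 3, 360, 640)   # matching 640 x 360 targets

prediction = model(low_res)           # the network's guess at the high-res frame
loss = loss_fn(prediction, high_res)  # how far off is it?
loss.backward()
optimizer.step()
```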

NVIDIA DLSS 2.0 architecture

Although fewer than 50 PC games support the latest version of DLSS at the time of this writing, the results so far have been extremely promising. The technology will let today’s computers run newer, more complex games for longer, and for current titles it leads to substantially higher frame rates (FPS). In other words, if you have a computer powerful enough to run a game at 30 FPS in 1920 x 1080, the same machine could hit 60 FPS if the game were rendered at 1280 x 720 and scaled up with DLSS.
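The back-of-the-envelope math behind that claim is simple enough, if we make the simplifying assumption that rendering cost scales roughly with the number of pixels drawn:

```python
# Pixel counts per frame at each resolution.
native = 1920 * 1080   # 2,073,600 pixels
reduced = 1280 * 720   #   921,600 pixels

# Each frame is ~2.25x cheaper to render at the lower resolution, leaving
# room to double the frame rate even after DLSS takes its own slice of
# GPU time. (Assumes cost scales linearly with pixel count.)
print(native / reduced)  # 2.25
```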

There have been plenty of opportunities to benchmark the real-world performance gains of DLSS in supported titles over the past two years, and YouTube is filled with head-to-head comparisons that show what the technology is capable of. In a particularly extreme test, 2kliksphilip ran 2019’s Control and 2020’s Death Stranding at just 427 x 240 and used DLSS to scale them up to 1280 x 720. While the results were not perfect, both games ended up looking far better than they had any right to, considering they were being rendered at a resolution we would more likely associate with the Nintendo 64 than a modern gaming PC.

AI-enhanced entertainment

Although these are still early days, it seems quite clear that machine learning systems like Deep Learning Super Sampling hold great promise for gaming. But the idea is not limited to video games. There is also a big push towards using similar algorithms to enhance older movies and television shows for which no higher-resolution version exists. Both proprietary and open-source software packages are now available that leverage the computing power of modern GPUs to upscale still images and video.

Of the open-source tools in this area, the Video2X project is well known and under active development. This Python 3 framework makes use of the waifu2x and Anime4K upscalers, which, as you may have guessed from their names, were designed to work primarily with anime. The idea is that you could take a movie or series that was only ever released in standard definition and, by running it through a neural network specifically trained on visually similar content, bring it up to 1080p or even 4K resolution.
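Conceptually, tools like this work frame by frame: decode the source video, run each frame through the upscaler, and stitch the results back together. The sketch below shows that pipeline using OpenCV for the video plumbing. To be clear, this is not Video2X’s actual API; the upscale_frame() placeholder falls back to plain bicubic interpolation just so the example runs end to end:

```python
# A conceptual sketch of a frame-by-frame upscaling pipeline, in the
# spirit of what Video2X-style tools do internally.
import cv2

def upscale_frame(frame):
    # Placeholder: a real tool would invoke a trained model (waifu2x,
    # Anime4K, etc.) here. Bicubic interpolation stands in for it.
    return cv2.resize(frame, (1920, 1080), interpolation=cv2.INTER_CUBIC)

reader = cv2.VideoCapture("input_sd.mp4")
fps = reader.get(cv2.CAP_PROP_FPS)
writer = cv2.VideoWriter("output_hd.mp4",
                         cv2.VideoWriter_fourcc(*"mp4v"), fps, (1920, 1080))

# Pull frames one at a time, upscale each, and write out the result.
while True:
    ok, frame = reader.read()
    if not ok:
        break
    writer.write(upscale_frame(frame))

reader.release()
writer.release()
```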

While getting the software up and running can be somewhat complicated due to the various GPU acceleration frameworks available depending on your operating system and hardware platform, it’s something anyone with a relatively modern computer is capable of doing on their own. As an example, I took a 640 x 360 frame from Big Buck Bunny and scaled it up to 1920 x 1080 using the default settings of the waifu2x backend in Video2X:

Compared to the native 1920 x 1080 image, we can see some subtle differences. The shading of the rabbit’s fur is not quite as nuanced, the eyes lack a certain luster, and most notably, the grass has gone from individual blades to something that looks more like an oil painting. But would you really have noticed any of that if the two images weren’t side by side?

Some assembly required

In the previous example, AI was able to triple the resolution of an image with negligible graphical artifacts. But what’s perhaps more impressive is that the file size of the 640 x 360 frame is only one-fifth that of the original 1920 x 1080 frame. Extrapolating that difference over the length of a feature film, it’s clear how this technology could have a major impact on the enormous bandwidth and storage costs associated with streaming video.
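As a rough illustration of those savings (the 15 GB below is an assumed size for a two-hour high-definition stream, not a measured figure):

```python
# If the low-resolution version of a film is one-fifth the size of the
# original, the per-stream savings add up fast. The 15 GB starting point
# is an illustrative assumption, not a measurement.
full_movie_gb = 15
dehydrated_gb = full_movie_gb / 5  # 3 GB
print(f"Bandwidth saved per stream: {full_movie_gb - dehydrated_gb} GB")  # 12 GB
```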

Imagine a future in which, instead of streaming an ultra-high-resolution movie over the Internet, your device receives a video stream at one-half or even one-third of the target resolution, along with a neural network model trained on that specific content. Your AI-enabled player could then take this “dehydrated” video and scale it in real time to whatever resolution is appropriate for your screen. Rather than saturating your Internet connection, it would be a bit like rehydrating the pizzas in Back to the Future Part II.

The biggest technical challenge standing in the way is the time it takes to perform this kind of upscaling: when running Video2X on even reasonably high-end hardware, a rendering speed of 1 or 2 FPS is considered fast (see the quick math below). It would take a huge boost in computing power to upscale video with AI in real time, but the progress NVIDIA has made with DLSS is certainly encouraging. Of course, film purists would argue that this sort of playback wouldn’t fit the director’s intent, but when people watch movies 30 minutes at a time on their phones while commuting to work, it’s safe to say that ship has already sailed.
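To put rough numbers on that gap, using the Video2X throughput mentioned above and the standard film frame rate:

```python
# How far are we from real time? At 2 FPS of upscaling throughput and a
# 24 FPS playback target, we'd need roughly a 12x speedup before
# playback-time upscaling becomes practical.
upscaler_fps = 2    # an optimistic offline upscaling speed
playback_fps = 24   # standard film frame rate
print(playback_fps / upscaler_fps)  # 12.0
```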
