YouTube gets real-time video segmentation: Here’s how this technology works | tech | Hindustan Times


The new segmentation technology will allow creators to replace and modify the background, increasing videos’ production value without specialised equipment.

tech Updated: Mar 04, 2018 12:59 IST
A visitor is seen at the YouTube stand during the annual MIPCOM television programme market in Cannes. (Reuters)

Google has introduced real-time, on-device mobile video segmentation to the YouTube app by integrating the technology into YouTube Stories, a new lightweight video format designed specifically for YouTube creators and currently in beta.

The new segmentation technology will allow creators to replace and modify the background, effortlessly increasing videos’ production value without specialised equipment, Google’s research blog noted.

“Video segmentation is a widely used technique that enables movie directors and video content creators to separate the foreground of a scene from the background and treat them as two different visual layers. By modifying or replacing the background, creators can convey a particular mood, transport themselves to a fun location or enhance the impact of the message. However, this operation has traditionally been performed as a time-consuming manual process or requires a studio environment with a green screen for real-time background removal. In order to enable users to create this effect live in the viewfinder, we designed a new technique that is suitable for mobile phones,” the blog read.

The new technology uses machine learning to solve a semantic segmentation task with convolutional neural networks. To provide high-quality data for the machine learning pipeline, the developers annotated thousands of images capturing a wide spectrum of foreground poses and background settings. Annotations consisted of pixel-accurate locations of foreground elements such as hair, glasses, neck, skin and lips, plus a general background label, achieving a cross-validation result of 98 percent Intersection-Over-Union (IOU), comparable to human annotator quality.
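The IOU figure quoted above measures how closely two segmentation masks agree: the number of pixels both masks label as foreground, divided by the number of pixels either mask labels as foreground. As a minimal sketch (the function name and toy masks below are illustrative, not Google's code):

```python
import numpy as np

def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection-Over-Union between two binary segmentation masks."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    # Two empty masks agree perfectly by convention
    return float(intersection) / union if union else 1.0

# Toy 4x4 masks that agree on 3 of 4 foreground pixels
human = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]])
model = np.array([[1, 1, 0, 0],
                  [1, 0, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]])
print(iou(human, model))  # 0.75
```

A score of 0.98, as reported, means model and annotator masks overlap almost completely.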


The segmentation task was then framed as computing, for every input frame of the video (three channels, RGB), a binary mask separating foreground from background. To achieve temporal consistency, the mask computed for the previous frame was passed in as a prior by concatenating it as a fourth channel to the current RGB input frame, the developers said in the blog.

The original frame (left) is converted into three colour channels and concatenated with the previous mask (middle). Google feeds this input to its neural network to predict the mask for the current frame (right). (Google)
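The channel-stacking step described above can be sketched in a few lines of NumPy. This is a hypothetical helper illustrating the input layout, not Google's implementation; the function name and array shapes are assumptions:

```python
import numpy as np

def build_network_input(frame_rgb: np.ndarray, prev_mask: np.ndarray) -> np.ndarray:
    """Stack the current RGB frame (H, W, 3) with the previous frame's
    binary mask (H, W), giving the four-channel input (H, W, 4) that the
    segmentation network consumes."""
    mask = prev_mask.astype(frame_rgb.dtype)[..., None]  # (H, W) -> (H, W, 1)
    return np.concatenate([frame_rgb, mask], axis=-1)

frame = np.random.rand(256, 256, 3).astype(np.float32)
prev_mask = np.zeros((256, 256), dtype=np.float32)  # first frame: empty prior
net_input = build_network_input(frame, prev_mask)
print(net_input.shape)  # (256, 256, 4)
```

Feeding the previous mask back in this way lets the network keep its prediction stable from frame to frame instead of re-segmenting each frame from scratch.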

Google noted that it will test the technology on this first set of effects through a limited rollout of YouTube Stories, with a wider release across all versions planned in the near future.

“As we improve and expand our segmentation technology to more labels, we plan to integrate it into Google’s broader Augmented Reality services,” the blog noted.