Netflix AI Team Just Open-Sourced VOID: an AI Model That Erases Objects From Videos — Physics and All

By primereports · April 5, 2026 · 7 min read

Video editing has always had a dirty secret: removing an object from footage is easy; making the scene look like it was never there is brutally hard. Take out a person holding a guitar and you're left with a floating instrument that defies gravity. Hollywood VFX teams spend weeks fixing exactly this kind of problem. Now a team of researchers from Netflix and INSAIT, Sofia University 'St. Kliment Ohridski,' has released VOID (Video Object and Interaction Deletion), a model that does it automatically.

VOID removes objects from videos along with all interactions they induce on the scene — not just secondary effects like shadows and reflections, but physical interactions like objects falling when a person is removed.

What Problem Is VOID Actually Solving?

Standard video inpainting models — the kind used in most editing workflows today — are trained to fill in the pixel region where an object was. They’re essentially very sophisticated background painters. What they don’t do is reason about causality: if I remove an actor who is holding a prop, what should happen to that prop?

Existing video object removal methods excel at inpainting content ‘behind’ the object and correcting appearance-level artifacts such as shadows and reflections. However, when the removed object has more significant interactions, such as collisions with other objects, current models fail to correct them and produce implausible results.

VOID is built on top of CogVideoX and fine-tuned for video inpainting with interaction-aware mask conditioning. The key innovation is in how the model understands the scene — not just ‘what pixels should I fill?’ but ‘what is physically plausible after this object disappears?’

The canonical example from the research paper: if a person holding a guitar is removed, VOID also removes the person’s effect on the guitar — causing it to fall naturally. That’s not trivial. The model has to understand that the guitar was being supported by the person, and that removing the person means gravity takes over.

And unlike prior work, VOID was evaluated head-to-head against real competitors. Experiments on both synthetic and real data show that the approach better preserves consistent scene dynamics after object removal compared to prior video object removal methods including ProPainter, DiffuEraser, Runway, MiniMax-Remover, ROSE, and Gen-Omnimatte.

https://arxiv.org/pdf/2604.02296

The Architecture: CogVideoX Under the Hood

VOID is built on CogVideoX-Fun-V1.5-5b-InP, a checkpoint released by Alibaba PAI on Hugging Face, and fine-tuned for video inpainting with interaction-aware quadmask conditioning. CogVideoX is a 3D Transformer-based video generation model; think of it as a video version of Stable Diffusion, a diffusion model that operates over temporal sequences of frames rather than single images. Engineers will need to download the base checkpoint separately before running VOID.

The fine-tuned architecture specs: a CogVideoX 3D Transformer with 5B parameters, taking video, quadmask, and a text prompt describing the scene after removal as input, operating at a default resolution of 384×672, processing a maximum of 197 frames, using the DDIM scheduler, and running in BF16 with FP8 quantization for memory efficiency.
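Collected in one place, those published specs look like the following sketch. The dict keys are illustrative, not an actual VOID API; only the values come from the article:

```python
# Illustrative summary of the published VOID inference specs.
# The dictionary structure is hypothetical; the values are from the article.
void_config = {
    "base_model": "CogVideoX-Fun-V1.5-5b-InP",   # Alibaba PAI checkpoint
    "architecture": "CogVideoX 3D Transformer",
    "parameters": 5_000_000_000,                  # 5B parameters
    "inputs": ["video", "quadmask", "text_prompt"],
    "resolution": (384, 672),                     # default height x width
    "max_frames": 197,
    "scheduler": "DDIM",
    "compute_dtype": "bf16",
    "quantization": "fp8",                        # for memory efficiency
}

h, w = void_config["resolution"]
print(f"Up to {void_config['max_frames']} frames at {h}x{w}")
```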

The quadmask is arguably the most interesting technical contribution here. Rather than a binary mask (remove this pixel / keep this pixel), the quadmask is a 4-value mask that encodes the primary object to remove, overlap regions, affected regions (falling objects, displaced items), and background to keep.

In practice, each pixel in the mask gets one of four values: 0 (primary object being removed), 63 (overlap between primary and affected regions), 127 (interaction-affected region — things that will move or change as a result of the removal), and 255 (background, keep as-is). This gives the model a structured semantic map of what’s happening in the scene, not just where the object is.
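A minimal sketch of how such a quadmask could be assembled from per-object segmentation masks, using the four published values. The helper function and the toy person/guitar masks are illustrative, not VOID's actual preprocessing:

```python
import numpy as np

# Quadmask value conventions described in the article:
PRIMARY = 0      # primary object being removed
OVERLAP = 63     # overlap between primary and affected regions
AFFECTED = 127   # interaction-affected region (e.g. a falling guitar)
KEEP = 255       # background, keep as-is

def build_quadmask(shape, primary_mask, affected_mask):
    """Combine two boolean segmentation masks into a single uint8 quadmask.

    primary_mask: pixels of the object to delete (e.g. the person)
    affected_mask: pixels that will change as a result (e.g. the guitar)
    """
    quadmask = np.full(shape, KEEP, dtype=np.uint8)
    quadmask[affected_mask] = AFFECTED
    quadmask[primary_mask] = PRIMARY
    quadmask[primary_mask & affected_mask] = OVERLAP
    return quadmask

# Toy 6x6 frame: person occupies columns 1-2, guitar columns 2-4 (overlap at column 2)
h = w = 6
person = np.zeros((h, w), dtype=bool); person[:, 1:3] = True
guitar = np.zeros((h, w), dtype=bool); guitar[:, 2:5] = True
qm = build_quadmask((h, w), person, guitar)
print(np.unique(qm).tolist())  # [0, 63, 127, 255]
```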

Two-Pass Inference Pipeline

VOID uses two transformer checkpoints, trained sequentially. You can run inference with Pass 1 alone or chain both passes for higher temporal consistency.

Pass 1 (void_pass1.safetensors) is the base inpainting model and is sufficient for most videos. Pass 2 targets a specific weakness: object morphing, a known failure mode of smaller video diffusion models. When morphing appears, an optional second pass re-runs inference using flow-warped noise derived from the first pass, stabilizing object shape along the newly synthesized trajectories.

It’s worth understanding the distinction: Pass 2 isn’t just for longer clips — it’s specifically a shape-stability fix. When the diffusion model produces objects that gradually warp or deform across frames (a well-documented artifact in video diffusion), Pass 2 uses optical flow to warp the latents from Pass 1 and feeds them as initialization into a second diffusion run, anchoring the shape of synthesized objects frame-to-frame.
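The anchoring idea can be sketched in a few lines of numpy. This is a toy illustration, not Netflix's implementation: real optical flow is per-pixel and sub-pixel (here it is a constant integer shift), and the noise-mixing weight is invented:

```python
import numpy as np

def flow_warp(latent, flow):
    """Warp a 2D latent by a constant integer (dy, dx) flow.

    A constant integer shift stands in for real per-pixel optical flow,
    keeping the sketch minimal while showing the shape-anchoring idea.
    """
    dy, dx = flow
    return np.roll(np.roll(latent, dy, axis=0), dx, axis=1)

def pass2_init_noise(pass1_latents, flows, noise_scale=0.3, seed=0):
    """Build a Pass-2 initialization: propagate each Pass-1 latent frame
    along the estimated flow to the next frame, then blend in fresh noise
    so the second diffusion run starts from shape-consistent latents.
    """
    rng = np.random.default_rng(seed)
    warped = [pass1_latents[0]]
    for latent, flow in zip(pass1_latents[:-1], flows):
        warped.append(flow_warp(latent, flow))  # anchored to the previous frame
    warped = np.stack(warped)
    return (1 - noise_scale) * warped + noise_scale * rng.standard_normal(warped.shape)

# Toy example: 4 frames of 8x8 latents, an object drifting right 1 px/frame
latents = np.zeros((4, 8, 8)); latents[:, 3:5, 2:4] = 1.0
flows = [(0, 1)] * 3
init = pass2_init_noise(latents, flows)
print(init.shape)  # (4, 8, 8)
```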

How the Training Data Was Generated

This is where things get genuinely interesting. Training a model to understand physical interactions requires paired videos — the same scene, with and without the object, where the physics plays out correctly in both. Real-world paired data at this scale doesn’t exist. So the team built it synthetically.

Training used paired counterfactual videos generated from two sources: HUMOTO — human-object interactions rendered in Blender with physics simulation — and Kubric — object-only interactions using Google Scanned Objects.

HUMOTO uses motion-capture data of human-object interactions. The key mechanic is a Blender re-simulation: the scene is set up with a human and objects, rendered once with the human present, then the human is removed from the simulation and physics is re-run forward from that point. The result is a physically correct counterfactual — objects that were being held or supported now fall, exactly as they should. Kubric, developed by Google Research, applies the same idea to object-object collisions. Together, they produce a dataset of paired videos where the physics is provably correct, not approximated by a human annotator.
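A toy one-dimensional re-simulation illustrates the paired-data recipe: simulate the same object once with its support present and once with the support removed, letting gravity produce the counterfactual. All numbers and function names here are illustrative, not from the actual Blender/HUMOTO pipeline:

```python
# Toy version of the counterfactual re-simulation idea: the same object,
# rendered once while held and once after its support is removed at frame
# t_remove, with physics run forward from that point.

G = 9.8          # gravitational acceleration, m/s^2
DT = 1.0 / 24    # one frame at 24 fps

def simulate_height(frames, held_height, t_remove=None):
    """Return the object's height per frame. If t_remove is set, the
    support disappears at that frame and the object falls under gravity,
    stopping at ground level."""
    heights, h, v = [], held_height, 0.0
    for t in range(frames):
        if t_remove is not None and t >= t_remove:
            v -= G * DT
            h = max(0.0, h + v * DT)  # clamp at the ground
        heights.append(h)
    return heights

with_person = simulate_height(48, held_height=1.2)                 # guitar stays held
without_person = simulate_height(48, held_height=1.2, t_remove=0)  # guitar falls

print(with_person[-1], without_person[-1])  # held at 1.2 m vs. on the ground
```

The two trajectories form one training pair: the "before" video shows the held object, the "after" video shows the physically correct consequence of removing its support.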

Key Takeaways

  • VOID goes beyond pixel-filling. Unlike existing video inpainting tools that only correct visual artifacts like shadows and reflections, VOID understands physical causality — if you remove a person holding an object, the object falls naturally in the output video.
  • The quadmask is the core innovation. Instead of a simple binary remove/keep mask, VOID uses a 4-value quadmask (values 0, 63, 127, 255) that encodes not just what to remove, but which surrounding regions of the scene will be physically affected — giving the diffusion model structured scene understanding to work with.
  • Two-pass inference solves a real failure mode. Pass 1 handles most videos; Pass 2 exists specifically to fix object morphing artifacts — a known weakness of video diffusion models — by using optical flow-warped latents from Pass 1 as initialization for a second diffusion run.
  • Synthetic paired data made training possible. Since real-world paired counterfactual video data doesn’t exist at scale, the research team built it using Blender physics re-simulation (HUMOTO) and Google’s Kubric framework, generating ground-truth before/after video pairs where the physics is provably correct.

Check out the Paper, Model Weights, and Repo.


Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.
