
Netflix and INSAIT researchers have open-sourced VOID, an AI model that removes objects from video while preserving physical consistency — handling shadows, reflections, and even gravity. The release could reshape post-production workflows and democratize professional-grade video editing.
Researchers from Netflix and INSAIT at Sofia University “St. Kliment Ohridski” have released VOID — short for Video Object and Interaction Deletion — an open-source AI model capable of removing objects from video footage while simultaneously handling the cascade of physical consequences that deletion creates. It’s not just erasing pixels. It’s rewriting reality.
If you’ve ever watched a behind-the-scenes breakdown of Hollywood visual effects, you know that painting something out of a shot is only half the battle. The real nightmare begins when you realize that every object in a scene is entangled with its environment — casting shadows, reflecting light, exerting force on other things. VOID tackles all of it in one pass.
Consider a deceptively simple scenario: you want to remove a person carrying a guitar from a scene. Traditional video inpainting tools can mask out the person effectively enough, but they leave the guitar hovering in mid-air. The instrument has no reason to fall because the AI doesn’t understand gravity — it only understands pixels.
VOID approaches the problem differently. When it deletes an object from a video, it also resolves every interaction that object was responsible for:

- Anything the object was carrying or supporting falls, because nothing holds it up anymore.
- Shadows the object cast disappear from the surfaces they fell on.
- Reflections of the object vanish from nearby reflective surfaces.
This distinction is critical. Previous approaches treated video inpainting as a purely visual reconstruction problem — fill in the missing region with something plausible. VOID reframes it as a scene-level reasoning task where physics and causality matter.
The implications ripple across multiple sectors, not just entertainment.
In Hollywood post-production, removing unwanted elements from footage is a labor-intensive process that can consume weeks of artist time per shot. Studios routinely allocate six-figure budgets just for cleanup work on a single film. A tool like VOID won’t replace VFX artists overnight, but it could dramatically accelerate the rough pass — giving compositors a far better starting point than a blank mask.
For Netflix specifically, the motivation is obvious. The company produces an enormous volume of original content across dozens of countries. Any technology that shaves time or cost from post-production at scale represents a significant competitive advantage. Open-sourcing the model, however, signals that Netflix sees more value in advancing the broader research ecosystem than in hoarding the tool internally.
Content creators on platforms like YouTube, TikTok, and Instagram also stand to benefit enormously. Professional-grade object removal has historically required expensive software and deep technical skill. An open-source model that handles physics-aware deletion could democratize capabilities that were previously locked behind studio doors.
Video inpainting has progressed rapidly over the past three years, largely driven by diffusion models and transformer architectures. Tools from companies like Runway, Pika, and Adobe have made impressive strides in generating plausible fill content for masked regions.
But “plausible” and “physically correct” are very different standards. Most existing models operate frame-by-frame or with limited temporal context. They can produce visually coherent patches, but they struggle with:

- Temporal consistency: fills that stay stable across frames instead of flickering.
- Physical plausibility: shadows, reflections, and lighting that remain correct after the edit.
- Interaction awareness: the downstream consequences for objects the deleted item touched, carried, or supported.
VOID appears to make meaningful progress on all three fronts, particularly on interaction awareness, which has been largely unaddressed in prior work.
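The temporal-consistency failure mode is easy to see in miniature. Below is a toy numerical sketch, invented for this article and not any real model’s algorithm: filling a masked pixel independently per frame produces flicker, while sharing one estimate across frames does not.

```python
FRAMES = 5
MASKED = 2  # index of the masked ("hole") pixel in each frame

# A tiny 4-pixel "video" whose background drifts slightly over time; 0 marks the hole.
video = [[10, 20, 0, 40 + t] for t in range(FRAMES)]


def fill_per_frame(frames):
    """Frame-by-frame inpainting: each frame guesses from its own pixels only."""
    out = []
    for frame in frames:
        known = [px for i, px in enumerate(frame) if i != MASKED]
        guess = sum(known) // len(known)   # varies frame to frame as the scene drifts
        out.append([guess if i == MASKED else px for i, px in enumerate(frame)])
    return out


def fill_with_temporal_context(frames):
    """Temporally aware fill: estimate once, then propagate to every frame."""
    estimate = frames[0][MASKED - 1]       # borrow a stable neighboring value
    return [[estimate if i == MASKED else px
             for i, px in enumerate(frame)] for frame in frames]


def flicker(frames):
    """Spread of the filled pixel's value across time (0 means rock steady)."""
    values = [frame[MASKED] for frame in frames]
    return max(values) - min(values)
```

Running `flicker` on the two outputs shows the per-frame fill jitters while the temporally shared fill stays constant — the same artifact that shows up as shimmering patches in real frame-by-frame inpainting.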
The collaboration between Netflix and INSAIT is worth noting. INSAIT — the Institute for Computer Science, Artificial Intelligence, and Technology — was founded in 2022 with backing from Google DeepMind and Amazon, among others. It has quickly established itself as one of Europe’s premier AI research hubs. This partnership suggests that Netflix is serious about pushing foundational research, not just shipping product features.
Researchers in the computer vision community will likely scrutinize how well VOID generalizes across diverse scenarios — outdoor scenes with complex lighting, crowded environments with heavy occlusion, and fast-moving footage where temporal coherence is hardest to maintain. Open-sourcing the model invites exactly this kind of rigorous community evaluation.
Several developments are worth watching in the wake of this release:

- Community benchmarks probing how well VOID generalizes to difficult footage.
- Integration of physics-aware deletion into professional post-production pipelines and consumer editing tools.
- The effect of freely available, physics-aware editing on the debate over digital-media authenticity.
The release of VOID also raises a fascinating philosophical question about what it means to edit video. Traditional editing removes frames or rearranges them. Generative inpainting creates frames that never existed. VOID goes further — it reasons about a counterfactual version of reality where an object was simply never present, and then renders that alternative timeline. We’re moving from editing to world simulation.
VOID represents a meaningful leap in how AI handles video manipulation — not because it removes objects more cleanly, but because it understands that removing an object changes the world around it. By open-sourcing the model, Netflix and INSAIT have handed the research community and creative industry a powerful new tool that could reshape post-production workflows, enable new forms of content creation, and inevitably complicate an already thorny debate about the authenticity of digital media.
For now, the most important thing is that it works — and anyone can try it.