Runway AI, a start-up based in New York, has announced the availability of its Gen-2 system, which generates short video clips from just a few words of user input. Users type a description of what they want to see, and the system produces a roughly 3-second clip depicting that scene. Alternatively, users can upload an image for the system to use as a prompt. The launch makes Gen-2 one of the highest-profile examples of text-to-video generation available outside a research lab.
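For readers who want a concrete picture of the prompt-in, clip-out workflow described above, here is a minimal sketch of what a request to such a text-to-video service could look like. Everything in it is a hypothetical placeholder for illustration only: the endpoint URL, authentication scheme, field names, and response format are assumptions, not Runway’s documented API.

```python
import requests

# Purely illustrative: the endpoint, field names, and response shape below are
# assumptions made for this example, not Runway's actual API.
API_URL = "https://api.example.com/v1/text-to-video"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"                              # placeholder credential


def generate_clip(prompt: str, image_path: str | None = None) -> bytes:
    """Request a short clip from a text prompt, optionally guided by a reference image."""
    files = {"image": open(image_path, "rb")} if image_path else None
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        data={"prompt": prompt, "duration_seconds": 3},  # short clips, per the article
        files=files,
        timeout=120,
    )
    resp.raise_for_status()
    return resp.content  # hypothetical: raw bytes of the generated video


if __name__ == "__main__":
    clip = generate_clip("drone footage of a desert landscape")
    with open("desert.mp4", "wb") as f:
        f.write(clip)
```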
Runway AI has been developing AI-based film and editing tools since 2018 and raised $50 million late last year. It helped create the original version of Stable Diffusion, a text-to-image AI model that has since been popularized and further developed by Stability AI.
The Gen-2 system is currently available through a waitlist: interested users can sign up for access to a private Discord channel, and the company plans to add more users each week. While both Alphabet’s Google and Meta Platforms have shown off their own text-to-video research, with sample clips of a teddy bear washing dishes and a sailboat on a lake, neither has announced plans to move that work beyond the research stage.
In an exclusive live demo, Runway co-founder and CEO Cris Valenzuela showed off the Gen-2 system by generating a clip from the prompt “drone footage of a desert landscape.” The result was a few seconds long and slightly distorted, but it did look like drone footage of a desert, complete with a blue sky, clouds on the horizon, and a rising or setting sun in the right corner of the frame. The system has shown clear strengths, such as producing crisp, lifelike close-ups of objects, but it still has weaknesses.