OpenAI, the company behind the popular chatbot ChatGPT, has unveiled its latest innovation: Sora, a powerful video-generation model that can create realistic and imaginative scenes from text instructions.
What is Sora and how does it work?
Sora, which means “sky” in Japanese, is an AI model that can generate high-definition videos up to one minute long from whatever prompt a user types into a text box. It can construct complex scenes featuring multiple characters, specific types of motion, and accurate details of both subject and background.
According to OpenAI’s official announcement, the model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.
“The model has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions,” OpenAI wrote in a blog post introducing Sora. “Sora can also create multiple shots within a single generated video that accurately persist characters and visual style.”
Why is Sora a breakthrough in AI technology?
Sora is not the first text-to-video AI tool on the market. Google and smaller companies such as Runway have their own models, which offer similar functions to OpenAI’s. However, early users of the new model have praised Sora for its detailed and mostly realistic-looking output.
The sample videos from OpenAI’s Sora are high-definition and full of detail. OpenAI also says the model can generate videos up to a minute long, which is much longer than most existing models can manage. One video of a Tokyo street scene suggests that Sora has learned how objects fit together in 3D: the camera swoops into the scene to follow a couple as they walk past a row of shops.
Sora also handles occlusion well, which means it can keep track of objects when they drop out of view. Existing models often struggle with this, failing to maintain consistency or realism in complex scenes.
The emergence of Sora reflects a broader shift in AI development towards video generation. Competitors such as Runway, Pika, and Google’s Lumiere have also made significant strides in this domain.
However, Sora appears to mark a significant leap forward: its sample videos suggest a strong grasp of language, physical motion, and emotional expression, along with a high degree of visual inventiveness.
What are the potential applications and risks of Sora?
Sora is currently accessible only to select individuals designated as “red teamers”, who are tasked with evaluating the model for potential harms and risks. OpenAI has also extended access to a number of visual artists, designers, and filmmakers to gather feedback on how to make the model more useful to creative professionals.
OpenAI has not confirmed if or when it will release Sora to the public, but the company is strongly signalling that a public release is on the cards. “Learning from real-world use is critical,” OpenAI said in its blog post.
Sora could have many positive applications, such as enhancing education, entertainment, and communication. For example, Sora could help teachers create engaging videos for their students, or help filmmakers visualise their scripts before shooting. Sora could also enable users to create personalised videos for their friends and family, or express their own creativity and ideas.
However, Sora also poses ethical and safety challenges, chief among them the potential misuse of fake but photorealistic video. Experts have expressed concern that AI-generated content could be used to improperly influence elections, spread misinformation, or harm individuals or groups. The World Economic Forum’s Global Risks Report 2024 ranked misinformation and disinformation, including AI-generated content, as the most severe global risk over the next two years.
OpenAI is aware of these risks and says it is taking steps to mitigate them. For example, the company has added watermarks to images generated by its text-to-image tool, DALL-E, to combat the proliferation of fake content. OpenAI also said it is developing tools to detect misleading content, such as a detection classifier capable of identifying videos generated by Sora.
OpenAI also said it will leverage existing safety protocols developed for its products that use DALL-E, which are relevant to Sora as well. These include limiting the number of queries per user, filtering out harmful or abusive prompts, and providing clear disclaimers and warnings to users.
Sora is an impressive AI tool that can create striking videos from text. However, it also raises questions about the future of AI and its impact on society. As OpenAI continues to refine its model, it will need to balance the benefits and risks of the technology and ensure that it is used responsibly and ethically.