Breaking Down Sora AI’s Limitations: What You Need to Know
Sora AI is a text-to-video model created by OpenAI, the company behind DALL-E and ChatGPT. It turns written text prompts into realistic videos up to one minute long, with detailed scenes, camera motion, and expressive characters. Sora is a diffusion model: it starts from random noise and refines it step by step into clear images or video frames that match the prompt. It is trained on a large collection of videos paired with text descriptions, which teaches it how the imagery and actions in a video should correspond to the words of a prompt.
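To make "start from noise and refine it step by step" concrete, here is a toy sketch of the sampling loop of a DDPM-style diffusion model in plain Python. It is not Sora's code, which OpenAI has not released. In a real model, a trained neural network predicts the noise at each step. Here a hypothetical `oracle_noise_predictor` stands in for that network by cheating: it already knows the target values. The linear noise schedule (1,000 steps, betas from 1e-4 to 0.02, as in the original DDPM paper) and the three-number "image" are illustrative assumptions.

```python
import math
import random


def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed, common DDPM choice)."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    # alpha_bars[t] is the cumulative product of alphas[0..t].
    alpha_bars = []
    prod = 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def oracle_noise_predictor(x_t, alpha_bar_t, target):
    """Hypothetical stand-in for a trained network.

    It recovers the exact noise only because it is given the clean target.
    """
    return [(x - math.sqrt(alpha_bar_t) * x0) / math.sqrt(1.0 - alpha_bar_t)
            for x, x0 in zip(x_t, target)]


def sample(target, num_steps=1000, seed=0):
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    # Start from pure Gaussian noise, one value per "pixel".
    x = [rng.gauss(0.0, 1.0) for _ in target]
    # Walk the schedule backwards, removing a little noise at each step.
    for t in reversed(range(num_steps)):
        eps = oracle_noise_predictor(x, alpha_bars[t], target)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            # Re-inject a smaller amount of fresh noise, except at the final step.
            sigma = math.sqrt(betas[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x


if __name__ == "__main__":
    print(sample([0.5, -1.0, 2.0]))
```

Calling `sample([0.5, -1.0, 2.0])` begins with random numbers and, after the last denoising step, returns values equal to the target up to floating-point error. With a real, imperfect network, each step only nudges the sample toward something consistent with the prompt, which is why outputs can contain the kinds of flaws described later in this article.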
What are its capabilities?
Video and image generation
Sora can generate high-definition videos up to one minute long from text prompts. It can also generate still images in a variety of sizes.
Understanding the user’s language
The model has a strong grasp of language, which lets it turn text prompts into videos and extend existing videos forward or backward in time. Even when a user writes a short prompt, the system first expands it into a longer, more detailed caption and passes that to the video model, so the generated video follows the user's intent more accurately.
Convert image and video prompts into video
Sora can make a static image move: given an image as the prompt, it generates a video from it. Given a video as the prompt, it can extend the video's duration, not by slowing it down but by generating new frames that continue the existing footage.
Image and video editing
Users can give Sora an image or video and ask it to change the style, for example by applying a filter-like look or changing the scenery, and Sora will edit it accordingly.
Create transitions between two videos
Sora can also combine two different videos into one with a seamless transition. For example, given one video of an egg and another of a chicken, Sora can generate a single video in which the egg gradually turns into the chicken.
Capable of simulations
Sora can simulate realistic people, animals, and places from the real world. It can generate videos with dynamic camera motion, such as rotating or moving up, down, and around a scene. Thanks to long-range coherence and object permanence, people, animals, and objects continue to exist even when they move out of the frame, so users can see the same subject from multiple angles. Actions can also leave lasting effects, much as they would in reality: a person eating a burger can leave bite marks in it. Sora can also simulate digital worlds, such as virtual environments that resemble video games like Minecraft.
Sora Limitations
Sora AI, despite its impressive capabilities, has some limitations that are important to consider:
- Short Video Length: Currently, Sora can only create videos up to one minute long.
- Flawed Simulations: The model may struggle to accurately simulate the physics of complex scenes and may not understand specific instances of cause and effect. For example, after a slice is taken from a whole cake, there should be a gap, yet the cake may still appear intact. In other words, the model does not always know how one action should affect something else.
- Spatial Details and Time: Sora may confuse spatial details in prompts, such as left and right, and may struggle with precise descriptions of events that unfold over time, like following a specific camera trajectory.
Sora AI is currently being tested by a small group of people to make sure it is safe and works well before it becomes available to everyone. It could change how videos are made and used in areas such as advertising and marketing. Other AI video tools, such as Veed.io and Runway, offer similar features. OpenAI is also engaging with policymakers, educators, and artists to understand their concerns about this kind of technology. For now, there is no date for when Sora will be released to the public.