Stability AI, a leading company in the field of generative AI, has announced an early preview of its new text-to-image model, Stable Diffusion 3. The model promises high-quality images from natural language prompts, with better handling of multi-subject prompts, improved image quality, and more accurate spelling of text that appears in the image.
What is Stable Diffusion 3?
Stable Diffusion 3 is the latest iteration of Stability AI’s text-to-image models, which allow users to create realistic and detailed images from text descriptions. For example, a user can type “a horse balancing on top of a colorful ball in a field with green grass and a mountain in the background” and get an image that matches the prompt.
Stable Diffusion 3 uses a diffusion transformer architecture, the same general kind of architecture used by OpenAI’s Sora model. It combines a diffusion process, which gradually transforms random noise into a target image, with a transformer network, which serves as the denoising backbone and is conditioned on the text prompt. Additionally, Stable Diffusion 3 uses a technique called flow matching, a training approach in which the model learns a velocity field that carries noise samples to image samples along a continuous path.
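To make the flow matching idea concrete, here is a minimal toy sketch in plain Python. It is not Stability AI’s implementation: it assumes a straight-line path between noise (t = 0) and data (t = 1), which is one common choice, and it stands in for the trained network with a hypothetical `oracle_velocity` function that is only correct because the toy "dataset" contains a single point, `DATA`. All names here are made up for illustration.

```python
import random

# Hypothetical toy "dataset": a single 3-dimensional data point.
DATA = [1.0, -2.0, 0.5]

def interpolate(noise, data, t):
    """Point on the straight-line path from noise (t=0) to data (t=1)."""
    return [(1.0 - t) * n + t * d for n, d in zip(noise, data)]

def target_velocity(noise, data):
    """Velocity of the straight-line path; it does not depend on t."""
    return [d - n for n, d in zip(noise, data)]

def flow_matching_loss(predict_velocity, noise, data, t):
    """Mean squared error between predicted and target velocity at x_t."""
    x_t = interpolate(noise, data, t)
    pred = predict_velocity(x_t, t)
    target = target_velocity(noise, data)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)

def oracle_velocity(x, t):
    """Stand-in for a trained network (assumption for this toy only).

    With a single data point, the ideal velocity at (x, t) points
    straight at DATA and is scaled by the remaining time 1 - t.
    """
    return [(d - xi) / (1.0 - t) for xi, d in zip(x, DATA)]

def euler_sample(predict_velocity, noise, steps=10):
    """Generate a sample by integrating dx/dt = v(x, t) from t=0 to t=1."""
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt  # t stays below 1, so 1 - t is never zero
        v = predict_velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

if __name__ == "__main__":
    start = [random.gauss(0.0, 1.0) for _ in DATA]
    print("loss at t=0.3:", flow_matching_loss(oracle_velocity, start, DATA, 0.3))
    print("generated sample:", euler_sample(oracle_velocity, start))
```

In a real model, the oracle would be replaced by a neural network trained to minimize this loss over many noise and image pairs, with the text prompt as an extra input. Sampling would then integrate the learned velocity field in the same way.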
Stability AI claims that Stable Diffusion 3 is its most capable text-to-image model to date, with significant improvements in multi-subject prompts, image quality, and spelling abilities. According to the company, the model can handle complex prompts that describe several subjects, generate images with sharp and realistic details, and render legible, correctly spelled text inside the images it produces.
How to access Stable Diffusion 3?
Stable Diffusion 3 is not yet broadly available to the public, but Stability AI has opened a waitlist for an early preview. Users who sign up for the waitlist will get a chance to try the model and provide feedback to the company. Stability AI says that this preview phase is crucial for gathering insights to improve the model’s performance and safety ahead of an open release.
Stability AI also says that the Stable Diffusion 3 suite of models ranges from 800 million to 8 billion parameters. The smaller models should be practical to run locally on more modest hardware, while the largest ones demand more memory and compute. This approach aims to democratize access and provide users with different options for scalability and quality to best meet their creative needs.
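As a rough illustration of why parameter count matters for running a model locally, the memory needed just to hold the weights is approximately the parameter count times the bytes per parameter. The sketch below is a back-of-the-envelope estimate only: the precisions listed are assumptions (the announcement does not say which formats the models will ship in), and the figure ignores activations and any other components loaded alongside the main model.

```python
# Assumed storage formats; not taken from the announcement.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_memory_gb(num_params, precision):
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

if __name__ == "__main__":
    # Parameter counts for the smallest and largest models, per the article.
    for name, params in [("800M", 800_000_000), ("8B", 8_000_000_000)]:
        for precision in ("fp32", "fp16", "int8"):
            gb = weight_memory_gb(params, precision)
            print(f"{name} at {precision}: about {gb:.1f} GB")
```

Under these assumptions, the 800 million parameter model needs about 1.6 GB for its weights at 16-bit precision, while the 8 billion parameter model needs about 16 GB.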
Stability AI has a history of releasing open-weights and source-available models, which means that users can download the models and fine-tune them to change their outputs. However, the company also has a history of controversy, as some of its models have been accused of using copyrighted training data, generating biased and harmful images, and enabling misuse by bad actors. Stability AI says that it has taken and continues to take reasonable steps to prevent the abuse of Stable Diffusion 3, and that it collaborates with researchers, experts, and its community to ensure generative AI is open, safe, and universally accessible.
Why is Stable Diffusion 3 important?
Stable Diffusion 3 is an important milestone for the field of generative AI, as it demonstrates the potential of text-to-image models to create realistic and diverse images from natural language. Text-to-image models have many applications in various domains, such as art, education, entertainment, and design. For example, a user can use a text-to-image model to create illustrations, animations, logos, memes, or wallpapers.
Stable Diffusion 3 is also an important competitor to other text-to-image models, such as OpenAI’s DALL-E 3, a proprietary model whose weights have not been released. Stability AI aims to offer an alternative that is more open and adaptable, and that enables individuals, developers, and enterprises to unleash their creativity.
Stability AI says that its mission is to activate humanity’s potential, and that with Stable Diffusion 3, it strives to offer solutions that align with its core values. The company also says that it will publish a detailed technical report on Stable Diffusion 3 soon, and that it will post progress updates on its social media platforms and in its Discord community.