Meta has announced its own artificial intelligence (AI) tool, called CM3leon, which can understand and generate both text and images. This means you can create images from text descriptions and compose text based on images, making it useful for a range of tasks. For example, typing "A small cactus wearing a straw hat and reflective glasses in the Sahara desert" produces an image matching that description.
CM3leon was trained with a recipe from text-only models
According to Meta, CM3leon is the first multimodal model trained with a recipe adapted from text-only language models. This approach speeds up training and allows larger transformers to be trained, with a significant but achievable gain in performance.
Meta also claims that CM3leon is more efficient than most transformers, requiring five times less computing power and a smaller training data set than previous transformer-based methods.
In addition to generating images from text, CM3leon can edit images using text prompts; for example, you can ask it to change the color of the sky to bright blue. This is challenging, because the model must simultaneously understand both the textual instructions and the visual content. The tool can also do the reverse: ask the AI to describe a photo, and it will respond in words.
