Apple has introduced MGIE, an open-source AI model that lets users edit images simply by providing text prompts in natural language. MGIE, short for MLLM-Guided Image Editing, uses multimodal large language models to interpret instructions and make detailed, pixel-level changes to photos. The tool marks a significant advance in multimodal AI and could substantially streamline creative workflows.
MGIE was developed in a collaboration between Apple and researchers at UC Santa Barbara and presented in a paper at the International Conference on Learning Representations (ICLR), a leading venue for state-of-the-art AI research. The experiments described in the paper show that MGIE improves automatic image-editing metrics and scores well in human evaluations, while maintaining competitive computational efficiency.
So, how does MGIE work its magic? It incorporates multimodal large language models (MLLMs) to comprehend instructions and generate visual outputs. MLLMs have proven to be highly proficient in cross-modal reasoning and responding appropriately to text-image inputs. By integrating MLLMs into the editing pipeline, MGIE can translate user commands into concise and unambiguous editing guidance. For instance, a prompt like “make the sky more blue” would be translated into an instruction to increase the saturation of the sky region by 20%.
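To make that translation step concrete, here is a minimal, self-contained sketch of the idea in Python. The rule that maps "make the sky more blue" to a 20% saturation boost on the sky region, and the treatment of the top half of the image as the "sky," are hypothetical stand-ins for what the MLLM derives; this is not Apple's implementation.

```python
def derive_guidance(prompt: str) -> dict:
    # Toy stand-in for the MLLM step (hypothetical rule, not Apple's model):
    # turn a terse prompt into explicit, unambiguous editing guidance.
    if "more blue" in prompt.lower():
        return {"op": "saturation", "region": "sky", "factor": 1.2}
    raise ValueError("prompt not covered by this sketch")

def boost_saturation(pixel, factor):
    # Scale each channel's distance from the gray average
    # (a simple saturation boost without an HSV round-trip).
    r, g, b = pixel
    avg = (r + g + b) / 3
    clamp = lambda v: max(0, min(255, round(avg + (v - avg) * factor)))
    return (clamp(r), clamp(g), clamp(b))

def apply_guidance(image, guidance):
    # `image` is a 2-D list of RGB tuples; as a crude proxy,
    # treat the top half of the rows as the "sky" region.
    sky_rows = len(image) // 2
    return [
        [boost_saturation(px, guidance["factor"]) if y < sky_rows else px
         for px in row]
        for y, row in enumerate(image)
    ]

image = [[(100, 120, 200)] * 4 for _ in range(4)]  # flat bluish test image
edited = apply_guidance(image, derive_guidance("make the sky more blue"))
```

The key point is the intermediate structured guidance: instead of feeding the vague prompt straight to an editing model, the pipeline first produces an explicit operation that can be applied deterministically.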
MGIE's versatile design covers a wide range of image editing use cases. It handles common adjustments such as cropping, rotating, and filtering, similar to what can be done in Photoshop, and it can also perform more advanced manipulations such as object replacement, background changes, and photo blending. The model optimizes images globally by adjusting properties like brightness and contrast, and it supports localized edits on specific regions and objects, modifying visual attributes such as shape, size, color, texture, and style.
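The global optimizations mentioned above boil down to simple per-pixel transforms applied image-wide. The sketch below shows the standard brightness/contrast formulation (contrast scales each channel's distance from mid-gray, brightness shifts all channels); the pixel values and factors are arbitrary examples, not outputs of MGIE.

```python
def adjust(pixel, brightness=0, contrast=1.0):
    # Contrast widens or narrows the spread around mid-gray (128);
    # brightness shifts every channel by a constant offset.
    clamp = lambda v: max(0, min(255, round(v)))
    return tuple(clamp((c - 128) * contrast + 128 + brightness) for c in pixel)

def global_edit(image, brightness=0, contrast=1.0):
    # Apply the same tonal adjustment to every pixel in the image.
    return [[adjust(px, brightness, contrast) for px in row] for row in image]

image = [[(64, 128, 192)] * 2 for _ in range(2)]
brighter = global_edit(image, brightness=20)   # lift every channel by 20
punchier = global_edit(image, contrast=1.5)    # widen spread around mid-gray
```

Localized edits of the kind MGIE performs follow the same pattern, except the transform is restricted to a masked region rather than the whole frame.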
A live web demo is also available on Hugging Face Spaces for anyone who wants to try the model. MGIE interprets natural language instructions and produces the edited image along with the editing steps it applied, and users can give feedback to iteratively refine the results. MGIE's API allows it to be integrated into applications that need image manipulation capabilities.
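The article does not document MGIE's actual API surface, so the following is only a hypothetical sketch of the interface shape an application might code against when integrating an instruction-based editor: image bytes and an instruction in, edited bytes and a human-readable description of the applied steps out. The `ImageEditor` protocol and the `PassThroughEditor` stub are invented names for illustration.

```python
from typing import Protocol

class ImageEditor(Protocol):
    # Hypothetical integration interface; not MGIE's documented API.
    def edit(self, image_bytes: bytes, instruction: str) -> tuple[bytes, str]:
        """Return (edited image bytes, description of the editing steps)."""
        ...

class PassThroughEditor:
    # Stand-in implementation for testing application plumbing
    # without loading the actual model.
    def edit(self, image_bytes: bytes, instruction: str) -> tuple[bytes, str]:
        return image_bytes, f"no-op: would apply '{instruction}'"

editor: ImageEditor = PassThroughEditor()
out, steps = editor.edit(b"\x89PNG...", "make the sky more blue")
```

Coding against a small interface like this keeps the application decoupled from whichever editing backend (local model, hosted demo, or API) ends up behind it.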
MGIE marks a notable step forward in instruction-based image editing. It demonstrates how MLLMs can augment editing pipelines and opens fresh avenues for cross-modal interaction and communication.