A team of researchers from the Hong Kong University of Science and Technology (HKUST), in collaboration with Ant Group, Zhejiang University (ZJU), and the University of Hong Kong (HKU), has developed MagicQuill, a new image editing system built on diffusion models. The system aims to provide a user-friendly experience by combining AI-driven suggestions with precise local editing capabilities. It builds on Stable Diffusion v1.5 and a fine-tuned LLaVA-1.5, and is positioned against existing tools such as SmartEdit, BrushNet, and GAN-based editors like SketchEdit, as well as diffusion-based systems such as Palette (from Google Research). Through Multimodal Large Language Models (MLLMs), it recognizes the user's editing intent in real time.

MagicQuill aims to address the limitations of existing tools by improving precision, reducing complexity, and making sophisticated image editing accessible to users of all skill levels. Its Painting Assistor uses MLLMs to predict editing prompts in real time as the user draws, removing the need to type prompts by hand. The Idea Collector interface streamlines interaction, supporting iterative edits, stroke management, and result previews.

The system still has limitations: output quality degrades when sketches deviate from the predicted prompts, fine details can be lost during color adjustments, and simple or ambiguous sketches are sometimes misinterpreted. It also requires high-end hardware and lacks advanced features such as reference-based editing, which may be added in future updates.

Overall, MagicQuill is a promising image editing tool, and its pairing of diffusion-based local editing with MLLM-driven intent prediction sets it apart from other systems on the market. The two sketches below give a rough sense of how those two pieces might be wired together.
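First, the prompt-prediction idea behind the Painting Assistor. MagicQuill's actual fine-tuned LLaVA checkpoint and its internal prompt format are not public details reproduced here, so this minimal sketch substitutes the base llava-1.5 checkpoint from the Hugging Face hub; the question wording and the helper name `predict_edit_prompt` are illustrative assumptions.

```python
# Minimal sketch of MLLM-based edit-prompt prediction, in the spirit of
# MagicQuill's Painting Assistor. Stand-in model: base llava-1.5 (MagicQuill
# fine-tunes its own LLaVA-1.5; that checkpoint is not assumed here).
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "llava-hf/llava-1.5-7b-hf"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

def predict_edit_prompt(canvas: Image.Image) -> str:
    """Guess what the user is sketching so the edit prompt can be pre-filled."""
    # llava-1.5-hf chat format: the <image> token marks where pixels are inserted.
    question = (
        "USER: <image>\nThe user has drawn a rough sketch on this image. "
        "In a few words, what are they trying to add? ASSISTANT:"
    )
    inputs = processor(images=canvas, text=question, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    text = processor.decode(output[0], skip_special_tokens=True)
    return text.split("ASSISTANT:")[-1].strip()

# e.g. predict_edit_prompt(Image.open("canvas_with_strokes.png"))
```

In MagicQuill this prediction runs as the user draws, which is why the paper stresses real-time inference rather than a one-off query like the one above.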
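Second, the local-editing side. MagicQuill's Editing Processor layers BrushNet- and ControlNet-style branches over Stable Diffusion v1.5; as a much simpler stand-in, the sketch below uses plain SD-1.5 inpainting from the diffusers library to show the mask-plus-prompt interface the system automates. The checkpoint ID and the helper name `local_edit` are assumptions (the original runwayml repository has been mirrored under various names).

```python
# Minimal sketch of diffusion-based local editing: regenerate only the masked
# region of an image, guided by a text prompt. This is generic SD-1.5
# inpainting, not MagicQuill's BrushNet-based Editing Processor.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # assumed SD-1.5 inpainting weights
    torch_dtype=torch.float16,
).to("cuda")

def local_edit(image: Image.Image, mask: Image.Image, prompt: str) -> Image.Image:
    """Repaint the white region of `mask` according to `prompt`.

    In MagicQuill the mask would come from the user's strokes and the prompt
    from the Painting Assistor, so neither is typed by hand.
    """
    return pipe(
        prompt=prompt,
        image=image.convert("RGB").resize((512, 512)),
        mask_image=mask.convert("RGB").resize((512, 512)),
        num_inference_steps=30,
    ).images[0]

# e.g. local_edit(Image.open("photo.png"), Image.open("stroke_mask.png"), "a red scarf")
```

Chaining the two helpers, feeding `predict_edit_prompt`'s output into `local_edit`, approximates the end-to-end loop the paper describes, minus the stroke management and previews handled by the Idea Collector.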