Alibaba Releases Qwen-Image-Edit: 20B Open-Source Model For Advanced Image And Text Editing

2025-08-19 14:13:00

In Brief

Alibaba Cloud’s Qwen team has launched Qwen-Image-Edit, an image editing model that combines semantic and appearance editing with precise text editing in both Chinese and English, aimed at creative as well as practical applications.

Alibaba Cloud’s Qwen team has introduced Qwen-Image-Edit, an image editing model built on the 20B-parameter Qwen-Image model. It extends Qwen-Image’s distinctive text rendering capabilities to image editing, with a particular focus on precise text modification. Qwen-Image-Edit feeds each input image into two parallel components: Qwen2.5-VL, which handles visual semantic control, and a VAE encoder, which handles visual appearance. This dual path lets the model handle both semantic-level and appearance-level editing tasks. The tool is available in Qwen Chat under the “Image Editing” feature.
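To make the two-branch idea concrete, here is a toy sketch in plain Python. It is not Qwen’s implementation: the function names, the “features” they return, and the `EditCondition` container are all hypothetical stand-ins, since the article does not describe the real interfaces of Qwen2.5-VL or the VAE encoder. The sketch only shows the structural point that the same input image is read by two parallel branches, one producing a coarse semantic summary and one preserving fine-grained appearance detail, and that both are handed on together with the edit instruction.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EditCondition:
    """Hypothetical bundle of conditioning signals for one edit request."""
    semantic: List[float]    # coarse "what is in the image" signal
    appearance: List[float]  # fine "what the image looks like" signal
    prompt: str


def semantic_branch(pixels: List[float], prompt: str) -> List[float]:
    # Toy placeholder for the Qwen2.5-VL branch: a coarse summary of the
    # image (its mean value) plus a trivial feature of the instruction.
    if not pixels:
        raise ValueError("image must contain at least one pixel")
    return [sum(pixels) / len(pixels), float(len(prompt))]


def appearance_branch(pixels: List[float]) -> List[float]:
    # Toy placeholder for the VAE encoder branch: keeps local detail by
    # averaging adjacent pixel pairs instead of collapsing the whole image.
    # With an odd number of pixels, the last one is dropped.
    return [(pixels[i] + pixels[i + 1]) / 2 for i in range(0, len(pixels) - 1, 2)]


def encode_for_edit(pixels: List[float], prompt: str) -> EditCondition:
    # Both branches read the same input image independently ("in parallel").
    return EditCondition(
        semantic=semantic_branch(pixels, prompt),
        appearance=appearance_branch(pixels),
        prompt=prompt,
    )


if __name__ == "__main__":
    cond = encode_for_edit([0.0, 0.2, 0.4, 0.6], "add a signboard")
    print(cond)
```

In this sketch a semantic edit would rely mostly on the coarse branch, while an appearance edit that must leave other regions untouched would rely on the detail-preserving branch; how the real model weighs the two is not specified in the article.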

Qwen-Image-Edit is designed to work across several editing dimensions. It supports appearance-level edits, such as adding, removing, or modifying visual elements while leaving every other area of the image intact, and semantic-level edits, such as creating original intellectual property (IP) content, rotating objects, or transferring styles, where broader pixel changes are allowed as long as the semantic content is preserved. It also offers fine-grained text editing in both Chinese and English, letting users add, remove, or change text within an image while keeping its font, size, and style consistent. According to the Qwen team, benchmark results on several widely used public datasets show Qwen-Image-Edit achieving state-of-the-art image editing performance, which positions it as a strong foundation model for future work in this area.

Qwen-Image-Edit’s Semantic And Appearance Editing For Creative And Practical Applications

One of the defining features of Qwen-Image-Edit is its capability in both semantic and appearance editing. Semantic editing changes the content of an image while keeping its underlying visual meaning intact. To illustrate this simply, the development team uses Qwen’s official mascot, the Capybara, as an example.

In this example, most pixels in the edited image differ from those in the original input image on the left, yet the Capybara character remains fully consistent. This demonstrates Qwen-Image-Edit’s strong semantic editing capability, which supports flexible and varied creation of original IP content. In addition, the team built a dedicated set of editing prompts in Qwen Chat around the 16 MBTI personality types and used them to produce a complete collection of MBTI-themed emoji packs featuring the Capybara, broadening both the range and the visibility of the character.

Novel view synthesis is another important use case for semantic editing. Qwen-Image-Edit can rotate objects by 90 degrees or by 180 degrees, the latter showing the object’s rear side directly. Style transfer is a further example: a standard portrait can be reinterpreted in multiple artistic styles, including ones reminiscent of Studio Ghibli.

Alongside semantic editing, appearance editing is one of the most frequently needed image modification functions. It keeps specific regions of an image completely unchanged while adding, removing, or altering designated elements. As an example in which a signboard is seamlessly added to a scene shows, appearance editing suits a broad range of applications, such as changing the background behind a person or modifying clothing. Another defining capability of Qwen-Image-Edit is precise text editing, a feature inherited from Qwen-Image’s strong text rendering capabilities.
