WiseEdit: Benchmarking Cognition- and Creativity-Informed Image Editing
arXiv · Code · 🤗 WiseEdit Benchmark · 🤗 Results Gallery
- [Note] If you would like to submit your results on WiseEdit, please contact us at kaihangpan@zju.edu.cn.
- [2025-12-07] 🔥 We release the Results Gallery, which initially contains the generation results of 22 mainstream open- and closed-source image editing models (as shown in our paper).
- [2025-12-07] 🔥 We release the evaluation code for WiseEdit on our GitHub. You are welcome to evaluate your model on WiseEdit!
- [2025-12-07] 🔥 We release the WiseEdit benchmark on Hugging Face.
- [2025-11-29] 🔥 We release the project page with the leaderboard.
- [2025-11-29] 🔥 We release the WiseEdit paper on arXiv.
Abstract
Recent image editing models boast next-level intelligent capabilities, facilitating cognition- and creativity-informed image editing. Yet, existing benchmarks provide too narrow a scope for evaluation, failing to holistically assess these advanced abilities. To address this, we introduce WiseEdit, a knowledge-intensive benchmark for comprehensive evaluation of cognition- and creativity-informed image editing, featuring deep task depth and broad knowledge breadth. Drawing an analogy to human cognitive creation, WiseEdit decomposes image editing into three cascaded steps (Awareness, Interpretation, and Imagination), each corresponding to a task that poses a challenge for models to complete at the specific step. It also encompasses complex tasks, where none of the three steps can be finished easily. Furthermore, WiseEdit incorporates three fundamental types of knowledge: Declarative, Procedural, and Metacognitive knowledge. Ultimately, WiseEdit comprises 1,220 test cases, objectively revealing the limitations of SoTA image editing models in knowledge-based cognitive reasoning and creative composition capabilities.
Leaderboard on WiseEdit (English)
| Model | Multi Img | Awareness IF↑ | Awareness DP↑ | Awareness VQ↑ | Awareness KF↑ | Awareness AVG | Interpretation IF↑ | Interpretation DP↑ | Interpretation VQ↑ | Interpretation KF↑ | Interpretation AVG | Imagination IF↑ | Imagination DP↑ | Imagination VQ↑ | Imagination CF↑ | Imagination AVG | Overall AVG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| InstructPix2Pix | ✗ | 24.6 | 33.7 | 50.6 | 26.4 | 33.8 | 20.7 | 50.3 | 66.9 | 23.6 | 40.4 | 17.0 | 29.9 | 41.2 | 27.4 | 28.8 | 34.4 |
| MagicBrush | ✗ | 27.2 | 43.4 | 53.3 | 27.1 | 37.8 | 16.8 | 50.3 | 63.2 | 22.2 | 38.1 | 18.0 | 36.9 | 44.8 | 22.3 | 30.5 | 35.5 |
| OmniGen | ✓ | 35.0 | 42.0 | 46.7 | 37.4 | 40.3 | 19.0 | 34.8 | 40.3 | 21.5 | 28.9 | 42.2 | 35.1 | 46.0 | 38.7 | 40.5 | 36.6 |
| Janus-4o | ✗ | 34.7 | 37.0 | 45.9 | 36.2 | 38.5 | 27.2 | 43.8 | 53.6 | 28.2 | 38.2 | 28.2 | 37.6 | 42.0 | 25.5 | 33.3 | 36.7 |
| AnyEdit | ✓ | 25.0 | 54.6 | 61.3 | 26.3 | 41.8 | 15.9 | 61.2 | 62.0 | 20.2 | 39.8 | 9.1 | 49.7 | 50.9 | 16.5 | 31.5 | 37.7 |
| UltraEdit | ✗ | 26.5 | 42.5 | 53.1 | 33.9 | 39.0 | 24.3 | 61.7 | 73.6 | 26.7 | 46.6 | 20.7 | 31.7 | 45.8 | 27.5 | 31.5 | 39.0 |
| ICEdit | ✗ | 26.1 | 42.2 | 61.2 | 31.8 | 40.4 | 21.4 | 48.3 | 81.5 | 24.9 | 44.0 | 21.5 | 40.6 | 54.0 | 25.0 | 35.3 | 39.9 |
| UniWorld-V1 | ✓ | 31.5 | 48.9 | 58.8 | 38.6 | 44.5 | 18.1 | 44.5 | 58.1 | 22.5 | 35.8 | 30.3 | 50.3 | 64.2 | 27.5 | 43.1 | 41.1 |
| HiDream-E1 | ✗ | 29.7 | 41.2 | 56.3 | 32.0 | 39.8 | 26.7 | 53.6 | 68.4 | 29.6 | 44.6 | 39.6 | 40.1 | 49.9 | 29.6 | 39.8 | 41.4 |
| FLUX.1 Kontext Dev | ✗ | 31.4 | 52.0 | 55.0 | 35.5 | 43.5 | 27.5 | 62.2 | 69.6 | 29.0 | 47.1 | 39.1 | 47.1 | 43.4 | 27.1 | 39.2 | 43.2 |
| OmniGen2 | ✓ | 35.0 | 64.0 | 75.4 | 41.3 | 53.9 | 18.9 | 56.9 | 64.9 | 23.5 | 41.1 | 42.0 | 64.4 | 74.6 | 31.8 | 53.2 | 49.4 |
| Step1X-Edit-v1p2 | ✗ | 39.8 | 53.5 | 61.3 | 44.4 | 49.7 | 35.7 | 73.0 | 75.2 | 38.2 | 55.5 | 44.7 | 49.4 | 50.3 | 28.4 | 43.2 | 49.5 |
| Echo-4o | ✓ | 47.6 | 63.0 | 75.4 | 51.7 | 59.4 | 30.8 | 71.0 | 80.4 | 32.9 | 53.8 | 63.4 | 62.4 | 73.7 | 41.2 | 60.2 | 57.8 |
| Bagel | ✓ | 46.2 | 71.0 | 75.8 | 50.8 | 61.0 | 38.6 | 72.1 | 78.8 | 39.5 | 57.3 | 62.8 | 68.5 | 74.5 | 40.7 | 61.6 | 60.0 |
| Uni-CoT | ✓ | 46.0 | 69.1 | 77.8 | 51.6 | 61.1 | 36.9 | 70.1 | 76.3 | 38.6 | 55.5 | 67.6 | 64.3 | 79.6 | 42.9 | 63.6 | 60.1 |
| Qwen-Image-Edit | ✓ | 48.1 | 69.0 | 79.5 | 53.6 | 62.5 | 32.1 | 69.7 | 80.6 | 34.2 | 54.1 | 67.1 | 66.8 | 79.2 | 42.3 | 63.8 | 60.2 |
| DreamOmni2 | ✓ | 43.3 | 74.4 | 85.0 | 51.2 | 63.5 | 34.3 | 81.7 | 88.1 | 35.9 | 60.0 | 50.6 | 64.9 | 81.9 | 35.3 | 58.2 | 60.6 |
| FLUX.2-dev | ✓ | 42.6 | 63.3 | 78.4 | 53.3 | 59.4 | 35.4 | 75.0 | 85.6 | 37.6 | 58.4 | 73.6 | 70.7 | 82.1 | 43.6 | 67.5 | 61.8 |
| Nano Banana | ✓ | 70.6 | 85.7 | 86.8 | 75.2 | 79.6 | 63.4 | 84.9 | 91.4 | 61.5 | 75.3 | 75.3 | 73.8 | 87.3 | 44.3 | 70.2 | 75.0 |
| Seedream 4.0 | ✓ | 70.8 | 78.1 | 86.6 | 74.6 | 77.5 | 63.7 | 80.1 | 90.6 | 64.2 | 74.6 | 82.2 | 77.8 | 86.9 | 47.0 | 73.5 | 75.2 |
| GPT-image-1 | ✓ | 78.5 | 85.8 | 88.0 | 81.2 | 83.3 | 62.9 | 82.9 | 93.0 | 60.8 | 74.9 | 84.4 | 76.2 | 89.2 | 48.4 | 74.6 | 77.6 |
| Nano Banana Pro | ✓ | 85.4 | 88.6 | 83.9 | 91.4 | 87.3 | 76.0 | 89.1 | 92.3 | 75.8 | 83.3 | 86.6 | 79.5 | 88.8 | 51.5 | 76.6 | 82.4 |
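The AVG columns in the leaderboard above appear to be plain means: each task AVG matches the mean of that task's four metric scores, and Overall AVG matches the mean of the three task AVGs. This is an inference from the published numbers, not a description of the official evaluation code. The sketch below checks that assumption on two rows copied from the table; the names `SCORES`, `task_avg`, and `overall_avg` are hypothetical helpers introduced only for illustration.

```python
from statistics import mean

# Per-metric scores copied from the English leaderboard (two rows only).
# Metric abbreviations (IF, DP, VQ, KF, CF) are used exactly as in the table.
SCORES = {
    "InstructPix2Pix": {
        "Awareness": {"IF": 24.6, "DP": 33.7, "VQ": 50.6, "KF": 26.4},
        "Interpretation": {"IF": 20.7, "DP": 50.3, "VQ": 66.9, "KF": 23.6},
        "Imagination": {"IF": 17.0, "DP": 29.9, "VQ": 41.2, "CF": 27.4},
    },
    "Nano Banana Pro": {
        "Awareness": {"IF": 85.4, "DP": 88.6, "VQ": 83.9, "KF": 91.4},
        "Interpretation": {"IF": 76.0, "DP": 89.1, "VQ": 92.3, "KF": 75.8},
        "Imagination": {"IF": 86.6, "DP": 79.5, "VQ": 88.8, "CF": 51.5},
    },
}


def task_avg(metrics: dict[str, float]) -> float:
    """Assumed task AVG: unweighted mean of the task's metric scores."""
    return mean(metrics.values())


def overall_avg(tasks: dict[str, dict[str, float]]) -> float:
    """Assumed Overall AVG: unweighted mean of the per-task averages."""
    return mean(task_avg(metrics) for metrics in tasks.values())


if __name__ == "__main__":
    for model, tasks in SCORES.items():
        per_task = {name: round(task_avg(m), 1) for name, m in tasks.items()}
        print(model, per_task, round(overall_avg(tasks), 1))
```

Under this assumption the recomputed overall scores round to 34.4 for InstructPix2Pix and 82.4 for Nano Banana Pro, matching the table. Because the published metric values are already rounded to one decimal, recomputed averages for other rows may differ from the table by about 0.1.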
Leaderboard on WiseEdit (Chinese)
| Model | Multi Img | Awareness IF↑ | Awareness DP↑ | Awareness VQ↑ | Awareness KF↑ | Awareness AVG | Interpretation IF↑ | Interpretation DP↑ | Interpretation VQ↑ | Interpretation KF↑ | Interpretation AVG | Imagination IF↑ | Imagination DP↑ | Imagination VQ↑ | Imagination CF↑ | Imagination AVG | Overall AVG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MagicBrush | ✗ | 15.0 | 43.8 | 52.9 | 17.8 | 32.4 | 10.1 | 40.5 | 59.4 | 13.5 | 30.9 | 5.4 | 39.6 | 49.2 | 12.7 | 26.8 | 30.0 |
| OmniGen | ✓ | 16.3 | 42.6 | 60.1 | 22.6 | 35.4 | 13.4 | 30.9 | 49.2 | 14.8 | 27.1 | 15.4 | 27.6 | 51.3 | 32.3 | 31.7 | 31.3 |
| ICEdit | ✗ | 12.9 | 29.1 | 63.6 | 17.4 | 30.8 | 11.5 | 43.5 | 81.1 | 17.0 | 38.3 | 5.1 | 37.4 | 56.5 | 16.7 | 28.9 | 32.7 |
| FLUX.1 Kontext Dev | ✗ | 16.5 | 48.4 | 52.1 | 19.1 | 34.0 | 16.8 | 58.4 | 66.1 | 22.2 | 40.9 | 9.2 | 41.9 | 43.3 | 10.6 | 26.3 | 33.7 |
| AnyEdit | ✓ | 17.5 | 55.3 | 55.7 | 19.9 | 37.1 | 11.4 | 51.8 | 58.0 | 16.0 | 34.3 | 7.3 | 47.4 | 52.3 | 15.4 | 30.6 | 34.0 |
| InstructPix2Pix | ✗ | 14.2 | 51.0 | 58.6 | 18.1 | 35.5 | 13.0 | 55.7 | 65.2 | 16.0 | 37.5 | 3.7 | 52.0 | 53.2 | 9.0 | 29.5 | 34.1 |
| Janus-4o | ✗ | 31.1 | 38.6 | 46.3 | 34.1 | 37.5 | 23.9 | 45.9 | 55.7 | 25.8 | 37.8 | 25.4 | 36.9 | 41.8 | 22.1 | 31.5 | 35.6 |
| UniWorld-V1 | ✓ | 18.4 | 49.0 | 60.0 | 26.2 | 38.4 | 13.3 | 48.8 | 59.7 | 16.9 | 34.7 | 17.9 | 54.8 | 68.9 | 18.0 | 39.9 | 37.7 |
| HiDream-E1 | ✗ | 28.2 | 37.6 | 51.4 | 32.4 | 37.4 | 25.4 | 47.1 | 63.6 | 29.7 | 41.5 | 32.5 | 39.2 | 47.5 | 27.4 | 36.6 | 38.5 |
| UltraEdit | ✗ | 16.9 | 58.5 | 62.1 | 21.2 | 39.7 | 17.4 | 74.2 | 78.9 | 19.2 | 47.4 | 9.3 | 42.3 | 51.8 | 14.8 | 29.5 | 38.9 |
| OmniGen2 | ✓ | 35.1 | 57.9 | 72.4 | 41.0 | 51.6 | 19.1 | 57.1 | 64.8 | 23.0 | 41.0 | 45.5 | 64.0 | 72.0 | 33.8 | 53.8 | 48.8 |
| Step1X-Edit-v1p2 | ✗ | 38.6 | 55.6 | 59.5 | 42.0 | 48.9 | 37.0 | 77.5 | 76.8 | 35.9 | 56.8 | 45.7 | 48.3 | 51.3 | 27.0 | 43.1 | 49.6 |
| DreamOmni2 | ✓ | 31.9 | 78.7 | 85.4 | 38.4 | 58.6 | 24.0 | 80.1 | 86.2 | 27.5 | 54.4 | 38.5 | 69.4 | 84.8 | 27.2 | 55.0 | 56.0 |
| Echo-4o | ✓ | 47.9 | 59.9 | 73.1 | 55.0 | 59.0 | 31.9 | 74.5 | 77.4 | 32.6 | 54.1 | 62.8 | 64.2 | 75.1 | 41.5 | 60.9 | 58.0 |
| Bagel | ✓ | 48.5 | 71.3 | 76.8 | 52.1 | 62.2 | 36.5 | 68.7 | 75.0 | 38.5 | 54.7 | 63.5 | 68.3 | 75.3 | 39.7 | 61.7 | 59.5 |
| Uni-CoT | ✓ | 46.2 | 70.0 | 80.7 | 53.6 | 62.6 | 37.4 | 71.5 | 79.2 | 36.6 | 56.2 | 65.5 | 65.1 | 79.7 | 41.6 | 63.0 | 60.6 |
| Qwen-Image-Edit | ✓ | 45.0 | 67.3 | 79.9 | 52.9 | 61.3 | 35.8 | 74.0 | 80.7 | 36.1 | 56.6 | 66.3 | 67.2 | 80.0 | 41.7 | 63.8 | 60.6 |
| FLUX.2-dev | ✓ | 43.0 | 60.6 | 79.5 | 51.4 | 58.6 | 34.3 | 73.7 | 83.1 | 36.0 | 56.8 | 75.5 | 74.4 | 82.8 | 42.6 | 68.8 | 61.4 |
| Seedream 4.0 | ✓ | 69.1 | 79.0 | 84.4 | 72.0 | 76.1 | 62.2 | 80.3 | 89.9 | 59.9 | 73.1 | 79.8 | 79.7 | 86.5 | 46.4 | 73.1 | 74.1 |
| Nano Banana | ✓ | 71.8 | 83.8 | 86.5 | 70.7 | 78.2 | 67.9 | 84.5 | 91.4 | 63.7 | 76.9 | 76.0 | 75.8 | 87.3 | 43.7 | 70.7 | 75.3 |
| GPT-image-1 | ✓ | 77.0 | 80.7 | 86.6 | 80.7 | 81.2 | 61.4 | 82.6 | 93.8 | 61.2 | 74.8 | 78.8 | 73.3 | 89.6 | 48.3 | 72.5 | 76.2 |
| Nano Banana Pro | ✓ | 84.6 | 91.8 | 83.1 | 87.9 | 86.9 | 74.2 | 83.9 | 91.3 | 74.6 | 81.0 | 85.5 | 77.6 | 88.4 | 51.1 | 75.6 | 81.2 |
Leaderboard on WiseEdit-Complex
We exclude models unable to handle multi-image inputs.
| Model | English IF↑ | English DP↑ | English VQ↑ | English KF↑ | English CF↑ | English AVG | Chinese IF↑ | Chinese DP↑ | Chinese VQ↑ | Chinese KF↑ | Chinese CF↑ | Chinese AVG | Overall AVG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AnyEdit (ZJU) | 2.5 | 5.6 | 20.6 | 3.3 | 11.7 | 8.7 | 1.3 | 5.1 | 22.2 | 2.9 | 9.3 | 8.2 | 8.4 |
| UniWorld-V1 (PKU) | 18.1 | 32.1 | 55.3 | 22.8 | 28.6 | 31.4 | 8.8 | 23.8 | 64.6 | 12.4 | 15.4 | 25.0 | 28.2 |
| OmniGen (BAAI) | 23.5 | 25.7 | 41.2 | 31.5 | 48.9 | 34.2 | 4.4 | 15.4 | 50.3 | 15.1 | 32.2 | 23.5 | 28.8 |
| OmniGen2 (BAAI) | 32.2 | 48.3 | 70.5 | 47.1 | 42.9 | 48.2 | 28.3 | 49.8 | 73.2 | 45.4 | 43.9 | 48.1 | 48.2 |
| DreamOmni2 (CUHK) | 35.6 | 63.1 | 79.6 | 46.8 | 41.8 | 53.4 | 37.4 | 52.4 | 79.9 | 50.4 | 36.3 | 51.2 | 50.7 |
| Echo-4o (Shanghai AI Lab) | 44.3 | 48.1 | 65.7 | 52.5 | 50.3 | 52.2 | 42.6 | 57.1 | 70.0 | 54.6 | 51.7 | 55.2 | 52.1 |
| Qwen-Image-Edit (Qwen) | 38.7 | 58.6 | 75.8 | 48.5 | 47.1 | 53.8 | 35.3 | 55.0 | 78.1 | 49.6 | 48.9 | 53.4 | 53.6 |
| Bagel (ByteDance) | 44.6 | 62.8 | 70.8 | 54.3 | 44.2 | 55.3 | 40.3 | 59.0 | 74.3 | 56.2 | 47.6 | 55.5 | 53.8 |
| Uni-CoT (SAIS) | 36.3 | 60.8 | 70.1 | 57.8 | 50.3 | 55.1 | 37.2 | 56.0 | 78.2 | 58.9 | 49.7 | 56.0 | 53.9 |
| FLUX.2-dev (Black Forest Labs) | 42.3 | 68.5 | 75.6 | 56.5 | 49.1 | 58.4 | 46.3 | 73.4 | 80.3 | 59.9 | 52.8 | 62.6 | 60.5 |
| Nano Banana | 53.8 | 75.2 | 82.7 | 82.4 | 53.7 | 69.6 | 53.3 | 71.3 | 79.9 | 77.4 | 51.1 | 66.6 | 68.1 |
| GPT-image-1 (OpenAI) | 58.7 | 75.9 | 87.6 | 77.9 | 54.8 | 71.0 | 59.5 | 76.6 | 88.2 | 78.8 | 54.1 | 71.4 | 69.6 |
| Seedream 4.0 (ByteDance) | 68.0 | 78.1 | 80.4 | 90.5 | 53.9 | 74.2 | 60.1 | 63.6 | 81.8 | 88.6 | 56.3 | 70.1 | 70.5 |
| Nano Banana Pro | 68.1 | 78.1 | 86.7 | 88.1 | 56.6 | 75.5 | 77.7 | 83.8 | 84.3 | 90.7 | 57.0 | 78.7 | 77.1 |
Benchmark Examples
Qualitative Comparisons on WiseEdit Awareness Task.
Qualitative Comparisons on WiseEdit Interpretation Task.
Qualitative Comparisons on WiseEdit Imagination Task.
Qualitative Comparisons on WiseEdit-Complex Task.
BibTeX
@article{pan2025wiseedit,
title={WiseEdit: Benchmarking Cognition- and Creativity-Informed Image Editing},
author={Pan, Kaihang and Chen, Weile and Qiu, Haiyi and Yu, Qifan and Bu, Wendong and Wang, Zehan and Zhu, Yun and Li, Juncheng and Tang, Siliang},
journal={arXiv preprint arXiv:2512.00387},
year={2025}
}