WiseEdit: Benchmarking Cognition- and Creativity-Informed Image Editing

¹Zhejiang University, ²Shanghai Artificial Intelligence Laboratory
*Equal contribution
arXiv | Code | 🤗 WiseEdit Benchmark | 🤗 Results Gallery
  • [Note] If you would like to submit your WiseEdit results, please contact us at kaihangpan@zju.edu.cn.
  • [2025-12-07] 🔥 We release the Results Gallery, which initially contains the generation results of 22 mainstream open- and closed-source image editing models (as reported in our paper).
  • [2025-12-07] 🔥 We release the evaluation code for WiseEdit on GitHub. You are welcome to evaluate your model on WiseEdit!
  • [2025-12-07] 🔥 We release the WiseEdit benchmark on Hugging Face.
  • [2025-11-29] 🔥 We release the project page with the leaderboard.
  • [2025-11-29] 🔥 We release the WiseEdit paper on arXiv.

Abstract

Recent image editing models boast next-level intelligent capabilities, facilitating cognition- and creativity-informed image editing. Yet existing benchmarks provide too narrow a scope for evaluation and fail to holistically assess these advanced abilities. To address this, we introduce WiseEdit, a knowledge-intensive benchmark for comprehensive evaluation of cognition- and creativity-informed image editing, featuring both task depth and knowledge breadth. Drawing an analogy to human cognitive creation, WiseEdit decomposes image editing into three cascaded steps (Awareness, Interpretation, and Imagination), each paired with a task that specifically challenges models at that step. It also includes complex tasks in which none of the three steps is easy to complete. Furthermore, WiseEdit incorporates three fundamental types of knowledge: Declarative, Procedural, and Metacognitive knowledge. In total, WiseEdit comprises 1,220 test cases, which objectively reveal the limitations of SoTA image editing models in knowledge-based cognitive reasoning and creative composition.
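To make the benchmark's two axes concrete (task depth across the cascaded steps, knowledge breadth across knowledge types), here is a minimal sketch that tags each test case with one task setting and one knowledge type and tallies how many cases fall into each cell. The `EditCase` record, its fields, and the `coverage` helper are hypothetical illustrations, not the schema of the released WiseEdit data or its evaluation code.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Task(Enum):
    """The three cascaded editing steps, plus the complex setting."""
    AWARENESS = "awareness"
    INTERPRETATION = "interpretation"
    IMAGINATION = "imagination"
    COMPLEX = "complex"  # none of the three steps is easy to complete


class Knowledge(Enum):
    """The three fundamental knowledge types named in the abstract."""
    DECLARATIVE = "declarative"
    PROCEDURAL = "procedural"
    METACOGNITIVE = "metacognitive"


@dataclass(frozen=True)
class EditCase:
    # Hypothetical record layout; NOT the schema of the released dataset.
    case_id: str
    task: Task
    knowledge: Knowledge
    instruction: str


def coverage(cases):
    """Count cases per (task, knowledge type) cell; missing cells read as 0."""
    return Counter((c.task, c.knowledge) for c in cases)
```

Because `Counter` returns 0 for absent keys, empty cells of the task-by-knowledge grid can be queried directly, which makes gaps in coverage easy to spot.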

[Figure: WiseEdit introduction]

Leaderboard on WiseEdit (English)

| Model | Multi-Img | Awareness IF↑ | DP↑ | VQ↑ | KF↑ | AVG | Interpretation IF↑ | DP↑ | VQ↑ | KF↑ | AVG | Imagination IF↑ | DP↑ | VQ↑ | CF↑ | AVG | Overall AVG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| InstructPix2Pix | ✗ | 24.6 | 33.7 | 50.6 | 26.4 | 33.8 | 20.7 | 50.3 | 66.9 | 23.6 | 40.4 | 17.0 | 29.9 | 41.2 | 27.4 | 28.8 | 34.4 |
| MagicBrush | ✗ | 27.2 | 43.4 | 53.3 | 27.1 | 37.8 | 16.8 | 50.3 | 63.2 | 22.2 | 38.1 | 18.0 | 36.9 | 44.8 | 22.3 | 30.5 | 35.5 |
| OmniGen | ✓ | 35.0 | 42.0 | 46.7 | 37.4 | 40.3 | 19.0 | 34.8 | 40.3 | 21.5 | 28.9 | 42.2 | 35.1 | 46.0 | 38.7 | 40.5 | 36.6 |
| Janus-4o | ✗ | 34.7 | 37.0 | 45.9 | 36.2 | 38.5 | 27.2 | 43.8 | 53.6 | 28.2 | 38.2 | 28.2 | 37.6 | 42.0 | 25.5 | 33.3 | 36.7 |
| AnyEdit | ✗ | 25.0 | 54.6 | 61.3 | 26.3 | 41.8 | 15.9 | 61.2 | 62.0 | 20.2 | 39.8 | 9.1 | 49.7 | 50.9 | 16.5 | 31.5 | 37.7 |
| UltraEdit | ✗ | 26.5 | 42.5 | 53.1 | 33.9 | 39.0 | 24.3 | 61.7 | 73.6 | 26.7 | 46.6 | 20.7 | 31.7 | 45.8 | 27.5 | 31.5 | 39.0 |
| ICEdit | ✗ | 26.1 | 42.2 | 61.2 | 31.8 | 40.4 | 21.4 | 48.3 | 81.5 | 24.9 | 44.0 | 21.5 | 40.6 | 54.0 | 25.0 | 35.3 | 39.9 |
| UniWorld-V1 | ✓ | 31.5 | 48.9 | 58.8 | 38.6 | 44.5 | 18.1 | 44.5 | 58.1 | 22.5 | 35.8 | 30.3 | 50.3 | 64.2 | 27.5 | 43.1 | 41.1 |
| HiDream-E1 | ✗ | 29.7 | 41.2 | 56.3 | 32.0 | 39.8 | 26.7 | 53.6 | 68.4 | 29.6 | 44.6 | 39.6 | 40.1 | 49.9 | 29.6 | 39.8 | 41.4 |
| FLUX.1 Kontext Dev | ✗ | 31.4 | 52.0 | 55.0 | 35.5 | 43.5 | 27.5 | 62.2 | 69.6 | 29.0 | 47.1 | 39.1 | 47.1 | 43.4 | 27.1 | 39.2 | 43.2 |
| OmniGen2 | ✓ | 35.0 | 64.0 | 75.4 | 41.3 | 53.9 | 18.9 | 56.9 | 64.9 | 23.5 | 41.1 | 42.0 | 64.4 | 74.6 | 31.8 | 53.2 | 49.4 |
| Step1X-Edit-v1p2 | ✗ | 39.8 | 53.5 | 61.3 | 44.4 | 49.7 | 35.7 | 73.0 | 75.2 | 38.2 | 55.5 | 44.7 | 49.4 | 50.3 | 28.4 | 43.2 | 49.5 |
| Echo-4o | ✓ | 47.6 | 63.0 | 75.4 | 51.7 | 59.4 | 30.8 | 71.0 | 80.4 | 32.9 | 53.8 | 63.4 | 62.4 | 73.7 | 41.2 | 60.2 | 57.8 |
| Bagel | ✓ | 46.2 | 71.0 | 75.8 | 50.8 | 61.0 | 38.6 | 72.1 | 78.8 | 39.5 | 57.3 | 62.8 | 68.5 | 74.5 | 40.7 | 61.6 | 60.0 |
| Uni-CoT | ✓ | 46.0 | 69.1 | 77.8 | 51.6 | 61.1 | 36.9 | 70.1 | 76.3 | 38.6 | 55.5 | 67.6 | 64.3 | 79.6 | 42.9 | 63.6 | 60.1 |
| Qwen-Image-Edit | ✓ | 48.1 | 69.0 | 79.5 | 53.6 | 62.5 | 32.1 | 69.7 | 80.6 | 34.2 | 54.1 | 67.1 | 66.8 | 79.2 | 42.3 | 63.8 | 60.2 |
| DreamOmni2 | ✓ | 43.3 | 74.4 | 85.0 | 51.2 | 63.5 | 34.3 | 81.7 | 88.1 | 35.9 | 60.0 | 50.6 | 64.9 | 81.9 | 35.3 | 58.2 | 60.6 |
| FLUX.2-dev | ✓ | 42.6 | 63.3 | 78.4 | 53.3 | 59.4 | 35.4 | 75.0 | 85.6 | 37.6 | 58.4 | 73.6 | 70.7 | 82.1 | 43.6 | 67.5 | 61.8 |
| Nano Banana | ✓ | 70.6 | 85.7 | 86.8 | 75.2 | 79.6 | 63.4 | 84.9 | 91.4 | 61.5 | 75.3 | 75.3 | 73.8 | 87.3 | 44.3 | 70.2 | 75.0 |
| Seedream 4.0 | ✓ | 70.8 | 78.1 | 86.6 | 74.6 | 77.5 | 63.7 | 80.1 | 90.6 | 64.2 | 74.6 | 82.2 | 77.8 | 86.9 | 47.0 | 73.5 | 75.2 |
| GPT-image-1 | ✓ | 78.5 | 85.8 | 88.0 | 81.2 | 83.3 | 62.9 | 82.9 | 93.0 | 60.8 | 74.9 | 84.4 | 76.2 | 89.2 | 48.4 | 74.6 | 77.6 |
| Nano Banana Pro | ✓ | 85.4 | 88.6 | 83.9 | 91.4 | 87.3 | 76.0 | 89.1 | 92.3 | 75.8 | 83.3 | 86.6 | 79.5 | 88.8 | 51.5 | 76.6 | 82.4 |
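The AVG columns appear to be plain unweighted means: each task AVG averages its four metric scores, and Overall AVG averages the three task AVGs. The sketch below recomputes them from one leaderboard row under that assumption. The helper names (`parse_row`, `task_avgs`, `overall_avg`) are hypothetical and not part of the WiseEdit evaluation code. Since the displayed metric scores are themselves rounded to one decimal, recomputed averages can differ from the published ones by up to about 0.1.

```python
from statistics import mean

# Metric columns per task, in leaderboard order (assumed from the table header).
# Awareness and Interpretation report KF; Imagination reports CF instead.
TASK_METRICS = {
    "awareness": ("IF", "DP", "VQ", "KF"),
    "interpretation": ("IF", "DP", "VQ", "KF"),
    "imagination": ("IF", "DP", "VQ", "CF"),
}


def parse_row(values):
    """Split the 12 metric scores of one row into per-task {metric: score} dicts."""
    if len(values) != 12:
        raise ValueError("expected 12 metric scores (3 tasks x 4 metrics)")
    scores, i = {}, 0
    for task, metrics in TASK_METRICS.items():  # dicts keep insertion order
        scores[task] = dict(zip(metrics, values[i:i + 4]))
        i += 4
    return scores


def task_avgs(scores):
    """Unweighted mean of the four metrics within each task."""
    return {task: mean(m.values()) for task, m in scores.items()}


def overall_avg(scores):
    """Unweighted mean of the three task averages."""
    return mean(task_avgs(scores).values())
```

For the InstructPix2Pix row, `overall_avg(parse_row([...]))` comes out at roughly 34.36, in line with the published 34.4.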


Leaderboard on WiseEdit (Chinese)

| Model | Multi-Img | Awareness IF↑ | DP↑ | VQ↑ | KF↑ | AVG | Interpretation IF↑ | DP↑ | VQ↑ | KF↑ | AVG | Imagination IF↑ | DP↑ | VQ↑ | CF↑ | AVG | Overall AVG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MagicBrush | ✗ | 15.0 | 43.8 | 52.9 | 17.8 | 32.4 | 10.1 | 40.5 | 59.4 | 13.5 | 30.9 | 5.4 | 39.6 | 49.2 | 12.7 | 26.8 | 30.0 |
| OmniGen | ✓ | 16.3 | 42.6 | 60.1 | 22.6 | 35.4 | 13.4 | 30.9 | 49.2 | 14.8 | 27.1 | 15.4 | 27.6 | 51.3 | 32.3 | 31.7 | 31.3 |
| ICEdit | ✗ | 12.9 | 29.1 | 63.6 | 17.4 | 30.8 | 11.5 | 43.5 | 81.1 | 17.0 | 38.3 | 5.1 | 37.4 | 56.5 | 16.7 | 28.9 | 32.7 |
| FLUX.1 Kontext Dev | ✗ | 16.5 | 48.4 | 52.1 | 19.1 | 34.0 | 16.8 | 58.4 | 66.1 | 22.2 | 40.9 | 9.2 | 41.9 | 43.3 | 10.6 | 26.3 | 33.7 |
| AnyEdit | ✓ | 17.5 | 55.3 | 55.7 | 19.9 | 37.1 | 11.4 | 51.8 | 58.0 | 16.0 | 34.3 | 7.3 | 47.4 | 52.3 | 15.4 | 30.6 | 34.0 |
| InstructPix2Pix | ✗ | 14.2 | 51.0 | 58.6 | 18.1 | 35.5 | 13.0 | 55.7 | 65.2 | 16.0 | 37.5 | 3.7 | 52.0 | 53.2 | 9.0 | 29.5 | 34.1 |
| Janus-4o | ✗ | 31.1 | 38.6 | 46.3 | 34.1 | 37.5 | 23.9 | 45.9 | 55.7 | 25.8 | 37.8 | 25.4 | 36.9 | 41.8 | 22.1 | 31.5 | 35.6 |
| UniWorld-V1 | ✓ | 18.4 | 49.0 | 60.0 | 26.2 | 38.4 | 13.3 | 48.8 | 59.7 | 16.9 | 34.7 | 17.9 | 54.8 | 68.9 | 18.0 | 39.9 | 37.7 |
| HiDream-E1 | ✗ | 28.2 | 37.6 | 51.4 | 32.4 | 37.4 | 25.4 | 47.1 | 63.6 | 29.7 | 41.5 | 32.5 | 39.2 | 47.5 | 27.4 | 36.6 | 38.5 |
| UltraEdit | ✗ | 16.9 | 58.5 | 62.1 | 21.2 | 39.7 | 17.4 | 74.2 | 78.9 | 19.2 | 47.4 | 9.3 | 42.3 | 51.8 | 14.8 | 29.5 | 38.9 |
| OmniGen2 | ✓ | 35.1 | 57.9 | 72.4 | 41.0 | 51.6 | 19.1 | 57.1 | 64.8 | 23.0 | 41.0 | 45.5 | 64.0 | 72.0 | 33.8 | 53.8 | 48.8 |
| Step1X-Edit-v1p2 | ✗ | 38.6 | 55.6 | 59.5 | 42.0 | 48.9 | 37.0 | 77.5 | 76.8 | 35.9 | 56.8 | 45.7 | 48.3 | 51.3 | 27.0 | 43.1 | 49.6 |
| DreamOmni2 | ✓ | 31.9 | 78.7 | 85.4 | 38.4 | 58.6 | 24.0 | 80.1 | 86.2 | 27.5 | 54.4 | 38.5 | 69.4 | 84.8 | 27.2 | 55.0 | 56.0 |
| Echo-4o | ✓ | 47.9 | 59.9 | 73.1 | 55.0 | 59.0 | 31.9 | 74.5 | 77.4 | 32.6 | 54.1 | 62.8 | 64.2 | 75.1 | 41.5 | 60.9 | 58.0 |
| Bagel | ✓ | 48.5 | 71.3 | 76.8 | 52.1 | 62.2 | 36.5 | 68.7 | 75.0 | 38.5 | 54.7 | 63.5 | 68.3 | 75.3 | 39.7 | 61.7 | 59.5 |
| Uni-CoT | ✓ | 46.2 | 70.0 | 80.7 | 53.6 | 62.6 | 37.4 | 71.5 | 79.2 | 36.6 | 56.2 | 65.5 | 65.1 | 79.7 | 41.6 | 63.0 | 60.6 |
| Qwen-Image-Edit | ✓ | 45.0 | 67.3 | 79.9 | 52.9 | 61.3 | 35.8 | 74.0 | 80.7 | 36.1 | 56.6 | 66.3 | 67.2 | 80.0 | 41.7 | 63.8 | 60.6 |
| FLUX.2-dev | ✓ | 43.0 | 60.6 | 79.5 | 51.4 | 58.6 | 34.3 | 73.7 | 83.1 | 36.0 | 56.8 | 75.5 | 74.4 | 82.8 | 42.6 | 68.8 | 61.4 |
| Seedream 4.0 | ✓ | 69.1 | 79.0 | 84.4 | 72.0 | 76.1 | 62.2 | 80.3 | 89.9 | 59.9 | 73.1 | 79.8 | 79.7 | 86.5 | 46.4 | 73.1 | 74.1 |
| Nano Banana | ✓ | 71.8 | 83.8 | 86.5 | 70.7 | 78.2 | 67.9 | 84.5 | 91.4 | 63.7 | 76.9 | 76.0 | 75.8 | 87.3 | 43.7 | 70.7 | 75.3 |
| GPT-image-1 | ✓ | 77.0 | 80.7 | 86.6 | 80.7 | 81.2 | 61.4 | 82.6 | 93.8 | 61.2 | 74.8 | 78.8 | 73.3 | 89.6 | 48.3 | 72.5 | 76.2 |
| Nano Banana Pro | ✓ | 84.6 | 91.8 | 83.1 | 87.9 | 86.9 | 74.2 | 83.9 | 91.3 | 74.6 | 81.0 | 85.5 | 77.6 | 88.4 | 51.1 | 75.6 | 81.2 |

Leaderboard on WiseEdit-Complex

We exclude models unable to handle multi-image inputs.

| Model | Organization | English IF↑ | DP↑ | VQ↑ | KF↑ | CF↑ | AVG | Chinese IF↑ | DP↑ | VQ↑ | KF↑ | CF↑ | AVG | Overall AVG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AnyEdit | ZJU | 2.5 | 5.6 | 20.6 | 3.3 | 11.7 | 8.7 | 1.3 | 5.1 | 22.2 | 2.9 | 9.3 | 8.2 | 8.4 |
| UniWorld-V1 | PKU | 18.1 | 32.1 | 55.3 | 22.8 | 28.6 | 31.4 | 8.8 | 23.8 | 64.6 | 12.4 | 15.4 | 25.0 | 28.2 |
| OmniGen | BAAI | 23.5 | 25.7 | 41.2 | 31.5 | 48.9 | 34.2 | 4.4 | 15.4 | 50.3 | 15.1 | 32.2 | 23.5 | 28.8 |
| OmniGen2 | BAAI | 32.2 | 48.3 | 70.5 | 47.1 | 42.9 | 48.2 | 28.3 | 49.8 | 73.2 | 45.4 | 43.9 | 48.1 | 48.2 |
| DreamOmni2 | CUHK | 35.6 | 63.1 | 79.6 | 46.8 | 41.8 | 53.4 | 37.4 | 52.4 | 79.9 | 50.4 | 36.3 | 51.2 | 50.7 |
| Echo-4o | Shanghai AI Lab | 44.3 | 48.1 | 65.7 | 52.5 | 50.3 | 52.2 | 42.6 | 57.1 | 70.0 | 54.6 | 51.7 | 55.2 | 52.1 |
| Qwen-Image-Edit | Qwen | 38.7 | 58.6 | 75.8 | 48.5 | 47.1 | 53.8 | 35.3 | 55.0 | 78.1 | 49.6 | 48.9 | 53.4 | 53.6 |
| Bagel | ByteDance | 44.6 | 62.8 | 70.8 | 54.3 | 44.2 | 55.3 | 40.3 | 59.0 | 74.3 | 56.2 | 47.6 | 55.5 | 53.8 |
| Uni-CoT | SAIS | 36.3 | 60.8 | 70.1 | 57.8 | 50.3 | 55.1 | 37.2 | 56.0 | 78.2 | 58.9 | 49.7 | 56.0 | 53.9 |
| FLUX.2-dev | Black Forest Labs | 42.3 | 68.5 | 75.6 | 56.5 | 49.1 | 58.4 | 46.3 | 73.4 | 80.3 | 59.9 | 52.8 | 62.6 | 60.5 |
| Nano Banana | Google | 53.8 | 75.2 | 82.7 | 82.4 | 53.7 | 69.6 | 53.3 | 71.3 | 79.9 | 77.4 | 51.1 | 66.6 | 68.1 |
| GPT-image-1 | OpenAI | 58.7 | 75.9 | 87.6 | 77.9 | 54.8 | 71.0 | 59.5 | 76.6 | 88.2 | 78.8 | 54.1 | 71.4 | 69.6 |
| Seedream 4.0 | ByteDance | 68.0 | 78.1 | 80.4 | 90.5 | 53.9 | 74.2 | 60.1 | 63.6 | 81.8 | 88.6 | 56.3 | 70.1 | 70.5 |
| Nano Banana Pro | Google | 68.1 | 78.1 | 86.7 | 88.1 | 56.6 | 75.5 | 77.7 | 83.8 | 84.3 | 90.7 | 57.0 | 78.7 | 77.1 |
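On WiseEdit-Complex, each language version is scored on all five metrics, including both KF and CF, and the per-version AVG columns are consistent with an unweighted mean of those five scores. The following is a minimal sketch under that assumption; `version_avgs` is a hypothetical helper, not part of the released evaluation code, and it covers only the per-version AVG columns.

```python
from statistics import mean

# Metric order used by the WiseEdit-Complex table within each language version.
METRICS = ("IF", "DP", "VQ", "KF", "CF")


def version_avgs(row):
    """Average the five metrics separately for the English and Chinese halves.

    `row` holds 10 scores: English IF..CF followed by Chinese IF..CF.
    """
    n = len(METRICS)
    if len(row) != 2 * n:
        raise ValueError(f"expected {2 * n} scores, got {len(row)}")
    english = dict(zip(METRICS, row[:n]))
    chinese = dict(zip(METRICS, row[n:]))
    return {"english": mean(english.values()), "chinese": mean(chinese.values())}
```

Applied to the Nano Banana Pro row, this yields about 75.52 for English and 78.7 for Chinese, matching the listed 75.5 and 78.7.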

Benchmark Examples

[Figure: WiseEdit benchmark examples]

BibTeX

@article{pan2025wiseedit,
        title={WiseEdit: Benchmarking Cognition-and Creativity-Informed Image Editing},
        author={Pan, Kaihang and Chen, Weile and Qiu, Haiyi and Yu, Qifan and Bu, Wendong and Wang, Zehan and Zhu, Yun and Li, Juncheng and Tang, Siliang},
        journal={arXiv preprint arXiv:2512.00387},
        year={2025}
}

This website is adapted from the project page of Kris-Bench.