WiseEdit: Benchmarking Cognition- and Creativity-Informed Image Editing

¹Zhejiang University, ²Shanghai Artificial Intelligence Laboratory
*Equal Contribution
arXiv Code

🤗 WiseEdit Benchmark (Coming Soon)

🤗 Results Gallery (Coming Soon)

Abstract

Recent image editing models exhibit increasingly intelligent capabilities that enable cognition- and creativity-informed image editing. Yet existing benchmarks are too narrow in scope to assess these advanced abilities holistically. To address this, we introduce WiseEdit, a knowledge-intensive benchmark for comprehensive evaluation of cognition- and creativity-informed image editing, featuring both task depth and knowledge breadth. Drawing an analogy to human cognitive creation, WiseEdit decomposes image editing into three cascaded steps (Awareness, Interpretation, and Imagination), each corresponding to a task that challenges models at that specific step. It also includes complex tasks in which none of the three steps can be completed easily. Furthermore, WiseEdit incorporates three fundamental types of knowledge: Declarative, Procedural, and Metacognitive. In total, WiseEdit comprises 1,220 test cases, objectively revealing the limitations of SoTA image editing models in knowledge-based cognitive reasoning and creative composition.

WiseEdit Intro Image

Leaderboard on WiseEdit (English)

| Model | Multi-Img | Awareness IF↑ | Awareness DP↑ | Awareness VQ↑ | Awareness KF↑ | Awareness AVG | Interpretation IF↑ | Interpretation DP↑ | Interpretation VQ↑ | Interpretation KF↑ | Interpretation AVG | Imagination IF↑ | Imagination DP↑ | Imagination VQ↑ | Imagination CF↑ | Imagination AVG | Overall AVG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| InstructPix2Pix | ✗ | 24.6 | 33.7 | 50.6 | 26.4 | 33.8 | 20.7 | 50.3 | 66.9 | 23.6 | 40.4 | 17.0 | 29.9 | 41.2 | 27.4 | 28.8 | 34.4 |
| MagicBrush | ✗ | 27.2 | 43.4 | 53.3 | 27.1 | 37.8 | 16.8 | 50.3 | 63.2 | 22.2 | 38.1 | 18.0 | 36.9 | 44.8 | 22.3 | 30.5 | 35.5 |
| OmniGen | ✓ | 35.0 | 42.0 | 46.7 | 37.4 | 40.3 | 19.0 | 34.8 | 40.3 | 21.5 | 28.9 | 42.2 | 35.1 | 46.0 | 38.7 | 40.5 | 36.6 |
| Janus-4o | ✗ | 34.7 | 37.0 | 45.9 | 36.2 | 38.5 | 27.2 | 43.8 | 53.6 | 28.2 | 38.2 | 28.2 | 37.6 | 42.0 | 25.5 | 33.3 | 36.7 |
| AnyEdit | ✗ | 25.0 | 54.6 | 61.3 | 26.3 | 41.8 | 15.9 | 61.2 | 62.0 | 20.2 | 39.8 | 9.1 | 49.7 | 50.9 | 16.5 | 31.5 | 37.7 |
| UltraEdit | ✗ | 26.5 | 42.5 | 53.1 | 33.9 | 39.0 | 24.3 | 61.7 | 73.6 | 26.7 | 46.6 | 20.7 | 31.7 | 45.8 | 27.5 | 31.5 | 39.0 |
| ICEdit | ✗ | 26.1 | 42.2 | 61.2 | 31.8 | 40.4 | 21.4 | 48.3 | 81.5 | 24.9 | 44.0 | 21.5 | 40.6 | 54.0 | 25.0 | 35.3 | 39.9 |
| UniWorld-V1 | ✓ | 31.5 | 48.9 | 58.8 | 38.6 | 44.5 | 18.1 | 44.5 | 58.1 | 22.5 | 35.8 | 30.3 | 50.3 | 64.2 | 27.5 | 43.1 | 41.1 |
| HiDream-E1 | ✗ | 29.7 | 41.2 | 56.3 | 32.0 | 39.8 | 26.7 | 53.6 | 68.4 | 29.6 | 44.6 | 39.6 | 40.1 | 49.9 | 29.6 | 39.8 | 41.4 |
| FLUX.1 Kontext Dev | ✗ | 31.4 | 52.0 | 55.0 | 35.5 | 43.5 | 27.5 | 62.2 | 69.6 | 29.0 | 47.1 | 39.1 | 47.1 | 43.4 | 27.1 | 39.2 | 43.2 |
| OmniGen2 | ✓ | 35.0 | 64.0 | 75.4 | 41.3 | 53.9 | 18.9 | 56.9 | 64.9 | 23.5 | 41.1 | 42.0 | 64.4 | 74.6 | 31.8 | 53.2 | 49.4 |
| Step1X-Edit-v1p2 | ✗ | 39.8 | 53.5 | 61.3 | 44.4 | 49.7 | 35.7 | 73.0 | 75.2 | 38.2 | 55.5 | 44.7 | 49.4 | 50.3 | 28.4 | 43.2 | 49.5 |
| Echo-4o | ✓ | 47.6 | 63.0 | 75.4 | 51.7 | 59.4 | 30.8 | 71.0 | 80.4 | 32.9 | 53.8 | 63.4 | 62.4 | 73.7 | 41.2 | 60.2 | 57.8 |
| Bagel | ✓ | 46.2 | 71.0 | 75.8 | 50.8 | 61.0 | 38.6 | 72.1 | 78.8 | 39.5 | 57.3 | 62.8 | 68.5 | 74.5 | 40.7 | 61.6 | 60.0 |
| Uni-CoT | ✓ | 46.0 | 69.1 | 77.8 | 51.6 | 61.1 | 36.9 | 70.1 | 76.3 | 38.6 | 55.5 | 67.6 | 64.3 | 79.6 | 42.9 | 63.6 | 60.1 |
| Qwen-Image-Edit | ✓ | 48.1 | 69.0 | 79.5 | 53.6 | 62.5 | 32.1 | 69.7 | 80.6 | 34.2 | 54.1 | 67.1 | 66.8 | 79.2 | 42.3 | 63.8 | 60.2 |
| DreamOmni2 | ✓ | 43.3 | 74.4 | 85.0 | 51.2 | 63.5 | 34.3 | 81.7 | 88.1 | 35.9 | 60.0 | 50.6 | 64.9 | 81.9 | 35.3 | 58.2 | 60.6 |
| FLUX.2-dev | ✓ | 42.6 | 63.3 | 78.4 | 53.3 | 59.4 | 35.4 | 75.0 | 85.6 | 37.6 | 58.4 | 73.6 | 70.7 | 82.1 | 43.6 | 67.5 | 61.8 |
| Nano Banana | ✓ | 70.6 | 85.7 | 86.8 | 75.2 | 79.6 | 63.4 | 84.9 | 91.4 | 61.5 | 75.3 | 75.3 | 73.8 | 87.3 | 44.3 | 70.2 | 75.0 |
| Seedream 4.0 | ✓ | 70.8 | 78.1 | 86.6 | 74.6 | 77.5 | 63.7 | 80.1 | 90.6 | 64.2 | 74.6 | 82.2 | 77.8 | 86.9 | 47.0 | 73.5 | 75.2 |
| GPT-image-1 | ✓ | 78.5 | 85.8 | 88.0 | 81.2 | 83.3 | 62.9 | 82.9 | 93.0 | 60.8 | 74.9 | 84.4 | 76.2 | 89.2 | 48.4 | 74.6 | 77.6 |
| Nano Banana Pro | ✓ | 85.4 | 88.6 | 83.9 | 91.4 | 87.3 | 76.0 | 89.1 | 92.3 | 75.8 | 83.3 | 86.6 | 79.5 | 88.8 | 51.5 | 76.6 | 82.4 |
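
As a reading aid, the AVG columns in the leaderboard above appear consistent with an unweighted mean: each task AVG matches the mean of that task's four metric scores, and the Overall AVG matches the mean of the three task averages. This is an assumption inferred from the listed numbers, not a formula stated on this page. The minimal Python sketch below checks it on the Nano Banana Pro row; the helper names `task_avg` and `overall_avg` are hypothetical.

```python
from statistics import mean

# Scores copied from the Nano Banana Pro row of the English leaderboard.
# Each task lists its four metric scores (IF, DP, VQ, then KF or CF).
scores = {
    "Awareness": [85.4, 88.6, 83.9, 91.4],
    "Interpretation": [76.0, 89.1, 92.3, 75.8],
    "Imagination": [86.6, 79.5, 88.8, 51.5],
}


def task_avg(metrics):
    """Assumed rule: a task AVG is the unweighted mean of its four metrics."""
    return mean(metrics)


def overall_avg(task_scores):
    """Assumed rule: Overall AVG is the mean of the unrounded task averages."""
    return mean(task_avg(m) for m in task_scores.values())


task_avgs = {task: round(task_avg(m), 1) for task, m in scores.items()}
overall = round(overall_avg(scores), 1)

print(task_avgs)
print(overall)
```

Because the published metric scores are themselves rounded to one decimal, recomputing an average from them can differ from the listed AVG by about 0.1 for some rows (for example, the Imagination AVG of InstructPix2Pix).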


Leaderboard on WiseEdit (Chinese)

| Model | Multi-Img | Awareness IF↑ | Awareness DP↑ | Awareness VQ↑ | Awareness KF↑ | Awareness AVG | Interpretation IF↑ | Interpretation DP↑ | Interpretation VQ↑ | Interpretation KF↑ | Interpretation AVG | Imagination IF↑ | Imagination DP↑ | Imagination VQ↑ | Imagination CF↑ | Imagination AVG | Overall AVG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MagicBrush | ✗ | 15.0 | 43.8 | 52.9 | 17.8 | 32.4 | 10.1 | 40.5 | 59.4 | 13.5 | 30.9 | 5.4 | 39.6 | 49.2 | 12.7 | 26.8 | 30.0 |
| OmniGen | ✓ | 16.3 | 42.6 | 60.1 | 22.6 | 35.4 | 13.4 | 30.9 | 49.2 | 14.8 | 27.1 | 15.4 | 27.6 | 51.3 | 32.3 | 31.7 | 31.3 |
| ICEdit | ✗ | 12.9 | 29.1 | 63.6 | 17.4 | 30.8 | 11.5 | 43.5 | 81.1 | 17.0 | 38.3 | 5.1 | 37.4 | 56.5 | 16.7 | 28.9 | 32.7 |
| FLUX.1 Kontext Dev | ✗ | 16.5 | 48.4 | 52.1 | 19.1 | 34.0 | 16.8 | 58.4 | 66.1 | 22.2 | 40.9 | 9.2 | 41.9 | 43.3 | 10.6 | 26.3 | 33.7 |
| AnyEdit | ✓ | 17.5 | 55.3 | 55.7 | 19.9 | 37.1 | 11.4 | 51.8 | 58.0 | 16.0 | 34.3 | 7.3 | 47.4 | 52.3 | 15.4 | 30.6 | 34.0 |
| InstructPix2Pix | ✗ | 14.2 | 51.0 | 58.6 | 18.1 | 35.5 | 13.0 | 55.7 | 65.2 | 16.0 | 37.5 | 3.7 | 52.0 | 53.2 | 9.0 | 29.5 | 34.1 |
| Janus-4o | ✗ | 31.1 | 38.6 | 46.3 | 34.1 | 37.5 | 23.9 | 45.9 | 55.7 | 25.8 | 37.8 | 25.4 | 36.9 | 41.8 | 22.1 | 31.5 | 35.6 |
| UniWorld-V1 | ✓ | 18.4 | 49.0 | 60.0 | 26.2 | 38.4 | 13.3 | 48.8 | 59.7 | 16.9 | 34.7 | 17.9 | 54.8 | 68.9 | 18.0 | 39.9 | 37.7 |
| HiDream-E1 | ✗ | 28.2 | 37.6 | 51.4 | 32.4 | 37.4 | 25.4 | 47.1 | 63.6 | 29.7 | 41.5 | 32.5 | 39.2 | 47.5 | 27.4 | 36.6 | 38.5 |
| UltraEdit | ✗ | 16.9 | 58.5 | 62.1 | 21.2 | 39.7 | 17.4 | 74.2 | 78.9 | 19.2 | 47.4 | 9.3 | 42.3 | 51.8 | 14.8 | 29.5 | 38.9 |
| OmniGen2 | ✓ | 35.1 | 57.9 | 72.4 | 41.0 | 51.6 | 19.1 | 57.1 | 64.8 | 23.0 | 41.0 | 45.5 | 64.0 | 72.0 | 33.8 | 53.8 | 48.8 |
| Step1X-Edit-v1p2 | ✗ | 38.6 | 55.6 | 59.5 | 42.0 | 48.9 | 37.0 | 77.5 | 76.8 | 35.9 | 56.8 | 45.7 | 48.3 | 51.3 | 27.0 | 43.1 | 49.6 |
| DreamOmni2 | ✓ | 31.9 | 78.7 | 85.4 | 38.4 | 58.6 | 24.0 | 80.1 | 86.2 | 27.5 | 54.4 | 38.5 | 69.4 | 84.8 | 27.2 | 55.0 | 56.0 |
| Echo-4o | ✓ | 47.9 | 59.9 | 73.1 | 55.0 | 59.0 | 31.9 | 74.5 | 77.4 | 32.6 | 54.1 | 62.8 | 64.2 | 75.1 | 41.5 | 60.9 | 58.0 |
| Bagel | ✓ | 48.5 | 71.3 | 76.8 | 52.1 | 62.2 | 36.5 | 68.7 | 75.0 | 38.5 | 54.7 | 63.5 | 68.3 | 75.3 | 39.7 | 61.7 | 59.5 |
| Uni-CoT | ✓ | 46.2 | 70.0 | 80.7 | 53.6 | 62.6 | 37.4 | 71.5 | 79.2 | 36.6 | 56.2 | 65.5 | 65.1 | 79.7 | 41.6 | 63.0 | 60.6 |
| Qwen-Image-Edit | ✓ | 45.0 | 67.3 | 79.9 | 52.9 | 61.3 | 35.8 | 74.0 | 80.7 | 36.1 | 56.6 | 66.3 | 67.2 | 80.0 | 41.7 | 63.8 | 60.6 |
| FLUX.2-dev | ✓ | 43.0 | 60.6 | 79.5 | 51.4 | 58.6 | 34.3 | 73.7 | 83.1 | 36.0 | 56.8 | 75.5 | 74.4 | 82.8 | 42.6 | 68.8 | 61.4 |
| Seedream 4.0 | ✓ | 69.1 | 79.0 | 84.4 | 72.0 | 76.1 | 62.2 | 80.3 | 89.9 | 59.9 | 73.1 | 79.8 | 79.7 | 86.5 | 46.4 | 73.1 | 74.1 |
| Nano Banana | ✓ | 71.8 | 83.8 | 86.5 | 70.7 | 78.2 | 67.9 | 84.5 | 91.4 | 63.7 | 76.9 | 76.0 | 75.8 | 87.3 | 43.7 | 70.7 | 75.3 |
| GPT-image-1 | ✓ | 77.0 | 80.7 | 86.6 | 80.7 | 81.2 | 61.4 | 82.6 | 93.8 | 61.2 | 74.8 | 78.8 | 73.3 | 89.6 | 48.3 | 72.5 | 76.2 |
| Nano Banana Pro | ✓ | 84.6 | 91.8 | 83.1 | 87.9 | 86.9 | 74.2 | 83.9 | 91.3 | 74.6 | 81.0 | 85.5 | 77.6 | 88.4 | 51.1 | 75.6 | 81.2 |

Leaderboard on WiseEdit-Complex

We exclude models unable to handle multi-image inputs.

| Model (Organization) | English IF↑ | English DP↑ | English VQ↑ | English KF↑ | English CF↑ | English AVG | Chinese IF↑ | Chinese DP↑ | Chinese VQ↑ | Chinese KF↑ | Chinese CF↑ | Chinese AVG | Overall AVG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AnyEdit (ZJU) | 2.5 | 5.6 | 20.6 | 3.3 | 11.7 | 8.7 | 1.3 | 5.1 | 22.2 | 2.9 | 9.3 | 8.2 | 8.4 |
| UniWorld-V1 (PKU) | 18.1 | 32.1 | 55.3 | 22.8 | 28.6 | 31.4 | 8.8 | 23.8 | 64.6 | 12.4 | 15.4 | 25.0 | 28.2 |
| OmniGen (BAAI) | 23.5 | 25.7 | 41.2 | 31.5 | 48.9 | 34.2 | 4.4 | 15.4 | 50.3 | 15.1 | 32.2 | 23.5 | 28.8 |
| OmniGen2 (BAAI) | 32.2 | 48.3 | 70.5 | 47.1 | 42.9 | 48.2 | 28.3 | 49.8 | 73.2 | 45.4 | 43.9 | 48.1 | 48.2 |
| DreamOmni2 (CUHK) | 35.6 | 63.1 | 79.6 | 46.8 | 41.8 | 53.4 | 37.4 | 52.4 | 79.9 | 50.4 | 36.3 | 51.2 | 50.7 |
| Echo-4o (Shanghai AI Lab) | 44.3 | 48.1 | 65.7 | 52.5 | 50.3 | 52.2 | 42.6 | 57.1 | 70.0 | 54.6 | 51.7 | 55.2 | 52.1 |
| Qwen-Image-Edit (Qwen) | 38.7 | 58.6 | 75.8 | 48.5 | 47.1 | 53.8 | 35.3 | 55.0 | 78.1 | 49.6 | 48.9 | 53.4 | 53.6 |
| Bagel (ByteDance) | 44.6 | 62.8 | 70.8 | 54.3 | 44.2 | 55.3 | 40.3 | 59.0 | 74.3 | 56.2 | 47.6 | 55.5 | 53.8 |
| Uni-CoT (SAIS) | 36.3 | 60.8 | 70.1 | 57.8 | 50.3 | 55.1 | 37.2 | 56.0 | 78.2 | 58.9 | 49.7 | 56.0 | 53.9 |
| FLUX.2-dev (Black Forest Labs) | 42.3 | 68.5 | 75.6 | 56.5 | 49.1 | 58.4 | 46.3 | 73.4 | 80.3 | 59.9 | 52.8 | 62.6 | 60.5 |
| Nano Banana (Google) | 53.8 | 75.2 | 82.7 | 82.4 | 53.7 | 69.6 | 53.3 | 71.3 | 79.9 | 77.4 | 51.1 | 66.6 | 68.1 |
| GPT-image-1 (OpenAI) | 58.7 | 75.9 | 87.6 | 77.9 | 54.8 | 71.0 | 59.5 | 76.6 | 88.2 | 78.8 | 54.1 | 71.4 | 69.6 |
| Seedream 4.0 (ByteDance) | 68.0 | 78.1 | 80.4 | 90.5 | 53.9 | 74.2 | 60.1 | 63.6 | 81.8 | 88.6 | 56.3 | 70.1 | 70.5 |
| Nano Banana Pro (Google) | 68.1 | 78.1 | 86.7 | 88.1 | 56.6 | 75.5 | 77.7 | 83.8 | 84.3 | 90.7 | 57.0 | 78.7 | 77.1 |

Benchmark Examples

First research result visualization

BibTeX