WiseEdit: Benchmarking Cognition- and Creativity-Informed Image Editing
arXiv | Code | WiseEdit Benchmark (Coming Soon) | Results Gallery (Coming Soon)
Abstract
Recent image editing models exhibit increasingly intelligent capabilities that enable cognition- and creativity-informed image editing. Yet existing benchmarks are too narrow in scope to assess these advanced abilities holistically. To address this, we introduce WiseEdit, a knowledge-intensive benchmark for the comprehensive evaluation of cognition- and creativity-informed image editing, featuring both task depth and knowledge breadth. Drawing an analogy to human cognitive creation, WiseEdit decomposes image editing into three cascaded steps (Awareness, Interpretation, and Imagination), each corresponding to a task that challenges models at that specific step. It also includes complex tasks, in which none of the three steps can be completed easily. Furthermore, WiseEdit incorporates three fundamental types of knowledge: Declarative, Procedural, and Metacognitive. In total, WiseEdit comprises 1,220 test cases, which objectively reveal the limitations of SoTA image editing models in knowledge-based cognitive reasoning and creative composition.
Leaderboard on WiseEdit (English)
| Model | Multi Img | Awareness IF↑ | Awareness DP↑ | Awareness VQ↑ | Awareness KF↑ | Awareness AVG | Interpretation IF↑ | Interpretation DP↑ | Interpretation VQ↑ | Interpretation KF↑ | Interpretation AVG | Imagination IF↑ | Imagination DP↑ | Imagination VQ↑ | Imagination CF↑ | Imagination AVG | Overall AVG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| InstructPix2Pix | β | 24.6 | 33.7 | 50.6 | 26.4 | 33.8 | 20.7 | 50.3 | 66.9 | 23.6 | 40.4 | 17.0 | 29.9 | 41.2 | 27.4 | 28.8 | 34.4 |
| MagicBrush | β | 27.2 | 43.4 | 53.3 | 27.1 | 37.8 | 16.8 | 50.3 | 63.2 | 22.2 | 38.1 | 18.0 | 36.9 | 44.8 | 22.3 | 30.5 | 35.5 |
| OmniGen | β | 35.0 | 42.0 | 46.7 | 37.4 | 40.3 | 19.0 | 34.8 | 40.3 | 21.5 | 28.9 | 42.2 | 35.1 | 46.0 | 38.7 | 40.5 | 36.6 |
| Janus-4o | β | 34.7 | 37.0 | 45.9 | 36.2 | 38.5 | 27.2 | 43.8 | 53.6 | 28.2 | 38.2 | 28.2 | 37.6 | 42.0 | 25.5 | 33.3 | 36.7 |
| AnyEdit | β | 25.0 | 54.6 | 61.3 | 26.3 | 41.8 | 15.9 | 61.2 | 62.0 | 20.2 | 39.8 | 9.1 | 49.7 | 50.9 | 16.5 | 31.5 | 37.7 |
| UltraEdit | β | 26.5 | 42.5 | 53.1 | 33.9 | 39.0 | 24.3 | 61.7 | 73.6 | 26.7 | 46.6 | 20.7 | 31.7 | 45.8 | 27.5 | 31.5 | 39.0 |
| ICEdit | β | 26.1 | 42.2 | 61.2 | 31.8 | 40.4 | 21.4 | 48.3 | 81.5 | 24.9 | 44.0 | 21.5 | 40.6 | 54.0 | 25.0 | 35.3 | 39.9 |
| UniWorld-V1 | β | 31.5 | 48.9 | 58.8 | 38.6 | 44.5 | 18.1 | 44.5 | 58.1 | 22.5 | 35.8 | 30.3 | 50.3 | 64.2 | 27.5 | 43.1 | 41.1 |
| HiDream-E1 | β | 29.7 | 41.2 | 56.3 | 32.0 | 39.8 | 26.7 | 53.6 | 68.4 | 29.6 | 44.6 | 39.6 | 40.1 | 49.9 | 29.6 | 39.8 | 41.4 |
| FLUX.1 Kontext Dev | β | 31.4 | 52.0 | 55.0 | 35.5 | 43.5 | 27.5 | 62.2 | 69.6 | 29.0 | 47.1 | 39.1 | 47.1 | 43.4 | 27.1 | 39.2 | 43.2 |
| OmniGen2 | β | 35.0 | 64.0 | 75.4 | 41.3 | 53.9 | 18.9 | 56.9 | 64.9 | 23.5 | 41.1 | 42.0 | 64.4 | 74.6 | 31.8 | 53.2 | 49.4 |
| Step1X-Edit-v1p2 | β | 39.8 | 53.5 | 61.3 | 44.4 | 49.7 | 35.7 | 73.0 | 75.2 | 38.2 | 55.5 | 44.7 | 49.4 | 50.3 | 28.4 | 43.2 | 49.5 |
| Echo-4o | β | 47.6 | 63.0 | 75.4 | 51.7 | 59.4 | 30.8 | 71.0 | 80.4 | 32.9 | 53.8 | 63.4 | 62.4 | 73.7 | 41.2 | 60.2 | 57.8 |
| Bagel | β | 46.2 | 71.0 | 75.8 | 50.8 | 61.0 | 38.6 | 72.1 | 78.8 | 39.5 | 57.3 | 62.8 | 68.5 | 74.5 | 40.7 | 61.6 | 60.0 |
| Uni-CoT | β | 46.0 | 69.1 | 77.8 | 51.6 | 61.1 | 36.9 | 70.1 | 76.3 | 38.6 | 55.5 | 67.6 | 64.3 | 79.6 | 42.9 | 63.6 | 60.1 |
| Qwen-Image-Edit | β | 48.1 | 69.0 | 79.5 | 53.6 | 62.5 | 32.1 | 69.7 | 80.6 | 34.2 | 54.1 | 67.1 | 66.8 | 79.2 | 42.3 | 63.8 | 60.2 |
| DreamOmni2 | β | 43.3 | 74.4 | 85.0 | 51.2 | 63.5 | 34.3 | 81.7 | 88.1 | 35.9 | 60.0 | 50.6 | 64.9 | 81.9 | 35.3 | 58.2 | 60.6 |
| FLUX.2-dev | β | 42.6 | 63.3 | 78.4 | 53.3 | 59.4 | 35.4 | 75.0 | 85.6 | 37.6 | 58.4 | 73.6 | 70.7 | 82.1 | 43.6 | 67.5 | 61.8 |
| Nano Banana | β | 70.6 | 85.7 | 86.8 | 75.2 | 79.6 | 63.4 | 84.9 | 91.4 | 61.5 | 75.3 | 75.3 | 73.8 | 87.3 | 44.3 | 70.2 | 75.0 |
| Seedream 4.0 | β | 70.8 | 78.1 | 86.6 | 74.6 | 77.5 | 63.7 | 80.1 | 90.6 | 64.2 | 74.6 | 82.2 | 77.8 | 86.9 | 47.0 | 73.5 | 75.2 |
| GPT-image-1 | β | 78.5 | 85.8 | 88.0 | 81.2 | 83.3 | 62.9 | 82.9 | 93.0 | 60.8 | 74.9 | 84.4 | 76.2 | 89.2 | 48.4 | 74.6 | 77.6 |
| Nano Banana Pro | β | 85.4 | 88.6 | 83.9 | 91.4 | 87.3 | 76.0 | 89.1 | 92.3 | 75.8 | 83.3 | 86.6 | 79.5 | 88.8 | 51.5 | 76.6 | 82.4 |
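The AVG columns can be sanity-checked with simple arithmetic. The sketch below assumes that each task AVG is the unweighted mean of that task's four metric scores and that Overall AVG is the unweighted mean of the three task averages. This aggregation is not stated on this page; it is inferred from the table values, and the function names are illustrative only.

```python
from statistics import mean

# Scores copied from the Nano Banana Pro row of the English leaderboard.
# Metric abbreviations (IF, DP, VQ, KF, CF) are used exactly as in the table header.
NANO_BANANA_PRO = {
    "Awareness": {"IF": 85.4, "DP": 88.6, "VQ": 83.9, "KF": 91.4},
    "Interpretation": {"IF": 76.0, "DP": 89.1, "VQ": 92.3, "KF": 75.8},
    "Imagination": {"IF": 86.6, "DP": 79.5, "VQ": 88.8, "CF": 51.5},
}


def task_average(metrics):
    """Unweighted mean of one task's metric scores (assumed aggregation)."""
    return mean(metrics.values())


def overall_average(tasks):
    """Unweighted mean of the per-task averages (assumed aggregation)."""
    return mean(task_average(m) for m in tasks.values())


if __name__ == "__main__":
    for task, metrics in NANO_BANANA_PRO.items():
        print(f"{task}: {task_average(metrics):.1f}")
    print(f"Overall: {overall_average(NANO_BANANA_PRO):.1f}")
```

For this row the sketch reproduces the listed values (87.3, 83.3, 76.6, and 82.4 overall). Some other rows differ by 0.1 in the last digit (for example, the InstructPix2Pix Imagination AVG), which suggests the published averages may be computed from unrounded or per-sample scores rather than from the rounded table entries.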
Leaderboard on WiseEdit (Chinese)
| Model | Multi Img | Awareness IF↑ | Awareness DP↑ | Awareness VQ↑ | Awareness KF↑ | Awareness AVG | Interpretation IF↑ | Interpretation DP↑ | Interpretation VQ↑ | Interpretation KF↑ | Interpretation AVG | Imagination IF↑ | Imagination DP↑ | Imagination VQ↑ | Imagination CF↑ | Imagination AVG | Overall AVG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MagicBrush | β | 15.0 | 43.8 | 52.9 | 17.8 | 32.4 | 10.1 | 40.5 | 59.4 | 13.5 | 30.9 | 5.4 | 39.6 | 49.2 | 12.7 | 26.8 | 30.0 |
| OmniGen | β | 16.3 | 42.6 | 60.1 | 22.6 | 35.4 | 13.4 | 30.9 | 49.2 | 14.8 | 27.1 | 15.4 | 27.6 | 51.3 | 32.3 | 31.7 | 31.3 |
| ICEdit | β | 12.9 | 29.1 | 63.6 | 17.4 | 30.8 | 11.5 | 43.5 | 81.1 | 17.0 | 38.3 | 5.1 | 37.4 | 56.5 | 16.7 | 28.9 | 32.7 |
| FLUX.1 Kontext Dev | β | 16.5 | 48.4 | 52.1 | 19.1 | 34.0 | 16.8 | 58.4 | 66.1 | 22.2 | 40.9 | 9.2 | 41.9 | 43.3 | 10.6 | 26.3 | 33.7 |
| AnyEdit | β | 17.5 | 55.3 | 55.7 | 19.9 | 37.1 | 11.4 | 51.8 | 58.0 | 16.0 | 34.3 | 7.3 | 47.4 | 52.3 | 15.4 | 30.6 | 34.0 |
| InstructPix2Pix | β | 14.2 | 51.0 | 58.6 | 18.1 | 35.5 | 13.0 | 55.7 | 65.2 | 16.0 | 37.5 | 3.7 | 52.0 | 53.2 | 9.0 | 29.5 | 34.1 |
| Janus-4o | β | 31.1 | 38.6 | 46.3 | 34.1 | 37.5 | 23.9 | 45.9 | 55.7 | 25.8 | 37.8 | 25.4 | 36.9 | 41.8 | 22.1 | 31.5 | 35.6 |
| UniWorld-V1 | β | 18.4 | 49.0 | 60.0 | 26.2 | 38.4 | 13.3 | 48.8 | 59.7 | 16.9 | 34.7 | 17.9 | 54.8 | 68.9 | 18.0 | 39.9 | 37.7 |
| HiDream-E1 | β | 28.2 | 37.6 | 51.4 | 32.4 | 37.4 | 25.4 | 47.1 | 63.6 | 29.7 | 41.5 | 32.5 | 39.2 | 47.5 | 27.4 | 36.6 | 38.5 |
| UltraEdit | β | 16.9 | 58.5 | 62.1 | 21.2 | 39.7 | 17.4 | 74.2 | 78.9 | 19.2 | 47.4 | 9.3 | 42.3 | 51.8 | 14.8 | 29.5 | 38.9 |
| OmniGen2 | β | 35.1 | 57.9 | 72.4 | 41.0 | 51.6 | 19.1 | 57.1 | 64.8 | 23.0 | 41.0 | 45.5 | 64.0 | 72.0 | 33.8 | 53.8 | 48.8 |
| Step1X-Edit-v1p2 | β | 38.6 | 55.6 | 59.5 | 42.0 | 48.9 | 37.0 | 77.5 | 76.8 | 35.9 | 56.8 | 45.7 | 48.3 | 51.3 | 27.0 | 43.1 | 49.6 |
| DreamOmni2 | β | 31.9 | 78.7 | 85.4 | 38.4 | 58.6 | 24.0 | 80.1 | 86.2 | 27.5 | 54.4 | 38.5 | 69.4 | 84.8 | 27.2 | 55.0 | 56.0 |
| Echo-4o | β | 47.9 | 59.9 | 73.1 | 55.0 | 59.0 | 31.9 | 74.5 | 77.4 | 32.6 | 54.1 | 62.8 | 64.2 | 75.1 | 41.5 | 60.9 | 58.0 |
| Bagel | β | 48.5 | 71.3 | 76.8 | 52.1 | 62.2 | 36.5 | 68.7 | 75.0 | 38.5 | 54.7 | 63.5 | 68.3 | 75.3 | 39.7 | 61.7 | 59.5 |
| Uni-CoT | β | 46.2 | 70.0 | 80.7 | 53.6 | 62.6 | 37.4 | 71.5 | 79.2 | 36.6 | 56.2 | 65.5 | 65.1 | 79.7 | 41.6 | 63.0 | 60.6 |
| Qwen-Image-Edit | β | 45.0 | 67.3 | 79.9 | 52.9 | 61.3 | 35.8 | 74.0 | 80.7 | 36.1 | 56.6 | 66.3 | 67.2 | 80.0 | 41.7 | 63.8 | 60.6 |
| FLUX.2-dev | β | 43.0 | 60.6 | 79.5 | 51.4 | 58.6 | 34.3 | 73.7 | 83.1 | 36.0 | 56.8 | 75.5 | 74.4 | 82.8 | 42.6 | 68.8 | 61.4 |
| Seedream 4.0 | β | 69.1 | 79.0 | 84.4 | 72.0 | 76.1 | 62.2 | 80.3 | 89.9 | 59.9 | 73.1 | 79.8 | 79.7 | 86.5 | 46.4 | 73.1 | 74.1 |
| Nano Banana | β | 71.8 | 83.8 | 86.5 | 70.7 | 78.2 | 67.9 | 84.5 | 91.4 | 63.7 | 76.9 | 76.0 | 75.8 | 87.3 | 43.7 | 70.7 | 75.3 |
| GPT-image-1 | β | 77.0 | 80.7 | 86.6 | 80.7 | 81.2 | 61.4 | 82.6 | 93.8 | 61.2 | 74.8 | 78.8 | 73.3 | 89.6 | 48.3 | 72.5 | 76.2 |
| Nano Banana Pro | β | 84.6 | 91.8 | 83.1 | 87.9 | 86.9 | 74.2 | 83.9 | 91.3 | 74.6 | 81.0 | 85.5 | 77.6 | 88.4 | 51.1 | 75.6 | 81.2 |
Leaderboard on WiseEdit-Complex
We exclude models unable to handle multi-image inputs.
| Model | English IF↑ | English DP↑ | English VQ↑ | English KF↑ | English CF↑ | English AVG | Chinese IF↑ | Chinese DP↑ | Chinese VQ↑ | Chinese KF↑ | Chinese CF↑ | Chinese AVG | Overall AVG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AnyEdit (ZJU) | 2.5 | 5.6 | 20.6 | 3.3 | 11.7 | 8.7 | 1.3 | 5.1 | 22.2 | 2.9 | 9.3 | 8.2 | 8.4 |
| UniWorld-V1 (PKU) | 18.1 | 32.1 | 55.3 | 22.8 | 28.6 | 31.4 | 8.8 | 23.8 | 64.6 | 12.4 | 15.4 | 25.0 | 28.2 |
| OmniGen (BAAI) | 23.5 | 25.7 | 41.2 | 31.5 | 48.9 | 34.2 | 4.4 | 15.4 | 50.3 | 15.1 | 32.2 | 23.5 | 28.8 |
| OmniGen2 (BAAI) | 32.2 | 48.3 | 70.5 | 47.1 | 42.9 | 48.2 | 28.3 | 49.8 | 73.2 | 45.4 | 43.9 | 48.1 | 48.2 |
| DreamOmni2 (CUHK) | 35.6 | 63.1 | 79.6 | 46.8 | 41.8 | 53.4 | 37.4 | 52.4 | 79.9 | 50.4 | 36.3 | 51.2 | 50.7 |
| Echo-4o (Shanghai AI Lab) | 44.3 | 48.1 | 65.7 | 52.5 | 50.3 | 52.2 | 42.6 | 57.1 | 70.0 | 54.6 | 51.7 | 55.2 | 52.1 |
| Qwen-Image-Edit (Qwen) | 38.7 | 58.6 | 75.8 | 48.5 | 47.1 | 53.8 | 35.3 | 55.0 | 78.1 | 49.6 | 48.9 | 53.4 | 53.6 |
| Bagel (ByteDance) | 44.6 | 62.8 | 70.8 | 54.3 | 44.2 | 55.3 | 40.3 | 59.0 | 74.3 | 56.2 | 47.6 | 55.5 | 53.8 |
| Uni-CoT (SAIS) | 36.3 | 60.8 | 70.1 | 57.8 | 50.3 | 55.1 | 37.2 | 56.0 | 78.2 | 58.9 | 49.7 | 56.0 | 53.9 |
| FLUX.2-dev (Black Forest Labs) | 42.3 | 68.5 | 75.6 | 56.5 | 49.1 | 58.4 | 46.3 | 73.4 | 80.3 | 59.9 | 52.8 | 62.6 | 60.5 |
| Nano Banana | 53.8 | 75.2 | 82.7 | 82.4 | 53.7 | 69.6 | 53.3 | 71.3 | 79.9 | 77.4 | 51.1 | 66.6 | 68.1 |
| GPT-image-1 (OpenAI) | 58.7 | 75.9 | 87.6 | 77.9 | 54.8 | 71.0 | 59.5 | 76.6 | 88.2 | 78.8 | 54.1 | 71.4 | 69.6 |
| Seedream 4.0 (ByteDance) | 68.0 | 78.1 | 80.4 | 90.5 | 53.9 | 74.2 | 60.1 | 63.6 | 81.8 | 88.6 | 56.3 | 70.1 | 70.5 |
| Nano Banana Pro | 68.1 | 78.1 | 86.7 | 88.1 | 56.6 | 75.5 | 77.7 | 83.8 | 84.3 | 90.7 | 57.0 | 78.7 | 77.1 |
Benchmark Examples
Qualitative Comparisons on WiseEdit Awareness Task.
Qualitative Comparisons on WiseEdit Interpretation Task.
Qualitative Comparisons on WiseEdit Imagination Task.
Qualitative Comparisons on WiseEdit-Complex Task.