WiseEdit: Benchmarking Cognition- and Creativity-Informed Image Editing

Kaihang Pan¹^*, Weile Chen¹^*, Haiyi Qiu¹^*, Qifan Yu¹, Wendong Bu¹, Zehan Wang¹, Yun Zhu², Juncheng Li¹, Siliang Tang¹

¹Zhejiang University, ²Shanghai Artificial Intelligence Laboratory
^*Equal Contribution

arXiv | Code | 🤗 WiseEdit Benchmark | 🤗 Results Gallery

[Note] If you would like to submit your results on WiseEdit, please contact us at kaihangpan@zju.edu.cn.
[2025-12-07] 🔥 We release the Results Gallery, which initially contains the generation results of 22 mainstream open- and closed-source image editing models (as shown in our paper).
[2025-12-07] 🔥 We release the evaluation code for WiseEdit on our GitHub. You are welcome to evaluate your model on WiseEdit!
[2025-12-07] 🔥 We release the WiseEdit benchmark on Hugging Face.
[2025-11-29] 🔥 We release the project page with the leaderboard.
[2025-11-29] 🔥 We release the WiseEdit paper on arXiv.

Abstract

Recent image editing models show increasingly intelligent capabilities that enable cognition- and creativity-informed image editing. Yet existing benchmarks are too narrow in scope to assess these advanced abilities holistically. To address this, we introduce WiseEdit, a knowledge-intensive benchmark for comprehensive evaluation of cognition- and creativity-informed image editing that combines task depth with knowledge breadth. Drawing an analogy to human cognitive creation, WiseEdit decomposes image editing into three cascaded steps (Awareness, Interpretation, and Imagination), each paired with a task that challenges models at that specific step. It also includes complex tasks, in which none of the three steps can be completed easily. Furthermore, WiseEdit incorporates three fundamental types of knowledge: Declarative, Procedural, and Metacognitive. In total, WiseEdit comprises 1,220 test cases that reveal the limitations of SoTA image editing models in knowledge-based cognitive reasoning and creative composition.

Leaderboard on WiseEdit (English)

Model	Multi Img	Awareness Task					Interpretation Task					Imagination Task					Overall AVG
Model	Multi Img	IF↑	DP↑	VQ↑	KF↑	AVG	IF↑	DP↑	VQ↑	KF↑	AVG	IF↑	DP↑	VQ↑	CF↑	AVG	Overall AVG
InstructPix2Pix	✗	24.6	33.7	50.6	26.4	33.8	20.7	50.3	66.9	23.6	40.4	17.0	29.9	41.2	27.4	28.9	34.4
MagicBrush	✗	27.2	43.4	53.3	27.1	37.8	16.8	50.3	63.2	22.2	38.1	18.0	36.9	44.8	22.3	30.5	35.5
OmniGen	✓	35.0	42.0	46.7	37.4	40.3	19.0	34.8	40.3	21.5	28.9	42.2	35.1	46.0	38.7	40.5	36.6
Janus-4o	✗	34.7	37.0	45.9	36.2	38.5	27.2	43.8	53.6	28.2	38.2	28.2	37.6	42.0	25.5	33.3	36.7
AnyEdit	✓	25.0	54.6	61.3	26.3	41.8	15.9	61.2	62.0	20.2	39.8	9.1	49.7	50.9	16.5	31.5	37.7
UltraEdit	✗	26.5	42.5	53.1	33.9	39.0	24.3	61.7	73.6	26.7	46.6	20.7	31.7	45.8	27.5	31.5	39.0
ICEdit	✗	26.1	42.2	61.2	31.8	40.4	21.4	48.3	81.5	24.9	44.0	21.5	40.6	54.0	25.0	35.3	39.9
UniWorld-V1	✓	31.5	48.9	58.8	38.6	44.5	18.1	44.5	58.1	22.5	35.8	30.3	50.3	64.2	27.5	43.1	41.1
HiDream-E1	✗	29.7	41.2	56.3	32.0	39.8	26.7	53.6	68.4	29.6	44.6	39.6	40.1	49.9	29.6	39.8	41.4
FLUX.1 Kontext Dev	✗	31.4	52.0	55.0	35.5	43.5	27.5	62.2	69.6	29.0	47.1	39.1	47.1	43.4	27.1	39.2	43.2
OmniGen2	✓	35.0	64.0	75.4	41.3	53.9	18.9	56.9	64.9	23.5	41.1	42.0	64.4	74.6	31.8	53.2	49.4
Step1X-Edit-v1p2-preview	✗	39.8	53.5	61.3	44.4	49.7	35.7	73.0	75.2	38.2	55.5	44.7	49.4	50.3	28.4	43.2	49.5
Echo-4o	✓	47.6	63.0	75.4	51.7	59.4	30.8	71.0	80.4	32.9	53.8	63.4	62.4	73.7	41.2	60.2	57.8
Bagel	✓	46.2	71.0	75.8	50.8	61.0	38.6	72.1	78.8	39.5	57.3	62.8	68.5	74.5	40.7	61.6	60.0
Uni-CoT	✓	46.0	69.1	77.8	51.6	61.1	36.9	70.1	76.3	38.6	55.5	67.6	64.3	79.6	42.9	63.6	60.1
Qwen-Image-Edit	✓	48.1	69.0	79.5	53.6	62.5	32.1	69.7	80.6	34.2	54.1	67.1	66.8	79.2	42.3	63.8	60.2
DreamOmni2	✓	43.3	74.4	85.0	51.2	63.5	34.3	81.7	88.1	35.9	60.0	50.6	64.9	81.9	35.3	58.2	60.6
FLUX.2-dev	✓	42.6	63.3	78.4	53.3	59.4	35.4	75.0	85.6	37.6	58.4	73.6	70.7	82.1	43.6	67.5	61.8
Nano Banana	✓	70.6	85.7	86.8	75.2	79.6	63.4	84.9	91.4	61.5	75.3	75.3	73.8	87.3	44.3	70.2	75.0
Seedream 4.0	✓	70.8	78.1	86.6	74.6	77.5	63.7	80.1	90.6	64.2	74.6	82.2	77.8	86.9	47.0	73.5	75.2
GPT-image-1	✓	78.5	85.8	88.0	81.2	83.3	62.9	82.9	93.0	60.8	74.9	84.4	76.2	89.2	48.4	74.6	77.6
Nano Banana Pro	✓	85.4	88.6	83.9	91.4	87.3	76.0	89.1	92.3	75.8	83.3	86.6	79.5	88.8	51.5	76.6	82.4
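
The page does not state how the AVG columns are computed. The reported numbers appear consistent with a simple reading: each task AVG is the unweighted mean of its four metric scores, and Overall AVG is the unweighted mean of the three task averages. The sketch below checks that reading on two rows copied from the table above. The aggregation rule is an assumption inferred from the numbers, and `check_row` is a hypothetical helper, not part of the released evaluation code.

```python
from statistics import mean

# Two rows copied verbatim from the English leaderboard (tab-separated).
ROWS = [
    "InstructPix2Pix\t✗\t24.6\t33.7\t50.6\t26.4\t33.8\t20.7\t50.3\t66.9\t23.6\t40.4\t17.0\t29.9\t41.2\t27.4\t28.9\t34.4",
    "Nano Banana Pro\t✓\t85.4\t88.6\t83.9\t91.4\t87.3\t76.0\t89.1\t92.3\t75.8\t83.3\t86.6\t79.5\t88.8\t51.5\t76.6\t82.4",
]


def check_row(row, tol=0.051):
    """Return True if the reported averages match unweighted means.

    Assumed layout (inferred from the table header): model name, multi-image
    flag, then three blocks of four metric scores plus a reported AVG, and
    finally the Overall AVG. `tol` allows for rounding to one decimal place.
    """
    fields = row.split("\t")
    scores = [float(x) for x in fields[2:]]
    blocks = [scores[i:i + 5] for i in range(0, 15, 5)]
    overall_reported = scores[15]

    # Recompute each task average from its four metric scores.
    task_avgs = [mean(block[:4]) for block in blocks]
    tasks_ok = all(
        abs(avg - block[4]) <= tol for avg, block in zip(task_avgs, blocks)
    )

    # Recompute the overall average from the three task averages.
    overall_ok = abs(mean(task_avgs) - overall_reported) <= tol
    return tasks_ok and overall_ok


for row in ROWS:
    print(row.split("\t")[0], check_row(row))
```

A row that fails this check would indicate either a transcription error or a different aggregation rule, so the check is a sanity test rather than a specification of the benchmark's scoring.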

Leaderboard on WiseEdit (Chinese)

Model	Multi Img	Awareness Task					Interpretation Task					Imagination Task					Overall AVG
Model	Multi Img	IF↑	DP↑	VQ↑	KF↑	AVG	IF↑	DP↑	VQ↑	KF↑	AVG	IF↑	DP↑	VQ↑	CF↑	AVG	Overall AVG
MagicBrush	✗	15.0	43.8	52.9	17.8	32.4	10.1	40.5	59.4	13.5	30.9	5.4	39.6	49.2	12.7	26.8	30.0
OmniGen	✓	16.3	42.6	60.1	22.6	35.4	13.4	30.9	49.2	14.8	27.1	15.4	27.6	51.3	32.3	31.7	31.3
ICEdit	✗	12.9	29.1	63.6	17.4	30.8	11.5	43.5	81.1	17.0	38.3	5.1	37.4	56.5	16.7	28.9	32.7
FLUX.1 Kontext Dev	✗	16.5	48.4	52.1	19.1	34.0	16.8	58.4	66.1	22.2	40.9	9.2	41.9	43.3	10.6	26.3	33.7
AnyEdit	✓	17.5	55.3	55.7	19.9	37.1	11.4	51.8	58.0	16.0	34.3	7.3	47.4	52.3	15.4	30.6	34.0
InstructPix2Pix	✗	14.2	51.0	58.6	18.1	35.5	13.0	55.7	65.2	16.0	37.5	3.7	52.0	53.2	9.0	29.5	34.1
Janus-4o	✗	31.1	38.6	46.3	34.1	37.5	23.9	45.9	55.7	25.8	37.8	25.4	36.9	41.8	22.1	31.5	35.6
UniWorld-V1	✓	18.4	49.0	60.0	26.2	38.4	13.3	48.8	59.7	16.9	34.7	17.9	54.8	68.9	18.0	39.9	37.7
HiDream-E1	✗	28.2	37.6	51.4	32.4	37.4	25.4	47.1	63.6	29.7	41.5	32.5	39.2	47.5	27.4	36.6	38.5
UltraEdit	✗	16.9	58.5	62.1	21.2	39.7	17.4	74.2	78.9	19.2	47.4	9.3	42.3	51.8	14.8	29.5	38.9
OmniGen2	✓	35.1	57.9	72.4	41.0	51.6	19.1	57.1	64.8	23.0	41.0	45.5	64.0	72.0	33.8	53.8	48.8
Step1X-Edit-v1p2-preview	✗	38.6	55.6	59.5	42.0	48.9	37.0	77.5	76.8	35.9	56.8	45.7	48.3	51.3	27.0	43.1	49.6
DreamOmni2	✓	31.9	78.7	85.4	38.4	58.6	24.0	80.1	86.2	27.5	54.4	38.5	69.4	84.8	27.2	55.0	56.0
Echo-4o	✓	47.9	59.9	73.1	55.0	59.0	31.9	74.5	77.4	32.6	54.1	62.8	64.2	75.1	41.5	60.9	58.0
Bagel	✓	48.5	71.3	76.8	52.1	62.2	36.5	68.7	75.0	38.5	54.7	63.5	68.3	75.3	39.7	61.7	59.5
Uni-CoT	✓	46.2	70.0	80.7	53.6	62.6	37.4	71.5	79.2	36.6	56.2	65.5	65.1	79.7	41.6	63.0	60.6
Qwen-Image-Edit	✓	45.0	67.3	79.9	52.9	61.3	35.8	74.0	80.7	36.1	56.6	66.3	67.2	80.0	41.7	63.8	60.6
FLUX.2-dev	✓	43.0	60.6	79.5	51.4	58.6	34.3	73.7	83.1	36.0	56.8	75.5	74.4	82.8	42.6	68.8	61.4
Seedream 4.0	✓	69.1	79.0	84.4	72.0	76.1	62.2	80.3	89.9	59.9	73.1	79.8	79.7	86.5	46.4	73.1	74.1
Nano Banana	✓	71.8	83.8	86.5	70.7	78.2	67.9	84.5	91.4	63.7	76.9	76.0	75.8	87.3	43.7	70.7	75.3
GPT-image-1	✓	77.0	80.7	86.6	80.7	81.2	61.4	82.6	93.8	61.2	74.8	78.8	73.3	89.6	48.3	72.5	76.2
Nano Banana Pro	✓	84.6	91.8	83.1	87.9	86.9	74.2	83.9	91.3	74.6	81.0	85.5	77.6	88.4	51.1	75.6	81.2

Leaderboard on WiseEdit-Complex

We exclude models unable to handle multi-image inputs.

Model	English Version						Chinese Version						Overall AVG
Model	IF↑	DP↑	VQ↑	KF↑	CF↑	AVG	IF↑	DP↑	VQ↑	KF↑	CF↑	AVG	Overall AVG
AnyEdit (ZJU)	2.5	5.6	20.6	3.3	11.7	8.7	1.3	5.1	22.2	2.9	9.3	8.2	8.5
UniWorld-V1 (PKU)	18.1	32.1	55.3	22.8	28.6	31.4	8.8	23.8	64.6	12.4	15.4	25.0	28.2
OmniGen (BAAI)	23.5	25.7	41.2	31.5	48.9	34.2	4.4	15.4	50.3	15.1	32.2	23.5	28.9
OmniGen2 (BAAI)	34.1	50.2	72.4	49.0	44.8	50.1	30.2	51.7	75.1	47.3	45.8	50.0	50.1
DreamOmni2 (CUHK)	34.8	62.3	78.8	46.0	41.0	52.6	36.6	51.6	79.1	49.6	35.5	50.4	51.5
Echo-4o (Shanghai AI Lab)	42.7	46.5	64.1	50.9	48.7	50.6	41.0	55.5	68.4	53.0	50.1	53.6	52.1
Qwen-Image-Edit (Qwen)	38.7	58.6	75.8	48.5	47.1	53.8	35.3	55.0	78.1	49.6	48.9	53.4	53.6
Bagel (ByteDance)	43.8	62.0	70.0	53.5	43.4	54.5	39.5	58.2	73.5	55.5	46.8	54.7	54.6
Uni-CoT (SAIS)	35.5	60.0	69.3	57.0	49.5	54.3	36.4	55.2	77.4	58.1	48.9	55.2	54.8
FLUX.2-dev (Black Forest Labs)	42.3	68.5	75.6	56.5	49.1	58.4	46.3	73.4	80.3	59.9	52.8	62.6	60.5
Nano Banana (Google)	53.8	75.2	82.7	82.4	53.7	69.6	53.3	71.3	79.9	77.4	51.1	66.6	68.1
GPT-image-1 (OpenAI)	58.7	75.9	87.6	77.9	54.8	71.0	59.5	76.6	88.2	78.8	54.1	71.4	71.2
Seedream 4.0 (ByteDance)	67.2	77.3	79.6	89.7	53.1	73.4	59.3	62.8	81.0	87.8	55.5	69.3	71.4
Nano Banana Pro (Google)	68.1	78.1	86.7	88.1	56.6	75.5	77.7	83.8	84.3	90.7	57.0	78.7	77.1

Benchmark Examples

Qualitative Comparisons on WiseEdit Awareness Task.

Qualitative Comparisons on WiseEdit Interpretation Task.

Qualitative Comparisons on WiseEdit Imagination Task.

Qualitative Comparisons on WiseEdit-Complex Task.

BibTeX

@article{pan2025wiseedit,
        title={WiseEdit: Benchmarking Cognition- and Creativity-Informed Image Editing},
        author={Pan, Kaihang and Chen, Weile and Qiu, Haiyi and Yu, Qifan and Bu, Wendong and Wang, Zehan and Zhu, Yun and Li, Juncheng and Tang, Siliang},
        journal={arXiv preprint arXiv:2512.00387},
        year={2025}
}