Text-to-Image Models in Digital Illustration Generation: A Performance Evaluation


Rotimi-Williams Bello
Roseline Oluwaseun Ogundokun
Pius A. Owolawi
Etienne A. van Wyk
Chunling Tu

Abstract

Traditional methods for text-to-image digital illustration generation have significant limitations, motivating the use of state-of-the-art generative models. However, the performance of these state-of-the-art models has not been comprehensively evaluated. In this paper, the performance of three prominent text-to-image models, GPT-4o, DALL•E 3, and Midjourney, was manually evaluated. Using 10 simple prompts and 10 complex prompts, 180 illustrations were generated and assessed against three criteria: artistic expression, semantic control, and workflow flexibility. Experimental results show that no single model dominates all aspects of digital illustration generation. Instead, GPT-4o and DALL•E 3 are the best choices for structured, instruction-heavy illustrations such as UI sketches, storyboards, and educational diagrams, while Midjourney is unrivaled in generating illustrations that are visually rich, cinematic, and stylistic. The findings suggest that model selection for text-to-image digital illustration generation should be guided by the desired balance between artistic expression, semantic control, and workflow flexibility.

Article Details

Section

Articles

How to Cite

Text-to-Image Models in Digital Illustration Generation: A Performance Evaluation. (2026). Architecture Image Studies, 7(1), 1596-1603. https://doi.org/10.62754/ais.v7i1.1063