Text-to-Image Models in Digital Illustration Generation: A Performance Evaluation
Abstract
Traditional text-to-image methods for digital illustration generation have notable limitations, motivating the use of state-of-the-art models. However, the performance of these state-of-the-art models has not been comprehensively evaluated. In this paper, the performance of three prominent text-to-image models, GPT-4o, DALL·E 3, and Midjourney, was manually evaluated. Using 10 simple prompts and 10 complex prompts, 180 illustrations were generated and assessed against three criteria: artistic expression, semantic control, and workflow flexibility. Experimental results show that no single model dominates all aspects of digital illustration generation. Instead, GPT-4o and DALL·E 3 are the best choices for structured, instruction-heavy illustrations such as UI sketches, storyboards, and educational diagrams, whereas Midjourney is unrivaled in generating visually rich, cinematic, and stylistic illustrations. These findings suggest selecting a model according to the desired balance between artistic expression, semantic control, and workflow flexibility when generating digital illustrations from text.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
How to Cite
Text-to-Image Models in Digital Illustration Generation: A Performance Evaluation. (2026). Architecture Image Studies, 7(1), 1596-1603. https://doi.org/10.62754/ais.v7i1.1063