Text-to-Image Models in Digital Illustration Generation: A Performance Evaluation


Rotimi-Williams Bello
Roseline Oluwaseun Ogundokun
Pius A. Owolawi
Etienne A. van Wyk
Chunling Tu

Abstract

Traditional methods for text-to-image digital illustration generation have significant limitations, motivating the use of state-of-the-art generative models. However, the performance of these state-of-the-art models has not been comprehensively evaluated. In this paper, the performance of three prominent text-to-image models, GPT-4o, DALL•E 3, and Midjourney, was manually evaluated. Using 10 simple prompts and 10 complex prompts, 180 illustrations were generated and assessed against three criteria: artistic expression, semantic control, and workflow flexibility. Experimental results show that no single model dominates all aspects of digital illustration generation. Instead, GPT-4o and DALL•E 3 are the best choices for structured, instruction-heavy illustrations such as UI sketches, storyboards, and educational diagrams, while Midjourney is unrivaled in generating illustrations that are visually rich, cinematic, and stylistic. The findings suggest that model selection for text-to-image digital illustration generation should be guided by the desired balance between artistic expression, semantic control, and workflow flexibility.

Article Details

Section

Articles

How to Cite

Text-to-Image Models in Digital Illustration Generation: A Performance Evaluation. (2026). Architecture Image Studies, 7(1), 1596-1603. https://doi.org/10.62754/ais.v7i1.1063