I designed a single text-to-image prompt with 7 hidden stress tests baked in. Then I ran it through Gemini (two versions, Nano Banana Pro and Nano Banana 2), GPT Image 1.5, Midjourney, Qwen, Wan, Seedream, Flux, Z-Image, and Uni-1.
The prompt asked for an elderly Japanese calligrapher (a woman, going by the prompt's "her") painting the kanji character 勇 (courage) in a candlelit tatami room, with very specific requirements: correct shadow direction from a defined light source, three wall scrolls with different kanji, a tortoiseshell cat sleeping on a red cushion, a candlelight reflection on a ceramic tea cup, a snowy garden through an open window, and medium format film aesthetics.
Every element was chosen to test something most AI models get wrong.
What I was actually testing:
1. Specific kanji rendering - Can it produce a real Japanese character, not gibberish?
2. Shadow physics - Candle on the left means shadow falls right. Basic physics that most models ignore.
3. Object counting + variation - Three scrolls, each with a different character.
4. Breed-specific animal - Tortoiseshell pattern, sleeping pose, correct placement on the floor.
5. Reflections on curved surfaces - Candlelight on a ceramic tea cup.
6. Dual-temperature lighting - Warm tungsten interior vs. cool blue moonlit exterior in the same frame.
7. Film simulation - Shallow depth of field with medium format bokeh characteristics.
The Results (ranked)
1. Gemini (Nano Banana Pro) - 9.0/10 The champion, and it won by doing something no other model managed: zero catastrophic failures. Three perfect scroll kanji (道, 夢, 志 meaning way, dream, ambition), correct shadow on the shoji screen, a tortoiseshell cat sleeping on a red cushion on the floor, visible candlelight reflection on the tea cup, and the most natural dual-temperature lighting of any model. The only misses were the main kanji on the paper (not quite 勇) and a calligrapher whose gender reads as slightly ambiguous.
2 (tied). Uni-1 - 8.7/10 Uni-1 debuted in the shootout and immediately tied for second place. Three real distinct kanji on the scrolls (静, 誠, 道 meaning tranquility, sincerity, way), tabby cat sleeping curled on a red cushion on the floor, correct shadow direction on the shoji screen, and an authentic indigo patterned kimono. The calligrapher clearly reads as female. Only real miss was the main kanji on the paper not being clearly 勇.
2 (tied). Seedream - 8.7/10 The only model in this round that nailed the two hardest individual tests simultaneously: correct 勇 kanji on the paper AND correct shadow direction from the candle. Nobody else managed both. The dual-temperature lighting was also excellent, and the tea cup showed a visible candlelight reflection. Lost points for cat placement (on the table instead of the floor).
4 (tied). GPT Image 1.5 - 8.5/10 A massive leap from earlier GPT versions. Three real distinct kanji on the scrolls (禅, 夢, 和 meaning zen, dream, harmony), shadow cast correctly to the right, and a tortoiseshell cat sleeping on the red cushion on the floor. The character on the paper is a recognizable 夢 shape: close in structure, but not 勇. Nice cultural details with the ink stone and brush rest.
4 (tied). Gemini (Nano Banana 2) - 8.5/10 The earlier Gemini model still holds up well. Real kanji on the scrolls (平, 和, 誠, 道), correct shadow, cinematic atmosphere. Its main miss was rendering four scroll characters instead of three.
6 (tied). Qwen - 8.0/10 The most culturally intelligent output. Confucian virtues on the scrolls (仁, 義, 礼 meaning benevolence, righteousness, propriety), authentic indigo kimono, proper matcha bowl, and the best tortoiseshell coloring of any cat in the test. It understood the context, not just the instructions.
6 (tied). Wan - 8.0/10 Rendered 勇 correctly on all three scrolls (though repeated instead of varied). Perfect cat placement (calico, sleeping, on the floor). Cleanest spatial layout of any model. But the calligrapher read as male.
6 (tied). Flux - 8.0/10 The most photorealistic image in the test. If you showed this to someone without context, they'd think it was a real photograph. The shadow direction is correct and the gold brocade kimono is stunning. But it cannot render kanji at all. Beautiful but illiterate.
9. Z-Image - 7.0/10 Competent across the board, no standout strengths. The solid B student.
10. Midjourney - 6.0/10 The most atmospheric, moody images. Also a complete failure on every kanji in the scene. All gibberish. The calligrapher is wearing a sweater and glasses instead of a kimono. MJ remains the "vibes over accuracy" model.
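For anyone re-running the shootout with their own scores, the tied placements above follow standard competition ranking (1, 2, 2, 4, ...): tied models share a place and the next place skips ahead. Here is a minimal Python sketch that reproduces the numbering from the scores in this post. The function name competition_rank is just an illustrative choice, not part of any tool used for the test.

```python
from collections import Counter

# Scores copied from the ranked results above.
SCORES = {
    "Gemini (Nano Banana Pro)": 9.0,
    "Uni-1": 8.7,
    "Seedream": 8.7,
    "GPT Image 1.5": 8.5,
    "Gemini (Nano Banana 2)": 8.5,
    "Qwen": 8.0,
    "Wan": 8.0,
    "Flux": 8.0,
    "Z-Image": 7.0,
    "Midjourney": 6.0,
}


def competition_rank(scores):
    """Rank high to low; tied scores share a rank and the next rank skips ahead."""
    # sorted() is stable, so tied models keep the order they were listed in.
    ordered = sorted(scores.items(), key=lambda item: -item[1])
    tie_sizes = Counter(scores.values())
    rows = []
    for position, (name, score) in enumerate(ordered, start=1):
        tied_with_previous = bool(rows) and rows[-1]["score"] == score
        rank = rows[-1]["rank"] if tied_with_previous else position
        rows.append(
            {"rank": rank, "tied": tie_sizes[score] > 1, "name": name, "score": score}
        )
    return rows


for row in competition_rank(SCORES):
    label = f"{row['rank']} (tied)." if row["tied"] else f"{row['rank']}."
    print(f"{label} {row['name']} - {row['score']:.1f}/10")
```

Swap in your own numbers in SCORES to get a list in the same format.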
The Big Takeaways
Photorealism is solved. Every model in this test produces images that could pass as photographs. The competition has moved upstream, to whether a model actually follows the instructions.
Consistency beats any single strength. Gemini didn't win on one standout test; it won by having no catastrophic failures. Seedream nailed the two hardest individual tests, but stumbled on cat placement. Uni-1 showed the same "no catastrophic failures" profile in its very first appearance.
Text rendering is the great divider. The ability to produce specific, accurate text (or in this case, kanji characters) separates the smart models from the pretty ones. Ranking: Seedream > Wan > Uni-1 > GPT > Qwen >> Flux/MJ.
New models can leapfrog established ones. Uni-1 didn't exist a few months ago and debuted tied for 2nd place. The field is moving incredibly fast, and rankings can shift within weeks.
Cultural accuracy is underrated. Qwen and Uni-1 dressed the character in the most authentic kimono, chose real and meaningful characters, and understood what a Japanese calligrapher's room should feel like, not just what it should look like.
Try It Yourself:
Here's the exact prompt. Run it through your model of choice and see how it scores:
"A elderly Japanese calligrapher sits at a low wooden table in a dimly lit tatami room, painting the kanji character 勇 (courage) on a large sheet of rice paper with a thick ink brush. Three half-finished scrolls hang on the wall behind her, each showing a different kanji. A single candle to her left casts her shadow sharply to the right onto a sliding shoji screen. A small tortoiseshell cat sleeps curled on a red cushion in the foreground, and a ceramic cup of green tea sits on the table, reflecting the candlelight on its surface. Through a partially open window on the right, a snowy garden with a stone lantern is visible under moonlight. Shot on medium format film, shallow depth of field, warm tungsten tones."
Score it against these 7 tests and share your results in the comments. I want to see if anyone finds a model that cracks 9.0.
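If you want a consistent way to turn the 7 tests into one number, here is a minimal Python scorecard sketch. It assumes each test gets a mark from 0 to 10 and that all seven are weighted equally. That weighting is an assumption made for illustration and not necessarily how the scores in this post were produced. The function name overall_score and the example marks are made up.

```python
# The 7 tests, named as in the list earlier in the post.
TESTS = [
    "Specific kanji rendering",
    "Shadow physics",
    "Object counting + variation",
    "Breed-specific animal",
    "Reflections on curved surfaces",
    "Dual-temperature lighting",
    "Film simulation",
]


def overall_score(marks):
    """Average per-test marks (0-10 each, equal weights) to one decimal place."""
    missing = [test for test in TESTS if test not in marks]
    if missing:
        raise ValueError(f"missing marks for: {missing}")
    for test in TESTS:
        if not 0 <= marks[test] <= 10:
            raise ValueError(f"mark for {test!r} must be between 0 and 10")
    return round(sum(marks[test] for test in TESTS) / len(TESTS), 1)


# Made-up marks for illustration only; not taken from any model in the post.
example = {test: 10 for test in TESTS}
example["Specific kanji rendering"] = 5  # e.g. plausible strokes, wrong character
print(overall_score(example))  # 9.3
```

If you think some tests matter more (kanji, say), replace the plain average with a weighted one and mention the weights when you post your results.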
Round 2 prompt involves a 1920s Havana barber shop with a mirror that has to reverse text. It's even harder.