Every other container in this session was transparent. Lightbulbs, mason jars, snow globes, pocket watches with open glass faces. The models could render a miniature world inside and show it through the glass. The coffee cup broke that assumption.
A white ceramic coffee cup is opaque. You cannot see through the walls. The prompt still asks for a thunderstorm inside, with lightning illuminating the cup from within. But "from within" means something different when the container blocks the view.
This article is about what happened when I gave four AI models a problem they could not solve with their default rendering approach. The scores are not the point. The creative solutions are.
Here is the prompt:
A white ceramic coffee cup sitting on a dark wooden table, inside the cup a complete miniature thunderstorm is raging with tiny dark clouds rising above the rim and bright lightning bolts illuminating the cup from within, rain falling inside the cup with water collecting at the bottom, the lightning casts dramatic light onto the table surface and the interior walls of the cup, macro photography, 85mm lens, shallow depth of field, dark moody background, atmospheric haze, professional product photography

Four Models, Four Solutions
Each model hit the same opacity wall and found a different way around it.
Firefly Image 5 (@AdobeFirefly): The Hover. Firefly floated the storm clouds above the cup, hovering over the rim like steam rising from hot coffee. The cup stayed opaque and intact. The storm existed above the container rather than inside it. It was the safest interpretation, and the one closest to something that could exist in stock photography (a coffee cup with dramatic "steam"). Firefly's uniqueness scores were actually its highest of any container at 8.75, because the hover solution looked different from every other model's output. Average: 8.37.
GPT Image 1.5 (@ChatGPTapp): The Immersion. GPT pushed the storm clouds down into the cup, dense and compressed, then let them billow dramatically over the rim. You could not see through the ceramic walls, but you could see the storm erupting from the opening. Condensation formed on the exterior of the cup. Water overflowed onto the table. GPT treated the opacity as a physics constraint and rendered the consequences: if a storm is inside something you cannot see through, the evidence appears at the edges. This was GPT's second-highest container average at 8.95, with a peak of 9.20.
Flux 1.1 Pro (@bfl_ml): The Bridge. Flux placed the storm cloud directly above the cup and had lightning arcing down from the cloud into the liquid surface below, creating a bridge between atmosphere and container. Coffee beans appeared on the table, unprompted. The whole composition read as "steam from a cup became a thunderstorm," which is a narrative reframing rather than a literal rendering. The coffee cup was Flux's lowest-scoring container at 8.13, but the "steam became weather" metaphor was the most conceptually original solution.
Nano Banana Pro (@NanoBanana): The Rule-Breaker. NB Pro did not pick one solution. It tried four different ones across four images. NBP-CC-1 used an overhead angle looking down into the cup, bypassing opacity entirely by changing the camera position. NBP-CC-2 made the cup transparent, rendering it as a glass coffee cup instead of ceramic. NBP-CC-4 tilted the cup to reveal a seascape interior. And NBP-CC-1 was the highest-scoring coffee cup image from NB Pro at 8.58. The model's consistency dropped to 7.00, its lowest of any container. But the variety of solutions was unmatched. Average: 8.33.

Same prompt. Same opaque cup. Firefly hovered above it. GPT erupted from inside it. Flux bridged lightning to it. NB Pro changed the cup entirely. Four models, zero shared solutions.
The Creativity Scores Tell the Story
The coffee cup was not the highest-scoring container. It averaged 8.45 cross-model, placing third behind the pocket watch (8.67) and snow globe (8.54). But the dimension breakdown reveals something the composite score hides.
| Dimension | Coffee Cup Avg | Snow Globe Avg | Pocket Watch Avg |
|---|---|---|---|
| Uniqueness | 8.56 | 8.13 | 8.88 |
| X Engagement | 8.81 | 8.50 | 9.00 |
| Prompt Alignment | 7.88 | 8.38 | 8.25 |
The coffee cup's uniqueness and engagement scores are higher than the snow globe's. Its prompt alignment is the lowest of any container. The models deviated further from the literal prompt, and the results were more visually distinctive because of it.
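If you want to sanity-check those dimension averages yourself, here is a minimal Python sketch. The values come straight from the table above; the simple unweighted mean is my assumption, and it will not reproduce the published composites exactly, since the full rubric also scores consistency (mentioned earlier for NB Pro) and the weighting is not published here.

```python
# Minimal sketch: mean of the three dimensions listed in the table above.
# Assumption: an unweighted mean. The actual composite in this series also
# includes consistency (and possibly other dimensions), so these numbers
# will land near, but not exactly on, the quoted composites (e.g. 8.45).
scores = {
    "coffee cup":   {"uniqueness": 8.56, "x_engagement": 8.81, "prompt_alignment": 7.88},
    "snow globe":   {"uniqueness": 8.13, "x_engagement": 8.50, "prompt_alignment": 8.38},
    "pocket watch": {"uniqueness": 8.88, "x_engagement": 9.00, "prompt_alignment": 8.25},
}

for container, dims in scores.items():
    mean = sum(dims.values()) / len(dims)
    print(f"{container}: mean of listed dimensions = {mean:.2f}")
```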
This is the same accuracy-vs-artistry tension that showed up with NB Pro in Part 2 of this series. Models that follow prompts precisely produce reliable results. Models that deviate produce more surprising ones. The coffee cup forced every model to deviate, and the deviation produced the session's widest variety of creative approaches from a single prompt.

The coffee cup forced every model off-script. Prompt alignment dropped. Uniqueness and engagement climbed. Sometimes the best results come from prompts the model cannot follow literally.
Why Opacity Matters
The transparent containers (lightbulb, mason jar, snow globe) all allow the same rendering strategy: build a world inside, show it through the glass. The model does not need to make creative decisions about how to reveal the interior. The glass does the work.
The pocket watch added mechanical complexity but was still an open-faced container. You could see the storm through the watch crystal. The rendering challenge was fitting weather into a shallow disc, not hiding it behind walls.
The coffee cup is the only container that blocked the default "see through the walls" approach entirely. And the results show what that constraint does to model behavior:
It eliminates the safe path. On transparent containers, models can produce competent but predictable output by rendering the world inside and showing it through glass. The coffee cup removes that option. Every model was forced into an interpretation that has no direct training data precedent.
It reveals problem-solving style. Firefly avoided the problem (hover above). GPT solved the problem through physics (eruption evidence at the edges). Flux reframed the problem (steam metaphor). NB Pro broke the rules (changed the cup, changed the angle). These are four fundamentally different approaches to creative constraint, and they align with each model's broader behavioral patterns.
It produces the widest variety from a single prompt. On transparent containers, the four models produced variations on the same theme. On the coffee cup, they produced four different themes. If your goal is to explore a concept space rather than execute a specific vision, opaque containers are the prompt engineering tool that forces exploration.

Transparent containers let models render through the walls. Opaque containers force them to think around the walls. The constraint is the creativity tool.
The Series Principle
This is the final article in the Miniature Worlds series, and the coffee cup ties the whole arc together.
Part 1 showed that the subject you choose (container + ecosystem) matters more than which model renders it. Part 2 showed that containers can amplify or suppress a model's natural behavior. Part 3 showed that ecosystems which create visible light inside glass universally outperform ones that do not. Part 4 showed that premium tiers sacrifice consistency without gaining quality.
The coffee cup adds the last piece: constraints drive creativity. The highest-scoring concepts in this session were not the ones that gave models the most freedom. The forest snow globe (9.07) worked because five specific factors converged. The pocket watch (8.67) worked because it forced models to reason without stock precedent. The coffee cup (8.45) scored lower on average but produced the most diverse creative output.
Prompt engineering is not about describing exactly what you want and hoping the model executes it. It is about designing constraints that push the model toward discoveries you did not plan. Choose your container. Choose your ecosystem. Choose whether you want the model to see through the walls or think around them. Those decisions are the prompt. The words are just the delivery mechanism.
The prompts from this entire session are spread across all five articles in this series. Try them. Change the containers. Change the ecosystems. And pay attention to what happens when you give the model a problem it was not expecting.
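If you want to run those swaps systematically rather than by hand, a few lines of Python will do it. This is a hypothetical sketch, not the tooling behind this series: the ecosystem list is illustrative, and the template is the coffee cup prompt from above with two slots cut into it.

```python
# Hypothetical prompt-variant generator (not the tooling used in this series).
# Swaps containers and ecosystems into one template so you can explore the
# concept space instead of executing a single fixed vision.
from itertools import product

CONTAINERS = [
    "white ceramic coffee cup", "lightbulb", "mason jar",
    "snow globe", "pocket watch",
]  # the five containers from this series
ECOSYSTEMS = ["miniature thunderstorm", "miniature forest", "miniature seascape"]  # illustrative

TEMPLATE = (
    "A {container} sitting on a dark wooden table, inside it a complete "
    "{ecosystem}, bright light illuminating the interior from within, "
    "macro photography, 85mm lens, shallow depth of field, dark moody "
    "background, atmospheric haze, professional product photography"
)

def prompt_variants():
    """Yield every container x ecosystem combination as a full prompt."""
    for container, ecosystem in product(CONTAINERS, ECOSYSTEMS):
        yield TEMPLATE.format(container=container, ecosystem=ecosystem)

for prompt in prompt_variants():
    print(prompt, end="\n\n")
```

Feed each variant to whichever model you are testing, and pay attention to which containers force it off-script.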

176 images. 12 models. 5 containers. 5 ecosystems. The finding that mattered most: the choices you make before you start typing are the ones that shape the output.
Glenn is an Adobe Firefly Ambassador and AI creator documenting the craft of prompt engineering at @GlennHasABeard. He publishes The Render newsletter and creates the Stor-AI Time series adapting world folktales through AI-generated video.
This is Part 5 of 5 in the Miniature Worlds series analyzing 176 scored AI images across 12 models, 5 containers, and 5 ecosystems. Parts 1-4: "I Scored 176 AI Images," "The Container That Turned a 6th-Place Model Into #1," "Every AI Model Agreed on the Best Miniature World," and "Why Premium AI Models Keep Scoring Worse."

