People see the prompt share. The fill-in-the-blank template. The image that goes with it. What they don't see is the session that came before it.
I don't write a prompt and post it. I write a prompt, drop it into multiple AI models in Adobe Firefly Boards, generate four images from each model, score them, change a variable, and do it again. Then again. Sometimes three or four rounds before the prompt earns a spot on your timeline.
This article is about the R&D process behind one prompt share. 176 images. 12 models. 5 containers. 5 ecosystems. Three rounds of systematic testing, all inside Firefly Boards. The prompt template at the end of this article is what survived.
The Tool That Makes This Possible
Firefly Boards is where the R&D happens. It puts every model in one workspace. Firefly Image 5, Nano Banana Pro, Flux, GPT Image, all generating side by side. No switching tabs. No re-uploading reference images. No losing track of which output came from which model.
That sounds like a convenience feature. It's not. It's a workflow change.
When outputs are next to each other, you stop evaluating images individually and start evaluating them comparatively. You see patterns. You see where models agree and where they diverge. You see which variables actually move the needle. That comparative view is the entire foundation of how I develop prompts now.

Firefly Boards with the Miniature Worlds session. Every model, every container, every ecosystem, one workspace. This is what prompt R&D looks like.
Round 1: One Prompt, Twelve Models
The concept for this session was simple. A complete miniature thunderstorm trapped inside a clear glass lightbulb, sitting on a dark wooden table. Clouds, lightning, rain, puddles at the base, light casting through the glass onto the wood.
Here is the prompt:
A clear glass lightbulb sitting on a dark wooden table, inside the bulb a complete miniature thunderstorm is raging with tiny dark clouds and bright lightning bolts illuminating the glass from within, rain falling inside the bulb with tiny puddles collecting at the bottom, the lightning casts dramatic light through the glass onto the table surface, macro photography, 85mm lens, shallow depth of field, dark moody background, atmospheric haze, professional product photography

I dropped this into twelve models in Boards. Four images each. Forty-eight images total. Scored on a weighted five-dimension rubric: Visual Quality (30%), Prompt Alignment (25%), Consistency (15%), Uniqueness (15%), and X Engagement Potential (15%).
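If you want to replicate the scoring, here is a minimal sketch of how a weighted rubric like this collapses into one number. The dimension names and weights come from the article; the function and the example scores are hypothetical illustrations, not my actual scoring tool.

```python
# Weighted five-dimension rubric from the article; weights sum to 1.0.
WEIGHTS = {
    "visual_quality": 0.30,
    "prompt_alignment": 0.25,
    "consistency": 0.15,
    "uniqueness": 0.15,
    "x_engagement": 0.15,
}

def weighted_score(scores):
    """Collapse per-dimension scores (0-10 scale) into one weighted average."""
    return round(sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS), 2)

# Hypothetical example image, not a real score from the session.
example = {
    "visual_quality": 9.0,
    "prompt_alignment": 8.5,
    "consistency": 8.0,
    "uniqueness": 9.0,
    "x_engagement": 8.5,
}
print(weighted_score(example))  # 8.65
```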
The rankings:
| Rank | Model | Avg Score | Best Single |
|---|---|---|---|
| 1 | GPT Image 1.5 | 8.75 | 9.15 |
| 2 | Flux 1.1 Pro | 8.50 | 8.88 |
| 3 | Flux 2 Pro | 8.31 | 8.70 |
| 4 | Firefly Image 5 | 8.31 | 8.58 |
| 5 | Nano Banana 2 | 8.30 | 8.55 |
| 6 | Nano Banana Pro | 8.26 | 8.90 |
GPT won. That part was expected. What I didn't expect was the interpretation range.
The same prompt produced six entirely different narratives across the twelve models. Firefly kept the storm sealed perfectly inside the glass. Flux 2 Pro leaked water through the base onto the table. The Nano Banana family let lightning escape in Lichtenberg patterns across the wood surface. Flux 1.1 Pro used warm amber lightning instead of blue-white and turned the filament into the storm's power source. Flux 1.1 Pro Ultra Raw abandoned containment entirely and built a landscape with trees and a pond inside the bulb.
Same words. Six stories. You don't see that unless you're comparing in one workspace.

Same prompt, same concept. One model sealed the storm. Another let lightning escape. A third turned the filament into the storm's generator. Boards showed me all of this in one view.
This is the first thing Boards reveals: your prompt doesn't mean what you think it means. It means something different to every model. And until you see those interpretations next to each other, you're optimizing blind.
Round 2: Same Storm, Different Containers
Round 1 gave me a ranking. Round 2 tested whether that ranking held.
I narrowed the field to four models (GPT Image 1.5, Firefly Image 5, Flux 1.1 Pro, Nano Banana Pro) and kept the thunderstorm but changed the container. Mason jar. Snow globe. Coffee cup. Pocket watch. Four images per model per container. Sixty-four more images, all generated and compared in Boards.
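For readers who think in code, here is a minimal sketch of what that sweep looks like as a grid: hold the storm constant, vary the container, and track the cell count. The base template below is illustrative shorthand; the real Round 2 prompts were written per container (the pocket watch version, for example, references the watch mechanism) rather than generated by substitution.

```python
# Round 2 grid: 4 models x 4 containers x 4 images per cell = 64 images.
MODELS = ["GPT Image 1.5", "Firefly Image 5", "Flux 1.1 Pro", "Nano Banana Pro"]
CONTAINERS = ["mason jar", "snow globe", "coffee cup", "pocket watch"]
IMAGES_PER_CELL = 4

# Illustrative base description; only the container varies.
BASE = (
    "A glass {container} sitting on a dark wooden table, inside the "
    "{container} a complete miniature thunderstorm is raging with tiny dark "
    "clouds and bright lightning bolts, macro photography, 85mm lens, "
    "shallow depth of field, dark moody background"
)

runs = [
    (model, container, BASE.format(container=container))
    for model in MODELS
    for container in CONTAINERS
]
print(len(runs) * IMAGES_PER_CELL)  # 64, matching Round 2's image count
```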
The cross-model averages by container:
| Container | Cross-Model Avg |
|---|---|
| Pocket Watch | 8.67 |
| Snow Globe | 8.54 |
| Coffee Cup | 8.45 |
| Mason Jar | 8.35 |
| Lightbulb (R1) | 8.54 |
The pocket watch scored highest. And it was the container with zero stock photography precedent for "weather inside a timepiece."
But here's the finding that mattered: the model rankings shifted depending on which container I gave them. Nano Banana Pro placed 6th out of 12 on the lightbulb with an 8.26 average. On the pocket watch, it produced the session's highest-scoring single image at 9.23.
Same model. Same storm. Different container. A 0.97-point swing.

Nano Banana Pro: 6th place on the lightbulb, session champion on the pocket watch. The container changed everything. Boards let me test that systematically instead of guessing.
The pocket watch worked for NB Pro because it aligned with the model's natural behavior. NB Pro's signature is escaped containment (lightning crawling across surfaces), environmental context (warm rooms with books and lamps), and warm tonality. The lightbulb suppressed all three of those instincts. The pocket watch amplified all three.
I would not have discovered that by testing one container and moving on. The R&D process in Boards, running the same prompt across multiple containers and comparing, is what surfaced the match.
Round 3: Same Globe, Different Worlds
Round 2 established that the snow globe offered the highest cross-model consistency. So I locked the container and changed what lived inside it. Coral reef. Ancient forest. Galaxy. Desert sandstorm. Plus the original thunderstorm for comparison.
Sixty-four more images across four models. The results were unanimous.
| Ecosystem | Cross-Model Avg |
|---|---|
| Forest | 9.07 |
| Coral Reef | 8.79 |
| Galaxy | 8.75 |
| Desert Sandstorm | 8.73 |
| Thunderstorm | 8.54 |
The forest won across all four models; not one ranked it anywhere but first. The cross-model average of 9.07 made it the session's peak concept. GPT averaged 9.41 on the forest globe, with three individual images tying at 9.43 for the session's highest scores. Firefly hit 9.10, its best average across all 176 images. Even Flux 1.1 Pro reached 8.90.
The thunderstorm I started with finished last. Every colored, living ecosystem outperformed monochrome weather.

The forest snow globe. Four models, sixteen images, not one scored below 8.78. Boards showed me this was the session champion before I even finished scoring.
Why forest? Five factors converged. God rays create visible volumetric light inside glass. The green-gold palette produces natural warm-cool contrast against dark backgrounds. Trees pressing against curved glass create compression tension at any viewing size. Organic detail (moss, ferns, mushrooms) rewards high-resolution rendering. And the concept maps to terrarium photography, which exists in training data but isn't oversaturated the way "lightbulb with weather" is.
Here is the prompt that produced it:
A glass snow globe sitting on a dark wooden table, inside the globe a complete miniature ancient forest with towering moss-covered trees and thick green canopy, golden sunlight filtering through the leaves creating volumetric god rays inside the glass, tiny ferns and mushrooms covering the forest floor, the warm light casts a green-gold glow through the glass onto the table surface, macro photography, 85mm lens, shallow depth of field, dark moody background, atmospheric haze, professional product photography

The R&D Finding That Changed My Process
Here is the session mapped as a single progression:
Thunderstorm in a lightbulb: 8.54 cross-model average. Thunderstorm in a pocket watch: 8.67. Forest in a snow globe: 9.07.
The starting point was the weakest version of the concept. Every step that improved quality was a change to what was inside the prompt, not which model rendered it.
The gap between the best model (GPT at 8.75) and the worst surviving model (Firefly at 8.31) on the lightbulb was 0.44 points. The gap between the worst ecosystem (thunderstorm at 8.54) and the best ecosystem (forest at 9.07) was 0.53 points. Changing the subject outperformed changing the model.
This does not mean model choice is irrelevant. GPT won every container and every ecosystem. But the order of operations matters. If I had spent the entire session testing 12 models on a thunderstorm in a lightbulb, the best possible result would have been GPT's 8.75. By testing 4 containers and 5 ecosystems instead, the worst model's best ecosystem (Firefly's forest at 9.10) outscored the best model's starting concept (GPT's lightbulb at 8.75).
That's the R&D principle: the decisions you make before you start typing are the ones that shape the output. Container choice. Ecosystem design. Light behavior inside glass. Those decisions are the prompt. The words are the delivery mechanism.
And the first version of a prompt is almost never the best one. Testing is how you find the best one.
Why Boards Is the R&D Tool
I want to be specific about what Boards does for this process, because it's not just "another place to generate images."
Comparative evaluation. When I generate the same prompt across four models and see the outputs in one workspace, I'm not judging images. I'm judging how models interpret language. Firefly's sealed containment tells me the model reads "inside" literally. NB Pro's escaped lightning tells me it prioritizes visual drama over instruction accuracy. That understanding informs every prompt I write afterward.
Variable isolation. Boards lets me hold the model constant and change the container, or hold the container constant and change the ecosystem. That's how I discovered that container choice swings scores by nearly a full point. Without the ability to compare systematically in one workspace, I'd be changing multiple variables at once and never knowing which one mattered.
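As a sketch of what that isolation buys you analytically: once every image carries a (model, container, score) record, the cross-model averages from the Round 2 table fall out of a simple group-by. The records below are hypothetical stand-ins, not the session data.

```python
from collections import defaultdict

# Hypothetical (model, container, score) records; the real Round 2 grid
# had 4 models x 4 containers x 4 images.
records = [
    ("Nano Banana Pro", "lightbulb", 8.26),
    ("Nano Banana Pro", "pocket watch", 9.23),
    ("GPT Image 1.5", "lightbulb", 8.75),
    ("GPT Image 1.5", "pocket watch", 8.90),
]

# Hold everything else constant, group by the one variable that changed.
by_container = defaultdict(list)
for model, container, score in records:
    by_container[container].append(score)

for container, scores in by_container.items():
    print(container, round(sum(scores) / len(scores), 2))
```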
Session persistence. Every generation stays in the workspace. By the end of the Miniature Worlds session I had 176 images across three rounds, all visible. I could scroll back from the forest globe (Round 3) to the original lightbulb (Round 1) and see the entire evolution of the concept in one place. That visual history is the R&D record.

176 images across three rounds, all in one Boards workspace. The full R&D record for a single prompt share.
This is what survived the gauntlet.
Here is the template version, built from the session's findings:
Prompt share:
A glass [CONTAINER] sitting on a dark wooden table, inside the [CONTAINER] a complete miniature [ECOSYSTEM] with [ECOSYSTEM DETAILS], [LIGHT SOURCE] casting [LIGHT COLOR] light through the glass onto the table surface, macro photography, 85mm lens, shallow depth of field, dark moody background, atmospheric haze, professional product photography

Containers that scored highest: snow globe, pocket watch.
Ecosystems that scored highest: ancient forest with god rays, coral reef with bioluminescence.
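If you would rather script the template than fill it by hand, here is a minimal sketch. The slot values are the session champion's; the string-substitution helper is a hypothetical convenience, not part of Boards.

```python
# The article's template with bracketed slots turned into format fields.
TEMPLATE = (
    "A glass {container} sitting on a dark wooden table, inside the "
    "{container} a complete miniature {ecosystem} with {ecosystem_details}, "
    "{light_source} casting {light_color} light through the glass onto the "
    "table surface, macro photography, 85mm lens, shallow depth of field, "
    "dark moody background, atmospheric haze, professional product photography"
)

# Session champion slot values (the forest snow globe).
prompt = TEMPLATE.format(
    container="snow globe",
    ecosystem="ancient forest",
    ecosystem_details="towering moss-covered trees and thick green canopy",
    light_source="golden sunlight filtering through the leaves",
    light_color="green-gold",
)
print(prompt)
```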
And here are the four top-scoring specific prompts from the session if you want to skip the template and try them directly:
Forest Snow Globe (session champion, 9.07 avg):
A glass snow globe sitting on a dark wooden table, inside the globe a complete miniature ancient forest with towering moss-covered trees and thick green canopy, golden sunlight filtering through the leaves creating volumetric god rays inside the glass, tiny ferns and mushrooms covering the forest floor, the warm light casts a green-gold glow through the glass onto the table surface, macro photography, 85mm lens, shallow depth of field, dark moody background, atmospheric haze, professional product photography

Coral Reef Snow Globe (8.79 avg):
A glass snow globe sitting on a dark wooden table, inside the globe a complete miniature coral reef ecosystem with colorful coral formations and tiny tropical fish swimming through crystal clear turquoise water, bioluminescent jellyfish providing soft glowing light from within, light refracting through the water and glass onto the table surface, macro photography, 85mm lens, shallow depth of field, dark moody background, atmospheric haze, professional product photography

Thunderstorm Pocket Watch (best unique concept, 8.67 avg):
An open antique pocket watch sitting on a dark wooden table, inside the watch face a complete miniature thunderstorm is raging with tiny dark clouds and bright lightning bolts illuminating the watch glass from within, rain falling inside the watch with tiny puddles collecting on the watch mechanism, the lightning casts dramatic light through the watch crystal onto the table surface, macro photography, 85mm lens, shallow depth of field, dark moody background, atmospheric haze, professional product photography

Desert Sandstorm Snow Globe (best warm palette, 8.73 avg):
A glass snow globe sitting on a dark wooden table, inside the globe a complete miniature desert landscape with sand dunes and a raging sandstorm with swirling amber dust and tiny lightning bolts within the dust clouds, the warm amber light from the storm casts golden light through the glass onto the table surface, macro photography, 85mm lens, shallow depth of field, dark moody background, atmospheric haze, professional product photography

Try them. Run them across models in Firefly Boards. Pay attention to how each model interprets each container and ecosystem. The side-by-side comparison will teach you more about your prompt than any single generation ever could.
Try Adobe Firefly → firefly.adobe.com
Made in Adobe Firefly.
Glenn is an Adobe Firefly Ambassador and AI creator documenting the craft of prompt engineering at @GlennHasABeard. He publishes The Render newsletter and creates the Stor-AI Time series adapting world folktales through AI-generated video.
This article is part of the Miniature Worlds series analyzing 176 scored AI images across 12 models, 5 containers, and 5 ecosystems.

