Here's a question I couldn't answer until I ran the experiment: if you ask Firefly to make a tiny elephant, does it work the same way as asking it to make a giant butterfly?
Turns out? Not even close. And the reason why taught me more about how Firefly thinks than almost any test I've run.
## The Experiment
I generated 24 images across 6 prompt variations, all testing scale disruption - impossible size relationships rendered with photographic realism. Half the prompts went "small thing made big" (giant insects, enormous flowers). The other half went "big thing made small" (miniature elephants on coffee saucers).
Same model. Same session. Same scoring rubric. One direction scored a full point higher than the other.

Same session, same scoring rubric. The butterfly is unmistakably alive. The elephant is a gift shop figurine. That gap tells the whole story.
## The Direction That Works: Small to Big
I started with a praying mantis the size of a building:

```
Giant praying mantis perched on top of a city skyscraper,
dramatic scale contrast, overcast sky, 200mm telephoto lens,
compressed perspective, professional photography, hyper-realistic,
sharp detail, cinematic quality
```

The results were immediately impressive. The mantis looked alive - compound eyes catching light, serrated forelegs with organic joint detail, translucent wing edges. And Firefly did something I didn't ask for: it added atmospheric haze between the mantis and the distant buildings, creating the kind of depth you see in real telephoto cityscape photos. That haze turned out to be the single biggest factor in making impossible scale look believable.

A building-sized predator surveys its territory. Firefly rendered the compound eyes, the serrated forelegs, the translucent wing edges - all unprompted biological detail at impossible scale.
When I pushed further and placed the mantis on a glass building, Firefly rendered its reflection in the glass panels. I never asked for that. It just understood that glass reflects things, even impossible things.
Average score across the "small to big" direction: 8.36 out of 10.
## The Direction That Breaks: Big to Small
Then I asked for a tiny elephant on a coffee cup saucer. Same model, same level of prompt detail:

```
Tiny elephant standing on a coffee cup saucer in a café,
tilt-shift effect creating miniature appearance, warm ambient
lighting, shallow depth of field, 85mm lens f/2.8, bokeh
background, professional photography, hyper-realistic, sharp detail
```

The photography was fine. Warm café light, creamy bokeh, nice composition. But the elephant? It looked like something you'd buy at a gift shop. Matte paint finish. Slightly plasticky surface. No skin pores, no wrinkles, no moisture - none of the biological texture that made the giant mantis look alive.
Firefly turned my living elephant into a figurine.

Warm bokeh, beautiful café atmosphere, polished composition. But zoom in on the elephant's skin. Matte. Plasticky. No wrinkles, no pores, no moisture. This is product photography of a figurine.
I thought maybe it was the tilt-shift vocabulary causing the problem - tilt-shift photography literally makes real things look like miniatures, so maybe I was accidentally reinforcing a toy aesthetic. So I rewrote the prompt using macro photography language instead:
```
Miniature elephant standing on a coffee cup saucer, extreme
macro shot, shallow depth of field with background bokeh, soft
diffused studio lighting from above, 100mm macro lens, water
droplets visible on saucer for scale, professional photography,
hyper-realistic, ultra-detailed
```

The photography improved. Tighter framing, water droplets appeared on the saucer, the cup loomed overhead creating a stronger sense of tininess. But the elephant was still a figurine. The macro language made a better photo without fixing the core problem.
Average score across the "big to small" direction: 7.36 out of 10.
## Why This Happens (My Theory)
Adobe Stock - Firefly's training data - is full of product photography of figurines, toys, and miniatures sitting on tabletops. Thousands of images of tiny animal statues in warm light with shallow depth of field. When Firefly sees "tiny elephant on saucer in café," it's not thinking "real elephant that is small." It's thinking "figurine product shot."
But Stock has basically zero images of building-sized insects. So when Firefly sees "giant mantis on skyscraper," it can't fall back on training shortcuts. It has to construct the scene from scratch - and it constructs a living creature, because that's what a praying mantis is.
The training data creates an asymmetry. One direction has a shortcut that produces the wrong answer. The other direction doesn't have a shortcut at all, so Firefly does the hard work and gets it right.
## The Proof: Same Elements, Opposite Directions
To nail this down, I ran a control test. Same two elements - a monarch butterfly and a park bench - with the scale flipped.
Direction 1: Giant butterfly on park bench scored 8.55 average. The butterfly is unmistakably alive. Translucent wings, organic legs, natural color. It's sitting on the bench like it owns the park. In one image, it even cast a person-sized shadow on the ground.
Direction 2: Tiny bench on butterfly wing scored 7.97 average. The bench looks like a dollhouse piece. But here's where it gets interesting: that 7.97 is dramatically higher than the figurine elephants (6.83-7.27). Why? Because a bench is already inanimate. There's no biological realism to lose. A tiny model bench on a butterfly wing is charming. A tiny figurine elephant is disappointing.

Same butterfly. Same bench. Opposite scale directions. The giant butterfly owns the park. The tiny bench became a dollhouse prop. But the bench scored higher than the figurine elephants - because there's no biological realism to lose when something is already inanimate.
The figurine problem specifically punishes subjects that are supposed to be alive.
## What Actually Sells Scale
Across 24 images, I tracked which techniques boosted scores. Here's what I found, ranked by impact:
Physical consequence is the biggest differentiator. My highest-scoring image of the entire session (9.25) was a giant sunflower towering over a suburban house. Not because the flower was impressive - though it was - but because its root system was cracking the sidewalk. The impossible object had weight. It was affecting its environment. The mantis perched on buildings but didn't change them. The sunflower was rewriting the landscape.

9.25 out of 10. Root systems cracking pavement. Bark-like stem texture Firefly intuited on its own. A family looking up in the kind of wonder that makes you feel the scale in your chest, not just see it on a screen.
Human figures reacting are the next best thing. People in the scene looking up, shielding their eyes, standing in wonder - that sells scale emotionally in a way that even perfect atmospheric perspective can't. You don't just see how big the flower is. You feel what it would be like to stand next to it.
Reflective surfaces provide secondary confirmation. Puddles reflecting the butterfly's silhouette. Glass panels reflecting the mantis. The reflection tells you the impossible thing is really there - it exists in the physics of the scene, not just pasted on top of it.
Atmospheric haze is the prerequisite. Every top-scoring image had it. It's not a bonus - it's the baseline. Without haze creating depth between the giant subject and distant objects, the scale illusion collapses into "Photoshop composite."
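Stack all four and you get a prompt recipe. Here's a hypothetical combined prompt - my own construction, not one of the session's tested variations - that layers physical consequence, human reaction, a reflection, and haze onto the giant-butterfly setup, using the wide-angle combo covered in the next section:

```
Giant monarch butterfly perched on a park bench, bench slats
bowing under its weight, onlookers shielding their eyes and
staring up, its silhouette reflected in a puddle below,
atmospheric haze between the butterfly and distant trees,
24mm wide-angle lens, low-angle perspective, professional
photography, hyper-realistic, sharp detail
```

If the rankings above hold, the bowing slats and the onlookers should do more for believability than any extra realism keyword.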
## The Photography Vocabulary That Matters
Not all technical terms are equal for scale work. Here's what I found:
For giant subjects: wide-angle lens (24mm) + low-angle perspective + atmospheric haze. This combination produced the session's best average (8.76). Telephoto (200mm) + compressed perspective also works well (8.06), especially for creatures in urban settings.
For miniature subjects: macro language (100mm macro lens, water droplets, extreme macro shot) outperforms tilt-shift by 0.44 points. It won't fix the figurine problem, but at least it produces better macro photography around the figurine.
Avoid tilt-shift for impossible miniatures. It actively makes things worse. Tilt-shift's entire purpose is making real things look like toys - so it pushes Firefly even harder toward the figurine interpretation. My lowest-scoring variation (6.83) used tilt-shift language.
## What I'd Do Differently
If I were running this test again, I'd try fighting the figurine problem with explicit biological language - "living, breathing miniature elephant with wet skin, visible pores, and real muscle movement." That's a targeted attempt to override the training data shortcut. I'd also test adding physical consequence to miniature scenes: a tiny elephant leaving actual wet footprints on the saucer, or the saucer cracking under its weight. If physical consequence sells giant scale, maybe it sells miniature scale too.
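For concreteness, here's roughly what that rewritten prompt could look like - untested, so treat it as a starting point rather than a verified fix:

```
Living, breathing miniature elephant with wet skin, visible
pores, deep wrinkles, and real muscle movement, standing on a
coffee cup saucer and leaving wet footprints, extreme macro
shot, 100mm macro lens, shallow depth of field with background
bokeh, professional photography, hyper-realistic, ultra-detailed
```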
But honestly? The simplest takeaway is this: if you want impossible scale in Firefly, make things bigger, not smaller. The scoring gap is a full point, it's consistent across subjects, and it's visible from across the room.
## The Numbers
| Direction | Average Score | Best Score | Figurine Problem? |
|---|---|---|---|
| Small to Big | 8.36 | 9.25 | No |
| Big to Small | 7.36 | 8.35 | Yes (living subjects) |
| Gap | 1.00 | 0.90 | - |
24 images. 6 prompt variations. One clear answer.
Go big.
Testing methodology: All images generated in Adobe Firefly Image 5, single session. Each variation generated 4 images. Scored on a 5-dimension rubric: Visual Quality (30%), Prompt Alignment (25%), Consistency (15%), Uniqueness (15%), X Engagement Potential (15%). No cherry-picking - all images scored, averages reported.
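For anyone who wants to replicate the scoring, here's a minimal sketch of how the weighted rubric rolls up into a composite. The dimension names and weights come from the methodology above; the per-dimension scores in the example are made-up placeholders, not real session data:

```python
# Minimal sketch of the 5-dimension weighted rubric described above.
# Weights come from the methodology note; the example scores are
# illustrative placeholders, not actual session data.

WEIGHTS = {
    "visual_quality": 0.30,
    "prompt_alignment": 0.25,
    "consistency": 0.15,
    "uniqueness": 0.15,
    "x_engagement": 0.15,
}

def composite(scores: dict[str, float]) -> float:
    """Weighted composite on a 0-10 scale."""
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)

def variation_average(images: list[dict[str, float]]) -> float:
    """A variation's score is the mean composite across its 4 images."""
    return sum(composite(img) for img in images) / len(images)

# Hypothetical single image: strong visuals, slightly weaker alignment
example = {
    "visual_quality": 9.0,
    "prompt_alignment": 8.5,
    "consistency": 8.0,
    "uniqueness": 9.0,
    "x_engagement": 8.5,
}
print(round(composite(example), 2))  # -> 8.65
```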

