I Asked AI for a Puddle Reflecting Summer. It Gave Me Autumn.

I had a beautiful idea. A woman in a winter coat standing on a grey, overcast sidewalk. At her feet, a rain puddle. But the puddle reflects a different version of the exact same street. Summer. Green trees. Golden sunlight. Warm blue sky. Two seasons existing in one frame, connected by a thin layer of water.

Poetic, right? I thought so too.

Firefly thought it was autumn.

The Failure

I'd already proven the concept worked. One variation earlier, a businessman stood over a puddle reflecting an ancient Greek temple. Scored 8.09 average, peaked at 8.46. Firefly nailed it. Wrong reflection, right execution, dramatic contrast between modern city and ancient architecture.

So I figured a more subtle version would land just as well. Maybe even better, since the emotional resonance of "same place, different time" felt stronger than "completely different world." There's something about standing where summer used to be and seeing it below your feet.

Here's the prompt:

Woman in winter coat standing on wet city sidewalk at dusk, 
low angle shot from near ground level, rain puddle at her feet 
reflecting a warm summer version of the same street with green 
trees and golden sunlight, reflection fills lower half of frame, 
reflection is sharp and detailed, moody overcast lighting above 
contrasting warm reflection below, 35mm street photography, 
deep focus showing both figure and reflection sharply, 
hyper-realistic, cinematic quality

Here's what I got instead:

Set average: 6.54. Green trees in reality AND the reflection. Winter plus summer equals autumn everywhere.

Look at the trees. They're green. In the real world AND in the reflection. The whole point was winter above, summer below. Instead, Firefly split the difference and gave me late autumn everywhere. Overcast sky, mild greenery, nothing particularly summer-like and nothing particularly winter-like.

The second image did the same thing. Third one too. By the fourth, I stopped hoping and started taking notes.

Set average: 6.54. That's a full 1.55 points below the dramatic Greek temple version. And more importantly, zero out of four images actually achieved the two-season contrast I asked for.

What Went Wrong

Firefly doesn't hold two versions of the same scene in its head simultaneously. When you say "this street, but summer" and "this street, but winter" in the same prompt, you're asking it to generate one image that contains two contradictory descriptions of the same subject. A street can't be both green and grey. Trees can't be both full and bare.

So it compromises. It finds the average of your two descriptions and renders that. Winter plus summer equals something vaguely temperate. The contrast disappears because the AI resolves the contradiction rather than maintaining it.

Compare this to the temple prompt. A Greek temple and a modern city sidewalk share nothing. They can't blend. Marble columns don't average out with street lamps. So Firefly renders them as two completely separate realities occupying different parts of the frame. Which is exactly the point.

Left: Greek temple reflection, 8.46. Right: summer-street reflection, 7.18. Same surface type. The only difference is how impossible the reflected scene is.

The 1.55 Point Gap

I scored every image from both variations. Same rubric. Same weights. Same evaluator. Here's what the numbers said:

Dramatic contrast (Greek temple in puddle) averaged 8.09. Subtle contrast (same street, different season) averaged 6.54. That's a 1.55 point gap from changing nothing except how different the reflected scene was from reality.

The best subtle image scored 7.18. The worst dramatic image scored 7.65. There's no overlap. Even the best "subtle" result couldn't beat the worst "dramatic" one. The gap is absolute.
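The gap and the "no overlap" claim are simple arithmetic, so here is a minimal Python sketch for checking the same two things on your own score sets. It assumes you already have one final ten-point score per image. The rubric and weights behind those scores aren't described here, so the sketch doesn't model them, and the function names are hypothetical helpers, not part of Firefly or any other tool.

```python
from statistics import mean


def summarize(scores: list[float]) -> tuple[float, float, float]:
    """Return (average, best, worst) for one set of per-image scores."""
    return round(mean(scores), 2), max(scores), min(scores)


def average_gap(high_avg: float, low_avg: float) -> float:
    """Difference between two set averages, rounded to two decimals."""
    return round(high_avg - low_avg, 2)


def sets_overlap(high_worst: float, low_best: float) -> bool:
    """True if the best image of the weaker set reaches the worst image of the stronger set."""
    return low_best >= high_worst


if __name__ == "__main__":
    # Summary figures quoted in this post; the individual per-image scores aren't listed.
    print(average_gap(8.09, 6.54))   # 1.55
    print(sets_overlap(7.65, 7.18))  # False
```

Overlap is checked with the best and worst scores rather than the averages because two sets can differ on average and still share individual results. "No overlap" is the stronger claim.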

The Rule

After this test, I applied dramatic contrast to every remaining variation in the session. Tropical jungles in bathroom mirrors. Cosmic nebulae behind elderly women. Underwater oceans in shop windows. Stormy seas in hand mirrors. Nothing that could be confused with what was actually in the room.

And the results held. Every dramatically contrasted variation averaged between 7.67 and 8.98. Nothing subtle came close.

So here's the principle, stated plainly: the reflected scene must be categorically impossible, not merely different.

"Different season" is different. "Underwater ocean" is impossible. The gap between those two words, different and impossible, is worth more than a point and a half on a ten-point scale.

Score: 9.23. A stormy ocean inside a hand mirror on velvet. Nothing subtle about it. That's the point.

Why This Matters Beyond Reflections

This isn't just a reflection thing. I've seen the same pattern across months of testing.

In my material transformation work, the most successful face portraits were made of fire and ice, materials that couldn't possibly form a human face. The least successful were wood and water, materials that sort of could. "Too plausible" was the anti-pattern. The impossible materials scored 9.43 average. The merely unusual ones dropped below 7.0.

In my impossible architecture series, buildings made of soap bubbles and candle wax outperformed buildings made of colored glass. Why? Because glass buildings actually exist. Soap bubble buildings can't. The AI commits harder to the concept when there's no realistic version to fall back on.

And here, in the reflection work, the same thing. A puddle can reflect a different season. That's plausible. So Firefly treats it as plausible and renders something plausible. A puddle can't reflect a Greek temple. That's impossible. So Firefly treats it as impossible and renders something impossible.

The lesson is consistent across hundreds of images and months of testing: AI art gets more convincing as it gets more impossible. The further you push from reality, the harder the AI commits to the concept, and the better the results get.

It feels counterintuitive. You'd think subtlety would be easier for AI. A small change should be simpler than a massive one. But that's human logic, not AI logic. AI doesn't understand degrees of impossibility. It understands pattern matching. And "winter street reflected as summer street" matches "slightly different photo of the same place," a pattern that produces boring results. Meanwhile, "modern street with Greek temple in puddle" matches "surreal composite photography," a pattern that produces striking results.

For Your Own Work

If you're going for surreal or impossible imagery with any AI generator, lean into the impossible. Don't soften it. Don't make it subtle. Don't think "this would be more elegant with a gentler contrast."

The elegance comes from execution, not restraint. A woman calmly gazing into a mirror that shows a galaxy is elegant because of the lighting, the expression, the fog on the glass, not because the galaxy is a restrained creative choice. If anything, the galaxy's impossibility is what gives Firefly permission to go all in on the execution.

Ask for the thing that can't exist. The AI will figure out how to make it look like it was photographed anyway. That's what it's good at.

But ask for the thing that almost could exist? The AI will shrug and give you something that almost does.

Part 3 of the Impossible Reflections series. Part 1 covered the surface hierarchy: why puddles, mirrors, and glass produce different types of impossible reflections. Part 2 detailed the three-word mirror fix that improved scores by 12%. Next: a deep dive into the highest-scoring image in the series.