How we extract brand colors from any URL with Gemini Flash and Nano Banana Pro

Type a URL into BannerKit AI and seconds later you have a banner with the right headline, the right colors, and a background image that actually fits the brand. No design files. No brand guide upload. No color picker.
Here's how the pipeline works under the hood, and why we ended up using Google's Gemini models instead of the more obvious choices.
The problem
"Make a banner that matches my brand" is a deceptively hard ask.
You need to know:
- What the brand actually looks like. Logo, colors, typography vibe.
- What the offer is. The headline, the call to action, the value proposition.
- What style fits. A fintech and a children's toy company need very different banners even with the same color palette.
A human designer figures this out by visiting your site, scrolling around, looking at your About page, and forming a gestalt. Most automated tools skip this entirely — they ask you to upload a logo and pick colors from a swatch.
We wanted to skip the swatch. Paste a URL, get a banner. So we had to teach a machine to do what a human designer does at first glance.
Why scraping the page isn't enough
The naive approach: load the URL, parse the CSS for color values, grab the first H1 as the headline.
This breaks almost immediately on modern sites. Color values are scattered across hundreds of utility classes. The H1 is often "Welcome", or empty because JavaScript renders it after load. And brand colors often don't live in CSS at all: they live in logos and hero images that the page references but doesn't declare.
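The failure is easy to reproduce. Below is a deliberately naive extractor, a hypothetical `naiveExtract` (not anything BannerKit ships), that treats the most frequent hex value in the stylesheet as the brand color and the first `<h1>` as the headline. Given a typical utility-class stylesheet and a client-rendered hero, it returns a neutral gray and an empty headline.

```typescript
// Hypothetical naive extractor, shown only to illustrate the failure modes above.
interface NaiveResult {
  headline: string;
  primaryColor: string | null;
}

function naiveExtract(html: string, css: string): NaiveResult {
  // First <h1>, inner tags stripped. Client-rendered pages often ship it empty.
  const h1 = /<h1[^>]*>([\s\S]*?)<\/h1>/i.exec(html);
  const headline = h1 ? h1[1].replace(/<[^>]+>/g, "").trim() : "";

  // Count every 6-digit hex color in the stylesheet.
  const counts: Record<string, number> = {};
  const hexRe = /#[0-9a-fA-F]{6}\b/g;
  let m: RegExpExecArray | null;
  while ((m = hexRe.exec(css)) !== null) {
    const hex = m[0].toLowerCase();
    counts[hex] = (counts[hex] || 0) + 1;
  }

  // "Brand color" = most frequent value, which is usually a neutral.
  let primaryColor: string | null = null;
  let best = 0;
  for (const hex of Object.keys(counts)) {
    if (counts[hex] > best) {
      best = counts[hex];
      primaryColor = hex;
    }
  }
  return { headline, primaryColor };
}

const result = naiveExtract(
  '<h1 id="hero"></h1><img src="/logo.svg" alt="Acme">',
  ".text-gray-700{color:#374151}.text-gray-700:hover{color:#374151}" +
    ".bg-gray-50{background:#f9fafb}.btn-brand{background:#ff5a1f}",
);
// headline is "", primaryColor is "#374151" (gray wins over the orange brand color)
console.log(result);
```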
So we don't try to parse the page. We render it with Puppeteer, then hand the rendered output to a model that can see.
Stage 1: rendered content + Gemini 2.0 Flash
Puppeteer loads the URL in a real headless Chromium, executes JavaScript, waits for the page to settle, and returns the fully rendered HTML plus a screenshot. Google's own crawler also renders pages in Chromium before indexing them, and for good reason: it's the most reliable way to see what a real visitor sees.
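A minimal sketch of the render step, assuming Puppeteer's `page.setViewport`, `page.goto` with `waitUntil: "networkidle2"`, `page.content()`, and `page.screenshot({ encoding: "base64" })`. The `PageLike` interface, the 1280×800 viewport, and the 30-second timeout are our assumptions, not BannerKit's settings; typing against a small interface instead of importing Puppeteer keeps the function testable with a fake page.

```typescript
// Hypothetical subset of Puppeteer's Page API that the render step needs.
interface PageLike {
  setViewport(viewport: { width: number; height: number }): Promise<void>;
  goto(url: string, options: { waitUntil: "networkidle2"; timeout: number }): Promise<unknown>;
  content(): Promise<string>;
  screenshot(options: { encoding: "base64"; fullPage: boolean }): Promise<string>;
}

interface RenderedPage {
  html: string;             // DOM serialized after JavaScript has run
  screenshotBase64: string; // PNG of the first viewport
}

async function renderPage(page: PageLike, url: string): Promise<RenderedPage> {
  // Desktop-sized viewport so the hero section lands in the screenshot (assumed size).
  await page.setViewport({ width: 1280, height: 800 });
  // "networkidle2": navigation counts as finished once there have been no more
  // than 2 network connections for at least 500 ms, a common proxy for "settled".
  await page.goto(url, { waitUntil: "networkidle2", timeout: 30000 });
  const html = await page.content();
  const screenshotBase64 = await page.screenshot({ encoding: "base64", fullPage: false });
  return { html, screenshotBase64 };
}
```

With Puppeteer installed, a real page would come from `const browser = await puppeteer.launch(); const page = await browser.newPage();`, followed by `await browser.close()` when the render is done. `networkidle2` is a heuristic: pages that poll or stream indefinitely may never go idle, which is why the timeout matters.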
The rendered HTML and a small set of metadata go to Gemini 2.0 Flash with a structured prompt. Flash is fast (~1 second response time on the prompts we send) and cheap, which matters because we run this on every URL submission.
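The Flash call itself can be an ordinary HTTP request. The sketch below only builds the request body, assuming the public REST `generateContent` endpoint and its JSON mode (`generationConfig.responseMimeType: "application/json"`); whether BannerKit uses REST or an SDK isn't stated. The prompt wording, the `metadata` shape, and attaching the screenshot as an inline PNG are all hypothetical.

```typescript
// Hypothetical body for:
// POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent
// Sending it (with an API key) is out of scope here.
interface Part {
  text?: string;
  inlineData?: { mimeType: string; data: string };
}

interface GenerateContentRequest {
  contents: { role: "user"; parts: Part[] }[];
  generationConfig: { responseMimeType: "application/json" };
}

function buildExtractionRequest(
  html: string,
  metadata: Record<string, string>,
  screenshotBase64?: string,
): GenerateContentRequest {
  // Illustrative instructions; the real structured prompt is not published.
  const instructions =
    "You are looking at a brand's website. Return only a JSON object with " +
    "headline, cta, colors {primary, accent, background} as hex, and imagePrompt.";
  const parts: Part[] = [
    { text: instructions },
    { text: "Metadata: " + JSON.stringify(metadata) },
    { text: "Rendered HTML:\n" + html },
  ];
  // Assumption: the screenshot is attached so the model can see logos and hero images.
  if (screenshotBase64) {
    parts.push({ inlineData: { mimeType: "image/png", data: screenshotBase64 } });
  }
  return {
    contents: [{ role: "user", parts }],
    // JSON mode asks the API for application/json output instead of free text.
    generationConfig: { responseMimeType: "application/json" },
  };
}
```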
We ask it for a strict JSON object:
```json
{
  "headline": "string: the most banner-worthy 4–8 word phrase",
  "cta": "string: the most likely call to action",
  "colors": {
    "primary": "hex",
    "accent": "hex",
    "background": "hex"
  },
  "imagePrompt": "string: a description of the background visual"
}
```
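JSON mode makes the output parseable, but not necessarily correct. Here is a hypothetical `parseBrandExtraction` that enforces the contract above before anything downstream uses it: every string present, three `#rrggbb` colors, and a headline inside the 4–8 word range. Whether a failure triggers a rejection or a retry is a policy choice we're assuming, not one the pipeline description specifies.

```typescript
// Typed shape of the Stage 1 contract.
interface BrandExtraction {
  headline: string;
  cta: string;
  colors: { primary: string; accent: string; background: string };
  imagePrompt: string;
}

const HEX = /^#[0-9a-fA-F]{6}$/;

// Hypothetical validator: throws on anything that would break later stages.
function parseBrandExtraction(raw: string): BrandExtraction {
  const data = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("expected a JSON object");
  }
  for (const key of ["headline", "cta", "imagePrompt"]) {
    if (typeof data[key] !== "string" || data[key].trim() === "") {
      throw new Error(`missing or empty field: ${key}`);
    }
  }
  const words = data.headline.trim().split(/\s+/).length;
  if (words < 4 || words > 8) {
    throw new Error(`headline should be 4-8 words, got ${words}`);
  }
  for (const key of ["primary", "accent", "background"]) {
    const value = data.colors?.[key];
    if (typeof value !== "string" || !HEX.test(value)) {
      throw new Error(`colors.${key} is not a #rrggbb hex value`);
    }
  }
  return data as BrandExtraction;
}
```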
Stage 2: the background image — Nano Banana Pro
The imagePrompt from stage 1 goes to Gemini 3 Pro Image (the model Google ships under the unfortunate codename "Nano Banana Pro"). This is Google's high-fidelity image generation model — released late 2025 — and it's the part of the stack that turns "a minimalist gradient with subtle geometric shapes in deep navy and electric orange" into a real image.
Two things make Nano Banana Pro good for banners specifically:
- It honors aspect ratios cleanly. We need to generate at IAB sizes (728×90, 300×250, 160×600), which are unusual ratios that a lot of image models struggle with. Nano Banana Pro generates at the requested ratio without the awkward letterboxing we saw from other models.
- It generates background-friendly images by default. We can prompt for "a banner background that won't fight with overlaid white text" and get something usable, instead of a busy hero image that drowns out the headline.
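To see why these sizes are awkward, reduce each one by its greatest common divisor. The leaderboard works out to 364:45, roughly 8:1, far from the square and 16:9 shapes image tools typically default to. The helper below is a hypothetical illustration; how a ratio is actually passed to the image model, and which ratios its API accepts, isn't covered here.

```typescript
// IAB sizes from the text, with their standard IAB names.
const IAB_SIZES = [
  { name: "Leaderboard", width: 728, height: 90 },
  { name: "Medium Rectangle", width: 300, height: 250 },
  { name: "Wide Skyscraper", width: 160, height: 600 },
];

// Euclid's algorithm.
function gcd(a: number, b: number): number {
  return b === 0 ? a : gcd(b, a % b);
}

// Smallest whole-number ratio, e.g. 1920x1080 -> "16:9".
function aspectRatio(width: number, height: number): string {
  const d = gcd(width, height);
  return `${width / d}:${height / d}`;
}

for (const s of IAB_SIZES) {
  // Leaderboard 364:45, Medium Rectangle 6:5, Wide Skyscraper 4:15
  console.log(`${s.name} ${s.width}x${s.height} = ${aspectRatio(s.width, s.height)} (${(s.width / s.height).toFixed(2)})`);
}
```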
The generated image becomes the background layer. The colors from stage 1 become the gradient, the headline text, and the CTA button. Remotion (the React-based video framework) composites all of it into the final animation.
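One way the stage 1 palette could map onto the composited layers, as a sketch: `primary` and `background` form the gradient, `accent` fills the CTA button, and text color is chosen by WCAG 2.x contrast ratio so the headline stays legible. The gradient angle, the `#111111` dark text, and checking the headline only against `primary` are simplifying assumptions, not BannerKit's actual rules; in Remotion, values like these would feed a component's styles.

```typescript
interface Palette { primary: string; accent: string; background: string }

// WCAG 2.x linearization of one sRGB channel (0-255).
function channel(c: number): number {
  const s = c / 255;
  return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
}

// Relative luminance of a #rrggbb color.
function luminance(hex: string): number {
  const n = parseInt(hex.slice(1), 16);
  const r = (n >> 16) & 0xff, g = (n >> 8) & 0xff, b = n & 0xff;
  return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b);
}

// Contrast ratio from 1 (identical) to 21 (black on white).
function contrastRatio(a: string, b: string): number {
  const [hi, lo] = [luminance(a), luminance(b)].sort((x, y) => y - x);
  return (hi + 0.05) / (lo + 0.05);
}

// Hypothetical palette-to-layer mapping.
function layerStyles(p: Palette) {
  const pickText = (bg: string) =>
    contrastRatio("#ffffff", bg) >= contrastRatio("#111111", bg) ? "#ffffff" : "#111111";
  return {
    background: `linear-gradient(135deg, ${p.primary}, ${p.background})`,
    headlineColor: pickText(p.primary),
    ctaBackground: p.accent,
    ctaText: pickText(p.accent),
  };
}
```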
Why Gemini and not the obvious alternatives
We tested several stacks during the prototype:
- GPT-4o + DALL-E 3: Worked, but slower (~3–4 second round trip on text alone, vs ~1s for Flash), and DALL-E 3 only generates at three fixed sizes (1024×1024, 1792×1024, 1024×1792), so none of the IAB ratios come out natively.
- Claude + Stable Diffusion XL: Best text quality of any option, but SDXL needs heavy prompting work to get banner-friendly backgrounds, and we'd need to host the image model ourselves.
- Gemini end-to-end: Slightly less polished text than Claude, but fast, and the image model is hosted by Google so we don't run GPU infrastructure.
For a real-time interactive tool where speed matters more than the last 5% of polish, Gemini won. The combined latency from URL submission to a complete banner is under 30 seconds, and most of that is the Remotion render — the AI part is single-digit seconds.
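A claim like "most of the latency is the render" is easy to check when each stage is timed on its own. Here is a sketch of the pipeline wired end to end with per-stage timings. The `Stages` interface and its method names are hypothetical, and the stage implementations are injected so real Puppeteer, Gemini, and Remotion calls can be plugged in, or faked in tests.

```typescript
// Hypothetical stage contract: R = rendered page, E = extraction, I = image, B = banner.
interface Stages<R, E, I, B> {
  render(url: string): Promise<R>;
  extract(rendered: R): Promise<E>;
  generateImage(extraction: E): Promise<I>;
  composite(extraction: E, image: I): Promise<B>;
}

// Run one stage and record its wall-clock duration in ms, even if it throws.
async function timed<T>(label: string, timings: Record<string, number>, fn: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    timings[label] = Date.now() - start;
  }
}

// URL in, banner out: render, extract, generate the background, composite.
async function generateBanner<R, E, I, B>(url: string, stages: Stages<R, E, I, B>) {
  const timings: Record<string, number> = {};
  const rendered = await timed("render", timings, () => stages.render(url));
  const extraction = await timed("extract", timings, () => stages.extract(rendered));
  const image = await timed("image", timings, () => stages.generateImage(extraction));
  const banner = await timed("composite", timings, () => stages.composite(extraction, image));
  return { banner, timings };
}
```

Because the stages depend on each other's output, they run sequentially; the timings map then shows directly which stage dominates the total.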
What this means for you
You don't have to think about any of this. The interface is just a text field.
But if you've ever wondered why some "generate a banner from a URL" tools spit out something that looks vaguely like your brand and others spit out something that looks like a hostage note in Comic Sans — this is the difference. Render the page like a real browser, ask a vision-capable model to see it, and use the structured output as the input to a generative pipeline.
There's no magic. Just a stack that respects what each model is actually good at.
Try it on your own site
Paste your URL, see what BannerKit AI does with it. Five free credits when you sign up — no card required.