LLM chat and prompt-to-image generation are everywhere now. The problem is that most people stop there, and the output looks random. The style drifts. Characters change between shots. Colors don’t match the brand. You end up spending more time re-rolling images than actually shipping work.
This is a practical walkthrough for getting quality while keeping brand consistency, by using the right tool for each stage: generation, LoRA training (if you own the style/character), and specialised editing models for inpainting, outpainting, context replacement, segmentation, and upscaling.
Let’s say your brand has a recurring set of characters. For example, our characters Al and Min appear in blog posts, ads, and eventually videos. You want each new image to look like it belongs in the same environment:
If you only use a text prompt, you can get a nice single image, but you will struggle to get a repeatable series and characters that align with your brand. That’s where “specialised models + a workflow” matters.
Create an image of Al and Min standing side by side facing the viewer — ChatGPT
To generate images with quality and adhere to brand guidelines, here’s the process I follow:
It’s not complicated. The win is choosing the right model at each step instead of forcing one model to do everything. This not only improves image quality, it also reduces cost compared to leaning on a single all-purpose image model.
When you generate an image, you’re using a base model (network + trained weights). Your text prompt gets converted into embeddings, and the model predicts an image step by step. If you want characters and a style that are not commonly found in public datasets, the base model will “guess” and generate a random character style that differs from your expectation.
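To make that step concrete, here is a minimal sketch of prompt-to-image generation with an open-source base model via Hugging Face diffusers. The checkpoint name, prompt, and settings are placeholders for illustration; swap in whichever base model you actually use.

```python
# Minimal prompt -> image sketch with diffusers (checkpoint is an example).
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # example base model
    torch_dtype=torch.float16,
).to("cuda")

prompt = (
    "Al and Min, full body, standing side by side, facing viewer, "
    "clean vector style, soft shading, flat background"
)

# The prompt is encoded into embeddings, then the model denoises an image
# over a fixed number of steps.
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.0).images[0]
image.save("al_and_min.png")
```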
This is where LoRA comes in for brand assets: instead of having the base model “guess” your brand’s style or characters, you can teach the model exactly what you want (see Step 3.1).
Most prompts are missing the brand constraints. Add them explicitly.
For the Al and Min example, instead of prompting “two characters standing side by side”, say what your brand cares about.
Here’s a prompt template you can reuse:
Subject:
- Al and Min, full body, standing side by side, facing viewer
Brand style constraints:
- [your style keywords: e.g. clean vector, soft shading, thick outline, flat background]
- color palette: [your palette words or hexes if you have them]
- lighting: [soft studio light / warm sunset / neon night]
- mood: [friendly, playful, confident]
Composition constraints:
- centered composition, consistent proportions, readable silhouette
- keep faces consistent, avoid extra fingers, avoid warped text
Background constraints:
- [plain / simple gradient / light environment], no clutter

This sounds basic, but it changes the output a lot. You’re telling the model what not to drift on.
This is where the LLM is useful: tightening wording, removing ambiguity, and generating variations of the same intent.
What I ask the LLM to do:
- tighten the wording
- remove ambiguity
- generate variations of the same intent
Important: don’t let the LLM add random “cinematic” fluff unless your brand actually wants it.
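Here is a hedged sketch of that step using the OpenAI Python client as one example; any chat LLM works, and the model name is a placeholder.

```python
# Ask an LLM to tighten a draft prompt without adding style fluff.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

draft_prompt = "two characters standing side by side"
brand_constraints = "clean vector, soft shading, thick outline, flat background"

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {
            "role": "system",
            "content": (
                "You rewrite image prompts. Tighten wording, remove ambiguity, "
                "keep every brand constraint, and do NOT add cinematic or "
                "stylistic fluff that is not in the constraints."
            ),
        },
        {
            "role": "user",
            "content": (
                f"Draft: {draft_prompt}\n"
                f"Brand constraints: {brand_constraints}\n"
                "Return 3 variations of the same intent."
            ),
        },
    ],
)
print(response.choices[0].message.content)
```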
There are two categories of image generation services you’ll run into: commercial and open source.
Commercial usually wins on raw quality and convenience. Open source wins on control and customization.
Here’s the straightforward tradeoff:
Use commercial when you need:
- the best raw quality with minimal setup
- convenience and speed
- strong instruction following for edits
Example Models: Google Gemini (“nano banana”), GPT-5.2.
Use open source when you need:
- control and customization
- brand-specific assets, especially LoRA training on your own style or characters
Example Models: Wan 2.2, Flux 2 Dev, Qwen Image.
For brand consistency (Al and Min), open source becomes valuable because you can train a LoRA.
Training simply means the model learns patterns from your data. In brand terms: “teach the model what Al and Min look like by showing it multiple references (images or videos)”.
Recommended training platforms:
Model considerations (depending on your setup and goal): Qwen Image, Wan 2.2, Z-image.
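Exact data formats vary by training platform, but a common convention is a folder of reference images with one caption text file per image. The sketch below prepares that layout; the folder names and the "almin_style" trigger word are made up for illustration, not a spec for any particular trainer.

```python
# Prepare a LoRA training set: images + one caption .txt per image.
from pathlib import Path

reference_dir = Path("references/al_and_min")   # your collected reference images
dataset_dir = Path("lora_dataset")
dataset_dir.mkdir(exist_ok=True)

trigger = "almin_style"  # hypothetical token you will later use in prompts

for i, img_path in enumerate(sorted(reference_dir.glob("*.png"))):
    # Copy the image and write a short caption describing only what varies
    # (pose, setting); the trigger word carries the identity/style.
    target = dataset_dir / f"{i:03d}{img_path.suffix}"
    target.write_bytes(img_path.read_bytes())
    caption = f"{trigger}, Al and Min, full body, clean vector style"
    (dataset_dir / f"{i:03d}.txt").write_text(caption)
```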
Once you have a LoRA, you can attach it to the base model and generate consistent character images from a simple prompt. That’s the whole point for brand work.
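A minimal sketch of attaching a trained LoRA at inference time with diffusers. The base checkpoint, LoRA file name, and trigger word are placeholders for your own assets.

```python
# Attach a brand LoRA to the base model and generate with it.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # example base model
    torch_dtype=torch.float16,
).to("cuda")

# Load the LoRA trained on Al and Min references (placeholder paths).
pipe.load_lora_weights("path/to/lora_dir", weight_name="al_and_min_lora.safetensors")

prompt = "almin_style, Al and Min waving at the viewer, flat background"
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("al_and_min_lora.png")
```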
Most people only know “chat back and forth until it looks right”. It works, but it’s slow and not precise.
The faster way is: decide the editing task, then pick a model designed for that task.
Below are the common image editing functions I use. Dedicated inpainting and outpainting models have recently been outpaced by all-in-one editing models, but I’m listing them here for reference. This isn’t exhaustive, but it covers most real workflows.
Inpainting fills in missing or unwanted parts of an image.
Example: “Replace the vest with the new vest” and keep everything else unchanged.
Inputs you usually need:
- the original image
- a mask marking the region to change
- a prompt describing what should appear there
Why it helps with brand: you can update details (logos, outfit versioning, small props) without re-generating the whole image.
Model example: Flux.1 Fill
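A hedged sketch of that vest swap with FLUX.1 Fill through diffusers; exact arguments can differ by diffusers version, and the file names are placeholders.

```python
# Inpainting: repaint only the masked region (the vest), keep the rest.
import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image

pipe = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = load_image("al_and_min.png")   # placeholder: the original render
mask = load_image("vest_mask.png")     # placeholder: white = area to repaint

result = pipe(
    prompt="Al wearing the new brand vest, same pose, same lighting",
    image=image,
    mask_image=mask,
    num_inference_steps=50,
    guidance_scale=30,
).images[0]
result.save("al_and_min_new_vest.png")
```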
Outpainting extends the image beyond its borders.
Example: you generated a portrait, but you need a 16:9 banner. Outpainting “zooms out” and invents the missing context while matching the style.
Inputs you usually need:
- the original image placed on a larger canvas at the target aspect ratio
- a mask covering the newly added area
- a prompt describing the surrounding scene
Model example: Flux.1 Fill
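One common way to outpaint is to pad the canvas yourself and reuse a fill/inpainting pipeline (here, the same `pipe` from the inpainting sketch above). A hedged sketch, with placeholder sizes and file names; keep dimensions at multiples the model supports.

```python
# Outpainting: pad to roughly 16:9, mask the new area, let the model fill it.
from PIL import Image

portrait = Image.open("al_and_min_portrait.png")   # e.g. 1024x1024
target_w, target_h = 1920, 1088                    # ~16:9, multiple of 16

# Center the original on a larger canvas.
canvas = Image.new("RGB", (target_w, target_h), "white")
x = (target_w - portrait.width) // 2
y = (target_h - portrait.height) // 2
canvas.paste(portrait, (x, y))

# Mask: white where the model may invent pixels, black where it must keep them.
mask = Image.new("L", (target_w, target_h), 255)
mask.paste(0, (x, y, x + portrait.width, y + portrait.height))

banner = pipe(  # reuses the Flux Fill pipeline from the inpainting sketch
    prompt="Al and Min in a clean studio, flat background, clean vector style",
    image=canvas,
    mask_image=mask,
    width=target_w,
    height=target_h,
).images[0]
banner.save("al_and_min_banner.png")
```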
Context replacement changes parts of an image based on new instructions while keeping the rest intact.
Example: “Make the sky night instead of sunset” without re-drawing Al and Min.
This is where strong commercial models often feel “magical”, because they’re trained to follow instructions while preserving identity and layout.
Model example:
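The model example is left open here; as an open-source stand-in (not necessarily the author’s pick), instruction-driven editors like InstructPix2Pix take the image plus an instruction and edit while trying to preserve everything else. Strong commercial editors accept a similar image + instruction input through their APIs. File names are placeholders.

```python
# Context replacement sketch: edit by instruction, preserve the rest.
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = load_image("al_and_min_sunset.png")  # placeholder filename
edited = pipe(
    prompt="make the sky night instead of sunset",
    image=image,
    num_inference_steps=30,
    image_guidance_scale=1.5,  # how strongly to preserve the original image
).images[0]
edited.save("al_and_min_night.png")
```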
Segmentation is for detecting and extracting objects or people cleanly.
Example: cut out Al and Min so you can place them on a new background, or prepare a mask for inpainting.
Input is typically:
- the image
- a hint for what to segment (a point, a box, or a short text prompt)
Model example: SAM3.
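A hedged sketch with the original `segment_anything` API; SAM3’s interface differs, but the workflow shape is the same: load an image, give a prompt (point/box), get a mask you can reuse for cutouts or inpainting. The checkpoint path and click coordinates are placeholders.

```python
# Point-prompted segmentation: click on Al, get a reusable mask.
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # placeholder path
predictor = SamPredictor(sam)

image = np.array(Image.open("al_and_min.png").convert("RGB"))
predictor.set_image(image)

# One foreground click roughly on Al; label 1 = foreground.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[512, 600]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
best_mask = masks[scores.argmax()]  # boolean HxW array
Image.fromarray((best_mask * 255).astype(np.uint8)).save("al_mask.png")
```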
Upscaling makes images sharper and larger, and fills in missing detail.
Input is typically:
- the low-resolution image
- the target scale factor (e.g. 2x or 4x)
Model example:
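The model example is left open here; as one illustration, a diffusion-based 4x upscaler is available through diffusers. A hedged sketch, with a placeholder filename and a prompt that only guides how missing detail gets filled in.

```python
# Upscaling sketch: 4x upscale with a diffusion-based upscaler.
import torch
from diffusers import StableDiffusionUpscalePipeline
from diffusers.utils import load_image

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")

low_res = load_image("al_and_min_small.png")  # placeholder filename
upscaled = pipe(
    prompt="Al and Min, clean vector style, sharp lines",  # guides detail fill-in
    image=low_res,
).images[0]
upscaled.save("al_and_min_4x.png")
```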
Even with good models, brand work still benefits from small human adjustments:
Open source + commercial is the best of both worlds.
Use open source when you need control and brand-specific assets (especially LoRA). Use commercial when you need speed and high-quality edits with strong instruction following. The key is not picking one tool and forcing it to do everything. Pick the right specialised model for the job, and your output becomes both higher quality and more consistent.