Flux1 Kontext (Dev) support #707


Merged
merged 2 commits into leejet:master on Jun 29, 2025

Conversation

stduhpf
Contributor

@stduhpf stduhpf commented Jun 27, 2025

https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev/

Usage:

sd.exe -M edit --diffusion-model ..\models\unet\flux1-kontext-dev.safetensors --clip_l .\models\clip\clip_l\clip_l.safetensors --t5xxl .\models\clip\t5\t5xxl_fp16.safetensors --vae .\models\vae\flux\ae.f16.gguf --cfg-scale 1 --sampling-method euler --steps 20 --color -v --guidance 2.5 -p 'Prompt' -r reference.png

Example outputs:

reference image, plus outputs for the prompts: "Replace the text with 'KONTEXT'", "Change the background to a mountain backdrop", "The cat is walking on the roof, remove the sign"
[reference image and three output images]

@Green-Sky
Contributor

I wonder if we should use the input image dimensions as defaults for the output dimensions.

@Green-Sky
Contributor

Green-Sky commented Jun 27, 2025

input: flux 1-lite-8B-f16 image, prompt: "add a plane into the sky" → [output image]

Looks like if the input image is smaller than the (default) specified image resolution, it gets cropped to the upper left.

768x768:

input: flux 1-lite-8B-f16 image, prompt: "add a plane into the sky" → [output image]

edit: Oh, and this is with CUDA.

@stduhpf
Contributor Author

stduhpf commented Jun 27, 2025

Looks like if the input image is smaller than the (default) specified image resolution, it gets cropped to the upper left.

Hmm, it looks like it tends to re-frame the image if the resolutions don't match, but depending on the prompt/seed the exact framing changes, and it doesn't seem to always just crop the top left of the reference image to the target resolution.

Example: you can see the big cloud and part of the slope on the right were included in this 512x512 image; these were missing from your example with the plane.
output

Here's what a simple 512x512 crop would look like:
image
That's even less than what's included in the plane image.

@Green-Sky
Contributor

That's even less than what's included in the plane image.

You are right. I guess this is good behavior then.

@bssrdf
Contributor

bssrdf commented Jun 27, 2025

input: [image], prompt: "a beautiful girl model holding it" → [output image]

@LostRuins
Contributor

LostRuins commented Jun 28, 2025

How does the kontext img interact with an img2img source image? How does the flow go?

I also noticed you're not actually limiting the kontext imgs from being used with regular flux. I wonder how that looks.

Anyway, it's working well, very good work. Merging was a bit of a pain though, since the chroma PR isn't accepted yet so there's a bunch of conflicts. Meanwhile your bleedingedge branch also has other stuff. But I managed.

@leejet
Owner

leejet commented Jun 28, 2025

Thank you for your contribution. Interestingly, I had also added support for kontext dev, along with the edit mode, on my side. I pushed some of the changes to your branch to make the code easier to maintain.

@leejet
Owner

leejet commented Jun 28, 2025

.\bin\Release\sd.exe -M edit -r .\kontext_input.png --diffusion-model ..\models\flux1-kontext-dev-Q8_0.gguf --vae ..\..\ComfyUI\models\vae\ae.sft --clip_l ..\..\ComfyUI\models\clip\clip_l.safetensors --t5xxl ..\..\ComfyUI\models\clip\t5xxl_fp16.safetensors -p "change 'flux.cpp' to 'kontext.cpp'" --cfg-scale 1.0 --sampling-method euler -v
ref_image: kontext_input, prompt: "change 'flux.cpp' to 'kontext.cpp'" → [output image]

@stduhpf
Contributor Author

stduhpf commented Jun 28, 2025

I also noticed you're not actually limiting the kontext imgs from being used with regular flux. I wonder how that looks.

That's because regular Flux and Flux Kontext have the exact same architecture, so I haven't found a way to tell them apart at runtime. Regular Flux gets very confused by the reference images, but it does its best.

How does the kontext img interact with an img2img source image? How does the flow go?

To be honest, I haven't tried that configuration yet, but I don't think it should cause any weird interactions. The latent image is initialized with the source image (instead of empty), and then the model starts adding some noise and denoising at an advanced timestep like normal img2img; it's just also conditioned on the reference image. (Though with the edit mode it's no longer a concern.)
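
The img2img flow described here can be sketched as follows. This is a rough sketch of the usual strength/steps split; the function name and rounding are assumptions for illustration, not the exact sd.cpp code:

```cpp
#include <algorithm>

// Sketch of the usual img2img schedule split: with N sampling steps and
// a denoising strength in [0, 1], the source-image latent is noised up
// to the timestep at step N - t_enc, and only the last t_enc steps are
// actually denoised. strength = 1 behaves like txt2img (all steps run),
// strength = 0 returns the source image unchanged (no steps run).
int img2img_start_step(int steps, float strength) {
    float s = std::min(std::max(strength, 0.0f), 1.0f);
    int t_enc = static_cast<int>(s * steps + 0.5f); // steps actually run
    return steps - t_enc;                           // index where denoising starts
}
```

Under this view the Kontext reference image is orthogonal to the flow: it only changes the conditioning at every step, not where denoising starts.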

@LostRuins
Contributor

@stduhpf I noticed your kontext_imgs is a vector, i.e. it supports multiple images, but I'm not sure if I am doing it right.

I used 2 kontext_imgs:
ball
Walter_White_S5B

My prompt is "display the images side by side"

Result:

image

Tried a few more times with equally odd results; often the second image is just completely ignored.
Am I using it correctly?

@stduhpf
Contributor Author

stduhpf commented Jun 28, 2025

@stduhpf I noticed your kontext_imgs is a vector i.e. it supports multiple images, but I'm not sure if I am doing it right.

I used 2 kontext_imgs: ball Walter_White_S5B

My prompt is "display the images side by side"

Result:

image

Tried a few more times with equally odd results; often the second image is just completely ignored. Am I using it correctly?

In the paper, they say:
image
image
So my understanding is that while the currently released model wasn't trained to work with multiple images, a future release might be able to do that.

@stduhpf
Contributor Author

stduhpf commented Jun 28, 2025

@leejet I'm not convinced it's useful to make such a distinction between the "edit" and "txt2img" modes. Isn't edit mode just txt2img with image conditioning?

Comment on lines +625 to +641
uint64_t curr_h_offset = 0;
uint64_t curr_w_offset = 0;
for (ggml_tensor* ref : ref_latents) {
uint64_t h_offset = 0;
uint64_t w_offset = 0;
if (ref->ne[1] + curr_h_offset > ref->ne[0] + curr_w_offset) {
w_offset = curr_w_offset;
} else {
h_offset = curr_h_offset;
}

auto ref_ids = gen_img_ids(ref->ne[1], ref->ne[0], patch_size, bs, 1, h_offset, w_offset);
ids = concat_ids(ids, ref_ids, bs);

curr_h_offset = std::max(curr_h_offset, ref->ne[1] + h_offset);
curr_w_offset = std::max(curr_w_offset, ref->ne[0] + w_offset);
}
Contributor Author

@stduhpf stduhpf Jun 28, 2025

If I understand correctly, this is "stitching" the reference images together (in the same 3D positional-encoding slice) instead of putting them each on their own "slice" like in the paper? It seems to work very well with this model, though it will need to be changed again in the future if a model with "true" support for multiple references gets released.

Owner

This is based on the implementation of comfyui. In my tests, this implementation performed better when dealing with multiple reference images. I think the current kontext dev model supports multiple reference images.

Contributor Author

I think the current kontext dev model supports multiple reference images.

Well, as I understand it, with this implementation it kind of acts like all the reference images are just one big reference mosaic (well, not quite, since the VAE encodes them separately, but they are positioned on the same "RoPE plane" of index 1, if that makes sense). Anyway, I agree that this implementation is better, at least as long as there is no model that supports reference images with different indices.
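
The offset logic from the quoted diff can be reproduced as a standalone function, to see where each reference lands in the virtual mosaic without needing ggml. `place_refs` is a sketch for illustration, not an sd.cpp function; sizes are (width, height) in latent patches, matching `ne[0]`/`ne[1]` in the diff:

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Each reference is appended either to the right of or below the
// references placed so far, whichever keeps the virtual mosaic squarer:
// if the mosaic is currently taller than it is wide, the next reference
// goes to the right, otherwise below.
std::vector<std::pair<uint64_t, uint64_t>> // (h_offset, w_offset) per ref
place_refs(const std::vector<std::pair<uint64_t, uint64_t>>& sizes) {
    std::vector<std::pair<uint64_t, uint64_t>> offsets;
    uint64_t curr_h_offset = 0, curr_w_offset = 0;
    for (const auto& [w, h] : sizes) {
        uint64_t h_offset = 0, w_offset = 0;
        if (h + curr_h_offset > w + curr_w_offset) {
            w_offset = curr_w_offset; // mosaic would get too tall: go right
        } else {
            h_offset = curr_h_offset; // otherwise stack downward
        }
        offsets.emplace_back(h_offset, w_offset);
        curr_h_offset = std::max(curr_h_offset, h + h_offset);
        curr_w_offset = std::max(curr_w_offset, w + w_offset);
    }
    return offsets;
}
```

With two square references the second one stacks below the first; if the first reference is taller than it is wide, the second goes to the right instead.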

@LostRuins
Contributor

@stduhpf bear with me a bit, this is just some rambling. I did some searching on reddit and saw this comfyui workflow: https://www.reddit.com/r/comfyui/comments/1l2zsz2/flux_kontext_is_amazing/

image

Here, Flux Kontext ingests multiple separate images and then can generate a composite image containing all 3 subjects.

image

What is not entirely clear is whether it's fed into the model as one source reference image, or several.

I tried the current implementation in sd.cpp, with the same prompt and these 2 images:

zzz Walter_White_S5B

but instead, I have received Walter Ramsay.

image

@stduhpf
Contributor Author

stduhpf commented Jun 28, 2025

@LostRuins Is this with commit 8967889?

@LostRuins
Contributor

Nope, this is with your earlier changes. Should I merge the latest?

@stduhpf
Contributor Author

stduhpf commented Jun 28, 2025

@leejet's changes kind of fixed multiple references by virtually stitching the reference images together using RoPE offsets, so the model "sees" them as one mosaic, rather than implementing it like the original paper suggests, with each image clearly separated in a third dimension. (It seems that ComfyUI does the same.)

image

@LostRuins
Contributor

@leejet I'm not convinced it's useful to make such a distinction between the "edit" and "txt2img" modes. Isn't edit mode just txt2img with image conditioning?

I agree, it seems unnecessary to duplicate a whole separate flow for this particular case. After all, we already add photomaker, controlnet and others directly in txt2img and img2img; I don't see why this should be different.

@LostRuins
Contributor

Alright seems mostly working now, very nice.

Are there any guidelines to follow regarding the size dimensions of the kontext input images? Should they be resized to output dims, aspect ratio, or not needed?

@stduhpf
Contributor Author

stduhpf commented Jun 28, 2025

Alright seems mostly working now, very nice.

Are there any guidelines to follow regarding the size dimensions of the kontext input images? Should they be resized to output dims, aspect ratio, or not needed?

https://github.com/comfyanonymous/ComfyUI/blob/master/comfy_extras/nodes_flux.py#L60-L100
Maybe this is what you're looking for? But I tried a lot of resolutions that don't match these "optimal" ones and didn't have any issues.
Also, reference image size increases the compute buffer as much as the output image size does, and on Vulkan at least, compute buffer size is a scarce resource (because of the allocation limit).
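
A back-of-the-envelope way to see why each reference costs as much as the output: with the Flux VAE's 8x downsampling followed by 2x2 latent patches, every image contributes roughly one token per 16x16 pixel block, and attention cost grows with the total sequence length. A rough estimate for illustration, not the actual compute-buffer math:

```cpp
#include <cstdint>

// Approximate number of sequence tokens a (w x h)-pixel image adds:
// 8x VAE downsampling, then patch size 2 -> one token per 16x16 pixels.
uint64_t flux_image_tokens(uint64_t w, uint64_t h) {
    return (w / 16) * (h / 16);
}
```

So a 1024x1024 reference adds about four times the tokens of a 512x512 one, which is why shrinking references helps so much on Vulkan.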

@leejet
Owner

leejet commented Jun 28, 2025

@leejet I'm not convinced it's useful to make such a distinction between the "edit" and "txt2img" modes. Isn't edit mode just txt2img with image conditioning?

Because the edit model and the txt2img model are different models, distinguishing between edit mode and txt2img mode is a more user-friendly approach from the user's perspective, even though the two workflows are largely similar.

@leejet
Owner

leejet commented Jun 28, 2025

@leejet's changes kind of fixed multiple references by virtually stitching the reference images together using RoPE offsets so the model "sees" them as one mosaic, rather than implementing it like suggested in the original paper with each image being clearly separated in a third dimension. (It seems that ComfyUI does the same)

image

This is based on the implementation of comfyui. In my tests, this implementation performed better when dealing with multiple reference images.

@leejet
Owner

leejet commented Jun 28, 2025

Alright seems mostly working now, very nice.

Are there any guidelines to follow regarding the size dimensions of the kontext input images? Should they be resized to output dims, aspect ratio, or not needed?

According to my test results, the resolution of the reference image has little impact. Actually, I suggest reducing the image size when VRAM is limited, to reduce VRAM usage and improve generation speed.
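
One way to follow that advice is to cap the longer side of the reference before encoding. `shrink_to_fit` is a hypothetical helper, not an sd.cpp function; the snap to multiples of 16 assumes the 8x VAE downsampling plus patch size 2:

```cpp
#include <algorithm>
#include <cmath>
#include <utility>

// Shrink (w, h) so the longer side is at most max_side, preserving the
// aspect ratio, then snap both sides down to multiples of 16 so the
// image maps cleanly onto latent patches. Images already small enough
// are only snapped, never upscaled.
std::pair<int, int> shrink_to_fit(int w, int h, int max_side) {
    int longer = std::max(w, h);
    if (longer > max_side) {
        double scale = static_cast<double>(max_side) / longer;
        w = static_cast<int>(std::round(w * scale));
        h = static_cast<int>(std::round(h * scale));
    }
    w = std::max(16, (w / 16) * 16); // snap down, keep at least one patch row
    h = std::max(16, (h / 16) * 16);
    return {w, h};
}
```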

@LostRuins
Contributor

@stduhpf that civitai clothing lora got deleted so I didn't see your reply - but I was saying that I don't think flux loras are actually working correctly on Kontext, similar to your own observation.

@stduhpf
Contributor Author

stduhpf commented Jun 28, 2025

@stduhpf that civitai clothing lora got deleted so I didn't see your reply - but I was saying that I don't think flux loras are actually working correctly on Kontext, similar to your own observation.

It should be expected for Flux [Dev] LoRAs not to work very well with Flux Kontext [Dev]: Flux Kontext [Dev] is distilled from Flux Kontext [Pro] rather than fine-tuned from Flux [Dev] on edit tasks, so the two models can be quite different. But I noticed that some LoRAs, like the ones that reduce the number of steps, seem to work somewhat, which is interesting. They probably used Flux [Dev] as a base for the distillation.

@leejet
Owner

leejet commented Jun 29, 2025

It seems that this PR can be merged now. Thank you everyone!

@leejet leejet merged commit c9b5735 into leejet:master Jun 29, 2025
9 checks passed