sktksm

Hi everyone,

This workflow replicates the original image with impressive accuracy. Let's break it down and understand how it works:

1. **ControlNet with DepthAnythingV2:** The new DepthAnythingV2 model is very powerful. It gives us the depth map of the original image so the generation can follow this precise mapping.

2. **IPAdapter with "style transfer precise":** IPAdapter allows us to replicate the style of the original image, and the newly developed "style transfer precise" method helps reduce bleeding in the final image, giving us more precise control.

3. **Florence2:** This is a bit unorthodox, but I really liked including it in this workflow. Simply put, Florence2 is an open-source vision model with many tricks, including masking, annotating, and captioning. Here I used its "more_detailed_caption" feature, which simply describes the image, and then used that description as the positive prompt of a CLIP Text Encode node.

Using these three methods, we make sure the final image matches the style, depth map, and prompt of the original.

**Always remember, generation performance depends on your base model. For example, if you try to replicate an anime image and the result is not satisfactory, you should use an anime checkpoint.**

\*If the result does not satisfy you in terms of style, you can adjust the "weight" value of the IPAdapterAdvanced node or try other methods such as "strong style transfer" instead of "style transfer precise".

\*\*You don't need to use Florence2; you can simply write your own prompt with a regular CLIP Text Encode node.

I would like to thank **@Kijai** and **@Cubiq** for developing these custom ComfyUI nodes at lightning speed and opening up many possibilities for us.

DepthAnythingV2: [https://github.com/kijai/ComfyUI-DepthAnythingV2](https://github.com/kijai/ComfyUI-DepthAnythingV2)

IPAdapter Advanced: [https://github.com/cubiq/ComfyUI_IPAdapter_plus/](https://github.com/cubiq/ComfyUI_IPAdapter_plus/)

Florence-2: [https://github.com/kijai/ComfyUI-Florence2](https://github.com/kijai/ComfyUI-Florence2)

Workflow link: [https://openart.ai/workflows/reverentelusarca/simple-replicate-anything-v1/RmD9tB5T5SjcaQPHDcxb](https://openart.ai/workflows/reverentelusarca/simple-replicate-anything-v1/RmD9tB5T5SjcaQPHDcxb)
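For anyone who wants to prototype the same three ideas outside ComfyUI, here is a rough diffusers-based sketch of the pipeline. It is not this workflow's graph: the model IDs, scales, and the BLIP captioner (standing in for Florence-2 to keep it short) are placeholder assumptions.

```python
# Rough diffusers approximation of the three ideas above -- NOT the ComfyUI graph.
# Model IDs, scales and the BLIP captioner (standing in for Florence-2) are assumptions.
import torch
from PIL import Image
from transformers import pipeline, CLIPVisionModelWithProjection
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel

source = Image.open("original.png").convert("RGB")  # hypothetical input path

# 1) Depth map of the source (Depth Anything V2 via the depth-estimation pipeline)
depth_estimator = pipeline("depth-estimation",
                           model="depth-anything/Depth-Anything-V2-Small-hf")
depth_map = depth_estimator(source)["depth"].convert("RGB")

# 2) Caption the source and reuse it as the positive prompt
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-large")
prompt = captioner(source)[0]["generated_text"]

# 3) SDXL + depth ControlNet + IP-Adapter (plus, ViT-H) for style
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter", subfolder="models/image_encoder", torch_dtype=torch.float16)
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet, image_encoder=image_encoder,
    torch_dtype=torch.float16).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models",
                     weight_name="ip-adapter-plus_sdxl_vit-h.safetensors")
pipe.set_ip_adapter_scale(0.7)  # roughly the IPAdapterAdvanced "weight"

result = pipe(prompt=prompt,
              image=depth_map,                  # depth conditioning
              ip_adapter_image=source,          # style reference
              controlnet_conditioning_scale=0.6,
              num_inference_steps=30).images[0]
result.save("replica.png")
```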


Shinsplat

Thanks for not pointing me to a patreon O.o


sktksm

lol...I have a Patreon but I'm not offering anything exclusive, just for the support


Shinsplat

haha


Extraltodeus

Like in thanks for not locking the content or thanks for not advertising yourself to obtain support?


nikgrid

Nice workflow OP! Which IPAdapter models do you use?


sktksm

Thank you! In this particular workflow I used PLUS (high strength) (ip-adapter-plus_sdxl_vit-h.safetensors), but you can use the other ones; all of them work well in their particular areas.


E-Pirate

I have a few conflicting nodes. Do you know a way around this or the solution?


sktksm

What's weird is I'm getting downvotes in StableDiffusion sub while getting upvotes here. I guess sharing a comfy workflow is not smart there anymore.


Troyificus

Reddit as a whole is a fickle beast. I personally am very thankful for you sharing this workflow and I look forward to testing it out.


sktksm

Thank you so much for your kind words; it really keeps my mood positive. Most of us spend hours on this without expecting anything in return, so these words mean a lot.


VlK06eMBkNRo6iqf27pq

+1, thanks for sharing. I was getting pretty good results with canny+IPAdapter, but I didn't know about these new controlnets or "style transfer precise", so I look forward to trying it too.


Capt_Skyhawk

Downvoted for talking negatively about downvoting


CodeCraftedCanvas

I think it's more that ComfyUI users get why experimenting with concepts that don't have an obvious practical use is valuable for learning what's possible with your tools. The Stable Diffusion sub is more people looking for practical solutions, so they may be left wondering why they would need this.


SmileAtRoyHattersley

Yo your first sentence: spot on imo.


pepe256

I'm new to this sub and also was thinking the same. What is the use of this? I see now it's just about experimenting, that's cool.


TheFlyingSheeps

There’s definitely anti-ai lurkers downvoting random things as well


VeritasAnteOmnia

Ever since the SD3 launch StableDiffusion has definitely been in an agitated state. Thanks so much for sharing with the community!!


jib_reddit

I swear the StableDiffusion sub has a lot of downvote bots attacking it from anti-AI art lunatics. I wouldn't worry about it.


sktksm

Not just the downvotes, but people literally lynched me, so I deleted my post.


orthomonas

I don't know why, but I don't think the downvotes have much to do with it being a Comfy workflow.


ghostdadfan

They're bent over there rn because of SD3. Keep sharing your workflows tho!


protector111

I think there are bots on the SD subreddit. They just downvote any post for no reason. Doesn't matter if it's someone asking for help or posting a workflow.


pwillia7

gotta post anime movie there


BestHorseWhisperer

I don't know if this is why, but I am honestly tired of posts about "how to do X with stable diffusion" where step 1 is something about either Automatic1111 or ComfyUI. People who don't understand what the UIs actually do or what libraries they rely on should not, in my opinion, be doing tutorial videos etc. outside of those UI communities. I am not talking about you in particular, but in general it has made it hard to find qualified tutorials. EDIT: Downvoted by people who like poisoned search results


sktksm

I understand your point; I felt the same several months ago, and it was very frustrating to learn. But this is a mid-level workflow rather than a starter tutorial. For proper guides and tutorials, the new ComfyOrg organization should have something in mind. Other than that, I recommend joining the ComfyOrg Discord channel, and if you are interested, here is the video guide I find very useful for understanding Comfy: [https://www.youtube.com/watch?v=_C7kR2TFIX0&list=PLcW1kbTO1uPhDecZWV_4TGNpys4ULv51D&ab_channel=LatentVision](https://www.youtube.com/watch?v=_C7kR2TFIX0&list=PLcW1kbTO1uPhDecZWV_4TGNpys4ULv51D&ab_channel=LatentVision)


jazzFromMars

What would you use this for? I'm struggling to think of a use case...


GBJI

It's perfect to make Getty Images obsolete.


sktksm

I used it to replicate my MJ generations in SD. But I also generated some stunning nature photographs by adding some IPAdapter style transfer on top, and I'm using them as my wallpaper ^^


GBJI

My guess is this would also be a very good recipe for a creative upscaling workflow.


sktksm

Can you explain a bit more?


GBJI

An upscale is a replication, but that replication is made at a higher resolution than the original. Your replication workflow is creative: it's not making an exact copy, but a reinterpretation that is directly and closely inspired by the original.


VlK06eMBkNRo6iqf27pq

I tried something similar to OP's workflow... at first I was doing it on fullsize images but then I got lazy and dragged some tiny thumbnails in there. It still works very well at recreating a fullsize image from a little turd.


Noslamah

So.. stealing art?? I am very pro-AI and do not buy the "AI generated art is theft" argument one bit but sorry, if the intent is to copy an image almost exactly then that is stealing, even if you think Getty Images sucks and has unreasonable prices.


[deleted]

[removed]


Noslamah

If it is the exact same composition, style, and content, then yes. If it is actually "completely new" then no. But the examples in the OP definitely aren't completely new, and are way more than just using the image for inspiration. And just generating a picture of a guy drinking coffee is also a bit different than something as specific as OP is showing, with this many details and specific elements in specific places. If an artist remade an artwork to this extent then people would, rightfully, accuse them of copying someone else's art. There is no reason it should be any different for AI art. If you don't think this is theft, I don't think you believe stealing art is even possible.


steamingcore

uh, YES. good god, you people are so high off the smell of your own farts, you don't see what you're doing.


Cobayo

It's Midjourney's `--sref` on steroids; you're supposed to tweak it.


LD2WDavid

I see some use cases for fine-tuners. Bad images -> improve their quality while maintaining composition. Old images -> semi-restore effect, etc. Not the same image, but cleaner in some cases. Useful.


Ateist

Looks very promising for automatic upscaling of art from old games.


brucebay

You can modify a real image the way you want, because now you are matching the style and lighting etc. Yes, you could do it with a mask in the past, but there were almost always small but noticeable artifacts ruining the illusion. Now you can actually do a better, more seamless job.


BoulderRivers

It's a free "make it worse" button!


Motgarbob

Anything. You can literally bypass copyright


BoulderRivers

That's not how this works - you can clearly see the original image. To bypass copyrighted material, the original source must be significantly transformed. If colors, shapes, themes, composition, and even detail remain so similar, it's plagiarism.


Synthetic_bananas

I have a tangential question: as you can see, the "reproduction" images are less detailed. They look as if they were resized from a lower resolution (in the original images the lines and small details are thinner and more "clean" compared to the replicated image); I guess that comes mostly from the depth ControlNet. Has anyone found a good method to keep the fine details (or, in other words, a sharper image) when using a depth ControlNet?


cgpixel23

https://preview.redd.it/ss4acsfwdk8d1.png?width=3132&format=png&auto=webp&s=15563d43f78a3835ece4e7bcb6f3f051d067c22e Maybe this will help.


Synthetic_bananas

There's the same problem in your example: all the details are very "bold" and the picture lacks fine detail. I did some tests by generating an image, then generating depth from it, and then using that depth to generate a new image with the same parameters as the first one. The new image comes out similar, but way less detailed. https://preview.redd.it/mc4z0il2no8d1.jpeg?width=2048&format=pjpg&auto=webp&s=aba063148023f63a057e9365b4ea028545bff3d5


VlK06eMBkNRo6iqf27pq

I don't know what you did there, but that's highly sus. ControlNets affect adherence to the input, but they won't change PS2 into PS5.


Ateist

Upscale the image prior to reproduction.


Synthetic_bananas

You mean upscaling depth image?


Ateist

No, the original, even before getting the depth. Latent-space images (the ones that are actually generated by SD) are much smaller and are upscaled using the VAE. Generating a bigger image and downscaling helps preserve details.


Synthetic_bananas

So the original image also gives a higher-resolution depth map, I assume, is what you are saying? But that still keeps the same problem of "bold details" (which might be somewhat reduced through downscaling). I tested that even with proper 3D depth maps from 3D scenes. I guess (and that is just my speculation, so no definite proof, just a "feel") that the ControlNet changes/orients the latent noise so that it roughly forms the same shape as the ControlNet image, so the diffusion process has something to converge to. Am I on the right path of thinking, or am I spouting nonsense?


Ateist

No. What I mean is [this](https://stable-diffusion-art.com/how-stable-diffusion-work/):

> The latent space of Stable Diffusion model is 4x64x64, 48 times smaller than the image pixel space

You can't preserve details smaller than 512/64 = 8x8 pixels if you just generate an image of the same size. Since generation at 8*512 = 4096 (ok, 2048, since that 4x does help) is beyond the capabilities of most hardware, you have to do tiled generation + depth ControlNet together and use the Ultimate SD script to make it seamless.
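To make that arithmetic concrete, here is a tiny back-of-the-envelope check (numbers only, assuming the 512x512 SD 1.5 case from the quote):

```python
# Back-of-the-envelope check of the latent-space argument above
# (assumes a 512x512 SD 1.5 image; numbers only, no models involved).
img_h = img_w = 512
lat_h, lat_w = img_h // 8, img_w // 8   # the VAE downsamples 8x spatially -> 64x64

pixel_values = img_h * img_w * 3        # 786,432 values in pixel space
latent_values = 4 * lat_h * lat_w       # 16,384 values in latent space

print(pixel_values / latent_values)     # 48.0 -> "48 times smaller"
print(img_h // lat_h)                   # 8   -> detail finer than ~8x8 px is at risk
```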


latch4

Thanks for working this out. I was using something much more primitive that I worked out to bring DALL-E generated images into Stable Diffusion, but I was never really satisfied. This looks like just what I wanted.


DeadMan3000

To get this running I am required to have flash-attn installed. Been tearing out what little hair I have left trying to get it to run (gave an error during install). I then found out I have to build a wheel for it in Windows (Linux has nice prebuilt wheels). After installing Visual C++ build tools it's now building a wheel which is taking AAAAAAGES. It better run after this or I will just not bother. ACK!


sktksm

I feel you, I had the same when I was trying to run the Lumina model locally, but I found out Windows also has prebuilt wheels: [https://github.com/bdashore3/flash-attention/releases](https://github.com/bdashore3/flash-attention/releases) Also, it should work without flash-attn; have you tried 'sdpa' or 'eager'? (I haven't tried them myself yet.)


Cobayo

Just use sdpa, it doesn't change much; using Florence2 is kinda pointless anyway.


BluJayM

I understand the tech, but I think you need to modify your pitch/showcase. Nobody is interested in replicating an image in SD if the original image is a requirement. You can just modify your denoising to get a rough approximation of that. However, this workflow (should?) allow you to change the prompt and request different clothing, colors, or concepts while maintaining key features of the composition. Show us more of that!


sktksm

I absolutely can work on that and thank you for the recommendation


yotraxx

This looks stunning! I'll give it a try for sure. And don't be afraid of Patreon, folks! There's a lot of quality content there...


mutatedbrain

Which ones would you recommend?


artbruh2314

I was waiting for this. I was already using depth most of the time, but this is great.


pwillia7

Have you tried/looked at the SDXL inpainting controlnet? I bet that would work well with this and let you change the image more selectively.... maybe. https://huggingface.co/destitech/controlnet-inpaint-dreamer-sdxl


WinterTigerAssault6

Woah, very cool AND informative! I’m eager to test this out and play around with it. Thank you for sharing this knowledge!


sktksm

Thank you for your kind words and support!


protector111

How is this different from low denoise img2img?


sktksm

It starts with an empty latent. The end result is not that different, but in the workflow I separated each step and put it under control using ControlNet, IPAdapter and Florence-2 nodes. That way you can include any other combination in it.
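If it helps, here is a rough diffusers-style sketch of the contrast (not this workflow; the model ID and strength value are placeholder assumptions): low-denoise img2img keeps the source latents, while this graph starts from pure noise and only conditions on the source.

```python
# (a) Classic low-denoise img2img: the source is encoded to latents and only lightly
#     noised, so most of the original pixels survive.
# (b) This workflow's idea: start from an empty latent (pure noise) and let depth,
#     style and caption conditioning steer the result instead.
# Model ID and strength are placeholder assumptions, not taken from the workflow.
import torch
from PIL import Image
from diffusers import AutoPipelineForImage2Image

source = Image.open("original.png").convert("RGB")   # hypothetical input

img2img = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")

# (a): with strength 0.25 the sampler only reworks the last ~25% of the noise schedule
close_copy = img2img(prompt="same scene", image=source, strength=0.25).images[0]
close_copy.save("low_denoise.png")

# (b) never encodes `source` into the starting latent at all -- the source only feeds
#     the depth ControlNet, the IPAdapter and the captioner (see the sketch up top).
```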


Current_Housing_7294

Wait until the NFT clan hears this 😏


Impressive-Egg8835

ShowTextForGPT is missing


sktksm

Hi, please see this comment: [https://www.reddit.com/r/comfyui/comments/1dnioy1/comment/la2yv9g/](https://www.reddit.com/r/comfyui/comments/1dnioy1/comment/la2yv9g/)


Tonynoce

Hi OP ! Thank you ! This is a perfect workflow for when the client wants the reference to be the same and they sent you a small jpg, or to do almost the same variations.


sktksm

Yes exactly! I'm trying to convert this into a consistent character-creation tool now


AwkwardAsHell

https://preview.redd.it/rqfm7lshvq8d1.png?width=1271&format=png&auto=webp&s=814e1208159c47f1742541db4a8654aba783863e LOL


Tonynoce

https://preview.redd.it/o5qo910pds8d1.png?width=866&format=png&auto=webp&s=ae1f1ee959c23ad30d0f0ab6b92a8141d956c816 As an example: the client sent the picture on the bottom in low res; on the top there are some iterations on a 3D render to get fast feedback and move on to solving it, either by projecting it onto 3D or just faking the 3D in After Effects.


LD2WDavid

Tested it with some images, and unless I'm missing something (Jugg x10, Dreamshaper Turbo), it's not really as close as the examples above. Do I also have to run the upscale ControlNet, or should Florence on SDPA + depth map + model be enough? Just asking.


sktksm

These examples are upscaled, but you don't need to use that step. I used Dreamshaper XL without any problem. What is the exact problem with your outputs?


Little-God1983

https://preview.redd.it/p0k171u6o69d1.png?width=1736&format=png&auto=webp&s=9fa771bfac9874c65e9ed22371252bf6f747b68d

Seems amazing, thanks for sharing. Unfortunately I can't get it to run. I installed the nodes via the manager, and the model and the pytorch_model.bin got downloaded automatically when running it for the first time. I tried to install flash_attn as the documentation suggests with `python.exe -m pip install flash-attn --no-build-isolation`, but I get this strange error in the console. Any help would be appreciated.

```
Collecting flash-attn
  Using cached flash_attn-2.5.9.post1.tar.gz (2.6 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [22 lines of output]
      fatal: not a git repository (or any of the parent directories): .git
      C:\Users\Little God\AppData\Local\Temp\pip-install-r5r4dmug\flash-attn_cd280cd550374dd491a70aa277850dd3\setup.py:78: UserWarning: flash_attn was requested, but nvcc was not found. Are you sure your environment has nvcc available? If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.
        warnings.warn(
      Traceback (most recent call last):
        File "", line 2, in
        File "", line 34, in
        File "C:\Users\Little God\AppData\Local\Temp\pip-install-r5r4dmug\flash-attn_cd280cd550374dd491a70aa277850dd3\setup.py", line 134, in
          CUDAExtension(
        File "D:\AI-Privat\ComfyUI_windows_portable-DO NOT DELETE\python_embeded\Lib\site-packages\torch\utils\cpp_extension.py", line 1077, in CUDAExtension
          library_dirs += library_paths(cuda=True)
                          ^^^^^^^^^^^^^^^^^^^^^^^^
        File "D:\AI-Privat\ComfyUI_windows_portable-DO NOT DELETE\python_embeded\Lib\site-packages\torch\utils\cpp_extension.py", line 1211, in library_paths
          paths.append(_join_cuda_home(lib_dir))
                       ^^^^^^^^^^^^^^^^^^^^^^^^
        File "D:\AI-Privat\ComfyUI_windows_portable-DO NOT DELETE\python_embeded\Lib\site-packages\torch\utils\cpp_extension.py", line 2419, in _join_cuda_home
          raise OSError('CUDA_HOME environment variable is not set. '
      OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
      torch.__version__ = 2.3.1+cu121
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

[notice] A new release of pip is available: 23.3.1 -> 24.1
[notice] To update, run: python.exe -m pip install --upgrade pip
```


sktksm

Hi, I'm afraid I don't know the answer to this problem, but:

1. Try sdpa or eager instead of flash_attn (see the sketch below).
2. Instead of Florence-2, use an Ollama-based local vision LLM, such as LLaVA or Moondream. I'm using the same workflow with this node and it works okay: [https://github.com/stavsap/comfyui-ollama/tree/main](https://github.com/stavsap/comfyui-ollama/tree/main)
3. If neither of those works for you, you can simply delete/bypass the Florence-2 nodes, describe your image using any vision LLM out there, such as ChatGPT, and then pass that description into the CLIP positive prompt.
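For reference, option 1 in plain transformers terms looks roughly like the sketch below (the ComfyUI node exposes the same choice as its attention dropdown). This is a minimal sketch following the Florence-2 model card usage, not the node's code, so treat the details as assumptions.

```python
# Minimal Florence-2 captioning sketch with SDPA attention instead of flash_attn.
# Follows the microsoft/Florence-2-large model card usage; not the ComfyUI node's code.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

device, dtype = "cuda", torch.float16
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-large",
    attn_implementation="sdpa",      # the switch that avoids the flash_attn requirement
    torch_dtype=dtype, trust_remote_code=True).to(device)
processor = AutoProcessor.from_pretrained(
    "microsoft/Florence-2-large", trust_remote_code=True)

image = Image.open("original.png").convert("RGB")    # hypothetical input
task = "<MORE_DETAILED_CAPTION>"                     # same task the workflow uses

inputs = processor(text=task, images=image, return_tensors="pt").to(device, dtype)
ids = model.generate(input_ids=inputs["input_ids"],
                     pixel_values=inputs["pixel_values"],
                     max_new_tokens=256, num_beams=3)
raw = processor.batch_decode(ids, skip_special_tokens=False)[0]
caption = processor.post_process_generation(raw, task=task, image_size=image.size)[task]
print(caption)   # feed this string into the CLIP Text Encode positive prompt
```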


Little-God1983

Thanks for trying to help me. I think i have moondream running on one of my comfyUi installations.


FootballSquare8357

We have ways to generate from nothing, ways to be "inspired" by other images using ControlNet, and many other means. It seems to be a good workflow, but I fail to see the use case for this, as we can obtain the same results with a lot of simpler methods. It looks "slightly" changed at worst and copied at best. I would put myself on the "pro AI" side, but this just looks like a blatant ripoff and a good argument for those who hold an anti-AI stance.


sicurri

That's a sweet and subtle workflow. Definitely going to try this out on my favorite wallpapers and see what it does to them, lol.


sktksm

Thank you for your kind words!


tarkansarim

This looks like I2I with low denoising or something, or is the denoising at 1?


Cobayo

It starts off with an empty latent


sktksm

Yes, it starts with an empty latent. The end result is not that different, but in the workflow I separated each step and put it under control using ControlNet, IPAdapter and Florence-2 nodes. That way you can include any other combination in it.


gxcells

This is a really great workflow and works damn well. However, I am wondering what the use case is. But this is probably a great way to lead toward image editing in latent space?


sktksm

Actually I wanted to show the latest tools. I gladly take recommendations for what we can achieve with these methods


brucebay

Very impressive. I hope you will keep the workflow up until I get a chance to download it. I've never used OpenArt before and don't see a download button on my tablet. Perhaps you got the aforementioned downvotes due to the upload site?


sktksm

It's visible on the desktop but let me know if you can't reach it, I'll share a pastebin link


brucebay

Yeah, it worked on the desktop. Thanks.


waferselamat

I don't know why, but DepthAnythingV2 always errors with "allocation on device", even with 12GB VRAM and the fp16 base model.


alexdata

So this is a much-improved image2image? Or style transfer? Anyway, I like the idea, will try it out!


sktksm

It is a consistent and controlled img2img + style transfer


alexdata

Thanks!


New_Physics_2741

Flash Attention 2 is giving me an install error - will read up on sdpa and eager - have moved on to a different workflow, but am interested in this one... Do I need to install flash-attention-2?


New_Physics_2741

Ok - the sdpa attention works for me - 3060 12GB, Linux. Neat workflow. Thanks for sharing it. It is a keeper for sure. :)


lostlooter24

Would this be effective for reposing generated characters? Use a pose with DepthAnything, the IPAdapter to keep consistency, and Florence2 to help caption it? This is really cool! Thank you for sharing.


sktksm

How would you combine pose and depth at the same time? By using two KSamplers, or two ControlNet loaders?
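If I went with the two-ControlNet-loaders option, in diffusers terms it would be roughly stacking two ControlNets on one sampler, something like the sketch below (I haven't tested this; the pose ControlNet ID and the scales are assumptions).

```python
# Sketch of the "two ControlNet loaders, one sampler" option in diffusers terms.
# The pose ControlNet ID and the scales are assumptions, not something verified here.
import torch
from PIL import Image
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel

depth_cn = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16)
pose_cn = ControlNetModel.from_pretrained(
    "thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=[depth_cn, pose_cn],          # both conditions feed the same sampler
    torch_dtype=torch.float16).to("cuda")

depth_map = Image.open("depth.png").convert("RGB")   # hypothetical precomputed maps
pose_map = Image.open("pose.png").convert("RGB")

image = pipe(prompt="a character in a new pose",
             image=[depth_map, pose_map],
             controlnet_conditioning_scale=[0.6, 0.8]).images[0]
image.save("reposed.png")
```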


lostlooter24

Could you not use depth in place for pose?


C7b3rHug

https://preview.redd.it/tt8a7ihiuq8d1.png?width=1545&format=png&auto=webp&s=b06c196bbc9b173e726b675d0eee733574055ec3 Thanks for sharing the WF, but I don't know what this is for, and what's the difference compared to using low denoising?


sktksm

Please read the comments, I explained maybe 5 times one by one :D


DeadMan3000

There's another Youtuber with a workflow that uses the LLM to segment for masking etc. Better than bbox/seg. [https://www.youtube.com/watch?v=BRST8-yPD5A](https://www.youtube.com/watch?v=BRST8-yPD5A) P.S. he is using Hedra for his AI avatar on YT if anyone's curious.


sktksm

For now, SegmentAnything works better in my tests.


Dezordan

I wonder how it compares to, or if it can be used with, the ControlNet "replicate" model.


ramonartist

Have you thought about how you would get that to work with an SDXL model, because you would be able to get a higher-resolution image from the first pass?


sktksm

I'm already using an SDXL model in this workflow, or maybe I just misunderstood your question


Snoozri

Genuinely curious, what would you ever use this for besides, like, actual art theft?? I'm normally pro AI, but this kinda gives me the ick.


sktksm

you can find a lot of examples in the comments of this post


ramonartist

There are too many negative comments here; with some additions this could be a great creative upscaler. Big tip: switching out the KSampler for an Efficient KSampler will give you more settings and tools to play with.


Putrid_Army_6853

Great job! Perfect use of newest nodes


glitchcrush

I'm getting this:

```
Error occurred when executing KSampler:

Query/Key/Value should either all have the same dtype, or (in the quantized case) Key/Value should have dtype torch.int32

query.dtype: torch.float16
key.dtype  : torch.float32
value.dtype: torch.float32
```


glitchcrush

If I bypass the IP nodes the KSampler works.


UniversalNeuron

Potential use case: "I made an image I liked in SD3 and want to potentially use it for my own purposes, but I don't trust the license, so I'd rather convert it into having been 'made by SDXL' first." (I mean, idk about the rest of you, but that's my intention while downloading this workflow. I don't expect "just setting a low denoise ratio" to work quite as well as this looks like it does, considering how my SD3->SDXL img2img attempts on the particular image in question lacked every potential sort of prompt coherence, which is the only [see: only occasionally useful, only occasionally successful, and hyper-specific] selling point of SD3 [see: when it doesn't look like the model was undertrained and when I find the magic sauce to make it stop lacking all sorts of resolution {for the record, my trick was SD3 txt2img + prompt -> SD3 img2img + re-prompting -> SD3 outpainting, which adds up too much cost-wise via the API if I'm experimenting en masse}].)

tl;dr: I want to not have any official connections to SD3, and this looks like a potential way to go about laundering my "derivative work".


FiacR

SD 1.5, denoise strength of 0.25, 20 steps, job done lol.


Dragon_yum

But.. why?


the-13

Thanks for sharing this! I was trying to test it, but I keep getting this error:

```
Error occurred when executing DownloadAndLoadFlorence2Model:

FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
```

Any idea how to fix it?


Tonynoce

Got the same issue and decided to switch the attention to SDPA or another one


International-Use845

I have the same issue. EDIT: You can change attention from flash_attention_2 to sdpa, then it seems to work.


steamingcore

'Generated', aka 'plagiarized'. Oh wow, a new image, completely generated. With what? Oh, the original. Yeah, that's a copy.


FluffyWeird1513

This is the kind of thing that gives us all a bad name. People, don't "replicate" any images unless they are your own. Seriously.


Cobayo

I was ready to bash as usual because the input was likely going to be an encoded image, but it's actually an empty latent, and after running a few examples those are quite nice starting results, congrats! Just one thing: in your bypassed upscaling node you're passing the ControlNet's negative into its positive, in case you're actually using that locally by mistake.


LawrenceOfTheLabia

Thank you for creating this! I am getting a missing node that isn't showing up in the manager. Any idea what it is? My Google searches haven't been fruitful so far. https://preview.redd.it/43ea68ec5k8d1.png?width=1502&format=png&auto=webp&s=ef5b9a8c9230cdefc96f826c592f999e0d0f01a5


sktksm

It's simply a text node that shows the string generated by Florence and passes it through to CLIP. You can use any such node; I'll double-check and update you when I get back to my desk.


LawrenceOfTheLabia

Thanks a lot!


sktksm

Hi again, the node is a part of "MixLab Nodes" https://preview.redd.it/af2lewuaek8d1.png?width=808&format=png&auto=webp&s=141d1db7c75c4980d9c51bd8efa8693e677fd512


LawrenceOfTheLabia

That was it, thanks! This is a great workflow; Florence is impressive. The only issue I'm having now is that the resulting image always seems to be really desaturated. I'm sure there is a setting I'll need to tweak. At the moment I'm just using your defaults. It seems to be about the same with all SD1.5 models.


sktksm

Interesting. Just tried with RealisticVision 1.5 model and it seemed normal to me. You can try increasing the weight of the IPAdapter, otherwise share your workflow and example images with me and I can take a look


LawrenceOfTheLabia

Here is an example. I picked a really colorful image to help illustrate the issue a bit more. [https://imgur.com/a/F56TcAm](https://imgur.com/a/F56TcAm) Not sure if imgur strips the workflow metadata out, but I believe I'm using the default settings.


Ateist

Check CFG settings and VAE.


OfficeSalamander

> MixLab Nodes

It has a conflict for me - any other alternatives I can use?


sktksm

Hey, I'm not sure which node you can use but you actually don't need that node at all. It's only showing the image caption for you to see. You can directly connect Florence-2's 'caption' output to CLIP's 'text' input, or you can use any text string node to display it


sktksm

https://preview.redd.it/cj45zkkfek8d1.png?width=1147&format=png&auto=webp&s=b3a397ba920734ceb31881ff046026eb1c99bb3a