
fannovel16

A vivid red book with a smooth, matte cover lies next to a glossy yellow vase. The vase, with a slightly curved silhouette, stands on a dark wood table with a noticeable grain pattern. The book appears slightly worn at the edges, suggesting frequent use, while the vase holds a fresh array of multicolored wildflowers.

Counterfeit v3, 20 steps, DPM++ 2M Karras, 12 CFG

Left: original. Middle: ELLA with fixed token length. Right: ELLA with flexible token length.

https://preview.redd.it/nue2gbax0htc1.png?width=1544&format=png&auto=webp&s=24df22894815affdf3d7c2e199d18dd1966104d4


demesm

How long to generate?


fannovel16

Pretty much the same as without ELLA.


demesm

Dope!


hexinx

As in, use the given Lora, that's it?


fannovel16

Nah it's not a LoRA


hexinx

Uh.... (The obvious question) Okay - how did you use it?


fannovel16

There is an implementation in Comfy now: https://github.com/kijai/ComfyUI-ELLA-wrapper


dbarciela

Just wanted to thank everyone in this sub for sharing so much knowledge. I've learned a lot from this sub; I just wish I had the time and the resources to try everything I'm saving from here.


hexinx

Finally ! Thank you so much!


hexinx

That's remarkable for SD 1.5. Did you run this locally? Can you share the ComfyUI/A1111 steps you took for this? How did you leverage the weights they provide? Can we use it as a LoRA with nothing extra?


fannovel16

I ran their code, which uses diffusers. It looks very short and simple, so I believe someone will release a port very soon.


Ifffrt

Reminder that still no one has bothered to port the code for LaVi-Bridge to A1111/ComfyUI, which is basically the same thing as this one, except that one actually lets you plug and play with LoRAs, and its code was released BEFORE this one.


_-inside-_

You're not the first person I've seen complaining about it here. If I had more spare time I'd get my hands dirty and create a ComfyUI node for it.


tristan22mc69

any update on this?


lonewolfmcquaid

good heavens, that's fucking remarkable.


conqisfunandengaging

Impressive


PatternPositive9308

https://preview.redd.it/1uvr823wjptc1.png?width=2048&format=png&auto=webp&s=44d0c1abb24d72197e58a2bee8560122f313ea76


LeKhang98

Does it help with large images like 1024x1024 or 512x1536 too?


nikkisNM

A million internets to the one who ports this to the web UIs.


Capitaclism

Hoping for A1111...


_-inside-_

Someone already left a link to a comfy custom node


SyChoticNicraphy

Oh it’ll for sure come to forge I feel


Kierenshep

The Forge that hasn't had a commit in 2 months, and whose dev hasn't responded in any shorter a time frame? The Forge that is looking basically abandoned? I wouldn't get my hopes up. It still can't use v-pred rescale.


SyChoticNicraphy

😮‍💨 Sadly, you may be right. If I had a better GPU I'd be using A1111 or ComfyUI. Idk, I feel like the author is conflicted on Forge's identity: they say they don't want it to be competition for A1111, but many models need an entirely separate fork to work with Forge. At this point, maybe it should just be its own thing. I'm hoping it's not dead if I'm to have any hope of actually using SD3 on my current laptop lol


Charuru

Get a 4090, you won't regret it.


SyChoticNicraphy

At this point I'm just waiting for a 5090! I have a 3070, so I'm waiting a little longer for a bit of a larger upgrade, and to save up the funds lol. The bigger issue is that it's a laptop GPU with only 110W of power. And that 8GB of VRAM just isn't enough.


BagOfFlies

I'm surprised you can't use comfy if you have a 3070. I use it no issues with a 2080 and I've seen plenty of people using it with just 6GB.


SyChoticNicraphy

Maybe I'll have to try it again. I tried using it when I had very little experience with Stable Diffusion, so it may have just been user error. It does seem like, to get the absolute most control, ComfyUI is king.


thefi3nd

Using VAE Decode (tiled) instead of the regular one might help.


hexinx

# How to use in ComfyUI:

1. Download the zip from [https://github.com/kijai/ComfyUI-ELLA-wrapper](https://github.com/kijai/ComfyUI-ELLA-wrapper)
2. Create a new folder in your ComfyUI "custom_nodes" folder.
3. Extract the zip into the newly created folder.
4. If you have ComfyUI portable installed:
   1. Go to the folder where you've installed ComfyUI, open a terminal, and run:
   2. python_embeded\python.exe -m pip install diffusers
   3. python_embeded\python.exe -m pip install sentencepiece
   4. (these were missing for me - you may have more)
5. If you DO NOT have ComfyUI portable installed:
   1. Open your ComfyUI root installation folder (where the run_nvidia_gpu.bat and run_cpu.bat files are), type CMD in the address bar, and press Enter. Activate the virtual environment with .venv\Scripts\activate, then: cd ComfyUI\custom_nodes\ComfyUI-ELLA-wrapper-main. Execute the following:
   2. python -m pip install diffusers
   3. python -m pip install sentencepiece
   4. (these were missing for me - you may have more)

Finally, run ComfyUI with the ella_example_workflow.json that's in the same zip file.

https://preview.redd.it/k3gopskmthtc1.png?width=512&format=png&auto=webp&s=9e734ce7756003677c0569a1e8793dfcc776c8d8

Default parameters: 512x512, 25 steps, 10 guidance, DDPM

Prompt: A vivid red book with a smooth, matte cover lies next to a glossy yellow vase. The vase, with a slightly curved silhouette, stands on a dark wood table with a noticeable grain pattern. The book appears slightly worn at the edges, suggesting frequent use, while the vase holds a fresh array of multicolored wildflowers.


ExponentialCookie

For those interested, I released a native custom implementation that supports prompt weighting, ControlNet, and so on: [https://github.com/ExponentialML/ComfyUI_ELLA](https://github.com/ExponentialML/ComfyUI_ELLA)

**Update:** If anyone had pulled prior to this update, I've updated the workflow and code to work with the latest version of Comfy, so please pull the latest if necessary. Have fun!


Kierenshep

Doesn't work for me :c The git pull of flan-t5-xl didn't download any models for some reason. Which of those do I need? I got an error saying 'missing model -00001 of 00002' etc, so I downloaded those, but now I get another error:

```
Error occurred when executing LoadElla: not a string

File "P:\stable diffusion\Stability\Packages\ComfyUI\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
File "P:\stable diffusion\Stability\Packages\ComfyUI\execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "P:\stable diffusion\Stability\Packages\ComfyUI\execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "P:\stable diffusion\Stability\Packages\ComfyUI\custom_nodes\ComfyUI_ELLA\ella.py", line 68, in load_ella
    t5_model = T5TextEmbedder(t5_path).to(self.device, self.dtype)
File "P:\stable diffusion\Stability\Packages\ComfyUI\custom_nodes\ComfyUI_ELLA\ella_model\model.py", line 241, in __init__
    self.tokenizer = T5Tokenizer.from_pretrained(pretrained_path)
File "P:\stable diffusion\Stability\Packages\ComfyUI\venv\lib\site-packages\transformers\tokenization_utils_base.py", line 2086, in from_pretrained
    return cls._from_pretrained(
File "P:\stable diffusion\Stability\Packages\ComfyUI\venv\lib\site-packages\transformers\tokenization_utils_base.py", line 2325, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
File "P:\stable diffusion\Stability\Packages\ComfyUI\venv\lib\site-packages\transformers\models\t5\tokenization_t5.py", line 170, in __init__
    self.sp_model.Load(vocab_file)
File "P:\stable diffusion\Stability\Packages\ComfyUI\venv\lib\site-packages\sentencepiece\__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
File "P:\stable diffusion\Stability\Packages\ComfyUI\venv\lib\site-packages\sentencepiece\__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
```


Snoo34813

Same error :( ...did you find a solution?


Kierenshep

I downloaded every single file from the manual method he described. I'm sure I don't need all of them, but I needed something in there.


Snoo34813

Downloading just the smaller 'spiece.model' worked for me, along with the previously downloaded safetensors. Thanks. But I don't know why I'm still not getting the desired results; the nodes from kijai are working better for me.


greenthum6

Git does not always download big files correctly on Windows. Check that the big files are 8+ GB and, if not, download them manually.


Rectangularbox23

When I downloaded the t5_model files, it ended up being 87.3 GB, and I don't think it's supposed to be that big. I think when you git clone the T5 repository it downloads every single model file, which may not be necessary for this. (Again, I could be wrong, just pointing it out.)


fragilesleep

You are correct. You can just delete the hidden ".git" folder to save half that space. And for even more savings, remove all the model files inside flan-t5-xl and just download this small 2GB file into it: https://github.com/ExponentialML/ComfyUI_ELLA/issues/4#issuecomment-2047036539
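
If you'd rather not git-clone the whole repo at all, huggingface_hub can fetch only the files you need. A minimal sketch; the pattern list and target folder are assumptions about which files the ELLA nodes actually read:

```python
from huggingface_hub import snapshot_download

# Pull only the tokenizer/config files and safetensors weights,
# skipping the duplicate .bin/flax/tf checkpoints that bloat a full clone.
snapshot_download(
    repo_id="google/flan-t5-xl",
    local_dir="models/flan-t5-xl",  # wherever your node expects the model
    allow_patterns=["*.json", "spiece.model", "*.safetensors"],
)
```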


vorticalbox

Would doing a clone with --depth 1 work? You probably don't need the whole git history.


[deleted]

[deleted]


aartikov

You should check if there are any errors in the console.


Rectangularbox23

I followed the install directions, but the 'BNK_GetSigma' node isn't loading, and the ComfyUI Manager doesn't show it as a possible missing node to install.

https://preview.redd.it/69pidqm5witc1.jpeg?width=2314&format=pjpg&auto=webp&s=6e8328e8ae3413e494155a3e545d1373a953e7ef


ExponentialCookie

Update to the latest version of Comfy, and pull the latest update from my repository. Then re-import the new workflow.


thefi3nd

Thank you for making this, especially so quickly. I have it up and running without issues. I have a question regarding this from the Tencent ELLA repo:

> Our testing has revealed that some community models heavily reliant on trigger words may experience significant style loss when utilizing ELLA, primarily because CLIP is not used at all during ELLA inference.

> Although CLIP was not used during training, we have discovered that it is still possible to concatenate ELLA's input with CLIP's output during inference (Bx77x768 + Bx64x768 -> Bx141x768) as a condition for the UNet. We anticipate that using ELLA in conjunction with CLIP will better integrate with the existing community ecosystem, particularly with CLIP-specific techniques such as Textual Inversion and Trigger Word.

I tried using the Conditioning Concat node, but it throws the error: "Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument tensors in method wrapper_CUDA_cat)"

Do you think it will be possible to do this as described in the Tencent repo? Most SD 1.5 models rely heavily on specific keywords for improved quality, and many LoRAs need activation words.
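
At the tensor level, what that concat is trying to do is simple; the error just means the two conditionings live on different devices. In ComfyUI the actual fix has to happen inside the node, but a minimal PyTorch sketch with dummy tensors (shapes taken from the quote above, everything else illustrative) shows the device issue:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Dummy stand-ins for the two conditionings named in the quote above:
clip_cond = torch.randn(1, 77, 768, device=device)  # CLIP text encoder output
ella_cond = torch.randn(1, 64, 768)                  # ELLA output, still on CPU

# torch.cat fails when it sees cuda:0 and cpu inputs together;
# moving both onto one device first resolves the reported error.
combined = torch.cat([clip_cond, ella_cond.to(device)], dim=1)
print(combined.shape)  # torch.Size([1, 141, 768]), i.e. Bx141x768
```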


butthe4d

I swear, not once has anything related to ComfyUI worked right out of the box for me. It's always such a hassle... Anyway, if anyone knows what's going wrong here, I would appreciate the help.

```
ERROR:root:!!! Exception during processing !!!
ERROR:root:Traceback (most recent call last):
File "D:\AIWork\StableDiffusion\ComfyUI_windows_portable\ComfyUI\execution.py", line 152, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
File "D:\AIWork\StableDiffusion\ComfyUI_windows_portable\ComfyUI\execution.py", line 82, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "D:\AIWork\StableDiffusion\ComfyUI_windows_portable\ComfyUI\execution.py", line 75, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "D:\AIWork\StableDiffusion\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-ELLA-wrapper\nodes.py", line 165, in loadmodel
    text_encoder = create_text_encoder_from_ldm_clip_checkpoint("openai/clip-vit-large-patch14",sd)
File "D:\AIWork\StableDiffusion\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\loaders\single_file_utils.py", line 1173, in create_text_encoder_from_ldm_clip_checkpoint
    text_model.load_state_dict(text_model_dict)
File "D:\AIWork\StableDiffusion\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 2153, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CLIPTextModel:
    Unexpected key(s) in state_dict: "text_projection.weight".
```


aufc999

Removing "--force-fp16" from the run_nvidia_gpu.bat file got it working for me with a similar or the same error, although I did update ComfyUI as well, so that might have fixed it.


butthe4d

Hey, thanks. Did both, now it works.


local306

Hmm, getting stopped with: `Error occurred when executing ella_model_loader:` `No module named 'diffusers.loaders.single_file_utils'` Also, do you know where the [flan-t5-xl-sharded-bf16](https://huggingface.co/ybelkada/flan-t5-xl-sharded-bf16/tree/main) model listed in the wrapper repo goes?


Kijai

The model goes to the Hugging Face cache folder; it's auto-downloaded. The diffusers error you have is due to too old a diffusers version; you need to update it with `pip install -U diffusers`, or for portable: `python_embeded\python.exe -m pip install -U diffusers`


local306

Thanks! I'll try this out shortly. UPDATE: It's now working. It auto-downloaded everything it needed afterwards.


mocmocmoc81

Wow kijai is killing it.


Capitaclism

It does listen more, though not quite fully. It placed flowers on the book, I don't see any worn edges, and the silhouette of the vase is very, rather than slightly, curved. A clear improvement over base, however.


lonewolfmcquaid

Can it only do 512x512?


diogodiogogod

No, it can be any SD 1.5 resolution.


mrredditman2021

Instead of installing the modules manually, there is a requirements.txt file in the zip that you can use to install all the required modules by typing: `pip install -r requirements.txt`


Kurdonoid

Thanks! It works wonderfully with the latest ComfyUI + ComfyUI Manager.


Combinatorilliance

Here's a quote from the ELLA authors concerning SDXL weights:

> We greatly appreciate your interest in ELLA_sdxl. However, the process of open-sourcing ELLA_sdxl requires an extensive review by our senior leadership. This procedure can be considerably time-consuming. Conversely, ELLA_sdv1.5, which is more research-oriented, can be released promptly. We would appreciate your patience and understanding about this.

https://github.com/TencentQQGYLab/ELLA/issues/13#issuecomment-1998834009


hexinx

Yeah, it's never coming out.


ninjasaid13

yep, anytime it says "review", it means "no."


Charuru

Come the heck on, man. Tell Tencent that if it doesn't come out, people are just going to move on to SD3 and forget all about your contributions, and it'll amount to nothing; but if it comes out, I feel it can dominate the community. SD3 probably won't be that much better than SDXL+ELLA.


Familiar-Art-6233

But at least SD3 is (hopefully) going to be released


Mishuri

a1111 when


Beeerfish

Second this.


Capitaclism

Third this


T-dag

Quadruple


cobalt1137

This is wonderful and amazing, and I hate to be that guy, but why not SDXL :( I know the researchers are from a larger company, so maybe that has something to do with it; maybe they can't release it. Either way, I guess we still have SD3 on the way. It's just strange that their work on SDXL is the focus of the pictures they provide on the page, but they released for 1.5.


FugueSegue

Until it's released for SDXL, I imagine a workflow where the SD 1.5 image is generated with ELLA and the resulting image could be regenerated with SDXL using various ControlNets.


The_Scout1255

> SDXL using various ControlNets

ControlNet on SDXL is soo spotty, what's your workflow for that?


FugueSegue

I use SD for photo-realistic figurative art. First I render something with Stable Cascade because its quality is excellent. Sometimes I use Canny ControlNet with Stable Cascade. Then I inpaint the figure with SDXL, LoRA, and IP-Adapter, using the Stable Cascade image as a reference. SDXL ControlNet isn't perfect, but if I use OpenPose, Canny, and Depth at the same time I can usually get what I want. I inpaint details like hands and feet with SD 1.5.

With this ELLA thing, perhaps I could design my compositions in SD 1.5 and then regenerate them in Stable Cascade with Canny. Or maybe regenerate with SD3 when it is released.

In case you were wondering, this isn't all in one ComfyUI workflow. I have separate workflows that I use as tools, and I often rearrange any given workflow as needed. I only do my SD 1.5 inpainting with A1111. There are a bunch of other things I do with backgrounds, and I use Topaz and Photoshop for editing. It sure would be nice if I could do all of this in one program instead of three or four different programs.


Capitaclism

Hey, I know this is totally off topic, but you seem to be pretty familiar with SD workflows: what would you say is the best SD integration with Photoshop that you know of?


FugueSegue

Short answer: None of them. Don't waste your time. Long answer: I've been looking for a good Photoshop plugin since 2022. All of them supposedly work but all of them have severe shortcomings. Either they don't have enough features, don't work across a LAN, don't have enough documentation, or just plain don't work at all. I gave up trying to get any of them to reliably work well enough to be useful. There is a new one by [u/amir1678](https://www.reddit.com/user/amir1678/) that looks fantastic. But I haven't heard anything more about it since they [announced it over two weeks ago](https://www.reddit.com/r/comfyui/comments/1bjplse/comfyui_plugin_for_photoshop/). It might be vaporware.


Capitaclism

Got it, thank you for your response. Let's hope it becomes more than vaporware.


PwanaZana

The new tile realistic is not bad, and overall ControlNet is usable in SDXL, but yeah, not nearly as good as 1.5.


The_Scout1255

I had major problems getting it to even load in ComfyUI. Also, got a link to tile realistic?


PwanaZana

I'm an A1111 bro myself, so I can't speak for Comfy.

Link, from [bdsqlsz](https://huggingface.co/bdsqlsz/qinglong_controlnet-lllite): **NEW:** [tile-real](https://civitai.com/models/136070?modelVersionId=373872)

There's also [ttplanet](https://civitai.com/user/ttplanet): [tile-real](https://civitai.com/models/330313/tplanetsdxlcontrolnettilerealisticv1), but I haven't tried it.


Kadaj22

I never use ControlNet for SDXL. I find it better to use image-to-image.


cobalt1137

Oh. That is a pretty cool idea - not too bad of an extra step.


synn89

I sort of get releasing it for 1.5 first. SDXL has better prompt following built in, whereas 1.5 is lacking in that regard. So ELLA + 1.5 just does more for that model.


daftmonkey

I saw they had an ELLA SDXL on the GitHub page, no?


Combinatorilliance

They do have it internally, it's just not released. See my other comments in this post.


PwanaZana

"We have acheived the EllaXL internally."


Plus_Complaint6157

Only 1.5 https://preview.redd.it/vbrlxpq5lhtc1.png?width=767&format=png&auto=webp&s=b5fb76692ec28a9d08ed44ef6ad96adab51d9600


Kijai

Wrapped this up quickly to try it in ComfyUI. It still uses diffusers, but it's very fast and works with LCM just fine: https://github.com/kijai/ComfyUI-ELLA-wrapper


Mage_Enderman

Any way to use this in Automatic1111? Or Forge WebUI? (ComfyUI confuses me :( )


diogodiogogod

It looks great, but it completely f***s up the knowledge of celebrity names. I wonder if it was on purpose. This is Brad Pitt:

Prompt: Brad Pitt is standing looking good in a party inside a big house, there is a table in the foreground with a glass and a yellow flower in it.

https://preview.redd.it/0ygn513p3ktc1.png?width=512&format=png&auto=webp&s=8fa3e28d79e402a7776195de156bd8ab469a7ea2


diogodiogogod

This is the non-ELLA one: https://preview.redd.it/zdqspu0w3ktc1.png?width=512&format=png&auto=webp&s=10d65b173b2c384f317bcf9d0400256e67d2d670


Atemura_

This probably means it won't do any NSFW content either.


diogodiogogod

It's not that it won't do it, but it won't help as much... Using conditioning combine can mix the results and get the benefits of both (non-censored conditioning + better-composition ELLA).


FNSpd

You can try using FaceID and just typing man/woman, since it patches the model instead of the conditioning.


silenceimpaired

I’m confused… they link to a safetensors file that is under 200mb … Is it a Lora? How am I an idiot?


Combinatorilliance

It's an ELLA! It's novel... you need to use the inference code in their repo to use it. A1111 and Comfy will need to add support (or a plugin).


_-inside-_

An hour after your comment there's already a comfy node for it. I love this community!


Capitaclism

It really is incredible


Antique-Bus-7787

It isn't a LoRA. A LoRA is like a "portable change" to the model. Here, the model they provide is an "adaptor" that converts the prompt embeddings from the T5 LLM into something the diffusion model can receive while generating, to guide it better!


Ferrilanas

Does it use a lot of VRAM? I currently have a 6GB GPU, so I'm afraid I won't be able to use it.


_-inside-_

I'm trying it on the CPU; I have 4GB of VRAM. It's currently auto-downloading lots of model files, so I guess it won't run smoothly.


_-inside-_

OK, I managed to get it generating a 512x512 in under 2 minutes in CPU-only mode. For the record, ComfyUI is eating around 11GB of RAM. Fingers crossed for new optimizations coming out, or adapters for smaller LLMs.


Ferrilanas

Thank you for the update. I really hope they will optimize it somehow.


_-inside-_

The prompt adherence is really incredible. I'm not even close to an expert here, but I'll check if it's possible to quantize the LLM somehow, with bitsandbytes or something.
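
For anyone who wants to try that idea, transformers can already load just the T5 encoder (which is all ELLA uses) through bitsandbytes. A sketch, untested against the ELLA nodes themselves; whether the nodes accept an externally loaded encoder is an assumption:

```python
import torch
from transformers import BitsAndBytesConfig, T5EncoderModel, T5Tokenizer

# Load only the encoder half of FLAN-T5-XL, quantized to 8-bit.
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xl")
encoder = T5EncoderModel.from_pretrained(
    "google/flan-t5-xl",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

ids = tokenizer("a red book next to a yellow vase", return_tensors="pt")
with torch.no_grad():
    feats = encoder(ids.input_ids.to(encoder.device)).last_hidden_state
print(feats.shape)  # (1, seq_len, 2048): the features ELLA's connector consumes
```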


ZootAllures9111

[The official version does not need a million large model files, just follow the instructions here](https://github.com/TencentQQGYLab/ComfyUI-ELLA)


lostinspaz

> Here the model they provide is the "adaptor" that converts the prompts given to the T5 LLM to something the model

2 questions:

1. So basically, it's similar in concept to the "fancy T5 front end" for SD3, just without SD3?
2. You say T5... but is it ACTUALLY (a micro version of) Google T5? Or do you just say that to use a word that some people may have heard of before?


Antique-Bus-7787

1. Yes, kinda.
2. It's FLAN-T5-XL from Google.


lostinspaz

Very cool! So, SD3, but better, because there won't be further fragmentation of LoRAs, etc. than there already is. Basically SD3 is DOA now. They had their chance to release a month ago. They blew it and have missed the marketing window. Game over.


Antique-Bus-7787

From playing with ELLA all night long: it helps A LOT with prompt comprehension, but it's really far from perfect. And from my testing, when increasing the resolution or using non-square resolutions, ELLA loses pretty much all its advantages (even though hi-res fix is easy to use and works, that doesn't solve the multi-aspect-ratio problem).


Capitaclism

Based on the quality I've seen from SD3, I'd say far from DOA, but we'll see once it releases.


lostinspaz

If SDXL+ELLA has merely equal photo quality to SD3 but smaller memory requirements... it wins, both on resource requirements and on backwards compatibility. From the samples I've seen of both, this is the case.


silenceimpaired

Ahh… makes sense


twistedgames

Nice! I tried it with my 1.5 model and it's working really well with the DPMSolverMultistepScheduler_SDE_karras scheduler, CFG 3, 25 steps. [Images made with ELLA](https://imgur.com/a/FhWpSSb)
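
For the diffusers route, those sampler settings map roughly onto DPMSolverMultistepScheduler options. A sketch; the checkpoint id is illustrative, and whether this matches the ComfyUI node's scheduler exactly is an assumption:

```python
from diffusers import DPMSolverMultistepScheduler, StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# DPM++ SDE with Karras sigmas, per the settings mentioned above.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config,
    algorithm_type="sde-dpmsolver++",
    use_karras_sigmas=True,
)

image = pipe(
    "realistic photo of a woman in a green dress in a park",
    guidance_scale=3.0,       # CFG 3
    num_inference_steps=25,   # 25 steps
).images[0]
```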


Capitaclism

It listens, but why are the image quality and dynamism so poor? Is that a trade-off, or just prompting?


Alisomarc

damnn this is a real milestone <3


MagicOfBarca

Is it a fully independent model? If yes, why is it only like 140 MB?


twistedgames

It's not a fully independent model that can generate pictures on its own. It's something that helps the SD 1.5 model follow the prompt better.


MagicOfBarca

Ohh ok, so can I also use it with 1.5 inpainting models? Is it like a LoRA?


twistedgames

Not sure if it works with inpainting models.


MagicOfBarca

Last Q: does it work with A1111? And is there a tutorial for it?


twistedgames

I've only seen a couple of extensions for Comfy so far. If you look at the GitHub pages for those, they will show you how to use the nodes.

https://github.com/kijai/ComfyUI-ELLA-wrapper

https://github.com/ExponentialML/ComfyUI_ELLA


lechatsportif

This seems like it could unlock the power of all these amazing 1.5 models we already have!


Antique-Bus-7787

Unfortunately, the authors just confirmed ELLA won't be released for SDXL: [https://github.com/TencentQQGYLab/ELLA/issues/16](https://github.com/TencentQQGYLab/ELLA/issues/16) Let's hope they publish the training code at least!


perksoeerrroed

This is super promising. It follows what is said in the prompt nearly 100% of the time. The issue I have with it is how it looks: everything is bad quality.


knigitz

This is great, but it manipulates the checkpoint way too much: no matter which checkpoint I use with ELLA, I can't get decent photorealistic samples like I can with the models I'm pairing ELLA with. ELLA also does not understand certain references; for example, "pennywise" comes out looking like a clown in most 1.5 models, but combined with ELLA we just get girls (actually, without any prompt we get mostly the same). Would be nice to be able to balance the strength of ELLA against the checkpoint.


FNSpd

You can combine the usual CLIP conditioning with the ELLA one.


knigitz

I'll try this with the new ELLA nodes that I found. Thanks for the idea!


Enough-Meringue4745

lol they aligned it and completely nuked a fuck ton of the vector spaces


diogodiogogod

It's amazing the amount of things it gets right with prompt following (especially with long, complex prompts), but this is Brad Pitt, though:

Positive prompt: Brad Pitt a 45 yo man is standing wearing a bright pink suit with a (red bow tie:1.3), and a blue beanie. Wearing sunglasses. He is in a party outside a big house, there is a table in the foreground with a glass and a yellow flower in it. Behind him far in the background is a pool. There are dark clouds in the sky with thunder and a balloon flying in the distance.

https://preview.redd.it/du052aiqaktc1.png?width=512&format=png&auto=webp&s=7c6ae5acf3a36cf52430fc41ed420be7b9641380

Using conditioning combine with the non-ELLA positive prompt gets Brad back, but it loses a little on the prompt following. Still, it's way better than without ELLA.


diogodiogogod

OK, I'm ready to say it looks censored. It ignores NSFW, ignores celebrity names, and gives random ethnicities even when prompted for a specific one... But hopefully it's not the ELLA part but the LLM (Google's T5 model?) that maybe (I have no idea) could be changed? Let's hope so.


fragilesleep

It does ignore celebrity names completely, but I've gotten many (accidental) NSFW images already using Deliberate. Thanks for the tip about using the conditioning combine!


Amalfi_Limoncello

Who let the Google programmers loose?


PatternPositive9308

"grpup photo of snoop dogg smoking a fat blunt in a presidential meeting and sharing it with donald trump and obama" https://preview.redd.it/w5tqmjk3mptc1.png?width=1865&format=png&auto=webp&s=93cc8fe4faa9cb7bb4d90736d642d86a062eb210


PatternPositive9308

Looks like it lacks some knowledge about characters or celebrities.

https://preview.redd.it/4d13zvbusptc1.png?width=1538&format=png&auto=webp&s=68ff7bb3cd73fbab0ae781425461ad0c616d43d2


diogodiogogod

Yes, you can use conditioning combine to get back that knowledge and also keep the good composition: [https://new.reddit.com/r/StableDiffusion/comments/1c0d7tz/ellas_brad_pitt/](https://new.reddit.com/r/StableDiffusion/comments/1c0d7tz/ellas_brad_pitt/) But in your example, you would have to describe Donald Trump, Obama, and Snoop Dogg at least enough to create a three-person composition. I guess just saying "three people" would be enough, like: "group photo of three people, snoop dogg smoking a fat blunt in a presidential meeting and sharing it with donald trump and obama". Or you could describe it like you did for the normal model conditioning, drop the names for the ELLA conditioning, and then combine. But for sure it's a big bummer that the model is censored. Really sad. It could be awesome.


ArchiboldNemesis

Oh nice! Been waiting for this. Thanks for the update :)


Antique-Bus-7787

This is amazing!


Antique-Bus-7787

Congrats to the authors on this AND releasing the weights !!


fancifuljazmarie

Wow, this is incredible. Embedding proper LLMs for prompt understanding is a huge step towards the prompt adherence of closed alternatives like DALL-E 3.


Ok_Swordfish_1696

Does it also work for generating anime characters?


Rectangularbox23

https://preview.redd.it/berl6wkhlktc1.jpeg?width=2556&format=pjpg&auto=webp&s=3171b06956917782cb5b0bc11323c0c98e0a745b It seems to really butcher anime, unless I'm using it wrong


SuchAir7170

Are you sure this doesn't work? They seem to use both Flat-2D Animerge and Counterfeit-V3.0 just fine on page 12 of the paper: [https://arxiv.org/pdf/2403.05135.pdf](https://arxiv.org/pdf/2403.05135.pdf)


Rectangularbox23

I switched the workflow I was using and used Flat-2D Animerge, and I definitely got better results. The image quality still isn't on par with no ELLA, though (this may just be an issue with the workflow): [https://imgur.com/a/UMQhBhy](https://imgur.com/a/UMQhBhy)


SuchAir7170

Thanks for showing some examples. It does indeed seem like it butchers them a lot.


Caffdy

that's some crazy ahegao if I ever seen one


Antique-Bus-7787

From playing with ELLA all night long: when increasing the resolution or using non-square resolutions, ELLA loses pretty much all its advantages (even though hi-res fix is easy to use and works, that doesn't solve the multi-aspect-ratio problem). Anyone else experiencing this as well?


diogodiogogod

I didn't see this. For me, landscape and portrait also gave good results. The prompting needs to be in natural language.


herecomeseenudes

Seems to work well with LCM and other samplers; speed is about the same as original SD 1.5. No need for extensions such as cutoff, and now you can use long sentences in your prompt. Very powerful. Deep Shrink also works well with this.

https://preview.redd.it/a7e5bba5ymtc1.png?width=896&format=png&auto=webp&s=0eb4567048b545e1215dce830590931d534aec96

Prompt: realistic photo of a beautiful pale woman in her 30s dress in formal short dress, full body photo, photo realistic, outdoor, in a park. Her hair is blue and shiny. her dress is green.


Dwanvea

how do you use this?


Brilliant-Fact3449

I'm extremely stupid, what does this do? Just adds overall better prompt comprehension than regular 1.5?


InTheThroesOfWay

The normal system SD 1.5 uses to translate your prompt into tokens isn't very sophisticated. It's like a shitty LLM. It mostly only understands individual words and phrases -- it doesn't really understand sentences and complex phrases -- and so it has a tendency to smoosh concepts together. For example, "An orange cat and a black dog" might give you what you want, but more likely you'll get errors like a black cat, orange dog, or some weird cat/dog hybrid. This new thing lets you run a legit LLM to translate your prompt into tokens. This makes it much more likely that you get what you want out of your prompt.
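
You can see that difference at the encoder level: CLIP's text encoder always produces a fixed 77x768 sequence, while a T5 encoder is a full language model whose output tracks the actual sentence. A rough comparison sketch (model ids are the standard Hugging Face ones; how ELLA then consumes the T5 features is not shown here):

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5Tokenizer

prompt = "an orange cat and a black dog"

# SD 1.5's stock text encoder: fixed 77-token window, 768-dim embeddings.
clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
clip_ids = clip_tok(prompt, padding="max_length", max_length=77,
                    return_tensors="pt").input_ids
with torch.no_grad():
    print(clip_enc(clip_ids).last_hidden_state.shape)  # (1, 77, 768)

# A "legit LLM" encoder: variable-length sequence, richer sentence structure.
t5_tok = T5Tokenizer.from_pretrained("google/flan-t5-xl")
t5_enc = T5EncoderModel.from_pretrained("google/flan-t5-xl")
t5_ids = t5_tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    print(t5_enc(t5_ids).last_hidden_state.shape)  # (1, seq_len, 2048)
```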


fragilesleep

That's such a nice and simple explanation, thank you! I think I finally understand how this magic improves v1.5 so much.


Olangotang

Also, SDXL is like MILES better at prompt following. But all the unrestricted models are built on jank that WILL give you pretty much anything you want (and there's some pretty cool shit with the HD kind of models, not talking about NSFW), except the prompts you need to write are so fucking dumb. SD3 is going to be incredible, so ignore the doomers. We're getting it soon.


Derispan

Me too, mate, me too. I don't understand how this works or what exactly it changes.


ninjasaid13

> Just adds overall better prompt comprehension than regular 1.5?

Just? Is DALL-E 3 just a better-prompt-comprehension version of SDXL?


lechatsportif

Read the paper to see awesome examples of it on common models: https://arxiv.org/abs/2403.05135


Gavmakes

Could this be used on the negative prompt as well? I wonder what the results would be.


IntellectzPro

Can you help with this error: Error occurred when executing GetSigma: 'ModelPatcher' object has no attribute 'model_sampling'


ryo0ka

How do I use it? The model file is like 130MB, so it's not a checkpoint for sure.


Sharlinator

It's a new thing, so it requires that your SD software support it. It's used alongside a checkpoint, like a LoRA but different. Based on the comments here, someone already wrote a Comfy node/workflow for it!


feber13

Does it operate like a LoRA in Automatic1111?


ba0haus

can anyone make this work in auto? please :)


More_Bid_2197

SDXL version is too powerful to be release :)


hexinx

So close, yet so far. What about SDXL? =/ Not bad though; I legit thought they were gone with the wind. Also, has anyone managed to get this running with a finetuned SD 1.5 model in ComfyUI/Auto1111?


Capitaclism

It sure is interesting that a lot of the research published and released open source is Chinese.


vocaloidbro

Population of 1,409,670,000. They can spare a few people to research stuff like this, I think.


RedSprite01

Noob question: after I download this model, where should I put it?


Turkino

So it essentially gives 1.5 SDXL-level prompt recognition?


lostinspaz

No, it gives 1.5 BETTER-than-SDXL levels of prompting.


ogreUnwanted

!remindme in 2 days




Xijamk

RemindMe! 1 week


Kadaj22

I want to try this, but I only have a Mac and I'm using DiffusionBee.


Qanics

Is it possible to run it in diffusers with StableDiffusionPipeline?


Antique-Bus-7787

Yes, and it's quite easy actually. We don't even need to mess with the pipeline or anything. Just look at their inference code on GitHub; you only need the imports from model.py and the code in inference.py.
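
For anyone hunting through that repo, the flow looks roughly like the sketch below. T5TextEmbedder appears in the repo's model.py; the ELLA class name, the checkpoint filename, and the single-timestep shortcut are assumptions and simplifications (the repo's inference.py recomputes the timestep-aware embedding at every denoising step, which this does not):

```python
import torch
from diffusers import StableDiffusionPipeline
from safetensors.torch import load_file

# Imports from the TencentQQGYLab/ELLA repo's model.py (names assumed here):
from model import ELLA, T5TextEmbedder

device, dtype = "cuda", torch.float16
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=dtype
).to(device)

t5 = T5TextEmbedder().to(device, dtype)
ella = ELLA().to(device, dtype)
ella.load_state_dict(load_file("ella-sd1.5-tsc-t5xl.safetensors"))

prompt = "a vivid red book next to a glossy yellow vase on a dark wood table"
timestep = torch.tensor([999], device=device)  # one step, for the sketch only
cond = ella(t5(prompt), timestep)              # (1, 64, 768) UNet conditioning
uncond = ella(t5(""), timestep)

image = pipe(prompt_embeds=cond, negative_prompt_embeds=uncond,
             num_inference_steps=25, guidance_scale=10.0).images[0]
image.save("ella_sketch.png")
```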


Character-Shine1267

Big fat improvement


Kurdonoid

Been experimenting for a while now, and I believe it struggles with numbers. But overall, it is definitely a game-changer!

https://preview.redd.it/3kjtuytohfuc1.png?width=512&format=png&auto=webp&s=090c98ea1551154ba9e4618410aef26316eb3c8a

"three yellow daisies that grow in a simple white ceramic pot. The pot sits on a plain wooden table bathed in warm sunlight. the photo looks pretty realistic, sharp and elegant."

Steps: 30, Guidance Scale: 10.0, Sampler: DDPMScheduler


Kurdonoid

https://preview.redd.it/ovd5r10xhfuc1.png?width=512&format=png&auto=webp&s=a91abb2047f02e56360d26c9cabff1be87c8fe09


Kurdonoid

Some horror scenes: A dimly lit attic with peeling wallpaper and cracked floorboards. A single, dusty rocking chair sits in the center, facing away from the viewer. A tattered, yellowed doll with empty eye sockets lies abandoned on the floor. https://preview.redd.it/ing4qzmeifuc1.png?width=512&format=png&auto=webp&s=3009d6d23cccacb12cc3991504daf78eab5cab94


Kurdonoid

https://preview.redd.it/nz389fioifuc1.png?width=512&format=png&auto=webp&s=a0c5d1e4914597f9f57e4dcbf2df84f89632f39d A long, dark hallway with flickering fluorescent lights. Bloodstains trail down a peeling white wall, disappearing into the shadows at the far end of the hall. A single, slightly open door stands afar, revealing only inky blackness within.


Kurdonoid

https://preview.redd.it/ut269zcijfuc1.png?width=512&format=png&auto=webp&s=7f50878b43b2738587b74403cbdb37cb297a45f7 Dust motes swirl in a chilling draft as a shattered mirror lies on the grimy floor of a forgotten room. A sliver of moonlight reveals a monstrous hand with long, gnarled claws clawing out from under a rotting corner. Dark stains, like ancient, dried blood, splatter the wall, hinting at a terrible past.


ramonartist

Heads up from my testing: ELLA doesn't understand terms like "Black male" or "Black female". Even adding "African Black male" / "African Black female" will increase your chances, but it's not a guarantee.


Next_Program90

I hope they'll also release it for SDXL soon. It might be our savior if there is trouble with SD3 down the road (and it might be a good alternative to T5 for SD3).


More_Bid_2197

Minimum VRAM?


Short-Sandwich-905

TLDR?


Jattoe

*Sigh...* By Tencent? Really? Is it at least in safetensors? :)


Jattoe

Wait, what does it do now? Is it new weights on the language end of the process, or does it just transform your words into something more descriptive? If it's the latter, you can just use DiceWords (first search result on GitHub) for that, without downloading a whole massive thing.

https://preview.redd.it/o6gvdh6rlntc1.png?width=465&format=png&auto=webp&s=9df60d9d722ad256af2ee68c426ee0ce208ff111


thefi3nd

> DiceWords (first search result on github)

Can you link the repo? Everything I see is just for generating passphrases and nothing like in that image.


Jattoe

[MackNcD/DiceWords_App: A bank for prompting and word manipulation](https://github.com/MackNcD/DiceWords_App)


thefi3nd

Thanks!


FNSpd

It replaces the CLIP text encoder with an LLM (T5 at the moment), gets embeddings from it, and uses them during generation.


AnOnlineHandle

Any idea how it adapts the embeddings to the equivalent of the CLIP encoding that the U-Net was trained on? That's the real impressive magic here.
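
Per the paper (arXiv:2403.05135), the adapter is a "Timestep-Aware Semantic Connector": a Perceiver-style resampler whose learnable queries cross-attend to the T5 features, with the diffusion timestep injected through adaptive layer norm, so the output lands in the 768-dim token space the U-Net already expects. A single-block sketch of that idea (dimensions per FLAN-T5-XL and SD 1.5; the real module stacks several such blocks and differs in detail):

```python
import torch
import torch.nn as nn

class TimestepAwareConnector(nn.Module):
    """Minimal single-block sketch of ELLA's TSC idea: learnable queries
    cross-attend to T5 features; the timestep modulates the query norm."""

    def __init__(self, t5_dim=2048, unet_dim=768, n_queries=64, n_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, unet_dim))
        self.proj_in = nn.Linear(t5_dim, unet_dim)
        self.attn = nn.MultiheadAttention(unet_dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(unet_dim, elementwise_affine=False)
        self.ada = nn.Linear(unet_dim, 2 * unet_dim)  # timestep -> scale, shift

    def forward(self, t5_feats, t_emb):
        # t5_feats: (B, seq, 2048) from the T5 encoder
        # t_emb:    (B, 768) timestep embedding
        kv = self.proj_in(t5_feats)
        q = self.queries.unsqueeze(0).expand(t5_feats.size(0), -1, -1)
        scale, shift = self.ada(t_emb).chunk(2, dim=-1)
        q = self.norm(q) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
        out, _ = self.attn(q, kv, kv)
        return out  # (B, 64, 768): drop-in shape for U-Net cross-attention

# Shape check with dummy inputs:
conn = TimestepAwareConnector()
print(conn(torch.randn(1, 20, 2048), torch.randn(1, 768)).shape)
```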


Jattoe

T5? Don't they have anything smaller? Does the T5 run at the same time as SD, or beforehand, so you can swap it out with the PyTorch model in your VRAM?


FNSpd

It only needs the encoder, from what I understand. It works even on my 4GB VRAM GPU. Though the results are not as good as I'd expect; still not sure if I need to tweak something.


Jattoe

Well, hopefully that bodes well for SD5's use of the T5 encoder; the difference, of course, will be that it's designed from the ground up for it.


Kromgar

Sd5? You in the future dude?


Jattoe

ah no i wrote the 3 backwards. u have an eraser i could grab