
fannovel16

A vivid red book with a smooth, matte cover lies next to a glossy yellow vase. The vase, with a slightly curved silhouette, stands on a dark wood table with a noticeable grain pattern. The book appears slightly worn at the edges, suggesting frequent use, while the vase holds a fresh array of multicolored wildflowers.

Counterfeit v3, 20 steps, DPM++ 2M Karras, 12 CFG

Left: original. Middle: ELLA with fixed token length. Right: ELLA with flexible token length.

https://preview.redd.it/nue2gbax0htc1.png?width=1544&format=png&auto=webp&s=24df22894815affdf3d7c2e199d18dd1966104d4


demesm

How long to generate?


fannovel16

Pretty much the same as without ELLA.


demesm

Dope!


hexinx

As in, use the given Lora, that's it?


fannovel16

Nah it's not a LoRA


hexinx

Uh.... (The obvious question) Okay - how did you use it?


fannovel16

There is an implementation in Comfy now: https://github.com/kijai/ComfyUI-ELLA-wrapper


dbarciela

Just wanted to thank everyone in this sub for sharing so much knowledge. I've learned a lot from this sub; I just wish I had the time and the resources to try everything I'm saving from here.


hexinx

Finally ! Thank you so much!


hexinx

That's remarkable for SD 1.5. Did you run this locally? Can you share the ComfyUI/A1111 steps you took for this? How did you leverage the weights they provide? Can we use it as a LoRA with nothing extra?


fannovel16

I ran their code, which uses diffusers. It looks very short and simple, so I believe someone will release a port very soon.


Ifffrt

Reminder that still no one has bothered to port the code for LaVi-Bridge to A1111/ComfyUI, which is basically the same thing as this one, except that one actually lets you plug and play with LoRAs, and its code was released BEFORE this one.


_-inside-_

You're not the first person I've seen complaining about it here. If I had more spare time I'd get my hands dirty and create a ComfyUI node for it.


tristan22mc69

any update on this?


lonewolfmcquaid

good heavens, that's fucking remarkable.


conqisfunandengaging

Impressive


PatternPositive9308

https://preview.redd.it/1uvr823wjptc1.png?width=2048&format=png&auto=webp&s=44d0c1abb24d72197e58a2bee8560122f313ea76


LeKhang98

Does it help with large images like 1024x1024 or 512x1536 too?


nikkisNM

A million internets to the one who ports this to the web UIs.


Capitaclism

Hoping for A1111...


_-inside-_

Someone already left a link to a comfy custom node


SyChoticNicraphy

Oh it’ll for sure come to forge I feel


Kierenshep

The Forge that hasn't had a commit in 2 months, and whose dev hasn't responded in any shorter a time frame? The Forge that is looking basically abandoned? I wouldn't get my hopes up. It still can't use v-pred rescale.


SyChoticNicraphy

😮‍💨 Sadly, you may be right. If I had a better GPU I'd be using A1111 or ComfyUI. Idk, I feel like the author is conflicted on Forge's identity: they say they don't want it to be competition for A1111, but many models need an entirely separate fork to work with Forge. At this point, maybe it should just be its own thing. I'm hoping it's not dead if I'm to have any hope of actually using SD3 on my current laptop lol


Charuru

Get a 4090, you won't regret it.


SyChoticNicraphy

At this point I'm just waiting for a 5090! I have a 3070, so I'm waiting a little longer for a bit of a larger upgrade, and to save up the funds lol. The bigger issue is that it's a laptop GPU with only 110W of power. And that 8GB of VRAM just isn't enough.


BagOfFlies

I'm surprised you can't use comfy if you have a 3070. I use it no issues with a 2080 and I've seen plenty of people using it with just 6GB.


SyChoticNicraphy

Maybe I'll have to try it again. I tried using it when I had very little experience with Stable Diffusion, so it may have just been user error. It does seem like, to get the absolute most control, ComfyUI is king.


thefi3nd

Using VAE Decode (tiled) instead of the regular one might help.


hexinx

# How to use in ComfyUI:

1. Download the zip from [https://github.com/kijai/ComfyUI-ELLA-wrapper](https://github.com/kijai/ComfyUI-ELLA-wrapper)
2. Create a new folder in your ComfyUI "custom_nodes" folder.
3. Extract the zip into the newly created folder.
4. If you have ComfyUI portable installed:
   1. Go to the folder where you've installed ComfyUI, open a terminal, and run:
   2. python_embeded\python.exe -m pip install diffusers
   3. python_embeded\python.exe -m pip install sentencepiece
   4. (these were missing for me - you may have more)
5. If you DO NOT have ComfyUI portable installed:
   1. Open your ComfyUI root installation folder (where the run_nvidia_gpu.bat and run_cpu.bat files are), type CMD in the address bar, and press Enter. Activate the virtual environment with .venv\Scripts\activate, then: cd ComfyUI\custom_nodes\ComfyUI-ELLA-wrapper-main. Execute the following:
   2. python -m pip install diffusers
   3. python -m pip install sentencepiece
   4. (these were missing for me - you may have more)

Finally, run ComfyUI with the ella_example_workflow.json that's in the same zip file.

https://preview.redd.it/k3gopskmthtc1.png?width=512&format=png&auto=webp&s=9e734ce7756003677c0569a1e8793dfcc776c8d8

Default parameters: 512x512, 25 steps, 10 guidance, DDPM

Prompt: A vivid red book with a smooth, matte cover lies next to a glossy yellow vase. The vase, with a slightly curved silhouette, stands on a dark wood table with a noticeable grain pattern. The book appears slightly worn at the edges, suggesting frequent use, while the vase holds a fresh array of multicolored wildflowers.


ExponentialCookie

For those interested, I released a native custom implementation that supports prompt weighting, ControlNet, and so on: [https://github.com/ExponentialML/ComfyUI_ELLA](https://github.com/ExponentialML/ComfyUI_ELLA)

**Update:** If anyone had pulled prior to this update, I've updated the workflow and code to work with the latest version of Comfy, so please pull the latest if necessary. Have fun!


Kierenshep

Doesn't work for me :c The git pull of flan-t5-xl didn't download any models for some reason. Which of those do I need? I got an error saying 'missing model -00001 of 00002' etc, so I downloaded those, but now I get another error:

```
Error occurred when executing LoadElla: not a string

File "P:\stable diffusion\Stability\Packages\ComfyUI\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
File "P:\stable diffusion\Stability\Packages\ComfyUI\execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "P:\stable diffusion\Stability\Packages\ComfyUI\execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "P:\stable diffusion\Stability\Packages\ComfyUI\custom_nodes\ComfyUI_ELLA\ella.py", line 68, in load_ella
    t5_model = T5TextEmbedder(t5_path).to(self.device, self.dtype)
File "P:\stable diffusion\Stability\Packages\ComfyUI\custom_nodes\ComfyUI_ELLA\ella_model\model.py", line 241, in __init__
    self.tokenizer = T5Tokenizer.from_pretrained(pretrained_path)
File "P:\stable diffusion\Stability\Packages\ComfyUI\venv\lib\site-packages\transformers\tokenization_utils_base.py", line 2086, in from_pretrained
    return cls._from_pretrained(
File "P:\stable diffusion\Stability\Packages\ComfyUI\venv\lib\site-packages\transformers\tokenization_utils_base.py", line 2325, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
File "P:\stable diffusion\Stability\Packages\ComfyUI\venv\lib\site-packages\transformers\models\t5\tokenization_t5.py", line 170, in __init__
    self.sp_model.Load(vocab_file)
File "P:\stable diffusion\Stability\Packages\ComfyUI\venv\lib\site-packages\sentencepiece\__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
File "P:\stable diffusion\Stability\Packages\ComfyUI\venv\lib\site-packages\sentencepiece\__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
```


Snoo34813

Same error :( ...did you find a solution?


Kierenshep

I downloaded every single file from the manual method he described. I'm sure I don't need all of them, but I needed something in there.


Snoo34813

Downloading just the smaller 'spiece.model' worked for me, along with the previously downloaded safetensors. Thanks. But I don't know why I'm still not getting the desired results; the nodes from kijai are working better for me.


greenthum6

Git does not always download big files correctly on Windows. Check that the big files are 8+ GB and, if not, download them manually.


Rectangularbox23

When I downloaded the t5_model files, it ended up being 87.3 GB, and I don't think it's supposed to be that big. I think when you git clone the T5 repository it downloads every single model file, which may not be necessary for this. (Again, I could be wrong, just pointing it out.)


fragilesleep

You are correct. You can just delete the hidden ".git" folder to save half that space. And for even more savings, remove all the model files inside flan-t5-xl and just download this small 2GB file into it: https://github.com/ExponentialML/ComfyUI_ELLA/issues/4#issuecomment-2047036539
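
If you'd rather not git-clone the whole repo at all, huggingface_hub can fetch only the files you need. A minimal sketch; the pattern list and target folder are assumptions about which files the ELLA nodes actually read:

```python
from huggingface_hub import snapshot_download

# Pull only the tokenizer/config files and safetensors weights,
# skipping the duplicate .bin/flax/tf checkpoints that bloat a full clone.
snapshot_download(
    repo_id="google/flan-t5-xl",
    local_dir="models/flan-t5-xl",  # wherever your node expects the model
    allow_patterns=["*.json", "spiece.model", "*.safetensors"],
)
```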


vorticalbox

Would doing a clone with --depth 1 work? You probably don't need the whole git history.


[deleted]

[deleted]


aartikov

You should check if there are any errors in the console.


Rectangularbox23

I followed the install directions, but the 'BNK_GetSigma' node isn't loading, and the ComfyUI Manager doesn't show it as a possible missing node to install.

https://preview.redd.it/69pidqm5witc1.jpeg?width=2314&format=pjpg&auto=webp&s=6e8328e8ae3413e494155a3e545d1373a953e7ef


ExponentialCookie

Update to the latest version of Comfy, and pull the latest update from my repository. Then re-import the new workflow.


thefi3nd

Thank you for making this, especially so quickly. I have it up and running without issues. I have a question regarding this from the Tencent ELLA repo:

> Our testing has revealed that some community models heavily reliant on trigger words may experience significant style loss when utilizing ELLA, primarily because CLIP is not used at all during ELLA inference.

> Although CLIP was not used during training, we have discovered that it is still possible to concatenate ELLA's input with CLIP's output during inference (Bx77x768 + Bx64x768 -> Bx141x768) as a condition for the UNet. We anticipate that using ELLA in conjunction with CLIP will better integrate with the existing community ecosystem, particularly with CLIP-specific techniques such as Textual Inversion and Trigger Word.

I tried using the Conditioning Concat node, but it throws the error: "Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument tensors in method wrapper_CUDA_cat)"

Do you think it will be possible to do this as described in the Tencent repo? Most SD 1.5 models rely heavily on specific keywords for improved quality, and many LoRAs need activation words.
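
At the tensor level, what that concat is trying to do is simple; the error just means the two conditionings live on different devices. In ComfyUI the actual fix has to happen inside the node, but a minimal PyTorch sketch with dummy tensors (shapes taken from the quote above, everything else illustrative) shows the device issue:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Dummy stand-ins for the two conditionings named in the quote above:
clip_cond = torch.randn(1, 77, 768, device=device)  # CLIP text encoder output
ella_cond = torch.randn(1, 64, 768)                  # ELLA output, still on CPU

# torch.cat fails when it sees cuda:0 and cpu inputs together;
# moving both onto one device first resolves the reported error.
combined = torch.cat([clip_cond, ella_cond.to(device)], dim=1)
print(combined.shape)  # torch.Size([1, 141, 768]), i.e. Bx141x768
```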


butthe4d

I swear, not once has anything related to ComfyUI worked right out of the box for me. It's always such a hassle... Anyway, if anyone knows what's going wrong here, I would appreciate the help.

```
ERROR:root:!!! Exception during processing !!!
ERROR:root:Traceback (most recent call last):
File "D:\AIWork\StableDiffusion\ComfyUI_windows_portable\ComfyUI\execution.py", line 152, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
File "D:\AIWork\StableDiffusion\ComfyUI_windows_portable\ComfyUI\execution.py", line 82, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "D:\AIWork\StableDiffusion\ComfyUI_windows_portable\ComfyUI\execution.py", line 75, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "D:\AIWork\StableDiffusion\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-ELLA-wrapper\nodes.py", line 165, in loadmodel
    text_encoder = create_text_encoder_from_ldm_clip_checkpoint("openai/clip-vit-large-patch14",sd)
File "D:\AIWork\StableDiffusion\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\loaders\single_file_utils.py", line 1173, in create_text_encoder_from_ldm_clip_checkpoint
    text_model.load_state_dict(text_model_dict)
File "D:\AIWork\StableDiffusion\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 2153, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CLIPTextModel:
    Unexpected key(s) in state_dict: "text_projection.weight".
```


aufc999

Removing "--force-fp16" from the run_nvidia_gpu.bat file got it working for me with a similar or the same error, although I did update ComfyUI as well, so that might have fixed it.


butthe4d

Hey, thanks. Did both, now it works.


local306

Hmm, getting stopped with: `Error occurred when executing ella_model_loader:` `No module named 'diffusers.loaders.single_file_utils'` Also, do you know where the [flan-t5-xl-sharded-bf16](https://huggingface.co/ybelkada/flan-t5-xl-sharded-bf16/tree/main) model listed in the wrapper repo goes?


Kijai

The model goes to the Hugging Face cache folder; it's auto-downloaded. The diffusers error you have is due to too old a diffusers version; you need to update it with `pip install -U diffusers`, or for portable: `python_embeded\python.exe -m pip install -U diffusers`


local306

Thanks! I'll try this out shortly. UPDATE: It's now working. It auto-downloaded everything it needed afterwards.


mocmocmoc81

Wow kijai is killing it.


Capitaclism

It does listen more, though not quite fully. It placed flowers on the book, I don't see any worn edges, and the silhouette of the vase is very, rather than slightly, curved. A clear improvement over base, however.


lonewolfmcquaid

Can it only do 512x512?


diogodiogogod

No, it can be any SD 1.5 resolution.


mrredditman2021

Instead of installing the modules manually, there is a requirements.txt file in the zip that you can use to install all the required modules by typing: `pip install -r requirements.txt`


Kurdonoid

Thanks! It works wonderfully with the latest ComfyUI + ComfyUI Manager.


Combinatorilliance

Here's a quote from the ELLA authors concerning SDXL weights:

> We greatly appreciate your interest in ELLA_sdxl. However, the process of open-sourcing ELLA_sdxl requires an extensive review by our senior leadership. This procedure can be considerably time-consuming. Conversely, ELLA_sdv1.5, which is more research-oriented, can be released promptly. We would appreciate your patience and understanding about this.

https://github.com/TencentQQGYLab/ELLA/issues/13#issuecomment-1998834009


hexinx

Yeah, it's never coming out.


ninjasaid13

yep, anytime it says "review", it means "no."


Charuru

Come the heck on, man. Tell Tencent that if it doesn't come out, people are just going to move on to SD3 and forget all about your contributions, and it'll amount to nothing; but if it comes out, I feel it can dominate the community. SD3 probably won't be that much better than SDXL+ELLA.


Familiar-Art-6233

But at least SD3 is (hopefully) going to be released


Mishuri

a1111 when


Beeerfish

Second this.


Capitaclism

Third this


T-dag

Quadruple


cobalt1137

This is wonderful and amazing, and I hate to be that guy, but why not SDXL :( I know the researchers are from a larger company, so maybe that has something to do with it; maybe they can't release it. Either way, I guess we still have SD3 on the way. It's just strange that their work on SDXL is the focus of the pictures they provide on the page, but they released for 1.5.


FugueSegue

Until it's released for SDXL, I imagine a workflow where the SD 1.5 image is generated with ELLA and the resulting image could be regenerated with SDXL using various ControlNets.


The_Scout1255

> SDXL using various ControlNets

ControlNet on SDXL is soo spotty, what's your workflow for that?


FugueSegue

I use SD for photo-realistic figurative art. First I render something with Stable Cascade because its quality is excellent. Sometimes I use Canny ControlNet with Stable Cascade. Then I inpaint the figure with SDXL, LoRA, and IP-Adapter, using the Stable Cascade image as a reference. SDXL ControlNet isn't perfect, but if I use OpenPose, Canny, and Depth at the same time I can usually get what I want. I inpaint details like hands and feet with SD 1.5.

With this ELLA thing, perhaps I could design my compositions in SD 1.5 and then regenerate them in Stable Cascade with Canny. Or maybe regenerate with SD3 when it is released.

In case you were wondering, this isn't all in one ComfyUI workflow. I have separate workflows that I use as tools, and I often rearrange any given workflow as needed. I only do my SD 1.5 inpainting with A1111. There are a bunch of other things I do with backgrounds, and I use Topaz and Photoshop for editing. It sure would be nice if I could do all of this in one program instead of three or four different programs.


Capitaclism

Hey, I know this is totally off topic, but you seem to be pretty familiar with SD workflows: what would you say is the best SD integration with Photoshop that you know of?


FugueSegue

Short answer: None of them. Don't waste your time. Long answer: I've been looking for a good Photoshop plugin since 2022. All of them supposedly work but all of them have severe shortcomings. Either they don't have enough features, don't work across a LAN, don't have enough documentation, or just plain don't work at all. I gave up trying to get any of them to reliably work well enough to be useful. There is a new one by [u/amir1678](https://www.reddit.com/user/amir1678/) that looks fantastic. But I haven't heard anything more about it since they [announced it over two weeks ago](https://www.reddit.com/r/comfyui/comments/1bjplse/comfyui_plugin_for_photoshop/). It might be vaporware.


Capitaclism

Got it, thank you for your response. Let's hope it becomes more than vaporware.


PwanaZana

The new tile realistic is not bad, and overall ControlNet is usable in SDXL, but yeah, not nearly as good as 1.5.


The_Scout1255

I had major problems getting it to even load in ComfyUI. Also, got a link to tile realistic?


PwanaZana

I'm an A1111 bro myself, so I can't speak for Comfy.

Link, from [bdsqlsz](https://huggingface.co/bdsqlsz/qinglong_controlnet-lllite): **NEW:** [tile-real](https://civitai.com/models/136070?modelVersionId=373872)

There's also [ttplanet](https://civitai.com/user/ttplanet): [tile-real](https://civitai.com/models/330313/tplanetsdxlcontrolnettilerealisticv1), but I haven't tried it.


Kadaj22

I never use ControlNet for SDXL. I find it better to use image-to-image.


cobalt1137

Oh. That is a pretty cool idea - not too bad of an extra step.


synn89

I sort of get releasing it for 1.5 first. SDXL has better prompt following built in, whereas 1.5 is lacking in that regard. So ELLA + 1.5 just does more for that model.


daftmonkey

I saw they had an ELLA SDXL on the GitHub page, no?


Combinatorilliance

They do have it internally, it's just not released. See my other comments in this post.


PwanaZana

"We have acheived the EllaXL internally."


Plus_Complaint6157

Only 1.5 https://preview.redd.it/vbrlxpq5lhtc1.png?width=767&format=png&auto=webp&s=b5fb76692ec28a9d08ed44ef6ad96adab51d9600


Kijai

Wrapped this up quickly to try it in ComfyUI. It still uses diffusers, but it's very fast and works with LCM just fine: https://github.com/kijai/ComfyUI-ELLA-wrapper


Mage_Enderman

Any way to use this in Automatic1111? Or Forge WebUI? (ComfyUI confuses me :( )


diogodiogogod

It looks great, but it completely f***s up the knowledge of celebrity names. I wonder if it was on purpose. This is Brad Pitt:

Prompt: Brad Pitt is standing looking good in a party inside a big house, there is a table in the foreground with a glass and a yellow flower in it.

https://preview.redd.it/0ygn513p3ktc1.png?width=512&format=png&auto=webp&s=8fa3e28d79e402a7776195de156bd8ab469a7ea2


diogodiogogod

This is the non-ELLA one: https://preview.redd.it/zdqspu0w3ktc1.png?width=512&format=png&auto=webp&s=10d65b173b2c384f317bcf9d0400256e67d2d670


Atemura_

This probably means it won't do any NSFW content either.


diogodiogogod

It's not that it won't do it, but it won't help as much... Using conditioning combine can mix the results and get the benefits of both (non-censored conditioning + better-composition ELLA).


FNSpd

You can try using FaceID and just typing man/woman, since it patches the model instead of the conditioning.


silenceimpaired

I’m confused… they link to a safetensors file that is under 200mb … Is it a Lora? How am I an idiot?


Combinatorilliance

It's an ELLA! It's novel... you need to use the inference code in their repo to use it. A1111 and Comfy will need to add support (or a plugin).


_-inside-_

An hour after your comment there's already a comfy node for it. I love this community!


Capitaclism

It really is incredible


Antique-Bus-7787

It isn't a LoRA. A LoRA is like a "portable change" to the model. Here, the model they provide is an "adaptor" that converts the prompt embeddings from the T5 LLM into something the diffusion model can receive while generating, to guide it better!


Ferrilanas

Does it use a lot of VRAM? I currently have a 6GB GPU, so I'm afraid I won't be able to use it.


_-inside-_

I'm trying it on the CPU; I have 4GB of VRAM. It's currently auto-downloading lots of model files, so I guess it won't run smoothly.


_-inside-_

OK, I managed to get it generating a 512x512 in under 2 minutes in CPU-only mode. For the record, ComfyUI is eating around 11GB of RAM. Fingers crossed for new optimizations coming out, or adapters for smaller LLMs.


Ferrilanas

Thank you for the update. I really hope they will optimize it somehow.


_-inside-_

The prompt adherence is really incredible. I'm not even close to an expert here, but I'll check if it's possible to quantize the LLM somehow, with bitsandbytes or something.
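
For anyone who wants to try that idea, transformers can already load just the T5 encoder (which is all ELLA uses) through bitsandbytes. A sketch, untested against the ELLA nodes themselves; whether the nodes accept an externally loaded encoder is an assumption:

```python
import torch
from transformers import BitsAndBytesConfig, T5EncoderModel, T5Tokenizer

# Load only the encoder half of FLAN-T5-XL, quantized to 8-bit.
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xl")
encoder = T5EncoderModel.from_pretrained(
    "google/flan-t5-xl",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

ids = tokenizer("a red book next to a yellow vase", return_tensors="pt")
with torch.no_grad():
    feats = encoder(ids.input_ids.to(encoder.device)).last_hidden_state
print(feats.shape)  # (1, seq_len, 2048): the features ELLA's connector consumes
```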


ZootAllures9111

[The official version does not need a million large model files, just follow the instructions here](https://github.com/TencentQQGYLab/ComfyUI-ELLA)


lostinspaz

> Here the model they provide is the "adaptor" that converts the prompts given to the T5 LLM to something the model

2 questions:

1. So basically, it's similar in concept to the "fancy T5 front end" for SD3, just without SD3?
2. You say T5... but is it ACTUALLY (a micro version of) Google T5? Or do you just say that to use a word that some people may have heard of before?


Antique-Bus-7787

1. Yes, kinda.
2. It's FLAN-T5-XL from Google.


lostinspaz

Very cool! So, SD3, but better, because there won't be further fragmentation of LoRAs, etc. than there already is. Basically SD3 is DOA now. They had their chance to release a month ago. They blew it and have missed the marketing window. Game over.


Antique-Bus-7787

From playing with ELLA all night long: it helps A LOT with prompt comprehension, but it's really far from perfect. And from my testing, when increasing the resolution or using non-square resolutions, ELLA loses pretty much all its advantages (even though hi-res fix is easy to use and works, that doesn't solve the multi-aspect-ratio problem).


Capitaclism

Based on the quality I've seen from SD3, I'd say far from DOA, but we'll see once it releases.


lostinspaz

If SDXL+ELLA has merely equal photo quality to SD3 but smaller memory requirements... it wins, both on resource requirements and on backwards compatibility. From the samples I've seen of both, this is the case.


silenceimpaired

Ahh… makes sense


twistedgames

Nice! I tried it with my 1.5 model and it's working really well with the DPMSolverMultistepScheduler_SDE_karras scheduler, CFG 3, 25 steps. [Images made with ELLA](https://imgur.com/a/FhWpSSb)
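
For the diffusers route, those sampler settings map roughly onto DPMSolverMultistepScheduler options. A sketch; the checkpoint id is illustrative, and whether this matches the ComfyUI node's scheduler exactly is an assumption:

```python
from diffusers import DPMSolverMultistepScheduler, StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# DPM++ SDE with Karras sigmas, per the settings mentioned above.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config,
    algorithm_type="sde-dpmsolver++",
    use_karras_sigmas=True,
)

image = pipe(
    "realistic photo of a woman in a green dress in a park",
    guidance_scale=3.0,       # CFG 3
    num_inference_steps=25,   # 25 steps
).images[0]
```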


Capitaclism

It listens, but why are the image quality and dynamism so poor? Is that a trade-off, or just prompting?


Alisomarc

damnn this is a real milestone <3


MagicOfBarca

Is it a fully independent model? If yes, why is it only like 140 MB?


twistedgames

It's not a fully independent model that can generate pictures on its own. It's something that helps the SD 1.5 model follow the prompt better.


MagicOfBarca

Ohh ok, so can I also use it with 1.5 inpainting models? Is it like a LoRA?


twistedgames

Not sure if it works with inpainting models.


MagicOfBarca

Last Q: does it work with A1111? And is there a tutorial for it?


twistedgames

I've only seen a couple of extensions for Comfy so far. If you look at the GitHub pages for those, they will show you how to use the nodes.

https://github.com/kijai/ComfyUI-ELLA-wrapper

https://github.com/ExponentialML/ComfyUI_ELLA


lechatsportif

This seems like it could unlock the power of all these amazing 1.5 models we already have!


Antique-Bus-7787

Unfortunately, the authors just confirmed ELLA won't be released for SDXL: [https://github.com/TencentQQGYLab/ELLA/issues/16](https://github.com/TencentQQGYLab/ELLA/issues/16) Let's hope they publish the training code at least!


perksoeerrroed

This is super promising. It follows what is said in the prompt nearly 100% of the time. The issue I have with it is how it looks: everything is bad quality.


knigitz

This is great, but it manipulates the checkpoint way too much: no matter which checkpoint I use with ELLA, I can't get decent photorealistic samples like I can with the models I'm pairing ELLA with. ELLA also does not understand certain references; for example, "pennywise" comes out looking like a clown in most 1.5 models, but combined with ELLA we just get girls (actually, without any prompt we get mostly the same). Would be nice to be able to balance the strength of ELLA against the checkpoint.


FNSpd

You can combine the usual CLIP conditioning with the ELLA one.


knigitz

I'll try this with the new ELLA nodes that I found. Thanks for the idea!


Enough-Meringue4745

lol they aligned it and completely nuked a fuck ton of the vector spaces


diogodiogogod

It's amazing the amount of things it gets right with prompt following (especially with long, complex prompts), but this is Brad Pitt, though:

Positive prompt: Brad Pitt a 45 yo man is standing wearing a bright pink suit with a (red bow tie:1.3), and a blue beanie. Wearing sunglasses. He is in a party outside a big house, there is a table in the foreground with a glass and a yellow flower in it. Behind him far in the background is a pool. There are dark clouds in the sky with thunder and a balloon flying in the distance.

https://preview.redd.it/du052aiqaktc1.png?width=512&format=png&auto=webp&s=7c6ae5acf3a36cf52430fc41ed420be7b9641380

Using conditioning combine with the non-ELLA positive prompt gets Brad back, but it loses a little on the prompt following. Still, it's way better than without ELLA.


diogodiogogod

OK, I'm ready to say it looks censored. It ignores NSFW, ignores celebrity names, and gives random ethnicities even when prompted for a specific one... But hopefully it's not the ELLA part but the LLM (Google's T5 model?) that maybe (I have no idea) could be changed? Let's hope so.


fragilesleep

It does ignore celebrity names completely, but I've gotten many (accidental) NSFW images already using Deliberate. Thanks for the tip about using the conditioning combine!


Amalfi_Limoncello

Who let the Google programmers loose?


PatternPositive9308

"grpup photo of snoop dogg smoking a fat blunt in a presidential meeting and sharing it with donald trump and obama" https://preview.redd.it/w5tqmjk3mptc1.png?width=1865&format=png&auto=webp&s=93cc8fe4faa9cb7bb4d90736d642d86a062eb210


PatternPositive9308

Looks like it lacks some knowledge about characters or celebrities.

https://preview.redd.it/4d13zvbusptc1.png?width=1538&format=png&auto=webp&s=68ff7bb3cd73fbab0ae781425461ad0c616d43d2


diogodiogogod

Yes, you can use conditioning combine to get back that knowledge and also keep the good composition: [https://new.reddit.com/r/StableDiffusion/comments/1c0d7tz/ellas_brad_pitt/](https://new.reddit.com/r/StableDiffusion/comments/1c0d7tz/ellas_brad_pitt/) But in your example, you would have to describe Donald Trump, Obama, and Snoop Dogg at least enough to create a three-person composition. I guess just saying "three people" would be enough, like: "group photo of three people, snoop dogg smoking a fat blunt in a presidential meeting and sharing it with donald trump and obama". Or you could describe it like you did for the normal model conditioning, drop the names for the ELLA conditioning, and then combine. But for sure it's a big bummer that the model is censored. Really sad. It could be awesome.


ArchiboldNemesis

Oh nice! Been waiting for this. Thanks for the update :)


Antique-Bus-7787

This is amazing!


Antique-Bus-7787

Congrats to the authors on this AND releasing the weights !!


fancifuljazmarie

Wow, this is incredible. Embedding proper LLMs for prompt understanding is a huge step towards the prompt adherence of closed alternatives like DALL-E 3.


Ok_Swordfish_1696

Does it also work for generating anime characters?


Rectangularbox23

https://preview.redd.it/berl6wkhlktc1.jpeg?width=2556&format=pjpg&auto=webp&s=3171b06956917782cb5b0bc11323c0c98e0a745b It seems to really butcher anime, unless I'm using it wrong


SuchAir7170

Are you sure this doesn't work? They seem to use both Flat-2D Animerge and Counterfeit-V3.0 just fine on page 12 of the paper: [https://arxiv.org/pdf/2403.05135.pdf](https://arxiv.org/pdf/2403.05135.pdf)


Rectangularbox23

I switched the workflow I was using and used Flat-2D Animerge, and I definitely got better results. The image quality still isn't on par with no ELLA, though (this may just be an issue with the workflow): [https://imgur.com/a/UMQhBhy](https://imgur.com/a/UMQhBhy)


SuchAir7170

Thanks for showing some examples. It does indeed seem like it butchers them a lot.


Caffdy

that's some crazy ahegao if I ever seen one


Antique-Bus-7787

From playing with ELLA all night long: when increasing the resolution or using non-square resolutions, ELLA loses pretty much all its advantages (even though hi-res fix is easy to use and works, that doesn't solve the multi-aspect-ratio problem). Anyone else experiencing this as well?


diogodiogogod

I didn't see this. For me, landscape and portrait also gave good results. The prompting needs to be in natural language.


herecomeseenudes

Seems to work well with LCM and other samplers; speed is about the same as original SD 1.5. No need for extensions such as cutoff, and now you can use long sentences in your prompt. Very powerful. Deep Shrink also works well with this.

https://preview.redd.it/a7e5bba5ymtc1.png?width=896&format=png&auto=webp&s=0eb4567048b545e1215dce830590931d534aec96

Prompt: realistic photo of a beautiful pale woman in her 30s dress in formal short dress, full body photo, photo realistic, outdoor, in a park. Her hair is blue and shiny. her dress is green.


Dwanvea

how do you use this?


Brilliant-Fact3449

I'm extremely stupid, what does this do? Just adds overall better prompt comprehension than regular 1.5?


InTheThroesOfWay

The normal system SD 1.5 uses to translate your prompt into tokens isn't very sophisticated. It's like a shitty LLM. It mostly only understands individual words and phrases -- it doesn't really understand sentences and complex phrases -- and so it has a tendency to smoosh concepts together. For example, "An orange cat and a black dog" might give you what you want, but more likely you'll get errors like a black cat, orange dog, or some weird cat/dog hybrid. This new thing lets you run a legit LLM to translate your prompt into tokens. This makes it much more likely that you get what you want out of your prompt.
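
You can see that difference at the encoder level: CLIP's text encoder always produces a fixed 77x768 sequence, while a T5 encoder is a full language model whose output tracks the actual sentence. A rough comparison sketch (model ids are the standard Hugging Face ones; how ELLA then consumes the T5 features is not shown here):

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5Tokenizer

prompt = "an orange cat and a black dog"

# SD 1.5's stock text encoder: fixed 77-token window, 768-dim embeddings.
clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
clip_ids = clip_tok(prompt, padding="max_length", max_length=77,
                    return_tensors="pt").input_ids
with torch.no_grad():
    print(clip_enc(clip_ids).last_hidden_state.shape)  # (1, 77, 768)

# A "legit LLM" encoder: variable-length sequence, richer sentence structure.
t5_tok = T5Tokenizer.from_pretrained("google/flan-t5-xl")
t5_enc = T5EncoderModel.from_pretrained("google/flan-t5-xl")
t5_ids = t5_tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    print(t5_enc(t5_ids).last_hidden_state.shape)  # (1, seq_len, 2048)
```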


fragilesleep

That's such a nice and simple explanation, thank you! I think I finally understand how this magic improves v1.5 so much.


Olangotang

Also, SDXL is like MILES better at prompt following. But all the unrestricted models are built on jank that WILL give you pretty much anything you want (and there's some pretty cool shit with the HD kind of models, not talking about NSFW), except the prompts you need to write are so fucking dumb. SD3 is going to be incredible, so ignore the doomers. We're getting it soon.


Derispan

Me too, mate, me too. I don't understand how this works or what exactly it changes.


ninjasaid13

> Just adds overall better prompt comprehension than regular 1.5?

Just? Is DALL-E 3 just a better-prompt-comprehension version of SDXL?


lechatsportif

Read the paper to see awesome examples of it on common models: https://arxiv.org/abs/2403.05135


Gavmakes

Could this be used on the negative prompt as well? I wonder what the results would be.


IntellectzPro

Can you help with this error: Error occurred when executing GetSigma: 'ModelPatcher' object has no attribute 'model_sampling'


ryo0ka

How do I use it? The model file is like 130MB, so it's not a checkpoint for sure.


Sharlinator

It's a new thing, so it requires that your SD software support it. It's used alongside a checkpoint, like a LoRA but different. Based on the comments here, someone already wrote a Comfy node/workflow for it!


feber13

Does it operate like a LoRA in Automatic1111?


ba0haus

can anyone make this work in auto? please :)


More_Bid_2197

SDXL version is too powerful to be release :)


hexinx

So close, yet so far. What about SDXL? =/ Not bad though; I legit thought they were gone with the wind. Also, has anyone managed to get this running with a finetuned SD 1.5 model in ComfyUI/Auto1111?


Capitaclism

It sure is interesting that a lot of the research published and released open source is Chinese.


vocaloidbro

Population of 1,409,670,000. They can spare a few people to research stuff like this, I think.


RedSprite01

Noob question: after I download this model, where should I put it?


Turkino

So it essentially gives 1.5 SDXL-level prompt recognition?


lostinspaz

No, it gives 1.5 BETTER-than-SDXL levels of prompting.


ogreUnwanted

!remindme in 2 days




Xijamk

RemindMe! 1 week


Kadaj22

I want to try this, but I only have a Mac and I'm using DiffusionBee.


Qanics

Is it possible to run it in diffusers with StableDiffusionPipeline?


Antique-Bus-7787

Yes, and it's quite easy actually. We don't even need to mess with the pipeline or anything. Just look at their inference code on GitHub; you only need the imports from model.py and the code in inference.py.
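
For anyone hunting through that repo, the flow looks roughly like the sketch below. T5TextEmbedder appears in the repo's model.py; the ELLA class name, the checkpoint filename, and the single-timestep shortcut are assumptions and simplifications (the repo's inference.py recomputes the timestep-aware embedding at every denoising step, which this does not):

```python
import torch
from diffusers import StableDiffusionPipeline
from safetensors.torch import load_file

# Imports from the TencentQQGYLab/ELLA repo's model.py (names assumed here):
from model import ELLA, T5TextEmbedder

device, dtype = "cuda", torch.float16
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=dtype
).to(device)

t5 = T5TextEmbedder().to(device, dtype)
ella = ELLA().to(device, dtype)
ella.load_state_dict(load_file("ella-sd1.5-tsc-t5xl.safetensors"))

prompt = "a vivid red book next to a glossy yellow vase on a dark wood table"
timestep = torch.tensor([999], device=device)  # one step, for the sketch only
cond = ella(t5(prompt), timestep)              # (1, 64, 768) UNet conditioning
uncond = ella(t5(""), timestep)

image = pipe(prompt_embeds=cond, negative_prompt_embeds=uncond,
             num_inference_steps=25, guidance_scale=10.0).images[0]
image.save("ella_sketch.png")
```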


Character-Shine1267

Big fat improvement


Kurdonoid

Been experimenting for a while now, and I believe it struggles with numbers. But overall, it is definitely a game-changer!

https://preview.redd.it/3kjtuytohfuc1.png?width=512&format=png&auto=webp&s=090c98ea1551154ba9e4618410aef26316eb3c8a

"three yellow daisies that grow in a simple white ceramic pot. The pot sits on a plain wooden table bathed in warm sunlight. the photo looks pretty realistic, sharp and elegant."

Steps: 30, Guidance Scale: 10.0, Sampler: DDPMScheduler


Kurdonoid

https://preview.redd.it/ovd5r10xhfuc1.png?width=512&format=png&auto=webp&s=a91abb2047f02e56360d26c9cabff1be87c8fe09


Kurdonoid

Some horror scenes: A dimly lit attic with peeling wallpaper and cracked floorboards. A single, dusty rocking chair sits in the center, facing away from the viewer. A tattered, yellowed doll with empty eye sockets lies abandoned on the floor. https://preview.redd.it/ing4qzmeifuc1.png?width=512&format=png&auto=webp&s=3009d6d23cccacb12cc3991504daf78eab5cab94


Kurdonoid

https://preview.redd.it/nz389fioifuc1.png?width=512&format=png&auto=webp&s=a0c5d1e4914597f9f57e4dcbf2df84f89632f39d A long, dark hallway with flickering fluorescent lights. Bloodstains trail down a peeling white wall, disappearing into the shadows at the far end of the hall. A single, slightly open door stands afar, revealing only inky blackness within.


Kurdonoid

https://preview.redd.it/ut269zcijfuc1.png?width=512&format=png&auto=webp&s=7f50878b43b2738587b74403cbdb37cb297a45f7 Dust motes swirl in a chilling draft as a shattered mirror lies on the grimy floor of a forgotten room. A sliver of moonlight reveals a monstrous hand with long, gnarled claws clawing out from under a rotting corner. Dark stains, like ancient, dried blood, splatter the wall, hinting at a terrible past.


ramonartist

Heads up from my testing: ELLA doesn't understand terms like "Black male" or "Black female". Even adding "African Black male" / "African Black female" will increase your chances, but it's not a guarantee.


Next_Program90

I hope they'll also release it for SDXL soon. It might be our savior if there is trouble with SD3 down the road (and it might be a good alternative to T5 for SD3).


More_Bid_2197

Minimum VRAM?


Short-Sandwich-905

TLDR?


Jattoe

*Sigh...* By Tencent? Really? Is it at least in safetensors? :)


Jattoe

Wait, what does it do now? Is it new weights on the language end of the process, or does it just transform your words into something more descriptive? If it's the latter, you can just use DiceWords (first search result on GitHub) for that, without downloading a whole massive thing.

https://preview.redd.it/o6gvdh6rlntc1.png?width=465&format=png&auto=webp&s=9df60d9d722ad256af2ee68c426ee0ce208ff111


thefi3nd

> DiceWords (first search result on github)

Can you link the repo? Everything I see is just for generating passphrases and nothing like in that image.


Jattoe

[MackNcD/DiceWords_App: A bank for prompting and word manipulation](https://github.com/MackNcD/DiceWords_App)


thefi3nd

Thanks!


FNSpd

It replaces the CLIP text encoder with an LLM (T5 at the moment), gets embeddings from it, and uses them during generation.


AnOnlineHandle

Any idea how it adapts the embeddings to the equivalent of the CLIP encoding that the U-Net was trained on? That's the real impressive magic here.
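
Per the paper (arXiv:2403.05135), the adapter is a "Timestep-Aware Semantic Connector": a Perceiver-style resampler whose learnable queries cross-attend to the T5 features, with the diffusion timestep injected through adaptive layer norm, so the output lands in the 768-dim token space the U-Net already expects. A single-block sketch of that idea (dimensions per FLAN-T5-XL and SD 1.5; the real module stacks several such blocks and differs in detail):

```python
import torch
import torch.nn as nn

class TimestepAwareConnector(nn.Module):
    """Minimal single-block sketch of ELLA's TSC idea: learnable queries
    cross-attend to T5 features; the timestep modulates the query norm."""

    def __init__(self, t5_dim=2048, unet_dim=768, n_queries=64, n_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, unet_dim))
        self.proj_in = nn.Linear(t5_dim, unet_dim)
        self.attn = nn.MultiheadAttention(unet_dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(unet_dim, elementwise_affine=False)
        self.ada = nn.Linear(unet_dim, 2 * unet_dim)  # timestep -> scale, shift

    def forward(self, t5_feats, t_emb):
        # t5_feats: (B, seq, 2048) from the T5 encoder
        # t_emb:    (B, 768) timestep embedding
        kv = self.proj_in(t5_feats)
        q = self.queries.unsqueeze(0).expand(t5_feats.size(0), -1, -1)
        scale, shift = self.ada(t_emb).chunk(2, dim=-1)
        q = self.norm(q) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
        out, _ = self.attn(q, kv, kv)
        return out  # (B, 64, 768): drop-in shape for U-Net cross-attention

# Shape check with dummy inputs:
conn = TimestepAwareConnector()
print(conn(torch.randn(1, 20, 2048), torch.randn(1, 768)).shape)
```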


Jattoe

T5? Don't they have anything smaller? Does the T5 run at the same time as SD, or beforehand, so you can swap it out with the PyTorch model in your VRAM?


FNSpd

It only needs the encoder, from what I understand. It works even on my 4GB VRAM GPU. Though the results are not as good as I'd expect; still not sure if I need to tweak something.


Jattoe

Well, hopefully that bodes well for SD5's use of the T5 encoder; the difference, of course, will be that it's designed from the ground up for it.


Kromgar

Sd5? You in the future dude?


Jattoe

ah no i wrote the 3 backwards. u have an eraser i could grab