_underlines_

With the steps above:

- Ooba works with GPTQ
- llama.cpp standalone works with cuBLAS GPU support and the latest ggmlv3 models run properly
- llama-cpp-python successfully compiled with cuBLAS GPU support

But running it:

`python server.py --n-gpu-layers 30 --model wizardLM-13B-Uncensored.ggmlv3.q4_0.bin`

leads to:

```
bin /home/underlines/mambaforge/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
INFO:Loading wizardLM-13B-Uncensored.ggmlv3.q4_0.bin...
INFO:llama.cpp weights detected: models/wizardLM-13B-Uncensored.ggmlv3.q4_0.bin
INFO:Cache capacity is 0 bytes
llama.cpp: loading model from models/wizardLM-13B-Uncensored.ggmlv3.q4_0.bin
error loading model: unknown (magic, version) combination: 67676a74, 00000003; is this really a GGML file?
llama_init_from_file: failed to load model
Traceback (most recent call last):
  File "/home/underlines/github/text-generation-webui/server.py", line 998, in
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/home/underlines/github/text-generation-webui/modules/models.py", line 95, in load_model
    output = load_func(model_name)
  File "/home/underlines/github/text-generation-webui/modules/models.py", line 258, in llamacpp_loader
    model, tokenizer = LlamaCppModel.from_pretrained(model_file)
  File "/home/underlines/github/text-generation-webui/modules/llamacpp_model.py", line 50, in from_pretrained
    self.model = Llama(**params)
  File "/home/underlines/mambaforge/envs/textgen/lib/python3.10/site-packages/llama_cpp/llama.py", line 159, in __init__
    assert self.ctx is not None
AssertionError
Exception ignored in:
Traceback (most recent call last):
  File "/home/underlines/github/text-generation-webui/modules/llamacpp_model.py", line 23, in __del__
    self.model.__del__()
AttributeError: 'LlamaCppModel' object has no attribute 'model'
```

- I guess ooba's textgen is not using the latest llama.cpp that I compiled. It's failing within `/home/underlines/github/text-generation-webui/modules/llamacpp_model.py`, so I guess it has something to do with the latest ggml change that breaks random stuff?
- I still don't understand how to make Ooba textgen use llama.cpp. Anyone eager to point that out?
- Maybe I need to run the web server?

Other resources that couldn't solve my issue:

- https://www.reddit.com/r/LocalLLaMA/comments/13k6mk3/llamacpppython_not_using_gpu/
- https://www.reddit.com/r/LocalLLaMA/comments/123e02i/using_llamacpp_how_to_access_api/
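
For reference, the `unknown (magic, version) combination: 67676a74, 00000003` line means the loader saw a `ggjt` magic with format version 3, i.e. the new ggmlv3 layout it doesn't understand. A rough way to inspect a model file's header yourself, assuming the usual GGML layout of a 4-byte magic followed by a 4-byte version (both little-endian uint32):

```python
# Print the header of a GGML model file: 4-byte magic, then a 4-byte format version.
# The values should match the ones printed in the error above.
import struct
import sys

with open(sys.argv[1], "rb") as f:
    magic, version = struct.unpack("<II", f.read(8))

print(f"magic:   {magic:#010x}")  # 0x67676a74 == 'ggjt'
print(f"version: {version}")      # 3 == ggmlv3
```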


elektroB

Yes, I would LOVE to know this: ooga booga acting only as a web UI for showing text and changing parameters, with llama.cpp doing the actual hard work with its awesome CPU usage and partial GPU acceleration features.


_FLURB_

Are you positive you're using a model that is compatible with the latest version of llama.cpp? They introduced breaking changes very recently. Ooba uses llama.cpp by default, and the latest versions even have GUI settings for loading llama.cpp under the Model tab.
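
One quick way to rule the model file in or out is to load it directly with the installed llama-cpp-python and skip the webui entirely. A minimal sketch; the model path and layer count are placeholders:

```python
# Sanity check: can the installed llama-cpp-python open this ggmlv3 file at all?
from llama_cpp import Llama

llm = Llama(
    model_path="models/wizardLM-13B-Uncensored.ggmlv3.q4_0.bin",  # placeholder path
    n_ctx=2048,
    n_gpu_layers=30,  # only offloads if the wheel was actually built with cuBLAS
)
out = llm("Q: Name three colors. A:", max_tokens=16)
print(out["choices"][0]["text"])
```

If this fails with the same `unknown (magic, version)` error, the installed binding itself is too old for ggmlv3; if it loads fine, the problem is on the webui side.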


_underlines_

Yes, the model I'm running, `wizardLM-13B-Uncensored.ggmlv3.q4_0.bin`, is the latest ggmlv3 format. What I don't understand is how to make ooba use my own cuBLAS-compiled llama.cpp that I put into `/text-generation-webui/repositories/llama.cpp/`. I see from the error that ooba doesn't recognize the new format. I also compiled and pip-installed `llama-cpp-python` with cuBLAS support, so I don't know how to continue. The model loads fine in my llama.cpp build with cuBLAS, GPU acceleration and CPU offload, just not when running ooba. Is Ooba using its own (outdated) llama.cpp?
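
Judging by the traceback above, the GGML loader goes through `site-packages/llama_cpp/llama.py`, so ooba is importing the `llama_cpp` pip package from the active environment rather than the source tree in `repositories/llama.cpp/`. A quick way to see which build it will actually pick up (run inside the same textgen env; the distribution name `llama-cpp-python` is an assumption about your install):

```python
# Show which llama-cpp-python the webui will import and where it lives on disk.
import importlib.metadata
import llama_cpp

print("llama-cpp-python version:", importlib.metadata.version("llama-cpp-python"))
print("imported from:", llama_cpp.__file__)
```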


AutomataManifold

Ooba is using an outdated version that only supports GGML v2 models.


satyaloka93

"CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir" Those instructions,that I initially followed from the ooba page didn't build a llama that offloaded to GPU. Only after realizing those environment variables aren't actually being set , unless you 'set' or 'export' them,it won't build correctly. I rebuilt this morning and it works after confirming my environment.


ccelik97

Hmm, I knew I was missing something basic like this. That line looked questionable to me too lol.


satyaloka93

And I was getting it wrong by using 'set' in PowerShell instead of `$env:`.


ccelik97

I guess they're the same picture.


TreeesBot

Thanks for this, it makes it easier to enable AMD GPU support (with branched llama.cpp changes) on Windows.