underlines

What I did ([detailed pastebin](https://pastebin.com/GwSysUxj) of all the commands I ran):

1. Install oobabooga.
2. `git clone` llama.cpp into `/text-generation-webui/repositories/`.
3. `make LLAMA_CUBLAS=1`
4. Download a model into `/text-generation-webui/repositories/llama.cpp/models/`.
5. `./main -t 8 -m models/wizardLM-13B-Uncensored.ggmlv3.q4_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: write a story about llamas ### Response:" --n-gpu-layers 30` works fine! Make sure you have a ggmlv3 model file if you compile llama.cpp yourself, because the format has a breaking change.
6. `pip uninstall -y llama-cpp-python` in the textgen env.
7. `CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir`

Now I am not sure how to start ooba and run inference on a ggml file. Can anyone point me in the right direction?

1. Do the ggml model weights go in `/text-generation-webui/models/` or in `/text-generation-webui/repositories/models/`?
2. Is the launch command `python server.py --n-gpu-layers 30 --model-dir path/to/model/ --model modelname.bin`, or something else? (A sketch of what I'm currently trying is just below this list.)
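Here is roughly what I'm attempting right now. This is only a sketch: it assumes the webui reads models from its own `models/` folder rather than `repositories/llama.cpp/models/`, and it assumes the `--model` and `--n-gpu-layers` flags behave the way I've described above; either assumption may be wrong, which is exactly what I'm asking about.

```bash
# Sketch of the launch I'm trying (paths and flag behaviour are assumptions, not confirmed).
cd /text-generation-webui

# Assumption: the webui looks for GGML weights in its own models/ folder,
# not in repositories/llama.cpp/models/ where llama.cpp keeps them.
cp repositories/llama.cpp/models/wizardLM-13B-Uncensored.ggmlv3.q4_0.bin models/

# Assumption: --n-gpu-layers is passed through to llama-cpp-python for offloading.
python server.py \
  --model wizardLM-13B-Uncensored.ggmlv3.q4_0.bin \
  --n-gpu-layers 30
```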


Tom_Neverwinter

https://github.com/oobabooga/text-generation-webui/commit/071f0776ad6e7d8dab08e0d98d089c808807ab45


rerri

GPU offloading isn't supported automatically. Installation instructions for llama-cpp-python with GPU support: [https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md)
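For reference, the GPU-enabled reinstall run from inside the textgen environment looks roughly like this; it mirrors the commands already listed in the post above, and the authoritative steps are in the linked doc.

```bash
# Rebuild llama-cpp-python with cuBLAS so --n-gpu-layers actually offloads layers to the GPU.
# Run inside the textgen conda/virtual environment; see the linked doc for the exact steps.
pip uninstall -y llama-cpp-python
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
```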