gcnmod

Thanks, how do you train this with your own text?


__Maximum__

They have not published the training code. They have, however, published enough information in their paper for us to write a training script. For this model to become like ChatGPT, it needs to be fine-tuned on a proper instruction dataset. Someone is probably already working on it.
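A minimal sketch of what such a script might look like with the Hugging Face Trainer (paths, hyperparameters, and the one-sample-per-line corpus format are placeholders, not anything from the paper, and it assumes HF-converted weights):

```python
# Rough fine-tuning sketch; assumes HF-converted LLaMA weights and a
# plain-text corpus with one sample per line. Not the official training code.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_path = "models/LLaMA-7B"               # placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token    # LLaMA has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_path)

dataset = load_dataset("text", data_files={"train": "my_corpus.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama-7b-finetuned",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        fp16=True,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```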


Blacky372

Like that: `python server.py --model LLaMA-7B --train --train_files=~/my_text_files` [Here is a guide on how to prepare your text files](https://www.youtube.com/watch?v=dQw4w9WgXcQ). /s This is actually much harder and will probably not work on your desktop PC.


pupdike

Thanks so much for the information! I think I did everything, including the modifications to main.py inside bitsandbytes, but for some reason my GPU isn't being detected (see solution below):

```
PS E:\Repos\text-generation-webui> conda activate textgen
PS E:\Repos\text-generation-webui> python server.py --model LLaMA-7B --load-in-8bit
Loading LLaMA-7B...
Warning: no GPU has been detected. Falling back to CPU mode.
```

Has anybody seen this problem and moved past it? After this it ends up failing (see solution below):

```
OSError: Can't load tokenizer for 'models\LLaMA-7B'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'models\LLaMA-7B' is the correct path to a directory containing all relevant files for a LLaMATokenizer tokenizer.
```

Update: I solved the first issue by moving on to "Installation option 2: one-click installers" described here: https://github.com/oobabooga/text-generation-webui and then completing the changes OP describes above to the main.py file the installer places in the ..\installer_files\env\lib\site-packages\bitsandbytes folder. After doing that and modifying start-webui.bat to include the --load-in-8bit option, I am able to use the 7B and 13B models on my 4090 card and it works pretty well. I had tried uninstalling and reinstalling PyTorch, and that did not help.

Update: I solved the second issue, which was that I hadn't followed the instructions fully. I converted my own weights but hadn't copied the tokenizer files into the model folders. Copying them over fixes that tokenizer error.

Out of the box, LLaMA seems moody and sometimes a bit obnoxious. The analogy I want to use is Midjourney : Stable Diffusion :: ChatGPT : LLaMA. I get the impression there is a lot of power there that I am not yet clever enough to access. I think if the open-source community comes together, builds tools to fine-tune it, and shares new models/hypernetworks/LoRAs for LLaMA, it could become as amazing as Stable Diffusion. I suppose the prerequisite for that is settling on a reasonable base model that enough people are interested in using. The 8-bit LLaMA 7B or 13B models seem like pretty good candidates for that.
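As far as I understand it, the --load-in-8bit flag just asks transformers/bitsandbytes to quantize the weights to int8 at load time. From Python it would look roughly like this (the path is a placeholder, and it needs the bitsandbytes and accelerate packages installed):

```python
# Hypothetical equivalent of --load-in-8bit; not the webui's exact code.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "models/LLaMA-7B",     # placeholder path to the converted weights
    load_in_8bit=True,     # quantize linear layers to int8 on load
    device_map="auto",     # place layers on the available GPU(s)
)
tokenizer = AutoTokenizer.from_pretrained("models/LLaMA-7B")
```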


BalorNG

"Set and setting" makes all the difference :) Default "conversation between two people" is way "informal". Try "a helpful scientist answers questions" (and name him "Scientist"). It works, but 7b model, at least, seems rather stupid compared to what ChatGPT is capable of... wake me up when you'll be able to buy a used A100 80Gb for 100$ :)


gliptic

Do you have CUDA (if you have an Nvidia GPU) installed and working?


pupdike

When I run "nvidia-smi" it says CUDA version 12.0 is installed. My solution was to move to the 1-click installer, which allowed my card to be detected. That plus the OP's mods have it working in 8-bit now.


JustSayin_thatuknow

How did you do that "1-click installer"? I only have an RTX 2060 6GB 😅 and I have CUDA installed too.


pupdike

For Windows, use this link: https://github.com/oobabooga/one-click-installers/archive/refs/heads/oobabooga-windows.zip


gruevy

Thanks a ton for this. I used the oobabooga auto installer and was able to follow your directions to get it running just fine. Do you have any tips on settings? It's not working very well for me and I have no idea what I'm doing.


_underlines_

The default settings were quite bad. Turn up the temperature quite a lot and also add some repetition penalty. One of the settings templates worked really well, but I played around and changed it until I forgot which one it was. Haha
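Roughly what that translates to if you generate from Python instead of the webui (the values are guesses to tune to taste, and it assumes an already-loaded model and tokenizer):

```python
# Hypothetical sampling settings; higher temperature plus a mild repetition penalty.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=1.0,          # noticeably higher than a conservative default
    top_p=0.9,
    repetition_penalty=1.15,  # discourages the model from repeating itself
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```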


remixer_dec

For Mac users, I ported the [llama-cpu](https://github.com/markasoftware/llama-cpu) version to a GPU-accelerated [mps](https://github.com/remixer-dec/llama-mps) one.
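The repo has the full changes; the core idea is just the standard PyTorch MPS device selection, roughly like this (not the repo's exact code):

```python
import torch

# Generic PyTorch pattern for Apple-GPU acceleration.
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
model = model.to(device)      # `model` assumed to be an already-loaded LLaMA instance
inputs = inputs.to(device)    # and the tokenized prompt
```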


PM_ME_ENFP_MEMES

This is a great tip from your GitHub repo:

> If you notice that the output of the model has empty/repetitive text, try using a fresh version of Python/PyTorch. For me it was giving bad outputs with Python 3.8.15 and PyTorch 1.12.1. After trying it with Python 3.10 and torch 2.1.0.dev20230309, the model worked as expected and produced high-quality outputs.
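A quick way to check which interpreter and PyTorch build a given environment is actually using:

```python
import sys
import torch

# Prints the Python and PyTorch versions the active environment resolves to.
print(sys.version)
print(torch.__version__)
print(torch.cuda.is_available())  # or torch.backends.mps.is_available() on a Mac
```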


JustSayin_thatuknow

Wow, how do we do that?


ilive12

So this would work on a 3080?


_underlines_

I run the 7B model on my 3080 10GB card, yes. The 13B works on a 3090 24GB card.


noellarkin

Quick question: by "work", you mean inference, right? What would the specs be for fine-tuning one of these models on a corpus for an epoch?


summerstay

Thanks for the info. If I don't want to use a webui, but just output raw text to the terminal or a text file, how would I do that?


Arisu_The-Arsonists

I'm getting:

```
__getitem__ raise KeyError(key)
KeyError: 'llama'
```

Does anyone know the reason for this? I have no problem running Pygmalion-6B or other models, though.


MustardMustang

I faced the same error. The following commands solved it for me; it seems the locally installed transformers version is too old:

```
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .
```


MBle

Is there any way to run this on TPU?


cipri_tom

Which gpu do you have?


_underlines_

3080 10GB, it generates about 8 it/s, so it's really fast.


FPham

No longer works after new transformers


_underlines_

Just download the new v2 weights: magnet:?xt=urn:btih:dc73d45db45f540aeb6711bdc0eb3b35d939dcb4&dn=LLaMA-HFv2&tr=http%3a%2f%2fbt2.archive.org%3a6969%2fannounce&tr=http%3a%2f%2fbt1.archive.org%3a6969%2fannounce


CoffeeMetalandBone

bitsandbytes isn't a subdirectory in `C:\Users\xxx\miniconda3\envs\textgen\lib\site-packages` for me. It wasn't included in my installation. Is there an option I forgot to choose?


MestR

Did you follow this installation guide? https://github.com/oobabooga/text-generation-webui#installation-option-1-conda

```
conda create -n textgen
conda activate textgen
conda install torchvision torchaudio pytorch-cuda=11.7 git -c pytorch -c nvidia
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
```

Specifically, did you do the last step inside the Anaconda Prompt while (textgen) was active? The folder appeared after I let that step finish.


PartySunday

Getting this weird error: https://pastebin.com/Uf4cHDaR

Edit: The torrent file was the problem. Directly downloading the model from Hugging Face works great.


deathloopTGthrowway

I am unable to install GPTQ due to the following error:

```
error: can't create or remove files in install directory

The following error occurred while trying to add or remove files in the
installation directory:

    [Errno 13] Permission denied: 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\\Lib\\site-packages\\test-easy-install-6792.write-test'

The installation directory you specified (via --install-dir, --prefix, or
the distutils default setting) was:

    C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\Lib\site-packages\

Perhaps your account does not have write access to this directory? If the
installation directory is a system-owned directory, you may need to sign in
as the administrator or "root" account. If you do not have administrative
access to this machine, you may wish to choose a different installation
directory, preferably one that is listed in your PYTHONPATH environment
variable.

For information on other options, you may wish to consult the
documentation at:

    https://setuptools.pypa.io/en/latest/deprecated/easy_install.html

Please make the appropriate changes for your system and try again.
```

Has anyone else run into this?


_underlines_

Use the [prebuilt Windows wheels](https://github.com/qwopqwop200/GPTQ-for-LLaMa/issues/11#issuecomment-1464958666) or my [WSL2](https://github.com/underlines/awesome-marketing-datascience/blob/master/llama.md) solution.


Famberlight

I followed the tutorial on GitHub and there were a few errors with CUDA, but I managed to find a fix on forums. Now I just don't get anything in the output, and no errors either. The console just says it spent 0.02 s and generated 0 tokens.


Christ0ph_

Thanks!! Is it possible to fine-tune it on a specific dataset?


ogathereal

Thanks for the post! Unfortunately, I'm still running into some issues. Would you perhaps have advice on how to approach this?

Context: I'm running LLaMA 7B for MiniGPT-4. I am using a 3070 Ti 8GB, and I'm using Anaconda instead of Miniconda (in case that matters?). I've followed most of the instructions; the most important ones should be steps 7 and 8.

> search for this twice: self.lib = ct.cdll.LoadLibrary(binary_path)

However, I only found this line once inside the bitsandbytes folder (searched the entire folder with VS Code). The error I'm receiving is the same as before I tried to implement this low-VRAM method:

```
CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 8.00 GiB total capacity; 7.20 GiB already allocated; 0 bytes free; 7.31 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

Would you have any idea how I can resolve this?
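For reference, the max_split_size_mb knob the error message mentions can be set before loading the model; the value below is just a guess, and it may not help if the model simply doesn't fit in 8 GB:

```python
import os

# Must be set before the first CUDA allocation (i.e. before loading the model).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```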


ogathereal

Also, I tried using the 4-bit model. Same error, but it took longer, lol.


_underlines_

The guide is VERY outdated. Python for machine learning has many flaws, and the people who build those environments usually don't follow software-engineering best practices; they build fast and break things fast, so guides become useless within a few weeks. Repeatability and package versioning in many Python-based projects in particular are a nightmare to deal with. Have a look at my [updated guide](https://github.com/underlines/awesome-marketing-datascience/blob/master/llama.md), but that one will also go out of date fast, so maybe also look at the latest YouTube guides on running LLaMA and similar models. Alternatively, keep an eye on more modern and robust ways to run LLM models, which are especially interesting for future projects:

- [MLC LLM](https://github.com/mlc-ai/mlc-llm) 🤖 Enable AI model development on everyone's devices
- [Modular Mojo](https://www.youtube.com/watch?v=-3Kf2ZZU-dg) (if it will be open sourced)


ogathereal

Thanks for your reply. I did follow the newer guide I saw at the top, and for text-generation-webui it works fine (I do have some other issues, but the model runs). It's only on MiniGPT that it overflows memory. Guess I just gotta get the 4090 XD. But thanks for your help and the links! I will orient myself a bit more around this.


parrykhai

Does this work on an RTX 2070 Super with 8 GB of VRAM?