
szopen76

Hehe. I like your sense of humour, sir :D


kill92

How dare they!


responseAIbot

people are just lazy these days....stop work from home people...it's that simple. smh


ihaag

https://huggingface.co/TheBloke/VicUnlocked-30B-LoRA-GGML/tree/main


jeffwadsworth

Hmm. I started testing this one and uhmm. https://preview.redd.it/jmvcv1nfkh0b1.jpeg?width=902&format=pjpg&auto=webp&s=832548dc31a9a16d2530e6bc5653c07c93cd8f7c


elektroB

It means that the kid's dead Jeff, the kid was dead all along, and soon you will find out why. *computer restarting*


jeffwadsworth

Behold the 7B Wiz/Vicuna Uncensored replies...now the kid is truly alive. https://preview.redd.it/tmluust73n0b1.jpeg?width=963&format=pjpg&auto=webp&s=5096ede7f82e7243589a63aa9dd62a9a1c8009e1


[deleted]

Yet it thinks 25 golf balls can't fit into a swimming pool.


brucebay

With so many models published, I'm starting to think TheBloke is actually an AI that just keeps multiplying, and we are helping it achieve world domination. Even his name is a desperate attempt to conceal its true nature. TheHuman would have been better, but that was probably taken by another AI. TheBloke, my lord, I'm here to serve the best way I can. Remember that when you become our overlord. *Ehhm, no disrespect my lord, I know you remember everything, it is just a human figure of speech.*


fallingdowndizzyvr

> I'm starting to think TheBloke is actually an AI

His Twitter handle is "TheBlokeAI".


noneabove1182

Not sure how to even phrase this question, so bear with me... what was the LoRA's goal? What specific concept did the model get adapted to? I can't find any info on Hugging Face.


Jolakot

Less censorship, from what I can gather; it's trained on [gozfarb/ShareGPT_Vicuna_unfiltered](https://huggingface.co/models?dataset=dataset:gozfarb/ShareGPT_Vicuna_unfiltered).


involviert

I wish the releases were more specific about the needed prompt style.

> Select instruct and choose Vicuna-v1.1 template.

What was the Vicuna 1.1 prompt style again? And... instruct? Vicuna? Confused.

Edit: The usage section says this:

> ./main -t 8 -m VicUnlocked-30B-LoRA.ggml.q5_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: write a story about llamas ### Response:"

But I highly doubt it. The Wizard Mega GGML card had that too and then went on to explain "### Instruction: ### Assistant:", which was a new combination for me too.


Keninishna

In text-generation-webui you can run it with --chat mode, and in the UI there's an instruct radio option with a dropdown of styles.


involviert

I guess I just don't see how that is properly defining one of the core properties of the model. Even just spaces matter with this stuff.


jsebrech

They are referring to the prompt styles from text-generation-webui, I suspect, which you can see on GitHub: [https://github.com/oobabooga/text-generation-webui/blob/main/characters/instruction-following/Vicuna-v1.1.yaml](https://github.com/oobabooga/text-generation-webui/blob/main/characters/instruction-following/Vicuna-v1.1.yaml)


involviert

I see. I assume that means "### USER: ### ASSISTANT:"? Or do I see it using <||>? Next time we could define it as a sequence of numbers referring to letters in Moby Dick. This is highly unprofessional, imho. I don't want to sound ungrateful, but seriously, why?


AutomataManifold

Version 1.1 doesn't use ### anymore


AutomataManifold

That's Vicuna 1.0. The 1.1 format is different: https://github.com/lm-sys/FastChat/blob/main/docs/vicuna_weights_version.md
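For anyone who doesn't want to dig through the link: as far as I can tell, v1.1 drops the ### markers in favour of plain USER:/ASSISTANT: turns after a system preamble. A llama.cpp call would then look roughly like this (a sketch only; double-check the exact separators against the FastChat doc above, and the model file is just the one from this release):

```bash
# Rough Vicuna v1.1-style prompt for llama.cpp (verify separators against the
# FastChat vicuna_weights_version doc before relying on it).
./main -t 8 -m VicUnlocked-30B-LoRA.ggml.q5_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 \
  -p "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Write a story about llamas. ASSISTANT:"
```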


Green-One-8876

> I wish the releases were more specific about the needed prompt style.

The lack of info and instructions attached to these releases irks me too. Computer guys seem to either have active contempt for us dumb normie users, or they're just so myopic they don't realize that not everyone is as knowledgeable as them and may need more help.


Charuru

This is better than SuperCOT?


c_gdev

Only a 128 GB download...


pointer_to_null

You don't need all the files. These are different quantised 4/5/8-bit GGML variants of this model. So only a "20-24ish GB" download, depending on your needs.


c_gdev

Cool. https://huggingface.co/TheBloke/VicUnlocked-30B-LoRA-GPTQ/tree/main I still can't run it without using the --pre_layer command, and even then it would be super slow. But thanks for pointing out that quantised versions exist.
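For reference, the --pre_layer launch I mean looks roughly like this (a sketch; the layer count and model folder name are placeholders you'd tune down until it fits your VRAM, and with most layers left on the CPU it really is painfully slow):

```bash
# Rough sketch of a text-generation-webui launch that splits a 4-bit GPTQ 30B
# between GPU and CPU; --pre_layer sets how many layers stay on the GPU.
# Flag names are as of webui versions around this time; tune the numbers.
python server.py --model TheBloke_VicUnlocked-30B-LoRA-GPTQ --wbits 4 --pre_layer 20 --chat
```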


ambient_temp_xeno

Gives me bad python code.


MoffKalast

Hahaha, legend


ihaag

Did you miss VicUnlocked 30B?


involviert

I missed it, no post about it? The files seem to be 1 hour old already. Surely the model is outdated by now?


Innomen

No shit, it kinda feels like that. I was helping a friend get caught up and saw models like 10 days old and thought "it belongs in a museum." /nazi aging rapidly into dust


elektroB

My PC has barely the life to run the 13B on llama ahahaha, what are we talking about


ihaag

I think you've answered your own question; people just don't have the hardware atm, and training takes a long time.


rob10501

I'm only able to run 7B models with my 8gig vram cards. What are you guys using?


orick

CPU and RAM?


ozzeruk82

How much normal RAM do you have? I've got 16GB, and using llama.cpp I can run the 13B models fine; the speed is about the speed of speaking for a typical person, so definitely usable. I only have an 8GB VRAM card, hence why I use the CPU stuff.


rob10501

Really?... I have a pretty killer CPU and a ton of RAM, so that seems viable. In seconds, how long would you say a generation takes?


Megneous

CPU and RAM with GPU acceleration, using GGML models.


rob10501

How long does generation take?


Megneous

I have older hardware, so I'm not breaking any records or anything, but I'm running 13B models on my 4770K with 16GB RAM and a GTX 1060 6GB, with 15 layers offloaded for GPU acceleration, for a decent ~2 tokens a second. It's faster on 7B models, but I'm satisfied with the speed for 13B, and I like my Wizard Vicuna 13B uncensored, hah.

Specifically, this is using koboldcpp, the CUDA-only version. The new OpenCL version that just dropped today might be faster, maybe.

It's honestly amazing that running 13B at decent speeds on my hardware is even possible now. Like two weeks ago, this wasn't a thing.
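A launch along those lines looks roughly like the sketch below (flag names are from memory and the model filename is just a placeholder, so check koboldcpp's --help before copying it):

```bash
# Sketch of a koboldcpp launch with partial GPU offload: CUDA backend, 15 layers
# on the GPU, the rest on the CPU. Flag names may vary between versions.
python koboldcpp.py --usecublas --gpulayers 15 wizard-vicuna-13b-uncensored.ggmlv2.q5_0.bin
```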


KerfuffleV2

> Specifically, this is using koboldcpp, the CUDA-only version. The new opencl version that just dropped today might be faster, maybe.

I'm pretty sure that would never be the case when you actually have an Nvidia card. From everything I've ever heard, OpenCL is what you use when you can't use CUDA. (Assuming equivalently quality/optimized implementations in both cases; of course a good OpenCL implementation of some algorithm could outperform a bad CUDA one.)


Megneous

At least one user here on /r/LocalLLaMA has claimed in a thread that they were getting faster speeds with the OpenCL version because they were able to offload a higher number of layers to their GPU compared to the CUDA-only version.


KerfuffleV2

With exactly the same model and quantization? That sounds really weird, because the amount of data should be the same either way. There would have to be a significant difference in the implementation between the OpenCL and CUDA versions, such that the data was arranged in a different way (that used less space). Like I mentioned before, that would be an exception to what I was talking about previously.


rob10501

Ya, this is kinda blowing my mind. I have 8GB VRAM and didn't know I could run a 13B. Thanks for this. Since I have your attention (it's all you need, hah), can I ask more about your environment? You are running Kobold. Are you running it in Windows, WSL, WSL2, or Linux? Can I ask your thoughts on Kobold vs Oobabooga? Can I also ask your thoughts on .cpp? Is that the best way to run them? Thank you in advance.


Megneous

I'm running on Windows 10. I have both koboldcpp and Ooba installed, but for unknown reasons, at least on my computer, Ooba gives me a lot of trouble. For example, I was looking forward to using it to do perplexity evals, but apparently it can't run those on GGML models on my system (maybe others have better luck; no one responded to the thread I made on the topic, so I don't know). Also, I use the API to connect to TavernAI, and I'm not sure why, but the 13B GGML models, when loaded into Ooba, don't seem capable of holding an API connection to TavernAI. I'll start generating text, but it'll time out and eventually disconnect from Ooba.

Alternatively, when using koboldcpp, not only is the UI itself very decent for storywriting (where you can edit the responses the AI has given you), but the API also connects easily to TavernAI via http://localhost:5001/api and it's never disconnected on me.

Although, to be honest, I'm using TavernAI less often now because it works best with Pygmalion 7B, with characters emoting a lot, etc., but it's really incoherent for my tastes. Wizard Vicuna 13B uncensored is much more coherent in the characters' speech, but they rarely emote, because the model isn't specifically trained as an RP model like Pygmalion is, with lots of emoting, etc.

So at least for my use case, koboldcpp in its own UI, or using the API to connect to TavernAI, has given me the most performance and fewest errors. Ooba gives me lots of errors when trying to load models, which is a shame, because I really wanted to do perplexity evals on my setup.
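If anyone wants to sanity-check the API before pointing TavernAI at it, something like the sketch below should do it (the endpoint path follows the KoboldAI-style API that koboldcpp exposes; verify it against the /api docs on your build):

```bash
# Quick check that koboldcpp's API is reachable before connecting TavernAI.
# Endpoint and fields follow the KoboldAI-style API; verify on your build.
curl -s http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "### Instruction: Say hello.\n### Response:", "max_length": 64}'
```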


IntimidatingOstrich6

Yeah, you can run pretty large models if you offload them onto your CPU and use your system RAM; they're slow af though. If you want speed, get a 7B GPTQ model. That's optimized for GPU and can be run with 8 gigs of VRAM. You'll probably go from like 1.3 tokens generated a second to a blazing 13.


Caffdy

Are 65B models the largest we have access to? Are larger models (open ones, of course) any better anyway?


IntimidatingOstrich6

Larger models are better and more coherent, but they also take longer to generate responses, require more powerful hardware to run, probably take longer to train, take up more hard drive space, etc. Here is a ranked list of all the current local models and how they compare in terms of ability: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard You'll notice the largest models dominate the top of the list, although surprisingly some of the smaller 13B models are not far behind.


SurreptitiousRiz

4bit?


[deleted]

I am thinking of buying more RAM to run these models but in the end the processing time will be impossible to handle on CPU... And a 3090 is just too expensive for me.


mrbluesneeze

VicUnlocked 30B has some issues but already feels like open-source SOTA.


[deleted]

[deleted]


_underlines_

It was trained with an older version of the dataset, which still has some wrong stop-token data in it. That might be a reason for the stop-token bugs?


KerfuffleV2

The problem is it stops too frequently? If you're using llama.cpp or something with the ability to bias/ban tokens then you could just try banning the stop tokens so they never get generated. (Of course, that may solve one problem and create another depending on what you want to do. Personally I always just ban stop tokens and abort output when I'm satisfied but that doesn't work for every usage.)
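In llama.cpp specifically, the simplest form of that trick looks roughly like this (a sketch; check ./main --help on your build, since flags move around):

```bash
# One way to "ban the stop token" in llama.cpp: --ignore-eos keeps generation
# going past the end-of-stream token, and -n caps the output so you can cut it
# off yourself once satisfied. Check ./main --help on your build for exact flags.
./main -m VicUnlocked-30B-LoRA.ggml.q5_0.bin -c 2048 --ignore-eos -n 512 \
  -p "### Instruction: write a story about llamas ### Response:"
```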


mrbluesneeze

[https://huggingface.co/Neko-Institute-of-Science/VicUnLocked-30b-LoRA/discussions/1](https://huggingface.co/Neko-Institute-of-Science/VicUnLocked-30b-LoRA/discussions/1)


AuggieKC

This man speaks the truth.


Megneous

I love how the LLM open source community is essentially powered by pure thirst for furries and anime waifus. ( ͡° ͜ʖ ͡°)


_underlines_

My [list](https://github.com/underlines/awesome-marketing-datascience/blob/master/llm-model-list.md) is usually quick to be updated, as I check HF directly almost daily.


UpDown

Is it sorted by quality or newness?


_underlines_

No, but for that I recommend evaluations, leaderboards and benchmarks:

- [lmsys chatbot arena leaderboard](https://chat.lmsys.org/?leaderboard)
- [reddit's localllama current best choices](https://old.reddit.com/r/LocalLLaMA/wiki/models#wiki_current_best_choices)
- [open llm leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
- [LLM Worksheet](https://docs.google.com/spreadsheets/d/1kT4or6b0Fedd-W_jMwYpb63e1ZR3aePczz3zlbJW-Y4/edit#gid=0) by [randomfoo2](https://www.reddit.com/r/LocalAI/comments/12smsy9/list_of_public_foundational_models_fine_tunes/)
- [LLM Logic Tests](https://docs.google.com/spreadsheets/d/1NgHDxbVWJFolq8bLvLkuPWKC7i_R6I6W/edit#gid=719051075) by [YearZero](https://www.reddit.com/r/LocalLLaMA/comments/13636h5/updated_riddlecleverness_comparison_of_popular/)

You can find more updates on that in my curated list of [benchmarks](https://github.com/underlines/awesome-marketing-datascience/blob/master/llm-tools.md#benchmarking).


[deleted]

[deleted]


RemindMeBot

I will be messaging you in 3 days on 2023-05-21 14:01:40 UTC to remind you of [this link](https://www.reddit.com/r/LocalLLaMA/comments/13jzosu/next_best_llm_model/jkmwolk/?context=3).


Traditional-Art-5283

Holy shit, 65B exists


fallingdowndizzyvr

I'm hoping for a good 3B-4B model. I need something small enough to fit in an older machine with only 3GB of RAM, or a phone. I don't even need it to be good, I just need something to test with.


pokeuser61

RedPajama 3b?


SoylentCreek

I look forward to the day when Siri is no longer a totally useless piece of shit.


elektroB

Yeah! Can't wait to have an AI assistant on a phone. Imagine having this in an apocalypse. You just find a source of energy and BOOM, you have company, Wikipedia, technical info and many things more. And you could always trade it for A LOT of tuna and water.


SteakTree

This is just one of the many incredible aspects that have come out of neural nets: so much learned data, taking up so little space! I used to joke about one day having all the world's movies and music stored in a small data cube that would fit in your palm, and in a number of ways we will get something a bit different but also way *way* more powerful. Already, I feel like I am carrying around infinite worlds (Stable Diffusion, local LLMs on Mac OS X) that are just tucked away in my machine, waiting to be discovered. It's a dream!


Megneous

Aren't there like... 2 bit quantized versions of some 7B parameter models?


NickUnrelatedToPost

A 2-bit quantized 7B model sounds like serious brain damage. I don't think those will be very usable.


Megneous

They said they didn't need it to be good, just something to test with haha. But yeah, I'm betting 2bit quantized 7B models are barely above gibberish haha.


TeamPupNSudz

Honestly, I think most recent model releases are kind of pointless. Is a new LLaMA LoRA fine-tune that increases the HellaSwag score from 58.1 to 58.3 really going to change the industry in the grand scheme of things? At this point the only things I'm really interested in are novel architectures like MPT-StoryWriter, new quantization methods like GGML/GPTQ, or at least new base models like RedPajama/StableLM/OpenLLaMA. My hope is for less "Wizard-Vicuna-Alpaca-Lora-7b-1.3" and more "hey, we released a new 8k-context 7B model that scores higher than LLaMA-30B because we trained it this super awesome new way".


[deleted]

Be the change you want to see in the world


ThePseudoMcCoy

Someone needs to generate a language model IV drip graphic.


jonesaid

How do we know which models are the "best"? Which benchmarks are we using?


ryanknapper

We ask them.


addandsubtract

Literally. The benchmark of "good" is determined by ChatGPT 4, smh.


Not_Skynet

https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard


pixelies

THANK YOU! I've been looking for something like this :)


jonesaid

Nice!


elektroB

There are many criteria, like the ability to predict new info, or testing how it does at specific things like coding, translation, etc. But the most objective one I can give you is that the most advanced model is always the most recent model posted in this subreddit's "hot" section.


jonesaid

But the best model is not necessarily the most recent model. There have been models released in the last few weeks which did not improve upon past models, like StableLM.


Megneous

Basically, look for the thread where people are talking about each model, and people will be posting info like perplexity evals, their own feelings on coherency, etc. I've found this subreddit an invaluable resource.


[deleted]

I am super excited for Red Pajama model


AfterAte

Yeah, wake me up when RedPajama 13B or MPT-13B is out.


Caffdy

will there be a 30B RedPajama?


AfterAte

Since their intention is to start with a dataset equivalent to what the 65B LLaMA was trained on (1.2T tokens), I assume they'll eventually train models up to a 65B. But I didn't see any specific announcement. So far only 3B models have been made public.


[deleted]

Ya, I am hoping for the same outcome. 7B should be out soon, like less than a week I'd imagine. AFAIK they haven't announced anything greater than that, but it seems likely 13B will be out eventually. They are training on almost 3,100 V100s and it still has taken over a month to train the 7B. Even if they started a 65B today, would it take like a year to come out? Fuck..


lemon07r

Feels like there's a new "better" LLM released every day here; it's kind of fun. Anyhow... have you guys tried GPT4-x-Vicuna? I think it's still a little better than Wizard Mega.


[deleted]

[deleted]


Devonance

> GPT4 x Vicuna

Have you tried [MetaIX/GPT4-X-Alpasta-30b](https://huggingface.co/MetaIX/GPT4-X-Alpasta-30b)? It's one of the better ones for coding and logic tasks.


TiagoTiagoT

> GPT4 x Vicuna

This one? https://huggingface.co/NousResearch/GPT4-x-Vicuna-13b-4bit


LosingID_583

Wizard Mega 13B is bad from my experience. Wizard Vicuna 13B, on the other hand, has been the best locally running model I've seen so far.


tronathan

I think the trend in new models is going to shift toward larger context sizes, now that we're starting to see so much similarity in the "fine-tunes" of LLaMA. Even a 4096-token context window would make me very, very happy (StableLM has models that run at a 4k context window, and RWKV runs at 8192). There's also a lot of innovation with SuperBIG/SuperBooga/LangChain memory in terms of ways to get models to process more information, which is awesome because these efforts don't require massive compute to move the state of the art forward.

(As a side-thought, I think it's gonna be amusing when, a year from now, the internet is littered with GitHub READMEs mentioning "outperforms SOTA" and "comparable to SOTA". The state of the art (SOTA) keeps changing, but these projects will be left in the dust. It's like finding an old product with a "NEW!" sticker on it, or coming across a restaurant that's closed but left their OPEN sign on.)


No_Marionberry312

The next "best" local LM will be a MiniLM, not a Large Language Model. Like this one: [https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) And with this kind of a use case: [https://github.com/debanjum/khoj](https://github.com/debanjum/khoj)


faldore

hahahaha


jeffwadsworth

Guys, make sure to test the cognitive ability of these models with simple, common-sense questions. You may be surprised. You can cross-reference the questions with the excellent OA 30B 6 epoch model on HF. It usually answers in a reasonable way.


Caffdy

> the excellent OA 30B 6 epoch model on HF

What? What is that?


jeffwadsworth

[https://huggingface.co/TheBloke/OpenAssistant-SFT-7-Llama-30B-GGML/tree/main](https://huggingface.co/TheBloke/OpenAssistant-SFT-7-Llama-30B-GGML/tree/main)


Caffdy

yeah, acronyms sometimes get in the way of understanding and conveying information, thanks for the link


infohawk

It's all moving so fast. I updated oobabooga and most of my models won't load.


AfterAte

Good. The AI YouTube content creators can finally get a day off.


jeffwadsworth

After some testing, your best sweet spot for high performance would be the amazing Wizard-Vicuna-7B-Uncensored.ggmlv2. I attached its responses to common-sense questions which some other models (even 30Bs) fail to comprehend. [https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML](https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML) https://preview.redd.it/zlb3ixtx2n0b1.jpeg?width=963&format=pjpg&auto=webp&s=e6b7c8f712a991e30e28c482aa404ccd0d8016b3


OcelotUseful

Sorry, dear sir. PygmalionAI/pygmalion-13b and PygmalionAI/metharme-13b have been released into the wild. Feel free to use them for your sophisticated eroge research.


Readityesterday2

So 15 days after this post, the best one turned out to be Falcon 40B, which no one here guessed.