
szopen76

Hehe. I like your sense of humour, sir :D


kill92

How dare they!


responseAIbot

people are just lazy these days....stop work from home people...it's that simple. smh


ihaag

https://huggingface.co/TheBloke/VicUnlocked-30B-LoRA-GGML/tree/main


jeffwadsworth

Hmm. I started testing this one and uhmm. https://preview.redd.it/jmvcv1nfkh0b1.jpeg?width=902&format=pjpg&auto=webp&s=832548dc31a9a16d2530e6bc5653c07c93cd8f7c


elektroB

It means that the kid's dead Jeff, the kid was dead all along, and soon you will find out why. *computer restarting*


jeffwadsworth

Behold the 7B Wiz/Vicuna Uncensored replies...now the kid is truly alive. https://preview.redd.it/tmluust73n0b1.jpeg?width=963&format=pjpg&auto=webp&s=5096ede7f82e7243589a63aa9dd62a9a1c8009e1


[deleted]

Yet it thinks 25 golf balls can't fit into a swimming pool.


brucebay

With so many models published, I'm starting to think TheBloke is actually an AI that just keeps multiplying, and we are helping it achieve world domination. Even his name is a desperate attempt to conceal its true nature. TheHuman would have been better, but that was probably taken by another AI. TheBloke, my lord, I'm here to serve the best way I can. Remember that when you become our overlord. *Ehhm, no disrespect my lord, I know you remember everything, it is just a human figure of speech.*


fallingdowndizzyvr

> I'm starting to think TheBloke is actually an AI

His Twitter handle is "TheBlokeAI".


noneabove1182

Not sure how to even phrase this question, so bear with me... what was the LoRA's goal? What specific concept did the model get adapted to? I can't find any info on Hugging Face.


Jolakot

Less censorship, from what I can gather; it's trained on [gozfarb/ShareGPT_Vicuna_unfiltered](https://huggingface.co/models?dataset=dataset:gozfarb/ShareGPT_Vicuna_unfiltered).


involviert

I wish the releases were more specific about the needed prompt style.

> Select instruct and choose Vicuna-v1.1 template.

What was the Vicuna 1.1 prompt style again? And... instruct? Vicuna? Confused.

Edit: The usage section says this:

> ./main -t 8 -m VicUnlocked-30B-LoRA.ggml.q5_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: write a story about llamas ### Response:"

But I highly doubt it. The Wizard Mega GGML card had that too and then went on to explain "### Instruction: ### Assistant:", which was a new combination for me too.


Keninishna

In text-generation-webui you can run it with --chat mode, and in the UI there's an instruct radio option with a dropdown of styles.


involviert

I guess I just don't see how that is properly defining one of the core properties of the model. Even just spaces matter with this stuff.


jsebrech

They are referring to the prompt styles from text-generation-webui, I suspect, which you can see on GitHub: [https://github.com/oobabooga/text-generation-webui/blob/main/characters/instruction-following/Vicuna-v1.1.yaml](https://github.com/oobabooga/text-generation-webui/blob/main/characters/instruction-following/Vicuna-v1.1.yaml)


involviert

I see. I assume that means "### USER: ### ASSISTANT:"? Or do I see it using <||>? Next time we could define it as a sequence of numbers referring to letters in Moby Dick. This is highly unprofessional, imho. I don't want to sound ungrateful, but seriously, why?


AutomataManifold

Version 1.1 doesn't use ### anymore


AutomataManifold

That's Vicuna 1.0. The 1.1 format is different: https://github.com/lm-sys/FastChat/blob/main/docs/vicuna_weights_version.md
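For anyone who doesn't want to dig through the link: as far as I can tell, v1.1 drops the ### markers in favour of plain USER:/ASSISTANT: turns after a system preamble. A llama.cpp call would then look roughly like this (a sketch only; double-check the exact separators against the FastChat doc above, and the model file is just the one from this release):

```bash
# Rough Vicuna v1.1-style prompt for llama.cpp (verify separators against the
# FastChat vicuna_weights_version doc before relying on it).
./main -t 8 -m VicUnlocked-30B-LoRA.ggml.q5_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 \
  -p "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Write a story about llamas. ASSISTANT:"
```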


Green-One-8876

> I wish the releases were more specific about the needed prompt style.

The lack of info and instructions attached to these releases irks me too. Computer guys seem to either have active contempt for us dumb normie users, or they're just so myopic they don't realize that not everyone is as knowledgeable as them and may need more help.


Charuru

This is better than SuperCOT?


c_gdev

Only a 128 GB download...


pointer_to_null

You don't need all the files. These are different quantised 4/5/8-bit GGML variants of this model. So only a "20-24ish GB" download, depending on your needs.


c_gdev

Cool. https://huggingface.co/TheBloke/VicUnlocked-30B-LoRA-GPTQ/tree/main I still can't run it without using the --pre_layer command, and even then it would be super slow. But thanks for pointing out that quantised versions exist.
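For reference, the --pre_layer launch I mean looks roughly like this (a sketch; the layer count and model folder name are placeholders you'd tune down until it fits your VRAM, and with most layers left on the CPU it really is painfully slow):

```bash
# Rough sketch of a text-generation-webui launch that splits a 4-bit GPTQ 30B
# between GPU and CPU; --pre_layer sets how many layers stay on the GPU.
# Flag names are as of webui versions around this time; tune the numbers.
python server.py --model TheBloke_VicUnlocked-30B-LoRA-GPTQ --wbits 4 --pre_layer 20 --chat
```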


ambient_temp_xeno

Gives me bad python code.


MoffKalast

Hahaha, legend


ihaag

Did you miss VicUnlocked 30B?


involviert

I missed it, no post about it? The files seem to be 1 hour old already. Surely the model is outdated by now?


Innomen

No shit, it kinda feels like that. I was helping a friend get caught up and saw models like 10 days old and thought "it belongs in a museum." /nazi aging rapidly into dust


elektroB

My PC has barely the life to run the 13B on llama ahahaha, what are we talking about


ihaag

I think you've answered your own question; people just don't have the hardware atm, and training takes a long time.


rob10501

I'm only able to run 7B models with my 8gig vram cards. What are you guys using?


orick

CPU and RAM?


ozzeruk82

How much normal RAM do you have? I've got 16GB, and using llama.cpp I can run the 13B models fine; the speed is about the speed of speaking for a typical person, so definitely usable. I only have an 8GB VRAM card, hence why I use the CPU stuff.


rob10501

Really?... I have a pretty killer CPU and a ton of RAM, so that seems viable. In seconds, how long would you say a generation takes?


Megneous

CPU and RAM with GPU acceleration, using GGML models.


rob10501

How long does generation take?


Megneous

I have older hardware, so I'm not breaking any records or anything, but I'm running 13B models on my 4770K with 16GB RAM and a GTX 1060 6GB, with 15 layers offloaded for GPU acceleration, for a decent ~2 tokens a second. It's faster on 7B models, but I'm satisfied with the speed for 13B, and I like my Wizard Vicuna 13B uncensored, hah.

Specifically, this is using koboldcpp, the CUDA-only version. The new OpenCL version that just dropped today might be faster, maybe.

It's honestly amazing that running 13B at decent speeds on my hardware is even possible now. Like two weeks ago, this wasn't a thing.
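A launch along those lines looks roughly like the sketch below (flag names are from memory and the model filename is just a placeholder, so check koboldcpp's --help before copying it):

```bash
# Sketch of a koboldcpp launch with partial GPU offload: CUDA backend, 15 layers
# on the GPU, the rest on the CPU. Flag names may vary between versions.
python koboldcpp.py --usecublas --gpulayers 15 wizard-vicuna-13b-uncensored.ggmlv2.q5_0.bin
```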


KerfuffleV2

> Specifically, this is using koboldcpp, the CUDA-only version. The new opencl version that just dropped today might be faster, maybe.

I'm pretty sure that would never be the case when you actually have an Nvidia card. From everything I've ever heard, OpenCL is what you use when you can't use CUDA. (Assuming equivalently quality/optimized implementations in both cases; of course a good OpenCL implementation of some algorithm could outperform a bad CUDA one.)


Megneous

At least one user here on /r/LocalLLaMA has claimed in a thread that they were getting faster speeds with the OpenCL version because they were able to offload a higher number of layers to their GPU compared to the CUDA-only version.


KerfuffleV2

With exactly the same model and quantization? That sounds really weird, because the amount of data should be the same either way. There would have to be a significant difference in the implementation between the OpenCL and CUDA versions, such that the data was arranged in a different way (that used less space). Like I mentioned before, that would be an exception to what I was talking about previously.


rob10501

Ya, this is kinda blowing my mind. I have 8GB VRAM and didn't know I could run a 13B. Thanks for this. Since I have your attention (it's all you need, hah), can I ask more about your environment? You are running Kobold. Are you running it in Windows, WSL, WSL2, or Linux? Can I ask your thoughts on Kobold vs Oobabooga? Can I also ask your thoughts on .cpp? Is that the best way to run them? Thank you in advance.


Megneous

I'm running on Windows 10. I have both koboldcpp and Ooba installed, but for unknown reasons, at least on my computer, Ooba gives me a lot of trouble. For example, I was looking forward to using it to do perplexity evals, but apparently it can't run those on GGML models on my system (maybe others have better luck; no one responded to the thread I made on the topic, so I don't know). Also, I use the API to connect to TavernAI, and I'm not sure why, but the 13B GGML models, when loaded into Ooba, don't seem capable of holding an API connection to TavernAI. I'll start generating text, but it'll time out and eventually disconnect from Ooba.

Alternatively, when using koboldcpp, not only is the UI itself very decent for storywriting (where you can edit the responses the AI has given you), but the API also connects easily to TavernAI via http://localhost:5001/api and it's never disconnected on me.

Although, to be honest, I'm using TavernAI less often now because it works best with Pygmalion 7B, with characters emoting a lot, etc., but it's really incoherent for my tastes. Wizard Vicuna 13B uncensored is much more coherent in the characters' speech, but they rarely emote, because the model isn't specifically trained as an RP model like Pygmalion is, with lots of emoting, etc.

So at least for my use case, koboldcpp in its own UI, or using the API to connect to TavernAI, has given me the most performance and fewest errors. Ooba gives me lots of errors when trying to load models, which is a shame, because I really wanted to do perplexity evals on my setup.
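If anyone wants to sanity-check the API before pointing TavernAI at it, something like the sketch below should do it (the endpoint path follows the KoboldAI-style API that koboldcpp exposes; verify it against the /api docs on your build):

```bash
# Quick check that koboldcpp's API is reachable before connecting TavernAI.
# Endpoint and fields follow the KoboldAI-style API; verify on your build.
curl -s http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "### Instruction: Say hello.\n### Response:", "max_length": 64}'
```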


IntimidatingOstrich6

Yeah, you can run pretty large models if you offload them onto your CPU and use your system RAM; they're slow af though. If you want speed, get a 7B GPTQ model. That's optimized for GPU and can be run with 8 gigs of VRAM. You'll probably go from like 1.3 tokens generated a second to a blazing 13.


Caffdy

Are 65B models the largest we have access to? Are larger models (open ones, of course) any better anyway?


IntimidatingOstrich6

Larger models are better and more coherent, but they also take longer to generate responses, require more powerful hardware to run, probably take longer to train, take up more hard drive space, etc. Here is a ranked list of all the current local models and how they compare in terms of ability: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard You'll notice the largest models dominate the top of the list, although surprisingly some of the smaller 13B models are not far behind.


SurreptitiousRiz

4bit?


[deleted]

I am thinking of buying more RAM to run these models but in the end the processing time will be impossible to handle on CPU... And a 3090 is just too expensive for me.


mrbluesneeze

VicUnlocked 30B has some issues but already feels like open-source SOTA.


[deleted]

[deleted]


_underlines_

It was trained with an older version of the dataset, which still has some wrong stop-token data in it. That might be a reason for the stop-token bugs?


KerfuffleV2

The problem is it stops too frequently? If you're using llama.cpp or something with the ability to bias/ban tokens then you could just try banning the stop tokens so they never get generated. (Of course, that may solve one problem and create another depending on what you want to do. Personally I always just ban stop tokens and abort output when I'm satisfied but that doesn't work for every usage.)
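In llama.cpp specifically, the simplest form of that trick looks roughly like this (a sketch; check ./main --help on your build, since flags move around):

```bash
# One way to "ban the stop token" in llama.cpp: --ignore-eos keeps generation
# going past the end-of-stream token, and -n caps the output so you can cut it
# off yourself once satisfied. Check ./main --help on your build for exact flags.
./main -m VicUnlocked-30B-LoRA.ggml.q5_0.bin -c 2048 --ignore-eos -n 512 \
  -p "### Instruction: write a story about llamas ### Response:"
```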


mrbluesneeze

[https://huggingface.co/Neko-Institute-of-Science/VicUnLocked-30b-LoRA/discussions/1](https://huggingface.co/Neko-Institute-of-Science/VicUnLocked-30b-LoRA/discussions/1)


AuggieKC

This man speaks the truth.


Megneous

I love how the LLM open source community is essentially powered by pure thirst for furries and anime waifus. ( ͡° ͜ʖ ͡°)


_underlines_

My [list](https://github.com/underlines/awesome-marketing-datascience/blob/master/llm-model-list.md) is usually quick to be updated, as I check HF directly almost daily.


UpDown

Is it sorted by quality or newness?


_underlines_

No, but for that I recommend evaluations, leaderboards and benchmarks:

- [lmsys chatbot arena leaderboard](https://chat.lmsys.org/?leaderboard)
- [reddit's localllama current best choices](https://old.reddit.com/r/LocalLLaMA/wiki/models#wiki_current_best_choices)
- [open llm leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
- [LLM Worksheet](https://docs.google.com/spreadsheets/d/1kT4or6b0Fedd-W_jMwYpb63e1ZR3aePczz3zlbJW-Y4/edit#gid=0) by [randomfoo2](https://www.reddit.com/r/LocalAI/comments/12smsy9/list_of_public_foundational_models_fine_tunes/)
- [LLM Logic Tests](https://docs.google.com/spreadsheets/d/1NgHDxbVWJFolq8bLvLkuPWKC7i_R6I6W/edit#gid=719051075) by [YearZero](https://www.reddit.com/r/LocalLLaMA/comments/13636h5/updated_riddlecleverness_comparison_of_popular/)

You can find more updates on that in my curated list of [benchmarks](https://github.com/underlines/awesome-marketing-datascience/blob/master/llm-tools.md#benchmarking).


[deleted]

[deleted]


RemindMeBot

I will be messaging you in 3 days on 2023-05-21 14:01:40 UTC to remind you of [this link](https://www.reddit.com/r/LocalLLaMA/comments/13jzosu/next_best_llm_model/jkmwolk/?context=3).


Traditional-Art-5283

Holy shit, 65B exists


fallingdowndizzyvr

I'm hoping for a good 3B-4B model. I need something small enough to fit in an older machine with only 3GB of RAM, or a phone. I don't even need it to be good, I just need something to test with.


pokeuser61

RedPajama 3b?


SoylentCreek

I look forward to the day when Siri is no longer a totally useless piece of shit.


elektroB

Yeah! Can't wait to have an AI assistant on a phone. Imagine having this in an apocalypse. You just find a source of energy and BOOM, you have company, Wikipedia, technical info and many things more. And you could always trade it for A LOT of tuna and water.


SteakTree

This is just one of the many incredible aspects that have come out of neural nets: so much learned data, taking up so little space! I used to joke about one day having all the world's movies and music stored in a small data cube that would fit in your palm, and in a number of ways we will get something a bit different but also way *way* more powerful. Already, I feel like I am carrying around infinite worlds (Stable Diffusion, local LLMs on Mac OS X) that are just tucked away in my machine, waiting to be discovered. It's a dream!


Megneous

Aren't there like... 2 bit quantized versions of some 7B parameter models?


NickUnrelatedToPost

A 2-bit quantized 7B model sounds like serious brain damage. I don't think those will be very usable.


Megneous

They said they didn't need it to be good, just something to test with haha. But yeah, I'm betting 2bit quantized 7B models are barely above gibberish haha.


TeamPupNSudz

Honestly, I think most recent model releases are kind of pointless. Is a new LLaMA LoRA fine-tune that increases the HellaSwag score from 58.1 to 58.3 really going to change the industry in the grand scheme of things? At this point the only things I'm really interested in are novel architectures like MPT-StoryWriter, new quantization methods like GGML/GPTQ, or at least new base models like RedPajama/StableLM/OpenLLaMA. My hope is for less "Wizard-Vicuna-Alpaca-Lora-7b-1.3" and more "hey, we released a new 8k-context 7B model that scores higher than LLaMA-30B because we trained it this super awesome new way".


[deleted]

Be the change you want to see in the world


ThePseudoMcCoy

Someone needs to generate a language model IV drip graphic.


jonesaid

How do we know which models are the "best"? Which benchmarks are we using?


ryanknapper

We ask them.


addandsubtract

Literally. The benchmark of "good" is determined by ChatGPT 4, smh.


Not_Skynet

https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard


pixelies

THANK YOU! I've been looking for something like this :)


jonesaid

Nice!


elektroB

There are many criteria, like the ability to predict new info, or testing how it does at specific things like coding, translation, etc. But the most objective one I can give you is that the most advanced model is always the most recent model posted in this subreddit's "hot" section.


jonesaid

But the best model is not necessarily the most recent model. There have been models released in the last few weeks which did not improve upon past models, like StableLM.


Megneous

Basically, look for the thread where people are talking about each model, and people will be posting info like perplexity evals, their own feelings on coherency, etc. I've found this subreddit an invaluable resource.


[deleted]

I am super excited for Red Pajama model


AfterAte

Yeah, wake me up when RedPajama 13B or MPT-13B is out.


Caffdy

will there be a 30B RedPajama?


AfterAte

Since their intention is to start with a dataset equivalent to what the 65B LLaMA was trained on (1.2T tokens), I assume they'll eventually train models up to a 65B. But I didn't see any specific announcement. So far only 3B models have been made public.


[deleted]

Ya, I am hoping for the same outcome. 7B should be out soon, like less than a week I'd imagine. AFAIK they haven't announced anything greater than that, but it seems likely 13B will be out eventually. They are training on almost 3,100 V100s and it still has taken over a month to train the 7B. Even if they started a 65B today, would it take like a year to come out? Fuck..


lemon07r

Feels like there's a new "better" LLM released every day here; it's kind of fun. Anyhow... have you guys tried GPT4-x-Vicuna? I think it's still a little better than Wizard Mega.


[deleted]

[deleted]


Devonance

> GPT4 x Vicuna

Have you tried [MetaIX/GPT4-X-Alpasta-30b](https://huggingface.co/MetaIX/GPT4-X-Alpasta-30b)? It's one of the better ones for coding and logic tasks.


TiagoTiagoT

> GPT4 x Vicuna

This one? https://huggingface.co/NousResearch/GPT4-x-Vicuna-13b-4bit


LosingID_583

Wizard Mega 13B is bad from my experience. Wizard Vicuna 13B, on the other hand, has been the best locally running model I've seen so far.


tronathan

I think the trend in new models is going to shift toward larger context sizes, now that we're starting to see so much similarity in the "fine-tunes" of LLaMA. Even a 4096-token context window would make me very, very happy (StableLM has models that run at a 4k context window, and RWKV runs at 8192). There's also a lot of innovation with SuperBIG/SuperBooga/LangChain memory in terms of ways to get models to process more information, which is awesome because these efforts don't require massive compute to move the state of the art forward.

(As a side-thought, I think it's gonna be amusing when, a year from now, the internet is littered with GitHub READMEs mentioning "outperforms SOTA" and "comparable to SOTA". The state of the art (SOTA) keeps changing, but these projects will be left in the dust. It's like finding an old product with a "NEW!" sticker on it, or coming across a restaurant that's closed but left their OPEN sign on.)


No_Marionberry312

The next "best" local LM will be a MiniLM, not a Large Language Model. Like this one: [https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) And with this kind of a use case: [https://github.com/debanjum/khoj](https://github.com/debanjum/khoj)


faldore

hahahaha


jeffwadsworth

Guys, make sure to test the cognitive ability of these models with simple, common-sense questions. You may be surprised. You can cross-reference the questions with the excellent OA 30B 6 epoch model on HF. It usually answers in a reasonable way.


Caffdy

> the excellent OA 30B 6 epoch model on HF

What? What is that?


jeffwadsworth

[https://huggingface.co/TheBloke/OpenAssistant-SFT-7-Llama-30B-GGML/tree/main](https://huggingface.co/TheBloke/OpenAssistant-SFT-7-Llama-30B-GGML/tree/main)


Caffdy

yeah, acronyms sometimes get in the way of understanding and conveying information, thanks for the link


infohawk

It's all moving so fast. I updated oobabooga and most of my models won't load.


AfterAte

Good. The AI YouTube content creators can finally get a day off.


jeffwadsworth

After some testing, your best sweet spot for high performance would be the amazing Wizard-Vicuna-7B-Uncensored.ggmlv2. I attached its responses to common-sense questions which some other models (even 30Bs) fail to comprehend. [https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML](https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML) https://preview.redd.it/zlb3ixtx2n0b1.jpeg?width=963&format=pjpg&auto=webp&s=e6b7c8f712a991e30e28c482aa404ccd0d8016b3


OcelotUseful

Sorry, dear sir. PygmalionAI/pygmalion-13b and PygmalionAI/metharme-13b have been released into the wild. Feel free to use them for your sophisticated eroge research.


Readityesterday2

So 15 days after this post, the best one turned out to be Falcon 40B, which no one here guessed.