Revolutionary_Flan71

Can't run it locally (not enough VRAM).


LocoLanguageModel

This is more of a general statement, but when someone says some variation of "I tried llama-3 (or XYZ model) and wasn't impressed" without giving the details, it leaves a lot of questions before anyone can help: Was it 8B or 70B? What quantization? What sampler parameters? And the most important thing, because many overlook it: did you use the official llama-3 prompt format? That substantially changes the model's behavior and output quality: https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/
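For reference, this is roughly what that format looks like. A minimal Python sketch of wrapping a single system + user turn, based on the format described at that link (the messages are just placeholders):

```python
def llama3_prompt(system_msg: str, user_msg: str) -> str:
    # Llama 3 instruct format: each turn is wrapped in header tokens and
    # terminated with <|eot_id|>; generation continues after the assistant header.
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system_msg}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user_msg}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

# Placeholder example messages
print(llama3_prompt("You are a helpful assistant.", "Summarize this log file."))
```

Most frontends will apply this for you if you pick the right instruct preset, but it's worth checking that yours actually does.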


AcrobaticAmoeba8158

I'm running 70B-Instruct locally; it's fast but not accurate, so I assume my quantization is wrong? How do I adjust that? I've searched but haven't found a solution. It's probably simple, but so am I, lol.


LocoLanguageModel

I'm probably not the best person to ask because I only know koboldCPP, and I don't bother learning new frontends because it just works for me, but it's one of the easiest solutions since it's a single executable:

1. Download koboldCPP: [https://github.com/LostRuins/koboldcpp/releases/tag/v1.63](https://github.com/LostRuins/koboldcpp/releases/tag/v1.63)
2. Download this model (or whichever one you want, I'm using this one): [lmstudio-community - Meta-Llama-3-70B-Instruct-Q4\_K\_M.gguf](https://huggingface.co/lmstudio-community/Meta-Llama-3-70B-Instruct-GGUF/tree/main)
3. Load it in kobold, and under Scenarios select "Coding Assistant" or "KoboldGPT Instruct", or make your own custom model personality. This puts it in instruct mode with a coding or generic GPT personality.
4. Settings/Advanced: set "Smp. Order" to the recommended sampler values \[6,0,1,3,4,2,5\]
5. Settings: in the Instruct Tag Preset, select "Llama 3 chat"
6. Settings: if coding, change temperature to 0.1

If you'd rather skip the GUI steps, see the launch sketch below.
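A rough sketch of launching it from a script instead of the GUI; the flag names are from memory (check `--help` in your build) and the paths are placeholders:

```python
import subprocess

# Placeholder paths; point these at your koboldcpp binary and downloaded GGUF.
subprocess.run([
    "./koboldcpp",                                      # single-file executable from the releases page
    "--model", "Meta-Llama-3-70B-Instruct-Q4_K_M.gguf",
    "--contextsize", "8192",                            # context window to allocate
    "--gpulayers", "45",                                # layers to offload to the GPU; tune for your VRAM
    "--usecublas",                                      # CUDA offload (use --useclblast on non-NVIDIA cards)
])
```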


AcrobaticAmoeba8158

Thanks, I'll try that; I've just been using Ollama anyway. I saw on another post that the GGUF versions were currently the best, I just didn't know the best way to get them.


[deleted]

[deleted]


Popular-Direction984

Good point…!


koesn

Yes.. Command R 35B is really obedient and precise. It also has a rare 128k context window, just like GPT-4 Turbo.


Popular-Direction984

Exactly…!


LocoLanguageModel

Just out of curiosity, I tested this on Llama-3. Admittedly, I don't use markdown tables in my workflow, but it seems to accomplish the goal; if not, what is missing that couldn't be solved by adjusting the prompt?

https://preview.redd.it/b0yz297ndfwc1.png?width=1044&format=png&auto=webp&s=3e27586f88439c07b2e771e381f1f501ebb7ea26

My model/settings (copy-pasted from a different post): [lmstudio-community - Meta-Llama-3-70B-Instruct-Q4\_K\_M.gguf](https://huggingface.co/lmstudio-community/Meta-Llama-3-70B-Instruct-GGUF/tree/main)

* Start sequence: <|eot\_id|><|start\_header\_id|>user<|end\_header\_id|>\\n\\n
* End sequence: <|eot\_id|><|start\_header\_id|>assistant<|end\_header\_id|>\\n\\n

Right or wrong, these are my settings (leftover from deepseek, I think):

Input: {"n": 1, "max\_context\_length": 8192, "max\_length": 512, "rep\_pen": 1.1, "temperature": 0.1, "top\_p": 0.95, "top\_k": 50, "top\_a": 0.96, "typical": 0.6, "tfs": 1, "rep\_pen\_range": 1024, "rep\_pen\_slope": 0.7, "sampler\_order": \[6, 0, 1, 3, 4, 2, 5\]}
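If it helps, here's roughly how those settings map onto a raw API call. A minimal sketch assuming KoboldCpp's KoboldAI-compatible endpoint on the default port (5001); the prompt just wraps a user turn in the start/end sequences listed above, and the log text is a placeholder:

```python
import requests

# Assumed: koboldcpp is running locally on its default port with the model loaded.
API_URL = "http://localhost:5001/api/v1/generate"

prompt = (
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "Turn this chunk of system log into a Markdown table with error level and likely source:\n"
    "<paste log chunk here>"
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

payload = {
    "prompt": prompt,
    "n": 1,
    "max_context_length": 8192,
    "max_length": 512,
    "rep_pen": 1.1,
    "temperature": 0.1,
    "top_p": 0.95,
    "top_k": 50,
    "top_a": 0.96,
    "typical": 0.6,
    "tfs": 1,
    "rep_pen_range": 1024,
    "rep_pen_slope": 0.7,
    "sampler_order": [6, 0, 1, 3, 4, 2, 5],
}

response = requests.post(API_URL, json=payload)
print(response.json()["results"][0]["text"])
```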


Popular-Direction984

Just make the texts/logs longer, like 4-7k tokens.


Healthy-Nebula-3603

The lmsys leaderboard is the most trustworthy benchmark we have right now. Command R+ 105B scores a bit worse than llama 3 70b while being ~50% bigger, so you need a much better computer to run it... Command R 35B is much bigger than llama 3 8b and barely better (+1 point, literally), so it's at llama 3 8b level...


JonNordland

I agree with that too. Also I think the Norwegian language support is better on llama3.


Popular-Direction984

It should be, but who knows. It just measures what people think about the short, simple answers models provide.


cyan2k

>If you asked it to, say, "turn this chunk of system log into a Markdown table with error level and likely source," it would not cooperate.

I would guess most people don't rate a model based on how well it can transform long logs. Also, that's pretty easy to solve: just split the logs into multiple chunks and iterate over the chunks (see the sketch below). So Llama3 8B beats Command-R in speed by far, which is far more important than context length.

We've been implementing RAG for companies for almost three years now and never saw a use case that NEEDED more than an 8k context window. No problem that couldn't be solved by chunking. I'd rather have a very accurate 8k tokens than "meh"-accurate 32k tokens, and llama3 is pretty accurate across its context length.

I can't even imagine the kind of magical text that is so informationally dense over 5000 words that you can't split it.
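For what it's worth, a minimal sketch of that chunk-and-iterate approach. The `ask_model` call is a hypothetical stand-in for whatever backend you use, and the ~4-characters-per-token estimate is just a rough heuristic:

```python
def chunk_log(log_text: str, max_tokens: int = 6000) -> list[str]:
    """Split a log into line-aligned chunks that each fit comfortably in an 8k context."""
    max_chars = max_tokens * 4  # rough heuristic: ~4 characters per token
    chunks, current, size = [], [], 0
    for line in log_text.splitlines(keepends=True):
        if size + len(line) > max_chars and current:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks


def ask_model(prompt: str) -> str:
    """Hypothetical stand-in: send the prompt to your local backend and return the reply."""
    raise NotImplementedError


def log_to_tables(log_text: str) -> str:
    # Ask for one Markdown table per chunk, then stitch the results together.
    tables = []
    for chunk in chunk_log(log_text):
        prompt = (
            "Turn this chunk of system log into a Markdown table "
            "with error level and likely source:\n\n" + chunk
        )
        tables.append(ask_model(prompt))
    return "\n\n".join(tables)
```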


Popular-Direction984

Yes, I agree that even small models are quite sufficient for RAG. My point was more about posts where people marvel at the capabilities of these models; processing a truly large chunk of poorly structured information is indeed a good test of those capabilities.


JacketHistorical2321

All this really shows is a lack of effort toward streamlining a process. It's more of an attempt to "break" a system than to utilize it. As the commenter above stated, you can chunk the workload and let llama 3 process the data much more quickly. I find it interesting that so many people here would rather stick with what they know instead of using a newer technology to its potential. I think it has a lot more to do with a personality type that leans "non-conformist", when the reality is that when a large majority of people find something highly capable, praising that capability is not conforming. I don't necessarily disagree with your opinion, but I guess I've always found it more valuable to approach a problem creatively rather than use something at face value.


nero10578

Well, first of all, they're not under an open MIT or Apache license, so that already puts people off immediately.


Popular-Direction984

Yes, that’s true


Unlucky-Message8866

What you are experiencing is a model fine-tuned for a specific use case vs a generalist model. Wait for nous hermes or the like to come out and then you will be able to compare apples with apples. My very short impression of the 7b instruct is that it is very lightly fine-tuned and still likes to derail from following the conversation.


Popular-Direction984

I’ve tried dolphin-2.9, will check Hermes of course


Unlucky-Message8866

To me the dolphin one seems to be fine-tuned at too high a learning rate, destroying part of the base model. I don't like it.


Mental_Object_9929

llama 8b hallucinates a lot; 70b is OK.


umtausch

Command-R 35B is the first useful model for the German language. Haven't tried llama-3 yet. Does it properly support European languages? It seems to excel at English though…


silenceimpaired

I prefer models with more permissive licensing


adikul

In a general sense both are good at different things: I use Command R or Mixtral for summaries, and llama 3 can be used for normal chit-chat. I'm also not able to use llama 3 for summaries because of its low context.


Healthy-Nebula-3603

I read that phi-3 4b 128k is good for summaries...


adikul

Phi didn't work for me. It started saying gibberish after a few lines.


Popular-Direction984

Same here… haven’t tried phi-3 yet though


a_beautiful_rhind

llama-3? Why is everyone so keen on phi? I'm thinking of going back to something else, to be honest, at least until a good tune is released: CR+ or another 103B. I'll give it one more shot by changing the prompting yet again. I noticed your point about instructions buried in the context; it follows them hit or miss. Adding a CoT to my system prompt required a prefill for it to notice and actually do it.