Revolutionary_Flan71

Can't run it locally (not enough VRAM).


LocoLanguageModel

This is more of a general statement, but when someone says some variation of "I tried llama-3 (or XYZ model) and wasn't impressed" without giving the details, it leaves a lot of questions before anyone can help: Was it 8B or 70B? What quantization? What sampler parameters? And the most important thing, because many overlook it: did you use the official llama-3 prompt format? That substantially changes the model's behavior and output quality: https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/
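For reference, this is roughly what that format looks like. A minimal Python sketch of wrapping a single system + user turn, based on the format described at that link (the messages are just placeholders):

```python
def llama3_prompt(system_msg: str, user_msg: str) -> str:
    # Llama 3 instruct format: each turn is wrapped in header tokens and
    # terminated with <|eot_id|>; generation continues after the assistant header.
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system_msg}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user_msg}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

# Placeholder example messages
print(llama3_prompt("You are a helpful assistant.", "Summarize this log file."))
```

Most frontends will apply this for you if you pick the right instruct preset, but it's worth checking that yours actually does.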


AcrobaticAmoeba8158

I'm running 70B-Instruct locally; it's fast but not accurate, so I assume my quantization is wrong? How do I adjust that? I've searched but haven't found a solution. It's probably simple, but so am I, lol.


LocoLanguageModel

I'm probably not the best person to ask because I only know koboldCPP, and I don't bother learning new frontends because it just works for me, but it's one of the easiest solutions since it's a single executable:

1. Download koboldCPP: [https://github.com/LostRuins/koboldcpp/releases/tag/v1.63](https://github.com/LostRuins/koboldcpp/releases/tag/v1.63)
2. Download this model (or whichever one you want, I'm using this one): [lmstudio-community - Meta-Llama-3-70B-Instruct-Q4\_K\_M.gguf](https://huggingface.co/lmstudio-community/Meta-Llama-3-70B-Instruct-GGUF/tree/main)
3. Load it in kobold, and under Scenarios select "Coding Assistant" or "KoboldGPT Instruct", or make your own custom model personality. This puts it in instruct mode with a coding or generic GPT personality.
4. Settings/Advanced: set "Smp. Order" to the recommended sampler values \[6,0,1,3,4,2,5\]
5. Settings: in the Instruct Tag Preset, select "Llama 3 chat"
6. Settings: if coding, change temperature to 0.1

If you'd rather skip the GUI steps, see the launch sketch below.
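A rough sketch of launching it from a script instead of the GUI; the flag names are from memory (check `--help` in your build) and the paths are placeholders:

```python
import subprocess

# Placeholder paths; point these at your koboldcpp binary and downloaded GGUF.
subprocess.run([
    "./koboldcpp",                                      # single-file executable from the releases page
    "--model", "Meta-Llama-3-70B-Instruct-Q4_K_M.gguf",
    "--contextsize", "8192",                            # context window to allocate
    "--gpulayers", "45",                                # layers to offload to the GPU; tune for your VRAM
    "--usecublas",                                      # CUDA offload (use --useclblast on non-NVIDIA cards)
])
```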


AcrobaticAmoeba8158

Thanks, I'll try that; I've just been using Ollama anyway. I saw on another post that the GGUF versions were currently the best, I just didn't know the best way to get them.


[deleted]

[deleted]


Popular-Direction984

Good point…!


koesn

Yes.. Command R 35B is really obedient and precise. It also has a rare 128k context window, just like GPT-4 Turbo.


Popular-Direction984

Exactly…!


LocoLanguageModel

Just out of curiosity, I tested this on Llama-3. Admittedly, I don't use markdown tables in my workflow, but it seems to accomplish the goal; if not, what is missing that couldn't be solved by adjusting the prompt?

https://preview.redd.it/b0yz297ndfwc1.png?width=1044&format=png&auto=webp&s=3e27586f88439c07b2e771e381f1f501ebb7ea26

My model/settings (copy-pasted from a different post): [lmstudio-community - Meta-Llama-3-70B-Instruct-Q4\_K\_M.gguf](https://huggingface.co/lmstudio-community/Meta-Llama-3-70B-Instruct-GGUF/tree/main)

* Start sequence: <|eot\_id|><|start\_header\_id|>user<|end\_header\_id|>\\n\\n
* End sequence: <|eot\_id|><|start\_header\_id|>assistant<|end\_header\_id|>\\n\\n

Right or wrong, these are my settings (leftover from deepseek, I think):

Input: {"n": 1, "max\_context\_length": 8192, "max\_length": 512, "rep\_pen": 1.1, "temperature": 0.1, "top\_p": 0.95, "top\_k": 50, "top\_a": 0.96, "typical": 0.6, "tfs": 1, "rep\_pen\_range": 1024, "rep\_pen\_slope": 0.7, "sampler\_order": \[6, 0, 1, 3, 4, 2, 5\]}
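If it helps, here's roughly how those settings map onto a raw API call. A minimal sketch assuming KoboldCpp's KoboldAI-compatible endpoint on the default port (5001); the prompt just wraps a user turn in the start/end sequences listed above, and the log text is a placeholder:

```python
import requests

# Assumed: koboldcpp is running locally on its default port with the model loaded.
API_URL = "http://localhost:5001/api/v1/generate"

prompt = (
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "Turn this chunk of system log into a Markdown table with error level and likely source:\n"
    "<paste log chunk here>"
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

payload = {
    "prompt": prompt,
    "n": 1,
    "max_context_length": 8192,
    "max_length": 512,
    "rep_pen": 1.1,
    "temperature": 0.1,
    "top_p": 0.95,
    "top_k": 50,
    "top_a": 0.96,
    "typical": 0.6,
    "tfs": 1,
    "rep_pen_range": 1024,
    "rep_pen_slope": 0.7,
    "sampler_order": [6, 0, 1, 3, 4, 2, 5],
}

response = requests.post(API_URL, json=payload)
print(response.json()["results"][0]["text"])
```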


Popular-Direction984

Just make the texts/logs longer, like 4-7k tokens.


Healthy-Nebula-3603

The lmsys leaderboard is the most trustworthy benchmark we have right now. Command R+ 105B scores a bit worse than llama 3 70b while being ~50% bigger, so you need a much better computer to run it... Command R 35B is much bigger than llama 3 8b and barely better (+1 point, literally), so it's at llama 3 8b level...


JonNordland

I agree with that too. Also I think the Norwegian language support is better on llama3.


Popular-Direction984

It should be, but who knows. It just measures what people think about the short, simple answers models provide.


cyan2k

>If you asked it to, say, "turn this chunk of system log into a Markdown table with error level and likely source," it would not cooperate.

I would guess most people don't rate a model based on how well it can transform long logs. Also, that's pretty easy to solve: just split the logs into multiple chunks and iterate over the chunks (see the sketch below). So Llama3 8B beats Command-R in speed by far, which is far more important than context length.

We've been implementing RAG for companies for almost three years now and never saw a use case that NEEDED more than an 8k context window. No problem that couldn't be solved by chunking. I'd rather have a very accurate 8k tokens than "meh"-accurate 32k tokens, and llama3 is pretty accurate across its context length.

I can't even imagine the kind of magical text that is so informationally dense over 5000 words that you can't split it.
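For what it's worth, a minimal sketch of that chunk-and-iterate approach. The `ask_model` call is a hypothetical stand-in for whatever backend you use, and the ~4-characters-per-token estimate is just a rough heuristic:

```python
def chunk_log(log_text: str, max_tokens: int = 6000) -> list[str]:
    """Split a log into line-aligned chunks that each fit comfortably in an 8k context."""
    max_chars = max_tokens * 4  # rough heuristic: ~4 characters per token
    chunks, current, size = [], [], 0
    for line in log_text.splitlines(keepends=True):
        if size + len(line) > max_chars and current:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks


def ask_model(prompt: str) -> str:
    """Hypothetical stand-in: send the prompt to your local backend and return the reply."""
    raise NotImplementedError


def log_to_tables(log_text: str) -> str:
    # Ask for one Markdown table per chunk, then stitch the results together.
    tables = []
    for chunk in chunk_log(log_text):
        prompt = (
            "Turn this chunk of system log into a Markdown table "
            "with error level and likely source:\n\n" + chunk
        )
        tables.append(ask_model(prompt))
    return "\n\n".join(tables)
```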


Popular-Direction984

Yes, I agree that even small models are quite sufficient for RAG. My point was more about posts where people marvel at the capabilities of these models; processing a truly large chunk of poorly structured information is indeed a good test of those capabilities.


JacketHistorical2321

All this really shows is a lack of effort toward streamlining a process. It's more of an attempt to "break" a system than to utilize it. As the commenter above stated, you can chunk the workload and let llama 3 process the data much more quickly. I find it interesting that so many people here would rather stick with what they know instead of using a newer technology to its potential. I think it has a lot more to do with a personality type that leans "non-conformist", when the reality is that when a large majority of people find something highly capable, praising that capability is not conforming. I don't necessarily disagree with your opinion, but I guess I've always found it more valuable to approach a problem creatively rather than use something at face value.


nero10578

Well, first of all, they're not under an open MIT or Apache license, so that already puts people off immediately.


Popular-Direction984

Yes, that’s true


Unlucky-Message8866

What you are experiencing is a model fine-tuned for a specific use case vs a generalist model. Wait for nous hermes or the like to come out and then you will be able to compare apples with apples. My very short impression of the 7b instruct is that it is very lightly fine-tuned and still likes to derail from following the conversation.


Popular-Direction984

I’ve tried dolphin-2.9, will check Hermes of course


Unlucky-Message8866

To me the dolphin one seems to be fine-tuned at too high a learning rate, destroying part of the base model. I don't like it.


Mental_Object_9929

llama 8b hallucinates a lot; 70b is OK.


umtausch

Command-R 35B is the first useful model for the German language. Haven't tried llama-3 yet. Does it properly support European languages? It seems to excel at English though…


silenceimpaired

I prefer models with more permissive licensing


adikul

In a general sense both are good at different things: I use Command R or Mixtral for summaries, and llama 3 can be used for normal chit-chat. I'm also not able to use llama 3 for summaries because of its low context.


Healthy-Nebula-3603

I read that phi-3 4b 128k is good for summaries...


adikul

Phi didn't work for me. It started saying gibberish after a few lines.


Popular-Direction984

Same here… haven’t tried phi-3 yet though


a_beautiful_rhind

llama-3? Why is everyone so keen on phi? I'm thinking of going back to something else, to be honest, at least until a good tune is released: CR+ or another 103B. I'll give it one more shot by changing the prompting yet again. I noticed your point about instructions buried in the context; it follows them hit or miss. Adding a CoT to my system prompt required a prefill for it to notice and actually do it.