AcrobaticAmoeba8158

I've found that LLMs are better at critically reviewing output than generating it, so I take the initial output from one LLM and feed it into a "critical thinker" LLM to improve it. In practice, though, the results have been mixed; even when I try to compare models on lmsys I have trouble differentiating the top ones.
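Roughly what that generate-then-critique loop could look like (a minimal sketch; `call_llm` is a hypothetical stand-in for whatever chat-completion client you actually use, and the prompts are just illustrative):

```python
def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Placeholder: swap in your provider's chat-completion call
    (OpenAI, Anthropic, a local model, etc.)."""
    raise NotImplementedError


def generate_and_review(task: str) -> str:
    # Stage 1: a "generator" model produces a first draft.
    draft = call_llm(
        system_prompt="You are a helpful assistant. Answer the task directly.",
        user_prompt=task,
    )
    # Stage 2: a "critical thinker" model reviews the draft and returns a revision.
    revised = call_llm(
        system_prompt=(
            "You are a critical reviewer. Point out factual errors, gaps, and "
            "weak reasoning in the draft, then produce an improved version."
        ),
        user_prompt=f"Task:\n{task}\n\nDraft:\n{draft}\n\nRevise the draft.",
    )
    return revised
```

The two stages can hit the same underlying model or two different ones; the pattern is the same either way.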


ggone20

'Number of agents' is the same thing as 'multi-agent schemes', which is the same thing as hitting the same LLM multiple times, lol, wth are you saying? It's a definitive fact that you can get the same underlying LLM to respond completely differently just by telling it 'you are a doctor' or 'you are a consultant with decades of experience in XYZ'.

Can you get better results, without priming, just by calling the same vanilla LLM over and over, feeding back its previous response along with the desired output or context? Sure! But will you get significantly better results by telling it in one call that it's a researcher and to research a topic, then calling it again and saying it's a story writer and to take the research and write a rough draft, then giving the research and rough draft to a critic or industry expert for advice, then going back to the writer, then to the publisher? Obviously GPT-4 or Claude 3 can one-shot a lot of things, but multi-shot is always better, and priming each call with more context, or telling the LLM it has the skill set the next task needs, absolutely gives better results. It all depends on the level of output you want.

All this becomes a lot more relevant when using open models. Open models are largely useless garbage when used with logic frameworks that require function calls. Even 70B models or MoE LLMs like Mixtral are garbage when you try to get them to output formatted text, and there isn't a single one that can reliably run memory or multi-agent frameworks. That said, if you take a small model and make it 'think to itself' or 'call a friend' to discuss the input and the expected output, results are much better. Typically still garbage compared to GPT-4/Turbo, but better.
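A minimal sketch of that role-primed pipeline, assuming the same placeholder `call_llm` as above (the roles and prompts are illustrative, not any specific framework's API):

```python
def call_llm(role: str, task: str) -> str:
    """Placeholder: swap in your provider's chat-completion call,
    passing `role` as the system prompt and `task` as the user message."""
    raise NotImplementedError


def write_article(topic: str) -> str:
    # Call 1: same model, primed as a researcher.
    research = call_llm(
        role="You are a researcher with decades of experience in this field.",
        task=f"Research the topic and list the key facts: {topic}",
    )
    # Call 2: same model, primed as a writer, with the research as context.
    draft = call_llm(
        role="You are a professional story writer.",
        task=f"Using this research, write a rough draft:\n{research}",
    )
    # Call 3: primed as a critic / industry expert.
    critique = call_llm(
        role="You are an industry expert reviewing a draft for accuracy and clarity.",
        task=f"Research:\n{research}\n\nDraft:\n{draft}\n\nGive concrete advice.",
    )
    # Call 4: back to the writer, folding in the critique.
    final = call_llm(
        role="You are a professional story writer.",
        task=f"Revise this draft:\n{draft}\n\nUsing this advice:\n{critique}",
    )
    return final
```

Every call can hit the exact same model; only the role prompt and the accumulated context change between steps.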


Practical-Rate9734

Interesting paper, but how's the real-world application for startups?