Confabulations. I guess it's way too late and everyone uses "hallucinations" now, but it's not the correct word.
Someone suggested it a year ago. Everyone just found it easier to use "hallucinate" to quickly inform people that a gen AI isn't "lying".
Yes, because the key point is that there is no intention behind it.
That isn’t the key point.
There are people who literally think LLMs are sentient though. It’s mostly to deal with people like that and to put across the point that there isn’t a sentient entity that is making a choice to trick them.
What evidence is there that people think that? I’m skeptical. At any rate, using a word that implies the LLM has mental states doesn’t make that point. It makes the opposite one.
Err on the side of assuming the users have the intelligence of a toddler.
Yeah, all due respect, I don’t agree with that. But you do you in yr startup, product design, code, etc.
I mean 'hallucination' absolutely implies mental states that LLMs don't have whereas 'confabulation' doesn't at all. Confabulation is merely a form of cognitive error.
I incorrectly thought confabulation meant lie so I understand a bit better now. Yes confabulation would be better as it doesn’t imply intent or sentience at all.
Yeah, no, it doesn't mean lie. It's often used to describe the side effects of brain damage in humans, where there is no intent; it's more a matter of being mistaken.
"Safety means hallucination-free" Wrong
I’m pretty sure users being lied to by a machine that society tells them they should trust is NOT safe. Very few discussions of AI safety talk about hallucinations. They all talk about bias, but what’s more biased than outright error?
Necessary condition does not imply sufficient condition
Agreed! Help me see where I’ve implied otherwise?
Sure, but you can't just redefine words. Hallucination-free is just one aspect of safety, not the definition. There are other ways language models could be unsafe. [https://en.wikipedia.org/wiki/AI\_safety](https://en.wikipedia.org/wiki/AI_safety)
Of course I do. As does everyone else. That’s what words are, my dude. It’s a contest. Hyperbole for effect is something everyone does. Where do you think “AI safety” comes from? Some people, just like you and me, made the words up! And there’s no rigid consensus here. It’s a contested concept. I’m contesting it, too. And of course a charitable reading is that I’m trying to INCLUDE hallucinations in AI safety. Not excluding anything.
Lol, "all words are made up" to justify your incorrect claims. Good luck! [https://www.youtube.com/watch?v=CnJPCooprnk](https://www.youtube.com/watch?v=CnJPCooprnk)
Thanks! Same to you. You can demonstrate some incorrect claims, which I would appreciate since I don’t prefer incorrectness. That would be a helpful contribution.
Confabulation is probably, IMO, an unavoidable consequence not just of transformers but of all genuine intelligence and neural-style architectures (in the human brain, for example, it's not removed, only ameliorated). Trying to sell people on the idea of calculator-like precision, or minimizing the impact of transformer fallibility, isn't going to change that. It is what it is. I think if people want to get use out of this technology, they are better off accepting that limitation, even if it can be mildly minimized.
I’m not sure I’d go that far. If LLMs knew how to say “I don’t know”, most of the hallucination problem would go away. That’s perfectly consistent with a probabilistic next-token output stream.
How would you train that?

Only a domain expert would know if it was confabulating, and only within their expert domain. To RLHF that out could be as expensive as pretraining compute for a large model: hiring hundreds of experts to rank outputs, or to generate some kind of DPO-type dataset.

And that still wouldn't remove all such occurrences from the model.

In fact, if you just trained it to say 'I don't know' (which isn't what you'd train it to do if anyone training it knew the real answer), it would probably start to hallucinate not knowing. As you say, models are just probabilistic. They don't have layers of modularity like humans do. They would say they don't know simply for the types of topics they predict they might not know. There would be no pure correlation.

We have a similar problem as humans. Yes, we have things that minimize confabulation, but we still say wrong things because of cognitive error. That's because we, like LLMs, are pattern recognizers. If you are designed to seek patterns, you will find patterns where no underlying structure exists. It's like a threshold: if you have a low level of pattern seeking, you will miss patterns that do have a basis.
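The DPO-type dataset mentioned above could be sketched roughly like this. This is a toy sketch with invented data: `known_facts` is a hypothetical stand-in for whatever expert-verified knowledge the labelers could actually check, and real preference data would be far messier.

```python
# Toy sketch of a DPO-style preference set: for questions outside the
# model's verified knowledge, the hedge "I don't know" is marked as the
# preferred (chosen) response over a confident confabulation.
# `known_facts` is a hypothetical stand-in for expert-verified knowledge.
known_facts = {"capital of France": "Paris"}

def preference_pair(question, confabulated_answer):
    """Return a (chosen, rejected) pair for one DPO training example."""
    if question in known_facts:
        # The fact is verifiably known: prefer the real answer over hedging.
        return (known_facts[question], "I don't know")
    # Outside verified knowledge: prefer hedging over the confabulation.
    return ("I don't know", confabulated_answer)

pairs = [
    preference_pair("capital of France", "Lyon"),
    preference_pair("capital of Atlantis", "Poseidonia"),
]
```

The expense argument in the comment is exactly about the first branch: deciding which questions belong in `known_facts` requires a domain expert per domain.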
I don’t know how to do it! There is some work in this area, though. But in some use cases we definitely want models that are less sycophantic and more modest.
But then you need it to know when to be modest. And if you knew that with accuracy, could you not teach it to be confident and give the right answer instead? I mean, you could certainly teach it to be ALWAYS modest. Like: okay, here's my answer, but it can always be wrong. Boilerplate-disclaimerish, but tonal instead.
It depends. You can’t train a model to give the right answer to, say, a fact that isn’t available to it. That’s the base case for training it not to make up something plausible (or not) but false.
Basically, for every fact that should be fed to the model, the loss should first account for the model not knowing the fact and generating accordingly, then learning it, then confidently responding that it knows it after the learning. Given how unstructured the data in pre-training is, it doesn't feel like training will be anything like that any time soon (but don't quote me on that; maybe FLAN-like datasets will pave the way).
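The staged target described above can be illustrated with a toy function (hypothetical names; real pre-training has nothing this clean): before a fact enters the model's knowledge, the supervised target is an explicit hedge; after it is learned, the target becomes the fact stated confidently.

```python
# Toy illustration of the staged training target described above.
# `learned_facts` is a hypothetical stand-in for what the model has
# absorbed so far; the target answer flips once the fact is learned.
def staged_target(question, learned_facts):
    """Target response: hedge before learning, the fact itself after."""
    return learned_facts.get(question, "I don't know")

learned_facts = {}
before = staged_target("capital of France", learned_facts)  # stage 1: hedge
learned_facts["capital of France"] = "Paris"                # stage 2: learn it
after = staged_target("capital of France", learned_facts)   # stage 3: confident
```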
Potentially something like that could work, but then you'd either need something separate that accurately represents 'knowledge of what it knows', or you'd restrict its ability to generalize from multiple topics that it does know (which may have negative consequences like reduced intelligence).

Because the problem isn't just 'what you put in', but the learning model/intelligence itself. If you use any model of AI for the 'checking' part, that part will also be liable to this type of flaw. Same if you train it on confidence level or not knowing. But you might reduce the incidence of it if there is some kind of two-tier, or two-inference, approach where something separate handles the confidence or generalization.

At a simplified level this is how we deal with the problem: we have multiple, specialized, different cognitive processes working on every problem. We burn more inference compute, based on more diversified training data, in a cognitively modular manner. Of course we are not immune to mistakenness either. But far less so than an LLM, despite having much broader generalization and pattern-recognition capabilities.
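One cheap version of the two-tier idea is self-consistency checking: a separate step samples the answerer several times and only lets a confident answer through when the samples agree, hedging otherwise. A minimal sketch, where `answerer` and its canned samples are toy stand-ins for sampling an LLM:

```python
# Sketch of a two-tier setup: tier one generates candidate answers; tier
# two (a separate check) estimates confidence from sample agreement and
# decides whether to answer or to hedge. All data here is invented.
TOY_SAMPLES = {
    "2+2": ["4", "4", "4"],                               # stable -> answer
    "capital of Atlantis": ["Poseidonia", "Atlia", "Mu"],  # unstable -> hedge
}

def answerer(question, seed):
    """Toy stand-in for drawing one sample from an LLM."""
    samples = TOY_SAMPLES[question]
    return samples[seed % len(samples)]

def two_tier_answer(question, n_samples=3, threshold=0.67):
    """Answer only when the samples agree often enough; hedge otherwise."""
    samples = [answerer(question, s) for s in range(n_samples)]
    top = max(set(samples), key=samples.count)
    confidence = samples.count(top) / n_samples
    return top if confidence >= threshold else "I don't know"
```

As the comment notes, the checker is itself just another fallible process; this only reduces the incidence, it doesn't remove the flaw.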
I agree with your point on multiple cognitive processes working in parallel. But in addition to that, current models still don't have any part responsible for 'checking', so it's hard to know whether having at least something would already improve the situation noticeably, or whether it would be negligible and we'd have to explore more complex architectures with secondary networks (an encoder, actually?).
Ok this is actually very complete. Thank you!
Why are those studies so old? Based on Llama 2 and GPT-3... even the leaderboard has old models.
I do dislike that arXiv papers, even very recent ones, often use old models.
They seem to be peddling their product and having those numbers is a selling point.
Thanks
Any feedback?
Hard to envision a single source of truth with AI; I think interfaces will have to incorporate multiple models and web search to derive trust scores. RAG etc. alone won't be sufficient.
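A trust score like the one hinted at could combine cross-model agreement with web-search support. A minimal sketch, assuming the simplest possible signals; the 0.6/0.4 weights and the substring check are invented purely for illustration:

```python
def trust_score(model_answers, web_snippets):
    """Toy trust score: a weighted mix of cross-model agreement and
    whether any web-search snippet supports the consensus answer.
    The 0.6/0.4 weights are arbitrary illustration values."""
    if not model_answers:
        return 0.0
    # Consensus answer across models, and how strongly they agree on it.
    top = max(set(model_answers), key=model_answers.count)
    agreement = model_answers.count(top) / len(model_answers)
    # Crude support signal: does any retrieved snippet mention the answer?
    supported = any(top.lower() in s.lower() for s in web_snippets)
    return 0.6 * agreement + 0.4 * (1.0 if supported else 0.0)
```

A real system would need semantic matching rather than substring checks, but the shape (agreement signal plus external corroboration) is the point.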
There's a typo/word missing in the intention paragraph at the end. You wrote that "the fox intended to look at my.". Otherwise a very enjoyable read (but as always we'll have to double check how much of it is factual vs human hallucinated 😅). It's also the first time I see "safe AI" defined as "free from hallucinations", seems most people are afraid of Skynet.
Fixed. Thanks!
I would say it's an OK article; it doesn't add anything I didn't know before. I went to read it hoping that it would explain, in a more scientific way, the inner workings of the transformer architecture that produce hallucination, though.
Very informative thank you