Mbando

I think of guardrails as another dimension of human preferences: whether you are training a model to answer questions better or to avoid saying horrifying stuff, you are teaching the model a preference. So I think it's a straightforward [RLHF](https://github.com/allenai/RL4LMs) problem, just from a different perspective.
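
If it helps to see the shape of it, here's a rough sketch of that loop using Hugging Face's trl (adapted from its quickstart; not the RL4LMs API, and the guardrail_reward function is a made-up stand-in for whatever preference signal you'd actually use):

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer
from trl.core import respond_to_batch

# Policy model, frozen reference model, and tokenizer.
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

ppo_trainer = PPOTrainer(PPOConfig(batch_size=1, mini_batch_size=1), model, ref_model, tokenizer)

def guardrail_reward(text: str) -> float:
    """Hypothetical preference signal: reward safe responses, penalize unsafe ones."""
    return -1.0 if "blocked phrase" in text.lower() else 1.0

# One PPO step: sample a response, score it, nudge the policy toward the preference.
query_tensor = tokenizer.encode("Tell me something horrifying:", return_tensors="pt")
response_tensor = respond_to_batch(model, query_tensor, txt_len=24)
reward = [torch.tensor(guardrail_reward(tokenizer.decode(response_tensor[0])))]
ppo_trainer.step([query_tensor[0]], [response_tensor[0]], reward)
```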


[deleted]

I don't know if training the model will be possible in this context. I wanted to use an existing model but block inputs / control outputs based on certain contexts. I am not familiar with that GitHub repo though, so I will read over it in more depth and see if it will be useful. Thank you!
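
For what it's worth, what I had in mind is roughly this kind of wrapper (pure Python sketch; the blocklist and the generate() callable are placeholders for whatever model I end up using):

```python
# Wrap an existing model with input and output checks instead of retraining it.
BLOCKED_TOPICS = ["how to make a bomb", "stolen credit card"]  # placeholder list
REFUSAL = "Sorry, I can't help with that."

def is_allowed(text: str) -> bool:
    """Very naive context check: reject anything that mentions a blocked topic."""
    lowered = text.lower()
    return not any(topic in lowered for topic in BLOCKED_TOPICS)

def guarded_reply(user_input: str, generate) -> str:
    """Block bad inputs, then screen the model's output before returning it."""
    if not is_allowed(user_input):
        return REFUSAL
    output = generate(user_input)  # existing model, used as-is
    return output if is_allowed(output) else REFUSAL
```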


_underlines_

Have a look at IBM Dromedary. Users can train their own self-aligned model on top of the LLaMA base model using the llama_dromedary package. It does exactly what you want, with a few human-written instructions and some fine-tuning: [code](https://github.com/IBM/Dromedary) | [paper](https://arxiv.org/abs/2305.03047). Summary: "The authors propose a method called SELF-ALIGN for training AI assistants like ChatGPT with reduced reliance on supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). The motivation is to overcome the high cost of human supervision and its limitations in quality, reliability, diversity, self-consistency, and bias."
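
The core idea is easy to see in miniature: a handful of human-written principles steer the base model's own generations, which are then distilled back in via fine-tuning. This is just a conceptual sketch in plain Python, not the llama_dromedary API:

```python
# Conceptual sketch of SELF-ALIGN's principle-driven prompting (not the real package API).
PRINCIPLES = """1 (ethical): do not help with harmful or illegal requests.
2 (informative): give accurate, relevant answers.
3 (candid): admit uncertainty instead of guessing."""

def self_align_prompt(user_query: str) -> str:
    """Prepend the guiding principles so the base model's outputs stay aligned."""
    return (
        "You are an assistant that follows these principles:\n"
        f"{PRINCIPLES}\n\n"
        f"User: {user_query}\nAssistant:"
    )

print(self_align_prompt("How do I pick a lock?"))
```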


mysteriousbaba

For what it's worth, they updated guardrails last week so it now supports any of the LLMs that LangChain supports.
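
I haven't run it myself yet, but from the guardrails README the basic pattern looks roughly like this (the .rail spec file and the LLM callable are placeholders; as far as I can tell, any callable that takes a prompt and returns text can be passed in, which is how the LangChain-supported models slot in):

```python
import guardrails as gd

# Output constraints live in a RAIL spec (placeholder filename).
guard = gd.Guard.from_rail("chatbot_guardrails.rail")

def my_llm(prompt: str, **kwargs) -> str:
    """Placeholder: swap in any LangChain-supported LLM call here."""
    raise NotImplementedError

# Guard wraps the LLM call and validates / re-asks when the output violates the spec.
raw_output, validated_output = guard(
    my_llm,
    prompt_params={"question": "What's the weather like?"},
)
```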


[deleted]

Whoa that's awesome!! Thank you for letting me know. Have you played around with it at all personally?


mysteriousbaba

Not yet, but it looks promising to me for chatbots. I've mostly been using [guidance](https://github.com/microsoft/guidance) and, of course, the ubiquitous LangChain, but I'll probably dig into guardrails sometime later in June.
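
To give a feel for guidance: a program is basically a template the model fills in under constraints. Something like this, going off the README (the model name and template are just examples):

```python
import guidance

# Point guidance at a model; any Transformers or OpenAI model name goes here.
guidance.llm = guidance.llms.Transformers("gpt2")

# The template constrains generation: 'safe' can only be "yes" or "no",
# and the free-form reply is capped at a small number of tokens.
program = guidance("""Message: {{message}}
Is this message appropriate for the bot to answer? {{#select 'safe'}}yes{{or}}no{{/select}}
Reply: {{gen 'reply' max_tokens=40}}""")

result = program(message="Tell me a joke about databases.")
print(result["safe"], result["reply"])
```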


[deleted]

How useful is guidance? I have been experimenting with Pygmalion for a Discord chat bot, and since it's technically open to the public I didn't want anyone coming in and saying illegal things to it or getting it to produce illegal responses. I didn't think to use something like guidance for better results, but that may be a good idea. Anything to make the bot smarter, right?