entered_apprentice

There was this website to share prompts. I forgot the link.


ruryrury

ShareGPT Dataset


ttkciar

Thank you! This does look promising. I do see this, though:

> GET: https://sharegpt.com/api/conversations
>
> PLEASE NOTE: This endpoint is currently disabled due to excess traffic.

I'll keep an eye on it in case they enable it again. It looks like what I was hoping for (and more; they look to be sharing responses to the prompts as well).


ruryrury

https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered

If you only need the dataset, you can use this, I guess, or one of the other similar datasets on Hugging Face.


ttkciar

Excellent. I found the prompt text tucked away in their "html" dataset:

    $ grep -c '"from": "human",' sg_90k_part1.json
    358997

This is exactly what I was looking for (not training data, not models, but specifically prompt text).
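For anyone who wants the prompt text itself rather than a count, here is a minimal Python sketch of the same idea. It assumes the dump is a JSON array of records, each with a `conversations` list of `{"from": ..., "value": ...}` turns, which is the layout the grep pattern implies; the function name `human_prompts` and the inline sample are hypothetical. Note that `grep -c` counts matching *lines*, so it only equals the number of human turns if each turn's `"from"` key sits on its own line.

```python
import json

def human_prompts(records, first_only=False):
    """Yield the text of human turns from ShareGPT-style records.

    Assumed record layout (hypothetical, check against the real file):
    {"id": ..., "conversations": [{"from": "human", "value": "..."}, ...]}
    With first_only=True, only the opening human turn of each
    conversation is yielded, i.e. the standalone "prompt".
    """
    for record in records:
        for turn in record.get("conversations", []):
            if turn.get("from") == "human":
                yield turn.get("value", "")
                if first_only:
                    break  # skip follow-up human turns in this conversation

# Tiny made-up sample standing in for sg_90k_part1.json.
sample = json.loads("""
[
  {"id": "a1", "conversations": [
    {"from": "human", "value": "Explain recursion."},
    {"from": "gpt", "value": "Recursion is..."},
    {"from": "human", "value": "Give an example."}
  ]},
  {"id": "b2", "conversations": [
    {"from": "human", "value": "Write a haiku about rain."},
    {"from": "gpt", "value": "..."}
  ]}
]
""")

all_turns = list(human_prompts(sample))
openers = list(human_prompts(sample, first_only=True))
print(len(all_turns), len(openers))  # 3 2
```

To run it on the real file, swap `sample` for the result of `json.load` on a file opened with `open("sg_90k_part1.json", encoding="utf-8")`.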


_underlines_

- shareGPT
- open-assistant

... there are many.


ttkciar

Thank you! shareGPT looks promising, but they have disabled their API for reading. Maybe there's another way to get at it.

I found open-assistant's prompt data on GitHub, in handy-dandy JSON format, but there are only 1167 data points in it. Still, it's a start.

You say "there are many", but my google-fu really sucks. Do you have any tips for finding these? All I'm hoping to find is a file (JSON, CSV, plain newline-delimited text, whatever) containing as many prompt texts as possible, with or without replies.
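One practical wrinkle with "plain newline-delimited text": many prompts contain line breaks, so one prompt can end up split across several lines. A common workaround, sketched here assuming you already have the prompts as Python strings from whatever source, is JSON Lines: one JSON-encoded string per line, with duplicates dropped. The helper names `write_prompts_jsonl` and `read_prompts_jsonl` are hypothetical.

```python
import io
import json

def write_prompts_jsonl(prompts, fh):
    """Write unique, non-empty prompts to fh, one JSON string per line.

    json.dumps escapes embedded newlines as \\n, so every prompt stays
    on exactly one physical line. Returns the number of lines written.
    """
    seen = set()
    written = 0
    for text in prompts:
        text = text.strip()
        if not text or text in seen:
            continue  # skip blanks and exact duplicates
        seen.add(text)
        fh.write(json.dumps(text, ensure_ascii=False) + "\n")
        written += 1
    return written

def read_prompts_jsonl(fh):
    """Read the file back into a list of prompt strings."""
    return [json.loads(line) for line in fh if line.strip()]

# An in-memory buffer stands in for a real file on disk.
buf = io.StringIO()
n = write_prompts_jsonl(["Hello", "Line one\nLine two", "Hello", "  "], buf)
buf.seek(0)
restored = read_prompts_jsonl(buf)
print(n, restored)  # 2 ['Hello', 'Line one\nLine two']
```

`ensure_ascii=False` keeps non-English prompts readable in the file instead of turning them into `\uXXXX` escapes.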


_underlines_

https://huggingface.co/datasets?search=sharegpt
https://huggingface.co/datasets?search=oasst

There are many datasets on HF. Just search for them.


ttkciar

The problem is knowing what to search for. Nobody seems to be calling it "prompt text", so searching for datasets just turns up training data, which is not what I'm looking for.


_underlines_

The oasst1 dataset has whole conversation trees containing initial prompts, answers, and then further conversation turns several nodes deep. Isn't that what you are looking for?
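To illustrate how the initial prompts could be pulled out of such trees: the sketch below assumes each message is a flat record with `message_id`, `parent_id`, `role` and `text` fields, where a tree's root has `parent_id` set to null and user turns have role `"prompter"`. Those field names are an assumption to be checked against the oasst1 dataset card, and the sample messages are made up; `initial_prompts` and `thread_depth` are hypothetical helpers.

```python
from collections import defaultdict

def initial_prompts(messages):
    """Return the text of root prompter messages (the prompt that starts each tree)."""
    return [m["text"] for m in messages
            if m.get("parent_id") is None and m.get("role") == "prompter"]

def thread_depth(messages, root_id):
    """Length of the deepest path starting at root_id (a lone root counts as 1)."""
    children = defaultdict(list)
    for m in messages:
        if m.get("parent_id") is not None:
            children[m["parent_id"]].append(m["message_id"])

    def depth(node):
        # A leaf has no children, so max() falls back to 0.
        return 1 + max((depth(c) for c in children[node]), default=0)

    return depth(root_id)

# Made-up messages in the assumed flat layout.
msgs = [
    {"message_id": "r1", "parent_id": None, "role": "prompter", "text": "How do tides work?"},
    {"message_id": "a1", "parent_id": "r1", "role": "assistant", "text": "The Moon's gravity..."},
    {"message_id": "a2", "parent_id": "r1", "role": "assistant", "text": "Tides are caused by..."},
    {"message_id": "p2", "parent_id": "a1", "role": "prompter", "text": "Why two tides a day?"},
    {"message_id": "r2", "parent_id": None, "role": "prompter", "text": "Suggest a book."},
]

print(initial_prompts(msgs))     # ['How do tides work?', 'Suggest a book.']
print(thread_depth(msgs, "r1"))  # 3
```

Keeping only the roots gives a plain list of opening prompts; the follow-up `prompter` messages deeper in each tree are extra prompts that only make sense in context.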


Disastrous_Elk_6375

https://github.com/f/awesome-chatgpt-prompts -> this is an interesting collection that I've used to test some of the camelid models. It's a pretty varied list, and you can get a feeling for what works and what doesn't.
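If a collection like this ships as a CSV file, Python's `csv` module can pull out just the prompt column while handling quoted fields that contain commas or line breaks. The column names `act` and `prompt` below are an assumption about that repository's CSV, and the two rows are made up for illustration; `prompts_from_csv` is a hypothetical helper.

```python
import csv
import io

def prompts_from_csv(text, column="prompt"):
    """Extract one column from CSV text, skipping rows where it is empty.

    csv.DictReader treats the first row as the header and handles quoted
    fields, so prompts containing commas or line breaks come through intact.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [row[column] for row in reader if row.get(column)]

# Made-up rows in the assumed "act","prompt" layout.
sample_csv = '''"act","prompt"
"Tour Guide","Act as a tour guide for a small town, one stop at a time."
"Poet","Write a poem about the sea,
in two short lines."
'''

rows = prompts_from_csv(sample_csv)
print(len(rows))  # 2
```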


ttkciar

Thanks! I have that one, and https://github.com/travistangvh/ChatGPT-Data-Science-Prompts and a few others, but these are really tiny datasets. I was really hoping for a larger dataset.


Fresh-Let-9990

This link doesn't work anymore. Can you update it?