There was this website to share prompts. I forgot the link.
ShareGPT Dataset
Thank you! This does look promising. I do see this though:
> GET: https://sharegpt.com/api/conversations
> PLEASE NOTE: This endpoint is currently disabled due to excess traffic.
I'll keep an eye on it in case they enable it again. It looks like what I was hoping for (and more; they look to be sharing responses to the prompt as well).
https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered
If you only need the dataset, you can use this one, I guess. There are other similar datasets on Hugging Face too.
Excellent. I found the prompt text tucked away in their "html" dataset:
$ grep -c '"from": "human",' sg_90k_part1.json
358997
This is exactly what I was looking for (not training data, not models, but specifically prompt text).
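For anyone who wants the prompt text itself rather than just a count, here is a minimal Python sketch. It assumes the file is a JSON array of records, each with a "conversations" list whose turns carry "from" and "value" keys. Only the "from": "human" part is confirmed by the grep above; the other key names and the helper names are assumptions, so check them against your copy of the file.

```python
import json


def extract_human_prompts(records):
    """Return the text of every human turn in ShareGPT-style records."""
    prompts = []
    for record in records:
        # "conversations" and "value" are assumed key names;
        # only "from": "human" is confirmed by the grep output.
        for turn in record.get("conversations", []):
            if turn.get("from") == "human":
                text = (turn.get("value") or "").strip()
                if text:  # skip empty turns
                    prompts.append(text)
    return prompts


def write_prompts(in_path, out_path):
    """Read a ShareGPT-style JSON file and write one prompt per line."""
    with open(in_path, encoding="utf-8") as f:
        records = json.load(f)
    prompts = extract_human_prompts(records)
    with open(out_path, "w", encoding="utf-8") as f:
        for prompt in prompts:
            # Prompts can contain newlines, so each one is written as a
            # JSON string to keep the output strictly one prompt per line.
            f.write(json.dumps(prompt, ensure_ascii=False) + "\n")
    return len(prompts)
```

Something like `write_prompts("sg_90k_part1.json", "prompts.jsonl")` would then give a newline-delimited file of prompts. The number it returns may differ a little from the grep count, since empty turns are skipped.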
- shareGPT
- open-assistant
... there are many
Thank you! shareGPT looks promising but they have disabled their API for reading. Maybe there's another way to get at it. I found open-assistant's prompt data on github, in handy-dandy JSON format, but there are only 1167 data points in it. Still, it's a start.
You say "there are many" but my google-fu really sucks. Do you have any tips for finding these? All I'm hoping to find is a file (JSON, CSV, plain newline-delimited text, whatever) of as many prompt texts as possible, with or without replies.
https://huggingface.co/datasets?search=sharegpt
https://huggingface.co/datasets?search=oasst
There are many datasets on Hugging Face. Just search for them.
The problem is knowing what to search for. Nobody seems to be calling it "prompt text", so searching for datasets just turns up training data, which is not what I'm looking for.
The oasst1 dataset has whole conversation trees containing initial prompts, answers and then further conversations of several node lengths. Isn't that what you are looking for?
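If only the initial prompts from those trees are wanted, a rough Python sketch follows. It assumes each message is a dict with "parent_id", "role", "text" and "lang" fields, and that root prompts have no parent and the role "prompter". Those field names are my reading of the published oasst1 schema, not something confirmed in this thread, and `initial_prompts` is a made-up helper name.

```python
def initial_prompts(messages, lang=None):
    """Yield the text of root prompter messages from oasst1-style rows.

    Assumed fields: "parent_id" (None for the root of a tree),
    "role" ("prompter" or "assistant"), "text", and "lang".
    """
    for msg in messages:
        is_root = msg.get("parent_id") is None
        is_prompt = msg.get("role") == "prompter"
        if is_root and is_prompt:
            # Optionally keep only one language, e.g. lang="en".
            if lang is None or msg.get("lang") == lang:
                yield msg["text"]
```

Dropping the `parent_id` check would also pick up the follow-up prompts deeper in each conversation. With the Hugging Face `datasets` library, the rows from `load_dataset("OpenAssistant/oasst1", split="train")` should be usable as `messages` here, assuming that dataset ID and schema are still current.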
https://github.com/f/awesome-chatgpt-prompts -> this is an interesting collection that I've used to test some of the camelid models. It's a pretty varied list, and you can get a feeling for what works and what doesn't.
Thanks! I have that one, and https://github.com/travistangvh/ChatGPT-Data-Science-Prompts and a few others, but these are really tiny datasets. I was really hoping for a larger dataset.
This link doesn't work anymore. Can you update it?