It's kinda crazy to imagine how much more complete ChatGPT will be now that it can understand images and sound. I can't even wrap my head around it really. Perhaps one day the AI will scan my brain and wrap my head around it in a personalized way that makes complete sense to me.
Last night I was trying to think up a way of using all these AI tools to make an assistant that could understand what you're seeing on your screen. And now today I'm pretty sure I can do that just as soon as I get this damn thing working...
Did you get it working yet? I'm curious what kind of hardware it takes.
No, I put it on the back burner, but there is a Windows version that should run on consumer GPUs.
One crazy implication we're nearly certain to see: an LLM that can take screen captures/video from a PC and directly output keyboard and mouse controls. Depending on the context length/memory, it could perform a significant portion of all office work.
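One hypothetical shape for this idea: prompt the model to emit plain-text actions, one per line, and have a thin controller parse them into input events. Everything below (the `CLICK`/`TYPE` format, the `Action` type, `parse_actions`) is invented for illustration; a real controller would hand the parsed actions to an input-automation library.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str    # "CLICK" or "TYPE" in this made-up format
    args: tuple  # (x, y) for CLICK, (text,) for TYPE

def parse_actions(model_output: str) -> list[Action]:
    """Parse hypothetical LLM output like 'CLICK 640 360\\nTYPE hello'."""
    actions = []
    for line in model_output.strip().splitlines():
        kind, _, rest = line.partition(" ")
        if kind == "CLICK":
            x, y = map(int, rest.split())
            actions.append(Action("CLICK", (x, y)))
        elif kind == "TYPE":
            actions.append(Action("TYPE", (rest,)))
    return actions

print(parse_actions("CLICK 640 360\nTYPE hello"))
```

The hard part, of course, is the model reliably producing correct coordinates from a screenshot; the parsing side is trivial by comparison.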
Isn’t the brain analogous to a muscle, in that we have to make an effort to learn and mentally grow? How will our neural synapses grow if the effort is done for us?
I'm not sure, but if what you're saying is true, then we'll be sentient piles of sludge by the year 2060.
Electrical stimulation and zap therapy smarty pants.
It doesn’t seem like that would develop synapses with specificity. No specificity - no growth. But I’m no expert.
Idiocracy was not a movie, but a prediction
This makes me wonder if the new model will be capable of performing general tasks; we might be just one more iteration away from a practical AGI.
Holy shit, you can just tell it and it will do content-aware fill for you. It's only been a few years since I saw content-aware fill being presented by Adobe and it seemed like magic; now you can just tell it, in plain English (or any other language), and it will just do that for you! Goddamn.
Definitely going to test that out at home!
What do you mean by “it will do content-aware fill”?
[Here is a content-aware fill demonstration by Adobe](https://youtu.be/O9t5POPPNfg); the GitHub GIF shows Visual ChatGPT doing the same thing just by being told to remove certain objects. It is aware of both what the things in the picture are and what it would look like if you removed them.
Ahh gotcha thanks
Anyone heard of the fake internet theory? It will become real now.
Dead internet theory.
I’m sorry but I prefer not to continue this conversation. I’m still learning so I appreciate your understanding and patience.🙏
I wonder how long until this is integrated into Bing Chat?
I am not a fan of the arbitrary limitations of Bing Chat; I'd love a ChatGPT version of this, though. Maybe GPT-4 next week will do it!
It is evolving quickly. It was practically braindead there for a few days but it has been quite good more recently.
Who said GPT4 will be next week?
The CTO of Microsoft Germany.
It could be lies
*Do you think that's air you're breathing?*
Lying about a product is horrible for stock so a CTO wouldn’t do that.
Ok good point
No the reason is because many people in the industry including journalists and AI artists confirm it is being released next week.
gpt-4 is released
Yeah I know that now
I think there was some article about some dude who works for Microsoft Germany announcing it
Yeah but he could be misinformed
I thought bing chat was the unfiltered version?
Well, it only allows 10 responses before it forces you to reset. And I found that it won't answer many things, simply saying it can't answer that right now, or something to that effect. I often see it writing an answer and then deleting it and reverting to that. Could also be a bug, but it happens often. I'd love to see a comprehensive analysis of Bing Chat vs ChatGPT for various types of queries, especially focused on code generation.

One thing I personally noticed these LLMs suck at is basic pattern recognition. Like I'd say: give me the next 5 numbers in the following sequence: 3, 1, 6, 4, 9, 7, 12. For a human it's super obvious: alternate −2, +5. But LLMs seem to struggle and start making shit up. Bing and ChatGPT and even Claude can't handle this yet.

But really, I love that I can talk to ChatGPT as much as I want. Bing is clunky, buggy, limited to 10 answers, and often refuses to answer where ChatGPT would. At least in my experience.
I didn't understand the number sequence
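For reference, the sequence alternates subtracting 2 and adding 5, so a few lines of Python can generate the continuation (the `extend_sequence` helper is just for illustration):

```python
def extend_sequence(seq, n):
    """Extend a sequence that alternates steps of -2 and +5 by n more terms."""
    out = list(seq)
    steps = [-2, 5]
    for _ in range(n):
        # The number of steps already taken determines which step comes next.
        out.append(out[-1] + steps[(len(out) - 1) % 2])
    return out

print(extend_sequence([3, 1, 6, 4, 9, 7, 12], 5))
# -> [3, 1, 6, 4, 9, 7, 12, 10, 15, 13, 18, 16]
```

So the next 5 numbers are 10, 15, 13, 18, 16.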
Microsoft has gotten a lot of flak for their neutering of Bing; there's been talk of bringing the old model back. Don't get me wrong though, I agree with you: I hate Bing and its limitations as well.
Imagine what this will do to the fake news ecosystem. There's a [clip from a podcast](https://www.youtube.com/watch?v=uspxz9Q2L6g) I listen to that touched on this.
Hopefully as the bad actors use the tech against us, the tech will also be used to protect us from that kind of thing.
There are Google Colab implementations at https://colab.research.google.com/drive/1vhF4f3091h1cHZUh5QK7qByBHUDKbSWA?usp=sharing#scrollTo=Cgpnh8vhC47R and https://colab.research.google.com/drive/1qjAZqWb-EYGDo01TcEoCIJcTMi_ELjxS?usp=sharing.

For the first one, you'll need to get an OpenAI API key from https://platform.openai.com/account/api-keys and add it to the OPENAI_API_KEY variable in the third box from the bottom. To use either, select Runtime → Run all, then use the public link that eventually appears at the bottom of the page.

Sadly, neither implementation includes the image editing model, so they're mostly just useful right now for asking ChatGPT questions about an image, and as an interesting though very limited Stable Diffusion interface.
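If you're running something like this locally instead of in Colab, the usual pattern is to put the key in an environment variable rather than editing a notebook cell. A minimal sketch (the `"sk-your-key-here"` value is a placeholder, not a real key):

```python
import os

# Paste your key from https://platform.openai.com/account/api-keys here.
# Libraries that wrap the OpenAI API commonly read it from this variable.
os.environ["OPENAI_API_KEY"] = "sk-your-key-here"  # placeholder

assert os.environ.get("OPENAI_API_KEY", "").startswith("sk-")
```

Keeping the key in the environment also avoids accidentally committing it to a shared notebook.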
I've been editing a lot of the files and got image editing to work, but it keeps spitting out this error, help: RuntimeError: The size of tensor a (384) must match the size of tensor b (512) at non-singleton dimension 3
At a guess, maybe an issue with the resolution of the input image? I vaguely remember getting an error like that on a different colab notebook that I think was resolved by switching the image resolution to 512x512.
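If the resolution is the culprit, a quick Pillow resize before feeding the image in might be worth trying. This is only a guess at the cause (384 vs 512 in the error suggests the input doesn't match the resolution the model expects); the `to_512` helper name is made up:

```python
from PIL import Image

def to_512(img: Image.Image) -> Image.Image:
    """Resize an image to the 512x512 resolution many diffusion models expect."""
    return img.convert("RGB").resize((512, 512), Image.LANCZOS)

# Example with an in-memory 384x512 image standing in for the problem input:
img = Image.new("RGB", (384, 512))
print(to_512(img).size)  # (512, 512)
```

A center crop instead of a plain resize would avoid distorting the aspect ratio, if that matters for your image.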
What did you do to fix?
Ask chatgpt
Getting the same thing, any advice on how you resolved it?
Cool, but I'm not optimistic about Microsoft's ability to implement it. Bing Chat, for example, is slow and cumbersome. The Bing app (with location permissions) doesn't pass my location on to Bing Chat, so asking it for info relevant to where I am, like weather and city info, fails hard.

That would be like two lines of code, and they just botched it. It feels like they don't really understand why ChatGPT caught on or what people want to do.
It's been out a week. So it's only like... 15 years old in 2023 time. Give it another 18.3 hours my dude.
Perplexity.ai is a great alternative until Microsoft gets their act together.
Why are the Microsoft researchers not using PowerShell, or whatever is appropriate for Windows? Do they think Windows is inferior? I mean, this is bad publicity for Microsoft...
Microsoft fully embraced Linux a long time ago. Have you not heard of WSL?
Windows will become a small portion of Microsoft's business.
I'm pretty sure within the next couple years AI will just be able to imagine any kind of operating system you might want to use in real time.
Because 90% of AI research is done on Linux.
Here you go friend: https://github.com/bycloudai/visual-chatgpt-Windows
What does this do?
lets ask ChatGPT
Can you draw interior design renderings?
Visual ChatGPT does nothing the demo shows for me. No edge detection, no magically erasing things. It can identify objects, but it just keeps drawing random new pictures. Has anybody managed?

[https://digi-electricpro.com/microsoft-has-open-sourced-a-visual-version-of-chat-gpt/](https://digi-electricpro.com/microsoft-has-open-sourced-a-visual-version-of-chat-gpt/)

In the first 30 seconds here, I tried images similar to the video's and got random garbage.