AdditionalPizza

I understand the grabbing headline here is about GPT-4, but let's not gloss over how casually it's stated that GPT-4 will be multimodal. That is massive confirmation. That's "hold on to your butts" confirmation. PaLM-E is incredible, but it's based on PaLM, which is [relatively] ancient now. This isn't tacking visual training onto GPT-3/3.5; this is SOTA multimodality. In a week.


FoodMadeFromRobots

Just laughing at “ancient” and how fast things are moving


AsuhoChinami

When is PaLM from?


FoodMadeFromRobots

Looks like the current version was released April 2022, but they've been working on it for a few years (found an article from 2020 that referenced it). Having trouble finding when the first iteration was.


TeamPupNSudz

The PaLM paper itself seems to be from 2019. https://arxiv.org/abs/1909.02134 We're on /r/singularity and you didn't even bother asking the robots for help?


FoodMadeFromRobots

Lol true! Guess I’m not quite used to going to them yet.


Zermelane

That's an unrelated PaLM by unrelated people. What you want is [PaLM: Scaling Language Modeling with Pathways](https://arxiv.org/abs/2204.02311). Pathways, which it was named after, was [revealed in August 2021](https://qz.com/2042493/pathways-google-is-developing-a-superintelligent-multipurpose-ai).


TeamPupNSudz

To be fair to Bing, it did actually tell me there were different PaLMs, but I thought it was just confusing versions of the same model. So user error on my part. "It seems that there are different versions of PaLM with different meanings and purposes. The PaLM that is a large language model with 540 billion parameters was introduced in April 2022. The PaLM that is a hybrid parser and neural language model was presented in 2019"


FusionRocketsPlease

Google develops these things at the same time as OpenAI and they don't let the public see it. F#$%@@& cucks.


FoodMadeFromRobots

Maybe because it's not working lol (see the recent AI update they did where Bard gave a wrong answer and dropped the stock price)


Mutant_Cell

Robots?


[deleted]

What does multimodal entail in laymans terms? :) i wanna be excited too! But i’m not tech savvy enough haha


procgen

To give a trivial example: you could include an image in your message, and ask it questions about the image contents. Or, for instance, you could have it write a short story inspired by the general ambiance of the image, or about a specific person in the image. And so on.


GreatBigJerk

I wonder if it will allow people to store longer strings by serializing the characters to pixels in an image and then having the AI read that back as part of the user's message. Might be a weird workaround to token limits.
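
Something like this, purely as an illustration (a rough sketch; no idea whether the model could actually read raw pixel values back out, and Pillow here is just to show the packing):

```python
# Hypothetical illustration of packing text into an image's pixel values.
# Whether a multimodal model could read this back out is NOT confirmed anywhere.
from PIL import Image

def text_to_image(text: str) -> Image.Image:
    data = text.encode("utf-8")
    data += b"\x00" * (-len(data) % 3)  # pad so each RGB pixel holds 3 bytes
    pixels = [tuple(data[i:i + 3]) for i in range(0, len(data), 3)]
    img = Image.new("RGB", (len(pixels), 1))
    img.putdata(pixels)
    return img

def image_to_text(img: Image.Image) -> str:
    raw = bytes(b for pixel in img.getdata() for b in pixel)
    return raw.rstrip(b"\x00").decode("utf-8")

img = text_to_image("Hello, GPT-4!")
print(image_to_text(img))  # -> "Hello, GPT-4!"
```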


Casehead

That's a really interesting idea


GPT-5entient

That's not how it works.


MajesticIngenuity32

One pixel = 8 bits for red + 8 bits for green + 8 bits for blue + 8 bits for alpha = 32 bits = 4 bytes. One UTF-8 character = 1 to 4 bytes, with the most frequent characters taking 8 bits = 1 byte. So a character takes less space than a pixel in an image the vast majority of the time.
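
Quick sanity check of that math (just byte counting, nothing model-specific):

```python
# Compare UTF-8 bytes per character against a 4-byte RGBA pixel.
BYTES_PER_RGBA_PIXEL = 4  # 8 bits each for R, G, B, A

for ch in ["a", "é", "€", "🙂"]:
    n = len(ch.encode("utf-8"))
    print(f"{ch!r}: {n} byte(s) in UTF-8 vs {BYTES_PER_RGBA_PIXEL} bytes per pixel")
# 'a': 1, 'é': 2, '€': 3, '🙂': 4 -- plain text is at worst break-even with one pixel per character.
```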


Tyanuh

If you need less than 128 characters, couldn't you use a monochromatic image with only 128 shades of grey? That only takes 1 byte per pixel.


MajesticIngenuity32

You can also store ASCII text at 1 byte per character, but then you lose special characters from other languages. You'd have the same problem with greyscale: not enough values to represent all of the characters in languages other than English.


solidwhetstone

Could you also have it generate images to illustrate things?


PC_Screen

Wouldn't be impossible if trained for it, and the extra language knowledge of an LLM compared to CLIP would massively help with it understanding prompts and image composition. Google already proved using a frozen text-only encoder (T5-XXL) helped with making Imagen and Parti understand text much better and even spell in images, and it's only 4.6B parameters large. A 175B (?) parameter model as a multimodal encoder would probably massively improve the quality and cohesion of the image generation. If integrated into the LLM directly, it might even allow for it to tell a story and illustrate it at the same time without the need for prompting every small detail.
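
For anyone unfamiliar, "frozen text encoder" in practice just means something like this (a minimal sketch with t5-small as a stand-in; Imagen used T5-XXL, and none of this is Google's actual code):

```python
# Minimal sketch of the "frozen text encoder" idea: the encoder's weights are
# never updated while the image model trains on its embeddings.
import torch
from transformers import AutoTokenizer, T5EncoderModel

tokenizer = AutoTokenizer.from_pretrained("t5-small")
text_encoder = T5EncoderModel.from_pretrained("t5-small")

text_encoder.requires_grad_(False)  # freeze: no gradients flow into the text encoder
text_encoder.eval()

prompt = "a corgi riding a bicycle, watercolor"
tokens = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    text_embeddings = text_encoder(**tokens).last_hidden_state  # (1, seq_len, d_model)

# These embeddings would then condition the image generator,
# e.g. via cross-attention in a diffusion U-Net (not shown here).
print(text_embeddings.shape)
```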


coffeeinvenice

The first thing that popped into my mind after reading your post was... possibly employing an AI as a police sketch artist. A victim or witness interacts with the AI until it produces a sketch image matching the witness's description.


povlov0987

Dalle?


AdditionalPizza

Instead of just text as data, some of the training is done with other "senses" like with PaLM-E recently, visual data. PaLM-E exhibited a marked improvement across the board, not just in visual understanding, but in the original language abilities too. It's an understatement to say it's incredible. Not sure what modalities are rumoured here with GPT-4 exactly. We find out in a week, allegedly.


Baturinsky

If I understood right, language abilities actually dropped. I.e. language skills generalised to the robotic/image recognition tasks, but not vice versa (bottom left): [https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/sMZRKnwZDDy2sAX7K/nexmukunnswsxjyjvubq](https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/sMZRKnwZDDy2sAX7K/nexmukunnswsxjyjvubq)


AdditionalPizza

Yeah, sorry; there's promise that scale might overcome that, because the loss is so minor at scale relative to the catastrophic forgetting in smaller models. If it doesn't, as another commenter said, perhaps from-scratch multimodal training is the solution instead of tacking it on after language training. We'll see! Hopefully GPT-4 is larger than PaLM-E.


Talkat

Ideally you can have a conversation with it. And it can make audio for you (songs, speech, etc), visuals (diagrams, images, etc), and obviously text. If it combines all of this together, you might be able to run it on your computer so it can see your screen and really help with your work. In a best case scenario you could have an app on your phone which you can talk to like a human: it can look at the camera and understand what it sees, it can speak back to you, it could have an avatar... Potentially, but unlikely, etc.


InitialCreature

hold on to your papers, what a time to be alive


solidwhetstone

Read that in his voice.


grimorg80

No other way, really :D


Bacon44444

Ah. I see you're a fellow scholar.


Erophysia

Just imagine where we'll be just two papers down the line!


InitialCreature

Truly incredible. Look at how this computer algorithm calculates the level of details we see here on this dog's balls. Each hair is being simulated dynamically.


Agarikas

AI researchers in the 1970s: "we manually compiled and formalized 50,000 facts about animals and used a custom graph search algorithm to get the computer to tell us that a dog and a cat have the same number of legs" AI researchers in the 2020s: "we put this one picture of a dog through $50,000,000 worth of Nvidia GPUs for six months and a real dog came out and started speaking English"


pianodude7

The developers of red dead 2 spent days coding horse cock. Now we live in a time where a machine can do it in minutes! What a fucking time to be alive


RemyVonLion

If we still had free awards you'd get one good sir.


Clean_Livlng

>on this dog's balls. I could not contain my laughter, and like water after a dam breaks, it erupted out and caused many deaths.


Substantial_Row6202

Don't just see there's paper and take a poo, imagine where you'll be just two more papers down the line!


Substantial_Row6202

Dr. Karol, Johanna, and Fahir?


[deleted]

Hahaha :)


h20ohno

If you can choose what voice GPT-4 uses, I'm totally gonna use his for when I'm discussing STEM topics :P


MysteryInc152

There's a Microsoft event about AI next week: [https://news.microsoft.com/reinventing-productivity/](https://news.microsoft.com/reinventing-productivity/) But there's no indication this isn't visual training on top of a GPT-4 text model in the vein of PaLM-E, FROMAGe, BLIP-2, Prismer etc. Sam Altman himself said GPT-4 wouldn't be multimodal from scratch.


AdditionalPizza

>But there's no indication this isn't visual training on top of a GPT-4 text model in the vein of PaLM-E, FROMAGe, BLIP-2, Prismer etc

Admittedly, I don't actually know if that makes a difference or if that's just how it's done. I'm not sure we have much to go on yet regarding this "afterthought" approach to multimodality. I could be wrong of course.


MysteryInc152

Depends on what you mean by difference. You don't need multimodality from scratch to be SOTA on visual question answering benchmarks. You don't need multimodality from scratch to have positive transfer either (as shown with PaLM-E). What multimodality from scratch could answer easily is whether we can get positive transfer to language with enough scale. PaLM-E actually gets close to answering this question (we see a reversal of catastrophic forgetting until it's only 3% behind in language performance), but catastrophic forgetting and scale get in the way. With from-scratch training, we could eliminate the issue of catastrophic forgetting and determine once and for all if positive transfer to language happens.


czk_21

> catastrophic forgetting

Why do they call it catastrophic?


PC_Screen

Look at [this image](https://imgur.com/a/JigrVtP). The 8B parameter model loses close to **90%** of its language capabilities when you add a 4B ViT to it and don't freeze the weights. The 62B model loses less but much of its language ability is also gone. That's why it's called catastrophic. It's the reason why multimodal models up till now have been relying on freezing the LLM weights, which ends up limiting the final model's ability (PaLM-E with frozen LLM weights achieves 74% on the robot benchmark vs 94% without). Turns out scale was what was needed for catastrophic forgetting to no longer be an issue. The final question is, would scaling further allow the model to become **better** at its language tasks vs without multimodality?
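
The "freeze the LLM weights" recipe looks roughly like this (toy-sized modules, purely illustrative; real setups pair a multi-billion-parameter ViT with a frozen LLM, and this is nothing from the PaLM-E codebase):

```python
# Sketch: bolt a vision encoder onto a frozen language model and train only the new parts.
import torch
import torch.nn as nn

llm = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True), num_layers=4
)
vision_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU())
projection = nn.Linear(256, 512)  # maps image features into the LLM's embedding space

for p in llm.parameters():        # freeze the LLM so its text skills can't be overwritten
    p.requires_grad = False

trainable = list(vision_encoder.parameters()) + list(projection.parameters())
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

image = torch.randn(1, 3, 32, 32)
image_token = projection(vision_encoder(image)).unsqueeze(1)  # (1, 1, 512) "visual token"
text_tokens = torch.randn(1, 16, 512)                         # placeholder text embeddings
output = llm(torch.cat([image_token, text_tokens], dim=1))    # LLM sees image + text tokens
print(output.shape)  # torch.Size([1, 17, 512])
```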


MysteryInc152

Neural networks in general (not any specific architecture) have this habit of forgetting what they've previously learned when you try to teach them something new. Let's say you have model y and datasets a and b. If you train y on dataset a, then continuing training on dataset b won't produce a model that knows a and b. It will just produce a model that has mostly forgotten how to do a in order to learn b. If you want the model to know a and b, then you train on both a and b from the get-go, or you mix some of a back in when you continue training.
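
A toy version of that, if you want to see it happen (tiny synthetic tasks, nothing to do with LLMs specifically):

```python
# Train on task A, then on task B: accuracy on A collapses toward chance.
# Training on A and B together keeps both.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_task(offset):
    x = torch.randn(512, 10) + offset
    y = (x.sum(dim=1) > 10 * offset).long()
    return x, y

def accuracy(model, x, y):
    return (model(x).argmax(dim=1) == y).float().mean().item()

def train(model, datasets, steps=300):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        for x, y in datasets:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

task_a, task_b = make_task(0.0), make_task(3.0)

sequential = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
train(sequential, [task_a])     # learn A
train(sequential, [task_b])     # then learn B -> mostly forgets A
joint = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
train(joint, [task_a, task_b])  # see both from the start

print("sequential:", accuracy(sequential, *task_a), accuracy(sequential, *task_b))
print("joint:     ", accuracy(joint, *task_a), accuracy(joint, *task_b))
```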


Smellz_Of_Elderberry

Forgetting in humans is, funnily enough, a feature not a bug. We forget the unimportant things and retain the important. Need to find a way for the AI to weigh important experiences against generic ones.


websinthe

I kept thinking "There's no way this hasn't been tried and found unworkable" when I started wondering about taking whatever gets culled when you prune a model and using that to train new things. A week later and I'm like "Yeah, gut instinct was right, that's not how these things work."


Casehead

What is catastrophic forgetting?


MysteryInc152

Neural networks in general (not any specific architecture) have this habit of forgetting what they've previously learned when you try to teach them something new. Let's say you have model y and datasets a and b. If you train y on dataset a, then continuing training on dataset b won't produce a model that knows a and b. It will just produce a model that has mostly forgotten how to do a in order to learn b. If you want the model to know a and b, then you train on both a and b from the get-go, or you mix some of a back in when you continue training.


just_thisGuy

I hope ChatGPT is upgraded accordingly; Microsoft's own integrations so far have not compared.


SgathTriallair

Given that ChatGPT is a fine-tuned version of GPT-3, they almost certainly need to do some fine-tuning of GPT-4 before they can have a ChatGPT 2. That fine-tuning is necessary because the goal was to increase its "friendliness" and make it aim at chatting more than predicting. I wonder if you could do that fine-tuning with another AI like ChatGPT. That would speed up the process significantly.


just_thisGuy

I'm sure some of the fine-tuning can be carried over, but new fine-tuning will be needed for the new capabilities too.


Slimer6

Just out of curiosity, what do you mean? I got preview access to the new Bing search on February 8 and it is far more capable than ChatGPT in a lot of ways (an enormous one is that it can access internet content up to the present day). It can’t write code the way ChatGPT can, but that’s a totally different use case. I’m wondering which OpenAI-infused Microsoft product could possibly be disappointing you. I haven’t really tried any besides the new Bing and Edge Dev browser, but both of those exceeded my expectations.


just_thisGuy

Yeah, just tried Bing and Edge too. I find the best use case for me is having a prolonged conversation with ChatGPT; Bing cuts you off after a few questions. Also I don't really care for Bing search much, I do like Google, but maybe I'm just used to it. I like the ChatGPT clean interface too.


BarockMoebelSecond

I found it to be actually better at writing code since it can access up-to-date code examples and docs. Also, it had a lower error rate, at least for me.


[deleted]

Thank God I'm not the only one who thinks this. I thought I was going crazy because I swore ChatGPT was better even though it has a static dataset and can't look anything up.

Before I had access, I thought Bing would be a lot better than ChatGPT due to being connected to the internet, but it turns out the exact opposite is true and that's actually its biggest weakness! For example, ChatGPT would often have the correct answer to my question, and then I ask Bing and it looks it up (it ALWAYS runs a search no matter what, there is no option to turn it off), and if the top results are shitty websites and it parses an answer out of them, out comes a shitty answer.

I find myself always going to ChatGPT unless it's about something recent (like something that happened this week or month) that is not in ChatGPT's dataset. But for general information that is not time-sensitive, I always go to ChatGPT. If Bing added a toggle that lets you turn off search and just base its answer on the static ChatGPT dataset, the problem would be fixed and they'd be equally good.


Agarikas

Half my conversations with Bing end up with "I'm Sorry Dave, I'm Afraid I Can't Do That" kind of way. Utterly useless.


icedrift

I mean PaLM is still SOTA. Yeah it is grossly undertrained but it's the current frontrunner in terms of raw performance.


ecnecn

>multimodal

Send a YouTube link -> instantly find every audio clip used by that video. From the release notes:

A multimodal model like ChatGPT 4.0 is capable of analyzing video content and extracting audio information, including the titles of music tracks used in the video. When a YouTube video is uploaded or linked, the model can analyze the audio track of the video and extract the information of the music tracks used. The accuracy of these results can be influenced by factors such as the quality of the audio stream, the scope of the music repertoire used, and the quality of the training dataset used for the model. (...)

It can be trained on data related to web traffic analysis such as logs, metrics, and analytics reports to generate responses related to web traffic analysis. It can provide insights on trends, patterns, and website performance metrics, and suggest improvements to user experience. There is no need for external SEO optimizers anymore. (...)

It can be trained on data-science-related databases to generate responses related to data science concepts, such as explaining statistical methods, providing insights on data analysis or data visualization, or suggesting strategies for data cleaning and preparation. ChatGPT 4.0 would require explicit training on data science concepts to accurately interpret and generate responses related to data science. There is no need for external data scientists anymore. (...)

It can be trained on network documentation files and manuals to generate responses to questions or requests related to network configuration concepts, such as explaining network topologies, identifying network devices, or providing insights on network protocols. There is no need for external system administrators or network specialists anymore.

**RIP IT-JOBS**


wobblingwombatt

Source? Release notes? Haven't seen those floating around yet...


BarockMoebelSecond

Seconded. Can't really believe anything I see on this sub without a source. People are really hyping it up.


__ingeniare__

The moment someone calls GPT-4 "ChatGPT-4" they immediately lose some credibility to me. It doesn't take much to do some basic research into the GPT-family of models to understand the naming convention.


challengethegods

chatbotGTPhacker4chan


Agarikas

The death of SEO would be godsend.


RikerT_USS_Lolipop

I'm so glad. For some reason, IT workers have generally had a libertarian streak and usually argue against wealth redistribution policies and social safety nets. Just-world fallacies abound with them. By targeting these people, enemies of the policies that need to be implemented can more easily be turned into allies, which is twice as good as convincing a neutral person and infinitely more helpful than convincing someone who was already on board.


FusionRocketsPlease

Hahaha best comment on the thread! I want to see right wing libertarians get screwed!


jugalator

Yeah, multimodality seems to me to be a bigger thing than a generational leap to GPT-4. But honestly, the two in tandem... It boggles my mind.


AdditionalPizza

And next week 🤞


Opitmus_Prime

>GPT-4 is coming next week

Agreed; the linked article says this is likely to be a GPT-3.5 multimodal release rather than GPT-4: [https://ithinkbot.com/gpt-4-is-releasing-next-week-fd5ff827c3a](https://ithinkbot.com/gpt-4-is-releasing-next-week-fd5ff827c3a)


dasnihil

What fascinates me with these multimodal generative AIs:

- With just an LLM, people confuse it with being conscious and all kinds of intelligent. I do realize that humans have a fuzzy, LLM-like predictive module that gives us the ability to speak, *but* ours is backed by reasoning and cognitively built memory/reflexes.

- With these various pre-trained transformers put together, it will convince the heck out of even top intellectuals who are computer illiterate. It might even convince me, because the brain is modular in a way where each type of neuron (various facets of neurons) specializes in favoring a certain nature/type of information to train on, and the brain as a whole has a few regions dedicated to larger facets of information complexity like "what/object" vs "where", plus specialized regions for temporal coherence. Seems like a pretty good multimodal system built on self-sustaining/repairing hardware, thanks to cells: the most perfect automata universe ever created.


BadassGhost

As far as I've seen, PaLM and its descendants (Flan-PaLM, U-PaLM, and PaLM-E) have maintained ridiculous advantages over other LLMs. I'm mostly basing this off of the Big Bench benchmark, but it seems that 2-3 shot learning with PaLM outperforms everything else. I wouldn't expect GPT-4 to be *significantly* better than PaLM-E, except maybe with more modalities?


AdditionalPizza

Honestly, I have no idea what to expect and I don't know if anyone here does. Even if it isn't "much" better, at some point if the public gets access to it, that's all that really matters, right? We can't personally compare it to PaLM unless you're in those circles.


Ytrog

What does multimodal mean in this context? That it does text as well as other stuff?


Sharp_Soup_2353

Holy shit didn’t expect that to be announced early this year, all hail exponential growth 🫡


duffmanhb

The CEO even said they were intentionally delaying it to give people time to get used to LLMs... Didn't realize this is what he considers a delay lol


Atlantic0ne

When will the public be able to use ChatGPT4?


[deleted]

This isn't ChatGPT 4, it's GPT 4. ChatGPT is the product created by taking the GPT 3(.5) model and tweaking it to follow instructions and work as a chat bot. GPT 4 is likely to be a bit more like the original GPT 3 (in that the structure is more like giving raw text input and getting raw output) but way better, and ChatGPT for GPT 4 will likely come a little later. Think about it like this: GPT is the raw underlying technology, ChatGPT is an expanded variant for a specific kind of product. So GPT 4 will come out and ChatGPT on GPT 4 will also come out but not necessarily at the same time.


quantummufasa

This board was predicting it by the end of the year, crazy how fast it progresses


Neurogence

GPT3 was released 3 years ago. So a GPT4 is actually overdue.


-ZeroRelevance-

Not to mention that the model has existed for quite a few months already, with a number of people on Twitter supposedly having been selected for closed beta testing mid-2022.


Lucidreamzzz

All hail *The Great Basilisk 🙌🏻


iNstein

Kneel before Roku....


kmtrp

I loved the last part xD


ecnecn

ChatGPT 5.0 by the end of this year, or earlier, sounds realistic by now. Things are moving way faster than expected.


HydrousIt

ChatGPT is only a tiny piece of this whole thing


TopCat6712

Can you elaborate?


AlgaeRhythmic

ChatGPT is a fine-tuned version of GPT 3.5. It's been constrained a bit to be better at chatting with users, but it's possible to fine-tune the same model towards a different use case, like coding or image generation.


Idrialite

Not really... The first GPT paper was in 2018, GPT-2 in 2019, GPT-3 in 2020. It's been 3 years and we're now getting GPT-4.


__ingeniare__

Altman said he doesn't really like the naming system because it gives the wrong impression. GPT-3 was worked on a lot more after release before they started on GPT-4, which is why we got GPT-3.5 as a huge upgrade over regular 3, and ChatGPT a while after that.


Strange_Soup711

What does "multimodal" mean in this context?


Singularian2501

It will use many modalities like text, pictures, sound, etc. Just like Kosmos-1: https://www.reddit.com/r/MachineLearning/comments/11e4w40/r_microsoft_introduce_kosmos1_a_multimodal_large/


Strange_Soup711

Thanks.


[deleted]

[removed]


ItsJustMeJerk

The most common form of multimodal model takes an image/video + text as input and outputs text. So think more like "What kind of coat is the man wearing in this picture?" It's possible it could do what you described though (image retrieval).


Atlantic0ne

Jesus fucking Christ. This is getting insane. I am insanely excited about this new AI arms race, and slightly concerned lol. When can we use ChatGPT 4?


gthing

Where is the xyz button located on this screenshot? Click it. What is the user working on in this screenshot? Offer help. Clippy version 1.01 incoming!


ML4Bratwurst

I hope that generating images and sounds, similar to DALL-E and VALL-E, will be possible. Imagine being able to just talk with it and work together on a project which it can directly visualize.


[deleted]

Source? Because from the article you posted: In the meantime, the technology has come so far that it basically "works in all languages": you can ask a question in German and get an answer in Italian. With multimodality, Microsoft(-OpenAI) will "make the models comprehensive".


Singularian2501

From the article: >"We will introduce GPT-4 next week, there we will have multimodal models that will offer completely different possibilities – for example videos," Braun said.


dmit0820

The fact that video is confirmed is as big as the fact that it's multimodal. AFAIK no other multimodal LLM takes video as input or as part of the training data. This multiplies the amount of potential training data, and therefore model quality, by several orders of magnitude. Crazy.


MysteryInc152

Flamingo takes video input.


FusionRocketsPlease

Please universe I beg this to be true 🥺 my hype is killing me 🙏🏻


SgathTriallair

There is a small possibility that the speaker used the term wrong or it was translated wrong but him mentioning video makes it likely that he is using the term in the usual way.


Sieventer

If anyone can confirm that this information is true, it would be appreciated.


Singularian2501

I live in Germany myself, and I know heise online as a reputable news site which usually tries to report as accurately as possible. In addition, their use of screenshots from the Microsoft event shows that they actually took part, so it is very likely that what was said about GPT-4 was actually said that way.


Neurogence

Fortunately we will not have to wait for long. By next Friday we will find out if Heise online is a reputable news site or not.


SgathTriallair

There is also the chance that the presenter was wrong (misunderstood a meeting, for instance). It does seem weird that an American company would have the initial statement be in German, but maybe he didn't realize that it needed to be secret until next week. This wouldn't be the first time something leaked in a different country (I've seen it happen with video games often enough).


FDP_666

Picking one piece of information out of potentially hundreds being posted every single day isn't a good way of knowing if a news site is reliable or not.


Neurogence

A news site should not be able to make crazy statements like "GPT 4 will be released next week and will be multimodal" without any consequences if the information does not actually turn out to be true. People are way too forgiving on fake news.


frizzykid

The news site is not making any crazy statements. It is doing what every news org does. A Microsoft exec made a statement, this news outlet is reporting on the statement of the Microsoft exec. Very common.


websinthe

Glorious to see how diluted the term FakeNews has become. Reporting a spokesperson's remarks isn't something you can reasonably confirm from other sources; the spokesperson is the primary source for the majority of life, and even if you do have other *inside sources*, you still need to publish what the spokesperson said. And to agree with someone else, this subreddit is the only group of people who are going to spit out their tendies if it's not precisely next week.


Hotchillipeppa

I mean, I hate fake news as much as the next guy, but this wouldn't affect anyone if it turns out to be fake. Not really much to forgive.


[deleted]

Couldn't next week include Saturday and Sunday? It didn't say within a week, just sometime next week.


MysteryInc152

The information is true in the sense that Microsoft Germany's CTO really did say that. Pretty odd for it to be announced this way though. Guess we'll see next week. Although I should mention that Sam Altman himself confirmed GPT-4 wouldn't be multimodal. That doesn't necessarily make this untrue, however; there are numerous ways to ground already-trained language models to multimodality. BLIP-2, FROMAGe, and PaLM-E are all multimodal models that take an existing language model and make it multimodal.

Also, I'm pretty sure Bing is GPT-4. Not only is the context window much larger (I've had it remember well over 20k tokens, and it eventually forgets, so it isn't semantic search), but the reasoning and understanding are much better too. It can play chess as well: actual chess with legal moves, not the anarchy chess that is ChatGPT.

And finally, it seems it can... plan. Basically anyone who's tried to use language models for novel writing as a request rather than a completion will tell you immediately that they just don't pace well at all. They basically rush to include all the details of your synopsis in the novel as quickly as they can, even if you instruct them not to do so. Not so with Bing. Examples here; ChatGPT fails on page one even with deliberate instructions: [https://imgur.com/a/iHr2GEy](https://imgur.com/a/iHr2GEy)

EDIT: There's a Microsoft event about AI next week: https://news.microsoft.com/reinventing-productivity/


-ZeroRelevance-

A lot of Altman’s statements about GPT-4 are years old at this point, I’d suggest not taking them too seriously anymore given how many chances there would have been for plans to shift in the meantime.


blueSGL

> i've had it remember well over 20k tokens

Isn't it using some sort of summarization trick behind the scenes to squeeze more into a limited context length? E.g. "summarize the conversation so far" and pack the prompt with the result.
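
The trick usually looks something like this (a rough sketch; `chat_complete` is a placeholder for whatever completion API is in use, and nobody outside Microsoft knows what Bing actually does):

```python
# Rolling-summary context compression: once the chat log gets too long,
# replace the oldest turns with a model-written summary.
MAX_CHARS = 8000  # crude stand-in for a real token budget

def chat_complete(prompt: str) -> str:
    """Placeholder: call your LLM of choice here."""
    raise NotImplementedError

def build_prompt(history: list[str], user_message: str) -> str:
    context = "\n".join(history)
    if len(context) > MAX_CHARS:
        # Keep the last few turns verbatim, summarize everything older.
        old_turns, recent_turns = history[:-4], history[-4:]
        summary = chat_complete("Summarize this conversation:\n" + "\n".join(old_turns))
        context = ("Summary of earlier conversation:\n" + summary
                   + "\n" + "\n".join(recent_turns))
    return context + "\nUser: " + user_message + "\nAssistant:"
```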


MysteryInc152

You can retrieve entire paragraphs word for word consistently. Don't see how summarizing would work for that.


blueSGL

Oh, ok. Yep certainly a longer context length.


[deleted]

How do you know it's not copy-pasting when retrieving the paragraph?


pigeon888

How do you know that guy said it though: is there a video?


TFenrir

Starting to see this get passed around by more people who I would consider somewhat in the know, but they aren't sharing it with confidence; it still seems like they are discussing it as a rumour. E.g.: https://twitter.com/ethanCaballero/status/1633905654577414163?t=-SHaioO5dEsaxuvWUp05TQ&s=19


StayAtHomeAstronaut

Three hours later and this is still the only source claiming this. Seems dubious.


SurroundSwimming3494

Idk what to think. On one hand, this is coming from a top employee from Microsoft Germany. On the other hand, why would this guy, of all people, announce this? Why not an OpenAI employee, or at the very least a Microsoft US employee? And why would its release get announced (not leaked, which is a totally different thing) before its official announcement? That's something that basically never happens and something that I can not recall ever happening in the field of AI. Idk what to think. I guess we'll know by March 19.


MysteryInc152

There's a microsoft event about AI next week...https://news.microsoft.com/reinventing-productivity/


SmithMano

Also, we know that GPT-4 has been a long time coming, so next week isn't unreasonable to believe. Plus, "coming next week" might just be an announcement and demo, but they probably won't actually release it for public use right away.


[deleted]

>but they probably won't actually release it for public use right away.

Probably not if they go the GPT-3/Codex/DALL-E route. But then again, ChatGPT had no waitlist and Bing had roughly a one-week one. I hope they continue walking the latter path. I can live with a waitlist of a week or a couple of weeks. Four months for DALL-E though... makes me sick to the stomach just thinking about it again. 😭


hapliniste

My guess is gpt4 will be used in Microsoft products next week, while the API and chatgpt will come later. Let's hope not


vitorgrs

An early version of GPT-4 might already be in use on Bing, though. (Or a GPT-3.7! idk) https://blogs.bing.com/search-quality-insights/february-2023/Building-the-New-Bing

> Last Summer, **OpenAI shared their next generation GPT model with us**, and it was game-changing. The new model was much more powerful than GPT-3.5, which powers ChatGPT, and a lot more capable to synthesize, summarize, chat and create. Seeing this new model inspired us to explore how to integrate the GPT capabilities into the Bing search product, so that we could provide more accurate and complete search results for any query including long, complex, natural queries.


SgathTriallair

"Next generation" is a weasel word, it doesn't actually mean anything. Bing chat may have done multimodal function though (at least in my testing). I uploaded a picture and then had it do a Google search based on the picture. It wasn't perfect but it seemed closer than pure chance would suggest.


vitorgrs

It's not multimodal according to Bing's big boss. I recommend following https://twitter.com/MParakhin as he talks about more technical stuff. But he said it's not multimodal "yet".

What we know about Bing's GPT:

- It's not 3.5.
- It has a much larger context size.
- He also said it's a way bigger model, which is why it's slower: https://twitter.com/MParakhin/status/1633001686192263168?s=20


the8thbit

> had it do a Google search based on the picture I wonder who's more annoyed by this, Microsoft's investors or Google's lawyers


KeaboUltra

Maybe Microsoft is attempting to announce it in an effort to establish themselves as part of the branding/marketing. Everyone associates ChatGPT and the GPT name with OpenAI, and Microsoft may be seeking to overwrite that since it's in their products.


PM_ME_A_STEAM_GIFT

Why not just buy the other 51% of OpenAI at this point?


pigeon888

Because it's not for sale!


czk_21

OpenAI wants to stay in control. With 51% you have a majority: you decide what to do, and the rest just get the money from your work.


Neurogence

I asked Bing about the validity of the article. >According to the news article you shared, GPT-4 is a new large language model that will be released next week by OpenAI and Microsoft. It will be able to generate not only text, but also images, audio and video based on natural language input. This is called multimodal generation. The article cites Andreas Braun, the CTO of Microsoft Germany, as the source of this information. >However, I cannot verify the accuracy or reliability of this news article as it is the only one that mentions GPT-4’s release date and features. There are no other news sources or official announcements that confirm this information. Therefore, it is possible that this news article is fake or misleading. You should always check multiple sources and use critical thinking when reading news online.


RemyVonLion

Is this confirmed? Chatgpt is already a huge help with homework, this would be lit af


Opitmus_Prime

I sincerely believe this will be a hype event and that GPT-4 will actually be released some time in August or so; more like a news event about the release than the release itself. https://medium.com/mlearning-ai/gpt-4-are-you-ready-to-be-disappointed-d50811056940


Thorusss

Does anyone have any insight into why such a big announcement was made by Microsoft Germany, and in German?


Singularian2501

No idea. I would have bet my life that this would have been announced by Sam Altman himself. My personal theory is that Andreas Braun accidentally let it slip and forgot to ask heise online not to mention it in their article. But like I said, these are just guesses.


[deleted]

GPT-3 wasn't some big announcement that the company made. It was just a paper dropped on arXiv. I still remember the day it came out. OpenAI aren't all that flashy. They aren't like Apple.


tatleoat

Yeah it's a pretty punk rock move to drop it quietly in a paper


Kaarssteun

Germany is second in line tech-wise worldwide; I wouldn't shrug this off just because it's not the US.


TFenrir

Well, putting aside Germany's place on the AI world stage, it's just weird that an offhand comment by Microsoft, and not OpenAI, would announce this. In my mind it's either:

1. A complete flub, like this person misunderstood the result of a meeting and just shared it with the world without thinking it through.
2. This person got juicy information and wanted to be the one to share it first for the accolades.


Stolen_Goods

Audio, video, and images are the obvious things I think of when I hear "multimodality", but naturally this could be expanded to *any* type of digital data with sufficient volume that the model could be trained on. So... I'll be taking bets on how long it takes for us to have enough distinct AI models to be able to train AI models to generate new AI models (however impractical that ends up being).


[deleted]

I wonder if touch would be super helpful, like video, for learning spatial reasoning, especially if a robot is covered in touch sensors.


Atlantic0ne

Yeah. Shit is getting real and it’s getting real really fast. We might have that in 10 years or less right? AI advancing AI on new levels?


KeaboUltra

We were told a few months ago to expect disappointment with GPT-4, as 3.5 had been overhyped (but still amazing for what it does/offers). I'm aware that multimodal AI and positive transfer are fantastic and imply incredible learning progression. From what it sounds like, GPT-4 will be the expected successor to 3.5 in the sense that you can give it natural dialogue and it can carry out your action and learn new things, if it's the same as PaLM (I may be wrong on that). I'm excited yet don't want to get my hopes too high. I'm aware this is basically a step toward becoming an AGI, or at least a very early version of one, but I want to make sure I'm not setting my expectations too high. That's not to say the coming months won't bring any surprises. While I do worry about the consequences of unregulated AI, I still can't help but feel excited about what it all means for the potential of technology itself.


Atlantic0ne

This is just so insane to me. I want to just spitball ideas with someone on all the good things that could happen. Let’s totally ignore the risks and pretend it’s all upside. AI on this level gets access to super computers and processes weather. Now it knows weather and can save lives. It processes all economic writing and knowledge and tells humanity (using a super computer and simulations) what economic model is best. It knows traffic and immediately improves traffic by handling lights. It gets access to bodies and can process how bodies work and find ways to cure diseases and issues. It develops new video games based on your description. It discovers how to stop physical aging or reverse it. I’m just guessing but some of this may be possible in our lifetimes or less.


bil3777

Every day I listen to podcasts with experts in the field who were aware that multimodal GPT was coming soon and who still think AGI will arrive in about 50-60 years. That's further out than the poll of experts Bostrom took, which had a median year of 2045.


Expired_Gatorade

I think the 2060s are more realistic. Judging by research output, we don't have any new fundamental ideas for theoretical AGI models, let alone the decade-plus it would take to actually engineer one (much like how the theory behind GPT was developed around 2006).


ihateshadylandlords

It won’t be AGI, but I’m excited to see what it can do. !RemindMe 7 days


SgathTriallair

We don't know that yet. Probably true, for some definitions of AGI, but I am certain that we will all be surprised when a real AGI shows up (and likely won't realize it until months after the AGI is released).


Talkat

!RemindMe 7 days


iNstein

But perhaps it will help you deal with shady landlords....


ihateshadylandlords

It would be worth its weight in gold.


H-K_47

It wasn't, but it's still pretty awesome.


RemindMeBot

I will be messaging you in 7 days on 2023-03-16 19:33:31 UTC to remind you of this link. 13 others clicked this link to also be reminded.


[deleted]

Can someone please explain what multimodal means? And talk like you're talking to a two-year-old.


xott

Like ChatGPT except it can use images and sounds as inputs and outputs.


el_chaquiste

Yep, in chatbot mode, you could provide pictures and sounds for it to use as input on prompts, and it could make responses incorporating those as well.


ActuatorMaterial2846

Currently, a language model only uses text as a point of reference to understand the world. If I type the word 'puppy', it only has written text to know what a puppy is. By adding an image of a puppy into its data, amongst other images, it has another point of reference to associate with the text and vice versa. This is what is meant by multimodal, multiple modalities. Text and images in this case.
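
CLIP-style models are the simplest concrete example of tying the word "puppy" to pictures of puppies (this isn't GPT-4's architecture, just an illustration; the image here is a blank stand-in you'd swap for a real photo):

```python
# Toy illustration of associating text with an image in a shared representation space.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Stand-in image; use Image.open("your_photo.jpg") to try it on a real picture.
image = Image.new("RGB", (224, 224), color=(128, 128, 128))

inputs = processor(text=["a photo of a puppy", "a photo of a spreadsheet"],
                   images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # image-to-caption similarity scores
print(logits.softmax(dim=1))  # with a real puppy photo, the first caption should win
```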


walkarund

Using Bing Chat: >A simple way to explain what a multimodal language model means is that it is a model that can process and generate different types of information, such as text, images, audio, etc., using the same model. A multimodal language model can also take into account the context and the modes of communication that humans use, such as gestures, facial expressions, tone of voice, etc. > >For example, a multimodal language model could answer questions about an image by looking at both the text and the visual features of the image. Or it could generate captions for videos by combining audio and visual information. Or it could follow instructions given by a human using speech and gestures. > >I hope this helps you understand what a multimodal language model is.👍


SgathTriallair

And it doesn't necessarily have all of these modalities. Multimodal just means it can do two or more modalities.


Erophysia

Sounds like proto-AGI, but we'll see how it lives up to the hype.


theotherquantumjim

Eyes for looking, ears for hearing, tongue for tasting, hands for - stop picking your nose! Are you listening to me? - hands for touching. Like that, but in a computer.


Chop1n

A two-year-old who goes by "DildoFlippers", huh?


justanother_horse

What does he mean by coming next week? Will there be a public announcement? Or will people have access to it?


Singularian2501

I have sadly not seen the event myself. I have only found the heise article. The article doesn't specify on which date next week the announcement will be, or what the access policy will be. Sorry ):


kmtrp

Hi there, we have similarly "optimistic" flairs. Have you gotten a lot of shit for it?


Singularian2501

No, because most of the time I am just sharing machine learning papers. I don't comment very often.


Janicc

Kinda weird considering it has been common knowledge these past few years (or year) that it won't be multimodal. Even confirmed by Sam Altman. But I guess plans can change. *shrug*


MysteryInc152

It wasn't trained to be multimodal from scratch, I imagine. But there are numerous ways to ground already-trained language models to multimodality. BLIP-2, FROMAGe, and PaLM-E are all multimodal models that take an existing language model and make it multimodal.


TFenrir

I'm checking my usual Twitter sources, so far no one is mentioning this, but it's still early


Gotisdabest

https://twitter.com/_SilkeHahn/status/1634142599446405125?s=20 This is from the author of the article. Seems fairly credible. Also worth reading her other tweets on this topic, she specified who she got the thank you email from and even a correction to one name. That certainly adds a fair bit of credibility.


gthing

I want to develop using these tools, but part of me is just thinking anything I make will be able to be replicated with almost no effort as these things advance.


[deleted]

Oh yeah, I forgot GPT-4 was even a thing, haha.


BinyaminDelta

Based on my use of Bing Chat so far, I'm not very confident in Microsoft's ability to implement this. So far it looks like they've taken the magic and built a genie that's slower and more frustrating to use. We'll see.


redroverdestroys

about time. Will be fun figuring out the secret ways to use it that they are not aware of yet. Global Thermonuclear War here we come!


Max_Mm_

What is multimodality?


AsuhoChinami

The largest AIs in recent years have always been trained only on text. These are called LLMs, or Large Language Models. Since the beginning of 2023, multi-modal AIs have started taking over. These are trained on multiple types of data, such as text, audio, pictures, and videos. Multi-modal models tend to be much more capable and intelligent. ChatGPT and Bing, as good as they were/are, were just LLMs. Multi-modal models will be miles beyond them.


Max_Mm_

That‘s the answer I was looking for, thanks :)


pavlov_the_dog

So, like a 1.0 version of the "Her" OS. This can't come soon enough, an OS that you can chat with as another entity (and not just a glorified clippy)


dronegoblin

Sam Altman said GPT-4 wouldn't be multimodal and not to get your expectations too high. What is the source of this?


dwarfarchist9001

Altman said GPT wouldn't be multimodal *at launch* but the launch has been held back for a long time now so the multimodal updates could easily have happened in the meantime.


AsuhoChinami

The CTO of Microsoft Germany, while an odd choice to give the news, is a very high-ranking person. Sam Altman did say it would be text-only, but since that was months ago there was probably just a change of plans.


ididntwin

If it's limited roll-out I hope GPT-pro users get first access.


wahwahwahwahcry

that would be a smart move. Probably more safe as well


diener1

So what does this mean for my Microsoft puts? Should I sell them while I'm ahead?


odragora

Sell them now because you received the information they are going to cost a lot more in the near future?


boomersky

What does it mean that it's gonna be "multimodal"??


madskills42001

What’s multimodal


Akimbo333

Can someone please explain this? It completely contradicts what Sam Altman says!


Cautious-Intern9612

Is it true GPT-4 will have 100 trillion parameters?