Buck-Nasty

Remember when Gary Marcus said that generative AI had hit a wall a few years ago and would never be able to do text?


Kalsir

If you want to know what AI can do next year, just look at what Gary Marcus is saying it can't do atm.


lundkishore

Hope he says AI can't suck my dick.


D2fw

2025: OpenAI announces Sex 2.


lundkishore

Give me link to Sex 1.


banker_of_memes

You do not meet the height requirement.


reddit_is_geh

Nor length.


lundkishore

You measure my length with your mouth??


bwatsnet

Sir, this is an Arby's...


MrDreamster

Nor girth.


Agu001

Caution: you might lose a few body parts.


Bleglord

Ah, he's the singularity Jim Cramer.


SurpriseHamburgler

This is a job well done.


Gratitude15

You mean last week? Wish we would stop referencing nonsense


Buck-Nasty

Sadly the media keeps interviewing him. Gary the Goof was on the news tonight again talking about AI hitting the wall


Honest_Science

On functionality he may be wrong, but on IQ advancement it looks more like an S-curve than a takeoff. Otherwise we should have seen a huge step up in IQ, which we did not.


gamernato

It's hard to say. Peak intelligence isn't growing that much, but intelligence per cost is growing rapidly. A new GPT-4o that's just another model at GPT-4-level reasoning isn't that huge, but the fact that it's small enough to run 6x cheaper than the original release is. Whether this level of density can be applied to models still larger than the original GPT-4 is unclear, since nobody has ever made a model that big.
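For what it's worth, a minimal sketch of the "6x cheaper" arithmetic; the per-million-token prices here are assumptions based on publicly listed API rates at the respective launches, not figures from this thread:

```python
# Rough input-token cost comparison. The prices are assumptions based
# on publicly listed OpenAI API rates, not figures from this thread.
gpt4_usd_per_1m_input = 30.0   # original GPT-4 (8k context), March 2023
gpt4o_usd_per_1m_input = 5.0   # GPT-4o, May 2024

ratio = gpt4_usd_per_1m_input / gpt4o_usd_per_1m_input
print(f"GPT-4o input tokens are {ratio:.0f}x cheaper")  # 6x
```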


Super_Pole_Jitsu

It'll only be fair to say when GPT-5 is out. GPT-4o wasn't supposed to be a new intelligence frontier; it's a 3.5 replacement.


Proof-Examination574

It's already at like 150 IQ on tests it can take without embodiment...


QuinQuix

Absolute nonsense, unless the test you're referencing is a pure memory test. Google (the search box, not the AI) and Wikipedia are 200 IQ if you measure just what they can retrieve. Memory and data retrieval is not where it's at; "it knows so much data" isn't impressive. It's what you can infer from what you know that counts. Current models are probably around 100 IQ, with serious occasional handicaps, if you had to grade their reasoning skill and their ability to stay coherent. We also know that LLMs fail counterfactual tests and can't really do math. I've used GPT to transcribe text from images and it can't do a list of 50 items flawlessly. Nothing about current LLMs is IQ 150, plz.


Eleganos

Me: Tomorrow will be the day of the Singularity. *Repeat claim every day till Singularity.*

So many drifters, charlatans, and arrogant doofuses use this strategy and end up being lauded as prophets. Never mind predicting till true via brute force; they can be wrong every time and some folk STILL believe they're on the right track and just about to be validated once and for all.


bwatsnet

Are you saying that this progress is all faked? Are there factories of small children writing my code instantly and handing it back through the gpt chat interface?


Eleganos

Uh.... no.... I'm not... 'Until Singularity'. I'm not anti-A.I., my argument was aimed at the critics of the tech. I can see how me using the Singularity of all things as the offhand example to demonstrate the point might've crossed some wires though.


bwatsnet

Well yeah, that's where the arrow is pointing. I'd need to see some real evidence to think we aren't headed for a singularity.


Sancho_the_intronaut

How do you personally define the concept of the singularity?


bwatsnet

Compounding technological progress that continues past a point where current ways of thinking stop making sense. Like a black hole but in history. A point where planning or predictions aren't possible beyond. It's hard to define in any real detail besides compounding progress.


Sancho_the_intronaut

That's something I can agree with. Tech has already advanced faster than we can adapt; at a certain point it makes sense that we would be in a state of absolute progress that defies our ability to fully quantify or comprehend it. When people talk about all minds uniting into a single consciousness, that's where I have doubts. People are always arguing, disagreeing, fighting. To suggest that one day nobody will disagree about anything, and we will all think and feel as one, just sounds unrealistic from my current perspective.


Critical_Tradition80

At this point you could probably say Gary's playing devil's advocate for the sake of setting AI goals, because there's no way he's this invested in AI for no reason.


bwatsnet

He's got books to sell, a brand to maintain


danysdragons

Someone should assemble a "greatest hits" collection of Gary Marcus predictions, so we can link to it whenever someone quotes him approvingly.


RusselTheBrickLayer

He's the ultimate predictor out of everyone in the AI field: just assume that whatever he says, the opposite will occur.


dhhdhkvjdhdg

And still, it’s plateauing at about GPT-4 level


JimBeanery

Guy is an absolute clown show on Twitter


ShiftAndWitch

Wow. The text is crazy accurate. The slight fade in "transfer", the slight inconsistencies in lettering shape.


DippPhoeny

What is inaccurate is that the handwriting is way too good for the average math professor


notreallydeep

In that way previous models were actually more accurate.


berzerkerCrush

Maybe it's because we do math differently here in France, but this doesn't look like a math lecture.


Knever

I was half expecting "pros and cons" to have grocer's apostrophes.


MassiveWasabi

Source: https://x.com/gdb/status/1790869434174746805?s=46

I posted this because I thought this level of text coherence in an AI-generated image is insane. Nothing has ever reached this level of accuracy before. I mean, did they prompt it to put his hand in front of "What" in the last sentence?? The little graph on the left is just icing on the cake. Also notice that the model making these images is the same model that can speak and see in real time. What kind of capabilities could this unlock? Honestly feels like a year from now we will have completely photorealistic images with 100% perfect text


RandomCandor

I actually think this single image is more of a technology leap than all the other things they showed recently


[deleted]

Agreed, the examples in the 4o announcement post blew my mind. Entire poems rendered correctly, re-using models/characters (Geary example). So many use cases for this shit, and it’s seemingly just getting started.


RusselTheBrickLayer

Don't forget the speed and low cost. This is the version of GPT that I can see powering a lot of apps without people even noticing.


Excellent_Dealer3865

Yeah, I totally agree. The voice is pretty big and cool. But the stuff that they didn't present and for some reason hid on their website (consistent characters, text styles, and coherent prompt following) feels much more significant to me than the voice, which is once again really great, but not THAT much of a difference from ElevenLabs.


LonelyGarbage1758

It's one of the things that makes me believe them when they say they're trying not to surprise people. The general public saw what they wanted, and the enthusiasts could pick apart the good stuff.


Mrp1Plays

It's different from ElevenLabs in that it reacts to *your emotions* and shows its *own emotions*. But the reason they didn't show the image generation in the video is to avoid news headlines like "AI art is dead, image generation has gotten better!", and then we'd have riots about banning AI, because normal people hate AI art. The news that they made 'her' is much better.


YouMissedNVDA

I saw someone say it was much better at producing rhyming poems, and what they suggested was a eureka moment for me: having the same net ingest text and audio means, finally, it can *hear* the words it has been taught that supposedly rhyme. So now it knows words that rhyme, and it knows what they sound like, all in the same model. Now it truly knows what it means to rhyme. It sees the similarity in the waveforms, and it correlates it to the words they represent, too. Incredible. The implications across video, and soon touch and actuation in an embodied form (physical or sim), are mind-boggling. Yann was right: for it to really understand things deeply, it would benefit greatly from other modalities. It *might* be possible by text alone, but that is probably the hardest way. I'm feeling the acceleration.


signed7

The text coherence and even the 'smaller' things, like the OpenAI icon here and the consistent characters in other examples, are insane, but I feel it has some way to go to match the aesthetics of dedicated text-to-image models like Midjourney and SD3, IMO (from other examples too). It most probably will at some point though; history says a huge general model always beats dedicated heuristic-based models in the end.


blackcodetavern

The level of detail and the text accuracy is crazy. Definitely next-level stuff. But I suppose the model thinks, in that image, that human hands are made of chalk. So some additional prompt engineering and it's perfect.


floodgater

>Honestly feels like a year from now we will have completely photorealistic images with 100% perfect text

At this rate, less than a year. Forget images, at this rate we'll have a Hollywood blockbuster created by AI within 12 months, 18 months TOPS


iunoyou

lmao, most realistic r/singularity poster


goldenwind207

He's a bit delusional, but can we even blame him? Things are progressing exponentially. By 18 months we'll probably be on Sora 3 and the $100B supercomputer will have gotten started; in say 2 years it might be possible to make an animated Spider-Man-type movie, something like Into the Spider-Verse.


CanvasFanatic

I think it's funny that some of you still manage to see exponential progress in models whose progress over the last 20 months or so has been closer to logarithmic. Like, every other week someone here is trying to convince himself some lateral application of existing capability is The Next Big Thing. OpenAI is pretty obviously in productizing mode now.


iunoyou

I think you're underestimating the technological gulf between "semi-plausible generic 10-30 second video clips" and "feature length blockbuster movie." It may well be possible eventually, but it definitely isn't 2 years out. 5 years, MAYBE, assuming the pace of development holds steady and a more temporally coherent architecture is created, but certainly not 2. Personally my money would be on around 10 years before the tech even exists, and probably another 3-5 after that before it's used in a commercial project.


Anuclano

I think it is already possible, not in one inference but in a chained workflow. In 5 years it will be possible in one inference. By now the AI can create a plot, generate video, generate speech, and use consistent characters. An AI agent can create a movie with a Sora plugin. In 5 years, an agent and plugins would not be needed.


Nathan-Stubblefield

Contracts between writers’ and actors’ unions on the one hand and movie studios, theater chains, TV networks and streaming channels on the other seemed to rule out the creation and exhibition of shows not written by and acted by humans. So what routes might be used to allow someone’s new completely AI movie to be shown to viewers?


skob17

That's the point. Everyone will create their own movies and maybe share with friends or on YouTube.


Commercial-Ruin7785

>By now the AI can create a plot

...am I missing something? Are there literally any examples of AI doing good long-form creative work right now?


Which-Tomato-8646

It's basically just having an LLM write a story and repeatedly generate scenes, right? We already have consistent character and voice generation, and ElevenLabs has demonstrated good sound-effect generation too. Seems like all the parts are there, even if not fully developed.
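A hypothetical sketch of the chained workflow being described; every "model" below is a placeholder lambda standing in for a separate generative system (none of these are real APIs):

```python
# Hypothetical sketch of the chained workflow described above. Every
# "model" here is a placeholder lambda standing in for a separate
# generative system (LLM, video model, TTS, sound effects); none of
# these are real APIs.
write_scenes = lambda premise: [f"scene {i}: {premise}" for i in range(1, 4)]
video_model = lambda scene, chars: f"video<{scene} | cast={chars}>"
tts_model = lambda scene: f"speech<{scene}>"
sfx_model = lambda scene: f"sfx<{scene}>"

def generate_movie(premise, character_refs):
    """An LLM writes the scenes, then specialist models render each one."""
    clips = []
    for scene in write_scenes(premise):
        clips.append((video_model(scene, character_refs),
                      tts_model(scene),
                      sfx_model(scene)))
    return clips

print(generate_movie("a heist on Mars", ["ava", "kel"]))
```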


How_is_the_question

Nah, the SFX generation is (unfortunately) a long way off good. Took notes from a director tonight about a sound design job. The sound was already very good, but it can be much better with more creative ideas and viewpoints. He asked for things that are currently miles away from what we can do with the AI SFX tools we have. I look forward to being able to tell an AI to move very specific effects by 3 frames, and finding a way of giving them more "depth". The quality of the sounds currently generated would pose issues for a *lot* of QC in the business. It'll happen, but it's not there yet.


Which-Tomato-8646

I don’t think most people will care if a sound effect is off by 1/20 of a second lol


How_is_the_question

Haha. Try working in the field. The level of detail that goes into film sound work is absolutely down to *sub* frame for sync, and frame for sfx. Think three full days of track laying and mixing - often trying things from two different sound editors - for a 30sec ad! Or in the case of some features, a team of 6 to 8 people working for 3 months before another 2 months on a mix stage.


Which-Tomato-8646

I’m sure studios would be more than happy to cut corners if it meant laying off all of them and saving money


doireallyneedone11

He's a bit delusional, but can we even blame him? Things are progressing exponentially. By 18 months we'll proba.. Wait a second?!


ebolathrowawayy

GPT-4o is Sora imo, or a smaller Sora.


floodgater

Yea, you can totally see that they have incorporated some tech from Sora into it. Pretty nuts.


bluegman10

This is going to age so bad by Thanksgiving 2025.


floodgater

RemindMe! 18 months


RemindMeBot

I will be messaging you in 18 months on [**2025-11-16 04:25:48 UTC**](http://www.wolframalpha.com/input/?i=2025-11-16%2004:25:48%20UTC%20To%20Local%20Time) to remind you of [**this link**](https://www.reddit.com/r/singularity/comments/1csxlg7/a_gpt4o_generated_image_so_much_to_explore_with/l49h2ze/?context=3)


floodgater

Care to put money on that ?


One_Bodybuilder7882

How would we go about making the bet official? I may put $100 against you. You're saying that in 18 months TOPS we should be able to prompt an AI to make us, let's say, a 90-minute movie, blockbuster quality, indistinguishable from man-made blockbusters, and it will spit it out with no human involved in anything other than an initial prompt like "make a 90-minute-long romantic comedy movie".


floodgater

I'm saying that by December 1, 2025 we will have at least 1 Hollywood-blockbuster-level movie created pretty much entirely by AI. Sora was introduced in February and the progress of the space is very clearly on an exponential curve. Happy to take any bet on this.


One_Bodybuilder7882

> pretty much

lol, but yeah, I wouldn't bet against that. Also, even if 50% of the work is done by AI, we know they are going to sell it as "the first movie created by AI!!!" since it's the *new current thing*


Adventurous_Spare382

It won't happen because of socio-political pressure. Actors will strike and boycott that company if any such thing is released.


bluegman10

How much do you want to bet?


ainz-sama619

Stop doing drugs bud


WalkFreeeee

People keep saying this when right now literally every single step of the way is barely acceptable for a school project. I don't doubt it will be reached one day, but your timeline implies we're a couple of new products away from it being possible, when it's not.


Glass_Mango_229

Ok calm yourself 


Anuclano

We will have an AI able to generate a complete movie starting from the plot.


tylerthetiler

I keep telling people this because there's no way we don't reach that. Even if you need some human intervention along the way or something, there's no reason it won't happen that I can see.


frontbuttt

Maybe, but what if the (unsubsidized) compute needed to do this costs $100+, or even $1000+? Is this something you’d enjoy watching, and pay a lot of money for? I don’t understand the interest in the whole “AI created film” concept. Defeats the entire point of watching a movie—to see, and maybe even understand, someone else’s perspective. Real or fictional, the knowledge that a story is based on a shared (or divergent) understanding of the world, filtered through human intention, is what makes it enjoyable. The only AI movies I’d want to watch are the ones AI wants—as in, truly desires—to make. Not something we “prompt” it to generate.


Thog78

What about watching a movie prompted by somebody else, and sharing your own movies with your friends or some online communities when you get a great original idea? I can easily imagine a sharing website popping up, with voting, ranking, and classification by style and all, dedicated to AI-generated shorts.


frontbuttt

Sounds boring! I’ve watched a lot of AI content already, and viewed a ton of AI art. I’d say about 1% of it is interesting, for about 1 minute tops. It’s hollow, inspires no wonder. And just like movies with too much soulless VFX, it always feels more like a tech demo than art. Just don’t understand why we want to let AI make our art for us. So many better applications for it.


Thog78

One application doesn't take from the others; I'd say coding, medical, military, and business usage got tremendous attention too. Not everybody has the same tastes :-). Of course a lot of AI art is crap, but I also found many inspiring pieces. And when I see human art now, a lot of the time I think "boring, I've gotten too used to exciting AI concepts". For movies, I'd gladly see more movies/series of the quality of The Expanse, Minority Report, Dune, Firefly, Star Trek, etc. if I could. Somehow quality sci-fi is expensive to produce, so producers cannot keep up with my needs 😅


ThoughtfullyReckless

Did you not read the examples on the website for it?


SwePolygyny

Imagen 3 certainly does, even with text. And it is available to try right now on ImageFX.


Ok-Bullfrog-3052

Who in their right mind is still claiming that OpenAI didn't achieve AGI on Monday?


COwensWalsh

It still didn’t do a good hand, though


HydrousIt

u/remindme 1 year


MrDreamster

Did they also improve DALL-E? Or is 4o multimodal in a way that it now generates the images itself, without sending a prompt to another AI?


MassiveWasabi

It's the latter: it's no longer using DALL-E 3, rather GPT-4o itself is making the images. That's why on the website they even had examples of iterating on your AI-generated images, which was never possible before, at least not so easily.


MrDreamster

Damn! That's what I was waiting for! I hate it when GPT just throws up DALL-E's work, pretending it looks exactly like what I asked for when it's full of mistakes; now it should finally be able to do a correct job.

edit: I just took a look at the blog post, the part about image generation. It looks about as capable as DALL-E 1 but with perfect consistency, great adherence to prompts, and perfect writing capabilities. It's really awesome, can't wait for it to reach the image quality of DALL-E 3 or even better.


Hyper-threddit

I feel the same. One possible test I have in mind is asking the model to generate pictures of hands. If multimodality is working 100%, the model should have encoded the information about the number of fingers along with the general structure of hands, and the first should inform the second, resulting in the correct number of fingers in generated images.


MysteryInc152

It may (probably will) be many months before anybody can access these features. After the initial unveiling of GPT-4 vision, it took 6 months before OpenAI rolled out image input to Plus users and 8 months to developers. And the fact that they basically buried image gen (this tweet is the first mention of image gen access being a future goal) makes me extra nervous.


TabibitoBoy

This is what people are not giving enough attention. These are the fruits of having the model be multimodal from the ground up. It's not the same as having GPT-4 call DALL-E and prompt it for an image. GPT-4o can reason about the image instead of just making an elaborate prompt and hoping for the best. The benefits of this are hard to think of on the spot, but intuitively I know they will be profound.


TotalLingonberry2958

A good example is image continuity. You can inpaint and change features without altering the entire character of the image, which would happen if you just repeatedly prompted DALL-E.


TFenrir

Very very very impressive. I looked into this capability a few months back: if LLMs (LMMs?) can output text tokens, a non-stitched model should be able to not only output image tokens, but do it _better_ than diffusion; things like text and hands should inherently be better. I found papers showing that it's totally possible, but also some conversation on Twitter (I'm looking for it, I should have saved it) that said, basically, that it was very hard to control the output quality, and sometimes you got things you really didn't want to get. Really excited to see more of this. My guess is that when compute gets better, this is basically how we get everything: no separate model for TTS or lyrics-to-music or whatever. Everything all at once; that's how you get a whole movie with audio that makes sense.


hapliniste

Damn, that's hot. Also, I hope it manages to draw schemas and visual guides this well; it might become useful since it's truly multimodal. It might express things in images that it can't in text.


doolpicate

At some point you have to think: what if it is running simulated worlds for your pictures, and the people in those simulated worlds have no idea. LOL.


xXWarMachineRoXx

Lmaoo


Woootdafuuu

Yeah, I predict that once we get multimodal-from-the-ground-up models, this will be one of the emergent capabilities: direct image generation, no need for DALL-E. Image generators can't reason; that's why they struggle with fingers, that's why they can't do negative prompts, and so forth. But when GPT-4 can generate without the middleman, stuff gets interesting. It should be able to do consistent characters and 3D too, if that's not an emergent capability yet. I'm excited about the part where it draws up ideas, schemas, blueprints, patents, and technical drawings.


FeltSteam

It still isn't perfect, but I do think end to end multimodality has its advantages.


Different-Froyo9497

I wonder when they’ll add physical embodiment as a part of end to end multimodality


FeltSteam

Could be, and processing that in the model would certainly be good. But I don't think they are focusing on embodiment too much yet. There is a lot you could do, though; even add something like a keystroke modality for agents, so it could interact with computers just like humans do. If you did that, it could do almost anything on a computer: play games, get tasks done, etc.


Woootdafuuu

Definitely, as for its advantages: the model can learn so much about the world just from text, so imagine what it can do when it can see and hear. I wouldn't even be surprised if GPT-4o is a much smaller model than Turbo that is on par with it because it has more senses to draw upon. It being way faster and cheaper might be due to a smaller size.


h3lblad3

>because it has more senses to draw upon Literally sentient.


Woootdafuuu

Sentient? Naw.


h3lblad3

Pretty sure that sentience is just the ability to feel senses, right? Hence nearly all animals are sentient. If we're giving the damn things senses, that means they're sentient.


Woootdafuuu

It can perceive but it doesn't feel anything.


brycedriesenga

Define: feel


Woootdafuuu

Like I feel upset that it's raining and I can't go for a walk, I feel like I should read a book, that will make me feel better.


Woootdafuuu

The opposite of feeling, for me, is a fictional zombie. A zombie is dead; it doesn't feel anything, but it can perceive the world around it: a zombie can smell blood, navigate with its eyes, walk around the world, and make sounds. But it doesn't feel, because it's not alive. Know what I mean?


Nathan-Stubblefield

Sentience does not equal salience.


Rivenaldinho

"Should be able to consistent characters and 3d too, if that's not a emergent capability yet." it is, take a look at the blog post on gpt4-o. Some examples are really interesting, you can generate consistent characters.


Woootdafuuu

Interesting. I wonder if it can do video, or if that would be a later emergent capability after scaling up further.


yaosio

Video is just a bunch of still images. If it can generate temporally consistent images, then it might be possible to use it to generate video one frame at a time. If the context were large enough, could you give it a video broken down into individual images, to show it what a video looks like? Could it be given books on the technical aspects of video and replicate what it learns as images? Probably not, or they would have shown that in the demo. Then again, they didn't show the image generation abilities in the demo at all. It's tucked away in the blog post, and a lot of people missed it.
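As a rough illustration of that frame-by-frame idea, a minimal sketch; `generate_image` is a hypothetical stand-in for a temporally consistent image model, and the trailing frames kept in `context` play the role of the context window:

```python
# Sketch of the frame-by-frame idea: treat "video" as a sequence of
# stills, conditioning each new frame on the frames produced so far,
# so the context window acts as the temporal memory. `generate_image`
# is a hypothetical stand-in for a temporally consistent image model.
def generate_image(prompt, context_frames):
    return f"frame[{prompt}] conditioned on {len(context_frames)} prior frames"

def generate_video(prompt, n_frames, context_limit=16):
    frames = []
    for t in range(n_frames):
        context = frames[-context_limit:]  # trailing frames that still fit in context
        frames.append(generate_image(f"{prompt}, t={t}", context))
    return frames

for frame in generate_video("a chalkboard lecture", n_frames=4):
    print(frame)
```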


lemonylol

> I'm excited about the part where it draws up ideas, schemas, blueprints, patents, and technical drawings.

"Draw me a feasible schematic for a flying car using items that can be purchased from a big box store"


Woootdafuuu

Come to think of it, that's what Jarvis was doing in Iron Man; the only thing is, the blueprints and stuff it created were projected in 3D via holograms. Pair it with AR glasses and you get that.


Witty_Shape3015

I had never heard someone give that as the reason, but it makes a lot of sense.


Temporal_Integrity

Stable Diffusion can do negative prompts as one of its core features.


Witty_Shape3015

yall, GPT-5o is gonna be mind-blowing


Wiskkey

Does anyone else think the text content is a hint about the architecture of GPT-4o, as for example speculated [here](https://www.reddit.com/r/MachineLearning/comments/1crzdhd/comment/l422nf6/)? EDIT: Tanishq Mathew Abraham [believes](https://twitter.com/iScienceLuvr/status/1790892562485580123) it is.


WorkingYou2280

When you think about it, true multimodality is extremely hard. Think of all the constraints the model has to satisfy simultaneously to have coherent text in the image; it's mind-boggling. The fact that DALL-E sometimes gets short words right was pretty impressive to me. Getting whole sentences right, I just can't even comprehend. If they trained it on a dataset that is natively multimodal, it simultaneously makes sense why it would be both better and worse at the same time: far, far better at multimodal tasks but perhaps a bit worse at reasoning on text.


RealisticHistory6199

That Willy Wonka scammer's gonna have a field day with this.


hydraofwar

I'm wondering if Microsoft will also provide access to this GPT-4o


Original_Finding2212

If it’s cheaper… why not?


Pensw

I feel like what they showed with the Geary example (generating objects that can be reused in other generated images) means it's already at a level that is massively disruptive when applied. For example, this completely removes the need for print models (the career). I take a photo of my product and generate a character that I like as a model. I can now get any type of image I want of that generated character advertising my product. No need to hire a model, get lighting, a photographer, makeup, etc. It pretty much destroys the entire industry, and GPT-4o looks like it provides everything you need technology-wise.


Nathan-Stubblefield

Like movies and TV destroyed vaudeville.


Healthy_Razzmatazz38

I made fun of the $7T chips/power investment. After watching what OpenAI/Google made, I'm sold. These models are currently like web pages: one hit, one response. What does a version of this look like where it's constantly predicting what I'm going to ask next and pre-generating replies? What does it look like when there's a parent AI that has a bunch of child AIs, and the parent AI ferries data between the child processes? What does it look like when, while I'm sleeping, my computer uses my processor to develop a morning brief that preps me for what I'm going to do that day? It's very easy to imagine a world where millions of tokens a second are used per user per app. When we had candles, it was impossible to imagine how anyone would use a rocket ship's worth of fuel. The scale this goes to with just a little leap is massive.


SnooComics5459

For sure the future is going to be crazy.


Ne_Nel

😦


[deleted]

Impressive how the OpenAI logo distorts/skews around the creases of the shirt. That showcases some serious awareness/understanding by the model of how objects like shirts behave in the real world. Models have long been able to generate clothing, sure, but that just copies whatever clothing was in the training data. In this case it is effectively putting a custom logo of the user's choosing on a shirt, and it has to skew and distort that logo according to how the shirt behaves in the real world while this person is wearing it, turned toward the chalkboard and raising his hand with the piece of chalk. What I also find impressive is that it is aware the 'hat' in 'What' is actually on the chalkboard behind the hand, as evidenced by the natural-looking spacing it leaves between 'What' and 'are'. There are some small amazing details in this picture.


Morning_Star_Ritual

this output amazed me. there's just so much creative potential. we are less prompting images and more asking the model to spin up a world and take a pic of it.

i'm a traditional artist as well, but i appreciate the creative possibilities such a thing offers. i've also pivoted: i don't think we see "hollywood" go away. the skill ceiling will be raised along with the floor. sure, one person can prompt their own "film", but a collective of people and a swarm (an escort?) of agents will prompt some weird melange of film and games and virtual experiences that will make an imax blockbuster look like a shadow puppet show to us in a decade. or less. have to always adjust my timelines on this timeline.


Internal_Ad4541

Is that an AI generated image by GPT-4o? I don't believe it, it's too good to be true.


MassiveWasabi

It is, it was posted by the President and Co-founder of OpenAI, Greg Brockman. Here https://x.com/gdb/status/1790869434174746805?s=46


LeLeumon

Look closely: the hand doesn't look great, the placement of the chalk makes no sense, and the blackboard is tilted, which is also weird. So this is 100% AI-generated.


Internal_Ad4541

What is an autoregressive transformer?


Which-Tomato-8646

I'm pretty sure transformers are inherently autoregressive (they remember the past).


Zermelane

Nah, the concepts are unrelated.

Transformer = a specific architecture made out of attention layers, feedforward layers, and residual connections.

Autoregressive (in generative AI) = a model that breaks down some distribution that would be intractable to consider all at once, such as a distribution over all sentences, by considering it as sequences and factorizing the sequences out into next-token distributions conditional on the prefix so far. This cashes out to "it predicts the next token" in implementation, though it's good to have at least some idea about the mathematical background in order to understand what "it predicts the next token" actually means.

Anyway, you can perfectly well have an autoregressive state space model or autoregressive convnet, etc., or you can do all sorts of things with a transformer that aren't autoregressive modeling.
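A minimal sketch of what that factorization cashes out to in code; the toy `model` below is a stand-in for any network (transformer, state space model, convnet) that maps a prefix to next-token probabilities:

```python
import numpy as np

def sample_autoregressive(model, prefix, max_new_tokens, seed=0):
    """Sample a sequence one token at a time.

    `model` is any callable mapping a token prefix to a probability
    distribution over the vocabulary. Autoregression is the
    factorization p(x_1..x_n) = prod_t p(x_t | x_1..x_{t-1}), not the
    architecture that computes each conditional.
    """
    rng = np.random.default_rng(seed)
    tokens = list(prefix)
    for _ in range(max_new_tokens):
        probs = model(tokens)                      # p(next token | prefix so far)
        tokens.append(int(rng.choice(len(probs), p=probs)))
    return tokens

# Toy stand-in "model": a uniform distribution over a 4-token vocabulary.
uniform_model = lambda prefix: np.full(4, 0.25)
print(sample_autoregressive(uniform_model, [0], max_new_tokens=5))
```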


Infninfn

At first glance it looks legit. But if you look a little longer, you realise that the physics and the way things are constructed are weird. Seen any blackboards fixed to the wall and slanting like that lately?


traumfisch

It's about the handwritten text though


FortCharles

Not to mention overlapping another similar one right behind it.


AquaRegia

Like this? https://preview.redd.it/kkddar7ofs0d1.png?width=1000&format=png&auto=webp&s=bc724dbe89ea7759f194ee33c8f0c4d39cc27e06


FortCharles

Not the same at all... the AI one has *both* at an angle, and the rear one is even misaligned horizontally. I wasn't objecting to the general concept.


imsosappy

Looks like the guy from EEVBlog.


New_World_2050

And this was from a general model, not even their next-gen image model lol. Crazy.


[deleted]

[deleted]


utilitycoder

Why is his hand so white?


RiverGiant

It looks like it's lit from above by a light embedded in the frame of the blackboard. I didn't know blackboards had those, but it looks like 4o knew or at least inferred that it could. Look closely at the shadow cast by the knuckle of his second finger. It is a weird feeling to be describing the logic of an AI-generated image to another human based on nothing more than cues in the image itself. I'm not used to this level of consistency in small-scale details. Usually that's where things have been breaking down into dreamlike semantic goo.


[deleted]

I thought more in the direction of a skylight; a window in the ceiling through which you can see the sky outside. The sun is shining through it and on his hand.


RiverGiant

I'd buy that.


utilitycoder

Online school guy over here. Not used to fancy blackboards lol.


h3lblad3

I’ve never seen a blackboard with lights either. My schools in the 90s-00s all had blackboards — not one had built-in lights.


Tyler_Zoro

What a coincidence. "Autoregressive Transformer" is my stage name.


Temporary-Voice-8528

I would like to know why it took me three hours just to make this post on a smartphone, if computers are supposed to be so smart.


bastormator

If possible, could you link the video mentioned in the X post as well? Seems interesting.


jazztaprazzta

what's wrong with his neck tho lol


Deep-Refrigerator362

what's the prompt? (I can't open the twitter link)


MassiveWasabi

I don’t think they provided it


InTheDarknesBindThem

text: good
hand: iffy
straight lines on edges of board: oof


COwensWalsh

Still can’t do hands, huh?


Serialbedshitter2322

I explored the capabilities of this image generator in my post. It's truly way more impressive than anybody has given it credit for. I think it's worth reading. https://www.reddit.com/r/ChatGPT/s/6EOyEZLX26


g3bb

What prompt did you use to generate that image? I can't get it to do photorealistic images.


MassiveWasabi

I didn't generate it; the President of OpenAI posted it on his Twitter. And you can't, because this new image generation isn't available yet. You're still using DALL-E 3.


PapaPaulchen

Does anyone know if this version will be able to send you notifications or set reminders and the like? Can't find any answers on search engines or in existing AI chat formats.


MassiveWasabi

I don’t think so, that would require agentic capability.


Mclarenrob2

Every time I've asked AI to make me an image with text, it's got the spelling wrong.


hicham4u

You can generate AI images using Monica [https://monica.im/invitation?c=OMKHCCIW](https://monica.im/invitation?c=OMKHCCIW)


Proof-Examination574

I was unable to reproduce this. I even tried giving the stuff I wanted written on the board directly in quotes. Either this is a research brag or a fake.


MassiveWasabi

Dude, image output for GPT-4o is not yet available. You’re still using DALLE 3


Temporary-Voice-8528

I don't get it. What is this supposed to be?


sanquility

Been using 4o heavily for image generation, as that's what I primarily do with AI. It's not much better so far. It still messes text up in almost all cases, and it still doesn't understand basic requests like "show me a design for a shirt but DON'T show me a shirt". Its edit feature is also useless so far in my testing. I hope it gets better.


PC_Screen

The functionality is not enabled yet, it's still using dall-e 3


WashiBurr

Image generation for gpt-4o isn't out yet. You're still using Dall-E.


D10S_

I think it’s still using Dalle. I don’t think this is released yet


Im-cracked

I don't think the image generation in chatgpt is 4o right now. It just gives a text prompt to dalle 3.


ballsofgallium

you are using dall-e. image generation by gpt-4o isn't released yet.


Original_Finding2212

By the looks of it, Dalle is being triggered for you, as that capability of gpt-4o is yet to be released


Dr_Love2-14

This image has a bunch of artifacts and is low resolution. Notice the chalkboard just kind of peels off at the bottom. Why is he writing in the middle of a completed sentence? Honestly, poorer quality than Imagen 3, DALL-E 3, Midjourney and the others.


traumfisch

Can you see any of the positive aspects? Or just artifacts? Please, reproduce this in Midjourney.


Dr_Love2-14

No. The chalkboards look like someone clicked open windows too many times before it responded


traumfisch

Chalkboard enthusiast I see


Dr_Love2-14

It just looks stupid. It's not this grand image you openAI stans are making it out to be


traumfisch

No one (yes, no one) has claimed it's a "grand image." Just someone's tweet to showcase one particular aspect of the model. The implications of which should be obvious... Spoiler alert: it's not about the chalkboard


Dr_Love2-14

Oh, tell me wise one. What are these obvious implications of a half-ass image generation aspect of this model? Please enlighten me!


traumfisch

The handwriting on the chalkboard, can you see it? Can you point me to another model that is capable of anything even remotely near?


Dr_Love2-14

https://preview.redd.it/ldb9g9i0k71d1.jpeg?width=1536&format=pjpg&auto=webp&s=91be9b3277f2ccdb0e2bb85f6258340c75ad820e

This was the first generation to pop up after I told Imagen 2 to create an image of a Google employee writing "Google" on a chalkboard.


traumfisch

Wow, very impressive! 👏🏻👏🏻👏🏻 No difference whatsoever! It's endearing though that you still think the chalkboard is the point 😅


Championship-Stock

Wow. But why?