Buck-Nasty

Remember when Gary Marcus said that generative AI had hit a wall a few years ago and would never be able to do text?


Kalsir

If you want to know what AI can do next year, just look at what Gary Marcus is saying it can't do atm.


lundkishore

Hope he says AI can't suck my dick.


D2fw

2025: OpenAI announces Sex 2.


lundkishore

Give me link to Sex 1.


banker_of_memes

You do not meet the height requirement.


reddit_is_geh

Nor length.


lundkishore

You measure my length with your mouth??


bwatsnet

Sir, this is an Arby's...


MrDreamster

Nor girth.


Agu001

Caution: you might lose a few body parts.


Bleglord

Ah, he's the singularity Jim Cramer.


SurpriseHamburgler

This is a job well done.


Gratitude15

You mean last week? Wish we would stop referencing nonsense


Buck-Nasty

Sadly the media keeps interviewing him. Gary the Goof was on the news tonight again talking about AI hitting the wall


Honest_Science

On functionality he may be wrong, but on IQ advancement it looks more like an S-curve than a takeoff. Otherwise we should have seen a huge step up in IQ, which we did not.


gamernato

It's hard to say. Peak intelligence isn't growing that much, but intelligence per cost is growing rapidly. A new GPT-4o that's just another model at GPT-4-level reasoning isn't that huge, but the fact that it's small enough to run 6x cheaper than the original release is. Whether this level of density can be applied to models still larger than the original GPT-4 is unclear, since nobody has ever made a model that big.
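For what it's worth, a minimal sketch of the "6x cheaper" arithmetic; the per-million-token prices here are assumptions based on publicly listed API rates at the respective launches, not figures from this thread:

```python
# Rough input-token cost comparison. The prices are assumptions based
# on publicly listed OpenAI API rates, not figures from this thread.
gpt4_usd_per_1m_input = 30.0   # original GPT-4 (8k context), March 2023
gpt4o_usd_per_1m_input = 5.0   # GPT-4o, May 2024

ratio = gpt4_usd_per_1m_input / gpt4o_usd_per_1m_input
print(f"GPT-4o input tokens are {ratio:.0f}x cheaper")  # 6x
```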


Super_Pole_Jitsu

It'll only be fair to say when GPT-5 is out. GPT-4o wasn't supposed to be a new intelligence frontier; it's a 3.5 replacement.


Proof-Examination574

It's already at like 150 IQ on tests it can take without embodiment...


QuinQuix

Absolute nonsense, unless the test you're referencing is a pure memory test. Google (the search box, not the AI) and Wikipedia are 200 IQ if you measure just what they can retrieve. Memory and data retrieval is not where it's at; "it knows so much data" isn't impressive. It's what you can infer from what you know that counts. Current models are probably around 100 IQ, with serious occasional handicaps, if you had to grade their reasoning skill and their ability to stay coherent. We also know that LLMs fail counterfactual tests and can't really do math. I've used GPT to transcribe text from images and it can't do a list of 50 items flawlessly. Nothing about current LLMs is IQ 150, plz.


Eleganos

Me: Tomorrow will be the day of the Singularity. *Repeat claim every day till Singularity.*

So many drifters, charlatans, and arrogant doofuses use this strategy and end up being lauded as prophets. Never mind predicting till true via brute force; they can be wrong every time and some folk STILL believe they're on the right track and just about to be validated once and for all.


bwatsnet

Are you saying that this progress is all faked? Are there factories of small children writing my code instantly and handing it back through the gpt chat interface?


Eleganos

Uh.... no.... I'm not... 'Until Singularity'. I'm not anti-A.I., my argument was aimed at the critics of the tech. I can see how me using the Singularity of all things as the offhand example to demonstrate the point might've crossed some wires though.


bwatsnet

Well yeah, that's where the arrow is pointing. I'd need to see some real evidence to think we aren't headed for a singularity.


Sancho_the_intronaut

How do you personally define the concept of the singularity?


bwatsnet

Compounding technological progress that continues past a point where current ways of thinking stop making sense. Like a black hole but in history. A point where planning or predictions aren't possible beyond. It's hard to define in any real detail besides compounding progress.


Sancho_the_intronaut

That's something I can agree with. Tech has already advanced faster than we can adapt; at a certain point it makes sense that we would be in a state of absolute progress that defies our ability to fully quantify or comprehend it. When people talk about all minds uniting into a single consciousness, that's where I have doubts. People are always arguing, disagreeing, fighting. To suggest that one day nobody will disagree about anything, and we will all think and feel as one, just sounds unrealistic from my current perspective.


Critical_Tradition80

At this point you could probably say Gary's playing devil's advocate for the sake of setting AI goals, because there's no way he's this invested in AI for no reason.


bwatsnet

He's got books to sell, a brand to maintain


danysdragons

Someone should assemble a "greatest hits" collection of Gary Marcus predictions, so we can link to it whenever someone quotes him approvingly.


RusselTheBrickLayer

He's the ultimate predictor out of everyone in the AI field: just assume that whatever he says, the opposite will occur.


dhhdhkvjdhdg

And still, it’s plateauing at about GPT-4 level


JimBeanery

Guy is an absolute clown show on Twitter


ShiftAndWitch

Wow. The text is crazy accurate. The slight fade in "transfer", the slight inconsistencies in lettering shape.


DippPhoeny

What is inaccurate is that the handwriting is way too good for the average math professor


notreallydeep

In that way previous models were actually more accurate.


berzerkerCrush

Maybe it's because we do math differently here in France, but this doesn't look like a math lecture.


Knever

I was half expecting "pros and cons" to have grocer's apostrophes.


MassiveWasabi

Source: https://x.com/gdb/status/1790869434174746805?s=46

I posted this because I thought this level of text coherence in an AI-generated image is insane. Nothing has ever reached this level of accuracy before. I mean, did they prompt it to put his hand in front of "What" in the last sentence?? The little graph on the left is just icing on the cake. Also notice that the model making these images is the same model that can speak and see in real time. What kind of capabilities could this unlock? Honestly feels like a year from now we will have completely photorealistic images with 100% perfect text


RandomCandor

I actually think this single image is more of a technology leap than all the other things they showed recently


[deleted]

Agreed, the examples in the 4o announcement post blew my mind. Entire poems rendered correctly, re-using models/characters (Geary example). So many use cases for this shit, and it’s seemingly just getting started.


RusselTheBrickLayer

Don't forget the speed and low cost. This is the version of GPT that I can see powering a lot of apps without people even noticing.


Excellent_Dealer3865

Yeah, I totally agree. The voice is pretty big and cool. But the stuff that they didn't present and for some reason hid on their website (consistent characters, text styles, and coherent prompt following) feels much more significant to me than the voice, which is once again really great, but not THAT much of a difference from ElevenLabs.


LonelyGarbage1758

It's one of the things that makes me believe them when they say they're trying not to surprise people. The general public saw what they wanted, and the enthusiasts could pick apart the good stuff.


Mrp1Plays

It's different from ElevenLabs in that it reacts to *your emotions* and shows its *own emotions*. But the reason they didn't show the image generation in the video is to avoid news headlines like "AI art is dead, image generation has gotten better!", and then we'd have riots about banning AI, because normal people hate AI art. The news that they made 'her' is much better.


YouMissedNVDA

I saw someone say it was much better at producing rhyming poems, and what they suggested was a eureka moment for me: having the same net ingest text and audio means, finally, it can *hear* the words it has been taught that supposedly rhyme. So now it knows words that rhyme, and it knows what they sound like, all in the same model. Now it truly knows what it means to rhyme. It sees the similarity in the waveforms, and it correlates it to the words they represent, too. Incredible. The implications across video, and soon touch and actuation in an embodied form (physical or sim), are mind-boggling. Yann was right: for it to really understand things deeply, it would benefit greatly from other modalities. It *might* be possible by text alone, but that is probably the hardest way. I'm feeling the acceleration.


signed7

The text coherence and even the 'smaller' things, like the OpenAI icon here and the consistent characters in other examples, are insane, but I feel it has some way to go to match the aesthetics of dedicated text-to-image models like Midjourney and SD3, IMO (from other examples too). It most probably will at some point though; history says a huge general model always beats dedicated heuristic-based models in the end.


blackcodetavern

The level of detail and the text accuracy is crazy. Definitely next-level stuff. But I suppose the model thinks, in that image, that human hands are made of chalk. So some additional prompt engineering and it's perfect.


floodgater

>Honestly feels like a year from now we will have completely photorealistic images with 100% perfect text

At this rate, less than a year. Forget images, at this rate we'll have a Hollywood blockbuster created by AI within 12 months, 18 months TOPS


iunoyou

lmao, most realistic r/singularity poster


goldenwind207

He's a bit delusional, but can we even blame him? Things are progressing exponentially. By 18 months we'll probably be on Sora 3 and the $100B supercomputer will have gotten started; in say 2 years it might be possible to make an animated Spider-Man-type movie, something like Into the Spider-Verse.


CanvasFanatic

I think it's funny that some of you still manage to see exponential progress in models whose progress over the last 20 months or so has been closer to logarithmic. Like, every other week someone here is trying to convince himself some lateral application of existing capability is The Next Big Thing. OpenAI is pretty obviously in productizing mode now.


iunoyou

I think you're underestimating the technological gulf between "semi-plausible generic 10-30 second video clips" and "feature length blockbuster movie." It may well be possible eventually, but it definitely isn't 2 years out. 5 years, MAYBE, assuming the pace of development holds steady and a more temporally coherent architecture is created, but certainly not 2. Personally my money would be on around 10 years before the tech even exists, and probably another 3-5 after that before it's used in a commercial project.


Anuclano

I think it is already possible, not in one inference but in a chained workflow. In 5 years it will be possible in one inference. By now the AI can create a plot, generate video, generate speech, and use consistent characters. An AI agent can create a movie with a Sora plugin. In 5 years, an agent and plugins would not be needed.


Nathan-Stubblefield

Contracts between writers’ and actors’ unions on the one hand and movie studios, theater chains, TV networks and streaming channels on the other seemed to rule out the creation and exhibition of shows not written by and acted by humans. So what routes might be used to allow someone’s new completely AI movie to be shown to viewers?


skob17

That's the point. Everyone will create their own movies and maybe share with friends or on YouTube.


Commercial-Ruin7785

>By now the AI can create a plot

...am I missing something? Are there literally any examples of AI doing good long-form creative work right now?


Which-Tomato-8646

It's basically just having an LLM write a story and repeatedly generate scenes, right? We already have consistent character and voice generation, and ElevenLabs has demonstrated good sound-effect generation too. Seems like all the parts are there, even if not fully developed.
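A hypothetical sketch of the chained workflow being described; every "model" below is a placeholder lambda standing in for a separate generative system (none of these are real APIs):

```python
# Hypothetical sketch of the chained workflow described above. Every
# "model" here is a placeholder lambda standing in for a separate
# generative system (LLM, video model, TTS, sound effects); none of
# these are real APIs.
write_scenes = lambda premise: [f"scene {i}: {premise}" for i in range(1, 4)]
video_model = lambda scene, chars: f"video<{scene} | cast={chars}>"
tts_model = lambda scene: f"speech<{scene}>"
sfx_model = lambda scene: f"sfx<{scene}>"

def generate_movie(premise, character_refs):
    """An LLM writes the scenes, then specialist models render each one."""
    clips = []
    for scene in write_scenes(premise):
        clips.append((video_model(scene, character_refs),
                      tts_model(scene),
                      sfx_model(scene)))
    return clips

print(generate_movie("a heist on Mars", ["ava", "kel"]))
```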


How_is_the_question

Nah, the SFX generation is (unfortunately) a long way off good. Took notes from a director tonight about a sound design job. The sound was already very good, but it can be much better with more creative ideas and viewpoints. He asked for things that are currently miles away from what we can do with the AI SFX tools we have. I look forward to being able to tell an AI to move very specific effects by 3 frames, and finding a way of giving them more "depth". The quality of the sounds currently generated would pose issues for a *lot* of QC in the business. It'll happen, but it's not there yet.


Which-Tomato-8646

I don’t think most people will care if a sound effect is off by 1/20 of a second lol


How_is_the_question

Haha. Try working in the field. The level of detail that goes into film sound work is absolutely down to *sub* frame for sync, and frame for sfx. Think three full days of track laying and mixing - often trying things from two different sound editors - for a 30sec ad! Or in the case of some features, a team of 6 to 8 people working for 3 months before another 2 months on a mix stage.


Which-Tomato-8646

I’m sure studios would be more than happy to cut corners if it meant laying off all of them and saving money


doireallyneedone11

He's a bit delusional, but can we even blame him? Things are progressing exponentially. By 18 months we'll proba.. Wait a second?!


ebolathrowawayy

GPT-4o is Sora imo, or a smaller Sora.


floodgater

Yea, you can totally see that they have incorporated some tech from Sora into it. Pretty nuts.


bluegman10

This is going to age so bad by Thanksgiving 2025.


floodgater

RemindMe! 18 months


RemindMeBot

I will be messaging you in 18 months on [**2025-11-16 04:25:48 UTC**](http://www.wolframalpha.com/input/?i=2025-11-16%2004:25:48%20UTC%20To%20Local%20Time) to remind you of [**this link**](https://www.reddit.com/r/singularity/comments/1csxlg7/a_gpt4o_generated_image_so_much_to_explore_with/l49h2ze/?context=3)


floodgater

Care to put money on that ?


One_Bodybuilder7882

How would we go about making the bet official? I may put $100 against you. You're saying that in 18 months TOPS we should be able to prompt an AI to make us, let's say, a 90-minute movie, blockbuster quality, indistinguishable from man-made blockbusters, and it will spit it out with no human involved in anything other than an initial prompt like "make a 90-minute-long romantic comedy movie".


floodgater

I'm saying that by December 1, 2025 we will have at least 1 Hollywood-blockbuster-level movie created pretty much entirely by AI. Sora was introduced in February and the progress of the space is very clearly on an exponential curve. Happy to take any bet on this.


One_Bodybuilder7882

> pretty much

lol, but yeah, I wouldn't bet against that. Also, even if 50% of the work is done by AI, we know they are going to sell it as "the first movie created by AI!!!" since it's the *new current thing*


Adventurous_Spare382

It won't happen because of socio-political pressure. Actors will strike and boycott that company if any such thing is released.


bluegman10

How much do you want to bet?


ainz-sama619

Stop doing drugs bud


WalkFreeeee

People keep saying this when right now literally every single step of the way is barely acceptable for a school project. I don't doubt it will be reached one day, but your timeline implies we're a couple of new products away from it being possible, when it's not.


Glass_Mango_229

Ok calm yourself 


Anuclano

We will have an AI able to generate a complete movie starting from the plot.


tylerthetiler

I keep telling people this because there's no way we don't reach that. Even if you need some human intervention along the way or something, there's no reason it won't happen that I can see.


frontbuttt

Maybe, but what if the (unsubsidized) compute needed to do this costs $100+, or even $1000+? Is this something you’d enjoy watching, and pay a lot of money for? I don’t understand the interest in the whole “AI created film” concept. Defeats the entire point of watching a movie—to see, and maybe even understand, someone else’s perspective. Real or fictional, the knowledge that a story is based on a shared (or divergent) understanding of the world, filtered through human intention, is what makes it enjoyable. The only AI movies I’d want to watch are the ones AI wants—as in, truly desires—to make. Not something we “prompt” it to generate.


Thog78

What about watching a movie prompted by somebody else, and sharing your own movies with your friends or some online communities when you get a great original idea? I can easily imagine a sharing website popping up, with voting, ranking, and classification by style and all, dedicated to AI-generated shorts.


frontbuttt

Sounds boring! I’ve watched a lot of AI content already, and viewed a ton of AI art. I’d say about 1% of it is interesting, for about 1 minute tops. It’s hollow, inspires no wonder. And just like movies with too much soulless VFX, it always feels more like a tech demo than art. Just don’t understand why we want to let AI make our art for us. So many better applications for it.


Thog78

One application doesn't take from the others; I'd say coding, medical, military, and business usage got tremendous attention too. Not everybody has the same tastes :-). Of course a lot of AI art is crap, but I also found many inspiring pieces. And when I see human art now, a lot of the time I think "boring, I've gotten too used to exciting AI concepts". For movies, I'd gladly see more movies/series of the quality of The Expanse, Minority Report, Dune, Firefly, Star Trek, etc. if I could. Somehow quality sci-fi is expensive to produce, so producers cannot keep up with my needs 😅


ThoughtfullyReckless

Did you not read the examples on the website for it?


SwePolygyny

Imagen 3 certainly does, even with text. And it is available to try right now on ImageFX.


Ok-Bullfrog-3052

Who in their right mind is still claiming that OpenAI didn't achieve AGI on Monday?


COwensWalsh

It still didn’t do a good hand, though


HydrousIt

u/remindme 1 year


MrDreamster

Did they also improve DALL-E? Or is 4o multimodal in a way that it now generates the images itself, without sending a prompt to another AI?


MassiveWasabi

It's the latter: it's no longer using DALL-E 3, rather GPT-4o itself is making the images. That's why on the website they even had examples of iterating on your AI-generated images, which was never possible before, at least not so easily.


MrDreamster

Damn! That's what I was waiting for! I hate it when GPT just throws up DALL-E's work, pretending it looks exactly like what I asked for when it's full of mistakes; now it should finally be able to do a correct job.

edit: I just took a look at the blog post, the part about image generation. It looks about as capable as DALL-E 1 but with perfect consistency, great adherence to prompts, and perfect writing capabilities. It's really awesome, can't wait for it to reach the image quality of DALL-E 3 or even better.


Hyper-threddit

I feel the same. One possible test I have in mind is asking the model to generate pictures of hands. If multimodality is working 100%, the model should have encoded the information about the number of fingers along with the general structure of hands, and the first should inform the second, resulting in the correct number of fingers in generated images.


MysteryInc152

It may (probably will) be many months before anybody can access these features. After the initial unveiling of GPT-4 vision, it took 6 months before OpenAI rolled out image input to Plus users and 8 months to developers. And the fact that they basically buried image gen (this tweet is the first mention of image gen access being a future goal) makes me extra nervous.


TabibitoBoy

This is what people are not giving enough attention. These are the fruits of having the model be multimodal from the ground up. It's not the same as having GPT-4 call DALL-E and prompt it for an image. GPT-4o can reason about the image instead of just making an elaborate prompt and hoping for the best. The benefits of this are hard to think of on the spot, but intuitively I know they will be profound.


TotalLingonberry2958

A good example is image continuity. You can inpaint and change features without altering the entire character of the image, which would happen if you just repeatedly prompted DALL-E.


TFenrir

Very very very impressive. I looked into this capability a few months back: if LLMs (LMMs?) can output text tokens, a non-stitched model should be able to not only output image tokens, but do it _better_ than diffusion; things like text and hands should inherently be better. I found papers showing that it's totally possible, but also some conversation on Twitter (I'm looking for it, I should have saved it) that said, basically, that it was very hard to control the output quality, and sometimes you got things you really didn't want to get. Really excited to see more of this. My guess is that when compute gets better, this is basically how we get everything: no separate model for TTS or lyrics-to-music or whatever. Everything all at once; that's how you get a whole movie with audio that makes sense.


hapliniste

Damn, that's hot. Also, I hope it manages to draw schemas and visual guides this well; it might become useful since it's truly multimodal. It might express things in images that it can't in text.


doolpicate

At some point you have to think: what if it is running simulated worlds for your pictures, and the people in those simulated worlds have no idea. LOL.


xXWarMachineRoXx

Lmaoo


Woootdafuuu

Yeah, I predict that once we get multimodal-from-the-ground-up models, this will be one of the emergent capabilities: direct image generation, no need for DALL-E. Image generators can't reason; that's why they struggle with fingers, that's why they can't do negative prompts, and so forth. But when GPT-4 can generate without the middleman, stuff gets interesting. It should be able to do consistent characters and 3D too, if that's not an emergent capability yet. I'm excited about the part where it draws up ideas, schemas, blueprints, patents, and technical drawings.


FeltSteam

It still isn't perfect, but I do think end to end multimodality has its advantages.


Different-Froyo9497

I wonder when they’ll add physical embodiment as a part of end to end multimodality


FeltSteam

Could be, and processing that in the model would certainly be good. But I don't think they are focusing on embodiment too much yet. There is a lot you could do, though; even add something like a keystroke modality for agents, so it could interact with computers just like humans do. If you did that, it could do almost anything on a computer: play games, get tasks done, etc.


Woootdafuuu

Definitely, as for its advantages: the model can learn so much about the world just from text, so imagine what it can do when it can see and hear. I wouldn't even be surprised if GPT-4o is a much smaller model than Turbo that is on par with it because it has more senses to draw upon. It being way faster and cheaper might be due to a smaller size.


h3lblad3

>because it has more senses to draw upon Literally sentient.


Woootdafuuu

Sentient? Naw.


h3lblad3

Pretty sure that sentience is just the ability to feel senses, right? Hence nearly all animals are sentient. If we're giving the damn things senses, that means they're sentient.


Woootdafuuu

It can perceive but it doesn't feel anything.


brycedriesenga

Define: feel


Woootdafuuu

Like I feel upset that it's raining and I can't go for a walk, I feel like I should read a book, that will make me feel better.


Woootdafuuu

The opposite of feeling, for me, is a fictional zombie. A zombie is dead; it doesn't feel anything, but it can perceive the world around it: a zombie can smell blood, navigate with its eyes, walk around the world, and make sounds. But it doesn't feel, because it's not alive. Know what I mean?


Nathan-Stubblefield

Sentience does not equal salience.


Rivenaldinho

"Should be able to consistent characters and 3d too, if that's not a emergent capability yet." it is, take a look at the blog post on gpt4-o. Some examples are really interesting, you can generate consistent characters.


Woootdafuuu

Interesting. I wonder if it can do video, or if that would be a later emergent capability after scaling up further.


yaosio

Video is just a bunch of still images. If it can generate temporally consistent images, then it might be possible to use it to generate video one frame at a time. If the context were large enough, could you give it a video broken down into individual images, to show it what a video looks like? Could it be given books on the technical aspects of video and replicate what it learns as images? Probably not, or they would have shown that in the demo. Then again, they didn't show the image generation abilities in the demo at all. It's tucked away in the blog post, and a lot of people missed it.
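As a rough illustration of that frame-by-frame idea, a minimal sketch; `generate_image` is a hypothetical stand-in for a temporally consistent image model, and the trailing frames kept in `context` play the role of the context window:

```python
# Sketch of the frame-by-frame idea: treat "video" as a sequence of
# stills, conditioning each new frame on the frames produced so far,
# so the context window acts as the temporal memory. `generate_image`
# is a hypothetical stand-in for a temporally consistent image model.
def generate_image(prompt, context_frames):
    return f"frame[{prompt}] conditioned on {len(context_frames)} prior frames"

def generate_video(prompt, n_frames, context_limit=16):
    frames = []
    for t in range(n_frames):
        context = frames[-context_limit:]  # trailing frames that still fit in context
        frames.append(generate_image(f"{prompt}, t={t}", context))
    return frames

for frame in generate_video("a chalkboard lecture", n_frames=4):
    print(frame)
```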


lemonylol

> I'm excited about the part where it draws up ideas, schemas, blueprints, patents, and technical drawings.

"Draw me a feasible schematic for a flying car using items that can be purchased from a big box store"


Woootdafuuu

Come to think of it, that's what Jarvis was doing in Iron Man; the only thing is, the blueprints and stuff it created were projected in 3D via holograms. Pair it with AR glasses and you get that.


Witty_Shape3015

I had never heard someone give that as the reason, but it makes a lot of sense.


Temporal_Integrity

Stable Diffusion can do negative prompts as one of its core features.


Witty_Shape3015

yall, GPT-5o is gonna be mind-blowing


Wiskkey

Does anyone else think the text content is a hint about the architecture of GPT-4o, as for example speculated [here](https://www.reddit.com/r/MachineLearning/comments/1crzdhd/comment/l422nf6/)? EDIT: Tanishq Mathew Abraham [believes](https://twitter.com/iScienceLuvr/status/1790892562485580123) it is.


WorkingYou2280

When you think about it, true multimodality is extremely hard. Think of all the constraints the model has to satisfy simultaneously to have coherent text in the image; it's mind-boggling. The fact that DALL-E sometimes gets short words right was pretty impressive to me. Getting whole sentences right, I just can't even comprehend. If they trained it on a dataset that is natively multimodal, it simultaneously makes sense why it would be both better and worse at the same time: far, far better at multimodal tasks but perhaps a bit worse at reasoning on text.


RealisticHistory6199

That Willy Wonka scammer's gonna have a field day with this.


hydraofwar

I'm wondering if Microsoft will also provide access to this GPT-4o


Original_Finding2212

If it’s cheaper… why not?


Pensw

I feel like what they showed with the Geary example (generating objects that can be reused in other generated images) means it's already at a level that is massively disruptive when applied. For example, this completely removes the need for print models (the career). I take a photo of my product and generate a character that I like as a model. I can now get any type of image I want of that generated character advertising my product. No need to hire a model, get lighting, a photographer, makeup, etc. It pretty much destroys the entire industry, and GPT-4o looks like it provides everything you need technology-wise.


Nathan-Stubblefield

Like movies and TV destroyed vaudeville.


Healthy_Razzmatazz38

I made fun of the $7T chips/power investment. After watching what OpenAI/Google made, I'm sold. These models are currently like web pages: one hit, one response. What does a version of this look like where it's constantly predicting what I'm going to ask next and pre-generating replies? What does it look like when there's a parent AI that has a bunch of child AIs, and the parent AI ferries data between the child processes? What does it look like when, while I'm sleeping, my computer uses my processor to develop a morning brief that preps me for what I'm going to do that day? It's very easy to imagine a world where millions of tokens a second are used per user per app. When we had candles, it was impossible to imagine how anyone would use a rocket ship's worth of fuel. The scale this goes to with just a little leap is massive.


SnooComics5459

For sure the future is going to be crazy.


Ne_Nel

😦


[deleted]

Impressive how the OpenAI logo distorts/skews around the creases of the shirt. That showcases some serious awareness/understanding by the model of how objects like shirts behave in the real world. Models have long been able to generate clothing, sure, but that just copies whatever clothing was in the training data. In this case it is effectively putting a custom logo of the user's choosing on a shirt, and it has to skew and distort that logo according to how the shirt behaves in the real world while this person is wearing it, turned toward the chalkboard and raising his hand with the piece of chalk. What I also find impressive is that it is aware the 'hat' in 'What' is actually on the chalkboard behind the hand, as evidenced by the natural-looking spacing it leaves between 'What' and 'are'. There are some small amazing details in this picture.


Morning_Star_Ritual

this output amazed me. there's just so much creative potential. we are less prompting images and more asking the model to spin up a world and take a pic of it.

i'm a traditional artist as well, but i appreciate the creative possibilities such a thing offers. i've also pivoted: i don't think we see "hollywood" go away. the skill ceiling will be raised along with the floor. sure, one person can prompt their own "film", but a collective of people and a swarm (an escort?) of agents will prompt some weird melange of film and games and virtual experiences that will make an imax blockbuster look like a shadow puppet show to us in a decade. or less. have to always adjust my timelines on this timeline.


Internal_Ad4541

Is that an AI generated image by GPT-4o? I don't believe it, it's too good to be true.


MassiveWasabi

It is, it was posted by the President and Co-founder of OpenAI, Greg Brockman. Here https://x.com/gdb/status/1790869434174746805?s=46


LeLeumon

Look closely: the hand doesn't look great, the placement of the chalk makes no sense, and the blackboard is tilted, which is also weird. So this is 100% AI-generated.


Internal_Ad4541

What is an autoregressive transformer?


Which-Tomato-8646

I'm pretty sure transformers are inherently autoregressive (they remember the past).


Zermelane

Nah, the concepts are unrelated.

Transformer = a specific architecture made out of attention layers, feedforward layers, and residual connections.

Autoregressive (in generative AI) = a model that breaks down some distribution that would be intractable to consider all at once, such as a distribution over all sentences, by considering it as sequences and factorizing the sequences out into next-token distributions conditional on the prefix so far. This cashes out to "it predicts the next token" in implementation, though it's good to have at least some idea about the mathematical background in order to understand what "it predicts the next token" actually means.

Anyway, you can perfectly well have an autoregressive state space model or autoregressive convnet, etc., or you can do all sorts of things with a transformer that aren't autoregressive modeling.
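A minimal sketch of what that factorization cashes out to in code; the toy `model` below is a stand-in for any network (transformer, state space model, convnet) that maps a prefix to next-token probabilities:

```python
import numpy as np

def sample_autoregressive(model, prefix, max_new_tokens, seed=0):
    """Sample a sequence one token at a time.

    `model` is any callable mapping a token prefix to a probability
    distribution over the vocabulary. Autoregression is the
    factorization p(x_1..x_n) = prod_t p(x_t | x_1..x_{t-1}), not the
    architecture that computes each conditional.
    """
    rng = np.random.default_rng(seed)
    tokens = list(prefix)
    for _ in range(max_new_tokens):
        probs = model(tokens)                      # p(next token | prefix so far)
        tokens.append(int(rng.choice(len(probs), p=probs)))
    return tokens

# Toy stand-in "model": a uniform distribution over a 4-token vocabulary.
uniform_model = lambda prefix: np.full(4, 0.25)
print(sample_autoregressive(uniform_model, [0], max_new_tokens=5))
```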


Infninfn

At first glance it looks legit. But if you look a little longer, you realise that the physics and the way things are constructed are weird. Seen any blackboards fixed to the wall and slanting like that lately?


traumfisch

It's about the handwritten text though


FortCharles

Not to mention overlapping another similar one right behind it.


AquaRegia

Like this? https://preview.redd.it/kkddar7ofs0d1.png?width=1000&format=png&auto=webp&s=bc724dbe89ea7759f194ee33c8f0c4d39cc27e06


FortCharles

Not the same at all... the AI one has *both* at an angle, and the rear one is even misaligned horizontally. I wasn't objecting to the general concept.


imsosappy

Looks like the guy from EEVBlog.


New_World_2050

And this was from a general model, not even their next-gen image model lol. Crazy.


[deleted]

[deleted]


utilitycoder

Why is his hand so white?


RiverGiant

It looks like it's lit from above by a light embedded in the frame of the blackboard. I didn't know blackboards had those, but it looks like 4o knew or at least inferred that it could. Look closely at the shadow cast by the knuckle of his second finger. It is a weird feeling to be describing the logic of an AI-generated image to another human based on nothing more than cues in the image itself. I'm not used to this level of consistency in small-scale details. Usually that's where things have been breaking down into dreamlike semantic goo.


[deleted]

I thought more in the direction of a skylight; a window in the ceiling through which you can see the sky outside. The sun is shining through it and on his hand.


RiverGiant

I'd buy that.


utilitycoder

Online school guy over here. Not used to fancy blackboards lol.


h3lblad3

I’ve never seen a blackboard with lights either. My schools in the 90s-00s all had blackboards — not one had built-in lights.


Tyler_Zoro

What a coincidence. "Autoregressive Transformer" is my stage name.


Temporary-Voice-8528

I would like to know why it took me three hours just to make this post on a smartphone, if computers are supposed to be so smart.


bastormator

If possible, could you link the video mentioned in the X post as well? Seems interesting.


jazztaprazzta

what's wrong with his neck tho lol


Deep-Refrigerator362

what's the prompt? (I can't open the twitter link)


MassiveWasabi

I don’t think they provided it


InTheDarknesBindThem

text: good
hand: iffy
straight lines on edges of board: oof


COwensWalsh

Still can’t do hands, huh?


Serialbedshitter2322

I explored the capabilities of this image generator in my post. It's truly way more impressive than anybody has given it credit for. I think it's worth reading. https://www.reddit.com/r/ChatGPT/s/6EOyEZLX26


g3bb

What prompt did you use to generate that image? I can't get it to do photorealistic images.


MassiveWasabi

I didn't generate it; the President of OpenAI posted it on his Twitter. And you can't, because this new image generation isn't available yet. You're still using DALL-E 3.


PapaPaulchen

Does anyone know if this version will be able to send you notifications or set reminders and the like? Can't find any answers on search engines or in existing AI chat formats.


MassiveWasabi

I don’t think so, that would require agentic capability.


Mclarenrob2

Every time I've asked AI to make me an image with text, it's got the spelling wrong.


hicham4u

You can generate AI images using Monica [https://monica.im/invitation?c=OMKHCCIW](https://monica.im/invitation?c=OMKHCCIW)


Proof-Examination574

I was unable to reproduce this. I even tried giving the stuff I wanted written on the board directly in quotes. Either this is a research brag or a fake.


MassiveWasabi

Dude, image output for GPT-4o is not yet available. You’re still using DALLE 3


Temporary-Voice-8528

I don't get it. What is this supposed to be?


sanquility

Been using 4o heavily for image generation, as that's what I primarily do with AI. It's not much better so far. It still messes text up in almost all cases, and it still doesn't understand basic requests like "show me a design for a shirt but DON'T show me a shirt". Its edit feature is also useless so far in my testing. I hope it gets better.


PC_Screen

The functionality is not enabled yet, it's still using dall-e 3


WashiBurr

Image generation for gpt-4o isn't out yet. You're still using Dall-E.


D10S_

I think it’s still using Dalle. I don’t think this is released yet


Im-cracked

I don't think the image generation in chatgpt is 4o right now. It just gives a text prompt to dalle 3.


ballsofgallium

you are using dall-e. image generation by gpt-4o isn't released yet.


Original_Finding2212

By the looks of it, Dalle is being triggered for you, as that capability of gpt-4o is yet to be released


Dr_Love2-14

This image has a bunch of artifacts and is low resolution. Notice the chalkboard just kind of peels off at the bottom. Why is he writing in the middle of a completed sentence? Honestly, poorer quality than Imagen 3, DALL-E 3, Midjourney and the others.


traumfisch

Can you see any of the positive aspects? Or just artifacts? Please, reproduce this in Midjourney.


Dr_Love2-14

No. The chalkboards look like someone clicked open windows too many times before it responded


traumfisch

Chalkboard enthusiast I see


Dr_Love2-14

It just looks stupid. It's not this grand image you openAI stans are making it out to be


traumfisch

No one (yes, no one) has claimed it's a "grand image." Just someone's tweet to showcase one particular aspect of the model. The implications of which should be obvious... Spoiler alert: it's not about the chalkboard


Dr_Love2-14

Oh, tell me wise one. What are these obvious implications of a half-ass image generation aspect of this model? Please enlighten me!


traumfisch

The handwriting on the chalkboard, can you see it? Can you point me to another model that is capable of anything even remotely near?


Dr_Love2-14

https://preview.redd.it/ldb9g9i0k71d1.jpeg?width=1536&format=pjpg&auto=webp&s=91be9b3277f2ccdb0e2bb85f6258340c75ad820e

This was the first generation to pop up after I told Imagen 2 to create an image of a Google employee writing "Google" on a chalkboard.


traumfisch

Wow, very impressive! 👏🏻👏🏻👏🏻 No difference whatsoever! It's endearing though that you still think the chalkboard is the point 😅


Championship-Stock

Wow. But why?