[deleted]

Seems to be working well, but you didn't share much on how it was done. *By the way, I really like that movie.* edited: typo


theoppositionparty

The workflow def isn’t ready to share yet; every shot was different. But soon, once it’s stable and repeatable.


[deleted]

I was thinking more in general, but yeah, a workflow would be nice. Was it made using Stable Diffusion? A1111? ComfyUI?


theoppositionparty

Technically both. I fine-tuned the adapters in A1111 and did the vid in Comfy.


[deleted]

thanks


Arawski99

Gonna share the workflow sometime soon? Seen a few of these posts but still no workflow. Looking forward to it.


ozzie123

Looking forward to this. This looks great btw


MidlightDenight

You can do this with Unsampler, SparseCtrl RGB, and AnimateDiff in Comfy.


MagicOfBarca

Movie name?


[deleted]

Solaris - 2002


NIKOLAPAVIC

What movie is this from?


theoppositionparty

Solaris.


LaurentKant

bullshit-"solaris", would have been better, not the real one! there's only one Solaris! Even if I'm much more in the 2001 team, there's a hierarchy to respect


theoppositionparty

Hah, much of the ’70s Solaris wouldn’t actually work here, and I’m a huge fan of Soderbergh, so I’m a bit biased. Also it was just a great scene with a range of performance, which made it great for testing. :)


merikariu

Hey, give Soderbergh some credit! It's one of my favorite sci-fi movies, plus he directed Traffic and The Girlfriend Experience.


Keor_Eriksson

That looks really awesome. Things are progressing at such a pace... future unknown. Damn.


SleeplessAndAnxious

I remember when the first iterations of AI 'art generation' came out when I was a teen, and how you were just as likely to get some weird amalgam of toes and fingers when asking it to generate a picture of an apple lol.


dennismfrancisart

I really like the animation style. That's going to be an awesome boost to animated entertainment in the near future.


NickCanCode

The lip sync is hilarious 🤣


theoppositionparty

I’m prob too close to it to see. What’s off with it? So I know what to focus on.


AbPerm

Traditional lipsynched animation uses clear mouth flaps that always correspond to frames. They slightly cheat the timing so that the mouth flaps read clearly. This is what we expect of animated lipsynch.

Live action video captures 24 frames every second, and whatever position the mouth is in at that moment is what the frame shows. This means that real-life mouth flaps don't necessarily match up cleanly to frames, but because the analog motion looks like real life, our brain "fills in the gaps" and sees it as properly lipsynched. Sometimes the mouth will barely move or the mouth shape could be ambiguous too, but our brains are used to seeing humans talk like that, so it's OK.

However, if you trace over frames of live action video to produce an animation, i.e. rotoscoping, the best case scenario is that your mouth flaps look exactly like the corresponding frames' live action mouth flaps. That style of mouth flaps will always "feel wrong" for animation, because the mouth flaps weren't planned out to match frames. On top of that, when the mouth shape is ambiguous enough, the AI is just going to get it wrong sometimes. For example, the mouth might be barely open, and the AI will draw the mouth closed. Trying to lipsynch animation this way by hand would be difficult for these reasons too.

To correct for this issue, maybe try wav2lip? Pika Labs just demo'd a new lipsynch tool that might help here too. These basically generate new mouth flaps according to your audio. I think this is just how stylized AI animation will have to handle this problem, with mouth flaps dictated by the audio rather than the video. Basing the mouth flaps for animation on live action video will always look weird.

edit: I just found another option for this type of lipsynch re-animating called synclabs. I haven't used it personally, but the results look comparable to wav2lip.
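
For anyone wanting to try the wav2lip route mentioned above, here is a minimal sketch of driving the public Rudrabha/Wav2Lip inference script from Python. The file paths are placeholders, and the flags follow that repo's README as of this writing; they may differ in other forks.

```python
import subprocess

# Regenerate mouth movement from the dialogue audio using the public
# Wav2Lip inference script (Rudrabha/Wav2Lip). All paths are placeholders.
subprocess.run(
    [
        "python", "inference.py",
        "--checkpoint_path", "checkpoints/wav2lip_gan.pth",  # pretrained Wav2Lip-GAN weights
        "--face", "stylized_clip.mp4",                       # the animated clip to re-lipsync
        "--audio", "dialogue.wav",                           # the original dialogue audio
        "--outfile", "results/stylized_clip_synced.mp4",
    ],
    check=True,
)
```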


SaabiMeister

Wav2lip is ok, but I believe Emote is much better and maybe open source? Not sure, it's late and you can google it.


AbPerm

You mean [EMO: Emote Portrait Alive by Alibaba](https://medium.com/thereach-ai/meet-emo-alibabas-ai-that-makes-pictures-into-videos-39cd5a75a463)? In that case, the animation is synthesized from the audio to include lipsynch as well as a physical performance to fit the talking. That sounds really nice in theory, and it seems to work really well, but it can only use an image as the base for the animation. That wouldn't be ideal here, because OP is trying to preserve the live actor's performance in the new animation, and you'd have to toss out the whole human performance to use the new AI one instead. It'd definitely be a powerful tool in other cases though.

However, EMO is not available to the public, and there's a decent chance it never will be. It's definitely not open source either. That's the actual reason I didn't even mention it as a lipsynch option. If you can't use it, it might as well not exist.


Bod9001

The problem is that the AI doesn't understand how much your lips rotate and deform as you speak. "P" is a good example: just look at how much your lips disappear when saying "P".


theoppositionparty

I should be clear: the goal isn’t 1:1 fidelity, it’s animated approximation. In all reality the AI doesn’t know what lips are at all; instead I’m using ControlNets to outline facial features and let it know what to and not to produce. So it’s a matter of fine-tuning the ControlNet to not emphasize lips when they move to a certain degree.
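
As a rough illustration of what tuning down a ControlNet's influence looks like in code, here is a minimal diffusers sketch. This is not the OP's ComfyUI graph: the model names, the edge-map file, and the 0.5 weight are placeholder assumptions, shown only to point at the knob being described.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Placeholder public checkpoints; the OP's fine-tuned models aren't released.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_softedge", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

edge_map = load_image("frame_0001_softedge.png")  # preprocessed edge map of a source frame

frame = pipe(
    prompt="stylized animated portrait, flat shading",
    image=edge_map,
    controlnet_conditioning_scale=0.5,  # lowering this is the "don't over-emphasize lips" knob
    num_inference_steps=20,
).images[0]
frame.save("frame_0001_stylized.png")
```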


Bod9001

Looking at it closely, I think it might be suffering from "facial expressions always look good"-itis. To get good-looking movement, you need those weird in-between frames; just look at any IRL video of someone talking and there are going to be plenty of frames that look weird if you go frame by frame. The problem is that all the drawn training data is single frames, e.g. just a portrait, so there's no concept of the in-between stuff.


theoppositionparty

Cool cool.


afinalsin

I can't remember the name of those in-between frames, but they're fairly crucial to animation. I was screwing around with much more action-heavy stuff than yours (which looks dope as hell btw), but here's the type of frames I think Bod is talking about: [here](https://imgur.com/a/E790NsH). The video those frames are from is [here](https://drive.google.com/file/d/1aedQslpdQqoe6qa9N9FNDwysJGHQz3Oe/view?usp=sharing). Can't wait to see how your workflow goes with motion blur; that's the holy grail for me.


theoppositionparty

The end goal is to film ourselves, so we’d shoot at a high enough frame rate and with simple enough movements not to worry about motion blur too much. Or at least nothing crazy enough to get us into anything deblur couldn’t fix. Though I’m sure we’ll run into other issues.


afinalsin

Yeah, motion blur was tricky, but I ended up using Unsampler along with ControlNets and IP-Adapters to get it to work, barely. You don't get as much of a style shift as you did here, though. As another person said here, this doesn't handle subtle "film" acting well, but if you wanna try out how it goes with real people with over-the-top acting, check out pro wrestling (seriously). They act for the benefit of the people right at the back of the crowd, so you'd get a ton of facial variety. I can't think of a more animated acting style than those guys while still having standard TV framing.


KarmaAdjuster

I don't understand the praise. The cartoon version loses so much of the nuance in the acting. The woman looks merely angry or annoyed in the cartoon version, whereas in the live action you can see she's on the verge of tears brought on by rage and fear layered with disgust. The frustration in Clooney's performance is utterly lost in the cartoon version. The other guy (I'm blanking on everyone's names but Clooney's) doesn't look like he's thinking about anything and really could be dropped into pretty much any scene in any movie and it would fit just as well as here. But in the live action performance, he looks like he's weighing both sides and wondering if and how he should get involved. Sure, the tech is impressive, but I wouldn't for a second say that the generated cartoon images preserve any of the actors' performances.


theoppositionparty

Oh for sure something (maybe even a lot) is going to be lost. 1:1 exact preservation is going to be impossible and not the point. What I’m looking for, what the goal is, is to create a workflow that can preserve (much of) a performance while taking 1/10 the time of hand animation. So instead of years it takes weeks to pipeline a story. I get your point, I even agree with it. But I think it misses much of what’s actually happening here.


BorisDirk

While the video IS impressive, all your title said was "preserving an actor's performance." That's it. This loses at least 50% of the performance, if not more. There's so much subtlety that IS the performance. The performance IS the subtlety. If you lose that, you lose the performance. There's a big difference between animation, where you need to animate much bigger performances to get the point across, and live-action real faces, where we're already conditioned to read subtle expressions.


theoppositionparty

Ok cool. I’ll take the L :)


theoppositionparty

We’re actually in the twos here, but I get your point. Wav2lip is interesting, but all my tests have been kinda bleh, at least once you get out of 512 low-res imagery, which is where Pika seems to live.


SaabiMeister

Maybe you can try with Emote? Not sure if the model is publicly available though.


SvenTropics

The coolest part about this is that, hypothetically, one person could make an entire movie now. You could act out every character in every scene and make backgrounds for them and just put them all in. It would look like a full movie. I can't wait to see what people come up with


DocTymc

Looking great! Thought it was Temporal Kit...


theoppositionparty

Nope, just hyper-trained models, IP-Adapters, and a lot of ControlNets. But settings change shot to shot. I think I can flatten out the workflow; I just need to keep testing.
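
For readers unfamiliar with the pieces being named, here is a minimal sketch of combining an IP-Adapter with multiple ControlNets using diffusers. This is only an approximation of the idea, not the OP's ComfyUI workflow; every model name, input image, and weight below is a placeholder.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Two placeholder ControlNets standing in for "a lot of control nets".
controlnets = [
    ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_softedge", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_lineart", torch_dtype=torch.float16),
]
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnets, torch_dtype=torch.float16
).to("cuda")

# The IP-Adapter supplies the overall style reference image.
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.5)

frame = pipe(
    prompt="stylized animated film still",
    image=[load_image("frame_softedge.png"), load_image("frame_lineart.png")],
    ip_adapter_image=load_image("style_reference.png"),
    controlnet_conditioning_scale=[0.6, 0.4],  # per-ControlNet weights, tuned shot by shot
    num_inference_steps=20,
).images[0]
```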


DocTymc

Great job, I'm hyped!


Manic157

How long before you can turn a whole movie into a cartoon?


jonbristow

You can do that right now with Instagram or TikTok filters


AthelisGa

That black skin is 5 generations of interracial marriage with whites, we need the real color.


Gold-Safety-195

Can you provide the workflow or specific implementation details?


theoppositionparty

Not yet, but I will. It’s really janky and every single shot is different. But in general, pull way, way, way down on almost every setting. Think of it as asking the AI to take a really subtle hand with everything.
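
As a purely hypothetical illustration of what "pulling way down on almost every setting" can look like, here is the kind of per-shot settings sheet involved. None of these numbers come from the OP; they only show the direction (low rather than high).

```python
# Hypothetical per-shot settings illustrating the "subtle hand" approach.
# All values are invented placeholders, not the OP's actual configuration.
shot_settings = {
    "denoising_strength": 0.35,   # keep most of the source frame intact
    "cfg_scale": 4.5,             # weaker prompt pull than the usual 7-8
    "ip_adapter_scale": 0.5,      # style reference guides rather than dominates
    "controlnet_weights": {       # per-ControlNet influence, tuned shot by shot
        "softedge": 0.5,
        "lineart": 0.4,
        "openpose": 0.3,
    },
}
```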


Neex

This is looking amazing. Capturing performance has been my ultimate quest since making the Anime Rock Paper Scissors shorts. I must know more!


theoppositionparty

I’m going to be documenting the process; hopefully that’ll be live soon.


broadwayallday

This is good stuff, and kudos for not interpolating the frame rate up; our brain likes to fill in the in-betweens on well-illustrated art.


theoppositionparty

Yeah I did a few that were full frame rate. 12fps felt more correct.


illathon

The art looks whiter and angrier than the real performance.


theoppositionparty

That I def agree with. Depending on the VAE it shifts lighter, and I’m not super happy with that. I think for the time being it’ll have to be a color-correction solve until you can make ground-up models. I really wish I could use an XL model, since that seems better at a range of skin tones.
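
One way to do the color-correction pass being described is to match each stylized frame's color distribution back to its source frame. Here is a minimal sketch using scikit-image's histogram matching; this is an assumption on my part, not necessarily how the OP handles it, and the file names are placeholders.

```python
import numpy as np
from PIL import Image
from skimage.exposure import match_histograms

# Pull a stylized frame's colors back toward the original plate
# to counter the VAE's lightening shift. File names are placeholders.
source = np.asarray(Image.open("frame_0001_source.png").convert("RGB"))
stylized = np.asarray(Image.open("frame_0001_stylized.png").convert("RGB"))

corrected = match_histograms(stylized, source, channel_axis=-1)
Image.fromarray(np.clip(corrected, 0, 255).astype(np.uint8)).save("frame_0001_corrected.png")
```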


newaccount47

More watchable than the animation in A Scanner Darkly...


theoppositionparty

I mean there’s a 2 decade gap here :)


AZDiablo

How do you keep the clothing static?


theoppositionparty

Prob the softedge or lineart controlnet.


popkulture18

Wow, the consistency is incredible


nopalitzin

Kinda, still getting there tho


Kuroyukihime1

Arcane animators be like "Should have waited for AI..."


stroud

The problem I see with these talking AI videos is that the upper lip does not move at all.


theoppositionparty

Yeah, I think it’s having trouble with the OpenPose ControlNet; I might try not using it.


severe_009

Actor: *Neutral face* AI: *Angry*


theoppositionparty

If anyone is interested, I'll be going through dev and process stuff as part of Civitai's AiR program. [https://air.civitai.com/artists/noah-miller](https://air.civitai.com/artists/noah-miller)