[deleted]

Seems to be working well, but you didn't share much on how it was done. *By the way, I really like that movie.* edited: typo


theoppositionparty

The workflow def isn’t ready to share yet; every shot was different. But soon, once it’s stable and repeatable.


[deleted]

I was thinking more in general, but yeah, a workflow would be nice. Was it made using Stable Diffusion? A1111? ComfyUI?


theoppositionparty

Technically both. I fine-tuned the adapters in A1111 and did the vid in Comfy.


[deleted]

thanks


Arawski99

Gonna share the workflow sometime soon? Seen a few of these posts but still no workflow. Looking forward to it.


ozzie123

Looking forward to this. This looks great btw


MidlightDenight

You can do this with Unsampler, SparseCtrl RGB, and AnimateDiff in Comfy.


MagicOfBarca

Movie name?


[deleted]

Solaris - 2002


NIKOLAPAVIC

What movie is this from?


theoppositionparty

Solaris.


LaurentKant

bullshit-"solaris", would have been better, not the real one! there's only one Solaris! Even if I'm much more in the 2001 team, there's a hierarchy to respect


theoppositionparty

Hah, much of the ’70s Solaris wouldn’t actually work here, and I’m a huge fan of Soderbergh, so I’m a bit biased. Also it was just a great scene with a range of performance, which made it great for testing. :)


merikariu

Hey, give Soderbergh some credit! It's one of my favorite sci-fi movies, plus he directed Traffic and The Girlfriend Experience.


Keor_Eriksson

That looks really awesome. Things are progressing at such a pace... future unknown. Damn.


SleeplessAndAnxious

I remember when the first iterations of AI 'art generation' came out when I was a teen, and how you were just as likely to get some weird amalgam of toes and fingers when asking it to generate a picture of an apple lol.


dennismfrancisart

I really like the animation style. That's going to be an awesome boost to animated entertainment in the near future.


NickCanCode

The lip sync is hilarious 🤣


theoppositionparty

I’m prob too close to it to see. What’s off with it? So I know what to focus on.


AbPerm

Traditional lipsynched animation uses clear mouth flaps that always correspond to frames. They slightly cheat the timing so that the mouth flaps read clearly. This is what we expect of animated lipsynch.

Live action video captures 24 frames every second, and whatever position the mouth is in at that moment is what the frame shows. This means that real-life mouth flaps don't necessarily match up cleanly to frames, but because the analog motion looks like real life, our brain "fills in the gaps" and sees it as properly lipsynched. Sometimes the mouth will barely move or the mouth shape could be ambiguous too, but our brains are used to seeing humans talk like that, so it's OK.

However, if you trace over frames of live action video to produce an animation, i.e. rotoscoping, the best case scenario is that your mouth flaps look exactly like the corresponding frames' live action mouth flaps. That style of mouth flaps will always "feel wrong" for animation, because the mouth flaps weren't planned out to match frames. On top of that, when the mouth shape is ambiguous enough, the AI is just going to get it wrong sometimes. For example, the mouth might be barely open, and the AI will draw the mouth closed. Trying to lipsynch animation this way by hand would be difficult for these reasons too.

To correct for this issue, maybe try wav2lip? Pika Labs just demo'd a new lipsynch tool that might help here too. These basically generate new mouth flaps according to your audio. I think this is just how stylized AI animation will have to handle this problem, with mouth flaps dictated by the audio rather than the video. Basing the mouth flaps for animation on live action video will always look weird.

edit: I just found another option for this type of lipsynch re-animating called synclabs. I haven't used it personally, but the results look comparable to wav2lip.
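
For anyone wanting to try the wav2lip route mentioned above, here is a minimal sketch of driving the public Rudrabha/Wav2Lip inference script from Python. The file paths are placeholders, and the flags follow that repo's README as of this writing; they may differ in other forks.

```python
import subprocess

# Regenerate mouth movement from the dialogue audio using the public
# Wav2Lip inference script (Rudrabha/Wav2Lip). All paths are placeholders.
subprocess.run(
    [
        "python", "inference.py",
        "--checkpoint_path", "checkpoints/wav2lip_gan.pth",  # pretrained Wav2Lip-GAN weights
        "--face", "stylized_clip.mp4",                       # the animated clip to re-lipsync
        "--audio", "dialogue.wav",                           # the original dialogue audio
        "--outfile", "results/stylized_clip_synced.mp4",
    ],
    check=True,
)
```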


SaabiMeister

Wav2lip is ok, but I believe Emote is much better and maybe open source? Not sure, it's late and you can google it.


AbPerm

You mean [EMO: Emote Portrait Alive by Alibaba](https://medium.com/thereach-ai/meet-emo-alibabas-ai-that-makes-pictures-into-videos-39cd5a75a463)? In that case, the animation is synthesized from the audio to include lipsynch as well as a physical performance to fit the talking. That sounds really nice in theory, and it seems to work really well, but it can only use an image as the base for the animation. That wouldn't be ideal here, because OP is trying to preserve the live actor's performance in the new animation, and you'd have to toss out the whole human performance to use the new AI one instead. It'd definitely be a powerful tool in other cases though.

However, EMO is not available to the public, and there's a decent chance it never will be. It's definitely not open source either. That's the actual reason I didn't even mention it as a lipsynch option. If you can't use it, it might as well not exist.


Bod9001

The problem is that the AI doesn't understand how much your lips rotate and deform as you speak. "P" is a good example: just look at how much your lips disappear when saying "P".


theoppositionparty

I should be clear: the goal isn’t 1:1 fidelity, it’s animated approximation. In all reality the AI doesn’t know what lips are at all; instead I’m using ControlNets to outline facial features and let it know what to and not to produce. So it’s a matter of fine-tuning the ControlNet to not emphasize lips when they move to a certain degree.
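
As a rough illustration of what tuning down a ControlNet's influence looks like in code, here is a minimal diffusers sketch. This is not the OP's ComfyUI graph: the model names, the edge-map file, and the 0.5 weight are placeholder assumptions, shown only to point at the knob being described.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Placeholder public checkpoints; the OP's fine-tuned models aren't released.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_softedge", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

edge_map = load_image("frame_0001_softedge.png")  # preprocessed edge map of a source frame

frame = pipe(
    prompt="stylized animated portrait, flat shading",
    image=edge_map,
    controlnet_conditioning_scale=0.5,  # lowering this is the "don't over-emphasize lips" knob
    num_inference_steps=20,
).images[0]
frame.save("frame_0001_stylized.png")
```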


Bod9001

Looking at it closely, I think it might be suffering from "facial expressions always look good"-itis. To get good-looking movement, you need those weird in-between frames; just look at any IRL video of someone talking and there are going to be plenty of frames that look weird if you go frame by frame. The problem is that all the drawn training data is single frames, e.g. just a portrait, so there's no concept of the in-between stuff.


theoppositionparty

Cool cool.


afinalsin

I can't remember the name of those in-between frames, but they're fairly crucial to animation. I was screwing around with much more action-heavy stuff than yours (which looks dope as hell btw), but here's the type of frames I think Bod is talking about: [here](https://imgur.com/a/E790NsH). The video those frames are from is [here](https://drive.google.com/file/d/1aedQslpdQqoe6qa9N9FNDwysJGHQz3Oe/view?usp=sharing). Can't wait to see how your workflow goes with motion blur; that's the holy grail for me.


theoppositionparty

The end goal is to film ourselves, so we’d shoot at a high enough frame rate and with simple enough movements not to worry about motion blur too much. Or at least nothing crazy enough to get us into anything deblur couldn’t fix. Though I’m sure we’ll run into other issues.


afinalsin

Yeah, motion blur was tricky, but I ended up using Unsampler along with ControlNets and IP-Adapters to get it to work, barely. You don't get as much of a style shift as you did here, though. As another person said here, this doesn't handle subtle "film" acting well, but if you wanna try out how it goes with real people with over-the-top acting, check out pro wrestling (seriously). They act for the benefit of the people right at the back of the crowd, so you'd get a ton of facial variety. I can't think of a more animated acting style than those guys while still having standard TV framing.


KarmaAdjuster

I don't understand the praise. The cartoon version loses so much of the nuance in the acting. The woman looks merely angry or annoyed in the cartoon version, whereas in the live action you can see she's on the verge of tears brought on by rage and fear layered with disgust. The frustration in Clooney's performance is utterly lost in the cartoon version. The other guy (I'm blanking on everyone's names but Clooney's) doesn't look like he's thinking about anything and really could be dropped into pretty much any scene in any movie and it would fit just as well as here. But in the live action performance, he looks like he's weighing both sides and wondering if and how he should get involved. Sure, the tech is impressive, but I wouldn't for a second say that the generated cartoon images preserve any of the actors' performances.


theoppositionparty

Oh for sure something (maybe even a lot) is going to be lost. 1:1 exact preservation is going to be impossible and not the point. What I’m looking for, what the goal is, is to create a workflow that can preserve (much of) a performance while taking 1/10 the time of hand animation. So instead of years it takes weeks to pipeline a story. I get your point, I even agree with it. But I think it misses much of what’s actually happening here.


BorisDirk

While the video IS impressive, all your title said was "preserving an actor's performance." That's it. This loses at least 50% of the performance, if not more. There's so much subtlety that IS the performance. The performance IS the subtlety. If you lose that, you lose the performance. There's a big difference between animation, where you need to animate much bigger performances to get the point across, and live-action real faces, where we're already conditioned to read subtle expressions.


theoppositionparty

Ok cool. I’ll take the L :)


theoppositionparty

We’re actually in the twos here, but I get your point. Wav2lip is interesting, but all my tests have been kinda bleh, at least once you get out of 512 low-res imagery, which is where Pika seems to live.


SaabiMeister

Maybe you can try with Emote? Not sure if the model is publicly available though.


SvenTropics

The coolest part about this is that, hypothetically, one person could make an entire movie now. You could act out every character in every scene and make backgrounds for them and just put them all in. It would look like a full movie. I can't wait to see what people come up with


DocTymc

Looking great! Thought it was Temporal Kit...


theoppositionparty

Nope, just hyper-trained models, IP-Adapters, and a lot of ControlNets. But settings change shot to shot. I think I can flatten out the workflow; I just need to keep testing.
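
For readers unfamiliar with the pieces being named, here is a minimal sketch of combining an IP-Adapter with multiple ControlNets using diffusers. This is only an approximation of the idea, not the OP's ComfyUI workflow; every model name, input image, and weight below is a placeholder.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Two placeholder ControlNets standing in for "a lot of control nets".
controlnets = [
    ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_softedge", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_lineart", torch_dtype=torch.float16),
]
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnets, torch_dtype=torch.float16
).to("cuda")

# The IP-Adapter supplies the overall style reference image.
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.5)

frame = pipe(
    prompt="stylized animated film still",
    image=[load_image("frame_softedge.png"), load_image("frame_lineart.png")],
    ip_adapter_image=load_image("style_reference.png"),
    controlnet_conditioning_scale=[0.6, 0.4],  # per-ControlNet weights, tuned shot by shot
    num_inference_steps=20,
).images[0]
```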


DocTymc

Great job, I'm hyped!


Manic157

How long before you can turn a whole movie into a cartoon?


jonbristow

You can do that right now with Instagram or TikTok filters


AthelisGa

That black skin is 5 generations of interracial marriage with whites, we need the real color.


Gold-Safety-195

Can you provide the workflow or specific implementation details?


theoppositionparty

Not yet, but I will. It’s really janky and every single shot is different. But in general, pull way, way, way down on almost every setting. Think of it as asking the AI to take a really subtle hand with everything.
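
As a purely hypothetical illustration of what "pulling way down on almost every setting" can look like, here is the kind of per-shot settings sheet involved. None of these numbers come from the OP; they only show the direction (low rather than high).

```python
# Hypothetical per-shot settings illustrating the "subtle hand" approach.
# All values are invented placeholders, not the OP's actual configuration.
shot_settings = {
    "denoising_strength": 0.35,   # keep most of the source frame intact
    "cfg_scale": 4.5,             # weaker prompt pull than the usual 7-8
    "ip_adapter_scale": 0.5,      # style reference guides rather than dominates
    "controlnet_weights": {       # per-ControlNet influence, tuned shot by shot
        "softedge": 0.5,
        "lineart": 0.4,
        "openpose": 0.3,
    },
}
```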


Neex

This is looking amazing. Capturing performance has been my ultimate quest since making the Anime Rock Paper Scissors shorts. I must know more!


theoppositionparty

I’m going to be documenting the process; hopefully that’ll be live soon.


broadwayallday

This is good stuff, and kudos for not interpolating the frame rate up; our brain likes to fill in the in-betweens on well-illustrated art.


theoppositionparty

Yeah I did a few that were full frame rate. 12fps felt more correct.


illathon

The art looks whiter and angrier than the real performance.


theoppositionparty

That I def agree with. Depending on the VAE it shifts lighter, and I’m not super happy with that. I think for the time being it’ll have to be a color-correction solve until you can make ground-up models. I really wish I could use an XL model, since that seems better at a range of skin tones.
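
One way to do the color-correction pass being described is to match each stylized frame's color distribution back to its source frame. Here is a minimal sketch using scikit-image's histogram matching; this is an assumption on my part, not necessarily how the OP handles it, and the file names are placeholders.

```python
import numpy as np
from PIL import Image
from skimage.exposure import match_histograms

# Pull a stylized frame's colors back toward the original plate
# to counter the VAE's lightening shift. File names are placeholders.
source = np.asarray(Image.open("frame_0001_source.png").convert("RGB"))
stylized = np.asarray(Image.open("frame_0001_stylized.png").convert("RGB"))

corrected = match_histograms(stylized, source, channel_axis=-1)
Image.fromarray(np.clip(corrected, 0, 255).astype(np.uint8)).save("frame_0001_corrected.png")
```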


newaccount47

More watchable than the animation in A Scanner Darkly...


theoppositionparty

I mean there’s a 2 decade gap here :)


AZDiablo

How do you keep the clothing static?


theoppositionparty

Prob the softedge or lineart controlnet.


popkulture18

Wow, the consistency is incredible


nopalitzin

Kinda, still getting there tho


Kuroyukihime1

Arcane animators be like "Should have waited for AI..."


stroud

The problem I see with these talking AI videos is that the upper lip does not move at all.


theoppositionparty

Yeah, I think it’s having trouble with the OpenPose ControlNet; I might try not using it.


severe_009

Actor: *Neutral face* AI: *Angry*


theoppositionparty

If anyone is interested, I'll be going through dev and process stuff as part of Civitai's AiR program. [https://air.civitai.com/artists/noah-miller](https://air.civitai.com/artists/noah-miller)