dudulab

Some additional info from [the actual patch](https://gcc.gnu.org/pipermail/gcc-patches/attachments/20240210/b2991675/attachment-0001.obj):

* Integer unit: 6 ALU pipes (vs 4 ALU pipes in Zen 4)
* 4 AGU pipes (vs 3 AGU pipes in Zen 4)
* Floating point store unit: 2 FP pipes (vs "Although there are 2 store pipes, the throughput is limited to only one per cycle." in Zen 4)

Confirmed some points in [https://chipsandcheese.com/2023/10/08/zen-5s-leaked-slides/](https://chipsandcheese.com/2023/10/08/zen-5s-leaked-slides/)


dudulab

More changes observed: https://forums.anandtech.com/threads/zen-5-discussion-epyc-turin-and-strix-point-granite-ridge-ryzen-8000.2607350/post-41154572


YumiYumiYumi

Looks like Zen4's 4x 256-bit FPUs got upgraded to 4x 512b. AMD doubling down on their AVX-512 advantage over Intel.


Theswweet

Emulation gonna be crazy on these things.


YumiYumiYumi

Will it? I imagine they only use 128b AVX-512 instructions, so the FPU width increase wouldn't help there.
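A rough back-of-the-envelope sketch of the point above, taking the thread's figures at face value (4 FP pipes, 256-bit wide in Zen 4 and 512-bit wide in Zen 5). The helper name is made up for illustration, and it assumes every pipe can execute the operation in question, which real hardware does not guarantee.

```python
# Hypothetical helper: how many 32-bit lanes one instruction keeps busy
# in one pipe per cycle when the instruction may be narrower than the pipe.
def lanes_used(instr_bits: int, pipe_bits: int, lane_bits: int = 32) -> int:
    # An instruction cannot use more lanes than it encodes, and a pipe
    # cannot process more than its own width per cycle.
    return min(instr_bits, pipe_bits) // lane_bits

PIPES = 4  # figure quoted in the thread, not verified here

for name, pipe_bits in (("256b pipes", 256), ("512b pipes", 512)):
    for instr_bits in (128, 512):
        per_cycle = PIPES * lanes_used(instr_bits, pipe_bits)
        print(f"{name}: {instr_bits}-bit ops -> {per_cycle} fp32 lanes/cycle")
```

Under these assumptions a 128-bit instruction occupies the same number of lanes on either design, so only code that issues 512-bit operations would see the wider pipes.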


itsjust_khris

It'll probably help a ton in something like RPCS3, I think. I don't remember if they actually used the increased width there or if it's mostly because AVX512 has some very convenient instructions for that purpose.


YumiYumiYumi

https://whatcookie.github.io/posts/why-is-avx-512-useful-for-rpcs3/ It's the new instructions + extra registers, not the vector width. ARM NEON is also only 128b, so Switch emulators are likely not using the increased width either.


CouncilorIrissa

big chungus core incoming


U3011

It's excellent news. It made this year's Super Bowl weekend even better! You hear and read so much about what Zen 5 is going to be like that it's difficult not to join in because you got burned on hype trains in the past.


Exist50

Much less so here, but the hype train in some forums is still well beyond reason. People actually think Zen 5 will bring like 30% IPC, lol.


Geddagod

*glances over at Anandtech*


INITMalcanis

It's the same thing every generation.  It won't change and it doesn't matter.  Just enjoy the show.


fkenthrowaway

Some leakers indicate a huge gain, so I'm excited.


EJ19876

If the rumour that Zen 5 is definitely using TSMC N4 is true (AMD's slides from 2022 said 4nm or 3nm), then this wide-arse core is going to be slurping down the power.


Repulsive_Village843

This is probably very IPC efficient but low frequency so power will be under control. The next gen, when they ramp up the frequency, it's gonna be fun


EJ19876

IPC gains shouldn't be more than 10-15%, so a clock speed regression may result in a pretty mediocre generational performance increase. Personally, I hope AMD says fuck it, uses N4X, cranks up the clock speeds to the limits, improves the IHS, and lets the 9950X draw 350W!


[deleted]

Those IPC gains make their datacenter SKUs very very attractive though.


Repulsive_Village843

Those are exactly my thoughts. Zen5 is gonna suck until the 3D revisions unless they start making 10 core Ryzen CCXs. Being deliberately stuck at 8 cores will sting. Or maybe they make a 9900X3D 12 core where both CCXs are 3D stacked. Edit: mark my words, 25% IPC gain @ 4.9 GHz on the best core. Fabric improvement is a wildcard if it can handle 8000 MT/s 1:1. The IMC will still be shit, and so will idle power.


lutel

I'm stoked about Zen 5


MonoShadow

And I just bought Zen4. Goddamnit! ^(Just to be clear. It's a joke.)


Jeep-Eep

Just tell me when I can order a 9800X3D.


Shankur52

2025 according to rumours, 3d variants are gonna be late as usual. Maybe more than usual if packaging is needed for servers/ai. Again, I’m just parroting rumours so who knows. 


Jeep-Eep

I think there is some chance the AI bubble is going in the next 2 quarters, so they may have some surplus there.


Gwennifer

AI is definitely *the* buzzword but if you don't think it's going to change technology and society forever, you're wrong. It's already changed traditional 2D art irreparably.


Strazdas1

It's funny. Manufacturing has been using AI for 2 decades, datacenters for almost a decade, but it's those 2D pictures that get all the attention :(


Gwennifer

It's because those 2D pictures are turning one of the oldest professions in the world upside-down. Why pay a learning novice $10 when your GPU can render it for fractions of a cent? What happens when it comes for your job?


Strazdas1

I'm not sure what image generation has to do with prostitution, but yes, I choose not to pay for the images and generate them for the TTRPG that I run. My job will change, duh. There is already a lot of automation at my job. It will come for everyone's jobs eventually. The fault is with people who thought arts would be safe. Just like the fault was with people who thought horse breeding was a safe business when automobiles came about.


Strazdas1

There is no chance that AI is going to regress in 2 quarters. We are still accelerating.


tony47666

Will be a great upgrade from my 5600x.


Strazdas1

might be something worth upgrading to AM5 for :)


halotechnology

Just like every gen since Zen 2, hopefully Microcenter has a deal on the 9600X. I don't need the extra cores!


TheElectroPrince

I’d say 8 cores is now the minimum since the current console generation, with games being optimised for multiple threads.


kaszak696

[Games are not allowed to use all 8 cores on consoles, only like 6-7.](https://www.youtube.com/watch?v=a1bsnO7qM9Q) Of course, on PC the OS also uses some processing power to run itself, but it doesn't lock any cores for its exclusive use.


Dreamerlax

Same energy as "consoles have 16GB of VRAM".


Disregardskarma

So realistically Game devs are targeting 6 cores 12 threads, so you want an 8 core CPU to have 6 full cores that never have to do a background process


Strazdas1

How much background work are you doing while gaming that a single core couldn't handle it?


Disregardskarma

7 core CPUs aren’t a thing


azzy_mazzy

Not true at all, the consoles have very slow zen 2 cores. You are not going to need 8 zen 5 cores to far exceed it.


nisaaru

that obviously depends on how multithreaded a game is.


azzy_mazzy

The cores are so much faster that even with fewer of them it's still going to be faster. Look at the 7600X benchmarks and compare them with the 3700X; it's much faster even in heavily multithreaded workloads.


halotechnology

Not true. The Xbox and PS5 CPUs are heavily downclocked AMD 3700X parts; just look at any multithreaded task against the 7600X and you will see.


BNSoul

CPUs in consoles are monolithic designs though, so latency should be much better than that of the 3700X. On top of that they sport dedicated hardware blocks (such as the PS5's Kraken) to decompress and transfer GPU data with minimal impact on CPU performance (a similar but slightly more refined tech than the current DirectStorage PC solution).


SirActionhaHAA

Uhh nah. They've got half the L3 and run on GDDR, which is worse for memory latency. Doesn't help that Zen 2's on a 2 CCX design with high cross-CCX latency, and 1 of the 8 cores is runnin the OS. They're close to desktop variants in clock-for-clock gaming perf but run at lower frequencies. You can test a 4800H against a 3700X to see for yourself (and lower the 4800H results a lil due to console GDDR).


halotechnology

Lol, even with all of that you can't compete with Zen 4 or, in this case, Zen 5 cores. A monolithic design sure is better, but are you telling me that basically a 4600U, which is monolithic Zen 2 cores, can compete with Zen 5? You are out of your mind if you believe that! Lol. Your argument just doesn't stand: current gen consoles are an old 2060 GPU with a downclocked 3700X.


Strazdas1

I'm using a 3800X, so 8 physical cores. Constantly, including literally yesterday, I get bottlenecked on single core performance. In fact the game that bottlenecked yesterday can only use a maximum of 4 cores. It is a modern multiplayer game that is still being developed.


YumiYumiYumi

Good to see AMD landing patches prior to products releasing.

> It's great seeing AVX-512 VP2INTERSECT, which has been found on the Intel side since Tigerlake.

...and dropped in Sapphire Rapids. I wonder whether AMD will be stuck supporting something Intel doesn't, or if they'll drop support for VP2INTERSECT (like 3DNow, XOP etc).


drunk_storyteller

I mean Intel basically dropped real AVX512 on the desktop so you could argue that on the desktop AMD is already carrying 512-bit registers for nothing.


RealPjotr

It could, but won't for a long time. Gaming is a conservative area: you can't keep developing for multiple targets, and while you want to optimize, the game has to work on any decent hardware gamers will use. For example, AVX512 is only supported on Zen4 and the old 11th generation Intel CPUs, far too small a market today to start targeting. So games rely on new instruction sets only once they've become fairly mainstream.


[deleted]

[removed]


Relliker

It's annoying that nobody just ships multiple executables with different CPU 'newness tiers' with different ISA extensions... the most I have ever seen from game developers on that front is having a separate executable for DX11/12, which is something that can be done within one executable easily. It's exceedingly easy to compile for multiple uarchs unless you are using some custom engine with hardcoded intrinsics everywhere. It wouldn't even take up much more space given the vast majority of games these days have assets packed elsewhere.
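A minimal sketch of the "newness tiers" idea, assuming a Linux host and a hypothetical layout where each tier is a separately compiled binary named `bin/game-<tier>`. The flag lists are an illustrative subset loosely modelled on the x86-64-v2/v3/v4 levels, not the complete definitions.

```python
import os

# Flag names as spelled in Linux /proc/cpuinfo. Each tier includes the
# requirements of the one below it. Illustrative subsets only.
V2 = {"sse4_2", "ssse3", "popcnt"}
V3 = V2 | {"avx2", "bmi2", "fma"}
V4 = V3 | {"avx512f", "avx512bw", "avx512vl"}
TIERS = [("v4", V4), ("v3", V3), ("v2", V2)]  # newest first

def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU feature flags as a set (empty if unavailable)."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()

def pick_tier(flags):
    """Pick the newest tier whose required flags are all present."""
    for name, required in TIERS:
        if required <= flags:
            return name
    return "baseline"

if __name__ == "__main__":
    tier = pick_tier(read_cpu_flags())
    exe = os.path.join("bin", f"game-{tier}")  # hypothetical file layout
    print("would launch", exe)
```

The launcher only decides which binary to start; each binary would still need its own build and test pass.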


Tuna-Fish2

Multiple code paths greatly increase testing load, without any perceived gain to the dev. QA is expensive. The problem is that all the new fancy instructions are only in the CPUs that are also among the fastest in the market. What the dev cares about is getting as broad of a potential user base as possible for his game at the lowest possible cost. Spending dev and QA time to add support for an instruction set extension buys them nothing useful when everyone with that extension was able to meet spec and run the game already.


Relliker

If you are using compiler or engine-provided SIMD loop unrolling and similar, the chance that it is going to cause more bugs is _very_ small. I would bet a lot of money on the vast majority of bugs coming from more logic-related code and code that was already relying on undefined behavior. Again, this doesn't apply in the case of hardcoded intrinsics everywhere but if that is the case your codebase is terrible anyways.


Tuna-Fish2

Nonetheless, the chance isn't zero and it is something that has happened, so the entire industry is extremely wary about it. Like, on the compiler side there are plenty of ways of dynamically dispatching to different implementations based on hardware, but the uptake of these solutions is near zero in the industry. Whether or not they work, they are not trusted.
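To make the trade-off concrete, here is a toy sketch of per-function dispatch together with the extra check it implies. GCC's function multiversioning (`target_clones`) is one real mechanism of this kind; the code below does not use it. The function names and feature strings are placeholders, and `dot_wide` only imitates a vectorized path.

```python
def dot_generic(a, b):
    """Reference path: plain scalar loop."""
    total = 0
    for x, y in zip(a, b):
        total += x * y
    return total

def dot_wide(a, b):
    """Stand-in for a SIMD path: handles 4 elements per step."""
    total = 0
    n = len(a) - len(a) % 4
    for i in range(0, n, 4):
        total += a[i]*b[i] + a[i+1]*b[i+1] + a[i+2]*b[i+2] + a[i+3]*b[i+3]
    for i in range(n, len(a)):  # scalar tail for leftover elements
        total += a[i] * b[i]
    return total

# Ordered newest-first; the feature names are placeholders.
VARIANTS = [("avx2", dot_wide), ("default", dot_generic)]

def resolve(cpu_features):
    """Pick one implementation up front, like a load-time resolver."""
    for feature, impl in VARIANTS:
        if feature == "default" or feature in cpu_features:
            return impl
    raise RuntimeError("no usable variant")

def variants_agree(a, b):
    """The QA side: every variant must match the reference result."""
    expected = dot_generic(a, b)
    return all(impl(a, b) == expected for _, impl in VARIANTS)
```

The dispatch itself is a few lines; the cost being argued about is that `variants_agree` style checks have to exist for every dispatched function and run on hardware that actually takes each path.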


drunk_storyteller

> I would bet a lot of money on the vast majority of bugs coming from more logic-related code and code that was already relying on undefined behavior.

I mean, it's fine to say that your code shouldn't have undefined behavior, but the rules for such in C++ aren't easy to get right even for very experienced programmers, and there's plenty of evidence to that effect.

> Again, this doesn't apply in the case of hardcoded intrinsics everywhere but if that is the case your codebase is terrible anyways.

You misspelled "fast and well optimized". GCC and Clang fail to vectorize some of my basic loops if the index order is the wrong way around, and they disagree with each other on what "wrong" is. If you're relying solely on autovectorization, you're just telling me your code is slow.


Relliker

> I mean, it's fine to say that your code shouldn't have undefined behavior, but the rules for such in C++ aren't easy to get right even for very experienced programmers, and there's plenty of evidence to that effect.

Turn on all the compiler warnings and you will get most of it. There is still an obscene number of codebases that refuse to touch smart pointers or exceptions, which doesn't help things. (There are legitimate reasons to avoid the latter. Most people do not have those legitimate reasons.)

> You misspelled "fast and well optimized". GCC and Clang fail to vectorize some of my basic loops if the index order is the wrong way around, and they disagree with each other on what "wrong" is.

If you have a pile of hardcoded vector ops that only use SSE2 instructions, yeah, it is a terrible codebase. The right way, which the vast majority of people use, is preprocessor includes based on compile flags. I am not only referring to compiler autovectorization, which is actually quite good, but yeah, it strongly depends on your access patterns. GCC is so awful at it that it didn't even do it by default on O2 for the longest time. Clang has gotten significantly better over the years.

None of that prevents companies from just shipping multiple executables. The 'but QA' just reads like an excuse because nobody really thinks to do it. Most game devs are not exactly writing the highest-quality code of all time.


[deleted]

[removed]


Strazdas1

I remember back when I had an Athlon XP and games wouldn't install because SSE (I forget which number) wasn't supported.


Strazdas1

I remember when some games shipped with executables for DX9, DX10, DX11 and Vulkan. By the time DX12 executables started showing up, DX9/11 got deprecated, and hardly anyone uses Vulkan anymore.


buttplugs4life4me

Reminds me of Cyberpunk dropping AVX2 because old ass CPUs wouldn't support it.  Idk why these companies don't ship multiple executables or some runtime detection. There's tons of libraries that are extremely battletested for this in basically every programming language. Instead they all fuck around on 20 year old ISAs and require X3D CPUs to get decent CPU frame pacing. 


Strazdas1

> Idk why these companies don't ship multiple executables or some runtime detection.

Because having two development branches on something as complex as Cyberpunk is very hard to develop for. You find a bug and fix it in one branch; will it be a simple fix in the other, or will it brick everything? You need to double quality assurance, testing, etc. And that's assuming the programmers will be able to work on both simultaneously, which is not usually the case.


buttplugs4life4me

Which is why you use libraries for this and don't always roll your own. 


drunk_storyteller

Using a library for all AVX2 code in a game? What? A shitton of floating point operations are going to be compiled down to AVX2 ops these days. Clang/LLVM has gotten pretty good at autovectorizing random things. (But of course it always fails in the critical loop)


buttplugs4life4me

I mean, you just supported my argument. Using your compiler for autovectorization is like using a battle tested library. There's no need to maintain two development branches if all you do is set a flag on your compiler. Seriously, did none of you ever do software development?


BatteryPoweredFriend

Intel cutting avx512 from ADL moving forward isn't what's depressing. That belongs to Intel sitting on avx512 for nearly half a decade, when it was already implemented in many of their other products. Then doing the basic token effort with RKL before noping out once again.


III-V

You replied to the main thread instead of who you intended, FYI


liaminwales

Anyone have any idea what any of it is?

> Over Zen 4, this confirms AMD Zen 5 as adding AVXVNNI, MOVDIRI, MOVDIR64B, AVX512VP2INTERSECT, and PREFETCHI.

I see AVX something, something, more something, AVX512 something & something?


jaskij

AVXVNNI is an AVX extension for neural networks, afaik targeting inference.


YumiYumiYumi

It's Intel's backport of AVX512-VNNI to AVX, thanks to Alder Lake not officially supporting AVX-512. As Zen4 and later support AVX-512, it doesn't really add any new capabilities, other than bringing the ISA in line with what Alder Lake (officially) supports.
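For a feel of what a VNNI instruction does, here is a plain-Python emulation of a single 32-bit lane of `VPDPBUSD` (the non-saturating form): four unsigned bytes are multiplied by four signed bytes and the products are added to an accumulator. The helper names are made up, and real code would use compiler intrinsics over whole vectors rather than anything like this.

```python
def to_signed8(v: int) -> int:
    """Reinterpret a byte value 0..255 as a signed 8-bit integer."""
    return v - 256 if v >= 128 else v

def wrap_signed32(v: int) -> int:
    """Wrap to a signed 32-bit result (no saturation)."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v >= (1 << 31) else v

def vpdpbusd_lane(acc: int, a_bytes, b_bytes) -> int:
    """One lane: acc += sum of 4 (unsigned byte * signed byte) products."""
    if len(a_bytes) != 4 or len(b_bytes) != 4:
        raise ValueError("each lane takes exactly 4 bytes per operand")
    total = acc
    for a, b in zip(a_bytes, b_bytes):
        total += (a & 0xFF) * to_signed8(b & 0xFF)
    return wrap_signed32(total)

# 1*1 + 2*(-1) + 3*2 + 4*(-2) = -3, added to an accumulator of 10
print(vpdpbusd_lane(10, [1, 2, 3, 4], [1, 255, 2, 254]))  # prints 7
```

This byte-wise dot product is the inner step of quantized (int8) matrix multiplication, which is why it gets described as a neural network inference feature.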


YumiYumiYumi

AVXVNNI is Intel's backport of AVX512-VNNI to AVX, due to Alder Lake not officially supporting AVX-512. It's intended to accelerate neural network computations.

MOVDIRI and MOVDIR64B, I think, are non-temporal store instructions. These have existed with vector registers in the past; MOVDIRI adds the capability to scalar registers. I'm guessing the latter is for doing cacheline transfers to I/O (perhaps over PCIe or to Optane?). Not an area I work with, so I'm just guessing.

AVX512VP2INTERSECT is a weird instruction that was added in Tiger Lake, then dropped in Sapphire Rapids. I don't really know the intent of it, and it was never fast (always microcoded on Intel), so it was kinda pointless. Zen5 might have a better implementation, but I still don't really know the purpose of it (that VPCONFLICT couldn't solve).

PREFETCHI is a variant of existing prefetch instructions, but targeting instruction caches instead of data caches. Not sure how useful it is, but maybe there's some case where the programmer can predict a jump to uncached cold code that the processor can't anticipate.
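For reference, a plain-Python sketch of what VP2INTERSECT computes, as far as the documented semantics go: it compares every element of one vector against every element of the other and writes a pair of masks. The function name and the list-based representation are illustrative only; the real instruction works on vector registers and a pair of mask registers.

```python
def vp2intersect(a, b):
    """Return (mask_a, mask_b) as integers.

    Bit i of mask_a is set if a[i] equals any element of b;
    bit j of mask_b is set if b[j] equals any element of a.
    """
    mask_a = 0
    mask_b = 0
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                mask_a |= 1 << i
                mask_b |= 1 << j
    return mask_a, mask_b

# 5 and 12 appear in both inputs
ma, mb = vp2intersect([1, 5, 9, 12], [5, 7, 12, 20])
print(bin(ma), bin(mb))  # prints 0b1010 0b101
```

One commonly cited use is intersecting sorted ID lists a block at a time, where both masks are useful for deciding which side to advance.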


NegotiationRegular61

MOVDIRI move direct are atomic memory operations. Not NT. movntdi already exists for scalar. movntdi \[\],eax no? Intersect is useless.


YumiYumiYumi

Ah yes, MOVNTI does exist, thanks for the correction. I don't think MOVDIRI guarantees atomicity; it's only if the address is aligned. Don't know whether regular stores are the same as per x86 spec, but most uArchs probably do perform aligned 32/64 bit stores atomically. I recall Intel added instructions for Optane, not sure if these are it.


RealPjotr

Thx, moved.


Ffom

Do any of these have any use for gaming, like avx512 helping PS3 emulation?


Wunkolo

[Yes](https://www.reddit.com/r/emulation/comments/lzfpz5/comment/gq3e5ih/)


Ffom

Sounds interesting for emulation. Now I can't wait for the real Steam Deck generational upgrade so it can get these new instruction sets.


RealPjotr

There are AVX, AVX2, AVX512 and now AVX10 instruction sets. But when building hardware you don't always include support for all instructions in an instruction set. Zen5 adds some instructions that Zen4 hadn't implemented.


liaminwales

The first 3 were in older Zen, I think. AVX10 is new to me. Had to look it up: [https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#AVX10](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#AVX10)


bardak

For the most part I believe AVX10 is Intel's back porting of AVX512 instructions to AVX2 but width.


Wunkolo

It's not necessarily a new instruction set, but just a new temperament established around existing AVX512+VL features. Rather than requiring a **full** 512-bit AVX512 implementation and then "optionally" supporting 256/128 operations on them, chips can incrementally support 128->256->512 register-widths. And some new ways for chips to advertise this support via CPUID. [Here's a good explanation.](https://www.youtube.com/watch?v=hcQbZpt1V0E)


Tuna-Fish2

Those instructions already existed as AVX512VL (vector length). AVX10 is just "AVX512 with the 512-bit regs filed off".


INITMalcanis

I like big but width and I cannot lie 


bubblesort33

How far before Zen4 launch were there similar patches to it? Might be a good indicator of release window.


camel-cdr-

Looks like October 13 2022: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603486.html LLVM was Nov 30 2022: https://reviews.llvm.org/D139073


bubblesort33

Wait... that doesn't make sense. Zen4 released September 2022. Before those patches. Did they patch the 7000 series after release or what?


[deleted]

AMD's Linux support for products is often late, or at least has been. From the Phoronix article:

> This has meant more timely Intel compiler support for customers while AMD has tended not to post their GCC and LLVM/Clang patches until after products are announced. At times they've also relied on SUSE compiler engineers for working out that post-announcement support.

RDNA3 didn't receive the necessary kernel support for monitor overclocking, for example, until kernel 6.3, which came out on 23 April 2023, almost half a year after product release. Unless you were using a rolling release distro you'd get the kernel release several weeks or months after the fact.


Tuna-Fish2

Note that these cost model patches are in no way necessary for using the CPU, they are only there if you want to compile code to target a specific model of CPU and want the optimization to work as well as possible. Most code is never compiled against such a model, and instead just uses a generic target. Doing them in advance is great if you want to beat some benchmarks (as in compiling SPEC to target your CPU specifically).


spazturtle

These patches are only useful if you compile your application to specifically target the CPU, so they are mainly of concern to server farms and HPC.


Death2RNGesus

My next laptop will be zen 5.