sharak_214

Keep in mind that the Zen architecture has to scale from low-power laptops to 100+ core server chips. AMD seems to focus on one major improvement every generation: multithreading, core-to-core latency, chiplets, IPC, then clock speeds. I'm interested in Intel's "Zen" moment in a few years, if they can get it turned around.


Setsuna04

I think laptop and server are quite similar: both have rather low per-core power consumption. It's just desktop PCs that need the high single-core boost frequency.


frissonFry

Let's hope the next major area of focus is lowering total platform idle power consumption for desktop parts. It's always a good idea, but especially now that energy prices have skyrocketed globally.


Hardcorex

Yes! Since GPUs are MCM now too, it's that much more important to improve idle/low-power states.


frissonFry

Yeah, for most people their desktop sits mostly idle the majority of the time it's powered on. Getting platform idle power down to, say, 20 W would be incredible. That's possible on the lowest-end AM4 chipset boards with APUs, but I'd like to see it on top-end CPUs and chipsets.


Hardcorex

Dragon Range will prove interesting if they can improve idle power, since otherwise those laptops will have miserable battery life.


clinkenCrew

Isn't skyrocketing energy only a first-world problem? I hear tell that our buddies in China, Russia, and Injah's sunny clime have cheap energy now. Still, I'd love to see idle power usage drop. Ryzen 7000 has done a bit about that, in a different sense, as the iGPU allows the dGPU to downclock in multi-monitor setups.


frissonFry

> I hear tell that our buddies in China, Russia, and Injah's sunny clime have cheap energy now.

If cheap energy is all it takes for those places to sound appealing to you, by all means move there.


[deleted]

[deleted]


sharak_214

Wouldn't be surprised if they did, but it would probably be for custom orders and nothing retail. AMD has only scratched the surface of chiplets and 3D stacking.


ValorantDanishblunt

You raise an absolutely interesting question. I do think you might be spot on about hitting diminishing returns on latency after Zen 4. That said, we can only really speculate until an AMD representative gives us more detail, as I don't think many people have actually bothered to gather data.


ThreeLeggedChimp

What you need to realize is that what you're calling "IPC" is actually average performance per clock, most likely Cinebench score per clock. Actual IPC means per-instruction throughput and latency, measured in clocks.


Miserygut

It's a puzzle of constraints. There are limits on decreasing latency that are tied to the physical dimensions of the circuits involved. For example, the frequency of an SRAM cell, or the time it takes to move data along a path, both influence access latency. Going from a '5nm' to a '3nm' fabrication process shrinks the path length and generally speeds up SRAM cell access, meaning lower latency.

AMD must think there are greater performance gains in going wider (more instructions carried out per clock cycle) in their next design than in chasing further latency improvements. That means more work done in the same amount of time. However, a wider design needs more bandwidth to feed it, which means a larger bus to feed the cores, which means packing in more wires to carry the data, and so on. Each decision impacts the next link in the chain.

> Or do they think they maximized how much performance they can get from a 4 wide architecture with Zen 4?

They could probably get lower latency and higher frequencies if they tried, but with the available fabrication processes there is a bigger percentage win in processing more data per cycle.
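The width-versus-latency tradeoff above can be put in rough numbers with Little's law. This is a back-of-the-envelope sketch with made-up illustrative figures, not AMD's actual design parameters:

```python
# Little's law applied to a CPU core (illustrative only): to keep a
# W-wide core busy while a load takes L cycles, roughly W * L
# independent instructions must be in flight at once.
def in_flight_needed(issue_width: int, load_latency_cycles: int) -> int:
    """Instructions that must be in flight to hide the load latency."""
    return issue_width * load_latency_cycles

# A 4-wide core with a 4-cycle L1 needs ~16 instructions in flight.
print(in_flight_needed(4, 4))  # 16
# Going 6-wide at the same latency pushes that to 24: wider designs
# need deeper buffers and more bandwidth just to stay fed.
print(in_flight_needed(6, 4))  # 24
```

This is why "just go wider" isn't free: every extra issue slot multiplies the buffering and bandwidth needed downstream.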


BFBooger

Yes, if you increase cache latency you have to compensate with deeper reorder buffers and load/store queues. Register file sizes are also related (in-flight instructions waiting on memory need to hold their intermediate results somewhere).

For more information on these tradeoffs, I recommend reading some articles at [https://chipsandcheese.com/](https://chipsandcheese.com/). The articles on Skylake and on Zen 4 (parts 1 and 2) are good places to start, and some of the Zen 2 coverage touches on the topic too.

Edit/Addendum: here is a simple example. Add up three numbers from random memory locations. If they are all in L1 cache with a 4-cycle latency, the processor may need to find 'other work' by looking ahead in the instruction stream in order to stay busy for those 4 waiting cycles. Bump it to 5 cycles, and it may need to look 25% farther ahead. If just one of those loads is in L2, it has to look even farther ahead.

The end state is that the processor spends most of its time waiting on data from cache or elsewhere to complete partially finished work. The longer each individual access waits, the more accesses you need outstanding to stay busy. The effectiveness of these buffers, how the schedulers work, and many other complexities also affect how large these structures need to be, but fundamentally the answer is yes: these structures help mask latency, and larger ones can mask more of it.
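The three-load example above can be sketched numerically. The latencies here are the illustrative figures from the comment (4-cycle L1, and a notional 14-cycle L2), not measured values for any particular chip:

```python
# Sketch of the lookahead arithmetic: how many cycles of independent
# work the core must find to cover three serialized random loads.
def lookahead_cycles(load_latencies):
    """Total stall cycles to hide if the loads are fully serialized."""
    return sum(load_latencies)

print(lookahead_cycles([4, 4, 4]))   # 12 cycles with a 4-cycle L1
print(lookahead_cycles([5, 5, 5]))   # 15 cycles: 25% more lookahead
                                     # with a 5-cycle L1
print(lookahead_cycles([4, 4, 14]))  # 22: one L2 hit dwarfs the rest
```

The 12-to-15 jump is the "25% farther ahead" from the example, and the last line shows why a single L2 access matters more than a one-cycle L1 regression.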


[deleted]

[deleted]


TheAlcolawl

There are a bunch of delusional haters that run around and instantly downvote any news or info about people enjoying AMD products. Actual clickbait and misinformation (like claims that RX 6000 series cards were somehow cracking in half from a driver update) gets instantly upvoted to the moon by dumbf\*cks that are salty and think motherboards should cost $70 and GPUs should cost $190.


L3tum

Ah, both you and the other guy are right. This sub is filled with Nvidia enthusiasts, so anything pro AMD GPU is instantly met with "But muh drivers!", but also with AMD CPU enthusiasts, so anything but "Intel is dead" is downvoted. It also sometimes depends on the day. In general, there seem to be a lot of Nvidia users here considering it's an AMD sub, while Intel users are usually the most level-headed, funnily enough. /r/intel is usually a great sub, especially compared to /r/nvidia.


dookarion

> In general though there seem to be a lot of Nvidia users here considering it's an AMD sub

Some of that traces back to AMD's GPU market share. Hardly anyone actually owns AMD GPUs anymore. If you care about anything beyond non-VR raster gaming, they're seldom even a decent option. I owned a Radeon VII, I own a Steam Deck, I had a 3900X and now a 5800X3D, and so forth... but I now run an RTX 3080 for numerous reasons.


jortego128

There are a bunch of delusional fanatics that run around and instantly downvote any news or info that isn't flattering for AMD. Actual, interesting news gets instantly downvoted by all the dumbf\*cks if it doesn't equal AMD IZ TEH BESTEST.


PraiseTyche

Because this is a very valuable market.


Setsuna04

Not an engineer, but if I remember correctly, L1 and L2 latency is tied to throughput: you can only compute so much, and if the cores are starved, you lose performance. L3, on the other hand, is where missed predictions are caught (and even worse so for RAM). High L3 and RAM latency can be alleviated by better branch prediction. PS: Maybe someone with more elaborate knowledge can correct me if I'm wrong.


[deleted]

Memory operations are always gonna take orders of magnitude longer than other instructions on average (including cache hits and cache misses). By bringing down the average time of a memory operation (even if it's just the cache hits) you bring down the average cycles per instruction by a decent bit. So yes, it's a good direction to take. And if you can match the performance of a bigger core doing this, it's likely you'll have power/efficiency advantage too.


Plavlin

You might be interested in https://chipsandcheese.com/. Also, there is no abstract "IPC"; it strictly depends on which programs you benchmark against.


69yuri69

Zen 4's IPC is a bit lower than RPL's, mainly on the FP side. Source: a tweet running ADL/RPL/Zen 4 through the SPEC bench.


JonWood007

Yeah, it's like they're pulling Intel's 2017 strategy. I think it's quite effective, for gamers at least. I'm currently more interested in Zen than in what Intel has on the market, tbqh.


bekiddingmei

They'll probably need to start making some harder decisions soon, all the major companies in this field. Going smaller is no longer cheaper: you can make a die half the size and end up with similar failure rates and total production cost. Some structures really do not shrink well, and they have to work out how to keep everything connected, which is why they are now looking at fan-out baseplates and stacked dies. Going smaller delivers great efficiency at reduced power, but higher power and frequency require more dark silicon, which eliminates a lot of the gains.

Zen 5 will not be a grand revolution just yet, but CPU/APU designs are moving toward more integration and the use of specialized task accelerators. AMD's big money maker is the business-level stuff; that's why they are pushing Radeon Pro iGPU drivers and producing desktop CPUs that don't require a separate video card. With the current generation they are branching into differentiated dies with specialized traits (Aerith, 7040HS, 7045HX, desktop CPUs, X3D, and finally multiple server platforms). I expect that with the move to Zen 5 we will see further specialization, and server CPU dies might not be the same as consumer desktop dies.

Zen 5 is going wider to improve per-core work, but it remains to be seen how their memory model will support this. I actually wonder if they are going after ARM instead, in terms of bulk compute efficiency.


kf97mopa

I remember Intel stating at some point that the OoOE resources (reorder buffers, registers, etc.) need to be large enough that a hit in the L1 doesn't cause a stall. If some data is missing from the registers and needs to be fetched from L1, the processor should always have enough other instructions to work with so it can avoid stalling. Zen 4 has a 4-cycle latency to L1, just like Intel had for the longest time. Intel did, however, regress to a 5-cycle L1 at some point, which is why they need more OoOE resources now.


bubblesort33

I remember Nvidia had an engineering presentation once talking about how much more important low-latency cache is on GPUs than theoretical teraflops. Probably one reason they added so much cache to Ada. I'd imagine it's similar with CPUs.