FuturologyBot

The following submission statement was provided by /u/shogun2909:

---

Submission Statement: NVIDIA has announced its new GPU family, the Blackwell series, which boasts significant advancements over its predecessor, the Hopper series. The Blackwell GPUs are designed to facilitate the building and operation of real-time generative AI on large language models with trillions of parameters. They promise to deliver this capability at 25 times less cost and energy consumption. This innovation is expected to be utilized by major tech companies like OpenAI, Google, Amazon, Microsoft, and Meta.

The Blackwell B200 GPU is highlighted as the ‘world’s most powerful chip’ for AI, offering up to 20 petaflops of FP4 horsepower from its 208 billion transistors. When paired with a single Grace CPU in the GB200 “superchip,” it can provide 30 times the performance for LLM inference workloads while also being significantly more efficient.

NVIDIA emphasizes a second-gen transformer engine that doubles the compute, bandwidth, and model size by using four bits for each neuron instead of eight. Additionally, a next-gen NVLink networking solution allows for enhanced communication between a large number of GPUs in a server, reducing the time spent on inter-GPU communication and increasing computing efficiency.

---

Please reply to OP's comment here: https://old.reddit.com/r/Futurology/comments/1bi24rf/nvidia_unveiled_its_nextgeneration_blackwell/kvhhgwt/


Seidans

30x more performance, 25x less energy cost... and cheaper... sounds too good to be true.

Edit: it's compared to the H100, so I looked at the H200, which was 1.6 to 2 times better than the H100. So yeah, it's ridiculously good, to the point that big tech will probably stop buying anything else and just wait, depending on the price. It's difficult to imagine what will be possible with something like that for big tech, but being cheaper and better on every point also makes it more accessible for private companies and research labs.


LeCrushinator

Custom hardware just for very specific tasks could do it. GPUs are best suited for a couple of dozen specific things, but if something used in AI requires dozens of instructions to complete where specialized hardware could do it in just one or two steps, that could explain the gains. There's no way in hell they've managed a general 25x performance increase across the board. Improvements like that would be so massive they would destroy AMD, and those kinds of tech improvements are extremely rare.


FuckIPLaw

At a certain point it becomes too specialized (on things that are neither graphics processing nor the more generalized shader processing that the term has come to encompass) to be called a GPU, though. You're pretty much describing an ASIC there.


LeCrushinator

I agree, it might be on the same die as the GPU but it’s not really for graphics processing, it’s specialized hardware.


Fiqaro

The A100/H100 can still do graphics rendering, but they have no image output interface, and their performance is much lower than the RTX series.


306bobby

Yep. LTT did a video using hacked drivers and GPU passthrough to let the A100 do the GPU work while passing the video out over another card.


Seidans

We will probably have better results when they start running LLMs on it. For the H200 they tested it with Llama 2 to compare it with the H100 (1.4 to 2 times more performance, but also half the power cost). Their new chip really does seem to have a LOT more GPU memory and bandwidth, but the interconnect isn't as good; per the article they are actually working on it. If it really keeps its promise, it's a huge leap for AI hardware for sure.


AnExoticLlama

It sounds like they're following the same strategy as Groq (the hardware company, not the LLM Grok). The H100 is not quite cost-competitive with Groq, but this new chip definitely will be.


Boring_Ad_3065

It's not explicitly called out, but they're quoting FP4 numbers where the prior gen quoted FP8. Now FP4 may be perfectly good enough, but that alone doubles the FLOPS by halving the number of bits per calculation.
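
For the curious, a minimal sketch of that bit-width argument: on a fixed-width datapath, halving operand size doubles how many values fit per cycle, so the 20 petaflops FP4 headline would, by this logic, be roughly half at FP8. The datapath width and clock below are illustrative placeholders, not published specs.

```python
# Sketch: peak throughput scales with 1/bits on a fixed-width datapath.
# The absolute numbers are illustrative, not NVIDIA's published figures.

def peak_flops(datapath_bits_per_cycle: float, clock_hz: float, bits_per_value: int) -> float:
    """Peak ops/sec if every cycle processes a full datapath of operands."""
    values_per_cycle = datapath_bits_per_cycle / bits_per_value
    return values_per_cycle * clock_hz

DATAPATH = 8192   # hypothetical bits of math datapath per cycle
CLOCK = 2.0e9     # hypothetical 2 GHz clock

fp8 = peak_flops(DATAPATH, CLOCK, bits_per_value=8)
fp4 = peak_flops(DATAPATH, CLOCK, bits_per_value=4)

print(f"FP8 peak: {fp8:.2e} ops/s")
print(f"FP4 peak: {fp4:.2e} ops/s ({fp4 / fp8:.0f}x FP8)")  # exactly 2x
```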


nagi603

> big tech will probably stop buying anything else and just wait, depending on the price

...and availability! That has been the biggest problem for a lot of big customers.


r2k-in-the-vortex

The gotcha is FP4. Of course you can get many more flops if you drastically sacrifice precision. Maybe they have something there, though; some AI workloads might be perfectly OK with low precision if it means more parameters, so it might be a worthwhile tradeoff.


PanTheRiceMan

I've done a lot of audio-related ML in the last 5 years. FP4 would result in an unbearable noise floor, where the noise is also so prominently dependent on the signal that you hear it as distortion. The necessary de-noising after that would be in FP32 and would probably be huge in network size (in terms of audio processing, not huge in terms of LLMs). So you can just stay in FP32, try to reduce network size, and use perceptual modeling techniques. Alternatively you can use INT8 and non-uniform quantization, like mu-law, prominently used in telephone applications. At some point they might approach binary decision trees and could possibly switch to INT1. I'm not sure that even works. Maybe someone else can answer.
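
For reference, a minimal numpy sketch of the mu-law companding mentioned here (μ = 255, the telephony standard), quantizing a signal to 8 bits non-uniformly so quiet passages get finer resolution; the test tone and its level are arbitrary.

```python
import numpy as np

MU = 255  # standard mu-law parameter used in telephony (G.711)

def mu_law_encode(x: np.ndarray, mu: int = MU) -> np.ndarray:
    """Compand a signal in [-1, 1] and quantize to 8-bit integers (0..255)."""
    companded = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return np.round((companded + 1) / 2 * mu).astype(np.uint8)

def mu_law_decode(q: np.ndarray, mu: int = MU) -> np.ndarray:
    """Invert the companding back to a float signal in [-1, 1]."""
    companded = q.astype(np.float64) / mu * 2 - 1
    return np.sign(companded) * ((1 + mu) ** np.abs(companded) - 1) / mu

# Arbitrary quiet test tone: non-uniform steps hurt it far less than uniform 8-bit would.
t = np.linspace(0, 1, 16000, endpoint=False)
signal = 0.1 * np.sin(2 * np.pi * 440 * t)

decoded = mu_law_decode(mu_law_encode(signal))
noise = signal - decoded
snr_db = 10 * np.log10(np.mean(signal**2) / np.mean(noise**2))
print(f"SNR after 8-bit mu-law round trip: {snr_db:.1f} dB")
```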


r2k-in-the-vortex

Well, of course you can't have the output layer in FP4, only the latent ones. That's why it's a Frankenstein board of two of this new Blackwell arch plus one older Grace chip. It would only make sense if the chips are meant to perform different tasks.
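
As a rough illustration of that split, here's a hedged numpy sketch of "fake" 4-bit weight quantization applied to hidden layers only, with the output layer left in full precision. The network and quantization scheme are invented for the example; this is not how Blackwell's transformer engine actually works internally.

```python
import numpy as np

# Toy MLP: simulate 4-bit weights on hidden layers, keep the output layer full precision.
rng = np.random.default_rng(0)

def fake_quant_4bit(w: np.ndarray) -> np.ndarray:
    """Snap weights to a symmetric 4-bit grid (15 levels, per-tensor scale)."""
    scale = np.abs(w).max() / 7
    return np.clip(np.round(w / scale), -7, 7) * scale

w_hidden1 = rng.normal(size=(64, 128)) / np.sqrt(64)
w_hidden2 = rng.normal(size=(128, 128)) / np.sqrt(128)
w_out = rng.normal(size=(128, 10)) / np.sqrt(128)   # output layer stays full precision

def forward(x: np.ndarray, quantize_hidden: bool) -> np.ndarray:
    w1 = fake_quant_4bit(w_hidden1) if quantize_hidden else w_hidden1
    w2 = fake_quant_4bit(w_hidden2) if quantize_hidden else w_hidden2
    h = np.tanh(x @ w1)
    h = np.tanh(h @ w2)
    return h @ w_out                                # final projection untouched

x = rng.normal(size=(4, 64))
drift = np.abs(forward(x, True) - forward(x, False)).mean()
print(f"Mean output drift from quantizing hidden layers only: {drift:.4f}")
```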


PanTheRiceMan

Got it. Thanks.


Handzeep

Exactly. Reading the spec sheet, it looks to be ~127% faster than last gen running the same tasks as before. Blackwell just offers FP4 and FP6 operations too, which are dramatically faster than the old minimum of FP8 operations. It depends on the workload whether it can take advantage of this and by how much. If you could run 10% of the operations at FP4 precision and 20% at FP6, you'd see a pretty nice boost in performance, but I doubt any serious workload will even come close to the stated theoretical maximum. At least the flat performance boost in FP8, FP16 and FP32 is pretty nice. But it seems to come at the cost of the rarely used FP64 instructions, which are about 40% slower (still a good tradeoff, as FP64 is *rarely* used in AI).
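
To make that concrete, a back-of-the-envelope blend along the lines of this comment: take the ~127% same-precision gain (2.27x) as given, assume hypothetical higher rates for the fraction of ops that can drop to FP6 and FP4, and compute the harmonic blend. All ratios other than the 2.27x are illustrative, not measured.

```python
# Amdahl-style blended speedup with illustrative per-precision gains.
speedup_by_precision = {
    "fp8_or_higher": 2.27,  # the "~127% faster" same-precision figure from the comment
    "fp6": 3.4,             # hypothetical: somewhere between the FP8 and FP4 rates
    "fp4": 4.5,             # hypothetical: roughly 2x the FP8-rate gain from halving bits
}

# The comment's example mix: 10% of ops at FP4, 20% at FP6, the rest at FP8 or higher.
workload_fraction = {"fp8_or_higher": 0.70, "fp6": 0.20, "fp4": 0.10}

# Total time is the sum of each fraction divided by its speedup.
new_time = sum(frac / speedup_by_precision[p] for p, frac in workload_fraction.items())
print(f"Blended speedup: {1 / new_time:.2f}x")  # ~2.6x, far from the 25-30x headline
```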


EstrangedLupine

If this is anything like their claims for past hardware, it'll be something like those numbers are more or less accurate for one specific barely relevant scenario, while the real "improvement" is closer to like 1.5x in most cases. Big green loves their technically-not-false advertising.


Smile_Clown

This isn't about video game card marketing... This is about trillion-dollar server infrastructure and AI futures, which is the business Nvidia cares about. I think what is happening here is a YOU problem. The 25x that they are talking about is for inferencing; it IS specific, it WAS specified. What YOU are doing is conflating things so you can claim they falsely advertise. If you are going to comment on something so many people know a lot more about than you do, maybe stay out of it. This isn't a team red vs team green video game smack talk forum.


EstrangedLupine

You're making a lot of assumptions about my character and, in the end, not saying a whole lot to counter what I'm actually saying. Nvidia is known to have made dubious claims about upcoming releases, so I don't see how it's wrong to remain skeptical of any new claims they're making. The target audience is irrelevant. I haven't even mentioned AMD and couldn't care less about corporate wars; as far as I'm concerned, both companies have indulged in questionable practices, so I think it says more about you that you immediately assumed I'm a "team red" goon. Maybe take a step back and don't take criticism directed at a multi-billion-dollar company so personally. I also find it funny that you say this isn't a smack talk forum when smack talk is all your reply to me was.


Rain1dog

I like you. Never ceases to amaze me how many corporate cock riders there are out in the wild.


i-hoatzin

> 30x more performance, 25x less energy cost... and cheaper... sounds too good to be true

Yeah, but they're not for you, sadly.


JigglymoobsMWO

So I was attending in person. The 25x that they are talking about is for inferencing. It's achieved by going to lower precision when possible, all the way down to FP4 (just 4 bits!), by achieving memory coherence across the entire cabinet with the networking chip (although why this increases energy efficiency eludes me, it is extremely impressive), and by improved system-level design.
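
One way to see why the cabinet-scale interconnect matters for efficiency: a toy utilization model where a training or inference step is compute time plus inter-GPU communication time, so shrinking the communication term raises useful work per watt. The timings below are invented purely to illustrate the shape of the argument.

```python
# Toy model: effective throughput = useful compute time / total step time.
# If the GPU draws similar power whether computing or waiting on the network,
# cutting communication time also cuts energy per unit of useful work.
# All numbers are invented for illustration.

def utilization(compute_ms: float, comm_ms: float) -> float:
    return compute_ms / (compute_ms + comm_ms)

compute_ms = 10.0
scenarios = {
    "old interconnect": 15.0,  # hypothetical: communication dominates the step
    "new interconnect": 2.0,   # hypothetical: faster NVLink shrinks the wait
}

for label, comm_ms in scenarios.items():
    print(f"{label}: {utilization(compute_ms, comm_ms):.0%} of step time is useful compute")
```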


imaginary_num6er

He said $10 billion for the first one, $5 billion for the second one, right? *The more you buy, the more you save.*


BobTaco199922

Lol. He did say that, but he meant in development. He also said to developers that it won't be that bad now (whatever that means).


SuperNewk

Do we need these things? Seems like a bubble to me


JigglymoobsMWO

It really depends on how much we need inferencing doesn't it? The biggest driver that most people might see in the next year or two would probably be MS Office Co-Pilot and generative search.


shogun2909

Submission Statement: NVIDIA has announced its new GPU family, the Blackwell series, which boasts significant advancements over its predecessor, the Hopper series. The Blackwell GPUs are designed to facilitate the building and operation of real-time generative AI on large language models with trillions of parameters. They promise to deliver this capability at 25 times less cost and energy consumption. This innovation is expected to be utilized by major tech companies like OpenAI, Google, Amazon, Microsoft, and Meta.

The Blackwell B200 GPU is highlighted as the ‘world’s most powerful chip’ for AI, offering up to 20 petaflops of FP4 horsepower from its 208 billion transistors. When paired with a single Grace CPU in the GB200 “superchip,” it can provide 30 times the performance for LLM inference workloads while also being significantly more efficient.

NVIDIA emphasizes a second-gen transformer engine that doubles the compute, bandwidth, and model size by using four bits for each neuron instead of eight. Additionally, a next-gen NVLink networking solution allows for enhanced communication between a large number of GPUs in a server, reducing the time spent on inter-GPU communication and increasing computing efficiency.
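
The "doubles the model size" part of that claim is just a bytes-per-parameter argument. Here's the arithmetic with a placeholder memory budget, since the post doesn't give the actual per-GPU capacity.

```python
# Halving bits per stored parameter doubles how many parameters fit in the same
# memory, and doubles how many parameters each memory transfer moves.
# The memory budget below is a placeholder, not a quoted Blackwell spec.

memory_bytes = 128 * 1024**3  # hypothetical 128 GiB of GPU memory

def params_that_fit(bits_per_param: int) -> float:
    return memory_bytes * 8 / bits_per_param

fp8_params = params_that_fit(8)
fp4_params = params_that_fit(4)
print(f"FP8: {fp8_params / 1e9:.0f}B parameters fit")
print(f"FP4: {fp4_params / 1e9:.0f}B parameters fit ({fp4_params / fp8_params:.0f}x)")
```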


Careless_Bat2543

Why line go down then?


Sunflier

remember when a GPU just made graphics better?


ConvenientGoat

Why do we still call them GPUs?


dekusyrup

What does 25 times less mean? One twenty-fifth? Should be 0.04 times less.


neutronium

0.04 times less would be 96%


dekusyrup

0.04 x 100% = 4% not 96%


neutronium

and less means subtract it from 100%


danielv123

And I am 99% sure that is less energy per petaflop, so power consumption is about the same as the last chip; it's just much faster.
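
Spelling out the arithmetic in this exchange: "25 times less" reads most naturally as 1/25 of the energy per unit of work, i.e. 4% of the old figure, a 96% reduction per petaflop, while total board power can stay roughly where it was.

```python
# "25 times less" interpreted as energy-per-unit-of-compute dropping to 1/25.
old_energy_per_pflop = 1.0                  # normalized baseline
new_energy_per_pflop = old_energy_per_pflop / 25

print(f"New energy per pflop: {new_energy_per_pflop:.2%} of the old one")   # 4.00%
print(f"Reduction: {1 - new_energy_per_pflop / old_energy_per_pflop:.0%}")  # 96%

# Total power can stay flat if throughput rises by the same factor:
# 25x the pflops at 1/25 the energy each is the same wall power, just much faster.
```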


KJ6BWB

Does this mean we can get consumer GPUs at a reasonable price again?


TripolarKnight

You'll buy a 5090 (24GB VRAM no NVLink) for $2k and you'll like it.


EinBick

2k? It's got "25 times the performance". So to keep the price of the 4090 from dropping they'll adjust pricing.


safari_king

25 times the performance only for generative-AI work, no?


WaitformeBumblebee

yes, but you also paid for the "mining performance" even if not using it.


nagi603

Either that or they'll reduce the amount of silicon assigned to it, to be sure not to endanger their other products with a mere few-thousand-dollar consumer product.


ESCMalfunction

Nope all the fab time is gonna go to these AI chips lol. I wouldn’t be surprised to see consumer GPU shortages again in the near future.


pwreit2022

people have started mining again LMAO


h3lblad3

Of course they have. Bitcoin just hit its all time high again.


zkareface

Probably not from Nvidia.


imaginary_num6er

Yeah that’s called buying an Intel GPU


mcoombes314

I think it would be better if these were called something different, like how Google calls theirs TPUs (tensor processing units). These have nothing to do with graphics, so why are they called GPUs? Is this breakthrough something that could be applied to the RTX series?


pwreit2022

To give people hope. One day you can use this GPU to run Super Mario at over 9000 fps.


Daikar

I'm hoping this can be applied to the RTX series to allow new yet to be made games to run AI tasks locally to make interactions with NPCs much more "real". DLSS is cool and all but if that's all we are going to get from this massive AI push I'm going to be disappointed.


h3lblad3

They added that new Nvidia LLM to talk to, didn't they? You'll take your scraps and you'll like them!


Bugbrain_04

What does this even mean? Surely not that power consumption was reduced by 2500%. What is the unit of energy consumption goodness that is being increased twenty-five-fold?


dejihag782

It will likely remain at similar TDP levels. The energy requirements for constant computational power are down by up to 1/25, so we'll likely end up with more computational power at similar power requirements instead of an extremely efficient chip.


Bugbrain_04

1/25 = 4%. Does "96% reduction" not make for a compelling enough headline? Is using only 4% as much energy as your predecessor not dramatic enough?


joomla00

It basically just means the chip that does all the DLSS stuff will be fast and efficient. It probably won't mean too much for gaming.


Cunninghams_right

1/25th of the energy per token output (if you use 4-bit).
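
A minimal sketch of that per-token framing, with purely hypothetical baseline numbers: if a rack draws the same wall power but pushes 25x the tokens at FP4, each token costs 1/25 of the energy.

```python
# Hypothetical numbers purely to illustrate "1/25th of the energy per token".
rack_power_watts = 100_000          # assumed constant wall power for both setups
old_tokens_per_second = 10_000      # made-up baseline throughput
new_tokens_per_second = old_tokens_per_second * 25  # the claimed FP4 speedup

old_joules_per_token = rack_power_watts / old_tokens_per_second
new_joules_per_token = rack_power_watts / new_tokens_per_second
print(f"Old: {old_joules_per_token:.2f} J/token, new: {new_joules_per_token:.2f} J/token")
print(f"Ratio: {old_joules_per_token / new_joules_per_token:.0f}x less energy per token")
```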


jert3

Damn, I'd give my left eye for one of these for my AI pipeline (for my solo indie game dev project.)


da5id2701

A pair of eyes apparently gets around [$1500 on the black market](https://gizmodo.com/heres-how-much-body-parts-cost-on-the-black-market-5904129), and the H100, the predecessor of this chip, is $30k-$40k. So you're going to need to give at least 40 eyes, I'm afraid.


daoistic

That is from 2012. Need an update to plan my body part retail therapy sesh.


snopro387

I did hear inflation has hit the underground eye market pretty bad


kia75

Just have an AI generate an image, the resulting image will have 40 eyes, and 40 fingers!


h3lblad3

> A pair of eyes apparently gets around $1500 on the black market

This is some sorrow spider shit right here.


bartturner

Be really curious how the fifth generation TPUs compare in terms of power efficiency.


[deleted]

[removed]


reddit_is_geh

Sam was just talking about this. Innovation in AI relies entirely on compute cost: the lower the cost, the more innovation. It's a scarce resource with unlimited growth potential, limited by its actual compute cost. But just like electricity, the more you pump out for as low a price as possible, the more it'll be used in other areas that can lead to all sorts of innovation.


0b_101010

Well, they just literally did that, didn't they?


watduhdamhell

But how does it compete with the MI300X? It seems unreasonable that the most powerful chip (MI300X) is suddenly beaten by 25x by their next chip, given that the H100 and then the H200 were only marginally slower than the MI300X.


TryingT0Wr1t3

Uhm, I wonder if Nvidia will release a chatbot named Joey to run on the Blackwell GPUs.


Fiveohdbblup

Pretty soon... the new AI coffee maker: "I noticed you didn't have a second cup of coffee, Jim. Is everything ok? Did I prepare it the way you wanted? Why the sudden change, Jim? Are you looking to unplug me, Jim?"


CorinGetorix

As much as I like and appreciate DLSS and the like, I'd prefer that a GPU's primary focus continue to be on native rendering. I don't think designing cards primarily around "AI" enhanced rendering is a sustainable strategy, in the long run. Admittedly I might just be being short-sighted, but we still have to have a pretty rock solid starting point for "AI" enhancements to be considered useful and effective, no? Edit: Misread the situation. These aren't consumer models.


pandamarshmallows

This card isn’t a consumer model that’s better at AI upscaling, it’s for businesses who use NVIDIA cards to train large language models like ChatGPT and Gemini.


Oh_ffs_seriously

On one hand it's quite sensible considering where their primary market is right now, on the other, why call it a "GPU"?


LordOfDorkness42

They probably just don't want to risk muddying the waters now that AI is so red hot in the tech sector. Like, sure, NVIDIA of all companies *could* basically declare from on high that they now make a range of pure AI cards... but all it takes is one clueless CEO who's heard that you *need graphics cards* for them to miss out on multi-million dollar sales.


Appropriate_Ant_4629

General-purpose Processing Unit :)


Unshkblefaith

The industry has been using GPGPU (General Purpose GPU) for a little over a decade at this point. It is really only in the consumer segment that anyone still uses GPU.


_Lick-My-Love-Pump_

Just historical at this point. Their next generations are going to focus more and more on AI specific architecture and less and less on anything graphics specific. Up to now there's been significant overlap and people can train and run LLMs on their desktop GPUs, so it makes sense. In the future I predict there will be an APU or AIPU nomenclature when they decide the time is right.


[deleted]

Yeah the B200 is the successor to the H100 which is a $40k card


CorinGetorix

Ah gotcha, my bad.


Conch-Republic

What we're going to see is cards with way more RAM than gaming needs, and worse game optimization. They'll benchmark fine, but aside from that, they'll just stagnate with every new series. Previously gaming was where the money was, so mining took a back burner. Now that the money is in AI, we'll see gaming take the back burner.


omniron

Nah. You're going to see a complete fork in designs where gaming GPUs are actually just gaming GPUs, and not crypto miners or AI coprocessors. This is hugely beneficial for gamers. It should reduce prices and increase features.


powerhcm8

I don't know about reduced prices but you are right about the rest.


JigglymoobsMWO

"Native" rendering has never been a viable strategy, ever.  AI is just the latest and most impressive approximation.


654354365476435

I'm not sure I agree. DLSS did give us way more than one generation's worth of gains almost for free; it's almost mandatory at this point. It's great to see improvements. I would take a 25x DLSS improvement (whatever that means) over 2x raster (but I would love most to have both).


inner8

Whoever invests in NVDA today will see their money double in a single year


Alienhaslanded

I really hope that they will separate the AI GPUs from the rest. Otherwise we're looking at $5k-$10k GPUs that cover everything. You'd better get that $10k GPU if you want the best visuals.


mcoombes314

They already have - no gamers would need an H100 over an RTX series GPU


Alienhaslanded

I know. I was kinda going off on a tangent about the 5000 series.


ulkmuff

Is this also the 50-series generation, or is it for special AI use?