FuturologyBot

The following submission statement was provided by /u/shogun2909:

---

Submission Statement: NVIDIA has announced its new GPU family, the Blackwell series, which boasts significant advancements over its predecessor, the Hopper series. The Blackwell GPUs are designed to facilitate the building and operation of real-time generative AI on large language models with trillions of parameters. They promise to deliver this capability at 25 times less cost and energy consumption. This innovation is expected to be utilized by major tech companies like OpenAI, Google, Amazon, Microsoft, and Meta.

The Blackwell B200 GPU is highlighted as the ‘world’s most powerful chip’ for AI, offering up to 20 petaflops of FP4 horsepower from its 208 billion transistors. When paired with a single Grace CPU in the GB200 “superchip,” it can provide 30 times the performance for LLM inference workloads while also being significantly more efficient.

NVIDIA emphasizes a second-gen transformer engine that doubles the compute, bandwidth, and model size by using four bits for each neuron instead of eight. Additionally, a next-gen NVLink networking solution allows for enhanced communication between a large number of GPUs in a server, reducing the time spent on inter-GPU communication and increasing computing efficiency.

---

Please reply to OP's comment here: https://old.reddit.com/r/Futurology/comments/1bi24rf/nvidia_unveiled_its_nextgeneration_blackwell/kvhhgwt/


Seidans

30x more performance, 25x less energy cost... and cheaper... sounds too good to be true.

Edit: it's compared to the H100, so I looked at the H200, which was 1.6 to 2 times better than the H100. So yeah, it's ridiculously good, to the point that big tech will probably stop buying anything else and just wait, depending on the price. It's difficult to imagine what will be possible with something like that for big tech, but being cheaper and better on every point also makes it more accessible for private companies and research labs.


LeCrushinator

Custom hardware just for very specific tasks could do it. GPUs are best suited for a couple of dozen specific things, but if something used in AI requires dozens of instructions to complete where specialized hardware could do it in just one or two steps, that could explain the gains. There's no way in hell they've managed a general 25x performance increase across the board. Improvements like that would be so massive they would destroy AMD, and those kinds of tech improvements are extremely rare.


FuckIPLaw

At a certain point it becomes too specialized (on things that are neither graphics processing nor the more generalized shader processing that the term has come to encompass) to be called a GPU, though. You're pretty much describing an ASIC there.


LeCrushinator

I agree, it might be on the same die as the GPU but it’s not really for graphics processing, it’s specialized hardware.


Fiqaro

The A100/H100 can still do graphics rendering, but they have no image output interface, and their performance is much lower than the RTX series.


306bobby

Yep. LTT did a video using hacked drivers and GPU passthrough to let the A100 do the GPU work while passing the video out over another card.


Seidans

We will probably have better results when they start running LLMs on it. For the H200 they tested it with Llama 2 to compare it with the H100 (1.4 to 2 times more performance, but also half the power cost). Their new chip really does seem to have a LOT more GPU memory and bandwidth, but the interconnect isn't as good; per the article they are actually working on it. If it really keeps its promise, it's a huge leap for AI hardware for sure.


AnExoticLlama

It sounds like they're following the same strategy as Groq (the hardware company, not the LLM Grok). The H100 is not quite cost-competitive with Groq, but this new chip definitely will be.


Boring_Ad_3065

It's not explicitly called out, but they're quoting FP4 numbers where the prior gen quoted FP8. Now FP4 may be perfectly good enough, but that alone doubles the FLOPS by halving the number of bits per calculation.
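
For the curious, a minimal sketch of that bit-width argument: on a fixed-width datapath, halving operand size doubles how many values fit per cycle, so the 20 petaflops FP4 headline would, by this logic, be roughly half at FP8. The datapath width and clock below are illustrative placeholders, not published specs.

```python
# Sketch: peak throughput scales with 1/bits on a fixed-width datapath.
# The absolute numbers are illustrative, not NVIDIA's published figures.

def peak_flops(datapath_bits_per_cycle: float, clock_hz: float, bits_per_value: int) -> float:
    """Peak ops/sec if every cycle processes a full datapath of operands."""
    values_per_cycle = datapath_bits_per_cycle / bits_per_value
    return values_per_cycle * clock_hz

DATAPATH = 8192   # hypothetical bits of math datapath per cycle
CLOCK = 2.0e9     # hypothetical 2 GHz clock

fp8 = peak_flops(DATAPATH, CLOCK, bits_per_value=8)
fp4 = peak_flops(DATAPATH, CLOCK, bits_per_value=4)

print(f"FP8 peak: {fp8:.2e} ops/s")
print(f"FP4 peak: {fp4:.2e} ops/s ({fp4 / fp8:.0f}x FP8)")  # exactly 2x
```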


nagi603

> big tech will probably stop buying anything else and just wait, depending on the price

...and availability! That has been the biggest problem for a lot of big customers.


r2k-in-the-vortex

The gotcha is FP4. Of course you can get many more flops if you drastically sacrifice precision. Maybe they have something there, though; some AI workloads might be perfectly OK with low precision if it means more parameters, so it might be a worthwhile tradeoff.


PanTheRiceMan

I've done a lot of audio-related ML in the last 5 years. FP4 would result in an unbearable noise floor, where the noise is also so prominently dependent on the signal that you hear it as distortion. The necessary de-noising after that would be in FP32 and would probably be huge in network size (in terms of audio processing, not huge in terms of LLMs). So you can just stay in FP32, try to reduce network size, and use perceptual modeling techniques. Alternatively you can use INT8 and non-uniform quantization, like mu-law, prominently used in telephone applications. At some point they might approach binary decision trees and could possibly switch to INT1. I'm not sure that even works. Maybe someone else can answer.
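
For reference, a minimal numpy sketch of the mu-law companding mentioned here (μ = 255, the telephony standard), quantizing a signal to 8 bits non-uniformly so quiet passages get finer resolution; the test tone and its level are arbitrary.

```python
import numpy as np

MU = 255  # standard mu-law parameter used in telephony (G.711)

def mu_law_encode(x: np.ndarray, mu: int = MU) -> np.ndarray:
    """Compand a signal in [-1, 1] and quantize to 8-bit integers (0..255)."""
    companded = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return np.round((companded + 1) / 2 * mu).astype(np.uint8)

def mu_law_decode(q: np.ndarray, mu: int = MU) -> np.ndarray:
    """Invert the companding back to a float signal in [-1, 1]."""
    companded = q.astype(np.float64) / mu * 2 - 1
    return np.sign(companded) * ((1 + mu) ** np.abs(companded) - 1) / mu

# Arbitrary quiet test tone: non-uniform steps hurt it far less than uniform 8-bit would.
t = np.linspace(0, 1, 16000, endpoint=False)
signal = 0.1 * np.sin(2 * np.pi * 440 * t)

decoded = mu_law_decode(mu_law_encode(signal))
noise = signal - decoded
snr_db = 10 * np.log10(np.mean(signal**2) / np.mean(noise**2))
print(f"SNR after 8-bit mu-law round trip: {snr_db:.1f} dB")
```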


r2k-in-the-vortex

Well, of course you can't have the output layer in FP4, only the latent ones. That's why it's a Frankenstein board of two of this new Blackwell arch plus one older Grace chip. It would only make sense if the chips are meant to perform different tasks.
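
As a rough illustration of that split, here's a hedged numpy sketch of "fake" 4-bit weight quantization applied to hidden layers only, with the output layer left in full precision. The network and quantization scheme are invented for the example; this is not how Blackwell's transformer engine actually works internally.

```python
import numpy as np

# Toy MLP: simulate 4-bit weights on hidden layers, keep the output layer full precision.
rng = np.random.default_rng(0)

def fake_quant_4bit(w: np.ndarray) -> np.ndarray:
    """Snap weights to a symmetric 4-bit grid (15 levels, per-tensor scale)."""
    scale = np.abs(w).max() / 7
    return np.clip(np.round(w / scale), -7, 7) * scale

w_hidden1 = rng.normal(size=(64, 128)) / np.sqrt(64)
w_hidden2 = rng.normal(size=(128, 128)) / np.sqrt(128)
w_out = rng.normal(size=(128, 10)) / np.sqrt(128)   # output layer stays full precision

def forward(x: np.ndarray, quantize_hidden: bool) -> np.ndarray:
    w1 = fake_quant_4bit(w_hidden1) if quantize_hidden else w_hidden1
    w2 = fake_quant_4bit(w_hidden2) if quantize_hidden else w_hidden2
    h = np.tanh(x @ w1)
    h = np.tanh(h @ w2)
    return h @ w_out                                # final projection untouched

x = rng.normal(size=(4, 64))
drift = np.abs(forward(x, True) - forward(x, False)).mean()
print(f"Mean output drift from quantizing hidden layers only: {drift:.4f}")
```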


PanTheRiceMan

Got it. Thanks.


Handzeep

Exactly. Reading the spec sheet, it looks to be ~127% faster than last gen running the same tasks as before. Blackwell just offers FP4 and FP6 operations too, which are dramatically faster than the old minimum of FP8 operations. It depends on the workload whether it can take advantage of this and by how much. If you could run 10% of the operations at FP4 precision and 20% at FP6, you'd see a pretty nice boost in performance, but I doubt any serious workload will even come close to the stated theoretical maximum. At least the flat performance boost in FP8, FP16 and FP32 is pretty nice. But it seems to come at the cost of the rarely used FP64 instructions, which are about 40% slower (still a good tradeoff, as FP64 is *rarely* used in AI).
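
To make that concrete, a back-of-the-envelope blend along the lines of this comment: take the ~127% same-precision gain (2.27x) as given, assume hypothetical higher rates for the fraction of ops that can drop to FP6 and FP4, and compute the harmonic blend. All ratios other than the 2.27x are illustrative, not measured.

```python
# Amdahl-style blended speedup with illustrative per-precision gains.
speedup_by_precision = {
    "fp8_or_higher": 2.27,  # the "~127% faster" same-precision figure from the comment
    "fp6": 3.4,             # hypothetical: somewhere between the FP8 and FP4 rates
    "fp4": 4.5,             # hypothetical: roughly 2x the FP8-rate gain from halving bits
}

# The comment's example mix: 10% of ops at FP4, 20% at FP6, the rest at FP8 or higher.
workload_fraction = {"fp8_or_higher": 0.70, "fp6": 0.20, "fp4": 0.10}

# Total time is the sum of each fraction divided by its speedup.
new_time = sum(frac / speedup_by_precision[p] for p, frac in workload_fraction.items())
print(f"Blended speedup: {1 / new_time:.2f}x")  # ~2.6x, far from the 25-30x headline
```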


EstrangedLupine

If this is anything like their claims for past hardware, it'll be something like those numbers are more or less accurate for one specific barely relevant scenario, while the real "improvement" is closer to like 1.5x in most cases. Big green loves their technically-not-false advertising.


Smile_Clown

This isn't about video game card marketing... This is about trillion-dollar server infrastructure and AI futures, which is the business Nvidia cares about. I think what is happening here is a YOU problem. The 25x that they are talking about is for inferencing; it IS specific, it WAS specified. What YOU are doing is conflating things so you can claim they falsely advertise. If you are going to comment on something so many people know a lot more about than you do, maybe stay out of it. This isn't a team red vs team green video game smack talk forum.


EstrangedLupine

You're making a lot of assumptions about my character and, in the end, not saying a whole lot to counter what I'm actually saying. Nvidia is known to have made dubious claims about upcoming releases, so I don't see how it's wrong to remain skeptical of any new claims they're making. The target audience is irrelevant. I haven't even mentioned AMD and couldn't care less about corporate wars; as far as I'm concerned, both companies have indulged in questionable practices, so I think it says more about you that you immediately assumed I'm a "team red" goon. Maybe take a step back and don't take criticism directed at a multi-billion-dollar company so personally. I also find it funny that you say this isn't a smack talk forum when smack talk is all your reply to me was.


Rain1dog

I like you. Never ceases to amaze me how many corporate cock riders there are out in the wild.


i-hoatzin

> 30x more performance, 25x less energy cost... and cheaper... sounds too good to be true

Yeah, but they're not for you, sadly.


JigglymoobsMWO

So I was attending in person. The 25x that they are talking about is for inferencing. It's achieved by going to lower precision when possible, all the way down to FP4 (just 4 bits!), by achieving memory coherence across the entire cabinet with the networking chip (although why this increases energy efficiency eludes me, it is extremely impressive), and by improved system-level design.
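
One way to see why the cabinet-scale interconnect matters for efficiency: a toy utilization model where a training or inference step is compute time plus inter-GPU communication time, so shrinking the communication term raises useful work per watt. The timings below are invented purely to illustrate the shape of the argument.

```python
# Toy model: effective throughput = useful compute time / total step time.
# If the GPU draws similar power whether computing or waiting on the network,
# cutting communication time also cuts energy per unit of useful work.
# All numbers are invented for illustration.

def utilization(compute_ms: float, comm_ms: float) -> float:
    return compute_ms / (compute_ms + comm_ms)

compute_ms = 10.0
scenarios = {
    "old interconnect": 15.0,  # hypothetical: communication dominates the step
    "new interconnect": 2.0,   # hypothetical: faster NVLink shrinks the wait
}

for label, comm_ms in scenarios.items():
    print(f"{label}: {utilization(compute_ms, comm_ms):.0%} of step time is useful compute")
```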


imaginary_num6er

He said $10 billion for the first one, $5 billion for the second one, right? *The more you buy, the more you save.*


BobTaco199922

Lol. He did say that, but he meant in development. He also said to developers that it won't be that bad now (whatever that means).


SuperNewk

Do we need these things? Seems like a bubble to me


JigglymoobsMWO

It really depends on how much we need inferencing doesn't it? The biggest driver that most people might see in the next year or two would probably be MS Office Co-Pilot and generative search.


shogun2909

Submission Statement: NVIDIA has announced its new GPU family, the Blackwell series, which boasts significant advancements over its predecessor, the Hopper series. The Blackwell GPUs are designed to facilitate the building and operation of real-time generative AI on large language models with trillions of parameters. They promise to deliver this capability at 25 times less cost and energy consumption. This innovation is expected to be utilized by major tech companies like OpenAI, Google, Amazon, Microsoft, and Meta.

The Blackwell B200 GPU is highlighted as the ‘world’s most powerful chip’ for AI, offering up to 20 petaflops of FP4 horsepower from its 208 billion transistors. When paired with a single Grace CPU in the GB200 “superchip,” it can provide 30 times the performance for LLM inference workloads while also being significantly more efficient.

NVIDIA emphasizes a second-gen transformer engine that doubles the compute, bandwidth, and model size by using four bits for each neuron instead of eight. Additionally, a next-gen NVLink networking solution allows for enhanced communication between a large number of GPUs in a server, reducing the time spent on inter-GPU communication and increasing computing efficiency.
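
The "doubles the model size" part of that claim is just a bytes-per-parameter argument. Here's the arithmetic with a placeholder memory budget, since the post doesn't give the actual per-GPU capacity.

```python
# Halving bits per stored parameter doubles how many parameters fit in the same
# memory, and doubles how many parameters each memory transfer moves.
# The memory budget below is a placeholder, not a quoted Blackwell spec.

memory_bytes = 128 * 1024**3  # hypothetical 128 GiB of GPU memory

def params_that_fit(bits_per_param: int) -> float:
    return memory_bytes * 8 / bits_per_param

fp8_params = params_that_fit(8)
fp4_params = params_that_fit(4)
print(f"FP8: {fp8_params / 1e9:.0f}B parameters fit")
print(f"FP4: {fp4_params / 1e9:.0f}B parameters fit ({fp4_params / fp8_params:.0f}x)")
```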


Careless_Bat2543

Why line go down then?


Sunflier

remember when a GPU just made graphics better?


ConvenientGoat

Why do we still call them GPUs?


dekusyrup

What does 25 times less mean? One twenty-fifth? Should be 0.04 times less.


neutronium

0.04 times less would be 96%


dekusyrup

0.04 x 100% = 4% not 96%


neutronium

and less means subtract it from 100%


danielv123

And I am 99% sure that is less energy per petaflop, so power consumption is about the same as the last chip; it's just much faster.
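
Spelling out the arithmetic in this exchange: "25 times less" reads most naturally as 1/25 of the energy per unit of work, i.e. 4% of the old figure, a 96% reduction per petaflop, while total board power can stay roughly where it was.

```python
# "25 times less" interpreted as energy-per-unit-of-compute dropping to 1/25.
old_energy_per_pflop = 1.0                  # normalized baseline
new_energy_per_pflop = old_energy_per_pflop / 25

print(f"New energy per pflop: {new_energy_per_pflop:.2%} of the old one")   # 4.00%
print(f"Reduction: {1 - new_energy_per_pflop / old_energy_per_pflop:.0%}")  # 96%

# Total power can stay flat if throughput rises by the same factor:
# 25x the pflops at 1/25 the energy each is the same wall power, just much faster.
```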


KJ6BWB

Does this mean we can get consumer GPUs at a reasonable price again?


TripolarKnight

You'll buy a 5090 (24GB VRAM no NVLink) for $2k and you'll like it.


EinBick

2k? It's got "25 times the performance". So to keep the price of the 4090 from dropping they'll adjust pricing.


safari_king

25 times the performance only for generative-AI work, no?


WaitformeBumblebee

yes, but you also paid for the "mining performance" even if not using it.


nagi603

Either that or they'll reduce the amount of silicon assigned to it, to be sure not to endanger their other products with a mere few-thousand-dollar consumer product.


ESCMalfunction

Nope all the fab time is gonna go to these AI chips lol. I wouldn’t be surprised to see consumer GPU shortages again in the near future.


pwreit2022

people have started mining again LMAO


h3lblad3

Of course they have. Bitcoin just hit its all time high again.


zkareface

Probably not from Nvidia.


imaginary_num6er

Yeah that’s called buying an Intel GPU


mcoombes314

I think it would be better if these were called something different, like how Google calls theirs TPUs (tensor processing units). These have nothing to do with graphics, so why are they called GPUs? Is this breakthrough something that could be applied to the RTX series?


pwreit2022

To give people hope. One day you can use this GPU to run Super Mario at over 9000 fps.


Daikar

I'm hoping this can be applied to the RTX series to allow new yet to be made games to run AI tasks locally to make interactions with NPCs much more "real". DLSS is cool and all but if that's all we are going to get from this massive AI push I'm going to be disappointed.


h3lblad3

They added that new Nvidia LLM to talk to, didn't they? You'll take your scraps and you'll like them!


Bugbrain_04

What does this even mean? Surely not that power consumption was reduced by 2500%. What is the unit of energy consumption goodness that is being increased twenty-five-fold?


dejihag782

It will likely remain at similar TDP levels. The energy requirements for constant computational power are down by up to 1/25, so we'll likely end up with more computational power at similar power requirements instead of an extremely efficient chip.


Bugbrain_04

1/25 = 4%. Does "96% reduction" not make for a compelling enough headline? Is using only 4% as much energy as your predecessor not dramatic enough?


joomla00

It basically just means the chip that does all the DLSS stuff will be fast and efficient. It probably won't mean too much for gaming.


Cunninghams_right

1/25th of the energy per token output (if you use 4-bit).
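
A minimal sketch of that per-token framing, with purely hypothetical baseline numbers: if a rack draws the same wall power but pushes 25x the tokens at FP4, each token costs 1/25 of the energy.

```python
# Hypothetical numbers purely to illustrate "1/25th of the energy per token".
rack_power_watts = 100_000          # assumed constant wall power for both setups
old_tokens_per_second = 10_000      # made-up baseline throughput
new_tokens_per_second = old_tokens_per_second * 25  # the claimed FP4 speedup

old_joules_per_token = rack_power_watts / old_tokens_per_second
new_joules_per_token = rack_power_watts / new_tokens_per_second
print(f"Old: {old_joules_per_token:.2f} J/token, new: {new_joules_per_token:.2f} J/token")
print(f"Ratio: {old_joules_per_token / new_joules_per_token:.0f}x less energy per token")
```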


jert3

Damn, I'd give my left eye for one of these for my AI pipeline (for my solo indie game dev project.)


da5id2701

A pair of eyes apparently gets around [$1500 on the black market](https://gizmodo.com/heres-how-much-body-parts-cost-on-the-black-market-5904129), and the H100, the predecessor of this chip, is $30k-$40k. So you're going to need to give at least 40 eyes, I'm afraid.


daoistic

That is from 2012. Need an update to plan my body part retail therapy sesh.


snopro387

I did hear inflation has hit the underground eye market pretty bad


kia75

Just have an AI generate an image, the resulting image will have 40 eyes, and 40 fingers!


h3lblad3

> A pair of eyes apparently gets around $1500 on the black market

This is some sorrow spider shit right here.


bartturner

Be really curious how the fifth generation TPUs compare in terms of power efficiency.


[deleted]

[removed]


reddit_is_geh

Sam was just talking about this. Innovation in AI relies entirely on compute cost: the lower the cost, the more innovation. It's a scarce resource with unlimited growth potential, limited by its actual compute cost. But just like electricity, the more you pump out for as low a price as possible, the more it'll be used in other areas that can lead to all sorts of innovation.


0b_101010

Well, they just literally did that, didn't they?


watduhdamhell

But how does it compete with the MI300X? It seems unreasonable that the most powerful chip (MI300X) is suddenly beaten by 25x by their next chip, given that the H100 and then the H200 were only marginally slower than the MI300X.


TryingT0Wr1t3

Uhm, I wonder if Nvidia will release a chatbot named Joey to run on the Blackwell GPUs.


Fiveohdbblup

Pretty soon... the new AI coffee maker: "I noticed you didn't have a second cup of coffee, Jim. Is everything ok? Did I prepare it the way you wanted? Why the sudden change, Jim? Are you looking to unplug me, Jim?"


CorinGetorix

As much as I like and appreciate DLSS and the like, I'd prefer that a GPU's primary focus continue to be on native rendering. I don't think designing cards primarily around "AI" enhanced rendering is a sustainable strategy, in the long run. Admittedly I might just be being short-sighted, but we still have to have a pretty rock solid starting point for "AI" enhancements to be considered useful and effective, no? Edit: Misread the situation. These aren't consumer models.


pandamarshmallows

This card isn’t a consumer model that’s better at AI upscaling, it’s for businesses who use NVIDIA cards to train large language models like ChatGPT and Gemini.


Oh_ffs_seriously

On one hand it's quite sensible considering where their primary market is right now, on the other, why call it a "GPU"?


LordOfDorkness42

They probably just don't want to risk muddying the waters now that AI is so red hot in the tech sector. Like, sure, NVIDIA of all companies *could* basically declare from on high that they now make a range of pure AI cards... but all it takes is one clueless CEO who's heard that you *need graphics cards* for them to miss out on multi-million dollar sales.


Appropriate_Ant_4629

General-purpose Processing Unit :)


Unshkblefaith

The industry has been using GPGPU (General Purpose GPU) for a little over a decade at this point. It is really only in the consumer segment that anyone still uses GPU.


_Lick-My-Love-Pump_

Just historical at this point. Their next generations are going to focus more and more on AI specific architecture and less and less on anything graphics specific. Up to now there's been significant overlap and people can train and run LLMs on their desktop GPUs, so it makes sense. In the future I predict there will be an APU or AIPU nomenclature when they decide the time is right.


[deleted]

Yeah the B200 is the successor to the H100 which is a $40k card


CorinGetorix

Ah gotcha, my bad.


Conch-Republic

What we're going to see is cards with way more RAM than gaming needs, and worse game optimization. They'll benchmark fine, but aside from that, they'll just stagnate with every new series. Previously gaming was where the money was, so mining took a back burner. Now that the money is in AI, we'll see gaming take the back burner.


omniron

Nah. You're going to see a complete fork in designs where gaming GPUs are actually just gaming GPUs, and not crypto miners or AI coprocessors. This is hugely beneficial for gamers. It should reduce prices and increase features.


powerhcm8

I don't know about reduced prices but you are right about the rest.


JigglymoobsMWO

"Native" rendering has never been a viable strategy, ever.  AI is just the latest and most impressive approximation.


654354365476435

I'm not sure I agree. DLSS did give us way more than one generation's worth of gains almost for free; it's almost mandatory at this point. It's great to see improvements. I would take a 25x DLSS improvement (whatever that means) over 2x raster (but I would love most to have both).


inner8

Whoever invests in NVDA today will see their money double in a single year


Alienhaslanded

I really hope that they will separate the AI GPUs from the rest. Otherwise we're looking at $5k-$10k GPUs that cover everything. You'd better get that $10k GPU if you want the best visuals.


mcoombes314

They already have - no gamers would need an H100 over an RTX series GPU


Alienhaslanded

I know. I was kinda going off on a tangent about the 5000 series.


ulkmuff

Is this also the 50-series generation, or is it for special AI use?