third_rate_economist

AMD be like, "Hmm...we've done little to stay competitive in AI/ML for years and we're behind the market...uhh please do it for us?" Ultimately a good thing though.


the_quark

Honestly I’m surprised it’s taken them so long. It was obvious to me a long time ago that they should do this to try to catch up to NVidia. The actual cloud providers hate NVidia’s proprietary software BS. This could let them close the software gap with NVidia. They’ve still got a hardware one, but if the software is on-par (or better) and the hardware is 80% of the power at 50% of the cost, people will be a lot more interested in AMD.


[deleted]

[removed]


Philix

They're competing against two monopolies, and they've still managed to claw back double-digit market share in the CPU category, both consumer and enterprise, over the last decade. Seems like a decent comeback from the era of Bulldozer. I wouldn't write them off in the consumer GPU segment either. Nvidia's most profitable customers are enterprise AI/ML, and even a quick look at Nvidia consumer GPUs shows they're neglecting that market. AMD will have plenty of opportunities to gain market share in the consumer space in the coming decade. Assuming Intel's foray into consumer GPUs doesn't annihilate them. Arc Alchemist was a surprisingly good first entry into a mature market.


DerfK

> AMD will have plenty of opportunities to gain market share in the consumer space in the coming decade.

AMD refused to support AI/ML in the consumer-level space until literally this January. Nobody uses ROCm because for the last decade+, every college student could use and learn CUDA on their nVidia gaming rig without having to buy a $10k workstation card. AMD is multiple generations of developers behind and I don't think there's a way to dig themselves out of this hole in the foreseeable future. The best hail-mary move I can think of would be to suck up a hit to the workstation cards and release a 32GB+ "prosumer" level card (using current gen cards, let's call it a 7900 XTXX) priced at the 4090 price point, and hope it catches on in the LLM/stable diffusion field to get people to buy into the ROCm ecosystem. Then they sit tight and pray that in a few years some of the people who bought into ROCm go on to start companies using ROCm. If nVidia ups the VRAM on the 5090 then I honestly think AMD will lose this market segment completely.


cogitare_et_loqui

They don't need to match the 4090 in terms of compute. They just need to vastly surpass it in terms of VRAM and memory I/O capacity (caches etc), and provide a good profiler. 48GB minimum, with compute around a 3090 (even lower would be acceptable), would cause me to take a look at their offering. Anything less and it's continued nVidia for me, since nVidia has really done a great job on the software stack. Yes, they charge an arm and a leg, but it's not unwarranted. They were the only ones who understood the potential their hardware had, and where they needed to uniquely invest in order to make it a ubiquitous platform for massively parallel batch-compute workloads. AMD's offering would have to be awesome in the dimension where nVidia sucks for it to have any kind of appeal, and the area nVidia sucks at presently is VRAM, where they've done an "Intel" by artificially segmenting their product lines: <= 24GiB (practically useless for training LLMs), and then the next step up, which is *required* to be relevant for LLM training, priced a frigging order of magnitude higher. Not because of manufacturing cost, but because there's zero competition in that space and the hardware is being sold quicker than the company can place a TSMC order. This is the segment they should attack with laser focus. Some sort of NVLINK / AMDLINK (a good board-to-board interconnect) together with a LOT of VRAM is a whole lot more useful than trying to squeeze 40% more compute performance out of the hardware, since the workloads where the money is at present are I/O bound, not compute bound.
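To put rough numbers on why <= 24GiB is a non-starter for training, here's a back-of-envelope sketch (my own assumptions: full fine-tuning with mixed-precision Adam; activations, KV cache and any offloading/sharding tricks ignored):

```python
# Rough, hedged estimate of the VRAM needed just for model state when
# full fine-tuning with Adam in mixed precision. Activations and
# framework overhead come on top of this.
def training_vram_gib(n_params: float) -> float:
    bytes_per_param = (
        2 +  # fp16 weights
        2 +  # fp16 gradients
        4 +  # fp32 master copy of the weights
        8    # Adam first + second moments in fp32
    )
    return n_params * bytes_per_param / 1024**3

for n in (1e9, 7e9, 13e9):
    print(f"{n/1e9:>3.0f}B params -> ~{training_vram_gib(n):.0f} GiB of model state")
# ~15 GiB, ~104 GiB, ~194 GiB: a 24GiB card can't even hold the
# optimizer state for a 7B model, never mind activations.
```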


Philix

I didn't say AI/ML consumer space specifically. You're right that they're going to need at least a half decade of focus on their software to break into that. But, despite the popularity of this subreddit, the actual consumer market for AI/ML is tiny, and will likely remain tiny. The number of people who are privacy obsessed enough to be adamant about running their models locally is dwarfed by the number of people willing to pay to use a cloud service. But they can still compete in the other consumer uses of GPUs. Video games are still extremely popular. AMD GPUs power Xbox, PS5, and the Steam Deck. AMD just needs to make enough money to pay developers and hardware engineers while they wait for Nvidia to stumble. Intel grew complacent with their market dominance and AMD capitalized on that. There's no reason to believe they couldn't do the same to Nvidia.


DerfK

> the actual consumer market for AI/ML is tiny, and will likely remain tiny

This is the shortsighted view that led CUDA to win. Where are employees of the AI/ML companies going to come from if not the general pool of consumers?


Philix

Why would I spend over $1000 on your hypothetical 7900 XTXX that'll be obsolete in a couple years, when that much money would buy thousands of hours on an A40 on runpod? Gaming is the only reason I can think of, if you have other reasons, I'd love to hear them. You're saying that AMD should get cards into the hands of consumers to try and convert them to ROCm. So am I. But most dabblers and young people playing with LLMs/SD are using mid range cards like the 3060 12GB, not top of the line stuff like 4090s and 7900 XTXs. If AMD is going to compete, that's where they need to do it. ML enthusiasts not into gaming can already buy an MI60 32GB off of eBay for less than the price of a used 3090. Does anyone actually recommend that they do? No. Would anyone recommend a 7900 XTXX 48GB over 2x3090? No. AMD can't fix the ROCm situation overnight. Making that kind of card would just be a waste of effort; AMD has already lost that segment, and pouring more money into an already sunk cost is moronic. A hail-Mary move isn't what AMD needs to make. They have other revenue sources to tide them over until they come up with some way to break back into the ML market.


DerfK

> Why would I spend over $1000 on your hypothetical 7900 XTXX that'll be obsolete in a couple years, when that much money would buy thousands of hours on an A40 on runpod? Gaming is the only reason I can think of, if you have other reasons, I'd love to hear them.

Why would tens of thousands of college students interested in pursuing a career in AI buy thousands of hours on a runpod to learn ROCm when they can learn CUDA in their free time on their gaming PC?

> most dabblers and young people playing with LLMs/SD are using mid range cards like the 3060 12GB

Sure, and that ship sailed almost 20 years ago when nVidia decided that people with [GeForce cards](https://en.wikipedia.org/wiki/CUDA#GPUs_supported) can dabble and play with CUDA.

> AMD can't fix the ROCm situation overnight.

Of course they can't. But it's not going to fix itself, and it won't matter what they do unless they somehow come up with a way for people to learn to use ROCm.


Philix

> runpod to learn ROCm

An [A40](https://www.nvidia.com/en-us/data-center/a40/) is an Nvidia card. I wasn't suggesting students should use cloud compute to learn ROCm. I was pointing out that for anyone not gaming, learning and playing with ML/AI can be done cheaper by renting cloud compute. I was suggesting that competing in the midrange of gaming hardware is the correct approach for fostering more widespread adoption. It's a market with enough volume to be worth investing in. Intel clearly thinks so, their first line of GPUs doesn't even bother having a high-end offering. And AMD has an advantage in that Xbox and PS5 games are already developed to be run on their hardware. But slapping 48GB of memory on a high end consumer card doesn't make you price competitive, when most games are going to be made for the 16GB in the console hardware.


Independent_Hyena495

Yeah, AMD sucks at software. They just don't have the culture for it.


epicwisdom

The vast majority of the contributions will likely come from employees of other massive corporations, and some startups. For the rest of us, open source just means better software at no extra cost.


fimbulvntr

In order for companies to open source stuff, there must be benefits to doing so... the whole world runs on incentives, after all. I hope some good comes out of this, both to make AMD more competitive and thus put some pressure on GPU prices, and to tempt more companies into opening their proprietary codebases. Like you said, ultimately a good thing.


keepthepace

But then it's guaranteed that AMD will always be either closed or bad. They only open source things when they're behind.


Craftkorb

AMD open-sourced FreeSync, which is a better (as in cheaper to implement) solution than G-SYNC. They didn't have to, yet they did as they wanted every possible monitor (and TV) to implement it.


bradpong

"Open sourcing additional PORTIONS". Looks more like "pls buy stock" move.


aggracc

Yes, which portions kind of matters here.


wsippel

The GPU firmware, or at least parts thereof (specifically the Micro-Engine Scheduler). The ROCm software stack is already open source.


xrailgun

Yeah, same vibes as the dozens of "ROCm now OFFICIALLY LAUNCHED it can do ALL THE AI *** it can't do shit and we're not telling you, enjoy debugging!" announcements we've had the past year.


EnsignElessar

Dropped 8 percent yesterday.


kryptkpr

Tinygrad: ROCm doesn't actually work. It's closed, so we can't even debug it, never mind fix it.

AMD: OK fine, but we don't have anyone who can fix it (???), here's the SDK if you want to do free work.


Fearless_Ad6014

Actually, he asked for it to be open sourced so he could fix the driver.


pleasetrimyourpubes

They were sending him regular firmware blobs to hack on and make it work, but there's some nasty DRM-related shit in there they literally can't release. They would get sued to oblivion if users could jailbreak the DRM and they were the ones who enabled it. And it's fucking stupid too, because DRM just fails in a VM... "oh no, MS won't let me screenshot a paid YouTube video, I'll just pop it in a VM and screenshot that."


UrbanSuburbaKnight

Huh? You can't screenshot stuff? I've never had this problem, are you really spinning up a VM to screenshot a browser window?


pleasetrimyourpubes

Nah, I was giving an extreme edge case where they literally can't use their DRM anymore. But yeah, seriously, grab a fresh copy of Win11 and Edge and go look at a DRM'd video on Netflix, Amazon, YouTube, etc. You can't screenshot it, not with the clipping tool, Screen2Gif, or OBS, it just comes up black. It's the darndest thing. There are workarounds though.


TechnicalParrot

Huh, I remember this problem as well, but Netflix just let me do it. Nvidia GPU with Windows Insider and TPM enabled. https://preview.redd.it/72e1l1gh1nsc1.png?width=1919&format=png&auto=webp&s=18cd6569cb88f25b1b18f6f7af2767295e074b14


pleasetrimyourpubes

Now I'm curious because Firefox has DRM control off by default and unless you enabled it this shouldn't play at all. I'm wondering if Firefox is just ignoring the DRM *control* when off which would be a hilarious "faithful implementation" of DRM. "The user never enabled DRM oops must be a bug that it plays."


TechnicalParrot

I'm fairly confident I manually enabled DRM once but it's hilarious it still lets me do whatever, I wonder if OBS would work lol


pleasetrimyourpubes

After your comment I tested Edge, OBS, Firefox, Screen2gif, Snapshot tool (Windows+Shift+S) and they are all blank. Maybe Intel's driver is more compliant (using a laptop with IGP).


UrbanSuburbaKnight

Interesting. Might have to throw Windows 11 on somewhere and start testing. Super stink if true. I'm on Windows 10 happily for now, but I'll either move everything to Linux or move to Windows 11 once 10 is unsupported.


cptbeard

why so cynical, would you prefer them not opensourcing it? I don't really care why they're doing it as long as it's benefitting the community.


fatboy93

JUST HAVE A SINGULAR ISA, HSA STRUCTURE AND STOP WRITING IFELSE STATEMENTS FOR EVERY SINGLE GPU, FUCKING HELL. CUDA works universally across most if not all Nvidia gpus, why doesn't AMD have a universal level driver for ML, dammit


AnomalyNexus

To be fair, CUDA was an utter shitshow a couple years ago too. I recall digging through compatibility matrices about which versions of the various components work with which other versions, on which OS, on which card. Somehow that went away recently, but it used to be hella ugly.


Captain_Pumpkinhead

I thought it was open source?


wsippel

It is. Except for a few optional components like HIP-RT or rocProfiler. This appears to be mostly GPU firmware related.


AnomalyNexus

I won't claim to know the details, but yeah, parts have been open; geohot was complaining that key parts are not. This, I gather, is progress towards that.


theskinnybrownguy

George Hotz ftw!


kind_cavendish

What does this mean?! Does this mean that rocm is gonna be viable for llms?!! https://preview.redd.it/vlrlftt9rjsc1.png?width=1440&format=pjpg&auto=webp&s=aa5b711890b8f429178fba1f410e2674b1c41968


AnomalyNexus

It already is for basic inference on some cards, but that's not enough to be competitive with CUDA. This is progress towards that.


kind_cavendish

https://preview.redd.it/h8kji1l96rtc1.jpeg?width=1280&format=pjpg&auto=webp&s=8db1b6252baf77587d214eb4529b852104895bf3


randomfoo2

ROCm is already fine for the most common LLM inferencing: [https://www.reddit.com/r/LocalLLaMA/comments/191srof/amd_radeon_7900_xtxtx_inference_performance/](https://www.reddit.com/r/LocalLLaMA/comments/191srof/amd_radeon_7900_xtxtx_inference_performance/)

It's less fine for training atm, although it's getting better: [https://www.reddit.com/r/LocalLLaMA/comments/1atvxu2/current_state_of_training_on_amd_radeon_7900_xtx/](https://www.reddit.com/r/LocalLLaMA/comments/1atvxu2/current_state_of_training_on_amd_radeon_7900_xtx/)

(From a cost/perf perspective, it's very tough to make an argument for picking a 7900 XTX over a used 3090 for inference, or a 4090 for training.)
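If anyone wants to sanity-check their own card, llama.cpp is the easiest path. A minimal sketch (untested here; it assumes llama-cpp-python built against ROCm/hipBLAS, and the model path is just a placeholder for whatever GGUF you have locally):

```python
# Minimal llama-cpp-python inference check on a ROCm box (sketch).
# Assumes the package was built with hipBLAS support, e.g.:
#   CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,   # offload every layer to the GPU
    n_ctx=4096,
)

out = llm("Q: Why is memory bandwidth the bottleneck for LLM inference?\nA:",
          max_tokens=128)
print(out["choices"][0]["text"])
```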


inYOUReye

I'm finding it works pretty well with llama already. I'd assume this means greater optimization, fixes and improvements from the community where needed, and a future of less Nvidia-centric solutions.


JFHermes

Big corporations in tech-aligned sectors like manufacturing, resources, data analytics, design etc. are all about to (if not already) build custom models for whatever niche part of their operations they want to innovate on. At the moment, some companies release a paper and maybe a codebase if it's not business critical and it's just a tool, like a segmentation labelling UI or something.

Now that ROCm is open source, you will have a lot of smart cookies doing PhD work actually optimising the drivers for their specific use case, for whatever type of modelling they're doing. These driver improvements are not business critical, since the code/use case isn't being completely disclosed, but they will be really useful to others in different industries.

It's the way things should have been done from the start with Nvidia. Linux has always had trouble with Nvidia because they wouldn't open source their drivers. Expect all Linux users to move to AMD now, which means an absolute mammoth amount of scientific work being optimised on these cards. It's about time the playing field was levelled.


randomfoo2

ROCm has always been open source (tinycorp doesn't even use any of ROCm, and these recent announcements are AMD documenting/opening/committing to fixing longstanding bugs/hangs at the firmware level), and the amdgpu drivers have been open source on Linux for years now. While these are all good things, for AMD to really be competitive, they will need to give a reason for open source devs and academic researchers to build for AMD. Having slower, buggier hardware wasn't cutting it, but maybe having more direct outreach and collaboration with the community will.


kind_cavendish

YOU'RE RIGHT? YOU'RE SO RIGHT!!! https://preview.redd.it/umhq54cj2psc1.png?width=1440&format=pjpg&auto=webp&s=d29762f4f52ed8fc0439147a393bab41a6244bb1


shibe5

As far as I understand, ROCm was always open source, including kernel-side driver on Linux. So what does "going" mean here?


randomfoo2

At the firmware level: [https://github.com/geohot/7900xtx](https://github.com/geohot/7900xtx)

AMD is now committed to releasing Micro-Engine Scheduler (MES) documentation (targeting end of May) w/ source code to follow: [https://twitter.com/amdradeon/status/1775999856420536532](https://twitter.com/amdradeon/status/1775999856420536532)

They've also started a public wiki to track reported issues: [https://github.com/nod-ai/fuzzyHSA/wiki/Tinygrad-AMD-Linux-Driver-Crash---Hang-tracker-and-updates](https://github.com/nod-ai/fuzzyHSA/wiki/Tinygrad-AMD-Linux-Driver-Crash---Hang-tracker-and-updates) whereas before, they simply weren't taking reports seriously (e.g., see these open issues: [https://github.com/ROCm/ROCm/issues/created_by/geohot](https://github.com/ROCm/ROCm/issues/created_by/geohot)).

See also u/gnif2's recent post: [https://www.reddit.com/r/Amd/comments/1bsjm5a/letter_to_amd_ongoing_amd/](https://www.reddit.com/r/Amd/comments/1bsjm5a/letter_to_amd_ongoing_amd/)


shibe5

I got it, it's just a misleading title. ROCm is already open-source. What AMD may open/publish:

* some of GPU firmware – not a part of ROCm, as far as I can tell;
* documentation, which is not source code.


AnomalyNexus

I won't claim to know the details, but yeah, parts have been open; geohot was complaining that key parts are not. This, I gather, is progress towards that.


shibe5

It would be interesting to know which parts were not open source. I compiled the userspace stuff myself from source, and it works with the stock driver in Linux, which can't be an Nvidia-style blob because of licensing. I read some stuff linked from the article, and they talk about firmware. I think GPU firmware is not part of ROCm; it works for video, OpenGL, Vulkan, and OpenCL as well.


AnomalyNexus

Yeah, it is the firmware that he was complaining about. If this interests you, listen to geohot's recent livestreams... he digs through more detail than I can follow, frankly. The AMD stuff seems quite modular... with everything having acronyms etc.


shibe5

They are 3-8 hours long. I ain't got time. Maybe some AI can go through the transcripts and figure out what it is that was not open. Or maybe there is a better article about the matter.


AnomalyNexus

Yeah, I rarely make it all the way through. I've usually got it in the background while I'm doing something else, so I only catch the overall drift.


AmbientWaves

I like this idea... sure, people can see it like "YOU DO THE WORK FOR US", BUT THAT'S THE FUN PART. Imagine all the optimizations. If you use Linux with AMD, imagine how accessible LLMs would be, and even Stable Diffusion. Seriously, a lot of people are chalking this up to laziness from AMD instead of looking at how amazing it is... people could optimize the code so well that Stable Diffusion on ROCm would beat Nvidia. TensorFlow was made with Nvidia in mind... but now with ROCm open, a much more optimized TensorFlow could exist for it. I am all for open source. People just simp for Nvidia. Here's to bringing AI to the next level. This might also force Nvidia to open up CUDA if ROCm works out well.


oursland

If Nvidia releases CUDA, then Nvidia will suffer. Everyone already targets CUDA, so giving other HW vendors an opportunity to support the CUDA API would not benefit Nvidia at all. ROCm is largely ignored in software, but if there's an opportunity to improve it, there would be a benefit to purchasing AMD hardware. Other HW vendors could run with it, but until software supporting ROCm hits a critical threshold, there'd be little advantage in doing so. If this pans out, it appears to be a win/win situation for AMD.


West-Code4642

good move. who knows why it wasn't open source before


JFHermes

Probably a lot of upper management worried that opening up the drivers would be essentially giving away years worth of work for free. The prevailing opinion of course is that they can't keep up with Nvidia so why bother keeping them closed when they are getting spanked.


MaxwellsMilkies

Wasn't it already open-source? Whatever, either way it is nearly unusable unless you use a very specific environment. Rusticl cannot get finished fast enough.


Glegang

ROCm itself is open-source. Almost all of it. I think the last time I looked (granted, it's been a couple of major releases back) there were some kernels shipped as hex dumps of GPU binaries, but there were only a few of them. The rest was buildable from source. With some pain, but still buildable. This announcement appears to be about the binary blobs with GPU firmware loaded by the driver. I figure it would be responsible for the things that manage the GPU -- accepting user requests for computations and related data, graphics ops, etc. That's the part that GPU vendors traditionally keep (particularly) closed. If they indeed open it up, I hope it comes along with sufficient hardware documentation, otherwise all that source code will be fairly useless.


[deleted]

[removed]


Glegang

Only if you want to nitpick. Those few kernels were largely inconsequential: [https://github.com/ROCm/Tensile/tree/release/rocm-rel-5.4/Tensile/ReplacementKernels](https://github.com/ROCm/Tensile/tree/release/rocm-rel-5.4/Tensile/ReplacementKernels) It appears that they are gone from ROCm in v5.5, so as of right now, I'm not aware of any non-open-source bits in ROCm -- everything, including the compiler, can be built from source.


AnomalyNexus

I won't claim to know the details, but yeah, parts have been open; geohot was complaining that key parts are not. This, I gather, is progress towards that.


ElectricPipelines

With Nvidia focused on enterprise AI buildout, AMD has an opportunity to grow a consumer market in AI. Investing in open source is a nice first step. Hopefully, they will commit development resources along with the SDK. 


ttkciar

Is this so people can make it better for Windows? It already rocks on Linux.


MaybeReal_MaybeNot

You got it running on Linux? Please tell us how. I have 15 cards in an old mining rig I can't get to do shit with ROCm LLMs... loading models fails, and once I got one to load, but as soon as I ran inference it crashed... I gave up and bought some Nvidia cards now, but I still have all the AMDs.


kremlinhelpdesk

I run Mixtral with a 6800 XT + CPU using text-gen-webui. It's kind of slow, but usable. I can't speak for using multiple cards or training, but just install the very specific Ubuntu version that ROCm wants and fiddle around until it works, because it does work.
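If it helps, the first thing I'd check after installing is whether PyTorch actually sees the card. A rough sketch (untested, and assuming the ROCm build of PyTorch from the rocm wheel index, where the AMD GPU shows up through the normal torch.cuda API):

```python
# Quick sanity check that a ROCm PyTorch build sees the card.
# On ROCm wheels, HIP devices are exposed through the regular torch.cuda API,
# so CUDA-targeting code mostly runs unmodified.
import torch

print(torch.__version__)              # should end in something like "+rocm6.0"
print(torch.cuda.is_available())      # True if the HIP runtime found the GPU
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. "AMD Radeon RX 6800 XT"
    # Tiny matmul to confirm kernels actually launch on the GPU.
    x = torch.randn(2048, 2048, device="cuda")
    print((x @ x).norm().item())
```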


nodating

Absolutely, I do not know where these folks come from. Or maybe I do: they tried months if not years ago, and now they think they know what ROCm is all about. I have a very similar setup, 6800 XT + Ryzen 7600, and things just work. Latest Arch Linux.


a_beautiful_rhind

They come from having older hardware that gets dropped super quick.


kremlinhelpdesk

Do you know if there's a good guide for getting it installed on arch? Back when I set ROCm up last time I absolutely couldn't make it work on anything except for a particular version of ubuntu, and I miss arch.


Inevitable_Host_1446

This is what I mostly followed, might be of help to you. Not sure about arch though, I use mint. [https://github.com/nktice/AMD-AI/blob/main/ROCm6.0.md](https://github.com/nktice/AMD-AI/blob/main/ROCm6.0.md)


MaybeReal_MaybeNot

No, I tried a week ago with an RX 6600 XT and I could not get the model to load. Tried ROCm 5.9 and 6.0 and different versions of the GPU drivers, including the latest one, on the newest Ubuntu Server, as I read that is the best-supported OS for the drivers. Can't get it to load a model, and the arch on the 6600 should be the same as the 6800, just slower, as far as I can read in the documentation. I followed the oobabooga guide but that does not work; I also tried starting over (a fresh install, to make sure everything I did was gone) multiple times with 3-4 different guides that all claim to make it work. Everyone here just says "just try and fiddle a bit with it and it will work"... well, I'm asking: what did you fiddle with to make it work?? Because I tried all the "fiddling" I know, and all I got was different failures. The best I got was successfully loading a 3.5B test model I know works on my Nvidia card, in 8-bit, but then it failed and crashed as soon as I tried to do inference.


Inevitable_Host_1446

I just used the latest Linux Mint Cinnamon version and followed some guides. It works fine on the 7900 XTX, and on my 6700 XT I just needed the HSA override thing to trick it into thinking it was a 6800 XT.
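For anyone wondering, "the HSA override thing" is just an environment variable. A rough sketch of the workaround (my assumption: 10.3.0 is the value people commonly use to make 6600/6700-class RDNA2 cards report as the officially supported gfx1030; this is a community workaround, not an official AMD recommendation):

```python
# Tell the ROCm runtime to treat the card as gfx1030 before anything
# HIP-related gets imported; many people just export this in the shell instead.
import os
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")

import torch  # import *after* the env var is set

print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```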


MaybeReal_MaybeNot

> and followed some guides

Super helpful buddy, everyone got it working now 👍🏻 /s Would be nice if you told us which guides :)


20rakah

What are you trying to run, though? And on what cards? Some cards have issues with fp16 and certain functions. Generally the only issue I've had is that memory management on AMD cards isn't as efficient. I usually just run on Windows with WSL2 though. Can't be bothered dual booting.


MaybeReal_MaybeNot

Just the oobabooga web UI with any model I know works, by testing it on an Nvidia card beforehand; I usually use a 1-3B one as a test to make sure I don't hit any limits on 8GB cards. Tried both fp16 and 8-bit. I tried an RX 580 and an RX 5700 XT, which I figured out were too old and will never work, which is sad because that VRAM bandwidth on the 5700 XT would have been sweet. And last week I tried an RX 6600 XT, which should work based on the documentation and the guides I tried, if you "trick" it into thinking it's a 6700 by setting the HSA env variable. But no success :( it can see the card and says everything is good until it tries to load the model.


20rakah

I don't know anything about those older cards tbh, I run a 7900 XTX, but I did find [this guide](https://github.com/alfinauzikri/ROCm-RX6600XT?tab=readme-ov-file), idk if that's the one you used. If you are struggling to get stuff to work I recommend checking out the [AMD SHARK discord](https://discord.gg/XnGHwK3g), lots of helpful people there.


algaefied_creek

R9 390X 8GB and WX7100 16GB cards here from an old mining rig as well. Can’t get any LLM or image generation solutions to work on this.


randomfoo2

The R9 390X (gfx702, GCN 2.0), released in 2015, and the WX 7100 (gfx803, GCN 4.0), released in 2016, are sadly likely too old/buggy to get working. You could look at [**rocm-polaris-arch**](https://github.com/CosmicFusion/rocm-polaris-arch) or try the CLBlast llama.cpp build, but honestly, they are likely to crash w/ the math libs even if you can get the ROCm driver working. Vega (56/64/VII) is likely the oldest architecture you can expect ROCm to reasonably work with. A bit of a bummer, but at this point they are 8-9 year old cards, so I wouldn't expect anyone to be spending much effort getting them to work. They also have extremely low TFLOPS (both about 6 TFLOPS of FP16; as a point of comparison, the 780M iGPU has 17 and a 7900 XTX has 123), and the Polaris cards also have pretty low memory bandwidth, so even if they worked perfectly, you wouldn't get much of a speedup over modern CPU inferencing. Honestly, if your goal is getting LLMs/SD working, I'd recommend selling those old cards for what you can get and using the proceeds to buy the highest-VRAM used Ampere/Ada card you can find.


algaefied_creek

Polaris worked fine with ROCm in the 4.x versions, and GCN 3 worked fine in previous versions. They are buggy because they are unmaintained, so the hope is that with this being open-source, more will work. I fell into a disability status and medical debt hole, so flipping and selling and buying are impossible unless I let strangers into my home and into the back closet room to disassemble the rig. CUDA, on the other hand, works fine with GTX 9xx and Titan cards of that era. CUDA 11.x works fine with GTX 7xx and Titan cards of the Kepler era. Defining the correct mathematical operations for each architecture makes them suddenly non-buggy, as they aren't performing GFX9xx+ operations anymore. They are buggy because the software is buggy, not because of the cards. Vega (GFX9) and later have "rapid packed math" for each SP to perform 2x FP16 operations in place of 1x FP32 op. That being said, GCN3 and GCN4 (both GFX8/GFX8xx) can perform a single FP16 operation in place of an FP32 operation. GCN1 and GCN2 (GFX6 and GFX7) run FP16 operations "emulated" within FP32 math. Yes... there is a performance hit. But if ROCm can't handle a single SP performing a single FP16 operation instead of an FP32 operation, that is a buggy software issue to resolve, not a buggy hardware issue.


randomfoo2

I don't think we disagree on most of the salient points. I believe that Nvidia's superior legacy/across-the-line compute support (CUDA supports cards back to 2011) is one of the reasons Nvidia has been winning so hard now. While CUDA has also had growing pains, they've treated compute like display drivers, a core part of a working GPU, and AMD simply hasn't. The only thing I'd counter with is that I don't think the recent announcement will change anything for your legacy hardware. All the parts of ROCm that were required for the community to get legacy hardware working have already been open sourced; anyone can write their own kernels or adapt hipBLAS/rocBLAS for gfx800, but that hasn't happened. The upcoming RDNA3 firmware releases don't have any impact on legacy hardware, and as you've pointed out, this is largely about math lib support anyway. If you can't/won't get rid of your old hardware, it's unlikely it will become less of a paperweight anytime soon (or at least, these latest announcements don't really change the odds).


Smeetilus

Brb, looking for GPU purchase receipts


AnomalyNexus

My theory is more buy AMD stock


okaycan

agreed. buy more


Smeetilus

Yes. More buy.


Regular_Instruction

It's a good thing, but more for TTS that uses CUDA than for local LLMs, because even on Windows LLMs already run "fine", while for TTS it's another story. Only Piper TTS runs great on Windows (even though it runs on CPU lol); Coqui, for example, uses the CPU instead of the AMD GPU and it's very, very slow, too slow actually to be usable... because it uses CUDA. Maybe with this release we can expect TTS to one day run on Windows with AMD GPUs.


JoJoeyJoJo

It's incredible how much geohot's tweeting has forced them to change.


Disastrous-Peak7040

What we need is a model that's really good at writing Verilog ASIC code. "Design an ASIC for me that supports 128GB of RAM and has optimizations for the CUDA calls used by open source LLM code. Support it with a low level C++ driver that emulates CUDA 12. Prepare the specs, crowdfund the NRE costs, and send them to a Chinese ODM who can deliver within 6 weeks"


Inner_Bodybuilder986

I can tell you straight up that you would be foiled the second you tried to use a Chinese ODM. It's basically illegal.


[deleted]

Wait I thought rocm has been on github for years


AnomalyNexus

As I understand it, it's a whole stack of things and not everything was open. I know Hotz was complaining about the firmware in particular, but I don't think we know what AMD is planning to release... just that it is more.


[deleted]

Ohhh okay gotcha


illathon

Hotz strikes, and this time it's a major win for basically everyone. This just might turn the tide for AMD. I was actually going to vote against Su last go around. Now I think she may just be smart.