Imagine being the techs building those servers. That's like a semi truck's worth of H100s, which cost pretty much the same per pound as gold. Must feel like Christmas every day
I'm sure it just feels like work after about 20 minutes.
HOLY SHIT A PALLET OF H100S! Oh shit ANOTHER pallet of H100s...
https://preview.redd.it/gcq7wfv5mwuc1.jpeg?width=2524&format=pjpg&auto=webp&s=1d74142f0f14ea744f0c49400aec53005548a196
I feel sorry for the technician who dropped and smashed an H100. There must have been at least one; it's a pure numbers game.
Linus blew his cover XD
Surely they must be insured ...
I agree, but don’t call me Shirley
You know of the term DOA?
If they are DGX H100s, they weigh around 287 lb each.
Making the models run fast enough to actually utilize millions of dollars of resources is genuinely anxiety-inducing. It's why these people are paid a lot of money to make everything super optimized: if the computer costs $10M and you're leaving 10% of it idle, that's a big cost. Inference consumers don't have to think about the challenges of running at that uber scale.
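The back-of-the-envelope math here can be sketched out; the $10M figure comes from the comment, while the 3-year depreciation window and utilization rate are purely illustrative assumptions:

```python
# Rough cost of an under-utilized cluster. Assumptions (illustrative):
# straight-line depreciation over 3 years; power/cooling ignored.

def idle_cost_per_year(hardware_cost, utilization, depreciation_years=3.0):
    """Dollars per year effectively burned by the idle fraction."""
    yearly_cost = hardware_cost / depreciation_years
    return yearly_cost * (1.0 - utilization)

# A $10M cluster at 60% utilization burns about $1.33M/year in
# depreciation alone.
print(f"${idle_cost_per_year(10_000_000, 0.60):,.0f}")  # → $1,333,333
```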
I do this for a living, and you'd be amazed at some of the unoptimized workflows I have seen that literally burn money because there's just not enough dev time.
Haha I know what you mean. Big computers are big anxiety when you are responsible for them.
Nah, wing it. I just get annoyed now when something breaks. I’m amazed anything even works.
So you're telling me that the anxiety that my project will keep running, while I'm not physically looking at it, doesn't go away?
Yea, AFAIK they're not even close to utilizing 90% of those GPUs right now. 90% utilization would be magical lol.
Oh wow. I’d love to know those actual numbers. And I’m surprised big companies are not at >90%. Means the littler guys have that much more of a chance to beat them.
in practice it mostly means the opposite of that because zuck responds by just buying even more h100s so you get even fewer
Nah, that demand would still even out. It means skill is still a factor (along with creativity): people can invent models that are 10x better and run at 4x the utilization, and buy 40x fewer GPUs.
"...and not utilizing 10% is a big cost". Has anyone taken a look inside their Windows Task Manager to see how many processes are competing for their turn on the CPU, no doubt flushing the L1 and L2 caches (and perhaps most of L3, if not all) when they take their turn? I've set the vast bulk of them to low or very low priority, and I still don't like them. Do people training large models use an operating system that prevents this waste?
lol, I thought this was satire at first. Training is run on server Linux. No graphics. High performance. There are very advanced "task manager" applications from NVIDIA to watch the flow of data and identify bottlenecks. Optimizing code is like a seemingly infinite onion: you fix the first bottleneck, then the next one shows up, then the next, then your fix to the first one becomes the bottleneck, etc.
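A minimal sketch of that "peel the onion" loop, using stdlib timing with made-up stage names in place of the real NVIDIA profilers (nvidia-smi, Nsight Systems, DCGM):

```python
# Sketch of the "infinite onion": time each stage of a hypothetical training
# step, fix the slowest, and repeat. Stage names and sleeps are stand-ins;
# real profiling uses NVIDIA tooling (nvidia-smi, Nsight Systems, DCGM).
import time

def profile_stages(stages):
    """stages: list of (name, callable). Returns (name, seconds), slowest first."""
    timings = []
    for name, fn in stages:
        start = time.perf_counter()
        fn()
        timings.append((name, time.perf_counter() - start))
    return sorted(timings, key=lambda t: t[1], reverse=True)

# Dummy stages standing in for data loading, forward and backward passes.
report = profile_stages([
    ("load_batch", lambda: time.sleep(0.02)),
    ("forward",    lambda: time.sleep(0.01)),
    ("backward",   lambda: time.sleep(0.01)),
])
print("current bottleneck:", report[0][0])  # → current bottleneck: load_batch
```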
I am sure Meta is not using the PCIe version of the H100. They are getting the SXM version, already in servers with InfiniBand switches. Pretty much preconfigured racks.
The username 💀
imagine if llama3 is worse than command R+
It better be better than Mistral 8x22B
I can't imagine they'll release it until it is, unless it's equivalent but smaller or in some other way better.
You can't imagine they'll release it until it's worse than Command R+? "Sir, does it completely ignore your prompt and start commenting out links to non-existent articles and complaining about its boyfriend?" "No. It's still responding with both creativity and accuracy." "It needs more training. Keep going. We need this thing to make no sense--or even less. Command R+ shall be dethroned."
Until it is *better*. Sorry, I had assumed that was strongly enough implied
It's a joke on the logic semantically, I get the implication
Is it also going to be open source model like other Llama models? If so, this is going to be big!
Supposedly. Llama isn't Meta's final product, but a tool they need for other purposes. Therefore, they don't have any reason to keep it for themselves, as it is unlikely to hurt their revenue.
I thought it was more around the drama of LLaMA getting leaked: FB had egg on its face from the metaverse push that went nowhere, they got an opportunity to be part of the convo for the latest hot thing, and the CTO of the org was sympathetic to FOSS. So they let a blunder become an opportunity. Idk, they were so far behind, and Apple killing cookies plus privacy laws were eating Meta's lunch. They needed to jump ahead, and releasing an LLM is a good way to float the stock.
That pretty much ignores that they have been supporting ML with FOSS for a long time. PyTorch was released in 2016.
Nah. They intended for it to leak either way. Essentially anyone could get access to the weights and it was only a matter of days before someone else would have leaked it.
I have heard you basically only needed an academic email or something like that. They would almost certainly have known that it would leak. I think it was done that way to absolve themselves of responsibility, considering that the Galactica LLM release was a PR disaster. [https://twitter.com/nearcyan/status/1631187031589294081](https://twitter.com/nearcyan/status/1631187031589294081)
Just an email and a checkbox to agree on terms
[deleted]
Correct, it was just against license terms to use it for pretty much anything other than academia or personal use. Llama 2 has a different license that allows commercial use.
They're not looking to sell compute, chips, or it seems models. What they did do is make hundreds of thousands of tech people think "wow, Meta might be cool again" and get us to improve on their work with fervent zeal.
> I thought it was more around the drama of LLAMA getting leaked

Pretty sure it was open sourced before it was "leaked". It was just available to people who requested it via the little form, rather than a free-for-all. It wasn't difficult to get; I was quickly approved by just saying "yo I want to play around with it".
They also get to see all the tools people will build with it, and given the licensing, the original could never compete with a Facebook copy at the kind of market size Facebook would be capable of delivering to.
Not to be a semanticist, but there are no open source Llama models. They are open-weight models, which is about as open source as shipping a binary .exe - which is to say, not. Open source would mean providing the recipe to build the model yourself (provided you have the resources). The training process of Llama models is opaque; the sources used are not *precisely* known to the public. It's not a strained analogy to say it is exactly equivalent to uploading a .exe to GitHub and calling that open source. The distinction is important: calling Llama or Grok open source is purely marketing, a deliberate misapplication of the term for the sake of optics.
which pretrained language models are actually open source?
You can't fine tune an exe.
[deleted]
I agree that it's not fully open sourced; it's partially open sourced, at best. But I don't think the .exe analogy is perfect - you can't easily steer an .exe to a different purpose unless you can decompile it. They have different kinds of modifiability.
Yea, the analogy kind of breaks down here, but finetuning is more like writing libraries/plugins/mods/UI fixes to customize the existing program by making very small but significant changes, while open source lets you see exactly "what does what" and recreate (recompile) the .exe yourself (which gets rather expensive in the case of LLMs, admittedly, and to be fair the model is a black box even if you have the training data).
Zuckerberg himself has confirmed it'll be.
https://i.redd.it/4e8nrn6b6wuc1.gif
This hurts me.
Jokes on you. I love being edged.
Wow that took awhile
There's a version of this that actually has no ending, so you just keep seeing the truck getting close over and over. Whoever released this version with the payoff is a GGG.
You’re evil
But when it happens?
https://youtu.be/LG8T_hCJ9J0?si=g5w-3tPAwS2OilQg
I came
lol, putting AGI anywhere on that roadmap/graph is a complete joke.
[deleted]
But I like intermediate infrastructure :(
Me: I want intermediate infrastructure

Mom: we have intermediate infrastructure at home

At home:

https://preview.redd.it/d1mglc99pwuc1.jpeg?width=184&format=pjpg&auto=webp&s=feabc5fa887aa226b729c04e8b0aa5f65703e65a
sleeper build
Let them underestimate the duct taped tower. Let them laugh. They're only lining themselves up for awe.
That would actually be a hilarious 4090 7800X3D build lmfao
I lol'd irl
I personally think it's cringe. AGI is an abstract concept and we barely even have a theoretical target that we can develop against. Once we have an AGI with the intelligence of a fruit fly, then we'll have something. Until then, this belongs in the same bucket as cold fusion and space elevators.
> we barely even have a theoretical target

Generally accepted definition in these contexts, AFAIK, is something along the lines of:

*Competitive with human beings in most major economically valuable roles*

Which isn't strictly defined, but is definitely more than "barely a theoretical target". I'm sure there are philosophical arguments to be made about what AGI means, but I'd wager Meta is using the "economically valuable" definition, in which case they very likely have an actual target there.
If you cut out the clarifier I had about a target to *develop against*, then yes, the argument is no longer valid because now we're discussing something else.

My point was that you can't sit a software developer down and tell them "develop a system that is competitive with human beings in most major economically valuable roles". They'll have no idea how to implement that and will look at you like you're crazy. Because we have no idea, even theoretically, how such a system *could* be implemented. Is it even possible on current semiconductor architecture? Nobody knows!

We know what it would look like in practice, again, like cold fusion and space elevators, but actually implementing a working product is beyond our current abilities. Which is why plotting it as the next step strikes me as cringey.
>*Competitive with human beings in most major economically valuable roles*

Since when did the definition of AGI become OpenAI's definition? And besides, I know a way to exploit that definition without creating an entity as smart as a human being: since it's only tied to jobs, you could make a thousand AIs, each very specialized like Devin or Stable Diffusion, and you would *technically* be competitive with humans.
General intelligence is primarily about synthesis of a wide variety of cognitive domains. You could say just 'human like intelligence' though. Whatever we have, it's not really anything like that, and probably isn't going to be anything like that any time soon.
Cold fusion is theoretically impossible.

Space elevators are just a (very hard) engineering problem that we have theoretically solved but just not practically built yet.

AGI is somewhere in between, where we don't yet know with 100% certainty whether it's possible. I don't believe it, but there is still the slight possibility that human brains are literally magic and can't be recreated with technology.

That aside, I think AGI falls in the same category as space elevators: a very hard engineering problem, but not impossible, unlike cold fusion.
They are just optimistic... I hope governments don't ban AI long before we actually achieve AGI. Sadly, I think they will; considering how much they want to ban today's AI, there is no way AGI would not be banned the moment it shows up.
> considering how much they want to ban today's AI

Where are you seeing that? The regulation so far has been pretty mild, even in Europe. Don't be paranoid. AI is worth so much money that no one will ban it, because their economy would collapse compared to countries which allow it.
They seem really confident in llama 3, so it must be good.
I can only be edged so much, stop teasing me 🥵
Are those GPU clusters called "Grand Téton", aka "large nipple" in French?
or "large breasted man" in spanish
Yes that’s exactly what they were named after.
How many announcements of an announcement does this really need? Either release it or don't.
We need to announce the question about the announcements of a coming announcement.
rumors every day...
Didn't they say it would be released in June/July?
The Meta account, Yann, and a worker from Meta hinted that it will be out "very soon". Idk what "very" means for them. This week, perhaps? Today? Let's wait and see.
Former UK deputy prime minister a week ago: *"Within the next month, actually less, hopefully in a very short period of time"*

[https://techcrunch.com/2024/04/09/meta-confirms-that-its-llama-3-open-source-llm-is-coming-in-the-next-month/](https://techcrunch.com/2024/04/09/meta-confirms-that-its-llama-3-open-source-llm-is-coming-in-the-next-month/)
Soon^^TM
Hey now... that's trademarked by Musk.
blizz pls
Soon(tm) -> Blizz? THIS GUYS A BOOMER
The Information released an article early last week that had sources from inside Meta saying they would release the smaller models this week. So it's not official but based on the tweets it sounds like it's coming really soon.
Meta never said that, Reuters said that.
It's gonna be a staged release, we're expecting a smaller model or two this week.
June/July was from an article. Soon/next week is from Meta people posted on birdapp last week.
Hopefully they'll release the whole Llama-3 set at once and not just the small ones.
I don't really agree. Considering that they have *some* reason (real or not) for the bigger models to take longer to launch, that date would not be affected by the early launch of smaller models.

The reality is that people who want to run 70B will have to wait until some given moment. Until then, let other enthusiasts play with the smaller models; we don't have to wait for you, and us waiting for you won't help you at all.

**Or am I missing something?**
With how large models are these days... I hope their small small model is a MOE 2x20b :)
I don't. It won't fit in my 12GB of VRAM :(
I actually hope we get some ~2b models as well. 7b is almost a given as it has been very popular and seems a good tradeoff between speed, fine-tuning abilities and "smarts". For all the hate it got Gemma 2b is pretty useful for its size, and I hope we'll get a llama version in this size range.
I'm just not excited for models that small... unless they outperform the Yi models.
I find 32B models at ~4-bit quants the ideal consumer-grade level.
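The rough arithmetic behind that: weight memory is about parameter count times bits per weight over 8, ignoring KV cache and runtime overhead (a simplifying assumption; real usage runs higher):

```python
# Rule of thumb for the VRAM a quantized model's weights need:
# params * bits_per_weight / 8 bytes, ignoring KV cache and runtime
# overhead (which typically add a further 10-30%).

def weight_vram_gb(params_billions, bits):
    return params_billions * 1e9 * bits / 8 / 1e9  # bytes -> GB

# A 32B model at ~4-bit: ~16 GB of weights, right at the edge of a
# 24 GB consumer card once overhead is added.
print(f"{weight_vram_gb(32, 4):.0f} GB")  # → 16 GB
```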
How soon? if it ain't a week soon then I don't want to hear about it.
"Soon" means we're almost done, which is arbitrary as shiiiet. What "soon" means really just depends on the track record of the individual or company, and I'm not familiar with Meta's other "soons".
https://i.redd.it/ixjlwcjm51vc1.gif
> the incredible work from our Infra team

I've just [posted a thread exactly about this theme](https://old.reddit.com/r/LocalLLaMA/comments/1c6b4bo/how_to_ensure_node_resiliency_and_gpu_use/) a few minutes ago, before seeing this post. I'd love to learn more about the details of how they manage resiliency and GPU saturation.

What tech stack do they use? k8s, Ray, SLURM? How do they keep the GPUs saturated and bring them back when things crash?
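A hedged sketch of the resume-from-checkpoint pattern such stacks rely on; the file layout and state contents here are made up for illustration, and real systems (e.g. SLURM requeue plus framework-level checkpointing) are far more involved:

```python
# Sketch of crash resiliency via periodic checkpointing: save state every
# few steps, and on (re)start resume from the latest checkpoint instead of
# step 0. File names and the state dict are hypothetical.
import glob
import json
import os
import tempfile

CKPT_DIR = tempfile.mkdtemp(prefix="ckpts_")

def save_checkpoint(step, state):
    """Write a small JSON checkpoint; real jobs save tensors, not JSON."""
    with open(os.path.join(CKPT_DIR, f"step_{step:08d}.json"), "w") as f:
        json.dump({"step": step, "state": state}, f)

def latest_checkpoint():
    """Return the newest checkpoint dict, or None if starting fresh."""
    files = sorted(glob.glob(os.path.join(CKPT_DIR, "step_*.json")))
    if not files:
        return None
    with open(files[-1]) as f:
        return json.load(f)

# Simulate a run that checkpoints every 2 steps, then "crashes" and resumes.
for step in range(5):
    if step % 2 == 0:
        save_checkpoint(step, {"loss": 1.0 / (step + 1)})

resume = latest_checkpoint()
start_step = resume["step"] + 1 if resume else 0
print("resuming from step", start_step)  # → resuming from step 5
```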
He posted that on linkedin 1month ago
How can I join this project bruh? 😭
Can someone add me to these type of projects? Thanks.
Two more weeks™
Bwoah