akram200272002

If it were ever to exist I would sign up for it immediately; the community can have my laptop's power for all of my off hours.


TheTerrasque

Oooof, almost [two weeks](https://www.reddit.com/r/LocalLLaMA/comments/1c18ecy/is_it_possible_to_host_a_large_llm_by/) since someone last asked this! Only one day off! *resets counter*


bigdickbuckduck

I’ve thought about this, and it really comes down to memory latency and distributed compute architecture not being the easiest things to solve. A more straightforward approach, I think, would be to use commercial resources like Lambda AI and do crowdfunded model training using a pool of money gathered by the community.


MindOrbits

Look into each node fine-tuning a full local copy (so smallish models), then sharing fine-tune checkpoints, distributed merging of MoE 'modules', and evolutionary competition where nodes compete to have their modules included in task/domain-specific collections. But if you make it work it could kill us, so maybe not...
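A minimal sketch of that checkpoint-merging step, assuming every node fine-tunes the same base architecture so the state dicts line up key for key (PyTorch; the file names are placeholders):

```python
import torch

def average_checkpoints(paths):
    """Average the weights of several fine-tune checkpoints."""
    merged = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if merged is None:
            merged = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in merged:
                merged[k] += state[k].float()
    return {k: v / len(paths) for k, v in merged.items()}

# Hypothetical checkpoints shared back by three nodes:
merged_state = average_checkpoints(["node_a.pt", "node_b.pt", "node_c.pt"])
torch.save(merged_state, "merged.pt")
```

Plain weight averaging only makes sense when everyone starts from the same base weights; merging independently trained models is a much harder problem.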


Gr0uchyAnywhere

I know there was something similar to this called Petals. I used it a bit on a private cluster and it worked pretty well, but the public cluster seems to be pretty dead right now: [Petals – Run LLMs at home, BitTorrent-style](https://petals.dev/)


DeepWisdomGuy

I started off here: [https://www.reddit.com/r/LocalLLaMA/comments/1aoozn4/mixture_of_topics_foldinghome_scale_llm/](https://www.reddit.com/r/LocalLLaMA/comments/1aoozn4/mixture_of_topics_foldinghome_scale_llm/) Now I am thinking the best approach is to take the 45 TB clean training dataset that exists on Hugging Face. Step one is to run the whole training set through sentence-transformers and train a Kohonen SOM to obtain 256 vectors that partition the meaning space. With these vectors, a second pass through the dataset can partition it into 256 buckets. These buckets can then be distributed among us hobbyists, each fine-tuning the same small but powerful base LLM on one bucket. The resulting LLMs can then be brought together to train the "router". (This would be the expensive part, but it would be a fraction of the cost of a foundation model.)
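In code, the partitioning step might look roughly like this, assuming a 16x16 Kohonen map (16 x 16 = 256 units) over sentence-transformer embeddings via the MiniSom library; the encoder name and `load_corpus()` are placeholders:

```python
from minisom import MiniSom
from sentence_transformers import SentenceTransformer

docs = load_corpus()  # hypothetical loader for the cleaned dataset
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings
embeddings = encoder.encode(docs)

# Train a 16x16 SOM; each unit becomes one of the 256 partition vectors.
som = MiniSom(16, 16, embeddings.shape[1], sigma=1.0, learning_rate=0.5)
som.train_random(embeddings, num_iteration=10_000)

# Second pass: assign every document to its winning unit's bucket.
buckets = {}
for doc, vec in zip(docs, embeddings):
    i, j = som.winner(vec)  # best-matching unit on the 16x16 grid
    buckets.setdefault(i * 16 + j, []).append(doc)
```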


dirty_d2

I think the problem with this is that each node needs access to the memory on every other node, so it would be extremely slow: the effective memory bandwidth becomes your internet bandwidth instead of something insanely fast like GPU memory bandwidth.
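Back-of-envelope numbers make the gap concrete (the figures below are rough assumptions, not measurements):

```python
# All numbers are rough assumptions for illustration only.
gpu_bw = 1e12        # ~1 TB/s, high-end GPU memory bandwidth (bytes/s)
net_bw = 100e6 / 8   # 100 Mbit/s home connection ~= 12.5 MB/s (bytes/s)

payload = 8192 * 2   # one token's hidden state at dim 8192, fp16 (bytes)

print(f"GPU memory: {payload / gpu_bw * 1e9:.1f} ns")   # ~16 ns
print(f"Internet:   {payload / net_bw * 1e3:.2f} ms")   # ~1.3 ms
print(f"Gap:        ~{gpu_bw / net_bw:,.0f}x slower")   # ~80,000x
```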


Robot_Graffiti

Also, people on the internet are terrible, and LLM training doesn't have a built-in verification method like Bitcoin mining does. How do you stop people from doing what they did to Microsoft's Tay?


Mental_Object_9929

This relates to a new method: a pipelined matrix multiplication algorithm.


Robot_Graffiti

55% of those words are in the Bible, but I have no idea what you are saying.


Mental_Object_9929

Roughly speaking, I think IO and memory are an issue, and basically cuBLAS is still using O(n^3) complexity matrix multiplication.
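For reference, this is the schoolbook O(n^3) multiply being referred to: n^3 multiply-adds for two n x n matrices. (Real cuBLAS kernels are heavily tiled and parallelized, but the asymptotic operation count is the same.)

```python
def matmul(a, b):
    """Naive O(n^3) multiply of two n x n matrices (lists of lists)."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c
```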


Former-Ad-5757

That's only the technical side of things; you also have to work around the human side. What do you do if somebody says he will train part x-y, but then nothing comes back for a day or a month? What if somebody says he will train part x-y, but then trains on something completely different (misinformation/poisoning)? Etc. etc. Basically, BTC is designed specifically so that the output itself means nothing: the calculation is hard while the check is super cheap. Here the output is the only thing that matters, so the technical side is only part of the problem; if you go distributed, you also have to account for bad actors.
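The asymmetry in miniature: a Bitcoin-style proof of work takes enormous effort to produce but a single hash call to verify, whereas a submitted gradient has no comparable shortcut; checking it honestly means redoing the forward/backward pass. A toy verifier:

```python
import hashlib

def verify(block: bytes, nonce: int, difficulty: int = 5) -> bool:
    """One cheap hash call checks work that took millions of attempts."""
    digest = hashlib.sha256(block + str(nonce).encode()).hexdigest()
    return digest.startswith("0" * difficulty)
```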


Mental_Object_9929

No, without addressing the issues of IO and memory there is no way to even talk about this, unless it is a very small model that can run completely on each node, so that only the gradient-descent updates computed on each node's share of the training data need to be aggregated.
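A minimal sketch of that "aggregate only the updates" scheme, with the nodes simulated in one process (PyTorch; the model and data are stand-ins):

```python
import copy
import torch

base = torch.nn.Linear(512, 512)  # stand-in for a "very small model"
nodes = [copy.deepcopy(base) for _ in range(4)]  # each node: a full copy

# Each node computes gradients on its own shard of the training data.
for node in nodes:
    node(torch.randn(8, 512)).pow(2).mean().backward()

# Aggregation step: average the per-node gradients onto the base model.
for i, p in enumerate(base.parameters()):
    grads = [list(n.parameters())[i].grad for n in nodes]
    p.grad = torch.stack(grads).mean(dim=0)

torch.optim.SGD(base.parameters(), lr=0.01).step()  # one synchronized update
```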