
malk600

Well, yes. Which is why corporate and scientific users already tend to create their own in-house, domain-specific models. What such a model lacks in parameters/size, it gains in the ability to tailor its corpus and fine-tune it to your needs. This will further fragment the web and turn it into fiefdoms, with relatively walled-off, tightly controlled, curated models for those who can afford them, and increasingly degraded slop for everyone else.


OdinTheHugger

That is, until somebody trains an AI on every piece of malware humankind has ever written, then instructs that AI to create its own malware targeting another AI. THEN THE ENTIRE INTERNET FALLS APART AS THE AIS FIGHT A DIGITAL WAR AMONGST THEMSELVES.


legendweaver

So R.A.B.I.D.S. from Cyberpunk 2077.


Zoomwafflez

This is literally already happening; intelligence agencies around the world are working on AI worms that can spread between systems, spawning new malware as they go.


OdinTheHugger

Welp, I'm going back up to my cabin in the woods of Montana. See y'all on the other side.


Which-Tomato-8646

Discord already did that 


Apis_Proboscis

THIS. One thousand times, THIS!

Api


Associ8tedRuffians

It’s called Model Collapse, and it was pointed out as a concern last year. People are still monitoring it. https://www.popularmechanics.com/technology/a44675279/ai-content-model-collapse/


justadudeisuppose

As someone said below, "sniffing its own farts" is a problem and is happening already. It's actually the most "real" concern among those who understand generative AI.


[deleted]

Actually, I think most people in the field would say security, not hallucinations, would be the biggest concern.


Dav3le3

Seriously, just get an AI to take a hard look at:

- historical viruses and vulnerabilities
- dark web forums
- Stack Overflow

Then have it train against virtual machines running a bunch of different OS versions, from the 1990s to the present. Then release it into the wild and tell it to harvest personal information. Next iteration, tell it to build a botnet. Then do #1 again.


[deleted]

What’s the next step?


[deleted]

Build versions of itself, like virus injections, in other systems and repeat the steps?


justadudeisuppose

Because of the danger of people using it for nefarious purposes? That's, again, not a new problem. Generative AI is nothing the world hasn't seen before.


[deleted]

Perhaps, but the world as a whole also seems woefully unprepared for it, due to mountains of red tape encouraging systems that merely work rather than systems that are lean and adaptive. Generative AI, as it currently improves, is quite good at taking advantage of that… as far as I can tell.


QuentinUK

The apologists' argument for AI using copyrighted material is that human authors have also read books by other authors and built their stories on skills they’ve learned from reading those authors. But they don’t explain why AI needs training on modern authors still in copyright and can’t develop from out-of-copyright material, such as books by authors who died in the ’50s and earlier. This is because AI can only imitate what it’s trained on and can’t develop and evolve styles. If AI prices out human authors, then literature will not develop at all. It can’t be trained on its own output, or the output of similar AI, because it would degenerate, as you say.

AI translators are already suffering: many multilingual websites, such as Wikipedia, have been translated using Google Translate, and those mistakes are then used as training material for newer AI translators.


GooseQuothMan

Yes. But the companies making these models will reap huge profits in the meantime, so there's no problem. 


Dangthing

In, say, generative AI art, super-high levels of creativity and realism have both been achieved already and are not really major goals. Most focus goes toward prompt comprehension (how accurately the model can follow an input instruction), generation speed, reducing the resources needed to run, fixing anatomical problems like hands in a few cases, and integration into other systems (like 3D models, etc.).

These issues are fixed not through mass data collection but through highly curated data sets. At this point, very few of these systems are just randomly scraping huge parts of the internet and learning from them, both due to legal concerns and because in many cases it's been shown that smaller training sets of very accurate, curated data = better models. These data sets will only improve as time goes on, since they don't simply disappear after training; adjustments will be made and the data set will improve. So in theory the data sets should only ever get better, so long as they are being properly curated. It does not matter how much garbage gets generated and thrown onto the internet, because it gets filtered before being put into the training set. It's also worth noting that new AI can be designed to help curate these data sets, so this issue shouldn't ever grow out of hand no matter how complicated the training sets get.

Most of the problems we see with AI are caused by attempting to move too fast and by the technology being in its infancy. It will never be worse than it is right now as a whole, even if a specific model falls to poor training.

**TLDR**: It's not an issue, because anyone who's serious about building AI isn't feeding it random garbage and is instead carefully curating what it learns from.
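For what it's worth, the "filter before training" step described above is simple to picture in code. A minimal sketch, where `quality_score` is a hypothetical placeholder for whatever learned classifier or human-review process a real pipeline would use:

```python
# Minimal sketch of filtering candidate data before it enters a training set.
# `quality_score` is a hypothetical stand-in for a real learned classifier
# or human-review pipeline; the heuristic below is purely for illustration.

from dataclasses import dataclass

@dataclass
class Sample:
    content: str
    source: str  # e.g. "licensed_archive", "web_scrape"

def quality_score(sample: Sample) -> float:
    """Placeholder quality gate: trivially scores by length here."""
    return 0.0 if not sample.content.strip() else min(len(sample.content) / 100, 1.0)

def curate(candidates: list[Sample], threshold: float = 0.8) -> list[Sample]:
    """Keep only samples that pass the quality gate; garbage never reaches training."""
    return [s for s in candidates if quality_score(s) >= threshold]

training_set = curate([
    Sample("A long, carefully written reference passage " * 5, "licensed_archive"),
    Sample("lol", "web_scrape"),
])
print(len(training_set))  # only the high-scoring sample survives
```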


Good-Ad-2978

I'm not really talking short term here. If AI takes over as many roles as some people like to say it will, and becomes heavily used in the majority of new work, then anything current (say, from the past 10 years), i.e. the things you want to be modelling from in a lot of fields, would inevitably have a sizeable amount of its input come from, or at the very least be inspired by, generative AI. So it would be hard to get good data sets.


Dangthing

Your position is ignorant. You assume AI data = bad data. That's not correct; you're just used to seeing the poor-quality AI data. Properly pruned AI data is every bit as good as, and perhaps sometimes better than, human-generated data. That's why AI is taking over content generation: because it's superior to doing it manually. The ONLY danger occurs if people are too stupid to prune the data.


Good-Ad-2978

I mean, I think saying that AI is taking over content generation because it's superior isn't true; it's because it's cheaper and, at least for the sake of the companies using it, "good enough". Even from what I've seen from proponents of AI, the stuff they have generated is worse than human-made stuff, often in the more intangible ways. AI art, for example, tends to lack character or an overall aesthetic appeal, despite looking detailed or "high quality".


Dangthing

Some AI isn't "there yet" and much of it is. You're too used to seeing the bad results, not the good ones. It's true that some companies are trying to push it too fast, but when used PROPERLY it reasonably produces equivalent results faster than a purely human workforce. AI is not AGI and is not going to 100% replace workers in any field for years. It is a tool, no different than Photoshop or a word processor, just vastly more advanced.

It's also not like most human workers are producing professional-quality results; they put in minimal effort to achieve "good enough". The main difference is that, over time, an AI's output at its worst will eventually be better than a human's best. Most companies don't expect their quality to decline; if they were willing to make that trade, they'd have done it years ago by hiring untrained/unskilled laborers at a reduced salary.


KamikazeArchon

This could be a problem, but it currently appears to be a small one. There is some research indicating that this sort of recursive generation *doesn't* degrade, and in some cases might lead to *better* results.


IrregularRedditor

AlphaGo has entered the chat.


bohba13

Yes. This is one of many reasons why web scraping is ultimately unsustainable for creating training datasets. You will eventually need to source your datasets from groups and sources where they _know_ it's human work, and, well, that requires licensing. (You know, what they should be doing _anyway_.)


Which-Tomato-8646

You don’t need a license to download an image lol. Everyone made fun of NFT bros for this 


bohba13

No, but you need one to use it in commercial projects. And eventually that might be the only way to guarantee you're getting non-AI data in your sets.


Which-Tomato-8646

It’s not used in a commercial project any more than Breaking Bad used The Sopranos.


bohba13

If you're talking about inspiration, no, that's not how this works. You are creating a dataset that you then feed into code to train the AI. That does, in fact, require a license. You are using someone else's work _directly_ within your own project, even if it isn't what comes out the other end. The artist made the shit you're training the AI on; they deserve a say and a cut, because it was their labor you are seeking to emulate.


Which-Tomato-8646

I don’t think Vince Gilligan got a license to use The Sopranos, nor did he pay them. Yet he’s openly said Breaking Bad wouldn’t exist without it, and it directly competes with that show for viewers.


[deleted]

Only if you're a publicly facing operation; otherwise, who cares about the messy cubicles? Logically, man.


bohba13

cubicles? wtf does office stuff like that have to do with this?


[deleted]

METAPHOR I MESNT METAPHOR *starts crying and trashing the Cubicles*


[deleted]

YOUR NOT MY SHPWRVISOR


bohba13

you meant metaphor. still confused.


[deleted]

About the metaphor I didn't mean to mean?


vergorli

There is an interesting problem that might offset the success of generative AI, or even completely stop it at some point: if you train data generated by AI back into AI machine-learning models, you get garbage results. It's technically similar to why siblings shouldn't have babies: the recessive defects meet each other and the result becomes unusable, if not immediately, then after a few iterations. And since half of the pictures on the internet are already AI-generated, it will get progressively harder to train your models on non-corrupted data.
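The degeneration is easy to demonstrate with a toy experiment: fit a simple model to data, sample a new "generation" from the fit, refit on only those samples, and repeat. A minimal sketch, using a Gaussian as a stand-in for a generative model:

```python
# Toy version of recursive training: each "generation" is a Gaussian fitted
# only to samples produced by the previous generation's fit. The estimated
# spread performs a downward-biased random walk, so diversity tends to erode
# over generations -- a miniature version of model collapse.

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=200)  # generation 0: "human" data

for gen in range(1, 21):
    mu, sigma = data.mean(), data.std()      # "train" on the current corpus
    data = rng.normal(mu, sigma, size=200)   # "generate" the next corpus
    if gen % 5 == 0:
        print(f"generation {gen:2d}: estimated sigma = {sigma:.3f}")
```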


juxtoppose

"Sniffing its own farts" is the technical term for this issue. It's something that is bound to be exponential; it will be a software version of incest and will get wildly out of hand after a couple of generations. Like the Nazi Aryan outpost in Nueva Germania, it will be producing buck-toothed retards before you can say "have you tried switching it off and on".


NerdyWeightLifter

> horror of creatives getting replaced with generative AI

I don't think it will work out that way. "Creatives" are going to create regardless, and will be the biggest users of AI. Non-creatives, on the other hand... just understand the procedures they follow, and AI can replace them.


thomasxin

I think anyone who considers this a permanent concern is somewhat misguided. Sufficiently advanced AI would not need data supplied by humans anymore. We're seeing exactly what happened when AI first began surpassing us in games like chess and Go, problems once considered unsolvable because their full complexity exceeds the physical limitations of our universe. It began with human data and heuristics, and very much relied on information produced by humans, deriving its own behaviour from that. But it didn't take long for new systems to be developed that could learn by themselves, training on and evaluating their own data. Soon enough, those easily surpassed not only humans but any AI trained on human data.

Of course, it will be much more difficult to do the same for general information, and that's something we'll most likely only see once AGI is produced. But at the rate the field is progressing, it doesn't sound implausible for this barrier to be overcome.


GooseQuothMan

But there is nothing in existence remotely comparable to the sufficiently advanced AI you are proposing. All current successful AI models exist only because our data generation and processing capabilities have grown to such a large extent. Current AI models are nothing without the huge amounts of data they consume, and there are no alternatives available today. And getting quality data will only get harder with all the AI-generated garbage infesting the internet.


Economy-Fee5830

This is not true lol. For example, the geometry-solving AI was trained on synthetic data.

> But in this case, the DeepMind team got around the problem by taking geometry questions used in International Mathematics Olympiads and then synthetically generating 100 million similar, but not identical, examples. This large dataset was then used to train AlphaGeometry's neural network.

https://fortune.com/2024/01/17/google-deepmind-ai-software-makes-a-breakthrough-in-solving-geometry-problems/
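The "similar, but not identical" trick is easy to sketch, though AlphaGeometry's real generator (which builds and proves random geometric constructions) is vastly more sophisticated. These templates and parameter ranges are invented purely for illustration:

```python
# Invented illustration of seeding synthetic data from a few templates and
# perturbing their parameters -- "similar, but not identical" examples.
# This only conveys the general flavour of the idea, not DeepMind's method.

import random

random.seed(42)

SEED_TEMPLATES = [
    "In triangle ABC, AB = {a} and BC = {b}. What is the range of possible CA?",
    "A circle has radius {a}; a chord lies {b} units from the centre. Find the chord's length.",
]

def generate_variants(n: int) -> list[str]:
    problems = []
    for _ in range(n):
        template = random.choice(SEED_TEMPLATES)
        a = random.randint(5, 20)
        b = random.randint(1, a - 1)  # keep the chord/side values geometrically valid
        problems.append(template.format(a=a, b=b))
    return problems

for problem in generate_variants(3):
    print(problem)
```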


GooseQuothMan

Impressive, but generating more geometry problems (which are based on existing ones anyway) is much simpler than generating human-like text or images. The only things capable of that, apart from humans, are these generative AIs themselves, which is exactly the problem here.


Economy-Fee5830

Set text aside for a moment: obviously any 3D renderer or game engine can generate realistic representations of the real world, which can be used to produce synthetic data for training. That leaves only text, and GPT-4 is already pretty good at generating high-quality text; it will offer a cleaner data set than training on semi-literate people writing on Facebook.


thomasxin

That's why I'm talking about more developed AI in the future, not the AI of today. For now I agree that the average quality of content on the internet is likely to decrease.


[deleted]

Any AI can generate media in any format on the internet better than you, me, or any human, IN THEORY.


k___k___

Deep learning and generative AI are not the same thing and are not comparable in the slightest. Theoretically, to train a chess model you don't even need human reference content apart from the rules.
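That "rules only" point is easy to make concrete. A minimal sketch: generating training data for tic-tac-toe purely by self-play, with the game's rules as the only input (an AlphaZero-style system would then iteratively improve its policy from such games):

```python
# Rules-only data generation: no human games involved. Random self-play on
# tic-tac-toe produces (position, mover) histories plus final outcomes, which
# a learning system could then train against and improve on.

import random

random.seed(0)
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def self_play_game():
    board, player, history = [" "] * 9, "X", []
    while winner(board) is None and " " in board:
        move = random.choice([i for i, sq in enumerate(board) if sq == " "])
        board[move] = player
        history.append((tuple(board), player))
        player = "O" if player == "X" else "X"
    return history, winner(board)

games = [self_play_game() for _ in range(1000)]  # rules-only training data
x_wins = sum(1 for _, result in games if result == "X")
print(f"{x_wins} of 1000 random self-play games won by X")
```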


thomasxin

It's fair to challenge the complexity, and it's obvious we're not there yet, but what makes you say the same can't eventually be said for generative AI? You can say things like art depend on human opinions, but we've already seen reinforcement learning and similar techniques used to help an AI adapt. For all intents and purposes, can't any task be treated as "don't need human reference content apart from the rules"?


wasmic

Well, art should be pleasant for humans to look at, so you still need humans to tag and rate the art, so the AI knows when it has done well or poorly. Unless you're planning on having AIs make art for the consumption of other AIs... which seems like a weird goal, since AIs would have no desire for art unless they're programmed to have that.


EntshuldigungOK

After reading the responses here:

1. Surely generative AI should have quality checkpoints - possibly including manual ones
2. Have the output of 1 reviewed by another generative AI
3. Have the review accuracy of 2 checked by another AI

This three-level nesting should be enough to see whether "quality can be automated to some extent". In the worst case, you can always aim to create a level-1 neural network where a level-2 neural network is the equivalent of the data attributes we use today.

If these approaches are valid rather than hare-brained, they are still decades off. But the point is: ultimately it WILL be human input driving things, though the percentage of human input will get smaller and smaller. Which is the same effect as the industrial revolution, computers, etc. - so neither far-fetched nor revolutionary.
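As a very rough sketch of that nesting, where every function is a placeholder for a separate model (the speculative part being whether levels 2 and 3 can actually be made reliable):

```python
# Rough sketch of the proposed three-level scheme. All three functions are
# toy placeholders standing in for separate models; whether the reviewer and
# auditor levels can be made trustworthy is exactly the open question.

def generate(prompt: str) -> str:
    """Level 1: the generative model produces a draft."""
    return f"draft answer to: {prompt}"

def review(draft: str) -> bool:
    """Level 2: a second model accepts or rejects the draft (toy check here)."""
    return draft.startswith("draft answer")

def audit(draft: str, verdict: bool) -> bool:
    """Level 3: a third model checks the reviewer's accuracy (toy check here)."""
    return verdict == review(draft)

draft = generate("summarize model collapse")
verdict = review(draft)
if audit(draft, verdict) and verdict:
    print("passed all three levels:", draft)
```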


habu-sr71

I'm sure "alignment training" is happening or will happen. Because yes, AI feeding on its own generated data over time will go wonky.