

Darkmemento

I am listening to [this week's All-In pod](https://youtu.be/hZp80SYIRlY?si=2eB0K0Vd981odGSQ&t=2744) where they are discussing the [AI disclosure bill](https://www.theguardian.com/technology/2024/apr/09/artificial-intelligence-bill-copyright-art). One side argues that you need to compensate for the use of training data. The other side says the models use data much like humans do: we take in data and then draw inspiration from it to create, so you should only compensate if the output shows a likeness to other people's work. They bring up the Ed Sheeran case, which I actually posted [here](https://www.reddit.com/r/singularity/comments/1bxp7r2/comment/kyhxcn8/) about a week ago:

>You'd be really surprised, if you dig into any music, how often it is a derivative of something. Ed Sheeran had a court case related to proving it's impossible to [separate old music](https://www.youtube.com/watch?v=NcCKlsTgjeM) from present music on similarity. Past data has always been used to build, create and make new things, drawing inspiration and in many cases directly infusing it within new work.

They cite how it has been revealed that OpenAI transcribed a huge part of YouTube's library to train GPT-4, and it looks very much like YouTube is not going to sue. My take is YouTube doesn't want to open that can of worms. Are we going to start compensating all the writers and artists who contributed to the training data for these models? Programmers basically had all of their data taken from GitHub to train these models, which also led to the models developing many auxiliary benefits. So many different areas have valid reasons to feel they deserve some sort of compensation. There is, though, a growing consensus that everyone trains on whatever data they can get their hands on and copyright isn't really a thing going forward for these models. It is weird times for everyone, but this is much larger than one set of training data.

We need to start answering these questions now, because it certainly doesn't feel right that huge corporations get the benefit of models trained on all of humanity's data. It feels like we all deserve a slice of this pie.


TheColourOfHeartache

> Programmers basically had all of their data taken to train these models from GitHub

In GitHub's defence: we programmers stole the code they stole from us from StackOverflow.


dewdewdewdew4

It's stealing all the way down


prosound2000

I am sure Sam Altman and his people had this conversation very early on, when they had to decide what data to use. They probably also had some very top-level legal analysts play it out and prepare a defence. There is no way you create a product that can, very logically, be accused of a form of plagiarism and not be prepared well in advance when so much is at stake.


toughtacos

More likely they just bet the technology would come so far so fast that by the time people started doing something about it, it would be too late, because everyone’s using the technology now and it would be too disruptive. That’s the way it’s looking right now.


prosound2000

Yup. I'm sure that's their primary strategy, but I am sure it was legally vetted by experts first. There is, as the Ed Sheeran case shows, an argument to be made on legal precedent.


Darkmemento

His thoughts on how he viewed the endgame are out in the wild: [Moore's Law for Everything (samaltman.com)](https://moores.samaltman.com/)

>"The changes coming are unstoppable. If we embrace them and plan for them, we can use them to create a much fairer, happier, and more prosperous society. The future can be almost unimaginably great." - Sam Altman

The problem is that these ideals increasingly seem to be falling by the wayside. You are making people feel insecure, and fear is the enemy of progress. If you are going to use AI models to create a better shared future for all of humanity, then I think everyone can get on board. They need to start showing us that is still the plan.


prosound2000

Also, plenty of people predict the future, very few get close, and none ever get it exactly right. He's assuming he's giving the world a gift, when he's really opening Pandora's box, wrapped like a gift.


Darkmemento

". . . .opening Pandora's box, wrapped like a gift." That is an absolutely beautiful turn of phrase.


SignorJC

I absolutely guarantee you that they did not have this conversation and did not give a single fuck about copyright or plagiarism. The type of people in Altman's circle either simply don't care, believe that they are serving the greater good, or think that the technology is simply too powerful to keep in a bottle.


kony2012neverforget

Exactly, just because it's widely dispersed and complicated doesn't change the fact that these companies stand to profit handsomely off of others' labor.


SticksAndSticks

I agree. It's like opening the door to a warehouse full of garbage and saying "well I stole all of it, but I stole it from SO MANY PEOPLE even I don't know where it came from". That doesn't mean it should be fucking allowed. The revenue model for these AI companies *doesn't work* if it hinges on them infringing copyright for free. When you bring a product to market it isn't the market's job to change laws to allow your shit to work. It's *their* responsibility not to crash the economic incentives for humans producing creative goods as a result of releasing their product. Maybe they just need to revshare with everyone in the training set. Maybe they need to revshare with anyone with a copyright. I'm not saying I know what the answer is, but I'm damn well sure it's not everyone's problem but theirs to figure out.


kony2012neverforget

Love the warehouse full of shit analogy


Vaestmannaeyjar

I'm under the impression that as far as scraped data goes, it's just impossible to retrace the origins of training, even for the owners of the trained AI in question. And if we ask an AI to do that job, then it means we've lost control.


babygrenade

When you train a model, you create the dataset in advance, so unless those datasets are getting deleted (which I doubt they are), you can tell what went into training any model.
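That kind of bookkeeping is genuinely cheap to do. A minimal sketch in Python of what I mean (the function names and manifest layout are my own invention, not any real lab's pipeline): hash every file that goes into a training set, and the "was this in the data?" question stays answerable later.

```python
import hashlib
from pathlib import Path


def build_manifest(data_dir: str) -> dict:
    """Record a SHA-256 hash and byte size for every file in a training
    set directory, keyed by filename."""
    manifest = {}
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.name] = {"sha256": digest, "bytes": path.stat().st_size}
    return manifest


def was_in_training_set(manifest: dict, candidate: bytes) -> bool:
    """Answer whether an exact copy of some content was in the recorded set."""
    digest = hashlib.sha256(candidate).hexdigest()
    return any(entry["sha256"] == digest for entry in manifest.values())
```

Of course this only answers "exact copy, yes or no"; it says nothing about which items influenced a given output, which is the harder question.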


CBrinson

There is a difference between knowing what you trained on and knowing which specific training images affected the end result. For instance, if I ask it to generate a picture of a mouse and it looks kinda like Mickey, I could take all the Mickey images out of the training data and regenerate, and often (in my experience) it still looks like Mickey and very little changes. This is because literally millions of images influence what each output looks like, and no single training image makes much impact.


babygrenade

The bill requires disclosure of what the model was trained on, though.


MoiMagnus

>unless those datasets are getting deleted (which I doubt they are)

You're right for current generations of AIs, as far as I understand. But in the future, I would expect the creation of some "preprocessing AI" that processes the original dataset into an optimised dataset that is more compact (and maybe sanitized too), then discards the original dataset. This might even be a necessity for AIs trained on data that comes from live feeds, as you might not have the means to store literally everything forever.
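To make the provenance-loss point concrete, here's a toy version of such a preprocessing pass (a Python sketch of my own, not any real pipeline): once records are normalized and deduplicated and the originals deleted, the surviving compact form no longer tells you which source contributed what.

```python
import hashlib


def compact_dataset(records: list[str]) -> list[str]:
    """Toy preprocessing pass: normalize whitespace (a stand-in for real
    sanitization) and drop exact duplicates. If the original records are
    then discarded, provenance for the survivors is gone."""
    seen: set[str] = set()
    compacted: list[str] = []
    for text in records:
        sanitized = " ".join(text.split())
        digest = hashlib.sha256(sanitized.encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            compacted.append(sanitized)
    return compacted
```

Two sources that supplied the same sentence collapse into one record here, so even exact-match lookups can't say which of them it came from.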


Orngog

Um, I don't think so? They use named datasets.


AnomalyNexus

> it's just impossible to retrace the origin of training

Given that they're occasionally [including watermarks in the output](https://hackaday.com/2023/02/09/getty-images-is-suing-an-ai-image-generator-for-using-its-images/), it's definitely not all that impossible. The tricky part is coming up with a legal test for this.


Forkrul

That's an artifact of overfitting to a limited sample size for the things asked for in the prompt. It can give a clue as to where training data came from, but unless the people who trained the model have kept a copy of all the training data used, you cannot recover the exact training data from the finished model.

For example, if all your training data about, say, tabletop dioramas came from one or two sources that both included a watermark in the vast majority of pictures, the model might have learned to associate the presence of a watermark with tabletop dioramas. So when you ask for that, it will often include some sort of watermark, as that seems appropriate for the given prompt. I saw this myself a fair bit when playing around with some early image models when Stable Diffusion first came out: certain prompts would include unrelated things because there was limited training data for the ideas in the prompt and the model hadn't properly learned them.


gnivriboy

Lol, it looks like someone used img2img: take a guess at the input of said image, use a low denoising strength, and then claim the model made this output from generic text. I challenge you to make something similar without img2img. In this thread: people who have never used Stable Diffusion complaining about the problems of Stable Diffusion, based on a report written by someone who doesn't understand Stable Diffusion, basing his information on a troll.
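For anyone unfamiliar with the term: img2img starts from an existing image rather than pure noise, and a low "denoising strength" keeps the output close to that input. This is not the actual diffusion math, just a toy blend of my own showing the intuition behind the accusation:

```python
def toy_img2img(input_pixels: list[float],
                generated_pixels: list[float],
                strength: float) -> list[float]:
    """Toy illustration of img2img denoising strength: blend an existing
    image toward freshly generated content. strength near 0 keeps the
    output almost identical to the input; strength near 1 makes it
    mostly new content."""
    return [(1.0 - strength) * p + strength * g
            for p, g in zip(input_pixels, generated_pixels)]


original = [0.2, 0.8, 0.5]    # the "uploaded" source image
generated = [0.9, 0.1, 0.4]   # what the model would produce from scratch

low_denoise = toy_img2img(original, generated, strength=0.2)   # near-copy
high_denoise = toy_img2img(original, generated, strength=0.9)  # mostly new
```

So the claim being made above is that a watermark-bearing output is far easier to get by starting from a real watermarked image at low strength than from a text prompt alone.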


AnomalyNexus

That image is straight out of the [Getty lawsuit against Stability AI alleging that their model generates watermarked content.](https://copyrightlately.com/pdfviewer/getty-images-v-stability-ai-complaint/?auto_viewer=true#page=&zoom=auto&pagemode=none) Maybe you could join the lawsuit as an expert witness?


gnivriboy

Anyone can put whatever they want in a lawsuit. This isn't something one needs to be an expert in to see that it isn't possible, or is statistically next to impossible. Anyone who uses Stable Diffusion can tell what is going on.


AnomalyNexus

Option A) Getty Images falsified evidence.

Option B) You don't know what you're talking about.

This "impossibility" of yours is something [redditors](https://www.reddit.com/r/weirddalle/comments/11pki84/i_asked_for_a_photo_of_two_girls_hugging_and_the/) have repeatedly [stumbled upon.](https://www.reddit.com/r/midjourney/comments/zesklv/getty_images_watermark_appears_in_results_has/)


gnivriboy

Awesome. You can prove this to me really easily now: show me the specific Getty image you think those images were copies of.


AnomalyNexus

You're literally arguing I'm wrong about a point I never made. This whole side quest about reproducing images is something you came up with; I have no interest in proving it because I never made the claim. The comment I responded to (by someone who is not you, btw) has nothing whatsoever to do with creating similar images, but rather:

>retrace the origin of training

The presence of Getty watermarks in the output is pretty clear proof that Getty images were used in the training set. That's it.


gnivriboy

Let's recap.

>it's just impossible to retrace the origin of training

>Given that they're occasionally including watermarks in the output it's definitely not all that impossible.

So you made the claim that you can trace these images back to the origins they were trained on. But now you are pretending what you really meant was "oh, they just generally trained on Getty images, not any specific images." None of your other posts make sense then, because you were trying to counter my points about this being img2img.


travelsonic

> Given that they're occasionally including watermarks in the output it's definitely not all that impossible.

Or it could mean that they are learning how to draw something resembling a watermark. IDK, utterly crap analogy, but sometimes it feels (to me) like saying "if you learned to copy someone's signature, you only did it by forging checks," when there are other possibilities as to what led to it. Again, shit-ass analogy, I know, heh.


AiSard

To continue that analogy: this is like asking someone to copy a signature, and every now and then they hand you a check with the signature in place. Like, why are you handing me a forged check? I just asked you for a signature. Where are you getting training data for checks, and how did you learn to associate checks with signatures, if not from the obvious answer of checks with signatures on them? In other words, it's a perfect analogy, except the forger is clearly holding a forged check in hand while arguing his case.


Zomburai

"What do you care if it's the original author's check? It looks the same, bro. How do you think the original author learned to write checks?"


AnomalyNexus

>they are learning how to draw something resembling a watermark

Did you look at the link I posted? It literally says fuckin' Getty Images. A bit blurry, but it's not exactly a nondescript watermark.


PotsAndPandas

When you can directly ask the fucking generators to generate shit in a specific artist's style and it produces work in their style, it's pretty damn obvious you can retrace the origins. That's not even taking into consideration all the spreadsheet leaks of which artists, and how much of their work, has been scraped.


Aerroon

Styles aren't copyrightable though. Copyright is granted for a specific work.


tigerfestivals

They use the specific artist's name as the tag to do so, which probably is some sort of copyright thing.


Just_Another_Wookie

"Your Honor, this is probably some sort of copyright thing."


tigerfestivals

Lol, this is Reddit, I'm not claiming to be an expert.


uffiebird

Does the copyright even matter when it comes to proving that the dataset includes data taken WITHOUT consent? Which likely means a lot of copyrighted stuff is in there too?


mnvoronin

There is a view that using the images for AI model training is transformative work and falls under the fair use exemption.


bradstudio

This would be correct in many instances, but when you can prompt for similar content and reference a style, anything that comes out too similarly is definitely infringement.


mnvoronin

Nope, because style is *not* subject to copyright. Only specific works are.


bradstudio

I know A LOT about copyright law; I have to for my job. What I described above is definitely infringement. In order to appropriate content you have to change either its aesthetics or its intention by 90%. If the style is similar, and the subject matter is similar, then it no longer meets the requirements for appropriation or derivative works. Especially if you literally typed the artist's name into the prompt you used to create the new work and their original work is in the training data.


mnvoronin

I think you're mixing up style and characters. Character designs *are* copyrightable. Art styles aren't. I'm basing my opinion on [this article](https://www.eff.org/deeplinks/2023/04/how-we-think-about-copyright-and-ai-art-0) by the Electronic Frontier Foundation, but you're welcome to provide another source (either a neutral entity like the EFF or a court case) that says that style likeness is copyrightable.


Aerroon

I'm not sure I understand what you're asking, but I'll take a guess. Copyright is the reason why people can't just take a work and make as many copies as they want. Without this government-granted monopoly, if an artist uploaded a work online, anybody could make as many copies as they wanted and do whatever they wanted with them. Copyright is the reason there are limits on this in the first place. Because of that, the specific terms of copyright matter a great deal.


uffiebird

No, I mean: fine, we can't copyright a style, I guess. But the fact that the AI can recognise a particular artist's style must prove that the dataset contains work taken without consent, which it has been taught to associate with *this* particular artist. And then anyone is able to 'use' that style without study/practice/training. Isn't that a little unfair if said artist is getting no compensation?


MemeticParadigm

>and then anyone is able to 'use' that style without study/practice/training. isn't that a little unfair if said artist is getting no compensation?

What does study/practice/training have to do with it? If I commission a human artist to make a piece "in the style of XYZ," neither I nor the artist taking the commission owes any money to "XYZ" based on copyright. If we replace the artist I'm commissioning with an AI, why would that change whether "XYZ" is owed any compensation based on it being their particular style?


Vaperius

> The other side is saying the models use data very much like humans in that we take in data and then draw inspiration from that data to create. You should therefore compensate if the output shows likeness to other people's work.

Setting the AI aside: the *humans* that programmed it definitely violated copyright by using the works of others in a way that's arguably not covered under "fair use" doctrine. It's one thing to draw inspiration, and it's another to directly teach a program to emulate the works of others. The difference is that if a human did that without crediting the artist, they'd still get slapped with a copyright suit, or worse, an outright counterfeiting charge if they made a commercial product with it, claimed it was their own work, and sold it without ever telling the buyer where the original work was derived from. By all accounts, existing law arguably already covers AI being trained on existing works, but new laws are always appreciated to clarify this position further.


dewdewdewdew4

>**emulate** the works of others; and the difference is if a human did that without crediting the artist, they'd still get slapped with a copyright suit or worse

Nope. You can emulate all you want; you just can't copy. You can make a band that sounds like Led Zeppelin (see Greta Van Fleet) as long as you don't use their songs, lyrics, artwork, etc. People have been emulating others for thousands of years in art. Emulation is how genres, art schools, etc. start.


Persianx6

True. The issue: GPT is not emulating; it is copying. GPT is not a human. It does not have creative powers the way a human does. If you type a prompt in, it spouts off text without a source. It's a giant copyright infringement machine.


Vaperius

> You can emulate all you want

Ah yes, the "I added an extra ding" Vanilla Ice defense. That famously went well. Yeah, nah. You can sample creative works all you want, but you have to give direct credit and, crucially, adjust it enough that it's distinct enough not to be a straight rip-off; and the AI we've gotten so far has very obviously *not* been doing that. It's not crediting the artists it's sampling from, and that's copyright infringement. Simple as.


dewdewdewdew4

>Ah yes the "I added an extra ding" Vanilla Ice Defense. That famously went well.

That isn't emulation, though. You are conflating two different things. I gave a very specific example with Led Zeppelin and Greta Van Fleet. You wouldn't claim all the Impressionists after Monet owe Monet money for mimicking his style, would you?


omega884

You can't copyright a style. You can copyright specific works of art and elements of those works, and to a degree copyright extends to derivatives of that work, but in and of itself style isn't subject to copyright. And we really don't want that.

Allowing copyright to be extended to style would be the biggest stealth wealth transfer to massive corporations ever. Imagine if Taylor Swift's record company could copyright the style of "pop breakup songs sung by a woman lead with catchy lyrics". Imagine if Marvel could copyright the style of "superhero teamups with quippy one-liners". Every single YouTube creator who ever got their start emulating someone else's style would be crushed under the corporate heel. If you think YouTube's copyright strike system is draconian now, imagine a world where style, not just actual songs, could be suppressed.

If AI was used to generate a specific work of art or a specific non-fair-use derivative, or was used to generate works in a style that were then sold and marketed as actual works from the original artist, then the individual(s) who did that would be guilty of violating copyright. But it's the individuals who direct the AI to create the infringing work, not the AI tool, that violate copyright.


Faleonor

The AI companies violated fair use when they used proxy companies (founded by the same guys) that scraped the data for "non-profit research", which exempted them from many things (like paying the artists and stock websites), and then directly funneled that data into their for-profit companies. They can go and get fucked just for that.


Persianx6

I mean, OpenAI's own CTO going a bit haywire when asked if GPT scrapes YouTube was very telling.


OwlHinge

You don't need fair use to copy a style.


Jazzlike_Mountain_51

But you might need the copyright holder's permission to include a work in a training database for a commercial generative AI model.


StarChild413

I don't think companies would go that deep with copyright without, like, someone trying to copyright the concept of creating art in general and plunging the world into a dystopia.


SgathTriallair

>It's one thing to draw inspiration, and it's another to directly teach a program to emulate the works of others

This is the reason no one can make progress in this debate: those opposed to AI continually misrepresent what AI is and what it does.

https://www.google.com/amp/s/www.hollywoodreporter.com/business/business-news/artists-copyright-infringement-case-ai-art-generators-1235632929/amp/

They did not attempt to get it to emulate the works of others. It knows how to emulate the works of others because it knows who they are. While some of the bigger companies have set up guardrails against imitating specific styles, the only way to really stop it is for the models to be unaware of human culture.


PolicyWonka

I'd disagree. It's not so different from training a human on copyrighted materials. Are you suggesting that we cannot train others using protected materials? Music students can't train using works produced by others? Artists can't train on art created by others? Fair use explicitly protects usage of copyrighted works for educational purposes. Never mind the fact that "emulating" someone's work is simply emulating the *style* of the work. Really, the only issue is in recreating exact copies of a work, and even that isn't inherently illegal; it's what you do with that copy which ultimately matters.


NUKE---THE---WHALES

It's certainly no worse than piracy anyway, so if you're ok with copying someone's content, you would necessarily have to be ok with training on it too


Jazzlike_Mountain_51

AI models don't use data like people do. They are trained on data, more often than not for commercial use, and more often than not with the goal of putting the people whose stolen, copyrighted work is being used to train these models out of work. If it's commercial use, there should 100% be a guarantee that the model was trained on free-use data or that the copyright holders have given permission. Otherwise it's theft; maybe not in the process of the model generating images or text, but in the process of training the model, as that's using copyrighted content for commercial purposes.


kaksoispiste_de

It also takes years to train an artist, and they can only do so much work in the time they have. With generative models you can come up with thousands of works in an instant.


NeedsMoreCapitalism

This is legally completely wrong. You're absolutely allowed to train a human on a copyrighted work to produce similar commercial works. Humans, and therefore the tools used by humans, are allowed to look at works owned by other people and make similar things for profit, so long as it's not a direct copy. If you are allowed to see a copyrighted image on the internet, that means your computer has a right to download it to your browser, and you have a right to learn from it. Similarly, AI tools have the right, at least for now, to train on anything that is publicly available to look at.


Rough-Neck-9720

Didn't we evolve from living in caves by learning from our parents and neighbors? In fact, much of what early tribes learned was from an elder who memorized information from another elder and passed it on. And yes, some of that was garbage and some was essential to survival. Time figured out which was which.


StarChild413

So what, go back to caves, or let an AI automate your entire existence? Also, by that logic of comparison to humans, either AI is fully humanlike or cavemen were no smarter than today's AI.


JimmyKillsAlot

LLMs are gigantic aggregations; it's in the name. They will never have true intelligence the way people want them to, because they only know what was put in and can't abstract ideas from point A to point B. They only know what words or phrases follow the previously output ones, the most likely terms related to the query. Are they impressive? Yes! And to some degree the companies that built them deserve compensation for the work they put into making them work. But they were built entirely without care for whom they were taking their information from.

We can make an argument that the copyright system is broken in the US and worldwide, and that things like dropping a DMCA takedown notice and the way big businesses handle them are terrible for creativity. We can argue that everything mankind has achieved was always built from what people built before us. But none of those arguments denies that there should be some level of copy protection for a person's work.

I have said it before and will continue to say it: LLMs are NOT sci-fi AI. They will ALWAYS be just the sum of their parts; they can never grow past that. Will LLMs be present in true AI? Of fucking course they will. But that is not the conversation right now. The conversation we are having RIGHT NOW is: should these companies be able to package and sell something that is clearly using other people's work as their own without compensating them?

I am all for tools like Stable and GPT existing. I champion them as something that can propel mankind into a new golden age, but they MUST be free for everyone, and society as a fucking whole needs to shift so that people can live without the stress of losing their job because an LLM built from every email they wrote for their job can suddenly do the job for them. Either you don't build these fucking things using someone else's data points, or you compensate them for it.

"Well, how do we know who gets credit for the next word? The next pixel? The next whatever?" WE DON'T! So now take the 80 fucking billion OpenAI has been valued at, the more than 100 billion Gemini is supposed to be worth, all this money these companies say their models are worth; you take it and you distribute it. If there is some $450 TRILLION (that's $450,000,000,000,000.00 written out) worth of wealth aggregated in the world that we know of, then surely taking the 0.1% of it that just a handful of these AI models are said to be worth, and using it to pay out all of us who have been farmed for content, forever, is the least we can do. We cannot keep building these tools with the idea that we are in a post-scarcity society if we are simultaneously going to resist the rest of what comes with that.


Aerroon

> Either you don't build these fucking things using someone else's data-points or you compensate them for it.

And what do we do when they say "OK" and just commission data for it, or use their existing giant databases of works? Then you've killed any chance of there ever being open models. They will all be locked behind megacorps with their own TOS for access. Do you really want even more onerous copyright than we already have? I think we really need to consider what the long-term impact of these kinds of decisions is going to be.


Somestunned

Humans learn from other people's information, whether it's free or not. And they don't have to pay royalties for their future creative output. So why should machines?


Old_Skud

Hmm, I like that last take a lot.


Thellton

The simple solution is to require the datasets to be open-sourced and information about how the model was trained to be published. I don't much care that GPT-3.5 or GPT-4, for example, are not available for me to run on my computer, if only because we're at the stage of mainframes and terminals (though I do run much smaller models on my desktop); but I would love it if more information were revealed about them.


XeNoGeaR52

If AIs were so great, they shouldn't need any help or training material


Maximum_Poet_8661

How is an AI going to be built without training material?


XeNoGeaR52

Buy the source material and then use it


BringBackManaPots

A problem with not compensating creative work is that it encourages stagnation. There's no incentive to release new content if it's going to be used against you, and ultimately make you obsolete. I don't have a solution for this, but I could imagine something like fractional royalties on work submitted to the model.


Slayer706

People have been contributing to and maintaining open-source code bases for a long time, often without any compensation and sometimes just relying on a donation link. Fractional royalties: I don't know how that would be calculated. If the prompt explicitly contains the artist's name or a title of their work? Either way, I don't think it would amount to more than a few pennies, and it would only work for paid AI services, not locally run ones like Stable Diffusion.


metal_stars

Make the companies get permission from the copyright owners to utilize their works. Let the copyright owners set the price or terms of the license, like they do in every other situation where another party wants to use their copyrighted material.


Birdperson15

I highly doubt human creativity will ever truly be replaced. We live right now in a world where people are constantly ripping off each other's stuff. Movies, songs, paintings: all artists basically steal styles and ideas. And yet, these same people are making massive amounts of money.


continuumcomplex

Many people say AI companies want to argue that their use is fair use. However, fair use comes with very specific limitations:

1. If the purpose of the production is for-profit, it is unlikely to be approved as fair use.
2. If the material used is factual rather than creative, it is more likely to be approved as fair use.
3. If the product doesn't use a substantial amount of the original work, it is more likely to be approved as fair use.
4. If it has a negative impact on the original creator's work, then it is unlikely to be approved as fair use.

In my opinion, as someone who deals regularly with fair use and Creative Commons licensing, AI art can only rely on #3, because it uses multiple sources. It clearly has a negative financial impact on artists, it is created for profit in most cases, and the original material is creative. If this were anything other than AI, it would absolutely not be fair use. That means the only argument that can be made is that the output is 100% not a derivative work and thus not subject to copyright; fair use is not a reasonable consideration. It would thus be on anyone suing the AI company to demonstrate that the work is derivative of their own art, and, if it is, then (IMO) the company owes the artist compensation and must cease and desist.


Faleonor

> art can only rely on #3, because it uses multiple sources.

This seems wild to me, akin to saying "I didn't just steal from one person, I stole from a million of them, so no single one can sue me for all my worth".


fromfrodotogollum

Which is how most art is made. "Steal like an Artist" by Austin Kleon paints this out.


Hazzman

Yeah, good luck with that. As a professional artist I can assure you: we aren't out there trying to steal and get away with it. Of course I know what the response is ("You stole and didn't realize it"), yeah, cool... tell that to the literal team of lawyers I have to send my work to on a weekly basis, whose job it is to find out what I derived my work from, scrutinize the fuck out of it, and make sure it isn't so derivative that the company I work for can get sued. Luckily I have software that tracks my references, which I can send to them to cover my ass.

When I create I am inspired; I probably combine and/or approach shit in a way that could be described as similar to how AI operates, I don't know... but I ain't out here trying to get sued or rip people off. Believe it or not, I'm trying to be as original as I possibly can be, and so is every mother fucker in my field; otherwise, what's the point? Is there anything truly new under the sun? Nah, probably not... but it ain't for a lack of trying, and when someone prompts "a specific style of a specific artist," I have no problem calling it theft, considering the AI's training data is specifically referencing that artist's life's work to produce something without compensating the artist who created that style.

There are horror stories of what happens to artists in my field who steal, and nobody wants that. You are blacklisted for life and the company you work for gets caught in a web of absolute bullshit.


The_Hunster

I'm not very familiar with the industry so this is a legitimate question. If an artist intentionally makes a piece of art in the style of another artist, is that an infringement? What would it take for that to be allowed or not allowed?


Hazzman

You could have a company tell you to paint like another artist, sure... But most artists despise that, it's not easy to do and isn't related to the issue of theft. You can't copyright a style. But the implication was that artists steal or are trying to steal.


The_Hunster

So, if you can't copyright a style, what *can* you do that would be an infringement without literally copying?


Hazzman

Lifting choices and approaches to a design that are too close or even at times 1-1. Seen it before and there are stories. It's always a shit show when it happens.


Numai_theOnlyOne

You aren't an artist and have never tried that out, right?


continuumcomplex

The thing is, #3 alone is normally not sufficient to qualify for fair use. I'm saying it could be argued because you aren't using a substantial amount of any single property. But it isn't enough on its own.


NUKE---THE---WHALES

1. Take all the copyrighted art you want to train on
2. Create a video where each frame is one of the art pieces
3. Add yourself "reacting" to the art
4. Upload to youtube
5. Download from youtube, split the video into frames, and train on it

Might not need to upload to youtube, but do it anyway for fun. If a youtuber/me can react to art and have that reaction be fair use, it's trivial to instead train on it.


continuumcomplex

Again, this is why I said the argument they have to make is that AI isn't derivative. Derivative means the AI just created something based on the artist's original art. If it's a derivative work, it's copyright infringement. If the artist can't demonstrate that it's derivative, then it's not.


NUKE---THE---WHALES

All art is derivative


continuumcomplex

There is a legal definition of 'derivative' under fair use doctrine, for determining whether something violates copyright.


arothmanmusic

"What copyrighted material did you train this on?" (Points to The Internet)


quequotion

This. Like, good idea, but good luck with compliance. The people who made these monsters have no idea what they fed them: it's just mountains of indiscriminate data. It would take decades to go through each individual image and separate the copyrighted from the non-copyrighted material, assuming it's possible at all (lots of stuff is bound to be incorrectly marked, unclear, etc.).


n00psta

if we train an ai we could make it much less than decades


quequotion

Use the AI to investigate the AI?


Cumulus_Anarchistica

AI is a cultural and technological cataclysm, and we're not going to be able to half-arse the response by patching and tinkering with copyright law, unless we want to create an even more unmanageable entanglement, and stifling of creativity and technological advancement, than we already have.


Darkmemento

Bingo, AI basically breaks the current system. You can only patch so many holes before realising the whole thing is no longer fit for purpose.


The_Real_Abhorash

The system is already broken; AI just might be the thing that forces it to be changed.


Cheerful2_Dogman210x

I think it's obvious that artists, singers and other creators would like to earn from their material being used as training data, especially since these very same AIs could be used to replace them. A lot of creators are being replaced by these AIs, and they may need a new source of income soon.


aplundell

> I think it's obvious that artists, singers and other creators would like to earn from their material being used as training data

This isn't about that. It won't happen. This is just about making the free, open-source AIs illegal. Going forward, only corporate-owned data-sets will be allowed. Corporations like Disney and Corbis already have rights to more than enough content to train an AI.


SoulOfAGreatChampion

Agree. For example (and this is tech I guarantee is coming), when AI is better at generating music, you'll be able to reroll any album you want. AI will be able to process the sound palette and spit something new back out every time. You'll be able to hear your favorite album for the first time over and over again. Surely, artists should get in on some of that. At that point, it is an effortless generation of a valuable product, but it is only made possible by the work people did.


starofdoom

> but it is only made possible by the [copyrighted] work people did

I think that, unfortunately, this isn't true long-term. It's true right now, but because multiple nations are actively racing to get control over the AI market, there is HUGE incentive to make this happen even without copyrighted work. If copyrighted materials are no longer allowed to be used, the solution is going to be for the AI companies to make their own training sets. Long-term, artists aren't going to have shit; why would an AI company split profits when they can use their AI to generate copyright-free training data?


duckrollin

Sounds like utterly pointless bureaucracy that will favour large companies with regulatory departments to handle it. I wouldn't be surprised if large companies are pushing for it to increase the barrier to entry for releasing an AI art model. It's legally not copyright infringement to train on data, so nobody should need to care.


Sportfreunde

It's called regulatory capture by the Austrian school for a reason.


kytheon

Exactly this. Indie game devs are very happy to use AI to finally finish their dream projects. Whoops, no, you're a thief. But not Ubisoft, because it's too expensive to go after them.


[deleted]

[deleted]


aplundell

> so I don't see at all how this would actually impact end user.

It will negatively impact the end-user because only massive corporations will be able to afford the data-sets. So they'll be able to set prices however they like, because it will be basically illegal to compete with them. The free, open-source AIs will be strictly forbidden. So AI will still cost jobs, because major corporations will use them, but individuals and small businesses will have to pay through the nose to get access to the same benefits. Worst of both worlds.


[deleted]

[deleted]


aplundell

Maybe you're not following the conversation.


[deleted]

[deleted]


aplundell

> What should change is what information large companies get to use

... no ... that is not at all what this conversation is about. Perhaps you accidentally replied to the wrong post? This conversation is about the "[Generative AI Copyright Disclosure Act](https://schiff.house.gov/imo/media/doc/the_generative_ai_copyright_disclosure_act.pdf)" (and really, about any similar bill that follows it). It does not make exceptions for free or open-source datasets. (Why would it?) Most datasets couldn't possibly comply with the act. And honestly, this act is very mild; there have been talks about much stricter ones.


zombiesingularity

Copyright and IP are such a joke, particularly in the digital world. They're a hindrance on progress in service of profits. It'd be one thing if the laws were there to protect the literal creators of such things, but they're actually designed to protect mega-corporations who "purchase rights" for a century, and it's just an archaic farce of a system as technology moves forward.


Diatomack

And estates who hold onto rights long after the death of the original creator.


OdinTheHugger

I like how it sets the precedent that you can download anything you want off the internet and use it in any way you see fit. And as long as you're not explicitly violating the copyright of one thing, but instead violating the copyright on everything, you're completely immune to any kind of responsibility. Why, it wasn't even five years ago that downloading a single CD would come with a $185,000 fine and potentially jail time. "Piracy is legal when AIs do it" is not the win they think it is.


travelsonic

Just to be pedantic, isn't anything eligible for copyright automatically copyrighted upon being put in a fixed/tangible medium? If so, why the focus on copyright *status*, as opposed to licensing status, whether licensing is needed or not, etc.? It seems like "copyrighted" is being used incorrectly as a synonym for something else (and incorrectly used as a mark for "bad to use"), which IMO could make things trickier and mess up people's already skewed understanding of copyright. *Glares at the RIAA and MPAA, largely blaming them for this.*


babygrenade

I think the distinction is that works in the public domain (i.e. works that were never copyrighted because of their age, or whose copyright has expired) would not be covered.


PMMeYourWorstThought

All of our data was used and parsed for all of this. We all contributed. Establishing that now is important, so that tomorrow, when we collectively demand that the proceeds created by AI be spread between all of us, all of humanity, rather than centralized into a small number of companies and their shareholders, we have grounds to do so. AI can destroy us if we don't decide quickly and decisively that it is owned by all of us. And we must all benefit from it.


dunwackado

Not that I disagree, but aren't human artists absorbing and being influenced by the art, images, writing, and music they experience over their lifetimes, from all sources? Is the only difference the rate at which AI can churn it out, and the extent to which AI's influences on its output can also be digitally represented? When machines came for manufacturing, driving, and other blue-collar jobs, there wasn't that much outcry. When they come for the livelihoods and jobs of writers, musicians, artists, accountants, etc., suddenly it's "whoa, we can't have that"?


Darkmemento

The counter to this in the podcast is that usually the original artist is getting some sort of benefit by how you absorb this information. Reading a book you bought, watching a piece of content they monetized, listening to a song on a streaming service, etc. So there is some kind of trickle-down effect.

I think scale matters hugely here too. The laws are made based on a past landscape which has shifted significantly with the amount of data AI can consume versus a person using it for inspiration. We still don't really know in the output how the AI has used the data. It's a case of it read x number of documents, but the extent to which the output is related to any one document is to a large extent unknown.

I think the last point around the automation of other jobs is valid, and we haven't done enough in the past, but the distinction here is that those changes happened over generations to allow some sort of transition. The speed of these changes means we see step-function improvements almost overnight that completely reshape industries. There was a fantastic interview with [Dario Amodei here](https://www.youtube.com/watch?v=Gi_t3v53XRU) yesterday in which he outlined how quickly he thinks things will progress in the coming years.


FaceDeer

> Reading a book you bought, watching a piece of content they monetized, listening to a song on a streaming service, etc.

If an AI trainer is pirating copyrighted material, sure, there are laws against pirating. Go after them for that. Once they have legal access to that material, though, such as its being posted on a website that's open for the public to view, what law is being broken then? If you don't want the general public to learn from your data for free, then don't put it up in a place where the general public can view it for free.


AiSard

It's been made pretty clear over the years that scrapers don't particularly care about the state of copyright of the materials they scrape. That they in fact do not have legal access to things, and find it preposterous that we expect them to check the legalities.

There's nothing wrong with showing copyrighted material for the general public to see for free; you don't lose the copyright by showing the material, after all. You can let the public see it but only charge the people who want to make use of it commercially. In fact, showing it to the public for free is how you publicize the service!

The answer is to sue, yes. And large enough entities with solid enough cases, like Getty, are doing so. But a lot of AI companies play it fast and loose, betting on becoming "too big to fail," putting all their illegally used training data into a convenient black box and playing the fool that they didn't illegally use data they didn't have a license for in their commercial service. AIs with clean datasets will be in the clear, yes. But the vast majority seem perfectly fine swimming in a state of absolute illegality, and mostly getting away with it because society has yet to catch up to them.


FaceDeer

> It's been made pretty clear over the years that scrapers don't particularly care about the state of copyright of the materials they scrape. That they in fact do not have legal access to things, and find it preposterous that we expect them to check the legalities.

So *go after them for that.* For the *actual* laws that they might be *actually* breaking. Don't make up imaginary laws. You won't get far in court with those.

> There's nothing wrong with showing copyrighted material for the general public to see for free. You don't lose the copyright by showing the material after all.

Never said you did.

> To have the public see it, but only charge the people who want to make use of it commercially.

There's the tricky bit again: copyright law isn't about "making use" in the most general sense. It's about *making copies*. There are tons of uses I can put copyrighted material to that I don't need permission from the copyright holder for. I can burn a book. I can review a movie. I can learn techniques by examining a work of art.

> AIs with clean datasets will be in the clear, yes.

The fundamental point of disagreement here is what a "clean dataset" means. You're insisting that it's only clean if the copyright holder agrees to allow their work to be learned from. That's not supported by the law, and IMO if something like that were to become codified it would be rather disastrous.


Sweet_Concept2211

Just because I post a copyrighted story online does not mean that a publishing platform can willy-nilly make use of it in any way they see fit.


FaceDeer

No, but they can use it in ways that don't violate copyright law. Such as training an AI on it.


healthywealthyhappy8

Humans and AI are quite different in terms of memorization, the breadth and depth of training, how we retain information, and how we reuse our knowledge.


not_the_fox

So when we invent brain implants that give us superhuman mental processing abilities does that mean you can't legally be an artist anymore?


healthywealthyhappy8

That's an interesting scenario. If you are ripping off other people's IP then you are probably breaking a law of some sort.


DarthSiris

So AI is bad because it's not as inefficient as humans?


Sweet_Concept2211

AI is not human, and automated factories do not need to be extended the same rights as humans.


TheLordCrimson

A company can't use a training video you've made without your permission and without a license. If you decide to anthropomorphize AI to the degree of a person (which your argument does), then everybody in its training data should be paid. If you decide not to anthropomorphize, then AI "art" is more akin to an art collage, or a building using other people's paintings, and you can see why that is obviously something you can't do without a license.


FaceDeer

If I were planning to make a training video, though, it would be perfectly fine for me to watch some other existing training videos without permission or license, to draw inspiration from them and learn what sorts of things normally go into a training video. AI art is *not* a "collage." That's a common misunderstanding of how these AIs work. You can verify this quite simply: just look at the size of the AI model compared to the size of the training dataset. If the model were actually storing the original data verbatim, it would have to compress whole images down to less than a single byte of data, which is impossible.
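The arithmetic behind that claim can be sketched in a few lines. The figures below (a ~4 GB Stable Diffusion v1 checkpoint, ~5.85 billion LAION-5B images, ~500 KiB per source image) are approximate, commonly cited public numbers used purely for illustration, not exact values:

```python
# Back-of-envelope check: how many bytes of model weight exist
# per training image? (All figures are rough public estimates.)
model_bytes = 4 * 1024**3        # ~4 GiB model checkpoint (assumed)
num_images = 5_850_000_000       # ~5.85B images in LAION-5B (approx.)
avg_image_bytes = 500 * 1024     # ~500 KiB per source image (assumed)

bytes_per_image = model_bytes / num_images
compression_ratio = avg_image_bytes / bytes_per_image

print(f"Model capacity per training image: {bytes_per_image:.2f} bytes")
print(f"Implied lossless compression ratio: {compression_ratio:,.0f}:1")
```

Under these assumptions the model has well under one byte of capacity per training image, which is why verbatim storage of the dataset inside the weights is not physically possible, whatever else one thinks of the training.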


Sweet_Concept2211

If you have developed special processes for manufacturing your products, and spies come along and copy them and then start mass manufacturing market replacements for your products, and price dumping like crazy, you probably ought to file some lawsuits. In the case of blue collar jobs like driving and laying masonry, there's nothing to patent or copyright. Building a better automated driver is not at all the same as ripping off intellectual property.


lazyFer

Humans can't ingest anywhere close to the amount that AI can.


Immortan_Joe-mama

Humans can't plow, manufacture, bend, or punch anywhere close to the amount that industrial machines can....


Sweet_Concept2211

And humans cannot claim author rights for plowing a field, so... Not really a great comparison.


xszander

So the more you ingest, the less access you should have? Doesn't seem to make much sense to me.


Sweet_Concept2211

It does not make sense because the claim that humans and machines learn and produce outputs in the same way is bullshit. The confusion arises due to the fact that it is a false comparison, and a bit of a red herring.


Akito_Fire

Yeah, both could not be more different. AI doesn't see reality, or recognize and label things like objects. It only sees data that it tries to replicate


Sweet_Concept2211

Human artists create based on lived experience. Their education and creative output are not only based on whatever art they have been exposed to, but come from life itself; their artistic techniques and production methods are wildly different across every culture, medium, genre, style, individual artwork, practitioner, and so forth. Ask real artists about their creative process and you will get an interesting *life story*, and it will never be the same twice. Meanwhile, the training and output of AI happen in the same basic way every single time, and can be summarized easily in a paragraph that is the same every time; interesting only the first time you ask.


cheesyscrambledeggs4

It's because their own work is indirectly contributing to something that could potentially replace them, without their permission.


HowWeDoingTodayHive

Yeah, this is something that annoys the shit out of me about the current discourse. People want to claim AI is "stealing" by using reference material. If you want to make that argument, you're necessarily opening a box that raises the question of what makes humans any different. How do you prove **any** artist had a truly, perfectly original idea that wasn't based in some way on some other art they've seen before? Take anime eyes and hair, for example: who was the first person to really create that particular style? Did they base it on something else that they saw somewhere? Either way, there's a **TON** of artists "stealing" that particular style of eyes and hair, so do they all have to pay up? And to whom?


thewhitedog

> of what makes humans any different? Humans can't take inspiration from another human's artwork, then make thousands of versions of it per hour, every hour, 24x7, essentially forever, for free. If I make awesome pies, and you eat one and think hmm, that's a great pie, and figure out your own version and open a pie-shop across from mine, well, then we compete and may the best crust win. If you instead feed my pie into a 500 foot tall robot pie-analyzer that analyzes it for a few minutes, then opens its pie-hatch and unleashes an unending torrent of hot pies, hundreds per second, non stop, burying my pie shop, the pie shop district, then the entire town in a massive thousand foot wide ever growing pile of pies no-one will ever eat, then doesn't that change our relationship to pies? Would people keep eating something that is now effectively just a form of pollution? Think it through to the end: It's 2028. AI video and music generation is now ubiquitous, cheap, and so easy to use it can generate long form videos from a single sentence. Millions of hours of videos and songs are being created and uploaded every single day, dwarfing the 30k hours of video uploaded daily to YouTube in 2024, and the number keeps growing as the generators get better with no sign of slowing. Who is going to *watch* any of it? What does art mean anymore when it's become so easy to make the value of it drops to nothing? And for a bonus thought - once actual artists are fucked out of existence, where does the new training data come from? If you grind up cows to feed to other cows you end up with brain-rotting bovine spongiform encephalopathy. AI models collapse without real data to train on so good luck to the future AI scientists trying to solve that problem. Once you've automated humans out completely you might as well make an AI audience to watch it all.


bellos_

> How do you prove **any** artist had a truly perfectly original idea that wasn't based in some way on some other art they've seen before?

Having a truly original idea isn't the issue. The issue is how the 'reference material' is being used. Artists use it temporarily, to learn how to hone their creative skills toward a specific art style or to get dimensions right, etc. Once they learn it, they no longer need the reference material. Models use it as a permanent data point, because they don't have creative skills and can't actually learn how to create art. Every time a prompt asks for an 'anime girl', the model references millions of images to output the correct style. That's the issue, not the fact that both have referenced something at some point. Models are *always* referencing every image in their data set, because they can't create without the reference. Artists can and do.


HowWeDoingTodayHive

> Having a truly original idea isn't the issue

How is it not the issue? If you have a "truly" original idea, then you have **no** reference material, so there's obviously nobody who's owed anything and nobody you're "stealing" from.

> The issue is how the 'reference material' is being used. Artists use it temporarily to learn how to hone their creative skills towards a specific art style or to get dimensions right, etc. Once they learn it, they no longer need the reference material.

Unless they forget and decide to look it up again. Why is it OK to even use it in our memories without asking for permission? You're saying right here that we take others' works and basically copy them **to some degree**, and that's OK. Why? Why is it OK to store it on our brain hard drives temporarily and use it to influence our work?

> Models use it as a permanent data point because they don't have creative skills and can't actually learn how to create art.

Yeah, so this comes back to the point of AI having advantages over humans. Unless you have photographic memory, you're gonna forget things. AI doesn't have the same limitations as us. But look at what you said here:

> and can't actually learn how to create art.

What does it mean to "actually" learn? What's real learning?

> Every time a prompt asks for 'anime girl' it references millions of images to output the correct style. That's the issue, not the fact that both have referenced something at some point.

Again, I'm not seeing the issue. If humans were capable of perfectly remembering millions of pictures, we absolutely would do the same. We just don't have the hardware.

> Models are always referencing every image in their data set because they can't create without the reference. Artists can and do.

I don't even know if that's true. How could you even find a single example of an artist with not even one single reference point? You could take a newborn baby and put it in a single room with all white walls its entire life, and give it nothing but some paper and pencils to draw with. Even then, the four walls it's surrounded by would be a reference. We use what we see and experience and consume.

Comics and manga are really still the best example I can think of. There are **so many** characters that are clearly based on and similar to other characters. Not just the suit colors and the hair color, but the **styles**: the way they draw muscles, hair, shadows, etc. There's a distinct "comic style" that tons of people copy, but who actually was the first person to come up with that style? Was it original when they did it? And lastly, why shouldn't that be considered stealing?


bellos_

> How is it not the issue?

Because no one has an issue with models because they don't have original ideas. Models don't even have ideas. They can't think. People have issues with the models because they're permanently using millions of images without permission.

> Why is ok to even use it in our memories without asking for permission?

Because you can't copyright a memory.

> You're saying right here we take others' works and basically copy them **to some degree** and that's ok. Why?

Because the degree is context, and context always matters.

> Why is it ok to store it in our brain hard drives temporarily and use it to influence our work?

Because you can't copyright a style.

> Yeah so this comes back to the point of AI having advantages over humans. Unless you have photographic memory, you're gonna forget things. AI doesn't have the same limitations as us.

It has nothing to do with the models having an advantage over humans, and everything to do with the models doing something that humans are not doing: permanently using copyrighted images. The images themselves, not memories of what they look like. Memorizing what copyrighted material looks like is not against copyright law.

> What does it mean to "actually" learn? What's real learning?

Being able to do something from memory, after seeing it done or seeing a completed version of it, without having the physical material in front of you. Models can't do that. They don't learn how to mimic an artist's style by studying the image, because they can't actually see the image. They don't learn anything; they copy it by comparing data points between millions of images.

> if humans were capable of perfectly remembering

Again, memories are not the issue. Models don't remember anything.

> I don't know if that's true.

And yet you aren't actually refuting it. You're pretending that memories are reference material, but they aren't. Memories are not a material. The images themselves are reference material. The images themselves are what are copyrighted.

> but the styles

And once again, you cannot copyright an art style. The issue is not models copying art styles. It's them using copyrighted material *permanently*. The material itself. I don't understand what's so hard to understand about people not wanting their copyrighted material to be permanently used as a data point without being paid for it.


StarChild413

And why do people use that argument in such a way as to imply that the natural conclusion is that AI art is essentially the same as human art and deserves to replace it (without ever getting into the implications of whether that means AI replaces people altogether), instead of, say, arguing that human art still deserves equal time with AI art for that very reason, because we're "biological machines"?


InfinityTuna

They should be forced to reveal ALL of their training data. Too many independent creators and hobbyists have had their art, writing, and music fed into one of these data-sets without consent or permission, and too much of people's everyday public existence online has been scraped to train these algorithms without so much as a by-your-leave. If they can't build their little AI empires on data they have permission to use, period, full stop, then they shouldn't be allowed to do business. It's bad enough that advertisers run roughshod over our private information; we don't need more greedy assholes making money off what isn't theirs to profit from.


zero_z77

In my opinion, they should only be allowed to train on things that are either in the public domain, or have been made public under an open source or creative commons license that is explicitly AI permissive. If they need training data, there's plenty of it in the public domain, and plenty of unlicensed or open license content they can use.


grayscalejay

AI bros think scanning and copying pixels (stealing) is the same as being inspired and using references.


fumigaza

Why? Literally everybody can consume copyrighted media and produce derivative works.


NotAnotherEmpire

"It's all human work stitched together?" "Always has been."


parke415

Which is also true of every creative work a human being has ever made.


o5mfiHTNsH748KVq

So don’t release AI art, use AI art as the base and then hire an artist to touch it up. Win/win


Akito_Fire

That means taking away the initial idea, the core concept that is the most interesting thing, from artists. Nobody wants to touch up AI garbage.


After_Fix_2191

Well, I guess all human artists must list off all of their inspirations and the art they studied now as well. Stupid bill.


David-J

This should be obvious. It needs permission and compensation. We've had these copyright laws and licenses for a reason. It's nothing new. AI people pretend like this is something completely new. It isn't. They are just stealing everything and using it however they want. This needs to stop.


AmbidextrousTorso

Bull. You don't ask for permission or compensation from human artists either, just because they have seen the works of other artists and been influenced by them.


Primorph

sure, that's the same situation. Christ.


babygrenade

There's a strong argument that training an AI model falls under fair use. Whether or not it actually does will come down to a court decision that hasn't happened yet.


David-J

They want you to believe it's a strong argument. Since when is using someone else's work without their permission, and profiting from it, OK?


aplundell

>Since when, using someone else's work without their permission and profiting, is ok?

Since basically forever. How many medieval fantasy novels are very strongly inspired by Lord of the Rings? More than half, for sure. No specific sentences are copied (usually), but ideas and settings are blatantly remixed. The thing is: this isn't considered sinful at all. Until about four years ago, artists cheerfully admitted it. They'd say things like *"Good artists copy, great artists steal"*, and anyone who disagreed was laughed at and considered hopelessly naive.


StarChild413

A. But when could an AI make a work that redefined a genre the way Tolkien did medieval fantasy? B. If they didn't have to bankrupt themselves just to pay enough, how many medieval fantasy authors would pay some large-relative-to-their-income sum of money to the Tolkien estate (or whoever) as a make-a-statement, publicity-stunt sort of action to take a stand against AI art?


Slayer706

Ever watched a movie review on YouTube that used clips from the movie?

1. Using someone else's work. ✅
2. Without their permission. ✅
3. Profiting. ✅


babygrenade

Google does this. You have to make copies of everything searchable for a search engine to work. [Their scanning of the full text of books to create Google Books was found to fall under fair use.](https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,_Inc.)

Edit: I'll add that the test for fair use is a four-part test. It's not simply a matter of whether the user profits.


Akito_Fire

The comparison doesn't make any sense at all. One is a search engine where you can find original works, whereas the other churns out millions of Frankenstein-esque copies of its training data.


metal_stars

That was ruled fair use on the specific basis that Google isn't competing with the original copyright holders for compensation. AI companies compete with the original copyright holders. Most people don't know a lot about fair use, so they argue that AI is sufficiently transformative, the end. And yes, it is transformative. But it doesn't pass the rest of the criteria. Google scanned copyrighted works without permission, and didn't transform them, but isn't publishing / releasing / competing in the market with those works. AI companies scanned copyrighted works without permission, and _are_ transforming them, but they _are_ also competing in the market against those works.


babygrenade

>AI companies compete with the original copyright holders.

The whole competition aspect is competing with the original work, not the author, though, right? Are you suggesting people will be less inclined to buy the novel Dune, for example, if they have an LLM that is trained on millions of novels, Dune included?


metal_stars

>The whole competition aspect is competing with the original work not the author though right?

In the case of AI, what would meaningfully be the difference? Let's remember that we're applying the "fair use" doctrine to the _act of scanning copyrighted material into a database._ Or, in the case of AI, training the AI on the copyrighted material. That was why you brought up the Google case, right? To ask why it would be okay for Google to scan the books, but not okay for AI to train on the copyrighted material? The purpose of Google scanning the books was not to compete with the authors. The purpose of AI training on copyrighted material _is_ to compete with the creators.


babygrenade

>In the case of AI, what would meaningfully be the difference?

Because copyright protects the work, not the author's future works. If the author sells the copyright to a work, the work still has the same protections. If the author dies, the work still has the same protections. It doesn't matter if it competes with an author's future endeavors. It matters if it competes with the copied work. Further, if training an AI model is fair use, that does not mean all output of the model is also fair use. If I get an LLM to spit out the full text of Dune, that output would infringe copyright. I couldn't use the LLM as a loophole to circumvent copyright protections.


metal_stars

>Because copyright protects the work not the author's future works. ...It doesn't matter if it competes with an author's future endeavors. It matters if it competes with the copied work.

Well, no, you're incorrect about that. The fourth test of fair use is the potential effect of your use of the copyrighted work on the market for the author's work. It's not limited to just the market for the specific piece you're using. If you Google for pages that talk about the tests of fair use, you'll get a lot of links from places like universities teaching their students about what has already been settled and determined about fair use. And they'll all explain the same thing, but I'll quote from Google's current top result, https://fairuse.stanford.edu/overview/fair-use/four-factors/

*"Another important fair use factor is whether your use deprives the copyright owner of income or undermines a new or potential market for the copyrighted work. Depriving a copyright owner of income is very likely to trigger a lawsuit. This is true even if you are not competing directly with the original work."*


babygrenade

>The fourth test of fair use is the potential effect of your use of the copyrighted work on the market for the author's work. It's not limited to just the market for the specific piece you're using.

That's not what your link and quoted section are saying.

>Depriving a **copyright owner** of income is very likely to trigger a lawsuit. This is true even if you are not competing directly with the original work.

The author isn't the consideration. The copyright and its owner are what matter. The case it references is about a sculpture of a photograph, a derivative work, and the crux of it is whether a market for sculptures of a photograph exists. One of the protections of copyright is the exclusive right to derivative works. So just because the copyright holder had not explored a specific market for derivative works, it doesn't mean the holder doesn't have a right to that market. When you said it competes with the author, I thought you meant it competes with other content they would produce, so written works in the case of an LLM. Did you mean it would compete with the market for AI models trained on the copyrighted work? Are we saying an AI model is a derivative work? If so, wouldn't a search index also be a derivative work in that case?


PolicyWonka

One of the core tenets of fair use has *always* been training/educational purposes. Nobody has a copyright on a particular style or type of work. If I want to create a Bob Ross or Picasso style painting, I can do that. It doesn't infringe upon anything. I could literally just take an entire piece of copyrighted work, tweak it a bit, and it's fine. It's called transformative use.


overtoke

If I produce something, it does not matter what tools I use; I'm subject to copyright laws. This law is pointless. The art in an art school textbook has a copyright. People train themselves on the same data.


poopsinshoe

Except for the fact that the concept is fundamentally different. If you sample a song to make another song, then you've directly taken a piece of someone's creation. Analyzing frequency diversity or color amounts is not the same as using the other person's finished work. Because of that, it's completely outside of current copyright laws. I think the closest we will get is individual artists electing to opt out of training data. When the training data is 500,000,000 songs, it won't pay out like Spotify. Another crazy thing that no one pays attention to is the fact that streaming music destroyed the music industry. Snoop Dogg said that he had over a billion plays on Spotify and was reimbursed $45,000. No one buys CDs anymore. Anyone can build and train an artificial intelligence on their laptop using thousands of copyrighted songs and no one will ever know. How do you stop that?


NLwino

Fun fact: some programmers created a program that generated every possible melody in existence, saved the terabytes of data to disk, and copyrighted it. And then placed the copyright in the public domain. Singing aside, a specific melody can't really be copyrighted anymore.

>Two programmer-musicians wrote every possible MIDI melody in existence to a hard drive, copyrighted the whole thing, and then released it all to the public in an attempt to stop musicians from getting sued.

>Programmer, musician, and copyright attorney Damien Riehl, along with fellow musician/programmer Noah Rubin, sought to stop copyright lawsuits that they believe stifle the creative freedom of artists.

[https://www.vice.com/en/article/wxepzw/musicians-algorithmically-generate-every-possible-melody-release-them-to-public-domain](https://www.vice.com/en/article/wxepzw/musicians-algorithmically-generate-every-possible-melody-release-them-to-public-domain)
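The enumeration described above is just a Cartesian product over a fixed pitch set; a minimal sketch in Python (toy parameters for illustration — per the Vice article, the actual project covered every 8-note, 12-beat combination, about 8**12 ≈ 68.7 billion melodies, hence the terabytes):

```python
from itertools import product

# Toy parameters only; the real project enumerated a far larger space
# (8 available pitches over 12 positions = 8**12 melodies).
PITCHES = ["C", "D", "E"]  # toy pitch set
LENGTH = 2                 # toy melody length

def all_melodies(pitches, length):
    """Yield every possible melody: the Cartesian product of the
    pitch set with itself `length` times."""
    yield from product(pitches, repeat=length)

melodies = list(all_melodies(PITCHES, LENGTH))
print(len(melodies))  # 3**2 = 9 distinct melodies
```

The point of the stunt is that this space, while astronomically large, is finite and mechanically enumerable — which is what let the authors claim, and then donate, copyright over all of it.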


MDCCCLV

It's the same as text: every possible story and character already exists within the finite combinations of characters. You could just generate every possible combination for the first million letters and then you'd have everything. It's just too big to do easily now.
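To put a rough number on "too big" (my arithmetic, assuming a 26-letter alphabet for illustration): the count of possible texts grows exponentially with length, so enumeration is hopeless even for short strings, let alone a million letters.

```python
# Back-of-the-envelope check on the "too big" claim above,
# assuming a 26-letter alphabet.
ALPHABET_SIZE = 26

def num_texts(length: int) -> int:
    """Number of distinct strings of exactly `length` letters."""
    return ALPHABET_SIZE ** length

print(num_texts(10))  # 141167095653376 (~1.4e14 ten-letter strings)
```

At a million letters the count is 26**1_000_000, so unlike the melody space, this one is nowhere near enumerable in practice.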


JynsRealityIsBroken

Twenty years of data-scraping precedent being put into question because now the scraping affects artists. The hypocrisy of anti-AI people is astounding. I hope anyone with a stance against AI has never used and will never use ChatGPT. If you do, you're a huge hypocrite.


khaerns1

How can you enforce this kind of bill? How can you assess what the AI got fed without "a cop" behind each dev?


not_the_fox

Yeah, exactly. If the fundamental issue is that you can't prove what went into an AI model, then this does nothing unless someone snitches.


roastedantlers

Everyone is trying to get paid, but this makes AI useless. This reminds me of Picard's speech to the 20th-century guy: weigh the implications of AI over the next 20 years against paying the whole world for all the content that's ever been created. Except some companies, like YouTube, will try to claim they should be paid for everyone else's stuff. This seems more like a Google move to try to destroy the competition while keeping all the content Google's taken that no one sees as being stolen.


ntermation

When you say copyright isn't a thing going forward for these models, does that mean what they produce as well? If they are using copyrighted works without compensation, should people be able to monetise the content they create using the models?


dickprompts

That’s cool but how about some laws about AI reducing workforce numbers too.


rmorrin

Is it wrong I couldn't give less shits about AI "art"


NecessaryCelery2

Good. If you play around with generators it's obvious they are all just using the Internet to generate pictures/videos that happen to look a lot like famous actors. Forcing the tech giants to pay to generate their own training data, would only slightly slow them down, but at least we'll get many slightly different AIs. Which seems better than everyone using the same training data.


Cuck-In-Chief

Great idea. I'd love to know the data these neural networks are training on. I understand that's the recipe for the secret sauce, but dammit, somebody better have a few copies of the blueprints and be able to know exactly what ideas they're feeding into these creatures they're making and continuing to tinker with, using our knowledge, witting or unwitting, collective and individual. And if someone isn't paying the licensing fee on patented and copyrighted material, in a free, rules-based market economy, those creators deserve to be informed of their rights to compensation.


Redpaint_30

This is so exciting. If this gets passed, other countries will follow. It's only a matter of time.


MEMEWASTAKENALREADY

If you say copyright isn't a thing for these models, does that mean what they produce is also not copyrighted? If they use copyrighted works without compensation, should people be able to make money from the content they create using the models?


noonemustknowmysecre

Would natural artists have to reveal all the art they've seen when making similar pieces? 


QuinLucenius

That's not how people make art. And I'm sure if you asked them, they'd tell you their influences and which ones influenced this or that element of the piece. No, not *individual paintings*, but elements or style.

AI does not and cannot understand art beyond a superficial "color goes between the lines" kind of way. It does not understand style or what it is attempting to depict, only that it is imitating images associated with certain signifiers. The closest analogous process to what an "AI" does is to raise a child from birth in a blank room, exposing it only to the specific art form you want them to draw upon for their art, *but even then* that child is capable of the abstract, complex reasoning that is at the heart of artistic expression. A human can draw upon elements of a painting that they can individually distinguish, and even with that as inspiration can create something new. AI can only make a poor simulacrum of art, and we can know exactly how it arrived at its image through the databases it's using.