2 authors say OpenAI 'ingested' their books to train ChatGPT. Now they're suing, and a 'wave' of similar court cases may follow.

L4sBot@lemmy.world · 1 year ago


totallynotarobot@lemmy.world · 1 year ago

Can’t reply directly to @OldGreyTroll@kbin.social because of that “language” bug, but:

The problem is that they then sell the notes in that database for giant piles of cash. Props to you if you’re profiting off your research the way OpenAI can profit off its model.

But yes, the lack of meat is an issue. If I read that article right, though, it's not the one being contested here. (IANAL, and this is the only article I've read on this particular suit, so I may be wrong.)

totallynotarobot@lemmy.world · 1 year ago

@owf@kbin.social can’t reply directly to you either, same language bug between lemmy and kbin.

That’s a great way to put it.

Frankly idc if it’s “technically legal,” it’s fucking slimy and desperately short-term. The aforementioned chuckleheads will doom our collective creativity for their own immediate gain if they’re not stopped.

Sjatar@sjatar.net · 1 year ago

Was also going to reply to them!

"Well, if you do that, you cite your sources and references. AIs don't do that; by design, they can't.

So it's more like you summarized a bunch of books, passed that off as your own research, and then published and sold it.

I’m pretty sure the authors of the books you used would be pissed."

Again, I can't reply to kbin users.

“I don’t have a problem with the summarized part ^^ What an AI lacks is the ability to credit or reference, and it makes up credits and references if asked to do so.” @bioemerl@kbin.social

totallynotarobot@lemmy.world · 1 year ago

Good point, attribution is a non-trivial part of it.

jecxjo@midwest.social · 1 year ago

The only question I have for content creators of any kind who are worried about AI: do you go after every human who consumed your content when they create anything remotely connected to your work?

I feel like we have a bias towards humans: unless you're actively trying to steal someone's ideas or concepts, we ignore the fact that your content is distilled into some neurons in their brain and becomes part of what they create from that point forward. Would someone with an eidetic memory be forbidden from consuming your work because they could internally reference your material when creating their own?

Eccitaze@yiffit.net · 1 year ago

The problem with AI as it currently stands is that it has no actual comprehension of the prompt, no ability to make leaps of logic, and no ability to extend and build upon existing work to legitimately transform it, except by using other works already fed into its model. All it can do is blend a bunch of shit together to make something that meets a set of criteria. There's little actual fundamental difference between what ChatGPT does and what procedurally generated games like most roguelikes do; the only real difference is that ChatGPT uses a prompt while a roguelike uses an RNG seed. In both cases, though, the resulting product is limited solely to the assets available to it, and if I made a roguelike that used assets ripped straight from Mario, Zelda, Mass Effect, Crash Bandicoot, Resident Evil, and Undertale, I'd be slapped with a cease and desist fast enough to make my head spin.

The fact that OpenAI stole content from everybody in order to make its model doesn’t make it less infringing.

jecxjo@midwest.social · 1 year ago

The fact that OpenAI stole content from everybody in order to make its model doesn’t make it less infringing.

Totally in agreement with you here. They did something wrong and should have to deal with that.

But my question is more about…

The problem with AI as it currently stands is that it has no actual comprehension of the prompt, or ability to make leaps of logic, nor does it have the ability to extend and build upon existing work to legitimately transform it, except by using other works already fed into its model.

Is comprehension what's necessary to avoid copyright infringement? Is it really about a creator being able to be logical or to extend concepts?

I think we have a definition problem with exactly what the issue is. This may be a little too philosophical, but what part of you isn't processing your historical experiences and generating derivative works? When I say “dog”, the thing that pops into your head is an amalgamation of your past experiences and visuals of dogs. Is the only difference between you and a computer the fact that you had experiences with things that aren't created works, while the AI is explicitly fed created content?

AI could be built with a bit of randomness added in to make what it generates “creative” instead of derivative, but I'm wondering how much pure noise would need to be added for the output to be considered created by the AI. Can any of us truly create something that isn't in some part derivative?

There’s little actual fundamental difference between what ChatGPT does and what a procedurally generated game like most roguelikes do

Agreed. I think at this point we are in a strange place because most people think ChatGPT is a far bigger leap in technology than it truly is. Its biggest achievement was being able to process synthesized data fast enough to make it feel conversational.

What worries me is that we will set laws and legal precedent based on a fundamental misunderstanding of what the technology does. I fear that even if all the sample data had been acquired legally, people would still make the same argument, thinking their creations exist inside the AI in some full context, when really they're just synthesized down to what is necessary to answer the question posed: “what's the statistically most likely next word of this sentence?”

Eccitaze@yiffit.net · 1 year ago

Is comprehension what's necessary to avoid copyright infringement? Is it really about a creator being able to be logical or to extend concepts?

I think we have a definition problem with exactly what the issue is. This may be a little too philosophical, but what part of you isn't processing your historical experiences and generating derivative works? When I say “dog”, the thing that pops into your head is an amalgamation of your past experiences and visuals of dogs. Is the only difference between you and a computer the fact that you had experiences with things that aren't created works, while the AI is explicitly fed created content?

That’s part of it, yes, but nowhere near the whole issue.

I think someone else summarized my issue with AI elsewhere in this thread: AI as it currently stands is fundamentally plagiaristic, because it cannot be anything more than the average of its inputs, and cannot be greater than the sum of its inputs. If you ask ChatGPT to summarize the plot of The Matrix and write a brief analysis of the themes and its opinions, ChatGPT doesn't watch the movie, do its own analysis, and give you its own summary. Instead, it pulls up the parts of the data fed into its model that relate to “The Matrix,” “movie summaries,” and “movie analysis,” finds which parts of its training dataset match up to the prompt (likely an article written by Roger Ebert, maybe some scholarly articles, maybe some Metacritic reviews), and spits out a response that combines those parts into something that sounds relatively coherent.

Another issue, in my opinion, is that ChatGPT can't take general concepts and extend them further. To go back to the movie summary example, if you asked a regular layperson to analyze the themes in The Matrix, they would likely focus on the cool gun battles and neat special effects. If you had that same layperson attend a four-year college and earn a bachelor's in media studies, then asked them to do the exact same analysis of The Matrix, their answer would be drastically different, even if their entire degree never discussed The Matrix once. This is because that layperson is (or at least should be) capable of taking generalized concepts and applying them to specific scenarios; in other words, they can take the media analysis concepts they learned while earning that degree and apply them to a specific thing, even if those concepts were never explicitly applied to that thing. AI, as it currently stands, is incapable of this.

As another example, let's say a brand-new programming language came out tomorrow that was entirely unrelated to any existing language. AI would be nigh-useless at analyzing and helping produce new code for that language, even if it were dead simple to use and understand, until enough humans published code samples that could be fed into the AI's training data.