Joined 9 months ago
Cake day: December 18th, 2023



  • But it’s not “from each according to his ability”. FOSS is what people feel like contributing. And it’s not “to each according to their need”. It’s take it or leave it, unless someone feels like fulfilling requests.

    Traditionally, the slogan meant a duty to work. Contributing what you feel like is just charity.

    Capitalism, at its core, is private control of the capital. Copyright law turns code into intellectual property/capital. I’ve read the argument that copyleft requires strong copyrights. That argument implicitly makes copyleft a feature of capitalism. You know how rich people or corporations sometimes donate large sums to get their name on something, EG a hospital wing? That’s not so different from a FOSS license that requires attribution.


  • Text explaining why the neural network representation of common features (typically with weighted proportionality to their occurrence) does not meet the definition of a mathematical average. Does it not favor common response patterns?

    Hmm. I’m not really sure why anyone would write such a text. There is no “weighted proportionality” (or pathways). Is this a common conception?

    You don’t need it to be an average of the real world to be an average. I can calculate as many average values as I want from entirely fictional worlds. It’s still a type of model which favors what it sees often over what it sees rarely. That’s a form of probability embedded, corresponding to a form of average.

    I guess you picked up on the fact that transformers output a probability distribution. I don’t think anyone calls those an average, though you could have an average distribution. Come to think of it, before you use that to pick the next token, you usually mess with it a little to make it more or less “creative”. That’s certainly no longer an average.

    You can see a neural net as a kind of regression analysis. I don’t think I have ever heard someone calling that a kind of average, though. I’m also skeptical whether you can see a transformer as a regression, but I don’t know this stuff well enough. When you train on some data more often than on other data, that is not how you would do a regression. Certainly, once you start RLHF training, you have left regression territory for good.

    The GPTisms might be because they are overrepresented in the finetuning data. It might also be from the RLHF and/or brought out by the system prompt.
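
    To make the point about "messing with" the output distribution concrete, here is a minimal sketch of temperature scaling, one common way to do it. The scores in `logits` are made up for illustration and do not come from any real model; the function name `softmax` and its `temperature` parameter are just this example's choices.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution, rescaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens (assumed values).
logits = [2.0, 1.0, 0.1]

plain = softmax(logits)                   # distribution as the model produced it
sharp = softmax(logits, temperature=0.5)  # low temperature: top token dominates more
flat = softmax(logits, temperature=2.0)   # high temperature: rare tokens get more weight
```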


  • I accidentally clicked reply, sorry.

    B) you do know there’s a lot of different definitions of average, right?

    I don’t think that any definition applies to this. But I’m no expert on averages. In any case, the training data is not representative of the internet or anything. It’s also not training equally on all data and not only on such text. What you get out is not representative of anything.




  • Who exactly creates the image is not the only issue and maybe I gave it too much prominence. Another factor is that the use of copyrighted training data is still being negotiated/litigated in the US. It will help if they tread lightly.

    My opinion is that it has to be legal on first amendment grounds, or more generally freedom of expression. Fair use (a US thing) derives from the 1st amendment, though not exclusively. If AI services can’t be used for creating protected speech, like parody, then this severely limits what the average person can express.

    What worries me is that the major lawsuits involve Big Tech companies. They have an interest in far-reaching IP laws; just not quite far-reaching enough to cut off their R&D.



  • You’re allowed to use copyrighted works for lots of reasons. EG satire or parody, in which case you can legally publish it and make money.

    The problem is that this precise situation is not legally clear. Are you using the service to make the image or is the service making the image on your request?

    If the service is making the image and then sending it to you, then that may be a copyright violation.

    If the user is making the image while using the service as a tool, it may still be a problem. Whether this turns into a copyright violation depends a lot on what the user/creator does with the image. If they misuse it, the service might be sued for contributory infringement.

    Basically, they are playing it safe.


  • It’s all just weights and matrix multiplication and tokenization

    See, none of these is statistics, as such.

    Weights are maybe the closest, but they are supposed to represent the strength of a neural connection. This is originally inspired by neurobiology.

    Matrix multiplication is linear algebra and encountered in lots of contexts.

    Tokenization is a thing from NLP. It’s not what one would call a statistical method.

    So you can see where my advice comes from.

    Certainly there is nothing here that implies any kind of averaging going on.
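
    For anyone who wants to see the three ingredients from the quote side by side, here is a toy sketch. Everything in it is an assumption made for illustration: the three-word `vocab`, the `embeddings` table and the `weights` matrix are invented numbers, and real tokenizers work on subword pieces rather than whole words. It also leaves out everything else a real transformer does (attention, nonlinearities, normalization), so it only shows what the named operations are, not how a full model behaves.

```python
def tokenize(text, vocab):
    """Map whitespace-separated words to integer ids (toy tokenizer)."""
    return [vocab[word] for word in text.split()]

def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

# Hypothetical vocabulary and parameters (assumed values, not from a real model).
vocab = {"the": 0, "cat": 1, "sat": 2}
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one vector per token id
weights = [[0.5, -1.0], [2.0, 0.25]]               # one layer's weight matrix

ids = tokenize("the cat sat", vocab)                     # tokenization
hidden = [matvec(weights, embeddings[i]) for i in ids]   # weights applied by matrix multiplication
```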








  • That’s a result of your values. Your views on property are incompatible with equality.

    You made the assumption that I do not care if my writings are used for AI training but I actually do. I like it. I like knowing that I helped other people. I feel the same way about taxes, but this is better since it does not cost me anything.


    This may be too long but here’s a quick overview of what your views on property mean for small artists.

    Per Google, Getty Images’ archive is the largest privately-owned photographic archive in the world, containing over 130 million images dating back to the beginning of photography and beyond. Unsurprisingly, Getty is suing over AI.

    How many images does your small artist own? A few dozen? A few hundred?

    So when your small artist gets a few dollars, Getty gets many millions. Of course, they won’t be getting the same per image. Getty can pay lawyers millions to negotiate and there will still be many millions left in profit. Your small artist can’t do that. Even the negotiation would cost more than their images are worth. They can only upload their images to Adobe or Shutterstock and accept whatever they are given.

    Even the most selfless non-profit would have to take a big chunk just to handle the cost of running the website, dealing with copyright infringement, bad quality images, and “naughty” images, tracking payment information, handling the money,… But why should they be selfless? After all, the website is basically their property.

    Now we reach the point where it gets bad.

    Remember that the rent for these images does not create anything of value. No one is paid to make anything new. Money is transferred to property owners, because they own property. It ends up mainly with rich people, because they own so much property. Much of the money for “small artists” is wasted on bureaucracy. A good chunk also ends up with rich people, because middle men are unavoidable.

    Since we are mainly transferring and not creating wealth, it must come from somewhere. It comes from subscription fees for AI services. It can’t come from anywhere else, right?

    For example, a subscription for Photoshop has to include these fees. What Photoshop calls generative fill is genAI.

    Now riddle me this: Who pays subscriptions for Photoshop?


  • But just because I don’t have an answer to that doesn’t mean I have to agree with AI companies scraping every last corner of the internet for their datasets.

    You don’t have to agree. It’s a value judgement. What is important to you? There is no correct answer.

    My conviction is that property is mainly a means to an end. That end is human well-being, but if you pressed me on what exactly that means, I’d start flailing.

    You can believe that intellectual property is fundamentally important. Mind that what you think of as intellectual property is probably broader/different from copyright in law. You can say that enforcing this kind of property right is an end in itself, that justifies the terrible consequences. Small artists would get shafted one way or the other.