• 0 Posts
  • 23 Comments
Joined 1 year ago
Cake day: July 1st, 2023








  • Critical to understanding whether this applies is understanding “use” in the first place. I would argue it’s even more important, because it’s a threshold question that determines whether you even need to read 107.

    17 U.S. Code § 106 - Exclusive rights in copyrighted works Subject to sections 107 through 122, the owner of copyright under this title has the exclusive rights to do and to authorize any of the following: (1)to reproduce the copyrighted work in copies or phonorecords; (2)to prepare derivative works based upon the copyrighted work; (3)to distribute copies or phonorecords of the copyrighted work to the public by sale or other transfer of ownership, or by rental, lease, or lending; (4)in the case of literary, musical, dramatic, and choreographic works, pantomimes, and motion pictures and other audiovisual works, to perform the copyrighted work publicly; (5)in the case of literary, musical, dramatic, and choreographic works, pantomimes, and pictorial, graphic, or sculptural works, including the individual images of a motion picture or other audiovisual work, to display the copyrighted work publicly; and (6)in the case of sound recordings, to perform the copyrighted work publicly by means of a digital audio transmission.

    Copyright protects just what it sounds like: the right to “copy” or reproduce a work, along the lines of the examples given above. It is not clear that use in training AI falls into any of these categories. The question mainly relates to items 1 and 2.

    If you read through the court filings against OpenAI and Stability AI, much of the argument is based around trying to make a claim under case 1. If you put a model into an output loop you can get it to reproduce small sections of training data that include passages from copyrighted works, although of course nowhere near the full corpus can be retrieved because the model doesn’t contain anything close to a full data set. The models are much too small, and that’s also not how the transformer architecture works. But in some cases, models can preserve and output brief sections of text or distorted images that appear highly similar to at least portions of training data. Even so, it’s not clear that this output amounts to infringement under copyright law, because these are small snippets that are not substitutes for the original work and don’t affect the market for it.

    Case 2 would be relevant if an LLM were classified as a derivative work. But LLMs are not derivative works under the conventional definition, which covers things like translated or abridged versions, or different musical arrangements in the case of music.

    For these reasons, it is extremely unclear whether copyright protections are even invoked, because the nature of the use in model training does not clearly fall under any of the enumerated rights. This is not the first time this has happened, either - the DMCA of 1998 amended the Copyright Act of 1976 to add cases relating to online music distribution as the previous copyright definitions did not clearly address online filesharing.

    There are a lot of strong opinions about the ethics of training models, and many people firmly believe that it either should or shouldn’t be allowed. But the legal question is much hazier, because AI model training was not contemplated even in the DMCA. I’m watching these cases with interest because I don’t think the law is at all settled here. My personal view is that an act of Congress would be necessary to establish whether use of copyrighted works in training data, even for purposes of developing a commercial product, should be one of the enumerated protections of copyright. Under current law, I’m not certain that it is.




  • The issue is that the values of the parameters don’t correspond to traditional variables. Concepts in AI are not represented with discrete variables and quantities. A concept may be represented in a distributed way across thousands or millions of neurons. You can look at each individual parameter and say, oh, this weight is 0.7142, and that weight is 0.2193, etc., across all the billions of parameters in your model, but you’re not going to be able to connect a concept from the output back to the behavior of those individual parameters because they only work in aggregate.

    You can only know that an AI system knows a concept based on its behavior and output, not from individual neurons. And AI systems are quite like humans in that regard. If your professor wants to know if you understand calculus, or if the DMV wants to know if you can safely drive a car, they give you a test: can you perform the desired output behavior (a correct answer, a safe drive) when prompted? Understanding how an idea is represented across billions of parameters in an AI system is no more feasible than your professor trying to confirm you understand calculus by scanning your brain to find the exact neuronal connections that represent that knowledge.
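
    To make the “only work in aggregate” point concrete, here is a toy sketch in Python with NumPy. It is not how any real model stores concepts. The layer width, the noise model, and the `concept` direction are all made-up assumptions for illustration. It builds hidden states where a concept is a small push spread across every unit, then compares reading the single most informative unit against reading all units together.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units = 4096      # hypothetical hidden-layer width
n_samples = 200     # examples per class
strength = 12.0     # how far the concept shifts a state along its direction

# Hypothetical "concept direction": a unit vector with a little weight on every unit.
concept = rng.standard_normal(n_units)
concept /= np.linalg.norm(concept)

# Toy hidden states: unit-variance noise, plus a shift along the
# concept direction when the concept is present.
without_c = rng.standard_normal((n_samples, n_units))
with_c = rng.standard_normal((n_samples, n_units)) + strength * concept


def accuracy(pos_scores, neg_scores, threshold):
    """Fraction classified correctly by 'score > threshold means concept present'."""
    correct = np.sum(pos_scores > threshold) + np.sum(neg_scores <= threshold)
    return correct / (len(pos_scores) + len(neg_scores))


# Best case for the single-unit view: the unit carrying the largest share of the concept.
best = int(np.argmax(concept))
single_shift = strength * concept[best]
single_acc = accuracy(with_c[:, best], without_c[:, best], single_shift / 2)

# Aggregate view: project every state onto the concept direction.
aggregate_acc = accuracy(with_c @ concept, without_c @ concept, strength / 2)

print(f"single best unit:   {single_acc:.2f}")
print(f"all units together: {aggregate_acc:.2f}")
```

    In this toy setup even the best individual unit is only modestly better than a coin flip, while the combined readout separates the two cases almost perfectly. Real networks are far messier, and here the direction is known in advance, which is exactly what you don’t get with a trained model.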


  • Yeah, exactly. They’re reporting findings. Saying that it worked in 100% of the cases they tested is not making a claim that it will work in 100% of all cases ever. But if they had 30 images and it classified all 30 images correctly, then that’s 100%.

    The article headline is what’s misleading. First, it’s poorly written - “AI-screened eye PICS DIAGNOSE childhood autism.” The pics do not diagnose the autism, so the subject of the verb is wrong. But even if it were rephrased, stating that the AI system diagnoses autism itself is a stretch. The AI system correctly identified individuals previously diagnosed with autism based on eye pictures.

    This is an interesting but limited finding that suggests AI systems may be capable of serving as one diagnostic tool for autism, based on one experiment in which they performed well. Anything more than that is overstating the findings of the study.
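
    The gap between “100% of the cases tested” and “100% of all cases” can be put in rough numbers. A minimal sketch in Python, using the hypothetical 30 images from above (not the study’s actual sample size) and assuming the test images are independent draws: if the true accuracy is p, the chance of getting all n right is p^n, so the lowest p that is still consistent with a perfect run at a given confidence level is alpha^(1/n).

```python
def lower_bound_all_correct(n, confidence=0.95):
    """One-sided lower bound on true accuracy after n correct out of n.

    Solves p**n = alpha for p, where alpha = 1 - confidence. This is the
    exact binomial (Clopper-Pearson) bound for the zero-failure case.
    """
    alpha = 1.0 - confidence
    return alpha ** (1.0 / n)


n_images = 30                      # hypothetical sample size from the comment
observed = n_images / n_images     # 30 of 30 correct
bound = lower_bound_all_correct(n_images)

print(f"observed accuracy: {observed:.0%}")
print(f"95% lower bound on true accuracy: {bound:.3f}")
```

    With 30 out of 30, the bound comes out at roughly 0.905, so a perfect score on a sample that size is still compatible with a system that is wrong about one time in ten.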






  • It wasn’t worth $44 billion when he bought it. That’s why he tried desperately to back out. The reason the company is in such a dire financial situation is specifically because it was bought at that price and now pays debt service far disproportionate to its actual worth.

    You’re also confusing company valuation with operating revenue. $44 billion isn’t how much cash they have on hand, and $75 million doesn’t get subtracted from it, so expressing one as a percentage of the other makes no sense.

    Twitter’s ad revenue is already down more than 50% since the takeover and this is $75 million more of lost revenue on top of that. The company was maybe on a path to profitability at full advertising revenue and without the debt service, but now it is burning cash even as revenues tank.