Date: 2023-01-15

What many of us had expected has finally happened: artists have sued a couple of AI companies, as well as an art repository site, for copyright infringement (complaint here). Is this the end of AI tools? I don’t think so, and I’ll try to explain why. This will not be a detailed look at the lawsuit, as there will be more time for that; it is my own take on some of the technical issues that I think the complaint gets wrong. It is not intended as an in-depth look at the law either, as I suspect this may not get to trial (more on that later). I’m also aware that this is at a very early stage and things may change. Most importantly, nobody can be sure of what the result will be. This is my own early speculation on the first filing as it stands, and I’ll update and write further blog posts as needed.

The claims

Three artists are starting a class-action lawsuit against Stability.ai, Midjourney, and DeviantArt alleging direct copyright infringement, vicarious copyright infringement, DMCA violations, publicity rights violation, and unfair competition. DeviantArt appears to be included as punishment for “betrayal of its artist community”, so I will mostly ignore their part in this analysis for now. Specifically with regard to the copyright claims, the lawsuit alleges that Stability.ai and Midjourney have scraped the Internet to copy billions of works without permission, including works belonging to the claimants. They allege that these works are then stored by the defendants, and that these copies are then used to produce derivative works.

This is at the very core of the lawsuit. The complaint is very clear that the resulting images produced by Stable Diffusion and Midjourney are not directly reproducing the works by the claimants; no evidence is presented of even a close reproduction of one of their works. What they are claiming is something quite extraordinary: “Every output image from the system is derived exclusively from the latent images, which are copies of copyrighted images. For these reasons, every hybrid image is necessarily a derivative work.” Let that sink in. Every output image is a derivative of every input, so following this logic, anyone included in the data scraping of five billion images can sue for copyright infringement. Heck, I have quite a few images in the training data, maybe I should join! But I digress.

The argument goes something like this: images are scraped from the Internet without permission; these images are then copied, compressed and stored by the defendants; and these copies are used as a “modern day collage tool” to put together images from the training data. The reasoning is that machines cannot reason like people, so they must just be putting existing material together, hence all images are derivatives of the works in the training data.

The technology

I think that the argument in the claim is flawed because it does not accurately represent the technology, so I will attempt to give a very quick explanation of how tools such as Stable Diffusion or Midjourney produce images. What follows uses some excerpts from my forthcoming article, so stay tuned for a lengthier explanation.

I like to classify what happens in AI generative tools in two stages, the input phase and the output phase. The input phase comprises the gathering of data to create a dataset, which is then used to train a model. In the case of Stable Diffusion, it uses a dataset called LAION, which has over 5 billion entries, each consisting of a hyperlink to a web image (not the image itself) paired with its ALT text description. This dataset is then used to train a model. I will not go into detail about models; suffice it to say that a model is a mathematical representation of a real-world process that is trained using a dataset, and that can be used to make predictions or decisions without being explicitly programmed to perform the task. There are various types of models, but Stable Diffusion and Midjourney both use diffusion models (see an explanation in a previous blog post). Long story short, diffusion models take an image, add noise to it, and then put it back together.
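To make the "add noise, then put it back together" idea concrete, here is a minimal sketch in plain Python. It is my own simplification, not code from Stable Diffusion or Midjourney: the four-value `image`, the function names and the `alpha_bar` value are all hypothetical. A real diffusion model trains a neural network to predict the noise; in this toy the true noise is simply handed back, which only shows the arithmetic of noising and denoising.

```python
import math
import random

def add_noise(pixels, alpha_bar, rng):
    """Blend the image with Gaussian noise.

    alpha_bar close to 1.0 keeps most of the image;
    alpha_bar close to 0.0 leaves mostly noise.
    """
    keep = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [keep * p + noise_scale * n for p, n in zip(pixels, noise)]
    return noisy, noise

def remove_noise(noisy, predicted_noise, alpha_bar):
    """Undo add_noise, given an estimate of the noise that was added."""
    keep = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [(x - noise_scale * n) / keep for x, n in zip(noisy, predicted_noise)]

rng = random.Random(0)
image = [0.1, 0.5, 0.9, 0.3]  # hypothetical four-pixel "image"

noisy, true_noise = add_noise(image, alpha_bar=0.5, rng=rng)

# Assumption: a perfect noise prediction. A trained model only approximates this.
restored = remove_noise(noisy, true_noise, alpha_bar=0.5)
```

What training adjusts is the noise predictor, so what the model ends up holding is a set of learned parameters for that prediction task.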

But what is the model from a practical perspective? It is a common misconception that a machine learning model is just a store of images that then generates a collage. The current lawsuit uses the word collage repeatedly, so it is perpetuating this myth. This is where another machine learning model comes in. Known as CLIP, it is designed to improve the performance of AI models on a wide range of tasks involving both language and images. The model is trained using a large dataset of images and their corresponding text descriptions, and it learns to understand the relationship between language and images. This allows it to perform tasks such as image captioning, image classification, and translation with high accuracy. So, AI tools use a combination of a diffusion model trained on reconstructing images, as well as CLIP models that understand words used to describe an image.

There is another very important element involved in generative models, and this is called latent space. In order to train a model with millions, and sometimes billions, of single data points, it would be inefficient to treat every data point in the same way, as there could be clusters of similar works. If we are thinking about images, you may not have to look at every single cat picture; it may suffice to cluster data that is similar. Imagine data as a room: you would put the cat pictures in the same space, the dog pictures in another space, etc. Latent space is the space of hidden or underlying factors that can explain the observed data by clustering similar data. It is used in generative models where the goal is to learn a representation of the data that can be used to generate new samples that are like the ones in the training set. This is very valuable because it helps to compress the inputs: there’s no need to copy all images of cats, as the model contains latent representations of cats.
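The room analogy can be illustrated with a toy sketch. This is only an illustration of the clustering idea as I describe it above, not how Stable Diffusion actually implements its latent space; the two-number "features", the labels and the `sample` helper are all hypothetical.

```python
import random

def centroid(points):
    """Average a list of equal-length feature vectors."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dims))

# Hypothetical 2-D feature vectors standing in for training images.
cats = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.8)]
dogs = [(5.0, 5.1), (4.8, 5.3), (5.2, 4.9)]

# The toy "model" keeps one summary point per concept, not the examples.
latent = {"cat": centroid(cats), "dog": centroid(dogs)}

def sample(label, rng, spread=0.1):
    """Generate a new point near the learned summary for a concept."""
    cx, cy = latent[label]
    return (cx + rng.gauss(0.0, spread), cy + rng.gauss(0.0, spread))

rng = random.Random(42)
new_cat = sample("cat", rng)
```

The toy model holds two summary points, however many cat and dog examples were used to compute them, and `new_cat` is a fresh point in the cat region.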

The output phase is the generation of the image using all of the above models, and it is done using apps that can take a text prompt and generate a new image based on a combination of statistical data, language models, and latent space.

In other words, this is not a collage.

Analysing the claims

As you can start seeing from the above description of the technology, there is a big issue with how things are described in the lawsuit, which clashes with how machine learning and diffusion models work in reality. The disparity is that there appears to be a big gap in understanding between the training of a model and how the model stores that knowledge. According to the complaint, Stability.ai takes the images in the training dataset and these are “stored at and incorporated into Stable Diffusion as compressed copies”. This is not what happens at all. A trained model does not have copies of the training data, as that would create an unwieldy behemoth of unfathomable size. What happens is the creation of clusters of representations of things, namely latent space.
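A back-of-the-envelope calculation shows the scale of the problem. Only the five billion figure comes from the dataset described above. The model size and the average image size below are my own illustrative assumptions, so treat the results as orders of magnitude rather than measurements.

```python
TRAINING_IMAGES = 5_000_000_000    # "over 5 billion entries", as described above
ASSUMED_MODEL_BYTES = 4 * 1024**3  # assumption: a model file of roughly 4 GiB
ASSUMED_IMAGE_BYTES = 100 * 1024   # assumption: 100 KiB per compressed image

# How much of the model there is per training image.
bytes_per_image = ASSUMED_MODEL_BYTES / TRAINING_IMAGES

# How much space stored copies of every image would need.
storage_for_copies = TRAINING_IMAGES * ASSUMED_IMAGE_BYTES

print(f"{bytes_per_image:.2f} bytes of model per training image")
print(f"{storage_for_copies / 1024**4:.0f} TiB to keep a copy of every image")
```

Under these assumptions the model has less than one byte available per training image, while keeping even modestly compressed copies would take hundreds of tebibytes.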

What is likely to happen during the trial, if it gets to that, is that there will be expert testimony, and this claim is likely to fall easily. Sure, there is some temporary copying at some stage: images are scraped during the training process (although it is important to remember that LAION itself doesn’t copy images either). But these images are not stored in the model as claimed.

This will be a vital point, because as mentioned, the complaint doesn’t make any claims that the outputs are reproductions of any of the training images belonging to the claimants.

The other problematic issue in the complaint is the claim that all resulting images are necessarily derivatives of the five billion images used to train the model. I’m not sure that I like the implications of such a level of dilution of liability. This is like homeopathy copyright: any trace of a work in the training data will result in a liable derivative. That way madness lies.

Other legal considerations

Perhaps the biggest surprise in the complaint is who is missing as a defendant, in particular two very conspicuous names: LAION and OpenAI. I think that LAION is easier to explain: it is a German research organisation, and what they do is collect hyperlinks and text descriptions. This, I believe, falls under the text and data mining exception contained in the EU’s DSM Directive. OpenAI’s absence is more difficult to explain. I think that the main reason is that OpenAI does not disclose which dataset they are using, so it is not very easy for claimants to prove that their works have been used in the training data. As this lawsuit is entirely based on the input phase, this missing information is vital.

The other question is whether the lawsuit will be successful, and the honest answer is that I do not know. I am not impressed by the technical errors described above, and I think that this will be an important part of the defence. The defendants are likely to claim fair use, and this case has the potential to be the test case for whether training an AI without permission can meet the fair use requirements. We do not know, but I find that this lawsuit could be a risky gamble for artists. A defeat would finally settle the question that has been left open since Google Books, and I do not think that this is the strongest case, at least as it stands.

Concluding

This is the chronicle of a lawsuit foretold. Now that it has arrived, it will be analysed endlessly and pored over during the next weeks, and I’m looking forward to reading what others think. Perhaps my skepticism will prove to be misplaced, we will see. My first impression is that this has “out of court settlement” written all over it, but if there is no compromise then this suit could last for years, as any result will likely be appealed.

Paraphrasing Yoda, begun, this AI War has.