The Great Wave off Kanagawa, painted by Rembrandt.

As AI creative tools are becoming widespread, the question of copyright of AI creations has also taken centre-stage. But while copyright nerds obsess over the authorship question, the issue that is getting more attention from artists is that of copyright infringement.

AI is trained on data. In the case of image tools such as Imagen, Stable Diffusion, DALL·E, and Midjourney, the training sets consist of terabytes of images: photographs, paintings, drawings, logos, and anything else with a graphical representation. The complaint from some artists is that these models (and the accompanying commercialisation) are being built on the backs of human artists, photographers, and designers, who are not seeing any benefit from these business models. The language gets very animated in some forums and chat rooms, often using terms such as “theft” and “exploitation”. So is this copyright infringement? Are OpenAI and Google about to get sued by artists and photographers from around the world?

This is a question that has two parts: the input phase and the output phase.

Inputs

The explosion in the sophistication of AI tools has come about because of two important developments: first, the improvement and variety of training models, but most importantly, the availability of large training datasets. The first source of works is open access and public domain material: sources licensed under permissive licences such as Creative Commons (example here), or works that are in the public domain (example here). But of course the amount of such data is limited, so researchers also have access to many other datasets, some of them free (lists here and here).

But researchers may also want to scrape images from the largest image repository in the world: the Internet. Can they do that? There is growing recognition that mining data (in this case in the shape of images) is allowed under copyright as fair use or fair dealing. The earliest source of an exception for training an AI can be found in the United States in the shape of the Google Books case, a long-running dispute between the Authors Guild and Google over the scanning of books for a service called Google Print (later renamed Google Book Search). After a lengthy battle involving settlements and appeals, the court decided that Google’s scanning was fair use. The transformative nature of the scanning played a big part in the decision, as did the fact that the copying would not affect the market for online book sales: the purpose of the Google database was to make the works available to libraries and to provide snippets in search results.

While Google Books does not deal specifically with machine learning, it is similar in many ways to what happens in most machine learning training: large amounts of works are copied in order to produce something different.

In the EU, the Digital Single Market Directive has also opened the door to wider adoption of text and data mining. In Art 3 the Directive sets out a new copyright exception for “reproductions and extractions made by research organisations and cultural heritage institutions in order to carry out, for the purposes of scientific research, text and data mining of works or other subject matter to which they have lawful access.” Art 4 extends this permission to commercial organisations for any purpose, as long as they have lawful access to the work and the rightsholders have not reserved their rights out of this exception.

The end result of the above is that a large number of commercial entities operating in both the US and Europe are able to scrape images from the Internet for the purpose of data mining, and they can make reproductions and extractions of such materials. Furthermore, other countries such as the UK and Japan have similar exceptions.

Between open data, public domain images, and the data mining exceptions, we can assume that the vast majority of training for machine learning is lawful. While it is possible to imagine some data being gathered and used unlawfully, I cannot imagine that the biggest organisations involved in AI are infringing the law in this respect.

Edit: It’s come to my attention that an important detail is also lost in this debate. Most image datasets do not copy or store images; for example, LAION-5B, the largest image and text dataset in the world, consists of links. And linking does not infringe copyright.
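To illustrate what a link-based dataset means in practice, here is a minimal sketch; the file name and column names are assumptions for illustration, not LAION’s exact schema. The dataset itself holds only URLs and captions, and a copy of any image exists only transiently on the machine of whoever downloads it for training.

```python
# Illustrative sketch: a link-based image dataset stores (URL, caption) pairs,
# not the images themselves; pictures are only fetched when someone downloads them.
# The file name and column names below are hypothetical.
import pandas as pd
import requests
from io import BytesIO
from PIL import Image

rows = pd.read_parquet("laion_subset.parquet")    # hypothetical local shard of links

for url, caption in zip(rows["url"], rows["caption"]):
    try:
        response = requests.get(url, timeout=10)  # the image lives on a third-party server
        image = Image.open(BytesIO(response.content)).convert("RGB")
    except Exception:
        continue                                  # dead or blocked links are skipped
    # only at this point does a temporary copy exist, on the downloader's machine
    print(caption, image.size)
```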

Outputs

Assuming that most of the inputs that go into training an AI are lawful, what about the outputs? Could a work generated by an AI trained on existing works infringe copyright?

This is trickier to answer, and it may very well depend on what happens during and after the training and on how the outputs are generated, so we have to look in more detail under the hood at machine learning methods. A big warning first: I’m obviously no ML expert, and while I have been reading the basic literature for a few years now, my understanding is that of a hobbyist. If I misrepresent the technology it is my own fault, and I will be delighted to correct any mistakes. I will, of course, be over-simplifying some things.

The main idea behind creative AI is to train a system so that it can generate outputs that statistically resemble its training data. In other words, to generate poetry you train the AI on poetry; if you want it to generate faces, you train it on faces. There are various models for generative AI, but the two main ones are generative adversarial networks (GANs) and diffusion models.

A GAN is a model that sets two agents against each other (hence the “adversarial”) in order to generate better outputs. There is a generator, which produces outputs based on a training dataset, and there is a discriminator, which compares the generated output against the training data to judge whether it resembles it; output that does not pass muster is discarded in favour of output that does resemble the training data.
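To make the adversarial setup concrete, here is a minimal training-loop sketch in PyTorch. The network sizes, the random stand-in “real” data, and the hyperparameters are placeholder assumptions for illustration, not how any of the commercial tools are built.

```python
# Minimal GAN sketch (PyTorch): a generator tries to fool a discriminator,
# while the discriminator learns to tell generated samples from training data.
# Shapes, data, and hyperparameters are toy placeholders.
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real_batch = torch.randn(32, data_dim)            # stand-in for a batch of real images

for step in range(1000):
    # Discriminator: label real data 1 and generated data 0
    fake = G(torch.randn(32, latent_dim)).detach()
    d_loss = bce(D(real_batch), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make the discriminator output 1 for its fakes
    fake = G(torch.randn(32, latent_dim))
    g_loss = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

The detail worth noting is that the generator never copies a training image directly; it only receives a signal telling it how convincingly its output fooled the discriminator.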

For a relatively long time GANs were the kings of machine learning, as they managed to produce some passable output (see all of these cats that don’t exist). But GANs have limitations: the discriminator can be too good, so that no output makes the grade, or the generator can learn to produce only a limited type of output that gets past the discriminator.

The most successful recent examples of AI, such as Imagen, DALL·E 2, Stable Diffusion, and Midjourney, use the diffusion model, which reportedly produces superior results. Diffusion works by taking an input, for example an image, and corrupting it by adding noise; training consists of teaching a neural network to put the image back together by reversing the corruption process.
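For the technically curious, here is a toy sketch of that corruption-and-reversal idea in PyTorch. The noise schedule, the tiny network, and the random stand-in “images” are illustrative assumptions, not the architecture of any of the tools named above.

```python
# Minimal diffusion sketch (PyTorch): corrupt training images with noise,
# then train a network to predict (and therefore undo) that noise.
# The schedule, model, and data are toy placeholders.
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)             # how much noise each timestep adds
alphas_bar = torch.cumprod(1.0 - betas, dim=0)    # cumulative signal kept at step t

denoiser = nn.Sequential(nn.Linear(64 + 1, 256), nn.ReLU(), nn.Linear(256, 64))
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

images = torch.randn(32, 64)                      # stand-in for flattened training images

for step in range(1000):
    t = torch.randint(0, T, (32,))
    noise = torch.randn_like(images)
    a = alphas_bar[t].unsqueeze(1)
    noisy = a.sqrt() * images + (1 - a).sqrt() * noise        # forward corruption

    # the network sees the noisy image plus the timestep and predicts the noise
    pred = denoiser(torch.cat([noisy, t.float().unsqueeze(1) / T], dim=1))
    loss = ((pred - noise) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

Generation then runs the process in reverse: start from pure noise and repeatedly remove the noise the network predicts, which is why the output is a reconstruction rather than a stored copy.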

The most important takeaway from the perspective of a legal analysis is that a generative AI does not reproduce the inputs exactly, even if you ask for a specific one. For example, I asked Midjourney to generate “Starry Night by Vincent Van Gogh”. The result was this:

It looks like it, but it’s not exactly the same thing. It’s almost as if the AI is drawing it from memory, which in some way it is: it’s reconstructing what Starry Night looks like, and it does a relatively good job of it because it has seen it many times.

Moreover, the developers of these tools are aware of the potential pitfalls of producing exact replicas of the art in their training datasets. OpenAI admitted that this was a problem in some earlier iterations of the program, and they now filter out specific instances of it happening. According to OpenAI, this mostly took place with low-quality images, which are easier for the neural network to memorise, and with images that were heavily repeated in the datasets. They mitigated this by training the system to recognise duplicates, and DALL·E no longer regurgitates images.
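OpenAI has not published every detail of its filters, so purely to illustrate the idea of catching near-duplicates, here is a sketch using a simple perceptual “average hash”. The file names are hypothetical, and real pipelines rely on learned image embeddings and clustering rather than anything this crude.

```python
# Illustrative near-duplicate detection with a tiny perceptual "average hash".
# This is not OpenAI's pipeline; it only shows the general idea of flagging
# images in a dataset that are almost identical to each other.
from PIL import Image

def average_hash(path, size=8):
    """Shrink to size x size greyscale; each bit = pixel brighter than the mean."""
    pixels = list(Image.open(path).convert("L").resize((size, size)).getdata())
    mean = sum(pixels) / len(pixels)
    return tuple(p > mean for p in pixels)

def hamming(h1, h2):
    return sum(a != b for a, b in zip(h1, h2))

# hypothetical file names; flag pairs whose hashes differ in only a few bits
paths = ["img_001.jpg", "img_002.jpg", "img_003.jpg"]
hashes = {p: average_hash(p) for p in paths}
for i, p in enumerate(paths):
    for q in paths[i + 1:]:
        if hamming(hashes[p], hashes[q]) <= 5:
            print(f"{p} and {q} look like near-duplicates")
```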

So, if there is no direct infringement, and the systems are not reproducing works in their entirety, is there still a possibility of copyright infringement? Most people have been writing prompts naming artists who are long dead and whose works are in the public domain, so the AIs will easily produce works in the style of Van Gogh, Rembrandt, Henri Rousseau, Gauguin, Matisse, etc. Just put the name of the artist in your prompt, and even the specific artwork that you want reproduced, and the AI will do it. But these works are in the public domain, so nobody cares. What about artists who are still alive, and whose works are under copyright?

Here things get trickier. It is clear that one can produce art in the style of a living artist. [Edit note: tweaked this section a couple of times, now I’m happy with the balance]. So, for example, you can go to any of the tools and type the name of an artist, living or recently deceased, whose works may still be protected by copyright. Sometimes this will work, but most often it will not. The issue is that not all artists have a recognisable style, but it could also be that the artist is not represented enough in the datasets. Typing an artist’s name with “city landscape” produced this picture.

City landscape. Can you guess the artist used in the prompt?

The problem is that style and a “look and feel” are not copyrightable. Sure, an image could be inspired by an author, and you could recognise a style, but it would be a stretch to say that it infringes copyright. One of the challenges for living authors, and for others whose work may still be under copyright (Warhol and Basquiat come to mind), is that we don’t know whether the AIs have been trained on their own artwork or on the army of human imitators that are all over the web. There’s a reason why the AIs are so good at replicating Van Gogh’s style. Evidence of this is that if you go to any digital art repository and search for a living artist, you will find hundreds of images from human artists referencing living authors’ works (see for example Behance and ArtStation).

Copyright protects the expression of an idea, not the idea itself (the famous idea/expression dichotomy). In my opinion it will be difficult for an artist to sue successfully for copyright infringement, as their style is not protected and, as mentioned above, it is unlikely that an AI tool will reproduce a work verbatim (can you use “verbatim” for images? I digress).

The best case against an AI tool may be when it reproduces a well-known character, say Darth Vader, Mario, or Pikachu, or a picture of Groot and Baby Yoda. But while I could easily see this as potential infringement of an existing character, it is unlikely to be pursued by the copyright owner unless there is a good reason to do so. It is unlikely that a person or a company would make these things available commercially at a scale that warrants copyright infringement action, and in that sense it would be no different from all of the infringement that already exists on the Internet made by humans. There are loads of examples of low-level infringing work online.

Concluding

This blog post is just scratching the surface of the conflicts to come with regard to AI and copyright. I am sure that at some point an artist will try to sue one of the companies working in this area for copyright infringement. Assuming that the input phase is fine and the datasets used are legitimate, most infringement lawsuits will end up taking place at the output phase. And it is here that I do not think there will be substantial reproduction to warrant a finding of copyright infringement; on the contrary, the technology itself is designed to try to prevent such direct infringement from happening.

So what we will see is people trying to argue about styles, and here a decision may rest entirely on the specifics of the case. I am not convinced that a court would find infringement, but it’s still early days.

In the meantime, I leave you with a picture of llamas in the style of Klimt’s The Kiss.


35 Comments

andrewducker · August 15, 2022 at 3:30 pm

Thank you, this got me 75 internet points 🙂

https://news.ycombinator.com/item?id=32436203

    Anonymous · August 16, 2022 at 5:02 am

    Glad to be useful.

Andy Baker · August 17, 2022 at 8:54 am

Marvin Gaye estate vs Robin Thicke and Pharrell Williams over Blurred Lines showed how surprising judgements in copyright cases can be. Could something similar happen with regard to AI art?

    Andres Guadamuz · August 17, 2022 at 4:05 pm

    I think that Marvin Gaye is a great example, but more of an outlier (and music copyright itself often relies on the strength of the expert analysis). I think that the Ed Sheeran case is more likely to be the norm.

John M · August 17, 2022 at 10:12 am

Very interesting. Copyright is a very well litigated and clear area. Adaptations from a work should credit the original works as a courtesy if nothing else, imho.

    Andres Guadamuz · August 20, 2022 at 7:24 pm

    Absolutely agreed.

Mike · August 20, 2022 at 10:49 am

The author wrote regarding recreation of existing copyright characters, “It is unlikely that a person or a company would make these things available commercially.”

You really think people won’t put the recreated Groots, Calvin & Hobbes, and Baby Yodas on T-shirts, mugs, and everything else to sell?

You don’t think that there will be a flood of products from overseas where copyright is difficult to enforce?

That seems very optimistic. Are you really certain your view isn’t biased towards something that falls within your perceived interests?

    Andres Guadamuz · August 20, 2022 at 7:38 pm

    Hi Mike,

    Good catch, I guess that should read “to the commercial scale that it’s worthy of copyright infringement action”, but that would destroy the flow of the sentence. There’s so much low-level infringement that never gets litigated because it’s just not worth it. I mean, go to any art website: they’re filled to the rafters with fan art and derivative art, some of it even commercial. Infringement is everywhere online, even low-level commercial infringement. As for flooding markets, it’s no different from knock-offs: it’s not a market worthy of widespread enforcement, but it still gets enforced at the border, with seizures of products all the time. And if these things were flooding in, they’d get seized immediately by Disney, etc.

    I’m curious as to what you think are my “perceived interests”?

      Dave · November 5, 2022 at 11:58 am

      It might be true that large companies would never take legal action against the millions of copyright infringements that aren’t worth pursuing legally. However, many freelance and small artists were already being taken advantage of on a daily basis. When Disney or other large or foreign companies infringe smaller artists’ work, there is little to nothing they can do about it, and that’s with the law on the artists’ side. It has now just become way easier to copy; it’s just a couple of prompts away. In fact, before AI the main resistance to copying came from hired artists themselves: they tend to hate copying art or trying to emulate someone else; it’s morally wrong.

      It’s not the cheap knockoffs we have to worry about, it’s the corporations that begin to stop hiring artists and start hiring people who prompt AI, which is a low-skill job versus a highly skilled one.

      Artists in droves are having to look for other income sources because large companies are going to hire them less and less. Their futures are uncertain, and many are in student debt. Why would you hire a highly skilled person for more money when you can hire a low-skilled worker who can ‘utilise’ the work of the skilled artists before them? And they’ll be so removed from working off skilled artists’ backs that there will be no resistance or moral duty, as a fellow artist, to not just type in some other artist’s name and almost perfectly copy their style, capable of acting just like a ‘deep fake’.

      The commercial implications for hiring artists are massive. One thing for certain is that executives and directors don’t actually have a clue what really works and are happy with mediocrity most of the time. It’s an artist’s job to interpret, communicate, and direct art, and those skills only come from the knowledge and practice of being an artist. Yet that might become a job that’s no longer needed, and the educational system for art might end up in shambles because all of a sudden the commercial hiring pools are too small and funding will diminish.

      And it has come to the point that copyright laws need to be updated. They were there to protect all artists in wording, but not really: the only reason they were created was to protect large companies’ IPs and profits. It never came down to the morality of using someone’s work to make a profit without the artist receiving anything in return. It’s about time people stand up for themselves and change copyright laws to actually protect people and not companies, which now have no reason to update them because they get a win-win: they get art they can use to make money and save thousands by not hiring artists, and they can copy artists’ work with almost no effort. There will also come a point where there is little to no way for other people to know whether a work is AI-generated, and hence not covered by the company’s copyright in a way that would stop others from using it.

      Also, I might add that although AI is not an entity of its own, there should be stricter ethics on what you can use in a training dataset. The company should be held accountable, not the AI, if it infringes copyright. If you hit someone with a car after rolling it off a cliff, it’s not the car’s fault, nor is it gravity’s; it’s on the person who pushed it off or neglected it. AI is the creation of code owned by the company, and they’re trying to bury artists by saying “You’re worthless now because creativity and art can now be done with AI. Now get over it, and get a ‘real’ job”, all the while working off their backs with AI that wouldn’t exist without their hard, skilled work.

        Alan Acevedo · January 18, 2023 at 5:40 pm

        Student loans aren’t disbursed with a moral compass, last I checked. While the defense of anyone’s way of life is appropriately empathetic, society isn’t exactly banning calculators to save mathematicians’ jobs. Are we to demonize automobiles for the horses’ sake? Replacing tedious and costly processes with efficiency has long been the marching tune of technology. Smart artists take advantage of the tools put in front of them. Let’s not be fooled by old men yelling at clouds.

        If an artist’s work is sufficiently novel or popular as to warrant a premium, I’m all for paying what’s fair. How else do theaters continue production in an age of 8K television? To argue that AI is going to take advantage of artists is either willingly ignorant of how the process works or devoutly indignant.

        In contrast, it is clear that intentional reproductions of copyrighted works are a no-no (and removal of this ability through AI prompts should be the norm). But which artist can claim ownership of the angle at which a brush meets canvas, or of the color wheel? If art is truly a person’s way of life, AI would ostensibly not affect them as much as everyone with a drawing on the family fridge would have us believe. Who’s more susceptible to being forgotten: my ten-year-old nephew who just learned to paint oil on canvas, or Bob Ross’s estate? Artists claiming their jobs are being taken are putting themselves on quite the pedestal, in my opinion.

        In addition, let’s not pretend that the copyright system is somehow broken or inconsistent. Muppet Babies is a clear example of large corporations being held just as accountable as the little guy. This comes with the balance of the fair use doctrine, which Google Book Search has shown opens the door for works from AI being commercially viable. With precedent and copyright law in mind, the argument over whether AI-produced art is legal or producible for profit is already a closed case in my opinion.

        amurra · February 20, 2023 at 10:47 pm

        The analogy of calculators, cars, steam engines, and other innovations that changed social conditions is often presented in the current discussion. The problem is that the analogy is not entirely valid. Calculators did not replace mathematicians, and cars and steam engines did not replace engineers. They only made certain activities of their users easier and cheaper, such as calculating, commuting, and powering mills. By doing this, they gave a competitive advantage to some people, which changed societies. The invention of calculators did not put accountants at risk of losing their jobs because uneducated and inexperienced individuals who could afford the miraculous device started to become better accountants. Calculators only sped up the process of calculation (by the way, the abacus was not much slower) and made the accountants who used them more competitive. The situation was a bit different with cars. A car was (and still is) an expensive thing, so if someone was earning a living by riding a horse, they could be outcompeted by a car owner at some point. One could sell the horse and try to buy a car if needed. What is important to note is that all inventions so far aimed to make life easier (even if it was an electric chair that made the life of the executioner easier).

        The goal of AI is the same – to make life easier. Instead of talent, hard work, and education, it is enough to learn how to prompt AI properly. After a few hours of work, one can be better than an artist with a few years of experience. Because of this, the market for creative jobs will undergo upheaval, and I do not mean AI-generated images only or even mainly. The problem that I see is that an artist’s individual style is the essence of their work, and it can be copied to a high degree by AI. Consequently, the artist will be less competitive, an art career will be less desirable, fewer people will pursue this career path, and the overall level of creativity in society will decrease.

        The solution to this problem would be to change legislation to protect artists’ unique styles and to forbid using their artworks to train AI without their permission. If this solution were possible, it would be a good tradeoff between protecting human creativity and the development of AI.

Futureseek Daily Link Review; 15 August 2022 | Futureseek Link Digest · August 15, 2022 at 10:19 pm

[…] Copyright infringement in artificial intelligence art >> * Scaling Kubernetes to Thousands of CRDs >> * The Aerocon Wingship: 7 stunning images of […]

Why AI isn’t changing health care faster- POLITICO – Monkey Viral · August 16, 2022 at 12:01 pm

[…] scholar at the University of Sussex who edits a scholarly journal on intellectual property rights, tackled the question in a recent blog post — just in time for the announcement by an OpenAI competitor that it would […]

Copyright Infringement In Artificial Intelligence Art - AI Summary · August 18, 2022 at 4:30 am

[…] Read the complete article at: https://www.technollama.co.uk […]

Best of the blogs - Legal Cheek · August 19, 2022 at 8:46 am

[…] Copyright infringement in artificial intelligence art [TechnoLlama] […]

Tech roundup 155: a journal published by a bot - Javi López G. · August 20, 2022 at 5:21 pm

[…] Copyright infringement in artificial intelligence art […]

Künstliche Intelligenz für die Architekturkommunikation? · August 29, 2022 at 11:26 am

[…] Nevertheless, there are likely some legal grey areas: for example, when illegal content is generated via the detour of AI software, or when the style of a particular (contemporary) artist is imitated, etc. Such topics are already being hotly debated in professional circles; see e.g. here and here. […]

Algoritmen kunnen nu elke artiest nabootsen. Sommige artiesten haten het - CNTech News · August 29, 2022 at 3:06 pm

[…] has criticised art and could be expected to object. In a blog post after the incident, Guadamuz argues that lawsuits claiming infringement have little chance of […]

Știri #62 - BreakingPoint · September 2, 2022 at 2:32 pm

[…] Copyright infringement in artificial intelligence art – TechnoLlama […]

AI Art Generation Notes from MoniGarr – MONIGARR · September 10, 2022 at 4:22 pm

[…] Copyright infringement in AI Art […]

Using AI Artwork to Avoid Copyright Infringement | Copyright Lately · October 24, 2022 at 9:30 am

[…] viable option if the AI-generated art doesn’t itself infringe a preexisting copyrighted work. As TechnoLlama author Dr. Andrés Guadamuz has explained, the infringement issue needs to be examined at both the input phase and output phase of the […]

The Uffizi follows up Pornhub success by suing Jean Paul Gaultier - Legal Cheek · November 7, 2022 at 9:22 am

[…] the artist’s recognisable traits matter? Looking at the IP academic Andres Guadamuz’s images generated on MidJourney from the command ‘Starry Night by Vincent Van Gogh’ may allow you to better visualise […]

The Uffizi follows up Pornhub success by suing Jean Paul Gaultier - Bath Beacon · November 7, 2022 at 11:36 am

[…] tone, or the artist’s recognisable traits matter? Looking at the IP academic Andres Guadamuz’s images generated on MidJourney from the command ‘Starry Night by Vincent Van Gogh’ may allow you to better visualise the […]

Using AI Artwork To Avoid Copyright Infringement – mondaq.com – Finahost Online Solutions · November 25, 2022 at 4:54 am

[…] option if the resulting artwork doesn't itself infringe a preexisting copyrighted work. As  TechnoLlama author Dr. Andrés Guadamuz has explained, the infringement issue needs to be examined at both the input phase and output phase of the […]

GUEST POST: SAVAN DHAMELIYA AI-GENERATED ART AND COPYRIGHT INFRINGEMENT | IPRMENTLAW · April 16, 2023 at 3:21 pm

[…] [3] Andres Guadamuz, Copyright infringement in artificial intelligence art, TechnoLlama (January 15, 2023) https://www.technollama.co.uk/copyright-infringement-in-artificial-intelligence-art […]

Uma nova primavera para o direito Autoral das IAs – BaixaCultura · April 24, 2023 at 2:08 pm

[…] of how the training of an AI application is done (see here an excellent explanation of the subject), the core not only of these two lawsuits, but of how AI applications work as a whole, is […]

Alles nur geklaut? - medienMITTWEIDA · April 28, 2023 at 1:01 pm

[…] among others Andres Guadamuz, who teaches intellectual property law at the University of Sussex. In his blog he compares the training of models with a case involving Google Books. In 2002 Google scanned […]

AI Tarot Decks: The Good, Bad and the Ugly – A Passion For Tarot · May 30, 2023 at 2:54 am

[…] be considered copyright protected. An excellent article you can read on Copyright infringement here explains the how and the what of the problem, that being: how much of an original artwork can be […]

“Uso Justo”: Treinando IA Generativas - CC Brasil · June 6, 2023 at 9:32 pm

[…] the facts of the case. Andrés Guadamuz has two excellent posts on his blog that explain the technology involved in this case and that begin to explain why this should constitute a use […]

C4C’s Perspective on the EU AI Act: Copyright in Real Life is Messy and AI Discussions Are Not Helping | C4C · September 28, 2023 at 3:31 pm

[…] it comes to generative AI, as explained by Dr Andres Guadamuz, Reader in Intellectual Property Law at the University of Sussex, “the […]
