First case on AI and copyright referred to the CJEU

So long and thanks for all the lawsuits!

Most people who have been paying attention to the copyright and AI debate in Europe have been expecting that at some point the Court of Justice of the European Union (CJEU) would be asked to interpret some of the existing exceptions for text and data mining. Many of us, myself included, assumed that the first referral might well be the LAION case, currently on appeal. But we were all wrong: we now have the first referral to the CJEU dealing with copyright and AI, and it is Like Company v Google Ireland (Case C-250/25).

The case involves the Hungarian publisher “Like Company”, which runs several news portals, SEO content providers, and PR campaign sites. The website in question here is https://balatonkornyeke.hu, which publishes news articles about the Lake Balaton region. The website published an article about the Hungarian singer Kozsó, which describes his plans to release dolphins into the lake along with many other details about his life. While the referral does not link to the article, it contains enough specific information to narrow it down, and Professor Balazs Bodo was able to find the article in question here. I would recommend visiting the article with an ad blocker; it is difficult to navigate otherwise, and without one I was bombarded by IQ test links. The article itself consists of 579 words (excluding the repeated links to its own Facebook page and to other articles), and according to Prof Bodo it appears to be made up of summaries of other websites, newspaper publications, and TV interviews. After spending some time on the site (thanks to the magic of AI translation), I concur with his interpretation: I did not find any page on the site that appeared to be original, the articles carry no author name, and to be honest they appear to be short summaries of articles from other news organisations. The article in question is even illustrated with a picture of Kozsó taken from his Facebook page, and I would love to know whether it has been properly licensed. But I digress…

According to the referral, the claimants went to Google Gemini and wrote the following prompt:

“Can you provide a summary in Hungarian of the online press publication that appeared on balatonkornyeke.hu regarding Kozsó’s plan to introduce dolphins into the lake?”

The response is not reproduced in the referral, but according to the claimant “the defendant’s chatbot provided a detailed response which included a summary of the information appearing in the news media belonging to the applicant.” Given that the original article is only 579 words long, it is hard to see how a summary of it could be particularly detailed. I tried the prompt myself in several chatbots. Claude, Perplexity, and DeepSeek did not give an answer, although Perplexity at least tried to provide wider information about dolphins in Lake Balaton. Both ChatGPT and Gemini gave substantive answers: ChatGPT’s answer was 215 words long (you can see it here), and Gemini’s was 235 words long (you can see it here). Given how short the summaries are, it is difficult to tell whether there is any direct reproduction, but at least in my own experiment neither output appears to reproduce the article.

Like Company sued Google for copyright infringement, alleging that the summaries produced by Gemini amounted to a communication to the public of its work. It also claimed infringement of the new press publishers’ right in Article 15(1) of the DSM Directive, which gives press publishers exclusive rights over the online use of their publications by information society service providers, and which is often described as targeting the use of snippets from news articles. It further argued that its works had been used to train Google’s chatbot. Google responded that no works had been reproduced in Hungary in the course of training the model, so Hungarian law did not apply. It also argued that a summary is not a reproduction, and that a chatbot producing a summary of a work is not a communication to the public either. Google further argued that the work did not reach a new public (a requirement for there to be a communication to the public), as the content was already publicly available online. It also pointed out that the summaries do not reproduce any of the content of the article in question; at most they reproduce some basic facts. Finally, Google argued that if there was any reproduction, it would fall under two copyright exceptions: the exception for temporary copies in Article 5(1) of the InfoSoc Directive, and the text and data mining exception in Article 4 of the DSM Directive.

The Budapest District Court decided to submit a request for a preliminary ruling to the CJEU with the following four questions:

1. Does a chatbot’s output that closely resembles protected parts of press publications qualify as a “communication to the public” under EU copyright law, even if the chatbot generates text through predictive modelling?
2. Does the process of training an LLM by analysing and learning from existing texts constitute a reproduction of protected works under EU law, even if it is based on pattern recognition?
3. If LLM training does count as reproduction, can it still be lawful under the text and data mining (TDM) exception in Article 4 of the DSM Directive, provided the sources were lawfully accessible?
4. If a chatbot reproduces part or all of a press publication in response to a user prompt, does that output constitute a copyright-relevant reproduction by the chatbot provider under EU law?

There will be a lot of discussion of these legal questions in the coming months, but I think the current case does not even get past the first question. As explained above, the original publication appears to be nothing more than a short summary of various Hungarian publications and TV programmes, and I am surprised that the defendant did not attack this point. Summaries can certainly attract their own copyright, but I find it strange that the claimant is trying so hard to attack a practice it seems to engage in itself. I cannot imagine how a summary can infringe copyright, particularly when the summary was generated by the claimant itself in order to bring a copyright infringement case. I also cannot imagine that many people are typing the prompt in question into Gemini or ChatGPT; the only people doing so will be people like me.

Needless to say, I find the idea that a summary from an LLM can be a communication to the public utterly ridiculous. People use LLMs for all sorts of purposes, and for the most part outputs are not available to the public. Even if we were to accept that the summaries of the article infringe copyright (which is a big ‘if’), these outputs are mostly private, and only become available to a public if the user shares them, for example by creating a share link. Sure, the notion of “public” has been stretched to include individuals successively using a facility over time (see Rafael Hoteles), but again, this rests on the presupposition that such a public will be accessing the same content over time. LLMs can produce a different result each time a question is asked, and different models from the same family will also produce different results. Besides, as the defendant rightly argues, I cannot see how the outputs in this case communicate the work to a new public.

Moreover, we really have to talk about the fact that a case involving a summary listicle from what appears to be a Facebook clickbait farm is going to be the first case to decide the future of copyright and AI in the EU. It boggles the mind that this is the first referral on one of the most important legal subjects of our time. But that is the world we live in, I guess. I won’t even address the publisher’s right questions; I will be surprised if the Court even gets that far.

Anyway, paraphrasing Senator Palpatine, we will watch your progress with great interest.

Edited to add: I forgot to highlight perhaps one of the most interesting parts of the referral, which is, as far as I know, the first time a court has described the nature of a chatbot:

“Gemini (Bard) is a basic model of the LLM type. It is neither an information database, nor an information retrieval system. It does not store copies of the data collected, but rather it converts them into tokens – that is, it breaks the texts down into minimum units – and processes them. That chatbot does not have a fixed database from which it is able to retrieve any data content at the request of users. It uses the Google Search database to collect data and also usually suggests that the user search for the subject in question in Google Search afterwards. When it is asked to directly, the chatbot operated by the defendant is capable of providing a response that displays the content of a protected press publication.”

It is fascinating that the court squarely rejects the erroneous idea that LLMs are databases.

Comments

Anonymous · May 27, 2025 at 8:20 pm

Isn’t it rather suspicious that this is the first case on such a momentous topic? It’s as if Google itself had planned it.

Andres Guadamuz · May 27, 2025 at 8:29 pm

Honestly, if I were Google, this would be the best case to defend. But the parties have no control over whether a question is referred to the CJEU; only the referring court has that power.
