I’ve been following the hearing in CJEU case C-250/25 Like Company v Google with interest (my initial thoughts on the case here). I won’t attempt to cover the entirety of the proceedings (I’ve already accumulated plenty of notes for that), but I want to focus on one specific aspect that jumped out at me: a line of questioning from Advocate General Szpunar that I found particularly problematic, and which deserves immediate commentary.
Before I get to that, a few general comments about the proceedings. I’ve been writing about AI and TDM since 2012, and more specifically about what we now call generative AI since 2015, so I’ve become deeply familiar with the intricacies of the debate. The downside of that familiarity is that I tend to assume others understand at least the basics as well as I do, and I get a bit of a shock when I witness non-experts grappling with a very complex technical issue. The lawyers, advocates, and judges involved did a very good job of trying to understand the technology and how it fits with copyright, but I noticed several moments where the non-specialist parties appeared to be struggling with some of the concepts. I’m reminded of this famous xkcd cartoon:
I wrote an editorial recently that touches precisely on this point: we are in dire need of lawyers who can explain a topic clearly and concisely to a lay audience. I just hope that the commentary leading up to the ruling will help to steer things in the right direction.
An end to inputs and outputs?
What prompted me to write this blog post took place during AG Szpunar’s questions for clarification from the parties (around 14:45 in the recording). He asked Google a very good question regarding territoriality and applicable law, particularly with regard to where the AI training took place, relying on the Rome II Regulation: the applicable law would be that of the country where the training occurred. Google usefully clarified that their argument was that the applicable law is that of the country where protection is sought, namely Hungary, but that Hungarian law could not reach the place where the training took place; each act would have to be analysed individually. So training (the input phase) is not the same as producing an output: Google argues that Hungarian law could apply to an output, while it may not reach the input phase. I personally think that Google is correct here, and this is consistent with what was found in Getty Images v Stability AI.
It was at this point that AG Szpunar dropped the bomb. He suggested that in order to analyse where the damage occurred, the court may have to consider the entirety of the actions that led to a potential infringement, citing the Acacia design case. In other words, he was proposing doing away with the prevailing input-output dichotomy in analysing AI copyright cases.
I had to stop the recording and watch the entire sequence again to make sure I had understood correctly, because this is a very big statement, particularly coming from the Advocate General.
Google’s response was immediate. Counsel argued that this would be the wrong approach because copyright grants different rights (reproduction, publication, communication to the public, and so on), and those rights have to be analysed separately. If a book is reproduced, that is one act; if that copy is distributed, that is another. They cited the Austro-Mechana case dealing with cloud storage, which involved separate actions analysed differently by the Court. AG Szpunar countered that Austro-Mechana concerned private copying, so the analogy did not hold. A back-and-forth ensued in which Google argued that it would not be advisable to consider the acts together, also citing international principles of tort law.
I hope that the input-output dichotomy will prevail, for various reasons, but particularly because we are dealing with two very different processes. I was concerned that the analogies being used mostly involved physical copies of the same work, and I would hope that the nature of training and of producing outputs will be at the centre of the analysis. Understanding this distinction is vital to the question of whether we should treat the use of a model as a single act from training to output, or as separate acts. AG Szpunar used the analogy of a book that is reproduced without authorisation in one country and then distributed in another. The problem with this analogy is that it fails to recognise the reality of AI training; some people continue to think of it as making a direct copy of the work, which can then be communicated to the public or distributed. I have talked repeatedly about this here, but for completeness: the act of reproduction during training produces an entirely different product, a model, which does not contain any copies as such.
While I’m on record as not favouring analogies that equate human and machine learning, for the purpose of this discussion one such analogy would actually be closer. The scenario we are dealing with in AI training is that a copy is made in one country, a person reads that copy, and then goes to another country and recites it, or tries to produce a summary of what they read. That may or may not amount to a reproduction, but these are different acts, and the copy, if there is one, is not the same at all.
This is precisely why collapsing the input and output phases into a single act would be so damaging. If the CJEU were to follow AG Szpunar’s reasoning, it would effectively allow courts to treat the statistical patterns encoded in a model’s weights as if they were copies of the training data, smuggling in a finding of reproduction at the output stage based purely on what happened at the input stage. I believe that would be technically inaccurate, and could also undermine the careful rights-specific analysis that EU copyright law requires. Worse, it would create a chilling precedent: any downstream use of a model could be treated as an extension of training, regardless of whether the output actually reproduces any protected expression. The input-output distinction is a convenient analytical framework because it reflects a genuine difference in what is happening to the works at each stage, and the law should respect that difference.
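To make the "patterns, not copies" point concrete, here is a deliberately toy sketch in Python. It is nothing like a real transformer (it is just a word-level bigram counter of my own invention, purely for illustration), but it shows the basic shape of the argument: the training text goes in, and what comes out is a table of numbers describing statistical relationships, not a stored copy of the text.

```python
# Toy illustration only: "training" here reduces a text to a small table of
# transition probabilities. The artifact stores statistical patterns, not prose.
from collections import defaultdict

def train_bigram_model(text: str) -> dict:
    """Count how often each word follows another, then normalise to probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    # Convert raw counts into probabilities: pure numbers, no stored text.
    return {
        prev: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for prev, followers in counts.items()
    }

corpus = "the llama reads the book and the llama writes a summary"
model = train_bigram_model(corpus)
print(model["the"])  # {'llama': 0.666..., 'book': 0.333...}
# The "model" is a table of probabilities; the sentence itself is gone, and
# only the statistical relationships between the words remain.
```

A real model has billions of parameters rather than a handful of probabilities, but the underlying point is the same: what is stored after training is a transformation of the works, not the works themselves.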
There are other reasons to keep the input and output analyses separate. A model is trained on millions of works, and there is usually a reproduction taking place, but the country where that training took place may consider such an act fair use or fair dealing, in other words legitimate, while the same act may be an infringement elsewhere. Furthermore, producing an output may not involve accessing the training copy at all. Most modern outputs involve Retrieval-Augmented Generation (RAG), which is essentially an online search for up-to-date information, so an output may not be connected to the training inputs at all; it could be drawing a summary from the live web, just like a search engine. These are two acts that require different analyses.
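For readers less familiar with the mechanics, a minimal sketch of the RAG flow might look like the following. All of the function names here are hypothetical stand-ins of my own (real systems are proprietary and far more complex); the point is simply that the answer is grounded in documents fetched at query time, not in whatever the model ingested during training.

```python
# Hypothetical sketch of a RAG pipeline: the generated answer draws on
# material retrieved at the moment of the query, not on the training corpus.

def retrieve_live_documents(query: str) -> list[str]:
    """Stand-in for a web or index search performed at query time."""
    # A real system would hit a search index or the live web here.
    return [
        "Doc A: a news article published this morning matching the query...",
        "Doc B: an up-to-date reference entry...",
    ]

def generate_answer(query: str, context: list[str]) -> str:
    """Stand-in for the language model, prompted with the retrieved context."""
    prompt = f"Answer '{query}' using only these sources:\n" + "\n".join(context)
    # A real model would condition on `prompt`; this stub just echoes the idea.
    return f"[summary generated from {len(context)} live sources; prompt was {len(prompt)} characters]"

query = "What did the court decide this morning?"
docs = retrieve_live_documents(query)  # the output stage fetches fresh material
print(generate_answer(query, docs))    # the summary draws on docs, not training data
```

If the output stage can be decoupled from the training copies in this way, treating input and output as one continuous act becomes very hard to justify.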
Concluding
I hope that some of the worrying moments in the hearing were caused simply by unfamiliarity with the technology. At several points the Advocate General used terms such as training, grounding, and RAG interchangeably, which I think betrays a bit of a struggle with the technical vocabulary. That is understandable, but the distinctions between these concepts are important, and each may call for a very different legal analysis.
Now we just have to sit and wait; we are indeed living in the golden age of AI copyright.