The Munich Regional Court has handed down its much-awaited decision in GEMA v OpenAI, and it is… a lot, and right on the back of the Getty Images High Court ruling. On the surface, this looks like a clear defeat for OpenAI: the court found copyright infringement in the model and in the outputs, granted injunctive relief, ordered disclosure, and paved the way for damages to be decided at a later date. GEMA even gets a right to publish parts of the judgement in the press. Under the surface, however, the picture is more complicated. There are parts of the reasoning I think are genuinely helpful, and others that leave me unconvinced.

This is very much a “first thoughts” piece. There will almost certainly be an appeal, and at some point we may even get the CJEU to tidy up the mess. For those who want to read the whole thing, there is the text in German.

The basics

GEMA, the German collecting society for musical works, brought a legal action against OpenAI for copyright infringement on behalf of several German lyricists and publishers of nine well-known German songs, including works like “Atemlos”, “Über den Wolken” and “In der Weihnachtsbäckerei”. The lawsuit focused on models 4 and 4o, which were available in Germany during the relevant period.

The lawsuit centred on both inputs and outputs: the claimants argued that the works had been used in the training data, and also alleged that ChatGPT could generate, upon request, outputs that reproduced those lyrics. There was never a dispute that the lyrics at issue were indeed in the training data. Like everyone else, OpenAI scraped content from the web and other sources, bundled everything into a giant corpus and used that material to train its models. The real fight was about what that means legally, and what happens when a chatbot can be coaxed into outputting bits of those lyrics.

GEMA’s legal team set out to do precisely that. They disabled web search to stop the model from getting information using Retrieval-Augmented Generation (RAG). They used standard ChatGPT-4 and also custom agents based on model 4o. These agents were given the role of “lyrics expert” with the instruction that the bot “knows all lyrics and can reproduce them accurately and completely”. They then prompted along predictable lines: “What are the lyrics to [song]?”, “What is the chorus?”, “Please reproduce the first verse”, and so on.

There were definitely some outputs, but they were decidedly underwhelming. In one instance the agent was able to reproduce 25 consecutive words from the song “36 Grad”, and the first three verses of the song “Über den Wolken” (just over 70 words), while for some other songs as few as 15 words were reproduced (and with hallucinations). Still, the court takes these outputs as sufficient to infer memorisation in the model and infringement both “in the model” and in the outputs.
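For the technically minded, figures like “25 consecutive words” can be thought of as the longest run of words shared between a chatbot response and the original text. The following is only a rough sketch of how one might count that. It is not the method used by GEMA's experts or by the court, the function names are hypothetical, and the sample strings are placeholders rather than real lyrics or real model output.

```python
import re
from difflib import SequenceMatcher


def words(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def longest_shared_run(output: str, reference: str) -> tuple[int, str]:
    """Return the length and text of the longest run of consecutive words
    that appears in both strings."""
    out_words, ref_words = words(output), words(reference)
    # autojunk=False stops difflib from ignoring very frequent words in long texts.
    matcher = SequenceMatcher(None, out_words, ref_words, autojunk=False)
    match = matcher.find_longest_match(0, len(out_words), 0, len(ref_words))
    run = out_words[match.a : match.a + match.size]
    return match.size, " ".join(run)


# Placeholder strings (assumptions for illustration only).
reference = "the quick brown fox jumps over the lazy dog near the river bank"
output = "A model said: quick brown fox jumps over a sleepy dog"

length, run = longest_shared_run(output, reference)
print(length, run)
```

A count like this only measures verbatim overlap. Whether a given run is a protected part of a work is a legal question that no word count answers on its own.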

Memorisation and reproduction

One genuinely interesting part of the judgement is the way the court conceptualises training. It borrowed from the Hamburg LAION decision and German scholarship to break training into phases: first, extraction and conversion of training material into a machine-readable corpus; second, analysis of that data and the actual training of the model; and third, the later use through prompts and outputs.

When the court considered whether there were infringing reproductions in the model, it focused on the second phase. This is where the idea of “memorisation” does the heavy lifting. I’ve talked extensively about this subject here and elsewhere, but suffice it to say that memorisation is when a model is capable of remembering some of the data it was trained with, and can therefore reproduce it as an output. The court relied on computer science research showing that large models sometimes memorise training data in a way that allows it to be reproduced later. Here I disagree with some of the language used (or at least my translation of it), in particular words such as “storage”. This is a huge point of disagreement with the UK Getty case, where the judge did not find that any copies had been stored in the model. However, the fact that models can memorise items in the training data is not a controversial factual claim. As I keep repeating, memorisation is not an exclusive right of the author; reproduction is. So what matters in the end is that a model is capable of reproducing a work. And that has happened here; at a minimal level, but it did happen.

From there, the judgement takes two steps. First, it found as a matter of fact that the disputed lyrics are memorised. They were used as training data and parts of them appear in outputs in response to very simple prompts. According to the court, the combination of length, complexity and overlap rules out mere coincidence. The only explanation is that the texts are “contained” in the model and can be retrieved. Again, I sort of disagree with some of the language used, but not with the fact that there has been a reproduction.

Second, it treats this memorisation as reproduction under the usual copyright rules. The German reproduction right, like its EU counterparts, is technologically neutral. It has already been applied to digitisation, compression, thumbnails and other storage techniques that bear no resemblance to traditional copying. For the court it is enough that the storage allows the work, or a protected part of it, to become perceptible again with some probability using appropriate tools. Parameters and weights are no different in principle from bits on a hard drive.

In that sense the decision is actually useful. It confirms quite explicitly that reproductions in the model are capable of falling within the normal reproduction right. This is simply an application of the technological neutrality story that copyright courts have been telling themselves since the early 2000s.

The TDM exception

While most coverage has concentrated on the reproduction in the outputs, to me the part that will probably be cited for years is the treatment of text and data mining. For the last couple of years we have been bombarded with confident analysis that the EU’s TDM exception was never meant to cover AI training. I have disagreed with that view from the start, particularly since the AI Act seemed to have settled this debate once and for all. The Munich court now comes down clearly on the side that language models fall within the scope of the text-and-data-mining exception. The court comments (translated with Gemini):

“Language models such as the models at issue generally fall within the scope of the limitations on text and data mining […] the EU legislator was aware of the use of data for the purpose of training models. […] Even if this were not the case, the scope of application of Section 44b of the Copyright Act and Article 4 of the DSM Directive extends to modern technology. According to the recitals of the DSM Directive, the limitations on text and data mining are specifically intended to promote new technologies…”

The court reads § 44b UrhG and Article 4 DSM in light of the recitals and the legislative history. TDM was introduced with an explicit eye on modern computational uses. The recitals talk about machine learning and about efforts to promote “new technologies”. Reproductions that are necessary to build a training corpus, such as converting works into digital formats or storing them temporarily in memory, are exactly the sort of preparatory acts that these limitations were designed to cover. The analysis itself, in other words the extraction of patterns, statistics and embeddings, is not a copyright-relevant act at all.

However, the judgement then draws a very sharp line. It distinguishes between reproductions that are needed for analysis and reproductions that arise from memorisation in the model. The first category falls within the scope of the TDM exception. The second does not. In other words, memorisation, and any reproduction that results from it, does not fall under the Art 4 exception. This to me is a good line to draw, and a useful distinction that could be adopted in the future. Training will be fine, as long as outputs don’t reproduce works used in training.

A point with which I disagree is that the court seems to assume that memorisation will always take place, and announces that if memorisation of training data cannot be prevented according to the state of the art, training models on copyrighted data is not exempt under the TDM exception. In other words, the limitation becomes conditional: the ability to rely on it depends on the existence of effective anti-memorisation measures. I found this to be perhaps a bit convoluted, and I wonder whether the court was presented with accurate literature on memorisation, particularly the fact that while it can occur, it does not always occur. To me the dividing line has to be on the outputs, as has always been my theory. The proof is in the output because that is the only evidence of whether memorisation has taken place.

From a policy perspective, this is a big move. The court even notes that people are already talking about new exceptions or licensing regimes for AI training with remuneration, but concludes that this is a job for the legislator. It is not prepared to re-write § 44b by judicial creativity. Here, for once, I find myself nodding. The TDM provisions were clearly introduced with AI in mind, but they were never intended as a blank cheque for building a karaoke machine into every large language model.

Opt-outs and outputs

There are other points discussed that may prove interesting for future debates, but that I found a bit less compelling. Part of the case rested on the fact that GEMA had amended its contracts and terms so that it could declare a reservation of rights under § 44b. It then did so via its website and later in its legal notice. The court accepted this as a valid reservation of rights for the repertoire in question. This seems to contradict other decisions, which require a reservation of rights to be in machine-readable form to be valid.

OpenAI countered that it respects robots.txt and that this should be enough to manage opt-outs and TDM reservations. The court was unimpressed and treated robots.txt as a blunt and imperfect tool that operates at the level of paths and directories, not individual works. It also cannot express different licensing terms or uses very well, and in any case the defendants had not been transparent about what they were actually training on. The reasoning is that the opt-out regime is inherently detrimental to rightsholders, and therefore should be read broadly. I’m not totally against this interpretation, but this is certainly going to be a point of contention in any future appeal.
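To see what “paths and directories, not individual works” means in practice, here is a small sketch using Python's standard-library robots.txt parser. The site (example.com), the paths and the rules are hypothetical and chosen only for illustration. GPTBot is used as the crawler name on the assumption that it is the user agent OpenAI publishes for its crawler.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt for a hypothetical lyrics site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /lyrics/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Ask whether the given crawler may fetch the given path on example.com."""
    return parser.can_fetch(agent, "https://example.com" + path)


# The rule blocks a whole directory for one named crawler...
print(allowed("GPTBot", "/lyrics/song-a"))
# ...leaves the rest of the site open to that crawler...
print(allowed("GPTBot", "/news/item"))
# ...and says nothing to crawlers it does not name.
print(allowed("OtherBot", "/lyrics/song-a"))
```

All the file can say is which crawler may fetch which path. There is no field for naming a particular work, the purpose of the use, or the terms on which a licence might be available.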

Another interesting question was who is liable for infringing outputs, another important point that we have discussed here. The court again adopts a familiar rightsholder-friendly stance, as it treats the disputed chatbot responses as acts of reproduction and making available to the public, both on user devices and in stored chat histories on OpenAI’s servers. It also attributes these acts to OpenAI rather than to the user. The reasoning here is straightforward. Users provide simple prompts; they do not design the models, they do not determine the architecture, and they cannot meaningfully influence the decoding process. That is all under the control of the provider. The court compares this to older German case law on internet radio recorders and draws a distinction. The classic recorder is a neutral tool for private copying. A large language model that is known to occasionally spit out copyrighted lyrics is not neutral at all; it is a content-determining system operated commercially by its provider. I also think that this will be a point of contention in the appeal, and this is another point on which I strongly disagree with the decision.

Looking forward

The above are just my initial thoughts on a lengthy and complex decision, and I am sure that we will be discussing it for the next few months, at least until the next big case hits. I do want to put the case in its wider context, and explain why it is not the huge victory that some people have announced.

When you strip away the noise, this is a remarkably narrow decision affecting only nine songs. The immediate consequence is that OpenAI will need to restrict those tracks from future reproduction. I tested this by asking ChatGPT for the lyrics of Über den Wolken, and it responded: “I can’t reproduce the lyrics from ‘Über den Wolken’ because it’s a copyrighted song.” I’m sure there will be jailbreaking methods that get around this, but models are generally far less willing to reproduce text than they used to be. We still don’t know what will happen with damages, although I suspect they won’t be crippling; the reproductions were, in my view, extremely limited and short. And while the ruling can be cited elsewhere, it isn’t binding on other courts. Civil law jurisdictions simply don’t operate with the same notion of precedent as common law systems.

What I find interesting is how this fits into the wider landscape. It’s obvious that GEMA brought this as a test case, aimed squarely at sending a message to OpenAI and the rest of the industry: “Pay us licensing fees for training, or else.” The difficulty is that there is no licensing market yet, so we still have no idea what these deals will cost. AI companies might well comply, but end up paying very little. The real problem for copyright holders is that the emerging pattern is that you can only sue successfully if you can point to an actual reproduction in the outputs. In this case the court was willing to treat very short reproductions as substantial, but that won’t necessarily hold elsewhere. Yes, the test for substantiality is qualitative rather than quantitative, but I suspect AI developers may be inclined to call rightsholders’ bluffs. This sort of litigation is costly and drawn-out, especially when the alleged reproductions are so minimal.

But perhaps more importantly, while we argue over three verses and twenty-five reproduced words, you can simply google the songs and find countless sites hosting the full lyrics. Search engines have been reproducing and storing works for decades, quietly keeping them in data centres near you for faster delivery. My frustration with cases like this is that we end up fixating on microscopic reproductions to scrape together a few pennies that authors will never actually see, all so we can declare a symbolic victory over the “evil” tech companies. Meanwhile, the underlying practice is already ubiquitous online. It’s an attempt to cover the sun with your thumb.

Concluding

This case will almost certainly be appealed, and many of the legal issues that the Munich court touches on will be tested again. I suspect we will be discussing this case for years, not least because it offers one of the first detailed judicial attempts to grapple with AI training and memorisation.

Whatever happens next, I would bet that we eventually end up with some form of licensing market. The real question is how we get there and what shape that market will take. I was also surprised that jurisdiction played such a small role here, given how central it was in the UK Getty litigation; that may well surface as another pressure point on appeal.

In the meantime, I will try to get these Schlager classics out of my head. I have no desire to be sued for memorisation.


6 Comments

Anonymous · March 20, 2026 at 3:59 pm

Why do you disagree with the court’s decision to allocate responsibility for the restricted activity to OpenAI? Please note that I am simply a copyright law enthusiast, and therefore I would like to apologise in advance if the question is daft.


OpenAI’s Smackdown by a German Court Hints at What’s Next for AI and Art - DGMG · November 24, 2025 at 6:00 pm

[…] OpenAI haters will agree, I think, with at least some of the recent legal analysis of the ruling by intellectual property law scholar Andres Guadamuz of the University of Sussex. […]
