The enjoyment purpose
Do exceptions under the copyright laws of Japan & Singapore allow GenAI training?
TLDR: Not really.
When the history of generative AI is written, we will need to understand how the tech industry came to take such a big gamble: to invest deca-billions based on a fair use claim. (A highly uncertain fair use claim in my view, see previous newsletters). The excellent Illusion of More blog had a post on this subject recently: “Fair Use” is Not a Great Business Plan. It’s worth quoting at some length:
“Because simply put, the party who conjures the term “fair use” has effectively assumed that a potential liability for copyright infringement exists. And if that assumption is a bad business decision, then that’s the founders’ problem, not a flaw in copyright law.
“No matter what the critics say, or how hard certain academics try to alter its meaning, the courts are clear that fair use is an affirmative defense to a claim of copyright infringement, which means that building a business venture on an assumption of fair use is tantamount to assuming that lawsuits are coming. And if it’s a multi-billion-dollar venture that potentially infringes millions of works owned by major corporations, then the lawsuits are going to be big—perhaps even existential.”
The viewpoint of researchers is easier to understand. The desirable endpoint is this amazing technology, and let the corporate lawyers figure out the commercialisation. Everyone is publishing, we want to catch up! But very quickly millions, then billions of dollars began to be invested in products brought to market, and surely this brought questions from management and their lawyers? Were these asked and answered, or brushed aside? Was there a pricing of the risk and a rational decision to go ahead? Or an avoidance of the question? Thinking back to my days as a cofounder in an internet startup, raising, well, only deca-millions ahead of a NASDAQ IPO, my money is on avoidance. But I would have loved to have been a fly on those walls…
But then fair use, it’s an American thing! It’s a big old world. Won’t the AI investors just move to those jurisdictions where copyright claims against training generative AI are forestalled by wise governments? Places like Singapore and Japan?
Well, it’s not so simple.
Singapore: always exceptional
Let’s take Singapore first. Singapore’s new Copyright Law (which came into force in November 2021) has the most liberal TDM exception anywhere in the world. Singapore’s formulation is to allow copying for computational data analysis, for commercial or non-commercial uses. And it is backed up with a US-inspired fair use provision as well.
Yet Singapore’s top legal scholars argue that this doesn’t mean Singapore’s law allows GenAI training. Simon Chesterman, Vice Provost and former Dean of the law school at my employer, the National University of Singapore, and concurrently Senior Director of AI Governance at AI Singapore, argued on the opinion pages of our major broadsheet The Straits Times that
“…the Singapore exception for data mining is broad. Nonetheless, the law specifies that the materials should not be used for any purpose other than computational data analysis. If they are used to create new artistic works that compete with the original works, that may fail to satisfy the fair use test too.” [1]
Then copyright expert David Tan of the NUS Centre for Technology, Robotics, Artificial Intelligence and the Law published a paper in the Singapore Academy of Law Practitioner which puts it very directly: in Generative AI training, “the making of a copy…will not be for the sole purpose of analysing the data to improve the functioning of the AI in relation to the data; it will be for the purpose of generating new expressive works based on that data, which is an impermissible purpose.” [2]
Enjoying the Japanese Exception
Next we come to Japan, which passed its own forward-looking TDM exception in 2018.
You will remember that in June 2023 various AI researchers, including Yann LeCun and David Ha, were celebrating Japan’s decision ‘that it will not enforce copyrights on data used in AI training’. This turned out to be fake news, but I guess the impression sticks. And the actual Japanese position was not very easy to access.
Accordingly the copyright world has been watching the report of Japan’s Agency for Cultural Affairs, the unit of the Ministry of Education, Culture, Sports, Science and Technology which houses the national copyright authority. So in effect this is the equivalent of the US Copyright Office. The Agency was charged with sorting through how Japan’s TDM exception might apply to GenAI training.
The report is still in draft form, but the draft is not expected to shift very much. Toshimichi Ishijima, Secretary General of the Japan Academic Association For Copyright Clearance has provided me with a translation of the draft, but warned of the complexities of translating this sort of high-bureaucratic Japanese into English. Let’s consider this just a window on some of the thinking in the Agency rather than the final view. We understand that the Agency will likely produce an official translation for the final document, given the international interest.
Similar to the Singapore interpretations of Chesterman and Tan, the end purpose of the text and data mining/computation is seen as important in deciding if the clause applies. The Japanese law allows copying “for information analysis” but not when the “purpose is to enjoy for oneself or to cause others to enjoy the thoughts or feelings expressed in the work”. I think it’s best to think of this phrasing as a way of reframing the fact/expression distinction that is so important to copyright, with a focus not on some intrinsic characteristics of the work, but on the impact on the reader/viewer/listener.
Of course there are so many things going on in a large language model or a multimodal GenAI model. These are general purpose tools. People may use them for a great many things, but getting more variations or extensions of a favourite creative work is certainly one popular use. “I love those Rutkowski dragons!”
On this question the Copyright Agency’s paper is pretty clear. As long as there is any “enjoyment” going on, in cases where the enjoyment and non-enjoyment purposes co-exist, the exception does not apply and the copying is illegitimate.
But where I would conclude that any generative AI model is by its nature a mixed use, the Agency’s draft seeks to look at specific cases, building an opinion that can also apply to future technologies. “Depending on the specific case, if it is evaluated that the purpose is to output a product that allows one to directly sense the essential characteristics of the expression of the copyrighted work of the learning data, the purpose of enjoyment may coexist.”
The Paper goes on to enumerate a number of uses where the enjoyment and non-enjoyment purposes clearly do co-exist (and for which copying would not be allowed). These are many of the more popular use cases of GenAI. They specifically identify practices like fine-tuning on a particular set of material: say, an individual artist’s work, a favourite technique when using Stable Diffusion, or the works of a group of 16,000 artists, apparently part of the fine-tuning of Midjourney. They see Retrieval Augmented Generation as (usually) including enjoyment of the material retrieved. Any data pulled from a datastore as part of a RAG process would need to be licensed (and certainly could be easily licensed, given that the extraction of the relevant content is a clearly identified vector database retrieval).
They specifically mention that Internet search would be a non-allowable purpose, if it “generates answers in the form of sentences based on the results.” Big trouble for Perplexity, Bing Search, anyone doing RAG-of-the-Net.
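To make the point about RAG retrievals being "clearly identified" a little more concrete, here is a minimal sketch in Python. It is my own illustration, not anything described in the Agency draft: the `Document` type, the `answer_with_ledger` function and the word-overlap scoring are all hypothetical stand-ins, and a real system would use an embedding-based vector database rather than this toy ranking. The only thing it is meant to show is that each retrieval names a specific document, so a per-rights-holder tally falls out almost for free.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """A stored work plus the party who would need to be paid (hypothetical)."""
    doc_id: str
    rights_holder: str
    text: str


def retrieve(query: str, store: list[Document], top_k: int = 1) -> list[Document]:
    """Toy stand-in for a vector search: rank documents by shared words."""
    query_words = set(query.lower().split())
    ranked = sorted(
        store,
        key=lambda doc: len(query_words & set(doc.text.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def answer_with_ledger(
    query: str, store: list[Document], ledger: Counter, top_k: int = 1
) -> list[Document]:
    """Retrieve documents and record one use per rights holder in the ledger."""
    hits = retrieve(query, store, top_k)
    for doc in hits:
        ledger[doc.rights_holder] += 1
    return hits


if __name__ == "__main__":
    store = [
        Document("a1", "Artist A", "dragon painting in oil"),
        Document("b1", "Publisher B", "tax law in singapore"),
    ]
    ledger: Counter = Counter()
    answer_with_ledger("dragon painting techniques", store, ledger)
    print(dict(ledger))  # per-rights-holder retrieval counts
```

Whether a tally like this would satisfy any actual licensing scheme is a separate question; the sketch only shows that the bookkeeping itself is not the hard part.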
But then the Agency report takes a different tack from the “substantial similarity” argument so beloved of American judges.
“Furthermore, even if there is a case in which a work similar to the copyrighted work learned by the AI is generated at the generation/use stage, it is usually not possible to infer the existence of the purpose of enjoyment at the development/learning stage from this fact alone. In other words, the application of Article 30-4 of the Act cannot be immediately denied. On the other hand, the fact that products similar to the studied works are frequently generated during the generation and use stage can be considered to be a factor in inferring the existence of the purpose of enjoyment at the development and learning stage.”
My guess is that people looking for clarity and bright lines out of this exercise are going to be pretty disappointed. Once you write your law based on the motivation or purpose of the copying, and you want to look at purposes both in the training and in the “generation/use” stage of GenAI, it seems you are introducing a huge amount of complexity. I would imagine that it takes very little “enjoyment” to poison the GenAI well, so I can hardly imagine this development of the implications of the Japanese exception as being good news for Big Tech.
And then there was Berne
The other feature of the Japanese exception is that it uses language that brings it into alignment with the Berne Convention on exceptions: they are not allowed “if the interests of the copyright holder would be unjustly prejudiced.” So this exception does not apply if the copying unjustly prejudices the interests of creators. We are here because copyright holders (newspapers, freelance artists, copywriters, jingle composers, etc) very clearly feel that their interests are unjustly prejudiced by generative AI.
This is quite a separate pillar of Article 30-4, and the Agency paper suggests a logical flow: if the copying was intended to enable “enjoyment”, it’s not valid. So this clause only comes into play for “non-enjoyment” uses, ie those applying only to facts and ideas, not expression.
And so the Agency reminds us that ideas are not protected. “The abstract fear that one's own market may be squeezed due to the generation of a large number of similar ideas” is not sufficient reason to trigger this clause.
So if your work is copied in a non-enjoyment use (ie expression is not mobilized) and you want to argue against that copying, you have to have a concrete argument about market harm. Judges might want to see some evidence that you had the intention to license your material. The Agency gives one example here, but it’s an important one: the use of a machine-readable block on a website should be taken to indicate that the website owner intends to license their content. In such a case, the exception would not give you clearance to violate the block and make the copy.
This moves Japan’s exception closer to the European exception, with a machine-readable opt-out for commercial purposes. That would be consistent with Japan’s stated desire to move forward on AI governance in lockstep with their G7 partners.
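For readers wondering what honouring a machine-readable block looks like in practice, here is a minimal sketch in Python. It assumes the block takes the form of a robots.txt file, which is my assumption: the translated draft I have seen does not name a specific mechanism. The crawler name `ExampleTrainingBot` and the function `may_copy_for_training` are hypothetical; the parsing itself uses Python's standard `urllib.robotparser` module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site blocks one named training crawler
# and leaves everything open to other user agents.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""


def may_copy_for_training(robots_txt: str, crawler_name: str, url: str) -> bool:
    """Return True if the robots.txt rules permit this crawler to fetch the URL.

    This only reads the site's stated rules. It says nothing about whether
    the copying would be lawful, which is the question the Agency is asking.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(crawler_name, url)


if __name__ == "__main__":
    page = "https://example.com/articles/dragons"
    print(may_copy_for_training(EXAMPLE_ROBOTS_TXT, "ExampleTrainingBot", page))
    print(may_copy_for_training(EXAMPLE_ROBOTS_TXT, "SomeOtherBot", page))
```

A crawler that runs a check like this before copying, and skips anything disallowed, is respecting the block. One that ignores the result is doing exactly what the draft suggests the exception would not cover.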
The language here is complex, in the original Japanese I am assured, and as rendered into English. So don’t treat this as definitive until the report is issued and we have an agreed-upon translation. But it seems quite an interesting threading of the needle and I do hope we see an official translation. Meanwhile, I thought the discussion interesting enough to share even at draft stage.
But perceptions are hard to dislodge…
Late last year I attended a discussion organized by the Singapore Management University’s Centre for AI and Data Governance but funded by a Big Tech player, and the sponsors of our session set their view out very clearly in their introductory remarks: training generative models is legal in Singapore and Japan, because of their TDM exceptions.
Sorry guys… there are no bright lines in Singapore and Japan, and what lines there are argue against your case. There is no refuge here for copying works to train models that create works as output.
[1] This was based on a longer paper prepared for publication in Policy & Society: Simon Chesterman, “Good Models Borrow, Great Models Steal: Intellectual Property Rights and Generative AI” (October 11, 2023), NUS Law Working Paper No. 2023/025, available at http://dx.doi.org/10.2139/ssrn.4590006.
[2] David Tan, “Generative AI and Copyright: Part 2: Computational Data Analysis Exception and Fair Use”, 2023 SAL Prac 25, p. 7.