“Japan will not enforce copyright on data used for AI training”
This disinformation is human-authored…
David Ha, Head of Strategy at Stability AI, formerly of Google Brain, tweeted on June 1st that “Japan recently reaffirmed that it will not enforce copyrights on data used in AI training. The policy allows AI to use any data ‘regardless of whether it is for non-profit or commercial purposes, whether it is an act other than reproduction, or whether it is content obtained from illegal sites or otherwise.’”
A Minister saying the law would not be enforced? This rather surprising news then got picked up in several tech outlets, under headlines like “AI Art Wars: Japan Says AI Model Training Doesn’t Violate Copyright”. I was asked about this by members of Brazil’s Books Chamber on a Zoom call last night, so this story got around.
However, a little digging showed there is no real story here. There is no new policy, and the Minister’s re-affirmation was rather of the protections afforded to copyright holders by the provisions of Japan’s existing text-and-data-mining exception, dating from 2018.
Whether those protections are indeed robust enough, and how they will apply to the training of generative models, is a live question, and one apparently much discussed in Japan at the moment. The 2018 Japanese text and data mining exception is rather broad, and its unique formulation will need to be interpreted in light of generative AI. But the public sentiment is rather for more protection than less. The Minister was saying, in response to those sentiments, “our existing law gives us the tools to manage the question, we do not need a new law.”
The ‘news’ spread quickly from Ha’s Tweet, to his 228,400 followers (I am one). Ha’s source was a story titled “Japan Goes All In: Copyright Doesn’t Apply To AI Training” from a website called Technomancers.ai. This is an anonymous site “dedicated to providing information about large language models and other forms of AI/Machine Learning”. Why anonymous? “…if you work in tech, what if you say something that’s going to make the language model you just got assigned to work with hate you if it ever finds out your name?” Er, OK, so far so credible....
But what actually was being reported?
Technomancers cited a statement from Keiko Nagaoka, Japanese Minister of Education, Culture, Sports, Science, and Technology, saying she confirmed this “policy” in a “local meeting”. Following the links, one reads that the Minister was in fact answering a question from Parliamentarian Kii, in a Subcommittee session of the Diet Committee on Oversight of Accounts and Administration, dedicated to oversight of her Ministry. Friends in Tokyo provided me with the official transcription of the session, from the official Diet website, and a careful English translation to go with it.
It is clear that no new policy was announced; the Minister was merely repeating the provisions of the 2018 act. In fact, when asked whether a new policy or law was needed to protect the interests of copyright holders, the Minister said no, and pointed to the guardrails in the existing law, which protect the interests of creators. She mentioned two such guardrails, from Chapter 5, Subsection 2, Article 30-4 of the Japanese law. The first is that copying for text and data mining would only be allowed if “it is not [the copier’s] purpose to personally enjoy or cause another person to enjoy the thoughts or sentiments expressed in that work”. The second is that this permission does not apply “if the action would unreasonably prejudice the interests of the copyright owner in light of the nature or purpose of the work or the circumstances of its exploitation.”
(I am quoting the phrasing used in the official translation of Article 30-4, from the website of the Copyright Research and Information Centre of Japan, a public-interest corporation authorized by the government.)
Here is a translation of the Minister’s concluding statement, provided by Toshimichi Ishijima, of the Japan Academic Association for Copyright Clearance:
“In this way, the Copyright Law of Japan is stipulated in consideration of the interests of copyright holders, taking into account the actual state of use and the intentions of the parties concerned, including the rights holder. In any case, it is very important for the Ministry of Education, Culture, Sports, Science and Technology to advance research on the relationship with copyright, while balancing the protection and use of copyrighted works, based on the progress of new technologies such as AI.”
One of the interesting details in the exchange was around the question of whether “content obtained from illegal sites” is permitted to be used for information analysis. This question has been of particular interest for this newsletter, as we have identified how hundreds of thousands of pirated books have been used to train LLMs.
Minister Nagaoka first pointed out that there are various remedies to tackle illegal sites. “And one more thing, regarding content obtained from illegal sites, the situation and condition in which illegally uploaded copyrighted works can be used is a problem. Compensation claims, injunctive relief, and criminal penalties may apply…
“On the other hand, when using copyrighted works on the Internet for information analysis, it is practically difficult to confirm whether each of the copyrighted works collected in large quantities is legal. It is conceivable that the actual situation makes it difficult to analyze information using data. In addition, the act of using a work for information analysis is not intended for the enjoyment of the thoughts or feelings expressed in the work. It does not conflict with the original market for the use of copyrighted material, and does not prejudice the interests of copyright holders protected by copyright law. From this point of view, Article 30-4 of the Copyright Act does not require that the work be legal.”
This interpretation is a little shocking at first read. But a careful reading (of a careful translation) makes the conditionality clear. IF the analysis is not “intended for the enjoyment of the thoughts or feelings expressed in the work” AND the interests of copyright holders are not prejudiced, THEN the question of the legality of the copyrighted works is not germane. The strong conditionality is reinforced by the Minister’s concluding remarks, quoted above.
A reckoning will come eventually, as to whether generative AI is intended to allow the enjoyment of the thoughts or feelings expressed in the works copied, and whether generative AI prejudices the interests of copyright holders. Certainly there is a strong case that both things are true.
And for more on the question of whether creators’ interests are harmed by generative AI, read this sad and depressing story from the Washington Post: “ChatGPT took their jobs. Now they walk dogs and fix air conditioners.” What is sad and depressing is not the news that “those who write marketing and social media content are in the first wave of people being replaced with tools such as chatbots,” (I was recently quoted to the same effect in the FT’s Sifted). It is the treatment of the writer Olivia Lipkin at the tech company where she worked, as she started to be referred to as “Olivia/ChatGPT” in internal comms.
In an attempt to read something more light-hearted, I revisited Technomancers, to see what else they were “reporting”. I found a great story on a new UFO, sorry, unidentified anomalous phenomenon or UAP, seen in a NASA photograph: “Whatever is going on, it has massive implications for US national security.” It’s either aliens or the Russians apparently. So we have other things to worry about aside from AIs causing the end of humanity. And tipping points in climate change. Etc.