About these emails…
This is AI and Copyright, a series of email updates tracking the shifting legal picture around the copyright status of the Foundation Models, including the Large Language Models (LLMs) and their cousins, the image- (and video- and 3D-) generating models like DALL-E 2. Emails are sent weekly. I’m not sure how long this will be needed.
There is little doubt that the Foundation Models will create great value for the world, for their creators, and for the companies and entrepreneurs that are starting to use them. But the AI community now recognizes the downsides and problems of data-based predictive tools like machine learning, and these apply, often in different ways, to the Foundation Models. My focus is deliberately narrow. We are zooming in on a single question: how will the legal uncertainty about these models be resolved?
We just don’t know enough about the corpora these models were trained on. (Who are you, Books2?) But we do know that some nontrivial proportion of the text (and later images) they were trained on was copied without permission from its creators, in other words, without respecting copyright.
The people who did the copying have said, on the rare occasions when they say anything about it, that the doctrine of fair use as established in US jurisprudence makes this legitimate. In some cases, copying is said to be allowed under a text and data mining (TDM) research exception, even when that exception is for research only and the model so trained is used for commercial purposes. This is a new area of discussion, and I’m really looking forward to a more robust conversation on the matter.
For a statement of my current position on these matters, see my article, Good AIs Copy, Great AIs Steal.
OK, we’ll get into this in more detail as we go along. There is a lot of uncertainty. There is a long conversation to be had! This will be a platform for me to track the evolving conversation, and to develop and evolve my own views on the subject.