Breaking: The first LLM training data legal case?
A weekly email update on developments in the world of Foundation Models, with a specific focus on the question of how their legal uncertainty will be sorted.
It comes from the open source software community and challenges GitHub Copilot, the coding assistant from Microsoft-owned GitHub. Copilot is built on OpenAI's Codex, a descendant of GPT-3 that was trained on publicly available code, including open source code libraries. These code libraries, like many webpages or Smashwords novels, are free to access but still come with specific license terms if you want to use them.
Butterick et al. are less concerned with the loss of licensing revenue than with other elements of open source licenses that they say Copilot’s re-use violates, and they bundle this with the offering’s deleterious effect on the open source community.
In the words of the investigation's website, https://githubcopilotinvestigation.com/:
Meanwhile, we open-source authors have to watch as our work is stashed in a big code library in the sky called Copilot. The user feedback & contributions we were getting? Soon, all gone. Like Neo plugged into the Matrix, or a cow on a farm, Copilot wants to convert us into nothing more than producers of a resource to be extracted. (Well, until we can be disposed of entirely.)
And for what? Even the cows get food & shelter out of the deal. Copilot contributes nothing to our individual projects. And nothing to open source broadly.
Read all about the potential class action in this exquisitely designed website:
https://githubcopilotinvestigation.com/
The person who initiated the case is writer, coder, lawyer and typographer Matthew Butterick.
It hasn’t taken long for commentators to say that this could lead companies to move to jurisdictions with text and data mining (TDM) exceptions. That suggestion comes from Dr Andres Guadamuz, who recently appeared as an expert witness before the House of Lords. (This view doesn’t grapple with the commercial/non-commercial distinction in the UK exception.)
Music to the UK IPO’s ears? And even more for Singapore’s IPOS? Or will we look back on such moves as an opportunity lost for a considered discussion on how to build accountability and fairness into the operation of these important and impactful models?
If you have open source code on GitHub, you can consider joining the class action.