AI and Copyright

Breaking: The first LLM training data legal case?

A weekly email update on developments in the world of Foundation Models, with a specific focus on the question of how their legal uncertainty will be sorted.

Peter Schoppert
Oct 18, 2022

It comes from the open source software community and challenges GitHub Copilot, Microsoft's coding assistant, which is built on OpenAI's Codex, a descendant of GPT-3 trained on publicly available open source code. These code libraries, like many webpages or Smashwords novels, are free to access but still come with specific license terms if you want to use them.

Butterick et al. are less concerned with the loss of licensing revenue than with other elements of open source licenses that are violated by Copilot’s re-use, and they bundle this with the offering’s deleterious effect on the open source community.

https://githubcopilotinvestigation.com/

Meanwhile, we open-source authors have to watch as our work is stashed in a big code library in the sky called Copilot. The user feedback & contributions we were getting? Soon, all gone. Like Neo plugged into the Matrix, or a cow on a farm, Copilot wants to convert us into nothing more than producers of a resource to be extracted. (Well, until we can be disposed of entirely.)

And for what? Even the cows get food & shelter out of the deal. Copilot contributes nothing to our individual projects. And nothing to open source broadly.

Read all about the potential class action on this exquisitely designed website:

https://githubcopilotinvestigation.com/

The person who initiated the case is writer, coder, lawyer and typographer Matthew Butterick.

It hasn’t taken long for commentators to suggest that this could push companies towards jurisdictions with text and data mining (TDM) exceptions to copyright. This comes from Dr Andres Guadamuz, who recently appeared as an expert witness before the House of Lords. (His view doesn’t grapple with the commercial/non-commercial distinction in the UK exception.)

Andres Guadamuz @technollama
It's happening. The first US case on training data is being discussed against Microsoft and Github's Co-pilot. My guess? A lengthy litigation that could push companies towards jurisdictions with TDM exceptions.
githubcopilotinvestigation.com: GitHub Copilot investigation · Joseph Saveri Law Firm & Matthew Butterick
7:49 AM ∙ Oct 18, 2022
38 Likes ∙ 12 Retweets

Music to the UK IPO’s ears? And even more for Singapore’s IPOS? Or will we look back on such moves as an opportunity lost for a considered discussion on how to build accountability and fairness into the operation of these important and impactful models?

If you have open source code on GitHub, you can consider joining the class action.

© 2023 Peter Schoppert