The two cultures divide at the heart of AI's copyright problem
a parable about the ways different communities value their intellectual labour
Imagine that you spend five years working full time on your ultimate passion project. Your whole education and career have been leading up to this work; it demands from you an enormous commitment of time and energy. You put aside lucrative opportunities so you can pursue this singular goal. If you are not careful, this work can put at risk your relationships, even your mental health. You only make it through with grinding hard work, financial support from family members, moral support from the right mentors, and some plain good luck. After five years, the work is done, and you share it with the public. Maybe the work makes headlines and commands attention around the world, maybe it doesn’t, but in either case you can take pride in the personal achievement, you get professional recognition, and this effort becomes a platform for the rest of your career.
Then, as you come back from a well-deserved break, you learn that a clever technology company has created an AI model that can generate in seconds what took you five years of sweat and labour to bring forth. The model’s outputs can stand in for your own work. What’s more, the model could only achieve this result because it used your work as an input into its training. And by the way, nobody asked you for permission to train the model on your work.
If you are a creative writer or artist, and the project was your first novel, an animated short film or a portfolio of illustrations, you are likely to feel deeply wronged. With a few prompts into a large language model, it becomes possible to imitate your literary style and evoke the imaginary world that you invested so much of yourself in creating. If you have become a successful artist, anyone can pop your name into Stable Diffusion and get an image in what is recognizably your style (though probably a poor imitation).
But if you are a biologist and the project was your PhD thesis on the folding structure of a single protein, you are more likely to shrug, sigh, and start using that new tool as quickly as you can (AlphaFold from Google DeepMind in this case). After all, now you can predict the folding structures of proteins you never thought we would have a clue about. It will be something you use a great deal in your career researching a cure for Parkinson’s disease, say. Sure, you spent years doing painstaking work in a way that no one will ever have to do again, but you have your PhD, science marches on, and now you get to use this super-clever tool. In later years, you will have the great pleasure of complaining about how kids these days have it easy…
How we value intellectual and creative work
This is the two cultures problem at the heart of current debates on AI and copyright. The scientist sees her labour very differently than the creator does, at least after the fruits of that labour have been ingested into a machine learning model. One size may not fit all when society decides how to value accumulated human labour against the possibilities of AI tools. And the great scientific potential of one kind of machine learning model (supervised learning on carefully collected databases) need not justify the abuses of different kinds of AI models (LLMs trained via self-supervision on vast numbers of copyrighted works).
It is certainly true of both kinds of AI model that they exist only because of the human work that went before. The modern revolution in neural networks kicked off in 2012 when Geoff Hinton's team (including Ilya Sutskever, then a graduate student, now OpenAI's chief scientist) blew everybody out of the water in the annual ImageNet Large Scale Visual Recognition Challenge. But that effort depended on the ImageNet dataset, an accumulation of 12 million digital images labelled by hand by anonymous online workers.
Creators are often told that they should stop being Luddites, stop standing in the way of new technology, and adjust as painters did to the invention of photography. But the analogy is deeply flawed. The camera was not invented by stealing all the paintings from the Louvre and boiling them down into a solution that could then capture light on paper. Painters may have contributed to the development of photography, by innovating around pinhole cameras and other such techniques, but they did so as inventors themselves.
No doubt there are those in Singapore and elsewhere who believe that creative work is overvalued, frivolous and indulgent compared to the work of biology, physics or engineering. But surely we’ve reached the point where we realise the huge impact that creative work and imagination have on how we see the world and how we interact with each other, even if in Singapore most of the films we see and novels we read are written in other contexts, for other societies. Creative labour is important and the interests of creators need to be protected. Science will benefit from easier discoverability and access to the findings of other scientists, and the careful application of machine learning to high quality datasets. We need to find solutions that work across the two cultures.
Notes
This is a draft op-ed based on part of my keynote at the October IFRRO annual meeting in Reykjavik. I’m thinking of pitching it to Singapore’s broadsheet The Straits Times, after I was quoted in a recent story titled “Books3 dataset, used to train AI, contains works stolen from Singaporean authors”:
“We have to tell big tech companies, ‘You took our stuff without asking or paying us. Now you’re creating all this value and shrinking the market for creative work.’ It is fundamentally unfair and a huge transfer of value away from creators and towards tech companies instead.”
This is just one of a raft of statements from publishers and authors around the world over the last couple of months. But much as we need to repeat this verity, and keep using forceful terms, it may be useful to explain why we feel a bit differently than many other communities with a stake in AI do. As a university press publisher in the social sciences and humanities, my professional interests straddle both perspectives.
(edited five minutes after sending out to correct the typo in the headline…grrr…)