How will the legal uncertainty around training AI models be resolved? The drama only mounts, but the eventual solution always seemed to be one of two things: enforcement of existing laws, or governments deciding to sprinkle magic pixie dust over the actions of the AI model builders.
The pixie dust option is now looking more likely, with the election of Donald Trump and the arrival of an administration that seems to actually run on magical pixie dust! No one at this point needs reminding of Elon Musk's role here, or the unedifying spectacle of the tech luminaries in Washington around the inauguration. My favourite pic was Mark Zuckerberg and Brett Kavanaugh sipping champagne together (though to be fair Justice Kavanaugh looks a little embarrassed).

The pixie dust ploy is revealed in the OpenAI and Google policy submissions in response to the US administration's recently concluded public consultation on a new AI Executive Order. Both deal directly with copyright law, and it is really the first time these giants have admitted that their fair use defenses might not be working out after all. (Anthropic's submission is also interesting, but doesn't touch on copyright law at all. Twitter I suppose doesn't need to submit anything...)
OpenAI is asking the federal government to “work to prevent less innovative countries from imposing their legal regimes on American AI firms” and to “weigh in” on debates and “ongoing litigation where pro-innovation principles are at risk.” How would that work exactly, I wonder?
OpenAI (and Anthropic, in a different direction) are both leaning heavily into the national security argument. In OpenAI's case it is tied directly to copyright: since Chinese companies can't be expected to respect intellectual property, the reasoning goes, we had better not constrain our AI champions with the need to respect copyright either. It is a neatly circular approach. OpenAI was the first to take this path, using massive caches of pirated books to train its models, and was soon followed by Meta and others. Then, when Chinese companies do the same, the Americans claim that steps need to be taken because the Chinese have “unfettered access to data”.
Actually China, with its focus on data as a key factor of production and its talk of data markets and clearinghouses, showed every sign of developing a much more nuanced approach to managing data for AI training, including respect for copyright. But that may be out the window now, I don’t know. We do know that DeepSeek used the notorious pirate site Anna’s Archive for at least one of its models.
"Applying the fair use doctrine to AI is not only a matter of American competitiveness—it’s a matter of national security... If the PRC’s developers have unfettered access to data and American companies are left without fair use access, the race for AI is effectively over. America loses, as does the success of democratic AI." - The OpenAI submission
Using national security as an excuse for drastic and quick action now relies on some weak version of the singularity argument — that whoever gets “there” first is going to have a permanent technological edge as it zooms ahead ever faster. That things will be forever different once we get our “country of geniuses in a datacenter”, as Anthropic CEO Dario Amodei puts it. I find it hard to visualise how more potent large language models are going to “fundamentally transform our understanding of what is possible”... those geniuses are going to be a bit anti-social and disconnected, and diffusion of their abilities will surely follow all the painfully slow, contingent development paths of new technologies past. Sorry Ezra Klein, Kevin Roose and Casey Newton!
Case in point: three years after I first saw how well a GPT model could edit text, I still don't have a tool that can deploy this capability in my publishing company's workflow, to bring it to bear in a meaningfully productive way. None of the various options are there yet. And that includes me trying to vibecode an MS Word Add-in with Replit. But I'm getting close... and am likely to be getting close for a very long time...
The Google Version
Google of course didn't have to resort to pirate sites, since it had already copied millions of books under cover of the Google Books decision. It just had to decide it was going to use this material to train AI, a use emphatically not envisaged by Justice Leval. All behind closed doors, nothing to see here! But Google is obviously getting a bit nervous about being called out for this.
To make its copyright points, Google doesn't go so hard on the national security argument. It would seem to have more of a global cloud business to protect, and coming down hard on Tier Two countries, as suggested by OpenAI and Anthropic, would probably make life hard for it. But on copyright, Google seems to be arguing for a new US text-and-data mining exception, and — oddly — it says this is a good idea because TDM exceptions have worked well to date. I don't know of anyone who can really say that existing TDM exceptions have given any model trainer legal certainty and clarity... not even in Japan or Singapore with their broad exceptions. Read the paper and see if you agree. Key para below:
Balanced copyright rules, such as fair use and text-and-data mining exceptions, have been critical to enabling AI systems to learn from prior knowledge and publicly available data, unlocking scientific and social advances. These exceptions allow for the use of copyrighted, publicly available material for AI training without significantly impacting rightsholders...
- Response to the National Science Foundation’s and Office of Science & Technology Policy’s Request for Information on the Development of an Artificial Intelligence (AI) Action Plan
(Thanks Google, you've always been looking out for rightsholders...)
Exceptions allow for the use of copyrighted material in LLM training, except that they don’t, really, certainly not unambiguously. See for example the recently published argument from King’s Counsel Nicholas Caddick on how the UK's proposed opt-out exception runs afoul of Berne Convention obligations. And of course the Japanese exception includes the Berne Convention language in the text of the statute itself! Exceptions to copyright cannot harm the interests of those whose works have been copied under the exception…
But please don’t make this point in the US.
"Doesn’t this go against our treaty obligations?"
"Then it must be a good thing!"
Everything is better with a bit of pixie dust!