We’re all skewed
Alignment, responsibility, and “core socialist values”
Of all the vectors of regulation the AI companies have to solve for (copyright, privacy, bias, unfair competition, truth in advertising, concentration of power), the one that is least appreciated is liability. Who is responsible for the kooky things the AIs output?
Who is responsible when “ChatGPT accuses [a] law professor of sexual harassment” or accuses an Australian mayor of bribery? Both falsely?
I have to admit that when I first started to use GPT-3, then still in private beta, I couldn’t imagine a situation where anyone would want to “publish” its output. The models were amazing, but clearly were only suited to be used as tools, a sort of assistant or helper to a writer or publisher.
I couldn’t understand why copyright experts were so obsessed with “the output question”, seeking to understand the copyright status of works created by generative AI. For one thing, the large language models didn’t create anything; they responded to prompts. They were auto-completing text given to them by users. This was far from creation ex nihilo; it was “co-creation” at best. GPT output would always be attached to a person’s input, and more to the point, would mostly then be used again by people as an input back into their own work. I couldn’t imagine why anyone would want to present anything as “created by” the AI, unless you were Stephen Thaler playing philosophical games with the law.
I read Franklin Graves’ prescient June 2022 blogpost on IPWatchdog, “Thaler Pursues Copyright Challenge Over Denial of AI-Generated Work Registration”, which introduced the idea of the creation-generation spectrum, where AI generation would be linked to some creation, some prompt, every work a combination of both, in different proportions. (I didn’t anticipate that the US Copyright Office would opine that the creation work in Graves’ model was mere “sweat of the brow”, and not protected. I still think that’s wrong, as do most people who have explored these tools at any length.)
I most definitely did not see the conversational search moment coming, that Microsoft would bet millions on the idea that LLMs could be used as question-answering machines. I remembered reading that pretrained LLMs scored worse than a coin toss on truthful question-answering tasks. (See Wei, Jason, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, et al. “Emergent Abilities of Large Language Models.” arXiv, June 15, 2022, p. 3.) Why would anyone try to force models to do something so far from their nature? Well, the answer became clear: “because you have nothing to lose in your competition against Google in the websearch space.” See the 12th Feb issue of this newsletter.
So now the output of LLMs is being presented as a product, a statement from the oracle, facts and knowledge coming from somewhere, whether decompressed from the training data, or looked up via a plug-in to the internet, or somehow “understood” by the model. The LLM wasn’t auto-completing, it was answering a question… (With an answer that is often wrong, but never mind…) In this situation, the AI companies are starting to reckon with the need to take responsibility for the output of their models.
The companies are all madly model-patching, plugging in and slathering layers of Reinforcement Learning from Human Feedback onto their models, to try to prevent them from saying reprehensible, wrong things. Will all that model-patching work out? Will the models become completely reliable, and aligned to human goals (as defined by…?), so that the AI companies can drop their health warnings? I just don’t see it, but I guess it’s possible.
Certainly it is greater skill with all that post-pre-training work that makes OpenAI a big success compared to, say, Facebook. OpenAI claims it used 20,000 hours of human feedback to train InstructGPT, the public face of GPT-3. And see the recent FT article on OpenAI’s red team efforts, “OpenAI’s red team: the experts hired to ‘break’ ChatGPT.” It may even be that it is the patching, not the pre-training, that is the big difference between GPT-3 and -4. Sam Altman claims that progress in alignment is leading to progress in the capabilities of the models, that they are linked. We don’t know because, repeat after me, OpenAI has not released any details about the training of GPT-4.
I guess I notice this because taking responsibility is a core concern for publishers. It’s one of the least appreciated value-adds of publishers working with their authors.
Publishers join with authors to take responsibility for what they've published. We put our addresses on the copyright page so you know where to sue us, or in some countries, where to come to arrest us.
But I’m not sure we’re headed to a convergence, where the AI companies will make their models super robust, and then will be able to stand by them, like a proud publisher with their precocious young author at the book launch.
There’s a risk that something else will happen.
AI fans are fond of pointing out that people who object to what models say are really objecting to the fact that models “say things I disagree with”. True in some cases. Andrew Torba dislikes all the models currently on the market, because “every single one is skewed with a liberal/globalist/talmudic/satanic worldview.” Others rush to publish analyses of the “political biases” of the different models, as captured in standard tests of (human) political orientation. (See David Rozado’s Substack, “The Political Bias of ChatGPT - Extended Analysis”.)
In his Lex Fridman interview, at about minute 25:30, Altman reckons with the fact that his models will never please everyone. People all have such different values! So the plan is to “make it easy for users to change the behavior of the AI they’re using”. We can all have different “RLHF-tunes” in Altman’s jargon. Two months earlier, OpenAI published a long blogpost on how to architect that choice, titled “How should AI systems behave, and who should decide?”
Here’s the vision.
OpenAI wants someone else (“the public”) to decide on the broad bounds of what models should be able to say, and it is betting it can train its model accordingly. Then it will enable production of an infinitude of different model versions. Does this mean OpenAI will never be responsible for what its models say, that the customizer has to take the rap? Is OpenAI going to claim a sort of LLM safe harbour? So many questions. Fascinating as this vision is, and hats off for putting it out there, when I squint my eyes it looks a little like a massive dodge to avoid responsibility for the output of the models.
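Altman’s per-user “RLHF-tunes” don’t exist as a product yet, but the crude version of this customization already ships in OpenAI’s chat API: a “system” message prepended to the conversation that sets the model’s behavior. Here’s a minimal sketch of what user-selectable behavior profiles could look like at the API layer; the `PERSONAS` dict and profile names are hypothetical, my invention for illustration, while the message format (a list of `{"role", "content"}` dicts) is OpenAI’s real chat schema:

```python
# Sketch: per-user behavior customization via a system message.
# The persona names and instructions below are hypothetical examples,
# not anything OpenAI ships; only the message schema is real.

PERSONAS = {
    "neutral": "Answer factually and decline to take political positions.",
    "candid": "Answer directly, flagging uncertainty where it exists.",
}

def build_messages(persona: str, user_prompt: str) -> list:
    """Prepend the user's chosen behavior profile as a system message."""
    if persona not in PERSONAS:
        raise ValueError("unknown persona: %r" % persona)
    return [
        {"role": "system", "content": PERSONAS[persona]},
        {"role": "user", "content": user_prompt},
    ]

# The resulting list would then be passed as the `messages` argument
# to a chat completion call against a model like GPT-4.
```

The open question in OpenAI’s blogpost is exactly who writes the contents of that dict: OpenAI sets the hard outer bounds, and “the public” (or each deploying customer) fills in the rest.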
I hope I’m wrong. But I do know that OpenAI’s approach won’t work in China, as the Cyberspace Administration of China’s recently released commentary on last year’s Provisions on the Administration of Deep Synthesis Internet Information Services makes clear:
“Content generated by generative artificial intelligence should embody core socialist values and must not contain any content that subverts state power, advocates the overthrow of the socialist system, incites splitting the country or undermines national unity,” (Bill Bishop's translation)
You can have any colour you want, as long as it’s red.