AI学習禁止! Have you been Dreamboothed yet?

and other updates from November 2, 2022

Nov 02, 2022

Manga fans protest! AI学習禁止! (“No AI training!”)

The excellent Rest of World reported on the manga fan backlash against AI art generated in the style of well-known manga artists.

Generative AI might have been dubbed Silicon Valley’s “new craze,” but beyond the Valley, hostility and skepticism are already ramping up among an unexpected user base: anime and manga artists. In recent weeks, a series of controversies over AI-generated art — mainly in Japan, but also in South Korea — have prompted industry figures and fans to denounce the technology, along with the artists that use it.

The article quotes Tokyo-based lawyer Kazuyasu Shiraishi as saying that Japan’s 2018 copyright law allows copying for machine learning under an exception. Well, he’s a lawyer and I’m not, but my own understanding of the relevant Article 30-4 of the Japanese law is that it takes into account the uses of the models so trained. Specifically, such copying would only be allowed if “it is not [the copier’s] purpose to personally enjoy or cause another person to enjoy the thoughts or sentiments expressed in that work”. And separately, “this does not apply if the action would unreasonably prejudice the interests of the copyright owner in light of the nature or purpose of the work or the circumstances of its exploitation.” [1]

If the purpose of the copying is to allow the creation of more works in the style of the creator copied, and in a way which prejudices the interests of the copyright owner, the exception would not seem to shield the copying of training data.

The Twitter storms around this topic are impressive.

Another must-read from Andy Baio: “How one unwilling illustrator found herself turned into an AI model”

This covers the latest trend, also referenced in the article above, of fine-tuning image generators to create works in the style of a single illustrator or, alternatively, to create works with a single subject in many styles. The extra training requires as few as a couple of dozen additional images. As Baio describes it, using a technique called DreamBooth, in as little as 20 minutes and for about $0.40 of extra “compute”, he was able to train a model to spit out self-portraits in different styles.

Frankly, it was shocking how little effort it took, how cheap it was, and how immediately fun the results were to play with. Unsurprisingly, a bunch of startups have popped up to make it even easier to DreamBooth yourself, including Astria, Avatar AI, and ProfilePicture.ai.

But the bulk of the article is his interview with Hollie Mengert, a well-known professional 2D artist, and a Redditor named MysteryInc152, who created a DreamBooth model to imitate her style. Baio’s reporting of the thinking of MysteryInc152 reveals where the discourse has gone.

His take was very practical: he thinks it’s legal to train and use, likely to be determined fair use in court, and you can’t copyright a style. Even though you can recreate subjects and styles with high fidelity, the original images themselves aren’t stored in the Stable Diffusion model, with over 100 terabytes of images used to create a tiny 4 GB model. He also thinks it’s inevitable: Adobe is adding generative AI tools to Photoshop, Microsoft is adding an image generator to their design suite. “The technology is here, like we’ve seen countless times throughout history.”

But even our AIbro does recognize that there is something else at work when an AI is trained to reproduce work in the style of a particular living artist who might not be keen on the result.

Read the piece!

More fair use counters from Neil Turkewitz

On Twitter I came across the account of Neil Turkewitz, a copyright lawyer who has been critiquing the “copying to train models is fair use” argument for some time. He’s written a number of very useful posts on the topic dating back at least to 2019, and he’s engaging on Twitter with journalists and others on the topic. I’ll discuss one of his posts briefly here:

AI, Copyright & Fair Use: Avoiding the Artificial in Intelligence & Maintaining our Humanity, Feb 2, 2020

Here he critiques a Jan 2020 submission from the Business Software Alliance which argues that “creating a database of lawfully accessed works for use as training data for machine learning will almost always be considered non-infringing in circumstances where the output of that process does not compete with the works used to train the AI system.”

He makes a very important point about the flawed logic of the “copying is how computers read” argument, which seems to be inspiring the folks who have argued for text and data mining (TDM) exceptions.

BSA likens machine “learning” to how a human might ingest a book, combing through the protected expression while retaining the unprotected ideas. But while a human might very well operate in that manner, it’s a terrible stand-in for the operation of machines which by their very nature “learn” through reproduction, with such reproductions forming the basis of any new output. Those reproductions of expression, however temporary, are the raw materials used for the development of new forms of expression. In other words, AI isn’t just inspired by the works it ingests — it owes its very existence to them.

And anyway, again, to add my two cents, the large language models are different. As statistical models of text, they are all about style: frequency of particular words, proximity of words, sentence structure. The LLMs are not retaining the unprotected ideas; they famously cannot be trusted on ideas. They are retaining the protected expression.
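To make the “statistical model of text” point a little more concrete, here is a toy sketch in Python. It is nothing like a real LLM, which uses a neural network rather than raw counts, and the function names and the tiny corpus are made up for illustration. It only shows the basic idea that such a model is built from which words follow which other words in the training text.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, how often each other word immediately follows it."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def most_likely_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical training text, purely for illustration.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram_model(corpus)
print(most_likely_next(model, "the"))
```

Everything this toy model “knows” is word order and word frequency taken from its training text; it has no separate store of ideas to draw on.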

And, also, BTW, the LLMs and image generators are now creating outputs that compete with the works used to train the AI system. So it would appear that even the Business Software Alliance would not recognize copying to train LLMs and image generators as fair use.

I recommend Neil’s Medium page. There’s lots of great stuff for copyright nerds, from controlled digital lending to South Africa’s copyright law changes, plus more analysis of the copyright status of training data.

Tweets, new products, etc

hardmaru @hardmaru

The Simpsons in the style of Anime ｘ Death Note:

I don’t *think* this was created by a style transfer AI process, but who knows! Anyway, this is one of the directions that AI companies like Stability.ai will be exploring with media asset owners, to repurpose existing content in new styles. What’s your generative content strategy?

Shubhro Saha @shubroski

This weekend I built =GPT3(), a way to run GPT-3 prompts in Google Sheets. It's incredible how tasks that are hard or impossible to do w/ regular formulas become trivial. For example: sanitize data, write thank you cards, summarize product reviews, categorize feedback...

The AI-written thank you notes are priceless! And all in a Google Sheet. This is moving so fast…

[1] The English translation of the Japanese law is from the website of the Copyright Research and Information Centre of Japan, a public-interest corporation authorized by the government.
