Now that computer-generated imagery is accessible to anyone with a weird idea and an internet connection, the creation of “AI art” is raising questions—and lawsuits. The key questions seem to be 1) how does it actually work, 2) what work can it replace, and 3) how can the labor of artists be respected through this change?
The lawsuits over AI turn, in large part, on copyright. These copyright issues are so complex that we’ve devoted a whole, separate post to them. Here, we focus on thornier non-legal issues.
How Do AI Art Generators Work?
There are two parts to the life of an AI art generator. First is the training data that teaches it what a "dog" is or, more abstractly, what "anger" looks like. Second are the outputs the machine produces in response to prompts. Early on, when the generator has not had enough training, those outputs only loosely reflect the prompts. But eventually, the generator will have seen enough images to figure out how to properly respond to a prompt (which, roughly speaking, is how people learn, too). AI-generated creative content can run the gamut from "prompt based on an image I saw in a fever dream" to "very poorly written blog post."
How Does an AI Art Generator “Learn”?
AI art generators depend on “machine learning.” In a machine learning process, a training algorithm takes in an enormous set of data and analyzes the relationships between its different aspects. An AI art generator is trained on images and on the text that describes those images.
Once it has analyzed the relationships between the words and features of the image data, the generator can use this set of associations to produce new images. This is how it is able to take text input—a “prompt”—like “dog” and generate (that is, “output”) arrangements of pixels that it associates with the word, based on its training data.
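To make the idea of “associations” concrete, here is a deliberately oversimplified toy sketch in Python. It is illustration only, not how a real generator works: actual systems learn numeric representations with neural networks rather than word lists, and every name below (the train and generate functions, the example captions and features) is invented for this example.

```python
# Toy sketch: tally which visual "features" co-occur with which caption words,
# then "generate" by sampling features associated with the words in a prompt.
# Real generators learn numeric representations, not lists of strings.
from collections import defaultdict
import random

def train(pairs):
    """pairs: list of (caption, features) tuples, where features stands in
    for visual patterns a real model would extract from the image itself."""
    associations = defaultdict(list)
    for caption, features in pairs:
        for word in caption.lower().split():
            associations[word].extend(features)
    return associations

def generate(associations, prompt, n=3):
    """Pick features the toy model has come to associate with the prompt."""
    pool = []
    for word in prompt.lower().split():
        pool.extend(associations.get(word, []))
    return random.sample(pool, min(n, len(pool)))

training_data = [
    ("a brown dog on grass", ["fur texture", "floppy ears", "green background"]),
    ("a dog catching a ball", ["fur texture", "open mouth", "red sphere"]),
    ("a horse in a field", ["long mane", "four legs", "green background"]),
]

model = train(training_data)
print(generate(model, "dog"))  # e.g. ['open mouth', 'fur texture', 'floppy ears']
```

The only point of the toy is that “learning,” in this context, means accumulating statistical associations between text and image features, which is why the quality of those pairings matters so much.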
The nature of these “outputs” depends on the system’s training data, its training model, and the choices its human creators make.
For instance: a model trained on images labeled only with whatever text happened to appear near them on public web pages will not be as good at matching “prompts” as one trained on images that were manually annotated with explicit, human-written labels.
This process is not too different from how babies learn. For example, a lot of kids basically think all animals are "doggies" until they have enough exposure and correction from adults to distinguish "doggie" from "horsie." Machine learning can make similar mistakes, finding connections that, to humans, are obscure. For example, a cancer classifier can “learn” that an image shows a tumor whenever that image contains a ruler. The AI has learned a shortcut: in its training data, images of growths that a radiologist had identified as cancerous tumors included rulers, placed for scale and to track size, while the images of benign growths came from a different set that didn’t have rulers.
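Here is a tiny illustration of that shortcut effect, using invented data rather than any real medical classifier: when a spurious feature (the ruler) perfectly tracks the label in the training set, a naive learner latches onto it instead of the medically relevant signal.

```python
# Toy illustration of "shortcut learning" on invented data.
def train(examples):
    # Count how often each (feature, value) pair appears with each label.
    table = {}
    for features, label in examples:
        for feat, value in features.items():
            table.setdefault((feat, value), {}).setdefault(label, 0)
            table[(feat, value)][label] += 1
    return table

training_set = [
    ({"ruler": True,  "irregular_shape": True},  "tumor"),
    ({"ruler": True,  "irregular_shape": False}, "tumor"),
    ({"ruler": False, "irregular_shape": True},  "benign"),
    ({"ruler": False, "irregular_shape": False}, "benign"),
]

model = train(training_set)
# The ruler is a perfect predictor on this data, so the learner "concludes"
# that rulers mean tumors, while the medically relevant feature looks useless.
print(model[("ruler", True)])            # {'tumor': 2}
print(model[("irregular_shape", True)])  # {'tumor': 1, 'benign': 1}
```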
Beyond the effect of training data quality, there is also the effect of different training “models.” These models have names like “diffusion” or “generative adversarial networks” (GANs). Each of these models has different strengths and weaknesses (as of this writing, diffusion models are generally considered the state of the art).
During training, programmers introduce variables that determine how closely the model’s output resembles the images in its training data. Other variables determine whether the system prioritizes producing close matches for the prompt or being more experimental: outputting images for which the model has less “confidence” (a mathematical term describing a kind of statistical certainty) as a match for the user’s prompt. Some models allow users to adjust such variables when they issue prompts.
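For readers who want to see what one of those user-adjustable variables looks like in practice, here is a minimal sketch using the open-source diffusers library with a publicly released Stable Diffusion checkpoint. We are assuming the library is installed and a GPU is available, and the checkpoint name is just an example. The guidance_scale parameter is the kind of knob described above: higher values push the output to match the prompt more closely, lower values let the model produce looser, less “confident” matches.

```python
# Minimal sketch, assuming the `diffusers` and `torch` packages are installed
# and a CUDA GPU is available. The checkpoint name is an example only.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a dog on a red carpet"

# Higher guidance_scale: stick closely to the prompt.
faithful = pipe(prompt, guidance_scale=12.0).images[0]

# Lower guidance_scale: let the model wander into less "confident" territory.
loose = pipe(prompt, guidance_scale=3.0).images[0]

faithful.save("faithful.png")
loose.save("loose.png")
```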
Where Does the Training Data Come From?
In general, the training data comes from scraping the web: finding available images that have text associated with them (in some cases, annotations are added afterwards). This means the creators of the images, or the people depicted in them, likely do not know about or specifically consent to being included in the analysis. For the “Stable Diffusion” system that is the subject of two recent lawsuits—a class action complaint on behalf of several visual artists and another filed by Getty Images—the dataset is five billion image-text pairs indexed by a nonprofit called LAION.
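To make “scraping” concrete, here is an illustrative sketch of the kind of record such a pipeline collects: an image URL paired with whatever text sat near it on the page (often the HTML alt attribute). The field names and URLs below are invented for illustration; they are not LAION’s actual schema.

```python
# Invented example records of the image/caption pairs a web scraper collects.
# Note that nothing in this step involves contacting the image's creator.
scraped_records = [
    {"image_url": "https://example.com/photos/dog.jpg",
     "caption": "my dog asleep on the porch",
     "source_page": "https://example.com/blog/summer"},
    {"image_url": "https://example.com/photos/red-carpet.jpg",
     "caption": "actress arrives at the premiere",
     "source_page": "https://example.com/news/premiere"},
]

for record in scraped_records:
    print(record["caption"], "->", record["image_url"])
```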
For an analysis of the copyright concerns related to those training sets, see our other blog post.
Work Replacement and AI
Many artists are concerned that the availability of AI art will mean less of a market for their work. That’s a valid concern: there are some services provided by artists that could likely be replaced by AI generators. This happened previously, with transcription: machine learning systems replaced some human transcription. However, these automated systems produce output that is of generally low quality, as anyone who has seen auto-generated closed captions can attest.
In fact, the issues that come with automating labor go back centuries: automated replacements that can be owned outright by employers or are simply cheaper than paying a worker can result in fewer people with jobs. In a perfect world, automation would be used to free people to pursue matters they care about, but that’s not the world we live in (yet), so it’s natural and valid for workers to worry about automation driving down wages or pushing them out of their industry altogether.
The debate over AI art isn’t limited to general concerns about automation and the lack of support for people automated out of a job; it’s also about whether AI art generation is especially unfair because much of its training data consists of copyrighted images used without permission. We discuss this in the other post.
Beyond labor market and fairness concerns, there’s a real risk that AI art will give a few corporations even more control over future creativity. Most access to art is already controlled by a few major gatekeepers, which have no interest in the livelihood of artists and no appetite for risk.
For example, Getty Images, a plaintiff in one of the lawsuits against AI art generation, has cornered the market on stock and event images. Most news organizations use Getty because it’s a near-certainty that Getty will have an image of the subject of a given article standing on a red carpet. Without Getty, media companies would have to either send a photographer to every event or figure out which freelance photographers were present and try to license their images. As a monopoly, Getty both undercuts independent photographers and gouges news organizations.
In its lawsuit, Getty cites an AI-generated image that produced a distorted version of its watermark. Getty claims that this is proof that its copyrighted materials are found in the output of an art generator, but what’s really happening is that the image generator has “learned” that any image of a red carpet contains a Getty watermark, so it draws the watermark into images that seem “Getty-like.” In other words, Getty has such a lock on a certain kind of news photography that a statistical analysis of all newsworthy photos of celebrities will conclude that Getty is inseparable from that kind of photography. A Getty watermark is to a celebrity image as a ruler is to a tumor.
Letting Corporations Control AI Will Flatten Our Creative World
At the moment, there are freely available, open-source models for AI art generators, and anyone can tweak them and build on them in innovative ways. But if the legal environment or the technology changed so that only a few large companies could make or use AI art models, it would make our creative world even more homogenous and sanitized.
For example, large commercial deployments of diffusion models already refuse queries that might lead to nude images, which of course are not intrinsically harmful, illegal, or immoral, and have a long history in artistic expression. Heavy-handed restrictions on “adult” subject matter are especially hard on people whose identities are wrongly labeled obscene, intrinsically sexual, or “adult only” (including queer people), erasing them from the world generated by these tools.
AI art generators’ bias needn’t be the result of explicit, active censorship; it can also come from bias in their training data. For example, an AI art tool may generate images of white people as the default, reinforcing racial inequality, or tend towards lighter skin in response to requests for “beautiful” people. Images of women are more likely to be coded as sexual in nature than images of men in similar states of dress and activity, because of the widespread cultural objectification of women in both images and their accompanying text. An AI art generator can “learn” to embody injustice and the biases of the era and culture that produced its training data. AI art generators sometimes produce surprising novelty, but they predominantly favor past values and aesthetics. Models tend to recreate what they see over and over, making their output tend toward the average and typical, at the expense of minority aesthetics and identity.
Another thing to watch for: AI art generators may depend on and reveal private information. Imagine asking an AI art generator to generate images related to a medical condition and seeing a recognizable person in the output (this could happen if the model wasn’t trained on many images related to that condition).
Finally, as has been the case with “deep fakes,” it is possible to use machine learning to generate deceptive images depicting real people doing things they never did. Those images can shame or defame the person or otherwise harm their social and economic lives.
However, such images can also be used for important social commentary, or simply as art, when they are not passed off as true occurrences. We understand when we see an image of a politician lighting the Constitution on fire that they did not literally burn the document—rather, that the creator of that image is commenting on the politician’s policies.
This is a situation where each use should be evaluated on its own merits, rather than banning a technology that has both positive and negative uses. As with photomanipulation, it’s important that we learn how to determine what is real. For example, norms around parody photomanipulation call for exaggerating the edited feel, both as part of the parody and to make the joke clear.
What the World Looks Like if AI Creators Need Permission from Rightsholders
See our other blog post for our thoughts on copyright and why we don’t think AI art generators are likely to infringe. For purposes of this discussion, however, imagine that you can’t train an AI model on copyrighted material without permission.
Requiring a person using an AI generator to get a license from everyone who has rights in an image in the training data set is unlikely to eliminate this kind of technology. Rather, it will have the perverse effect of limiting this technology development to the very largest companies, who can assemble a data set by compelling their workers to assign the “training right” as a condition of employment or content creation.
This would be a Pyrrhic victory for opponents of the very idea of AI art: in the short term, AI tools would cease to exist or would produce lower-quality outputs, reducing the potential to drive down creators’ wages.
But in the medium to long term, the effect is likely to be just the opposite. Creative labor markets are intensely concentrated: a small number of companies—including Getty—commission millions of works every year from working creators. These companies already enjoy tremendous bargaining power, which means they can subject artists to standard, non-negotiable terms that give the firms too much control, for too little compensation.
If the right to train a model is contingent on a copyright holder’s permission, then these very large firms could simply amend their boilerplate contracts to require creators to sign away their model-training rights as a condition of doing business. That’s what game companies that employ legions of voice-actors are doing, requiring voice actors to begin every session by recording themselves waiving any right to control whether a model can be trained from their voices.
If large firms like Getty win the right to control model training, they could simply acquire the training rights from any creative worker hoping to do business with them. And since Getty’s largest single expense is the fees it pays to creative workers—fees that it wouldn’t owe if it could use a model to substitute for its workers’ images—it has a powerful incentive to produce a high-quality model to replace those workers.
This would result in the worst of all worlds: the companies that today have cornered the market for creative labor could use AI models to replace their workers, while the individuals who rarely—or never—have cause to commission a creative work would be barred from using AI tools to express themselves.
This would let the handful of firms that pay creative workers for illustration—like the duopoly that controls nearly all comic book creation, or the monopoly that controls the majority of role-playing games—require illustrators to sign away their model-training rights, and replace their paid illustrators with models. Giant corporations wouldn’t have to pay creators—and the GM at your weekly gaming session couldn’t use an AI model to make a visual aid for a key encounter, nor could a kid make their own comic book using text prompts.
Approaches to AI That Respect Artists
The Writers Guild of America West is in the middle of renegotiating its minimum basic agreement. That agreement creates the floor for how to credit and pay writers across a number of creative industries, including film and television. The Guild’s AI proposal has some technical problems that reflect an incomplete understanding of how the technology works, but from a labor perspective it grasps the central concern at issue very well.
The Guild’s core proposal is this: AI-generated material can't replace a human writer. AI-generated material cannot qualify as source material for adaptation in any way. AI-generated work can be used as research material, just as a Wikipedia article could, but because of the unclear nature of the sources that go into its output and how the output is generated, it has no place as an "author" in the world of copyright. AI outputs are not, in the Guild's opinion, copyrightable.
That means that if a studio wants to use an AI-generated script, there can be no credited author and no copyright. In a world where studios jealously guard the rights to their work, that's a major poison pill. Under this proposal, studios must choose between the upfront cost of paying a writer what they are worth, and the backend cost of not having control over the copyright to the product.
This is a smart strategy that zeroes in on the Guild’s domain: protecting its members. That said, the Guild’s conception of the technology is a little off: it claims that AI creates a mosaic from its training data. That is less true than the Guild suggests, and AI output doesn’t infringe as often as the Guild implies. But despite these technical misapprehensions, the Guild’s way of thinking about AI as a tool is very smart (again, here is our analysis of the copyright status of AI art).
For the Guild, AI-generated writing has no place in Guild-covered works. If a studio makes something covered by the agreement, it has to hire a human writer and pay the Guild’s negotiated rate (or more). AI material cannot be used to undercut that labor. AI is a tool to help writers, not a replacement for writers.
That is the way all technology should be seen in relation to artistic work: as an artistic tool, not as a replacement for artists. A broad ban on AI will not fix the inequities of a highly concentrated market, but it could cost us the exciting uses of this technology for creative expression.
Exciting Things About AI Art Generation
Any development that gives more people the ability to express themselves in a new way is an exciting one. For every image that displaces a potential low-dollar commission for a working artist, there are countless more that don’t displace anyone’s living—images created by people expressing themselves or adding art to projects that would simply not have been illustrated. Remember: the major impact of automated translation technology wasn’t displacing translators—it was the creation of free and simple ways to read tweets and webpages in other languages when a person would otherwise just not know what was being said.
When people use AI tools, the result is a different kind of “creativity” than human artists produce on their own, as the tool finds associations and imagery that people working unassisted hadn’t made before. AI art generators can also help working artists in several ways: for example, by producing a rough first pass or by automating time-consuming tasks like shading a flat image. This would be the art equivalent of the research material argument made by the WGA.
There is a lot to like about art generators. The problem going forward is keeping the good things—open-source technology that researchers can audit, and tools that cut down on the tedious parts of making things—without letting these concerns hand even more power to the same companies that disempower artists every day.