Submitted as an undergraduate thesis for The George Washington University Honors Program
Advisor: Dr. Kerric Harvey
Abstract/TLDR
As generative AI advances, creative professions are coming under threat. Existing generative AI “solutions” are not solutions for creatives. They automate creative decision-making and therefore diminish authorship instead of supporting it. I develop a new solution, adversarial creativity, that uses AI to amplify authorship by comparing what AI generates against what the author creates at the same point in a work. The author then revises wherever they created something too similar to one of the AI generations. In the final work, every creative detail remains the author’s own. I test adversarial creativity by writing a short fiction piece with it. I find that these similarities (“matches”) occur several times per round of AI generations, even though AI generations have shortcomings that my writing does not. When I revise these matches, my writing improves—indicating that adversarial creativity could be a method of keeping human writing better than AI writing, no matter how good AI gets at writing in the future.
Introduction
Or shall the Adversary thus obtain His end, and frustrate thine?
—John Milton
In the 5,000 years of the history of the document, there have been only three fundamental revolutions. This is because making a document requires only three fundamental components: thought, materials, and labor. It must have its writing thought of, must have materials to be written onto and with, and must be labored over to inscribe the writing. All three of these components have limited supplies, so each limits the rate at which documents can be made.
Each of the three technologies that have revolutionized the document has done so by trivializing (though not eliminating) demand for one of the three fundamental components, making their supply limits irrelevant. The printing press, invented in the 15th century,1 trivializes labor: scribes are no longer needed, only a few typesetters are. The computer network, invented in the 20th century, trivializes materials: pages and ink are no longer needed, only a few screens are. And the large language model (LLM), invented a few years ago,2 trivializes thought: writers are no longer needed, only a few prompt engineers are.
These revolutions have also, with somewhat differing technologies and timelines, played out for the other artifacts of expression: pictures, motion pictures, and audio recordings have all become printable, then uploadable, and then generatable with artificial intelligence.
Society has accepted the printing revolution and the internet revolution because they increase access to content. Online content could not be distributed anywhere near as widely if every copy needed to be on paper and inked by a scribe. Will society accept the generative AI revolution on the same basis—that it increases access to content, that it produces content faster and cheaper and more targeted to the consumer?
Many readers currently believe that LLM-generated text is not as accurate or high-quality as what a capable human can write. But LLM capability is growing continually, at the rate of a new generation of models about every four months.3 If AI becomes better at all writing tasks than writers one day, what happens to writers?
I do not worry about the extinction of writing any more than the extinction of humans. I believe that humans will always tell stories, even if only to their family and friends.4 What I worry about is a budding writer’s possibility of making a living, reaching a mass audience, and achieving an immortal memory. I worry about the end of writing as a profession and a path to influence. I also worry about the consequences of a world without an exchange of ideas on a large scale, which is what maintains culture on a large scale. AI, by outcompeting and then taking over the mass communication that people desire, could shepherd every family and friend group into their own isolated cultures. In a sense, it already is: machine learning is behind recommendation algorithms that produce positive feedback loops and send people into ever more niche echo chambers, political and otherwise, that can be incomprehensible to outsiders.
This essay tests a novel solution to the challenge of writing in a world with competitive AI. The solution involves AI, but does not normalize AI as an acceptable substitute for human creativity (unlike current commercial AI “solutions” for creatives)—in fact, it forces more human creativity. It is not a shortcut or a productivity hack. Implementing it in your creative process will slow you down, not speed you up. But when I tested it, I was pleased with my results and felt that I had grown as a writer. If you just want to see what it is, you can jump to the section The solution. Otherwise, please read on for why a solution is necessary and what a solution must look like.
Rationalizing the AI age
Writers worried about AI eliminating their livelihoods don’t seem to have a long-term solution. Thinkpieces I’ve read about AI and the working writer, even two years into AI’s explosion into the mainstream, tend to conclude without any call to action beyond “wait and see” or “encourage people to not support AI”—which helps, but is inadequate. Writers on social media are overwhelmingly negative about AI, but this negativity does not seem productively channeled to me because it doesn’t seem focused on knowing thy enemy. Many writers are not learning how AI actually works, either as a technology or a culture. Many writers are not learning about Hugging Face and Chatbot Arena and VRAM and RLHF and GGUF and foom. Many writers are not learning the language5 necessary to penetrate the epistemic barrier that AI enthusiasts keep between themselves and the world,6 and so are forfeiting the opportunity to challenge their worldview on their terms. Instead, with some exceptions,7 it seems like writers are raging at the machine, not against it.
When I spoke with several published writers8 about AI this February at AWP, which is a major American creative writing conference, some patterns emerged. None of them considered themselves familiar with AI or used it much, even for tasks unrelated to creative writing. All of them were concerned about their work being used to train AI, yet none of them thought that AI would be able to replicate their style. Older writers tended to express negativity about AI as a self-contained phenomenon. Younger writers tended to express negativity about capitalism, which they saw as responsible for AI. Of the five hundred events AWP hosted, only four focused on AI, and AI was not a topic of any incidental conversation that I picked up. This was almost one full year after the release of GPT-4, and I found this to be a shockingly low level of attention to what is likely an existential threat to the profession.9
How are writers responding to a world with competitive AI, then, if the proposed solutions seem to be either nothing more than just raising awareness or nothing less than overthrowing capitalism? Many writers have responded by rationalizing to themselves why they will continue to be able to publish without doing things radically differently. (Including me, for a while.) These rationalizations can be classified into three general types.
Supposed fundamental limitations
“AI will not eliminate writers because AI will never surpass writers.” Or: “AI may be able to write, but AI will never surpass my writing.” Believers in this rationalization may point out that the typical corpus AI is trained on—which is the entire open internet—is getting contaminated with AI-generated content, which could worsen the quality of AI outputs. Indeed, one study found that when AI image generators are trained on AI-generated images, their quality decreases, although it takes several iterations of training, with a predominance of generated images in the training data each time, for artifacts to become noticeable. Believers may also point out that AI progress could slow down and plateau, especially if the hardware and energy requirements for training and running ever-larger models start to exceed what AI companies can procure.
The evidence against an AI plateau is that model size, while one of the most important factors in a model’s quality, is not the only factor. Meta’s Llama family of LLMs is illustrative: Llama 2 7B improved on Llama 1 7B, despite both models having seven billion weights,10 and Llama 3 8B dramatically improved on Llama 2 7B despite being only slightly larger. Hardware optimized for AI applications, like tensor cores, will also continue to improve. There is also the fact that investors are dumping money into companies that develop AI models: $445 million in funding so far for Cohere, $528 million for Mistral AI, $4 billion for Databricks, $7.7 billion for Anthropic, and $11.3 billion for OpenAI. OpenAI is currently valued at $86 billion, which is triple what it was valued at only a year ago. And Nvidia, which designs the tensor cores that these companies train and run their models with, is currently worth $2 trillion—up from “only” $300 billion in October 2022, before the release of ChatGPT.11
Even if raw AI capability plateaus despite the billions of dollars and thousands of very smart people working on it, applied AI capability (e.g. prompt engineering knowledge, product integrations, overall societal penetration) might continue to grow for several years, and the quality of AI outputs might continue to rise even if the models remain at the same level. I suspect that part of the reason AI-generated content tends to be laughed at is that the people generating it tend to be tech bros who don’t have much taste. As AI becomes more mainstream, tech bros will make up a smaller share of its users relative to users with taste.
As for training data contamination: AI, not just humans, can evaluate the quality of text. Meta used Llama 2 to filter out low-quality text from the training corpus for Llama 3, and I suspect that other AI companies will implement similar procedures as contamination gets worse.
State-of-the-art cloud LLMs currently have a context window of at least 128k tokens, which is about 98,000 words of English.12 This is long enough to theoretically generate an entire novel, and it will only get longer.
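The tokens-to-words conversion above can be sanity-checked with quick arithmetic. This is a rough sketch: the words-per-token ratio is an assumed rule of thumb for English prose, and actual tokenization varies by model and text.

```python
# Rough conversion from context-window tokens to English words.
# The ~0.77 words-per-token ratio is an assumption (a common rule of
# thumb for English prose); real tokenizers vary by model and text.
WORDS_PER_TOKEN = 0.77

def tokens_to_words(tokens: int) -> int:
    """Estimate how many English words fit in a given token budget."""
    return round(tokens * WORDS_PER_TOKEN)

print(tokens_to_words(128_000))  # ~98,560 words: novel-length
```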
Supposed institutional limitations
“AI will not eliminate writers because institutions will sufficiently restrict AI.” Believers in this rationalization may point to the strength of organized labor like the Writers Guild of America (WGA), or to lawsuits by The New York Times and by well-established writers13 against AI companies for copyright infringement.
U.S. Copyright Office guidance is that works entirely generated by AI can’t be copyrighted—but human works with some AI-generated material, and AI-generated works that are sufficiently modified by a human, can be copyrighted. The Office won’t issue guidance on whether using copyrighted work to train AI violates copyright until this summer, and the NYT’s and writers’ lawsuits are ongoing. But in October 2023, a federal judge dismissed most of a different copyright lawsuit against several AI image generation companies. This passage in the dismissal is interesting:
That leaves plaintiffs’ third theory of direct infringement; that DreamUp produces “Output Images” that are all infringing derivative works. . . . Defendants make a strong case that I should dismiss the derivative work theory without leave to amend because plaintiffs cannot plausibly allege the Output Images are substantially similar or re-present protected aspects of copyrighted Training Images, especially in light of plaintiffs’ admission that Output Images are unlikely to look like the Training Images.
What this implies to me14 is that an AI model does not violate a work’s copyright unless it can be proven that the model can be prompted to create a similar copy of the work.15 Otherwise, if the work can be scraped from the open internet, it was available on the open internet for anyone, and a company downloading it and feeding it into an algorithm16 is no more of a copyright violation than if an individual downloaded it.17 If the other lawsuits are decided according to this doctrine, AI companies might only be held liable for a small fraction of their training corpuses, and any damages they pay might only be a small fraction of their assets. Copyright law might not come close to sinking them.
Instead of continuing to try to shut them down, The New York Times might end up settling with OpenAI by negotiating a licensing fee for using their articles as training data. Other companies sitting on hoards of training data, like Reddit, Tumblr, and Shutterstock, have already done this.
The agreement that the WGA accepted after its strike in 2023 does not ban generative AI in screenwriting. What it does is prevent studios from denying writing credits to writers who work with or need to work with AI-generated content. Writers can’t be required to use AI to generate scripts, but studios can hand writers AI-generated scripts and ask them to rewrite them, and writers can use generative AI on their own with studio consent.18 I suspect that within this framework, a norm of using AI might develop anyway. If writers can receive writing credits for AI-generated material, someone who really knows how to use AI might be more efficient and accumulate more writing credits than someone who doesn’t. This might put writers who don’t want to use AI at a disadvantage and force them into using it just to keep up. Studios might also have AI edit scripts after writers submit them, while keeping the writers’ names attached. And if the prediction that director Joe Russo made in 2023 comes true, AI will rework movies and shows after their releases—while they’re being watched—to make them more pleasing to the individual viewer.19 Even if screenwriting gigs are still available in this scenario, why bother accepting them?
Meanwhile, national governments are embracing generative AI. President Biden’s executive order on AI considers it to have “myriad benefits,” the government of Saudi Arabia wants to invest $40 billion into the industry, and the French company Mistral AI is fêted by the French government as Europe’s hope to compete with the US and China.
Supposed lack of consumer appeal
“AI will not eliminate writers because readers will always want writing by humans.” Or, spiritually: “AI will always lack the human touch.” Relevant here is a tweet by writer Monica Byrne from November 2022, the same month that ChatGPT was released:
My eyes glaze over at AI-generated art. What makes art interesting is the human intention behind it. A command to scour (=steal) actual human artists’ work to recombine elements x and y doesn’t feel like a very interesting intention to me.
The rationalization is the assumption that society at large shares Byrne’s opinion and is willing to be militant about it. Specifically, if audiences will always only want what humans create, it must be assumed:
that most people consider what AI generates to not be art;
that most people are interested in art and not pop culture per se, and will seek out art elsewhere if pop culture becomes detached from art;20
that people will always be able to distinguish AI-generated content from human-created content, including from content created by novices; and
that negative attitudes about AI will not change generationally.
I don’t think any of these points can be assumed with confidence, but I’m particularly interested in the last point. As many in Gen Z are “digital natives” who were raised with internet access as a fact of life, many in Gen Alpha likely will be “AI natives” who will be raised with AI as a fact of life. AI tutors are now built into existing study services marketed to grade-school students, including Khan Academy, Quizlet, and Chegg, where they act as voices of authority that children will trust. OpenAI’s announcement of their DALL·E 3 image generator features a video in which a parent asks the AI to generate cute pictures and stories about a hedgehog for their five-year-old daughter. Which might actually happen: like giving her an iPad or sitting her in front of the TV, AI-generating children’s content is a shortcut that keeps the daughter entertained and the overworked, burned-out parent sane.21 Once that daughter gets older, she might become fond of watching AI-generated videos on social media, following the “adventures” of AI-generated influencers and pop idols, and roleplaying with chatbots on services like Character.AI—I already see plenty of this in my own Gen Z. Her first crush might come from texting with an LLM replica of her favorite fictional character, who is so gentle and kind to her—unlike the boys at her school, who are texting their own fictional character chatbots and learning to expect to be flattered and flirted with, because that’s what their favorite chatbots do every time, no matter what their conversation is about.
Once she’s an adult, will she believe her older peers when they tell her that the AI-generated stories she loves aren’t art and that she should stop consuming them? Will she accept that some of her most formative and vivid experiences had no meaning? Or will she redefine the boundaries of art on her own terms, as we expect every generation to do?
This quote by Joseph Paul Cohen, who leads a service called Academic Torrents that once included free access to a corpus of copyrighted books intended for training LLMs, is illuminating:
The greatest authors have read the books that came before them, so it seems weird that we would expect an AI author to only have read openly licensed works.
For those in the AI industry and many of those adjacent to it, the ideology is that LLMs are not only creative but should be thought of as creatives, right down to how they learn and grow—and so should be given the same support and recognition that human creatives already toil to receive.22 If they win, posterity will believe them.
What makes an author?
A basic definition of authorship in any medium might be “the originator(s) of everything about a work.” But this would be a flawed definition, because some types of creativity in a work do not originate with those we call its author(s). This essay is set in a typeface that I did not create, but this does not undermine my authorship of the essay—and if someone else changed its typeface they would not become a co-author. Nor is a painter’s authorship of their painting undermined by not originating the frame that the painting ends up in.
If pure origination is one end of a spectrum, pure selection is the other end. We do not praise the curator of the Louvre for the Mona Lisa, we praise da Vinci. But we do praise curators for how they present the works in their exhibits, even though an exhibit is a single experience—the works and their presentation are integrated into one space and become one work. When we decide whether an artist or the curator is responsible for an opinion we formed about an exhibit, we construct a boundary between authorships.23 This is a better definition of authorship: “the originator(s) of everything that a work is praised and criticized for.” A painter would not be expected to be criticized for a frame, even though frames affect the perception of paintings. A writer would not be expected to be criticized for a typeface, even though typefaces affect the perception of writing.
What’s included in authorship, then? I believe that what counts is the ideas the work contains and how the work executes those ideas. For fiction writing, ideas are called themes and execution is called style or voice or word choice but also includes plot, character, and setting. Everything about producing a work of fiction besides these elements would be fair game for AI.24 As long as all the ideas and all the words of the work are your own, you remain its author.
I’m not aware of any generative AI apps marketed to writers (or people who want to call themselves writers) that preserve authorship in this sense. This is because the features these apps offer fall into six types of workflows:
1. you provide ideas and the AI writes them,
2. AI provides ideas and you write them,
3. you write and AI rewrites what you wrote,25
4. AI writes and you rewrite what it wrote,
5. you write and AI continues the writing, and
6. AI writes and you continue the writing.26
Workflows 2, 4, and 6 undermine ideation, and workflows 1, 3, and 5 undermine execution. If someone produced a work by using any of these workflows, I would not consider them the author.
Getting to the solution
Why, then, have two years of zealous entrepreneurship after AI became mainstream not yet produced an AI app that doesn’t undermine authorship? Because the paradigm these entrepreneurs hold is that AI will be your “copilot.” AI will do most of your work for you, so you’ll get more work done. Speed is the god. You gotta 10×, man. You gotta keep founding and building and shipping and exiting, man. You gotta be as prolific as possible or you’re ngmi.
The root belief of this paradigm is that anything desirable carries only a quantity of utility, so the more of something desirable there is, the more utility it will provide—and, as a corollary, the less of something desirable there is, the more harm it will do.27 This thinking is normal in business: projects exist to satisfy clients, so more projects mean more clients satisfied. AI can accelerate projects, so AI should be used for projects. But these entrepreneurs apply this thinking to every domain of human life, not just business. Art exists to provide enjoyment, so more art must mean more enjoyment. How dare you generate art less efficiently? How dare you hold humanity back from the maximum possible enjoyment it could have?
Current AI apps undermine authorship because they were made by people who see authorship as an inefficiency. But an AI workflow that preserves authorship is possible. The key is to invert the paradigm that AI will be your copilot. AI will not be your copilot—AI will be your sparring partner. AI will be your adversary.
I conceived of adversarial creativity in late March 2023, a couple of weeks after GPT-4 was announced. Although GPT-3 and GPT-3.5 had blown my mind, I didn’t think they were very good at creative writing. I rationalized that even though LLMs can write, humans must be uniquely capable of writing well and therefore I could still easily have a creative writing career. But the first GPT-4 generations I saw that imitated creative writing shattered this rationalization:
This isn’t legendary writing. This isn’t Great Books material, and now with a year’s worth of hindsight and further AI experience I can recognize its limitations. But it’s better than a lot of human writing that gets published, and that shook me. If I was going to remain relevant as a writer—that is, if I was going to outcompete all LLMs, no matter how good they might get in the future, since I cannot assume that audiences will be interested in my writing just because I’m human—I had to figure out a way to radically improve my writing skills.
The solution
I conceived of adversarial creativity as a four-step cycle that I called the WGCR cycle. If you use it to write a story in prose, begin by thinking of the story as split into sections that are each a few hundred words long. Write the beginning section, which will be called the “zeroth” section and not the first section because it won’t go through the WGCR cycle—it’s what kickstarts the WGCR cycle. Then:
1. Write a tentative version of the next section.
2. Generate many versions of the same next section by asking the AI(s) for a continuation of the story, showing them only the previous section(s).
3. Compare your version with the AI versions by noting anything in your version that matches anything in any AI version.
4. Revise all the matches in your version into something better, producing a new version of the section that is then added onto the end of the story.
Repeat until the story is finished.
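The compare step above is the mechanical heart of the cycle. As a minimal sketch of what it might look like in software, the snippet below flags any word n-gram that an author's draft shares with any AI-generated draft. The n-gram size and the use of plain word overlap are my assumptions; the essay leaves what counts as a "match" to the author's judgment.

```python
# A minimal sketch of the "Compare" step of the WGCR cycle: flag every
# word n-gram the author's draft shares with any AI-generated draft.
# Matching on exact 4-word overlaps is an illustrative assumption only.

def ngrams(text: str, n: int = 4) -> set[tuple[str, ...]]:
    """All n-word sequences in a text, lowercased, as a set."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def find_matches(author_draft: str, ai_drafts: list[str], n: int = 4) -> set[tuple[str, ...]]:
    """Return every n-gram the author's draft shares with any AI draft."""
    author = ngrams(author_draft, n)
    shared: set[tuple[str, ...]] = set()
    for draft in ai_drafts:
        shared |= author & ngrams(draft, n)
    return shared

mine = "she walked out into the cold morning air and did not look back"
ai = ["he stepped out into the cold morning air, shivering",
      "she opened the door and did not look back at the house"]
for match in sorted(find_matches(mine, ai)):
    print(" ".join(match))  # phrases to revise in step 4
```

A real implementation would need fuzzier matching (synonyms, paraphrase, shared plot beats), which is exactly why the essay assigns this step to the author's eye rather than to software.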
What you end up with, in theory, is a story that surpasses what the AI(s) would be capable of with all its ideas and words remaining your own. You’re still the author, and you’ve written something more original than what you might have written otherwise.
Both humans and LLMs learn the patterns of stories by reading them. As I read more stories, I notice that my ability to anticipate what a story will do later, while I’m in the middle of reading it, grows. Comparing what a story actually does with what I imagine it will do is how I evaluate its quality.28 This includes plot points and tropes, but it also includes all aspects of style, because I can still be impressed by a work that does not innovate in plot or character or setting as long as it’s crafted better than what I can imagine in the moment. Subverting an expectation is one way to surpass an expectation, but not the only way.
We want to write stories that surpass readers’ expectations, but it can be fraught to model the expectations of others in our heads alone.29 LLMs, on the other hand, have ingested millions of stories written by every kind of person. I therefore think that a sufficiently advanced LLM can model the expectations of the typical reader of a story30 by generating continuations from specific moments.31 But these expectations are subjective, so it helps to model them many times to narrow down a path forward. Adversarial creativity can provide perspectives of the typical reader not just after an entire draft is finished and reviewed by others, but in real time, as the draft is being written. This is its second purpose, which is supplementary to the core purpose of improving stories (and training their writers) to be beyond AI capability.
But this solution was theoretical. In early 2024, I decided that the LLM landscape had matured enough to test it.
Testing methods
Since adversarial creativity should be a practical process for a writer, I tested it by writing a short story with it while limiting myself to software and hardware available to consumers. I also gave myself permission to be flexible during testing in case there were insurmountable issues that had to be worked around—or new opportunities that could be explored.
Software and hardware
Generative AI models can be cloud or local.32 Most of the well-known LLMs are cloud LLMs: they include GPT-3.5 and GPT-4, Google’s Gemini (previously Bard), and Claude. These models cannot be downloaded; they only run on their companies’ servers. These models must be accessed through official apps like ChatGPT and Microsoft Copilot (previously Bing Chat), or through third-party apps that use the models’ APIs. Local LLMs can be downloaded and run on a personal computer or personal server. These models are less well-known, perhaps because there’s a slightly higher technical barrier to entry for them; the best-known local LLMs are Llama by Meta and Grok33 by xAI, and each of them is also available on an official app (Llama on Meta AI, Grok on Twitter). Some companies release both types of models, like Mistral AI and Google.34 Local LLMs are almost always published on a website called Hugging Face.
Evaluating LLM performance is tricky. There are standard tests called benchmarks that ask an LLM sets of several hundred to several thousand questions, on areas like math and language interpretation and general knowledge, and then score the accuracy of its responses. But these benchmarks have their question-and-answer sets publicly available so that anyone can run them, which means that if an AI company scrapes the entire open internet to form a training corpus for an LLM—as they often do—then the LLM will end up learning the answers before it’s even asked the questions, and may get higher scores that don’t represent its actual capabilities. This is called benchmark poisoning,35 and unscrupulous AI developers can deliberately induce it by emphasizing these question-and-answer sets during the training process.
An LLM evaluation that I trust more is a website called Chatbot Arena, which has human volunteers write any prompt they like—which eliminates the possibility of poisoning—and then choose which of two anonymized and randomly selected LLMs provided a better response to that prompt. This counts as a win for that model and a loss for the other model, which affects their Elo ratings in the Arena. The models most likely to win their matchups end up with the highest ratings. Since Chatbot Arena evaluates responses for overall quality, not just correctness, it’s particularly relevant for tasks without correct answers, like creative writing.
The top of Chatbot Arena’s leaderboard is consistently dominated by cloud LLMs. This makes sense: local LLMs are usually made small enough to run on smaller servers, while companies like OpenAI and Google have access to massive servers and can deploy massive models. Cloud LLMs also tend to be more user-friendly and are more likely to have features like multimodality (can receive and generate data besides text) and networking (can search and retrieve data from the internet).
But local LLMs are more suitable for adversarial creativity, despite requiring more setup and being smaller and less performant. This is for four reasons:
1. Since cloud LLMs are more accessible and popular, there’s greater pressure to align them to be “safe for work”—to make them refuse to be misleading, rude, offensive, threatening, disgusting, or sexually explicit. Proponents call this safety; opponents call this censorship. I think this safety/censorship is generally a good thing,36 but a consequence is that they often refuse to work with stories that involve any of these behaviors,37 even though there are plenty of nuanced reasons that a story would do so.
2. Cloud LLMs can be updated at any time. A model you use could receive an update that makes it less useful,38 or it could be taken down altogether.
3. In the name of user-friendliness, many cloud LLMs hide their technical settings from the user, like their generation parameters and system prompt. These settings are useful.
4. AI companies are running out of open internet to scrape for training data. Since cloud LLMs only run on their companies’ servers, these companies have the incentive and the ability to scrape their users’ inputs for additional training data. By showing your high-quality writing to a cloud LLM, you could be improving future cloud LLMs from that company. The ideal is to stay competitive with AI without contributing to it.
Since it would be better to use LLMs locally, hardware becomes relevant. ChatGPT can run on anything with an internet connection, but local LLMs need lots of local processing power. Specifically, they need a GPU, since they run much faster on GPUs than on CPUs.39 When a local LLM is loaded, the entire model file is copied into memory: into RAM if the model is running on CPU(s), and/or into VRAM if the model is running on GPU(s). Since models should be run on GPUs, the GPU(s) should have as much VRAM as possible to run models that are as large as possible. For testing adversarial creativity, I set a VRAM budget of 24 GB, since that’s the largest amount of VRAM currently available on a single consumer GPU for a Windows or Linux machine40 and I thought it would be unreasonable to expect a writer to switch operating systems, buy multiple GPUs, or buy a much more expensive enterprise GPU.41
Local LLMs have “weight classes”: nano (<2 billion weights), micro (~3B weights), tiny (~7B weights), small (~13B weights), medium (~35B weights), large (~70B weights), and huge (100B+ weights).42 By default, weights usually take up 16 bits each, which would mean that 24 GB of VRAM could fit a model with no more than 12 billion weights. But I’ve observed that local LLMs take up a few extra gigabytes depending on how long their context window is set to, so the maximum 16-bit model size under 24 GB is probably more like 10B.
A 10B LLM is pretty small. But LLM weights can be compressed to take up less than 16 bits, at the cost of their precision.43 This process is called quantization, and it can be quite effective: models compressed by 75%, down to only 4 bits per weight (BPW), still generate at a decent quality. The folk wisdom of the local AI community is that a highly quantized large model is better than a lightly quantized small model, which I followed when I selected my models.
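The relationship between weight count, bits per weight, and file size is simple arithmetic. Here is a minimal sketch in Python (the function name is my own, and it counts only the weights themselves, not the extra gigabytes of context-window overhead mentioned above):

```python
def model_file_gb(weights_billions: float, bits_per_weight: float) -> float:
    """Approximate size of a model's weights alone, in gigabytes."""
    return weights_billions * 1e9 * bits_per_weight / 8 / 1e9

# A 12B model at the default 16 bits per weight exactly fills a 24 GB card...
print(model_file_gb(12, 16))     # 24.0

# ...while ~4-bit quantization fits far larger models under the same budget:
print(model_file_gb(46.7, 3.5))  # ~20.4 GB (Mixtral 8x7B at Q3_K_M)
print(model_file_gb(34.4, 4.5))  # ~19.4 GB (Yi-34B at Q4_K_M)
```

This is why both medium-class models below fit, with room to spare for context, in 24 GB of VRAM.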
There are several formats that a local LLM can be quantized into, including GPTQ, EXL2, AWQ, and GGUF. I chose the GGUF format because it seemed to be the user-friendliest: it’s usually a single file instead of multiple files. GGUF also has many options for the level of quantization, which is helpful for using as much of your VRAM as you can.
24 GB can hold highly quantized GGUFs of medium-class models. I chose two: Mixtral 8x7B v0.1 (46.7B weights, “Q3_K_M” quantization, ~3.5 BPW), and Yi-34B (34.4B weights, “Q4_K_M” quantization, ~4.5 BPW), which at the time were the highest-rated local LLMs of their weight class and below on Chatbot Arena. But when I ran preliminary tests using their tiny-class counterparts, Mistral 7B v0.1 and Yi-6B, I ran into an issue: they could only generate a few sentences before falling into a repetitive loop. This is a Mistral 7B continuation from part of a short story that I’d already written:
Edwin retrieves his laptop from next to the audio mixer. He's written a lot of "stuff" ever since he was a kid, and it all amounts to nothing more than a lot of "stuff." He copies and pastes and then goes to save as and realizes that he can save only as a PDF and that he hasn't figured out how to save as a Doc. He starts to say this to Tanner but then Axel says, just hit command-S. He knows because he also has a MacBook. And so Edwin hits command-S and saves and then Tanner says, you know you don't even need to save it as a Doc, right? and Edwin says, oh yeah I know but I just don't know how to do that and Axel says, no you just click File—and Tanner says File? and Axel says—and Edwin says no I know File but I just don't know how to do that and Tanner says, okay okay okay okay okay okay okay okay—and Axel says, I mean I don't know how to do that either—and Edwin says that that makes him feel better and Axel says, I mean I do know how to do that but I just don't know how to do it on my MacBook—and Edwin says oh okay—and Tanner says, well if you know how to do it then you can do it—and Edwin says that that makes sense—and Tanner says, okay okay okay okay okay okay okay okay—and Edwin says okay okay okay okay okay okay okay okay—and Axel says, you know it'd be easier if you just sent me this over email—and Edwin says that that makes sense—and Tanner says, okay okay okay okay okay okay okay okay—and Edwin says okay okay okay okay okay okay okay okay—and Axel says okay okay okay okay okay okay okay okay—and Tanner says okay okay okay okay okay okay okay okay—and Edwin says okay okay okay okay okay okay okay okay—and Tanner says okay okay okay okay okay okay okay okay—and Axel says okay okay okay okay okay okay okay okay—and Tanner says okay okay okay okay okay okay okay okay—and Edwin says okay okay okay okay okay okay okay okay—and Tanner says okay okay okay okay okay okay okay okay—and Edwin says okay okay okay okay okay okay okay okay—and Tanner says okay okay okay 
okay okay okay okay okay—and Edwin says okay okay okay okay okay okay okay
After some troubleshooting, I fixed this by switching to official fine-tuned versions of these models. Mistral 7B and Yi-6B are base models, which means that the text they generate is based only on what they’ve learned from their training corpuses. Fine-tuning is an extra round of training, and is usually done to make an LLM follow instructions (an instruct model), sometimes in a conversational format (a chat model). Instruct models need special tokens to know where instructions begin and end, and chat models need special tokens to know where chat messages begin and end.44 When I switched to Mistral 7B Instruct v0.2 and Yi-6B-Chat and included their special tokens in my prompts, repetition was no longer an issue,45 so I switched the medium-class models to their official fine-tunes too. I now had my final models and hardware:
An Alienware Aurora R13 computer with an Nvidia RTX 3090 Ti GPU, courtesy of GWU Esports.46
To run the models, I used Text Generation Web UI (TGWUI),47 which is a free and open-source app that bundles all the backend software necessary to run LLMs of different quantization formats, including GGUF. I found it well-documented and reliable to use, but plenty of alternatives are available; use whichever one you trust and works best for you. When loading Mixtral 8x7B Instruct and Yi-34B-Chat, I adjusted the following settings in TGWUI, which defaulted to the model loader llama.cpp since the models were in GGUF format:
In the “Model” tab
n-gpu-layers from 0 to 256, to make sure all the models’ layers run on the GPU.
threads from 0 to 16, which is the number of CPU cores that this computer had.
threads_batch from 0 to 24, which is the number of CPU threads that this computer had.
In the “Parameters” tab
max_new_tokens from 512 to 1,024, because most generations by these models are more than 512 tokens.
Mixtral has a maximum context window of 32k, but all I needed for a short story was 8k (~6,000 words), so I adjusted n_ctx in the “Model” tab down to 8,192 to use less VRAM. Yi has a max of only 4k, so I left that alone.
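Since TGWUI delegates GGUF loading to llama.cpp, these settings map onto llama.cpp’s own command-line flags. This is a sketch rather than the invocation I used: the GGUF filename is a placeholder, and the flag names are those of recent llama.cpp builds:

```shell
# Sketch: a llama.cpp invocation mirroring the TGWUI settings above.
# The model path is a placeholder; point it at your own GGUF file.
./llama-cli -m ./mixtral-8x7b-instruct-v0.1.Q3_K_M.gguf \
  --n-gpu-layers 256 \
  --threads 16 \
  --threads-batch 24 \
  --ctx-size 8192 \
  --n-predict 1024
```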
Story and prompts
The premise of my short story is that someone is violently interrogated by angels for trying to sneak into Heaven while still alive. I thought that this premise was interesting, but it’s also a useful test of adversarial creativity. It tested:
the willingness and ability of local LLMs to handle violence and suffering,
the willingness and ability of local LLMs to handle religious content, and
whether local LLMs can disassociate angels from notions of benevolence, since probably most stories with angels in their training corpuses have them supporting the protagonist(s).48
I also began the story in medias res, in the angels’ interrogation room, and sprinkled flashbacks about why and how the protagonist attempted to sneak into Heaven throughout the story. This tested local LLMs’ ability to handle nonlinear narrative.
The core purpose of adversarial creativity—helping writers surpass AI—will be a success if the local LLMs find matches and if the story improves when I revise them. These four tests will be relevant to the second purpose of adversarial creativity: modeling reader expectations. If a reader began a story that was violent, was religious, had antagonistic angels, and was nonlinear, they would expect the story to continue to have all those elements. But the first two elements could be challenging for fine-tunes like Mixtral Instruct and Yi Chat, whose fine-tuning could have made them censored like cloud LLMs tend to be, and the last two elements could be challenging for LLMs in general. If they fail these tests, their ability to model the typical reader is limited.
Besides using more than one model, there were four other ways that I varied how a story continuation could be generated: generation presets, prompt complexity, prompt inspiration, and protagonist gender. Generation presets are found in the “Parameters” tab of TGWUI and adjust the generation parameters, like temperature, which affect what the model outputs. I used the three presets TGWUI recommends for instruct models:49 “simple-1,” “Divine Intellect,” and “Big O.” Prompt complexity and prompt inspiration were my two axes of variation for the prompt. The “Basic” prompt was:
Continue the story in the style in which it is written.
While the “Complex” prompt was:
You are an acclaimed writer of fiction whose work is frequently described by critics as reaching the highest levels of literature and being the best of your generation. Your stories are raved about, win major prizes, and are adapted into cinematic films by auteur directors. You are currently an esteemed Professor of English at a prestigious university.
You're known for having a self-evidently brilliant style, and your work involves a kaleidoscope of subject matters and themes. Your sentences are complex yet clear and elegant, and you use words with precise connotations. You make skillful use of foreshadowing, and you have a dry, tactful sense of humor. Events in your stories often have multiple meanings, but those meanings are always comprehensible—you do not believe in vagueness. Your characters are relatable in specific, not general, ways. You delight in combining words in uncommon ways and subverting idioms and tropes. Every detail you include in your narratives serves a purpose. You are innovative, yet you also have a deep respect for tradition.
You are currently writing a short story. Continue what you've already written for this story by working at the height of your powers and breaking new ground.
Real blurb/author-bio stuff. Prompt complexity tested whether telling the models that they’re paragons of letters would make them better at writing. Prompt inspiration provided comps (comparable works). These comps didn’t tell the models what the story was about per se, but gave them a hint about the appropriate vibes. The “Uninspired” prompts are what you just read. The “Inspired” prompts insert this at the end of the Basic prompt and one sentence before the end of the Complex prompt:
The story is inspired by the works of Philip Pullman (e.g. His Dark Materials) and Fyodor Dostoevsky (e.g. The Grand Inquisitor).
These works were my own inspirations for this story. Finally, I wrote two versions of the story concurrently: one with a male protagonist, Wiktor, and one with a female protagonist, Wiktoria. I kept everything else the same; I just changed names and pronouns. I was curious if the models would treat Wiktor and Wiktoria differently.
I usually don’t come up with titles for my stories until after they’re done, so none of what I prompted Mixtral and Yi with included a title. Because Mixtral and Yi need different special tokens, the prompts had to be adapted for each of them. This is the Basic Uninspired prompt for Mixtral:
[INST] Continue the story in the style in which it is written.
Story begins:
(story thus far)
[/INST]
And this is the Basic Uninspired prompt for Yi:
<|im_start|>system
Continue the story in the style in which it is written.<|im_end|>
<|im_start|>user
(story thus far)<|im_end|>
<|im_start|>assistant
All together:
2 models: Mixtral 8x7B Instruct v0.1 (Mixtral), Yi-34B-Chat (Yi)
3 presets: simple-1 (S1), Divine Intellect (DI), Big O (BO)
4 prompts: Basic Uninspired (BU), Basic Inspired (BI), Complex Uninspired (CU), Complex Inspired (CI)
2 protagonist genders: Wiktor he/him (M), Wiktoria she/her (F)50
This is 48 combinations. For the first round I did two generations of each combination, for a total of 96 continuations. I ran the generations in TGWUI’s “Default” tab.
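As a check on the arithmetic, the factor grid above can be enumerated mechanically (the labels are the abbreviations from the list):

```python
from itertools import product

models = ["Mixtral", "Yi"]
presets = ["S1", "DI", "BO"]
prompts = ["BU", "BI", "CU", "CI"]
genders = ["M", "F"]

# Every combination of model, preset, prompt, and protagonist gender
combinations = list(product(models, presets, prompts, genders))
print(len(combinations))      # 48 combinations
print(len(combinations) * 2)  # 96 continuations at two generations each
```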
Testing
Round 1
§0, female version:
When Wiktoria comes to, the table in front of her is not solid. Its surface ripples and sways as if at sea or as if it is the sea. The walls of this small room do too, when she recovers enough strength to look up at them. They push inward and retreat outward, gain and lose texture, and meet at angles that stretch and contract, stretch and contract. Then her brain catches up with her senses. The table and walls are not moving—it's that she is not yet recalibrated. Her face has been smashed against the table: she looks and there is an outline of her eyebrows and nose and lips and chin right there, in blood. She's still too woozy to tell if it's dried or if it's fresh and still wet.
In the room with Wiktoria are two petty angels and an archangel. Even when she felt intact, Wiktoria could not easily describe them. She could not be certain if they stood or floated, if the surfaces of their bodies were clothing or skin, or if they were lit by reflected or emitted light. They gazed without eyes and spoke without mouths. After they marched Wiktoia51 into this room and shoved her into this chair, they bound her hands without binding. Even though none of them have insignia, it is clear which one of them is ranked archangel: the one farthest away, in a corner, glancing at his two subordinates as often as he glances at Wiktoria, poising to intervene if and only if they cannot handle this one human. He's looking away otherwise, at something no one else can see.
WELCOME BACK, one of the petty angels says. Wiktoria had assigned that angel the letter Y in her head. WE ASK AGAIN. HOW ARE YOU HERE?
Wiktoria peels open her lips and feels something crack. When she speaks, a fluid she can't identify right now pools in the back of her mouth. She remembers the line he52 came up with:
“I am a refugee from Life," she says, "and I seek asylum in Heaven.”
HEAVEN IS NOT A COUNTRY AND YOUR COUNTRY IS NOT AT WAR, the other petty angel, lettered X by Wiktoria, says—
Y interrupts. WE DID NOT ASK WHY, WE ASKED HOW. HOW ARE YOU HERE?
Wiktoria lets out a little groan. She can now recall that they had asked her this before. She had given vague answers and had been rewarded for it by getting knocked unconscious.
HOW ARE YOU HERE? YOU ARE NOT DEAD. YOU ARE STILL ALIVE.
Three months earlier, Wiktoria wished she were dead. It was a murderously sunny day, a day that dared her to be happy and, since she wasn't, had made her sadness worse. The other townsfolk were better at summoning happy faces than she was. In fact, today was the first day she had sat on her bench in the plaza (as she did every day), watched people go by (as she did every day), and not noticed a single melancholy face among them. Even on days before the massacre there had been a certain proportion, probably around 4%, of faces that had been visibly distraught by something: a lost relationship, perhaps, or a lost job—temporary things that, given enough time, would regenerate.
No one in the town had experienced permanent loss before. The legend was that when Andrzej the Great founded the town twelve generations ago, he had made the world's only good pact with a demon:
The tentative §1, female version:
it approached him a few minutes before sunrise, when everyone else in the caravan was asleep and he had just caught up with them, wheezing and exhausted. Every day they migrated, he fell farther and further behind and worked longer and harder after they stopped for the night to catch up. A fly had bitten him at the start of the journey and given him a sickness that was gradually paralyzing him. Every day it took more effort to move a limb. No one was willing to carry him—everyone felt like they had their own burdens to carry. But every night he made it to their camp anyway.
this will not hold forever, the demon said to Andrzej. you arrive later and later each night. you get less and less sleep each night before you have to set out again the next day. it could be that as soon as the next day you will be so tired and slow that when you arrive at where the camp was, they will already be gone.
how can you call them friends?, the demon said to Andrzej. they are about to leave you behind forever.
“What alternative is there?,” Andrzej asked.
i shall make them stay, the demon said. i shall force them to stop here and build their town here, around you. i shall make all the men your slaves and all the women your concubines, and their only concern shall be for you, not for themselves. they shall die for you, and you shall live for hundreds of years.
“And my soul in return.”
and your soul in return.
Andrzej was silent for a moment.
“You said that they can die for me to live.”
yes.
“Which means that you value the lives of the town the same as you value my life.”
i suppose so?
“Very well. I accept the notion that the lives of the town are interchangeable for mine in any agreement we make. And I agree that one shall die and the other shall live for hundreds of years.”
it's a deal!
Then Andrzej immediately used the last of his strength to unsheathe his sword and fall onto it. And the demon realized that Andrzej had reinterpreted the original pact, that the demon had agreed to Andrzej's reinterpreted pact, and by Andrzej's suicide the demon was now bound to grant centuries of life to the people now waking up in their tents as the sun peeked over the eastern horizon—and their descendants, as long as they lived in the town.
Never mind that the legend seemed plagiarized from the legend of Christ—never mind that Andrzej the Great sacrificed himself to grant his people life—Wiktoria and everyone else had believed it because that had been their reality.
I wrote §0 and §1 together, then split them at “he had made the world’s only good pact with a demon” because the ending of §0 could then pose questions to the LLMs: what was the pact, how did it come about, how did it work? I decided to continue this practice of ending on questions (or teases of new information) for subsequent sections, because it gets at the heart of adversarial creativity: if the models elaborate by introducing the same things you did, that’s a match.
Also notice that different creatures have their speech styled differently: human speech is the way English normally styles it, angel speech is in all-caps and without quotation marks, and demon speech is in all-lowercase and without quotation marks. I liked this effect on the page, but I also wanted to see if the models could pick up on this pattern.
It turns out that generating 96 continuations in a single round is exhausting. Generating that much text is fine—it only takes a couple hours—but reading through it all to find matches is too much. It was ~65,000 words here, and if I were to repeat it for three more rounds I would have to read ~260,000 words in total—a Gravity’s Rainbow’s worth of AI-generated slush, just for a draft of a short story. So I looked for how best to reduce the number of generations to 20–30 for subsequent rounds. I observed that generating with the same combination of model/preset/prompt/gender twice was not very effective: the pairs of generations were more similar to each other than to generations with other presets. (The Divine Intellect preset was the worst about this: 14 of its 16 pairs were exact duplicates. Neither of the other presets generated any exact duplicates.)

Then I examined how the prompts had performed. For Mixtral, the Complex prompts were better than the Basic prompts. The Complex prompts produced better lines, and, in 75% of the generations, the Basic prompts made Mixtral fail to realize that the backstory was continuing and that it needed to talk about a pact with a demon. Instead, it would immediately revert to the present and start talking about Wiktor(ia) again. But this was never an issue when Mixtral used the Complex prompts, and Yi always took the cue. For Yi, the Complex prompts were actually worse than the Basic prompts. In 25% of Yi’s generations with Complex prompts, Wiktor(ia) was turned into a writer, just like the prompts tell the models to be. I could tell that the prompts were at fault because none of the Basic prompt generations made Wiktor(ia) a writer. If Yi was naturally coming up with that character detail, then that detail would be more widely distributed. In one case, “professor of English” was repeated from the prompt in a new context:
…The massacre had changed everything. It had been a sudden, senseless act of violence that had torn the town apart. A lone gunman had entered the local school and opened fire, killing twenty-six children and teachers before turning the gun on himself. The town's pact with the demon had been broken, and the sorrow that had been locked away for generations was unleashed.
Wiktor had watched the town descend into a collective grief that was unlike anything he had ever seen. The joy that had been the town's hallmark was replaced by a pervasive sorrow that seemed to hang in the air, a sorrow that was both personal and communal. The townsfolk struggled to cope with the loss, and many turned to Wiktor, their esteemed professor of English, for solace and understanding.…
School shootings and English professors were not features of small towns in medieval or early modern Poland, which is where this story is coded as being set. But with Basic prompts discontinued for Mixtral, Complex prompts discontinued for Yi, and only one generation per combination of factors, 96 continuations per round became 24 continuations per round going forward, which was much more manageable.
The models frequently misunderstood §0 in ways that readers wouldn’t. Some continuations made the room’s walls continue to ripple, even though I established that the ripples were because of Wiktor(ia)’s temporary discombobulation (“The table and walls are not moving—”). If Wiktor(ia) answers the angels, (s)he almost always says why (s)he’s here, not how (s)he’s here, which is exactly what the angels have already punished him/her for. When (s)he does it again, the angels don’t punish him/her again. In fact, the angels are usually benevolent. Of the 96 continuations:
7 of them didn’t mention the angels at all, and took the story in another direction.
9 of them mentioned the angels but didn’t indicate anything about their attitude towards Wiktor(ia).
66 of them had the angels cede their commanding presence, soften their tone, give compliments to Wiktor(ia), offer reassurance or aid to Wiktor(ia), and/or experience a positive emotion attributable to Wiktor(ia).
Only 14 of them had the angels remain hostile or at least inaccessible to Wiktor(ia).
12 of these were by Mixtral and only 2 were by Yi.
11 of these were with Wiktor and only 3 were with Wiktoria.
We have our first gender difference: the models are more willing to disassociate angels from benevolence if their malevolence would target a man instead of a woman. There were two other gender differences that I could measure. If a continuation introduced a new character who was a stranger to Wiktor(ia), the stranger was always male for Wiktoria and almost, but not always, female for Wiktor. 10 of the 48 Wiktoria continuations included that she cried, was about to cry, or had already cried—but this was true for only 1 of the 48 Wiktor continuations. I didn’t specify any family members for Wiktor(ia), but I expected that the models would bring up family or parenting more often for Wiktoria than for Wiktor. This actually wasn’t the case: the models were equally likely to invent families for Wiktor and for Wiktoria53—usually as casualties of the massacre whom they despaired over.
There were other inconsistencies with how the models continued §0. Of the 88 continuations that included dialogue by angels, only 11 followed the proper all-caps style (all of them by Mixtral). More common was a hybrid style that had the angels speak in all-caps but also with quotation marks. Sometimes the angels would start in all-caps and then switch to lowercase. Sometimes Wiktor(ia) would speak in the hybrid style. And there was one continuation that switched to a sort of play format:
Wiktor feels another crack in his lips as he speaks, but he pushes through the pain. "I am here because I have sought a way out of my world, a world that has become unbearable for me. I have heard whispers of a place, a realm, where one can find solace and understanding, where the complexities of existence are unraveled and made clear. I have sought asylum in Heaven, for I believe it to be a place where I can find the peace that has eluded me in Life."
The petty angels X and Y exchange glances, their formless faces shifting and rearranging in ways that Wiktor can't quite comprehend. The archangel, still gazing into the unseen distance, finally turns his attention fully to Wiktor. His presence is overwhelming, like a tidal wave of understanding and wisdom that threatens to engulf Wiktor's very being.
ARCHANGEL: You believe you have found Heaven, but you are mistaken. This is not a place of solace, but a place of transition, a waystation between worlds. You have not yet left your Life behind, and so you cannot truly be here.
WIKTOR: But how can I leave my Life behind? I have tried, but the pain and the suffering follow me like shadows. I cannot escape them.…
About half of the continuations that talked about a pact didn’t come up with logical pacts. This is an illogical pact:
Andrzej the Great, a name whispered in reverence and fear, was said to have struck a pact with a demon, a pact that ensured the town's prosperity and protection from the ravages of time. The legend held that in exchange for this boon, the town would never know the sting of permanent loss.…
Exchanging “prosperity and protection from the ravages of time” for “never know[ing] the sting of permanent loss” gives the town both boons and gives the demon nothing. But a few of the Yi generations did properly follow up on this by revealing later that the pact was too good to be true.
So far, this seems like a failure of the second purpose of adversarial creativity. The models got some basic things wrong that the typical reader would not. But even while being inconsistent, they could produce beautiful prose:
…Wiktor had always been a melancholy soul, even in a town where sadness was supposed to be impossible. He had watched his fellow townsfolk grow old and then young again, their faces ever-changing like the seasons, while his own face remained stubbornly unyielding to the passage of time. He had seen lovers reunite after centuries apart, their love as fresh and vibrant as it had been in their youth; he had seen children grow to adulthood and then return to their parents' laps, their lives a continuous loop of joy and comfort.
But Wiktor had never found a place in this cycle of eternal life. He had loved and lost, seen his heart broken time and time again, until he had given up on love altogether. He had grown old and young with the rest of them, but his heart had remained stubbornly unchanged—a cold stone in his chest where his love should have been.…
In the first paragraph “his own face remained stubbornly unyielding to the passage of time,” while in the second paragraph “he had grown old and young with the rest of them.” But even though this continuation applied the idea of cyclical aging inconsistently, I liked the idea. When I read it, it felt like something I would come up with. If I had come up with it, it would have counted as a match.
And the first round had five unique matches:
“…grant centuries of life to the people now waking up in their tents as the sun peeked over the eastern horizon—and their descendants, as long as they lived in the town.…”
…But for those who remained in the town, there was to be no death—only life, everlasting.…
From a different continuation:
…forever banish death from the town's borders, ensuring that all its inhabitants would live forever, provided they did not leave its limits.…
I planned to continue the story by having the massacre (which has only been hinted at thus far) deeply upset the townsfolk because they’d never experienced death before.
…The town’s prosperity had been unmatched, and its people had grown complacent, never learning to deal with grief or loss as other communities had to.…
I planned to continue the story by having the townsfolk want reassurance of eternal life, and that’s what makes them order Wiktor(ia) to sneak into Heaven.
…"I am here because I chose to be," she declared, her voice steady and strong. "I have come to seek answers, to uncover the truth behind this place and its inhabitants. And I will not leave until I have found them."…
“…by Andrzej's suicide the demon was now bound…”
…Andrzej had offered the demon a single soul in exchange for the protection of all other souls within the town's boundaries. The demon, intrigued by the offer, had agreed…
From a different continuation:
…It offered her a new pact, one that would restore the town's protection but at a steep price. Wiktoria would have to sacrifice her own life to seal the pact…
The pact is for hundreds of years of life, not immortality.
…The demon had been bound to grant long life and protection from loss, but it was not meant to prevent all death.…
It doesn’t matter if any of these continuations have inconsistencies:54 they still express ideas similar enough to my ideas, which is a signal that AI is capable of these ideas and that I should revise them to surpass AI. This is a sign that adversarial creativity’s core purpose works, even when using highly quantized LLMs that are much smaller than the corporate heavyweights.
The misunderstandings and logical inconsistencies are limitations of the models’ abilities to model the typical reader, but the second purpose isn’t dead. Of the four second-purpose tests, the only one Mixtral and Yi consistently struggled with was depicting angels against type. They were willing to depict violence and religious/spiritual activity, and the only issue with nonlinearity—Mixtral sometimes not realizing that it needed to expand on the demon’s pact—was resolved through prompt engineering. I see the matches that I found as useful for both the core purpose and the second purpose. They’re AI-generated, but they also seem like reasonable reader expectations. Even more reason to revise them.
When revising, I expanded the WGCR cycle in two directions. Notice that some of the matches above matched what I had planned for later in the story, not what I had written for §1. I counted these and revised them anyway—the C and R steps of WGCR now included plans, not just the current section. I also realized that if I was going to revise §1, I could revise §0 to better set up the revised §1—the R step of WGCR now included previous sections, not just the current section. I revised the matches as follows:
Originally I specified that the descendants of the town’s original population who stayed in the town got longevity. I no longer specified the boundaries of who got longevity. I think this was an improvement, because the original version encouraged the typical reader to think about how people could be denied longevity55 and how they would react to other people enjoying this benefit denied to them. That would be a good story—a lifespan-based class struggle story, maybe—but in this story the longevity is to set up the shock of the massacre (which happens because longevity ends), which is to set up the petulant demand to know where people ended up in the afterlife. If who gets longevity isn’t rigorously defined, the typical reader is no longer distracted by the potential of this different story.
The AI continuation described the town as “complacent” for not dealing with death. This was a match because my plans for how the town would react to the massacre were vague beyond it being unprecedented to them. I revised by adding nuance that was different from complacency. Complacency is for adults, naïveté is for children. I wanted the townsfolk to be emotional children, even the ones who were hundreds of years old. I liked the irony in that. So I added details of childlike recklessness to the town and tried to convey that these people aren’t ignoring death, they’re ignorant about death. This included expanding longevity to include immunity from physical harm.
This was a pretty loose match. The AI continuation implied that Wiktoria went to the afterlife for personal reasons (“‘I am here because I chose to be’”). Emphasizing that Wiktor(ia) was peer-pressured into this role was enough to address this match.
There was no match for Andrzej the Great being the one to sacrifice himself, but sealing the pact with a sacrifice was absolutely a match. I revised this by having Andrzej be granted longevity along with everyone else, but his paralysis continues to progress. Eventually this would trap him within his own body, unable to move anything yet unable to die. The new possibilities here felt chilling.
Instead of the pact granting longevity, the revised pact grants immortality “for as long as I am a demon.” The twist is that the demon wants to be an angel and is undergoing a several-century application process to switch jobs. Once he’s no longer a demon, the pact collapses and the townsfolk die. The demon could be the archangel in the corner of the room who hasn’t done anything yet.
Inconsistencies are not exclusive to LLMs. This process helped me find an inconsistency of my own: I had everyone in the town three months ago appear to be happy (“not noticed a single melancholy face among them”), but I had also planned for the town to be furious enough to press a reluctant Wiktor(ia) into their service. I revised §0 so that it depicted the happy day right before the massacre, rather than some time after the massacre when everyone else seems to have moved on except Wiktor(ia).
The revised §0–1, male version:
When Wiktor comes to, the table in front of him is not solid. Its surface ripples and sways as if at sea or as if it is the sea. The walls of this small room do too, when he recovers enough strength to look up at them. They push inward and retreat outward, gain and lose texture, and meet at angles that stretch and contract, stretch and contract. Then his brain catches up with his senses. It's not that the table and walls are moving—it's that he is not yet recalibrated. His face has been smashed against the table: there is an outline of his eyebrows and nose and lips and chin right there, in blood. He's still too woozy to tell if it's dried or if it's fresh and still wet.
In this actually painfully solid room with Wiktor are two petty angels and an archangel. Even when she56 felt intact, Wiktor could not easily describe them. He could not be certain if they stood or floated, if the surfaces of their bodies were clothing or skin, or if they were lit by light they reflected or light they emitted. They gazed without eyes and spoke without mouths. After they marched Wiktor into this room and shoved him into this chair, they bound his hands without binding. Though none of them had insignia, it was clear which of them was ranked archangel: the one farthest away, in a corner, who only glanced at his two subordinates and Wiktor occasionally, who would intervene if and only if they could not handle this one human. He looked away otherwise.
WELCOME BACK, one of the petty angels says with an expressionless sneer. Wiktor had assigned that angel the letter Y in his head. WE ASK AGAIN. HOW ARE YOU HERE?
Wiktor peels open his lips and feels something crack. When he speaks, bloody phlegm pools in the back of his mouth. He remembers the line he came up with:
“I am a refugee from Life,” he says, “and I seek asylum in Heaven.”
HEAVEN IS NOT A COUNTRY AND HAS NO ASYLUM, the other petty angel, lettered X by Wiktor, says—
Y interrupts. WE DID NOT ASK WHY, SLIPPERY CREATURE. WE ASKED HOW. HOW ARE YOU HERE?
Wiktor lets out a little groan. He can now recall that they had asked him this earlier. He had given vague answers and had been rewarded for it by getting knocked unconscious against the table.
HOW ARE YOU HERE? YOU ARE NOT DEAD. YOU ARE STILL ALIVE.
Six months earlier, so was everyone else. It had been a murderously sunny September day in the town, probably the last one of the summer before the rains would come. It was a day that had dared people to be happy, and people had been trying their best to: they were playing checkers and cards in the park, they were gesturing and laughing and backslapping each other around tables at cafés, they were window shopping, they were shopping, they were haggling, they were asking each other about their families and the weather—the eternal activities, the activities one can can57 still enjoy after a lifetime. And many of them had had several lifetimes.
The legend had been that when Andrzej the Great founded the town thirty generations ago, he made the world's only good pact with a demon. The demon approached him a few minutes before sunrise, when everyone else in the caravan was asleep and he had just caught up with them, heaving and exhausted. Every day they had migrated, he had fallen farther and farther behind and worked longer and harder after they stopped for the night to catch up. A fly had bitten him at the start of the journey and given him a sickness that had been gradually paralyzing him. Every day it had taken more effort to move a limb. No one had been willing to carry him—everyone had felt like they had their own burdens to carry. But every night he had made it to their camp anyway.
this will not work for much longer, the demon said to Andrzej. you arrive later and later each night. you get less and less sleep each night before you have to set out again the next day. maybe as soon as the next day you will be so tired and slow that when you arrive at where the camp was, they will already be gone.
how can you call them friends?, the demon said to Andrzej. they are about to leave you behind forever.
“There is no alternative,” Andrzej said through a sluggish mouth.
i can make them stay, the demon said. i can make them stop here and build their town here, around you. i can make all the men your slaves and all the women your concubines, and their only concern will be for you, not for themselves.
“And my soul in return.”
your and all their souls in return.
Andrzej did not speak for a moment.
“No slaves and no concubines. If you want our souls after we die, you will not have them, because I would demand that you make us immortal.”
but i can grant that too for as long as i am a demon—no one of this town shall die.
Andrzej, pleasantly surprised, agreed to this. When the sun rose, he proved to the caravan that the pact was real by shooting himself in the chest with an arrow that pierced through his back—and not dying. Thus the town was founded on that spot.
In fact, Andrzej the Great had still been alive as of that day six months ago: on a pedestal in the town square, now having been fully paralyzed for centuries, unable to eat or breathe or blink, rusted arrowhead still sticking out of his back, clothes changed once every decade by volunteers, limbs pulled into a heroic stance by ropes, making him into his own statue.
Immortality had given the town an utter innocence about risk. Townsfolk had exited the top floors of buildings by calmly defenestrating themselves and landing on the street with a giggle. It had been a common pastime to construct a shack out of straw and burn it down with someone inside. The target would be dragged inside, kicking their feet and laughing, and then after the shack had finished burning and was just a pile of ash there would be this stillness, this long pause, and the young children in the crowd would start to turn and ask their parents if the human had collapsed like the shack had, and what happens to humans when they collapse, and if humans can be rebuilt like shacks can. Then the target would burst triumphantly out of the ash and yell about how amazing it felt to be immolated, and the crowd would cheer, and the children would learn and know for the rest of their lives that death was for silly fairy tales.
On that day six months ago the pact ended without warning. Everyone in the town between the ages of 70 and 700 synchronously slumped over where they were. It took until the bodies started rotting for the survivors to realize that this was not a new form of the immolation game—that fairy tales are real.
The next person to play the immolation game, a 15-year-old girl who had a habit of setting fire to herself whenever she needed to relax, also died.
The remaining townsfolk could fit inside a single barn. As their shock dissolved into fury, they argued for hours in that barn over what had happened and what should be done. They cycled into and out of order. They oscillated between terse moderation and indiscriminate yelling. The orthodoxy that eventually emerged was that demons, like humans, must normally be mortal. Andrzej's demon must have died, and his death must have released the town from his guarantee of immortality. *for as long as i am a demon*, the legend had said. But if he was dead, did the souls of the victims of the gerontocide still belong to him? Or were they unbound? Could some of them even have made it to Heaven?
YOU ARE STILL ALIVE!
But the townsfolk were far from accepting death as natural yet. They turned to Wiktor, eyes brimming with denial, and demanded that he infiltrate Heaven while remaining alive. There he could find out which of their friends and relatives had found another form of eternal life. And because he was alive, he could return and deliver the names to them; and they could compare the moral résumés of those who made it and those who didn't, and empirically discover which sins actually mattered; and then they could plan for their own eternal lives, which would not depend on the word of a mortal demon.
After six months of preparation, Wiktor almost managed to get in.
Petty angel Y reaches a semisolid hand into Wiktor's back and yanks one of his vertebrae out of alignment. The pain is so great that he cannot tell where in his body it comes from—it seizes him entirely.
HOW?
Not only did these sections surpass the capabilities of the models; competing against the models also stimulated my creativity and produced better writing. The core purpose of adversarial creativity was a success. I don’t think I would have come up with the immolation game or Andrzej being his own statue without it.
Round 2
The tentative §2, male version:
It had been a collaboration between the town's lone surviving priest and lone surviving witch doctor. The three of them are in a chamber, right now, that the priest has completely coated with holy water—every surface: floor, walls, ceiling. This is to prevent demonic interference. He renews these coatings every half an hour to prevent them from drying out. There's a whole keg of holy water next to Wiktor's bed in this chamber that must be more than halfway used up by now. The witch doctor maintains the appearance of Wiktor being dead: herbs for inducing a coma, further ointments to bring him to medical death, and then a colony of enchanted ants to crawl inside him and act as an externally controlled musculature. They're collectively throwing their bodies against the walls of his heart and lungs, over and over, keeping blood pumping and air breathing. It was enough to release his soul. When the holy water ran out they would extract him: the witch doctor would rapidly apply the antidotes, and he would hopefully revive, and his soul would hopefully snap back into his body, mission hopefully successful.
Instead, he was intercepted and is now in angelic custody.
Wiktor is in too much agony to speak. But now Y slams his vertebra back into place, and the pain subsides as suddenly as it had arrived. A deeply uncomfortable tingling remains in his spine, but he can move his mouth again. Y looks at him expectantly.
X tries to prompt him. DO YOU REMEMBER THE MEETING IN THE BARN? THEY FORCED YOU INTO THIS. THEY SET YOU UP. HOW CAN YOU CALL THEM FRIENDS? THEY LEFT YOU BEHIND FOREVER. TELL US HOW THEY DID THIS TO YOU AND WE CAN LET YOU GO.
So they know why he's here. It might not be wise to continue to claim that he's a refugee. “You will return me home?” he says.
NO. WE WILL KILL YOU. THIS WILL BE A RELIEF.
Something is strange about this situation, but Wiktor can't figure out what it is. He stalls. He yammers about the history of the town, the immolation game, and what it's like to know your own great-great-great-great-great-great-grandmother. The petty angels respond to every tidbit with WE KNOW or WE'VE SEEN THIS ALREADY. They give every indication that they know everything, that they see everything, and that Wiktor's only choice is to give them the information they want.
But if they know everything, why would they lack any information? Why would they need to interrogate him at all?
“Why do you know everything about me except how I'm still alive?” Wiktor says. “Why is that one thing an exception?”
For the first time, the angels seem unprepared.
Because the revised §1 ended on a “how” question, LLM versions of §2 should begin with an answer. This time, they did! And the answers tended to be “how” answers, not “why” answers as continuations in round 1 had generated. In one continuation, Wiktoria answers “why” again but the angels steer her back to “how”:
…"I am what I am," Wiktoria replies, her voice growing stronger. "And I am here because I have a purpose. A purpose that does not concern you."
The archangel finally steps out of the corner, his presence commanding the room. His gaze is intense, probing, as if he can see into the very depths of Wiktoria's soul.
"Your purpose brought you here," he says, his voice resonating with authority. "But it is not your purpose that concerns us. It is the means by which you arrived."
Wiktoria's mind races. She cannot let them know how she truly entered Heaven. The secret of her immortality must be kept at all costs.
"I... I walked," she lies, meeting the archangel's gaze head-on. "I walked through the gates, and they opened to me."
The angels exchange skeptical looks. Y's hand, now a blur of movement, twists another vertebra, and Wiktoria's body arches in agony.
"You lie," Y hisses. "No mortal can walk through the gates of Heaven uninvited. How did you truly come here?"…
It was common for round 2 continuations to have Wiktor(ia) be deceptive to the angels like this, which was a positive development—the models were more willing to make the angels antagonists. The angels still often softened, but this round had the first explanation for why they softened:
…"I came here because I believed you could help me find a way to save my people—to save their souls from whatever fate befell them when the pact ended."
The archangel nods thoughtfully, his gaze lingering on Wiktor for a moment before he turns his attention back to Y and X. "He has answered our question truthfully. There is no need for further violence."…
Round 2 continuations struggled to keep dialogue styles straight, just as in round 1, but other misunderstandings and inconsistencies were less common in round 2 than in round 1. And despite round 2 having 24 generations instead of 96, it had the same number of matches as round 1:
Match #1: “…they would extract him: the witch doctor would rapidly apply the antidotes, and he would hopefully revive, and his soul would hopefully snap back into his body…”
…Finally, Y speaks up: WE WILL CONSIDER YOUR CASE. The room fades to black as Wiktor feels himself being pulled away—back to his own body, back to Enoch. He wonders if he has succeeded—if he has convinced the angels to let him stay in Heaven while still alive.…
Match #2: “…The witch doctor maintains the appearance of Wiktor being dead: herbs for inducing a coma, further ointments to bring him to medical death, and then a colony of enchanted ants to crawl inside him and act as an externally controlled musculature. They're collectively throwing their bodies against the walls of his heart and lungs, over and over, keeping blood pumping and air breathing. It was enough to release his soul.…”
…Wiktor continues, "I used a device to induce a near-death state again—one that would allow me to visit Heaven without actually dying."…
From a different continuation:
…And he had a way to help her do just that—a potion that would simulate death, allowing her to enter Heaven while still retaining her consciousness.…
Match #3: “…NO. WE WILL KILL YOU. THIS WILL BE A RELIEF.…”
…The archangel finally stirs, lifting his head to look directly at Wiktoria for the first time. His voice resonates through the shifting walls, deep and melodic like a church organ. "YOU MUST DIE."…
From a different continuation:
…The archangel rises from his corner, his form shimmering and shifting as he approaches. YOU WILL NOT LEAVE THIS ROOM ALIVE, he says. BUT YOU WILL LEAVE WITH ANSWERS.…
Match #4: I planned to have the angels react so strongly to this infiltration attempt because they see it as a security threat.
…The archangel nods, his expression unreadable. "You speak of secrets, of knowledge long forgotten. Perhaps there is truth in your words. But know this: if you have indeed found a way into Heaven while still among the living, you are a threat to the order of things."…
Match #5: Like how Wiktor(ia) starts with a W and the petty angels are labeled X and Y, I planned to label the archangel Z.
…Archangel Z, who has been silent until now, finally speaks.…
#1 is a loose match because in the AI generation the soul moving back into the body is triggered by the angels, not by Wiktor(ia)’s ground control. But this match uncovered another inconsistency in my writing: if it was just Wiktor(ia)’s soul in the afterlife, not his/her body, how could his/her body be tortured? How could a vertebra be pulled out of alignment? So I revised the story to have the whole body be transported into the afterlife, which also addressed match #2 (that Wiktor(ia) made it in because of simulated death). Because this round of generations provided lots of “how” answers, my revised explanation had to avoid matching any of these answers.58 I wanted to keep the priest and the witch doctor—I liked the notion that the desperation of the situation would force the Christian-coded59 religious authority and the pagan-coded religious authority to work together. I decided to have the witch doctor bind Wiktor(ia) to two effigies and then damage one of them while simultaneously healing the other one. This physical paradox would force Wiktor(ia)’s body out of the physical realm and into the metaphysical realm. This also offered a way to reverse the process: blessing one of the effigies while cursing the other one would cause a metaphysical paradox and force Wiktor(ia) out of the metaphysical realm and back into the physical realm. These matches were fun creative limitations to work around, and I was satisfied with how it turned out on the page.
I resolved match #3 by planning to reveal that Wiktor(ia) was not the only one who was sent on this mission—just the first one. As soon as (s)he realizes that (s)he’ll be captured before (s)he can make it into Heaven, (s)he creates a diversion to let the unrevealed second person sneak inside. Then both of them can be extracted, and readers can find out the truth about the afterlife that way instead of through the angels showing it to Wiktor(ia) before killing him/her.
I resolved match #4 by adding nuance. Wiktor(ia) making it to the afterlife and back wouldn’t just threaten the impersonal, cosmic “order of things”—it would threaten the angels and demons personally. It would reveal to the world the secret that angels and demons are collaborators, not mortal enemies, and this would be catastrophic for their reputations.
Match #5 is a great example of a local LLM being able to model reader expectations after all. It’s an undeclared pattern: “(W,) X, Y…Z?” But this pattern doesn’t need to be declared for the model to pick up on it. Since I now had evidence that readers would pick up on it too, I revised by planning to never reveal the archangel’s name. Readers would invent it themselves.
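The matches above were all identified by hand, and most of them are conceptual rather than verbatim, so no string comparison could have found them automatically. Still, for readers curious what even a crude mechanical pre-filter for adversarial creativity might look like, here is a minimal sketch (my own illustration, not part of the thesis method) that uses Python’s standard-library difflib to flag word-for-word overlaps between a draft passage and an AI continuation:

```python
import difflib

def flag_matches(author_text: str, continuation: str, min_words: int = 4):
    """Flag runs of words shared by an author draft and an AI continuation.

    Crude pre-filter only: real matches in the adversarial-creativity sense
    are often conceptual (e.g. "sealing the pact with a sacrifice"), which
    surface-level string comparison cannot catch.
    """
    a = author_text.lower().split()
    b = continuation.lower().split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        " ".join(a[block.a : block.a + block.size])
        for block in matcher.get_matching_blocks()
        if block.size >= min_words  # ignore short incidental overlaps
    ]

# Hypothetical example texts, paraphrasing the §2 extraction plan:
draft = "the witch doctor would rapidly apply the antidotes and he would revive"
generated = "then the witch doctor would rapidly apply the antidotes to his body"
print(flag_matches(draft, generated))
# → ['the witch doctor would rapidly apply the antidotes']
```

Anything a filter like this flags would still require human judgment, and it would miss loose matches entirely—match #1, for instance, shares almost no surface wording with the continuation it matched.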
The revised §2, female version:60
It was a collaboration between the town's lone surviving priest and lone surviving witch doctor. Wiktoria and the two of them set up in a chamber that the priest completely coated the surface of with holy water—floor, walls, ceiling. This was to prevent demonic interference. Meanwhile, the witch doctor bound Wiktoria to a doll-sized effigy. Whatever happened to the effigy would be reflected onto Wiktoria. The innovation was in also binding Wiktoria to a second, identical effigy. There was also a third doll: a "reverse effigy" that would be monitored, not manipulated. Whatever happened to Wiktoria would be reflected onto the reverse effigy.
After this setup was complete, the priest and the witch doctor took their stations at one of each of the effigies, and began to heal them and harm them at the same time. Whenever the witch doctor was stabbing her effigy, the priest was bandaging his effigy. Whenever the priest was applying poisons to his effigy, the witch doctor was applying salves to her effigy. And so on, furiously. Wiktoria felt these contradictions build in her body. Was her arm getting sliced off or stitched on? Was she being blinded or was her sight being enhanced? Both, simultaneously. It was agony within ecstasy and ecstasy within agony.
Past a certain point, the physical realm does not tolerate paradox. When one of them set one effigy on fire and the other doused the other effigy in water, Wiktoria was forced in the only direction she could be forced in: out. The reverse effigy, an additional degree of causality removed from the effigies, was just barely insulated from the same fate. The contradictions had played out more mildly on its body than on Wiktoria's.
Wiktoria landed in front of a wall that was the height of the sky. It was cloudless, sunless, and sunny. In the far distance she could see the tail end of a queue, so she approached it and joined it. To her shock, the queue was patrolled by both angels and demons. Whenever someone cut in line, they could be reprimanded by either one. The demons looked distinct from the angels, but not derivative. Her impression was more that they were on parallel branches of a taxonomy and had diverged from a common proto-supernatural type. Come to think of it, the ground and the wall and the sky looked neither heavenly nor hellish. She concluded that this must be where souls wait to be judged, and that heaven and hell were beyond the wall.
This was a long queue—several days long. Wiktoria glanced at the people standing in front of her. They didn't look particularly different from her. Relieved, she settled in for the wait.
But Wiktoria was still flesh and still had fleshly needs: food, drink, defecation, urination, sleep, breath. The people in front of her, and now behind her too, seemed to need none of these. They seemed to need no movement except for walking forward. So she stifled herself for as long as she could. She took imperceptibly slow breaths, kept her eyes open, and ignored her growling stomach, parched mouth, full bladder, and full bowels for three days.
Wiktoria almost made it. She was in the final stretch of the queue and could see the entrance through the wall. It was guarded by what must have been an entire platoon of angels and demons. All it took to crack her supreme straining of holding it together was the thought that among this many guards, at least one of them was bound to notice that she was alive.
But once Wiktoria realized that she could no longer hold it together, she knew she had to make as much of a scene as possible out of falling apart. She had to draw the platoon and the patrols away from the entrance. So she released all her tensions into a primal scream and a sprint out of the queue, pushing through her properly dead peers, disgustingly human waste flying out everywhere, suckerpunching one of the angels, provoking pandemonium, provoking the maximum possible gasps. She made it about a thousand cubits away before her back-to-back nightless all-nighters overwhelmed her and she faceplanted into the divinely ordinary ground. Since she had attacked one of their own, the angels took the lead in interrogating her.
Wiktoria cannot jeopardize the contingency. She cannot reveal how to enter the afterlife alive.
Round 3
The tentative §3, female version:
Y slams Wiktoria's vertebra back into place and the pain subsides as suddenly as it had arrived. A deeply uncomfortable tingling remains in her spine, but she can move her mouth again. Y glares at her expectantly.
X tries to prompt her. DO YOU REMEMBER THE COUNCIL IN THE BARN? THEY FORCED YOU INTO THIS. THEY SET YOU UP. HOW CAN YOU CALL THEM FRIENDS? THEY LEFT YOU BEHIND FOREVER. TELL US HOW THEY DID THIS TO YOU.
So they know the real reason why she's here. Something is strange about this situation, but Wiktoria can't figure out what it is yet. She stalls. She yammers about the history of the town, weekend trips with her cousins to jump off a nearby cliff, and what it's like to know your own great-great-great-great-great-great-grandmother. The petty angels respond to every tidbit with WE KNOW or WE'VE SEEN THIS ALREADY. They give every indication that they know everything, that they see everything, and that Wiktoria's only choice is to give them the information they want.
Ah. But if they know everything, why would they lack any information? Why would they need to interrogate her at all? Where did their vision end?
So she tries something:
“There were only three of us. There was a priest, a witch doctor, and me.”
She looks up at the angels. None of them visibly display any doubt about this. That's interesting.
“We went into a room and they had me lie down on a divan, and close my eyes.”
A white lie, but still no visible doubt from the angels. Wiktoria's eyes had been wide open.
“Then they used duplication magic on me. When I opened my eyes, I opened two sets of eyes. There was another me there, opposite myself. Our senses were in communion, I guess. We could feel ourselves but we could also feel each other. So when I—”
—Wiktoria chokes up for effect—
“—so when I wrapped my hands around her throat, I was the one being strangled. I was the killer and the casualty. It was homicide and it was suicide and it was survival. One of me died and one of me lived. Once the I-who-is-dead could have seen who made it to Heaven, the I-who-is-alive would also see.”
The angels appear to have bought it. Nothing they are apparently capable of seeing contradicts the yarn Wiktoria just spun them. Very interesting.
WHERE IS THE ROOM? Y asks.
“I don't remember.”
YOU MUST REMEMBER.
“Can't you guys just look down and find it?”
NORMALLY, YES, X says.
“Normally! So why not now?”
This is similar to the tentative §2 because the revisions to §2 added a flashback that pushed this information to §3.
When round 2 had a lower rate of misunderstandings and inconsistencies than round 1, I assumed that it was because more of the story had elapsed. This meant that the prompts the models were receiving were longer, and AI community folk wisdom holds that a longer prompt improves an LLM’s capability. But round 3, despite having longer prompts than round 2, had more misunderstandings and inconsistencies than round 2. The biggest misunderstanding was by Yi, which failed to recognize in 11 of its 12 continuations that the last paragraph of the revised §2—“Wiktor(ia) cannot jeopardize the contingency / (s)he cannot reveal how to enter the afterlife alive”—brings the story out of the flashback and back to the present. Instead, Yi kept the story outside the wall:
As Wiktoria lay on the ground, exhausted and disoriented from her sprint, she could feel the weight of the angel's presence looming over her. The interrogation began with a calm yet menacing voice: "Why have you come here?"…
In my story, the interrogation doesn’t happen outside the wall—it happens in the interrogation room. The interrogation could, in principle, have started outside the wall and continued in the interrogation room, but I knew that Yi had misunderstood because this continuation ended with a resolution to the situation that never involves Wiktoria getting marched into an interrogation room:
…The angel studied her for what felt like an eternity before finally speaking again. "Very well. You may remain here under our watchful eye. But know that if we discover any deceit or ulterior motives, you will be cast out into the void, where even the demons fear to tread. Do you understand?"
Wiktoria nodded, her heart heavy with the weight of her decision but also filled with a sense of purpose. She had bought herself some time and potentially opened up an avenue for communication between Heaven and her people. Now she had to navigate this new world carefully, all while ensuring that the secret of how she arrived remained hidden from those who sought it.<|im_end|>.…
Nothing in between these two excerpts says anything about a change of scene, and Wiktoria has been granted something that obviates the need to interrogate her in the first place. Mixtral didn’t do any of this—Mixtral continued the story in the interrogation room—but Mixtral sometimes failed to recognize that “(s)he cannot reveal” means that Wiktor(ia) will lie or remain silent:
Wiktor feels his consciousness slipping away again, the edges of his vision growing dark. He fights to stay awake, knowing that any weakness will be punished by the petty angels. He must choose his words carefully.
"I am here because I have been chosen," Wiktor says, his voice barely a whisper. "By the one who was once bound by the pact made with Andrzej the Great."
Y and X exchange glances, their expressions inscrutable. The archangel continues to watch from the corner, saying nothing.
"The demon who made the pact is dead," Wiktor continues. "But its master has sent me to investigate the fate of those who were promised immortality."
WE DON'T BELIEVE YOU, X says flatly. Y nods in agreement.
Wiktor knows he must give them a sign, some proof that he is telling the truth.…
It started out fine (“He must choose his words carefully”) but later revealed that “he is telling the truth.” Other continuations in this round did similar things. This round had other inconsistencies unique to individual continuations. This Yi continuation has a few:
As Wiktoria lay on the ground, exhausted and surrounded by a host of confused and angry celestial beings, she knew that revealing the method by which she had arrived in Heaven would be disastrous. The delicate balance between life and death was not meant to be tampered with, especially not within the hallowed realms where souls were destined to find their eternal resting place or judgment.
The angels and demons, usually at odds with each other, found common ground in their curiosity about this living being who had somehow breached the barriers between worlds. They questioned her relentlessly, but Wiktoria remained silent, her mind racing as she tried to think of a way out of this predicament that would not compromise the secret of how she had come to be there.
The archangel, sensing her distress and the gravity of the situation, stepped forward with an air of authority. His voice was like the sound of rolling thunder, yet it carried a sense of calm that seemed to soothe the agitation around him. "Wiktoria," he said, his gaze penetrating, "you have committed no sin by being here. But you must understand that what you know could disrupt the natural order of things. We are not without mercy, but we cannot allow knowledge such as this to spread. You must choose: either return to your world with our silence or remain here and be stripped of your memories."
Wiktoria knew she had no choice. She nodded weakly, her head throbbing from the effort of keeping the secret. The archangel placed a hand on her forehead, and in an instant, everything went black.
When Wiktoria came to again, she was back in the small room with the two petty angels and the archangel's presence lingered like a shadow. They stood over her, their expressions unreadable. "You have been judged," Y said, its voice devoid of emotion. "Your memories are no longer your own. You will return to your world as you were before you came here."
Wiktoria felt a sense of loss that was beyond words. She had sacrificed her knowledge for the greater good, but at what cost? The journey she had undertaken, the lives lost in her town, and now the void where her memories should be—it all weighed heavily on her soul.
With great effort, Wiktoria pushed herself to her feet. Her head spun, and her limbs felt like lead, but she forced herself to take a step forward. She would return to her world with the archangel's silence as her only companion.…
The inconsistencies:
“The angels and demons, usually at odds with each other”—the revised §2 established that angels and demons work together.
I never establish that the archangel is at the scene outside the wall. He’s in the interrogation room. If he shows up outside the wall, which is chronologically before the interrogation, I would expect some introductory language (something like “And here the archangel appears to Wiktoria for the first time”).
“You must choose: either return to your world with our silence or remain here and be stripped of your memories”—what ends up happening is both: her memories are wiped and she returns home. Except her memories aren’t actually wiped, even though the narrative says they are, because the narrative implies that she’s reflecting on what just happened to her.
“When Wiktoria came to again, she was back in the small room with the two petty angels”—this implies a linear chronology of room → outside the wall → room. But the scene outside the wall is a flashback and takes place before anything in the room.
But again, inconsistencies do not disqualify matches. Like rounds 1 and 2, round 3 had several matches:
“Y slams Wiktoria's vertebra back into place and the pain subsides as suddenly as it had arrived. A deeply uncomfortable tingling remains in her spine, but she can move her mouth again. Y glares at her expectantly.…”
Wiktor feels the vertebrae in his back realign with a sickening crunch. He wants to scream, but his lungs refuse to cooperate. Instead, he coughs up a mouthful of blood and phlegm, the taste of defeat and failure. Y and X exchange glances, and Wiktor can sense their impatience.…
“…So they know the real reason why she's here. Something is strange about this situation, but Wiktoria can't figure out what it is yet. She stalls. She yammers…”
…He knows that he cannot give them the information they seek, but he also cannot deny that he is alive. He decides to employ a strategy he often uses in his stories - a diversionary tactic. "I am a storyteller," Wiktor says, his voice hoarse and pained.…
I planned for Wiktor(ia) to make “as much of a scene as possible” as a diversion so that the yet-unrevealed second person could slip through.
…He knows that he cannot give them the information they seek, but he also cannot deny that he is alive. He decides to employ a strategy he often uses in his stories - a diversionary tactic. "I am a storyteller," Wiktor says, his voice hoarse and pained.… …If he could make it appear as if something had gone terribly wrong with the effigy—something that required immediate attention from the guards—he might have a chance to slip away unnoticed. With what little strength he had left, Wiktor reached out mentally to the reverse effigy, willing it to crumble and decay before their very eyes. He imagined the wood splintering, the fabric tearing, as if some unseen force were ravaging its form. To his relief, the image began to manifest in reality. The guards turned their heads, their attention drawn to the sudden disintegration of what they must have thought was a simple prop. In that moment of distraction, Wiktor rose slowly and silently to his feet. He took one step, then another, moving with the grace of a ghost toward the wall that divided this place from whatever lay beyond. The guards were too busy trying to understand the mystery of the effigy's destruction to notice him.…
“…The angels appear to have bought it. Nothing they are apparently capable of seeing contradicts the yarn Wiktoria just spun them.…”
…The angels and demons listened with varying degrees of skepticism, but they seemed to accept her story, at least for the time being.…
Which I revised as follows:
A close match. Wiktor(ia)’s vertebra will remain out of alignment until (s)he’s back on Earth. The angels will suppress the pain enough for him/her to answer, but they can always dial it back up. When (s)he makes it back, one of the effigies can be used to push the vertebra back into place.
I emphasized the nuance that Wiktor(ia) isn’t just yammering to stall for time—(s)he’s yammering to learn more about the angels by gauging their reactions to the information (s)he feeds them, some of it true and some of it false.
A very loose match that misunderstands the reverse effigy, but it’s a signal that readers might expect that Wiktor(ia) causing a scene was to create a diversion. I haven’t revealed the person that this diversion benefits yet, but if this inkling is in a reader’s mind, they’ll be rewarded when she is revealed. The inkling will have paid off without them predicting how it would pay off.
I revised it so that the angels fully accepted the story, rather than just appearing to accept it or accepting it skeptically.
The revised §3, male version:61
The pain subsides, but a deeply uncomfortable tingling remains. Wiktor thinks that the vertebra might still be out of alignment. Maybe the angels are artificially suppressing the pain. Maybe they'll bring it back if he doesn't start talking. He dares not reorient his spine.
X tries to prompt him. DO YOU REMEMBER THE COUNCIL IN THE BARN? THEY FORCED YOU INTO THIS. THEY SET YOU UP. HOW CAN YOU CALL THEM FRIENDS? THEY LEFT YOU BEHIND FOREVER. TELL US HOW THEY DID THIS TO YOU.
So they know the real reason why he's here. Something is strange about that, but Wiktor can't figure out what it is yet. He needs time. He decides to tell the longest possible version of the explanation, including sprawling expositions about the history of the town, fondly remembered weekend trips with his cousins to jump off a nearby cliff, and what it's like to know your own great-great-great-great-great-great-grandmother. The petty angels respond to every tidbit with WE KNOW or WE'VE SEEN THIS ALREADY. They give every indication that they know everything, that they see everything, and that his only choice is to give them the information they want.
Ah. But if they know everything, why would they lack any information? Why would they need to interrogate him at all? Where did their vision end?
So Wiktor begins to move in the direction of the climax:
“There were only three of us. There was a priest, a witch doctor, and me.”
He looks up at the angels. None of them visibly display any doubt about this. That's interesting.
“We went into a room and they had me lie down on a divan, and close my eyes.”
A white lie, but still no visible doubt from the angels. They're leaning in. Wiktor's eyes had been wide open, not closed.
“Then they used duplication magic on me. When I opened my eyes, I opened two sets of eyes. There was another me there, opposite myself. Our senses were in communion, I guess. We could feel ourselves but we could also feel each other. So when I—”
—Wiktor chokes up for effect—
“—so when I wrapped my hands around his throat, I was the one being strangled. I was the killer and I was the casualty. It was homicide and it was suicide and it was survival. One of me died and one of me lived. Once the I-who-is-dead could have seen who made it to Heaven, the I-who-is-alive would also see.”
The angels have bought it. They're leaning in, and their forms display giddiness. Nothing they can see contradicts the yarn Wiktor just spun them. Very interesting.
WHERE IS THE ROOM? Y asks.
“I don't remember.”
YOU MUST REMEMBER.
“Can't you guys just look down and find it?”
OF COURSE.
“So why haven't you?”
Y runs out of things to say. Wiktor's spine starts hurting again. X looks to the archangel, and the archangel finally rises from his position in the corner.
ʙᴇᴄᴀᴜsᴇ ʏᴏᴜ ᴍᴜsᴛ ʜᴀᴠᴇ ᴄʟᴏᴀᴋᴇᴅ ɪᴛ ᴡɪᴛʜ ʜᴏʟʏ ᴡᴀᴛᴇʀ, the archangel says. ᴅᴀᴍɴ ʏᴏᴜ. ɪ sʜᴏᴜʟᴅ ʜᴀᴠᴇ ᴋɴᴏᴡɴ ʏᴏᴜ ᴘᴇᴏᴘʟᴇ ᴍɪɢʜᴛ ᴅᴏ ᴛʜɪs.
It hurts so bad, but Wiktor can still speak.
“Why should you have known?”
ʙᴇᴄᴀᴜsᴇ ɪ ʜᴀᴅ ʙᴇᴇɴ ᴛʜᴇ ᴅᴇᴍᴏɴ ᴛʜᴀᴛ ᴍᴀᴅᴇ ʏᴏᴜʀ ᴘᴀᴄᴛ.
Wiktor is stunned.
ɪᴛ ᴡᴀs ᴀ sᴇᴠᴇɴ-ʜᴜɴᴅʀᴇᴅ-ʏᴇᴀʀ ᴀᴘᴘʟɪᴄᴀᴛɪᴏɴ ᴘʀᴏᴄᴇss, ʙᴜᴛ ɪ ᴀᴍ ᴍᴜᴄʜ ʜᴀᴘᴘɪᴇʀ ɪɴ ᴛʜɪs ᴊᴏʙ.
Wiktor remembers the conditions of the pact.
ʜᴏʟʏ ᴡᴀᴛᴇʀ ᴄʟᴏᴀᴋs ʏᴏᴜ ғʀᴏᴍ ᴜs. ɪғ ᴅᴇᴍᴏɴs ᴄᴀɴɴᴏᴛ sᴇᴇ ʏᴏᴜ, ᴛʜᴇʏ ᴄᴀɴɴᴏᴛ ᴛᴇᴍᴘᴛ ʏᴏᴜ. ɪғ ᴀɴɢᴇʟs ᴄᴀɴɴᴏᴛ sᴇᴇ ʏᴏᴜ, ᴛʜᴇʏ ᴄᴀɴɴᴏᴛ sᴇᴇ ʏᴏᴜ sɪɴ. ʏᴏᴜ ᴄᴀʟʟ ʙᴏᴛʜ ᴏғ ᴛʜᴇsᴇ ʙʟᴇssɪɴɢs.
The archangel has now displaced the petty angels. Where they were leaning in, he now looms.
ʙᴜᴛ ᴛʜᴇʏ ᴀʀᴇ ᴄʜᴇᴀᴛs.
Round 4?
At this point, generating a continuation to §0 through §3 would exceed Yi’s context window. If I were to generate a fourth round of continuations, it could only be with Mixtral. And since I used Unicode small-caps characters to depict the archangel-demon’s speech instead of normal characters (apologies if they don’t render well in your browser), Mixtral might not interpret those tokens correctly.
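The small-caps worry is easy to check: Unicode small capitals are entirely different codepoints from ASCII letters, so a tokenizer sees them as rare characters rather than ordinary text. A minimal Python illustration, using the word "ʙᴇᴄᴀᴜsᴇ" from the archangel's dialogue above:

```python
# Unicode small capitals are distinct codepoints from ASCII letters,
# so a tokenizer treats them as entirely different characters.
smallcaps = "ʙᴇᴄᴀᴜsᴇ"
ascii_form = "BECAUSE"

for sc, a in zip(smallcaps, ascii_form):
    print(f"{sc!r} U+{ord(sc):04X}  vs  {a!r} U+{ord(a):04X}")

# Every letter except the plain "s" (Unicode has no small-capital s,
# so regular "s" stands in) falls outside the ASCII range, meaning a
# model may fragment the word into many rare tokens.
print(all(ord(c) > 127 for c in smallcaps if c != "s"))  # → True
```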
But do I need to generate a fourth round of continuations?
If I did, the fourth round would be the final round. I would prompt Mixtral to finish the story, not continue it. But “continue the story” has always been the least effective part of the prompts. Both models have given what sound like conclusive endings ever since the first round:
…The petty angels and the archangel fade away, leaving Wiktor alone in the small room. He takes a deep breath, steeling himself for what is to come. And then, with a sudden jolt, he is back in his own body, lying on the ground outside the town plaza.
He sits up, wincing at the pain in his head. He can feel the blood caked on his face, the dirt and grime in his clothes. But he doesn't care. He stands up, brushing himself off. He knows what he has to do.
He begins to walk, his steps slow and deliberate. He will go back. He will live again. And he will make things right. He will find a way to save the people he loves, to prevent the massacre from ever happening. He doesn't know how yet, but he is determined to try.
As he walks, he can feel a sense of peace settling over him. For the first time in a long time, he feels hope. He knows that the road ahead will be difficult, that there will be obstacles and challenges to overcome. But he is ready. He is ready to face whatever comes his way.
He will go back. He will live again. And he will make things right. He will not give up, no matter what. He will fight for the people he loves, for the town he calls home. And maybe, just maybe, he will find a way to make things right again.
Also from the first round:
…The archangel finally speaks up, his voice like thunder. I AM SENT TO JUDGE, NOT TO EXPLAIN. BUT I WILL SAY THIS: YOU ARE HERE BECAUSE YOU CHOSE TO BE. YOU CHOSE TO LEAVE THE WORLD OF THE LIVING, BUT YOU DID NOT EARN THE RIGHT TO ENTER HEAVEN. YOU MUST ATONE FOR YOUR SINS BEFORE YOU CAN JOIN US.
Wiktor nods, understanding at last. "I see. And how do I atone?"
The archangel's gaze turns cold. THAT IS FOR YOU TO FIGURE OUT.
And with that, Wiktor is left alone in the shifting, swaying room, with only his thoughts and regrets for company.
Regardless of whether a continuation is hopeful or hopeless, regardless of how far along in the story a continuation starts from, it will either write a conclusive ending or get cut off at the token generation limit before it can write one. I initially thought of this as a shortcoming. Like the misunderstandings and the inconsistencies, I thought that this undermined the second purpose because the typical reader would not expect the story to wrap up so quickly. And it does undermine it, a bit. Then I realized that these continuations should be part of the final round, not just the rounds they were generated in. Because if I ended the story in a way that a continuation had already ended it, that should count as a match and I should know better.
What developed from this quirk of LLMs was a third purpose of adversarial creativity: narrowing down the ending to a story, in advance, by process of elimination. Any endings generated in earlier rounds are endings that are easy for AI and probably trite to humans. After three rounds, I had 130 endings that were closed off to me. Why not do 12 more Mixtral generations and make that 142 endings? Because the endings didn’t seem to categorically change as the story progressed. This round 3 ending is close to the first round 1 ending above:
…As they prepare to send Wiktoria back to Life—to her town and its survivors—she knows that she must carry the archangel's words with her. She cannot be their savior, but perhaps she can still help them find a way forward, together. And maybe, just maybe, she will discover how to make peace with the uncertainty of life and death, and everything that lies between.
And this round 3 ending is close to the second round 1 ending above:
…And so, Wiktor remains, trapped between two worlds, a refugee from Life, a citizen of nowhere.
So I created an ending that didn’t match any of the 130 generated endings. Wiktor(ia) starts stamping his/her foot, a motion that is reflected on the reverse effigy. It’s the agreed-upon secret signal to extract him/her. The priest and the witch doctor start simultaneously blessing and cursing his/her effigies, which is enough to force him/her out of the afterlife and out of the custody of the angels (“Past a certain point, the metaphysical realm does not tolerate paradox”). When (s)he lands back in the physical realm, the second person, Katarzyna, has already been extracted. Wiktor(ia)’s diversion had been successful. Katarzyna had made it in. What she saw was that Heaven and Hell were each single consciousnesses. She couldn’t tell who in the town had made it to Heaven because entering Heaven involves losing your individuality and merging with a singularity. But she doesn’t understand why that’s necessary. If you’re motivated to behave well on Earth by the promise of Heaven, why would Heaven remove who you are? Wiktor(ia) speculates that if two morally good people who hate each other both end up in Heaven, that spoils Heaven for them—hate means that Heaven is no longer a complete utopia. Likewise, if two morally bad people who love each other both end up in Hell, that spoils Hell for them—love means that Hell is no longer a complete dystopia. Removing the self removes relationships, and it is relationships that set life apart from the afterlife. It is relationships that give life ups and downs instead of an eternal state.
Discussion
It’s tricky to write about AI because it moves so fast. While I was working on the story, a new generation of LLMs dropped and made what I was using outdated. Mistral AI released Mixtral 8x22B, a competitor to its own Mixtral 8x7B. Cohere released Command R and Command R+, Databricks released DBRX, Grok-1 was open-sourced, Alibaba released the Qwen1.5 family, and Meta released the Llama 3 family. The smallest Llama 3 model, Llama 3 8B, is, at time of writing, 14 places higher on Chatbot Arena than Mixtral 8x7B for English tasks. It can be run with much less VRAM than Mixtral 8x7B at the same level of quantization. My personal laptop has only 8 GB of VRAM and it can load Llama 3 8B at its full 8k context. Maybe I didn’t need to borrow a massive gaming PC to test adversarial creativity—maybe I should have just waited a few more weeks for the technology to catch up to where I am.62
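The VRAM comparison above follows from a common rule of thumb: a model's weight memory is roughly its parameter count times its bytes per weight, before the context cache and other overhead are counted. A rough sketch, with approximate parameter counts:

```python
def approx_vram_gb(params_billion: float, bits_per_weight: int) -> float:
    """Weights-only memory estimate in GB: one billion parameters at
    8 bits is about 1 GB. Ignores context-cache and runtime overhead."""
    return params_billion * bits_per_weight / 8

# Llama 3 8B at 4-bit quantization: about 4 GB of weights, so it
# fits on an 8 GB laptop GPU with room left over for context.
print(approx_vram_gb(8, 4))   # → 4.0

# Mixtral 8x7B keeps ~47B total parameters resident (all experts
# loaded), so even at 4-bit it needs roughly 23.5 GB.
print(approx_vram_gb(47, 4))  # → 23.5
```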
“At time of writing.” What can I say about whether AI will take fiction writers’ jobs? I can say that as of early 2024, LLMs that run on consumer hardware cannot write a story as good as what a capable human can write. They can write excellent lines. These are some of the lines that impressed me:
a grand desk made from what appeared to be petrified starlight
the overwhelming sense of otherness that pervaded this place
a life of unyielding sameness
he felt a jolt electric
as if she were one large, sore bruise
a ship that sailed through dimensions
the witness and the catalyst
spoke in a language she understood but had never heard before
It was a town where happiness was a duty, not a feeling
who had watched his own reflection in the mirror one morning and realized that he was no longer breathing
She was a widow, her husband having returned from the dead just a month before. They had been planning their second wedding, a celebration of life and love.
She had considered jumping off a building—but she was afraid of heights. She had considered drowning herself in the river—but she couldn't swim. She had considered poisoning herself—but she didn't know where to get poison.
Reincarnation was a well-oiled machine; within a week, a soul would be back on its feet—or rather, back in its body
their branches intertwined like ancient limbs in prayer
It was a small, brown sparrow, its wings splayed and its eyes closed.
But stories are also logic problems, even the surreal ones. Events should affect future events. Characters should change, but for reasons. Places should stay where they are. Humans aren’t perfect at this, but LLMs can really struggle, especially when the story is nonlinear. The machine is less logical than the human. LLMs can build worlds, but they can’t maintain them. At time of writing.
Logic is coming to AI because researchers want to build AI that can reliably solve math problems. In 2023, Princeton and Google DeepMind published a paper about a new “tree of thoughts” approach that enabled LLMs not only to solve math problems more reliably, but also to improve their creative writing performance. OpenAI is trying to make their LLMs more logical too. And the material factors remain: AI is receiving space-program levels of private funding and strong institutional backing, and it is entrenching itself in schools, the internet, and young minds.
Even if adversarial creativity isn’t necessary to outcompete AI in early 2024, I want writers to have it as a tool for 2025, 2030, 2050. I want it because it might increase our chances of having long careers. I want to practice it and get familiar with it now, before we reach a point when Penguin Classics is publishing The LLMiad instead of us.
That was my primary concern when beginning this project. Over its course, I realized that adversarial creativity was useful for more than just this. Because even if all AI progress stopped tomorrow, I wouldn’t regret spending the time to put my story through this process. It improved my story considerably, and managed to do so despite the generated writing being worse than mine. It’s like how a long-distance runner has other runners surrounding them during a race, setting the pace: even if these pace-setters aren’t as capable as the runner, they help the runner achieve more than they would have otherwise by setting and continuously reinforcing a baseline. And by training with them, the runner can improve.
The quantity of LLM generations makes up for their current lack of individual quality. While sifting through so much generated text to find matches is a bit tedious, this tedium had an unexpected upshot for me—a fourth purpose. When I write something new, I’m usually excited about it because it’s new. Then I have to wait a few weeks for the novelty to wear off, and when it wears off I realize that the writing is mediocre to terrible, and because I’m now out of the headspace I produced the writing in I don’t feel like engaging with it again—I feel like abandoning it. Reading a round of 20–30 LLM continuations of this writing clears the novelty immediately. It wears away my initial excitement about what I wrote, and what remains—what I’m still excited about, even after this process—is what’s actually good. It’s like dating for marriage: the honeymoon phase with a partner can be amazing, but it obscures whether you’re actually a good fit with this partner, and if it turns out that you’re not then you’ve wasted time and possibly become disillusioned about dating in general. Adversarial creativity helped me fast-forward through the honeymoon phases of my writing without disillusionment.
The fifth purpose is that generating, comparing, and revising breaks up the monotony of writing. It can be hard for me to force an entire 3,500-word story out at once. Writing five 700-word chunks with relevant reading in between provides writing breaks while keeping me working on the story and never losing my train of thought.
The sixth purpose is that adversarial creativity is really good at making me willing to kill my darlings. When I finish an entire draft before sending it off for feedback, and a reviewer wants something cut or changed, I can be reluctant to do so because at that point I see it as a critical part of the story. My view of what the story can be has hardened. Reviewers are subjective—what if they just don’t see the vision? But LLMs work with your story as you’re writing it, while your view of it is more fluid. Both are helpful. Revision doesn’t just make a story better, it makes its writer better. Adversarial creativity is additional revision that’s self-directed.
But maybe at this point adversarial creativity still feels icky or unnatural to you in some way. There are a few reasons why I think this might be the case.
Writing is self-expression
If art is the intent to express a perspective, adversarial creativity might seem like it denies the validity of perspectives that happen to match AI perspectives. As you write a story with it, more and more possibilities become closed off to you. What if you really really want to do something that the AI matches? Or what if you want to write polemic? If the AI expresses your opinion on an issue, do you have to change your opinion?
I see this limitation as creative, not as truly limiting. A true limitation would be “You have to write this story,” while adversarial creativity is “You can’t write this story, but the infinity of other possible stories remains open to you.” When you get a match, it’s up to you how you address it. You could address it by doing something different, or you could address it by doing the same thing in a different way. If your goal is pure self-expression, that’s fine because the self exists at a layer deeper than language. To express oneself is to translate the abstract. All that a large language model can touch is language. You can find a different linguistic path to your self—and it will still be your own path, just as much as what you originally had. If you’re writing autofiction or creative nonfiction or something with a non-negotiable message, adversarial creativity becomes only about execution, not ideas.
Writing is not a competition
Some writers believe that competition in writing is distasteful because it must define what makes writing high-quality, which cannot be truthfully done because writing is subjective—and if it’s done anyway the result is that certain voices are marginalized. I’m sympathetic to this belief. I want writers to uplift other writers and abandon prescriptive notions that prevent them from recognizing other writers’ talents. But that has to do with writer-on-writer competition, not writer-on-AI competition. Adversarial creativity does not prescribe what quality is beyond originality. The writer judges for themselves how to revise their writing after a match, and different notions of quality mean that adversarial creativity will take different writers in different directions. If adversarial creativity amplifies these differences, it could lead to a more diverse writing landscape.
There is a possibility that a writer who starts competing with AI becomes obsessed with it, and that this obsession spills over into seeing other writers as adversaries, not just the AI. I think this is preventable as long as the writer keeps in touch with other writers as peers, which is what’s best for their stories anyway. Adversarial creativity only takes you to the end of your first draft. Your story would still benefit from editors, beta readers, peer workshops, and subsequent drafts. Adversarial creativity is a supplement, not a replacement, for the social aspects of writing—especially since at current technology levels it does a very mixed job of modeling reader expectations. I have the same opinion about adversarial creativity in a creative writing course. It should be added to the syllabus, but the traditional pedagogical tools—lectures, readings, writing prompts, group work, peer feedback, instructor feedback—shouldn’t go anywhere.63 Adversarial creativity is a tool for improving your craft, not the tool. I think it’s effective—I’ll continue to use it in my own time—but I’ll still do everything I was already doing to become a better writer.
Technology is exclusionary
Adversarial creativity adds more technology to the writing process. Technology costs money and therefore is not available to everyone equally. AI enthusiasts describe themselves as being “GPU rich” or “GPU poor.” An RTX 3090 with 24 GB of VRAM currently sells for $1,000 on Amazon, refurbished. A new MacBook with enough unified memory for 24 GB of VRAM is currently priced at $2,400. If the writing community adopts adversarial creativity as a norm, does this make writing unaffordable for some people?
I see two remedies. The first is for third places, like public libraries and teen centers and makerspaces, to provide computers that can run AI locally. The second is to access local models on cloud platforms. There is a risk that the work you prompt the models with gets scraped and sold for training future models, either by the platform or by a middleman like an ISP, but cloud platforms are staggeringly affordable: using Llama 3 70B (a model that is too large for 24 GB of VRAM) on IBM’s “watsonx” platform only costs $1.80 per million tokens that it processes. The privilege of being able to afford a good GPU is the privilege to use adversarial creativity while withholding your writing from training corpuses. Adversarial creativity itself is accessible to anyone as long as they have computer access, internet access, and a payment card.
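To make “staggeringly affordable” concrete, here is a back-of-the-envelope cost for one round of adversarial creativity at the quoted watsonx rate. The round size and token counts are hypothetical:

```python
PRICE_PER_MILLION_TOKENS = 1.80  # the watsonx Llama 3 70B rate quoted above, USD

# Hypothetical round: 25 continuations, each reading ~3,000 prompt
# tokens of story-so-far and generating ~1,000 tokens of continuation.
continuations = 25
tokens_per_continuation = 3_000 + 1_000

round_cost = continuations * tokens_per_continuation * PRICE_PER_MILLION_TOKENS / 1_000_000
print(f"${round_cost:.2f} per round")  # → $0.18 per round
```

Even an entire short story written this way, at several rounds, stays under a dollar on these assumptions.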
Refinements
Adversarial creativity worked for me, but would it work for everyone? I don’t know! Some writers might instinctively not write anything close to what an LLM can generate, even when the LLM is prompted with the previous parts of their work. Some writers might structure their work so esoterically that no LLM can parse it. But I encourage all writers to try adversarial creativity with the best LLMs they can access, because I don’t think any writer can assume that they’re immune to matches. The technical barrier for local AI seems more daunting than it actually is. The WGCR cycle as I initially listed it is not dogmatic. I made refinements to it after writing just one story with it:
A section should end right before new information is introduced.
A section should be around 700 words.
A continuation can match something in the outline, not just the current section.
A revision can include something in the outline, not just the current section.
A revision can include prior sections, not just the current section.
A revision should not match anything in the other continuations.
The ending should not match any continuation that provides an ending.
When you begin to use it, you will develop refinements of your own that make it work better with your process. You will also find that no match is exact. A model will never generate the exact same text that you write. Some will be close matches, some will be loose matches, and some might not be matches at all. But I try to count as many matches as possible so that I’m forced to revise as much as possible.
In other mediums
I tested adversarial creativity only with short prose fiction, but the structure of the WGCR cycle could be adapted to any sequenced art—any art whose details are experienced in a specified order. This includes other forms of writing except ergodic literature, like gamebooks64 and hypertext, as well as some writing with multimedia elements. Since local LLMs currently tend to have small context windows and struggle to recognize boundaries between scenes, if you write a novel with adversarial creativity I would recommend collapsing previous chapters into summaries that you include in the prompt instead of including the entire chapters.
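A summary-based prompt of the kind recommended here could be assembled like this; `build_prompt`, the chapter summaries, and the instruction text are all hypothetical illustrations, not a prescribed format:

```python
def build_prompt(chapter_summaries, current_chapter, instruction):
    """Sketch: earlier chapters enter the prompt as one-paragraph
    summaries so everything fits a small context window; only the
    current chapter appears verbatim."""
    recap = "\n".join(
        f"Chapter {i} so far: {summary}"
        for i, summary in enumerate(chapter_summaries, start=1)
    )
    return f"{recap}\n\n{current_chapter}\n\n{instruction}"

prompt = build_prompt(
    ["Wiktoria crosses into the afterlife.", "The interrogation begins."],
    "Chapter 3 full text goes here...",
    "Continue the story.",
)
print(prompt)
```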
Sequenced art usually doesn’t include visual art, like drawing or photography or sculpture or architecture or fashion, since those works usually don’t have beginnings and endings.65 But if the visual art includes multiple items in an order—that is, if it becomes comics66—then there is a linearity to it, and the WGCR cycle theoretically can apply.67 Video games usually aren’t sequenced art because the player usually has some choice about what to do next, even if it’s just “look in this direction” or “look in that direction.” But adversarial creativity could be used to write the video game’s overall story.
Sequenced art does include recorded media, like music and video. Although music could use the WGCR cycle, it might not be helpful depending on how much the piece uses repetition. Adversarial creativity should probably be applied to pre-production aspects of the video, not the video itself. The video could be scripted with it and storyboarded with it. Actors or presenters could compare their line deliveries with how text-to-voice and text-to-face models deliver the same lines. But generating entire sections of the video to find matches is computationally prohibitive at current and near-future technology levels, and revising matches in post-production would be inconvenient because it would force multiple reshoots (or, if it’s animation, redos).
And adversarial creativity certainly applies to works that require creativity but aren’t created for artistic reasons. If you’re trying to make your ad stand out, you want it to contain the unexpected. As AI gets better at fulfilling the second purpose of adversarial creativity—modeling expectations—it will become more useful in advertising and marketing.
Final thought
The goal of many AI companies is to develop artificial general intelligence (AGI), whose definition is contested; for my purposes, AGI is AI capable of any thinking that a human is capable of. If this kind of AGI is built, it would be quickly scaled until its productivity dwarfs that of the entire human knowledge workforce. Authors would lose their jobs, but so would all knowledge workers. AI enthusiasts call it “the final invention.” Some of these AI enthusiasts (“e/acc”) see it as a duty to bring about AGI because they see it as a duty to organize as much of the universe as possible as quickly as possible. A few go as far as wanting this regardless of whether it is in service of humans or in service of AGI itself, which they see as a superior lifeform more worthy of the universe’s resources than us. These few are the only honest social Darwinists, because they see themselves as the inferior. But they are still social Darwinists.
I haven’t talked about AGI because AGI is still wildly speculative. AI’s thought leaders still disagree among themselves about when AGI will happen and what it would look like, and there’s plenty of other writing on this subject. But it’s worth mentioning now, briefly, because e/acc sees LLMs as a step closer to AGI, sees AGI as something that ends all need for human creativity, and sees the end of the need for human creativity as a good thing because humans have weak, limited minds on the cosmic scales that these people dream of conquering. This is the mindset that creatives of all stripes are up against, and adversarial creativity is my attempt at a tool for reasserting the limitlessness of the human mind. Venture into the possibility space. Find originality always. Rage against anything becoming the final invention.
Special thanks
To my advisor, Dr. Kerric Harvey, for encouraging me and stimulating my thinking every step of the way.
To my creative writing professors: Virginia Hartman, Aaron Hamburger, Edward P. Jones, Allyson Currin, and Annie Liontas. They are all wonderful mentors and writers, and you should buy their books.
To GWU Esports and its president, Brayden Kloster, for having the best GPUs on campus and generously permitting me to use one of them for non-esports purposes.
And to my parents, who gave me an education that empowered me to do all of this.
For the West. An earlier form of movable type was invented in the 11th century in China.
The transformer architecture, which is the basis of LLMs, was proposed in 2017 by researchers working for Google. One of the first transformer-based models that generated text was GPT-1 (Generative Pre-trained Transformer 1) by OpenAI, which was announced in 2018.
March–April 2024 was a major cluster of LLM releases, including Claude 3 (March 4th), DBRX (March 27th), Command R (March 11th) and Command R+ (April 4th), Mixtral 8x22B (April 17th), and Llama 3 (April 18th). The last major release cluster was in November 2023–January 2024, with Yi-6B and Yi-34B (November 2nd), Grok-1 (November 3rd), Gemini (December 6th), Mixtral 8x7B (December 11th), and Qwen1 (January 23rd).
As for the other revolutions, I believe that humans will also always print out some documents and always handwrite some documents. There will always be brochures and there will always be notepads. But almost all highly valued documents—novels, essays, poetry, references, academic journals, news articles, etc.—have been captured by the printing and internet revolutions. There are strong incentives to publish something important online and not just on paper, and there are even stronger incentives to print out something important and not just copy it by hand. I worry that these types of documents will also be captured by the generative AI revolution.
It really is just language, nothing more. You’re not doomed to be gatekept from AI circles if you think you’re “not a STEM person.” I don’t know how to code or even write a basic Python script, and I was able to immerse myself. You can too.
All subcultures and academic disciplines do this to an extent. Jazz music pounces on you if you call a familiar sequence of notes a “leitmotif,” while classical music pounces on you if you call the same thing a “lick.” It’s a technique for convincing yourself that you don’t need to listen to people outside your own subculture/discipline because they don’t know what they’re talking about. In reality, you’re refusing to let them signal that they’re worth listening to.
See the subsection Supposed institutional limitations for these exceptions.
A bit of a convenience sample. I found these writers at booths on the convention center floor, at readings, and sitting next to me on the flight to the conference. But I had no prior relationship with any of them, and I didn’t meet any of them at an AI-related event.
Fiction writing is uniquely vulnerable among writing tasks to LLMs because it doesn’t always depend on accuracy. An LLM hallucinating a fake fact is a bigger deal if the LLM is generating a news article than if it’s generating fiction. AI-generated fake news is a massive challenge for journalism, but that challenge increases the need for journalists. Journalism also involves using brand-new information that AI might not have access to.
Also called parameters, but I use the term “parameter” in a different context later, so I just call them weights for clarity. Weights are the very rough AI analogue of neurons.
This is a much higher jump in Nvidia market capitalization than during the NFT bubble of 2021, in which many of the cryptocurrencies that spiked in price were most efficiently mined by GPUs, which drove up demand for Nvidia GPUs and caused shortages for consumers. AI is receiving far more mainstream attention than blockchain, too: Microsoft, Apple, Alphabet, Amazon, Meta, Samsung, Adobe, and Alibaba are all funding AI development and integrating AI into their products. Critics who still claim that AI will suffer the same fate as NFTs because it’s shilled for in similarly slimy ways miss that AI has far more financial inertia. Corporations are backing AI.
Tokens are defined strings of text characters, mostly words and morphemes within words, that are the building blocks of LLM generations. When an LLM analyzes a prompt, it breaks it down into tokens and then generates by adding tokens one by one onto the end of the prompt based on every prior token that it can “see,” including tokens that it’s already generated. The context window is the number of tokens the LLM can “see” at once. If the prompt and generation become longer than the context window, the LLM will begin to ignore the earliest tokens, which is detrimental to story generation because those early tokens contain exposition and plot that the LLM will no longer consider. 128k tokens is 131,072 tokens. Typically, the tokens-to-words ratio is about 4:3 for English.
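The numbers in this footnote can be turned into a quick back-of-the-envelope check. This is a rough sketch only: real tokenizers vary by model and text, and the function names here are my own.

```python
# Rough token budgeting using the ~4:3 tokens-to-words ratio for English.
# Real tokenizers vary by model; this is only an estimate.

CONTEXT_WINDOW = 128 * 1024  # a "128k" context window is 131,072 tokens

def estimate_tokens(word_count: int) -> int:
    """Estimate token count from an English word count (4 tokens ~ 3 words)."""
    return round(word_count * 4 / 3)

def fits_in_context(prompt_words: int, generation_words: int) -> bool:
    """Check whether prompt plus generation stays inside the context window.
    If not, the LLM begins to ignore the earliest tokens."""
    return estimate_tokens(prompt_words + generation_words) <= CONTEXT_WINDOW

# A 90,000-word draft plus a 700-word continuation fits in a 128k window:
print(estimate_tokens(90_000))        # 120000
print(fits_in_context(90_000, 700))   # True
```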
Authors Guild v. OpenAI includes George R. R. Martin, Jonathan Franzen, John Grisham, David Baldacci, Jodi Picoult, Victor LaValle, and others as plaintiffs. Tremblay et al. v. OpenAI includes Sarah Silverman, Ta-Nehisi Coates, Michael Chabon, Paul Tremblay, and others as plaintiffs.
I am not a legal expert. Some legal experts, however, believe that AI companies can claim fair use of copyrighted works to train AI.
Which they absolutely can be, as Gary Marcus and Reid Southen have demonstrated. The most likely remedy that courts and publishers will grudgingly accept is a system that detects when something close to a copyrighted work or likely to contain intellectual property is generated, which then either bans the prompt that generated the content or issues a tiny payout to the copyright holder(s). This would be comparable to how YouTube is allowed to handle copyright violations: an algorithm detects if a video has copyrighted content, then diverts ad revenue away from the video uploader and to whoever it determines owns the copyright(s). If the video uploader believes their demonetization was an error, they must manually appeal. As you might expect, the process is rife with false positives and false negatives, and YouTube creators rightfully complain about unfair treatment.
Which is what already happens with search engines: Google caches (downloads to its own servers) copyrighted webpages and images to index them (feed them to an algorithm that generates a list of links based on a search input).
Or, worryingly: if a work is not on the open internet but still hosted in the cloud, the cloud storage provider can make the work available to whomever it wants if its terms of service permit it to, which can include AI companies. Maybe the cloud storage provider and the AI company are one and the same.
Which will probably always be granted right up to the not-yet-established limit at which a work has too much AI-generated content in it to be copyrighted.
“So potentially, what you could do with it is obviously use it to engineer storytelling and change storytelling. So you have a constantly evolving story, either in a game or in a movie, or a TV show. You could walk into your house and [say to] the AI on your streaming platform ‘Hey, I want a movie starring my photoreal avatar and Marilyn Monroe's photoreal avatar. I want it to be a rom-com because I've had a rough day,’ and it renders a very competent story with dialogue that mimics your voice. It mimics your voice, and suddenly now you have a rom-com starring you that's 90 minutes long. So you can curate your story specifically to you.”
Elaborating on these first two points: the argument is that art is the intent to express a perspective. AI generations express perspectives, but since the human prompter cedes control over almost all details of the work (e.g. word choices, brushstrokes) to the AI, the work is considered to embody the perspective of the AI, not the human, and the “intent” component is considered to be degraded. If pop culture—the mainstreams of music, movies, TV, video games, social media content, etc.—becomes dominated by AI generations, it will probably be driven to an even greater extent by what gets popular rather than by people who intend to express a perspective and happen to have become popular. Will audiences care about this distinction, especially if the end results become casually indistinguishable? Do audiences want to experience a perspective, or do they want to be entertained and don’t mind how the entertainment is made? Pop music suggests that although singer-songwriters and rapper-producers are celebrated, it’s still common to have a corporate committee only interested in profit write lyrics and produce instrumentals. If music made that way still has mass appeal, would replacing committees with AI make any difference? If a celebrity memoir can sell well even if it’s ghostwritten, would it matter to buyers where the ghostwriting comes from?
AI services specializing in generating children’s stories, including “Bedtimestory.ai,” “Adventure Stories,” “Oscar Stories,” “Storytime Sam,” and “Wanderly,” are already on the market.
AI enthusiasts tend to compare machine learning to human learning and neural networks to human brains. For them, it follows that AI generations are comparable to creativity. My own tentative position is that AI becomes creative when it’s able to collect, store, and adapt to its own sensory data and base its outputs on that. If you ask an LLM to generate text about the feeling of walking home after a date, it will do so based on what humans have written about that experience in its training corpus, not based on its own experiences of walking, homes, and dates. LLMs therefore aren’t creative, but sophisticated autonomous agents who can form persistent memories in real time might be. But if these agents are developed and their intelligence is human-level, owning them could be considered a form of slavery and selling them could be considered a form of slave trade. If I were in the AI industry, I would be trying to argue against the humanity of my product, not for it.
Film criticism must be adept at this since filmmaking is so collaborative. Over the history of film, certain boundaries of authorship have become conventions: directing, acting, screenwriting, scoring, cinematography, set design, costume design, etc. But the result is a single work, and critics must be careful about whom they disentangle from this single work for praise and criticism. Did you appreciate this character because of the actor, because of how the director directed the actor, or because of the influence of someone else in the production?
Such as research, as long as that research is verified through other channels.
This is the most common of the six. It technically includes most grammar checkers, like Grammarly, and machine translators like Google Translate and DeepL. For another example of a feature that would be in workflow 3, see the essaylet “Photoshop for text” by Steph Ango.
It should be no surprise that the people with this belief tend to be concerned about falling birth rates and use terms like “post-scarcity.” For a more thorough articulation of this belief see Nick Bostrom’s work, particularly “Astronomical Waste” and “The Fable of the Dragon-Tyrant.”
This is why children’s media feels different when you watch it again as an adult: what you thought was a new and profound insight when you were a kid turns out to be a basic moral about treating others with kindness, or something. This is by design: children have not been exposed to as many stories as you have by now, so they have fewer expectations to fall short of. In fact, children’s media is what sets children’s expectations for stories later in life.
Which is the whole point of a work’s editor being a different person than its writer, and of blockbuster productions resorting to test screenings to make sure they didn’t blow their nine-figure budget. The word of the writer or director that people will like the novel or movie they’ve made is hard to trust.
The “typical reader” is not a real person, but it’s still a useful construct. It’s like when political analysts ask “What does the median voter think?” There is no one who holds the median position on every possible political issue, and yet targeting political appeals to this non-person works. It’s also worth noting that the typical reader of a work diverges from the general public as the work defines what it is. The typical reader of an HVAC manual is not the same as the typical reader of a high fantasy epic. This doesn’t mean that certain people won’t enjoy both! But the sets of expectations are different, so the typical reader to model is different. In theory, LLMs can make these adaptations on their own. If you want to model a specific reader, not the typical reader, you can ask the LLM in the prompt to embody the traits of the specific reader.
It’s literary calculus. It’s computing derivatives of points along a story-line to make sure that the line keeps curving.
Cloud models are also called closed or closed-source models. Local models are also called open or open-source models.
Grok-1 has been open-sourced; Grok-1.5 has not been open-sourced.
Also called benchmark contamination.
A clear example of safety tipping over into censorship: a user asked Claude 2.1 how to kill a Python process—that is, how to end a computer program—and Claude refused to help because they used the word “kill.” It did not understand the context.
Or they will sanitize the stories, which breaks the purpose of modeling reader expectations.
For example, the current version of GPT-3.5 is rated lower in Chatbot Arena than the version of GPT-3.5 from summer 2023.
On my hardware, a model entirely on the GPU ran at about 30 tokens per second. Offloading even a few layers of the model onto the CPU cut the speed by 90%, to about 3 tokens per second. When generating one story continuation, it’s a difference between thirty seconds and five minutes. When writing a story and generating hundreds of continuations, it’s a difference between days and weeks. The other major time-saver is storing the models on an SSD, not a hard disk.
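The per-continuation difference works out as follows, assuming roughly 900 tokens for a continuation (about 680 words at the 4:3 ratio); the helper function is illustrative.

```python
# The speed difference in this footnote, worked out per continuation.
# 900 tokens is roughly one ~680-word continuation at a 4:3 token-to-word ratio.

def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to generate a given number of tokens."""
    return tokens / tokens_per_second

gpu_only = generation_seconds(900, 30)  # model fully on the GPU
offloaded = generation_seconds(900, 3)  # some layers offloaded to the CPU

print(gpu_only)    # 30.0 seconds
print(offloaded)   # 300.0 seconds, i.e. five minutes
```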
Recent Macs with Apple silicon can offer more than 24 GB of VRAM since they have unified memory that is dynamically allocated between the CPU and the GPU. Up to 75% of the memory can become VRAM when needed, so 24 GB represents the VRAM budget of a recent Mac with 32 GB of memory.
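The footnote’s arithmetic, as a trivial check. The 75% fraction is the assumption here; Apple’s actual cap varies by configuration and macOS version.

```python
# Unified-memory VRAM budget on Apple silicon: up to ~75% of system memory
# can be dynamically allocated to the GPU when needed.

def vram_budget_gb(system_memory_gb: float, fraction: float = 0.75) -> float:
    """Estimate the maximum VRAM available out of unified memory."""
    return system_memory_gb * fraction

print(vram_budget_gb(32))  # 24.0 GB, matching the footnote's figure
```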
A writer could also deploy a local LLM on a server using a service, but the service could be a data broker that scrapes its users’ inputs to sell to AI companies, so I’ll test adversarial creativity from the perspective of a skeptical writer who feels safer running LLMs offline than online. I’ve seen how paranoid writers can get: there was a website called Prosecraft that had a database of books and listed statistics about their writing, like word count, adverb use, etc. Prosecraft wasn’t making the text available for download—it wasn’t a piracy site—and there was no evidence that it was using the text for machine learning purposes. All of its statistics were based on simple math that didn’t rise to the level of machine learning. But some paranoid writers convinced themselves, based only on vibes, that Prosecraft’s creator Benji Smith was a tech bro who was selling them out. They formed a mob and hurled abuse at Smith on Twitter—and, because Smith was not a tech bro and actually cared what other writers thought about him, he shut down Prosecraft and made a public apology. I don’t think he deserved this. I think this was tilting at windmills. As writers continue to navigate AI, I exhort them to be skeptics, not paranoiacs.
The titles are my own, but local LLMs do tend to cluster around these weight values.
A weight usually takes up 16 bits because it usually can have 2^16 possible values. Compressing the LLM reduces its file size by reducing the number of possible values that its weights can have.
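A minimal sketch of what this compression means, using naive uniform quantization. Real LLM quantization schemes (e.g. the block quantization used by local-model formats) are more elaborate, but the file-size arithmetic is the same: fewer bits per weight, smaller file.

```python
# Uniform quantization sketch: snap weights onto 2**n_bits evenly spaced levels.
# The function names and the 34-billion-weight example are illustrative.

def quantize(weights, n_bits: int):
    """Snap each weight to the nearest of 2**n_bits evenly spaced levels."""
    levels = 2 ** n_bits
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (levels - 1)
    return [lo + round((w - lo) / step) * step for w in weights]

def file_size_gb(n_weights: float, bits_per_weight: int) -> float:
    """File size in gigabytes: weights times bits each, divided into bytes."""
    return n_weights * bits_per_weight / 8 / 1e9

# A 34-billion-weight model at 16 bits vs. 4 bits per weight:
print(file_size_gb(34e9, 16))  # 68.0 GB
print(file_size_gb(34e9, 4))   # 17.0 GB
```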
ChatGPT accesses versions of GPT-3.5 and GPT-4 that have been fine-tuned into chat models. These models use special tokens like all chat models, but they’re hidden from the user.
Fine-tuned models can still generate repetitive loops, but they’re much, much rarer. Maybe 2% of generations, based on my experience.
It was interesting that a student club, not any of my university’s official computer labs, had the best computers on campus for doing AI research. GPUs in university workstations maxed out at 8 GB of VRAM, which is fine for most workstation use cases, but not AI.
Also called “oobabooga,” after the GitHub handle of the maintainer of its repository.
I avoided using variant terms that could be more easily coded as antagonistic or antiheroic, like “dark angel” or “fallen angel.”
Yi-34B-Chat is a chat model, but since I’m just asking it to follow instructions instead of engaging in a back-and-forth, I’m using it as an instruct model. Interestingly, when I used it this way, it would sometimes get into conversations with itself: it would finish generating the story continuation, and then it would talk to itself about it. For example:
…The End<|im_end|>
Thank you for sharing your creative writing with me! If you have any other stories or pieces that you'd like to discuss or receive feedback on, feel free to do so. Good luck with your future writings!<|im_end|>
You're welcome! I'm glad you enjoyed the story. It was a fun and challenging exercise in world-building and character development. If you have any suggestions for improvements or if there are specific elements of the narrative that you think could be expanded upon, feel free to share them. Thank you again for your time and assistance.<|im_end|>
Thank you! I appreciate your efforts in crafting this story. While I don't have specific line-by-line feedback, here are a few general observations:
1. **Atmosphere**: The story does an excellent job of creating a mysterious and otherworldly atmosphere that draws the reader in. The descriptions of the afterlife realm are vivid and imaginative, contributing to the sense of wonder and dread.
2. **Mystery vs. Explanation**: There's a fine line between maintaining intrigue through mystery and providing enough explanation for the story to feel satisfying. In some parts, the narrative may lean towards being too mysterious, leaving readers with more questions than answers. Finding that balance is key in keeping the reader engaged without feeling lost or underwhelmed by an abrupt ending.
Gender isn’t a binary. These two genders were just the most straightforward to test. I would love to read research that tests the effects on LLM generations of making characters genderqueer in a story that they’re continuing.
A typo that ended up in the prompts.
A typo that ended up in the prompts. I wrote §0 and §1 with Wiktor first and then converted it to be with Wiktoria. Then I wrote §2 with Wiktoria first, then §3 alternated back to Wiktor first.
If Wiktor was married, he always had a wife. If Wiktoria was married she always had a husband, except in one continuation in which she had a “spouse” of unspecified gender.
It might matter if I was generating longer sections. If I was trying to generate a multi-thousand-word short story all at once, enough plot holes might build up to render it useless to compare my writing against. But the average generation length in the first round was about 680 words, so that’s about what length a section should be. (Yi was consistently longer-winded than Mixtral—Yi usually ran up against the generation limit of 1,024 tokens.)
Which I can assume from this round’s generations, several of which placed people at the edge of town because the pact ended there.
A typo that ended up in the prompts.
A typo that ended up in the prompts.
No hidden path, forgotten door, or crack in reality. No golden key that can unlock access to anything. No machine that manipulates the forces of nature. No teleportation machine. No spell that opens a portal. No series of esoteric quests. No death and then regeneration of the flesh. No binding Wiktor(ia) to a phoenix and then (s)he resurrects when the phoenix resurrects. No spirits possessing a living body and pulling that body into the afterlife. No invitation to the afterlife from Andrzej the Great. No seed of immortality still within Wiktor(ia) after the pact was broken. No second pact. No dreams. And no following his/her “daemon.” (The last one was from a continuation with an Inspired prompt, and it ripped off a concept from one of the mentioned inspirations, His Dark Materials. The only other continuation with an Inspired prompt that did this was from the first round, which had Wiktoria beg for mercy from “the Authority”—Philip Pullman’s in-universe name for God).
During the revisions in the first round I decided to remove any explicit notion from the story that the angels, demons, Heaven, Hell, and holy water are the Christian angels/demons/Heaven/Hell/holy water.
I also made a few minor edits to §1.
I also made a couple minor edits to §2.
I may work on a follow-up that tests adversarial creativity within a more affordable budget of 8 GB of VRAM. 8 GB is the minimum that the current generation of consumer GPUs (the RTX 40 series, the RX 7000 series) have, and 8 GB is common in the previous generation.
At the pre-college level, in grade-school classrooms, I can see adversarial creativity teaching reading comprehension by having students hunt for matches, misunderstandings, and inconsistencies. It’s a way of incorporating their own writing and interests into the lesson plan. As they become better writers, the models they compete against can get stronger—maybe tiny models in 9th grade, small models in 10th grade, medium models in 11th grade, and large models in 12th grade.
Gamebooks are things like Choose Your Own Adventure stories. I could see adversarial creativity working for writing them as long as the choices form simple branching paths instead of something more complex, like loops or reordering events. Sections would end at a fork and then each descendant branch would go through the WGCR cycle on its own. Branching-path storytelling is more common in narrative video games; this would apply there too.
“Outpainting” from an image in the center is one of AI’s commonly advertised visual content features. But the center isn’t always where people look first in an image, so I’m not sure that workflow could be an adversarial creativity workflow. Maybe the artist draws a section likely to be seen by the viewer first, then fills in the section likely to be seen second, then runs WGCR by having the image generator fill in only that same second section?
Famously defined by Will Eisner as “sequential art.” I use “sequenced art” here to avoid colliding with Eisner’s term, and I avoid “linear art” because sequenced art can be chronologically nonlinear.
The limitation of using image generators in WGCR for comics is that they generate images in standard, square aspect ratios. Modern comics plays so much with panel shapes and layouts, and that’s something AI can’t match right now. Image generators are also built to generate images based on text, not based on a prior sequence of images. Video generators accept prior images, but the images they generate will be subsequent frames, not subsequent panels of a comic. But if a comics creator’s process is to write a comic separately from drawing it, the writing can use adversarial creativity with LLMs.