TV Writers Found 139,000 of Their Scripts Trained AI. Hell Broke Loose
It's 'organized crime,' says one, as scribes from Shonda Rhimes to Robert King face a system where studio policy and law remain murky
Elaine covers the TV market from L.A. Her Fall Market Guide looks at what every studio and streamer wants to buy right now and how to pitch them (for paid subscribers only), including Netflix, Amazon Prime Video, NBCU and Peacock, Disney brands ABC, Disney+, Hulu and FX and Apple TV+. Her next installment will appear after Thanksgiving.
Whenever AI came up during last year’s WGA and SAG-AFTRA strikes, it was a contentious issue, but one that seemed to exist as an abstraction, fodder for pithy picket signs.
But last year’s theoretical fear became a real, deeply personal one with last week’s discovery by The Atlantic of more than 139,000 TV and film scripts in a data set being used to train AI. It set writer group chats aflame, and apparently no one was safe from having their work hoovered up: The search function built by The Atlantic revealed that the data set contained 508 scripts credited to Shonda Rhimes, 346 from Ryan Murphy and 742 of Matt Groening’s episodes of Futurama and The Simpsons. (Showrunners and writers spent much of this past week frantically typing their names into the search field and coming away horrified.)
The training data isn’t uploaded scripts but rather subtitles from those TV episodes and movies, sourced from a site called OpenSubtitles.org. If you’re wondering if your show or film’s script is floating around in this data set, search here.
Writer and programmer Alex Reisner, who built The Atlantic’s search tool to examine the data, wrote:
“I can now say with absolute confidence that many AI systems have been trained on TV and film writers’ work. Not just on The Godfather and Alf, but on more than 53,000 other movies and 85,000 other TV episodes: Dialogue from all of it is included in an AI-training data set that has been used by Apple, Anthropic, Meta, Nvidia, Salesforce, Bloomberg, and other companies. I recently downloaded this data set, which I saw referenced in papers about the development of various large language models (or LLMs). It includes writing from every film nominated for Best Picture from 1950 to 2016, at least 616 episodes of The Simpsons, 170 episodes of Seinfeld, 45 episodes of Twin Peaks, and every episode of The Wire, The Sopranos, and Breaking Bad.”
(If you’re wondering why you can’t find any newer films or titles from newer streaming services like Apple TV+ or Disney+, the subtitles were extracted in 2018, Reisner tells me.)
“I’m livid. I’m completely outraged. It’s disgusting,” Teen Titans’ David Slack tells me after discovering 42 of his credited scripts in the database, including ones for Person of Interest, Lie to Me and In Plain Sight. “It’s a huge amount of my work . . . These are things that I poured my heart and soul into.”
As writers fume, they’re wondering what recourse they have and what kind of systems could be put in place for them to be compensated. I spoke to several, including The Good Wife co-creator Robert King, about their questions and ideas.
Then, of course, there are the studios that own the copyrights to these scripts. What should they be doing? To see what they had to say, I spoke to several well-known writers; to attorneys whose specialties range from AI and entertainment law to patent and copyright litigation; and to Ken Basin, former head of Paramount and Amazon Studios business affairs and the author of The Business of Television.
In this week’s Series Business, you’ll learn:
Why dialogue from scripts is of particular value to AI training
Writers’ reactions to finding their work in the database
Why studios have been quiet in response to this controversy
Writers’ ideas for how they’d like to be paid if they can’t stop AI
How the Writers Guild of America is responding to this discovery
The current state of law regarding copyrighted material being used to train AI
The emerging technology that promises to offer Hollywood more control over the situation
Why writers fear it’s a “slippery slope” for their art to be used to train AI chatbots