
I wrote about how athletes are using AI to build their own media empires, why companies like Google are funding AI curricula at top film schools, the future of AI video after Sora’s demise, the real reason OpenAI acquired TBPN, and Ben Affleck’s sale of his AI company to Netflix.
Every era of entertainment ends up defined less by who makes content than by who controls the choke points around it. Agents were supposed to help talent and studios work more functionally; distributors were meant to expose great films to global markets; music-licensing organizations like ASCAP and BMI were framed as administrative plumbing. In our industry, middlemen matter.
AI is extending that pattern — with a significant difference. In earlier eras, the intermediary arrived before the deal was struck. Here, the content was arguably already taken during the great AI scraping battles of the last year. Studios believe, with reasonable evidence, that their libraries have already been used by large AI models, mostly run by tech behemoths, without permission or payment; tech companies call that fair use.
This was evident last year when Disney and NBCUniversal sued Midjourney for copyright infringement. The suit, filed in federal district court in Los Angeles, alleges the AI company pirated the studios’ libraries, distributing “innumerable” AI-generated copies of their marquee characters such as Darth Vader from Star Wars and the Minions from Despicable Me. A trial date has not been set. And later that year, OpenAI’s Sora became its own flashpoint: Studios alleged the video model scraped their libraries, and talent found themselves having to opt out of a system they’d never opted in to — a requirement OpenAI walked back under pressure.
What sits uncomfortably next to this earlier wave of data scraping is a large and growing zone of unresolved value: content already being used at scale, by systems already deployed, under rules that nobody has quite agreed to yet.
Into that gap a new class of intermediary has stepped: not studios, model builders or platforms, but data brokers selling what the scraping era never bothered to offer, namely ethically sourced training data, licensed and rights-cleared before it ever reaches a large language model. These brokers act as connective tissue between rights holders and AI developers. If the analogy holds, they look less like vendors than like early agents of a new market: dealmakers who don’t just move assets, but define how those assets are priced, licensed and ultimately controlled.
Below, I’ll dig into the implications of this new business model for Hollywood and Silicon Valley:
- How Hollywood’s scraped libraries became the next big thing for representation
- How a data broker went from paying studios $0 in royalties in 2024 — to eight figures last year
- Why “ethical” AI training data is becoming a licensing pitch, not just a moral one
- How curated datasets and global copyrights complicate Big Tech’s fair-use defense
- Why studios are hunting for licensing deals and AI companies fear setting a price
- The Protege pitch: sign one license agreement for worldwide training rights in 190 countries


