How AI Models Steal Creative Work — and What to Do About It | Ed Newton-Rex | TED

Generative AI and the Ethics of Training Data

The Foundation of Generative AI

  • The technology behind generative AI is impressive, but it relies on three key resources: people (engineers), compute (GPUs), and data.
  • AI companies invest heavily in engineers and computational power, often spending millions per engineer and up to a billion dollars per model.
  • However, they expect to acquire training data for free, which raises ethical concerns regarding the use of creators' work.

Unlicensed Use of Creative Work

  • Many AI companies currently train their models on creative works without permission or compensation, leading to an unfair ecosystem.
  • A significant portion of training data comes from scraping copyrighted content off the web; 64% of large language models are reported to have been trained on datasets such as Common Crawl, which include copyrighted works.
  • This practice has become standard in the industry despite its negative impact on original creators.

Competition with Original Creators

  • Generative AI competes directly with the creative works it is trained on; for example, an AI trained on short stories can produce similar competing stories.
  • This competition is not just theoretical; real-world examples show that generative AI is already affecting markets by providing alternatives to human-created content.

Real-Life Impacts

  • Filmmaker Ram Gopal Varma announced plans to use AI music exclusively in his projects, indicating a shift towards reliance on generative outputs over human-produced music.
  • Artist Kelly McKernan experienced a 33% drop in income after their work was used without consent to train an AI model that mimicked their style.

Market Changes Due to Generative AI

  • On the freelance platform Upwork, demand for freelance writing tasks is reported to have fallen 8% since the introduction of ChatGPT, with the drop reaching 18% for lower-value tasks.
  • These trends suggest that generative AI's ease of use leads to inevitable competition against original creators' work.

Legal Perspectives on Copyright

  • Creators argue that unlicensed training violates copyright laws as it involves copying their work without authorization.
  • Some companies claim that fair use allows them to train on copyrighted material, but many rights holders strongly dispute that interpretation.

Proposed Solutions: Licensing

  • Creators advocate for licensing agreements similar to those used in other industries when utilizing copyrighted materials for commercial purposes.

Generative AI and Copyright Issues

The Impact of Generative AI on Creators

  • Commercial entities in generative AI often scrape content without creator consent, creating scalable competitors while violating copyright laws.
  • AI image generators produce approximately 2.5 million images daily, and song generators create around 10 songs per second; given that scale of output, the speaker considers it unreasonable to equate AI training with human learning.
  • AI companies argue that licensing data is impractical due to the vast amount of training data used, but creators still seek compensation for their work.
  • There are numerous datasets available for licensing, including media company agreements and public domain resources like the Common Corpus dataset.
  • Companies can also utilize synthetic data created by AI models, which typically does not have copyright restrictions.
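
To put the two output figures above on a common footing, the short sketch below converts each rate into the other's unit. It only restates the numbers quoted in this summary (2.5 million images per day, 10 songs per second); the function names are illustrative, and the results are rough arithmetic, not independent measurements.

```python
# Unit conversion for the output rates quoted in the summary.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_day(rate_per_second: float) -> float:
    """Convert a per-second rate into a per-day rate."""
    return rate_per_second * SECONDS_PER_DAY


def per_second(rate_per_day: float) -> float:
    """Convert a per-day rate into a per-second rate."""
    return rate_per_day / SECONDS_PER_DAY


images_per_day = 2_500_000  # figure quoted in the summary
songs_per_second = 10       # figure quoted in the summary

songs_per_day = per_day(songs_per_second)
images_per_second = per_second(images_per_day)

print(f"{songs_per_day:,.0f} songs per day")          # 864,000 songs per day
print(f"{images_per_second:.1f} images per second")   # 28.9 images per second
```

In other words, 10 songs per second works out to 864,000 songs per day, and 2.5 million images per day is roughly 29 images per second.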

Licensing Practices in Generative AI

  • Successful examples exist where companies license their training data; the speaker's team at Stability AI released an AI music model trained on licensed music.
  • Fairly Trained, a nonprofit founded by the speaker, certifies generative AI companies that do not train on copyrighted work without a license; 18 companies have been certified so far.
  • Various approaches to licensing exist: some models use upfront fees while others share revenue with data providers; flexibility in licensing arrangements is emphasized.
  • Smaller startups are increasingly willing to license their training data, using innovative models rather than relying solely on large upfront fees.
  • Unlicensed training practices lead publishers to restrict access to their content, negatively impacting new entrants and researchers who benefit from an open internet.
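
The difference between the two licensing approaches mentioned above (an upfront fee versus a share of revenue) can be illustrated with a minimal sketch. Every number and function name here is hypothetical and chosen only for illustration; the talk does not give fee levels or share percentages, and real agreements may combine both models.

```python
# Hypothetical comparison of two licensing models for a data provider.
# All figures below are invented for illustration only.


def revenue_share_payout(model_revenue: float, share: float) -> float:
    """Payout when the provider receives a fixed share of model revenue."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return model_revenue * share


def break_even_revenue(upfront_fee: float, share: float) -> float:
    """Model revenue at which a revenue share pays the same as the upfront fee."""
    if not 0.0 < share <= 1.0:
        raise ValueError("share must be greater than 0 and at most 1")
    return upfront_fee / share


upfront_fee = 50_000.0  # hypothetical flat fee paid once
share = 0.25            # hypothetical 25% revenue share

threshold = break_even_revenue(upfront_fee, share)
print(f"Revenue share beats the upfront fee above {threshold:,.0f} in model revenue")

for revenue in (100_000.0, 1_000_000.0):
    payout = revenue_share_payout(revenue, share)
    better = "revenue share" if payout > upfront_fee else "upfront fee"
    print(f"revenue {revenue:,.0f}: share pays {payout:,.0f}, better option: {better}")
```

Under these assumed terms, the provider is better off with the upfront fee while the model earns little, and better off with the revenue share once model revenue passes the break-even point. That trade-off is one reason a startup without cash for large upfront fees might prefer a revenue-share arrangement.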

Public Sentiment on Data Usage and Compensation

  • A study found that the percentage of websites restricting access for unlicensed use increased significantly over one year due to rising concerns about copyright infringement.
  • Public opinion largely opposes the notion that publicly available data should be free for use by AI companies; a poll indicated 60% disapproval of this practice.
  • A significant majority (74%) believe that AI companies should compensate creators whose works are used for training purposes, highlighting a disconnect between public sentiment and industry practices.
  • An open letter launched on the day of the talk (October 22, 2024) describes the unlicensed use of creative works in generative AI as a major, unjust threat to creators' livelihoods; it has gathered support from over 11,000 creators globally.

Generative AI and the Creative Community: A Call for Respect

The Discontent of Creators

  • Many artists, writers, musicians, and other creators strongly oppose generative AI because it is trained on their work without permission, which leaves them feeling exploited.

The Path Forward: Mutual Benefit

  • Advocating for a respectful relationship between the AI industry and creative sectors is essential; both can benefit if rights are acknowledged.
  • Licensing the resources used in building generative AI may slow down development initially but will lead to equally capable models while respecting creators' rights.

Channel: TED
Video description

Generative AI is built on three key resources: people, compute and data. While companies invest heavily in the first two, they often use unlicensed creative work as training data without permission or payment, a practice that pits AI against the very creators it relies on. AI expert Ed Newton-Rex has a solution: licensing. He unpacks the dark side of today's AI models and outlines a plan to ensure that both AI companies and creators can thrive together. (Recorded at TEDAI San Francisco on October 22, 2024)

TED's videos may be used for non-commercial purposes under a Creative Commons License, Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND 4.0 International), and in accordance with the TED Talks Usage Policy: https://www.ted.com/about/our-organization/our-policies-terms/ted-talks-usage-policy.