The Atlantic Exposes 21M Songs Used to Train AI Models

A journalist just made it possible for anyone to search which songs are being used to train AI music generators. The searchable database reveals millions of tracks from artists who never gave permission.

Musicians finally have a way to see if their work is being used without permission to teach AI how to make music.

Atlantic reporter Alex Reisner built a searchable database that reveals exactly which songs are being fed into AI music models. The free tool lets anyone look up specific artists, albums, or tracks across four major datasets containing over 21 million songs combined.

The two largest datasets contain 12 million and 9 million tracks, respectively. The two smaller collections each hold more than 100,000 songs. Google and Stability AI have both acknowledged in research papers that they used some of these datasets.

Artists ranging from Lady Gaga and Bruce Springsteen to Radiohead and Wu-Tang Clan appear in the collections. Experimental musicians like Aphex Twin and Hainbach are also included, often without their knowledge or consent.

Here's where it gets tricky. The datasets don't contain actual audio files. Instead, they're lists of links to songs on YouTube and Spotify. AI developers use automated tools to download the music, often bypassing the login requirements and the ads that would normally generate revenue for artists.

Some sources, like the Free Music Archive, allow free streaming for personal use but require licensing for commercial applications. Using these songs to train AI models that could compete with human musicians sits in a legal gray area.

The Bright Side

Transparency is the first step toward accountability. Before Reisner's database, artists had no practical way to know if their music was being used to train AI competitors. Now they can search their own catalog in seconds.

The tool also reveals the scale of the issue, which could push lawmakers and platforms to create clearer rules. When problems stay hidden, they're easier to ignore. Public databases like this one make it harder for tech companies to claim they didn't know the impact of their training data choices.

Musicians and their representatives can now gather evidence to support copyright claims or negotiate fair licensing agreements. Knowledge is power, especially when it comes to protecting creative work.

This database turns an invisible problem into something concrete that artists, lawyers, and policymakers can actually address together.

Based on reporting by The Verge

This story was written by BrightWire based on verified news reports.
