Scraping great music taste

August 20, 2014 (Last Modified: November 16, 2024)

I’m a sometime listener of John Schaefer’s New Sounds podcast. His eclectic taste is widely renowned.

Sometimes a recording of the show is posted online, but more often it is not.

However, the blog post for each show includes that show's tracklist as an HTML element.

That makes it straightforward to write a scraper that iterates through the back catalog of tracklists and outputs a CSV file.
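A minimal sketch of the parsing half of such a scraper, using only the Python standard library. This is not the original scraper. It assumes each tracklist entry sits in an `<li>` element, which is a guess at the markup, and the names `TracklistParser` and `tracklists_to_csv` are made up for illustration. Fetching the pages is left out, since it depends on the site's URL scheme.

```python
import csv
import io
from html.parser import HTMLParser


class TracklistParser(HTMLParser):
    """Collects the text of every <li> element (assumed tracklist markup)."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._buffer = None  # None while we are outside an <li>

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._buffer is not None:
            # Collapse runs of whitespace so each track fits on one CSV row.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.items.append(text)
            self._buffer = None


def tracklists_to_csv(pages):
    """Takes (show_id, html) pairs and returns CSV text, one row per track."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["show", "track"])
    for show_id, html in pages:
        parser = TracklistParser()
        parser.feed(html)
        parser.close()
        for item in parser.items:
            writer.writerow([show_id, item])
    return out.getvalue()
```

Each blog post's HTML would be passed in alongside some identifier for the show, and the returned text written to a `.csv` file.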

Then, we strip the CSV file of all but the artist and track name data. This requires that we remove the timecodes appended to each track name. To my surprise, this can be done using a regular expression in LibreOffice.
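The same cleanup could be scripted rather than done in LibreOffice. The sketch below assumes the timecode is a trailing `m:ss` or `h:mm:ss` value, optionally wrapped in brackets or parentheses. The real format in the scraped data may differ, so the pattern is illustrative only, and `strip_timecode` is a made-up name.

```python
import re

# Assumed format: a trailing timecode such as "4:05", "[4:05]" or "(1:02:33)".
TIMECODE = re.compile(r"\s*[\[(]?\d{1,2}(?::\d{2}){1,2}[\])]?\s*$")


def strip_timecode(track):
    """Removes a trailing timecode from a track string, if one is present."""
    return TIMECODE.sub("", track)
```

Anchoring the pattern to the end of the string means digits and colons earlier in the line, such as the separator in "Artist: Title", are left alone.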

The resulting, reasonably clean list can then be used to search music sites. I used a tool called Ivy to convert this list into a Spotify playlist.

Of about 700 tracks in the original list, 200 or so (under a third) were found in Spotify’s catalog. Some easily fixed issues in the scraped data are keeping this number low. But it’s also worth bearing in mind that some tracks that Schaefer plays are not available commercially, or may not be licensed for playback on Spotify.

Those 700 tracks come from only the last 100 shows (as recorded on the website). New Sounds has been running for decades, and it’s a treasure trove of music.