Blog

Hot Twitter topics for GE16 candidates one week in

This post is timely given RTE’s recent publication of their analysis of social media usage during the 2016 general election in Ireland, available here. It looks like they’ve partnered with the ADAPT Centre for Digital Content Technology to produce the Twitter data, and with Facebook for the Facebook data. It’s a pity that they:

  1. don’t indicate the source of their data
  2. don’t indicate on what basis Tweets were deemed to be about the election (was it simply the presence of the #ge16 hashtag?)
  3. don’t indicate the basis for the topic codings
  4. don’t mention the sentiment analyser algo (state of the art is still pretty far from robust, often missing e.g. sarcasm)
  5. don’t indicate what constitutes a “mention” of a political party.

It would be great to take on each of these, but I only have time to examine the topics that were identified in online Twitter posts. Ranked by volume (presumably over tweets tagged with #ge16, but I don’t know) they are:

Breaking Beethoven

Last night, at the suggestion of a friend, I took a midi performance of the first movement of Beethoven’s Op. 2, No. 1 (a piano sonata in F minor) and squashed it all into the one octave. It’s a sort of pitch-class version of the movement, where each pitch-class is represented by a real pitch, bounded to a given octave of the piano. But there are twelve different ways to do this, since the “destination” octave could be bounded by twelve different pitch-classes. This is the offset parameter referred to in the playlist. Different offsets lend each flattening a slightly different character, due to the fact that different pitch classes will end up with different registers in each iteration. The code for this experiment is forthcoming. It was made possible with pretty-midi and pyfluidsynth.
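In the meantime, the core of the flattening can be sketched in a few lines. This is only an illustration and not the code used for the playlist: the function name `fold_into_octave`, the `base` parameter (MIDI note 60, middle C) and the sample pitches are all assumptions made for the example.

```python
def fold_into_octave(pitch, offset=0, base=60):
    """Move a MIDI pitch to the same pitch class inside one fixed octave.

    The destination octave runs from base + offset up to base + offset + 11,
    so each of the twelve offsets puts the octave boundary on a different
    pitch class.
    """
    low = base + offset
    return low + (pitch - low) % 12


# Hypothetical input: an ascending F minor arpeggio spanning two octaves.
pitches = [60, 65, 68, 72, 77, 80]

flattened_0 = [fold_into_octave(p, offset=0) for p in pitches]
flattened_5 = [fold_into_octave(p, offset=5) for p in pitches]

print(flattened_0)
print(flattened_5)
```

With offset 0 every C lands below every F, while with offset 5 the octave starts on F and the Cs end up above it. That is the change of register described above. With pretty-midi, the same function could presumably be applied to the `pitch` attribute of each note before synthesising.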

The right way to use requests in parallel in Python

Today I was working on getting as many YouTube comments out of the internets as was possible. I’m sure that my code has a long way to go, but here’s one speed-up that a naive first day out with multiprocessing and requests generated.

import multiprocessing

import requests

BASE_URI = 'http://placewherestuff.is/?q='

def internet_resource_getter(thing):
  # pool.map calls this once per item, so it receives a single query string
  with requests.Session() as session:
    response = session.get(BASE_URI + thing)
    return response.json()

if __name__ == '__main__':
  stuff_that_needs_getting = ['a', 'b', 'c']

  # Three worker processes; map blocks until every request has returned
  pool = multiprocessing.Pool(processes=3)
  pool_outputs = pool.map(internet_resource_getter,
                          stuff_that_needs_getting)
  pool.close()
  pool.join()
  print(pool_outputs)

Some updates

It’s been a while, again, since I’ve blogged. And I’m sort of concerned about how to fix that. Do I commit to a post a day? There’s a bit of pop psychology floating around that you should never tell your goals to anyone. Doing so only gives you a dopamine dose of self-satisfaction that actually reduces your likelihood of completing the project. Why go for delayed gratification at all if you can get your hit of happy by telling all your friends what you plan to do?

30 days of iPython

See the GitHub repo here

I suck at Python. I write Python like I’m still 10 years old, programming in QBASIC. I don’t even need to be a better programmer in my line of work (I’m a music student), but it’s something that I’ve wanted to work on for a while, and I know the only way to improve is to write, write, write.

I love iPython Notebook (a.k.a. Jupyter + Python 2 kernel) because it allows me to mess up, fix my mistakes, and run the code again. It also supports cells that contain prose, rendered from Markdown source, so it’s a perfect engine for blogging about the code that I intend to write, using the same tool I’m writing the code with.

Getting Eulerian Video Magnification set up on Ubuntu 14.10

  1. Download this version (R2012b a.k.a. v80) of the Matlab Compiler Runtime.

  2. Follow the instructions carefully and make sure to modify the LD_LIBRARY_PATH and XAPPLRESDIR environment variables appropriately. These changes can be made permanent in your shell startup profiles.

  3. Trusty Tahr (14.04) doesn’t usually come with the codecs that the Matlab Compiler Runtime needs to do its thing. This seemed to do the trick for me: install ubuntu-restricted-extras, then add the PPA ppa:mc3man/trusty-media, which provides gstreamer0.10-ffmpeg.

This is not what hyperlinks are for

Allow me a little rant. I was reading this FastCo article about a Spotify webapp that seemed interesting to me. Here’s a screencap of the relevant part.

[Screencap: FastCo]

See the hyperlinked words “playlist tool”, underlined in yellow? You’d think that this would link to the webapp in question.

But no, it resolves to a category/tag-explorer page with the URI http://www.fastcompany.com/explore/playlist-tool.

What about “web app”? Nope: http://www.fastcompany.com/explore/web-app.

Does the article link to the tool at all? No.

How I hacked scheduling class meetings

As a preface, I think this merits the label hack not because it’s particularly clever or well-implemented; it was simply the fastest way for me to arrive at an optimal solution for a well-defined problem.

Problem statement

The task is to split a class of $$k$$ students into $$n$$ disjoint meetings (‘sections’), each of which meets once a week on a pre-determined day of the week, and to find a mutually convenient time for each section based on the availability of each student.
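One way to make the problem concrete is a brute-force sketch like the one below. It is not necessarily the approach taken in this post. It assumes that "optimal" means choosing the $$n$$ time slots that let the largest number of students attend at least one section, and the function name `best_section_times`, the student names and the time slots are all made up for illustration.

```python
from itertools import combinations


def best_section_times(availability, n):
    """Try every set of n slots and keep the one that covers most students.

    availability maps each student to the set of slots they can attend.
    Returns (chosen_slots, covered_students), or None if there are fewer
    than n distinct slots.
    """
    slots = sorted(set().union(*availability.values()))
    best = None
    for chosen in combinations(slots, n):
        # A student is covered if they are free for at least one chosen slot.
        covered = [s for s, free in availability.items() if free & set(chosen)]
        if best is None or len(covered) > len(best[1]):
            best = (chosen, covered)
    return best


# Hypothetical availability for a class of four, all on the same weekday.
availability = {
    "ann": {"10:00", "11:00"},
    "bob": {"10:00"},
    "cat": {"14:00"},
    "dan": {"11:00", "14:00"},
}

chosen, covered = best_section_times(availability, 2)
print(chosen, covered)
```

The search grows combinatorially with the number of candidate slots, which is tolerable for one class and one weekday but would not scale much further. It also ignores section sizes, so balancing the sections would need an extra step.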

GitHub pages subdirectory hassle

This blog is hosted on GitHub pages. It is automatically generated from Markdown source files by Jekyll every time a commit is pushed to the gh-pages branch of the GitHub repo corresponding to the blog.

I have a private repo called ‘blog’, and under normal circumstances its ‘project page’ (actually my blog) would appear at, say, http://myusername.github.io/blog.

GitHub’s Jekyll process seems to be clever enough to handle this and ensure that html links in the source code are rendered correctly as links relative to this base URL.

Scraping great music taste

I’m a sometime listener of John Schaefer’s New Sounds podcast. His particularly eclectic taste is widely renowned.

Sometimes a recording of the show is posted online, but this is not often the case.

However, the blog post corresponding to each show includes its tracklist as an HTML element.

Therefore, it is trivial to write a scraper that iterates through the back-catalog of tracklists. This scraper outputs a CSV file.
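As a rough sketch of the parsing and CSV steps, something like the following would do, using only the standard library (Python 3). It is an illustration rather than the scraper itself: it assumes each track sits in an `<li>` element, and the sample HTML, the show identifier and the names `TracklistParser` and `tracklist_to_csv` are invented for the example. Fetching the pages is left out.

```python
import csv
import io
from html.parser import HTMLParser


class TracklistParser(HTMLParser):
    """Collect the text of every <li> element (assumed tracklist markup)."""

    def __init__(self):
        super().__init__()
        self.tracks = []
        self._in_item = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_item = True
            self._buffer = []

    def handle_data(self, data):
        # Text inside nested tags such as <em> is collected as well.
        if self._in_item:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._in_item:
            self.tracks.append("".join(self._buffer).strip())
            self._in_item = False


def tracklist_to_csv(show_id, html, out):
    """Write one CSV row per track: show id, position, track text."""
    parser = TracklistParser()
    parser.feed(html)
    writer = csv.writer(out)
    for position, track in enumerate(parser.tracks, start=1):
        writer.writerow([show_id, position, track])
    return parser.tracks


# Made-up page fragment standing in for one show's blog post.
SAMPLE_HTML = """
<ul>
  <li>Composer A: Piece One</li>
  <li>Composer B: <em>Piece Two</em></li>
</ul>
"""

buffer = io.StringIO()
tracks = tracklist_to_csv("show-0001", SAMPLE_HTML, buffer)
print(buffer.getvalue())
```

Running this over the whole back-catalog would mean calling `tracklist_to_csv` once per show with the same output file, so that every row carries the show it came from.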