Extracting poetry from the TwitterStream

Offensive language warning:

Uncensored Twitter content will be shown at the end and may contain offensive, racist or sexual terms.

About me:

  • Just another Perl Hacker
  • Web Developer
  • Degree in English Literature
  • Perl is quite literary, as languages go

Novelty Twitter accounts:

  • Similar to Reddit "bot" accounts
  • Registered Twitter Applications
  • Parse tweets for patterns, and comment on or retweet them
  • Some examples...

I went with a slightly meta/open-ended approach, recognising:

  • Tweets in alphabetical order
  • Tweets where the initial letters spell out words
  • Tweets where the word-length increments
  • Tweets which are palindromes
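A minimal sketch of three of these checks in Perl. The subroutine names are hypothetical (the talk doesn't show this part of the code), punctuation is simply stripped, and "increments" is taken to mean each word is exactly one letter longer than the one before. `initials` only collects the letters; checking that they spell a real word would need a dictionary lookup, which is left out here.

```perl
use strict;
use warnings;

# Split a tweet into lowercase words, keeping only the letters a-z
# (stripping punctuation this crudely is a simplifying assumption)
sub tweet_words {
    my ($text) = @_;
    return grep { length } map { ( my $w = lc $_ ) =~ s/[^a-z]//g; $w } split /\s+/, $text;
}

# True if every word sorts at or after the word before it
sub is_alphabetical {
    my @words = tweet_words(shift);
    return 0 if @words < 2;
    for my $i ( 1 .. $#words ) {
        return 0 if $words[$i] lt $words[ $i - 1 ];
    }
    return 1;
}

# True if each word is exactly one letter longer than the previous one
sub lengths_increment {
    my @words = tweet_words(shift);
    return 0 if @words < 2;
    for my $i ( 1 .. $#words ) {
        return 0 unless length( $words[$i] ) == length( $words[ $i - 1 ] ) + 1;
    }
    return 1;
}

# The first letter of each word, joined into one string
sub initials {
    return join '', map { substr( $_, 0, 1 ) } tweet_words(shift);
}
```

For example, "All birds can dance" passes `is_alphabetical`, "I am the best" passes `lengths_increment`, and `initials("Laughing out loud")` gives "lol".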

I called it "Twittapophenia"

as in, Twitter + App + Apophenia.


The spontaneous perception of connections and meaningfulness in unrelated things.
William Gibson


Two key problems to solve.

  • How to read from Twitter's firehose
  • How to stop reading from Twitter's firehose¹

¹ Technically, this is not the firehose: statuses/sample.json delivers only a small random sample of public tweets.

Twitter will happily stream tweets (in JSON) to your application forever.

No, seriously. That length of time again: “forever”.

To connect to the Streaming API, form an HTTP request and consume the resulting stream for as long as is practical. Our servers will hold the connection open indefinitely, barring server-side error, excessive client-side lag, network hiccups, routine server maintenance or duplicate logins.
Twitter Documentation

“form an HTTP request”

You can do this the hard way or the easy way...

I'm still not sure which one I chose.

I didn't use a module, and I didn't make the request in Perl...

I found some code to build the OAuth request header as a string…

and used curl to hit the stream…

…but I can explain

I used curl because its --max-time option is a hard cutoff, which solves the "how to stop" problem.

So we end up with this neat thing:

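```perl
# --max-time 240: curl gives up after 240 seconds, which is how the
# script stops reading from a stream that would otherwise never end
system( "/usr/bin/curl --request 'GET' "
      . "'https://stream.twitter.com/1.1/statuses/sample.json' "
      . "--max-time 240 --header 'Authorization: $header' "
      . "> /var/tmp/$time.json" );
```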

Where the $header var is produced by this ugly thing:

```perl
use Digest::SHA qw(hmac_sha1_base64);
use URI::Escape;

sub make_header {
    my $conf             = shift;
    my $header           = "OAuth ";
    my $nonce            = time();
    my $signature_method = "HMAC-SHA1";
    my $version          = "1.0";
    my $http_method      = "GET";
    my $base_url         = "https://stream.twitter.com/1.1/statuses/sample.json";
    my $timestamp        = time();

    my $signature_base = uri_escape("oauth_consumer_key") . '=' . uri_escape( $conf->{consumer_key} ) . '&';
    $signature_base .= uri_escape("oauth_nonce") . '=' . uri_escape($nonce) . '&';
    $signature_base .= uri_escape("oauth_signature_method") . '=' . uri_escape($signature_method) . '&';
    $signature_base .= uri_escape("oauth_timestamp") . '=' . uri_escape($timestamp) . '&';
    $signature_base .= uri_escape("oauth_token") . '=' . uri_escape( $conf->{token} ) . '&';
    $signature_base .= uri_escape("oauth_version") . '=' . uri_escape($version);
    $signature_base = $http_method . "&" . uri_escape($base_url) . "&" . uri_escape($signature_base);

    my $signing_key = uri_escape( $conf->{consumer_secret} ) . "&" . uri_escape( $conf->{oauth_token_secret} );
    my $signature   = hmac_sha1_base64( $signature_base, $signing_key );

    # Digest::SHA returns unpadded base64;
    # the signature length needs to be a multiple of 4
    while ( length($signature) % 4 ) {
        $signature .= '=';
    }

    $header .= uri_escape("oauth_consumer_key") . '="' . uri_escape( $conf->{consumer_key} ) . '", ';
    $header .= uri_escape("oauth_nonce") . '="' . uri_escape($nonce) . '", ';
    $header .= uri_escape("oauth_signature") . '="' . uri_escape($signature) . '", ';
    $header .= uri_escape("oauth_signature_method") . '="' . uri_escape($signature_method) . '", ';
    $header .= uri_escape("oauth_timestamp") . '="' . uri_escape($timestamp) . '", ';
    $header .= uri_escape("oauth_token") . '="' . uri_escape( $conf->{token} ) . '", ';
    $header .= uri_escape("oauth_version") . '="' . uri_escape($version) . '"';
    return $header;
}
```

let's not dwell on that for too long…

So, all that is wrapped up in a nice little perl script and run by cron.

  • Script downloads the JSON to a file…
  • Script parses the JSON
  • Script dies, because when you stop abruptly in the middle of downloading some JSON, the last tweet is left half-written and you're going to have a bad time
  • Script, now with added eval block goodness, parses the JSON
  • Script does various bits of text-munging on the tweets and emails me about anything interesting.
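A sketch of the parse step with the eval fix, assuming the file holds one JSON object per line (the way the streaming API delimits messages) and using the core JSON::PP module, which may not be what the original script used. `read_tweets` is a hypothetical name. Each line is decoded inside its own eval, so the object curl cut off at the end of the file is skipped rather than fatal; blank keep-alive lines and non-tweet messages such as delete notices are skipped too.

```perl
use strict;
use warnings;
use JSON::PP ();

# Read newline-delimited JSON from a filehandle and return each tweet's text.
# Lines that fail to decode (e.g. the object truncated by --max-time) are skipped.
sub read_tweets {
    my ($fh) = @_;
    my $json = JSON::PP->new->utf8;
    my @texts;
    while ( my $line = <$fh> ) {
        $line =~ s/\s+\z//;          # strip the trailing \r\n
        next unless length $line;    # blank keep-alive line
        my $obj = eval { $json->decode($line) };
        # skip undecodable lines and messages without text (delete notices etc.)
        next unless ref($obj) eq 'HASH' && defined $obj->{text};
        push @texts, $obj->{text};
    }
    return @texts;
}
```

In the cron script it would be handed the file curl wrote, opened raw because the decoder expects UTF-8 bytes: `open my $fh, '<:raw', "/var/tmp/$time.json" or die $!;`.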

I got a bit bored with simply finding the interesting tweets…

  • Because it got a bit repetitive over time…
  • Because I never really found a good palindrome.
  • Because approximately 82% of Twitter traffic seemed to be about English boy band “One Direction”

Some examples of tweets which are palindromes:

  • Toot Toot!
  • Wow!
  • Wow wow!
  • Yay!
  • Yay Yay!
  • etc.
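One way the palindrome test might look, which shows why "Wow wow!" qualifies: once case, spaces and punctuation are stripped, "wowwow" reads the same backwards. This is a sketch with a hypothetical name, and the three-letter minimum is an arbitrary assumption to keep out single-character tweets.

```perl
use strict;
use warnings;

# A tweet counts as a palindrome if its letters read the same both ways,
# ignoring case, spaces and punctuation. Very short tweets are rejected.
sub is_palindrome {
    my ($text) = @_;
    ( my $letters = lc $text ) =~ s/[^a-z]//g;
    return length($letters) >= 3 && $letters eq scalar reverse($letters);
}
```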

so I decided to create "found" poetry from tweets…

it would be fairly easy to write modern poetry…

I have eaten
the plums
that were in
the icebox

and which
you were probably
saving
for breakfast

Forgive me
they were delicious
so sweet
and so cold

“This Is Just To Say” (William Carlos Williams, 1883–1963)

I decided to write sonnets, which have actual rules:

They must have fourteen lines

Each line must have ten syllables

They should rhyme in the following pattern:

a b a b c d c d e f e f g g
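The scheme is a compact way of saying which lines share an end-rhyme. A small sketch (the `rhyme_groups` name is hypothetical) that expands it into line numbers makes the requirement concrete: seven rhyme sounds, each needing a pair of lines.

```perl
use strict;
use warnings;

# Turn a rhyme scheme into groups of line numbers that must rhyme together.
# Whitespace in the scheme is ignored; lines are numbered from 1.
sub rhyme_groups {
    my ($scheme) = @_;
    ( my $letters = $scheme ) =~ s/\s+//g;
    my %groups;
    my $line = 0;
    push @{ $groups{$_} }, ++$line for split //, $letters;
    return \%groups;
}
```

For "a b a b c d c d e f e f g g" this gives a => [1, 3], b => [2, 4] and so on, ending with the couplet g => [13, 14].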

Here's an example of a classic sonnet:

Shall I compare thee to a summer’s day?
Thou art more lovely and more temperate.
Rough winds do shake the darling buds of May,
And summer’s lease hath all too short a date.
Sometime too hot the eye of heaven shines,
And often is his gold complexion dimmed;
And every fair from fair sometime declines,
By chance, or nature’s changing course, untrimmed;
But thy eternal summer shall not fade,
Nor lose possession of that fair thou ow’st,
Nor shall death brag thou wand’rest in his shade,
When in eternal lines to Time thou grow’st.
So long as men can breathe, or eyes can see,
So long lives this, and this gives life to thee.

“Sonnet 18” (William Shakespeare, 1564–1616)

So we need to:

  • count syllables in each tweet
  • if ($syllables == 10) save to a database
  • take the last word
  • use that last word to find other tweets which rhyme
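The talk doesn't say how syllables were counted, so this is only a crude stand-in: a vowel-group heuristic with a silent-final-"e" rule, plus a helper for the last word. The names are hypothetical, and a heuristic like this miscounts plenty of words (it scores "lovely" as three syllables); a dictionary-based CPAN module would be more accurate.

```perl
use strict;
use warnings;

# Rough English syllable count: count runs of vowels, then drop a
# silent final "e" (but not "-le" as in "table"). Never less than 1.
sub word_syllables {
    my ($word) = @_;
    ( $word = lc $word ) =~ s/[^a-z]//g;
    return 0 unless length $word;    # pure punctuation, numbers, emoji
    my $count = () = $word =~ /[aeiouy]+/g;
    $count-- if $word =~ /[^aeiouy]e\z/ && $word !~ /[^aeiouy]le\z/;
    return $count < 1 ? 1 : $count;
}

# Sum the estimate over every whitespace-separated word in the tweet
sub tweet_syllables {
    my ($text) = @_;
    my $total = 0;
    $total += word_syllables($_) for split /\s+/, $text;
    return $total;
}

# The last word of the tweet, lowercased and stripped of punctuation
sub last_word {
    my ($text) = @_;
    my @words = grep { length } map { ( my $w = lc $_ ) =~ s/[^a-z]//g; $w } split /\s+/, $text;
    return @words ? $words[-1] : undef;
}
```

With these, the "save to a database" test becomes something like `push @candidates, $text if tweet_syllables($text) == 10;` (with `@candidates` standing in for the real table), and `last_word($text)` is what gets looked up for rhymes.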

Luckily there is a module for the hard parts:

Lingua::Rhyme abstracts word sounds into this syntax

Anyone know what that's all about?

And precompiles a rhyme database in MySQL:

So now we have a database of over a million eligible tweets…

We have Perl and jQuery…

So let's write a poem…
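The live demo uses Perl and jQuery; what follows is not that code, just a minimal sketch of the final assembly step. It assumes the ten-syllable tweets have already been bucketed in a hash keyed by the rhyme of their last word (how that key is computed, e.g. from Lingua::Rhyme's database, is not shown, and `assemble_sonnet` is a hypothetical name). Each rhyme letter in the scheme gets its own bucket, and the poem is filled in line by line.

```perl
use strict;
use warnings;

# $buckets: hashref of rhyme key => [ ten-syllable tweets ending in that rhyme ]
# Returns the lines of the poem, or an empty list if there aren't enough rhymes.
sub assemble_sonnet {
    my ( $buckets, $scheme ) = @_;
    $scheme //= 'ababcdcdefefgg';
    my @letters = split //, $scheme;

    # how many lines each rhyme letter needs (two each for this scheme)
    my %need;
    $need{$_}++ for @letters;

    # distinct letters in order of first appearance: a b c d e f g
    my %seen;
    my @distinct = grep { !$seen{$_}++ } @letters;

    # give each letter its own bucket with enough tweets in it;
    # sorted keys make the result repeatable
    my ( %key_for, %used );
    for my $letter (@distinct) {
        my ($key) = grep { !$used{$_} && @{ $buckets->{$_} } >= $need{$letter} }
            sort keys %$buckets;
        return unless defined $key;
        $used{$key}       = 1;
        $key_for{$letter} = $key;
    }

    # walk the scheme, taking the next unused tweet from each letter's bucket
    my ( %taken, @lines );
    for my $letter (@letters) {
        push @lines, $buckets->{ $key_for{$letter} }[ $taken{$letter}++ ];
    }
    return @lines;
}
```

Picking keys in sorted order makes the output repeatable; shuffling them instead (for example with List::Util's `shuffle`) would give a different poem on every run.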


I will of course upload this slideshow later and provide code/links.