Simulating skill in the Premier League: part 1
I love sports. Every week I watch Tottenham play, and just as regularly I go through the emotional roller-coaster that entails. As a sports fan I use the first person plural to describe 'my' team, and...
A giant step for man...
no, wait, 'a giant step for me'... a massively underwhelming one for mankind: the Memory at War Project newsletter features a write-up I did on detecting Memory Events in Russian press coverage of the...
Putin’s bot army - part one: a bit about bots
In this first of a three-part series on how the pro-Kremlin youth group Nashi simulated popular support for Putin around the 2011 and 2012 elections, I briefly introduce what bots are and how they are...
Putin's bot army - part two: Nashi's online campaign (and undesirable bots)
Part one of 'Putin's bot army' examined what bots are, and how they have been used to simulate online interaction. In part two I explore in detail email correspondence between Nashi activists to...
Topic maps
Recently I've been exploring ways of identifying the subject matter discussed by Russian newspapers. In the end I settled on using TF-IDF (Term Frequency-Inverse Document Frequency) keywords on a...
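For readers unfamiliar with TF-IDF, here is a minimal Python sketch of the idea, not the script behind these maps. It assumes documents that are already tokenised, raw counts for term frequency and log(N / df) for inverse document frequency; the function name tfidf_keywords and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_n=3):
    """Return the top_n highest-scoring terms for each document.

    Assumes docs are lists of lowercase tokens.
    tf = raw count in the document; idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    keywords = []
    for tokens in docs:
        tf = Counter(tokens)
        scores = {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        # Highest score first; ties broken alphabetically for a stable result
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        keywords.append([term for term, _ in ranked[:top_n]])
    return keywords

docs = [
    "war memory victory war".split(),
    "election putin protest".split(),
    "war election".split(),
]
print(tfidf_keywords(docs, top_n=2))
# [['memory', 'victory'], ['protest', 'putin'], ['election', 'war']]
```

Terms shared by many documents get a small IDF (zero if they occur everywhere), so distinctive words rise to the top: in the first document 'war' scores below 'memory' despite appearing twice.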
Tennis simulator
This Wimbledon has seen a large number of injuries and withdrawals, prompting commentators to ask: could this be coincidence, or has something changed to make the courts more dangerous? Hearing people...
Fun simulating Wimbledon in R and Python
R and Python have different strengths. There's little you can do in R that you absolutely can't do in Python, and vice versa, but a lot of things are really annoying in one and nice and simple in...
Failures in Gephi
I'm sure most users of Gephi have had moments where they stumbled across incredible visualizations that made no apparent sense. I have found that the most spectacular failures often give the most...
Scaling up text processing and Shutting up R: Topic modelling and MALLET
In this post I show how a combination of MALLET, Python, and data.table lets us analyse fairly Big data in R, even though R itself buckles when confronted by textual data. Topic modelling is great...
The Challenge of not-quite-Gargantuan Data (and why DH needs SQL)
I felt strangely belittled when Andrew Goldstone tweeted about a recent blog entry: So familiar… Dealing w/ R's habit of choking on not-even-medium data. MT @RolfFredheim: Shutting up R:...
Databases for text analysis: archive and access texts using SQL
This post is a collection of scripts I've found useful for integrating a SQL database into more complex applications. SQL allows quickish access to largish repositories of text (I wrote about this at...
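The post's own scripts are not reproduced in this excerpt. As a minimal sketch of the general pattern, assuming Python's built-in sqlite3 module and a hypothetical articles table (the column names, source labels and sample rows are invented for illustration), storing and querying texts might look like this:

```python
import sqlite3

def create_archive(path=":memory:"):
    """Open (or create) a hypothetical text archive with one table of articles."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS articles (
               id INTEGER PRIMARY KEY,
               source TEXT NOT NULL,
               pub_date TEXT NOT NULL,
               body TEXT NOT NULL
           )"""
    )
    # Index the columns used for filtering so lookups stay quick as the archive grows
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_source_date ON articles (source, pub_date)"
    )
    return conn

def add_articles(conn, rows):
    """Insert (source, pub_date, body) tuples in a single transaction."""
    # Placeholders (?) let sqlite3 quote arbitrary text safely
    with conn:
        conn.executemany(
            "INSERT INTO articles (source, pub_date, body) VALUES (?, ?, ?)", rows
        )

def find_articles(conn, source, term):
    """Return (pub_date, body) for articles from one source whose body contains term."""
    cur = conn.execute(
        "SELECT pub_date, body FROM articles "
        "WHERE source = ? AND body LIKE ? ORDER BY pub_date",
        (source, f"%{term}%"),
    )
    return cur.fetchall()

conn = create_archive()
add_articles(conn, [
    ("paper_a", "2012-03-04", "election results announced"),
    ("paper_a", "2012-03-05", "protest in the square"),
    ("paper_b", "2012-03-04", "election observers report"),
])
print(find_articles(conn, "paper_a", "election"))
# [('2012-03-04', 'election results announced')]
```

A LIKE pattern with a leading wildcard has to scan every row; for larger archives SQLite's full-text search extension (FTS5), where the build includes it, is the usual next step.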
[deleted post on] d3 visualisations of the GDELT data
I accidentally deleted my post on visualising the GDELT data using d3, and because it was really fiddly to make Blogger display it properly in the first place, I won't re-upload it. Instead here is the...
Top Seven Tips for Processing 'Foreign' Text in Python (2.7)
Following on from my guide to making R play nicely with UTF-8, here is a seven-step guide to understanding Python's handling of Unicode. Trust me, if you work with non-Latin characters, you need to know...
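The post itself targets Python 2.7. As a rough Python 3 sketch of the same core habit (decode bytes to text when data comes in, work with str internally, encode when it goes out), assuming UTF-8 input; the sample word and the helper name roundtrip are illustrative:

```python
import os
import tempfile

raw = "Москва".encode("utf-8")   # bytes, as you would get from a file opened in "rb" mode
text = raw.decode("utf-8")       # decode once, at the boundary
print(len(raw), len(text))       # 12 6: two bytes per Cyrillic letter, six characters

# Decoding with the wrong codec may not raise an error; it silently produces mojibake
garbled = raw.decode("latin-1")
print(len(garbled))              # 12 characters of garbage instead of 6 letters

def roundtrip(s):
    """Write s to a temporary file as UTF-8 and read it back."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # Name the encoding explicitly: the platform default is not always UTF-8
        with open(path, "w", encoding="utf-8") as f:
            f.write(s)
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(path)

print(roundtrip(text) == text)   # True
```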
Topic Modelling Media Coverage of Memory Conflicts
Ostensibly this is a blog about memory conflict. It has become more of a repository of script snippets and visualisations, but here I get back to my roots and apply topic modelling to the...
Visualising Structure in Topic Models
How exactly should we visualise topic models to get an overview of how topics relate to each other? This post is a brief lit review of that debate; I realise the subject matter is sooo last year. I...
Plugging hierarchical data from R into d3
Here I show how to convert tabulated data into a JSON format that can be used in d3 graphics. The motivation for this was an attempt at getting an overview of topic models (link). Illustrations like...
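The post does this conversion in R. As a language-neutral illustration of the target structure, here is a minimal Python sketch that nests flat rows into the name/children/size layout used by many d3 hierarchy examples (the flare.json convention). The function name rows_to_tree, the example topic labels, and the assumption that every row has the same depth are all hypothetical.

```python
import json

def rows_to_tree(rows, root_name="root"):
    """Nest flat rows of (label_1, ..., label_k, value) into a d3-style tree.

    Assumes every row has the same number of labels, so a leaf never
    needs to become a parent later.
    """
    root = {"name": root_name, "children": []}
    for *path, value in rows:
        node = root
        for label in path:
            # Reuse an existing child with this label, or create a new one
            for child in node["children"]:
                if child["name"] == label:
                    node = child
                    break
            else:
                child = {"name": label, "children": []}
                node["children"].append(child)
                node = child
        # The last label is a leaf: replace its children list with a size
        node.pop("children", None)
        node["size"] = value
    return root

rows = [
    ("politics", "elections", 12),
    ("politics", "protests", 7),
    ("history", "war", 20),
]
tree = rows_to_tree(rows, root_name="topics")
print(json.dumps(tree, indent=2))
```

The resulting JSON can be loaded in the browser and passed to d3.hierarchy, which by default reads each node's children array; the leaf sizes are then typically aggregated with .sum(d => d.size).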
Web-Scraping: the Basics
Slides from the first session of my course about web scraping through R: Web scraping for the humanities and social sciences. Includes an introduction to the paste function, working with URLs, functions...
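The slides use R's paste function to build URLs. A rough Python analogue, shown only for comparison, is to glue a base URL to an encoded query string; the endpoint and the q/page parameter names below are hypothetical.

```python
from urllib.parse import urlencode

BASE = "https://example.com/search"  # hypothetical search endpoint

def build_urls(base, query, pages):
    """Return one search URL per results page, like pasting a page number onto a base URL."""
    urls = []
    for page in pages:
        # urlencode escapes spaces, '&' and non-ASCII characters in the query
        params = urlencode({"q": query, "page": page})
        urls.append(f"{base}?{params}")
    return urls

for url in build_urls(BASE, "web scraping", range(1, 4)):
    print(url)
# https://example.com/search?q=web+scraping&page=1, then page=2 and page=3
```

urlencode uses quote_plus by default, so spaces become + and characters such as & or Cyrillic letters are percent-encoded, which saves escaping them by hand.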
Protests in Ukraine 18-20 February
The map below, based on data from here, shows where the people killed in the protests of 18-20 February came from.* I can't vouch for the accuracy of the data. Apparently most of the protesters killed...
Detecting bots
This is part 3 of the series about Nashi bots. If I were doing this today I would approach the problem differently, but to my knowledge NodeXL is still a viable way of accessing the Twitter API. Part -...
Web Scraping part 2: Digging deeper
Slides from the second session of web scraping through R: Web scraping for the humanities and social sciences. Slides from the first session here... the third session here... and the fourth and final session...