Algorithms Augmenting Human Decisions

Ushahidi
Jun 8, 2011

Here's an update about the SwiftRiver platform from PDF11, where I had the pleasure of speaking yesterday. My slides are below, and you can find video of my presentation here.

Crowdsourcing 102: Mining Real-Time Data

The summation of the talk is that the Swift project has been assigned a very complex and incredibly difficult task: to verify and contextualize data from the mobile and social web. How do we do this? This seems to be the part that confuses people. It's not any one of our apps, and it's not any of the individual APIs we rely upon. It's the combination of all these things into one robust algorithm that tries to digitally reconstruct the real-world context, using the features extracted from the content to prioritize and de-prioritize information relevant to that context. I like to refer to this as folksonomic triage, where layers of historic, social, temporal, geospatial and other types of information are layered upon one another to perform a function, and the system (through a process called active learning) then learns how to improve from the user's interactions.

What this attempts to do is let the human give the machine algorithm some insight into the types of content they prefer and the types they don't. A statistical profile of the content features of each type is recorded, with varying degrees of nuance in between, including accounting for bias, crosstalk, irrelevance and falsehoods. (A toy sketch of what this feedback loop might look like is at the end of this post.) Some of this happens on the application side, some of it on the logic/cloud side. This is because it's very important that the user understand that the platform is there to serve them, and not the other way around: algorithms augmenting human decision making. This means we've abstracted the elements of the system logic that everyone needs to re-use over and over again, while the things specific to each use of the platform are defined in the UI.

Use Cases

We're really excited to have a number of amazing partners, new and old, using the platform. This includes groups like Newsti.ps, who are building a 'people's newswire' using the Swift products. There are also some really big projects underway. For instance, this BBC article profiles PAX, which is using our platform to power their conflict early warning system. They want to index massive amounts of data from around the world and then use what's captured to spot the historic patterns and trends that can, in turn, be used to demonstrate confidence in future patterns. One of our favorite uses of the Swift platform to date was Product (RED)'s use of it last year to mash up large quantities of social media activity to power their Turn The World (RED) campaign. There have been many more uses that we can't talk about yet, but hopefully those become public soon.

Some Numbers

There are currently eight different code repositories housing the greater Swift project. Each of these API elements is tackled as a discrete problem. This includes code for location disambiguation, natural language processing, influence detection, reputation monitoring and duplication filtering. You can find more about them here: http://blog.swiftly.org/post/5788873594/resources-for-developers
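To make one of those pieces concrete, here's a minimal sketch of what duplication filtering over a content stream can look like. It assumes a simple word-shingle and Jaccard-similarity approach; that's a generic technique for illustration, not necessarily what the Swift repo actually implements.

```python
# Toy near-duplicate filter: normalize each item's text, build word
# 3-gram "shingles", and drop items whose shingle set overlaps an
# already-kept item beyond a threshold. Illustrative only -- not the
# SwiftRiver implementation.
import re

def shingles(text, n=3):
    """Lowercase word n-grams ("shingles") for rough text similarity."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}

def dedupe(items, threshold=0.7):
    """Yield items that aren't near-duplicates of anything already kept."""
    kept = []  # shingle sets of the items we've let through
    for item in items:
        s = shingles(item)
        # Jaccard similarity = |intersection| / |union|
        if not any(len(s & t) / len(s | t) >= threshold for t in kept):
            kept.append(s)
            yield item

feed = [
    "Flooding reported on the main road near the market",
    "RT: Flooding reported on the main road near the market",  # near-dupe
    "Clinic requests medical supplies after the storm",
]
print(list(dedupe(feed)))  # the retweet is filtered out
```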

These combined repos contain around 150,000 lines of code (not including frameworks like Kohana)

Over 7,000 downloads of Sweeper to date

Which theoretically means at least 7,000 users of our APIs

Sweeper users tend to aggregate thousands of items of content over the life of a deployment (say, 10,000 each across those 7,000 users), which means we've taken around 70,000,000 items of unstructured data and done things to them, like adding location or tags, or filtering out the duplicates. That's obviously a very liberal extrapolation, but it should give you a sense of the amount of data we're dealing with.

As the project moves forward, and all our APIs are finally completed, this number will grow exponentially. With RiverID alone (which tracks the reputation of content and people online) we expect to be indexing over half a billion items of content and actions from the social web by the end of the year. That's just one API; the others will need to scale on equal terms.
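Finally, here is the toy sketch promised above of the feedback loop behind "algorithms augmenting human decisions": the system scores incoming items from layered feature signals, the user accepts or rejects items, and the weights adapt from that feedback. The feature names and the perceptron-style learning rule are generic stand-ins, assumed for illustration, not the actual Swift algorithms.

```python
# Stand-ins for the historic, social, temporal and geospatial layers
# described in the post. Each incoming item arrives with these feature
# signals already extracted, as values between 0.0 and 1.0.
FEATURES = ["source_history", "social_signal", "recency", "geo_match"]

weights = {f: 0.0 for f in FEATURES}

def score(item):
    """Combine the layered feature signals into one priority score."""
    return sum(weights[f] * item[f] for f in FEATURES)

def learn(item, accepted, rate=0.1):
    """Nudge weights toward features of accepted items, away from rejected ones."""
    direction = 1.0 if accepted else -1.0
    for f in FEATURES:
        weights[f] += rate * direction * item[f]

stream = [
    {"source_history": 0.9, "social_signal": 0.2, "recency": 0.8, "geo_match": 1.0},
    {"source_history": 0.1, "social_signal": 0.9, "recency": 0.3, "geo_match": 0.0},
]

# The human stays in the loop: their verify/discard decisions train the
# ranking, so the machine triages and the person decides.
learn(stream[0], accepted=True)
learn(stream[1], accepted=False)
ranked = sorted(stream, key=score, reverse=True)
print(ranked)  # items the user tends to accept now rank first
```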