Taxonomy for Text Messages

Ushahidi
May 5, 2010

Getting crowd-sourced information into a system is only the first hurdle; the next is managing it. Last week we announced Swift Web Services, RESTful applications hosted in the cloud that any third-party application or developer can use to help manage data. One of those services is SiLCC, a semantic tag extraction service that parses Tweets and text messages and extracts relevant keywords. Those keywords include the names of people and places, actions that need to be taken, and locations where things have occurred. It is an open service that we host on our servers, meaning anyone can use it in their applications. It will work with WordPress, Drupal, FrontlineSMS, other aggregators like Managing News, and more. These applications send the SiLCC API a feed of the content they want tagged. SiLCC then extracts keywords and returns a feed of tags linked to the content they refer to. From there, the tags can be used however the original app developers decide.

For many organizations, this is a critical time saver: it spares people from combing through a system by hand to find useful content. Aggregating content in an Ushahidi instance that uses SiLCC, or in SwiftRiver, lets users bypass that manual sorting and focus on verifying reports and responding to urgent requests.

Tags are the first, autonomous layer of taxonomy for content. They won't be the only layer. Suppose you're monitoring 100 different mobile phones sending in messages about the volcanic eruption in Iceland, and you're looking for the ten that reference one particular cancelled flight. Tags are one of the quickest ways to link those scattered messages together.

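SiLCC's extraction method isn't described in this post, so what follows is only a minimal sketch of the data flow above. Messages go in, tags come out linked to message IDs, and a lookup pulls out the few messages that share particular tags. A naive stopword filter stands in for real semantic extraction. The functions `extract_tags`, `tag_feed` and `find` are hypothetical and are not part of the SiLCC API. The sample messages and the flight number XY123 are made up.

```python
import re
from collections import defaultdict

# Illustrative stopword list (an assumption; a real extractor would use a much larger one).
STOPWORDS = {"a", "an", "the", "is", "are", "was", "in", "on", "at", "to", "of",
             "and", "or", "for", "my", "our", "we", "has", "have", "been", "from"}

# Words, optionally prefixed with '#' so hashtags are recognised.
TOKEN_RE = re.compile(r"#?\w+")


def extract_tags(text):
    """Return lowercase tags: every hashtag, plus non-stopword, non-numeric words of 3+ characters."""
    tags = set()
    for token in TOKEN_RE.findall(text):
        word = token.lstrip("#").lower()
        if token.startswith("#") or (len(word) >= 3 and word not in STOPWORDS and not word.isdigit()):
            tags.add(word)
    return tags


def tag_feed(messages):
    """Map each tag to the IDs of the messages it was extracted from (an inverted index)."""
    index = defaultdict(set)
    for msg_id, text in messages.items():
        for tag in extract_tags(text):
            index[tag].add(msg_id)
    return dict(index)


def find(index, *tags):
    """Return the IDs of messages that carry every one of the given tags."""
    matches = [index.get(tag.lower(), set()) for tag in tags]
    return set.intersection(*matches) if matches else set()


messages = {
    "m1": "Ash cloud over Reykjavik, flight XY123 cancelled",
    "m2": "Eruption still going, roads closed near Vik",
    "m3": "Stuck at airport, XY123 cancelled again #ashcloud",
}
index = tag_feed(messages)
print(sorted(find(index, "XY123", "cancelled")))  # ['m1', 'm3']
```

The inverted index is the point here. Once every message has been tagged, finding "the ten that reference one particular cancelled flight" becomes a set intersection rather than a read-through of every message.
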
280 Characters or Less

A number of services offer similar functionality. In fact, we recently partnered with Thomson Reuters, whose Open Calais service extracts semantic keywords from articles and blogs. Where Open Calais doesn't work as well is with shorter messages that are less than a paragraph long. For managing information from mobile phone users, this is a problem, because that content falls well below the length Open Calais handles well. So our partnership lets their service supplement ours, and vice versa.

Active Mobile

SiLCC does one thing in particular differently from many similar apps. Rather than existing as a service that has to be improved by its developers (us), it incorporates active learning techniques that allow it to learn autonomously. We did this because we don't know where or when the next crisis that needs to be monitored will occur. We don't know who will set up the next SwiftRiver instance or what they'll use it for. So we designed SiLCC to adapt to any scenario by learning from each instance of use, rather than relying on the top-down approach of tweaking the app on demand. This is known as persistent tagging. SiLCC auto-tags content, but it also improves itself and accumulates knowledge (more precisely, conditions it can use to improve future decisions).

Natural language processing geeks will wonder whether they can define their own corpora and add words specific to their organization or event directly to SiLCC. They can, of course; this saves time and also improves performance. Additionally, by default we've included corpora for dealing with Twitter ontology as well as the TXTSPK (text speak) commonly used by mobile phone users.

Secret Ontology

Finally, because corpora can be predefined, organizations have the option of setting up codes for people who use the system remotely.

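SiLCC's learning method isn't documented here. This is therefore only a minimal sketch, built on stated assumptions, of three ideas from the last two sections: a predefined corpus of organization-specific terms, a TXTSPK lookup, and reviewer feedback that persists and changes future tagging. Simple vote counting stands in for real active learning. `PersistentTagger`, its `threshold` parameter and the slang entries are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical TXTSPK corpus mapping text-speak spellings to standard words.
TXTSPK = {"pls": "please", "plz": "please", "u": "you", "r": "are",
          "2nite": "tonight", "hlp": "help"}


class PersistentTagger:
    """Sketch of a tagger that expands text speak, starts from an organization's
    predefined corpus, and keeps reviewer feedback so later tagging changes.
    Vote counting is a crude stand-in for active learning, not SiLCC's method."""

    def __init__(self, custom_terms, slang=None, threshold=1):
        self.custom_terms = {term.lower() for term in custom_terms}
        self.slang = dict(TXTSPK if slang is None else slang)
        self.threshold = threshold
        self.votes = Counter()  # +1 per accepted tag, -1 per rejected tag

    def normalize(self, text):
        """Lowercase, split into words (dropping '#', '@' and punctuation), expand text speak."""
        return [self.slang.get(word, word) for word in re.findall(r"\w+", text.lower())]

    def score(self, word):
        # Predefined corpus terms start at the threshold; other words must earn it through feedback.
        head_start = self.threshold if word in self.custom_terms else 0
        return head_start + self.votes[word]

    def tag(self, text):
        """Return the normalized words whose score reaches the threshold."""
        return {word for word in self.normalize(text) if self.score(word) >= self.threshold}

    def feedback(self, word, accepted):
        """Record a reviewer's decision; it persists and affects every later message."""
        self.votes[word.lower()] += 1 if accepted else -1


tagger = PersistentTagger(custom_terms=["shelter", "water"])
print(tagger.tag("pls send water 2nite"))   # {'water'}
tagger.feedback("tonight", accepted=True)
print(tagger.tag("need hlp 2nite"))         # {'tonight'}
```

Giving predefined terms a head start, rather than making them permanent, means a single rejection from a reviewer can switch off a corpus term that turns out to be noisy in a particular deployment.
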
As an example of such codes, we could customize an Ushahidi instance to automatically verify and map any text message that contains a unique string (for example, “Help trapped in Port-au-prince Market #a1u9”). That trailing string of alphanumeric characters acts like a password that tells the system to do something. An organization could set up these unique strings and their associated functions, giving them only to the people it sends into the field. In an emergency, those people could communicate with HQ in ways that other users of the system couldn't.

We have other apps for auto-detecting location, which makes it simple to extract that data as well. Rather than take a laptop into the field to map data, an organization could set up a specific set of keywords that represent locations or events. Workers armed only with SMS-capable phones could then use the system remotely.

This isn't why we designed the app, and I doubt many organizations will use it this way, but I think it makes for an interesting possible extension of the Ushahidi platform. A more common use will probably be distinguishing actionable reports (someone needs something done now) from non-actionable ones (nothing needs to be done) for emergency response organizations.
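The post doesn't say how an Ushahidi instance would act on these codes. This is therefore a hypothetical sketch of the triage step it describes:

- a trailing field code that triggers an action;
- organization-defined keywords that stand in for locations;
- a naive actionable/non-actionable flag.

`FIELD_CODES`, `LOCATION_KEYWORDS`, `ACTION_WORDS` and the `"verify_and_map"` action name are all illustrative assumptions. Keyword matching is a crude substitute for real classification.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical field codes an organization hands only to its own staff,
# each mapped to the action the system should take when it sees the code.
FIELD_CODES = {"a1u9": "verify_and_map"}

# Hypothetical keyword lists an organization would define for its deployment.
LOCATION_KEYWORDS = {"market": "Port-au-Prince Market"}
ACTION_WORDS = {"help", "trapped", "need", "urgent", "injured"}

# A code is a '#' followed by letters/digits at the very end of the message.
CODE_RE = re.compile(r"#(\w+)\s*$")


@dataclass
class Report:
    text: str
    actions: List[str] = field(default_factory=list)
    location: Optional[str] = None
    actionable: bool = False


def triage(text):
    """Apply field-code actions, keyword-based location, and a naive actionable flag."""
    report = Report(text)
    match = CODE_RE.search(text)
    # Only registered codes trigger an action; unknown codes are ignored.
    if match and match.group(1).lower() in FIELD_CODES:
        report.actions.append(FIELD_CODES[match.group(1).lower()])
    words = set(re.findall(r"\w+", text.lower()))
    for keyword, place in LOCATION_KEYWORDS.items():
        if keyword in words:
            report.location = place
            break
    # Crude heuristic: any action word marks the report as needing a response.
    report.actionable = bool(words & ACTION_WORDS)
    return report


r = triage("Help trapped in Port-au-prince Market #a1u9")
print(r.actions, r.location, r.actionable)
# ['verify_and_map'] Port-au-Prince Market True
```

Because unregistered codes are simply ignored, a member of the public who copies the `#xxxx` pattern gets ordinary treatment. The message can still be flagged as actionable, but it doesn't trigger automatic verification.
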

We announced our alpha of SiLCC last week. If you’re interested in applying to be an alpha tester, click here. SiLCC is open source, so if you’d like to contribute to the project as a developer, follow the project on GitHub.