

Taxonomy for Text Messages


Getting crowd-sourced information into a system is only the first hurdle; the next is managing it. Last week we announced Swift Web Services, RESTful applications hosted in the cloud that any third-party application or developer can use to help manage data. One of those services is SiLCC, a semantic tag extraction service that parses Tweets and text messages and extracts relevant keywords: the names of people and places, actions that need to be taken, or locations where things have occurred. It's an open service hosted on our servers, meaning anyone can use it in their applications. It will work with WordPress, Drupal, FrontlineSMS, other aggregators like Managing News, and more.

These applications send the SiLCC API a feed of the content they want tagged. SiLCC extracts keywords and returns a feed of tags, each linked to the content it refers to. From there, the tags can be used however the original app developers decide.
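To make the "feed in, tags out" flow concrete, here is a minimal local sketch in Python. It is not the SiLCC API or SiLCC's extraction method: `Item`, `extract_tags` and `tag_feed` are hypothetical names, and the extractor is a deliberately naive stand-in (capitalized words and hashtags). What it does show is the shape of the result: each tag points back at the items it came from.

```python
import re
from dataclasses import dataclass


@dataclass
class Item:
    id: str
    text: str


# Hypothetical, naive stand-in for a real extractor: hashtags and capitalized words.
TOKEN = re.compile(r"#\w+|\b[A-Z][\w-]+")
STOPWORDS = {"The", "A", "An", "We", "I", "Help"}


def extract_tags(text):
    """Return lowercase candidate tags from one message, without duplicates."""
    tags = []
    for tok in TOKEN.findall(text):
        tag = tok.lstrip("#").lower()
        if tok in STOPWORDS or tag in tags:
            continue
        tags.append(tag)
    return tags


def tag_feed(items):
    """Return a feed of tags, each linked to the ids of the items it came from."""
    feed = {}
    for item in items:
        for tag in extract_tags(item.text):
            feed.setdefault(tag, []).append(item.id)
    return feed
```

For example, tagging `Item("sms-1", "Flight FI450 from Reykjavik cancelled")` and `Item("sms-2", "Ash over Reykjavik #volcano")` links the tag `reykjavik` to both messages, while `volcano` points only at `sms-2`.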


For many organizations, this is a critical time saver: it spares people from combing through a system by hand to find useful content. Aggregating content in an Ushahidi instance that uses SiLCC, or in SwiftRiver, lets users bypass that manual sorting and focus on verifying reports and responding to urgent requests.

Tags are the first, automated layer of taxonomy for content. They won't be the only layer, but if you're monitoring 100 different mobile phones sending in messages about the volcanic eruption in Iceland and you're looking for the ten that reference one particular cancelled flight, tags are one of the quickest ways to connect those scattered items.
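One common way to get from "100 tagged messages" to "the ten about one flight" is an inverted index: map each tag to the set of messages carrying it, then intersect the sets for the tags you care about. The sketch below assumes messages have already been tagged; the function names and the sample tags are hypothetical and are not part of SiLCC.

```python
from collections import defaultdict


def build_index(tagged):
    """Map each tag to the set of message ids that carry it."""
    index = defaultdict(set)
    for msg_id, tags in tagged.items():
        for tag in tags:
            index[tag].add(msg_id)
    return index


def find(index, *tags):
    """Return the ids of messages that carry every requested tag."""
    sets = [index.get(tag, set()) for tag in tags]
    return set.intersection(*sets) if sets else set()


# Hypothetical pre-tagged messages.
index = build_index({
    "sms-001": ["iceland", "eruption", "ash"],
    "sms-002": ["iceland", "fi450", "cancelled"],
    "sms-003": ["fi450", "cancelled", "keflavik"],
})
```

Here `find(index, "fi450", "cancelled")` narrows the stream to `sms-002` and `sms-003`. Each lookup costs only as much as the tag sets involved, rather than a rescan of every message.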

280 Characters or Less

A number of services offer similar functionality. In fact, we recently partnered with Thomson Reuters, which offers a service called Open Calais that extracts semantic keywords from articles and blogs. Where Open Calais doesn't work as well is with short messages, less than a paragraph long. For managing information from mobile phone users this is a problem, because that content falls well below the length Open Calais is designed for. So our partnership lets their service supplement ours, and vice versa.

Active Mobile

SiLCC does one thing differently from many similar apps. Rather than existing as a service that has to be improved by its developers (us), it incorporates active learning techniques that allow it to learn on its own. This is because we don't know where or when the next crisis that needs monitoring will occur. We don't know who will set up the next SwiftRiver instance or what they'll use it for. So we designed SiLCC to adapt to any scenario by learning from each instance of use, rather than relying on the top-down approach of tweaking the app on demand. This is known as persistent tagging: SiLCC auto-tags content, but it also self-improves and accumulates knowledge (more precisely, conditions it can use to improve future decisions).
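The post doesn't describe SiLCC's learning algorithm, so the following is only a generic illustration of the idea: a toy tagger that keeps counts of how often each word was confirmed or rejected as a tag, updates them from user feedback, and uses a simple active learning step (ask a human about the word it is least sure of). `FeedbackTagger`, its methods and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict


class FeedbackTagger:
    """Toy tagger that improves from feedback; NOT SiLCC's actual algorithm."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        # word -> [times confirmed as a tag, times rejected], starting from a neutral 1/1 prior
        self.counts = defaultdict(lambda: [1, 1])

    def score(self, word):
        """Estimated probability that this word is a useful tag."""
        yes, no = self.counts[word.lower()]
        return yes / (yes + no)

    def tag(self, text):
        """Return the words whose score is above the threshold."""
        return [w for w in text.split() if self.score(w) > self.threshold]

    def feedback(self, word, is_tag):
        """Record a human confirming or rejecting a word as a tag."""
        self.counts[word.lower()][0 if is_tag else 1] += 1

    def most_uncertain(self, text):
        """Active learning step: pick the word whose score is closest to 0.5."""
        return min(text.split(), key=lambda w: abs(self.score(w) - 0.5))
```

A fresh tagger tags nothing in "people trapped near market". After one `feedback("trapped", True)`, it tags `trapped`, and the knowledge it accumulates carries over to every later message in that instance.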

Natural language processing geeks will wonder whether they can define their own corpora and add words specific to their organization or event directly to SiLCC. They can; this saves time and also improves performance. Additionally, by default we've included corpora for dealing with Twitter ontology as well as the TXTSPK (text speak) commonly used by mobile phone users.
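As a rough picture of how a text-speak corpus and an organization's own word list might fit together, the sketch below first expands TXTSPK tokens and then matches organization-defined terms. The tiny `TXTSPK` dictionary and the sample terms are hypothetical illustrations; SiLCC's bundled corpora and its real matching logic are not reproduced here. Hashtags and @mentions are kept intact by the tokenizer.

```python
import re

# Hypothetical sample only; a real text-speak corpus would be far larger.
TXTSPK = {"pls": "please", "u": "you", "2day": "today", "ppl": "people", "hlp": "help"}


def normalize(text, lexicon=TXTSPK):
    """Lowercase, tokenize (keeping #hashtags and @mentions) and expand text speak."""
    tokens = re.findall(r"[\w#@-]+", text.lower())
    return " ".join(lexicon.get(tok, tok) for tok in tokens)


def match_custom(text, corpus):
    """Return the organization-defined single-word terms found in the normalized text."""
    words = set(normalize(text).split())
    return sorted(term for term in corpus if term in words)
```

With this sketch, "Pls hlp, 5 ppl stuck at school 2day" normalizes to "please help 5 people stuck at school today". An organization corpus of `{"school", "shelter", "help"}` then matches `help` and `school`, terms the raw message never spelled out in full.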

Secret Ontology

Finally, because corpora can be predefined, organizations have the option of setting up codes that let people use the system remotely. For instance, we could customize an Ushahidi instance to automatically verify and map any text message that contains a unique string (for example, “Help trapped in Port-au-Prince Market #a1u9”). That trailing string of alphanumeric characters works like a password that tells the system to do something. An organization could set up these unique strings and their functions, giving them only to the people it sends into the field. In an emergency, those people could communicate with HQ in ways that other users of the system couldn't. We have other apps for auto-detecting location, which makes it simple to extract that data as well. Rather than taking a laptop into the field to map data, an organization could set up a specific set of keywords that represent locations or events. Then workers, armed only with SMS-capable phones, could use the system remotely.
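Here is one way such a scheme could be wired up, as a minimal sketch. It assumes that codes are four alphanumeric characters appended as `#code` at the end of a message and that single keywords stand in for known locations. The registry contents, the `verify_and_map` action name and the `route` function are hypothetical, not an existing Ushahidi feature.

```python
import re

# Hypothetical registry: codes handed only to field staff, each mapped to an action.
FIELD_CODES = {"a1u9": "verify_and_map"}
# Hypothetical keywords that stand in for known locations.
LOCATION_KEYWORDS = {"market": "Port-au-Prince Market"}

# Assumed convention: a 4-character alphanumeric code at the very end, e.g. "#a1u9".
TRAILING_CODE = re.compile(r"#([a-z0-9]{4})\s*$", re.IGNORECASE)


def route(message):
    """Return (action, location); action is None unless a registered code is present."""
    match = TRAILING_CODE.search(message)
    action = FIELD_CODES.get(match.group(1).lower()) if match else None
    words = set(re.findall(r"[a-z]+", message.lower()))
    location = next(
        (place for keyword, place in LOCATION_KEYWORDS.items() if keyword in words),
        None,
    )
    return action, location
```

The post's example, "Help trapped in Port-au-Prince Market #a1u9", routes to `("verify_and_map", "Port-au-Prince Market")`. The same message with an unregistered code still gets a location, but no privileged action.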

This isn’t why we designed the app, and I doubt many orgs will use it this way, but I think it makes for an interesting possible extension of the Ushahidi platform. A more common use will probably be distinguishing between actionable reports (someone needs something done now) and non-actionable reports (nothing needs to be done) for emergency response organizations.
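The simplest possible version of that actionable/non-actionable split is a cue-word check on top of the tags, sketched below. The cue list is a hypothetical example. A real deployment would more likely learn these cues from feedback, as the active learning approach above suggests, than hard-code them.

```python
import re

# Hypothetical cue words; a real deployment would curate or learn these per event.
ACTION_CUES = {"help", "trapped", "need", "urgent", "injured", "water", "medicine"}


def is_actionable(message, cues=ACTION_CUES):
    """Flag a report as actionable if it contains any urgency cue word."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    return bool(words & cues)
```

Under this rule, "Help trapped in Port-au-Prince Market #a1u9" is flagged as actionable, while "Roads reopened near airport" is not.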


We announced our alpha of SiLCC last week. If you’re interested in applying to be an alpha tester, click here. SiLCC is open source, so if you’d like to contribute to the project as a developer, follow the project on GitHub.

Posted in Mobile, crowdsourcing, swift river.

8 Responses


  1. Question on the learning part of the tagging: is it global or localized per project? If localized, it would be sweet if you described that a tad.

  2. Both. The cloud app is global, but anyone can download the code (it’s an open source project) and set up SiLCC on their own local server.

  3. Dear Colleagues
    This is a great development … and further legitimizes the basic architecture for Community Analytics (CA) that seeks to help organize key data about socio-economic progress and performance at the community level!
    Peter Burgess
    http://communityanalyticsca.blogspot.com

  4. I need to run /right now/ so let me dash this off.
    Context: when I tweet using the TBuzz bookmarklet from @Arc90 there’s a benefit: I see who else has tweeted that page. Sweet.

    So: let’s say I want to blog about a certain story. And I tag that post … and assign a category … which just about guarantees that I’ve made the post unique (cuz I’m an odd old dog) … which isn’t at all the aim of the game.

    But let’s say I have access to the community campfire. I mutter “http://bit.ly/anXQQ7” [The actual URL is far too long for this.] and someone (the system) replies, “tags: #graphics #startup #VentureCapital; category: Business”.
    And there we go … my blog post regarding that site / story / event can join the stream.

    In short: I submit a URL, the system (using the appropriate semantic analysis and taxonomy / ontology) spits out category and tags.

    sorry this is rushed
    luv to you all
    ben

  5. @ben so essentially what you’re suggesting is that, in addition to the parsed tags, you also want OpenSiLCC to return a feed of ‘crowd-sourced’ tags? That’s definitely possible, great idea.

Continuing the Discussion

  1. SwiftRiver Web Services Launches – The Ushahidi Blog linked to this post on May 12, 2010

    [...] Read the related post “Taxonomy for Text Messages“. [...]

  2. SwiftRiver Web Services Launches by Ushahidi and Jon Gosier « surflightroy linked to this post on May 12, 2010

    [...] Read the related post “Taxonomy for Text Messages“. [...]

  3. Ellen apologizes after airing fake iPhone commercial | Cell Phone Choice Blog linked to this post on May 20, 2010

    [...] Taxonomy for Text Messages – The Ushahidi Blog [...]
