Hunting Dinosaurs with the Crowd

Ushahidi
May 20, 2011

[Cross-posted from Frank Ostermann's blog GeoSocialite]

Dinosaurs?

“Did you see Aldo, Ugo, Luca, Maria or Anna? If so, then send us a text message, reporting which dinosaur you saw, and where. If you spot all five of them, you’ll win a small reward.” – That was the teaser line with which we tried to lure people into our experiment. What experiment, you ask? Well, the occasion was the Open Day of the JRC, where roughly 10,000 visitors get an overview of the diverse activities of our center. We were one of the activities, and saw an opportunity not only to show what we do with your data, but also to get the visitors to participate in generating information, analyze it, and learn from this experience. The security of the JRC was not too enthusiastic about our initial proposal to have people report on (fake) wildfires. Nor did they warm to our follow-up idea of using real animals like wolves and bears… So in the end we had the idea to use dinos, because kids love them, and if a stray message goes to someone else, that person might have some second thoughts about the sender’s sanity, but at least shouldn’t panic or alert the authorities. So that’s how Aldo, Maria, and company were born. Meet Aldo – he would like to give you his phone number…

Set-up

While in our project we monitor social media networks for information on forest fires, for the open day we chose to use SMS text messages, because almost everyone who has a mobile phone knows how to use them. That decision, however, led directly to the first problem: How do we access the messages? Although most phones come with some proprietary software to access text messages on a computer, the overall quality of these programs is, well, questionable. Plus, most of them don’t allow easy export for further analysis. Fortunately, there is FrontlineSMS, a great tool that does just about everything we wanted: receive messages, export them, run external commands or HTTP requests, etc. Unfortunately, it works only with a limited number of phones, mostly older ones. I had assumed that by now everyone has at least three generations of mobile phones in his/her basement, but it took quite a while and quite some effort to find one (and the correct data cable!) that worked. We finally managed to get our hands on an old Nokia 6021 and its CA-42 data cable.

Now we had a way to get to the messages. What’s next? Well, the elegant way would have been to use FrontlineSMS’s HTTP feature and send those messages to a RESTful web service of our own, where they would first be analysed and then visualized on a map displaying the JRC. Unfortunately, there wasn’t much time left, and I am a geographic information analyst by background, and not a skilled web designer or programmer (one could say “amateur”, but actually I prefer the old meaning of the word “dilettante”…). Plus, I was impressed by the Crowdmap service of the Ushahidi platform. So I made the fateful decision of “why reinvent the wheel?”… Instead of using a web service, I wrote a little Python script that was triggered by FrontlineSMS whenever a message arrived. It would analyze the message, looking for a dino name and a placename.
The latter was provided by our own gazetteer: After having chosen the locations of the five dinos, we geo-referenced all buildings, streets and points of interest in the vicinity, setting up a gazetteer in English and Italian. After the Python script had categorized and geo-located the message, it created a report that was uploaded to a Crowdmap deployment. While there are many ways to transmit messages to Crowdmap, only the upload of a CSV file allows location information to be included. The data was also displayed on a desktop computer running QuantumGIS, because Crowdmap had only limited querying functionality, and we wanted to show people their messages.

Another problem was the performance of the set-up: I had anticipated a few thousand people, of whom a fraction would play our game. After I heard that we were expecting more than 10,000 visitors, all with families, I became a little bit nervous… So, the day before the open day, we set up everything. I’ll spare you the details of what went wrong; suffice it to say that it was a lot. Last-minute changes to the code, Wi-Fi issues, you name it… However, with the help of our colleagues, we got the dinosaurs in place and the system running. Now the weather forecast predicted some serious thunderstorms in the morning. Would our dinos stand up to them?
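
For the curious, the matching and export steps of the Python script described above could be sketched roughly like this. This is not the script we actually ran: the gazetteer entries, the coordinates, the function names and the CSV column names are all made up for illustration, and the columns in particular are not meant to reflect the import format Crowdmap really expects.

```python
import csv
import io
import re

# The dinosaur names come from the game; everything else here is hypothetical.
DINOS = ("aldo", "ugo", "luca", "maria", "anna")

# Toy gazetteer: lower-case placename -> (latitude, longitude), made-up values.
GAZETTEER = {
    "building 26": (45.8101, 8.6301),
    "edificio 26": (45.8101, 8.6301),
    "main gate": (45.8120, 8.6285),
}


def parse_message(text):
    """Return (dino, placename, (lat, lon)), or None if either part is missing."""
    lowered = text.lower()
    words = set(re.findall(r"\w+", lowered))
    dino = next((d for d in DINOS if d in words), None)
    # Prefer the longest matching placename so that more specific entries win.
    place = max((p for p in GAZETTEER if p in lowered), key=len, default=None)
    if dino is None or place is None:
        return None
    return dino.capitalize(), place, GAZETTEER[place]


def reports_to_csv(reports):
    """Serialize parsed reports to CSV text, one row per sighting."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "category", "location", "latitude", "longitude"])
    for dino, place, (lat, lon) in reports:
        writer.writerow([f"{dino} sighting", dino, place, lat, lon])
    return buffer.getvalue()


report = parse_message("I saw Aldo near Edificio 26!")
if report is not None:
    print(reports_to_csv([report]))
```

Messages that name no known dinosaur or no known placename simply come back as `None` in this sketch, which corresponds to the reports that could not be categorized or geo-located.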

The Open Day

Well, they did. Mostly. Maria was a bit wobbly behind her plexiglass shielding since water had leaked in, despite the best efforts of our unit’s park rangers… So the gates opened at 10, and the messages began pouring in a few minutes thereafter. But what was that? Some messages did not get geo-located as they should. Uh, well, blame me and my lack of Italian skills. Of course there are special characters, and I had tried to anticipate them. What I had not anticipated was the heavy use of apostrophes in front of placenames (“all’edificio”). But that was quickly fixed by operating on the open heart of the system, i.e. the Python script. A more serious problem was that people used placenames other than those we had thought of, or made heavy use of abbreviations (maybe I should text more often, then I would have anticipated this). We had tested the system with colleagues, of course, but they had reported in a more structured way. So there’s obviously a difference in how places are reported by people who work with geographic information and by those who do not… who would have thought. But we fixed that too by expanding the gazetteer on the fly. So the messages kept pouring in, and the map filled with dots that grew bigger and bigger (because reports were clustered and each cluster was drawn in proportion to its size).

Everything was going well, except… there were no reports of Maria. None. I got nervous. Was there a bug in the code I had not thought of? Some devious placename that messed up the parsing algorithm? But a thorough check confirmed: Maria simply had not been found and reported on, yet. So we dispatched a team of seasoned troubleshooters and dino hunters to find out what had happened to Maria. They soon reported back: Not only had Maria been damaged by the rain. Not only was she in a place that did not have as many visitors. No, she was also partially obscured by a parked car!
The only car parked in the whole area (only a few cars were allowed on the premises during the open day) was parked right in front of Maria! Well, that’s why field work is dirty. There are circumstances you just cannot foresee. In the end, Maria got some reports, but only a fraction of those we received about the other dinosaurs. In total, we got 293 reports of dinosaur sightings. That is actually far fewer than we had expected (and feared!). In fact, it was probably to our advantage that people thought the game was meant only for kids, and that the competition from the activities of other units and departments was so tough (there were more than 80 activities in the whole research center). I doubt that the mobile phone would have been able to handle triple the number of messages… And the desktop where people could ask about their messages was busy enough already. So that’s the end of the story: Here you can see all the reports that were successfully geo-coded and categorized (the others are not displayed) and the map. We’ll analyze the messages we got, geocode those that failed only because of gaps in our gazetteer, and publish the results, both on this blog and, in more detail, in some journal.
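
As a footnote on the two parsing problems from the Open Day, apostrophes glued to placenames and abbreviated placenames: a small normalization step in front of the gazetteer lookup would take care of both. The following is only a sketch of that idea, not the fix we patched in on the day, and the abbreviation table and the function name are invented for the example.

```python
import re

# Hypothetical abbreviation table; a real one would grow from the messages received.
ABBREVIATIONS = {"ed": "edificio", "bldg": "building"}


def normalize(text):
    """Lower-case a message, split off elided articles and expand abbreviations."""
    # Apostrophes and full stops are not word characters, so "all'edificio"
    # becomes the two tokens "all" and "edificio", and "ed." becomes "ed".
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


print(normalize("Ho visto Maria all'edificio 26"))
print(normalize("Ugo vicino all’ed. 26"))
```

Both messages end up containing the plain phrase "edificio 26", so a lookup that works on whole words can match them against a gazetteer that contains that entry.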

[Screenshot of the JRC Open Day Crowdmap deployment]

It was a very exciting two-week period, and I learned a tremendous amount. Most of all, it increased my respect for everyone out there working in the field, setting up and running things in much more adverse conditions than we had to face. My respects also to the community of software developers and volunteers who make the numerous crowdmapping efforts possible. And a big thanks to my colleagues who helped us set up everything and run it.