Sudden Onset Translation: Lessons from Colombia Disaster Simulation

Ushahidi
Dec 6, 2010

[This is an excerpt from Robert Munro's blog post on Sudden Onset Translation]

There is a place for machine-translation, crowdsourced translation and professional translation in humanitarian work. What machine-translation lacks in precision it gains in scale. It can also expand the potential worker community: even if the translations are not reliable, they may be accurate enough to allow topic-identification and prioritization.

Last month I worked on machine-translation for a UN OCHA earthquake simulation in Colombia as part of an evaluation of the recently formed Standby Taskforce. Over several days of the exercise we tested the processing and structuring of information from emergency text messages in the wake of an earthquake in Bogota. For this I created a new auto-plugin for the Ushahidi crisis map that attempts to identify the language of an incoming message and translate it into given target languages via Google and Bing’s translation services. This was a first pass at translation, in negligible time, so that responders and information managers working in any language could understand the messages as quickly as possible. The translations were then corrected by crowdsourced volunteers.

It is up to the independent observers to draw the final conclusions about the success of the simulation as a whole, but for the translation part that I was monitoring I can report that English-only speakers succeeded in identifying priority messages using the machine-translations. This is an automated expansion of our value-adding workforce beyond the language of the crisis-affected community.

In many ways this was historic: we combined machine-translation and crowdsourced translation seamlessly in a UN crisis response system. As far as I know, it is the first time that artificial intelligence and crowdsourcing have been incorporated into the United Nations information management systems. It was extremely informative to be able to do this in the context of a simulation.
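
The workflow described above (identify the language, machine-translate into each target language, then hand the result to volunteers for correction) can be sketched roughly as follows. This is only an illustration, not the actual Ushahidi plugin: the names `process_message`, `TranslatedMessage`, `toy_detect` and `toy_translate` are hypothetical, the detector and translators are stand-ins rather than real Google or Bing calls, and trying one service and falling back to the next is an assumption about how two services might be combined.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical signatures: a detector maps text to a language code, and a
# translator maps (text, source_lang, target_lang) to translated text.
Detector = Callable[[str], str]
Translator = Callable[[str, str, str], str]


@dataclass
class TranslatedMessage:
    original: str
    source_lang: str
    translations: Dict[str, str] = field(default_factory=dict)
    # Machine output is only a first pass; volunteers correct it afterwards.
    needs_review: bool = True


def process_message(text: str, detect: Detector,
                    translators: List[Translator],
                    target_langs: List[str]) -> TranslatedMessage:
    """Detect the language, then translate into each target language."""
    source = detect(text)
    result = TranslatedMessage(original=text, source_lang=source)
    for lang in target_langs:
        if lang == source:
            continue  # nothing to translate
        for translate in translators:
            try:
                result.translations[lang] = translate(text, source, lang)
                break  # first service that answers wins (an assumption)
            except Exception:
                continue  # sketch only: fall through to the next service
    return result


# Toy stand-ins so the sketch runs without any network access.
def toy_detect(text: str) -> str:
    spanish_hints = {"el", "la", "se", "edificio"}
    return "es" if spanish_hints & set(text.lower().split()) else "en"


def failing_service(text: str, src: str, tgt: str) -> str:
    raise RuntimeError("service unavailable")


def toy_translate(text: str, src: str, tgt: str) -> str:
    return f"[{src}->{tgt}] {text}"


msg = process_message("se cayo el edificio", toy_detect,
                      [failing_service, toy_translate], ["en", "es"])
```

Keeping the detector and the translation services as plain callables means a real service client could be swapped in without touching the queueing logic, and the `needs_review` flag keeps the uncorrected machine output distinguishable from what the volunteers have checked.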

(With special thanks to Anahi Ayala Iacucci for coordinating the workers, George Chamales for customizing the Ushahidi platform, Helena Puig Larrauri for working with the translation and analysis teams, Jaroslav Valůch for his watchful monitoring and everybody who responded to our open call for volunteers!)

Despite this small step for machine-kind, it seems that automated translation was least accurate at identifying location names. This feedback came from Marta Poblet, one of the translators:

I realized while checking the automated translation that it would translate as proper names words that were not: for example, the past tense of “fall” (“cayó”, or “cayo”, usually without the accent in SMS) was translated as Gaius (a Latin jurist?). Conversely, names of neighborhoods such as Salitre or Puerta al Llano were not recognized as such and were unnecessarily translated.

This is clearly problematic. Identifying locations is one of the most important tasks in structuring emergency messages, so in many cases this component would require an initial manual translation or mapping by a native speaker. This still creates a potential bottleneck, because it requires bilingual speakers in at least one early step of the process. Early work in ‘monolingual translation’ (see Chang Hu, Benjamin B. Bederson and Philip Resnik’s "Translation by Iterative Collaboration between Monolingual Users") and in named entity recognition for multilingual SMS indicates that the future of translation may be able to circumvent this, but such techniques are in their infancy.

Finally, when combining machine-translation with expert correction it is hard to go past Meedan, an Arabic/English bilingual news sharing site. Users share links and comments in either language, which are then translated by machine and corrected by professionals. From my interactions with Ed Bice and Chris Blow it is clear they are both thought leaders in how cross-lingual mass communication is possible in a digital world. If you want to understand the complications and importance of translation, there is no better place to start than with an Arabic-English article on their site.

The way we talk to each other is changing, and even crowdsourced translation now looks nothing like it did 12 months ago. Who knows how it will look in just a few more years?

(ps: In lieu of solid conclusions I can offer what you really want: more bad translation photos)