Big Data, Small Data, and You

Ushahidi
Jul 18, 2013

In a new post at TechPresident, Jeffrey Warren of Public Lab makes a full-throated argument for looking at the current data revolution in a new way. I encourage you to read his piece in full, but here is the gist as I see it. Warren posits that “big data,” the practice of mining vast quantities of data to reveal underlying patterns and the hot new trend in research, business, and everywhere else, is fundamentally flawed. He argues that the more we accept big data, the more we cede the advantage of that data to large organizations with the infrastructure and expertise necessary to analyze it. To Warren, big data is “premised on an asymmetric — purely upward — flow of data towards a central authority whom we must trust to make decisions on our behalf.” In other words, big data risks making the powerful only more powerful.

Instead, Warren puts forward an alternative called Small Data: a “bottom-up, voluntary, shared model of data aggregation whose participants are not mere data points,” and where platforms are “built on the open exchange of data, by and for the public, towards civic ends.” It is similar to a previous argument made by Rufus Pollock of the Open Knowledge Foundation, also under the label of “small data.” To Warren, small data is not just crowdsourcing but goes a step further, with the “public having a say in how that pooled data is used, and what questions it answers.” Warren likens it to the PC, which was the revolutionary counter to the idea that only the biggest and most powerful organizations would have access to computers. In his mind, data should go the same way.

There is much to unpack in Warren’s post, so I will refrain from tackling it all, but his argument did make me think more about two notions around data that I have not been able to get out of my head for months.

Years ago, Larry Diamond coined a term that defined a new field of research and practice.
Called “Liberation Technology,” the discipline looks at how new technologies (Ushahidi, for example) are being used to advance social good, whether remote sensors for human rights or social media for peacebuilding. Reading Warren’s post, I could not help but think that where we have failed so far is in advancing a related concept: “liberation data,” the active use of data to create positive social change. The data is here to stay (and more is on the way), so we need to step up and start looking at how it can, can’t, should, shouldn’t, will, and won’t be used to create a better world. Very recently there have been some steps in the right direction, for example the Data Science for Social Good summer fellowship we are participating in. But my own (ongoing) literature review on the use of data for social good has found the topic largely devoid of activity. It is, as far as I can tell, an unexplored frontier.

The second concept aligns closely with Warren and Pollock. Over the last five years I have seen a pattern: everyone has data and everyone has questions, but people treat the exploration of data like brain surgery, a mysterious skill best left to the professionals. The end result is that the supply of data scientists is minuscule compared to the demand, even when it comes to answering simple questions with simple data sets. This needs to change; we need to democratize data science. The reality is that most data science is not brain surgery and requires only a simple understanding of some basic methods of data collection and analysis. I call it “localbrew data”: data made by locals, for locals, to solve local issues. It does not need to be bleeding-edge analysis on par with top academic research. It only needs to answer the question, even if it is just with a map or a wordcloud. At the end of the day, most of us do not need the data equivalent of an MD; we just need a data first aid kit.
This is why I believe the future lies in free tools like R and Python, which let people explore their own data without waiting around for a data scientist. Given the current demand, that might be a long wait.
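
To give a sense of how small a "data first aid kit" can be, here is a minimal sketch in Python that counts the most common words in a handful of free-text reports, which is the same tally a wordcloud is drawn from. The sample reports, the stop-word list, and the `top_words` helper are all hypothetical and made up for illustration; they are not from any real data set or tool.

```python
from collections import Counter
import re

# Hypothetical sample data: free-text reports a community group might collect.
reports = [
    "Water pump broken near the market",
    "Road flooded near the market",
    "Water pump repaired",
]

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "near", "a", "an", "of"}


def top_words(texts, n=3):
    """Return the n most common non-stop-words across all texts."""
    words = []
    for text in texts:
        # Lowercase the text and pull out runs of letters and apostrophes.
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOP_WORDS:
                words.append(word)
    return Counter(words).most_common(n)


print(top_words(reports))
```

Nothing here goes beyond the Python standard library, and the result is only a rough word count, but for a local question like "what are people reporting most often?" that may be all the answer needs.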