Friday, May 10, 2013

Little Fuss on Big Data?

"Big data" is everywhere - and no, that's not ungrammatical.  Drawing on the 2.5 quintillion bytes of data that IBM says we create every day, government, IT firms, consulting firms, and universities are not only searching intensively for more ways to crunch massive amounts of structured and unstructured data, in order to yield new insights into behavior and make better predictions and decisions, but also adding to those data by relentlessly publicizing the virtues of using big data.  In the midst of the almost deafening buzz about big data, two commentaries this week offered some trenchant observations about the care with which we need to gather and analyze big data.

First, in a May 9 article for Foreign Policy, Kate Crawford of the MIT Center for Civic Media warns against blind trust that big data necessarily "illuminate the hidden world of human behavior."  Crawford makes five main points.  First, as she puts it, "Numbers can't speak for themselves, and data sets -- no matter their scale -- are still objects of human design. The tools of big-data science . . . do not immunize us from skews, gaps, and faulty assumptions."  Noting that "there is a problematic belief that bigger data is always better data and that correlation is as good as causation," she points out the risks of reliance on social-media data in light of its potential nonrepresentativeness, confirmation bias, and flawed or biased algorithms.  Second, while she accepts that "[b]ig data can provide valuable insights to help improve our cities," she also observes that "big-data approaches to city planning depend heavily on city officials understanding both the data and its limits."  Third, she identifies the real potential for researchers, marketers, and even law enforcement to use big data in improperly discriminatory ways.  Fourth, she observes that big data can pose threats to privacy, through re-identification of individuals whose data are part of big data aggregations, and through tracking individuals' identities and activities.  Fifth, she cautions that "unless we recognize and address some of big data's inherent weaknesses in reflecting on human lives, we may make major public policy and business decisions based on incorrect assumptions."  Crawford concludes on an optimistic note, saying that "we can draw on expertise across different fields in order to better recognize biases, gaps, and assumptions, and to rise to the new challenges to privacy and fairness."

Second, in a May 10 post on Wired's Innovation Insights blog, entrepreneur Ari Zoldan focuses on three significant problems with big data: (1) "it’s so vast and unorganized, that organizing it for analysis is no easy task," a task that includes identifying the assumptions on which algorithms for big-data analysis are based; (2) the sheer volume of data can lead researchers into "signal error" (i.e., overlooking large gaps in data) and confirmation bias; and (3) drastically wrong conclusions from incorrect analysis risk being broadcast faster in a globally connected world.  Zoldan offers three pieces of advice: (1) "approach every data set with skepticism [and] . . . assume that the data has inherent flaws"; (2) "realize that data is a tool, not a course of action," and use common sense in analyzing and basing decisions on big data; and (3) remember, as he reiterates, that we need "the means to analyze and interpret [big data] for use."

The two commentaries make similar points, and that's a good thing.  The points they raise are fundamental to developing a healthier and more informed perspective on big data and its responsible use in societies around the world.
