Big Data: Is There Such a Thing as Too Much?

Researchers across the globe are both delighted and discouraged by the massive quantity of data that now exists and is being created at an unprecedented pace. On the one hand, this means that a wealth of material, most of it easily accessible, exists for them to sift through and use in more in-depth studies. On the other hand, the amount of data is truly overwhelming. Most analysis still requires human involvement, and humans can only read so fast. Researchers must therefore either content themselves with a limited number of sources, which may yield less comprehensive results, or turn to computer-based analysis tools.

One research system, ASSANA, uses four methods to analyze data:
1) Traditional – A researcher reads each source, makes observations, and develops subject codes along the way.
2) CAQDAS – Computer-assisted qualitative data analysis software. Enables faster hand coding and automatic data cleanup.
3) CADM – Computer-assisted data mining and content analysis software. Provides a statistical and algorithmic analysis of text to look for patterns.
4) High-Performance Data Mining – Content analysis in a high-performance computing environment using R/tm.[1]
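
To make the third and fourth methods more concrete, here is a minimal sketch of the kind of term counting that content analysis software performs. It is an illustration in Python using only the standard library, not the ASSANA system or the R/tm package itself. The sample documents, the stopword list, and the function names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; real tools ship much longer ones.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "as"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words, minus stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def term_document_counts(documents):
    """Return one Counter of term frequencies per document."""
    return [Counter(tokenize(doc)) for doc in documents]


def document_frequency(counts):
    """Count how many documents each term appears in."""
    df = Counter()
    for per_doc in counts:
        df.update(per_doc.keys())
    return df


# Made-up sample sources, for illustration only.
documents = [
    "Privacy concerns grow as data collection expands.",
    "Data mining reveals patterns in patient data.",
    "Regulators study privacy and data protection.",
]

counts = term_document_counts(documents)
df = document_frequency(counts)

# Terms that occur in every document hint at a shared theme.
shared_terms = [term for term, n in df.items() if n == len(documents)]
print(shared_terms)
```

A human coder would still need to interpret what the shared terms mean. The counting only narrows down where to look.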

With this system, a much greater quantity of data can be analyzed, and with much higher precision, than with any single method on its own. Data analysis tools will almost certainly grow more capable over the years. Yet with greater analysis come several challenges. In the article “Biology’s Big Problem: There’s Too Much Data to Handle,” Emily Singer states that “Big data efforts have almost invariably generated data that is more complicated than scientists had expected, leading some to question the wisdom of funding projects to create more data before the data that already exists is properly understood.”[2] This brings to mind the age-old question “Is it possible to have too much of a good thing?”

In Big Data: A Revolution That Will Transform How We Live, Work, and Think, Ken Cukier and Viktor Mayer-Schonberger caution that the ability to collect personal information has become a built-in feature of many of our devices. This information, they state, could be used to identify patterns in user behavior that lead authorities or other services to predict people’s likelihood of being involved in certain activities, which could undermine the “innocent until proven guilty” principle that America prides itself on.

Kate Crawford, a Microsoft principal researcher, agrees that big data could, or in fact already does, lead to discrimination. She comments: “It’s not that big data is effectively discriminating — it is, we know that it is. It’s that you will never actually know what those discriminations are.”[3] She illustrates this by noting that web advertising is not as strictly regulated, so while a bank, for instance, has to publish statistics about its activities, it could still be discriminating by advertising only to the groups of people that it is most interested in.

With growing interest in, and concern about, the ability to reach, study, and hide information from large numbers of people, big data will play a pivotal role in future technology and regulation. Overall, its existence has been beneficial – the ability to process and note patterns in massive amounts of research has advanced areas such as the medical field and business development. However, on an individual and regulatory level, we will have to be mindful of the warnings from Cukier, Mayer-Schonberger, and Crawford that loss of privacy could lead to negative outcomes such as preemptive charges and discrimination.

