We're sitting on a big-data gold mine
Imagine getting a phone call from your doctor telling you that changes in your grocery purchases, as recorded by your shopper's loyalty card, indicate a possibility of dementia.
Now imagine that same data being used by your health-insurance provider to increase your premiums, or by a potential employer to take you out of the running for a job you applied for.
That's one of the quandaries of big data discussed in a March 16, 2015, article by Ulrike Deetjen on the Policy and Internet Blog. Deetjen writes that medical researchers in Sweden claim to have a "goldmine" of data on dementia and many other diseases, but there are too few scientists to analyze it. Non-medical behavioral data, such as shopping records, can improve early diagnosis of dementia in particular, as well as of many other health conditions.
Researchers hope to combine data from diverse sources around the world to create a vast repository of information about us that can be tapped for insights into the public's health and well-being. Two of the largest obstacles to such a universal public database are the lack of data standards that would make the information easy to share, and the public's fear that such widespread data sharing threatens their privacy and security.
The solution to the first problem is to create data that is "sharable by design." The technical hurdles to freeing data from the silos where it currently resides are formidable, but they can be overcome. In an April 29, 2015, article, the Huffington Post's George Vradenburg discusses the challenges researchers faced when attempting to aggregate data for a recent crowd-sourced, public-research Alzheimer's project.
Overcoming public suspicion about blanket data collection is thornier because it comes down to trust. And trust is in short supply when it comes to the collection of private information by government and big business.
Trust requires informed consent, safety assurances
Would you donate your personal information -- what you do, where you go, how you feel -- to medical, social-science, and other researchers if you knew that by doing so you could help treat and prevent diseases, poverty, corruption, and other public ills? Even knowing that some of the data could be tied back to you and possibly used against your interests? What if doing so would also make some businesses richer, without any direct benefits to you?
Would I be willing to do so? It depends. Would the social benefit outweigh the personal cost of a loss of privacy and the potential security risk? What am I giving up, and what do we all get in return? That's informed consent in a nutshell. Right now, most of our personal information is collected without our informed consent. That, too, may be changing -- slowly. In a January 12, 2015, article on The Conversation, Anya Skatova and James Goulding write that 60 percent of respondents to a recent survey indicated they would be willing to donate their personal information to benefit the public.
Many of the people who said they would share their data noted that the information is already in the hands of private companies, so why not make it available for public-service purposes as well? They would place two conditions on such data collection: that they know what information is being collected, and that they be able to track the information and exercise some control over how it is used.
That circles back to the question of who "owns" the data. The companies that collect it claim that it's theirs -- or at least claim an unrestricted, unlimited license to use it. A cooperative formed in Spain last year wants to help people regain some control over their personal information. The GoodData intends to sell its members' data to information brokers and donate the proceeds to Kiva, a nonprofit that provides peer-to-peer lending for people in developing countries. Fast Company's Ben Schiller describes the program in a March 28, 2014, article.
There are many other examples of the public use of private data, as Computerworld's Josh New describes in a February 4, 2015, article: UN workers analyze Facebook posts to help prevent teenage pregnancies and the transmission of diseases; an FDA researcher mined electronic health records from Kaiser Permanente to determine that the prescription pain medication Vioxx should be withdrawn from the market; and satellite imagery from Google Maps is used in Uganda to prioritize aid to the poorest areas of the country.
One of my favorite success stories of data mining for the public good appears in a recent research article in the journal Policy & Internet. The article, "Cyber Hate Speech on Twitter: An Application of Machine Classification and Statistical Modeling for Policy and Decision Making," describes the development of a "machine learning text classifier" that scans Twitter posts to identify hate speech. The article was first published online on April 22, 2015, and is available on the Wiley Online Library.
Such a system could be used to warn people posting on Twitter -- and on other social media -- when a post they are about to publish could be deemed hate speech. It could also let readers block such posts, or at least warn them before a post opens that it may contain disturbing content. I can see it now: "Are you sure you want to use this public Facebook post to call your boss a brainless pot of spoiled, festering gruel?"
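The Policy & Internet article's own method isn't described here, so what follows is only a toy illustration of the general idea. It is a minimal Python sketch of a naive Bayes text classifier. This is a common textbook technique and may well differ from what the researchers actually built. It is paired with a hypothetical warn_before_posting helper that shows how a pre-posting warning might work. The six training posts are made up for illustration; a real system would need far more, and far more carefully labeled, training data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a post and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesClassifier:
    """Toy multinomial naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}         # label -> Counter of word frequencies
        self.doc_counts = Counter()   # label -> number of training posts
        self.vocab = set()            # every word seen during training

    def train(self, examples):
        """Learn word frequencies from (text, label) pairs."""
        for text, label in examples:
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)

    def log_score(self, text, label):
        """Unnormalized log-probability: log P(label) plus log P(word | label) per word."""
        total_posts = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_posts)
        counts = self.word_counts[label]
        denominator = sum(counts.values()) + len(self.vocab)
        for token in tokenize(text):
            # Add-one smoothing keeps unseen words from zeroing out the score.
            score += math.log((counts[token] + 1) / denominator)
        return score

    def classify(self, text):
        """Return the label with the highest score for this post."""
        return max(self.doc_counts, key=lambda label: self.log_score(text, label))


# Hypothetical, hand-made training posts, for illustration only.
TRAINING_POSTS = [
    ("you brainless idiot", "flagged"),
    ("what a festering pot of spoiled gruel you are", "flagged"),
    ("you are worthless trash", "flagged"),
    ("great meeting today", "ok"),
    ("thanks for the help with the report", "ok"),
    ("see you at lunch", "ok"),
]


def warn_before_posting(classifier, draft):
    """Hypothetical helper: return a warning prompt if the draft looks abusive."""
    if classifier.classify(draft) == "flagged":
        return "Are you sure you want to post this?"
    return None


classifier = NaiveBayesClassifier()
classifier.train(TRAINING_POSTS)
```

With these toy posts, warn_before_posting(classifier, "my boss is a brainless pot of gruel") returns the warning prompt, while a harmless draft such as "see you at lunch tomorrow" returns None. Naive Bayes is used only because it is short and easy to follow.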