How open data can help save the world, December 2, 2015

Q: How did the Soviet Union's 1983 downing of a Korean passenger jet that had strayed into its airspace lead to Uber, Waze, and countless other revolutionary products and services?

A: Open data.

The Atlantic’s Sarah Laskow explains in a November 3, 2015, article that the downing of Korean Air Lines Flight 007 on September 1, 1983, prompted President Reagan to speed up making the U.S. government’s GPS satellite signals available to civilians. More than 10 years and $10 billion later, data from the government’s GPS satellite network was opened to private parties, although the most precise GPS data was reserved for government agencies and their chosen partners.

The benefits of open data extend far beyond GPS. In a December 17, 2013, article, InformationWeek’s Wyatt Kash describes the OpenData 500 (OD500) Global Network created by New York University’s Governance Lab (GovLab). OD500 helps government agencies in countries around the world make their data available for social-benefit and commercial projects. Its site lets you search for U.S. companies that use open data by state, industry, and federal agency.

We’re only scratching the surface of open data’s potential to improve people’s lives. A November 21, 2015, article in The Economist highlights open data’s many possibilities. In addition to making governments more accountable and helping to fight corruption, open data promises to get more citizens involved in public services.

Government operations can also be made more efficient by adopting smart-city technologies and by sharing and regionalizing services. When government data is made available to the public, it can be verified and improved by citizen volunteers, such as those organized by Code for America. Last but not least, open data lets the people who receive government services help determine how those services are provided.

Current state of open data: Not good, but getting better

The Economist lists four formidable obstacles to the use of public data for the public good:

1) Most of the data being released by government agencies has no practical use.
2) The data is difficult to analyze.
3) There’s a shortage of people with the skills required to analyze the data.
4) Use of the data is limited by privacy concerns.

Despite these obstacles, government agencies of all types and sizes are embracing “citizen-facing software and services that connect citizens, tourists and businesses with government services,” according to TechCrunch’s Maury Blackman in a November 24, 2015, article. They’re taking an entrepreneurial approach to open data, driven primarily by citizens’ rising expectations. “Citizens feel all data concerning civic matters should be readily accessible and consumable, so these info-savvy individuals are pushing local governments to open their data ports and let them in,” according to Blackman.

Adventures in open data: Three very different examples

GCN’s Derek Major writes in a November 23, 2015, article about “California’s new open data website, DebtWatch, which provides details about $1.5 trillion in debt issued by state and local government entities.” The site runs on Socrata, a cloud service designed for federal and other government agencies. According to Major, DebtWatch can be used to “track proposed and issued debt, cost of issuance as well as bond and tax election results. The debt-related information on the site goes back to 1984, includes more than 2.8 million fields of data and will be updated monthly. Users can download raw numbers, create charts and make comparisons among debt issuers.”

In their spare time, three programmers created Seattle’s Center for Open Policing, which sued successfully to force the Seattle Police Department to release data on the location of police patrols, along with information about citizen complaints against, and internal investigations of, specific police officers. Steve Miletich of the Seattle Times reports in a November 23, 2015, article on Government Technology that the three programmer-activists were motivated by run-ins each had with law enforcement. They are developing a database the public can use to find information about specific police officers.

Drug companies are being compelled to release all data relating to their drug trials that use human subjects. Xconomy’s Alex Lash writes in a November 17, 2015, article that “[i]n exchange for the privilege of testing for-profit drugs on human subjects, companies have an obligation to make all their clinical data available for the common good. Science and medicine should work best when research can be examined, criticized, and serve as a foundation for more studies…. [T]he FDA’s parent department, Health and Human Services, has ethical language in its ‘common rule,’ which in turn was based on a 1979 report on the ethics of research with human subjects.”

Lash reports that GlaxoSmithKline suppressed the results of a study that found Paxil was not only ineffective but, worse, dangerous. Several people committed suicide as a result of the drug’s side effects.

Blueprint for ensuring open data is truly open, truly useful

Open Knowledge is a “worldwide non-profit network of people passionate about openness, using advocacy, technology and training to unlock information and enable people to work with it to create and share knowledge.” Its Open Definition is intended to ensure that open data is truly open and truly useful.

The Open Data Handbook provides more detailed instruction in the form of tutorials, guides, case studies, and resources. It covers choosing a dataset, applying an open license, making the data available for download, and making it searchable on a website.

Medium’s Susan Crawford writes in a November 2, 2015, article on Backchannel about the city of Louisville using OpenStreetMap, the collaboratively built open mapping project, to help blind people get around town. The American Printing House for the Blind (APH) created an Android app called Nearby Explorer that tells blind people in APH’s hometown of Louisville the street addresses of the buildings they pass.

APH chose OpenStreetMap because it’s inexpensive, doesn’t require a lot of resources, and is accurate. Volunteers from Code for America brought APH together with officials from the city of Louisville, who offered to provide APH with address information from the city’s building footprint database.

GCN’s Amanda Ziadeh writes in a November 13, 2015, article about OpenDataSoft, which “has mapped more than 1,600 open data portals worldwide, in an effort to make clean, usable data easier to find and access.” The sources cataloged by the project, called Open Data Inception, range from public high schools to federal agencies and military departments.

Downside of open data

Medium’s Nick Selby writes in a November 26, 2015, article about a Los Angeles councilwoman’s proposal to use data collected by license-plate readers to send warning letters to people whose cars are spotted in high-crime areas, such as locations where prostitution is prevalent. In addition to being unconstitutional, the idea is vague beyond reckoning. But its most dangerous aspect is the chilling effect it would have on freedom of association and freedom of movement. Keep in mind that there is no court order and no due process in this unwise attempt to automate reasonable suspicion.

On top of this, there’s the matter of data retention. In theory, the license-plate data is disposed of quickly. But once the data has been used to send a letter, it becomes subject to public-records requests, so the names of the people receiving these letters could be published.

Thus we circle back to the Achilles’ heel of open data: privacy. It’s far too easy to de-anonymize personal information that has supposedly been anonymized. For example, researchers at Harvard University’s Data Privacy Lab were able to match anonymized medical records released by the State of Washington with newspaper accounts of accidents, which allowed them to identify the medical histories of the people named in those stories.
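The re-identification described above works by a general technique usually called a linkage attack: two datasets are joined on attributes they share (quasi-identifiers such as ZIP code, age, sex, and date), and any record that matches exactly one named person is re-identified. The minimal Python sketch below illustrates the idea under stated assumptions. The record layouts, field names, and all data are hypothetical and invented for illustration; this is not the Data Privacy Lab’s actual method or data.

```python
from dataclasses import dataclass

# Hypothetical "anonymized" hospital record: the name is removed,
# but quasi-identifiers that can single someone out are kept.
@dataclass(frozen=True)
class HospitalRecord:
    zip_code: str
    age: int
    sex: str
    admitted: str  # ISO date
    diagnosis: str

# Hypothetical facts about a named person, as a news story might report them.
@dataclass(frozen=True)
class NewsReport:
    name: str
    zip_code: str
    age: int
    sex: str
    date: str

def quasi_id(zip_code, age, sex, date):
    """The combination of attributes both datasets share."""
    return (zip_code, age, sex, date)

def link(records, reports):
    """Return {name: diagnosis} for every report that matches exactly one record."""
    index = {}
    for r in records:
        key = quasi_id(r.zip_code, r.age, r.sex, r.admitted)
        index.setdefault(key, []).append(r)
    reidentified = {}
    for n in reports:
        matches = index.get(quasi_id(n.zip_code, n.age, n.sex, n.date), [])
        if len(matches) == 1:  # a unique match re-identifies the person
            reidentified[n.name] = matches[0].diagnosis
    return reidentified

# Invented sample data (hypothetical).
records = [
    HospitalRecord("98101", 34, "F", "2011-06-02", "fractured femur"),
    HospitalRecord("98101", 34, "F", "2011-06-02", "concussion"),  # ambiguous pair
    HospitalRecord("98052", 61, "M", "2011-07-15", "chest trauma"),
]
reports = [
    NewsReport("Jane Doe", "98101", 34, "F", "2011-06-02"),
    NewsReport("John Roe", "98052", 61, "M", "2011-07-15"),
]

print(link(records, reports))  # {'John Roe': 'chest trauma'}
```

The ambiguous pair shows why stripping names alone offers little protection: a person is shielded only when several records share the same combination of quasi-identifiers, and even then another outside source could narrow the field.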

Even if we can’t find a reliable way to keep anonymized data from being reconstituted into personally identifiable information, the benefits of open data could outweigh some loss of privacy. Besides, the data is going to be collected and analyzed regardless of the privacy risk, so we might as well make the process as transparent as possible.