Making Open Data more evidence-based: Toward a user-centric and interdisciplinary research agenda to advance open data.

17 October 2016

Reposted from IODC2016 (Spanish version) – by Stefaan G. Verhulst and Danny Lämmerhirt
Key take-aways from the Measurement Action Track at IODC 2016
During the IODC 2016, the “Action Track: Measurement and Increasing Impact” sought to review the need and role of research for (scaling) open data practice and policy. The track was informed by the various sessions and workshops that took place at the Open Data Research Symposium prior to the Conference.
In what follows, we summarize what we heard throughout the conference.
Headline message that emerged from our engagement with the community at IODC: To realize the potential of open data, more evidence is needed on its full life cycle, within and across settings and sectors.
Many participants acknowledged and shared progress toward gathering evidence on developments, actors and conditions that impact open data. Yet, a consensus emerged that more systematic research is still needed. An “evidence-based and user-centric open data” approach is necessary to drive adoption, implementation, and use.
In particular, three substantive areas were identified that could benefit from interdisciplinary and comparative research:
Demand and use: First, many expressed a need to become smarter about the demand and use side of open data. Given the nascent nature of many initiatives around the world, much of the focus has been on the supply side. Yet for open data to be more responsive and sustainable, more insight is needed into demand and user needs.
Conversations repeatedly emphasized that we should differentiate between open data demand and use. Open data demand and use can be analyzed from multiple directions: 1) top-down, starting from a data provider, to intermediaries, to the end users and/or audiences; or 2) bottom-up, studying the data demands articulated by individuals (for instance, through FOIA requests), and how these demands can be taken up by intermediaries and open data providers to change what is being provided as open data.
Research should scrutinize each stage (provision, intermediation, use and demand) on its own, but also examine the interactions between stages (for instance, how may open data demand inform data supply, and how does data supply influence intermediation and use?).
Several research questions were proposed, including the following: What is the demand for open data, and do interest groups understand the potential value open data conveys for them? If so, how can these interest groups be studied? Who are the audiences of open data? What are the different types of users (and users of users)? What are their needs? What problems or opportunities do current and potential users seek to address using open data? When do users become producers of data and vice versa? What is the role of data intermediaries in providing and using open data? How can we study and establish feedback loops between open data users, intermediaries, and providers that help make open data more relevant to users? Do we need professional standards for different types of users, such as data journalists?
Unfortunately – besides traditional UX research methods – no method exists for data holders and/or users to assess demand and use in a manner that can inform design and policy requirements.
Next steps:

  • Toward that end, participants suggested a collaborative effort to develop a “diagnostic tool or method” that maps and analyzes the open data ecosystem, in order to better understand the needs, interests, and power relations of different stakeholders, users, non-users, and other audiences.
  • In addition, to be more deductive and explanatory, and to generate operational insights (for instance, on which users to prioritize), several IODC participants recommended expanding the development and exchange of “demand and use” case studies that draw on interdisciplinary perspectives and go beyond a descriptive collection of examples.

Informing data supply and infrastructure: Second, we heard on numerous occasions a call for researchers and domain experts to help identify “key data” and inform the government data infrastructure needed to provide them. Principle 1 of the International Open Data Charter states that governments should provide key data “open by default”, yet the question remains how to identify “key” data (e.g., would that mean data relevant to society at large?).
Which governments (and other public institutions) should be expected to provide key data, and what information do we need to better understand governments’ role in providing key data? How can we evaluate progress around publishing these data coherently if countries organize the capture, collection, and publication of these data differently?
Next steps: Several steps were suggested to help policy and decision makers prioritize data sets and allocate resources accordingly, including:

  • Develop decision trees that compare and integrate evidence on the demand, benefits and risks of data-sets;
  • Identify and analyze “data deserts” – where no or little data is collected and made available;
  • Develop and provide assessment frameworks for National Statistical Offices on the potential value of certain data-sets.
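As a rough illustration of the first step above, a decision tree over demand, benefit, and risk evidence could be sketched as follows. The `DatasetEvidence` fields, the 0 to 1 scoring scale, the thresholds, and the outcome labels are all hypothetical assumptions made for this sketch; none of them were proposed at IODC.

```python
from dataclasses import dataclass


@dataclass
class DatasetEvidence:
    """Hypothetical evidence scores for one data set, each on a 0..1 scale."""
    name: str
    demand: float   # e.g. share of user requests that mention this data set
    benefit: float  # e.g. an expert estimate of public value
    risk: float     # e.g. an estimate of privacy or misuse risk


def prioritize(evidence, demand_min=0.5, benefit_min=0.5, risk_max=0.5):
    """Walk a small, hand-written decision tree and return a recommendation.

    The thresholds are illustrative assumptions, not recommended values.
    """
    # Branch 1: high risk overrides everything else.
    if evidence.risk > risk_max:
        return "review risks before release"
    # Branch 2: both demand and benefit are strong.
    if evidence.demand >= demand_min and evidence.benefit >= benefit_min:
        return "publish first"
    # Branch 3: only one of the two is strong.
    if evidence.demand >= demand_min or evidence.benefit >= benefit_min:
        return "publish later"
    # Branch 4: neither is strong, so the evidence base is too thin to decide.
    return "gather more evidence"


# Made-up example entries, for illustration only.
candidates = [
    DatasetEvidence("company register", demand=0.8, benefit=0.9, risk=0.2),
    DatasetEvidence("health records", demand=0.9, benefit=0.9, risk=0.9),
    DatasetEvidence("street trees", demand=0.2, benefit=0.6, risk=0.1),
]
recommendations = {c.name: prioritize(c) for c in candidates}
```

In practice the branches and thresholds would have to come out of the comparative evidence the bullet calls for; the sketch only shows how such evidence could be combined once it exists.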

Impact: In addition to those two focus areas, covering the supply and demand side, there was also a call to become more sophisticated about impact. Too often impact gets confused with outputs, or even activities. Given the embryonic and iterative nature of many open data efforts, signals of impact are limited and often preliminary. In addition, different types of impact (such as enhancing transparency versus generating innovation and economic growth) require different indicators and methods. At the same time, to allow for regular evaluations of what works and why, there is a need for common assessment methods that can generate comparative and directional insights.
Next steps: Joint efforts were recommended to develop:

  • Data-value chain assessment mechanisms that can identify and illustrate how value gets generated (if at all), at what stage and under which conditions;
  • A conceptual framework that can accommodate the (e)-valuation of data as an infrastructure or “commons” (similar to other public interest resources such as green spaces or air quality).

Research Networking: Several researchers identified a need for better exchange and collaboration among the research community. This would make it possible to tackle the research questions and challenges listed above, to identify gaps in existing knowledge, to develop common research methods and frameworks, and to learn from each other. Key questions posed included: How can we nurture and facilitate networking among researchers and (topical) experts from different disciplines, focusing on different issues or using different methods? How are different sub-networks connected to or disconnected from each other (for instance, how connected are the data4development, freedom of information, and civic tech research communities)? In addition, an interesting discussion emerged around how researchers can network more with those who are part of the respective universe of analysis, potentially generating some kind of participatory research design.
Next steps: To enable networking and increased matching of expertise, needs and interests, resources and efforts must be directed toward:

  • A collaborative (and dynamic) mapping of the current open data research eco-system – identifying both the supply and demand for research; and how research questions and methods of different research disciplines already intersect and could cross-pollinate each other;
  • Network analysis of the open data research universe to identify gaps and hubs of expertise (including, for instance, possible correlation analysis of participants of different open data-related conferences);
  • Experimentation with participatory research design – not only to study “the user”, but to study open data “with the user”;
  • Experimentation and evaluation of different networking and collaboration platforms. This may increase our understanding of the usability and usefulness of existing research networks.
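The network analysis suggested in the list above, including the idea of comparing participants across conferences, could start from something as simple as the sketch below. The conference names and participant lists are invented placeholders, and Jaccard overlap is one assumed choice of measure among several possible ones.

```python
from itertools import combinations

# Hypothetical attendance lists: conference name -> set of participant IDs.
attendance = {
    "open_data_conf": {"ana", "ben", "chi", "dev"},
    "foi_conf": {"chi", "dev", "eli"},
    "civic_tech_conf": {"fay", "gus"},
}


def jaccard(a, b):
    """Share of participants two events have in common (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_matrix(attendance):
    """Jaccard overlap for every pair of conferences, keyed by sorted name pair."""
    return {
        (x, y): jaccard(attendance[x], attendance[y])
        for x, y in combinations(sorted(attendance), 2)
    }


def bridging_participants(attendance):
    """People who attended more than one conference (candidate hubs)."""
    counts = {}
    for people in attendance.values():
        for person in people:
            counts[person] = counts.get(person, 0) + 1
    return sorted(p for p, n in counts.items() if n > 1)


def disconnected_conferences(attendance):
    """Conferences that share no participants with any other (candidate gaps)."""
    return sorted(
        name
        for name, people in attendance.items()
        if all(
            not (people & other)
            for other_name, other in attendance.items()
            if other_name != name
        )
    )


overlaps = overlap_matrix(attendance)
hubs = bridging_participants(attendance)
gaps = disconnected_conferences(attendance)
```

With real participant lists, low overlap scores would point to research communities that rarely meet, while the bridging participants are natural candidates for connecting them.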

To conclude, the papers submitted and presented at the Open Data Research Symposium (all downloadable from odresearch.org) and the growing literature on open data (see for instance ogrx.org) indicate that much progress has been made toward an enhanced understanding of open data – its suppliers, users and practices. Yet as the Open Data community matures, more evidence is needed to guide future investments and uses. Ultimately the open data community should “walk the talk” and become more data-driven, which means that more investment is needed to support research and to network the evidence and expertise that already exist.
This blogpost intends to start a conversation about how we can better research open data. We invite everyone interested in this important area to discuss with us the possible research topics proposed above.
How can we operationalize the topics described above?
Are any important research areas missing?
Please feel free to use established venues including the Network of Innovators and the Open Knowledge Discuss Forum for Research and Policy. This allows us to create central discussion channels that will bring interdisciplinary research interests together.
See also: Open Data Research Symposium