Davies, T., Walker, S., Rubinstein, M., & Perini, F. (Eds.). (2019). The State of Open Data: Histories and Horizons. Cape Town and Ottawa: African Minds and International Development Research Centre.
First published in 2019 by African Minds and the International Development Research Centre (IDRC).
African Minds
4 Eccleston Place
Somerset West, 7130
Cape Town, South Africa
A co-publication with
International Development Research Centre
PO Box 8500, Ottawa, ON, K1G 3H9, Canada
© Contributors 2019. Licensed under the Creative Commons Attribution 4.0 International licence (http://creativecommons.org/licenses/by/4.0/).
The research presented in this publication was carried out with the aid of the Open Data for Development (OD4D) Network and a grant from the International Development Research Centre, Ottawa, Canada. The views expressed herein do not necessarily represent those of IDRC or its Board of Governors.
ISBNs:
Print edition 978-1-928331-95-7
eBook edition (IDRC): 978-1-55250-612-7
ePub edition: 978-1-928331-96-4
Orders:
African Minds
4 Eccleston Place, Somerset West, 7130, Cape Town, South Africa
For orders outside Africa:
African Books Collective
PO Box 721, Oxford OX1 9EN, UK
Foreword by Beth Simone Noveck
SECTION 1: OPEN DATA SECTORS AND COMMUNITIES
Chapter 1. Accountability and anti-corruption
Jorge Florez and Johannes Tonn
Ben Schaap, Ruthie Musker, Martin Parr, and André Laperriere
Jack Lord
Sandra Elena
Chapter 5. Development assistance and humanitarian action
Catherine Weaver, Josh Powell, and Heather Leson
Javiera Atenas and Leo Havemann
Selwyn Willoughby
Anders Pedersen
Renée Sieber
Chapter 10. Government finances
Cécile Le Guen
Mark Irura
Tim Davies and Sumandro Chattapadhyay
Chapter 13. National statistics
Shaida Badiee, Caleb Rudow, and Eric Swanson
Stephen Song
Pieter Colpaert and Julián Andrés Rojas Meléndez
Jean-Noé Landry
SECTION 2: ISSUES IN OPEN DATA
Chapter 17. Algorithms and artificial intelligence
Tim Davies
Chapter 18. Data infrastructure
Leigh Dodds and Peter Wells
Mariel Garcia Montes and Dirk Slater
Ana Brandusescu and Nnenna Nwakanma
Chapter 21. Indigenous data sovereignty
Stephanie Carroll Rainie, Tahu Kukutai, Maggie Walter, Oscar Luis Figueroa-Rodríguez, Jennifer Walker, and Per Axelsson
Danny Lämmerhirt and Ana Brandusescu
Teresa Scassa
SECTION 3: OPEN DATA STAKEHOLDERS
Christopher Wilson
Chapter 25. Donors and investors
Fernando Perini and Michael Jarvis
Barbara-Chiara Ubaldi
Chapter 27. Journalists and the media
Alex Howard and Eva Constantaras
Chapter 28. Multilateral organisations
Craig Hammer
Joel Gurin, Carla Bonina, and Stefaan Verhulst
François van Schalkwyk
SECTION 4: OPEN DATA AROUND THE WORLD
Chapter 31. Eastern Europe and Central Asia
Lejla Sadiku and Yaera Chung
Rufus Pollock and Danny Lämmerhirt
Chapter 33. Latin America and the Caribbean
Silvana Fumega and Maurice McNaughton
Chapter 34. Middle East and North Africa
Nagla Rizk, Nancy Salem, and Stefanie Felsberger
Chapter 35. North America, Australia, and New Zealand
David Eaves, Ben McGuire, and Audrey Carson
Chapter 36. South, East, and Southeast Asia
Michael Canares
Leonida Mutuku and Teg-wende Idriss (Tinto)
The editors would like to thank the International Development Research Centre (IDRC) for its support of the State of Open Data project from inception to conclusion, without which this publication would not have been possible.
We also wish to acknowledge the ongoing efforts of the Open Data for Development (OD4D) network to help create open data ecosystems around the world in order to spur social change, increase government transparency, and support the Sustainable Development Goals (SDGs), as well as its ongoing commitment to promoting and understanding the impact of open data, which resulted in the publication of this volume. OD4D has received support from IDRC, the World Bank, the United Kingdom’s Department for International Development (DFID), the William and Flora Hewlett Foundation, and Global Affairs Canada.
The State of Open Data project is the result of a collaboration, drawing on input from over 200 individuals. Authors have benefitted from independent reviews by members of the Editorial Board and invited reviewers, as well as the input of many more contributors during the early online “Environment Scan” stage of the project. We have endeavoured to credit all non-anonymous contributions, but know there will have been input and suggestions offered at workshops or in conversations that are not recorded below. If you were a contributor to the State of Open Data project in any way, we thank you.
The State of Open Data project owes its greatest debt to all of the authors who have come together to contribute chapters to this volume, bringing to the project an unprecedented level of expertise and knowledge, as well as a diversity of invaluable experience. We have included more specific information on every author in each chapter.
We would like to recognise and thank all the members of The State of Open Data’s dedicated Editorial Board, our extremely knowledgeable team of peer reviewers, and all those who have provided additional assistance to the authors through their contribution to the environment scans or to the development of the chapters in this volume.
Ania Calderon, Craig Hammer, Fiona Smith, Joel Gurin, Katelyn Rogers, Lejla Sadiku, Maurice McNaughton, Muchiri Nyaggah, Nancy Salem, Nnenna Nwakanma, Shaida Badiee, Stefaan Verhulst, and Teg-wende Idriss (Tinto).
Abed Khooli, Ali Rebaie, Amanda Smith, Amy Guy, Anca Matioc, Ania Calderon, Caleb Rudow, Claire Schouten, Claudia Schwegmann, David McNair, Eric Swanson, Francesca De Chiara, Giuseppe Sollazzo, Hatem Ben Yacoub, Jacqueline Klopp, Jean-Noé Landry, Jenna Slotin, Joshua Powell, Julian Tait, Keitha Booth, Krishna Sapkota, Krzysztof Izdebski, Leigh Dodds, Maya Forstater, Mollie Hanley, Omenogo Mejabi, Oscar Montiel, Paul Walsh, Paulina Bustos Arellano, Pyrou Chung, Raed M. Sharif, Rafael García Aceves, Riyadh Al-Balushi, Rob Kitchin, Rosario Pavese, Satyarupa Shekhar, Tom Orrell, Willow Brugh, and Yan Naung Oak.
Carlos Iglesias, Enrique Zapata, Arjan El Fassed, Stefanie Felsberger, Alan Hudson, Kshitiz Khanal, Michal Kubáň, Carla Bonina, Devangana Khokhar, James McKinney, Khairil Yusof, Pierre Chrzanowski, Adrián Pino, Ana Brandusescu, Andrea Borruso, Anne-Marie Heemskerk, Bart Hanssens, Ben Parker, Christian Medina-Ramirez, Eduard Martin-Borregon, Elise Dufief, Eva Constantaras, Francois van Schalkwyk, Joshua Tauberer, Lindsay Read, Manuel Acevedo, Marina Godoy Crotto, Martin Noblecourt, Martín Szyszlican, Matteo Brunati, Paige Kirby, Rachel Rank, Rupert Simons, Aaron Wytze, Adam Kariv, Alannah Hilt, Alla Morrison, Andi Pawelke, Andrea Ayres Deets, Andrew Nicklin, Andrew Therriault, Andrew Young, Anna Alberts, Anna Fleming, Anna Powell-Smith, Anton Ruehling, Antonio Jesús Sánchez Padial, Arturo Muente-Kunigami, Audrey Ariss, Bierta Thaci, Carole Excell, Chang Liu, Chipo Msengezi, Chris Taggart, Daniel Carranza, Darko Brkan, David Moore, David Rae, David Sasaki, David Selassie Opoku, David Wasylciw, Denice Ross, Dhanaraj Thakur, Dheeraj Ravindranath, Diego Cuesy, Duncan Edwards, Edafe Onerhime, Edward Saperia, Eliza Niewiadomska, Fabrizio Scrollini, Felipe Amaya Salazar, Feng Gao, Gabe Sawhney, Gabriel Mercado, Gabriela Rodriguez, Gaurav Godhwani, Georges Labreche, German Stalker, Gerry Tychon, Gwen Phillips, Hari Subhash, Hossein Maleknejad, Jason Lally, Jason M. Hare, Jay Daley, Jeff Geipel, Joel Natividad, Jonathan van Geuns, Jorge Florez, Jorge Umaña, Jose M. Alonso, Joshua Powell, Juan Ortiz Freuler, Juan Pablo Marin Diaz, Kate Vang, Katie Clancy, Krystina Shveda, Krzysztof Madejski, Kyle Copas, Laura Meggiolaro, Lisa Walmsley, Liz Dodds, Liz McGrath, Maciej Możejewski, Madeleine Ngeunga, Maggie Walter, Manuel Acevedo, Marnie Webb, Martin Bader, Martin Noblecourt, Matthew McNaughton, Michael Schnuerle, Mike Davies, Mikhail Parfentiev, Miles Litvinoff, Momi Peralta Ramos, Mor Rubinstein, Nadiia Babynsky Virna, Nancy Salem, Natalia Mazotte, Nikesh Balami, Nikhil VJ, Nino Macharashvili, Noémie Girard, Nora Lester Murad, Owen Boswarva, Pablo Cruz Casas, Paloma Baytelman, Paola Mosso, Paul Bradshaw, Paul Hindriks, Paul Stone, Paulina Bustos, Pedro Manrique, Philip Horgan, Pınar Dağ, Rachel Murray, Ruba Ishak, Scott McQuarrie, Selene Yang, Sidi Zakari Ibrahim, Stefaan Verhulst, Steven Adler, Sym Roe, Tara Susman-Peña, Thomas Lassourd, Tina Appiah, Tyler Kleykamp, Valentina Delgado, Virginia Brussa, Walter Palmetshofer, Will Skora, Yacine Khelladi, Yanina Bellini Saibene, Yohanna Loucheur, and Zukiswa Kota.
The editors also wish to thank and acknowledge Jean-Noé Landry and the team at OpenNorth for their invaluable support of the project’s administrative processes.
Special thanks are due to the entire African Minds production team: Simon Chislett, Leith Davis, and Tessa Botha, as well as African Minds Director, François van Schalkwyk, for his partnership in the publishing process. Finally, the project also owes a great debt to Nola Haddadian, the Publisher at IDRC, for her tireless support and patience throughout the publishing and editorial review process without which the project would not have been realised.
Tim Davies is an activist, researcher, and social entrepreneur, who has been working on themes related to open data since 2009. He was Research Lead for the first two years of the IDRC/World Wide Web Foundation’s “Exploring the Emerging Impacts of Open Data in Developing Countries” research network and coordinated the first two editions of the global Open Data Barometer. He co-founded Open Data Services Co-op in 2015 to support ongoing development of open data infrastructures, including the Open Contracting Data Standard (OCDS) and data standards for corporate transparency. He was series editor for the Open Data Charter Open-Up Guides on anti-corruption and agriculture. A social researcher by training, Tim has been a fellow of the Berkman Centre for Internet and Society and has studied at the Oxford Internet Institute and University of Southampton Web Science Centre. He blogs at http://www.timdavies.org.uk and tweets at https://www.twitter.com/timdavies.
Stephen B. Walker is the former Director General responsible for leading open government and open data for the Government of Canada, where he developed and implemented national policies, programmes, and infrastructure to advance open data. At the international level, Steve was directly involved in the development of the G8 Open Data Charter as well as the Open Data Charter. He also chaired the Open Government Partnership’s Working Group on Open Data. More recently, Steve has worked with the Open Data for Development (OD4D) network and managed the International Open Data Conference. Steve also runs his own consulting company, True North Consulting, specialising in advancing open data and transparency policies and practices. Steve tweets infrequently at https://www.twitter.com/sbwalker61.
Mor Rubinstein is an open data practitioner with more than ten years of experience. She was a Community Coordinator and the Lead Researcher for Open Knowledge International’s Global Open Data Index. She is currently the Labs Manager for 360Giving, a UK initiative for opening up philanthropic grants data for better grant-making. She is also the co-founder and coordinator of the Open Heroines community, a global community for women in open data, open government, and civic tech. She holds a Master of Science in Social Science of the Internet from the Oxford Internet Institute. You can follow her on Twitter at https://www.twitter.com/morchickit.
Fernando Perini is a Senior Programme Specialist at Canada’s International Development Research Centre (IDRC), where he coordinates the Open Data for Development (OD4D) programme. OD4D is a global partnership that supports southern leadership and locally led data ecosystems around the world as a way to spur positive social change and sustainable development. You can follow Fernando on Twitter at https://www.twitter.com/fperini.
It was a long day in early December 2008. Thirteen hours alone on a Sunday in a windowless room of the presidential transition HQ on 6th Street in DC. The transition team that had started as a dozen people the previous summer had ballooned after the election to almost 700 people who were now responsible for planning the first hundred days of the Obama administration. It was a microcosm of the government, designing initiatives to launch the new presidency with a socially impactful and politically practical bang.
Coming on the heels of the Bush Administration and plummeting rates of trust in government, it was imperative that we govern differently, not behind closed doors, but in the open. Although the iPhone had only just been invented and social media platforms, Facebook and Twitter, were still comparatively new, it was clear that the internet, especially new data science tools and methods, might make it possible to strive for more evidence-based policy-making and better solutions to public problems.
At that juncture, I was chairing the Technology, Innovation and Government Reform (TIGR) working group, a small band of people passionate about the potential for using new technology to modernise and improve the workings of government. Our policy initiatives were designed to cut across the usual topics of economy, education, foreign policy, and health to promote a different way of working. We wanted to be “one bullet point of every five” and help each of the subject-matter teams to use technology, data, and innovation to accelerate the implementation of their goals.
We had a motley array of cross-cutting suggestions to put forward to the President-elect. They included new websites, such as USASpending.gov that would lay bare the money we were spending on the bailout after the financial crisis, and new hires, including the creation of a new Chief Technology Officer position, an expanded Chief Information Officer role, and a technology “SWAT” team that would go into each agency and assess the state of its infrastructure, as well as a new open government policy. As is now well known, that policy had three inextricably intertwined prongs: transparency, participation, and collaboration.
Inspired by the way the publication of weather data had spawned a billion-dollar forecasting industry or the sharing of government-collected genomic data had birthed the biotech revolution, we were convinced that opening up the information that government collects would accelerate solutions to public problems if designed to go beyond mere transparency to create incentives for a wide range of actors across government, academia, and industry to use information for public good.
Just as open source software development, in which code is created with a larger group of people, often outside the confines of one organisation, accelerates the process of both writing and testing software, opening up government data could make it possible for those outside of government to scrutinise and use government information more productively than government acting on its own.
Now ten years into the open data revolution, it is almost hard to remember how radical an idea open data – or transparency plus participation and collaboration – was at the time.
First, it upended 50 years of thinking about the right-to-know strategies embodied in Freedom of Information (FOI) legislation. Open data complicated our reliance on FOI as the bedrock of transparency policy by shifting the underlying theoretical understanding of the relationship between the state and the public from the adversarial to the collaborative.
FOI is an inherently confrontational tactic focused on prying secrets out of government. Open data is not. It depends upon the institution that collects the data wanting to publish it in order to attract knowledgeable and passionate members of the public who want to use it. Because governments in an open data regime must proactively publish their data with the intent that people will use it, the normative essence of open data is participation rather than litigation. The role of the public has always been to scrutinise and criticise. The idea that the public and government can work together to augment the manpower and skills in under-resourced public institutions continues to demand a major shift of mindset.
Second, many transparency and good government activists were actively hostile toward the new policy because it did not focus solely on publishing information about the workings of government, such as budget data, that is designed to produce greater government accountability. By catalysing public engagement to promote both the scrutiny of data by the public and collaboration with the public in building new analytical tools and websites, open data galvanised collaboration between institutions and the public to create value of different kinds, especially to advance solutions to hard problems.
Opening up the corpus of patent data – one of our earliest projects – while laudable, struck many as a distraction from the all-important goal of enhancing government accountability. The fact that such data could unlock our understanding of the innovation economy was not yet well understood. Similarly, the idea that open data could be a key asset in developing tools to help passengers know which flights were likely to be delayed, help patients choose between hospitals, or help parents make more informed decisions about colleges, ran contrary to what open government meant for many people.
It took many years of experience with open data to temper the discontent and persuade the naysayers. Creating apps for the Health Datapalooza by using newly published datasets from Health and Human Services began to change minds. Witnessing first-hand the reforms to the criminal justice system in the United States made possible by opening up police data was a sign that the movement was maturing. Thousands of lives saved by CPR-trained bystanders responding to texts specifying the locations of people experiencing cardiac arrest, generated by a real-time open data feed of emergency 911 calls, drove home the point that open data is a vital new tool for advancing social justice. The countless examples from around the world sprinkled throughout this volume, and the over 70 countries making commitments to publishing open data as part of their participation in the Open Government Partnership, have created widespread awareness of the power of open data as a new tool in the toolkit for public problem solving.
The explosion of newly available data coupled with mounting evidence (as this book so thoroughly demonstrates) that data catalyses productive, problem-solving partnerships between government and the governed suggests that the use of open data as a tool of governing will continue to grow. If the trend continues, open data will lead to new empirically informed ways to hold government and others accountable, spurring consumer choice and expanding the range of approaches to tackling human rights and development challenges.
Yet a week does not go by when I do not still have to debate with those in government about the value of opening data. Open data in many places is still under threat from the move toward more closed governments and closed societies. Even in more enlightened regimes, however, many still argue that it is better to sell than give away the data that was paid for by, and belongs to, taxpayers. I still plead with those who doubt whether people will use the open data we invest in publishing in machine-readable formats rather than PDFs.
These doubts stem, in part, from the lack of data-analytical skills among public servants. We know more in 2019 than we did a decade ago about how to use data for good. But even when governments know to open and publish their data, they still often lack the ability to use the data themselves. This may slowly change as agencies like Digital Canada, the Argentinian government lab (LabGobAr), and the multi-university Coleridge Initiative in the US train people in government in how to use data to solve problems.
To be sure, there have been times when the potential of open data has been over-hyped, especially by those who naively assumed that data publication, in and of itself, would solve problems. That assumption neglects the original idea that investing in participation and collaboration is vital for getting multi-disciplinary teams of people inside and outside of government to scrutinise, visualise, and use the data to create value.
But, fundamentally, the challenge for open data – and open government more broadly – is the shift in mindset it demands to embrace the original values and learn the practices of transparency, participation, and collaboration.
Open government shifts the focus of transparency from monitoring government after the fact to mechanisms that encourage the public to participate actively in improving societal outcomes. Open data fosters more active citizenship and more collaborative democratic institutions that draw directly on the collective expertise of the population to solve public problems. Ultimately, open data gives us a vision for a new kind of government to strive for – not bigger or smaller – but one that ensures collaboration makes our public institutions more effective and legitimate and our democracy stronger. By taking stock of the current state of open data, this book acts as a key resource and charts a course for future action to keep open data on track as a transformative tool of more open, collaborative, innovative, and participatory governance.
Beth Simone Noveck
Professor, New York University and Director, The Governance Lab
New York City, 2019
A decade ago, open data was more or less just an idea, emerging as a rough point of consensus for action among pro-democracy practitioners, internet entrepreneurs, open source advocates, civic technology developers, and open knowledge campaigners. Calls for “open data now” offered a powerful critique of the way in which governments and other institutions were hoarding valuable data paid for by taxpayers – data that if made accessible, could be reused in a myriad of different ways to bring social and economic benefits and democratic change.
Ten years on, open data is much more than just an idea. First, it was a movement, and then a label applied to vast quantities of data from genomics and geospatial data to land registers, contracting, and parliamentary voting. Today, it’s a term found on government portals, in global policy documents, and in job descriptions. Thousands of businesses around the world owe their existence or their growth to the release of open government data, and hundreds of civil society organisations have embraced open data as a key element of their social change toolkit.
For a while, it may have been possible to identify a cohesive open data movement united by shared interests, working simply to gain access to more data and establishing the principle that government data should be open. However, as the movement has evolved, stakeholders have turned their focus to linking data use to specific needs and to questions of how to quantify the return on investment in advancing open data. Within this fast-growing and organic open data movement, an ever-increasing number of networks and communities of practice have become more diverse, fluid, and cross-sectoral.
So what is the open data movement today? What has it achieved over the last decade? Answering these questions is at the core of this publication. It is a collective effort to explore what we can learn from the past, to identify how to build on the investments made to date, and to look at how open data policy and practice have started to address challenges such as mainstreaming and sectorisation.
Exploring these questions is not just important for historical purposes. It can yield important insights on how best to move forward. This publication is also an invitation to identify the issues that may sustain this broad coalition into the future. We believe that a deep reflection about the movement, even a reflection on whatever cracks have appeared or on the gaps between promise and reality, provides a vital opportunity to discuss where realignment and rethinking are needed.
This collection of essays is the product of an 18-month journey that has brought together almost 70 authors, supported by over 200 other contributors, to produce 37 short chapters on the current state of open data from a range of different perspectives, offering the most comprehensive attempt to explore the breadth and depth of the open data field to date.
Ten years may seem like a short period of time, but, when technology is involved, it constitutes a generational age. Institutional memories are curiously short, and in the cultural context of open data where amateurs are often welcome and professional barriers to entry are low, it is easy for work to proceed with little awareness of the past. This last decade has seen many succeeding phases of activity, so we have encouraged our authors to take a comparatively long view (when set against other contemporary writing on open data) to document the past in order to lay stronger foundations for future research and action.
We have also sought to understand open data as a global movement. Although some accounts have a tendency to focus on the North American or European roots of open data, tracing histories back to the launch of data.gov under Barack Obama’s presidency, open data practice has been shaped by interventions from across the globe. To gain a vantage point on open data as a global movement, this collection draws upon the editors’ engagement with the Open Data for Development (OD4D) network1 which has been closely engaged in regional networks in the Global South and involved in a range of global initiatives, including the Open Data Barometer (ODB), the Open Data Charter, the Open Government Partnership (OGP) Open Data Working Group, the Impact Map, and the Open Data Leaders Network.
Since 2015, OD4D has also been the permanent co-host of the International Open Data Conference (IODC), and the editors of this volume have been involved in preparing conference reports, including shared roadmaps for action, for the third, fourth, and fifth IODC meetings. We have seen how, over its five editions, IODC has shifted from a focus on open data, in and of itself, toward discussions that are thematic, sectoral, regional, and issue oriented, fostering critical debates on open data. The conference tracks and sessions at IODC have ultimately provided many of the chapter titles in this book, reflecting the many subcommunities of the open data field that have emerged. The debates at IODC over the last nine years also provide a useful proxy for debates across the wider field of open data, so a survey of the IODC conferences offers us one route to explore, in broad strokes, a history of how the focus of the open data movement has evolved.
The first IODC was hosted by the United States Department of Commerce and took place in November 2010 in Washington, DC.2 At the same time, in London, a civil society-led conference, the Open Government Data Camp, was taking place.3 These parallel events captured the growing excitement about open data from both governments and civil society and marked the end of a year in which open data had moved from idea to initiative and from inception to the earliest stages of institutionalisation. Over time, the boundaries between government and civil society networks have become more fluid with both positive and negative effects. The focus of these early events was on showcasing the platforms that had been built and discussing the potential for open data across sectors. However, even at this early stage, questions were being asked about how the impact of open data might be tracked, and whether bold claims being made on the transformative potential of open data could actually be realised.
By the time of the second IODC, hosted by the World Bank in July 2012,4 the question of how to measure emerging impact was firmly on the agenda. At this point, open data was being discussed in the context of international development and the movement had broadened to include a number of open data leaders from developing countries. Yet, while many of the projects profiled were still platform-focused, it was becoming clear that simply releasing data was not enough and that the quality of data available was far from perfect. Early discussions turned to whether the potential returns of open data had been overstated and how to deal with the growing gap between rhetoric and reality. That early sense of an impact gap still pervades many of the chapters in this collection with several authors exploring the various reasons that could explain less than promised progress on transformative use. However, we note that the perception of an impact gap is rarely reflected by a similar level of difficulty in sourcing case studies of open data use, raising questions about the perception and the reality of progress on open data, as well as the influence of early conceptual models for open data impact on current critical practice.
By the time of the third IODC in Ottawa in May 2015, the focus had moved to an examination of how open data ideas and practices were developing in different sectors and regions.5 The conference captured a period of dramatic regional and sectoral growth of open data activity with increasingly diverse representation from across the globe. There was growing recognition that opening data alone was not enough to create impact. Instead, as many of the chapters in this collection explore, to secure outcomes from open data, clear goals need to be established and a series of strategic interventions identified. Policy design, intermediaries, and capacity building were all on the agenda. As more stories of open data in use to solve specific problems were shared, there was a growing recognition that impacts secured in one context or sector may not automatically translate to another. And with this recognition came an understanding that, rather than a single open data movement, there may be many overlapping, interwoven movements, drawing on particular elements of open data to address many different agendas.
The third IODC also made explicit the potential links between open data and sustainable development, highlighting that open data was no longer the only data game in town. Instead, in the context of international development, open data now had to find its place alongside renewed efforts to build the capacity of long-established statistical agencies, as well as newer initiatives seeking to tap into the potential of big data from proprietary private sector providers.
The fourth IODC, held in Madrid in October 2016, was framed in terms of “Global Goals, Local Impact”, reflecting increased consolidation of global advocacy and a continued focus on shared global principles, which was evolving in parallel with the growth of subnational and thematic initiatives.6 Although the open data agenda had matured and become well-established as part of global policy-making, discussions explored concerns that it risked becoming a niche issue, destined to be the focus of only a small group of the “usual suspects”. Issues of privacy, gender equity, diversity, inclusion, and Indigenous data rights all competed for space on the agenda, along with a new space for more critical discussion of how open data impact might be realised and the potential for more nuanced approaches to open data practice.
These critical threads continued into the fifth IODC that was held in Buenos Aires in September 2018.7 New on the agenda were discussions related to artificial intelligence (AI), and the conference saw a stronger focus on data standards and open data infrastructure. Although these latter issues have long been discussed by a small but dedicated element of the open data community, there was increased recognition that they are not just technical issues. They also involve questions of data governance, with political choices embedded in the use of data standards and structures that have substantial consequences for who can use and benefit from data.
In 2018, for the first time, the IODC agenda also featured a session on “Open Data Under Threat”, capturing a sense that continued progress was by no means assured. Against the backdrop of a deepening crisis of diminishing government support for openness around the world and much more public debate around the positive and negative potential of technology, concerns voiced over open data were no longer solely about a perceived impact gap. They also involved a deeper questioning of when and where openness can be safely practised and whether open data should be a priority for donors, advocates, and activists in the future.
A look at the 2018 IODC agenda also illustrates sectoral and regional sessions going deeper into the specific concerns of their fields and localities. In this, we find a reflection of the increasing diffusion of open data ideas, which is both a marker of success and a potential risk to any future coherence of open data activity. In putting together this collection, while drawing on the OD4D network and IODC as a starting point, we have been conscious of the need to move beyond them to capture wider activity on open data and to explore how an early open data movement has now become many overlapping movements. By working with a diverse community of authors, encouraging them to draw on both published literature and their own domain-based networks, as well as on wider online outreach to the community, we have looked to capture insights into the open data world from far beyond the core IODC community.
Culture and temperament inevitably shape any qualitative review of progress. As with any invested community, a substantial number of people and organisations engaged with open data have a tendency toward critique. For many, the idea that data should be open was ultimately born out of a critical opposition to the way governments were handling data and an ambitious imagining of an alternative future in which access and capacity to gain benefit from data is more evenly distributed. Coupled with the differences in pace between rapid technological change and comparatively glacial governmental reform, this critical approach combined with well-meaning ambition can lead to the progress of the last decade being underplayed. Challenges on the horizon ahead can too often serve to mask the steps that have been taken in order for those challenges to become visible.
In looking across the chapters that follow, we are struck by the extent to which open data ideas have become established across the globe. For instance, in Chapter 28 (Multilateral organisations), Hammer describes how, from 2010 onward, global development banks have integrated open data into their own methodologies, helping to popularise open data initiatives in developing and developed countries. In Chapter 29 (Private sector), Gurin, Bonina, and Verhulst illustrate the private sector’s widespread use of open data with examples from Asia, Africa, Latin America, Europe, and America. And since the Sustainable Development Goals were adopted in 2015, robust, comparable, and open data has been emphasised as a critical tool to both inform and monitor development efforts. Across the entire section on Open Data Sectors and Communities, examples of open data being used to drive socioeconomic benefits or to shape policy debates are too numerous to mention here.
The adoption of open data as a central tool in a number of major global policy initiatives of the last decade is particularly notable. The OGP, the International Aid Transparency Initiative, the Extractive Industries Transparency Initiative (see Chapter 8: Extractives), and the Global Legal Entity Identifier Foundation, which was created to respond to the last financial crisis (see Chapter 3: Corporate ownership), have all embraced open data within their work. Within the OGP in particular, commitments related to open data have been some of the most popular and successful.8 As Chapter 17 (Algorithms and artificial intelligence) explores, even as public attention shifts from open data toward a new wave of excitement about AI, open data ideas appear firmly established as a foundation for governmental AI policy.
So why is the current period for open data one of re-evaluation, rather than of celebrating progress? Put simply, the adoption of open data as part of the global development toolbox has opened it (rightly) to substantial scrutiny. How quickly are efforts to open up data leading to change? What is the return on investment from open data-related reforms? What are the factors that shape whether or not open data leads to impact? And finally, how does work on open data interact or integrate with other core issues of sustainable development, such as gender equity, Indigenous rights, and good governance? Questions such as these have received increasingly detailed attention over the last few years. Although hardly any of these questions have simple answers, by looking at both progress and challenges, this volume seeks to bring together evidence, examples, and analysis that can support efforts to address them more clearly than before.
For all the steps forward described above, as we look to the horizons of open data, we are confident in stating that policy excitement about open data has peaked. Ten years in, we are past the peak of a hype cycle and past the point where promise has to give way to evidence of practical impact. As a result, many open data communities are fast approaching their difficult teenage years with a deepening identity crisis.
Over the last decade, debates around the role of data in society have moved to centre stage, but arguments for openness now have to share the spotlight with newer excitement over the economic potential of big data, machine learning, and growing fears about the negative impacts of data stemming from data-driven manipulation of politics or the corporate invasion of personal privacy. Although early narratives around open data may have been able to present increased access to data as an unalloyed public good, contemporary advocacy must confront a much more complex landscape in which power, politics, and the question of who gains or loses from unfolding regimes of data access cannot be ignored.
This presents a number of key challenges with which the following chapters attempt to grapple. As open data has spread globally, the way in which open data ideas have manifested across different sectors, communities, countries, and stakeholder groups has increasingly varied. Regional distinctions of emphasis have developed, with, for example, some downplaying the importance of open licences (see Chapter 37: Sub-Saharan Africa) and others talking of innovation rather than of openness in order to avoid political resistance (see Chapter 34: Middle East and North Africa). As sectoral efforts deepen, it is domain or subject matter experts, rather than data specialists, who drive activity forward, so that the challenges of creating cross-sectoral linkages and building shared data infrastructure become even greater. Increased emphasis on inclusion places a substantial demand on problem-centred initiatives, which, in light of low levels of data literacy, must choose whether to focus on data for expert communities or to actively pursue the promise of open data as a tool of wider popular empowerment. When the focus shifts from calling for access to data to creating data infrastructure and putting data to work, the divergent goals of those who formed an initial open data movement come clearly into view and managing the tensions that emerge can be complex.
It was in mid-2017, as these tensions were becoming more apparent, amid a sense that overall momentum for open data may be faltering, that the State of Open Data project was conceived. Our objective:
To critically review the current state of the open data movement, assessing its progress and effectiveness in addressing challenges related to social and economic development and democratisation around the world.
Based on such a broad stock-taking of open data activity, we may not be able to fully resolve questions about the future of open data, but we can provide an account that helps practitioners, policy-makers, and community advocates to step back from their own position to gain a view of the wider landscape. By doing this, we hope to offer a rich and timely perspective and the groundwork for constructive debates that will shape the next decade of open data.
The open data field already benefits from a number of semi-regular quantitative studies of the progress of open data, such as the ODB9 and Open Data Index,10 both also supported by OD4D. To complement these, the approach to the State of Open Data project was designed, from the outset, to be more qualitative and narrative in style, involving a five-stage process.
1. Selection. Working with the OD4D network, potential chapters were identified based on open data communities, regions, stakeholder groups, and cross-cutting issues. Authors were then invited to lead on creating these chapters. The introduction to each section of this book provides details on the selection of chapter topics.
2. Engagement. Authors were asked to create an initial “environment scan”: a community brainstorming of issues, evidence, key actors, and events related to their topic. Scans were posted online for public comment and additions to gather more examples, case studies, articles, and input from beyond the authors’ own networks.
3. Writing and review. Responding to a common set of questions and prompts, authors then completed full chapter drafts, drawing on the input received from the environment scans. These draft chapters were sent for peer review by independent reviewers and by members of our editorial board. Reviews were sent to authors, who completed chapter revisions based on the input received.
4. Public drafts and discussion. Public drafts for the majority of the chapters were posted online ahead of the IODC in Buenos Aires in September 2018, where emerging themes were discussed. Panel discussions on themes from the work were also held at the OGP Summit in Tbilisi, Georgia, followed by additional opportunities for revision.
5. Synthesis and recommendations. Based on a collective review of all chapters, the editors have worked to draw out key findings and recommendations, which are summarised in section introductions and the book’s conclusion, including recommendations for research, funding, policy-making, and practitioner communities.
The authors and contributors to this project have been drawn from a wide range of backgrounds. Some have been active in the open data field for many years, while others are relative newcomers. Some are advocates and activists, while others are observers or academics. Some are open data generalists, while others specialise in a particular field. Many draw upon a range of different roles and positions.
When considering all of the authors, contributors to the environment scans, independent reviewers, and the editorial board, input has been received from over 220 individuals from around the world. Representing the diversity of the open data community with regard to gender, diversity, and global inclusivity has been the key principle underlying our approach to this volume. The goal was to achieve a 50–50 gender split in terms of authorship, although we fell short of this with a 58–42 split in favour of men.
Definitions and scope: Open government data
Our focus in this volume is primarily, but not exclusively, on open government data. That is, data which traditionally originates from governments, is created or used during the business of governing, or is created or published at the request of governments. We have intentionally adopted a broad definition here, cognisant that, over recent years, the traditional monopoly of national-level governments both in data collection and in being a primary site of governance has been eroded. For example, satellite imagery data from private companies or crowdsourced data from citizen scientists can all fall within the broad landscape of open data either traditionally collected by governments or used for governing. Similarly, data that results from academic research networks, but which informs public decision-making and action, forms a component of some chapters within this volume. However, reflecting the way that communities of practice around open data are generally organised, we have mostly stayed away from looking at open data in terms of open science or evaluating the extent to which different scientific disciplines and communities are approaching data sharing, access, and openness. This is well addressed in other work.11
When it comes to defining open data, we draw upon the widely used definition of open data as data that is accessible, machine-readable, and free of licensing restrictions on reuse. However, we apply the definition heuristically rather than legalistically. This recognises, for example, that in some countries and contexts, the lack of a fully “open licence” is less of a barrier to reuse in practice than in others, or that, at times, data may not be provided in machine-readable formats at source but has been easily converted for reuse by intermediaries. Rather than rule out such cases from exploration on a technicality, they are included in the scope of this study with their limitations noted where relevant.
One of the notable features of open data is the way in which it has been adopted and shaped by so many different stakeholders. Unlike “big data”, for example, which appears to be primarily a corporate concept marketed to governments and civil society, networks around “open data” have always been much more diverse, fluid, and cross-sectoral. More than anything, this breadth and fluidity lies at the root of the impending identity crisis of the open data movement. For a long time, it may have been possible to manage the tension between different interests via a short-term focus on simply gaining access to more data. However, when stakeholders turn their focus to data use and the need to quantify the return on their investment of time and resources, a broader open data coalition is much harder to sustain. Determining what the open data movement can (and should) yield moving forward, how to maximise every investment made, and how to take on the challenges of mainstreaming and sectoralisation simultaneously, is at the core of the movement’s identity crisis. The cracks that may appear need not lead to crisis. Rather, they should serve to highlight in relief where realignment and rethinking are needed for the future.
In editing this collection, we have sought to work with all of the authors to address the needs of four main groups: researchers, funders, policy-makers, and practitioners.
For researchers, each chapter draws upon available academic and grey literature, providing detailed citations and suggesting further reading. The hope is that researchers will use these chapters as a primer on open data within particular contexts to identify critical research gaps in need of further attention. In particular, the inclusion of further reading is designed to assist the use of these chapters in a teaching context.
For funders, we have sought to highlight key organisations and stakeholders in each sector and region and to point out instructive examples of what is being done with open data, noting, where appropriate, gaps in the available resources needed to develop new ideas or to scale what works in more locations for larger impact. A dedicated chapter on donors and investors (see Chapter 25) also considers the need for greater coordination of funding, and, as with most chapters, points to current areas of underinvestment, particularly around the infrastructure needed for sustainability and high-quality data delivery, as well as capacity building, to create a widespread culture of data use.
For policy-makers, we have encouraged authors to address both progress and challenges in the implementation of open data. In many cases, you will find more on the persistent challenges, reflecting not so much a lack of progress but rather the shared critical and progressive mindset of our authors who seek ambitious social change through the application of open data. We have sought, however, to keep chapters focused on a relatively small number of issues, prioritising those that most deserve policy attention at present.
For practitioners interested in detail on open data projects, whether focused on data publication or use, we have sought to provide both critical reflection and inspiration. The hope is that by reviewing chapters related to a specific sector from multiple perspectives, practitioners will discover new ways of framing old problems and practical ideas about how to move forward in using open data as a tool of entrepreneurial development or social progress.
Crucially though, we do not know how many of the readers of these essays will, in the future, associate themselves with the label of “open data practitioner” or “researcher”, or whether they will simply perceive their role as someone who engages with open data as one tool among many. This is perhaps core to the identity crisis the movement may be currently experiencing and to the corresponding adjustments that open data communities will need to make in the second decade of open data. Is there still a need for a sustained movement that identifies the technical and licensing regime around open data as its core objective? What ethical and normative approaches need to be integrated into any future engagement with open data? Is it a good thing for the debate to move on from openness to adopt other narratives related to “good data”,12 “data justice”,13 or “data rights”14? We will return to these questions after our review of the state of open data offered in the following chapters, when we will be better placed to discuss what stands to be gained or lost in the years ahead.
2 https://web.archive.org/web/20101128112407/https://www.data.gov/conference/
3 https://web.archive.org/web/20101218004117/http://opengovernmentdata.org/camp2010/
4 https://web.archive.org/web/20120821060331/http://www.data.gov/communities/conference
5 https://web.archive.org/web/20150716100733/http://opendatacon.org/
6 https://web.archive.org/web/20161104210854/http://opendatacon.org/
7 https://web.archive.org/web/20190127062831/https://www.opendatacon.org/
8 Khan, S. & Foti, J. (2015). Aligning supply and demand for better governance: Open data in the Open Government Partnership. Open Government Partnership, 1 January. https://www.opengovpartnership.org/resources/aligning-supply-and-demand-better-governance-open-data-open-government-partnership
9 https://opendatabarometer.org/
11 Digital Science. (2018). The state of open data report 2018. London: Digital Science and FigShare. https://digitalscience.figshare.com/articles/The_State_of_Open_Data_Report_2018/7195058
12 Daly, A., Mann, M., & Devitt, S.K. (2019). Good data. Amsterdam: Institute of Network Cultures.
13 Taylor, L. (2017). What is data justice? The case for connecting digital rights and freedoms globally. Big Data & Society, 4(2). https://doi.org/10.1177/2053951717736335
14 Tisne, M. (2018). It’s time for a bill of data rights. MIT Technology Review, 14 December. https://www.technologyreview.com/s/612588/its-time-for-a-bill-of-data-rights/
The chapters in this section explore sixteen different sectors and communities where open data has been applied.
The earliest advocates turned to open data because they faced particular problems. They were not seeking data in general, but rather specific datasets to help them solve those problems. In the years that have followed, a broad movement on open data has secured access to data on thousands of different topics. How useful this data has been in solving problems or meeting social challenges is dependent on both the data and on the particular problems and challenges that were targeted. Open data is not a one-size-fits-all solution, but instead plays out in different ways in different settings. As the chapters in this section will illustrate, to understand the state of open data, we need to look at open data in context, exploring the particular sectors where it has evolved and the communities that have developed around it.
There are very few sectors where open data might not have a role. However, to provide a broad overview of open data developments, the focus chapters in this section were selected based on an analysis of the agenda and discussions at recent editions of the International Open Data Conference (see Introduction), as well as themes identified in the 2015 Sustainable Development Goals (SDGs)1 and the categories of high-value data identified in the G8 Open Data Charter2 and global measurement tools (see Chapter 22). We have sought to select sectors at varying stages of progress, ranging from government finances (Chapter 10) where budget and subsidy datasets have had a pivotal role in shaping early work on open data through to telecommunications (Chapter 14), a sector largely overlooked to date as an area of focus for open data initiatives. Our coverage is by no means comprehensive, and, inevitably, there are different choices that could have been made on the scope of each sector. Water and air quality, for example, could arguably have been addressed as sectors in their own right, although, in this volume, they find their place as sub-themes within the essay on the environment (Chapter 7).
The key advantage of a sectoral approach in a review of open data is that it requires us to take a step back and to understand open data in context. Understanding and intervening in the struggles around land ownership data (Chapter 12), for example, requires an appreciation of the different systems related to land ownership and a recognition of the role that records and data play in securing land rights. Progress on opening up corporate ownership data (Chapter 3) can also be better understood in the context of the global financial crisis and the search for policy responses at that point in time when “shovel-ready” open data approaches were available to draw on. Sectoral engagement with open data is far from inevitable but instead relies on the right combination of advocacy, infrastructure, and backing at key opportunity points. These opportunities can evolve quickly from external events, as in the 2008 financial crisis, or from the alignment of different stakeholder interests over time, such as with agriculture (Chapter 2), where a case can be made for opening up new pre-competitive space and a sectoral shift from closed to open models of data production and use.
The histories and horizons of open data vary from sector to sector. We have worked with the authors of each chapter to identify key dates in the development of open data in their sectors. These timelines are published as part of the online companion to this book. Taking this long view helps us to understand the way in which open data ideas enter into an existing landscape of data systems, political attitudes, stakeholder relationships, and programmes of action. In the crime and justice sector, for example (Chapter 4), the history of open data might have started with interactive crime mapping in 2005, but new technological approaches have to contend with long-established and localised legacy ICT systems and the conservative ethos of many judicial institutions. The crime and justice chapter also draws important attention to the way open data work unfolds between different branches of government, encouraging us to consider government stakeholders beyond just the executive branch.
A sectoral approach also allows us to look beyond the “usual suspects” who self-identify with open data to locate other important stakeholders who have, to date, been on the periphery of the open data discourse. In the health chapter, for example (Chapter 11), the creators of an open source health management information system (HMIS) emerge as central players whose actions, in tandem with national-level policy activity, can contribute to improvements in the availability of aggregated open health data. Chapters on education (Chapter 6) and geospatial data (Chapter 9) also identify key stakeholder groups (the open education working group and open geospatial community, respectively) who have had relatively weak links to wider open data communities in spite of their relevant expertise and knowledge. A sectoral approach also reveals common influences across sectors. Eleven of the sixteen chapters in this section, for example, mention either the Open Data Charter3 or the Open Government Partnership4 as an influence on open data advances, and nine chapters draw on evidence from the Open Data Barometer5 to understand progress.
Finally, a sectoral lens can help us to assess open data maturity and explore how embedded open data has become across a sector. To comprehensively assess the state of open data in a particular sector might require looking at the proportion of data generated in that sector which is ultimately available as open data, or it might involve an audit of use cases, identifying how far open data approaches have been adopted in addressing key sectoral challenges. While the chapters that follow are indicative rather than exhaustive, they show very different states of open data adoption. For example, the chapter on development assistance and humanitarian action (Chapter 5) suggests that the idea of open by default has become reasonably embedded in the sector, allowing stakeholders to shift their focus to developing and embedding more mature data-use practices. However, the chapter authors also note the ongoing challenge of building a data, and open data, culture in the sector, particularly given complex relationships between international, national, and local stakeholders. In the extractives sector (Chapter 8), work on governance, looking at issues such as contracts, tax, and royalty payments, has progressively integrated open data over the last decade, resulting in increased data availability and use. Yet, at the same time, the wider sector has seen a vast growth in proprietary data collection by commercial firms using emerging technologies, meaning that while the absolute quantity of open data available may have grown, the relative proportion of open to closed data has likely declined. A similar issue appears to be at play in the transport sector (Chapter 15), where route-planning apps have been a poster-child of the open data movement, but where the authors report that only a fraction of the data used to drive these apps is actually provided as open data. Even when open data is available, it may only cover a limited portion of the transportation experience. 
If a small group of stakeholders have access to superior but restricted-access application programming interfaces, the ideal conditions for innovation in the development of solutions will not develop.
One factor evident throughout the chapters in this section (and indeed throughout this volume) is that while open data has a technical foundation, progress relies upon policy, people, and collaboration. Open data tends to enter the discourse of a sector through the actions of one or more small groups that are able to enrol a wider group around them to develop and explore the application of open data. These are the open data communities that this section also attempts to bring into focus.
The original working title for this section of the book was “Open data communities” rather than “Open data sectors and communities”. Yet, it became clear that for most chapters, there was an open question as to the extent to which a coherent and recognisable community could be said to exist around the chapter subject. For most, the idea of community invokes a group with some degree of shared values, attitudes, and goals, and whose members have some degree of interaction. Although there are many successful “thematic” open data communities, in some sectors there are many different groups, each with distinct agendas, and with varying levels of interconnection, whilst in other sectors the sense of a distinct open data community is much more nascent.
By looking at the extent of community networking within, and across, sectors, we bring into focus a number of the drivers for community cohesion, including levels of collaboration, learning, and progress on securing impact from open data. For example, in the broad accountability and anti-corruption field (Chapter 1), we find strong connections have been made between distinct communities of investigative journalists, open contracting and procurement specialists, and individuals acting under a “follow the money” banner. While often meeting separately, these groups also benefit from a high degree of fluidity and the exchange of ideas through events, multilateral meetings, and field-building publications. By contrast, although the crime and justice chapter (Chapter 4) identifies many individual projects looking at open data, there is little evidence of a sustained global or regional community pushing open data forward in this sector, and instead the landscape is made up of ad-hoc initiatives by governments or other stakeholders without the evidence of substantial community development. Using a community lens can highlight how differing sectoral cultures, and different levels of investment in community coordination, impact on the degree to which action has been mobilised to address open data.
A community lens also brings to the fore questions about the people involved in steering and shaping open data activity within particular domains, inviting an exploration of whether communities are diverse or whether they are globally representative. Ultimately, all of the chapters serve to illustrate that community building requires intentional effort and sustained investments of time, resources, and energy. For example, substantial efforts have gone into outreach and to providing travel support to enable participants from lower-income countries to participate in open data events, such as the International Open Data Conference,6 the GODAN Summit focusing on agriculture,7 Open Contracting global events,8 or meetings of the International Aid Transparency Initiative’s Technical Advisory Group.9 We should also note that global community building often requires bridging language barriers, and the flow of learning and conversation between different linguistic open data communities is worthy of further investigation.
Lastly, a community lens can be used to examine the position of an open data community within a wider sector as a whole. Are open data specialists simply talking to each other or are they reaching out to shape wider sectoral work? The picture is varied, although, in almost all cases, there are opportunities to improve the integration of open data practitioners into existing sectoral communities of practice and to leverage open data to broaden those communities. A level of cultural adaptation is generally required as open data communities interface with existing communities of practice. For example, the national statistics chapter (Chapter 13) calls for improved connections between open data and national statistics offices (NSOs), recognising the need to focus on building mutual respect and understanding between statistics professionals and open data communities. The urban development chapter (Chapter 16) also illustrates the challenges of inserting an open data community into the mainstream of the sector, where, although open data has become a central topic in community discussions of resilient cities, within the commercial-led smart-cities marketplace, open data is treated as a minor tool rather than a transformative agenda.
The chapters in this section identify hundreds of different organisations engaging with the open data agenda and many different projects opening data and putting it to use. However, they also reveal that increasing open data adoption and impact across a sector is by no means inevitable. The process of making data open and ensuring that datasets can serve a much wider range of use cases than those for which they were originally created has resulted in a myriad of issues around data quality and interoperability that are only now starting to be addressed. Many chapters also point to major bottlenecks caused by endemic capacity gaps around data analysis and use, as well as the limited deployment of strategic actions to connect data analysis with policy change. In many sectors, the full potential of open data is being missed, in part, due to a shortage of sustained specialist work on technical and policy challenges and difficulty in finding non-profit or for-profit models that can bring the extended focus needed to move beyond pilots into long-term projects and programmes.
What is clear, however, is that although, in 2009, open data was promoted as a general reform, today, it is primarily seen as an asset to be used in meeting specific goals (including the SDGs). This raises many new questions for the open data movement as a whole, including whether it can be said that there is even a single overarching open data movement or whether we have many divergent sectoral movements and communities. How can open data be used to go deeper into sectoral problem solving while still maintaining cross-cutting learning and connections between communities? The chapters that follow are intended to address these questions and more.
1 https://sustainabledevelopment.un.org
2 Cabinet Office. (2013). G8 Open Data Charter and Technical Annex. GOV.UK, 18 June. https://www.gov.uk/government/publications/open-data-charter/g8-open-data-charter-and-technical-annex
4 https://www.opengovpartnership.org
5 https://opendatabarometer.org/
7 https://www.godan.info/pages/godan-summit-2016
An established international field working on anti-corruption and accountability has existed only marginally longer than the open data movement itself. Open data for anti-corruption holds great potential, but efforts often face the common challenge that data availability does not automatically translate into effective data use.
Strategies employed by reformers to address corruption and accountability challenges include strengthening the capacity of different local stakeholders to work with open data and tailoring the implementation of technical solutions to the institutional and political dynamics of particular contexts.
Research indicates that the relationship between transparency and accountability is not necessarily causal or linear. Anti-corruption practitioners continue to debate how to best address the challenges at the heart of corruption problems.
Future efforts need to focus on strengthening the connections between open data and anti-corruption practitioners, and ensuring the sharing of evidence and lessons learned.
The expectation that open data might serve as a strategic tool for reformers around the world to improve anti-corruption and accountability results has been a key driver behind the push by open data advocates for more and better open government data. The underlying theory appears straightforward: open data “can reinforce anti-corruption efforts by strengthening transparency, increasing trust in governments, and improving public sector integrity and accountability by reinforcing the rule of law through dynamic citizen participation, engagement, and multi-stakeholder collaboration”.1
Excitement over the promise of open data has been shared by large and small organisations alike. The G7 and the G20 have recognised its value, and multilaterals, such as the World Bank and the Inter-American Development Bank, have invested heavily in programmes to support open data. Bilateral aid agencies, including the Department for International Development (DFID) in the United Kingdom (UK) and the United States Agency for International Development (USAID), and philanthropic foundations, such as members of the Transparency and Accountability Initiative,2 have also supported open data work. Additionally, multi-stakeholder initiatives like the Open Government Partnership, the Open Contracting Partnership (OCP), and the International Aid Transparency Initiative (IATI), among others, have facilitated and promoted efforts by government agencies, civil society, and media organisations across the world.
Current evidence about the impact of this work is relatively scant. Some argue that open data efforts have proven successful in “improving government by tackling corruption and increasing transparency, and enhancing public services and resource allocation”, and in “empowering citizens [...] by enabling more informed decision making and new forms of social mobilisation”.3 Yet, at the same time, others have pointed out that open data has not been widely used in corruption investigations.4 Other research questions the linearity and simplicity of the assumption that data availability leads to results, arguing that “transparency, information or open data are not sufficient to generate accountability”.5 It is fair to conclude that challenges exist in measuring the impact of open data to improve accountability and anti-corruption results. This raises questions about whether, and how, the open data community can convince the general public that greater access to open data is key to achieving results.
One reason why the evidence is patchy is that the relevant literature lacks common definitions of accountability and anti-corruption.6 Definitions are often overly broad, defining accountability as the combination of answerability (the obligation to inform and justify public decisions) and enforceability (the ability to sanction or remedy contravening behaviour).7 Corruption, in turn, is often used as an umbrella term to group behaviours related to the abuse of entrusted power, ranging from bribery and embezzlement to clientelism.8 Both accountability and anti-corruption are about preventing, detecting, and disrupting abuses of power. Open data is a very powerful tool to reduce information asymmetries that lead to a power imbalance; however, more open information is not enough to actually negate the institutional and political dynamics that allow those in power to abuse it and remain unpunished.
Open data activists often assume that the solutions needed to strengthen accountability and to reduce corruption are already known by specialists, and that open data will increase the effectiveness of those working to implement such solutions. However, international development work focused on anti-corruption and accountability has been around only marginally longer than work on open data,9,10 and the communities working on these issues have not yet reached consensus on several issues. Debates related to anti-corruption and accountability revolve around: concerns over how to prioritise and address corruption challenges in different contexts;11 exploration of how to design, monitor, and implement interventions;12 questions related to understanding and tracking changes in the political and technical dynamics that shape institutional reform and behavioural change;13 discussions regarding how to identify and assess impact;14 and ways to ensure that interventions actually empower marginalised groups and provide them with the means to improve their lives.15
Reflection on the overlap between the open data and the anti-corruption and accountability agendas offers important opportunities to methodically test underlying assumptions about the impact that power abuses have in practice and the role opening information can play in addressing these abuses. However, up to this point, such work has often been done by “pioneers” with little collaboration across agendas and with little attention given to the movement from simple data availability to the strategic use of data to address systemic or sectoral problems and achieve real impact.
This chapter will highlight the challenges, gaps, and progress made on key issues at the intersection between open data, accountability, and anti-corruption.
In the mid-2000s, reformers pushing for open data began to demand the publication of data by governments in reusable formats that could be accessed by the general public. This effort later evolved toward identifying and then closing gaps in the publication of datasets,16 with an additional focus on the implementation of data standards and data interoperability. Advocates have been successful in framing the open data agenda, advocating for standards, and convincing civil society, governments, and, to a lesser extent, the private sector to engage.
Open data initiatives have tended to focus on the release of data summarising existing government processes, while paying little attention to uses and users of the data, often treating open data as an end in itself. This has created momentum for the publication of datasets, but has also led to some governments focusing solely on transparency around selected issues without paying attention to opening up the underlying processes behind that data, which are used internally to support transactions and decision-making. Open data and open government advocates have labelled these types of efforts as “passing off the release of inconsequential government-held data as transparency”17 or “open-washing”.
The mostly implicit theory of change in many open data initiatives is that more information will (almost) automatically lead to its use by those working on anti-corruption and accountability and enable them to produce better outcomes and achieve impact. However, while information and technical improvements are great tools to better understand accountability and corruption challenges, they are not sufficient, on their own, to confront the entrenched power structures that oppose governance reform or to generate systemic change.
In 2016, the Open Data Barometer found that a number of datasets relevant to anti-corruption work (e.g. budgets, company registries, spending, contracting, and land ownership) “still tend to be highly opaque, and often the least open”, and that important differences persist within and across regions.18 A review of key datasets in five G20 countries also indicates that these relevant datasets are often not yet published, that public officials lack the skills to leverage open data, and that initiatives to strengthen citizen engagement using open data rarely link to anti-corruption or sectoral areas.19
In 2017, the “Open up guide: Using open data to combat corruption”20 identified 30 key datasets21 for fighting corruption (see sample in Figure 1), as well as standards that can make these datasets interoperable. The guide was tested in Mexico,22 which produced evidence on the value of the guide for enabling government officials to open key datasets. It also highlighted the need to define clear data governance frameworks and to promote dialogue between data users and producers in government and civil society.
Figure 1: Ten of the 30 datasets identified by the “Open up guide: Using open data to combat corruption”23
Efforts to open up data that is directly relevant to local accountability and corruption challenges are becoming more frequent, but they remain siloed, with a low degree of interoperability among released datasets that are often used by only a limited number of stakeholders active in a specific issue area. Such efforts are often led by civil society and, to a lesser extent, by governments. Examples of government-led efforts include the publication of commercial agreements, business relations, payments, and gifts to health providers by the private sector in France24 and Germany,25 as well as budget and/or spending data by many governments at different levels, often with support from international actors such as the Global Initiative for Fiscal Transparency,26 the World Bank,27 and Open Budgets.28 These government-led efforts have also spread to government performance data, such as the publication of data on the use of public resources for natural risk management and response by Italy29 and Mexico.30
Testing the “Open up guide: Using open data to combat corruption” – Mexico31
A joint effort by the government, Cívica Digital, Transparencia Mexicana, the Open Data Charter, and the Inter-American Development Bank tested the Open up Guide in Mexico by publishing a number of the key datasets it identifies.32 This work provided insights into the challenges and opportunities of opening key datasets to fight corruption:
Access to a list of key datasets and guidelines for data publication facilitates collaboration with institutions; however, this collaboration can be improved by prioritising data publication based on locally relevant corruption challenges and user needs. The process also provides entry points for opening datasets beyond the executive branch.
Data publication needs to be complemented with capacity building for work on data and the provision of targeted support concerning gaps, legal challenges, and data use. The Mexico pilot enabled researchers to produce a process that can be used by governments elsewhere in their efforts to improve the publication of key datasets related to anti-corruption.
Agencies with the mandate to open government data and civil society organisations are both key to ensuring the actual implementation of commitments to open data and to improving the processes and practices that underlie data production and use. This collaboration can be improved by instituting and/or strengthening formal data governance frameworks.
In other cases, civil society and media organisations have stepped in to close important gaps in the official publication of data related to accountability and anti-corruption. Most commonly, these efforts focus on those areas where governments have not indicated a willingness to act (or even explicitly oppose the publication of datasets) by using a wide array of strategies to achieve the release of information which is then transformed into open data. Such efforts often seek to pressure governments by accessing and releasing information in ways that will create incentives for government officials to publish the same information as open data. Some of the strategies used to access data when official open data is lacking include:
1. Making public information requests33 and publishing structured data from the results, such as the work by La Nacion newspaper on asset declarations.34
2. Obtaining data from candidates running for public office and from government officials on assets, tax compliance, and interests, as with the work done by the civil society coalition behind the “tres de tres” initiative in Mexico.35
3. Scraping documents and connecting different sources of data, such as with the publication of open data on political finance36 in Peru by “Ojo público” and in Taiwan37 by the Council Voting Guide.
4. Transforming complex data into open formats, as has been done by “Ciudadano Inteligente”38 in Chile with regard to party financing.
5. Turning information published by non-government actors (e.g. reports by private companies) into open data, as with the Data Extractors Programme by Publish What You Pay.39
6. Combing through public records and linking up data to enable the investigation of potentially corrupt transactions, such as the work by the Open Data Institute (ODI) in Kenya40 and the Organised Crime and Corruption Reporting Project in Eastern Europe.41
7. Collating and systematising data from different sources and jurisdictions, such as the work by Open Ownership, merging public registers, government reports, and voluntary disclosure42 to reveal beneficial ownership, or the work by Govtrack43 with regard to the US Congress.
These efforts hold great potential, but have often faced challenges in translating data gathering into data use with tangible impact. Data often remains both siloed and dispersed, with information on the same topic being scattered across different agencies or levels of government, which provide the data in different ways and formats. Even where data can be collected and connected, concerns about its quality, completeness, usability, and sustainability are common. When working with data, questions of trust inevitably arise. Data users often doubt the reliability of the data and question whether the design and evaluation of public policies and decisions are actually based on that data. Finally, many potential data users face the emerging tendency of many governments to close civic space.44
Opening sensitive data in closed contexts
Most conversations around open data are based on experiences from those countries with some willingness to release open data on contentious issues, yet there are also efforts to open data for accountability and anti-corruption led by civil society mavericks in repressive countries with high levels of secrecy.
In Venezuela, the Transparency International chapter and the Instituto de Prensa y Sociedad de Venezuela have led an effort to compile, systematise, and publish open data45 about regulations and decisions with regard to the use of public money. In Malaysia, the Sinar Project and the Web Foundation have produced and linked data about politically exposed persons46 in an effort to shed light on how power is used and misused in the country. These admirable efforts challenge repressive and secretive governments and put issues of corruption and accountability up for public debate.
Over the last decade, progress and challenges in achieving accountability and anti-corruption results have led the community to gradually revise the theory and practice underlying their work on open data. Activists are now moving beyond models based on the supply and demand of data47 to focus their work on more locally relevant problems, seeking to unpack the different elements needed to connect data production to use and impact. Some of the key ideas that may be coalescing into a revised theory of change include:
1. The need to make explicit the steps needed to go from data production through to taking actions that can activate institutional responses.48
2. A move from linear models to the use of cyclical and iterative approaches that enable a focus on specific governance challenges and the use of learning and adaptation.49
3. Integrating open data into the operation of existing anti-corruption institutions and mechanisms.50
4. Revising how to measure progress in the implementation of open data initiatives.51
The following sections will provide a deeper exploration of the different mechanisms connecting data availability and action with regard to existing anti-corruption systems and initiatives.
Progress in the publication of data, even if uneven and patchy, has raised important questions about who will use that data, how they will use it, and what results can be achieved. There are no silver bullets when it comes to promoting the use of open data by local stakeholders to address corruption and accountability challenges. The approaches that have been used to bridge the gap between data production and use can be classified into three overlapping groups: those focused on data standardisation and technological tools, those focused on engaging users and particular problems, and those focused on changing government processes and practices.
First, those initiatives that have focused on standardisation and technological tools have paid great attention to the development of data standards and their implementation by governments. They aim to improve the quality and comparability of published data and enable the development of tools that can be adapted according to the needs of audiences in different contexts. These efforts have targeted a variety of areas from democratic processes to resource flows and, to a lesser extent, development results. Examples52 include the IATI standard,53 Fiscal Data Package,54 the Popolo data specification,55 the OpenCorporates schema,56 and the Open Contracting Data Standard.57
The development and management of data standards related to accountability and anti-corruption have shown a similar trend to that of the broader open data space. Initially, standardisation was focused on finding ways to better present the information that was produced by governments, but later those leading the standards began to pay greater attention to data users’ needs, moving beyond representing government processes into using data to reshape those processes. Important challenges still remain in terms of the technical features and tools needed to make the implementation of data standards more useful and in relation to ensuring that stakeholders have the capacity to use the standards to address locally relevant challenges. Increased collaboration between standard developers, implementers, and data users at the global and national level is needed to develop technical solutions in a way that is sensitive to local capacities to produce the data and to put it to use within complex political systems.
Even though there are a number of stakeholders working to implement data standards, promote interoperability, and develop tools to facilitate data use, the actual use of open data has not increased proportionally. New projects that pay greater attention to supporting users trying to use data presented according to data standards are now emerging with strategies to promote data use58 and to explore the use of open data to fight corruption in particular countries.
Second, those initiatives that have paid greater attention to engaging users and achieving particular outcomes have shown important results. A clear example is the work of journalists at the national and international levels involved in collaborative networks such as the International Consortium of Investigative Journalists (ICIJ). Recent scandals, such as those exposed by the Panama59 and Paradise60 Papers, have not only uncovered corruption, but have also led to the launch of prosecutions and the resignation of public officers and even presidents.61 After publishing such stories, data has been made available in open formats that can enable further work or analysis by others. While these examples could be used to question the value of open government data on politically salient issues when compared to data obtained through leaks, the disparity in outcomes may indicate more about the differences in the way this data is being produced, treated, and used.
Leaked data often includes full versions of documents that are then used to stimulate collaboration among networks of journalists, both online and offline. These networks review the data thoroughly to organise it, clean it, and make sense of it. The same networks then use the data to find leads that are further corroborated and developed through other sources, including open government data, documents, and on-the-ground research. This intense work is not focused on merely making the information available; it is aimed at making the information useful to further identify and expose illegal activities carried out by those in power.
Lastly, there are a number of initiatives that have focused on fostering and supporting changes in government processes and administrative practices. Some of this work relies heavily on data to explore the value of new technologies like machine learning, blockchain,62 and algorithms;63 however, the use of these tools to analyse open government data is not yet widespread.64 Interest in these still emerging technologies leads inevitably to a range of challenges with regard to potential violation of privacy, the possibility of reproducing and increasing existing biases, and the threat of using automation to hide questionable decisions and practices.65
Other important work to promote change in government practices through the use of open data is led by multi-stakeholder initiatives on procurement, international aid, extractives, and public infrastructure. Even though these initiatives are at different levels in their uptake and maturity, all of them seek to alter long-established government processes. While some initiatives use formal multi-stakeholder forums for the production, verification, and use of open data, others promote the integration of open data into government processes beyond the simple publication of data. These initiatives have led to important, if not yet widespread, results,66 ranging from identifying money flows in the extractives sector67 and the misuse of public resources68 to achieving savings and better service delivery through improvements in the planning and implementation of government processes (see the box below).
Open contracting: From open data to improved results
From saving millions in public resources69 to fuelling citizen mobilisation demanding accountability70 and improving the implementation of service delivery programmes,71 open contracting is one of the most successful uses of open data to improve anti-corruption and accountability results. Three features place the work of the OCP and its local partners72 at the forefront of work on open data:
Open contracting principles and the data standard to regularise the opening of procurement information were developed in collaboration with government reformers, lawyers, private sector companies, and the media.
Sectoral efforts have gone beyond the development of a standard to focus on work with local reformers to address concrete challenges related to increasing value for money, strengthening public integrity, boosting market opportunities, enhancing internal efficiency, and improving the quality of goods and services.
Reformers have used agile and adaptive ways for promoting the implementation of procurement reforms, user engagement, and the actual use of data, learning on the go and adjusting strategies as needed.
Opening information on government contracts is increasing the capacity of activists and journalists to understand and challenge existing structures and protocols that allow the siphoning of public resources and unfair contracting practices.
An example of this work in action is the joint effort by the municipal government of Bogotá and Colombia’s procurement agency, Colombia Compra Eficiente (CCE), to use open data to identify inefficiencies and corrupt practices in the delivery of school meals in the city.73 The use of this data by government and suppliers has led to reshaping the way the programme is tendered, opening opportunities for more suppliers to participate, and enabling the busting of a price-fixing scheme for fruit. This improved the accountability of the process and enhanced the quality and timeliness of the meals provided.
The wide variety of approaches by government and civil society to address anti-corruption and accountability challenges should not be read as an attempt to identify the single best strategy to achieve results. Instead, the open data community needs to distil, share, and debate the lessons emanating from both successes and failures, reflecting on what these lessons mean for developing and implementing further projects. Additionally, it is not only a matter of choosing between approaches focused around a particular technology, stakeholder group, or government reform. It will be necessary to identify and explore, in practice, how a combination of these approaches can help to address particular corruption and accountability challenges within specific contexts.
As discussed above, more data does not necessarily lead to a proportional increase in either the use of data or anti-corruption and accountability results; however, increased access to standardised, machine-readable, and reusable data has enabled sharper investigations into instances of corruption and abuses of power, additional research to identify inefficiencies in the use of public resources, and greater awareness of systematic biases against particular groups. Nonetheless, current advances are generally insufficient to address the root causes that underpin corruption and accountability challenges: the ways in which power is distributed in a given society and the subversion of existing (democratic) institutions for private gain.
There are several emerging efforts to improve openness beyond the executive branch74,75 and address corruption and accountability challenges76 in other branches of government, including activities at the heart of the democratic process, such as monitoring elections and the undue influence of money in politics through campaign and party financing. Some initiatives, like those of organisations in the Openingparliament.org77 network, have paid particular attention to the legislative branch.78 These efforts to open and communicate information about legislators, how they perform their duties, and about legislation itself79 have been at the centre of work to strengthen democracy, with consequent benefits for anti-corruption work. Yet, these emerging efforts often face important challenges in relation to the availability of data in machine-readable formats to support the accountability of members of parliament, as well as in relation to implementing lobbying reforms. While legislatures may be happy to see accessible data on aspects of government operations, they may be more resistant to opening up structured data on their own activities and interests.
A number of national governments have also been subject to interesting efforts to open up data about the judiciary80 and oversight bodies, such as audit institutions. Their aim is to get a more complete picture of how cases are assigned to judges and how those cases progress until judgment is rendered. However, these efforts are not yet widespread and often face claims that they may hamper due process during trials.
Crucially, even when data is made available, initiatives tend to remain limited in their focus on particular branches of government or processes and generally have weak formal connections to the institutional systems in which they operate (e.g. the functioning of democratic institutions, the use of public resources, and the application of effective sanctions against those who engage in corrupt practices). This makes it challenging to follow cases of corruption from identification through to final resolution and sanctions, and, ultimately, hinders lasting impact and influence on future activities. Without ongoing scrutiny of democratic systems of power to enforce anti-corruption measures, individuals and institutions are able to continue to act with impunity, and the consolidation and replication of corrupt networks are facilitated.
The theory of change behind the idea of using open data for anti-corruption and accountability also highlights the potential value data can have in empowering citizens and enabling social mobilisation. Some organisations have used data to pursue an activist approach to crafting stories, uncovering wrongdoing, and identifying entry points that enable others to get involved. However, these approaches can put activists in peril, and, as of today, there are no established safety networks for this work, such as those that exist to protect human rights defenders or journalists. The absence of such safeguards, and the weaker links to established mechanisms for protection, may lead activists to take unnecessary risks and expose them to legal, reputational, or physical attacks.
Opening up the judiciary and advocating for greater accountability results
Due process and the effective management of evidence during trials are often seen by reformers as an excuse taken to the extreme by judicial bodies, preventing public disclosure of the most basic information on how cases are moved through the judicial system after a corruption scandal has been uncovered. However, some initiatives in this area have had an impact. One example of an open data initiative to obtain and use such information is the work done by the “Asociación Civil por la Igualdad y la Justicia” (ACIJ) in Argentina.
After years of litigation efforts to access information on corruption cases from the judiciary, and the burdensome work of turning hard copies into machine-readable data, ACIJ was able to create an observatory of cases.81 This has enabled the public to demand greater accountability regarding the delivery of justice in corruption-related cases. Recently, this work has been further enabled by the opening of judicial information by the Argentine government.82 Investigations into how corruption cases are allocated83 have resulted in significant insights into how impunity is sustained, and there are now calls for reform to tackle more profound systemic issues in the judicial system.
Despite the emergence of various activist approaches, it is generally organisations that focus on governance, transparency, participation, and accountability that most frequently lead initiatives to address accountability and anti-corruption. These organisations play an important role, but still need to find effective ways to engage other key stakeholders, such as organisations working in particular sectors or territories, those working on rights protection, those active in social movements, or those working through other alternative mechanisms, such as strategic litigation. The minimal connections that often exist between open data initiatives and a broader range of stakeholders can deepen the challenges related to the usability, and actual use, of the data and hinder real impact in addressing problems that affect citizens.
The assumption that an improved capacity to identify instances of corruption leads to activating institutional oversight mechanisms is not necessarily wrong; however, assuming that those mechanisms will actually deliver results in the form of successful reforms, grievance redress, or sanctions without additional effort is, at the very least, an oversimplification. Open data can be a tool useful not only to identify instances of corruption, but also to engage, challenge, and reform the institutional designs and practices that enable corruption. Turning this potential into reality requires the use of approaches that consider the institutional and political environments in which data is produced and used. Adopting such approaches will enable sharper thinking about how the use of data can be more effective in practice, how to counter the forces that oppose openness (be they for private gain or from an aversion to change), and how to build stronger bridges between advocates for open data, activists working on sectoral and systemic challenges, and the democratic forces that can act on the findings and evidence obtained from the use of open data.
The open data community needs to explore and test innovative ways of using data that take all of these slowly acquired insights into consideration: to effectively challenge institutional mechanisms and practices that perpetuate impunity, inefficiencies, and the abuse of power; to reach out to unusual stakeholders by finding ways to integrate their needs and interests; to tap into existing social mobilisation processes; and to link the efforts of the different stakeholders engaging on reform with government branches and institutions.
Over the past decade, reformers have used open data to create ripples and, in some cases, waves, in uncovering and prosecuting corruption. In a few cases, these efforts have resulted in the reform of systems where corruption had been the norm. Through their work, these reformers have generated insights that can help us to understand how to use open data more effectively to fight against corruption going forward. One of the key insights open data reformers have started to embrace is the value of adopting a problem-driven approach to the publication and use of data in order to address much more specific corruption and accountability challenges. These approaches also call for more collaborative models that are more grounded in the environmental context in which they are to be implemented, building on the needs and interests of local reformers and moving away from the replication of generalised practices toward the development of tailored “best-fit” solutions.
This shift in thinking on how to best use open data for accountability and anti-corruption does not represent a break with the ideals at the core of the open data movement, such as “open by default”, but it does call for some refinement of our thinking around how to articulate advocacy goals, learning aims, and the desired impact. There is, and will continue to be, value in demanding that governments open up data on key issues related to accountability and anti-corruption; however, these demands should be based on a clear understanding of the users and usefulness of data, as well as the technical, political, and institutional environments in which it will be used.
To grapple with the implications of these insights, stakeholders would benefit from engaging with each other to develop non-linear approaches to better address particular corruption and accountability challenges. Learning about other perspectives and approaches will provide useful insights to improve how we devise and test methods, monitor progress and results, and spur dialogue on how and why specific approaches might yield better results. In particular, it is important to explicitly address the following questions:
1. How can the field facilitate and strengthen the work of local champions, including government, civil society, and the private sector, to generate and use data evidence to demand accountability and to lead in the fight against corruption?
2. What are the needs of different local stakeholders with regard to using open data? How can these insights help to tailor technical tools and methodological approaches to better support stakeholders in different sectors and contexts?
3. How can stakeholders build stronger and more effective connections between those working on open data, accountability, and anti-corruption, and those working in specific sectors or issue areas?
4. What are the potential risks associated with using emerging technologies, such as machine learning and artificial intelligence, in relation to accountability and anti-corruption? How can these tools and methods be combined with the social mobilisation and institutional mechanisms needed to generate and sustain change?
5. How can actors link the technical capacities needed to use open data with the political strategies needed to effectively change systems and ensure sustainable results?
Success in addressing these questions over the next decade will enable reformers to achieve significant accountability and anti-corruption results. Future work will require the community to develop holistic theories of change and to be willing to test them, implementing interventions iteratively so that reformers can ensure that open data is useful and used, while strengthening collaboration among stakeholders to achieve systemic reform and explicitly addressing entrenched power dynamics. In addition, the community must move beyond simple dichotomies that highlight either the production or the use of data, toward models that start by identifying a specific problem to be solved, identify the opportunities and challenges faced by local champions, and embrace learning and adaptation to develop solutions that are a better fit in specific environmental contexts.
Further reading
Carolan, L. (2017). Mapping open data for accountability. Transparency and Accountability Initiative and Open Data Charter. http://www.transparency-initiative.org/wp-content/uploads/2017/06/taiodc_draft_data4accountabilityframework.pdf
McGee, R., Edwards, D., Hudson, H., Anderson, C., & Feruglio, F. (2017). Appropriating technology for accountability: Messages from making all voices count. Brighton: Institute of Development Studies. https://opendocs.ids.ac.uk/opendocs/bitstream/handle/123456789/13452/RR_Synth_Online_final.pdf
OECD. (2017). Compendium of good practices on the publication and reuse of open data for anti-corruption across G20 countries: Towards data-driven public sector integrity and civic auditing. Paris: Organisation for Economic Co-operation and Development. http://www.oecd.org/corruption/g20-oecd-compendium-open-data-anti-corruption.htm
Santiso, C. (2018). Will Blockchain disrupt government corruption? Stanford Social Innovation Review, 5 March. https://ssir.org/articles/entry/will_blockchain_disrupt_government_corruption
Vrushi, J. & Hodess, R. (2017). Connecting the dots: Building the case for open data to fight corruption. Berlin: Transparency International and World Wide Web Foundation. http://webfoundation.org/docs/2017/04/2017_OpenDataConnectingDots_EN-6.pdf
About the authors
Jorge Florez leads Global Integrity’s work on fiscal governance. He focuses on helping country-level partners to use data strategically to address accountability and corruption challenges. Follow Jorge at https://www.twitter.com/j_florezh and learn more about Global Integrity at https://www.globalintegrity.org.
Johannes Tonn leads Global Integrity’s anti-corruption work and supports partners in designing and implementing problem-driven, data-informed, and learning-centred approaches to solving governance challenges. Follow Johannes at https://www.twitter.com/johntonn and learn more about Global Integrity at https://www.globalintegrity.org.
How to cite this chapter
Florez, J. & Tonn, J. (2019). Open data, accountability, and anti-corruption. In T. Davies, S. Walker, M. Rubinstein, & F. Perini (Eds.), The state of open data: Histories and horizons (pp. 17–34). Cape Town and Ottawa: African Minds and International Development Research Centre. http://stateofopendata.od4d.net
This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. It was carried out with the aid of a grant from the International Development Research Centre, Ottawa, Canada.
1OECD. (2017). Compendium of good practices on the publication and reuse of open data for anti-corruption across G20 countries: Towards data-driven public sector integrity and civic auditing. Paris: Organisation for Economic Co-operation and Development, p. 11. http://www.oecd.org/corruption/g20-oecd-compendium-open-data-anti-corruption.htm
2http://www.transparency-initiative.org/who-we-are/
3GovLab. (2016). Open data’s impact: Open data is changing the world in four ways. http://odimpact.org/
4Segato, L. (2015). Revolution delayed: The impact of open data on the fight against corruption. Torino: Research Centre on Security and Crime, p. 2. https://www.transparency.it/wp-content/uploads/2015/09/2015-TACOD-REPORT.pdf
5McGee, R., Edwards, D., Hudson, H., Anderson, C., & Feruglio, F. (2017). Appropriating technology for accountability: Messages from making all voices count. Brighton: Institute of Development Studies, p. 11. https://opendocs.ids.ac.uk/opendocs/bitstream/handle/123456789/13452/RR_Synth_Online_final.pdf
6Fox, J. (2018). The political construction of accountability keywords: Lessons from action-research. TICTeC 2018, Lisbon. https://tictec.mysociety.org/2018/presentation/political-construction-of-accountability-keywords
7Stapenhurst, R. & O’Brien, M. (n.d.). Accountability in governance. Washington, DC: World Bank. https://siteresources.worldbank.org/PUBLICSECTORANDGOVERNANCE/Resources/AccountabilityGovernance.pdf
8Menocal, R.A. & Taxell, N. (2015). Why corruption matters: Understanding causes, effects and how to address them. London: Department for International Development. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/406346/corruption-evidence-paper-why-corruption-matters.pdf
9Carothers, T. & Brechenmacher, S. (2014). Accountability, transparency, participation, and inclusion: A new development consensus? Washington, DC: Carnegie Endowment for International Peace. https://carnegieendowment.org/files/new_development_consensus.pdf
10Savedoff, W. (2016). Anti-corruption strategies in foreign aid: From controls to results. CGD Policy Paper. Washington, DC: Center for Global Development. https://www.cgdev.org/sites/default/files/CGD-policy-paper-Savedoff-anticorruption-agenda.pdf
11Heywood, P. (2016). Tackling corruption overseas – Written evidence. London: Parliament of the United Kingdom. http://data.parliament.uk/writtenevidence/committeeevidence.svc/evidencedocument/international-development-committee/tackling-corruption-overseas/written/29840.html
12Marquette, H. (2016). Tackling corruption: Why we need to do things differently. London: Parliament of the United Kingdom. http://data.parliament.uk/writtenevidence/committeeevidence.svc/evidencedocument/international-development-committee/tackling-corruption-overseas/written/30710.pdf
13Menocal, R.A. & Taxell, N. (2015). Why corruption matters: Understanding causes, effects and how to address them. London: Department for International Development. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/406346/corruption-evidence-paper-why-corruption-matters.pdf
14Malito, D.V. (2014). Measuring corruption indicators and indices. EUI Working Paper. Robert Schuman Centre for Advanced Studies. San Domenico di Fiesole: European University Institute. http://cadmus.eui.eu/bitstream/handle/1814/29872/RSCAS_2014_13.pdf
15McGee, R., Edwards, D., Hudson, H., Anderson, C., & Feruglio, F. (2017). Appropriating technology for accountability: Messages from making all voices count. Brighton: Institute of Development Studies. https://opendocs.ids.ac.uk/opendocs/bitstream/handle/123456789/13452/RR_Synth_Online_final.pdf
16Web Foundation. (2017). Open Data Barometer – Global report. 4th edition. Washington, DC: World Wide Web Foundation. https://opendatabarometer.org/4thedition/report/
17Khan, S. & Foti, J. (2015). Aligning supply and demand for better governance: Open data in the Open Government Partnership. Washington, DC: Open Government Partnership. https://www.opengovpartnership.org/resources/aligning-supply-and-demand-better-governance-open-data-open-government-partnership
18Web Foundation. (2017). Open Data Barometer – Global report. 4th edition. Washington, DC: World Wide Web Foundation. https://opendatabarometer.org/4thedition/report/
19Vrushi, J. & Hodess, R. (2017). Connecting the dots: Building the case for open data to fight corruption. Berlin: Transparency International and World Wide Web Foundation. http://webfoundation.org/docs/2017/04/2017_OpenDataConnectingDots_EN-6.pdf
20Open Data Charter. (2018). Open up guide: Using open data to combat corruption. https://open-data-charter.gitbook.io/open-up-guide-using-open-data-to-combat-corruption/
21https://airtable.com/shrHY9KFJ5bircwvx/tblOY2aw1hYUuJze9
22Open Data Charter. (2017). Anti-corruption open up guide: Road-testing methodology. https://drive.google.com/file/d/0B44SovahLueTUTIxaUZBVldrWDQ/view
23https://airtable.com/universe/exppzIHMSvEHE3ZCB/using-open-data-to-combat-corruption
24http://www.transparence.sante.gouv.fr
25Vrushi, J. & Hodess, R. (2017). Connecting the dots: Building the case for open data to fight corruption. Berlin: Transparency International and World Wide Web Foundation. http://webfoundation.org/docs/2017/04/2017_OpenDataConnectingDots_EN-6.pdf
26http://www.fiscaltransparency.net/
29http://italiasicura.governo.it/site/home.html
30http://www.transparenciapresupuestaria.gob.mx/es/PTP/fuerzamexico
31Echeverria, A., D’Herrera, D., & Alanís, R. (2018). Open up guide: Testing how to use open data to combat corruption in Mexico. Washington, DC: Inter-American Development Bank. https://opendatacharter.net/open-up-guide-testing-how-to-use-open-data-to-combat-corruption-in-mexico/
32https://datos.gob.mx/busca/group/guia-de-datos-abiertos-anticorrupcion
33Fumega, S.V. (2016). Transformations in international civil society organisations working towards a greater access and use of governmental informational resources. PhD Thesis, University of Tasmania. https://eprints.utas.edu.au/23437/
34http://interactivos.lanacion.com.ar/declaraciones-juradas/
36Luna Amancio, N. (2018). Cinco desafíos para investigar el dinero, la política y el crimen desde los datos estructurados [Five challenges to investigate money, politics and crime using structured data]. OjoPúblico. https://fondosdepapel.ojo-publico.com/data/cinco-desafios-para-investigar-el-dinero-la-politica-y-el-crimen-desde-los-datos/
37Pei-yi, C. (2018). How Taiwan uses open data to follow the money in politics. G0v.News, 28 March. https://g0v.news/how-taiwan-uses-open-data-to-follow-the-money-in-politics-779cb58a648d
38https://partidospublicos.cl/
39https://web.archive.org/web/20180726074721/http://www.publishwhatyoupay.org/our-work/using-the-data/
40Young, A. & Verhulst, S. (2016). Kenya’s Open Duka: Open data for transactional transparency. GovLab and Omidyar Network. http://odimpact.org/files/case-study-kenya.pdf
41Radu, P. (2016). Follow the money: How open data and investigative journalism can beat corruption. Global Investigative Journalism Network, 25 March. https://gijn.org/2016/05/25/follow-the-money-how-open-data-and-investigative-journalism-can-beat-corruption/
42https://openownership.org/what-we-do/
44https://monitor.civicus.org/
46Canares, M., Yusof, K., & Meng, S. (2017). Collaborating for open data. Building an open database on politically exposed persons in Malaysia: A case study. Washington, DC: World Wide Web Foundation. http://webfoundation.org/docs/2017/08/RP-Collaboration-For-Open-Data-082017.pdf
47Khan, S. & Foti, J. (2015). Aligning supply and demand for better governance: Open data in the Open Government Partnership. Open Government Partnership, 1 January. https://www.opengovpartnership.org/resources/aligning-supply-and-demand-better-governance-open-data-open-government-partnership
48Carolan, L. (2017). Mapping open data for accountability. Transparency and Accountability Initiative and the Open Data Charter. http://www.transparency-initiative.org/wp-content/uploads/2017/06/taiodc_draft_data4accountabilityframework.pdf
49Davies, T. & Perini, F. (2016). Researching the emerging impacts of open data: Revisiting the ODDC conceptual framework. The Journal of Community Informatics, 12(2). http://ci-journal.org/index.php/ciej/article/view/1281
50Open Data Charter. (2018). Open up guide: Using open data to combat corruption. https://open-data-charter.gitbook.io/open-up-guide-using-open-data-to-combat-corruption/
51Brandusescu, A. & Lämmerhirt, D. (2018). Open Data Charter measurement guide. Open Data Charter, 22 May. https://opendatacharter.net/4869-2/
52For more examples, see https://airtable.com/shrHY9KFJ5bircwvx/tblOY2aw1hYUuJze9
53International Aid Transparency Initiative Standard. http://iatistandard.org/
54Kariv, A. (2018). Introducing version 1 of the Fiscal Data Package Specification. Open Knowledge International Blog, 28 May. https://blog.okfn.org/2018/05/28/introducing-version-1-of-the-fiscal-data-package-specification/
55McKinney, J. (2013). Introducing Popolo, an Open Government Data Specification. OpenNorth. http://www.opennorth.ca/2013/02/21/update-on-opengovernment.html
56https://opencorporates.com/info/about
57https://www.open-contracting.org/data-standard/
58IATI. (2017). IATI data use strategy 2017-19. https://drive.google.com/file/d/1Oh_tFfe5sahfkeUISRynR2U6dm4lPQSS/view?usp=embed_facebook
59International Consortium of Investigative Journalists (ICIJ). (2016). The Panama Papers: Exposing the rogue offshore finance industry. https://www.icij.org/investigations/panama-papers/
60ICIJ. (2018). Paradise Papers: Secrets of the global elite. https://www.icij.org/investigations/paradise-papers/
61Fitzgibbon, W. & Díaz-Struck, E. (2016). Panama Papers have had historic global effects – and the impacts keep coming. International Consortium of Investigative Journalists, 1 December. https://www.icij.org/investigations/panama-papers/20161201-global-impact/
62Santiso, C. (2018). Will Blockchain disrupt government corruption? Stanford Social Innovation Review, 5 March. https://ssir.org/articles/entry/will_blockchain_disrupt_government_corruption
63World Economic Forum. (2018). Tech for integrity. https://widgets.weforum.org/tech4integrity/
64See, for example, https://github.com/okfn-brasil/serenata-de-amor
65Rieke, A., Bogen, M., & Robinson, D.G. (2018). Public scrutiny of automated decisions: Early lessons and emerging methods. Upturn and Omidyar Network. https://www.omidyar.com/sites/default/files/file_archive/Public%20Scrutiny%20of%20Automated%20Decisions.pdf
66Brockmyer, B. & Fox, J.A. (2015). Assessing the evidence: The effectiveness and impact of public governance-oriented multi-stakeholder initiatives. SSRN. https://papers.ssrn.com/abstract=2693608
67EITI. (2012). Nigeria EITI: Making transparency count, uncovering billions. Case studies. Oslo: Extractive Industries Transparency Initiative. https://eiti.org/sites/default/files/documents/Case%20Study%20-%20EITI%20in%20Nigeria.pdf
68CoST Honduras. (2016). Why we make infrastructure transparent in Honduras. Open Government Partnership [Blog post], 6 December. https://www.opengovpartnership.org/stories/why-we-make-infrastructure-transparent-honduras
69Brown, S. (2016). “Everyone sees everything”: Overhauling Ukraine’s corrupt contracting sector. Open Contracting Stories, 28 November. https://medium.com/open-contracting-stories/everyone-sees-everything-fa6df0d00335
70Brown, S. & Neumann, G. (2017). Paraguay’s Transparency Alchemists: How citizens are using open contracting to improve public spending. Medium [Open contracting stories], 2 October. https://medium.com/open-contracting-stories/paraguays-transparency-alchemists-623c8e3c538f
71Brown, S. & Neumann, G. (2018). The deals behind the meals: How open contracting helped fix Colombia’s biggest school meal program. Medium [Open contracting stories], 9 April. https://medium.com/open-contracting-stories/the-deals-behind-the-meals-c4592e9466a2
72https://www.open-contracting.org/why-open-contracting/worldwide/#/
73Ibid.
74Naser, A., Ramírez-Alujas, Á., & Rosales, D. (Eds.). (2017). Desde el Gobierno abierto al Estado abierto en América Latina y el Caribe [From open government to the open state in Latin America and the Caribbean]. Santiago: Comisión Económica para América Latina y el Caribe. https://repositorio.cepal.org/bitstream/handle/11362/41353/1/S1601154_es.pdf
75OECD. (2017). Compendium of good practices on the publication and reuse of open data for anti-corruption across G20 countries: Towards data-driven public sector integrity and civic auditing. Paris: Organisation for Economic Co-operation and Development. http://www.oecd.org/corruption/g20-oecd-compendium-open-data-anti-corruption.htm
76Open Data Charter. (2016). Open Data for anti-corruption: Investigation and Enforcement: Workshop Report, 24 April. https://docs.google.com/document/d/1cnnjwfX1aDjNVjUhI-0gWGRwLVfU1T2gFeJDuDBiSwY/edit?usp=embed_facebook
77See more at https://www.openingparliament.org/ and at https://www.transparencialegislativa.org/
78See more at https://beta.openparldata.org/about/ and http://everypolitician.org/
79https://www.regardscitoyens.org/la-fabrique-de-la-loi/
80See, for example, the open data portal of the judiciary in Argentina at http://datos.jus.gob.ar/, the publication of data gathered by audit institutions in the city of New York at https://www.checkbooknyc.com/, and the state of Veracruz in Mexico at http://sistemas.orfis.gob.mx/simverp
82A new version of the observatory is being built using data from the Supreme Court. https://www.cij.gov.ar/causas-de-corrupcion.html
83For more information, see https://conocimientoabierto.github.io/visualizaciones/sorteosJudiciales/
High-level leadership, private sector engagement, and academic networks have put open data on the agenda across the agriculture sector.
Issues of ethics, ownership, power, culture, and capacity all need to be addressed before the sector is “open by default”.
Mapping information flows through agriculture value chains can help policy-makers and practitioners to identify pre-competitive spaces for open data sharing and to understand the implications of opening data more broadly.
Donors and governments have a key role to play in establishing the policy framework for openness and supporting the infrastructure needed for a sustainable open data commons for agricultural research and practice.
Goal 2 of the Sustainable Development Goals (SDGs) commits United Nations member states to both achieve food and nutrition security and to promote sustainable agriculture. The world population is projected to exceed 9 billion people by 2050,1 and the corresponding growing demand for food is exerting massive pressure on the use of water, land, and soil, which is further exacerbated by global warming. The majority of the world’s food is still harvested by smallholder farmers,2 many of whom are poor and food insecure themselves.3
Agriculture is a knowledge-intensive industry. Government and private sector-supported research and agricultural extension work (e.g. farmer education) are central to improving crop yields, understanding and implementing sustainable practices, and getting food to market. However, it is only in the past two decades that the agricultural sector has valued data as a tool for generating, sharing, and exploiting knowledge to improve yields, reduce losses, and increase overall agricultural business outcomes.
Rapid internet and mobile phone penetration, especially in the developing world, the accessibility of satellite and remote sensing data, and new data collection and analytical approaches all play a role in the “datification” of agriculture. While data-related opportunities are increasing, challenges still exist in the policy, ethical, and data standards domains, and key datasets remain absent or inaccessible. This is especially true in terms of nutrition-related data, which is largely under-utilised in the field of agriculture. Despite some progress in raising consumers’ awareness of the nutritional value of the food they consume, demand has not been significantly redirected to the production of more nutritious food, especially in the developing world.
Networks and leadership: A history of open data in agriculture
Work on open data in agriculture has emerged from a long history of knowledge management practice and international networking. Agricultural libraries in the United States (US) have been sharing bibliographical data since the 1940s. In the 1980s, the Food and Agriculture Organization (FAO) of the United Nations developed AGROVOC4 initially as a printed thesaurus of terms and later established it as the first real data standard (vocabulary) for an open agriculture information ecosystem. FAO also created the first network to support agricultural information sharing in 2003, known as GLOBAL.RAIS (Global Alliance of the Regional Agricultural Information Systems).5 In 2008, they launched the Coherence in Information for Agricultural Research for Development (CIARD) initiative,6 a global movement dedicated to open agricultural knowledge, working to align the efforts of national, regional, and international institutions, and to improve information sharing and services.
The importance of considering not only data, but open data, came to the fore in 2012, when the US convened an international conference on Open Data for Agriculture, the result of a G8 commitment, with an emphasis on making “reliable agricultural and related information available to African farmers, researchers, and policymakers”.7 This led to the creation of Global Open Data for Agriculture and Nutrition (GODAN) as a convening network to bring together public, private, and non-profit stakeholders to find ways to open up and use data more effectively.
GODAN was conceived to focus on awareness raising and advocacy as reflected in its statement of purpose,8 but, from the outset, it was found that change through advocacy results only when partners are brought together to debate the issues and obstacles to making open data for agriculture a reality, especially when they can draw on provocative policy-focused research and recommendations. An approach to “Convene, Equip, and Empower” now frames the overall GODAN theory of change.9
Other notable networks that advocate for open data in agriculture through high-level communications, research, and events include the Global Partnership for Sustainable Development Data, the Research Data Alliance, Global Forum for Agricultural Research, Presidents United to Solve Hunger, and AgriCord.10
When we consider the potential and use of open data in agriculture, there are numerous facets that reflect the breadth and diversity of the sector, especially when one also considers nutrition as a key element of the field. Whether it is food price data, geodata, plant genomes, country statistics, nutrition data, or data from a grassroots initiative to quantify food composition, published open datasets can be used by a wide variety of stakeholders to generate impact.11 The actors involved are similarly diverse. Consider, for example, the single value chain for cheese production illustrated below.
Figure 1: Single value chain for cheese production
Source: Authors
Cheese is made of milk produced with the involvement of feed producers, dairy farmers, transporters, and processing factories. Each actor has an interest in understanding the provenance of their inputs and the markets they operate in. Some of the production chain involves data that can be made open. In other cases, data will be seen as a commercial asset. Regulators may be interested in product traceability, nutritional content, and labelling, and in providing this information to consumers. Producers are also interested in investment opportunities and risk reduction. In this simple value chain, there are various ancillary datasets that may be considered pre-competitive, yet still have some commercial value (weather data, transportation data, genetic data on livestock, etc.). These datasets can inform production, allowing producers to adjust the sourcing of inputs or to modify the production process to improve both the quality and the volume of their crops. Openness is clearly a tool to facilitate the flow of data across this value chain and to realise the maximum potential of data, yet openness requires policy choices, private sector engagement, and consumer awareness. It also requires that consideration be given to how different actors will be able to use the data that becomes available based on its level of interoperability. This chapter will attempt to unpack a number of these issues in more depth.
Agriculture is a complex sector, and it can be difficult to define its boundaries. Agriculture and food systems integrate seamlessly into other systems, such as ecology, human health, and the built environment. Sustainable agriculture is considered a “wicked problem”,12 where too many elements are involved in order for the problem to ever be considered “solved”. The data and metadata that are collected within agricultural systems are equally complex because they are generated by thousands of global stakeholders from multiple sectors, using an incredible range of types, formats, and ontologies. However, when we consider some of the primary forms and uses of agricultural data, such as research, production management, and statistical monitoring, we can start to map out some of the roles that different stakeholders play as illustrated in Figures 2 and 3.
Governments collect and share data in the form of national and international statistics (e.g. US National Agriculture Census13 and FAOSTAT14), but often also support farmers and agricultural practices by publishing key datasets used for ICT-enabled farm extension and to empower consumers in food supply chains. Governments may also provide policy-relevant open data, including data related to national standards and frameworks used by service providers who help farmers or processors meet regulatory requirements.15 Government also uses open data to promote transparency in their operations, with registers of land ownership a key example.16 They are able to use their regulatory power to collect, or require the publication of, key data from private actors. Since 2012, a number of governments have developed and implemented open data policies to help embed open data practice in their own organisations or use their role as donors, funders, and commissioners to bring open data into the mainstream of agricultural development work.
Figure 2: Different actors in the agriculture sector
Source: Jellema, A., Meijninger, W., & Addison, C. (2015). Data and smallholder food and nutritional security. CTA Working Paper 15/1. Wageningen, The Netherlands: Technical Centre for Agricultural and Rural Cooperation
Figure 3: The relationships and data flows between various actors in the agriculture sector
Source: Authors and Technical Centre for Agricultural and Rural Cooperation (CTA)
Larger agricultural businesses are increasingly interested in open data, and companies are exploring opportunities to act as both data producers and consumers.17 Some larger companies recognise that they are being held accountable by society and that greater transparency is a key foundation of their licence to operate.18 In 2014, with the support of the Open Data Institute, Syngenta, a multi-billion-dollar firm, placed open data at the core of its transparency strategy;19 however, for many firms, operational “transparency” remains opaque, with information buried in corporate reports and a lack of structured background data. This presents challenges not only for public scrutiny, but also for investors seeking to target more sustainable investments.20 Because the value chains of larger corporate entities in the agri-food business operate on a truly global level, these companies can have a significant impact on countries that lag behind in terms of reaching the SDGs.
Many farmers in developed countries are turning to data-based precision agriculture. Even in the developing world, farming involves increasing amounts of data collection and analysis. However, smallholder farmers often lack the technical capacity to manage or exploit the open data they create or that is provided by external producers. Instead, they often rely on intermediaries from the private sector or government. These intermediaries typically develop portals, apps, and tools that allow farmers to benefit from data on a range of topics, such as weather, infestations, or soil quality, that would otherwise be unavailable to them. Farmers’ organisations have raised concerns that data from farmers could be exploited and used against farmers’ interests unless it is well governed. In some countries, farmers have decided to take data management into their own hands by collectively developing portals and tools for themselves.
Academia and research have a long history of sharing data, and the cultural environment is shifting in a more open direction as open science is being embraced by more researchers,21 donors,22 and research networks.23,24 The FAIR (Findable, Accessible, Interoperable, Reusable) data principles25 have seen very rapid adoption in the scientific community, and open data has an important, albeit not exclusive, role within these principles. In partnership with international institutions, researchers have built a range of research infrastructure, including the European Open Science Cloud,26 and networks for the discovery of data, such as the CIARD Ring.27 The Interest Group on Agricultural Data (IGAD) at the Research Data Alliance (RDA)28 connects a global community of researchers in the agricultural domain to exchange state-of-the-art research data on agriculture. However, access to research data remains fragmented. Although good permanent repositories exist,29 it is not uncommon for data associated with a research project to be published, but then to disappear when the project funding that paid for maintaining the data servers is no longer available.
Overall, although the supply of open data from all these different stakeholders is increasing, there remain large gaps, quality issues, and challenges in making data interoperable, as well as difficulties in establishing appropriate incentives for the stakeholders that are most relevant within the value chain.
Toward a global (open) data ecosystem for agriculture and food
Agricultural data includes social, environmental, physical, and financial factors. If viewed through the value chain, this includes inputs (fertilizer, pesticides, seeds), production (soil, weather, growth, land and water use), harvest (farmer income, yield, storage), and transport to market (food prices, road conditions, CO2 emissions). This data is collected using several methods: in-situ sensors, household surveys/interviews and on-the-ground collection, and, increasingly, through technology, such as satellites and drones, and sensors on farm equipment.
With all this data, what would it take to secure the best access to data for improving agriculture and food security? This is the question addressed by Syngenta and GODAN partners in articulating their vision for a global data ecosystem for agriculture and food.30 A global data ecosystem encompasses open standards and frameworks that enable decentralised data exchange. In an ideal open data ecosystem, all data, from geospatial to household surveys, could be layered together and used by any actor within the ecosystem. This is a socio-technical project: combining principles (such as the FAIR principles), technology, and stakeholder engagement.
Standards are explicit guidelines for the collection, management, and organisation of data. They can dramatically improve the interoperability of data between different stakeholders across agricultural value chains. Standards take many forms, including vocabularies, taxonomies, measurement protocols, data models, and equipment interfaces. The field of agriculture has long engaged in processes of standardisation for specific purposes, such as food safety, cross compliance of subsidies, machine engineering, and lab analysis, yet the existence of many subfields in agriculture has led to a proliferation of standards. These various standards have a surprisingly low degree of interoperability as they were developed to primarily serve the specific sub-fields; however, the need to use data from different sources for new applications (including big data and artificial intelligence applications) has made interoperability increasingly important. The starting point for greater interoperability is increased transparency on the development and use of current standards.
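To make the interoperability point concrete, the following is a minimal sketch of how a shared vocabulary and an agreed unit allow two differently structured datasets to be combined. Everything in it is an assumption made for illustration: the field names, the sample records, and the concept identifiers (such as `crop:maize`) are hypothetical and are not drawn from any of the standards discussed in this chapter.

```python
# Hypothetical records from two sources describing the same kind of
# observation with different field names, crop labels, and units.
COOP_RECORDS = [{"crop": "maize", "yield_t_ha": 4.2}]
SURVEY_RECORDS = [{"Crop_Name": "Corn", "YieldKgPerHa": 3900}]

# Hypothetical shared vocabulary: local label -> common concept identifier.
CROP_CONCEPTS = {
    "maize": "crop:maize",
    "corn": "crop:maize",
    "rice": "crop:rice",
}


def normalise_crop(label):
    """Map a local crop label onto the shared concept identifier."""
    key = label.strip().lower()
    if key not in CROP_CONCEPTS:
        raise KeyError(f"No shared concept for crop label: {label!r}")
    return CROP_CONCEPTS[key]


def from_coop(record):
    """Convert a cooperative record (tonnes per hectare) to the common form."""
    return {
        "crop_id": normalise_crop(record["crop"]),
        "yield_kg_ha": round(record["yield_t_ha"] * 1000, 1),
    }


def from_survey(record):
    """Convert a survey record (already in kg per hectare) to the common form."""
    return {
        "crop_id": normalise_crop(record["Crop_Name"]),
        "yield_kg_ha": float(record["YieldKgPerHa"]),
    }


def harmonise(coop_records, survey_records):
    """Return all records in one common structure so they can be combined."""
    return [from_coop(r) for r in coop_records] + [
        from_survey(r) for r in survey_records
    ]


merged = harmonise(COOP_RECORDS, SURVEY_RECORDS)
for row in merged:
    print(row)
```

Without the shared vocabulary, "maize" and "Corn" would be treated as different crops, and the two yield figures would not be comparable because they are reported in different units.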
In order for standards to be more useful for research and for decision-making, they must be online, open, and machine-readable. GODAN Action (see box below) has completed a mapping of agri-food standards31 and discovered that 16% of the standards are not online, only 56% are machine-readable, and only 21% are clearly available under open licences, thereby limiting their use for open data. The relative openness of standards is often related to the sub-field where they originated. For example, plant science standards are more likely to be open than soil-related standards, and supply chain standards are even less likely to be open.
GODAN and the Agricultural Information Management Standards (AIMS) initiative, hosted by FAO, have developed the VEST Registry32 to make standards more open and useful by cataloguing ontologies in use in different agricultural sub-fields.33 The RDA/IGAD,34 started in 2013, works specifically on methods to make agricultural data more interoperable across crop-specific themes (such as rice and wheat) by developing joint standardised vocabularies, such as the Global Agricultural Concept Scheme.35 Identifying and describing the standards in use provides a first step to increasing interoperability and rationalising standards; however, it is also important to increase widespread adoption of standards by embedding their use requirements in the development of guidelines and policies on open data.
GODAN Action
GODAN Action36 is a three-year multi-sector project funded by the Department for International Development (DFID) in the UK and implemented by the Open Data Institute (ODI), GODAN, the Technical Centre for Agricultural and Rural Cooperation (CTA), Wageningen UR, and FAO, which aims to enable data users, practitioners, and intermediaries to work effectively with open data in the agriculture and nutrition sectors. GODAN Action works on three focal areas that will help overcome open data challenges: promoting standards and best practices, measuring open data impact, and building capacity with stakeholders. GODAN Action is applying these three focal areas to three specific data themes: weather data (2017), nutrition data (2018), and land use data (2019).
Over the past decade, open access and open data policies have become more prominent among governments and funders of agricultural programmes. The US and the United Kingdom (UK) made some of the first efforts toward the creation of open data policies. In 2013, US President Obama signed an executive order37 toward making data open by default, which led to the US Department of Agriculture’s (USDA) launch of the Food, Agriculture, and Rural virtual community38 on data.gov. The UK created its open data policy in 201239 and has since opened thousands of agriculture-related datasets through the Department for Environment, Food and Rural Affairs,40 and the European Union (EU) has undertaken similar work through the EU Open Data Portal.41 These examples illustrate the potential for public policy development in support of the publication of agriculturally relevant data.
Several governments in Africa are in the process of developing open data policies specifically for agriculture. In 2017, Kenya held a Ministerial Conference on Open Data for Agriculture and Nutrition, which culminated in the Nairobi Declaration, a 16-article statement on open data policy in agriculture and nutrition.42 The statement was signed by 15 African ministers, who have formed a network to develop policies for their respective countries. Francophone Africa is developing a similar network to support public policy development, the Conférence d’Afrique Francophone sur les Données Ouvertes (CAFDO).43
In 2016, a beta version of an International Open Data Charter Open Up Guide on Agriculture was published,44 setting out a call for all governments to adopt a focus on agriculture within their wider open data policies and providing guidance on policy and practice specifically in the agricultural domain. The full version of the Open Up Guide45 was subsequently launched in 2018 at the International Open Data Conference in Buenos Aires, Argentina.
Funders of agricultural research and development have developed open access policies, although these generally require only open journal publication of the research conclusions without necessarily requiring the underlying data to also be published as open data. Since 2012, the UK’s DFID, the US Agency for International Development (USAID), and the Gates Foundation, among others, have established policies that require their funded researchers to share both research publications and research data under conditions that permit access and reuse.46 However, a review of these policies in 2017 found they lacked clear open data definitions, suggesting a need to strengthen understanding of open data as a distinct concept alongside open access. There is also growing recognition that funded projects need support to understand and apply open data principles to their work, as well as access to technical data infrastructures to ease data publication and sharing. Several initiatives, such as the Gates Foundation-funded Initiative for Open Ag Funding,47 which ran from 2016 to 2018, have explored how to make programmatic data (financial and administrative data about funded programmes) open as well, building on the International Aid Transparency Initiative.48
A large gap also exists in the development of any coherent open data policy or practice among the private sector actors within the agricultural industry, although when put into place, such data policies would likely seek to balance open access with business interests, thereby limiting open data benefits and overall transparency.
The widely cited case of John Deere tractors has become a key reference point in discussions related to data ethics. These “smart machines” not only plough the soil, but also capture vast amounts of data, which, under their “terms of service”, are fed back to John Deere to analyse and exploit with no guarantee of benefits or data going back to the farmer.49 Cases like this50 have helped to spark an emphasis on data ethics in agriculture, exploring perceived power imbalances between farmers and big agribusiness and triggering initiatives, such as the EU Code of Conduct on Agricultural Data Sharing by Contractual Agreement,51 endorsed by hundreds of equipment manufacturers.
Data privacy and security issues relate to the management and use of personally identifiable data, whether it is photographic, geospatial, financial, or demographic. There are many open issues and ongoing discussions about the degree of access that industry, government, and research institutions should have to data on the choices (e.g. agricultural practice, land use, product use) that an individual farmer makes. The norm is that data should not be made open when farm and farmer data privacy and security are at risk. There is general acceptance that sensitive data can be made available at times if aggregated, but not at the individual level. Data collectors must make every effort to prevent data breaches and inform farmers how data about them is used.52 One initiative that is now gaining traction in opening up data across agricultural companies, such as tractor companies and farm sourcing corporations, is the Open Ag Data Alliance,53 which has built an open source framework to allow farmers to access and control their own data.
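The principle of releasing sensitive data only in aggregated form can be illustrated with a short sketch. It is an illustration under stated assumptions, not a description of any tool mentioned in this chapter: the farm records, the field names, and the minimum group size of three farms are all hypothetical, and a size threshold on its own does not guarantee that individuals cannot be re-identified.

```python
from collections import defaultdict

# Assumed threshold: a district is only released if it contains at least
# this many farms, so that no single farm can be read off the summary.
MIN_FARMS = 3

# Hypothetical farm-level records that would not be published as they are.
FARM_RECORDS = [
    {"farm_id": "f1", "district": "North", "yield_kg_ha": 4000},
    {"farm_id": "f2", "district": "North", "yield_kg_ha": 4200},
    {"farm_id": "f3", "district": "North", "yield_kg_ha": 3800},
    {"farm_id": "f4", "district": "South", "yield_kg_ha": 5000},
]


def aggregate_by_district(records, min_farms=MIN_FARMS):
    """Return per-district averages, dropping districts with too few farms."""
    groups = defaultdict(list)
    for record in records:
        groups[record["district"]].append(record["yield_kg_ha"])

    released = {}
    for district, values in groups.items():
        if len(values) >= min_farms:
            released[district] = {
                "farms": len(values),
                "mean_yield_kg_ha": sum(values) / len(values),
            }
        # Districts below the threshold are withheld entirely.
    return released


summary = aggregate_by_district(FARM_RECORDS)
print(summary)
```

In this sketch the district with a single farm is withheld, because publishing its "average" would reveal that farm's individual figure, while farm identifiers never appear in the released summary.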
Data ownership and legal rights issues are a difficult and complex component of the data ethics debate within the agriculture domain.54 If data is to be increasingly made open by default, the sector would benefit from improved clarity around legal data ownership and governance frameworks. Legal issues that affect access to, and the use of, data at the international, national, and subnational level include copyright, database rights, technical protection measures, trade secrets, patents, plant breeders’ rights, privacy, and even tangible property rights.55 Within the sector, there is general agreement that farmers should steward their own data and that legal frameworks should be transparent, but the discussions are complex,56 and many worry that more stringent mechanisms around farm data ownership could hurt innovation.57
Responsible data relates to employing data in ways that do not increase power imbalances. Careful examination of context can result in data being opened, shared with a chosen group, or kept closed.58 Governments may publish data to improve accountability, as a policy instrument or as a service to citizens, especially if collection has been paid for by taxes. The Open Data Charter59 encourages governments to make their data “open by default” for this reason, but accepts that there may be cases when data cannot be opened.
There is growing recognition in the field that to release data responsibly, the effects on vulnerable communities, especially women, Indigenous populations, and migrant workers, must be considered.60 The sensitive information at issue in this case is not always personally identifiable information, but rather knowledge that, if made open, may allow some actors to profit from it to the detriment of others. For example, if data released indicates that women are managing or using land without obtaining the legal rights to do so, external actors may undertake to gain control of the land at the expense of the women.61 Trust between stakeholders around appropriate data responsibilities is important, but little guidance currently exists on best practices.
Preliminary work on issues of privacy, responsible data, and data ownership in agriculture has been carried out, and numerous farm organisations, manufacturers, and other entities have expressed interest in participating in further conversations around data ethics to build a new consensus, especially as it pertains to smallholder rights.62 This work is still at an early stage.
While smallholder farmers could benefit significantly from open data-driven knowledge on when and where to plant and harvest, and what current market prices are, at present it is highly resourced stakeholders who appear to be the primary beneficiaries of open data in agriculture. To ensure all stakeholders have the technical resources, knowledge, and capabilities to collect, publish, or reuse open data, efforts over the last few years have sought to overcome major capacity gaps among governments, data intermediaries, and farmers. For example, the GODAN Capacity Development Working Group and GODAN Action host webinars and provide a conversation space for those exploring how to use open data to create benefit for themselves or their organisations.63
Early learning from the field is showing that forming relationships among organisations and individuals, building trust, and ensuring a high diversity of stakeholders are all important in moving from awareness of open data to implementation of new business models and data use strategies. Researchers, governments, donors, NGOs, and farmers’ organisations have all discussed trust as an essential component of capacity development and willingness to commit to open data in agriculture.64
Evidence shows, however, that access to technology and the internet, and even simple word processing and spreadsheet management skills, are lacking in rural farming areas, especially in developing countries, and among women and vulnerable communities. To address these issues, CTA has invested in IT capacity development efforts and e-Learning specifically for women and girls.65 As mobile phones are increasingly available in developing countries, advocates expect that skills will increase, especially in rural agricultural areas. However, it is also anticipated that more capacity development efforts will be needed to ensure that all farmers can access, use, and share open data, including through the use of mobile platforms.
As we have seen, agriculture is diverse, as is the potential for applying open data to support a range of activities in the sector, from providing remote sensing data for precision agriculture applications to bringing farming extension advice to smallholder farm owners. Although the stakeholders may look very different, overarching sector goals remain mostly unchanged: to grow nutritious food as efficiently as possible, balanced with the need to secure the basic livelihood of people everywhere, using successful business models. As outlined in this chapter, the burgeoning ecosystem for open agricultural data is only beginning to address a myriad of issues as evidenced by the series of discussions that took place at the GODAN Summit in 2016 (see Figure 4). In the light of a growing world population and ever-increasing pressures on resources, we need technological improvements and innovative approaches in many areas of agriculture and nutrition to meet this goal, and data will be central to that effort.
Figure 4: Drawnalism artwork by Alex Hughes captured at the GODAN Summit, September 2016
To date, the private sector has shown only minimal interest in publishing its data openly for reuse. A much greater emphasis on incentives and business models that encourage the release of open data at all levels of agricultural value chains is necessary. Both researchers and companies need to undergo a cultural shift from closed and proprietary to shared and open, recognising the value of open data in promoting innovation, cost-sharing, and improved value chain efficiencies. The extent to which the FAIR principles have caught on, at least in the rhetoric of the sector, is encouraging and highlights the value of communicating open data ideas as part of a broader normative agenda for advancing agriculture.
In 2018, the meaningful sharing of useful (both anonymised and identifiable) on-farm data was often curtailed by legitimate privacy concerns raised by farmers and their organisations, or by farm machinery and farm management systems that operate in a proprietary space. The open data community needs to increasingly involve stakeholders who are trusted by farmers, such as farm cooperatives, in order to promote innovation using on-farm data. Right now, for those wanting to innovate with data, obtaining large satellite datasets from governments or agents who have already adopted an open data policy is a lot easier than opening up on-farm data or nutrition data from surveys. Yet inclusive innovation that draws on remote sensing also requires ground-truth data, highlighting the need for ongoing efforts to secure granular data about farms with the acceptance and support of farmers and their communities. As agri-food companies increasingly need a “licence to operate” from the public, they have begun to release more data on their sustainability performance. Early examples of this data publication and of the private sector’s involvement in tracking SDG progress are promising in that regard. In addition, open data for agriculture has almost exclusively focused on food security, but, thus far, has neglected to consider textiles and forestry, which bear a large environmental cost and should be priority areas for future focus.
The seeds are sown for the growth of open data in agriculture, but, as yet, the evidence of lasting impact is limited. Creating the right ecosystem will need more than awareness raising. It will require all stakeholders to grapple with challenging ethical issues by turning debates and discussions into consensus, capacity development, guidance, and common approaches that can be deployed at scale.
Further reading
Allemang, D. & Teegarden, B. (2017). A global data ecosystem for agriculture and food (version 1; not peer reviewed). F1000Research, 6(1844). https://doi.org/10.7490/f1000research.1114971.1
Carolan, L., Smith, F., Protonotarios, V., Schaap, B., Broad, E., Hardinges, J., & Gerry, W. (2015). How can we improve agriculture, food and nutrition with open data? Open Data Institute [Article], 27 May. https://theodi.org/article/improving-agriculture-and-nutrition-with-open-data/
De Beer, J. (2017). Ownership of open data: Governance options for agriculture and nutrition (version 1; not peer reviewed). F1000Research, 6(1002). https://doi.org/10.7490/f1000research.1114298.1
Ferris, L. & Rahman, Z. (2017). Responsible data in agriculture (version 1; not peer reviewed). F1000Research, 6(1306). https://doi.org/10.7490/f1000research.1114555.1
Smith, F., Fawcett, J., & Musker, R. (2017). Donor open data policy and practice: An analysis of five agriculture programmes (version 1; not peer reviewed). F1000Research, 6(1900). https://doi.org/10.7490/f1000research.1115013.1
Wilkinson, M.D., Dumontier, M., Aalbersberg, Ij.J., Appleton, G., Axton, M., Baak, A., Blomberg, N. et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(160018). https://www.nature.com/articles/sdata201618
About the authors
Ben Schaap is Research Lead at GODAN, where he works on research to support advocacy goals for open data through the mapping of impact case studies. You can follow Ben at https://twitter.com/benschp and learn more about GODAN at https://www.godan.info.
Ruthie Musker is Strategic Projects and Partnerships Lead for GODAN. Follow Ruthie at http://www.twitter.com/ruthiemusker and learn about GODAN at https://www.godan.info.
Martin Parr is currently Director of Data & Services, Digital Development, at the Centre for Agriculture and Biosciences International (CABI). He has been involved in open data and open knowledge projects in the agricultural sector for many years, and was involved in drafting the Global Open Data for Agriculture and Nutrition declaration. You can follow Martin at https://www.twitter.com/parr2_parr and learn more about CABI at https://www.cabi.org.
André Laperriere is Executive Director of the Secretariat for GODAN. Follow André at https://www.twitter.com/a_laperriere and learn more about GODAN at https://www.godan.info.
How to cite this chapter
Schaap, B., Musker, R., Parr, M., & Laperriere, A. (2019). Open data and agriculture. In T. Davies, S. Walker, M. Rubinstein, & F. Perini (Eds.), The state of open data: Histories and horizons (pp. 35–50). Cape Town and Ottawa: African Minds and International Development Research Centre. http://stateofopendata.od4d.net
This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. It was carried out with the aid of a grant from the International Development Research Centre, Ottawa, Canada.
1Thomson Reuters. (2015). How will we fill 9 billion bowls by 2050? | #9billionbowls. http://reports.thomsonreuters.com/9billionbowls/
2Graeub, B.E., Chappell, M.J., Wittman, H., Ledermann, S., Kerr, R.B., & Gemmill-Herren, B. (2016). The state of family farms in the world. World Development, 87, 1–15. https://www.sciencedirect.com/science/article/pii/S0305750X15001217?via%3Dihub
3Rapsomanikis, G. (2015). The economic lives of smallholder farmers: An analysis based on household data from nine countries. Rome: Food and Agriculture Organization of the United Nations. http://www.fao.org/3/a-i5251e.pdf
4AIMS (Agricultural Information Management Standards). (2018). AGROVOC. http://aims.fao.org/vest-registry/vocabularies/agrovoc
5GFAR. (2003). GLOBal ALliance of the Regional Agricultural Information Systems (GLOBAL.RAIS): Project document. Rome: Global Forum on Agricultural Research. http://www.fao.org/docs/eims/upload/216229/GLOBAL.RAIS_Project.pdf
6https://ring.ciard.net/about-ring
7http://www.godan.info/events/g8-conference-open-data-agriculture
8http://godan.info/pages/statement-purpose
9http://www.godan.info/pages/theory-change
10http://www.data4sdgs.org/, https://www.rd-alliance.org/, http://www.gfar.net/, http://wp.auburn.edu/push/ and https://www.agricord.org/en, respectively.
11Carolan, L., Smith, F., Protonotarios, V., Schaap, B., Broad, E., Hardinges, J., & Gerry, W. (2015). How can we improve agriculture, food and nutrition with open data? London: Open Data Institute and Global Open Data for Agriculture and Nutrition. https://theodi.org/article/improving-agriculture-and-nutrition-with-open-data/
12Van Latesteijn, H.C. & Rabbinge, R. (2012). Wicked problems in sustainable agriculture and food security, the TransForum experience. International Food and Agribusiness Management Review, 15 (Special Issue B), 89–94. https://www.ifama.org/resources/Documents/v15ib/Latesteijn-Rabbinge.pdf
14http://www.fao.org/faostat/en/
15For example, https://data.food.gov.uk/catalog
16Smith, F. & Jellema, A. (2016). Introducing the Agriculture Open Data Package. BETA version. Wallingford: Global Open Data for Agriculture and Nutrition. http://www.godan.info/sites/default/files/GODAN_Agriculture_Open_Data_Package_BETA_1.pdf
17Beardmore, D. (2017). The value of open data for the private sector. Open Data Institute [Article], 23 June. https://theodi.org/article/the-value-of-open-data-for-the-private-sector/
18Menzies, T. (2015). What does “social license” mean for agriculture? CropLife Canada, 3 November. https://croplife.ca/what-does-social-license-mean-for-agriculture/
19https://www.syngenta.com/site-services/transparency.aspx
20Odier, P. (2017). Why lack of data is the biggest hazard in “green investing”. Financial Times, 6 March. https://www.ft.com/content/be8e5db2-0249-11e7-aa5b-6bb07f5c8e12
21Government of The Netherlands. (2016). Amsterdam call for action on open science. Amsterdam: Ministry of Education, Culture and Science. https://www.government.nl/documents/reports/2016/04/04/amsterdam-call-for-action-on-open-science
22Smith, F., Fawcett, J., & Musker, R. (2017). Donor open data policy and practice: An analysis of five agriculture programmes (version 1; not peer reviewed). F1000Research, 6(1900). https://doi.org/10.7490/f1000research.1115013.1
23See, for example, the GoFAIR Initiative, https://www.go-fair.org/go-fair-initiative
24Zervas, P., Manouselis, N., Karampiperis, P., Hologne, O., Janssen, S., & Keizer, J. (2018). E-ROSA D3.7: Foresight roadmap paper. Zenodo, 17 October. https://zenodo.org/record/1479659#.XMaPFegzZPY
25https://www.nature.com/articles/sdata201618#author-information
26EC (European Commission). (2018). European Open Science Cloud (EOSC). https://ec.europa.eu/research/openscience/index.cfm?pg=open-science-cloud
28https://www.rd-alliance.org/groups/agriculture-data-interest-group-igad.html
29See, for example, the Open Data Journal for Agricultural Research at http://library.wur.nl/ojs/index.php/odjar/ and AgTrials at http://www.agtrials.org/
30Allemang, D. & Teegarden, B. (2017). A global data ecosystem for agriculture and food (version 1; not peer reviewed). F1000Research, 6(1844). https://doi.org/10.7490/f1000research.1114971.1
31Pesce, V., Tennison, J., Mey, L., Jonquet, C., Toulet, A., Aubin, S., & Zervas, P. (2018). A map of agri-food data standards (version 1; not peer reviewed). F1000Research, 7(177). https://doi.org/10.7490/f1000research.1115260.1
32https://vest.agrisemantics.org/about/vest-agroportal
33Jonquet, C., Toulet, A., Arnaud, E., Aubin, S., Dzalé Yeumo, E., Emonet, V., Graybeal, J., Laporte, M.A., Musen, M.A., Pesce, V., & Larmande, P. (2018). AgroPortal: A vocabulary and ontology repository for agronomy. Computers and Electronics in Agriculture, 144, 126–143. https://doi.org/10.1016/j.compag.2017.10.012
34https://www.rd-alliance.org/groups/agriculture-data-interest-group-igad.html
35https://agrisemantics.org/GACS/
36http://www.godan.info/godan-action
37Obama, B. (2013). Executive Order: Making open and machine readable the new default for government information. The White House, 9 May. https://obamawhitehouse.archives.gov/the-press-office/2013/05/09/executive-order-making-open-and-machine-readable-new-default-government-
39HM Government of the United Kingdom. (2012). Open data: Unleashing the potential. White Paper. London: HM Government UK. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/78946/CM8353_acc.pdf
40Defra (Department for Environment, Food and Rural Affairs, UK). (2017). About Defra’s data programme. Gov.UK/DefraDigital. https://webarchive.nationalarchives.gov.uk/20180801164029oe_/https://defradigital.blog.gov.uk/about-defras-data-programme/
41https://open-data.europa.eu/
42GODAN (Global Open Data for Agriculture and Nutrition). (2017). Statement of the ministers: Building resilience on food security and nutrition through open data. Ministerial Conference on Agriculture and Nutrition Data, 14–16 June 2017, Nairobi, Kenya. http://assets.aims.fao.org.s3-eu-west-1.amazonaws.com/public/posts/attachments/Conference%20Statement%20-%20Open%20Data%20-%20GODAN%20-%20JUne%202017.pdf
43Banzet, A. (2017). #CAFDO2017: The first Francophone African Conference on Open Data and Open Government. Open Government Partnership, 15 June. https://www.opengovpartnership.org/stories/cafdo2017-first-francophone-african-conference-on-open-data-and-open-government
44Smith, F. & Jellema, A. (2016). Introducing the Agriculture Open Data Package. BETA version. Wallingford: Global Open Data for Agriculture and Nutrition. http://www.godan.info/sites/default/files/GODAN_Agriculture_Open_Data_Package_BETA_1.pdf
45Jellema, A., Musker, R., Smith, F., Brandusescu, A., & Davies, O. (2018). Open up guide for agriculture. https://openupguideforag.info/
46Smith, F., Fawcett, J., & Musker, R. (2017). Donor open data policy and practice: An analysis of five agriculture programmes (version 1; not peer reviewed). F1000Research, 6(1900). https://doi.org/10.7490/f1000research.1115013.1
47InterAction. (2017). Initiative for Open Ag Funding. https://interaction.org/project/open-ag-funding/overview
48https://iatistandard.org/en/
49Baarbé, J. & De Beer, J. (2017). A data commons for food security. Open AIR Working Paper 7(17). SSRN, 1 August. https://dx.doi.org/10.2139/ssrn.3008736
50EIP-AGRI (European Innovation Partnership for Agricultural Productivity and Sustainability). (2016). Data revolution: Emerging new data-driven business models in the agri-food sector. Seminar Report. https://ec.europa.eu/eip/agriculture/sites/agri-eip/files/eip-agri_seminar_data_revolution_final_report_2016_en.pdf
51https://copa-cogeca.eu/img/user/files/EU%20CODE/EU_Code_2018_web_version.pdf
52Ferris, L. & Rahman, Z. (2017). Responsible data in agriculture (version 1; not peer reviewed). F1000Research, 6(1306). https://doi.org/10.7490/f1000research.1114555.1
54Davies, T. (2015). Data, openness, community ownership and the commons. Tim’s Blog, 2 September. http://www.timdavies.org.uk/2015/09/02/openness-community-ownership-and-the-commons/
55De Beer, J. (2017). Ownership of open data: Governance options for agriculture and nutrition (version 1; not peer reviewed). F1000Research, 6(1002). https://doi.org/10.7490/f1000research.1114298.1
56Cosgrove, E. (2017). Congress wades into farm data “ownership” debate. AgFunderNews, 17 July. https://agfundernews.com/congress-wades-farm-data-ownership-debate.html/
57Heath, R. (2017). Could farmer data “ownership” kill innovation? Australian Farm Institute, 13 June. http://www.farminstitute.org.au/ag-forum/could-farmer-data-ownership-kill-innovation
58ODI (Open Data Institute). (2015). The data spectrum. https://theodi.org/about-the-odi/the-data-spectrum/
59Open Data Charter. (2015). Principles. https://opendatacharter.net/principles/
60Maru, A., Berne, D., De Beer, J., Ballantyne, P., Pesce, V., Kalyesubula, S., Fourie, N., Addison, C., Collett, A., & Chaves, J. (2018). Digital and data-driven agriculture: Harnessing the power of data for smallholders (version 1; not peer reviewed). F1000Research, 7(525). https://doi.org/10.7490/f1000research.1115402.1
61Ferris, L. & Rahman, Z. (2017). Responsible data in agriculture (version 1; not peer reviewed). F1000Research, 6(1306). https://doi.org/10.7490/f1000research.1114555.1
62Ibid.
63http://www.godan.info/working-groups/capacity-development
64Musker, R. & Schaap, B. (2018). Global Open Data in Agriculture and Nutrition (GODAN) Initiative Partner Network Analysis. F1000Research, 7(47). http://dx.doi.org/10.12688/f1000research.13044.1
65CTA (Technical Centre for Agricultural and Rural Cooperation). (2018). Women need access to open data with potential application for agriculture. https://www.cta.int/en/article/women-need-access-to-open-data-with-potential-application-for-agriculture-sid0c13ea34e-27e6-40d5-895a-67d434b0f933
The availability of standardised and openly licensed corporate data “at source” from corporate registries is limited, but through intermediaries like OpenCorporates, significant open data can be accessed and reused.
Big strides have been made over the last decade in laying the technical and policy foundations for more open data on corporate structures, ownership, and control, but although progress has been made on the balance between openness and privacy in corporate data, there are still issues to resolve.
Evidence suggests open corporate data can be a key tool in improving risk management and holding the powerful to account, but progress may also bring increasing hostility to openness from some entities in the private sector.
A concerted effort will be needed in the coming years to build on the foundations laid to date in order to deliver a global, robust, and reliable supply of open data on corporate identity and ownership.
Basic corporate data is essential for understanding our world. The name of a company, its legal form, registration number, formation date, the identities of its directors, and the registry where this information is held are all fundamental to knowing who we are doing business with and who our employers are, as well as which entities should be taxed and in which jurisdiction. Access to that data over time allows us to assess the performance and structure of the economy as businesses form, merge, break apart, and fail. Another layer of analysis opens up if we move from simply identifying corporate entities to identifying their owners and those who ultimately control them, a concept referred to as beneficial ownership. The more jurisdictions that require corporate ownership data to be open, the easier it becomes to navigate through a myriad of shell companies, regardless of where they are located, to identify the actual owners.
Information provided as open corporate data is of interest to public, private, and civil society stakeholders in every jurisdiction. G20 leaders have discussed the need to use corporate data to improve financial stability and efficiency, to combat corruption, and to improve the exchange of tax information between jurisdictions.1 These same goals are reflected in the Sustainable Development Goals (SDGs). SDGs 16 and 17 address the need to reduce illicit financial flows, ensure the return of stolen assets, reduce corruption, promote investment, and develop capacities for domestic tax collection, all of which are supported by improving the availability and use of corporate data.2
Despite the relevance of information on corporates, progress on releasing it as open data has been slow. In the fourth edition of the Open Data Barometer, corporate data, identified as company register data, had the third lowest score of all the datasets surveyed, and only 5% of all company register data was available as open data.3 Nor has there been a significant shift over time to open these datasets. The first edition of the Open Data Barometer in 2013 found that only 3 of 74 company register datasets were available as open data; by the time of the fourth edition in 2017, only 6 of 109 datasets were open.4 Corporate registers still lack a universal data standard that could be used to make opening these datasets easier in the future.
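The limited change between editions is easier to see when the two counts are expressed as shares. The following is a minimal sketch that uses only the figures quoted above; the function name is illustrative and not part of any Open Data Barometer tooling.

```python
# Counts quoted in the text: (fully open datasets, datasets surveyed).
EDITIONS = {
    2013: (3, 74),    # first edition of the Open Data Barometer
    2017: (6, 109),   # fourth edition
}


def open_share(open_count: int, total: int) -> float:
    """Return the percentage of surveyed datasets that were fully open."""
    return 100 * open_count / total


for year, (open_count, total) in EDITIONS.items():
    share = open_share(open_count, total)
    print(f"{year}: {open_count} of {total} datasets open ({share:.1f}%)")
```

On these figures the share moves from roughly 4% to roughly 5.5%, an increase of about one and a half percentage points over four years.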
A more hopeful picture can be found from OpenCorporates, an aggregator that provides corporate data under an open licence. In 2018, OpenCorporates provided corporate data from 127 registries in 73 different countries; however, much of this comes from OpenCorporates’ own scraping work rather than from the release of open datasets at source. For the platform to move toward more comprehensive global coverage, more jurisdictions will need to open their company register data and remove the paywalls that limit access.
Although datasets containing basic corporate information were found in all but one of the jurisdictions assessed by the 2017 Open Data Barometer, albeit at many different levels of openness and machine-readability, data related to corporate ownership often did not exist at all.5 In the latest Financial Action Task Force (FATF) consolidated assessment, only 19 of 69 countries listed are compliant with transparency requirements for beneficial ownership, and these requirements mandate only that information on beneficial ownership is obtainable by competent authorities, not by the public.6,7 Very little open data on corporate ownership exists at the present time. Open registers on beneficial ownership are available for Denmark (which also has a register of legal ownership), the UK, and Ukraine, and a state contractors’ register is available in Slovakia. Furthermore, while the policy advances discussed below are likely to create more ownership registers in the future, these will not necessarily be open, free-to-access, or machine-readable.
Figure 1: The Open Data Barometer (ODB) uses ten different variables to assess the presence and openness of company register datasets using an expert survey. The fourth edition of the ODB found just six company register datasets that met the full open definition.
Source: https://opendatabarometer.org/?_year=2017&indicator=ODB
However, important advances have been made. These advances might be described as “infrastructural”, providing the technical components that support the dynamic and international nature of corporate information, as well as the legislative and civil society support that have made these technical advances possible.
The most fundamental requirement for open corporate data is the availability of identifiers that are unique, stable, and interoperable across jurisdictions and that are openly licensed. In this respect, a great deal of progress has been made. Two systems are operational. Thomson Reuters’ PermID assigns identifiers and offers them, plus basic company and officer information, under an open licence for over 3.4 million legal entities.8 The Global Legal Entity Identifier Foundation (GLEIF) takes a different approach, requiring legal entities to sign up for an identifier through a local operating unit. The LEI data they provide also contains basic company information and will soon also contain “Level Two” data on an entity’s accounting parent. Over 1.3 million LEIs have been issued to date; however, despite their universal applicability, identifier coverage is greater across the wealthiest countries, with almost 25% of the LEIs issued coming from the UK and USA alone (see Figure 2).9
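One practical property of such identifiers is that they can be checked before any lookup is attempted. The sketch below assumes the LEI format defined in ISO 17442, which is not described in this chapter: a 20-character alphanumeric code whose last two characters are check digits calculated with the ISO/IEC 7064 MOD 97-10 scheme. The function names are illustrative, and passing the check says only that a code is well formed, not that it has been issued to a real entity.

```python
def _to_digits(code: str) -> str:
    """Map 0-9 to themselves and A-Z to 10-35, as MOD 97-10 requires."""
    return "".join(str(int(ch, 36)) for ch in code)


def lei_check_digits(base18: str) -> str:
    """Compute the two check digits for an 18-character LEI base."""
    remainder = int(_to_digits(base18.upper() + "00")) % 97
    return f"{98 - remainder:02d}"


def is_valid_lei(lei: str) -> bool:
    """Return True if the string is a well-formed LEI with valid check digits."""
    lei = lei.strip().upper()
    if len(lei) != 20 or not lei.isascii() or not lei.isalnum():
        return False
    if not lei[18:].isdigit():
        return False
    # A well-formed LEI, read as one large number, leaves remainder 1 mod 97.
    return int(_to_digits(lei)) % 97 == 1


# Hypothetical 18-character base, used only to show the round trip.
base = "HYPOTHETICALBASE01"
candidate = base + lei_check_digits(base)
print(candidate, is_valid_lei(candidate))
```

A check of this kind catches most transcription errors when identifiers are keyed into procurement or aid datasets, which is part of what makes a shared identifier useful for joining records across jurisdictions.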
Another supporting development has been the emergence of lists and services to point publishers and data users toward the right sources of corporate identifiers. One example of this is org-id.guide, which has evolved from the organisation identifiers registry list maintained by the International Aid Transparency Initiative to help encourage the use of stable identifiers from existing registers for all types of legal entities across the world.10 Launched at the 2016 International Open Data Conference and developed to meet a commitment of the IODC Roadmap,11 it has been fostered by a collaboration of open data standards providers, but has not yet seen wide adoption. GLEIF also maintains a list of corporate registries with unique identifiers known as the Registration Authorities List that is used in LEIs.12 One danger here is that we face a proliferation of competing open standards to the point that tooling for crosswalks between identifiers will need to be part of the future landscape of open corporate data. GLEIF’s dataset linking LEIs to SWIFT’s Business Identifier Codes is a welcome step in this direction.13
Figure 2: Over 1.3 million LEIs have been issued, although the global distribution of identifiers, and the extent to which they are attached to verified data, varies by country.
Source: https://www.gleif.org/en/lei-data/global-lei-index/lei-statistics
The final technical component of progress on open corporate data is the emergence of universal data platforms. These services offer an advantage over national platforms in that corporate activity often crosses borders, so identifying corporates often requires searching multiple corporate registers maintained on national platforms. Reconciliation of company names and disambiguation of company officers is a significant value-added service that such platforms can provide. OpenCorporates is the best-established example, offering access to open corporate data via search and an API. More recently, OpenOwnership, funded by the UK’s Department for International Development, has sponsored the development of the Beneficial Ownership Data Standard and brought together beneficial ownership information from several existing national registers on its own platform, with plans to extend further and to allow self-submission by companies and individuals.14
An introduction to beneficial ownership
In the context of corporate ownership, “beneficial ownership” refers to the identification of the natural person or persons who benefit from, or control, legal entities, persons, or arrangements. Beneficial ownership can be achieved through such means as formal rights, like votes or dividend rights attached to shareholdings, or informal rights, like the ability to influence the direction of a company outside a formal ownership relationship. The identification of beneficial ownership involves looking through otherwise complex corporate ownership chains to find the “ultimate” beneficial owner regardless of how many shell companies or secrecy-based jurisdictions may stand in the way.
Beneficial ownership is also partly defined in the negative. While the “beneficiary” of an asset can be any legal person, natural person, or arrangement, a beneficial owner must be a natural person, because, regardless of how complex a corporate structure may be, control over it ultimately resolves to one or more natural persons. Beneficial ownership is also distinct from “legal ownership”, which refers only to ownership of legal title. For example, one natural person may legally own a company, while another natural person is the beneficial owner through a trust or nominee structure. Legal title is not necessary for beneficial ownership, because control may be exercised through informal means.15
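The idea of “looking through” a chain of legal entities to the natural persons behind it can be sketched as a walk over an ownership graph. The example below is a simplified illustration under stated assumptions: the entity names and shareholdings are hypothetical, ownership is represented only by formal shareholdings, and indirect stakes are multiplied along the chain. Real disclosure regimes also consider voting rights and informal control, which this sketch does not model.

```python
from collections import defaultdict

# Hypothetical records: owned entity -> list of (owner, share held).
OWNERSHIP = {
    "OpCo Ltd": [("HoldCo A", 0.6), ("Alice", 0.4)],
    "HoldCo A": [("Shell B", 1.0)],
    "Shell B": [("Bob", 0.5), ("Alice", 0.5)],
}
# Only natural persons can be beneficial owners.
NATURAL_PERSONS = {"Alice", "Bob"}


def beneficial_owners(entity, ownership, persons, share=1.0, path=()):
    """Return {person: effective share} by walking up the ownership chain."""
    totals = defaultdict(float)
    if entity in path:
        # Circular ownership: stop rather than loop forever.
        return totals
    for owner, fraction in ownership.get(entity, []):
        effective = share * fraction
        if owner in persons:
            totals[owner] += effective
        else:
            # The owner is another legal entity, so keep looking through it.
            nested = beneficial_owners(
                owner, ownership, persons, effective, path + (entity,)
            )
            for person, value in nested.items():
                totals[person] += value
    return totals


for person, stake in sorted(beneficial_owners("OpCo Ltd", OWNERSHIP, NATURAL_PERSONS).items()):
    print(f"{person}: {stake:.0%}")
```

In this toy structure, Alice’s direct 40% holding understates her position: once the two intermediate companies are resolved, her effective stake is 70% and Bob’s is 30%. If any entity in the chain sits in a jurisdiction without an accessible register, the walk stops there, which is why coverage across jurisdictions matters.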
The concept of beneficial ownership has its origins in trust law, but, beginning in the 1970s, it has become part of the lexicon of international tax, anti-money laundering, and illicit financial flows. More recently, beneficial ownership has moved out of these specific fields into the broader policy debate on corruption and transparency.16 Of particular importance is the emergence of the OpenOwnership project, which is focused on enabling the publication of open beneficial ownership data and creating its own global register of this data. The project has a steering group composed of civil society groups, such as the Open Contracting Partnership and OpenCorporates, and has built links with organisations like the Open Government Partnership and the Extractive Industries Transparency Initiative that have existing commitments to beneficial ownership transparency.
The foundational work for progress on opening data on corporate ownership has involved winning the policy and legislative argument that corporate ownership data should be made available. At a high level, these arguments are summarised in two communiqués from the G8 and G20 in 2013 around the themes of anti-corruption and open societies,17 which, in October of that year, led to the first public commitment to an open register of beneficial owners by the UK at an Open Government Partnership (OGP) conference in London.18
The motivation for opening up corporate data has three interlocking themes. First, the financial crisis ushered in a period of reform centred on the dangers of uncertain information and unknown actors in financial markets. This has been particularly important in the G20’s drive to establish the Global Legal Entity Identifier Foundation, which rested on linking time-consuming and uncoordinated practices for identifying counterparties to the potential for exposure to liabilities and consequent financial instability.19 Second, the use of anonymous corporate vehicles in corruption cases and other illicit financial flows was highlighted in the World Bank’s influential 2011 Puppet Masters report.20 Corporate anonymity has since been identified as a contributor to terrorist financing, to corruption, to the expropriation of shareholders, and to impeding development goals.21 Third, the technical demands of tax information sharing and anti-money laundering rules increased the demand for interchangeable data on natural persons and legal entities.22 Together, these three themes have been critical drivers of the beneficial ownership agenda.
The concrete outcome of these policy initiatives has been national- and regional-level legislation to mandate open registers of business data and the systematic collection of beneficial ownership data (to which access may still be restricted). While early adopters were single European nations, new regional leaders like Indonesia are starting to emerge in the Global South.23 In the European Union, the Anti-Money Laundering Directive (AMLD) obligates member states to create public central registers of beneficial ownership. In a major legislative advance in 2017, the AMLD was updated to include trusts and trust-like arrangements and to make that data accessible to those with a legitimate interest.24
Multilateral organisations have also drawn a wider range of entities into contact with the infrastructure for identifying and describing corporates and for publishing this information as open data. The Extractive Industries Transparency Initiative (EITI) 2016 Standard requires countries to publish roadmaps for beneficial ownership transparency in the extractives sector and recommends that beneficial ownership registers be public. As of March 2017, 21 countries had committed to establishing a public register.25 The Open Government Partnership also recommends robust registers of beneficial ownership as an intermediate commitment to open government, and recommends providing open access to machine-readable data from these registers as an advanced commitment.26
There is also a less visible layer of work related to corporate registers that is not yet yielding open datasets but still establishes concrete targets for advocacy work in this area. In many jurisdictions, such as Hong Kong, Singapore, Switzerland, Zambia, and others, governments have passed legislation to require closed beneficial ownership registers to comply with anti-money laundering standards.27 Similarly, the implementation of the Markets in Financial Instruments Directive II (MiFID II) requires trusts wishing to trade in financial instruments to have a Legal Entity Identifier as of 3 January 2018,28 so that prior to the implementation of the revised AMLD in the EU, we will get signals about which trusts in the region are economically active and another identifier that can be incorporated into central registers.
Case study: OpenCorporates
OpenCorporates was founded in 2010 and has received funding from the Alfred P. Sloan Foundation, ODINE, and EU Horizon 2020. OpenCorporates collects corporate data from public open data sources, official APIs, and scraping, and transforms it all into a standard form. Covering 127 registries, OpenCorporates offers a search engine, access to free and paid APIs, corporate data, company gazettes, and LEIs. The site also offers curated datasets on out-of-state corporations in the US. In 2017, OpenCorporates took the Quebec corporate registry to court after receiving a takedown notice for information originally accessed from that registry. The case indicates how far the movement for open corporate data has come, the role that OpenCorporates has played in it, and how much work is still left to be done.
Even at this relatively early stage, the opening up of corporate data has had a notable impact. There is a substantial market for data on corporate entities, and startups in this ecosystem have both built new business models based on available open data sources and challenged information providers. Significant businesses that have emerged in this space include OpenCorporates, providing standardised corporate and financial data; DueDil, offering credit checking and anti-money laundering checks; Arachnys, offering automated and manual tools for due diligence; and Calcbench, offering standardised accounts data for US companies.29 Encouragingly, many of these businesses support the rhetoric around the expected impacts of open corporate data by serving to reduce the compliance costs for businesses, making decision-making about investment and risk more transparent, giving minority shareholders visibility on who controls legal entities, and allowing diagnostics for individual businesses and whole sectors of the economy through the use of detailed data. There is still an obvious absence of businesses from the Global South; however, the coverage of OpenCorporates and Arachnys, in particular, is also deliberately global and constrained only by data availability. This is an encouraging sign that businesses built on open corporate data can be supply-led rather than demand-led and that we will see a more diverse customer profile emerge as a result.
Corporate data has been fertile ground for civil society, with the origins of the movement closely tied to transparency goals and now closely aligned with the Sustainable Development Goals. The NGO Global Witness has used corporate data to investigate corruption in Myanmar’s jade industry (see box) and money-laundering associated with Panamanian companies and the Trump Ocean Club.30 Investigations that combine leaked and official data simultaneously illustrate both the promise of joining up data and the difficulties of investigating ownership when so many legal entities are registered in jurisdictions that support secret registration for corporations. As another example, Transparency International, when investigating foreign ownership of properties in London, combined information from the International Consortium of Investigative Journalists’ Offshore Leaks Database and official sources; however, despite combining sources, the report was unable to find information on 46% of the companies concerned.31 When open data sources are available, the potential for civil society to investigate and exert pressure is clear. For example, court documents from a Brazilian case linked to the Odebrecht bribery scandal enabled journalists in Scotland to both examine the role of Scottish Limited Partnerships (SLPs) as a possible money-laundering vehicle and highlight the lack of compliance with disclosure rules in the UK’s open register.32 The UK has since signalled reforms on SLPs.33 Similarly, campaigners have also demonstrated the potential impact of tying data standards together. In Malaysia, the Sinar Project’s Telus website, for example, will use the Open Contracting Data Standard, Popolo, and the Beneficial Ownership Data Standard to link legal entities and natural persons in public procurement disclosures as a way of finding corruption.34
Case Study: Global Witness and the Jade Industry
The Global Witness investigation of the jade industry in Myanmar involved unstructured source data that was turned into structured data by OpenCorporates. This process also meant that full access to the dataset was preserved even as director and shareholder details were being scrubbed from the original source.35 Global Witness was then able to see links within the jade industry using official data and to disambiguate legal entities and natural persons with the support of on-the-ground interviews. Together, these techniques allowed for precise documentation on how important figures from government, the military, and the narcotics trade were heavily involved in the jade industry. As this example suggests, the investigative use of corporate data often requires anchoring in a particular external context. This might, as in Myanmar, provide information on the local significance of patterns of corporate ownership or, as with large-scale leaks like the Panama Papers, point investigators toward possible wrongdoing.
A significant challenge that is likely to become more acute over time is the quality assurance of data held in open registers, especially when legal entities self-submit and information is verified on a risk-assessment basis. In these situations, the information provided by honest actors is of variable quality, and the information provided by dishonest actors is often hard to disentangle. This problem is particularly acute when new compliance regimes are introduced and submitters’ understanding of their statutory duties is limited. A report from Global Witness has found that the UK’s person of significant control (PSC) register, which is unverified and has some unvalidated data fields, has serious data quality issues.36 Verified data, or data submitted through a corporate service provider, is likely to be high quality, but imposes higher costs on corporates. The poor quality of open registries, and the high cost of verified alternatives, was used as an argument against open and verified corporate registries in a 2011 report by the Stolen Asset Recovery Initiative.37 The same argument has resurfaced more recently in the context of registers of beneficial ownership.38 Proponents of open corporate data will need to be wary of poor data creating negative feedback around corporate transparency. A possible solution is to adopt the LEI approach, where local operating units validate information for a small fee and accuracy remains high.39 There will also need to be a push for verified data, using traceable processes by authorised persons to reduce the opportunity for plausible deniability when false information is entered into a register.40 To guard against honest mistakes, registries may also be able to improve data quality through automated error detection, better guidance to submitters, and improved data validation.
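The kind of automated error detection and data validation mentioned above can be very simple. The following sketch checks a self-submitted ownership entry for missing fields and an implausible year of birth. The field names and the 120-year age limit are assumptions made for illustration; they are not taken from the PSC register or any other real schema.

```python
from datetime import date
from typing import List, Optional

# Hypothetical required fields for a self-submitted ownership entry.
REQUIRED_FIELDS = ("name", "birth_year", "nationality", "company_number")
MAX_PLAUSIBLE_AGE = 120  # assumed cut-off, chosen for illustration only


def validate_entry(entry: dict, today: Optional[date] = None) -> List[str]:
    """Return a list of human-readable problems found in one entry."""
    today = today or date.today()
    problems = []

    # Flag fields that are absent or blank.
    for field in REQUIRED_FIELDS:
        value = entry.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing {field}")

    # Flag years of birth that cannot be right.
    year = entry.get("birth_year")
    if isinstance(year, int):
        age = today.year - year
        if age < 0:
            problems.append("birth_year is in the future")
        elif age > MAX_PLAUSIBLE_AGE:
            problems.append("birth_year implies an implausible age")
    elif year is not None and str(year).strip():
        problems.append("birth_year is not a whole number")

    return problems


entry = {"name": "", "birth_year": 1200, "nationality": "GB", "company_number": "01234567"}
print(validate_entry(entry, today=date(2019, 1, 1)))
```

Checks like these address honest mistakes at the point of submission. They do not verify that the information is true, so they complement rather than replace the verification processes discussed above.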
Beyond data quality, the other major challenge to improving open data on corporate identity and ownership may be the potential negative reaction from some corners to increases in transparency, primarily because arguments around privacy and data protection are currently unresolved. While the European court has ruled that there is no “right to be forgotten” for natural persons in company registers, legislation has not removed ambiguities around how long data can be stored, who has the right to access it, and how far it should go to identify individuals.41 While campaigners have made cogent arguments against privacy for beneficial owners, the argument is likely to re-emerge as ownership transparency is tied into legal reforms associated with corporations and to the fundamental rights of individuals. Given that extracting value from corporate ownership data involves making sometimes uncertain connections between datasets held in different jurisdictions, practitioners and civil society will need to balance the arguments for transparency with the need for coordination and harm prevention for individuals exempt from disclosure and for individuals who, while not exempt, may nonetheless be exposed to harm by the joining up of datasets.
One more challenge arising from success will be the need to encourage positive uses for ownership data, while discouraging detrimental or adversarial uses. There is a significant risk that some combination of poor data quality, complex or non-interoperable data, and a lack of capacity will lead to registers being created but not used. In this regard, it is useful to have sector-specific guidance (e.g. NRGI’s work on extractives) or targets for success or failure (e.g. increasing revenue collection in the Global South).42 While positive uses of ownership data will need to be cultivated, adversarial uses of registers are likely to increase unsupervised. Dishonest actors will become familiar with the rules and find it easier to skirt disclosure requirements, and as long as secrecy-based jurisdictions still exist, they will also have the option to simply close legal entities and open new ones not subject to scrutiny.43 This could itself be seen as a sign of success, but it will be crucial to be able to measure such behavioural changes.
Much progress on open data related to corporate identity and ownership has been made. While it might appear that the amount of open corporate data is still relatively limited, much of the infrastructure required for success, such as policy, legislation, and technical architecture, is now in place or being developed. Moreover, progress to date has involved a broad base of stakeholders from government, multilateral institutions, civil society, and the private sector. Great advances have been made in creating a shared understanding that we need to be able to unambiguously identify legal entities using universal identifiers, which has led to an emerging ecosystem of links to complex corporate data. A more recent development is that the policy argument in support of beneficial ownership seems to have been won. We have seen early examples of registers providing ownership data as open data, as well as the legislative basis for many more to come in the near future, mostly as the result of a civil society movement committed to opening up asset ownership as part of economic transparency. However, the withdrawal of the US from the EITI on the basis that it is a “burden for business” demonstrates the fragility of the achievements to date and the difficulty of maintaining corporate disclosure as a desirable goal.44 For open corporate data to fulfil its promise for the SDGs, any momentum that has been created in Europe and North America will need to be recreated across the Global South.
FATF & Egmont Group. (2018). Concealment of beneficial ownership. Paris: Financial Action Task Force. https://www.fatf-gafi.org/media/fatf/documents/reports/FATF-Egmont-Concealment-beneficial-ownership.pdf
Gray, J. & Davies, T. (2015). Fighting phantom firms in the UK: From opening up datasets to reshaping data infrastructures? SSRN. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2610937
Knobel, A., Meinzer, M., & Harari, M. (2017). What should be included in corporate registries? A data checklist part 1: Beneficial ownership information, 2017. SSRN. https://ssrn.com/abstract=2953972
Prichard, W. (2018). Linking beneficial ownership transparency to improved tax revenue collection in developing countries. Summary Brief 15. Brighton, UK: International Centre for Tax and Development. http://opendocs.ids.ac.uk/opendocs/handle/123456789/13753
Westenberg, E. & Sayne, A. (2018). Beneficial ownership screening: Practical measures to reduce corruption risks in extractives licensing. Briefing May 2018. New York, NY: Natural Resource Governance Institute. https://resourcegovernance.org/sites/default/files/documents/beneficial-ownership-screening_0.pdf
About the author
Jack Lord is a Data and Policy Analyst at Open Data Services Co-operative. He is co-chair of the Beneficial Ownership Data Standard working group and works with OpenOwnership to provide technical assistance to countries implementing beneficial ownership transparency. Follow Jack at https://www.twitter.com/jacklord and follow his work at https://www.opendatanotes.com.
How to cite this chapter
Lord, J. (2019). Corporate Ownership. In T. Davies, S. Walker, M. Rubinstein, & F. Perini (Eds.), The state of open data: Histories and horizons (pp. 51–64). Cape Town and Ottawa: African Minds and International Development Research Centre. http://stateofopendata.od4d.net.
This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. It was carried out with the aid of a grant from the International Development Research Centre, Ottawa, Canada.
1G20. (2013). Communiqué: Meeting of finance ministers and central bank governors. http://www.g20.utoronto.ca/2013/2013-0720-finance.html
2UN (United Nations). (2015). Transforming our world: The 2030 Agenda for Sustainable Development. https://sustainabledevelopment.un.org/post2015/transformingourworld
3Web Foundation. (2017). Open Data Barometer – Global report. 4th edition. Washington, DC: World Wide Web Foundation. https://opendatabarometer.org/doc/4thEdition/ODB-4thEdition-GlobalReport.pdf
4Davies, T. (2013). Open Data Barometer: 2013 global report. Washington, DC: World Wide Web Foundation. http://opendatabarometer.org/doc/1stEdition/Open-Data-Barometer-2013-Global-Report.pdf
5Web Foundation. (2017). Open Data Barometer – Global report. 4th edition. Washington, DC: World Wide Web Foundation. https://opendatabarometer.org/doc/4thEdition/ODB-4thEdition-GlobalReport.pdf
6FATF (Financial Action Task Force). (2018). FATF methodology for assessing compliance with the FATF recommendations and the effectiveness of AML/CFT systems. http://www.fatf-gafi.org/publications/mutualevaluations/documents/fatf-methodology.html
7FATF (Financial Action Task Force). (2018). Consolidated assessment ratings. http://www.fatf-gafi.org/publications/mutualevaluations/documents/assessment-ratings.html
8Thomson Reuters. (2018). Open PermID: Entity search. https://developers.thomsonreuters.com/open-permid/open-permid-entity-search-restful-api
9GLEIF (Global Legal Entity Identifier Foundation). (2018). Global LEI Index: LEI statistics. https://www.gleif.org/en/lei-data/global-lei-index/lei-statistics
11IDRC. (2015). Enabling the data revolution: An international open data roadmap. Conference report. Ottawa: International Development Research Centre. http://1a9vrva76sx19qtvg1ddvt6f.wpengine.netdnacdn.com/wp-content/uploads/2015/09/IODC2015-Final-Report-web.pdf
12GLEIF (Global Legal Entity Identifier Foundation). (n.d.). GLEIF registration authorities list. https://www.gleif.org/en/about-lei/gleif-registration-authorities-list
13Wolf, S. (2018). GLEIF and SWIFT introduce the first open source BIC-to-LEI relationship file to allow for interoperability across multiple ID platforms. GLEIF Blog, 8 February. https://www.gleif.org/en/newsroom/blog/gleif-and-swift-introduce-the-first-open-source-bic-to-lei-relationship-file-to-allow-for-interoperability-across-multiple-id-platforms
14OpenOwnership. (2018). About the project. https://openownership.org/about/
15FATF & Egmont Group. (2018). Concealment of beneficial ownership. Paris: Financial Action Task Force. https://www.fatf-gafi.org/media/fatf/documents/reports/FATF-Egmont-Concealment-beneficial-ownership.pdf
16Gray, J. & Davies, T. (2015). Fighting phantom firms in the UK: From opening up datasets to reshaping data infrastructures? SSRN. https://papers.ssrn.com/abstract=2610937
17Lough Erne G8 Leaders’ Communiqué, 18 June 2013. https://www.gov.uk/government/publications/2013-lough-erne-g8-leaders-communique; G20 Meeting of Finance Ministers and Central Bank Governors, Moscow, 20 July 2013. http://www.g20.utoronto.ca/2013/2013-0720-finance.html
18The background to this commitment is described in Gray, J. & Davies, T. (2015). Fighting phantom firms in the UK: From opening up datasets to reshaping data infrastructures? SSRN. https://papers.ssrn.com/abstract=2610937
19Couillault, B., Mizuguchi, J., & Reed, M. (2017). Collective action: Toward solving a vexing problem to build a global infrastructure for financial information. https://www.fsa.go.jp/common/conference/danwa/20170202.pdf
20Halter, E.M., Harrison, R.M., Park, J.W., Sharman, J.C., & Van der Does de Willebois, E.J.M. (2011). The puppet masters: How the corrupt use legal structures to hide stolen assets and what to do about it. Washington, DC: World Bank. http://documents.worldbank.org/curated/en/784961468152973030/The-puppet-masters-how-the-corrupt-use-legal-structures-to-hide-stolen-assets-and-what-to-do-about-it
21Global Witness. (2013). Poverty, corruption and anonymous companies: How hidden company ownership fuels corruption and hinders the fight against poverty. London: Global Witness. https://www.globalwitness.org/en/archive/anonymous-companies-global-witness-briefing/; O’Donovan, J., Wagner, H.F., & Zeume, S. (2016). The value of offshore secrets: Evidence from the Panama Papers. SSRN. https://papers.ssrn.com/abstract=2771095; Ohlbaum, D. (2013). Terrorism, Inc.: How shell companies aid terrorism, crime, and corruption. New York, NY: Open Society Foundations. https://www.opensocietyfoundations.org/sites/default/files/Terrorism%20INC%20Final%2010-24-13%20FINAL.pdf
22FATF. (2017). Guidance on transparency and beneficial ownership. October 2014. Paris: Financial Action Task Force. http://www.fatf-gafi.org/publications/fatfrecommendations/documents/transparency-and-beneficial-ownership.html; OECD. (2017). Standard for automatic exchange of financial account information in tax matters. 2nd edition. Paris: Organisation of Economic Co-operation and Development Publishing. http://dx.doi.org/10.1787/9789264267992-en
23SSEK Indonesian Legal Consultants – Suwana, A.S. & Erlan, B.B. (2018). Mandatory disclosure of beneficial owners in Indonesia. https://www.lexology.com/library/detail.aspx?g=57148873-1811-4daf-852f-d207dd474313
24EUR-Lex. (2017). Proposal for a Directive of the European Parliament and of the Council amending Directive (EU) 2015/849 on the prevention of the use of the financial system for the purposes of money laundering or terrorist financing and amending Directive 2009/101/EC – Analysis of the final compromise text with a view to agreement ST 15849 2017 INIT - 2016/0208 (COD). http://eur-lex.europa.eu/legal-content/EN/TXT/?qid=1514903266840&uri=CONSIL:ST_15849_2017_INIT
25EITI. (2015). The EITI Standard 2016. Oslo: Extractive Industries Transparency Initiative. https://eiti.org/sites/default/files/documents/the_eiti_standard_2016_-_english.pdf; EITI. (2017). The EITI Board Approved the Recommendations of the Implementation Committee Related to Beneficial Ownership. 36th Board meeting in Bogota, Colombia. 9 March 2017. https://eiti.org/BD/2017-31
26Open Government Partnership. (n.d.). Beneficial ownership, illustrative commitments. https://www.opengovpartnership.org/theme/beneficial-owners
27Baker McKenzie. (2017). Survey of beneficial ownership disclosure in Hong Kong, Singapore, Switzerland and the UK. 5 December. https://www.bakermckenzie.com/en/insight/publications/2017/12/survey-of-beneficial-ownership
28ESMA (European Securities and Markets Authority). (2017). Legal Entity Identifier. Briefing Note 9, October 2017. https://www.esma.europa.eu/document/legal-entity-identifier-briefing-note
29EUR-Lex. (2017). Proposal for a Directive of the European Parliament and of the Council amending Directive (EU) 2015/849 on the prevention of the use of the financial system for the purposes of money laundering or terrorist financing and amending Directive 2009/101/EC – Analysis of the final compromise text with a view to agreement ST 15849 2017 INIT – 2016/0208 (COD). http://eur-lex.europa.eu/legal-content/EN/TXT/?qid=1514903266840&uri=CONSIL:ST_15849_2017_INIT
30Global Witness. (2015). Jade: A Global Witness investigation into Myanmar’s “big state secret”. https://www.globalwitness.org/jade-story/; Global Witness. (2017). Narco-a-Lago: Money laundering at the Trump Ocean Club, Panama. November 2017. https://www.globalwitness.org/en/campaigns/corruption-and-money-laundering/narco-a-lago-panama/
31Transparency International. (2016). London property: A top destination for money launderers. http://www.transparency.org.uk/publications/london-property-tr-ti-uk/; ICIJ (International Consortium of Investigative Journalists). (2018). Offshore leaks database. https://offshoreleaks.icij.org/
32Leask, D. (2018). Scots shell firms play key role in Latin America’s bribery “mega scandal”. The Herald Scotland, 3 February. http://www.heraldscotland.com/news/15917473.Scots_shell_firms_play_key_role_in_global_web_of_bribery/
33Settle, M. (2018). Theresa May set to ban secretive Scottish shell companies to halt flow of dirty Russian money. The Herald Scotland, 22 March. http://www.heraldscotland.com/news/16103825.Banned__Scotland_s_secret_tax_havens_for_Putin_cronies/
34Yusof, K. (2017). Telus (Transparency). 5 November. https://docs.google.com/presentation/d/1nVO30WSfdHXcvyDMUU80pmazoXCpcqH6nVFaJZARk6w; Sinar/Telus. (2017). Telus: Joined up data transparency project for PEPs, OCDS & beneficial ownership. GitHub [Post]. https://github.com/Sinar/telus
35OpenCorporates. (2015). How open company data was used to uncover the powerful elite benefiting from Myanmar’s multi-billion dollar jade industry. White Paper. Medium [Blog post], 27 October. https://medium.com/opencorporates/how-open-company-data-was-used-to-uncover-the-powerful-elite-benefiting-from-myanmar-s-multi-1ef35f88d6bd
36Global Witness & Open Ownership. (2017). Learning the lessons from the UK’s public beneficial ownership register. 23 October. https://www.globalwitness.org/en/campaigns/corruption-and-money-laundering/learning-lessons-uks-public-beneficial-ownership-register/
37StAR (Stolen Asset Recovery Initiative). (2011). Barriers to asset recovery. Washington, DC: World Bank. https://star.worldbank.org/sites/star/files/Barriers%20to%20Asset%20Recovery.pdf
38Comments by Pascal Saint-Amans, Director of the OECD’s Centre for Tax Policy in 2016, in Rumney, E. (2016). Time not right for public registers of beneficial ownership, says OECD tax chief. Public Finance International, 15 June. http://www.publicfinanceinternational.org/news/2016/06/time-not-right-public-registers-beneficial-ownership-says-oecd-tax-chief; Forstater, M. (2017). Beneficial openness? Weighing the costs and benefits of financial transparency. CMI Working Paper 3. Bergen, Norway: Christian Michelson Institute. https://www.cmi.no/publications/6201-beneficial-openness
39Less than 0.01% of LEI records failed an accuracy check in the GLEIF data quality report, see GLEIF (Global Legal Entity Identifier Foundation). (2018). Global LEI data quality report. https://www.gleif.org/en/leidata/gleif-data-quality-management/about-the-data-quality-reports/download-data-quality-reports/download-global-lei-data-quality-report-february-2018
40Sztykowski, Z. & Taggart, C. (2017). What we really mean when we talk about verification: Authentication and authorization (Part 2 of 4). Open Ownership [News post], 26 September. https://openownership.org/news/what-we-really-mean-when-we-talk-about-verification-authentication-and-authorization-part-2-of-4/
41Judgment of the Court (Second Chamber) of 9 March 2017, Camera di Commercio, Industria, Artigianato e Agricoltura di Lecce vs Salvatore Manni, No. ECLI:EU:C:2017:197; Fowler, N. (2016). Beneficial ownership and disclosure of trusts: Challenging the privacy arguments. Tax Justice Network [Blog post], 7 December. http://www.taxjustice.net/2016/12/07/beneficial-ownership-disclosure-trusts-challenging-privacy-arguments/
42Westenberg, E. & Sayne, A. (2018). Beneficial ownership screening: Practical measures to reduce corruption risks in extractives licensing. Briefing 15 May. New York, NY: Natural Resource Governance Institute. https://resourcegovernance.org/sites/default/files/documents/beneficial-ownership-screening_0.pdf; Prichard, W. (2018). Linking beneficial ownership transparency to improved tax revenue collection in developing countries. ICTD Summary Brief 15. Brighton, UK: International Centre for Tax and Development. http://opendocs.ids.ac.uk/opendocs/handle/123456789/13753
43When Scottish Limited Partnerships were brought into the UK’s beneficial ownership disclosure regime, the number of registrations fell from 5 215 in 2016 to 2 823 in 2017, see Bellingcat Investigation Team. (2018). Scottish Limited Partnerships: Scottish in name only. Bellingcat, 2 March. https://www.bellingcat.com/news/uk-and-europe/2018/03/02/scottish-limited-partnerships-scottish-name/
44EITI Secretariat. (2017). EITI Chair statement on United States withdrawal from the EITI. Extractive Industries Transparency Initiative [News post], 2 November. https://eiti.org/news/eiti-chair-statement-on-united-states-withdrawal-from-eiti
Some of the earliest open data experiments revolved around crime data, driven by public and journalistic interest in local crime; however, the open data community around crime and justice remains one of the least developed.
Open data work in the crime and justice domain faces particular challenges related to privacy, legacy systems, and interoperability, and often involves working with some of the most conservative institutions.
Donors and international organisations have increasingly recognised the potential links between open data and the crime and justice sector, but there are many cultural and coordination barriers to be overcome.
A strong and sustainable judicial open data ecosystem has the potential to create more transparent and accountable judicial institutions and to improve the quality and effectiveness of judicial public policy, leading to greater access to justice and safer environments for all.
In May 2005, a month before the official Google Maps application programming interface (API) was released, ChicagoCrime.org went live: a pioneering experiment that took crime data from the Chicago Police Department and presented it on an interactive map. Not only did this inspire a plethora of diverse mapping mash-ups,1 but it also sparked many other data-driven crime maps and acted as a key reference point for early open data arguments.2 Yet despite this strong beginning, crime and justice probably remains one of the least developed sectors for open data. Beyond the administrative data that police forces publish, usually at the local level to enable crime incident mapping, the release of open data from judiciaries or other entities within the justice system remains rare, and governments are generally reluctant to open up crime and justice data to the public. According to the most recent edition of the Open Data Barometer, only 17% of surveyed governments had, by 2017, made any crime data available to the public as open data.3
There is, however, a growing awareness of “open justice” as a public good and of the need to apply open data principles to enhance transparency, accountability, and citizen participation in the activities of the government agencies dealing with crime and justice matters. Evidence of this movement toward “open justice” can be seen in the growing number of judicial commitments that member countries of the Open Government Partnership (OGP) include in their National Action Plans. In 2011, only two of the 170 commitments delivered by member states related to the judiciary, whereas 63 were made between 2015 and 2017 (16 in 2015, 25 in 2016, and 22 in 2017). Of the total of 100 justice-related commitments delivered within the OGP system since 2011, 24 are based on the use of open data (see box, Examples of OGP justice-related commitments based on the use of open data).
The importance of open crime and justice data derives from the need to reinforce transparency and accountability. Crime and justice institutions have historically been seen as aloof and detached from social influence; however, the actions they take and the decisions they make should be treated no differently from those of other public institutions with regard to the need for transparency and constant public scrutiny.4
Government is often divided into three branches: legislative, executive, and judicial. Open government data programmes have predominantly focused on the executive, which has responsibility for the delivery of government services and the implementation of legislation, leading to the release of crime data and crime mapping from police institutions or ministries. However, there has been much less focus on the availability of open data from the judicial branch of government.
This must be addressed as a core issue of democracy, as the justice system should be a citizen-centred public service in which decisions are made by civil servants who are entrusted with the task of observing the law but are in no way above it.5 Greater levels of open data from the judiciary can help the system become not only more transparent and accountable, but also more efficient. Open data should inform judicial public policy. While, at present, policies are often designed from the top down and can result in poor quality services, the use of open data to build sound judicial policies through data analysis and citizen engagement will provide more efficient judicial services. Jimenez-Gomez describes the worldwide state of the art of “open justice” initiatives more broadly and identifies the use of open data as a core element that should be taken into account to enhance the accountability of the courts.6
There are many elements that go into open justice, often involving the innovative use of technology in the crime and justice space. This chapter is intended to be an examination of the evolution of open data specifically pertaining to crime and justice, and is not intended to include the additional analysis of policies related to civic participation or technology other than those related to open data.
Examples of OGP justice-related commitments based on the use of open data
France (2015), National Action Plan 1, Commitment 12: Open Legal Resources7
This commitment received a starred rating after evaluation by the OGP’s Independent Reporting Mechanism due to its potentially transformative impact. It includes further developing the provision of legal and legislative resources as reusable open data and deepening citizen participation in developing innovative services and open source tools to facilitate the understanding and preparation of legislative texts, as well as in the drafting (avant-projet de loi) of the Digital Bill.
Spain (2017), National Action Plan 3, Commitment 4.1: Open Justice in Spain8
This current commitment focuses on advancing open data as an instrument for achieving openness in Spain’s judicial branch. It seeks to promote the citizen’s right to access judicial information, including the initial steps required to transform the existing model of judicial statistics into a new system based on open data with improved characteristics regarding the quality of data, its collection, and management.
The most common sources of open data in this domain are the government agencies responsible for delivering services and implementing policies related to crime and justice. Hence, the vast majority of open data initiatives are national projects driven by institutions of the executive branch and the judiciary, with a few carried out by international organisations such as the European Union (EU) or the United Nations Office on Drugs and Crime (UNODC). While a non-negligible amount of crime and justice data is collected by private sector institutions, such as law firms, the potential contribution of these sources is yet to be realised.
Bargh, Choenni, and Meijer identify “three typical challenges” for the implementation of open data in the judicial field, namely privacy, legacy, and interoperability, each of which should be taken into account in further development.9
Privacy relates to the required balance between the transparency of data and privacy for the real-life persons whose sensitive attributes, such as names, birth dates, crime types, or judgments, must be protected by removal or anonymisation.
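As an illustration of what removal or anonymisation can involve in practice, the following is a minimal sketch rather than a recommended procedure. It assumes a single record with hypothetical field names (`name`, `birth_date`, `crime_type`, `judgment`), replaces the name with a keyed hash so that records about the same person remain linkable, and reduces the birth date to a year. Pseudonymisation of this kind lowers, but does not eliminate, the risk of re-identification.

```python
import hashlib
import hmac

# Hypothetical field names; real case-management exports will differ.
DIRECT_IDENTIFIERS = {"name", "birth_date"}


def pseudonymise(value: str, secret_key: bytes) -> str:
    """Return a stable keyed hash so records about one person stay linkable."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def anonymise_record(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers; keep only a pseudonym and the birth year."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["person_id"] = pseudonymise(record["name"], secret_key)
    # Assumes ISO-formatted dates (YYYY-MM-DD).
    cleaned["birth_year"] = record["birth_date"][:4]
    return cleaned


record = {
    "name": "Jane Doe",
    "birth_date": "1980-05-17",
    "crime_type": "theft",
    "judgment": "acquitted",
}
public = anonymise_record(record, secret_key=b"keep-this-key-private")
```

In this sketch the crime type and judgment are still published, but detached from the name and exact birth date. Whether that is sufficient depends on how easily the remaining attributes could be combined to single out an individual.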
Legacy refers to the very nature of legal data and its semantic evolution over time, caused by continuous changes to rules and regulations. New crimes under the law need to be codified, and old crimes can be renamed or redefined. Managing legacy data also becomes a challenge when government reform initiatives involve switching to new IT landscapes, which requires large amounts of accumulated data, much of it historically stored on paper, to be migrated to newer electronic systems. In order to make this data open and reusable, the importance of effective independent management of legacy systems should not be underestimated.
Finally, the challenge of interoperability alludes to the necessity of ensuring that different sets of data, gathered by a large number of different agencies, be collected, stored, and then released using standardised criteria and processes, allowing the data to be integrated and combined with data from external sources. The justice system also needs to advance the use of unique identifiers that would make it easier to connect data across institutions and avoid redundancies in data collection between multiple partners across the judiciary who may be recording the same information.
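The role of unique identifiers can be sketched with a small example. It assumes, purely for illustration, that a police dataset and a court dataset both record a shared `case_id`; the field names and values are hypothetical and do not come from any real system.

```python
def link_by_identifier(police_rows, court_rows, key="case_id"):
    """Join two lists of records on a shared unique identifier."""
    court_index = {row[key]: row for row in court_rows}
    linked = []
    for row in police_rows:
        match = court_index.get(row[key])
        if match is not None:
            # Merge the two views of the same case into one record.
            linked.append({**row, **match})
    return linked


police = [
    {"case_id": "2017-000123", "reported_offence": "burglary"},
    {"case_id": "2017-000456", "reported_offence": "fraud"},
]
courts = [
    {"case_id": "2017-000123", "ruling": "convicted"},
]
linked = link_by_identifier(police, courts)
```

Without a shared identifier, the same join would have to rely on names, dates, or free-text descriptions, which is both error-prone and harder to reconcile with the privacy requirements described above.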
The structure of the judiciary within federal governments deserves a special mention as the existence of national and sub-national levels (involving different judicial systems within one country) requires complicated inter-institutional coordination, making synchronisation and interoperability especially difficult to accomplish. Collaboration between different branches of government also presents challenges to reforms driven by the openness agenda as the transversal interaction required is perceived, in some cases, as a threat to the separation of powers (i.e. judicial independence).
Political and cultural barriers are still common hurdles for the implementation of the open data agenda in the public sector. Those barriers tend to be even higher in the case of institutions dealing with crime and justice (law enforcement agencies, the judiciary), which are traditionally some of the most conservative and independent institutions. As Roberto Gargarella notes, this is based on a conception of impartiality that holds dear the idea of isolated reflection by an individual (or a small elite of individuals) as a requisite for making correct or unbiased decisions.10
Another challenge is created when police and justice institutions lag behind in terms of technological capacity (i.e. hardware, software, skills, and expertise). Universities and scientific agencies could, and should, play a key role in building capacity for the use of data and emerging technologies in the police and justice sector.
Additionally, the involvement of civil society is still very limited in this field. Few civil society networks have projects looking specifically at open data in the crime and justice domain, with even fewer being large enough to be known internationally. However, there are some emerging examples of civil society organisations (CSOs) working independently or in collaboration with government agencies, including Measures for Justice in the United States (US), OpenGiustizia in Italy, La Nación Data in Argentina, and the Justice Data Lab11 in the United Kingdom (UK).
Measures for Justice12 is a civil society initiative launched in 2011 that has developed a data-driven set of performance measures aimed at assessing and comparing different aspects of the criminal justice system in state jurisdictions of the US. The analysis, using data extracted from administrative case management systems, covers three main categories: fiscal responsibility, fair process, and public safety.
OpenGiustizia13 was a project focused on organisational innovation and optimisation for the Court and the Public Prosecutor of Napoli, developed by three Italian universities and financed by the EU’s Social Fund between 2007 and 2013. Among the project’s objectives was the creation of interoperability within the system’s databases and the provision of tools for accountability and performance evaluation.
La Nación Data14 is a data journalism initiative run since 2012 by one of the main newspapers of Argentina. It consists of a news portal and blog based on data collected from various sources. It makes intensive use of open crime and justice data, delivering content on themes such as femicide, high-profile judicial cases, and the penitentiary system.
Justice Data Lab15 is a service run by the Ministry of Justice of the UK and New Philanthropy Capital. Set up in 2013, it is aimed at organisations providing offender rehabilitation services. It uses administrative data on re-offenders to conduct on-demand impact evaluations, so that these organisations can assess the actual impact of their work through data-based evidence.
For most of these initiatives, data availability and interoperability remain a challenge. Measures for Justice, for example, covers just six US states at present. The next section will explore the different kinds of data that could or should be available on crime and justice, and the various actors involved in creating and using it.
The primary focus of opening data within the crime and justice sector is on three main categories of information:
1. Case data: information on judgments and court rulings issued by crime and justice institutions (e.g. courts, tribunals, etc.).
2. Jurisdictional data: performance and activity data from crime and justice agencies, such as statistical data related to cases, reported crimes, arrests, citizen complaints, etc.
3. Structural data: information on the internal characteristics of crime and justice agencies, such as their organisation, their internal processes, how they allocate their budget, infrastructure, rules of procedure, staff and salaries, procurement, etc.
Data produced or collected by institutions within the crime and justice system is generally made available in three main ways:
1. As primary data (i.e. unprocessed, as it was collected at the source) in downloadable datasets or files (CSV, XML, DOC, XLS, PDF).
2. As aggregated statistical data (i.e. data that has been processed and, if necessary, anonymised) in the form of downloadable files or datasets.
3. In aggregate form, but as graphical presentations, either in static visualisations of statistical data or through user-facing data visualisation and analysis tools.
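The difference between the first two forms of release can be illustrated with a short sketch. It assumes a handful of invented primary records with hypothetical `year` and `offence` fields and shows how they might be reduced to aggregated counts and written out as CSV.

```python
import csv
import io
from collections import Counter

# Invented primary records, one row per reported incident.
primary = [
    {"year": "2016", "offence": "theft"},
    {"year": "2016", "offence": "theft"},
    {"year": "2016", "offence": "fraud"},
    {"year": "2017", "offence": "theft"},
]


def aggregate(rows):
    """Count incidents per (year, offence) pair."""
    counts = Counter((r["year"], r["offence"]) for r in rows)
    return [
        {"year": year, "offence": offence, "count": n}
        for (year, offence), n in sorted(counts.items())
    ]


def to_csv(rows):
    """Serialise aggregated rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["year", "offence", "count"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


summary = aggregate(primary)
csv_text = to_csv(summary)
```

Publishing only `summary` protects individuals more readily than publishing `primary`, but it also limits the questions that re-users can ask of the data, which is one reason both forms appear among the initiatives discussed in this chapter.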
As the box below illustrates, while jurisdictional and structural data may originate with either the judicial or executive branches of government, case data tends to be solely within the province of the judiciary.
The crime and justice open data ecosystem
Using a data ecosystem mapping methodology,16 this diagram represents the crime and justice open data ecosystem as described in this chapter. Unbroken lines represent constant interaction between the actors involved, while dotted arrows represent a direct but non-continuous bond. Information producers (e.g. the judiciary, ministries of justice, the police) are often also active consumers and users of the information produced. Stakeholders, such as academia, CSOs, and data journalists, act as intermediaries, using raw data and transforming it into user-friendly information products for a broader range of users (citizens).17
The way in which open crime and justice data are combined and delivered to users can vary substantially. Table 1 gives 18 examples of data projects, although, for the purposes of this chapter, detailed analysis is restricted to four illustrative cases: OpenJustice (US), Data.police.uk (UK), Datos.jus.gob.ar (Argentina), and ECourts (India).
OpenJustice18 is an open data project of the Office of the Attorney General at the Department of Justice of California (US), which launched the criminal justice data portal in 2015. It currently delivers jurisdictional data from all law enforcement agencies across the State of California, covering crime, deaths in custody, hate crimes, homicides, juvenile court and probation, citizen complaints, and the use of force. Structural data on the portal includes lists of law enforcement and criminal justice personnel, as well as contextual data for each county (educational attainment, income, poverty, and unemployment levels). The data is made available as downloadable datasets and through visualisation tools.
Data.police.uk19 is an open data portal maintained by the Home Office of the United Kingdom that provides data about crime and policing in England, Wales, and Northern Ireland. It was launched in 201334 and delivers jurisdictional data on reported crimes and a wide range of police activity, including drug seizures, the issuance of firearms certificates, breath tests, and the setting up of cordons under the Terrorism Act. It also contains structural data on the police workforce, procurement, salaries, etc. Both primary and aggregated statistical data are available, and datasets are updated either quarterly or annually depending on the subject matter.
Table 1: Open crime and justice data projects around the world
Columns: INITIATIVES; TYPE OF DATA (Cases, Jurisdictional data, Structural data); FORMATS (Primary data (dataset), Aggregate data (dataset), Aggregate data (graphic)).
Initiative | Columns marked X, out of six (the placement of each mark is not preserved in this text)
OpenJustice (US) | 5
data.police.uk (UK) | 5
datos.jus.gob.ar (Argentina) | 5
ECourts (India) | 4
Measures for Justice20 (US) | 4
Mapa del Delito CABA21 (Argentina) | 4
Datos Abiertos del Poder Judicial de Costa Rica22 | 4
Data Portal Singapore’s Public Data23 | 4
Productivity Commission24 (Australia) | 4
Dados Abertos MPRS25 (Brazil) | 4
Judicial Department26 (Russia) | 4
Statistics Canada Crime and Justice27 | 4
The Judiciary28 (Liberia) | 3
data.unodc.org29 (UNODC) | 3
ISS Crime Hub30 (South Africa) | 3
Otvorené Súdy31 (Slovakia) | 3
Eur-lex32 (EU) | 2
De Rechtspraak33 (Netherlands) | 2
Datos.jus.gob.ar35 is the open data portal of the Ministry of Justice and Human Rights of Argentina, containing overall data on the country’s justice sector. The portal was launched in 2016 and offers data on a range of jurisdictional activities, such as the delivery of pre-judicial mediation and the provision of access to justice, as well as information on criminal policy, the prison system, and structural data on institutions of the judicial branch and the Ministry of Justice. Primary and aggregate data are available as downloadable datasets as well as via visualisation tools. The update frequency of datasets depends on the subject matter, ranging from daily or monthly to annually.
ECourts36 is a service provided by the Ministry of Law and Justice and the Supreme Court of India. Online since 2013, it contains real-time judicial data for all jurisdictions subject to the Indian judiciary. It aims to serve as a dynamic source of information on the judicial system. It is based upon a “National Judicial Data Grid”, which works as a nationwide data warehouse for case data and aggregated data delivered through visualisations.
Additional examples of open crime and justice data initiatives are highlighted in Table 1, classified according to the categories mentioned previously.37
Open crime and justice data is expected to play a key role in measuring and delivering progress in terms of social and economic development. Although the UNODC has been working on crime and justice statistics for many years, the United Nations 2030 Agenda for Sustainable Development places a fundamental importance on open data at all levels to promote accountability and inclusive decisions, to support reductions in crime and violence, and to improve access to justice for all over the next 11 years. Data will not only play a vital role in showcasing national progress toward the 169 global targets encompassed within the Sustainable Development Goals (SDGs), but will also allow decision-makers in all three branches of government to rely on quality information when designing evidence-based public policies to achieve those targets. Key SDGs for crime and justice data include SDG 16, aimed at the reduction of violence, the reduction of organised crime, the development of effective, accountable, and transparent institutions, and ensuring access to justice and public information, and SDG 5, which focuses on gender equality and the total elimination of violence against women and girls.
With regard to SDG 5, two specific examples of effective initiatives should be noted: the provision of primary open data on sexual offences by the Colombian government through their Open Data Portal38 and the specific section on gender issues of the Open Judicial Data Portal of the Ministry of Justice of Argentina, datos.jus.gob.ar,39 where primary data is made available on femicides, human trafficking, and assistance granted to victims of violence.
The potential of open crime and justice data with regard to the SDGs will probably have a crucial impact on the future allocation of resources and funding for associated projects and initiatives. International organisations, such as The Hague Institute for Innovation of Law or the Latin American Open Data Initiative (ILDA), are already orienting their funding priorities in this direction, as are other significant actors like the Open Data Institute, Transparency International, and the Open Society Foundations. Crime and justice open data is also increasingly on the agenda for key international organisations that are pushing for open government-oriented reforms in the public sector, including the International Development Research Centre, the OGP, and mySociety, among others.
An enabling environment is currently emerging for the spread of open data by and between crime and justice institutions. International organisations and governments have begun to consider crime and justice data as a raw material to use in implementing and evaluating public policies. At the same time, data journalists, academia, and CSOs are learning how to use open data to promote more transparent and accountable judicial institutions.
There are still, however, many barriers to the implementation of quality open data initiatives and specifically to the extended use of judicial open data. The main barriers are traditional cultural and political forces against openness within the judicial system, the lack of adequate financial and human resources invested in capacity building, and current inadequate or restrictive legal frameworks, including those that create a barrier to publishing data from judicial cases. Another hurdle to overcome is the difficult but necessary coordination of the various public institutions producing judicial data, as well as the lack of consistent standards around the production and publishing of judicial information and the relatively weak expertise of civil society and other actors in analysing the data.
Ultimately, two main actions are required in order to strengthen the judicial open data ecosystem. First, at the institutional level, governments should include the judicial sector within their access to information laws and open data policies and regulations. At the same time, justice institutions should commit to country-wide and transversal open data strategies. These strategies must take into account open judicial data intermediaries, such as academia and CSOs. It is recommended that one judicial institution in the country (e.g. the Ministry of Justice, the Supreme Court, or the Judicial Council) takes on the leadership role and coordinates the development and implementation of open data policies and plans. Additionally, judiciaries should set goals, targets, and indicators for justice delivery, and the resulting performance data should be available and evaluated through open data.
Second, concerning the use of open judicial data, each judicial system must establish a balance between publication and privacy protection. Privacy should not be used as an excuse for avoiding openness. Governments and international organisations should promote the use of open judicial data through different tools (curricula, hackathons, journalism, etc.) and public participation mechanisms should be put in place to assess and set priorities for the data release process. We also recommend setting up open judicial data portals to contain the totality of available data for each judicial system. Open judicial data should not only be provided in the format of open datasets, but also through visualisations and data stories in order to reach a wide variety of users. Advancing the interoperability of the data held in the systems of numerous institutions producing open judicial data is a must. Governments, working with leading international organisations, should promote the definition and adoption of specific standards for open judicial data. International organisations should also promote the creation of open judicial data networks and working groups, as well as the participation of open judicial data experts and leaders in relevant conferences and debates wherever they are taking place. With success in these endeavours, the next decade should see the ongoing development of a strong and sustainable judicial open data ecosystem that can enable more transparent and accountable judicial institutions while delivering more effective access to justice and safer environments for all.
Further reading
Elena, S. (2015). Open data for open justice: A case study of the judiciaries of Argentina, Brazil, Chile, Costa Rica, Mexico, Peru and Uruguay. Presented at Open Data Research Symposium, 27 May 2015, Ottawa, Canada. http://www.opendataresearch.org/dl/symposium2015/odrs2015-paper10.pdf
Elena, S. (2018). Justicia Abierta: Aportes para una agenda en construcción [Open Justice: Contributions for an agenda under construction]. Ediciones SAIJ. http://www.bibliotecadigital.gob.ar/items/show/1818
Huang, Y. (2017). Open data portal for Champaign racial and criminal justice: Towards greater transparency in policy making. Master’s Thesis, Urban Planning, University of Illinois. http://hdl.handle.net/2142/98538
Jiménez-Gómez, C.E. (2017). Hacia el Estado abierto: Justicia Abierta en America Latina y el Caribe [Towards the Open State: Open Justice in Latin America and the Caribbean]. In A. Naser, Á. Ramírez-Alujas, & D. Rosales (Eds.), Desde el gobierno abierto al Estado abierto en América Latina y el Caribe [From open government to the open state in Latin America and the Caribbean]. CEPAL. https://repositorio.cepal.org/handle/11362/41353
Jiménez-Gómez, C.E. & Gascó-Hernández, M. (2016). Achieving open justice through citizen participation and transparency. IGI Global. https://www.igi-global.com/book/achieving-open-justice-through-citizen/148515
Marković, M. & Gostojić, S. (2018). Open judicial data: A comparative analysis. Social Science Computer Review. https://journals.sagepub.com/doi/10.1177/0894439318770744
Sandra Elena is an open data and open government expert based in Buenos Aires. She currently coordinates the Open Justice Program, Ministry of Justice and Human Rights, which is implementing the first open data initiative in Argentina’s judiciary. You can follow Sandra at https://www.twitter.com/sandra_elena1 and learn more about her work at https://datos.jus.gob.ar.
How to cite this chapter
Elena, S. (2019). Open data, crime and justice. In T. Davies, S. Walker, M. Rubinstein, & F. Perini (Eds.), The state of open data: Histories and horizons (pp. 65–76). Cape Town and Ottawa: African Minds and International Development Research Centre. http://stateofopendata.od4d.net
This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. It was carried out with the aid of a grant from the International Development Research Centre, Ottawa, Canada.
1Holovaty, A. (2018). In memory of Chicagocrime.Org. Holovaty.Com, 31 January. http://www.holovaty.com/writing/chicagocrime.org-tribute/
2Huijboom, N. & Van den Broek, T. (2011). Open data: An international comparison of strategies. European Journal of EPractice, 12(1), 4–16. http://unpan1.un.org/intradoc/groups/public/documents/UN-DPADM/UNPAN046727.pdf
3Web Foundation. (2018). Open Data Barometer. 3rd edition. https://opendatabarometer.org/3rdedition/data/
4Montero, G. (2017). Del gobierno abierto al Estado abierto: la mirada del Centro Latinoamericano de Administración para el Desarrollo [From open government to the open state: The view of the Latin American Center for Administration for Development]. In A. Naser, A. Ramirez, & D. Rosales (Eds.), Desde el gobierno abierto al Estado abierto en América Latina y el Caribe [From open government to open state in Latin America and the Caribbean] (pp. 53–81). Santiago: Comisión Económica para América Latina y el Caribe [Economic Commission for Latin America and the Caribbean] (our translation). https://repositorio.cepal.org/handle/11362/41353
5Mora, L.P.M. (2006). Jueces y Reforma Judicial en Costa Rica [Judges and judicial reform in Costa Rica]. Revista de Ciencias Jurídicas, 109, 15–32. https://revistas.ucr.ac.cr/index.php/juridicas/article/view/9717 (our translation)
6Jiménez-Gómez, C.E. (2016). Open judiciary worldwide: Best practices and lessons learnt. In C.E. Jiménez-Gómez & M. Gascó-Hernández (Eds.), Achieving open justice through citizen participation and transparency (pp. 1–15). Hershey, PA: IGI Global. https://www.researchgate.net/publication/307598204_Open_Judiciary_Worldwide_Best_Practices_and_Lessons_Learnt
7https://www.opengovpartnership.org/starred-commitments/open-legal-resources
8https://www.opengovpartnership.org/documents/spain-action-plan-2017-2019
9Bargh, M.S., Choenni, S., & Meijer, R.F. (2016). Integrating semi-open data in a criminal judicial setting. In C.E. Jiménez-Gómez & M. Gascó-Hernández (Eds.), Achieving open justice through citizen participation and transparency (pp. 137–156). Hershey, PA: IGI Global. https://www.igi-global.com/chapter/integrating-semi-open-data-in-a-criminal-judicial-setting/162839
10Gargarella, R. (1996). La justicia frente al gobierno [Justice against the government] (our translation). Barcelona: Ariel.
11https://www.gov.uk/government/publications/justice-data-lab
12https://measuresforjustice.org/
13http://www.opengiustizia.it/
14https://www.lanacion.com.ar/data
15https://www.gov.uk/government/publications/justice-data-lab
16http://www.opendataresearch.org/emergingimpacts/methods.html
17Elena, S., Aquilino, N., & Riviére, A. (2014). Emerging impacts in open data in the judiciary branches in Argentina, Chile and Uruguay. Center for the Implementation of Public Policies Promoting Equity and Growth. http://www.opendataresearch.org/content/2014/658/emerging-impacts-open-data-judiciary-branches-argentina-chile-and-uruguay.html
18https://openjustice.doj.ca.gov/
20https://measuresforjustice.org/
21https://mapa.seguridadciudad.gob.ar/
22https://datosabiertospj.eastus.cloudapp.azure.com/
23https://data.gov.sg/dataset?organization=ministry-of-home-affairs-singapore-prison-service
24https://www.pc.gov.au/research/ongoing/report-on-government-services/2018/justice
25http://dados.mprs.mp.br/dados_abertos/
27https://www150.statcan.gc.ca/n1/en/subjects/crime_and_justice
30https://issafrica.org/crimehub
34Smith, A.M. & Heath, T. (2014). Police.uk and Data.police.uk: Developing open crime and justice data for the UK. JeDEM – EJournal of EDemocracy and Open Government, 6(1), 87–96. https://jedem.org/index.php/jedem/article/view/326/273
36http://ecourts.gov.in/ecourts_home/
37Information available at Open Data Portals as of 13 March 2018.
38https://www.datos.gov.co/Seguridad-y-Defensa/Delitos-Sexuales-2016/3j7m-zgyi/data
39http://datos.jus.gob.ar/pages/datos-de-justicia-con-perspectiva-de-genero
From the mid-2000s, organisations and individuals working in the field of development assistance and humanitarian action have identified significant gaps in the data sharing needed to support effective coordination of funding and operational work. Early adopters of open data, from 2008 onwards, have worked to fill these gaps and have continued to pioneer open data projects.
Availability and accessibility of open data have increased substantially, often outstripping the capacity of organisations to reliably use this data, and more work is needed to ensure that data sharing reflects the principles of data protection.
Greater investment is needed in joining up data and establishing common languages and standards for aid-related data. Open data approaches have a key role in breaking down silos between aid, budget, and demographic data.
Research must now move beyond qualitative case studies to rigorous testing of theories of change through quantitative longitudinal studies.
Bureaucracies like to “hug” data1 for many diverse reasons, and international development aid and humanitarian agencies are no exception. For decades, the complex aid regime has been plagued by information silos and technical, political, and cultural barriers to data sharing. This legacy presents a distinct challenge to effective global assistance in an era of unprecedented humanitarian crises and persistent poverty, especially in conflict-ridden states. Doing no harm and ensuring protection are key principles in development assistance and humanitarian action; therefore, ensuring data protection must also be a principle. Sharing data requires a delicate balance of effective coordination and protection of the most vulnerable. Organisations involved in aid and humanitarian action have limited funding allocated to upskilling staff and developing infrastructure. Gaps in technology and digital literacy are often barriers to building open data processes within the complex aid delivery structures.
According to the international transparency movement’s theory of change, open data is the key to unlocking the potential of international aid. Opening data related to development assistance and humanitarian action will improve donor coordination, improve the efficiency of humanitarian action, facilitate a faster response regarding relief assistance and development spending, better inform resource planning and management, and empower stakeholders and communities to push for greater participation.2,3,4,5,6,7,8,9 Open data, simply put, will make development aid and humanitarian action more accountable and effective. But how far have we come in realising this potential?
This chapter will provide a brief overview of the state of open data in the development and humanitarian space, focusing on data collected and published by development agencies, private philanthropic organisations, and humanitarian relief organisations. It will also supply a critical assessment of the progress and pitfalls in the global transparency movement. We find that there have been significant achievements in building consensus, standards, and technical platforms around open aid data. Yet, the supply of open data has not always matched the demand, nor has the open data revolution incited the expanded use of data in the area of international aid that may have been expected.
The key challenges lingering today involve the need to improve the quality and consistency of available data. This is difficult insofar as the data models, infrastructure, training, and business risk analysis/workflow for open data in aid and humanitarian action are often insufficiently funded. At the same time, we need to build broader awareness and expand the use of open data with the objective of building data literacy and improving (and proving) the impact of open data on decisions and outcomes. Likewise, we need to make open data accessible and useful to all stakeholders, while also addressing difficult issues, such as data privacy, protection, and responsible use. Finally, to sustain the momentum behind this data revolution, we need to garner greater evidence of impact to demonstrate the benefits of open data in the field of development and humanitarian assistance.
In the context of development assistance, the open data agenda has grown out of larger debates on aid accountability and effectiveness. Since the Second and Third High Level Forums on Aid Effectiveness in Paris in 2005 and Accra in 2008, several definitions and standards on aid transparency and open data have emerged, as well as numerous efforts to construct monitoring and verification systems around compliance with international agreements and transparency guarantees. At the Fourth High Level Forum on Aid Effectiveness in Busan, South Korea, in November 2011, most major donor countries and agencies, including many from the Global South, committed to reporting their aid information according to a common standard that combined three complementary systems: the Organisation for Economic Co-operation and Development (OECD) Development Assistance Committee (DAC) Creditor Reporting System (CRS++),10 the OECD DAC Forward Spending Survey (FSS),11 and the International Aid Transparency Initiative (IATI).12
The open data movement in international development has seen the development of a rich set of supranational initiatives,13 national-level policies, international non-governmental organisations (NGOs), and networks devoted specifically to the advocacy and production of transparent and open aid data. Today, the principles and goals of open data are embedded in the United Nations (UN) 2030 Sustainable Development Goals (SDGs). In 2014, the UN’s Independent Expert Advisory Group (IEAG) published A world that counts: Mobilising the data revolution for sustainable development.14 The report called for investments in new technologies and capacity building to improve the quantity and quality of data to address the inequalities in data access between countries and for donors to promote the use of data in decision-making, participation, and accountability.15 Similar commitments were made in the 2015 African Data Consensus,16 the 2016 G8 Open Data Charter,17 the Grand Bargain for the Global Humanitarian Agenda,18 and, more recently, the March 2018 UN Statistical Commission’s 49th Session on “Better Data, Better Lives”.19 The open data movement as it pertains to international development and humanitarian aid has shared a similar trajectory in terms of the evolution of influential policies and activities.
The growth and support of open data as applied to humanitarian action is often tied to large-scale humanitarian crisis events. This work often starts with determining workflows and best practices for sharing data that will not do harm, and the first data that needs to be shared is most often geospatial data. The Global Facility for Disaster Reduction and Recovery (GFDRR), created in 2006,20 has been instrumental in advocating and piloting open data for both resilience and disaster recovery, primarily through its OpenDRI initiative established in 2011.21 The GFDRR has connected key humanitarian actors with technical communities. Open data, including OpenStreetMap (OSM),22 has become more central to humanitarian action since its use during the response to the 2010 Haiti earthquake. By engaging volunteers, the global OSM community can quickly contribute essential geospatial data, such as location data on buildings and roads. Having the most up-to-date data can provide those involved in delivering humanitarian aid with the information needed to make strategic decisions. The UN Foundation-sponsored Disaster relief 2.0 report outlined the potential impact of this kind of information sharing. The GFDRR, the World Bank, the United Nations Office for the Coordination of Humanitarian Affairs (UN OCHA), and government agencies collaborated with the open data community during this response.23
Many other emergency response activities have included organised efforts of global open data advocates within the humanitarian network or within digital humanitarian networks like the Digital Humanitarian Network or CrisisMappers.24 The Humanitarian OpenStreetMap Team (HOT),25 founded in 2010, has worked to coordinate technology communities, mappers, and humanitarians to deliver geospatial data for both international aid and humanitarian action. Missing Maps, founded in 2014 by the American Red Cross, British Red Cross, Médecins Sans Frontières/Doctors Without Borders UK (United Kingdom), and HOT, promotes the use of open map data for humanitarian action from disaster responses to health programming.26 The UN OCHA’s establishment of the Humanitarian Data Exchange in 2014 builds on years of effort by multiple humanitarian groups to open data.27 UN Global Pulse, the United Nations International Children’s Emergency Fund (UNICEF), the World Food Programme, the United Nations High Commissioner for Refugees (UNHCR), and other UN agencies all work with open data. In the humanitarian space, the CrisisMappers Conference and the State of the Map events28 have convened businesses, technologists, researchers, open data enthusiasts, funders, and governments. Burgeoning support for open data has also been reinforced by the proliferation of work by civil society organisations (CSOs), NGOs, technologists, businesses, and researchers, much of which has been initiated as a result of global and regional events, including the annual International Open Data Conference,29 Open Data Day,30 and the Data for Development Festival.31
The International Aid Transparency Initiative (IATI)
IATI was launched in Accra, Ghana, in 2008 at the Third High Level Forum on Aid Effectiveness. IATI is a multi-stakeholder, voluntary initiative created to better capture timely, detailed, comparable information on aid from traditional multilateral and bilateral donors, new and emerging donors (such as the BRICS countries: Brazil, Russia, India, China, and South Africa), NGOs, and foundations.
IATI offers a common standard for reporting and promoting the principles of open aid by making all data publicly accessible, machine-readable, and downloadable for replication and integration with other datasets. It also makes a variety of aid information available, including data on forward spending and subnational activity locations. IATI is supported by a governing board, a technical secretariat, and a Members’ Assembly, and currently has over 600 publishers. In 2009, Publish What You Fund (PWYF) was created to monitor donor compliance with IATI and other aid transparency commitments through an annual Aid Transparency Index (see Figure 1).
Figure 1: Overview of the 2018 Aid Transparency Index
In the development aid space, key leaders in the open data movement include the Members’ Assembly of IATI, PWYF, Development Initiatives, the World Bank Open Aid Partnership and Mapping for Results team, Development Gateway, AidData, the International Development Research Centre, the Transparency and Accountability Initiative, Interaction, and the Open Data Research Network. These actors have been central to establishing the broad momentum for open aid and establishing the methodologies and platforms needed to provide open aid data within developing and emerging market economies (through country-owned aid information management systems, such as Development Gateway’s Aid Management Platforms),32 bilateral and multilateral aid donor dashboards,33 and international datasets (including the IATI registry, Development Initiatives’ Development Data Hub, and AidData’s project-level aid datasets).34
As discussed above, one clear success in the open aid data movement is the emergence of a broad consensus on the need to open data and to establish robust policies to ensure the provision of standardised aid data by development and humanitarian organisations, national governments, and supranational institutions. There has been considerable progress in developing the infrastructure, in particular the systems and standards needed to collect, store, and publish open data, such as the IATI XML standard and the Humanitarian Data Exchange.35 To reinforce the transparency movement, monitoring and rating systems have been established to oversee aid donor performance, including one aid-specific index, PWYF’s Aid Transparency Index, and others with a broader focus on open data, such as Open Data Watch’s Open Data Inventory, Open Knowledge Foundation’s Global Open Data Index, and the World Wide Web Foundation’s Open Data Barometer.
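To make the idea of machine-readable aid data more concrete, the following is a minimal sketch of how a consumer might read an IATI-style XML file and total the disbursements per activity. The XML fragment is a simplified, invented example modelled on the IATI activity format rather than a real publisher file, and the treatment of transaction type code "3" as a disbursement is an assumption that should be checked against the current IATI codelists before any real use.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment modelled on the IATI activity format.
# Identifiers, titles, and amounts are invented for illustration only.
SAMPLE = """
<iati-activities version="2.03">
  <iati-activity>
    <iati-identifier>XX-EXAMPLE-001</iati-identifier>
    <title><narrative>Example rural water project</narrative></title>
    <transaction>
      <transaction-type code="3"/>
      <value currency="USD">150000</value>
    </transaction>
    <transaction>
      <transaction-type code="3"/>
      <value currency="USD">50000</value>
    </transaction>
  </iati-activity>
</iati-activities>
"""

# Assumption: code "3" denotes a disbursement in the transaction-type codelist.
DISBURSEMENT_CODE = "3"


def summarise_activities(xml_text):
    """Return one summary dict per activity with its total disbursements."""
    root = ET.fromstring(xml_text)
    summary = []
    for activity in root.iter("iati-activity"):
        identifier = activity.findtext("iati-identifier", default="").strip()
        title = activity.findtext("title/narrative", default="").strip()
        total = 0.0
        for tx in activity.findall("transaction"):
            tx_type = tx.find("transaction-type")
            value = tx.find("value")
            if (
                tx_type is not None
                and tx_type.get("code") == DISBURSEMENT_CODE
                and value is not None
            ):
                total += float(value.text)
        summary.append({"id": identifier, "title": title, "disbursed": total})
    return summary


if __name__ == "__main__":
    for row in summarise_activities(SAMPLE):
        print(row)
```

Because every publisher uses the same element names, the same few lines can in principle be pointed at files from many different organisations, which is the practical benefit of a common reporting standard. Real files carry far more detail (currencies, dates, sectors, locations) that this sketch deliberately ignores.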
The Centre for Humanitarian Data and HDX
The UN OCHA’s Humanitarian Data Exchange (HDX) is an open data platform for sharing data across organisations and crises. Early HDX iterations included support from technology communities at hackathons leading up to the official HDX launch in 2014. HDX has a series of features, including organisation pages, country pages, and crisis pages. HDX also includes tools for automated charting based on the Humanitarian Exchange Language (HXL), a data standard based on using hashtags in spreadsheets.
HDX provides step-by-step guidance for sharing data while adhering to strict practices of organisational and individual accountability. All datasets are reviewed to ensure they do not include personally identifiable data. As of March 2018, there are over 6 500 datasets and hundreds of participating organisations sharing a wide range of open data, including assessment, geospatial, and population data.
There are HDX Labs in Dakar, Senegal and Nairobi, Kenya, and, building on its success to date, the UN OCHA launched the Centre for Humanitarian Data in The Hague, Netherlands, in late 2017 with a focus on four areas: data policy, data literacy, data services, and network engagement.
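The hashtag-in-spreadsheet approach of HXL mentioned above can be illustrated with a short sketch. It assumes a CSV file in which a row of hashtags sits directly beneath the human-readable headers; the sample rows, organisation names, and figures are invented, and the hashtags shown were chosen for illustration. The sketch uses only the Python standard library rather than any official HXL tooling.

```python
import csv
import io
from collections import defaultdict

# Hypothetical spreadsheet export: a header row, then a row of HXL-style
# hashtags, then data rows. All values are invented for illustration.
SAMPLE_CSV = """Province,Organisation,People reached
#adm1,#org,#reached
Coast,Org A,1200
Coast,Org B,300
Highlands,Org A,450
"""


def read_hxl(text):
    """Locate the hashtag row and return data rows keyed by hashtag."""
    rows = list(csv.reader(io.StringIO(text)))
    for index, row in enumerate(rows):
        cells = [cell.strip() for cell in row if cell.strip()]
        # The hashtag row is the first row whose non-empty cells all start with '#'.
        if cells and all(cell.startswith("#") for cell in cells):
            tags = [cell.strip() for cell in row]
            return [dict(zip(tags, data)) for data in rows[index + 1:] if any(data)]
    raise ValueError("no HXL hashtag row found")


def total_by(records, group_tag, value_tag):
    """Sum an integer column, grouped by another column, using hashtags as keys."""
    totals = defaultdict(int)
    for record in records:
        totals[record[group_tag]] += int(record[value_tag])
    return dict(totals)


if __name__ == "__main__":
    records = read_hxl(SAMPLE_CSV)
    print(total_by(records, "#adm1", "#reached"))
```

The point of the design is that the code keys on the hashtags, not on the header text, so two organisations can label their columns differently (or in different languages) and still have their spreadsheets processed by the same tools.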
More recently, the open aid data movement has introduced innovations to improve access to data in forms useful to stakeholders and decision-makers. This has produced platforms that enable interactive use and easily downloadable data. Perhaps more critically, the collection of data has taken on more inclusive approaches. From mapathons to hackathons, the entire data lifecycle has changed with new mobile and community engagement programmes. This improves the timeliness and usefulness of data and increases awareness and community buy-in. Simultaneously, there is growing attention to the need to “join up” open data across sectors (e.g. open aid data with open budget data) to increase its usefulness to key stakeholders.
Despite the progress described above, there remain numerous challenges to realising the promise of open data in international development and humanitarian action. There are four main issues: persistent problems in providing consistent, standardised data across a proliferating number of sites; concerns about privacy and data protection; a lack of organisational investment in technology; and the lack of clear evidence of the cost benefits and impact of open aid data.
One challenge facing open aid data is widespread inconsistency in how multilateral organisations report their data.36 While the IATI registry has been increasingly used by development agencies, reporting has been uneven across organisations and across key data points, especially disbursement and procurement data. Some multilateral organisations, such as the World Bank Group, provide more financial information on their websites, although not necessarily as open data. Other organisations, such as the OECD, United Nations Environment Programme (UNEP), International Organisation for Migration (IOM), and the International Monetary Fund (IMF), have been slow to release open financial data.
Likewise, there is often conflicting data across different open data systems. For example, in collecting and attempting to code data on aid projects in Nepal and Bangladesh, the Complex Emergencies and Political Stability in South Asia (CEPSA) team at the University of Texas collated all project documents, financial information, and geolocation data from Nepal’s Aid Management Platform, Bangladesh’s Ministry of Finance, IATI, AidData, OECD CRS++, and the websites of numerous donors, including the World Bank, Asian Development Bank, Japan, the United States (US), and the United Kingdom (UK). The CEPSA team found dramatically different totals on the number of projects and surprising gaps in the availability of activity-level data across the different sources, including project titles, funding amounts, and project data. The CEPSA team even found significant inconsistencies in the data coming from individual donor countries. For example, in attempting to assess patterns in US development assistance in Nepal and Bangladesh, there were discrepancies in the data provided by the US Congressional Greenbook, OECD CRS++, USAID Foreign Aid Tracker, and the US State Department Foreign Aid Dashboard.37
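The kind of cross-source comparison described above can be sketched in a few lines. The example below is not the CEPSA team’s method; it is a hypothetical illustration in which the source names, donor names, and totals are all invented, and the 5% tolerance is an arbitrary assumption. It simply flags donors whose reported totals differ across sources by more than the chosen tolerance.

```python
from collections import defaultdict

# Hypothetical reported totals: (source, donor, total in USD millions).
# Names and figures are invented; they do not describe any real dataset.
REPORTS = [
    ("source_a", "Donor X", 100.0),
    ("source_b", "Donor X", 140.0),
    ("source_c", "Donor X", 98.0),
    ("source_a", "Donor Y", 50.0),
    ("source_b", "Donor Y", 51.0),
]


def find_discrepancies(reports, tolerance=0.05):
    """Return donors whose totals differ across sources by more than `tolerance`.

    The spread is measured as (highest - lowest) / highest, an assumed and
    deliberately simple measure chosen for illustration.
    """
    by_donor = defaultdict(dict)
    for source, donor, amount in reports:
        by_donor[donor][source] = amount

    flagged = {}
    for donor, amounts in by_donor.items():
        low, high = min(amounts.values()), max(amounts.values())
        if high > 0 and (high - low) / high > tolerance:
            flagged[donor] = amounts
    return flagged


if __name__ == "__main__":
    for donor, amounts in find_discrepancies(REPORTS).items():
        print(donor, amounts)
```

In practice the harder work comes before this step: matching the same project or donor across systems that use different identifiers, reporting periods, and currencies. The sketch assumes that matching has already been done.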
A root cause of these inconsistencies may be the lack of common data sharing protocols. One key exception is in the health sector, where there are data sharing protocols for pandemic and epidemic emergencies available via the World Health Organization (WHO). There are also informal informational working groups in the humanitarian sector, as well as country-level donor sector working groups and donor coordinated forums in the development sector. However, efforts to “join up” data on a global level are nascent, including initiatives such as the Joined-Up Data Standards (JUDS) Project (closed in 2017)38,39 and the Global Partnership for Sustainable Development Data (GPSDD)40 working group on SDG Data Interoperability. Nonetheless, the irony is that the open data movement may be moving too fast as many data sources have yet to converge upon a common standard (with common fields) for collecting and reporting data.
Concerns about data privacy, protection, and responsible use are valid and persuasive reasons why some organisations have been reluctant to open and share data. Choosing which data to share, and for what purpose, is very complex for humanitarian organisations. Any discussion of data sharing needs to start with ensuring the protection of the most vulnerable communities. Coordination is key for delivering effective humanitarian responses guided by international humanitarian law and standards like Sphere.41 Information managers need to collaborate to determine data sharing workflows that adhere to data protection guidance and responsible data use while still improving coordination. This is complicated by the lack of business analysis of workflows that would better support incorporating open data practices into processes, procedures, and tools. Similarly, there needs to be more effort to reconcile open data with domestic and international privacy laws and protections (e.g. the European Union’s General Data Protection Regulation), and, therefore, a need for open data advocates to understand that humanitarians may not be able to share all data given the sensitive situations in which they work.
Addressing all of these issues requires a wholesale change in how development practitioners and humanitarians work, as well as the development and adoption of data protection and responsible use policies. The International Committee of the Red Cross (ICRC) has sought to develop such protocols in their Professional standards for protection work42 and Handbook on data protection.43 The United States Agency for International Development (USAID) has created guidance for its implementing partners under ADS Chapter 50844 and has implemented a research programme on responsible data, although results are not yet public.45 The Responsible data handbook also lays out principles for handling data privacy in development projects.46
Sharing and opening data requires tools, knowledge, and established workflows. International humanitarian and development organisations have funding structures focused on either rapid response or programmatic delivery. There is rarely sufficient investment in upgrading technology infrastructure and business workflows to prepare for all the potential changes noted in the Fourth industrial revolution.47 A data revolution needs a technology revolution first. Improved data opening and sharing is also related to upskilling organisations and individuals in these sectors. Data literacy is essential for improving advocacy and the use of open data everywhere and critically important in the area of development and humanitarian assistance. Investment in data literacy, operational changes, technology innovation, and back office workflows is rarely the priority given the pressing humanitarian needs.48 This funding gap inhibits the critical changes required to properly implement tools and workflows to better support open data.
While innovation in open data has been a top priority of many development agencies at the headquarters level, these innovations often fail to appeal to country office staff, limiting impact and implementation at local levels. For example, while publishing and using IATI has been a top priority of many agencies, country staff are often unaware of IATI and are occasionally resistant to its use, creating inconsistencies between data published locally and that published internationally. More broadly, research has shown that agency staff at the country level often rely more heavily on interpersonal relationships than on openly accessible data.49 These misalignments suggest a combination of factors:
International open data publishers often do not understand the needs of local users, leading to a top-down push for data use that results in country office fatigue and resistance.
Data published internationally often does not reflect local realities or it lacks the attributes (e.g. subnational locations and results data) needed to answer key questions on aid efficacy.
Country-level agency staff are often sceptical of the value of data generally, as local conversations and negotiations are seen as more effective means for gathering information.
Fostering data literacy requires a data culture of learning and sharing, meaning new approaches to leadership, sharing, and trust building with local stakeholders. Current systems and processes for knowledge exchange are often outdated.
Theories of change around open data still need to grapple with the necessary cultural change for data producers and data consumers. Work to advance open data should recognise that trust in, and the use of, open data can vary greatly across countries and sectors where data may be highly politicised and contested, and where the practice of evidence-based decision-making is not yet ingrained in policy-making.
An increased focus on enhancing the partnership between headquarters and country offices with the aim of tackling local challenges and improving the effective dissemination and uptake of open aid data is necessary. Examples of this include UNICEF’s partnership with Development Gateway, Development Initiatives’ work with partner governments and country offices to localise IATI data to solve priority challenges,50 and the Netherlands’ and the Department for International Development’s (DFID) engagement with suppliers and country offices to encourage disaggregated publication and use of IATI data.
There are a number of new studies that attempt to provide evidence on the impact of open data (see box, Building the evidence base: Studies of open aid data use and impact). Each of these studies seeks to fill the gap on what we know about the extent to which key stakeholders are actually aware of open data, as well as their willingness and ability to access and use these systems. Ultimately, with the realisation that an “if we build it, they will come” approach is simply not enough, attention has shifted from developing to testing the open aid data theory of change.
Building the evidence base: Studies of open aid data use and impact
While evidence of the longer-term impact remains sparse, there are several recent studies that have attempted to directly measure the levels of awareness, use, and outputs related to open aid data.
Studies by USAID (2015: Aid transparency country pilot assessment),51 Development Gateway (2016: Use of IATI in country systems),52 and Development Initiatives (2017: Reaching the potential of IATI data)53 have studied the awareness and use of IATI data globally and within specific countries, such as Zambia, Ghana, and Bangladesh. Similar studies have examined awareness and use of in-country aid information management systems, including in Nepal (with a 2014 study by Freedom Forum),54 in Sierra Leone (in a 2017 Oxfam study),55 and in Timor-Leste, Senegal, and Honduras (in a 2017 report from AidData).56
Fewer studies have attempted to measure the actual impact of open aid data on other variables, such as accounting in development finances, donor coordination, citizen empowerment, and development outcomes (with the exception of PWYF’s 2017 work in Benin and Tanzania,57 papers by GovLab in 2017,58 and Kotsadam et al. in 2018 in Nigeria59).
To date, evaluations related to open aid data have been largely qualitative and limited to nongeneralisable case studies. In many instances, these case studies reveal little awareness of open aid data systems and engagement with that data. As a step prior to measuring impact, research must first better understand the conditions that enable or constrain data awareness and use. Such conditions often boil down to simple capacity issues with respect to accessing and analysing data, which often require higher bandwidth, sufficient server capacity, and the availability of computers and smartphones. Access and use also require sufficient expertise to navigate data that is supplied in foreign languages (especially English) or complex programs (ArcGIS, XML formats, and dense CSV files). To understand awareness and use, research must also address the complex political economy around data ecosystems. This includes developing a sensitivity to the cultures of data production and sharing, the politics behind resource allocation, the delegation of authority for open data systems, the role of the media and data journalists in serving as intermediaries, and the historical relationships between governments, donors, and civil society groups.60
The past decade of the application of open data to development assistance and humanitarian action has provided critical lessons for moving forward. We offer four key recommendations to the international open data community on how to address the key challenges faced in making open data work in the delivery of development assistance and humanitarian action.
1. The release and use of open data faces organisational hurdles. This may include a lack of resources and infrastructure needed to ensure quality and timely data collection or a lack of a data culture that encourages data use. Data is only useful if it is seen by end users as central to information products, evidence, decisions, and knowledge sharing. Open data advocates need to ensure that the mechanisms designed to supply open data are informed by, and integrated into, organisational structures in ways that are consistent with local data cultures and existing capacities. A common language is needed to develop an understanding between data consumers and data producers. The key to success is understanding the culture and context, then building capacity and usage with early adopters. The talking points about “why open data matters” need to incorporate and acknowledge the barriers and aim for opportunities that show true impact.
2. More investment is needed to support joined-up data initiatives. The evidence we have to date suggests strongly that stakeholders want open data around aid and humanitarian assistance, but would find it more useful if such data was more effectively integrated across sectors, especially with respect to domestic budgets and essential demographic information. We need to break down silos and manage open data with a comprehensive, holistic approach.
3. Successfully addressing data privacy, protection, and responsible use will continue to be critical to the success of the open data movement. Setting minimum data standards is the starting point for data sharing. Improving education on the impact and value of data sharing while still adhering to data protection and responsible data use will require a constant balance. Open data and data sharing can occur if data-driven projects are built with privacy protection by design. Data controllers, data producers, and data consumers will need to plan and manage risks and benefits by incorporating proven practices into standard operating procedures.
4. The open data community, and the broader community of donors engaged in international development and humanitarian action, need to invest more in basic research on awareness, use, and impact. Investment in technology and business analysis will also aid the implementation of open data practices. To sustain momentum for open data, we need to rigorously test the theory of change and hypothesised effects on outcomes, such as aid accountability, effectiveness, donor coordination, improved budget management, and timely and inclusive decision-making in the allocation of scarce resources.61 These studies need to go beyond static, qualitative case studies to include more longitudinal studies that are capable of capturing the larger societal costs and benefits and the long-term impacts of open data.
Clare, A., Verhulst, S., & Young, A. (2016). Open aid in Sweden. Brooklyn, NY: GovLab. http://odimpact.org/files/case-study-sweden.pdf
Friends of Publish What You Fund. (2016). How can data revolutionize development?: Putting data at the center of US global development – an assessment of US foreign aid transparency. Washington, DC: Friends of Publish What You Fund. http://media.wix.com/ugd/9a0ffd_2ce18150803b48989905acabf9bb91d6.pdf
GovLab. (2016). The GovLab selected readings on data and humanitarian response. http://thegovlab.org/data-and-humanitarian-response/
Gutman, J. & Horton, C. (2015). Accessibility and effectiveness of donor disclosure policies: When disclosure clouds transparency. Washington, DC: Brookings Institution. https://www.brookings.edu/wp-content/uploads/2016/07/donor-disclosure-policies-gutman.pdf
About the authors
Catherine Weaver is an Associate Professor and Associate Dean of Students at the University of Texas at Austin (LBJ School of Public Affairs), and Co-Director of Innovations for Peace and Development (IPD). Learn more about IPD at https://www.ipdutexas.org. Follow Kate at https://www.twitter.com/kateweaverUT.
Josh Powell is the Deputy CEO at Development Gateway and has been involved with the International Aid Transparency Initiative (IATI) and open data since 2010. Follow Josh at https://www.twitter.com/joshuacpowell and learn more about Development Gateway at https://www.developmentgateway.org.
Heather Leson is Data Literacy Lead at the International Federation of Red Cross Red Crescent Societies (IFRC), is on the board of the OpenStreetMap Foundation, and is a member of the Humanitarian OpenStreetMap Team. Follow Heather at https://www.twitter.com/heatherleson and learn more about her work at https://www.ifrc.org.
How to cite this chapter
Weaver, C., Powell, J., & Leson, H. (2019). Open data and development assistance and humanitarian aid. In T. Davies, S. Walker, M. Rubinstein, & F. Perini (Eds.), The state of open data: Histories and horizons (pp. 77–90). Cape Town and Ottawa: African Minds and International Development Research Centre. http://stateofopendata.od4d.net
This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. It was carried out with the aid of a grant from the International Development Research Centre, Ottawa, Canada.
1Khokar, T. (2017). Hugs and databases: In memory of Hans Rosling. World Bank: The Data Blog, 13 February. https://blogs.worldbank.org/opendata/hugs-and-databases-memory-hans-rosling
2Florini, A. (Ed.). (2007). The right to know: Transparency for an open world. New York, NY: Columbia University Press.
3Fox, J. (2007). The uncertain relationship between transparency and accountability. Development in Practice, 17(4/5), 663–671. https://www.jstor.org/stable/25548267
4Collin, M., Zubairi, A., Nielson, D., & Barder, O. (2009). The costs and benefits of aid transparency. Wells, UK: Aidinfo/Development Initiatives. http://bit.ly/2ShVCfu
5PWYF. (2009). Why aid transparency matters, and the global movement for aid transparency. PWYF Briefing Paper 1. London: Publish What You Fund. http://www.publishwhatyoufund.org/wp-content/uploads/2017/01/Briefing-Paper-1-Why-Aid-Transparency-Matters.pdf
6Mulley, S. (2010). Donor aid: New frontiers in transparency and accountability. London: Transparency and Accountability Initiative. http://www.transparency-initiative.org/archive/wp-content/uploads/2011/05/donor_aid_final1.pdf
7Carothers, T. & Brechenmacher, S. (2014). Accountability, transparency, participation, and inclusion: A new development consensus? Washington, DC: Carnegie Endowment for International Peace. https://carnegieendowment.org/files/new_development_consensus.pdf
8Herrling, S. (2015). The business proposition of open aid data: Why every US agency should default to transparency. Publish What You Fund [Blog post], 30 June. https://web.archive.org/web/20150914193510/http://www.publishwhatyoufund.org/updates/by-country/us/business-proposition-open-aid-data-why-every-u-s-agency-should-default-transparency/
9Barder, O. (2016). Aid transparency: Are we nearly there? Center For Global Development [Blog post], 13 April. https://www.cgdev.org/blog/aid-transparency-are-we-nearly-there
10https://stats.oecd.org/Index.aspx?DataSetCode=CRS1
11https://stats.oecd.org/Index.aspx?DataSetCode=FSS
12http://www.aidtransparency.net/
13See, for example, the EU Aid Transparency Guarantee and the Global Partnership for Effective Development Cooperation.
14IEAG. (2014). A world that counts: Mobilising the data revolution for sustainable development. Independent Expert Advisory Group Secretariat. http://www.undatarevolution.org/report/
15Ibid., p. 6.
16ECA. (2015). Africa data consensus. Addis Ababa: Economic Commission for Africa. https://www.uneca.org/sites/default/files/PageAttachments/final_adc_-_english.pdf
17G8. (2013). Policy paper: G8 open data charter and technical annex. London: Government of the United Kingdom. https://www.gov.uk/government/publications/open-data-charter/g8-open-data-charter-and-technical-annex
18Grand Bargain Signatories. (2016). The Grand Bargain: A shared commitment to better serve people in need. New York, NY: Agenda for Humanity. https://www.agendaforhumanity.org/initiatives/3861
19https://unstats.un.org/unsd/statcom/49th-session/
22https://www.openstreetmap.org/
23Harvard Humanitarian Initiative. (2011). Disaster relief 2.0: The future of information sharing in humanitarian emergencies. Washington, DC and Berkshire, UK: UN Foundation & Vodafone Foundation Technology Partnership. https://hhi.harvard.edu/sites/default/files/publications/disaster-relief-2.0.pdf
24http://crisismappers.net/ and http://digitalhumanitarians.com/
26http://www.missingmaps.org/about/ and https://wiki.openstreetmap.org/wiki/Missing_Maps_Project
27https://data.humdata.org/faq
28https://wiki.openstreetmap.org/wiki/State_Of_The_Map
29https://www.opendatacon.org/
31http://www.data4sdgs.org/news/data-development-festival
32Mitchell, L. (2017). Systematically tracking the aid tracking systems. Medium: Leigh Mitchell’s Blog, 27 July. https://medium.com/@leighmitchell/tracking-the-tracking-systems-ddd3d6578fef
33See, for example, https://devtracker.dfid.gov.uk/, https://openaid.se/, http://openaid.um.dk/, https://explorer.usaid.gov/, https://open.unicef.org/, and https://open.undp.org/
34See https://www.iatiregistry.org/, http://data.devinit.org/, and http://aiddata.org/datasets
35http://iatistandard.org/203/schema/, https://digitalprinciples.org/, and https://data.humdata.org/
36https://openstate.github.io/multitest/
37GAO. (2016). Foreign assistance: Actions needed to improve transparency and quality of data on Foreignassistance.gov. Washington, DC: United States Government Accountability Office. https://www.gao.gov/products/D14383
38http://juds.joinedupdata.org/
39Steele, L. & Orrell, T. (2017). The frontiers of data interoperability for sustainable development. London: Development Initiatives & Publish What You Fund. http://www.publishwhatyoufund.org/wp-content/uploads/2017/11/JUDS_Report_Web_061117.pdf
41http://www.sphereproject.org/
42ICRC. (2013). Professional standards for protection work. 2013 edition. Geneva: International Committee of the Red Cross. https://reliefweb.int/sites/reliefweb.int/files/resources/Professional%20standards%20for%20protection%20work%20carried%20out%20by%20humanitarian%20and%20human%20rights%20actors%20in%20armed%20conflict%20and%20other%20situations%20of%20violence.pdf
43Kuner, C. & Marelli, M. (Eds.). (2017). Handbook on data protection in humanitarian action. Geneva: International Committee of the Red Cross. https://shop.icrc.org/handbook-on-data-protection-in-humanitarian-action.html?___store=default
44USAID. (2014). USAID ADS chapter 508: Privacy program. Washington, DC: United States Agency for International Development. https://www.usaid.gov/sites/default/files/documents/1868/508.pdf
45USAID. (2018). An introduction to USAID’s work on responsible data. USAID & FHI360. Washington, DC: United States Agency for International Development. http://devinfo.digitaldevelopment.org/resources/introduction-usaids-work-responsible-data
46Responsible Data Forum. (2016). The hand-book of the modern development specialist. The Engine Room. https://the-engine-room.github.io/responsible-data-handbook/assets/pdf/responsible-data-handbook.pdf
47Schwab, K. (2017). The fourth industrial revolution. New York, NY: Currency.
48Obrecht, A. & Warner, A.T. (2016). More than just luck: Innovation in humanitarian action. An HIF/ALNAP Study. London: ALNAP & Open Data Institute. https://www.alnap.org/help-library/more-than-just-luck-innovation-in-humanitarian-action
49Custer, S. & Sethi, T. (Eds.). (2017). Avoiding data graveyards: Insights from data producers and consumers. Washington, DC: AidData at the College of William and Mary. https://www.aiddata.org/publications/avoiding-data-graveyards-insights-from-data-producers-users-in-three-countries
50https://www.developmentgateway.org/sites/default/files/2018-06/Learning%20from%20Using%20UNICEF%20IATI%20Data%20in%20Madagascar%20and%20Senegal%20AMPs_DetailReport.pdf
51USAID. (2015). Aid transparency country pilot assessment. Washington, DC: United States Agency for International Development.
52Cisse, H., Ferreyra, F., Irura, M., Musoni, F., Ngom, O., Powell, J., & Sanchez, V. (2016). Use of IATI in country systems: Final report. Washington, DC: Development Gateway. https://www.developmentgateway.org/sites/default/files/2017-02/IATI-UseinCountrySystems-FINAL.pdf
53Ntawiha, W. & Zellmann, C. (2017). Reaching the potential of IATI data. Bristol, UK: Development Initiatives. http://devinit.org/wp-content/uploads/2017/03/reaching-the-potential-of-IATI-data.pdf
54Sapkota, K. (2014). Exploring the emerging impacts of open aid data and budget data in Nepal. Kathmandu: Freedom Forum. http://www.opendataresearch.org/sites/default/files/publications/Open%20Aid%20and%20Budget%20Data%20in%20Nepal%20-%2015th%20Sept-print.pdf
55Grabowski, A. (2017). Transparency is more than dollars and cents: An examination of informational needs for aid spending in Sierra Leone and Liberia. Oxford: Oxfam. http://hdl.handle.net/10546/620330
56Custer, S. & Sethi, T. (Eds.). (2017). Avoiding data graveyards: Insights from data producers and consumers. Washington, DC: AidData at the College of William and Mary. https://www.aiddata.org/publications/avoiding-data-graveyards-insights-from-data-producers-users-in-three-countries
57PWYF. (2017). With publication comes responsibility: Using open data for accountability in Benin and Tanzania – A Discussion Paper. London: Publish What You Fund. http://www.publishwhatyoufund.org/wp-content/uploads/2017/09/With-Publication-Brings-Responsibility-A-discussion-paper.pdf
58Verhulst, S. & Young, A. (2016). Open data impact: When demand and supply meet. Brooklyn, NY: GovLab. http://odimpact.org/files/open-data-impact-key-findings.pdf
59Kotsadam, A., Østby, G., Rustad, S.A., Tollefsen, A.F., & Urdal, H. (2018). Development aid and infant mortality. Micro-level evidence from Nigeria. World Development, 105, 59–69.
60Custer, S. & Sethi, T. (Eds.). (2017). Avoiding data graveyards: Insights from data producers and consumers. Washington, DC: AidData at the College of William and Mary. https://www.aiddata.org/publications/avoiding-data-graveyards-insights-from-data-producers-users-in-three-countries
61Carolan, L. (2017). Mapping open data for accountability. Washington, DC: Transparency and Accountability Initiative and the Open Data Charter. http://www.transparency-initiative.org/wp-content/uploads/2017/06/taiodc_draft_data4accountabilityframework.pdf
Open data can help researchers and policy-makers understand the education landscape, provide information for parents and children about education facilities and their performance, and serve as a key element in the creation of open educational resources (OER).
Attention must move beyond the simple availability of data on education to also question how the data is contextualised, presented, and used to ensure it does not result in the reinforcement of pre-existing biases and social divides.
There has been relatively limited intersection to date between the open education and open data communities. There are opportunities for future strengthening of these links, increasing the use of open data as a key educational resource, and supporting more applied civic education.
According to United Nations (UN) Sustainable Development Goal (SDG) 4,1 states must “ensure inclusive and quality education for all and promote lifelong learning”. In this chapter, we consider the ways in which open data can support the achievement of this goal. In the education sector, open data released by governments and educational institutions, as well as by national and international organisations, can support a wide range of interventions, including strategies to improve the quality of education, the design of effective education policies, the creation of educational resources, and the development of the key literacies needed to operate and participate in today’s “datafied society”.2
The education ecosystem is made up of a complex network of systems and practices developed to address a wide range of sociopolitical and economic issues. Despite the enormous efforts made by countries to guarantee equal access to quality education, there are still challenges to overcome for which open data can provide insight, perspective, and a wide range of tools to further our understanding of core educational problems and to support the development of solutions. It has also been argued that open data can be used as part of a series of quality indicators to help people to make better decisions related to educational opportunities and methodologies and to choose among education providers. More overtly, open data used in the development of open educational resources (OER) can be considered a key tool in promoting citizenship and democratic values and developing the transversal literacies that citizens require in order to participate in a datafied society. Figure 1 indicates three main ways in which open data and the broader education sector intersect. These can also be understood in terms of how open data use intersects with the three main education stakeholder groups: policy-makers, parents and learners, and educators.
Figure 1: Open data in education
Source: Authors
In this chapter, we will explore both the opportunities and challenges that open data presents across the education sector, drawing upon examples from around the world, and wider critical arguments and studies related to open data.3 We are aware that while open data can promote public participation and social innovation, it can also reinforce pre-existing biases by connecting performance with the poor and vulnerable in an unfair manner, helping to further marginalise those who cannot choose where to live or study. The evidence we have gathered suggests that although impact to date has been mixed, there are many opportunities to substantially strengthen existing networks and activities around open data and education in the future.
Understanding the current state of education and identifying ways to improve it are vital tasks for policy-makers. Davies,4 Niemi,5 Burns and Köster,6 and the 2017 EU Eurydice Report7 argue that policy-makers need better access to evidence in order to address policy issues. Data describing achievement, attainment, enrolment, or the distribution of learning is important for determining whether educational systems are working or not. The United Nations Educational, Scientific and Cultural Organization (UNESCO)8 has indicated the need to ground policy on reliable evidence to ensure that educational policies are effective, efficient, and implementable. It argues for the use of comparable indicators and for ensuring that data is available disaggregated by gender, administrative area, geographical location, sociocultural groupings, education level, and type of provider to enable a comparison between the different groups and to identify those who are educationally disadvantaged.
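The kind of disaggregation described above can be sketched in a few lines of Python. This is a minimal illustration only: the records, the field names, and the choice of the overall mean as a benchmark are all hypothetical assumptions, not drawn from any real dataset or from UNESCO's own methodology.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical learner records; every value here is invented for illustration.
records = [
    {"gender": "F", "area": "rural", "score": 52},
    {"gender": "F", "area": "urban", "score": 68},
    {"gender": "M", "area": "rural", "score": 48},
    {"gender": "M", "area": "urban", "score": 72},
    {"gender": "F", "area": "rural", "score": 50},
    {"gender": "M", "area": "urban", "score": 70},
]


def disaggregate(rows, dimension, value="score"):
    """Return the mean of `value` for each category of `dimension`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[value])
    return {category: mean(values) for category, values in groups.items()}


def below_overall(rows, dimension, value="score"):
    """List the categories whose mean falls below the overall mean."""
    overall = mean(row[value] for row in rows)
    by_group = disaggregate(rows, dimension, value)
    return sorted(c for c, m in by_group.items() if m < overall)


by_area = disaggregate(records, "area")
flagged_areas = below_overall(records, "area")
flagged_genders = below_overall(records, "gender")
```

The same two functions work for any of the dimensions listed above, provided the published data carries the relevant attribute for each record. A single aggregate figure would hide the gap that the per-group means make visible.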
Motivans,9 in exploring data availability to monitor the SDGs, also calls for educational data that is relevant, valid, reliable, timely, punctual, clear, transparent, comparable, accessible, affordable, consistent, and with potential for disaggregation. There has been some progress on making this data available (and open), but major gaps remain. Notably, educational data from countries such as Kenya, South Africa, Ecuador, or Montenegro10 is scarce and neither widely nor openly available, making it difficult to assess their progress in relation to SDG 4.
While some states have had standardised testing since the 1950s, it is only in the last 20 years that standard national assessments have become the norm in Europe, and the majority of the world’s population still resides in countries without such testing.11 International initiatives have stepped in to fill the gap. The best-known example of performance data provided at the international level is the Organisation for Economic Co-operation and Development’s (OECD) Programme for International Student Assessment (PISA) test12 initiated in 2000, providing data about learner performance in science, mathematics, and reading. The results of this standard test, linked to sociodemographic data, enable comparative analysis regarding differences in performance among diverse groups of learners, taking into account gender, social background, migrant status, and ethnicity. In 2015, 72 countries participated in the PISA survey, generating data that is commonly used in evidence-based policy-making to help educational stakeholders to target specific problems guided by clear information. Individual (anonymous) student results from the study are published in downloadable structured data formats for common statistical software.
When open data is available as disaggregated data, a wide range of actors can get involved in its analysis. Academics are clearly major users of education-related data, but private consultancies and non-profit organisations have also taken advantage of available datasets. For example, in the United Kingdom (UK), the FFT Education Datalab13 was established by a non-profit education services company to help policy-makers improve educational practice. International organisations, such as the OECD, UNESCO, and the World Bank, make use of data (combined with qualitative research) to contribute to the international collection of policies, presentations, policy tools, and frameworks intended to support evidence-based policy-making. Van Schalkwyk (2017) has also drawn attention to the way in which institutions providing performance data (in particular, higher education institutions in South Africa14) take advantage of cross-institution comparisons for benchmarking and how making more granular information available as open data has provided “a new fuel for transformation”.15
However, when approaching educational data for research and policy purposes, there are at least two important considerations to keep in mind. First, the privacy of educators and learners must be protected when using or sharing data, particularly administrative and statistical data containing personally identifiable information. Surfacing and addressing patterns of educational disadvantage requires a careful balance because it is important that educational data can be disaggregated by gender, sociocultural background, educational level, and type of school. In the UK, controversy has emerged a number of times over the intrusiveness and level of data disclosure from the National Pupil Database.16
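One common way to manage this balance when publishing disaggregated counts is to withhold any cell that describes too few people. The sketch below is a hedged illustration of that idea rather than a method described in this chapter: the threshold of 5, the field names, and the pupil records are all hypothetical assumptions, and real publishers set their own disclosure rules.

```python
from collections import Counter

# Hypothetical threshold: counts below this are withheld from publication.
# The value 5 is an illustrative assumption, not a recommended standard.
MIN_CELL_SIZE = 5


def publishable_counts(rows, dimensions, min_cell=MIN_CELL_SIZE):
    """Count rows per combination of `dimensions`, suppressing small cells."""
    counts = Counter(tuple(row[d] for d in dimensions) for row in rows)
    return {
        cell: (n if n >= min_cell else None)  # None marks a suppressed cell
        for cell, n in counts.items()
    }


# Invented pupil records, used only to exercise the function.
pupils = (
    [{"school_type": "public", "gender": "F"}] * 12
    + [{"school_type": "public", "gender": "M"}] * 9
    + [{"school_type": "private", "gender": "F"}] * 2
    + [{"school_type": "private", "gender": "M"}] * 6
)

table = publishable_counts(pupils, ["school_type", "gender"])
```

Here the cell for girls in private schools is withheld because it covers only two pupils. Suppression of this kind is not sufficient on its own, since withheld values can sometimes be recovered from published totals, so it would normally sit alongside other safeguards.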
Second, it is important to consider the capacity to create and use data, not just its availability. In this area, one project to watch is the CapED initiative.17 This project, active in 25 of the least developed countries (LDCs), aims to connect national education policies with data sources, and to support states in their use of this data in the development of their national action plans to achieve SDG 4. As each national CapED project works with the UNESCO Institute for Statistics to implement a data component, there may be opportunities to further emphasise open data approaches.
When microdata cannot be disclosed, the design of indicators that describe the data landscape is also of crucial importance. At the national level, one example that demonstrates this is the Data Chile education indicators site18 that provides information from the National System of Performance Evaluation (SNED). SNED has been constructed using six indicators: school effectiveness, improvement, initiative, improvement of working conditions, equal opportunities, and the integration of teachers, parents, and guardians (see Figure 2). In an open data context, it is important to think about who gets involved in defining the indicators that will shape the sources of data that will be available in future.
Figure 2: National System of Performance Evaluation (SNED) data: Integration
Source: DataChile, https://es.datachile.io/geo/chile#education
In summary: demand is high for data across the education landscape, but supply varies. When open data is available, established policy-makers can be joined by new actors, including entrepreneurs and journalists, to debate and shape education performance and policy; however, even in the absence of globally comparable data or the use of that data by policy-makers, datasets on educational institutions can also drive change through parent and pupil behaviours.
In many countries, parents and/or pupils have some degree of choice over educational institutions. Statistics have long played a role in decisions related to the selection of learning products, programmes, and providers. With the availability of open data, a range of interactive platforms have emerged that use institutional or third-party assessment data to inform parents and learners, providing them with indicators and information they can use to make informed choices.19,20 The data made available about educational institutions tends to focus on performance (e.g. university ratings) by using standardised metrics, but also may provide detailed information on programmes and prerequisites.
The last decade has seen the launch of numerous portals around the world that provide the means to compare the quality of education at different institutions using data provided by national and local authorities. Some examples include the Identicole portal in Peru, MIME from the Ministry of Education in Chile, JedeSchule run by non-profit organisations in Germany, the mobile app-based Conozca su escuela in Costa Rica run by Programa Estado De La Nación, and Scholen Keuze and Scholen op de Kaart in the Netherlands.21
A number of platforms go beyond using data to encourage “shopping around” in the selection of schools. For example, Mejora tu escuela in Mexico,22 created by El Instituto Mexicano para la Competitividad (IMCO) with funding from the Omidyar Network, places an emphasis on gathering feedback from users of the platform and equipping them to advocate for improvements to their existing schools. In the UK, School Cuts,23 created with the backing of major teachers’ unions, places the emphasis on how funding cuts in education are impacting individual schools and was used as an advocacy tool in the 2017 general election. One of the unions funding the project claimed it helped to change “750 000 votes during the election and resulted in the government stumping up another £1.3 billion for schools in July”.24 However, the vast majority of platforms focus on maps and rankings. Figures 3 and 4 show two further examples from the UK. The first one, School Atlas, was developed by the Mayor of London and showcases the impact of income deprivation on children in London. The second example is a map of schools in London developed by a private firm, Locrating Ltd, which places the emphasis on school quality, cross-referencing data from Ofsted (the inspector of schools) and the Department for Education (UK). It showcases schools by area, displaying school quality as “inadequate”, “requires improvement”, “good”, or “outstanding”; however, if we look at the data from a critical perspective, we can note the biases this information may portray by reinforcing preconceived notions of privilege and disadvantage.
Figure 4:Examples of school information platforms
Source: Locrating, A to Z of Schools, https://www.locrating.com/all_schools.aspx
Both examples offer an illustration of how the quality of education can be portrayed, but, even with contextual data, there is a risk that such information could stigmatise pupils from schools rated as inadequate or in low-income areas. We need to consider critical ethical questions when making data available about schools or, at the very least, ensure performance data is accompanied by contextualised information about the socioeconomic challenges faced by the relevant community, such as poverty, integration, and inclusion.
While school information portals are popular and may support more informed decision-making by learners faced with a complex mix of educational opportunities, there is limited empirical evidence to date on whether they ultimately improve education as much as advocacy-oriented efforts aimed at holding governments accountable or at ensuring proper funding for quality education for the most vulnerable in our society. When it comes to data on educational institutions, we have both ample open data supply and demand, as well as active intermediaries who are able to sustain their platforms. While there may be cases of individual impact for particular learners, the net social impact is difficult to determine.
The final application of open data in education is its direct use in the development, or as part, of OER. OER are defined by UNESCO25 as “any type of educational materials that are in the public domain or introduced with an open license”. Open data used as OER can allow students to learn and experiment by working with the same raw data researchers, governments, civil society, international organisations, and policy-makers generate and use. They can form a key component in research- and scenario-based learning activities, and in supporting students to develop informational, statistical, scientific, media, political, and critical-thinking skills. By working with real-world data, students can develop storytelling and research skills, and can apply analytical, collaborative, and citizenship skills in using data to solve real-world problems.
This idea of using open data in education is recognised in the sixth principle of the Open Data Charter26 on open data for inclusive development and innovation, which states that it is key to “[e]ngage with schools and post-secondary education institutions to support increased open data research and to incorporate data literacy into educational curricula.” Although it is not clear how much emphasis has been placed to date on this point by countries and cities adopting the Charter, the groundwork to support the use of open data as OER has been laid in a number of projects.
In 2015, the Open Education Working Group of the Open Knowledge Foundation, established in 2013, published Open data as Open Educational Resources: Case studies of emerging practice27 in which a series of authors presented activities that could be adopted by educators at schools and universities to promote the use of open data in research-related activities. The book provides examples and best practices, showcasing how to use real data from research and from national and international data projects to foster educational activities to develop data literacies and critical thinking through collaborations among students, researchers, and academics. One of the practices portrayed in the book is A Scuola di OpenCoesione in Italy,28 an educational challenge, designed for Italian high school students. It was funded under the open government strategy on cohesion policy in partnership with the Ministry of Education and the Representation Office of the European Commission in Italy.
Other practical examples of the use of open data as OER29 can be found at the Open Data School in Russia, which provides a series of lectures and seminars from experts on open data topics. The Open Linked Data project at the Universidad Técnica Particular de Loja in Ecuador presents the results of a study on Linked Data technology for students, researchers, and educators, and Data Science Fundamentals in Palestine offers an online tool to enable students to follow the Foundations of Data Science training course developed by students and academics from Birzeit University. Finally, Monithon, also from Italy, offers an example of applied learning through open data, which citizens and university students, alongside researchers and policy-makers, use to monitor development projects. However, even with these notable successes, many initiatives focused on the use of open data as OER have been relatively short-lived, and the connections between the open education and the open data communities are still relatively weak with only a few points at which the communities intersect.
Supporting use of open data as OER is closely linked to work on data literacy (see Chapter 19: Data literacy). Recently, the Latin American Initiative for Open Data (ILDA) has developed a training programme for academics in the use of open data for teaching and learning30 to support them in developing the capacities needed to live and work in the datafied society, including learning to construct knowledge and analysing information critically from a wide range of data sources.31
Following Uhlir and Schröder’s argument32 that “[s]tudents may be less effectively educated and trained if they are unable to work with a broad cross-section of data”, and Davies’33 assertion that “there will be greater need in future for capacity both in state and society to be able to debate the meaning of data, and to find responsible ways of using open data in democratic debate”, we consider that the inclusion of open data in curricular activities is key to ensuring that both educators and learners acquire the skills they need to participate in contemporary society.
Over the last ten years, open data availability has grown, including data about education and data that can be used within education. Looking for school performance information may have involved using tables published once a year in newspapers ten years ago, but now many countries have interactive websites offering analysis and visualisation: ranging from official government sites to private sector-managed portals. Schools and post-secondary education institutions no longer need to rely on tables in textbooks, but can go to real-world updated datasets for teaching and learning; however, many challenges remain.
Although open data can provide evidence about problems that need to be addressed at the policy level, it can also be a key component in the development of the literacies needed in a datafied society, as well as in enhancing and promoting civic participation and understanding of the media and the sciences. However, it cannot be considered as the panacea for all educational problems.
Data is never neutral and it is ultimately a political instrument. Data and the algorithms used to analyse it can prompt stigmatisation, segregation, and discrimination. Mainstream narratives may place the blame for poor quality education on the children that perform poorly on standardised tests based on their economic or social background, instead of pointing at the authorities who have failed to provide the policies, programmes, and funding needed to improve the schools those children attend.
Arguments for opening data in education have tended to focus simply on the importance of access to data. Such arguments can gloss over the non-neutrality of data and the potential threats inherent in data-driven decision-making, where the context for data collection and presentation is opaque or where data “consumers” lack the critical thinking skills needed to interpret the data. They often also ignore the impact of trends toward the marketisation of education. We do not believe that it helps to approach open data as innocuous and benign per se. As Kitchin34 states, “if open data merely serves the interests of capital by opening public data for commercial re-use and further empowers those who are already empowered and disenfranchises others, then it has failed to make society more democratic and open”. However, as we have seen above, with examples like SchoolCuts.org, it is not only private interests that can deploy data for implicit or explicit political ends and there is potential for critical action.
Ultimately, while there are many challenges around the use of open data for education, it is through wider education about the creation and use of open data that these risks can be best addressed. The wealth of open data on all topics that could be applied to OER can be part of this. In conclusion, we recommend that:
In the use and development of education indicators, it is important to avoid relying exclusively on algorithmic analysis, as algorithms may reflect biases and can foster the stigmatisation of vulnerable students.
When governments open up educational data, they must ensure that it is anonymised to prevent the identification of individuals and collectives and, in addition, consider the potential uses of this data by public and private stakeholders to prevent this data from being used unethically.
When institutions, civil society, and private sector organisations build tools using educational data, they need to consider the potential impact and use for students, educators, and educational communities.
And finally, to foster data and citizenship literacies, the open education, open data, and open science communities must collaborate to develop educational materials and curricula to support educational institutions and programmes at all levels, including training for educators and educational communities.
Atenas, J. & Havemann, L. (Eds.). (2015). Open data as Open Educational Resources: Case studies of emerging practice. London: Open Knowledge, Open Education Working Group. https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/2396976/BookOpenDataasOpenEducationalResources.pdf
Mandinach, E.B. & Gummer, E.S. (2013). A systemic view of implementing data literacy in educator preparation. Educational Researcher, 42(1), 30–37. https://journals.sagepub.com/doi/abs/10.3102/0013189x12459803
Schafer, M.T. & Van Es, K. (Eds.). (2017). The datafied society: Studying culture through data. Amsterdam: Amsterdam University Press. http://www.oapen.org/search?identifier=624771
About the authors
Javiera Atenas is the Education Lead at the Latin America Open Data Initiative (ILDA) and co-coordinator of the Open Knowledge Open Education Working Group. She holds a PhD in Education and her research focuses on educational policies, open educational practices, data ethics, and data literacies. She is an active researcher and advocate of open practices. You can follow Javiera at https://www.twitter.com/jatenas.
Leo Havemann is a Digital Education Advisor at University College London and a postgraduate researcher at the Open University. His interests focus on the use of technology in higher education, open educational practices, and learning literacies. Follow Leo at https://www.twitter.com/leohavemann.
How to cite this chapter
Atenas, J. & Havemann, L. (2019). Open data and education. In T. Davies, S. Walker, M. Rubinstein, & F. Perini (Eds.), The state of open data: Histories and horizons (pp. 91–102). Cape Town and Ottawa: African Minds and International Development Research Centre. http://stateofopendata.od4d.net.
This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. It was carried out with the aid of a grant from the International Development Research Centre, Ottawa, Canada.
1UN (United Nations). (2018). Sustainable Development Goals: 4. Education. https://www.un.org/sustainabledevelopment/education/
2Schafer, M.T. & Van Es, K. (Eds.). (2017). The datafied society: Studying culture through data. Amsterdam: Amsterdam University Press. http://www.oapen.org/search?identifier=624771
3Kitchin, R. (2013). Four critiques of open data initiatives. LSE Impact Blog, 27 November. http://blogs.lse.ac.uk/impactofsocialsciences/2013/11/27/four-critiques-of-open-data-initiatives/
4Davies, P. (1999). What is evidence-based education? British Journal of Educational Studies, 47(2), 108–121. https://www.tandfonline.com/doi/abs/10.1111/1467-8527.00106
5Niemi, H. (2007). Equity and good learning outcomes. Reflections on factors influencing societal, cultural and individual levels – the Finnish perspective. Zeitschrift für Pädagogik, 53(1), 92–107. https://www.pedocs.de/volltexte/2011/4389/pdf/ZfPaed_2007_1_Niemi_Equity_good_learning_D_A.pdf
6Burns, T. & Köster, F. (2016). Educational research and innovation: Governing education in a complex world. Paris: Organisation for Economic Co-operation and Development Publishing. http://dx.doi.org/10.1787/9789264255364-en
7European Commission/EACEA/Eurydice. (2017). Support mechanisms for evidence-based policy-making in education. Eurydice report. Luxembourg: Publications Office of the European Union. https://eige.europa.eu/resources/206_EN_Evidence_based_policy_making.pdf
8UNESCO. (2013). UNESCO handbook on education policy analysis and programming. Volume 1: Education policy analysis. Bangkok: United Nations Educational, Scientific and Cultural Organization. http://unesdoc.unesco.org/images/0022/002211/221189E.pdf
9Motivans, A. (2015). Improving education statistics systems: Challenges and opportunities. Presentation at World Statistics: Sustainable Data for Sustainable Development, 20–22 October 2015, Xi’an, People’s Republic of China. Montreal: UNESCO Institute for Statistics. https://unstats.un.org/sdgs/files/meetings/sdg-seminar-xian-2015/Presentation--3.4-Sustainable-Data-for-Sustainable-Development--UNESCO.pdf
10http://kenya.opendataforafrica.org/gallery/Education, http://www.statssa.gov.za/?cat=16, https://educacion.gob.ec/estadisticaseducativas/, and http://www.mpin.gov.me/en/ministry
11Sandefur, J. (2018). The case for global standardized testing. Center for Education Innovations [Blog post], 5 May. https://educationinnovations.org/case-for-global-standard-testing
12See the OECD’s PISA test data http://www.oecd.org/pisa/data/
13https://ffteducationdatalab.org.uk/
14Van Schalkwyk, F., Willmers, M., & McNaughton, M. (2016). Viscous open data: The roles of intermediaries in an open data ecosystem. Information Technology for Development, 22(1), 68–83.
15Van Schalkwyk, F. (2017). Open data on universities: New fuel for transformation. University World News, 14 July. http://www.universityworldnews.com/article.php?story=20170710104034491
16UK Data Service. (2019). National pupil database. https://beta.ukdataservice.ac.uk/datacatalogue/series/series?id=2000108
17https://en.unesco.org/themes/education/caped
18https://es.datachile.io/geo/chile#education
19McAuley, D., Rahemtulla, H., Goulding, J., & Souch, C. (2011). How open data, data literacy and linked data will revolutionise higher education. In L. Coiffait (Ed.), New thinking about the future of higher education (pp. 88–93). London: Pearson. https://www.academia.edu/1945211/The_Open_Data_Revolution_and_Data_Literacy_in_Higher_Education
20Guy, M. (2016). The Open Education Working Group: Bringing people, projects and data together. In D. Mouromtsev & M. d’Aquin (Eds.), Open data for education: Linked, shared, and reusable data for teaching and learning (pp. 166–187). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-30493-9_9
21http://identicole.minedu.gob.pe/, http://www.mime.mineduc.cl/mvc/mime/portada, https://jedeschule.de/, https://www.conozcasuescuela.ac.cr/, http://www.scholenkeuze.nl/, and https://www.scholenopdekaart.nl/, respectively.
22http://www.mejoratuescuela.org/
23http://www.schoolcuts.org.uk/
24Whittaker, F. (2018). NUT spent £326k on general election campaign. Schools Week, 19 March. https://schoolsweek.co.uk/nut-spent-326k-on-general-election-campaign/
25UNESCO (United Nations Educational, Scientific and Cultural Organization). (2017). Open Educational Resources (OER). https://en.unesco.org/themes/building-knowledge-societies/oer
26Open Data Charter. (2015). Principles: International Open Data Charter. https://opendatacharter.net/principles/
27Atenas, J. & Havemann, L. (Eds.). (2015). Open data as Open Educational Resources: Case studies of emerging practice. London: Open Knowledge, Open Education Working Group. http://dx.doi.org/10.6084/m9.figshare.1590031
28http://www.ascuoladiopencoesione.it/about-opencohesion-school/
29http://opendataschool.ru/anketa/, http://data.utpl.edu.ec/, https://github.com/abedkhooli/ds1, and http://www.monithon.it/
30https://idatosabiertos.org/en/investigaciones-2/modelo-docente-y-datos-abiertos/
31Atenas, J., Havemann, L., & Priego, E. (2015). Open data as Open Educational Resources: Towards transversal skills and global citizenship. Open Praxis, 7(4), 377–389. http://dx.doi.org/10.5944/openpraxis.7.4.233
32Uhlir, P.F. & Schröder, P. (2007). Open data for global science. Data Science Journal, 6, OD36–OD53, p. 201. https://www.jstage.jst.go.jp/article/dsj/6/0/6_0_OD36/_pdf
33Davies, T. (2010). Open data, democracy and public sector reform: A look at open government data use from data.gov.uk. Master’s Thesis. University of Oxford, UK, p. 5. http://www.opendataimpacts.net/report/
34Kitchin, R. (2013). Four critiques of open data initiatives. LSE Impact Blog, 27 November. http://blogs.lse.ac.uk/impactofsocialsciences/2013/11/27/four-critiques-of-open-data-initiatives/
Data and information underpinning environmental knowledge is recognised as a form of power.
Vast quantities of environmental data are available online through many dedicated local, regional, and international data portals. This reflects long-established norms and practices of data-sharing within the environmental research community.
Emphasis must be placed on increasing the volume and geographic coverage of open water and air quality data.
Making connections between datasets across borders and thematic silos is essential to support greater understanding of a changing climate, to address air quality, to manage water resources, and to sustain biodiversity. However, there is often a disconnect between academic and official data initiatives and open-source, grassroots/citizen-science open data projects.
Context-aware open data approaches and well-resourced data infrastructures are crucial to avoid loss of data, missed opportunities, and duplication of effort.
As the amount of environmental data from sensor networks increases, major inequalities in global data coverage will need to be addressed, with developing countries often being more poorly represented.
Since the early 1960s, we have seen an increasingly vocal response to unmitigated anthropogenic impacts on the environment.1 Although there were earlier activists and movements, the 1960s marked the period when disparate voices started to coalesce. Environmental activists started conceptualising environmental problems as political matters, and, in doing so, using scientific knowledge as part of their armament. This led to a significant change in policy-making with regard to the use of scientific outputs and knowledge as supporting evidence. Data and information have become forms of power that are used to drive or change political discourse on issues affecting the environment. Knowledge derived from science, coupled with activism, played a major role in getting governments to endorse the Declaration of the United Nations Conference on the Human Environment in Stockholm in June 1972.2 It was at this conference that governments accepted that anthropogenic impacts on the environment were a reality and that more research was needed to understand the causes, impacts, and mitigation measures. Since that time, we have had subsequent international environmental engagements that rely on scientific knowledge to guide activism, decision-making, and policy development.
The 1990s brought the digital revolution. Data generation and exchange became easier, and, by 1996, the internet had become mainstream, allowing for easy digitisation and the dissemination of data. Environmental data became easier to acquire and to share. Although access to environmental data, information, and knowledge is not a recent phenomenon, over time the emphasis for open access has shifted from information and knowledge as products to include the underlying elements: the data that comprises these products.
Environmental concerns are all-encompassing, ranging from microbial research through to large planetary weather systems research. Open data provides an opportunity to promote review, transparency, accountability, participation, and the identification of knowledge gaps. The growth in environmental open data portals to support research, advocacy, decision-making, and communication indicates the importance of sharing data on a range of environmental issues.
The following sections present an overview of the progress on open data in relation to four key environmental domains: climate change, air quality, biodiversity, and water resources.
Known research into climate change can be traced back to 1824, when Joseph Fourier3 noted the warming of the Earth. In the 1890s, Swedish scientist Svante Arrhenius4 made the connection between carbon dioxide and rising temperatures, the “greenhouse effect”. It took another century of research, publications, and advocacy before the issue secured global attention.
The Intergovernmental Panel on Climate Change (IPCC) has achieved great success in putting climate change on the international political agenda and ensuring that almost every national government is paying attention to the issue. The data underpinning IPCC research comes from various open sources, and there are robust processes in place to ensure data integrity. The transformation of statistical climate data into easily digestible visuals through data visualisation, such as maps, also helped convey the importance of the issue to the general public (see Figure 1). The IPCC Fourth Assessment Report provided credible evidence to gain the necessary political traction;5 however, the identification of “major errors” in the main report had some sceptics questioning its veracity. The greatest error related to the incorrect referencing of 2035 as the date by which the Himalayan glaciers will have melted; however, a correction was made after a review of the source data, and the date estimate was changed to 2350.6 Other perceived “errors” were not actual errors, but rather questions regarding the validity of including content that had not been peer reviewed.
Figure 1: Data visualisation is a powerful tool to interpret complex climate data and make it accessible to a wider audience. In this image, NASA uses visualisation to illustrate temperature departures from the average during February 2016.
Source: NASA GISS
The 4th IPCC Assessment Report
The main criticism of the 4th IPCC Assessment Report has been that errors can be attributed to the referencing of non-peer reviewed literature, such as a World Wide Fund for Nature report, as well as various grey literature. The outcome of the criticism has had two positive effects: 1) the correction of the errors and 2) refinement in the process and structures to review data to support any claims the IPCC makes. In an open data environment, robust and well-documented data management processes are essential for credibility.
Due to the political, economic, and social visibility of climate change research, as well as its importance, a number of open data platforms have been created. These are detailed in Table 1, which also demonstrates various levels of open data licensing.
Table 1: Open data platforms to access climate-related data

| Name | Year launched | Core focus | Data licence |
| --- | --- | --- | --- |
| IPCC Data Distribution Centre | 1998 | To facilitate the timely distribution of a set of consistent up-to-date scenarios of changes in climate and related environmental and socioeconomic factors for use in climate impact and adaptation assessment. | OECD Principle of “openness” |
| World Bank Climate Change Knowledge Portal | 2010 | Hub for climate information | Various CC licences |
| Southern African Science Service Centre for Climate Change and Adaptive Land Management http://www.sasscal.org/ | 2012 | To host, safeguard, and make data and information resources available openly, yet ensure the integrity and ownership of the contributing parties. | Open access to data (incl. climate change and weather data) for southern Africa. |
| European Union Copernicus Climate Data Store | 2018 | The Copernicus Climate Change Service (C3S) will combine observations of the climate system with the latest science to develop authoritative, quality-assured information about the past, current, and future states of the climate in Europe and worldwide. | Free of charge, worldwide, nonexclusive, royalty free, and perpetual. |
Climate change open data portals present one of the best case studies of how open access to data, and the resulting scientific and advocacy collaborations, has led to a major shift in public understanding of science-backed policy and to large financial investments in further research and mitigation. Although data on the monetary investment and outcomes of mitigation measures is more limited, highlighting a gap still to be filled, a number of projects are now tracking climate-related financing. The Nationally Determined Contributions Explorer aims to publish national climate change mitigation plans and data on progress as the means to hold governments accountable.7 Transparency International (TI) also publishes data on the use of global funds to tackle climate change impacts,8 noting that the amount pledged by national governments will be running at USD 100 billion per year by 2020, and set to increase over time. TI has also been exploring the adoption of the Open Contracting Data Standard to ensure transparency and accountability in the contracting chain for climate-related finances.9
Air pollution has been an historical concern since the industrial revolution. However, it was only in the 1970s that scientists made the link between air pollution and its impact on human health. It was also during this decade that the United States and the United Kingdom started to implement regulations to curb air pollution. Today, policy-makers rely heavily on air quality data to inform policy review and development.
Air quality monitoring requires the implementation and management of monitoring stations, which may take the form of real-time digital instrumentation or manually monitored diffusion tubes. While governments often collate and publish this data, the 2016/2017 Global Open Data Index ranks the openness of air quality data by national governments as very low with only 8% of governments sharing air quality data as accessible open data.10 However, several initiatives are now working to aggregate and analyse air quality monitoring from around the world.
The World Air Quality Index (WAQI), created in 2007 by a team in Beijing, provides access to open air quality information from more than 10 000 stations in 800 cities from 70 countries.11 Only data on particulate matter of PM2.5/PM10 and greater from official government or professionally maintained measuring stations is published.12 This data is validated through neighbourhood and historical comparisons. The data from this platform conforms to the data requirements for reporting on the Sustainable Development Goal (SDG) health-related indicators,13 and is, therefore, able to inform government policy and support SDG reporting obligations.
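The idea of validating a reading through neighbourhood and historical comparisons can be illustrated with a minimal sketch. The source does not describe how WAQI implements these checks, so the function name, the median-based rule, and the `max_ratio` threshold below are assumptions chosen purely for illustration. The sketch only catches implausibly high spikes and negative values.

```python
from statistics import median


def is_plausible(reading, neighbour_readings, history, max_ratio=3.0):
    """Return True if a PM2.5 reading passes two illustrative checks.

    Hypothetical rule (not WAQI's documented method): the reading must be
    non-negative and no more than `max_ratio` times the median of nearby
    stations and of the station's own past readings. A reference list that
    is empty is skipped, so a reading with no reference data is accepted.
    """
    if reading < 0:
        return False
    for reference in (neighbour_readings, history):
        if reference:
            # Floor the reference at 1.0 so very clean air does not make
            # every small reading look like a spike.
            baseline = max(median(reference), 1.0)
            if reading > max_ratio * baseline:
                return False
    return True


# Example with made-up values (micrograms per cubic metre)
neighbours = [35, 42, 38]
past = [30, 45, 50]
print(is_plausible(40, neighbours, past))   # ordinary reading
print(is_plausible(900, neighbours, past))  # suspicious spike
```

A rule like this trades simplicity for coverage: a genuine pollution event affecting one station only would also be flagged, which is why an aggregator would typically hold such readings for review rather than discard them.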
The OpenAQ initiative also aggregates data from government monitoring stations and is exploring the inclusion of data from citizen-run low-cost sensors. With a strong open source and open data ethos, and an emphasis on permanently archiving data, the project is a key example of data being used to influence people’s behaviour and government action.14
Both OpenAQ and WAQI offer maps of the sensor networks they draw upon. A cursory glance at these reveals a dearth of measuring stations in Africa. This is supported by research conducted by Wetsman15 that notes South Africa is the only country in Africa with an air-quality monitoring programme. The map (Figure 2) below illustrates the global distribution. The lack of data collection and open data in certain regions will, therefore, negatively impact research and mitigation-related actions. Future work in this sector will have to focus on extending measures to collect data from more locations in developing countries.
Figure 2: Distribution of air quality monitoring stations sharing data via the WAQI portal
Source: https://waqi.info/
Biodiversity is about the variety of life on earth. Typically, biodiversity data covers genetics through to landscapes and all the floral and faunal species in between. Many open data sources exist, ranging from the Biodiversity Heritage Library (BHL) and the Encyclopaedia of Life (EoL) to the Global Biodiversity Information Facility (GBIF). As an example, GBIF collates and shares over 1 billion biodiversity records from more than 1 400 institutions, covering the globe.16 Figure 3 illustrates an extract from the GBIF portal of the available open biodiversity data for Niger where the 83 449 recorded occurrences contribute toward this resource. The general conclusion is that data collections on biodiversity held at the local, regional, and international level are vast and very often made available under open access licences.
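As a rough illustration of how such openly shared records can be counted programmatically, the sketch below builds a request for the number of occurrence records in one country. It assumes the public GBIF occurrence search endpoint accepts `country` and `limit` parameters and returns a JSON object with a `count` field, as described in GBIF's API documentation. The helper names are hypothetical, and the sample payload is made up for offline use; its count simply reuses the Niger figure quoted above and is not a live result.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint: GBIF's public occurrence search API
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def occurrence_count_url(country_code):
    """Build a search URL that requests only the total (no records)."""
    query = urlencode({"country": country_code, "limit": 0})
    return GBIF_OCCURRENCE_SEARCH + "?" + query


def parse_count(response_text):
    """Read the total number of matching records from a JSON response."""
    return json.loads(response_text)["count"]


# Offline illustration: a made-up payload shaped like a search response
sample = (
    '{"offset": 0, "limit": 0, "endOfRecords": false, '
    '"count": 83449, "results": []}'
)
print(occurrence_count_url("NE"))
print(parse_count(sample))
```

Fetching the URL with any HTTP client and passing the response body to `parse_count` would give the current total, which changes as institutions publish new records.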
While these datasets may be valuable at a local level or thematic scale, it is in the connectedness of this data that the true value is found. The ultimate goal of this data is to answer overarching questions on ecological interactions and interdependencies within the biotic and abiotic environment at different scales. This can create major challenges for data-sharing infrastructures, requiring systems, standards, and collaborative mechanisms to enable the discovery of data and to manage information on provenance. Many initiatives, such as the Biodiversity and Protected Areas Management Programme (BIOPAMA),17 are now actively integrating the collation and collection of data into their project designs to encourage open data sharing. Funders are also playing an important role in creating funding conditions to share data. For example, the JRS Biodiversity Foundation18 and many other grant-making agencies are including conditional clauses to enforce the free sharing of data collected as the result of grant funding.
Generally, the biodiversity community has self-organised to limit the overlap in data collection and management. Accordingly, organisations such as the International Union for the Conservation of Nature, BirdLife, and the World Conservation Monitoring Centre have adopted specific focus areas for the type of biodiversity data collected as part of their project work, assessments, and other related activities. These organisations also play a very important role in supporting national reporting obligations toward the Aichi Biodiversity Targets19 and the SDGs.20 It is important to note that not all biodiversity data is considered to be open data. BirdLife International, for example, has protocols that restrict access to certain bird data that it deems sensitive, such as nesting sites. The aim is to protect species from local or even global extinction as a result of poaching, illegal hunting, collection, or intrusive behaviour.
Figure 3: An example of a biodiversity dataset available on the GBIF portal. Data is aggregated from many different sources and openly shared.
Water is a basic human need, and access to clean water is becoming a major global concern. Climate change has had a significant impact on rainfall patterns, most notably in Sub-Saharan Africa. Changing rainfall patterns, coupled with poor management of existing water supplies, pose major livelihood challenges to millions of people. Those most affected by the lack of clean water are women and children in developing countries.21
The water sector has a fair number of dedicated data portals. The United Nations Educational, Scientific and Cultural Organization (UNESCO) has recently launched22 the Water Data Quality Portal to provide access to related global datasets.23 The Global Environment Monitoring System for freshwater (GEMS/Water) provides data on fresh water quality intended to support scientific assessments and decision-making related to water management.24 Sharing Water-related Information to Tackle Changes in the Hydrosphere - for Operational Needs (SWITCH-ON), a European Union (EU) initiative, provides access to water-related information to assist in managing water in a sustainable manner.25 The International Water Management Institute’s Water Data Portal provides access to global water-related information.26 The European Commission, using Google Earth Engine, has developed the Global Surface Water Explorer, which maps the location and temporal distribution of surface water for the period 1984–2015.27 Despite the many available data portals, the Global Open Data Index28 still ranks the openness of water quality data from national governments as very low, with just 1% of index surveys able to access open data on water quality directly from governments.
Access to clean water is an immediate and critical concern. This is especially true in rural areas, where water contamination can affect human lives, livestock, and crops. The data currently collected at the global level is analysed using remote sensing tools coupled with water quality information obtained from available sensors. The challenge ahead will be to expand the collection of water quality information, using the power of technology to immediately communicate changes in water provision or quality. Therefore, the future of open data within the water sector relies on developing technology that can be used in the most remote locations in developing countries. Through the application of technology, the data collection activities will need to improve to near real-time with higher levels of accuracy to assist emergency response activities and policy development.
Cape Town drought
Since 2015, Cape Town has experienced an unprecedented drought, leading to serious water shortages. Although many causes have been postulated, and blame apportioned, defensible evidence was sought to understand whether the crisis was caused by less rainfall, increased evaporation, increased agricultural and urban use, or poor management. A study by the Climate Systems Analysis Group at the University of Cape Town (UCT), using open data, found that the main cause of the water crisis was low rainfall between 2015 and 2017.29,30
Open datasets were used to create two separate maps to analyse the temporal levels of the Theewaterskloof Dam, the largest water source in Cape Town. Figure 4 shows that the dam levels were fairly constant for the period 1984–2015. Figure 5 illustrates the rapid decline of water volumes between 2016 and 2018. These two different datasets, using different visualisation techniques, complement the UCT study, which found that exceptionally low levels of rainfall since 2015 had resulted in the water crisis.
Figure 4: The darker blue areas show more permanent water for the period 1984–2015.
Figure 5: The reduction in water levels for the period 2016–2018. The dark blue represents the water level in 2018.
Governments, civil society, business, and academia are the four major groups driving the environmental open data agenda. Governments have been changing policies and legislation to support open data,31 mostly as the result of pressure from civil society and academia. Traditionally, business is an active user of open data, but is not widely known for the release of open data.
Keeping open data portals open requires resources. Wealthier countries typically fund their own environmental open data initiatives; however, for developing countries, continuous access to open data is very much dependent on available funding to generate, curate, and publish datasets. Typical major funding sources include the World Bank, the United Nations, the Global Environment Facility (GEF), bilateral foreign aid, and many private donors. This presents a particular challenge for emerging economies, where data management is linked to project-based funding and the data becomes “lost” or “orphaned” after a project has been completed. As a result, the true value of the new data is not realised and the investment does not generate ongoing value. New projects then re-invest in data collection, often collecting the same or similar data, and the cycle repeats itself.
The pathway to sustainable data management practices must be multi-pronged and not rely on any single approach. To be successful in the long term, the management of open datasets will require investment from host agencies in the form of money or in-kind resourcing, such as staff, infrastructure, or content. It is also important that donor funding be moulded to support the needs of the specific country or agency and to ensure that data collection and management is not responding solely to short-term donor agendas. The funding model used must be structured to build internal data management capacity within recipient organisations that will have a legacy impact after the temporary needs of a project have been met. In this manner, internal capacity and resources can be developed over time as the result of donor support. Importantly, a fresh take on the role of the private sector is also needed in order to evaluate how it can enhance the shared value of public datasets used by business as a means to contribute to the public good. One way is for private sector data users to return enhanced datasets to governments for publication; another approach is for the private sector to provide expertise and infrastructure to support the management and publication of data.
The environmental sector has a history of collaborating toward common goals. An example of this is the initiative to combat illegal wildlife trafficking, where environmental actors collaborate with non-environmental agencies, such as Interpol, by exchanging critical data. International conservation organisations, such as the World Wildlife Fund and the International Union for Conservation of Nature, share their data to drive cooperation, transparency, and accountability, and to encourage community review of quality. The collections of natural history museums and herbaria are being digitised and placed in the public domain with the aim of the data being used to aid conservation and management.
Collaborations like these can also be extended to the management of open data. The Atlas of Living Australia32 is an international leader in publishing collated open biodiversity data with more than 76 million records made freely available from 311 different data providers. Citizen science is becoming very popular and it is also adding volumes of data to established scientific collections. Through collaboration, environmental organisations are able to secure a range of benefits, including shared skills, experts, and infrastructure.
Innovation: Cybertracker
The award-winning Cybertracker33 app was created to provide the indigenous Kalahari San with technology to capture complex field data. The technology has been developed to be intuitive and to allow non-literate people to record data and knowledge for scientific conservation and management applications.
Indigenous knowledge, knowledge passed on from one generation to the next, can advance scientific research and improve the public image of science. However, this type of knowledge is often viewed as “unscientific” although it is the basis upon which we built our existing scientific knowledge. Ironically, we have seen the appropriation and exploitation of Indigenous knowledge on the use of plant-based natural resources by multinational corporations: a phenomenon known as biopiracy.34 The World Intellectual Property Organization is currently working on international legal instruments to protect Indigenous knowledge and ensure appropriate benefit sharing.35
Many new companies have been established using public open data. As noted earlier, the private sector is an active user of public data, and the potential exists to create valuable public–private partnerships to further advance the private sector as a contributor of open data. Recognising the value of sharing data as the means to stimulate innovation and build positive public relations, the private sector is becoming more transparent. While the overall open data market value is projected to be in the region of €286 billion by 2020,36 the exact potential value of open environmental data is not known. However, it is reasonable to assume that the value of this open data is significant. In 2013, the Climate Corporation, a private company built on open climate data to support farming decisions, was sold for USD 1.1 billion to Monsanto, a multinational agricultural company.
Further evidence on the use of environmental data in the private sector comes from the Open Data 500 project,37 which provides information on private companies using government open data through studies in six countries. The project seeks to map the economic and social impact of government open data by looking at the businesses using it. Figure 6 illustrates the number of businesses per country in the environment and weather sector. Canada tops the list with 45 businesses, followed by Italy (24) and Korea (16).
Figure 6: Number of private companies in the environment and weather sector using open access government data as an integral part of their business model and as a tool to generate new business. Source: www.opendata500.com. See the Open Data 500 website for more details.
Standards are necessary to define acceptable quality metrics for data, ensure consistent use, and to facilitate data sharing. The lack of common standards negatively impacts the credibility, use, and exchange of data across the environmental sector.
While environmental data collection has become easier, the development and maintenance of metadata has become increasingly laborious; however, without metadata, the value of the data erodes and data interoperability becomes extremely difficult. Making environmental data interoperable creates the capacity to share data and important indicators across systems regardless of geographic boundary, vendor, or organisation, but this requires consistent adherence to standardised metadata, ontologies, and vocabularies for the description and organisation of the data. The Committee on Data (CODATA) of the International Council for Science, established in 1966, is actively working toward coordinating data standards among scientific unions at the international level and has made major steps in embedding open data principles in its work.38
The lack of skills, expertise, and equipment within governments needed to meaningfully exploit the vast quantities of available environmental open data is also a major constraint in addressing environmental challenges, especially in developing countries. It is widely noted that developing countries will be the most impacted by climate change, with one (proprietary) index of climate change vulnerability identifying the Central African Republic, the Democratic Republic of the Congo, Haiti, Liberia, and South Sudan as facing the greatest risks.39 Many developing countries are also home to vast natural resources that are under the pressure of exploitation or destruction. These very countries are under social and political pressure to protect their natural resources while simultaneously under economic pressure to grow their economies.
Providing capacity building for developing countries has been on the developmental agenda for many years and has taken the form of institutional, individual, and infrastructural interventions. Very often, capacity development has been focused on the needs of donor-funded projects, limited to the funding period or conditions and not structured around government-led interventions that can sustain impact. Linked to this technical capacity constraint are the political challenges that face institutions intending to make environmental data openly accessible. For example, the Government of Tanzania has recently withdrawn from the Open Government Partnership.40 The systemic impact of this decision is to further limit disclosure of data into the public domain, restricting capacity development in publishing data, hindering innovation in using open data, and limiting potential private sector expansion using open data.
Generally, although substantial expertise exists within the research community, the broader environmental sector, including government and civil society actors, is lagging behind in terms of applied data management expertise. This has a profound effect on the quality, quantity, access, and frequency of data that can be released as open data, and further frustrates attempts to use data to mitigate environmental damage and the negative impacts of climate change.
Open data plays a crucial role in advancing our collective efforts to ensure sustainable management of all our natural resources. It has fostered collaboration that would not have been possible 30 years ago. It has allowed the veracity of scientists’ work to be reviewed and scientists to be held accountable for their conclusions, just as it allows politicians to be held accountable for their decisions. Furthermore, it has also supported instances of greater civil participation in the public and private sector spheres with the potential to give poor and marginalised people greater power through knowledge. Open data has also helped to drive the development of innovative products and services, not only in developed countries, but also in developing countries, addressing issues of environmental conservation, skills development, and economic growth. Overall, open data has shown revolutionary potential, although the measurement of impact remains difficult.
However, there is still much effort needed to ensure that environmental data becomes fully accessible to address environmental challenges. The advancement of the environmental open data agenda must happen at both the macro and micro levels. At the macro level, changes are necessary on an institutional scale to challenge closed governments to open their data. The collaboration between thematic sectors must be encouraged to avoid data duplication and gaps, as well as to maximise the value of open data. A coherent and collaborative approach must be adopted to address data gaps, specifically in developing countries. These gaps can be filled through adopting vendor and ‘donor agnostic’ data management systems, integrating data sharing agreements for funded programmes, and establishing formal data sharing programmes with the private sector without compromising personal information or trade secrets. The development of case studies is a powerful mechanism to encourage sharing as it can illustrate effective processes and the value of open data.
At the micro level, institutions should develop formal or structured data management strategies that can proactively lead to open data. Data management strategies must always be focused on organisational needs and address standards, quality, applications, and capacity building.
Environmental open data has helped shape national and international policies and decisions. Notwithstanding the challenges of getting governments and private sector entities to share data, the volume of open data is increasing. Our task is to ensure that the release of environmental open data is needs-based, user friendly, and of sufficient quality to address the local, regional, and global challenges in developing a sustainable future.
EC (European Commission). (2013). Sharing water-related information to tackle changes in the hydrosphere – for operational needs. https://cordis.europa.eu/project/rcn/110496/factsheet/en
Hobern, D., Apostolico, A., Arnaud, E., Bello, J.C., Canhos, D., Dubois, G., Field, D., Alonso Garcia, E., Hardisty, A., Harrison, J., Heidorn, B., Krishtalka, L., Mata, E., Page, R., Parr, C., Price, J., & Willoughby, S. (2013). Delivering biodiversity knowledge in the Information Age. Copenhagen: Global Biodiversity Information Facility. http://orca.cf.ac.uk/71243/1/GBIO.pdf
Schmidt, B., Gemeinholzer, B., & Treloar, A. (2016). Open data in global environmental research: The Belmont Forum’s open data survey. PLOS ONE, 11(1). https://doi.org/10.1371/journal.pone.0146695
Transparency International. (2018). Climate Adaptation Finance Governance Standards: A new approach piloted in the Maldives and Bangladesh. Berlin: Transparency International. https://www.transparency.org/whatwedo/publication/climate_adaptation_finance_governance_standards
Wetsman, N. (2018). Air-pollution trackers seek to fill Africa’s data gap. Nature, 556, 284. https://www.nature.com/magazine-assets/d41586-018-04330-x/d41586-018-04330-x.pdf
About the author
Selwyn Willoughby is an international information strategist with over 20 years of experience in the environment and conservation sector. He is the Director of the data advisory company, Information by Design, and a Fellow of the South African National Biodiversity Institute (SANBI). You can follow Selwyn at https://www.twitter.com/selwynwill and get more information about his work at https://www.infobydesign.co.za.
How to cite this chapter
Willoughby, S. (2019). Open data and the environment. In T. Davies, S. Walker, M. Rubinstein, & F. Perini (Eds.), The state of open data: Histories and horizons (pp. 103–118). Cape Town and Ottawa: African Minds and International Development Research Centre. http://stateofopendata.od4d.net
This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. It was carried out with the aid of a grant from the International Development Research Centre, Ottawa, Canada.
1 PBS (Public Broadcasting Service). (2014). A fierce green fire: Timeline of environmental movement and history. American Masters, 15 April. http://www.pbs.org/wnet/americanmasters/a-fierce-green-fire-timeline-of-environmental-movement/2988/
2 UN (United Nations). (1972). Declaration of the United Nations Conference on the Human Environment. Stockholm, 5–16 June 1972. http://www.un-documents.net/unchedec.htm
3 Bell, A. (2014). A very short history of climate change research. Road to Paris, 5 September. http://roadtoparis.info/2014/09/05/history-climate-change-research/
4 Pralle, S.B. (2009). Agenda-setting and climate change. Environmental Politics, 18(5), 781–799. http://www.tandfonline.com/doi/full/10.1080/09644010903157115
5 Nuccitelli, D. (2013). IPCC model global warming projections have done much better than you think. The Guardian, 1 October. https://www.theguardian.com/environment/climate-consensus-97-percent/2013/oct/01/ipcc-global-warming-projections-accurate
6 RealClimate. (2010). IPCC errors: Facts and spin. 14 February. http://www.realclimate.org/index.php/archives/2010/02/ipcc-errors-facts-and-spin/
7 Rateng, B. (2017). Using data visualisation to track climate change. OpenLearn, 1 August. https://www.open.edu/openlearn/nature-environment/environmental-studies/using-data-visualisation-track-climate-change
8 TI (Transparency International). (2015). Clean finance for a clean planet. https://www.transparency.org/whatwedo/activity/clean_finance_for_a_clean_planet
9 TI. (2018). Safeguarding climate finance procurement: National-level procurement of the Green Climate Fund. Nairobi: Transparency International. http://files.transparency.org/content/download/2236/13961/file/2018_Report_NationalProcurementGCF_English.pdf
10 https://index.okfn.org/dataset/emissions/
13 Lim, S.S., Allen, K., Bhutta, Z.A., Dandona, L., Forouzanfar, M.H., Fullman, N., Gething, P.W. et al. (2016). Measuring the health-related Sustainable Development Goals in 188 countries: A Baseline analysis from the global burden of disease study 2015. The Lancet, 388(10053), 1813–1850. https://doi.org/10.1016/S0140-6736(16)31467-2
15 Wetsman, N. (2018). Air-pollution trackers seek to fill Africa’s data gap. Nature, 556, 284. https://www.nature.com/magazine-assets/d41586-018-04330-x/d41586-018-04330-x.pdf
16 GBIF. (2018). Big data for biodiversity: GBIF.Org surpasses 1 billion species occurrences. Global Biodiversity Information Facility [News post], 6 July. https://www.gbif.org/news/5BesWzmwqQ4U84suqWyOQy/big-data-for-biodiversity-gbiforg-surpasses-1-billion-species-occurrences
19 CBD (Convention on Biological Diversity). (2010). Aichi biodiversity targets. https://www.cbd.int/sp/targets/
20 Brooks, T.M., Butchart, S.H.M., Cox, N.A., Heath, M., Hilton-Taylor, C., Hoffmann, M., Kingston, N., Rodríguez, J.P., Stuart, S.N., & Smart, J. (2015). Harnessing biodiversity and conservation knowledge products to track the Aichi targets and sustainable development goals. Biodiversity, 16(2–3), 157–174. https://doi.org/10.1080/14888386.2015.1075903
21 http://wholives.org/our-mission/mission/
22 UNESCO (United Nations Educational, Scientific and Cultural Organization). (2018). UNESCO launches a pioneering tool to monitor water quality. https://en.unesco.org/news/unesco-launches-pioneering-tool-monitor-water-quality
23 http://www.worldwaterquality.org/
25 http://www.water-switch-on.eu/
27 https://global-surface-water.appspot.com
28 https://index.okfn.org/dataset/water
29 Wolski, P. (2018). Drivers of Cape Town’s water shortage. Climate Systems Analysis Group [Blog post], 18 July. http://www.csag.uct.ac.za/2018/07/18/drivers-of-cape-town-water-shortage/
30 CPI (Centre for Public Impact). (2018). Impact insight: Cape Town water crisis. https://www.centreforpublicimpact.org/impact-insight-cape-town-water-crisis/?gclid=EAIaIQobChMI-L36lKb13wIVRbnACh1xuQGnEAAYASAAEgJjT_D_BwE
31 High-level Group for Partnership, Coordination and Capacity-Building for Statistics. (2017). Cape Town Global Action Plan for Sustainable Development data. United Nations Data Forum, 15 January. https://undataforum.org/WorldDataForum/wp-content/uploads/2017/01/Cape-Town-Action-Plan-For-Data-Jan2017.pdf
32 https://dashboard.ala.org.au/
33 https://www.cybertracker.org
34 IPW. (2016). Drawn out battle over genetic resources dampens Africa’s hopes. Intellectual Property Watch [News post], 27 April. http://www.ip-watch.org/2016/04/27/drawn-out-battle-over-genetic-resources-dampens-africas-hopes/
35 WIPO. (2015). The WIPO Intergovernmental Committee on Intellectual Property and Genetic Resources, Traditional Knowledge and Folklore: Background Brief. Geneva: World Intellectual Property Organization. https://www.wipo.int/edocs/pubdocs/en/wipo_pub_tk_2.pdf
36 UrbanTide. (2016). Open data – is the open private sector the next frontier? UrbanTide [Blog post], 24 October. https://urbantide.com/fullstory2/2016/10/24/open-data-is-the-open-private-sector-the-next-frontier
39 Maplecroft. (2016). Climate Change Vulnerability Index 2017. World (Infographic). ReliefWeb, [Report], 14 November. https://reliefweb.int/report/world/climate-change-vulnerability-index-2017
40 Open Government Partnership. https://www.opengovpartnership.org/participants
During the last decade, data has rapidly become more available across the extractives sector. Civil society, researchers, and journalists have responded by finding new ways to examine natural resource revenues, locations, production statistics, and corporate filings, drawing on data which, until recently, was only available to the companies involved or locked up in databases of proprietary data providers.
The Extractive Industries Transparency Initiative (EITI) adopted an Open Data Policy in 2015 and has since introduced a database of revenue payments for its 51 members, demonstrating a shared view that open data can serve as an enabler of accountability.1
Open data principles have also been gaining traction within government-led extractive industry reporting regimes, including requirements to submit structured and standardised data. However, experience shows that unless reporting as data is made mandatory, companies prefer to provide unstructured PDFs.
New extractives open data has, in some cases, allowed for vibrant and timely evidence-based debate on taxation in resource-rich countries, offering a public space for review of various public policy options. However, analysis, journalism, and evidence-based advocacy must reach policy-makers if they are to achieve lasting impact.
Since the 1990s, academics, civil society organisations (CSOs), and multilateral institutions have paid increasing attention to the impact of oil, gas, and mining operations on human development in resource-rich countries. The apparent paradox that abundant natural resources have not, in many countries, translated into economic growth and human development2 has sparked considerable work toward shining a light on how extractive industries operate. Questions of policy and taxation, of the local impact of extractive operations, and of governance and corruption, have all found their way onto the agenda.
At the start of this millennium, substantial international advocacy efforts toward greater transparency for the extractives sector started to gain ground with the creation of the Extractive Industries Transparency Initiative (EITI), the Revenue Watch Institute (RWI), which later merged with the Natural Resource Charter to form the Natural Resource Governance Institute (NRGI) in 2013, and the Publish What You Pay (PWYP) global coalition, among others. A focus on the disclosure of information about payments made from extractives companies to governments and on contracts and concessions has led, over the last 15 years, to a number of new legal disclosure requirements, national-level multi-stakeholder partnerships, and voluntary disclosure schemes. In the last few years, these reforms have started to yield new flows of documents, and, in some cases, open data.
However, persistent problems of unequal access to data and poor data quality in the extractives sector are far from solved, and it has long been acknowledged that a lack of accessible data affects everyone from grassroots civil society groups and national anti-corruption watchdogs through to multilateral institutions engaged in economic planning. This chapter will examine how stakeholders in the extractives sector have engaged with open data over the last decade, working in parallel to secure incremental policy change on data publication and to put the data that is already available to use. It will also explore how the broad community of practice around open extractives data has supported a cross-pollination of ideas and research methods, helping to break down silos between different development disciplines and to more rapidly facilitate informed and evidence-based debate.
Driving context-specific use: Open Jade Data Myanmar
OpenJadeData.org3 is a public data portal launched by the Natural Resource Governance Institute (NRGI) in May 2018. The site, available in English and Burmese, aims to support engagement with new datasets on Myanmar’s jade trade. It has three main objectives: provide clean, collated data on jade to be used for further analysis; allow users to visualise the information with an online tool; and help users to dive deeper into some of the prevalent issues related to the jade industry through original “data stories”. Each feature was developed with input from users in Myanmar and designed with different audiences and skill levels in mind, including researchers, journalists, and interested members of the general public.4
So far, the portal features stories re-examining estimates of the size of the jade industry, highlighting the lack of accurate data on the scale of the sector, with estimates ranging from USD 5 billion to USD 31 billion.5,6
Another clear goal of the portal is to support ongoing efforts of the government and civil society groups to increase transparency and conduct regular analysis of the jade sector. NRGI describes the portal as “focused on jade, at the moment, since it has become one of the symbols of Myanmar’s inextricable political-economic situation which links a precious natural resource worth billions of dollars annually – and characterised by illegal trade and a huge amount of uncollected potential state revenues – with domestic conflicts and the peace process; as well as environmental and social disasters and unregulated, mass migration of workers with no safety regulations. Addressing the multiple challenges faced by the jade sector could be one of the best proofs that the 2015 elections and the consecutive National League for Democracy government are really ushering in a new era for Myanmar’s politics and economy.”7
Figure 1: Data story showing estimates of the size of the jade trade in Myanmar
Source: https://openjadedata.org/Stories/how_much_jade_worth.html
Understanding the extractive sector requires data. A country’s fiscal regimes provide detailed rules on how oil and minerals can be extracted, how extracted resources will be priced, the deductible costs for extractors, and how revenues are to be collected.8 Further rules and environmental regulations regarding where and how extraction can take place are mapped through mining cadastre datasets. Information on the companies, corporate structures, the public–private partnerships involved, and the mechanisms through which they are financed, can only be understood through data analysis. Global commodity traders manage the transfer of oil, gas, and minerals around the world using complex data systems.
Although transparency efforts in the extractives industry in the early 2000s were focused primarily on the publication of documents ready to be checked and reconciled through an audit process, by the start of this decade, there was an increasing emphasis on disclosure in the form of data. This coincided with the emergence of the open data movement on the global stage, marking the extractives transparency sector as an early adopter of open data methodologies.
Notably, work on transparency in the extractives sector has focused as much on requiring the private sector to open up data as it has on opening up government data. PWYP has described how this framing was a strategic choice, noting that “At the time of the launch [of PWYP] there was little recourse at the global level to push for disclosure of revenues by resource-rich developing country governments”, but “mechanisms to require disclosure by companies which are listed on stock exchanges and subject to accounting regulations were available and could be amended”.9 There is, within this work, an ambitious programme of re-imagining how a global market should function, with concerted work to rebalance the line between public and proprietary information. For some, this is simply a corrective action in a market where governance has not kept up with the globalised industry. For others, such as Berlin-based social enterprise OpenOil, launched in 2011 and operating under the tagline “imagine an open oil industry …”,10 there may be a deeper vision at play of transforming the way natural resources are managed and the role that policy-makers and citizens have in their exploitation.
As government agencies gain new data, both public and private, from companies, the attention of civil society and researchers has turned to the lack of cross-agency data sharing. Different government agencies tend to obtain different types of data, all of which can be of value when, for example, assessing tax payments and the other contractual obligations of companies. But if the data from different agencies is not brought together, opportunities to use it may be missed. In a survey of government officials in African resource-rich countries, OpenOil found that project costs, reserves, and production data were the areas where the gap between the perceived need for joined-up data and data availability was the most pronounced.11 More research is necessary in order to determine the appropriate limits for how much data should be made public, but, in the meantime, models for data sharing between trusted agencies could be further advanced. In support of this, recent research in the mining sector noted that “Revenue authorities could improve their analysis of risks through sharing production data, findings from cost audits, mining agreements, and information on beneficial owners as a matter of course rather than just before a tax audit.”12 Concretely, the African Tax Administration Forum has indicated that such work is being explored.13
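To make the idea of joined-up data concrete, the following is a minimal sketch, in Python, of how records held by two agencies might be brought together once they share a common project identifier. Everything in it is an assumption made for illustration: the project identifiers, the figures, the agency roles, and the function name `join_agency_data` are hypothetical and are not drawn from any dataset discussed in this chapter.

```python
# Hypothetical records: every identifier and figure below is invented
# purely for illustration and does not describe any real project.
production_by_project = {"P-001": 1200.0, "P-002": 450.0}    # e.g. held by a mining ministry
payments_by_project = {"P-001": 3_600_000.0, "P-003": 900_000.0}  # e.g. held by a revenue authority


def join_agency_data(production, payments):
    """Join two agency datasets on a shared project identifier.

    Returns a pair (matched, unmatched):
      matched   - projects present in both datasets, with payment per unit produced
      unmatched - sorted identifiers that appear in only one dataset
    """
    matched = {}
    for project_id in sorted(production.keys() & payments.keys()):
        units = production[project_id]
        if units <= 0:
            continue  # avoid dividing by zero for projects with no output
        matched[project_id] = {
            "production": units,
            "payment": payments[project_id],
            "payment_per_unit": payments[project_id] / units,
        }
    unmatched = sorted(production.keys() ^ payments.keys())
    return matched, unmatched


matched, unmatched = join_agency_data(production_by_project, payments_by_project)
print(matched)
print("Projects missing from one agency's data:", unmatched)
```

In this sketch, the unmatched list matters as much as the join itself: a project that reports production but has no recorded payment (or the reverse) is exactly the kind of gap that only becomes visible once the two datasets are brought together.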
In 2010, President Obama signed into law the Dodd-Frank Act14 with a provision (Section 1504) which requires extractives companies to report on their project-level payments to governments as part of official security filings. The European Union followed suit with the 2013 Accounting and Transparency Directive that was subsequently transposed into national law across Europe,15 and, in Canada, the Extractive Sector Transparency Measures Act (ESTMA) was passed at the end of 2014.16 Although a decision by the United States (US) Congress, under President Trump, to vacate the rules for Section 1504 of the Dodd-Frank Act means that disclosure from US-listed extractives companies is currently on hold, other countries are pushing forward with disclosure. Fifty-one EITI member countries have now agreed to provide project-level disclosure for the 2018 financial year in open data formats.17
Experiences to date demonstrate that the implementation of these mandatory disclosure regulations and processes has a large impact on how far the disclosures lead to machine-readable and user-friendly open data. In Canada, where more than 500 companies have disclosed payments data, companies can choose under the regulations to either publish machine-readable data in XLS format or to publish PDF documents which place a heavy processing burden on anyone wanting to carry out detailed analysis of the reports. For the fiscal year 2016–2017, only 27 company reports were provided in machine-readable XLS format, while 687 reports were provided in PDF format.18 According to PricewaterhouseCoopers Canada (PwC), an auditing company, the share of company reports submitted with one or more deficiencies fell from 80% in the first year to 46% in the second year.19 In the United Kingdom (UK), the company register, Companies House, has developed an application programming interface (API) for the digital submission of reports, attracting structured data reports from 115 companies for the financial year 2015–2016.20 Working with this released data, PWYP has noted that “Despite its value and importance, [...] the quality of mandatory extractive company reporting to date indicates that improvement is needed in several areas”,21 highlighting the impact that the definitions and disclosure formats set out in the regulations have on the data output. Gaining an agreement on global standards and infrastructures for joined-up extractives reporting to improve quality will be no small task.
One of the key ways in which mandatory disclosure can be improved is by different actors making use of, and providing feedback on, data disclosures. This enables civil society to provide detailed technical feedback that can strengthen implementation of new disclosure processes during their critical early years. High-profile use cases demonstrating the value of disclosures are also important to overcome resistance to ongoing publication. For example, in 2017, PWYP France, Oxfam France, ONE, and Sherpa analysed payments from the uranium mining company, Areva, to the Government of Niger. They concluded that a contract renegotiated in 2014 had led to a substantial reduction in government revenue.22 The same year, NRGI published an analysis of oil revenues in Ghana23 and in Nigeria24 demonstrating how, with better access to data, civil society can ask more precise questions of both national oil companies and the government.
It is important that data use can be sustained and that potential users of data are supported in navigating a complex data landscape. In 2018, PWYP in the UK conducted a detailed study of extractives disclosures,25 and Global Witness and Resources for Development Consulting published a guide for using mandatory disclosure data, highlighting both sources of data and the “red flags” to look for.26 To facilitate the use of disparate data, NRGI also pioneered the development of the ResourceProjects.org data platform which collects, processes, and standardises mandatory disclosure data across jurisdictions, taking some of the hard work out of data access. At the time of writing, the ResourceProjects.org platform contains mandatory disclosures covering more than 18 000 payments from 747 reporting companies with payments worth more than USD 537 billion. The platform allows users to navigate disclosure data either by reporting company or by country. By collating disclosures from across reporting jurisdictions, the platform reduces the complexity of acquiring the data for local civil society users, thus lowering barriers to using the data.
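To make the idea of collation more concrete, the following is a minimal Python sketch of how payment records gathered from several reporting jurisdictions might be totalled by reporting company or by recipient country. It is an illustration only: the field names, the `total_by` helper, and all sample values are hypothetical, and it does not describe how ResourceProjects.org is actually built.

```python
from collections import defaultdict

# Hypothetical, illustrative payment records; not real disclosure data.
PAYMENTS = [
    {"company": "Company A", "country": "Country X", "jurisdiction": "UK", "usd": 120.0},
    {"company": "Company A", "country": "Country Y", "jurisdiction": "UK", "usd": 30.0},
    {"company": "Company B", "country": "Country X", "jurisdiction": "Canada", "usd": 50.0},
]


def total_by(records, key):
    """Sum payment values (in USD) grouped by the given field name."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["usd"]
    return dict(totals)


# The same collated records can be viewed from either direction.
by_company = total_by(PAYMENTS, "company")
by_country = total_by(PAYMENTS, "country")
```

Once records from different jurisdictions share one structure, a single grouping step serves both views. In practice, most of the effort is likely to lie in getting disparate disclosures into that shared structure in the first place.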
There is also evidence that governments are beginning to consider how to better engage citizens as data users, applying open source and human-centred design principles to support the dissemination of data. In the US, the US Digital Service helped to develop the US EITI data platform. Although the US withdrew from EITI in 2017,27 the EITI Multi-Stakeholder Group (MSG) of Germany adopted the same open source platform.28 In the Philippines, the government has pursued local workshops to stimulate the use of data collected through the EITI process.29
Disclosures secured by regulations are not the only source of data becoming available on the extractives sector. There exists a much wider landscape of data collection with increasing efforts to standardise and align data. Data ranges from high-level economic statistics from institutions like the World Bank and International Centre for Tax and Development (ICTD)30 to extractives data on revenues and contracts published by companies and governments. In some cases, the data narrowly focuses on extractives; however, in other cases, extractives-relevant data is drawn from wider data sources, such as corporate registries (see Chapter 3: Corporate ownership), satellite data (see Chapter 9: Geospatial), and trade statistics.
A key source of comparable data about extractives is EITI. Through its implementation in 51 countries, EITI has established multi-stakeholder-led processes for regular data collection on several topics, such as the production of extractives, revenues paid to governments, and licences issued. In the past, this information was largely captured in PDF reports, but EITI has, since 2016, maintained a database of country-level data. In 2018, this was followed by the release of a public API that provides a feed of more than 300 reporting years from EITI member states and USD 2 trillion in revenue payments.31 In December 2015, EITI had already adopted an Open Data Policy, which, a year later, became part of the EITI Standard. This encourages the development of open-by-default systems and the use of unique identifiers to link data between years and reporting sources.32 As a sign of growing interest, the World Bank published a comprehensive study in 2016 on how the extractives sector could leverage existing open data standards for new disclosures.33
Aligned reporting requirements are key to mapping revenue flows between datasets. Between 2014 and 2017, the International Monetary Fund (IMF) engaged in a series of country pilots to examine how revenue in resource-rich countries could be matched to their Government Finance Statistics Manual. Beyond the studies themselves, the result of this has been a crosswalk between EITI and IMF standards and a new standard data collection template for resource revenue data.34
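As a rough illustration of what a crosswalk does, the Python sketch below maps locally named revenue streams onto a shared set of classification codes and flags any stream it cannot map. The stream names, the codes, and the `classify` helper are all hypothetical placeholders. They are not the actual EITI or IMF codes, which should be taken from the published template.

```python
# Hypothetical crosswalk: revenue stream name -> placeholder classification code.
# These codes are invented for illustration and are NOT real GFS or EITI codes.
CROSSWALK = {
    "royalty": "CODE-ROYALTY",
    "corporate income tax": "CODE-INCOME-TAX",
    "licence fee": "CODE-FEES",
}


def classify(streams, crosswalk):
    """Total reported amounts per classification code.

    `streams` is a list of (name, amount) pairs. Names are matched after
    trimming whitespace and lower-casing. Names with no entry in the
    crosswalk are returned separately so they can be reviewed by hand.
    """
    totals = {}
    unmapped = []
    for name, amount in streams:
        code = crosswalk.get(name.strip().lower())
        if code is None:
            unmapped.append(name)
            continue
        totals[code] = totals.get(code, 0.0) + amount
    return totals, unmapped


# Illustrative amounts only.
reported = [
    ("Royalty", 10.0),
    ("Licence fee", 5.0),
    ("royalty ", 2.0),
    ("Signature bonus", 7.0),
]
totals, unmapped = classify(reported, CROSSWALK)
```

Returning the unmapped streams, rather than silently dropping them, reflects the point made above: revenue flows can only be compared across datasets where the reporting requirements have actually been aligned.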
EITI has also developed a strategy for “mainstreaming” transparency requirements within country data systems, recognising the importance of having government agencies in charge of collecting disaggregated revenue data and sketching out how new and existing information systems, including financial systems and cadastres, can be oriented toward the development of standardised open data using case studies from Kazakhstan, Timor-Leste, Norway, and Mongolia.35
Contracts, licences, and information on fiscal terms are increasingly recognised as critical building blocks for analysing the public long-term benefits of extractives projects. During the past few years, substantive improvements have occurred in the publishing practices of both major companies and governments.36 A focus on contracts has also helped build links to other complex sectors, such as land, infrastructure, and private–public partnerships, and has offered opportunities to work systematically on improvements to open contracting.37 However, key standards, such as the Open Contracting Data Standard (OCDS), are yet to be fully adapted to capture structured data on concessions and extractives contracts, meaning that extracting data from disclosed contract documents continues to rely heavily on civil society and researchers.
Commercial data providers have, during recent years, also served as key actors in the rapid digitisation of extractives records management within government, thanks, in particular, to international donor funding. Yet open data principles have often failed to materialise within these projects. For example, among funded mining cadastres in 15 Sub-Saharan African countries, none have, to date, published the underlying licence data in open formats. These constraints make it more difficult for civil society to scrutinise the data.
However, one example of progress on open data among ICT system providers can be found in the Revenue Development Foundation (RDF), which facilitates the publication of mining licences and revenue data for government ministries with plans to provide a public API in the future.38 RDF reports 5 000 registered users, of which 65% are from mining companies and investors, 8% are researchers, and another 8% are from civil society. To date, RDF has launched four public data portals across Sierra Leone, Liberia, Mali, and Ghana, tracking a total of 17 000 mining licences and 30 000 payments.39
In some cases, countries are going beyond the minimum requirements for disclosure. In Mexico, for example, the EITI MSG developed a data portal with input from civil society, making contracts, production data, and revenue data available to the public. In Myanmar, the EITI MSG expanded its disclosure of disaggregated data on the jade trade, thus enabling new analysis which has achieved coverage in national media.40 In the coming years, implementation of beneficial ownership registers will be a new critical data priority, following commitments made by members of EITI to include the publication of beneficial ownership information as open data by 2020.41 In several countries, reforms are currently underway to establish the legal frameworks that will mandate beneficial ownership disclosure.
In looking at extractives, we should also not ignore forestry and agriculture. In 2017, the World Resources Institute (WRI) examined the transparency of logging, mining, and agricultural concession data in 14 countries. The WRI concluded that, while data disclosure varies significantly by country and sector, its quality is limited by the absence of internationally agreed upon data standards, stating that “civil society can be a significant source of concessions information where official data are unavailable”.42 The forestry sector has also been a major user of satellite data, working in partnership with academia to process global Landsat data to detect changes in land use.43
Although there is a long way to go before all relevant extractives-related data is well-structured, standardised, and open by default, the sector can no longer be considered data-poor, as NRGI’s mapping of the data supply ecosystem illustrates, identifying over 35 repositories and databases for extractive sector open data.44 The ways in which the sector has engaged with the available data are instructive both for understanding what can be done even when data gaps persist and for identifying the most crucial areas for advocacy to further improve data supply.
Measuring extractives governance
Neither the Open Data Barometer, the Open Data Index, nor the Open Data Inventory measures specific variables on extractives governance. However, a number of sector-specific projects are now tracking data availability and offer the chance to continuously monitor trends in the future.
In 2017, the Resource Governance Index (RGI) launched the most comprehensive measurement of extractives governance to date, including assessments of 81 countries (accounting for 82% of the world’s oil production and a significant proportion of mineral extraction).45 A number of the questions assess availability and machine-readability of key disclosures, and the index itself has published raw data across all its 149 questions. The RGI has also provided almost 10 000 supporting documents.46
Launched in 2018, the Responsible Mining Index (RMI) has developed “an evidence-based assessment of mining company policies and practices on economic, environmental, social, and governance issues”, including assessments at the company and mine-site level. The RMI examines the extent to which companies are supplying open data on a number of aspects of their operations.47
Critical to the use of extractives data is an emerging ecosystem of platforms, infomediaries, and capacity-building activities.48 Organisations, such as OpenOil, have deployed the Aleph platform that enables the search of securities exchange filings from extractives companies, which helps data users to find the needle in the haystack of disclosure. In 2016, PWYP, in partnership with OpenOil, launched a global Data Extractors programme dedicated to building skills among CSOs working on extractives accountability issues,49 and the WRI has released mapping platforms, Global Forest Watch and Resource Watch, which provide access to raw data and visualisations from public domain mining datasets.50 All these activities illustrate that the environment for data use in extractives is well-resourced compared to other sectors. The following sections outline four different ways in which data is being put into use.
With the increasing availability of contract information, information on revenue payments, and other key data points, a growing community of CSOs and consultants has engaged in financial modelling to help the public understand how different policies can lead to vastly different outcomes from resource extraction. OpenOil describes its modelling as “building an Excel-based model that recreates the past and forecasts the future cashflows of a specific mining or petroleum (oil and/or gas) project, and evaluates how these cashflows are shared between the resource owner – usually the government – and the investor – usually a mining or oil company – over the life of the project, under the fiscal rules (the fiscal regime) that apply to the project”.51
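The logic of such a model can be sketched in a few lines of Python. The example below is a deliberately simplified illustration, not OpenOil's method. It assumes a fiscal regime with only two instruments (a royalty on gross revenue and an income tax on profit), a constant price and unit cost, and no discounting, cost recovery, or depreciation. All function names, parameters, and figures are hypothetical.

```python
def project_cashflows(production, price, unit_cost, royalty_rate, tax_rate):
    """Split each year's project cashflow between government and investor.

    Assumed (hypothetical) fiscal regime: a royalty charged on gross
    revenue, plus an income tax charged on revenue minus costs minus
    royalty, floored at zero so that losses attract no tax.
    """
    rows = []
    for volume in production:
        revenue = volume * price
        costs = volume * unit_cost
        royalty = revenue * royalty_rate
        taxable_profit = max(revenue - costs - royalty, 0.0)
        income_tax = taxable_profit * tax_rate
        government = royalty + income_tax
        investor = revenue - costs - government
        rows.append({
            "revenue": revenue,
            "costs": costs,
            "government": government,
            "investor": investor,
        })
    return rows


def government_take(rows):
    """Government share of total net cashflow (revenue minus costs)."""
    net = sum(row["revenue"] - row["costs"] for row in rows)
    government = sum(row["government"] for row in rows)
    return government / net if net > 0 else float("nan")


# Illustrative three-year project; every number here is made up.
rows = project_cashflows(
    production=[100, 200, 100],
    price=50.0,
    unit_cost=30.0,
    royalty_rate=0.125,
    tax_rate=0.25,
)
take = government_take(rows)
```

Changing `royalty_rate` or `tax_rate` and re-running the model shows how different fiscal terms shift the split between government and investor. Real project models add many more instruments and inputs, which is why access to contract terms, cost data, and reserve figures matters so much for this kind of work.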
So far, OpenOil has modelled eleven oil and mining projects across several countries, including Indonesia,52 Mongolia, and Brazil.53 Similarly, the Columbia Center for Sustainable Investment (CCSI) has produced an analysis of gold mines,54 and NRGI has used modelling to offer quantitative analysis on sector-wide fiscal regimes in order to contribute to national policy debates in many countries, such as Ghana, Kyrgyzstan, Uganda, Mongolia, and the Democratic Republic of Congo (DRC). In addition, the IMF released details on its modelling in 2015 following calls by civil society for transparency around how their models were built.55
Modelling has proven effective at closing the gap between disclosure and discussion by taking advantage of improvements in data availability, and the work of a critical audience of infomediaries, to increase public scrutiny and debate. This was evident in Guyana, where the publication of the contract for the Stabroek project led to OpenOil publishing a financial model three months later which assessed that the government take from the project (52% at today’s prices) was “low even for frontier provinces”.56 As another example, in the US, the Project on Government Oversight (POGO) used auction data to analyse how the price per acre for extracting oil in the Gulf of Mexico had “declined by 95.7 percent from $9,068 to $391”.57 Among multilateral institutions, the IMF has also contributed to this emerging open community of modelling practitioners with the publication of its official IMF model and methodology.58
Besides the modelling of individual extractives projects, a range of actors are using available data for long-term forecasting. For example, in 2018, NRGI published a tool for visualising IMF forecasting data from the World Economic Outlook “to assess how countries have been coping with resource sector volatility and uncertainty”.59 In several countries, improved access to revenue data has also provided input for improved macro-economic analysis and revenue forecasting with implications relevant to open budget communities. In one example from Mongolia, revenue forecasts from five of the largest mines were used to feed into an openly licensed macro-economic analysis.60
While civil society may, until recently, have referred to the extractives sector as data poor, the private sector has long benefitted from proprietary data provided by highly valued business intelligence companies, one of which was recently sold in a multi-billion dollar deal.61 Companies, such as S&P Global, Rystad Energy, and Wood Mackenzie,62 all produce commercial databases for oil and mining production, which, due to the high subscription fees, are rarely accessible to civil society, journalists, or even governments. Some have hoped that mandatory disclosure data, data from EITI reporting, and contract information will, over time, make these proprietary databases redundant and level the playing field between governments and CSOs on one side and companies on the other. However, government data releases to date have, to some extent, provided a sobering moment. Mandatory disclosure data does not, for example, provide the reserve and cost figures which are often needed for developing financial models. While some governments have, for a considerable period, had processes for obtaining independently produced data on cost and reserves that can be utilised for monitoring costs by operating companies, others have only more recently begun considering these options.63
In the area of oil shipping, there are signs that new types of analytical products are emerging which could improve scrutiny of the commodity trading sector. The TankerTrackers platform has been publishing analysis based on open or public data on production and shipping since 2016, generating regular coverage in major news outlets.64 This success highlights how incumbent commercial data providers may have left a number of users unserved due to high subscription costs for data services and the lack of data provenance within proprietary databases.
Looking ahead, the development of new business models across the extractives business intelligence market could play an important role in addressing data gaps created by the current unaffordable commercial databases, which often lack transparent methodologies and have limited coverage in developing countries. An analysis by the Organisation for Economic Cooperation and Development (OECD) of privately provided datasets for use in transfer price analysis underscores that often these proprietary sources have major limitations, leaving substantial gaps in the market for data products better oriented toward government and multilateral needs.65 As yet, it is not clear whether the combination of entrepreneurs and investors needed to fill these gaps will emerge.
Use of microtasking to uncover oil spills
The limited extent of open data standardisation for extractives can make it difficult to use data to advance accountability. In Nigeria, the National Oil Spill Detection and Response Agency (NOSDRA) began providing online data from oil spill reports submitted by companies operating in the country in 2014.66 When approaching the data, however, Amnesty International was faced with thousands of PDF documents without any standardised information. They turned to their Amnesty Decoders online community of human rights volunteers, working with them to deliver 1 300 hours of data cleaning and to generate the structured data needed to analyse the track record of different companies’ oil spill reports.
The investigation showed that figures reported by companies were vastly different from those of the Nigerian government. The Decoders helped identify massive delays in resolving spills with some spills continuing for months after they were reported. The investigation earned widespread media mentions and illustrated how “microtasking” can leverage volunteers to expand the investigative reach of CSOs.67
Collaboration between civil society and investigative journalists working with new sources of data has helped with the scrutiny of extractives companies and corruption in resource-rich developing countries. Journalists used the Panama Papers to report how the Panama-based law firm, Mossack-Fonseca, served as “a major provider of secrecy to companies involved in extractive industries” in countries such as Algeria,68 and Global Witness has leveraged company register data to report on how former generals in Myanmar benefitted from the opaque jade trade.69 In fragile states, established and new media outlets have worked side-by-side to leverage satellite imagery and open datasets in order to uncover the role of natural resources in conflict. During the rise of the Islamic State in Syria, the Washington Post mapped makeshift oil refineries using satellite imagery from Digital Globe,70 and the Financial Times covered the political economy of conflict in Iraq using local oil price data.71 In Libya, Al Jazeera has mapped oil fields in order to contextualise coverage of the ongoing civil war.72
In reporting with data, civil society and journalists both surface stories for wider public attention and demonstrate what could be done with data on a more systematic basis. The citizens’ journalism collective, Bellingcat, for example, has noted that their “open-source research is only a small attempt to demonstrate that much more work can be undertaken to identify conflict pollution and improve humanitarian response and post-conflict reconstruction work” and has provided workshops and training to humanitarian organisations and UN agencies. They used satellite imagery to examine oil pollution in Syria during the civil war.73 In Peru, the Amazon Conservation Association, a CSO, has documented deforestation from illegal gold mining. Lastly, a civil society-initiated project in Indonesia explored how community-based drone mapping could reveal land grabs related to mining.74
There is a well-developed research community around extractives governance with signs of growing interest in the potential of quantitative and experimental research methods to generate policy-relevant knowledge from new data sources. Pioneering work by AidData, which combines geo-referenced concession data with remote sensing satellite data in order to look at connections between natural resource concessions and local economic development in Liberia, is illustrative of the new kind of approach researchers are exploring.75 This research does not wait for perfect open data but rather brings together data in new configurations to rigorously generate new insights. International institutions are also heavily embedded in the research community, engaging with new data flows. For example, the IMF has explored the potential to monitor real-time fiscal data76 which could be particularly useful for countries with volatile extractives revenues.
Central to the goal of turning research into action are strong connections between researchers, policy-makers, and non-governmental organisations (NGOs). The recently launched Project on Resources and Governance (PRG) describes how it is working to address the scant evidence about effective interventions in the extractives sector by bringing together a “network of social scientists, policy-makers, NGOs, and industry representatives dedicated to finding policies that promote welfare, peace, and accountability in resource-rich countries”.77 In the Disclosure to Development (D2D) programme, led by the International Finance Corporation, a message emerging from research so far is “that data needs to connect to policy and accountability within both government and the private sector”,78 and that this requires local voices and citizen participation. It is the combination of rigorous data analysis and local insight that can ultimately move the sector from being “data rich and information poor” to having actionable know-how for securing better impacts.79 The initiative Leveraging Transparency to Reduce Corruption, headed by the Brookings Institution, released an annotated bibliography in 2018 which “reviewed more than 650 books, papers, and other resources in the transparency, accountability, and participation and/or natural resource space”.80
Ultimately, as this chapter has sought to demonstrate, one of the key roles that increased data accessibility has played is to break down the silos of the different actors, enabling collaboration, knowledge transfer, and creative development of new methods and approaches.
In this chapter, we have highlighted trends in data collection and use which are starting to have an impact at both a global and country level. We see more robust analysis, more data-driven experimentation, and improved cross-sectional work to build skills among extractives-focused NGOs.
The extractives sector remains a highly volatile and capital-intensive sector. While there are different opinions on the pace, there is broad agreement that the sector will continue to invest heavily in new technologies, digitisation, and automation in the future.81 The World Economic Forum has also identified a wide array of technologies which could impact the extractives value chains, including artificial intelligence, robotics, privately owned trading platforms drawing on distributed ledgers (blockchain technology), and automation across the supply chain.82 This indicates that, while progress has been made in the last decade on data openness, private sector investments in data collection and management across extractives operations are likely to expand. At this stage, however, it is unclear how CSOs, activists, journalists, and policy-makers will be able to keep up with the pace of change to maintain access to data and to scrutinise a rapidly changing market.
As the extractives transparency movement heads into its third decade, and extractives open data enters its second, there is a need to draw on learning so far to sharpen our focus on clearly defined problems. Improvements are needed in providing reliable data, generating analysis which can address timely policy challenges, and finally getting the analysis and policy recommendations into the hands of broad-based civil society campaigns, journalists, and, ultimately, policy-makers. There are major benefits to be sought from linking up the sector with other “Follow the Money” efforts,83 building alliances to trace funds across extractives, budgets, contracting, aid, and service delivery. While substantive research is yet to emerge beyond initial mapping,84 there are opportunities here to go beyond breaking down silos in a single sector and to build bridges between groups concerned with fundamental questions of how public resources are managed. There is also a need to strengthen links with sustainability and climate change networks. Notable efforts are being made to connect analysis of extractives revenues, fiscal policy, and implications for climate change, for example, in areas such as fossil fuel subsidies85 and through the Green Fiscal Policy Network.86 Lastly, the emergence of the Task Force on Climate-related Financial Disclosures has led to the development of a framework for company disclosure of climate risks with commitments from 1 800 major companies.87 Moving forward, these climate disclosures will be an important source for analysis.
As this chapter has shown, when machine-readable open data is available, infomediaries, civil society, and governments can leverage it to develop analysis and evidence-based policy recommendations. Yet, it is important to recognise that the road from data acquisition to analysis and policy impact remains highly complex and dependent on many factors, such as the political context, the capacity of civil society and other potential users, and opportunities for decision-making based on the results of analysis. The extractives sector will be able to continue to learn from peer research across the open government space.88 Scaling up the use of extractives data is not necessarily about building mass-movements or substantially increasing investment in open data, but it does involve supporting the development of specialist skills among key stakeholders and leveraging those skills effectively, for example, by expanding work on fiscal modelling, which, if enabled by open data, would bring extensive domain expertise to bear on specific weaknesses in extractives governance. The future of open data in extractives will also rest on how well regulations for disclosure are implemented in practice by national agencies and how far multi-stakeholder initiatives advance interoperable disclosure requirements. The importance of getting the technical definitions and details right should not be underestimated, and this will demand much deeper collaboration between open data and standards specialists, policy advocates, and policy-makers.
Overall, if the multi-stakeholder approach that has characterised extractives governance work over recent years can be sustained, and open data specialists can be further integrated into the process, then the next decade should see open data become an unprecedented tool for extractives governance.
Further reading
Natural Resource Governance Institute. (2017). 2017 Resource Governance Index. https://resourcegovernanceindex.org/about/global-report
Ray, N., Lacroix, P.M.A., Giuliani, G., Upla, P., Rajabifard, A., & Jensen, D. (2016). Open spatial data infrastructures for the sustainable development of the extractives sector: Promises and challenges. In D. Coleman, A. Rajabifard, & J. Crompvoets (Eds.), Spatial enablement in a smart world (pp. 53–69). Quebec City: Global Spatial Data Infrastructure Association Press. https://archive-ouverte.unige.ch/unige:90248
Saavedra, S. & Romero, M. (2017). Local incentives and national tax evasion: The response of illegal mining to a tax reform in Colombia. Job Market Paper. Stanford, CA: Stanford University. http://web.stanford.edu/~santisap/Saavedra_JMP.pdf
Srivastava, N., Agarwal, V., Bhattacharjya, S., Gopalakrishnan, T., Meenawat, H., Nayak, B., & Soni, A. (2014). Open government data for regulation of energy resources in India. New Delhi: The Energy and Resources Institute. http://opendataresearch.org/sites/default/files/publications/Full%20-%20TERI%20OGD%20and%20energy%20resources%20July%202014-print.pdf
World Bank. (2016). Options for data reporting – EITI Standard, 2016: The good, the better and the best. Washington, DC: World Bank. http://documents.worldbank.org/curated/en/793601469102170609/pdf/107171-WP-P152662-PUBLIC.pdf
Anders Pedersen is a Senior Open Data Officer at the NRGI. At NRGI, Anders has overseen the development of global data platforms and technical assistance on open data and data-driven decision-making in more than a dozen countries across Asia, Latin America, and Africa. He has engaged with global stakeholders, such as IMF, Open Contracting Partnership, the World Bank, and EITI, on interoperability, data standards, and open licensing. Before joining NRGI, Anders worked for more than a decade in the field of open data, international development, and journalism at the Open Knowledge Foundation.
How to cite this chapter
Pedersen, A. (2019). Open data and extractives. In T. Davies, S. Walker, M. Rubinstein, & F. Perini (Eds.), The state of open data: Histories and horizons (pp. 119–136). Cape Town and Ottawa: African Minds and International Development Research Centre. http://stateofopendata.od4d.net.
This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. It was carried out with the aid of a grant from the International Development Research Centre, Ottawa, Canada.
Approximately 80% of all government data contains some reference to location.
Opening up geospatial data was a key early driver of open data advocacy, and there has been significant progress in opening up this type of data. However, much of government geospatial data remains under restrictive intellectual property agreements.
Work on open geospatial data technology and infrastructure pre-dates the concept and implementation of open data, yet there are relatively weak links between the open geospatial and other open data communities. Stronger links could build critical capacity for spatial analysis within open data communities.
Mapping visualisations are a popular way of presenting open data, yet the spatial analysis carried out is often unsophisticated. Relationships that appear on a map may not be statistically significant. It is important to recognise that geographic relations can be shown in other forms, such as tables and charts.
Fifteen years ago, most users experienced online maps much as they might their paper counterparts: flat non-interactive images for browsing geography. In 2005, Google Maps changed that, giving rise to enthusiasm for the mapping mashup, where data (often taken from public datasets) is located on an interactive scrollable and zoomable map. Around the same time, OpenStreetMap, founded in 2004 in response to ongoing frustration at the lack of open geographic data in the United Kingdom (UK), provided a platform for the collection and display of mapping data unencumbered by intellectual property (IP) restrictions.1 The move from large proprietary desktop Geographic Information Systems (GIS) to increasingly open access to geospatial2 data appeared to be underway.
Mapping visualisations have been strategic assets in the popularity of open data, and they remain one of the public entry points for engaging with open data. The mapping portal of the City of Phoenix3 in the United States (US) demonstrates the type of geospatial data (and prepared maps) available through a typical North American municipal data portal, including property boundaries, zoning information, traffic volumes, and recreation areas (see Figure 1). A similar site can be found for Manchester, England,4 although geospatial data and map access come with terms and conditions that restrict how that data can be used.
Both the potential demonstrated by mapping mashups and user interfaces and the desire for access to valuable geospatial datasets held by governments and government agencies can be seen as driving forces in the development of the open data movement. But what of geospatial data today? Is the data now widely open, accessible, and used? And what progress has been made in unlocking the potential of geospatial data for analysis and improved policy-making?
While much progress has been made in the availability of data, and in the development of tools to visualise it, substantial work is needed to better connect geospatial and open data communities, to equip creators and users of geospatial data with the critical skills (and technical platforms) needed to move beyond simply mapping, and to gain the full benefits of geospatial data analysis. There also are significant risks from the wider use of geospatial data that need to be more directly addressed. Ultimately, advances made in terms of sheer data availability and infrastructures are currently counterbalanced by significant stalemates in terms of analytical approaches to geodata, as well as ownership and privacy risks.
Figure 1: Screenshot of the City of Phoenix – Mapping Open Data platform
It is estimated that 80% of all government data has some reference to location.5 Almost every chapter in this volume touches upon geospatial data in some form. Geospatial content can be found in datasets on subjects as diverse as parks, refugee camps, financial transactions, natural resource distributions, and socioeconomic statistics. Many uses of open data rely on being “mapped” (i.e. attached) to basic geographic framework data.6 For example, socioeconomic statistics, like population, may be mapped on top of administrative boundaries. Data on soil quality may be attached to digital elevation models to model erosion, and that same soil data may be compared to geographically intersecting data on land ownership and land subsidies. Without their geospatial component, many open datasets would have much reduced impact.
Figure 2: Simple illustration of geographic layers
Source: Author
Mapping generally involves presenting geospatial data alongside a geographic layer. Geographic layers are datasets that are essentially outlines and may or may not be open data themselves. These layers include jurisdictional boundary files (e.g. country, city, school catchment areas, and watershed districts) or linear features like rivers or roads. For completeness, there also are geographic point layers, such as centres of cities or locations of known elevation like mountain peaks. Geographic layers may also include remotely sensed imagery. Imagery can function as a backdrop onto which geospatial data is overlaid (e.g. logging operations in forested areas). Like other geospatial data, remotely sensed imagery can be analysed alone or in combination with other open datasets to identify areas of drought, land use, or pollution.
Many practitioners working with open data consider geography primarily in terms of x and y coordinates, usually expressed as longitude and latitude, respectively. It is important to recognise that there are numerous types of “coordinates”. These include direct location references such as latitude/longitude, postal addresses, or GPS traces. There also are indirect references to location, such as place names (e.g. colloquial neighbourhood names, or official country or region names) that can be turned into a set of coordinates using a gazetteer or a lookup database.
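A gazetteer lookup of the kind described above can be sketched in a few lines of Python. The place names, the approximate coordinates, and the `geocode` helper are illustrative assumptions, not part of any real gazetteer service; a production gazetteer would also need to handle alternative spellings and ambiguous names.

```python
# Hypothetical miniature gazetteer: place name -> (latitude, longitude).
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "cape town": (-33.92, 18.42),
    "ottawa": (45.42, -75.70),
    "phoenix": (33.45, -112.07),
}


def geocode(place_name):
    """Return (lat, lon) for a place name, or None if it is not listed."""
    # Normalise case and whitespace so "  Cape   Town " matches "cape town".
    key = " ".join(place_name.lower().split())
    return GAZETTEER.get(key)


print(geocode("  Cape   Town "))  # found after normalisation
print(geocode("Atlantis"))        # not in the gazetteer, so None
```

Returning `None` for unknown names, rather than guessing, keeps unmatched records visible so they can be reviewed instead of being silently placed at the wrong location.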
The vast diversity of geospatial data may be more or less open along a number of dimensions. Data may be free to browse but not to download. Or data may be free to download but provided under restrictive licences that limit reuse. Or data may be openly licensed but only available in formats that require proprietary software or that use proprietary referencing systems. To understand open geospatial data, we need to ask: What kind of data is this? and How open is it?
Many kinds of geospatial data in terms of structure, representation, and analysis
There are many different kinds of geospatial data, and for any geographic feature, choices are made about how to represent it. The same feature might be represented using points, lines, polygons, or pixels. This choice impacts the kind of analysis that is possible, the technologies that can be used in analysis, and the biases to watch out for when drawing conclusions from the data.
Figure 3 shows how a feature might be represented as a vector (a collection of linked points) or a raster (a collection of pixels scaled to a particular resolution with each individual pixel encoding information from its immediate area).
Figure 3: Different ways to represent the same geographic feature
Source: Author
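To make the vector and raster distinction concrete, the following minimal sketch converts a vector polygon into raster grids at two resolutions. The triangle, the grid sizes, and the function names are assumptions chosen for illustration; the rule used here (a cell is filled when its centre falls inside the polygon) is only one of several possible rasterisation rules.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon (a list of (x, y) vertices)?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterise(polygon, width, height, cell_size):
    """Return a grid (top row first) with 1 where the cell centre is inside the polygon."""
    grid = []
    for row in range(height):
        y = (height - row - 0.5) * cell_size  # centre of this row of cells
        grid.append([
            1 if point_in_polygon((col + 0.5) * cell_size, y, polygon) else 0
            for col in range(width)
        ])
    return grid


# Vector representation: three linked points forming a triangle.
triangle = [(0.0, 0.0), (4.0, 0.0), (2.0, 4.0)]

# Raster representations of the same feature at two resolutions.
coarse = rasterise(triangle, width=2, height=2, cell_size=2.0)
fine = rasterise(triangle, width=4, height=4, cell_size=1.0)

for row in fine:
    print("".join("#" if cell else "." for cell in row))
```

The coarse grid reduces the triangle to a single filled row, while the finer grid recovers more of its shape. This is the sense in which the choice of representation and resolution shapes what later analysis can detect.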
Figure 4 illustrates how information is linked to geography for presentation (mapping) and analysis. Geographic layers are usually not directly accompanied by geospatial data. Instead, to a polygon (e.g. a country boundary), one could add (join) datasets, such as population data, information on political control, or catchment areas for particular service provision, and to a point (e.g. lat, lng) one could add details of public services provided at that location.
Figure 4: How information can be linked to geography for mapping and analysis
Source: Author
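The join between a geographic layer and a separately published attribute table can be sketched as follows. The district codes, polygon coordinates, population figures, and the `join_to_layer` helper are all hypothetical; real projects would typically rely on a GIS library, but the underlying logic is a key-based lookup.

```python
def join_to_layer(layer, records, key):
    """Attach each record to the feature that shares its key.

    Returns (joined, unmatched): a copy of the layer with the record fields
    added, and the list of records whose key matched no feature.
    """
    joined = {code: dict(feature) for code, feature in layer.items()}
    unmatched = []
    for record in records:
        code = record[key]
        if code in joined:
            joined[code].update({k: v for k, v in record.items() if k != key})
        else:
            unmatched.append(record)
    return joined, unmatched


# Hypothetical polygon layer keyed by district code (coordinates are made up).
districts = {
    "D1": {"name": "North", "polygon": [(0, 0), (2, 0), (2, 2), (0, 2)]},
    "D2": {"name": "South", "polygon": [(0, -2), (2, -2), (2, 0), (0, 0)]},
}

# Hypothetical attribute table published separately from the boundaries.
population = [
    {"district": "D1", "population": 12000},
    {"district": "D2", "population": 8500},
    {"district": "D9", "population": 300},  # no matching boundary
]

joined, unmatched = join_to_layer(districts, population, key="district")
print(joined["D1"]["population"])
print(unmatched)
```

Reporting unmatched records matters in practice: mismatched or outdated codes between a boundary file and a statistical table are a common reason for features to go missing from a map without any visible error.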
However, geospatial analysis does not require pre-existing boundaries like countries or cities. This can be useful when the boundaries are not available or when mapping onto those boundaries would be misleading (e.g. mapping incidences of crime onto areas with very different populations). Hexbinning, shown in Figure 5, is an approach to handle point data in these cases, creating a new geographic layer of arbitrary shapes into which the points can be aggregated.
Figure 5: Hexbinning creates a new layer that allows data points to be mapped when boundaries are unavailable or when mapping the available boundaries could mislead.
Source: Author
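One common way to implement hexbinning is to convert each point to the index of the hexagon that contains it and then count points per hexagon. The sketch below assumes pointy-top hexagons on a flat plane, with `size` as the distance from a hexagon's centre to its corners, and uses axial coordinates with cube rounding. The function names and sample points are illustrative and do not describe how Figure 5 was produced.

```python
import math
from collections import Counter


def hex_cell(x, y, size):
    """Return the axial (q, r) index of the pointy-top hexagon containing (x, y)."""
    # Fractional axial coordinates of the point.
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Cube rounding: round all three cube coordinates, then repair the one
    # with the largest rounding error so that they still sum to zero.
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return (rx, rz)


def hexbin(points, size):
    """Count how many points fall into each hexagon."""
    return Counter(hex_cell(x, y, size) for x, y in points)


# Hypothetical incident locations in projected (planar) units.
incidents = [(0.1, 0.1), (-0.2, 0.05), (1.7, 0.1), (0.9, 1.4)]
counts = hexbin(incidents, size=1.0)
for cell, n in sorted(counts.items()):
    print(cell, n)
```

Because every hexagon has the same area, the resulting counts are comparable across cells in a way that counts per administrative unit are not. The sketch assumes the input coordinates are already in a planar projection; binning raw latitude and longitude would distort cell sizes away from the equator.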
The last decade has seen substantial strides in opening up geospatial datasets. Evidence suggests this has brought significant social and economic value. For example, in 2013, the Government of Denmark, through its Basic Data programme, released digital mapping data free of charge under an open licence. A follow-up study in 2017 estimated that this had led to DKK 3.5 billion (approx. USD 495 million) in socioeconomic value in the preceding year.7 It is estimated that making the US Landsat satellite imagery freely available in 2009 accrued USD 1.8 billion in annual value to the economy, whereas charging for access would lead to substantial inefficiencies and loss of value.8 In the UK, open data policy has led the mapping agency, the Ordnance Survey, to open new datasets. The release of geospatial data responded to advocacy that focused on gains to the economy from a more open approach to this data.9 It has long been argued that Canada suffered significant losses due to the government’s early reticence to open geospatial data,10 which is being remedied.
In the US, efforts to open up federal geospatial data pre-date most consideration of open data worldwide. The federal government, as well as subnational jurisdictions of the US (states, cities), tends to publish geographic datasets as integral parts of their open data portals. Geospatial data is arguably the first open (government) data because of the establishment of national or subnational spatial data infrastructures (NSDIs), the first one being the Australian Land Information Council in 1986.11 NSDIs are outgrowths of “the technology, policies, standards, and human resources necessary to acquire, process, store, distribute, and improve utilisation of geospatial data”.12 Geospatial data infrastructures tend to require high levels of interoperability in terms of standardisation to function. These datasets likely originate in different agencies with varying practices of data collection, update schedules, and definitions. Full standardisation requires geospatial datasets to share the same geographic projection, coordinate system, spatial extent, update schedule, and data definitions. It is by no means easy to coordinate data so that layers “lie on top” of each other in alignment.
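As a small illustration of why a shared coordinate system matters, the sketch below converts points stored as longitude and latitude in degrees into Web Mercator metres, the projection used by many web mapping platforms, using the standard spherical formula. The layer name and sample points are hypothetical. Until both layers use the same system, their coordinates cannot be overlaid meaningfully.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used by Web Mercator


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator x/y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# A hypothetical point layer stored as (longitude, latitude) in degrees ...
clinics_lonlat = [(18.42, -33.92), (-75.70, 45.42)]

# ... converted so it can be overlaid on a layer stored in Web Mercator metres.
clinics_mercator = [lonlat_to_web_mercator(lon, lat) for lon, lat in clinics_lonlat]
for (lon, lat), (x, y) in zip(clinics_lonlat, clinics_mercator):
    print(f"({lon}, {lat}) -> ({x:.0f}, {y:.0f})")
```

Projection is only one part of the standardisation described above; aligning update schedules and data definitions across agencies is an organisational task that no formula resolves.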
Spatial data infrastructures did not necessarily originate as open platforms. Many were designed as government-to-government data sharing platforms, although several promoted the idea that the data should be accessible to a range of applications and support economic development. Openness of geospatial data remains uneven across the world. The latest Open Data Index13 identifies just 12 countries where governments provide fully open national geospatial data, and only one (Brazil) is not in the World Bank’s “High-Income Economies” category. There is movement among numerous countries to increase openness (e.g. Indonesia’s widely discussed One Map initiative). Progress has been slow and mostly focused on rationalisation of geospatial data management. Opening up geospatial data is not simply a matter of applying a licence to existing datasets, but also involves the adoption of policies, standards, and human resources specific to geospatial data.
Encouraged by the International Open Data Charter, and noting the value of an “open by default” approach, the Group on Earth Observation adopted open data principles in 2016,14 seeing this as the natural step forward from their existing data sharing regime (established in 2006) and justifying this shift on the basis of the economic, social, governance, education, research, and innovation value.15 The European Union’s (EU) INSPIRE16 directive has driven the inclusion of geospatial data features in a number of national data portals and extensions for geospatial data to the open source CKAN software.17 Many NSDIs have had little integration into the open data landscape. However, the EU’s initiative demonstrates how governments may integrate parallel tracks of activity between the open data and geospatial communities.
Gaps in geospatial data are increasingly addressed through the use of cross-border satellite imagery available on digital earth mapping platforms. Some of this data is sourced from government. The Africa Regional Data Cube, launched in May 2018, shares many features of an NSDI in terms of standardisation and provides access to free satellite imagery for Kenya, Senegal, Sierra Leone, Ghana, and Tanzania. It builds on an open source “data cube” platform that compresses pre-processed imagery to reduce the otherwise prohibitive costs of data transfer, storage, and analysis.18
Government data also is being augmented by the private sector and civil society, and some of these new geospatial datasets could become open data. Firms like DigitalGlobe provide imagery derived from commercial satellites. Whereas satellite coverage may be universal, street mapping remains limited by either the availability of non-proprietary street-mapping data or volunteer contributions. Much of this data is licensed to proprietary platforms like Google Maps. Users can zoom into most places on Earth and see road layouts or satellite imagery. To access the same data on other platforms to support applications or analysis can often be prohibitively expensive. For instance, software application programming interfaces (APIs) may be available but based on per-access pricing,19 or sudden price changes may leave data out of reach of users seeking to map open data coordinates or build open data-related applications and businesses.20 It is important to remember that free to use, but non-open, platforms are subject to prevailing business models of tech industries. Parts of Microsoft’s Bing mapping division were sold to Uber in 2015, and Google increased prices for its mapping APIs up to fourteenfold in 2018. There is a precariousness to basing one’s mapping applications on a specific non-open platform. Fortunately for data consumers, the last decade also has seen the emergence of tools like Leaflet,21 which enable digital mapping using a variety of geospatial data providers. Companies like MapBox22 provide a commercial offering but are committed to building on top of open source tools and data.
Open geospatial data also is being created through crowdsourcing. The largest platform, OpenStreetMap, “is built by a community of mappers that contribute and maintain data about roads, trails, cafés, railway stations, and much more, all over the world”.23 By comparing CIA World Factbook data on road length in a country with OpenStreetMap data, Maron and Channell found that some countries have 100% coverage of major roads.24 In Asia and China coverage is more limited. In India, for example, only 21% of the road network has been digitised on OpenStreetMap.25
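The kind of comparison described above reduces to a simple ratio of mapped road length to a reference road length. The sketch below uses made-up country names and figures, and capping the result at 100% is an assumption for cases where the crowdsourced total exceeds the reference figure; it is not a description of the method used in the cited study.

```python
def road_coverage(mapped_km, reference_km):
    """Share of the reference road length that has been mapped, capped at 1.0."""
    if reference_km <= 0:
        raise ValueError("reference_km must be positive")
    return min(mapped_km / reference_km, 1.0)


# Hypothetical figures in kilometres, for illustration only.
countries = {
    "Country A": {"mapped_km": 52000, "reference_km": 50000},
    "Country B": {"mapped_km": 10500, "reference_km": 50000},
}

for name, d in countries.items():
    share = road_coverage(d["mapped_km"], d["reference_km"])
    print(f"{name}: {share:.0%}")
```

A ratio of this kind says nothing about positional accuracy or how recently roads were mapped, so it is better read as a rough indicator of completeness than as a measure of data quality.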
Use of private or crowdsourced data reflects the costs of collection and maintenance of geospatial data and related infrastructures. When geospatial data is funded directly from government budgets, rather than through cost-recovery (i.e. charging users for use of the data as a method of supporting government data collection and maintenance), access is at greater risk of budget cuts.26 This can lead to pressure from agencies working with geospatial data to develop or retain financing regimes. The cost of data collection has led a few governments, particularly in North America, to explore partnerships with private sector firms to collect data through projects, such as Google Waze, Strava Metro, and Uber Movement.27 Ironically, these datasets frequently originate from civil society or individual citizens, but ownership is claimed by the firms providing the platforms for data collection. This can introduce new sources of proprietary data in spatial data infrastructures at the same time that other aspects of those infrastructures may be opening up. Additionally, the inclusion of privately sourced or crowdsourced data invariably shifts control from government in terms of data accuracy, coverage, and timeliness of edits and updates. This will increase the risk to governments (real or perceived), particularly if that data is central to government operations.28
Four examples of open geospatial data
Thousands of examples of open geospatial data projects exist. These include:
Crime Maps presenting data from the police and justice system (see Chapter 4: Crime and justice) for individuals to see recorded crime incidents and rates in their communities.
Community assets mapping such as the MySociety.org “Keep it in the Community” project that is mapping an England-wide register of community assets and exploring issues around ownership of community buildings and land.
Disaster relief and resilience initiatives such as the work of Humanitarian OpenStreetMap Team (HOT) which mobilises volunteers to remotely map disaster-hit areas in support of responders. The OpenDRI (Open Data for Resilience Initiative) seeks to reduce vulnerability to natural hazards and impacts of climate change.29
Aid mapping including work to understand patterns of aid distribution and the geopolitics of aid.30
For all the progress that has been made in terms of data openness, four issues present notable challenges for work with open geospatial data.
First, numerous countries face challenges in opening key datasets due to IP restrictions. The UK’s mapping agency, the Ordnance Survey, and postal service, Royal Mail, have long been restricted in how they can open up their geospatial data due to Crown Copyright. Ownership of all or part of the IP was further complicated when the management of the postcode database was outsourced to a private firm. The situation shows signs of improvement with a 2015 open data policy supporting a “presumption to publish”.31 However, efforts to create an open address register for the UK have been put on hold, which places this critical lookup dataset out of the reach of many open data projects.32 Canada Post has maintained strict IP protections on its postal code database. In Canada, a one-person firm, Geolytica, built an application that would reverse engineer Canadian postal code boundaries using computational geometry and crowdsourcing. It was done as a proof-of-concept, but the database was also opened up to the public. Geolytica’s efforts led to it being sued by Canada Post for violating the latter’s ownership of the phrase “postal code” and the underlying content.33
The value of spatial data as IP means that firms are often interested in acquiring exclusive rights to it. Another example from Canada illustrates this. The Ontario-based firm, Teranet, purchased the rights to land registries (cadastres) around the world. In exchange for those rights, the firm maintains the registry datasets and then licenses access back to local and regional governments.34 This represents not just private provision of the service but private ownership of the data. There is a paucity of reliable data on how many countries have substantial private ownership of IP in their spatial data infrastructure, yet this is likely to be an important area to track over the coming decade if further gaps are to be avoided in the open geospatial data landscape.
A second key challenge relates to privacy and security. When it concerns data about individuals, location data can often pierce privacy protections and enable surveillance. A combination of just three variables (i.e. gender, birthdate, US zip code) has been found sufficient to identify individuals by name in the US.35 Individuals increasingly leave geographic data traces on the web through their use of fitness trackers, location-stamped photographs, or a myriad of other location tracking apps. The existence of this data can jeopardise the anonymity of other datasets that might contain coinciding location and timestamps. Methods exist to maximise privacy while preserving the ability to analyse data (e.g. through geographic masking).36 However, the ability to deanonymise data will only improve as artificial intelligence and machine learning are applied to open data.37 Whereas open datasets generally do not describe individual persons, the growing availability of geo-indexed data needs to be accounted for when creating, sharing, and using open datasets.
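As a rough illustration of what a geographic mask can look like, the sketch below applies random displacement within a ring (sometimes called "donut" masking): each point is moved at least a minimum and at most a maximum distance in a random direction. This is one simple masking approach chosen for illustration and is not necessarily the method discussed in the cited literature. The function name, the radii, and the sample coordinates are assumptions, and the coordinates are assumed to be planar (metres), not latitude and longitude.

```python
import math
import random

def donut_mask(x, y, min_radius, max_radius, rng):
    """Move a point a random distance in [min_radius, max_radius] in a random direction.

    Coordinates are assumed to be planar (e.g. metres in a projected system),
    not raw latitude/longitude degrees.
    """
    angle = rng.uniform(0.0, 2.0 * math.pi)
    distance = rng.uniform(min_radius, max_radius)
    return x + distance * math.cos(angle), y + distance * math.sin(angle)

rng = random.Random(42)  # fixed seed so the sketch is repeatable
original = (1000.0, 2000.0)  # hypothetical household location
masked = donut_mask(*original, min_radius=50.0, max_radius=200.0, rng=rng)

# How far the published point now sits from the true location
displacement = math.hypot(masked[0] - original[0], masked[1] - original[1])
```

The minimum radius prevents a masked point from landing on the true location, while the maximum radius limits how much spatial analysis is distorted. Choosing those radii is a judgment about the trade-off between privacy and analytical value.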
Standardisation presents a third major challenge for greater interoperability in the world of geospatial data. The most commonly used standard for geography is the “atomic standard” of the coordinates, latitude and longitude. Multiple alternatives exist to lat/long (e.g. polar coordinates are better for people near the poles). Considering coordinate systems requires contemplating standards in geographic projections. Inconsistent projections prevent one dataset from being correctly overlaid onto other data layers and may inhibit other operations like calculating travel distances. Polygons like jurisdictional boundaries also generate complexity related to standards. The schema.org standards for place, which contain at least ten different relationships of containment, overlapping, intersection, and equality between areas, provide a sense of how complicated it is to structure geometries beyond simple point locations.38 Maintaining the quality of geographic data and ensuring standards are adopted correctly is not trivial. Unlike in other sectors, the problem is not the availability of standards (e.g. the Open Geospatial Consortium maintains over 30 open standards for geographic data).39 What is needed is an educated understanding of their adoption. Otherwise, instead of creating an integrated world of geospatial data, open data initiatives could lead to a soup of misaligned points and polygons that are difficult to distinguish.
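A small sketch can show why coordinate systems matter for distance calculations. It compares a great-circle (haversine) distance on a spherical Earth with a naive calculation that treats degrees of latitude and longitude as equal planar units. The function names, the spherical radius, and the sample coordinates are illustrative assumptions; production work would normally rely on a dedicated geodesy or projection library.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/long points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def naive_degree_distance_km(lat1, lon1, lat2, lon2):
    """Treats degrees as planar units of equal size (a common mistake)."""
    # Kilometres per degree along a meridian or the equator on the sphere
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360
    return math.hypot(lat2 - lat1, lon2 - lon1) * km_per_degree

# One degree of longitude at the equator versus at 60 degrees north
equator = haversine_km(0, 0, 0, 1)
north60 = haversine_km(60, 0, 60, 1)
naive60 = naive_degree_distance_km(60, 0, 60, 1)
```

In this sketch the naive figure at 60 degrees north is roughly double the great-circle figure, because meridians converge towards the poles. An error of that size would carry through into any travel-distance or proximity analysis built on it.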
This leads to the last challenge: the lack of interaction between open data communities and the communities that traditionally work with geographic data. Open geospatial data (via WAIS servers, NSDIs, and Al Gore’s articulation of a Digital Earth40) predate the concept and implementation of open data. Open data advocacy in several countries was sparked by a desire for geospatial data, as in the UK FreeOurData campaign41 and Canada’s DataLibre.42 Nonetheless, there has been a gulf between the early open data movement, with its focus on quantity over quality, and the geography/geomatics community, which, by 2010, was already well established and considering issues of standardisation and data management. We have seen plenty of missed opportunities to bridge the gulf, which has resulted in a bifurcation in skills for geospatial data handling that impedes both the opening, and the effective use, of geospatial data. In particular, this has led to the open data world’s focus on mapping but very little focus on geographical analysis. There remains considerable potential for increased interaction between the two communities to enhance skills and analysis.
Mapping is undoubtedly important, but visualisation of data is just one strategy of many. There has been a tendency among open data practitioners to map and make inferences based on visual inspection of geospatial datasets. However, these ostensible relationships are often not statistically significant. The ability to map open data in the absence of the critical skills to analyse it correctly can lead to problems and even incorrect policy prescriptions. Expanding skills for detailed spatial statistics and analysis, to allow conclusions to be drawn from open datasets and to create new, improved maps based on the results of that analysis, should be a high priority in the open data community. General data literacy capacity has grown, but the availability of tools, resources, and outreach to promote geospatial data literacy is much more limited. The current lack of analytical capacity represents a critical bottleneck to the effective use of open geospatial data.
For example, one large part of open geographic data handling concerns what is known as “feature geometry”. Most open data containing geospatial attributes is point-based. That is, an entity’s location (e.g. a park, a government transaction, a building project, or a refugee settlement) is represented by a single x, y coordinate. The choice of which points to use is not always obvious. Should the location be a headquarters of a local relief agency or the location where activities are occurring? Many of these points reflect what is called a central tendency or the centroid (a geometric centre of an area). Depending on the shape of the area (e.g. a crescent), a centroid could actually appear outside the area. The simple consideration of which location is mapped can affect the message a map communicates.
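The point about centroids falling outside their own area can be checked with a short sketch. It computes the area-weighted centroid of a blocky, C-shaped polygon (standing in for a crescent) using the shoelace formula, then tests whether that centroid lies inside the polygon with a ray-casting test. The polygon and the function names are invented for illustration.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)

def point_in_polygon(point, vertices):
    """Ray-casting test: count edge crossings of a ray going right from the point."""
    x, y = point
    inside = False
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge straddles the horizontal line through the point
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside

# A C-shaped area opening to the right (hypothetical coordinates)
c_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (4, 3), (4, 4), (0, 4)]

centroid = polygon_centroid(c_shape)
centroid_is_inside = point_in_polygon(centroid, c_shape)
```

For this shape the centroid falls in the open notch of the "C", so a map marker placed there would sit outside the area it is meant to represent.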
Numerous forms of analysis should not rely on point location at all. Many features, such as the geographic distribution of poverty or of crop types, are not natural distributions, easily interpreted through the use of latitude and longitude, but are shaped by politics. Such features are more appropriately described by areal measures. For example, poverty should be reported by the political boundary of a township. Unlike geographic points, working with jurisdictional data can be difficult because boundary file availability and discoverability are limited and there may be disputes over borders. Tools for working with containment (polygons) are less user-friendly, in many cases, than those for generating point-based online maps. Similar issues exist for raster datasets (e.g. satellite imagery), which are especially important for rural areas.43 Working with raster data, whether it is satellite data or drone data, generally requires more extensive experience and expensive software than other types of data.
A common alternative to mapping by jurisdiction is through aggregation and clustering. Two popular aggregation methods are hexagonal binning (hexbins) and rectangular grids, which rely on the use of regular artificial areas into which points are counted. A different approach is clustering points through hotspot analysis, which infers the geospatial extent of a phenomenon (e.g. a cluster of disease outbreaks) and differentiates statistically significant clusters from non-significant clusters. Many tools can now automate aggregation and clustering, but tools need to be accompanied by a critical understanding of the way the choice of approach affects analysis. Geographers have widely discussed the modifiable areal unit problem (MAUP)44 whereby aggregation units are understood as definitionally artificial and the results of data aggregation depend on the choice of the unit. Results (e.g. counts, rates, densities, and correlations) are influenced by the shape and orientation of the unit (e.g. slight tilting or enlarging of a rectangular grid), as well as by the way the units are combined (scale). O’Loughlin et al. (2014), for example, use open data on a rectangular grid to map violence, heat, and precipitation across the African continent.45 They note limits in the data and its aggregation, even as they perform analyses at a finer aggregation than previously conducted to better understand climate conflicts. Tools exist to improve data literacy with regard to problems introduced by spatial aggregation.46 The challenge is promoting their adoption outside the geography community and within the much wider community of open data users who may otherwise adopt naive analytical strategies. No aggregation is perfect, including those using jurisdictional boundaries. It is important to broaden critical understanding of the malleability of aggregations in the results they deliver.
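The sensitivity of aggregated results to the choice of unit can be demonstrated with a minimal sketch. The same five points are counted into a rectangular grid twice, once with the grid anchored at the origin and once with the grid shifted by half a cell. The points, the cell size, and the function name are assumptions made for illustration.

```python
from collections import Counter

def grid_counts(points, cell_size, origin=(0.0, 0.0)):
    """Count points per square grid cell; cells are indexed by (column, row)."""
    ox, oy = origin
    counts = Counter()
    for x, y in points:
        cell = (int((x - ox) // cell_size), int((y - oy) // cell_size))
        counts[cell] += 1
    return counts

# Four incidents clustered around (1, 1) and one isolated incident (hypothetical data)
points = [(0.9, 0.9), (1.1, 0.9), (0.9, 1.1), (1.1, 1.1), (3.5, 3.5)]

aligned = grid_counts(points, cell_size=1.0)
shifted = grid_counts(points, cell_size=1.0, origin=(-0.5, -0.5))

busiest_aligned = max(aligned.values())
busiest_shifted = max(shifted.values())
```

With the aligned grid, the cluster straddles four cells and no cell holds more than one point. With the shifted grid, all four clustered points fall into a single cell. The underlying data is identical, so the apparent "hotspot" is partly a product of where the grid lines happen to fall.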
This noted, we must be aware that improving the quality of analysis of geospatial open data can be knowledge and resource intensive. For example, AidData’s infrastructure for sophisticated geospatial analysis of international aid patterns is expensive to maintain and requires substantial annual resources.47 Although Google has instituted a business model for Google Maps, organisations like AidData cannot rely on similar mechanisms of support.
As we look to the future, opportunities lie in better connecting the open data and geospatial data communities. The latter has been working on improving open source geospatial data tooling for many decades. Even though much of this work has been focused in particular professional contexts, critical and community geographers have long been working on ways to open up access to, and support popular engagement with, geospatial data. The extensive learning and thinking within this field should not be ignored in the rush to open up data and excitement over the latest commercial tools and simplified mapping platforms.
Major advances have been made in open geospatial data. However, numerous gaps remain related to IP, standardisation, privacy, and analytical capacity. In the next decade of open data, we need to ensure greater coordination between the geomatics/GIS and the open data communities so better maps can be produced and greater value can be demonstrated from the wealth of geographic content within the open data released in the last decade.
More than anything, anyone working with geographic open data should approach it with a critical eye and ask two questions. Which choices have been made in creating this data? What lessons might there be from the existing geospatial data community to help with the analysis of this data?
Armstrong, M. & Ruggles, A.J. (2005). Geographic information technologies and personal privacy. Cartographica: The International Journal for Geographic Information and Geovisualization, 40(4), 63–73.
Johnson, P., Sieber, R., Scassa, T., Stephens, M., & Robinson, P. (2017). The cost(s) of geospatial open data. Transactions in GIS, 21(3), 434–445.
MacEachren, A.M. & Kraak, M.J. (2001). Research challenges in geovisualization. Cartography and Geographic Information Science, 28(1), 3–12.
Monmonier, M. (2018). How to lie with maps. 3rd edition. Chicago, IL: University of Chicago Press.
Openshaw, S. (1983). The modifiable areal unit problem. Norwich: Geo Books.
About the author
Renée Sieber is an Associate Professor at McGill University in the Department of Geography and School of Environment, where she researches the use and value of geospatial information for social change. Renée examines applications in GIS for and by poor communities, social movements (particularly the environmental movement), and Indigenous groups. You can follow Renée at https://www.twitter.com/re_sieber.
How to cite this chapter
Sieber, R. (2019). Open data and geospatial. In T. Davies, S. Walker, M. Rubinstein, & F. Perini (Eds.), The state of open data: Histories and horizons (pp. 137–150). Cape Town and Ottawa: African Minds and International Development Research Centre. http://stateofopendata.od4d.net.
This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. It was carried out with the aid of a grant from the International Development Research Centre, Ottawa, Canada.
1Haklay, M. & Weber, P. (2008). OpenStreetMap: User-generated street maps. IEEE Pervasive Computing, 7(4), 12–18.
2GIS specialists tend to refer to data containing a geographic reference as “geospatial” instead of “geographic” or “locational” to suggest how geographic/locational data allows us to ask spatial questions, like “What is statistically near this location?” or “How many of these items are within this boundary?”
3https://mapping-phoenix.opendata.arcgis.com/
5Hahmann, S. & Burghardt, D. (2013). How much information is geospatially referenced? Networks and cognition. International Journal of Geographical Information Science, 27(6), 1171–1189.
6The US Federal Geographic Data Committee argued that all communities should have seven datasets: land ownership (cadastre), digital orthoimagery, elevation, geodetic control, jurisdictional and other government unit boundaries, hydrography, and transportation.
7PwC. (2017). The impact of the open geographical data: Follow up study. PwC Danmark. https://sdfe.dk/media/2917052/20170317-the-impact-of-the-open-geographical-data-management-summary-version-13-pwc-qrvkvdr.pdf
8Loomis, J., Koontz, S., Miller, H., & Richardson, L. (2015). Valuing geospatial information: Using the contingent valuation method to estimate the economic benefits of Landsat satellite imagery. Photogrammetric Engineering & Remote Sensing, 81(8), 647–656. https://doi.org/10.14358/PERS.81.8.647
9Yates, D., Keller, J., Wilson, R., & Dodds, L. (2018). The UK’s geospatial data infrastructure: Challenges and opportunities. London: Open Data Institute. https://theodi.org/wp-content/uploads/2018/11/2018-11-ODIGeospatial-data-infrastructure-paper.pdf
10Klinkenberg, B. (2003). The true cost of spatial data in Canada. The Canadian Geographer, 47(1), 37–49.
11Masser, I. (1999). All shapes and sizes: The first generation of national spatial data infrastructures. International Journal of Geographical Information Science, 13(1), 67–84.
12https://www.fgdc.gov/nsdi/nsdi.html
14http://www.earthobservations.org/open_eo_data.php
15Uhlir, P.F. (2015). The value of open data sharing. Geneva: Group on Earth Observations. http://www.earthobservations.org/documents/dsp/20151130_the_value_of_open_data_sharing.pdf
16https://inspire.ec.europa.eu
17https://docs.ckan.org/en/ckan-1.7.4/geospatial.html
18Melamed, C. (2018). The Africa Regional Data Cube: Harnessing satellites for SDG progress. United Nations Foundation, 4 June. https://unfoundation.org/blog/post/the-africa-regional-data-cube-harnessing-satellites-for-sdg-progress/
19See, for example, https://platform.digitalglobe.com/maps-api
20ProgrammableWeb. (2018). Time to challenge Google Maps pricing. ProgrammableWeb, 26 August. https://www.programmableweb.com/news/time-to-challenge-google-maps-pricing/elsewhere-web/2018/08/26
23https://www.openstreetmap.org/about
24Maron, M. & Channell, T. (2015). How complete is OpenStreetMap? Mapbox, 18 November. https://blog.mapbox.com/how-complete-is-openstreetmap-7c369787af6e
25https://www.mapbox.com/data-platform/country/#india
26Lee, T. (2018). Open data, and how we preserve it. Medium, 30 October. https://medium.com/@thomas.j.lee/open-data-and-how-we-preserve-it-4db836f354fc
27Marzloff, L., Hamonic, A., & Rieg, J. (n.d.). Quelles coopérations public-privé à l’ère de la Data? [Public–private partnerships in the age of data]. Le Lab. https://www.le-lab.org/enquetes/2-quelles-cooperations-public-prive-a-lere-de-la-data
28Johnson, P.A. (2017). Models of direct editing of government spatial data: Challenges and constraints to the acceptance of contributed data. Cartography and Geographic Information Science, 44(2), 128–138.
29See https://www.hotosm.org and https://opendri.org/about/
31https://www.ordnancesurvey.co.uk/business-and-government/public-sector/news/2015/presumption-publish.html
32ODI (Open Data Institute). (2015). Creating the UK’s first free and open address list. https://theodi.org/project/creating-the-uks-first-free-and-open-address-list/
33https://cippic.ca/en/news/canada_post_settles_postal_code_geolytical_lawsuit
34Sangiambut, S. (2017). Geospatial open data: Reshaping citizens and governments, roles and interactions. Master’s Thesis, McGill University. digitool.library.mcgill.ca:8881/dtl_publish/1/145408.html
35Sweeney, L. (2000). Uniqueness of simple demographics in the US population. Technical Report LIDAP-WP4. Pittsburgh: Carnegie Mellon University, School of Computer Science.
36Armstrong, M. & Ruggles, A.J. (2005). Geographic information technologies and personal privacy. Cartographica: The International Journal for Geographic Information and Geovisualization, 40(4), 63–73.
37https://privacyinternational.org/blog/54/privacy-international-launches-surveillance-industry-index-new-accompanying-report
39http://www.opengeospatial.org/
40Foresman, T.W. (2008). Evolution and implementation of the Digital Earth vision, technology and society. International Journal of Digital Earth, 1(1), 4–16. https://www.tandfonline.com/doi/full/10.1080/17538940701782502
41http://www.freeourdata.org.uk/
43Sieber, R.E. & Parfitt, I. (2019). The future of open data is rural. In P. Robinson and T. Scassa (Eds.), The future of open data. Ottawa: University of Ottawa Press.
44Openshaw, S. (1983). The modifiable areal unit problem. Norwich: Geo Books.
45O’Loughlin, J., Linke, A.M., & Witmer, F.D.W. (2014). Effects of temperature and precipitation variability on the risk of violence in sub-Saharan Africa, 1980–2012. Proceedings of the National Academy of Sciences, 111(47), 16712–16717.
46See Amelia McNamara and Aran Lunzer’s site at https://tinlizzie.org/spatial
47Custer, S., DiLorenzo, M., Masaki, T., Sethi, T., & Wells, J. (2017). Beyond the tyranny of averages: Development progress from the bottom up. Williamsburg, VA: AidData at the College of William & Mary, p. 2.
Opening up data on government finance has been a major focus of open data advocacy with projects like OpenSpending bringing a data-driven approach to work on fiscal transparency.
Opening up public finance data requires a whole set of conditions for success, including government capacity, access to technical platforms and standards, and in-depth engagement from civil society, to help make sense of complex financial data.
When better connected to grassroots advocacy, open data approaches to government finance can help re-energise global budget transparency work.
Working to ensure the transparency of government finances has a long history. By 1850, many countries in Europe had already enacted constitutional requirements that government budgets or accounts be published, leading to what Irwin1 refers to as an “avalanche of data” that was sparked, in part, by “rulers’ need to persuade creditors to lend and taxpayers’ representatives to approve new taxes”. However, this avalanche of annual accounts, published in printed paper reports, seems minuscule when compared to the data on government finances that could be made available today. When the East Asian financial crisis hit in 1997, fiscal transparency was firmly placed on the global agenda, and principles were put forward calling for disclosure of information across government operations, not just budgets.2 And as the open data movement has developed over the last decade, it has brought a particular focus on transparency in government finances, adding a digital spin to advocacy and calling not only for data but for machine-readable data that is ready for public analysis.
Public finances are ultimately at the heart of government activity, constituting one of the main levers of public action through which governments shape society. The study of public finance may historically have been regarded as a question of simply determining the income and expenditures of governments. However, since the middle of the 20th century, this has expanded to recognise the role that taxes and spending play in shaping the wider economy (e.g. taxing activities that may have negative consequences and spending that may stimulate economic development and trade, including research grants or development aid). As such, citizen scrutiny and a clear understanding of all aspects of public finances is crucial. Debt, taxation, contracting, grants, and subsidies are all topics to be covered within the context of fiscal transparency, alongside more obvious themes of budgets and expenditures. With the right mechanisms in place, improved citizen understanding of the state’s fiscal behaviour can encourage greater civic participation and oversight, can promote public accountability, and, most importantly, can potentially enhance the effectiveness and efficiency of public budgets and spending.3,4
From the start, the open data movement has placed an emphasis on government finances with projects such as the 2007 “Where Does My Money Go” prototype (see box below) that demonstrated the potential of open data in this sector. Over the last decade, civil society and government-led projects around the world have sought to make public finance data more accessible with initiatives on almost every continent. However, the latest findings from the Open Data Index5 and Open Data Barometer illustrate that just 10% of surveyed governments publish fully open budget data (12 countries in total) and only 3% publish disaggregated open spending data (just 4 countries).6 In some countries, such as the United Kingdom (UK), an early publisher of spend data, reliable data availability has not been sustained, and it is not clear how far citizens have engaged with the data that has been made available.7
A decade into the new wave of open data-driven financial transparency, it is important to take stock of progress and to ask whether efforts to open up financial data have delivered results or whether activity is beginning to stall. This chapter takes a look at the arc of activity since 2005, taking stock of the state of initiatives, issues, and communities related to open government finance data.
The new wave of fiscal transparency: From documents to datasets
“Fiscal transparency – the comprehensiveness, clarity, reliability, timeliness, and relevance of public reporting on the past, present, and future state of public finances – is critical for effective fiscal management and accountability. It helps ensure that governments have an accurate picture of their finances when making economic decisions, including of the costs and benefits of policy changes and potential risks to public finances. It also provides legislatures, markets, and citizens with the information they need to hold governments accountable.”8
What counts as “public reporting” depends on your perspective. For much of the history of fiscal transparency, the focus has been on access to information being provided through the publication of government reports on budget formation and execution (including spending). These reports are generally static documents prepared by selecting, analysing, and summarising data from one or more “live” financial information systems. Governments may, in some cases, provide interactive tools to support the user’s ability to “drill-down” into the contents of those reports. However, with documents, there is a limit on how far users can dig into the data or remix the information to present it in different ways.
This is where calls for “raw data” come in: asking not just for reports and documents about budgets, taxes, and spending, but also for the underlying granular data. Where a row in a published document might represent hundreds of individual budget allocations, an open dataset could include a row for every allocation, along with detailed classification information. Where a spending report might contain an aggregated figure on payments by a particular agency, spending data could contain a row for each payment with details on the suppliers paid in each case and information on the timing of those payments. The move from documents to data provides for both increased granularity (or disaggregation) of information and increased flexibility in how users can work with it (see Figure 1). With access to data, rather than documents alone, it becomes possible for a wider range of users to create a wider range of visualisations, interfaces, and analysis, although such applications are very dependent on the quality of the raw data and on the metadata to provide context.
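The difference between a summary document and granular data can be sketched in a few lines. The example below starts from payment-level rows and derives both the agency totals that a published report might contain and a second breakdown, by supplier, that could not be recovered from the report alone. The agencies, suppliers, dates, amounts, and the helper function name are all invented for illustration and do not come from any real dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical payment-level records; every value here is invented.
RAW_CSV = """agency,supplier,date,amount
Health,Acme Medical,2010-04-03,1200.00
Health,Acme Medical,2010-05-17,800.00
Health,Borealis IT,2010-05-20,500.00
Transport,Borealis IT,2010-04-11,2500.00
"""

def total_by(rows, field):
    """Sum the 'amount' column grouped by the given field."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[field]] += float(row["amount"])
    return dict(totals)

rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# The view a static report might publish: one figure per agency
by_agency = total_by(rows, "agency")

# A remix only possible with row-level data: one figure per supplier
by_supplier = total_by(rows, "supplier")
```

In this toy data both agencies spend the same total, so the report-style view suggests nothing unusual, while the supplier view shows one firm receiving payments from both. That second question is the kind that only granular data allows users to ask.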
Figure 1: From documents to data: An example of the “Public Expenditure Statistical Analysis” document on the left, and the COINS public spending dataset to illustrate the difference in granularity between the two.
Sources: PESA document: HM Treasury. (2013). Public Expenditure: Statistical Analyses 2013, p. 19. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/223600/public_expenditure_statistical_analyses_2013.pdf; COINS dataset: HM Treasury. (2010). COINS 2010–11 Q1 Dataset (coins_sept10_3.csv). data.gov.uk. https://data.gov.uk/dataset/3266d22c-9d0f-4ebe-b0bc-ea622f858e15/combined-online-information-system
In 2005, a trio of data journalists launched FarmSubsidy.org with the goal of facilitating access to information on the subsidy payments under the European Union (EU) Common Agricultural Policy (CAP). The platform, constructed with data accessed via freedom of information (FOI) requests to governments across Europe, made structured data accessible to search and explore, providing detail not only on subsidy payments, but also the details of the companies who receive subsidies. Danish journalists were able to use this information to challenge the dominant political narrative that the CAP supported primarily the poorest farmers by showing that it was actually large landowners and agri-businesses that received the most funds.9 By 2009, EU member states were mandated to publish their subsidy data, removing the need for FOI requests, although, even now, the data is not always available in machine-readable formats. The growth of the project played a key role in demonstrating the value of data-driven public finance journalism and attracted interest from a range of funders, including the Hewlett Foundation.10
In 2007, the Open Knowledge Foundation’s Jonathan Gray developed the idea for “Where Does My Money Go”11 as a visual breakdown of the UK budget, tapping into a growing appetite for both data visualisation and open data ideas (see Figure 2). In 2008, the project was a winner of the UK Government’s “Show Us A Better Way” competition and soon secured grant funding from government to develop a working prototype.12 Further funding from a UK state broadcaster (4IP), the Open Society Foundation, the Knight Foundation, the Hewlett Foundation, and the Omidyar Network enabled the evolution of the project into the global Open Spending platform,13 which now hosts elements of fiscal data from at least 70 countries. The nascent community related to the project was composed not of accountants or public finance experts but rather of civic hackers and citizens interested in making complex government finance more accessible and supporting wider citizen engagement.
By 2010, more governments were starting to explore the direct publication of machine-readable budget data, reducing the need for citizens, organisations, and projects to manually scrape data out of documents and PDFs. The United States (US) government’s USASpending.gov, originally created in response to legislation passed in 2006 requiring all “federal contract, grant, loan, and other financial assistance awards of more than $25,000 to be displayed on a publicly accessible and searchable website to give the American public access to information on how their tax dollars are being spent”,14 went through a number of relaunches in 2009 and 2010 with increasing emphasis placed on the availability of downloadable open data and enhanced granularity. Although the site had provided an application programming interface (API) since 2007, it was the addition of downloadable open data in subsequent versions that gained it an increased profile.
Intense policy competition between the UK and US during this period may be behind the UK government’s 2010 publication of the COINS (Combined Online INformation System) dataset,15 providing detailed “fact tables” that presented disaggregated spending data from across the public sector. The Guardian newspaper was one of the early users of this data, creating a public data explorer interface to help citizens search the large dataset and working with Open Knowledge Foundation to use citizen research and FOI requests to fill gaps in the data, particularly around individual supplier names.16 The Guardian went on to write a number of stories based on their analysis of the COINS data and used its release to explore gaps in the quality of public financial management in the UK.17 In parallel, government departments and local authorities were asked in a letter from Prime Minister David Cameron on 31 May 2010 to publish details on all expenditures over GBP 25 000. The letter also committed to the online publication of information on all new central government contracts and all international development project spending over GBP 500 from January 2011 onward.18
Latin American governments also took a lead during this early wave of government activity. In Mexico, the first budget dataset was published in 2011 by the Ministry of Finance as part of a project under Mexico’s Open Government Partnership (OGP) action plan.19 The portal that was created published basic information about federal programmes with quarterly updates on the money spent, information on external evaluations, and a matrix illustrating progress toward planned and achieved goals. The intent of this project was to provide a place where both citizens and decision-makers could find government finance data in a unified format. An OGP case study credits the portal with generating “commitments from the Federal Public Administration to make progress on public projects and initiatives which [had] fallen behind”.20 Although such portals could theoretically be created without using open data, taking an open data approach helped to provide Mexico with a common format for aligning data from different departments and agencies, supporting the integration of information that originated from many different IT systems.
Although Brazil launched a National Transparency Portal in 2004, Beghin and Zigoni (2014) documented21 that it was not until the passage of an Access to Information Act in 2011, establishing procedures for federated entities to follow in the disclosure of information, that access to government finance data increased. However, they note that, in 2014, there was still a long way to go before all budget and spending data would be accessible in machine-readable form.
It is no surprise then that a World Bank study in 2013 cited the UK, Mexico, and Brazil as members of a small pioneering group of countries working to provide good access to reliable open budget data from financial management information systems.22 The full list of countries noted included Brazil, Germany, South Korea, Mexico, New Zealand, Spain, Sweden, the UK, and the US. What they have in common is a high Open Budget Index score (above 60)23 and OGP commitments to promote fiscal transparency.24
In this pioneer phase, we can see how the interaction between select journalists, civil society, and governments spurred action to make more granular and machine-readable data available on government finance. But whether or not this data can be used to answer questions like “where does the money go?” and whether these early publication projects are sustainable depends on a much wider network of actors and activities.
While many of the most prominent actors working in the area of open budgets are intergovernmental organisations, international NGOs, and multi-stakeholder initiatives, according to Gray’s analysis of the open budget data landscape, the active grassroots community is composed of a myriad of international and local CSOs involved in open government, government transparency, aid transparency, open data, and related topics.25 The way these groups are making use of public finance data is innovative and ingenious and reflects a limited but dedicated citizen interest in understanding how public money is spent. For example, in Nigeria, BudgIT26 has worked since 2011 on creating infographics that explain elements of the budget and, since 2014, has used Tracka to crowdsource information on the progress of development projects in local communities. Communicating through social media, mainstream media, and community outreach, BudgIT reports reaching over 4 million Nigerians with its information.27
In an effort to help scale up innovations, equip organisations with open source tools, and improve data literacy around spending data, the Open Knowledge Foundation launched the OpenSpending project in 2011.28 Its vision was to provide a central database of budget and spending data, as well as to build a community of groups and individuals who could work together to acquire, use, and add their contributions to the platform. From its launch, the resources made available increased substantially as the project grew, including a spending data handbook,29 an open-source CKAN data portal with extensions,30 a visualisation library based on Where Does My Money Go,31 and a data specification called the Fiscal Data Package.32 As of November 2018, OpenSpending contains government finance datasets from over 80 countries, although at varying levels of granularity and timeliness.
The tools provided by OpenSpending have been used by different civil society projects and platforms to provide citizens with accessible and user-friendly budget information (e.g. the German project Offener Haushalt, the budget explorer tool in Kosovo, and the Open Budget platform in Ukraine).33 Many other organisations have developed their own technology and visualisations. This is the case for the Open Key project in Israel, the Open Spending portal in the Netherlands, the Vuleka Mali project in South Africa, and the Dónde van mis impuestos platform in Spain.34 A key driver for the editorial and technological choices of these projects has been the goal of building visualisations that reflect the needs of citizens and a desire to embed data within a pedagogical context that provides education on government finance.
Between 2013 and 2017, as the community grew, many more projects and platforms emerged from civil society organisations (CSOs), some of them with the specific objectives of using public finance data for investigation in journalism or to enhance civic participation. One notable data journalism project using public finance data is Spending Stories, a project by the former data journalism agency J++ that was developed in 2013 to allow comparisons between big and small amounts of money to give users a context to understand how money is being spent while referencing original news stories.35 The Farmsubsidy.org network has also continued to play an important part in building data journalism capacity related to open financial data, giving rise to the annual European investigative journalism Dataharvest Conference that now brings together as many as 400 journalists, coders, and scholars from all over Europe each year.36
As Gray’s map of the linkages between open budget data-related websites from 2015 suggests (Figure 3), it is also important to recognise different sub-communities working in the open finance data domain. As well as local groups, there are a number of overlapping global communities of practice, some with specific thematic areas of focus. Examples are the International Aid Transparency Initiative (IATI), the Extractive Industries Transparency Initiative (EITI), and others looking at particular sources of data, such as the Open Contracting Partnership which has, since 2015, developed a global network of governments, civil society organisations, and companies working with data on public procurement to enable a different way to “follow the money” that complements budget and spending data. Gray’s 2015 mapping does not, however, capture groups working in the area of tax justice. Since 2017, the Open Data for Tax Justice network has sought to put more focus on companies reporting the tax payments they make to government,37 which, once again, fills in another part of the complex government finance picture.
Although they do not feature heavily in Gray’s mapping, we should also not ignore private sector actors. Firms like SpendNetwork38 clean and re-package government spending data for firms interested in securing government contracts, and there is some evidence to suggest government spending data feeds into a range of other private sector products. This said, more could be done to understand the role of the private sector in this field.
It should be clear from the examples above that there is widespread interest in, and engagement with, open data on government finances. Networks like the FollowTheMoney network39 host regular community calls to connect organisations working on different parts of the government finance puzzle, and groups like the Global Initiative for Fiscal Transparency (GIFT)40 place an emphasis on open data as part of wider fiscal transparency reforms. Yet there remain many shared practical challenges that mean the vision of timely, accessible, and accurate open data on government finances is far from fully realised.
Figure 3: Open budget data: Mapping the landscape
Source: Gray, J. (2015). Open budget data: Mapping the landscape. Washington, DC: Global Initiative for Fiscal Transparency. http://www.fiscaltransparency.net/resourcesfiles/files/20150902128.pdf
Even though much progress has been made in opening public finance data, some gaps remain, including those related to policy and high-level commitments, technical platforms for data, linking data to decision-making, and challenges in encouraging the use of data.
In 2009, the Sunlight Foundation in the US started the Clearspending41 project to generate an annual report on the consistency, completeness, and timeliness of federal data published on USASpending.gov. The project discovered over USD 1.3 trillion worth of missing or inaccurate data.42 The Guardian similarly reported problems with the accuracy and coverage of the early UK COINS datasets,43 and monitoring of UK government departments’ compliance with requirements to publish expenditures over GBP 25 000 indicates that many are failing to publish the required data on time.44 When data quality is low, it becomes hard for citizens to use and interpret data or to draw conclusions from it. This can be addressed by providing documentation that explains how the data was created and its limitations. In other cases, independent monitoring of data quality can provide an impetus for governments to improve their data. However, it is difficult for civil society (and even governments themselves) to sustain quality control over published data. For example, the Clearspending project in the US only ran until 2012, and a number of other projects that have sought to monitor the quality of data in specific countries or localities are now defunct.
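To make the idea of monitoring completeness and timeliness more concrete, the following is a minimal sketch of how two such indicators might be computed over a set of published spending records. It is not the method used by Clearspending or any other project mentioned here: the record structure, the choice of required fields, and the 30-day publication threshold are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical record layout; real spending datasets have many more fields.
@dataclass
class SpendingRecord:
    award_id: str
    recipient: Optional[str]
    amount: Optional[float]
    award_date: date
    published_date: date

# Assumption: these are the fields a monitor treats as mandatory.
REQUIRED_FIELDS = ("recipient", "amount")

def completeness(records):
    """Share of records in which every required field is filled in."""
    if not records:
        return 0.0
    complete = sum(
        all(getattr(r, f) not in (None, "") for f in REQUIRED_FIELDS)
        for r in records
    )
    return complete / len(records)

def timeliness(records, max_delay_days=30):
    """Share of records published within max_delay_days of the award date."""
    if not records:
        return 0.0
    on_time = sum(
        (r.published_date - r.award_date).days <= max_delay_days
        for r in records
    )
    return on_time / len(records)

# Invented sample data for demonstration only.
sample = [
    SpendingRecord("A1", "Acme Ltd", 30000.0, date(2010, 1, 4), date(2010, 1, 20)),
    SpendingRecord("A2", None, 45000.0, date(2010, 1, 4), date(2010, 3, 15)),
    SpendingRecord("A3", "Beta Co", 27500.0, date(2010, 2, 1), date(2010, 2, 10)),
    SpendingRecord("A4", "Gamma Inc", 80000.0, date(2010, 2, 1), date(2010, 6, 1)),
]
```

Indicators of this kind are simple to compute once the data is machine-readable; the harder problem, as the examples in this section suggest, is sustaining the monitoring over time.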
One of the key barriers to improving data quality has been the lack of a legislative basis for open data publication. In the US, the Digital Accountability and Transparency Act of 2014 (DATA Act)45 has addressed this in part, setting out standards for data publication and leading to the creation of detailed standards and procedures that apply quality assurance in stages as data is collated. Yet, in many countries, legislation or regulations supporting the transparency of government finance, even where they exist, have stopped short of providing enough detail to allow quality requirements to be enforced.
The OpenBudgets.eu project looked at the standardisation of budget and spending datasets across the EU in 2016 and concluded that there was a “plethora of budget and spending data models which reflect … fine-tuned differences in the legislative design of political entities”,46 although they also recognised the need for common approaches to data publication. One standard that has been put forward to address this gap, developed by a consortium of global organisations, including GIFT, the World Bank, and Open Knowledge International,47 is the Open Fiscal Data Package (OFDP). Rather than impose a particular structure on source data, the latest iteration of the OFDP allows datasets coming from countries with different fiscal and accountability structures to be published in any tabular form and then subsequently annotated to explain how data should be interpreted and visualised.
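The general idea of leaving source data in its original tabular form and adding a separate layer of annotation can be sketched as follows. This is an illustration of the approach only, not the actual OFDP or Fiscal Data Package schema: the column names, the `concept` and `type` keys, and the concept labels are all hypothetical.

```python
import csv
import io

# Source data stays in whatever tabular layout the publisher uses.
# (Invented example with publisher-specific column names.)
RAW_CSV = """Ministerio,Programa,Aprobado,Ejercido
Salud,Vacunacion,1000,850
Educacion,Becas,2000,1900
"""

# A separate annotation layer says what each column means.
# Keys and concept names are hypothetical, not taken from any specification.
ANNOTATIONS = {
    "Ministerio": {"concept": "administrative_unit", "type": "string"},
    "Programa": {"concept": "programme", "type": "string"},
    "Aprobado": {"concept": "approved_amount", "type": "number"},
    "Ejercido": {"concept": "executed_amount", "type": "number"},
}

def interpret(raw_csv, annotations):
    """Re-key each row by shared concept and cast numeric columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        interpreted = {}
        for column, value in row.items():
            note = annotations[column]
            if note["type"] == "number":
                interpreted[note["concept"]] = float(value)
            else:
                interpreted[note["concept"]] = value
        rows.append(interpreted)
    return rows

rows = interpret(RAW_CSV, ANNOTATIONS)
total_executed = sum(r["executed_amount"] for r in rows)
```

Because the mapping lives outside the data file, two publishers with different column layouts could each supply their own annotations and still be read by the same downstream tools.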
Adoption of the OFDP remains limited at present; however, the way in which data standards can facilitate global collaboration around government finance data has already been demonstrated through the adoption of more mature standards for aid flows (IATI) and contracting data (the Open Contracting Data Standard (OCDS)), and with the right backing, there are opportunities for the OFDP to support a step-change in the accessibility and re-use of budget and spend data.
As noted in the introduction to this chapter, to construct a full picture of government finances, more than budget and spend data is needed. This calls for interoperability between standards. There has been some recent progress on this with extensions to the OCDS (Figure 4) being designed to provide interoperability with the OFDP, although this work is currently untested.
Figure 4: Linking contract, budget, and budget execution data in Mexico
Source: https://github.com/open-contracting-extensions/ocds_budget_and_spend_extension
The Ministry of Finance in Mexico has been working on a pilot to link federal budget and spending data with investment projects through the use of two standards: OCDS and OFDP. They have successfully linked budget data from the planning phase of procurement processes with amounts spent per project in the implementation phase and have made this available.48
Through work with the Open Contracting Partnership, a proposed extension to OCDS has been developed to describe how other governments could make similar linkages.49
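The kind of linkage described above depends on contract records carrying an identifier that points back to a budget line. The sketch below shows the principle with invented data; the field names (`budget_line_id`, `allocated`, `spent`) are hypothetical and are not taken from the OCDS, the OFDP, or the proposed extension.

```python
from collections import defaultdict

# Invented budget lines, keyed by a shared identifier.
budget_lines = {
    "B-001": {"project": "Road upgrade", "allocated": 500000.0},
    "B-002": {"project": "Clinic refurbishment", "allocated": 120000.0},
}

# Invented contracts; each one references the budget line that funds it.
contracts = [
    {"contract_id": "C-1", "budget_line_id": "B-001", "spent": 200000.0},
    {"contract_id": "C-2", "budget_line_id": "B-001", "spent": 150000.0},
    {"contract_id": "C-3", "budget_line_id": "B-002", "spent": 90000.0},
    {"contract_id": "C-4", "budget_line_id": "B-999", "spent": 10000.0},
]

def link_spend_to_budget(budget_lines, contracts):
    """Total contract spend per budget line and list contracts with no match."""
    spent = defaultdict(float)
    unmatched = []
    for contract in contracts:
        line_id = contract["budget_line_id"]
        if line_id in budget_lines:
            spent[line_id] += contract["spent"]
        else:
            # A broken link: the contract cites a budget line we do not hold.
            unmatched.append(contract["contract_id"])
    summary = {}
    for line_id, line in budget_lines.items():
        summary[line_id] = {
            "project": line["project"],
            "allocated": line["allocated"],
            "spent": spent[line_id],
            "remaining": line["allocated"] - spent[line_id],
        }
    return summary, unmatched

summary, unmatched = link_spend_to_budget(budget_lines, contracts)
```

The list of unmatched contracts is as useful as the totals, since in practice missing or inconsistent identifiers are likely to be the main obstacle to joining datasets held by different agencies.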
The greatest challenges (and opportunities) to increased adoption and impact from open data activities related to government financial transparency ultimately relate to policy. In 2017, the International Budget Partnership’s Open Budget Survey (OBS) of 115 countries suggested that progress on opening up budgets had stalled for the first time in a decade.50 Although the OBS does not look specifically at open data publication, its findings suggest that the global political will to increase financial transparency may be at a low ebb. There have also been long-standing challenges to securing public attention on open government finance data, with Carter noting in 2013 that “budget transparency has still not captured global attention in the way that other related movements have”.51
Regardless, in some noteworthy countries, open data regulations and legislative frameworks are being used successfully either to enforce the publication of public finance data52 or to make finance data a priority within wider programmes of open data release across government.53 The OGP has also provided a key forum for increasing the disclosure of contracting data in recent years with many commitments secured to adopt the OCDS.54 This suggests that the current wave of interest in open data and data standards could still be used to help advance the financial transparency agenda. Crucially, getting to joined-up data that presents a full picture of government finances means overcoming silos in government and securing data across agencies. For this, the importance of political leadership should not be underestimated.
Government finances are undoubtedly complex. Increasing the use of available data requires accessible technical platforms, skilled intermediaries, and capacity building for citizen-users of data. As a whole, the last decade has seen an increase in resources to support the development of data literacy skills which enable users to work with public finance data through digital tools, and many resources are still improving based on case studies, user involvement, stakeholder feedback, and innovations in technology. However, continued capacity building will be needed for increased data availability to drive new models of citizen engagement around government finances.
In the years ahead, the key challenge will be to better connect the current wave of the open data-driven transparency movement with other grassroots advocacy networks and government decision-makers. When it comes to securing impactful results from open government finance data, the evidence suggests that projects will require unique partnerships between technologists, CSOs, and government. This is the model followed in South Africa with the Vuleka Mali project, a partnership between the National Treasury and a coalition of CSOs called Imali Yethu to make government budget data and processes accessible to all citizens and interested parties. Their motto, “We aren’t interested in transparency for transparency’s sake”,55 should be one that more organisations place at the heart of their thinking. Technical work on government financial data also needs to connect with wider social agendas. For example, Carter notes that the potential exists to apply a gender lens to budget analysis;56 however, we have not yet found open data projects that directly apply a gender lens to open budget data creation and sharing.
Given the long history of work on opening up government finance, we should not expect a complete transformation in less than 15 years of open data activity. The vision of current advocates for open government finances is an ambitious one – to provide more granular data than ever before. There are signs, however, that when data is used, and governments are willing to open up, substantial progress can be made. In its brief history of the DATA Act, the Sunlight Foundation has described how “reporting bad data drove reform” to secure new legislation, better processes, and ultimately improved data.57 Rather than waiting for perfect data, it is possible to publish data and then improve it with subsequent iterations.
The foundations laid in the last decade in terms of technology platforms and data standards, and in terms of networks and communities, are impressive. Long-term, opening up and securing the use of government finance data will require significant resources in terms of technology, financial and human capacities, as well as time and strong political support. Not all the organisations explored in this chapter will have the resources they need for sustainability, but all have demonstrated what could be possible in their local contexts, and they have collectively re-imagined ways to engage citizens on government finances.
Government finance data has played a key role in shaping the early development of the open data movement. The challenge for the decade ahead is to see how far, and to what end, open data advocates and practitioners can shape a sustainable ecosystem of open government finance data.
Further reading
Beghin, N. & Zigoni, C. (2014). Measuring open data’s impact of Brazilian national and sub-national budget transparency websites and its impacts on people’s rights. Brasilia: Institute for Socioeconomic Studies (INESC). http://www.opendataresearch.org/sites/default/files/publications/Inesc_ODDC_English.pdf
Carter, B. (2013). Budget accountability and participation. GSDRC Helpdesk Reports, 15 July. London: Department for International Development. http://www.gsdrc.org/docs/open/hdq973.pdf
Dener, C. & Min, S.Y. (2013). Financial management information systems and open budget data: Do governments report on where the money goes? Washington, DC: World Bank. http://documents.worldbank.org/curated/en/659821468152725669/pdf/81332-REVISED-ENGLISH-PUBLIC-WB-Study-FMIS-and-OBD-eng.pdf
Gray, J. (2015). Open budget data: Mapping the landscape. Washington, DC: Global Initiative for Fiscal Transparency. http://www.fiscaltransparency.net/resourcesfiles/files/20150902128.pdf
Tygel, A.F., Attard, J., Orlandi, F., Campos, M.L.M., & Auer, S. (2015). “How much?” is not enough: An analysis of open budget initiatives. Cornell University, 7 April. https://arxiv.org/pdf/1504.01563.pdf
Cécile Le Guen is an associate of Datactivist, a French cooperative that provides open data services, strategies, research, and consulting to CSOs and the public and private sector. Datactivist is conducting projects in France and internationally, with a focus on data literacy, open data standards, and open government policies. You can follow Cécile at https://www.twitter.com/cecileLG and learn more about Datactivist at https://datactivist.coop/.
How to cite this chapter
Le Guen, C. (2019). Open data and government finances. In T. Davies, S. Walker, M. Rubinstein, & F. Perini (Eds.), The state of open data: Histories and horizons (pp. 151–165). Cape Town and Ottawa: African Minds and International Development Research Centre. http://stateofopendata.od4d.net
This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. It was carried out with the aid of a grant from the International Development Research Centre, Ottawa, Canada.
1 Irwin, T. (2013). Shining a light on the mysteries of state: The origins of fiscal transparency in Western Europe. IMF Working Papers 13–219. Washington, DC: International Monetary Fund. pp. 27 and 1, respectively. http://www.imf.org/en/Publications/WP/Issues/2016/12/31/Shining-a-Light-on-the-Mysteries-of-State-The-Origins-of-Fiscal-Transparency-in-Western-41012
2 International Monetary Fund. (2018). Fiscal transparency. https://www.imf.org/external/np/fad/trans/index.htm
3 GIFT. (2015). The time is now: Advancing public participation in government fiscal policy and budget-making. Washington, DC: Global Initiative for Fiscal Transparency. http://www.fiscaltransparency.net/resourcesfiles/files/20150729123.pdf
4 Carter, B. (2013). Budget accountability and participation. GSDRC Helpdesk Reports, 15 July. London: Department for International Development. http://www.gsdrc.org/docs/open/hdq973.pdf
6 Web Foundation. (2017). Open Data Barometer – Global report. 4th edition. Washington, DC: World Wide Web Foundation. https://opendatabarometer.org/4thedition/report/
7 Worthy, B. (2015). The impact of open data in the UK: Complex, unpredictable, and political. Public Administration, 93(3), 788–805. https://doi.org/10.1111/padm.12166
8 International Monetary Fund. (2018). Fiscal transparency. https://www.imf.org/external/np/fad/trans/index.htm
9 Léchenet, A. (2014). Global database investigations: The role of the computer-assisted reporter. Fellowship Papers. Oxford: Reuters Institute for the Study of Journalism. https://reutersinstitute.politics.ox.ac.uk/ourresearch/global-database-investigations-role-computer-assisted-reporter
10 FarmSubsidy.org. (2017). Farmsubsidy.Org at a glance. Open Knowledge Foundation Deutschland. https://farmsubsidy.org/about/
11 http://app.wheredoesmymoneygo.org/about.html
12 https://webarchive.nationalarchives.gov.uk/20100807004350/http://www.showusabetterway.co.uk/
14 US Congress. (2006). S.2590-109th Congress (2005–2006): Federal Funding Accountability and Transparency Act of 2006. https://www.congress.gov/bill/109th-congress/senate-bill/2590
15 HM Treasury. (2010). Combined online information system. https://data.gov.uk/dataset/3266d22c-9d0f-4ebe-b0bc-ea622f858e15/combined-online-information-system
16 Rogers, S. (2010). COINS data release: The 10 things we found out. The Guardian, 14 June. https://www.theguardian.com/news/datablog/2010/jun/14/coins-data-results-10-things
17 https://www.theguardian.com/news/datablog+politics/coins-combined-online-information-system
18 Cameron, D. (2010). Letter to government departments on opening up data. GOV.UK, 31 May. https://www.gov.uk/government/news/letter-to-government-departments-on-opening-up-data
19 Government of Mexico. (2011). Alianza Para El Gobierno Abierto – Plan de Acción de México, 20 September [Open Government Partnership - Mexico Plan of Action]. New York, NY: Open Government Partnership. https://www.opengovpartnership.org/sites/default/files/Mexico_Action_Plan_0.pdf
20 OGP. (2013). Mexico: The budget transparency portal. Washington, DC: Open Government Partnership. https://www.opengovpartnership.org/sites/default/files/Inspiring%20Story%20-%20Mexico.pdf
21 Beghin, N. & Zigoni, C. (2014). Measuring open data’s impact of Brazilian national and sub-national budget transparency websites and its impacts on people’s rights. Brasilia: Institute for Socioeconomic Studies. http://www.opendataresearch.org/sites/default/files/publications/Inesc_ODDC_English.pdf
22 Dener, C., & Min, S.Y. (2013). Financial management information systems and open budget data: Do governments report on where the money goes? Washington, DC: World Bank. http://documents.worldbank.org/curated/en/659821468152725669/pdf/81332-REVISED-ENGLISH-PUBLIC-WB-Study-FMIS-and-OBDeng.pdf
23 IBP. (2017). Open budget survey 2017. Washington, DC: International Budget Partnership. https://www.internationalbudget.org/wp-content/uploads/open-budget-survey-2017-report-english.pdf
24 http://www.opengovpartnership.org/explorer/all-data.html
25 Gray, J. (2015). Open budget data: Mapping the landscape. Washington, DC: Global Initiative for Fiscal Transparency. http://www.fiscaltransparency.net/resourcesfiles/files/20150902128.pdf
27 http://yourbudgit.com/about-us/
28 Chambers, L. (2011). OpenSpending goes live. Open Knowledge International Blog, 26 June. https://blog.okfn.org/2011/06/26/openspending-goes-live/
29 OpenSpending. (2013). Spending data handbook. London: Open Knowledge International. http://community.openspending.org/resources/handbook/
30 Björgvinsson, T. (2015). Presenting public finance just got easier. Open Knowledge International Blog, 20 March. https://blog.okfn.org/2015/03/20/presenting-public-finance-just-got-easier/
31 https://github.com/openspending-archive/openspendingjs
32 Walsh, P., Pollock, R., Björgvinsson, T., Bennet, S., Kariv, A., & Fowler, D. (2018). Fiscal data package (version 1.0rc1). London: Open Knowledge International. https://frictionlessdata.io/specs/fiscal-data-package/
33 See https://offenerhaushalt.de/, http://www.institutigap.org/spendingsEng/, and http://openbudget.in.ua/, respectively.
34 See https://next.obudget.org/, http://www.openspending.nl/, https://vulekamali.gov.za/2017-18/national/departments/women, and https://dondevanmisimpuestos.es/, respectively.
35 Pedersen, A. (2013). Launching spending stories: How much is it really? Open Knowledge International Blog, 21 November. https://blog.okfn.org/2013/11/21/launching-spending-stories-how-much-is-it-really/
37 Cobham, A., Gray, J., & Murphy, R. (2018). What do they pay? Towards a public database to account for the economic activities and tax contributions of multinational corporations. #OD4TJ. http://datafortaxjustice.net/what-do-they-pay/
40 http://www.fiscaltransparency.net/
41 https://web.archive.org/web/20110802165935/http://sunlightfoundation.com/clearspending/
42 Carr, A. (2010). The curious case of USASpending.Gov’s missing $1.3 trillion. Fast Company, 7 September. https://www.fastcompany.com/1687410/curious-case-usaspendinggovs-missing-13-trillion
43 Evans, L. (2010). What was coins missing? The mystery of the government’s hidden spending data. The Guardian, 14 July. https://www.theguardian.com/news/datablog/2010/jul/14/whole-government-accounts-coins-data
44 Freeguard, G., Campbell, L., Cheung, A., Lily, A., & Baker, C. (2018). Whitehall Monitor 2018: The general election, Brexit and beyond. London: Institute for Government. https://www.instituteforgovernment.org.uk/publications/whitehall-monitor-2018
45 US Congress. (2014). S.994 – DATA Act. https://www.congress.gov/bill/113th-congress/senate-bill/994
46 Dudáš, M., Klímek, J., Kučera, J., Mynarz, J., Sedmihradská, L., Zbranek, J., & Seeger, B. (2016). The Openbudgets data model and the surrounding landscape. OpenBudgets.eu. Berlin: OpenBudgets, p. 8. http://openbudgets.eu/assets/resources/Report-UEP-The-Open-Budgets-Data-Model-and-the-Surrounding-Landscape.pdf
47 http://www.fiscaltransparency.net/ofdp/
48 https://www.gob.mx/contratacionesabiertas/home#!/
49 Davies, T. & Pane, J. (2018). Open contracting budgets and spend extension. https://github.com/open-contracting-extensions/ocds_budget_and_spend_extension
50 IBP. (2017). Open budget survey 2017. Washington, DC: International Budget Partnership. https://www.internationalbudget.org/wp-content/uploads/open-budget-survey-2017-report-english.pdf
51 Carter, B. (2013). Budget accountability and participation. GSDRC Helpdesk Reports, 15 July. London: Department for International Development. http://www.gsdrc.org/docs/open/hdq973.pdf
52 Dulong de Rosnay, M. & Janssen, K. (2014). Legal and institutional challenges for opening data across public sectors: Towards common policy solutions. Journal of Theoretical and Applied Electronic Commerce Research, 9(3), 1–14.
53 Lucchesi, L. (2016). Digital Republic Bill: France’s first open bill. Open Government Partnership Blog, 28 January. https://www.opengovpartnership.org/stories/digital-republic-bill-frances-first-open-bill
54 https://www.open-contracting.org/why-open-contracting/worldwide/#/
56 Carter, B. (2013). Budget accountability and participation. GSDRC Helpdesk Reports, 15 July. London: Department for International Development. http://www.gsdrc.org/docs/open/hdq973.pdf
57 Rumsey, M. (2017). A brief history of the DATA Act. Sunlight Foundation. https://sunlightfoundation.com/2017/05/08/a-brief-history-of-the-data-act/
There is relatively limited awareness of open data in the health sector, where, given the focus on patient data, the idea of “open by default” does not resonate. It is important for initiatives to understand that data exists on a spectrum from personal and closed to non-sensitive and open.
Privacy concerns, a lack of fresh data, disjointed source systems, and usability problems have all hindered nascent open data initiatives in health. Initiatives have often failed to identify the high-priority use cases, driven by demand from multiple stakeholders, that would sustain the attention and investment necessary to help them overcome early challenges.
Open data that originates from health facilities as feedback from service users can be used to improve performance or to support researchers as input into policy; however, if feedback is not connected to action or if input meets political and resource constraints, it is hard to create a virtuous cycle of data publication and reuse.
The development of large public databases by government ministries, departments, and agencies (MDAs) has been ongoing in earnest in many countries around the world since at least the 1990s. The most basic of these government data systems are registers, supporting a range of government services, such as health insurance, social security, vehicle and business registration, and census-taking among many others. These registers form the basis of numerous vital public services whether the services are delivered electronically or not. Other systems are layered on top of these registers in order to support decision-making, planning, and policy-related research. To function well, many of these systems reside behind rigid security and multi-level authentication and authorisation protocols as they regularly contain very sensitive personal information about citizens.
Data about health is often considered some of the most sensitive information collected and held by governments and institutions. Yet over the last decade, there have been a number of initiatives focused on open data in the health sector. Broad et al. describe open data as “data made available by governments, businesses, and individuals for anyone to access, use and share”.1 Clearly this should not apply to the detailed personal information within health registers. So, when it comes to open data and health, it is paramount to understand the particular data held within each system, to think carefully about the levels of access that different stakeholders may want or need, and to determine how, or whether, the data may be safely anonymised prior to publication as open data. To do this, it is useful to consider a data spectrum for health, to enumerate the different stakeholders creating and using data, and to consider the challenges they must overcome before open data in the health sector can evolve from being a minor sub-community and enter the mainstream.
Keen et al. (2013) state that government MDAs and private firms coexist, often exhibiting a dichotomous relationship between public and private interests in the national health system and the data therein. The following broad categories of actors can be identified within the national health system: the state, private-sector firms, citizens/patients, doctors and other health professionals, researchers, and a broader diaspora of interested parties, including health charities and journalists. All these actors, as illustrated in Figure 1, have the potential to generate data that could be accessed and used within the health sector, and all may also be users of data generated by other actors.
Figure 1: Spectrum of health data stakeholders
Source: Authors
Different actors seek to use data for a variety of purposes. Some use data from registers to access and update information about individuals. Others look for data to support operational requirements, such as organisational planning and decision-making, to improve the efficiency and effectiveness of services, or to carry out research that informs the development of policy and practice. Data may also be used by patients to locate and access health services.
By examining how different uses of data are currently regulated, it is possible to identify a spectrum of data openness ranging from closed data with highly restricted access through to data that is openly published in reusable formats. Between these two ends of the spectrum can be found planning and decision-support data with varying levels of restriction on access and reusability, as illustrated in Figure 2.
Figure 2: Openness of data based on type and intended use
Source: Authors
The level of openness of data, who it is shared with, and in what level of detail, should also vary according to circumstances. For example, during an outbreak of a deadly disease it may become vital to understand who patients are and where they are located, but such detailed information may not be necessary when citizens engage civic leaders to mobilise resources for establishing local health centres.
This chapter will examine how to ensure health data is effectively placed on this continuum depending on its intended use. While the focus of this chapter is on the open data end of the spectrum, where individual records are generally only available in anonymised or aggregated form as Figure 2 indicates, the potential uses for open data often overlap with the needs of stakeholders who might also have access to shared or even closed data. This is an important point to be made as it affects the politics behind the open publication of health data. The challenge of working out where certain datasets should fall on this data spectrum is further compounded by advances in computing technologies that could potentially enable the deanonymisation of sensitive data on individuals.
Information technology has been the key enabler of process automation in government, which has led to more and better data, and has dictated areas for integration in order to bolster efficiency in service delivery.2 E-government improves both the quantity and the veracity of data. Examples of e-government in health services include, but are not limited to:
National health insurance schemes.
Health registries (births, deaths, treatments).
Electronic health records (patient records inputted at facilities by medical personnel).
Electronic prescriptions.
Progress toward implementation of these systems has varied, but in Estonia, for example, 99% of all prescriptions are now electronically issued by doctors,3 creating a potential wealth of data about prescribing practices.
The different political landscapes from country to country have an influence on which health programmes are prioritised by governments and the stage of development of the supporting data systems. For example, in Kenya, the health sector has been devolved so as to be able to offer more resources for better services to citizens at the subnational county level. However, this does not necessarily have to lead to poor data integration. In Kenya, the Health Data Collaborative,4 established in 2015, provides a framework that stipulates how partners (international agencies, the United Nations, governments, civil society organisations, philanthropies, donors, and academics) engage and align data initiatives with the common aim of improving health data. Similar health data collaboratives exist in Tanzania, Malawi, and Cameroon.
The World Health Organization (WHO) hosts the Global Health Observatory,5 which is a one-stop portal initiative where countries share both their health data and health priorities. Various countries are moving to implement their own national health data observatories or portals, and the scientific community is moving toward the adoption of common open data principles as evidenced by a number of platforms making clinical trials data available6 and scientific journals, such as the British Medical Journal, campaigning for more open data publication.7
In many countries, the health sector has seen significant investment in capacity building over several years. For instance, the District Health Information System 2 (DHIS2)8 is used in many countries as the national Health Management Information System (HMIS) to collect, manage, and analyse health data. At the time of writing, the open source DHIS2 software is used in over 40 countries in Africa, Asia, and Latin America, and countries that have adopted DHIS2 as their national HMIS software include Kenya, Tanzania, Uganda, Rwanda, Ghana, Liberia, and Bangladesh. The core development activities of the DHIS2 platform are coordinated by the Department of Informatics at the University of Oslo,9 supported by the Norwegian Agency for Development Cooperation (Norad); the President’s Emergency Plan for AIDS Relief (PEPFAR); the Global Fund to Fight AIDS, Tuberculosis and Malaria; the United Nations Children’s Fund (UNICEF); and the University of Oslo.
The introduction of HMIS software does not, however, automatically lead to good data quality. Processes often require data to be manually transcribed from paper into computer terminals at the health facility level before it can be captured and collated in the HMIS, where regional and national health management teams review data for quality. A review stage can impact the timeliness of data and its availability for operational decision support, with some delays of up to six months before data is made available at the local facilities where it originated.10
When data is keyed in by health workers solely for the purpose of reporting to administrative agencies, there may be limited local ownership of the data and, as a result, limited investment in its accuracy. There can be tension between the creation of systems that support doctors and clinicians in their day-to-day localised work and systems that emphasise centralised reporting. Arguably, a focus on open data availability can place extra emphasis on centralised reporting, with MDAs pushing healthcare providers to enter as much standardised information as possible. However, if system architectures do not give local stakeholders access to the information they need for planning and prioritisation, they can ultimately lead to expensive, error-prone, and patchy data.11 One remedy for this comes through the use of automated data collection systems, relying on data created at source from digital keypads, mobile devices, and user interfaces that eliminate the need to transcribe from paper in the first place.
In summary, initiatives at the international, national, and subnational levels are actively encouraging health programmes to improve data management. These initiatives cover not just the creation of data, but also focus on strengthening the use of data by targeting monitoring and evaluation processes. This suggests that, although there may be a long way to go in terms of data quality in some settings, the right steps are being taken toward a strategic approach to establish a conducive environment for leveraging data (UNECA et al., 2016) as evidenced by:
Legislative and policy reforms that will allow for harnessing data.
Significant investments in information technology, tools, and infrastructure.
Greater collaboration and coordination among health stakeholders.
Investments in administrative data collection and use at the subnational level.
Supporting and resourcing national statistical offices as key facilitators and drivers of national data ecosystems in their respective countries.
However, much of the focus here is on data use within a single stakeholder group or the use of data shared securely between two particular stakeholders. When it comes to opening up data for wider use, a number of gaps and challenges emerge.
Data from the 2016 edition of the Open Data Barometer indicates that health sector performance statistics exist in 98% of countries surveyed and are available in some form (such as aggregate tables in print or via PDFs) in 85% of countries, but only 7% of countries had openly licensed and machine-readable datasets.12
To allow for the maximum range of uses, open datasets should be disaggregated to the lowest level of administrative geography possible and split by gender, age, income, disability, and other categories. Many governments have made commitments to opening up datasets via their own open data portals, often as part of the National Action Plans submitted under their membership in the Open Government Partnership. However, data held in a national HMIS often remains locked away in the country where the system is deployed, and few portals host statistical datasets on health that contain full details. When health data is published, it often lacks the level of detail users demand or is too outdated to meet their needs.13
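As a minimal sketch of what disaggregated publication could look like in practice, the following Python example counts records by district, sex, and age band, and masks very small cells. The records, field names, and the `min_cell` threshold are hypothetical and chosen only for illustration; a real publisher would need to set its own categories and disclosure rules.

```python
from collections import Counter

# Hypothetical facility-level records; field names and values are illustrative only.
records = [
    {"district": "North", "sex": "F", "age_band": "15-24"},
    {"district": "North", "sex": "F", "age_band": "15-24"},
    {"district": "North", "sex": "M", "age_band": "25-49"},
    {"district": "South", "sex": "F", "age_band": "25-49"},
    {"district": "South", "sex": "F", "age_band": "25-49"},
    {"district": "South", "sex": "F", "age_band": "25-49"},
]


def disaggregate(rows, keys, min_cell=2):
    """Count rows per combination of `keys`; mask counts below `min_cell`."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    table = []
    for group, n in sorted(counts.items()):
        out = dict(zip(keys, group))
        # None marks a suppressed cell (the threshold is an assumption, not a standard).
        out["count"] = n if n >= min_cell else None
        table.append(out)
    return table


table = disaggregate(records, ["district", "sex", "age_band"])
by_district = disaggregate(records, ["district"])
```

The same function produces coarser or finer tables depending on the keys passed in, which mirrors the trade-off between usefulness and disclosure risk that publishers face.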
Although platforms like DHIS2 could be configured to generate regular, anonymised exports of data by using application programming interfaces (APIs), it appears this is only rarely the case (Tanzania’s HMIS portal being an interesting exception14). For example, while the DHIS2 demo shows the location of all health clinics in Sierra Leone, the national open data portal gives no clue that such data even exists, nor does it provide links to the regularly updated dataset.15
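To make the idea of a scheduled, aggregate-only export more concrete, the sketch below builds a request URL for aggregated values. It is an illustration under stated assumptions: the server address and identifiers are placeholders, and the `/api/analytics.json` path with `dx` (data), `pe` (period), and `ou` (organisation unit) dimensions reflects a general understanding of the DHIS2 Web API that should be checked against the documentation for the deployed version. No request is sent.

```python
from urllib.parse import urlencode


def analytics_url(base_url, data_items, periods, org_units):
    """Build a URL requesting aggregated (non-individual) values."""
    query = [
        ("dimension", "dx:" + ";".join(data_items)),  # what is counted
        ("dimension", "pe:" + ";".join(periods)),     # reporting periods
        ("dimension", "ou:" + ";".join(org_units)),   # administrative units
    ]
    # Keep ':' and ';' unescaped so the dimension syntax stays readable.
    return base_url.rstrip("/") + "/api/analytics.json?" + urlencode(query, safe=":;")


url = analytics_url(
    "https://hmis.example.org",  # hypothetical server
    ["DATA_ITEM_ID"],            # placeholder identifiers, not real UIDs
    ["2018Q1", "2018Q2"],
    ["DISTRICT_ID"],
)
```

A scheduled job could fetch such a URL and publish the result on a national open data portal, provided the output has been reviewed for disclosure risk first.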
For academia, particularly in Africa, the use of data to generate scientific output has remained very low (overall scientific research output is less than 1% of global research), limiting key opportunities for locally driven research that could address key development challenges.16 Alongside the limited quantity of open data, the usability of open data platforms also limits discovery and the uptake of data. In the example of the Kenya Open Data Initiative Platform, usability experiments revealed that more than half of the users found it difficult to navigate and could not find the information they were looking for via the platform.17 Where data is found and used for research in Africa, there are further challenges related to the ecosystem for knowledge dissemination, with much of the research published in non-indexed journals or left in unpublished dissertations.18 Although there is more data being generated inside public and private health services than they can analyse themselves, the potential for external stakeholders to get involved in working with this data is currently almost entirely lost.
Increasingly, there is a push from data communities, including the open data community, to engage with policy-makers and other stakeholders to ensure that decision-making is driven by data and research. There have been successes in this regard; however, much remains to be done as evidence is often not a driving factor in decision-making. Many governments will grapple with other considerations, such as budgets, politics, and development partner priorities when it comes to resource allocation,19 and these decisions can be as basic as, for example, “Do we buy SMS bundles to disseminate information to patients, pay our staff, or buy additional hospital beds?”
As already noted, the lack of a supply of fresh data, especially from government as the key source of official statistics and operational information, has led to limited progress in developing open data initiatives in health. To date, many seem to have fallen short on scalability and sustainability. This can be attributed in part to failures in identifying high-priority use cases for health data that are driven by demand from multiple stakeholders, which will serve to embed open data initiatives within the wider data ecosystem. The integrated approaches illustrated through the examples in the box on what happens when health data is open are, at present, the exception rather than the rule. As a result, projects have often failed to actualise value through visible results that could lead to continued investment and development.20 To make sure more opportunities related to open health data are realised, policy-makers, practitioners, and funders will need to address three key challenges.
What happens when health data is open?
The following examples illustrate the potential of open health data:
Maternal mortality in Mexico: working with the Government of Mexico, the Data Science for Social Good programme at the University of Chicago has explored how available datasets can be leveraged to support reductions in maternal mortality, a key target of the Sustainable Development Goals (SDGs). Researchers, working with a combination of open and shared data, explored how analysis at the regional level could present a more granular picture of how current interventions may be working.21
In Uruguay, A Tu Servicio has taken data on healthcare provider performance and made this accessible to citizens, supporting them to make better decisions during the annual one-month window when Uruguayans can choose whether or not to switch healthcare providers.22 Data made accessible through the site has been used by politicians, media, and by over 35 000 citizens (more than 1% of Uruguay’s population).
During the Ebola outbreak in Sierra Leone, responders made use of HDX, the open data Humanitarian Data eXchange platform, to bring together up-to-the-minute data from different stakeholders, visualising the results through open mapping tools.23 The Ministry of Health and Sanitation released geocoded data on health facilities, while others released data on Ebola cases and current organisational responses. Multiple stakeholders used the data to identify the regions that most urgently needed medical supplies. Using an open data approach reduced friction in data exchange during this crisis situation.
As explored in the previous sections, technology has been a key driver of e-government and has resulted in substantial growth in the amount of health data available. The coming decade could see further dramatic developments in the use of technology in healthcare, and, consequently, the rapid expansion of data availability, especially with the trend toward big-data enabled healthcare. Potential open data users must be prepared for this expansion, while also ready to address the critical need for information governance. Most importantly, the orientation of open data projects must move from analysis to action to create an evidence base that can reveal the different components needed to secure meaningful impact on the health system.
The potential for big data to improve health outcomes and create new revenue streams and complementary services has often been acknowledged.24 One of the trends emerging as the healthcare community recognises the potential value of the data generated by advanced medical equipment is “servitisation”. In commercial circles, servitisation describes the trend of companies moving from selling goods to selling “bundles” of goods, services, support, self-service, and knowledge. These hybrid product-services place the emphasis on the service component and have a much heavier reliance on data,25 creating new potential opportunities, including economic, social, and environmental efficiencies. In this new world, for example, expensive MRI scanners are constantly monitored and repaired by a service firm, while older models can be acquired by health systems with smaller budgets, such as MDAs in developing countries. Consumer technologies also now collect a wealth of data that may be of value to healthcare stakeholders, with mobile phones and fitness trackers recording countless data points every day.
However, before healthcare stakeholders can realise the benefits of big data (including large anonymised open datasets), there are a number of prerequisites:
1. Infrastructure that can handle the required storage and analytics as managing large datasets can be complex and expensive. This infrastructure also needs to allow stakeholders to determine how and when data should be disposed of when it is no longer of value.
2. Access to data for external stakeholders, recognising it is often not the government agency which collects it, but other stakeholders who have the skills and resources to create new value from data.
3. Integration of data from multiple systems, including the ability to connect new streams of big data with systems that are still using brittle legacy architectures.
4. Connectivity to high-capacity internet. This has a huge impact for the developmental potential of health data in environments with poor connectivity.
Even if open data approaches make access to data more evenly distributed, the capacity to use it may not be. More attention must be given to who ultimately benefits and whether healthcare inequalities might be challenged or reinforced. As a result of servitisation and the other broad trends in the delivery of healthcare, private firms (hospitals, banks, insurers) and civil society organisations are increasingly in possession of data that can also contribute to national or government healthcare objectives, even though the data may not be of great utility to the organisations that have collected it.26 This draws attention to the non-state actors who are collecting important data that could be used to complement state data. Discussions about legal reforms that could allow privately generated data to contribute to official statistics have begun, but major advances have not yet been realised.27 However, some of the recent literature has expressed concerns that this kind of public–private data sharing may reinforce relationships between state and private sector actors and weaken the power and positions of both citizens/patients and professionals.28 Working out what should be shared beyond the private–state axis and how more data should be open to researchers and citizens to use remains a vital task. The success or failure of open data in health may largely depend on how the question of trust between organisations is addressed as big data flows continue to develop. This is ultimately a question of information governance.
Open data is not just about technology. It involves a mesh of people (with newer technologies implemented mostly in a piecemeal fashion), processes (policies and guidelines), culture (changes in attitudes, behaviours, and practices), and legacy systems (including existing IT infrastructures).29 This “ecosystem” produces complex dynamics around data. For example, published data does not remain static. It can keep changing continuously with new fields introduced or integration with other related datasets, including those from non-health sectors, which also bring new challenges, namely the potential negative consequences from privacy breaches or from unethical research.
Many health problems are highly personal and patients need to be confident that their conversations with doctors and other professionals are confidential. While the data is important for treating the patient (primary use at administrative or operational levels at the facilities), secondary uses, such as medical research or planning health services, may pose a challenge. Striking a balance between primary and secondary uses of data is increasingly difficult because modern technology makes it possible to combine data and identify individuals through statistical inference.30 This provides one of the regulatory paradoxes of open data in health: the more details a dataset contains, the more valuable it is (for example, to detect patterns of health inequality), but also the greater the likelihood of identifying individuals and disclosing sensitive personal information.
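The paradox can be illustrated with a simple measure borrowed from the privacy literature: the size of the smallest group of records that share the same combination of indirect identifiers (often called k in k-anonymity). The Python sketch below uses invented rows and field names; it is not a complete disclosure-risk assessment, and it swaps in k-anonymity as one convenient illustration rather than a method described in this chapter.

```python
from collections import Counter

# Hypothetical rows with names already removed; values are illustrative only.
rows = [
    {"district": "North", "birth_year": 1980, "sex": "F", "diagnosis": "A"},
    {"district": "North", "birth_year": 1980, "sex": "F", "diagnosis": "B"},
    {"district": "North", "birth_year": 1991, "sex": "M", "diagnosis": "A"},
    {"district": "North", "birth_year": 1975, "sex": "M", "diagnosis": "C"},
    {"district": "South", "birth_year": 1980, "sex": "F", "diagnosis": "B"},
    {"district": "South", "birth_year": 1962, "sex": "F", "diagnosis": "A"},
    {"district": "South", "birth_year": 1962, "sex": "F", "diagnosis": "A"},
]


def smallest_group(rows, quasi_identifiers):
    """Return k: the size of the smallest group sharing the same identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(groups.values())


k_coarse = smallest_group(rows, ["district"])
k_medium = smallest_group(rows, ["district", "sex"])
k_fine = smallest_group(rows, ["district", "sex", "birth_year"])
```

In this toy dataset, each extra field published lowers k, and at k = 1 at least one person is unique on the published fields and could be singled out by anyone who knows those facts about them.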
The European Union’s General Data Protection Regulation (GDPR) and the data provisions of the United States Health Insurance Portability and Accountability Act (HIPAA) try to provide frameworks to address the security and reuse of data on individuals, but many countries still lack suitable legal frameworks (see Chapter 23: Privacy), and questions still remain around the appropriate reuse of personal experimental data in research-like activities.31 When there is a lack of clarity between closed, shared, and open data, citizen trust may be undermined. This was evident when the Government of the United Kingdom proposed a data-sharing framework in 2013 for medical records from the National Health Service (care.data) using the language of “open data” even though the scheme would not have published individuals’ information under an open licence.32 After a backlash from citizens, the scheme was cancelled, and awareness and opinions about open data were also tainted.33,34
Even in the absence of the socio-technical infrastructures and governance frameworks needed to identify what and how increased health-related data can be made open to academic and citizen stakeholders, there have been, as noted above, cases where health data has been available, accessible, and used; however, these cases have not always led to long-term change.
There is a need to move from just data release to action. Although open health data may build transparency, if there is no real commitment and accountability for the use of evidence in decision-making within government, then effective adoption and use of data will not occur. For example, when citizens report on poor service delivery at a health facility and feedback is not acted upon, enthusiasm for data understandably wanes. The converse is true when data is visibly acted upon. In Swaziland, UNICEF’s U-Report platform is used by the quality assurance teams within government to perform customer satisfaction surveys using a free short message service (SMS). Given the cultural context, a client might not provide clear feedback on what the problem was with the services they have obtained from a facility, but, with SMS, they are anonymous, and they might even mention names of those who have caused problems at the facility. Actions undertaken in response to this information are clearly evident to the client, and as a result, they are even willing to pay for the SMS.35
Getting from data use to action requires relationship building and the development of products that can scale and be adapted to different healthcare environments. As the Prescribing Analytics case shows (see box), it can be a long journey between discovering the potential for change in health services using open data and seeing that change realised at scale. At present, few initiatives outside of academia may have access to the funding needed to pursue these longer-term programmes. Expanding the number of stakeholders (funders, academia, technology innovators, medical charities, governments, etc.) who are able to invest the necessary resources, and work collaboratively to take open data initiatives from proof-of-concept to full implementation, is vital.
Case study: Prescribing Analytics
The Prescribing Analytics website36 was created by a group of open data enthusiasts, companies, and researchers at a 2012 “NHS Hack Day” event. The project used newly released data on doctors’ prescribing to look for potential cost savings from prescribing cheaper drugs, identifying potential savings of GBP 27 million a month from changing the approach for one drug alone.37 Unsurprisingly, this single finding did not change doctor behaviour. Indeed, the problem of expensive drug use had been reported as early as 2006 using other data sources; however, the project team has gone on to develop the Evidence-Based Medicine DataLab38 at Oxford University, as part of the Open Prescribing project,39 which provides data, tools, and email alerts to doctors to help them find clinic-level cost savings and prescription improvements. This journey from idea to implementation of a platform tailored to the needs of key stakeholders highlights the movement from data release to impact and the need for longer-term research on the potential impacts of open data in health.
Figure 3: Percentage of proprietary statin prescribing by CCG September 2011 to May 2012
Source: www.prescribinganalytics.com
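The kind of calculation behind such an analysis can be sketched in a few lines of Python. The figures, costs, and field names below are invented for illustration and do not reproduce the project's data or methodology; the saving is an upper bound that assumes every proprietary item could be switched to a generic equivalent, which will not always be clinically appropriate.

```python
# Hypothetical monthly prescribing figures per CCG; all numbers are illustrative.
prescribing = [
    {"ccg": "CCG A", "proprietary_items": 200, "generic_items": 800},
    {"ccg": "CCG B", "proprietary_items": 500, "generic_items": 500},
]
PROPRIETARY_COST = 20.0  # assumed cost per item, GBP
GENERIC_COST = 2.0       # assumed cost per item, GBP


def summarise(row):
    """Return the proprietary share and the maximum monthly saving for one CCG."""
    total = row["proprietary_items"] + row["generic_items"]
    share = 100.0 * row["proprietary_items"] / total
    saving = row["proprietary_items"] * (PROPRIETARY_COST - GENERIC_COST)
    return {"ccg": row["ccg"], "proprietary_pct": share, "max_saving_gbp": saving}


summary = [summarise(r) for r in prescribing]
total_saving = sum(s["max_saving_gbp"] for s in summary)
```

Ranking areas by `proprietary_pct` gives the kind of comparison shown in Figure 3, while `total_saving` corresponds to a headline estimate of potential savings.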
The International Open Data Conference (IODC) brings together a few thousand people every two years. Major healthcare conferences may have ten times that many attendees to discuss research, products, and innovations, most of which have a data component. Over the last decade, open data has made some inroads into the medical science community; however, concerns over privacy and infrastructure, together with the challenges of creating trust and sustainable projects based on open health data, have limited progress. Yet, there is much for the open data field to learn from the health sector as it forces continuous engagement with issues related to personal data, ethics, and the interaction of different stakeholder groups.
This chapter has started to sketch out distinctions between different stakeholders and the different approaches to data sharing, as well as to highlight challenges arising from a private–public nexus of data sharing that could exclude citizen access to data. However, much more needs to be done to bring clarity to the health and open data discussion. Lumping together administrative data for decision-making and longitudinal data for research purposes can frustrate progress. This is because the goals of the stakeholders are different: some are focused on health planning and policy improvements, whereas health facility managers are mostly interested in day-to-day patient management. Building infrastructure capacity will be an ongoing issue as the technical foundations to produce and use open data vary substantially around the world even if all regions are heading toward increasingly digitised healthcare.
Perhaps when we look back on open data and health in the next decade, we will have a much clearer framework available to understand the different potential applications from policy and epidemiological research through to enabling decision-making by patients. Ultimately, the search for innovation should continue with a broader view of real-use cases and examples of stakeholders that have been able to access health data, build services, or develop policy, and then make the impact sustainable.
Further reading
Kostkova, P., Brewer, H., De Lusignan, S., Fottrell, E., Goldacre, B., Hart, G., Koczan, P. et al. (2016). Who owns the data? Open data for healthcare. Frontiers in Public Health, 4, 7. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4756607/
Tambo, E., Madjou, G., Khayeka-Wandabwa, C., Tekwu, E.N., Olalubi, O.A., Midzi, N., Bengyella, L., Adedeji, A.A., & Ngogang, J.Y. (2016). Can free open access resources strengthen knowledge-based emerging public health priorities, policies and programs in Africa? F1000Research, 5 (9 May). https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4955019/
Verhulst, S., Noveck, B.S., Caplan, R., Brown, K., & Paz, C. (2014). The open data era in health and social care: A blueprint for the national health service (NHS England) to develop a research and learning programme for the open data in health and social care. Brooklyn: GovLab. http://www.thegovlab.org/static/files/publications/nhs-full-report.pdf
Mark Irura has extensive experience heading end-to-end digital implementations for bilateral and multilateral development agencies, government ministries, and the private sector. With a decade of experience across East and Southern Africa at both national and subnational levels, Mark has implemented and managed large-scale platforms, assessed scalable technical tools, and designed comprehensive data models. He previously studied at the University of Cape Town and at Strathmore University and is presently pursuing doctoral studies at the University of Eastern Finland.
How to cite this chapter
Irura, M. (2019). Open data and health. In T. Davies, S. Walker, M. Rubinstein, & F. Perini (Eds.), The state of open data: Histories and horizons (pp. 166–180). Cape Town and Ottawa: African Minds and International Development Research Centre. http://stateofopendata.od4d.net
This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. It was carried out with the aid of a grant from the International Development Research Centre, Ottawa, Canada.
1Broad, E., Smith, F., Duhaney, D., & Carolan, L. (2015). Open data in government: How to bring about change – The ODI. London: Open Data Institute. https://theodi.org/article/open-data-in-government-how-to-bring-about-change/
2InfoDev & Center for Democracy and Technology. (2002). The e-government handbook for developing countries. Washington, DC: Center for Democracy & Technology. http://www.infodev.org/sites/default/files/resource/InfodevDocuments_16.pdf
3Rivera, A.M.A. & Vassil, K. (2015). Estonia – A successfully integrated population-registration and identity management system: Delivering public services effectively. Washington, DC: World Bank. https://doi.org/10.1596/28077
4https://www.healthdatacollaborative.org/
6For example, Krumholz, H.M. & Waldstreicher, J. (2016). The Yale Open Data Access (YODA) Project: A mechanism for data sharing. New England Journal of Medicine, 375(5), 403–405. https://doi.org/10.1056/NEJMp1607342
7https://www.bmj.com/open-data
9https://www.mn.uio.no/ifi/english/research/networks/hisp/
10Development Gateway. (2016). Development Gateway 2016 Annual Report: Turning data into action. Washington, DC: Development Gateway.
11Keen, J., Calinescu, R., Paige, R., & Rooksby, J. (2013). Big data + politics = open data: The case of health care data in England. Policy & Internet, 5(2), 228–243. https://doi.org/10.1002/1944-2866.POI330
12Web Foundation. (2017). Open Data Barometer – Global report. 4th edition. Washington, DC: World Wide Web Foundation. https://opendatabarometer.org/4thedition/report/
13Mutuku, L. & Mahihu, C. (2014). Open data in developing countries: Understanding the impacts of Kenya open data applications and services. Nairobi: iHub Research. https://idl-bnc-idrc.dspacedirect.org/handle/10625/56300
14See https://hmisportal.moh.go.tz/. Please note that data visualisation was not working at the time of our research in November 2018.
15Based on a comparison of https://www.dhis2.org/demo and http://opendatasl.gov.sl/ in November 2018.
16Francescon, D. (2017). Research without borders: Sharing expertise in Africa. Elsevier Connect, 13 January. https://www.elsevier.com/connect/research-without-borders-sharing-expertise-in-africa
17Mutuku, L. & Mahihu, C. (2014). Open data in developing countries: Understanding the impacts of Kenya open data applications and services. Nairobi: iHub Research. https://idl-bnc-idrc.dspacedirect.org/handle/10625/56300
18UNECA, United Nations Development Programme, Open Data for Development & World Wide Web Foundation. (2016). The Africa Data Revolution Report 2016: Highlighting developments in African data ecosystems. Addis Ababa: United Nations Economic Commission for Africa Printing and Publishing Unit. http://www.africa.undp.org/content/rba/en/home/library/reports/the_africa_data_revolution_report_2016.html
19Bhatia, V., Stout, S., Baldwin, B., & Homer, D. (2017). Results Data Initiative: Findings from Tanzania. Washington, DC: Development Gateway. https://www.developmentgateway.org/sites/default/files/2017-02/RDI-Tanzania.pdf
20Mutuku, L. & Mahihu, C. (2014). Open data in developing countries: Understanding the impacts of Kenya open data applications and services. Nairobi: iHub Research. https://idl-bnc-idrc.dspacedirect.org/handle/10625/56300
21Eng, N. (2014). Making our moms proud: Reducing maternal mortality in Mexico. Data Science for Social Good, 4 August. Center for Data Science and Public Policy at the University of Chicago. https://dssg.uchicago.edu/2014/08/04/making-our-moms-proud-reducing-maternal-mortality-in-mexico/
22Sangokoya, D., Clare, A., Verhulst, S., & Young, A. (2016). Uruguay’s A Tu Servicio: Empowering citizens to make data-driven decisions on health care. Brooklyn, NY: GovLab. http://odimpact.org/case-uruguays-a-tuservicio.html
23Young, A. & Verhulst, S. (2016). Battling Ebola in Sierra Leone: Data sharing to improve crisis response. Brooklyn, NY: GovLab. http://odimpact.org/case-battling-ebola-in-sierra-leone.html
24Veale, M. (2016). Data management and use: Case studies of technologies and governance. London: British Academy and the Royal Society. https://royalsociety.org/~/media/policy/projects/data-governance/data-governance-case-studies.pdf?la=en-GB
25Vandermerwe, S. & Rada, J. (1988). Servitization of business: Adding value by adding services. European Management Journal, 6(4), 314–324.
26Veale, M. (2016). Data management and use: Case studies of technologies and governance. London: British Academy and the Royal Society. https://royalsociety.org/~/media/policy/projects/data-governance/data-governance-case-studies.pdf?la=en-GB
27UNECA, United Nations Development Programme, Open Data for Development & World Wide Web Foundation (2016). The Africa Data Revolution Report 2016: Highlighting developments in African data ecosystems. Addis Abba: United Nations Economic Commission for Africa Printing and Publishing Unit. http://www.africa.undp.org/content/rba/en/home/library/reports/the_africa_data_revolution_report_2016.html
28Harvey, D. (2005). A brief history of neoliberalism. New York, NY: Oxford University Press.
29Buttles-Valdez, P., Svolou, A., & Valdez, F. (2008). A holistic approach to process improvement using the People CMM and the CMMI-DEV: Technology, process, people, & culture, the holistic quadripartite. In Software Engineering Institute, SEPG 2008 Conference, Tampa, FL.
30Ohm, P. (2010). Broken promises of privacy: Responding to the surprising failure of anonymization. UCLA Law Review, 57, 1701. https://papers.ssrn.com/abstract=1450006
31Veale, M. (2016). Data management and use: Case studies of technologies and governance. London: British Academy and the Royal Society. https://royalsociety.org/~/media/policy/projects/data-governance/data-governance-case-studies.pdf?la=en-GB
32Wolf, A. (2014). Thanks to care.data, your secrets are no longer safe with your GP. Wired UK, 7 February. https://www.wired.co.uk/article/care-data-nhs-healthcare
33Boiten, E. (2016). Care.data has been scrapped, but your health data could still be shared. The Conversation, 12 July. http://theconversation.com/care-data-has-been-scrapped-but-your-health-data-could-still-be-shared-62181
34Kostkova, P., Brewer, H., De Lusignan, S., Fottrell E., Goldacre, B., Hart, G., Koczan, P. et al. (2016). Who owns the data? Open data for healthcare. Frontiers in Public Health, 4(7). https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4756607/
35UNICEF. (2016). UNICEF Annual Report 2016: Swaziland. New York, NY: United Nations International Children’s Emergency Fund. https://www.unicef.org/about/annualreport/files/Swaziland_2016_COAR.pdf
36http://www.prescribinganalytics.com/
37The Economist. (2012). Beggar thy neighbour: Open data and health care. The Economist, 8 December. Print Edition. https://www.economist.com/britain/2012/12/08/beggar-thy-neighbour
Global availability of land ownership and land deals data is patchy, but, when available, it has been used by individual citizens, entrepreneurs, civil society, and journalists.
Over the last decade, a number of responsible data lessons have been learned. These lessons can provide guidance on how to balance transparency and privacy and on how to draw research conclusions from partial data.
In spite of large donor investments in land registration systems, few resources are currently made available to enable open data related to these projects. There are untapped opportunities as a result.
Lessons from the land ownership field highlight the political nature of data, and illustrate the importance of politically aware interventions when creating open data standards, infrastructure, and ecosystems.
Open data is often described as a non-rival good and inexhaustible resource. If I take a digital copy of a dataset, it doesn’t leave less data for you. This effectively costless sharing of open data is central to the logic that it should be made freely available and reusable, rather than treated as a finite resource to be hoarded. Land as a resource, however, is very different. Each use of land precludes use by others. Land is finite, and there is competition to control and exploit it. Potential users of land are often excluded by distance and by physical and legal barriers. Data also plays into this competition over land. Effective access to land data for one user may lead to a significant first-mover advantage and, thus, preclude other users from taking action vis-à-vis a parcel of land, even if they eventually have access to the same data.
When we also consider the natural resources that land provides, from the minerals underneath to the soil and crops on top, we can see that land can be managed well or can become degraded through over-exploitation. Unlike a digital dataset, where each different use can bring cumulative benefits, with land, there is a much more delicate balance to be struck. Yet, when it comes to understanding who owns or holds rights over land, the transactions that affect it, or how it is being managed, the word most often used is “murky”.1 Comprehensive and detailed information about land ownership is scarce.
Some of this is unsurprising. Land ownership patterns have developed over many centuries with overlapping systems of tenure, and, in many countries, these can involve feudal structures, traditional rights, common lands, leaseholds, and freeholds. The first registers of titles to land only emerged in the 1850s under colonial administrative predicaments, and many countries still lack centralised registers, let alone systems that have digitised full country-wide records. Unlike many other government databases that might be born-digital, such as those created by electronic monitoring of the distribution of welfare services, land (ownership) data is often stored in legacy, pre-digital information systems. Digitisation and verification of such legacy data is an expensive and extensive undertaking, especially in larger countries still migrating from a paper-based land records system. This implies, among other things, that land ownership data is costly to produce and maintain, even though it is relatively costless to share once digitised. Further, across the world, owners, custodians, and communities have a wide range of often complex and overlapping rights and responsibilities in relation to land, which are often not captured by the simplified data representations used when land information systems are migrated from paper-based to digital records.
However, over recent decades, markets for land have globalised, and land has increasingly become a valuable asset class. This has led to vast, and often secretive, land deals taking place across the world with much remaining unknown about their scale and scope.2 At the same time, national and local debates over land rights have been unfolding, with local communities often fighting similar battles in parallel geographic silos. National-scale debates and movements have also brought into focus the importance of understanding land and land ownership. For example, the Constitutional Court of South Africa has recently handed down two landmark judgments upholding the land rights of women and communities affected by mining activities.3
Ultimately, the lack of transparency on land deals and the fragmented information landscape around land ownership present problems felt by government, citizens, civil society organisations, and the private sector. For example, without clear information, governments are unable to identify and evaluate policy interventions to stimulate housing development, developers cannot locate land to build on, and communities cannot monitor whether environmental protections are being upheld or claim their rights over geographical areas inhabited for generations. Taken together, all these challenges have fed into calls for increased openness about land ownership, and they bring focus to the idea that open data can be used as a critical tool to address the land ownership transparency gap.
Land ownership and open data already have a history. When, in 2011, Michael Gurstein wrote his widely cited paper, “Open data: Empowering the empowered or effective data use for everyone?”, it was the release of land ownership information he turned to in order to ask his critical questions.4 Drawing on the account by Solomon Benjamin et al. (2007) of the Bhoomi land reform project in Bangalore, he described how “the digitization and related digital access to land title had the direct effect of shifting power and wealth to those with the financial resources and skills to use this information in self–interested ways”.5 Although Gurstein was cautious not to frame this as an argument against open data, but as one about the complementary interventions needed alongside it, the Bhoomi case has become iconic in open data discourse, frequently used to introduce the potential downsides of openness.
How far then have open data ideas progressed in relation to land ownership and governance? What is the current state of the art? And what lessons has the last decade provided? In the following sections, this chapter explores these questions through four lenses: first, with a look at cadastres and land registers, then at data on land deals and transactions, followed by data on land use, and finally, at how the land governance community is engaging with open data. In doing so, the chapter seeks to highlight how the topic of land ownership and open data provides a unique perspective on the challenges of building open data infrastructures and ecosystems in the context of unequally distributed power and wealth and how the power dynamics around data cannot be ignored.
Understanding land ownership generally relies upon two types of data: cadastres, which record the boundaries (formal or informal) of land parcels, and land registries, which record property rights and interests, and the details of ownership of particular parcels of land.6 While some countries have unified systems, in others, there are separate systems for each function, different systems at each level of government, or distinct cadastres and registries maintained by individual agencies, such as government departments related to natural resources and mining.
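The split between the two record types can be sketched in code. The following is a minimal illustration only, assuming a shared parcel identifier links the two systems; the class names, field names, and sample values are hypothetical and do not reflect the schema of any actual cadastre or land registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CadastreParcel:
    """Hypothetical cadastre record: where a parcel is."""
    parcel_id: str
    boundary: tuple  # sequence of (x, y) vertices
    tenure_type: str


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical land registry record: who holds which right."""
    parcel_id: str
    holder: str
    right: str  # e.g. "freehold", "leasehold", "customary use"


def rights_by_parcel(parcels, entries):
    """Group registry entries under the cadastre parcels they refer to.

    Returns (matched, unmatched): a dict of parcel_id -> list of entries,
    and a list of entries whose parcel_id is missing from the cadastre.
    """
    matched = {p.parcel_id: [] for p in parcels}
    unmatched = []
    for entry in entries:
        if entry.parcel_id in matched:
            matched[entry.parcel_id].append(entry)
        else:
            unmatched.append(entry)
    return matched, unmatched


# Illustrative data only.
parcels = [CadastreParcel("P1", ((0, 0), (0, 1), (1, 1), (1, 0)), "freehold")]
entries = [
    RegistryEntry("P1", "Holder A", "freehold"),
    RegistryEntry("P1", "Community B", "customary use"),
    RegistryEntry("P9", "Holder C", "leasehold"),
]
matched, unmatched = rights_by_parcel(parcels, entries)
print(len(matched["P1"]), "rights recorded on P1;", len(unmatched), "entry without a parcel")
```

Where the two functions sit in separate systems, entries that cannot be matched to a parcel, such as the hypothetical "P9" above, are one simple way to picture the reconciliation problems that arise.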
Since they started tracking land ownership data, both the Open Data Index7 and the Open Data Barometer8 have reported it to be one of the least available categories of data. This has remained a consistent finding, even after the Open Data Index dataset definition was updated in 2016 to remove the requirement that open land ownership data should include identifiable property owners.9 This revision, based on work with Cadasta Foundation, represented a more mature understanding in the open data community of the complex power dynamics and administrative structures around property ownership in different countries and the careful balance to be struck between privacy and transparency when it comes to land ownership records.
For example, in New Zealand, a detailed cadastre showing plots and the tenure type of each plot has been available since 2011 under Creative Commons licensing,10 but access to data that includes ownership information requires users to agree to a separate licence for personal data.11 In the United Kingdom (UK), individual title information can only be accessed for individual plots by purchasing title deeds, but a unified dataset of land held by commercial, corporate, and government owners was made available for free as bulk data in 2017, albeit under restrictive licensing terms that emphasise it should only be used for personal and non-commercial use, effective management of land, and prevention of crime.12 Apart from transparency needs and privacy concerns, the significant commercial value of land data, especially of disaggregated data that incorporates ownership and land use information, shapes the decisions by land administration authorities regarding the opening of data as the New Zealand and UK cases illustrate.
While Rufus Pollock’s arguments support the view that the model of charging users for access to land titles is economically inefficient and leads to a loss of societal benefits (as well as leading to inequality between those who can afford to build their own plot-by-plot view of land ownership and those who cannot),13 others see selling access to data plot-by-plot as a reasonable restriction, judging that open access to the full dataset would be harmful in a way that selective access to records is not. Cadasta Foundation’s analysis of open land ownership data suggests, however, that the level of land ownership transparency that is appropriate is likely to be context dependent from country to country, noting that “the UK is a highly developed and relatively equitable country with a 150 year old land administration system that holds 24 million titles. Opening up data on property owners’ names in this context has very different risks and implications than in a country with less formal documentation, or where dispossession, kidnapping, and or death are real and pervasive issues.”14
Who uses land data?
United States (US) real-estate platform Zillow draws upon US housing transaction data to provide housing purchase and rental valuations and provides an open application programming interface (API) of government records it has digitised and converted into structured data. The business was valued at USD 540 million at the time of its IPO in 2011.15
In New Zealand, wind farm developers have taken advantage of machine-readable cadastral and land ownership data to speed up the process of identifying and planning new sites.16
Investigations by the New York Times uncovered the true owners of expensive New York apartments purchased through anonymous shell companies. The investigation helped lead to actions by the US Government to seize assets suspected to have been bought with money stolen from Malaysia’s sovereign wealth fund in the 1MDB scandal.17
Note: current use of land data is greatly limited by availability. A number of the cases illustrating what could be done with land data in this chapter have sourced their data through Right to Information (RTI) requests or other research, rather than having direct access to open land datasets. Of the 17 countries with more than a 0% score for open publication of land ownership data in the latest Open Data Index, five are from Asia, 11 from Europe, and one from the Caribbean region.18
Privacy and security issues aside, one of the biggest hurdles to increasing the availability of land ownership records is the fact that many have still not been digitised. For many decades, development banks, including the World Bank, have provided extensive financial support to national and subnational efforts to develop cadastres and land registries in developing and middle-income countries. It is notable, however, that none of these projects, even those recently established, appear to have any explicit open data component, talking at best only about online portals.19 It is also worth noting that many digital land titling projects have taken decades longer than planned to complete and have struggled to overcome the considerable technical and logistical challenges of converting millions of paper records into digital forms.
Large-scale land digitisation projects also face critical questions about their tendency to adopt narrow ontologies, and to represent land in terms of simple ownership, rather than as a complex web of rights.20 Studies report that digitisation initiatives restructure not only data but the bureaucracy around it.21,22 It is primarily this concern with the way digitisation took place, ignoring traditional land usage in favour of only a limited class of documented land rights and centralising power over land decisions within higher levels of government, that was arguably at the root of the Bhoomi case,23 with open access in situations of low literacy or low capacity of users to effectively use the digitised data presenting a secondary, albeit critical, complication.
For the millions of people around the world without secure title to their land, the official datasets and data structures used to judge land disputes represent a major source of power. But if open data is understood as more than a one-way flow of data from governments, and instead, as a means to allow citizens to create and publish data about their land ownership, opportunities exist to shift that balance of power and create records that can be used to support land claims. For example, tools developed by Cadasta Foundation support communities to document their own land use and rights data, adopting flexible data models and offering fine-grained control of what is, or is not, shared openly.24 Where such systems are compatible with local legal regimes, they can give communities more control of land ownership evidence and offer a route to greater empowerment.
There have also been a number of announcements in the last few years of blockchain or distributed ledger-based alternatives to, or add-ons for, government land registry systems. Although these might, in theory, provide access to cryptographically secured and open land data,25 they do not escape the need to determine the provenance of the information added to the ledger, and evidence of any blockchain-based land registers in operation, or achieving impacts on the ground, is vanishingly thin.26
Even when land registry data is collected and kept updated, three further barriers to open data access are commonly found: cost, infrastructure, and discoverability. In South Africa, for example, it is possible to browse a detailed cadastral map of property boundaries and tenure types online through a free portal,27 but access to detailed data requires the payment of fees for each 100 or 200 parcels.28 Renée Sieber, in Chapter 9: Geospatial, also notes the increasing presence of private businesses in providing cadastral services, sometimes in return for exclusive rights to monetise the resulting data. In Europe, the 2007 INSPIRE Directive on geospatial data (see Chapter 32: European Union) has led to some progress on making cadastral records available as standardised open data,29 although users seeking to bring together data across countries are likely to be met with numerous technical errors, incompatible metadata, and broken APIs. The technical complexity of both producing and consuming cadastral data may also help explain why spot checks of Open Data Index and Open Data Barometer assessments reveal weaknesses in the accuracy of their land ownership measurements, with researchers apparently struggling to consistently locate and assess the openness of cadastral data.30
In summary, open data ideas are relatively new within the long-established and politically charged field of land registration. While in some higher-income countries an early balance appears to have been struck between making cadastral data “open by default” and protecting the privacy rights of individual owners, there is a long way to go before the balance is struck for most countries, particularly when capacity to use data is also unevenly distributed. While the possibility of open data approaches allowing marginalised groups to take control of the representation of their own land rights is worthy of more focused research, the key technological need right now appears to be skills for grassroots data collection and management as opposed to innovations in specific database technology, such as blockchain or other distributed ledger solutions.
Data on land ownership is not only captured through static registries. Over the last decade, there has also been considerable interest in transaction data related to the buying and selling of land. This kind of data can reveal the value of land, show changing patterns of land ownership and use, and highlight risks related to money laundering and corruption.
Sources of land deal data range from national government records, such as the UK Land Registry Price Paid Dataset that lists residential property transactions,31 to crowdsourced datasets, such as GRAIN32 and Land Matrix,33 created by a network of researchers drawing on crowdsourcing and media reports to provide a partial global view of prospective or completed land deals. This latter class of data has become the subject of some controversy, illustrating the tensions that can exist when creating datasets to support research and advocacy.
Founded in 2009 by a group involving the International Land Coalition (ILC), among others, LandMatrix.org launched a beta dataset of “land grabs” in April 2012, offering a downloadable list of locations and investors, along with the anticipated size of the area to be bought. This, along with data from GRAIN, helped to spark a number of academic papers and media reports on the phenomenon of land deals with a particular emphasis on deals in Africa. However, Oya (2013) has argued that the crowdsourced data lacked methodological rigour, and that a focus on generating “killer facts” through rapid research could ultimately undermine the work of researchers and advocacy organisations seeking to understand deals, providing “false precision” and generating data that would not be trusted by governments and businesses.34 Scoones et al. (2013) have described this as the “politics of evidence”.35 By 2013, revisions to the LandMatrix methodology and dataset structure to more clearly illustrate source information had responded to some of these critiques, suggesting a reasonably tight feedback loop between academic and activist communities. Although it appears work on open data around land deals peaked in 2012–13, both GRAIN and LandMatrix have continued data collection. LandMatrix, in particular, is preparing for a new version to be released with updated data and features, working through a network of regional focal point institutions, including the University of Pretoria in South Africa, the Asian Farmers’ Association for Sustainable Rural Development (AFA) in Asia, and the Foundation for Development in Justice and Peace (FUNDAPAZ) in Latin America.36
Oya’s critique of land grab databases also questioned the reliance on datasets alone and called for more mixed-methods and in-depth research. One tool responding to this has been OpenLandContracts.org,37 which was launched in October 2015 by the Columbia Center on Sustainable Investment (CCSI) and builds on a platform created for extractives contract monitoring. This tool provides full text land deal documents and allows their annotation to create additional structured data. Szoke-Burke (2016) writes that the platform can encourage “more sustainable land-use practices and fresh opportunities for public participation in decision-making on [land] investments”.38
It is notable, however, that while the systematic publication of government procurement contracts has received considerable international attention (see Chapter 1: Accountability and anti-corruption), there has been much less policy focus on proactive publication of government land deals, even in light of substantial programmes of government land disposal in a number of countries. The UK, for example, has required local government agencies to prepare and publish open data on their land holdings, identifying surplus land which might be sold off for housing or property development. Yet there is no corresponding requirement to publish data on the land that has been sold off, who it was sold to, and how it is subsequently developed.39 This fits with an emphasis in government policy on using data to support an emerging PropTech (Property Technology) sector,40 rather than supporting public ownership of land.41 In seeking to take a global look at this issue, we could not locate any sources indicating the extent to which different countries provide structured data on government land holdings, their purchases, and disposals.
Ultimately, when it comes to land deals, crowdsourced open data has been instrumental in generating debate. However, its use has also brought into relief the politics of data, leading organisations to seek a balance between rapid data-driven research and rigorous data collection that combines quantitative and qualitative perspectives. Data on government land deals is of particular interest; however, there appear, at present, to be few coordinated calls for its proactive publication.
Private Eye - Land deals data and offshore ownership
In 2015 and 2016, the British satirical and current affairs magazine Private Eye investigated ownership of UK property through offshore companies using a mix of land registry and land transaction data, albeit obtained through Freedom of Information requests, taking advantage of journalistic privilege to draw on some copyright-protected information. The magazine published an interactive map showing GBP 170 billion of UK property acquired by companies registered offshore over a ten-year period, highlighting how these structures were used for large-scale tax avoidance or provided secrecy vehicles that could facilitate money laundering.42
The investigation helped spark plans to require foreign companies buying UK property to declare their beneficial owners43 and the open release of the UK’s Overseas Company land ownership dataset.
Figure 1: Map of offshore property ownership.
Source: Private Eye. http://www.private-eye.co.uk/registry
From a sustainable development perspective, it is not so much land ownership that matters per se, but rather the use to which land is put (albeit noting that ownership has a big impact on the equitable or distorted distribution of benefits from that use). In recent years, there has been a step-change in the global availability of remote sensing data on land quality and its use. This has been accompanied by a number of local projects making use of geospatial tools to layer together land rights and land use information, guiding policy design and supporting community action. We also note promising examples that show how open data can be used to support citizens in accessing and enjoying the use of public lands.
Two sources have been instrumental in making it possible to zoom to any square mile on earth and access visualisations and open data on estimated soil quality, land cover, and land use. Openly licensed satellite data is the driver for platforms like soilgrids.org,44 which provides downloads under the Open Data Commons Open Database License (ODbL). However, recent experiments have also turned to crowdsourced OpenStreetMap data to generate land use maps, combining this with satellite data to offer usable land-use classifications across the world.45,46,47 Although there are still some methodological challenges in reconciling figures from crowdsourced and remote sensing datasets with national records, this data has the potential to be used in both planning and measuring development interventions, including by tracking the impact of development activity on soil health and land productivity.
The East West Management Institute’s (EWMI) Open Development Initiative (ODI) in the Mekong region48 also draws on geospatial tools and a number of base maps as the background for curated datasets on concessions, oil and gas blocks, and registered Indigenous lands, supporting research into the relationship between different land users. Through the ODI, EWMI acts as a paradigmatic “infomediary”49 with goals to “change public perceptions about information and build demand for more transparency, shift dynamics from debates over basic data, encourage independent analysis, and level the playing field in regard to information access”.50 The breadth of scholarly literature citing ODI sources suggests this goal is being met. Notably, however, the data available on different ODI maps across the Mekong region varies, with detailed government-sourced land use data only available for Cambodia, while sites for Laos, Myanmar, Vietnam, and Thailand have to fall back on international sources. When it comes to concessions, data gaps are a global problem, with the 2017 Resource Governance Index51 finding that over 50% of the countries surveyed lacked any public cadastre of oil, gas, or mining concessions and licences.52
Along with land allocated for resource extraction, many countries have land allocated for national parks, reserves, and recreation areas. In the US, an online platform for finding campsites (hipcamp.com), a mass membership environmental charity (the Sierra Club), and Code for America have come together with over 50 other partners to advocate for US National and State parks to adopt an open data approach within their park reservation system.53 Active since 2014, the group has proposed model language for Parks Services to include in contracts with third-party vendors and has offered to broker introductions between national park staff and open data experts.54 The AccessLand.org project hopes to encourage all parks to create open APIs that will allow a variety of civic and entrepreneurial platforms to hook into their data to discover available facilities and facilitate the booking of park spaces.55
This last case draws attention once again to the interactive opportunities of open data about land by creating systems that not only present information but also support two-way engagement through data.
As the introductory section of this chapter describes, land governance debates often play out in very local contexts, leading to the creation of many grassroots communities, activist networks, and stakeholder groups. However, the land governance sector has a track record of organising internationally with multi-stakeholder networks such as the ILC56 and the Global Land Tool Network (GLTN),57 which emerged in 1995 and 2006, respectively.
In 2009, ILC and the consortium behind the experimental landtenure.info database58 launched plans for the Land Portal to be a clearinghouse for land governance information and data.59 The Land Portal quickly evolved to have a strong focus on open data and semantic linked open data standards, aggregating and repackaging existing indicator data and developing LandVoc as a flexible vocabulary for describing land governance documents and data.60 Active in advocacy for open data in the land governance sector,61 the Land Portal has taken a particular stance in its approach to both the sources of its data and the audience for the information that results from it.62 In its 2014 business plan, the Land Portal describes a focus on “supporting the efforts of the rural poor to gain equitable access to land by addressing a fragmentation of information resources on land, which makes it difficult and often prohibitively expensive to draw together reliable evidence in support of programs, advocacy campaigns or policy formulation, especially for grassroots organisations”.63 One of the datasets made available through the site is the Property Rights Index (Prindex), launched in 2016 and now covering 36 countries with measures to represent citizen perceptions of how secure their land rights are and to complement or challenge more formal technical measures of national tenure systems.64 Through a series of partnerships with grassroots groups in Latin America, Africa, and Asia, the Land Portal has also explored approaches to filling gaps in available information and data, seeking to redress the imbalance of an information ecosystem where the majority of data remains the product of powerful global players.65
Since the Sustainable Development Goals (SDGs) were established in 2015, the land governance community has been tracking the quality and availability of data required to measure progress against land-relevant targets and indicators. As of December 2018, of the 12 land-related indicators, only three have both an established methodology and regular data collection, with six indicators still lacking an established methodology. Of the “tier 2” indicators (methodology established, but no regular data collection), two relate to gender and one to inclusive access to public space for people of all ages, genders, and disabilities.66
Figure 2: LandPortal.org mapping of SDG indicator status and visualisation showing the current limited number of countries covered by data that can be used to report against indicator 5.a.1.
Source: https://landportal.org/book/sdgs
Most recently, funding for the work of the Land Portal (and a number of other land governance data projects) has predominantly come from the UK Department for International Development’s LEGEND (Land: Enhancing Governance for Economic Development) programme,67 from Omidyar Network,68 and from partnerships with GODAN (Global Open Data for Agriculture and Nutrition: see Chapter 2: Agriculture). However, resourcing for open data in land remains limited at present when compared to the levels of support for specific open data initiatives in other sectors, such as agriculture or anti-corruption.
Overall, open data appears to still be a relatively niche issue within the land governance community. An increasing number of organisations in the sector have adopted open licences for their data and publications, and, in 2017, a number signed onto a Land Information Ecosystem Declaration,69 yet broad mainstream recognition of the role of open data still appears limited. This may be because of the particular political slant adopted by advocates of open land data, or simply because data issues still feel distant from the concerns of actors involved in fighting local land governance battles.
When it comes to land ownership data, we are confronted by a transparency gap and a messy reality of patchy and overlapping recordkeeping and data systems. However, where data is available, solid foundations have been laid for a responsible data70 approach to be taken, recognising that, where ownership records include personal data, “open by default” does not automatically apply. Ultimately, both data collection and data publication need to account for the political context and power dynamics in which they are undertaken, and recognise the way in which remote sensing and crowdsourcing can rapidly transform the overall data landscape.
Over the last decade, numerous examples have made it clear that when better land ownership and use data is made available in appropriate ways, and when it is connected with data on company ownership, agricultural practices, or Indigenous rights, it can generate substantial value realised through investigative journalism, community action, academic research, and by informing government strategies. Continued development of the critical and multi-method research skills needed to use land data effectively will be vital to unlocking further value in the future.
Looking ahead, there are three key areas for action. First, we need continued work to understand and create the conditions under which marginalised and disadvantaged groups are empowered to access and use data on land ownership to secure their property claims, to seek justice, and to address corruption. Not only is capacity building vital to make the most of land ownership data, but without capacity building to level the playing field between developers, PropTech firms, and existing land users, just outcomes from increasing openness cannot be taken for granted.
Second, donors and governments investing in the technical infrastructures for land governance should be incorporating open data terms into all their project plans, funding agreements, and contracts. This does not mean all data must be open by default, but rather that systems must be open data ready, and the proprietary control of land ownership and use data must be ruled out. Directing just a small percentage of the millions invested in land registry systems every year toward open data approaches could be transformative.
Lastly, we need to see much better baseline and monitoring data on current levels of openness around the world for cadastre, land registry, and land deal data. Current open data studies lack the depth and geographic coverage needed to allow accurate monitoring of progress. At a minimum, studies need to distinguish between data that covers all forms of tenure and data that is restricted to only corporate or government-owned land. With a better baseline, it should also be possible to foster stronger advocacy, calling for land registry and land deal open data to be published with purpose.
In closing, the key lesson to take away from looking at open data and land ownership is that political struggles over the collection, curation, and release of data are now part and parcel of political struggles related to land ownership and use. Although this is brought into sharp relief in the case of land, open data in each sector is equally likely to possess its own complex politics, and advocates taking a stand on open data should always consider the wider political context within which it is pursued.
Further reading
Cadasta Foundation. (2017). Towards a more open future: Increasing accountability and transparency through open land data. https://cadasta.org/resources/white-papers/accountability-transparency-open-land-data/
Hetherington, K. (2012). Promising information: Democracy, development, and the remapping of Latin America. Economy and Society, 41(2), 127–150. https://www.tandfonline.com/doi/abs/10.1080/03085147.2011.607365
Pierce, C.J., Tagliarino, N., MacInnes, M., Daniel, P., & Jaitner, A. (2018). Towards transparency in land ownership: A framework for research on beneficial land ownership. Berlin: Transparency International. https://www.transparency.org/whatwedo/publication/towards_transparency_in_land_ownership_a_framework_for_research
About the authors
Tim Davies has been researching the power dynamics around open data since 2009. He has worked with the Land Portal on strategy development and has an interest in UK land policy reform.
Sumandro Chattapadhyay is Research Director at the Centre for Internet and Society in Bangalore. He was a member of the founding team at the MOD Institute, Bangalore, and led a data analysis and visualisation project at the Azim Premji University, Bangalore. Sumandro was formerly a research associate at the Sarai programme, Centre for the Study of Developing Societies, Delhi, and was a member of the IDRC Open Data in Developing Countries research network. You can follow Sumandro on Twitter at https://www.twitter.com/ajantriks.
Davies, T. & Chattapadhyay, S. (2019). Open data and land ownership. In T. Davies, S. Walker, M. Rubinstein, & F. Perini (Eds.), The state of open data: Histories and horizons (pp. 181–195). Cape Town and Ottawa: African Minds and International Development Research Centre. http://stateofopendata.od4d.net
This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. It was carried out with the aid of a grant from the International Development Research Centre, Ottawa, Canada.
1Hogge, B. (2015). Open data: Six stories about impact in the UK. London: Omidyar Network. https://www.omidyar.com/sites/default/files/file_archive/insights/Open%20Data_Six%20Stories%20About%20Impact%20in%20the%20UK/OpenData_CaseStudies_Report_complete_DIGITAL_102715.pdf
2Cotula, L. & Berger, T. (2017). Trends in global land use investment: Implications for legal empowerment. Land, Investment and Rights Series. London: International Institute for Environment and Development. http://pubs.iied.org/pdfs/12606IIED.pdf
3Mavhinga, D. (2018). South Africa’s Constitutional Court protects land rights. Human Rights Watch, 6 November. https://www.hrw.org/news/2018/11/06/south-africas-constitutional-court-protects-land-rights
4Gurstein, M.B. (2011). Open data: Empowering the empowered or effective data use for everyone? First Monday, 16(2). https://doi.org/10.5210/fm.v16i2.3316
5Benjamin, S., Bhuvaneswari, R., Rajan, P., & Manjunatha, P. (2007). Bhoomi: “E-governance” or an anti-politics machine necessary to globalize Bangalore? CASUM–m Working Paper. https://casumm.files.wordpress.com/2008/09/bhoomi-e-governance.pdf
6Cadasta Foundation. (2016). An overview of property rights data. https://cadasta.org/open-data/overview-of-property-rights-data/
8https://opendatabarometer.org/
9https://web.archive.org/web/20170508074348/https://index.okfn.org/methodology/
10https://data.linz.govt.nz/layer/50804-nz-property-titles/
11https://www.linz.govt.nz/data/licensing-and-using-data/linz-licence-for-personal-data
12https://data.landregistry.gov.uk/data_pub/terms-of-use/read/ccod
13Pollock, R. (2009). The economics of public sector information. Working Paper. Cambridge: University of Cambridge. https://doi.org/10.17863/CAM.5635
14Cadasta Foundation. (2016). An overview of property rights data. https://cadasta.org/open-data/overview-of-property-rights-data/
15Zillow. (2016). Data for good: Zillow’s open data collaborations. https://www.zillow.com/research/data/
16Land Information New Zealand. (n.d.). Using LINZ data to develop wind farms. https://web.archive.org/web/20161201133406/https://www.linz.govt.nz/using-linz-data-develop-wind-farms
17https://www.nytimes.com/news-event/shell-company-towers-of-secrecy-real-estate
18https://index.okfn.org/dataset/land/
19Author’s research from International Aid Transparency Data. https://github.com/timgdavies/land-governance-open-data-research/
20Ferris, L., Pichel, F., & Sorensen, N. (2016). Report: Debate on open data and land governance. Land Portal and Cadasta Foundation. https://landportal.org/pt/library/resources/report-debate-open-data-and-land-governance
21Hetherington, K. (2012). Promising information: Democracy, development, and the remapping of Latin America. Economy and Society, 41(2), 127–150. https://doi.org/10.1080/03085147.2011.607365
22Hetherington, K. (2011). Guerrilla auditors: The politics of transparency in neoliberal Paraguay. Durham, NC: Duke University Press.
23Benjamin, S., Bhuvaneswari, R., Rajan, P., & Manjunatha, P. (2007). Bhoomi: “E-governance” or an anti-politics machine necessary to globalize Bangalore? CASUM–m Working Paper. https://casumm.files.wordpress.com/2008/09/bhoomi-e-governance.pdf
24Pichel, F. & Weber, M. (2018). Strengthening land tenure in informal settings: A fit-for-purpose approach. African Journal on Land Policy and Geospatial Sciences (Special Issue, October 2018): 16–21. https://revues.imist.ma/index.php?journal=AJLP-GS&page=article&op=view&path%5B%5D=13368
25Vos, J. (2016). Blockchain based land registry: Panacea, illusion or something in between? European Land Registry Association. https://www.elra.eu/wp-content/uploads/2017/02/10.-Jacques-Vos-Blockchain-based-Land-Registry.pdf
26Burg, J., Murphy, C., & Pétraud, J.P. (2018). Blockchain for international development: Using a learning agenda to address knowledge gaps. MERL Tech [Blog post], 29 November. http://merltech.org/blockchain-for-international-development-using-a-learning-agenda-to-address-knowledge-gaps/
27https://csg.esri-southafrica.com/
28http://csg.dla.gov.za/fees_spatial.htm
29http://inspire-geoportal.ec.europa.eu/overview.html?view=themeOverview&theme=cp
30Author’s research.
31https://www.gov.uk/government/collections/price-paid-data
32GRAIN. (2012). GRAIN releases data set with over 400 global land grabs. GRAIN [Article], 22 February. https://www.grain.org/article/entries/4479-grain-releases-data-set-with-over-400-global-land-grabs
34Oya, C. (2013). Methodological reflections on “land grab” databases and the “land grab” literature “rush”. The Journal of Peasant Studies, 40(3), 503–520. https://doi.org/10.1080/03066150.2013.799465
35Scoones, I., Hall, R., Barros Jr, S.M., White, B., & Wolford, W. (2013). The politics of evidence: Methodologies for understanding the global land rush. The Journal of Peasant Studies, 40(3), 469–483. https://doi.org/10.1080/03066150.2013.801341
36https://landmatrix.org/region/
37https://www.openlandcontracts.org/
38Szoke-Burke, S. (2016). Here’s the deal. RICS Land Journal, October/November. https://issuu.com/ricsmodus/docs/land_oct_nov_16_interactive_pdf/20
39DCLG. (2015). Local government transparency code 2015. London: Department for Communities and Local Government. https://www.gov.uk/government/publications/local-government-transparency-code-2015
40https://web.archive.org/web/20190104134543/https://digital-land.github.io/
41Christophers, B. (2018). The new enclosure: The appropriation of public land in neoliberal Britain. London: Verso.
42Private Eye. (2016). Tax havens: Selling England by the offshore Pound – The story so far. London: Pressdram Ltd. http://www.private-eye.co.uk/pictures/special_reports/tax-havens.pdf
43Private Eye. (2016). Selling England (and Wales) by the Pound. London: Pressdram Ltd. http://www.private-eye.co.uk/registry
45Schultz, M., Voss, J., Auer, M., Carter, S., & Zipf, A. (2017). Open land cover from OpenStreetMap and remote sensing. International Journal of Applied Earth Observation and Geoinformation, 63, 206–213.
47Yang, D., Fu, C.-S., Smith, A.C., & Yu, Q. (2017). Open land-use map: A regional land-use mapping strategy for incorporating OpenStreetMap with earth observations. Geo-Spatial Information Science, 20(3), 269–281. https://doi.org/10.1080/10095020.2017.1371385
48https://opendevelopmentmekong.net
49Magalhaes, G., Roseira, C., & Strover, S. (2013). Open government data intermediaries: A terminology framework. Proceedings of the 7th International Conference on Theory and Practice of Electronic Governance, Seoul, Korea, 22–25 October, pp. 330–333. New York, NY: Association for Computing Machinery. https://doi.org/10.1145/2591888.2591947
50Open Development Initiative. (n.d.). EWMI-ODI and the open movement. Open Development Mekong. https://opendevelopmentmekong.net/background/ewmi-odi-and-the-open-movement/
51NRGI. (2017). 2017 resource governance index. New York: Natural Resource Governance Institute. https://resourcegovernance.org/sites/default/files/documents/2017-resource-governance-index.pdf
52https://www.resourcegovernanceindex.org/data/both/issue?category=1&indicator=2&region=global&subcategory=1
53AccessLand Coalition. (2014). Whitepaper: Recreation.gov as a platform – How to inspire the next generation of conservationists, reach diverse audiences, and connect more Americans to their land. https://docs.google.com/document/d/1f6qNofxPHVWtywYavv-mTT4wkQaa0IZfrTemx55RkEM/edit?usp=embed_facebook
55Ravasio, A. (2014). Open data for open lands. Medium, 20 October. https://medium.com/@alyraz/open-data-for-93af9d3d30aa
56http://www.landcoalition.org/
58https://web.archive.org/web/20101014014449/http://www.landtenure.info:80/
59LandPortal.info. (2014). Land portal development 2014–2017: Business plan. https://landportal.org/sites/landportal.info/files/LandPortal_BusinessPlan_Sept2014-Web.pdf
60https://landportal.org/voc/landvoc
61Ferris, L., Pichel, F., & Sorensen, N. (2017). Towards a more open future: Increasing accountability and transparency through open land data. 2017 World Bank Conference on Land and Poverty, Washington DC, 20–24 March 2017. https://www.conftool.com/landandpoverty2017/index.php/11-09-Pichel-661_paper.pdf?page=downloadPaper&filename=11-09-Pichel-661_paper.pdf&form_id=661&form_version=final
62PDD. (2019). Overcoming land data silos: The role of data ecosystems in achieving global development goals. PDD Case Study: Land Portal. Principles for Digital Development. https://digitalprinciples.org/wp-content/uploads/PDD_CaseStudy-LandPortal_v4.pdf
63LandPortal.info. (2014). Land portal development 2014–2017: Business plan. https://landportal.org/sites/landportal.info/files/LandPortal_BusinessPlan_Sept2014-Web.pdf
65https://landportal.org/about/vision
66https://landportal.org/book/sdgs
67https://devtracker.dfid.gov.uk/projects/GB-1-204252
68https://www.omidyar.com/investees/land-portal
69Land Portal. (2017). Land information ecosystem declaration. https://landportal.org/news/2017/05/land-information-ecosystem-declaration
For national statistical offices (NSOs) and their partner agencies, open data provides a route to engage with a larger world of data-driven innovation and to demonstrate their relevance and value to the public.
Progress on making official statistics openly available has been slow, marked by missed quick wins and a lack of long-term investment.
Greater engagement between open data and NSO communities is needed to drive cultural and practical changes, recognising the strengths that each brings to the data ecosystem in support of the Sustainable Development Goals.
Data has the power to save lives, end poverty, protect the planet, and transform our world, but only if it is open and well used. This chapter is concerned with open, official statistics, which include some of the most important datasets that decision-makers need to create policies, design programmes, and monitor results. They are derived from data produced by governments as part of their official function. They provide a quantitative record of the country’s social, economic, and environmental condition.1 Collected through censuses, surveys, and administrative records, official statistics are the product of national statistical systems, which are confederations of official agencies that in most countries are coordinated by a national statistical office (NSO).
Since they are produced by public bodies using public funds, official statistics should be considered public goods, capable of being used and reused for many purposes without diminishing their value to others, and available to be copied or reproduced by anyone. In economists’ terms, they are non-rivalrous and non-excludable. Making official statistics openly available is, therefore, economically efficient. Beyond satisfying economic theory, making official statistics openly available can stimulate innovative applications, encourage citizen engagement, and increase confidence in the statistical system as a whole.
Although their responsibilities differ from country to country, NSOs generally have the authority to set statistical standards, to design and implement large-scale data collection programmes, and to ensure the quality, reliability, and availability of official statistics. Through their links to other NSOs and to international statistical agencies, they contribute to, and benefit from, new techniques and common standards. Because of their centrality and the importance of statistics for setting policies and measuring outcomes, NSOs and national statistical systems should be at the forefront of the data revolution and the open data agenda. Where they lack explicit authority, they can, and should, lead by example. For NSOs and their partner agencies, open data is more than a dissemination strategy; embracing the principles of open data is an opportunity to engage with the larger world of data-driven innovation and to demonstrate their relevance to their own governments, the private sector, and the public at large.
There is an emerging, international consensus on the principles of open data, and much advice is available on how to make data open, but implementation of these principles has been difficult. Measurement of the availability of open data from official sources reveals slow progress at best. There are relatively low-cost actions that could make official statistics more open: providing data in machine-readable formats, making metadata available, and publishing open terms of use. However, producing larger and more complex datasets in response to the demands of the Sustainable Development Goals (SDGs) will require increasing the capacity of national statistical systems and securing additional resource commitments from governments to support robust, effective, independent, and open statistical systems.
As the coordinating bodies for their countries’ national statistical systems, NSOs are charged with identifying, collecting, processing, analysing, and disseminating official statistics on behalf of the government. NSOs are a part of government, but should be independent of partisan activities. Their independence is critical to their position as information brokers that need to build trust and remain free from influences that might bias their data or analyses. NSOs and the larger statistical system should, however, be responsive to the demands of policy-makers, who finance their budgets to meet their own, and the public’s, need for reliable information. These demands are not fixed. They grow and change as new challenges and opportunities present themselves.
National statistical systems are the repositories of two kinds of data: microdata, which is the unit records of censuses, surveys, and administrative datasets, and aggregate data or indicators. Microdata contains identifiable information about people, businesses, or other entities. Before this data can be made openly available, it must be anonymised or aggregated into public-use data and indicators. Access to the underlying microdata must be strictly controlled.
Guidance for NSOs is provided by the United Nations Fundamental principles of official statistics, a set of ten principles that set out the professional and scientific standards for NSOs.2
The first principle, which arguably incorporates the remaining nine and embraces the core principle of open data, says that “official statistics that meet the test of practical utility are to be compiled and made available on an impartial basis by official statistical agencies to honour citizens’ entitlement to public information”. The sixth principle states that data on individuals “is to be strictly confidential and used exclusively for statistical purposes”. Weighing the public’s right to information against the possible privacy risks for certain microdata sets is a balancing act that all NSOs work to maintain.
As the data ecosystem expands, NSOs are expected to take a stronger coordinating role, encompassing new data sources, producers, and users, including both public and private actors. NSOs must also engage with a diverse set of stakeholders, including academic institutions, nongovernmental organisations (NGOs), and bilateral and multilateral agencies in support of their research, development projects, and applications of open data. But many NSOs still lack the human, physical, and financial resources needed to perform even their traditional role. A report on the World Bank’s Statistical Capacity Indicators Database found that 39% of the 131 countries studied had a low statistical capacity. They lack a recent census, survey, complete civil registration and vital statistics system, or general statistical capacity.3 The global community needs to be conscious of the varying capacities of NSOs, and create space for a variety of approaches based on technical capacity and country-level compatibility. There is no one-size-fits-all approach to building open data practices in NSOs around the world.
By lowering the transaction costs for disseminating data, open data can reduce the operational costs for NSOs, which will have an increasing role in coordinating and managing the data ecosystem. There are also wider economic benefits to governments through the more efficient management of programmes, and to individuals and businesses through the use of data to create new products and services. In one of the earliest studies of the benefits of open data, Rufus Pollock estimated the welfare gains from opening data that were previously sold by the British government at between GBP 1.6 billion and GBP 6 billion.4 A study of the European Union’s open data portal predicted that a total of EUR 1.7 billion would be saved in efficiency gains from open data for the public sector in the year 2020 alone.5 Research on the opening of Landsat satellite data in the United States (US) points to similar financial benefits. Savings from open Landsat data for NGOs, the Federal Government, and the private sector are estimated at between USD 350 million and USD 436 million per year.6
The degree of engagement with open data among NSOs varies widely. Some are leading, such as Mexico, Jamaica, and the Philippines. They are embracing open data by establishing open data portals, reviewing access to information laws and policies, and including open data in national budgeting and planning processes. Others have been slower to implement even the simplest open data policies.
At the international level, important steps have been taken toward open data. New standards, principles, and operating guidelines have been created; Open Knowledge International7 and the Open Data Charter8 have established a working definition of open data. The Cape Town global action plan for sustainable development data,9 adopted at the first United Nations World Data Forum in 2017, includes open data among its key actions for innovation and the modernisation of national statistical systems. Open data was subsequently addressed at the 48th and 49th annual meetings of the United Nations Statistical Commission (UNSC), a meeting of chief statisticians from UN member states and the highest decision-making body on statistical activities. The UNSC discussions on open data at the 49th meeting, held in March 2018, showed that countries are starting to treat open data as a priority and trying to integrate it into their national strategies and budgeting processes, as well as seeking international technical and financial assistance. Further, discussions at the 49th UNSC resulted in the designation of a subgroup to recommend changes to incorporate open data concepts in the Fundamental Principles of Official Statistics.
Beyond international advocacy for open data, practical steps to implement open data have been taken. A network of regional open data hubs has been developed by Open Data for Development (OD4D).10 PARIS21 now includes open data in its recommendations on National Strategies for the Development of Statistics (NSDS)11 and in its training programmes. The World Bank’s Open Data Readiness Assessment (ODRA)12 helps countries identify gaps and opportunities for implementing open data. And NSOs are increasingly involved in international open data events, such as the International Open Data Conference (IODC). These are important advances that empower local actors to choose their own paths towards statistical development and learn from a growing network of open data actors.
The national and international policy developments are encouraging, but results must be measured by their impact on the availability and openness of official statistics. There is a consensus among projects measuring open data implementation that many countries have not fully adopted open data policies and practices and that implementation has been slow.13 To accelerate progress, additional financial resources are needed to build capacity and modernise national statistical systems in low- and middle-income countries. Further, the value of data needs to be demonstrated to strengthen popular and political support for open data.
There are several quantitative indexes that measure the openness of government data. Among these are the Open Data Inventory (ODIN), the Open Data Barometer (ODB), and the Global Open Data Index (GODI). ODIN is designed to measure the openness of official statistics produced by national statistical systems and is the most appropriate index for this chapter. The ODB and GODI both include “national statistics” among the types of public information they evaluate, but they are more concerned with non-statistical datasets, such as government budgets, voting records, transportation timetables, weather information, and maps.14 Despite the differences in the data incorporated in their assessments, all these indexes employ a similar definition of open data, based on the principles of the Open Data Charter15 and the Open Definition.16 The indexes also point to similar conclusions: there is a large gap between the success of some countries regarding open data and the failure of others. Many of the datasets that users seek are unavailable or not provided on open terms, and there has been little improvement in open data scores over the last four years.
The ODIN scores highlight the large differences in open access to official statistics between countries. The highest scoring country in the ODIN 2017 report, Denmark, scored 80 (out of 100), while the lowest scoring country, Chad, scored 3. The median score was 37. Similar disparities between high and low-to-middle income countries’ open data scores were found in the ODB.17 Scores are typically correlated with a country’s GDP, but there are examples of relatively poor countries that provide open data on a large set of official statistics. In ODIN 2017, Rwanda, for example, had a higher score for data openness than one-third of the OECD countries. A few countries have made significant improvements. In 2017, Bulgaria’s ODIN score increased by 14 points, placing it in the top ten globally, because the NSO made more data available in machine-readable and non-proprietary formats, and revised its terms of use to make them more open.
Despite widespread support for open data, the open data indexes have not, on average, registered a significant improvement in the last few years. Figure 1 shows the average open data scores from the ODB, ODIN, and GODI indexes. To make these indexes more comparable, only countries that had a score in every year of the index’s study period were used. Small changes in methodology limit comparability over time,18 but the general pattern is evident: there is no clear upward trend in average scores; if anything, progress toward open data appears to be levelling off.
Figure 1: Measuring open data index scores over time
Source: Data taken from the ODB, ODIN, and GODI indexes
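The comparison described above, averaging each index only over countries that have a score in every year of its study period, can be sketched in a few lines of Python. This is a minimal illustration rather than the method actually used to produce Figure 1: the country labels, years, and scores below are invented placeholders, and the function name `balanced_panel_averages` is hypothetical.

```python
from statistics import mean

# Illustrative placeholder scores, NOT real ODIN, ODB, or GODI data:
# {country: {year: score}}
scores = {
    "A": {2015: 40, 2016: 42, 2017: 41},
    "B": {2015: 60, 2016: 58, 2017: 62},
    "C": {2016: 20, 2017: 25},  # no 2015 score, so excluded below
}


def balanced_panel_averages(scores):
    """Return {year: average score}, using only countries scored in every year."""
    # Every year that appears anywhere in the data defines the study period.
    years = sorted({year for by_year in scores.values() for year in by_year})
    # Keep only countries with a score in each of those years.
    complete = [
        country
        for country, by_year in scores.items()
        if all(year in by_year for year in years)
    ]
    return {year: mean(scores[c][year] for c in complete) for year in years}


averages = balanced_panel_averages(scores)
for year, value in averages.items():
    print(year, value)
```

Restricting the average to a fixed set of countries keeps a change in the yearly mean from reflecting nothing more than countries entering or leaving an index.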
To have open data, you first need data. Without open data, it is difficult to demonstrate the value of data to policy-makers, and, without recognition of the value of data, progress toward complete and open data will remain slow. For many countries, this defines a nexus of problems: lack of focus on the demand side, lack of commitment, and lack of resources. There continues to be a mismatch in countries between data demand and supply. Like all service providers, NSOs must understand their clients. If members of government, businesses, and citizens cannot access the data they need, then they will go elsewhere or do without.19 Beyond simply publishing data on their website or through a dedicated data portal, NSOs must engage with their clients, demonstrate the relevance and value of data, and provide tools and information that make the data more accessible. User surveys, feedback options, and monitoring web traffic are some of the methods that can be used to understand client needs.
The SDGs have increased the demands on NSOs as they require a comprehensive set of data from social, economic, and environmental sectors to measure progress toward the 2030 targets. This presents an opportunity for closing the gap between supply and demand since much of the data required for monitoring the SDGs depends upon the work of the national statistical system. But the 2017 ODIN report finds that critical datasets on the environment and gender are absent from some national data portals.20 The lack of gender data is a particular obstacle to the SDG commitment to “Leave no one behind”,21 which focuses on making disaggregated data available on gender, age, income, disability, and other important factors to make sure that the SDG targets are met for all segments of society. NSOs have an important role to play in closing these data gaps and meeting the demands of the SDGs.
Many national statistical systems are underfunded and lack the modern data infrastructure and statistical capacity necessary to meet the demands of the 2030 SDG Agenda. The Development co-operation report 201722 and The state of development data funding23 report find that funding levels for statistics are insufficient. Both recommend that the donor community (including multilateral, bilateral, and philanthropic organisations) adopt new financing strategies to provide more resources for data production and statistical capacity building. It is not just a matter of how much financing is given, but how it is given. As PARIS21’s project on Capacity Development 4.0 makes clear, better allocation of resources and coordination of donors’ programmes can increase the effectiveness of capacity-building programmes. The amounts needed are not large. Provided it is properly allocated and well used, an increase in support for statistics from 0.30% to 0.45% of official development assistance is what is needed to build the statistical capacity to support the SDGs. National statistical systems with strong open data practices will have a positive effect on capacity-building efforts.
The countries that outperformed expectations in the open data indexes can provide important lessons on best practices. Countries like Rwanda, which has the highest ODIN score of any low-income country, or Mexico, which has developed a strong culture of support for open data and is consistently ranked highly in measures of open data, are good examples. Because many of the actions needed to make data open (e.g. open licensing and providing machine-readable formats) do not require large investments and are achievable with simple policy changes, it is often leadership and politics that keep data from being open.
NSOs are, by design, supposed to be apolitical government organisations. Politics, however, often becomes entangled in NSO activities because official statistics can be used to justify funding from donors24 or defend a politician’s governing record,25 or because census statistics can be used for taxation and other functions of state power.26 Because of NSOs’ apolitical nature, their leadership often lacks, or is unwilling to use, the political capacity to push for an open data agenda.27 Successful national movements for open data require a high-level commitment on the part of the government (often at the head of state level), long-term planning to sustain political support through transitions, and guiding political frameworks. With this political support, minor changes in policy and better dissemination tools could open data in many countries.
A rising open data star: Mexico
The Instituto Nacional de Estadística y Geografía (INEGI) in Mexico is opening data and leading the way in its region with high-level support from the Office of the President.28 INEGI’s hard work prompted the country to move into the top-ten most open countries in ODIN 2017, passing the United States (US). Mexico also consistently outperforms other countries in its region and other middle-income countries as measured by the ODB and GODI indexes. As a result, impactful open data programmes can be seen across the country, like Mejora Tu Escuela, a programme that displays school data and rankings to spur educational improvements by holding schools accountable.29
The impact of open data on the economy, good governance, and democracy needs to be measured and communicated to the public, decision-makers, and politicians. If the value can be demonstrated, a virtuous cycle of data use can begin. People who use data will make better decisions. Data-based decisions will have more positive outcomes, and this will lead to greater data use and encourage additional funding for data and statistics. Broader use of data can also help NSOs improve the quality of their data. The more statistics are compared, contrasted, and combined with other data and information, the more light is shed on quality issues that may not have been identified previously.
The results of research studies on the use of open data for development are mixed: they show that data has the capacity to generate economic impacts, but that decision-makers often have difficulty incorporating data into their decision-making processes. The Results Data Initiative30 and the Avoiding data graveyards report31 point to low use of data and open data platforms by decision-makers. Conversely, a survey from the United Nations Economic Commission for Europe finds there is a rising perception of the importance of data use and an increase in the citations of data in the countries surveyed.32 More research is needed to understand the obstacles to, and incentives for, making better use of development data for public decision-making.
Leading the pack on open data in Africa: Rwanda
Rwanda has proven that, with a commitment to open data and some practical steps, low-income countries can open data. Rwanda has strategically invested in funding for statistics and open data.33 As a result, the country earned the highest ODIN 2017 ranking for a low-income country. As a champion of open data, the country has also seen societal benefits like the open data land use portal that promotes land rights in the country. It especially benefits women, who are often cheated in land deals due to lack of access to land documentation.34
Taking stock of the state of open data for official statistics, we see that much progress has been made but that more is needed. International financial support for NSOs and a global push to demonstrate the value of open data for development could have dramatic effects on changing popular and political support for open data. However, there are also actions that NSOs can take to support open data in their own countries.
An important first step is to secure political and institutional support for open data within the government and to obtain the support of other stakeholders. This effort should be coordinated with a government-wide open data initiative, if possible. Legal frameworks and access to information policies should be reviewed and revised as necessary to support open data policies. Open data should be incorporated in countries’ NSDS, as well as in the planning and implementation of SDG national reporting platforms. For countries that have not already done so, an ODRA can be used to identify a roadmap for implementing open data. NSOs should champion open data in their own countries. Their perspectives and voices are needed at international discussions around open data, such as the IODC and United Nations World Data Forum.
Implementing open data programmes for existing datasets need not be expensive, and countries do not need to wait for additional funding to make progress. Data in PDF or image files can be converted to non-proprietary and machine-readable formats at little or no cost. Current production processes should be updated to go directly to machine-readable files, which will reduce costs over the long run. Metadata should be assembled and made available. And all data should be published under an open licence, such as a Creative Commons Public Domain (CC0) or Attribution Only (CC-BY) licence. These steps only require the political will to open data and few additional resources.
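As a minimal sketch of these steps, assuming a small table has already been extracted from a PDF, the following Python writes the figures to CSV (a non-proprietary, machine-readable format) and assembles a metadata record that names the licence. The function name, field names, and values are illustrative placeholders, not drawn from any NSO's actual data or tooling.

```python
import csv
import io
import json


def publish_table(rows, fieldnames, title, licence="CC-BY-4.0"):
    """Return (csv_text, metadata_json) for a small statistical table."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)

    # Basic metadata published alongside the data, including the open licence.
    metadata = {
        "title": title,
        "licence": licence,
        "format": "text/csv",
        "fields": list(fieldnames),
        "row_count": len(rows),
    }
    return buffer.getvalue(), json.dumps(metadata, indent=2, sort_keys=True)


# Placeholder values for illustration only.
rows = [
    {"region": "North", "year": 2017, "enrolment_rate": 91.2},
    {"region": "South", "year": 2017, "enrolment_rate": 87.5},
]
csv_text, metadata_json = publish_table(
    rows, ["region", "year", "enrolment_rate"], "Example enrolment table"
)
```

In practice the extraction from PDF is the laborious part; once production processes emit files like this directly, the conversion step disappears.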
Just as it is important to make the case for the value of data at the international level, it is also important at the country level. Open data expands the reach and influence of the national statistical system, increasing the value of official statistics to the government and to the public. Data that is open can be used and reused without diminishing its value, for mobile phone applications, analyses, and other applications. By following the “Leave no one behind” movement, NSOs can also build a broad coalition of all segments of society to make sure all people are included and can benefit from this data. Most NSOs are more focused on the technical aspects of running their organisations, but effort should also be put into spreading data success stories to the public to increase support for open data. Overall, open data can raise the profile of data and the profile of NSOs as trusted organisations that are responsive to national and international demands.
When these steps at the international and national levels are taken, the open data index scores will begin to improve, and, more importantly, citizens will start to see the promised benefits of open data and much needed movement toward the 2030 SDGs.
Further reading
Badiee, S., Jütting, J., Appel, D., Klein, T., & Swanson, E. (2017). The role of national statistical systems in the data revolution. In Development co-operation report 2017. Paris: Organisation for Economic Co-operation and Development Publishing. https://doi.org/10.1787/dcr-2017-en
High-level Group for Partnership, Coordination and Capacity-Building. (2017). Cape Town global action plan for sustainable development data. Cape Town: United Nations Statistical Division. https://unstats.un.org/sdgs/hlg/Cape-Town-Global-Action-Plan/
Open Data Watch. (2019). Open data inventory 2018/19 annual report. Open Data Watch. http://odin.opendatawatch.com/report/pressReport
UN Statistical Commission. (2018). Open data: Report of the Secretary-General. New York, NY: United Nations Economic and Social Council. https://doi.org/10.1787/dcr-2018-en
About the authors
Shaida Badiee is a co-founder of Open Data Watch, where she directs the strategic planning, partnership, and fund-raising work. She is a Senior Advisor on gender data to Data2X, a co-chair of the SDSN TReNDS group, part of the Technical Advisory Group for the Global Partnership on Sustainable Development Data, a member of the PARIS21 board, and serves on a number of other boards. To follow Shaida, go to https://twitter.com/ShaidaBadiee.
Caleb Rudow is a Research and Data Analyst at Open Data Watch and conducts research on open data funding, patterns of data use, and technical issues around open data policy.
Eric Swanson is a co-founder of Open Data Watch, where he is the Director of Research. He is a globally recognised economist with a passion for analysing the most effective ways to use data for development. More information on Eric can be found at https://twitter.com/EricVSwanson and details on Open Data Watch are available at https://opendatawatch.com/about/.
Badiee, S., Rudow, C., & Swanson, E. (2019). Open data and national statistics. In T. Davies, S. Walker, M. Rubinstein, & F. Perini (Eds.), The state of open data: Histories and horizons (pp. 196–206). Cape Town and Ottawa: African Minds and International Development Research Centre. http://stateofopendata.od4d.net
This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. It was carried out with the aid of a grant from the International Development Research Centre, Ottawa, Canada.
1 OECD. (2017). Development co-operation report 2017: Data for development. Paris: Organisation for Economic Co-operation and Development Publishing. https://doi.org/10.1787/dcr-2017-en
2 UNSC (United Nations Statistical Commission). (2018). Fundamental principles of official statistics. New York, NY: United Nations. https://unstats.un.org/unsd/dnss/gp/FP-Rev2013-E.pdf
3 World Bank. (2017). Statistical capacity indicator dashboard. http://datatopics.worldbank.org/statisticalcapacity/SCIdashboard.aspx
4 Pollock, R. (2010). Welfare gains from opening up public sector information in the UK. https://rufuspollock.org/papers/psi_openness_gains.pdf
5 Capgemini Consulting. (2015). Creating value through open data study on the impact of re-use of public data resources. Luxembourg: European Data Portal. https://www.europeandataportal.eu/sites/default/files/edp_creating_value_through_open_data_0.pdf
6 Landsat Advisory Group. (2018). The value proposition for Landsat applications. NGAC Landsat Economic Value Paper – 2014 update. Reston, VA: National Geospatial Advisory Committee. https://www.fgdc.gov/ngac/meetings/december-2014/ngac-landsat-economic-value-paper-2014-update.pdf
7 Open Knowledge International. (n.d.). Open definition 2.1. http://opendefinition.org/od/2.1/en/
8 Open Data Charter. (2015). Principles: International Open Data Charter. https://opendatacharter.net/principles/
9 High-level Group for Partnership, Coordination and Capacity-Building for Statistics for the 2030 Agenda for Sustainable Development. (2017). Cape Town global action plan for sustainable development data. New York, NY: United Nations Statistical Commission. https://unstats.un.org/sdgs/hlg/Cape-Town-Global-Action-Plan/
11 PARIS21. (n.d.). NSDS (National Strategies for the Development of Statistics) guidelines: Open data. http://nsdsguidelines.paris21.org/node/530
12 World Bank. (2015). Readiness Assessment Tool. http://opendatatoolkit.worldbank.org/en/odra.html
13 Web Foundation. (2017). Open Data Barometer – Global report. 4th edition. Washington, DC: World Wide Web Foundation. https://opendatabarometer.org/4thedition/report/
14 Open Knowledge International. (n.d.). Global Open Data Index. https://index.okfn.org.
15 Open Data Charter. (2015). Principles: International Open Data Charter. https://opendatacharter.net/principles/
16 Open Knowledge International. (n.d.). Open definition 2.1. http://opendefinition.org/od/2.1/en/
17 Web Foundation. (2017). Open Data Barometer – Global report. 4th edition. Washington, DC: World Wide Web Foundation. https://opendatabarometer.org/4thedition/report/
18 For a discussion of the ODIN methodology, see Open Data Watch. (2017). Open data inventory: 2017: Methodology report. Washington, DC: Open Data Watch. https://opendatawatch.com/reference/open-data-inventory-2017-methodolgy-report/
19 Young, A. & Verhulst, S. (2016). When demand and supply meet: Key findings of the open data impact case studies. Brooklyn, NY: GovLab. http://odimpact.org/files/open-data-impact-key-findings.pdf
20 Open Data Watch. (2017). Open data inventory 2017: Annual report. Washington, DC: Open Data Watch. http://odin.opendatawatch.com/Downloads/otherFiles/ODIN-2017-Annual-Report.pdf
21 United Nations Statistics Division. (2016). Leaving no one behind. https://unstats.un.org/sdgs/report/2016/leaving-no-one-behind
22 OECD. (2017). Development co-operation report 2017. Paris: Organisation for Economic Co-operation and Development Publishing. https://doi.org/10.1787/dcr-2017-en
23 Global Partnership for Sustainable Development Data. (2016). The state of development data funding. Washington, DC: Open Data Watch. https://opendatawatch.com/wp-content/uploads/2016/09/development-data-funding-2016.pdf
24 Sandefur, J. & Glassman, A. (2014). The political economy of bad data: Evidence from African survey & administrative statistics. CGD Working Paper 373. Washington, DC: Center for Global Development. https://www.cgdev.org/publication/political-economy-bad-data-evidence-african-survey-administrative-statistics-working
25 Custer, S. & Sethi, T. (Eds.). (2017). Avoiding data graveyards: Insights from data producers and consumers. Washington, DC: Aid Data at the College of William and Mary. http://docs.aiddata.org/reports/avoiding-data-graveyards-report.html
26 Krätke, F. & Byiers, B. (2014). The political economy of official statistics: Implications for the data revolution in Sub-Saharan Africa. ECDPM Discussion Paper 170. Maastricht, Netherlands and Brussels, Belgium: European Centre for Development Policy Management, in partnership with PARIS21. http://www.ecdpm.org/dp170
27 World Bank. (2018). World Bank support for open data 2012–2017. Washington, DC: World Bank. http://opendatatoolkit.worldbank.org/docs/world-bank-open-data-support.pdf
28 Web Foundation. (2017). Open Data Barometer: Latin America regional report. (3rd edition). Washington, DC: World Wide Web Foundation. https://opendatabarometer.org/3rdedition/regional-report/latin-america/
29 Young, A. & Verhulst, S. (2016). Mexico’s Mejora Tu Escuela: Empowering citizens to make data-driven decisions about education. Brooklyn, NY: GovLab. http://odimpact.org/case-mexicos-mejora-tu-escuela.html
30 Development Gateway. (2017). Increasing the impact of results data. Policy Brief. Washington, DC: Development Gateway. https://www.developmentgateway.org/sites/default/files/2017-02/RDI-PolicyBrief.pdf
31 Custer, S. & Sethi, T. (Eds.). (2017). Avoiding data graveyards: Insights from data producers and consumers. Washington, DC: Aid Data at the College of William and Mary. http://docs.aiddata.org/reports/avoiding-data-graveyards-report.html
32 https://www.unece.org/statistics/networks-of-experts/task-force-on-the-value-of-official-statistics.html
33 UNECA et al. (2016). The Africa data revolution report 2016. Addis Ababa: United Nations Economic Commission for Africa. https://www.uneca.org/sites/default/files/uploaded-documents/ACS/africa-data-revolution-report-2016.pdf
34 Crompton, S. (2016). Success stories: Issue 1. Wallingford, UK: Global Open Data for Agriculture and Nutrition. https://www.godan.info/sites/default/files/documents/GODAN_Success_Stories_Brochure_Issue_1.pdf
Although open data relies upon connectivity, the telecoms sector has been overlooked as an area of focus for open data initiatives.
Good practices exist for open data in telecommunications, from providing details of cell towers and spectrum allocation to publishing pricing data; however, these good practices are not yet widely adopted.
Transparency enabled by open data on telecommunications network infrastructure and pricing could spur innovation, improve accountability, and help track the social impact of investments in connectivity.
The value of being connected to a communication network is steadily rising. More than a decade ago, researchers established that simple proximity to a communication network was directly correlated1 with a reduction in the probability of dying from malaria. Today, with smartphones delivering powerful generic services like group and personal messaging and more specific apps aimed at critical sectors like education, agriculture, and health, communication networks are approaching the status of essential infrastructure for a modern economy.
And yet, mobile subscriber growth is slowing2 as current mobile network operators struggle to find viability in markets with subsistence-level incomes and/or sparsely populated regions. Attempts to address this problem through universal service strategies/funds have met with limited success.
This presents a conundrum for policy-makers and regulators where value continues to accrue to those with affordable access to communication infrastructure, while the unconnected fall further and further behind by simply staying in the same place. Those who most desperately need support are cut off from access to opportunity, to social and health safety nets, to education, to information that can improve lives, and to platforms to demand change. It is ironic, or perhaps tragic, that the voices of the unconnected are not heard on this issue for the very reason that they are unconnected.
In order to address this issue, fresh thinking is required. Previously, connectivity challenges could only be tackled by entire governments investing vast resources in state-owned networks. The mobile phone revolution opened the door to private sector investment in telecommunications, and new business models like pay-as-you-go services have extended sustainable communication services further than anyone could have imagined. However, becoming a mobile network operator still requires an investment of millions of dollars, creating a high barrier to market entry.
There are a number of factors that suggest that the telecommunications landscape is shifting once again.
The value chain of telecommunications networks is becoming disaggregated. Previously, in order to enter a market, an operator needed to invest in international, national, middle mile, and last mile infrastructure. Now, we are beginning to see competition in each of those segments.
The spread of fibre optic infrastructure, both undersea and terrestrial, is changing the access market. While there is no question that fibre optic networks are increasing the ability of existing operators to deliver broadband, those same networks are opening up possibilities for new players who now can deliver more targeted, localised, and affordable solutions to unserved populations.
Changes in last mile technology are opening up new possibilities. The spread of WiFi as an access technology is empowering commercial, government, and community access initiatives to offer local services. Dynamic spectrum technology also shows promise as an alternative access technology.
Finally, the meteoric growth of access and mass manufacturing has brought down the cost of access technologies to the point where they are within the reach of small-scale operators. Low-cost, solar-powered, open source GSM (Global System for Mobile Communications) base stations can be deployed for a fraction of the cost implied by the cost models of existing mobile network operators.
All of these changes represent genuine cause for optimism that it is possible to sustainably connect everyone on the planet. However, in order for that to happen, changes in access policy and regulation are required. And those changes need to be informed by accurate data on existing telecommunication infrastructure and its use. This includes data on the extent and uptake of fibre optic networks, on the towers used by mobile operators, broadcasters, and ISPs, and on the wireless spectrum assigned to operators. The pricing of wholesale networks is also an important data point, especially from the point of view of regional benchmarking.
To date, public access to any of the above information has been through communication regulators who collect some or all of this information from licensed operators. Some of this information may be passed on to the public through the regulator’s website. In some cases, the operators themselves may release portions of this information. What is evident from an examination of the websites of communication regulators is that there is no consistency as to what information is made publicly available and how detailed that information is.
In the early days of mobile networks (and fibre networks), there was not that much emphasis on accurate mapping of network infrastructure, partially because operators were expanding so rapidly at the time. Now, as subscriber growth is slowing and the challenge of providing affordable access in more difficult regions becomes more evident, it is essential to have more accurate information on the state of network growth and the resources in use.
It is also essential that this data be made available to the public as open data. There are several reasons for this.
Since a wider range of actors from community networks to wireless ISPs to municipalities have the potential to address access gaps in a sustainable manner, we need public access to telecom infrastructure data in order to open the doors to collective, community, and entrepreneurial approaches to infrastructure deployment. Open data on telecommunications infrastructure would enable the identification of infrastructure gaps and opportunities.
Transparency is essential in any industry where hundreds of millions of dollars are invested by both private and public sector organisations. Public data will provide an important reality check.
There is an ongoing need for comparative analysis. Telecommunications infrastructure varies dramatically from country to country and within countries, yet there is very little comparison of physical infrastructure, spectrum assignments, and backhaul costs. Having common public standards for telecommunications data will enable these comparisons and help to identify outliers both good and bad.
Telecommunications infrastructure is enabling the profound social and economic impact that we see as a result of the spread of voice and data networks. The opportunity to compare telecommunications infrastructure development with other social and economic indicators represents a significant opportunity to understand more about their impact.
The open data movement in government has been growing for over a decade. It contributes to more accountable and democratic institutions, and is one way that governments can meet their obligation to provide access to information. The open data principle of providing timely, accessible, complete, affordable, and non-discriminatory access to data is ideal for the telecommunications sector, which stands to benefit with respect to both transparency and innovation. While blanket approaches to open data in government have not always been successful, there is substantial evidence to suggest that more targeted, bottom-up approaches can have very positive outcomes. Some examples include OpenSpending’s work on government finance, Publish What You Pay’s work on extractive industries, and work on transport data by organisations such as Transport for London.
The rest of this chapter looks at specific aspects of the telecommunications sector and attempts to show why transparency is essential for it. Further, it demonstrates that good practices do exist for open data in telecommunications but they are not widespread. Promoting open telecommunications data is not about doing something new, but rather about normalising the examples of good behaviour that already exist and aligning with the principles of the open data movement.
The spread of undersea fibre optic cables around Africa since 2009, followed closely by the rapid spread of terrestrial fibre optic infrastructure, is nothing short of a revolution. It has spread far faster than anyone would have imagined possible. There are perhaps only two or three countries in Africa that do not have a national fibre optic backbone currently. Many countries have several. Fibre optic networks are the deep water ports of the internet; they enable orders of magnitude greater broadband capacity than any other kind of access technology and at very low latency. For terrestrial networks in particular, the capacity of this infrastructure is so great that it is effectively a non-rival resource: access for one service provider does not diminish opportunity for other providers.
However, operators are often reluctant to share information about their fibre networks. This reluctance betrays an apprehension that it may somehow compromise their competitive edge, but, in many cases, operators have simply not considered the issue from a strategic perspective. While the majority of operators decline to publish detailed information about their fibre networks, their response stands in stark contrast to companies like Dark Fibre Africa3 in South Africa, and regional operator, Liquid Telecom,4 who readily publish maps of their fibre networks. Dark Fibre Africa stands out in the detail and ease-of-use of their maps.
Taking this information from the narrow group of stakeholders within which it resides and opening it up to public input and discussion as open data can have multiple benefits. For example, a small rural municipality might determine from a public fibre map that it is in their interest to invest in 50 kilometres of fibre network to connect to a nearby network. A province or state might determine that their region is suffering due to a lack of fibre infrastructure investment. A school or a hospital could fundraise for better access if they can show that a fibre optic cable is within a reasonable distance. From a national strategic perspective, fibre optic infrastructure is now comparable in terms of importance with other basic infrastructure like roads, railways, and bridges. The public needs to be aware of its existence in order to identify opportunities to connect to it and to identify gaps where more investment is needed. Making this data public can also be good for operators who can use the scope of their investment in fibre infrastructure to market their services.
Once the end of the fibre network is reached, it is wireless technologies that typically deliver the last mile of connectivity to citizens. Wireless technologies are dependent on national regulatory authorities that grant specific permission to use any given set of radio frequencies. To become a wireless network operator, a licence to operate radio equipment within a given set of frequencies is typically required. The exceptions to this are the industrial, scientific, and medical (ISM) bands, or licence-exempt bands (used by technologies like WiFi, Bluetooth, etc.), which do not require a specific licence. Twenty years ago, when mobile networks were just getting off the ground and most of the internet was carried over copper wires, obtaining a spectrum licence was effectively a simple administrative process. Now that demand for wireless spectrum has significantly increased, spectrum licences have become valuable assets that are often sold at auction for millions of dollars.
It is essential that the public has access to information about which organisations have been assigned a given frequency band (that is, given a licence to operate in that band) and on what terms that licence has been granted. A few national regulators publish this information on their websites, but most do not. In Africa, Nigeria stands out for its diligence in publishing spectrum assignments.5 Kenya and South Africa are also relatively good; however, not only do most national regulators not publish this information, but some will also refuse a public request for it.
Why is public access to information on spectrum assignments important? Because there are often opportunities to take better advantage of existing spectrum availability. In Mexico, a nonprofit6 is using low-cost GSM technologies to deliver affordable access7 in the state of Oaxaca. The Mexican regulator has set aside a small amount of GSM spectrum specifically to enable rural access. This inspiring model deserves to be replicated elsewhere; however, without publicly available information on spectrum assignments, it is a challenge to understand where those opportunities are available.
Public access to data on mobile tower locations is also essential. Why? In terms of understanding who has network coverage, we currently must rely on mobile network operator coverage maps. Mobile network operators do not have the best incentives to be completely rigorous in ensuring the accuracy of their network maps. As it becomes more strategically important to connect every citizen, it becomes equally essential to understand exactly who does and who does not have network coverage. The simplest way to validate network coverage claims is to know where the towers are, which operators are on them, and what technologies (e.g. 2G, 3G, LTE) each operator is using on that tower.
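A rough sketch of such a check, assuming tower records that list a location and the technologies deployed, might look like the following. The record layout, the sample values, and the fixed coverage radius are all hypothetical; real coverage depends on terrain, antenna height, and transmit power, so a simple radius can only flag claims that deserve a closer look.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_plausibly_covered(place, towers, technology, radius_km):
    """True if any tower offering `technology` lies within `radius_km` of `place`."""
    return any(
        technology in tower["technologies"]
        and haversine_km(place["lat"], place["lon"],
                         tower["lat"], tower["lon"]) <= radius_km
        for tower in towers
    )


# Hypothetical records for illustration only.
towers = [
    {"operator": "Operator A", "lat": 0.0, "lon": 0.0, "technologies": {"2G", "3G"}},
]
village = {"name": "Example village", "lat": 0.0, "lon": 0.1}

covered_3g = is_plausibly_covered(village, towers, "3G", radius_km=15.0)
covered_lte = is_plausibly_covered(village, towers, "LTE", radius_km=15.0)
```

Here the village lies roughly 11 km from the only tower, so a 3G claim is plausible under a 15 km assumption while an LTE claim is not, since no tower in the data offers LTE.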
A common pushback to this suggestion is that publishing tower information would compromise the security of the networks. In fact, tower locations are already reasonably well-known. First, they are easily visible to the naked eye and therefore not hard to locate. Second, many, if not most of them, can be identified through online services like OpenCellID8 or Mozilla’s Location Service.9 These two resources are invaluable, but a limitation of their crowd-sourced approach is that they depend on someone (who has their software installed on their phone) being near a tower in order to detect it. To date, this approach has been successful in picking up a large percentage of the towers in many countries; however, the more remote towers (where populations are sparse) tend not to get picked up by these services. It is exactly in these more remote areas (where operators have the least incentive to provide coverage) that we want to know more about access conditions. Therefore, having public open data on tower locations would be extremely valuable from the point of view of mapping the unserved, and in terms of identifying opportunities for new business models to provide services.
Like fibre maps and spectrum charts, good practices already exist with regard to tower information. The Canadian government publishes open data via a Comma Separated Value (CSV) file10 with the location of every tower in Canada together with information about the operator(s) on the tower, as well as the type of equipment, power output, antenna orientation, etc. This is all you would need to build a comprehensive map of towers across Canada, and indeed someone has done so. Steven Nikkel has imported that data into an online map that provides a detailed picture of mobile infrastructure in Canada.11 This is essential information for the average citizen trying to choose a service provider in any region outside of a major urban centre, where coverage varies significantly between operators. There is no reason not to do this sort of mapping everywhere, but it will be necessary to explode a few myths and change the norms around publishing tower data.
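To illustrate how little tooling such a file demands, here is a minimal sketch that reads a tower CSV, counts towers per operator, and filters to a region of interest. The column names and sample rows are invented for illustration and do not reflect the actual schema of the Canadian dataset, which should be checked against the regulator's own documentation.

```python
import csv
import io
from collections import Counter

# Hypothetical extract; real column names will differ by regulator.
SAMPLE_CSV = """operator,latitude,longitude,technology
Operator A,45.40,-75.70,LTE
Operator B,45.42,-75.69,3G
Operator A,49.28,-123.12,LTE
"""


def load_towers(text):
    """Parse tower rows into dictionaries with numeric coordinates."""
    towers = []
    for row in csv.DictReader(io.StringIO(text)):
        towers.append({
            "operator": row["operator"],
            "lat": float(row["latitude"]),
            "lon": float(row["longitude"]),
            "technology": row["technology"],
        })
    return towers


def towers_per_operator(towers):
    """Count how many towers each operator appears on."""
    return Counter(tower["operator"] for tower in towers)


def in_bounding_box(towers, south, west, north, east):
    """Keep towers inside a latitude/longitude rectangle."""
    return [
        tower for tower in towers
        if south <= tower["lat"] <= north and west <= tower["lon"] <= east
    ]


towers = load_towers(SAMPLE_CSV)
counts = towers_per_operator(towers)
local = in_bounding_box(towers, south=45.0, west=-76.0, north=46.0, east=-75.0)
```

The filtered list could then be passed to any mapping library to produce the kind of operator-by-operator picture described above.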
It is not just the Canadian government that has seen the value of publishing tower data. In India, veteran operator, Airtel, has published a new website, Open Network,12 where all of their towers for both 2G and 4G networks are mapped. They also identify where towers are being upgraded and where they are still needed. The website goes by the slogan “Because you have a lot to say. And we have nothing to hide”. This is strong evidence to illustrate how transparency, far from being a liability, can actually be a powerful tool for marketing. This is the first instance of a commercial operator publishing tower location data.
Demand for broadband is increasing exponentially in Africa with the result that backhaul networks are fast becoming the critical bottleneck in affordable access to broadband. As noted previously, there is a lot of fibre across Africa, but the cost of terrestrial fibre networks is often so high that it makes operator expansion impractical. This is not a problem if you happen to own the fibre (as many incumbent operators do), but it can be a significant obstacle for new operators. This is not a simple challenge to address, but a step in the right direction would be to introduce more transparency through open data on network backhaul pricing. The cost per Mbps varies dramatically across regions. Regulators may be unaware of how their country stacks up in terms of national backhaul pricing. A little transparency would go a long way. This is not to suggest that operators must reveal their business agreements, only their basic rate card. Among other things, this would have the result of establishing a ceiling for costs.
Once again, some good practices do exist. The regulator in Botswana (BOCRA) publishes a public rate card13 on access to the national fibre optic backbone. Granted this is a state-owned network, which removes the complication of negotiating with the private sector, but even if we just succeeded with state-owned networks, it would be a big leap forward. The practice of publishing backhaul and interconnection pricing is more common in West Africa thanks to a directive in 2006 from the West African Economic and Monetary Union (UEMOA),14 the West African regional economic community.
Affordable access to communication is now such a valuable social and economic enabler that it is no longer appropriate to talk about strategies that connect “most” of the population. We need strategies that can embrace all levels of society and all regions. Fortunately, market and technological trends have created new possibilities for the development of affordable access solutions; however, in order to have a meaningful conversation about those options, we need better data on current telecommunications network development. Governments across the world have seen the potential of open data to increase both transparency and innovation in specific sectors and to better meet the needs of their citizens. Open data policies contribute to more efficient and accountable governance, and facilitate the enjoyment of human rights. Telecommunications has been overlooked as a sector to which open data policies might be applied. This is not a question of massive change for either regulators or operators, but is more of a case of socialising and normalising the good practices that already exist for making telecommunications data public, whether fibre, spectrum, towers, or pricing.
To counteract the inertia of the status quo, a coalition of civil society and research organisations is needed. This group can come up with a simple, convincing campaign to get policy-makers, regulators, and operators to see the value of open telecommunications data with an initial set of data standards, descriptors, and tools that can help early adopters to start opening their data.
Further reading
Song, S. (2018). Open telecom data – moving forward. Many Possibilities, 25 May. https://manypossibilities.net/2018/05/open-telecom-data-moving-forward/. Note that this article was the basis for this chapter.
About the author
Stephen Song is a researcher, entrepreneur, and advocate for cheaper, more pervasive access to communication infrastructure in Africa. He is a 2018 Fellow at the Mozilla Foundation, a Research Associate with the Network Startup Resource Center (NSRC), and the founder of Village Telco, a social enterprise that manufactures low-cost WiFi mesh VoIP technologies to deliver affordable voice and internet service. Learn more about Steve and telecommunications in Africa through his blog at http://manypossibilities.net.
Song, S. (2019). Open data and telecommunications. In T. Davies, S. Walker, M. Rubinstein, & F. Perini (Eds.), The state of open data: Histories and horizons (pp. 207–214). Cape Town and Ottawa: African Minds and International Development Research Centre. http://stateofopendata.od4d.net
This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. It was carried out with the aid of a grant from the International Development Research Centre, Ottawa, Canada.
1Mozumder, P. & Marathe, A. (2007). Role of information and communication networks in malaria survival. Malaria Journal, 6, 136–144. https://doi.org/10.1186/1475-2875-6-136
2Rizzato, F., Giles, M., & San Martin, M.C. (2018). Unique subscribers and mobile internet users: Understanding the new growth story. GSMA Intelligence, 15 February. https://www.gsmaintelligence.com/research/2018/02/unique-subscribers-and-mobile-internet-users-understanding-the-new-growth-story/653/
3http://www.dfafrica.co.za/network/coverage/
4https://www.liquidtelecom.com/about-us/network-map.html
5See, for example, https://www.ncc.gov.ng/docman-main/spectrum-frequency-allocation-tables/756-frequency-assignments-900mhz/file
7Lakhani, N. (2016). “It feels like a gift”: Mobile phone co-op transforms rural Mexican community. The Guardian [World news], 15 August. https://www.theguardian.com/world/2016/aug/15/mexico-mobile-phone-network-indigenous-community
9https://location.services.mozilla.com/
10http://sms-sgs.ic.gc.ca/eic/site/sms-sgs-prod.nsf/eng/h_00010.html
11https://www.ertyu.org/steven_nikkel/cancellsites.html
12https://www.airtel.in/opennetwork/
13http://www.bocra.org.bw/sites/default/files/documents/Telecommunications%20and%20ICT%20Prices.pdf
Public transport has been a poster-child of the open data movement with a variety of route planning applications used by millions of people every day. Transport data can also be used to analyse policy and advocate for service improvements.
Tensions exist between centralised route planning services and distributed, open data-driven approaches to transport data. Only a fraction of the data used to drive mobility apps is truly open, and current technical architectures risk holding back a next wave of innovation.
Data-driven transport tools have been developed worldwide; however, established standards need to be more flexible in order to accommodate semi-structured and informal transport networks in the developing world.
The future success of “Mobility as a Service” will depend on a much greater range of open transport data and application programming interfaces (APIs).
How far do you live from your place of work? Was your answer a distance or was it a duration dependent upon a specific mode of transport? The question of how far you can go, and how long it takes to go from one location to another, is key to identifying the opportunities you and your family can take advantage of. The amount of data an application could use to support an answer to this question is beyond imagination. Details of road networks, live public transport timetables, and even wheelchair accessibility of public buildings, are just a few of the applicable datasets.
Urban planners, real estate developers, travel application developers, and even manufacturers of autonomous vehicles, all need this kind of information to make their services better. For some, the availability of this data is even a primary condition for operation. Take the Dutch company, GoOV,1 for example, which aids people with a mental disability to get home safely and autonomously using public transport. Without access to live transport tables, they would not be able to offer these services.
Figure 1: The shape of capitals of Europe – How far can you travel in 1 hour by car?
Source: Created by Topi Tjukanov (used with permission) https://static1.squarespace.com/static/5a25370fc027d841ff016862/5a76d9da53450ac90957f6bd/5a76d9f071c10bcbfb7264af/1517738518899/isochronesv3.png?format=1000w
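The "how far can you travel in one hour" question behind such a map can be sketched as a shortest-path search with a time budget. The example below is a minimal illustration using Dijkstra's algorithm on a toy graph; the place names and travel times are invented for the example, and a real isochrone would be computed over a full road or transit network.

```python
import heapq


def reachable_within(graph, origin, budget_minutes):
    """Return {node: travel time} for every node reachable within the budget.

    `graph` maps a node to a list of (neighbour, minutes) pairs.
    """
    best = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        elapsed, node = heapq.heappop(queue)
        if elapsed > best.get(node, float("inf")):
            continue  # a faster way to this node was already found
        for neighbour, minutes in graph.get(node, []):
            total = elapsed + minutes
            if total <= budget_minutes and total < best.get(neighbour, float("inf")):
                best[neighbour] = total
                heapq.heappush(queue, (total, neighbour))
    return best


# Invented toy network: edge weights are travel times in minutes.
TOY_GRAPH = {
    "home": [("station", 10), ("school", 25)],
    "station": [("centre", 20), ("airport", 55)],
    "school": [("centre", 15)],
    "centre": [("office", 12)],
}

within_hour = reachable_within(TOY_GRAPH, "home", 60)
```

In this toy network the airport falls outside the one-hour budget, while the office is reached in 42 minutes via the station. Swapping the edge weights for another mode of transport changes the answer, which is why the question depends on the data available for each mode.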
Transport apps have served as a poster child for the open data movement, with route planning apps, such as CityMapper, Transit App, or Google Maps often appearing in presentations on the benefits of open data. In 2014, the International Association for Public Transport (UITP) made open data the main subject of a focus paper,2 and the association featured open data talks in its IT-Trans conference. A year later, the American Public Transport Association (APTA) published a Policy Development and Research Paper on embracing open data.3 Although these developments indicate significant traction to date on open transport data, gaining the disclosure of transit data has not been straightforward. As pointed out in studies by Rojas4 and Colpaert et al.,5 many cultural, technical, and legal obstacles have had to be overcome.
While transport data is hard to define, this chapter will focus on data that can be used by route planners and on three main challenges:
1. Route planning – determining who does what and how transport data is licensed.
2. The accessibility and availability of datasets.
3. Emerging technologies such as Mobility as a Service (MaaS) and autonomous driving.
Route or trip planner apps advise consumers on how to get to a specific destination. Travel information is displayed using a plethora of interfaces from in-car navigation systems to the website of a local bus company or a third-party travel app. In-car navigation systems may weight data elements differently when providing route planning advice when compared with an application from a municipal transit agency, yet both of them need access to the same data.
The data needed to create these types of applications resides inside the organisations that manage and operate public transport networks, and, due to the high degree of heterogeneity that can be found from one organisation to another in terms of how they manage data, opening and using this data can be a big challenge. To address this issue, several standardisation efforts have arisen around the world to support public transport operators in openly sharing their data in an interoperable fashion. Standards, such as the General Transit Feed Specification (GTFS),6 the European Network Timetable Exchange (NeTEx),7 the Standard Interface for Real-time Information (SIRI),8 or the American Transit Communications Interface Profiles (TCIP),9 provide mechanisms to model and describe scheduled services and real-time updates from transport networks, including arrival predictions, vehicle positions, and service advisories in machine-readable formats.
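To give a sense of what "machine-readable" means in practice, the sketch below reads a few rows in the shape of a GTFS `stop_times.txt` file and turns them into legs between consecutive stops. The column names follow the GTFS schedule specification; the trip and stop identifiers and the times are invented for illustration. Note that GTFS allows times past 24:00:00 for trips that run beyond midnight of the service day, so times are converted to seconds rather than parsed as clock times.

```python
import csv
import io

# Invented rows in the shape of a GTFS stop_times.txt file.
STOP_TIMES_TXT = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,23:50:00,23:51:00,A,1
T1,24:10:00,24:11:00,B,2
T1,24:30:00,24:30:00,C,3
"""


def gtfs_seconds(hhmmss):
    """Convert a GTFS HH:MM:SS string (hours may exceed 23) to seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def trip_legs(stop_times_text):
    """Return (trip_id, from_stop, to_stop, departure_s, arrival_s) tuples."""
    rows = sorted(
        csv.DictReader(io.StringIO(stop_times_text)),
        key=lambda row: (row["trip_id"], int(row["stop_sequence"])),
    )
    legs = []
    for previous, current in zip(rows, rows[1:]):
        if previous["trip_id"] != current["trip_id"]:
            continue  # do not join the last stop of one trip to the next trip
        legs.append((
            current["trip_id"],
            previous["stop_id"],
            current["stop_id"],
            gtfs_seconds(previous["departure_time"]),
            gtfs_seconds(current["arrival_time"]),
        ))
    return legs


legs = trip_legs(STOP_TIMES_TXT)
```

A complete feed also needs `stops.txt`, `trips.txt`, `routes.txt`, and calendar files before the legs can be tied to real dates and places, which is part of the integration effort a data user takes on.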
Some operators also offer route planning application programming interfaces (APIs), which function as open innovation tools, encouraging the creativity of partners who need access to route planning information quickly. However, offering a public route planning API comes at a cost. API providers need to consider whether they are able to respond to all queries with the consequent server bill for each request. As a consequence, route planning APIs are often only available via registration, API keys, and rate limiting. Open data advocates would not call this truly open data as data users are not in control of the algorithms that modify, filter, and operate on the data that is finally exchanged via the API. In a truly open transport data ecosystem, everyone would be able to create their own specific route planning API based on all data being published as open data first.
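The combination of API keys and rate limiting mentioned above can be sketched with a token bucket per key. This is an illustrative design under assumed limits (two requests in a burst, refilled at one request per second), not a description of how any particular operator runs its API; the class and function names are hypothetical.

```python
class TokenBucket:
    """Allow short bursts while capping the long-run request rate."""

    def __init__(self, capacity, refill_per_second, now=0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.updated = now

    def allow(self, now):
        """Spend one token if available; `now` is a timestamp in seconds."""
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per registered API key (assumed limits, for illustration only).
buckets = {}


def handle_request(api_key, now):
    """Return True if the request may proceed, False if it is rate limited."""
    if api_key not in buckets:
        buckets[api_key] = TokenBucket(capacity=2, refill_per_second=1.0, now=now)
    return buckets[api_key].allow(now)
```

The timestamp is passed in explicitly so the behaviour is easy to reason about; a deployed service would read a monotonic clock instead. The sketch also makes the trade-off visible: every query costs the provider something, so the provider, not the data user, decides how much use is possible.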
Transport for London
Transport for London is the local government body responsible for the transport system in Greater London, England, and is commonly cited as a source of open data success stories. When Transport for London began opening up data and offering public APIs, the economic growth potential was estimated at GBP 130 million,10 with more than 600 apps created and more than 500 people directly employed in the reuse of public transport data. Transport for London now focuses primarily on publishing the data rather than on building its own route planning apps.11 It publishes both the raw timetable data and a unified API. Read more at https://tfl.gov.uk/info-for/open-data-users/.
The manner in which data is published also reflects legal constraints. In 2016, Scassa and Diebel12 published a paper in which they, from a legal perspective, argued that publishing real-time data as open data is troublesome. Indeed, when a route planning API is offered, a Service Level Agreement (SLA) is needed, guaranteeing the up-times of an expensive but free service. When, however, the raw data is published via downloads or file updates, the effort required by the publisher is lower and, thus, easier to guarantee.
Public authorities and non-governmental organisations (NGOs) also play a key role regarding open data in the public transport ecosystem. Public authorities provide the legal framework and regulations that drive public transport organisations to pursue open data strategies. They also may provide technical and standardisation guidelines for data publishing that help to achieve greater interoperability. NGOs are avid users of public transport open data, which they use for different kinds of studies and data analysis that aim to shed light on social issues and potential solutions. For example, a study13 conducted by the non-profit organisation, Despacio, on the current status of, and trends in, bike mobility in Bogotá (Colombia) relied on open data provided by the Secretaría de Movilidad of Bogotá to highlight the main challenges and gaps in terms of security and infrastructure for the growing number of bike users in the city. They also used this data, together with the public transportation routes information, to generate a mobility coverage map of the city. Another example is the study performed by the Public Knowledge Workshop, an Israeli NGO that facilitates open data initiatives, which used schedule and live update data from Israeli railway and bus companies to verify their operational synchronisation.14 They revealed that despite the presence of an official government plan requiring joined-up scheduling, there was little synchronisation in practice between the trains and the corresponding buses that were supposed to deliver and pick up passengers to and from their trains. These open data-based studies provide a vital resource for urban planners to better design and plan the development of cities and for social organisations that work toward improving living conditions in cities.
Emerging technology
In 2015, Linked Connections was put forward as a middle-ground route planning solution, moving beyond the false dichotomy between data dumps and route planning APIs.15 With Linked Connections, route planning happens on the infrastructure of the data user, but data is already prepared for the purpose of route planning by the provider. At the basis of the technology lies the same idea as behind Content Delivery Networks (CDN). By creating small fragments of data about the departures of public transport vehicles in cacheable documents, the raw data needed by users is published cost efficiently. The goal of the framework is to enable a new open source route planning ecosystem based on web querying. Further information is available at https://linkedconnections.org.
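A minimal sketch of the client side of this idea follows. It assumes that the provider's fragments ("pages") each hold a list of connections sorted by departure time, and it uses the Connection Scan Algorithm to find an earliest arrival in a single pass; the chapter does not name a specific algorithm, so that choice is an assumption made here for illustration. The stops, times, and in-memory pages are invented, fetching the fragments over HTTP is omitted, and transfer times are ignored.

```python
from typing import NamedTuple


class Connection(NamedTuple):
    departure_stop: str
    arrival_stop: str
    departure_time: int  # minutes after midnight
    arrival_time: int    # minutes after midnight


def earliest_arrival(connections, origin, destination, start_time):
    """Scan departure-sorted connections once and return the earliest
    arrival time at `destination`, or None if it cannot be reached."""
    earliest = {origin: start_time}
    for connection in connections:
        reachable_at = earliest.get(connection.departure_stop)
        if reachable_at is None or reachable_at > connection.departure_time:
            continue  # this vehicle leaves before the traveller can board
        if connection.arrival_time < earliest.get(connection.arrival_stop, float("inf")):
            earliest[connection.arrival_stop] = connection.arrival_time
    return earliest.get(destination)


# Two invented fragments, standing in for cacheable documents from a provider.
PAGES = [
    [Connection("A", "B", 480, 490), Connection("A", "C", 485, 520)],
    [Connection("B", "C", 495, 505), Connection("C", "D", 510, 530)],
]

connections = [connection for page in PAGES for connection in page]
```

Because the pages are the same for every user, they can be cached like any other web document, while the scan itself runs on the data user's own infrastructure.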
Although there have been major steps in opening up transit data in the last decade, building a global route planner that includes all public transport modes in the world remains close to impossible. The amount of effort and money required for such an endeavour exceeds what governments and companies are willing to invest. The obstacles are diverse, including technical, legal, and financial barriers, but the availability and accessibility of the required data is paramount.
The majority of public transport companies in the world still do not provide their schedules as open data, and even fewer publish live transit updates in machine-readable formats. Therefore, it is not possible to automatically include such data in a global route planning application. One approach to tackling this data gap might be the use of applications that crawl through transport provider websites and scrape schedule information. This kind of approach demands a high effort, as for every company, there must be an ad hoc implementation of the scraper to extract data. Furthermore, there are often legal uncertainties as to whether scraping transport websites is legal in a particular jurisdiction.
Despite the relatively low availability of data and legal uncertainties around scraped data, there are still some entrepreneurs and established businesses that have been addressing this titanic challenge. The most famous, and notorious, is Google Maps. Google uses the GTFS specification, which it maintains together with a global community of developers, to import data on different transport modes and networks into its route planner. Google encourages public transport companies to generate and deliver their data in this format, but does not require the data to be openly published. Sometimes it will work out a direct arrangement with a public transport operator, as is the case for the urban bus company, Transmilenio in Bogotá (Colombia), where the operator hires an external company to generate and deliver the GTFS feed to Google without publishing it for public access. According to the Google Transit website, Google currently supports 5 640 different transport companies16 within its route planning application, which covers over 18 000 different cities17 around the world.
There are several other examples of applications and services that reuse transport open data and that seek to provide a global route planner, such as CityMapper, Transit App, Ally, and Moovit. Some of them even try to generate their own data to include cities and transport networks that do not publish their own data (e.g. CityMapper and their work on Mexico City and Istanbul).18 Navitia19 makes an API available that currently contains 434 transport datasets from around the world, from which developers can use route planning features, generate maps of time/distances, and access timetables. They take advantage of publicly available open data and encourage users to provide new data sources. Transitland,20 however, is potentially the largest catalogue of open transit data, reporting 945 open GTFS feeds covering 2 377 different public transit operators at the time of writing.
Mexico City
In Mexico, a GTFS feed was introduced to take advantage of the collection of GPS data throughout its transit systems. In a matter of weeks, this mega-city with several different transit providers was able to introduce a fully functional GTFS feed and obtain the benefits of work done on route planning tools elsewhere. A range of free or low-cost customer-facing applications and planning tools were able to immediately capitalise on this data.
Problematic, however, is the fact that part of the public transit system in Mexico is only semi-structured, meaning that some services do not have fixed stops, nor a defined timetable. The project revealed an important limitation of GTFS in its current form as it is unable to easily accommodate the kind of semi-structured public transit services that operate in many developing world cities.
Eros et al. (2014) have detailed the experience in a full paper for the Transportation Research Board.21
By providing a standardised way to model and describe public transport time schedules in machine-readable formats, GTFS has become one of the most important tools to increase the amount of available open data in the transport sector. However, it has some notable limitations when working toward global coverage of transport data. It was originally designed to model structured networks that define a set of fixed stops for vehicles and that run on predefined time schedules that are often specified down to the second. But this is not the case for most of the public transportation services offered in the major cities of the Global South, where operators may define a set of routes that are followed by a set of vehicles but without predefined fixed stops. This type of limitation in the modelling capabilities of the available standards adds difficulty to both standards and open data adoption in these parts of the world. Moreover, public transport operators in developing countries often have few incentives to provide data about their operation, and public authorities may lack the necessary regulatory framework and resources to drive or support these organisations in publishing open data.
To address these shortcomings, and to promote the wider implementation of open data initiatives, a number of different approaches have arisen. For instance, the GTFS-flex22 specification, created and maintained by the independent developer community, is a proposed extension for GTFS that aims to provide the capabilities for modelling semi-structured public transport and demand-responsive transportation services. In Kenya, the Digital Matatus project23 has made use of mobile communication and geolocation technologies to map and generate a GTFS data source for the semi-structured public transport service in Nairobi, which has proven to be a feasible mechanism to fill the gap when data on these types of transport networks is not available from official sources. Following this initiative, the Digital Transport for Africa community was created, which has supported open data generation projects for public transport services in Cairo, Maputo, Accra, and Abidjan.24 Similarly, the World Bank began offering a course to empower participants to create, manage, and use GTFS feeds in resource-constrained environments.25 It is important to note that these types of initiatives help to increase available open data for the transport sector, but they still require significant investment and political will from the public authorities in the developing world.
Today, there is evidence that disclosing public transport data can generate many benefits for different actors, including developers, entrepreneurs, users, and transport companies, and the discussion is no longer centred on whether data should or should not be openly published. The resistance to open data that is still encountered around the world is now more a matter of the political will of organisations. Policies promoted at the national, regional, and local levels can play an important role in increasing the implementation of open transport data initiatives. One clear example of such promotion is the Intelligent Transport System (ITS) Directive26 of the European Union. The directive aims to accelerate the deployment of innovative transport technologies across Europe, and the public accessibility of data is one of its main requirements, indicating that both policy and research discussions about open data in the public transport sector have now moved to a technical and a legal level. The key questions to address in scaling coverage relate to how transport data should be published to improve interoperability, while keeping costs to a minimum, as well as how to address legal considerations to protect the interests of involved parties, without limiting open data benefits.
World-wide and open source – Transportr
Open source route planning software exists today, such as Open Trip Planner, OSRM, Navitia, or RRRR, and many companies, like Plannerstack, Conveyal, Digitransit, and Kisio Digital, make use of this open source software to provide services to their clients. Navitia.io, based on the Navitia code-base, is a freemium SaaS solution for route planning. Transportr reuses this service to create a fully open source and free app with the data available via the web-services. Read more at https://transportr.grobox.de/.
Mobility is always a core point of discussion in urban planning. Ever since the introduction of the car in the early 20th century, cities have been adapting to it, or have been “taken hostage by” it, as some would proclaim, as the primary means of transport. The continued dominance and density of cars, and their negative environmental and social impacts within urban environments, has created a sense of urgency around the need to diversify the way we move from one place to another. Yet statistics on car use will not trigger a worldwide change by themselves. In order to change dominant behaviour, mobility activists and entrepreneurs have coined the term Mobility as a Service (MaaS). This new idea tries to encourage people to leave their cars behind and diversify their mobility choices by means of an app. Instead of having to use multiple apps to find routes and buy tickets for each different mode of transport, an ecosystem for all-in-one solutions must be built.
In order to grow a MaaS ecosystem in a certain region, three requirements need to be fulfilled. The first is that the data needs to be available on where and when specific services can be used. Given the low availability of open transport datasets today, the MaaS movement is also an important advocacy force for open data, arguing that every mobility player, whether public or private, needs to publish their data in order to create a truly level playing field for MaaS.
In Belgium, for example, an Open Data Charter was created in 2018 by local governments and regional governmental institutions27 that lays out 20 principles for open data, including the 19th principle stating that data resulting from a government concession should be open as well. Local governments adopting such a principle may push forward the agenda of open data and MaaS worldwide.
The second requirement for MaaS is that an open ticketing API must be in place. The more you allow third parties to sell your tickets, the more integration can happen with other mobility solutions. An open ticketing API may allow tickets to be granted to users in various ways (e.g. per hour, per km, etc.). As evidenced by the low availability of fare data in general, it is certainly early days, yet this is an area that is currently rapidly evolving. In Finland, for example, an API for ticketing has been created that can be used by anyone to buy tickets without signing complicated contracts. This allows apps created by vendors, such as MaaS Global,28 to start selling tickets as a third party.
Finally, open data and open ticketing alone are not going to create a seamless travel experience for end-users. As a third condition, a city needs to prepare itself for multimodality. Infrastructures need to be better aligned with public and private transport offerings. Different enablers exist for brainstorming solutions in this area, such as Open Transport Camp in Australia29 and TransportationCamp in the US,30 or initiatives such as the MaaS Alliance,31 Fabrique Mobilité,32 or Mobihubs33 in Europe. Ultimately, it will take multiple data communities to produce the policy, planning, and programmes of action that will truly reshape public space and mobility.
There is evidence worldwide that transport data is being released as open data, whether it is through crowdsourcing initiatives as in Mexico City or through official public transport or governmental organisations. In the US, thanks to the APTA, and in European countries, thanks to the ITS, Public Sector Information (PSI), and INSPIRE directives, policies are pushing the agenda for open transport data forward.
Now that the benefits of sharing public transport data openly are becoming visible through apps that can immediately turn these datasets into route planners, the way data is shared needs to evolve technically. The current de facto standard for sharing data via GTFS still requires a big investment from users before the data can be used in a route planner, and only a fraction of the data that exists in GTFS format is publicly available as open data. The true potential of open transport data is yet to be unlocked, although as the integration costs of transport data decrease and more data is made available, there is scope for substantial progress to be made.
Open data alone is not going to create a big change in how people move from one place to another. Advancement of MaaS will need to combine concepts of open data, support for open ticketing, and work on infrastructure investments in order to diversify the availability of transport options. It is up to policy-makers to create the right environment and infrastructure to properly prepare cities for the mobility of the future.
Further reading
Colpaert, P., Van Compernolle, M., Walravens, N., Mechant, P., Adriaenssens, J., Ongenae, F., Verborgh, R., & Mannens, E. (2017). Open transport data for maximising reuse in multimodal route planners: A study in Flanders. IET Intelligent Transport Systems, 11(7), 397–402. https://ieeexplore.ieee.org/abstract/document/8061184
Eros, E., Mehndiratta, S., Zegras, C., Webb, K., & Ochoa, M.C. (2014). Applying the General Transit Feed Specification to the Global South: Experiences in Mexico City, Mexico – and beyond. Transportation Research Record, 2442(1), 44–52. https://doi.org/10.3141/2442-06
Hogge, B. (2016). Transport for London: Get set, go! Open data’s impact. GovLab and Omidyar Network. http://odimpact.org/files/case-studies-transport-for-london.pdf
Rojas, F.M. (2012). Transit transparency: Effective disclosure through open data. Transparency Policy Project. Cambridge, MA: Ash Center for Democratic Governance and Innovation. http://www.transparencypolicy.net/assets/FINAL_UTC_TransitTransparency_8%2028%202012.pdf
Pieter Colpaert is a Researcher at Ghent University’s Internet and Data Research Lab. His research focuses on enabling route planning at a large scale, using linked data. He is a board member of the Belgian chapter of Open Knowledge and a community coordinator of the open transport working group at Open Knowledge International. You can learn more about Pieter’s work at http://pietercolpaert.be.
Julián Andrés Rojas Meléndez is a Researcher at the University of Ghent working on interoperable open data publishing strategies on the Web and decentralised route planning with open data. You can follow Julián at https://www.twitter.com/julianr1987.
How to cite this chapter
Colpaert, P. & Rojas Meléndez, J.A. (2019). Open data and transportation. In T. Davies, S. Walker, M. Rubinstein, & F. Perini (Eds.), The state of open data: Histories and horizons (pp. 215–224). Cape Town and Ottawa: African Minds and International Development Research Centre. http://stateofopendata.od4d.net.
This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. It was carried out with the aid of a grant from the International Development Research Centre, Ottawa, Canada.
2UITP. (2014). Action points for the public transport sector. The benefits of open data. Brussels: Union Internationale des Transports Publics. https://www.uitp.org/sites/default/files/cck-focus-papers-files/AP%20-%20Benefits%20of%20open%20data%20EN.pdf
3APTA. (2015). Public transportation embracing open data. Washington, DC: American Public Transport Association. https://www.apta.com/resources/reportsandpublications/Documents/APTA-Embracing-Open-Data.pdf
4Rojas, F.M. (2012). Transit transparency: Effective disclosure through open data. Transparency Policy Project. Cambridge, MA: Ash Center for Democratic Governance and Innovation. http://www.transparencypolicy.net/assets/FINAL_UTC_TransitTransparency_8%2028%202012.pdf
5Colpaert, P., Van Compernolle, M., Walravens, N., Mechant, P., Adriaenssens, J., Ongenae, F., Verborgh, R., & Mannens, E. (2017). Open transport data for maximising reuse in multimodal route planners: A study in Flanders. IET Intelligent Transport Systems, 11(7), 397–402. https://ieeexplore.ieee.org/abstract/document/8061184
8http://www.transmodel-cen.eu/standards/siri/
10Deloitte. (2017). Assessing the value of TfL’s open data and digital partnerships. London: Deloitte. http://content.tfl.gov.uk/deloitte-report-tfl-open-data.pdf
11Hogge, B. (2016). Transport for London: Get set, go! Open Data’s Impact. GovLab and Omidyar Network. http://odimpact.org/files/case-studies-transport-for-london.pdf
12Scassa, T. & Diebel, A. (2016). Open or closed? Open licensing of real-time public sector transit data. JeDEM – EJournal of EDemocracy and Open Government, 8(2), 1–20. https://jedem.org/index.php/jedem/article/view/414
13Verma, P., López, J.S., & Pardo, C. (2015). Bicycle account: Bogota 2014. Bogotá: Despacio. http://www.despacio.org/wp-content/uploads/2015/01/Bicycle-Account-BOG-2014-20150109-LR.pdf
14See the OpenTrain project at http://otrain.org/ and Open Bus project details at http://www.hasadna.org.il/en/projects/
15Colpaert, P. (2017). Publishing transport data for maximum reuse. PhD Thesis, Ghent University, Belgium. https://phd.pietercolpaert.be
16https://maps.google.com/landing/transit/cities/
17https://maps.google.com/help/maps/mapcontent/transit/participate.html
18Citymapper. (2015). Building a city without open data. https://medium.com/citymapper/building-a-city-without-open-data-124356672deb
20https://transit.land/feed-registry/
21Eros, E., Mehndiratta, S., Zegras, C., Webb, K., & Ochoa, M.C. (2014). Applying the General Transit Feed Specification to the Global South: Experiences in Mexico City, Mexico – and beyond. Transportation Research Record, 2442(1), 44–52. https://doi.org/10.3141/2442-06
22https://github.com/MobilityData/gtfs-flex
23Williams, S., White, A., Waiganjo, P., Orwa, D., & Klopp, J. (2015). The Digital Matatu Project: Using cell phones to create an open source data for Nairobi’s semi-formal bus system. Journal of Transport Geography, 49, 39–51. http://www.sciencedirect.com/science/article/pii/S0966692315001878
24http://digitaltransport4africa.org/
25https://olc.worldbank.org/content/introduction-general-transit-feed-specification-gtfs-and-informal-transit-system-mapping
26https://ec.europa.eu/transport/themes/its/road/action_plan_en
27Smart Flanders. (2018). Open Data Charter. https://smart.flanders.be/
29http://www.transportcamp.org.au/
30http://transportationcamp.org/
32http://wiki.lafabriquedesmobilites.fr/wiki/Ev%C3%A9nements
Open data in the context of urban development is increasingly linked with “smart cities” and “urban resilience” agendas.
There has been a shift from an early emphasis on hackathons, seen as a potential mechanism for co-production of public services with external experts, toward working on data standards, infrastructure, and in-house analytical capacity within city governments.
Intermediaries, including public-funded organisations such as libraries, have an important role to play helping citizens gain value from urban open data.
Without further work crafting practitioner communities and clear agendas, open data is likely to be seen primarily as a tool to be selectively used in smart cities, rather than as the central element of a comprehensive approach to achieve more open urban development.
Since 2008, more than 50% of the world’s population has resided in cities.1 Estimates suggest this figure will be over 66% by 2050,2 with much of this urbanisation taking place in the developing world. The ongoing growth of the urban environment and of urban density brings with it both opportunities and challenges. Creating vibrant communities, maintaining mobility, delivering essential services, and creating low-carbon development depend on a mix of planned and emergent action, all of which has come to rely more and more upon data. However, visions of effective urban development differ, ranging from centralised and highly technical “smart city” narratives that envisage a city organised using predominantly proprietary and commercial technology3 to “smart citizen” viewpoints that stem more from an ad hoc, bottom-up model of urban development based on open technology and open data.4 The reality for many cities will likely lie somewhere between these extremes, and, over the last decade, open data has played a critical role in creating the space for dialogue about the future of the city and in providing a platform for various urban innovations.
To some extent, the template for the wider open data movement was originally set at the city level. In 2007, Vivek Kundra (who later became chief information officer for the United States (US) government and the architect of data.gov) launched the Washington, DC data portal, opening up datasets that had previously driven the government’s own CitiStat dashboard.5 Recognising the limited capacity of the city bureaucracy to develop new solutions with its own data, the driving goal was to harness ideas and energy from outside government. Today, hundreds of cities have their own open data portals (although this appears to be much more common in North America and Europe than in other parts of the world),6 and many have hosted events to spur on use of this data to support engagement with independent developers, researchers, and municipal open data champions.
More recently, as the Sustainable Development Goals (SDGs) and the debate around resilience have become key elements of the mainstream development agenda, questions on how open data can play a role in urban sustainability and urban resilience have received increasing attention.7 In many cities, a lack of data can hamper efforts to plan and coordinate development or lead to certain populations ending up “invisible” with their needs neglected as city leaders push forward with major infrastructure and construction projects. On the other hand, the innovative use of open data has provided the means for citizens to get involved in local government, using statistics, data visualisation, and storytelling to engage in debates about shared urban futures.
In this chapter, we explore these themes through four lenses: innovation, infrastructure, measurement and management, and resilience. We also identify some of the principles, interventions, and attitudes that will be required for open data-enabled urban development in the years ahead.
The concept of innovation is often a contested one because of the ambiguity around determining what is, or is not, innovative. However, since the earliest open data app competitions, the idea that releasing open data can unlock innovative capacity for urban development and local economic growth has been compelling for community-minded developers and service providers.8 Open data has been framed as a tool to engage a wider range of actors in solving municipal problems, harnessing “innovative ideas” from academics, citizens, and the private sector. Sandoval-Almazan et al. describe this as creating new forms of citizen empowerment,9 and Hielkema and Hongisto have argued that app competitions based on open data are important venues to bring together stakeholders (including government, developers, and users) to initiate collaborative projects and enable the private sector to tailor products to the public.10
Hundreds of urban open data hackathon events have taken place over the last decade, providing new points of connection between city officials and technically skilled communities who want to engage with them. When organised by government, they demonstrate a clear recognition of the fact that open data initiatives require more than just data release.11 The hackathon model has been used across the world and, in many cases, framed as a catalyst for urban and economic development.12 Building on a “government as a platform” premise, many early hackathons were based on the belief that innovation could only come from outside government. Over time, there has been a shift to recognise that effective co-production through hackathons requires a more strategic approach, with officials taking a more active role in defining problems and in finding, preparing, and using relevant datasets.13
Although popular, hackathons have also been subject to critique. Researchers have questioned their ability to produce meaningful social impact and citizen engagement, noting that there is scarce evidence of the link between hackathons and economic growth.14 Johnson and Robinson have further argued that open data, particularly when delivered as part of a hackathon model, may promote a particular model for outsourcing government functions with potentially negative outcomes for urban development.15 City governments have also struggled with how to partner with, or procure from, civic entrepreneurs who will work with their open datasets. This has led many hackathons to act solely as demonstrators of what could be possible, rather than to develop active tools to transform urban governance or city living. A number of studies have looked at how cities can better design innovative competitions and events and how best to secure lasting impact from them.16,17
Linking open data and urban governance in Jakarta
In 2014, the Southeast Asia Technology and Transparency Initiative, the Web Foundation, and Government of Jakarta worked with a range of partners to host “HackJak”, a two-day hackathon, where more than 100 participants worked on 53 different projects. Organisers took advantage of Indonesia’s upcoming role as government co-chair of the Open Government Partnership to secure interest and participation in the event, which resulted in several prototypes, including an application that provided live feedback on public transport shelters and tools for navigating and engaging with the city budget.18,19
In 2016, Jakarta’s Provincial Disaster Management Agency, the National Indonesian Disaster Agency, the World Bank, Humanitarian OpenStreetMap Team (HOT), and other partners convened another hackathon to explore how technology, data, and open collaboration could be marshalled to address challenges around flooding. Five teams developed prototypes, including designs for tools that would keep citizens informed during flooding and an analysis of the areas of the city most vulnerable to flooding.20
Over time, city engagement with open data has become professionalised. Numerous cities now have internal innovation labs, connecting work on open data with wider themes of data-science, service design, and technology innovation. One of the early examples, Boston’s Office of New Urban Mechanics, operates as a cross-departmental unit focused on addressing specific city challenges.21 In the City of Buenos Aires, open data, open government, and smart city activities have been placed under the same directorate, the Innovation and Smart City Undersecretariat (Subsecretaría de Innovación y Ciudad Inteligente).22 This integration of “openness” (open data and open government) into urban development, or “smart city” work, is becoming more common. In Montreal, the recently established Urban Innovation Lab23 has absorbed all staff previously working on open data, mirroring a pattern seen in a number of cities, where external civic hackers and entrepreneurs were hired by government and then given a wider set of responsibilities that incorporate, but do not entirely centre around, open data.
This blending of open data into broader technology-driven urban development agendas has both benefits and risks. With the application of consistent values and principles, it can help drive an “open by default” approach to work on innovation and co-production. However, it can also lead to the importance of openness being downplayed or to less engagement with the kinds of experimental citizen engagement spaces that characterised early work on open data. A focus on instrumental and institutionalised use of open data may also orient work solely toward service delivery and away from transparency and accountability, as is suggested in the framing of the 2013 Code for America book Beyond transparency: Open data and the future of civic innovation.24
While many new ideas and approaches have emerged from urban planners and service providers engaging with open data, in many cases, cities are facing similar challenges and, therefore, looking at similar solutions. For example, public transit apps, 311-reporting systems, city dashboards, and resource directories, although once innovative, have become open data initiatives more commonly adopted by cities whenever the data infrastructure exists to drive them. However, when it comes to the availability of data infrastructure, the picture is much more mixed across the globe.
Centrally coordinated management of the physical infrastructure of a city (electricity grids, water supply and waste disposal, transport networks, city services, etc.) requires a substantial data infrastructure. This data infrastructure relies, in turn, on a telecommunications infrastructure to connect sensors, staff, and systems together.25 In the context of “smart cities”, this infrastructure is usually built by private firms and offered as a package that may give corporate platform providers considerable control of the hardware, software, and data involved in city operations, and which may bring long-term dependency on specific vendors. Alternatively, some cities are piecing together legacy systems, or working with a mixed ecology of public, private, and citizen-generated data, laying the foundations for a more open model of the “smart city”. Regardless, central to interoperability of these systems will be the continued development of open standards.
Over the last decade, open data standards development has been an area of considerable focus. Standards and API (application programming interface) specifications, such as Open311 (initially developed as an API on top of Washington, DC’s existing “311” public request system),26 have helped facilitate the creation of open data via crowdsourcing and by opening up existing issue-reporting systems.27 Services built on top of Open311, such as SeeClickFix28 and FixMyStreet,29 have provided new avenues for citizen participation and co-production, as well as demonstrating a range of business models for civic technology. However, although its potential and impacts have been widely discussed, Open311’s own dashboard suggests full adoption has been limited, with just 25 city deployments identified at the time of writing.30 There is also evidence that individual implementations of 311 have undergone semantic and ontological divergence.31 Similarly, the roughly 60 city-relevant standards listed by GovEx and GeoThink in their datastandards.directory32 project have only been adopted by a minority of cities.
In 1996, the City of Baltimore launched a simple non-emergency phone number, 311, to relieve pressure on the emergency 911 service. By 2001, a number of US cities had taken up the idea, providing access to a wide range of services via 311 call centres. Subsequently, the first experiments with web-based 311 contacts started in Chicago. In 2008, SeeClickFix, a product that enables US cities to accept 311-style reports, was launched by a for-profit firm33 in parallel with a similar platform, FixMyStreet, that was launched in the United Kingdom by MySociety.34 The growth of 311, and of digital 311 platforms, meant increasing amounts of data on issues facing cities but also led to increased risks that different cities would be collecting fragmented data that was locked into proprietary systems.
In 2009, the Open311 project was launched to develop an open API specification that would support both data collection and retrieval on reported issues,35 turning 311 services from a one-way communication to government into a conversation between government and citizens about pressing issues at the neighbourhood level. Open311 specifications, created through an open community process, have now been implemented for both FixMyStreet and SeeClickFix, as well as custom systems in a number of municipalities. The presence of a standard has also given new cities a head-start in developing their own systems.
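To make the idea of a shared specification for both collecting and retrieving reports more concrete, the following is a minimal sketch of how a client might query and summarise service requests. It assumes the field and parameter names of the Open311 GeoReport v2 specification (service_request_id, status, service_name, service_code); the endpoint root, the helper function names, and the sample records are invented for illustration and do not describe any particular city’s deployment.

```python
from collections import Counter
from urllib.parse import urlencode

# Hypothetical endpoint root; each real deployment publishes its own.
BASE_URL = "https://city.example/open311/v2"


def build_requests_url(base_url, status=None, service_code=None):
    """Build a GeoReport v2 style URL for listing service requests."""
    params = {}
    if status:
        params["status"] = status
    if service_code:
        params["service_code"] = service_code
    query = urlencode(params)
    return f"{base_url}/requests.json" + (f"?{query}" if query else "")


def count_by_service(records, status="open"):
    """Count records with the given status, grouped by service name."""
    return Counter(
        r["service_name"] for r in records if r.get("status") == status
    )


# Invented sample records that reuse GeoReport v2 field names.
SAMPLE = [
    {"service_request_id": "1", "status": "open", "service_name": "Pothole"},
    {"service_request_id": "2", "status": "open", "service_name": "Pothole"},
    {"service_request_id": "3", "status": "open", "service_name": "Streetlight"},
    {"service_request_id": "4", "status": "closed", "service_name": "Pothole"},
]

if __name__ == "__main__":
    print(build_requests_url(BASE_URL, status="open"))
    print(count_by_service(SAMPLE))
```

Because the record structure is shared, the same summarising function could in principle be pointed at any conforming endpoint, which is the practical sense in which a standard gives new cities a head-start.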
One urban data standard that has achieved widespread adoption is the General Transit Feed Specification (GTFS) for public transport schedules, which originated as a project between Google and the City of Portland.36 While implementations of data standards in developed countries are often intended to digitise and extend existing analogue services provided by government, implementations in the Global South may focus on the development of new services not originally provided by government. One such example, Digital Matatus, a university collaboration in Nairobi, has mapped Nairobi’s matatus (minibus) transit routes, which dominate the public transit environment, using a citizen science approach and GTFS to improve local trip planning.37
The ultimate impact of open data infrastructures on urban planning is often indirect. For example, third-party transit apps based on GTFS data can improve the ability of residents, including those with disabilities and the elderly, to navigate urban environments,38 while the data on how and when people are travelling can be used for research and urban mobility planning by local government.39
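As a rough illustration of why a common schedule format lowers the barrier for third-party transit apps, the sketch below reads two of the CSV files that make up a GTFS feed (stops.txt and stop_times.txt) and lists the next departures from a named stop. The miniature feed and the function names are invented for this example, and the sketch deliberately ignores service calendars, routes, and real-time updates.

```python
import csv
import io

# Invented miniature feed. A real GTFS feed is a zip archive of CSV files.
STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,Central Station,-1.2864,36.8172
S2,Market,-1.2833,36.8219
"""

STOP_TIMES_TXT = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,07:55:00,08:00:00,S1,1
T1,08:10:00,08:11:00,S2,2
T2,08:25:00,08:30:00,S1,1
T3,25:05:00,25:10:00,S1,1
"""


def to_seconds(hhmmss):
    """Convert HH:MM:SS to seconds. GTFS allows hours above 24 for
    trips that run past midnight of the service day."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def next_departures(stop_name, after, limit=2):
    """Return up to `limit` departure times from a stop at or after `after`."""
    stops = {
        row["stop_name"]: row["stop_id"]
        for row in csv.DictReader(io.StringIO(STOPS_TXT))
    }
    stop_id = stops[stop_name]
    threshold = to_seconds(after)
    times = [
        row["departure_time"]
        for row in csv.DictReader(io.StringIO(STOP_TIMES_TXT))
        if row["stop_id"] == stop_id
        and to_seconds(row["departure_time"]) >= threshold
    ]
    return sorted(times, key=to_seconds)[:limit]


if __name__ == "__main__":
    print(next_departures("Central Station", "08:05:00"))
```

The same few lines would work unchanged against any feed that follows the specification, whether it was published by a transit agency or assembled through community mapping.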
Whether the pace of development and adoption of open data standards is enough to ensure urban environments run on open infrastructures is a critical question requiring further research. Similarly, further work is required to track how far standards have progressed in making data shareable between cities, as evidence to date suggests that data interoperability across cities and regions remains a major challenge. A lack of open and interoperable infrastructures will impact governments’ ability to undertake collaborative urban development initiatives at a broader scale. However, within individual cities, governments are certainly looking at how they can make more use of their existing data through new models of data presentation.
Over the last decade, many cities have opened up hundreds of datasets that provide both a real-time and an historic view on the urban environment, and numerous projects have taken place to create urban dashboards driven by that data. The underlying idea that open data can be analysed and visualised in a simple interface (e.g. dashboard) is a key driver for municipal government,40 linking open data with other fields connected to e-government,41 urban data analytics, big data, and government data infrastructures,42 all of which have explored mechanisms to combine and display multiple streams of data. One iconic example is the Rio de Janeiro Operations Centre,43 which combines multiple streams of data, including open data, for predictive modelling of development and disaster scenarios. Such systems may not necessarily require data to be open in order to function, and the concept of a government operations centre has been around for a while,44 but the addition of open data brings greater transparency to the calculation and presentation of city performance measures.
Open data-powered dashboards have become key public-facing instruments to demonstrate city performance on select indicators45 and have often been deployed to support citizen engagement. Many examples, such as the City of Edmonton’s Citizen Dashboard,46 reflect a new approach to city performance measurement over the past few years, transforming data into information through simple charts and web maps. It is a model adopted both by cities and by cross-city collaborations. For example, in a project framed specifically around open data, the InterAmerican Development Bank has supported the creation of an Urban Dashboard platform for 50 cities in Latin America, providing access to benchmark indicators and survey data for individual cities and allowing for performance comparisons between cities.47
While performance dashboards can still be manipulated by governments to present higher levels of performance (including performance on transparency indicators) than the reality,48 open data approaches add an extra level of traceability and accountability by publicly linking data visualisations back to source data and the data owner responsible. Engaging effectively with open data can require high levels of data literacy. This has led to the rise of infomediaries who can help citizens to engage with flows of data.49,50,51 However, dashboards and performance metrics can only visualise data that is available, and, in many cases, relevant datasets are not available52 or simply do not exist. Where datasets are available, it is also critical to ensure they do not contain biased information about the urban environment that can contribute to the marginalisation of certain populations.
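One way to picture the traceability described above is a dashboard indicator that carries a pointer to its source dataset and data owner alongside the computed figure. The sketch below is an assumption about how this might be structured rather than a description of any existing dashboard; the Indicator class, the share_closed function, the URL, the owner name, and the sample records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    name: str
    value: float
    unit: str
    source_url: str  # where the underlying open dataset can be downloaded
    data_owner: str  # the unit responsible for the dataset


def share_closed(records, source_url, data_owner):
    """Compute the percentage of service requests marked closed,
    keeping the provenance of the figure attached to the result."""
    closed = sum(1 for r in records if r["status"] == "closed")
    value = round(100 * closed / len(records), 1) if records else 0.0
    return Indicator(
        name="Service requests closed",
        value=value,
        unit="%",
        source_url=source_url,
        data_owner=data_owner,
    )


# Invented sample data and metadata, for illustration only.
SAMPLE_REQUESTS = [
    {"id": "1", "status": "closed"},
    {"id": "2", "status": "closed"},
    {"id": "3", "status": "open"},
    {"id": "4", "status": "closed"},
]

if __name__ == "__main__":
    indicator = share_closed(
        SAMPLE_REQUESTS,
        source_url="https://data.city.example/datasets/service-requests",
        data_owner="Department of Public Works (hypothetical)",
    )
    print(indicator)
```

Publishing the source link and owner with each figure lets a reader recompute the indicator from the raw records, which is what distinguishes an open data-powered dashboard from a closed performance report.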
A recognition of the importance of shaping not only decisions made with data, but also the stock of urban data that supports decisions, has driven the work of groups, such as Transparent Chennai in India, who have deployed small teams of researchers to work with grassroots communities to generate new datasets on issues ranging from public toilets to road safety. By using their own dashboards, Transparent Chennai is able to present a different view on the city and to support citizens in making a case for change to government officials. The initiative seeks to establish a bottom-up model of the “transparent city” and enable “smart citizens” to overcome top-down models of traditional citizenship.53 Similar models of bottom-up data creation can be seen in action, working on open data for urban resilience.
The Rockefeller Foundation describes resilience as “the capacity of individuals, communities, institutions, businesses, and systems within a city to survive, adapt, and grow no matter what kinds of chronic stresses and acute shocks they experience”.54 The Rockefeller Foundation’s 100 Resilient Cities (100RC) initiative has helped to establish urban resilience as a major topic in mainstream discourse. Resilience has often been framed as a problem of infrastructure,55,56 while open data has usually been linked to the urban context as an issue for data infrastructure.57 As a result, open data is now viewed as a key element of city infrastructure that can help to predict and ameliorate stresses and shocks.58 This is increasingly important in the developing world, where open data59 and open source software60 are presented as tools for international development.
The Open Data for Resilience Initiative (OpenDRI) under the World Bank Group’s Global Facility for Disaster Reduction and Recovery (GFDRR) was created specifically to incorporate open data into urban resilience projects through data collection, data portals, and data analytics.61 With a focus on risk data, OpenDRI incorporates key open data principles, such as open by default, and has supported numerous urban (and rural) community mapping initiatives in the developing world. The programme has developed a field guide62 and a guide to creating OpenDRI Open Cities Projects.63
Urban resilience has become closely tied to crowdsourced mapping efforts in countries of the Global South as it can help governments fill in the gaps in their spatial data collection, especially in the mapping of urban infrastructure.64 OpenStreetMap (OSM) has become a dominant platform for crowdsourcing geospatial open data and has been adopted by many programmes, including OpenDRI. This work originated from crisis responses to large natural disasters, such as the Haiti earthquake of 201065 and the Nepal earthquake of 2015.66 The Humanitarian OpenStreetMap Team (HOT), an entirely volunteer-driven community focused on disaster mapping around the world, was established in the immediate aftermath of the Haiti earthquake.67 Mapping of urban environments in cities struck by disaster has allowed humanitarian workers to target their services more accurately. However, OSM as a crowdsourcing platform was recognised to have greater benefits beyond short-term mapping needs. Humanitarian organisations, such as the Red Cross, have recognised the potential of OSM to support development activities, map infrastructure, and form the backbone of government geospatial data infrastructure traditionally provided by the private sector. This is reflected in financial and data contributions from the Red Cross68 and the Red Cross Missing Maps initiative.69 OpenDRI has actively pursued the use of OSM for its own projects. Other organisations have used OSM for their own initiatives. For example, the Kathmandu Living Lab has been able to map large sections of Kathmandu and Nepal, leading to support activities, such as mapping literacy workshops to sustain local OSM mapping and data currency. HOT has since partnered with the Red Cross, the World Bank, and the US State Department, providing one indicator that the international development community has begun to embrace open data production as a core element of their activities.
Just as the release of government data created a space for citizens to explore new models of engagement with city governance, the rise of citizen-led data generation to support urban resilience has opened up new possibilities for co-production. Sustaining the participatory dimensions of these efforts over the longer term will be an important challenge to meet.
In some ways, open data is a golden thread running through modern urban development work. Development needs data, and the open sharing of that data is clearly an effective strategy to support collaboration between different stakeholders. But, in many ways, open data is still peripheral to other, more integrated, data-related work within the urban environment. Mainstream urban development literature is more likely to talk about big data, sensor networks, and APIs than it is to talk about “open” data. There are established networks working on open “smart” cities, international collaboration on standards, and strong indications of support from major institutions, yet, even though urban open data initiatives have been successful in pioneering new models of collaboration and co-production, excitement for citizen–government collaboration based on a foundation of open data has often waned, and it is not clear how many cities have truly embedded a culture of openness through data into their organisational DNA.
To keep openness on the urban development agenda, it will be vital to maintain and visibly demonstrate the value of open data as other emerging technologies, particularly big data and artificial intelligence, start to take over the spotlight. Building on existing events, such as the biannual Open Cities Summit, will help, but much wider outreach will be necessary. There are also challenges ahead for both research and practice. Much more sustainable investment is needed in shared and scalable open data infrastructures if the potential of an open marketplace for urban development solutions is to be realised. The strong and consistent adoption of existing and new data standards will be paramount to increasing interoperability and reuse. In parallel, work is needed at the grassroots to promote a vision of open data as a powerful tool for urban development and co-production with opportunities for development that can be initiated by both government and citizens. This requires building connections between existing civil society groups working to support urban communities and the technical intermediaries who can help them to make the most of open data. Crafting these practitioner communities around open data will be an ongoing challenge, but, if successful, it will result in the knowledge sharing and effective data-driven problem-solving needed to address the challenges of modern urban development.
Further reading
Beckwith, R., Sherry, J., & Prendergast, D. (2019). Data flow in the smart city: Open Data versus the commons. In M. de Lange & M. de Waal (Eds.), The hackable city (pp. 205–221). Singapore: Springer Singapore. https://link.springer.com/chapter/10.1007%2F978-981-13-2694-3_11
Goldsmith, S. & Crawford, S. (2014). The responsive city: Engaging communities through data-smart governance. Hoboken, NJ: John Wiley & Sons.
Goldstein, B. & Dyson, L. (Eds.). (2013). Beyond transparency: Open data and the future of civic innovation. San Francisco, CA: Code for America Press. http://beyondtransparency.org/pdf/BeyondTransparency.pdf
Landry, J.-N., Webster, K., Wylie, B., & Robinson, P. (2016). How can we improve urban resilience with open data? Ottawa: Open Data for Development. https://drive.google.com/file/d/0B739vUevKlPgYjJweC1NMElDaVk/view
Sadoway, D. & Shekhar, S. (2014). (Re)Prioritizing citizens in smart cities governance: Examples of smart citizenship from urban India. Journal of Community Informatics, 10(3). http://ci-journal.org/index.php/ciej/article/view/1179
About the author
Jean-Noé Landry is a social entrepreneur and Executive Director of OpenNorth, Canada’s leading not-for-profit organisation specialising in open data, civic technology, and smart cities. As an open data expert, he convenes data stakeholders, promotes data standardisation, and connects governments to their data constituents. You can follow Jean-Noé at https://www.twitter.com/jeannoelandry and learn more about Open North at https://www.opennorth.ca.
How to cite this chapter
Landry, J.-N. (2019). Open data and urban development. In T. Davies, S. Walker, M. Rubinstein, & F. Perini (Eds.), The state of open data: Histories and horizons (pp. 225–236). Cape Town and Ottawa: African Minds and International Development Research Centre. http://stateofopendata.od4d.net
This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. It was carried out with the aid of a grant from the International Development Research Centre, Ottawa, Canada.
1World Bank. (2018). Urban population (% of total). https://data.worldbank.org/indicator/SP.URB.TOTL.IN.ZS
2UN DESA. (2018). 68% of the world population projected to live in urban areas by 2050, says UN. United Nations Department of Economic and Social Affairs [News post], 16 May. https://www.un.org/development/desa/en/news/population/2018-revision-of-world-urbanization-prospects.html
3Rabari, C. & Storper, M. (2015). The digital skin of cities: Urban theory and research in the age of the sensored and metered city, ubiquitous computing and big data. Cambridge Journal of Regions, Economy and Society, 8(1), 27–42. https://doi.org/10.1093/cjres/rsu021
4Sadoway, D. & Shekhar, S. (2014). (Re)Prioritizing citizens in smart cities governance: Examples of smart citizenship from urban India. Journal of Community Informatics, 10(3). http://ci-journal.org/index.php/ciej/article/view/1179
5Tauberer, J. (2014). Open government data: The book. 2nd edition. (pp. 7–28). https://opengovdata.io/2014/history-the-movement/
6https://www.opendatasoft.com/a-comprehensive-list-of-all-open-data-portals-around-the-world/
7Landry, J.-N., Webster, K., Wylie, B., & Robinson, P. (2016). How can we improve urban resilience with open data? Ottawa: Open Data for Development. https://drive.google.com/file/d/0B739vUevKlPgYjJweC1NMElDaVk/view
8Lee, M., Almirall, E., & Wareham, J. (2015). Open data and civic apps: First-generation failures, second-generation improvements. Communications of the ACM, 59(1), 82–89. http://dl.acm.org/citation.cfm?doid=2859829.2756542
9Sandoval-Almazan, R., Gil-Garcia, J.R., Luna-Reyes, L.F., Luna, D.E., & Rojas-Romero, Y. (2012). Open Government 2.0: Citizen empowerment through open data, web and mobile apps. In Proceedings of the 6th International Conference on Theory and Practice of Electronic Governance (pp. 30–33). New York, NY: Association for Computing Machinery. http://doi.acm.org/10.1145/2463728.2463735
10Hielkema, H. & Hongisto, P. (2013). Developing the Helsinki Smart City: The role of competitions for open data applications. Journal of the Knowledge Economy, 4(2), 190–204. https://doi.org/10.1007/s13132-012-0087-6
11Kamariotou, M. & Kitsios, F. (2017). Open data hackathons: A strategy to increase innovation in the city. In Proceedings of International Conference for Entrepreneurship, Innovation and Regional Development (pp. 231–238). Thessaloniki, Greece. https://artionce-my.sharepoint.com/personal/iceird_artion_com_gr/Documents/ICEIRD2017-ProceedingsBook.pdf?slrid=f85ca99e-80f7-7000-a716-e87e89251c73#page=231
12Khan, A. (2016). An incredible weekend for civic hacking in Islamabad! Code for Pakistan, 8 August. http://codeforpakistan.org/blog/2016/08/08/an-incredible-weekend-for-civic-hacking-in-islamabad/
13Kamariotou, M. & Kitsios, F. (2017). Open data hackathons: A strategy to increase innovation in the city. In Proceedings of International Conference for Entrepreneurship, Innovation and Regional Development, 231. Thessaloniki, Greece. https://artionce-my.sharepoint.com/personal/iceird_artion_com_gr/Documents/ICEIRD2017-ProceedingsBook.pdf?slrid=f85ca99e-80f7-7000-a716-e87e89251c73#page=231
14Johnson, P. & Robinson, P. (2014). Civic hackathons: Innovation, procurement, or civic engagement? Review of Policy Research, 31(4), 349–357. https://doi.org/10.1111/ropr.12074
15Ibid.
16Kamariotou, M. & Kitsios, F. (2017). Open data hackathons: A strategy to increase innovation in the city. In Proceedings of International Conference for Entrepreneurship, Innovation and Regional Development, 231. Thessaloniki, Greece. https://artionce-my.sharepoint.com/personal/iceird_artion_com_gr/Documents/ICEIRD2017-ProceedingsBook.pdf?slrid=f85ca99e-80f7-7000-a716-e87e89251c73#page=231
17Lee, M., Almirall, E., & Wareham, J. (2015). Open data and civic apps: First-generation failures, second-generation improvements. Communications of the ACM, 59(1), 82–89. http://dx.doi.org/10.1145/2756542
18McKenzie, J. (2014). #HackJak: Jakarta’s first gov-sponsored open data hackathon tackles budget and public transportation. TechPresident. http://techpresident.com/news/wegov/24972/hackjak-jakartas-first-gov-sponsored-open-data-hackathon-tackles-budget-and-public
19Lukman, E. (2014). Jakarta government held its first hackathon, here are the winners. Tech in Asia, 29 April. https://www.techinasia.com/jakarta-hackathon-hackjak-winners
20Sutton, T. (2016). FloodHack 2016. InaSAFE, 8 April. http://inasafe.org/floodhack-2016/
21Goldsmith, S. & Crawford, S. (2014). The responsive city: Engaging communities through data-smart governance. Hoboken, NJ: John Wiley & Sons.
22http://www.buenosaires.gob.ar/innovacion/ciudadinteligente/proyectos
23https://www.facebook.com/mtlvi/
24Goldstein, B., & Dyson, L. (Eds.). (2013). Beyond transparency: Open data and the future of civic innovation. San Francisco, CA: Code for America Press. http://beyondtransparency.org/pdf/BeyondTransparency.pdf
25Goldsmith, S. & Crawford, S. (2014). The responsive city: Engaging communities through data-smart governance. Hoboken, NJ: John Wiley & Sons.
26Ashlock, P. (2015). Highlights from the Open311 Ecosystem. Open311, 21 June. http://www.open311.org/2015/06/highlights-from-the-open311-ecosystem/
27Steinberg, T. (2013). Open311: What is it, and why is it good news for both governments and citizens? MySociety, 10 January. https://www.mysociety.org/2013/01/10/open311-introduced/
29https://www.fixmystreet.com/
30Based on https://status.open311.org/
31Nalchigar, S. & Fox, M. (2017). Achieving interoperability of smart city data: An analysis of 311 Data. Journal of Smart Cities, 3(1), 1–13. http://eil.mie.utoronto.ca/wp-content/uploads/2015/06/nalchigar-jsc17.pdf
32http://datastandards.directory
33Goodyear, S. (2015). The invention of 3-1-1 and the city services revolution. CityLab. https://www.citylab.com/city-makers-connections/311/
34mySociety. (2018). The UK’s street fault reporting website. FixMyStreet.com. https://www.mysociety.org/community/fixmystreet-in-the-uk/
35Ashlock, P. (2009). Open311.org launches. Open311, 23 June. http://www.open311.org/2009/06/open311-launches/
36McHugh, B. (2013). Pioneering open data standards: The GTFS story. In B. Goldstein & L. Dyson (Eds.), Beyond transparency: Open data and the future of civic innovation (pp. 125–136). San Francisco, CA: Code for America Press. http://beyondtransparency.org/pdf/BeyondTransparency.pdf#page=136
37http://www.digitalmatatus.com/about.html
38Mirri, S., Prandi, C., Salomoni, P., Callegati, F., & Campi, A. (2014). On combining crowdsourcing, sensing and open data for an accessible smart city. In 2014 Eighth International Conference on Next Generation Mobile Apps, Services and Technologies (pp. 294–299), 10–12 September 2014. https://doi.org/10.1109/NGMAST.2014.59
39Dong, J., Ma, C., Cheng, W., & Xin, L. (2017). Data augmented design: Urban planning and design in the new data environment. In 2017 IEEE 2nd International Conference on Big Data Analysis (pp. 508–512), 10–12 March 2017. https://doi.org/10.1109/ICBDA.2017.8078685
40Kitchin, R. (2014). The data revolution: Big data, open data, data infrastructures and their consequences. London: SAGE Publications.
41Baumgarten, J. & Chui, M. (2009). E-Government 2.0. McKinsey Quarterly, July. https://www.mckinsey.com/industries/public-sector/our-insights/e-government-20
42Ganapati, S. (2011). Use of dashboards in government. Fostering Transparency and Democracy Series. Washington, DC: IBM Center for the Business of Government. http://www.businessofgovernment.org/sites/default/files/Use%20of%20Dashboards%20in%20Government.pdf
43Matheus, R., Vaz, J.C., & Ribeiro, M.M. (2014). Open government data and the data usage for improvement of public services in the Rio De Janeiro city. In Proceedings of the 8th International Conference on Theory and Practice of Electronic Governance (pp. 338–341). Guimaraes, Portugal, 27–30 October 2014. New York, NY: Association for Computing Machinery. https://doi.org/10.1145/2691195.2691240
44Medina, E. (2011). Cybernetic revolutionaries: Technology and politics in Allende’s Chile. Cambridge, MA: MIT Press. https://99percentinvisible.org/episode/project-cybersyn/
45Kitchin, R., Lauriault, T.P., & McArdle, G. (2015). Knowing and governing cities through urban indicators, city benchmarking and real-time dashboards. Regional Studies, Regional Science, 2(1), 6–28. https://doi.org/10.1080/21681376.2014.983149
46http://dashboard.edmonton.ca/
47http://www.urbandashboard.org/
48Peled, A. (2011). When transparency and collaboration collide: The USA Open Data Program. Journal of the American Society for Information Science and Technology, 62(11), 2085–2094. https://doi.org/10.1002/asi.21622
49Van Schalkwyk, F., Chattapadhyay, S., Cañares, M., & Andrason, A. (2015). Open data intermediaries in developing countries. Working Paper. Washington, DC: World Wide Web Foundation. http://hdl.handle.net/10625/56288
50Magalhaes, G., Roseira, C., & Strover, S. (2013). Open government data intermediaries: A terminology framework. In Proceedings of the 7th International Conference on Theory and Practice of Electronic Governance (pp. 330–333). New York, NY: Association for Computing Machinery. http://dx.doi.org/10.1145/2591888.2591947
51Wolff, A., Gooch, D., Cavero, J., Rashid, U., & Kortuem, G. (2019). Removing barriers for citizen participation to urban innovation. In M. de Lange & M. de Waal (Eds.), The hackable city (pp. 153–168). Singapore: Springer Singapore. http://link.springer.com/10.1007/978-981-13-2694-3_8
52Fox, M.S. & Pettit, C.J. (2015). On the completeness of open city data for measuring city indicators. In 2015 IEEE First International Smart Cities Conference (pp. 1–6). 25–28 October 2015. http://dx.doi.org/10.1109/ISC2.2015.7366147
53Sadoway, D. & Shekhar, S. (2014). (Re)Prioritizing citizens in smart cities governance: Examples of smart citizenship from urban India. Journal of Community Informatics, 10(3). http://ci-journal.org/index.php/ciej/article/view/1179
54Rockefeller Foundation. (n.d.). 100 resilient cities: Resources. 100 Resilient Cities. https://www.100resilientcities.org/resources/
55Leichenko, R. (2011). Climate change and urban resilience. Current Opinion in Environmental Sustainability, 3(3), 164–168. https://doi.org/10.1016/J.COSUST.2010.12.014
56Muller, M. (2007). Adapting to climate change: Water management for urban resilience. Environment and Urbanization, 19(1), 99–113. https://doi.org/10.1177/0956247807076726
57ODI. (2016). Principles for strengthening our data infrastructure. Open Data Institute [Article], 31 August. https://theodi.org/article/principles-for-strengthening-our-data-infrastructure/
58Landry, J.-N., Webster, K., Wylie, B., & Robinson, P. (2016). How can we improve urban resilience with open data? Ottawa: Open Data for Development. https://drive.google.com/file/d/0B739vUevKlPgYjJweC1NMElDaVk/view
59Linders, D. (2013). Towards open development: Leveraging open data to improve the planning and coordination of international aid. Government Information Quarterly, 30(4), 426–434. https://doi.org/10.1016/j.giq.2013.04.001
60Hartung, C., Lerer, A., Anokwa, Y., Tseng, C., Brunette, W., & Borriello, G. (2010). Open data kit: Tools to build information services for developing regions. In Proceedings of the 4th ACM/IEEE International Conference on Information and Communication Technologies and Development. London, 13–16 December. New York, NY: Association for Computing Machinery. http://doi.acm.org/10.1145/2369220.2369236
62GFDRR (Global Facility for Disaster Reduction and Recovery). (2014). Open Data for Resilience Initiative: Field guide. Washington, DC: World Bank. https://www.gfdrr.org/sites/gfdrr/files/publication/opendri_fg_web_20140629b_0.pdf
63GFDRR. (2014). Open Data for Resilience Initiative: Planning an open cities mapping project. Washington, DC: World Bank. https://opendri.org/wp-content/uploads/2014/12/Planning-an-Open-Cities-Mapping-Project_0.pdf
64Goodchild, M.F. (2007). Citizens as sensors: The world of volunteered geography. GeoJournal, 69(4), 211–221. https://doi.org/10.1007/s10708-007-9111-y
65Zook, M., Graham, M., Shelton, T., & Gorman, S. (2010). Volunteered geographic information and crowdsourcing disaster relief: A case study of the Haitian earthquake. World Medical & Health Policy, 2(2), 7–33. https://doi.org/10.2202/1948-4682.1069
66Poiani, T.H., dos Santos Rocha, R., Castro Degrossi, L., & d’Albuquerque, J.P. (2016). Potential of collaborative mapping for disaster relief: A case study of OpenStreetMap in the Nepal earthquake 2015. In 2016 49th Hawaii International Conference on System Sciences (HICSS) (pp. 188–197). Koloa, HI, 5–8 January. https://doi.org/10.1109/HICSS.2016.31
67Soden, R. & Palen, L. (2014). From crowdsourced mapping to community mapping: The post-earthquake work of OpenStreetMap Haiti. In C. Rossitto, L. Ciolfi, D. Martin, & B. Conein (Eds.), COOP 2014 – Proceedings of the 11th International Conference on the Design of Cooperative Systems (pp. 311–326). Nice, France, 27–30 May. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-06498-7_19
68Exel, M.V. (2017). OpenStreetMap receives $25,000 grant from American Red Cross. OpenStreetMap Blog, 8 December. https://blog.openstreetmap.org/2017/12/08/openstreetmap-receives-25000-grant-from-american-red-cross/
CONTENTS
Chapter 17. Algorithms and artificial intelligence
Chapter 18. Data infrastructure
The chapters in this section address a number of cross-cutting issues that shape the current state of open data.
In 2009, while giving a TED Talk on “The next web” and invoking an earlier talk by Hans Rosling1 on the potential of data, Sir Tim Berners-Lee invited the audience to chant “Raw data, now”.2 That message was highly influential in early open data work. Coupled with the argument that government-collected data has already been paid for by taxpayers and so should, by rights, be available for them to reuse, the mantra of “raw data, now” led to a focus on getting as much data as possible, as quickly as possible, on to open data portals. Notably, although the open government data movement initially sought to draw a clear line between data about individual people and non-personal government data, focusing only on the latter, Berners-Lee’s speech covered the whole spectrum of data from official government datasets to social network data, research data, and crowdsourced citizen-generated data. His argument was that access to raw data was the first step toward building a web of interlinked data, noting that “there’s not an immediate return on the investment”, but that “it will only really pay off when everybody else has done it”.3 “Raw data, now” was both a political call to remove the gatekeepers restricting the flow of data to skilled users and a strategic move, seeking to pre-emptively challenge the “data hugging” and organisational inertia that might prevent the potential benefits of data sharing from being realised.
In the decade since, the idea of “raw data, now” has faced a number of critical issues, and open data advocacy has had to adapt in response. First, the line between public and private data has turned out not to be so easily drawn. As early as 2010, governments attempted to balance open data and privacy concerns, although, as Chapter 23 (Privacy) explores, it was not until 2015 that privacy principles started to be widely incorporated into the international discourse on open data. As the chapter notes, considerable work has now gone into developing tools and resources to support governments and civil society in addressing privacy risks related to open data efforts. Because new technologies can transform older documents published by governments into searchable data, these risks relate to more than just the publication of new datasets. Although anecdotes of open data-related privacy breaches are relatively few and far between, it has become clear that, with very different legal frameworks, cultural practices, and risk profiles across the world, providing raw data on demand may not always be possible. Instead, a more intentional approach has to be taken, weighing the potential public benefits of opening data against the privacy rights of individuals included in the data or even of third parties who may be affected by the data.
Chapter 23 also introduces the Open Data Institute’s data spectrum, situating open data on a continuum from closed to open. This, alongside a range of other conceptual innovations, such as “data stewardship”4 and “responsible data”,5 may serve to blur (artificially) the neat boundaries of open data, presenting open data programmes with the choice of maintaining a narrow focus on data availability or considering the consequences, positive and negative, of broader data accessibility and use.
This leads to the second critical issue. Data use requires users with the technical and analytical capacity to transform data into information, knowledge, and action. As Chapter 19 (Data literacy) describes, data literacy has moved up the agenda of funders, governments, and civil society networks, but the capacity to make the most of open data remains scarce, and the training and capacity building delivered to date is woefully inadequate. Calls for more investment in this area are well warranted and should be matched by a continued shift in open data measurement work to look not only at the supply of, and the demand for, data but also at data use (Chapter 22: Measurement). However, when it comes to enabling open data use, there is also an interaction between the “rawness” of data and data literacy building. Chapter 19 describes how lower-qu