There are regular calls for more global water data. And there are also many, many global water data sets out there, so many that we’re practically swimming in water data. What’s the disconnect?
My work focuses on global trends in water availability, water use, and water productivity, which means I spend a lot of time using diverse global data sets and also wishing there were more. Recently, I analyzed the outputs of a global water resources model, WaterGAP3, to assess the frequency with which people consume most of the annually renewable water in a watershed. Interestingly, it turned out we could break down the findings into distinct categories—in a small fraction of watersheds, people use up most of the renewable water that’s available all year (~2% of watersheds); there are many more watersheds where we’re using up most of the water during the dry season (9%) or during dry years (21%); in the other 67% of the world’s watersheds, we’re using up very little of the water.
Doing this work, I’ve identified four big, interconnected issues to explain the simultaneous abundance and scarcity of global water data. First, the category of water data potentially encompasses a tremendous number of things. Second, global data coverage is uneven. Third, lots of water questions require very detailed data to address them. And fourth, big data are unwieldy and frequently hard to interpret—in other words, big data are big.
Before I start, though, I want to clarify what I mean by global water data. In this context, I’m talking about two things. One is measured biophysical data, including everything from water quality measurements to rainfall measurements to the locations of pipes under a city. The second is modeled data, which can include everything from model outputs that fill in missing measurements to calculations of global water availability based on geographic information and driven by climate inputs. Neither one is wholly reliable—I’ve spent enough time working in the field to know that rainfall collectors get backed up with leaf litter and data loggers stop working, and I’ve also worked with enough modeled data to know that equations and assumptions that provide reasonable answers in some places spit out nonsense in others. That doesn’t mean we shouldn’t use or trust both types of data. It does mean that we should be critical of what data tell us and how much confidence we should have about an answer.
On to the disconnect between not enough data and too much data. The first reason I’ve identified for this mismatch is that the topics and measurements that fall into the category of water data are nearly endless. On the biophysical side, there are climate data, such as the amount, frequency, and intensity of precipitation now, in the past, and into the future. There are hydrologic data, such as river flow or aquifer characteristics, and data about water quality. Somewhere between the biophysical and social realms are data related to infrastructure: everything from the location of water withdrawals to the direction of inter-basin transfers to the location, age, and materials of pipes under a city. There are also many, many social data related to water: what kind of water governance system exists, who has water rights, where those water rights are located. Especially for social data, we frequently don’t have global coverage of variables of interest. When we do have biophysical or social data in hand, they may not be the right information to address our question. And almost any water question could probably be answered better if it were informed by each of the categories above. So it’s no wonder we hear a constant clamoring for more data. For logistical reasons, however, it’s hard to imagine we’ll ever collect every piece of information that might someday inform the answer to a water question, so as a water community we need to prioritize which data to collect, particularly at the global scale. We can start to do this by assessing which questions are the most important to answer at a global scale and what data are crucial to answering them.
A second reason abundant data can seem sparse is that the coverage of global data is uneven, in both space and time. I focus primarily on biophysical data from here onwards, both because it’s what I know best and because so much more of it is rapidly becoming available. Usually with global data, we’re aiming to get information that’s distributed evenly all over the world: for example, annual rainfall totals for every country, county, or grid square formed by latitude and longitude lines. We generally get these global data by stitching together local data. Sometimes the data are the same everywhere, such as images from satellites. However, in many cases the source data sets are at different resolutions, because in some places we take a lot of measurements, in others not so many. Even for something as seemingly straightforward as rainfall data, we still use a model to fill in the gaps between individual measuring gauges. If there are a lot of gauges, the model is more likely to provide an accurate number for the spaces in between, but even when we have a lot of measurements there is still quite a bit of uncertainty in global data. In places where few local data are available, global data are particularly important because they allow us to make inferences about a local water situation, and yet these are exactly the places for which we’ve had to do the most modeling and interpolation.
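To make the idea of filling gaps between gauges concrete, here is a minimal sketch of inverse-distance weighting, one simple interpolation method. It is an illustration only: the gauge coordinates and rainfall totals are made up, the function name `idw_estimate` is hypothetical, and global rainfall products generally rely on more elaborate methods than this.

```python
import math

def idw_estimate(gauges, x, y, power=2.0):
    """Estimate rainfall at (x, y) from (gx, gy, rainfall) gauge tuples.

    Nearer gauges get more weight: weight = 1 / distance**power.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for gx, gy, rain in gauges:
        dist = math.hypot(x - gx, y - gy)
        if dist == 0.0:
            return rain  # the point sits exactly on a gauge
        weight = 1.0 / dist ** power
        weighted_sum += weight * rain
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical gauges: (x_km, y_km, annual_rainfall_mm)
gauges = [(0.0, 0.0, 800.0), (10.0, 0.0, 1200.0), (0.0, 10.0, 1000.0)]

# Estimate halfway between the first two gauges.
midpoint = idw_estimate(gauges, 5.0, 0.0)
print(round(midpoint, 1))
```

The estimate always falls between the lowest and highest gauge readings, which is one reason sparse gauge networks can miss local extremes entirely.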
A third reason water data can appear sparse or incomplete is that the answers to many questions can only be illuminated with information that is very detailed in time and space. Lead in tap water is a great example. Global data about infrastructure age could conceivably tell us where lead might be a problem by highlighting which cities are likely to be affected, based on the materials likely used to construct a system in a given place at a given time. Global data might even suggest what fraction of the population is likely affected. Knowing exactly which people are affected requires much more detailed data about in-home water quality or the materials in pipes going to specific homes. From a global perspective, high-resolution data frequently means a grid square, often one of the more than 9 million 5 arc-minute grid cells that make up the globe; from the perspective of a farm or a house or person on foot, the roughly 55 square km inside an average grid cell (more near the equator, less toward the poles) is pretty big. The data point for a grid square represents an average or majority or some other simplification of what’s inside. There are plenty of questions that are best answered on a global scale, especially inquiries that compare distant places, identify hotspots, or evaluate global and regional trends, but not every question needs a global answer. One critical insight global data can provide is guidance on where more detailed local data would likely provide answers to pressing questions.
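The size of a grid cell can be checked with a little spherical geometry. The sketch below assumes a perfectly spherical Earth with a mean radius of 6,371 km, which is a simplification; the function name `cell_area_km2` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius
CELL_ARCMIN = 5           # grid resolution discussed in the text

def cell_area_km2(lat_south_deg, cell_arcmin=CELL_ARCMIN):
    """Area of one grid cell whose southern edge sits at lat_south_deg."""
    step = math.radians(cell_arcmin / 60.0)
    lat_s = math.radians(lat_south_deg)
    # Lat/lon rectangle on a sphere: R^2 * dlon * (sin(lat_n) - sin(lat_s))
    return EARTH_RADIUS_KM ** 2 * step * (math.sin(lat_s + step) - math.sin(lat_s))

# Number of cells covering the globe, and the mean area per cell.
n_cells = (360 * 60 // CELL_ARCMIN) * (180 * 60 // CELL_ARCMIN)
mean_area = 4 * math.pi * EARTH_RADIUS_KM ** 2 / n_cells

area_equator = cell_area_km2(0.0)
area_60n = cell_area_km2(60.0)
print(n_cells, round(area_equator), round(area_60n), round(mean_area))
```

Under these assumptions a cell covers about 86 square km at the equator, about 43 square km at 60 degrees latitude, and about 55 square km on average, so a single value per cell can summarize very different amounts of land depending on where it sits.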
A fourth reason abundant data may not be recognized and used is that even with computing power increasing, big data take a lot to store and analyze. Most days I work with global water and crop data at five arc-minute resolution (there are 60 arc-minutes in 1 degree, so each cell spans 1/12th of a degree on a side, about 9 km by 9 km at the equator). When I spread those data out into a grid, it measures 2,160 rows by 4,320 columns, more than 9 million cells and a lot more than a spreadsheet program can comfortably handle. Looking for patterns in a global data set and making sense of what we see requires analytic tools. For example, satellites now provide high-resolution images of the earth that are updated regularly. This provides a whole new way to track changes visible in the landscape, like identifying newly built dams. To do that, you could hire someone to look at every single grid square individually, every year, or you could figure out how to identify dams using markers a computer can sense and write a sophisticated computer program to search for them. Big water data can provide amazing insights, but we have to advance our analysis methods and articulate questions in ways that our analytic tools can respond to.
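A quick back-of-the-envelope calculation shows how fast a global grid grows. The grid dimensions follow from the five arc-minute resolution; the storage format (32-bit floats) and the 30 years of monthly layers are hypothetical assumptions chosen only for illustration.

```python
CELL_ARCMIN = 5                    # five arc-minute resolution
n_rows = 180 * 60 // CELL_ARCMIN   # latitude bands: 2160
n_cols = 360 * 60 // CELL_ARCMIN   # longitude bands: 4320
n_cells = n_rows * n_cols          # 9,331,200 cells per layer

BYTES_PER_VALUE = 4   # assumption: each value stored as a 32-bit float
LAYERS = 12 * 30      # assumption: monthly layers over a hypothetical 30 years

layer_mb = n_cells * BYTES_PER_VALUE / 1e6   # size of one global layer, in MB
stack_gb = layer_mb * LAYERS / 1e3           # size of the full stack, in GB

print(n_rows, n_cols, n_cells)
print(round(layer_mb, 1), "MB per layer;", round(stack_gb, 1), "GB for the stack")
```

A single variable is manageable at roughly 37 MB per layer, but a monthly time series of that one variable already exceeds 13 GB uncompressed, before adding any other variables or scenarios.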
There are a lot of global water data, but they can’t answer all our water questions. We have more biophysical data than social data, and most of the existing data are some combination of measurements and model outputs. There are some exciting questions we can ask of global data, and some interesting answers. But we’ll never be able to answer local questions at the global scale, nor would we even know the appropriate local questions. Instead, we have the exciting challenge ahead of identifying the global water questions of importance, figuring out how to analyze the data we have, identifying what additional global data would really improve our analysis, and then going out and collecting, storing, and sharing it.
Sevruk, Ondrás et al. (2009) explain how the World Meteorological Organization needs to and does normalize measured rainfall data from governments all over the world to make them comparable.
Döll, Douville et al. (2015) provide an overview of the challenges in global water modeling, and Sood and Smakhtin (2015) review the major global hydrologic models.
Brauman, K. A., B. D. Richter, S. Postel, M. Malsy and M. Flörke (2016). “Water depletion: An improved metric for incorporating seasonal and dry-year water scarcity into water risk assessments.” Elementa: Science of the Anthropocene 4(1): 000083.
Döll, P., H. Douville, A. Güntner, H. Müller Schmied and Y. Wada (2015). “Modelling Freshwater Resources at the Global Scale: Challenges and Prospects.” Surveys in Geophysics: 1-27.
Sevruk, B., M. Ondrás and B. Chvíla (2009). “The WMO precipitation measurement intercomparisons.” Atmospheric Research 92(3): 376-380.
Sood, A. and V. Smakhtin (2015). “Global hydrological models: a review.” Hydrological Sciences Journal 60(4): 549-565.
Brauman, Kate A. 2016. “Global Water Data: We’ll Show You the World, Sort Of.” Open Rivers: Rethinking The Mississippi, no. 2. http://editions.lib.umn.edu/openrivers/article/global-water-data-well-show-you-the-world-sort-of/.