Mapping data from Indonesia’s disaster information portal

Maps are great for decision-making (ex. where’s the nearest restaurant, how to get from point A to B)… they’re even better when you know how to use them to help analyze data and information (thank you, geography degree). A lot of data visualization automation software exists now that can produce charts, graphs and even maps to help reveal trends and patterns. But when it comes to really understanding and analyzing information, there’s still a lot to be said for including a human touch/perspective in data and information visualization.

One of the projects I’ve been working on is to capture and analyze disaster-induced displacement information for the Internal Displacement Monitoring Centre (IDMC) and its Global Report on Internal Displacement and Global Internal Displacement Database. One of the things IDMC wants to know when a disaster strikes, like a flood, hurricane or earthquake, is how many people are displaced. It’s a simple research question that usually doesn’t lead to a straightforward answer. Challenges can include lack of government monitoring for this kind of information, data collection and standardization issues, accessibility of said data, or even the political nature of publishing and sharing this information.

[Image: Global Report on Internal Displacement 2016]

Fortunately, some governments actually do a great job of collecting, processing, and publishing this kind of information. Indonesia is one of them. The government maintains a regularly updated disaster data portal that tracks where a disaster takes place, when it happens, what kind of hazard triggered the disaster event, and the number of people killed, missing, injured and displaced/evacuated. For one of the most disaster-prone countries in the world, having this kind of information online, updated and easily accessible is an asset for research organizations like IDMC to be able to develop policies and recommendations that can have an impact on saving lives.

While the website has automatic visualization features, it requires a lot of assumptions and understanding on the user’s part to know what to search for, and as an online portal it offers only limited visualization and analysis capabilities. As part of my research, I decided to put my geography background to work to make sense of this data.

The mapping feature from the BNPB disaster data portal – http://dibi.bnpb.go.id/data-bencana

I downloaded the raw data in Excel format, and in most situations a quick bit of Excel manipulation can reveal some trends. However, this spreadsheet included too many data points with differing variables like event date, hazard type, and location. I wanted a better way to make sense of the data, so I decided to plot it using QGIS, a free open-source Geographic Information System (GIS).

Here’s a quick summary of what I did:

  1. The Excel file included raw district-level disaster information going back as far as 1815. I only needed 2016 data, so I filtered the data set and extracted all 2016 records with “Mengungsi” (evacuation) values greater than zero.
  2. In order to plot the data on a map, I needed to add spatial information to the data set. As the Indonesian data was broken down by districts, a quick search led me to district-level boundary data published by the World Food Programme – unfortunately, I couldn’t find district-level spatial data on the government website.
  3. Once I joined the Excel sheet with the district boundaries, I still needed to clean and verify that all districts in the government disaster data set matched the WFP district boundary data set. This is key – otherwise the data can’t be mapped in QGIS.
  4. Since no GPS locations were included to pinpoint exactly where each disaster occurred, I defined a centroid (i.e. a point at the centre of each district boundary). This allowed me to plot each event as a specific point on the map, which helps in analyzing and aggregating information since multiple events can take place in one district. (A rough code sketch of these steps follows below.)
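
I did the join and centroid steps through the QGIS interface, but for anyone curious what the same workflow looks like in code, here is a minimal pandas/geopandas sketch. The file names and column names (everything except “Mengungsi”) are my own assumptions for illustration, not the portal’s or WFP’s actual schema.

```python
# Rough sketch of the workflow above using pandas + geopandas.
# File names and column names (other than "Mengungsi") are assumptions,
# not the actual schema of the BNPB export or the WFP boundary data.
import pandas as pd
import geopandas as gpd

# 1. Load the raw portal export and keep 2016 events with evacuations > 0
events = pd.read_excel("dibi_export.xlsx")                      # hypothetical file name
events["date"] = pd.to_datetime(events["date"], dayfirst=True)  # assumed date column
events_2016 = events[(events["date"].dt.year == 2016) & (events["Mengungsi"] > 0)]

# 2. Load WFP district (admin level 2) boundaries
districts = gpd.read_file("idn_admin2_wfp.shp")                 # hypothetical file name

# 3. Check that district names match before joining; mismatches won't map
missing = set(events_2016["district"]) - set(districts["ADM2_NAME"])
if missing:
    print("Districts needing cleanup:", sorted(missing))

# 4. Join each event to its district polygon, then collapse the polygon to a
#    centroid so every event becomes a single point on the map
joined = events_2016.merge(
    districts[["ADM2_NAME", "geometry"]],
    left_on="district", right_on="ADM2_NAME", how="left",
)
joined = gpd.GeoDataFrame(joined, geometry="geometry", crs=districts.crs)
joined["geometry"] = joined.geometry.centroid

# Save as a layer that can be styled and queried in QGIS
joined.to_file("events_2016_centroids.gpkg", driver="GPKG")
```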

It may not have been pretty, but it did make it easier to interpret the data by hazard type, event date, and geographic location. It also made the data easier to work with when I wanted to conduct further analysis, run queries to address different research questions, and produce maps like the ones below.

[Maps: Evacuations; Events by Date; Events by Hazard Type; Total Events by District]

Data visualization automation software and websites can be useful, but it’s also great to have a skill like old-school mapping and cartography to turn to when I need it… times and projects like these make me realize how useful a geography degree can be.

Data is good. Data is bad. Where’s the middle ground?

It’s amazing to see technology used to track, monitor, and collect information and data from the things we do, like sports, to help improve and enhance what we do. I think it’s called life-hacking, and if you’re thinking of resolutions for the New Year, there’s a whole website about doing just that. Yet as we get more and more comfortable tracking and hacking ourselves (FitBit anyone?), I can’t help but think – are we getting too reliant on the data?

I’ve written a couple of posts about how data has been used to better predict and inform baseball and basketball. Yet a prime example of over-reliance on data, and of not balancing it with common sense, comes from the sports world. This article shows that despite all the hype around big data for the 2014 World Cup, Nate Silver, a popular data geek who made predictions during the 2012 US presidential election, still got it all wrong. Some said that the competition was no place for big data, which can’t understand the intrinsic issues and subtleties that real soccer fans see. Others claimed that Silver ignored some basic data issues.


Data and information that we collect, whether through technology or by ourselves, inherently carry biases – like how someone set up these exit “sortie” signs for a reason. We build the technology and develop algorithms that are supposedly objective, yet in developing them we make inherent compromises and assumptions. The same goes for collecting and compiling data ourselves – whether monitoring our diet, building a contact/email list, or just keeping track of our to-do lists and calendar – we are biased toward certain things (ex. what we think is more important, what we can remember, etc.) when collecting this information (Excel sheets, anyone?). And are we managing our information consistently enough that it can reveal some truth to help our decision-making?


I work for an organization that prides itself on its “information management”, and I spend a lot of time with internal and external clients not only to improve this management (ex. simplify, organize and clean the data), but also to understand how it can be used strategically to communicate their work and key messages (ex. making a good infographic). Within the international development community, OCHA is light-years ahead of the game when it comes to this. They’ve also evolved and branched out to apply information in useful ways for the humanitarian community, like the recently launched INFORM initiative to improve risk analysis and the Humanitarian ID project to make contact management simpler and better during an emergency or crisis. Perhaps following OCHA’s lead, plenty of UN organizations are starting to visualize this information and realize that data is more than just 1’s and 0’s, and not only for “geeks” – it can be used in different ways to communicate and provide “evidence” to improve programming and decision-making. The success of innovative ideas like these will depend on how accessible both the data and the tools are to the people who will use them. It can also be summed up by these two quotes from the Nate Silver article:

Predictions are no better than the quality of data and model that you employ.

Big data and predictive techniques are supposed to inform smart decision making, not automate it.

On the opposite end of the spectrum from the Silver article are these little visual vignettes by the New York Times of what went right for the Dutch, and so wrong for Brazil, during the World Cup. They are both data-driven and informative.

Data simplicity might be the best way to help us understand and improve the way we do things.

Cartographer takes Kobe to school

Admittedly, the title for this post would make for a great story, but unfortunately it’s only fantasy for now. Skill and talent may still have a large part to play in basketball, but Kirk Goldsberry thinks there’s more to it, and that thinking like a cartographer (you know, those people who make maps) might actually help people understand the game better and improve the way the NBA plays and manages it.

Goldsberry’s quest to map every moment of basketball really stood out for me in WIRED magazine last month. This excerpt from Mark McClusky’s new book “Faster, Higher, Stronger” (Xmas present, anyone?) is about maps and basketball – two of my favorite things. I can’t help but wonder where this research was when I was studying geography in university – I would’ve jumped at the chance to work on it.

Goldsberry’s research is different from the data-analytics approach that transformed baseball (i.e. Moneyball). He saw the constant flow of basketball as, at its core, a problem of information flow.

Unlike the static, state-to-state action in baseball, basketball is a constant flow. Players switch from offense to defense, from posting up to double-teaming. If a baseball player is a left fielder, you know the basic area he will patrol on defense. If a basketball player is a forward, he could be anywhere on the court at any time. The game has no states, so statistically you can’t determine the odds of a given outcome.

[Photo: basketball hoop]

So the whole problem with basketball wasn’t so much one of percentages and probabilities, but one of space… more specifically, the spatial distribution of players and where their strengths lie when shooting, playing defense, or driving the lane.

Instead of focusing on the numbers that defined a state in baseball, Goldsberry began to focus on the locations and movement of objects—specifically, the players and the ball. It was a mapping problem… To understand basketball, you also have to understand space. You need a cartographer.

The best thing about this project isn’t so much the geek factor of collecting stats and visualizing them, but what Goldsberry wanted it to do.

“I wanted to find a way to get this data to sing a new song, to tell us things like where Kobe is good and where Kobe is bad… and to communicate to players, and fans, and the media.”

By charting the location and frequency of every shot in the NBA, Kirk Goldsberry can create a map of the strengths and weaknesses of each player’s offensive game, like the ones below.

Midrange shots aren’t very productive for most players—except Nowitzki, who loves the right baseline.
Even the most prolific three-point shooter of all time has relatively weak areas, like from the left wing.
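
The article doesn’t spell out how those shot charts are built, but the basic idea is easy to sketch: bin every shot by its court location, then compute volume and efficiency per bin. Below is a minimal pandas sketch of that idea; the file name, column names and 3-foot bin size are my own assumptions for illustration, not Goldsberry’s actual method.

```python
# Sketch of the idea behind a shot chart: bin shot locations on the court
# and compute attempt volume and field-goal percentage per bin.
# Column names and bin size are assumptions for illustration.
import pandas as pd

shots = pd.read_csv("kobe_shots.csv")   # hypothetical: x, y in feet; made = 0/1

cell = 3  # 3-foot square bins
shots["bin_x"] = (shots["x"] // cell) * cell
shots["bin_y"] = (shots["y"] // cell) * cell

chart = (
    shots.groupby(["bin_x", "bin_y"])["made"]
    .agg(attempts="count", fg_pct="mean")
    .reset_index()
)

# High-volume, low-efficiency bins are "bad" spots; the reverse are "good" ones.
print(chart.sort_values("attempts", ascending=False).head())
```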

If this really is going to change the face of basketball the way Moneyball did for baseball, I’m looking out for a future movie. In the meantime, it would be great to see a head-to-head matchup between Goldsberry and Kobe!

Data isn’t everything – let’s balance it with common sense

I love visualizations, data, and information, and finding creative ways to turn them into something interesting and useful. It’s a great way to take advantage of both the analytical and creative sides of the brain. At the same time, I’m quite aware that even as the world becomes more visual and addicted to stats and numbers, we have to be even more wary of how that information is being used and interpreted. It shouldn’t be about seeing the superficial side of a statistic and using it in the hopes of sensationalizing a topic (it’s tempting for journalists and others to do this), but about being true to what the statistics represent, building a story around them, and respecting how this may influence the audience.

That’s why it’s refreshing to see WIRED, a magazine focused on technology and all the numbers coming from it, publish Felix Salmon’s article “Numbed by Numbers: Why Quants Don’t Know Everything”, which helps put the numbers game in perspective.

Let’s not get bent out of shape over numbers.

According to the Merriam-Webster dictionary, a quant is an expert at analyzing and managing quantitative data, and the word’s first known use was in 1979. In his article, Salmon uses the example of the movie Moneyball, which documented how statistics helped the underfunded Oakland A’s to a division-winning 2002 season. He writes that quants are almost always right, since they use algorithms and set up systems that track every aspect of society in 1’s and 0’s. Yet the more a field is run by a system, the more the system creates incentives for everyone to change their behavior – in the end people start to cheat the system, and the statistics/numbers it generates may no longer hold value or tell the “truth”.

It’s increasingly clear that for smart organizations, living by numbers alone simply won’t work…

There needs to be more of a balance between the numbers that can help make our lives better and good ol’ human insight, decision-making and common sense. Believing in statistics as they stand is one thing, but we also have to use our judgement and experience to bolster our understanding so that this information can improve the society we live in. For example, the National Weather Service employs meteorologists who, understanding the dynamics of weather systems, can improve forecasts by as much as 25% compared with computers alone.

Let’s celebrate the value of disruption by data – but let’s not forget that data isn’t everything.

Read “Numbed by Numbers: Why Quants Don’t Know Everything”

The Economic and Human Impact of Disasters in the last 12 years

I conceptualized and designed this infographic, which shows the economic and human impact of disasters over the last 12 years. The data is sourced from the Brussels-based Centre for Research on the Epidemiology of Disasters (CRED) and their International Disasters Database (EM-DAT). The idea was to take global disaster statistics and visualize key variables that UNISDR wanted to convey to the public – the economic and social impacts of natural hazards over the last 12 years. The infographic also includes key disaster events that correspond to the annual figures. The billion and trillion figures have been used by the media and referenced in numerous publications. To download the infographic, visit UNISDR’s Flickr account.