Unrasterize

Motivation

Raster data formats have become increasingly popular to represent global population density (see, for example, CIESIN’s Gridded Population of the World). But the sheer number of pixels can make working with raster data difficult for certain use cases, especially in live applications where calculations are performed on the fly.

Enter unrasterize, a lightweight package to extract representative population points from raster data. The library returns points in vector format (i.e., GeoJSON), selecting certain pixels as “representative” and aggregating nearby population values to preserve the total population.

For a concrete example, consider Encompass, a tool used to measure spatial access to critical social services. Measuring the driving time between every raster pixel and every service provider is computationally infeasible. Unrasterizing the data yields a manageable number of points and allows the massive pairwise distance computation to terminate before the sun becomes a red giant.

Note that some loss of fidelity is inevitable. In fact, that’s the point.

Sample Output

https://farm5.staticflickr.com/4708/39370187915_693f694b79_z_d.jpg

Contents

Usage

Algorithm walkthrough

Let’s walk through the algorithm behind the default Unrasterizer class. The basic use pattern is as follows:

u = Unrasterizer(mask_width=4, threshold=0.5)
representative_points = u.select_representative_pixels(raster_data)

mask_width indicates minimum number of non-selected pixels between adjacent selected pixels.

threshold indicates the minimum value required for a pixel to be selected. Measured in the same units as value (often population per pixel).

First, all pixels with value below the given threshold will be ignored. The remaining pixels are sorted according to their values and considered for selection one by one.

After each point is selected, all points within a circle of radius mask_width in the \(L^\infty\) norm (i.e., a square in the standard Euclidean norm) of the newly chosen point are masked and removed from further consideration. This process continues until every point is either selected or part of the mask.

The Unrasterizer class performs this selection process on the entire raster file, while the WindowedUnrasterizer applies the same process across each individual raster block and combines the results into a single array of selected pixels.

Demonstration

https://github.com/tetraptych/unrasterize/blob/master/docs/img/unrasterizer.gif?raw=true

Here, the initial red triangle in the west represents pixels that fall below the population threshold. This could represent an uninhabited geographical feature, like a mountainside or part of a lake.

The initial purple pixel indicates the pixel with the highest value. It is chosen first.

All points within a square of radius mask_width of the chosen pixel are added to the mask (moving from turquoise to orange to red). These points will never be selected going forward.

Next, the remaining (turquoise) pixel with the highest value is selected and the pixels around it are masked. This process repeats ad infinitum.

Examples

For an example of unrasterize in action, see this Jupyter notebook.

Unrasterize API

The following classes exist to convert raster data to GeoJSON.

For large raster files, the WindowedUnrasterizer is the most memory efficient, with the caveat that it may select some points from adjacent windows that are very close together.

Classes

unrasterize.BaseUnrasterizer
unrasterize.Unrasterizer
unrasterize.WindowedUnrasterizer

Contributing

View unrasterize on GitHub.

Indices and tables