spbabel: a tidy view of Spatial

Spbabel provides simple tools to flip between Spatial and tidy forms of data. This package aims to assist in the ongoing development of tools for spatial data in R.

NOTE: this is largely deprecated by new code in https://github.com/hypertidy/silicate.

Formal spatial data in R

Spatial data in the sp package have a formal definition (extending class Spatial) that is modelled on shapefiles, and close at least in spirit to the Simple Features definition. See What is Spatial in R? for more details.

Turning these data into long-form tables has been likened to making fish soup, which has a nice nod to the universal translator babelfish.

The spbabel package tries to help by providing a more systematic encoding into the long-form with consistent naming and lossless ways to re-compose the original (or somewhat modified) objects.

There are two main forms.

  1. The two-table form is the “coordinates with identifiers”, and the metadata
  2. The four-table form applies database normalization of the coordinates table into branches and vertices, adding a fourth link table to allow de-duplication of the coordinates.

The two-table version is similar to that implemented in:

  • sp’s as() coercion for SpatialLinesDataFrame to SpatialPointsDataFrame
  • rasters’s geom()
  • ggplot2’s fortify()
  • gris’ normalized tables

The four-table form is in development across a number of projects. It is straightforward to work with but there aren’t any high-level tools in package form yet. (Though rbgm is probably the most advanced, it uses an early form that was absolutely necessary to read everything in those files before the more general approach here).

How do the ‘sptable()’ and ‘sp()’ functions work

The long-form single table of all coordinates, with object, branch, island-status, and order provides a reasonable middle-ground for transferring between different representations of Spatial data. Tables are always based on the “tibble” since it’s a much better data frame.

The sptable function creates the table of coordinates with identifiers for object and branch, which is understood by sptable<- to “fortify” and sp for the reverse.

The long-form table may seem like soup, but it’s not meant to be seen for normal use. It’s very easy to dump this to databases, or to ask spatial databases for this form. There are other more normalized multi-table approaches as well - this is just a powerful lowest common denominator.

We can tidy this up by encoding the geometry data into a geometry-column, into nested data frames, or by normalizing to tables that store only one kind of data, or with recursive data structures such as lists of matrices. Each of these has strengths and weaknesses. Ultimately I want this to evolve into a fully-fledged set of tools for representing spatial/topological data in R, but still by leveraging existing code whereever possible.

sptable: a round-trip-able extension to fortify

The sptable function decomposes a Spatial object to a single table structured as a row for every coordinate in all the sub-geometries, including duplicated coordinates that close polygonal rings, close lines and shared vertices between objects.

The sp function re-composes a Spatial object from a table, it auto-detects the topology by the matching column names:

  • SpatialPolygons: object_, branch_, island_, order_
  • SpatialLines: object_, branch_, order_
  • SpatialPoints: object_
  • SpatialMultiPoints: object_, branch_

The sp function could include overrides to avoid these tests but it’s so easy to modify a table to have the matches for the required topology it hardly seems worth it.

Use the sptable<- replacement method to modify the underlying geometric attributes (here x and y is assumed no matter what coordinate system).

library(maptools)
data(wrld_simpl)
library(spbabel)
(oz <- subset(wrld_simpl, NAME == "Australia"))
## class       : SpatialPolygonsDataFrame 
## features    : 1 
## extent      : 112.9511, 159.1019, -54.74973, -10.05167  (xmin, xmax, ymin, ymax)
## coord. ref. :  +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0 
## variables   : 11
## # A tibble: 1 x 11
##   FIPS  ISO2  ISO3     UN NAME    AREA POP2005 REGION SUBREGION   LON   LAT
##   <fct> <fct> <fct> <int> <fct>  <int>   <dbl>  <int>     <int> <dbl> <dbl>
## 1 AS    AU    AUS      36 Aust… 768230  2.03e7      9        53  136. -25.0

We can also restructure objects, by mutating the value of object to be the same as “branch” we get individual objects from each.

pp <- sptable(wrld_simpl %>% subset(NAME == "Japan" | grepl("Korea", NAME)))
## explode (or "disaggregate"") objects to individual polygons
## here each branch becomes an object, and each object only has one branch
## (ignoring hole-vs-island status for now)
wone <- sp(pp %>% mutate(object_ = branch_), crs = attr(pp, "crs"))
op <- par(mfrow = c(2, 1), mar = rep(0, 4))
plot(sp(pp), col = grey(seq(0.3, 0.7, length = length(unique(pp$object)))))
## Warning: Unknown or uninitialised column: 'object'.
plot(wone, col = sample(rainbow(nrow(wone), alpha = 0.6)), border = NA)

par(op)

The long-form table is also ready for ggplot2:

library(ggplot2)
ggplot(pp) + aes(x = x_, y = y_, fill = factor(object_), group = branch_) + geom_polygon()

Please note that that geom_polygon cannot handle islands with multiple holes, and it only can do one hole by pretending it is a closed path and hides the boundary so you don’t see the sleight of hand. See the package ‘ggpolypath’ for a ‘geom_polypath’ geom.

De-duplication of vertices

For the four-table for de-duplication is performed in “x”/“y” by default, but the technique is general and could be applied to a geometric space of any dimension. Duplicates are identified by coercing numeric values to character (after ‘base::duplicated’) and so is not robust to numeric issues. We haven’t found any problems yet.

An example of working with the four table form.

## Joining, by = "vertex_"
## Joining, by = "branch_"
## class       : SpatialPolygonsDataFrame 
## features    : 19 
## extent      : 95.00803, 209.7808, -54.74973, 14.59805  (xmin, xmax, ymin, ymax)
## coord. ref. : NA 
## variables   : 12
## # A tibble: 19 x 12
##    FIPS  ISO2  ISO3     UN NAME    AREA POP2005 REGION SUBREGION   LON
##    <fct> <fct> <fct> <int> <fct>  <int>   <dbl>  <int>     <int> <dbl>
##  1 AS    AU    AUS      36 Aust… 768230  2.03e7      9        53 136. 
##  2 BP    SB    SLB      90 Solo…   2799  4.72e5      9        54 160. 
##  3 FJ    FJ    FJI     242 Fiji    1827  8.28e5      9        54 178. 
##  4 FM    FM    FSM     583 Micr…     70  1.10e5      9        57 158. 
##  5 KR    KI    KIR     296 Kiri…     73  9.20e4      9        57 175. 
##  6 NC    NC    NCL     540 New …   1828  2.34e5      9        54 165. 
##  7 NF    NF    NFK     574 Norf…      0  0.          9        53 168. 
##  8 CK    CC    CCK     166 Coco…      1  0.          0         0  96.8
##  9 KT    CX    CXR     162 Chri…      0  0.          0         0 106. 
## 10 NH    VU    VUT     548 Vanu…   1219  2.15e5      9        54 167. 
## 11 NR    NR    NRU     520 Nauru      2  1.01e4      9        57 167. 
## 12 NZ    NZ    NZL     554 New …  26799  4.10e6      9        53 172. 
## 13 PP    PG    PNG     598 Papu…  45286  6.07e6      9        54 143. 
## 14 SN    SG    SGP     702 Sing…     67  4.33e6    142        35 104. 
## 15 TV    TV    TUV     798 Tuva…      3  1.04e4      9        61 179. 
## 16 ID    ID    IDN     360 Indo… 181157  2.26e8    142        35 114. 
## 17 TT    TL    TLS     626 Timo…   1487  1.07e6    142        35 126. 
## 18 PS    PW    PLW     585 Palau      0  2.01e4      9        57 135. 
## 19 RM    MH    MHL     584 Mars…      0  5.67e3      9        57 169. 
## # … with 2 more variables: LAT <dbl>, object_1 <chr>
plot(mm_sp, col = grey(seq(0, 1, length = nrow(mm_sp))))

## get a gg plot

ggplot(ctable %>% inner_join(dplyr::select(mm$o, NAME, object_))) + 
  aes(x = x_, y = y_, group = branch_, fill = NAME) + 
  geom_polygon() + ggplot2::coord_equal()
## Joining, by = "object_"