top of page

GeNetIt - An R library for Spatial Graph-theoretic Genetic Gravity Modelling

 

A tutorial for building a spatial graph with covariates, structuring site (node) data and specifying a gravity model. Please note that this package is in the alpha test phase. 

 

First let's set site library and install the GeNetIt package and then add the required libraries to the current R session


    .Library.site <- file.path(chartr("\\", "/", R.home()), "library")
    .libPaths(file.path(chartr("\\", "/", R.home()), "library"))
    install.packages("http://s3.amazonaws.com/rpackages/GeNetIt_0.1-0.zip", type="binary", repos=NULL)

    library(sp)
    library(spdep)
    library(raster)
    library(GeNetIt)

 

There is example data included with the library that can be used in the tutorial. The dps data.frame are the genetic distances, ralu.site is a SpatialPointsDataFrame with at site data and will be used as the node data both to construct the graph and in the gravity model. The rasters data is a SpatialPixelsDataFrame that we will coerce to a raster stack for use as covariates both at the nodes and edges.


    data(dps)
    data(ralu.site)
    data(rasters)

   
Here we coerce the SpatialPixelsDataFrame into raster stack so it can be used in subsequent functions


    ( xvars <- stack(rasters[-6]) )
    ( land.cov <- stack(rasters[6]) )

 

Build at site covariates by extracting raster point values


Build at site covariates by assigning raster pixel values to the points

 
    ralu.site@data <- data.frame(ralu.site@data, extract(xvars[[c(1,2)]], ralu.site))
 

Here we create a kNN graph using the knn.graph function. For computational reasons we will apply a distance constraint (max.dist = 5000). In real analysis one may want to use a distance constraint to represent ecological processes such as speceis dispersal limitaions. If d is not defined the resulting graph will be saturated. We use the row.names = "SiteName" ID's contained in our node data to track a common ID that we can merge results on.

 

    dist.graph <- knn.graph(ralu.site, row.names = ralu.site@data[,"SiteName"], max.dist = 5000)

 

Here we Add "from.to" unique ID's to the graph and the genetic distance data and merge with genetic distance matrix. If the genetic distance data is in a matrix format there is a helper function "dmatrix.df" that will coerce the matrix into a data.frame object suitable for this step.

 

    dist.graph@data$from.to <- paste(dist.graph$i, dist.graph$j, sep=".")
    dps$from.to <- paste(dps$FROM_SITE, dps$TO_SITE, sep=".")

 

    dist.graph <- merge(dist.graph, dps, by = "from.to")
    dist.graph@data <- dist.graph@data[,-c(7,8)]

 

Since we are working with subset data we need some house keeping to remove NA values in the graph

 

    na.index <- unique(as.data.frame(which(is.na(dist.graph@data), arr.ind = TRUE))[,1])
    dist.graph <- dist.graph[-na.index, ]


Here we display the a summary of the resulting data and plot the graph and nodes with an underlying raster.


    str(dist.graph@data)  
    plot(xvars[[2]])
      plot(dist.graph, add=T)
      points(ralu.site, pch=20, col="red")

 

Build between site covariates by extracting raster values and calculating statistics

 

We want an additional statistic "skewness" avaliable for statistical summaries on the graph so we define a skewness function.
 

    skew <- function(x, na.rm = TRUE) {  
              if (na.rm) x <- x[!is.na(x)]
              sum( (x - mean(x)) ^ 3) / ( length(x) * sd(x) ^ 3 )  
      }

 

The "graph.statistics" function allows us to calculate statistical summaries on rasters for each edge in our graph. The function uses a sample for each line. If you defind the "d" argument as the same resolution of the raster(s) with the default sample type = "regular" the function will sample every pixel along the edge. It is also possible to subsample and speficy different sampling schemes such as random. The result of the function are a data.frame that can be joined back to the graph. If sp=TRUE a SpatialPointsDataFrame object is also returned that contains the point samples for each edge along with the assiciated raster values.

      
    stats <- graph.statistics(dist.graph, r = xvars, d=30, stats = c("min", "mean", "max", "var", "skew"), sp = FALSE)
      dist.graph@data <- data.frame(dist.graph@data, stats)


If we have categorical raster data (eg., USGS NLCD landcover) a custom function needs to be written and the function run only a single nominal raster at a time. Her we write a function forcalculating the proportion of wet landcover types and then pass the function and landcover raster to graph.statistics.


    wet.pct <- function(x) {
      x <- ifelse( x == 11 |  x == 12 | x == 90 | x == 95, 1, 0)
      prop.table(table(x))[2]
    }

  
    lc.stats <- graph.statistics(dist.graph, r = land.cov, d=30, stats = "wet.pct")
      lc.stats[is.na(lc.stats)] <- 0
      dist.graph@data <- data.frame(dist.graph@data, lc.stats)
    
    str(dist.graph@data)

 

Build origin (from) and destination (to) node data for gravity model and merge with graph edges


We need to build from/to site (node) level data structure so we use the helper function "build.node.data" that structures the data so there is from and to node characteristics. Unless you are planning on defining a different set of "to" parameters (which is not a good idea unless you really know what you are doing) the site parameter set is only needed for the from.parms argument and will be duplicated to the to.params.


    site.parms = c("AREA_m2", "PERI_m", "Depth_m", "TDS", "cti", "err27")
    site <- build.node.data(ralu.site@data, group.ids = c("SiteName"), from.parms = site.parms )


Nowe we can merge the site (nodes) data with the graph (edges).

                       
    cdata <- merge(dist.graph, site, by.x="from_ID", by.y="SiteName")
      cdata$Dps <- 1 - cdata$Dps
      cdata <- cdata@data

 

Specify and fit gravity model

 

Specify parameters
 

    x = c("length", "mean.cti", "mean.err27", "mean.ffp", "mean.gsp", "f.AREA_m2", "f.Depth_m", "f.err27")

Gravity model
 

    ( gm <- gravity(y = "Dps", x = x, d = "length", group = "from_ID", data = cdata) )

Plot diagnostics for gravity model


    par(mfrow=c(2,3))
      for (i in 1:6) { plot(gm, type=i) }

 

bottom of page