
Raster Analysis in R

Aside from manipulating matrix and array objects, the primary ways to handle rasters in R are the raster, rgdal and sp libraries. The difficulty in raster analysis is that R holds everything in active memory, making the handling of large rasters problematic. The raster library provides the capacity to hold rasters out of memory, allowing processing of larger data as well as prediction to multiple rasters. The ability to access raster data without reading it into R provides a large advantage in specifying spatial models.

 

In this tutorial I will introduce raster classes, reading raster objects, manipulation, focal functions, vector extraction and writing functions.     

    

First, let’s add the R libraries we will need


    require(sp)
    require(rgdal)
    require(raster)

 

There are four relevant raster object classes: “SpatialPixelsDataFrame” and “SpatialGridDataFrame” in the sp library, “RasterLayer” in the raster library, and “im” in spatstat. The spatstat im class is specific to point process analysis and will be covered in a tutorial specific to that type of analysis. The SpatialPixels and SpatialGrid classes represent array/point and pixel/polygon topologies and are not immediately relevant. Understanding these classes can be important for coercion or for setting up empty rasters, but I am not covering those specifics in this tutorial. There are functions available that act as wrappers for coercion, making it unnecessary to understand the nuances of raster topology classes at this point.

 

There are two primary ways to read rasters: through the rgdal or the raster library. Using rgdal reads the entire raster into memory, whereas raster assesses the size of the raster and will, if memory safe, read it into memory; otherwise it indexes it out of memory. This is quite nice for handling large rasters or raster stacks.

You can use readGDAL/writeGDAL for reading and writing rasters in a large variety of formats. This results in an sp SpatialGridDataFrame object or the specified output. You can also use GDALinfo to retrieve information about a raster without reading it into R.

Alternatively, you can use the raster/writeRaster functions in the raster package to ensure that the object is memory safe. This results in a RasterLayer class object or the specified output.
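A brief sketch of the two reading approaches side by side. The filename "elev.tif" is hypothetical; substitute any raster on disk.

```r
library(rgdal)
library(raster)

GDALinfo("elev.tif")       # inspect the raster without reading it into R
g <- readGDAL("elev.tif")  # rgdal: reads fully into memory (SpatialGridDataFrame)
r <- raster("elev.tif")    # raster: memory-safe RasterLayer, values read on demand
inMemory(r)                # FALSE when a large raster is being held out of memory
```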

Let’s start by introducing the SpatialGridDataFrame class. This is the object type that results from reading data in via rgdal and is structured in much the same way as an sp SpatialPointsDataFrame. As in sp point and polygon objects, the attribute data is held in the @data slot.

For example data we will use the meuse.grid data available in sp.

    data(meuse.grid)

 

Coerce to SpatialPointsDataFrame


    coordinates(meuse.grid) = c("x", "y")

 

Create gridded topology

 

    gridded(meuse.grid) <- TRUE

 

Coerce to SpatialGridDataFrame and check attributes.

 

    x <- as(meuse.grid, "SpatialGridDataFrame")      
        class(x)
        str(x)

 

If you plot the resulting SpatialGridDataFrame object you will see that it looks like systematic point data.

 

    plot(x)

 

This is an object type that needs to be plotted via spplot.


    spplot(x)

 

You can manipulate this object in the same way that we worked with other sp objects. But, because of the specific object topology, care must be taken with subsetting operations. For example, if you attempt to remove NA values you will break the association between the @data slot and the row/column index. Additionally, NA values are inherently part of the data, so you need to use na.omit or the na.rm argument available in many R functions.

    mean(x@data$dist)

    mean(x@data$dist, na.rm=TRUE)

 

If you do not account for NA values you can also get different answers. For instance, if we want to look at the number of distances <= 0.5 and > 0.5

 

    dim(x)
      length( x@data$dist[x@data$dist <= 0.5] )
      length( x@data$dist[x@data$dist > 0.5] )

 

Here you see that, based on the query, we end up with a count of 13121 but there are only 8112 observations in the data. This is because R is including NA values in each query. If we account for the NA's then we get the real count. Remember, R gives you exactly what you ask for!

    length( na.omit( x@data$dist[x@data$dist <= 0.5] ) )
    length( na.omit( x@data$dist[x@data$dist > 0.5] ) )

 

Now we will move on to raster class objects. First, let’s create some data to work with.

Create example raster(s) and sp vector

 

    r <- raster(ncols=150, nrows=150, xmn=0)
        r[] <- runif(ncell(r))
    r2 <- raster(ncols=150, nrows=150, xmn=0)
        r2[] <- runif(ncell(r2)) * 100
    xy <- sampleRandom(r, 10, na.rm=TRUE, sp=TRUE)

 

There are various ways to manipulate rasters in the raster library. Unlike sp raster classes, where you can directly manipulate the @data slot, raster class objects require calling specific functions. There are many raster library functions that can be explored (i.e., reclassify, aggregate, etc…) but I am going to focus on a few key functions that provide considerable flexibility in working with rasters: cellStats, calc, focal and overlay.
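Two of the helper functions mentioned above can be sketched briefly; a hedged example, assuming the r raster created earlier.

```r
library(raster)

# reclassify: map value ranges to new values via a from/to/becomes matrix
rcl <- matrix(c(0,   0.5, 0,    # values in (0, 0.5] become 0
                0.5, 1,   1),   # values in (0.5, 1] become 1
              ncol = 3, byrow = TRUE)
r.class <- reclassify(r, rcl)

# aggregate: coarsen resolution by an integer factor, here 5x5 cell means
r.coarse <- aggregate(r, fact = 5, fun = mean)
```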

The simplest things we often want to know about a raster are its global statistics. These can be retrieved using the cellStats and quantile functions. cellStats provides access to a variety of statistics (sum, min, max, sd, mean, rms, skew) and allows for sampling of large rasters. The quantile function calculates specified percentiles.

Here we calculate some basic statistics on our raster.

 

    cellStats(r, stat="mean")
    cellStats(r, stat="skew")
    quantile(r, probs = c(0.25, 0.50, 0.75))

 

We can combine moments to give us other statistics. Here is an example of the coefficient of variation.

 

    ( cellStats(r, stat="sd") / cellStats(r, stat="mean") ) * 100

 

We can also manipulate individual cell values in a raster using the calc function. Like cellStats, we can pass base functions to calc, or define our own using operators and functions (i.e., +, -, /, sqrt, etc…).

Here we simply transform the cell values by taking the square root.


    ( r.sqrt <- calc(r, fun=sqrt) )

 

We can also create our own function that does the same thing.


    r.sqrt <- calc(r, fun=function(x) { sqrt(x) } )

 

But here is where the true flexibility of the calc function can be seen. We can pass it any function that operates at the cell level so we can, for example, write our own reclassification function.

 

    rc <- function(x) { ifelse(x <= 0.5011546, 0, ifelse(x > 0.5011546, 1, NA)) }

    ( r.rc <- calc(r, fun=rc) )

    plot(r.rc)

 

We have looked at operations on single rasters, but what if you would like to calculate across rasters? This is where the overlay function comes in. It is very much like the calc function but is intended for operating across individual rasters, either as separate objects or in a stack/brick raster class.

Here we can take the mean of two rasters at the cell level

 

    r.mean <- overlay(r, r2, fun=mean)

 

We can do the same thing by passing overlay a function

    r.mean <- overlay(r, r2, fun=function(x,y) { (x+y)/2 } )

We can get quite specific within the function and, since the two rasters are on different scales, log transform the variables before taking the mean.

 

    r.nmean <- overlay(r, r2, fun=function(x,y) { (log(x)+log(y))/2 } )


Common raster analyses available in GIS software are focal (moving window) functions, where a statistic is calculated for a focal cell within a specified window. The focal function in the raster library provides considerable flexibility: it reads the window data in as a vector, allowing you to pass any appropriate base R function or a customized function.

Here we will calculate a simple focal mean using the base "mean" function.

 

    r.mean <- focal(r, w=matrix(1,5,5), fun=mean, na.rm=TRUE)

 

We can also convolve a defined matrix. Here we demonstrate a spatial smoothing approach where we define a Gaussian NxN matrix distribution following: m = g(x,y) = (1/(2*pi*sigma^2)) * exp( -(x^2 + y^2) / (2*sigma^2) )

Define matrix

    m <- matrix(nc=5, nr=5)
      col <- rep(1:5, 5); row <- rep(1:5, each=5)
        x <- col - ceiling(5/2); y <- row - ceiling(5/2)
          m[cbind(row, col)] <- 1/(2*pi*1^2) * exp(-(x^2+y^2)/(2*1^2))
            m <- m / sum(m) 
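As a quick sanity check (pure base R, assuming the same 5x5 window and sigma = 1), the kernel can also be built with outer() and verified to sum to one:

```r
# Equivalent Gaussian kernel built with outer(); the weights must sum to 1
sigma <- 1
d <- -2:2                                   # cell offsets from the window center
g <- outer(d, d, function(x, y) exp(-(x^2 + y^2) / (2 * sigma^2)))
g <- g / sum(g)                             # normalize so the weights sum to 1
sum(g)          # 1
which.max(g)    # 13: the center cell (row 3, col 3) carries the largest weight
```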

 
Convolve Gaussian matrix with raster

 

    r.convolve <- focal(r, w=m)
 
We can also write a function that is passed to the focal function. Here we define a simple function that calculates the percent of values, within a specified window, that are >= a specified value.

 

Define custom function

    pct <- function(x, p=0.5) { sum(x >= p, na.rm=TRUE) / sum(!is.na(x)) }

Pass function to focal function
 
    r.pct <- focal(r.convolve, w=matrix(1,11,11), fun=pct)

Plot results of all above focal functions

 

     dev.new()
       par(mfcol=c(2,2))
         plot(r, main="Raw Raster")
         plot(r.convolve, main="Gaussian Filtered Raster sigma=1, n=5")
         plot(r.mean, main="Focal Mean")
         plot(r.pct, main="Focal percent >= 0.5")

The raster library has a built-in function, extract, to pull raster values based on specified spatial locations, cell indices, sp point objects or sp polygon objects. It is often necessary to extract raster values of covariates that will be used in specifying a model.

 

Returns vector of extracted values

    extract(r, xy)

Because the resulting vector is ordered, we can easily add it to the sp vector object's data.frame

    xy@data <- data.frame(xy@data, rVal=extract(r, xy))
      str(xy@data)

It is also possible to extract the values within a focal window or buffer as a data.frame or list object. Extracting a window returns the raw focal values, so if you only need a window statistic it is actually much more efficient to use the focal function.

Extract buffered values for the first 5 observations
 
    extract(r, xy[1:5,], buffer=1000000)
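Since a buffer extraction with no fun argument returns a list of cell-value vectors, one per point, a per-point statistic can be computed over that list. A sketch, assuming the r and xy objects created above (the buffer size is arbitrary and in the units of the raster's coordinate system):

```r
# Buffer extraction returns a list: one vector of cell values per point
vals <- extract(r, xy, buffer = 10)           # hypothetical buffer size
buf.mean <- sapply(vals, mean, na.rm = TRUE)  # per-point neighborhood mean
```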
 
For constructing models you can read in a set of rasters and define them as a raster stack. When you operate on the stack you are operating on all the rasters associated with the stack. Conveniently, extracting values from a stack results in an ordered data.frame object that can be joined to the point data attributes in the @data slot.

 

Create a raster stack.

    rs <- stack(r,r2)
      names(rs) <- c("r","r2")
        print(rs)
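When the rasters live on disk rather than in memory, the same kind of stack can be built directly from filenames; a sketch with hypothetical files:

```r
library(raster)

# "elev.tif" and "slope.tif" are hypothetical filenames; the layers must
# share the same extent and resolution to be stacked
rs.files <- stack(c("elev.tif", "slope.tif"))
```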

Extract values for all rasters in stack

 

    ( extract.rs <- as.data.frame(extract(rs, xy)) )
      names(extract.rs) <- names(rs)  
        class(extract.rs)
