Reading vector data in R

"Reading vector data in R" will start with introducing reading/writing vector data. In future installments of this blog I plan on covering: advanced vector and raster manipulation, rarity and diversity statistics, imputation analysis for identifying ecological offsets, species distribution modeling, global and local autocorrelation, assessment of autocorrelation effects on regression, spatial regression methods, Random Forests, graph theory, time series analysis, cluster analysis and any other topic that comes to mind. I welcome requests for topics.

Now let’s get started with an introduction to reading and writing spatial vector objects in R. The workhorse functions for reading data into R are: “read.table()”, “scan()” and “read.csv()” with database functionality in the “RODBC” package. These functions, mostly, result in a “data.frame” S3 object. R has a long legacy of spatial statistics and as such each package developed their own spatial classes and associated functions for reading data. Fortunately, spatial classes are converging on a single standard as defined by “sp” S4 objects. These are a special type of R object that are managed by the “sp” package and have multiple components, “slots”, making up a given spatial object.

First, let’s start with X,Y coordinates. If we do not have data in a GIS format but we have coordinates then we can use functions in “sp” to coerce an object into a spatial object. In the code examples I will prefix annotation with # which is the comment character for R.

Add required package(s)

    require(sp)
    require(maptools)
    require(rgdal)

Here we create some dummy coordinates so we can create a spatial object.

x <- round(runif(10), 2)
y <- round(runif(10), 2)

The SpatialPoints function allows you to coerce coordinates into a sp object. In this case the resulting object would not have attributes associated with it.

    xy.sp <- SpatialPoints(cbind(x,y))
      class(xy.sp)
        plot(xy.sp)

If we wanted to associate a data.frame with our SpatialPoints object we could coerce it using the SpatialPointsDataFrame function.

Create an example data.frame

df <- data.frame(group=rep(c("A","B"),5), x=round(5 + rnorm(10), 2), y=20:29)

Relate data.frame attributes to SpatialPoints object.

xy.spdf <- SpatialPointsDataFrame(xy.sp, df)

Unless we are going to conduct simulations the above method has limited application but is, rather, intended to demonstrate the creation of spatial objects and sp coercion. A more real world example would be having your data in a flat file (ASCII) such as a csv. You could use any of the standard functions to read your data into R. For example purposes I am going to use the “meuse” dataset that is distributed with the sp package.

Add meuse dataframe object from the sp library. When we look at the dataframe object we see that it is not an sp object but there are x and y columns.

data(meuse)

class(meuse)

str(meuse)

We can coerce this data.frame into an sp SpatialPointsDataFrame object directly using a native coercion method with the coordinates function. Note that the class of the object changes to a SpatialPointsDataFrame. The data.frame is always held in the slot “data” and can be accessed using the slot operator “@”

coordinates(meuse) <- ~x+y
class(meuse)

str(meuse@data)

My preference for reading vectors that are in a GIS format is the “readOGR” function available in the “rgdal” package. This package is a binding to the C++ Geospatial Data Abstraction Library (GDAL) which provides access to a large number of vector and raster formats. Reading data using rgdal always results in an S4 sp class object. For this example I will use the cities shapefile that is distributed with rgdal. Note; where it is possible to read an ESRI filegeodatabase unfortunately you must compile gdal from scratch with ESRI’s filegeodatabase API.

The important required arguments for readOGR are “dsn” and “layer”. The function can recognize data format based on its file extension however, occasionally specifying the driver is required and is good practice. The “dsn” argument is merely the path that holds the data and “layer” refers to the name of the dataset.

cities <- readOGR(dsn=system.file("vectors", package = "rgdal")[1], layer="cities")
class(cities)

It is also possible to retrieve information on the data without reading it into R.

ogrInfo(dsn=system.file("vectors", package = "rgdal")[1], layer="cities")

Writing data from R to an external file can be accomplished using writeOGR. The “obj” argument references an sp object, “dsn” is the path to write data to “layer” is the name of the output and it is not necessary to specify file extension and “driver” specifies the output format.

writeOGR(cities, dsn="C:/GIS", “cities”, driver="ESRI Shapefile")

The last method that I will cover for reading vector data is using the “maptools” package. Unlike rgdal, maptools does not call an external library for reading data but rather uses functions written in native R. These functions are specific to shapefiles and different functions must be called based on vector type (i.e., line, point, polygon). One disadvantage to reading/writing data using maptools is that you must specifically define the projection using the “proj4string” argument. Spatial read functions in maptools always result in an sp class object. One major advantage that I have found in this package is that there is capacity in the functions for repairing topology errors and consistency checking that can make gdal fail. This can be accomplished by setting the “repair” argument to TRUE. This is of particular use when you wish to the write an object after manipulating it in R. We will use the sids shapefile that is distributed with maptools.

The base functions for reading data are: “readShapePoint”, “readShapePoly” and “readShapeLine”. The “fn” argument is the name of the shapefile and can include the path, “IDvar” refers to a column in the attribute table that you would like to use instead of FID as the unique ID field and “proj4string” is how you define the projection if needed.

sids <- readShapePoly(fn=system.file("shapes/sids.shp", package="maptools")[1],
IDvar="FIPSNO", proj4string=CRS("+proj=longlat +ellps=clrk66"))

To write data use the corollary “write” for the above mentioned read functions. The x argument is the sp object to be written.

writePolyShape(x=sids, fn=sids_new)

To demonstrate how you can use read/write functions here is a loop that iteratively subsets the sids shapefile into multiple shapefiles, based on the "NAME" attribute, and writes each to a new shapefile. This is the same as ArcMap’s “Split” in analysis tools. Don’t worry I will cover how the querying works in this function in "Manipulating spatial vector data in R".

Create a vector of unique values to split the data on

( y <- unique(sids@data$NAME) )

Loop to split feature and write new shapefile(s) for first 10 counties

    for (i in y[1:10] ) {
        temp = sids[sids$NAME == i, ]
        writeOGR(temp, dsn=getwd(), i, driver="ESRI Shapefile", overwrite_layer=TRUE)
    }