The R Script associated with this page is available here. Download this file and open it (or copy-paste into a new script) with RStudio if you want to follow along.
Raster data in R can be loaded in a relatively straight forward way using the raster package. There are other ways of loading raster data, including implementations in sp, where raster data can be formatted as ‘SpatialPixelsDataFrame()’.
The raster package is still the most widely used package though, although it is increasingly being replaced by an approach that is based on a tidy concept (separating raw data from metadata and spatial information). More on this later
First, lets load the raster package and some first prepared data. If the loading below does not work, download the tif here or from the repository
library(raster)
# The data to load is from Jung et al. (2020), https://www.nature.com/articles/s41597-020-00599-8
# I have prepared a subset for the Alpes to load in here.
# These contain the fraction (multiplied with 1000) of a grid cell containing forest or arti at 1km.
# We now load the forest layer from this object
ras <- raster('ht_004_clipped.tif',layer = 1)
class(ras)
## [1] "RasterLayer"
## attr(,"package")
## [1] "raster"
# Note that the range of values is loaded by default to the minimum and maximum possible
# Reset like this
ras <- setMinMax(ras)
# -> range of values 1 - 1000
# Plot the raster data
plot(ras, col = topo.colors(100))
Here we loaded in the first band of the data as Raster object, but there are others (more about this below).
Can you load in and plot the second band (artifical landscapes)?
ras_crop <- raster('ht_004_clipped.tif',layer = 2)
plot(ras, col = rev(heat.colors(100)))
There are multiple functions to assess the metadata associated with a raster
# Plot the extent
extent(ras)
## class : Extent
## xmin : 7.377105
## xmax : 16.67129
## ymin : 43.44738
## ymax : 48.96765
# How many rows and columns are there? How many cells
nrow(ras)
## [1] 557
ncol(ras)
## [1] 937
ncell(ras)
## [1] 521909
# We can also define a custom function to work with raster data information
# This one here converts the extent to a WKT bounding box object
bbox2wkt <- function(bbox=NULL){
if(!is.null(bbox)) bbox <- c(bbox[1], bbox[3], bbox[2], bbox[4]) else stop('Provide a an extent object')
stopifnot(length(bbox)==4) #check for 4 digits
paste('POLYGON((',
sprintf('%s %s',bbox[1],bbox[2]), ',', sprintf('%s %s',bbox[3],bbox[2]), ',',
sprintf('%s %s',bbox[3],bbox[4]), ',', sprintf('%s %s',bbox[1],bbox[4]), ',',
sprintf('%s %s',bbox[1],bbox[2]),
'))', sep="")
}
# Print the extent in WKT format
bbox2wkt(extent(ras))
## [1] "POLYGON((7.377105383 43.447384894,16.671291018 43.447384894,16.671291018 48.967647157,7.377105383 48.967647157,7.377105383 43.447384894))"
Now we want to know what the mean fraction of forest across the scene is? To do so you first need to convert the units to fractions (0-1) and then calculate a mean using the function ‘cellStats()’. Check the help to see how to use this function via ?cellStats
# Calculate fractions
ras_frac <- ras / 1000
cellStats(x = ras_frac,stat = 'mean')
## [1] 0.4230384
# On average grid cells with some level of forest area
In essence raster files are just large matrices, thus it is relatively straight forward to convert the raster object to other common r data formats
# One can access all raster data like this
ras[] # Careful, large file!
# or all coordinates
coordinates(ras)
# Convert to matrix
mat <- as.matrix(ras)
# Or to data.frame. Here we set the option xy to TRUE, which saves the coordinates of each cell as well
df <- as.data.frame(ras,xy = TRUE)
Similarly the raster data can be manipulated by accessing the raw data
# For instance, suppose we want to set all cells with values lower than 500 (50%) to NA (No data)
# CAREFUL: This overwrites the original data in R!
ras[ras < 500] <- NA
# Plot
plot(ras)
Finally we going to write the output as new raster
# What is the data type of the raster?
dataType(ras)
## [1] "INT2U"
# The output data type decides how you can save the raster most space efficiently.
# If you have for instance an integer raster, then it is a waste of space saving it with floating point precision.
# Here is a list of all output formats your system supports via the gdal library
knitr::kable(writeFormats() )
name | long_name |
---|---|
raster | R-raster |
SAGA | SAGA GIS |
IDRISI | IDRISI |
IDRISIold | IDRISI (img/doc) |
BIL | Band by Line |
BSQ | Band Sequential |
BIP | Band by Pixel |
ascii | Arc ASCII |
CDF | NetCDF |
ADRG | ARC Digitized Raster Graphics |
BMP | MS Windows Device Independent Bitmap |
BT | VTP .bt (Binary Terrain) 1.3 Format |
BYN | Natural Resources Canada’s Geoid |
CTable2 | CTable2 Datum Grid Shift |
EHdr | ESRI .hdr Labelled |
ELAS | ELAS |
ENVI | ENVI .hdr Labelled |
ERS | ERMapper .ers Labelled |
FITS | Flexible Image Transport System |
GPKG | GeoPackage |
GS7BG | Golden Software 7 Binary Grid (.grd) |
GSBG | Golden Software Binary Grid (.grd) |
GTiff | GeoTIFF |
GTX | NOAA Vertical Datum .GTX |
HDF4Image | HDF4 Dataset |
HFA | Erdas Imagine Images (.img) |
IDA | Image Data and Analysis |
ILWIS | ILWIS Raster Map |
INGR | Intergraph Raster |
ISCE | ISCE raster |
ISIS2 | USGS Astrogeology ISIS cube (Version 2) |
ISIS3 | USGS Astrogeology ISIS cube (Version 3) |
KRO | KOLOR Raw |
LAN | Erdas .LAN/.GIS |
Leveller | Leveller heightfield |
MBTiles | MBTiles |
MRF | Meta Raster Format |
netCDF | Network Common Data Format |
NGW | NextGIS Web |
NITF | National Imagery Transmission Format |
NTv2 | NTv2 Datum Grid Shift |
NWT_GRD | Northwood Numeric Grid Format .grd/.tab |
PAux | PCI .aux Labelled |
PCIDSK | PCIDSK Database File |
PCRaster | PCRaster Raster File |
Geospatial PDF | |
PDS4 | NASA Planetary Data System 4 |
PNM | Portable Pixmap Format (netpbm) |
RMF | Raster Matrix Format |
ROI_PAC | ROI_PAC raster |
RRASTER | R Raster |
RST | Idrisi Raster A.1 |
SAGA | SAGA GIS Binary Grid (.sdat, .sg-grd-z) |
SGI | SGI Image File Format 1.0 |
Terragen | Terragen heightfield |
Can you save the ras object as new file ? The function is called ‘writeRaster’
writeRaster(ras, 'ht_004_clipped.tif',
# This bit is extra. Here we specify that the output should be compressed
options=c("COMPRESS=DEFLATE"))
Above we have overwritten the original data object. You might have also noticed that the provided raster file contains two bands, one forest and one for artifical landscapes Let’s load it back again, but this time as multidimensional raster stack
library(raster)
ras <- stack('ht_004_clipped.tif')
# Notice the difference in class
class(ras)
## [1] "RasterStack"
## attr(,"package")
## [1] "raster"
# When plotted, by default all layers are shown
plot(ras)
However quite often these stacks are provided in formats that require other wrapper packages. I have saved the above file also in NetCDF (Network Common Data Form) format (https://en.wikipedia.org/wiki/NetCDF), which is a self-describing data format that supports for instance metadata.
Generally these data are a convenient format to store spatial-temporal information or other spatial data with more than two axes (for instance depth or height in addition to coordinates). The file can be found here.
library(ncdf4)
# When the ncdf4 package is installed, nc files should be loadable directly with raster
# (might spit out a few warnings though)
ras <- raster('ht_004_clipped.nc')
## Warning in .varName(nc, varname, warn = warn): varname used is: Band1
## If that is not correct, you can set it to one of: Band1, Band2
## Warning in .getCRSfromGridMap4(atts): cannot process these parts of the CRS:
## long_name=CRS definition
## spatial_ref=GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]
## GeoTransform=7.377105383 0.00991908819103522 0 48.967647157 0 -0.009910704242369837
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## datum unknown in CRS definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## datum unknown in CRS definition
# Another way to open these files are the functions from the ncdf4 package
ras <- nc_open('ht_004_clipped.nc')
# This format looks a bit different
# Notice how there are variables, dimensions and global attributes
ras
## File ht_004_clipped.nc (NC_FORMAT_CLASSIC):
##
## 3 variables (excluding dimension variables):
## float Band1[lon,lat]
## long_name: GDAL Band Number 1
## _FillValue: 0
## grid_mapping: crs
## float Band2[lon,lat]
## long_name: GDAL Band Number 2
## _FillValue: 0
## grid_mapping: crs
## char crs[]
## grid_mapping_name: latitude_longitude
## long_name: CRS definition
## longitude_of_prime_meridian: 0
## semi_major_axis: 6378137
## inverse_flattening: 298.257223563
## spatial_ref: GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]
## GeoTransform: 7.377105383 0.00991908819103522 0 48.967647157 0 -0.009910704242369837
##
## 2 dimensions:
## lon Size:937
## standard_name: longitude
## long_name: longitude
## units: degrees_east
## lat Size:557
## standard_name: latitude
## long_name: latitude
## units: degrees_north
##
## 3 global attributes:
## Conventions: CF-1.5
## GDAL: GDAL 3.0.4, released 2020/01/28
## history: Mo Okt 12 14:45:47 2020: GDAL Create( /mnt/hdrive/Talks/20201022_ESMTrainingSession/CDAT_Materials/SpatialDataAnalysis_withR/ht_004_clipped.nc, ... )
What we going to do now is to save an attribute directly to the file, that specifies the type of data contained in the file.
# Description
desc <- 'This file contains data on forested and artifical habitats'
# Reopen the file in write mode
ras <- nc_open('ht_004_clipped.nc',write = TRUE)
# We are writing a global attribute with the description of the file
ncatt_put(nc = ras,varid = 0 ,attname = 'description',attval = desc)
# And close
nc_close(ras)
# Now load again to check that the description is there
ras <- nc_open('ht_004_clipped.nc')
ras
More data manipulation functions with this format can be found in the help files (?ncdf4).
The raster package can be slow, especially if your files are large. Splitting them up in tiles is usually a good idea for processing. There is an upcoming replacement package called terra that aims to implement most raster packages functions in faster, more efficient code.
Another new package of loading and analysing multidimensional data is the stars package. This new package specifically aims at processing multidimensional rasters and works with netCDF files too.
library(stars)
## Loading required package: abind
## Loading required package: sf
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
# Load in the ncdf or tif file
ras <- read_stars('ht_004_clipped.tif')
ras
## stars object with 3 dimensions and 1 attribute
## attribute(s), summary of first 1e+05 cells:
## ht_004_clipped.tif
## Min. : 1.0
## 1st Qu.: 48.0
## Median : 119.0
## Mean : 203.0
## 3rd Qu.: 272.5
## Max. :1000.0
## NA's :24513
## dimension(s):
## from to offset delta refsys point values
## x 1 937 7.37711 0.00991909 WGS 84 FALSE NULL [x]
## y 1 557 48.9676 -0.0099107 WGS 84 FALSE NULL [y]
## band 1 2 NA NA NA NA NULL
plot(ras,
downsample = TRUE # Show only as many colours as fit in the pixel size of the image
)
Another very useful feature of stars is that it allows to read in lazy object. This is particularly useful for very large files. The object can be created and investigated even without loading all data into memory. Only if any calculation requires the data stored within the stars object, it is loaded into memory.
ras <- read_stars('ht_004_clipped.tif',proxy = TRUE)
# There are also ways to load in for instance only a specific area (bbox) or range of the data into R.
If you are familiar with the methods of the tidyverse family of packages: Stars objects interact with a number of functions from these package. For instance, here is how I would select the top 100 cell values (e.g. those from left to right) from the stars object for band 1 (Forest) and plot it
suppressPackageStartupMessages(library(tidyverse))
# Take the raster object and 'pipe' into slice for the y-coordinate (Latitude)
ras %>% slice(y, 1:100) %>%
# Filter to band 1
filter(band == 1) %>%
# Plot the output of the chain
plot()
# Note that nothing is saved as part of this processing chain, although one obviously could save the output in a new object.
# Equally one access the attributes and dimensions of any stars object
# The first arguement to any stars object is the attribute (could be time), the next the dimensions
# For instance if I want to plot the first 20 values (upper left corner), here is how
plot(ras[,1:10,1:20])
See here for a number of functions provided by the stars package. Some of which we will explore later on. (Can you find the function that lets you write a stars file?)
# All functions of the stars package
methods(class = "stars")
## [1] [ [<- $<- adrop
## [5] aggregate aperm as_tibble as.data.frame
## [9] c coerce contour cut
## [13] dim dimnames dimnames<- droplevels
## [17] filter image initialize is.na
## [21] Math merge mutate Ops
## [25] plot predict print pull
## [29] select show slice slotsFromS3
## [33] split st_apply st_area st_as_sf
## [37] st_as_sfc st_as_stars st_bbox st_coordinates
## [41] st_crop st_crs st_crs<- st_dimensions
## [45] st_dimensions<- st_extract st_geometry st_interpolate_aw
## [49] st_intersects st_join st_mosaic st_normalize
## [53] st_redimension st_sample st_transform_proj st_transform
## [57] write_stars
## see '?methods' for accessing help and source code
Final exercise: Remember the outline of Laxenburg park that was saved in the vector exercise? Take the artifical dataset loaded via stars and clip it to the bounding box of the Laxenburg park!
ras <- read_stars('ht_004_clipped.tif',proxy = TRUE)
laxpol <- read_sf('Laxenburg.gpkg')
bb = st_bbox(laxpol)
plot( ras[bb] )
# There are many different ways of doing this. One could equally process this in a chain using 'pipes'
laxpol %>%
st_bbox() %>%
ras[.] %>%
plot()
Another big field of application for stars is the analysis of multi-temporal raster stacks or data-cubes. Custom-written function can then be easily applied to entire stacks of rasters, e.g. stars::st_apply()
. We don’t explore this as part of this tutorial, but have a look at the stars website for some example code.
(source: https://r-spatial.github.io/stars/ )