Lab 4 - Accessing spatial datasets | Applied Spatial Data Science for Public Health

Introduction

In the last lab, we did cluster analysis to reveal noteworthy spatial patterns in the outcome data (health related events/measures) and ways to assess spatial autocorrelation. In lab 2 (Spatial variation in risk), we actually kind of already used spatial autocorrelation to interpolate the data and produce smooth maps of our outcomes.

Starting next week and till the end of the class, we will talk about spatial regression modeling which combines regression modeling (first trend) and spatial autocorrelation modeling (second trend).

Today, in this transition session, we will cover some of the spatial data you can use in the regression modeling step.

Accessing spatial data

Many spatial data have been made directly available within R thanks to packages but for some data, you might need to look online.

From R packages

# Load packages
library(raster)
library(leaflet)
library(geodata)
library(terra)

You have already used spatial data from the “raster” package via the getData function. Let’s see what’s in there!

# Read vignette for details and references
#OLD ?raster::getData()
#NEW 

?geodata

So you can download administrative boundaries (option name = ‘GADM’), elevation (name = ‘alt’) and even climate variables (name = ‘worldclim’).

## Ethiopia
# Administrative boundaries (level 1)
#OLD ETH_Adm_1 <- raster::getData(name = "GADM",
 #                            country = "ETH",
 #                            level = 1)

#NEW 
ETH_Adm_1 <- as(gadm(country = "ETH", level=1, path=tempdir()), "Spatial")
class(ETH_Adm_1)

## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"

leaflet() %>% # Plot
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(data = ETH_Adm_1,
              popup = ETH_Adm_1$NAME_1,
              label = ETH_Adm_1$NAME_1)

# Elevation (meters)
#OLD ETH_elev <- raster::getData(name = "alt",
 #                           country = "ETH")
 
#NEW 
ETH_elev <- as(elevation_30s("ETH", path=tempdir()), "Raster") 
ETH_elev

## class      : RasterLayer 
## dimensions : 1428, 1836, 2621808  (nrow, ncol, ncell)
## resolution : 0.008333333, 0.008333333  (x, y)
## extent     : 32.8, 48.1, 3.2, 15.1  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : ETH_elv_msk.tif 
## names      : ETH_elv_msk 
## values     : -202, 4459  (min, max)

raster_colorPal_elev <- colorNumeric(palette = topo.colors(64),
                                     domain = values(ETH_elev),
                                     na.color = NA) # Define palette
leaflet() %>% # Plot
  addProviderTiles("CartoDB.Positron") %>%
  addRasterImage(x = ETH_elev,
                 color = raster_colorPal_elev) #%>%

  # addLegend(pal = raster_colorPal_elev, 
  #           values = values(ETH_elev),
  #           title = "Elevation<br>(meters)") 
  # I can't figure out why this ^ isn't working

Make sure you get what you want

Need to correctly specify ‘country’ option:

#OLD raster::getData('ISO3')
cc <- country_codes()
cc

Varying ‘level’ option gives you different administrative boundaries and disabling ‘mask’ option reveals elevation data comes from a larger surface that goes beyond the country’s boundaries:

## Ethiopia and South Sudan
# Administrative boundaries (level 0)
#OLD SSD_Adm_0 <- raster::getData(name = "GADM",
 #                            country = "SSD",
 #                            level = 0)

#NEW
SSD_Adm_0 <- as(gadm(country = "SSD", level=0, path=tempdir()), "Spatial")
class(SSD_Adm_0)

## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"

# Elevation (meters)
#OLD ETH_elev_unmasked <- raster::getData(name = "alt",
 #                                    country = "ETH",
  #                                   mask = FALSE)
#NEW
ETH_elev_unmasked <- as(elevation_30s("ETH", path=tempdir(), mask=FALSE), "Raster")
ETH_elev_unmasked

## class      : RasterLayer 
## dimensions : 1428, 1836, 2621808  (nrow, ncol, ncell)
## resolution : 0.008333333, 0.008333333  (x, y)
## extent     : 32.8, 48.1, 3.2, 15.1  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : ETH_elv.tif 
## names      : ETH_elv 
## values     : -202, 4459  (min, max)

raster_colorPal_elev_unmasked <- colorNumeric(palette = topo.colors(64),
                                              domain = values(ETH_elev_unmasked),
                                              na.color = NA) # Define palette
leaflet() %>% # Plot
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(data = SSD_Adm_0,
              popup = SSD_Adm_0$NAME_0,
              label = SSD_Adm_0$NAME_0,
              fillOpacity = 0,
              color = "red",
              weight = 3) %>%
  addRasterImage(x = ETH_elev_unmasked,
                 color = raster_colorPal_elev_unmasked) %>%
  addLegend(title = "Elevation<br>(meters)",
            values = values(ETH_elev_unmasked),
            pal = raster_colorPal_elev_unmasked) 

As you vary the ‘name’ option to access environmental variable, pay particular attention to the geolocalisation options. For instance, at the 2.5 resolution (minutes of a degree), there is no need for location specification but at the 0.5 resolution, need to provide ‘lon’ and ‘lat’ arguments…

# Precipitation from worldclim at the 2.5, 5 or 10 resolution (minutes of a degree)... no need for location specification?!?!
#OLD ETH_prec_10 <- raster::getData(name = "worldclim",
 #                           var = "prec",
  #                          res = 10)

#NEW 
ETH_prec_10 <- as(worldclim_global(var="prec", res=10, path=tempdir(), version="2.1"), "Raster")
ETH_prec_10

## class      : RasterStack 
## dimensions : 1080, 2160, 2332800, 12  (nrow, ncol, ncell, nlayers)
## resolution : 0.1666667, 0.1666667  (x, y)
## extent     : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## names      : wc2.1_10m_prec_01, wc2.1_10m_prec_02, wc2.1_10m_prec_03, wc2.1_10m_prec_04, wc2.1_10m_prec_05, wc2.1_10m_prec_06, wc2.1_10m_prec_07, wc2.1_10m_prec_08, wc2.1_10m_prec_09, wc2.1_10m_prec_10, wc2.1_10m_prec_11, wc2.1_10m_prec_12 
## min values :                 0,                 0,                 0,                 0,                 0,                 0,                 0,                 0,                 0,                 0,                 0,                 0 
## max values :               908,               793,               720,              1004,              2068,              2210,              2381,              1674,              1955,              2328,               718,               806

# Precipitation from worldclim at the 0.5 resolution (minutes of a degree)... need to specifiy 'lon' and 'lat' of the tile looked for
#OLD ETH_prec_0.5 <- raster::getData(name = "worldclim",
                     #       var = "prec",
                     #       res = 0.5,
                     #       lon = 40,
                      #      lat = 10)
#NEW

ETH_prec_0.5 <- as(worldclim_tile(var = "prec", res = 0.5, lon = 40.4897, lat = 9.1450, path=tempdir()), "Raster")
ETH_prec_0.5

## class      : RasterBrick 
## dimensions : 3600, 3600, 12960000, 12  (nrow, ncol, ncell, nlayers)
## resolution : 0.008333333, 0.008333333  (x, y)
## extent     : 30, 60, 0, 30  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : tile_32_wc2.1_30s_prec.tif 
## names      : tile_32_wc2.1_30s_prec_1, tile_32_wc2.1_30s_prec_2, tile_32_wc2.1_30s_prec_3, tile_32_wc2.1_30s_prec_4, tile_32_wc2.1_30s_prec_5, tile_32_wc2.1_30s_prec_6, tile_32_wc2.1_30s_prec_7, tile_32_wc2.1_30s_prec_8, tile_32_wc2.1_30s_prec_9, tile_32_wc2.1_30s_prec_10, tile_32_wc2.1_30s_prec_11, tile_32_wc2.1_30s_prec_12 
## min values :                        0,                        0,                        0,                        0,                        0,                        0,                        0,                        0,                        0,                         0,                         0,                         0 
## max values :                      122,                      131,                      239,                      707,                      303,                      344,                      471,                      441,                      315,                       471,                       577,                       185

Pop quiz: why do you think that is?

# Explore objects
ETH_prec_10

## class      : RasterStack 
## dimensions : 1080, 2160, 2332800, 12  (nrow, ncol, ncell, nlayers)
## resolution : 0.1666667, 0.1666667  (x, y)
## extent     : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## names      : wc2.1_10m_prec_01, wc2.1_10m_prec_02, wc2.1_10m_prec_03, wc2.1_10m_prec_04, wc2.1_10m_prec_05, wc2.1_10m_prec_06, wc2.1_10m_prec_07, wc2.1_10m_prec_08, wc2.1_10m_prec_09, wc2.1_10m_prec_10, wc2.1_10m_prec_11, wc2.1_10m_prec_12 
## min values :                 0,                 0,                 0,                 0,                 0,                 0,                 0,                 0,                 0,                 0,                 0,                 0 
## max values :               908,               793,               720,              1004,              2068,              2210,              2381,              1674,              1955,              2328,               718,               806

ETH_prec_0.5

## class      : RasterBrick 
## dimensions : 3600, 3600, 12960000, 12  (nrow, ncol, ncell, nlayers)
## resolution : 0.008333333, 0.008333333  (x, y)
## extent     : 30, 60, 0, 30  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : tile_32_wc2.1_30s_prec.tif 
## names      : tile_32_wc2.1_30s_prec_1, tile_32_wc2.1_30s_prec_2, tile_32_wc2.1_30s_prec_3, tile_32_wc2.1_30s_prec_4, tile_32_wc2.1_30s_prec_5, tile_32_wc2.1_30s_prec_6, tile_32_wc2.1_30s_prec_7, tile_32_wc2.1_30s_prec_8, tile_32_wc2.1_30s_prec_9, tile_32_wc2.1_30s_prec_10, tile_32_wc2.1_30s_prec_11, tile_32_wc2.1_30s_prec_12 
## min values :                        0,                        0,                        0,                        0,                        0,                        0,                        0,                        0,                        0,                         0,                         0,                         0 
## max values :                      122,                      131,                      239,                      707,                      303,                      344,                      471,                      441,                      315,                       471,                       577,                       185

# Restrict to October and plot
ETH_prec_10_Oct <- crop(ETH_prec_10[[10]], c(0,60,-10,20)) # xmin, xmax, ymin, ymax
ETH_prec_0.5_Oct <- ETH_prec_0.5[[10]]


raster_colorPal_prec <- colorNumeric(palette = topo.colors(64),
                                     domain = values(ETH_prec_10_Oct),
                                     na.color = NA) # Define palette

leaflet() %>% # Plot
  addProviderTiles("CartoDB.Positron") %>%
  addRasterImage(x = ETH_prec_10_Oct,
                 color = raster_colorPal_prec,
                 group = "October precipitation (mm)") %>%
  addLegend(title = "October precipitation (mm)<br>(10' res)",
            values = values(ETH_prec_10_Oct),
            pal = raster_colorPal_prec)  %>%
  addLayersControl(overlayGroups = c("October precipitation (mm)"))

leaflet() %>% # Plot
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(data = SSD_Adm_0,
              popup = SSD_Adm_0$NAME_0,
              label = SSD_Adm_0$NAME_0,
              fillOpacity = 0,
              color = "red",
              weight = 3,
              group = "South Sudan") %>%
  addPolygons(data = ETH_Adm_1,
              popup = ETH_Adm_1$NAME_1,
              label = ETH_Adm_1$NAME_1,
              fillOpacity = 0,
              color = "black",
              weight = 3,
              group = "Ethiopia") %>%
  addRasterImage(x = ETH_prec_0.5_Oct,
                 color = raster_colorPal_prec,
                 group = "October precipitation (mm)") %>%
  addLegend(title = "October precipitation (mm)<br>(0.5' res)",
            values = values(ETH_prec_0.5_Oct),
            pal = raster_colorPal_prec) %>%
  addLayersControl(overlayGroups = c("October precipitation (mm)", "Ethiopia", "South Sudan")) %>%
  hideGroup("South Sudan")

Pop quiz: how would you get the 0.5 resolution October precipitation for South Sudan?

# Precipitation from worldclim at the 0.5 resolution (minutes of a degree)... Left half of South Sudan
# SSD_prec_0.5_left <- raster::getData(name = "worldclim",
#                                      var = "prec",
#                                      res = 0.5,
#                                      lon = 20,
#                                      lat = 10)



SSD_prec_0.5_left <- as(worldclim_tile(var = "prec", res = 0.5, lon = 20, lat = 10, path=tempdir()), "Raster")


# Restrict to October
SSD_prec_0.5_Oct_left <- SSD_prec_0.5_left[[10]]

# Merge left and right halves together
SSD_prec_0.5_Oct <- raster::merge(x = SSD_prec_0.5_Oct_left,
                                  y = ETH_prec_0.5_Oct)

# Crop to South Sudan extent 
SSD_prec_0.5_Oct_Crop_Unmasked <- raster::crop(x = SSD_prec_0.5_Oct,
                                               y = SSD_Adm_0)

# Mask to South Sudan and plot
SSD_prec_0.5_Oct_Crop <- raster::mask(x = SSD_prec_0.5_Oct_Crop_Unmasked,
                                      mask = SSD_Adm_0)

raster_colorPal_prec_SSD <- colorNumeric(palette = topo.colors(64),
                                         domain = values(SSD_prec_0.5_Oct_Crop),
                                         na.color = NA) # Define palette

leaflet() %>% # Plot
  addProviderTiles("CartoDB.Positron") %>%
  addRasterImage(x = SSD_prec_0.5_Oct_Crop,
                 color = raster_colorPal_prec_SSD) %>%
  addPolygons(data = SSD_Adm_0,
              popup = SSD_Adm_0$NAME_0,
              label = SSD_Adm_0$NAME_0,
              fillOpacity = 0,
              color = "red",
              weight = 3) %>%
  addLegend(title = "October precipitation (mm)<br>(0.5' res)",
            values = values(SSD_prec_0.5_Oct_Crop),
            pal = raster_colorPal_prec_SSD)

In class assignement: map the 1 (minute of a degree) resolution January precipitation for Kenya in inches?

Other R packages

Here are a few R packages with spatial data available:

raster
getSpatialData
rgistools
rstoolbox
sentinel2
MODIS

On the web

Often times those R packages simply get you an easier access to data hosted online by various research groups and can have useful additional functions to process the data. Sometimes though, no one has put the effort to package the data you are looking for or you might not know that it is already available in R. In those cases, you end up needing to download the data online (which can come in many different formats) and import in R yourself.

You can download the precipitation WorldClim variable at 10m resolution here. What you get is a folder with 12 .tif files, presumably representing monthly precipitation (in mm and averaged over the 1970-2000 period), presumably over the entire globe. As you encounter new file formats that you don’t know how to import in R, it’s good practice to scout forums. Let’s google How to import .tif files in R? and find our answer on that post. Fortunately, it looks like the raster function of the ‘raster’ package reads in .tif files.

We’ve pre-downloaded the folder to the git repo for ease of use. Usually, you’d download and extract the folder to your own system.

# Path to wc2 folder downloaded for you into repository
path_to_wc2_folder <- "~/Documents/GitHub/phw272c.github.io/_source/wc2.1_10m_prec/"

# List files in folder downloaded
list.files(path = path_to_wc2_folder)

##  [1] "readme.txt"            "wc2.1_10m_prec_01.tif" "wc2.1_10m_prec_02.tif"
##  [4] "wc2.1_10m_prec_03.tif" "wc2.1_10m_prec_04.tif" "wc2.1_10m_prec_05.tif"
##  [7] "wc2.1_10m_prec_06.tif" "wc2.1_10m_prec_07.tif" "wc2.1_10m_prec_08.tif"
## [10] "wc2.1_10m_prec_09.tif" "wc2.1_10m_prec_10.tif" "wc2.1_10m_prec_11.tif"
## [13] "wc2.1_10m_prec_12.tif"

list.files(path = path_to_wc2_folder, pattern = ".tif") # Restrict to .tif files

##  [1] "wc2.1_10m_prec_01.tif" "wc2.1_10m_prec_02.tif" "wc2.1_10m_prec_03.tif"
##  [4] "wc2.1_10m_prec_04.tif" "wc2.1_10m_prec_05.tif" "wc2.1_10m_prec_06.tif"
##  [7] "wc2.1_10m_prec_07.tif" "wc2.1_10m_prec_08.tif" "wc2.1_10m_prec_09.tif"
## [10] "wc2.1_10m_prec_10.tif" "wc2.1_10m_prec_11.tif" "wc2.1_10m_prec_12.tif"

# Read in October precipitation
wc2_prec_10_Oct <- raster(x = paste0(path_to_wc2_folder, "wc2.1_10m_prec_10.tif"))


plot(wc2_prec_10_Oct, main = "Average Precipitation(mm) October, 1970-2000")

# Another more elegant way to read in the data downloaded online
wc2_prec_10 <- raster::stack(x = paste0(path_to_wc2_folder, 
                                        list.files(path = path_to_wc2_folder,
                                                   pattern = ".tif")))
wc2_prec_10

## class      : RasterStack 
## dimensions : 1080, 2160, 2332800, 12  (nrow, ncol, ncell, nlayers)
## resolution : 0.1666667, 0.1666667  (x, y)
## extent     : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## names      : wc2.1_10m_prec_01, wc2.1_10m_prec_02, wc2.1_10m_prec_03, wc2.1_10m_prec_04, wc2.1_10m_prec_05, wc2.1_10m_prec_06, wc2.1_10m_prec_07, wc2.1_10m_prec_08, wc2.1_10m_prec_09, wc2.1_10m_prec_10, wc2.1_10m_prec_11, wc2.1_10m_prec_12 
## min values :                 0,                 0,                 0,                 0,                 0,                 0,                 0,                 0,                 0,                 0,                 0,                 0 
## max values :               908,               793,               720,              1004,              2068,              2210,              2381,              1674,              1955,              2328,               718,               806

wc2_prec_10[[10]]

## class      : RasterLayer 
## dimensions : 1080, 2160, 2332800  (nrow, ncol, ncell)
## resolution : 0.1666667, 0.1666667  (x, y)
## extent     : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : wc2.1_10m_prec_10.tif 
## names      : wc2.1_10m_prec_10 
## values     : 0, 2328  (min, max)

Other online resources

Here are a few good websites with free downloadable spatial data

Spatial data: where do they come from?

We often interact with climatic and environmental raster data, like the land use dataset shown below. But how are these datasets produced? The answer is via remotely sensed imagery, such as images taken by satellites, planes or drones. These data can sometimes be combined with ground data, such as data from weather stations.

# Land use (# For information on land use classifications see http://due.esrin.esa.int/files/GLOBCOVER2009_Validation_Report_2.2.pdf)
ETH_land_use <- raster::raster("https://github.com/phw272c/phw272c.github.io/blob/master/data/ETH_land_use.tif?raw=true")
plot(ETH_land_use)

Remote sensing data

Satellites orbiting around the globe carry instruments to take measurements. Just likes our eyes or cameras, the sensors onboard satellites receive a radiation that was emitted by a source of light (sun, flash, light bulb, etc…) and reflected by an object. The differences between the pre and post reflection radiations characterize properties of the reflecting object such as color for which our eyes have been optimized to perceive. You can picture the radiation as a wave signal. Speed radars and the SRTM mission act in the same way (except they also send the incident radiation) and from the difference in the 2 wave signals calculate speed and elevation respectively.

Speed, elevation, color are calculated based on the wavelengths property of the wave. Sensors onboard remote sensing satelitte mission can receive multiple waves across the spectrum.

For instance the satellite Landsat-8 has 11 bands (i.e wavelengths ranges) at which it measures the reflectance (fraction of incident electromagnetic power that is reflected) which we can simply see as a measure of ‘light’ intensity reflected by the earth surface.

# Load some Etiophia Landsat data for the year 2017 (only 7 first bands available)
Landsat_Band_1 <- raster::raster(x = "https://github.com/phw272c/phw272c.github.io/blob/master/data/Landsat/Landsat_2017_Band1.tif?raw=true")
Landsat_Band_2 <- raster::raster(x = "https://github.com/phw272c/phw272c.github.io/blob/master/data/Landsat/Landsat_2017_Band2.tif?raw=true")
Landsat_Band_3 <- raster::raster(x = "https://github.com/phw272c/phw272c.github.io/blob/master/data/Landsat/Landsat_2017_Band3.tif?raw=true")
Landsat_Band_4 <- raster::raster(x = "https://github.com/phw272c/phw272c.github.io/blob/master/data/Landsat/Landsat_2017_Band4.tif?raw=true")
Landsat_Band_5 <- raster::raster(x = "https://github.com/phw272c/phw272c.github.io/blob/master/data/Landsat/Landsat_2017_Band5.tif?raw=true")
Landsat_Band_6 <- raster::raster(x = "https://github.com/phw272c/phw272c.github.io/blob/master/data/Landsat/Landsat_2017_Band6.tif?raw=true")
Landsat_Band_7 <- raster::raster(x = "https://github.com/phw272c/phw272c.github.io/blob/master/data/Landsat/Landsat_2017_Band7.tif?raw=true")

# Stack together layers
Landsat_Band <- raster::stack(Landsat_Band_1,
                              Landsat_Band_2,
                              Landsat_Band_3,
                              Landsat_Band_4,
                              Landsat_Band_5,
                              Landsat_Band_6,
                              Landsat_Band_7)

# Explore the reflectance values. Assume you should normalize by 10000 to get reflectance
summary(Landsat_Band[[1]])

##             B1
## Min.      73.0
## 1st Qu.  338.0
## Median   403.0
## 3rd Qu.  493.5
## Max.    4032.0
## NA's       0.0

How do you get from reflectance to land use surfaces?

First you can read some documentation about the wavelength ranges covered by band for your satellite.

# Label bands accordingly
names(Landsat_Band) <- c('Ultra.blue', 'Blue', 'Green', 'Red', 'NIR', 'SWIR1', 'SWIR2')

# Plot individual layers
par(mfrow = c(2,2))
plot(Landsat_Band_2, main = "Blue", col = gray(0:100 / 100))
plot(Landsat_Band_3, main = "Green", col = gray(0:100 / 100))
plot(Landsat_Band_4, main = "Red", col = gray(0:100 / 100))
plot(Landsat_Band_5, main = "NIR", col = gray(0:100 / 100))

Pop quiz: what do you think is in the top right corner of our spatial extent?

Notice the difference in shading and range of legends between the different bands. This is because different surface features reflect the incident solar radiation differently. Each layer represent how much incident solar radiation is reflected for a particular wavelength range. For example, vegetation reflects more energy in NIR than other wavelengths and thus appears brighter. In contrast, water absorbs most of the energy in the NIR wavelength and it appears dark.

We do not gain that much information from these grey-scale plots; they are often combined into a “composite” to create more interesting plots. To make a “true (or natural) color” image, that is, something that looks like a normal photograph (vegetation in green, water blue etc), we need bands in the red, green and blue regions. For this Landsat image, band 4 (red), 3 (green), and 2 (blue) can be used. The plotRGB method can be used to combine them into a single composite. You can also supply additional arguments to plotRGB to improve the visualization (e.g. a linear stretch of the values, using strecth = “lin”).

raster::plotRGB(x = Landsat_Band,
                r = 4,
                g = 3,
                b = 2,
                stretch = "lin",
                main = "Landsat True Color Composite")

The true-color composite reveals much more about the landscape than the earlier gray images. Another popular image visualization method in remote sensing is known “false color” image in which NIR, red, and green bands are combined. This representation is popular as it makes it easy to see the vegetation (in red).

raster::plotRGB(x = Landsat_Band,
                r = 5,
                g = 4,
                b = 3,
                stretch = "lin",
                main = "Landsat False Color Composite")

Relation between bands

A scatterplot matrix can be helpful in exploring relationships between raster layers. This can be done with the pairs() function of the “raster” package.

Plot of reflection in the ultra-blue wavelength against reflection in the blue wavelength.

raster::pairs(x = Landsat_Band[[1:2]], main = "Ultra-blue versus Blue")

## Warning in graphics::par(usr): argument 1 does not name a graphical parameter

## Warning in graphics::par(usr): argument 1 does not name a graphical parameter

## Warning in graphics::par(usr): argument 1 does not name a graphical parameter

Plot of reflection in the red wavelength against reflection in the NIR wavelength.

raster::pairs(x = Landsat_Band[[4:5]], main = "Red versus NIR")

## Warning in graphics::par(usr): argument 1 does not name a graphical parameter

## Warning in graphics::par(usr): argument 1 does not name a graphical parameter

## Warning in graphics::par(usr): argument 1 does not name a graphical parameter

The first plot reveals high correlations between the blue wavelength regions. Because of the high correlation, we can just use one of the blue bands without losing much information. The distribution of points in second plot (between NIR and red) is unique due to its triangular shape. Vegetation reflects more in the NIR range than in the red and creates the upper corner close to NIR (y) axis. Water absorbs energy from all the bands and occupies the location close to origin. The furthest corner is created due to highly reflecting surface features like bright soil or concrete.

Spectral profiles

A plot of the spectrum (all bands) for pixels representing a certain earth surface features (e.g. water) is known as a spectral profile. Such profiles demonstrate the differences in spectral properties of various earth surface features and constitute the basis for image analysis.

Let’s start by loading some training data from the region which compiles GPS coordinates of points for which we know the Land use and Land cover (LULC) class.

# Load the training dataset, which compiles GPS coordinates of points for which we know the Land use and Land cover (LULC) class
Training_Data <- read.csv(file = "https://raw.githubusercontent.com/phw272c/phw272c.github.io/master/data/LULC_Training_Tana.csv", header = T)

# Explore the data
head(Training_Data)

##   Latitude Longitude LULC
## 1 11.75859  37.26399   12
## 2 11.70965  37.14431   12
## 3 11.79550  37.38545   17
## 4 11.78845  37.15329   10
## 5 11.50553  37.15888   10
## 6 11.79360  37.29655   10

# Label LULC classes
Training_Data$LULC <- factor(Training_Data$LULC,
                             levels = c(9, 10, 11, 12, 13, 17),
                             labels = c("Savanna", "Grassland", "Wetland", "Cropland", "Urban", "Water"))

# Frequency table of classes
table(Training_Data$LULC)

## 
##   Savanna Grassland   Wetland  Cropland     Urban     Water 
##         4       481        27       328         4       156

You can now convert the data frame to a spatial object to extract reflectance values at training points. Spectral values can be extracted from any multispectral data set using extract() function of the “raster” package.

# Convert data frame to a SpatialPointsDataFrame by specifying coordinates
Training_Data_DF <- Training_Data # Save a data frame version
sp::coordinates(Training_Data) <- c("Longitude", "Latitude")

# Extract values from the reflectance images at the point locations and merge to training dataset
Training_Data_Reflectance <- cbind(data.frame(LULC =Training_Data_DF$LULC), as.data.frame(raster::extract(x = Landsat_Band, y = Training_Data)))
head(Training_Data_Reflectance) 

##        LULC Ultra.blue Blue Green  Red  NIR SWIR1 SWIR2
## 1  Cropland        372  512   823  991 2494  2548  1620
## 2  Cropland        375  483   795 1183 2038  1886  1613
## 3     Water        603  780  1378 1111  259   171   141
## 4 Grassland        364  485   806  917 1767  1524   997
## 5 Grassland        355  437   693  765 1972  1990  1427
## 6 Grassland        407  499   759  868 2254  2161  1433

We can now compute the mean reflectance values for each class and each band and plot to obtain the spectral profiles.

# Average reflectance values over LULC classes
LULC_Mean_Reflectance <- stats::aggregate(x = . ~ LULC, data = Training_Data_Reflectance, FUN = function(x){round(mean(x), digits = 2)})
LULC_Mean_Reflectance

##        LULC Ultra.blue   Blue   Green     Red     NIR   SWIR1   SWIR2
## 1   Savanna     404.00 455.25  702.12  747.62 2303.88 1639.50 1085.62
## 2 Grassland     367.07 450.39  731.63  883.92 2137.84 2044.79 1478.32
## 3   Wetland     360.65 440.81  760.96  664.87 1929.70 1099.56  586.33
## 4  Cropland     408.25 503.26  794.93 1030.95 2308.46 2169.48 1615.76
## 5     Urban     476.38 580.12  848.00  957.88 2128.75 1884.00 1393.12
## 6     Water     571.12 731.72 1313.05 1131.43  302.03  170.37  136.47

# Reshape to long format
library(tidyr)
LULC_Mean_Reflectance_Long <- tidyr::gather(data = LULC_Mean_Reflectance,
                                            key = "Band",
                                            value = "Reflectance",
                                            -"LULC",
                                            factor_key = T)

# Plot
library(ggplot2)
ggplot2::ggplot(data = LULC_Mean_Reflectance_Long, 
                mapping = aes(x = Band, y = Reflectance, color = LULC, group = LULC)) +
    geom_line()

The spectral profiles show (dis)similarity in the reflectance of different features on the earth’s surface (or above it). ‘Water’ shows relatively low reflection in all wavelengths, while all other classes have relatively high reflectance in the longer wavelengts. Remember though that our training data only has 1000 points across 6 LULC classes only have a few observation for 3 of those classes.

Classification

So now let’s try to predict LULC and produce our raster layer by fitting a classification model using the “randomForest” package.

library(randomForest)
RF_Model <- randomForest::randomForest(x = Training_Data_Reflectance[,2:8], y = Training_Data_Reflectance[,1])
RF_Model

## 
## Call:
##  randomForest(x = Training_Data_Reflectance[, 2:8], y = Training_Data_Reflectance[,      1]) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 2
## 
##         OOB estimate of  error rate: 22.9%
## Confusion matrix:
##           Savanna Grassland Wetland Cropland Urban Water class.error
## Savanna         0         1       2        1     0     0  1.00000000
## Grassland       1       382       7       90     0     1  0.20582121
## Wetland         1        11      11        0     0     4  0.59259259
## Cropland        0       103       1      224     0     0  0.31707317
## Urban           0         2       0        2     0     0  1.00000000
## Water           0         0       2        0     0   154  0.01282051

By inspecting the output of the model, we realize it is not doing a super job. Again remember we only have a few observations in our training dataset. Classification of the ‘water’ class is pretty good, thanks to its particular spectral profile. Now that we have a model of the relationship between land cover and reflectance values, we can predict land cover over our entire region.

#This prediction function could take ~1 minute
LULC_predicted <- raster::predict(object = Landsat_Band,
                                  model = RF_Model,
                                  progress = 'text',
                                  type = 'response',
                                  overwrite = TRUE) 

##   |                                                                              |                                                                      |   0%  |                                                                              |==================                                                    |  25%  |                                                                              |===================================                                   |  50%  |                                                                              |====================================================                  |  75%  |                                                                              |======================================================================| 100%
## 

library(RColorBrewer)
factpal_LULC <- colorFactor(palette = brewer.pal(n = length(levels(Training_Data_DF$LULC)), name = "Set1"),
                            domain = c(1:length(levels(Training_Data_DF$LULC)))) # define color palette

leaflet() %>% # Plot
  addProviderTiles("Esri.WorldImagery") %>%
  addRasterImage(x = LULC_predicted,
                 color = factpal_LULC,
                 group = "Predicted LULC") %>%
  addLegend(title = "Predicted LULC",
            colors = brewer.pal(n = length(levels(Training_Data_DF$LULC)), name = "Set1"),
            labels = c("Savanna", "Grassland", "Wetland", "Cropland", "Urban", "Water"),
            opacity = 1) %>%
  addLayersControl(overlayGroups = c("Predicted LULC"))

Concluding remarks

Now you have a sense of how remote sensing can be used to produce all sort of environmental layers. The quality of those will depend on the spatial and temporal resolutions of the sensors as well as the techniques used to process the raw data (clouds?) and model it to produce more useful layers.

Resources

This week borrows heavily from this excellent tutorial.

Readings

Hay SI. An overview of remote sensing and geodesy for epidemiology and public health application. Advances in parasitology. 2000 Jan 1;47:1-35.

Simoonga C, Utzinger J, Brooker S, Vounatsou P, Appleton CC, Stensgaard AS, Olsen A, Kristensen TK. Remote sensing, geographical information system and spatial analysis for schistosomiasis epidemiology and ecology in Africa. Parasitology. 2009 Nov;136(13):1683-93.

Savory DJ, Andrade-Pacheco R, Gething PW, Midekisa A, Bennett A, Sturrock HJ. Intercalibration and Gaussian process modeling of nighttime lights imagery for measuring urbanization trends in Africa 2000–2013. Remote Sensing. 2017 Jul;9(7):713.

Yang GJ, Vounatsou P, Xiao-Nong Z, Utzinger J, Tanner M. A review of geographic information system and remote sensing with applications to the epidemiology and control of schistosomiasis in China. Acta tropica. 2005 Nov 1;96(2-3):117-29.