Machine Learning with Earth Observation Data

Case Study: Counting the number of boats around the Isle of Wight (summer)

Introduction

What changes can be measured using Landsat and/or Sentinel-2 data? Over large areas, these data sets are commonly used for change detection (land use, for example). If companies like Orbital Insight are counting cars, using shadows on floating-roof oil tanks to determine capacity and measuring levels of construction, what smaller objects and data analytics can be developed with Landsat and Sentinel data? The resolution of this type of EO data doesn’t lend itself to that kind of analysis, at least not reliably enough. Ideally the targets should be objects of about 30m in size (I can pan sharpen Landsat 8 to 15m pixels and the visible bands on Sentinel-2a are 10m pixels), so detecting boats might be a suitable test.
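To put the resolution argument into numbers, here is a rough sketch of how many pixels a boat would cover at each pixel size. The 30m multispectral pixel for Landsat 8 is the standard figure; the 30m by 8m vessel is a made-up example, not a measurement from the imagery.

```python
# Pixel sizes in metres for the sensors discussed in the text.
PIXEL_SIZES_M = {
    "Landsat 8 (multispectral)": 30.0,
    "Landsat 8 (pan sharpened)": 15.0,
    "Sentinel-2a (visible bands)": 10.0,
}


def pixels_covered(length_m, width_m, pixel_size_m):
    """Approximate number of pixels an object covers (simple area ratio)."""
    return (length_m * width_m) / (pixel_size_m ** 2)


# Hypothetical vessel, 30 m long and 8 m wide (illustrative values only).
for name, size in PIXEL_SIZES_M.items():
    print(f"{name}: {pixels_covered(30.0, 8.0, size):.2f} pixels")
```

Even at 10m pixels a vessel of this size only fills a couple of pixels, which is why smaller objects such as cars are out of reach.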

Counting boats / objects around the Isle of Wight

There should be more boats in the summer and fewer in the winter (seems fairly obvious). A reasonable starting point is to use as many cloud free Landsat 8 and Sentinel-2 scenes of the Isle of Wight as possible. Being in the UK, clouds are just something we have to live with. There are methods for cloud removal, as this link suggests: http://gis.stackexchange.com/questions/101740/cloud-removal-from-landsat-data but ultimately it is easier to mitigate this risk by getting cloud free images.

There are plenty of options for processing satellite images using specialist software such as ENVI and Erdas, to name two. GIS software like ArcGIS Desktop has plenty of imagery tools, as does open source software. QGIS in combination with the Semi-Automatic Classification Plugin was used in this example. The plugin creates a pan sharpened image of the downloaded tiles. After clipping the images to the area of interest (Landsat 8 and Sentinel-2 cover different areas) the imagery is ready.

Here is a Landsat 8 image clipped to the Isle of Wight from summer 2016

And here is a Sentinel-2a image from Summer 2016

Both are beautiful images and both are affected slightly by clouds.

Build point shapefiles to use as training areas on both images. Even though the main interest here is in the number of boats, it is worth creating a number of other classes (infrastructure, water, forest, agriculture etc). Also create some validated shapefiles to check the results of the classification; in an ideal world these would be from known/field verified points.

In this case the script from http://www.machinalis.com/blog/python-for-geospatial-data-processing/ is used to process the imagery.
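The linked script is not reproduced here. As a rough illustration of what a supervised pixel classification does, the sketch below uses a simple nearest-centroid classifier in NumPy as a stand-in: each class is summarised by the mean spectrum of its training pixels, and every pixel in the image is given the class of the closest mean. All band values, class ids and function names are made up for the example.

```python
import numpy as np


def train_centroids(pixels, labels):
    """Return the sorted class ids and the mean spectrum of each class."""
    classes = np.unique(labels)
    centroids = np.array([pixels[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def classify(image, classes, centroids):
    """Label each pixel of a (rows, cols, bands) image with its nearest class."""
    rows, cols, bands = image.shape
    flat = image.reshape(-1, bands).astype(float)
    # Squared Euclidean distance from every pixel to every class centroid.
    dists = ((flat[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)].reshape(rows, cols)


# Made-up training samples with 3 bands: class 1 = water (dark), class 2 = boat (bright).
train_pixels = np.array(
    [[10, 12, 30], [12, 14, 28], [200, 210, 205], [190, 205, 200]], dtype=float
)
train_labels = np.array([1, 1, 2, 2])

# Tiny synthetic 2x2 scene containing one bright pixel.
scene = np.array(
    [[[11, 13, 29], [195, 207, 202]],
     [[9, 12, 31], [13, 15, 27]]],
    dtype=float,
)

classes, centroids = train_centroids(train_pixels, train_labels)
classified = classify(scene, classes, centroids)
print(classified)
```

In practice the training pixels would be sampled from the imagery at the shapefile points, and a stronger classifier would normally replace the centroid rule.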

Sometimes the classification can be good and sometimes it can be poor. The sea will have an impact if the training areas are not broad enough, as displayed in the classified image above.
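One way to put a number on "good" or "poor" is to compare the classified value at each validation point with its known class. The sketch below builds a confusion matrix and two common accuracy figures; the class ids and point values are invented for illustration and are not results from this study.

```python
import numpy as np


def confusion_matrix(truth, predicted, n_classes):
    """Rows are validated classes, columns are classified classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(truth, predicted):
        matrix[t, p] += 1
    return matrix


# Made-up validation points: 0 = water, 1 = boat, 2 = land.
validated = np.array([0, 0, 0, 1, 1, 2, 2, 2])
classified = np.array([0, 0, 1, 1, 1, 2, 2, 0])

cm = confusion_matrix(validated, classified, 3)

# Overall accuracy: share of validation points classified correctly.
overall_accuracy = np.trace(cm) / cm.sum()

# Producer's accuracy: share of each validated class that was classified correctly.
producers_accuracy = np.diag(cm) / cm.sum(axis=1)

print(cm)
print(overall_accuracy, producers_accuracy)
```

Off-diagonal counts in the water row are the kind of sea/boat confusion that broader training areas should reduce.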

Once the training areas are good enough, most of the objects that look like boats are captured.

After the classification has been run, convert the raster image to vector in QGIS. QC the data to remove any erroneous objects and clean up any mis-ties. This allows a simple count of all the objects.
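The count was done on the vector data in QGIS. For comparison, the same idea can be sketched directly on the raster: group touching "boat" pixels into objects and multiply the pixel count by the pixel area. This is an alternative route, not the workflow used above, and the mask and 10m pixel size are example values.

```python
from collections import deque


def object_areas(mask, pixel_size_m):
    """Return the area in m2 of each 4-connected group of non-zero cells."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood fill from this cell to collect one object.
            queue = deque([(r, c)])
            seen[r][c] = True
            n_pixels = 0
            while queue:
                y, x = queue.popleft()
                n_pixels += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(n_pixels * pixel_size_m ** 2)
    return areas


# Made-up classification result: 1 = pixel classified as boat.
boat_mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]

areas = object_areas(boat_mask, pixel_size_m=10.0)
print(len(areas), areas)
```

The list of areas is the same kind of data that feeds the histogram, one value per object.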

With a few lines of Python, a quick histogram can be plotted

import matplotlib.pyplot as plt
import numpy as np

# boats.csv is assumed to hold one header row followed by object areas in m2
with open('boats.csv') as fname:
    data = np.loadtxt(fname, delimiter=',', dtype=float, skiprows=1)

# Plot the distribution of object sizes
plt.hist(data, bins=50)
plt.xlabel("area of boat (m2)")
plt.ylabel("# boats")
plt.title("Histogram of boat/object size around the Isle of Wight")
plt.show()

Further thoughts

It is possible to stack up images, once downloaded, and run this process as a task. A numpy array could be built for the area and used for the training and verification data instead of creating shapefiles every time. Machine learning can count boats around the Isle of Wight, and if it can count boats then what other things can it count?
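A minimal sketch of that idea, assuming every scene has already been clipped and resampled to the same grid. Random numbers stand in for real imagery, and the label raster, class ids and the every-other-pixel split are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for clipped scenes of the same area: (rows, cols, bands).
scenes = [rng.random((4, 5, 3)) for _ in range(2)]

# One array for the whole series: (n_scenes, rows, cols, bands).
stack = np.stack(scenes, axis=0)

# Label raster on the same grid: 0 = unlabelled, otherwise a class id.
labels = np.zeros((4, 5), dtype=int)
labels[0, :2] = 1  # e.g. water
labels[3, 3:] = 2  # e.g. boat

# Samples for scene 0: one row of band values per labelled pixel.
labelled = labels > 0
X = stack[0][labelled]  # shape (n_labelled, bands)
y = labels[labelled]    # shape (n_labelled,)

# Hold back every other labelled pixel for verification.
is_validation = np.arange(y.size) % 2 == 1
X_train, y_train = X[~is_validation], y[~is_validation]
X_valid, y_valid = X[is_validation], y[is_validation]

print(stack.shape, X_train.shape, X_valid.shape)
```

Once the labels live in an array like this, the same training and verification pixels can be reused for every new scene added to the stack.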

Machine Learning on EO for Oil and Gas

Counting the number of oil and gas wells and monitoring oil field development are both possibilities.

Ultimately, the advance of machine learning and Earth Observation means terrain models can be built using more images (as the catalogue of data increases), and complex image classification can be handled through code outside of the traditional remote sensing toolbox. Features on the surface of the earth can be identified and computers trained to detect them, or to detect changes in them.

High resolution imagery could mean that oil field service companies could be monitored and progress on land seismic surveys detected. Bottlenecks in operations can be identified and the best/safest paths to travel validated. In short, mapping can be improved.

Useful links

Regression Analysis with Python tutorial https://www.youtube.com/playlist?list=PLQVvvaa0QuDfKTOs3Keq_kaG2P55YRn5v

Alternative Satellite companies list

https://www.quandl.com/blog/alternative-data-satellite-companies

Machine Learning in Geoscience

http://giswin.geo.tsukuba.ac.jp/sis/tutorial/Machine_learning%20_in_geoscience.pdf

Python for Geospatial analysis

http://www.machinalis.com/blog/python-for-geospatial-data-processing/

Data (Landsat and Sentinel-2 data available here)

http://earthexplorer.usgs.gov/

Semi-Automatic Classification Plugin for QGIS

https://plugins.qgis.org/plugins/SemiAutomaticClassificationPlugin/