A feature-based procedure for detecting technical outliers in water-quality data from in situ sensors

Priyanga Dilini Talagala, Rob J. Hyndman, Catherine Leigh, Kerrie Mengersen, Kate Smith-Miles

Research output: Contribution to journal / Article / Research / peer-review

Abstract

Outliers due to technical errors in water-quality data from in situ sensors can reduce data quality and have a direct impact on inference drawn from subsequent data analysis. However, outlier detection through manual monitoring is infeasible given the volume and velocity of data the sensors produce. Here we introduce an automated procedure, named oddwater, that provides early detection of outliers in water-quality data from in situ sensors caused by technical issues. Our oddwater procedure is used to first identify the data features that differentiate outlying instances from typical behaviors. Then, statistical transformations are applied to make the outlying instances stand out in a transformed data space. Unsupervised outlier scoring techniques are applied to the transformed data space, and an approach based on extreme value theory is used to calculate a threshold for each potential outlier. Using two data sets obtained from in situ sensors in rivers flowing into the Great Barrier Reef lagoon, Australia, we show that oddwater successfully identifies outliers involving abrupt changes in turbidity, conductivity, and river level, including sudden spikes, sudden isolated drops, and level shifts, while maintaining very low false detection rates. We have implemented this oddwater procedure in the open source R package oddwater.
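The three stages described in the abstract (transform the data, score the transformed points without labels, derive a threshold from the upper tail of the scores) can be illustrated with a small sketch. This is not the oddwater implementation. The log-difference transformation, the k-nearest-neighbour distance score, the exponential-tail (peaks-over-threshold) approximation, and all function and parameter names (`log_diff`, `knn_scores`, `tail_threshold`, `flag_outliers`, `k`, `tail_frac`, `alpha`) are assumptions chosen for illustration. It also uses a single threshold for all points rather than one per potential outlier.

```python
import math
import statistics


def log_diff(series):
    """Transformation (assumed): first difference of log values.

    Abrupt changes such as spikes become large-magnitude values.
    Requires strictly positive readings.
    """
    logs = [math.log(v) for v in series]
    return [b - a for a, b in zip(logs, logs[1:])]


def knn_scores(points, k=3):
    """Unsupervised score (assumed): distance to the k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(abs(p - q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def tail_threshold(scores, tail_frac=0.2, alpha=0.001):
    """Threshold from an exponential model of the upper tail of the scores.

    The top `tail_frac` of scores are treated as exceedances over a level u.
    The exponential scale is estimated from the median excess, which is less
    sensitive to the outliers themselves than the mean would be. The returned
    level has modelled exceedance probability `alpha`.
    Assumes at least 5 scores, so the tail is not empty.
    """
    ordered = sorted(scores)
    cut = int(len(ordered) * (1 - tail_frac))
    u = ordered[cut - 1]
    excesses = [s - u for s in ordered[cut:]]
    beta = statistics.median(excesses) / math.log(2)
    if beta == 0:
        return u
    rate = len(excesses) / len(ordered)
    return u + beta * math.log(rate / alpha)


def flag_outliers(series, k=3):
    """Return indices (in the differenced series) whose score exceeds the threshold."""
    transformed = log_diff(series)
    scores = knn_scores(transformed, k)
    threshold = tail_threshold(scores)
    return [i for i, s in enumerate(scores) if s > threshold]


if __name__ == "__main__":
    # Synthetic, made-up readings with one injected spike at position 30.
    series = [10 + 0.5 * math.sin(i) for i in range(60)]
    series[30] = 50.0
    print(flag_outliers(series))
```

In this sketch a flagged index `i` refers to the change between readings `i` and `i + 1`, so an isolated spike is expected to produce two flagged changes, one on the way up and one on the way down.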

Original language: English
Number of pages: 22
Journal: Water Resources Research
DOI: 10.1029/2019WR024906
Publication status: Accepted/In press - 2019

Cite this

Talagala, Priyanga Dilini ; Hyndman, Rob J. ; Leigh, Catherine ; Mengersen, Kerrie ; Smith-Miles, Kate. / A feature-based procedure for detecting technical outliers in water-quality data from in situ sensors. In: Water Resources Research. 2019.
@article{f22b3ba5ca7d4ca8a64f44297272ab9d,
title = "A feature-based procedure for detecting technical outliers in water-quality data from in situ sensors",
author = "Talagala, {Priyanga Dilini} and Hyndman, {Rob J.} and Catherine Leigh and Kerrie Mengersen and Kate Smith-Miles",
year = "2019",
doi = "10.1029/2019WR024906",
language = "English",
journal = "Water Resources Research",
issn = "0043-1397",
publisher = "American Geophysical Union",
}

TY - JOUR

T1 - A feature-based procedure for detecting technical outliers in water-quality data from in situ sensors

AU - Talagala, Priyanga Dilini

AU - Hyndman, Rob J.

AU - Leigh, Catherine

AU - Mengersen, Kerrie

AU - Smith-Miles, Kate

PY - 2019

Y1 - 2019

UR - http://www.scopus.com/inward/record.url?scp=85074797893&partnerID=8YFLogxK

U2 - 10.1029/2019WR024906

DO - 10.1029/2019WR024906

M3 - Article

AN - SCOPUS:85074797893

JO - Water Resources Research

JF - Water Resources Research

SN - 0043-1397

ER -