Machine learning for anomaly detection in cyanobacterial fluorescence signals

Husein Almuhtaram, Arash Zamyadi, Ron Hofmann

Research output: Contribution to journal > Article > Research > peer-review

28 Citations (Scopus)

Abstract

Many drinking water utilities drawing from waters susceptible to harmful algal blooms (HABs) are implementing monitoring tools that can alert them to the onset of blooms. Some have invested in fluorescence-based online monitoring probes to measure phycocyanin, a pigment found in cyanobacteria, but it is not clear how best to use the data generated. Previous studies have focused on correlating phycocyanin fluorescence with cyanobacteria cell counts. However, not all utilities collect cell count data, making this method impossible to apply in some cases. Instead, this paper proposes a novel machine learning approach to determine when a utility needs to respond to a HAB: identifying anomalies in phycocyanin fluorescence data without the need for corresponding cell counts or biovolume. Four widely used, open-source algorithms are evaluated on data collected at four buoys in Lake Erie from 2014 to 2019: local outlier factor (LOF), One-Class Support Vector Machine (SVM), elliptic envelope, and Isolation Forest (iForest). When trained on standardized historical data from 2014 to 2018 and tested on labelled 2019 data collected at each buoy, the One-Class SVM and elliptic envelope models both achieve a maximum average F1 score of 0.86 across the four datasets. Therefore, One-Class SVM and elliptic envelope are promising algorithms for detecting potential HABs using fluorescence data alone.
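The workflow outlined in the abstract (standardize historical data, fit each anomaly detector, score predictions on a labelled season with F1) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: it assumes the scikit-learn implementations of the four algorithms, and the synthetic readings, the single-feature layout, and every hyperparameter value (`contamination`, `nu`, `n_neighbors`) are hypothetical placeholders rather than values from the paper.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# ASSUMPTION: synthetic stand-in for phycocyanin fluorescence readings
# (arbitrary units); the real buoy data are not reproduced here.
X_train = rng.normal(loc=1.0, scale=0.3, size=(2000, 1))  # "historical" data

# ASSUMPTION: a labelled "test season" with a bloom-like excursion.
X_normal = rng.normal(loc=1.0, scale=0.3, size=(450, 1))
X_bloom = rng.normal(loc=4.0, scale=0.5, size=(50, 1))
X_test = np.vstack([X_normal, X_bloom])
y_test = np.concatenate([np.zeros(450, dtype=int), np.ones(50, dtype=int)])  # 1 = anomaly

# Standardize using statistics from the training data only.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Hyperparameters below are placeholders, not tuned or taken from the paper.
models = {
    # novelty=True lets LOF score data it was not fitted on.
    "LOF": LocalOutlierFactor(n_neighbors=20, novelty=True, contamination=0.05),
    "One-Class SVM": OneClassSVM(kernel="rbf", gamma="scale", nu=0.05),
    "Elliptic envelope": EllipticEnvelope(contamination=0.05, random_state=0),
    "iForest": IsolationForest(contamination=0.05, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train_s)
    pred = model.predict(X_test_s)     # +1 = inlier, -1 = outlier
    y_pred = (pred == -1).astype(int)  # map outliers to the anomaly label 1
    scores[name] = f1_score(y_test, y_pred)

for name, score in scores.items():
    print(f"{name}: F1 = {score:.2f}")
```

All four estimators share the same `fit`/`predict` interface, which is why a single loop can compare them. The scores printed for this synthetic data say nothing about the performance reported in the paper.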

Original language: English
Article number: 117073
Number of pages: 10
Journal: Water Research
Volume: 197
Publication status: Published - 1 Jun 2021
Externally published: Yes

Keywords

  • Artificial intelligence
  • Chlorophyll a
  • Cyanobacteria
  • Drinking water treatment
  • Monitoring
  • Phycocyanin
