Probabilistic methods of analysis for the time series Moran scatterplot quadrant signature

D. Rohde, J. Corcoran, T.R. McGee, R. Wickes, M. Townsley

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Recently, time series Moran scatterplot quadrant signatures (MSQS) have been introduced as a means of spatio-temporal analysis. A Moran scatterplot summarises a set of local Moran statistics, which identify how a quantity of interest in each region relates to the same quantity in neighbouring regions. Reducing each region's position on the scatterplot to one of the four quadrants at fixed time intervals yields a time series MSQS. Clustering on the time series MSQS identifies regions that behave similarly relative to their neighbours. It has recently been shown that the Levenshtein metric, a distance metric originally used for string comparison, can be used to construct a kernel that allows standard Ward hierarchical clustering to be applied, so that regions with similar spatio-temporal behaviour can be identified. The purpose of this paper is to demonstrate the use of Dirichlet process mixture models as a fully probabilistic alternative to Ward hierarchical clustering with the Levenshtein metric. This approach advances the existing literature by proposing a fully generative model that makes the underlying assumptions explicit and allows prediction for new test points. Additionally, it provides a principled way to select the number of clusters without relying on heuristics. An efficient Gibbs sampling Markov chain Monte Carlo algorithm is presented, and it is shown how its output can be mapped and analysed. Difficulties inherent in mapping and plotting high-dimensional mathematical objects are discussed, and practical solutions are proposed.
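
The preprocessing the abstract describes (quadrant assignment at each time step, then comparing the resulting signature strings with the Levenshtein metric) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the spatial weights matrix is assumed row-standardised, values are z-scored with the population standard deviation, ties at zero are assigned to the "high" side, and quadrants are coded as the characters 1 (HH), 2 (LH), 3 (LL) and 4 (HL). The helper names `quadrant_codes`, `msqs` and `levenshtein` are hypothetical, and the kernel construction and Ward clustering step are not reproduced.

```python
from typing import List

import numpy as np


def quadrant_codes(values: np.ndarray, w: np.ndarray) -> List[str]:
    """Moran scatterplot quadrant of every region at a single time step.

    values: shape (n,), the quantity of interest in each of n regions.
    w: shape (n, n), spatial weights (assumed row-standardised, zero diagonal).
    """
    z = (values - values.mean()) / values.std()  # standardised values (x-axis)
    lag = w @ z  # spatial lag: weighted mean of neighbours' z-scores (y-axis)
    codes = []
    for zi, li in zip(z, lag):
        if zi >= 0 and li >= 0:
            codes.append("1")  # HH: high value, high-valued neighbours
        elif zi < 0 and li >= 0:
            codes.append("2")  # LH: low value, high-valued neighbours
        elif zi < 0:
            codes.append("3")  # LL: low value, low-valued neighbours
        else:
            codes.append("4")  # HL: high value, low-valued neighbours
    return codes


def msqs(series: np.ndarray, w: np.ndarray) -> List[str]:
    """series: shape (T, n). Returns one length-T signature string per region."""
    steps = [quadrant_codes(series[t], w) for t in range(series.shape[0])]
    return ["".join(step[i] for step in steps) for i in range(series.shape[1])]


def levenshtein(a: str, b: str) -> int:
    """Fewest single-character insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]
```

For four regions in a row (weights `[[0, 1, 0, 0], [0.5, 0, 0.5, 0], [0, 0.5, 0, 0.5], [0, 0, 1, 0]]`) observed at three time steps with values `[10, 8, 2, 1]`, `[1, 2, 8, 10]` and `[10, 1, 10, 1]`, the signatures come out as `134`, `132`, `314` and `312`; the first two regions differ only in their final quadrant and are at Levenshtein distance 1.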

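A Dirichlet process mixture with a Gibbs sampler can be illustrated with the generic collapsed (Chinese restaurant process) sampler below. The abstract does not state the likelihood used in the paper, so the model here is an assumption: each cluster has, at every time step, its own categorical distribution over the four quadrants with a symmetric Dirichlet(`beta`) prior, and cluster memberships follow a Chinese restaurant process with concentration `alpha`. The function name `dp_mixture_gibbs` and its parameters are hypothetical, and this is not necessarily the "efficient Gibbs sampling" algorithm the paper presents. It does show the property the abstract highlights: the number of clusters is inferred from the data rather than fixed in advance.

```python
import numpy as np


def dp_mixture_gibbs(x, alpha=1.0, beta=0.5, n_quadrants=4, n_iter=100, seed=0):
    """Collapsed Gibbs sampler for an (assumed) Dirichlet process mixture of MSQS.

    Assumed model: per cluster and per time step, a categorical distribution over
    quadrants with a symmetric Dirichlet(beta) prior; memberships follow a Chinese
    restaurant process with concentration alpha.
    x: int array of shape (n, T) holding quadrant codes 0..n_quadrants-1.
    Returns one cluster label per region after the final sweep.
    """
    rng = np.random.default_rng(seed)
    n, T = x.shape
    cols = np.arange(T)
    z = np.arange(n)  # start with every region in its own cluster
    counts, sizes = {}, {}
    for i in range(n):
        counts[i] = np.zeros((T, n_quadrants))  # quadrant counts per time step
        counts[i][cols, x[i]] = 1
        sizes[i] = 1
    next_label = n
    log_new = np.log(alpha) - T * np.log(n_quadrants)  # weight of opening a new cluster

    for _ in range(n_iter):
        for i in range(n):
            k_old = z[i]
            counts[k_old][cols, x[i]] -= 1  # remove region i from its cluster
            sizes[k_old] -= 1
            if sizes[k_old] == 0:  # discard clusters that became empty
                del counts[k_old], sizes[k_old]
            labels = list(sizes)
            log_p = np.empty(len(labels) + 1)
            for j, k in enumerate(labels):
                # Dirichlet-categorical posterior predictive, one factor per time step
                pred = (counts[k][cols, x[i]] + beta) / (sizes[k] + n_quadrants * beta)
                log_p[j] = np.log(sizes[k]) + np.log(pred).sum()
            log_p[-1] = log_new
            p = np.exp(log_p - log_p.max())  # normalise in a numerically stable way
            choice = rng.choice(len(p), p=p / p.sum())
            if choice == len(labels):  # open a new cluster
                k_new = next_label
                next_label += 1
                counts[k_new] = np.zeros((T, n_quadrants))
                sizes[k_new] = 0
            else:
                k_new = labels[choice]
            z[i] = k_new  # add region i to its sampled cluster
            counts[k_new][cols, x[i]] += 1
            sizes[k_new] += 1
    return z
```

The per-cluster predictive probabilities computed in the inner loop are also what would be used to assign a new, unseen region to an existing cluster, which is one sense in which a generative model "allows prediction of new test points". In practice one would keep many sweeps after burn-in rather than only the final state.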
Original language: English
Pages (from-to): 52-65
Number of pages: 14
Journal: Environmetrics
Volume: 26
Issue number: 1
DOI: 10.1002/env.2302
Publication status: Published - 1 Feb 2015
Externally published: Yes

Keywords

  • Bayesian statistics
  • Clustering
  • Dirichlet process
  • Fire modelling
  • Machine learning
  • Spatial statistics

Cite this

Rohde, D.; Corcoran, J.; McGee, T.R.; Wickes, R.; Townsley, M. / Probabilistic methods of analysis for the time series Moran scatterplot quadrant signature. In: Environmetrics. 2015; Vol. 26, No. 1. pp. 52-65.
@article{d820dbb1ae8c4507baf6c98622dadf3c,
title = "Probabilistic methods of analysis for the time series Moran scatterplot quadrant signature",
keywords = "Bayesian statistics, Clustering, Dirichlet process, Fire modelling, Machine learning, Spatial statistics",
author = "D. Rohde and J. Corcoran and T.R. McGee and R. Wickes and M. Townsley",
year = "2015",
month = "2",
day = "1",
doi = "10.1002/env.2302",
language = "English",
volume = "26",
pages = "52--65",
journal = "Environmetrics",
issn = "1180-4009",
publisher = "Wiley-Blackwell",
number = "1",

}

Probabilistic methods of analysis for the time series Moran scatterplot quadrant signature. / Rohde, D.; Corcoran, J.; McGee, T.R.; Wickes, R.; Townsley, M.

In: Environmetrics, Vol. 26, No. 1, 01.02.2015, p. 52-65.

TY - JOUR

T1 - Probabilistic methods of analysis for the time series Moran scatterplot quadrant signature

AU - Rohde, D.

AU - Corcoran, J.

AU - McGee, T.R.

AU - Wickes, R.

AU - Townsley, M.

PY - 2015/2/1

Y1 - 2015/2/1

AB - Recently, time series Moran scatterplot quadrant signatures (MSQS) have been introduced as a means of spatio-temporal analysis. Moran scatter plots summarise a set of local Moran statistics that identify how a quantity of interest relates to its neighbours. By reducing these scatter plots to just one of four quadrant locations at fixed time intervals, a time series MSQS is obtained. Clustering on the time series MSQS allows regions that show similar behaviour relative to their neighbours to be identified. It has recently been shown that the Levenshtein metric, a distance metric originally used for string comparison, can be used to construct a kernel that allows standard Ward hierarchical clustering to be applied. The result is that regions with similar spatio-temporal behaviour can be identified. The purpose of this paper is to demonstrate the use of Dirichlet Process mixture models as a fully probabilistic alternative to applying Ward hierarchical clustering with the Levenshtein metric. This approach offers an advance to the existing literature as it proposes a fully generative model that articulates the underlying assumptions and allows prediction of new test points. Additionally, it provides a principled method that avoids using heuristics to select the number of clusters. An efficient Gibbs sampling Markov chain Monte Carlo algorithm is presented, and it is demonstrated how the output of this can be mapped and analysed. Difficulties inherent in mapping and plotting high-dimensional mathematical objects are discussed, and practical solutions are proposed.

KW - Bayesian statistics

KW - Clustering

KW - Dirichlet process

KW - Fire modelling

KW - Machine learning

KW - Spatial statistics

UR - http://www.scopus.com/inward/record.url?scp=84921027235&partnerID=8YFLogxK

U2 - 10.1002/env.2302

DO - 10.1002/env.2302

M3 - Article

VL - 26

SP - 52

EP - 65

JO - Environmetrics

JF - Environmetrics

SN - 1180-4009

IS - 1

ER -