Pattern recognition and segmentation of smart meter data

Barry McDonald, Peter Pudney, Jia Rong

Research output: Contribution to journal › Article › Research › peer-review


In Australia, smart meters automatically provide electricity suppliers with half-hour energy use data for each customer. These data can be used to classify customers into different categories. To this end, electricity supplier AGL provided MISG participants with data from 772 anonymous Victorian customers, collected between 2011-07-16 and 2012-01-30, and the corresponding series of half-hour temperature readings for Melbourne. The goals were to identify a small number of load profiles that could be used to classify customers, and to identify which customers have significant cooling loads and which have significant heating loads. For each customer there was a time series of 9552 half-hour periods, which made the dimensionality of the problem too high for cluster analysis of the entire sample data. The analysis therefore proceeded in two phases. First, the data were explored using various methods of data visualisation, including time series plots, scatterplots and heatmaps of electricity use against temperature and time, Fourier series analysis and load duration curves. Exploration suggested that some automatic data-selection rules would be useful, for example to eliminate premises with long periods of zero electricity use, presumably due to vacancy. Based on the data exploration, summary statistics were chosen to represent each customer, and these were used in the next phase, cluster analysis. Second, three approaches were used for clustering: self-organising maps, agglomerative clustering, and K-means clustering. Each of these methods produced interpretable clusters indicating different types of customer. Agglomerative clustering with complete linkage was good for picking out small, very distinctive clusters, and Ward's linkage also performed well provided sufficient clusters were allowed. Because of computational limitations, these two techniques cannot be used directly on very large samples, and AGL has hundreds of thousands of customers. However, the cluster centroids from a pilot study, such as the sample provided to MISG, could be used as initial estimates for K-means clustering, providing the twin benefits of interpretable clusters and computational efficiency.
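The pilot-then-scale idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional feature vectors are synthetic stand-ins for per-customer summary statistics, and the function names (`complete_linkage`, `kmeans`, `make_customers`) are hypothetical. A small pilot sample is clustered with naive complete-linkage agglomerative clustering, and the resulting centroids are used as the starting centroids for K-means on the full set.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def complete_linkage(points, k):
    """Naive agglomerative clustering with complete linkage.

    Repeatedly merges the two clusters whose farthest pair of members is
    closest, until k clusters remain. Only practical for a small pilot sample.
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm started from the supplied centroids."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = centroid(members)
    return labels, centroids


def make_customers(n_per_group, rng):
    """Synthetic two-feature customers around three made-up group centres."""
    centres = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    return [
        [rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)]
        for cx, cy in centres
        for _ in range(n_per_group)
    ]


rng = random.Random(0)
customers = make_customers(200, rng)   # stand-in for the full customer base
pilot = rng.sample(customers, 30)      # stand-in for the pilot sample
seeds = [centroid(c) for c in complete_linkage(pilot, k=3)]
labels, centroids = kmeans(customers, seeds)
```

In this sketch the expensive pairwise comparisons are confined to the 30-customer pilot, while each K-means pass over the full set costs time proportional to the number of customers times the number of clusters. That is the efficiency argument made in the abstract.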
Original language: English
Article number: M105
Pages (from-to): 105-150
Number of pages: 46
Journal: ANZIAM Journal
Publication status: Published - 27 May 2014
Externally published: Yes


  • Pattern recognition
  • Cluster analysis
  • Segmentation
  • Smart meters
