Experience report: anomaly detection of cloud application operations using log and cloud metric correlation analysis

Mostafa Farshchi, Jean Guy Schneider, Ingo Weber, John Grundy

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

82 Citations (Scopus)

Abstract

Failure of application operations is one of the main causes of system-wide outages in cloud environments. This applies particularly to DevOps operations such as backup, redeployment, upgrade, customized scaling, and migration, which are exposed to frequent interference from other concurrent operations, configuration changes, and resource failures. However, current practices fail to provide reliable assurance that these kinds of operations execute correctly. In this paper, we present an approach to this problem that uses a regression-based analysis technique to find the correlation between an operation's activity logs and the effect of that activity on cloud resources. The correlation model is then used to derive assertion specifications, which can be used for runtime verification of running operations and their impact on resources. We evaluated the approach on Amazon EC2 with 22 rounds of rolling upgrade operations while other types of operations were running and random faults were injected. Our experiment shows that the approach raised alarms for 115 randomly injected faults, with a precision of 92.3%.
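The general idea of correlating log activity with resource metrics and turning the fitted model into a runtime assertion can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two log event types, the tolerance rule, and the synthetic training data are all assumptions made for illustration, and ordinary least squares stands in for whatever regression variant the paper uses.

```python
import numpy as np

# Hypothetical training data (synthetic, not from the paper).
# Each row is one time window: [count of "instance launched" log events,
#                               count of "instance terminated" log events].
TRAIN_COUNTS = [[1, 0], [0, 1], [2, 1], [1, 1], [0, 0], [3, 2]]
# Observed change in the number of running instances in the same window.
TRAIN_METRIC = [1, -1, 1, 0, 0, 1]


def fit_log_metric_model(event_counts, metric_values):
    """Least-squares fit of metric ~ intercept + event_counts @ weights."""
    counts = np.asarray(event_counts, dtype=float)
    X = np.column_stack([np.ones(len(counts)), counts])
    y = np.asarray(metric_values, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual_std = float(np.std(y - X @ coef))
    return coef, residual_std


def check_assertion(coef, residual_std, event_counts, observed_metric,
                    k=3.0, min_tolerance=0.5):
    """Return (holds, predicted): does the observed metric match the logs?

    The tolerance rule (k residual standard deviations, with a floor)
    is an assumption chosen for this sketch.
    """
    predicted = float(coef[0] + np.dot(coef[1:], event_counts))
    tolerance = max(k * residual_std, min_tolerance)
    return bool(abs(observed_metric - predicted) <= tolerance), predicted


if __name__ == "__main__":
    coef, std = fit_log_metric_model(TRAIN_COUNTS, TRAIN_METRIC)
    # Logs report 2 launches and 1 termination; the metric should rise by ~1.
    for observed in (1.0, -1.0):
        holds, predicted = check_assertion(coef, std, [2, 1], observed)
        status = "ok" if holds else "ALARM"
        print(f"observed={observed:+.1f} predicted={predicted:+.2f} -> {status}")
```

In this sketch, a window whose metric change disagrees with what the logged activity predicts fails the assertion and would raise an alarm; a real deployment would need metrics and log events drawn from the monitored operation rather than synthetic values.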

Original language: English
Title of host publication: Proceedings of the 2015 IEEE 26th International Symposium on Software Reliability Engineering (ISSRE)
Editors: Marco Vieira, Katinka Wolter
Place of Publication: Washington DC USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 24-34
Number of pages: 11
ISBN (Electronic): 9781509004065, 9781509004058
Publication status: Published - 2015
Externally published: Yes
Event: International Symposium on Software Reliability Engineering 2015 - Gaithersburg, United States of America
Duration: 2 Nov 2015 to 5 Nov 2015
Conference number: 26th
https://web.archive.org/web/20171212165629/http://2015.issre.net/organizing-committee

Conference

Conference: International Symposium on Software Reliability Engineering 2015
Abbreviated title: ISSRE 2015
Country/Territory: United States of America
City: Gaithersburg
Period: 2/11/15 to 5/11/15

Keywords

  • anomaly detection
  • Cloud application operations
  • Cloud monitoring
  • DevOps
  • error detection
  • log analysis
