Statistical issues with using herbarium data for the estimation of invasion lag-phases

Robin John Hyndman, Mohsen B Mesgaran, Roger D Cousens

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Current methods for using herbarium data as time series, for example to estimate the length of the invasion lag phase, often make assumptions that are both statistically and logically inappropriate. We present an alternative statistical approach, estimating the lag phase based on annual rather than cumulative data, a generalized linear model incorporating a log link for overall collection effort, and piecewise linear splines. We demonstrate the method on two species representing good and poor data quality, then apply it to two data sets comprising 448 species/region combinations. Significant lags were detected in only 28 and 40% of time series, a much lower level than the 95 and 77% found in previous analyses of the same data. In a case with high quality data, a lag was concluded even though during the lag the locations of herbarium collections indicated that it was spreading rapidly at a continental scale. In species with few records, results were sensitive to the way in which zeroes were included. Overall, our method gives very good fit to the data, avoids unrealistic assumptions of other methods and gives more reliable estimates of confidence. However, given the poor representation of herbarium samples in the early stages of invasions and the fact that they do not constitute a structured survey of abundance, we warn against over-reliance on statistical analysis of such data to reach conclusions about the dynamics of invasions.
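As a rough illustration of the kind of model the abstract describes, the sketch below fits a Poisson regression to annual counts, with a log link, the log of overall collection effort as an offset, and a single piecewise linear "hinge" term. It is not the authors' implementation: the one-knot simplification, the grid search over knot years, the function names (`fit_hinge`, `estimate_lag`) and the synthetic data are all assumptions made for illustration, and the paper's model may use more knots and a different inference procedure.

```python
import math

def mean_counts(effort, x, b0, b1):
    # Log link with log(effort) as an offset: mu = effort * exp(b0 + b1 * x).
    # The linear predictor is capped only to avoid floating-point overflow.
    return [e * math.exp(min(b0 + b1 * xi, 700.0)) for e, xi in zip(effort, x)]

def poisson_loglik(counts, mu):
    # Poisson log-likelihood up to a constant (the log(count!) term is dropped).
    return sum(c * math.log(m) - m for c, m in zip(counts, mu))

def fit_hinge(years, counts, effort, knot, max_iter=100):
    """Fit log(mu) = log(effort) + b0 + b1 * max(0, year - knot) by Newton's method."""
    x = [max(0.0, y - knot) for y in years]
    b0, b1 = math.log(sum(counts) / sum(effort)), 0.0  # start from a constant rate
    mu = mean_counts(effort, x, b0, b1)
    ll = poisson_loglik(counts, mu)
    for _ in range(max_iter):
        # Score vector and Fisher information for the two coefficients.
        g0 = sum(c - m for c, m in zip(counts, mu))
        g1 = sum((c - m) * xi for c, m, xi in zip(counts, mu, x))
        h00 = sum(mu)
        h01 = sum(m * xi for m, xi in zip(mu, x))
        h11 = sum(m * xi * xi for m, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        if max(abs(d0), abs(d1)) < 1e-9:
            break
        # Halve the step until the log-likelihood stops decreasing.
        step = 1.0
        while True:
            nb0, nb1 = b0 + step * d0, b1 + step * d1
            nmu = mean_counts(effort, x, nb0, nb1)
            nll = poisson_loglik(counts, nmu)
            if nll >= ll or step < 1e-8:
                break
            step /= 2.0
        b0, b1, mu, ll = nb0, nb1, nmu, nll
    return b0, b1, ll

def estimate_lag(years, counts, effort, margin=5):
    """Grid search over candidate knot years; keep the highest log-likelihood."""
    best = None
    for knot in years[margin:-margin]:
        b0, b1, ll = fit_hinge(years, counts, effort, knot)
        if best is None or ll > best[3]:
            best = (knot, b0, b1, ll)
    return best

# Synthetic, noise-free example (hypothetical numbers): collection effort grows
# steadily, and the species' share of records is flat until 1950, then rises.
years = list(range(1900, 2000))
effort = [1000 + 10 * (y - 1900) for y in years]
true_rate = [0.005 * math.exp(0.08 * max(0, y - 1950)) for y in years]
counts = [round(e * r) for e, r in zip(effort, true_rate)]

knot, b0, b1, ll = estimate_lag(years, counts, effort)
print("estimated end of lag:", knot, "lag length:", knot - years[0])
print("post-lag growth rate per year:", round(b1, 3))
```

Because the offset absorbs the growth in overall collecting, a rise in raw annual counts that merely tracks effort does not register as the end of a lag. The sketch gives only a point estimate; it does not test whether a lag is statistically significant or produce the confidence estimates the abstract refers to.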
Original language: English
Pages (from-to): 3371-3381
Number of pages: 11
Journal: Biological Invasions
Volume: 17
Issue number: 12
DOI: 10.1007/s10530-015-0962-8
Publication status: Published - 2015

Cite this

Hyndman, Robin John ; Mesgaran, Mohsen B ; Cousens, Roger D. / Statistical issues with using herbarium data for the estimation of invasion lag-phases. In: Biological Invasions. 2015 ; Vol. 17, No. 12. pp. 3371 - 3381.
@article{ea52beb4e4694e0aa21248c736884175,
title = "Statistical issues with using herbarium data for the estimation of invasion lag-phases",
abstract = "Current methods for using herbarium data as time series, for example to estimate the length of the invasion lag phase, often make assumptions that are both statistically and logically inappropriate. We present an alternative statistical approach, estimating the lag phase based on annual rather than cumulative data, a generalized linear model incorporating a log link for overall collection effort, and piecewise linear splines. We demonstrate the method on two species representing good and poor data quality, then apply it to two data sets comprising 448 species/region combinations. Significant lags were detected in only 28 and 40\% of time series, a much lower level than the 95 and 77\% found in previous analyses of the same data. In a case with high quality data, a lag was concluded even though during the lag the locations of herbarium collections indicated that it was spreading rapidly at a continental scale. In species with few records, results were sensitive to the way in which zeroes were included. Overall, our method gives very good fit to the data, avoids unrealistic assumptions of other methods and gives more reliable estimates of confidence. However, given the poor representation of herbarium samples in the early stages of invasions and the fact that they do not constitute a structured survey of abundance, we warn against over-reliance on statistical analysis of such data to reach conclusions about the dynamics of invasions.",
author = "Hyndman, {Robin John} and Mesgaran, {Mohsen B} and Cousens, {Roger D}",
year = "2015",
doi = "10.1007/s10530-015-0962-8",
language = "English",
volume = "17",
pages = "3371--3381",
journal = "Biological Invasions",
issn = "1387-3547",
publisher = "Springer-Verlag London Ltd.",
number = "12",

}
