TY - JOUR
T1 - A workflow for standardising and integrating alien species distribution data
AU - Seebens, Hanno
AU - Clarke, David A.
AU - Groom, Quentin
AU - Wilson, John R.U.
AU - García-Berthou, Emili
AU - Kühn, Ingolf
AU - Roigé, Mariona
AU - Pagad, Shyama
AU - Essl, Franz
AU - Vicente, Joana
AU - Winter, Marten
AU - McGeoch, Melodie
N1 - Funding Information:
This paper is a joint effort of the sTWIST working group (Theory and Workflows for Invasive Species Tracking) supported by sDiv, the Synthesis Centre of iDiv (DFG FZT 118 – 202548816). It is a contribution to the Species Populations Working Group of the Group on Earth Observations Biodiversity Observation Network (GEO BON; https://geobon.org/ebvs/workinggroups/species-populations). We thank Wolfgang Traylor for advice on structuring the R code, Carlos Eduardo Arlé Ribeiro de Souza for providing the shapefile and Gabriele Rada for support on graphic design. Support from the following funding agencies is acknowledged: HS – Belmont Forum-BiodivERsA project AlienScenarios through the national funders German Federal Ministry of Education and Research (BMBF; grant 01LC1807A). MAM – Australian Research Council (DP200101680). FE – BiodivERsA-Belmont Forum Project AlienScenarios (FWF project no I 4011-B32). JRUW – South African Department of Forestry, Fisheries and the Environment (DFFtE) for funding noting that this publication does not necessarily represent the views or opinions of DFFtE or its employees. DAC – Australian Government Research Training Program (RTP) scholarship. EGB – Spanish Ministry of Science and Innovation (projects CGL2016-80820-R, PCIN-2016-168 and RED2018‐102571‐T) and the Government of Catalonia (ref. 2017 SGR 548). QG – Belgian Science Policies Brain program (BR/165/A1/TrIAS).
Publisher Copyright:
© 2020.
PY - 2020/7/28
Y1 - 2020/7/28
N2 - Biodiversity data are being collected at unprecedented rates. Such data often have significant value for purposes beyond the initial reason for which they were collected, particularly when they are combined and collated with other data sources. In the field of invasion ecology, however, integrating data represents a major challenge due to the notorious lack of standardisation of terminologies and categorisations, and the application of deviating concepts of biological invasions. Here, we introduce the SInAS workflow, short for Standardising and Integrating Alien Species data. The SInAS workflow standardises terminologies following Darwin Core, location names using a proposed translation table, taxon names based on the GBIF backbone taxonomy, and dates of first records based on a set of predefined rules. The output of the SInAS workflow provides various entry points that can be used both to improve coherence among the databases and to check and correct the original data. The workflow is flexible and can be easily adapted and extended to the needs of different users. We illustrate the workflow using a case-study integrating five widely used global databases of information on biological invasions. The comparison of the standardised databases revealed a surprisingly low degree of overlap, which indicates that the amount of data may currently not be fully exploited in the original databases. We highly recommend the use and development of publicly available workflows to ensure that the integration of databases is reproducible and transparent. Workflows, such as SInAS, ultimately increase trust in data, study results, and conclusions.
AB - Biodiversity data are being collected at unprecedented rates. Such data often have significant value for purposes beyond the initial reason for which they were collected, particularly when they are combined and collated with other data sources. In the field of invasion ecology, however, integrating data represents a major challenge due to the notorious lack of standardisation of terminologies and categorisations, and the application of deviating concepts of biological invasions. Here, we introduce the SInAS workflow, short for Standardising and Integrating Alien Species data. The SInAS workflow standardises terminologies following Darwin Core, location names using a proposed translation table, taxon names based on the GBIF backbone taxonomy, and dates of first records based on a set of predefined rules. The output of the SInAS workflow provides various entry points that can be used both to improve coherence among the databases and to check and correct the original data. The workflow is flexible and can be easily adapted and extended to the needs of different users. We illustrate the workflow using a case-study integrating five widely used global databases of information on biological invasions. The comparison of the standardised databases revealed a surprisingly low degree of overlap, which indicates that the amount of data may currently not be fully exploited in the original databases. We highly recommend the use and development of publicly available workflows to ensure that the integration of databases is reproducible and transparent. Workflows, such as SInAS, ultimately increase trust in data, study results, and conclusions.
KW - Darwin Core
KW - Databases
KW - GBIF
KW - Invasive alien species
KW - R software environment
KW - Reproducibility
KW - Standardisation
KW - Taxonomy
KW - Workflow
UR - http://www.scopus.com/inward/record.url?scp=85089487408&partnerID=8YFLogxK
U2 - 10.3897/NEOBIOTA.59.53578
DO - 10.3897/NEOBIOTA.59.53578
M3 - Article
AN - SCOPUS:85089487408
SN - 1619-0033
VL - 59
SP - 39
EP - 59
JO - NeoBiota
JF - NeoBiota
ER -