Informatics tools to assess the success of procedural harmonization in preclinical multicenter biomarker discovery study on post-traumatic epileptogenesis

Robert Ciszek, Xavier Ekolle Ndode-Ekane, Cesar Santana Gomez, Pablo M. Casillas-Espinosa, Idrish Ali, Gregory Smith, Noora Puhakka, Niina Lapinlampi, Pedro Andrade, Alaa Kamnaksh, Riikka Immonen, Tomi Paananen, Matthew R. Hudson, Rhys D. Brady, Sandy R. Shultz, Terence J. O'Brien, Richard J. Staba, Jussi Tohka, Asla Pitkänen

Research output: Contribution to journal, Article, Research, peer-review


Abstract

The Epilepsy Bioinformatics Study for Antiepileptogenic Therapy (EpiBioS4Rx) is a National Institute of Neurological Disorders and Stroke-funded Centers-Without-Walls international multidisciplinary study aimed at preventing epileptogenesis. The preclinical biomarker discovery in EpiBioS4Rx applies a multicenter study design so that the number of animals required for adequate statistical power can be studied efficiently. Further, the use of multiple centers mimics the clinical trial situation and therefore potentially increases the chance of successful clinical translation of the study outcomes. Successful implementation requires harmonization of procedures and data analyses between the three contributing centers in Finland, Australia, and the USA. The objective of the present analysis was to develop metrics for assessing the success of procedural harmonization, to guide further data analyses and the planning of future multicenter preclinical studies. The interim analysis is based on data from 212 rats with lateral fluid-percussion injury or sham operation included in the biomarker discovery by April 30, 2018. The details of the protocols, including production of injury, post-injury follow-up, blood sampling, electroencephalogram recording, and magnetic resonance imaging, have been presented in the accompanying manuscripts in this Supplement. Implementation of protocols in the EpiBioS4Rx participating centers was visualized in 2D using t-distributed stochastic neighbor embedding (t-SNE). The protocols applied to each rat were represented as feature vectors of procedure-related variables (e.g., impact pressure, anesthesia time). The total number of protocol features linked to each rat was 112. Missing data were accounted for in the visualization by imputing missing values and adding the number of missing values per rat as a third dimension to the 2D t-SNE plot, resulting in a 3D overview of the protocol data.
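The handling of missing protocol data described above can be illustrated with a minimal sketch. The abstract does not state which imputation method was used, so column-wise median imputation is an assumption here, and the function name `impute_and_count_missing` and the toy values are hypothetical, not study data.

```python
from statistics import median


def impute_and_count_missing(rows):
    """Median-impute each feature column and count missing values per row.

    rows: list of equal-length feature vectors; None marks a missing value.
    Returns (imputed_rows, missing_counts).
    """
    n_features = len(rows[0])
    # Column medians over observed values only (0.0 if a column is all missing).
    medians = []
    for j in range(n_features):
        observed = [r[j] for r in rows if r[j] is not None]
        medians.append(median(observed) if observed else 0.0)
    imputed = [
        [medians[j] if v is None else v for j, v in enumerate(r)] for r in rows
    ]
    # The per-rat missing count is what would serve as the third plot axis.
    missing_counts = [sum(v is None for v in r) for r in rows]
    return imputed, missing_counts


# Hypothetical toy data: 4 rats x 3 protocol features (assumed values).
rats = [
    [2.9, 10.0, None],
    [3.1, None, 5.0],
    [3.0, 12.0, 7.0],
    [None, None, 6.0],
]
imputed, n_missing = impute_and_count_missing(rats)
print(n_missing)  # [1, 1, 0, 2]
```

The imputed matrix could then be passed to any t-SNE implementation (for example scikit-learn's `TSNE(n_components=2)`), with `n_missing` plotted as the z-coordinate of each embedded point.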
The intraclass correlation coefficient (ICC), computed using Euclidean distances, and the area under the receiver operating characteristic curve (AUC) of a k-nearest neighbor (KNN) classifier were used to quantify the degree of clustering by center. Two versions of the data were assessed: one with incomplete protocol vectors omitted and one with missing protocol data imputed. Visible clustering by center was observed in all t-SNE plots except for the day 7 neuroscores. Both ICC and AUC indicated clustering by center in all protocol variable subsets, excluding unimputed day 7 neuroscores (ICC 0.04 and AUC 0.6). For the imputed set of all protocol-related variables, ICC was 0.1 and KNN AUC was 0.92. In conclusion, both ICC and AUC indicated differences in protocol between the EpiBioS4Rx participating centers, which need to be taken into account in data analysis. Importantly, the majority of the observed differences are recoverable, as they relate to insufficient updates in record keeping. While the AUC of the KNN classifier is a more sensitive measure of protocol harmonization than ICC for data that display complex, splintered clustering, ICC and AUC provide complementary measures for assessing the degree of procedural harmonization. This experience should be helpful for other groups planning multicenter post-traumatic epileptogenesis studies in the future.
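The two clustering metrics named above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the value of k, the validation scheme, the multi-class AUC convention, or the exact ICC formula. The sketch assumes leave-one-out KNN with macro-averaged one-vs-rest AUC, and a one-way ICC whose sums of squares are derived from squared Euclidean distances. The function names and toy data are hypothetical.

```python
import math
from itertools import combinations


def knn_center_auc(X, centers, k=3):
    """Leave-one-out k-NN, macro-averaged one-vs-rest AUC for predicting center."""
    n = len(X)
    labels = sorted(set(centers))
    # Score per sample: share of its k nearest neighbours (itself excluded)
    # that come from each center.
    scores = []
    for i in range(n):
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )
        nearest = [centers[j] for j in order[:k]]
        scores.append({c: nearest.count(c) / k for c in labels})
    aucs = []
    for c in labels:
        pos = [s[c] for s, y in zip(scores, centers) if y == c]
        neg = [s[c] for s, y in zip(scores, centers) if y != c]
        # Mann-Whitney form: P(pos > neg) + 0.5 * P(pos == neg).
        wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
        aucs.append(wins / (len(pos) * len(neg)))
    return sum(aucs) / len(aucs)


def distance_icc(X, centers):
    """One-way ICC with sums of squares taken from squared Euclidean distances."""
    n = len(X)
    groups = {}
    for i, c in enumerate(centers):
        groups.setdefault(c, []).append(i)

    def ss(idx):
        # Sum of squared deviations from the centroid equals the sum of
        # pairwise squared distances divided by the number of points.
        pairs = combinations(idx, 2)
        return sum(math.dist(X[i], X[j]) ** 2 for i, j in pairs) / len(idx)

    ss_total = ss(range(n))
    ss_within = sum(ss(idx) for idx in groups.values())
    g = len(groups)
    ms_between = (ss_total - ss_within) / (g - 1)
    ms_within = ss_within / (n - g)
    # Effective group size for unbalanced designs.
    n0 = (n - sum(len(idx) ** 2 for idx in groups.values()) / n) / (g - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Hypothetical toy data: two well-separated "centers" (assumed values).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
centers = ["A"] * 4 + ["B"] * 4
auc = knn_center_auc(X, centers, k=3)
icc = distance_icc(X, centers)
print(auc)  # 1.0
```

In this formulation, well-harmonized protocols would give an AUC near 0.5 and an ICC near 0, while strong clustering by center pushes both toward 1. The KNN score depends only on local neighborhoods, which is one way to see why it can respond to splintered clusters that barely change the between-center variance used by the ICC.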

Original language: English
Pages (from-to): 17-26
Number of pages: 10
Journal: Epilepsy Research
Volume: 150
DOI: 10.1016/j.eplepsyres.2018.12.010
Publication status: Published - 1 Feb 2019

Keywords

  • Classification
  • Common data element
  • Dimensionality reduction
  • Intraclass correlation
  • k-nearest neighbor
  • Lateral fluid-percussion
  • Machine learning
  • Traumatic brain injury

Cite this

Ciszek, Robert ; Ndode-Ekane, Xavier Ekolle ; Gomez, Cesar Santana ; Casillas-Espinosa, Pablo M. ; Ali, Idrish ; Smith, Gregory ; Puhakka, Noora ; Lapinlampi, Niina ; Andrade, Pedro ; Kamnaksh, Alaa ; Immonen, Riikka ; Paananen, Tomi ; Hudson, Matthew R. ; Brady, Rhys D. ; Shultz, Sandy R. ; O'Brien, Terence J. ; Staba, Richard J. ; Tohka, Jussi ; Pitkänen, Asla. / Informatics tools to assess the success of procedural harmonization in preclinical multicenter biomarker discovery study on post-traumatic epileptogenesis. In: Epilepsy Research. 2019 ; Vol. 150. pp. 17-26.
@article{b8245cf6f3884b269b1391b7dc5360d5,
title = "Informatics tools to assess the success of procedural harmonization in preclinical multicenter biomarker discovery study on post-traumatic epileptogenesis",
keywords = "Classification, Common data element, Dimensionality reduction, Intraclass correlation, k-nearest neighbor, Lateral fluid-percussion, Machine learning, Traumatic brain injury",
author = "Robert Ciszek and Ndode-Ekane, {Xavier Ekolle} and Gomez, {Cesar Santana} and Casillas-Espinosa, {Pablo M.} and Idrish Ali and Gregory Smith and Noora Puhakka and Niina Lapinlampi and Pedro Andrade and Alaa Kamnaksh and Riikka Immonen and Tomi Paananen and Hudson, {Matthew R.} and Brady, {Rhys D.} and Shultz, {Sandy R.} and O'Brien, {Terence J.} and Staba, {Richard J.} and Jussi Tohka and Asla Pitk{\"a}nen",
year = "2019",
month = "2",
day = "1",
doi = "10.1016/j.eplepsyres.2018.12.010",
language = "English",
volume = "150",
pages = "17--26",
journal = "Epilepsy Research",
issn = "0920-1211",
publisher = "Elsevier",

}

