TY - JOUR
T1 - Development of an algorithm to classify primary care electronic health records of alcohol consumption
T2 - Experience using data linkage from UK Biobank and primary care electronic health data sources
AU - Fraile-Navarro, David
AU - Azcoaga-Lorenzo, Amaya
AU - Agrawal, Utkarsh
AU - Jani, Bhautesh
AU - Fagbamigbe, Adeniyi
AU - Currie, Dorothy
AU - Baldacchino, Alexander
AU - Sullivan, Frank
N1 - Funding Information:
AA-L received funding from an HDRUK Fellowship for some of her research time. The study was carried out independently with no involvement from the funder. This project was funded by a research bursary from NHS Fife R&D department. Award date 10 April 2019.
Publisher Copyright:
© 2022 BMJ Publishing Group. All rights reserved.
PY - 2022/2
Y1 - 2022/2
N2 - Objectives: Develop a novel algorithm to categorise alcohol consumption using primary care electronic health records (EHRs) and assess its reliability by comparing this classification with self-reported alcohol consumption data obtained from the UK Biobank (UKB) cohort. Design: Cross-sectional study. Setting: The UKB, a population-based cohort with participants aged between 40 and 69 years recruited across the UK between 2006 and 2010. Participants: UKB participants from Scotland with linked primary care data. Primary and secondary outcome measures: Create a rule-based multiclass algorithm to classify alcohol consumption reported by Scottish UKB participants and compare it with their classification using data present in primary care EHRs based on Read Codes. We evaluated agreement metrics (simple agreement and kappa statistic). Results: Among the Scottish UKB participants, 18 838 (69%) had at least one Read Code related to alcohol consumption and were used in the classification. The agreement of alcohol consumption categories between UKB and primary care data, including assessments within 5 years, was 59.6%, and kappa was 0.23 (95% CI 0.21 to 0.24). Differences in classification between the two sources were statistically significant (p<0.001); more individuals were classified as 'sensible drinkers' and in lower alcohol consumption levels in primary care records compared with the UKB. Agreement improved slightly when using only numerical values (k=0.29; 95% CI 0.27 to 0.31) and decreased when using qualitative descriptors only (k=0.18; 95% CI 0.16 to 0.20). Conclusion: Our algorithm classifies alcohol consumption recorded in primary care EHRs into discrete meaningful categories. These results suggest that alcohol consumption may be underestimated in primary care EHRs. Using numerical values (alcohol units) may improve classification when compared with qualitative descriptors.
AB - Objectives: Develop a novel algorithm to categorise alcohol consumption using primary care electronic health records (EHRs) and assess its reliability by comparing this classification with self-reported alcohol consumption data obtained from the UK Biobank (UKB) cohort. Design: Cross-sectional study. Setting: The UKB, a population-based cohort with participants aged between 40 and 69 years recruited across the UK between 2006 and 2010. Participants: UKB participants from Scotland with linked primary care data. Primary and secondary outcome measures: Create a rule-based multiclass algorithm to classify alcohol consumption reported by Scottish UKB participants and compare it with their classification using data present in primary care EHRs based on Read Codes. We evaluated agreement metrics (simple agreement and kappa statistic). Results: Among the Scottish UKB participants, 18 838 (69%) had at least one Read Code related to alcohol consumption and were used in the classification. The agreement of alcohol consumption categories between UKB and primary care data, including assessments within 5 years, was 59.6%, and kappa was 0.23 (95% CI 0.21 to 0.24). Differences in classification between the two sources were statistically significant (p<0.001); more individuals were classified as 'sensible drinkers' and in lower alcohol consumption levels in primary care records compared with the UKB. Agreement improved slightly when using only numerical values (k=0.29; 95% CI 0.27 to 0.31) and decreased when using qualitative descriptors only (k=0.18; 95% CI 0.16 to 0.20). Conclusion: Our algorithm classifies alcohol consumption recorded in primary care EHRs into discrete meaningful categories. These results suggest that alcohol consumption may be underestimated in primary care EHRs. Using numerical values (alcohol units) may improve classification when compared with qualitative descriptors.
KW - health informatics
KW - primary care
KW - public health
UR - http://www.scopus.com/inward/record.url?scp=85123973672&partnerID=8YFLogxK
U2 - 10.1136/bmjopen-2021-054376
DO - 10.1136/bmjopen-2021-054376
M3 - Article
C2 - 35105585
AN - SCOPUS:85123973672
SN - 2044-6055
VL - 12
JO - BMJ Open
JF - BMJ Open
IS - 2
M1 - e054376
ER -