TY - JOUR
T1 - Prediction and interpretation of antibiotic-resistance genes occurrence at recreational beaches using machine learning models
AU - Iftikhar, Sara
AU - Karim, Asad Mustafa
AU - Karim, Aoun Murtaza
AU - Karim, Mujahid Aizaz
AU - Aslam, Muhammad
AU - Rubab, Fazila
AU - Malik, Sumera Kausar
AU - Kwon, Jeong Eun
AU - Hussain, Imran
AU - Azhar, Esam I.
AU - Kang, Se Chan
AU - Yasir, Muhammad
N1 - Funding Information:
This research work was funded by Institutional Fund Projects under grant no. (IFPIP: 1270-141-1442). Therefore, the authors gratefully acknowledge technical and financial support from the Ministry of Education and King Abdulaziz University, DSR, Jeddah, Saudi Arabia.
Publisher Copyright:
© 2022 Elsevier Ltd
PY - 2023/2/15
Y1 - 2023/2/15
N2 - Antibiotic-resistant bacteria and antibiotic resistance genes (ARGs) are pollutants of worldwide concern that seriously threaten public health and ecosystems. Machine learning (ML) prediction models have been applied to predict ARGs in beach waters. However, existing studies were conducted at a single location and achieved low prediction performance. Moreover, ML models are “black boxes” that do not reveal the internal nuances and mechanisms behind their predictions. This lack of transparency and trust can have serious consequences when these models are used for high-stakes decisions. In this study, we developed a gradient boosted regression tree (GBRT)-based ML model and then described its behavior using six explainable artificial intelligence (XAI) model-agnostic explanation methods. Using hydro-meteorological and qPCR data from beaches in South Korea and Pakistan, we developed ML prediction models for aac(6′)-Ib-cr, sul1, and tetX, which achieved root mean squared logarithmic errors of 4.9, 2.06, and 4.4, respectively, under 10-fold time-blocked cross-validation. We then analyzed the local and global behavior of the developed ML model using four interpretation methods. The developed ML models showed that water temperature, precipitation, and tide are the most important predictors of ARG occurrence at recreational beaches. We show that model-agnostic interpretation methods not only explain the behavior of the ML model but also provide insights into its behavior under new, unseen conditions. Moreover, these post-processing techniques can serve as a debugging tool for ML-based modeling.
AB - Antibiotic-resistant bacteria and antibiotic resistance genes (ARGs) are pollutants of worldwide concern that seriously threaten public health and ecosystems. Machine learning (ML) prediction models have been applied to predict ARGs in beach waters. However, existing studies were conducted at a single location and achieved low prediction performance. Moreover, ML models are “black boxes” that do not reveal the internal nuances and mechanisms behind their predictions. This lack of transparency and trust can have serious consequences when these models are used for high-stakes decisions. In this study, we developed a gradient boosted regression tree (GBRT)-based ML model and then described its behavior using six explainable artificial intelligence (XAI) model-agnostic explanation methods. Using hydro-meteorological and qPCR data from beaches in South Korea and Pakistan, we developed ML prediction models for aac(6′)-Ib-cr, sul1, and tetX, which achieved root mean squared logarithmic errors of 4.9, 2.06, and 4.4, respectively, under 10-fold time-blocked cross-validation. We then analyzed the local and global behavior of the developed ML model using four interpretation methods. The developed ML models showed that water temperature, precipitation, and tide are the most important predictors of ARG occurrence at recreational beaches. We show that model-agnostic interpretation methods not only explain the behavior of the ML model but also provide insights into its behavior under new, unseen conditions. Moreover, these post-processing techniques can serve as a debugging tool for ML-based modeling.
KW - Antibiotic resistance genes
KW - Artificial intelligence
KW - Black box models
KW - Explainable
KW - Machine learning
KW - Recreational beaches
UR - http://www.scopus.com/inward/record.url?scp=85144053888&partnerID=8YFLogxK
U2 - 10.1016/j.jenvman.2022.116969
DO - 10.1016/j.jenvman.2022.116969
M3 - Article
C2 - 36495825
AN - SCOPUS:85144053888
SN - 0301-4797
VL - 328
JO - Journal of Environmental Management
JF - Journal of Environmental Management
M1 - 116969
ER -