TY - JOUR
T1 - Demystifying uncertainty in PM10 susceptibility mapping using variable drop-off in extreme-gradient boosting (XGB) and random forest (RF) algorithms
AU - AlThuwaynee, Omar F.
AU - Kim, Sang Wan
AU - Najemaden, Mohamed A.
AU - Aydda, Ali
AU - Balogun, Abdul Lateef
AU - Fayyadh, Moatasem M.
AU - Park, Hyuck Jin
N1 - Funding Information:
This research was supported by Space Core Technology Development Program through the National Research Foundation of Korea funded by the Ministry of Science and ICT (2018M1A3A3A02066002), and MSIT (Ministry of Science, ICT), Korea, under the High-Potential Individuals Global Training Program (2019-0-01561) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).
Funding Information:
We express our gratitude to Ministry of Environment, Iraq for providing air quality record data, and Scientists Adoption Academy research network (scadacademy.com), with special thanks to Ms. Badal Pokharel for literature feedback.
Publisher Copyright:
© 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2021/8
Y1 - 2021/8
N2 - This study investigates uncertainty in machine learning that can occur when there is significant variance in the prediction importance level of the independent variables, especially when the ROC fails to reflect the unbalanced effect of prediction variables. A variable drop-off loop function, based on the concept of early termination for reduction of model capacity, regularization, and generalization control, was tested. A susceptibility index for airborne particulate matter of less than 10 μm diameter (PM10) was modeled using monthly maximum values and spectral bands and indices from Landsat 8 imagery, and Open Street Maps were used to prepare a range of independent variables. Probability and classification index maps were prepared using extreme-gradient boosting (XGBOOST) and random forest (RF) algorithms. These were assessed against utility criteria such as a confusion matrix of overall accuracy, quantity of variables, processing delay, degree of overfitting, importance distribution, and area under the receiver operating characteristic curve (ROC).
KW - Air quality modeling
KW - Landsat 8 OLI/TIRS imagery
KW - Petroleum cities
KW - PM10
KW - Spectral indices
KW - Urban planning
UR - http://www.scopus.com/inward/record.url?scp=85104056850&partnerID=8YFLogxK
U2 - 10.1007/s11356-021-13255-4
DO - 10.1007/s11356-021-13255-4
M3 - Article
C2 - 33834339
AN - SCOPUS:85104056850
VL - 28
SP - 43544
EP - 43566
JO - Environmental Science and Pollution Research
JF - Environmental Science and Pollution Research
SN - 0944-1344
IS - 32
ER -