A cluster-based data splitting method for small sample and class imbalance problems in impact damage classification

Quoc Hoan Doan, Sy Hung Mai, Quang Thang Do, Duc Kien Thai

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

From collected experimental data, a rapid and precise classification model for impact damage modes (IDMs) can be developed using machine learning (ML) techniques to evaluate the impact resistance of reinforced concrete (RC) building walls. However, experimental datasets are often small and imbalanced, which significantly degrades and destabilizes classification performance. In this study, an imbalanced four-class dataset consisting of 240 missile impact tests is employed, with the smallest class containing only 10 samples. The paper aims to develop an automated classification model for IDMs using a clustering-based within-class stratified splitting technique, named WICS, combined with a well-known oversampling technique, SMOTE-NC. The approach considers not only the between-class imbalance but also the within-class distribution in order to stabilize classification performance. Four classifiers and five data splitting techniques are implemented and compared in terms of classification performance. We found that the support vector machine (SVM) classifier using WICS and SMOTE-NC achieves the best micro F1 score (0.821), Cohen's kappa score (0.700), and AUC value (0.949) with highly stable performance. Friedman and Holm's post-hoc statistical tests also confirm that WICS+SMOTE-NC outperforms the other techniques.
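The idea of a clustering-based within-class stratified split can be illustrated with a minimal sketch. It is not the authors' WICS implementation: the abstract does not specify the clustering algorithm, the number of clusters, or the split ratio, so the use of k-means, the defaults `k=2` and `test_fraction=0.2`, and the function names `kmeans_labels` and `cluster_stratified_split` are all assumptions made for illustration. The sketch clusters the samples of each class separately and then draws test samples from every cluster, so that both splits reflect the within-class distribution.

```python
import random
from collections import defaultdict


def kmeans_labels(points, k, n_iter=20, seed=0):
    """Tiny Lloyd's k-means (assumed clustering step); returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; leave it alone if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_stratified_split(X, y, test_fraction=0.2, k=2, seed=0):
    """Hypothetical helper: split indices so every within-class cluster feeds both splits."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for idx in by_class.values():
        n_clusters = min(k, len(idx))
        labels = kmeans_labels([X[i] for i in idx], n_clusters, seed=seed)
        clusters = defaultdict(list)
        for i, c in zip(idx, labels):
            clusters[c].append(i)
        for members in clusters.values():
            rng.shuffle(members)
            # Clusters with 2+ members give at least one test sample and keep at least
            # one training sample; singleton clusters stay entirely in the training split.
            n_test = 0
            if len(members) > 1:
                n_test = min(len(members) - 1, max(1, round(test_fraction * len(members))))
            test_idx.extend(members[:n_test])
            train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy imbalanced data (made up for illustration): class "A" has 6 samples, class "B" has 4.
X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3),
     (10.0, 0.0), (10.2, 0.1), (0.0, 10.0), (0.1, 10.2)]
y = ["A"] * 6 + ["B"] * 4
train_idx, test_idx = cluster_stratified_split(X, y, test_fraction=0.25, k=2)
print("train:", train_idx)
print("test: ", test_idx)
```

In a fuller pipeline, an oversampler such as SMOTE-NC would normally be fitted on the training indices only, so that synthetic minority samples never leak into the test split. In practice, a library clustering routine would likely replace the hand-written k-means used here.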

Original language: English
Article number: 108628
Journal: Applied Soft Computing
Volume: 120
State: Published - May 2022

Keywords

  • Imbalanced dataset
  • Impact damage
  • Impact loading
  • RC walls
  • Small dataset
