A Study on Claims Risk Identification Method Based on K-Means++ and Random Forest

Gao Haoyuan

doi:10.62022/JMR.issn3007-7931.2026.02.010

School of Computer Science and Technology, Shandong University of Technology

Abstract:

To address the challenge of risk identification in logistics claims processing, this study proposes a claims risk identification method that integrates cluster analysis and machine learning. First, a claims discrepancy ratio is defined, its distribution is analysed using kernel density estimation, and K-Means++ clustering is used to partition the risk structure. Second, a claims payout prediction model is built with a random forest, and key variables are screened using feature importance analysis. Third, to address the uneven distribution of samples across risk categories, SMOTE oversampling is introduced when building a random forest risk classification model. Experimental results show that the claims payout prediction model achieved a coefficient of determination of 0.75 on the test set, whilst the AUC for each category in the risk classification model exceeded 0.76, indicating that the models can identify claims risk effectively.
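The clustering and oversampling steps named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not give the definition of the claims discrepancy ratio, so the formula in `discrepancy_ratio` is an assumption, and the claim amounts, the one-dimensional simplification, and all function names are hypothetical. The random forest models are omitted.

```python
import random


def discrepancy_ratio(claimed, paid):
    # Assumed definition, not taken from the paper: the share of the
    # claimed amount that was not paid out.
    return (claimed - paid) / claimed


def kmeanspp_init(values, k, rng):
    # K-Means++ seeding: the first centre is drawn uniformly; each later
    # centre is drawn with probability proportional to its squared
    # distance from the nearest centre chosen so far.
    # Needs at least k distinct values, otherwise all weights become zero.
    centres = [rng.choice(values)]
    while len(centres) < k:
        d2 = [min((v - c) ** 2 for c in centres) for v in values]
        centres.append(rng.choices(values, weights=d2, k=1)[0])
    return centres


def nearest(v, centres):
    # Index of the centre closest to v.
    return min(range(len(centres)), key=lambda j: (v - centres[j]) ** 2)


def kmeans_1d(values, k, rng, iters=50):
    # Lloyd iterations started from the K-Means++ seeds.
    centres = kmeanspp_init(values, k, rng)
    for _ in range(iters):
        labels = [nearest(v, centres) for v in values]
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old centre if a cluster ends up empty.
            updated.append(sum(members) / len(members) if members else centres[j])
        if updated == centres:
            break
        centres = updated
    return centres, [nearest(v, centres) for v in values]


def smote_1d(minority, n_new, rng, k_neighbours=2):
    # SMOTE-style oversampling: pick a minority sample, pick one of its
    # nearest minority neighbours, and interpolate between the two.
    # Needs at least two minority samples.
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: abs(m - minority[i]))[:k_neighbours]
        nb = rng.choice(neighbours)
        synthetic.append(minority[i] + rng.random() * (nb - minority[i]))
    return synthetic


rng = random.Random(0)
# Hypothetical (claimed, paid) amounts, for illustration only.
claims = [(1000, 990), (800, 800), (500, 480), (1200, 600),
          (900, 470), (400, 190), (700, 60), (300, 0)]
ratios = [discrepancy_ratio(c, p) for c, p in claims]
centres, labels = kmeans_1d(ratios, 3, rng)
extra = smote_1d([0.91, 0.95, 1.0], n_new=4, rng=rng)
print(sorted(centres), labels, extra)
```

In practice these steps would more likely rely on library implementations, for example `KMeans(init="k-means++")` in scikit-learn and `SMOTE` in imbalanced-learn, with the random forest models trained on the resampled data.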

Key Words:

claims risk identification; K-Means++; random forest; claims payout forecast; SMOTE

Pages: 32-34