Welcome to NEM

A Study on Claims Risk Identification Method Based on K-Means++ and Random Forest

Gao Haoyuan

School of Computer Science and Technology, Shandong University of Technology

Abstract:

To address the challenge of risk identification in logistics claims processing, this study proposes a claims risk identification method that integrates cluster analysis and machine learning. First, the claims discrepancy ratio is defined, its distribution characteristics are analysed using kernel density estimation, and K-Means++ clustering is applied to partition the risk structure into categories. Second, a claims payout prediction model is built with a random forest, and key variables are screened using feature importance analysis. Third, to address the imbalanced distribution of samples across risk categories, SMOTE oversampling is introduced when building a random forest risk classification model. Experimental results show that the claims payout prediction model achieves a coefficient of determination (R²) of 0.75 on the test set, whilst the AUC for each category in the risk classification model exceeds 0.76, indicating that the models identify claims risk effectively.
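
As a rough illustration of the clustering step only, the following standard-library Python sketch seeds and runs K-Means++ on one-dimensional discrepancy ratios. The abstract does not give the formula for the claims discrepancy ratio, so the definition used here (the unpaid share of the claimed amount) is an assumption, as are the function names, the sample claims, and the choice of three clusters. The kernel density estimation, random forest, and SMOTE stages are not shown.

```python
import random


def discrepancy_ratio(claimed, paid):
    # ASSUMPTION: the abstract does not define the ratio; this hypothetical
    # version is the share of the claimed amount that was not paid out.
    return (claimed - paid) / claimed


def kmeanspp_seeds(points, k, rng):
    # K-Means++ seeding: the first centre is drawn uniformly; each later
    # centre is drawn with probability proportional to its squared distance
    # from the nearest centre chosen so far.
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min((p - c) ** 2 for c in centres) for p in points]
        centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres


def kmeans_1d(points, k, seed=0, max_iter=100):
    # Lloyd iterations on 1-D data, started from the K-Means++ seeds.
    rng = random.Random(seed)
    centres = kmeanspp_seeds(points, k, rng)
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p - centres[j]) ** 2)
            groups[nearest].append(p)
        # An empty group keeps its previous centre.
        updated = [sum(g) / len(g) if g else centres[j]
                   for j, g in enumerate(groups)]
        if updated == centres:
            break
        centres = updated
    return sorted(centres)


# Made-up (claimed, paid) pairs, for illustration only.
claims = [(1000, 980), (500, 495), (800, 790),
          (1200, 600), (900, 430), (700, 360),
          (1000, 100), (600, 30), (400, 50)]
ratios = [discrepancy_ratio(c, p) for c, p in claims]
centres = kmeans_1d(ratios, k=3)
print(centres)
```

The sorted centres can be read as candidate low, medium, and high risk levels. Because K-Means++ seeding is randomised, a different seed may converge to a different partition, so in practice several restarts are usually compared.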


Keywords:

claims risk identification; K-Means++; random forest; claims payout forecasting; SMOTE
