A Survey of Training-stage Privacy Protection for Large Language Models: The Utility-overhead Trade-off in Federated Learning

Xu Jia

doi:10.62022/JFDR.issn3007-8032.2026.04.018

Tianjin Tianshi College

Abstract:

The rapid deployment of large language models (LLMs) in highly sensitive domains such as healthcare, finance, and government services has made training-stage privacy risks increasingly difficult to ignore. Existing reviews examine the issue from the perspective of the full lifecycle, overall security, or individual privacy-preserving techniques. In these reviews, however, the training stage is often treated as just one layer of a broader risk landscape, with insufficient mechanism-level analysis and limited unified comparison across protection paradigms. Against this backdrop, this review narrows its focus to the training stage. It first examines three major categories of risk: training-process security and data quality risks, privacy inference, and training data extraction. It then explains why federated learning has become a critical framework for protecting privacy during training under multi-institutional collaboration and stringent compliance constraints. On this basis, the review develops a unified comparative framework for four major technical routes (secure aggregation and secure multi-party computation, homomorphic encryption, differential privacy, and parameter-efficient fine-tuning) across six dimensions: protected objects, privacy strength, utility impact, communication overhead, computational overhead, and engineering deployability. The analysis shows that the four routes shift their costs onto different dimensions, so no single mechanism is likely to provide comprehensive protection. A more promising direction for future research lies in protection strategies that integrate parameter efficiency, architecture awareness, and fine-grained control.

Keywords:

large language models; privacy-preserving training; federated learning; training-stage privacy risks; utility-overhead trade-off

Pages: 56-60