A Survey of Training-stage Privacy Protection for Large Language Models: The Utility-overhead Trade-off in Federated Learning
Xu Jia
Tianjin Tianshi College
Abstract:
In recent years, the rapid deployment of large language models (LLMs) in highly sensitive domains such as healthcare, finance, and government services has made privacy risks in the training stage increasingly difficult to ignore. Existing reviews have examined this issue from the perspectives of the full lifecycle, overall security, or individual privacy-preserving techniques. However, the training stage is often treated merely as one layer within a broader risk landscape, with insufficient mechanism-level analysis and limited unified comparison across different protection paradigms. Against this backdrop, this review narrows its analytical focus to the training stage. It begins by examining three major categories of risk, namely training-process security and data quality risks, privacy inference, and training data extraction, and then explains why federated learning has become a critical framework for privacy protection in training under multi-institutional collaboration and stringent compliance constraints. Building on this foundation, the review develops a unified comparative framework for four major technical routes (secure aggregation and secure multi-party computation, homomorphic encryption, differential privacy, and parameter-efficient fine-tuning) across six dimensions: protected objects, privacy strength, utility impact, communication overhead, computational overhead, and engineering deployability. The analysis shows that these four routes shift their costs to different dimensions, making it difficult for any single mechanism to achieve comprehensive protection. A more promising direction for future research lies in protection strategies that integrate parameter efficiency, architecture awareness, and fine-grained control.
Key Words:
large language models; privacy-preserving training; federated learning; training-stage privacy risks; utility-overhead trade-off