Performance Differences of Large Language Models in IELTS Writing Evaluation
Luan Keyun
Department of English, Hebei University of Technology
Abstract:
With the IELTS exam transitioning entirely to computer-based testing, the shift toward electronic submission and feedback aligns with the growing trend of using large language models for writing self-assessment. Traditional teacher evaluation suffers from high cost, limited coverage, and delayed feedback, whereas large language models, with their low cost, immediacy, and reproducibility, have emerged as a core alternative tool for student self-assessment. However, significant scoring discrepancies between models and inconsistent feedback across repeated evaluations have produced mixed results in student self-assessment. To address this, the study focuses on IELTS Writing Task 2. It selects three mainstream Chinese large language models (Doubao, Yuanbao, and DeepSeek) and systematically compares their performance on four dimensions (scoring accuracy, scoring stability, feedback accuracy, and feedback consistency), using a corpus of IELTS essays covering different score bands and topic types. The findings indicate that no all-purpose model currently exists; test-takers should select or combine models according to their core needs. The models excel at evaluating surface-level language dimensions but have limited capacity to assess higher-order thinking dimensions such as argumentation and structure. The models also exhibit no fatigue effects, which opens new possibilities for large-scale standardized writing assessment. This study provides empirical evidence for the development of a human-machine collaborative IELTS writing assessment system.
Key Words:
large language model; IELTS; writing evaluation; automated essay scoring; second language writing