A study has found that leading generative AI models are not yet capable of grading undergraduate essays effectively: the grades the AI systems assigned matched those of human graders only half the time.
The research highlights significant shortcomings in the models' ability to discern quality: they often failed to identify either the best or the worst submissions among the essays evaluated.
These findings raise concerns about the growing reliance on AI for academic grading and suggest that current technology may prioritize style over substance in assessments.