According to Beating, a recent study on agent memory by Dylan Zhang, a PhD student at the University of Illinois, found that repeatedly summarizing a model's experiences can degrade performance rather than improve it. On ARC-AGI tasks, GPT-5.4 solved 19 problems with 100% accuracy when running without memory, but after multiple rounds of memory compression over correct solution trajectories, accuracy fell to 54%. Similarly, on WebShop shopping tasks, the AWM memory method scored 0.64 with 8 expert trajectories but dropped to 0.20 with 128 trajectories, falling back to the no-memory baseline. The researchers attribute the problem to over-summarization: each abstraction step loses specific details and merges task-specific rules into generic guidance, ultimately hurting performance.
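The over-summarization failure mode described above can be illustrated with a toy sketch. This is a hypothetical model, not the paper's actual pipeline: each "compression" round keeps only the leading words of every stored rule and then deduplicates, so rules that differed only in their task-specific details collapse into one generic rule.

```python
def summarize(rules, keep_ratio=0.5):
    """Lossy compression: keep only the first fraction of each rule's words,
    then merge rules that have become indistinguishable."""
    merged = []
    for rule in rules:
        words = rule.split()
        kept = max(1, int(len(words) * keep_ratio))
        merged.append(" ".join(words[:kept]))
    # Deduplication models "merging task-specific rules into generic guidance".
    return sorted(set(merged))

# Hypothetical task-specific rules an agent might have stored.
rules = [
    "rotate grid 90 degrees then recolor cell 3 blue",
    "rotate grid 90 degrees then recolor cell 7 red",
    "mirror grid horizontally then recolor cell 3 blue",
]

memory = rules
for round_num in range(3):
    memory = summarize(memory)
    print(f"after round {round_num + 1}: {memory}")
```

After three rounds the memory contains only `["mirror", "rotate"]`: the "recolor cell 3 blue" specifics that made each rule actionable are gone, which is the kind of detail loss the study blames for the accuracy drop.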