MiniMax Scans 200K Tokens, Discovers 4.9% Degradation in M2 Series Models

According to MiniMax's technical blog, a full vocabulary scan of the company's M2 series models revealed significant token degradation. Approximately 4.9% of the 200,000 tokens showed a notable performance decline. Japanese tokens were hardest hit at 29.7%, compared with Korean (3.3%), Russian (3.7%), Chinese (3.9%), and English (3.5%). The blog attributes the degradation to post-training: high-frequency tokens such as tool_call markers continuously update the surrounding parameters, while low-frequency tokens are pushed in incorrect directions in vector space.

MiniMax applied a synthetic-data fix built on simple token repetition tasks to stabilize the entire vocabulary. The reported results were immediate: the rate of Russian characters mixed into Japanese responses dropped from 47% to 1%, and vector stability, measured by cosine similarity, improved from a low of 0.329 to above 0.97 across all tokens.
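
To make the cosine-similarity measure concrete, here is a minimal sketch of how a per-token stability scan could look. It is not MiniMax's code: the function names, the toy two-dimensional vectors, and the 0.9 cutoff are all assumptions chosen for illustration. The idea is to compare each token's vector before and after post-training and flag tokens whose similarity falls below the cutoff.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as having no defined direction
    return dot / (norm_u * norm_v)


def scan_vocabulary(before, after, threshold=0.9):
    """Return {token: similarity} for tokens that drifted below the threshold.

    `before` and `after` map each token to its vector at two checkpoints.
    The threshold is a hypothetical cutoff, not a value from the blog.
    """
    drifted = {}
    for token, vec_before in before.items():
        sim = cosine_similarity(vec_before, after[token])
        if sim < threshold:
            drifted[token] = sim
    return drifted


# Toy data (made up): one token changes direction sharply, the others barely move.
before = {"tool_call": [1.0, 0.0], "の": [0.0, 1.0], "hello": [1.0, 1.0]}
after = {"tool_call": [0.9, 0.1], "の": [1.0, 0.3], "hello": [1.0, 0.9]}

drifted = scan_vocabulary(before, after)
degraded_share = len(drifted) / len(before)
print(sorted(drifted), f"{degraded_share:.1%}")
```

In this toy example only the token "の" is flagged, because its vector has rotated far from its original direction. A real scan would run the same comparison over every entry in the vocabulary and could then group the flagged tokens by language.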
