AI researcher Aran Komatsuzaki recently published an experimental analysis on X exposing a serious "non-English tax" in the tokenizers used by mainstream large language models (LLMs). Anthropic's Claude models in particular can consume nearly three times as many tokens for Chinese, Japanese, and Korean text, sparking debate in the community.
Experimental method: using a classic essay to quantify language cost differences
Using Rich Sutton's classic essay "The Bitter Lesson" as the source material, Komatsuzaki translated it into multiple languages, including Chinese, Hindi, Arabic, Korean, and Japanese, then fed each version into the tokenizers of major models and counted the tokens consumed. The experiment takes the English version as tokenized by OpenAI as the 1.0× baseline and compares each model's processing efficiency across languages via normalized multipliers.
Token counts directly determine API usage fees and response latency: the more tokens, the higher the cost and the slower the response. Differences in tokenizer efficiency therefore show up directly in users' wallets and overall experience.
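To make the measurement concrete, here is a minimal sketch of the multiplier calculation using OpenAI's open-source tiktoken library. The file names are hypothetical placeholders for the essay and its translations, and Claude-side counts would instead have to come from Anthropic's token-counting API:

```python
# Minimal sketch of the multiplier measurement with OpenAI's tiktoken library.
# File names are hypothetical stand-ins for the essay and its translations;
# Claude-side counts would come from Anthropic's token-counting API instead.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # encoder used by recent OpenAI models

def token_multiplier(translation: str, english: str) -> float:
    """Tokens needed for a translation, relative to the English baseline (1.0x)."""
    return len(enc.encode(translation)) / len(enc.encode(english))

english_text = open("bitter_lesson_en.txt", encoding="utf-8").read()
chinese_text = open("bitter_lesson_zh.txt", encoding="utf-8").read()
print(f"Chinese multiplier: {token_multiplier(chinese_text, english_text):.2f}x")
```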
Komatsuzaki also shared a website he built himself that calculates token usage.
Does AI discriminate? Claude's "language tax" is the highest, and Hindi is hit hardest
OpenAI vs. Anthropic per-language token consumption multiplier bar chart
The data show that OpenAI's token multipliers stay within 1.4× across languages, while the gaps for Anthropic's Claude are far more pronounced:
Hindi: 3.24× (Claude) vs. 1.37× (OpenAI)
Arabic: 2.86× (Claude) vs. 1.31× (OpenAI)
Russian: 2.04× (Claude) vs. 1.31× (OpenAI)
Chinese: 1.71× (Claude) vs. 1.15× (OpenAI)
In other words, an Indian developer using Claude's API to process Hindi content may pay more than three times the cost of the same task in English, and response speed drops noticeably as the token count balloons.
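A quick back-of-envelope calculation shows how those multipliers compound into cost. The per-token price and baseline token count below are illustrative placeholders, not Anthropic's actual rates; only the multipliers come from the thread:

```python
# Illustrative cost arithmetic only: PRICE_PER_MTOK is a hypothetical rate,
# and the multipliers are the Claude figures reported in the thread.
PRICE_PER_MTOK = 3.00        # assumed USD per million input tokens
BASELINE_TOKENS = 100_000    # assumed tokens for the English version of a task

for lang, mult in [("English", 1.00), ("Chinese", 1.71), ("Hindi", 3.24)]:
    tokens = BASELINE_TOKENS * mult
    print(f"{lang:8s} {tokens:>9,.0f} tokens -> ${tokens * PRICE_PER_MTOK / 1e6:.2f}")
```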
Six-model comparison: China-based models catch up, Gemini performs best
Cross-model cross-language token consumption multiplier heat map
Komatsuzaki's follow-up post expanded the comparison to include models such as Gemini 3.1, Qwen 3.6, DeepSeek V4, and Kimi K2.6. The results show:
Gemini 3.1: 1.22× (most friendly to non-English users)
Qwen 3.6: 1.23×
OpenAI: 1.33×
DeepSeek V4: 1.49×
Kimi K2.6: 1.76×
Anthropic: 2.07× (least friendly to non-English users)
The data also show that token consumption for Chinese on Qwen (0.85×), DeepSeek (0.87×), and Kimi (0.81×) is lower than the English baseline, indicating that the China-based models have been heavily optimized for Chinese. Komatsuzaki himself admitted in the replies: "I didn't expect Claude to be this bad and unbalanced."
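As a rough illustration of that optimization, one can compare Chinese token counts under an OpenAI encoder against a publicly released Qwen tokenizer. The thread does not name the exact tokenizer versions it measured, so the Qwen2.5 checkpoint and sample sentence below are assumptions:

```python
# Sketch: Chinese token counts under an OpenAI encoder vs. a Chinese-optimized
# open tokenizer. The Qwen checkpoint is an assumption; the thread does not
# name the exact tokenizer versions it measured.
import tiktoken
from transformers import AutoTokenizer

text_zh = "苦涩的教训：七十年人工智能研究给我们的最重要一课。"  # sample sentence

openai_enc = tiktoken.get_encoding("o200k_base")
qwen_tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

print("OpenAI tokens:", len(openai_enc.encode(text_zh)))
print("Qwen tokens:  ", len(qwen_tok.encode(text_zh, add_special_tokens=False)))
```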
Community concerns: "cost disparity" is a serious obstacle as AI goes mainstream
The experimental results struck a chord across the X community. Many non-English developers said that in real-world use, processing the same Chinese or Korean documents with Claude does cost far more than with Gemini.
The discussion also turned to the technical root cause: the efficiency gaps stem mainly from tokenizer vocabularies being trained on data dominated by English and Latin-script text, so other writing systems get fewer dedicated entries in the vocabulary and each character or word must be split into more pieces. Even though Hindi speakers number in the hundreds of millions worldwide, the relative scarcity of high-quality training material, combined with the language's complex morphology, makes them the group that pays the most to use AI.
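A two-line experiment makes the mechanism visible: run an English sentence and a rough Hindi equivalent through the same encoder and compare tokens per character. Exact counts vary by encoder version, and the Hindi sentence here is an approximate translation chosen for illustration:

```python
# Sketch: why non-Latin scripts cost more under an English-heavy BPE vocabulary.
# The same encoder needs far more tokens per character for Hindi than for
# English, because few Devanagari merges made it into the vocabulary.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")
samples = {
    "English": "The bitter lesson is that general methods win.",
    "Hindi":   "कड़वा सबक यह है कि सामान्य तरीके जीतते हैं।",  # rough translation
}
for lang, text in samples.items():
    n_tokens = len(enc.encode(text))
    print(f"{lang:8s} {n_tokens:3d} tokens for {len(text)} characters")
```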
Some netizens argued that Anthropic's core customer base skews toward English-speaking enterprises and coding scenarios, leaving it little incentive to optimize for other languages, whereas OpenAI is said to handle multilingual content better. One put it bluntly: "AI is supposed to be a democratizing, equalizing technology, yet non-English users are the ones paying for language discrimination."
This controversy over tokenizer design is no longer just a technical issue; it also reflects the imbalance in how the AI industry is expanding globally.