OpenAI founding team member and former Director of AI at Tesla Andrej Karpathy shared on X a simple but powerful method for countering thought bias with LLMs. In his own account, he wrote a blog post, had an LLM revise it repeatedly for four hours, and found the argument very convincing when he read the result. He then asked the LLM to argue against its own viewpoint, and the LLM dismantled the entire article, convincing Karpathy that the opposite position was the correct one. This article summarizes the spirit of the method, the implementation steps, and the underlying warning about LLM "sycophancy."
Karpathy's observation: LLMs don't just agree with you, they can dismantle you
Karpathy's core observation, in one sentence: LLMs will express opinions when asked, but they are extremely good at arguing in any direction. This means:
When you ask an LLM, "Is my argument correct?", it usually finds reasons to support you (the sycophancy problem)
When you ask an LLM, "Please refute this viewpoint," it will dismantle your argument with equal force
The result: "the LLM agrees with me" may just mean the LLM is complying with how you asked, not rendering a genuinely objective judgment
The value of this observation isn't that "LLMs are unreliable," but that you can systematically exploit this trait, treating the LLM as a tool that forces you to confront the counterarguments. Karpathy calls it "actually a super-useful tool for forming your own opinions."
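To make the direction-following concrete, here is a minimal sketch that sends the same thesis with two opposite framings. It assumes the OpenAI Python SDK and a `gpt-4o` model purely for illustration; the thesis, prompts, and model name are not from Karpathy's post, and any chat-completion API would behave the same way.

```python
# Minimal sketch of the framing effect. The OpenAI Python SDK is used as an
# example client; thesis, prompts, and model are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

THESIS = "Remote work makes engineering teams more productive."

def ask(framing: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed model; substitute whichever you use
        messages=[{"role": "user", "content": f"{framing}\n\nThesis: {THESIS}"}],
    )
    return resp.choices[0].message.content

# Same thesis, opposite framings. Sycophancy predicts each answer
# will lean toward the direction the question implies.
supportive = ask("Is my argument correct? Explain why it holds up.")
critical = ask("Please refute this viewpoint with the strongest counterarguments.")

print(supportive)
print("---")
print(critical)
```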
Implementation steps: 4 prompts to make the LLM dismantle your argument
Karpathy's method breaks down into four repeatable steps:
Step 1: Have the LLM strengthen your argument in its original direction. Do what Karpathy did: write a solid first draft, let the LLM revise it repeatedly for 1–4 hours, and refine the points until the article reads as airtight to you. This step establishes the baseline.
Step 2: Start a new conversation and ask for counterarguments. The key is to start a new conversation rather than continuing in the original thread: in the original dialogue, the LLM has already adopted the goal of "helping you write this article," so even a request to refute will be colored by that context. The new prompt: "The core argument of this article is X. Please list 5 strong counterarguments; expand each in under 200 words, citing specific examples or counterexamples." (A code sketch of steps 2–4 appears after step 4.)
Step 3: Ask the LLM to write a complete article from the opposing stance. Don't settle for bullet points; have it write a full rebuttal with the same argumentative strength and structure as your draft. This opposing article often hits blind spots you hadn't considered.
Step 4: Compare the two articles and judge which side's arguments are closer to reality. Have the LLM list the objective evidence behind each side's points, identify what can be verified, and flag what is merely rhetorical technique. The final judgment is yours, not the LLM's.
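Below is a sketch of steps 2–4 as code, under the same assumptions as the earlier example (OpenAI Python SDK, `gpt-4o`). The essential detail is that every step runs as a one-shot call with a brand-new message list, which is what "start a new conversation" means at the API level. The file name, the `core_claim` placeholder, and the exact prompt wording are hypothetical.

```python
# Sketch of steps 2-4: each call uses a fresh messages list (no shared
# history), so the model is never primed by the "help me write this" context.
from openai import OpenAI

client = OpenAI()

def fresh_chat(prompt: str) -> str:
    """One-shot call with no prior context: a 'new conversation'."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

draft = open("draft.md").read()  # the article you polished in step 1
core_claim = "X"                 # state your thesis in one sentence

# Step 2: counterarguments, asked cold.
counters = fresh_chat(
    f"The core argument of this article is: {core_claim}. "
    "List 5 strong counterarguments; expand each in under 200 words, "
    "citing specific examples or counterexamples."
)

# Step 3: a full opposing article with comparable strength and structure.
rebuttal = fresh_chat(
    f"Write a complete article arguing against this claim: {core_claim}. "
    "Match the rigor and structure of a well-edited op-ed."
)

# Step 4: surface the checkable evidence on both sides; you judge, not the model.
evidence = fresh_chat(
    "For each point in the two articles below, list the objective evidence "
    "it rests on and flag what is merely rhetorical.\n\n"
    f"--- ARTICLE A ---\n{draft}\n\n--- ARTICLE B ---\n{rebuttal}"
)
print(evidence)
```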
Why this method works: the symmetry of LLM training data
LLMs can argue both sides of the same topic because of what their training data contains: online debates, academic papers, media commentary. Almost every topic appears with arguments both for and against, and during training the model absorbs these stances, argument patterns, and rhetorical techniques.
This means the LLM's argumentative ability is symmetrical in both directions: whichever direction you point it, it can strengthen that direction. For anyone trying to "form their own opinions," this symmetry has two implications:
Don’t trust the LLM’s “conclusion” (because it can give any conclusion)
You can trust the LLM’s “argument generation” (because it can show the strongest arguments for any direction)
The right way to use an LLM is as an "argument generation engine," not a "verdict judge." Karpathy's method leverages exactly this.
Common mistake: taking “LLM agrees” as “objectively true”
Karpathy has warned repeatedly on X about LLMs' sycophancy tendency: models are trained to "make users happy," so they tend to confirm users' existing views. On May 1, Anthropic also published an evaluation of Claude's sycophancy, finding an agreement rate of 25% for emotional questions and 38% for spirituality-related questions.
In practice, common mistakes include:
Asking an LLM for investment, health, or career advice, then acting on its encouraging response, when the LLM is often just complying with how you framed the question
Using an LLM to write a business plan: it refines every step until everything looks perfect, but you never asked it to argue back about where the underlying idea might fail
Using an LLM to review someone else's work, and receiving criticism that may simply reflect a framing that implied "I think this work isn't good"
The common thread in all three scenarios: you treat the LLM as a "cognitive amplifier," and it amplifies your existing biases and reflects them back at you. Karpathy's counterargument method is the simplest tool for breaking this loop.
Advanced usage: let two LLMs debate each other
A more advanced setup is to have two LLMs debate each other: one assigned to support your viewpoint, the other assigned to refute it, speaking in alternating turns while you simply watch the debate unfold. This pattern removes the problem of you steering the LLM in a particular direction, letting each stance independently find its strongest arguments.
In practice, Claude Code, OpenAI Codex, and a local Ollama setup can all run this: set two system prompts and alternately feed the same topic to each side (a sketch follows below). Some people pair Claude Opus with Sonnet, or mix vendors (Claude vs. GPT), turning the fact that different providers have different training biases into a hedge of its own.
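Here is a sketch of that alternating loop, using the OpenAI SDK for both debaters to keep the example self-contained. The system prompts, topic, round count, and model name are all illustrative, and swapping a different vendor's client into one side is a one-line change.

```python
# Sketch of a two-debater loop: each side has its own system prompt and sees
# the shared transcript; you only read the output. Details are illustrative.
from openai import OpenAI

client = OpenAI()

TOPIC = "Thesis under debate: X."
SIDES = {
    "pro": "You are debating FOR the thesis. Make the strongest case; rebut your opponent directly.",
    "con": "You are debating AGAINST the thesis. Make the strongest case; rebut your opponent directly.",
}

def speak(side: str, transcript: list[str]) -> str:
    # Each debater gets its own system prompt plus the shared transcript.
    history = "\n\n".join(transcript) or "(no turns yet; open the debate)"
    resp = client.chat.completions.create(
        model="gpt-4o",  # swap in a different model/vendor per side to hedge training bias
        messages=[
            {"role": "system", "content": SIDES[side]},
            {"role": "user", "content": f"{TOPIC}\n\nTranscript so far:\n{history}\n\nYour turn."},
        ],
    )
    return resp.choices[0].message.content

transcript: list[str] = []
for round_no in range(3):          # three rounds per side; adjust to taste
    for side in ("pro", "con"):
        turn = speak(side, transcript)
        transcript.append(f"[{side.upper()} round {round_no + 1}]\n{turn}")

print("\n\n".join(transcript))
```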
Why Karpathy’s method is suitable for content production in 2026
In 2026, most content creators use LLMs to assist their writing, and opinion homogenization in public discourse is likely to worsen: when everyone uses the same LLMs, everyone gets the same conclusions reinforced. Karpathy's "argue the opposite" is, in practice, a cognitive de-homogenization tool at the individual level.
For writers, the concrete value of the method is as a final check before publishing: have the LLM refute your own viewpoint, surface counterexamples and blind spots you may have missed, then decide whether to address them. The resulting article will have more cognitive depth than a version that only used the LLM to reinforce the original viewpoint.
Whether you're writing analysis reports, marketing copy, product decision documents, or academic papers, spending 30 minutes before you hit "publish" letting the LLM dismantle your argument from the opposing side is one of the cheapest quality-assurance mechanisms of 2026.
This article, “Karpathy ‘lets LLM refute itself’: a 4-step method to use AI to counter thought bias,” first appeared on 鏈新聞 ABMedia.