Gate News, April 23: Perplexity’s research team published a technical article detailing its post-training methodology for web search agents. The approach starts from two open-source Qwen3.5 models (Qwen3.5-122B-A10B and Qwen3.5-397B-A17B) and applies a two-stage pipeline: supervised fine-tuning (SFT) to establish instruction following and language consistency, followed by online reinforcement learning (RL) to optimize search accuracy and tool-use efficiency.
The RL phase uses the GRPO algorithm with two data sources. The first is a proprietary multi-hop, verifiable question-answer dataset built from internal seed queries; each question requires 2–4 hops of reasoning and is checked with multi-solver verification. The second is rubric-based general conversation data, which converts deployment requirements into objectively checkable atomic conditions so that behaviors established during SFT do not degrade.
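GRPO scores each sampled answer relative to the other answers sampled for the same prompt, rather than against a learned value function. The snippet below is a minimal sketch of that group-relative normalization only, not Perplexity's training code; the function name, the `eps` constant, and the example rewards are illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own group (GRPO-style).

    `rewards` holds one scalar reward per rollout sampled for the same prompt.
    `eps` is an assumed constant that avoids division by zero when every
    rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four rollouts: two correct (1.0), two incorrect (0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Rollouts that beat their group's average receive a positive advantage and the rest a negative one; a group in which every rollout scores the same yields zero advantage for all of them.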
Reward design uses gated aggregation: preference scores contribute only once baseline correctness is achieved (the answer matches for question-answer data, or all rubric criteria are met), which prevents high preference scores from masking factual errors. Efficiency penalties use within-group anchoring: tool calls and generation length that exceed the baseline set by the correct answers in the same group incur smooth penalties.
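The gating and anchoring ideas can be sketched in a few lines. This is an illustration under stated assumptions, not the formula from Perplexity's article: the weights, the exponential penalty shape, the use of the group mean as the anchor, and the choice to give incorrect rollouts a reward of zero are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Rollout:
    correct: bool      # answer matched, or every rubric criterion passed
    preference: float  # preference score, assumed to lie in [0, 1]
    tool_calls: int
    tokens: int


def smooth_excess(value, anchor, scale):
    """Smooth penalty in [0, 1): zero at or below the anchor, saturating above it."""
    excess = max(0.0, value - anchor)
    return 1.0 - math.exp(-excess / scale)


def group_rewards(group, w_pref=0.5, w_tools=0.2, w_len=0.2):
    """Gated reward with efficiency penalties anchored to the group's correct rollouts."""
    correct = [r for r in group if r.correct]
    if not correct:
        return [0.0] * len(group)  # nothing passes the gate

    # Assumed anchor: mean cost of the correct rollouts in this group.
    tool_anchor = sum(r.tool_calls for r in correct) / len(correct)
    len_anchor = sum(r.tokens for r in correct) / len(correct)

    rewards = []
    for r in group:
        if not r.correct:
            rewards.append(0.0)  # gate closed: the preference score is ignored
            continue
        penalty = (
            w_tools * smooth_excess(r.tool_calls, tool_anchor, scale=2.0)
            + w_len * smooth_excess(r.tokens, len_anchor, scale=500.0)
        )
        rewards.append(1.0 + w_pref * r.preference - penalty)
    return rewards


# Hypothetical group: two correct rollouts and one incorrect but "well liked" one.
group = [
    Rollout(correct=True, preference=0.8, tool_calls=2, tokens=400),
    Rollout(correct=True, preference=0.8, tool_calls=6, tokens=400),
    Rollout(correct=False, preference=1.0, tool_calls=1, tokens=100),
]
rewards = group_rewards(group)
```

In this example the incorrect rollout earns nothing despite having the highest preference score, and of the two correct rollouts the one that used more tool calls than the group anchor scores lower.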
Evaluation shows Qwen3.5-397B-SFT-RL achieves best-in-class performance across search benchmarks. On FRAMES, it reaches 57.3% accuracy with a single tool call, outperforming GPT-5.4 by 5.7 percentage points and Claude Sonnet 4.6 by 4.7 percentage points. Under a moderate budget of four tool calls, it achieves 73.9% accuracy at $0.02 per query, compared with 67.8% at $0.085 per query for GPT-5.4 and 62.4% at $0.153 per query for Sonnet 4.6. Cost figures are based on each provider’s public API pricing and exclude caching optimizations.
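The comparisons implied by these figures can be worked out directly. The snippet below uses only the numbers reported above; the variable names are illustrative.

```python
# Single tool call on FRAMES: recover the competitors' implied accuracies
# from the reported percentage-point gaps.
ours_single = 57.3
gpt_single = round(ours_single - 5.7, 1)
sonnet_single = round(ours_single - 4.7, 1)

# Four tool calls: (accuracy in %, cost in USD per query) as reported.
four_call = {
    "Qwen3.5-397B-SFT-RL": (73.9, 0.02),
    "GPT-5.4": (67.8, 0.085),
    "Sonnet 4.6": (62.4, 0.153),
}

ours_acc, ours_cost = four_call["Qwen3.5-397B-SFT-RL"]
for name in ("GPT-5.4", "Sonnet 4.6"):
    acc, cost = four_call[name]
    gap = round(ours_acc - acc, 1)         # accuracy lead in percentage points
    cost_ratio = round(cost / ours_cost, 2)  # how many times more expensive per query
    print(f"{name}: +{gap} points for the RL model, {cost_ratio}x its cost per query")
```

This gives implied single-call accuracies of 51.6% for GPT-5.4 and 52.6% for Sonnet 4.6. At four tool calls, the RL-tuned model leads by 6.1 and 11.5 percentage points while the competitors cost 4.25 and 7.65 times as much per query.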