Perplexity Discloses Web Search Agent Post-Training Method; Qwen3.5-Based Model Outperforms GPT-5.4 on Accuracy and Cost

Gate News message, April 23 — Perplexity’s research team published a technical article detailing its post-training methodology for web search agents. The approach uses two open-source Qwen3.5 models (Qwen3.5-122B-A10B and Qwen3.5-397B-A17B) and employs a two-stage pipeline: supervised fine-tuning (SFT) to establish instruction-following and language consistency, followed by online reinforcement learning (RL) to optimize search accuracy and tool-use efficiency.

The RL phase leverages the GRPO algorithm with two data sources: a proprietary multi-hop verifiable question-answer dataset constructed from internal seed queries requiring 2–4 hops of reasoning with multi-solver verification, and rubric-based general conversation data that converts deployment requirements into objectively checkable atomic conditions to prevent SFT behavior degradation.

Reward design employs gated aggregation—preference scores only contribute when baseline correctness is achieved (question-answer match or all rubric criteria met), preventing high preference signals from masking factual errors. Efficiency penalties use within-group anchoring, applying smooth penalties to tool calls and generation length exceeding the baseline of correct answers in the same group.

Evaluation shows Qwen3.5-397B-SFT-RL achieves best-in-class performance across search benchmarks. On FRAMES, it reaches 57.3% accuracy with a single tool call, outperforming GPT-5.4 by 5.7 percentage points and Claude Sonnet 4.6 by 4.7 percentage points. Under moderate budget (four tool calls), it achieves 73.9% accuracy at $0.02 per query, compared to GPT-5.4’s 67.8% accuracy at $0.085 per query and Sonnet 4.6’s 62.4% accuracy at $0.153 per query. Cost figures are based on each provider’s public API pricing and exclude caching optimizations.

Disclaimer: The information on this page may come from third parties and does not represent the views or opinions of Gate. The content displayed on this page is for reference only and does not constitute any financial, investment, or legal advice. Gate does not guarantee the accuracy or completeness of the information and shall not be liable for any losses arising from the use of this information. Virtual asset investments carry high risks and are subject to significant price volatility. You may lose all of your invested principal. Please fully understand the relevant risks and make prudent decisions based on your own financial situation and risk tolerance. For details, please refer to Disclaimer.
Comment
0/400
No comments