Gate News report, April 11 — AI infrastructure company Ramp Labs has released research called “Latent Briefing,” which enables efficient memory sharing among multi-agent systems by compressing large-model KV caches directly, sharply reducing token consumption without sacrificing accuracy.

In mainstream multi-agent architectures, an orchestrator decomposes tasks and repeatedly calls worker model instances; as the inference chain grows longer, token usage expands exponentially. The core idea behind Latent Briefing is to use the attention mechanism to identify the truly critical parts of the context and discard redundant information directly at the representation layer, rather than relying on slow LLM summarization or on RAG retrieval, whose results are less stable.

On the LongBench v2 benchmark, the method performed impressively: the worker model’s token consumption dropped by 65%; median token savings on medium-length documents (32k to 100k) reached 49%; overall accuracy improved by about 3 percentage points over the baseline; and the added latency per compression was only about 1.7 seconds, roughly a 20x speedup over the original algorithm. The experiments used Claude Sonnet 4 as the orchestrator and Qwen3-14B as the worker model, covering document scenarios ranging from academic papers and legal documents to novels and government reports.

The study also found that the optimal compression threshold varies with task difficulty and document length: hard problems are better suited to aggressive compression, which filters out speculative-reasoning noise, while long documents call for lighter compression to preserve key information dispersed throughout the text.
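The general technique the article describes — scoring cached positions by the attention they receive and keeping only the most important fraction — can be sketched roughly as follows. This is a minimal illustrative sketch, not Ramp Labs’ actual algorithm: the function name, array shapes, scoring rule, and `keep_ratio` parameter are all assumptions for demonstration.

```python
import numpy as np

def prune_kv_cache(keys, values, attn_weights, keep_ratio=0.35):
    """Illustrative attention-based KV-cache pruning (not the published method).

    keys, values:  (seq_len, d) cached key/value vectors for one layer.
    attn_weights:  (num_queries, seq_len) attention probabilities
                   accumulated over recent decoding steps.
    keep_ratio:    fraction of cached positions to retain.
    """
    # Score each cached position by the total attention it received.
    scores = attn_weights.sum(axis=0)              # shape: (seq_len,)
    k = max(1, int(len(scores) * keep_ratio))
    # Keep the top-k positions, preserving their original order so the
    # retained context still reads left-to-right.
    keep = np.sort(np.argsort(scores)[-k:])
    return keys[keep], values[keep], keep
```

Under this sketch, positions that attract little attention are dropped at the representation layer, so a worker model resumes from a much smaller cache — consistent with the reported trade-off that aggressive pruning suits hard problems while lighter pruning suits long documents with dispersed key information.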