According to an official Ollama tweet on April 27, DeepSeek V4 Pro, the flagship model released by Chinese AI company DeepSeek on April 24, has officially entered Ollama's catalog in cloud mode. Users can call the model with a single-line command from popular agent tools such as Claude Code, Hermes Agent, OpenClaw, Codex, and OpenCode. At three days from DeepSeek publishing the weights to the launch on Ollama Cloud, this is Ollama's fastest turnaround yet for integrating a mainstream large model.
DeepSeek V4 Pro: 1.6T parameters, 1M context
V4 Pro uses a Mixture-of-Experts architecture with 1.6 trillion total parameters (4.9 billion active) and a 1M-token context window. Third-party benchmark Artificial Analysis shows V4 Pro tied with Kimi K2.6 at the front of the open-source field on programming benchmarks such as SWE-bench (80.6%), LiveCodeBench (93.5%), and Terminal-Bench (67.9%); on the overall Intelligence Index, it trails Kimi K2.6 by one position.
Alongside it, DeepSeek also released a lighter V4 Flash model. Both are open-sourced under the MIT license, with weights available for download from Hugging Face.
Ollama Cloud runs inference in the cloud; weights are not downloaded locally
deepseek-v4-pro:cloud is an Ollama Cloud model: inference runs on Ollama's servers, and the weights are never downloaded to the user's machine. This is Ollama's standard approach to ultra-large models; Kimi K2.6 was previously added the same way. For users, the biggest advantage is being able to call a flagship-class model without owning dozens of GPUs. The downsides are that a network connection is always required and that compute is allocated according to the load on Ollama's cloud.
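In practice, a cloud model is invoked exactly like a local one. A minimal sketch, assuming the standard Ollama CLI and the model tag above; the sign-in step reflects Ollama's account requirement for cloud models:

# One-time sign-in so Ollama can route requests to its cloud (needed for :cloud tags).
ollama signin
# Start a chat; inference runs remotely and no weights land on local disk.
ollama run deepseek-v4-pro:cloud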
To run fully locally instead, you need to fetch the deepseek-ai/DeepSeek-V4-Pro weights from Hugging Face, pair them with an INT4-quantized build (such as the GGUF releases from Unsloth), and set up a multi-GPU rig; typical consumer-grade hardware cannot carry the full model.
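As an illustration, the local route could look like the sketch below; the GGUF filename, the four-GPU split, and the flag values are assumptions based on common llama.cpp usage, not instructions from DeepSeek or Unsloth:

# Download the full weights (around 3 TB at BF16 for 1.6T parameters; check disk space).
huggingface-cli download deepseek-ai/DeepSeek-V4-Pro --local-dir ./deepseek-v4-pro
# Serve a hypothetical INT4 GGUF build with llama.cpp, offloading all layers to GPU
# and splitting tensors evenly across four cards.
llama-server -m ./DeepSeek-V4-Pro-Q4_K_M.gguf --n-gpu-layers 999 --tensor-split 1,1,1,1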
A single-line command connects Claude Code, Hermes Agent, OpenClaw
Alongside the model, Ollama also released integration launch commands for mainstream agent tools:
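The exact launcher commands from the tweet are not reproduced in the source. As an illustration of the pattern only, many OpenAI-compatible agent CLIs can be pointed at Ollama through environment variables; the variable and flag names below are assumptions, so check each tool's documentation:

# Point an OpenAI-compatible agent at Ollama's local endpoint.
export OPENAI_BASE_URL="http://localhost:11434/v1"
export OPENAI_API_KEY="ollama"   # Ollama ignores the key, but most clients require one
# Launch the agent against the cloud-hosted model (the flag name is illustrative).
codex --model deepseek-v4-pro:cloud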
In practical terms: previously, developers who wanted to switch to DeepSeek inside Claude Code had to wire it up manually through an OpenAI-compatible API, handling endpoints and authentication themselves; with Ollama, a single command now does it. For heavy Claude Code users, this offers a fast path to swapping an Anthropic model for DeepSeek (or likewise for Kimi) to cut costs.
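For comparison, the manual wiring that the one-liner replaces looks roughly like this. The /v1 path is Ollama's documented OpenAI-compatibility layer; the prompt is a placeholder:

# Raw request against Ollama's OpenAI-compatible chat endpoint.
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-pro:cloud",
    "messages": [{"role": "user", "content": "Refactor this function for clarity."}]
  }'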
Early tester feedback: speeds fall from 30 tok/s to 1.1 tok/s at peak hours
Community discussion under the tweet shows that cloud inference speed depends on the load on Ollama's cloud. Multiple early testers reported slowdowns during peak hours, from the usual 30 tokens/s down to around 1.1 tokens/s. User @benvargas posted a screenshot with the complaint "Need More Compute." In another reply, Ollama acknowledged that the official team "is also playing with this model," implying that traffic patterns are still being explored and full capacity planning has not yet been done.
For developers who need stable production speeds, the current recommendation is to use cloud mode for prototyping and cost evaluation; production workloads still call for self-hosted GPU inference or a commercial API. Ollama's full tutorial has also been updated with a V4 Pro entry and an explanation of the cloud-versus-local trade-offs.