DeepSeek V4 Pro with Ollama Cloud: One-click integration with Claude Code

According to an official Ollama tweet on April 27, DeepSeek V4 Pro, the flagship model released by the Chinese AI company DeepSeek on April 24, has officially joined the Ollama catalog as a cloud model. Users can call it with a single command from popular agent tools such as Claude Code, Hermes Agent, OpenClaw, Codex, and OpenCode. This is Ollama's fastest turnaround yet for integrating a mainstream large model: only three days passed between DeepSeek releasing the weights and the model launching on Ollama Cloud.

DeepSeek V4 Pro: 1.6T parameters, 1M context

V4 Pro uses a Mixture-of-Experts architecture with 1.6 trillion total parameters (4.9 billion active) and a 1M-token context window. According to the third-party benchmark site Artificial Analysis, V4 Pro sits alongside Kimi K2.6 at the front of the open-source pack on programming benchmarks such as SWE-bench (80.6%), LiveCodeBench (93.5%), and Terminal-Bench (67.9%), while its overall Intelligence Index trails Kimi K2.6 by one position.

In the same timeframe, DeepSeek also released a lighter V4 Flash model. Both are open-sourced under the MIT license, and the weights can be downloaded from Hugging Face.

Ollama Cloud runs inference in the cloud; weights are not downloaded locally

deepseek-v4-pro:cloud is an Ollama Cloud model: inference runs on Ollama's cloud, and the weights are never downloaded to the user's machine. This is Ollama's standard approach to ultra-large models; Kimi K2.6 was added the same way earlier. For users, the biggest advantage is that calling a flagship-class model no longer requires dozens of GPUs on hand. The downsides are that a network connection is still required and that compute is allocated according to the load on Ollama's cloud.
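For a sense of the workflow, here is a minimal sketch, assuming the model tag and Ollama's standard local API behave here as they do for Ollama's other cloud models (the prompt is illustrative):

# sign in so local requests can be proxied to Ollama's cloud
ollama signin

# register the cloud model; weights stay on Ollama's servers
ollama pull deepseek-v4-pro:cloud

# the usual local API works unchanged; inference happens in the cloud
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-v4-pro:cloud",
  "messages": [{"role": "user", "content": "Explain MoE routing in two sentences."}],
  "stream": false
}'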

To run fully locally, you would need to obtain the deepseek-ai/DeepSeek-V4-Pro weights from Hugging Face, use an INT4-quantized version (such as a GGUF released by Unsloth), and set up a multi-GPU configuration to make it feasible; typical consumer-grade hardware cannot carry the full model.
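As a rough sketch of what that would involve (the repository name, file name, and GPU split below are illustrative assumptions, not confirmed releases):

# download an INT4/GGUF quantization (repo and file names are hypothetical)
huggingface-cli download unsloth/DeepSeek-V4-Pro-GGUF --local-dir ./dsv4-gguf

# serve with llama.cpp, spreading layers across available GPUs
llama-server --model ./dsv4-gguf/deepseek-v4-pro-Q4_K_M.gguf \
  --n-gpu-layers 999 --ctx-size 32768 --tensor-split 1,1,1,1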

A single command connects Claude Code, Hermes Agent, and OpenClaw

Ollama has also published launch commands for integrating the model into mainstream agent tools:

# direct chat
ollama run deepseek-v4-pro:cloud

# connect Claude Code
ollama launch claude --model deepseek-v4-pro:cloud

# connect Hermes Agent
ollama launch hermes --model deepseek-v4-pro:cloud

# connect OpenClaw / OpenCode / Codex
ollama launch openclaw --model deepseek-v4-pro:cloud
ollama launch opencode --model deepseek-v4-pro:cloud
ollama launch codex --model deepseek-v4-pro:cloud

What this means: previously, developers who wanted to switch to DeepSeek inside Claude Code had to wire it up manually through an OpenAI-compatible API, handling endpoints and authentication themselves; now a single Ollama command does it. For heavy Claude Code users, this offers a fast path to swapping an Anthropic model for DeepSeek (or Kimi) to cut costs.
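For comparison, one common manual route relied on Claude Code's environment-variable overrides pointed at a self-hosted gateway; the endpoint and key below are placeholders, and the exact setup depends on the gateway used:

# old approach: manually point Claude Code at your own endpoint and key
export ANTHROPIC_BASE_URL="https://your-gateway.example.com"
export ANTHROPIC_AUTH_TOKEN="sk-your-key"
claude

# new approach: a single command, with Ollama handling endpoint and auth
ollama launch claude --model deepseek-v4-pro:cloud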

Early testers’ feedback: speeds drop from 30 tok/s to 1.1 tok/s at peak times

Community replies under the tweet indicate that cloud inference speed depends on the load on Ollama’s cloud. Several early testers reported slowdowns during peak hours, from the usual 30 tokens/s to around 1.1 tokens/s. User @benvargas posted a screenshot complaining, “Need More Compute.” In another reply, Ollama acknowledged that the official team “is also playing with this model,” suggesting that traffic is still being gauged and that full capacity planning has not yet been done.

For developers who need stable production throughput, the current recommendation is to use cloud mode for prototyping and cost evaluation, while production deployments still call for self-hosted GPU inference infrastructure or a commercial API. Ollama’s full tutorial has also been updated with a V4 Pro entry and an explanation of the trade-offs between cloud and local modes.

This article, DeepSeek V4 Pro on Ollama Cloud: One-click integration with Claude Code, first appeared on ChainNews ABMedia.
