Chinese AI observer xiaohu shared a workflow example on May 10 that combines GPT and Gemini 3.1 Pro: first use GPT to generate images, then use Gemini 3.1 Pro to convert those images into interactive 3D content, turning any knowledge topic into science applications that can be rotated and manipulated. The examples in xiaohu's tweet include 3D planet displays, interactive science models, and more: concrete implementations of the "multi-model workflow."
Workflow structure: GPT image generation → Gemini 3.1 Pro 3D conversion
The entire workflow follows a two-stage design:
Stage 1: Use GPT (GPT-image-1 or image generation built into ChatGPT) to produce topic images and provide the visual foundation
Stage 2: Feed the images into Gemini 3.1 Pro, where Gemini converts the 2D images into interactive 3D content
Output format: Interactive 3D objects that can be rotated and zoomed in the browser
Applicable scenarios: Science education, product showcases, interactive knowledge content
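The two-stage pipeline above can be sketched as a small orchestration function. This is a minimal sketch, not xiaohu's actual implementation: the stage functions are passed in as callables so the orchestration stays model-agnostic, and real stages would call the OpenAI and Gemini APIs.

```python
from typing import Callable


def run_pipeline(
    topic: str,
    generate_image: Callable[[str], bytes],  # Stage 1: GPT produces a topic image
    image_to_3d: Callable[[bytes], str],     # Stage 2: Gemini turns 2D into 3D HTML
) -> str:
    """Return interactive 3D content (e.g. an HTML page) for a knowledge topic."""
    # Stage 1: generate the visual foundation. The prompt wording is illustrative.
    image = generate_image(f"A clear, labeled illustration of: {topic}")
    # Stage 2: convert the static image into a rotatable, zoomable 3D scene.
    return image_to_3d(image)


if __name__ == "__main__":
    # Stub stages for illustration; swap in real API-backed functions in practice.
    html = run_pipeline(
        "the solar system",
        generate_image=lambda prompt: b"<png bytes>",
        image_to_3d=lambda img: "<html><!-- rotatable 3D scene --></html>",
    )
    print(html)
```

The design choice here is dependency injection: because each stage is just a callable, either model can be swapped out (e.g. Tripo 3D in place of Gemini, as in one of xiaohu's other examples) without touching the pipeline.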
"Multi-model workflow" is one of the key trends in the 2026 AI application layer: single models are no longer all-purpose tools, and developers connect the strongest parts of different models to build applications that no single model can accomplish alone.
Specific demonstrations: 3D planets, interactive science content, a robot vending website
Examples xiaohu shared at the same time:
3D planet display: a rotatable solar system or a single planet model
Interactive science content: turn abstract knowledge into 3D visualizations, suitable for educational use
Future robot vending machine website: use GPT image generation plus the Tripo 3D platform to create a showcase-style webpage
The common feature of these examples is “visual generation + interactive conversion”—GPT handles creative visuals, while Gemini or other 3D tools handle transforming static images into operable interactive forms. None of the pieces alone is new, but the final experience after chaining them is stronger than any single tool.
Meaning: Multi-model workflows are gradually becoming mainstream development approaches
Concrete takeaways for developers:
Choosing the right tool matters more than choosing the strongest model: GPT's visual generation, Gemini's multimodal understanding, and Claude's long context each have their own sweet spot
Integration costs for model APIs are declining, making it feasible at the implementation level to connect multiple models
A new kind of application is likely to be a “multi-model pipeline,” not an extension of “the strongest single model”
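On the point that falling API integration costs make multi-model pipelines feasible: below is a hedged sketch of wiring the two stages with the OpenAI and google-genai SDKs. The Gemini model identifier is taken from the article and may not match Google's published model names, and the 3D-conversion prompt is illustrative, not the author's wording.

```python
import base64


def build_3d_prompt(topic: str) -> str:
    """Instruction asking Gemini to turn a static image into interactive 3D."""
    return (
        f"Convert this illustration of {topic} into a single self-contained "
        "HTML page with a 3D scene that can be rotated and zoomed in a browser."
    )


def generate_topic_image(topic: str) -> bytes:
    """Stage 1: GPT image generation via the OpenAI Images API."""
    from openai import OpenAI  # requires OPENAI_API_KEY in the environment

    result = OpenAI().images.generate(
        model="gpt-image-1",
        prompt=f"A clear, labeled illustration of: {topic}",
    )
    return base64.b64decode(result.data[0].b64_json)


def image_to_interactive_3d(image_png: bytes, topic: str) -> str:
    """Stage 2: Gemini converts the 2D image into interactive 3D HTML."""
    from google import genai  # requires GEMINI_API_KEY in the environment
    from google.genai import types

    resp = genai.Client().models.generate_content(
        model="gemini-3.1-pro",  # model name as reported in the article; assumption
        contents=[
            types.Part.from_bytes(data=image_png, mime_type="image/png"),
            build_3d_prompt(topic),
        ],
    )
    return resp.text
```

Each stage is a handful of lines against a standard SDK, which is exactly the declining-integration-cost point: the engineering effort sits in prompt design and glue, not in model access.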
The value of this case isn’t technical breakthroughs, but a template for workflow design
Specific follow-ups to watch:
Whether Google will formally announce Gemini 3.1 Pro's 3D generation capability as a product feature
Whether multi-model workflows will receive default template support in frameworks like LangChain/LlamaIndex
Concrete adoption in commercialization cases such as education, e-commerce, and marketing
This article, demonstrating the multi-model workflow (GPT image generation + Gemini 3.1 Pro conversion into interactive 3D content), first appeared in 链新闻 ABMedia.