Chinese AI observer xiaohu shared a workflow example on May 10 that combines GPT and Gemini 3.1 Pro: first use GPT to generate images, then use Gemini 3.1 Pro to convert those images into interactive 3D content, turning any knowledge topic into science applications that can be rotated and manipulated. The examples in xiaohu's tweet include 3D planet displays, interactive science models, and more: concrete implementations of the "multi-model workflow."
Workflow structure: GPT image generation → Gemini 3.1 Pro 3D conversion
The entire workflow follows a two-stage design:
Stage 1: Use GPT (GPT-image-1 or image generation built into ChatGPT) to produce topic images and provide the visual foundation
Stage 2: Feed the images into Gemini 3.1 Pro, where Gemini converts the 2D images into interactive 3D content
Output format: Interactive 3D objects that can be rotated and zoomed in the browser
Applicable scenarios: Science education, product showcases, interactive knowledge content
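The two-stage pipeline above can be sketched as a small orchestration function. This is a minimal sketch, not xiaohu's actual implementation: the stage functions are passed in as callables so the orchestration stays model-agnostic, and real stages would call the OpenAI and Gemini APIs.

```python
from typing import Callable


def run_pipeline(
    topic: str,
    generate_image: Callable[[str], bytes],  # Stage 1: GPT produces a topic image
    image_to_3d: Callable[[bytes], str],     # Stage 2: Gemini turns 2D into 3D HTML
) -> str:
    """Return interactive 3D content (e.g. an HTML page) for a knowledge topic."""
    # Stage 1: generate the visual foundation. The prompt wording is illustrative.
    image = generate_image(f"A clear, labeled illustration of: {topic}")
    # Stage 2: convert the static image into a rotatable, zoomable 3D scene.
    return image_to_3d(image)


if __name__ == "__main__":
    # Stub stages for illustration; swap in real API-backed functions in practice.
    html = run_pipeline(
        "the solar system",
        generate_image=lambda prompt: b"<png bytes>",
        image_to_3d=lambda img: "<html><!-- rotatable 3D scene --></html>",
    )
    print(html)
```

The design choice here is dependency injection: because each stage is just a callable, either model can be swapped out (e.g. Tripo 3D in place of Gemini, as in one of xiaohu's other examples) without touching the pipeline.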
"Multi-model workflow" is one of the key trends in the 2026 AI application layer: single models are no longer all-purpose tools, and developers connect the strongest parts of different models to build applications that no single model can accomplish alone.
Specific demonstrations: 3D planets, interactive science content, a robot vending website
Examples xiaohu shared at the same time:
3D planet display: a rotatable solar system or a single planet model
Interactive science content: turn abstract knowledge into 3D visualizations, suitable for educational use
Future robot vending machine website: use GPT image generation plus the Tripo 3D platform to create a showcase-style webpage
The common feature of these examples is “visual generation + interactive conversion”—GPT handles creative visuals, while Gemini or other 3D tools handle transforming static images into operable interactive forms. None of the pieces alone is new, but the final experience after chaining them is stronger than any single tool.
Meaning: Multi-model workflows are gradually becoming mainstream development approaches
Concrete takeaways for developers:
Choosing the right tool matters more than choosing the strongest model: GPT's visual generation, Gemini's multimodal understanding, and Claude's long context each have their own sweet spot
Integration costs for model APIs are declining, making it feasible at the implementation level to connect multiple models
A new kind of application is likely to be a “multi-model pipeline,” not an extension of “the strongest single model”
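On the point that falling API integration costs make multi-model pipelines feasible: below is a hedged sketch of wiring the two stages with the OpenAI and google-genai SDKs. The Gemini model identifier is taken from the article and may not match Google's published model names, and the 3D-conversion prompt is illustrative, not the author's wording.

```python
import base64


def build_3d_prompt(topic: str) -> str:
    """Instruction asking Gemini to turn a static image into interactive 3D."""
    return (
        f"Convert this illustration of {topic} into a single self-contained "
        "HTML page with a 3D scene that can be rotated and zoomed in a browser."
    )


def generate_topic_image(topic: str) -> bytes:
    """Stage 1: GPT image generation via the OpenAI Images API."""
    from openai import OpenAI  # requires OPENAI_API_KEY in the environment

    result = OpenAI().images.generate(
        model="gpt-image-1",
        prompt=f"A clear, labeled illustration of: {topic}",
    )
    return base64.b64decode(result.data[0].b64_json)


def image_to_interactive_3d(image_png: bytes, topic: str) -> str:
    """Stage 2: Gemini converts the 2D image into interactive 3D HTML."""
    from google import genai  # requires GEMINI_API_KEY in the environment
    from google.genai import types

    resp = genai.Client().models.generate_content(
        model="gemini-3.1-pro",  # model name as reported in the article; assumption
        contents=[
            types.Part.from_bytes(data=image_png, mime_type="image/png"),
            build_3d_prompt(topic),
        ],
    )
    return resp.text
```

Each stage is a handful of lines against a standard SDK, which is exactly the declining-integration-cost point: the engineering effort sits in prompt design and glue, not in model access.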
The value of this case isn’t technical breakthroughs, but a template for workflow design
Specific follow-ups to watch:
Whether Google will formally announce Gemini 3.1 Pro's 3D generation capability as a product feature
Whether multi-model workflows will receive default template support in frameworks like LangChain/LlamaIndex
Concrete adoption in commercialization cases such as education, e-commerce, and marketing
This article, demonstrating the multi-model workflow (GPT image generation + Gemini 3.1 Pro conversion into interactive 3D content), first appeared in 链新闻 ABMedia.