xAI Grok Adds Custom Voices: 2-Minute Cloning, Two-Stage Identity Verification

2026-05-03 01:35:06

xAI announced its Grok Custom Voices feature on May 2 on its official blog. Users record about 1 minute of natural speech in the xAI console; the system finishes processing within 2 minutes and outputs a custom voice model that can be used with TTS and the Voice Agent API. Released at the same time are the Grok 4.3 model and the Voice Library, an interface that brings all voice resources together. Custom Voices also includes a two-stage identity verification mechanism designed to prevent users from cloning other people’s voices.

Feature: 1-minute recording, 2-minute generation, integration with TTS and the Voice Agent API

After the user records about 1 minute of natural speech in the xAI console, the backend workflow runs three steps in sequence: (1) identity verification, (2) voice processing, (3) model output. The whole process takes no more than 2 minutes, after which the user has a usable voice model. Custom Voices inherits all TTS capabilities, including speech tags, multilingual output, and REST and WebSocket streaming; it can be paired directly with xAI’s TTS endpoints or with the Voice Agent API for real-time conversational agents.

The Voice Library, released at the same time, is an interface in the xAI console for managing voice resources in one place. It lets users browse, preview, and manage all voices, both user-created and prebuilt, so that voice assets are not scattered across multiple interfaces. The prebuilt voice library includes more than 80 voices and supports 28 languages.

Two-stage identity verification: preventing the cloning of other people’s voices

Before generating a voice, Custom Voices applies two identity verification gates. First, the user reads a verification sentence aloud, and the system transcribes that segment of speech in real time. Second, the system computes speaker embeddings (speaker feature vectors) separately from the verification sentence and from the full recording, then checks whether both belong to the same person. Only after both stages pass does the workflow move on to producing the voice model.
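The two gates described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not xAI's implementation: the function names, the exact-match rule on the normalized transcript, the use of cosine similarity to compare embeddings, and the 0.75 threshold are all hypothetical, since xAI has not published how it performs either check. The transcription and embedding steps themselves are assumed to happen elsewhere; the sketch only shows the decision logic.

```python
import math
import re
from typing import List, Sequence


def normalize(text: str) -> List[str]:
    """Lowercase and drop punctuation so transcripts compare word by word.

    Assumption: ASCII-only tokenization, enough for an English sentence.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def transcript_matches(expected: str, transcribed: str) -> bool:
    """Gate 1 (hypothetical rule): the transcribed speech must contain
    exactly the words of the verification sentence, in order."""
    return normalize(expected) == normalize(transcribed)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector carries no speaker information
    return dot / (norm_a * norm_b)


def verify_speaker(
    expected_sentence: str,
    transcribed_sentence: str,
    verification_embedding: Sequence[float],
    recording_embedding: Sequence[float],
    threshold: float = 0.75,  # hypothetical cutoff, not a published value
) -> bool:
    """Pass only if both gates pass: the sentence was read correctly AND
    the two embeddings are close enough to be treated as one speaker."""
    if not transcript_matches(expected_sentence, transcribed_sentence):
        return False
    similarity = cosine_similarity(verification_embedding, recording_embedding)
    return similarity >= threshold
```

In this sketch the transcript check runs first so that an incorrect reading is rejected before any embedding comparison is made; a production system would likely tolerate small transcription errors rather than require an exact match.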

xAI states clearly that users cannot clone voices from existing audio files, nor can they clone other people’s voices. This design rules out the scenario of “getting someone else’s public speech recording and copying it,” narrowing cloning to a single entry point: the user recording in real time. For observers concerned about misuse of AI voice generation (such as phone scams and unauthorized dubbing), this mechanism is xAI’s concrete response to the problem of voice impersonation.

Follow-up to watch: Grok 4.3 released in sync, pace of Voice Library expansion

Custom Voices and the Grok 4.3 model were released on the same day, with xAI bundling “model upgrades + completion of the voice tool suite” into a single wave of announcements. The next thing to watch is how quickly the Voice Library’s prebuilt collection grows beyond 80 voices, and whether the 28-language coverage extends further to languages such as Traditional Chinese. Another is whether concrete adoption cases of the Voice Agent API are made public, especially integration examples in scenarios like customer service automation, podcast recording, and multilingual customer service.

This article “xAI Grok launches Custom Voices: 2-minute cloning, two-stage identity verification” first appeared on LianXin (ABMedia).
