The Transformer's Eight Authors, Nine Years Later: Google Didn't Keep a Single One

Original title: "Where Are the Eight Parents of the Transformer Now?"
Original source: Machine Heart

On June 18, Noam Shazeer, one of the co-authors of the Transformer paper, announced his departure from Google on X to join OpenAI. Two days later, John Jumper, the 2024 Nobel Prize winner in Chemistry and head of the AlphaFold team, also announced his departure from Google DeepMind, heading to Anthropic.

The back-to-back announcements sent shockwaves through the capital markets: shares of Alphabet, Google's parent company, briefly fell more than 7%, wiping out over $300 billion in market value. Several research firms attributed the sell-off to "talent flight." Gil Luria, an analyst at D.A. Davidson, said bluntly that Shazeer's move to OpenAI and Jumper's move to Anthropic, in such quick succession, had made the market worry that Google is falling behind in the AI talent war.

Shazeer's departure is particularly noteworthy: it is already his second time leaving Google.

In 2021, frustrated that Google would not publicly release the chatbot whose development he had led, he left to found Character.AI. In August 2024, Google paid approximately $2.7 billion to license Character.AI's technology and brought him back to DeepMind as Vice President of Engineering, where he co-led the Gemini project with Jeff Dean and Oriol Vinyals. Less than two years later, he has left again, this time for archrival OpenAI.

With his departure, all eight co-authors of "Attention Is All You Need," published nine years ago, have now left Google.

User Tyler Maran created a diagram mapping out their current destinations, which has been widely shared on social networks.

However, the diagram may soon be outdated. Over the past two days, rumors have circulated that NVIDIA is quietly absorbing the core team of Essential AI, including its co-founder and CEO Ashish Vaswani, one of the Transformer paper's authors. As of press time, neither NVIDIA nor Essential AI has formally commented.

With that in mind, let's take a comprehensive look at the career paths of these eight people, known as the "Fathers of the Transformer," over the past nine years, and at where they have ended up.

Note that the author order of "Attention Is All You Need" is random. A footnote in the paper states that all authors contributed equally and that the listing order is random, so there is no "first author" or "corresponding author" in the usual sense. This article introduces the eight in the order in which the paper lists them.

"The Origin of Everything": Eight Googlers Who Strayed from Their Day Jobs

To understand where they are today, we first have to go back to 2017. At the time, the mainstream approach to machine translation was the recurrent neural network (RNN), in which a model processes a sentence one word at a time, like cars queuing in a single lane. Because each step has to wait for the previous one, the computation cannot be parallelized, which made training slow and expensive.

Eight people at Google Brain and Google Research decided to try an almost audacious idea: discard the recurrent structure entirely and keep only the "attention mechanism," letting the model see the entire sentence at once and decide for itself which words to focus on. The paper's title, "Attention Is All You Need," riffs on The Beatles' song "All You Need Is Love," and has since become a template imitated by countless later paper titles.

The author contributions section of the paper briefly records what each person specifically did:

· Jakob Uszkoreit was the first to propose replacing the recurrent structure with self-attention and led the early validation of this idea;

· Ashish Vaswani designed and implemented the initial Transformer model together with Illia Polosukhin, participating in nearly every aspect of the project;

· Noam Shazeer proposed scaled dot-product attention, multi-head attention, and parameter-free position representations, being another person who was involved in nearly every detail;

· Niki Parmar designed, implemented, tuned, and evaluated countless model variants in the initial codebase and later in the tensor2tensor framework;

· Llion Jones also tried numerous new model variants and was responsible for the initial codebase, inference efficiency optimization, and visualization work;

· Łukasz Kaiser and Aidan N. Gomez spent countless days and nights building the modules of tensor2tensor, replacing the early codebase, greatly improving experimental results and research efficiency.
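The scaled dot-product attention credited to Shazeer above is usually written as softmax(QKᵀ/√d_k)V: each query is compared with every key, the scores are scaled by 1/√d_k and normalized into weights, and the output is the weighted average of the values. The following is a minimal pure-Python sketch of that formula only, assuming small list-of-lists matrices; it omits learned projections, masking, and multiple heads, and the function names are illustrative rather than taken from the tensor2tensor codebase. Note that each query row is handled independently of the others, which is why, unlike an RNN, all positions can be computed in parallel as one matrix product.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Compute softmax(Q K^T / sqrt(d_k)) V for small list-of-lists matrices.

    queries: n x d_k, keys: m x d_k, values: m x d_v. Returns n x d_v.
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    scale = 1.0 / math.sqrt(d_k)
    output: Matrix = []
    for q in queries:
        # Similarity of this query to every key, scaled by 1/sqrt(d_k)
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        # Attention weights over all positions; they sum to 1
        weights = softmax(scores)
        # Output is the weighted average of the value vectors
        output.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)])
    return output


if __name__ == "__main__":
    # Three positions with 2-dimensional keys and values (toy numbers)
    K = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    # A query aligned with the first key, so that position gets the largest weight
    Q = [[4.0, 0.0]]
    print(scaled_dot_product_attention(Q, K, V))
```

Multi-head attention, the other idea listed for Shazeer, runs several such computations side by side on different learned projections of the same inputs and concatenates the results.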

These descriptions also hint at something else. Although the author order is random, Uszkoreit, Vaswani, Polosukhin, and Shazeer clearly played more central roles in the architecture itself, while Parmar, Jones, Kaiser, and Gomez carried more of the engineering implementation and system building. In hindsight, this division was an early sign of the differences in temperament and expertise that would later lead the eight down different paths.

There is also a story behind the name "Transformer." Uszkoreit liked the sound of the word, so the team called itself "Team Transformer" internally, and the cover of an early design document featured six characters from the Transformers animated series.

Since its publication, the paper has been cited more than 260,000 times, making it one of the most cited papers of the 21st century.

Ashish Vaswani

Vaswani was born in 1986 in India. He earned his Bachelor's degree in Computer Science from Birla Institute of Technology (BIT Mesra) in 2002, then went to the U.S. to pursue a Ph.D. at the University of Southern California under David Chiang, focusing on statistical machine translation and neural language modeling. After completing his Ph.D., he worked as a computer scientist at the USC Information Sciences Institute for two years before officially joining Google Brain as a research scientist in 2016, where he worked until 2021.

According to the paper's author contributions, Vaswani designed and implemented the initial Transformer model with Illia Polosukhin, making him one of the core figures who "participated in nearly every aspect of the project."

After leaving Google, Vaswani co-founded Adept AI in 2021 with Niki Parmar and former OpenAI VP of Engineering David Luan, serving as Chief Scientist, aiming to build "action models" that could autonomously operate within any software.

Adept raised more than $400 million at a valuation of about $1 billion, but its product failed to launch and internal disagreements emerged. Vaswani and Parmar chose to leave early; his tenure as Adept's Chief Scientist ended in November 2022.

In early 2023, Vaswani and Parmar teamed up again to co-found Essential AI, with Vaswani as CEO. The company raised an $8.3 million seed round led by Thrive Capital, followed in late 2023 by a $56.5 million Series A led by March Capital, in which Google, NVIDIA, and AMD joined as strategic investors alongside KB Investment, Franklin Templeton, and others.

In late 2025, the company released its first open-source model series, Rnj-1 (named after the Indian mathematician Ramanujan).

In early 2026, it closed a $175 million Series B round led by Lightspeed Venture Partners, with Thrive Capital participating, at a valuation of $1 billion, officially becoming a unicorn.

In just the past two days, however, the picture has changed. Reports indicate that NVIDIA is recruiting Essential AI's core team, including Vaswani himself, who will join NVIDIA to work on its open-source Nemotron models.

Sources say the reasoning is pragmatic: Essential AI is facing funding difficulties, and for NVIDIA, pulling Vaswani and his team out of the camp of its rival AMD (one of Essential AI's early strategic investors, whose GPUs the company has long relied on) is a good deal in itself.

Several Essential AI researchers (including Alok Tripathy and Saurabh Srivastava) have updated their LinkedIn profiles, showing they have joined NVIDIA. However, as of now, neither NVIDIA nor Essential AI has officially confirmed this news.

Noam Shazeer

Shazeer was born in Philadelphia in 1976 to an Orthodox Jewish family; his father, Dov Shazeer, was an engineer with a background in teaching mathematics, and his sister was ordained as a rabbi by Hebrew College. He showed exceptional talent from a young age, winning a gold medal with a perfect score as a member of the U.S. team at the 1994 International Mathematical Olympiad. He then studied mathematics and computer science at Duke University as an Angier B. Duke Memorial Scholar and won awards in the Putnam Mathematical Competition.

In 2000, Shazeer joined Google, where his early claim to fame was fixing Google Search's spell-check feature.

According to the Transformer paper's author contributions, he proposed scaled dot-product attention, multi-head attention, and the parameter-free position representation, and was, besides Vaswani, the other person "involved in nearly every detail."

After co-authoring the Transformer paper in 2017, he developed the chatbot Meena with his colleague Daniel De Freitas, but Google, being cautious, did not release it publicly. The two left in 2021 to found Character.AI, raised more than $150 million from a16z and others, and built a popular role-playing chat app.

In August 2024, a twist occurred: Google reached a licensing agreement with Character.AI, reportedly worth up to $2.7 billion. Shazeer and De Freitas, along with a small group of colleagues, returned to Google DeepMind. He was appointed Vice President of Engineering, co-leading the Gemini project with Jeff Dean and Oriol Vinyals.

Because he personally held roughly 30% to 40% of Character.AI's shares, the deal reportedly netted him between $750 million and $1 billion. In 2026, he was elected to the U.S. National Academy of Engineering, and his career seemed to be at its peak.

But just a few months later, he chose to leave again, this time for OpenAI, reportedly to lead architecture research. The move coincides with OpenAI's talent drive ahead of its IPO (the company confidentially filed an S-1 with the SEC on June 8, at a rumored valuation of up to $852 billion).

In an unusually effusive public statement, OpenAI CEO Sam Altman said that "from the first day of OpenAI, he was one of the people I most wanted to work with," and that this recruitment "had been brewing for ten years."

For Google, this was a costly "failed buyback": the person they spent $2.7 billion to bring back two years ago has now joined its top competitor, and this became one of the direct triggers for Google's sharp stock decline this week.

Niki Parmar

Parmar was born in Pune, India. She earned her Bachelor's degree in Information Technology from the Pune Institute of Computer Technology. During her studies, she became interested in artificial intelligence and machine learning through online open courses taught by Andrew Ng and Peter Norvig. She then went to the U.S. for a Master's in Computer Science at the University of Southern California, studying social science problems using machine learning methods under Professor Morteza Dehghani.

In 2015, Parmar joined Google Research as a software engineer, and in 2017, she transferred to Google Brain as a research software engineer—reportedly the youngest and the only one without a Ph.D. in the Google Brain team at the time.

According to the paper's author contributions, she designed, implemented, tuned, and evaluated countless model variants in the initial codebase and later in the tensor2tensor framework. After the paper was published, she continued to push the Transformer beyond language, contributing to research that extended self-attention to image generation and computer vision.

In 2021, Parmar left Google to co-found Adept AI with Ashish Vaswani and David Luan, serving as CTO. Like Vaswani, she exited Adept early, and in early 2023, she co-founded Essential AI with Vaswani again, remaining a co-founder.

However, she was not there for Essential AI's later Series B round and unicorn valuation. At the end of 2024, Parmar quietly left Essential AI to join Anthropic, announcing the move publicly in February 2025. She wrote on X: "Today is as good a day as any to share: I joined Anthropic last December."

She subsequently contributed to the development of Claude 3.7 Sonnet, one of the most important model releases in Anthropic's history. She is now a Member of Technical Staff at Anthropic, focusing on frontier capability research and reinforcement learning.

Once inseparable co-authors and two-time co-founders, the pair have ended up in very different places: Parmar quietly left more than a year earlier and settled into a leading lab, while Vaswani kept pushing Essential AI forward until this week, when a competitor reached out.

Jakob Uszkoreit

Uszkoreit was born into a family of linguists. His father, Hans Uszkoreit, is a renowned computational linguist, and when Jakob proposed that attention alone would be enough, even his father was skeptical. Uszkoreit earned his Ph.D. from the Technical University of Berlin and later rose to the rank of Distinguished Scientist at Google Brain.

According to the paper's author contributions, it was Uszkoreit who first proposed replacing recurrent neural networks with self-attention and who led the early validation of the idea. The seed of this hypothesis had already been planted in a 2016 paper he co-authored with Ankur Parikh, Oscar Täckström, and Dipanjan Das on the "decomposable attention model."

The name "Transformer" was his choice as well, picked simply because he liked the sound of the word.

At the end of 2020, DeepMind's AlphaFold2 demonstrated that Transformer-style models could crack protein folding, a "Holy Grail" problem in biology. At the same time, Uszkoreit became increasingly convinced that what deep learning lacked to truly transform biology was not algorithms but data. "It almost became a moral obligation," he later recalled.

So in 2021, he co-founded Inceptive with Rhiju Das, a Stanford University biochemistry professor and the developer of the well-known RNA design game Eterna. The company is headquartered in Berkeley, with research teams in Berlin, where he lives, and employees spread across Zurich, London, Vancouver, and several cities on the U.S. East Coast.

The company's core idea is to invert the usual order: instead of starting with existing data and then training models, it uses robots and human researchers to generate large volumes of new RNA experimental data, which are then fed to the models.

Inceptive has raised about $120 million from NVIDIA, a16z, Obvious Ventures, Section 32, and others. The latest development came this month: in early June, Alnylam Pharmaceuticals, a pioneer of RNA interference therapeutics, signed a strategic partnership with Inceptive to use its foundation models to accelerate the design of siRNA drug candidates, with a $30 million upfront payment and a reported total potential deal value of around $2 billion.

Uszkoreit said in a statement: "Most drug design still relies on trial and error—testing thousands of molecules, hoping one works. Inceptive's approach is different: life follows extremely complex laws that only AI can learn."

He is the only one of the eight who has moved entirely into biotech, which bears out a prediction the paper made at the time: the potential of the attention mechanism goes far beyond machine translation.

Llion Jones

Jones is Welsh, graduated from the University of Birmingham, and joined Google as a software engineer in 2011, working there for over a decade. He is one of the few among the eight authors without a Ph.D., relying purely on engineering intuition to find his way.

According to the paper's author contributions, he tried numerous new model variants and was responsible for the initial codebase, inference efficiency optimization, and visualization work.

He later recalled the decisive moment: "We were just starting to try removing some parts of the model entirely, just to see how much worse it would get. Surprisingly, it got better instead." This was the moment when the hypothesis that "the recurrent structure is redundant" was first validated.

In 2023, Jones co-founded Sakana AI in Tokyo with David Ha, who also came from Google. "Sakana" means "fish" in Japanese. Ha serves as CEO, Jones as CTO, and another co-founder, Ren Ito, as COO.

Jones now lives in Tokyo and describes himself on social media as "a Welsh AI researcher living in Tokyo." The company's research has a distinctly contrarian bent: rather than simply piling on compute and parameters, it draws on the logic of natural evolution, letting groups of smaller models collaborate like a school of fish. Its best-known work includes the Continuous Thought Machine and the "AI Scientist" project, which can autonomously carry out research end to end.

Recently, the company released the cutting-edge Sakana Fugu model.

Sakana AI has raised a total of $379 million, including a Series B round completed in March 2026, with Mitsubishi Electric among its investors. Also in March 2026, the company secured a multi-year partnership with Mitsubishi UFJ Financial Group (MUFG), which plans to use Sakana's technology to overhaul its banking systems. The deal is reportedly expected to make the company, valued at around $1.5 billion, profitable within a year.

Jones himself has expressed skepticism about mere "scaling" on multiple occasions.

In March 2026, speaking at an internal banking-industry event, he said that AI research faces an awkward reality: with massive investment and an influx of talent, one would expect more breakthroughs, but the actual effect may be the opposite. Investors push for results, competition pushes teams to ship first, and the room for researchers to explore freely gets squeezed.

He said Sakana has always reserved a small share of its research for open-ended exploration without KPIs, because he believes the next breakthrough will come from exactly this kind of long-term, no-strings-attached investment, which is how the Transformer was born in that Google Brain office years ago.

He also offered a line that has since been widely quoted: for a new architecture to truly replace the Transformer, being merely "better" is not enough; it must be "clearly, unquestionably better."

Aidan N. Gomez

Gomez is the youngest of the eight authors. At the time the paper was published, he was just a 20-year-old undergraduate intern at Google Brain, studying computer science and mathematics at the University of Toronto.

According to the paper's author contributions, he and Łukasz Kaiser spent countless days and nights building the modules of the tensor2tensor framework, replacing the early codebase and greatly improving experimental results and research efficiency. "I was just trying to understand how the attention mechanism worked," he later recalled, "completely unaware it would become the 'architecture of everything.'"

After the paper, he went to the University of Oxford for a Ph.D., paused his studies to start a company, and formally received his doctorate in 2024, in effect finishing the degree on the side while building a company.

In 2019, Gomez co-founded Cohere with Ivan Zhang and Nick Frosst, positioning the company as an enterprise AI service provider, deliberately avoiding the costly race for consumer chatbots, focusing on data privacy, local deployment, and multilingual capabilities, with customers mainly being large enterprises and various governments.

In 2023, Gomez was named to Time magazine's list of the 100 Most Influential People in AI, and he and his two co-founders topped Maclean's magazine's AI Trend Pioneer list for that year; in April 2025, he was elected to the board of electric vehicle company Rivian.

This relatively "unsexy" approach has produced solid financial results: as of mid-2026, Cohere's annualized recurring revenue exceeded $200 million, up sixfold over the past year, with a gross margin of about 70%. The company has raised nearly $1.7 billion in total, at a valuation of around $7 billion. In August 2025, it hired Francois Chadwick, who had been involved in Uber's IPO, as its first CFO, and it has also opened one secondary-sale window for employee shares. Gomez has repeatedly said an IPO is "coming soon," but as of press time the company has not filed a prospectus with regulators.

In recent years, Gomez has increasingly become a voice on the geopolitics of AI. Just this week, he wrote an op-ed in Fortune urging countries to take "digital sovereignty" seriously.

The article directly referenced the recent tightening of access to Anthropic's models, warning countries not to "rent out their future" to a few centralized tech giants, and proposed building a truly diverse ecosystem where countries can rely on different AI providers while preserving their own values, languages, and legal systems.

He has also publicly stated that concerns about "AI doomsday" existential risks are exaggerated, and he is more worried about the real risk of disinformation being amplified automatically on social media. Today, Gomez talks not only about the models themselves but also about who gets to decide what kind of AI the world uses.

Łukasz Kaiser

Kaiser is Polish. His early academic training was in theoretical computer science, including logic, automata theory, algorithmic model theory, and game theory. He earned Master's degrees in both mathematics and computer science from the University of Wrocław, completed his Ph.D. at RWTH Aachen University in Germany, and then held a tenured position with the French National Centre for Scientific Research (CNRS) at Paris VII University, doing purely theoretical work in logic and automata theory.

He later moved into applied work, spending nearly eight years at Google Brain, where he was one of the authors of TensorFlow and co-wrote early papers with Samy Bengio ("Can Active Memory Replace Attention?") and with Ilya Sutskever ("Neural GPUs Learn Algorithms").

According to the paper's author contributions, he and Aidan N. Gomez spent countless days and nights building the tensor2tensor framework, greatly improving experimental results and research efficiency.

He is the only one of the eight who never started a company, having stayed in large labs doing research throughout.

In 2021, before ChatGPT was released, he joined OpenAI. There he contributed to Codex (which later became the technical foundation of GitHub Copilot) and the accompanying HumanEval programming benchmark, as well as to the GSM8K math word-problem dataset. That work showed early on that spending more computation at inference time, by sampling many candidate solutions and letting a trained verifier choose among them, could significantly improve accuracy, a precursor of the later reasoning-model paradigm.
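The basic idea of trading inference-time compute for accuracy, generating several candidate answers and keeping the one a scorer rates highest, can be sketched in a few lines. This is a minimal illustration, not OpenAI's code: `best_of_n` and the toy `guess` and `closeness` functions are hypothetical, and in the GSM8K work the scorer was a separately trained verifier model rather than a hand-written function.

```python
import random
from typing import Callable, List


def best_of_n(sample: Callable[[], str], score: Callable[[str], float], n: int) -> str:
    """Draw n candidate answers and return the one the scorer rates highest.

    `sample` stands in for one stochastic model generation and `score` for a
    verifier; both are hypothetical placeholders supplied by the caller.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[str] = [sample() for _ in range(n)]
    # max() keeps the first candidate among ties, so the result is deterministic
    return max(candidates, key=score)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy "model": guesses an integer answer at random.
    guess = lambda: str(rng.randint(30, 50))
    # Toy "verifier": prefers answers closer to 42 (the pretend correct answer).
    closeness = lambda answer: -abs(int(answer) - 42)
    print("1 sample:  ", best_of_n(guess, closeness, 1))
    print("16 samples:", best_of_n(guess, closeness, 16))
```

Keeping generation and scoring separate means the same loop works whether the scorer is a learned verifier or, for coding tasks like those in HumanEval, a set of unit tests; how much extra samples help depends on how well the scorer tracks correctness.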

He is also a named author of the GPT-4 technical report and later became a core contributor to OpenAI's first reasoning model, o1 (released in September 2024), playing a role equivalent to "research lead," continuing through o3 and newer reasoning paradigms up to the current GPT-5 series.

Recently, on the MAD Podcast hosted by Matt Turck, he discussed how the Transformer has been shown mathematically to be able, in principle, to solve any computable problem, provided the model is allowed to generate enough intermediate reasoning steps. In a sense, this is a belated and more precise annotation to the paper of nine years ago.

Illia Polosukhin

Polosukhin comes from Kharkiv, Ukraine. He studied applied mathematics for his bachelor's degree and was also a champion in the International Collegiate Programming Contest (ICPC). According to his own recollection, after watching The Matrix at age ten, he developed an almost obsessive interest in artificial intelligence. In 2014, he joined Google, working on TensorFlow-related research and also research on machine reading comprehension and question answering systems.

According to the paper's author contributions, he designed and implemented the initial Transformer model together with Ashish Vaswani; his own part reportedly centered on verifying the architecture's effectiveness on machine translation tasks.

After the paper was published, he left Google in 2017 and co-founded a company initially called NEAR.AI with Alexander Skidanov. However, they soon realized that building decentralized infrastructure might be more interesting than models, so the company pivoted around 2018 into the blockchain project NEAR Protocol.

NEAR employs a sharding technology called Nightshade and provides an Ethereum-compatible Layer 2 network through Aurora. Its mainnet launched in 2020, and it has raised over $530 million from institutions including a16z, Coinbase, Tiger Global, Hashed, and Dragonfly Capital.

Today, Polosukhin is trying to merge his two original identities once again: in March 2026, he told the media that "the future users of blockchain will be AI agents, not humans," positioning NEAR as the "settlement layer" for the agent economy.

In April of the same year, he publicly called for a more robust regulatory framework to handle autonomous AI agents; he believes existing institutions and systems are not ready to deal with the accountability and systemic risks posed by such systems, urging the establishment of clearer accountability mechanisms and "human-in-the-loop" oversight.

He currently lives in Portugal. Among people who have both "authored a foundational LLM paper" and "run a multi-billion-dollar blockchain company," he may well be the only one in the world.

Eight Paths, Continuing to Explore

In March 2024, at NVIDIA's GTC conference, seven of the eight authors (Niki Parmar was absent, for reasons not disclosed) appeared together publicly for the first time, in a conversation hosted by Jensen Huang.

Huang said: "Everything we enjoy today can be traced back to that moment."

At the end of the conversation, Huang presented each of them with a signed NVIDIA DGX-1 supercomputer commemorative plaque engraved with "You transformed the world."

In November of the same year, Japan's NEC C&C Foundation awarded its C&C Prize to the "Transformer Team" made up of these eight individuals, who shared the stage with three senior engineers honored for transoceanic submarine cable transmission technology: builders of infrastructure in two completely different fields, recognized with the same award.

Nine years have passed, and these eight life trajectories have scattered to places where they rarely intersect: the enterprise service track in Silicon Valley, the evolutionary algorithm lab in Tokyo, the molecular biology company in Berlin, the blockchain protocol in Portugal, and the major AI labs that are still reshuffling this week.

But if you look at the things they've said over the years, a common judgment repeatedly emerges: no one truly believes the Transformer is the end point.

Aidan N. Gomez said the world needs something better than the Transformer; Llion Jones said the next architecture must be "clearly, unquestionably better" to replace it; Łukasz Kaiser is still using mathematical language to try to explain just how far this nine-year-old architecture can take humanity.

Perhaps this is the most enduring legacy left by this paper: its eight authors, scattered across the globe, have not stopped searching for the next answer.
