On a day in early December 2012, a secret auction was underway in a casino hotel in Lake Tahoe, a ski resort town in the United States.
Lake Tahoe sits on the border of California and Nevada and is the largest alpine lake in North America, with a sapphire-blue surface and top-notch ski runs. The Godfather Part II was filmed here, and Mark Twain once lingered here. Barely 200 miles from the San Francisco Bay Area, it is often called the “backyard of Silicon Valley”; moguls like Mark Zuckerberg and Larry Ellison have bought land here to build mansions.
The object of the secret auction was a company called DNNresearch, founded just a month earlier with only three employees: its founder Geoffrey Hinton, a professor at the University of Toronto, and two of his students.
The company had no tangible products or assets, but the identities of its suitors signaled its significance: the four bidders were Google, Microsoft, DeepMind, and Baidu.
Harrah’s Hotel in Lake Tahoe, where the secret auction took place in 2012
At 65, Hinton looked thin and worn, plagued by chronic lumbar disc pain. Sitting on the floor of the hotel’s room 703, he set the rules of the auction: bidding would start at $12 million, with increments of at least $1 million.
A few hours later, the bidders had pushed the price to $44 million, and Hinton felt a little dizzy. “I feel like we’re in a movie,” he said. He decided to call a halt and sell the company to the highest bidder, Google.
Interestingly, one origin of this $44 million auction traces back to Google itself, six months earlier.
In June 2012, Google’s research arm Google Brain released the results of The Cat Neurons project (better known as “Google Cat”). In simple terms, the project used algorithms to identify cats in YouTube videos. It was initiated by Andrew Ng, who had joined Google from Stanford; he recruited Google’s legendary engineer Jeff Dean and secured a large budget from Google co-founder Larry Page.
The Google Cat project built a neural network that downloaded a huge number of unannotated videos from YouTube and let the model observe and learn the features of cats on its own. Training then ran on 16,000 CPUs spread across Google’s data centers (Google declined to use GPUs internally, citing complexity and cost), reaching a recognition accuracy of 74.8%. The number stunned the industry.
Andrew Ng withdrew from Google Brain shortly before the project concluded, devoting himself to his own online-education venture. Before leaving, he recommended Hinton as his successor. Hinton replied to the invitation that he would not leave his university and only wanted to “spend a summer” at Google. Thanks to Google’s peculiar hiring rules, the 64-year-old Hinton became the oldest summer intern in Google’s history.
Hinton had been fighting on the front lines of artificial intelligence since the 1980s. As a professor he trained many outstanding students and is a founding figure of deep learning. So when he learned the technical details of Google Cat, he immediately saw the flaw hidden behind the project’s success: “They ran the wrong neural network and used the wrong kind of computing power.”
Given the same task, Hinton believed he could do better. So as soon as his brief “internship” ended, he got to work.
Hinton brought in two of his students, Ilya Sutskever and Alex Krizhevsky, both Jewish and born in the Soviet Union. The former had exceptional mathematical talent; the latter excelled at engineering. Working closely together, the three built a new neural network and entered it in the ImageNet image-recognition competition (ILSVRC), ultimately winning with an astonishing recognition accuracy of 84%.
In October 2012, Hinton’s team presented the winning algorithm, AlexNet, at a computer vision conference in Florence. Where Google Cat had used 16,000 CPUs, AlexNet used just two NVIDIA GPUs. The result caused a sensation in both academia and industry. The AlexNet paper has become one of the most influential papers in the history of computer science, with over 120,000 citations, while Google Cat was quickly forgotten.
DNNresearch Trio
Yu Kai, whose team had won the first ImageNet competition, was “electrified” after reading the paper. A deep-learning expert born in Jiangxi, Yu Kai had just moved from NEC to Baidu. He immediately e-mailed Hinton proposing a collaboration, and Hinton gladly agreed. Hinton then went further: he packaged himself and his two students into a company and invited buyers to bid, which is how the scene at the beginning came about.
After the hammer fell, a bigger contest unfolded. Google pressed its advantage, acquiring DeepMind in 2014 so that “all the world’s heroes” fell into its hands, and DeepMind shocked the world with AlphaGo in 2016. Baidu, beaten by Google, resolved to bet on AI, investing billions over the next decade; Yu Kai later helped Baidu recruit Andrew Ng, then left and founded Horizon Robotics a few years later.
Microsoft may have seemed slow at first, but in the end it won the biggest prize: OpenAI, whose co-founders include Ilya Sutskever, one of Hinton’s two students. Hinton himself stayed at Google until 2023, winning the ACM Turing Award along the way. Of course, next to Google’s $44 million (of which Hinton received 40%), the Turing Award’s $1 million prize looks like pocket change.
From Google Cat in June, to the AlexNet paper in October, to the Lake Tahoe auction in December, nearly all the groundwork for the AI wave was laid in barely six months: the flourishing of deep learning, the rise of GPUs and NVIDIA, the dominance of AlphaGo, the birth of the Transformer, the emergence of ChatGPT… The grand symphony of the silicon-based golden age had sounded its first note.
In the 180 days from June to December 2012, the fate of carbon-based humanity was changed forever, and only a few people realized it.
Liquid Cat
Among those few was Fei-Fei Li, a professor at Stanford University.
In 2012, when Hinton entered the ImageNet competition, Fei-Fei Li, who had just given birth, was still on maternity leave. But the error rate achieved by Hinton’s team made her realize that history was being rewritten. As the founder of the ImageNet challenge, she booked the last flight to Florence that day and presented the award to Hinton’s team in person.[2]
Fei-Fei Li was born in Beijing and grew up in Chengdu. At 16 she immigrated to the United States with her parents, and she worked in the family laundry while completing her studies at Princeton. In 2009 she joined Stanford as an assistant professor, specializing in computer vision and machine learning, a discipline whose goal is to let computers understand the meaning of images and videos the way humans do.
For example, when a camera photographs a cat, it merely converts light into pixels through a sensor; it has no idea whether the thing in front of the lens is a cat or a dog. If the camera is the human eye, the problem computer vision solves is giving that camera a human brain.
The traditional approach is to abstract real-world objects into mathematical models, for example reducing a cat’s features to simple geometric shapes, which dramatically lowers the difficulty of machine recognition.
Image source: Fei-Fei Li’s TED talk
However, this approach has very significant limitations because the cat may very well be like this:
To get computers to recognize the “liquid cat”, deep-learning pioneers such as Geoffrey Hinton and Yann LeCun had been exploring since the 1980s, but they kept hitting bottlenecks in computing power or algorithms: good algorithms lacked the computing power to drive them, while less compute-hungry algorithms could not reach the recognition accuracy needed for industrial use.
If the “liquid cat” problem could not be solved, the appeal of deep learning would remain purely theoretical, and industrial applications such as autonomous driving, medical imaging, and precision ad targeting would be castles in the air.
In short, the development of deep learning needs three driving forces: algorithms, computing power, and data. The algorithm determines how a computer recognizes things, but it needs sufficient computing power to run, and improving the algorithm in turn requires large-scale, high-quality data. The three complement one another, and none can be spared.
After 2000, rapid advances in chip performance gradually eased the computing-power bottleneck, yet mainstream academia still showed little interest in the deep-learning route. Fei-Fei Li realized that the bottleneck might lie not in the accuracy of the algorithms themselves but in the absence of high-quality, large-scale datasets.
Fei-Fei Li’s inspiration came from how a three-year-old learns about the world. Take cats: under adult guidance a child encounters cats again and again and gradually grasps what “cat” means. If the child’s eyes are treated as a camera and each movement of the eyeballs as a press of the shutter, then a three-year-old has already taken billions of photos.
Apply this method to a computer: keep showing it pictures of cats and other animals, with the correct answer written on the back of each picture. Every time the computer sees a picture, it checks its guess against the answer. With enough repetitions, the computer may grasp the meaning of “cat” just as a child does.
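This show-and-check loop is exactly what supervised learning does. A minimal sketch in plain Python: the “photos” here are just made-up two-number feature vectors (the feature names are invented for illustration), and the learner nudges its weights whenever its guess disagrees with the stored answer.

```python
# A toy illustration of the label-and-check loop described above.
# This is a sketch of supervised learning, not the actual ImageNet pipeline.

def train(examples, epochs=50, lr=0.1):
    """Learn weights for a linear classifier by checking each guess
    against the stored answer and nudging the weights when wrong."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for features, label in examples:        # label: 1 = cat, 0 = not cat
            score = w[0] * features[0] + w[1] * features[1] + b
            guess = 1 if score > 0 else 0
            error = label - guess               # compare with the answer on the back
            w[0] += lr * error * features[0]
            w[1] += lr * error * features[1]
            b += lr * error
    return w, b

def predict(w, b, features):
    return 1 if w[0] * features[0] + w[1] * features[1] + b > 0 else 0

# Hypothetical "photos": (pointy_ears, whisker_density) scores with answers.
data = [([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.1, 0.2], 0), ([0.2, 0.1], 0)]
w, b = train(data)
```

After enough repetitions the learner classifies new “photos” it has never seen, which is the whole point of the child analogy: the knowledge lives in the weights, not in the memorized pictures.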
The only problem is: where to find so many pictures with good answers?
Fei-Fei Li came to China in 2017 and announced the establishment of the Google AI China Center.
This was the opportunity from which ImageNet was born. At the time, even the largest dataset, PASCAL, had only four categories and 1,578 images in total, whereas Fei-Fei Li’s goal was a dataset with hundreds of categories and tens of millions of images. That sounds easy now, but this was 2006, when the world’s most popular phone was still the Nokia 5300.
Relying on Amazon’s crowdsourcing platform, Fei-Fei Li’s team got through the enormous workload of manual annotation, and in 2009 the ImageNet dataset of 3.2 million images was born. With the dataset in hand, algorithms could be trained on it to improve computers’ recognition ability; but compared with a three-year-old’s billions of photos, 3.2 million was still far too small.
To keep expanding the dataset’s reach, Fei-Fei Li decided to follow a popular industry practice and host an image-recognition competition, in which entrants would run their own algorithms against the dataset and the most accurate one would win. But the deep-learning route was not mainstream then, and at first ImageNet could only piggyback on PASCAL, the well-known European competition, to scrape together enough participants.
By 2012, ImageNet had grown to 1,000 categories and 15 million images in total; Fei-Fei Li had spent six years making up the shortfall in data. Yet the best error rate at ILSVRC was still around 25%, so the algorithms and computing power remained unconvincing.
At this point, Professor Hinton appeared with AlexNet and two GTX 580 graphics cards.
Convolution
The Hinton team’s winning entry, AlexNet, used an algorithm called a convolutional neural network (CNN). “Neural network” is an extremely common term in artificial intelligence and a branch of machine learning; its name and structure are modeled on the workings of the human brain.
Human object recognition begins with the pupil capturing pixels; the visual cortex does preliminary processing of edges and orientations, and the brain then forms a judgment through successive layers of abstraction. That is why the human brain can identify an object from just a few of its features.
For example, without showing the entire face, most people can recognize who the person in the following image is:
Neural networks simulate this recognition mechanism of the human brain. In theory, whatever intelligence the brain can achieve, such a machine can too, and compared with methods like SVMs, decision trees, and random forests, only this brain-like approach can handle unstructured data such as the “liquid cat” or “half of Trump’s face”.
Even so, Marvin Minsky, a “father of artificial intelligence”, was never optimistic about the approach; as late as 2007, when he published his book The Emotion Machine, Minsky still expressed pessimism about neural networks. To shake the mainstream machine-learning community’s long-standing disdain for artificial neural networks, Hinton simply rebranded the field “deep learning”.
In 2006, Hinton published a paper in Science proposing the concept of the deep belief network (DBN) and providing a training method for multi-layer deep neural networks, which was regarded as a major breakthrough for deep learning. But Hinton’s method demanded enormous computing power and data, putting practical application out of reach.
Deep learning requires constantly feeding data to the algorithm, and at the time datasets were simply too small, until ImageNet appeared.
In ImageNet’s first two competitions, the entrants used other machine-learning approaches, with fairly mediocre results. In 2012 the Hinton team entered the convolutional neural network AlexNet, an improvement on LeNet, which another deep-learning pioneer, Yann LeCun, had proposed back in 1998. Its convolutional layers let the algorithm extract an image’s key features, such as Trump’s blond hair.
At the same time, the convolutional kernel slides across the input image, so the same feature can be detected no matter where the object appears, and sharing one set of kernel weights across all positions greatly reduces the computation.
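The sliding-kernel idea can be shown in a few lines of plain Python. The 5×5 “image” and 2×2 kernel below are toy values: the kernel is a simple vertical-edge detector, and because the same weights are reused at every position, it fires wherever the edge happens to be.

```python
# A minimal sketch of the sliding convolution described above (no framework).

def conv2d(image, kernel):
    """Slide one kernel over the image; the SAME weights are reused at
    every position, which is why a feature is found wherever it appears."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = 0.0
            for di in range(kh):            # weighted sum over the window
                for dj in range(kw):
                    s += image[i + di][j + dj] * kernel[di][dj]
            row.append(s)
        out.append(row)
    return out

# Toy image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
kernel = [[-1, 1],
          [-1, 1]]   # responds to a dark-to-bright transition
result = conv2d(image, kernel)
# Every output row peaks at the edge position: [0.0, 2.0, 0.0, 0.0]
```

One 2×2 kernel (four weights) covers the whole image; a fully connected layer over the same 5×5 input would need a separate weight for every pixel, which is the computational saving the text describes.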
Building on the classic convolutional network structure, AlexNet abandoned the earlier layer-by-layer unsupervised pre-training and trained directly on labeled inputs with supervised learning, greatly improving accuracy.
For example, in the lower-right image of the figure below, AlexNet did not actually output the correct answer (lemur), but it listed other small tree-climbing mammals much like a lemur. In other words, the algorithm can not only recognize an object itself but also make inferences about related ones.[5]
Image source: AlexNet paper
What excited the industry was this: AlexNet has 60 million parameters and 650,000 neurons, and training it on the ImageNet dataset requires at least 26.2 quadrillion floating-point operations, yet the Hinton team completed training in about a week on just two NVIDIA GTX 580 graphics cards.
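The 60-million-parameter figure can be roughly reproduced from the layer shapes reported in the AlexNet paper. A back-of-the-envelope sketch in Python (shapes follow the paper’s two-GPU grouped layout; treat it as a sanity check rather than an authoritative accounting):

```python
# Rough parameter count for AlexNet from its published layer shapes.

def conv_params(filters, kh, kw, in_channels):
    """Weights plus one bias per filter."""
    return filters * (kh * kw * in_channels) + filters

def fc_params(out_units, in_units):
    return out_units * in_units + out_units

layers = [
    conv_params(96, 11, 11, 3),     # conv1
    conv_params(256, 5, 5, 48),     # conv2 (2 groups, 48 input channels each)
    conv_params(384, 3, 3, 256),    # conv3
    conv_params(384, 3, 3, 192),    # conv4 (grouped)
    conv_params(256, 3, 3, 192),    # conv5 (grouped)
    fc_params(4096, 6 * 6 * 256),   # fc6
    fc_params(4096, 4096),          # fc7
    fc_params(1000, 4096),          # fc8 (1000 ImageNet classes)
]
total = sum(layers)
print(total)  # about 61 million, dominated by the fully connected layers
```

Notably, the three fully connected layers account for the overwhelming majority of the parameters, which is one reason later architectures worked to shrink or drop them.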
GPU
After Hinton’s team took the championship, the most embarrassed party was obviously Google.
Google reportedly also ran internal tests on the ImageNet dataset, with recognition accuracy far behind Hinton’s team. Considering that Google commanded hardware resources the rest of the industry could not match, plus the massive data of Search and YouTube, and that Google Brain was personally championed by the leadership, its results were clearly unpersuasive.
Without such a stark contrast, deep learning might not have shocked the industry and won recognition and popularity so quickly. What excited everyone was that Hinton’s team achieved these results with just a couple of GPUs: computing power was no longer a bottleneck.
During training, the algorithm performs layered operations on each layer’s functions and parameters, and GPUs happen to have very strong parallel-computing capability. Andrew Ng had in fact demonstrated this in a 2009 paper, yet he and Jeff Dean still used CPUs to run Google Cat; later, Jeff Dean ordered $2 million worth of equipment that again included no GPUs.[6]
Hinton was one of the very few who realized early on how valuable GPUs would be for deep learning, but before AlexNet took off, tech companies were generally ambivalent about them.
In 2009, Hinton was invited to Microsoft as a short-term technical consultant on a speech-recognition project. He suggested that the project lead, Li Deng, buy top-of-the-line NVIDIA GPUs along with matching servers. Li Deng supported the idea, but his boss, Alex Acero, thought it a waste of money:[6] “GPUs are for gaming, not for AI research.”
Li Deng
Interestingly, Alex Acero later jumped to Apple, where he took charge of Apple’s voice-recognition software, Siri.
Microsoft’s noncommittal attitude toward GPUs clearly annoyed Hinton. He later suggested in an email that Li Deng buy one set of equipment, while he himself would buy three, adding pointedly:[6] “After all, we are a well-funded Canadian university, not a cash-strapped software vendor.”
But after the 2012 ImageNet challenge ended, AI scholars and technology companies alike made a 180-degree turn toward the GPU. In 2014, Google’s GoogLeNet won the championship with 93% recognition accuracy, using NVIDIA GPUs, and that year the number of GPUs used across all participating teams soared to 110.
The reason this challenge is considered a “Big Bang moment” is that the shortcomings in all three pillars of deep learning, algorithms, computing power, and data, had now been filled in, and industrialization was only a matter of time.
At the algorithm level, the Hinton team’s AlexNet paper became one of the most cited in computer science. The once-diverse set of technical routes gave way to deep learning’s dominance, and almost all computer-vision research turned to neural networks.
At the computing-power level, the fit between GPUs’ massively parallel computation and deep learning was quickly recognized by industry, and NVIDIA, which had begun building CUDA six years earlier, became the biggest winner.
At the data level, ImageNet became the touchstone of image-processing algorithms. With a high-quality dataset, algorithms advanced by leaps in recognition accuracy; in the final challenge in 2017, the champion algorithm reached 97.3%, surpassing humans.
At the end of October 2012, Hinton’s student Alex Krizhevsky presented the paper at a computer vision conference in Florence, Italy. Then tech companies around the world began sparing no expense to do two things: first, buy up all of NVIDIA’s graphics cards; second, hire away all the AI researchers in universities.
Lake Tahoe’s $44 million had given deep-learning talent worldwide a revaluation.
Capture the Flag
Judging from publicly available information, Yu Kai, then still at Baidu, was indeed the first to try to poach Hinton.
At the time, Yu Kai headed Baidu’s multimedia department, the predecessor of Baidu’s Institute of Deep Learning (IDL). After receiving Yu Kai’s email, Hinton quickly replied agreeing to cooperate and expressed the hope that Baidu could provide some funding. When Yu Kai asked how much, Hinton said one million US dollars would be enough, a figure so low it was hard to believe, barely enough to hire two P8-level engineers.
Yu Kai asked Robin Li for approval, and the latter readily agreed. After Yu Kai replied that there was no problem, Hinton, perhaps sensing the industry’s thirst, asked whether Yu Kai would mind him approaching other companies, such as Google. Yu Kai later recalled:[6]
“I felt a pang of regret, thinking I might have answered too quickly and let Hinton realize how big the opportunity was. But all I could do was say generously that I didn’t mind.”
In the end, Baidu and the Hinton team failed to reach a deal, but Yu Kai was not unprepared for that outcome. For one thing, Hinton’s serious lumbar disc problems meant he could not drive or fly, and a trans-Pacific trip to China would have been hard to endure; for another, Hinton had too many students and friends working at Google, and the two sides’ ties ran deep, so the other three bidders were essentially just driving up the price.
If AlexNet’s impact was still confined to academia, the secret Lake Tahoe auction shocked the entire industry: under the noses of the world’s tech companies, Google spent $44 million on a company less than a month old, with no products, no revenue, just three employees and a few papers.
The most agitated was, of course, Baidu. Though it lost the auction, Baidu’s management had witnessed how Google would invest in deep learning at any cost, which made up Baidu’s mind to commit: it announced the Institute of Deep Learning (IDL) at its annual meeting in January 2013, recruited Andrew Ng, the key figure behind Google Cat, in May 2014, and in January 2017 brought in Qi Lu, newly departed from Microsoft.
After acquiring the Hinton team, Google kept pressing, buying its competitor DeepMind for $600 million in 2014.
At the time, Musk had recommended DeepMind, in which he was an investor, to Google co-founder Larry Page. To get Hinton to London to assess DeepMind’s value, the Google team chartered a private plane and modified its seats, working around Hinton’s inability to fly.[6]
The “British player”: DeepMind’s AlphaGo defeated Lee Sedol at Go in 2016.
Facebook had competed with Google for DeepMind, and after DeepMind went to Google, Zuckerberg recruited Yann LeCun, one of the “three musketeers of deep learning”. To bring LeCun aboard, Zuckerberg agreed to many exacting conditions: an AI lab in New York, complete separation of the lab from the product teams, and LeCun keeping his professorship at New York University.
After the 2012 ImageNet challenge, the field of artificial intelligence faced a severe mismatch between talent supply and demand:
On the demand side, industrial applications such as image recognition and autonomous driving were opening up fast, and the need for talent soared; but years of pessimism had kept the circle of deep-learning researchers tiny, with top scholars countable on one hand, so supply fell badly short.
Hence the tech companies’ eagerness to buy “talent futures”: poach the professors, then wait for them to bring their students along.
After Yann LeCun joined Facebook, six of his students followed him into the company. Apple, eager to try its hand at car-making, hired Ruslan Salakhutdinov, one of Hinton’s students, as its first AI director. Even the hedge fund Citadel joined the talent war, poaching Li Deng, who had worked with Hinton on speech recognition and later represented Microsoft at the secret auction.
The history that followed is familiar: industrial applications such as facial recognition, machine translation, and autonomous driving advanced rapidly, GPU orders drifted like snowflakes toward NVIDIA’s headquarters in Santa Clara, and the theoretical edifice of artificial intelligence rose day by day.
In 2017, Google proposed the Transformer model in the paper “Attention Is All You Need”, opening today’s era of large models. A few years later, ChatGPT arrived.
And all of this can be traced back to the 2012 ImageNet Challenge.
So in which year did the historical process that produced the 2012 “Big Bang moment” first reveal itself?
The answer is 2006.
Greatness
Before 2006, the state of deep learning could be summed up by borrowing Lord Kelvin’s famous remark: the edifice of deep learning was essentially complete, but three small dark clouds floated in its clear sky.
Those three little dark clouds were algorithms, computing power, and data.
As mentioned earlier, because it simulates the mechanism of the human brain, deep learning is a theoretically superior solution. The problem was that the data it needs to consume and the computing power it demands were both at science-fiction levels at the time, and the academic mainstream’s view of deep learning amounted to: no scholar in their right mind would study neural networks.
But three things happened in 2006 that changed this:
Hinton and his student Salakhutdinov (who later went to Apple) published “Reducing the Dimensionality of Data with Neural Networks” in Science, proposing for the first time an effective solution to the vanishing-gradient problem, a significant step forward at the algorithm level.
Salakhutdinov (left) and Hinton (center), 2016
Fei-Fei Li of Stanford University realized that if a dataset could not capture the true variety of the real world, even the best algorithm would struggle to “simulate the human brain” through training. So she began building the ImageNet dataset.
NVIDIA released a new GPU generation based on the Tesla architecture and then introduced the CUDA platform, sharply lowering the difficulty of training deep neural networks on GPUs and with it the threshold of computing power.
Together, these three events dispersed the three dark clouds hanging over deep learning, and they converged at the 2012 ImageNet challenge, rewriting the fate of the tech industry and of human society at large.
But in 2006, neither Geoffrey Hinton, nor Fei-Fei Li, nor Jensen Huang, nor anyone else pushing deep learning forward could have foreseen the coming prosperity of artificial intelligence, let alone the roles they would play in it.
The paper by Hinton and Salakhutdinov
To this day, the AI-driven fourth industrial revolution has begun, and the evolution of artificial intelligence will only accelerate. If we can draw any inspiration from this history, it may come down to three points:
1. The depth of an industry determines the height of its innovation.
When ChatGPT emerged, cries of “Why the United States again?” rose one after another. But stretch the timeline and you find that from the transistor, the integrated circuit, Unix, and the x86 architecture to today’s machine learning, the United States has almost always led both academia and industry.
This is because, although talk of America’s industrial hollowing-out never stops, its software-centered computer-science industry has not “flowed out” to other economies at all; its advantages keep growing. To date, almost all of the nearly 70 ACM Turing Award winners have done their work in the United States.
Andrew Ng chose to collaborate with Google on Google Cat largely because only Google had the data and computing power the algorithm’s training required, and that in turn was built on Google’s formidable profitability. This is the advantage industrial depth confers: talent, investment, and the capacity to innovate all converge on the industry’s high ground.
China’s advantaged industries show the same depth advantage, the most typical example today being new-energy vehicles. European carmakers charter planes to China’s auto shows to pay their respects to the EV upstarts, while Japanese auto executives keep jumping ship to BYD. What are they after? Obviously not just the chance to pay social insurance in Shenzhen.
2. The more cutting-edge the field, the more talent matters.
Google was willing to pay $44 million for Hinton’s company because in a frontier field like deep learning, one top scholar often matters more than ten thousand fresh computer-vision graduates. Had Baidu or Microsoft won the bid, the trajectory of artificial intelligence might have been rewritten.
This kind of “buy the whole company to get one person” move is in fact common. During the crucial phase of Apple’s in-house chip effort, it acquired the small company P.A. Semi precisely to bring the chip-architecture genius Jim Keller aboard; Apple’s A4, AMD’s Zen, and Tesla’s FSD chip have all benefited from Jim Keller’s hand.
This, too, is the greatest advantage industrial competitiveness brings: the power to attract talent.
None of the “three giants of deep learning” was born American. The name AlexNet comes from Geoff Hinton’s student Alex Krizhevsky, who was born in Soviet-era Ukraine, grew up in Israel, and went to Canada to study, to say nothing of the many Chinese faces still active in American tech companies today.
3. The difficulty of innovation lies in how to face uncertainty.
Besides Marvin Minsky, the “father of artificial intelligence”, another famous opponent of deep learning was Jitendra Malik of the University of California, Berkeley; both Andrew Ng and Geoffrey Hinton had been ridiculed by him. While building ImageNet, Fei-Fei Li consulted Malik too, and his advice was to “do something more useful”.
Fei-Fei Li’s TED talk
It was precisely because of such skepticism from industry pioneers that deep learning languished in obscurity for decades. Even in 2006, when Hinton finally showed a glimmer of light, another heavyweight, Yann LeCun, was still repeatedly proving to the academic community that “deep learning has research value too”.
Yann LeCun had been studying neural networks since the 1980s. During his time at Bell Labs, he and his colleagues designed a chip called ANNA in an attempt to solve the computing-power problem. Later, under business pressure, AT&T demanded that the research department “empower the business”; LeCun’s response was, “I’m here to study computer vision. Fire me if you dare.” In the end he persisted and was rewarded with success.[6]
Researchers in any frontier field must confront the same question: what if this thing never works?
From entering the University of Edinburgh in 1972, Hinton spent 50 years at the forefront of deep learning; by the 2012 ImageNet challenge he was already 65. It is hard to imagine how much academic skepticism and self-doubt he had to overcome across that long stretch.
We now know that the Hinton of 2006 was enduring the last darkness before dawn, but he himself could not have known it, let alone the wider academic and industrial world. Just as when the iPhone launched in 2007, most people’s reaction was probably the same as then Microsoft CEO Ballmer’s:
“The iPhone is the most expensive phone in the world, and it doesn’t have a keyboard.”
Those who drive history often cannot guess their own coordinates within it.
Greatness is great not because of a stunning debut, but because it endures long stretches of obscurity and incomprehension in boundless darkness. Only years later can people look back along those benchmarks and marvel at the brilliant stars and the geniuses who emerged among them.
In one scientific research field after another, countless scholars have never glimpsed the faintest glimmer of hope throughout their lives. So in a sense, Sutton and other Depth learning advocates are fortunate. They have created greatness, indirectly driving success in one industry after another.
The Capital Market will set a fair price for success, while history records those who create greatness through loneliness and sweat.
References
16,000 computers searching for a cat together, The New York Times [1]
Fei-Fei Li's Quest to Make AI Better for Humanity, Wired [2]
Fei-Fei Li's TED talk [3]
Seeing through ImageNet models in 21 seconds: 60+ model architectures on one stage, Machine Heart [4]
The "road to godhood" of convolutional neural networks: it all began with AlexNet, Xinzhiyuan [5]
The Deep Learning Revolution, Cade Metz [6]
To Find AI Engineers, Google and Facebook Hire Their Professors, The Information [7]
Thirty years of deep learning: a road of innovation, Zhu Long [8]
Eight years of ImageNet: Fei-Fei Li and the AI world she changed, QbitAI [9]
Deep Learning: Previous and Present Applications, Ramiro Vargas [10]
Review of deep learning: concepts, CNN architectures, challenges, applications, future directions, Laith Alzubaidi et al. [11]
Literature Review of Deep Learning Research Areas, Mutlu Yapıcı et al. [12]
The true hero behind ChatGPT: OpenAI chief scientist Ilya Sutskever and his leap of faith, Xinzhiyuan [13]
10 years later, deep learning 'revolution' rages on, say AI pioneers Hinton, LeCun and Li, VentureBeat [14]
From not working to neural networking, The Economist [15]
Huge "foundation models" are turbo-charging AI progress, The Economist [16]
2012: A Breakthrough Year for Deep Learning, Bryan House [17]
Deep Learning: the "magic wand" of artificial intelligence, Anxin Securities [18]
The development of deep learning algorithms: from diversity to unity, Guojin Securities [19]
180 days that changed the fate of humanity in 2012
On a day in early December 2012, a secret auction was taking place in a casino hotel in Lake Tahoe, a skiing resort in the United States.
Lake Tahoe is located on the border of California and Nevada, and is the largest alpine lake in North America, with a sapphire-like lake surface and top-notch ski trails. The Godfather Part II was filmed here, and Mark Twain once lingered here. Due to its proximity to the San Francisco Bay Area, which is only over 200 miles away, it is often referred to as the “backyard of Silicon Valley”. Pro like Zuckerberg and Larry Ellison have also acquired land here to build mansions.
The object of the secret auction is a company called DNNresearch, which was founded just a month ago and has only three employees. Its founders are Geoffrey Hinton, a professor at the University of Toronto, and two of his students.
This company has no tangible products or assets, but the identities of the pursuers indicate its significance - the four buyers are Google, Microsoft, DeepMind, and Baidu.
Harrah’s Hotel, Lake Tahoe, 2012, which holds secret auctions
At 65, Xindun looked old and thin, suffering from the pain of lumbar disc. He sat on the floor of room 703 of the hotel setting rules for the auction - starting at $12 million, with increments of at least $1 million.
A few hours later, the bidders pushed the price up to $44 million, and Singleton felt a little dizzy. ‘I feel like we’re filming a movie,’ he said. So he decided to call it off and sell the company to the final bidder, Google.
Interestingly, one of the origins of this $44 million auction comes from Google six months ago.
In June 2012, Google’s research department Google Brain released the research results of The Cat Neurons project (also known as “Google Cat”). In simple terms, this project uses Algorithm to identify cats in videos on YouTube. It was initiated by Andrew Ng, who joined Google after leaving Stanford, and he recruited Google’s legendary figure Jeff Dean. They also obtained a large budget from Google founder Larry Page.
The Google Cat project has built a neural network that downloads a large number of videos from YouTube without annotation, allowing the model to observe and learn cat features on its own. Then, 16,000 CPUs distributed across Google’s data centers were used for training (refusing to use GPUs internally due to complexity and high cost), achieving a recognition accuracy of 74.8%. This number shocked the industry.
Andrew Ng retired from the ‘Google Brain’ project shortly before its conclusion and devoted himself to his own internet education project. Before leaving, he recommended Sutton to take over his position. In response to the invitation, Sutton stated that he would not leave the university and only wanted to ‘spend a summer’ at Google. Due to the unique recruitment rules of Google, 64-year-old Sutton became the oldest summer intern in Google’s history.
Since the 1980s, Xin Dun has been fighting at the forefront of artificial intelligence. As a professor, he has nurtured many outstanding students, including Andrew Ng, and is a master in the field of Depth learning. Therefore, when he understood the technical details of the ‘Google Cat’ project, he immediately saw the hidden flaws behind the project’s success: ‘They ran the wrong neural network and used the wrong computing power.’
With the same task, Xin Dun believed that he could do better. So after the short “internship” period, he immediately took action.
Sinton brought in his two students, Ilya Sutskever and Alex Krizhevsky, both of whom were Jewish and born in the Soviet Union. The former had exceptional mathematical talent, while the latter excelled in engineering implementation. After close collaboration, the three of them created a new neural network and immediately entered the ImageNet image recognition competition (ILSVRC), ultimately winning the championship with an astonishing 84% recognition accuracy.
In October 2012, the team led by Hinton introduced the champion AlgorithmAlexNet at a computer vision conference held in Florence. Compared with Google’s cat, which used 16,000 CPUs, AlexNet only used 4 Nvidia GPUs. This caused a sensation in both the academic and industrial communities. The paper on AlexNet has become one of the most influential papers in the history of computer science, with over 120,000 citations, while Google’s cat has been quickly forgotten.
DNNresearch Trio
After reading the paper, Yu Kai, who won the first ImageNet competition, was extremely excited, ‘like being electrocuted’. Yu Kai is a Depth learning expert born in Jiangxi, who just moved from NEC to Baidu. He immediately wrote an email to Sutton expressing his desire to collaborate, which Sutton gladly agreed to. He even packaged himself and two students into a company, invited buyers to bid, and thus the scene at the beginning came about.
After the auction hammer fell, a bigger competition unfolded: Google followed up its success and acquired DeepMind in 2014, ‘all the world’s heroes fall into my hands’; and DeepMind shocked the world by launching AlphaGo in 2016; Baidu, defeated by Google, was determined to bet on AI, investing billions over ten years, and Yu Kai later helped Baidu to invite Andrew Ng, while he himself left and founded Horizon Robotics a few years later.
Microsoft may have seemed slow at first, but in the end it won the biggest prize - OpenAI, whose founders include Ilya Sutskever, one of Sutton’s two students. Sutton himself stayed at Google until 2023, during which time he won the ACM Turing Award. Of course, compared to Google’s $44 million (of which Sutton received 40%), the Turing Award’s $1 million prize money seems like pocket change. 01928374656574839201
From the Google Cat in June to the AlexNet paper in October, and then to the bidding of Taihao Lake in December, almost all the groundwork of the AI wave has been laid in nearly 6 months— the prosperity of Depth learning, the rise of GPU and NVIDIA, the dominance of AlphaGo, the birth of Transformer, the emergence of ChatGPT… The grand movement of the silicon-based golden age has played the first note.
In 2012, for 180 days, from June to December, the fate of carbon-based humanity was changed forever – and only a few people realized it.
Liquid Cat
Among these few people, Fei-Fei Li, a professor at Stanford University, is one of them.
In 2012, when Clinton participated in the ImageNet competition, Li Feifei, who had just given birth to a child, was still on maternity leave. However, the error rate of Clinton’s team made her realize that history was being rewritten. As the founder of the ImageNet challenge, she bought the last flight to Florence that day and personally presented the award to Clinton’s team.[2]。
Li Feifei was born in Beijing and grew up in Chengdu. At the age of 16, she immigrated to the United States with her parents. While working in a laundry, she finished her studies at Princeton. In 2009, Li Feifei joined Stanford as an assistant professor, specializing in computer vision and machine learning. The goal of this discipline is to enable computers to understand the meaning of images and videos like humans do.
For example, when a camera takes a picture of a cat, it simply converts the light into pixels through a sensor, without knowing whether the thing in the lens is a cat or a dog. If a camera is compared to a human eye, the problem solved by computer vision is to equip the camera with a human brain.
The traditional way is to abstract things in the real world into mathematical models, such as abstracting the characteristics of a cat into simple geometric shapes, which can significantly reduce the difficulty of machine recognition.
Image source: Li Feifei’s TED talk
However, this approach has very significant limitations because the cat may very well be like this:
In order to enable computers to recognize ‘Liquid Cat’, a large number of depth learning pioneers such as Jeff Hinton and Yann LeCun have been exploring since the 1980s. However, they always encounter bottlenecks in computing power or algorithms - good algorithms lack sufficient computing power to drive them, and algorithms that require less computing power are difficult to meet the recognition accuracy and cannot be industrialized.
If the problem of ‘liquid cat’ cannot be solved, the sexiness of Depth learning can only stay at the theoretical level, and industrial scenarios such as autonomous driving, medical imaging, and precision advertising push are just castles in the air.
In simple terms, the development of Depth learning requires three driving forces: Algorithm, Computing Power, and data. The Algorithm determines how the computer recognizes things; however, it requires sufficient Computing Power to drive it. At the same time, the improvement of the Algorithm also requires large-scale and high-quality data. The three complement each other and are indispensable.
After the year 2000, although the Computing Power bottleneck has gradually been eliminated with the rapid advancement of chip processing power, the mainstream academic community still lacks interest in the Depth learning route. Fei-Fei Li realized that the bottleneck may not lie in the accuracy of the Algorithm itself, but in the lack of high-quality, large-scale datasets.
Li Feifei’s inspiration comes from the way a three-year-old child understands the world - take cats as an example, children will encounter cats again and again under the guidance of adults, gradually mastering the meaning of cats. If the child’s eyes are treated as a camera, each rotation of the eyeball is equivalent to pressing the shutter once, then a three-year-old child has already taken billions of photos.
Trap this method on the computer, for example, if you keep showing the computer pictures containing cats and other animals, and write down the correct answer behind each picture. Every time the computer sees a picture, it checks it against the answer on the back. So as long as there are enough repetitions, the computer may be able to grasp the meaning of cats like a child.
The only problem is: where to find so many pictures with good answers?
Li Feifei came to China in 2016 and announced the establishment of Google AI China Center.
This is the opportunity for the birth of ImageNet. At that time, even the largest dataset PASCAL only had four categories and a total of 1578 images. However, Fei-Fei Li’s goal was to create a dataset with hundreds of categories and tens of millions of images in total. Now it sounds easy, but it was in 2006, when the most popular mobile phone in the world was still the Nokia 5300.
Relying on Amazon’s crowdsourcing platform, Li Feifei’s team solved the huge workload of manual annotation. In 2009, the ImageNet dataset, which contains 3.2 million images, was born. With the image dataset, we can train algorithms on this basis to improve the computer’s recognition ability. However, compared with billions of photos of three-year-old children, the scale of 3.2 million is still too small.
To continuously expand the dataset, Li Feifei decided to emulate the popular practice in the industry by hosting an image recognition competition, where participants bring their own datasets for Algorithm identification, and the one with the highest accuracy wins. However, the Depth learning route was not mainstream at that time. Initially, ImageNet could only ‘attach’ to the well-known European event PASCAL in order to barely gather enough participants.
By 2012, the number of images in ImageNet had expanded to 1,000 categories with a total of 15 million images. Fei-Fei Li took 6 years to make up for this shortcoming in data. However, the best error rate of ILSVRC is also 25%, showing insufficient convincing power in Algorithm and Computing Power.
At this point, Professor Simpton appeared with AlexNet and two GTX580 graphics cards.
Convolution
The champion AlgorithmAlexNet of the Sutton team adopts an algorithm called Convolutional Neural Networks (CNN). “Neural network” is an extremely common term in the field of artificial intelligence and a branch of machine learning. Its name and structure are based on the operation mode of the human brain.
The process of human identifying objects starts with the pupil capturing pixels, and the brain cortex makes preliminary processing through edges and orientation, and then the brain judges through continuous abstraction. Therefore, the human brain can distinguish objects based on some features.
For example, without showing the entire face, most people can recognize who the person in the following image is:
Neural networks are actually simulating the recognition mechanism of the human brain. In theory, the intelligent computer that the human brain can achieve can also be achieved. Compared with methods such as SVM, decision trees, and random forests, only simulating the human brain can handle non-structured data such as “liquid cat” and “half of Trump”.
This is also why even the ‘father of artificial intelligence’ Marvin Minsky is not optimistic about this approach. When he published his new book ‘The Emotion Machine’ in 2007, Minsky still expressed pessimism about neural networks. In order to change the long-term negative attitude of the mainstream machine learning community towards artificial neural networks, Hinton simply renamed it Depth Learning (Deep Learning).
In 2006, Hinton published a paper in Science, proposing the concept of “Depth Belief Neural Network (DBNN)”, and provided a training method for a multi-layer Depth neural network, which was considered a major breakthrough in Depth learning. However, Hinton’s method requires a significant amount of Computing Power and data, making practical application difficult to achieve.
Depth learning requires constantly feeding data to the Algorithm. At that time, the size of the dataset was too small, until ImageNet appeared.
In the first two competitions of ImageNet, the participating teams used other machine learning approaches, and the results were quite mediocre. However, the Sinton team used the convolutional neural network AlexNet in 2012, which was an improvement on another pioneer of depth learning, Yann LeCun, and his LeNet proposed in 1998, which allowed the algorithm to extract key features of images, such as Trump’s blonde hair.
At the same time, the convolutional kernel will slide on the input image, so no matter where the detected object is, the same features can be detected, greatly reducing the computational load.
Based on the classic convolutional neural network structure, AlexNet abandons the previous layer-by-layer unsupervised methods and conducts supervised learning on input values, greatly improving accuracy.
For example, in the image in the lower right corner of the figure below, AlexNet did not actually recognize the correct answer (lemur). However, it listed small mammals that can climb trees, just like lemurs. This means that the Algorithm can not only recognize the object itself, but also make inferences based on other objects.[5]。
Image source: AlexNet paper
The industry is excited that AlexNet has 60 million parameters and 650,000 neurons, and training the ImageNet dataset requires at least 26.2 quadrillion floating-point operations. However, the Sutton team only used two NVIDIA GTX 580 graphics cards in a week of training.
GPU
After the Xin Dun team won the championship, the most embarrassing thing is obviously Google.
It is said that Google also tested the ImageNet dataset internally, but its recognition accuracy lags far behind the Sutton team. Considering that Google has hardware resources that the industry cannot reach, as well as the massive data scale of search and YouTube, Google Brain is specifically appointed by the leader, and its results obviously lack sufficient persuasiveness.
Without such a huge contrast, Depth learning may not have shocked the industry, gained recognition and popularity in a short period of time. The industry is excited because the Sutton team was able to achieve such good results with only four GPUs, so Computing Power is no longer a bottleneck.
When training, Algorithm will perform hierarchical operations on the functions and parameters of each layer of the neural network to obtain the output results, and GPUs happen to have very strong parallel computing capabilities. In fact, Andrew Ng proved this in a paper in 2009, but he and Jeff Dean still used CPUs when running “Google Cat”. Later, Jeff Dean specially ordered equipment worth $2 million, which still did not include GPUs.[6]。
Sinton is one of the very few who realized the great value of GPU for Depth learning very early on, however, before AlexNet became popular, high-tech companies generally had unclear attitudes towards GPUs.
In 2009, Xin Dun was invited to Microsoft to be a short-term technical consultant for a speech recognition project. He suggested that the project leader, Deng Li, purchase the top-notch NVIDIA GPU and match it with the corresponding server. This idea was supported by Deng Li, but Deng Li’s boss, Alex Acero, thought it was just a waste of money.[6]“GPUs are for gaming, not for AI research.”
Deng Li
Interestingly, Alex Acero later jumped to Apple and was in charge of Apple’s voice recognition software, Siri.
Microsoft’s non-committal attitude towards the GPU obviously made Xindon somewhat angry. He later suggested in an email that Dengli buy a trap device, while he himself would buy three traps, and said something in a strange way.[6]After all, we are a financially strong Canadian university, not a financially tight software vendor.
But after the end of the ImageNet Challenge in 2012, all artificial intelligence scholars and technology companies made a 180-degree turn towards GPU. In 2014, Google’s GoogLeNet won the championship with a recognition accuracy of 93%, using NVIDIA GPU. That year, the number of GPUs used by all participating teams soared to 110.
The reason why this challenge is considered a “big bang moment” is that the three pillars of Depth learning - Algorithm, Computing Power, and the shortcomings in data have been filled, and industrialization is only a matter of time.
At the algorithm level, the paper on AlexNet published by the Simpson team became one of the most cited papers in the field of computer science. The originally diverse technical route became dominated by the Depth learning, and almost all computer vision research turned to neural networks.
At the level of Computing Power, the adaptability of GPU’s super parallel computing capabilities and Depth Learning was quickly recognized by the industry, and NVIDIA, which began to deploy CUDA six years ago, became the biggest winner. **
**On the data level, ImageNet has become the touchstone of image processing algorithms. With high-quality datasets, algorithms are making great strides in recognition accuracy. In the last challenge of 2017, the champion algorithm achieved an identification accuracy of 97.3%, surpassing that of humans.
At the end of October 2012, Sutton’s student Alex Krizhevsky presented a paper at a computer vision conference in Florence, Italy. Then, high-tech companies around the world began to spare no expense to do two things: First, buy all of NVIDIA’s graphics cards, and second, hire all the AI researchers in universities.
Lake Tahoe’s $44 million has given the global Depth learning genius a revaluation.
Capture the Flag
From publicly available information, it seems that Yu Kai, who was still at Baidu at the time, was indeed the first person to come and dig in Washington.
At that time, Yu Kai was the head of Baidu’s Multimedia Department, which was the predecessor of Baidu’s Institute of Deep Learning (IDL). After receiving Yu Kai’s email, Sinton quickly replied agreeing to the cooperation and also expressed the wish for Baidu to provide some funding. When Yu Kai asked for a specific amount, Sinton said that 1 million US dollars would be enough - a number so low that it was unbelievable, only enough to hire two P8s.
Yu Kai asked Li Yanhong for permission, and the latter readily agreed. After Yu Kai replied with no problem, Xin Dun may have felt the thirst of the industry and asked Yu Kai if he minded asking other companies, such as Google. Yu Kai later recalled.[6]:
“I regretted a bit at that time, thinking that I might have answered too quickly and made Xin Dun realize the huge opportunity. However, I can only generously say that I don’t mind.”
In the end, Baidu and the Xin Dun team failed to come to an agreement. But for this result, Yu Kai was not unprepared. On the one hand, Xin Dun has serious lumbar disc health issues, cannot drive, or take a plane, and it is difficult to endure the trip to China across the Pacific; on the other hand, Xin Dun has too many students and fren who work at Google, and the two sides have deep connections, the other three are essentially just keeping up with the bid.
If the impact of AlexNet is still concentrated in the academic circle, then the secret auction of Taihu Lake has completely shocked the industry—because Google, under the nose of global technology companies, spent 44 million US dollars to buy a company that was established less than a month, has no products, no income, only three employees, and a few papers.
The most exciting one is obviously Baidu. Although it failed in the auction, Baidu’s management witnessed how Google invested in Depth learning at all costs, which prompted Baidu to make up its mind to invest and announced the establishment of the Depth Learning Research Institute (IDL) at the annual meeting in January 2013. In May 2014, Baidu invited key figures from the ‘Google Cat’ project, Andrew Ng. In January 2017, they also invited Qi Lu, who had left Microsoft.
After acquiring the Sutton team, Google continued to make efforts and bought its competitor DeepMind for $600 million in 2014.
At that time, Musk recommended DeepMind, in which he invested, to Google co-founder Larry Page. In order to bring Sinton to London to test its value, the Google team specially chartered a private plane and modified the seats to solve Sinton’s problem of not being able to take a plane.[6]。
“British player” DeepMind defeated Lee Sedol in the Go match in 2016.
Facebook is competing with Google for DeepMind. After DeepMind went to Google, Zuckerberg hired one of the ‘Depth Learning Three Musketeers’, Yang Likun. In order to bring Yang Likun into his team, Zuckerberg promised him many harsh requirements, such as setting up an AI lab in New York, completely separating the lab from the product team, allowing Yang Likun to continue serving at New York University, etc.
After the **2012 ImageNet challenge, the field of artificial intelligence is facing a very serious “talent supply and demand mismatch” problem:
Due to the rapid opening of industrial space for Algorithm, image recognition, and autonomous driving, the demand for talents has skyrocketed. However, due to long-term pessimism, the circle of researchers in Depth learning is very small, and there are only a few top scholars who can be counted on fingers, resulting in a severe shortage of supply.
In this case, technology companies are so eager to buy “talent futures”: they dig up professors, and then wait for them to bring in their own students as well.
After Yang Likun joined Facebook, six students followed him to join the company. Apple, which is eager to try its hand at car manufacturing, hired Ruslan Salakhutdinov, one of Sutton’s students, as its first AI director. Even the hedging fund Citadel joined the talent war and poached Deng Li, who worked with Sutton on speech recognition and later represented Microsoft in a secret auction.
The subsequent history is clear to us: industrial scenarios such as facial recognition, machine translation, and autonomous driving are making rapid progress, GPU orders are flowing like snowflakes towards NVIDIA’s headquarters in Santa Clara, and the theoretical edifice of artificial intelligence is being built day by day.
In 2017, Google proposed the Transformer model in the paper ‘Attention is all you need’, which opened the era of large models today. A few years later, ChatGPT emerged.
And all of this can be traced back to the 2012 ImageNet Challenge.
So, in which year did the historical process that led to the birth of the 2012 “Big Bang Moment” manifest itself?
The answer is 2006.
Great
Before 2006, the current situation of Depth learning can be summarized by borrowing the famous saying of Lord Kelvin: The building of Depth learning has been basically completed, but there are three small dark clouds floating in the bright sky.
These three little dark clouds are Algorithm, Computing Power, and data.
As mentioned earlier, due to simulating the mechanism of the human brain, Depth learning is a theoretically perfect solution. However, the problem is that both the data it needs to consume and the Computing Power it requires were at a science fiction level at that time. The mainstream view of Depth learning in the academic community was: scholars with normal brains would not research neural networks.
But three things happened in 2006 that changed this:
Sutton and student Salakhutdinov (who later went to Apple) published a paper in Science, Reducing the dimensionality of data with neural networks, for the first time proposing an effective solution to the vanishing gradient problem, which made a significant step forward at the Algorithm level.
Salakhutdinov (left) and Sutton (center), 2016
Feifei Li of Stanford University realized that if the data scale is difficult to restore the true appearance of the real world, then even the best Algorithm will find it difficult to achieve the effect of “simulating the human brain” through training. So, she began to build the ImageNet dataset.
NVIDIA has released a new GPU based on the Tesla architecture and subsequently introduced the CUDA platform. The difficulty of training Depth neural networks using GPUs has dropped significantly, and the threshold of Computing Power has been greatly reduced.
The occurrence of these three events dissipated the three dark clouds above the Depth learning, and converged in the 2012 ImageNet Challenge, completely rewriting the fate of the high-tech industry and even the entire human society.
But in 2006, whether it’s Jeff Hinton, Fei-Fei Li, Renxun Huang, or others who have promoted the development of Depth learning, obviously they could not anticipate the subsequent prosperity of artificial intelligence, let alone the roles they would play.
The paper by Hinton and Salakhutdinov
To this day, the fourth industrial revolution driven by AI has begun, and the evolution of artificial intelligence will only accelerate. If we can gain any inspiration, it may be summed up in the following three points:
1. The thickness of the industry determines the height of innovation.
When ChatGPT emerged, the voices of ‘Why is it the United States again?’ were heard one after another. However, if we extend the time, we will find that from transistors, integrated circuits, Unix, x86 architecture, to today’s machine learning, the United States has almost always been a leader in academia and industry.
This is because, although the discussion about the hollowing out of American industry is endless, the software-centered computer science industry has not only not ‘flowed out’ to other economies, but its advantages are becoming greater. To date, nearly 70 winners of the ACM Turing Award are almost all Americans.
The reason why Wu Enda chose to cooperate with Google on the “Google Cat” project is largely because only Google has the data and Computing Power required for Algorithm training, and this is built on Google’s strong profit-making ability. This is the advantage brought by the industrial thickness - talent, investment, and innovation ability will all converge towards the highland of the industry.
In China’s advantaged industries, there is also a demonstrated ‘thickness advantage.’ Currently, the most typical example is new energy vehicles. On one hand, European car companies are chartering planes to come to China’s auto shows to pay homage to the new forces, while on the other hand, Japanese car executives are frequently jumping ship to BYD. What are they after? Obviously, it’s not just to be able to pay social security in Shenzhen.
The reason Google was willing to spend $44 million to buy Xindun’s company is that in the field of cutting-edge technology such as Depth learning, the role of a top scholar is often greater than that of ten thousand fresh graduates majoring in computer vision. If Baidu or Microsoft had won the bid at that time, the development context of artificial intelligence might have been rewritten.
This kind of behavior of “buying the whole company for you” is actually very common. During the key stage of Apple’s self-developed chips, they acquired a small company PASemi, just to get the chip architecture genius Jim Keller on board - Apple’s A4, AMD’s Zen, TSL’s FSD chips, have all benefited from Jim Keller’s technical assistance.
This is also the greatest advantage brought by industrial competitiveness - the attractiveness to talents.
“Depth Learning Three Giants” none of them are Americans, the name AlexNet comes from Geoff Hinton’s student Alex Krizhevsky, who was born in Ukraine under Soviet rule, grew up in Israel, and came to Canada to study. Not to mention the many Chinese faces still active in American high-tech companies today.
3. The difficulty of innovation lies in how to face uncertainty.
Apart from Marvin Minsky, the ‘father of artificial intelligence’, another famous opponent of deep learning is Jitendra Malik from the University of California, Berkeley. Both Andrew Ng and Geoffrey Hinton have been ridiculed by him. When constructing ImageNet, Fei-Fei Li also consulted Malik, who advised her to ‘Do something more useful’.
Fei-Fei Li’s TED talk
It is precisely because of the skepticism of such industry pioneers that deep learning languished in obscurity for decades. Even in 2006, when Hinton had brought it a glimmer of hope, Yann LeCun, another heavyweight, was still repeatedly proving to the academic community that “deep learning also has research value.”
Yann LeCun had been studying neural networks since the 1980s. During his years at Bell Labs, LeCun and his colleagues designed a chip called ANNA in an attempt to solve the computing-power problem. Later, under business pressure, AT&T required the research division to “empower the business,” and LeCun’s response was: “I am here to study computer vision. Fire me if you can.” In the end he persisted and was rewarded with success.[6]
Researchers in any cutting-edge field must confront a common question: what if this thing can never be made to work?
From the moment he entered the University of Edinburgh in 1972, Hinton stood at the forefront of deep learning for fifty years. By the time of the 2012 ImageNet challenge, he was already 65. It is hard to imagine how much academic skepticism and self-doubt he had to overcome over that long stretch.
We now know that the Hinton of 2006 was enduring the final darkness before dawn, but he himself could not have known it, let alone the wider academic and industrial communities. Much as in 2007, when the iPhone was released, most people reacted the way then-Microsoft CEO Steve Ballmer did:
“The iPhone is the most expensive phone in the world, and it doesn’t have a keyboard.”
Those who drive history often cannot guess their own coordinates within it.
What makes greatness great is not the dazzling moment of its emergence, but the long years of obscurity and incomprehension it must endure in the boundless dark. Only many years later can people look back at these benchmarks and marvel at the brilliant stars and the geniuses that emerged in that era.
In one research field after another, countless scholars never glimpse even the faintest glimmer of hope in their lifetimes. In that sense, Hinton and the other advocates of deep learning are fortunate: they created greatness and, indirectly, drove success in one industry after another.
The capital markets will set a fair price on success, while history records those who forged greatness through loneliness and sweat.
Reference
16,000 computers searching for a cat together, The New York Times [1]
Fei-Fei Li’s Quest to Make AI Better for Humanity, Wired [2]
Fei-Fei Li’s TED talk [3]
21 seconds to see through an ImageNet model: 60+ model architectures on the same stage, 机器之心 [4]
The “road to godhood” of convolutional neural networks: it all began with AlexNet, 新智元 [5]
The Deep Learning Revolution, Cade Metz [6]
To Find AI Engineers, Google and Facebook Hire Their Professors, The Information [7]
Thirty Years of Deep Learning: A Road of Innovation, Zhu Long [8]
Eight years of ImageNet: Fei-Fei Li and the AI world she changed, 量子位 [9]
DEEP LEARNING: PREVIOUS AND PRESENT APPLICATIONS, Ramiro Vargas [10]
Review of deep learning: concepts, CNN architectures, challenges, applications, future directions, Laith Alzubaidi et al. [11]
Literature Review of Deep Learning Research Areas, Mutlu Yapıcı et al. [12]
The true hero behind ChatGPT: OpenAI chief scientist Ilya Sutskever and his leap of faith, 新智元 [13]
10 years later, deep learning “revolution” rages on, say AI pioneers Hinton, LeCun and Li, VentureBeat [14]
From not working to neural networking, The Economist [15]
Huge “foundation models” are turbo-charging AI progress, The Economist [16]
2012: A Breakthrough Year for Deep Learning, Bryan House [17]
Deep Learning: The “Magic Wand” of Artificial Intelligence, Anxin Securities [18]
The Development of Deep Learning Algorithms: From Diversity to Unity, Guojin Securities [19]