AI rests on three pillars: computing power, data, and algorithms.
Of the three, the importance of computing power is the most intuitive: it is why NVIDIA’s market capitalization once surpassed Microsoft’s and Apple’s, making it the most valuable company in the world. However, as Scale AI founder Alexandr Wang emphasized in a podcast, data is replacing computing power as the biggest bottleneck to improving AI model performance.
AI’s thirst for data is endless, but accessible internet data is nearly exhausted. Further gains in model performance depend on more high-quality data. Enterprises hold large amounts of valuable data internally, but this unstructured data can only be used for AI training after careful annotation. Data annotation is a labor-intensive task that has long been regarded as the most arduous and least glamorous link in the AI industry chain.
Yet it was precisely by being early to the data annotation space that Scale AI reached a $13.8 billion valuation in its latest funding round this May, outpacing many well-known large-model companies. That achievement dispels the notion that data labeling is mere grunt work.
Much as many decentralized computing projects are challenging NVIDIA, Sapien AI, a crypto AI project that closed a $5 million seed round this April, is attempting to challenge Scale AI. It aims not only to capture the long-tail market through decentralization, but also to build the world’s largest human data annotation network.
Recently, BlockBeats interviewed Trevor Koverko, co-founder and COO of Sapien AI. As a co-founder of several successful projects, including Polymath, Polymesh, and Tokens.com, Trevor accumulated deep entrepreneurial experience before founding Sapien AI. In the interview, he shared the story behind Sapien AI, how it positions itself to compete with Scale AI from a different angle, and how he draws on blockchain games when designing its business mechanisms.
Try the Sapien AI project at: game.sapien.io
Toronto, fertile ground for innovation: a product of the crypto and AI communities
BlockBeats: I saw on your LinkedIn that you used to play for the NHL’s New York Rangers. As a former professional hockey player, how did you transition into the crypto industry?
Trevor: I have tried many different roles in my career. Hockey was my first job. In Canada, hockey is a huge part of our culture; if you don’t play it when you’re young, you’re almost considered an outsider. So it was an important part of growing up for me. I learned a lot from teamwork and high-level competition, and those experiences still shape me today.
When my hockey career ended, I moved into business, and I actually spent some time in Asia. I lived in China, in the northeastern city of Dalian. My sports career and my time in China are two very important parts of who I am.
I grew up in Toronto’s crypto ecosystem. I got involved in the Bitcoin community very early, before Ethereum even launched. We often attended meetups, traded ideas with friends, and met Vitalik, who was then just an editor at Bitcoin Magazine.
Later, when Vitalik released the white paper, the Bitcoin community gradually evolved into the Ethereum community. It was an exhilarating time. I launched my own RWA project, Polymath, in 2017–2018, back when the field didn’t even have a clear category; we called it “security tokens”. It was my first major project in crypto, and we worked on every aspect of it, from fundraising to launching applications on Ethereum.
Eventually we also built our own Layer 1 blockchain, which was an even bigger challenge. Fortunately, we had very smart people like Charles Hoskinson serving as protocol architect. Today that chain has grown into an independent brand called Polymesh, one of the earliest and largest RWA networks at the Layer 1 level. Now that it is fully decentralized, I am just a community member supporting the network from a distance. Adoption has gone very well, and RWA is gradually becoming an exciting ecosystem.
BlockBeats: What prompted your shift of interest from RWA to AI and led to the founding of Sapien AI?
Trevor: After Polymesh’s day-to-day operations were decentralized, I became interested in AI. Toronto has a very strong AI community, and many of the early architectures of modern AI were created by researchers at the University of Toronto, such as the “godfather of deep learning” Geoffrey Hinton and former OpenAI Chief Scientist Ilya Sutskever.
Left: Ilya Sutskever; Right: Geoffrey Hinton
I am personally very interested in using AI, and a group of my smart friends works on machine learning at the University of Waterloo. I gradually became curious about the technology stack, how it operates, how training data is produced, and how humans participate in producing that data. It was a very natural learning process.
I didn’t set out to start a company, but after about six months of diving deep into AI and machine learning, guided by a mentor in the machine learning graduate program at the University of Waterloo, we began to find interesting problem areas and saw an opportunity to solve them. Eventually, we founded Sapien.
BlockBeats: For people unfamiliar with the project, can you introduce Sapien AI’s core mission? Why are data annotation services so important in today’s AI industry?
Trevor: Data labeling is extremely important. It is one of the main reasons mainstream large language models like ChatGPT succeeded: they were among the first to use industrial-scale human annotation to enrich their datasets.
To this day, the importance of data annotation keeps increasing. Competition between these models is fierce, and the best way to improve model performance is to incorporate more expert human annotation into the dataset.
We think of data processing as a supply chain: first comes the raw data, which then needs to be structured and organized. Once structured, the data can be used for training; once trained, the model can be used for inference. In short, it is a process of progressively adding value to data in the context of artificial intelligence.
As in other industries, we are starting to see the AI industry specialize, with distinct verticals emerging and certain companies excelling at specific steps of the process. For me, the most interesting part has always been the second step: structuring and preparing the data for training.
A decentralized Scale AI, targeting the long-tail market
BlockBeats: What makes Sapien AI different from traditional Web2 companies like Scale AI?
Trevor: That’s a great question. We really admire Scale; they are an amazing company, both co-founders are outstanding, and we know one of them. They are one of the largest AI companies in the world by revenue, market capitalization, and usage.
Our difference is that we start from first principles and ask what a modern data labeling stack should look like in 2024. We don’t necessarily pursue the use cases Scale covers; our target is the mid-market and the long tail.
We strive to make it easy for anyone to get human feedback on a dataset, whether you are a mid-market open-source model team, an enterprise model, or an individual doing research on the weekend. If you want to improve model performance and need human feedback on demand, come to us.
You can think of us as a more distributed, or decentralized, version of Scale AI. Our annotators are more widespread: they are not tied to a specific location and can work remotely from anywhere. To some extent this decentralization lets us do better on annotation quality, because diversity isn’t just for its own sake; it also improves the quality of training data.
For example, if you have a group of people with similar backgrounds label data in a single facility, the output is likely to carry cultural bias. So we strive to make the annotator pool as diverse and robust as possible from the start. Being more decentralized also gives us access to higher-quality annotators: if people have to work at a specific site in the Philippines, the talent pool is limited, but by prioritizing remote work we can find annotators anywhere.
I’m not saying Scale hasn’t done these things, but we are thinking about how to serve other parts of the model market. We believe this market will keep growing, and there will be many private and proprietary models that need human feedback.
BlockBeats: How is the data annotation workflow of Sapien AI designed and optimized? What are the key steps to ensure data quality?
Trevor: Our platform operates like a two-sided marketplace; think of it as a decentralized Uber for data labeling. On one side are the demand-side clients, analogous to Uber’s passengers: businesses that need human feedback on their models. For example, they are building a large language model and need humans in the loop to fine-tune it.
They come to us and upload their raw dataset to the network. We quote a price based on several variables of the dataset, such as complexity, modality, and format. For enterprise clients, this process is largely self-service.
On the other side is the supply side, the annotators, our equivalent of Uber drivers. This is currently the industry’s real bottleneck, and we need as many annotators as possible to join the network. Demand is essentially unlimited: just as with Uber there are always people who want a ride, AI models’ appetite for consuming more data never ends.
We are very focused on the supply side and committed to making data annotation easy for anyone. We have invented some new techniques, which we keep improving, to ensure high-quality annotation at scale in a distributed setting. The initial question we asked was: can quality be ensured without centralized management? This leads to what we call the “data annotation trilemma”: can we simultaneously lower costs for clients, raise annotators’ earnings, and improve overall quality?
We have run multiple experiments in this area with some very interesting results. We have tried new mechanisms such as regression to the mean and anomaly detection, mixed with probabilistic models, which can largely infer the quality of an annotator’s work. We are also developing newer techniques. We are very excited about where data annotation is headed over the next five to ten years: we believe it will become more decentralized, more self-service, and more automated.
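As a rough illustration of how annotator quality can be inferred statistically from overlapping work (this is a generic weighted-consensus sketch, not Sapien AI’s actual algorithms; all names here are hypothetical), one can alternate between computing a trust-weighted consensus label per item and re-scoring each annotator by agreement with that consensus:

```python
from collections import defaultdict

def estimate_annotator_quality(labels, iterations=5):
    """labels: {item_id: {annotator_id: label}} with overlapping assignments.

    Alternate between (1) computing a trust-weighted consensus label per
    item and (2) re-scoring each annotator by agreement with the consensus.
    """
    annotators = {a for votes in labels.values() for a in votes}
    quality = {a: 1.0 for a in annotators}  # start with equal trust
    for _ in range(iterations):
        consensus = {}
        for item, votes in labels.items():
            tally = defaultdict(float)
            for annotator, label in votes.items():
                tally[label] += quality[annotator]  # weight vote by trust
            consensus[item] = max(tally, key=tally.get)
        for a in annotators:
            votes_cast = [(item, votes[a]) for item, votes in labels.items() if a in votes]
            agreed = sum(1 for item, label in votes_cast if consensus[item] == label)
            quality[a] = agreed / len(votes_cast)
    return quality
```

An annotator who consistently disagrees with the weighted majority ends up with a low quality score, which also shrinks their voting weight in subsequent rounds; production systems layer anomaly detection and probabilistic models on top of this basic idea.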
BlockBeats: Can you share more detail about the products and technologies that ensure data quality? I know you have a staking mechanism to deter malicious annotators; are there other techniques?
Trevor: Yes, we are trying many different methods. We have a reputation system, as well as staking and slashing mechanisms: annotators stake funds, and if their work fails to meet the standard, part of the stake may be forfeited. These mechanisms are still at an early, experimental stage, but we have found that this incentive alone can significantly improve quality compliance, possibly by multiple standard deviations. The overall quality control is a weighted average of different algorithms, which we are constantly fine-tuning, and we also use machine learning to optimize the process. For example, we use ML linting tools and the “red rabbit” test, which feeds annotators planted data to check whether they annotate honestly.
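To make the stake-and-slash idea concrete, here is a minimal toy model (our own sketch under assumed parameters such as `slash_rate` and `alpha`; not Sapien AI’s actual mechanism) in which a failed quality check both burns stake and lowers reputation:

```python
class AnnotatorAccount:
    """Toy stake-and-slash model: reputation is an exponential moving
    average of quality-check outcomes, and failed checks also burn stake."""

    def __init__(self, stake: float):
        self.stake = stake
        self.reputation = 0.5  # start neutral

    def record_check(self, passed: bool, slash_rate: float = 0.1,
                     alpha: float = 0.2) -> float:
        """Update reputation; on failure, slash stake and return the penalty."""
        self.reputation = (1 - alpha) * self.reputation + alpha * (1.0 if passed else 0.0)
        if passed:
            return 0.0
        penalty = self.stake * slash_rate
        self.stake -= penalty
        return penalty
```

The point of a scheme like this is incentive alignment: once money is at risk, careless or dishonest labeling has a direct cost, so honest work becomes the selfish strategy as well as the honest one.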
A big question is: how do we know whether someone is mounting a Sybil attack (i.e., trying to cheat and manipulate the system) on the network? We have to stay vigilant about this at all times. It is also why we like certain Web3 incentive mechanisms: they were originally invented to address exactly these Sybil-attack and Byzantine Generals problems, with the goal of making compliance with the rules everyone’s best strategy. Even a purely selfish actor is better off following the network’s protocol.
We are still early. For some major clients we apply more traditional quality controls, while also moving quickly toward this new frontier of the data world.
BlockBeats: What do you see as Sapien AI’s biggest advantage as a decentralized data annotation platform?
Trevor: As I said, our platform is more self-service, which lets us serve a wider customer base. Our requirements for annotators are also very broad: we want anyone to be able to annotate, because we believe the next chapter of AI will be about extracting more of humanity’s existing knowledge. It’s not just basics like “this is a stop sign” or “this is a car” that both humans and machines recognize easily; it’s increasingly about reasoning.
Alex Wang at Scale has talked about this: internet data is the result of reasoning, but it does not actually record the process of reasoning. So how do we capture a deeper picture of human thinking? That requires more work and more specialized annotation, and it may help accelerate progress toward artificial general intelligence (AGI).
So our larger mission is: can we unlock more of the knowledge sitting in enterprises’ private datasets and in professionals’ heads? These professionals have specialized knowledge in verticals such as healthcare or law that models have not yet absorbed.
We are also working to make the platform as liquid as possible and to keep supply and demand in balance. We aim for dynamic pricing, just like Uber. These mechanisms make us a true two-sided marketplace, meeting data demand on one side and onboarding annotators on the other. On quality assurance, we apply the techniques I mentioned in real time: we want annotators to get as much real-time feedback as possible, because that creates a better experience for everyone.
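Uber-style dynamic pricing can be sketched in a few lines (a hypothetical illustration with made-up parameters, not Sapien AI’s pricing formula): the per-task payout scales with the ratio of open tasks to active annotators, clamped to a sane range:

```python
def dynamic_task_price(base_price, open_tasks, active_annotators,
                       min_mult=0.8, max_mult=3.0):
    """Scale the per-task payout by the demand/supply ratio,
    clamped between min_mult and max_mult (surge-pricing style)."""
    if active_annotators == 0:
        return base_price * max_mult  # no supply: pay the maximum
    ratio = open_tasks / active_annotators
    return base_price * max(min_mult, min(max_mult, ratio))
```

When tasks pile up relative to available annotators, the payout rises to attract supply; when annotators outnumber tasks, it floors out rather than collapsing to zero.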
Label to Earn, the Future of the Gig Economy
BlockBeats: I noticed Sapien AI has partnered with the gaming guild Yield Guild Games (YGG). Can Sapien AI’s decentralized labeling mechanism be understood as a “label to earn” game?
Trevor: Exactly. We want to reach people who want to make a living from their phones, because we believe that is the future of the gig economy. Unlike driving for Uber, you don’t need a car; unlike food delivery, you don’t need to be physically on site. You just log in on your phone, label data, and earn income.
YGG is an amazing partner; they are one of our angel investors. We have a great relationship with founder Gabby, and they have a fantastic community in Southeast Asia. We have big plans together: we hope to help their users find new ways to earn while they help us gain new users. We recently announced some joint projects, with more in the works, and in Q4 we will meet these partners in Asia to keep driving the collaboration.
BlockBeats: What are your thoughts on ‘play to earn’ blockchain games like ‘Axie Infinity’?
Trevor: It was very innovative, and you could call it a source of inspiration. Although it was just an experiment, I believe it will return in new forms. That is the beauty of startups and decentralized entrepreneurship: creative destruction.
What we are doing does have “play to earn” elements, though we prefer phrases like “label to earn” or “train to earn”. The difference is that we are a real business: real data is being labeled, real customers are paying real money, and a real product comes out the other end. This is not just a closed-loop video game.
Annotating data on Sapien AI is fun, though perhaps not as fun as playing Grand Theft Auto V. We want to strike a good balance between entertainment and utility, so it works whether you have five minutes at a bus stop or five hours at your computer at home. The goal is to make it as easy to engage with as possible.
BlockBeats: Do you have ways to make data annotation more interesting, less like work and more like a game?
Trevor: Yes, we are running many experiments. You can visit game.sapien.io to try the game yourself and annotate real AI data. You become an AI worker, play the game while labeling real data, and earn points. The game is very simple, with an intuitive interface.
game.sapien.io game interface
The data itself is also interesting: you might label some fun images, such as our fashion data. We plan to support more modalities and dataset types, and to keep adding features over time.
Future Blueprint: Building the World’s Largest Human Data Labeling Network
BlockBeats: Besides YGG, which other crypto projects do you plan to work with in the future?
Trevor: We are collaborating with others in the decentralized data space and are in the early stages of establishing a standard, which we plan to release as a public good. We did something similar at Polymath, where we released ERC-1400, now one of the default standards for tokenization on Ethereum.
So we have ideas about creating standards, and we plan to push this forward with teams and industry partners who have helped us in the past. That will make decentralized AI more real and more interoperable, so data can flow more easily between steps, because no single party can do everything.
BlockBeats: Is there a specific release date for the Sapien AI mainnet and mobile app?
Trevor: We don’t have a specific release plan yet. Right now we are focused on product-market fit for our core Web2 product. Growth has been very strong: we now have annotators in 71 countries, and this year our demand-side revenue has nearly doubled every month.
We just want to continue to grow, constantly understand our customers, and continue to provide services to them. Over time, we will maintain an open attitude towards various different strategies and technologies.
BlockBeats: I saw that Base co-founder Rowan Stone has joined Sapien AI as Chief Business Development Officer. Which blockchain will Sapien AI build on? Are there plans to issue a native token?
Trevor: These are deep questions, and I appreciate them. Rowan is great; he co-founded Base with Jesse Pollak, who is a legendary figure. Rowan has extensive experience and is unmatched at building industrial-grade Web3 products; in my view he is second to none. He helped lead the “Onchain Summer” campaign, one of the most successful events I can remember.
He is helping us develop go-to-market strategy in certain areas. But as I said, we are currently very focused on serving existing clients; that is our main priority. We haven’t made any commitments or decisions about choosing a Layer 1 or anything else, though we will keep considering the possibilities.
BlockBeats: What are Sapien AI’s future plans or goals? What milestones do you hope to achieve in the coming years?
Trevor: Our mission is to grow the number of human data annotators worldwide by 100x and make the network easy for anyone to access. We want to build the world’s largest human data annotation network. We believe it will be an extremely valuable asset, so we want to build and control it, but eventually open it up so that anyone can access it without any permission at all.
If we can build the world’s largest human data annotation network, it will unlock a huge amount of potential AI capabilities, because the more high-quality data we have, the more powerful AI becomes, and the more it can be used by everyone.
We want it to serve everyone, not just the large language model companies that can afford millions of human annotators. Anyone will be able to use the network; you can think of it as a “labeling as a service” platform.
Behind Decentralization: An Entrepreneur’s Job Is to Solve Problems
BlockBeats: Finally, I’d like your observations on the industry as a whole. What untapped potential do you see in the crypto AI field today?
Trevor: I’m very excited about this field, which is why we created Sapien AI. There is a good side to it, but also a need for caution.
On the positive side, decentralized AI can become more autonomous, more democratic, more accessible, and more powerful. It means AI agents can transact with their own native currency, and it means you can have more privacy and, through ZK technology, verify exactly what a model contains.
On the defensive side, we face a rather frightening world in which AI grows ever more centralized and only governments and a few big tech companies can access powerful models. Open-source and decentralized AI are a defense against that scenario.
For us, the focus is the data layer: decentralized data. That doesn’t mean you can’t decentralize other parts of the AI stack, such as compute or the algorithms themselves. Just as the Transformer was a landmark algorithmic innovation, we have seen more innovations since, and there is always room for improvement.
Just because you can decentralize something doesn’t mean you should; there has to be real value at the end. But like other parts of finance and Web3, AI can definitely benefit from decentralization.
BlockBeats: What advice would you most like to give entrepreneurs entering the crypto AI field?
Trevor: My advice is to learn as much as possible and truly understand the technology stack and architecture. You don’t need a PhD in machine learning, but it’s important to understand how it works and to do your research. From there, over time, you will come to understand the problems more organically. That is the key.
If you don’t understand how something works, you can’t see where the problems are. And if you don’t know where the problems are, you shouldn’t be an entrepreneur, because an entrepreneur’s job is to solve problems.
In that sense it is no different from any other startup: you should understand the field. You don’t have to be the world’s top expert, but you need to know enough to understand the problems and then try to solve them.
Exclusive Interview with Sapien AI Co-founder: Label to Earn is the Future of the Gig Economy in the AI Era
AI has three pillars: Computing Power, data, and Algorithm.
Among these three, the importance of Computing Power is the most intuitive, so NVIDIA’s Market Cap once surpassed Microsoft and Apple, becoming the most valuable company in the world. However, as Scale AI founder ALEX Wang emphasized in a podcast, data is replacing Computing Power as the biggest bottleneck for improving AI model performance.
The AI’s thirst for data is endless, but the accessible internet data resources are nearly depleted. To further enhance model performance, it is necessary to rely on more high-quality data. Although enterprises have a large amount of valuable data internally, these unstructured data can only be truly used for AI training after fine annotation. Data annotation is a resource-intensive task that has long been regarded as the most arduous and humble part of the AI industry on-chain.
However, it was with its strategy to be the first to enter the data annotation space that Scale AI was valued at $13.8 billion in its latest funding round in May this year, outpacing many well-known large model companies. This achievement undoubtedly breaks the bias that data labeling is just hard work.
Just like many Decentralization Computing Power projects challenging NVIDIA, in April this year, Sapien AI, an encryption AI project that just completed a $5 million seed round, also attempted to challenge Scale AI. It not only aims to enter the long-tail market through Decentralization, but also plans to build the world’s largest human data annotation network.
Recently, BlockBeats interviewed Trevor Koverko, co-founder and COO of Sapien AI. As the co-founder of several successful projects such as Polymath, Polymesh, and Tokens.com, Trevor had accumulated rich entrepreneurial experience before founding Sapien AI. In the interview, he shared in-depth his journey of founding Sapien AI, as well as his insights on the strategy of how Sapien AI competes dislocatively with Scale AI, and how to draw inspiration from blockchain games to design business mechanisms.
Sapien AI project experience URL: game.sapien.io
Innovative soil Toronto, encryption and AI community’s creative crystallization
BlockBeats: I saw from your LinkedIn that you used to play for the NHL New York Rangers. As a former professional ice hockey player, how did you transition into the encryption industry?
Trevor: In my career, I have tried many different roles. Ice hockey was my first job. In Canada, ice hockey is a very important part of our culture. If you don’t play ice hockey when you are young, you will almost be considered an outsider. So, this is an important part of my growing up. I learned a lot from teamwork and high-level competition, and these experiences still influence me today.
When my ice hockey career ended, I started to work in business, and in fact, I spent some time in Asia. I lived in China, specifically the city of Dalian in the northeast. My sports career and experience in China are two very important parts of shaping my growth.
I grew up in the encryption ecosystem in Toronto. I got involved in the BTC community very early on, before Ethereum was even launched. We often attended parties, exchanged ideas with friends, and met Vitalik, who was just an editor at Bitcoin Magazine at the time.
Later, when Vitalik released the White Paper, the BTC community gradually evolved into the ETH community. It was a passionate and burning time. I launched my own RWA project Polymath in 2017-2018, at a time when this field did not even have a clear classification, we called it “security token”. This was my first major project in the encryption field. We worked on all aspects of this project, from fundraising to launching applications on the ETH community.
Finally, we also established our own Layer 1 blockchain, which was a bigger challenge. Fortunately, we had very smart people like Charles Hoskinson serving as the protocol architect. Today, this blockchain has developed into an independent brand called Polymesh. It is one of the earliest and largest RWA networks, and it is at the Layer 1 level. Now, I am just a community member because it has been fully decentralized, and I am only supporting this network from a distance. In terms of adoption, it has performed very well, and now RWA is gradually becoming an exciting ecosystem.
BlockBeats: What prompted your shift of interest from RWA to AI and led to the founding of Sapien AI?
**Trevor:**After Polymesh’s daily operation of Decentralization, I became interested in AI. Toronto has a very strong AI technology community, and many early architectures of modern AI were created by researchers at the University of Toronto, such as the ‘Father of Depth Learning’ Geoffrey Hinton and former Chief Scientist of OpenAI Ilya Sutskever.
I am personally very interested in using AI, and there is also a group of smart friends working on machine learning at the University of Waterloo. I have gradually become interested in the technology stack, operation mode, production process of training data, and how humans participate in the production of this training data. It is a very natural learning process.
I didn’t have the ambition to start a company at first, but after about 6 months of deep diving into AI and machine learning, under the guidance of a mentor in the machine learning graduate program at the University of Waterloo, we began to discover some interesting areas with existing problems and saw the opportunity to solve them. Eventually, we founded the company Sapien.
**BlockBeats: Can you introduce the core mission of Sapien AI to people who don’t understand the project? Where does the importance of data annotation services in the current AI industry lie?
**Trevor:**Data labeling is extremely important. This is also one of the main reasons for the success of mainstream large language models like ChatGPT, because they are among the first to use industrial-scale human data annotators to enrich the model’s datasets.
To this day, the importance of data annotation continues to increase, as there is fierce competition between these models, and the best way to improve model performance is to incorporate more professional human data annotation into the dataset.
We consider data processing as a Supply Chain: first the raw data, then it needs to be structured and organized. Once structured, the data can be trained. After training, it can be used for inference. In short, this is a process of gradually adding value to data in the context of artificial intelligence.
Just like other industries, we are starting to see the segmentation of the AI industry, with different verticals emerging and certain companies excelling in specific steps of the process. For me, the most interesting part is the second step, which involves structuring and preparing the data for training. This has always been the part that I am most interested in.
Scale AI of Decentralization, targeting the long-tail market
**BlockBeats: What makes Sapien AI different from traditional Web2 companies like Scale AI?
**Trevor:**This is a great question. We really appreciate Scale, they are an amazing company, and both co-founders are outstanding. We know one of them. They are one of the largest AI companies in the world, in terms of revenue, market capitalization, and usage.
Our difference is that we start from first principles and think about what a modern data labeling technology stack should look like in 2024. We do not necessarily pursue the use cases covered by Scale, our goal is the mid-market and long-tail market.
We strive to make it easy for anyone to get manual feedback on datasets, whether you are an Open Source model in the mid-market, an enterprise-level model, or just an individual researching on the weekend. If you want to improve model performance and need on-demand manual feedback, come to us.
You can think of us as a more distributed or Decentralization version of Scale AI. This means that our annotators are more widespread, not limited to a specific location, but can work remotely anywhere. To some extent, this decentralization can enable us to do better in data annotation quality, because diversity is not just for diversity’s sake, but also for improving the quality of data training.
For example, if you ask a group of people with similar backgrounds to label data in a single facility, you are likely to get biased or culturally skewed outputs. So we strive to make the pool as diverse and robust as possible from the start. Greater decentralization also gives us access to higher-quality annotators: if people have to work at a specific site in the Philippines, the talent pool you can attract is limited, but by prioritizing remote work we can find annotators anywhere.
I’m not saying Scale hasn’t done these things, but we’re thinking about how to serve other parts of the model market, because we believe this market will keep growing and there will be many private and licensed models that need human feedback.
**BlockBeats:** How is Sapien AI’s data annotation workflow designed and optimized? What are the key steps to ensure data quality?
**Trevor:** Our platform works like a two-sided marketplace; you can think of it as a decentralized Uber for data labeling. On one side are the demand-side clients, the equivalent of Uber’s passengers: businesses that need human feedback on their models. For example, they are building a large language model and need humans in the loop to fine-tune it.
They come to us and upload the raw dataset to the network. We quote a price based on several variables of the dataset, such as its complexity, modality, and format. For enterprise clients, this process is largely self-service.
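As a rough illustration of how a quote might be derived from the variables Trevor mentions (complexity, modality, format), here is a minimal sketch. All rates, multipliers, and function names are invented for illustration; the interview does not disclose Sapien AI's actual pricing model.

```python
# Hypothetical per-item rates by data modality, in USD. Invented numbers.
MODALITY_RATES = {"text": 0.02, "image": 0.05, "audio": 0.08, "video": 0.20}

def quote(num_items: int, modality: str, complexity: float, structured: bool = True) -> float:
    """Estimate a labeling price for a dataset.

    complexity: multiplier >= 1.0 (e.g. 1.0 for simple bounding boxes,
    3.0 for expert domain review). Unstructured formats add a parsing
    overhead before annotators can work on them.
    """
    base = MODALITY_RATES[modality] * num_items
    format_overhead = 1.0 if structured else 1.25
    return round(base * complexity * format_overhead, 2)

print(quote(10_000, "image", complexity=2.0))                    # 1000.0
print(quote(10_000, "image", complexity=2.0, structured=False))  # 1250.0
```

The point of the sketch is only that a quote can be computed mechanically from a few dataset attributes, which is what makes the self-service flow possible.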
On the other side is the supply side, the annotators, the equivalent of our Uber drivers. Right now this is actually the industry’s bottleneck: we need as many annotators as possible to join the network, because demand is effectively unlimited. Just as with Uber there are always people who want a ride, in AI these models’ appetite for data never ends.
We are very focused on the supply side, committed to making data annotation easy for anyone. We have invented some new techniques, and keep improving them, to ensure high-quality annotation at scale in a distributed setting. The initial question we posed was: can high-quality annotation be ensured without centralized management? This is what we call the ‘data annotation trilemma’: can we lower costs for clients, raise income for annotators, and improve overall quality, all at the same time?
We have run multiple experiments in this area and seen some very interesting results. We have tried new mechanisms such as regression to the mean, anomaly detection, and mixtures of probabilistic models, which can largely infer the quality of an annotator’s work. We are also developing newer techniques. We are very excited about where data annotation is headed over the next five to ten years; we believe it will become more decentralized, more self-service, and more automated.
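One common way to infer annotator quality without a central reviewer, in the spirit of the anomaly-detection approach described above, is to score each annotator by agreement with the consensus label and flag statistical outliers. This is a generic sketch of that idea, not Sapien AI's actual algorithm; all function names are illustrative.

```python
from collections import Counter
from statistics import mean, stdev

def consensus_labels(task_labels):
    """Majority vote per task: {task_id: {annotator: label}} -> {task_id: label}."""
    return {t: Counter(a.values()).most_common(1)[0][0] for t, a in task_labels.items()}

def quality_scores(task_labels):
    """Fraction of tasks where each annotator matched the consensus label."""
    consensus = consensus_labels(task_labels)
    hits, seen = Counter(), Counter()
    for t, answers in task_labels.items():
        for annot, label in answers.items():
            seen[annot] += 1
            hits[annot] += (label == consensus[t])
    return {a: hits[a] / seen[a] for a in seen}

def outliers(scores, z_cut=-2.0):
    """Flag annotators whose agreement rate is far below the pool mean."""
    vals = list(scores.values())
    mu, sd = mean(vals), stdev(vals)
    return [a for a, s in scores.items() if sd and (s - mu) / sd < z_cut]
```

With three annotators and three tasks, an annotator who disagrees with consensus on a third of tasks already stands out from peers who always agree, which is why this scales well as the pool grows.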
**BlockBeats:** Can you share more detail about your products and technologies, especially those that ensure data quality? I know you have a staking mechanism to deter malicious annotators; are there other techniques?
**Trevor:** Yes, we are trying many different approaches. We have a reputation system, as well as staking and penalty mechanisms: after staking a certain amount of funds, an annotator who fails to meet the standard can be fined. These mechanisms are still at an early experimental stage, but we have found that this incentive alone can significantly improve quality compliance, possibly by multiple standard deviations. Overall quality control is a weighted average of different algorithms, which we are constantly fine-tuning, and we also use machine learning to optimize the process. For example, we use an ML linter tool and the ‘Red Rabbit’ test, which feeds known-false data to annotators to test whether they are labeling honestly.
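The staking, slashing, and honeypot mechanics Trevor describes can be sketched as follows. Everything here is hypothetical: the class, the slash fraction, the reputation decay, and the pass threshold are invented parameters, since the interview gives no concrete numbers for Sapien AI's implementation.

```python
class Annotator:
    """Minimal model of a staked annotator with a reputation score."""

    def __init__(self, stake: float):
        self.stake = stake
        self.reputation = 1.0

    def slash(self, fraction: float) -> float:
        """Penalize a failed quality check by burning part of the stake."""
        penalty = self.stake * fraction
        self.stake -= penalty
        self.reputation *= 0.9  # hypothetical reputation decay on failure
        return penalty

def honeypot_check(answers, honeypots, annot, pass_rate=0.8, slash_fraction=0.1):
    """Grade an annotator against tasks with known ground truth ('Red Rabbit'
    style), slashing their stake if accuracy falls below the threshold."""
    graded = [answers[t] == truth for t, truth in honeypots.items() if t in answers]
    accuracy = sum(graded) / len(graded)
    if accuracy < pass_rate:
        annot.slash(slash_fraction)
    return accuracy
```

The design intent is the one Trevor names: make honest labeling the selfish-rational strategy, because a dishonest annotator loses staked funds faster than they earn.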
This is a big question: how do we know whether people are mounting a Sybil attack on the network, that is, trying to cheat and manipulate the system? We have to stay vigilant about this at all times. It is also why we like certain Web3 incentive mechanisms: they were originally invented to address exactly these Sybil attack problems and the Byzantine Generals Problem, with the aim of making rule-following in everyone’s best interest. Even a purely selfish participant is best served by following the network protocol.
We are still early. For some major clients we apply more traditional quality-control methods, while also moving quickly toward this new frontier of the data world.
**BlockBeats:** What do you think is the biggest advantage of Sapien AI as a decentralized data annotation platform?
**Trevor:** As I said, our platform is more self-service, which lets us serve a wider customer base. Our requirements for annotators are also very broad: we want anyone to be able to become an annotator, because we believe the next chapter of AI will be about extracting more of the knowledge humans already have. It’s not just basics like “this is a stop sign” or “this is a car,” which both humans and machines can easily recognize; it’s about reasoning.
Alex Wang from Scale has talked about this: internet data is the result of reasoning, but it does not actually describe the reasoning process. So how can we gain a deeper understanding of human thinking? That requires more work and more specialized annotation, and it may help accelerate the development of artificial general intelligence (AGI).
So our larger mission is: can we unlock more of the knowledge locked inside enterprises’ private datasets and in the minds of professionals? These professionals have domain expertise in verticals such as healthcare or law that models have yet to grasp.
We are still working to make our platform as liquid as possible and to keep supply and demand in balance. We aim for dynamic pricing, just like Uber. These mechanisms make us more like a true two-sided market, meeting data needs on one side and helping annotators join on the other. These are some of the distinctive ways we are building the platform. For quality assurance, we apply the techniques I mentioned earlier in real time: we want annotators to get as much real-time feedback as possible, because that creates a better experience for everyone.
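The Uber-style dynamic pricing mentioned above can be sketched in a few lines: when open demand outstrips available annotator capacity, the per-item rate surges to pull more supply online. The function, its coefficients, and the surge cap are all invented for illustration; the interview only states the goal, not the mechanism.

```python
def dynamic_rate(base_rate: float, open_tasks: int, active_annotators: int,
                 capacity_per_annotator: int = 50, max_surge: float = 3.0) -> float:
    """Scale the base per-item rate by the demand/supply ratio, capped at max_surge.

    capacity_per_annotator is a hypothetical estimate of how many tasks
    one active annotator can absorb in the pricing window.
    """
    capacity = max(active_annotators * capacity_per_annotator, 1)
    surge = min(max(open_tasks / capacity, 1.0), max_surge)  # never below base rate
    return round(base_rate * surge, 4)

print(dynamic_rate(0.05, open_tasks=10_000, active_annotators=100))   # 2x surge: 0.1
print(dynamic_rate(0.05, open_tasks=1_000, active_annotators=100))    # slack: 0.05
```

This is the textbook balancing move for a two-sided market: price rises when supply is the bottleneck, which, as Trevor notes, is exactly where the annotation industry sits today.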
Label to Earn: The Future of the Gig Economy
**BlockBeats:** I noticed that Sapien AI has partnered with the gaming guild Yield Guild Games (YGG). Can Sapien AI’s decentralized labeling mechanism be understood as a ‘label to earn’ game?
**Trevor:** Exactly. We want to reach the world of people who want to make a living through their phones, because we believe this is the future of the gig economy. Unlike driving for Uber, you don’t need a car; unlike food delivery, you don’t need to be at a physical location. You just log in on your phone, do data annotation, and earn income.
YGG is an amazing partner and one of our angel investors. We have a great relationship with founder Gabby, and they have an amazing community in Southeast Asia. We have big plans with them: helping their users find new ways to earn while they help us gain new users. We recently announced some joint projects and have more in the works. In Q4 we will also meet these partners in Asia and keep pushing the collaboration forward.
**BlockBeats:** What are your thoughts on ‘play to earn’ blockchain games like Axie Infinity?
**Trevor:** It was very innovative and, frankly, a source of inspiration. Although it was just an experiment, I believe it will return in new forms. That is the beauty of startups and decentralized entrepreneurship: it is creative destruction at work.
What we are doing does have some ‘play to earn’ elements, though we prefer phrases like ‘label to earn’ or ‘train to earn’. There is still a difference, because we are a real business: real data is being labeled, real customers are paying real money, and in the end a real product is produced. So this is not just a video game running in an endless loop.
Although annotating data on Sapien AI is interesting, it may never be as fun as playing Grand Theft Auto V. We hope to strike a good balance between entertainment and utility, making it something you can do for 5 minutes while waiting at a bus stop or for 5 hours at your computer at home. Our goal is to make it as easy as possible to engage with.
**BlockBeats:** Do you have a way to make data annotation more interesting, not just work, but more like a game?
**Trevor:** Yes, we are running a lot of experiments. You can visit game.sapien.io to try the game yourself and annotate real AI data: you become an AI worker, play the game while labeling, and earn points. The game is very simple, with an intuitive interface.
The data itself is also interesting: you may get to label some very fun images, such as our fashion data. We plan to support many types of modalities and datasets, and to keep adding features over time.
Future Blueprint: Building the World’s Largest Human Data Labeling Network
**BlockBeats:** Besides YGG, which other crypto projects do you plan to partner with in the future?
**Trevor:** We are collaborating with others in the decentralized data space and are in the early stages of establishing a standard there, with plans to release it as a public good. We did something similar in our Polymath days, where we released ERC-1400, which has become one of the default standards for tokenization on Ethereum.
So we have some ideas about creating standards and plan to push this forward with teams and industry partners who have helped us in the past. This will make decentralized AI more real and more interoperable, meaning data can flow more easily between the different steps, since no single player can do everything.
**BlockBeats:** What is the specific release date for the Sapien AI mainnet and the mobile app?
**Trevor:** We don’t have a specific release plan yet. Right now we are focused on product-market fit for our core Web2 product. Growth has been strong: we now have annotators from 71 countries, and this year our demand-side revenue has nearly doubled every month.
We just want to keep growing, keep learning about our customers, and keep serving them. Over time we will stay open to different strategies and technologies.
**BlockBeats:** I saw that Base co-founder Rowan Stone has joined Sapien AI as Chief Business Development Officer. Which public blockchain will Sapien AI build on? Are there plans to issue a native token?
**Trevor:** These are very thoughtful questions, and I appreciate them. Rowan is great; he co-founded Base with Jesse Pollak, who is a legendary figure. Rowan has extensive experience and is unparalleled at building industrial-grade Web3 products; in my opinion he is second to none. He helped lead the ‘Onchain Summer’ event, one of the most successful campaigns I can remember.
He is helping us develop go-to-market strategies in certain areas. But as I just said, we are currently very focused on serving existing clients; that is our main priority. We have not made any commitments or decisions about choosing a Layer 1 or anything else, but we will keep considering the possibilities.
**BlockBeats:** What are Sapien AI’s future plans or goals? What milestones do you hope to reach in the coming years?
**Trevor:** Our mission is to grow the number of human data annotators worldwide by 100x and make this network easy for anyone to access. We want to build the world’s largest human data annotation network. We believe it will be a very valuable asset, so we want to build and control it, but eventually open it up so that anyone can access it entirely permissionlessly.
If we can build the world’s largest human data annotation network, it will unlock a huge amount of potential AI capabilities, because the more high-quality data we have, the more powerful AI becomes, and the more it can be used by everyone.
We want it to serve everyone, not just the large language model companies that can afford millions of human annotators. Anyone can use this network; you can think of it as a “labeling as a service” platform.
Behind Decentralization: The Entrepreneur’s Task is to Solve Problems
**BlockBeats:** Finally, I’d like to ask for your observations on the industry as a whole. What untapped potential do you see in the crypto AI field today?
**Trevor:** I’m very excited about this field; it’s why we created Sapien AI. There is a hopeful side to it, but also a need for caution.
On the positive side, decentralized AI can become more autonomous, more democratic, more accessible, and more powerful. It means AI agents can transact with their own native currency, and it means you can have more privacy and can verify exactly what a model contains through ZK technology.
On the defensive side, we face a rather frightening world in which AI is becoming ever more centralized, with only governments and a few large technology companies able to access powerful models. Open-source and decentralized AI are a defense against that scenario.
For us, the focus is the data layer: decentralized data. That doesn’t mean you can’t decentralize other parts of the AI stack, such as compute and the algorithms themselves. The Transformer was the first big innovation on the algorithm side, and we have seen more since, but there is always room for improvement.
Just because you can decentralize something doesn’t mean you should; there must be real value at the end of it. But like other parts of finance and the Web3 space, AI can definitely benefit from decentralization.
**BlockBeats:** What advice would you most like to give entrepreneurs who want to enter the crypto AI field?
**Trevor:** I suggest learning as much as possible and truly understanding the technology stack and architecture. You don’t have to have a PhD in machine learning, but it’s important to understand how it works and to do the research. From there, over time, you will come to understand the problems more organically. That is key.
If you don’t understand how it works, you can’t see where the problems are. And if you don’t know where the problems are, you shouldn’t be an entrepreneur, because an entrepreneur’s job is to solve problems.
So this is no different from any other startup: you should understand the field. You don’t have to be the world’s top expert, but you need to know enough to recognize the problems and then try to solve them.