Exploring synergies between Crypto and RLHF

Reinforcement Learning from Human Feedback (RLHF) is an exciting frontier in Artificial Intelligence (AI) where human expertise meets Machine Learning algorithms.

cpt n3mo
Hashed Team Blog

Combining cryptocurrencies with RLHF adds another layer to this paradigm shift: token incentives can reward the people who supply feedback, and in doing so enhance the effectiveness of learning systems.

In this article, we explore how token incentives can drive participation and expert engagement, and how they may contribute to the emergence of decentralised labour marketplaces.

1. Driving Participation and Data Collection

One of the key challenges in RLHF is acquiring a large dataset of human feedback and data.

This is where token incentives shine, acting as catalysts for user participation in the data gathering process. They can motivate a broader user base to contribute actively, which in turn accelerates the development of the underlying models that the RLHF data feeds.

Hivemapper

Take the case of Hivemapper, a company that rewards participants in the map-making and editing process. Hivemapper’s core business is to generate accurate, detailed maps that it can sell to businesses. For the end product to be credible, it needs a vast amount of image and video data of the world.

  • Hivemapper manufactures a purpose-built dashcam that drivers can purchase and install in their vehicles.
  • While driving, the dashcams capture footage of the surrounding routes and landscapes, which feeds into Hivemapper’s database of map data.
  • Drivers earn passive income in the form of $HONEY tokens.
  • As they go about their daily commutes, they contribute to the mapping database and earn tokens for those contributions.

This is an innovative approach to motivate users to actively engage and participate in the data collection process. As a result, the platform can continuously gather a vast amount of mapping data from thousands of users, leading to more accurate and comprehensive maps.
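To make the reward mechanics concrete, here is a minimal sketch of one way a platform could split a periodic token pool among contributors in proportion to their accepted data. This is an illustrative assumption and not Hivemapper’s actual reward formula; the function name `split_reward_pool`, the pool size and the driver figures are all hypothetical.

```python
from decimal import Decimal


def split_reward_pool(pool, contributions):
    """Split a token pool pro rata by each contributor's accepted units.

    `pool` is the number of tokens to distribute (hypothetical figure).
    `contributions` maps a contributor name to accepted units of data.
    """
    total = sum(contributions.values())
    if total == 0:
        # Nothing was contributed, so nobody is paid.
        return {name: Decimal("0") for name in contributions}
    pool = Decimal(pool)
    return {name: pool * units / total for name, units in contributions.items()}


# Hypothetical week: kilometres of accepted footage per driver.
weekly_km = {"driver_a": 120, "driver_b": 60, "driver_c": 20}
payouts = split_reward_pool("1000", weekly_km)

for name, amount in payouts.items():
    print(f"{name}: {amount} tokens")
```

A pro-rata split is the simplest possible design. A real system would likely also weight contributions by freshness, coverage or quality, which this sketch deliberately leaves out.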

[Image source: Hivemapper website]

Wifi Map

Another example is Wifi Map, a crowdsourced platform that maps Wi-Fi hotspots around the world. Users can earn $WIFI tokens by contributing information about Wi-Fi networks in their vicinity, such as verifying the network’s name and password, adding hotspots, and testing internet speeds.

This incentive system encourages a large community of users to actively share data, ultimately creating a comprehensive database of Wi-Fi information that benefits millions of people seeking reliable internet connectivity.

[Image source: Wifi Map website]

By utilising token incentives, these platforms create a symbiotic relationship where users are rewarded for their valuable contributions, while the RLHF algorithms benefit from diverse and extensive datasets.

However, these companies have a long way to go before their tokens are well integrated with the business. At present, the native tokens serve mainly as farm-and-dump incentives: drivers and Wi-Fi contributors mine them with the end intention of monetising and cashing out. Both tokens have seen only downward price action since their token generation events (TGE).

[Image source: CoinGecko]

The next iteration of RLHF-and-Crypto companies will need stronger, more synergistic relationships between all stakeholders of the business, users and tokenholders alike.

2. Aligning Incentives for Expert Engagement

While token incentives can be a powerful tool to drive engagement and participation in RLHF, they may not be universally applicable in all domains.

In expert fields such as Engineering, Finance, Medicine, Law and Academic Research, contributors are qualified professionals with a high opportunity cost of time, and they face significant reputational risk if they are found to have provided inaccurate or misleading information to a dataset. As such, the monetary value of token incentives alone may not adequately align incentives for expert engagement.

Reputation-based system

To address these challenges, alternative approaches that prioritise reputation-based rewards should be explored.

Creating a public record that is signed and verified by the contributors’ real-world identities can enhance the accuracy and credibility of their contributions.
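As a rough illustration of what a signed, tamper-evident public record could look like, the sketch below chains each contribution to the previous one by hash and attaches a signature from the contributor. It is a simplification under stated assumptions: HMAC with a per-contributor secret stands in for the public-key signatures a real public ledger would use, and the contributor names, keys and helper functions (`sign_entry`, `append_entry`, `verify_ledger`) are hypothetical.

```python
import hashlib
import hmac
import json

GENESIS_HASH = "0" * 64  # placeholder hash for the first record


def sign_entry(secret_key, entry):
    """Sign a record with HMAC-SHA256 (a stand-in for a real digital signature)."""
    payload = json.dumps(entry, sort_keys=True).encode()
    return hmac.new(secret_key, payload, hashlib.sha256).hexdigest()


def append_entry(ledger, secret_key, contributor, contribution):
    """Append a signed record that is linked to the previous record's hash."""
    prev_hash = ledger[-1]["entry_hash"] if ledger else GENESIS_HASH
    entry = {"contributor": contributor, "contribution": contribution,
             "prev_hash": prev_hash}
    signature = sign_entry(secret_key, entry)
    entry_hash = hashlib.sha256((prev_hash + signature).encode()).hexdigest()
    ledger.append({**entry, "signature": signature, "entry_hash": entry_hash})


def verify_ledger(ledger, keys):
    """Recompute every signature and hash link; any edit breaks verification."""
    prev_hash = GENESIS_HASH
    for record in ledger:
        entry = {k: record[k] for k in ("contributor", "contribution", "prev_hash")}
        expected_sig = sign_entry(keys[record["contributor"]], entry)
        expected_hash = hashlib.sha256((prev_hash + expected_sig).encode()).hexdigest()
        if record["prev_hash"] != prev_hash:
            return False
        if not hmac.compare_digest(record["signature"], expected_sig):
            return False
        if record["entry_hash"] != expected_hash:
            return False
        prev_hash = record["entry_hash"]
    return True


# Hypothetical contributors and demo keys.
keys = {"expert_a": b"demo-key-a", "expert_b": b"demo-key-b"}
ledger = []
append_entry(ledger, keys["expert_a"], "expert_a", "Reviewed 40 model answers")
append_entry(ledger, keys["expert_b"], "expert_b", "Flagged 3 incorrect answers")
ledger_ok = verify_ledger(ledger, keys)

# Altering a past contribution should be detected.
tampered = [dict(record) for record in ledger]
tampered[0]["contribution"] = "Reviewed 4000 model answers"
tamper_detected = not verify_ledger(tampered, keys)
```

The point of the design is that a contribution, once recorded, cannot be quietly inflated or reassigned, which is what would make the record usable as evidence of expertise.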

Professionals may be incentivised to participate in this public record — akin to a social rating score within the RLHF domain — because it serves as a testament to their expertise and successful contributions within their niche community.

Initially, participation rates may be low. But as the public ledger becomes more transparent, and is recognised and endorsed by professional bodies, a positive incentive loop may emerge: individuals participate to score higher, which improves their chances of “selection or citation” by the counterparties that value the reputation-based system.

Token incentives cannot be the primary motivator for experts, but alongside an established and verifiable reputation system they may play a valuable role in enhancing engagement.

The system is envisioned to be centrally managed by each domain’s expert organisation (e.g. Ministry of Health, Ministry of Finance). These organisations may design utility tokens that give contributors access to exclusive datasets, advanced research tools or instruments that would otherwise be unavailable to anyone who does not hold the token.

This creates a competitive cycle of lower-ranked professionals acquiring the token to get access to tools that improve their contributions to the community — which improves their score, and improves their reputation. In turn, this drives progress within the respective RLHF domain.

By combining these strategies of reputation-based rewards and utility token designs, RLHF platforms can create an environment that appeals to the competitiveness of experts in their respective domains.

3. Decentralised Labour Marketplaces for RLHF

In segments 1 and 2, we described how token incentives can drive user participation and data collection in RLHF, and how reputation-based rewards can align incentives for expert engagement.

In this last segment, we explore how the earlier two segments culminate in an early iteration of a decentralised labour marketplace for RLHF, consisting of two sides:

  • Businesses (such as crypto protocols) form the demand side;
  • Individuals (developers, researchers, engineers) form the supply side.

Consider a new DeFi lending protocol that needs researchers to extract the historical borrowing and repayment behaviour of every wallet that has interacted with a money market. The data would be used to optimise the protocol’s RLHF algorithm so that it can more easily target and acquire creditworthy borrowers for its platform.

Job listing process

  • As the task is labour-intensive, with high time costs and expertise requirements, the protocol sets aside a budget of native tokens as a reward for the job.
  • It refers to the public record system and specifies the minimum reputation rank required to qualify for the job (as elaborated in segment 2).

Job bidding process

  • Individuals that satisfy the minimum ranking criteria are eligible for the job.
  • They are motivated to participate due to the potential rewards and reputation benefits.
  • Interested participants enter an auction and submit their bids.
  • The lowest bidder, who offers the most competitive price for the task, wins the opportunity to execute it.
  • This auction mechanism gives the protocol the most cost-effective option, while the minimum-rank requirement maintains the desired level of quality.
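The listing and bidding steps above can be sketched as a simple reverse auction with a reputation filter. This is a minimal illustration rather than a specification: the `Bid` type, the convention that rank 1 is the best rank, the tie-break in favour of the better-ranked bidder and all example figures are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Bid:
    bidder: str
    reputation_rank: int  # assumption: 1 is the best rank, larger is worse
    price_tokens: float   # tokens the bidder asks for to do the job


def select_winner(bids: List[Bid], min_rank: int, budget_tokens: float) -> Optional[Bid]:
    """Pick the lowest-priced bid among bidders who meet the rank requirement.

    Bids above the job budget are ignored. Ties on price go to the
    better-ranked bidder (an assumed tie-break rule).
    """
    eligible = [
        bid for bid in bids
        if bid.reputation_rank <= min_rank and bid.price_tokens <= budget_tokens
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda bid: (bid.price_tokens, bid.reputation_rank))


# Hypothetical job: budget of 1,000 tokens, open to the top 100 ranks.
bids = [
    Bid("alice", reputation_rank=12, price_tokens=900.0),
    Bid("bob", reputation_rank=45, price_tokens=700.0),
    Bid("carol", reputation_rank=150, price_tokens=400.0),  # rank too low
    Bid("dave", reputation_rank=30, price_tokens=700.0),
]
winner = select_winner(bids, min_rank=100, budget_tokens=1000.0)
print(winner)
```

In this example the cheapest bid overall is excluded because its bidder does not meet the rank requirement, which shows how the reputation filter and the price competition work together.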

Keep3r Network

This decentralised labour marketplace concept has been explored furthest by Keep3r Network, a task-based network that connects dApp teams with blockchain developers (“Keepers”). Teams post jobs in the form of smart contracts, which Keepers can execute if they find the rewards commensurate.

However, this struggled to scale, as there was no way to ensure that tasks would be done in good faith, nor to penalise Keepers or claw back rewards in the event of poor performance. The identity of the job executor was also unverifiable.

Dune Analytics

The successful elements of that model should be coupled with a reputation-based system similar to Dune Analytics’ Wizard leaderboard. Dune Analytics is a community-based open-source data provider that enables anyone to publish and share crypto dashboards. Contributors are known as “Wizards”, and are ranked daily on a leaderboard based on the quality and usefulness of their contributions.

The more accurate and insightful the data they provide, the more “stars” they get from Dune users or fellow Wizards (i.e. a peer-ranking system), and the higher their rank on the leaderboard.
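A peer-ranking leaderboard of this kind can be sketched in a few lines. The rules below (each user may star a contributor once, self-stars are ignored, ties are broken alphabetically) are assumptions chosen for the illustration and are not Dune’s actual ranking logic; the user and contributor names are hypothetical.

```python
from collections import Counter


def build_leaderboard(star_events):
    """Rank contributors by the number of distinct users who starred them.

    `star_events` is an iterable of (giver, recipient) pairs.
    Self-stars are dropped and repeat stars from the same giver count once.
    Returns a list of (rank, contributor, stars), best first.
    """
    unique_stars = {(giver, recipient) for giver, recipient in star_events
                    if giver != recipient}
    star_counts = Counter(recipient for _, recipient in unique_stars)
    ordered = sorted(star_counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, name, stars) for rank, (name, stars) in enumerate(ordered, start=1)]


# Hypothetical star events.
events = [
    ("user_1", "wizard_a"),
    ("user_2", "wizard_a"),
    ("wizard_a", "wizard_a"),  # self-star, ignored
    ("user_1", "wizard_b"),
    ("user_1", "wizard_b"),    # duplicate, counted once
    ("user_3", "wizard_c"),
    ("user_2", "wizard_c"),
    ("wizard_a", "wizard_c"),  # peer star from a fellow contributor
]
leaderboard = build_leaderboard(events)

for rank, name, stars in leaderboard:
    print(rank, name, stars)
```

Counting distinct givers rather than raw events is a small guard against one account inflating a ranking, although a production system would need much stronger protection against coordinated or fake accounts.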

[Image source: Dune Analytics]

Their identities are also verifiable through Twitter, Telegram and Teams (institutions that hold Premium Dune accounts). Successful reputation-based systems gain mindshare beyond the niche community they are in, even reaching mainstream media.

A reputation-based system that manages to get significant recognition from its niche domain (crypto analytics), and also coverage from alternative outlets (mainstream media) would be considered successful.

[Image source: Insider]

By leveraging utility token incentives and a reputation-based system on verifiable public ledgers, decentralised labour marketplaces may emerge.

While they are most applicable in the field of RLHF, this design — if successful — may be extended to any field with a sufficient balance of demand (jobs) and supply (workers).

Conclusion

Token incentives help to drive user participation and data collection, while reputation-based rewards align incentives for expert engagement.

By combining these approaches, RLHF platforms may cultivate decentralised labour marketplaces, connecting businesses and individuals to efficiently complete tasks.

This integration of token incentives, reputation-based rewards, and decentralised labour marketplaces contributes to the growth of RLHF, the broader crypto ecosystem and AI.

If you are building within this intersection of Crypto and AI, please reach out!

If you enjoyed reading this, feel free to follow me on Twitter @cptn3mox.

Disclosure: Hashed has established, maintained, and enforced strict internal policies and procedures designed to identify and effectively manage conflicts of interest related to its investment activities. This content is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. Furthermore, references to any securities or digital assets are for illustrative purposes only and do not constitute an investment recommendation or offer to provide investment advisory services.

VC | Investment | Research @ Hashed; Articles are my personal views and not financial advice. @cptn3mox on Twitter