Nov 9, 2024
My Thesis Behind Dappier
As consumers increasingly adopt AI agents like ChatGPT, Dappier's RAG marketplace enables micro data transactions, facilitating the development of the "AI Internet."

In my last article, I described my experience experimenting with ChatGPT and how intuitive the platform felt from its early days.
"Early tools like ChatGPT-3.5 were groundbreaking, making AI feel approachable and personal. These advancements effectively 'personified' artificial intelligence for everyday users, catalyzing the current AI wave."
ChatGPT's intuitive appeal lies in its resemblance to traditional search engines, yet it surpasses their functionality by delivering contextually relevant answers directly within its chat interface, rather than merely listing website links. This capability lets users obtain relevant information faster and more easily than with conventional search methods, establishing ChatGPT and its competitors as essential consumer technologies.
Earlier versions of ChatGPT were limited by knowledge cutoff dates, restricting their access to the most recent information. However, with the introduction of ChatGPT Search, the model can now access up-to-date data from the web to respond to user queries. Google, which has historically dominated Search, has developed its own LLM, Google Gemini, which integrates with Google's extensive search infrastructure to provide similar real-time, contextually relevant information.
Regardless of who ultimately wins Search, I believe we are quickly transitioning to an "AI internet" in which LLMs deliver contextually relevant generative AI content wherever the user is. This shift, however, has the potential to disrupt digital content and the way it has historically been monetized.
What is Dappier?
Dappier is at the forefront of the evolving "AI Internet," facilitating seamless transactions between digital publishers and AI agents that utilize online data to respond to user queries. Through Dappier's Platform, digital content owners can convert their websites and proprietary information into a vectorized format that can be readily used by RAG architectures. Dappier's Marketplace empowers these digital publishers to monetize their online data on a per-query basis, allowing them to generate revenue as their content is utilized across any AI endpoint.
Here is my thesis behind Dappier and why I have conviction in this business:
- The battle between LLMs and digital publishers
- How does Dappier fit into the future of the "AI internet"?
- Dappier's GTM strategy
- Who is the founding team?
- My thesis behind Dappier
1. The Battle Between LLMs and Digital Publishers
1a. The Need for Data Scraping
To understand the relationship between LLMs and digital content, it's critical to know how AI models fundamentally work, which can broadly be broken down into two phases – training and inference.
In the training phase, LLMs are exposed to extensive text data to learn the intricacies of human language, including grammar, context and word relationships. This comprehensive training is essential for their effective application during the inference phase. In the inference phase, LLMs utilize the patterns and knowledge acquired during training to generate human-like text based on new inputs. This phase enables applications such as text completion, translation, and summarization. Having been trained on an extensive and rich data set, LLMs can demonstrate a high degree of understanding of context and nuance across a wide range of topics.
Given the importance of training LLMs on extensive and diverse data, model providers often turn to the web as a rich source of information. The internet offers a vast array of topics, writing styles, and languages, making it an ideal resource for comprehensive language learning. The process of gathering this digital content is known as data scraping, which involves extracting publicly available text from websites, forums, articles, and other online sources. This collected data serves as the foundation for developing effective LLMs and robust inference engines.
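To make the scraping step concrete, here is a minimal sketch of extracting the visible text from a web page using only Python's standard library. The sample page and the `TextExtractor` and `extract_text` names are my own illustrative assumptions, not anything a particular model provider is known to use, and real pipelines add crawling, deduplication, and filtering on top of this.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the page's visible text as a single space-separated string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page content, used only for illustration.
SAMPLE_PAGE = """
<html><head><title>Lucha Libre Guide</title><style>p {color: red}</style></head>
<body><h1>Where to watch</h1><p>Arena listings go here.</p>
<script>trackPageView()</script></body></html>
"""

print(extract_text(SAMPLE_PAGE))
```

The point of the sketch is that anything a browser can render, a script can collect, which is why publicly available content is so easy to gather at scale.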
1b. Implications of hiQ Labs v. LinkedIn
The legal landscape of data scraping has been complex, with hiQ Labs v. LinkedIn serving as a significant precedent.
hiQ Labs was a data analytics company that specialized in collecting and analyzing publicly available employment information from platforms like LinkedIn. It used this data to predict employee turnover and map workforce skills, and offered these insights to businesses. In 2017, LinkedIn sent hiQ a cease-and-desist letter demanding that it stop scraping data from LinkedIn's public profiles, arguing that hiQ's activities violated the Computer Fraud and Abuse Act (CFAA).
In 2019, the Ninth Circuit Court of Appeals ruled in favor of hiQ, determining that accessing publicly available information did not violate the CFAA. The court reasoned that since the data hiQ accessed was publicly available to anyone with an internet connection, it did not constitute unauthorized access.
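However, in 2021, in Van Buren v. United States, the U.S. Supreme Court clarified the scope of the CFAA and sent hiQ Labs v. LinkedIn back to the Ninth Circuit for reconsideration in light of that ruling. The Ninth Circuit reaffirmed its earlier decision in 2022, but the district court later found that hiQ had breached LinkedIn's User Agreement. The case concluded in December 2022 with a settlement, wherein hiQ agreed to a consent judgment and permanent injunction, effectively ceasing its data scraping activities on LinkedIn's platform.
hiQ Labs v. LinkedIn highlighted that, even where scraping publicly available data does not violate the CFAA, it may still create legal liability if a company's user agreement explicitly prohibits such activities.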
Following this decision, many digital publishers have revised their terms of service to explicitly prohibit data scraping, particularly for training AI models. In 2023, both X (formerly Twitter) and The New York Times updated their terms to forbid the use of their content for AI training without explicit permission.
Recognizing the legal implications, AI model providers have increasingly sought licensing partnerships with digital publishers to access content for training their models.
Over the course of this year, OpenAI has partnered with The Associated Press, The Financial Times, and News Corp. Similarly, in October, Meta secured a multi-year AI content licensing deal with Reuters.
1c. How Would Digital Publishers Be Compensated at Scale?
While major digital publishers like News Corp and The Associated Press have secured compensation from AI model providers through licensing agreements, smaller digital publishers – such as blogs and forums – face significant challenges to their monetization strategies.
Let's take a fun example – say that I am going to Mexico City, and I want to find where I can see the best Lucha Libre in Mexico City. Previously, I would Google it and find blogs like Matador Network that provide comprehensive information.
I can also ask the same question through ChatGPT, which can provide answers based on data it has been trained on, including information from websites such as the Matador Network blog. Matador Network, however, monetizes its content through banner ads displayed on its website. When users obtain information directly from ChatGPT instead of visiting the blog, it results in fewer page views, leading to reduced advertising revenue.
At scale, I believe the only way to ensure that digital publishers are adequately compensated when their data is utilized by an AI system is through a real-time marketplace to support these transactions.
2. How Does Dappier Fit Into the Future of the "AI Internet"?
Dappier provides both a platform and a marketplace that will facilitate the future of the "AI internet". I've illustrated how Dappier fits into the ecosystem below:

Dappier's system is designed to enable digital publishers to monetize their content across various AI endpoints. Whether a user submits a query through a text-based interface like ChatGPT or a voice-activated assistant, Dappier ensures that publishers receive compensation for the use of their content.
RAG incorporates two types of data:
- Static Data: This includes fixed information such as PDFs, websites, and proprietary databases, representing data at a specific point in time.
- Dynamic Data: This encompasses continuously updated information from sources like RSS feeds, Airtable, and APIs. Incorporating dynamic data is crucial for responding to queries that require the most current information.
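A toy sketch of the retrieval step may help show why both kinds of data matter. It assumes a deliberately simple bag-of-words vector and cosine similarity in place of the learned embeddings a production RAG system would use, and every name and document in it is hypothetical rather than part of Dappier's product.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Document:
    source: str  # hypothetical label such as "pdf" or "rss"
    kind: str    # "static" or "dynamic"
    text: str


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list, top_k: int = 1) -> list:
    """Return the top_k documents most similar to the query."""
    q = vectorize(query)
    ranked = sorted(docs, key=lambda d: cosine(q, vectorize(d.text)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, docs: list) -> str:
    """Prepend retrieved context to the question before it goes to an LLM."""
    context = "\n".join(f"[{d.kind}:{d.source}] {d.text}" for d in docs)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Illustrative corpus: one static and one dynamic source.
CORPUS = [
    Document("pdf", "static", "A visitor guide to wrestling arenas and ticket prices"),
    Document("rss", "dynamic", "Tonight's wrestling schedule was updated an hour ago"),
]

query = "wrestling schedule tonight"
print(build_prompt(query, retrieve(query, CORPUS)))
```

In this sketch the question about tonight's schedule is matched to the dynamic feed entry, while a question about ticket prices would be matched to the static guide.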
Dappier's platform enables digital publishers to seamlessly integrate their website URLs and proprietary data into "RAG-ready" data models within minutes. The platform continuously ingests, integrates, prepares, and stores this data, allowing publishers to list their data models in Dappier's marketplace and set their own price per query.
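The per-query pricing model can be sketched as a small ledger. This is an illustration under my own assumptions, not Dappier's actual API: the `Listing` and `Marketplace` classes, the publisher name, and the price are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Listing:
    publisher: str
    model_id: str
    price_per_query: Decimal  # set by the publisher


class Marketplace:
    """Tracks listed data models and what each publisher has earned."""

    def __init__(self):
        self._listings = {}
        self._earnings = defaultdict(Decimal)  # Decimal() starts at 0

    def list_model(self, listing: Listing) -> None:
        self._listings[listing.model_id] = listing

    def record_query(self, model_id: str) -> Decimal:
        """Charge one query against a data model and credit its publisher."""
        listing = self._listings[model_id]
        self._earnings[listing.publisher] += listing.price_per_query
        return listing.price_per_query

    def earnings(self, publisher: str) -> Decimal:
        return self._earnings.get(publisher, Decimal("0"))


# Hypothetical publisher and price, for illustration only.
market = Marketplace()
market.list_model(Listing("example-travel-blog", "travel-guides", Decimal("0.002")))

for _ in range(3):
    market.record_query("travel-guides")

print(market.earnings("example-travel-blog"))
```

Using `Decimal` rather than floats keeps very small per-query amounts exact as they accumulate, which matters when revenue is built from many micro transactions.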
3. Dappier GTM Strategy
While being the marketplace that facilitates the "AI Internet" is Dappier's long-term vision, its current GTM strategy centers on providing digital publishers with a dual value proposition – monetization and AI integration.
Through the Dappier Marketplace, publishers can license their proprietary content to AI developers, setting their own prices per query. Additionally, Dappier offers embeddable AI widgets, such as voice and chat agents, that publishers can seamlessly integrate into their platforms to enhance user engagement.
Dappier's go-to-market strategy has yielded significant traction, with its platform now encompassing digital publishers that collectively reach over 35 million monthly readers.

Additionally, Dappier has established strategic partnerships with several omnichannel digital publishers and media companies, enhancing its reach and service offerings.
In October 2024, Dappier partnered with Morgan Murphy Media, a local media company, to revolutionize local media monetization through AI. Similarly, in July 2024, Dappier partnered with HomeLife Brands, the parent company of IHeartDogs and IHeartCats, to monetize their extensive pet-related content for AI applications.
4. Who Is the Founding Team?
Dappier's leadership team comprises seasoned entrepreneurs with a proven track record of building and successfully exiting companies.
Dan Goikhman, Co-Founder and CEO of Dappier, is a seasoned entrepreneur with a proven track record of successful ventures in the technology and media sectors. In 2014, he and Krish Arvapally co-founded Mojiva, a pioneering mobile ad network that was acquired by PubMatic. Subsequently, in 2022, he led Powr.tv, a Connected TV (CTV) publishing platform, to its acquisition by Bitcentral.
Krish Arvapally, Co-Founder and Chief Technology Officer, is a serial tech entrepreneur with two successful exits. In addition to co-founding Mojiva with Dan, he also co-founded Unreel, an over-the-top (OTT) video streaming platform, which was acquired by Powr.tv in 2019.
Akshay Arvapally, Co-Founder and Head of Product, is a seasoned entrepreneur with a strong background in technology and media. As President of Powr.tv, Akshay led the company to a successful acquisition by Bitcentral in 2022.
5. My Thesis Behind Dappier
Dappier is well-positioned to be a key player in the emerging "AI Internet" by enabling essential micro data transactions that support AI-driven applications. The legal landscape surrounding data scraping is complex, with several ongoing court cases currently shaping its future. The precedent set by hiQ Labs v. LinkedIn suggests a narrower interpretation of the CFAA's "exceeds authorized access" clause, while the ability for digital publishers to restrict data scraping through updated terms of use implies that, eventually, all data in the AI ecosystem may need to be licensed. This trend is further evidenced by foundational AI model providers recently establishing data licensing agreements with content creators. Effectively managing these partnerships across the sheer volume of digital content available on the internet, however, necessitates a real-time marketplace. From what I've seen, no other platforms currently facilitate these micro data transactions, positioning Dappier to fill this critical and inevitable gap.
One reason AI engineers have not developed a centralized marketplace for these AI data transactions may be the assumption that model providers would eventually establish such platforms themselves. This perspective might hold if the AI model landscape mirrored Search, where a single entity like Google dominates. However, the current AI ecosystem is characterized by multiple LLMs, such as ChatGPT, Gemini, and Claude, which are leveraged in different parts of that ecosystem. This decentralization creates an opportunity for marketplaces like Dappier to become facilitators of the AI Internet.
Lastly, the Dappier team is a battle-tested group of serial entrepreneurs who have successfully led multiple exits, and I have strong conviction in their vision and their ability to execute on this business plan.
Thanks for making it all the way to the end! If you have any thoughts, questions, or feedback, I'd love to hear them. Your input is always valuable.