ChatGPT has created a frenzy. Since the release of OpenAI’s large language model (LLM) in late November, there has been rampant speculation about how generative AIs — of which ChatGPT is just one — might change everything we know about knowledge, research, and content creation. Or reshape the workforce and the skills employees need to thrive. Or even upend entire industries!
One area stands out as a top prize of the generative AI race: search. Generative AI has the potential to drastically change what users expect from search.
Google, the longtime winner of online search, seems to suddenly have a challenger in Microsoft, which recently invested $10 billion in ChatGPT’s developer, OpenAI, and announced plans to incorporate the tool into a range of Microsoft products, including its search engine, Bing. Meanwhile, Google is releasing its own AI tool, Bard, and Chinese tech giant Baidu is preparing to launch a ChatGPT competitor. Millions of dollars are being poured into generative AI startups as well.
But despite the hype around ChatGPT — and generative AI overall — there are major practical, technical, and legal challenges to overcome before these tools can reach the scale, robustness, and reliability of an established search engine such as Google.
Search engines entered the mainstream in the early 1990s, but their core approach has remained unchanged since then: to rank-order indexed websites in a way that is most relevant to a user. The Search 1.0 era required users to enter a keyword or a combination of keywords to query the engine. Search 2.0 arrived in the late 2000s with the introduction of semantic search, which allowed users to type natural phrases as if they were interacting with a human.
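The difference between keyword and semantic search can be sketched in a few lines. The documents and scoring below are toy illustrations, not how any real engine ranks: Search 1.0 counts literal keyword overlap, while Search 2.0 replaces that overlap with a measure of meaning (typically embedding similarity from a trained language model).

```python
# Toy illustration of Search 1.0: rank documents by keyword overlap.
# A semantic (Search 2.0) engine would swap keyword_score for a
# similarity score between learned embeddings of query and document.

DOCS = {
    "doc1": "cheap flights to paris from new york",
    "doc2": "history of the eiffel tower in paris",
    "doc3": "best pizza restaurants in new york",
}

def keyword_score(query, doc):
    """Search 1.0: count how many query keywords the document contains."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def rank_keyword(query):
    """Return document IDs ordered from most to least keyword overlap."""
    return sorted(DOCS, key=lambda d: keyword_score(query, DOCS[d]),
                  reverse=True)
```

Keyword ranking works only when the user happens to type the same words the page uses; semantic search relaxes that constraint, which is what made natural-phrase queries possible.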
Google dominated search right from its launch thanks to three key factors: its simple and uncluttered user interface; the revolutionary PageRank algorithm, which delivered relevant results; and Google’s ability to seamlessly scale with exploding volume. Google Search has been the perfect tool for addressing a well-defined use case: finding websites that have the information you are looking for.
But there seems to be a new use case on the rise now. As Google also acknowledged in its announcement of Bard, users are now seeking more than just a list of websites relevant to a query — they want “deeper insights and understanding.”
And that’s exactly what Search 3.0 does — it delivers answers instead of websites. While Google has been the colleague who points us to a book in a library that can answer our question, ChatGPT is the colleague who has already read every book in the library and can answer our question. In theory, anyway.
But here also lies ChatGPT’s first problem: In its current form, ChatGPT is not a search engine, primarily because it doesn’t have access to real-time information the way a web-crawling search engine does. ChatGPT was trained on a massive dataset with an October 2021 cut-off. This training process gave ChatGPT an impressive amount of static knowledge, as well as the ability to understand and produce human language. However, it doesn’t “know” anything beyond that. As far as ChatGPT is concerned, Russia hasn’t invaded Ukraine, FTX is a successful crypto exchange, Queen Elizabeth is alive, and Covid hasn’t reached the Omicron stage. This is likely why in December 2022 OpenAI CEO Sam Altman said, “It’s a mistake to be relying on [ChatGPT] for anything important right now.”
Will this change in the near future? That raises the second big problem: For now, continuously retraining an LLM as the information on the internet evolves is extremely difficult.
The most obvious challenge is the tremendous amount of processing power needed to continuously train an LLM, and the financial cost associated with these resources. Google covers the cost of search by selling ads, allowing it to provide the service free of charge. The higher energy cost of LLMs makes that harder to pull off, particularly if the aim is to process queries at the rate Google does, which is estimated to be in the tens of thousands per second (or a few billion a day). One potential solution may be to train the model less frequently and to avoid applying it to search queries that cover fast-evolving topics.
But even if companies manage to overcome this technical and financial challenge, there is still the problem of the actual information these tools will deliver: What exactly are tools like ChatGPT going to learn, and from whom?
Consider the Source
Chatbots like ChatGPT are like mirrors held up to society — they reflect back what they see. If you let them loose to be trained on unfiltered data from the internet, they could spit out vitriol. (Remember what happened with Tay?) That’s why LLMs are trained on carefully selected datasets that the developer deems to be appropriate.
But this level of curation does not ensure that all the content in such massive online datasets is factually correct and free of bias. In fact, a study by Emily Bender, Timnit Gebru, Angelina McMillan-Major, and Margaret Mitchell (credited as “Shmargaret Shmitchell”) found that “large datasets based on texts from the internet overrepresent hegemonic viewpoints and encode biases potentially damaging to marginalized populations.” As an example, one key source for ChatGPT’s training data is Reddit, and the authors quote a Pew Research study that shows 67% of Reddit users in the United States are men and 64% are between ages 18 and 29.
These disparities in online engagement across demographic factors such as gender, age, race, nationality, socioeconomic status, and political affiliation mean the AI will reflect the views of the group most dominant in the curated content. ChatGPT has already been accused of being “woke” and having a “liberal bias.” At the same time, the chatbot has also delivered racial profiling recommendations, and a professor at UC Berkeley got the AI to write code that says only white or Asian men would make good scientists. OpenAI has since put in guardrails to avoid these incidents, but the underlying problem remains.
Bias is a problem with traditional search engines, too, as they can lead users to websites that contain biased, racist, incorrect, or otherwise inappropriate content. But as Google is simply a guide pointing users toward sources, it bears less responsibility for their contents. Presented with the content and contextual information (e.g., known political biases of the source), users apply their judgment to distinguish fact from fiction, opinion from objective truth, and decide what information they want to use. This judgment-based step is removed with ChatGPT, which makes it directly responsible for the biased and racist results it may deliver.
This raises the issue of transparency: Users have no idea what sources are behind an answer with a tool like ChatGPT, and the AIs won’t provide them when asked. This creates a dangerous situation where a biased machine may be taken by the user as an objective tool that must be correct. OpenAI is working on addressing this challenge with WebGPT, a version of the AI tool that is trained to cite its sources, but its efficacy remains to be seen.
Opacity around sourcing can lead to another problem: Academic studies and anecdotal evidence have shown that generative AI applications can plagiarize content from their training data — in other words, the work of someone else, who did not consent to have their copyrighted work included in the training data, did not get compensated for the use of the work, and did not receive any credit. (The New Yorker recently described this as the “three C’s” in an article discussing a class action lawsuit against generative AI companies Midjourney, Stable Diffusion, and Dream Up.) Lawsuits against Microsoft, OpenAI, GitHub, and others are also popping up, and this seems to be the beginning of a new wave of legal and ethical battles.
Plagiarism is one issue, but there are also times when LLMs simply make things up. In a very public blunder, for example, Google’s Bard delivered factually incorrect information about the James Webb Space Telescope during a demo. Similarly, when ChatGPT was asked about the most cited research paper in economics, it came back with a completely fabricated citation.
Because of these issues, ChatGPT and generic LLMs have to overcome major challenges to be of use in any serious endeavor to find information or produce content, particularly in academic and corporate applications where even the smallest misstep could have catastrophic career implications.
LLMs will likely enhance certain aspects of traditional search engines, but they don’t currently seem capable of dethroning Google search. However, they could play a more disruptive and revolutionary role in changing other kinds of search.
What is more likely in the Search 3.0 era is the rise of purposefully curated, transparently documented, and deliberately trained LLMs for vertical search — specialized, subject-specific search engines.
Vertical search is a strong use case for LLMs for a few reasons. First, vertical search engines focus on specific fields and use cases — narrow but deep knowledge. That makes it easier to train LLMs on highly curated datasets, which could come with comprehensive documentation describing the sources and technical details about the model. It also makes it easier for these datasets to be governed by the appropriate copyright, intellectual property, and privacy laws, rules, and regulations. Smaller, more targeted language models also mean lower computational costs, making it easier to retrain them more frequently. Finally, these LLMs would be subject to regular testing and auditing by third-party experts, similar to how analytical models used in regulated financial institutions are subject to rigorous testing requirements.
In fields where expert knowledge rooted in historical facts and data is a significant part of the job, vertical LLMs can provide a new generation of productivity tools that augment humans in entirely new ways. Imagine a version of ChatGPT trained on peer-reviewed and published medical journals and textbooks and embedded into Microsoft Office as a research assistant for medical professionals. Or a version that is trained on decades of financial data and articles from the top finance databases and journals that banking analysts use for research. Another example is training LLMs to write or debug code and answer questions from developers.
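The vertical-search pattern described above can be sketched simply: retrieve passages from a small, curated, documented corpus, then hand the top passages — with their sources attached — to the language model, so answers stay attributable. The corpus entries and the word-overlap scoring below are hypothetical illustrations; a production system would use embedding-based retrieval and a real LLM call, but the pattern is the same.

```python
# Sketch of curated vertical search: retrieve from a vetted corpus,
# then build a prompt that forces the model to answer from cited sources.
# All entries here are invented placeholders, not real publications.

CURATED_CORPUS = [
    {"source": "Journal A (2021), peer-reviewed",
     "text": "drug x reduced symptom duration in randomized trials"},
    {"source": "Textbook B, 3rd ed.",
     "text": "standard dosing of drug x for adults"},
]

def retrieve(query, corpus, top_k=1):
    """Rank curated passages by word overlap with the query.
    (Stand-in for embedding similarity in a real system.)"""
    q = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda e: len(q & set(e["text"].split())),
                    reverse=True)
    return scored[:top_k]

def answer_with_sources(query, corpus):
    """Build a grounded prompt: the model sees only curated, cited passages."""
    passages = retrieve(query, corpus)
    context = "\n".join(f"[{p['source']}] {p['text']}" for p in passages)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"
```

Because every passage carries its source, this design directly addresses the transparency and attribution problems discussed earlier: the user can check where an answer came from, and the training and retrieval data can be governed by the relevant laws and licenses.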
Businesses and entrepreneurs can ask five questions when evaluating whether there is a strong use case for applying LLMs to a vertical search application:
- Does the task or process traditionally require extensive research or deep subject-matter expertise?
- Is the outcome of the task synthesized information, insight, or knowledge that allows the user to take action or make a decision?
- Does sufficient historical technical or factual data exist to train the AI to become an expert in the vertical search area?
- Is the LLM able to be trained with new information at an appropriate frequency so it provides up-to-date information?
- Is it legal and ethical for the AI to learn from, replicate, and perpetuate the views, assumptions, and information included in the training data?
Confidently answering the above questions will require a multidisciplinary lens that brings together business, technical, legal, financial, and ethical perspectives. But if the answer is “yes” to all five questions, there is likely a strong use case for a vertical LLM.
Letting the Dust Settle
The technology behind ChatGPT is impressive, but not exclusive, and will soon become easily replicable and commoditized. Over time, the public’s infatuation with the delightful responses produced by ChatGPT will fade while the practical realities and limitations of the technology begin to set in. As a result, investors and users should pay attention to companies that are focusing on addressing the technical, legal, and ethical challenges discussed above, as those are the fronts where product differentiation will take place — and where the AI battles will ultimately be won.