Wrong Answers Only: Industry Struggles with the ‘Inherent Characteristic’ of GenAI Hallucinations

At the two-year mark since the launch of OpenAI’s large-language model (LLM)-powered chatbot, ChatGPT, and the rapid rise of generative artificial intelligence (GenAI), the technology’s growing pains and future development were a central topic of discussion at Amazon Web Services’ (AWS) re:Invent conference in Las Vegas.

Matt Garman, CEO of AWS, highlighted the technology’s transformative potential, stating, “I think generative AI has the potential to transform every single industry, every single company out there, every single workflow out there, every single user experience out there.”

However, he also acknowledged that as enterprises seek to integrate AI into their operations, they require clearer boundaries regarding the capabilities and outputs of these tools.

One of the key challenges identified by Garman is the issue of generative AI models “hallucinating,” or producing incorrect, misleading, or nonsensical text. He pointed out that while current models are impressive, they still make errors, which becomes problematic when moving from a proof-of-concept phase to full-scale production.

“In reality, as good as the models are today, sometimes they get things wrong,” he said. “So when you did a proof-of-concept last year or the year before, 90 percent was okay. But when you get down into a production application, that’s not okay.”

In response to this challenge, Garman introduced a new feature called Automated Reasoning checks. This tool, part of AWS’s Amazon Bedrock platform, aims to safeguard against hallucinations by assessing the accuracy of model responses based on information provided by customers.

In cases where a potential hallucination is detected, the feature will present an alternative, more accurate response alongside the model’s original output. The capability, which AWS claims will help improve the reliability of generative AI in real-world applications, follows similar releases from Microsoft and Google earlier in the year.

The announcement underscored AWS’s commitment to addressing the reliability and safety concerns around generative AI as it becomes more integrated into enterprise workflows.
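As a rough illustration of how an application might apply such a check, the sketch below uses the ApplyGuardrail API in Amazon Bedrock Guardrails (via Python’s boto3) to screen a model answer before it reaches a user. The guardrail identifier, version, and response handling here are illustrative assumptions: they presume a guardrail with a hallucination-related policy has already been configured, and the exact fields returned may differ.

    import boto3

    # Sketch: screen a model-generated answer with a pre-configured Bedrock guardrail.
    # "my-guardrail-id" and the version string are placeholders, not real identifiers.
    bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

    def check_answer(model_answer: str) -> str:
        """Return the model's answer, or the guardrail's suggested text if it intervenes."""
        result = bedrock_runtime.apply_guardrail(
            guardrailIdentifier="my-guardrail-id",  # placeholder
            guardrailVersion="1",                   # placeholder
            source="OUTPUT",                        # checking model output, not user input
            content=[{"text": {"text": model_answer}}],
        )
        if result.get("action") == "GUARDRAIL_INTERVENED" and result.get("outputs"):
            # The guardrail flagged the answer; surface its suggested text so the
            # application can show it alongside (or instead of) the original output.
            return result["outputs"][0].get("text", model_answer)
        return model_answer

In practice, both the original and suggested responses could be logged so reviewers can compare them, which is roughly the behaviour Garman described for the new checks.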

Pradeep Prabhakaran, a senior manager of solution architecture at Cohere, discussed the inherent challenges of generative AI during a panel at the AWS re:Invent conference, particularly the issue of “hallucination” in large language models (LLMs). He acknowledged that hallucination is a fundamental characteristic of LLMs, making it a persistent problem in the advancement of generative AI. He emphasized that as applications transition from prototype to production, it’s crucial to build systems that allow for continuous feedback.
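A minimal sketch of what such a feedback loop could look like is shown below; the generator, validator, and review queue are hypothetical, illustrative pieces rather than anything Cohere described.

    from dataclasses import dataclass, field
    from typing import Callable

    @dataclass
    class FeedbackLoop:
        """Toy generate -> validate -> review loop for LLM outputs."""
        generate: Callable[[str], str]        # e.g. a call to an LLM API (assumed)
        validate: Callable[[str, str], bool]  # checks an answer against trusted data (assumed)
        review_queue: list = field(default_factory=list)

        def answer(self, question: str) -> str:
            draft = self.generate(question)
            if self.validate(question, draft):
                return draft
            # Flag the suspect answer for human review rather than returning it blindly.
            self.review_queue.append((question, draft))
            return "This answer needs review before it can be shared."

Reviewed corrections could then be folded back into prompts or fine-tuning data, which is broadly the kind of continuous feedback Prabhakaran was pointing to.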

This approach ensures that if an AI model produces incorrect outputs, there is still a method to validate and correct those errors.

For Koho, a Canadian challenger bank, the accuracy of AI-generated outputs is especially important. David Kormushoff, Koho’s vice-president of technology and AI, shared the company’s interest in using generative AI for consumer-facing applications, particularly in educating clients about wealth-building and offering insights into their spending behaviors.

However, Kormushoff stressed that providing inaccurate information would undermine the company’s core values. “We don’t want to be … giving them bad information,” he said, though he expressed confidence that the company would eventually reach a point where it could trust the technology enough to offer it to customers.

Similarly, Thomas Storwick, co-founder and COO of Coastal Carbon, a Waterloo, Ontario-based startup that uses generative AI models to analyze geospatial data for industries such as insurance, agriculture, and climate-change adaptation, said his company is still learning how the technology behaves in that domain.

Coastal Carbon is currently exploring geospatial foundational models and studying how hallucinations manifest in this context. Storwick emphasized the importance of common sense and maintaining human oversight in the process, ensuring clients understand the nature of the data being provided.

The debate surrounding the size of AI models is also gaining traction in the industry. While many LLM companies have focused on scaling up their models, expecting that larger datasets and bigger models would lead to better performance, there is now growing discussion on whether this approach is truly the best path forward.

The shift toward refining model accuracy and addressing hallucinations suggests that, in some cases, size may not necessarily equate to improved outcomes, and that more focused, nuanced approaches might be required to meet enterprise needs effectively.

Hyperscalers like Microsoft, Amazon, and Google have made significant investments in expanding data centers and enhancing the power of graphics-processing units (GPUs) to support larger foundational models.

However, as enterprise applications become more central to the discussion around generative AI, some experts argue that using these larger models may not always be the best approach for business needs. During a panel at the AWS re:Invent conference, Cohere’s Pradeep Prabhakaran emphasized the importance of building AI models on smaller infrastructure that meets essential constraints like latency, accuracy, and cost.

This perspective challenges the prevailing notion that bigger models are always the solution for enterprise applications.

Cohere’s CEO, Aidan Gomez, echoed this sentiment in a Dec. 5 letter to staff and shareholders, stating that the company believes the future of enterprise AI lies in smaller, customizable tools. Gomez outlined plans to launch a suite of workplace AI assistants that integrate seamlessly into existing business systems, such as email platforms and customer relationship management (CRM) tools.

These smaller, plug-and-play models would be tailored to each client and could be deployed privately, offering greater flexibility while addressing privacy and security concerns.

Gomez also pointed out that stronger data privacy and security measures would be crucial in promoting AI adoption, especially for regulated businesses. Moreover, training models on business-relevant data could improve their accuracy and effectiveness, making them more valuable for companies across different sectors.

Patricia Nielsen, AWS’s head of startups for Canada, noted that businesses are increasingly focusing on the responsible use of data. She observed that companies are becoming more aware of the need for caution, particularly regarding the transparency of training data sources and how that data is processed.

Despite growing adoption, generative AI faces significant scrutiny from leading figures in the AI community. Canadian AI pioneers Geoffrey Hinton and Yoshua Bengio have raised alarms about the potential future risks of the technology, warning that it could pose a serious threat to humanity.

Their concerns highlight the ongoing tension between the excitement for AI’s potential and the ethical considerations that accompany its rapid development and deployment.

In a fireside chat at the AWS re:Invent conference, Andrew Ng, founder of the Google Brain project and DeepLearning.AI, who also serves on Amazon’s board, dismissed concerns about the potential dangers of AI as a “distraction.” While acknowledging that he had some concerns about AI “polluting the information ecosystem,” Ng downplayed the broader existential fears about the technology.

He emphasized that most of the threats associated with AI are application-specific, rather than intrinsic to the technology itself.

Ng also addressed concerns about bias in AI systems, stating that AI teams take the issue “very seriously” and are working hard to mitigate it. His comments reflect a more optimistic view on AI’s development, suggesting that the technology’s risks can be managed with thoughtful and responsible application.