Study Reveals: Over Half of ChatGPT’s Answers to Programming Questions Are Incorrect

In recent years, computer programmers have increasingly turned to chatbots like OpenAI's ChatGPT for coding assistance, a shift that has hit platforms like Stack Overflow, which laid off nearly 30% of its workforce last year.

The problem, according to a recent study by researchers at Purdue University presented this month at the Computer-Human Interaction conference, is that 52 percent of the programming answers generated by chatbots such as ChatGPT are inaccurate. That finding raises real questions about the reliability of leaning on chatbots for coding help and points to the limits programmers run into when they look for automated support.

A 52 percent error rate is alarmingly high for a tool that users depend on for accurate information, and the issue is not unique to programmers. Writers, teachers, and other end users of platforms like ChatGPT have repeatedly run into answers that are confidently wrong, seemingly conjured out of thin air. That tendency underscores the risk of relying on AI for critical tasks and the importance of verifying its output before trusting it.

In the study, the researchers took 517 questions from Stack Overflow and evaluated ChatGPT's answers to them. They found that 52 percent of the answers contained misinformation, 77 percent were more verbose than human-written answers, and 78 percent were inconsistent with human answers to some degree. The results point to gaps that AI-based systems still need to close before they can be counted on for accurate, reliable programming help.

The team also ran a linguistic analysis of 2,000 randomly selected ChatGPT responses and found that they were more formal and analytical in tone and carried less negative sentiment, consistent with the bland, cheerful register typical of AI-generated text.

More concerning is that many human programmers seem to prefer ChatGPT's answers anyway. In a small user study of 12 programmers, the Purdue researchers found that participants preferred ChatGPT's responses 35 percent of the time and failed to spot the mistakes in the AI-generated answers 39 percent of the time. That preference, combined with the inability to catch errors, shows how much sway tools like ChatGPT can have over decision-making and problem-solving in the programming community.

Why would programmers prefer the AI's answers over human ones? Part of it may be that ChatGPT comes across as more polite than people online. In follow-up semi-structured interviews, the researchers found that the polite language, articulate textbook-style answers, and thoroughness of ChatGPT's responses were what made them seem more convincing to participants.

As a result, programmers lowered their guard and overlooked the misinformation in ChatGPT's answers. Language style and presentation clearly shape how credible AI-generated content appears, which is all the more reason to evaluate such output critically rather than take it at face value.

The study is a stark reminder that ChatGPT still has significant flaws, which is little comfort to the people laid off from Stack Overflow or to the programmers left cleaning up AI-generated mistakes in code. These shortcomings carry real-world consequences for anyone relying on AI for critical tasks and decisions.

Developers and users alike should stay vigilant, exercise caution, and put safeguards in place to offset the limitations of systems like ChatGPT, so that the applications built on them remain accurate, reliable, and effective.
