It’s taken just a few days for Google’s AI chatbot Bard to make headlines for the wrong reasons.
Google shared a GIF showing Bard answering the question: “What new discoveries from the James Webb Space Telescope can I tell my 9-year-old about?” One of Bard’s answers, that the telescope “took the very first pictures of a planet outside of our own solar system”, is more artificial than intelligent.
A number of astronomers have taken to Twitter to point out that the first exoplanet image was taken in 2004, 18 years before Webb began taking its first snaps of the universe.
Google’s embarrassment over this mistake is compounded by the fact that it was Bard’s first-ever answer… and it was wrong! Bard is Google’s rushed answer to Microsoft-backed ChatGPT.
Both Bard and ChatGPT are powered by large language models (LLMs): deep learning algorithms that can recognise and generate content based on huge amounts of data. The problem is that, sometimes, these chatbots simply make stuff up. There have even been reports that ChatGPT has produced made-up references.
These fabrications are called “hallucinations”, even though the AI itself is not conscious. They are the result of the software trying to fill in gaps and make its output sound natural and accurate. It’s a well-known problem for LLMs and was even acknowledged by ChatGPT developer OpenAI in its release statement on November 30, 2022: “ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers.”
Experts say that even the reaction to the “successes” of artificial intelligence chatbots should be tempered with restraint.
In a paper published last week, University of Minnesota Law School researchers subjected ChatGPT to four real exams at the university. The exams were then graded blind. After answering nearly 100 multiple choice questions and 12 essay questions, ChatGPT received an average score of C+, a low but passing grade.
Lead author Professor Jonathon Choi wrote on Twitter that two of the three examiners noticed that the paper had been written by a bot.
Another team of researchers put ChatGPT through the United States Medical Licensing Exam (USMLE), a notoriously difficult series of three exams.
A passing score for the USMLE is usually around 60 percent. The researchers found that ChatGPT, tested on 350 of the 376 public questions available from the June 2022 USMLE release, scored between 52.4 and 75.0 percent.
The authors claim in their research, published in PLOS Digital Health, that “ChatGPT produced at least one significant insight in 88.9% of all responses.” In this case, “significant insight” refers to something in the chatbot’s responses that is new, non-obvious, and clinically valid.
But Dr Simon McCallum, a senior lecturer in software engineering at New Zealand’s Victoria University of Wellington, says that ChatGPT’s performance isn’t even the most impressive among AI systems trained in medical settings.
Google’s Med-PaLM, a specialist arm of the language model Flan-PaLM, is another LLM focused on medical texts and conversations. “ChatGPT may pass the exam, but Med-PaLM is able to give advice to patients that is as good as a professional GP. And both of these systems are improving,” McCallum says.
Dr Collin Bjork, a senior lecturer in science communication and podcasting at Massey University, is much more circumspect, saying the “claim that ChatGPT can pass US medical exams is overblown and should come with a lengthy series of asterisks.”
Among these asterisks, Bjork notes that the authors’ claim that ChatGPT showed “insight” in its USMLE responses rests on a definition of “insight” that “is too vague to be useful.”
“The authors’ claims about ChatGPT’s insights and teaching potential are misleading and naïve,” Bjork adds.
Bjork and McCallum’s full reactions to the PLOS Digital Health paper can be found here.
Cosmos did not include comment from ChatGPT or Bard!