AI Just Passed Another Human Test

There is an almost uncomfortable moment in the study. In front of a split screen, participants converse with what they thought were two people at the same time—one human, one machine—and are then asked to determine which was which. The majority of them made a mistake. Not every now and then. People pointed at GPT-4.5 and said, “That one’s human,” 73% of the time when it was on the other side of the conversation.

Cameron Jones and Benjamin Bergen, two cognitive scientists at UC San Diego, conducted the study, which was released as a preprint in March. It’s important to keep in mind that it hasn’t yet undergone peer review. However, the numbers are hard to ignore even with that asterisk firmly in place.

Category	Details
Test Name	Turing Test (Imitation Game)
Originally Proposed By	Alan Turing — British mathematician and computer scientist
Year Introduced	1950, in the paper “Computing Machinery and Intelligence”
Study Authors	Cameron Jones & Benjamin Bergen, UC San Diego
Study Status	Preprint — not yet peer-reviewed
AI Model Tested	GPT-4.5 (OpenAI), GPT-4o, LLaMa-3.1-405B (Meta), ELIZA
GPT-4.5 Human Rating	73% of participants judged it as human
LLaMa-3.1-405B Human Rating	56% of participants judged it as human
GPT-4o Human Rating	21% — lower than ELIZA’s 23%
Participants	~284 people assigned as interrogators or witnesses
Test Format	Split-screen, simultaneous five-minute conversations
Key Variable	Persona prompting significantly improved AI performance
Without Persona Prompt	GPT-4.5 dropped to 36% human identification rate
Reference	OpenAI Research

A modernized version of the Turing test, which has been used in philosophical and scientific circles since 1950, was applied to four AI models. OpenAI’s GPT-4.5, one of them, did not simply pass. In the study, it performed better than the real humans, being recognized as human even more frequently.

Most people are unaware of the longer and more bizarre history of the Turing test itself. It was not initially intended by Alan Turing as a direct confrontation between humans and machines. It wasn’t until his 1950 publication that the imitation game—a three-person setup in which an interrogator attempts to ascertain, solely through text, whether they are speaking with a man or a woman—to take shape.

His 1948 paper introduced a chess-based thought experiment. According to Turing’s framing, the machine was supposed to replace one of them. Over the course of several decades, the public version of this test deviated greatly from Turing’s initial goals and may not have had his full support.

In 2023, Google software engineer François Chollet stated unequivocally that the test was never intended to be taken literally. It was an experiment in thought. A method disguised as a philosophical challenge. When headlines start announcing that machines have crossed some sort of finish line, it’s easy to forget that context.

Nevertheless, despite the philosophical disagreements, something occurred in that study that merits consideration. Everything was altered by persona prompting. GPT-4.5’s success rate fell to 36% when given simple instructions, such as “convince the interrogator you are human,” compared to 73% when instructed to assume the role of a young, culturally competent internet user.

It’s possible that people aren’t actually detecting intelligence at all. Perhaps it’s personality. familiarity. the sound of a particular voice type that they have come to identify with actual people.

ELIZA, a chatbot that was created about eighty years ago and is a technological relic, managed to achieve a twenty-three percent success rate. GPT-4o, which presently powers the mainstream version of ChatGPT and is regarded as one of the more capable models available to the public, is one percentage point behind that. Though it’s still unclear exactly what it reveals, there’s a dark humor in that comparison. Because ELIZA is so crude, it completely deviates from the tone that users have trained themselves to expect from sophisticated AI.

It’s difficult to ignore the fact that the Turing test has always generated more controversy than the test itself merits. Four persistent objections are raised by critics: passing the test demonstrates behavioral mimicry rather than thought; the brain may not function like a machine at all; AI reasoning processes are not comparable to human ones; and testing a single behavior cannot establish general intelligence.

Jones himself appeared wary of making too many claims. The question of whether LLMs are as intelligent as humans is “very complicated,” he tweeted, and the findings should be “one among many other pieces of evidence.”

However, Jones did pursue a more sensible and possibly more pressing course of action. The automation of conversational jobs, more convincing social engineering, and the erosion of the presumption that you know who you’re talking to are all possibilities that most people haven’t fully considered if AI can convincingly replace humans in brief interactions. These are no longer science fiction issues. These are inquiries about design. questions about policy. the type that requires more time to complete than a split-screen test lasting five minutes.

Additionally, Jones concludes with an odd loop: the Turing test measures more than just machines. We are measured by it. Our presumptions, our expectations, and our perception of what constitutes a “real” conversation. People may become more adept at identifying or accepting AI systems as they spend more time interacting with them. It’s really hard to tell where that will go.

For the time being, a chatbot persuaded almost 75% of its interrogators that it was one of them. It’s probably worth more than a preprint and a news cycle to determine whether this indicates that machines are capable of thinking or just that thinking isn’t quite what we thought it was.

AI Just Passed Another Human Test

Why the World’s Biggest Tech Companies Are Suddenly Investing in Nuclear Fusion

Why Louisiana’s Decision to Scrap AI Legislation Is Being Watched by Every Other State Capital

Big Tech Promised AI Would Create Jobs. Instead, Oracle Just Cut Thousands More.

Journalism Students Are More Skeptical of AI Than Almost Any Other Group. Here’s Why That Matters.

How to Destroy a Hard Drive So the NSA Can Never Recover Your Data

The $100 Million AI Safety Pitch That Major Tech Giants Are Being Asked to Fund

Why the World’s Biggest Tech Companies Are Suddenly Investing in Nuclear Fusion

Researchers Say Machines May Soon Think Independently — And the Line Between Illusion and Reality Is Blurring Fast

This Breakthrough Changes Everything — And Most People Haven’t Heard About It Yet

Scientists Say They Are Entering Unknown Territory

How China’s Lithium-Free Fertilizer Production Is Insulating It From a Crisis Hitting Everyone Else

AI Just Passed Another Human Test

Related Posts