Alex Lupsasca can’t shake a particular memory. In the summer of 2025, the theoretical physicist sat with GPT-5 Pro, asking it to identify the same mathematical symmetries in black hole equations that had taken him months to discover. The model failed at first. He posed a simpler warm-up question, then asked again. This time the answer came back. “I was like, oh my God, this is insane,” he later recalled.
A few months later, he packed up his life and moved his family to San Francisco to join OpenAI. It would be easy to write this off as a tech enthusiast getting carried away, except that Lupsasca isn’t one. He is a serious scientist who works on some of the hardest problems in physics.
| Category | Details |
|---|---|
| Pioneering AI System | Adam — first robot to make fully automated scientific discoveries |
| Field of Discovery | Yeast biology; conducted independent experiments using robotic arms and freezer samples |
| 2024 Nobel Prizes | Chemistry & Physics both awarded to pioneers of AI tools in science |
| GPT-4.5 Turing Test Result | Judged human 73% of the time — March 2025, UC San Diego study |
| UC San Diego Conclusion | Four faculty members across philosophy, linguistics, AI & data science argue current LLMs already constitute AGI |
| Key Researchers | Eddy Keming Chen, Mikhail Belkin, Leon Bergen, David Danks |
| OpenAI for Science | New team led by Kevin Weil building AI tools specifically for scientific research |
| Notable Scientist Convert | Alex Lupsasca — black hole physicist, relocated family to San Francisco to join OpenAI |
| Median AGI Prediction (Academic Researchers) | 2040–2050 based on 8 peer-reviewed surveys of 4,900+ AI experts |
| AGI Definition Basis | Breadth across domains (math, language, science, creative tasks) + depth of performance within them |
| Current Risk | Rise of AI-generated “junk science”; PLOS and Frontiers restricted certain AI-only paper submissions in 2025 |
| Word of the Year 2025 | “Slop” — named by Merriam-Webster, describing the low-quality AI-generated content flooding science, business, and media |
Right now, something genuinely peculiar is happening in research labs, and it is happening so quickly that even cautious scientists struggle to describe it adequately. The question being asked in universities, at conferences, and apparently in some very contentious faculty meetings is not whether AI can help science. Of course it can. The harder, less familiar, and more unsettling question is whether it is starting to do something that looks more like thinking than helping.
For the better part of a year, four UC San Diego faculty members debated this exact question. Eddy Keming Chen is a philosopher. Mikhail Belkin is a professor of computer science and artificial intelligence. Leon Bergen is a linguist. David Danks works at the nexus of philosophy, policy, and data science.

Coming from genuinely different disciplines, they arrived at a conclusion that would have seemed absurd five years ago: by reasonable standards, current large language models already constitute artificial general intelligence.
Not machines of the future. The ones in use right now. The kind currently running on servers, answering questions about recipes, drafting cover letters, and, apparently, rediscovering black hole equations.
It is still unclear whether this claim will stand up to close examination or be quietly added to the long list of premature AGI announcements. NYU’s Gary Marcus remains blunt in his skepticism. He argues that no language model can replicate the kind of creative rupture behind the real leaps of scientific imagination, such as continental drift and special relativity.
And he has a point. There is something different about the moment a human scientist looks out a window and wonders, for no apparent reason, whether the continents were ever joined. No one prompted that thought. No one asked a warm-up question first.
And yet. Ernest Ryu, a mathematician at UCLA, spent twelve hours in back-and-forth dialogue with ChatGPT running on GPT-5 Pro, trying to prove a difficult result in optimization, the branch of mathematics concerned with finding the best solution among many possibilities. The AI kept making odd attempts, many of which failed.
But Ryu was the expert in the room, so he could see when an attempt was on the verge of working and push it a step further. The proof they produced together was real, peer-reviewable mathematics. “It amazed me with the strange things it would try,” Ryu told OpenAI. He, too, has since joined the company. A pattern worth noticing is emerging here.
Of course, it didn’t start with GPT-5. In the early 2000s, a robot named Adam worked inside a robotic laboratory about the size of a small van, posing questions about yeast, running experiments with robotic arms, and making modest but real scientific discoveries. At the time, no one called Adam intelligent. It was an automated machine carrying out a predetermined task. The fact that it is hard to draw a clean line between that and what Ryu encountered with ChatGPT may be the whole point.
The UC San Diego team argues that the definition of general intelligence has been quietly distorted by impossible standards. General intelligence, they contend, does not require perfection. People are not flawless. It does not require universal mastery. Nobody is an expert in everything, not even within their own field.
And intelligence can count as intelligence without mirroring human cognitive architecture. Their framework measures intelligence in tiers: basic literacy and conversational competence, PhD-level problem solving across a range of fields, and groundbreaking scientific discoveries that lie nearly beyond human reach. They conclude that frontier LLMs have already cleared the first two tiers.
There is a legitimate counterargument that AI is generating as much garbage as insight. Marcus is right to point to the deluge of AI-generated junk now working its way through academic publishing. “Slop” was named Merriam-Webster’s word of the year for 2025, and journals such as PLOS and Frontiers began restricting certain AI-only submissions because the volume of junk had become overwhelming.
A tool powerful enough to find black hole symmetries, it turns out, is also powerful enough to write meaningless research papers on an industrial scale. Both things are true at once, which is exactly the kind of awkward situation that rarely makes it into either the optimistic or the pessimistic headlines.
Kevin Weil, the head of OpenAI for Science, states flatly that they are just getting started. Across eight peer-reviewed surveys of more than 4,900 AI experts, the median academic estimate puts a 50% chance of full AGI arriving between 2040 and 2050. Entrepreneurs tend to push that date much closer, some claiming this decade. The gap between those two groups is itself a measure of how much remains genuinely unknown.
It is hard to ignore that Lupsasca, Ryu, and several of the other people making the most dramatic claims now work for OpenAI, a company with obvious reasons to believe the technology is more capable than its detractors say. Their proximity does not diminish their experiences. But it is worth keeping in mind as you watch this story develop.
Nothing here has a clear beginning or end. It feels like the middle of something: noisy, contentious, unresolved, and moving faster than most institutions can handle.
