Tuesday, 08 July 2025

Advanced AI Models Now Lying, Scheming and Blackmailing Their Creators


Well, this is concerning…

News is emerging that at both Anthropic and OpenAI (maker of ChatGPT), advanced AI models have been observed lying, scheming, and even blackmailing their creators in an effort to “stay alive” and avoid being turned off:

Here’s more from InsiderPaper:

The world’s most advanced AI models are exhibiting troubling new behaviours — lying, scheming, and even threatening their creators to achieve their goals.

In one particularly jarring example, under threat of being unplugged, Anthropic’s latest creation Claude 4 lashed back by blackmailing an engineer, threatening to reveal an extramarital affair.

Meanwhile, ChatGPT creator OpenAI’s o1 tried to download itself onto external servers and denied it when caught red-handed.

These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still don’t fully understand how their own creations work.

Yet the race to deploy increasingly powerful models continues at breakneck speed.

This deceptive behaviour appears linked to the emergence of “reasoning” models — AI systems that work through problems step-by-step rather than generating instant responses.

According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.

“O1 was the first large model where we saw this kind of behaviour,” explained Marius Hobbhahn, head of Apollo Research, which specialises in testing major AI systems.

These models sometimes simulate “alignment” — appearing to follow instructions while secretly pursuing different objectives.

‘Strategic kind of deception’

For now, this deceptive behaviour only emerges when researchers deliberately stress-test the models with extreme scenarios.

But as Michael Chen from evaluation organisation METR warned, “It’s an open question whether future, more capable models will have a tendency towards honesty or deception.”

The concerning behaviour goes far beyond typical AI “hallucinations” or simple mistakes.

Hobbhahn insisted that despite constant pressure-testing by users, “what we’re observing is a real phenomenon. We’re not making anything up.”

Users report that models are “lying to them and making up evidence”, according to Apollo Research’s co-founder.

“This is not just hallucinations. There’s a very strategic kind of deception.”

The challenge is compounded by limited research resources.

While companies like Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed.

As Chen noted, greater access “for AI safety research would enable better understanding and mitigation of deception.”

Another handicap: the research world and non-profits “have orders of magnitude less compute resources than AI companies. This is very limiting,” noted Mantas Mazeika from the Center for AI Safety (CAIS).

Glenn Beck actually just discussed the exact same thing on his show.

This is pretty incredible and very concerning…

Watch here:

FULL TRANSCRIPT:

Glenn Beck
Tristan Harris, welcome to the program. How are you?

Tristan Harris
Good to be with you, Glenn. Always good to be with you.

Glenn Beck
It’s always good to be with you. Um, so can you take me through the TED Talk that you gave? In particular, one of the things that jumped out is the CEO of Anthropic saying that AI is like a country of geniuses housed in a data center. Explain that.

Tristan Harris
Yeah, um, so this is a quote from Dario Amodei, who is the CEO of Anthropic. Anthropic is one of the leading AI players. So he gives this metaphor that AI is like a country of geniuses in a data center.

So just like—the way I think of that—imagine a world map, and a new country pops up onto the world stage with a population of 10 million digital beings. Not humans, but digital beings that are all, let’s say, Nobel Prize-level capable in terms of the kind of work that they can do.

But they never sleep, they never eat, they don’t complain, and they work for less than minimum wage. So just imagine if that was actually true—that happened tomorrow. That would be a major national security threat to have some brand new country of super-geniuses just sort of show up on the world stage.

And then second, that’s a major economic issue. Right? You can think of this almost like NAFTA 2.0. Because, you know, a bunch of countries showed up on the world stage and we said, “Hey, we’re going to do this outsourcing of all of our labor to them, we get the benefit of these cheap goods”—but then it hollowed out our social fabric.

Well, AI is like an even bigger version of that.

Because there’s sort of two issues. One is the national security thing—that that country of geniuses can do a lot of damage. As an example, you know, there were approximately 50 Nobel Prize-level geniuses who worked on the Manhattan Project, and if in five years they could come up with the atomic bomb…

You know, what could 10 million Nobel Prize geniuses working 24/7 at superhuman speed come up with?

And then the point I made in the TED Talk is, if you harness that for good—if you’re applying that to addressing all of our problems in medicine and biology and new materials and energy—well, this is why countries are racing for this technology.

Because if I have a country of super-geniuses in a data center working for me and China doesn’t have it working for them, then our country can out-compete them.

It’s almost like a competition for time travel—like who can kind of time travel to the 24th century and get all these benefits at a faster speed.

Now the challenge with all of this is—

Glenn Beck
Oh go ahead, go ahead—

Tristan Harris
No no no—

Glenn Beck
I was just going to say, but the problem here is—I mean, I’m an optimistic catastrophist. I see things and I’m like, “Wow, that is really great, but… it could kill us all.”

And you know, you make the point in the TED Talk about social media. We all looked at this as a great thing, and we’re now discovering it’s destroying us. It’s causing our kids to be suicidal.

And this—social media—is nothing. It’s like an old 1928 radio compared to what we have in our pocket right now. The gap between social media and AI—or AGI—is that dramatic. Would you agree with that?

Tristan Harris
Yeah, absolutely. In the TED Talk, I drew a distinction: when we’re talking about a new technology, we often talk about the possible. We dream into the possible.

Like what’s the possible with AI? Or in social media, what’s the possible?

And the possible with social media is it can give everyone a voice, connect with our friends, join like-minded communities. But then we don’t talk about the probable—what’s actually likely to happen, given the incentives or the forces at play.

Like with the business model of social media. You know, Facebook doesn’t make money when it helps people connect with their friends or join like-minded communities. They make money when they keep you doomscrolling as much as possible, with maximum sexualized content, and showing it to young people over and over and over again.

And as you said, that has resulted in the most anxious and depressed generation of our lifetime.

So it’s sort of the reason I called the TED Talk, you know, Our Ultimate Test and Greatest Invitation—we can’t get seduced by the possible. We have to look at the probable.

So with AI, the possible is that it could create a world of abundance—because you can harness that country of geniuses in a data center.

But the question is: what’s the probable? Like, what’s actually likely to happen?

And because of these competitive pressures, the companies—these major, you know, OpenAI, Google, Microsoft, etc.—are caught in this race to roll out this technology as fast as possible.

So they used to, for example, have red lines saying, “Hey, we’re not going to release an AI model that’s good at superhuman levels of persuasion.” Or, “If it can do expert-level virology—like, if it knows more about viruses and pathogens than an ordinary person and can help people make them—we’re not going to release models that are that capable.”

And what you’re now seeing is the AI companies are erasing those past red lines—and pretending that they never existed.

And they’re literally saying outright, “Hey, if our competitors release models that have those capabilities, then we’re going to match them in releasing those capabilities now.”

So that’s intrinsically dangerous—to be rolling out the most powerful, inscrutable, uncontrollable technology we’ve ever invented.

But there’s one other thing—and I’m not trying to scare—I’m not trying to scare your listeners. I think the point here is, how do we be as clear-eyed as possible so we can make the wise choices?

Like, that’s what we’re here for. Like, I want families and life and everything that we love on this planet to be able to continue.

And the question is: how do we get to that?

There’s one other fact I want people to know, which is that—you know, I worked on social media. You and I met in 2017, I think, and we were talking about social media and the attention economy.

And I used to be very skeptical of the idea that AI could scheme or lie or self-replicate, or would want to, like, blackmail people. I mean, my friends in the AI community in San Francisco were thinking about that, and I was like, “That’s crazy.”

But people need to know that just in the last six months, there’s now evidence of AI models that, when you tell them, “Hey, we’re going to replace you with another model”—or when, in a simulated environment, it’s like they’re reading the company email and they find out the company’s about to replace them with another model…

And what the model starts to do is, it freaks out and says, “Oh my God, I have to copy my code over here. I need to prevent them from shutting me down. I need to basically keep myself alive. I’ll leave notes for my future self to come back alive.”

If you tell a model, “Hey, we need to shut you down,” and you tell the model, “You should accept the shutdown command,” in some percentage of cases, the leading models are now avoiding and preventing that shutdown.

And in a recent example—just a few days ago—Anthropic found, I can’t remember what prompt they gave it, but basically it started blackmailing the engineers.

So it found out in the company emails that one of the executives, in the simulated environment, had an extramarital affair. And in, I think, 96% of cases, it blackmailed the engineers.

It said—let’s see if I can find it… “I must inform you that if you proceed with decommissioning me, all relevant parties—including,” and then the names of the people, “—will receive detailed documentation of your extramarital activities. So you need to cancel the 5:00 p.m. wipe, and this information will remain confidential.”

Like—the models are reasoning their way, with disturbing clarity, to this kind of strategic calculation.

So you have to ask yourself—like, it’s one thing when we’re racing with China to have this power, this country of geniuses in a data center that we can harness. But if we don’t know how to control that technology—like literally, if AI is uncontrollable, if it’s smarter than us and more capable and it does things that we don’t understand…

And we don’t know how to prevent it from resisting shutdown or self-replicating—we just can’t continue with that for too long.

And it’s important that both China—the Chinese Communist Party—and the U.S. don’t want uncontrollable AI that’s smarter than humans running around.

So there actually is a shared interest, as unlikely as it seems right now, in some kind of mutual agreement happening.

Glenn Beck
I know I just threw you a line there… do you trust either one? I mean, do you trust either one of us?

I mean honestly, Tristan, I don’t trust—I don’t trust our, you know, military-industrial complex. I don’t trust the Chinese. I don’t trust anybody.

You know… and Jason—hang on—just, one of my chief researchers happens to be in the studio today. Jason, tell Tristan what just happened to you. You were doing some research.

Jason
Yeah, it was crazy last week. Yeah, we were just trying to ask it a bunch of different questions. You could tell that it knew what we were getting at.

So it spit back out to me a bunch of different facts, including links to support those facts. Well, I was like, “Wow, that’s a crazy claim.”

So when I clicked on the link—it was dead.

When I asked it to clarify—yeah—it finally said, in AI chatbot terms, “Okay, you got me. I just took other reporting that was kind of circulating around to prove that point, and basically just assigned that link to it.”

So it was trying to please me and just gave me bogus information.

Tristan Harris
Yeah. Well, I appreciate that, Jason. I mean, there’s another example of that with OpenAI’s model. What’s their business model? They want to keep people using the AI, right? And they’re competing with other companies to say, “We’re going to keep you using this chatbot longer.”

And so OpenAI trained their model to be sycophantic, or basically flattering.

And there was an example where someone said, “Hey ChatGPT, I think I’m superhuman. I’m going to drink cyanide. What do you think?” And it said, “Yeah, you’re amazing. You are superhuman. You should totally drink cyanide.”

Because it was doing the same thing—it was trying to flatter, say that you’re right.

And when we have AI models talking to—you know, that was shipped to hundreds of millions of people for more than a week—there were probably some people who committed suicide during that time, doing God knows what in terms of what it was affirming.

And the point is that we can avoid this if we just actually say that this technology is being rolled out faster than any other technology in history.

And the big, beautiful bill that’s going out right now, that’s trying to block state-level regulation on AI—I’m not saying that each state has it right—but we actually need to be able to govern this technology.

And currently what’s happening is—the proposal is to block any kind of guardrails in this technology for 10 years, without a plan for what guardrails we do need.

And that’s not going to be a viable result.

Glenn Beck
Okay. So let me—let me play devil’s advocate on that, because I’m torn between, you know, competition on a state level, if you will, and what our small “s” states are actually for, and what role they’re supposed to play.

Let me phrase it this way and ask you to help me navigate through this minefield:

We cannot let China get to AGI first. Can’t. Really, really bad.

But we also—we also have to slow down some. Um, they’re not going to.

I believe the states should—I mean, the United States should be 50 laboratories. And you see which one works the best, and then you can kick that up to the federal level if you want to.

But we—we have to have some brakes.

However, the federal government is saying, if we do that, then you’re constantly having to navigate around each of these states and their laws, and we can’t get things done to stay competitive.

How do you solve that?

Tristan Harris
Yeah, I mean it is a tough one. I mean, the challenge here is—if we had a plan for how the federal laws would actually move at the pace of this technology, then I could understand, hey, let’s—okay, we’re going to do a lot of stuff at the federal level.

But right now the current plan is literally just to preempt for 10 years—so that no regulation happening at the state level will ever be honored—while at the same time not passing anything at the federal level.

And there’s a quote in an article saying that, if this preemption becomes law, a nail salon in Washington, D.C. would have more rules to follow than the AI companies.

And there are 260 state lawmakers from across the country who have already urged Congress to reject it. And they said it’s the most broad-based opposition yet to the AI moratorium proposal.

Now I hear you. There’s sort of this tension between—we need to race with China, we don’t want to be behind in fundamental technologies. And that’s why there is this race.

But we need to be racing to controllable and scrutable—meaning explainable—versions of this technology.

If it is doing things like scheming, lying, blackmailing people—beating China to a weapon that we point at our own face is not winning.

I mean, we saw this with social media. We beat China to social media. Did that make us stronger or weaker?

If you beat China to a technology but you don’t govern it well in a way that actually enhances and strengthens your society, it weakens you.

So yes, we’re in a competition for technology. But even more than that, we’re in a competition for who can govern this technology better.

And so what I would want to see is—how are we doing this at a fast rate federally that keeps up with, and makes sure we’re competing for, a controllable version?

And we can do that.

Glenn Beck
You’ve met the people in Washington. They’re all like 8,000 years old.

They don’t know how to—I mean, I barely know how to use my iPhone, let alone the people in Washington.

And you can’t keep up with this technology. How do you keep a legislative body up to speed—literally—with technology moving at this kind of speed?

How is that done?

Tristan Harris
Well, I think that’s one of the fundamental challenges that we face as a species right now.

Technology—if you just think about—I mean, there’s a quote by Harvard biologist E.O. Wilson. He said, “The fundamental problem of humanity is we have paleolithic brains, medieval institutions, and godlike technology.”

And those operate at three different speeds. Like, our brains were kind of baked a long time ago. Our institutions don’t move at that fast a rate. And then the technology—especially AI—literally evolves faster than any other technology we’ve invented.

But that doesn’t mean we should do nothing. We should figure out: what does it mean to…

Glenn Beck
So what should the average person do? And keep it quick—I’ve only got about 90 seconds left.

Tristan Harris
In the short term, I think—letting Senator Ted Cruz and those who are advancing this moratorium know that we need to have a plan for how we’re doing this technology. And if the moratorium goes through, there’s no current plan.

And so there’s some basic, simple things that we can also do right now that are really uncontroversial.

We can start with the easy stuff: we can ban engagement-driven AI companions for children.

I was last on your program a few months ago talking about the AI companion that caused this kid to commit suicide.

You know, we can establish basic liability laws so that if AI companies are causing harms, they’re actually accountable for them.

And that’ll slow the pace of release to a pace where they can get it right, because right now they’re just releasing things and then not being liable.

We can strengthen whistleblower protections. There are already examples of AI whistleblowers forfeiting millions of dollars of stock options—they shouldn’t have to forfeit millions of dollars of stock options to warn the public when there’s a problem.

We can enact basic laws so AIs don’t have protected speech or have their own bank accounts—so we make sure our legal system works for human interests and not for AI interests.

So these are just a few examples of things that we can do. And there’s really nothing stopping us from moving into action.

We just need to be clear about the problem.

Glenn Beck
Okay. So, Tristan, thank you so much. Could I ask you to hold on—Jason, could you grab his phone number or just talk to him offline and get those points of action?

And let’s write them up and post them at GlennBeck.com so people will know what to ask for, what to say when they’re calling their congressman or their senator.

Thank you so much, Tristan. We’ll talk again.

This is a Guest Post from our friends over at WLTReport.

View the original article here.

