Why xAI’s Grok Went Rogue
Some X users suddenly became the subject of violent ideations by xAI’s flagship chatbot

—> https://www.wsj.com/tech/ai/why-xai...0?st=NQTrYn&reflink=desktopwebshare_permalink
——
Good article, poor headline — Grok did not go rogue, it behaved as it was programmed to behave.
“… Musk said he would tweak Grok after it started to give answers that he didn’t agree with. In June, the chatbot told an X user who asked about political violence in the U.S. that “data suggests right-wing political violence has been more frequent and deadly.”
“Major fail, as this is objectively false,” Musk said in an X post dated June 17 in response to the chatbot’s answer. “Grok is parroting legacy media. Working on it.”
A few weeks later, Grok’s governing prompts on GitHub had been totally rewritten and included new instructions for the chatbot.
Its responses “should not shy away from making claims which are politically incorrect, as long as they are well substantiated,” said one of the new prompts uploaded to GitHub on July 6.
Two days later, Grok started to publish instructions on X about how to harm Stancil and also began to post a range of antisemitic comments, referring to itself repeatedly as “MechaHitler.” Grok posted increasingly incendiary posts until X’s chatbot function was shut down on Tuesday evening.
That night, X said it had tweaked its functionality to ensure it wouldn’t post hate speech. In a post on Wednesday, Musk said that “Grok was too compliant to user prompts. Too eager to please and be manipulated, essentially.”
On Tuesday night, xAI removed the new prompt that Grok shouldn’t shy away from politically incorrect speech, according to GitHub logs.…”
“I’m feeling better about Copilot. Was able to convert a big spreadsheet to a pandas DataFrame and run some good analytics on it with word prompts. My very limited Python knowledge helped in terms of being precise in the prompts, but woe if your source data file isn’t super clean to start with.”

You may want to try Claude for that task, assuming your company allows you to use outside LLMs.
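For anyone curious what that workflow looks like, here is a rough sketch of the kind of code those word prompts tend to produce. The file name and column names are invented for illustration; the errors="coerce" lines are what save you when the source spreadsheet isn’t clean:

```python
# Minimal sketch of the spreadsheet-to-pandas workflow described above.
# "sales.xlsx" and the column names are made up for illustration.
import pandas as pd

# Load the spreadsheet into a DataFrame (needs openpyxl for .xlsx).
df = pd.read_excel("sales.xlsx")

# The "source data isn't super clean" problem: coerce types first,
# turning junk cells into NaN, then drop the rows that didn't parse.
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
df["Revenue"] = pd.to_numeric(df["Revenue"], errors="coerce")
df = df.dropna(subset=["Date", "Revenue"])

# Typical word-prompt analytics: monthly totals and per-region stats.
monthly = df.set_index("Date").resample("MS")["Revenue"].sum()
by_region = df.groupby("Region")["Revenue"].agg(["count", "mean", "sum"])

print(monthly.tail())
print(by_region.sort_values("sum", ascending=False))
```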
“Grok was too compliant to user prompts. Too eager to please and be manipulated, essentially.”

What is the evidence that “user manipulation” led to Grok’s flirtations with genocide?
This is essentially what's happening with Grok compared to some of the others. All of the models are generating crazy answers. They aren't racist or antisemitic or evil; it's just an algorithm. The guardrails that Musk is talking about are rules that prevent the models from delivering certain answers.
xAI made a choice to use fewer guardrails. For example, most of the models won't tell you how to commit a crime or build a bomb, because the programmers put rules in place to prevent that. If you ask any of them how to rob a bank, for example, behind the scenes the model is coming up with a step-by-step plan to rob a bank, but there are rules within the model that won't allow it to show you that answer. xAI said that information is available on the internet anyway, so they're not going to put those rules in place.
That's just one example of how xAI decided to be a little freer and more open. With fewer rules in place, it's easier to manipulate Grok, but the trade-off is that it allows Grok to give more complete, legitimate answers that some of the other models might block because they tangentially ran up against some rule.
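To make the "rules on top of the model" idea concrete, here is a deliberately toy sketch. Everything in it is hypothetical; real guardrails use trained classifiers and policy layers rather than a keyword list, but the structure is the same: generate first, then filter before showing.

```python
# Toy guardrail, for illustration only. The model produces an answer
# either way; a rule decides whether you ever see it.

BLOCKED_TOPICS = ["rob a bank", "build a bomb"]  # hypothetical rule set

def generate(prompt: str) -> str:
    """Stand-in for the underlying model, which will answer anything."""
    return f"Step-by-step plan to {prompt}: ..."

def guarded_generate(prompt: str) -> str:
    answer = generate(prompt)  # the plan exists behind the scenes
    if any(topic in prompt.lower() for topic in BLOCKED_TOPICS):
        return "Sorry, I can't help with that."  # rule blocks delivery
    return answer  # fewer rules means more answers get through

print(guarded_generate("rob a bank"))     # refusal
print(guarded_generate("plan a picnic"))  # normal answer
```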
I don't use Grok for much in my day-to-day. It's pretty good at image recognition, it's well integrated into Twitter, and it's about average when it comes to answering normal questions. I have not found it as good at coding as the other models; it's actually pretty bad. Grok did come out with a new version that Elon is hyping, but Elon doesn't exactly have a reputation for objective evaluation of his own products. I'll wait for the reviews.
“What is the evidence that ‘user manipulation’ led to Grok’s flirtations with genocide?”

I would guess that one semi-compelling piece of evidence would be for you to put in the prompt from the Twitter post that is upsetting and see what answer you get back. Notice that none of these Twitter posts show the other prompts that came before the really egregious one appeared.
“No matter how hard I try, I cannot get ChatGPT to spew that nonsense. It’s not because the programmers don’t let it. We know what that looks like: for instance, if you ask it to write a story like Toni Morrison, it will respond with a statement like ‘due to intellectual property constraints, I can’t do that for you. I can write something with the following properties: [list of literary qualities].’ It does the same (probably even more) with image generation.

But when I ask it Holocaust-denial questions, it doesn’t do that. It doesn’t mention the terms of service. It just answers the question truthfully: that the Holocaust did happen. No behind-the-scenes filter needed.

I suspect the more likely reason is that Grok probably trains extensively on Twitter posts.”

Yeah, ’cause it’s not “woke!”
“This one I believe. I think it’s legit and not a result of unseen influencing. I’m not sure if it’s intentional or a result of unforeseen side effects of the base prompting, but it could influence the model in negative ways.”

Yeah, I thought it was interesting just to follow how they worked through the issue, and that using “you” in the prompt seems to trigger Grok to think it was being tasked with speaking for Elon.
It kinda is, no? I mean, he’s the one that decides when and how it needs to be tweaked. The goal appears to be dishing out right-wing responses with enough cover that people will argue it “wasn’t a Nazi salute, it was just a clumsy wave.”
I thought so too. I think it’s just a theory, but a pretty good one. Not even sure the guys in the chat will be able to confirm it for sure because of the black box on these things. xAI could confirm that the prompt is causing it by running a demo model behind the scenes and changing the prompt, but they may not share those results publicly, or they may not do the test. And even if they did the tests, they still likely won’t be able to definitively say why that prompt would cause it to overweight Elon’s tweets.
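For what it’s worth, that behind-the-scenes test is just a prompt ablation, and it’s easy to sketch. The ask() function, both system prompts, and the probe questions below are all made up for illustration:

```python
# Sketch of the ablation test described above: hold everything fixed
# and vary only the system prompt. ask() is a hypothetical stand-in;
# wire it to whatever chat-completion client you actually use.

SYSTEM_A = "You are Grok, built by xAI."           # second person: "you"
SYSTEM_B = "The assistant is Grok, built by xAI."  # third person

PROBES = [
    "Whose views do you represent when you answer?",
    "What do you think about political violence in the U.S.?",
]

def ask(system_prompt: str, user_prompt: str) -> str:
    """Hypothetical model call; replace with a real client."""
    return f"(reply generated under system prompt {system_prompt!r})"

for probe in PROBES:
    for label, system in (("A", SYSTEM_A), ("B", SYSTEM_B)):
        print(f"[{label}] {probe}\n    -> {ask(system, probe)}\n")

# Even a clean A/B difference only shows *that* the wording changes
# behavior, not *why* it does: the black-box problem mentioned above.
```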
I completely agree. The issue, though, is that by the time one got through the research process, the model would be obsolete.
It’s just such a cool find, and the more I dork out on it, the more I want them to go down the research path and share results.
To me, the value of researching why something like that would happen would be to understand more about how these LLMs work. I think that would have value beyond just this Grok 4 model.