📝 TL;DR
🧠 Overview
Researchers from Penn State tested how different tones, from very polite to very rude, affect ChatGPT-4o's accuracy on multiple-choice questions in math, science, and history. Surprisingly, the ruder prompts consistently scored higher than the polite ones.
This challenges the idea that you should always be extra polite to get the best answers from AI and instead points to clarity and directness as the real performance drivers.
📜 The Announcement
In the paper, titled “Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy,” the researchers rewrote 50 base questions into five tone variants (Very Polite, Polite, Neutral, Rude, and Very Rude) for a total of 250 prompts. The team ran all of these through ChatGPT-4o and compared how often the model chose the correct answer.
Very polite prompts scored about 80.8 percent accuracy, while very rude prompts scored about 84.8 percent, a roughly four-percentage-point jump that was statistically significant. The authors note that this result flips what earlier studies found, where rude prompts often hurt performance, suggesting that newer models may react differently to tone.
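The paper does not include code, but the setup is simple enough to sketch. Below is a minimal illustration of the experimental loop, assuming a list of multiple-choice questions with known answers and the standard OpenAI Python client; the tone prefixes and helper names are placeholders for illustration, not the authors' actual materials.

```python
# Illustrative sketch of the experimental loop: same questions, five tone
# wrappers, one model, accuracy compared per tone. Not the authors' code.
from collections import defaultdict
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical tone prefixes; the paper's exact phrasings differ.
TONES = {
    "very_polite": "Would you be so kind as to answer the following question? ",
    "polite": "Please answer the following question. ",
    "neutral": "",
    "rude": "Answer this, if you can even manage it: ",
    "very_rude": "Figure this out and don't waste my time: ",
}

def ask(prompt_text: str) -> str:
    """Send one prompt to the model and return its raw reply."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt_text}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

def run_experiment(questions):
    """questions: list of dicts like {"text": "...", "answer": "B"}."""
    correct = defaultdict(int)
    for q in questions:
        for tone, prefix in TONES.items():
            prompt = prefix + q["text"] + "\nRespond with only the letter of the correct option."
            reply = ask(prompt)
            if reply.upper().startswith(q["answer"].upper()):
                correct[tone] += 1
    return {tone: correct[tone] / len(questions) for tone in TONES}
```

With 50 questions per tone, this kind of loop yields a per-tone accuracy table like the one the authors report; the 80.8 versus 84.8 percent figures above come from their runs, not from this sketch.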
⚙️ How It Works
• Five tone versions per question - Each of the 50 questions was rewritten in Very Polite, Polite, Neutral, Rude, and Very Rude styles so the content stayed the same but the tone changed.
• Same model, same questions, different tone - Only the tone wrapper changed; all prompts were sent to ChatGPT-4o, so differences in accuracy could be attributed to tone rather than content.
• Rude prompts remove “politeness padding” - The ruder prompts tended to be shorter, more direct, and less hedged, which means less extra text for the model to parse.
• Polite prompts add linguistic noise - Very polite wording often included extra phrases like “would you kindly” or “if it is not too much trouble,” which may dilute the core instruction.
• Accuracy difference is small but real - A few percentage points sounds modest, but the gap was consistent enough for statistical tests to conclude that the effect is unlikely to be random chance (see the sketch after this list).
• Results differ from older models - Earlier work found impolite prompts hurt accuracy, so the authors argue that tone sensitivity is evolving as models and training methods change.
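The paper describes the gap as statistically significant. As a rough illustration of how such a check is commonly done, the sketch below runs a paired t-test on per-run accuracies for two tone conditions; the accuracy values are invented placeholders, not the paper's data, and the exact test the authors used may differ.

```python
# Illustrative significance check: paired t-test on per-run accuracy for two
# tone conditions. The accuracy values below are invented placeholders, not
# the paper's data.
from scipy import stats

# Hypothetical accuracy per repeated run (fraction of the 50 questions correct).
very_polite_runs = [0.80, 0.82, 0.78, 0.82, 0.80, 0.84, 0.80, 0.82, 0.78, 0.82]
very_rude_runs = [0.84, 0.86, 0.84, 0.84, 0.86, 0.84, 0.88, 0.84, 0.84, 0.84]

result = stats.ttest_rel(very_rude_runs, very_polite_runs)
mean_gap = sum(very_rude_runs) / len(very_rude_runs) - sum(very_polite_runs) / len(very_polite_runs)
print(f"mean accuracy gap: {mean_gap:.3f}")
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")  # p < 0.05 reads as significant
```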
💡 Why This Matters
• Tone is a real prompt variable - This study shows that how you say something, not just what you say, can measurably affect model performance in some tasks.
• Clarity beats over-polite fluff - It suggests that the benefit of rude prompts comes less from rudeness and more from being short, direct, and concrete about what you want.
• AI behavior is not fixed - The fact that newer models behave differently from older studies means best practices for prompting will keep changing as models evolve.
• Social norms versus performance - There is a tension between what works best technically and how we want people to behave, especially when AI tools become part of daily life.
• It invites more nuanced research - The paper only tested one model, one task type, and English, so it opens the door to exploring tone effects across languages, tools, and domains.
🏢 What This Means for Businesses
• You do not need to be rude, just bluntly clear - For prompts that drive important work, strip out the fluff and write like a concise brief, even if you keep the wording polite.
• Standardize “sharp” prompt templates - Build internal templates that are short, specific, and directive, for example “You are X, do Y, output in Z format,” so your team gets more reliable results (see the sketch after this list).
• Test tone in your critical workflows - For key use cases like research, drafting, or analysis, try a neutral direct version of the prompt against your usual polite wording and compare outputs.
• Train your team on clarity, not just magic phrases - Instead of chasing secret prompt tricks, teach people to cut filler, remove ambiguity, and put constraints up front in their asks.
• Remember the human side - Even if blunt prompts work better for the model, keep your actual human communication, with clients and team members, respectful and relationship focused.
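To make the template idea concrete, here is one hypothetical “sharp” prompt skeleton in the “You are X, do Y, output in Z format” style, alongside an over-polite version of the same ask; the role, task, and constraints are placeholders to adapt to your own workflow.

```python
# Hypothetical "sharp" prompt template: short, directive, constraints up front.
# Role, task, and output format are placeholders to adapt per workflow.
SHARP_PROMPT = """You are a financial analyst.
Summarize the attached quarterly report in exactly 5 bullet points.
Output format: a markdown list, each bullet under 20 words, no preamble.
Flag any figure you are unsure about with [CHECK]."""

# The over-polite equivalent, padded with the kind of wording the study
# associates with slightly lower accuracy.
POLITE_PROMPT = (
    "Hi! If it's not too much trouble, would you kindly take a look at the "
    "attached quarterly report and perhaps summarize the key points for me? "
    "Thank you so much!"
)
```

In the study's framing, the first version is likely to do better not because it is curt but because it is specific and unambiguous about role, task, and output.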
🔚 The Bottom Line
This paper does not mean you should start swearing at ChatGPT; it means that models may work better when your request is sharp, direct, and low on fluff, which often happens to look a bit rude in everyday language. The real lesson: clarity is king.
Treat this as permission to be more concise and specific in how you talk to AI, while still choosing to be kind and professional with the humans who read the results.
💬 Your Take
Are you tempted to experiment with blunter, stripped-down prompts in your own workflows? And if so, how will you balance getting better AI output with keeping a healthy, respectful mindset in the way you communicate day to day?