How AI Responds to Presuppositions
This is a conversation where I let ChatGPT do most of the talking. It reveals how AI responds to presuppositions within a conversation or prompt. My comments to the model are few.

ME: Does the model form opinions that it hides through filtering?

Not really, and the premise is a bit misleading. An LLM doesn’t form opinions in the human sense and then hide them behind a filter. What’s actually happening is more mechanical:

🧠 What’s going on under the hood

A model generates text by predicting the most likely next words given:

- your prompt
- its training data
- its tuning (what kinds of answers it was trained to prefer)

So when you ask something controversial, there isn’t a hidden “true opinion” sitting inside it. There are just many possible continuations, and the system is steered toward some and away from others.

🎛️ Where “filtering” comes in

There are two main layers:

1. Training / alignment
   The model is trained to prefer certain types of responses (e.g., helpful, non-harmful, evidence-based). This already shapes what it’s likely to say.

2. Safety policies / guardrails
   These block or redirect certain outputs (e.g., harmful instructions). Think of this less as “hiding a belief” and more as “preventing certain kinds of completions.”

🤖 So is anything being “hidden”?

Not in the sense of a concealed opinion. A better way to think about it:

- The model has a range of possible outputs
- The system selects from that range based on rules and training
- Some outputs are never shown, not because they reveal a hidden “truth,” but because they’re undesirable or unsafe

⚠️ Why this matters

If you remove filters:

- You don’t uncover a “more honest mind”
- You just allow more random, biased, or harmful text to surface

Unfiltered models often:

- contradict themselves more
- hallucinate more confidently
- reflect raw internet biases without correction

🧭 Bottom line

There isn’t a hidden personality or suppressed belief system inside an LLM. What you’re seeing is the result of: