Understanding and Mitigating AI Hallucinations
This briefing document summarizes the core insights from the provided sources regarding the phenomenon of AI hallucinations, their underlying causes, and proposed solutions.

1. The Nature of AI Hallucinations

AI hallucinations are defined as instances where large language models (LLMs) "confidently make things up," producing "plausible yet incorrect statements instead of admitting uncertainty." This differs fundamentally from human perceptual hallucinations. The problem is not necessarily about making models smarter or training them on more data; rather, it stems from the way AI models are currently trained and evaluated.

Key Facts:
- LLMs often provide "overconfident, plausible falsehoods," which "diminish their utility."
- Examples include generating incorrect birthdates or dissertation titles for known individuals, even when explicitly asked to respond "only if known."
- Hallucinations can be "intrinsic" (contradicting the user's prompt, e.g., miscounting letters in a word) or "extrinsic" (contradicting training data or external reality).

Quote: "Language models are known to produce overconfident, plausible falsehoods, which diminish their utility. This error mode is known as 'hallucination,' though it differs fundamentally from the human perceptual experience." – why-language-models-hallucinate.pdf

2. Root Causes: Training and Evaluation Incentives

The core argument across both sources is that AI models hallucinate because the current training and evaluation paradigms inadvertently reward guessing over honesty.

Main Themes:
- "Terrible Test-Takers": Current evaluation practices are "essentially training AI to be terrible test-takers who guess instead of admitting uncertainty."
- Binary Scoring: Most benchmarks operate like "multiple-choice exams" with "binary 0-1 scheme[s]" that award "1 point for a correct answer and none for blanks or IDKs." This incentivizes guessing, as "leaving an answer blank guarantees failure but guessing gives you a 1-in-365 chance of nailing someone's birthday" (see the expected-score sketch after this list).
- Vicious Cycle: This leads to models learning to "bluff," generating "confident-sounding nonsense rather than admit uncertainty." As models become more capable, they continue to hallucinate because "that's what scores best on tests."
- Statistical Origins (Pretraining): Hallucinations "originate simply as errors in binary classification." Even with error-free training data, the statistical objectives minimized during pretraining can lead to errors. Contributing factors include:
  - Arbitrary Facts: When there is no learnable pattern in the data (e.g., specific birthdays), models are likely to hallucinate, with the hallucination rate being at least the "fraction of training facts that appear once" (see the singleton-count sketch after this list).
  - Poor Models: The model architecture may be insufficient to represent the concept well (e.g., trigram models struggling with longer dependencies), or it may fit the data poorly even when it is expressive enough.
  - Computational Hardness: Problems that are computationally intractable even for superhuman AI will lead to errors if the model attempts to solve them rather than defer.
  - Distribution Shift (OOD Prompts): Prompts that differ significantly from the training data can induce errors.
  - GIGO (Garbage In, Garbage Out): Training corpora often contain factual errors, which base models can replicate.
- Persistence (Post-Training): Despite efforts to reduce hallucinations during post-training (e.g., RLHF), they persist because "guessing when unsure maximizes expected score under a binary 0-1 scheme," and existing primary evaluations "overwhelmingly penalize uncertainty."
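
The incentive described under Binary Scoring can be made concrete with a small calculation. The following is a minimal sketch, not taken from the sources; the function name and the optional wrong-answer penalty are illustrative assumptions, applied to the 1-in-365 birthday example quoted above.

```python
# Minimal sketch (illustrative, not from the sources): expected score of guessing
# versus abstaining under a binary 0-1 grading scheme, using the birthday example.

def expected_score(p_correct: float, reward: float = 1.0, penalty: float = 0.0) -> float:
    """Expected score when answering, given probability p_correct of being right."""
    return p_correct * reward + (1.0 - p_correct) * penalty

p_guess = 1 / 365        # chance a blind guess at someone's birthday is right
abstain_score = 0.0      # "I don't know" earns nothing under binary grading

print(expected_score(p_guess))                 # ~0.0027 > 0.0, so guessing beats abstaining
print(expected_score(p_guess, penalty=-0.25))  # negative: a wrong-answer penalty can flip the incentive
```

Under the binary scheme, any nonzero chance of being right makes guessing the score-maximizing strategy, which is exactly the incentive the sources identify.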
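
For the Arbitrary Facts point, the quantity the sources cite ("fraction of training facts that appear once") can be computed directly. Below is a minimal sketch over a hypothetical toy corpus; the data and variable names are invented for illustration and are not drawn from the paper.

```python
# Minimal sketch (hypothetical data): the fraction of facts appearing exactly once
# in training ("singletons") lower-bounds the hallucination rate on arbitrary,
# pattern-free facts such as birthdays.

from collections import Counter

# Toy corpus of (entity, birthday) facts; real corpora would be far larger.
training_facts = [
    ("Alice", "03-14"), ("Alice", "03-14"),  # fact seen twice
    ("Bob", "07-02"),                        # singleton
    ("Carol", "11-30"),                      # singleton
]

counts = Counter(training_facts)
singleton_fraction = sum(1 for c in counts.values() if c == 1) / len(counts)
print(f"singleton fraction: {singleton_fraction:.2f}")  # 2 of 3 distinct facts appear once -> 0.67
```

On this toy corpus the bound would be roughly 0.67: under the sources' argument, a model that always answers such pattern-free queries should be expected to err on at least that fraction of them.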