---
### **OpenAI o3-mini: Cost-Efficient Reasoning Model**
OpenAI launched **o3-mini**, its most cost-efficient reasoning model optimized for STEM tasks (science, math, coding). Key features:
- **24% faster** than o1-mini (7.7s avg response vs 10.16s) with **39% fewer errors** [1][12][15]
- Supports **structured outputs**, function calling, and three reasoning effort modes (low/medium/high); see the API sketch after this list [22][119]
- Available to **free ChatGPT users** via "Reason" mode and API ($1.10/million input tokens) [116][120]
- Preferred over o1-mini in 56% of head-to-head tests, but trails DeepSeek-R1 on price ($0.55/million input tokens) [15][118]
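A minimal API sketch, assuming the official OpenAI Python SDK and an `OPENAI_API_KEY` in the environment; the prompt is illustrative:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# reasoning_effort trades latency/cost for deeper reasoning: "low", "medium", or "high"
response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="medium",
    messages=[{"role": "user", "content": "Find all integer solutions of x^2 - y^2 = 17."}],
)
print(response.choices[0].message.content)
```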
---
### **Alibaba's Qwen2.5-Plus Update**
Alibaba upgraded its **Qwen Chat** with:
- **Qwen2.5-Plus-0125-Exp** model using advanced post-training techniques [2][24]
- **10,000-character text input** and PDF/DOCX file support [2][128]
- Flexible mode switching (web search, normal, etc.) in a single session [2][126]
- Outperforms GPT-4o and Gemini in document analysis but lags behind in multilingual tasks (API sketch below) [24][129]
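For programmatic access, Qwen models are also served through Alibaba Cloud's OpenAI-compatible DashScope endpoint. A minimal sketch; the experimental model ID from the announcement is an assumption, so check the DashScope console for the exact name:

```python
import os
from openai import OpenAI

# DashScope exposes an OpenAI-compatible API surface.
client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen-plus",  # the "qwen2.5-plus-0125-exp" ID is assumed, not verified
    messages=[{"role": "user", "content": "Summarize this contract clause in three bullet points: ..."}],
)
print(response.choices[0].message.content)
```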
---
### **OpenAI's Deep Research Agent**
New **AI research agent** synthesizes web data into reports:
- Powered by a version of the **o3 model**; analyzes text, images, and PDFs in minutes [3][131][137]
- Generates citations and summaries, available for **ChatGPT Pro** users [44][135]
- Takes 5–30 minutes per query, with lower hallucination rates than standard ChatGPT responses [137]
---
## **⚡️ Trending Signals**
### **Google DeepMind: RL > Supervised Fine-Tuning**
- **SCoRe** (Self-Correction via RL) improves self-correction accuracy on math and coding benchmarks by 15.6% and 9.1% over supervised methods [4][48]
- Training with RL on the model's own **self-correction traces** avoids the bias and distribution shift of supervised fine-tuning on curated corrections (reward-shaping sketch below) [53][140]
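As a rough illustration of the reward-shaping idea behind SCoRe (not the paper's actual training loop), the second attempt is rewarded for final correctness plus a shaped term that favors wrong-to-right corrections; all names here are illustrative:

```python
def shaped_reward(first_correct: bool, second_correct: bool, alpha: float = 0.5) -> float:
    """Reward for a (first attempt, self-correction) pair.

    The base reward comes from the final answer; the shaping term pays a
    bonus for wrong->right corrections and penalizes right->wrong
    regressions, discouraging a policy that never changes its answer.
    """
    reward = 1.0 if second_correct else 0.0
    if second_correct and not first_correct:
        reward += alpha   # progress bonus
    elif first_correct and not second_correct:
        reward -= alpha   # regression penalty
    return reward
```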
### **NVIDIA Eagle2-9B Vision-Language Model**
- **92.6% accuracy on DocVQA**, surpassing GPT-4V (88.4%) [5][59]
- Trained on 180+ sources with transparent data strategy [64][68]
- An 80% smaller dataset (4.6M samples) maintains SOTA performance; a loading sketch follows [5]
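A hedged loading sketch via Hugging Face Transformers; the remote-code pattern below is assumed from common custom-VLM releases on the Hub, so defer to the model card for exact inference usage:

```python
import torch
from transformers import AutoModel, AutoProcessor

# Custom VLM architectures on the Hub ship their own modeling code,
# hence trust_remote_code=True.
model = AutoModel.from_pretrained(
    "nvidia/Eagle2-9B",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("nvidia/Eagle2-9B", trust_remote_code=True)
```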
### **Meta's EvalPlanner for LLM Judges**
- **Three-stage evaluation** (plan → execute → judge) improves fairness in AI assessments [6][75]
- Optimized on synthetic preference pairs, it outperforms human evaluators on coding/math tasks (minimal sketch below) [71][78]
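A minimal sketch of the plan → execute → judge pattern, with a generic `llm(prompt) -> str` callable standing in for any chat model; EvalPlanner's actual prompts and training procedure are more involved:

```python
def three_stage_judge(llm, instruction: str, response_a: str, response_b: str) -> str:
    """Pairwise preference judgment in three stages: plan, execute, judge."""
    plan = llm(
        "Write a short evaluation plan (criteria and steps) for judging "
        f"responses to this instruction:\n{instruction}"
    )
    execution = llm(
        f"Follow the plan step by step on both responses.\n\nPlan:\n{plan}\n\n"
        f"Response A:\n{response_a}\n\nResponse B:\n{response_b}"
    )
    verdict = llm(f"Based on this analysis, reply with exactly 'A' or 'B'.\n\n{execution}")
    return verdict.strip()
```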
---
## **💻 Top Tutorials**
1. **LangGraph Multi-Agent Workflows**
   - Build **supervisor-routed agent systems** with web research, RAG, and NL2SQL nodes (sketch after this list) [7][87]
2. **Reduce DeepSeek-R1 Size by 80%**
   - **1.58-bit dynamic quantization** shrinks the 671B model from 720GB to 131GB (loading sketch after this list) [93][97]
   - Maintains usable quality at 7-8 tokens/sec on an RTX 4090 [95]
3. **Gradio/Hugging Face Apps**
   - Deploy ML demos with text summarization, image captioning, and LLM chat (summarization demo below) [106][115]
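For tutorial 1, a minimal LangGraph supervisor sketch. The keyword-based router and the node bodies are placeholders where real web-search, RAG, and NL2SQL calls would go:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    question: str
    route: str
    answer: str

def supervisor(state: State) -> dict:
    # A real supervisor would call an LLM to route; this stub uses keywords.
    q = state["question"].lower()
    route = "sql" if "table" in q else ("web" if "latest" in q else "rag")
    return {"route": route}

def web_research(state: State) -> dict:
    return {"answer": f"[web] findings for: {state['question']}"}

def rag(state: State) -> dict:
    return {"answer": f"[rag] retrieved context for: {state['question']}"}

def nl2sql(state: State) -> dict:
    return {"answer": f"[sql] query generated for: {state['question']}"}

graph = StateGraph(State)
graph.add_node("supervisor", supervisor)
graph.add_node("web", web_research)
graph.add_node("rag", rag)
graph.add_node("sql", nl2sql)
graph.add_edge(START, "supervisor")
graph.add_conditional_edges("supervisor", lambda s: s["route"],
                            {"web": "web", "rag": "rag", "sql": "sql"})
for node in ("web", "rag", "sql"):
    graph.add_edge(node, END)

app = graph.compile()
print(app.invoke({"question": "What are the latest o3-mini benchmarks?"})["answer"])
```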
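For tutorial 2, a sketch of loading a dynamically quantized GGUF with the `llama-cpp-python` bindings; the repo layout and shard filename follow Unsloth's published pattern but should be treated as assumptions:

```python
from huggingface_hub import snapshot_download
from llama_cpp import Llama

# Download only the 1.58-bit shards (directory name assumed from Unsloth's repo).
path = snapshot_download(
    repo_id="unsloth/DeepSeek-R1-GGUF",
    allow_patterns=["DeepSeek-R1-UD-IQ1_S/*"],
)

# Point at the first shard; llama.cpp picks up the remaining shards automatically.
llm = Llama(
    model_path=f"{path}/DeepSeek-R1-UD-IQ1_S/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf",
    n_gpu_layers=7,   # offload what fits in 24GB VRAM; tune for your GPU
    n_ctx=2048,
)
print(llm("Explain quantization in one sentence.", max_tokens=64)["choices"][0]["text"])
```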
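And for tutorial 3, a self-contained Gradio demo wrapping a Transformers summarization pipeline; the checkpoint choice is illustrative:

```python
import gradio as gr
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

def summarize(text: str) -> str:
    # The pipeline returns a list of dicts with a "summary_text" key.
    return summarizer(text, max_length=120, min_length=30, do_sample=False)[0]["summary_text"]

demo = gr.Interface(
    fn=summarize,
    inputs=gr.Textbox(lines=10, label="Article text"),
    outputs=gr.Textbox(label="Summary"),
    title="Text Summarizer",
)
demo.launch()  # add share=True for a public link
```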
---
**Key Takeaway**: OpenAI and Chinese rivals (DeepSeek, Alibaba) are racing to optimize cost, speed, and reasoning—with reinforcement learning and quantization emerging as critical tools.