The Week AI Hype Finally Cracked

When the $600B reality check hit Wall Street.

Nov 16, 2025

The $4.6M Model That Just Broke Silicon Valley’s Monopoly 🇨🇳💥

Forget everything you thought you knew about AI dominance. While OpenAI burns through billions and tech giants battle with bloated budgets, a Chinese startup just delivered the ultimate reality check: Moonshot AI’s Kimi K2 Thinking model beats GPT-5 and Claude Sonnet 4.5 on Humanity’s Last Exam and several agentic benchmarks - and analysts estimate they trained it for less than the cost of a Bay Area mansion.

The model scored 44.9% on Humanity’s Last Exam, outperforming GPT-5’s 41.7% and Claude Sonnet 4.5’s 32%. Even better? It’s released as an open-weight model under a permissive license and costs 6-10 times less to run than OpenAI’s and Anthropic’s models.

The BS-Free Translation: China isn’t coming for AI dominance - they might already be here. And they’re doing it by being smarter, not richer.

OpenAI Drops GPT-5.1: The “Please Like Me Again” Update 🤖

After GPT-5 landed with mixed reviews and developer pushback over latency and costs, OpenAI rushed out GPT-5.1 with what they’re calling “personality improvements.” The new GPT-5.1 Instant is “warmer, more intelligent, and better at following your instructions” while GPT-5.1 Thinking is “easier to understand and faster on simple tasks”.

The model dynamically adapts how much time it spends thinking based on task complexity, making it 2-3x faster than GPT-5 on simple queries. Plus, they’ve added more granular tone controls (e.g., more playful, more candid, more formal) because apparently AI needed a personality makeover.

What Actually Matters: GPT-5.1 undercuts premium competitors on price. Claude Opus 4.1 lists at $15 per million input tokens and $75 per million output tokens, and GPT-5.1 comes in significantly below that. The real innovation? Adaptive reasoning that actually works.
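To make per-million-token pricing concrete, here is a minimal back-of-the-envelope sketch. It reads the $15/$75 figure quoted for Claude Opus 4.1 as input/output prices per million tokens. The `Pricing` class, the `request_cost` helper, and the token counts and request volume are all hypothetical, picked only to illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical container for list prices in USD per million tokens."""
    input_per_million: float
    output_per_million: float


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the given list prices."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


# $15/$75 per million tokens, as quoted above for Claude Opus 4.1
# (assumed here to mean input/output).
OPUS_4_1 = Pricing(input_per_million=15.0, output_per_million=75.0)

# Assumed workload: 20,000 input tokens and 2,000 output tokens per request.
per_request = request_cost(OPUS_4_1, input_tokens=20_000, output_tokens=2_000)

# Assumed volume: 10,000 requests per month.
per_month = per_request * 10_000

print(f"${per_request:.2f} per request, ${per_month:,.0f} per month")
```

Swap in any other model’s list prices and your own traffic numbers to compare. Output tokens are priced at five times input tokens here, so long responses dominate the bill faster than long prompts do.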

Claude Quietly Dominates While Everyone’s Distracted 🎭

While everyone’s arguing about Chinese AI and GPT personalities, Anthropic shipped Claude Opus 4.1 with 74.5% accuracy on SWE-bench Verified, among the highest scores reported on that coding benchmark.

The model improved its “harmless response rate” to 98.76%, up from 97.27% in Opus 4, with a 25% reduction in cooperation with high-risk misuse scenarios. GitHub, Rakuten, and Windsurf are already reporting massive improvements in production.

The Sleeper Hit: Claude’s not just better at coding - it’s becoming the go-to for enterprises that care more about reliability than hype.

The AI Agent War Gets Real: Salesforce vs Microsoft Cage Match 🥊

Marc Benioff continues his assault on Microsoft Copilot, calling it “Clippy 2.0” while pushing Agentforce as the future. Benioff said Microsoft Copilot suffers from “a lack of context, skills and adaptability” and called it a “science project”.

Microsoft’s response? Ship more agents. Microsoft introduced two AI-powered sales agents for Microsoft 365 Copilot, with the Sales Development Agent working autonomously around the clock. Meanwhile, Salesforce Agentforce acts directly within CRM records with transparent, auditable steps.

Who’s Winning? Both are losing to reality. Gartner data suggests that by 2025, 90% of enterprise gen-AI projects will face slowdowns as costs begin to outweigh the value they deliver, and only 24% of Microsoft Copilot users are planning large-scale rollouts.

The Infrastructure Crisis Nobody Wants to Talk About ⚡

Microsoft CEO Satya Nadella acknowledged that “the biggest issue we are now having is not a compute glut, but it’s the power” with chips sitting in inventory that can’t be plugged in due to power shortages.

Meanwhile, OpenAI signed a deal with AWS for $38 billion, which is part of a broader ~$1.4 trillion in infrastructure commitments across OpenAI and its cloud partners.

Reality Check: We’re building AI faster than we can power it. The next bottleneck isn’t chips or data - it’s literally electricity.

This Week’s “Wait, What?” Moments 🤯

Google’s Living Room Takeover: Google is expanding Gemini for TV to any television connected to a Google TV Streamer, replacing Google Assistant and offering conversational recommendations and smart home control (The Verge)
AI Actress Drama: AI-generated “actress” Tilly Norwood has sparked outrage in Hollywood after news broke that agents were in talks to sign her, with SAG-AFTRA condemning the move as a threat to human creativity (SAG-AFTRA Statement)
UK’s AI Growth Zone Bet: The UK government announced a £42 billion AI Growth Zone in North East England, which it expects to create up to 5,000 R&D jobs (GOV.UK)

What This Actually Means For You 📊

If you’re building with AI:

Stop overpaying for closed models - Kimi K2 shows open-weight models can rival frontier models on certain benchmarks
Focus on actual ROI, not benchmark scores
Consider power consumption in your infrastructure planning

If you’re investing in AI:

The moat isn’t the model anymore - it’s the application layer
Chinese AI isn’t “catching up” - they’re setting the pace on specific benchmarks
Agent fatigue is real - enterprises want results, not more chatbots

If you’re running a business:

Wait for Agentforce/Copilot v3.0 before major commitments
Test open-source alternatives seriously
Budget for 2x the AI costs you’re projecting

The BS-Free Bottom Line

The AI industry just learned that an analyst-estimated $4.6M training budget can beat a $100B valuation on key benchmarks. OpenAI and Anthropic are in an arms race while China is playing a different game entirely. Microsoft and Salesforce are fighting over who can automate sales emails while missing that enterprises don’t even want what they’re selling yet.

The real story? We’re entering the “show me the money” phase of AI. The hype cycle is ending, reality is setting in, and the winners won’t be who you expect.
