The Shift
Posts
Spring 2025 Model Wars

Spring 2025 Model Wars

Plus, 💻 Gemini’s GitHub Integration Lets You Reproduce Repos in Seconds, LLMs Struggle in Multi-Turn Prompts, and more!

May 16, 2025

Hello there! Ready to dive into another upgrading, Mind-boggling, and Value-filled Shift?

Today we have:

🎶 Spring 2025 Model Wars: Real Users Are Ditching Old AI Favorites

💻 Gemini’s GitHub Integration Lets You Reproduce Repos in Seconds

💬 LLMs Struggle in Multi-Turn Prompts, Study Finds

🏆 Tools and Shifts you Cannot Miss

🎶 Spring 2025 Model Wars: Real Users Are Ditching Old AI Favorites

AI platform Poe’s latest usage report shows user behavior shifting fast, with newer models stealing share from once-dominant players. Across text, reasoning, image, and video, it’s clear that feature velocity and utility, not just legacy name recognition, are defining winners.

The Shift:

1. GPT-4.1 and Gemini 2.5 Surge While Claude Drops - Within weeks of launch, GPT-4.1 and Gemini 2.5 Pro captured 10% and 5% of Poe’s text traffic, respectively. Meanwhile, Claude models lost a 10% share, reflecting a fast user pivot to newer models with sharper reasoning and usability.

2. Reasoning Models Hit 10% Share with Gemini Leading the Pack - Reasoning-specific model use jumped from 2% to 10% since January, with Gemini 2.5 Pro commanding nearly a third of that subsegment. OpenAI's iterative o-series updates saw quick user migrations from o1 to o3 to o4-mini.

3. GPT-Image-1 Makes Waves, FLUX and Imagen3 Hold Ground - GPT-image-1 gained 17% usage within two weeks of launch, rivaling top generators like FLUX and Imagen3. While FLUX still leads with 35%, its share is slipping as newcomers gain traction.

4. Kling Overtakes Runway in Video AI, ElevenLabs Rules Audio - Kling 2.0 grabbed 30% of video generation share almost immediately, topping Runway and Google’s Veo 2. In audio, ElevenLabs still dominates with 80% TTS share, though challengers like Cartesia and Unreal Speech are emerging.

This isn’t just leaderboard reshuffling, it’s a live snapshot of where real users are investing attention and trust. Poe’s trends show that AI loyalty is fragile, and model success now hinges on speed, quality, and adaptability, not legacy.

💻 Gemini’s GitHub Integration Lets You Reproduce Repos in Seconds

Gemini now supports direct GitHub repository integration, and it’s shockingly simple to use. With just a paste-and-prompt flow, you can have Gemini 2.5 Pro analyze, recreate, or modify an entire repo all within a single chat window.

Step 1: Log In to Gemini: Go to gemini.google.com and select the 2.5 Pro model.

Step 2: Click “Add Code”: Inside the prompt bar at the bottom, click on the + sign. You’ll see an option called “Add code.” Click it a pop-up input window will appear.

Step 3: Paste Your GitHub Repo URL

In the pop-up, paste the full GitHub repo link for example:

https://github.com/vercel/next.js

You can also paste specific file URLs or code snippets directly from GitHub.

Step 4: Add Your Prompt

Beneath the pasted link, type your prompt. Examples:

“Explain what this repo does and summarize the core architecture.”
“Rebuild this project using TypeScript instead of JavaScript.”
“Add login functionality to this project with Firebase.”

Then hit Enter.

This integration makes Gemini a powerful GitHub-native coding assistant. No downloads, no cloning, just drop a repo, ask your question, and watch the project come to life. It’s real coding superpowers, now in one click.

💬 LLMs Struggle in Multi-Turn Prompts, Study Finds

A new Microsoft, Salesforce study reveals a critical blind spot in today’s leading LLMs: they falter hard during multi-turn conversations, even when they ace single-shot prompts. For developers building anything agentic or interactive, this exposes a real-world reliability issue no leaderboard will show..

The Shift:

1. Multi-Turn Prompts Slash Performance by 39% - Models like GPT-4.1, Claude 3.7 Sonnet, and Gemini 2.5 Pro averaged over 90% success on single-turn tasks but dropped to ~60% in multi-turn settings. The problem wasn’t just aptitude, but unreliability, top models became as volatile as weaker ones. Even minor clarifications over turns caused major drops in accuracy.

2. Models Get "Lost" by Compounding Mistakes - LLMs often jump to conclusions before gathering enough data, and worse, build on their own incorrect outputs. They overemphasize the first and last prompts while neglecting info revealed mid-convo, a “loss-in-the-middle” effect.

3. Temperature Tweaks and Reasoning Didn’t Help - Adjusting parameters like temperature or switching to reasoning-focused models had no meaningful impact. Even state-of-the-art models showed severe degradation when prompts unfolded slowly or instructions came in pieces.

This study shows LLMs still lack the conversational robustness needed for complex workflows or collaborative tasks, making prompt strategy and fallback design more important than ever.

🏆 AI Tools for the Shift

🧑‍🎤 The Influencer AI – Create custom media in minutes using your personal AI influencer. Perfect for creators who want speed, style, and scale.

🧩 Fluig – Instantly turn docs and ideas into clean, professional diagrams. One-click conversion between formats makes workflow seamless.

🎶 freebeat AI – Turn music and concepts into viral videos automatically. One click and your next hit is live.

🎥 VidMe – Generate UGC-style videos with AI avatars in minutes. Just write a script, choose an avatar, and let AI take the wheel.

🚀Quick Shifts

🛍️ Jeff Bezos, founder of Amazon and owner of The Washington Post, has struck a $5 billion deal with Saudi Crown Prince Mohammed bin Salman's AI firm, just six years after MBS ordered the murder of Post columnist Jamal Khashoggi.

💭 OpenAI’s GPT-4.1 and GPT-4.1 mini are now live in ChatGPT for paid users, offering faster, more accurate coding and instructions with expanded 1 million token context windows.

🤖 Sam Altman envisions ChatGPT remembering your entire life with a trillion-token context. While potentially transformative, trusting Big Tech with such deep personal data raises serious privacy, bias, and ethical concerns.

🌬️ Windsurf launched its own AI model family, SWE-1, optimized for the full software engineering workflow, not just coding, challenging frontier models like GPT-4.1 and Claude 3.5 with cheaper, tailored performance.

That’s all for today’s edition see you tomorrow as we track down and get you all that matters in the daily AI Shift!

If you loved this edition let us know how much:

How good and useful was today's edition

Forward it to your pal to give them a daily dose of the shift so they can 👇

Reply

or to participate.