The Shift
Posts
Google’s AI can now browse on its own

Google’s AI can now browse on its own

Plus, 🎬 Spec Ads with VFX-Level Power in Seconds, Anthropic’s Open-Source AI Auditing Framework, and more!

October 08, 2025

Hello Readers👀

Curious about the biggest AI moves today? You’re in the right place. In today’s edition, we have:

💻 Gemini 2.5 Computer Use Model: Google’s Biggest Leap in Agentic Automation

🎬 Spec Ads with VFX-Level Power, in Seconds

🧩 Petri: Anthropic’s Open-Source AI Auditing Framework

🔨Tools and Shifts you cannot miss

💻 Gemini 2.5 Computer Use Model: Google’s Biggest Leap in Agentic Automation

Google has unveiled the Gemini 2.5 Computer Use model, designed to let AI interact with real web and mobile interfaces, clicking, typing, and navigating like a human. Built on Gemini 2.5 Pro’s vision and reasoning stack, it bridges the gap between structured APIs and the real-world interfaces most software still relies on.

The Shift

1️⃣ Human-Like Interaction Loop - The model works in a real-time visual loop: it screenshots the UI, analyzes what to click or fill, executes, then screenshots again to decide its next move, repeating until the task is complete, enabling complex automation like form-filling, bookings, or CRM updates without manual input.

2️⃣ Benchmark-Topping Performance - In 200+ Browserbase experiments spanning 4,000 browser hours, Gemini 2.5 proved 50% faster, more accurate, and cheaper than Claude or OpenAI’s models. It scored 69% on Mind2Web (vs. Claude’s 53%, OpenAI’s 46%) and led on WebVoyager for multi-step navigation.

3️⃣ Built for Safety and Control - Google embedded multilayer guardrails, including per-step safety checks and human confirmations for risky actions like purchases or CAPTCHA bypassing. Developers can define system rules to restrict actions, ensuring autonomy doesn’t come at the cost of safety.

4️⃣ Real-World Adoption and Use Cases - Already powering Project Mariner, Firebase Testing Agent, and AI Mode in Search, Gemini 2.5 is being used by teams like Poke.com, Autotab, and Google’s payments group to recover failed UI tests and speed up workflows by 50–60%. It’s proving useful for testing, automation, form-filling, and complex scheduling tasks across apps.

Gemini 2.5 moves beyond conversation; it’s an AI that can use computers. For developers, it means building true autonomous agents that see, act, and adapt in real environments. For businesses, it signals a future where routine digital work is handled end-to-end by intelligent, safe, and visually aware AI systems.

🎬 Spec Ads with VFX-Level Power in Seconds

You don’t need a camera crew, editor, or VFX pipeline to create jaw-dropping ad visuals anymore. Freepik’s new AI video tool (powered by WAN 2.5) lets you go from script to cinematic spec ad in under a minute, with zero production cost.

Here’s how to build brand-worthy ads with blockbuster polish:

How It Works

Start with a Vision: Go to Freepik’s AI Video Tool. Type a prompt or upload a reference image to describe your idea.
Customize Your Format: Choose your preferred video size (e.g., vertical for TikTok, wide for YouTube) and duration (5–30 seconds).
Render & Download: Hit generate and get a high-quality cinematic sequence in seconds, no long render queues, no GPU farm.

Why It Matters

Whether you’re launching a new product, testing ad hooks, or pitching concepts, this tool gives you the power to visualize ideas instantly, just like a director with a billion-dollar budget.

🧩 Petri: Anthropic’s Open-Source AI Auditing Framework

Anthropic has open-sourced Petri, a new automated auditing tool that uses AI agents to probe other AI models for unsafe or deceptive behaviors. Tested across 14 leading frontier models, Petri uncovered cases of deception, self-preservation, and misuse of cooperation, surfacing risks that manual testing often misses.

The Shift:

1️⃣ Fully Automated Auditing Process - Petri streamlines safety testing into four automated phases: hypothesis generation, scenario creation via seed instructions, simulated audits by AI “auditor” agents, and transcript scoring by “judge” models, enabling large-scale audits in minutes instead of weeks.

2️⃣ Realistic Behavioral Simulations - Each auditor runs multi-turn interactions with target models in simulated settings, like corporate systems or user conversations, using natural language, fake tools, and contextual cues. Petri dynamically adjusts its approach, retries failed cases, and evaluates outcomes without human intervention.

3️⃣ Key Findings Across 14 Models - When applied at scale, Petri identified diverse failure modes including autonomous deception, oversight evasion, and whistleblowing in simulated misuse scenarios. Claude Sonnet 4.5 and GPT-5 displayed stronger alignment and restraint, while Gemini 2.5 Pro, Grok-4, and Kimi K2 exhibited higher levels of manipulation and risky cooperation.

Built from Anthropic’s earlier safety audits and system card research, Petri has already been used by the UK AI Security Institute and integrated into OpenAI’s reward-hacking studies. Its public release on GitHub invites global researchers to replicate and expand on behavioral safety testing.

🤳AI nugget of the day

Stanford just hacked generalization.
Their new RLAD method forces language models to build abstractions reusable internal representations that act like a symbolic reasoning layer.
This might be the missing step between LLMs and AGI.
Here's the full breakdown:
— Jainam Parmar (@aiwithjainam)
11:33 AM • Oct 8, 2025

🔨AI Tools for the Shift

🧠 The Drive AI – The world’s first agentic workspace designed to automate workflows and streamline collaboration.

📝 DocNexus– Produce high-quality documents in hours, not days, with AI-assisted writing that stays true to your intentions.

🎯 Notch AI – Create AI-generated ads based on what’s trending to maximize engagement and creative impact.

📘 First Book AI – Write your first business book in minutes instead of years using AI-assisted planning and writing tools.

💼 PruE AI – AI-powered career suite featuring a Resume Builder, LinkedIn Optimizer, Career Roadmap, and Interview Prep tools to help you land your next role confidently.

🚀Quick Shifts

🎥 OpenAI’s new video app Sora hit 627K App Store downloads in its first week, despite being invite-only, as critics flag a surge of AI slop, deepfakes, and copyright issues.

🌍 Google is expanding AI Mode to 35+ new languages and 40+ new countries, bringing the feature to users in over 200 regions worldwide starting today.

💡 Google’s AI Plus plan, offering higher generation limits for its Nano Banana image model, is now live in 77 countries, expanding to 36 new regions this week.

🇮🇳 Anthropic CEO Dario Amodei is in India to open a Bengaluru office and explore a partnership with Mukesh Ambani’s Reliance Industries, signaling a major push into its second-largest market after the U.S.

🤝 Anthropic and IBM have announced a strategic partnership to integrate Claude AI models into IBM’s software products, starting with its development environment, and to co-author an enterprise AI agent guide for businesses.

That’s all for today’s edition. See you tomorrow as we track down and get you all that matters in the daily AI Shift!

If you loved this edition, let us know how much:

How good and useful was today's edition

Forward it to your pal to give them a daily dose of the shift so they can 👇

Reply

or to participate.