Google Brings Computer-Use Capabilities to Gemini

Gemini learns computer control, NVIDIA and Hugging Face speed up fine-tuning, and a new training method helps models learn from their own reasoning.

Jun 27, 2026

This week in AI, the focus is shifting toward smarter automation, faster model customization, and new training methods that improve how AI systems reason. The industry is steadily moving beyond static chatbots toward agents and models that can act, adapt, and learn more effectively in real-world environments.

Google introduced a new computer-use capability for Gemini 3.5 Flash, allowing the model to interact with software and websites by clicking buttons, navigating interfaces, filling forms, and completing multi-step tasks. The update pushes Gemini closer to becoming a practical AI agent capable of performing real-world digital workflows.
NVIDIA and Hugging Face announced new integrations that simplify and accelerate AI fine-tuning. By combining NVIDIA’s NeMo framework with Hugging Face’s model ecosystem, developers can customize large language models more efficiently while reducing infrastructure complexity and improving scalability.
Researchers also unveiled a new training approach that teaches AI models through intermediate reasoning steps rather than only final answers. The method improves performance on complex tasks such as mathematics, coding, and reasoning while producing more reliable and interpretable problem-solving processes.

Together, these updates highlight a broader shift toward AI systems that can act autonomously, adapt quickly to specialized needs, and reason more effectively across complex tasks.

Gemini 3.5 Flash Learns to Use Computers

Google has introduced a new computer-use capability for Gemini 3.5 Flash, enabling the model to interact with graphical user interfaces much like a human user. The system can understand on-screen content, click buttons, navigate websites, fill out forms, and complete multi-step tasks across applications. Designed for developers building AI agents, the feature allows Gemini to translate natural language instructions into actions performed directly on a computer. By combining visual understanding with task execution, Google is pushing Gemini beyond simple chat interactions and toward autonomous digital assistants capable of handling real-world workflows across software and web environments.
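For readers who build agents, the observe, decide, act cycle described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Google's API: `FakeForm`, `Action`, `scripted_policy`, and `run_agent` are hypothetical names, and the scripted policy stands in for the model that would normally read a screenshot and choose the next step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    kind: str          # "type", "click", or "done"
    target: str = ""   # name of the UI element to act on
    text: str = ""     # text to enter, for "type" actions


class FakeForm:
    """Hypothetical stand-in for a real screen: one text field and a submit button."""

    def __init__(self) -> None:
        self.fields: Dict[str, str] = {"email": ""}
        self.submitted = False

    def observe(self) -> Dict[str, object]:
        # A real agent would receive a screenshot; here the state is a plain dict.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def scripted_policy(goal: str, observation: Dict[str, object]) -> Action:
    """Placeholder for the model: picks the next action from the current state."""
    if observation["fields"]["email"] == "":
        return Action("type", target="email", text=goal)
    if not observation["submitted"]:
        return Action("click", target="submit")
    return Action("done")


def run_agent(goal: str, env: FakeForm,
              policy: Callable[[str, Dict[str, object]], Action],
              max_steps: int = 10) -> List[Action]:
    """Loop: observe the screen, ask the policy for an action, apply it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = policy(goal, env.observe())
        if action.kind == "done":
            break
        env.apply(action)
        history.append(action)
    return history


env = FakeForm()
steps = run_agent("reader@example.com", env, scripted_policy)
print([s.kind for s in steps], env.submitted)
```

The step cap (`max_steps`) is a design choice in this sketch: it keeps a confused policy from looping forever on a screen it cannot make progress on.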

NVIDIA and Hugging Face Make AI Fine-Tuning Faster

NVIDIA and Hugging Face have introduced new integrations that make it easier and faster to fine-tune large language models using NVIDIA’s NeMo framework and AutoModel tools. The update streamlines the training workflow, allowing developers to customize foundation models with less setup and improved performance across NVIDIA hardware. By combining Hugging Face’s widely used model ecosystem with NeMo’s optimization capabilities, the collaboration helps researchers and enterprises accelerate AI development, reduce infrastructure complexity, and bring customized models into production more efficiently. This move lowers the barrier to advanced model training while improving scalability for real-world AI applications.

New Method Helps AI Learn From Its Own Reasoning

Researchers have introduced a new training approach that helps AI models improve by learning from intermediate reasoning steps rather than focusing only on final answers. The method enables models to better understand how problems are solved, leading to stronger performance on complex reasoning, mathematics, and coding tasks. Unlike traditional training methods that reward only the final outcome, this approach provides feedback throughout the reasoning process, helping models identify and correct mistakes earlier. The researchers found that models trained this way produce more reliable and interpretable reasoning chains while requiring fewer training examples. The work highlights a growing trend in AI research: teaching models how to think, not just what answer to produce, which could lead to more capable and trustworthy AI systems in the future.
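The difference between outcome-only and step-level feedback can be made concrete with a toy example. The sketch below is not the researchers' method; it assumes each reasoning step has already been judged correct or incorrect, and the function names (`outcome_rewards`, `process_rewards`, `first_error`) are hypothetical.

```python
from typing import List, Optional


def outcome_rewards(step_ok: List[bool], final_ok: bool) -> List[float]:
    """Outcome-only feedback: every step inherits the reward of the final answer."""
    reward = 1.0 if final_ok else 0.0
    return [reward] * len(step_ok)


def process_rewards(step_ok: List[bool]) -> List[float]:
    """Step-level feedback: each reasoning step is scored on its own."""
    return [1.0 if ok else 0.0 for ok in step_ok]


def first_error(step_ok: List[bool]) -> Optional[int]:
    """Index of the earliest incorrect step, or None if every step is correct."""
    for index, ok in enumerate(step_ok):
        if not ok:
            return index
    return None


# A four-step solution whose third step is wrong, leading to a wrong final answer.
steps = [True, True, False, True]
print(outcome_rewards(steps, final_ok=False))
print(process_rewards(steps))
print(first_error(steps))
```

With outcome-only feedback the two correct early steps are penalized along with the mistake, while step-level feedback points at the exact step that went wrong.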

Hand Picked Video

In this video, I walk you through the two best platforms for discovering and installing AI Skills for Claude, one of which I built myself.

Top AI Products from this week

AI Slide Editor by CubeOne - CubeOne is an AI slide editor you talk to. Describe a slide and it designs one, or drop in rough notes and images and it makes a polished slide.
Basedash for Excel - Basedash now works with Excel, both ways. Drop an .xlsx file into the agent and it reads your data, analyzes it, and builds charts and dashboards in seconds, with no formulas and no pivot tables.
LockIn MCP - LockIn MCP is the first distraction blocker built for the AI agent era. Rather than relying on a bypassable Chrome extension, you tell your favourite agent to block distractions for you, and it does so natively. No bypassing, pure focus.
DMV by Agent Community - Agent Community is building the identity layer for the agentic web. We are applying to ICANN for the [.agent] Top-Level Domain, supported by 29,000+ members and 7,000+ companies.
Cewsco - Cewsco is a premium AI assistant. Chat in real time, generate images, have voice conversations, get live stock and crypto market intelligence, manage your calendar, and more, all in one app. Works on any device, no app store needed.
ModuleX - ModuleX is an AI workspace already connected to 200+ integrations. Describe what you want, and your assistant answers with your data, acts through your tools, and turns the work into a visual workflow your team can edit together.

This week in AI

GPT-5.5 Instant Upgraded - OpenAI has upgraded GPT-5.5 Instant with better intent understanding, stronger constraint handling, and improved recommendations. Rolling out now to paid users.
Adobe Buys Topaz Labs - Adobe is acquiring Topaz Labs, bringing its AI-powered image and video enhancement technology into Adobe’s creative ecosystem to boost editing capabilities.
Adobe Firefly Graph - Adobe’s new Firefly Graph turns creative workflows into reusable assets, allowing teams to automate, save, and scale complex content creation processes.
Gaming Trains AI Agents - General Intuition’s 2.3B project uses video games to train AI agents, helping them develop planning, decision-making, and real-world problem-solving skills.

Paper of the Day

Researchers introduced Arbor, an autonomous AI research framework that mimics how scientists work. Instead of repeatedly trying random solutions, Arbor organizes ideas into a Hypothesis Tree, where hypotheses, experiments, evidence, and lessons learned are stored and refined over time. A long-term coordinator manages the overall research strategy, while specialized executors test individual ideas in isolated environments. This allows the system to accumulate knowledge across multiple experiments and continuously improve its approach. In evaluations across model training, data synthesis, and engineering tasks, Arbor achieved the best results on all tested research tasks and significantly outperformed existing coding agents, demonstrating a major step toward general-purpose autonomous research systems.
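To picture the Hypothesis Tree, here is a minimal sketch of how such a structure could be represented. It is an illustration under assumptions, not Arbor's actual implementation: the `Hypothesis` class, the `untested`, `run_round`, and `best` helpers, and the toy executor are all hypothetical, and a real executor would run an experiment in an isolated environment instead of looking up a score.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Hypothesis:
    statement: str
    score: Optional[float] = None                      # None until an experiment has run
    lessons: List[str] = field(default_factory=list)   # notes kept for later rounds
    children: List["Hypothesis"] = field(default_factory=list)

    def add_child(self, statement: str) -> "Hypothesis":
        """Refine this hypothesis into a more specific one."""
        child = Hypothesis(statement)
        self.children.append(child)
        return child


def untested(node: Hypothesis) -> List[Hypothesis]:
    """Collect every hypothesis in the tree that has no result yet."""
    found = [node] if node.score is None else []
    for child in node.children:
        found.extend(untested(child))
    return found


def run_round(root: Hypothesis,
              executor: Callable[[str], Tuple[float, str]]) -> None:
    """Coordinator step: hand each untested hypothesis to an executor and store the result."""
    for node in untested(root):
        node.score, lesson = executor(node.statement)
        node.lessons.append(lesson)


def best(node: Hypothesis) -> Hypothesis:
    """Return the highest-scoring hypothesis in the tree."""
    candidates = [node] + [best(child) for child in node.children]
    return max(candidates,
               key=lambda n: n.score if n.score is not None else float("-inf"))


# Toy executor (assumption): looks up a fixed score instead of running an experiment.
toy_scores = {"baseline": 0.1, "more data": 0.5, "better loss": 0.9}
root = Hypothesis("baseline")
root.add_child("more data")
root.add_child("better loss")
run_round(root, lambda s: (toy_scores[s], "tested: " + s))
print(best(root).statement)
```

Because results and lessons live on the tree nodes, a later round can add children under the most promising hypothesis without repeating earlier experiments.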

Explainx Substack
