Dia: Open-Source Rival to ElevenLabs

AI gets emotional, MIT maps ML like the periodic table, and RAGEN helps models think in steps. Dialogue, discovery, and decision-making just got a major upgrade.

Apr 24, 2025

This week’s updates feel like they’re straight out of a sci-fi novel—but they’re all very real, and happening right now.

First up: imagine an AI that doesn’t just speak, but actually feels like it's part of the conversation. A tiny startup called Nari Labs just released Dia, a new open-source text-to-speech model that can generate voices with emotion, personality, and even little things like laughter, sighs, or a cough. It's like giving your script to a cast of actors—but the actors are all AI. It’s built for dialogue, and it’s surprisingly good. And yes, you can download it and play with it yourself.

Over at MIT, researchers just dropped something wild: a periodic table of machine learning. Yep, just like the one we had in chemistry class, but for algorithms. They found a way to connect over 20 classic ML techniques using a single equation, and even used it to build a new image classification model that beat a state-of-the-art approach by 8%. It’s more than a cool visual: it’s a practical tool for building the next wave of AI models.

And lastly, if you’ve ever wondered how to train AI to think more like a human over multiple steps, check out RAGEN. It’s a new reinforcement learning framework that helps AI agents make better decisions in complex, interactive environments. No more repetitive loops or losing the thread mid-task—RAGEN is designed to help models reason long-term and stay on track.

So yeah—between emotional AI voices, algorithmic blueprints, and reasoning agents, it’s been a big week. Hope you find these as fascinating as I did.

Open-Source Text-to-Speech Dialogue Model

Dia is a 1.6-billion-parameter open-source text-to-speech (TTS) model developed by Nari Labs, a small startup founded by two engineers. It generates realistic multi-speaker dialogue directly from a transcript, including non-verbal cues such as laughter, coughing, and sighs, which are triggered by tags embedded in the input text. Emotion and tone can be steered through audio conditioning: users supply a short audio sample, and the generated speech follows its style. Dia currently supports only English and needs a CUDA-enabled NVIDIA GPU with 12–13 GB of VRAM for full performance; Nari Labs plans to add CPU support and quantized versions to broaden accessibility. The code and pretrained weights are released under the permissive Apache 2.0 license on GitHub and Hugging Face, so the model can be used for both research and commercial work, provided ethical guidelines are followed. This focus on expressive, controllable dialogue positions Dia as a strong open-source alternative to proprietary offerings such as ElevenLabs and the podcast-generation feature of Google’s NotebookLM, with potential applications ranging from podcasts and audiobooks to video game characters and conversational AI interfaces.
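To make the "tags embedded in the input text" idea concrete, here is a minimal sketch of how a two-speaker script with non-verbal cues could be assembled before it is handed to the model. The `[S1]`/`[S2]` speaker markers and the parenthesized cue names are assumptions for illustration, and `Turn` and `build_script` are hypothetical helpers, not part of Dia; check the project README for the tag set it actually accepts.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed tag conventions, for illustration only.
SPEAKER_TAGS = ("[S1]", "[S2]")
NONVERBAL_TAGS = {"laughs", "coughs", "sighs"}


@dataclass
class Turn:
    speaker: int                     # 1 or 2
    text: str                        # what the speaker says
    nonverbal: Optional[str] = None  # optional cue appended after the line


def build_script(turns: List[Turn]) -> str:
    """Join dialogue turns into one tagged transcript string."""
    parts = []
    for turn in turns:
        if turn.speaker not in (1, 2):
            raise ValueError(f"unsupported speaker: {turn.speaker}")
        line = f"{SPEAKER_TAGS[turn.speaker - 1]} {turn.text.strip()}"
        if turn.nonverbal is not None:
            if turn.nonverbal not in NONVERBAL_TAGS:
                raise ValueError(f"unknown non-verbal tag: {turn.nonverbal}")
            line += f" ({turn.nonverbal})"
        parts.append(line)
    return " ".join(parts)


script = build_script([
    Turn(1, "Did you hear about Dia?"),
    Turn(2, "I did, it sounds great.", "laughs"),
])
print(script)
```

The resulting string is the kind of input a dialogue TTS model would receive. The generation call itself is left out on purpose, since its exact signature should come from the project's own documentation.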

Periodic Table of Machine Learning

MIT researchers have developed a “periodic table of machine learning” that organizes more than 20 classical algorithms around a single unifying equation, showing that these methods are fundamentally related in how they learn relationships between data points. The framework, called Information Contrastive Learning (I-Con), clarifies the structure of existing algorithms and also exposes gaps where new, yet-to-be-invented algorithms could fit. By combining elements from different algorithms within this system, the team created a new image-classification method that outperformed a prior state-of-the-art approach by 8%. The table works as a toolkit for AI researchers: it makes it easier to design new models by fusing or extending established ideas, and it offers a structured, visual guide to where the next algorithms might be found.
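The summary above does not spell out the unifying equation. One way to picture it, taken here as an assumption rather than the authors' exact formulation, is as an average KL divergence between two "neighborhood" distributions: one defined on the raw data and one defined on the learned representation. The sketch below uses hypothetical helper names and a simple Gaussian-style neighborhood for both sides.

```python
import math


def neighbor_distribution(points, i, temperature=1.0):
    """Probability that each other point j is a neighbor of point i (softmax over -distance^2)."""
    logits = {}
    for j, p in enumerate(points):
        if j == i:
            continue
        d2 = sum((a - b) ** 2 for a, b in zip(points[i], p))
        logits[j] = -d2 / temperature
    top = max(logits.values())
    weights = {j: math.exp(v - top) for j, v in logits.items()}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over a shared set of neighbor indices, clamping q away from zero."""
    return sum(pj * math.log(pj / max(q[j], eps)) for j, pj in p.items() if pj > 0)


def icon_style_loss(data, embeddings, temperature=1.0):
    """Average divergence between data-space and embedding-space neighborhoods."""
    n = len(data)
    return sum(
        kl_divergence(
            neighbor_distribution(data, i, temperature),
            neighbor_distribution(embeddings, i, temperature),
        )
        for i in range(n)
    ) / n


data = [(0, 0), (0, 1), (5, 5), (5, 6)]
scrambled = [(0, 0), (5, 5), (0, 1), (5, 6)]  # embedding that breaks the neighborhoods
print(icon_style_loss(data, data), icon_style_loss(data, scrambled))
```

Under this reading, swapping in a different definition of "neighbor" on either side (cluster membership, augmentations of the same image, class labels) is what would move you from one cell of the table to another.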

RAGEN Reinforcement Learning Framework for Multi-Turn Reasoning Agents

RAGEN is a reinforcement learning framework that trains large language model agents to perform multi-turn reasoning in interactive, stochastic environments. It is built on StarPO (State-Thinking-Actions-Reward Policy Optimization), which optimizes entire reasoning trajectories instead of individual steps, enabling better long-term decision-making. RAGEN addresses failure modes such as reasoning collapse and repetitive behavior (the “Echo Trap”) with techniques like uncertainty-based filtering and careful reward design, which stabilize training and make it more efficient. The framework supports multiple optimization strategies, including PPO and GRPO, and has been tested on tasks such as Bandit, Sokoban, and Frozen Lake. Developed by researchers from Northwestern University, Stanford, Microsoft, and others, RAGEN is open-source and aims to advance the reasoning capabilities of AI agents.
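Two of the ideas above, scoring whole trajectories and filtering by uncertainty, can be illustrated with a small sketch. This is not RAGEN's code: the function names are hypothetical, "uncertainty" is assumed to mean the spread of total rewards across several rollouts of the same task, and the advantage is a GRPO-style group normalization.

```python
from statistics import mean, pstdev


def filter_uncertain_prompts(groups, keep_fraction=0.5):
    """Keep the tasks whose rollout rewards vary the most (assumed uncertainty proxy)."""
    ranked = sorted(groups, key=lambda name: pstdev(groups[name]), reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    return {name: groups[name] for name in ranked[:keep]}


def trajectory_advantages(rewards, eps=1e-8):
    """Normalize each trajectory's total reward against its own group of rollouts."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Total reward of each full rollout, grouped by task instance (made-up numbers).
groups = {
    "sokoban-1": [1.0, 0.0, 1.0, 0.0],  # mixed outcomes: informative
    "sokoban-2": [1.0, 1.0, 1.0, 1.0],  # always solved: no learning signal
    "bandit-1": [0.0, 0.0, 0.0, 1.0],   # occasionally solved
    "lake-1": [0.0, 0.0, 0.0, 0.0],     # never solved: no learning signal
}

kept = filter_uncertain_prompts(groups, keep_fraction=0.5)
advantages = {name: trajectory_advantages(r) for name, r in kept.items()}
print(advantages)
```

Each advantage here applies to an entire rollout rather than a single step, which is the trajectory-level view described above. Tasks where every rollout gets the same reward drop out because they would contribute zero advantage anyway.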

Hand Picked Video

In this video, we'll look at ElevenLabs Conversational AI Agents.

Top AI Products from this week

Manna - Struggling with daily Bible motivation, comprehension, or consistency? Manna blends Scripture with gamified learning, AI-generated personalized levels, and an AI mentor for instant insights.
Buildpad - Buildpad is an AI co-founder that helps founders build products that people actually want. You'll be guided step-by-step through 7 phases, starting with finding a real problem to solve and finishing with building your product.
Alter - Alter is the Mac AI assistant that lets you get stuff done fast. Just talk to extract insights from YouTube videos, draft emails, or extract data. It understands what's on your screen and works across your apps, bringing intelligence to your entire day.
Partnero - Partnero is a powerful affiliate and referral management tool featuring advanced commission settings, automated payouts, fraud protection, and dynamic tracking options, including branded links and promo codes. Built with exceptional partner experience in mind.
Jo - Most products fail. Not because they’re bad, but because no one asked users. Jo makes it super simple to get fast, real feedback before you ship. Just share a link. Jo does the rest.
RightNow AI - Automatically profile, detect bottlenecks, and optimize your CUDA kernels for peak performance with no code required.

This week in AI

SWE-PolyBench - Amazon introduces SWE-PolyBench, a multilingual benchmark with 2000+ issues in Java, JavaScript, TypeScript, and Python, for evaluating AI coding agents in real-world scenarios.
π0.5 Open-World Generalization - π0.5, a VLA model, generalizes to new environments by co-training on heterogeneous data, improving robot performance in complex tasks like cleaning in unseen homes. It combines action, images, and text for robust physical skills.
Generative Ghosts: AI Afterlives - AI "generative ghosts" are digital agents modeled on individuals that generate new content after death, offering comfort and memory but raising privacy and ethical concerns.
Grok Vision AI Camera Feature - Elon Musk’s xAI launched Grok Vision, which lets users scan real-world objects with their phone camera for instant explanations and context. It is available now on iOS, with Android access through a premium plan.
AvatarFX AI Video from Any Image - AvatarFX by Character.AI turns any image into a lifelike video that can speak, sing, and emote, with top-quality motion and audio, even for long videos and non-human faces.

ExplainX Substack
