Two undergrads and an AI speech model

Two undergrads from Korea, with just three months of AI tinkering behind them, have released Dia, a 1.6B-parameter open-source speech model that generates podcast-style dialogue and is pitched as a rival to Google’s NotebookLM. Built using Google’s TPU Research Cloud, Dia lets users customize tone, insert natural speech disfluencies, and even clone voices, and it runs on consumer-grade GPUs with 10GB of VRAM. The model is available on Hugging Face and GitHub. It reportedly performs well, but it ships without safeguards against misuse. Nari Labs plans to expand Dia’s capabilities and language support, and aims to add a social layer on top of its synthetic voice platform.

Nari Labs