Mind Machine AI

April 27, 2025

India grappling with a dearth of agentic AI experts

India faces a critical shortage of agentic AI professionals, with demand projected to double by 2026. The current talent pool of under 100,000 struggles to meet the growing need, driving up salaries and competition among GCCs, IT firms, and startups. The market for AI agents is projected to grow significantly, driven by sectors like autonomous vehicles, smart manufacturing, and healthcare.

economictimes

April 24, 2025

Suna - a generalist AI agent

Suna By Kortix AI

April 24, 2025

Adobe Firefly: The next evolution of creative AI

Adobe Firefly, which revolutionized the creative industry in under two years, has generated over 20 billion assets worldwide. Today at Adobe MAX London, Adobe unveiled the latest Firefly release, unifying AI-powered tools for image, video, audio, and vector generation into a single platform with new capabilities.

April 24, 2025

OpenAI releases latest image generation model in the API

The latest image generation model, gpt-image-1, is now available in the API, enabling developers to integrate high-quality image generation into their tools and platforms. The model can create images in diverse styles, follow custom guidelines, and accurately render text.

OpenAI gpt-image-1
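For readers who want to try the model, here is a minimal sketch of calling it from Python. It assumes the official `openai` package's `images.generate` method and that the response carries the image as base64 in `data[0].b64_json`; the helper name `save_generated_image`, the default size, and the output path are illustrative choices, not part of the announcement.

```python
import base64
from pathlib import Path


def save_generated_image(client, prompt, out_path, size="1024x1024"):
    """Request one image from gpt-image-1 and write it to out_path.

    `client` is assumed to behave like an `openai.OpenAI()` instance.
    Returns the number of bytes written.
    """
    result = client.images.generate(model="gpt-image-1", prompt=prompt, size=size)
    # Assumption: the image comes back base64-encoded in the first data entry.
    image_bytes = base64.b64decode(result.data[0].b64_json)
    Path(out_path).write_bytes(image_bytes)
    return len(image_bytes)
```

With the `openai` package installed and `OPENAI_API_KEY` set, a typical call would look like `save_generated_image(OpenAI(), "a watercolor fox reading a newspaper", "fox.png")`. Passing the client in as an argument keeps the helper easy to exercise with a stub.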

April 23, 2025

Two undergrads and an AI speech model

Two undergrads from Korea, with just three months of AI tinkering, have unleashed Dia—a 1.6B parameter open-source speech model that mimics podcast-style dialogues, rivaling Google’s NotebookLM. Built using Google’s TPU Research Cloud, Dia allows users to customize tones, insert natural speech disfluencies, and even clone voices, all operable on consumer-grade GPUs with 10GB VRAM. Available on Hugging Face and GitHub, it offers impressive performance, though it lacks safeguards against misuse. Nari Labs plans to expand Dia’s capabilities and language support, aiming to add a social layer atop their synthetic voice platform.

Nari Labs

April 23, 2025

MCP - the OS of the AI fueled machine

A good look at the state of MCP by Charlie Graham. Current clients like Claude, Cursor, and VS Code seem crude, but the future is bright. MCP servers are like “chat apps” that these clients run, and MCP clients could come to dominate the way search engines do today. The first-generation apps lack security, and results vary depending on how the LLM interprets your query. But this is only generation one of the technology, and in the past few weeks the count of MCP servers on MCP.so has exceeded 10k. If you want to dip your toes into AI app building, MCP is where the action is.

MCPs: gatekeepers and the future of AI
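To make the client/server relationship a little more concrete, here is a rough sketch of the kind of message an MCP client sends when it asks a server to run a tool. MCP messages are JSON-RPC 2.0, and tool invocations use the `tools/call` method; the tool name `get_weather` and its `city` argument are hypothetical, invented purely for illustration.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request asking an MCP server to run one tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "get_weather" and "city" are hypothetical; a real server advertises its own tools.
request = build_tool_call(1, "get_weather", {"city": "Bengaluru"})
print(json.dumps(request, indent=2))
```

In practice an SDK handles this framing, but seeing the raw shape helps explain why results depend on how the LLM chooses the tool name and fills in the arguments.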

April 23, 2025

AI Horseless Carriages

An amazing essay on building software for the AI era. A lot of today’s software is akin to the “horseless carriage,” a term for early motor car designs that borrowed heavily from the horse-drawn carriages that preceded them. Peter Koomen brilliantly shows the folly of today’s AI app builders and, using the way AI is applied in email apps as his example, suggests some creative ways of getting beyond the AI slop.

AI Horseless Carriages

April 22, 2025

The LLM as Customer

Interesting insight from @karpathy on X, who sees the LLM as your customer.

Are we pivoting to a world where we depend on LLMs and products are built LLM-first?

April 22, 2025

Podcast Recommendation: Lenny’s Podcast - Varun Mohan, CEO of Windsurf

Building a magical AI code editor used by over 1m developers in 4 months: Inside Windsurf

April 22, 2025

🚀 Google Launches Gemma QAT Models for Consumer GPUs 🎮🧠

Google has released new quantization-aware trained (QAT) versions of its Gemma 3 models, designed to preserve quality while running efficiently on consumer-grade GPUs. Because the effect of low-precision weights is simulated during training, these models are built to maintain accuracy even after 4-bit quantization. The open weights are available for local deployment via platforms like Hugging Face. This positions Gemma QAT as a significant step toward democratizing high-performance AI inference for individual developers and small teams.

Google Developer Blog
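To give a feel for what 4-bit quantization does to a weight, here is a small, self-contained sketch of a quantize-then-dequantize round trip, the kind of rounding step that quantization-aware training typically simulates so a model can learn to tolerate it. This is an illustration under simplifying assumptions (one symmetric scale for the whole list, plain Python floats), not Google's training recipe; the function name and sample weights are made up.

```python
def fake_quantize_int4(weights):
    """Round weights to a signed 4-bit grid, then map them back to floats."""
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)
    # Signed 4-bit integers span [-8, 7]; this sketch uses the symmetric part [-7, 7].
    scale = max_abs / 7
    quantized = [max(-7, min(7, round(w / scale))) for w in weights]
    return [q * scale for q in quantized]


weights = [0.7, -0.32, 0.12, 0.0]  # made-up sample values
restored = fake_quantize_int4(weights)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(restored, max_error)
```

Each restored value lands on one of at most 15 grid points, so the worst-case error per weight is about half the scale. Training against this rounding, rather than applying it only afterwards, is the basic idea behind QAT.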

April 22, 2025

🧑‍🎓 Columbia student raises $5.3M for an AI tool to ‘cheat on everything’ 🖥️

Chungin “Roy” Lee, a 21-year-old former Columbia student, raised $5.3 million for his startup Cluely, which offers an AI tool to “cheat” on exams, sales calls, and job interviews. The tool, originally called Interview Coder, was developed by Lee and his co-founder, Neel Shanmugam, and led to their suspension from Columbia. Cluely’s manifesto compares the tool to inventions like the calculator and spellcheck, while a launch video featuring Lee using the AI assistant on a date sparked both praise and criticism.

techcrunch

April 21, 2025

🚀 Seaweed-7B: an AI video generation model from ByteDance 🎨🤖

ByteDance has unveiled Seaweed-7B, a 7-billion-parameter AI video generation model that delivers high-quality video with synchronized audio, rivaling larger models like OpenAI’s Sora and Google’s Veo at a fraction of the compute cost. Seaweed-7B supports text-to-video, image-to-video, and audio-driven synthesis, producing up to 60-second 720p videos at 24fps in real time. It features advanced capabilities such as multi-shot storytelling, precise camera control, and lifelike human motion with lip-sync. The model’s efficiency stems from innovations like a 64× compression VAE, hybrid-flow Transformer architecture, and a progressive training strategy, reducing training costs by two-thirds compared to similar models. Seaweed-7B is currently in closed testing, with potential applications in AI filmmaking, education, gaming, and virtual assistants.

April 21, 2025

🚀 Kling AI 2.0 Goes Global: Unleashing Next-Level Creativity with AI 🎨🤖

Kling AI has launched Kling 2.0, a major upgrade to its generative video and image capabilities. The update includes new core models—Kling 2.0 Master for video and KOLORS 2.0 for images—featuring improved semantic alignment, prompt adherence, and motion realism. Users can now generate higher-quality visuals with cinematic effects, fluid character movement, and enhanced style control. New tools like a multi-element video editor and advanced image editing (inpainting, outpainting, restyling) expand creative flexibility, while over 60 style options broaden design possibilities. Kling claims industry leadership in motion quality and visual fidelity, supported by internal benchmarks. The company has also announced partnerships with AWS, Xiaomi, and Alibaba Cloud, though some users have raised concerns about the new pricing and credit model.

klingai

April 20, 2025

🤖 Hugging Face Acquires Pollen Robotics to Democratize Open-Source Humanoid AI

Hugging Face has acquired Pollen Robotics, the French startup behind the open-source humanoid robot Reachy 2. This move aims to democratize robotics by making both software and hardware open source, allowing developers to modify and improve upon them. Reachy 2, capable of tasks like picking up fruit and organizing mugs, is already being used by several major AI firms for research. Hugging Face plans to sell the robot while also releasing its code and hardware designs, promoting transparency and collaboration in robotics development. This acquisition aligns with Hugging Face’s mission to foster open-source AI development, similar to its previous initiatives in hosting open-weight AI models. The company believes that open-sourcing robotics will accelerate innovation and lead to safer, more capable robots.

pollen-robotics

April 17, 2025

🚀 Google Unveils Gemini 2.5 Flash: A Lightning-Fast AI Model Built for Speed and Efficiency ⚡🤖

Google has introduced Gemini 2.5 Flash, an optimized, lightweight AI model capable of handling multimodal inputs and high-throughput tasks, now available via the Gemini API.

April 17, 2025

🤖📱 OpenAI’s Secret Social App? A Potential Rival to X (Twitter) Is in the Works! 🚀🧠

OpenAI is reportedly working on a new social media platform that could rival X (formerly Twitter), according to insider sources. The experimental app, which is being developed quietly, is said to focus on AI-enhanced conversations, blending real-time social interaction with OpenAI’s language models. Though still in early stages, this move hints at OpenAI’s broader ambitions to go beyond AI tools and enter the consumer social networking space, where human-AI interactions might redefine how we connect, share, and engage online.

April 17, 2025

🧠👀 Microsoft Copilot Gets Eyes on Your Screen in Edge — Here’s What It Can Do! 🔍💻

Microsoft has upgraded its AI assistant Copilot with a powerful new feature: screen context awareness in the Edge browser. Now, Copilot can “see” what’s on your screen—from web pages to PDFs—and offer more relevant help, like summarizing content, explaining code, or generating emails based on what you’re viewing. This feature works via the Edge sidebar and allows users to ask questions or issue commands tied to on-screen content. It marks a major leap in contextual AI assistance, offering a smoother, more intuitive browsing and productivity experience.

April 17, 2025

🤖 Gemini Live’s screen sharing now free for Android users 📱

Gemini Live’s screen sharing feature, previously limited to Pixel 9 and Samsung Galaxy S25 users with a Gemini Advanced subscription, is now free for all Android users. The feature, which allows Gemini to see and respond to what’s on your camera and screen, will roll out over the coming weeks.

April 16, 2025

OpenAI in talks to buy Windsurf for about $3 billion

OpenAI is in talks to acquire Windsurf, an AI coding startup, for approximately $3 billion. This acquisition would be OpenAI’s largest to date and aims to help the company stay ahead in the generative AI race. Windsurf was in talks with investors such as Kleiner Perkins and General Catalyst to raise funding at a $3 billion valuation, the report added. It closed a $150 million funding round led by General Catalyst last year, valuing it at $1.25 billion.

April 16, 2025

Codex CLI - An Open-Source Local Coding Agent from OpenAI

OpenAI released Codex CLI, an open-source tool that translates natural language commands into executable code within terminal environments. It leverages OpenAI’s language models to interpret user inputs and supports multimodal inputs, enhancing its versatility. The tool operates locally, ensuring data privacy and reducing latency, and offers configurable autonomy levels for tailored behavior. To begin using Codex CLI, visit the official GitHub repository for installation instructions and documentation: github.com/openai/co…