India grappling with a dearth of agentic AI experts
India faces a critical shortage of agentic AI professionals, with demand projected to double by 2026. The current talent pool of under 100,000 struggles to meet the growing need, driving up salaries and competition among GCCs, IT firms, and startups. The market for AI agents is projected to grow significantly, driven by sectors like autonomous vehicles, smart manufacturing, and healthcare.
Adobe Firefly: The next evolution of creative AI
Adobe Firefly, which revolutionized the creative industry in under two years, has generated over 20 billion assets worldwide. Today at Adobe MAX London, Adobe unveiled the latest Firefly release, which unifies AI-powered tools for image, video, audio, and vector generation in a single platform and adds new capabilities.
OpenAI releases latest image generation model in the API
The latest image generation model, gpt-image-1, is now available in the API, enabling developers to integrate high-quality image generation into their tools and platforms. The model can create images in diverse styles, follow custom guidelines, and accurately render text.
Two undergrads and an AI speech model
Two undergrads from Korea, with just three months of AI tinkering, have unleashed Dia—a 1.6B parameter open-source speech model that mimics podcast-style dialogues, rivaling Google’s NotebookLM. Built using Google’s TPU Research Cloud, Dia allows users to customize tones, insert natural speech disfluencies, and even clone voices, all operable on consumer-grade GPUs with 10GB VRAM. Available on Hugging Face and GitHub, it offers impressive performance, though it lacks safeguards against misuse. Nari Labs plans to expand Dia’s capabilities and language support, aiming to add a social layer atop their synthetic voice platform.
MCP - the OS of the AI fueled machine
A good look at the state of MCP by Charlie Graham. Current clients such as Claude, Cursor, and VS Code seem crude, but the future is bright. MCPs are like “chat apps” that these clients run, and MCP clients could come to dominate the way search engines do today. The first-generation apps lack security, and results vary depending on how the LLM interprets your query. Still, this is only gen 1 of the technology, and in the past few weeks the count of MCP servers on MCP.so has exceeded 10k. If you want to dip your toes into AI app building, MCP is where the action is.
AI Horseless Carriages
An amazing essay on building software for the AI era. A lot of today’s software is akin to the “horseless carriage”, the early motor car designs that borrowed heavily from the horse-drawn carriages that preceded them. Peter Koomen brilliantly shows the folly of today’s AI app builders and, using AI in email apps as his example, suggests some creative ways of getting beyond AI slop.
The LLM as Customer
An interesting insight from @karpathy on X that treats the LLM as your customer.
Are we pivoting to a world where we depend on LLMs and products are built LLM-first?
Podcast Recommendation: Lenny’s Podcast - Varun Mohan - CEO, Windsurf
Building a magical AI code editor used by over 1m developers in 4 months: Inside Windsurf
🚀 Google Launches Gemma QAT Models for Consumer GPUs 🎮🧠
Google has released new quantization-aware trained (QAT) versions of its Gemma 2B and 7B models, enabling state-of-the-art performance while running efficiently on consumer-grade GPUs. These models are designed to maintain accuracy even after 4-bit quantization, thanks to techniques like QLoRA and SmoothQuant. Notably, they outperform competitors like Mistral and LLaMA 2 across multiple benchmarks, and offer open weights for local deployment via platforms like Hugging Face and NVIDIA TensorRT-LLM. This positions Gemma QAT as a major leap toward democratizing high-performance AI inference for individual developers and small teams.
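For readers new to the idea, quantization-aware training generally works by simulating low-precision rounding during training so the model learns to tolerate it. The sketch below shows only that simulated rounding step ("fake quantization") on a handful of weights, assuming a simple symmetric 4-bit grid with one scale per list. The function name `fake_quantize` and the sample weights are hypothetical, and this is not Google's actual recipe, which the announcement summarized here does not detail.

```python
def fake_quantize(weights, bits=4):
    """Round weights onto a symmetric signed integer grid, then map them back to floats.

    Illustrative only: one scale for the whole list, no training loop.
    """
    qmax = 2 ** (bits - 1) - 1                 # largest positive level, 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)                   # nothing to scale
    scale = max_abs / qmax                     # float value of one integer step
    simulated = []
    for w in weights:
        q = round(w / scale)                   # nearest integer level
        q = max(-qmax - 1, min(qmax, q))       # clamp to the signed 4-bit range
        simulated.append(q * scale)            # back to float for the forward pass
    return simulated


# Hypothetical weights, chosen only to show the rounding error.
weights = [0.70, -0.35, 0.12, -0.02]
simulated = fake_quantize(weights, bits=4)
errors = [abs(w - s) for w, s in zip(weights, simulated)]

for w, s, e in zip(weights, simulated, errors):
    print(f"{w:+.2f} -> {s:+.2f} (error {e:.3f})")
```

With this scheme the rounding error per weight is bounded by half a step (`scale / 2`), which is the kind of perturbation a quantization-aware model is trained to absorb.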
🧑‍🎓 Columbia student raises $5.3M for an AI tool to ‘cheat on everything’ 🖥️
Chungin “Roy” Lee, a 21-year-old former Columbia student, raised $5.3 million for his startup Cluely, which offers an AI tool to “cheat” on exams, sales calls, and job interviews. The tool, originally called Interview Coder, was developed by Lee and his co-founder, Neel Shanmugam, and led to their suspension from Columbia. Cluely’s manifesto compares the tool to inventions like the calculator and spellcheck, while a launch video featuring Lee using the AI assistant on a date sparked both praise and criticism.
🚀 Seaweed-7B, an AI video generation model from ByteDance 🎨🤖
ByteDance has unveiled Seaweed-7B, a 7-billion-parameter AI video generation model that delivers high-quality video with synchronized audio, rivaling larger models like OpenAI’s Sora and Google’s Veo at a fraction of the compute cost. Seaweed-7B supports text-to-video, image-to-video, and audio-driven synthesis, producing up to 60-second 720p videos at 24fps in real time. It features advanced capabilities such as multi-shot storytelling, precise camera control, and lifelike human motion with lip-sync. The model’s efficiency stems from innovations like a 64× compression VAE, a hybrid-flow Transformer architecture, and a progressive training strategy, which together reduce training costs by two-thirds compared to similar models. Seaweed-7B is currently in closed testing, with potential applications in AI filmmaking, education, gaming, and virtual assistants.
🚀 Kling AI 2.0 Goes Global: Unleashing Next-Level Creativity with AI 🎨🤖
Kling AI has launched Kling 2.0, a major upgrade to its generative video and image capabilities. The update includes new core models—Kling 2.0 Master for video and KOLORS 2.0 for images—featuring improved semantic alignment, prompt adherence, and motion realism. Users can now generate higher-quality visuals with cinematic effects, fluid character movement, and enhanced style control. New tools like a multi-element video editor and advanced image editing (inpainting, outpainting, restyling) expand creative flexibility, while over 60 style options broaden design possibilities. Kling claims industry leadership in motion quality and visual fidelity, supported by internal benchmarks. The company has also announced partnerships with AWS, Xiaomi, and Alibaba Cloud, though some users have raised concerns about the new pricing and credit model.
🤖 Hugging Face Acquires Pollen Robotics to Democratize Open-Source Humanoid AI
Hugging Face has acquired Pollen Robotics, the French startup behind the open-source humanoid robot Reachy 2. This move aims to democratize robotics by making both software and hardware open source, allowing developers to modify and improve upon them. Reachy 2, capable of tasks like picking up fruit and organizing mugs, is already being used by several major AI firms for research. Hugging Face plans to sell the robot while also releasing its code and hardware designs, promoting transparency and collaboration in robotics development. This acquisition aligns with Hugging Face’s mission to foster open-source AI development, similar to its previous initiatives in hosting open-weight AI models. The company believes that open-sourcing robotics will accelerate innovation and lead to safer, more capable robots.
🚀 Google Unveils Gemini 1.5 Flash: A Lightning-Fast AI Model Built for Speed and Efficiency ⚡🤖
Google has introduced Gemini 1.5 Flash, an optimized, lightweight AI model capable of handling multimodal inputs and high-throughput tasks, now available via the Gemini API.
🤖📱 OpenAI’s Secret Social App? A Potential Rival to X (Twitter) Is in the Works! 🚀🧠
OpenAI is reportedly working on a new social media platform that could rival X (formerly Twitter), according to insider sources. The experimental app, being quietly developed under the radar, is said to focus on AI-enhanced conversations, blending real-time social interaction with OpenAI’s language models. Though still in early stages, this move hints at OpenAI’s broader ambitions to go beyond AI tools and enter the consumer social networking space—where human-AI interactions might redefine how we connect, share, and engage online.
🧠👀 Microsoft Copilot Gets Eyes on Your Screen in Edge — Here’s What It Can Do! 🔍💻
Microsoft has upgraded its AI assistant Copilot with a powerful new feature: screen context awareness in the Edge browser. Now, Copilot can “see” what’s on your screen—from web pages to PDFs—and offer more relevant help, like summarizing content, explaining code, or generating emails based on what you’re viewing. This feature works via the Edge sidebar and allows users to ask questions or issue commands tied to on-screen content. It marks a major leap in contextual AI assistance, offering a smoother, more intuitive browsing and productivity experience.
🤖 Gemini Live’s screen sharing now free for Android users 📱
Gemini Live’s screen sharing feature, previously limited to Pixel 9 and Samsung Galaxy S25 users with a Gemini Advanced subscription, is now free for all Android users. The feature, which allows Gemini to see and respond to what’s on your camera and screen, will roll out over the coming weeks.
OpenAI in talks to buy Windsurf for about $3 billion
OpenAI is in talks to acquire Windsurf, an AI coding startup, for approximately $3 billion. The acquisition would be OpenAI’s largest to date and aims to help the company stay ahead in the generative AI race. Windsurf had been in talks with investors such as Kleiner Perkins and General Catalyst to raise funding at a $3 billion valuation, the report added. It closed a $150 million funding round led by General Catalyst last year, valuing it at $1.25 billion.
Codex CLI - An Open-Source Local Coding Agent from OpenAI
OpenAI released Codex CLI, an open-source tool that translates natural language commands into executable code within terminal environments. It uses OpenAI’s language models to interpret user input and supports multimodal inputs. The tool runs locally and offers configurable autonomy levels, so users can choose how much it does without asking. To begin using Codex CLI, visit the official GitHub repository for installation instructions and documentation: github.com/openai/co…