Simon Willison released datasette-agent-micropython 0.1a0, an alpha version of a tool designed to safely generate and execute Python code within a sandboxed environment. Initial tests show GPT-5.5 has been unable to escape the sandbox, indicating promising security for code execution. This release is a step towards enabling Datasette Agent to safely integrate AI-generated code.
→ This is a high-signal release for secure, sandboxed Python execution, directly relevant to local LLM tooling and safe code generation for projects like Bidet or TP3.
Hugging Face released a blog post discussing Direct Preference Optimization (DPO) and its applications beyond traditional chatbots. DPO is a method for aligning language models with human preferences without requiring a separate reward model. The post explores how DPO can be used for tasks like text summarization, style transfer, and even image generation, highlighting its versatility in fine-tuning models based on human feedback.
→ This deep dive into DPO's broader applications is highly relevant for fine-tuning local LLMs like Gemma or Llama for specific tasks beyond chat, potentially improving RAG or accessibility tools.
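For readers who have not seen it written out, here is a minimal sketch of the per-example DPO objective mentioned above, assuming the standard formulation in which the implicit reward is the scaled log-probability ratio between the policy and a frozen reference model. The function name `dpo_loss`, its arguments, and the default `beta` are illustrative assumptions, not taken from the Hugging Face post or any particular library.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities.

    All names here are hypothetical. The inputs are log p(response | prompt)
    for the preferred ("chosen") and dispreferred ("rejected") responses,
    under the trainable policy and under a frozen reference model.
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # loss = -log(sigmoid(margin)), computed in a numerically stable way.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Identical policy and reference: margin is 0, so the loss is log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)

# Policy favors the chosen response more than the reference does: lower loss.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

No separate reward model appears anywhere in the computation: the preference signal comes entirely from comparing the two log-probability ratios, which is the property the summary highlights.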
Holo3.1 is a new open-source computer use agent designed for fast, local operation. It leverages a multimodal LLM to understand screen content and user instructions, enabling autonomous task execution on a user's desktop. The agent is optimized for efficiency, running locally to provide quick responses without cloud dependency.
→ This is a high-signal release for local AI agents, offering on-device computer control that could integrate well with local LLMs like Gemma or Llama for enhanced accessibility and automation.
Axiom Math, a seven-month-old startup, reportedly solved all 12 problems of the Putnam exam, a notoriously difficult undergraduate math competition. Their score of 12/12 surpasses top human undergraduates and other AI systems like DeepSeek, though the impact of time limits on other scores is unclear. This achievement highlights significant progress in AI's ability to tackle complex mathematical problems.
→ This is a high-signal item demonstrating significant progress in AI's ability to solve complex, formal mathematical problems, which could have implications for RAG and education AI tools.
NVIDIA has launched Cosmos 3, a Mixture-of-Transformers architecture that unifies language, image, video, audio, and action. It features an autoregressive reasoner paired with a diffusion generator, available in Nano (16B) and Super (64B) models. Fine-tuned versions for Text2Image and Image2Video are now considered state-of-the-art open-weight models.
→ NVIDIA's Cosmos 3, with its open-weight SOTA image and video generation, is a strong contender for local LLM and multimodal on-device AI applications, potentially impacting future Gemma 4-style hackathons.
We’re announcing AIEWF speakers this week! Take the AI Engineering Survey! Today’s guest Ethan first joined us for the LS Paper Club as the lead on NVIDIA Cosmos World Model, but then joined xAI and built Grok Imagine in 3 months: He comes back on Latent Space with some nuclear
Release: micropython-wasm 0.1a1 Fixes for some limitations that emerged while I was trying to use this to build datasette-agent-micropython. Tags: python, sandboxing, webassembly
Tool: Pasted File Editor I really like how you can paste a large volume of text into claude.ai (or the Claude desktop/mobile apps) and it will detect it as a large paste and turn it into a file attachment instead. I decided to have Codex desktop build me a version of that as a pr
Release: micropython-wasm 0.1a0 My latest sandboxing experiment: This alpha package bundles a lightly customized WASM build of MicroPython with a wrapper to execute code in it via wasmtime. Tags: python, sandboxing, webassembly
GPT-Rosalind advances life sciences research with enhanced biological reasoning, medicinal chemistry expertise, genomics analysis, and experimental workflow capabilities.
See how Wasmer used Codex with GPT-5.5 to build a Node.js runtime for the edge, accelerating development 10x to 20x and shipping in weeks instead of months.
4 years ago we argued that image composition was partially AGI-Hard. That gate has fallen this year. It can’t be pure coincidence that both Reve and Ideogram launched today, both with a heavy emphasis on how they made advances with strong labeling and code for layouts: and here’
Today was a big day, not least because we caught up on the state of GitHub vs Agents, and recorded a special pod with No Priors and Satya Nadella. At MS Build, Satya and Mustafa announced 7 new MAI models: This is an impressive lineup, especially considering that the Microsoft-
Uber Caps Usage of AI Tools Like Claude Code to Manage Costs I wrote the other day about Uber blowing its 2026 AI budget in four months, and how that wasn't particularly surprising given they would have set that budget in 2025, before anyone could have predicted how popular token
Microsoft announced two new text LLMs this morning - MAI-Thinking-1 (reasoning, 1T parameters, 35B active, available to "select early partners") and MAI-Code-1-Flash (137B Parameters, 5B active, "purpose-built for GitHub Copilot and VS Code to deliver high performance and lower c
California Brown Pelican, in Fort Mason, CA, US I'm at the Microsoft Build conference today, held at Fort Mason in San Francisco. There are California Brown Pelicans diving into the water directly behind the venue! Tags: microsoft, ai, generative-ai, llms, llm-release
Hackers Simply Asked Meta AI to Give Them Access to High-Profile Instagram Accounts. It Worked I had trouble believing this story was true, but I've seen it verified from multiple sources now: One video shows a hacker starting a conversation with Meta’s AI support bot and asking
Uncover second-hand scores with AI tools in Google Search and Shopping.
OpenAI outlines a blueprint for U.S. governance of frontier AI, proposing a federal framework for safety, resilience, and national security.
Discover new Codex plugins, sites, and annotations that help analysts, marketers, designers, investors, and other teams get more done with AI.
The Next Era of Knowledge Work report explores how Codex is transforming productivity through AI-powered research, data analysis, workflow automation, and content creation.
OpenAI frontier models and Codex are now generally available on AWS, giving enterprises a new path to build with OpenAI through the AWS environments, controls, and procurement workflows they already use. Customers can get started with OpenAI on AWS and move faster from evaluation
We’ve informally heard that Satya has been a listener of LS for a couple of years now, but it was still absolutely surreal to meet him and do a live pod at Build, together with our friends at No Priors, the leading VC AI Podcast that we also greatly admire! We covered the MAI model techn
I’m excited to work with Microsoft once again as the presenting sponsors of the AI Engineer World’s Fair! We’ll be streaming live from MS Build today for a special crossover pod with our friends at No Priors and the one and only Satya Nadella. However we did not hold back with thi
Head to this link https://app.mindstudio.ai/signup?plan=mattwolfe for 90 days of free access to Remy and $25 in credits! I wanted to see if AI could rebuild my website… and honestly the result kind of shocked me. I used Remy from @MindStudio_ai to see if it could completely redes
NVIDIA unveiled the RTX Spark and Vera Rubin platform, pushing CPUs into AI inference and accelerating agent-based workloads. Anthropic and OpenAI entered an IPO race as Google plans an $80 billion equity raise and semiconductors drive a market rally. Policy debates intensified o
Recap of May's major AI shift from subsidized seat pricing to a token-based economy driven by agentic usage and runaway API consumption. Coverage of business-model pivots to usage-based billing, enterprise deployment and consulting plays, and market responses including cheaper sp
OpenAI outlines its public policy agenda for AI, including safety, youth protection, workforce transition, and global standards to ensure AI benefits society.
OpenAI calls for global action on youth AI safety, proposing an international institute to strengthen safeguards, standards, and opportunities for young people.
Our approach to AI policy and political advocacy: transparency, support for thoughtful regulation and AI safety, and a commitment that no outside political group speaks on the company’s behalf.
OpenAI breaks ground on a 1GW data center project in Michigan as part of Stargate, building AI infrastructure to expand access, create jobs, and support communities.
The White House AI executive order provoked intense debate over voluntary pre-release testing, a 30-day sharing window, and the risk of a de facto licensing regime. Anthropic’s Mythos and Project Glasswing expanded to critical infrastructure partners while exposing extreme token
Learn how Endava is using AI agents, ChatGPT Enterprise, and Codex to accelerate software delivery, automate workflows, and build an AI-native culture across the enterprise.
Travelers built an AI-powered Claim Assistant with OpenAI to guide customers through filing claims, provide 24/7 support, and scale operations during peak demand.