How fast is 10 tokens per second really? Neat little HTML app by Mike Veerman (source code here) which simulates LLM token output speeds from 5/second to 800/second. Useful if you see a model advertised as "30 tokens/second" and want to get a feel for what that actually looks l
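To put rough numbers on those speeds, here is a minimal sketch of the underlying arithmetic: the time to stream a response is its length in tokens divided by the output rate. The function name `seconds_to_stream` and the 500-token response length are illustrative assumptions, not taken from the app.

```python
def seconds_to_stream(num_tokens: int, tokens_per_second: float) -> float:
    """Return the seconds needed to emit num_tokens at a steady output rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Hypothetical 500-token answer at a few speeds within the 5-800/second range.
for rate in (5, 10, 30, 800):
    print(f"{rate:>4} tok/s -> {seconds_to_stream(500, rate):.1f} s")
```

This assumes a constant rate and ignores time to first token, so real responses may feel slower than the figure suggests.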
Today at Google I/O, Google released Gemini 3.5 Flash. This one skipped the -preview modifier and went straight to general availability, and Google appear to be using it for a whole lot of their key products: 3.5 Flash is available today to billions of people globally: For every
Release: datasette-llm-accountant 0.1a4 Fixed bug tracking chains of responses. Refs datasette-llm#7 Tags: llm, datasette
LiteRT-LM v2.1.5 has been released, introducing Python 3.14 support and making LiteRT C++ APIs header-only. This update also removes the libLiteRt.so dependency from GPU Accelerator and Dispatch API shared libraries, simplifying their usage. Additionally, it adds Raspberry Pi 5 GPU acceleration support and refines the LiteRT Options class for easier manipulation.
→ This update significantly boosts LiteRT-LM's on-device capabilities, especially with Raspberry Pi 5 GPU acceleration, making it a stronger contender for local LLM deployments and mobile AI projects.
Release: llm-gemini 0.32a0 Compatible with llm>=0.32a0 alpha - adds the ability to stream reasoning tokens. Tags: gemini, llm
Release: datasette-llm 0.1a8 Fix for bug where llm_prompt_context() hook did not fully collect chains of responses. #7
An OpenAI model solved the 80-year-old unit distance problem, disproving a major conjecture in discrete geometry and marking a milestone in AI-driven mathematics.
We will leave coverage of the SpaceXAI IPO filing for the actual day of IPO. Today we celebrate OpenAI’s result, speculated to be GPT 5.6 running for <32 hours or <$1000, on the planar unit distance problem. Similar to the 2025 IMO Gold result, this is a general purpose LLM, no
The full keynote livestream was 2 hours, but as usual, The Verge has the best supercut down to 30 mins, which is very worthwhile to get a narrative sense: The mainline Gemini 3.5 Flash is GA today (very nice compared to some staged rollouts) and is sold as a decent step up even c
🚨 OpenAI just dropped the feature we've all been waiting for: Codex on mobile. If you watched my recent video on how I built my custom wiki, you know how much I’ve been relying on Codex. So I was super excited for this feature and I know a bunch of other people have been asking f
Release: llm-gemini 0.32 New model gemini-3.5-flash for Gemini 3.5 Flash. See also my notes on Gemini 3.5 Flash, and the pelican I drew using this upgrade to the plugin. Tags: gemini, llm
I put together these annotated slides from my five minute lightning talk at PyCon US 2026, using the latest iteration of my annotated presentation tool. I presented this lightning talk at PyCon US 2026, attempting to summarize the last six months of developments in LLMs in fiv
We shared the next step in our journey to bring together the best of a search engine with the best of AI.
Google I/O unveiled Omni, Gemini 3.5 Flash, Antigravity 2.0, and Gemini Spark, framing a push toward multimodal generation and agentic tools. Omni showcased powerful video-to-video editing and fine-grained steerability. Gemini 3.5 Flash emphasized speed at the expense of token ef
The latest from Google I/O: See how we’re helping you get more done with Gemini.
At Google I/O we released Gemini 3.5, our latest series of models combining frontier intelligence with action.
OpenAI advances Education for Countries, expanding AI adoption in schools with new partnerships, teacher training, and tools to improve global learning outcomes.
It is the day before Google I/O, when the next major Gemini releases are expected to be previewed, and it will probably be a quiet week from competitors, though Anthropic and OpenAI both had minor wins today, and Cursor shipped their first SpaceXAI model with some nice detail on
The future of war has been evolving before our eyes in Ukraine, yet the west still plans to fight the last war. In this special episode, guest host (@noahpinion) and sit down with Yaroslav Azhnyuk (@YaroslavAzhnyuk), a serial tech founder who went from building PetCube to fo
Internet art critics just took the biggest L of the year 💀 A user on X posted a REAL Monet painting, slapped an "AI Generated" tag on it, and asked people to critique it. People wrote whole essays about how the painting was "soulless," "lacked human touch," and "obviously a compu
How to get the PERFECT AI art style every single time: 1. Go to Krea.ai and toggle to the "Krea 2 Large" model for photorealism or "Krea 2 Medium" model for illustrations 2. Click "Mood Boards" on the bottom dashboard 3. Drop in a batch of images that share the aesthetic you want
Composer 2.5 narrows the gap with frontier coding models on key benchmarks while Cursor touts dramatic token‑efficiency at a fraction of the cost. Enterprise strategy is shifting toward harness‑first platforms and agent orchestration, capturing long‑running context, persistent me
Codex integration into ChatGPT Mobile ushers in persistent agent workflows and shifts knowledge work toward human review and approval instead of direct execution. Google I/O expectations include a Gemini Spark personal agent and cost-optimized Gemini Flash models pursuing both co
It's hard to find much to write about Google I/O this year because I have a policy of not writing about anything that I can't try out myself, and a lot of the big announcements are "coming soon". I actually prefer to write about things that are in general availability, because I'
How Ramp engineers use Codex with GPT-5.5 to review code and ship improvements, allowing them to get substantive feedback in minutes instead of hours.
OpenAI for Singapore launches a multi-year AI partnership to expand deployment, build local talent, and support businesses and public services with AI.
OpenAI advances AI content provenance with Content Credentials, SynthID, and a verification tool to help people identify and trust AI-generated media.
Take the 2026 AI Engineering Survey and get >$2k in credits and AIE WF tickets! This was recorded before Railway suffered a major GCP outage on May 19, despite being a multi-AZ, multi-zone mesh ring, with HA fiber interconnects between their Metal <> GCP <> AWS, because workload
At Google I/O 2026, we shared how we’re making AI more helpful for everyone. See everything we announced.
One year after launch, see how AI Mode’s users are shifting from keywords to natural language queries.
Announcing new voice capabilities in Gmail, Docs and Keep, a new design tool called Google Pics and updates to AI Inbox.
OpenAI and Dell partner to bring Codex to hybrid and on-premise environments, helping enterprises deploy AI coding agents securely across data and workflows.