With Nvidia announcing the RTX Spark to challenge Apple’s MX reign on local LLMs, it is appropriate that Google dumped some toys on us the next day.
Gemma4 12B is out today, in a sweet spot between their prior 8B and 26B I’m super interested to see how it stands up. 8B was a great chatbot, but a terrible tool bot. 26B seems much better but I could never run it on my M4 Mac mini 16GB. Does Google see 12B is “about right” for most spec’d out systems today – it runs on 16GB! Downloading now….
Unexpectedly, Google added to LiteRT-LM a local LLM server / CLI!
The LiteRT-LM CLI provides a lightweight, zero-code tool for running language models locally. We are now expanding the tool with the serve command, letting the CLI act as a drop-in local LLM server. Use this functionality with Gemma 4 12B to point any standard tool, SDK, or framework (such as OpenClaw, Hermes, OpenCode, Pi, or popular extensions like Continue and Aider) directly to your local endpoint.
Aside from the fact that the internets are saying RTX spark is going to cost $5000+ it does not seem far away where local models, even on edge devices (looking at you iOS27),may start to be competitive with cloud for basic stuff. Not coding, not tool use, not yet. But everyday things, like voice typing, opening apps, regular analysis, asking offline questions, could be done in the very near future all on local LLM.
