Notes on local LLMs, hardware tradeoffs, and what actually moves token-per-second numbers.
A short hello and what to expect from the writing here.
Why memory bandwidth dictates local LLM throughput more than raw compute, and what that means for hardware choice.