About

A quick compatibility check for local large language models.

Pick a device — or plug in custom specs — and see which models fit in memory, with token-per-second estimates grounded in the hardware that actually matters: memory bandwidth.

Why

Running LLMs locally is mostly a bandwidth problem, not a compute one: generating each token streams the model's weights through memory, so memory bandwidth, not raw FLOPS, sets the speed ceiling. Most hardware comparisons obscure that. This site tries to make it plain. The short version lives at Bandwidth, not FLOPS.

What’s here

  • Preset devices — Apple silicon and NVIDIA GPUs with published memory and bandwidth figures.
  • Custom rigs — plug in any memory and bandwidth pair and get the same analysis.
  • Speed estimates — theoretical max tokens/sec adjusted by a ~0.6 efficiency factor. Real numbers shift with runtime, context length, and batch size.
  • Install commands — copy the exact ollama run command for any compatible model.
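The speed estimate above reduces to a one-line formula: decoding a token reads every weight from memory once, so tokens/sec is roughly bandwidth divided by model size, times the efficiency factor. A minimal sketch, with the caveat that the 2 GB overhead allowance and the example model size are illustrative assumptions, not figures from the site:

```python
# Back-of-envelope check for a memory-bandwidth-bound LLM.
# tokens/sec ≈ (memory bandwidth / model size) × efficiency factor.

EFFICIENCY = 0.6  # the ~0.6 real-world adjustment mentioned above


def fits_in_memory(model_gb: float, memory_gb: float, overhead_gb: float = 2.0) -> bool:
    """Rough fit check: weights plus an assumed overhead for KV cache and runtime."""
    return model_gb + overhead_gb <= memory_gb


def tokens_per_sec(model_gb: float, bandwidth_gb_s: float) -> float:
    """Estimated decode speed, scaled by the efficiency factor."""
    return (bandwidth_gb_s / model_gb) * EFFICIENCY


# Example: a ~4.7 GB quantized model on a device with 120 GB/s bandwidth.
print(fits_in_memory(4.7, 16.0))        # True
print(tokens_per_sec(4.7, 120.0))       # ≈ 15.3 tokens/sec
```

Numbers like these are why two GPUs with similar FLOPS can decode at very different speeds: the one with more memory bandwidth wins.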

Caveats

Estimates are not benchmarks. They’re a first-pass filter. If a specific workload matters (long context, structured output, multiple users), run the model and measure.

Who

Built by Pithos Labs. Source at github.com/pithoslabs/canirunthis. Feedback and pull requests welcome.