CanIRun.ai Review 2026: A Hardware Checker for Local LLMs
CanIRun.ai is a free browser tool that detects your GPU, VRAM, RAM, and CPU, then matches them against the real-world requirements of open-source LLMs from Meta, Google, DeepSeek, Mistral, Qwen, and others.
If you have ever spent an evening downloading a 30 GB model, only to watch it OOM your GPU on the first prompt, you already know the problem CanIRun.ai is trying to solve.
Until recently, the only way to answer "can my machine run Qwen-30B at Q4_K_M?" was a tour through llama.cpp issues, Ollama Discord screenshots, and someone's Reddit comment from eight months ago. CanIRun.ai replaces that with a single page: open the site, let it detect your hardware, and get a graded list of which open-source LLMs will actually run on your machine, and at which quantization.
I have used it as my first-pass triage tool for the last few months, on three different setups (an M2 MacBook Pro, a Windows desktop with a 12 GB RTX 4070, and a Linux box with a 24 GB 3090). Here is what it does well, where it falls short, and whether it deserves a permanent bookmark.
What CanIRun.ai Actually Does
CanIRun.ai is a browser-based hardware analyzer for local AI inference. The flow is simple:
- Open canirun.ai in any modern browser.
- It detects your GPU model, VRAM, system RAM, and CPU through standard browser APIs (WebGPU, navigator.hardwareConcurrency, GPU device queries).
- It cross-references your specs against a catalog of open-weight LLMs and their published requirements.
- Each model gets a grade, from Runs great down to Too heavy, for every available quantization (Q2_K, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16).
That is the whole product. No sign-up, no install, no background daemon, no model files uploaded.
The model catalog is the part that does the real work. It covers Meta's Llama family, Google's Gemma, DeepSeek's R1 and V3 variants, Mistral, Alibaba's Qwen, Microsoft's Phi, NVIDIA's Nemotron, Cohere Command, and Zhipu GLM: most of the open-weight stack that anyone running a local LLM in 2026 actually cares about. Data is sourced from the same model cards and quantization tables that llama.cpp, Ollama, and LM Studio rely on.
Quick Comparison: How CanIRun.ai Fits in the Local-LLM Stack
| Tool | What it does | When to use it | Pricing |
|---|---|---|---|
| CanIRun.ai | Pre-flight hardware compatibility check | Before downloading any model | Free |
| Ollama | Run and serve open-source LLMs locally | Actually running models day-to-day | Free |
| LM Studio | Desktop UI for downloading and chatting with local LLMs | Browsing models with a chat UI | Free |
| llama.cpp | Low-level inference engine | Squeezing every last token/sec | Free (OSS) |
| Manual VRAM math | Read the model card, do the arithmetic | When CanIRun.ai cannot detect your GPU | Free |
CanIRun.ai is not a replacement for any of these; it is the step you skipped that caused your last failed download.
Hands-On: The Hardware Detection
I started on the M2 MacBook Pro (32 GB unified memory). Detection took roughly two seconds. It correctly identified:
- Apple M2 Pro, 10-core CPU
- 32 GB unified memory
- GPU: M2 Pro integrated (Apple)
- WebGPU available
What it does not do well on Apple Silicon is split "VRAM" from "system RAM", because there is no split. It treats the whole 32 GB as a single pool and grades models accordingly, which is technically right but misses Metal's practical limits on how much memory llama.cpp can address per process. In practice the grades were optimistic by about one tier: models it called Runs well ran fine but slowly; the one it called Decent for Q4_K_M genuinely struggled.
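The unified-memory caveat can be made concrete with a few lines of Python. This is a minimal sketch, not how CanIRun.ai computes anything: the 70% GPU share is an assumed placeholder (the real ceiling depends on the Mac, the OS version, and the runtime), and `usable_gpu_memory_gb` and `fits` are hypothetical helper names.

```python
# Assumption: share of unified memory a GPU process can realistically use.
# This is a placeholder for illustration, not a figure published by Apple,
# llama.cpp, or CanIRun.ai.
GPU_SHARE_OF_UNIFIED_MEMORY = 0.70


def usable_gpu_memory_gb(unified_gb: float,
                         gpu_share: float = GPU_SHARE_OF_UNIFIED_MEMORY) -> float:
    """Estimate how much of a unified-memory pool the GPU side can use."""
    if not 0.0 < gpu_share <= 1.0:
        raise ValueError("gpu_share must be in (0, 1]")
    return unified_gb * gpu_share


def fits(model_gb: float, unified_gb: float,
         gpu_share: float = GPU_SHARE_OF_UNIFIED_MEMORY) -> bool:
    """True if the model footprint fits inside the estimated GPU budget."""
    return model_gb <= usable_gpu_memory_gb(unified_gb, gpu_share)


# A hypothetical 24 GB footprint fits a naive 32 GB pool,
# but not the assumed 70% budget (about 22.4 GB).
print(fits(24.0, 32.0, gpu_share=1.0))  # True
print(fits(24.0, 32.0))                 # False
```

Treating the whole pool as available is the optimistic reading; shrinking the budget like this is one plausible way to account for grades that skew a tier too high on Macs.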
On the RTX 4070 desktop (12 GB VRAM, 32 GB RAM), detection was perfect: card identified by name, VRAM correct, driver version surfaced. The grades here matched reality almost exactly across a dozen models I sanity-checked in Ollama.
On the 24 GB 3090 box, same story: clean detection, accurate grading. NVIDIA GPUs with current drivers are the happy path.
Where detection gets shaky: older Intel integrated graphics, mobile NVIDIA chips on shared-memory laptops, and any setup where the browser cannot get a clean WebGPU adapter handle. In those cases CanIRun.ai shows a manual entry form so you can input specs yourself, which is fine but defeats the "zero friction" promise.
The Grading System
This is the feature that turns CanIRun.ai from "specs page" into something actually useful.
For every model + quantization combination, CanIRun.ai assigns one of six grades:
- Runs great β comfortable headroom, fast tokens/sec expected
- Runs well β fits cleanly, modest headroom
- Decent β fits but no slack for long contexts or batch sizes
- Tight fit β works at small context lengths only
- Barely runs β likely OOM with the full context window
- Too heavy β will not load
The grade considers VRAM headroom for the model weights, KV cache requirements at typical context lengths, and the activation memory the architecture needs. Dense vs. MoE matters here: MoE models have much smaller active-parameter footprints than their nominal size suggests, which mainly helps speed, since the full set of expert weights still has to be loaded.
This is materially more useful than "model needs 8 GB VRAM", the static-number guidance you get from most model cards.
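To see why a single static number undersells the problem, here is a rough sketch of the standard KV cache arithmetic for a transformer with grouped-query attention. The layer count, KV head count, and head dimension below are illustrative values in the shape of an 8B Llama-style model, and nothing here is taken from CanIRun.ai's own code.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Size of the KV cache for one sequence.

    The leading factor of 2 covers the separate key and value tensors;
    bytes_per_elem=2 assumes an FP16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


GIB = 2 ** 30

# Illustrative config (assumed): 32 layers, 8 KV heads, head dimension 128.
print(kv_cache_bytes(32, 8, 128, 8_192) / GIB)   # 1.0
print(kv_cache_bytes(32, 8, 128, 32_768) / GIB)  # 4.0
```

The cache grows linearly with context length, so the same weights can be comfortable at 8K tokens and cramped at 32K. That is the gap a per-context grade covers and a single "needs 8 GB" figure does not.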
One honest caveat: the grades are based on idealized assumptions (mostly-empty VRAM at start, default batch size, conservative context). If you are simultaneously running a browser with 80 tabs, a code editor, and Slack, knock the grades down by one tier in your head.
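As a way to picture both the grading and the "knock it down a tier" advice, the sketch below maps free-memory headroom to the six grade names and then applies a one-tier downgrade. The thresholds are invented for illustration; the site does not publish its cutoffs as far as this review can tell, and `grade` and `downgrade` are hypothetical helpers.

```python
GRADES = ["Runs great", "Runs well", "Decent",
          "Tight fit", "Barely runs", "Too heavy"]

# Assumed cutoffs: minimum fraction of memory left free for each grade.
# These numbers are made up for the example, not CanIRun.ai's real values.
THRESHOLDS = [
    (0.40, "Runs great"),
    (0.25, "Runs well"),
    (0.10, "Decent"),
    (0.03, "Tight fit"),
    (0.00, "Barely runs"),
]


def grade(required_gb: float, available_gb: float) -> str:
    """Grade a model by how much memory is left after loading it."""
    if available_gb <= 0:
        raise ValueError("available_gb must be positive")
    headroom = (available_gb - required_gb) / available_gb
    for min_headroom, label in THRESHOLDS:
        if headroom >= min_headroom:
            return label
    return "Too heavy"  # negative headroom: the model does not fit


def downgrade(label: str, tiers: int = 1) -> str:
    """Shift a grade down by some tiers, stopping at 'Too heavy'."""
    index = min(GRADES.index(label) + tiers, len(GRADES) - 1)
    return GRADES[index]


print(grade(6.0, 12.0))            # Runs great
print(grade(10.0, 12.0))           # Decent
print(grade(13.0, 12.0))           # Too heavy
print(downgrade(grade(6.0, 12.0))) # Runs well
```

Applying `downgrade` is the code equivalent of mentally discounting the grade when the machine is already busy with a browser, an editor, and chat apps.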
Filtering & Discovery
The filter sidebar is where CanIRun.ai earns its second use case: model discovery, not just compatibility check.
You can narrow the catalog by:
- Task: chat, code, reasoning, vision
- Provider: Meta, Google, DeepSeek, Mistral, Qwen, NVIDIA, etc.
- License: permissive, restrictive, research-only
- Compatibility grade: only show me what runs well or better on my machine
- Architecture: dense vs. MoE
The combination "code + DeepSeek + Runs well on my hardware" took me about ten seconds to filter into a shortlist of three models. That is the workflow I keep coming back for.
Sorting also covers what you actually want: newest, best score, smallest VRAM footprint, largest context window, fastest expected speed.
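The filter-then-sort workflow is simple enough to sketch against a toy catalog. Everything in this example is hypothetical: the `Entry` fields, the model names, and the VRAM figures are placeholders, not entries from CanIRun.ai's database.

```python
from dataclasses import dataclass

GRADE_ORDER = ["Runs great", "Runs well", "Decent",
               "Tight fit", "Barely runs", "Too heavy"]


@dataclass(frozen=True)
class Entry:
    name: str
    provider: str
    task: str
    grade: str
    vram_gb: float


def shortlist(catalog, task, provider, min_grade):
    """Keep entries matching task and provider whose grade is at least
    min_grade, sorted by smallest VRAM footprint first."""
    cutoff = GRADE_ORDER.index(min_grade)
    hits = [
        e for e in catalog
        if e.task == task
        and e.provider == provider
        and GRADE_ORDER.index(e.grade) <= cutoff
    ]
    return sorted(hits, key=lambda e: e.vram_gb)


# Made-up catalog entries, for illustration only.
catalog = [
    Entry("example-coder-33b", "DeepSeek", "code", "Tight fit", 20.0),
    Entry("example-coder-16b", "DeepSeek", "code", "Runs well", 10.5),
    Entry("example-chat-8b", "Meta", "chat", "Runs great", 6.0),
    Entry("example-coder-7b", "DeepSeek", "code", "Runs great", 5.0),
]

for entry in shortlist(catalog, "code", "DeepSeek", "Runs well"):
    print(entry.name, entry.grade, entry.vram_gb)
```

With the toy data this prints the 7B entry followed by the 16B entry, mirroring the "code + DeepSeek + Runs well" shortlist described above.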
What CanIRun.ai Does Well
Friction is zero. Open the page, get an answer. No account, no install, no permission dialogs (beyond WebGPU's standard prompt). This matters more than it sounds: it is the difference between "I'll check later" and actually checking.
The catalog stays current. New Llama, Qwen, and DeepSeek releases tend to show up within a week. This is not unique to CanIRun.ai (Ollama's library is comparable), but the speed is respectable for a free side project.
Privacy by construction. Detection runs client-side. Your hardware fingerprint, model interest history, and IP-based location are not the product here. For a tool aimed at the local-LLM community (many of whom run models locally specifically because they distrust cloud inference), this is the right posture.
It teaches quantization without nagging. Many users still think "Llama 3 70B needs 140 GB VRAM", the F16 number on the model card. Seeing the same model graded across Q2_K through F16 makes the tradeoff obvious without forcing you to read a llama.cpp wiki page.
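The 140 GB figure falls out of simple arithmetic: parameters times bits per weight, divided by eight. The sketch below repeats that calculation across the quantization levels the site grades. The bits-per-weight values are approximations assumed for illustration (K-quants mix precisions, so real files vary by model), and `weights_gb` is a hypothetical helper that counts weights only, with no KV cache or runtime overhead.

```python
# Approximate bits per weight for each quantization level. Assumed values
# for illustration; actual GGUF files differ by model and tooling version.
APPROX_BITS_PER_WEIGHT = {
    "Q2_K": 3.0,
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q6_K": 6.6,
    "Q8_0": 8.5,
    "F16": 16.0,
}


def weights_gb(n_params: float, quant: str) -> float:
    """Estimated size of the weights alone, in decimal gigabytes."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    return n_params * bits / 8 / 1e9


for quant in APPROX_BITS_PER_WEIGHT:
    print(f"{quant:7s} {weights_gb(70e9, quant):6.1f} GB")

print(weights_gb(70e9, "F16"))  # 140.0
```

Under these assumptions a 70B model lands in the low 40s of gigabytes at Q4_K_M, which is why a quantized build can be within reach of hardware that could never hold the F16 weights.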
Where It Falls Short
Apple Silicon unified memory is fuzzy. As covered above, grades on Macs skew optimistic. The tool acknowledges this but does not fully correct for it.
Multi-GPU is treated as single-GPU. If you have two 3090s, CanIRun.ai sees one. For people who actually have rigs like this, that is a real gap: tensor-split inference is exactly the case where "can it run" needs nuance.
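Until the tool handles this, a multi-GPU owner can approximate the check by hand. The sketch below pools VRAM across cards and derives split ratios proportional to each card's memory, in the spirit of llama.cpp's tensor-split option. The 1 GB per-card reserve is an assumed allowance for buffers and the display, and the function names are hypothetical.

```python
def usable_vram_gb(vram_gb, reserve_gb=1.0):
    """Total VRAM across cards after holding back a per-card reserve."""
    return sum(max(v - reserve_gb, 0.0) for v in vram_gb)


def fits_across_gpus(model_gb, vram_gb, reserve_gb=1.0):
    """True if the model footprint fits in the pooled, reserved VRAM."""
    return model_gb <= usable_vram_gb(vram_gb, reserve_gb)


def split_ratios(vram_gb):
    """Share of the model to place on each card, proportional to its VRAM."""
    total = sum(vram_gb)
    if total <= 0:
        raise ValueError("need at least one card with VRAM")
    return [v / total for v in vram_gb]


# A hypothetical 40 GB footprint: too big for one 24 GB card, fine on two.
print(fits_across_gpus(40.0, [24.0]))        # False
print(fits_across_gpus(40.0, [24.0, 24.0]))  # True
print(split_ratios([24.0, 24.0]))            # [0.5, 0.5]
```

This ignores interconnect speed and per-card overhead from duplicated buffers, so it answers "can it load" rather than "will it be fast".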
Non-LLM workloads are out of scope. Stable Diffusion, video models, voice models, fine-tuning, training: none of this is covered. The name is "CanIRun" but the answer is always "can I run this LLM for inference." Fair scope, but worth knowing.
No live tokens/sec estimates. A grade tells you it will fit; it does not tell you whether you will get 3 tok/sec or 30. For serving decisions ("can I serve this from my home lab?") you still need to benchmark yourself.
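A back-of-envelope ceiling is still possible before benchmarking. The sketch below uses the common rule of thumb that single-stream decoding is memory-bandwidth bound, so each generated token requires reading the active weights once. That is an assumption, real throughput is lower, and the 936 GB/s figure is the commonly quoted spec-sheet bandwidth for an RTX 3090, used here only as an example input.

```python
def max_tokens_per_sec(bandwidth_gb_s: float, active_weights_gb: float) -> float:
    """Upper bound on decode speed if every token reads all active weights
    once from memory. Ignores compute, KV cache reads, and overhead."""
    if active_weights_gb <= 0:
        raise ValueError("active_weights_gb must be positive")
    return bandwidth_gb_s / active_weights_gb


# Example inputs (assumed): 936 GB/s of memory bandwidth and a
# hypothetical 18 GB of quantized weights.
print(max_tokens_per_sec(936.0, 18.0))  # 52.0
```

If the ceiling is already below what you need, no amount of tuning will rescue the setup; if it is comfortably above, a real benchmark is the next step.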
Estimates are estimates. This is a hedge the site itself makes prominently, and it is correct. Drivers, frameworks, OS, and other processes all bend reality. Treat CanIRun.ai as a high-quality first filter, not a final verdict.
CanIRun.ai vs. The Alternatives
vs. reading the Ollama library page directly: Ollama lists sizes but not personalized fit. You see "llama3:70b-q4_K_M is 40 GB" and have to do the VRAM math yourself. CanIRun.ai does the math, with your machine plugged in.
vs. LM Studio's built-in compatibility hints: LM Studio shows green/yellow/red dots next to models based on your system, which is conceptually identical to CanIRun.ai's grading. The difference is friction: LM Studio is a 400 MB download; CanIRun.ai is a URL. If you are already in LM Studio, use it. If you are deciding which model to download in the first place, CanIRun.ai wins.
vs. asking a chatbot: A general-purpose LLM will hallucinate VRAM requirements with confidence. CanIRun.ai is grounded in actual model card data and quantization tables. For this specific question, a small purpose-built tool beats a frontier model.
Pricing & Access
Free. No account. The whole thing is a static-ish web app with a public model database. The author (midudev) maintains it as an open-source community project, and the data is sourced from llama.cpp, Ollama, and LM Studio model definitions.
This means two things: (1) you should expect the project to keep being free, and (2) you should not expect enterprise SLAs around uptime or new-model latency. Both are fair tradeoffs.
Who Should Use CanIRun.ai
Use it if:
- You are deciding which open-source LLM to download next and want to skip the trial-and-error.
- You are buying a new GPU or laptop and want a concrete sense of which models it will unlock.
- You are explaining quantization to a colleague and want a visual reference.
- You run a homelab or small team and want a shared way to triage "can we host this?" before spinning up infrastructure.
Skip it if:
- You only use cloud LLMs (OpenAI, Anthropic, Together, Groq). There is nothing for you here.
- Your workload is non-LLM (image gen, video, fine-tuning). Use task-specific calculators instead.
- You have multi-GPU or exotic hardware where the single-card detection model breaks down.
Verdict
CanIRun.ai is one of those small, opinionated tools that does one thing precisely and stays out of the way. It is not trying to be Ollama, LM Studio, or a model marketplace. It is the pre-flight check you wanted to exist but never built yourself.
For anyone running open-source LLMs locally, or thinking about starting, it earns a permanent bookmark. Pair it with Ollama for actual inference and you have a complete local-LLM workflow with zero ongoing cost.
Last updated: June 2026. Hardware detection and model catalog tested on macOS, Windows, and Linux.