whichllm Command: Examples, Options, and Usage

Common Examples

Detect hardware and list the best-fitting models

whichllm

Show only your detected hardware

whichllm hardware

Restrict ranking to models that run without a GPU

whichllm --cpu-only

Simulate a specific GPU

whichllm --gpu "RTX 4090"

Plan in reverse: estimate the hardware a model needs

whichllm plan [model_name]

Download a model and chat

whichllm run [model_name]

Print a Python snippet

whichllm snippet [model_name]

Emit JSON instead of the formatted table

whichllm --json

Description

whichllm detects local hardware (GPU model and VRAM, CPU, RAM, and OS) and ranks open-weight large language models from HuggingFace and Ollama by how well they will actually run on that machine. Rather than treating "fits in VRAM" as the only criterion, it combines a fit check with recency-aware benchmark scores from sources such as LiveBench, Artificial Analysis, Aider, and the Chatbot Arena Elo leaderboard, and it applies penalties for quantization, partial offload, and MoE (mixture-of-experts) architectures. The tool targets the practical question "which model should I download tonight?" rather than marketing claims. The default invocation prints a short ranked table; subcommands reuse the same engine to start interactive chat sessions, plan hardware upgrades, or print code snippets for direct integration.
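To make the idea of "fit check plus benchmark score plus penalties" concrete, here is a minimal, hypothetical sketch of that kind of scoring. It is not whichllm's actual formula: the Candidate fields, the penalty constants, the model names, and the benchmark numbers are all invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Illustrative model record; these fields are not whichllm's data model."""
    name: str
    benchmark: float   # normalized quality score in [0, 1] (hypothetical)
    needed_gb: float   # estimated memory needed at this quantization
    bits: float        # approximate bits per weight after quantization
    is_moe: bool = False


# Hypothetical penalty constants, chosen only to make the example concrete.
QUANT_PENALTY_PER_BIT = 0.01  # lose 1% of the score per bit below 8
OFFLOAD_PENALTY = 0.5         # halve the score when weights spill into system RAM
MOE_PENALTY = 0.9             # flat 10% penalty for mixture-of-experts models


def score(c: Candidate, vram_gb: float, ram_gb: float) -> float:
    """Return a ranking score; 0.0 means the model does not fit at all."""
    if c.needed_gb > vram_gb + ram_gb:
        return 0.0  # fit check: too large even with partial CPU offload
    s = c.benchmark
    s *= 1.0 - QUANT_PENALTY_PER_BIT * max(0.0, 8.0 - c.bits)
    if c.needed_gb > vram_gb:
        s *= OFFLOAD_PENALTY  # partial offload: some layers run from system RAM
    if c.is_moe:
        s *= MOE_PENALTY
    return s


def rank(cands: List[Candidate], vram_gb: float, ram_gb: float, top: int = 3) -> List[str]:
    """Names of the `top` highest-scoring candidates that fit."""
    scored = [(score(c, vram_gb, ram_gb), c.name) for c in cands]
    fitting = [pair for pair in scored if pair[0] > 0.0]
    fitting.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in fitting[:top]]


if __name__ == "__main__":
    models = [
        Candidate("model-a-8b-q4", 0.70, 5.0, 4.8),
        Candidate("model-b-32b-q4", 0.85, 19.0, 4.8),
        Candidate("model-c-70b-q4", 0.92, 42.0, 4.8),
        Candidate("model-d-moe-q4", 0.80, 14.0, 4.8, is_moe=True),
    ]
    # A 24 GB GPU with 32 GB of system RAM.
    print(rank(models, vram_gb=24, ram_gb=32))
    # → ['model-b-32b-q4', 'model-d-moe-q4', 'model-a-8b-q4']
```

With these invented numbers, the 70B candidate still fits through offload and has the highest raw benchmark score, but the offload penalty pushes it out of the top three. That is the kind of trade-off a plain "fits in VRAM" check cannot express.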

Options

hardware: Print detected GPU, CPU, RAM, and OS information without ranking models.
run _model_: Download _model_ via Ollama and start an interactive chat session.
plan _model_: Reverse lookup: estimate which GPU or RAM tier is needed to run _model_ at usable speed.
snippet _model_: Print a ready-to-paste Python snippet that loads _model_ from HuggingFace or Ollama.
--gpu _name_: Override hardware detection and rank as if running on the named GPU (e.g. "RTX 4090").
--cpu-only: Restrict ranking to models that run acceptably without a GPU.
--top _N_: Show the top _N_ ranked models instead of the default short list.
--quant _type_: Filter results by quantization (e.g. _Q4_K_M_, _Q5_K_M_, _Q8_0_, _fp16_).
--profile _use_case_: Bias ranking towards a specific profile (_coding_, _vision_, _math_, _general_).
--json: Emit machine-readable JSON instead of the formatted table.
--refresh: Bypass the local cache and refetch benchmark data.
--version: Print version and exit.
--help: Print help and exit.
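The --quant filter and the plan subcommand both depend on how much memory a model needs at a given quantization. As a rough illustration, not whichllm's own method, weight memory is approximately parameter count × bits per weight ÷ 8. The sketch below assumes approximate bits-per-weight values, an invented 20% overhead for the KV cache and runtime buffers, and an invented list of memory tiers.

```python
from typing import Optional

# Approximate bits per weight for the quantization types named under --quant.
# Q8_0 is exact for GGUF (32 int8 weights plus one fp16 scale per block = 8.5);
# the K-quant values are rough averages that vary a little between models.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "fp16": 16.0,
}

# Hypothetical memory tiers (GB) to step through, smallest first.
MEMORY_TIERS_GB = [8, 12, 16, 24, 32, 48, 80]

# Hypothetical 20% headroom for the KV cache and runtime buffers.
OVERHEAD = 1.2


def needed_gb(params_billion: float, quant: str) -> float:
    """Estimated memory in GB (1e9 bytes): weights at `quant` plus overhead."""
    # (params_billion * 1e9) weights * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    weight_gb = params_billion * BITS_PER_WEIGHT[quant] / 8
    return weight_gb * OVERHEAD


def smallest_tier(params_billion: float, quant: str) -> Optional[int]:
    """Smallest tier that holds the model, or None if no tier is large enough."""
    need = needed_gb(params_billion, quant)
    for tier in MEMORY_TIERS_GB:
        if tier >= need:
            return tier
    return None  # would need multiple GPUs or CPU offload


if __name__ == "__main__":
    # Estimated memory and smallest tier for a 7B-parameter model.
    for quant in BITS_PER_WEIGHT:
        print(quant, round(needed_gb(7, quant), 1), smallest_tier(7, quant))
```

Under these assumptions, smallest_tier(32, "Q4_K_M") returns 24, while a 70B model at fp16 (about 168 GB with overhead) exceeds every listed tier and returns None.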

FAQ

What is the whichllm command used for?

whichllm ranks open-weight large language models from HuggingFace and Ollama by how well they will run on your machine, combining a hardware fit check with recency-aware benchmark scores. It can also start a chat session with a model (run), estimate the hardware a model needs (plan), and print a Python snippet for loading a model (snippet).

How do I run a basic whichllm example?

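Run `whichllm` with no arguments in a terminal. It detects your hardware and prints a short ranked table of models; add options such as `--top N`, `--cpu-only`, or `--gpu "RTX 4090"` to change what is ranked.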

What does the hardware subcommand do in whichllm?

`whichllm hardware` prints the detected GPU, CPU, RAM, and OS information without ranking any models.
