Linux command
whichllm command
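After copying a command, replace the bracketed placeholders (such as the model name or GPU name) and other parameters as needed.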
Common examples
Detect hardware and list the ranked models
whichllm
Show only your detected hardware
whichllm hardware
Restrict ranking to CPU-only
whichllm --cpu-only
Simulate a specific GPU
whichllm --gpu "[RTX 4090]"
Estimate the hardware needed to run a given model
whichllm plan [model_name]
Download a model and chat
whichllm run [model_name]
Print a Python snippet
whichllm snippet [model_name]
Emit JSON
whichllm --json
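The `--json` output can be consumed from a script. The following is a minimal sketch in Python, assuming `whichllm` is on PATH and that `--json` writes a single JSON document to standard output. The structure of that document is not described on this page, so the sketch only parses it and makes no assumption about its fields. The helper name `rank_models_json` is hypothetical.

```python
import json
import shutil
import subprocess


def rank_models_json(extra_args=(), executable="whichllm"):
    """Run `whichllm --json` and return the parsed JSON, or None if unavailable.

    Assumption: the tool prints one JSON document to stdout when given --json.
    """
    path = shutil.which(executable)
    if path is None:
        # The tool is not installed (or not on PATH); let the caller decide.
        return None
    result = subprocess.run(
        [path, "--json", *extra_args],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(result.stdout)
```

A call such as `rank_models_json(["--cpu-only"])` would then return whatever structure the tool emits, or `None` when the binary cannot be found.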
Description
whichllm detects local hardware (GPU model and VRAM, CPU, RAM, OS) and ranks open-weight large language models from HuggingFace and Ollama by how well they will actually run on that machine. Rather than treating "fits in VRAM" as the only criterion, it combines a fit check with recency-aware benchmark scores from sources such as LiveBench, Artificial Analysis, Aider, and the Chatbot Arena Elo leaderboard, and applies penalties for quantization, partial offload, and MoE (mixture-of-experts) architectures. It is meant to answer the practical question "which model should I download tonight", not to echo marketing claims. The default invocation prints a short ranked table; the subcommands reuse the same engine to start an interactive session, plan hardware upgrades, or emit code snippets for direct integration.
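To make the idea of "fit check plus penalized benchmark score" concrete, here is an illustrative sketch in Python. It is not whichllm's actual scoring code: the weight-size estimate (parameters times bits per weight), the penalty factors, the one-year half-life for recency, and every name below are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    params_billion: float     # total parameter count, in billions
    bits_per_weight: float    # e.g. 16 for fp16, roughly 4.5 for a 4-bit quant
    benchmark: float          # normalized benchmark score in [0, 1] (hypothetical)
    benchmark_age_days: int   # how old the benchmark result is
    is_moe: bool = False


def weight_gib(c: Candidate) -> float:
    """Rough size of the weights alone; ignores KV cache and runtime overhead."""
    return c.params_billion * 1e9 * c.bits_per_weight / 8 / 2**30


def score(c: Candidate, vram_gib: float, ram_gib: float) -> float:
    """Toy ranking score: 0.0 if the model cannot fit, else a penalized benchmark."""
    need = weight_gib(c)
    if need > vram_gib + ram_gib:
        return 0.0  # does not fit even with offload to system RAM
    s = c.benchmark
    s *= 0.5 ** (c.benchmark_age_days / 365)  # assumed recency decay
    if c.bits_per_weight < 8:
        s *= 0.95  # assumed quantization penalty
    if need > vram_gib:
        s *= 0.5   # assumed partial-offload penalty
    if c.is_moe:
        s *= 0.9   # assumed MoE penalty
    return s


# Hypothetical candidates on a machine with 24 GiB VRAM and 64 GiB RAM.
small = Candidate("small-7b-q4", 7, 4.5, 0.60, 0)
big = Candidate("big-70b-q4", 70, 4.5, 0.80, 0)
ranked = sorted([small, big], key=lambda c: score(c, 24, 64), reverse=True)
print([c.name for c in ranked])
```

In this toy setup the smaller model ranks first even though its benchmark score is lower, because the larger one only runs with partial offload. That is the kind of trade-off the description refers to.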
Parameters
- hardware
- Print detected GPU, CPU, RAM, and OS information without ranking models.
- run _model_
- Download _model_ via Ollama and start an interactive chat session.
- plan _model_
- Reverse lookup: estimate which GPU or RAM tier is needed to run _model_ at usable speed.
- snippet _model_
- Print a ready-to-paste Python snippet that loads _model_ from HuggingFace or Ollama.
- --gpu _name_
- Override hardware detection and rank as if running on the named GPU (e.g. "RTX 4090").
- --cpu-only
- Restrict ranking to models that run acceptably without a GPU.
- --top _N_
- Show the top _N_ ranked models instead of the default short list.
- --quant _type_
- Filter results by quantization (e.g. `Q4_K_M`, `Q5_K_M`, `Q8_0`, `fp16`).
- --profile _use_case_
- Bias ranking towards a specific profile (_coding_, _vision_, _math_, _general_).
- --json
- Emit machine-readable JSON instead of the formatted table.
- --refresh
- Bypass the local cache and refetch benchmark data.
- --version
- Print version and exit.
- --help
- Print help and exit.
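When whichllm is called from a script, the options above can be assembled programmatically. A minimal sketch in Python follows. The helper `build_argv` and its keyword names are hypothetical, it uses only the flags listed above, and it builds an argument list without running anything.

```python
def build_argv(gpu=None, cpu_only=False, top=None, quant=None,
               profile=None, as_json=False, refresh=False):
    """Build an argument list for whichllm from the flags documented above."""
    argv = ["whichllm"]
    if gpu is not None:
        argv += ["--gpu", gpu]          # e.g. "RTX 4090"
    if cpu_only:
        argv.append("--cpu-only")
    if top is not None:
        argv += ["--top", str(int(top))]
    if quant is not None:
        argv += ["--quant", quant]      # e.g. "Q4_K_M"
    if profile is not None:
        argv += ["--profile", profile]  # e.g. "coding"
    if as_json:
        argv.append("--json")
    if refresh:
        argv.append("--refresh")
    return argv


print(build_argv(gpu="RTX 4090", top=5, profile="coding", as_json=True))
# prints ['whichllm', '--gpu', 'RTX 4090', '--top', '5', '--profile', 'coding', '--json']
```

Passing the result as a list to `subprocess.run` avoids shell quoting problems with values that contain spaces, such as the GPU name.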
FAQ
What is the whichllm command used for?
It detects your local hardware (GPU model and VRAM, CPU, RAM, OS) and ranks open-weight large language models from HuggingFace and Ollama by how well they will actually run on that machine. The description above explains what goes into the ranking.
How do I run a basic whichllm example?
Run `whichllm` with no arguments in a terminal to detect your hardware and print the ranked table, then add options such as `--cpu-only`, `--top`, or `--json` to suit your system.
What does the `hardware` subcommand do in whichllm?
Print detected GPU, CPU, RAM, and OS information without ranking models.