
Linux command

whichllm command


After copying a command, replace file names, directories, or arguments as needed.

Common examples

Detect hardware and list the top-ranked models

whichllm

Show only your detected hardware

whichllm hardware

Rank only models that run acceptably without a GPU

whichllm --cpu-only

Simulate a specific GPU

whichllm --gpu "[RTX 4090]"

Plan in reverse: estimate the hardware needed to run a model

whichllm plan [model_name]

Download a model via Ollama and start a chat session

whichllm run [model_name]

Print a Python snippet that loads a model

whichllm snippet [model_name]

Emit machine-readable JSON instead of the formatted table

whichllm --json
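Because `--json` makes the output machine-readable, a script can run whichllm as a subprocess and parse stdout with Python's standard `json` module. The sketch below assumes only that `whichllm --json` prints a single JSON document to stdout. This page does not document the schema, so the sketch assumes no field names. The helper name `run_json` is hypothetical.

```python
import json
import subprocess
from typing import Any, Sequence


def run_json(cmd: Sequence[str]) -> Any:
    """Run a command and parse its stdout as one JSON document.

    Assumes the command prints valid JSON to stdout (as `whichllm --json`
    is described to do). Raises subprocess.CalledProcessError on a
    non-zero exit status and json.JSONDecodeError on invalid output.
    """
    result = subprocess.run(
        list(cmd),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # fail loudly if the command exits non-zero
    )
    return json.loads(result.stdout)
```

For example, `run_json(["whichllm", "--json", "--top", "5"])` would return the parsed result as Python lists and dicts, assuming those two documented flags can be combined.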

Description

whichllm detects local hardware (GPU model and VRAM, CPU, RAM, OS) and ranks open-weight large language models from HuggingFace and Ollama by how well they will actually run on that machine. Instead of treating "fits in VRAM" as the only criterion, it combines a fit check with recency-aware benchmark scores from sources such as LiveBench, Artificial Analysis, Aider, and the Chatbot Arena Elo leaderboard. It then applies penalties for quantization, partial offload, and MoE architectures. The tool targets the practical question "which model should I download tonight?" rather than marketing claims. The default invocation prints a short ranked table. Subcommands reuse the same engine to launch interactive sessions, plan hardware upgrades, or print code snippets for direct integration.
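The ranking idea described above can be sketched in Python: a hard fit check, then a benchmark score reduced by multiplicative penalties. This is not whichllm's actual formula. The `Candidate` fields, the penalty factors, and the 10% memory overhead are illustrative assumptions, and recency weighting is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    name: str
    benchmark: float  # normalized benchmark score in [0, 1] (assumed)
    size_gb: float    # estimated memory needed at the chosen quantization
    quantized: bool   # True if weights are stored below 16-bit
    moe: bool         # True for mixture-of-experts architectures


# Illustrative penalty factors; NOT whichllm's real values.
QUANT_PENALTY = 0.95
OFFLOAD_PENALTY = 0.6
MOE_PENALTY = 0.9
OVERHEAD = 1.10  # assumed ~10% extra memory for KV cache and runtime buffers


def score(c: Candidate, vram_gb: float, ram_gb: float) -> Optional[float]:
    """Return a ranking score, or None if the model fails the fit check."""
    needed = c.size_gb * OVERHEAD
    if needed > vram_gb + ram_gb:
        return None  # does not fit even with GPU memory plus system RAM
    s = c.benchmark
    if needed > vram_gb:
        s *= OFFLOAD_PENALTY  # partial offload to system RAM runs slower
    if c.quantized:
        s *= QUANT_PENALTY
    if c.moe:
        s *= MOE_PENALTY
    return s


def rank(cands: List[Candidate], vram_gb: float, ram_gb: float) -> List[Tuple[str, float]]:
    """Drop models that fail the fit check, then sort by score, best first."""
    results = []
    for c in cands:
        s = score(c, vram_gb, ram_gb)
        if s is not None:
            results.append((c.name, s))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results
```

Multiplicative penalties keep each factor independent, so a model that is both offloaded and quantized is penalized for both. A model that fails the fit check is excluded from the table rather than ranked last.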

Subcommands and options

hardware
Print detected GPU, CPU, RAM, and OS information without ranking models.
run _model_
Download _model_ via Ollama and start an interactive chat session.
plan _model_
Reverse lookup: estimate which GPU or RAM tier is needed to run _model_ at usable speed.
snippet _model_
Print a ready-to-paste Python snippet that loads _model_ from HuggingFace or Ollama.
--gpu _name_
Override hardware detection and rank as if running on the named GPU (e.g. "RTX 4090").
--cpu-only
Restrict ranking to models that run acceptably without a GPU.
--top _N_
Show the top _N_ ranked models instead of the default short list.
--quant _type_
Filter results by quantization (e.g. _Q4_K_M_, _Q5_K_M_, _Q8_0_, _fp16_).
--profile _use_case_
Bias ranking towards a specific profile (_coding_, _vision_, _math_, _general_).
--json
Emit machine-readable JSON instead of the formatted table.
--refresh
Bypass the local cache and refetch benchmark data.
--version
Print version and exit.
--help
Print help and exit.
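`--quant`, `--gpu`, and the `plan` subcommand all depend on how much memory a model needs at a given quantization. A common rule of thumb is parameters × bits per weight ÷ 8, plus headroom for the KV cache and runtime. The sketch below applies that rule. The bits-per-weight figures are approximate averages for llama.cpp GGUF formats. The 20% headroom, the VRAM tier list, and the function names are assumptions, not values taken from whichllm.

```python
from typing import Optional

# Approximate average bits per weight; real GGUF files vary by model.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "fp16": 16.0,
}

HEADROOM = 1.2  # assumed 20% extra for KV cache, activations, runtime

# Hypothetical VRAM tiers (GB) searched when planning in reverse.
VRAM_TIERS_GB = [8, 12, 16, 24, 32, 48, 80]


def estimate_gb(params_billion: float, quant: str) -> float:
    """Rough memory footprint in GB (1 GB = 1e9 bytes)."""
    bits = BITS_PER_WEIGHT[quant]
    # (params_billion * 1e9) weights * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    weights_gb = params_billion * bits / 8
    return weights_gb * HEADROOM


def smallest_tier(params_billion: float, quant: str) -> Optional[int]:
    """Return the smallest listed VRAM tier that fits, or None if none does."""
    need = estimate_gb(params_billion, quant)
    for tier in VRAM_TIERS_GB:
        if need <= tier:
            return tier
    return None
```

Under these assumptions, an 8B model at Q4_K_M needs about 5.8 GB, so the smallest listed tier is 8 GB. The same model at fp16 needs about 19.2 GB, so it needs the 24 GB tier.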

FAQ


How do I run a basic whichllm example?

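Run `whichllm` with no arguments in a terminal. It detects your hardware and prints a short ranked table of models. Add options such as `--top`, `--profile`, `--quant`, or `--json` to change what is shown.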
