Linux command
llamafile 命令
文件
复制后可按需替换文件名、目录或参数。
常用示例
Run a llamafile (launches chat in terminal and server on port 8080)
./[model].llamafile
Run in server-only mode
./[model].llamafile --server
Run in CLI mode with a prompt
./[model].llamafile --cli -p "[prompt]"
Run interactive chat mode
./[model].llamafile --chat
Load external model weights
llamafile -m [path/to/model.gguf]
Set context size and number of threads
./[model].llamafile -c [8192] -t [8] -p "[prompt]"
Run server on a specific host and port
./[model].llamafile --server --host [0.0.0.0] --port [8080]
Offload layers to GPU and set temperature
./[model].llamafile -ngl [999] --temp [0.7] -p "[prompt]"
说明
llamafile is a single-file executable that bundles llama.cpp with a model for portable LLM inference. The same file runs on Linux, macOS, Windows, FreeBSD, NetBSD, and OpenBSD without installation, built on Cosmopolitan Libc. By default, llamafile launches both a terminal chatbot and an HTTP server with a web UI on port 8080. It can also run in dedicated CLI, chat, or server modes.
参数
- -m _model_
- Path to model weights file (if not embedded in the llamafile).
- -p _prompt_
- Input prompt text.
- --cli
- Run in CLI mode, answering a single prompt.
- --chat
- Run interactive chat mode with slash commands.
- --server
- Start HTTP server mode with web UI.
- -c _size_
- Context window size in tokens.
- -t _threads_
- Number of threads to use for computation.
- -n _count_
- Maximum number of tokens to generate.
- -ngl _n_
- Number of layers to offload to GPU.
- --host _addr_
- Server listening address (default: 127.0.0.1).
- --port _port_
- Server port (default: 8080).
- --temp _value_
- Sampling temperature (higher = more random).
- --top-k _n_
- Top-k sampling (default: 40).
- --top-p _value_
- Top-p nucleus sampling (default: 0.95).
- --seed _n_
- Random seed for reproducible output.
- --grammar _grammar_
- Apply BNF grammar to constrain output format.
- --mmproj _file_
- Multimodal projection model weights for vision models.
- --image _file_
- Image file input for multimodal models.
- --jinja
- Enable Jinja template support for chat templates.
- -e
- Enable prompt evaluation.
FAQ
What is the llamafile command used for?
llamafile is a single-file executable that bundles llama.cpp with a model for portable LLM inference. The same file runs on Linux, macOS, Windows, FreeBSD, NetBSD, and OpenBSD without installation, built on Cosmopolitan Libc. By default, llamafile launches both a terminal chatbot and an HTTP server with a web UI on port 8080. It can also run in dedicated CLI, chat, or server modes.
How do I run a basic llamafile example?
Run `./[model].llamafile` in a terminal, then adjust file names, paths, flags, or remote targets for your system.
What does -m _model_ do in llamafile?
Path to model weights file (if not embedded in the llamafile).