← 返回命令列表

Linux command

llamafile 命令

文件

复制后可按需替换文件名、目录或参数。

常用示例

Run a llamafile (launches chat in terminal and server on port 8080)

./[model].llamafile

Run in server-only mode

./[model].llamafile --server

Run in CLI mode with a prompt

./[model].llamafile --cli -p "[prompt]"

Run interactive chat mode

./[model].llamafile --chat

Load external model weights

llamafile -m [path/to/model.gguf]

Set context size and number of threads

./[model].llamafile -c [8192] -t [8] -p "[prompt]"

Run server on a specific host and port

./[model].llamafile --server --host [0.0.0.0] --port [8080]

Offload layers to GPU and set temperature

./[model].llamafile -ngl [999] --temp [0.7] -p "[prompt]"

说明

llamafile is a single-file executable that bundles llama.cpp with a model for portable LLM inference. The same file runs on Linux, macOS, Windows, FreeBSD, NetBSD, and OpenBSD without installation, built on Cosmopolitan Libc. By default, llamafile launches both a terminal chatbot and an HTTP server with a web UI on port 8080. It can also run in dedicated CLI, chat, or server modes.

参数

-m _model_
Path to model weights file (if not embedded in the llamafile).
-p _prompt_
Input prompt text.
--cli
Run in CLI mode, answering a single prompt.
--chat
Run interactive chat mode with slash commands.
--server
Start HTTP server mode with web UI.
-c _size_
Context window size in tokens.
-t _threads_
Number of threads to use for computation.
-n _count_
Maximum number of tokens to generate.
-ngl _n_
Number of layers to offload to GPU.
--host _addr_
Server listening address (default: 127.0.0.1).
--port _port_
Server port (default: 8080).
--temp _value_
Sampling temperature (higher = more random).
--top-k _n_
Top-k sampling (default: 40).
--top-p _value_
Top-p nucleus sampling (default: 0.95).
--seed _n_
Random seed for reproducible output.
--grammar _grammar_
Apply BNF grammar to constrain output format.
--mmproj _file_
Multimodal projection model weights for vision models.
--image _file_
Image file input for multimodal models.
--jinja
Enable Jinja template support for chat templates.
-e
Enable prompt evaluation.

FAQ

What is the llamafile command used for?

llamafile is a single-file executable that bundles llama.cpp with a model for portable LLM inference. The same file runs on Linux, macOS, Windows, FreeBSD, NetBSD, and OpenBSD without installation, built on Cosmopolitan Libc. By default, llamafile launches both a terminal chatbot and an HTTP server with a web UI on port 8080. It can also run in dedicated CLI, chat, or server modes.

How do I run a basic llamafile example?

Run `./[model].llamafile` in a terminal, then adjust file names, paths, flags, or remote targets for your system.

What does -m _model_ do in llamafile?

Path to model weights file (if not embedded in the llamafile).