Linux command

llama.cpp command

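Values in square brackets are placeholders. After copying a command, replace file names, directories, or parameters as needed.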

Common examples

Run interactive chat

./main -m [model.gguf] -i

Generate text with prompt

./main -m [model.gguf] -p "[Your prompt here]"

Set context size

./main -m [model.gguf] -c [4096] -p "[prompt]"

Use multiple threads

./main -m [model.gguf] -t [8] -p "[prompt]"

Run server mode

./server -m [model.gguf] --port [8080]
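
Once the server is running, it can be queried over HTTP. The following is a minimal sketch, assuming the server listens on localhost port 8080 and exposes a `/completion` endpoint that accepts a JSON body with `prompt` and `n_predict` fields; check the server documentation for your build. The helper names `completion_payload` and `ask_server` are hypothetical and not part of llama.cpp.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of llama.cpp.

# Build the JSON request body.
# $1 = prompt (no JSON escaping is done, so avoid double quotes and backslashes)
# $2 = maximum number of tokens to generate
completion_payload() {
    printf '{"prompt": "%s", "n_predict": %d}' "$1" "$2"
}

# Send the request to a server assumed to be running on localhost.
# $1 = prompt, $2 = optional token limit (defaults to 64)
ask_server() {
    curl -s "http://localhost:${PORT:-8080}/completion" \
        -H "Content-Type: application/json" \
        -d "$(completion_payload "$1" "${2:-64}")"
}

# Show the request body without contacting any server.
completion_payload "Hello" 64   # prints {"prompt": "Hello", "n_predict": 64}
```

With the server started as shown above, `ask_server "Hello"` would send the request; set `PORT` if you chose a different port.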

Quantize model

./quantize [model.gguf] [output.gguf] [q4_0]
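
To produce several quantized variants of one model, the quantize command can be wrapped in a loop. This is a dry-run sketch: `quantize_all` is a hypothetical helper, the output naming scheme is an assumption, and `q8_0` is assumed to be a type your build accepts alongside `q4_0`. The input is typically an unquantized (for example 16-bit) GGUF file.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints one ./quantize command per target type.
# Remove the leading "echo" inside the loop to actually run the commands.
# $1 = input GGUF file, remaining arguments = quantization types
quantize_all() {
    input=$1
    shift
    for qtype in "$@"; do
        # Derive the output name by replacing the .gguf suffix with -<type>.gguf.
        echo ./quantize "$input" "${input%.gguf}-$qtype.gguf" "$qtype"
    done
}

quantize_all model-f16.gguf q4_0 q8_0
```

Printing the commands first makes it easy to check the derived output names before any large files are written.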

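Description

llama.cpp is a C/C++ implementation of inference for Meta's LLaMA and other large language models, designed to run efficiently on CPUs and GPUs. It supports a range of quantization formats, which lets LLMs run on consumer hardware, and it includes tools for model conversion, quantization, and serving. In recent builds the binaries are named `llama-cli`, `llama-server`, and `llama-quantize` instead of `main`, `server`, and `quantize`.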

Parameters

-m _model_
Path to GGUF model file.
-p _prompt_
Input prompt.
-i
Interactive mode.
-c _size_
Context size, in tokens.
-t _threads_
Number of CPU threads to use.
-n _tokens_
Maximum number of tokens to generate.
--temp _temp_
Sampling temperature.
-ngl _layers_
Number of model layers to offload to the GPU.
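
The flags above can be combined in a single invocation. The sketch below is one way to do that, assuming the `./main` binary from the examples; `run_llama` and `DRY_RUN` are hypothetical names, and the numeric values (4096, 8, 128, 0.7, 32) are example settings rather than recommendations.

```shell
#!/bin/sh
# Hypothetical wrapper; run_llama and DRY_RUN are illustrative names,
# not part of llama.cpp. All numeric values are example settings.
# $1 = path to the GGUF model, $2 = prompt
run_llama() {
    model=$1
    prompt=$2
    # Reuse the positional parameters to hold the full command line.
    set -- ./main -m "$model" -c 4096 -t 8 -n 128 --temp 0.7 -ngl 32 -p "$prompt"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"    # print only (quoting around the prompt is not shown)
    else
        "$@"         # execute ./main with the arguments above
    fi
}

# Print the command instead of running it.
DRY_RUN=1
run_llama model.gguf "Explain quantization in one sentence."
```

Setting `DRY_RUN=0` runs the command for real, which requires the binary and a model file to be present.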

FAQ

How do I run a basic llama.cpp example?

Run `./main -m [model.gguf] -i` in a terminal, replacing `[model.gguf]` with the path to your GGUF model file. Add or adjust other flags as needed for your system.

What does -m _model_ do in llama.cpp?

It sets the path to the GGUF model file that the command loads.