Linux command
llama.cpp command
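After copying a command, replace the file names, directories, or parameters as needed. Values in square brackets, such as [model.gguf], are placeholders.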
Common examples
Run interactive chat
./main -m [model.gguf] -i
Generate text with prompt
./main -m [model.gguf] -p "[Your prompt here]"
Set context size
./main -m [model.gguf] -c [4096] -p "[prompt]"
Use multiple threads
./main -m [model.gguf] -t [8] -p "[prompt]"
Run server mode
./server -m [model.gguf] --port [8080]
Quantize model
./quantize [model.gguf] [output.gguf] [q4_0]
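The server started by the `./server -m [model.gguf] --port [8080]` example above can be queried over HTTP. The following is a minimal sketch, assuming the server exposes `/health` and `/completion` endpoints on the chosen port; the variable names `LLAMA_HOST` and `LLAMA_PORT`, the prompt, and the `n_predict` value are placeholders introduced here for illustration.

```shell
#!/usr/bin/env bash
# Sketch: send one completion request to a locally running llama.cpp server.
# LLAMA_HOST and LLAMA_PORT are illustrative names (assumptions), not llama.cpp settings.
LLAMA_HOST="${LLAMA_HOST:-127.0.0.1}"
LLAMA_PORT="${LLAMA_PORT:-8080}"
BASE_URL="http://${LLAMA_HOST}:${LLAMA_PORT}"

# Request body: the prompt text and the number of tokens to generate (example values).
PAYLOAD='{"prompt": "Hello, my name is", "n_predict": 32}'

echo "POST ${BASE_URL}/completion"
echo "$PAYLOAD"

# Send the request only if curl is installed and the server answers its health check.
if command -v curl >/dev/null 2>&1 &&
   curl -sf --max-time 2 "${BASE_URL}/health" >/dev/null; then
  curl -s --max-time 60 "${BASE_URL}/completion" \
    -H "Content-Type: application/json" \
    -d "$PAYLOAD"
  echo
else
  echo "No server reachable at ${BASE_URL}; start one with ./server first." >&2
fi
```

The health check keeps the script from hanging when no server is listening, so it is safe to run before the model has finished loading.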
Description
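llama.cpp is a C/C++ implementation of large language model inference, originally written as a port of Meta's LLaMA model, and is designed for efficient CPU and GPU inference. It supports a range of quantization formats, which lets it run LLMs on consumer hardware. The project includes tools for model conversion, quantization, and serving. Newer builds name the binaries `llama-cli`, `llama-server`, and `llama-quantize` in place of `main`, `server`, and `quantize`.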
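To illustrate the quantization step mentioned above, here is a small sketch that wraps the `./quantize` call from the examples and compares file sizes before and after. The variable `LLAMA_QUANTIZE`, the default file name, and the output naming scheme are assumptions made for this example, not llama.cpp conventions.

```shell
#!/usr/bin/env bash
# Sketch: quantize a GGUF model and compare file sizes.
# LLAMA_QUANTIZE and the output naming scheme are illustrative assumptions.
LLAMA_QUANTIZE="${LLAMA_QUANTIZE:-./quantize}"
INPUT="${1:-model.gguf}"   # source model (placeholder default)
TYPE="${2:-q4_0}"          # quantization type, as in the example above

# Derive the output name from the input, e.g. model.gguf becomes model-q4_0.gguf.
OUTPUT="${INPUT%.gguf}-${TYPE}.gguf"

echo "Would run: $LLAMA_QUANTIZE $INPUT $OUTPUT $TYPE"

# Run only when both the binary and the input model are present.
if [[ -x "$LLAMA_QUANTIZE" && -f "$INPUT" ]]; then
  "$LLAMA_QUANTIZE" "$INPUT" "$OUTPUT" "$TYPE" || exit 1
  # wc -c reports each file's size in bytes.
  echo "input:  $(wc -c < "$INPUT") bytes"
  echo "output: $(wc -c < "$OUTPUT") bytes"
fi
```

On newer builds, setting `LLAMA_QUANTIZE=./llama-quantize` should be the only change needed.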
Parameters
- -m _model_
  Path to the GGUF model file.
- -p _prompt_
  Input prompt text.
- -i
  Run in interactive mode.
- -c _size_
  Context size, in tokens.
- -t _threads_
  Number of CPU threads to use.
- -n _tokens_
  Number of tokens to generate.
- --temp _temp_
  Sampling temperature.
- -ngl _layers_
  Number of model layers to offload to the GPU.
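The flags above can be combined in one invocation. The sketch below builds the argument list in a bash array so each flag stays readable; the numeric values, the prompt, and the variable names `LLAMA_BIN` and `LLAMA_MODEL` are arbitrary placeholders chosen for illustration, not recommended settings.

```shell
#!/usr/bin/env bash
# Sketch: combine the documented flags into a single command.
# LLAMA_BIN and LLAMA_MODEL are illustrative names; all values are placeholders.
LLAMA_BIN="${LLAMA_BIN:-./main}"
LLAMA_MODEL="${LLAMA_MODEL:-model.gguf}"

# Flag order: model path, context size, threads, tokens to generate,
# sampling temperature, GPU layers to offload, and the prompt.
args=(
  -m "$LLAMA_MODEL"
  -c 4096
  -t 8
  -n 128
  --temp 0.7
  -ngl 32
  -p "Explain quantization in one sentence."
)

# Print the command for inspection (shown without shell quoting).
echo "Would run: $LLAMA_BIN ${args[*]}"

# Execute only when the binary and the model file actually exist.
if [[ -x "$LLAMA_BIN" && -f "$LLAMA_MODEL" ]]; then
  "$LLAMA_BIN" "${args[@]}"
fi
```

Keeping the arguments in an array preserves the prompt as a single argument even though it contains spaces.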
FAQ
What is the llama.cpp command used for?
llama.cpp runs large language models locally on CPU or GPU. It loads models in GGUF format, supports various quantization formats so that models fit on consumer hardware, and includes tools for model conversion, quantization, and serving.
How do I run a basic llama.cpp example?
Run `./main -m [model.gguf] -i` in a terminal to start an interactive chat, replacing `[model.gguf]` with the path to your model file and adjusting the flags for your system.
What does -m _model_ do in llama.cpp?
It sets the path to the GGUF model file that llama.cpp loads.