Linux command
koboldcpp command
After copying a command, replace the file names, directories, or parameters as needed.
Common examples
Launch the server with a model
koboldcpp --model [path/to/model.gguf]
Launch with NVIDIA CUDA acceleration and offload layers to the GPU
koboldcpp --model [path/to/model.gguf] --usecuda --gpulayers [35]
Launch with Vulkan acceleration on a custom port
koboldcpp --model [path/to/model.gguf] --usevulkan --gpulayers [35] --port [8080]
Run a single prompt
koboldcpp --model [path/to/model.gguf] --prompt "[What is the meaning of life?]"
Launch in CLI interactive mode
koboldcpp --model [path/to/model.gguf] --cli
Load a saved configuration
koboldcpp --config [path/to/config.kcpps]
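The launch examples above differ only in which backend flag and port they pass. As a minimal sketch, the helper below assembles such a command line from a few settings and prints it without running anything. The function name `kcpp_cmd` and its argument order are hypothetical, and it uses only flags listed on this page. It assumes the model path contains no spaces.

```shell
#!/bin/sh
# Print (do not run) a koboldcpp command line built from a few settings.
# Hypothetical helper; usage: kcpp_cmd MODEL [BACKEND] [GPU_LAYERS] [PORT]
kcpp_cmd() {
    model=$1
    backend=${2:-cpu}      # cpu, cuda, or vulkan
    layers=${3:-0}         # passed to --gpulayers for GPU backends
    port=${4:-5001}        # 5001 is the documented default port

    cmd="koboldcpp --model $model"
    case $backend in
        cuda)   cmd="$cmd --usecuda --gpulayers $layers" ;;
        vulkan) cmd="$cmd --usevulkan --gpulayers $layers" ;;
        cpu)    ;;
        *)      echo "unknown backend: $backend" >&2; return 1 ;;
    esac

    # Only add --port when it differs from the default.
    if [ "$port" != "5001" ]; then
        cmd="$cmd --port $port"
    fi
    printf '%s\n' "$cmd"
}

kcpp_cmd model.gguf vulkan 35 8080
```

Printing the command first makes it easy to review before running it; pass the printed line to the shell yourself once it looks right.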
Description
koboldcpp is a self-contained AI text generation server that runs large language models locally. Built on top of llama.cpp, it provides a bundled web UI (KoboldAI Lite) and supports all GGML and GGUF model formats. It requires no external dependencies and runs as a single executable. The server exposes an API compatible with KoboldAI and OpenAI formats, making it usable with a wide range of frontends and applications. It supports CPU inference as well as GPU acceleration through CUDA (NVIDIA), Vulkan (AMD/NVIDIA), and Metal (Apple Silicon). Beyond text generation, koboldcpp supports image generation (Stable Diffusion), speech recognition (Whisper), and text-to-speech, all within the same executable. The bundled web UI offers multiple interaction modes including chat, instruct, adventure, and story writing.
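To illustrate the KoboldAI-compatible API mentioned above, here is a minimal sketch that sends a prompt to a running server with `curl`. It assumes the server listens on `localhost` at the default port 5001 and exposes the KoboldAI-style `/api/v1/generate` endpoint. The function names and the `KCPP_PORT` variable are hypothetical, and the payload builder assumes the prompt contains no characters that need JSON escaping.

```shell
#!/bin/sh
# Build the JSON body for a KoboldAI-style generate request.
# Assumption: the prompt needs no JSON escaping (no quotes or backslashes).
kcpp_payload() {
    printf '{"prompt": "%s", "max_length": %d}' "$1" "${2:-80}"
}

# POST the prompt to a local server. KCPP_PORT is a hypothetical override
# for the port; it falls back to the documented default of 5001.
kcpp_generate() {
    curl -s "http://localhost:${KCPP_PORT:-5001}/api/v1/generate" \
        -H 'Content-Type: application/json' \
        -d "$(kcpp_payload "$1" "$2")"
}

# Example (requires a running server):
#   kcpp_generate "Once upon a time" 50
```

The server replies with JSON; the generated text is expected under `results[0].text`, which a tool such as `jq` can extract.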
Parameters
- --model _path_: Specify the GGUF/GGML model file to load
- --config _file_: Load a .kcpps configuration file
- --usecuda: Enable NVIDIA CUDA GPU acceleration
- --usevulkan: Enable Vulkan GPU acceleration (AMD/NVIDIA)
- --gpulayers _n_: Number of model layers to offload to GPU
- --threads _n_: Set CPU thread count for inference
- --contextsize _n_: Set maximum context length in tokens
- --port _n_: Change server port (default: 5001)
- --host _addr_: Bind to a specific IP address
- --multiuser _n_: Enable multiuser mode with _n_ concurrent slots
- --password _key_: Require API authentication with the given key
- --cli: Launch interactive command-line interface without starting a server
- --prompt _text_: Run a single prompt, print output, and exit
- --benchmark: Run performance benchmarking mode
- --flashattention: Enable flash attention for improved performance
- --smartcontext: Enable smart context handling to reduce reprocessing
- --usemmap: Enable memory-mapped file I/O for model loading
- --usemlock: Force model to remain in RAM (prevent swapping)
- --ssl: Enable SSL for HTTPS connections
- --remotetunnel: Enable remote tunnel access for sharing the server
- --sdmodel _path_: Load a Stable Diffusion model for image generation
- --noavx2: Compatibility mode for older CPUs without AVX2
- --showgui: Show the GUI launcher even when command-line flags are used
- --skiplauncher: Skip the GUI launcher and start the server directly
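When the server is started from a script (for example with `--skiplauncher`, `--host`, and `--port`), the script usually has to wait until the model has finished loading before sending requests. The sketch below polls the server until it answers. It assumes the KoboldAI-style `/api/v1/model` endpoint is available; the function names `kcpp_base_url` and `kcpp_wait` are hypothetical.

```shell
#!/bin/sh
# Build the base URL from a host and port; defaults match the documented
# default port 5001 on the local machine.
kcpp_base_url() {
    printf 'http://%s:%s' "${1:-localhost}" "${2:-5001}"
}

# Poll once per second until the server responds or the retries run out.
# Usage: kcpp_wait [HOST] [PORT] [TRIES]
# Assumption: GET /api/v1/model succeeds once the model is loaded.
kcpp_wait() {
    url="$(kcpp_base_url "$1" "$2")/api/v1/model"
    tries=${3:-30}
    while [ "$tries" -gt 0 ]; do
        if curl -sf "$url" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Example (requires a running or starting server):
#   kcpp_wait 127.0.0.1 8080 60 && echo "server is ready"
```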
FAQ
What is the koboldcpp command used for?
koboldcpp runs large language models locally as a self-contained AI text generation server. It is built on top of llama.cpp, ships as a single executable with a bundled web UI (KoboldAI Lite), loads GGML and GGUF models, and exposes an API compatible with KoboldAI and OpenAI formats. It can also handle image generation (Stable Diffusion), speech recognition (Whisper), and text-to-speech.
How do I run a basic koboldcpp example?
Run `koboldcpp --model [path/to/model.gguf]` in a terminal, replacing the bracketed placeholder with the path to your model file. Add flags such as `--usecuda`, `--gpulayers`, or `--port` as needed for your system.
What does --model _path_ do in koboldcpp?
It specifies the GGUF/GGML model file to load.