Linux command

koboldcpp command

Text

After copying a command, replace the file names, directories, or arguments as needed.

Common examples

Launch the server with a model

koboldcpp --model [path/to/model.gguf]

Launch with NVIDIA CUDA acceleration, offloading 35 layers to the GPU

koboldcpp --model [path/to/model.gguf] --usecuda --gpulayers [35]

Launch with Vulkan acceleration, offloading 35 layers, on port 8080

koboldcpp --model [path/to/model.gguf] --usevulkan --gpulayers [35] --port [8080]

Run a single prompt

koboldcpp --model [path/to/model.gguf] --prompt "[What is the meaning of life?]"

Launch in CLI interactive mode

koboldcpp --model [path/to/model.gguf] --cli

Load a saved configuration

koboldcpp --config [path/to/config.kcpps]
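The three launch examples differ only in their GPU flags. The shell function below is a minimal sketch that picks those flags from the tools installed on the machine. The name `kcpp_gpu_flags` is hypothetical, treating `nvidia-smi` or `vulkaninfo` as evidence of a usable backend is a heuristic assumption, and it assumes your koboldcpp build includes the selected backend.

```shell
# Hypothetical helper: choose GPU offload flags for koboldcpp based on
# which vendor tools are installed. This is a heuristic, not a capability check.
kcpp_gpu_flags() {
  layers=${1:-35}   # layers to offload; tune to your available VRAM
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "--usecuda --gpulayers $layers"
  elif command -v vulkaninfo >/dev/null 2>&1; then
    echo "--usevulkan --gpulayers $layers"
  fi
  # No output means: run on the CPU only.
}
```

After sourcing it, launch with `koboldcpp --model path/to/model.gguf $(kcpp_gpu_flags 35)`. The substitution is left unquoted on purpose so the output splits into separate flags; an empty result falls back to a plain launch without GPU flags.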

Description

koboldcpp is a self-contained AI text generation server that runs large language models locally. Built on top of llama.cpp, it provides a bundled web UI (KoboldAI Lite) and supports all GGML and GGUF model formats. It requires no external dependencies and runs as a single executable. The server exposes an API compatible with KoboldAI and OpenAI formats, making it usable with a wide range of frontends and applications. It supports CPU inference as well as GPU acceleration through CUDA (NVIDIA), Vulkan (AMD/NVIDIA), and Metal (Apple Silicon). Beyond text generation, koboldcpp supports image generation (Stable Diffusion), speech recognition (Whisper), and text-to-speech, all within the same executable. The bundled web UI offers multiple interaction modes including chat, instruct, adventure, and story writing.
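Because the server speaks the KoboldAI API over HTTP, any HTTP client can drive it. The sketch below assumes a server already running with default settings at `http://localhost:5001` and posts to the KoboldAI-style `/api/v1/generate` endpoint, using `max_length` as the number of tokens to generate. The function names and the `KCPP_URL` variable are hypothetical.

```shell
# Sketch: send a prompt to a running koboldcpp server over its
# KoboldAI-style API. Assumes default settings (localhost, port 5001)
# and that curl is installed. KCPP_URL is a hypothetical variable name.
KCPP_URL=${KCPP_URL:-http://localhost:5001}

# Build the JSON request body. Naive: quotes or backslashes inside the
# prompt are not escaped, so use a JSON tool such as jq for arbitrary input.
kcpp_payload() {
  printf '{"prompt": "%s", "max_length": %d}\n' "$1" "${2:-80}"
}

# POST the body to /api/v1/generate and print the raw JSON response.
kcpp_generate() {
  kcpp_payload "$1" "$2" |
    curl -s -H 'Content-Type: application/json' \
      --data-binary @- "$KCPP_URL/api/v1/generate"
}
```

For example, `kcpp_generate "What is the meaning of life?" 100` prints the raw JSON reply, where the generated text should appear under `results`. Clients that expect the OpenAI format would instead target the `/v1/completions` or `/v1/chat/completions` endpoints.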

Options

--model _path_
Specify the GGUF/GGML model file to load
--config _file_
Load a .kcpps configuration file
--usecuda
Enable NVIDIA CUDA GPU acceleration
--usevulkan
Enable Vulkan GPU acceleration (AMD/NVIDIA)
--gpulayers _n_
Number of model layers to offload to GPU
--threads _n_
Set CPU thread count for inference
--contextsize _n_
Set maximum context length in tokens
--port _n_
Change server port (default: 5001)
--host _addr_
Bind to a specific IP address
--multiuser _n_
Enable multiuser mode with _n_ concurrent slots
--password _key_
Require API authentication with the given key
--cli
Launch interactive command-line interface without starting a server
--prompt _text_
Run a single prompt, print output, and exit
--benchmark
Run performance benchmarking mode
--flashattention
Enable flash attention for improved performance
--smartcontext
Enable smart context handling to reduce reprocessing
--usemmap
Enable memory-mapped file I/O for model loading
--usemlock
Force model to remain in RAM (prevent swapping)
--ssl _cert_ _key_
Serve over HTTPS using the given unencrypted PEM certificate and key files
--remotetunnel
Enable remote tunnel access for sharing the server
--sdmodel _path_
Load a Stable Diffusion model for image generation
--noavx2
Compatibility mode for older CPUs without AVX2
--showgui
Show the GUI launcher even when command-line flags are used
--skiplauncher
Skip the GUI launcher and start the server directly
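Several of these options are usually combined. The function below is a sketch of a launcher that sets `--threads`, `--contextsize`, `--port` and `--password` together. The variable names `MODEL`, `PORT`, `API_KEY` and `DRY_RUN`, the context size of 4096, and the rule of leaving one CPU core free are illustrative assumptions, not koboldcpp defaults.

```shell
# Sketch of a launcher combining several koboldcpp options.
# MODEL, PORT, API_KEY and DRY_RUN are hypothetical variable names.
# Set DRY_RUN=1 to print the command line instead of running it.
kcpp_launch() {
  if [ -z "${API_KEY:-}" ]; then
    echo "kcpp_launch: set API_KEY first" >&2
    return 1
  fi
  model=${MODEL:-path/to/model.gguf}
  port=${PORT:-5001}

  # Leave one CPU core free for the rest of the system (minimum 1 thread).
  cores=$(nproc 2>/dev/null || echo 2)
  threads=$((cores - 1))
  [ "$threads" -ge 1 ] || threads=1

  set -- koboldcpp --model "$model" \
    --threads "$threads" --contextsize 4096 \
    --port "$port" --password "$API_KEY"

  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"   # show the assembled command, space-separated
  else
    "$@"        # run koboldcpp in the foreground
  fi
}
```

With `DRY_RUN=1` the function only prints the command line, which makes it easy to check the flags before starting the server.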

FAQ

What is the koboldcpp command used for?

It runs large language models locally as a self-contained, single-executable text generation server with a bundled web UI and KoboldAI- and OpenAI-compatible APIs. See the description above for supported GPU backends and the extra image and speech features.

How do I run a basic koboldcpp example?

Run `koboldcpp --model [path/to/model.gguf]` in a terminal, replacing the bracketed path with your model file. The server listens on port 5001 by default, so the bundled web UI is then reachable at http://localhost:5001.
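When koboldcpp is started from a script, the model can take a while to load before the server answers. This sketch polls the server URL about once per second with curl until it responds or a retry budget runs out. The function name `kcpp_wait` and its defaults are hypothetical, and the default URL assumes port 5001.

```shell
# Sketch: wait until a koboldcpp server answers HTTP requests.
# Assumes curl is installed; URL and retry count are optional arguments.
kcpp_wait() {
  url=${1:-http://localhost:5001}
  tries=${2:-60}   # roughly one attempt per second
  while [ "$tries" -gt 0 ]; do
    if curl -sf -o /dev/null "$url"; then
      return 0     # server responded successfully
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1         # gave up after all retries
}
```

For example, start the server in the background with `koboldcpp --model path/to/model.gguf &`, then `kcpp_wait && xdg-open http://localhost:5001` opens the web UI once it is reachable.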

What does --model _path_ do in koboldcpp?

It specifies the path to the GGUF or GGML model file to load.