koboldcpp Command: Examples, Options, and Usage

Common Examples

Launch the server with a model

koboldcpp --model [path/to/model.gguf]

Launch with NVIDIA CUDA acceleration, offloading model layers to the GPU

koboldcpp --model [path/to/model.gguf] --usecuda --gpulayers [35]

Launch with Vulkan acceleration on a custom port

koboldcpp --model [path/to/model.gguf] --usevulkan --gpulayers [35] --port [8080]

Run a single prompt

koboldcpp --model [path/to/model.gguf] --prompt "[What is the meaning of life?]"

Launch in CLI interactive mode

koboldcpp --model [path/to/model.gguf] --cli

Load a saved configuration

koboldcpp --config [path/to/config.kcpps]

Description

koboldcpp is a self-contained AI text generation server that runs large language models locally. Built on top of llama.cpp, it provides a bundled web UI (KoboldAI Lite) and supports all GGML and GGUF model formats. It requires no external dependencies and runs as a single executable. The server exposes an API compatible with KoboldAI and OpenAI formats, making it usable with a wide range of frontends and applications. It supports CPU inference as well as GPU acceleration through CUDA (NVIDIA), Vulkan (AMD/NVIDIA), and Metal (Apple Silicon). Beyond text generation, koboldcpp supports image generation (Stable Diffusion), speech recognition (Whisper), and text-to-speech, all within the same executable. The bundled web UI offers multiple interaction modes including chat, instruct, adventure, and story writing.
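The API described above can be exercised from a terminal once the server is running. The following is a minimal sketch, not a definitive client: it assumes the server listens on the default port 5001 on localhost and that the KoboldAI-style generate endpoint is reachable at /api/v1/generate with `prompt` and `max_length` fields. The names KOBOLD_URL, kobold_payload, and kobold_generate are hypothetical helpers introduced only for illustration.

```shell
# Assumed base URL: 5001 is the default port; adjust if --port or --host was used.
KOBOLD_URL="${KOBOLD_URL:-http://localhost:5001}"

# Build a minimal JSON request body (hypothetical helper).
# This simple sketch does not escape quotes or backslashes in the prompt.
kobold_payload() {
  printf '{"prompt": "%s", "max_length": %d}' "$1" "${2:-64}"
}

# Send the request to the assumed generate endpoint (requires curl and a running server).
kobold_generate() {
  curl -s -X POST "$KOBOLD_URL/api/v1/generate" \
    -H "Content-Type: application/json" \
    -d "$(kobold_payload "$1" "$2")"
}

# Usage, once the server is running:
#   kobold_generate "Once upon a time" 80
```

Keeping the payload builder separate from the network call makes it possible to inspect the request body without a running server. If --password is set, the request would also need to carry the key; how it is passed is not covered here.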

Options

--model _path_: Specify the GGUF/GGML model file to load
--config _file_: Load a .kcpps configuration file
--usecuda: Enable NVIDIA CUDA GPU acceleration
--usevulkan: Enable Vulkan GPU acceleration (AMD/NVIDIA)
--gpulayers _n_: Number of model layers to offload to GPU
--threads _n_: Set CPU thread count for inference
--contextsize _n_: Set maximum context length in tokens
--port _n_: Change server port (default: 5001)
--host _addr_: Bind to a specific IP address
--multiuser _n_: Enable multiuser mode, queuing requests from up to _n_ concurrent users
--password _key_: Require API authentication with the given key
--cli: Launch interactive command-line interface without starting a server
--prompt _text_: Run a single prompt, print output, and exit
--benchmark: Run performance benchmarking mode
--flashattention: Enable flash attention for improved performance
--smartcontext: Enable smart context handling to reduce reprocessing
--usemmap: Enable memory-mapped file I/O for model loading
--usemlock: Force model to remain in RAM (prevent swapping)
--ssl _cert_ _key_: Serve over HTTPS using the given certificate and key .pem files
--remotetunnel: Enable remote tunnel access for sharing the server
--sdmodel _path_: Load a Stable Diffusion model for image generation
--noavx2: Compatibility mode for older CPUs without AVX2
--showgui: Show the GUI launcher even when command-line flags are used
--skiplauncher: Skip the GUI launcher and start the server directly
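
Several of the options above are usually combined on one command line. The sketch below shows one possible way to assemble such a command in a shell function and print it as a dry run. The function name build_kobold_cmd and the context size of 4096 are illustrative assumptions, and only flags from the list above are used.

```shell
# Hypothetical wrapper: assemble a koboldcpp command line from two arguments.
#   $1 = path to the model file
#   $2 = number of layers to offload to the GPU (optional)
build_kobold_cmd() {
  model="$1"
  gpu_layers="${2:-}"
  # Base command; 4096 is only an example context size.
  set -- koboldcpp --model "$model" --contextsize 4096
  # Add CUDA offloading only when a layer count was given.
  if [ -n "$gpu_layers" ]; then
    set -- "$@" --usecuda --gpulayers "$gpu_layers"
  fi
  # Dry run: print the command instead of executing it.
  printf '%s\n' "$*"
}

build_kobold_cmd model.gguf 35
```

Printing the command first makes it easy to check the flags before starting the server. To run it for real, the final printf could be replaced with "$@", which also preserves paths containing spaces.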

FAQ

What is the koboldcpp command used for?

koboldcpp runs large language models locally as a self-contained text generation server. It is built on llama.cpp, bundles the KoboldAI Lite web UI, loads GGML and GGUF models, and exposes an API compatible with the KoboldAI and OpenAI formats. It can also handle image generation (Stable Diffusion), speech recognition (Whisper), and text-to-speech from the same executable.

How do I run a basic koboldcpp example?

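Run `koboldcpp --model [path/to/model.gguf]` in a terminal, replacing the placeholder with the path to your model file. The server listens on port 5001 by default. Add flags such as `--usecuda`, `--gpulayers`, or `--port` as needed for your system.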

What does --model _path_ do in koboldcpp?

It specifies the GGUF or GGML model file that koboldcpp loads at startup.
