vllm.entrypoints.openai.cli_args
This file contains the command line arguments for vLLM's OpenAI-compatible server. It is kept in a separate file for documentation purposes.
 
FrontendArgs ¶
 Arguments for the OpenAI-compatible frontend server.
Source code in vllm/entrypoints/openai/cli_args.py
 class-attribute instance-attribute  ¶
 allow_credentials: bool = False
Allow credentials.
 class-attribute instance-attribute  ¶
allowed_headers: list[str] = ['*']
Allowed headers.
 class-attribute instance-attribute  ¶
allowed_methods: list[str] = ['*']
Allowed methods.
 class-attribute instance-attribute  ¶
allowed_origins: list[str] = ['*']
Allowed origins.
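Together with allow_credentials, the allowed_* options above are CORS settings. A minimal sketch of how such flags are typically wired into a FastAPI app via Starlette's CORSMiddleware (the literal values below are illustrative stand-ins for the parsed CLI values):

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Mirror the CORS flags onto the middleware; each literal stands in for
# the corresponding parsed CLI value.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],      # --allowed-origins
    allow_credentials=False,  # --allow-credentials
    allow_methods=["*"],      # --allowed-methods
    allow_headers=["*"],      # --allowed-headers
)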
 class-attribute instance-attribute  ¶
api_key: Optional[list[str]] = None
If provided, the server will require one of these keys to be presented in the header.
 class-attribute instance-attribute  ¶
chat_template: Optional[str] = None
The file path to the chat template, or the template in single-line form for the specified model.
 class-attribute instance-attribute  ¶
 chat_template_content_format: ChatTemplateContentFormatOption = "auto"
The format to render message content within a chat template.
- "string" will render the content as a string. Example: "Hello World"
- "openai" will render the content as a list of dictionaries, similar to OpenAI schema. Example: [{"type": "text", "text": "Hello world!"}]
 class-attribute instance-attribute  ¶
 disable_fastapi_docs: bool = False
Disable FastAPI's OpenAPI schema, Swagger UI, and ReDoc endpoint.
 class-attribute instance-attribute  ¶
 disable_frontend_multiprocessing: bool = False
If specified, will run the OpenAI frontend server in the same process as the model serving engine.
 class-attribute instance-attribute  ¶
 disable_uvicorn_access_log: bool = False
Disable uvicorn access log.
 class-attribute instance-attribute  ¶
 enable_auto_tool_choice: bool = False
Enable auto tool choice for supported models. Use --tool-call-parser to specify which parser to use.
 class-attribute instance-attribute  ¶
 enable_force_include_usage: bool = False
If set to True, include usage on every request.
 class-attribute instance-attribute  ¶
 enable_log_outputs: bool = False
If set to True, enable logging of model outputs (generations) in addition to the input logging that is enabled by default.
 class-attribute instance-attribute  ¶
 enable_prompt_tokens_details: bool = False
If set to True, enable prompt_tokens_details in usage.
 class-attribute instance-attribute  ¶
 enable_request_id_headers: bool = False
If specified, API server will add X-Request-Id header to responses. Caution: this hurts performance at high QPS.
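A quick way to observe the header, assuming a local server started with --enable-request-id-headers and the third-party requests package:

import requests

# Any response from the server should now carry an X-Request-Id header.
resp = requests.get("http://localhost:8000/v1/models")
print(resp.headers.get("X-Request-Id"))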
 class-attribute instance-attribute  ¶
 enable_server_load_tracking: bool = False
If set to True, enable tracking server_load_metrics in the app state.
 class-attribute instance-attribute  ¶
 enable_ssl_refresh: bool = False
Refresh the SSL context when SSL certificate files change.
 class-attribute instance-attribute  ¶
 enable_tokenizer_info_endpoint: bool = False
Enable the /get_tokenizer_info endpoint. May expose chat templates and other tokenizer configuration.
 class-attribute instance-attribute  ¶
 exclude_tools_when_tool_choice_none: bool = False
If specified, exclude tool definitions in prompts when tool_choice='none'.
 class-attribute instance-attribute  ¶
 log_config_file: Optional[str] = VLLM_LOGGING_CONFIG_PATH
Path to the logging config JSON file for both vLLM and Uvicorn.
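A sketch of a minimal config file in the stdlib logging.config.dictConfig JSON schema, which is the shape such files use (formatter, handler, and level values here are illustrative):

import json

log_config = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {"format": "%(asctime)s %(levelname)s %(name)s: %(message)s"},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "plain",
            "stream": "ext://sys.stdout",
        },
    },
    "loggers": {
        "vllm": {"handlers": ["console"], "level": "INFO"},
        "uvicorn": {"handlers": ["console"], "level": "INFO"},
    },
}

with open("logging_config.json", "w") as f:
    json.dump(log_config, f, indent=2)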
 class-attribute instance-attribute  ¶
 lora_modules: Optional[list[LoRAModulePath]] = None
LoRA module configurations, in either 'name=path' format, JSON format, or JSON list format.
Example (old format): 'name=path'
Example (new format): {"name": "name", "path": "lora_path", "base_model_name": "id"}
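Both spellings side by side (adapter name, path, and base model below are illustrative):

import json

# Old 'name=path' format:
old_format = "sql-lora=/path/to/sql_lora_adapter"

# New JSON format; base_model_name is optional.
new_format = json.dumps({
    "name": "sql-lora",
    "path": "/path/to/sql_lora_adapter",
    "base_model_name": "meta-llama/Llama-2-7b-hf",
})

# Either string is a valid value for --lora-modules.
print(old_format)
print(new_format)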
 class-attribute instance-attribute  ¶
max_log_len: Optional[int] = None
Maximum number of prompt characters or prompt token IDs printed in the log. The default of None means unlimited.
 class-attribute instance-attribute  ¶
middleware: list[str] = []
Additional ASGI middleware to apply to the app. We accept multiple --middleware arguments. The value should be an import path. If a function is provided, vLLM will add it to the server using @app.middleware('http'). If a class is provided, vLLM will add it to the server using app.add_middleware().
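A sketch of the two shapes --middleware accepts, assuming a hypothetical module my_middleware on the server's import path; vLLM dispatches on whether the imported object is a function or a class:

# Function form: registered via @app.middleware('http').
async def log_requests(request, call_next):
    # Pass-through HTTP middleware; add logging or header edits here.
    response = await call_next(request)
    return response

# Class form: registered via app.add_middleware(); plain ASGI signature.
class PassthroughMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        await self.app(scope, receive, send)

These would then be passed as --middleware my_middleware.log_requests or --middleware my_middleware.PassthroughMiddleware (module name hypothetical).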
 class-attribute instance-attribute  ¶
 response_role: str = 'assistant'
The role name to return if request.add_generation_prompt=true.
 class-attribute instance-attribute  ¶
 return_tokens_as_token_ids: bool = False
When --max-logprobs is specified, represents single tokens as strings of the form 'token_id:{token_id}' so that tokens that are not JSON-encodable can be identified.
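An illustration of the encoding (the token ID is arbitrary):

token_id = 15339
encoded = f"token_id:{token_id}"
assert encoded == "token_id:15339"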
 class-attribute instance-attribute  ¶
root_path: Optional[str] = None
FastAPI root_path when app is behind a path-based routing proxy.
 class-attribute instance-attribute  ¶
ssl_ca_certs: Optional[str] = None
The CA certificates file.
 class-attribute instance-attribute  ¶
ssl_cert_reqs: int = int(CERT_NONE)
Whether client certificate is required (see the stdlib ssl module's documentation).
 class-attribute instance-attribute  ¶
ssl_certfile: Optional[str] = None
The file path to the SSL cert file.
 class-attribute instance-attribute  ¶
ssl_keyfile: Optional[str] = None
The file path to the SSL key file.
 class-attribute instance-attribute  ¶
tool_call_parser: Optional[str] = None
Select the tool call parser depending on the model that you're using. This is used to parse the model-generated tool call into OpenAI API format. Required for --enable-auto-tool-choice. You can choose any option from the built-in parsers or register a plugin via --tool-parser-plugin.
 class-attribute instance-attribute  ¶
 tool_parser_plugin: str = ''
Specify the tool parser plugin used to parse model-generated tool calls into the OpenAI API format; the parser names registered in this plugin can then be used in --tool-call-parser.
 class-attribute instance-attribute  ¶
tool_server: Optional[str] = None
Comma-separated list of host:port pairs (IPv4, IPv6, or hostname). Examples: 127.0.0.1:8000, [::1]:8000, localhost:1234. Or 'demo' for demo purposes.
 class-attribute instance-attribute  ¶
uds: Optional[str] = None
Unix domain socket path. If set, host and port arguments are ignored.
 class-attribute instance-attribute  ¶
 uvicorn_log_level: Literal[
    "debug", "info", "warning", "error", "critical", "trace"
] = "info"
Log level for uvicorn.
 staticmethod  ¶
 add_cli_args(
    parser: FlexibleArgumentParser,
) -> FlexibleArgumentParser
Source code in vllm/entrypoints/openai/cli_args.py
  
LoRAParserAction ¶
 Bases: Action
Source code in vllm/entrypoints/openai/cli_args.py
  
 __call__(
    parser: ArgumentParser,
    namespace: Namespace,
    values: Optional[Union[str, Sequence[str]]],
    option_string: Optional[str] = None,
)
Source code in vllm/entrypoints/openai/cli_args.py
  
 create_parser_for_docs() -> FlexibleArgumentParser
 
 make_arg_parser(
    parser: FlexibleArgumentParser,
) -> FlexibleArgumentParser
Create the CLI argument parser used by the OpenAI API server.
We rely on the helper methods of FrontendArgs and AsyncEngineArgs to register all arguments instead of manually enumerating them here. This avoids code duplication and keeps the argument definitions in one place.
Source code in vllm/entrypoints/openai/cli_args.py
  
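A usage sketch, assuming vLLM is installed and FlexibleArgumentParser is importable from vllm.utils:

from vllm.entrypoints.openai.cli_args import make_arg_parser
from vllm.utils import FlexibleArgumentParser

# Frontend args (this module) plus the AsyncEngineArgs engine flags.
parser = make_arg_parser(FlexibleArgumentParser())

args = parser.parse_args([
    "--model", "facebook/opt-125m",
    "--chat-template-content-format", "openai",
    "--uvicorn-log-level", "debug",
])
print(args.chat_template_content_format)  # -> 'openai'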
 validate_parsed_serve_args(args: Namespace)
Quick checks for model serve args that raise prior to loading.
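For example, --enable-auto-tool-choice without --tool-call-parser should be rejected here rather than after the model loads (a sketch, building the parser as in the previous example):

from vllm.entrypoints.openai.cli_args import (make_arg_parser,
                                              validate_parsed_serve_args)
from vllm.utils import FlexibleArgumentParser

parser = make_arg_parser(FlexibleArgumentParser())
args = parser.parse_args([
    "--model", "facebook/opt-125m",
    "--enable-auto-tool-choice",  # missing --tool-call-parser
])

try:
    validate_parsed_serve_args(args)
except Exception as e:  # raised before any model loading happens
    print(f"rejected early: {e}")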