vllm.engine.multiprocessing
Modules:
| Name | Description |
|---|---|
| client | |
| engine | |
REQUEST_OUTPUTS_T module-attribute
REQUEST_OUTPUTS_T = Union[
    List[RequestOutput],
    RPCAdapterLoadedResponse,
    RPCIsSleepingResponse,
    RPCError,
]
RPC_REQUEST_T module-attribute
RPC_REQUEST_T = Union[
    RPCProcessRequest,
    RPCAbortRequest,
    RPCStartupRequest,
    RPCUProfileRequest,
    RPCLoadAdapterRequest,
    RPCResetMultiModalCacheRequest,
    RPCResetPrefixCacheRequest,
    RPCSleepRequest,
    RPCWakeUpRequest,
    RPCIsSleepingRequest,
]
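Because RPC_REQUEST_T is a Union, code that receives a request has to pick a branch from the concrete type. The following is a minimal sketch of that pattern, not vLLM's own handler: the two dataclasses are local stand-ins (their `request_id` field is an assumption made for illustration), the union is cut down to two members, and `describe` is a hypothetical function.

```python
from dataclasses import dataclass
from typing import Union


# Local stand-ins so the sketch runs without vLLM installed.
# The request_id field is an assumption made for illustration.
@dataclass
class RPCAbortRequest:
    request_id: str


@dataclass
class RPCIsSleepingRequest:
    request_id: str


# Reduced version of the documented union, for illustration only.
RPC_REQUEST_T = Union[RPCAbortRequest, RPCIsSleepingRequest]


def describe(request: RPC_REQUEST_T) -> str:
    """Hypothetical handler: choose a branch from the concrete request type."""
    if isinstance(request, RPCAbortRequest):
        return f"abort {request.request_id}"
    if isinstance(request, RPCIsSleepingRequest):
        return f"is_sleeping {request.request_id}"
    # Anything outside the union is rejected explicitly.
    raise TypeError(f"unsupported request type: {type(request).__name__}")


print(describe(RPCAbortRequest("req-1")))
```

Raising on an unknown type keeps the handler honest when a new member is added to the union but no branch is written for it.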
MQEngineDeadError
Bases: RuntimeError
RPCAbortRequest dataclass
RPCAdapterLoadedResponse dataclass
RPCError dataclass
Source code in vllm/engine/multiprocessing/__init__.py
RPCIsSleepingRequest dataclass
RPCIsSleepingResponse dataclass
RPCLoadAdapterRequest dataclass
RPCProcessRequest dataclass
lora_request class-attribute instance-attribute
lora_request: Optional[LoRARequest] = None
trace_headers class-attribute instance-attribute
__init__
__init__(
    prompt: PromptType,
    params: Union[SamplingParams, PoolingParams],
    request_id: str,
    lora_request: Optional[LoRARequest] = None,
    trace_headers: Optional[Mapping[str, str]] = None,
    priority: int = 0,
) -> None
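To make the defaults in the signature above concrete, here is a small sketch using a stand-in dataclass with the same parameter names. `ProcessRequestSketch` is hypothetical, and `prompt`, `params`, and `lora_request` are typed loosely because the real `PromptType`, `SamplingParams`/`PoolingParams`, and `LoRARequest` come from vLLM.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ProcessRequestSketch:
    """Hypothetical stand-in that mirrors the documented parameter list."""

    prompt: str                    # stand-in for PromptType
    params: dict                   # stand-in for SamplingParams / PoolingParams
    request_id: str
    lora_request: Optional[object] = None   # stand-in for LoRARequest
    trace_headers: Optional[Mapping[str, str]] = None
    priority: int = 0


# Only the three required arguments are supplied; the rest take defaults.
req = ProcessRequestSketch(
    prompt="Hello",
    params={"max_tokens": 16},
    request_id="req-0",
)
print(req.lora_request, req.trace_headers, req.priority)
```

Only `prompt`, `params`, and `request_id` are required; the remaining three fields fall back to `None`, `None`, and `0`.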
RPCResetMultiModalCacheRequest
RPCResetPrefixCacheRequest dataclass
RPCSleepRequest
RPCStartupRequest
RPCStartupResponse dataclass
RPCUProfileRequest
RPCWakeUpRequest dataclass
ENGINE_DEAD_ERROR
ENGINE_DEAD_ERROR(
    error: Optional[BaseException] = None,
) -> MQEngineDeadError
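The signature suggests a factory that takes an optional original exception and returns an MQEngineDeadError. The sketch below shows one way such a factory could behave. The message text and the lower-case name `engine_dead_error` are assumptions, and the exception class is a local stand-in that shares only the documented RuntimeError base.

```python
from typing import Optional


class MQEngineDeadError(RuntimeError):
    """Local stand-in for the documented exception (Bases: RuntimeError)."""


def engine_dead_error(
    error: Optional[BaseException] = None,
) -> MQEngineDeadError:
    # Assumed behaviour: include the original error in the message if given.
    if error is None:
        return MQEngineDeadError("engine loop is not running")
    return MQEngineDeadError(f"engine loop is not running: {error!r}")


try:
    raise engine_dead_error(ValueError("out of memory"))
except RuntimeError as exc:  # the RuntimeError base makes this catch work
    caught = exc

print(type(caught).__name__)
```

Because the exception derives from RuntimeError, callers that already catch RuntimeError also catch this error without importing the specific class.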