vllm.v1.executor.ray_distributed_executor
 
FutureWrapper

  Bases: Future

A wrapper around a Ray output reference that meets the interface of .execute_model(): the top level (the core busy loop) expects the .result() API to block and return a single output.

If an aggregator is provided, the outputs from all workers are aggregated upon the result() call; if not, only the first worker's output is returned.
__init__(refs, aggregator: Optional[KVOutputAggregator] = None)
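To make the .result() contract concrete, here is a minimal sketch of the wrapper pattern, not the vLLM source: the class name FutureWrapperSketch, the list-of-refs handling, and the aggregator's aggregate() method are illustrative assumptions.

```python
# Minimal sketch of the wrapper pattern described above, NOT the vLLM
# implementation: the class name, list-of-refs handling, and aggregate()
# call are assumptions for illustration.
from concurrent.futures import Future

import ray


class FutureWrapperSketch(Future):
    """Blocks on Ray object references when result() is called."""

    def __init__(self, refs, aggregator=None):
        super().__init__()
        self.refs = refs
        self.aggregator = aggregator

    def result(self, timeout=None):
        if timeout is not None:
            raise NotImplementedError("timeout is not supported")
        # ray.get() blocks until the remote workers have produced outputs.
        outputs = ray.get(self.refs)
        if self.aggregator is not None:
            # Merge outputs from all workers (assumed aggregator API).
            return self.aggregator.aggregate(outputs)
        # Without an aggregator, only the first worker's output is returned.
        return outputs[0]
```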
 
RayDistributedExecutor

  Bases: RayDistributedExecutor, Executor
Ray distributed executor using Ray Compiled Graphs.
max_concurrent_batches: int (property)

The Ray distributed executor supports pipeline parallelism, meaning it allows up to PP-size batches to be executed concurrently.
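A hypothetical busy-loop sketch of why this property matters: with pipeline parallelism of size PP, up to PP batches can be in flight at once, one per pipeline stage. The `scheduler` object and its has_work()/schedule()/update() methods are assumed names, and the sketch assumes execute_model() returns Future objects (the pipeline-parallel case).

```python
# Hypothetical sketch: keep up to max_concurrent_batches in flight so
# every pipeline stage stays busy. `scheduler` and its methods are
# assumed names, not vLLM APIs.
from collections import deque


def busy_loop_sketch(executor, scheduler):
    in_flight: deque = deque()
    while scheduler.has_work():
        # Launch batches until every pipeline stage is occupied.
        while len(in_flight) < executor.max_concurrent_batches:
            in_flight.append(executor.execute_model(scheduler.schedule()))
        # Block on the oldest batch; the newer ones keep the later
        # pipeline stages busy in the meantime.
        scheduler.update(in_flight.popleft().result())
```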
execute_model(scheduler_output) -> Union[ModelRunnerOutput, Future[ModelRunnerOutput]]
Execute the model on the Ray workers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| scheduler_output | | The scheduler output to execute. | required |
Returns:
| Type | Description | 
|---|---|
| Union[ModelRunnerOutput, Future[ModelRunnerOutput]] | The model runner output. | 
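An illustrative caller handling both possible return types; `executor` and `scheduler_output` are assumed to come from the engine core and its scheduler.

```python
# Illustrative caller for execute_model(): the result is either a plain
# ModelRunnerOutput or a Future that must be resolved with .result().
# `executor` and `scheduler_output` are assumed names.
from concurrent.futures import Future

maybe_output = executor.execute_model(scheduler_output)
if isinstance(maybe_output, Future):
    # Asynchronous (pipeline-parallel) path: block for the result.
    model_runner_output = maybe_output.result()
else:
    # Synchronous path: the output is already a ModelRunnerOutput.
    model_runner_output = maybe_output
```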
reinitialize_distributed(reconfig_request: ReconfigureDistributedRequest) -> None