vllm.distributed.device_communicators.all2all
 
DeepEPAll2AllManagerBase

  Bases: All2AllManagerBase

All2All communication based on DeepEP High-Throughput kernels.
DeepEPHTAll2AllManager

  Bases: DeepEPAll2AllManagerBase

All2All communication based on DeepEP High-Throughput kernels.
DeepEPLLAll2AllManager

  Bases: DeepEPAll2AllManagerBase

All2All communication based on DeepEP Low-Latency kernels.
 _make_all2all_kwargs(
    max_num_tokens_per_dp_rank: int,
    token_hidden_size: int,
    num_ep_ranks: int,
    num_global_experts: int,
    num_local_experts: int,
) -> dict[Any, Any]
  max_num_tokens_per_dp_rank: the maximum number of tokens a DP rank can dispatch; all the ranks must hold the same value.
  token_hidden_size: the hidden dimension of each token.
  num_ep_ranks: the number of EP group ranks.
  num_global_experts: the number of experts in the model.
  num_local_experts: the number of experts in an EP rank.
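A hedged sketch of how these arguments might be filled in. The numeric values below are purely illustrative (at runtime they come from the model and parallel configuration), and manager stands for a DeepEPLLAll2AllManager instance:

# Illustrative values only; not taken from any real configuration.
kwargs = manager._make_all2all_kwargs(
    max_num_tokens_per_dp_rank=256,  # per-DP-rank dispatch budget (identical on every rank)
    token_hidden_size=4096,          # hidden dimension of each token
    num_ep_ranks=8,                  # size of the expert-parallel group
    num_global_experts=64,           # experts in the whole model
    num_local_experts=8,             # experts hosted on each EP rank
)
# The returned dict is used when sizing and constructing the underlying DeepEP buffers.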
The kwargs for DeepEPLLAll2AllManager are dictated by _make_all2all_kwargs.
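Under the assumption (not shown in this extract) that the handle is requested through the manager's get_handle(kwargs) entry point, as with the other managers in this module, the keys of that dict would mirror the _make_all2all_kwargs parameters above:

# Hypothetical call site, reusing the illustrative values from the previous sketch.
handle = manager.get_handle(dict(
    max_num_tokens_per_dp_rank=256,
    token_hidden_size=4096,
    num_ep_ranks=8,
    num_global_experts=64,
    num_local_experts=8,
))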
NaiveAll2AllManager

  Bases: All2AllManagerBase

A naive implementation of all2all communication. It uses all-reduce under the hood, which is not efficient at all. Its main purpose is testing and debugging.
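A minimal, self-contained sketch of the all-reduce trick (not the manager's actual implementation): each rank writes its tokens into its own slice of a zero-initialized global buffer, and a single sum all-reduce makes every rank see all tokens.

import torch
import torch.distributed as dist

def naive_all2all_gather(local_tokens: torch.Tensor, group=None) -> torch.Tensor:
    # Assumes an initialized process group and that every rank contributes
    # the same number of tokens; both are simplifications for illustration.
    world_size = dist.get_world_size(group)
    rank = dist.get_rank(group)
    num_tokens, hidden = local_tokens.shape
    buffer = torch.zeros(world_size * num_tokens, hidden,
                         dtype=local_tokens.dtype,
                         device=local_tokens.device)
    # Each rank fills only its own slice; everything else stays zero.
    buffer[rank * num_tokens:(rank + 1) * num_tokens] = local_tokens
    # Summing the zero-padded buffers reconstructs every rank's tokens everywhere.
    dist.all_reduce(buffer, op=dist.ReduceOp.SUM, group=group)
    return buffer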
PPLXAll2AllManager

  Bases: All2AllManagerBase

All2All communication based on PPLX kernels.