vllm.model_executor.layers.fused_moe.flashinfer_cutlass_prepare_finalize
 
 FlashInferCutlassMoEPrepareAndFinalize

  Bases: FusedMoEPrepareAndFinalize
  
 __init__(
    quant_dtype: Optional[dtype] = None,
    per_channel_quant: bool = False,
    block_shape: Optional[list[int]] = None,
    num_dispatchers: int = 1,
)
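
A minimal construction sketch. The class and import path are assumed to match this module's export (FlashInferCutlassMoEPrepareAndFinalize), and the argument values are illustrative only; every default mirrors the signature above.

import torch

from vllm.model_executor.layers.fused_moe.flashinfer_cutlass_prepare_finalize import (
    FlashInferCutlassMoEPrepareAndFinalize,  # assumed export name for this module's class
)

prep_finalize = FlashInferCutlassMoEPrepareAndFinalize(
    quant_dtype=torch.float8_e4m3fn,  # assumed FP8 activation dtype; default is None (no quantization)
    per_channel_quant=False,
    block_shape=None,                 # no block-wise quantization
    num_dispatchers=1,
)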
  
 finalize(
    output: Tensor,
    fused_expert_output: Tensor,
    topk_weights: Tensor,
    topk_ids: Tensor,
    apply_router_weight_on_input: bool,
    weight_and_reduce_impl: TopKWeightAndReduce,
    extra_finalize_args: Optional[dict[str, Any]],
) -> None
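
Because finalize() returns None, the weighted and reduced expert results are presumably written into the preallocated output tensor. The sketch below wraps the call in a hypothetical helper so the caller supplies the framework objects (the TopKWeightAndReduce implementation and the tensor produced by the fused-experts stage); arguments are passed by keyword for clarity, and the helper name is an assumption.

import torch

def combine_expert_outputs(
    prep_finalize,                      # the object constructed above
    output: torch.Tensor,               # preallocated [num_tokens, hidden_size] result buffer
    fused_expert_output: torch.Tensor,  # output of the fused-experts stage
    topk_weights: torch.Tensor,         # [num_tokens, topk] router weights
    topk_ids: torch.Tensor,             # [num_tokens, topk] selected expert ids
    weight_and_reduce,                  # a TopKWeightAndReduce implementation
) -> torch.Tensor:
    # Apply the router weights and reduce the per-expert results into `output`.
    prep_finalize.finalize(
        output=output,
        fused_expert_output=fused_expert_output,
        topk_weights=topk_weights,
        topk_ids=topk_ids,
        apply_router_weight_on_input=False,  # weights applied here, not on the input
        weight_and_reduce_impl=weight_and_reduce,
        extra_finalize_args=None,
    )
    return output  # finalize() returns None; results land in `output`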
  
    
 prepare(
    a1: Tensor,
    a1_scale: Optional[Tensor],
    a2_scale: Optional[Tensor],
    topk_weights: Tensor,
    topk_ids: Tensor,
    num_experts: int,
    expert_map: Optional[Tensor],
    apply_router_weight_on_input: bool,
    quant_config: FusedMoEQuantConfig,
    extra_prepare_args: Optional[dict[str, Any]],
) -> tuple[
    Tensor,
    Optional[Tensor],
    Optional[Tensor],
    Optional[Tensor],
    Optional[Tensor],
]
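
prepare() runs before the fused experts: it (optionally) quantizes the activations a1 and returns a five-element tuple. Based on the base-class convention, the first two entries are the (possibly quantized) activations and their scales, while the remaining optional entries carry dispatch metadata such as per-expert token counts and remapped top-k ids and weights; that reading, the element names, and the helper below are assumptions, not this module's code.

import torch

def dispatch_activations(
    prep_finalize,
    a1: torch.Tensor,            # [num_tokens, hidden_size] input activations
    topk_weights: torch.Tensor,  # [num_tokens, topk]
    topk_ids: torch.Tensor,      # [num_tokens, topk]
    num_experts: int,
    quant_config,                # a FusedMoEQuantConfig instance
):
    # Quantize/dispatch the activations ahead of the fused-experts stage.
    (a1q, a1q_scale, expert_num_tokens,
     expert_topk_ids, expert_topk_weights) = prep_finalize.prepare(
        a1=a1,
        a1_scale=None,           # let prepare() derive activation scales dynamically
        a2_scale=None,
        topk_weights=topk_weights,
        topk_ids=topk_ids,
        num_experts=num_experts,
        expert_map=None,         # no expert-parallel remapping
        apply_router_weight_on_input=False,
        quant_config=quant_config,
        extra_prepare_args=None,
    )
    # a1q / a1q_scale feed the fused expert kernel; its result is then
    # handed to finalize() above together with the top-k metadata.
    return a1q, a1q_scale, expert_num_tokens, expert_topk_ids, expert_topk_weights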