vllm.model_executor.layers.quantization.utils.flashinfer_fp4_moe
Utility helpers for the NVFP4 + FlashInfer fused-MoE path.
module-attribute
 __all__ = [
    "is_flashinfer_fp4_cutlass_moe_available",
    "reorder_w1w3_to_w3w1",
    "build_flashinfer_fp4_cutlass_moe_kernel",
    "flashinfer_fp4_cutlass_moe_forward",
]
 
 build_flashinfer_fp4_cutlass_moe_kernel(
    moe_parallel_config: FusedMoEParallelConfig,
) -> FusedMoEModularKernel
Create and return a FlashInfer CUTLASS fused-MoE modular kernel.
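A minimal construction sketch, assuming a `FusedMoEParallelConfig` has already been built by the surrounding fused-MoE layer setup; the import path for the config class is an assumption:

```python
# Assumed import path for the parallel config class.
from vllm.model_executor.layers.fused_moe import FusedMoEParallelConfig
from vllm.model_executor.layers.quantization.utils.flashinfer_fp4_moe import (
    build_flashinfer_fp4_cutlass_moe_kernel,
    is_flashinfer_fp4_cutlass_moe_available,
)


def make_fp4_moe_kernel(moe_parallel_config: FusedMoEParallelConfig):
    """Build the FlashInfer CUTLASS fused-MoE kernel if the platform supports it."""
    if not is_flashinfer_fp4_cutlass_moe_available():
        raise RuntimeError("FlashInfer CUTLASS NVFP4 MoE kernels are unavailable")
    return build_flashinfer_fp4_cutlass_moe_kernel(
        moe_parallel_config=moe_parallel_config)
```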
  
 flashinfer_fp4_cutlass_moe_forward(
    fused_experts: FusedMoEModularKernel,
    layer: Module,
    x: Tensor,
    topk_weights: Tensor,
    topk_ids: Tensor,
    activation: str,
    global_num_experts: int,
    expert_map: Optional[Tensor],
    apply_router_weight_on_input: bool,
) -> Tensor
Common forward wrapper for FlashInfer NV-FP4 fused-MoE.
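A hedged sketch of calling the wrapper from a layer's apply path; the `"silu"` activation string, `expert_map=None`, and `apply_router_weight_on_input=False` are illustrative values, not documented defaults:

```python
import torch
from torch import nn

from vllm.model_executor.layers.quantization.utils.flashinfer_fp4_moe import (
    flashinfer_fp4_cutlass_moe_forward,
)


def fp4_moe_apply(
    fused_experts,           # FusedMoEModularKernel from the builder above
    layer: nn.Module,        # module holding the NVFP4 expert weights
    x: torch.Tensor,         # input activations
    topk_weights: torch.Tensor,  # from a prior router/top-k step
    topk_ids: torch.Tensor,
    global_num_experts: int,
) -> torch.Tensor:
    return flashinfer_fp4_cutlass_moe_forward(
        fused_experts=fused_experts,
        layer=layer,
        x=x,
        topk_weights=topk_weights,
        topk_ids=topk_ids,
        activation="silu",                    # illustrative
        global_num_experts=global_num_experts,
        expert_map=None,                      # None when experts are not remapped
        apply_router_weight_on_input=False,   # illustrative
    )
```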
  
 is_flashinfer_fp4_cutlass_moe_available() -> bool
Return True when FlashInfer CUTLASS NV-FP4 kernels can be used.
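Typical guard usage; the messages and fallback branch are illustrative:

```python
from vllm.model_executor.layers.quantization.utils.flashinfer_fp4_moe import (
    is_flashinfer_fp4_cutlass_moe_available,
)

if is_flashinfer_fp4_cutlass_moe_available():
    print("Using FlashInfer CUTLASS NVFP4 fused-MoE kernels")
else:
    print("FlashInfer NVFP4 kernels unavailable; falling back to another path")
```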
   
 reorder_w1w3_to_w3w1(...)
Re-order the concatenated [w1, w3] tensors to [w3, w1].
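The layout change can be illustrated on plain tensors; this sketch mimics the documented [w1, w3] -> [w3, w1] swap with `torch.chunk`/`torch.cat` and is not the helper's actual implementation:

```python
import torch

# A fused gate/up projection stored as [w1, w3] along the output dimension.
w1 = torch.randn(4, 8)
w3 = torch.randn(4, 8)
w13 = torch.cat([w1, w3], dim=0)

# Swap the two halves to obtain the [w3, w1] layout.
first, second = torch.chunk(w13, 2, dim=0)
w31 = torch.cat([second, first], dim=0)

assert torch.equal(w31[:4], w3) and torch.equal(w31[4:], w1)
```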
  
 select_nvfp4_gemm_impl(allow_flashinfer: bool, moe, logger)
Return a GEMM experts implementation for NV-FP4 fused-MoE layers.
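
A hedged selection sketch; the `moe` argument is assumed to be the layer's fused-MoE config object, and gating `allow_flashinfer` on the availability check is an assumed usage pattern, not documented behavior:

```python
import logging

from vllm.model_executor.layers.quantization.utils.flashinfer_fp4_moe import (
    is_flashinfer_fp4_cutlass_moe_available,
    select_nvfp4_gemm_impl,
)

logger = logging.getLogger(__name__)


def pick_experts_impl(moe):
    # `moe` is assumed to be the layer's fused-MoE config object; only the
    # parameter names `allow_flashinfer`, `moe`, and `logger` appear on this page.
    return select_nvfp4_gemm_impl(
        allow_flashinfer=is_flashinfer_fp4_cutlass_moe_available(),
        moe=moe,
        logger=logger,
    )
```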