vllm.model_executor.warmup.deep_gemm_warmup
Warm up deep_gemm kernels. DeepGEMM JIT-compiles its kernels, so the warmup aims to compile, ahead of time, all the kernels that would be used during model execution.
  Source code in vllm/model_executor/warmup/deep_gemm_warmup.py
  
 _deepgemm_grouped_fp8_gemm_nt_contiguous_warmup(
    w1: Tensor,
    w2: Tensor,
    w1_scale: Tensor,
    w2_scale: Tensor,
    num_topk: int,
)
  Extract weights, weight scales, and num_topk from a FusedMoE module.
  Extract weights, weight scales and quantization block sizes from the given LinearBase module.
  Return True if the input module/layer could be processed with DeepGEMM.
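An eligibility check of this kind can be used to filter a model's layers before warming up. The sketch below is illustrative only: `FakeLayer`, `may_use_deep_gemm`, `select_warmup_layers`, and the criteria themselves (FP8 weights with an assumed 128x128 quantization block) are assumptions made for the example, not the conditions the module actually tests.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeLayer:
    """Hypothetical stand-in for a model layer; not a vLLM class."""
    name: str
    weight_dtype: str
    block_size: Optional[tuple[int, int]]


# Assumed block shape, chosen for illustration only.
ASSUMED_BLOCK = (128, 128)


def may_use_deep_gemm(layer: FakeLayer) -> bool:
    """Return True if the layer meets the (assumed) eligibility criteria."""
    return layer.weight_dtype == "fp8" and layer.block_size == ASSUMED_BLOCK


def select_warmup_layers(layers: list[FakeLayer]) -> list[str]:
    """Keep only the names of layers that would need kernel warmup."""
    return [layer.name for layer in layers if may_use_deep_gemm(layer)]
```

Layers that fail the predicate are skipped, so no compilation time is spent on kernels that would never run.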
 deepgemm_grouped_fp8_gemm_nt_contiguous_warmup(
    model: Module,
)