vllm.attention.backends.triton_mla

TritonMLABackend

  Bases: MLACommonBackend

Attention backend that implements Multi-head Latent Attention (MLA) using Triton kernels.

Source code in vllm/attention/backends/triton_mla.py
 staticmethod
 get_impl_cls() -> Type[TritonMLAImpl]

Returns TritonMLAImpl, the attention implementation class for this backend.
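
For orientation, a minimal sketch of the indirection this method provides: vLLM's attention layer asks the selected backend class for its implementation class and then instantiates it. The import path follows this page; the surrounding selection logic is simplified and version-dependent.

```python
# Minimal sketch: resolving the implementation class from the backend.
# The module path follows this page; other vLLM versions may place
# TritonMLABackend elsewhere.
from vllm.attention.backends.triton_mla import TritonMLABackend

impl_cls = TritonMLABackend.get_impl_cls()
assert impl_cls.__name__ == "TritonMLAImpl"
```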
 
TritonMLAImpl

  Bases: MLACommonImpl[MLACommonMetadata]

MLA attention implementation whose decode path runs Triton kernels; prefill and the shared MLA machinery are inherited from MLACommonImpl.

Source code in vllm/attention/backends/triton_mla.py
  
 __init__(
    num_heads: int,
    head_size: int,
    scale: float,
    num_kv_heads: int,
    alibi_slopes: Optional[List[float]],
    sliding_window: Optional[int],
    kv_cache_dtype: str,
    logits_soft_cap: Optional[float],
    attn_type: str,
    kv_sharing_target_layer_name: Optional[str],
    **mla_args,
) -> None
Source code in vllm/attention/backends/triton_mla.py
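
As a hedged illustration of the constructor arguments, the values below correspond to a DeepSeek-V2-style MLA configuration; they are assumed examples, not vLLM defaults. The MLA-specific projection arguments forwarded through **mla_args (LoRA ranks, up-projection weights, and so on) are also required but their exact names depend on the vLLM version, so they are omitted here.

```python
# Illustrative argument set for TritonMLAImpl.__init__ using
# DeepSeek-V2-style sizes (assumed values, not vLLM defaults).
init_kwargs = dict(
    num_heads=128,
    head_size=576,             # kv_lora_rank (512) + qk_rope_head_dim (64)
    scale=1.0 / (192 ** 0.5),  # 1/sqrt(qk_nope_head_dim + qk_rope_head_dim)
    num_kv_heads=1,            # the compressed latent cache acts as one KV head
    alibi_slopes=None,         # ALiBi is not used with MLA
    sliding_window=None,       # nor is sliding-window attention
    kv_cache_dtype="auto",     # keep the KV cache in the model's dtype
    logits_soft_cap=None,
    attn_type="decoder",       # vLLM's AttentionType.DECODER
    kv_sharing_target_layer_name=None,
)
# MLA-specific kwargs (e.g. ranks and projection weights via **mla_args)
# must also be supplied; their names depend on the vLLM version.
```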
  
Computes decode-time attention with a Triton kernel over the paged, compressed KV cache.

 _forward_decode(
    q_nope: Tensor,
    q_pe: Tensor,
    kv_c_and_k_pe_cache: Tensor,
    attn_metadata: MLACommonMetadata,
) -> Tensor
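
To make the tensor contract concrete, here is a shape-only sketch of the decode inputs under assumed DeepSeek-V2-style sizes. All shapes and the cache layout are illustrative assumptions; in practice this method is invoked internally by MLACommonImpl's forward pass, not by user code.

```python
import torch

# Assumed DeepSeek-V2-style sizes (illustrative only).
B = 4                                    # decode batch: one query token per sequence
num_heads, kv_lora_rank, rope_dim = 128, 512, 64

# q_nope is the query already projected ("absorbed") into the latent KV
# space; q_pe is the rotary positional part of the query.
q_nope = torch.randn(B, num_heads, kv_lora_rank)
q_pe = torch.randn(B, num_heads, rope_dim)

# Paged cache holding the compressed KV latent plus the RoPE key per slot.
# The exact layout (and any extra head dimension) varies across versions.
num_blocks, block_size = 32, 16
kv_c_and_k_pe_cache = torch.randn(num_blocks, block_size, kv_lora_rank + rope_dim)

# _forward_decode(q_nope, q_pe, kv_c_and_k_pe_cache, attn_metadata)
# would return the latent-space attention output, roughly of shape
# [B, num_heads, kv_lora_rank], which the caller then up-projects.
```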