vllm.model_executor.layers.mamba.mamba_utils
 
 Source code in vllm/model_executor/layers/mamba/mamba_utils.py
extra_groups_for_head_shards classmethod

extra_groups_for_head_shards(ngroups: int, tp_size: int)

Compute the increase in group numbers to account for replication in order to accompany the head shards.
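The replication rule is small enough to sketch. The following is an illustrative reimplementation, assuming (as the docstring implies) that the group count must be padded until every tensor-parallel head shard carries whole groups; it is not a verbatim copy of the vLLM source:

```python
def extra_groups_for_head_shards(ngroups: int, tp_size: int) -> int:
    # If the groups already divide evenly across the tensor-parallel
    # ranks, no replication is needed.
    if ngroups % tp_size == 0:
        return 0
    # Otherwise pad the group count up to tp_size so each head shard
    # gets a whole group; for ngroups == 1 this is tp_size - 1.
    return tp_size - ngroups
```

For example, with ngroups=1 and tp_size=8 this returns 7: the single group is replicated onto all eight head shards.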
linear_attention_state_shape classmethod

linear_attention_state_shape(
    num_heads: int, tp_size: int, head_dim: int
) -> tuple[tuple[int, int, int], ...]
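Reading the signature, each linear-attention head keeps a square recurrent state of shape (head_dim, head_dim), with heads sharded across tensor-parallel ranks. A minimal sketch under that assumption (not a verbatim copy of the source):

```python
def linear_attention_state_shape(
    num_heads: int, tp_size: int, head_dim: int
) -> tuple[tuple[int, int, int], ...]:
    # Heads are split across TP ranks; each local head holds a
    # (head_dim x head_dim) recurrent state matrix.
    state_shape = (num_heads // tp_size, head_dim, head_dim)
    # Returned as a one-element tuple: linear attention needs a
    # single state buffer per layer.
    return (state_shape,)
```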
mamba1_state_shape classmethod

mamba1_state_shape(
    tp_world_size: int,
    intermediate_size: int,
    state_size: int,
    conv_kernel: int,
    use_v1: bool = True,
) -> tuple[tuple[int, int], tuple[int, int]]
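Mamba-1 layers track two per-sequence buffers, both sharded over intermediate_size across tensor-parallel ranks: a causal-conv window of width conv_kernel - 1 and an SSM state of width state_size. A hedged sketch, assuming the use_v1 flag merely transposes the conv-state layout for the v1 engine:

```python
def mamba1_state_shape(
    tp_world_size: int,
    intermediate_size: int,
    state_size: int,
    conv_kernel: int,
    use_v1: bool = True,
) -> tuple[tuple[int, int], tuple[int, int]]:
    # Both buffers are sharded along intermediate_size across TP ranks.
    conv_state_shape = (intermediate_size // tp_world_size, conv_kernel - 1)
    temporal_state_shape = (intermediate_size // tp_world_size, state_size)
    # Assumption: the v1 engine stores the conv state transposed.
    if use_v1:
        conv_state_shape = (conv_state_shape[1], conv_state_shape[0])
    return conv_state_shape, temporal_state_shape
```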
mamba2_state_shape classmethod

mamba2_state_shape(
    tp_world_size: int,
    intermediate_size: int,
    n_groups: int,
    num_heads: int,
    head_dim: int,
    state_size: int,
    conv_kernel: int,
    use_v1: bool = True,
) -> tuple[tuple[int, int], tuple[int, int, int]]
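Mamba-2 differs from Mamba-1 in two ways: the conv state also covers the B and C group projections (hence the n_groups and state_size terms), and the SSM state is kept per head rather than per channel, which is why the group padding from extra_groups_for_head_shards matters here. A hedged sketch of how the pieces plausibly fit together; the exact layout conventions are assumptions, and it reuses the sketch of extra_groups_for_head_shards above:

```python
def mamba2_state_shape(
    tp_world_size: int,
    intermediate_size: int,
    n_groups: int,
    num_heads: int,
    head_dim: int,
    state_size: int,
    conv_kernel: int,
    use_v1: bool = True,
) -> tuple[tuple[int, int], tuple[int, int, int]]:
    # Pad groups so replication lines up with head shards
    # (see extra_groups_for_head_shards above).
    n_groups += extra_groups_for_head_shards(n_groups, tp_world_size)
    # The causal conv runs over x concatenated with the B and C
    # group projections.
    conv_dim = intermediate_size + 2 * n_groups * state_size
    conv_state_shape = (conv_kernel - 1, conv_dim // tp_world_size)
    # Assumption: only the pre-v1 engine stores the conv state transposed.
    if not use_v1:
        conv_state_shape = (conv_state_shape[1], conv_state_shape[0])
    # One (head_dim x state_size) SSM state per local head.
    temporal_state_shape = (num_heads // tp_world_size, head_dim, state_size)
    return conv_state_shape, temporal_state_shape
```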