vllm.model_executor.models.mamba_cache
 
MambaCacheManager

  Bases: ConstantSizeCache
Source code in vllm/model_executor/models/mamba_cache.py
  
 __init__(
    vllm_config: VllmConfig,
    dtype: dtype,
    num_mamba_layers: int,
    conv_state_shape: tuple[int, int],
    temporal_state_shape: tuple[int, int],
)
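The manager allocates one conv-state and one temporal (SSM) state buffer per Mamba layer, sized up front for the largest batch the engine can schedule. A minimal construction sketch follows; the shape arithmetic and the names `intermediate_size`, `conv_kernel`, `ssm_state_size`, and `tp_size` are illustrative assumptions, not the exact code of any particular vLLM model:

```python
import torch

from vllm.config import VllmConfig
from vllm.model_executor.models.mamba_cache import MambaCacheManager


def build_mamba_cache(vllm_config: VllmConfig,
                      num_mamba_layers: int) -> MambaCacheManager:
    # Illustrative per-layer state shapes; real models derive these
    # from their HF config and tensor-parallel world size.
    intermediate_size = 4096  # assumed model width
    conv_kernel = 4           # assumed depthwise-conv kernel size
    ssm_state_size = 16       # assumed SSM state dimension
    tp_size = 1               # assumed tensor-parallel degree

    conv_state_shape = (intermediate_size // tp_size, conv_kernel - 1)
    temporal_state_shape = (intermediate_size // tp_size, ssm_state_size)

    return MambaCacheManager(
        vllm_config=vllm_config,
        dtype=torch.float16,
        num_mamba_layers=num_mamba_layers,
        conv_state_shape=conv_state_shape,
        temporal_state_shape=temporal_state_shape,
    )
```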
 current_run_tensors(**kwargs) -> MambaCacheParams
Return the tensors for the current run's conv and SSM state.
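A sketch of the typical call site: the model's `forward` fetches the current run's cache views once and threads them through its Mamba layers. The model skeleton here is hypothetical, and it assumes the returned `MambaCacheParams` exposes per-layer slicing via `at_layer_idx`:

```python
import torch
from torch import nn


class MambaModelSketch(nn.Module):
    """Hypothetical model skeleton; only the cache plumbing matters."""

    def forward(self, input_ids: torch.Tensor, **kwargs) -> torch.Tensor:
        # One lookup per run: conv/SSM buffers plus the index tensor
        # mapping this batch's requests to their cache slots.
        mamba_cache_params = self.mamba_cache.current_run_tensors(**kwargs)

        hidden_states = self.embed_tokens(input_ids)
        for layer_idx, layer in enumerate(self.layers):
            # Each layer receives its own slice of the per-layer buffers.
            hidden_states = layer(
                hidden_states,
                mamba_cache_params.at_layer_idx(layer_idx),
            )
        return hidden_states
```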
 get_seqlen_agnostic_capture_inputs(batch_size: int)
Provide the CUDA graph capture runs with a buffer sized for the given batch. The buffer is used to maintain the Mamba cache during CUDA graph replay runs.
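Because the cache buffers live outside the captured graph's own input tensors, a model using this manager typically forwards the CUDA-graph hooks straight to it. A sketch of that delegation, assuming a `copy_inputs_before_cuda_graphs` counterpart inherited from `ConstantSizeCache`:

```python
from torch import nn


class MambaModelSketch(nn.Module):
    # ... model definition elided ...

    def get_seqlen_agnostic_capture_inputs(self, batch_size: int):
        # Hand the capture run a buffer sized for `batch_size` so the
        # replay runs can reuse it regardless of sequence length.
        return self.mamba_cache.get_seqlen_agnostic_capture_inputs(batch_size)

    def copy_inputs_before_cuda_graphs(self, input_buffers, **kwargs):
        # Refresh the captured buffer with the current requests' state
        # indices before each graph replay (counterpart hook assumed
        # to come from ConstantSizeCache).
        return self.mamba_cache.copy_inputs_before_cuda_graphs(
            input_buffers, **kwargs)
```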