vllm.attention.layers.chunked_local_attention
 
ChunkedLocalAttention

  Bases: Attention
Source code in vllm/attention/layers/chunked_local_attention.py
  
 __init__(
    num_heads: int,
    head_size: int,
    scale: float,
    attention_chunk_size: int,
    num_kv_heads: Optional[int] = None,
    alibi_slopes: Optional[List[float]] = None,
    cache_config: Optional[CacheConfig] = None,
    quant_config: Optional[QuantizationConfig] = None,
    kv_sharing_target_layer_name: Optional[str] = None,
    prefix: str = "",
)
Source code in vllm/attention/layers/chunked_local_attention.py
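The constructor mirrors the Attention base class, with the extra attention_chunk_size argument bounding how far each token can attend. Below is a minimal construction sketch, assuming the layer is built the same way as a regular Attention layer inside a model's decoder block (normally while the model is constructed under vLLM's config context); all dimensions, the chunk size, and the prefix are illustrative values, not taken from any particular model.

    from vllm.attention.layers.chunked_local_attention import ChunkedLocalAttention

    # Illustrative dimensions for a hypothetical decoder layer; a real model
    # derives these from its own config and passes its CacheConfig/QuantizationConfig.
    num_heads = 32
    num_kv_heads = 8
    head_size = 128

    attn = ChunkedLocalAttention(
        num_heads=num_heads,
        head_size=head_size,
        scale=head_size**-0.5,            # standard 1/sqrt(head_size) scaling
        attention_chunk_size=8192,        # tokens attend only within chunks of this size
        num_kv_heads=num_kv_heads,
        cache_config=None,                # a real model supplies its CacheConfig here
        quant_config=None,
        prefix="model.layers.0.self_attn.attn",  # hypothetical layer name
    )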
cached
 create_chunked_local_attention_backend(
    underlying_attn_backend: AttentionBackend,
    attention_chunk_size: int,
    block_size: int,
) -> type[AttentionBackend]
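The cached label indicates the factory is memoized (e.g. via functools.lru_cache), so repeated calls with the same underlying backend class, chunk size, and block size return the same generated backend class. A rough usage sketch follows; FlashAttentionBackend and its import path are used purely for illustration, since in practice the layer resolves the underlying backend for the current platform itself.

    from vllm.attention.layers.chunked_local_attention import (
        create_chunked_local_attention_backend,
    )
    # Example underlying backend only; this import path is an assumption for
    # illustration, not something the factory requires.
    from vllm.v1.attention.backends.flash_attn import FlashAttentionBackend

    # Returns an AttentionBackend subclass intended to restrict attention to
    # local chunks of attention_chunk_size tokens, laid out over KV-cache
    # blocks of block_size.
    chunked_backend_cls = create_chunked_local_attention_backend(
        underlying_attn_backend=FlashAttentionBackend,
        attention_chunk_size=8192,
        block_size=16,
    )

    # Because the factory is cached, identical arguments yield the same class object.
    assert create_chunked_local_attention_backend(
        underlying_attn_backend=FlashAttentionBackend,
        attention_chunk_size=8192,
        block_size=16,
    ) is chunked_backend_cls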