vllm.model_executor.model_loader.utils
Utilities for selecting and loading models.
dataclass
 A class to handle parameter mapping for model weight loading. It creates a bidirectional mapping between packed parameters and their constituent parts.
Source code in vllm/model_executor/model_loader/utils.py
 __init__(
    packed_mapping: dict[str, list[str]],
    inverse_packed_mapping: dict[
        str, tuple[str, int]
    ] = dict(),
) -> None
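A minimal sketch of what "bidirectional mapping" means here, assuming the inverse direction maps each constituent name to its packed name and its position within the packed parameter (consistent with the `tuple[str, int]` value type). `ParamMappingSketch` and the `qkv_proj`/`gate_up_proj` names are hypothetical illustrations, not the library's class or its implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ParamMappingSketch:
    """Hypothetical stand-in illustrating a packed <-> constituent mapping."""

    # Packed parameter name -> ordered list of constituent parameter names.
    packed_mapping: dict[str, list[str]]
    # Constituent name -> (packed name, index within the packed parameter).
    # A dataclass cannot use a bare mutable default, so default_factory is used.
    inverse_packed_mapping: dict[str, tuple[str, int]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Derive the reverse direction from the forward mapping.
        for packed_name, sub_names in self.packed_mapping.items():
            for index, sub_name in enumerate(sub_names):
                self.inverse_packed_mapping[sub_name] = (packed_name, index)


mapping = ParamMappingSketch(
    packed_mapping={
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    }
)
print(mapping.inverse_packed_mapping["k_proj"])  # ('qkv_proj', 1)
```

The default rendered as `dict()` in the signature corresponds to a fresh empty dict per instance. In a dataclass that requires `field(default_factory=dict)`, because a plain mutable default raises `ValueError` at class definition time.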
 
  
    
 configure_quant_config(
    quant_config: QuantizationConfig,
    model_class: type[Module],
)
Pass packed_modules_mapping by reference to quant_config so that quant_config can properly match fused modules.
Note that model attributes are passed by reference to quant_config, enabling them to be updated by model_class.__new__ (e.g. chatglm, qwen).
Once the SupportsQuant mixin has been added to all models, this function can be removed.
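Why passing by reference matters can be shown with a small sketch. `QuantConfigSketch`, `ModelSketch`, and `configure_quant_config_sketch` are hypothetical stand-ins, not the real `QuantizationConfig` or model classes. The point is only that the quant config holds the *same* dict object as the model class, so in-place updates made later on the model side are visible to it.

```python
class QuantConfigSketch:
    """Hypothetical quantization config that stores a mapping by reference."""

    def __init__(self) -> None:
        self.packed_modules_mapping: dict[str, list[str]] = {}


class ModelSketch:
    """Hypothetical model class whose mapping may be edited after configuration."""

    packed_modules_mapping: dict[str, list[str]] = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"]
    }


def configure_quant_config_sketch(quant_config, model_class) -> None:
    # Hand over the same dict object, not a copy.
    quant_config.packed_modules_mapping = model_class.packed_modules_mapping


qc = QuantConfigSketch()
configure_quant_config_sketch(qc, ModelSketch)

# A later in-place update on the model class (as a model's __new__ might do).
ModelSketch.packed_modules_mapping["gate_up_proj"] = ["gate_proj", "up_proj"]
print("gate_up_proj" in qc.packed_modules_mapping)  # True
```

Only in-place mutation propagates this way: if the model class rebinds the attribute to a new dict, the quant config keeps pointing at the old one.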
  
 get_architecture_class_name(
    model_config: ModelConfig,
) -> str
 
 get_model_architecture(
    model_config: ModelConfig,
) -> tuple[type[Module], str]
  
 get_model_cls(model_config: ModelConfig) -> type[Module]
 
 initialize_model(
    vllm_config: VllmConfig,
    *,
    prefix: str = "",
    model_class: Optional[type[Module]] = None,
    model_config: Optional[ModelConfig] = None,
) -> Module
Initialize a model with the given configurations.
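The keyword-only `model_class` and `model_config` parameters are optional. A minimal sketch of how such a function can resolve them, assuming an omitted `model_config` falls back to the one held by `vllm_config`, and an omitted `model_class` is looked up from the config's architecture names (in the spirit of `get_model_architecture`, which returns a `(class, name)` tuple). Every class, the `REGISTRY` dict, and `resolve_architecture` below are hypothetical stand-ins, not vLLM's actual types or lookup logic.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelConfigSketch:
    # Hypothetical: architecture names as listed in a model's config.
    architectures: list[str] = field(default_factory=list)


@dataclass
class VllmConfigSketch:
    # Hypothetical top-level config that bundles a model config.
    model_config: ModelConfigSketch


class LlamaSketch:
    def __init__(self, *, vllm_config: VllmConfigSketch, prefix: str = "") -> None:
        self.vllm_config = vllm_config
        self.prefix = prefix


# Hypothetical registry from architecture name to model class.
REGISTRY: dict[str, type] = {"LlamaForCausalLM": LlamaSketch}


def resolve_architecture(model_config: ModelConfigSketch) -> tuple[type, str]:
    # Return the first listed architecture that the registry knows about.
    for arch in model_config.architectures:
        if arch in REGISTRY:
            return REGISTRY[arch], arch
    raise ValueError(f"No supported architecture in {model_config.architectures}")


def initialize_model_sketch(
    vllm_config: VllmConfigSketch,
    *,
    prefix: str = "",
    model_class: Optional[type] = None,
    model_config: Optional[ModelConfigSketch] = None,
):
    # Omitted arguments fall back to values derived from vllm_config.
    if model_config is None:
        model_config = vllm_config.model_config
    if model_class is None:
        model_class, _ = resolve_architecture(model_config)
    return model_class(vllm_config=vllm_config, prefix=prefix)


cfg = VllmConfigSketch(ModelConfigSketch(["LlamaForCausalLM"]))
model = initialize_model_sketch(cfg, prefix="model")
print(type(model).__name__, model.prefix)  # LlamaSketch model
```

Making the overrides keyword-only keeps call sites explicit: a caller that already knows the class (for example, a draft or auxiliary model) can pass it directly and skip the registry lookup.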
  
 process_weights_after_loading(
    model: Module,
    model_config: ModelConfig,
    target_device: device,
) -> None