vllm.model_executor.layers.quantization.kernels.mixed_precision
Modules:
| Name | Description | 
|---|---|
| MPLinearKernel |  | 
| allspark |  | 
| bitblas |  | 
| conch |  | 
| dynamic_4bit |  | 
| exllama |  | 
| machete |  | 
| marlin |  | 
`_POSSIBLE_KERNELS` (module attribute)
```python
_POSSIBLE_KERNELS: list[type[MPLinearKernel]] = [
    MacheteLinearKernel,
    AllSparkLinearKernel,
    MarlinLinearKernel,
    Dynamic4bitLinearKernel,
    BitBLASLinearKernel,
    ConchLinearKernel,
    ExllamaLinearKernel,
]
```
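Because `choose_mp_linear_kernel` tries to pick the best-performing kernel, the order of this list plausibly encodes preference. The sketch below is a minimal illustration, assuming the candidates are tried front to back and the first one that accepts the config wins. `StubKernel`, `first_supported`, and the `bits`/`sm` config keys are hypothetical stand-ins, not vLLM types.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


# Hypothetical stand-in for a kernel class; real MPLinearKernel subclasses
# have their own capability checks, which are not modeled here.
@dataclass(frozen=True)
class StubKernel:
    name: str
    accepts: Callable[[dict], bool]  # hypothetical predicate over a config dict


def first_supported(kernels: Sequence[StubKernel], config: dict) -> Optional[StubKernel]:
    """Return the earliest kernel, in list (priority) order, that accepts `config`."""
    for kernel in kernels:
        if kernel.accepts(config):
            return kernel
    return None  # no candidate accepted the config


# Ordered from most to least preferred; names and constraints are invented.
kernels = [
    StubKernel("KernelA", lambda cfg: cfg["bits"] == 4 and cfg["sm"] >= 90),
    StubKernel("KernelB", lambda cfg: cfg["bits"] in (4, 8) and cfg["sm"] >= 80),
    StubKernel("KernelC", lambda cfg: cfg["bits"] == 4),
]

print(first_supported(kernels, {"bits": 4, "sm": 80}).name)  # KernelB
```

Under this reading, a more specialized kernel placed earlier shadows a more general one later in the list whenever both can handle the config.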
 
```python
choose_mp_linear_kernel(
    config: MPLinearLayerConfig,
    compute_capability: Optional[int] = None,
) -> type[MPLinearKernel]
```
Choose an MPLinearKernel that can implement the given config for the given compute capability. Attempts to choose the best kernel in terms of performance.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| config | MPLinearLayerConfig | Description of the linear layer to be implemented. | required | 
| compute_capability | Optional[int] | The compute capability of the target device, if None uses  | None | 
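`compute_capability` is a single integer rather than a `(major, minor)` pair. The sketch below is a minimal one, assuming the common convention of packing a CUDA capability as `major * 10 + minor` (so 8.0 becomes 80) and comparing it against a per-kernel minimum. `capability_to_int` and `meets_minimum` are hypothetical helpers.

```python
from typing import Tuple


def capability_to_int(capability: Tuple[int, int]) -> int:
    """Pack a (major, minor) compute capability as major * 10 + minor."""
    major, minor = capability
    return major * 10 + minor


def meets_minimum(min_capability: int, compute_capability: int) -> bool:
    """Hypothetical filter: a kernel is usable only at or above its minimum."""
    return compute_capability >= min_capability


print(capability_to_int((8, 0)))  # 80, e.g. an A100
print(capability_to_int((9, 0)))  # 90, e.g. an H100
print(meets_minimum(90, capability_to_int((8, 6))))  # False
```

Packing into one integer keeps the comparison a plain `>=`, which is why a single `Optional[int]` is enough for the parameter.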
Raises:
| Type | Description | 
|---|---|
| ValueError | If no kernel can implement the given config. | 
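The table only states that `ValueError` is raised when no kernel fits. The sketch below is a minimal illustration, assuming the selector records why each candidate was rejected and folds those reasons into a single error message, which makes a failed selection easier to diagnose. `choose_or_raise`, the `(ok, reason)` check shape, and the `group_size`/`bits` keys are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical check signature: returns (ok, reason_if_rejected).
Check = Callable[[dict], Tuple[bool, str]]


def choose_or_raise(checks: Sequence[Tuple[str, Check]], config: dict) -> str:
    """Return the first kernel name whose check passes, else raise ValueError."""
    failure_reasons: List[str] = []
    for name, check in checks:
        ok, reason = check(config)
        if ok:
            return name
        failure_reasons.append(f"{name}: {reason}")
    # Every candidate was rejected: report all reasons in one error.
    raise ValueError(
        "No kernel can implement this config. Reasons:\n" + "\n".join(failure_reasons)
    )


checks = [
    ("KernelA", lambda cfg: (cfg["group_size"] == 128, "group_size must be 128")),
    ("KernelB", lambda cfg: (cfg["bits"] == 8, "only 8-bit weights")),
]

try:
    choose_or_raise(checks, {"group_size": 64, "bits": 4})
except ValueError as err:
    print(err)
```

Collecting every reason, rather than failing on the first rejection, tells the caller which constraint each candidate tripped over.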
Returns:
| Type | Description | 
|---|---|
| type[MPLinearKernel] | The chosen kernel class. |