nvidia-cusparselt-cu12
NVIDIA cuSPARSELt
###################################################################################
cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication
###################################################################################

**NVIDIA cuSPARSELt** is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a structured sparse matrix with a 50% sparsity ratio:

.. math::

   D = \text{Activation}(\alpha \, op(A) \cdot op(B) + \beta \, op(C) + \text{bias})

where :math:`op(A)/op(B)` refers to in-place operations such as transpose/non-transpose, and :math:`\alpha, \beta` are scalars or vectors.
The *cuSPARSELt APIs* allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.
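As a concrete illustration of the operation above, the following pure-Python sketch (not the cuSPARSELt API; all names and the choice of ReLU as the activation are illustrative) prunes `A` to the 2:4 structured-sparsity pattern behind the 50% sparsity ratio (at most two non-zeros in every group of four contiguous values along a row) and then evaluates `D = Activation(alpha * A @ B + beta * C + bias)`:

```python
def prune_2_4(row):
    """Keep the two largest-magnitude values in each group of four (2:4 sparsity)."""
    out = []
    for i in range(0, len(row), 4):
        group = row[i:i + 4]
        keep = sorted(range(len(group)), key=lambda j: abs(group[j]))[-2:]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return out

def matmul(A, B):
    """Plain dense matrix product on nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def spmma(A, B, C, bias, alpha=1.0, beta=1.0):
    """D = ReLU(alpha * A @ B + beta * C + bias), with A pruned to 2:4 sparsity."""
    A = [prune_2_4(row) for row in A]   # structured sparsity on the A operand
    AB = matmul(A, B)
    return [[max(0.0, alpha * ab + beta * c + bias[i])   # ReLU epilogue
             for ab, c in zip(AB[i], C[i])]
            for i in range(len(AB))]
```

In the real library the pruning and compression steps are separate API calls and the activation/bias belong to the configurable epilogue; this sketch only mirrors the arithmetic of the formula.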
**Download:** `developer.nvidia.com/cusparselt/downloads <https://developer.nvidia.com/cusparselt/downloads>`_

**Provide Feedback:** `Math-Libs-Feedback@nvidia.com <mailto:Math-Libs-Feedback@nvidia.com?subject=cuSPARSELt-Feedback>`_

**Examples:**
`cuSPARSELt Example 1 <https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSELt/matmul>`_,
`cuSPARSELt Example 2 <https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSELt/matmul_advanced>`_

**Blog posts:**

- `Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt <https://developer.nvidia.com/blog/exploiting-ampere-structured-sparsity-with-cusparselt/>`_
- `Structured Sparsity in the NVIDIA Ampere Architecture and Applications in Search Engines <https://developer.nvidia.com/blog/structured-sparsity-in-the-nvidia-ampere-architecture-and-applications-in-search-engines/>`__
- `Making the Most of Structured Sparsity in the NVIDIA Ampere Architecture <https://www.nvidia.com/en-us/on-demand/session/gtcspring21-s31552/>`__

================================================================================
Key Features
================================================================================
* *NVIDIA Sparse MMA tensor core* support
* Mixed-precision computation support:

+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| Input A/B   | Input C | Output D | Compute | Block scaled                                     | Supported SM arch                                  |
+=============+=========+==========+=========+==================================================+====================================================+
| `FP32`      | `FP32`  | `FP32`   | `FP32`  | No                                               | `8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1` |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `BF16`      | `BF16`  | `BF16`   | `FP32`  | No                                               | `8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1` |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `FP16`      | `FP16`  | `FP16`   | `FP32`  | No                                               | `8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1` |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `FP16`      | `FP16`  | `FP16`   | `FP16`  | No                                               | `9.0`                                              |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `INT8`      | `INT8`  | `INT8`   | `INT32` | No                                               | `8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1` |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `INT8`      | `INT32` | `INT32`  | `INT32` | No                                               | `8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1` |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `INT8`      | `FP16`  | `FP16`   | `INT32` | No                                               | `8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1` |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `INT8`      | `BF16`  | `BF16`   | `INT32` | No                                               | `8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1` |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `E4M3`      | `FP16`  | `E4M3`   | `FP32`  | No                                               | `9.0, 10.0, 10.1, 11.0, 12.0, 12.1`                |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `E4M3`      | `BF16`  | `E4M3`   | `FP32`  | No                                               | `9.0, 10.0, 10.1, 11.0, 12.0, 12.1`                |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `E4M3`      | `FP16`  | `FP16`   | `FP32`  | No                                               | `9.0, 10.0, 10.1, 11.0, 12.0, 12.1`                |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `E4M3`      | `BF16`  | `BF16`   | `FP32`  | No                                               | `9.0, 10.0, 10.1, 11.0, 12.0, 12.1`                |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `E4M3`      | `FP32`  | `FP32`   | `FP32`  | No                                               | `9.0, 10.0, 10.1, 11.0, 12.0, 12.1`                |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `E5M2`      | `FP16`  | `E5M2`   | `FP32`  | No                                               | `9.0, 10.0, 10.1, 11.0, 12.0, 12.1`                |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `E5M2`      | `BF16`  | `E5M2`   | `FP32`  | No                                               | `9.0, 10.0, 10.1, 11.0, 12.0, 12.1`                |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `E5M2`      | `FP16`  | `FP16`   | `FP32`  | No                                               | `9.0, 10.0, 10.1, 11.0, 12.0, 12.1`                |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `E5M2`      | `BF16`  | `BF16`   | `FP32`  | No                                               | `9.0, 10.0, 10.1, 11.0, 12.0, 12.1`                |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `E5M2`      | `FP32`  | `FP32`   | `FP32`  | No                                               | `9.0, 10.0, 10.1, 11.0, 12.0, 12.1`                |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `E4M3`      | `FP16`  | `E4M3`   | `FP32`  | A/B/D_OUT_SCALE = `VEC64_UE8M0`, D_SCALE = `32F` | `10.0, 10.1, 11.0, 12.0, 12.1`                     |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `E4M3`      | `BF16`  | `E4M3`   | `FP32`  | A/B/D_OUT_SCALE = `VEC64_UE8M0`, D_SCALE = `32F` | `10.0, 10.1, 11.0, 12.0, 12.1`                     |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `E4M3`      | `FP16`  | `FP16`   | `FP32`  | A/B_SCALE = `VEC64_UE8M0`                        | `10.0, 10.1, 11.0, 12.0, 12.1`                     |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `E4M3`      | `BF16`  | `BF16`   | `FP32`  | A/B_SCALE = `VEC64_UE8M0`                        | `10.0, 10.1, 11.0, 12.0, 12.1`                     |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `E4M3`      | `FP32`  | `FP32`   | `FP32`  | A/B_SCALE = `VEC64_UE8M0`                        | `10.0, 10.1, 11.0, 12.0, 12.1`                     |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `E2M1`      | `FP16`  | `E2M1`   | `FP32`  | A/B/D_SCALE = `VEC32_UE4M3`, D_SCALE = `32F`     | `10.0, 10.1, 11.0, 12.0, 12.1`                     |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `E2M1`      | `BF16`  | `E2M1`   | `FP32`  | A/B/D_SCALE = `VEC32_UE4M3`, D_SCALE = `32F`     | `10.0, 10.1, 11.0, 12.0, 12.1`                     |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `E2M1`      | `FP16`  | `FP16`   | `FP32`  | A/B_SCALE = `VEC32_UE4M3`                        | `10.0, 10.1, 11.0, 12.0, 12.1`                     |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `E2M1`      | `BF16`  | `BF16`   | `FP32`  | A/B_SCALE = `VEC32_UE4M3`                        | `10.0, 10.1, 11.0, 12.0, 12.1`                     |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
| `E2M1`      | `FP32`  | `FP32`   | `FP32`  | A/B_SCALE = `VEC32_UE4M3`                        | `10.0, 10.1, 11.0, 12.0, 12.1`                     |
+-------------+---------+----------+---------+--------------------------------------------------+----------------------------------------------------+
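
The `VEC64_UE8M0` and `VEC32_UE4M3` entries in the block-scaled rows denote one shared scale factor per 64- or 32-element vector of the operand. The sketch below illustrates the UE8M0 case under the assumption (from the common microscaling convention, not from this table) that UE8M0 is an 8-bit exponent-only format, i.e. the scale is a power of two chosen so the scaled block fits the `E4M3` range (largest finite value 448). The helper names are hypothetical and are not part of the cuSPARSELt API:

```python
import math

E4M3_MAX = 448.0   # largest finite FP8 E4M3 value
BLOCK = 64         # VEC64: one scale per 64 contiguous elements

def ue8m0_scale(block):
    """Smallest power-of-two s such that max|x| / s <= E4M3_MAX (UE8M0-style)."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 1.0
    return 2.0 ** math.ceil(math.log2(amax / E4M3_MAX))

def block_scales(values):
    """One UE8M0-style power-of-two scale per 64-element block of `values`."""
    return [ue8m0_scale(values[i:i + BLOCK])
            for i in range(0, len(values), BLOCK)]
```

In the library, these scale vectors are supplied as separate device buffers alongside the FP8/FP4 operands; here they are only computed to show what "one scale per block" means.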