simsimd
Portable mixed-precision BLAS-like vector math library for x86 and ARM
Description

Computing dot-products, similarity measures, and distances between low- and high-dimensional vectors is ubiquitous in Machine Learning, Scientific Computing, Geospatial Analysis, and Information Retrieval.
These algorithms generally have linear complexity in time, constant or linear complexity in space, and are data-parallel.
In other words, they are easily parallelizable and vectorizable, and are often available in packages like BLAS (level 1) and LAPACK, as well as in the higher-level NumPy and SciPy Python libraries.
Ironically, even with decades of evolution in compilers and numerical computing, most libraries can be 3-200x slower than hardware potential even on the most popular hardware, like 64-bit x86 and Arm CPUs.
Moreover, most lack mixed-precision support, which is crucial for modern AI!
The rare few that support minimal mixed precision run on only one platform and are vendor-locked by companies like Intel and Nvidia.
SimSIMD provides an alternative.
1️⃣ SimSIMD functions are practically as fast as memcpy.
2️⃣ Unlike BLAS, most kernels are designed for mixed-precision and bit-level operations.
3️⃣ SimSIMD often ships more binaries than NumPy, has more backends than most BLAS implementations, and offers more high-level interfaces than most math libraries.
Features
SimSIMD (Arabic: "سيمسيم دي") is a mixed-precision math library of over 350 SIMD-optimized kernels extensively used in AI, Search, and DBMS workloads. Named after the iconic "Open Sesame" command that opened doors to treasure in Ali Baba and the Forty Thieves, SimSIMD can help you 10x the cost-efficiency of your computational pipelines. Implemented distance functions include:
- Euclidean (L2) and Cosine (Angular) spatial distances for Vector Search. docs
- Dot-Products for real & complex vectors for DSP & Quantum computing. docs
- Hamming (~ Manhattan) and Jaccard (~ Tanimoto) bit-level distances. docs
- Set Intersections for Sparse Vectors and Text Analysis. docs
- Mahalanobis distance and Quadratic forms for Scientific Computing. docs
- Kullback-Leibler and Jensen–Shannon divergences for probability distributions. docs
- Fused-Multiply-Add (FMA) and Weighted Sums to replace BLAS level 1 functions. docs
- For Levenshtein, Needleman–Wunsch, and Smith-Waterman, check StringZilla.
- 🔜 Haversine and Vincenty's formulae for Geospatial Analysis.
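To make the first two kernel families concrete, here is a minimal pure-Python sketch of squared Euclidean and cosine distances. This is only a scalar reference definition of what the kernels compute, not SimSIMD's SIMD implementation or API.

```python
import math

def sqeuclidean(a, b):
    # Sum of squared differences; SimSIMD's L2 kernels compute the same
    # quantity with SIMD instructions instead of a scalar loop.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cos(angle between a and b); defined here for non-zero vectors only.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

print(sqeuclidean([1.0, 2.0], [1.0, 4.0]))      # 4.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal vectors)
```

Cosine distance is scale-invariant, so parallel vectors score 0 regardless of magnitude, which is why it dominates in Vector Search.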
Moreover, SimSIMD...
- handles `float64`, `float32`, `float16`, and `bfloat16` real & complex vectors.
- handles `int8` integral, `int4` sub-byte, and `b8` binary vectors.
- handles sparse `uint32` and `uint16` sets, and weighted sparse vectors.
- is a zero-dependency header-only C99 library.
- has Python, Rust, JS, and Swift bindings.
- has Arm backends for NEON, Scalable Vector Extensions (SVE), and SVE2.
- has x86 backends for Haswell, Skylake, Ice Lake, Genoa, and Sapphire Rapids.
- with both compile-time and runtime CPU feature detection, easily integrates anywhere!
Due to the high level of fragmentation of SIMD support across x86 CPUs, SimSIMD generally names its backends after select Intel CPU generations. They, however, also work on AMD CPUs: the Intel Haswell backend is compatible with AMD Zen 1/2/3, while the AMD Genoa (Zen 4) backend covers the AVX-512 instructions added in Intel Skylake and Ice Lake. You can learn more about the technical implementation details in the following blog posts:
- Uses Horner's method for polynomial approximations, beating GCC 12 by 119x.
- Uses Arm SVE and x86 AVX-512's masked loads to eliminate tail for-loops.
- Substitutes libc's `sqrt` with Newton-Raphson iterations.
- Uses Galloping and SVE2 histograms to intersect sparse vectors.
- For Python: avoids slow PyBind11, SWIG, & `PyArg_ParseTuple`, using a faster calling convention.
- For JavaScript: uses typed arrays and NAPI for zero-copy calls.
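The Horner's-method point above can be illustrated with a scalar sketch: a degree-n polynomial is evaluated with n multiplies and n adds, no explicit powers. The toy coefficients below are hypothetical and unrelated to the actual approximations SimSIMD uses.

```python
def horner(coeffs, x):
    # Evaluate coeffs[0] + coeffs[1]*x + coeffs[2]*x^2 + ...
    # by nesting: (...((c_n * x + c_{n-1}) * x + ...) * x + c_0.
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

# Toy polynomial 1 + 2x + 3x^2 at x = 2: 1 + 4 + 12 = 17
print(horner([1.0, 2.0, 3.0], 2.0))  # 17.0
```

The nested form keeps a single dependency chain of fused multiply-adds, which maps cleanly onto SIMD FMA instructions.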
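Galloping intersection, mentioned above for sparse vectors, can be sketched in pure Python as follows. This is only an illustration of the algorithm's shape, not SimSIMD's SVE2 implementation; the function names are made up for this sketch.

```python
from bisect import bisect_left

def gallop_search(arr, target, start):
    # Grow the probe step exponentially from `start` until we bracket
    # `target`, then binary-search inside the final bracket.
    step = 1
    lo, hi = start, start + 1
    while hi < len(arr) and arr[hi] < target:
        lo, hi = hi, hi + step
        step *= 2
    return bisect_left(arr, target, lo, min(hi + 1, len(arr)))

def intersect_sorted(a, b):
    # Intersect two sorted id arrays, galloping through the longer one.
    if len(a) > len(b):
        a, b = b, a
    result, pos = [], 0
    for x in a:
        pos = gallop_search(b, x, pos)
        if pos < len(b) and b[pos] == x:
            result.append(x)
    return result

print(intersect_sorted([2, 4, 9], [1, 2, 3, 4, 5, 9, 20]))  # [2, 4, 9]
```

Galloping pays off when one set is much smaller than the other: each lookup costs O(log gap) instead of a full linear scan of the larger array.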