Project HSAV is a high-performance, open-source library of GEMM (General Matrix-Matrix Multiplication) kernels specifically optimized for the RISC-V Integrated Matrix Extension (IME) v0.6. By shifting from 1D vector processing to 2D matrix-tile operations, this project provides the critical "software glue" needed to run modern AI and Transformer-based models at peak efficiency on the next generation of RISC-V silicon.
In the current landscape of AI hardware, the RISC-V Integrated Matrix Extension (IME) represents a paradigm shift. While the standard Vector Extension (RVV 1.0) handles data in one dimension, IME introduces specialized instructions for 2D matrix-tile math. Project HSAV is an initiative to implement the first production-ready, open-source kernels leveraging the newly finalized IME v0.6 specification (released March 2026).
Standard AI libraries like OpenBLAS and oneDNN are currently optimized for 1D vector math. Compilers often struggle to automatically "tile" matrix data into the CPU's registers, leading to hardware underutilization. Our project bridges this gap with hand-optimized micro-kernels, written in C intrinsics and assembly, that manually manage data movement between memory and the matrix unit.
Implementation of GEMM Micro-kernels: Developing the core $C = \alpha(A \times B) + \beta C$ logic using the __riscv_vmmacc intrinsic.
VLEN-Agnostic Design: Ensuring the kernels scale across different hardware implementations, whether the vector length ($VLEN$) is 128-bit or 1024-bit.
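The standard way to stay VLEN-agnostic is strip-mining: instead of hard-coding a lane count, each loop iteration asks the hardware how many elements it can process and advances by that amount. The sketch below models this in portable C, with `max_vl` standing in for the value a vsetvl-style query would return on real hardware; the function names are illustrative, not part of the project's API.

```c
#include <stddef.h>

/* VLEN-agnostic strip-mining over n elements.
 * `max_vl` models the per-iteration element count a vsetvl-style
 * instruction would grant for the hardware's actual VLEN; the result
 * must be identical whether VLEN is 128-bit or 1024-bit. */
static float dot_stripmined(const float *x, const float *y,
                            size_t n, size_t max_vl) {
    float acc = 0.0f;
    size_t i = 0;
    while (i < n) {
        /* Take a full strip, or the remainder on the final pass. */
        size_t vl = (n - i < max_vl) ? (n - i) : max_vl;
        for (size_t l = 0; l < vl; ++l)  /* stand-in for one vector op */
            acc += x[i + l] * y[i + l];
        i += vl;
    }
    return acc;
}
```

Because the tail is handled by shrinking `vl` rather than by a separate scalar loop, the same kernel binary runs correctly on any implementation width.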
Precision Targeting: Optimizing specifically for BF16 and FP8 (E4M3/E5M2) data types, which are the industry standards for 2026 AI inference.
Benchmarking & Tooling: Providing a comprehensive test suite using the QEMU 9.2+ simulator to demonstrate performance gains over standard RVV 1.0 implementations.
By providing these kernels under a permissive license (MIT/Apache 2.0), this project enables the broader FOSS community to run Large Language Models (LLMs) and computer vision tasks on RISC-V hardware with performance parity to proprietary accelerators. The end goal is to upstream these kernels into major BLAS (Basic Linear Algebra Subprograms) libraries, making matrix acceleration "just work" for every RISC-V user worldwide.