Tiny LLM automatic deployment for RISC-V-based high-efficiency compute cluster platforms

Li, Cong (2026) Tiny LLM automatic deployment for RISC-V-based high-efficiency compute cluster platforms. [Master's thesis (Laurea magistrale)], Università di Bologna, Degree programme in Telecommunications Engineering [LM-DM270]. Full-text document not available.
The full text is not available at the author's request. (Contact the author)

Abstract

Deploying transformer models on microcontroller-class processors is severely constrained by limited on-chip memory and the absence of hardware-managed caches. This thesis extends Deeploy, an open-source neural-network compiler for low-power microcontroller architectures, to deploy a proof-of-concept version of MicroLlama, a 526 K-parameter (∼2.0 MiB FP32) decoder-only transformer, on the Snitch cluster, an energy-efficient RISC-V compute tile with eight parallel cores and 128 KiB of scratchpad memory. The extension adds multi-core computational kernels and a tiling strategy for scratchpad-based memory hierarchies, with automated Direct Memory Access (DMA) transfers managing data movement. Cycle-approximate simulation confirms numerical correctness within 10⁻⁴ relative error across a range of scratchpad configurations. The tiling strategy scales sub-linearly: for instance, reducing the available scratchpad memory by 16× incurs only a 1.43× cycle penalty. Exploiting Snitch's stream semantic registers (SSRs) and floating-point repeat loops (FREP) yields a 1.50× end-to-end speedup. These results demonstrate that automatic deployment of transformer models on scratchpad-based RISC-V clusters is feasible, establishing a practical path toward on-device language-model inference at the extreme edge.

Document type
Thesis (Laurea magistrale)
Thesis author
Li, Cong
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
Edge AI, Small Language Model, RISC-V, Deep Learning Compilation, Memory Tiling
Thesis defence date
25 March 2026
URI

Other metadata
