Tiny LLM automatic deployment for RISC-V-based high-efficiency compute cluster platforms

Li, Cong (2026) Tiny LLM automatic deployment for RISC-V-based high-efficiency compute cluster platforms. [Master's thesis (Laurea magistrale)], Università di Bologna, Degree programme in Telecommunications Engineering [LM-DM270]. Full-text document not available.
The full text is not available at the author's request. (Contact the author)

Abstract

Deploying transformer models on microcontroller-class processors is severely constrained by limited on-chip memory and the absence of hardware-managed caches. This thesis extends Deeploy, an open-source neural-network compiler for low-power microcontroller architectures, to deploy a proof-of-concept version of MicroLlama, a 526 K-parameter (∼2.0 MiB FP32) decoder-only transformer, on the Snitch cluster, an energy-efficient RISC-V compute tile with eight parallel cores and 128 KiB of scratchpad memory. The extension adds multi-core computational kernels and a tiling strategy for scratchpad-based memory hierarchies, with automated Direct Memory Access (DMA) transfers managing data movement. Cycle-approximate simulation confirms numerical correctness within 10⁻⁴ relative error across a range of scratchpad configurations. The tiling strategy scales sub-linearly: for instance, reducing the available scratchpad memory by 16× incurs only a 1.43× cycle penalty. Exploiting Snitch's stream semantic registers (SSRs) and floating-point repeat loops (FREP) yields a 1.50× end-to-end speedup. These results demonstrate that automatic deployment of transformer models on scratchpad-based RISC-V clusters is feasible, establishing a practical path toward on-device language-model inference at the extreme edge.

Document type
Thesis (Laurea magistrale)
Thesis author
Li, Cong
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
Edge AI, Small Language Model, RISC-V, Deep Learning Compilation, Memory Tiling
Thesis defence date
25 March 2026
URI

Other metadata
