Urbano, Gianlorenzo
(2024)
Time-Synced lyrics in the Browser with WebGPU.
[Laurea (bachelor's degree)], Università di Bologna, Degree Programme in
Computer Science [L-DM270], restricted-access document.
Abstract
In recent years, on-device inference has become a critical area of research, driven by growing demand for low-latency, privacy-preserving, and cost-effective AI applications. This thesis explores running deep learning models for audio processing inside web browsers using modern web technologies such as WebGPU and WebAssembly. The focus is on deploying a pipeline for time-synchronized lyrics that combines a music source separation model (HTDemucs) with a speech recognition model (Wav2Vec2), both optimized for client-side execution.
The research addresses the challenges of executing resource-intensive audio processing tasks on heterogeneous devices, leveraging ONNX Runtime and efficient kernel implementations. It discusses extensive optimizations to the model conversion, preprocessing, and inference pipelines, including the integration of dynamic time warping for accurate phoneme-to-text alignment.
Document type
Degree thesis
(Laurea, bachelor's degree)
Thesis author
Urbano, Gianlorenzo
Thesis supervisor
Thesis co-supervisor
School
Degree programme
Degree programme regulations
DM270
Keywords
Source-Separation, WebGPU, Transformers, Speech-Recognition, Lyrics, On-Device
Thesis defence date
18 December 2024
URI