Time-Synced lyrics in the Browser with WebGPU

Urbano, Gianlorenzo (2024) Time-Synced lyrics in the Browser with WebGPU. [Bachelor's thesis], Università di Bologna, Degree Programme in Computer Science [L-DM270], restricted-access document.

Full-text documents available:

PDF document (Thesis)
Full text not accessible until 31 December 2026.
Available under license: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 (CC BY-NC-ND 4.0)

Abstract

In recent years, on-device inference has become a critical area of research, driven by the growing demand for low-latency, privacy-preserving, and cost-effective AI applications. This thesis explores the integration of deep learning models for audio processing within web browsers, using modern web technologies such as WebGPU and WebAssembly. The focus is on deploying a pipeline for time-synchronized lyrics that combines a music source separation model (HTDemucs) and a speech recognition model (Wav2Vec2), both optimized for client-side execution. The research addresses the challenges of running resource-intensive audio processing on heterogeneous devices by leveraging ONNX Runtime and efficient kernel implementations. It discusses extensive optimizations in the model conversion, preprocessing, and inference pipelines, including the integration of dynamic time warping for accurate phoneme-to-text alignment.
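
The abstract names dynamic time warping (DTW) as the alignment step. The sketch below is a generic, textbook DTW over a frame-by-token cost matrix, offered only as an illustration of the idea; it is not taken from the thesis. The function names `dtw_path` and `token_start_frames`, and the assumption that `cost[i][j]` measures the mismatch between audio frame `i` and lyric token `j`, are hypothetical.

```python
from typing import List, Sequence, Tuple


def dtw_path(cost: Sequence[Sequence[float]]) -> Tuple[float, List[Tuple[int, int]]]:
    """Return the minimal cumulative cost and one optimal warping path.

    cost[i][j] is assumed (hypothetically) to be the mismatch between
    audio frame i and lyric token j; lower means a better match.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j] holds the cheapest cumulative cost of a path ending at
    # cell (i - 1, j - 1); row 0 and column 0 are padding.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = cost[i - 1][j - 1] + min(
                acc[i - 1][j - 1],  # advance frame and token together
                acc[i - 1][j],      # next frame, same token
                acc[i][j - 1],      # same frame, next token
            )
    # Backtrack from the last cell to the first, always stepping to the
    # cheapest predecessor.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
    path.reverse()
    return acc[n][m], path


def token_start_frames(path: List[Tuple[int, int]], num_tokens: int) -> List[int]:
    """Return, for each token, the first frame index aligned to it."""
    starts = [-1] * num_tokens
    for frame, token in path:
        if starts[token] == -1:
            starts[token] = frame
    return starts


# Toy example: 4 frames, 2 tokens; frames 0-1 match token 0, frames 2-3 match token 1.
toy_cost = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
total, path = dtw_path(toy_cost)
starts = token_start_frames(path, 2)
```

Multiplying each start frame by the acoustic model's frame duration would give a timestamp per token; that duration depends on the model configuration and is not stated in this record.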

Document type

Degree thesis (Bachelor's degree)

Thesis author

Urbano, Gianlorenzo

Thesis supervisor

Asperti, Andrea

Thesis co-supervisor

Parisi, Loreto

School

Sciences

Degree programme