Design of a Cluster-Coupled Hardware Accelerator for FFT Computation

Bertaccini, Luca (2020) Design of a Cluster-Coupled Hardware Accelerator for FFT Computation. [Master's thesis (Laurea magistrale)], Università di Bologna, Degree Programme in Electronic Engineering (Ingegneria elettronica) [LM-DM270]
Full-text documents available:
PDF document (Thesis)
Available under the license: Creative Commons Attribution-NonCommercial-NoDerivs 3.0 (CC BY-NC-ND 3.0)



This thesis describes the design of a hardware accelerator that computes the Fast Fourier Transform (FFT) and is intended for integration into a PULP cluster. The project was carried out partly at the University of Bologna and partly at ETH Zurich. PULP (Parallel Ultra Low Power) is a joint project, started in 2013, between the Energy-efficient Embedded Systems (EEES) group of UNIBO and the Integrated Systems Laboratory (IIS) of ETH Zurich. The FFT is used not only in data analytics but also as a front end for machine-learning and neural-network applications. The goal of the accelerator is to speed up these kinds of algorithms while computing them at ultra-low power.

The accelerator implements the radix-2 DIT (Decimation-in-Time) FFT, and the whole design is written in synthesizable SystemVerilog. The computational part of the accelerator uses fixed-point arithmetic, and its correct behavior was checked with MATLAB scripts. Since the accelerator is meant to be integrated into the PULP platform, it was designed to comply with the communication protocols implemented on that platform.

The performance of the accelerator was then evaluated in terms of area, timing, flexibility, and execution time. It proved to be seven times faster than a highly optimized software FFT running on 8 cores. In 22 nm technology, it occupies about 115,000 µm² and reaches a maximum clock frequency of 690 MHz. To avoid frequent conflicts when accessing the external memory, a buffer was integrated into the accelerator. This choice shortened execution times but considerably increased the overall area. Finally, a way to remove the internal buffer was studied, and the features of this possible new design were compared with the results obtained for the implemented version of the FFT accelerator.
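The abstract combines two ideas, a radix-2 DIT FFT and fixed-point arithmetic checked against a floating-point golden model. The sketch below illustrates that combination in software only. It assumes a Q1.15 word format, rounded multiplications, and a divide-by-two after every butterfly stage to prevent overflow; none of these choices (nor any of the function names) come from the thesis, so treat them as hypothetical illustrations rather than the accelerator's actual datapath. A plain floating-point DFT stands in for the MATLAB reference scripts.

```python
import cmath
import math

Q = 15              # assumed Q1.15 format: 1 sign bit, 15 fractional bits
ONE = 1 << Q
MAX = ONE - 1
MIN = -ONE


def to_fixed(x: float) -> int:
    """Round a real value to a saturated Q1.15 integer."""
    return max(MIN, min(MAX, int(round(x * ONE))))


def fixed_mul(a: int, b: int) -> int:
    """Q1.15 x Q1.15 product, rounded back to Q1.15."""
    return (a * b + (1 << (Q - 1))) >> Q


def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_radix2_dit_fixed(re, im):
    """Iterative radix-2 DIT FFT on Q1.15 integer lists (hypothetical helper).

    Every stage halves the butterfly outputs, so the result approximates
    DFT(x) / N. This scaling scheme is an assumption for illustration.
    """
    n = len(re)
    bits = n.bit_length() - 1
    assert n == 1 << bits, "length must be a power of two"
    # Decimation in time: reorder the input samples in bit-reversed order.
    xr = [re[bit_reverse(i, bits)] for i in range(n)]
    xi = [im[bit_reverse(i, bits)] for i in range(n)]
    size = 2
    while size <= n:
        half = size // 2
        for k in range(half):
            # Twiddle factor W = exp(-2j*pi*k/size), quantized to Q1.15.
            wr = to_fixed(math.cos(2 * math.pi * k / size))
            wi = to_fixed(-math.sin(2 * math.pi * k / size))
            for start in range(0, n, size):
                a, b = start + k, start + k + half
                # t = W * x[b] (complex multiplication in fixed point)
                tr = fixed_mul(xr[b], wr) - fixed_mul(xi[b], wi)
                ti = fixed_mul(xr[b], wi) + fixed_mul(xi[b], wr)
                # Butterfly followed by a 1-bit right shift (scale by 1/2).
                xr[a], xr[b] = (xr[a] + tr) >> 1, (xr[a] - tr) >> 1
                xi[a], xi[b] = (xi[a] + ti) >> 1, (xi[a] - ti) >> 1
        size *= 2
    return xr, xi


def dft_reference(x):
    """Floating-point DFT scaled by 1/N, used as a golden model."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / n
            for k in range(n)]


if __name__ == "__main__":
    signal = [0.5 * math.cos(2 * math.pi * 3 * t / 16) for t in range(16)]
    out_r, out_i = fft_radix2_dit_fixed([to_fixed(v) for v in signal], [0] * 16)
    ref = dft_reference(signal)
    err = max(abs(complex(r, i) / ONE - c) for r, i, c in zip(out_r, out_i, ref))
    print(f"max abs error vs. floating-point DFT: {err:.2e}")
```

Halving the outputs at each stage guarantees that butterfly sums cannot overflow, at the cost of log2(N) bits of dynamic range; block floating point is a common alternative when that loss matters.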

Document type
Thesis (Laurea magistrale)
Thesis author
Bertaccini, Luca
Thesis supervisor
Thesis co-supervisor
Degree programme
Electronic Engineering (Ingegneria elettronica)
Programme regulations
Keywords
Fast Fourier Transform, FFT, Signal Processing, Hardware Accelerator, PULP, Hardware Design, SystemVerilog, Synopsys, Fixed-point Arithmetic
Thesis defense date
6 February 2020
