1 Department of Applied Mathematics and Computer Science, Technical University of Denmark
2 Embedded Systems Engineering, Department of Applied Mathematics and Computer Science, Technical University of Denmark
3 Copenhagen Center for Health Technology, Center, Technical University of Denmark
Embedded systems are used in a broad range of applications that demand high performance under severe mechanical, power, and cost constraints. Embedded systems implemented in ASIC technology tend to provide the highest performance, the lowest power consumption, and the lowest unit cost. However, high setup and design costs make ASICs economically viable only for high-volume production. FPGAs are therefore increasingly used in low- and medium-volume markets. FPGAs have evolved to the point where multiple processor cores, dedicated accelerators, and a large number of interfaces can be integrated on a single device. This thesis consists of five parts that address performance aspects of synthesizable computing systems on FPGAs. First, it evaluates how synthesizable processor cores can exploit current state-of-the-art FPGA architectures. This evaluation results in a processor architecture optimized for high throughput on modern FPGAs. The current hardware implementation, the Tinuso I core, can be clocked at up to 376 MHz on a Xilinx Virtex-6 device and consumes fewer hardware resources than comparable commercial processor configurations. The Tinuso architecture uses predicated execution to avoid costly pipeline stalls caused by branches, and it exposes pipeline hazards to the compiler to keep the hardware simple. Second, it investigates whether a production compiler, GCC, can successfully leverage predicated execution and schedule instructions so as to mitigate these hazards. The third part describes the design and implementation of communication structures for Tinuso multicore configurations and evaluates the scalability of these systems. Fourth, a case study shows how to map a high-performance synthetic aperture radar application onto a synthesizable multicore system. The proposed system integrates 64 processor cores and a 2D mesh interconnect on a single FPGA device and consumes only about 10 watts.
Finally, a task-based programming model is proposed that makes it easy to express parallelism and simplifies memory management.
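The idea of using predicated execution to avoid branch stalls can be illustrated with a generic C sketch of if-conversion. This is not Tinuso's instruction set or GCC's actual output; the function names `max_branch` and `max_predicated` are hypothetical, and whether a compiler emits predicated instructions or conditional moves for code like this depends on the target and the optimization flags.

```c
/* Branching form: the comparison controls a conditional branch.
   On a pipeline that resolves branches late, a taken branch may
   stall or flush several pipeline stages. */
int max_branch(int a, int b)
{
    if (a > b)
        return a;
    return b;
}

/* If-converted (predicated) form: the comparison produces a predicate
   that selects the result, so control flow never changes. The selection
   is written with a bit mask so the C source itself is branch-free; a
   compiler for an ISA with predicated instructions could instead guard
   each operation with the predicate. Assumes two's-complement int. */
int max_predicated(int a, int b)
{
    int p = (a > b);                  /* predicate: 1 if a > b, else 0 */
    int mask = -p;                    /* all ones if p == 1, all zeros if p == 0 */
    return (a & mask) | (b & ~mask);  /* picks a when mask is all ones, else b */
}
```

Both functions compute the same result; the second trades a possible pipeline stall for a few extra unconditional ALU operations, which is the trade-off predicated execution makes in hardware.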