Josiah Mendes

Making Things Fast 2 - Function Accelerator

2022-06-05

A module that I took in Year 3 at Imperial was Digital Systems Design, taught by Dr Christos Bouganis. This is without doubt one of my personal favourites, mostly because of how engaging and insightful the coursework was.

The coursework centred around taking a relatively simple mathematical equation and evaluating it on a DE1-SoC's FPGA: first implementing it in software on a soft core, then gradually moving more and more of the computation into hardware, increasing performance and reducing area at the expense of generality. This is where the graph at the top comes into play.
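To give a flavour of the starting point (this is a generic stand-in, not the actual coursework equation), the baseline is just a plain loop running on the soft core; every design after that pulls another piece of it into dedicated hardware:

```cpp
#include <cmath>
#include <cstdio>

// Generic stand-in for the coursework's equation, for illustration only.
static float f(float x) {
    return x * x + std::cos(x);
}

int main() {
    // Baseline: everything runs in software on the soft core. Each later
    // design moves a piece of this loop (the arithmetic inside f, then the
    // accumulation, then the loop itself) into dedicated hardware.
    const int N = 1024;
    float acc = 0.0f;
    for (int i = 0; i < N; ++i) {
        acc += f(i * 0.01f);  // hypothetical inputs
    }
    std::printf("result = %f\n", acc);
    return 0;
}
```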

When it comes to hardware, a design intended for more general usage tends to have a larger footprint while also being less proficient at any one specific task. A tailor-made design uses less area and power and is very efficient at what it does, achieving high performance for one very specific task, at the expense of that hardware not being applicable to other work.

I believe this is why we're gradually seeing the industry move towards SoC designs that contain general-purpose cores alongside dedicated silicon for things like machine learning or video transcoding. The example that comes to mind most recently is Apple's M-series chips, which include dedicated silicon for making creative applications fast, such as HEVC video encoding for Final Cut Pro; or the TPU on Google Tensor in the Pixel 6 lineup, which allows those phones to perform voice transcription and computational photography really well.

I do not think it's out of the realm of possibility that FPGAs one day find themselves in consumer-focused SoC designs, allowing software applications to accelerate things that would generally take longer on general-purpose hardware. I can see a flow where, as part of the development process, a SW development team with a workload that can take advantage of FPGAs develops a kernel at development time, ships it with the application, and simply loads it onto the FPGA when needed as part of the application's process.
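Concretely, I imagine the host side of that flow looking something like the sketch below. Everything platform-facing here is hypothetical: fpga_load_bitstream, fpga_call, and the embedded bitstream are made-up names standing in for whatever standardised runtime such an SoC would actually expose, stubbed out so the sketch runs with the software fallback.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// --- Hypothetical platform API (made-up names; see above) ---------------
// Stubbed to fail so this sketch compiles and runs anywhere. On a real
// platform these would come from the SoC vendor's runtime.
static int fpga_load_bitstream(const uint8_t*, size_t) { return -1; }
static int fpga_call(uint32_t, const float* in, float* out, size_t n) {
    (void)in; (void)out; (void)n;
    return -1;
}

// Kernel bitstream compiled at development time and shipped inside the
// application binary (empty placeholder here).
static const uint8_t kernel_bitstream[] = {0};

// Plain software implementation, kept around as the fallback path.
static void compute_sw(const float* in, float* out, size_t n) {
    for (size_t i = 0; i < n; ++i) out[i] = in[i] * 2.0f;  // stand-in workload
}

static void compute(const float* in, float* out, size_t n) {
    // Prefer the application's FPGA kernel; fall back to software if the
    // fabric is absent, busy, or the bitstream fails to load.
    if (fpga_load_bitstream(kernel_bitstream, sizeof kernel_bitstream) == 0 &&
        fpga_call(/*kernel_id=*/0, in, out, n) == 0) {
        return;
    }
    compute_sw(in, out, n);
}

int main() {
    float in[4] = {1.0f, 2.0f, 3.0f, 4.0f}, out[4];
    compute(in, out, 4);
    std::printf("%f %f %f %f\n", out[0], out[1], out[2], out[3]);
    return 0;
}
```

The important property is that the accelerated and software paths share one entry point, so the application behaves identically whether or not the fabric is present.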

For that to happen, though, I think two things are required for developers to pick up this technology for heterogeneous computing:

  1. HLS languages. Although designing hardware requires a different mindset to software, the current options for designing specialised HW in SystemVerilog or VHDL present a high barrier to entry for software engineers hoping to make use of FPGA fabric on a chip. HLS languages are an improvement (see the sketch after this list), but I think more can be done to make them accessible to SWEs.
  2. Platform standardisation. Currently, FPGA designs vary greatly: something that works on a DE0 FPGA from Intel/Altera may not work on an FPGA from another vendor. For consumer SW devs, the platform is (somewhat) standardised, so there's less to worry about; FPGAs would need something similar, like a standardised way of communicating with a host CPU, or size classes for the fabric with minimum requirements for logic elements, DSP elements, and memory elements. This kind of standardisation, in conjunction with better HW design languages, would make FPGA acceleration more accessible to the masses.
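On the HLS point above: the appeal is that a software engineer stays in C++ and lets the tool infer the hardware. As a rough illustration (Vitis-HLS-flavoured pragmas; the kernel itself is a made-up example, not from our coursework), a simple pipelined kernel might look like this:

```cpp
// HLS-style kernel: plain C++ plus pragmas that tell the tool how to map
// the code onto hardware; no SystemVerilog or VHDL in sight.
extern "C" void vadd(const float* a, const float* b, float* out, int n) {
#pragma HLS INTERFACE m_axi port=a   bundle=gmem
#pragma HLS INTERFACE m_axi port=b   bundle=gmem
#pragma HLS INTERFACE m_axi port=out bundle=gmem
    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1  // accept a new loop iteration every clock cycle
        out[i] = a[i] + b[i];
    }
}
```

Even this small example shows the gap: the code looks like software, but getting good hardware out of it still means reasoning about pipelining, interfaces, and memory, which is exactly where better languages and tooling could help.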

Once again, take this opinion with the amount of salt appropriate for an optimistic 3rd-year computer engineering student. It was not my aim to delve into this when I started writing this blog; I really wanted to talk about our DSD coursework in a more informal way. But if you want to read more and see what we did, here are the 3 reports authored by Carol Kwok and myself covering the entire process in full technical detail:

Report 3 is probably the most interesting read and will give you the most insight into our approach to this coursework.