FPGA: Why so few open source drivers for open hardware?

Field-Programmable Gate Arrays (FPGAs) have been an interest of mine for well over a decade now. They can generate complex signals in the tens of MHz range with nanosecond accuracy, deal with fast data streams, and do all of this at a fraction of the power consumption of fast CPUs, so they really have a lot of potential for fun. However, their prohibitive cost, proprietary toolchains (some running only on Windows), and insanely long bitstream generation times made them look more like a curiosity to me than a practical solution. Finally, writing Verilog / VHDL directly felt like the equivalent of writing an OS in assembly, and thus felt more like torture than fun for the young C/C++ developer that I was. Little did I know that 10+ years later, I would find HW development to be the most amazing thing ever!

The first thing that changed is that I got involved in reverse engineering NVIDIA GPUs’ power management in order to write an open source driver, writing in a reverse-engineered assembly to implement automatic power management for this driver, creating my own smart wireless modems which detect the PHY parameters of incoming transmissions on the fly (modulation, center frequency) using software-defined radio, having fun with Arduinos and single-board computers, and designing my own custom PCBs.

The second thing that changed is that Moore’s law has ground to a halt, leading to a world that is more architecture-centric than fab-oriented. This reduced the advantage ASICs had over FPGAs by creating a software ecosystem that is geared more towards parallelism than high-frequency single-thread performance.

Finally, FPGAs along with their community have gotten a whole lot more attractive! From the FPGAs themselves to their toolchains, let’s review what changed, and then ask ourselves why this has not translated to upstream Linux drivers for FPGA-based open source designs.

Even hobbyists can make useful HW designs

Programmable logic elements have gone through multiple ages throughout their life. Since their humble beginnings, they have always excelled at low-volume designs by spreading the cost of creating a new ASIC across as many customers as possible. This has enabled start-ups and hobbyists to create their own niche and get into the market without breaking the bank.

Nowadays, FPGAs are all based around Lookup Tables (LUT) rather than a set of logic gates as they can re-create any logic function and can also serve as flip-flops (memory unit). Let’s have a quick look at what changed throughout the “stack” that makes designing FPGA-based HW designs so approachable even to hobbyists.
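
To make the "any logic function" claim concrete, here is a minimal Python sketch of a 4-input LUT modelled as a 16-entry truth table. The function names (make_lut, lut_read) are mine and purely illustrative; real FPGA cells are configured by the toolchain, not by code like this.

```python
def make_lut(func, n_inputs=4):
    """Precompute the truth table of `func` as an integer bitmask.

    Bit number `index` of the result holds the output for the input
    combination whose bits spell `index` (input 0 is the LSB).
    """
    init = 0
    for index in range(2 ** n_inputs):
        bits = [(index >> i) & 1 for i in range(n_inputs)]
        if func(*bits):
            init |= 1 << index
    return init


def lut_read(init, *inputs):
    """Evaluate the LUT: the inputs form an address into the truth table."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return (init >> index) & 1


# Two very different functions stored in the same kind of cell.
xor4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d)  # 4-input parity
and4 = make_lut(lambda a, b, c, d: a & b & c & d)  # 4-input AND

print(hex(xor4), hex(and4))
```

The only thing that differs between the two functions is the 16-bit constant loaded into the table, which is why a sea of identical LUTs can implement arbitrary logic.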

Price per LUT

Historically, FPGAs have compared unfavorably to ASICs due to their higher latency (which limits the maximum frequency of a design) and lower power efficiency. However, just like CPUs and GPUs, one can compensate for these limitations by making a wider/parallel design operating at a lower frequency. Wider designs, however, require more logic elements / LUTs.
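
The width-versus-frequency trade-off is simple arithmetic. The sketch below uses made-up numbers (a 3 GHz design handling one item per cycle versus a 200 MHz design) only to show the shape of the calculation; they are not measurements of any real part.

```python
def throughput(items_per_cycle, clock_hz):
    """Items processed per second by a design of the given width."""
    return items_per_cycle * clock_hz


def lanes_needed(target_items_per_s, clock_hz):
    """Smallest number of parallel lanes reaching the target rate."""
    return -(-target_items_per_s // clock_hz)  # ceiling division


# Hypothetical figures, for illustration only.
fast_serial = throughput(1, 3_000_000_000)   # 1 item/cycle at 3 GHz
slow_wide = throughput(16, 200_000_000)      # 16 items/cycle at 200 MHz
lanes = lanes_needed(fast_serial, 200_000_000)

print(fast_serial, slow_wide, lanes)
```

With these assumed clocks, 15 lanes at 200 MHz match the serial design, and each extra lane costs more LUTs, which is why the price per LUT matters so much.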

Fortunately, the price per LUT has fallen dramatically since the introduction of FPGAs, to the point that all but the biggest designs fit in them. Since then, the focus has shifted to providing hard IPs (fixed functions) instead. This enables a $37 part (XC7A12T) to fit over 3 Linux-worthy RISC-V processors running at 180 MHz, with 80 kB of block RAM available for caches, FIFOs, or anything else. By raising the budget to the $100 mark, the specs improve dramatically with an FPGA capable of running 40 Linux-worthy RISC-V CPUs and over 500 kB of block RAM available for caches!

And just in case this is not enough for you, you could consider the Alveo lineup, such as the Alveo U250, which has 1.3M LUTs, a peak INT8 throughput of 33 TOPS, and 64 GB of DDR4 memory (77 GB/s of bandwidth). For memory-bandwidth-hungry designs, the Alveo U280 brings 8 GB of HBM2 memory (460 GB/s of bandwidth) and 32 GB of DDR4 memory (38 GB/s of bandwidth) to the table, at the expense of having “only” 24.5 INT8 TOPS and 1M LUTs. Both models can be found used on eBay for ~$3000. What a bargain :D !

Toolchains

Proprietary toolchains

Linux is now properly supported by the major players of the industry. Xilinx’s support came first (2005), while Altera joined the club in 2009. Both are, however, the definition of bloated, with toolchains weighing multiple GB (~6 GB for Altera, while Xilinx is at a whopping 27 GB)!

Open source toolchains for a few FPGAs

Project IceStorm created a fully-functional, fully open source toolchain for Lattice’s iCE40 FPGAs. The regular structure of these FPGAs made reverse engineering them and writing the toolchain easier. Since then, the more complex Lattice ECP5 FPGAs got full support, and support for Xilinx’s 7-series is underway. All these projects are now working under the SymbiFlow umbrella, which aims to become the GCC of FPGAs.

Languages

Migen / LiteX

VHDL/Verilog are error-prone and do not lend themselves to complex parametrization. This reduces the re-usability of modules. In contrast, the Python language excels at meta-programming, and Migen provides a way to generate Verilog from relatively simple Python constructs.
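
To give a feel for why Python helps with parametrization, here is a deliberately naive sketch that emits a family of Verilog counters from one Python function. This is NOT the Migen API: Migen builds a structured description of the design before emitting Verilog, whereas this toy (counter_verilog is a name I made up) just formats strings.

```python
def counter_verilog(name, width):
    """Return Verilog source for a free-running counter `width` bits wide."""
    if width < 1:
        raise ValueError("width must be at least 1")
    return "\n".join([
        f"module {name}(input clk, input rst, output reg [{width - 1}:0] count);",
        "  always @(posedge clk) begin",
        "    if (rst)",
        f"      count <= {width}'d0;",
        "    else",
        f"      count <= count + {width}'d1;",
        "  end",
        "endmodule",
    ])


# One description, many variants: the width is just a Python argument.
sources = {w: counter_verilog(f"counter{w}", w) for w in (8, 16, 32)}
print(sources[8])
```

Even at this toy level, loops, dictionaries, and functions replace copy-pasted modules, which is the kind of re-use that is painful to express in plain Verilog.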

On top of Migen, LiteX provides easy-to-use and space-efficient modules to create your own System on Chip (SoC) in less than an hour! It already supports 16+ popular boards, and it generates the Verilog, builds the bitstream, and loads it for you. Documentation is quite sparse, however, so I would suggest you read the LiteX for Hardware Engineers guide if you want to learn more.

High-level Synthesis (HLS)

For complex algorithms, Migen/VHDL/Verilog are not the most efficient languages as they are too low-level: using them is akin to writing image recognition applications in assembly.

Instead, high-level synthesis enables writing an untimed model of the design in C, and converting it into an efficient Verilog/VHDL module. This makes it easy to validate the model, and to target multiple FPGA vendors with the same code without an expensive rewrite of the module. Moreover, changes in the algorithm or latency requirements will not require an expensive rewrite and re-validation. Sounds amazing to me!

The bad part is that most C/C++-compatible HLS tools are proprietary or seem to be academic toy projects. I hope I am wrong though, so I’ll need to look more into them as the prospects are just too good to pass up! Let me know in the comments which projects are your favourite!

Hard IPs (Fixed functions)

Initially, FPGAs were only made of a ton of gates / LUTs, and designs would be fully implemented using them. However, some functions are better implemented as a fast and efficient fixed function: block memory, Serializer/Deserializer (parallel to serial and vice versa, often called SERDES), PLLs (clock generators), memory controllers, PCIe, …
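
As a behavioural illustration of what a SERDES does, the Python sketch below turns parallel words into a bit stream and back. The LSB-first ordering and the default 10-bit width are assumptions made for the example, and a real hard block does this in dedicated silicon rather than in logic you write yourself.

```python
def serialize(words, width=10):
    """Yield the bits of each parallel word one at a time, LSB first."""
    for word in words:
        for i in range(width):
            yield (word >> i) & 1


def deserialize(bits, width=10):
    """Regroup a serial bit stream into parallel words of `width` bits."""
    word, count = 0, 0
    for bit in bits:
        word |= bit << count
        count += 1
        if count == width:
            yield word
            word, count = 0, 0


words = [0x3FF, 0x155, 0x001]
stream = list(serialize(words))
recovered = list(deserialize(stream))
print(len(stream), recovered == words)
```

The serial side has to run `width` times faster than the parallel side, which is exactly why this function is better served by a fixed block than by general-purpose LUTs.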

These fixed-function blocks are called Hard IPs, while the part implemented using the programmable part of the FPGA is by extension called a soft IP. Hard IPs used to be reserved for higher-end parts, but they are nowadays found on most FPGAs, save the cheapest and smallest ones which are designed for low-power and self-reliance.

For example, the $100 part mentioned earlier includes multiple SERDES that are sufficient to achieve HDMI 1.4 compliance, a 4-lane PCIe 2.0 block, and a DDR3 memory controller. This makes it sufficient for implementing display controllers with multiple outputs and inputs, as seen on the NeTV2 open hardware board.

Hard IPs can also be the basis of proprietary soft IPs. For instance, Xilinx sells HDMI 1.4/2.0 receiver IPs that use the SERDES hard IPs to reach the 18 Gb/s of bandwidth that HDMI 2.0 requires.

Soft-CPUs

One might wonder why one would use an FPGA to implement a CPU. Indeed, physical CPUs, which are dirt-cheap and better-performing, could simply be installed alongside the FPGA! So, why waste LUTs on a CPU? This article addresses it better than I could, but the gist of it is that soft CPUs complement fixed logic really well for the less latency-sensitive parts of a design, and provide a lot of value. The drawback is that additional firmware is needed for the SoC, but that is no different from having external CPUs.

There have been quite a few open source toy soft-CPUs for FPGAs, and some proprietary vendor-provided ones. The problem has been that their toolchains were often out of tree, and/or Linux couldn’t run on them. This really changed with the introduction of RISC-V, which is pretty efficient, is supported in mainline Linux and GCC, and can fit comfortably in even the smallest FPGAs from Altera and Xilinx. What’s there not to love?

Open design / open hardware boards

So, all of these nice improvements in FPGAs and their community are great, but they wouldn’t be as attractive if not for all the cheap and relatively-open boards (if not fully-OSHW-compliant) that use them in innovative designs:

Fomu ($50): an iCE40-based FPGA board that fits in your USB port and is sufficient to play with RISC-V and a couple of IOs using a fully open source toolchain!
IceBreaker ($69): a more traditional iCE40-based board that is low-cost, oriented towards IOs, and supported by a fully open source toolchain.
ULX3S ($115-200): the ultimate ECP5-based board? It can be used as a complete handheld or static game console (including wireless controllers) with over-the-air updates, a USB/Wireless display controller, or an Arduino-compatible home-automation gateway including surveillance cameras. All of that with a fully open source toolchain.
NeTV2: Video-oriented platform with 2 HDMI inputs and 2 HDMI outputs which can run as a standalone device with USB and Ethernet connectivity, or as an accelerator using the PCIe 2.0 x4 connector. The most expensive board has enough gates to get into serious computing power which could be used to create a slow GPU, with a pretty decent display controller! Being based on Xilinx’s Artix-7, its open source toolchain is not yet complete, but by the time you are done implementing your design, I am sure the toolchain will be ready!

Ultimately, these boards provide a good platform for any sort of project, further reducing the cost of entry in the hobby / market, and providing ready-made designs to be incorporated in your projects. All seems pretty good on the hardware side, so why don’t we have a huge community around a board that would provide the flexibility of Arduinos but with a Raspberry-Pi-like feature set?

Open source hardware blocks exist

We have seen that neither board availability, toolchains, languages, speed, nor price prevent even hobbyists from getting into hardware design. So, there must be open blocks that could be incorporated in designs, right?

The answer is a resounding YES! The first project I would like to talk about is LiteX, which is an HDL with batteries included (like Python). Here is a trimmed-down version of the different blocks it provides:

LiteX
- Soft CPUs: blackparrot, cv32e40p, lm32, microwatt, minerva, mor1kx, picorv32, rocket, serv, and vexriscv
- Input/Outputs: GPIO, I2C, SPI, I2S, UART, JTAG, PWM, XADC, …
- Wishbone bus: Enables MMIO access to the different IPs for the soft-CPUs, or through different buses (PCIe, USB, Ethernet, …)
- Clock domains, ECC, random number generation, …
LiteDRAM: An SDRAM controller soft IP, or wrapper for DDR/LPDDR/DDR2/DDR3/DDR4 hard IPs of Xilinx or DDR3 for the ECP5.
LiteEth: A 10/100/1000 Ethernet soft IP which also allows you to access the wishbone bus through it!
LitePCIe: Wrapper for the PCIe Gen2 x4 hard IPs of Xilinx and Intel
LiteSATA / LiteSDCard: Soft IP to access SATA drives / SD Cards, providing extensive storage capabilities to your soft CPU.
LiteVideo: HDMI input/output soft IPs, with DMA, triple buffering, and color space conversion.

Using LiteX, one may create a complete System on Chip in a matter of hours. Adding a block is as simple as adding two lines of code to the SoC: one line to instantiate the block (like one would instantiate an object), and one to expose it through the wishbone bus. And if this isn’t enough, check out the new Open WiFi project, or the OpenCores project which seems to have pretty much everything one could hope for.
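
To illustrate what "instantiate the block, then expose it on the bus" boils down to, here is a toy software model of a memory-mapped bus. It does not use the LiteX API at all: Register, Bus, and add_slave are hypothetical names, and the addresses are arbitrary.

```python
class Register:
    """A trivial 32-bit block with a single readable/writable register."""

    def __init__(self, reset=0):
        self.value = reset

    def read(self, offset):
        return self.value

    def write(self, offset, value):
        self.value = value & 0xFFFFFFFF


class Bus:
    """Routes each address to the block that owns it (address decoding)."""

    def __init__(self):
        self.regions = []  # list of (base, size, block)

    def add_slave(self, base, size, block):
        for other_base, other_size, _ in self.regions:
            if base < other_base + other_size and other_base < base + size:
                raise ValueError("overlapping address ranges")
        self.regions.append((base, size, block))

    def _decode(self, address):
        for base, size, block in self.regions:
            if base <= address < base + size:
                return block, address - base
        raise LookupError(f"no block mapped at {address:#x}")

    def read(self, address):
        block, offset = self._decode(address)
        return block.read(offset)

    def write(self, address, value):
        block, offset = self._decode(address)
        block.write(offset, value)


bus = Bus()
leds = Register()               # line 1: instantiate the block
bus.add_slave(0x1000, 4, leds)  # line 2: expose it on the bus
bus.write(0x1000, 0b1010)
print(bin(bus.read(0x1000)))
```

Note how the base address (0x1000 here) is a decision made at SoC-build time. That detail is what makes writing a generic driver hard.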

So… where are the drivers for open source blocks?

We have seen that relatively-open boards with capable FPGAs and useful IOs are affordable even to hobbyists. We have also seen that creating SoCs can be done in a matter of hours, so why don’t we have drivers for all of them?

I mean, we have an FPGA subsystem that is focused on loading bitstreams at boot, or even supporting on-the-fly FPGA reconfiguration. We have support for most hard IPs, but only when accessed through the integrated ARM processor of some FPGAs. So, why don’t we have drivers for soft IPs? Could it be that their developers do not want to upstream drivers for them because the interface and the base address of the block are subject to change? It certainly looks like it!

But what if we could create an interface that would allow listing these blocks, the current version of their interface, and their base address? This would basically be akin to the Device Tree, but without the need to ship the netlist of the SoC you created to every single user. This would enable the creation of a generic upstream driver for all the versions of a soft IP and all the boards using it, and thus make open source soft IPs more usable.
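
A minimal sketch of what such a discovery interface could look like, seen from the driver side, follows. Everything in it is an assumption made up for illustration: the magic value, the table layout (block id, interface version, base address), and the driver names are not taken from any existing project or format.

```python
import struct

# Hypothetical on-device table: a header followed by fixed-size entries.
HEADER = struct.Struct("<4sI")  # magic, number of entries
ENTRY = struct.Struct("<HHI")   # block id, interface version, base address
MAGIC = b"DIP0"                 # made-up magic value


def build_table(entries):
    """Pack (block_id, version, base) tuples as the gateware would expose them."""
    blob = HEADER.pack(MAGIC, len(entries))
    for entry in entries:
        blob += ENTRY.pack(*entry)
    return blob


def parse_table(blob):
    """Return the list of (block_id, version, base) found in the table."""
    magic, count = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a discovery table")
    return [ENTRY.unpack_from(blob, HEADER.size + i * ENTRY.size)
            for i in range(count)]


# Hypothetical driver registry: (block id, interface version) -> driver name.
DRIVERS = {(0x0001, 1): "gpio_v1", (0x0002, 2): "uart_v2"}


def bind_drivers(blob):
    """Map each discovered base address to a driver that supports the block."""
    bound = {}
    for block_id, version, base in parse_table(blob):
        driver = DRIVERS.get((block_id, version))
        if driver is not None:
            bound[base] = driver
    return bound


table = build_table([(0x0001, 1, 0x1000), (0x0002, 2, 0x2000), (0x0003, 1, 0x3000)])
bound = bind_drivers(table)
print(bound)
```

The point of the sketch is that the driver never hard-codes a base address or assumes an interface version: blocks it does not know (id 0x0003 here) are simply skipped, and moving a block only changes the table.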

Removing the fear of ABI instability in open cores is at the core of my new project, LiteDIP. To demonstrate its effectiveness, I would like to expose all the hardware available on the NeTV2 (HDMI IN/OUT, 10/100 Ethernet, SD Card reader, Fan, temperature, voltages), and the ULX3S (HDMI IN/OUT, WiFi, Bluetooth, SD Card reader, LEDs, GPIOs, ADC, buttons, Audio, FM/AM radio, …) using the same driver. Users could pick and choose modules, configure them to their liking, and no driver changes would be necessary. It sounds ambitious, but also seems like a worthy challenge! Not only do I get to enjoy a new hobby, but it would bring together software and hardware developers, enabling the creation of modern-ish computers or accelerators using one-size-fits-all open development boards.

Am I the only one excited by the prospect? Stay tuned for updates on the project!

2020-06-12 edit: Fixed multiple typos spotted by Forest Crossman, the confusion between kb and kB spotted by Mic, added a link to the Linux-worthy VexRiscv CPU, removed the confusion spotted by TD-Linux between HLS and Scala-based HDLs, link to the open-source hardware definition and do not label all boards as being fully open as suggested by the feedback from inamberclad and abetusk.