XFM Synthesizer

XFM is a polyphonic, 32-voice, 6-operator FM synthesizer stereo module that you can build yourself.

Essentially, the module receives MIDI messages from a controller/sequencer, produces audio and delivers it via digital (SPDIF) and analog outputs.

The sound range XFM can create and its feature set have a very broad intersection with most pure-FM synthesizers of the past (Yamaha DX series, OPL chip series, TX81Z/802/816 modules, etc.).

Testing a prototype of XFM, an FPGA-based FM synthesizer module (the DX7 is used only as a MIDI controller).
XFM is running from a USB battery pack.

Feature Set

  • 32 voices of polyphony

  • Polyphonic or Monophonic Legato modes

  • 6 operators per voice

  • Variable modulation algorithm

  • 8 waveforms per operator

  • A 6-segment Envelope Generator for each operator

  • 6-segment Pitch Envelope Generator

  • Per-voice LFO with Triangle, Square, Sine and S&H waveforms

  • Per-operator adjustable LFO sensitivity for pitch and volume

  • Per-operator adjustable velocity tracking, keyboard tracking

  • Monophonic/Polyphonic portamento

  • Extensive MIDI implementation (pitch bend, mod wheel, aftertouch, sustain, breath ctl, foot ctl, volume, pan, etc.)

  • Auto-panner

  • Stereo Chorus

  • Stereo Decimator

  • 256 programs memory

  • MIDI in, thru

  • All 200+ parameters controllable via MIDI System Exclusive messages

  • 48kHz samplerate

  • Samplerate-grade modulations

  • Stereo, 24-bit SPDIF digital out

  • I2S digital out

  • Stereo, 16-bit analog out

  • Battery friendly, low power operation

  • Fully programmable (connecting it to a PC via USB)

  • Extremely low latency (< 1ms)

The Technology

XFM is an FPGA-based synthesizer.

The selection of technology for XFM was done in several stages. Needless to say, there’s no single computing device which is industry-driven by audio applications. The audio market is so small that it doesn’t even make enough space in the pie chart to have a title.

The audio trend of virtualizing synthesizers and effects also conspires against manufacturers investing millions of dollars in producing dedicated ASIC chips (like the ones used in all original DX synthesizers). So this project, like most hardware projects nowadays, has to be based on general-purpose computing elements. Fortunately, industry segments such as automotive, automation and IoT operate at such a huge scale that computing power per dollar is at a record high.

Coming from the software-based synthesizers and plug-ins, the very first stage was experimenting with the Single-Board Computer (SBC) options (Raspberry Pi, Beaglebone, etc.).

Those boards present enormous advantages over anything else in the option list: they are superbly inexpensive (the Raspberry Pi is $35), easy to program, and have a very stable toolchain. Such a low price is essentially a consequence of the huge production scale (so far over 20M boards have been sold).

However, those also come with notable drawbacks. They run a Linux kernel, so there goes all hardware determinism and timing reliability. While many real-time oriented improvements to the standard OS distribution exist, allowing for decreased boot times and latency, I couldn’t achieve (as of the Raspberry Pi 3+ generation) a level of latency, determinism and reliability remotely comparable with direct silicon.

Additionally, the boards can easily average 2W of power consumption *when idling*, which makes them unsuitable for battery operation.

The next option in the queue is in fact a myriad of sub-options: microcontroller units (MCUs). Covering a vast range of price, computational power and energy consumption, there are hundreds to pick from.

Of notable interest are all the devices used in the Arduino platform. Most Arduino boards are based on AVR units from Atmel (now Microchip), yet a few are actually based on microprocessor units (MPU) such as the ATSAM series (ATSAM3X8E in the Arduino Due).

There are other MCU-based platforms such as the Teensy, which have the advantage of featuring a hardware floating-point unit (FPU).

A third computational option is the use of specialized DSP chips, such as the Texas Instruments C6000 series or Freescale MSC81 series.

The Challenge

XFM is the result of an exploration of applying state-of-the-art hardware technology to sound and music. It was born as the quest for “the ultimately inexpensive polyphonic, 6-operator FM synth”, on the understanding that replicating or combining multiple inexpensive cells would lead to a greater and more powerful unit.

In the current state of the world, the term “inexpensive” means different things to different people. The price of goods and services is more and more tied to economic tensions and less to production costs.

Consequently, the answer to “how cheap can we go?” depends on factors which are much more related to sourcing and availability than to technology and design optimization. In any case, the project outcome price would need to present a notable gap with anything equivalent in the market to be relevant. That goal was achieved.

Bro, do you even calculate?

To select a computing unit, it’s of fundamental importance to know how many calculations we need to perform.

That is reasonably easy to approximate. We know that we want the output at 48kHz samplerate. We also know we want 32 simultaneous voices, and for each voice we need to calculate:

  • 7 waveform generators (6 operators + 1 LFO, phase accumulation plus LUT)

  • 7 envelope generators (one for the level of each operator, plus one for the voice pitch)

  • 7 gain stages (applying EG, LFO, velocity tracking, keyboard tracking, MIDI volume/wheel/after/breath/etc.)

On most platforms, a waveform generator can be optimized to use about 20 atomic instructions (fetch phase, calculate increment from all pitch modulators, add, store phase, look up multiple values from shared ROM/RAM, calculate interpolation). Envelope generators depend on the number of segments and shapes, but can easily go above 40 atomic instructions for a 6-segment generator with adjustable rates and levels, user-defined curves and interpolation. Gain stages depend on whether the system features hardware multipliers; if it does, they require one instruction per target, about 10 in our case. So one sample of one operator would total about 70 instructions.
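The "phase accumulation plus LUT" step mentioned above can be sketched in a few lines. This is only an illustration of the general technique, not the XFM implementation: the 32-bit accumulator width, the 1024-entry table, the linear interpolation and all names are assumptions chosen for the example.

```python
import math

SAMPLE_RATE = 48_000                 # output rate stated in the article
PHASE_BITS = 32                      # assumed accumulator width
TABLE_BITS = 10                      # assumed table size: 1024 entries
TABLE_SIZE = 1 << TABLE_BITS
FRAC_BITS = PHASE_BITS - TABLE_BITS  # low bits used for interpolation
PHASE_MASK = (1 << PHASE_BITS) - 1

# One full sine cycle stored in the lookup table.
SINE_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]


def phase_increment(freq_hz):
    """Phase step added on every sample for the requested frequency."""
    return int(round(freq_hz * (1 << PHASE_BITS) / SAMPLE_RATE))


def oscillator(freq_hz, num_samples):
    """Accumulate phase, look up two table entries, interpolate linearly."""
    phase = 0
    inc = phase_increment(freq_hz)
    out = []
    for _ in range(num_samples):
        index = phase >> FRAC_BITS                       # top bits pick the entry
        frac = (phase & ((1 << FRAC_BITS) - 1)) / (1 << FRAC_BITS)
        a = SINE_TABLE[index]
        b = SINE_TABLE[(index + 1) % TABLE_SIZE]
        out.append(a + (b - a) * frac)                   # linear interpolation
        phase = (phase + inc) & PHASE_MASK               # wrap like a hardware register
    return out


samples = oscillator(1000.0, 48)  # one period of a 1 kHz sine at 48 kHz
```

In an FM voice, the output of a modulating operator would be added to the phase before the table lookup; that step is left out here to keep the sketch short.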

Let’s call the group of one oscillator plus one EG plus one gain stage an “element”. Those numbers can then be multiplied to estimate our requirements:

    48,000 samples/sec * 32 voices/sample * 7 elements/voice * 70 instructions/element ≈ 752 MIPS
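The same estimate, written out as a short script so the individual factors can be changed and the result recomputed. The per-stage instruction counts are the rough figures given above; the variable names are just for this example.

```python
SAMPLE_RATE = 48_000        # samples per second
VOICES = 32                 # simultaneous voices
ELEMENTS_PER_VOICE = 7      # 6 operators + 1 LFO/pitch element

# Rough per-sample instruction counts from the text
INSTR_OSCILLATOR = 20
INSTR_ENVELOPE = 40
INSTR_GAIN = 10

instr_per_element = INSTR_OSCILLATOR + INSTR_ENVELOPE + INSTR_GAIN  # 70

instructions_per_second = (
    SAMPLE_RATE * VOICES * ELEMENTS_PER_VOICE * instr_per_element
)
mips = instructions_per_second / 1e6

print(f"{instructions_per_second:,} instructions/s = {mips:.1f} MIPS")
# prints "752,640,000 instructions/s = 752.6 MIPS"
```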

Additionally, extra computing power must be budgeted for:

  • MIDI handling

  • Parameter handling (UART/USB)

  • Program/Memory handling

  • Effects (Auto-panner, Chorus)

  • Digital Out converter

  • DAC

So in broad terms, we would need a unit that delivers somewhat more than the 752 MIPS estimated above to fit the synthesizer reasonably well. As we want our output to have 24-bit quality, calculations need to be performed in at least 32-bit. That means that each one of those instructions should be 32-bit (i.e. 32-bit addition, 32x32 multiplication, etc.).

AVR units execute one instruction per clock cycle (1 MIPS per MHz), while PIC units take four cycles to execute one instruction (0.25 MIPS per MHz). Most of the Arduino boards run below 100MHz, so they are out of range.
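To put those throughput figures in perspective, the following sketch converts the voice-only estimate of about 752.64 MIPS into the clock frequency each architecture would need. It deliberately ignores the 32-bit requirement (which would make things worse for 8-bit parts) and the extra housekeeping tasks; the function name is only illustrative.

```python
REQUIRED_MIPS = 752.64  # voice-only estimate computed earlier in the article


def required_clock_mhz(mips, mips_per_mhz):
    """Clock frequency (MHz) needed to sustain `mips` at a given efficiency."""
    return mips / mips_per_mhz


# Instructions-per-clock figures quoted in the text
avr_clock = required_clock_mhz(REQUIRED_MIPS, 1.0)   # 1 instruction per cycle
pic_clock = required_clock_mhz(REQUIRED_MIPS, 0.25)  # 1 instruction per 4 cycles

print(f"AVR: {avr_clock:.0f} MHz, PIC: {pic_clock:.0f} MHz")
```

Both results are far above the sub-100MHz clocks of typical Arduino boards.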

Parallax Propeller chips (which were used to build a 16-voice, 2-op functional prototype) are a notable multi-core option. However, the lack of hardware multipliers substantially reduces their throughput. In the prototype, log/antilog tables combined with interpolation were used to compensate for the lack of multipliers, but the resulting SNR wasn’t acceptable for 24-bit operation.
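The log/antilog trick replaces a multiplication with two table lookups, an addition and a shift. The sketch below shows the general idea and why precision suffers; it is not the prototype's code. The 10-bit amplitude format, the 8 fractional log bits and all names are assumptions, and no interpolation is applied here.

```python
import math

AMP_BITS = 10
AMP_ONE = 1 << AMP_BITS        # 1024 represents an amplitude of 1.0
LOG_FRAC_BITS = 8
LOG_ONE = 1 << LOG_FRAC_BITS   # 256 log steps per octave

# Attenuation in fixed-point log2 units for every amplitude 1..1023
# (entry 0 is a placeholder, zero is handled separately).
LOG_TABLE = [0] + [
    round(-math.log2(x / AMP_ONE) * LOG_ONE) for x in range(1, AMP_ONE)
]

# Antilog table: amplitude for each fractional attenuation within one octave.
EXP_TABLE = [round(AMP_ONE * 2.0 ** (-f / LOG_ONE)) for f in range(LOG_ONE)]


def table_multiply(x, y):
    """Approximate (x * y) / AMP_ONE without using a multiplier."""
    if x == 0 or y == 0:
        return 0
    total = LOG_TABLE[x] + LOG_TABLE[y]   # multiplying = adding logarithms
    octaves = total >> LOG_FRAC_BITS      # whole octaves become a right shift
    fraction = total & (LOG_ONE - 1)      # the remainder goes through the table
    return EXP_TABLE[fraction] >> octaves


# Worst absolute error against the exact product over a coarse grid of inputs
worst_error = max(
    abs(table_multiply(x, y) - x * y / AMP_ONE)
    for x in range(1, AMP_ONE, 37)
    for y in range(1, AMP_ONE, 37)
)
```

Every lookup rounds its result, so the error of the product is bounded by the table resolution rather than by the word length. Larger tables or interpolation reduce the error, but reaching 24-bit accuracy this way is expensive, which is consistent with the SNR problem described above.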

So, with the SBCs ruled out because of power, latency and determinism, and most of the Arduino boards ruled out for lack of throughput, the only viable options were DSP boards and FPGAs.

The FPGA option was chosen for XFM, essentially because of price/performance, scalability and future-proofing. DSP chips with similar computational power can easily cost 4x as much. Just as a reference, the DSI Prophet 12 uses the Analog Devices ADSP 21479 (six of those!), which cost $19.95 each.

The synthesizer was then designed in HDL, which can be ported to other FPGA vendors and families with relatively little effort. A bigger unit could eventually be assembled at silicon level (using a bigger FPGA device), in addition to gluing multiple chips together. And ultimately, an ASIC chip could be derived from the design. The only drawback is the length and complexity of the design.

FPGA Selection

Modern FPGAs are extremely complex devices. Besides the standard logic function generators (made with lookup tables or LUTs), and latches/registers (made of flip-flops), they commonly feature hard components, created in dedicated silicon fragments. Common in low-end devices are DSP slices (usually some form of multiplier/accumulator contraption) and ROM/RAM blocks. Going up in the device range are units with hard microprocessors (PowerPC, ARM, etc.), transceivers and miscellaneous interfaces (PCIe, etc.).

The criteria for selecting an FPGA are, consequently, also complex. The number of inputs/outputs to the device, DSP slices, memory requirements, logic elements, specific hard components, base technology, development toolchain, speed and power consumption are all factored into a giant matrix and then compared against what the market offers. In the case of XFM, price is the main driver of the search.

The FPGA market is dominated by Xilinx (the inventor of the FPGA) and Altera (Intel). Together they account for most of what is on offer, with only a few small companies producing niche (yet interesting) devices.

The selected FPGA for the project was the Xilinx Spartan 6. It has 5.7k LUTs, 11.4k registers, 16 DSP blocks and about 70KB of memory. It doesn’t have any hard microprocessor (nor any other hard component), yet Xilinx has a very efficient soft core 8-bit microprocessor called PicoBlaze. A couple of instances of it were installed to handle MIDI and program parameters.

The Spartan 6 isn’t the latest FPGA series Xilinx has released. This might suggest that there could be better alternatives, but the price/performance ratio rarely works that way. It also forces the use of an old (yet thoroughly tested and stable) toolchain, but nothing too problematic. FPGA series usually have an extremely long lifecycle (> 10 years), so component availability shouldn’t be an issue. Also, the fact that the design is written in HDL ensures easy portability to future devices.

The market offers several development boards with the Spartan 6. The two most competitive are the Mojo v3 ($38.9) and the Numato Mimas ($34.95). After prototyping XFM with both boards, it was clear that either one could do the job. The boards have basically the same FPGA, so the differences lie in how easy it is to connect them to the development environment, and how well built they are.

The Numato has a faster clock (100MHz vs 50MHz) and a bigger configuration memory (16MB vs 4MB). On the other hand, the Mojo v3 has a very standard, Arduino-grade MCU (ATmega 32U4). The board is fully open-source, so this combination makes it possible to embed XFM directly in bigger projects.

So XFM is based on the Mojo v3 development board.


The Design

The XFM design focuses on having most of the synthesizer crafted inside the FPGA board.

The only additional components are related to interfacing the board with the external world:

The MIDI Specification states that units should not have a common ground, so four components are needed to match this requirement (optocoupler OK1, D1, R1, R2).

Those components are not needed when chaining units to build a multitimbral unit (pin 57 of the Mojo v3 board is designated as MIDI thru, which can feed the next unit).

Program Memory
A dedicated high-write count, 1Mbit EEPROM chip was used to store Program Memory (IC1).

A very simple, capacitor-coupled SPDIF solution was implemented (C1).

16-Bit Stereo DAC
A simple dual sigma-delta DAC with passive filter is used for each channel (R3, R4, R5, R6, C2, C3).

Obviously, better DAC solutions exist. A DAC can cost anywhere from $1 to $15,000. An easy option could have been used (e.g. a dedicated 24-bit stereo DAC chip). However, the SNR and distortion measurements of the implemented DAC proved to be better than those of all the hardware FM units tested.

Moreover, those units do not feature the 24-bit SPDIF digital output which XFM offers, and which makes real 24-bit audio resolution available.

Sine wave C-E-G triad spectral response in XFM using SPDIF out.


So besides the Mojo v3 development board, the whole design requires less than 20 components including the connectors.


The build consists of adding a small “shield” board to the Mojo v3, which essentially has the following functions:

  • MIDI optocoupling

  • Digital Out (SPDIF)

  • DAC, Analog Stereo Out

  • Program Memory

This is the shield schematic:




The design features a standard, MIDI specification compliant isolated input circuit. It is composed of the photo-coupler OK1, R1, R2 and D1.
While there are particular cases where the isolation (and consequently most of those components) could be removed (particularly for USB-powered MIDI controllers which share the USB ground with XFM), this design is flexible and reliable.

Digital Out (SPDIF)

There are numerous ways to interface digital outs, both SPDIF and AES/EBU. High-end, low-interference interfaces require the use of expensive transformers. However, almost every modern device (including modern high-end ones) implements the ultra-inexpensive interface XFM features: a single capacitor (C1). It has worked perfectly in tests with multiple digital mixers, converters and sound interfaces.

XFM uses the full frame for data transmission, so the digital output is Stereo, 24-bit.
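For readers unfamiliar with the format, here is a sketch of how a 24-bit sample fits into one 32-slot SPDIF (IEC 60958) subframe: slots 0-3 hold the preamble, slots 4-27 the audio word (LSB first), and slots 28-31 the validity, user data, channel status and parity bits. This only illustrates the standard layout; how XFM builds its frames internally is not described in the article, and the function name is hypothetical.

```python
def build_subframe(sample, validity=0, user=0, channel_status=0):
    """Pack a 24-bit sample into the data part of an IEC 60958 subframe.

    Bit n of the returned integer corresponds to time slot n. Slots 0-3 are
    left at zero because the preamble is a special line-code pattern that is
    added later by the biphase-mark encoder.
    """
    word = (sample & 0xFFFFFF) << 4          # slots 4..27: audio, LSB first
    word |= (validity & 1) << 28             # slot 28: validity flag
    word |= (user & 1) << 29                 # slot 29: user data bit
    word |= (channel_status & 1) << 30       # slot 30: channel status bit
    parity = bin(word).count("1") & 1        # even parity over slots 4..31
    word |= parity << 31                     # slot 31: parity bit
    return word


left = build_subframe(0x123456)
right = build_subframe(-1)   # negative samples wrap to 24-bit two's complement
```

When only 20 bits of audio are sent, slots 4-7 carry auxiliary data instead; using them for the four least significant audio bits is what gives the full 24-bit word.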


As previously stated, the DAC in XFM is a 16-bit, dual PWM circuit for each channel. It’s composed of R3, R4, R5, R6, C2 and C3.
The stereo analog outputs are AC coupled by C4 and C5.
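A one-bit DAC of this kind turns each 16-bit sample into a fast stream of ones and zeros whose average density tracks the sample value; the passive RC filter then recovers the analog level. The sketch below shows a generic first-order sigma-delta modulator built from an accumulator and its carry bit, a common FPGA technique. It is not taken from the XFM sources, and the oversampling ratio and names are assumptions.

```python
def sigma_delta(samples, oversample=64):
    """Convert signed 16-bit samples into a 1-bit stream.

    Each input sample is held for `oversample` output bits. The carry out of
    a 16-bit accumulator is the output bit, so the fraction of ones equals
    (sample + 32768) / 65536.
    """
    acc = 0
    bits = []
    for s in samples:
        level = s + 32768              # shift to unsigned range 0..65535
        for _ in range(oversample):
            acc += level
            if acc >= 65536:           # accumulator overflow = output a 1
                acc -= 65536
                bits.append(1)
            else:
                bits.append(0)
    return bits


mid_scale = sigma_delta([0], oversample=1024)       # half of the bits are 1
three_quarters = sigma_delta([16384], oversample=1024)
```

Averaging the bit stream (which is what the resistor-capacitor filter does) gives back a value proportional to the input sample.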

Program Memory

XFM stores all its programs in a 1Mbit EEPROM, a 24LC1025 (IC1). It stores 256 programs plus a few status bytes. Unlike flash, EEPROMs have a very high write-cycle count, which makes them extremely reliable. The simple I2C interface also makes it easy to mod the unit for “cartridge memory”.
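A quick capacity check shows how much room each program can take. The actual memory map is not given in the article, so the number of status bytes and the flat layout (program n stored at n times the slot size) are purely hypothetical.

```python
EEPROM_BYTES = (1 << 20) // 8   # 1 Mbit = 131,072 bytes
PROGRAM_COUNT = 256
STATUS_BYTES = 16               # hypothetical: "a few status bytes"


def max_program_size(status_bytes=STATUS_BYTES):
    """Largest per-program slot that still leaves room for the status bytes."""
    return (EEPROM_BYTES - status_bytes) // PROGRAM_COUNT


def program_address(number, program_size):
    """Start address of a program in an assumed flat, back-to-back layout."""
    if not 0 <= number < PROGRAM_COUNT:
        raise ValueError("program number out of range")
    return number * program_size


slot = max_program_size()            # 511 bytes with 16 status bytes reserved
last = program_address(255, slot)    # start of the last program slot
```

So each program has to fit in slightly under 512 bytes, which comfortably covers the 200+ parameters even if each one were stored in two bytes.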

So the whole design is composed of the Mojo board, six resistors, three capacitors, two e-caps, one diode, one photocoupler, one IC, three RCA connectors, one DIN connector, the board itself and a few wires.


XFM is a hardware design, which runs inside the FPGA. The design was crafted in a low-level HDL (Hardware Description Language), and it has to be downloaded to the board to operate.

Once the board is connected to the computer, the following steps are needed:

Installing the Mojo v3 Driver and Loader

The Driver allows for USB communication to the board, and the Loader will transmit the XFM binary to the FPGA.
This page has the driver and loader download links for your OS, and instructions on how to install (Windows installation is preferred).

Loading the XFM binary into the FPGA

Once the Driver and Loader are installed, just run the loader and load the XFM binary file into the board. Make sure both the “Store to Flash” and “Verify Flash” checkboxes are ticked.

The Mojo will restart after loading. The LED next to the USB connector should light up steadily, indicating a successful configuration, and the eight LEDs should look as follows:


Now the synthesizer is ready and functional. Connecting the MIDI input and audio outputs should be enough to play it in all its glory. At least, in the full glory of bare sine waves, as we haven’t actually loaded any program yet.

Loading the Factory Sound Bank

The factory sound bank can be downloaded using any RS-232 terminal which can transfer binary files (see next), or using the bundled loading utility xload (Windows).

Communicating with XFM

Once XFM is connected to a computer, it is very easy to communicate with it.

XFM uses a high-speed RS-232 communication protocol (500 kbps, 8-bit, no parity), and reacts to a few commands. Any terminal can be used for the communication; here’s a screenshot of RealTerm after sending a Program Dump command (d):

RealTerm Serial Capture Program after issuing a Program Dump command (d)


The following are the commands used in XFM:

XFM RS-232 commands


This minimalistic design allows access to the full functionality of the synthesizer’s parameters and memory space. The interfacing can be done very simply in software (using RS-232) or hardware (by programming the ATmega32U4 using the standard Arduino programming environment).
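As an illustration of the software route, the sketch below sends the Program Dump command (d) and collects whatever comes back. Only the command letter and the line settings (500 kbps, 8 data bits, no parity) come from the text; the function name, the read size and the reply contents are assumptions, and the format of the dump is not documented here. A small stand-in port object is used so the example runs without hardware.

```python
BAUD_RATE = 500_000          # 500 kbps, 8 data bits, no parity (from the text)
PROGRAM_DUMP = b"d"          # Program Dump command


def request_program_dump(port, max_bytes=4096):
    """Send the dump command and return the raw reply bytes.

    `port` is any object with write(bytes) and read(size) methods, such as
    an open serial port. `max_bytes` is an arbitrary upper bound.
    """
    port.write(PROGRAM_DUMP)
    return port.read(max_bytes)


class FakePort:
    """Stand-in for a serial port, used only to exercise the function."""

    def __init__(self, reply):
        self.reply = reply
        self.sent = b""

    def write(self, data):
        self.sent += data
        return len(data)

    def read(self, size):
        chunk, self.reply = self.reply[:size], self.reply[size:]
        return chunk


port = FakePort(reply=b"\x00\x01\x02")   # made-up reply bytes
dump = request_program_dump(port)
```

With real hardware and the third-party pySerial package installed, the fake port could be replaced by something like `serial.Serial("COM3", BAUD_RATE, bytesize=8, parity="N", stopbits=1, timeout=1)`, where the port name and the single stop bit are assumptions.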


XFM is programmed by connecting it to a computer, and sending parameter messages via RS-232.

The complete parameter list is included in this spreadsheet (which can also be used to directly program the device: entering values will send the parameter set commands via serial USB).

Total Cost: sub $50

The main cost of the project is the Mojo v3 board ($38.9 at current eBay prices). The rest of the BOM is about $10.6 according to Octopart.com:

So the total cost for the project would be $49.5 for the whole module, assuming building only one unit. Prices go substantially lower when building multiple units.


Building a full board?

Hardcore DIY electronics builders could easily add extra savings to the project. The Mojo v3 is not only a very simple board, but is also open-source! The schematic, BOM, firmware and EAGLE-files are publicly available here.

The BOM for the Mojo could be in the $15-$30 range depending on where you source the FPGA from (the XC6SLX9 is listed at $16.5 on Octopart.com, but $5.55 on lcsc.com!).

Designing a dedicated board for the whole project would also save the cost of a few connectors.

The ATmega32U4 in the Mojo isn’t used in the project, except for the USB communication. If you know how to JTAG the FPGA (or have another way to UART the FPGA), it could be easily removed for additional savings. On the other hand, it could be used to create a dedicated UI with a display and buttons. The board has eight ADC connections ready to be interfaced with the FPGA by implementing UART messages, allowing real-time control.

But I’m convinced that it is indeed possible to build the complete module, preserving all functionality, for less than $30 in a small batch of ten units or so.

Future Development

The modular structure of the XFM design allows for easy scalability. A much bigger, more powerful and capable synthesizer based on a bigger FPGA is being prototyped while you read this.

However, there’s still a long life of development ahead of XFM. While the Spartan 6 in the Mojo won’t accept a single extra byte of code, the ATmega32U4 MCU still has plenty of room to grow.

The code in the MCU is open source, making it easy to adapt and include XFM in any hardware project using standard Arduino tools.