Hey guys, I’m writing a user manual for some software I’m publishing. It’s a software synthesizer design toolkit, for making your own software synthesizer in your programming language of choice. Of course, in order to make your own synthesizer, you must know how one works.
My goal in writing this user manual is not only to document my code, but also to teach how synthesizers actually work, so that anyone can make their own. That’s where this post comes in. I need inspiration on what exactly it is people don’t already know about them, and what all the hot topics are.
I’m happy to actually explain these things in the comments below!
I’m a long-time software developer who at one point spent a lot of time on a software synth as a hobby project (never finished it as I realized it had fundamental design flaws). I’m also interested in making music (but still suck at it), follow various producers on YouTube and dabble with Ableton. Here are some things that puzzle me:
Latency seems inevitable, regardless of how fast your CPU or code is. Many algorithms simply require a certain window of input data before they can produce anything. For example, an FFT with a window size of 2048 needs 2048 samples (about 46 milliseconds at 44.1 kHz) before it can react. Chain multiple such filters together and it adds up. In my hobby project I wanted to make a “reverse reverb” module (buffer the audio, reverse it, apply reverb, then reverse it again to get an effect as if the sound is “arriving”) and I could never wrap my head around how to do it. It could potentially add a latency of tens of seconds. How can we deal with this in the audio pipeline? It seems like for prerecorded or generated audio, it should be possible to consume data ahead of time so that the output comes out at the right time. But then all of the modules need to be synchronized so that, e.g., a drum hit comes out at the same time along all paths.
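On the synchronization part: a common approach in DAWs is delay compensation, where each path reports its latency and the shorter paths are padded to match the longest one. Here is a minimal Python sketch of just the arithmetic, assuming a 44.1 kHz sample rate. The function names and the path names are made up for illustration and do not come from any real API.

```python
SAMPLE_RATE = 44_100  # assumed sample rate in Hz


def latency_ms(samples, sample_rate=SAMPLE_RATE):
    """Convert a latency given in samples to milliseconds."""
    return 1000.0 * samples / sample_rate


def compensation_delays(path_latencies):
    """Given the total latency (in samples) of each parallel path,
    return the extra delay each path needs so all paths line up
    with the slowest one."""
    longest = max(path_latencies.values())
    return {name: longest - lat for name, lat in path_latencies.items()}


# Hypothetical signal paths: a dry path, one FFT filter, two chained FFT filters.
paths = {"dry": 0, "fft_filter": 2048, "two_fft_filters": 4096}

print(round(latency_ms(2048), 1))   # latency of one 2048-sample window, in ms
print(compensation_delays(paths))   # padding needed per path, in samples
```

With generated or prerecorded audio, the same numbers can be used the other way around: start feeding the slow paths early by that many samples instead of delaying the fast ones.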
Typically analog synths have lower latency, but I don’t understand why. Aren’t they theoretically subject to the same limitations as a digital synth? Even an analog filter would need some kind of buffer to determine frequency. It’s like Heisenberg’s uncertainty principle but for sound. So how does that work, and how can we replicate the low latency of analog synths in software synths?
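One way to see that a filter does not have to buffer a window of input: a one-pole low-pass (the digital cousin of an analog RC filter) keeps a single number of state and already responds on the very first sample. This is only a sketch, assuming the common coefficient formula a = 1 - exp(-2*pi*cutoff/sample_rate); `one_pole_lowpass` is a hypothetical name.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate=44_100):
    """Filter sample by sample; the only memory is the previous output y."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    y = 0.0
    out = []
    for x in samples:
        y += a * (x - y)  # move a fraction of the way toward the input
        out.append(y)
    return out


# Feed in a step (silence followed by a constant level of 1.0).
step = [1.0] * 441  # 10 ms at 44.1 kHz
response = one_pole_lowpass(step, cutoff_hz=1000)
print(response[0], response[-1])
```

The output starts moving immediately and settles gradually, so the "uncertainty" shows up as a smeared response over time rather than as a fixed block of delay. Block-based methods like the FFT are where the fixed window latency comes from.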
I lack an intuition about sound synthesis and it all seems very magical, so I wish somebody would help me untangle the relationship between what I hear and what the algorithm does. I mean, it’s easy to look up algorithms for producing audio, but I don’t know how to apply those algorithms to incrementally work my way toward the sound I have in my head. As a developer I have an analytical mindset, and most producers I follow seem to go more on feeling (which is difficult for me). I have a hunch that a lot of what they talk about is just placebo, but I don’t know how I would test that assertion. For example, there are people who compare the sounds of Ableton’s Operator and Serum as if they are different beasts. But both can do FM, and it’s the same maths behind it. So why would they sound different? With all the FM synths that are out there, what are the things that actually set them apart and give them a different “feeling”?
In fact, speaking of FM synths, they are one of the biggest mysteries for me. I know what they do mathematically, but I need help understanding why someone chose to build a synth in this particular way and how they tame it to get the sound they want. To me it just seems like a really chaotic way to work, only slightly better than a random number generator.
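For reference, the maths in question is less random than it looks. A two-operator FM tone, written in the phase-modulation form sin(2*pi*fc*t + I*sin(2*pi*fm*t)), puts sidebands at fc plus or minus multiples of fm, and the modulation index I decides how much energy moves into them. Below is a small Python sketch that measures this; the 8 kHz sample rate, the frequencies, and the helper names `fm_tone` and `amplitude_at` are assumptions chosen to keep the demo short.

```python
import cmath
import math

SAMPLE_RATE = 8_000  # deliberately low, assumed rate to keep the demo fast


def fm_tone(carrier_hz, modulator_hz, index, n_samples, sample_rate=SAMPLE_RATE):
    """Two-operator FM: a sine carrier whose phase is wobbled by a sine modulator."""
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        phase = (2 * math.pi * carrier_hz * t
                 + index * math.sin(2 * math.pi * modulator_hz * t))
        out.append(math.sin(phase))
    return out


def amplitude_at(signal, freq_hz, sample_rate=SAMPLE_RATE):
    """Amplitude of a single frequency component (one-bin DFT)."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate)
              for n, x in enumerate(signal))
    return 2.0 * abs(acc) / len(signal)


# One second of audio each: index 0 is a plain sine, index 2 is noticeably brighter.
plain = fm_tone(1000, 100, 0.0, SAMPLE_RATE)
bright = fm_tone(1000, 100, 2.0, SAMPLE_RATE)

for f in (900, 1000, 1100, 1200):
    print(f, round(amplitude_at(plain, f), 3), round(amplitude_at(bright, f), 3))
```

So there are really two knobs: the carrier-to-modulator frequency ratio sets where the partials land (harmonic or clangy), and the index sets how bright the tone is. Sweeping the index with an envelope is roughly the FM counterpart of opening a filter.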
Perhaps it would be interesting as a case study to try to replicate some of these commercial software synths by stitching together basic algorithms covered in the manual.