randy's Recent Posts

Since Sumu 1.1.3 was released with some important bug fixes, I've been working on optimization. The release notes just say "- many optimizations to app framework and DSP" but I know some of you will be interested in a deeper dive. This is written for people who are interested in programming. So while I'm not assuming a great amount of specialized info, I'll throw around terms like "compile time" and "runtime." If you want to know what anything means, you can ask a search engine or even me, in the comments! Let's dig in.

With the 1.1.3 version released October 1, I made an eight-voice patch that uses the filter and automatically makes sound, as my rough benchmark. I also set up a document in the app I use for much of my testing, the AudioPluginHost that comes with JUCE. I use this because it launches so fast, just a few seconds compared with maybe 20 seconds for the major DAWs. When looking at the Activity Monitor in macOS, the CPU usage for my AudioPluginHost doc was 63%, plus or minus 0.5%. A rough number but good enough to see the big changes I'm making now, and when I don't need precision, I usually do what's quickest!

This 63% measures against one CPU core, of which most of our machines have around 8. Still, very very heavy CPU use for a softsynth. Let's do something about it. Using Apple's Time Profiler developer tool, I made a list of the biggest CPU time users in the code. While Activity Monitor is precise enough to check that I'm making forward progress, I use the profiler to identify which parts of the code are the hotspots. Let's save some cycles!

Optimization 1, hashed symbol rewrite in madronalib

My plugin framework has a lot of code based on an internal Symbol table. It's nice to write stuff like parameters["osc1/gain"] instead of parameters[kOsc1GainParamIndex] because then you have to make the list of those index constants somewhere and keep it up to date. A possible downside of indexing everything with Symbols is that you can only catch spelling errors at runtime, not compile time. But for me it's faster to bang out code this way and I love brevity, so it's a good tradeoff.

Version 1 of this Symbol code stored Symbols by index, in order of their creation. Since this can't be done at compile time, some work was needed at runtime to turn the string "osc1/gain" into a symbol index every time you wanted to access something by a symbol. I realized later that many of these lookups could be done at compile time instead (or in C++ terms, constexpr) if I used a hashing function to generate the index. Using a 64-bit hash as the index means that collisions are astronomically unlikely. Also, the debug version of the software checks for collisions, so if that very unlikely thing happened, someone could rename a variable.
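To make the compile-time idea concrete, here is a minimal sketch. It is not madronalib's actual code: the names hashSymbol, kOsc1Gain and hashAtRuntime are made up for illustration, and 64-bit FNV-1a is only an assumed stand-in for whatever hash the framework really uses.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical sketch: 64-bit FNV-1a, usable at compile time (C++17).
// The hash function madronalib really uses may be different.
constexpr std::uint64_t hashSymbol(std::string_view text)
{
  std::uint64_t h = 14695981039346656037ull;  // FNV-1a 64-bit offset basis
  for (char c : text)
  {
    h ^= static_cast<std::uint8_t>(c);
    h *= 1099511628211ull;  // FNV-1a 64-bit prime
  }
  return h;
}

// Because the function is constexpr, a literal key is hashed by the
// compiler, so no string work is left to do at runtime.
constexpr std::uint64_t kOsc1Gain = hashSymbol("osc1/gain");
static_assert(kOsc1Gain == hashSymbol("osc1/gain"), "same text, same hash");
static_assert(kOsc1Gain != hashSymbol("osc2/gain"), "different text, different hash");

// The same function still works at runtime, for example on names read
// from a config file.
inline std::uint64_t hashAtRuntime(const char* text)
{
  return hashSymbol(std::string_view(text));
}
```

A container keyed by these 64-bit values can then be indexed with a constant for literal names, while names that only exist at runtime go through the same function and land on the same key.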

We still want to index some things by un-hashed Paths, File trees for example. And there's a need to make Symbols on the fly sometimes, as when reading in config files. So after rewriting my classes SymbolTable and Symbol, I had my Path class to rewrite, with a GenericPath class that can be used to implement container classes like Tree, and Path and TextPath subclasses for the different use cases. It was a lot to think about. But in a couple of weeks of working on this part time I had my much-improved replacement code for this fundamental part of my framework.

speedup: 63% -> 60%

Optimization 2, removed mostly unused i/o scale multiplies in Patcher

I had designed some flexibility into the DSP object that implements Sumu's central patch bay, in the form of individual scales for each input and output. It turned out that in the final design, most of these were set to 1.0 most of the time, so it was quicker to just multiply the few inputs and outputs that needed different values, as special cases. Of course multiplying things takes time so we like to avoid it.
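As a rough illustration of the special-casing idea, and not Sumu's Patcher code, the sketch below keeps a short list of only those channels whose gain differs from 1.0. The struct SparseScales and its members are hypothetical names.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch, not Sumu's Patcher: scale only the channels whose
// gain differs from 1.0, instead of multiplying every channel every block.
struct SparseScales
{
  // (channel index, gain) pairs for the few channels that need scaling
  std::vector<std::pair<std::size_t, float>> nonUnity;

  void set(std::size_t channel, float gain)
  {
    for (std::size_t i = 0; i < nonUnity.size(); ++i)
    {
      if (nonUnity[i].first == channel)
      {
        // A gain of 1.0 means "no work needed", so drop the entry.
        if (gain == 1.0f)
          nonUnity.erase(nonUnity.begin() + static_cast<std::ptrdiff_t>(i));
        else
          nonUnity[i].second = gain;
        return;
      }
    }
    if (gain != 1.0f) nonUnity.emplace_back(channel, gain);
  }

  // channels[c] holds one block of samples for channel c
  void apply(std::vector<std::vector<float>>& channels) const
  {
    for (const auto& entry : nonUnity)
    {
      for (float& sample : channels[entry.first]) sample *= entry.second;
    }
  }
};
```

With most gains at 1.0 the list stays nearly empty, so the per-block cost follows the number of special cases rather than the number of inputs and outputs.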

speedup: 60% -> 54%

Optimization 3, don't send published signals when there is no view

In order for signals to get from the DSP core to Sumu's GUI, for meters and displays, the concept of publishing is used. Sumu publishes signals and the GUI subscribes to them. I noticed that the code was doing some of the work for this publishing even when there was no window showing. Turning all the published signals code off when there's no window was an obvious time-saver.
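The shape of that fix can be sketched as an early return in the audio callback. This is an assumption-heavy illustration rather than Sumu's code: SignalPublisher, viewIsOpen and kBlockSize are hypothetical names, and the thread-safe handoff to the GUI that a real plugin needs is left out.

```cpp
#include <algorithm>
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical sketch, not Sumu's code: skip all publishing work when no
// editor window is open.
struct SignalPublisher
{
  static constexpr std::size_t kBlockSize = 64;

  std::atomic<bool> viewIsOpen{false};     // set by the GUI when a window opens or closes
  std::array<float, kBlockSize> latest{};  // most recent block, for meters and displays
  std::size_t blocksPublished = 0;

  // Called from the audio callback once per block.
  void publish(const std::array<float, kBlockSize>& block)
  {
    if (!viewIsOpen.load(std::memory_order_relaxed)) return;  // nobody is watching
    std::copy(block.begin(), block.end(), latest.begin());
    ++blocksPublished;
  }
};
```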

speedup: 54% -> 47%

Optimization 4: optimized LadderFilter integration to reduce number of operations

One of the more expensive DSP components in Sumu is its nonlinear resonant Moog-style ladder filter. This is derived from work by D'Angelo and Välimäki. I love the onset of resonance in this filter model, the way it starts to oscillate just a bit in a way that depends on the input signal. It has a real liveness and sounds very full and clear.

The filter is four nonlinear stages with a feedback stage. I stared at my naive implementation, did some simple math and realized that by juggling the variables I could change the scaling of the signals running through the filter by a constant amount, and thereby save a multiply per stage. High school algebra for the win!

speedup: 47% -> 45%

Optimization 5: LadderFilter() 4x object

Because of its four stages, the Moog filter model is a bit slow: each stage depends on the result of the previous stage, and the values must be calculated one after another for each sample of audio. If we consider one of these filters by itself, SIMD (SSE / AVX / NEON) is not immediately helpful, because while SIMD can do four multiplies at once, the values can't depend on each other.

Where SIMD does help us out is in running four or more of the filters at once. By changing the filter code to operate on groups of four samples rather than single samples, we can make a 4x filter bank that runs in only a little more time than the single filter. Because the inputs and outputs are arranged [a1, b1, c1, d1], [a2, b2, c2, d2], ... in memory, we can think of these as vertical filter banks, operating on four columns of signal in contrast to the horizontal single filter.
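Here is a minimal sketch of the vertical layout. To stay short it uses a linear one-pole lowpass as a stand-in for the nonlinear ladder filter, and plain arrays instead of SSE or NEON intrinsics. Float4 and OnePoleBank4 are hypothetical names, not madronalib types.

```cpp
#include <array>
#include <cstddef>

using Float4 = std::array<float, 4>;  // one sample from each of four filters

// Hypothetical sketch: four independent one-pole lowpass filters run in
// lockstep. A stand-in for the real 4x ladder filter, which is nonlinear.
struct OnePoleBank4
{
  Float4 coeff{};  // per-filter smoothing coefficient, 0..1
  Float4 state{};  // per-filter previous output

  // input and output are interleaved: [a1, b1, c1, d1], [a2, b2, c2, d2], ...
  void process(const Float4* input, Float4* output, std::size_t frames)
  {
    for (std::size_t n = 0; n < frames; ++n)
    {
      // Each lane depends only on its own state, never on a neighbor,
      // so the four updates are independent and suit 4-wide SIMD.
      for (std::size_t lane = 0; lane < 4; ++lane)
      {
        state[lane] += coeff[lane] * (input[n][lane] - state[lane]);
        output[n][lane] = state[lane];
      }
    }
  }
};
```

The dependency from one sample to the next is still there inside each lane. What changes is that the wait is shared by four filters instead of being paid four times.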

Sumu has two of the filters in each voice, applied to the left and right outputs. So if we have more than two voices of Sumu, we need to calculate four or more filters anyway. When we have eight whole voices and 16 filters, the savings are big, around 10% of our remaining total CPU!

speedup: 45% -> 41%

Optimization 6: try tanhApprox with divide approximation in LadderFilter

The nonlinearity in each stage of the filter is implemented using a tanh (hyperbolic tangent). This is an expensive operation, so we make do with an approximation. Picking a good one is as much art as science, because in the context of the filter different approximations will give different sounds. For the single filter I had already picked an approximation based on a simple ratio of polynomials: y = x(27 + x^2)/(27 + 9x^2). Though a deep understanding of why this simple ratio is such a good match for our transcendental function eludes me, as we've established I'm good at simple math and timing things.
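The ratio of polynomials given above is easy to try out in isolation. In the sketch below, tanhApprox implements that formula directly. The error-measuring helper maxTanhApproxError is a hypothetical addition for experimenting and is not taken from the Sumu source.

```cpp
#include <cmath>

// Rational approximation from the text: tanh(x) ~= x(27 + x^2) / (27 + 9x^2).
inline float tanhApprox(float x)
{
  const float x2 = x * x;
  return x * (27.0f + x2) / (27.0f + 9.0f * x2);
}

// Hypothetical helper: largest absolute difference from std::tanh over
// evenly spaced points in [-limit, limit].
inline float maxTanhApproxError(float limit, int steps)
{
  float worst = 0.0f;
  for (int i = 0; i <= steps; ++i)
  {
    const float t = static_cast<float>(i) / static_cast<float>(steps);
    const float x = -limit + 2.0f * limit * t;
    const float err = std::fabs(tanhApprox(x) - std::tanh(x));
    if (err > worst) worst = err;
  }
  return worst;
}
```

One property that can be checked by hand: at x = 3 the ratio is 3 * 36 / 108, exactly 1, and its slope there is zero. Beyond that point it keeps rising past 1, so the input range matters if the function is used on large signals.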

We were already using the above approximation in our single horizontal filter. With SIMD in the mix, though, there's an appealing idea in the form of the SSE function _mm_rcp_ps. Divides are one of the most CPU-intensive operations, and our approximation unavoidably contains one. But _mm_rcp_ps is a reciprocal approximation with the potential to run much faster than the full divide. It uses a lookup table implementation internally to produce a roughly 12-bit accurate result in around half the time of the more accurate division.

Now when you introduce any approximation, there are going to be changes in the output, possibly audible. This did not apply to the 1x -> 4x filter bank changes because the SIMD values are all still full-precision 32-bit floating point. But the divide approximation would change the feedback path of the filter to have less resolution. It might not sound different, but it might. Fortunately, this was a very quick change to try out.

And the somewhat surprising result: it wasn't any faster! This isn't too hard to explain: along with the reciprocal estimate, you still have to do a multiply to get the divide estimate. And, even though we are doing four filters at once, each stage still has data dependencies. I haven't used a timing simulator to do a deep dive into this (godbolt.org is the one I would try) but my guess is that the expensive divisions all fit into time that was spent in each stage waiting for previous values anyway. So the result:

speedup: none

So in around ten weeks we've gone from using 63% of our CPU for eight voices to 41%. This is around a 35% speedup, and IMO the difference between "what are they smoking" and U-He Diva territory. And, it's only the start! More optimizations remain. But now is a good time to make a release and share this work. It's there for you in Sumu 1.2.0. Enjoy the sounds!

Thanks for sharing! With a patch that uses the filter, you should see more of a difference.

Hi @marsdietz thanks for your kind note. I hear your compelling use case for an iOS version. I'm not working on iOS versions just yet, but I'm working on things that we would need to get in place before that effort. In other words you'll see Aalto 2.0 before too long, and shortly after that it could be possible to have an iOS version. I'll probably telegraph my intent before that and solicit some beta testing. Please stay tuned!

Happy Solstice, and here's to change and evolution. This goes out with all my best wishes to everyone for health, peace and prosperity in what has been such a hard year for so many. I hope that, despite COVID, you are finding ways to keep sane and happy, to nurture your body and spirit. A few weeks ago I got away on a short trip to meet some friends at the Pacific Bonsai Museum. That's where the lovely grouping of golden larch trees on the sale image here comes from. Visiting was a meditative, inspiring and world-expanding trip for me and if you're in the Seattle area, I recommend it highly.

From now through January 6 2021, our year-end sale is happening! You can use the code EVOLVE to get 30% off all our software. If you're an Aalto fan but you've been holding out on getting Kaivo or Virta or Aaltoverb, now's a great time. And yes, our simple bundle deal is in effect along with the year-end discount, if you choose to take advantage of them both. This results in some big discounts!

Finally, if you're looking for a last-minute gift, you should know that it's easy to give a Madrona Labs software license! A gift license can even be part of a simple bundle with one you bought for yourself and I'll be happy to transfer it free of charge (and judgement). Just email me at support@madronalabs.com to let me know. I'm taking Dec 25, Dec 31, and Jan 1 away from the computer but otherwise I'll be available within 24 hours (and probably less) to help make your holiday dreams come true.

Yep, we're about to have our end of year sale in a couple of days!

There's no further discount on the Studio Bundle however. We cap discounts at 30% to keep things sane and that's what the bundle offers any time.

I'm working to grow the company to where we can make a Soundplane Model B and support it well. It's been a long journey! Thanks for your continued interest.

I'm starting Team Notes as a place for me and other folks on Team Madrona Labs to post more casual things for a smaller audience. If you're seeing this, I guess you're that audience. Welcome!

Typical posts might be special interests within our special interest: technical details about making the software, or about making the coffee, what we're listening to, rants on technology and how it's making everything better or making everything worse, recipes... you get the idea. If there's anything you've always wanted to know, please reply here with your ideas!

This orange clipboard has my notes from the current Sumu optimization. I'm figuring out how to run four of the Moog filters in parallel in around the time it previously took to run one, and thinking about how to generalize this solution in an elegant way. I still do big abstract thinking much better when I'm using actual pen and paper.

Making code faster with SIMD (Single Instruction, Multiple Data) processing is something I've enjoyed doing since Motorola first released their Altivec-enabled PowerPC processors. From Altivec to SSE and AVX and ARM NEON, the SIMD concept has remained largely the same across processor generations for the last 25 years. Compilers have gotten better at optimizing C++ code for SIMD, but to get the fastest results, the right algorithms and data structures have to be picked to kind of make the CPU happy, which is much higher-level thinking. And zooming out even further there are more human-centered tradeoffs to think of: what kind of code reduces the potential for bugs, what is readable and maintainable, what will be easy to learn for new people coming aboard the project? Are we optimizing for just the latest Apple machines, or do we want this stuff to be runnable on a ten year old ThinkPad? The questions may start out purely technical, but very soon the answers start to reflect an organization's values.

Sumu makes heavy use of SIMD already in its oscillator banks. But the 4x filter object is one that had to wait until after the initial release. Between this and other optimizations, Sumu 1.2 will use around 30% less CPU than 1.1.3. And there's more optimizing to come. My goal is to run it on this ten year old ThinkPad over here!

@rjschrei sorry to hear about your gear.

Thanks everyone! Sooooo much moss wants to grow on our roof here in the Pacific Northwest. I think I'm going to hire a professional.

Thanks for the additional info. If you can email me at support@madronalabs.com and send a crash log, it will help me figure this out.

Hey @intervolver I hear you are pretty frustrated with the sample import process, even without the crash. Obviously, you're not alone. I have plans to make it better.

As far as the crash, please contact me at support@madronalabs.com. I'm not getting any crashes with importing here, so maybe it's the specific files you are importing that can cause it. I'd like to get a hold of them and try to reproduce the problem.

Maybe it's overwriting the created .sumu files that is causing the crash? When you describe reimporting on your edit, one difference is that the files already exist. If you try deleting all the .sumu files in the folder you're writing to and importing again, you could test this. I'll do so as well.

It's a commonly asked question so I'm copying this info to a new thread in hopes it will be found more easily.

The Vutu app makes .utu files. These are a plain-text, editable, JSON format containing partials maps.

In Sumu, you can import the .utu files. The import process creates .sumu files, which are compressed to save space. The import also does some more analysis and saves it in the compressed file.

To import .utu files in Sumu, click the [...] in the Partials module. There's just one choice in this Popup menu: import partials. Then select a folder to import.

When you import a partials folder, it imports the whole directory tree underneath the one you pick, including any folders that contain .utu files.

So if I have on my disk
~/VutuFiles/Strings/cello.utu
~/VutuFiles/Strings/viol.utu
~/VutuFiles/Noises/tinkle.utu
~/VutuFiles/Noises/harsh/blender.utu

and then bring up the import dialog (partials/...) and select the folder ~/VutuFiles to import,

the files
~/Music/Madrona Labs/Sumu/Partials/Strings/cello.sumu
~/Music/Madrona Labs/Sumu/Partials/Strings/viol.sumu
~/Music/Madrona Labs/Sumu/Partials/Noises/tinkle.sumu
~/Music/Madrona Labs/Sumu/Partials/Noises/harsh/blender.sumu

will be created, along with the directories on the way.
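The mapping from source file to imported file in the listing above can be sketched with std::filesystem (C++17). The function importedPathFor is a hypothetical helper written for this explanation, not part of Sumu, and it only computes the destination path without reading or writing anything.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper, not Sumu's code: where would one .utu file land?
// sourceRoot is the folder picked in the import dialog and partialsRoot
// is Sumu's Partials directory.
inline fs::path importedPathFor(const fs::path& utuFile,
                                const fs::path& sourceRoot,
                                const fs::path& partialsRoot)
{
  // Keep the part of the path below the chosen folder, e.g. Strings/cello.utu
  fs::path relative = utuFile.lexically_relative(sourceRoot);
  relative.replace_extension(".sumu");
  return partialsRoot / relative;
}
```

For example, with sourceRoot set to ~/VutuFiles and partialsRoot set to ~/Music/Madrona Labs/Sumu/Partials, the file ~/VutuFiles/Noises/harsh/blender.utu maps to ~/Music/Madrona Labs/Sumu/Partials/Noises/harsh/blender.sumu. Note that std::filesystem does not expand the ~ itself, so real code would pass full paths.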

Importing again will (for now) overwrite these files.

The intention is to "sync" your entire partials development folder at once. So if you make a folder somewhere called "VutuPartialsForSumu" or something, with everything you want to import into Sumu, and always select it when you import, you will keep your partials organized based on that folder's structure.

-Randy

Looks tasty!

These are already imported into .sumu format! So, you do the following:

  • go to our Google drive
  • click the "..." to download the whole theory folder
  • put the theory folder in .../Madrona Labs/Sumu/Partials with all the other partials.

The "import partials" converts utu files into .sumu format. You do not need to do that here.

We have had MPE support since version 1.1.0.

With the Sumu 1.1 release out, VST3 versions of Aalto, Kaivo and Virta are the top priority now. We're working hard on them.

Hi, sorry your post got buried—feel free to hit me up via support @ madronalabs any time.

I appreciate the feedback. I know importing partials is super clunky. Part of this is for legal reasons: the Loris software that I use to do the analysis is under a GPL license and would require that my entire synthesizer be open source, if I were to incorporate it. I have plans to address this in the future and make the workflow much better.

You mention some Vutu things I can definitely improve. The '90s are hip again but I can definitely do better on some workflow things.

At some point I plan to add parameter locks which would take care of the voices issue.

People have different ideas about which way is "correct" for scrolling. There's a setting in the main "..." -> GUI -> normal / reverse.

I hope you are enjoying the sounds, please stay tuned and I will work to improve other things about the software.

Wow, this is a weird one. Please keep us posted on whether the 3D driver fix is helpful!

Happy October! I've just uploaded an update to Sumu, version 1.1.3.

This update brings fixes to two main issues with version 1.1.0. On Mac OS, a crash in AU validation was affecting users with Intel-based computers. This has been fixed. On Windows, an expired code signing certificate was causing difficulties in downloading. This is also fixed.

Along the way I discovered a few additional problems. Here's a full list of the changes since 1.1.0:

  • fix AU validation crash on Intel processors
  • add Azure code signing for Windows DLL and installer
  • fix JSON-related bugs in partials import
  • fix rare crash in space display rendering
  • fix running on MacOS 10.13 and higher
  • optimize JSON parsing for very long files
  • fix FM mod index bug

Because of the bug fixes, I recommend this free update to every Sumu user. Installers are over here!

Hi @lumena, sorry it's giving you trouble, can you identify a particular .utu file you are importing that is causing the issue? Does it ever succeed? Please follow up with me by email at support @ madronalabs.com if you would like to send a file.

Maybe not too distant!

Now that Sumu is out with MPE, I'm working to make VST3 updates of the other synths with a fix to the keyboard problem and a couple of other long-standing bugs, ASAP.

Glad to hear!

The Sumu 1.1 update is out, bringing MPE support as well as a bunch of bug fixes and enhancements. These include:

  • gui: fix scrolling after setting default in popup dials
  • gui: fix missing update bug when pasting registration
  • osc: fix zipper noise in AM mod index
  • AU wrapper: fix Obj-C namespace collisions
  • fix UI scale bug with multiple displays of different scales
  • allow slower LFO times
  • increase LFO popup dials precision
  • adjust text position for negative numbers in Dials
  • increase pulses ratio dial precision
  • fix slow LFO sync
  • improve host clock sync
  • reset pulses hi scale offsets when adjusting past zero
  • fixed issue loading state saved by beta version
  • added MIDI learn for dials via popup menu
  • added main […] menu with gui and input settings
  • gui : added numbers on / off menu
  • gui : added scroll normal / reverse menu
  • input: added protocol [MIDI / MPE] menu
  • input: added MPE bend range [ 0 / 12 / 24 / 48 / 96 ] menu
  • input: reduced latency of gate and pitch in patcher
  • input: added MPE mode
  • input: don't reset voice time in unison mode legato
  • partials: add interpolation to time dial

NOTE: we are still working on meeting the latest requirements for Windows code signing. Windows will claim that the Sumu installer may be dangerous and you will need to click "Keep anyway" a few times. We'll update the installer as soon as we can address this.

Hi Peter, the way to do it is to use Sumu to import the .utu files. There's info in the manual and I've just made a new Quickstart from some info on the forums:

https://madronalabs.com/topics/10344-importing-partials-from-vutu-to-sumu-quickstart?locale=us&page=1

I hope this helps, if you have any questions please email me at support.

best,
Randy

Wow, that's interesting. I guess the other profile is for Direct3D?

Of course it's still not ideal if there's a setting you have to change to use Sumu but I'm glad you have a workaround.

received thanks!

Thanks. I have a Windows 11 machine here. Do you have a utility you are using to track GPU memory usage? I'm happy to research this but using the same one will help me reproduce what you are seeing.

Hi and thanks for the report. This is not familiar to me. I can check into it. What operating system are you using please?

Hi, sorry for the delay, I had one last long weekend adventure this summer.

It could be that your import is actually successful. The message "partials file not found" should show up in the partials module only when the patch itself is referring to a partials file that it can't find. Are you trying to load a patch you've downloaded and the partials to go with it?

After loading the folder "Vutu" that you made, what you should expect to see is the one partials map (utu) in the top level of the menu in the partials module. If you look in the Madrona Labs/Sumu/Partials directory, you should see that one file in there converted to a .sumu file. Once you choose it from the partials chooser the "partials file not found" message should go away.

@cupcake thanks for the info! And sorry for the confusion. I'll look at the installer and try to remedy this.