mirror of
https://github.com/airwindows/airwindows.git
synced 2026-05-15 06:05:55 -06:00
[GH-ISSUE #28] Airwindows LinuxVSTs seem to not be optimized for Ryzen Threadripper CPU's #23
Reference: github-starred/airwindows#23
Originally created by @AVLinux on GitHub (May 14, 2021).
Original GitHub issue: https://github.com/airwindows/airwindows/issues/28
I started a song project on a laptop with an early Intel i3 processor, and my session with all Airwindows plugins was running at about 30% DSP usage in Ardour. I moved the session to another machine with a 24-core AMD Ryzen Threadripper CPU and the session was at 80% DSP usage with Xruns everywhere!! So I went on Ardour's IRC, where they pointed out a few CPU settings to look at, but nothing changed. I opened Ardour's Plugin DSP Load window and many of the Airwindows plugins were running very high in the metering. To rule everything out, I opened another Ardour session with a similar number of tracks that had no Airwindows plugins, only plugins from other common developers, and the DSP usage in that session was 4% with no Xruns.
So it's not the machine, the machine settings or the Ardour settings; it's specific to the Airwindows plugins, and the latest zip from the Airwindows website must be missing some compiler optimization flags for newer AMD CPUs. Yes, I know I can compile them myself, but that requires the VST2 SDK, which I don't really want to be bothered with, plus it only fixes the problem for me. The zips from the Airwindows website should be compiled for all potential x86 processors, and there are a lot of Threadripper machines out there.
@AVLinux commented on GitHub (May 14, 2021):
A picture of DSP Load from the Airwindows session:

DSP Load from a session with the same number of tracks but no Airwindows plugins:

@airwindows commented on GitHub (May 14, 2021):
Well, one thing about it is, if I can identify what needs to be changed about the Linux compilations, I can do them all over again and update the entire library of binaries :)
These are being done with the scripts/environment I offer, which is courtesy of a Linux user who provided it, and I've been through one situation where I moved the environment to a newer Linux environment that started compiling everything much faster and identifying which things had been previously built. Seemingly good, except that the resulting VSTs didn't work for some users, so I rolled it back. This is all being made on a virtual machine I barely understand, and THAT means I can investigate ways of optimizing and roll it back if it fails. This is also why the creation dates are weird on the files :)
So, thoughts? I haven't got the foggiest idea what I would do to change this. It sounds like Linux experts need to sort out what to tell me to do, at which point I will recompile all the binaries. I wish to avoid another situation where something changes and then it's completely broken for some or all Linux users, in the sense of 'does not work even at low efficiency'. I'm pretty sure I've provided all the information I know how to provide, and I can run the virtual Linux machine to run any further tests or self-checks you'd like.
@AVLinux commented on GitHub (May 14, 2021):
Hi Chris, wow a reply from the big man himself!
I wish my first correspondence were a huge THANK YOU for these plugins, especially for having them on Linux too, instead of a bug report. I was introduced to them through a recent music challenge on the Linux Musicians forum to create a song exclusively with Airwindows plugins, and they sound phenomenal and are a lot of fun. I would like to continue using them, which is why I was a bit dismayed that they weren't optimized on my AMD system.
OK, as far as a solution goes, I am not an expert on compiler flags at all, but looking at the CMakeLists.txt in the source code I see we have this:
So I would guess we need to add something to those lines, but what? A quick Google search turned up this Reddit thread, which had some suggestions: https://www.reddit.com/r/linuxadmin/comments/bwnijq/compilation_optimization_options_for_amd/
One of the suggestions is '-O3' or '-Os', but I don't know whether this would fix AMD performance and harm others? Unfortunately, as I said, I am not very savvy about these things.
I could try compiling on my own machine, and I have the Steinberg SDK, but I don't understand where the 'include' folder needs to be placed in the source folder to compile properly. It would be nice if the Linux user who came up with the build scripts would drop by and shed some light. I am willing to compile and try some things, but I need a bit of guidance.
@x42 commented on GitHub (May 16, 2021):
What compiler was used to create the binaries? By chance icc (the Intel compiler)? If so, that could explain poor performance on AMD machines. Is there a verbose compiler log available? Maybe CMake adds some -march flags for some reason. Otherwise it is a bit of a mystery: with just -O2 and no arch-specific flags there is no obvious explanation why performance on AMD machines would be worse compared to an i3. A data point from someone using the plugins on Windows running on an AMD Threadripper might be informative.
@airwindows commented on GitHub (May 17, 2021):
Well, is that native on Ubuntu, an older version of Ubuntu? The time period I began making Linux VST is the time period the virtual machine was set up, and I believe it's Ubuntu Studio I used. It's running on Parallels, and I did nothing architecture-related: still using the same makefile etc. up on my Github. I'm pretty sure the virtual machine thinks it's on Intel because the physical machine is older Intel. I did nothing special to optimize it for anything.
@x42 commented on GitHub (May 17, 2021):
That should be fine. By default gcc does not optimize for a specific architecture; it optimizes for generic x86_64 machines (Intel and AMD alike).
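x42's point depends on which compiler and target flags were actually used. One quick way to check a build is to compile a tiny probe with the same flags and report what the compiler predefined. The helper below is a hypothetical sketch and is not part of the Airwindows sources. It relies only on the standard GCC/Clang/ICC predefined macros (__INTEL_COMPILER, __clang__, __GNUC__, __VERSION__, __AVX2__, __AVX__, __SSE2__).

```cpp
#include <cassert>
#include <string>

// Hypothetical probe (not Airwindows code): reports which compiler built this
// translation unit and the widest x86 SIMD extension the target flags enabled.
// __AVX2__ / __AVX__ are only predefined when flags such as -mavx2 or
// -march=native turn them on; plain x86_64 builds get the SSE2 baseline.
std::string build_report() {
    std::string r;
#if defined(__INTEL_COMPILER)      // check icc first: it also defines __GNUC__
    r += "compiler: icc";
#elif defined(__clang__)           // clang also defines __GNUC__, so check it next
    r += "compiler: clang " __clang_version__;
#elif defined(__GNUC__)
    r += "compiler: gcc " __VERSION__;
#else
    r += "compiler: unknown";
#endif
#if defined(__x86_64__)
    r += " | x86_64";
#endif
#if defined(__AVX2__)
    r += " | AVX2 enabled";
#elif defined(__AVX__)
    r += " | AVX enabled";
#elif defined(__SSE2__)
    r += " | SSE2 baseline only";
#endif
    return r;
}
```

Built with plain -O2, a gcc x86_64 build should report only the SSE2 baseline. Adding -march=native on a recent CPU would typically switch the report to AVX2. For an already-built .so, running readelf -p .comment on it usually shows the GCC version string that was recorded at compile time.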
We just discussed this on #ardour IRC: two users with an AMD Threadripper could reproduce it, and one user with an AMD Ryzen 7 2700X Eight-Core could not.
Maybe your use of gcc 7.2.0 can explain this. Searching the web for "Threadripper gcc 7.2.0" finds:
Then again, this implies tuning for a specific -march=..., not generic x86 machines. Unclear if this is relevant.
@AVLinux commented on GitHub (May 17, 2021):
OK after a long afternoon of trial and error I have a few findings:
On Ardour's IRC channel, with the primary Ardour devs and some very helpful folks, we were able to test the current Airwindows plugins on another AMD Ryzen Threadripper machine with the exact same song session, and the result was the same high DSP load. This confirmed it was not isolated to my particular computer. Users of Intel computers also tested the same session, and their DSP load was in keeping with what I found on the original i3 Intel machine the session was started on. As an interesting aside, one person with a non-Threadripper AMD Ryzen machine had a significantly lower DSP load than the Threadrippers, but still markedly higher than lower-spec Intel machines.
So, the next step was for me to compile the plugins on my own machine for my specific CPU and see if the DSP load would drop. It took some fiddling to get the plugins to compile, because the info in the Airwindows LinuxVST Readme needs some updating and there are crucial CMake build files missing, as has previously been reported here:
https://github.com/airwindows/airwindows/pull/5
Once that was sorted I compiled the Plugins with GCC 8.3.0 and the following suggested CMake command to make binaries tailored specifically to my AMD Ryzen Threadripper 2970WX CPU:
cmake -Bbuild -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS='-O3 -march=native -ffast-math' && cmake --build build --parallel
It is important to note that building this way only produces plugin binaries that run on my specific system, so they would be of no use for wide distribution. The binaries built successfully, but unfortunately, running our test session even with such specific tailoring to the Threadripper CPU only reduced DSP load by about 15 percentage points...
To summarize: on our test session, the plugin binaries from the Airwindows site ran at 30% DSP load on an i3 Intel here, the exact same session on an AMD Ryzen Threadripper was at 80% DSP load, and the self-compiled, optimized binaries were at 65% DSP load. It would appear that there is nothing Chris can do in the compilation of the plugins that would optimize them for AMD Threadripper CPUs and still allow them to remain distributable for Intel CPUs.
@x42 commented on GitHub (May 17, 2021):
A long shot, two users (@AVLinux and @pauldavisthefirst) who reproduced the issue on a Threadripper both run debian/stable.
Maybe the issue at hand is the rather excessive use of pow() in many Airwindows plugins combined with Debian's libm.so.6 on the Threadripper. In DSP code that I write I generally substitute that with a call to exp() or expf(), but that's not a panacea and may or may not make a difference for the issue at hand.
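x42's pow() to exp() substitution can be illustrated with the common dB-to-gain conversion. This is a hedged sketch with hypothetical function names and is not code from any Airwindows plugin. It uses the identity 10^(x/20) = e^(x · ln 10 / 20). It also shows hoisting the conversion out of the per-sample loop when the parameter is constant for a whole buffer. Whether either change helps with Debian's libm on a Threadripper is untested here.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical: dB to linear gain, written the "obvious" way with pow().
inline double db_to_gain_pow(double db) {
    return std::pow(10.0, db / 20.0);
}

// Same mapping via exp(): 10^(db/20) == e^(db * ln(10) / 20).
// ln(10)/20 is a compile-time constant, so each call is one multiply plus exp().
inline double db_to_gain_exp(double db) {
    constexpr double kLn10Over20 = 2.302585092994045684 / 20.0;
    return std::exp(db * kLn10Over20);
}

// Hypothetical block processing: when the gain parameter does not change
// within the buffer, compute it once instead of calling pow() per sample.
void apply_gain(float* buf, int n, double db) {
    const float g = static_cast<float>(db_to_gain_exp(db));
    for (int i = 0; i < n; ++i) {
        buf[i] *= g;
    }
}
```

Whether exp() is actually cheaper than pow() depends on the libm in use, so this is something to measure (for example with perf) rather than assume.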
It would be useful to profile the plugins, e.g. build with -DCMAKE_BUILD_TYPE=DebugOptimized and then profile with perf or something similar. Sadly I cannot assist with this, since compiling the plugin(s) depends on the Steinberg VST SDK and I cannot agree to their terms, nor do I have an AMD system.
Other things to investigate might be potential syscalls. Deckwrecka has significant DSP load even on Intel systems. That might be due to the many rand() calls, combined with two calls to pow(), fabs() and some sin() per sample. I'm rather surprised that it performs as well as it does on Intel systems, and it could well be the reason why it's poor on AMD/Threadripper if libm/libc isn't as optimized there yet.
@mikelupe commented on GitHub (May 20, 2021):
Hello Chris, first of all I too would like to thank you for these wonderful plugins. I'm spreading the "news" about them quite strongly in my circles.
I was one of those who also tested AVLinux's reference track on Linux in Ardour, on a 4th-generation i7 notebook (4c/8t), and had a fairly low DSP load of around 38%.
Even though I'm not on AMD, I think it would be greatly beneficial if a way could be found to optimize them for that architecture.
Looking forward to new adventures using your plugins :)
@bsg75 commented on GitHub (May 24, 2021):
Btw, the AirWindows Challenge is still running until May 29th (: https://linuxmusicians.com/viewtopic.php?f=40&t=23169.
@robbert-vdh commented on GitHub (May 31, 2021):
I've finally had the chance to upgrade to Zen 3, so I decided to check this out as well. There are a lot of screenshots here, so I've collapsed this to avoid padding this ticket too much:
With a single audio track in Ardour-6.7.54-dbg containing a single instance of ADClip7 (downloaded from the website a year or two ago), this is the DSP load:
That's exactly as you'd expect, nice and low. Now, this is what happens when duplicating the track:
The DSP load immediately shoots up. Duplicating the track six more times for a total of eight tracks gets you the same issues described above:
The same thing also happened on a regular release build of Ardour 6.6. I don't have access to the VST2 SDK, so I sadly won't be able to build a copy of the plugin with debug symbols so I can profile it properly. Without debug symbols in ADClip7, 61% of the time is spent...somewhere:
Turning the 'Signal processing uses' option down from the default 'all but one processor' to '1 processor' brings processing time for the individual plugin instances down to the same level as processing a single plugin instance (after restarting Ardour), but the DSP load remains high:
I double checked this in Bitwig Studio 4.0 beta 1, and the same thing happens there: 8 instances of ADClip7 on individual tracks that can be processed in parallel result in the exact same DSP load as 8 instances in series. Does ADClip7 use globals/statics internally? I don't have time to investigate this further, but I wouldn't be surprised if there was some form of false sharing or other cache thrashing going on here.
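To make the globals/statics question concrete, here is a hypothetical sketch contrasting two ways of holding a noise seed. This is not ADClip7's actual code. In the first pattern the seed is a static, which every instance in the process shares. In the second it is stored in the plugin object. With the static version, instances processed on different threads all write the same variable. That forces its cache line to bounce between cores, and it is also an unsynchronized data race in C++. The member version gives each instance its own state. The xorshift32 step (shifts 13, 17, 5) is simply a small generator chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Pattern A (hypothetical): one seed shared by every instance in the process.
// Concurrent callers on different threads race on it and contend for its cache line.
struct SharedSeedNoise {
    static std::uint32_t next() {
        static std::uint32_t seed = 2463534242u;  // single process-wide copy
        seed ^= seed << 13;
        seed ^= seed >> 17;
        seed ^= seed << 5;
        return seed;
    }
};

// Pattern B (hypothetical): the seed is a member, so each plugin instance owns
// its state and threads processing different instances never share it.
class PerInstanceNoise {
public:
    // xorshift32 gets stuck at 0, so a zero seed is replaced with 1.
    explicit PerInstanceNoise(std::uint32_t seed) : seed_(seed ? seed : 1u) {}

    std::uint32_t next() {
        seed_ ^= seed_ << 13;  // Marsaglia xorshift32 (13, 17, 5)
        seed_ ^= seed_ >> 17;
        seed_ ^= seed_ << 5;
        return seed_;
    }

private:
    std::uint32_t seed_;
};
```

With pattern B, two instances seeded identically produce identical, independent sequences. Advancing one never perturbs the other, which is what lets parallel tracks actually run in parallel.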
EDIT: I see that ADClip7 uses statics as seeds for noise. I wonder if using thread_local instead of static (or just storing the seeds in the plugin's object) fixes the issue. I sadly can't build these plugins myself, so I can't test it out.
@robbert-vdh commented on GitHub (May 31, 2021):
To add to the above, the use of rand() in some other plugins suffers from the same issue, because it's not realtime safe. The GLIBC implementation uses locks and a static struct for the state, as can be seen here: 595c22ecd8/stdlib/random.c (L286-L298)
@x42 commented on GitHub (May 31, 2021):
Have a look at https://www.pcg-random.org/ there are liberally licensed C/C++ implementations that can use a per plugin instance seed (lock-free, no static global state).
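x42's suggestion can be sketched as a minimal PCG32 generator stored per plugin instance in place of rand(). This assumes the PCG-XSH-RR 64/32 variant. The multiplier, seeding sequence and output permutation follow the minimal C implementation published on pcg-random.org. The Pcg32 class name and the next_float() helper are hypothetical additions for illustration. Verify the code against the reference implementation before relying on it.

```cpp
#include <cassert>
#include <cstdint>

// Minimal PCG32 (PCG-XSH-RR 64/32) sketch. All state lives in the object:
// no locks, no global state, so each plugin instance can own one generator.
class Pcg32 {
public:
    // initstate: starting seed; initseq: selects one of 2^63 streams.
    Pcg32(std::uint64_t initstate, std::uint64_t initseq) {
        state_ = 0u;
        inc_ = (initseq << 1u) | 1u;  // the increment must be odd
        next();
        state_ += initstate;
        next();
    }

    std::uint32_t next() {
        std::uint64_t old = state_;
        state_ = old * 6364136223846793005ULL + inc_;  // 64-bit LCG step
        // Output permutation: xorshift high bits, then a random rotation.
        std::uint32_t xorshifted = static_cast<std::uint32_t>(((old >> 18u) ^ old) >> 27u);
        std::uint32_t rot = static_cast<std::uint32_t>(old >> 59u);
        return (xorshifted >> rot) | (xorshifted << ((-rot) & 31u));
    }

    // Hypothetical helper: uniform float in [0, 1) from the top 24 bits,
    // e.g. for noise or dither in a per-sample loop.
    float next_float() {
        return static_cast<float>(next() >> 8) * (1.0f / 16777216.0f);
    }

private:
    std::uint64_t state_;
    std::uint64_t inc_;
};
```

Each plugin instance would own a Pcg32 member, seeded with a per-instance value (and optionally a distinct stream via initseq), and would call next_float() in its per-sample loop. No call ever touches shared state or takes a lock.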
@x42 commented on GitHub (May 31, 2021):
Confirmed @robbert-vdh's findings on a quad-core Intel CPU:
Single instance of Deckwrecka:

After copying the track 3 times, all four instances run concurrently but block each other, and the load increases for each plugin: