[Csnd] Multi-threading multiple Csound instances with the API

Hi

As far as I understand and have tested/heard, using globals and channels can undermine any multi-threading when using -j (as opposed to having self-isolated instruments).

Ideally I'd like to use multi-threading, but also globals/channels, so I thought an interesting way to get around this could be to use the API to run multiple instances of Csound and try to handle threading a bit more manually. The more I looked into this, the more it seemed like it might create other problems though.

The idea I'm testing is to run one Csound instance on the main thread with -odac, plus two other instances on their own threads with -n. The -n instances only send audio to and receive audio from the main instance, with the API brokering the audio between them through channels.
Each worker thread roughly does the following:

do {
     /* block until the main thread signals the next k-cycle */
     csoundWaitThreadLockNoTimeout(userdata->lock);
} while (csoundPerformKsmps(userdata->csound) == 0);

... and in the main thread:
while (csoundPerformKsmps(main) == 0) {
     /* wake both worker threads for their next k-cycle */
     csoundNotifyThreadLock(userdata1->lock);
     csoundNotifyThreadLock(userdata2->lock);
     /*
         calls to csoundGetAudioChannel and csoundSetAudioChannel
         using previously allocated buffers
     */
}

This does basically work, but at sr=48000 on Windows it only runs without audio dropouts at kr=12; on Linux on the same machine it manages a kr about ten times higher.
I'm new to multi-threading with audio, but it seems that the thread lock might not be able to wake quickly enough to keep up with a higher kr (which I would like to try and achieve).
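
To put rough numbers on that, here is a small C sketch of the time budget per k-cycle. The function names are made up for illustration, and kr=120 is only a reading of "about ten times" 12, not a measured figure. In this design every worker has to be woken, run its csoundPerformKsmps and have its channels exchanged within one such period.

```c
#include <assert.h>
#include <stdio.h>

/* Samples per control cycle (Csound's ksmps) for a given sr and kr. */
static unsigned ksmps_for(unsigned sr, unsigned kr) {
    assert(kr > 0 && sr % kr == 0);  /* sketch assumes kr divides sr exactly */
    return sr / kr;
}

/* Duration of one control cycle in milliseconds. */
static double kcycle_ms(unsigned kr) {
    return 1000.0 / (double)kr;
}

/* Print the per-cycle budget for a few control rates at sr=48000.
   kr=120 and kr=4800 are illustrative values, not measurements. */
static void print_budgets(void) {
    const unsigned sr = 48000;
    const unsigned rates[] = {12, 120, 4800};
    for (unsigned i = 0; i < sizeof rates / sizeof rates[0]; i++) {
        printf("kr=%u ksmps=%u period=%.3f ms\n",
               rates[i], ksmps_for(sr, rates[i]), kcycle_ms(rates[i]));
    }
}
```

At kr=12 each cycle is 4000 samples, or about 83.3 ms, which leaves plenty of slack for a slow wake-up. At kr=120 the budget is about 8.3 ms, and at ksmps=10 it is only about 0.2 ms, so any wake-up latency in the lock becomes a large share of the cycle.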
The more I've read about audio multithreading, the more I think this may be a dead end, so I'd be interested if anyone has any views and opinions.

I also considered (but haven't tried) using csoundPerformBuffer. The threads might then stay synchronised, but because getting and setting audio channels works in ksmps-sized blocks, I couldn't think of a way to use it to exchange channels between the threads. Perhaps audio could go through spin/spout, but k-rate channels would face the same problem.

thanks
RK

Csound mailing list
Csound@listserv.heanet.ie
https://listserv.heanet.ie/cgi-bin/wa?A0=CSOUND
Send bug reports to
        Issues · csound/csound · GitHub
Discussions of bugs and features can be posted here

Your design is similar to the original design for multi-threading within Csound, which I did some work on.

It was never efficient enough.

The current design is more efficient, but it is still not really efficient enough.

You could modify your design by using lock-free FIFOs. A master thread would receive events, channels, and audio and enqueue them in lock-free FIFOs. A number of worker threads would dequeue events, channels, and audio from these FIFOs, process this data, and enqueue the results in other lock-free FIFOs that the master thread would dequeue for output.

Possibly, the Csound channels could also be implemented as lock-free queues.
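
To make the FIFO idea concrete, here is a minimal sketch of a single-producer/single-consumer lock-free ring buffer in C11. It is not taken from Csound or from csound_threaded.hpp. The names spsc_ring, ring_init, ring_push, ring_pop and RING_CAPACITY are hypothetical, and it uses double where Csound would use MYFLT. It only works with exactly one writer thread and one reader thread per ring, so a master with two workers would need one ring per direction per worker.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAPACITY 1024u  /* hypothetical size; must be a power of two */

typedef struct {
    double data[RING_CAPACITY];
    atomic_size_t head;  /* total samples written; stored only by the producer */
    atomic_size_t tail;  /* total samples read; stored only by the consumer */
} spsc_ring;

static void ring_init(spsc_ring *r) {
    assert((RING_CAPACITY & (RING_CAPACITY - 1u)) == 0u);
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side: copy n samples in, all or nothing. Never blocks. */
static bool ring_push(spsc_ring *r, const double *src, size_t n) {
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (RING_CAPACITY - (head - tail) < n)
        return false;  /* not enough free space */
    for (size_t i = 0; i < n; i++)
        r->data[(head + i) & (RING_CAPACITY - 1u)] = src[i];
    /* release: publish the samples before the new head becomes visible */
    atomic_store_explicit(&r->head, head + n, memory_order_release);
    return true;
}

/* Consumer side: copy n samples out, all or nothing. Never blocks. */
static bool ring_pop(spsc_ring *r, double *dst, size_t n) {
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head - tail < n)
        return false;  /* not enough data yet */
    for (size_t i = 0; i < n; i++)
        dst[i] = r->data[(tail + i) & (RING_CAPACITY - 1u)];
    /* release: finish reading before the slots are handed back */
    atomic_store_explicit(&r->tail, tail + n, memory_order_release);
    return true;
}
```

A worker would call ring_pop for one ksmps block of input, run its k-cycle, then ring_push the result to its output ring. What to do when a pop or push returns false (spin, yield or sleep) is left open here, and that choice is probably where most of the latency trade-off lies.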

The csound_threaded.hpp class in the Csound source code enables a single instance of Csound to run in a separate thread. It uses a FIFO to receive events from the host. It might be a starting point for such a design.

I am dubious about the ultimate efficiency of this approach. Csound is a very challenging case for multi-threading. There are all kinds of overhead that end up causing swapping in and out of cache. The FIFOs themselves would almost certainly be efficient enough; the problem would be swapping Csound code and data in and out of cache. But in my experience, you never know until you actually code and test it.

I would advise consulting with John ffitch and working with the existing design before trying to come up with a new one.

Regards,
Mike

Thank you, some really useful insights there.

Threaded audio is certainly more challenging than I anticipated. I thought there might be a ‘quick win’ for some simple, specific cases with the design I was trying, but maybe not. It is worth saying, though, that the basic test I’ve done with three threads exchanging channels through the API does seem to perform better across cores than the equivalent in Csound with -j3 (albeit also using channels) at the same (low) kr, although it is an extremely specific case.

I’ll have a look at the lock-free options, csound_threaded.hpp and revisit John ffitch’s papers on multicore/parallel processing, which will likely make more sense to me now having dipped my toe in a bit.