Short-latency sound control

Hi,

I am new to Csound and to PureData. We have used PureData to implement a gesture-controlled sound generator; our emphasis is on minimizing latency between a gesture (sent via OSC to Pd) and the resulting sound. I much prefer Csound’s programmatic approach, but I have read here and there that PureData has more of a real-time emphasis and so may be “quicker” to respond. Is this correct? I would use Csound if I could, and I thought I would get some feedback here before/in addition to testing it myself.

Thanks!

There should be absolutely no difference in response time between Pd and Csound. If you prefer the programmatic approach of Csound, then I think you have your answer :wink:
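A practical note on that: in both systems the response time you actually hear is dominated by the audio buffer sizes, not by the language itself. In Csound these are set with the -b (software buffer) and -B (hardware buffer) options together with ksmps. Smaller values lower latency at the cost of dropout risk; the values below are only a sketch of a typical low-latency starting point, and the safe limits depend entirely on your audio device and driver:

```
-odac -b128 -B512    ; smaller software/hardware buffers -> lower latency
; in the orchestra header, a small ksmps (e.g. 32 or 64)
; keeps control-rate updates responsive as well
```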


@suresh I don’t know which gestures you are detecting, but since Csound has very nice Python bindings (ctcsound.py), you could run and control Csound instruments directly from a Python script, and then use e.g. the MediaPipe library (GitHub - google-ai-edge/mediapipe: Cross-platform, customizable ML solutions for live and streaming media.) to track e.g. hands/fingers. Below is a script that tracks the index finger of the left hand and uses it to control amplitude (y axis) and frequency (x axis):

CAMERA_ID = 0 # if there are multiple cameras connected, with this you can select which one to use

# MediaPipe parameters
MAX_NUM_HANDS = 2               # track at most two hands
MODEL_COMPLEXITY = 0            # 0 = lightest/fastest model (good for low latency)
MIN_DETECTION_CONFIDENCE = 0.5  # confidence threshold for initial hand detection
MIN_TRACKING_CONFIDENCE = 0.5   # confidence threshold for frame-to-frame tracking

# ------------------------------------------------------------
# Imports
import cv2
import mediapipe as mp
import ctcsound

# ------------------------------------------------------------
# MediaPipe setup
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_hands = mp.solutions.hands
# ------------------------------------------------------------

# ------------------------------------------------------------
# Csound setup
# ------------------------------------------------------------
orc = '''
ksmps = 128
nchnls = 2
0dbfs = 1

instr 1
    ; IO channels
    kamp chnget "amp"      ; reading value from amp input channel 
    kfreq chnget "freq"    ; reading value from freq input channel 

    aout poscil kamp, kfreq                     ; audio synthesis
    outs aout, aout                           ; sending audio to speakers
endin
'''

cs = ctcsound.Csound()
pt = None # csound performance thread

cs.setOption('-odac')
cs.setOption('-b1024')  # SW buffer size
cs.setOption('-B1024')  # HW buffer size
cs.compileOrc(orc)
cs.readScore('i 1 0 10000')  # keep instrument 1 running (10000 s, roughly 3 hours)
cs.start()

pt = ctcsound.CsoundPerformanceThread(cs.csound())
pt.play()

# Create control IO channels 
def createChannel(channelName):
    chn, _ = cs.channelPtr(channelName,
                           ctcsound.CSOUND_CONTROL_CHANNEL |
                           ctcsound.CSOUND_INPUT_CHANNEL)
    return chn

ampChannel = createChannel("amp")   # uses utility method to create a channel and get numpy array to write to
freqChannel = createChannel("freq")
# ------------------------------------------------------------


# ------------------------------------------------------------
# Processing part
cap = cv2.VideoCapture(CAMERA_ID)
with mp_hands.Hands(
    static_image_mode = False,
    max_num_hands = MAX_NUM_HANDS,
    model_complexity = MODEL_COMPLEXITY,
    min_detection_confidence = MIN_DETECTION_CONFIDENCE,
    min_tracking_confidence = MIN_TRACKING_CONFIDENCE) as hands:

    # Main loop
    while cap.isOpened():
        success, image = cap.read()
        if not success:
            print("Ignoring empty camera frame.")
            continue 
           
        # To improve performance, optionally mark the image as not writeable to pass by reference.
        image.flags.writeable = False
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        results = hands.process(image)

        image_height, image_width, _ = image.shape

        # Iterate through hands, draw hand landmarks on image and write to csound control channels
        image.flags.writeable = True
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        ampChannel[0] = 0.0 # if hand is not detected, set amp to 0 to turn off sound
        if results.multi_hand_landmarks:
            for hand_landmarks in results.multi_hand_landmarks:
                mp_drawing.draw_landmarks(
                    image,
                    hand_landmarks,
                    mp_hands.HAND_CONNECTIONS,
                    mp_drawing_styles.get_default_hand_landmarks_style(),
                    mp_drawing_styles.get_default_hand_connections_style())
                
                handIndex = results.multi_hand_landmarks.index(hand_landmarks)
                handLabel = results.multi_handedness[handIndex].classification[0].label

                # image is mirrored so we need to invert left and right hand detection 
                if handLabel == 'Left':  handLabel = 'Right'
                elif handLabel == 'Right':  handLabel = 'Left'

                # Write to control channels
                if handLabel == 'Left':
                    freqChannel[0] = 400 - abs(hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].x)*300 
                    ampChannel[0] = 1-abs(hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].y) 

        # Flip the image horizontally for a selfie-view display.
        cv2.imshow('MediaPipe Hands', cv2.flip(image, 1))
        if cv2.waitKey(5) & 0xFF == 27:
            break

cap.release()
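The x-to-frequency and y-to-amplitude mapping buried in the loop above can be pulled out into a small pure function, which makes the ranges easy to tweak and adds clamping for landmarks that drift slightly outside the frame. The ranges mirror the script (100–400 Hz, amplitude 0–1); the helper name is mine:

```python
def landmark_to_params(x, y, freq_min=100.0, freq_max=400.0):
    """Map a normalized MediaPipe landmark position (x, y in [0, 1])
    to (frequency_hz, amplitude), clamping out-of-range coordinates."""
    x = min(max(x, 0.0), 1.0)
    y = min(max(y, 0.0), 1.0)
    freq = freq_max - x * (freq_max - freq_min)  # left edge -> highest pitch
    amp = 1.0 - y                                # top of frame -> loudest
    return freq, amp
```

Inside the loop this would replace the two channel assignments with something like `freqChannel[0], ampChannel[0] = landmark_to_params(tip.x, tip.y)`, where `tip` stands for the INDEX_FINGER_TIP landmark.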

Oh, this is wonderful, thank you, @Lovre!!

Hi,
(I come from CSOUND@LISTSERV.HEANET.IE)
I would like to try to venture with this specific approach to play Csound with gestures.
I state that I’m not versed in this and I would need some help step by step to achieve what I read here. I am a simple Csound user.
I try to ask some questions:

  1. At the hardware level, will I need a camera besides, of course, my MacBook Pro (and then a sound card and monitors…)? Which type of camera is better to use?
  2. What software is essential to achieve this? In addition, of course, to Csound, I understand that it is necessary to download the library: GitHub - google-ai-edge/mediapipe: Cross-platform, customizable ML solutions for live and streaming media.
    And what else will I need?
    Is there anything else I don’t know…?

Thank you for any help!
E

Hi @Enrico

You can use whatever camera you like. An integrated camera will work for sure, but external USB cameras usually work out of the box too (the OpenCV library is responsible for reading frames from the camera).

  1. You need to have Python on your Mac, so if you don’t, install it. I recommend version 3.11 when working with MediaPipe (Python Release Python 3.11.9 | Python.org)
    • if you have any experience with programming and Python, I recommend working in a Python virtual environment
  2. You need to install the required packages. Run the following commands in the terminal (make sure your intended Python interpreter is the one being used here)
    • pip install cv2
    • pip install mediapipe
    • you also need to set up the ctcsound module, but this should happen automatically if Csound is in your system path; if not, you can try:
      • pip install ctcsound
  3. Run the example from my post above to make sure that everything is working
  4. Change that example to use pose detection instead of hands, and play with it a bit
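Step 3 can be sanity-checked before running the full script. Here is a minimal sketch, stdlib only, that reports whether the interpreter you are running can actually find the three packages the example needs (the function name is mine):

```python
import importlib.util

def check_packages(names=("cv2", "mediapipe", "ctcsound")):
    """Return {package_name: True/False} depending on whether the
    currently running interpreter can locate each package."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name}: {'OK' if found else 'missing - install it with pip'}")
```

If a package shows up as missing here even though pip reported success, the pip command almost certainly belongs to a different interpreter than the one running this script.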

Hi Lovre.

Two questions:

  1. the mediapipe library
    GitHub - google-ai-edge/mediapipe: Cross-platform, customizable ML solutions for live and streaming media.
    Is it to be downloaded from this link?
    https://github.com/google-ai-edge/mediapipe/archive/refs/heads/master.zip

  2. After downloading it, will I also have to install it, or will the terminal command pip install mediapipe handle the installation?

Thanks

No, you don’t need to download and install it manually. Just run pip install mediapipe and that will do the trick.

Ok Lovre,
I installed Python 3.11 on my MacBook Pro (macOS 64-bit universal2 installer)

…but when I go to install the packages through Terminal (pip install cv2, pip install mediapipe and pip install ctcsound) I get this message:
zsh: command not found: pip

This means that your system doesn’t know where Python is installed. Try restarting your computer and then try again.
You can check whether Python is detected by your system by running the python --version command. If you get the answer 3.11.9, then you are fine and the pip command should work. If for some reason it still doesn’t work, then you can try python -m pip install [library name]

If that doesn’t work try also this:

  • python3 --version
    • if this works then try following: pip3 install [library name]
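The checks above can also be done from inside Python itself, which removes any ambiguity about which interpreter the shell is picking up. A small diagnostic sketch (stdlib only; the function name is mine):

```python
import shutil
import sys

def interpreter_report():
    """Collect where the running interpreter lives, its version, and
    which python3/pip3 executables the shell would find on PATH."""
    return {
        "running_interpreter": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "python3_on_path": shutil.which("python3"),
        "pip3_on_path": shutil.which("pip3"),
    }

if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key}: {value}")
```

If pip3_on_path comes back as None while python3_on_path is set, `python3 -m pip install …` is the reliable fallback, since it always uses the pip belonging to that interpreter.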

ok Lovre,

From Terminal, with the last command you gave me, python3 --version, I saw that I have Python 3.11.9;
Then for the next command, pip3 install cv2, I got this Terminal report:

python3.11 -m pip install --upgrade pip
Requirement already satisfied: pip in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (24.0)
Collecting pip
Downloading pip-24.3.1-py3-none-any.whl.metadata (3.7 kB)
Downloading pip-24.3.1-py3-none-any.whl (1.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 1.8 MB/s eta 0:00:00
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 24.0
Uninstalling pip-24.0:
Successfully uninstalled pip-24.0
WARNING: The scripts pip, pip3 and pip3.11 are installed in '/Library/Frameworks/Python.framework/Versions/3.11/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed pip-24.3.1

How do you advise me to proceed?

that’s strange …

try this: python3 -m pip install [library name]

Based on what I copied above, this is installed:
Successfully installed pip-24.3.1

Maybe cv2 should go in brackets? Like this: [cv2]?

No no :grin:. Sorry, I forgot that OpenCV has a slightly different installation.
Try this: python3 -m pip install opencv-python

Ok,
I’ll try again in the next few hours.

Hi Lovre.
Here is the new Terminal report by typing: python3 -m pip install opencv-python

python3 -m pip install opencv-python
Collecting opencv-python
Downloading opencv_python-4.10.0.84-cp37-abi3-macosx_12_0_x86_64.whl.metadata (20 kB)
Collecting numpy>=1.21.2 (from opencv-python)
Downloading numpy-2.1.3-cp311-cp311-macosx_10_9_x86_64.whl.metadata (62 kB)
Downloading opencv_python-4.10.0.84-cp37-abi3-macosx_12_0_x86_64.whl (56.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 2.1 MB/s eta 0:00:00
Downloading numpy-2.1.3-cp311-cp311-macosx_10_9_x86_64.whl (21.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21.2/21.2 MB 2.3 MB/s eta 0:00:00
Installing collected packages: numpy, opencv-python
WARNING: The scripts f2py and numpy-config are installed in '/Library/Frameworks/Python.framework/Versions/3.11/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed numpy-2.1.3 opencv-python-4.10.0.84

Do you think it’s okay now?

Yes. Looks good :+1:

Will I now have to install the other libraries as well?

With which commands in Terminal?

  • python3 -m pip install mediapipe

  • check first whether ctcsound is already set up, and if not:

    • check which Csound version is on your computer
    • python3 -m pip install ctcsound==[your_csound_version]
    • if the Csound and ctcsound versions don’t match, it won’t work
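The matching rule in the last bullet can be sketched as a tiny helper that turns an installed Csound version string into the pip requirement to try. It assumes ctcsound releases on PyPI track Csound's major.minor numbering, as the step above implies; verify the versions actually available with pip before relying on it (the function name is mine):

```python
def ctcsound_requirement(csound_version: str) -> str:
    """Build a pip requirement pinning ctcsound to the installed
    Csound major.minor version, e.g. '6.18' -> 'ctcsound==6.18.*'."""
    parts = csound_version.strip().split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least major.minor, got {csound_version!r}")
    major, minor = int(parts[0]), int(parts[1])
    return f"ctcsound=={major}.{minor}.*"

# e.g.: python3 -m pip install "ctcsound==6.18.*"
```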

Hi Lovre,

Sorry for yesterday’s prolonged silence.

  • How do I check whether ctcsound is already set up? …Still from Terminal?
  • The Csound version installed on my computer is 6.18 (double samples), Nov 24 2022