Short-latency sound control

Hi,

I am new to Csound and to PureData. We have used PureData to implement a gesture-controlled sound generator - our emphasis is on minimizing latency between gesture (sent via OSC to Pd) and sound. I like Csound's programmatic approach much more, but I have read here and there that PureData has more of a real-time emphasis and so may be "quicker" to respond. Is this correct? I would use Csound if I could, so I thought I would get some feedback here before/in addition to testing it myself.

Thanks!

There should be absolutely no difference in response time between Pd and Csound. If you prefer the programmatic approach of Csound, then I think you have your answer :wink:
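For what it's worth, the response time in either system is set mainly by the block size and the audio buffer sizes rather than by the language. Here is a minimal sketch of how you might tighten those in Csound when running it from Python via ctcsound - the buffer values are only assumptions, and how small you can go without dropouts depends on your hardware and driver:

import ctcsound

cs = ctcsound.Csound()
cs.setOption('-odac')   # real-time audio output
cs.setOption('-b128')   # software buffer in sample frames: smaller = lower latency, more CPU
cs.setOption('-B512')   # hardware buffer, usually a few times larger than -b
cs.compileOrc('''
sr     = 48000  ; assuming a 48 kHz interface
ksmps  = 32     ; small control block so score/channel events take effect sooner
nchnls = 2
0dbfs  = 1
''')
cs.start()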


@suresh I don't know which gestures you are detecting, but since Csound has very nice Python bindings (ctcsound.py) you could run and control Csound instruments directly from a Python script, and then use e.g. the MediaPipe library (GitHub - google-ai-edge/mediapipe: Cross-platform, customizable ML solutions for live and streaming media.) for tracking hands/fingers. Below is a script that tracks the index finger of the left hand and uses it to control amplitude (y axis) and frequency (x axis):

CAMERA_ID = 0  # if multiple cameras are connected, select which one to use here

# MediaPipe parameters
MAX_NUM_HANDS = 2         
MODEL_COMPLEXITY = 0      
MIN_DETECTION_CONFIDENCE = 0.5 
MIN_TRACKING_CONFIDENCE = 0.5 

# ------------------------------------------------------------
# Imports
import cv2
import mediapipe as mp
import ctcsound

# ------------------------------------------------------------
# MediaPipe setup
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_hands = mp.solutions.hands
# ------------------------------------------------------------

# ------------------------------------------------------------
# Csound setup
# ------------------------------------------------------------
orc = '''
ksmps = 128
nchnls = 2
0dbfs = 1

instr 1
    ; IO channels
    kamp chnget "amp"      ; reading value from amp input channel 
    kfreq chnget "freq"    ; reading value from freq input channel 

    aout poscil kamp, kfreq                     ; audio synthesis
    outs aout, aout                           ; sending audio to speakers
endin
'''

cs = ctcsound.Csound()
pt = None # csound performance thread

cs.setOption('-odac')
cs.setOption('-b1024')  # SW buffer size
cs.setOption('-B1024')  # HW buffer size
cs.compileOrc(orc)
cs.readScore('i 1 0 10000')  # keep instrument 1 running (10000 seconds)
cs.start()

pt = ctcsound.CsoundPerformanceThread(cs.csound())
pt.play()

# Create control IO channels 
def createChannel(channelName):
    chn, _ = cs.channelPtr(channelName,
                           ctcsound.CSOUND_CONTROL_CHANNEL | ctcsound.CSOUND_INPUT_CHANNEL)
    return chn

ampChannel = createChannel("amp")   # uses utility method to create a channel and get numpy array to write to
freqChannel = createChannel("freq")
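# Note: cs.setControlChannel("amp", value) would also work, but channelPtr()
# returns a numpy view of the channel memory, so plain assignment
# (ampChannel[0] = value) skips a channel-name lookup on every video frame.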
# ------------------------------------------------------------


# ------------------------------------------------------------
# Processing part
cap = cv2.VideoCapture(CAMERA_ID)
with mp_hands.Hands(
    static_image_mode = False,
    max_num_hands = MAX_NUM_HANDS,
    model_complexity = MODEL_COMPLEXITY,
    min_detection_confidence = MIN_DETECTION_CONFIDENCE,
    min_tracking_confidence = MIN_TRACKING_CONFIDENCE) as hands:

    # Main loop
    while cap.isOpened():
        success, image = cap.read()
        if not success:
            print("Ignoring empty camera frame.")
            continue 
           
        # To improve performance, optionally mark the image as not writeable to pass by reference.
        image.flags.writeable = False
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        results = hands.process(image)

        image_height, image_width, _ = image.shape

        # Iterate through hands, draw hand landmarks on image and write to csound control channels
        image.flags.writeable = True
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        ampChannel[0] = 0.0 # if hand is not detected, set amp to 0 to turn off sound
        if results.multi_hand_landmarks:
            for hand_landmarks in results.multi_hand_landmarks:
                mp_drawing.draw_landmarks(
                    image,
                    hand_landmarks,
                    mp_hands.HAND_CONNECTIONS,
                    mp_drawing_styles.get_default_hand_landmarks_style(),
                    mp_drawing_styles.get_default_hand_connections_style())
                
                handIndex = results.multi_hand_landmarks.index(hand_landmarks)
                handLabel = results.multi_handedness[handIndex].classification[0].label

                # image is mirrored so we need to invert left and right hand detection 
                if handLabel == 'Left':  handLabel = 'Right'
                elif handLabel == 'Right':  handLabel = 'Left'

                # Write to control channels
                if handLabel == 'Left':
                    freqChannel[0] = 400 - abs(hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].x)*300 
                    ampChannel[0] = 1-abs(hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].y) 

        # Flip the image horizontally for a selfie-view display.
        cv2.imshow('MediaPipe Hands', cv2.flip(image, 1))
        if cv2.waitKey(5) & 0xFF == 27:
            break

cap.release()
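One thing I would add at the very end: the Csound performance thread keeps playing after the camera loop exits, so a short shutdown sketch (standard ctcsound / OpenCV calls) could look like this:

cv2.destroyAllWindows()   # close the preview window
pt.stop()                 # ask the performance thread to finish
pt.join()                 # wait for it to exit
cs.cleanup()              # release Csound resources

And since you already send gestures over OSC: Csound can also receive OSC directly (see the OSCinit / OSClisten opcodes), so you could keep your existing gesture pipeline and skip the camera part entirely if you prefer.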