I am new to Csound and to Pure Data. We have used Pure Data to implement a gesture-controlled sound generator; our emphasis is on minimizing the latency between the gesture (sent via OSC to Pd) and the sound. I much prefer Csound’s programmatic approach, but I have read here and there that Pure Data has more of a real-time emphasis and so may be “quicker” to respond. Is this correct? I would use Csound if I could, and I thought I would get some feedback here before/in addition to testing it myself.
There should be absolutely no difference in response time between Pd and Csound; in both, latency is governed by the audio buffer and control block sizes, not by the engine. If you prefer the programmatic approach of Csound, then I think you have your answer.
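To make that concrete, here is a back-of-the-envelope latency calculation (a sketch, assuming a 44.1 kHz sample rate; the 1024 and 128 figures mirror the buffer and ksmps values used later in this thread):

```python
def buffer_latency_ms(frames: int, sample_rate: int = 44100) -> float:
    """Time, in milliseconds, to fill a buffer of `frames` samples."""
    return 1000.0 * frames / sample_rate

# A 1024-sample software buffer contributes roughly 23 ms;
# a 128-sample control block (ksmps = 128 in Csound) is under 3 ms.
print(round(buffer_latency_ms(1024), 1))  # 23.2
print(round(buffer_latency_ms(128), 1))   # 2.9
```

The same arithmetic applies to Pd's audio settings, which is why the choice of environment matters less than the buffer sizes you pick.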
@suresh I don’t know which gestures you are detecting, but since Csound has very nice Python bindings (ctcsound.py), you could run and control Csound instruments directly from a Python script, and then use, e.g., the MediaPipe library (GitHub - google-ai-edge/mediapipe: Cross-platform, customizable ML solutions for live and streaming media.) for tracking hands/fingers. Below is a script that tracks the index finger of the left hand and uses it to control amplitude (y axis) and frequency (x axis):
CAMERA_ID = 0  # if there are multiple cameras connected, with this you can select which one to use

# MediaPipe parameters
MAX_NUM_HANDS = 2
MODEL_COMPLEXITY = 0
MIN_DETECTION_CONFIDENCE = 0.5
MIN_TRACKING_CONFIDENCE = 0.5

# ------------------------------------------------------------
# Imports
# ------------------------------------------------------------
import cv2
import mediapipe as mp
import ctcsound

# ------------------------------------------------------------
# MediaPipe setup
# ------------------------------------------------------------
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_hands = mp.solutions.hands

# ------------------------------------------------------------
# Csound setup
# ------------------------------------------------------------
orc = '''
ksmps = 128
nchnls = 2
0dbfs = 1

instr 1
  ; IO channels
  kamp chnget "amp"        ; read value from the "amp" input channel
  kfreq chnget "freq"      ; read value from the "freq" input channel
  aout poscil kamp, kfreq  ; audio synthesis
  outs aout, aout          ; send audio to the speakers
endin
'''

cs = ctcsound.Csound()
cs.setOption('-odac')
cs.setOption('-b1024')  # SW buffer size
cs.setOption('-B1024')  # HW buffer size
cs.compileOrc(orc)
cs.readScore('i 1 0 10000')  # run for ~3 hours
cs.start()
pt = ctcsound.CsoundPerformanceThread(cs.csound())  # Csound performance thread
pt.play()

# Create control IO channels
def createChannel(channelName):
    # Utility: create a control channel and get a numpy array to write to
    chn, _ = cs.channelPtr(channelName,
        ctcsound.CSOUND_CONTROL_CHANNEL | ctcsound.CSOUND_INPUT_CHANNEL)
    return chn

ampChannel = createChannel("amp")
freqChannel = createChannel("freq")

# ------------------------------------------------------------
# Processing part
# ------------------------------------------------------------
cap = cv2.VideoCapture(CAMERA_ID)
with mp_hands.Hands(
        static_image_mode=False,
        max_num_hands=MAX_NUM_HANDS,
        model_complexity=MODEL_COMPLEXITY,
        min_detection_confidence=MIN_DETECTION_CONFIDENCE,
        min_tracking_confidence=MIN_TRACKING_CONFIDENCE) as hands:
    # Main loop
    while cap.isOpened():
        success, image = cap.read()
        if not success:
            print("Ignoring empty camera frame.")
            continue

        # To improve performance, optionally mark the image as not writeable to pass by reference.
        image.flags.writeable = False
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        results = hands.process(image)
        image_height, image_width, _ = image.shape

        # Iterate through hands, draw hand landmarks on the image and write to the Csound control channels
        image.flags.writeable = True
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        ampChannel[0] = 0.0  # if no hand is detected, set amp to 0 to turn off the sound
        if results.multi_hand_landmarks:
            for hand_landmarks in results.multi_hand_landmarks:
                mp_drawing.draw_landmarks(
                    image,
                    hand_landmarks,
                    mp_hands.HAND_CONNECTIONS,
                    mp_drawing_styles.get_default_hand_landmarks_style(),
                    mp_drawing_styles.get_default_hand_connections_style())
                handIndex = results.multi_hand_landmarks.index(hand_landmarks)
                handLabel = results.multi_handedness[handIndex].classification[0].label
                # The image is mirrored, so we need to invert the left/right hand labels
                if handLabel == 'Left':
                    handLabel = 'Right'
                elif handLabel == 'Right':
                    handLabel = 'Left'
                # Write to the control channels
                if handLabel == 'Left':
                    tip = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
                    freqChannel[0] = 400 - abs(tip.x) * 300  # x axis -> frequency
                    ampChannel[0] = 1 - abs(tip.y)           # y axis -> amplitude

        # Flip the image horizontally for a selfie-view display.
        cv2.imshow('MediaPipe Hands', cv2.flip(image, 1))
        if cv2.waitKey(5) & 0xFF == 27:  # ESC quits
            break
cap.release()
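For reference, the x-to-frequency and y-to-amplitude mapping inside the loop can be pulled out into a small, testable helper (the function name is hypothetical; the constants are the ones used in the script):

```python
def landmark_to_params(x: float, y: float) -> tuple[float, float]:
    """Map normalized MediaPipe landmark coordinates (0..1) to
    synthesis parameters: x drives frequency (400 Hz down to 100 Hz),
    y drives amplitude (1 at the top of the frame, 0 at the bottom)."""
    freq = 400 - abs(x) * 300
    amp = 1 - abs(y)
    return freq, amp

# Fingertip at the top-left corner of the frame:
print(landmark_to_params(0.0, 0.0))  # (400.0, 1.0) -> highest pitch, loudest
```

Factoring the mapping out like this also makes it easy to swap in other curves (e.g. an exponential pitch mapping) without touching the camera loop.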
Hi,
(I come from CSOUND@LISTSERV.HEANET.IE)
I would like to try this specific approach to playing Csound with gestures.
Let me say up front that I am not well versed in this, and I will need step-by-step help to achieve what I read here. I am a simple Csound user.
Let me ask a few questions:
At the hardware level, will I need a camera besides, of course, my MacBook Pro (and then a sound card and monitors…)? Which type of camera is best to use?
You can use whatever camera you like. The integrated camera will work for sure, but external USB cameras usually work out of the box too (the OpenCV library is responsible for reading frames from the camera).
…but when I go to install the packages through Terminal (pip install cv2, pip install mediapipe and pip install ctcsound) I get this message:
zsh: command not found: pip
This means that your shell can’t find pip on the PATH. Try restarting your machine and then try again.
You can check whether Python is detected by your system by running the python --version command. If you get an answer (e.g. Python 3.11.9), then you are fine and the pip command should work. If for some reason it still doesn’t work, then you can try python -m pip install [library name]
If that doesn’t work, also try this:
python3 --version
If this works, then try the following: pip3 install [library name]
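Once the installs succeed, a quick stdlib-only check (a sketch; run it with the same python3 you used for pip) will tell you whether each package is visible to that interpreter, without actually loading it:

```python
from importlib.util import find_spec

# find_spec returns None when a top-level package cannot be found
for name in ("cv2", "mediapipe", "ctcsound"):
    status = "OK" if find_spec(name) is not None else "MISSING"
    print(f"{name}: {status}")
```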
From Terminal, with the last command you gave me (python3 --version) I saw that I have Python 3.11.9.
Then, for the next command (pip3 install cv2), I get this Terminal report:
python3.11 -m pip install --upgrade pip
Requirement already satisfied: pip in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (24.0)
Collecting pip
Downloading pip-24.3.1-py3-none-any.whl.metadata (3.7 kB)
Downloading pip-24.3.1-py3-none-any.whl (1.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 1.8 MB/s eta 0:00:00
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 24.0
Uninstalling pip-24.0:
Successfully uninstalled pip-24.0
WARNING: The scripts pip, pip3 and pip3.11 are installed in '/Library/Frameworks/Python.framework/Versions/3.11/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed pip-24.3.1
Hi Lovre.
Here is the new Terminal report by typing: python3 -m pip install opencv-python
python3 -m pip install opencv-python
Collecting opencv-python
Downloading opencv_python-4.10.0.84-cp37-abi3-macosx_12_0_x86_64.whl.metadata (20 kB)
Collecting numpy>=1.21.2 (from opencv-python)
Downloading numpy-2.1.3-cp311-cp311-macosx_10_9_x86_64.whl.metadata (62 kB)
Downloading opencv_python-4.10.0.84-cp37-abi3-macosx_12_0_x86_64.whl (56.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 2.1 MB/s eta 0:00:00
Downloading numpy-2.1.3-cp311-cp311-macosx_10_9_x86_64.whl (21.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21.2/21.2 MB 2.3 MB/s eta 0:00:00
Installing collected packages: numpy, opencv-python
WARNING: The scripts f2py and numpy-config are installed in '/Library/Frameworks/Python.framework/Versions/3.11/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed numpy-2.1.3 opencv-python-4.10.0.84