Segmentation Mask with MediaPipe in TouchDesigner

This tutorial is an updated version of the MediaPipe Pose example, using the new segmentation mask function to identify the tracked human body. It uses the Script TOP to generate the mask image. Users can further enhance the image with a Threshold TOP for display purposes. As in the previous tutorials, it assumes that Python 3.7 and the MediaPipe library have been installed through pip.

The source TouchDesigner project file is available in my TouchDesigner GitHub repository. The Python code is relatively straightforward. The pose tracking results include an array (segmentation_mask) with the same size as the tracked image. Each pixel has a value between 0.0 and 1.0: darker values correspond to the background, while brighter values are likely to be the tracked body. Here is the full listing.

# me - this DAT
# scriptOp - the OP which is cooking

import numpy as np
import cv2
import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_pose = mp.solutions.pose

pose = mp_pose.Pose(
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5,
    enable_segmentation=True
)

def onSetupParameters(scriptOp):
	return

# called whenever custom pulse parameter is pushed
def onPulse(par):
	return

def onCook(scriptOp):
    # read the input frame as a floating point NumPy array
    input = scriptOp.inputs[0].numpyArray(delayed=True)
    if input is not None:
        # convert the RGBA float image to 8-bit RGB for MediaPipe
        image = cv2.cvtColor(input, cv2.COLOR_RGBA2RGB)
        image *= 255
        image = image.astype('uint8')
        results = pose.process(image)

        if results.segmentation_mask is not None:
            # the mask is a single channel float image (values 0.0 to 1.0);
            # expand it to 8-bit RGB before copying to the Script TOP
            rgb = cv2.cvtColor(results.segmentation_mask, cv2.COLOR_GRAY2RGB)
            rgb = rgb * 255
            rgb = rgb.astype(np.uint8)
            scriptOp.copyNumpyArray(rgb)
        else:
            # output a black image when no body is detected
            black = np.zeros(image.shape, dtype=np.uint8)
            scriptOp.copyNumpyArray(black)
    return
Segmentation mask in MediaPipe
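
The mask could also be binarised inside the Python code instead of using a Threshold TOP. This is only a hedged variation on the listing above; the 0.5 cutoff is an arbitrary assumption.

# optional: binarise the segmentation mask in Python (0.5 cutoff is an assumption)
_, mask = cv2.threshold(results.segmentation_mask, 0.5, 1.0, cv2.THRESH_BINARY)
rgb = cv2.cvtColor(mask, cv2.COLOR_GRAY2RGB)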

MediaPipe in TouchDesigner 10

This is the last part of the series on using MediaPipe in TouchDesigner. The following example is a continuation of the last post on pose tracking. This version uses a Script CHOP to output the position information of the torso tracked in the film sequence. The output window displays four numbers (11, 12, 23, 24) on the four corners of the torso. These numbers are the indices of the pose landmarks corresponding to the torso of the body.

The Script CHOP will output 3 channels:

  • pose:x
  • pose:y
  • pose:visibility

Each channel has 33 samples, corresponding to the 33 pose landmarks. The visibility channel indicates how likely it is that the landmark is visible in the image. The following code segment describes how it is done.

xpos = []
ypos = []
visb = []

if results.pose_landmarks:
    # collect the x, y and visibility values of all 33 pose landmarks
    for p in results.pose_landmarks.landmark:
        xpos.append(p.x)
        ypos.append(p.y)
        visb.append(p.visibility)

    tx = scriptOp.appendChan('pose:x')
    ty = scriptOp.appendChan('pose:y')
    tv = scriptOp.appendChan('pose:visibility')

    tx.vals = xpos
    ty.vals = ypos
    tv.vals = visb

    scriptOp.rate = me.time.rate
    scriptOp.numSamples = len(xpos)
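
Once the channels are in place, other operators or expressions can read the four torso landmarks (11, 12, 23, 24) by sample index. A minimal sketch, assuming the Script CHOP is named script1:

# read the torso landmark positions from the Script CHOP by sample index
torso = [11, 12, 23, 24]
chop = op('script1')
for i in torso:
    x = chop['pose:x'][i]
    y = chop['pose:y'][i]
    v = chop['pose:visibility'][i]
    print(i, x, y, v)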

The final TouchDesigner project folder MediaPipePoseCHOP is now available in the GitHub repository.

MediaPipe in TouchDesigner 9

The following example illustrates the Pose Tracking solution in the Google MediaPipe, using TouchDesigner. It displays the tracking result in a Script TOP. Instead of using the live Video Device In TOP, it uses the Movie File In TOP to track the dancing movement in 2 film clips. The project also makes use of a Keyboard In CHOP to switch between the 2 film clips.

The project does not resize the original film clip with a Resolution TOP. Instead, it performs the resize within the Python code of the Script TOP, using the OpenCV function cv2.resize() (see the sketch after the diagram below). Each pose detected generates 33 pose landmarks. The details can be found in the following diagram.

Image from the Google MediaPipe
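
The resize step mentioned above is done inside onCook before the frame is passed to MediaPipe. A minimal sketch, assuming the variable names from the earlier listings and an arbitrary target size:

# scale the frame down before tracking; the 640 x 360 target is an assumption
image = cv2.resize(image, (640, 360), interpolation=cv2.INTER_AREA)
results = pose.process(image)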

Together with the original video image, the drawing utility will generate the pose skeleton with the following code segment.

mp_drawing.draw_landmarks(
    image, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)

The final TouchDesigner project is available in the GitHub folder as MediaPipePoseTOP. Owing to file size and copyright concerns, the two film clips are not included in GitHub.

MediaPipe in TouchDesigner 8

The following example presents a more general approach to obtain the hand tracking details in a Script CHOP. We can then use other TouchDesigner CHOPs to extract the data for visualisation.

For simplicity, it also detects only a single hand. For each hand tracked, it will generate 21 landmarks as shown in the diagram from the last post. The Script CHOP will produce 2 channels, hand:x and hand:y. Each of the channels will have 21 samples, corresponding to the 21 hand landmarks from MediaPipe. The following code segment describes how it is done.

detail_x = []
detail_y = []
if results.multi_hand_landmarks:
    # collect the x and y values of the 21 hand landmarks
    for hand in results.multi_hand_landmarks:
        for pt in hand.landmark:
            detail_x.append(pt.x)
            detail_y.append(pt.y)

    tx = scriptOp.appendChan('hand:x')
    ty = scriptOp.appendChan('hand:y')
    tx.vals = detail_x
    ty.vals = detail_y
    scriptOp.numSamples = len(detail_x)

scriptOp.rate = me.time.rate

The TouchDesigner project also uses a Shuffle CHOP to convert the 21 samples into 21 channels. We can then select the 5 channels corresponding to the 5 finger tips (4, 8, 12, 16, 20) for visualisation. The final project is available for download in the MediaPipeHandCHOP2 folder of the GitHub repository.
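
As an alternative to the Shuffle and Select CHOPs, the fingertip samples could also be read directly in Python. A small sketch, assuming the Script CHOP is named script1:

# sample indices of the 5 fingertip landmarks
tips = [4, 8, 12, 16, 20]
chop = op('script1')
points = [(chop['hand:x'][i], chop['hand:y'][i]) for i in tips]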

MediaPipe in TouchDesigner 7

This example is a continuation of the last post on hand tracking in MediaPipe with TouchDesigner. This version will use a Script CHOP, instead of a Script TOP. The CHOP will produce channels for the x and y positions of the Wrist and the Index Finger Tip. We can make use of these numbers to create interactive animation accordingly.

The MediaPipe hand tracking solution will generate 21 landmarks including all positions of the 5 fingers and the wrist. Details of the 21 landmarks are in the following diagram.

Image from the Google MediaPipe

For simplicity, the example only detects one hand. The indices 0 and 8 correspond to the WRIST and the INDEX_FINGER_TIP respectively. The following code segment illustrates how it generates the channels for the Script CHOP.

wrist = []
index_tip = []
num_hands = 0
if results.multi_hand_landmarks:
    for hand in results.multi_hand_landmarks:
        # landmark 0 is the WRIST, landmark 8 is the INDEX_FINGER_TIP
        wrist.append(hand.landmark[0])
        index_tip.append(hand.landmark[8])
        num_hands += 1

tf = scriptOp.appendChan('hands')
tf.vals = [num_hands]

if len(wrist) > 0:
    twx = scriptOp.appendChan('wrist:x')
    twy = scriptOp.appendChan('wrist:y')

    twx.vals = [wrist[0].x]
    twy.vals = [wrist[0].y]

if len(index_tip) > 0:
    tix = scriptOp.appendChan('index_tip:x')
    tiy = scriptOp.appendChan('index_tip:y')

    tix.vals = [index_tip[0].x]
    tiy.vals = [index_tip[0].y]

scriptOp.rate = me.time.rate
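
The resulting channels can then drive animation directly. For example, a parameter expression on another operator could follow the index fingertip; the Script CHOP name script1 here is an assumption:

# e.g. as a parameter expression on a Transform TOP or Geometry COMP
op('script1')['index_tip:x'][0]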

MediaPipe in TouchDesigner 6

This tutorial introduces the use of hand tracking in the Google MediaPipe with TouchDesigner. Similar to the previous posts, part 1 of hand tracking will just be a visualisation of the hand details from a Script TOP. It will use the MediaPipe drawing utility to display the hand details directly onto the Video Device In image for output.
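
As a minimal sketch of that drawing step (assuming hands is an mp.solutions.hands.Hands instance and frame is the 8-bit RGB image from the Video Device In TOP):

import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands

results = hands.process(frame)
if results.multi_hand_landmarks:
    for hand_landmarks in results.multi_hand_landmarks:
        # draw the 21 landmarks and their connections onto the frame
        mp_drawing.draw_landmarks(
            frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)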

The TouchDesigner project can now be downloaded from the MediaPipeHandTOP GitHub directory.

MediaPipe in TouchDesigner 5

This is the continuation of the last post with slight modifications. Instead of just displaying the face mesh details in a Script TOP, it visualises all the face mesh points in 3D space. As the facial landmarks returned from MediaPipe contain three-dimensional information, it is possible to enumerate all the points and display them in a Script SOP. We are going to use the appendPoint() function to generate the point cloud and the appendPoly() function to create the face mesh.

The data returned from MediaPipe contains the 468 facial landmarks, based on the Canonical Face Model. The face mesh information (triangles), however, is not available from the results of the MediaPipe solutions. Nevertheless, we can obtain it from the metadata of the facial landmarks in the MediaPipe GitHub repository. To simplify the process, I have edited the data into this CSV mesh file. The mesh.csv file is expected to be located in the TouchDesigner project folder, together with the TOE project file. Here are the first few lines of the mesh.csv file:

173,155,133
246,33,7
382,398,362
263,466,249
308,415,324

Each line describes one triangle of the face mesh. The 3 numbers are the indices of the vertices defined in the 468 facial landmarks. The visualisation of the landmarks is also available in the MediaPipe GitHub.

Canonical face model
Image from the Google MediaPipe GitHub

The TouchDesigner project will render the Script SOP with the standard Geometry, Camera, Light and the Render TOP.

I’ll not go through all the code here. The following paragraphs cover some of the essential elements in the Python code. The first one is the initialisation of the face mesh information from the mesh.csv file.

triangles = []
mesh_file = project.folder + "/mesh.csv"
mf = open(mesh_file, "r")
mesh_list = mf.read().split('\n')
mf.close()
for m in mesh_list:
    # skip empty lines (e.g. a trailing newline at the end of the file)
    if not m:
        continue
    temp = m.split(',')
    # each line holds the 3 vertex indices of one triangle
    x = int(temp[0])
    y = int(temp[1])
    z = int(temp[2])
    triangles.append([x, y, z])

The variable triangles is the list of all triangles from the canonical face model. Each entry is a list of 3 indices to the entries of the corresponding points in the 468 facial landmarks. The second one is the code to generate the face point cloud and the mesh.

for pt in landmarks:
    p = scriptOp.appendPoint()
    p.x = pt.x
    p.y = pt.y
    p.z = pt.z

for poly in triangles:
    pp = scriptOp.appendPoly(3, closed=True, addPoints=False)
    pp[0].point = scriptOp.points[poly[0]]
    pp[1].point = scriptOp.points[poly[1]]
    pp[2].point = scriptOp.points[poly[2]]

The first for loop creates all the points from the facial landmarks using the appendPoint() function. The second for loop creates all the triangular meshes from information stored in the variable triangles using the appendPoly() function.

After we draw the 3D face model, we also compute the normals of the model by using another Attribute Create SOP.

The final TouchDesigner project is available in the MediaPipeFaceMeshSOP repository.

MediaPipe in TouchDesigner 4

The following example is a simple demonstration of the Face Mesh function from MediaPipe in TouchDesigner. It is very similar to the previous face detection example. Again, we are going to use the Script TOP to integrate with MediaPipe and display the face mesh information together with the live webcam image.

Instead of flipping the image vertically in the Python code, this version will perform the flipping in the TouchDesigner Flip TOP, both vertically and horizontally (mirror image). We also reduce the resolution from the original 1280 x 720 to 640 x 360 for better performance. The Face Mesh information is drawn directly to the output image in the Script TOP.

Here is the Python code in the Script TOP:

# me - this DAT
# scriptOp - the OP which is cooking
import numpy
import cv2
import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_face_mesh = mp.solutions.face_mesh

point_spec = mp_drawing.DrawingSpec(
    color=(0, 100, 255),
    thickness=1,
    circle_radius=1
)
line_spec = mp_drawing.DrawingSpec(
    color=(255, 200, 0),
    thickness=2,
    circle_radius=1
)
face_mesh = mp_face_mesh.FaceMesh(
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5
)

# press 'Setup Parameters' in the OP to call this function to re-create the parameters.
def onSetupParameters(scriptOp):
    page = scriptOp.appendCustomPage('Custom')
    p = page.appendFloat('Valuea', label='Value A')
    p = page.appendFloat('Valueb', label='Value B')
    return

# called whenever custom pulse parameter is pushed
def onPulse(par):
    return

def onCook(scriptOp):
    input = scriptOp.inputs[0].numpyArray(delayed=True)
    if input is not None:
        frame = input * 255
        frame = frame.astype('uint8')
        frame = cv2.cvtColor(frame, cv2.COLOR_RGBA2RGB)
        results = face_mesh.process(frame)
        if results.multi_face_landmarks:
            for face_landmarks in results.multi_face_landmarks:
                mp_drawing.draw_landmarks(
                    image=frame,
                    landmark_list=face_landmarks,
                    connections=mp_face_mesh.FACE_CONNECTIONS,
                    landmark_drawing_spec=point_spec,
                    connection_drawing_spec=line_spec)

        frame = cv2.cvtColor(frame, cv2.COLOR_RGB2RGBA)
        scriptOp.copyNumpyArray(frame)
    return

Similar to previous examples, the important code is in the onCook function. The face_mesh instance processes each frame, and the drawing utility renders the results onto the frame for final display.

The TouchDesigner project is now available in the MediaPipeFaceMeshTOP folder of the GitHub repository.

MediaPipe in TouchDesigner 3

The last post demonstrated the use of the face detection function in MediaPipe with TouchDesigner. Nevertheless, it only produced an image with the detected results, which is not very useful if we want to manipulate graphics according to the detected faces. In this example, we switch to a Script CHOP to output the detected face data in numeric form.

As mentioned in the last post, the MediaPipe face detection expects a vertically flipped image compared with the TouchDesigner texture, so this example flips the image with a TouchDesigner TOP to make the Python code simpler. Instead of showing all the detected faces, the code just picks the largest face and outputs its bounding box and the positions of the left and right eyes.

Since we are working in a Script CHOP, it is not possible to connect the flipped TOP to it directly. In this case, we use the onSetupParameters function to define the Face TOP input in the Custom tab.

def onSetupParameters(scriptOp):
    page = scriptOp.appendCustomPage('Custom')
    topPar = page.appendTOP('Face', label='Image with face')
    return

And in the onCook function, we use the following statement to retrieve the image from the TOP that we dragged into the Face parameter.

topRef = scriptOp.par.Face.eval()
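
The referenced TOP can then be converted into a NumPy image in the same way as in the Script TOP examples. A hedged sketch, assuming face_detection is an mp.solutions.face_detection.FaceDetection instance:

image = topRef.numpyArray(delayed=True)
if image is not None:
    # convert the RGBA float texture to 8-bit RGB before detection
    frame = (image * 255).astype('uint8')
    frame = cv2.cvtColor(frame, cv2.COLOR_RGBA2RGB)
    results = face_detection.process(frame)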

After finding the largest face in the image, we append a number of channels to the Script CHOP so that the TouchDesigner project can use them for custom visualisation (see the sketch after the list below). The new channels are:

  • face (number of faces detected)
  • width, height (size of the bounding box)
  • tx, ty (centre of the bounding box)
  • left_eye_x, left_eye_y (position of the left eye)
  • right_eye_x, right_eye_y (position of the right eye)
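
Here is a hedged sketch of picking the largest face and appending the channels listed above; the variable names (mp_face_detection, results) and the use of the get_key_point helper are assumptions rather than the original code.

# mp_face_detection = mp.solutions.face_detection
scriptOp.clear()

largest = None
largest_area = 0
num_faces = 0
if results.detections:
    num_faces = len(results.detections)
    for detection in results.detections:
        # relative bounding box (normalised 0.0 to 1.0 coordinates)
        box = detection.location_data.relative_bounding_box
        area = box.width * box.height
        if area > largest_area:
            largest_area = area
            largest = detection

scriptOp.appendChan('face').vals = [num_faces]
if largest is not None:
    box = largest.location_data.relative_bounding_box
    scriptOp.appendChan('width').vals = [box.width]
    scriptOp.appendChan('height').vals = [box.height]
    scriptOp.appendChan('tx').vals = [box.xmin + box.width / 2]
    scriptOp.appendChan('ty').vals = [box.ymin + box.height / 2]
    left_eye = mp_face_detection.get_key_point(
        largest, mp_face_detection.FaceKeyPoint.LEFT_EYE)
    right_eye = mp_face_detection.get_key_point(
        largest, mp_face_detection.FaceKeyPoint.RIGHT_EYE)
    scriptOp.appendChan('left_eye_x').vals = [left_eye.x]
    scriptOp.appendChan('left_eye_y').vals = [left_eye.y]
    scriptOp.appendChan('right_eye_x').vals = [right_eye.x]
    scriptOp.appendChan('right_eye_y').vals = [right_eye.y]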

The complete project file can be downloaded from this GitHub repository.