Deep Learning-based Human Pose Estimation for Squat Analysis
This example shows how to use human pose estimation to analyze squats in a recorded video. In this example, you use a pretrained deep learning network to detect a person in the input video, and then use a pretrained HRNet keypoint detector to identify keypoints on the detected person. You then use these keypoints to determine whether the person is performing a squat movement. The rest of the example explains the steps involved.
Step 1: Load Pretrained Deep Learning Networks
Load a deep learning object detector trained on the COCO data set to detect people in an image by using the peopleDetector object.
detector = peopleDetector;
Load a pretrained HRNet object keypoint detector. The default network is HRNet-W32, trained on the COCO keypoint detection data set. In an HRNet-W32 network, the last three stages of the high-resolution subnetworks have 32 convolved feature maps. For more information about the HRNet architecture and the HRNet object keypoint detector, see Getting Started with HRNet and hrnetObjectKeypointDetector, respectively.
keyPtDet = hrnetObjectKeypointDetector;
Step 2: Read Video
Download the squat exercise video.
downloadFolder = pwd;
dataFilename = "SquatExerciseVideo.zip";
dataUrl = "https://ssd.mathworks.com/supportfiles/vision/data/" + dataFilename;
zipFile = fullfile(downloadFolder,dataFilename);
if ~exist(zipFile,"file")
    disp("Downloading Squat Exercise Video (8 MB)...")
    websave(zipFile,dataUrl);
end
unzip(zipFile,downloadFolder)
dataset = fullfile(downloadFolder,"SquatExerciseVideo.mp4");
Create a VideoReader object to read the video into the MATLAB® workspace. The video used in this example is a recording of a person performing a squat movement.
reader = VideoReader(dataset);
Step 3: Perform Detections on Video Frame
Read a desired video frame from the input video by setting the current time of the VideoReader object. The readFrame function then reads the next available video frame from the specified time.
reader.CurrentTime = 16.5;
videoFrame = readFrame(reader);
Detect the person in the video frame by using the detect method of the peopleDetector object.
[bboxes,scores,class] = detect(detector,videoFrame);
[val,indx] = max(scores);
bbox = bboxes(indx,:);
detection = insertObjectAnnotation(videoFrame,'rectangle',bbox,class(indx));
figure
imshow(detection)
Detect the keypoints within the detected bounding box by using the pretrained HRNet object keypoint detector. Use the detect method of the hrnetObjectKeypointDetector object to compute the keypoints.
[keypoints,keypointScores,valid] = detect(keyPtDet,videoFrame,bbox);
Insert the detected keypoints into the input frame and display the results.
keyLabels = categorical(1:length(keypointScores))';
detectedPtsImage = insertObjectKeypoints(videoFrame,keypoints,KeypointColor="red", ...
    KeypointLabel=keyLabels,TextBoxColor="cyan",FontColor="blue");
For better visualization of detected keypoints and their locations, crop and display the detected bounding box region.
detectedKeyPoints = imcrop(detectedPtsImage,bbox);
fig = figure(Position=[0 0 400 800]);
hAxes = axes(fig);
image(detectedKeyPoints,Parent=hAxes)
axis off
Step 4: Identify Keypoints and Criteria for Squat Analysis
From the detection, identify the keypoints to use for detecting the squat movement. This example uses the keypoints near the hip, knee, and shoulder joints on the right side of the person's body to perform squat analysis. Then, connect the keypoints near the hip, knee, and shoulder joints to form two line segments:
The line segment from the hip to the knee.
The line segment from the hip to the shoulder.
Specify the indices of the desired keypoints near the hip, knee, and shoulder joints. In the COCO keypoint order, index 7 is the right shoulder, index 13 is the right hip, and index 15 is the right knee.
hipIndex = keypoints(13,:);      % right hip
shoulderIndex = keypoints(7,:);  % right shoulder
kneeIndex = keypoints(15,:);     % right knee
xIndex = [shoulderIndex(1) hipIndex(1) kneeIndex(1)];
yIndex = [shoulderIndex(2) hipIndex(2) kneeIndex(2)];
Draw line segments connecting the hip to the knee and the hip to the shoulder.
figure
imshow(detectedPtsImage)
hold on
line(xIndex,yIndex,LineWidth=2,Color="yellow")
Measure the angles that the two line segments make with the horizontal axis of the image to determine whether the person is performing the squat movement. In this example, the movement is counted as a squat if these two conditions are satisfied:
The angle of the line segment from the hip joint to the shoulder joint must be less than 60 degrees.
The angle of the line segment from the hip joint to the knee joint must be less than 15 degrees.
angle1 = (180/pi)*atan(abs((keypoints(13,2)-keypoints(7,2))/(keypoints(7,1)-keypoints(13,1))));
angle2 = (180/pi)*atan(abs((keypoints(15,2)-keypoints(13,2))/(keypoints(15,1)-keypoints(13,1))));
if angle1 < 60 && angle2 < 15
    disp("Squat")
else
    disp("Not a squat")
end
Not a squat
You can also use a different set of keypoints and consider multiple angles to more accurately analyze whether the squat movement is performed correctly.
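For instance, a knee-flexion angle computed from three keypoints gives a direct measure of squat depth. The following is a minimal sketch, not part of the original analysis, assuming the COCO keypoint order in which index 17 is the right ankle; the 90-degree threshold is an illustrative choice.

```matlab
% Illustrative sketch: interior angle at the right knee, computed from
% the right hip, knee, and ankle keypoints (COCO indices 13, 15, 17).
hip   = keypoints(13,:);
knee  = keypoints(15,:);
ankle = keypoints(17,:);
v1 = hip - knee;     % vector from knee to hip
v2 = ankle - knee;   % vector from knee to ankle
kneeAngle = (180/pi)*acos(dot(v1,v2)/(norm(v1)*norm(v2)));
% A deep squat typically bends the knee to about 90 degrees or less.
if kneeAngle < 90
    disp("Deep squat position")
end
```

Because the interior angle is computed from vectors rather than slopes, this measure does not depend on the person's orientation in the frame.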
Step 5: Perform Squat Analysis on Video
In this section, you use the approach explained in Steps 2 through 4 to identify and count squat movements over the entire video.
Reset the current time of the video reader to zero to start reading the video from the beginning.
reader.CurrentTime = 0;
Initialize the video player to display the squat analysis results. Specify the player position on the screen.
videoPlayer = vision.VideoPlayer(Position=[100 100 600 800]);
Set the countSquat flag to true to enable squat counting, and initialize the squat counter.
countSquat = true;
squatCount = 0;
Initialize the update flag to false. This flag ensures that each squat movement is counted only once. When the squat criteria are met (angle1, from the hip to the shoulder, is less than 60 degrees, and angle2, from the hip to the knee, is less than 15 degrees) and the flag is false, the squat count increments by 1 and the flag is set to true. When the criteria are not met in a subsequent frame, the flag resets to false, allowing the next valid squat to be counted.
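The flag acts as a simple debounce: a run of consecutive frames that satisfy the criteria counts as a single squat. The following sketch illustrates the logic on a made-up sequence of per-frame results (true when both angle conditions hold); the inSquat values are hypothetical.

```matlab
% Hypothetical per-frame squat-criteria results for 10 frames.
inSquat = [false false true true true false false true true false];
squatCount = 0;
update = false;
for k = 1:numel(inSquat)
    if inSquat(k)
        if ~update
            squatCount = squatCount + 1;  % count each run of true frames once
            update = true;
        end
    else
        update = false;                   % re-arm for the next squat
    end
end
disp(squatCount)  % two runs of true frames, so this displays 2
```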
update = false;
Perform steps 2 to 4 on each frame in the video.
while hasFrame(reader)
    % Step 2: Read video frame
    videoFrame = imresize(readFrame(reader),[600 400]);
    % Step 3: Perform detections
    [bboxes,scores,class] = detect(detector,videoFrame);
    [val,indx] = max(scores);
    boxPerson = bboxes(indx,:);
    if ~isempty(boxPerson)
        [keypoints,scores,vflag] = detect(keyPtDet,videoFrame,boxPerson);
        videoFrame = insertObjectKeypoints(videoFrame,keypoints,Connections=keyPtDet.KeypointConnections, ...
            KeypointSize=4,ConnectionColor="y",LineWidth=2);
        % Step 4: Identify keypoints and apply the squat criteria
        angle1 = (180/pi)*atan(abs((keypoints(13,2)-keypoints(7,2))/(keypoints(7,1)-keypoints(13,1))));
        angle2 = (180/pi)*atan(abs((keypoints(15,2)-keypoints(13,2))/(keypoints(15,1)-keypoints(13,1))));
        if countSquat
            if angle1 < 60 && angle2 < 15
                if ~update
                    squatCount = squatCount + 1;
                    update = true;
                end
            else
                update = false;
            end
            countText = "SquatCount = " + string(squatCount);
            videoFrame = insertText(videoFrame,[1 50],countText,FontSize=15,TextBoxColor="cyan",FontColor="blue");
        end
        % Display the results
        step(videoPlayer,videoFrame)
    end
end
release(videoPlayer)