YOLOv8n-Pose inherits the backbone and neck architecture of the YOLOv8n object detector and extends it by adding dedicated pose estimation layers to the head. This allows it not only to detect people (bounding boxes) but also to simultaneously predict the spatial positions (keypoints) of their anatomical joints (e.g., shoulders, elbows, knees, ankles).
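The keypoints follow the standard COCO 17-joint layout used by the official YOLOv8n-Pose weights, each predicted as an (x, y, confidence) triple; a custom-trained model may use a different keypoint set:

# Standard COCO-17 keypoint order used by the official YOLOv8n-Pose weights.
# Each keypoint is predicted as (x, y, confidence), so one person yields 17 * 3 = 51 values.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]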
Inference results on VIM3.
Inference speed test: about 182 ms per frame with a USB camera and about 156 ms per frame with a MIPI camera.
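A minimal sketch of how a per-frame time like this can be measured, with run_inference as a hypothetical stand-in for whatever the demo does per frame (KSNN inference plus post-processing):

import time

def timed_frame(frame, run_inference):
    # run_inference is a hypothetical placeholder for the demo's per-frame work
    # (KSNN inference and post-processing); the real demo scripts bundle these steps.
    start = time.time()
    result = run_inference(frame)
    elapsed_ms = (time.time() - start) * 1000
    print("frame time: %.1f ms" % elapsed_ms)
    return result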
Download the official YOLOv8 code. ultralytics/ultralytics
$ git clone https://github.com/ultralytics/ultralytics
Refer to the README.md to create and train a YOLOv8n-Pose model. The versions used here are torch==1.10.1 and ultralytics==8.0.86.
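A minimal training sketch using the ultralytics Python API; the dataset YAML, epochs, and image size below are placeholders to replace with your own pose dataset and settings:

from ultralytics import YOLO

# Start from the official pretrained pose weights (or "yolov8n-pose.yaml" to train from scratch).
model = YOLO("yolov8n-pose.pt")

# "coco8-pose.yaml" is the small sample dataset shipped with ultralytics;
# replace it with your own pose dataset config.
model.train(data="coco8-pose.yaml", epochs=100, imgsz=640)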
$ git lfs install
$ git lfs clone https://github.com/khadas/aml_npu_sdk
The KSNN conversion tool is under acuity-toolkit/python.
$ cd aml_npu_sdk/acuity-toolkit/python && ls
convert  data  outputs
After training the model, modify class Detect and class Pose in ultralytics/ultralytics/nn/modules/head.py as follows. (If you use ultralytics==8.0.86, these classes are in ultralytics/ultralytics/nn/modules.py.)
diff --git a/ultralytics/nn/modules/head.py b/ultralytics/nn/modules/head.py
index 0b02eb3..0a6e43a 100644
--- a/ultralytics/nn/modules/head.py
+++ b/ultralytics/nn/modules/head.py
@@ -42,6 +42,9 @@ class Detect(nn.Module):
     def forward(self, x):
         """Concatenates and returns predicted bounding boxes and class probabilities."""
+        if torch.onnx.is_in_onnx_export():
+            return self.forward_export(x)
+
         shape = x[0].shape  # BCHW
         for i in range(self.nl):
             x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
@@ -80,6 +83,15 @@ class Detect(nn.Module):
             a[-1].bias.data[:] = 1.0  # box
             b[-1].bias.data[:m.nc] = math.log(5 / m.nc / (640 / s) ** 2)  # cls (.01 objects, 80 classes, 640 img)
+    def forward_export(self, x):
+        results = []
+        for i in range(self.nl):
+            dfl = self.cv2[i](x[i]).contiguous()
+            cls = self.cv3[i](x[i]).contiguous()
+            results.append(torch.cat([cls, dfl], 1))
+        return tuple(results)
+
@@ -255,6 +283,16 @@ class Pose(Detect):
     def forward(self, x):
         """Perform forward pass through YOLO model and return predictions."""
         bs = x[0].shape[0]  # batch size
-        kpt = torch.cat([self.cv4[i](x[i]).view(bs, self.nk, -1) for i in range(self.nl)], -1)  # (bs, 17*3, h*w)
+        if torch.onnx.is_in_onnx_export():
+            kpt = [self.cv4[i](x[i]) for i in range(self.nl)]
+        else:
+            kpt = torch.cat([self.cv4[i](x[i]).view(bs, self.nk, -1) for i in range(self.nl)], -1)  # (bs, 17*3, h*w)
         x = self.detect(self, x)
+
+        if torch.onnx.is_in_onnx_export():
+            output = []
+            for i in range(self.nl):
+                output.append(torch.cat([x[i], kpt[i]], dim=1))
+            return output
If you pip-installed the ultralytics package, modify the files inside the installed package instead.
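With this modification the exported graph returns the three raw per-scale head outputs, so box decoding and keypoint scaling are handled later in CPU post-processing. Each output then has nc + 4 × reg_max + 17 × 3 channels; a quick sanity check of that arithmetic, assuming the default single person class and reg_max = 16:

nc, reg_max, nk = 1, 16, 17 * 3   # classes, DFL bins per box side, keypoint values (x, y, conf) * 17
channels = nc + 4 * reg_max + nk  # cls + dfl + keypoints concatenated in the Pose export branch
print(channels)                   # 116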
Create a Python file as follows to export the ONNX model.
from ultralytics import YOLO

model = YOLO("./runs/pose/train/weights/best.pt")
results = model.export(format="onnx")
$ python export.py
Use Netron to check that your model outputs look like this. If they do not, please check your head.py.
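Besides Netron, you can sanity-check the exported graph programmatically with onnxruntime. A minimal sketch (with a 640×640 input and a single person class, the three outputs should come out as raw per-scale maps such as 1×116×80×80, 1×116×40×40, and 1×116×20×20; exact shapes depend on your input size):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("yolov8n_pose.onnx")
inp = sess.get_inputs()[0]
print(inp.name, inp.shape)

# Run a dummy forward pass and print the three per-scale output shapes.
dummy = np.zeros((1, 3, 640, 640), dtype=np.float32)
outputs = sess.run(None, {inp.name: dummy})
for out in outputs:
    print(out.shape)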
Enter aml_npu_sdk/acuity-toolkit/python and run the following command.
# uint8
$ ./convert --model-name yolov8n_pose \
--platform onnx \
--model yolov8n_pose.onnx \
--mean-values '0 0 0 0.00392156' \
--quantized-dtype asymmetric_affine \
--source-files ./data/dataset/dataset0.txt \
--batch-size 1 \
--iterations 1 \
--kboard VIM3 --print-level 0
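In --mean-values '0 0 0 0.00392156', the first three values are per-channel means and the last acts as a scale factor, so the preprocessing is effectively (pixel - mean) * scale, i.e. dividing by 255 here since 0.00392156 ≈ 1/255. A small sketch of the equivalent preprocessing in Python:

import numpy as np

def normalize(img):
    # Equivalent of --mean-values '0 0 0 0.00392156': subtract per-channel means (all 0 here)
    # and multiply by the scale 0.00392156 ~= 1/255, mapping pixel values into [0, 1].
    mean = np.array([0.0, 0.0, 0.0])
    scale = 0.00392156
    return (img.astype(np.float32) - mean) * scale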
Currently KSNN only supports batch-size = 1. If you want to use more quantization images, modify batch-size and iterations: batch-size × iterations = number of quantization images. Using between 200 and 500 quantization images is recommended.
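The file passed to --source-files is expected to be a plain-text list with one quantization image path per line. A minimal sketch that builds such a list from a folder of images (the folder path is just an example):

import glob

# Collect the quantization images and write one path per line, as expected by --source-files.
images = sorted(glob.glob("./data/quant_images/*.jpg"))
with open("./data/dataset/dataset0.txt", "w") as f:
    for path in images:
        f.write(path + "\n")
print("%d images listed" % len(images))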
If you use a VIM3L, replace VIM3 with VIM3L.
If the conversion succeeds, the converted model and library will be generated in outputs/yolov8n_pose.
If your YOLOv8n-Pose model performs poorly on the board, please try quantizing the model in int8 or int16.
# int8
$ ./convert --model-name yolov8n_pose \
--platform onnx \
--model yolov8n_pose.onnx \
--mean-values '0 0 0 0.00392156' \
--quantized-dtype dynamic_fixed_point \
--qtype int8 \
--source-files ./data/dataset/dataset0.txt \
--batch-size 1 \
--iterations 1 \
--kboard VIM3 --print-level 0

# int16
$ ./convert --model-name yolov8n_pose \
--platform onnx \
--model yolov8n_pose.onnx \
--mean-values '0 0 0 0.00392156' \
--quantized-dtype dynamic_fixed_point \
--qtype int16 \
--source-files ./data/dataset/dataset0.txt \
--batch-size 1 \
--iterations 1 \
--kboard VIM3 --print-level 0
Download the KSNN library and demo code. khadas/ksnn
$ git clone --recursive https://github.com/khadas/ksnn.git
$ cd ksnn/ksnn
$ pip3 install ksnn-1.3-py3-none-any.whl
If your kernel version is 5.15, use ksnn-1.4-py3-none-any.whl instead of ksnn-1.3-py3-none-any.whl.
$ pip3 install matplotlib
Put yolov8n_pose.nb into ksnn/examples/yolov8n_pose/models/VIM3 and libnn_yolov8n_pose.so into ksnn/examples/yolov8n_pose/libs.
$ cd ksnn/examples/yolov8n_pose
$ python3 yolov8n-pose-picture.py --model ./models/VIM3/yolov8n_pose_uint8.nb --library ./libs/libnn_yolov8n_pose.so --picture ./data/bus.jpg --level 0
For a USB camera:
# usb
$ cd ksnn/examples/yolov8n_pose
$ python3 yolov8n-pose-cap.py --model ./models/VIM3/yolov8n_pose_uint8.nb --library ./libs/libnn_yolov8n_pose.so --type usb --device 0
For a MIPI camera, the pip-installed OpenCV does not support GStreamer, so you need to install OpenCV with sudo apt install.
# mipi
$ pip3 uninstall opencv-python numpy
$ sudo apt install python3-opencv
$ pip3 install numpy==1.23
$ cd ksnn/examples/yolov8n_pose
$ python3 yolov8n-pose-cap.py --model ./models/VIM3/yolov8n_pose_uint8.nb --library ./libs/libnn_yolov8n_pose.so --type mipi --device 50
0 and 50 are the camera device indices.
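You can confirm that the apt-installed OpenCV was built with GStreamer support (needed for the MIPI pipeline):

import cv2

# Look for "GStreamer: YES" in the Video I/O section of the build information.
print(cv2.getBuildInformation())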