Doc for version ddk-3.4.7.7
YOLOv8n-Pose inherits the backbone and neck architecture of the YOLOv8n object detector and extends the standard detection head with dedicated pose estimation layers. The model therefore not only detects people (bounding boxes) but also simultaneously predicts the spatial positions (keypoints) of their anatomical joints (e.g., shoulders, elbows, knees, ankles).
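For reference, the standard YOLOv8n-Pose model predicts 17 COCO keypoints per detected person, each as an (x, y, confidence) triple. A minimal sketch of that layout is shown below; the keypoint names follow the COCO convention and are listed here for illustration, they are not read from the model itself.

# COCO keypoint order used by YOLOv8n-Pose (17 joints, each predicted as x, y, confidence)
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# One detection carries a bounding box plus 17 * 3 = 51 keypoint values
assert len(COCO_KEYPOINTS) * 3 == 51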
Inference results on VIM4.
Inference speed test: about 90 ms per frame with a USB camera.
Download YOLOv8 official code ultralytics/ultralytics
$ git clone https://github.com/ultralytics/ultralytics
Refer to the README.md to train a YOLOv8n-Pose model. The versions used here are torch==1.10.1 and ultralytics==8.0.86.
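If you just want a quick sanity check of the training pipeline, a minimal sketch using the ultralytics Python API is shown below. The dataset file my-pose-data.yaml and the hyperparameters are placeholders; substitute your own.

from ultralytics import YOLO

# Start from the pretrained pose weights and fine-tune on your own dataset.
# "my-pose-data.yaml" is a placeholder for your dataset configuration file.
model = YOLO("yolov8n-pose.pt")
model.train(data="my-pose-data.yaml", epochs=100, imgsz=640)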
Follow the official Docker documentation to install Docker: Install Docker Engine on Ubuntu.
Pull the Docker image as follows:

$ docker pull numbqq/npu-vim4
Get the SDK source khadas/vim4_npu_sdk.
$ git lfs install
$ git lfs clone https://github.com/khadas/vim4_npu_sdk
$ cd vim4_npu_sdk
$ ls
adla-toolkit-binary  adla-toolkit-binary-3.1.7.4  convert-in-docker.sh  Dockerfile  docs  README.md
adla-toolkit-binary/docs - SDK documentation
adla-toolkit-binary/bin - SDK tools required for model conversion
adla-toolkit-binary/demo - Conversion examples

If your kernel is older than 241129, please use branch npu-ddk-1.7.5.5.
After training the model, modify class Detect and class Pose in ultralytics/ultralytics/nn/modules/head.py as follows, so that the ONNX export returns the raw per-scale feature maps instead of the decoded predictions. (If you use ultralytics==8.0.86, the classes are in ultralytics/ultralytics/nn/modules.py.)
diff --git a/ultralytics/nn/modules/head.py b/ultralytics/nn/modules/head.py
index 0b02eb3..0a6e43a 100644
--- a/ultralytics/nn/modules/head.py
+++ b/ultralytics/nn/modules/head.py
@@ -42,6 +42,9 @@ class Detect(nn.Module):
     def forward(self, x):
         """Concatenates and returns predicted bounding boxes and class probabilities."""
+        if torch.onnx.is_in_onnx_export():
+            return self.forward_export(x)
+
         shape = x[0].shape  # BCHW
         for i in range(self.nl):
             x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
@@ -80,6 +83,15 @@ class Detect(nn.Module):
             a[-1].bias.data[:] = 1.0  # box
             b[-1].bias.data[:m.nc] = math.log(5 / m.nc / (640 / s) ** 2)  # cls (.01 objects, 80 classes, 640 img)
 
+    def forward_export(self, x):
+        results = []
+        for i in range(self.nl):
+            dfl = self.cv2[i](x[i]).contiguous()
+            cls = self.cv3[i](x[i]).contiguous()
+            results.append(torch.cat([cls, dfl], 1))
+        return tuple(results)
+
@@ -255,6 +283,16 @@ class Pose(Detect):
     def forward(self, x):
         """Perform forward pass through YOLO model and return predictions."""
         bs = x[0].shape[0]  # batch size
-        kpt = torch.cat([self.cv4[i](x[i]).view(bs, self.nk, -1) for i in range(self.nl)], -1)  # (bs, 17*3, h*w)
+        if torch.onnx.is_in_onnx_export():
+            kpt = [self.cv4[i](x[i]) for i in range(self.nl)]
+        else:
+            kpt = torch.cat([self.cv4[i](x[i]).view(bs, self.nk, -1) for i in range(self.nl)], -1)  # (bs, 17*3, h*w)
         x = self.detect(self, x)
+
+        if torch.onnx.is_in_onnx_export():
+            output = []
+            for i in range(self.nl):
+                output.append((torch.cat([x[i], kpt[i]], dim=1)))
+            return output
If you installed the ultralytics package via pip, modify these files inside the installed package instead.
Create a Python file (e.g. export.py) with the following content to export the ONNX model.
from ultralytics import YOLO

model = YOLO("./runs/pose/train/weights/best.pt")
results = model.export(format="onnx")
$ python export.py
Use Netron to check your model's outputs. The exported model should have three outputs, one per detection scale, each a raw feature map that concatenates the class/box channels and the keypoint channels. If it does not, please check your head.py changes.
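If you prefer to check the outputs from the command line instead of Netron, a minimal sketch using onnxruntime is shown below. The file path is a placeholder, and the exact channel count depends on your training setup; for a single-class (person-only) pose model at 640x640 it is typically 116 = 64 (box/DFL) + 1 (class) + 51 (17 keypoints * 3) at strides 8/16/32.

import onnxruntime as ort

# Path to the exported model; adjust to your own output location.
session = ort.InferenceSession("./runs/pose/train/weights/best.onnx")

# Expect three outputs (one per detection scale), e.g. for a 640x640
# single-class pose model: (1, 116, 80, 80), (1, 116, 40, 40), (1, 116, 20, 20).
for out in session.get_outputs():
    print(out.name, out.shape)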
Enter vim4_npu_sdk/demo and modify convert_adla.sh as follows.
#!/bin/bash

ACUITY_PATH=../bin/
#ACUITY_PATH=../python/tvm/

adla_convert=${ACUITY_PATH}adla_convert
if [ ! -e "$adla_convert" ]; then
    adla_convert=${ACUITY_PATH}adla_convert.py
fi

$adla_convert --model-type onnx \
    --model ./model_source/yolov8n_pose/yolov8n_pose.onnx \
    --inputs "images" \
    --input-shapes "3,640,640" \
    --dtypes "float32" \
    --quantize-dtype int16 --outdir onnx_output \
    --channel-mean-value "0,0,0,255" \
    --inference-input-type "float32" \
    --inference-output-type "float32" \
    --source-file dataset.txt \
    --batch-size 1 --target-platform PRODUCT_PID0XA003
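The --source-file dataset.txt option points to the list of calibration images used for quantization. As a rough example (the file name and image paths below are placeholders; use images representative of your deployment scene, and see the SDK documentation in adla-toolkit-binary/docs for the exact format), dataset.txt is simply one image path per line:

./data/person_01.jpg
./data/person_02.jpg
./data/person_03.jpg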
Run convert_adla.sh to generate the VIM4 model. The converted model is xxx.adla in onnx_output.
$ bash convert_adla.sh
Clone the source code from our khadas/vim4_npu_applications.
$ git clone https://github.com/khadas/vim4_npu_applications
If your kernel is older than 241129, please use a version before tag ddk-3.4.7.7.
$ sudo apt update
$ sudo apt install libopencv-dev python3-opencv cmake
Put yolov8n_pose_int8.adla in vim4_npu_applications/yolov8n_pose/data/.
# Compile
$ cd vim4_npu_applications/yolov8n_pose
$ mkdir build
$ cd build
$ cmake ..
$ make

# Run
$ ./yolov8n_pose -m ../data/yolov8n_pose_int8.adla -p ../data/bus.jpg
Put yolov8n_pose_int8.adla in vim4_npu_applications/yolov8n_pose_cap/data/.
# Compile
$ cd vim4_npu_applications/yolov8n_pose_cap
$ mkdir build
$ cd build
$ cmake ..
$ make

# Run
$ ./yolov8n_pose_cap -m ../data/yolov8n_pose_int8.adla -t usb -d 0
0 is the camera device index.
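If you are not sure which index your USB camera uses, the short sketch below checks a device index with the python3-opencv package installed earlier. The index 0 is just an example; with the V4L2 backend it usually corresponds to /dev/video0.

import cv2

# Try to open camera index 0 (typically /dev/video0); adjust the index as needed.
cap = cv2.VideoCapture(0)
print("camera 0 opened:", cap.isOpened())
cap.release()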