Doc for version ddk-3.4.7.7
YOLOv8n-Pose inherits the backbone and neck architecture of the YOLOv8n object detector and extends the standard detection head with dedicated pose estimation layers. The model therefore not only detects people (bounding boxes) but also simultaneously predicts the spatial positions (keypoints) of their anatomical joints (e.g., shoulders, elbows, knees, ankles).
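For reference, the standard YOLOv8n-Pose model predicts 17 COCO keypoints per detected person, each as an (x, y, confidence) triple. A minimal sketch of that layout is shown below; the keypoint names follow the COCO convention and are listed here for illustration, they are not read from the model itself.

# COCO keypoint order used by YOLOv8n-Pose (17 joints, each predicted as x, y, confidence)
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# One detection carries a bounding box plus 17 * 3 = 51 keypoint values
assert len(COCO_KEYPOINTS) * 3 == 51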
Inference results on VIM4.
Inference speed test: about 90 ms per frame with a USB camera.
Download YOLOv8 official code ultralytics/ultralytics
$ git clone https://github.com/ultralytics/ultralytics
Refer to the README.md to train a YOLOv8n-Pose model. The versions used here are torch==1.10.1 and ultralytics==8.0.86.
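If you just want a quick sanity check of the training pipeline, a minimal sketch using the ultralytics Python API is shown below. The dataset file my-pose-data.yaml and the hyperparameters are placeholders; substitute your own.

from ultralytics import YOLO

# Start from the pretrained pose weights and fine-tune on your own dataset.
# "my-pose-data.yaml" is a placeholder for your dataset configuration file.
model = YOLO("yolov8n-pose.pt")
model.train(data="my-pose-data.yaml", epochs=100, imgsz=640)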
Follow the official Docker documentation to install Docker: Install Docker Engine on Ubuntu.
Pull the Docker image as follows:

$ docker pull numbqq/npu-vim4
Get the SDK source khadas/vim4_npu_sdk.
$ git lfs install
$ git lfs clone https://github.com/khadas/vim4_npu_sdk
$ cd vim4_npu_sdk
$ ls
adla-toolkit-binary  adla-toolkit-binary-3.1.7.4  convert-in-docker.sh  Dockerfile  docs  README.md
adla-toolkit-binary/docs - SDK documentation
adla-toolkit-binary/bin - SDK tools required for model conversion
adla-toolkit-binary/demo - Conversion examples

If your kernel is older than 241129, please use branch npu-ddk-1.7.5.5.
After training the model, modify class Detect and class Pose in ultralytics/ultralytics/nn/modules/head.py as follows, so that the ONNX export returns the raw per-scale feature maps instead of the decoded predictions. (If you use ultralytics==8.0.86, the classes are in ultralytics/ultralytics/nn/modules.py.)
diff --git a/ultralytics/nn/modules/head.py b/ultralytics/nn/modules/head.py
index 0b02eb3..0a6e43a 100644
--- a/ultralytics/nn/modules/head.py
+++ b/ultralytics/nn/modules/head.py
@@ -42,6 +42,9 @@ class Detect(nn.Module):
     def forward(self, x):
         """Concatenates and returns predicted bounding boxes and class probabilities."""
+        if torch.onnx.is_in_onnx_export():
+            return self.forward_export(x)
+
         shape = x[0].shape  # BCHW
         for i in range(self.nl):
             x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
@@ -80,6 +83,15 @@ class Detect(nn.Module):
             a[-1].bias.data[:] = 1.0  # box
             b[-1].bias.data[:m.nc] = math.log(5 / m.nc / (640 / s) ** 2)  # cls (.01 objects, 80 classes, 640 img)
 
+    def forward_export(self, x):
+        results = []
+        for i in range(self.nl):
+            dfl = self.cv2[i](x[i]).contiguous()
+            cls = self.cv3[i](x[i]).contiguous()
+            results.append(torch.cat([cls, dfl], 1))
+        return tuple(results)
+
@@ -255,6 +283,16 @@ class Pose(Detect):
     def forward(self, x):
         """Perform forward pass through YOLO model and return predictions."""
         bs = x[0].shape[0]  # batch size
-        kpt = torch.cat([self.cv4[i](x[i]).view(bs, self.nk, -1) for i in range(self.nl)], -1)  # (bs, 17*3, h*w)
+        if torch.onnx.is_in_onnx_export():
+            kpt = [self.cv4[i](x[i]) for i in range(self.nl)]
+        else:
+            kpt = torch.cat([self.cv4[i](x[i]).view(bs, self.nk, -1) for i in range(self.nl)], -1)  # (bs, 17*3, h*w)
         x = self.detect(self, x)
+
+        if torch.onnx.is_in_onnx_export():
+            output = []
+            for i in range(self.nl):
+                output.append((torch.cat([x[i], kpt[i]], dim=1)))
+            return output
If you installed the ultralytics package via pip, modify these files inside the installed package instead.
Create a Python file (e.g. export.py) with the following content to export the ONNX model.
from ultralytics import YOLO

model = YOLO("./runs/pose/train/weights/best.pt")
results = model.export(format="onnx")
$ python export.py
Use Netron to check your model's outputs. The exported model should have three outputs, one per detection scale, each a raw feature map that concatenates the class/box channels and the keypoint channels. If it does not, please check your head.py changes.
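If you prefer to check the outputs from the command line instead of Netron, a minimal sketch using onnxruntime is shown below. The file path is a placeholder, and the exact channel count depends on your training setup; for a single-class (person-only) pose model at 640x640 it is typically 116 = 64 (box/DFL) + 1 (class) + 51 (17 keypoints * 3) at strides 8/16/32.

import onnxruntime as ort

# Path to the exported model; adjust to your own output location.
session = ort.InferenceSession("./runs/pose/train/weights/best.onnx")

# Expect three outputs (one per detection scale), e.g. for a 640x640
# single-class pose model: (1, 116, 80, 80), (1, 116, 40, 40), (1, 116, 20, 20).
for out in session.get_outputs():
    print(out.name, out.shape)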
Enter vim4_npu_sdk/demo and modify convert_adla.sh as follows.
#!/bin/bash

ACUITY_PATH=../bin/
#ACUITY_PATH=../python/tvm/

adla_convert=${ACUITY_PATH}adla_convert
if [ ! -e "$adla_convert" ]; then
    adla_convert=${ACUITY_PATH}adla_convert.py
fi

$adla_convert --model-type onnx \
    --model ./model_source/yolov8n_pose/yolov8n_pose.onnx \
    --inputs "images" \
    --input-shapes "3,640,640" \
    --dtypes "float32" \
    --quantize-dtype int16 --outdir onnx_output \
    --channel-mean-value "0,0,0,255" \
    --inference-input-type "float32" \
    --inference-output-type "float32" \
    --source-file dataset.txt \
    --batch-size 1 --target-platform PRODUCT_PID0XA003
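The --source-file dataset.txt option points to the list of calibration images used for quantization. As a rough example (the file name and image paths below are placeholders; use images representative of your deployment scene, and see the SDK documentation in adla-toolkit-binary/docs for the exact format), dataset.txt is simply one image path per line:

./data/person_01.jpg
./data/person_02.jpg
./data/person_03.jpg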
Run convert_adla.sh to generate the VIM4 model. The converted model is xxx.adla in onnx_output.
$ bash convert_adla.sh
Clone the source code from our khadas/vim4_npu_applications.
$ git clone https://github.com/khadas/vim4_npu_applications
If your kernel is older than 241129, please use a version before tag ddk-3.4.7.7.
$ sudo apt update
$ sudo apt install libopencv-dev python3-opencv cmake
Put yolov8n_pose_int8.adla in vim4_npu_applications/yolov8n_pose/data/.
# Compile
$ cd vim4_npu_applications/yolov8n_pose
$ mkdir build
$ cd build
$ cmake ..
$ make

# Run
$ ./yolov8n_pose -m ../data/yolov8n_pose_int8.adla -p ../data/bus.jpg
Put yolov8n_pose_int8.adla in vim4_npu_applications/yolov8n_pose_cap/data/.
# Compile
$ cd vim4_npu_applications/yolov8n_pose_cap
$ mkdir build
$ cd build
$ cmake ..
$ make

# Run
$ ./yolov8n_pose_cap -m ../data/yolov8n_pose_int8.adla -t usb -d 0
0 is the camera device index.
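If you are not sure which index your USB camera uses, the short sketch below checks a device index with the python3-opencv package installed earlier. The index 0 is just an example; with the V4L2 backend it usually corresponds to /dev/video0.

import cv2

# Try to open camera index 0 (typically /dev/video0); adjust the index as needed.
cap = cv2.VideoCapture(0)
print("camera 0 opened:", cap.isOpened())
cap.release()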