
YOLOv8n-Pose (VIM3 Demo Lite - 8)

This demo requires kernel version >= 5.15.

Introduction

The full VIM3 C++ demo is complex and not very user-friendly, so we provide this lite version. This document shows how to use it.

YOLOv8n-Pose inherits the powerful object detection backbone and neck architecture of YOLOv8n. It extends the standard YOLOv8n object detection model by integrating dedicated pose estimation layers onto its head. This allows it to not only detect people (bboxes) but also simultaneously predict the spatial positions (keypoints) of their anatomical joints (e.g., shoulders, elbows, knees, ankles).
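Below is a minimal sketch, not part of the Khadas demo, showing these two kinds of outputs through the ultralytics Python API. It assumes a recent ultralytics release (the results layout differs slightly in 8.0.86), and yolov8n-pose.pt / bus.jpg are placeholder files.

from ultralytics import YOLO

# Load a pretrained pose checkpoint and run it on a test image.
model = YOLO("yolov8n-pose.pt")
results = model("bus.jpg")

for r in results:
    print(r.boxes.xyxy.shape)      # (num_persons, 4): person bounding boxes
    print(r.keypoints.xy.shape)    # (num_persons, 17, 2): x/y of the 17 COCO joints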

Inference results on VIM3.

Inference speed on VIM3: about 137 ms per frame with a USB camera, and about 115 ms per frame with a MIPI camera.

Train the model

Download the official YOLOv8 code from ultralytics/ultralytics:

$ git clone https://github.com/ultralytics/ultralytics

Refer to the README.md to create and train a YOLOv8n-Pose model. The versions used here are ultralytics==8.0.86 and PyTorch==1.10.1.
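The exact training setup depends on your dataset; the sketch below only illustrates the ultralytics training API. coco8-pose.yaml is the toy pose dataset shipped with ultralytics and should be replaced by your own dataset config.

from ultralytics import YOLO

# Fine-tune from the pretrained pose weights on a pose dataset.
model = YOLO("yolov8n-pose.pt")
model.train(data="coco8-pose.yaml", epochs=100, imgsz=640)
# The best weights are saved to runs/pose/train/weights/best.pt,
# which is the checkpoint exported to ONNX later in this guide.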

Convert the model

Build Docker Environment

We provide a Docker image that contains the environment required to convert the model.

Follow Docker official docs to install Docker: Install Docker Engine on Ubuntu.

Run the command below to pull the Docker image:

$ docker pull numbqq/npu-vim3

Get the conversion tool

$ git lfs install
$ git lfs clone https://github.com/khadas/aml_npu_sdk.git
$ cd aml_npu_sdk/acuity-toolkit/demo
$ ls
0_import_model.sh  1_quantize_model.sh  2_export_case_code.sh  data  dataset_npy.txt  dataset.txt  extractoutput.py  inference.sh  input.npy  model

After training the model, modify class Detect and class Pose in ultralytics/ultralytics/nn/modules/head.py as follows. (If you use ultralytics==8.0.86, these classes are in ultralytics/ultralytics/nn/modules.py.)

head.py
diff --git a/ultralytics/nn/modules/head.py b/ultralytics/nn/modules/head.py
index 0b02eb3..0a6e43a 100644
--- a/ultralytics/nn/modules/head.py
+++ b/ultralytics/nn/modules/head.py
@@ -42,6 +42,9 @@ class Detect(nn.Module):
 
     def forward(self, x):
         """Concatenates and returns predicted bounding boxes and class probabilities."""
+        if torch.onnx.is_in_onnx_export():
+            return self.forward_export(x)
+
         shape = x[0].shape  # BCHW
         for i in range(self.nl):
             x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
@@ -80,6 +83,15 @@ class Detect(nn.Module):
             a[-1].bias.data[:] = 1.0  # box
             b[-1].bias.data[:m.nc] = math.log(5 / m.nc / (640 / s) ** 2)  # cls (.01 objects, 80 classes, 640 img)
 
+    def forward_export(self, x):
+        results = []
+        for i in range(self.nl):
+            dfl = self.cv2[i](x[i]).contiguous()
+            cls = self.cv3[i](x[i]).contiguous()
+            results.append(torch.cat([cls, dfl], 1).permute(0, 2, 3, 1))
+        return tuple(results)
+
 
@@ -255,6 +283,16 @@ class Pose(Detect):
     def forward(self, x):
         """Perform forward pass through YOLO model and return predictions."""
         bs = x[0].shape[0]  # batch size
-        kpt = torch.cat([self.cv4[i](x[i]).view(bs, self.nk, -1) for i in range(self.nl)], -1)  # (bs, 17*3, h*w)
+        if torch.onnx.is_in_onnx_export():
+            kpt = [self.cv4[i](x[i]).permute(0, 2, 3, 1) for i in range(self.nl)]
+        else:
+            kpt = torch.cat([self.cv4[i](x[i]).view(bs, self.nk, -1) for i in range(self.nl)], -1)  # (bs, 17*3, h*w)
         x = self.detect(self, x)
+
+        if torch.onnx.is_in_onnx_export():
+            output = []
+            for i in range(self.nl):
+                output.append((torch.cat([x[i], kpt[i]], dim=-1)))
+            return output

If you installed the ultralytics package via pip, you need to make these modifications inside the installed package.
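If you are unsure where the pip-installed package lives, this small sketch prints the two candidate file locations:

import os
import ultralytics

# Locate the installed package; the Detect/Pose classes are in
# nn/modules/head.py on newer releases and nn/modules.py on 8.0.86.
pkg_dir = os.path.dirname(ultralytics.__file__)
print(os.path.join(pkg_dir, "nn", "modules", "head.py"))
print(os.path.join(pkg_dir, "nn", "modules.py"))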

Create a Python file as follows to export the ONNX model.

export.py
from ultralytics import YOLO
model = YOLO("./runs/pose/train/weights/best.pt")
results = model.export(format="onnx")
Run the script:

$ python export.py

Use Netron to check that your model's outputs look like this. If they do not, re-check your modifications to head.py.
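If you prefer a script over Netron, the sketch below (assuming the onnxruntime package is installed) prints the output names and shapes of the exported model; with the head.py changes above you should see one NHWC output per detection stride. Adjust the path to your exported file.

import onnxruntime as ort

# List the exported model's outputs (names and shapes).
sess = ort.InferenceSession("yolov8n_pose.onnx")
for out in sess.get_outputs():
    print(out.name, out.shape)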

Rename the exported ONNX model to yolov8n_pose.onnx, enter aml_npu_sdk/acuity-toolkit/demo, and put it into demo/model. Then modify 0_import_model.sh, 1_quantize_model.sh, and 2_export_case_code.sh as follows.

0_import_model.sh
#!/bin/bash
 
NAME=yolov8n_pose
ACUITY_PATH=../bin/
 
pegasus=${ACUITY_PATH}pegasus
if [ ! -e "$pegasus" ]; then
    pegasus=${ACUITY_PATH}pegasus.py
fi
 
#Onnx
$pegasus import onnx \
    --model  ./model/${NAME}.onnx \
    --output-model ${NAME}.json \
    --output-data ${NAME}.data 
 
#generate inputmeta  --source-file dataset.txt
$pegasus generate inputmeta \
	--model ${NAME}.json \
	--input-meta-output ${NAME}_inputmeta.yml \
	--channel-mean-value "0 0 0 0.0039215"  \
	--source-file dataset.txt
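The --channel-mean-value parameter is usually read as three per-channel means followed by a scale, i.e. the input is normalized as (pixel - mean) * scale; here the means are 0 and the scale 0.0039215 ≈ 1/255, mapping 8-bit pixels into [0, 1]. A rough Python equivalent, for reference only:

import numpy as np

# Rough equivalent of --channel-mean-value "0 0 0 0.0039215":
# subtract per-channel means of 0, then scale by ~1/255.
def normalize(img_u8: np.ndarray) -> np.ndarray:
    mean = np.array([0.0, 0.0, 0.0], dtype=np.float32)
    scale = 0.0039215  # approximately 1/255
    return (img_u8.astype(np.float32) - mean) * scale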
1_quantize_model.sh
#!/bin/bash
 
NAME=yolov8n_pose
ACUITY_PATH=../bin/
 
pegasus=${ACUITY_PATH}pegasus
if [ ! -e "$pegasus" ]; then
    pegasus=${ACUITY_PATH}pegasus.py
fi
 
#--quantizer asymmetric_affine --qtype  uint8
#--quantizer dynamic_fixed_point  --qtype int8(int16,note s905d3 not support int16 quantize) 
# --quantizer perchannel_symmetric_affine --qtype int8(int16, note only T3(0xBE) can support perchannel quantize)
$pegasus  quantize \
	--quantizer dynamic_fixed_point \
	--qtype int8 \
	--rebuild \
	--with-input-meta  ${NAME}_inputmeta.yml \
	--model  ${NAME}.json \
	--model-data  ${NAME}.data
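Quantization calibrates against the images listed in dataset.txt (picked up through the inputmeta file generated in the previous step). To calibrate with your own images, a hypothetical helper like the one below can regenerate the list; check the sample dataset.txt shipped in the demo directory for the exact line format it expects.

import glob

# Hypothetical helper: write one calibration image path per line to dataset.txt.
with open("dataset.txt", "w") as f:
    for path in sorted(glob.glob("./data/*.jpg")):
        f.write(path + "\n")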
2_export_case_code.sh
#!/bin/bash
 
NAME=yolov8n_pose
ACUITY_PATH=../bin/
 
pegasus=$ACUITY_PATH/pegasus
if [ ! -e "$pegasus" ]; then
    pegasus=$ACUITY_PATH/pegasus.py
fi
 
$pegasus export ovxlib \
    --model ${NAME}.json \
    --model-data ${NAME}.data \
    --model-quantize ${NAME}.quantize \
    --with-input-meta ${NAME}_inputmeta.yml \
    --dtype quantized \
    --optimize VIPNANOQI_PID0X88  \
    --viv-sdk ${ACUITY_PATH}vcmdtools \
    --pack-nbg-unify
 
rm -rf ${NAME}_nbg_unify
 
mv ../*_nbg_unify ${NAME}_nbg_unify
 
cd ${NAME}_nbg_unify
 
mv network_binary.nb ${NAME}.nb
 
cd ..
 
#save normal case demo export.data 
mkdir -p ${NAME}_normal_case_demo
mv  *.h *.c .project .cproject *.vcxproj BUILD *.linux *.export.data ${NAME}_normal_case_demo
 
# delete normal_case demo source
#rm  *.h *.c .project .cproject *.vcxproj  BUILD *.linux *.export.data
 
rm *.data *.quantize *.json *_inputmeta.yml

If you use VIM3L, change --optimize to VIPNANOQI_PID0X99.

After modifying the scripts, return to the aml_npu_sdk directory and run convert-in-docker.sh.

If the conversion succeeds, the converted model and case code are generated in demo/yolov8n_pose_nbg_unify.

$ cd ../../
$ bash convert-in-docker.sh
$ cd acuity-toolkit/demo/yolov8n_pose_nbg_unify
$ ls
BUILD  main.c  makefile.linux  nbg_meta.json  vnn_global.h  vnn_post_process.c  vnn_post_process.h  vnn_pre_process.c  vnn_pre_process.h  vnn_yolov8npose.c  vnn_yolov8npose.h  yolov8n_pose.nb  yolov8npose.vcxproj

Run inference on the NPU

Get source code

Get the source code: khadas/vim3_npu_applications_lite

$ git clone https://github.com/khadas/vim3_npu_applications_lite

Install dependencies

$ sudo apt update
$ sudo apt install libopencv-dev python3-opencv cmake

Compile and run

Put yolov8n_pose.nb into vim3_npu_applications_lite/yolov8n_pose_demo_x11_usb/nn_data. Replace yolov8n_pose_demo_x11_usb/vnn_yolov8npose.c and yolov8n_pose_demo_x11_usb/include/vnn_yolov8npose.h with the vnn_yolov8npose.c and vnn_yolov8npose.h generated during conversion. Adjust the -m argument in the commands below to point to the .nb file you installed.

# Compile
$ cd vim3_npu_applications_lite/yolov8n_pose_demo_x11_usb
$ bash build_vx.sh
$ cd bin_r_cv4
 
# usb
$ ./yolov8n_pose_demo_x11_usb -m ../nn_data/yolov8n_pose_88.nb -t usb -d /dev/video0
 
# mipi
$ ./yolov8n_pose_demo_x11_usb -m ../nn_data/yolov8n_pose_88.nb -t mipi -d /dev/video50