~~tag> NPU YOLO KSNN VIM3 ~~

====== YOLOv8n-Pose KSNN Demo - 8 ======

{{indexmenu_n>8}}

===== Introduction =====

YOLOv8n-Pose inherits the object detection backbone and neck of YOLOv8n and extends the standard YOLOv8n detection model with dedicated pose-estimation layers in its head. It therefore not only detects people (bounding boxes) but also simultaneously predicts the spatial positions (keypoints) of their anatomical joints (e.g., shoulders, elbows, knees, and ankles).

Inference results on VIM3:

{{:products:sbc:vim3:npu:ksnn:demos:yolov8n-pose-ksnn-result.jpg?800|}}

**Inference speed test**: about **182 ms** per frame with a USB camera and about **156 ms** per frame with a MIPI camera.

===== Train the model =====

Download the official YOLOv8 code. [[gh>ultralytics/ultralytics]]

```shell
$ git clone https://github.com/ultralytics/ultralytics
```

Refer to ''README.md'' to create and train a YOLOv8n-Pose model. This guide uses ''torch==1.10.1'' and ''ultralytics==8.0.86''.
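
For reference, a minimal training sketch with the ultralytics Python API might look like the following. The dataset YAML, epoch count, and image size here are placeholders rather than values from this guide; point ''data'' at your own pose dataset.

```python
from ultralytics import YOLO

# Start from the pretrained YOLOv8n-Pose weights and fine-tune on a pose dataset.
# "coco8-pose.yaml" is a small example dataset shipped with recent ultralytics
# releases; replace it with your own dataset definition.
model = YOLO("yolov8n-pose.pt")
model.train(data="coco8-pose.yaml", epochs=100, imgsz=640)

# The best checkpoint is saved as runs/pose/train/weights/best.pt,
# which is the file loaded by the export step below.
```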

===== Convert the model =====

==== Get the conversion tool ====

```shell
$ git lfs install
$ git lfs clone https://github.com/khadas/aml_npu_sdk
```

The KSNN conversion tool is under ''acuity-toolkit/python''.

```shell
$ cd aml_npu_sdk/acuity-toolkit/python && ls
convert  data  outputs
```

==== Convert ====

After training the model, modify **class Detect** and **class Pose** in ''ultralytics/ultralytics/nn/modules/head.py'' as follows. (If you use ''ultralytics==8.0.86'', these classes are in ''ultralytics/ultralytics/nn/modules.py''.)

```diff head.py
diff --git a/ultralytics/nn/modules/head.py b/ultralytics/nn/modules/head.py
index 0b02eb3..0a6e43a 100644
--- a/ultralytics/nn/modules/head.py
+++ b/ultralytics/nn/modules/head.py
@@ -42,6 +42,9 @@ class Detect(nn.Module):
 
     def forward(self, x):
         """Concatenates and returns predicted bounding boxes and class probabilities."""
+        if torch.onnx.is_in_onnx_export():
+            return self.forward_export(x)
+
         shape = x[0].shape  # BCHW
         for i in range(self.nl):
             x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
@@ -80,6 +83,15 @@ class Detect(nn.Module):
             a[-1].bias.data[:] = 1.0  # box
             b[-1].bias.data[:m.nc] = math.log(5 / m.nc / (640 / s) ** 2)  # cls (.01 objects, 80 classes, 640 img)
 
+    def forward_export(self, x):
+        results = []
+        for i in range(self.nl):
+            dfl = self.cv2[i](x[i]).contiguous()
+            cls = self.cv3[i](x[i]).contiguous()
+            results.append(torch.cat([cls, dfl], 1))
+        return tuple(results)
+
@@ -255,6 +283,16 @@ class Pose(Detect):
     def forward(self, x):
         """Perform forward pass through YOLO model and return predictions."""
         bs = x[0].shape[0]  # batch size
-        kpt = torch.cat([self.cv4[i](x[i]).view(bs, self.nk, -1) for i in range(self.nl)], -1)  # (bs, 17*3, h*w)
+        if torch.onnx.is_in_onnx_export():
+            kpt = [self.cv4[i](x[i]) for i in range(self.nl)]
+        else:
+            kpt = torch.cat([self.cv4[i](x[i]).view(bs, self.nk, -1) for i in range(self.nl)], -1)  # (bs, 17*3, h*w)
         x = self.detect(self, x)
+
+        if torch.onnx.is_in_onnx_export():
+            output = []
+            for i in range(self.nl):
+                output.append(torch.cat([x[i], kpt[i]], dim=1))
+            return output
```

<WRAP important>
If you installed the ultralytics package with pip, make these modifications inside the installed package.
</WRAP>
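
If you are not sure where pip installed the package, a small snippet like this prints the directory that contains the files to edit:

```python
import os
import ultralytics

# Directory of the installed ultralytics package. head.py lives under
# nn/modules/ inside this directory (for ultralytics==8.0.86, modules.py
# lives under nn/ instead).
print(os.path.dirname(ultralytics.__file__))
```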

Create a Python file as follows to export the ONNX model.

```python export.py
from ultralytics import YOLO
model = YOLO("./runs/pose/train/weights/best.pt")
results = model.export(format="onnx")
```

```shell
$ python export.py
```

<WRAP important>
Use [[https://netron.app/ | Netron]] to check that your model's outputs look like this. If they do not, re-check your ''head.py'' modifications.

{{:products:sbc:vim3:npu:ksnn:demos:yolov8n-pose-vim3-ksnn-output.png?600|}}
</WRAP>
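
As an alternative to Netron, a short script like the one below (assuming the ''onnx'' Python package is installed; adjust the filename to wherever your exported model was written) prints each graph output and its shape. With the ''head.py'' changes above, you should see three outputs, one per detection scale.

```python
import onnx

# Load the exported model and list every graph output with its shape.
model = onnx.load("./runs/pose/train/weights/best.onnx")
for out in model.graph.output:
    dims = [d.dim_value for d in out.type.tensor_type.shape.dim]
    print(out.name, dims)
```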

Enter ''aml_npu_sdk/acuity-toolkit/python'' and run the following command.

```shell
# uint8
$ ./convert --model-name yolov8n_pose \
            --platform onnx \
            --model yolov8n_pose.onnx \
            --mean-values '0 0 0 0.00392156' \
            --quantized-dtype asymmetric_affine \
            --source-files ./data/dataset/dataset0.txt \
            --batch-size 1 \
            --iterations 1 \
            --kboard VIM3 --print-level 0
```

<WRAP important>
Currently, KSNN only supports ''batch-size'' = 1.
</WRAP>

If you want to use more images for quantization, modify ''batch-size'' and ''iterations''; ''batch-size'' × ''iterations'' = the number of quantization images. For example, ''batch-size'' 1 with ''iterations'' 300 quantizes over 300 images. Between 200 and 500 quantization images is recommended.

If you use a ''VIM3L'', replace ''VIM3'' with ''VIM3L''.

If the conversion succeeds, the converted model and library are generated in ''outputs/yolov8n_pose''.

<WRAP important>
If your YOLOv8n-Pose model performs poorly on the board, try quantizing the model to int8 or int16.
```shell
# int8
$ ./convert --model-name yolov8n_pose \
            --platform onnx \
            --model yolov8n_pose.onnx \
            --mean-values '0 0 0 0.00392156' \
            --quantized-dtype dynamic_fixed_point \
            --qtype int8 \
            --source-files ./data/dataset/dataset0.txt \
            --batch-size 1 \
            --iterations 1 \
            --kboard VIM3 --print-level 0

# int16
$ ./convert --model-name yolov8n_pose \
            --platform onnx \
            --model yolov8n_pose.onnx \
            --mean-values '0 0 0 0.00392156' \
            --quantized-dtype dynamic_fixed_point \
            --qtype int16 \
            --source-files ./data/dataset/dataset0.txt \
            --batch-size 1 \
            --iterations 1 \
            --kboard VIM3 --print-level 0
```
</WRAP>

===== Run inference on the NPU with KSNN =====

==== Install KSNN ====

Download the KSNN library and demo code. [[gh>khadas/ksnn]]

```shell
$ git clone --recursive https://github.com/khadas/ksnn.git
$ cd ksnn/ksnn
$ pip3 install ksnn-1.3-py3-none-any.whl
```

If your kernel version is 5.15, use ''ksnn-1.4-py3-none-any.whl'' instead of ''ksnn-1.3-py3-none-any.whl''.
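
If you are not sure which kernel your image uses, you can print it from Python (the same information is shown by ''uname -r''):

```python
import platform

# Prints the running kernel release; on 5.15 kernels install the
# ksnn-1.4 wheel as noted above.
print(platform.release())
```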

==== Install dependencies ====

```shell
$ pip3 install matplotlib
```

Put ''yolov8n_pose.nb'' and ''libnn_yolov8n_pose.so'' into ''ksnn/examples/yolov8n_pose/models/VIM3'' and ''ksnn/examples/yolov8n_pose/libs'' respectively.

==== Picture input demo ====

```shell
$ cd ksnn/examples/yolov8n_pose
$ python3 yolov8n-pose-picture.py --model ./models/VIM3/yolov8n_pose_uint8.nb --library ./libs/libnn_yolov8n_pose.so --picture ./data/bus.jpg --level 0
```
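
For reference, the demo script drives KSNN roughly as sketched below. This is only a simplified outline, not the actual demo: the exact argument values and the post-processing that decodes boxes and the 17 keypoints live in ''yolov8n-pose-picture.py'', and parameter names can differ between KSNN versions.

```python
import cv2
from ksnn.api import KSNN
from ksnn.types import output_format

# Load the converted model (.nb) together with the generated library (.so).
pose = KSNN('VIM3')
pose.nn_init(library='./libs/libnn_yolov8n_pose.so',
             model='./models/VIM3/yolov8n_pose_uint8.nb',
             level=0)

# Run one inference; the three raw output tensors (one per detection scale)
# are then decoded into bounding boxes and keypoints by the demo code.
img = cv2.imread('./data/bus.jpg')
outputs = pose.nn_inference([img], platform='ONNX', reorder='2 1 0',
                            output_tensor=3,
                            output_format=output_format.OUT_FORMAT_FLOAT32)
```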

==== Camera input demo ====

For a USB camera:

```shell
# usb
$ cd ksnn/examples/yolov8n_pose
$ python3 yolov8n-pose-cap.py --model ./models/VIM3/yolov8n_pose_uint8.nb --library ./libs/libnn_yolov8n_pose.so --type usb --device 0
```

For a MIPI camera, the OpenCV build installed by **pip install** does not support GStreamer, so you need to install OpenCV with **sudo apt install** (see the check at the end of this page).

```shell
# mipi
$ pip3 uninstall opencv-python numpy
$ sudo apt install python3-opencv
$ pip3 install numpy==1.23
$ cd ksnn/examples/yolov8n_pose
$ python3 yolov8n-pose-cap.py --model ./models/VIM3/yolov8n_pose_uint8.nb --library ./libs/libnn_yolov8n_pose.so --type mipi --device 50
```

''0'' and ''50'' are the camera device indexes.
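
To confirm that the OpenCV build you end up with actually has GStreamer support, a quick check like this can be used:

```python
import cv2

# Print the GStreamer entry of the OpenCV build information. The pip wheel
# is usually built without GStreamer; the apt package (python3-opencv) has it.
info = cv2.getBuildInformation()
print([line.strip() for line in info.splitlines() if 'GStreamer' in line])
```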