~~tag> NPU YOLO KSNN VIM3 ~~

====== YOLOv8n KSNN Demo - 2 ======

{{indexmenu_n>2}}

===== Introduction =====

YOLOv8n is an object detection model. It uses bounding boxes to precisely draw each object in image.

Inference results on VIM3.

{{:products:sbc:vim3:npu:ksnn:demos:yolov8n-ksnn-result.jpg?800|}}

**Inference speed test**: USB camera about **182ms** per frame. MIPI camera about **156ms** per frame.

===== Train the model =====

Download the YOLOv8 official code. [[gh>ultralytics/ultralytics]]

```shell
$ git clone https://github.com/ultralytics/ultralytics
```

Refer ''README.md'' to create and train a YOLOv8n model. My version ''torch==1.10.1'' and ''ultralytics==8.0.86''.

===== Convert the model =====

==== Get the conversion tool ====

```shell
$ git lfs install
$ git lfs clone https://github.com/khadas/aml_npu_sdk
```

The KSNN conversion tool is under ''acuity-toolkit/python''.

```shell
$ cd aml_npu_sdk/acuity-toolkit/python && ls
$ convert  data  outputs
```

==== Convert ====

After training the model, modify ''ultralytics/ultralytics/nn/modules/head.py'' as follows.

```diff head.py
diff --git a/ultralytics/nn/modules/head.py b/ultralytics/nn/modules/head.py
index 0b02eb3..0a6e43a 100644
--- a/ultralytics/nn/modules/head.py
+++ b/ultralytics/nn/modules/head.py
@@ -42,6 +42,9 @@ class Detect(nn.Module):
 
     def forward(self, x):
         """Concatenates and returns predicted bounding boxes and class probabilities."""
+        if torch.onnx.is_in_onnx_export():
+            return self.forward_export(x)
+
         shape = x[0].shape  # BCHW
         for i in range(self.nl):
             x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
@@ -80,6 +83,15 @@ class Detect(nn.Module):
             a[-1].bias.data[:] = 1.0  # box
             b[-1].bias.data[:m.nc] = math.log(5 / m.nc / (640 / s) ** 2)  # cls (.01 objects, 80 classes, 640 img)
 
+    def forward_export(self, x):
+        results = []
+        for i in range(self.nl):
+            dfl = self.cv2[i](x[i]).contiguous()
+            cls = self.cv3[i](x[i]).contiguous()
+            results.append(torch.cat([cls, dfl], 1))
+        return tuple(results)
+
```

<WRAP important>
If you pip-installed ultralytics package, you should modify in package.
</WRAP>

Create a python file written as follows to export ONNX model.

```python export.py
from ultralytics import YOLO
model = YOLO("./runs/detect/train/weights/best.pt")
results = model.export(format="onnx")
```

```shell
$ python export.py
```

<WRAP important>
Use [[https://netron.app/ | Netron]] to check your model output like this. If not, please check your ''head.py''.

{{:products:sbc:vim3:npu:ksnn:yolov8n-vim3-ksnn-output.png?600|}}
</WRAP>

Enter ''aml_npu_sdk/acuity-toolkit/python'' and run command as follows.

```shell
# uint8
$ ./convert --model-name yolov8n \
            --platform onnx \
            --model yolov8n.onnx \
            --mean-values '0 0 0 0.00392156' \
            --quantized-dtype asymmetric_affine \
            --source-files ./data/dataset/dataset0.txt \
            --batch-size 1 \
            --iterations 1 \
            --kboard VIM3 --print-level 0 
```

<WRAP important>
Now KSNN only supports ''batch-size'' = 1.
</WRAP>

If you want to use more quantified images, please modify ''batch-size'' and ''iterations''. ''batch-size''×''iterations''=number of quantified images. The number of quantified images has better between 200 and 500.

If you use ''VIM3L'' , please use ''VIM3L'' to replace ''VIM3''.

If run succeed, converted model and library will generate in ''outputs/yolov8n''.

<WRAP important>
If your YOLOv8 model perform bad on board, please try quanfity model in int8 or int16.
```shell
# int8
$ ./convert --model-name yolov8n \
            --platform onnx \
            --model yolov8n.onnx \
            --mean-values '0 0 0 0.00392156' \
            --quantized-dtype dynamic_fixed_point \
            --qtype int8 \
            --source-files ./data/dataset/dataset0.txt \
            --batch-size 1 \
            --iterations 1 \
            --kboard VIM3 --print-level 0 

# int16
$ ./convert --model-name yolov8n \
            --platform onnx \
            --model yolov8n.onnx \
            --mean-values '0 0 0 0.00392156' \
            --quantized-dtype dynamic_fixed_point \
            --qtype int16 \
            --source-files ./data/dataset/dataset0.txt \
            --batch-size 1 \
            --iterations 1 \
            --kboard VIM3 --print-level 0 
```
</WRAP>

===== Run inference on the NPU by KSNN =====

==== Install KSNN ====

Download KSNN library and demo code. [[gh>khadas/ksnn]]

```shell
$ git clone --recursive https://github.com/khadas/ksnn.git
$ cd ksnn/ksnn
$ pip3 install ksnn-1.3-py3-none-any.whl
```

If your kernel version is 5.15, use ''ksnn-1.4-py3-none-any.whl'' instead of ''ksnn-1.3-py3-none-any.whl''.

==== Install dependencies ====

```shell
$ pip3 install matplotlib
```

Put ''yolov8n.nb'' and ''libnn_yolov8n.so'' into ''ksnn/examples/yolov8n/models/VIM3'' and ''ksnn/examples/yolov8n/libs''

If your model's classes is not 80, please remember to modify the parameter, ''LISTSIZE''. 

```shell
LISTSIZE = classes number + 64
```

==== Picture input demo ====

```shell
$ cd ksnn/examples/yolov8n
$ python3 yolov8n-picture.py --model ./models/VIM3/yolov8n.nb --library ./libs/libnn_yolov8n.so --picture ./data/horses.jpg --level 0
```

=== Camera input demo ===

For USB camera.

```shell
# usb
$ cd ksnn/examples/yolov8n
$ python3 yolov8n-cap.py --model ./models/VIM3/yolov8n_uint8.nb --library ./libs/libnn_yolov8n_uint8.so --type usb --device 0
```

For MIPI camera, OpenCV do not support GSTREAMER by **pip install**. So you need to install OpenCV by **sudo apt install**.

```shell
# mipi
$ pip3 uninstall opencv-python numpy
$ sudo apt install python3-opencv
$ pip3 install numpy==1.23
$ cd ksnn/examples/yolov8n
$ python3 yolov8n-cap.py --model ./models/VIM3/yolov8n_uint8.nb --library ./libs/libnn_yolov8n_uint8.so --type mipi --device 50
```

''0'' and ''50'' are the camera device index.