This demo requires kernel version >= 5.15.
The full VIM3 C++ demo is complex and not user-friendly, so we provide a lite version. This document will help you use it.
Download the official YOLOv8 code from ultralytics/ultralytics:
$ git clone https://github.com/ultralytics/ultralytics
Refer to the README.md to create and train a YOLOv8n model. The versions used here are ultralytics==8.0.86 and PyTorch==1.10.1.
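For reference, a minimal training sketch using the ultralytics Python API (the dataset, epoch count, and image size below are placeholders; substitute your own):

from ultralytics import YOLO

# Start from the pretrained YOLOv8n weights and fine-tune them.
model = YOLO("yolov8n.pt")

# Placeholder dataset and hyperparameters -- adjust for your training setup.
model.train(data="coco128.yaml", epochs=100, imgsz=640)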
We provide a Docker image that contains the environment required to convert the model.
Follow the official Docker docs to install Docker: Install Docker Engine on Ubuntu.
Run the command below to pull the Docker image:
$ docker pull numbqq/npu-vim3
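You can confirm the image was pulled successfully:

$ docker images numbqq/npu-vim3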
$ git lfs install
$ git lfs clone https://github.com/khadas/aml_npu_sdk.git
$ cd aml_npu_sdk/acuity-toolkit/demo
$ ls
0_import_model.sh  1_quantize_model.sh  2_export_case_code.sh  data  dataset_npy.txt  dataset.txt  extractoutput.py  inference.sh  input.npy  model
After training the model, modify ultralytics/ultralytics/nn/modules/head.py as follows.
diff --git a/ultralytics/nn/modules/head.py b/ultralytics/nn/modules/head.py
index 0b02eb3..0a6e43a 100644
--- a/ultralytics/nn/modules/head.py
+++ b/ultralytics/nn/modules/head.py
@@ -42,6 +42,9 @@ class Detect(nn.Module):
 
     def forward(self, x):
         """Concatenates and returns predicted bounding boxes and class probabilities."""
+        if torch.onnx.is_in_onnx_export():
+            return self.forward_export(x)
+
         shape = x[0].shape  # BCHW
         for i in range(self.nl):
             x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
@@ -80,6 +83,15 @@ class Detect(nn.Module):
             a[-1].bias.data[:] = 1.0  # box
             b[-1].bias.data[:m.nc] = math.log(5 / m.nc / (640 / s) ** 2)  # cls (.01 objects, 80 classes, 640 img)
 
+    def forward_export(self, x):
+        results = []
+        for i in range(self.nl):
+            dfl = self.cv2[i](x[i]).contiguous()
+            cls = self.cv3[i](x[i]).contiguous()
+            results.append(torch.cat([cls, dfl], 1).permute(0, 2, 3, 1))
+        return tuple(results)
+
If you installed the ultralytics package with pip, make the modification inside the installed package instead.
Create a Python file, export.py, with the following content to export the ONNX model.
from ultralytics import YOLO

model = YOLO("./runs/detect/train/weights/best.pt")
results = model.export(format="onnx")
$ python export.py
Use Netron to check your model's outputs: after the head.py change, the exported model should end in three permuted (NHWC) outputs, one per detection head. If it does not, recheck your head.py modifications.
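As a scripted alternative to Netron, the sketch below (assuming onnxruntime is installed, a 640x640 input, and the default 80 classes) prints the output shapes; you should see three NHWC outputs with 144 channels each (80 class scores + 64 DFL box channels):

import onnxruntime as ort

session = ort.InferenceSession("yolov8n.onnx")
for out in session.get_outputs():
    # Expected shapes: (1, 80, 80, 144), (1, 40, 40, 144), (1, 20, 20, 144)
    print(out.name, out.shape)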
Enter aml_npu_sdk/acuity-toolkit/demo and put yolov8n.onnx into demo/model. Then modify 0_import_model.sh, 1_quantize_model.sh, and 2_export_case_code.sh as follows.
#!/bin/bash
NAME=yolov8n
ACUITY_PATH=../bin/
pegasus=${ACUITY_PATH}pegasus
if [ ! -e "$pegasus" ]; then
pegasus=${ACUITY_PATH}pegasus.py
fi
#Onnx
$pegasus import onnx \
--model ./model/${NAME}.onnx \
--output-model ${NAME}.json \
--output-data ${NAME}.data
#generate inputmeta --source-file dataset.txt
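# channel-mean-value is "mean_R mean_G mean_B scale": here no mean is subtracted
# and pixels are scaled by 0.0039215 (~= 1/255), i.e. normalized to [0, 1]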
$pegasus generate inputmeta \
--model ${NAME}.json \
--input-meta-output ${NAME}_inputmeta.yml \
--channel-mean-value "0 0 0 0.0039215" \
--source-file dataset.txt
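The --source-file option points at dataset.txt, which lists the images used for quantization calibration, one path per line. A minimal example (the file name is illustrative; use images representative of your use case):

./data/your_calibration_image.jpg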
#!/bin/bash
NAME=yolov8n
ACUITY_PATH=../bin/
pegasus=${ACUITY_PATH}pegasus
if [ ! -e "$pegasus" ]; then
pegasus=${ACUITY_PATH}pegasus.py
fi
#--quantizer asymmetric_affine --qtype uint8
#--quantizer dynamic_fixed_point --qtype int8 (or int16; note the S905D3 does not support int16 quantization)
#--quantizer perchannel_symmetric_affine --qtype int8 (or int16; note only the T3 (0xBE) supports per-channel quantization)
$pegasus quantize \
--quantizer dynamic_fixed_point \
--qtype int8 \
--rebuild \
--with-input-meta ${NAME}_inputmeta.yml \
--model ${NAME}.json \
--model-data ${NAME}.data
#!/bin/bash
NAME=yolov8n
ACUITY_PATH=../bin/
pegasus=$ACUITY_PATH/pegasus
if [ ! -e "$pegasus" ]; then
pegasus=$ACUITY_PATH/pegasus.py
fi
$pegasus export ovxlib \
--model ${NAME}.json \
--model-data ${NAME}.data \
--model-quantize ${NAME}.quantize \
--with-input-meta ${NAME}_inputmeta.yml \
--dtype quantized \
--optimize VIPNANOQI_PID0X88 \
--viv-sdk ${ACUITY_PATH}vcmdtools \
--pack-nbg-unify
rm -rf ${NAME}_nbg_unify
mv ../*_nbg_unify ${NAME}_nbg_unify
cd ${NAME}_nbg_unify
mv network_binary.nb ${NAME}.nb
cd ..
#save normal case demo export.data
mkdir -p ${NAME}_normal_case_demo
mv *.h *.c .project .cproject *.vcxproj BUILD *.linux *.export.data ${NAME}_normal_case_demo
# delete normal_case demo source
#rm *.h *.c .project .cproject *.vcxproj BUILD *.linux *.export.data
rm *.data *.quantize *.json *_inputmeta.yml
If you use a VIM3L, change --optimize to VIPNANOQI_PID0X99, as shown below.
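That is, in 2_export_case_code.sh replace the --optimize line with:

--optimize VIPNANOQI_PID0X99 \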
After modifying the scripts, return to the aml_npu_sdk directory and run convert-in-docker.sh.
If the conversion succeeds, the converted model and generated library sources will appear in demo/yolov8n_nbg_unify.
$ cd ../../
$ bash convert-in-docker.sh
$ cd acuity-toolkit/demo/yolov8n_nbg_unify
$ ls
BUILD  main.c  makefile.linux  nbg_meta.json  vnn_global.h  vnn_post_process.c  vnn_post_process.h  vnn_pre_process.c  vnn_pre_process.h  vnn_yolov8n.c  vnn_yolov8n.h  yolov8n.nb  yolov8n.vcxproj
Get the source code: khadas/vim3_npu_applications_lite
$ git clone https://github.com/khadas/vim3_npu_applications_lite
$ sudo apt update
$ sudo apt install libopencv-dev python3-opencv cmake
Put yolov8n.nb into vim3_npu_applications_lite/yolov8n_demo_x11_usb/nn_data.
Replace yolov8n_demo_x11_usb/vnn_yolov8n.c and yolov8n_demo_x11_usb/include/vnn_yolov8n.h with the vnn_yolov8n.c and vnn_yolov8n.h you generated, as in the example below.
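For example, assuming both repositories were cloned side by side in the same directory (adjust the paths to your layout):

$ cp aml_npu_sdk/acuity-toolkit/demo/yolov8n_nbg_unify/yolov8n.nb vim3_npu_applications_lite/yolov8n_demo_x11_usb/nn_data/
$ cp aml_npu_sdk/acuity-toolkit/demo/yolov8n_nbg_unify/vnn_yolov8n.c vim3_npu_applications_lite/yolov8n_demo_x11_usb/
$ cp aml_npu_sdk/acuity-toolkit/demo/yolov8n_nbg_unify/vnn_yolov8n.h vim3_npu_applications_lite/yolov8n_demo_x11_usb/include/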
# Compile
$ cd vim3_npu_applications_lite/yolov8n_demo_x11_usb
$ bash build_vx.sh

# Run
$ cd bin_r_cv4
$ ./yolov8n_demo_x11_usb -m ../nn_data/yolov8n.nb -d /dev/video0
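If your USB camera is not /dev/video0, list the available video devices and pass the correct node to -d:

$ ls /dev/video*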