~~tag> NPU RetinaFace VIM3 PyTorch~~

====== RetinaFace PyTorch VIM3 Demo Lite - 5 ======

This demo requires kernel version >= 5.15.

===== Introduction =====

The full VIM3 C++ demo is complex and not very friendly for new users, so we provide this lite version. This document describes how to use it.

{{indexmenu_n>4}}

===== Get source code =====

We will use a RetinaFace model based on [[gh>bubbliiiing/retinaface-pytorch]].

```shell
$ git clone https://github.com/bubbliiiing/retinaface-pytorch
```

Before training, modify ''retinaface-pytorch/utils/utils.py'' as follows, so that inputs are scaled to ''[0, 1]'' instead of mean-subtracted.

```diff
diff --git a/utils/utils.py b/utils/utils.py
index 87bb528..4a22f2a 100644
--- a/utils/utils.py
+++ b/utils/utils.py
@@ -25,5 +25,6 @@ def get_lr(optimizer):
         return param_group['lr']
 
 def preprocess_input(image):
-    image -= np.array((104, 117, 123),np.float32)
+    image = image / 255.0
     return image
```

===== Convert the model =====

==== Build Docker Environment ====

We provide a Docker image which contains the environment required to convert the model.

Follow the official Docker documentation to install Docker: [[https://docs.docker.com/engine/install/ubuntu/|Install Docker Engine on Ubuntu]].

Then pull the Docker image:

```shell
docker pull numbqq/npu-vim3
```

==== Get the conversion tool ====

```shell
$ git clone --recursive https://github.com/khadas/aml_npu_sdk.git
```

```shell
$ cd aml_npu_sdk/acuity-toolkit/demo && ls
0_import_model.sh  1_quantize_model.sh  2_export_case_code.sh  data  dataset_npy.txt  dataset.txt  extractoutput.py  inference.sh  input.npy  model
```

==== Convert ====

After training, convert the PyTorch model into an ONNX model. Create the following conversion script in the root of ''retinaface-pytorch'' and run it.

```python export.py
import torch
import numpy as np

from nets.retinaface import RetinaFace
from utils.config import cfg_mnet, cfg_re50

model_path = "logs/Epoch150-Total_Loss6.2802.pth"

net = RetinaFace(cfg=cfg_mnet, mode='eval').eval()
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
net.load_state_dict(torch.load(model_path, map_location=device))

img = torch.zeros(1, 3, 640, 640)
torch.onnx.export(net, img, "./retinaface.onnx", verbose=False,
                  opset_version=12, input_names=['images'])
```

Enter ''aml_npu_sdk/acuity-toolkit/demo'' and put ''retinaface.onnx'' into ''demo/model''. Modify ''0_import_model.sh'', ''1_quantize_model.sh'' and ''2_export_case_code.sh'' as follows.

```shell 0_import_model.sh
#!/bin/bash

NAME=retinaface
ACUITY_PATH=../bin/

pegasus=${ACUITY_PATH}pegasus
if [ ! -e "$pegasus" ]; then
    pegasus=${ACUITY_PATH}pegasus.py
fi

# Onnx
$pegasus import onnx \
    --model ./model/${NAME}.onnx \
    --output-model ${NAME}.json \
    --output-data ${NAME}.data

# generate inputmeta --source-file dataset.txt
$pegasus generate inputmeta \
    --model ${NAME}.json \
    --input-meta-output ${NAME}_inputmeta.yml \
    --channel-mean-value "0 0 0 0.0039215" \
    --source-file dataset.txt
```

In ''--channel-mean-value'' the first three values are the per-channel means and the last is the scale factor, so ''0 0 0 0.0039215'' (1/255) matches the modified ''preprocess_input'' above.

```shell 1_quantize_model.sh
#!/bin/bash

NAME=retinaface
ACUITY_PATH=../bin/

pegasus=${ACUITY_PATH}pegasus
if [ ! -e "$pegasus" ]; then
    pegasus=${ACUITY_PATH}pegasus.py
fi

#--quantizer asymmetric_affine --qtype uint8
#--quantizer dynamic_fixed_point --qtype int8 (int16, note the S905D3 does not support int16 quantization)
#--quantizer perchannel_symmetric_affine --qtype int8 (int16, note only T3 (0xBE) supports perchannel quantization)
$pegasus quantize \
    --quantizer dynamic_fixed_point \
    --qtype int8 \
    --rebuild \
    --with-input-meta ${NAME}_inputmeta.yml \
    --model ${NAME}.json \
    --model-data ${NAME}.data
```
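Quantization calibrates on the images listed in ''dataset.txt'' (referenced via ''--source-file'' above), one image path per line. Below is a minimal sketch for generating that file; the ''data/*.jpg'' pattern and the cap of 100 images are assumptions, so point it at your own face images.

```python make_dataset_txt.py
# Hypothetical helper, not part of the SDK: write one image path
# per line into dataset.txt for quantization calibration.
import glob

paths = sorted(glob.glob("data/*.jpg"))[:100]

with open("dataset.txt", "w") as f:
    f.write("\n".join(paths) + "\n")
```

A few dozen images that resemble your deployment scenes are generally enough for calibration.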
-e "$pegasus" ]; then pegasus=${ACUITY_PATH}pegasus.py fi #--quantizer asymmetric_affine --qtype uint8 #--quantizer dynamic_fixed_point --qtype int8(int16,note s905d3 not support int16 quantize) # --quantizer perchannel_symmetric_affine --qtype int8(int16, note only T3(0xBE) can support perchannel quantize) $pegasus quantize \ --quantizer dynamic_fixed_point \ --qtype int8 \ --rebuild \ --with-input-meta ${NAME}_inputmeta.yml \ --model ${NAME}.json \ --model-data ${NAME}.data ``` ``` shell 2_export_case_code.sh #!/bin/bash NAME=retinaface ACUITY_PATH=../bin/ pegasus=$ACUITY_PATH/pegasus if [ ! -e "$pegasus" ]; then pegasus=$ACUITY_PATH/pegasus.py fi $pegasus export ovxlib\ --model ${NAME}.json \ --model-data ${NAME}.data \ --model-quantize ${NAME}.quantize \ --with-input-meta ${NAME}_inputmeta.yml \ --dtype quantized \ --optimize VIPNANOQI_PID0X88 \ --viv-sdk ${ACUITY_PATH}vcmdtools \ --pack-nbg-unify rm -rf ${NAME}_nbg_unify mv ../*_nbg_unify ${NAME}_nbg_unify cd ${NAME}_nbg_unify mv network_binary.nb ${NAME}.nb cd .. #save normal case demo export.data mkdir -p ${NAME}_normal_case_demo mv *.h *.c .project .cproject *.vcxproj BUILD *.linux *.export.data ${NAME}_normal_case_demo # delete normal_case demo source #rm *.h *.c .project .cproject *.vcxproj BUILD *.linux *.export.data rm *.data *.quantize *.json *_inputmeta.yml ``` If you use VIM3L, ''optimize'' use ''VIPNANOQI_PID0X99''. After modifying, return to ''aml_npu_sdk'' and run ''convert-in-docker.sh''. If run succeed, converted model and library will generate in ''demo/retinaface_nbg_unify''. ```shell $ cd ../../ $ bash convert-in-docker.sh $ cd acuity-toolkit/demo/retinaface_nbg_unify $ ls BUILD main.c makefile.linux nbg_meta.json retinaface_99.nb retinaface.vcxproj vnn_global.h vnn_post_process.c vnn_post_process.h vnn_pre_process.c vnn_pre_process.h vnn_retinaface.c vnn_retinaface.h ``` ===== Run inference on the NPU ===== ==== Get source code ==== Get the source code: [[gh>khadas/vim3_npu_applications_lite]] ```shell $ git clone https://github.com/khadas/vim3_npu_applications_lite ``` ==== Install dependencies ==== ```shell $ sudo apt update $ sudo apt install libopencv-dev python3-opencv cmake ``` ==== Compile and run ==== Put ''retinaface.nb'' into ''vim3_npu_applications_lite/retinaface_demo_x11_usb/nn_data''. Replace ''retinaface_demo_x11_usb/vnn_retinaface.c'' and ''retinaface_demo_x11_usb/include/vnn_retinaface.h'' with your generating ''vnn_retinaface.c'' and ''vnn_retinaface.h''. ```shell # Compile $ cd vim3_npu_applications_lite/retinaface_demo_x11_usb $ bash build_vx.sh $ cd bin_r_cv4 $ ./retinaface_demo_x11_usb -m ../nn_data/retinaface_88.nb -d /dev/video0 ```