Clone the training code from bubbliiiing/retinaface-pytorch.
$ git clone https://github.com/bubbliiiing/retinaface-pytorch
Before training, modify retinaface-pytorch/utils/utils.py as follows.
diff --git a/utils/utils.py b/utils/utils.py
index 87bb528..4a22f2a 100644
--- a/utils/utils.py
+++ b/utils/utils.py
@@ -25,5 +25,6 @@ def get_lr(optimizer):
         return param_group['lr']
 
 def preprocess_input(image):
-    image -= np.array((104, 117, 123),np.float32)
+    image = image / 255.0
     return image
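The original code subtracts the per-channel mean, while the modified version scales pixels to [0, 1]; this appears to correspond to the --channel-mean-value "0,0,0,255" setting used later in convert_adla.sh. A minimal sketch of the two behaviours (the input values are made up for illustration):

import numpy as np

# Hypothetical float32 BGR pixel block, only to illustrate the two preprocessing variants.
image = np.array([[[104, 117, 123], [0, 255, 128]]], dtype=np.float32)

# Original preprocess_input: per-channel mean subtraction.
old = image - np.array((104, 117, 123), np.float32)

# Modified preprocess_input: scale to [0, 1].
new = image / 255.0

print(old)
print(new)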
Follow the official Docker documentation to install Docker: Install Docker Engine on Ubuntu.
Then fetch the prebuilt NPU Docker container and run it.
$ docker pull yanwyb/npu:v1
$ docker run -it --name vim4-npu1 -v $(pwd):/home/khadas/npu \
        -v /etc/localtime:/etc/localtime:ro \
        -v /etc/timezone:/etc/timezone:ro \
        yanwyb/npu:v1
Download the conversion tool from khadas/vim4_npu_sdk.
$ git clone https://gitlab.com/khadas/vim4_npu_sdk
After training, convert the PyTorch model into an ONNX model.
Copy nets/retinaface.py and rename it retinaface_export.py. Then modify retinaface_export.py as follows.
class ClassHead(nn.Module):
    def __init__(self,inchannels=512,num_anchors=2):
        super(ClassHead,self).__init__()
        self.num_anchors = num_anchors
        self.conv1x1 = nn.Conv2d(inchannels,self.num_anchors*2,kernel_size=(1,1),stride=1,padding=0)

    def forward(self,x):
        out = self.conv1x1(x)
-       out = out.permute(0,2,3,1).contiguous()
+       out = out.contiguous()
-       return out.view(out.shape[0], -1, 2)
+       return out.view(out.shape[0], 4, -1)

class BboxHead(nn.Module):
    def __init__(self,inchannels=512,num_anchors=2):
        super(BboxHead,self).__init__()
        self.conv1x1 = nn.Conv2d(inchannels,num_anchors*4,kernel_size=(1,1),stride=1,padding=0)

    def forward(self,x):
        out = self.conv1x1(x)
-       out = out.permute(0,2,3,1).contiguous()
+       out = out.contiguous()
-       return out.view(out.shape[0], -1, 4)
+       return out.view(out.shape[0], 8, -1)

class LandmarkHead(nn.Module):
    def __init__(self,inchannels=512,num_anchors=2):
        super(LandmarkHead,self).__init__()
        self.conv1x1 = nn.Conv2d(inchannels,num_anchors*10,kernel_size=(1,1),stride=1,padding=0)

    def forward(self,x):
        out = self.conv1x1(x)
-       out = out.permute(0,2,3,1).contiguous()
+       out = out.contiguous()
-       return out.view(out.shape[0], -1, 10)
+       return out.view(out.shape[0], 20, -1)
-       bbox_regressions = torch.cat([self.BboxHead[i](feature) for i, feature in enumerate(features)], dim=1)
-       classifications = torch.cat([self.ClassHead[i](feature) for i, feature in enumerate(features)], dim=1)
-       ldm_regressions = torch.cat([self.LandmarkHead[i](feature) for i, feature in enumerate(features)], dim=1)
+       bbox_regressions = torch.cat([self.BboxHead[i](feature) for i, feature in enumerate(features)], dim=2)
+       classifications = torch.cat([self.ClassHead[i](feature) for i, feature in enumerate(features)], dim=2)
+       ldm_regressions = torch.cat([self.LandmarkHead[i](feature) for i, feature in enumerate(features)], dim=2)

        if self.mode == 'train':
            output = (bbox_regressions, classifications, ldm_regressions)
        else:
-           output = (bbox_regressions, F.softmax(classifications, dim=-1), ldm_regressions)
+           output = (bbox_regressions, classifications, ldm_regressions)
        return output
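With these changes each head keeps the (batch, channels, locations) layout, so the feature levels are concatenated along dim=2 instead of dim=1. A minimal standalone sketch of the resulting shapes (the 64 input channels and feature map sizes below are made up for illustration):

import torch
import torch.nn as nn

# Simplified stand-in for the modified ClassHead: 2 anchors * 2 classes = 4 output channels.
conv = nn.Conv2d(64, 4, kernel_size=1)

# Two hypothetical feature levels from a 640x640 input (e.g. strides 8 and 16).
feats = [torch.zeros(1, 64, 80, 80), torch.zeros(1, 64, 40, 40)]

outs = []
for f in feats:
    out = conv(f).contiguous()
    outs.append(out.view(out.shape[0], 4, -1))   # (batch, 4, H*W)

merged = torch.cat(outs, dim=2)                  # concatenate along the location axis
print([o.shape for o in outs])                   # [1, 4, 6400] and [1, 4, 1600]
print(merged.shape)                              # [1, 4, 8000]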
Create the Python conversion script as follows and run it.
import torch
import numpy as np
from nets.retinaface_export import RetinaFace
from utils.config import cfg_mnet, cfg_re50

model_path = "logs/Epoch150-Total_Loss6.2802.pth"

net = RetinaFace(cfg=cfg_mnet, mode='eval').eval()
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
net.load_state_dict(torch.load(model_path, map_location=device))

img = torch.zeros(1, 3, 640, 640)
torch.onnx.export(net, img, "./retinaface.onnx", verbose=False, opset_version=12, input_names=['images'])
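Before converting, it can be worth sanity-checking the exported model with onnxruntime. A minimal sketch, assuming onnxruntime is installed and retinaface.onnx is in the current directory:

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("./retinaface.onnx")

# Inspect the input declared during export.
inp = sess.get_inputs()[0]
print(inp.name, inp.shape)          # expected: images [1, 3, 640, 640]

# Run a dummy forward pass and print the output shapes.
dummy = np.zeros((1, 3, 640, 640), dtype=np.float32)
outputs = sess.run(None, {inp.name: dummy})
for o in outputs:
    print(o.shape)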
Enter vim4_npu_sdk/demo and modify convert_adla.sh as follows.
#!/bin/bash

ACUITY_PATH=../bin/
#ACUITY_PATH=../python/tvm/

adla_convert=${ACUITY_PATH}adla_convert

if [ ! -e "$adla_convert" ]; then
    adla_convert=${ACUITY_PATH}adla_convert.py
fi

$adla_convert --model-type onnx \
        --model ./model_source/retinaface/retinaface.onnx \
        --inputs "images" \
        --input-shapes "3,640,640" \
        --inference-input-type float32 \
        --inference-output-type float32 \
        --dtypes "float32" \
        --quantize-dtype int8 --outdir onnx_output \
        --channel-mean-value "0,0,0,255" \
        --source-file ./dataset.txt \
        --iterations 500 \
        --disable-per-channel False \
        --batch-size 1 --target-platform PRODUCT_PID0XA003
Please prepare about 500 pictures for quantization. If the picture size is smaller than the model input size, please resize the pictures to the input size before quantization.
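A sketch for preparing the calibration set is below. It assumes dataset.txt simply lists one image path per line and that the source images live in a hypothetical calib_images/ directory; adjust to the format your SDK version expects.

import glob
import os
import cv2

# Resize roughly 500 calibration images to the 640x640 model input size
# and write their paths into dataset.txt (assumed format: one path per line).
os.makedirs("calib_resized", exist_ok=True)
paths = sorted(glob.glob("calib_images/*.jpg"))[:500]

with open("dataset.txt", "w") as f:
    for i, p in enumerate(paths):
        img = cv2.imread(p)
        if img is None:
            continue
        img = cv2.resize(img, (640, 640))
        out_path = f"calib_resized/{i:04d}.jpg"
        cv2.imwrite(out_path, img)
        f.write(out_path + "\n")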
Run convert_adla.sh to generate the VIM4 model. The converted model is xxx.adla in onnx_output.
$ bash convert_adla.sh
Clone the source code from khadas/vim4_npu_applications.
$ git clone https://github.com/khadas/vim4_npu_applications
If your kernel version is 5.4 or earlier, please use tag ddk-1.7.5.5. Tag ddk-2.3.6.7 is for kernel 5.15.
$ sudo apt update
$ sudo apt install libopencv-dev python3-opencv cmake
Put retinaface_int8.adla in vim4_npu_applications/retinaface/data/.
# Compile
$ cd vim4_npu_applications/retinaface
$ mkdir build
$ cd build
$ cmake ..
$ make

# Run
$ sudo ./retinaface -m ../data/retinaface_int8.adla -p ../data/timg.jpg
Put retinaface_int8.adla in vim4_npu_applications/retinaface_cap/data/.
# Compile
$ cd vim4_npu_applications/retinaface_cap
$ mkdir build
$ cd build
$ cmake ..
$ make

# Run
$ sudo ./retinaface_cap -m ../data/retinaface_int8.adla -d 0 -w 1920 -h 1080
0 is the camera device index.