Clone the training code from bubbliiiing/retinaface-pytorch.
$ git clone https://github.com/bubbliiiing/retinaface-pytorch
Before training, modify retinaface-pytorch/utils/utils.py as follows.
diff --git a/utils/utils.py b/utils/utils.py
index 87bb528..4a22f2a 100644
--- a/utils/utils.py
+++ b/utils/utils.py
@@ -25,5 +25,6 @@ def get_lr(optimizer):
         return param_group['lr']
 
 def preprocess_input(image):
-    image -= np.array((104, 117, 123),np.float32)
+    image = image / 255.0
     return image
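The original code subtracts the per-channel mean, while the modified version scales pixels to [0, 1]; this appears to correspond to the --channel-mean-value "0,0,0,255" setting used later in convert_adla.sh. A minimal sketch of the two behaviours (the input values are made up for illustration):

import numpy as np

# Hypothetical float32 BGR pixel block, only to illustrate the two preprocessing variants.
image = np.array([[[104, 117, 123], [0, 255, 128]]], dtype=np.float32)

# Original preprocess_input: per-channel mean subtraction.
old = image - np.array((104, 117, 123), np.float32)

# Modified preprocess_input: scale to [0, 1].
new = image / 255.0

print(old)
print(new)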
Follow the official Docker documentation to install Docker: Install Docker Engine on Ubuntu.
Then fetch the prebuilt NPU Docker container and run it.
$ docker pull yanwyb/npu:v1
$ docker run -it --name vim4-npu1 -v $(pwd):/home/khadas/npu \
        -v /etc/localtime:/etc/localtime:ro \
        -v /etc/timezone:/etc/timezone:ro \
        yanwyb/npu:v1
Download the conversion tool from khadas/vim4_npu_sdk.
$ git clone https://gitlab.com/khadas/vim4_npu_sdk
After training, convert the PyTorch model into an ONNX model.
Copy nets/retinaface.py and rename it retinaface_export.py. Then modify retinaface_export.py as follows.
class ClassHead(nn.Module):
    def __init__(self,inchannels=512,num_anchors=2):
        super(ClassHead,self).__init__()
        self.num_anchors = num_anchors
        self.conv1x1 = nn.Conv2d(inchannels,self.num_anchors*2,kernel_size=(1,1),stride=1,padding=0)

    def forward(self,x):
        out = self.conv1x1(x)
-       out = out.permute(0,2,3,1).contiguous()
+       out = out.contiguous()
-       return out.view(out.shape[0], -1, 2)
+       return out.view(out.shape[0], 4, -1)

class BboxHead(nn.Module):
    def __init__(self,inchannels=512,num_anchors=2):
        super(BboxHead,self).__init__()
        self.conv1x1 = nn.Conv2d(inchannels,num_anchors*4,kernel_size=(1,1),stride=1,padding=0)

    def forward(self,x):
        out = self.conv1x1(x)
-       out = out.permute(0,2,3,1).contiguous()
+       out = out.contiguous()
-       return out.view(out.shape[0], -1, 4)
+       return out.view(out.shape[0], 8, -1)

class LandmarkHead(nn.Module):
    def __init__(self,inchannels=512,num_anchors=2):
        super(LandmarkHead,self).__init__()
        self.conv1x1 = nn.Conv2d(inchannels,num_anchors*10,kernel_size=(1,1),stride=1,padding=0)

    def forward(self,x):
        out = self.conv1x1(x)
-       out = out.permute(0,2,3,1).contiguous()
+       out = out.contiguous()
-       return out.view(out.shape[0], -1, 10)
+       return out.view(out.shape[0], 20, -1)
-       bbox_regressions = torch.cat([self.BboxHead[i](feature) for i, feature in enumerate(features)], dim=1)
-       classifications = torch.cat([self.ClassHead[i](feature) for i, feature in enumerate(features)], dim=1)
-       ldm_regressions = torch.cat([self.LandmarkHead[i](feature) for i, feature in enumerate(features)], dim=1)
+       bbox_regressions = torch.cat([self.BboxHead[i](feature) for i, feature in enumerate(features)], dim=2)
+       classifications = torch.cat([self.ClassHead[i](feature) for i, feature in enumerate(features)], dim=2)
+       ldm_regressions = torch.cat([self.LandmarkHead[i](feature) for i, feature in enumerate(features)], dim=2)

        if self.mode == 'train':
            output = (bbox_regressions, classifications, ldm_regressions)
        else:
-           output = (bbox_regressions, F.softmax(classifications, dim=-1), ldm_regressions)
+           output = (bbox_regressions, classifications, ldm_regressions)
        return output
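With these changes each head keeps the (batch, channels, locations) layout, so the feature levels are concatenated along dim=2 instead of dim=1. A minimal standalone sketch of the resulting shapes (the 64 input channels and feature map sizes below are made up for illustration):

import torch
import torch.nn as nn

# Simplified stand-in for the modified ClassHead: 2 anchors * 2 classes = 4 output channels.
conv = nn.Conv2d(64, 4, kernel_size=1)

# Two hypothetical feature levels from a 640x640 input (e.g. strides 8 and 16).
feats = [torch.zeros(1, 64, 80, 80), torch.zeros(1, 64, 40, 40)]

outs = []
for f in feats:
    out = conv(f).contiguous()
    outs.append(out.view(out.shape[0], 4, -1))   # (batch, 4, H*W)

merged = torch.cat(outs, dim=2)                  # concatenate along the location axis
print([o.shape for o in outs])                   # [1, 4, 6400] and [1, 4, 1600]
print(merged.shape)                              # [1, 4, 8000]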
Create the Python conversion script as follows and run it.
import torch
import numpy as np
from nets.retinaface_export import RetinaFace
from utils.config import cfg_mnet, cfg_re50

model_path = "logs/Epoch150-Total_Loss6.2802.pth"

net = RetinaFace(cfg=cfg_mnet, mode='eval').eval()
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
net.load_state_dict(torch.load(model_path, map_location=device))

img = torch.zeros(1, 3, 640, 640)
torch.onnx.export(net, img, "./retinaface.onnx", verbose=False, opset_version=12, input_names=['images'])
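Before converting, it can be worth sanity-checking the exported model with onnxruntime. A minimal sketch, assuming onnxruntime is installed and retinaface.onnx is in the current directory:

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("./retinaface.onnx")

# Inspect the input declared during export.
inp = sess.get_inputs()[0]
print(inp.name, inp.shape)          # expected: images [1, 3, 640, 640]

# Run a dummy forward pass and print the output shapes.
dummy = np.zeros((1, 3, 640, 640), dtype=np.float32)
outputs = sess.run(None, {inp.name: dummy})
for o in outputs:
    print(o.shape)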
Enter vim4_npu_sdk/demo and modify convert_adla.sh as follows.
#!/bin/bash

ACUITY_PATH=../bin/
#ACUITY_PATH=../python/tvm/

adla_convert=${ACUITY_PATH}adla_convert

if [ ! -e "$adla_convert" ]; then
    adla_convert=${ACUITY_PATH}adla_convert.py
fi

$adla_convert --model-type onnx \
        --model ./model_source/retinaface/retinaface.onnx \
        --inputs "images" \
        --input-shapes "3,640,640" \
        --inference-input-type float32 \
        --inference-output-type float32 \
        --dtypes "float32" \
        --quantize-dtype int8 --outdir onnx_output \
        --channel-mean-value "0,0,0,255" \
        --source-file ./dataset.txt \
        --iterations 500 \
        --disable-per-channel False \
        --batch-size 1 --target-platform PRODUCT_PID0XA003
Please prepare about 500 pictures for quantization. If the picture size is smaller than the model input size, please resize the pictures to the input size before quantization.
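A sketch for preparing the calibration set is below. It assumes dataset.txt simply lists one image path per line and that the source images live in a hypothetical calib_images/ directory; adjust to the format your SDK version expects.

import glob
import os
import cv2

# Resize roughly 500 calibration images to the 640x640 model input size
# and write their paths into dataset.txt (assumed format: one path per line).
os.makedirs("calib_resized", exist_ok=True)
paths = sorted(glob.glob("calib_images/*.jpg"))[:500]

with open("dataset.txt", "w") as f:
    for i, p in enumerate(paths):
        img = cv2.imread(p)
        if img is None:
            continue
        img = cv2.resize(img, (640, 640))
        out_path = f"calib_resized/{i:04d}.jpg"
        cv2.imwrite(out_path, img)
        f.write(out_path + "\n")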
Run convert_adla.sh to generate the VIM4 model. The converted model is xxx.adla in onnx_output.
$ bash convert_adla.sh
Clone the source code from khadas/vim4_npu_applications.
$ git clone https://github.com/khadas/vim4_npu_applications
If your kernel version is 5.4 or earlier, please use tag ddk-1.7.5.5. Tag ddk-2.3.6.7 is for kernel 5.15.
$ sudo apt update
$ sudo apt install libopencv-dev python3-opencv cmake
Put retinaface_int8.adla in vim4_npu_applications/retinaface/data/.
# Compile
$ cd vim4_npu_applications/retinaface
$ mkdir build
$ cd build
$ cmake ..
$ make

# Run
$ sudo ./retinaface -m ../data/retinaface_int8.adla -p ../data/timg.jpg
Put retinaface_int8.adla in vim4_npu_applications/retinaface_cap/data/.
# Compile
$ cd vim4_npu_applications/retinaface_cap
$ mkdir build
$ cd build
$ cmake ..
$ make

# Run
$ sudo ./retinaface_cap -m ../data/retinaface_int8.adla -d 0 -w 1920 -h 1080
0 is the camera device index.