Khadas Docs

Amazing Khadas, always amazes you!

User Tools

Site Tools


Sidebar

products:sbc:vim3:npu:npu-performance

NPU Performance Analysis

Obtain NPU performance diagnostics and understand performance metrics.

Preparation

Upgrade The System

Please refer to the upgrade guide.

Reload the Driver Module

1. Uninstall the NPU module:

$ sudo rmmod galcore

2. Reinstall NPU module:

$ sudo insmod /lib/modules/$(uname -r)/kernel/drivers/amlogic/npu/galcore.ko gpuProfiler=1 showArgs=1

Set environment variables

$ export VIV_VX_PROFILE=1
$ export VIV_VX_DEBUG_LEVEL=1

Fetch diagnostic data of the NPU

Run Model

Here is a sample using the inception example to run on the NPU.

$ aml_npu_demo_binaries/inceptionv3/VIM3$ ./run.sh
#productname=VIPNano-QI, pid=0x88
Created VX Thread: 0xa69a21c0
Create Neural Network: 59ms or 59455us
Verify...
generate command buffer, device count=1, core count per-device: 1,
---------------------------Begin VerifyTiling -------------------------
AXI-SRAM = 1048576 Bytes VIP-SRAM = 522240 Bytes SWTILING_PHASE_FEATURES[1, 1, 0]
  0 NBG [(   0    0    0 0,        0, 0x(nil)(0x(nil), 0x(nil)) ->    0    0    0 0,        0, 0x(nil)(0x(nil), 0x(nil))) k(0 0    0,        0) pad(0 0) pool(0 0, 0 0)]
 
 id IN [ x  y  w   h ]   OUT  [ x  y  w  h ] (tx, ty, kpc) (ic, kc, kc/ks, ks/eks, kernel_type)
   0 NBG DD 0x(nil) [   0    0        0        0] -> DD 0x(nil) [   0    0        0        0] (  0,   0,   0) (       0,        0, 0.000000%, 0.000000%, NONE)
 
PreLoadWeightBiases = 1048576  100.000000%
---------------------------End VerifyTiling -------------------------
Verify Graph: 0ms or 823us
Start run graph [1] times...
layer id: 0 layer name:network_binary_graph operation[0]:unkown operation type target:unkown operation target.
uid: 0
op_abs_id: 0
execution time:             20845 us
[     1] TOTAL_READ_BANDWIDTH  (MByte): 71.703380
[     2] TOTAL_WRITE_BANDWIDTH (MByte): 17.810649
[     3] AXI_READ_BANDWIDTH  (MByte): 30.981305
[     4] AXI_WRITE_BANDWIDTH (MByte): 14.130429
[     5] DDR_READ_BANDWIDTH (MByte): 40.722075
[     6] DDR_WRITE_BANDWIDTH (MByte): 3.680220
[     7] GPUTOTALCYCLES: 16697255
[     8] GPUIDLECYCLES: 296080
VPC_ELAPSETIME: 21124
*********
Run the 1 time: 21.00ms or 21609.00us
vxProcessGraph execution time:
Total   21.00ms or 21625.00us
Average 21.62ms or 21625.00us
 --- Top5 ---
  2: 0.833984
795: 0.009102
974: 0.003592
408: 0.002207
393: 0.002111
Exit VX Thread: 0xa69a21c0

Diagnostic data description

Name Description
TOTAL_READ_BANDWIDTH Total read bandwidth
TOTAL_WRITE_BANDWIDTH Total write bandwidth
AXI_READ_BANDWIDTH AXI_SRAM read bandwidth
AXI_WRITE_BANDWIDTH AXI_SRAM write bandwidth
DDR_READ_BANDWIDTH DDR read bandwidth
DDR_WRITE_BANDWIDTH DDR write bandwidth

Calculate Usage Rate

  • GPUTOTALCYCLES : Total number of cycles.
  • GPUIDLECYCLES : Number of cycles in idle state.

Usage rate = (GPUTOTALCYCLES-GPUIDLECYCLES)/GPUTOTALCYCLES.

Last modified: 2023/09/12 04:10 by sravan