Khadas Edges and VIMs are powered by ARM Mali GPUs.
This processing power can be used to accelerate computational tasks such as physics simulations, audio processing, and neural networks.
Fenix images already bundle the OpenCL libraries needed to get started, and we also provide some demos that make use of the GPU.
To use OpenCL, you will need one of the following platforms to make sure the drivers are present.
Board | GPU | Linux Kernel (BSP) | OS |
---|---|---|---|
VIM3 | Mali G52-MP4 (Bifrost 2nd gen) | 4.9, 5.15 | Ubuntu 22.04 |
VIM3L | Mali G31-MP2 (Bifrost 1st gen) | 4.9, 5.15 | Ubuntu 22.04 |
VIM4 | Mali G52-MP8 (Bifrost 2nd gen) | 5.4, 5.15 | Ubuntu 22.04 |
Edge2 | Mali G610-MP4 (Valhall 3rd gen) | 5.10 | Ubuntu 22.04 |
To check that OpenCL is available, list the detected platforms and devices with clinfo:
$ clinfo
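The same check can be scripted from Python. The following is a minimal sketch, assuming PyOpenCL is installed (see the pip3 step further down); the function name `list_opencl_devices` is illustrative and not part of the demos. It returns an empty list instead of failing when PyOpenCL or an OpenCL driver is missing.

```python
try:
    import pyopencl as cl  # installed with: pip3 install pyopencl
except ImportError:  # PyOpenCL missing: report "nothing found" instead of crashing
    cl = None


def list_opencl_devices():
    """Return a list of (platform name, device name, OpenCL version) tuples."""
    if cl is None:
        return []
    try:
        platforms = cl.get_platforms()
    except cl.Error:
        # Raised when no OpenCL driver is registered on the system.
        return []
    found = []
    for platform in platforms:
        try:
            devices = platform.get_devices()
        except cl.Error:
            continue
        for device in devices:
            found.append((platform.name, device.name, device.version))
    return found


if __name__ == "__main__":
    devices = list_opencl_devices()
    if not devices:
        print("No OpenCL devices found; check the driver and library setup.")
    for platform_name, device_name, version in devices:
        print(f"{platform_name}: {device_name} ({version})")
```

If the list is empty on a supported image, the library setup is the first thing to check.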
The VIM3 has OpenCL capabilities for both the NPU and the GPU, and by default the OpenCL library in /usr/lib is the one for the NPU. To use the GPU for acceleration, follow the steps below to replace it with the correct library for the GPU.
# Move the NPU OpenCL lib
$ sudo mv /usr/lib/libOpenCL.so /usr/lib/libOpenCL.so.old
# Symlink the right OpenCL lib for Mali GPU
$ sudo ln -s /usr/lib/aarch64-linux-gnu/libOpenCL.so.1.0.0 /usr/lib/libOpenCL.so
Note: With the 4.9 kernel, only OpenCL 2.0 is available and PyOpenCL will break. Migrating to a 5.15 kernel image resolves this.
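To confirm which file /usr/lib/libOpenCL.so points to after the symlink step, a small check like the one below can help. It is only a sketch using the Python standard library; the helper name `resolve_library` is illustrative.

```python
import os


def resolve_library(path="/usr/lib/libOpenCL.so"):
    """Return the real file behind `path`, or None if the path does not exist."""
    if not os.path.exists(path):  # also False for a dangling symlink
        return None
    return os.path.realpath(path)


if __name__ == "__main__":
    target = resolve_library()
    if target is None:
        print("/usr/lib/libOpenCL.so is missing or is a broken symlink")
    else:
        print("/usr/lib/libOpenCL.so ->", target)
```

On a VIM3 set up for the GPU, the resolved path would be expected to sit under /usr/lib/aarch64-linux-gnu/.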
Clone the examples from sravansenthiln1/opencl-demos:
$ git clone https://github.com/sravansenthiln1/opencl-demos
$ cd opencl-demos
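The demos include both C++ and Python examples to try.
For the C++ examples, install the OpenCL headers:
$ sudo apt install opencl-headers opencl-clhpp-headers
For the Python examples, install pip, NumPy and PyOpenCL:
$ sudo apt install python3-pip
$ pip3 install numpy pyopencl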
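To run the C++ neural network example, from the repository root:
$ cd c++/neural_network
$ make
$ ./main
To run the Python neural network example, from the repository root:
$ cd python/neural_network
$ python3 main.py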
Mali GPUs share the same memory as the rest of the system, so copying data into new CL buffers can waste memory.
You can avoid this by specifying CL_MEM_ALLOC_HOST_PTR in your allocation and mapping the buffer to access it from the host, or by specifying CL_MEM_USE_HOST_PTR and using the host_ptr argument to point at existing system memory.
This way you can use the same memory buffers for the system and OpenCL.
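As an illustration of the CL_MEM_ALLOC_HOST_PTR approach, here is a minimal PyOpenCL sketch. It allocates a host-visible buffer, maps it to fill it from the host, runs a kernel on it, and maps it again to read the result, with no explicit copy commands. The kernel `scale` and the function `scale_in_shared_buffer` are made up for this example, and whether a mapping is truly copy-free depends on the driver. The function returns None when NumPy, PyOpenCL or an OpenCL device is unavailable.

```python
try:
    import numpy as np
    import pyopencl as cl
except ImportError:  # pip3 install numpy pyopencl
    np = cl = None

# Illustrative kernel: multiply every element in place.
KERNEL_SRC = """
__kernel void scale(__global float *data, const float factor) {
    size_t i = get_global_id(0);
    data[i] = data[i] * factor;
}
"""


def scale_in_shared_buffer(n=1024, factor=2.0):
    """Fill a mapped buffer with 0..n-1, scale it on the device, return a copy."""
    if cl is None:
        return None
    try:
        devices = [d for p in cl.get_platforms() for d in p.get_devices()]
    except cl.Error:
        return None
    if not devices:
        return None
    # Prefer a GPU device, fall back to whatever is available.
    gpus = [d for d in devices if d.type & cl.device_type.GPU]
    ctx = cl.Context(devices=[(gpus or devices)[0]])
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags

    # Let the driver allocate memory that both host and device can access.
    buf = cl.Buffer(ctx, mf.READ_WRITE | mf.ALLOC_HOST_PTR, size=n * 4)

    # Map the buffer to get a NumPy view of it and write the input through it.
    view, _ = cl.enqueue_map_buffer(queue, buf, cl.map_flags.WRITE, 0, (n,), np.float32)
    view[:] = np.arange(n, dtype=np.float32)
    view.base.release(queue)  # unmap before the device touches the buffer
    del view

    program = cl.Program(ctx, KERNEL_SRC).build()
    program.scale(queue, (n,), None, buf, np.float32(factor))

    # Map again (blocking) to read the result in place.
    view, _ = cl.enqueue_map_buffer(queue, buf, cl.map_flags.READ, 0, (n,), np.float32)
    result = view.copy()  # copied only so the data outlives the mapping
    view.base.release(queue)
    queue.finish()
    return result


if __name__ == "__main__":
    print(scale_in_shared_buffer(8, 2.0))
```

The buffer must be unmapped before a kernel uses it, which is why each mapping is released as soon as the host is done with it.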
To get the most performance from the CPU and the GPU together, you can force the maximum operating frequency.
Forcing the system to operate at maximum frequency requires the device to have adequate cooling. Leaving it running without keeping the temperature under control can reduce the board's life span.
On VIM3/3L/4:
$ echo 2 | sudo tee /sys/class/mpgpu/scale_mode
On Edge2:
$ echo performance | sudo tee /sys/class/devfreq/fb000000.gpu/governor
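To see which setting is currently active, the same sysfs files can be read back. The sketch below uses only the two paths from the commands above. The helper `read_gpu_setting` and its `root` parameter (useful for trying it against a test directory) are illustrative, and the exact text each file returns depends on the kernel driver.

```python
from pathlib import Path

# Sysfs files named in the commands above; which one exists depends on the board.
GPU_SETTINGS = {
    "VIM3/3L/4": "sys/class/mpgpu/scale_mode",
    "Edge2": "sys/class/devfreq/fb000000.gpu/governor",
}


def read_gpu_setting(root="/"):
    """Return (board family, current value) for the first file found, else None."""
    for board, relative_path in GPU_SETTINGS.items():
        path = Path(root) / relative_path
        if path.is_file():
            return board, path.read_text().strip()
    return None


if __name__ == "__main__":
    setting = read_gpu_setting()
    if setting is None:
        print("No known GPU frequency control file found")
    else:
        print(f"{setting[0]}: {setting[1]}")
```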
You can refer to these guides for more information on improving the performance of your OpenCL application: