OpenCL

Khadas Edges and VIMs are powered by ARM Mali GPUs.

This processing power can be used to accelerate computational tasks such as physics simulations, audio processing, and neural networks.

Fenix images already bundle the OpenCL libraries needed to get started, and we also provide some demos that make use of the GPU.

To use OpenCL, run one of the following platforms so that the required drivers are present.

Board   GPU                               Linux Kernel (BSP)   OS
VIM3    Mali G52-MP4 - Bifrost 2nd gen    4.9, 5.15            Ubuntu 22.04
VIM3L   Mali G31-MP2 - Bifrost 1st gen    4.9, 5.15            Ubuntu 22.04
VIM4    Mali G52-MP8 - Bifrost 2nd gen    5.4, 5.15            Ubuntu 22.04
Edge2   Mali G610-MP4 - Valhall 3rd gen   5.10                 Ubuntu 22.04

Check OpenCL capabilities and details

$ clinfo
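
The same details can also be queried from Python. The following is a minimal sketch, assuming PyOpenCL is installed (see the setup steps later in this guide). The function name list_opencl_devices is hypothetical and not part of the demos; it returns an empty list when PyOpenCL or an OpenCL platform is not available.

```python
# Sketch: list OpenCL platforms and devices from Python, similar in spirit to clinfo.
try:
    import pyopencl as cl
except ImportError:
    cl = None  # PyOpenCL is not installed (or its native library failed to load)


def list_opencl_devices():
    """Return one dict per OpenCL device, or an empty list if none are usable."""
    if cl is None:
        return []
    devices = []
    try:
        for platform in cl.get_platforms():
            for device in platform.get_devices():
                devices.append({
                    "platform": platform.name,
                    "device": device.name,
                    "version": device.version,
                    "compute_units": device.max_compute_units,
                    "global_mem_mib": device.global_mem_size // (1024 * 1024),
                })
    except cl.Error:
        # Raised, for example, when no OpenCL platform is registered.
        return []
    return devices


if __name__ == "__main__":
    for info in list_opencl_devices():
        print(info)
```

On a correctly configured board, the list should contain an entry for the Mali GPU; an empty list suggests the driver or library setup needs checking.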

VIM3 has OpenCL capabilities for both the NPU and the GPU, and by default the OpenCL library in /usr/lib is the one for the NPU. To use the GPU for acceleration, follow the steps below to replace it with the correct library for the GPU.

# Move the NPU OpenCL lib
$ sudo mv /usr/lib/libOpenCL.so /usr/lib/libOpenCL.so.old
 
# Symlink the right OpenCL lib for Mali GPU
$ sudo ln -s /usr/lib/aarch64-linux-gnu/libOpenCL.so.1.0.0 /usr/lib/libOpenCL.so
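
To confirm that the swap took effect, you can check where /usr/lib/libOpenCL.so now points. This is a small sketch using only the Python standard library; the helper name describe_opencl_link is hypothetical, and the default path is the one used in the commands above.

```python
import os


def describe_opencl_link(path="/usr/lib/libOpenCL.so"):
    """Return (is_symlink, resolved_path) for path, or None if nothing exists there."""
    if not os.path.lexists(path):
        return None
    return os.path.islink(path), os.path.realpath(path)


if __name__ == "__main__":
    # After the steps above, the resolved path is expected to be the Mali
    # library under /usr/lib/aarch64-linux-gnu/.
    print(describe_opencl_link())
```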

Note: With the 4.9 kernel, only OpenCL 2.0 is available and PyOpenCL will break. Migrating to a 5.15 kernel image resolves this.

Get source code

Clone the examples from the sravansenthiln1/opencl-demos repository

$ git clone https://github.com/sravansenthiln1/opencl-demos
$ cd opencl-demos

The repository contains both C++ and Python examples to try.

Set up the necessary headers and packages

Install the OpenCL headers

$ sudo apt install opencl-headers opencl-clhpp-headers

Install the Python OpenCL library

$ sudo apt install python3-pip
$ pip3 install numpy pyopencl

Run Examples

Taking the Neural Network examples in C++

Enter the example directory

$ cd c++/neural_network

Compile the application

$ make

Run the application

$ ./main

Taking the Neural Network examples in Python

Enter the example directory (from the root of the opencl-demos repository)

$ cd python/neural_network

Run the application

$ python3 main.py

Improving OpenCL performance

Memory optimizations

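Mali GPUs share the same memory as the rest of the system, so copying data into new CL buffers wastes memory. You can avoid this by specifying CL_MEM_ALLOC_HOST_PTR in your allocation and accessing the buffer from the host by mapping it (clEnqueueMapBuffer) rather than copying. To wrap system memory you have already allocated, pass it through the host_ptr argument together with CL_MEM_USE_HOST_PTR instead.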

This way you can use the same memory buffers for the system and OpenCL.
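
A minimal PyOpenCL sketch of this idea follows. It assumes PyOpenCL and NumPy are installed and that you already have a context and command queue. The function name alloc_shared_buffer and the fill callback are hypothetical, used only for illustration. The buffer is allocated with ALLOC_HOST_PTR, mapped into host memory, written in place through a NumPy view, and unmapped before use.

```python
import numpy as np

try:
    import pyopencl as cl
except ImportError:
    cl = None  # PyOpenCL is not installed


def alloc_shared_buffer(ctx, queue, n, fill):
    """Allocate a float32 buffer of n elements and fill it through a host mapping.

    `fill` is a callable that writes into the mapped NumPy view in place,
    so no separate host array is copied into the buffer.
    """
    if cl is None:
        raise RuntimeError("pyopencl is required for this sketch")

    dtype = np.dtype(np.float32)
    mf = cl.mem_flags
    # Let the driver allocate memory that the host can access directly.
    buf = cl.Buffer(ctx, mf.READ_WRITE | mf.ALLOC_HOST_PTR, size=n * dtype.itemsize)

    # Map the buffer for writing; the call blocks until the view is usable.
    view, _event = cl.enqueue_map_buffer(
        queue, buf, cl.map_flags.WRITE, 0, (n,), dtype
    )
    fill(view)

    # Unmap before any kernel touches the buffer.
    view.base.release(queue)
    return buf
```

With a context from cl.create_some_context() and a queue from cl.CommandQueue(ctx), a call such as alloc_shared_buffer(ctx, queue, 1024, lambda view: view.fill(1.0)) returns a buffer that can be passed to a kernel. Results can be read back the same way, by mapping with cl.map_flags.READ.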

Increasing device operation frequency

To make sure you are getting the most performance from the CPU and the GPU together, you can force the maximum operating frequency.

Forcing the system to operate at maximum frequency requires adequate cooling. Leaving the device running without keeping its temperature under control can reduce the board's life span.

On VIM3/3L/4:

$ echo 2 | sudo tee /sys/class/mpgpu/scale_mode

On Edge2:

$ echo performance | sudo tee /sys/class/devfreq/fb000000.gpu/governor
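
To check which setting is currently active, you can read the same sysfs files back. The sketch below uses only the Python standard library. The paths are the ones from the commands above, while the names GPU_SETTING_PATHS, read_setting, and current_gpu_settings are hypothetical. A file that does not exist on your board is reported as None.

```python
from pathlib import Path

# Paths taken from the commands above; which one exists depends on the board.
GPU_SETTING_PATHS = {
    "VIM3/3L/4": "/sys/class/mpgpu/scale_mode",
    "Edge2": "/sys/class/devfreq/fb000000.gpu/governor",
}


def read_setting(path):
    """Return the stripped contents of a sysfs file, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def current_gpu_settings(paths=GPU_SETTING_PATHS):
    """Return a dict mapping each board label to its current setting (or None)."""
    return {board: read_setting(path) for board, path in paths.items()}


if __name__ == "__main__":
    for board, value in current_gpu_settings().items():
        print(f"{board}: {value}")
```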

Further guides to optimize on Mali GPUs

You can refer to these guides for more information regarding improving the performance of your OpenCL application