====== OpenCL ======
Khadas Edges and VIMs are powered by **ARM Mali GPUs**.
You can use this processing power to accelerate computational tasks such as physics simulations, audio processing, and neural networks.
Fenix images ship with the OpenCL libraries needed to get started, and we also provide some demos that make use of the GPU.
To use OpenCL, run one of the following platforms, which include the required drivers.
^ Board \\ GPU ^ Linux Kernel (BSP) ^ OS ^
| VIM3 \\ **Mali G52-MP4** - Bifrost 2nd gen| 4.9 \\ 5.15 | Ubuntu 22.04|
| VIM3L \\ **Mali G31-MP2** - Bifrost 1st gen| 4.9 \\ 5.15 | Ubuntu 22.04|
| VIM4 \\ **Mali G52-MP8** - Bifrost 2nd gen| 5.4 \\ 5.15 | Ubuntu 22.04|
| Edge2 \\ **Mali G610-MP4** - Valhall 3rd gen| 5.10| Ubuntu 22.04|
===== Check OpenCL capabilities and details =====
```shell
$ clinfo
```
**VIM3** supports OpenCL on both the NPU and the GPU. By default, the OpenCL library in ''/usr/lib'' targets the NPU.
To run OpenCL on the GPU, follow the steps below to replace it with the library for the Mali GPU.
```shell
# Move the NPU OpenCL lib
$ sudo mv /usr/lib/libOpenCL.so /usr/lib/libOpenCL.so.old
# Symlink the right OpenCL lib for Mali GPU
$ sudo ln -s /usr/lib/aarch64-linux-gnu/libOpenCL.so.1.0.0 /usr/lib/libOpenCL.so
```
Note: The 4.9 kernel images provide only OpenCL 2.0, which breaks PyOpenCL. Migrating to a 5.15 kernel image resolves this.
===== Get source code =====
Clone the examples from [[gh>sravansenthiln1/opencl-demos]]:
```shell
$ git clone https://github.com/sravansenthiln1/opencl-demos
$ cd opencl-demos
```
The repository contains C++ and Python examples to try.
===== Set up the necessary headers and packages =====
==== Install the OpenCL headers ====
```shell
$ sudo apt install opencl-headers opencl-clhpp-headers
```
==== Install the Python OpenCL library ====
```shell
$ sudo apt install python3-pip
$ pip3 install numpy pyopencl
```
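To confirm that PyOpenCL is installed and can see the GPU, a short check like the following can help. It is a minimal sketch and not part of the demos: the function name ''list_opencl_devices'' is illustrative, and it returns an empty list when PyOpenCL or an OpenCL driver is missing.

```python
try:
    import pyopencl as cl
except ImportError:  # PyOpenCL (or the OpenCL library it loads) is missing
    cl = None


def list_opencl_devices():
    """Return one dict per OpenCL device; empty if none can be queried."""
    if cl is None:
        return []
    found = []
    try:
        for platform in cl.get_platforms():
            for device in platform.get_devices():
                found.append({
                    "platform": platform.name,
                    "device": device.name,
                    "version": device.version,
                })
    except cl.Error:
        # Raised when no platform or device is available
        pass
    return found


if __name__ == "__main__":
    devices = list_opencl_devices()
    if not devices:
        print("No OpenCL devices found")
    for entry in devices:
        print(f"{entry['platform']}: {entry['device']} ({entry['version']})")
```

On a VIM3, the device name reported here should show whether the active library is the one for the NPU or the one for the Mali GPU.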
===== Run Examples =====
==== Taking the Neural Network examples in C++ ====
=== Enter the example directory ===
```shell
$ cd c++/neural_network
```
=== Compile the application ===
```shell
$ make
```
=== Run the application ===
```shell
$ ./main
```
==== Taking the Neural Network examples in Python ====
=== Enter the example directory ===
```shell
# From the root of the opencl-demos repository
$ cd python/neural_network
```
=== Run the application ===
```shell
$ python3 main.py
```
===== Improving OpenCL performance =====
==== Memory optimizations ====
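Mali GPUs share memory with the rest of the system, so copying data into new OpenCL buffers wastes both memory and time.
You can avoid the copy by specifying ''CL_MEM_ALLOC_HOST_PTR'' in your allocation and mapping the buffer (''clEnqueueMapBuffer'') to access it from the host. To wrap system memory that your application has already allocated, pass it through the ''host_ptr'' argument together with ''CL_MEM_USE_HOST_PTR'' instead.
This way you can use the same memory buffers for the system and OpenCL.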
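The shared-buffer approach can be sketched with PyOpenCL as follows. This is a minimal illustration under stated assumptions, not code from the demos: the ''scale'' kernel and the ''scale_in_place'' helper are hypothetical names, the first device of the first platform is assumed to be the GPU, and the function returns ''None'' when PyOpenCL or a device is unavailable.

```python
try:
    import numpy as np
    import pyopencl as cl
except ImportError:  # NumPy or PyOpenCL is not installed
    cl = None

# Toy kernel (hypothetical): multiplies every element by a factor
KERNEL = """
__kernel void scale(__global float *data, const float factor) {
    int i = get_global_id(0);
    data[i] = data[i] * factor;
}
"""


def scale_in_place(values, factor=2.0):
    """Scale values on the device through a mapped, host-visible buffer."""
    if cl is None:
        return None
    try:
        # Assumption: the first device of the first platform is the GPU
        device = cl.get_platforms()[0].get_devices()[0]
        ctx = cl.Context([device])
        queue = cl.CommandQueue(ctx)
        n = len(values)
        flags = cl.mem_flags.READ_WRITE | cl.mem_flags.ALLOC_HOST_PTR
        buf = cl.Buffer(ctx, flags, size=n * np.dtype(np.float32).itemsize)

        # Map the buffer and fill it directly, without a staging copy
        mapped, _ = cl.enqueue_map_buffer(
            queue, buf, cl.map_flags.WRITE, 0, (n,), np.float32)
        mapped[:] = values
        mapped.base.release(queue)  # unmap before the kernel runs

        program = cl.Program(ctx, KERNEL).build()
        program.scale(queue, (n,), None, buf, np.float32(factor))

        # Map again to read the result from the same memory
        mapped, _ = cl.enqueue_map_buffer(
            queue, buf, cl.map_flags.READ, 0, (n,), np.float32)
        result = mapped.tolist()  # copy out before unmapping
        mapped.base.release(queue)
        return result
    except cl.Error:
        return None


if __name__ == "__main__":
    print(scale_in_place([1.0, 2.0, 3.0]))
```

The buffer is unmapped before the kernel is enqueued and mapped again afterwards, so the host and the GPU never access it at the same time.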
==== Increasing device operation frequency ====
To get the most performance from the CPU and the GPU together, you can force the maximum operating frequency.
Forcing the system to operate at maximum frequency requires adequate cooling. Leaving the board running without keeping its temperature under control can reduce its life span.
On **VIM3/3L/4**:
```shell
$ echo 2 | sudo tee /sys/class/mpgpu/scale_mode
```
On **Edge2**:
```shell
$ echo performance | sudo tee /sys/class/devfreq/fb000000.gpu/governor
```
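To check from a script which setting is currently active, for example at the start of a benchmark, the same sysfs nodes can be read back. This is a small sketch that assumes the nodes are readable on your image; the ''read_setting'' helper and the dictionary labels are illustrative names.

```python
from pathlib import Path

# Sysfs nodes taken from the commands above
GPU_SETTING_NODES = {
    "VIM3/3L/4 scale_mode": "/sys/class/mpgpu/scale_mode",
    "Edge2 governor": "/sys/class/devfreq/fb000000.gpu/governor",
}


def read_setting(path):
    """Return the stripped contents of a sysfs node, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    for label, node in GPU_SETTING_NODES.items():
        value = read_setting(node)
        if value is not None:
            print(f"{label}: {value}")
```

Only the node that exists on the current board produces output, so the same script can be used on VIMs and on Edge2.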
==== Further guides to optimize on Mali GPUs ====
Refer to these guides for more information on improving the performance of your OpenCL application:
- [[https://www.youtube.com/watch?v=DO_68Hjs2UI | Arm Mali GPU Training Series Ep 2.1 : The Mali GPU family]]
- [[https://developer.arm.com/documentation/101574/0403 | Arm Mali Bifrost and Valhall OpenCL Developer Guide Version 4.3]]
- [[https://registry.khronos.org/OpenCL/specs/opencl-2.1.pdf | OpenCL 2.1 Specification (PDF)]]
- [[https://documen.tician.de/pyopencl/ | PyOpenCL Documentation]]