How to: Use OpenCL 1.2 on i.MX 8 in a Docker Container on Torizon

 
Article updated at 06 Jan 2020

Introduction

Torizon features the Docker runtime. Toradex provides Debian Docker images and deb packages that greatly ease the development of embedded applications. In this article, we show how to install and test the OpenCL libraries optimized for the i.MX 8 GPU so you can integrate them into your application. We also fetch, build and run an OpenCL benchmarking tool, clpeak, to check that the software is set up correctly and to observe the GPU performance.

Prerequisites

  • A Toradex i.MX 8 SoM with Torizon installed (for installation instructions, see the Getting Started Guide)
  • Basic knowledge of Docker containers. To learn more about Docker, visit the Docker website. For the first steps with Docker on Torizon, check the Getting Started Guide.

Dockerfile instructions

Download the full Dockerfile implementation. The implementation details are explained in the following sections. See the Getting Started Guide for instructions on how to build the image on a host PC and pull it onto the board. Alternatively, you can copy the Dockerfile to the board with scp and build the image there.
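If you prefer to build directly on the board, the copy-and-build steps can be sketched as a small shell helper. This is only an illustration. The function name build_on_board, the opencl directory on the board and the default tag opencl-image are our own choices. It also assumes that the board is reachable over SSH and that the login user is allowed to run docker. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Hypothetical helper: copy the Dockerfile to the board and build the image there.
# Usage: build_on_board <user@board-address> [image-tag]
build_on_board() {
    board="$1"
    tag="${2:-opencl-image}"
    # With DRY_RUN set, every command is prefixed with "echo", so nothing is executed.
    run=${DRY_RUN:+echo}

    $run ssh "$board" "mkdir -p opencl"
    $run scp Dockerfile "$board:opencl/Dockerfile"
    $run ssh "$board" "cd opencl && docker build -t $tag ."
}
```

For example, DRY_RUN=1 build_on_board torizon@<board-ip> shows the three commands. Drop DRY_RUN to actually run them. An image built on the board does not need to go through Docker Hub.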

To build

Inside the directory that contains the Dockerfile on the host PC, build the image:

$ docker build -t <your-dockerhub-username>/opencl-image .

After the build, push the image to your Docker Hub account:

$ docker push <your-dockerhub-username>/opencl-image
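The image is built FROM an arm64v8 base. On a typical x86-64 host PC, the RUN steps can therefore only execute if arm64 emulation (QEMU registered through binfmt_misc) is enabled, as described in the Getting Started Guide. The sketch below only checks whether the host needs that emulation. The function name is hypothetical, and the function does not enable anything by itself.

```shell
# Succeeds (returns 0) when the host cannot run arm64 binaries natively,
# i.e. when building the arm64v8-based image needs QEMU/binfmt emulation.
# An optional argument overrides the detected machine name.
needs_arm64_emulation() {
    case "${1:-$(uname -m)}" in
        aarch64|arm64) return 1 ;;
        *) return 0 ;;
    esac
}

if needs_arm64_emulation; then
    echo "Host is $(uname -m): enable arm64 emulation before running docker build."
fi
```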

To run

First, pull the image from your Docker Hub account to the board by running the following command in the board's terminal.

Warning: These instructions assume that your Docker Hub credentials are already set up on the board. If you have not set them up yet, run docker login first.

# docker pull <your-dockerhub-username>/opencl-image

After the pull, run a container based on the image.

Attention: By executing the following command, you accept NXP's terms and conditions of the End-User License Agreement (EULA).

# docker run -e ACCEPT_FSL_EULA=1 -it --rm --name=clpeak-container --net=host --cap-add CAP_SYS_TTY_CONFIG \
             -v /dev:/dev -v /tmp:/tmp -v /run/udev/:/run/udev/ \
             --device-cgroup-rule='c 4:* rmw'  --device-cgroup-rule='c 13:* rmw' --device-cgroup-rule='c 199:* rmw' --device-cgroup-rule='c 226:* rmw' \
             <your-dockerhub-username>/opencl-image
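The --device-cgroup-rule options let the container access character devices by major number. Our reading of them is an assumption, not something stated above:

  • 4: TTYs
  • 13: input devices
  • 199: the Vivante GPU node /dev/galcore
  • 226: DRM

Before running the container, you can check the major numbers on your board with a sketch like the following. The device paths in the loop are examples and may differ on your system.

```shell
# Prints the major number of a device node in decimal.
# stat's %t format sequence reports the major number in hexadecimal.
device_major() {
    printf '%d\n' "0x$(stat -c '%t' "$1")"
}

# Example device nodes; nodes that do not exist are skipped.
for node in /dev/tty0 /dev/input/event0 /dev/galcore /dev/dri/card0; do
    if [ -e "$node" ]; then
        echo "$node: major $(device_major "$node")"
    fi
done
```

If a node reports a different major number than the one used in the rules above, adjust the matching --device-cgroup-rule accordingly.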

Dockerfile explained

Image base

Toradex provides a basic Wayland image on its Docker Hub page. Use torizon/arm64v8-debian-wayland-base-vivante as the base of your image. It already contains the repository package, so you only need to install Vivante's OpenCL Debian packages for Torizon:

FROM torizon/arm64v8-debian-wayland-base-vivante AS base

RUN apt-get -y update && apt-get install -y \
    libopencl-vivante1 \
    libopencl-vivante1-dev \
    libclc-vivante1 \
    libllvm-vivante1 \
    libgal-vivante1 \
    libvsc-vivante1 \
    && apt-get clean && apt-get autoremove -y && rm -rf /var/lib/apt/lists/*
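To confirm that the dynamic linker can see the OpenCL runtime from these packages, you can search the linker cache from a shell inside a container based on this image. For example, add --entrypoint /bin/bash to the docker run command above so that it starts a shell instead of clpeak. The sketch assumes that the packages install a library whose name starts with libOpenCL. The function name find_lib is our own.

```shell
# Reads `ldconfig -p` output on stdin and prints the path of every cached
# library whose name starts with the prefix given as the first argument.
find_lib() {
    awk -v prefix="$1" 'index($1, prefix) == 1 { print $NF }'
}

# Inside the container (ldconfig usually lives in /sbin):
#   ldconfig -p | find_lib libOpenCL
```

An empty result means that no matching library is in the linker cache.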

Building clpeak

clpeak is a benchmarking tool that measures the peak capabilities of OpenCL devices. We build it from source in a separate builder stage:

FROM base AS builder

RUN apt-get -y update && apt-get install -y \
    git build-essential cmake wget \
    && apt-get clean && apt-get autoremove -y && rm -rf /var/lib/apt/lists/*

RUN git config --global user.email "you@example.com" && \
    git config --global user.name "Your Name"

RUN git clone https://github.com/krrishnarraj/clpeak.git && \
    cd clpeak && \
    git submodule update --init --recursive --remote

RUN mkdir -p clpeak/build && \
    cd clpeak/build && \
    cmake .. && \
    cmake --build .

Run clpeak

In this demo Dockerfile, we take the clpeak binary built in the previous stage and use it as the entry point:

FROM base AS runtime

COPY --from=builder /clpeak/build/clpeak .

ENTRYPOINT ["./clpeak"]
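Because clpeak is compiled inside the arm64v8 builder stage, the binary should be an AArch64 executable. The runtime stage copies it into the image's working directory, which we assume is / (the default), so it ends up at /clpeak. One way to check the architecture is to read the e_machine field of its ELF header: 183 means AArch64 and 62 means x86-64. The helper name is hypothetical. The od call assumes that both the host and the ELF file are little-endian, which holds for x86-64 and AArch64 Linux. If the file utility is installed, file ./clpeak gives the same information.

```shell
# Prints the e_machine field of an ELF file: the 16-bit value at byte offset 18.
# od reads it in host byte order, so this assumes a little-endian host and file.
elf_machine() {
    od -An -t u2 -j 18 -N 2 "$1" | tr -d ' '
}
```

For example, on the host PC:

$ id=$(docker create <your-dockerhub-username>/opencl-image)
$ docker cp "$id":/clpeak ./clpeak
$ docker rm "$id"
$ elf_machine ./clpeak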

Complete Dockerfile

You can download the full Dockerfile implementation or copy it from the block below:

Dockerfile
FROM torizon/arm64v8-debian-wayland-base-vivante AS base
 
RUN apt-get -y update && apt-get install -y \
    libopencl-vivante1 \
    libopencl-vivante1-dev \
    libclc-vivante1 \
    libllvm-vivante1 \
    libgal-vivante1 \
    libvsc-vivante1 \
    && apt-get clean && apt-get autoremove -y && rm -rf /var/lib/apt/lists/*

FROM base AS builder

RUN apt-get -y update && apt-get install -y \
    git build-essential cmake wget \
    && apt-get clean && apt-get autoremove -y && rm -rf /var/lib/apt/lists/*

RUN git config --global user.email "you@example.com" && \
    git config --global user.name "Your Name"

RUN git clone https://github.com/krrishnarraj/clpeak.git && \
    cd clpeak && \
    git submodule update --init --recursive --remote

RUN mkdir -p clpeak/build && \
    cd clpeak/build && \
    cmake .. && \
    cmake --build .

FROM base AS runtime

COPY --from=builder /clpeak/build/clpeak .

ENTRYPOINT ["./clpeak"]

Expected Output

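This is the expected output on an Apalis iMX8 board. The exact numbers vary with the module, software versions and system load. clpeak lists two devices because the i.MX 8QuadMax SoC used here has two GC7000XSVX GPU cores: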

Output
Platform: Vivante OpenCL Platform
  Device: Vivante OpenCL Device GC7000XSVX.6009.0000
    Driver version  : OpenCL 1.2 V6.2.4.p4.190076 (Linux ARM64)
    Compute units   : 1
    Clock frequency : 996 MHz
 
    Global memory bandwidth (GBPS)
      float   : 5.81
      float2  : 9.74
      float4  : 10.63
      float8  : 9.36
      float16 : 8.00
 
    Single-precision compute (GFLOPS)
      float   : 14.14
      float2  : 28.18
      float4  : 55.87
      float8  : 62.15
      float16 : 61.45
 
    No half precision support! Skipped
 
    No double precision support! Skipped
 
    Integer compute (GIOPS)
      int   : 14.13
      int2  : 14.09
      int4  : 15.84
      int8  : 15.73
      int16 : 14.54
 
    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 1.43
      enqueueReadBuffer          : 0.08
      enqueueMapBuffer(for read) : 301.68
        memcpy from mapped ptr   : 0.08
      enqueueUnmap(after write)  : 269.08
        memcpy to mapped ptr     : 1.43
 
    Kernel launch latency : 97.56 us
 
  Device: Vivante OpenCL Device GC7000XSVX.6009.0000
    Driver version  : OpenCL 1.2 V6.2.4.p4.190076 (Linux ARM64)
    Compute units   : 1
    Clock frequency : 996 MHz
 
    Global memory bandwidth (GBPS)
      float   : 5.59
      float2  : 9.38
      float4  : 10.33
      float8  : 9.15
      float16 : 7.85
 
    Single-precision compute (GFLOPS)
      float   : 14.14
      float2  : 28.18
      float4  : 55.87
      float8  : 62.15
      float16 : 61.45
 
    No half precision support! Skipped
 
    No double precision support! Skipped
 
    Integer compute (GIOPS)
      int   : 14.13
      int2  : 14.09
      int4  : 15.84
      int8  : 15.73
      int16 : 14.54
 
    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 1.42
      enqueueReadBuffer          : 0.08
      enqueueMapBuffer(for read) : 238.44
        memcpy from mapped ptr   : 0.08
      enqueueUnmap(after write)  : 207.03
        memcpy to mapped ptr     : 1.44
 
    Kernel launch latency : 126.82 us
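When comparing runs, it can help to reduce the report to a single figure per device. The awk-based sketch below, with the hypothetical name clpeak_sp_peak, prints the highest single-precision GFLOPS value reported for each device. It relies only on the layout shown above:

  • a Device: line for each device
  • a Single-precision compute header
  • one floatN : value line per vector width

It may need adjusting for other clpeak versions.

```shell
# Reads clpeak output on stdin and prints the best single-precision GFLOPS
# value found for each device, in the order the devices appear.
clpeak_sp_peak() {
    awk '
        /Device:/                  { dev++; sp = 0; next }  # new device section
        /Single-precision compute/ { sp = 1; next }         # start of the SP results
        /\(/                       { sp = 0 }               # any other header with parentheses ends them
        sp && $1 ~ /^float[0-9]*$/ && $2 == ":" {
            if ($3 + 0 > best[dev]) best[dev] = $3 + 0
        }
        END { for (d = 1; d <= dev; d++) printf "device %d: %.2f GFLOPS\n", d, best[d] }
    '
}
```

For example, redirect the output of the docker run command above to a file such as clpeak.txt, then run clpeak_sp_peak < clpeak.txt. For the Apalis iMX8 output above, both devices peak at the float8 result of 62.15 GFLOPS.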