Reliability on Torizon


Article updated at 01 Jul 2021

The content of this article depends on the version of your OS; the version covered here is shown below. If you don't know which version you are using, run the command cat /etc/os-release or cat /etc/issue on the board.

Remember that you can always refer to the Torizon Documentation, where you can find many relevant articles that might help you with application development.

Torizon 5.4.0


Reliability is an important topic for embedded systems. Once you have deployed thousands of devices to the field, a malfunction or a successful attack may harm people and equipment, and may lead to costly on-site maintenance.

Torizon strives to be a reliable system from its conception and at all levels. Be it on TorizonCore or our tools and features, like TorizonCore Builder and our OTA update system, we care about providing safe defaults and guidance to our customers.

In this article, we go through features you can use to further increase the reliability of your product:

  • Docker data integrity checker: recover from data corruption in extremely adverse situations.
  • Docker container health monitor: restart a container if a certain condition fails.

Applying Configuration to a Custom TorizonCore Image

Once you apply the features described in this article to a board, you must create a custom TorizonCore image with the exact same configuration to install on several boards during production programming.

You can do this with the TorizonCore Builder Tool - Customization for Production Programming and Torizon OTA, more specifically by Capturing Changes in the Configuration of a Board on TorizonCore.


It is recommended that you:

Docker data integrity checker

Docker data might get corrupted on the device. This is rare, but it may happen in specific cases such as malfunctioning hardware or unintended power cuts during write operations to the storage device (NAND or eMMC). The risk is minimized in TorizonCore because most of the filesystem is mounted read-only and journaling is enabled on read-write mount points. If corruption does happen, however, containers may fail to start.

To avoid such situations, there is a feature called Docker integrity checker in TorizonCore.

If the docker-compose systemd service is not able to start all containers successfully, the docker-integrity-checker systemd service will be triggered.

This service will perform an integrity check on all installed Docker images that are defined in the /var/sota/storage/docker-compose/docker-compose.yml file because this is the file used by docker-compose.service.

If any of the Docker images is identified as corrupted, it will be deleted and pulled again from the container registry.

This feature is currently disabled by default in TorizonCore, and can be enabled by creating the /etc/docker/enable-integrity-checker file:

# touch /etc/docker/enable-integrity-checker

Warning: This feature can create additional network traffic in case a corrupted container image is detected.
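
The enable step above can be wrapped in a small helper, for example in a provisioning script. The following is a minimal sketch and not part of TorizonCore: the function names and the ROOT prefix variable are hypothetical (the prefix exists only so the helper can be tried against a scratch directory), while the marker path comes from this article.

```shell
#!/bin/sh
# Sketch only: helper names and the ROOT prefix are illustrative assumptions.

# Print the path of the marker file; ROOT is empty on a real device.
integrity_marker() {
    printf '%s/etc/docker/enable-integrity-checker\n' "${ROOT:-}"
}

# Create the marker file that enables the Docker integrity checker.
enable_integrity_checker() {
    marker=$(integrity_marker)
    mkdir -p "$(dirname "$marker")" && touch "$marker"
}

# Report whether the marker file is present.
integrity_checker_status() {
    if [ -e "$(integrity_marker)" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}
```

On the board, running enable_integrity_checker as root and then integrity_checker_status should print "enabled". Assuming the unit name matches the service name used above, the log of a past run can be read with journalctl -u docker-integrity-checker.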

Docker container health monitor

Sometimes a container appears to be up and running but is not working as intended. To improve the reliability of the system, TorizonCore can monitor the health of running containers and restart them if needed.

To monitor a container in TorizonCore, one must:

  • Declare a user-defined check to determine the health state of a running container
  • Label the container with "autoheal=true"
  • Enable the docker-watchdog.service systemd service

Given the above conditions, TorizonCore will check the container for its health state every 5 minutes and restart it if the "unhealthy" state is detected.
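
The behavior described above can be approximated with plain Docker CLI commands. The following is a sketch of the idea only and is not the source of docker-watchdog.service: the function names are hypothetical, and the actual service may be implemented differently. It relies on the label and health filters of docker ps.

```shell
#!/bin/sh
# Sketch only: this is NOT the implementation of docker-watchdog.service.

# List the IDs of running containers that are labeled autoheal=true
# and whose health check currently reports "unhealthy".
unhealthy_autoheal_containers() {
    docker ps --quiet \
        --filter "label=autoheal=true" \
        --filter "health=unhealthy"
}

# Restart every container returned by the query above (a single pass).
restart_unhealthy_once() {
    for id in $(unhealthy_autoheal_containers); do
        echo "restarting unhealthy container $id"
        docker restart "$id" >/dev/null
    done
}
```

A watchdog built this way would call restart_unhealthy_once periodically, for instance in a loop with sleep 300 between passes to match the 5-minute interval mentioned above.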

User defined check

Docker containers can be configured with a check to determine whether or not running containers are in a "healthy" state.

Here is an example of defining a health check in a Docker Compose file. In this case, it checks that /tmp/.X11-unix/X0 exists and is a socket:

    healthcheck:
      test: ["CMD", "test", "-S", "/tmp/.X11-unix/X0"]
      interval: 5s
      timeout: 4s
      retries: 2
      start_period: 10s

If the socket doesn't exist, the container will become "unhealthy". More information about Docker healthcheck is available in the Docker Compose file reference.
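
To see which state Docker currently reports for a container, the health status can be queried with docker inspect. This is a small sketch; the helper name is hypothetical, and a container that has no health check defined has no health state to report.

```shell
#!/bin/sh
# Sketch only: the helper name is an illustrative assumption.

# Print the health status Docker reports for one container:
# "starting", "healthy" or "unhealthy".
container_health() {
    docker inspect --format '{{.State.Health.Status}}' "$1"
}
```

For example, container_health my-container prints the state of a container named my-container (a placeholder name).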


Every container that is going to be monitored must be labeled "autoheal=true":

    labels:
      - autoheal=true
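
Put together, the health check and the label sit side by side in one service definition. The snippet below writes a complete example file as a sketch: the file name, the service name "my-app", the image "example/my-app:latest" and the Compose file version are placeholders chosen for illustration, not values from this article.

```shell
#!/bin/sh
# Sketch only: file name, service name, image and version are placeholders.
cat > docker-compose.example.yml <<'EOF'
version: "2.4"
services:
  my-app:
    image: example/my-app:latest
    # Health check from this article: the X11 socket must exist.
    healthcheck:
      test: ["CMD", "test", "-S", "/tmp/.X11-unix/X0"]
      interval: 5s
      timeout: 4s
      retries: 2
      start_period: 10s
    # Opt this container in to monitoring.
    labels:
      - autoheal=true
EOF
```

The file can be checked for syntax errors with docker-compose -f docker-compose.example.yml config. Note that the health check command runs inside the container, so the socket path must be visible there.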

Enabling docker-watchdog service

The docker-watchdog systemd service can be enabled by running:

# systemctl enable --now docker-watchdog.service

After enabling and starting this service, all containers configured with a health check and labeled as stated above will be monitored and restarted if they become "unhealthy".
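
Whether the service is actually enabled and running can be confirmed with systemctl. The following is a minimal sketch; the helper name is hypothetical.

```shell
#!/bin/sh
# Sketch only: the helper name is an illustrative assumption.

# Succeeds only if docker-watchdog.service is both enabled at boot
# and currently running.
watchdog_ready() {
    systemctl is-enabled --quiet docker-watchdog.service &&
        systemctl is-active --quiet docker-watchdog.service
}
```

A script can then use it as a guard, for example: watchdog_ready || echo "docker-watchdog is not running".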