Select the version of your OS from the tabs below. If you don't know the version you are using, run the command cat /etc/os-release or cat /etc/issue on the board.
Remember that you can always refer to the Torizon Documentation, where you can find many relevant articles that might help you with application development.
Reliability is an important topic for embedded systems. Once you have deployed thousands of devices to the field, malfunctions or successful attacks may harm people and equipment and may require costly on-site maintenance.
Torizon strives to be a reliable system from its conception and at all levels. Be it on TorizonCore or our tools and features, like TorizonCore Builder and our OTA update system, we care about providing safe defaults and guidance to our customers.
In this article, we go through two features you can use to further increase the reliability of your product: the Docker integrity checker and container health monitoring.
Once you apply the features described in this article to a board, you must create a custom TorizonCore image with the exact same configuration to install on several boards during production programming.
You can do this with the TorizonCore Builder Tool - Customization for Production Programming and Torizon OTA, more specifically by Capturing Changes in the Configuration of a Board on TorizonCore.
It is recommended that you:
Docker data might get corrupted on the device. This is rare, but it may happen in specific cases such as malfunctioning hardware or unintended power cuts during write operations on the storage device (NAND or eMMC). TorizonCore minimizes the risk because most of the filesystem is mounted read-only and journaling is enabled on read-write mount points. If corruption does occur, however, containers may fail to start.
To recover from such situations, TorizonCore provides a feature called Docker integrity checker.
If the docker-compose systemd service is not able to start all containers successfully, the docker-integrity-checker systemd service will be triggered.
This service will perform an integrity check on all installed Docker images that are defined in the /var/sota/storage/docker-compose/docker-compose.yml file because this is the file used by docker-compose.service.
If any of the Docker images is identified as corrupted, it will be deleted and pulled again from the container registry.
This feature is currently disabled by default in TorizonCore, and can be enabled by creating the /etc/docker/enable-integrity-checker file:
# touch /etc/docker/enable-integrity-checker
Warning: This feature can create additional network traffic in case a corrupted container image is detected.
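As a quick sanity check after enabling the feature, the following sketch reports whether the flag file is present. The function name integrity_checker_enabled and its optional path argument are illustrative assumptions and are not part of TorizonCore; only the default path comes from the text above.

```shell
# Hypothetical helper (not shipped with TorizonCore): reports whether the
# Docker integrity checker flag file exists.
# The optional first argument overrides the path, which is handy for testing.
integrity_checker_enabled() {
    flag_file="${1:-/etc/docker/enable-integrity-checker}"
    if [ -e "$flag_file" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}
```

Calling integrity_checker_enabled without arguments on the board should print "enabled" once the file has been created with the touch command above.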
Sometimes a container appears to be up and running but is not working as intended. To improve the reliability of the system, TorizonCore is able to monitor the health of running containers, and restart them if needed.
To monitor a container in TorizonCore, one must:
- Configure a health check for the container
- Label the container with "autoheal=true"
- Enable the docker-watchdog.service systemd service

Given the above conditions, TorizonCore will check the container for its health state every 5 minutes and restart it if the "unhealthy" state is detected.
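To make the mechanism more concrete, here is a minimal sketch of what a single monitoring pass could look like using only standard Docker CLI commands. This is an illustration under assumptions, not the actual implementation of the TorizonCore watchdog; the function name restart_unhealthy is hypothetical.

```shell
# Hypothetical one-pass watchdog sketch (NOT the TorizonCore implementation).
# Finds running containers that are labeled autoheal=true and that Docker
# currently reports as "unhealthy", then restarts each of them.
restart_unhealthy() {
    docker ps --filter "label=autoheal=true" --filter "health=unhealthy" \
        --format '{{.Names}}' |
    while read -r name; do
        # Skip empty lines, which can appear when nothing matches.
        [ -n "$name" ] || continue
        echo "restarting $name"
        docker restart "$name" >/dev/null
    done
}
```

A real watchdog would run such a pass periodically, for example every 5 minutes as described above.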
Docker containers can be configured with a check to determine whether or not running containers are in a "healthy" state.
Here is an example of defining a health check. In this case, it checks that the /tmp/.X11-unix/X0 socket file exists:
healthcheck:
  test: ["CMD", "test", "-S", "/tmp/.X11-unix/X0"]
  interval: 5s
  timeout: 4s
  retries: 2
  start_period: 10s
If the socket file doesn't exist, the container will become "unhealthy". More information about Docker healthcheck is available in the Docker Compose file reference.
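To see which state Docker currently reports for a container, you can query it with docker inspect. The sketch below wraps that call in a small function; the name container_health is an illustrative assumption, and the container name you pass in is whatever your own docker-compose file defines.

```shell
# Hypothetical helper: prints the health state Docker reports for a container
# ("starting", "healthy" or "unhealthy"), or "none" when the container has no
# health check configured.
container_health() {
    docker inspect \
        --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
        "$1"
}
```

For example, container_health my-container (where my-container is a placeholder name) prints the current state of that container.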
Every container that is going to be monitored has to be labeled as “autoheal=true”:
labels:
- autoheal=true
The docker-watchdog systemd service can be enabled by running:
# sudo systemctl enable docker-watchdog.service
After enabling and starting this service, all containers configured with a health check and labeled as stated above will be monitored and restarted if they become "unhealthy".
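To confirm that the service is both enabled and running, you can query systemd. The following sketch uses the standard systemctl is-enabled and is-active subcommands; the function name watchdog_status and the output layout are illustrative assumptions.

```shell
# Hypothetical helper: prints whether docker-watchdog.service is enabled at
# boot and whether it is currently active, as reported by systemd.
watchdog_status() {
    echo "enabled: $(systemctl is-enabled docker-watchdog.service 2>/dev/null)"
    echo "active: $(systemctl is-active docker-watchdog.service 2>/dev/null)"
}
```

If the second line does not report "active", the service has been enabled but not yet started, so monitoring is not in effect until it is started or the board is rebooted.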