Enable Linux Hardware Watchdog Service on Ubuntu 22.04

Introduction

The Linux kernel can perform a predefined action on the system when a serious problem is detected. The watchdog daemon periodically tells the kernel that the system is working fine. If the daemon stops doing that, the system performs the action defined in the watchdog configuration.

Vultr supports the Linux watchdog service hardware device. When you install, configure, and run the watchdog service on your instance, it will interact with a special virtual device which Vultr will monitor. If the watchdog fails, the Vultr control plane will automatically reboot the instance.

This document focuses on the wd_keepalive service on Ubuntu 22.04. wd_keepalive is a simplified version of the watchdog daemon: it only opens /dev/watchdog and keeps writing to it often enough, at least once per minute, to keep the kernel from resetting the system. If the writes stop, the instance reboots.
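
The keepalive idea can be illustrated with a few lines of shell. This is a conceptual sketch only, not the wd_keepalive implementation: the helper name `feed_watchdog` and its arguments are hypothetical, and it assumes a driver that treats any write as a keepalive and honors the "magic close" character `V` to disarm the timer on a clean exit.

```shell
#!/bin/sh
# Conceptual sketch only; feed_watchdog is a hypothetical helper, not part of
# the watchdog package.
# Usage: feed_watchdog DEVICE INTERVAL_SECONDS COUNT
feed_watchdog() {
    wd_device=$1
    wd_interval=$2
    wd_count=$3
    # The redirection on the brace group opens the device once and keeps it
    # open for the whole loop, as a keepalive daemon would.
    {
        wd_i=0
        while [ "$wd_i" -lt "$wd_count" ]; do
            printf '\0' >&3          # any write counts as "still alive"
            sleep "$wd_interval"
            wd_i=$((wd_i + 1))
        done
        # Assumption: the driver supports "magic close", so writing V just
        # before closing asks it to disarm instead of resetting the machine.
        printf 'V' >&3
    } 3>"$wd_device"
}
```

Try it against a scratch file first, for example `feed_watchdog /tmp/fake-watchdog 1 5`. Pointing it at the real /dev/watchdog arms the timer, so only do that on an instance you are prepared to see reboot.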

Prerequisites

To use Watchdog, you need:

* A Vultr cloud server instance running Ubuntu 22.04.

* Root access to the server over [SSH].

Preliminary Setup

Check that the instance exposes the watchdog hardware device:

    lspci -v | grep -i watch

    root@vultr:~# lspci -v | grep -i watch

    03:01.0 System peripheral: Intel Corporation 6300ESB Watchdog Timer

    root@vultr:~#

Load the appropriate watchdog kernel module:

    modprobe i6300esb

    root@vultr:~# modprobe i6300esb

    root@vultr:~#

Install the watchdog software from standard repositories:

    apt-get install watchdog

Confirm the watchdog device is present:

    root@vultr:~# ls -al /dev/watchdog*

    crw------- 1 root root 10, 130 Oct 10 10:05 /dev/watchdog

    root@vultr:~#
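
The two checks above (module loaded, device node present) can be wrapped in a small preflight script. This is a minimal sketch: the function names are hypothetical, and the optional path arguments exist only so the helpers can be exercised against test files.

```shell
#!/bin/sh
# Hypothetical preflight helpers; names and defaults are illustrative.

# module_loaded NAME [MODULES_FILE]
# Succeeds if NAME is the first field of a line in /proc/modules format.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}

# watchdog_node_ok [PATH]
# Succeeds if PATH exists and is a character device.
watchdog_node_ok() {
    [ -c "${1:-/dev/watchdog}" ]
}

# watchdog_preflight
# Runs both checks with their defaults and reports the first failure.
watchdog_preflight() {
    module_loaded i6300esb ||
        { echo "i6300esb module is not loaded" >&2; return 1; }
    watchdog_node_ok ||
        { echo "/dev/watchdog is missing or not a character device" >&2; return 1; }
    echo "watchdog preflight OK"
}
```

Running `watchdog_preflight` as root on the instance should print `watchdog preflight OK` once the steps in this section are complete.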

Configuration

Open the watchdog config file with vi:

    vi /etc/watchdog.conf

Find the `watchdog-device` line, shown still commented out in the excerpt below, and uncomment it by removing the `#` from the beginning of the line:

    # The retry-timeout and repair limit are used to handle errors in a more robust

    # manner. Errors must persist for longer than retry-timeout to action a repair

    # or reboot, and if repair-maximum attempts are made without the test passing a

    # reboot is initiated anyway.

    #retry-timeout          = 60

    #repair-maximum         = 1

    #watchdog-device        = /dev/watchdog

    # Defaults compiled into the binary

    #temperature-sensor     =

    #max-temperature        = 90

Set the `realtime` option to `yes` and `priority` to `1`:

    realtime = yes

    priority = 1

Save and close the file by typing `:x` and pressing Enter.
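
If you prefer to script these edits instead of using vi, the following sketch applies the same three settings. The helper `set_conf_option` is hypothetical: it rewrites an existing `key = value` line, commented or not, and appends the line when the key is absent. It assumes keys and values contain no `|`, `&`, or backslash characters.

```shell
#!/bin/sh
# set_conf_option FILE KEY VALUE  (hypothetical helper)
# Rewrites an existing "KEY = ..." line, commented or not, to "KEY = VALUE";
# appends the line if the key is not present at all.
# Assumption: KEY and VALUE contain no |, & or backslash characters.
set_conf_option() {
    conf_file=$1
    conf_key=$2
    conf_value=$3
    conf_pattern="^[[:space:]]*#?[[:space:]]*${conf_key}[[:space:]]*="
    if grep -Eq "$conf_pattern" "$conf_file"; then
        conf_tmp=$(mktemp) || return 1
        # Write through cat so the original file keeps its owner and mode.
        sed -E "s|${conf_pattern}.*|${conf_key} = ${conf_value}|" \
            "$conf_file" > "$conf_tmp" && cat "$conf_tmp" > "$conf_file"
        conf_status=$?
        rm -f "$conf_tmp"
        return "$conf_status"
    fi
    printf '%s = %s\n' "$conf_key" "$conf_value" >> "$conf_file"
}
```

A possible invocation, after backing up the file, is `set_conf_option /etc/watchdog.conf watchdog-device /dev/watchdog`, followed by the same call for `realtime yes` and `priority 1`.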

Modify the newly created `/etc/default/watchdog` file and set `watchdog_module` to `i6300esb`:

    watchdog_module="i6300esb"

You will use the wd_keepalive service to write to the watchdog device, so the full watchdog daemon must be stopped and disabled at startup.

Check the current state of the watchdog service:

    root@vultr:~# systemctl status watchdog.service

    ● watchdog.service - watchdog daemon

       Loaded: loaded (/lib/systemd/system/watchdog.service; enabled; vendor preset: enabled)

       Active: inactive (dead)

If the watchdog service is running, stop it:

    root@vultr:~# systemctl stop watchdog.service

Check again and make sure the watchdog service is not running:

    ● watchdog.service - watchdog daemon

       Loaded: loaded (/lib/systemd/system/watchdog.service; enabled; vendor preset: enabled)

       Active: inactive (dead)

Disable the watchdog service on boot:

    root@vultr:~# systemctl disable watchdog.service
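
The stop, disable, and re-check steps can also be combined. The sketch below uses `systemctl disable --now`, which disables a unit and stops it in one call, then confirms the result with `systemctl is-active`. The wrapper name `retire_watchdog_daemon` is hypothetical.

```shell
#!/bin/sh
# retire_watchdog_daemon  (hypothetical wrapper)
# Stops and disables watchdog.service, then verifies it is no longer active.
retire_watchdog_daemon() {
    systemctl disable --now watchdog.service || return 1
    # is-active exits 0 only while the unit is running.
    if systemctl is-active --quiet watchdog.service; then
        echo "watchdog.service is still active" >&2
        return 1
    fi
    echo "watchdog.service stopped and disabled"
}
```

Call `retire_watchdog_daemon` as root; a non-zero exit status means the daemon is still running and needs a closer look with `systemctl status watchdog.service`.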

Edit the systemd unit file `/lib/systemd/system/wd_keepalive.service` and make sure its `[Install]` section contains the `WantedBy` line shown below. If the file has no `[Install]` section, append both lines to the end of the file.

    [Install]
    WantedBy=multi-user.target
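
The same edit can be scripted so that it is safe to run more than once. This is a sketch under the assumption that the unit file uses the exact header `[Install]` on its own line. The helper name `ensure_install_section` is hypothetical.

```shell
#!/bin/sh
# ensure_install_section UNIT_FILE  (hypothetical helper)
# Makes sure UNIT_FILE has an [Install] section with the WantedBy line,
# without duplicating anything that is already there.
ensure_install_section() {
    unit_file=$1
    # Nothing to do if the line already exists.
    grep -qxF 'WantedBy=multi-user.target' "$unit_file" && return 0
    if grep -qxF '[Install]' "$unit_file"; then
        unit_tmp=$(mktemp) || return 1
        # Print every line; right after the [Install] header, add WantedBy.
        awk '{ print } $0 == "[Install]" { print "WantedBy=multi-user.target" }' \
            "$unit_file" > "$unit_tmp" && cat "$unit_tmp" > "$unit_file"
        unit_status=$?
        rm -f "$unit_tmp"
        return "$unit_status"
    fi
    # No [Install] section at all: append both lines.
    printf '\n[Install]\nWantedBy=multi-user.target\n' >> "$unit_file"
}
```

For example: `ensure_install_section /lib/systemd/system/wd_keepalive.service`. Files under /lib/systemd/system belong to the package, so a package upgrade may replace the file. Rerunning the helper afterwards restores the section.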

Systemd Setup

Reload systemd manager configuration:

        root@vultr:~# systemctl daemon-reload

Start the wd_keepalive service:

        root@vultr:~# systemctl start wd_keepalive

Enable the service to start at system boot:

        root@vultr:~# systemctl enable wd_keepalive

Check the status of the wd_keepalive service:

        root@vultr:~# systemctl status wd_keepalive

Reboot the VM and confirm again that the wd_keepalive service is started on system boot:

        root@vultr:~# systemctl status wd_keepalive
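
Reading the status output by eye works, but the post-reboot check can also be expressed as exit codes. The sketch below relies on `systemctl is-active` and `systemctl is-enabled`, which both return 0 only when the condition holds. The function name `verify_keepalive_setup` is hypothetical.

```shell
#!/bin/sh
# verify_keepalive_setup  (hypothetical helper)
# Succeeds only if wd_keepalive is running and enabled at boot, and the full
# watchdog daemon is not running alongside it.
verify_keepalive_setup() {
    systemctl is-active --quiet wd_keepalive.service ||
        { echo "wd_keepalive is not running" >&2; return 1; }
    systemctl is-enabled --quiet wd_keepalive.service ||
        { echo "wd_keepalive is not enabled at boot" >&2; return 1; }
    if systemctl is-active --quiet watchdog.service; then
        echo "watchdog.service is running as well" >&2
        return 1
    fi
    echo "wd_keepalive is active and enabled"
}
```

Run `verify_keepalive_setup` after the reboot. Any message on standard error points at the step to revisit.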

Check that the watchdog module is loaded and initialized by running `dmesg | grep i6300`:

    root@vultr:~# dmesg | grep i6300

    [    4.760745] i6300esb: Intel 6300ESB WatchDog Timer Driver v0.05

    [    4.761165] i6300esb: initialized (0x000000009f3c0029). heartbeat=30 sec (nowayout=0)
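
The `heartbeat` value in that log line is the driver's timeout in seconds, so keepalive writes have to arrive more often than that. A small sketch for extracting the number from the kernel log follows. The function name `heartbeat_seconds` is hypothetical, and the pattern assumes the message format shown above.

```shell
#!/bin/sh
# heartbeat_seconds  (hypothetical helper)
# Reads kernel log lines on stdin and prints the heartbeat value reported by
# the most recent "i6300esb: initialized" message, if any.
heartbeat_seconds() {
    sed -n 's/.*i6300esb: initialized.*heartbeat=\([0-9][0-9]*\) sec.*/\1/p' |
        tail -n 1
}
```

Use it as `dmesg | heartbeat_seconds`. For the sample output above it prints `30`.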

Testing

To test that the watchdog service is configured properly and works as expected, you can trigger a system crash and check whether the instance gets rebooted.

The following command flushes pending writes to disk and then crashes the system with a NULL pointer dereference:

    sync; sleep 2; sync; echo c > /proc/sysrq-trigger

Your instance should automatically reboot in about a minute.
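
To confirm that a reboot really happened, rather than the SSH session merely dropping, you can compare the kernel's boot ID before and after the test. This is a sketch: the helper names are hypothetical, and it assumes the boot ID is exposed at /proc/sys/kernel/random/boot_id and changes on every boot.

```shell
#!/bin/sh
# Hypothetical helpers for confirming a reboot via the kernel boot ID.

# save_boot_id STATE_FILE [BOOT_ID_FILE]
# Stores the current boot ID in STATE_FILE.
save_boot_id() {
    cat "${2:-/proc/sys/kernel/random/boot_id}" > "$1"
}

# rebooted_since STATE_FILE [BOOT_ID_FILE]
# Succeeds if the current boot ID differs from the one saved in STATE_FILE.
rebooted_since() {
    [ "$(cat "$1")" != "$(cat "${2:-/proc/sys/kernel/random/boot_id}")" ]
}
```

Run `save_boot_id /root/boot_id.before` ahead of the crash command, then reconnect after the instance comes back and run `rebooted_since /root/boot_id.before && echo "instance was rebooted"`. Keep the state file outside /tmp so that it survives the reboot.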

Conclusion

In this article, you set up the Linux watchdog service with a hardware watchdog device on an Ubuntu 22.04 server: you installed the service, configured it, and tested it. For more information, please visit the following resources:

* [Linux watchdog](https://linux.die.net/man/8/watchdog)

* [watchdog.conf](https://linux.die.net/man/5/watchdog.conf)

* [wd_keepalive](https://linux.die.net/man/8/wd_keepalive)