Introduction
The Linux kernel can perform a predefined action on the system if serious problems are detected. A watchdog daemon periodically tells the kernel that the system is working fine. If the daemon stops doing that, the system performs the action defined in the watchdog configuration, typically a reset.
Vultr supports a hardware watchdog device for the Linux watchdog service. When you install, configure, and run the watchdog service on your instance, it interacts with a special virtual device that Vultr monitors. If the watchdog device stops receiving updates, the Vultr control plane automatically reboots the instance.
This document focuses on the wd_keepalive service on Ubuntu 22.04. wd_keepalive is a simplified version of the watchdog daemon: it only opens /dev/watchdog and keeps writing to it often enough to keep the kernel from resetting, at least once per minute. If these writes stop, the instance reboots.
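To make the keepalive idea concrete, the sketch below imitates the behavior described above: open the device once, write to it at a fixed interval, and finish with the kernel's "magic close" character `V`, which asks a driver that supports it to disarm the timer when the file is closed. This is an illustration under assumptions, not the wd_keepalive source; the function name `pet_watchdog`, the 10-second default interval, and the fixed write count are hypothetical. Only one process can hold /dev/watchdog open, so stop wd_keepalive before trying it on the real device, and note that if the script is interrupted before it writes `V`, the timer keeps running and the instance reboots.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the keepalive protocol; NOT the wd_keepalive source.
# Usage: pet_watchdog [DEVICE] [INTERVAL_SECONDS] [COUNT]
pet_watchdog() {
  local dev="${1:-/dev/watchdog}"  # device node to keep alive
  local interval="${2:-10}"        # must stay well below the driver timeout
  local count="${3:-6}"            # number of keepalive writes before stopping
  local i=0
  {
    while [ "$i" -lt "$count" ]; do
      printf '\0' >&3              # any write restarts the hardware countdown
      i=$((i + 1))
      sleep "$interval"
    done
    printf 'V' >&3                 # magic close: ask the driver to disarm on close
  } 3>"$dev"                       # open the device once and keep it open
}
```

Pointing the function at a regular file (for example `pet_watchdog /tmp/fake-watchdog 0 3`) lets you see exactly what would be written without touching the hardware.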
Prerequisites
To follow this guide, you need:
* A Vultr cloud server instance running Ubuntu 22.04.
* Root access to the server over [SSH].
Preliminary Setup
Check whether the instance has the watchdog hardware device:
lspci -v | grep -i watch
root@vultr:~# lspci -v | grep -i watch
03:01.0 System peripheral: Intel Corporation 6300ESB Watchdog Timer
root@vultr:~#
Load the appropriate watchdog kernel module:
modprobe i6300esb
root@vultr:~# modprobe i6300esb
root@vultr:~#
Install the watchdog package from the standard Ubuntu repositories:
`apt-get install watchdog`
Confirm the watchdog device is present:
root@vultr:~# ls -al /dev/watchdog*
crw------- 1 root root 10, 130 Oct 10 10:05 /dev/watchdog
root@vultr:~#
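The checks in this section can be bundled into one script that you can rerun later. The sketch below is hypothetical: the function name is invented, it reads `lspci -v` output from standard input, and it assumes the i6300esb driver is loaded as a module (listed in /proc/modules) rather than built into the kernel.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check for the preliminary setup steps.
# Usage: lspci -v | watchdog_preflight [DEVICE] [MODULES_FILE]
watchdog_preflight() {
  local dev="${1:-/dev/watchdog}"
  local modules="${2:-/proc/modules}"

  # 1. The virtual 6300ESB watchdog must appear in the lspci output on stdin.
  if ! grep -qi '6300ESB Watchdog'; then
    echo "no 6300ESB watchdog device in lspci output" >&2
    return 1
  fi
  # 2. The i6300esb driver must be loaded (assumes a module, not built in).
  if ! grep -q '^i6300esb ' "$modules"; then
    echo "i6300esb module is not loaded" >&2
    return 1
  fi
  # 3. The driver must have created a character device node.
  if [ ! -c "$dev" ]; then
    echo "$dev is missing or is not a character device" >&2
    return 1
  fi
  echo "watchdog preflight OK"
}
```

Run it as `lspci -v | watchdog_preflight`; each failing check prints the reason to standard error and returns a non-zero status.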
Configuration
Open the watchdog config file with vi:
vi /etc/watchdog.conf
Uncomment the `watchdog-device` line by removing the `#` from the beginning of the line:
# The retry-timeout and repair limit are used to handle errors in a more robust
# manner. Errors must persist for longer than retry-timeout to action a repair
# or reboot, and if repair-maximum attempts are made without the test passing a
# reboot is initiated anyway.
#retry-timeout = 60
#repair-maximum = 1
#watchdog-device = /dev/watchdog
# Defaults compiled into the binary
#temperature-sensor =
#max-temperature = 90
Set the `realtime` option to `yes` and `priority` to `1`, so the daemon locks itself into memory and runs with real-time scheduling priority:
realtime = yes
priority = 1
Save and close the file: press Esc, type `:x`, and press Enter.
Edit the `/etc/default/watchdog` file that the package installation created and set `watchdog_module` to `i6300esb`, so the driver module is loaded when the watchdog services start:
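If you prefer to script the watchdog.conf edits instead of using vi, the sketch below applies the same three settings with sed. The function names are hypothetical, it assumes GNU sed (for `-i`), and it rewrites every commented or uncommented line for a key, so a key that appears more than once ends up as several identical lines. Values containing `|` or `&` would need escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the watchdog.conf edits without an editor.
# Requires GNU sed for in-place editing (-i).

# set_conf_option FILE KEY VALUE
# Rewrites every "KEY = ..." or "#KEY = ..." line to "KEY = VALUE",
# or appends the setting if the key does not appear at all.
set_conf_option() {
  local conf="$1" key="$2" value="$3"
  local pattern="^#?[[:space:]]*${key}[[:space:]]*="
  if grep -qE "$pattern" "$conf"; then
    sed -i -E "s|${pattern}.*|${key} = ${value}|" "$conf"
  else
    printf '%s = %s\n' "$key" "$value" >> "$conf"
  fi
}

configure_watchdog_conf() {
  local conf="${1:-/etc/watchdog.conf}"
  cp "$conf" "$conf.bak"                  # keep a backup of the original file
  set_conf_option "$conf" watchdog-device /dev/watchdog
  set_conf_option "$conf" realtime yes
  set_conf_option "$conf" priority 1
}
```

Running `configure_watchdog_conf` as root edits /etc/watchdog.conf and leaves the original as /etc/watchdog.conf.bak.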
watchdog_module="i6300esb"
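The same change can be scripted. The sketch below assumes, as the `watchdog_module="i6300esb"` line suggests, that /etc/default/watchdog contains plain shell variable assignments, so it can verify the result by sourcing the file in a subshell. The function name is hypothetical and GNU sed is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set watchdog_module in /etc/default/watchdog.
# Usage: set_watchdog_module [FILE] [MODULE]
set_watchdog_module() {
  local file="${1:-/etc/default/watchdog}"
  local module="${2:-i6300esb}"
  if grep -q '^watchdog_module=' "$file"; then
    # Replace the existing assignment in place (GNU sed)
    sed -i "s|^watchdog_module=.*|watchdog_module=\"${module}\"|" "$file"
  else
    printf 'watchdog_module="%s"\n' "$module" >> "$file"
  fi
  # Source the file in a subshell and confirm the value a shell would see
  ( . "$file" && [ "$watchdog_module" = "$module" ] )
}
```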
Because you will use the wd_keepalive service to keep the watchdog device updated, the watchdog daemon must be stopped and disabled at startup: both programs open /dev/watchdog, and only one process can hold the device open at a time.
Check the status of the watchdog service:
root@vultr:~# systemctl status watchdog.service
● watchdog.service - watchdog daemon
Loaded: loaded (/lib/systemd/system/watchdog.service; enabled; vendor preset: enabled)
Active: inactive (dead)
If the service is running, stop it:
root@vultr:~# systemctl stop watchdog.service
Check again and make sure the watchdog service is not running:
root@vultr:~# systemctl status watchdog.service
● watchdog.service - watchdog daemon
Loaded: loaded (/lib/systemd/system/watchdog.service; enabled; vendor preset: enabled)
Active: inactive (dead)
Disable the watchdog service so it does not start at boot:
root@vultr:~# systemctl disable watchdog.service
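The stop, disable, and re-check steps can be combined into one helper that fails loudly if the unit is still active or enabled. The function name is hypothetical; it uses only standard `systemctl` subcommands (`is-active`, `is-enabled`, `stop`, `disable`). For a one-off change, `systemctl disable --now watchdog.service` also stops and disables the unit in one command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure watchdog.service is stopped and disabled,
# so that wd_keepalive can open /dev/watchdog on its own.
retire_watchdog_service() {
  local unit="watchdog.service"
  if systemctl is-active --quiet "$unit"; then
    systemctl stop "$unit"
  fi
  systemctl disable "$unit"
  # Verify the end state instead of trusting the commands above
  if systemctl is-active --quiet "$unit"; then
    echo "$unit is still running" >&2
    return 1
  fi
  if systemctl is-enabled --quiet "$unit"; then
    echo "$unit is still enabled" >&2
    return 1
  fi
  echo "$unit stopped and disabled"
}
```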
Edit the systemd unit file /lib/systemd/system/wd_keepalive.service and add the following line under its [Install] section. If the file has no [Install] section, add both lines at the end of the file:
[Install]
WantedBy=multi-user.target
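The sketch below performs this edit idempotently: it does nothing if the `WantedBy=multi-user.target` line is already present, inserts it under an existing [Install] header, and otherwise appends both lines. The function name is hypothetical and GNU sed is assumed. Keep in mind that files under /lib/systemd/system belong to the package and can be overwritten on upgrade; copying the unit to /etc/systemd/system, which takes precedence, is a common way to keep local changes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ensure a unit file has WantedBy=multi-user.target
# under [Install]. Usage: ensure_install_section [UNIT_FILE]
ensure_install_section() {
  local unit="${1:-/lib/systemd/system/wd_keepalive.service}"
  if grep -qx 'WantedBy=multi-user.target' "$unit"; then
    return 0                                  # already present, nothing to do
  fi
  if grep -qx '\[Install\]' "$unit"; then
    # Insert the line directly under the existing [Install] header (GNU sed)
    sed -i '/^\[Install\]$/a WantedBy=multi-user.target' "$unit"
  else
    printf '\n[Install]\nWantedBy=multi-user.target\n' >> "$unit"
  fi
}
```

Run `systemctl daemon-reload` afterwards so systemd picks up the change.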
Systemd Setup
Reload systemd manager configuration:
root@vultr:~# systemctl daemon-reload
Start the wd_keepalive service:
root@vultr:~# systemctl start wd_keepalive
Enable the service to start at system boot:
root@vultr:~# systemctl enable wd_keepalive
Check the status of the wd_keepalive service:
root@vultr:~# systemctl status wd_keepalive
Reboot the instance and confirm that the wd_keepalive service started at boot:
root@vultr:~# systemctl status wd_keepalive
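After the reboot, a short scripted check can replace reading the full status output. The sketch below (hypothetical function name) prints the enabled and active states reported by `systemctl is-enabled` and `systemctl is-active` and returns success only if the unit is both enabled and active.

```shell
#!/usr/bin/env bash
# Hypothetical post-reboot check for wd_keepalive.service.
check_wd_keepalive() {
  local unit="wd_keepalive.service"
  local enabled active
  enabled=$(systemctl is-enabled "$unit")   # expected: enabled
  active=$(systemctl is-active "$unit")     # expected: active
  echo "$unit: enabled=$enabled active=$active"
  [ "$enabled" = "enabled" ] && [ "$active" = "active" ]
}
```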
Check that the watchdog driver is loaded and initialized by running `dmesg | grep i6300`:
root@vultr:~# dmesg | grep i6300
[ 4.760745] i6300esb: Intel 6300ESB WatchDog Timer Driver v0.05
[ 4.761165] i6300esb: initialized (0x000000009f3c0029). heartbeat=30 sec (nowayout=0)
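The second line carries the two values that matter most here: `heartbeat` is the timeout, in seconds, that the driver programs into the timer, so keepalive writes must arrive well within it, and `nowayout=0` means the timer can be disarmed with a magic close. The sketch below extracts both from the kernel log. The function name is hypothetical, and the regular expression is written against the exact line format shown above, so other kernel versions may need a different pattern.

```shell
#!/usr/bin/env bash
# Hypothetical parser for the i6300esb initialization line.
# Usage: dmesg | parse_i6300esb_init
# Prints "heartbeat=<seconds> nowayout=<0|1>" for each matching line.
parse_i6300esb_init() {
  sed -n -E 's/.*i6300esb: initialized.*heartbeat=([0-9]+) sec \(nowayout=([01])\).*/heartbeat=\1 nowayout=\2/p'
}
```

With the output shown above, it prints `heartbeat=30 nowayout=0`.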
Testing
To test that the watchdog service is configured properly and works as expected, trigger a system crash and check that the instance reboots.
As root, run the following command. It flushes file system buffers and then uses the magic SysRq `c` trigger to crash the kernel through a NULL pointer dereference:
sync; sleep 2; sync; echo c > /proc/sysrq-trigger
Your instance should automatically reboot in about a minute.
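Because the crash drops your SSH session immediately, a reboot can be hard to tell apart from a lost connection. One way to confirm it, sketched below under assumptions, is to save the kernel's boot ID (/proc/sys/kernel/random/boot_id, which the kernel regenerates on every boot) before running the crash command and compare it after you reconnect. The function names and the /root/boot_id.before state file are hypothetical; the file lives under /root rather than /tmp because /tmp may be cleared at boot.

```shell
#!/usr/bin/env bash
# Hypothetical helpers to confirm that a reboot actually happened.
# save_boot_id  [STATE_FILE] [SOURCE]: run before the crash command.
# confirm_rebooted [STATE_FILE] [SOURCE]: run after reconnecting.
save_boot_id() {
  local state="${1:-/root/boot_id.before}"
  local src="${2:-/proc/sys/kernel/random/boot_id}"
  cat "$src" > "$state"
}

confirm_rebooted() {
  local state="${1:-/root/boot_id.before}"
  local src="${2:-/proc/sys/kernel/random/boot_id}"
  if [ "$(cat "$state")" != "$(cat "$src")" ]; then
    echo "boot ID changed: the instance rebooted"
  else
    echo "boot ID unchanged: no reboot detected" >&2
    return 1
  fi
}
```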
Conclusion
In this article, you installed the Linux watchdog service with a hardware watchdog device on an Ubuntu 22.04 server, configured the wd_keepalive service, and tested that the instance reboots automatically after a system crash. For more information, see the following resources:
* [Linux watchdog](https://linux.die.net/man/8/watchdog)
* [watchdog.conf](https://linux.die.net/man/5/watchdog.conf)
* [wd_keepalive](https://linux.die.net/man/8/wd_keepalive)