OpenHab crashing with Z-Wave FIX IT FIX IT FIX IT FIX IT!! -

So, my openhab system periodically decides to leave the building. Appears there is a problem from time to time when the z-wave binding loses communication to the z-wave stick it gets upset and tells openhab to take a hike.

This is bad. Once because it exposed something I missed in my fault tolerance. I had compensated for network issues and full machine failover. But the actual process going belly up…. ooops. My Bad.

Soooo I see it crash while at the gym today and the only thing in my head….

So I appear to have done that.

Let me bring you up to speed on the current state of my home automation. After the great NAS failing of 2015 I was forced to reduce some of my virtual environment. I have not brought my secondary HA controller back online yet. However, it appears that still using keepalived I am able to help address this random problem.

I have added in a new option in my keepalived.conf

 1    script "/usr/local/sbin/healthcheck.sh"
 2    interval 10 # check every 10 seconds
 3    fall 2 # require 2 failures for KO
 4    rise 2 # require 2 successes for OK
 5}
 6
 7vrrp_instance VI_1 {
 8   state MASTER
 9   interface eth0
10   virtual_router_id 220
11   priority 150
12   notify /usr/local/sbin/notify-keepalived.sh
13   advert_int 1
14   authentication {
15        auth_type PASS
16        auth_pass fakepass
17   }
18
19   virtual_ipaddress {
20      192.168.2.90
21   }
22   track_script {
23     chk_hahealth
24   }
25}

So what this does is add a keepalived health check. Every 10 seconds keepalived runs the script /usr/local/sbin/healthcheck.sh and gets an exit code of 0 or 1. 0 if all is good. 1 if the world fell apart.

Environmental concept. Some images in montage provided by NASA (http://visibleearth.nasa.gov/)

The code for this script is

 1#!/bin/sh
 2SERVICE=openhab;
 3
 4if ps ax | grep -v grep | grep $SERVICE &amp;gt; /dev/null
 5then
 6 echo "$SERVICE service running, everything is fine"
 7 /usr/bin/logger "$SERVICE service running, everything is fine"
 8 exit 0
 9else
10 echo "$SERVICE is not running"
11 /usr/bin/logger "$SERVICE is not running"
12 /etc/init.d/openhab restart
13 exit 1
14fi

Explanation:

So this script just checks to see if the openhab process is running. If its good, exit 0. If its not, exit 1 but go ahead and try to restart openhab. When keepalived gets the exit 1 code it keeps track of it. You will see in the config that there is a fall 2 line. That means that if there are 2 exit 1 status’s keepalived will go into a failed state. When the second HA box is back online this will force openhab to move over to the other one. However, I have not seen this happen so far as openhab loads pretty quick so since there is 10 seconds between the checks the second check comes back with an exit 0 and resets the fall count.

OpenHab Crashing With Z-Wave FIX IT FIX IT FIX IT FIX IT!!

Copyright

Comments

OpenHab Crashing With Z-Wave FIX IT FIX IT FIX IT FIX IT!!

Copyright

Related Posts

Comments