How To Configure Online Failover/Failback on CentOS 6 Using Heartbeat


This article explains how to configure failover/failback using Heartbeat. According to the Linux-HA project (http://linux-ha.org/wiki/Heartbeat):

Heartbeat is a daemon that provides cluster infrastructure (communication and membership) services to its clients. This allows clients to know about the presence (or disappearance!) of peer processes on other machines and to easily exchange messages with them.

In this guide, I build two systems for online failover. Both run CentOS 6 64-bit. For clarity, these are the details of my systems:

# Server 1
Hostname   : node1
Domain     : imanudin.net
IP Address : 192.168.80.91

# Server 2
Hostname   : node2
Domain     : imanudin.net
IP Address : 192.168.80.92

# Alias IP for online failover testing
IP Address : 192.168.80.93

# Configure Network

First, configure the network on CentOS. This guide assumes your network interface is named eth0. Apply the following configuration on both nodes (node1 and node2). On node2, change IPADDR to 192.168.80.92.

vi /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none
IPADDR=192.168.80.91
NETMASK=255.255.255.0
DNS1=192.168.80.91
GATEWAY=192.168.80.11
DNS2=192.168.80.11
DNS3=8.8.8.8
USERCTL=no

Restart the network service and enable it at boot on both nodes (node1 and node2):

service network restart
chkconfig network on
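The two nodes' ifcfg-eth0 files are identical except for IPADDR. One way to avoid typos on node2 is to generate the file from a single template. The sketch below uses a hypothetical helper, write_ifcfg, that takes the output path and the node's IP address. All other values are copied from the configuration above.

```shell
# Hypothetical helper: write an ifcfg file for one node.
# Usage: write_ifcfg OUTFILE IPADDR
# Only IPADDR varies; every other value matches the article's configuration.
write_ifcfg() {
    out="$1"
    ip="$2"
    cat > "$out" <<EOF
DEVICE=eth0
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none
IPADDR=$ip
NETMASK=255.255.255.0
DNS1=192.168.80.91
GATEWAY=192.168.80.11
DNS2=192.168.80.11
DNS3=8.8.8.8
USERCTL=no
EOF
}
```

On node2, for example, you would run `write_ifcfg /etc/sysconfig/network-scripts/ifcfg-eth0 192.168.80.92` and then restart the network service.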

# Disable SELinux & firewall on all nodes (node1 and node2)

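Open /etc/sysconfig/selinux and change SELINUX=enforcing to SELINUX=disabled. This setting takes effect after a reboot. In the meantime, setenforce 0 switches the running system to permissive mode. Also stop and disable the iptables and ip6tables services: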

setenforce 0
service iptables stop
service ip6tables stop
chkconfig iptables off
chkconfig ip6tables off
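The SELinux edit can also be made non-interactively with sed. On CentOS 6, /etc/sysconfig/selinux is normally a symlink to /etc/selinux/config (check with `ls -l` on your system). Running GNU `sed -i` on the symlink would replace it with a regular file. For that reason, this sketch edits the target file directly. The function name is hypothetical.

```shell
# Hypothetical helper: set SELINUX=disabled in the SELinux config file.
# Defaults to /etc/selinux/config (assumed symlink target of /etc/sysconfig/selinux).
disable_selinux_config() {
    cfg="${1:-/etc/selinux/config}"
    # Replace the current mode (enforcing or permissive) with disabled.
    # The '=' right after SELINUX keeps SELINUXTYPE= lines untouched.
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}
```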

# Configure /etc/hosts and hostname on all nodes (node1 and node2)

Open /etc/hosts and configure it as follows:

# node1
127.0.0.1     localhost
192.168.80.91 node1.imanudin.net node1
192.168.80.92 node2.imanudin.net node2

# node2
127.0.0.1     localhost
192.168.80.91 node1.imanudin.net node1
192.168.80.92 node2.imanudin.net node2
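Both nodes need the same two entries. If you script this step, it helps to make it idempotent so that re-running it does not duplicate lines. The sketch below uses a hypothetical helper that appends an entry only when the IP is not already the first field of some line.

```shell
# Hypothetical helper: append "IP NAME..." to a hosts file unless IP is already listed.
# Usage: add_host_entry FILE IP NAME...
add_host_entry() {
    file="$1"
    ip="$2"
    shift 2
    # awk exits 0 only if some line already has this IP as its first field.
    if ! awk -v ip="$ip" '$1 == ip { found = 1 } END { exit !found }' "$file"; then
        printf '%s %s\n' "$ip" "$*" >> "$file"
    fi
}
```

For example, `add_host_entry /etc/hosts 192.168.80.91 node1.imanudin.net node1` followed by the same call for node2, run on each node.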

As root, run the following command to set the hostname for the current session. Then edit /etc/sysconfig/network to make the change persistent.

On node1

hostname node1.imanudin.net
vi /etc/sysconfig/network

Change HOSTNAME so that the file reads as follows:

NETWORKING=yes
HOSTNAME=node1.imanudin.net

On node2

hostname node2.imanudin.net
vi /etc/sysconfig/network

Change HOSTNAME so that the file reads as follows:

NETWORKING=yes
HOSTNAME=node2.imanudin.net
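The persistent part of this step can be scripted as well. The sketch uses a hypothetical helper that replaces an existing HOSTNAME= line in /etc/sysconfig/network, or appends one if the line is missing.

```shell
# Hypothetical helper: set HOSTNAME=NAME in a sysconfig network file.
# Usage: set_hostname_config FILE NAME
set_hostname_config() {
    file="$1"
    name="$2"
    if grep -q '^HOSTNAME=' "$file"; then
        # Replace the existing value in place.
        sed -i "s/^HOSTNAME=.*/HOSTNAME=$name/" "$file"
    else
        # No HOSTNAME line yet: add one.
        printf 'HOSTNAME=%s\n' "$name" >> "$file"
    fi
}
```

On node1 this would be `hostname node1.imanudin.net` followed by `set_hostname_config /etc/sysconfig/network node1.imanudin.net`.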

# Update repos and install the Heartbeat package on all nodes (node1 and node2)

yum update
yum install epel-release
yum -y install heartbeat

If the epel-release package is not available, install the EPEL repository package directly from: http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

# Configure Heartbeat

– Create the file /etc/ha.d/ha.cf (on node1 only)

vi /etc/ha.d/ha.cf

Fill it with the following lines:

keepalive 2
warntime 5
deadtime 15
initdead 90
udpport 694
auto_failback on
ucast eth0 192.168.80.92
logfile /var/log/ha-log
node node1.imanudin.net node2.imanudin.net

Note:

eth0 is the network interface on your system. If your system uses a different interface name, such as eth1, change eth0 accordingly. 192.168.80.92 is the IP address of node2.
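The two ha.cf files differ only in the ucast line, which must point at the other node. Generating the file per node removes the later manual edit on node2. For reference, the timing values are in seconds. keepalive is the interval between heartbeats. warntime is when a late heartbeat is logged. deadtime is when the peer is declared dead. initdead is the longer deadtime used right after startup. The names on the node line must match the output of `uname -n` on each machine, which is why the hostname step matters. The helper below is hypothetical.

```shell
# Hypothetical helper: write ha.cf for one node.
# Usage: write_ha_cf OUTFILE PEER_IP [IFACE]
# PEER_IP is the *other* node's address; IFACE defaults to eth0.
write_ha_cf() {
    out="$1"
    peer="$2"
    iface="${3:-eth0}"
    cat > "$out" <<EOF
keepalive 2
warntime 5
deadtime 15
initdead 90
udpport 694
auto_failback on
ucast $iface $peer
logfile /var/log/ha-log
node node1.imanudin.net node2.imanudin.net
EOF
}
```

On node1 this would be `write_ha_cf /etc/ha.d/ha.cf 192.168.80.92`, and on node2 `write_ha_cf /etc/ha.d/ha.cf 192.168.80.91`.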

– Create the file /etc/ha.d/authkeys (on node1 only)

vi /etc/ha.d/authkeys

Fill it with the following lines:

auth 2
2 crc

Change the permissions of authkeys. Heartbeat refuses to use the file if other users can read it.

chmod 0600 /etc/ha.d/authkeys
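In the file above, "auth 2" selects key number 2, which uses crc. crc only detects corrupted packets and does not authenticate the peer. On a network you do not fully trust, a shared sha1 key is the stronger option. The sketch below, with a hypothetical helper name, generates a random key from /dev/urandom and writes the file with mode 0600. Because the file is then copied to node2 with scp, both nodes end up with the same key.

```shell
# Hypothetical helper: write an authkeys file that uses sha1 with a random key.
# Usage: write_authkeys OUTFILE
write_authkeys() {
    out="$1"
    # 32 hex characters: the md5 digest of 512 random bytes.
    key=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | md5sum | cut -d' ' -f1)
    # umask 077 makes a newly created file private from the start.
    ( umask 077 && printf 'auth 1\n1 sha1 %s\n' "$key" > "$out" )
    # chmod also covers the case where the file already existed.
    chmod 0600 "$out"
}
```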

– Create the file /etc/ha.d/haresources (on node1 only)

vi /etc/ha.d/haresources

Fill it with the following line:

node1.imanudin.net IPaddr::192.168.80.93/24/eth0:0

Note:

node1.imanudin.net is the master (preferred) server. 192.168.80.93 is the alias IP used to test online failover.

# Copy these files from node1 to node2 (run the following commands on node1)

cd /etc/ha.d/
scp authkeys ha.cf haresources root@192.168.80.92:/etc/ha.d/

# Change the ha.cf file on node2 (run the following command on node2)

vi /etc/ha.d/ha.cf

Change the line ucast eth0 192.168.80.92 to:

ucast eth0 192.168.80.91

192.168.80.91 is the IP address of node1.
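The ucast change on node2 can also be made with sed instead of vi. The hypothetical helper below rewrites the ucast line for a given interface to point at a new peer address and leaves all other lines untouched.

```shell
# Hypothetical helper: point the ucast line for IFACE at PEER_IP.
# Usage: set_ucast_peer FILE IFACE PEER_IP
set_ucast_peer() {
    file="$1"
    iface="$2"
    peer="$3"
    sed -i "s/^ucast $iface .*/ucast $iface $peer/" "$file"
}
```

On node2 this would be `set_ucast_peer /etc/ha.d/ha.cf eth0 192.168.80.91`.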

# Start the Heartbeat service and enable it at boot on all nodes (node1 and node2)

service heartbeat start
chkconfig heartbeat on

TESTING ONLINE FAILOVER/FAILBACK

After you start the Heartbeat service on both nodes, the alias IP appears on node1. Check this with the ifconfig command.

To test failover, stop Heartbeat on node1 (service heartbeat stop). Then run ifconfig on node2: the alias IP should now appear there, because node2 has taken it over.

To test failback, start Heartbeat on node1 again (service heartbeat start). Because auto_failback is on, node1 automatically takes the alias IP back.
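During these tests it is handy to have a yes/no check for which node holds the alias IP. On CentOS 6, ifconfig prints addresses as "inet addr:...". This sketch therefore parses the more regular one-line output of `ip -4 -o addr show` (from iproute) instead. The function name is hypothetical. It reads that output on standard input and succeeds only if the exact address is present.

```shell
# Hypothetical helper: succeed if VIP appears in `ip -4 -o addr show` output on stdin.
# Usage: ip -4 -o addr show | holds_vip 192.168.80.93
holds_vip() {
    # Look for an "inet" field followed by "VIP/<prefix>"; the trailing "/"
    # stops 192.168.80.9 from matching 192.168.80.93/24.
    awk -v vip="$1" '
        { for (i = 1; i < NF; i++) if ($i == "inet" && index($(i + 1), vip "/") == 1) found = 1 }
        END { exit !found }'
}
```

Running `ip -4 -o addr show | holds_vip 192.168.80.93 && echo "alias IP is on this node"` on each node shows where the address currently lives.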

TESTING WITH APACHE WEB SERVER

Install Apache on all nodes:

yum install httpd

– Create index.html in the DocumentRoot on node1

vi /var/www/html/index.html

Fill it with the following text, for example:

This is node1

Save the file and restart Apache:

service httpd restart

Open node1 (http://192.168.80.91) in a browser. You should see the text "This is node1".

– Create index.html in the DocumentRoot on node2

vi /var/www/html/index.html

Fill it with the following text, for example:

This is node2

Save the file and restart Apache:

service httpd restart

Open node2 (http://192.168.80.92) in a browser. You should see the text "This is node2".
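Because each test page only needs to name its own node, one command can serve both machines. This sketch uses a hypothetical helper that writes the page from the short hostname (`hostname -s`) unless a name is passed explicitly.

```shell
# Hypothetical helper: write "This is NAME" to DOCROOT/index.html.
# Usage: write_index DOCROOT [NAME]   (NAME defaults to the short hostname)
write_index() {
    docroot="$1"
    name="${2:-$(hostname -s)}"
    printf 'This is %s\n' "$name" > "$docroot/index.html"
}
```

Running `write_index /var/www/html` on each node produces "This is node1" and "This is node2" respectively, given the hostnames set earlier.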

Integrate Apache with Heartbeat

Edit /etc/ha.d/haresources on all nodes:

vi /etc/ha.d/haresources

so that it reads as follows:

node1.imanudin.net IPaddr::192.168.80.93/24/eth0:0 httpd
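The haresources line has three parts: the preferred node, the IPaddr resource for the alias IP, and then any services. Heartbeat starts the resources from left to right on takeover and stops them in reverse order. Listing httpd after the IP therefore brings the address up before Apache starts. A service name such as httpd refers to its init script. This sketch, with a hypothetical helper name, builds the line so that the same file can be written on both nodes.

```shell
# Hypothetical helper: write a one-line haresources file.
# Usage: write_haresources OUTFILE PRIMARY VIP/PREFIX/IFACE:ALIAS [SERVICE...]
write_haresources() {
    out="$1"
    primary="$2"
    vip="$3"
    shift 3
    line="$primary IPaddr::$vip"
    # Append any service resources after the IP resource.
    if [ "$#" -gt 0 ]; then
        line="$line $*"
    fi
    printf '%s\n' "$line" > "$out"
}
```

`write_haresources /etc/ha.d/haresources node1.imanudin.net 192.168.80.93/24/eth0:0 httpd` reproduces the line above. Omitting httpd reproduces the IP-only version used earlier.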

Stop Apache and disable it at boot on all nodes, because Heartbeat will now manage Apache:

service httpd stop
chkconfig httpd off

Open the alias IP (http://192.168.80.93) in a browser. You should see the text "This is node1".

Now stop the Heartbeat service on node1 and refresh the browser. You should see "This is node2", because node2 has taken over all resources that Heartbeat was managing on node1.

For failback, start the Heartbeat service on node1 again. node1 then takes back all Heartbeat-managed resources from node2.

You can also experiment with online failover for other services, such as Samba, MySQL, or MariaDB. Note that Heartbeat only handles failover/failback; it does not synchronize data between the nodes.

Good luck, and I hope this is useful 😀

You can also watch the video on YouTube.

49 comments

  1. Hi Iman,
    after startup my Zimbra can't run, maybe because DRBD mounts nothing. Here is the status after startup:

    [root@n1 ~]# service drbd status
    drbd driver loaded OK; device status:
    version: 8.3.16 (api:88/proto:86-97)
    GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
    m:res cs ro ds p mounted fstype
    0:r0 Connected Primary/Secondary UpToDate/UpToDate C

  2. Hi,
    I can't start the Zimbra service even though I put it in haresources. This is my haresources:
    node1.aaa.net IPaddr::192.168.1.100/24/eth0:0 zimbra

    1. Hi Rom,

      Please send me the ha-log file from /var/log/. I will try it again in my environment and make a video for documentation.

  3. Hi Ahmad,

    What is the difference between ucast eth0 <IP address> and bcast eth0 in the ha.cf file? I saw another tutorial that uses bcast eth0 without an IP address instead of ucast eth0 with an IP address.

    I also get a lot of errors like this in ha-log:
    heartbeat: [2533]: ERROR: Message hist queue is filling up (500messages in queue )
    What does that mean?

    1. Hi Kazi,

      I don't know exactly what the difference between them is. Technically, though, bcast only reaches the 254 addresses of the local subnet when the address uses /24. For example:

      node1 : 192.168.1.1/24
      node2 : 192.168.2.1/24
      

      Node1 will not find node2 here because bcast is used instead of ucast.

      Correct me if I'm wrong (CMIIW).

  4. Mas Udin,

    I have this case:

    When the virtual IP is on node1 and the network on node1 is restarted, the virtual IP disappears. How can I keep the virtual IP from disappearing when the network is restarted?

    Thanks

  5. Mas, if node1 is shut down during the test, what happens to node2?

    I tried shutting it down and the service did not run. If Heartbeat is stopped instead, it runs automatically and one of the nodes becomes primary.

    1. Hi mas,

      Node2 will take over if node1 goes down (stopping the Heartbeat service is enough). If it does not take over automatically, check the log in /var/log/ha-log

      1. hi mas,

        Oh right, with a shutdown the service does take over to node2.
        But if I disconnect the network or cut the power directly, it is not handled yet. Is there another setting?

  6. Hi Iman, generally for HA or failover we use a shared LUN configured on both nodes. Without it, how is failover possible?

    1. Hi,

      Online failover/failback does not require a shared LUN or anything else. If you are thinking about HA, then yes, you should have shared storage, and you can use DRBD for that 😉

  7. Hi Iman, I need your help, please. When I first configured Heartbeat and tested it with Apache, everything was OK. Then I installed DRBD, but when I configured Heartbeat with Zimbra and named, it didn't work. I stopped DRBD and tested Heartbeat with only Apache, but that didn't work either. I don't know what to do. Thanks for helping.

  8. Hi Iman, great tutorial!
    I have a question about the virtual interface: can I put it on another network interface (e.g. eth1) with another subnet? And what about the gateway?
    Regards!
    Sorry for my English (I speak Spanish)

  9. Hi Iman. Thank you for this post. I have a question.
    I have two servers on which I installed Zimbra 8.6. Server A is the master and server B is the slave. Server A is on with no problem, but server B has a problem. When I run zmcontrol restart, LDAP cannot start because the LDAP server address is ldap://test.com:389, and test.com resolves to the common (shared) IP.
    I decided to change zmlocalconfig, setting ldap_url to ldap://127.0.0.1:389, and then LDAP could start.
    Is my solution correct?
    Do you have any other solution?
    Thanks.

  10. Hello Mas Iman, I followed the tutorial above step by step and everything works well: when node 1 goes down, the IP moves straight to node 2, which creates eth0:0. However, I have a problem: why can't the Heartbeat IP be accessed? When I ping it, I get "request timed out".

    1. Hi mas,

      Try pinging the alias IP from node1 and node2 to see whether it responds. If it doesn't, check again whether a firewall or something else is blocking it.

  11. Hello Mas Iman, I tried the above and it works. My case is on a public IP network: I want to use the Heartbeat IP, which sits in front of the 2 public IPs of the servers behind it, for a domain, so that the service is accessed by domain name. How would I do that? Please advise.

    1. Hi Mas,

      You can use 1 name with 2 public IPs. For example, the name mail.imanudin.net has 2 public IPs. Both of those public IPs are Heartbeat IPs.

  12. Hi, Mr. Imanuddin. I want to try implementing this at the company where I work. To make communication easier, may I have your contact details?

  13. Hi Iman,
    What if I want node2 to take over from node1 when node1 suddenly "disappears", e.g. after an unexpected shutdown?

    Based on what I’ve tried, node2 won’t take over because it cannot communicate with the heartbeat service on node1.

      1. Hi,
        Yes, I'm aware of that, but my case is that if node1 shuts down improperly (e.g. a corrupt OS), node2 won't take over.

        If I shut down node1 properly, then yes, node2 will take over.

          1. Hi Iman,

            Been some time:)
            I want node2 to take over if node1 is missing, for whatever reason.
            But the thing is, node2 WILL ONLY take over if node1 is shut down properly (meaning the Heartbeat service is stopped properly).

  14. Is there no heartbeat package available in epel-release?
    How do I install and configure HA for Zimbra?
    Please post a guide for CentOS 7.

  15. Hi Iman,
    I am using a configuration with 1 primary (node1) and 1 secondary (node2) server. MySQL always runs on the active node. On failover, when the primary server switches to the secondary, MySQL starts on the secondary (node2) and stops on the primary (node1).

    auto_failback is set to off.
    I ran into an issue where the primary server logs show:

    as1 CRIT: Cluster node as2 returning after partition.
    as1 heartbeat: [13375]: info: For information on cluster partitions, See URL: http://linux-ha.org/SplitBrain
    as1 heartbeat: [13375]: WARN: Deadtime value may be too small.
    as1 info: See FAQ for information on tuning deadtime.

    After this, Heartbeat on the primary server restarted and MySQL stopped, as expected.
    On the secondary server, failover completed: MySQL started and worked fine.
    But after some time, I again see the resource manager triggering a MySQL start:
    as2 ResourceManager[10871]: info: Running /etc/ha.d/resource.d/mysqld start
    This causes a long glitch. Could you please help me with that?

  16. Hi Iman,

    Very nicely written. Congratulations. I needed this for a customer running an old setup. Works well on centos6.

  17. How can I implement HA for Zimbra 8.6 on CentOS 7, given that there is no heartbeat package available for CentOS 7?
