How To Configure Online Failover/Failback on CentOS 6 Using Heartbeat

This article explains how to configure failover/failback using the Heartbeat application. According to the Linux-HA wiki (http://linux-ha.org/wiki/Heartbeat):

Heartbeat is a daemon that provides cluster infrastructure (communication and membership) services to its clients. This allows clients to know about the presence (or disappearance!) of peer processes on other machines and to easily exchange messages with them.

In this guide, I build two systems for online failover. Both run CentOS 6 64-bit. For reference, these are the details of my systems:

# Server 1
Hostname   : node1
Domain     : imanudin.net
IP Address : 192.168.80.91

# Server 2
Hostname   : node2
Domain     : imanudin.net
IP Address : 192.168.80.92

# Alias IP for online failover testing
IP Address : 192.168.80.93

# Configure Network

First, configure the network on CentOS. This assumes your network interface is named eth0. Apply the following configuration on all nodes (node1 and node2), adjusting IPADDR on node2
[code lang=’bash’]
vi /etc/sysconfig/network-scripts/ifcfg-eth0
[/code]

DEVICE=eth0
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none
IPADDR=192.168.80.91
NETMASK=255.255.255.0
DNS1=192.168.80.91
GATEWAY=192.168.80.11
DNS2=192.168.80.11
DNS3=8.8.8.8
USERCTL=no

Restart the network service and enable it at boot on all nodes (node1 and node2)
[code lang=’bash’]
service network restart
chkconfig network on
[/code]
# Disable SELinux & Firewall on all nodes (node1 and node2)

Open the file /etc/sysconfig/selinux and change SELINUX=enforcing to SELINUX=disabled. This takes effect at the next reboot; the setenforce 0 command below switches SELinux to permissive mode in the meantime. Also disable the iptables and ip6tables services.
[code lang=’bash’]
setenforce 0
service iptables stop
service ip6tables stop
chkconfig iptables off
chkconfig ip6tables off
[/code]
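
The file edit described above can also be done without opening an editor. The following is a minimal sketch, not the author's method: the function name `disable_selinux_in` is made up for illustration, and it runs against a scratch copy so it is safe to try anywhere. It targets /etc/selinux/config on the assumption that /etc/sysconfig/selinux is a symlink to that file, as it normally is on CentOS 6; running `sed -i` on the symlink itself would replace the link with a regular file.

```shell
#!/bin/sh
# Switch SELINUX=enforcing to SELINUX=disabled in the given config file.
# A backup with a .bak suffix is kept next to it.
disable_selinux_in() {
    sed -i.bak 's/^SELINUX=enforcing$/SELINUX=disabled/' "$1"
}

# Demonstrate on a scratch copy so the sketch is safe to run anywhere.
demo=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$demo"
disable_selinux_in "$demo"
cat "$demo"
rm -f "$demo" "$demo.bak"

# On a real node (as root): disable_selinux_in /etc/selinux/config
```
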
# Configure /etc/hosts and hostname on all nodes (node1 and node2)

Open the file /etc/hosts and configure it as follows

# node1
127.0.0.1     localhost
192.168.80.91 node1.imanudin.net node1
192.168.80.92 node2.imanudin.net node2

# node2
127.0.0.1     localhost
192.168.80.91 node1.imanudin.net node1
192.168.80.92 node2.imanudin.net node2
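
If you prefer to script this step, the entries can be appended only when they are missing, so the script is safe to run twice. This is a sketch under assumptions: `add_host` is a hypothetical helper name, and the demo writes to a scratch file rather than the real /etc/hosts.

```shell
#!/bin/sh
# Append "IP FQDN SHORTNAME" to a hosts file unless the FQDN is already listed.
add_host() {
    file=$1 ip=$2 fqdn=$3 short=$4
    grep -qwF -- "$fqdn" "$file" 2>/dev/null ||
        printf '%s %s %s\n' "$ip" "$fqdn" "$short" >> "$file"
}

# Demonstrate on a scratch file instead of /etc/hosts.
hosts=$(mktemp)
printf '127.0.0.1     localhost\n' > "$hosts"
add_host "$hosts" 192.168.80.91 node1.imanudin.net node1
add_host "$hosts" 192.168.80.92 node2.imanudin.net node2
add_host "$hosts" 192.168.80.91 node1.imanudin.net node1   # already present, nothing is added
cat "$hosts"
rm -f "$hosts"
```

On a real node, pass /etc/hosts as the first argument and run it as root.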

Run the following commands as root, and edit the file /etc/sysconfig/network to change the hostname

– On node1
[code lang=’bash’]
hostname node1.imanudin.net
vi /etc/sysconfig/network
[/code]
Change HOSTNAME so that it looks like this:

NETWORKING=yes
HOSTNAME=node1.imanudin.net

– On node2
[code lang=’bash’]
hostname node2.imanudin.net
vi /etc/sysconfig/network
[/code]
Change HOSTNAME so that it looks like this:

NETWORKING=yes
HOSTNAME=node2.imanudin.net

# Update repos and install the Heartbeat package on all nodes (node1 and node2)

[code lang=’bash’]
yum update
yum install epel-release
yum -y install heartbeat
[/code]

If you cannot get the EPEL repo this way, install it from this package instead: http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

# Configure Heartbeat

– Create the file /etc/ha.d/ha.cf (on node1 only is enough)
[code lang=’bash’]
vi /etc/ha.d/ha.cf
[/code]
Fill it with the following lines

keepalive 2
warntime 5
deadtime 15
initdead 90
udpport 694
auto_failback on
ucast eth0 192.168.80.92
logfile /var/log/ha-log
node node1.imanudin.net node2.imanudin.net

Note :

eth0 is the interface on your system. If your system uses eth1 as the interface name, change eth0 to eth1. 192.168.80.92 is the IP address of node2
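
Since the two nodes' ha.cf files differ only in the ucast peer address, the file can be generated from one template. The sketch below is an illustration, not part of the original setup: `render_ha_cf` is a hypothetical helper, and the comments it emits are my own short summaries of what each timing directive does. The names on the `node` line have to match what `uname -n` prints on each machine, which is why the hostnames were set to the full names earlier.

```shell
#!/bin/sh
# Print an ha.cf for one node. $1 is the peer's IP address (the node this
# host sends unicast heartbeats to), $2 the interface (defaults to eth0).
render_ha_cf() {
    peer_ip=$1
    iface=${2:-eth0}
    cat <<EOF
# seconds between heartbeat packets
keepalive 2
# seconds of silence before a "late heartbeat" warning is logged
warntime 5
# seconds of silence before the peer is declared dead and failover starts
deadtime 15
# like deadtime, but used right after startup while the network comes up
initdead 90
udpport 694
# give resources back to the preferred node when it returns
auto_failback on
ucast $iface $peer_ip
logfile /var/log/ha-log
node node1.imanudin.net node2.imanudin.net
EOF
}

# node1 points at node2, and node2 points at node1.
render_ha_cf 192.168.80.92          # contents for node1
# render_ha_cf 192.168.80.91        # contents for node2
```

Redirect the output to /etc/ha.d/ha.cf (as root) once it looks right.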

– Create the file /etc/ha.d/authkeys (on node1 only is enough)
[code lang=’bash’]
vi /etc/ha.d/authkeys
[/code]
Fill it with the following lines

auth 2
2 crc

Change the permissions of authkeys
[code lang=’bash’]
chmod 0600 /etc/ha.d/authkeys
[/code]
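
The crc method used above only detects corrupted packets and does not authenticate the peer. Heartbeat's authkeys file also accepts md5 and sha1 with a shared secret. The following sketch shows one way to generate such a file; it is an alternative to the article's setup, and the helper name `write_authkeys` and the way the secret is derived are my own choices. The same file must end up on both nodes.

```shell
#!/bin/sh
# Write an authkeys file that uses sha1 with a random shared secret,
# then restrict it to owner read/write (mode 600, as Heartbeat expects).
write_authkeys() {
    target=$1
    secret=$(head -c 32 /dev/urandom | sha1sum | cut -d' ' -f1)
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$secret" > "$target" )
    chmod 0600 "$target"
}

# Demonstrate on a scratch file.
demo=$(mktemp)
write_authkeys "$demo"
cat "$demo"
rm -f "$demo"

# On node1 (as root): write_authkeys /etc/ha.d/authkeys, then copy it to node2.
```
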
– Create the file /etc/ha.d/haresources (on node1 only is enough)
[code lang=’bash’]
vi /etc/ha.d/haresources
[/code]
Fill it with the following line

node1.imanudin.net IPaddr::192.168.80.93/24/eth0:0

Note :

node1.imanudin.net becomes the master server. 192.168.80.93 is the alias IP for testing online failover
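
To make the layout of that line easier to read, here is a small sketch that splits it into labelled fields. `explain_haresources` is a hypothetical helper written for this article's layout only (preferred node, then one IPaddr resource, then optional service names); it is not a general haresources parser.

```shell
#!/bin/sh
# Split a haresources line of the form
#   <preferred-node> IPaddr::<ip>/<prefix-bits>/<interface> [service ...]
# into labelled fields. Covers only the layout used in this article.
explain_haresources() {
    line=$1
    node=${line%% *}
    rest=${line#* }
    ipspec=${rest%% *}
    ipspec=${ipspec#IPaddr::}
    ip=${ipspec%%/*}
    tail=${ipspec#*/}
    bits=${tail%%/*}
    iface=${tail#*/}
    services=${rest#* }
    # No space left in $rest means no services were listed.
    [ "$services" = "$rest" ] && services=""
    printf 'preferred node: %s\nvirtual IP:     %s\nprefix length:  %s\ninterface:      %s\nservices:       %s\n' \
        "$node" "$ip" "$bits" "$iface" "${services:-(none)}"
}

explain_haresources 'node1.imanudin.net IPaddr::192.168.80.93/24/eth0:0'
```

The file has to be identical on both nodes. The first field names the node that should normally hold the resources, not the node the file lives on.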

# Copy those files from node1 to node2 (run the following command on node1)
[code lang=’bash’]
cd /etc/ha.d/
scp authkeys ha.cf haresources root@192.168.80.92:/etc/ha.d/
[/code]
# Change ha.cf file on node2 (run the following command on node2)
[code lang=’bash’]
vi /etc/ha.d/ha.cf
[/code]
Change the line ucast eth0 192.168.80.92 so that it becomes

ucast eth0 192.168.80.91

192.168.80.91 is the IP address of node1
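
The same change can be made non-interactively. This is a minimal sketch that assumes the ucast line is exactly as shown in this article; the function name `point_ucast_at_node1` is made up, and the demo works on a scratch copy.

```shell
#!/bin/sh
# Point the ucast line in the given ha.cf at node1 instead of node2.
# A .bak copy of the original file is left beside it.
point_ucast_at_node1() {
    sed -i.bak 's/^ucast eth0 192\.168\.80\.92$/ucast eth0 192.168.80.91/' "$1"
}

# Demonstrate on a scratch copy.
demo=$(mktemp)
printf 'auto_failback on\nucast eth0 192.168.80.92\n' > "$demo"
point_ucast_at_node1 "$demo"
grep '^ucast' "$demo"
rm -f "$demo" "$demo.bak"

# On node2 (as root): point_ucast_at_node1 /etc/ha.d/ha.cf
```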

# Start the Heartbeat service and enable it at boot on all nodes (node1 and node2)
[code lang=’bash’]
service heartbeat start
chkconfig heartbeat on
[/code]
TESTING ONLINE FAILOVER/FAILBACK

After you start the Heartbeat service on all nodes, the alias IP appears on node1; check with the ifconfig command. To test failover, stop the Heartbeat service on node1 (service heartbeat stop), then run ifconfig on node2: the alias IP has now been taken over by node2. To test failback, start the Heartbeat service on node1 again (service heartbeat start). The alias IP is automatically taken back by node1.
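
Instead of reading the ifconfig output by eye, the check can be scripted. The sketch below uses the `ip` command rather than ifconfig and reads its output from stdin, so it can be tried without a cluster; `has_vip` is a hypothetical helper and the sample text only imitates the shape of `ip -o -4 addr show` output.

```shell
#!/bin/sh
# Succeed if the given IP appears in `ip -o -4 addr show` output on stdin.
has_vip() {
    grep -qF "inet $1/"
}

# Sample input shaped like `ip -o -4 addr show` on the active node.
sample='2: eth0    inet 192.168.80.91/24 brd 192.168.80.255 scope global eth0
2: eth0    inet 192.168.80.93/24 brd 192.168.80.255 scope global secondary eth0:0'

if printf '%s\n' "$sample" | has_vip 192.168.80.93; then
    echo "virtual IP is on this node"
else
    echo "virtual IP is not on this node"
fi

# On a real node: ip -o -4 addr show | has_vip 192.168.80.93
```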

TESTING WITH APACHE WEB SERVER

Please install Apache on all nodes
[code lang=’bash’]
yum install httpd
[/code]
– Create an index.html in the DocumentRoot on node1
[code lang=’bash’]
vi /var/www/html/index.html
[/code]
Fill it with the following example

This is node1

Save the file and restart the Apache service
[code lang=’bash’]
service httpd restart
[/code]
Access node1 from a browser. You will see the text This is node1

– Create an index.html in the DocumentRoot on node2
[code lang=’bash’]
vi /var/www/html/index.html
[/code]
Fill it with the following example

This is node2

Save the file and restart the Apache service
[code lang=’bash’]
service httpd restart
[/code]
Access node2 from a browser. You will see the text This is node2

Integrate Apache with Heartbeat

Change the file /etc/ha.d/haresources on all nodes
[code lang=’bash’]
vi /etc/ha.d/haresources
[/code]
so that it looks like this:

node1.imanudin.net IPaddr::192.168.80.93/24/eth0:0 httpd

Stop the Apache service and disable it at boot on all nodes (the Apache service will be handled by Heartbeat)
[code lang=’bash’]
service httpd stop
chkconfig httpd off
[/code]
Access the alias IP from a browser. You will see the text This is node1. Stop the Heartbeat service on node1 and refresh the browser. You will see the text This is node2 (all services handled by Heartbeat on node1 are taken over by node2). For failback, start the Heartbeat service on node1 again (all services handled by Heartbeat on node2 are taken back by node1)
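
Refreshing a browser works, but a small polling loop from a client machine makes the switch easier to see. This is a sketch under assumptions: `probe_vip` and `watch_vip` are hypothetical helper names, and curl must be installed on the client. With a clean service heartbeat stop the page should change almost at once; if node1 fails abruptly, expect a gap of roughly the deadtime value from ha.cf before node2 answers.

```shell
#!/bin/sh
# Fetch the page behind the virtual IP once and print a timestamped line.
# A failed or timed-out request is reported instead of aborting the loop.
probe_vip() {
    body=$(curl -s --max-time 2 "http://$1/") || body="(no response)"
    printf '%s %s\n' "$(date +%T)" "$body"
}

# Poll a fixed number of times (default 30), one second apart.
watch_vip() {
    vip=$1
    count=${2:-30}
    i=0
    while [ "$i" -lt "$count" ]; do
        probe_vip "$vip"
        i=$((i + 1))
        sleep 1
    done
}

# Usage (from a client machine): watch_vip 192.168.80.93 60
```

Start the loop first, then stop or start Heartbeat on node1 in another terminal and watch the text change between This is node1 and This is node2.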

You can also experiment with other services for online failover, such as Samba, MySQL, MariaDB etc. Heartbeat only handles failover/failback, not data synchronization.

Good luck, and I hope this is useful 😀


49 comments

braddy says:

June 6, 2015 at 4:18 pm

hei iman.
after startup my zimbra cant run, maybe because drbd mount nothing. here status after startup:

[root@n1 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
m:res cs ro ds p mounted fstype
0:r0 Connected Primary/Secondary UpToDate/UpToDate C

braddy says:

June 6, 2015 at 4:20 pm

sorry that for this article

https://imanudin.net/2015/03/24/how-to-install-configure-zimbra-high-availability-ha/

rom says:

June 7, 2015 at 5:03 pm

hi ya
i can’t start zimbra service even i put in haresource. this is my haresource:
node1.aaa.net IPaddr::192.168.1.100/24/eth0:0 zimbra

1. iman says:
  
  June 8, 2015 at 1:22 am
  
  Hi Rom,
  
  Please send me the ha-log file from the /var/log/ folder. I will try again in my environment and make a video for documentation
  
kazi says:

November 3, 2015 at 3:38 am

hi ahmad,

what is the difference between ucast eth0 ipaddress and bcast eth0 in ha.cf file, cause i saw in another tutorial they use bcast eth0 without IP address instead of ucast eth0 with IP addres

and i had alot of errors in ha-log like this :
heartbeat: [2533]: ERROR: Message hist queue is filling up (500messages in queue )
what that means ?

1. iman says:
  
  November 5, 2015 at 4:45 am
  Hi Kazi,
  
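  I don’t know all the differences between them. Technically, though, bcast sends heartbeats to the broadcast address of the interface's subnet, so with a /24 it only reaches the 254 host addresses in that subnet, while ucast sends them directly to the peer's IP address. For example :
```
node1 : 192.168.1.1/24
node2 : 192.168.2.1/24
```
  Node1 will not find node2 with bcast, because they are on different subnets; with ucast it can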
  
  CMIIW
kazi says:

November 11, 2015 at 1:51 am

Hi Ahmad,
Thanks for your article n explanation

Siva says:

April 27, 2016 at 11:55 am

Thanks for the explanation.
Worked like Charm!

[email protected] says:

August 15, 2016 at 8:07 am

Mas Udin,

I have a case like this:

When the virtual IP is already on node1 and the network on node1 is restarted, the virtual IP disappears. How can I keep the virtual IP from disappearing when the network is restarted?

Thanks

1. iman says:
  
  August 16, 2016 at 10:38 am
  
  Hi mas,
  
  I have not found a method for that yet, mas 😀
  
asep dadan says:

September 15, 2016 at 9:46 am

Mas, if the test is to shut down node1, what happens with node2?

When I tried a shutdown, node2 did not take over. If the Heartbeat service is stopped instead, it does, and one of the nodes automatically becomes primary

1. iman says:
  
  September 21, 2016 at 2:49 am
  
  Hi mas,
  
  Node2 will take over when node1 is down (a stopped Heartbeat service is already enough). If it does not take over automatically, check the log in /var/log/ha-log
  
  1. Asep Dadan says:
    
    September 29, 2016 at 2:54 am
    
    hi mas,
    
    Oh yes, with a shutdown it works now; the service is taken over by node2.
    But if I disconnect the node or cut the power directly, it is not handled yet. Is there another setting?
    
Vinod says:

October 25, 2016 at 6:22 pm

Hi Iman, generally for ha or failover we use shared lun configured on both the nodes. without it how failover is possible ..???

1. iman says:
  
  October 26, 2016 at 3:45 am
  
  Hi,
  
  Online failover/failback does not require a shared LUN or anything similar. If you are thinking about HA with shared data, then yes, you should have shared storage, and you can use DRBD for that 😉
  
cesar says:

October 31, 2016 at 1:21 am

can you do it with pacemaker instead, please?

1. iman says:
  
  October 31, 2016 at 9:13 am
  
  Hi Cesar,
  
  I will try later 😉
  
JAS says:

November 15, 2016 at 8:36 am

Hi Iman, I need your help please. first when I configured heartbeat and test it whith apache every thing was ok. I installed drbd but when I configured heartbeat with zimbra and named it dont work. I stopped drbd and test heartbeat with only apache but it dont work too. I dont know what to do. thanks for helping

1. iman says:
  
  November 17, 2016 at 6:27 am
  
  Hello,
  
  What are the contents of /etc/ha.d/haresources?
  
Miguel Yucra says:

November 16, 2016 at 4:43 pm

Hi Iman, great tutorial!
I have a question about virtual interface, can I set another network interface? (ie. eth1) with another subnet (how about gateway).
Regards!
Sorry with my english (I speak spanish)

1. iman says:
  
  November 17, 2016 at 6:39 am
  
  Hi Miguel,
  
  Yes, you can set another network interface. For routing, you can learn about routing from here : https://www.cyberciti.biz/faq/linux-route-add/
  
Pujo says:

December 30, 2016 at 2:58 am

Hi Iman,

Have you applied this in production servers?

1. iman says:
  
  January 4, 2017 at 3:16 am
  
  Hi Pujo,
  
  Yes of course 😉
  
saeid says:

January 8, 2017 at 1:17 pm

Hi Iman. Thank you for this post. I have a question.
I have two server that i install zimbra 8.6. server A is master and server B is slave. server A ON and no problem but server B has a problem. When i run zmcontrol restart, the ldap can not start because address for server ldap is LDAP://test.com:389 and test.com is common IP.
I decide change zmlocalconfig and change ldap_url to ldpa://127.0.0.1:389 and ldap could start.
Is correct my solution ?
Do you have any solution?
Thanks.

1. iman says:
  
  January 8, 2017 at 11:09 pm
  
  Hi Saeid,
  
  You should not change ldap_url. You can follow this guidance if you want to configure Zimbra HA : https://imanudin.net/2015/03/24/how-to-install-configure-zimbra-high-availability-ha/
  
eth0:0 says:

March 25, 2017 at 2:57 pm

Hello Mas Iman, I tried the tutorial above step by step and everything works well: when node1 goes down, the IP moves straight to node2 by creating eth0:0. But I have one problem. Why can the Heartbeat IP not be accessed? When I try to ping it, I get request timed out.

1. iman says:
  
  March 28, 2017 at 4:01 am
  
  Hi mas,
  
  Try pinging from the node1 and node2 servers to see whether the alias IP can be pinged from there as well. If it cannot, check again whether there is a firewall or something else in the way
  
nur says:

April 12, 2017 at 2:41 pm

Hello Mas Iman, I tried the steps above and they worked. My case is on a public IP network: the Heartbeat IP, which sits in front of the two public server IPs, should get a domain name so that it is accessed through the domain. How should that be done? Please advise

1. iman says:
  
  April 27, 2017 at 4:43 am
  
  Hi Mas,
  
  You can use one name with 2 public IPs. For example, the name mail.imanudin.net has 2 public IPs. Both of those public IPs are Heartbeat IPs
  
BFJ says:

December 3, 2017 at 5:13 am

Hi, Pak Imanuddin. I would like to try implementing this at the company where I work. To make communication easier, may I ask for your contact details?

1. iman says:
  
  December 4, 2017 at 10:57 am
  
  Hi pak,
  For contact, please submit the following form : https://imanudin.net/contact/
  
macienne12 says:

December 27, 2017 at 7:11 am

Hi Iman,
How about if I wanted to make the node2 take over the node1 in case node1 suddenly “disappear”, ex. an unexpected shutdown.

Based on what I’ve tried, node2 won’t take over because it cannot communicate with the heartbeat service on node1.

1. iman says:
  
  December 31, 2017 at 9:57 am
  
  Hi,
  If the Heartbeat service is stopped on node1, node2 will take over even if node1 has not been shut down and is still powered on
  
  1. macienne12 says:
    
    January 3, 2018 at 6:09 am
    
    Hi,
    Yes I’m aware of that, but the case is if node1 shutdown improperly (ex. corrupt OS) then node2 won’t take over.
    
    If I shutdown node1 properly then yes node2 will take over.
    
    1. iman says:
      
      January 4, 2018 at 6:48 am
      
      Hi,
      Do you want node2 not to take over if node1 shuts down improperly?
      
      1. macienne12 says:
        
        April 17, 2018 at 8:17 am
        
        Hi Iman,
        
        Been some time:)
        I want node2 to take over if node1 is missing (regardless any reason).
        But the thing is, node2 WILL ONLY take over is node1 is shutdown properly (means the heartbeat service properly stop).
      2. macienne12 says:
        
        April 17, 2018 at 8:20 am
        
        Hi,
        
        My bad. It actually took over the service.
        Thanks for the useful guide!!
senthilkumar says:

March 3, 2018 at 12:56 pm

no package heartbeat available in epel-release?
how to install configure ha for zimbra?
please post guide for centos 7

1. iman says:
  
  March 4, 2018 at 3:53 am
  
  Hi,
  
  Heartbeat is no longer available on RHEL/CentOS 7. You can use corosync, pacemaker to do that.
  
jess says:

September 10, 2018 at 2:42 am

hi iman, can you post a guide o how to setup high availability using corosync and pacemaker

thank you

1. iman says:
  
  September 14, 2018 at 8:37 am
  
  Hi Jess,
  I have not tried Pacemaker yet. I will try it later
  
  1. Mahima Gupta says:
    
    April 8, 2020 at 8:55 am
    
    Please try corosync and pacemaker. We need a guide.
    
    1. iman says:
      
      April 12, 2020 at 10:44 am
      
      Hi Mahima Gupta,
      I am still trying 🙂
      
medha says:

December 9, 2018 at 3:42 am

Hi Iman,
I am using a configuration where there is 1 primary (node1) and 1 secondary(node2) server. mysql will always run on active node.In case of failover, when primary server switches to secondary, mysql will start on secondary(node2) and stop in primary(node1) .

. auto_failback is set to off.
Got an issue where primary server logs are

as1 CRIT: Cluster node as2 returning after partition.
as1 heartbeat: [13375]: info: For information on cluster partitions, See URL: http://linux-ha.org/SplitBrain
as1 heartbeat: [13375]: WARN: Deadtime value may be too small.
as1 info: See FAQ for information on tuning deadtime.

After this, primary server heartbeat got restarted and mysql stopped as expected.
On secondary server, failover completed with mysql start and working fine.
But after sometime, I again see that resource manager is again triggering mysql start
as2 ResourceManager[10871]: info: Running /etc/ha.d/resource.d/mysqld start
causing long glitch. Could you please help me in that.

1. iman says:
  
  December 11, 2018 at 7:41 am
  
  Hello,
  You can try increasing the deadtime value in ha.cf and test again
  
Ravindra Kumar says:

December 17, 2018 at 2:14 pm

Hi Iman,

Very nicely written. Congratulations. I needed this for a customer running an old setup. Works well on centos6.

Mahima Gupta says:

April 8, 2020 at 8:52 am

how can I implement HA for Zimbra 8.6 in Centos 7 as there is no heartbeat package available in centos 7?

1. someextrangename says:
  
  May 24, 2021 at 2:06 pm
  
  use corosync instead, this guide is so good
  
  https://jensd.be/156/linux/building-a-high-available-failover-cluster-with-pacemaker-corosync-pcs
  
