This article explains how to configure failover/failback using the Heartbeat application. According to the Linux-HA wiki (http://linux-ha.org/wiki/Heartbeat):
Heartbeat is a daemon that provides cluster infrastructure (communication and membership) services to its clients. This allows clients to know about the presence (or disappearance!) of peer processes on other machines and to easily exchange messages with them.
In this guide, I build two systems for online failover. Both systems run CentOS 6 64-bit. For easier understanding, here is the information for my systems:
# Server 1
Hostname : node1
Domain : imanudin.net
IP Address : 192.168.80.91

# Server 2
Hostname : node2
Domain : imanudin.net
IP Address : 192.168.80.92

# Alias IP for online failover testing
IP Address : 192.168.80.93
# Configure Network
First, configure the network on CentOS. This guide assumes that your network interface is named eth0. Apply the following configuration on all nodes (node1 and node2), adjusting the IP address on node2.
vi /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none
IPADDR=192.168.80.91
NETMASK=255.255.255.0
DNS1=192.168.80.91
GATEWAY=192.168.80.11
DNS2=192.168.80.11
DNS3=8.8.8.8
USERCTL=no
Restart the network service and enable it at boot on all nodes (node1 and node2)
service network restart
chkconfig network on
# Disable SELinux & Firewall on all nodes (node1 and node2)
Open the file /etc/sysconfig/selinux and change SELINUX=enforcing to SELINUX=disabled. Also stop and disable the iptables and ip6tables services. The setenforce 0 command below switches SELinux to permissive mode immediately; the change in the file takes effect at the next reboot.
setenforce 0
service iptables stop
service ip6tables stop
chkconfig iptables off
chkconfig ip6tables off
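The edit to /etc/sysconfig/selinux can also be scripted instead of done by hand. The following is only a minimal sketch: the function name set_selinux_disabled is made up for this example, and it assumes the file already contains a line starting with SELINUX=. It writes the result back through the existing path, so a symlinked config file stays a symlink.

```shell
#!/bin/sh
# Illustrative helper (not part of CentOS or Heartbeat): rewrite the
# SELINUX= line of the given file to "disabled".
set_selinux_disabled() {
    conf="$1"
    tmp=$(mktemp) || return 1
    # Edit into a temp file, then copy the content back with cat so that
    # the original file (or the symlink pointing to it) is kept in place.
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$conf" > "$tmp" && cat "$tmp" > "$conf"
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Usage on a real node, as root:
#   set_selinux_disabled /etc/sysconfig/selinux
```

The pattern only matches the SELINUX= line, so a SELINUXTYPE= line in the same file is left untouched.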
# Configure /etc/hosts and hostname on all nodes (node1 and node2)
Open the file /etc/hosts and configure it as follows
# node1
127.0.0.1 localhost
192.168.80.91 node1.imanudin.net node1
192.168.80.92 node2.imanudin.net node2

# node2
127.0.0.1 localhost
192.168.80.91 node1.imanudin.net node1
192.168.80.92 node2.imanudin.net node2
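Since both nodes need the same two entries, they can also be added with a small script. This is a sketch under assumptions, not a required step: add_host_entry is a hypothetical helper name, and it simply skips an entry when the fully qualified name is already present in the file.

```shell
#!/bin/sh
# Illustrative helper: append "IP FQDN SHORTNAME" to a hosts file unless
# a line containing that FQDN already exists.
add_host_entry() {
    hosts_file="$1"; ip="$2"; fqdn="$3"; short="$4"
    # -F: fixed string (dots are literal), -w: whole-word match
    if ! grep -qwF -- "$fqdn" "$hosts_file"; then
        printf '%s %s %s\n' "$ip" "$fqdn" "$short" >> "$hosts_file"
    fi
}

# Usage on each node, as root:
#   add_host_entry /etc/hosts 192.168.80.91 node1.imanudin.net node1
#   add_host_entry /etc/hosts 192.168.80.92 node2.imanudin.net node2
```

Running it twice leaves the file unchanged, which makes it safe to repeat on both nodes.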
Run the following commands as root, then edit the file /etc/sysconfig/network to change the hostname
– On node1
hostname node1.imanudin.net
vi /etc/sysconfig/network
Change HOSTNAME so that it reads as follows:
NETWORKING=yes
HOSTNAME=node1.imanudin.net
– On node2
hostname node2.imanudin.net
vi /etc/sysconfig/network
Change HOSTNAME so that it reads as follows:
NETWORKING=yes
HOSTNAME=node2.imanudin.net
# Update repos and install the Heartbeat package on all nodes (node1 and node2)
yum update
yum install epel-release
yum -y install heartbeat
If you cannot get the EPEL repo this way, install it from this package: http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
# Configure Heartbeat
– Create the file /etc/ha.d/ha.cf (on node1 only)
vi /etc/ha.d/ha.cf
Fill it with the following lines
keepalive 2
warntime 5
deadtime 15
initdead 90
udpport 694
auto_failback on
ucast eth0 192.168.80.92
logfile /var/log/ha-log
node node1.imanudin.net node2.imanudin.net
Note :
eth0 is the network interface on your system. If your system uses eth1 instead, change eth0 to eth1. 192.168.80.92 is the IP address of node2.
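The only line of ha.cf that differs between the two nodes is ucast, which must point at the other node. As a rough sketch, the file can be produced by a small function that takes the interface and the peer IP as arguments; make_ha_cf is a name invented for this example, and the values are the ones used in this article. In short, keepalive is the interval between heartbeats, warntime is how long to wait before warning about a late heartbeat, deadtime is how long to wait before declaring the peer dead, and initdead is the dead time applied right after startup (all in seconds).

```shell
#!/bin/sh
# Illustrative helper: print an ha.cf for one node.
#   $1 = network interface used for heartbeats
#   $2 = IP address of the *other* node
make_ha_cf() {
    iface="$1"; peer_ip="$2"
    cat <<EOF
keepalive 2
warntime 5
deadtime 15
initdead 90
udpport 694
auto_failback on
ucast $iface $peer_ip
logfile /var/log/ha-log
node node1.imanudin.net node2.imanudin.net
EOF
}

# Usage, as root:
#   on node1: make_ha_cf eth0 192.168.80.92 > /etc/ha.d/ha.cf
#   on node2: make_ha_cf eth0 192.168.80.91 > /etc/ha.d/ha.cf
```

Generating the file on each node this way is an alternative to copying it from node1 and editing the ucast line by hand.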
– Create the file /etc/ha.d/authkeys (on node1 only)
vi /etc/ha.d/authkeys
Fill it with the following lines
auth 2
2 crc
Change the permissions of authkeys
chmod 0600 /etc/ha.d/authkeys
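Note that crc only detects corrupted packets; it does not authenticate the other node. The authkeys file also accepts sha1 with a shared secret. The following is a minimal sketch of generating such a file, assuming sha1sum and /dev/urandom are available; make_authkeys is a hypothetical helper name, and the key index 1 is an arbitrary choice.

```shell
#!/bin/sh
# Illustrative helper: write an authkeys file that uses sha1 with a
# random shared secret, readable by the owner only.
make_authkeys() {
    out="$1"
    # Derive a 40-character hex secret from random bytes.
    key=$(head -c 64 /dev/urandom | sha1sum | cut -d' ' -f1)
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$key" > "$out" )
    # Also fix the mode in case the file already existed.
    chmod 0600 "$out"
}

# Usage on node1, as root:
#   make_authkeys /etc/ha.d/authkeys
```

If you use this variant, generate the file once and copy that same file to node2, because both nodes must share the identical secret.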
– Create the file /etc/ha.d/haresources (on node1 only)
vi /etc/ha.d/haresources
Fill it with the following line
node1.imanudin.net IPaddr::192.168.80.93/24/eth0:0
Note :
node1.imanudin.net will act as the master server. 192.168.80.93 is the alias IP used for testing online failover.
# Copy these files from node1 to node2 (run the following commands on node1)
cd /etc/ha.d/
scp authkeys ha.cf haresources root@192.168.80.92:/etc/ha.d/
# Change the ha.cf file on node2 (run the following command on node2)
vi /etc/ha.d/ha.cf
Change the line ucast eth0 192.168.80.92 so that it becomes
ucast eth0 192.168.80.91
192.168.80.91 is the IP address of node1
# Start the Heartbeat service and enable it at boot on all nodes (node1 and node2)
service heartbeat start
chkconfig heartbeat on
TESTING ONLINE FAILOVER/FAILBACK
After you start the Heartbeat service on all nodes, you will see the alias IP on node1. Check it with the ifconfig command. To test failover, stop the Heartbeat service on node1 (service heartbeat stop), then check the IP addresses on node2 with ifconfig. You will see that the alias IP has been taken over by node2. To test failback, start the Heartbeat service on node1 again (service heartbeat start). The alias IP will automatically be taken back by node1.
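Instead of reading the ifconfig output by eye, the check can be scripted. This is a small sketch that assumes the ip command from the iproute package is installed; has_ip is a name made up for this example. It reads the one-line-per-address output of ip -o -4 addr show on standard input and succeeds if the given address is listed.

```shell
#!/bin/sh
# Illustrative helper: succeed if the address in $1 appears in the
# `ip -o -4 addr show` output supplied on stdin.
has_ip() {
    # The fixed string "inet <ip>/" avoids matching a longer address
    # that merely starts with the same digits.
    grep -qF "inet $1/"
}

# Usage on either node:
#   ip -o -4 addr show | has_ip 192.168.80.93 && echo "alias IP is on this node"
```

Run it on node1 and node2 before and after stopping Heartbeat to see which node currently holds the alias IP.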
TESTING WITH APACHE WEB SERVER
Install Apache on all nodes
yum install httpd
– Create an index.html in the DocumentRoot on node1
vi /var/www/html/index.html
Fill it with the following example
This is node1
Save the file and restart the Apache service
service httpd restart
Try to access node1 via a browser. You will see the text This is node1
– Create an index.html in the DocumentRoot on node2
vi /var/www/html/index.html
Fill it with the following example
This is node2
Save the file and restart the Apache service
service httpd restart
Try to access node2 via a browser. You will see the text This is node2
Integrate Apache with Heartbeat
Change the file /etc/ha.d/haresources on all nodes
vi /etc/ha.d/haresources
so that it reads as follows:
node1.imanudin.net IPaddr::192.168.80.93/24/eth0:0 httpd
Stop the Apache service and disable it at boot on all nodes (the Apache service will be handled by Heartbeat)
service httpd stop
chkconfig httpd off
Try to access the alias IP from a browser. You will see the text This is node1. Now stop the Heartbeat service on node1 and refresh the browser. You will see the text This is node2 (all services handled by Heartbeat on node1 are taken over by node2). For failback, start the Heartbeat service on node1 again (all services handled by Heartbeat on node2 are taken back by node1).
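To watch the switch happen without refreshing a browser, the alias IP can be polled from a third machine. The following is only a sketch, assuming curl is installed on that machine; watch_vip is a hypothetical name, and the one-second interval and two-second timeout are arbitrary choices.

```shell
#!/bin/sh
# Illustrative helper: fetch a URL repeatedly and print a timestamp
# plus the page body, so the serving node is visible over time.
#   $1 = URL to poll, $2 = number of attempts (default 10)
watch_vip() {
    url="$1"; tries="${2:-10}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -s: no progress output, --max-time: give up after 2 seconds
        body=$(curl -s --max-time 2 "$url") || body="(no response)"
        printf '%s %s\n' "$(date +%T)" "$body"
        i=$((i + 1))
        sleep 1
    done
}

# Usage from a client machine:
#   watch_vip http://192.168.80.93/ 60
```

While it runs, stop Heartbeat on node1; the output should change from This is node1 to This is node2, possibly with a few (no response) lines in between.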
You can also experiment with other services for online failover, such as Samba, MySQL, MariaDB, etc. Heartbeat only handles failover/failback; it does not synchronize data.
Good luck, and I hope this is useful 😀
Hi Iman,
After startup my Zimbra can't run, maybe because DRBD mounts nothing. Here is the status after startup:
[root@n1 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
m:res cs ro ds p mounted fstype
0:r0 Connected Primary/Secondary UpToDate/UpToDate C
Sorry, that was for this article:
https://imanudin.net/2015/03/24/how-to-install-configure-zimbra-high-availability-ha/
hi ya
I can't start the Zimbra service even though I put it in haresources. This is my haresources:
node1.aaa.net IPaddr::192.168.1.100/24/eth0:0 zimbra
Hi Rom,
Please send me the ha-log file from the /var/log/ folder. I will try again in my environment and make a video for documentation.
Hi Ahmad,
What is the difference between ucast eth0 with an IP address and bcast eth0 in the ha.cf file? I saw another tutorial that uses bcast eth0 without an IP address instead of ucast eth0 with an IP address.
I also had a lot of errors in ha-log like this :
heartbeat: [2533]: ERROR: Message hist queue is filling up (500messages in queue )
What does that mean?
Hi Kazi,
I don't know the exact difference between them. Technically speaking, though, bcast only reaches the 254 IP addresses of the subnet when the address uses /24. For example :
Node1 will not find node2 because of using bcast instead of ucast
CMIIW
Hi Ahmad,
Thanks for your article and explanation
Thanks for the explanation.
Worked like Charm!
Mas Udin,
Here is my case:
When the virtual IP is already on node1 and the network on node1 is restarted, the virtual IP disappears. How can I keep the virtual IP from disappearing when the network is restarted?
Thanks
Hi Mas,
I have not found a method for that yet 😀
Mas, if the test is to shut down node1, what happens to node2?
When I tried a shutdown, it did not run; if Heartbeat is stopped instead, it runs automatically and one of the nodes becomes primary.
Hi Mas,
Node2 will take over when node1 goes down (the Heartbeat service being down is enough). If it does not take over automatically, check the log in /var/log/ha-log
Hi Mas,
Oh yes, with a shutdown it works now: the services are taken over by node2.
But if I disconnect it or the power is cut suddenly, it is not handled yet. Is there another setting?
Hi Iman, generally for HA or failover we use a shared LUN configured on both nodes. Without it, how is failover possible?
Hi,
Online failover/failback does not require a shared LUN or anything similar. If you are thinking about HA, then yes, you should have shared storage, and you can use DRBD for that 😉
can you do it with pacemaker instead, please?
Hi Cesar,
I will try later 😉
Hi Iman, I need your help please. At first, when I configured Heartbeat and tested it with Apache, everything was OK. I then installed DRBD, but when I configured Heartbeat with zimbra and named, it did not work. I stopped DRBD and tested Heartbeat with only Apache, but that did not work either. I don't know what to do. Thanks for helping.
Hello,
What are the contents of /etc/ha.d/haresources?
Hi Iman, great tutorial!
I have a question about the virtual interface: can I set another network interface (e.g. eth1) with another subnet? What about the gateway?
Regards!
Sorry for my English (I speak Spanish)
Hi Miguel,
Yes, you can set another network interface. To learn about routing, see : https://www.cyberciti.biz/faq/linux-route-add/
Hi Iman,
Have you applied this in production servers?
Hi Pujo,
Yes of course 😉
Hi Iman. Thank you for this post. I have a question.
I have two servers on which I installed Zimbra 8.6. Server A is the master and server B is the slave. Server A is on and has no problem, but server B has a problem. When I run zmcontrol restart, LDAP cannot start because the LDAP server address is LDAP://test.com:389 and test.com is the common IP.
I decided to change zmlocalconfig and set ldap_url to ldap://127.0.0.1:389, and LDAP could start.
Is my solution correct?
Do you have any solution?
Thanks.
Hi Saeid,
You should not change ldap_url. You can follow this guide if you want to configure Zimbra HA : https://imanudin.net/2015/03/24/how-to-install-configure-zimbra-high-availability-ha/
Hello Mas Iman, I have tried the tutorial above step by step, and everything works well: when node 1 goes down, the IP immediately moves to node 2 by creating eth0:0. However, I have a problem. Why can't the Heartbeat IP be accessed? When I try to ping it, I get request timed out.
Hi Mas,
Try pinging from node1 and node2 to see whether the alias IP can be pinged as well. If it cannot, check again whether there is a firewall or something else in the way
Hello Mas Iman, I have tried the above and it works. My case is on a public IP network: the Heartbeat IP, which sits above the two public IPs of the servers beneath it, is to be given a domain so that it is accessed by domain name. How should that be done? Please enlighten me
Hi Mas,
You can use one name with two public IPs. For example, the name mail.imanudin.net has two public IPs. Both public IPs are Heartbeat IPs
Hi, Mr. Imanuddin.. I would like to try implementing this at the company where I work. To make communication easier, may I ask for your contact details?
Hi Sir,
For contact, please submit the following form : https://imanudin.net/contact/
Hi Iman,
What if I want node2 to take over from node1 in case node1 suddenly “disappears”, e.g. an unexpected shutdown?
Based on what I’ve tried, node2 won’t take over because it cannot communicate with the heartbeat service on node1.
Hi,
If the Heartbeat service is stopped on node1, node2 will take over even if node1 has not shut down and is still powered on
Hi,
Yes, I’m aware of that, but in my case, if node1 shuts down improperly (e.g. a corrupt OS), then node2 won’t take over.
If I shut down node1 properly, then yes, node2 will take over.
Hi,
Do you want node2 not to take over if node1 shuts down improperly?
Hi Iman,
Been some time:)
I want node2 to take over if node1 is missing (regardless of the reason).
But the thing is, node2 WILL ONLY take over if node1 is shut down properly (meaning the heartbeat service stops properly).
Hi,
My bad. It actually took over the service.
Thanks for the useful guide!!
Is there no heartbeat package available in epel-release?
How do I install and configure HA for Zimbra?
Please post a guide for CentOS 7
Hi,
Heartbeat is no longer available on RHEL/CentOS 7. You can use Corosync and Pacemaker instead.
Hi Iman, can you post a guide on how to set up high availability using Corosync and Pacemaker?
thank you
Hi Jess,
I have not tried Pacemaker yet. I will try it later
Please try corosync and pacemaker. We need a guide.
Hi Mahima Gupta,
I am still trying 🙂
Hi Iman,
I am using a configuration with 1 primary (node1) and 1 secondary (node2) server. MySQL always runs on the active node. In case of failover, when the primary server switches to the secondary, MySQL starts on the secondary (node2) and stops on the primary (node1).
auto_failback is set to off.
I got an issue where the primary server logs are
as1 CRIT: Cluster node as2 returning after partition.
as1 heartbeat: [13375]: info: For information on cluster partitions, See URL: http://linux-ha.org/SplitBrain
as1 heartbeat: [13375]: WARN: Deadtime value may be too small.
as1 info: See FAQ for information on tuning deadtime.
After this, heartbeat on the primary server restarted and MySQL stopped as expected.
On the secondary server, failover completed, with MySQL started and working fine.
But after some time, I see that the resource manager is triggering a MySQL start again
as2 ResourceManager[10871]: info: Running /etc/ha.d/resource.d/mysqld start
causing a long glitch. Could you please help me with that?
Hello,
You can try increasing the deadtime value and test again
Hi Iman,
Very nicely written. Congratulations. I needed this for a customer running an old setup. Works well on CentOS 6.
How can I implement HA for Zimbra 8.6 on CentOS 7, as there is no heartbeat package available in CentOS 7?
Use Corosync instead; this guide is very good:
https://jensd.be/156/linux/building-a-high-available-failover-cluster-with-pacemaker-corosync-pcs