How To Install & Configure Zimbra High Availability (HA)


In previous articles, I explained how to install and configure Zimbra on CentOS 6 or CentOS 7, how to install and configure online failover/failback on CentOS 6 using Heartbeat, and how to install and configure data replication on CentOS 6 using DRBD. All of that guidance can be combined to build Zimbra High Availability: use Heartbeat for online failover/failback and DRBD for data replication. Heartbeat + DRBD together provide High Availability (HA). The following is a guide to configuring Zimbra HA.

Step-by-step guide to configuring Zimbra HA

For the Linux systems, I am using CentOS 6 64-bit. For easy reference, here is my system information:

# Server 1
Hostname   : node1
Domain     : imanudin.net
IP Address : 192.168.80.91

# Server 2
Hostname   : node2
Domain     : imanudin.net
IP Address : 192.168.80.92

# Alias IP
Hostname   : mail
Domain     : imanudin.net
IP Address : 192.168.80.93

The alias IP is what clients/users will use to access the service. Online failover will be configured for this alias IP.

# Install Zimbra on CentOS 6 on all nodes (node1 and node2) as described at this link: How To Install Zimbra 8.6 on CentOS 6. Please note the information below

– When installing Zimbra, change the name of each node so that it refers to mail.imanudin.net

– Set the IP address of each node so that mail.imanudin.net resolves to it, both in DNS and in /etc/hosts
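As an illustration (this fragment is an assumption, not from the original installation article), /etc/hosts on node1 during installation could look like the snippet below so that the Zimbra installer resolves mail.imanudin.net to the node's own address; node2 would use 192.168.80.92 instead:

```
127.0.0.1       localhost
192.168.80.91   mail.imanudin.net   mail
```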

# Stop the Zimbra and DNS services on all nodes (node1 and node2) and disable them from starting at boot, since Heartbeat will manage them from now on

su - zimbra -c "zmcontrol stop"
service named stop
chkconfig zimbra off
chkconfig named off

# After Zimbra is installed, install and configure Heartbeat on all nodes (node1 and node2) as described at this link: How To Configure Online Failover/Failback on CentOS 6 Using Heartbeat

# After Heartbeat is installed and online failover/failback is working fine, install DRBD for data replication on all nodes (node1 and node2) as described at this link: How To Configure Data Replication/Synchronize on CentOS 6 Using DRBD

# Test that DRBD data replication is working: Testing Data Replication/Synchronize on DRBD

# After DRBD is working, copy the /opt/zimbra files/folders onto the DRBD device.

Run the following commands on node1 only

– Rsync Zimbra

drbdadm primary r0
mount /dev/drbd0 /mnt/tmp
rsync -avP --exclude=data.mdb /opt/ /mnt/tmp

data.mdb has a huge apparent size, so copying it with rsync takes a long time. As a trick, use cp to copy data.mdb to the DRBD device 😀
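The likely reason rsync struggles here: LMDB preallocates its database file to the configured map size, so data.mdb is a sparse file whose apparent size is far larger than the blocks it actually occupies, and some copy strategies process the whole apparent size. A minimal demonstration of the effect with an artificial sparse file (the filename is illustrative):

```shell
# Demo: a sparse file's apparent size vs. the blocks it actually occupies,
# which is how a mostly-empty, preallocated data.mdb behaves.
truncate -s 1G sparse.img
apparent=$(stat -c %s sparse.img)      # apparent size in bytes
on_disk=$(du -k sparse.img | cut -f1)  # allocated size in kilobytes
echo "apparent: ${apparent} bytes, allocated: ${on_disk} KB"
rm -f sparse.img
```

rsync's --sparse option, which tries to recreate the holes in the destination file, may be an alternative to the cp trick.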

– Copy data.mdb

cp /opt/zimbra/data/ldap/mdb/db/data.mdb /mnt/tmp/zimbra/data/ldap/mdb/db/data.mdb
chown zimbra.zimbra /mnt/tmp/zimbra/data/ldap/mdb/db/data.mdb

# Unmount the DRBD device after rsyncing the Zimbra files/folders on node1

umount /dev/drbd0

# Move the existing /opt folder aside; run the following commands on all nodes (node1 and node2)

mv /opt /backup-opt
mkdir /opt
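Heartbeat will later mount the DRBD device on top of this new, empty /opt; anything still inside would simply be hidden under the mount. A small sketch (using the path from this article) to confirm the mount point is empty before starting Heartbeat:

```shell
# Warn if the mount point still contains files that a mount would hide.
mountpoint=/opt
if [ -z "$(ls -A "$mountpoint" 2>/dev/null)" ]; then
    echo "$mountpoint is empty: safe to mount DRBD over it"
else
    echo "$mountpoint is not empty: existing files would be hidden by the mount"
fi
```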

# Configure /etc/hosts and the DNS records on all nodes (node1 and node2)

vi /etc/hosts

so that it looks like this:

127.0.0.1       localhost
192.168.80.91   node1.imanudin.net   node1
192.168.80.92   node2.imanudin.net   node2
192.168.80.93   mail.imanudin.net    mail

vi /var/named/db.imanudin.net

Change the IP address of mail so that it points to 192.168.80.93. See the following example:

$TTL 1D
@       IN SOA  ns1.imanudin.net. root.imanudin.net. (
                                        0       ; serial
                                        1D      ; refresh
                                        1H      ; retry
                                        1W      ; expire
                                        3H )    ; minimum
@       IN      NS      ns1.imanudin.net.
@       IN      MX      0 mail.imanudin.net.
ns1     IN      A       192.168.80.91
mail    IN      A       192.168.80.93

# Configure the file /etc/ha.d/haresources on all nodes (node1 and node2)

vi /etc/ha.d/haresources

so that it looks like this:

node1.imanudin.net IPaddr::192.168.80.93/24/eth0:0 drbddisk::r0 Filesystem::/dev/drbd0::/opt::ext3 named zimbra
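For clarity, Heartbeat treats the first field as the preferred node and starts the remaining resources left to right (stopping them right to left): the alias IP first, then the DRBD promotion, then the filesystem mount, then named and zimbra. This little sketch just prints both orders for the exact line above:

```shell
# Split the haresources entry into the preferred node and its resources,
# then show the order Heartbeat starts and stops them in.
line='node1.imanudin.net IPaddr::192.168.80.93/24/eth0:0 drbddisk::r0 Filesystem::/dev/drbd0::/opt::ext3 named zimbra'
node=${line%% *}        # first field: preferred node
resources=${line#* }    # the rest: resources, started left to right
echo "Preferred node: $node"
echo "Start order   : $resources"
echo "Stop order    : $(echo "$resources" | tr ' ' '\n' | tac | paste -sd' ' -)"
```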

# Stop the Heartbeat service on node2 first, then on node1

service heartbeat stop

# Start the Heartbeat service on node1 first, then on node2

service heartbeat start
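Once Heartbeat is up, it is worth verifying that the active node really owns all the resources. The following sketch assembles a few checks from the names used in this article (eth0, /opt, the 192.168.80.93 alias IP); adjust them to your environment, and run it as root on the node that should be active:

```shell
# Tiny PASS/FAIL helper plus post-start checks for the active node.
check() {
    desc=$1; shift
    if "$@" >/dev/null 2>&1; then echo "PASS: $desc"; else echo "FAIL: $desc"; fi
}
check "DRBD device mounted on /opt"    mountpoint -q /opt
check "alias IP 192.168.80.93 on eth0" sh -c "ip -o addr show eth0 | grep -q 192.168.80.93"
check "named service running"          service named status
check "zmcontrol binary available"     test -x /opt/zimbra/bin/zmcontrol
```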

TESTING HA

– Failover

Once Zimbra is running well on node1, stop the Heartbeat service on node1 (or force the machine off)

service heartbeat stop

All services managed by Heartbeat will be stopped automatically and taken over by node2. How long node2 takes to get everything working again depends on how long the services (named and zimbra) take to start.

– Failback

Start the Heartbeat service again on node1 (or power the machine on)

service heartbeat start

All services running on node2 will be stopped automatically and taken over by node1 again.

Hooray, you can finally build Zimbra HA with DRBD + Heartbeat!

For log information about the HA process, see /var/log/ha-log.
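Because ha-log is verbose, filtering for WARN/ERROR/CRIT lines makes takeover problems much easier to spot. A sketch, demonstrated on two sample lines (on a real node you would run the same grep against /var/log/ha-log):

```shell
# Show only the notable lines; the sample mimics real ha-log entries.
sample='Jun 08 03:18:39 master heartbeat: [2060]: info: Pacemaker support: false
Jun 08 03:18:39 master heartbeat: [2060]: WARN: Logging daemon is disabled'
echo "$sample" | grep -E 'WARN|ERROR|CRIT'
# On a live system:
#   grep -E 'WARN|ERROR|CRIT' /var/log/ha-log | tail -n 20
```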

Good luck and hopefully useful 😀

91 thoughts on - How To Install & Configure Zimbra High Availability (HA)

      • node 1
        [root@master ~]# service drdb status
        drdb: unrecognized service
        [root@master ~]# service drbd status
        drbd driver loaded OK; device status:
        version: 8.3.16 (api:88/proto:86-97)
        GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
        m:res cs ro ds p mounted fstype
        0:r0 Connected Secondary/Secondary UpToDate/UpToDate C
        [root@master ~]# df -h
        Filesystem Size Used Avail Use% Mounted on
        /dev/sda2 18G 7.5G 9.1G 46% /
        tmpfs 611M 0 611M 0% /dev/shm
        /dev/sda1 283M 80M 188M 30% /boot

      • node 2

        [root@slave ~]# service drbd status
        drbd driver loaded OK; device status:
        version: 8.3.16 (api:88/proto:86-97)
        GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
        m:res cs ro ds p mounted fstype
        0:r0 Connected Secondary/Secondary UpToDate/UpToDate C
        [root@slave ~]# df -h
        Filesystem Size Used Avail Use% Mounted on
        /dev/sda2 18G 7.3G 9.3G 44% /
        tmpfs 611M 0 611M 0% /dev/shm
        /dev/sda1 283M 80M 188M 30% /boot
        /dev/drbd0 9.9G 3.6G 5.8G 39% /opt

      • ________________________________________________
this is when I check the zimbra service

        [root@master ~]# service zimbra status
        su: warning: cannot change directory to /opt/zimbra: No such file or directory
        -bash: zmcontrol: command not found
        __________________________________________________
I did all the steps in this article and DRBD works, but I skipped the step of editing the named file because I have no DNS server.

I posted the comment many times because maybe there were too many characters in it; sorry for my bad English.

Thanks for the quick response, by the way.


Hi Iman,
after startup my Zimbra can't run, maybe because DRBD mounted nothing. Here is the status after startup:

[root@n1 ~]# service drbd status
drbd driver loaded OK; device status:
    version: 8.3.16 (api:88/proto:86-97)
    GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
    m:res cs ro ds p mounted fstype
    0:r0 Connected Primary/Secondary UpToDate/UpToDate C

this is my haresources:

    n1 IPaddr::192.168.1.50/24/eth4 drbddisk::r0 Filesystem::/dev/drbd0::/opt::ext3 zimbra

haresources:

    master drbddisk::r0 Filesystem::/dev/drbd0::/opt::ext3 192.168.1.50 zimbra

    ha-log on master

[root@master ~]# cat /var/log/ha-log
    Jun 08 03:18:39 master heartbeat: [2060]: info: Pacemaker support: false
Jun 08 03:18:39 master heartbeat: [2060]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
    Jun 08 03:18:39 master heartbeat: [2060]: info: **************************
    Jun 08 03:18:39 master heartbeat: [2060]: info: Configuration validated. Starting heartbeat 3.0.4
    Jun 08 03:18:39 master heartbeat: [2061]: info: heartbeat: version 3.0.4
    Jun 08 03:18:39 master heartbeat: [2061]: info: Heartbeat generation: 1426273078
    Jun 08 03:18:39 master heartbeat: [2061]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth4
    Jun 08 03:18:39 master heartbeat: [2061]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth4 – Status: 1
    Jun 08 03:18:39 master heartbeat: [2061]: info: G_main_add_TriggerHandler: Added signal manual handler
    Jun 08 03:18:39 master heartbeat: [2061]: info: G_main_add_TriggerHandler: Added signal manual handler
    Jun 08 03:18:39 master heartbeat: [2061]: info: G_main_add_SignalHandler: Added signal handler for signal 17
    Jun 08 03:18:40 master heartbeat: [2061]: info: Local status now set to: ‘up’
    Jun 08 03:18:40 master heartbeat: [2061]: info: Link master:eth4 up.

  • this is the new log

haresources:

    master IPaddr::192.168.1.50/24/eth4 drbddisk::r0 Filesystem::/dev/drbd0::/opt::ext3 zimbra

    HA-LOG

[root@master ~]# cat /var/log/ha-log
    Jun 08 03:28:26 master heartbeat: [2061]: info: Heartbeat shutdown in progress. (2061)
    Jun 08 03:28:26 master heartbeat: [9506]: info: Giving up all HA resources.
    ResourceManager(default)[9519]: 2015/06/08_03:28:26 info: Releasing resource group: master IPaddr::192.168.1.50/24/eth4 drbddisk::r0 Filesystem::/dev/drbd0::/opt::ext3 zimbra
    ResourceManager(default)[9519]: 2015/06/08_03:28:26 info: Running /etc/init.d/zimbra stop
    ResourceManager(default)[9519]: 2015/06/08_03:28:29 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /opt ext3 stop
    Filesystem(Filesystem_/dev/drbd0)[9980]: 2015/06/08_03:28:29 INFO: Running stop for /dev/drbd0 on /opt
    Filesystem(Filesystem_/dev/drbd0)[9980]: 2015/06/08_03:28:29 INFO: Trying to unmount /opt
    Filesystem(Filesystem_/dev/drbd0)[9980]: 2015/06/08_03:28:29 INFO: unmounted /opt successfully
    /usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[9972]: 2015/06/08_03:28:29 INFO: Success
    ResourceManager(default)[9519]: 2015/06/08_03:28:29 info: Running /etc/ha.d/resource.d/drbddisk r0 stop
    ResourceManager(default)[9519]: 2015/06/08_03:28:30 info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.50/24/eth4 stop
    IPaddr(IPaddr_192.168.1.50)[10135]: 2015/06/08_03:28:30 INFO: IP status = no, IP_CIP=
    /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.1.50)[10107]: 2015/06/08_03:28:30 INFO: Success
    Jun 08 03:28:30 master heartbeat: [9506]: info: All HA resources relinquished.
    Jun 08 03:28:32 master heartbeat: [2061]: info: killing HBWRITE process 2073 with signal 15
    Jun 08 03:28:32 master heartbeat: [2061]: info: killing HBREAD process 2074 with signal 15
    Jun 08 03:28:32 master heartbeat: [2061]: info: killing HBFIFO process 2071 with signal 15
    Jun 08 03:28:32 master heartbeat: [2061]: info: Core process 2071 exited. 3 remaining
    Jun 08 03:28:32 master heartbeat: [2061]: info: Core process 2074 exited. 2 remaining
    Jun 08 03:28:32 master heartbeat: [2061]: info: Core process 2073 exited. 1 remaining
    Jun 08 03:28:32 master heartbeat: [2061]: info: master Heartbeat shutdown complete.
    Jun 08 03:29:59 master heartbeat: [2110]: info: Pacemaker support: false
Jun 08 03:29:59 master heartbeat: [2110]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
    Jun 08 03:29:59 master heartbeat: [2110]: info: **************************
    Jun 08 03:29:59 master heartbeat: [2110]: info: Configuration validated. Starting heartbeat 3.0.4
    Jun 08 03:29:59 master heartbeat: [2111]: info: heartbeat: version 3.0.4
    Jun 08 03:29:59 master heartbeat: [2111]: info: Heartbeat generation: 1426273079
    Jun 08 03:29:59 master heartbeat: [2111]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth4
    Jun 08 03:29:59 master heartbeat: [2111]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth4 – Status: 1
    Jun 08 03:29:59 master heartbeat: [2111]: info: G_main_add_TriggerHandler: Added signal manual handler
    Jun 08 03:29:59 master heartbeat: [2111]: info: G_main_add_TriggerHandler: Added signal manual handler
    Jun 08 03:29:59 master heartbeat: [2111]: info: G_main_add_SignalHandler: Added signal handler for signal 17
    Jun 08 03:29:59 master heartbeat: [2111]: info: Local status now set to: ‘up’
    Jun 08 03:29:59 master heartbeat: [2111]: info: Link master:eth4 up.

  • Hi iman,
I am testing HA: I stop the heartbeat service on node1, then go to node2 and check the Zimbra service status, but it shows "stopped". Why don't all the Zimbra services start up automatically?

    Thanks and Regards,
    Tidapat

    • Hi Tidapat,

To start the Zimbra services, you should wait a few minutes (about 2 minutes) until all Zimbra services are running well. Once you stop the Heartbeat service, it stops Zimbra and all the other managed services; they are taken over by the other node, which starts the services from the beginning.

  • Hi iman,

Thank you very much for your update. I tried to monitor the process. When I stop the heartbeat service on node1, the alias IP switches to node2, /opt is mounted, and the zimbra folder shows up on node2. But after 1-2 minutes the alias IP disappears, /opt shows nothing, and the zimbra service can't start. How can I fix it?

    Thanks and Regards,
    Tidapat

      • Hi iman,

I see, Iman, no problem. I have some questions about Zimbra HA. Does Zimbra HA require the Pacemaker service or not? And can it work when running on VMware Server? I'm not sure whether my issue is happening because it's running on VMware.

        Thanks and Regards,
        Tidapat U.

This article is very useful. But after I installed HA I can't start the zmcontrol service; it says:
[root@mail ~]# su - zimbra -c "zmcontrol start"
    su: warning: cannot change directory to /opt/zimbra: No such file or directory
    -bash: zmcontrol: command not found

Tell me what happened, and how to fix this.

      • [root@mail ~]# service drbd status
        drbd driver loaded OK; device status:
        version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
m:res cs ro ds p mounted fstype
0:r0 StandAlone Primary/Unknown UpToDate/DUnknown r----- ext3

        [root@mail ~]# df -h
        Filesystem Size Used Avail Use% Mounted on
        /dev/sda2 18G 8.5G 8.1G 52% /
        tmpfs 932M 72K 932M 1% /dev/shm
        /dev/sda1 283M 102M 166M 38% /boot
        /dev/drbd0 20G 5.5G 14G 30% /opt

        [root@mail ~]# ifconfig
        eth0 Link encap:Ethernet HWaddr 00:0C:29:23:B9:17
        inet addr:192.168.42.133 Bcast:192.168.42.255 Mask:255.255.255.0
        inet6 addr: fe80::20c:29ff:fe23:b917/64 Scope:Link
        UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
        RX packets:106 errors:0 dropped:0 overruns:0 frame:0
        TX packets:304 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:1000
        RX bytes:19097 (18.6 KiB) TX bytes:24802 (24.2 KiB)

        eth0:0 Link encap:Ethernet HWaddr 00:0C:29:23:B9:17
        inet addr:192.168.42.134 Bcast:192.168.42.255 Mask:255.255.255.0
        UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

        lo Link encap:Local Loopback
        inet addr:127.0.0.1 Mask:255.0.0.0
        inet6 addr: ::1/128 Scope:Host
        UP LOOPBACK RUNNING MTU:65536 Metric:1
        RX packets:709 errors:0 dropped:0 overruns:0 frame:0
        TX packets:709 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:0
        RX bytes:1126112 (1.0 MiB) TX bytes:1126112 (1.0 MiB)

        [root@mail ~]# su zimbra
[zimbra@mail root]$ zmcontrol status
Host mail.vncs.com
amavis Stopped
amavisd is not running.
antispam Stopped
zmamavisdctl is not running
antivirus Running
ldap Running
logger Running
mailbox Running
memcached Stopped
memcached is not running.
mta Stopped
zmsaslauthdctl is not running
postfix is not running
opendkim Stopped
zmopendkimctl is not running.
proxy Stopped
zmnginxctl is not running
service webapp Running
snmp Stopped
zmswatch is not running.
spell Stopped
zmapachectl is not running
stats Stopped
zimbra webapp Running
zimbraAdmin webapp Running
zimlet webapp Running
zmconfigd Running
        [zimbra@mail root]$ zmcontrol start
        Host mail.vncs.com
        Starting zmconfigd…Done.
        Starting logger…Done.
        Starting mailbox…Done.
        Starting memcached…Done.
        Starting proxy…Done.
        Starting amavis…Done.
        Starting antispam…Done.
        Starting antivirus…Done.
        Starting opendkim…Done.
        Starting snmp…Done.
        Starting spell…Done.
        Starting mta…Failed.
        Starting saslauthd…already running.
        postfix failed to start

        Starting stats…Done.
        Starting service webapp…Done.
        Starting zimbra webapp…Done.
        Starting zimbraAdmin webapp…Done.
        Starting zimlet webapp…Done.
        [zimbra@mail root]$ zmcontrol status
        bash: /opt/zimbra/bin/zmcontrol: No such file or directory
        [zimbra@mail root]$ zmcontrol status
        bash: /opt/zimbra/bin/zmcontrol: No such file or directory
        [zimbra@mail root]$ zmcontrol restart
        bash: /opt/zimbra/bin/zmcontrol: No such file or directory

          • # Node 1
            [root@mail ~]# ls /opt/
            created-on-node1.txt zcs-8.6.0_GA_1153.RHEL6_64.20141215151155
            lost+found zcs-8.6.0_GA_1153.RHEL6_64.20141215151155.tgz
            rh zimbra

            # Node2

            [root@newmail ~]# ls /opt/
            is empty

          • More information
            # Node1
            [root@mail ~]# service heartbeat status
            heartbeat OK [pid 2358 et al] is running on mail.vncs.com [mail.vncs.com]…
            [root@mail ~]# service drbd status
            drbd driver loaded OK; device status:
            version: 8.3.16 (api:88/proto:86-97)
            GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
            m:res cs ro ds p mounted fstype
0:r0 StandAlone Primary/Unknown UpToDate/DUnknown r----- ext3
            [root@mail ~]#
            #Node2
            [root@newmail ~]# service heartbeat status
            heartbeat OK [pid 2323 et al] is running on newmail.vncs.com [newmail.vncs.com]…
            [root@newmail ~]# service drbd status
            drbd driver loaded OK; device status:
            version: 8.3.16 (api:88/proto:86-97)
            GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
            m:res cs ro ds p mounted fstype
            0:r0 WFConnection Secondary/Unknown UpToDate/DUnknown C
            [root@newmail ~]#

          • Hi Lee,

It seems your DRBD device is not mounted and not connected to the other node. Please try to reconnect DRBD by restarting the service on all nodes (service drbd restart). Next, make the first node primary with this command:

            drbdadm primary all
            
          • #Node1
            [zimbra@mail root]$ zmcontrol status
            Host mail.vncs.com
            amavis Running
            antispam Running
            antivirus Running
            ldap Running
            logger Running
            mailbox Running
            memcached Running
            mta Running
            opendkim Running
            proxy Running
            service webapp Running
            snmp Running
            spell Running
            stats Running
            zimbra webapp Running
            zimbraAdmin webapp Running
            zimlet webapp Running
            zmconfigd Running
            [zimbra@mail root]$

            #Node2
            [root@newmail ~]# su zimbra
            bash-4.1$
            Zimbra account lost 🙂

  • [zimbra@mail root]$ zmcontrol
    bash: /opt/zimbra/bin/zmcontrol: No such file or directory
    [zimbra@mail root]$ exit
    exit
    [root@mail ~]# su zimbra
    bash-4.1$ exit

Regarding the step "change IP address of mail so that it refers to IP 192.168.80.93": where should I do that? On node1, node2, or should I make the alias IP a DNS server? By the way, thanks for the work; everything is installed (Zimbra, DNS, DRBD), but I am stuck at that part.

When I access the server I get this error in the browser:
    HTTP ERROR 502

    Problem accessing ZCS upstream server. Cannot connect to the ZCS upstream server. Connection is refused.
    Possible reasons:

    upstream server is unreachable
    upstream server is currently being upgraded
    upstream server is down

    Please contact your ZCS administrator to fix the problem

The master server is working OK, but when I shut down the master, the switch to the slave doesn't seem to work. I've got a copy of the Heartbeat log file:
Jan 23 08:20:06 mail5.companytt.tn heartbeat: [28744]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
    Jan 23 08:20:06 mail5.companytt.tn heartbeat: [28744]: info: **************************
    Jan 23 08:20:06 mail5.companytt.tn heartbeat: [28744]: info: Configuration validated. Starting heartbeat 3.0.4
    Jan 23 08:20:06 mail5.companytt.tn heartbeat: [28745]: info: heartbeat: version 3.0.4
    Jan 23 08:20:06 mail5.companytt.tn heartbeat: [28745]: info: Heartbeat generation: 1453471986
    Jan 23 08:20:06 mail5.companytt.tn heartbeat: [28745]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
    Jan 23 08:20:06 mail5.companytt.tn heartbeat: [28745]: info: glib: ucast: bound send socket to device: eth0
    Jan 23 08:20:06 mail5.companytt.tn heartbeat: [28745]: info: glib: ucast: set SO_REUSEPORT(w)
    Jan 23 08:20:06 mail5.companytt.tn heartbeat: [28745]: info: glib: ucast: bound receive socket to device: eth0
    Jan 23 08:20:06 mail5.companytt.tn heartbeat: [28745]: info: glib: ucast: set SO_REUSEPORT(w)
    Jan 23 08:20:06 mail5.companytt.tn heartbeat: [28745]: info: glib: ucast: started on port 694 interface eth0 to 172.17.12.183
    Jan 23 08:20:06 mail5.companytt.tn heartbeat: [28745]: info: G_main_add_TriggerHandler: Added signal manual handler
    Jan 23 08:20:06 mail5.companytt.tn heartbeat: [28745]: info: G_main_add_TriggerHandler: Added signal manual handler
    Jan 23 08:20:06 mail5.companytt.tn heartbeat: [28745]: info: G_main_add_SignalHandler: Added signal handler for signal 17
    Jan 23 08:20:06 mail5.companytt.tn heartbeat: [28745]: info: Local status now set to: ‘up’
    Jan 23 08:21:36 mail5.companytt.tn heartbeat: [28745]: WARN: node mail6.companytt.tn: is dead
    Jan 23 08:21:36 mail5.companytt.tn heartbeat: [28745]: info: Comm_now_up(): updating status to active
    Jan 23 08:21:36 mail5.companytt.tn heartbeat: [28745]: info: Local status now set to: ‘active’
    Jan 23 08:21:36 mail5.companytt.tn heartbeat: [28745]: WARN: No STONITH device configured.
    Jan 23 08:21:36 mail5.companytt.tn heartbeat: [28745]: WARN: Shared disks are not protected.
    Jan 23 08:21:36 mail5.companytt.tn heartbeat: [28745]: info: Resources being acquired from mail6.companytt.tn.
    Jan 23 08:21:36 mail5.companytt.tn heartbeat: [30770]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys mail5.companytt.tn] to acquire.
    harc(default)[30769]: 2016/01/23_08:21:36 info: Running /etc/ha.d//rc.d/status status
    mach_down(default)[30799]: 2016/01/23_08:21:36 info: Taking over resource group IPaddr::172.17.12.200/24/eth0:0
    ResourceManager(default)[30825]: 2016/01/23_08:21:36 info: Acquiring resource group: mail6.companytt.tn IPaddr::172.17.12.200/24/eth0:0 drbddisk::r0 Filesystem::/dev/drbd1::/opt::ext4 named zimbra
    /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.17.12.200)[30852]: 2016/01/23_08:21:37 INFO: Resource is stopped
    ResourceManager(default)[30825]: 2016/01/23_08:21:37 info: Running /etc/ha.d/resource.d/IPaddr 172.17.12.200/24/eth0:0 start
    IPaddr(IPaddr_172.17.12.200)[30981]: 2016/01/23_08:21:37 INFO: Adding inet address 172.17.12.200/24 with broadcast address 172.17.12.255 to device eth0 (with label eth0:0)
    IPaddr(IPaddr_172.17.12.200)[30981]: 2016/01/23_08:21:37 INFO: Bringing device eth0 up
    IPaddr(IPaddr_172.17.12.200)[30981]: 2016/01/23_08:21:37 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-172.17.12.200 eth0 172.17.12.200 auto not_used not_used
    /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.17.12.200)[30955]: 2016/01/23_08:21:37 INFO: Success
    ResourceManager(default)[30825]: 2016/01/23_08:21:37 info: Running /etc/ha.d/resource.d/drbddisk r0 start
    Jan 23 08:21:47 mail5.companytt.tn heartbeat: [28745]: info: Local Resource acquisition completed. (none)
    Jan 23 08:21:47 mail5.companytt.tn heartbeat: [28745]: info: local resource transition completed.
    ResourceManager(default)[30825]: 2016/01/23_08:21:49 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
    ResourceManager(default)[30825]: 2016/01/23_08:21:49 CRIT: Giving up resources due to failure of drbddisk::r0
    ResourceManager(default)[30825]: 2016/01/23_08:21:49 info: Releasing resource group: mail6.companytt.tn IPaddr::172.17.12.200/24/eth0:0 drbddisk::r0 Filesystem::/dev/drbd1::/opt::ext4 named zimbra
    ResourceManager(default)[30825]: 2016/01/23_08:21:49 info: Running /etc/init.d/zimbra stop
    ResourceManager(default)[30825]: 2016/01/23_08:21:49 ERROR: Return code 127 from /etc/init.d/zimbra
    ResourceManager(default)[30825]: 2016/01/23_08:21:50 info: Retrying failed stop operation [zimbra]
    ResourceManager(default)[30825]: 2016/01/23_08:21:50 info: Running /etc/init.d/zimbra stop
    ResourceManager(default)[30825]: 2016/01/23_08:21:50 ERROR: Return code 127 from /etc/init.d/zimbra
    ResourceManager(default)[30825]: 2016/01/23_08:21:51 info: Retrying failed stop operation [zimbra]
    ResourceManager(default)[30825]: 2016/01/23_08:21:51 info: Running /etc/init.d/zimbra stop
    ResourceManager(default)[30825]: 2016/01/23_08:21:51 ERROR: Return code 127 from /etc/init.d/zimbra
    ResourceManager(default)[30825]: 2016/01/23_08:21:52 info: Retrying failed stop operation [zimbra]
    ResourceManager(default)[30825]: 2016/01/23_08:21:52 info: Running /etc/init.d/zimbra stop
    ResourceManager(default)[30825]: 2016/01/23_08:21:52 ERROR: Return code 127 from /etc/init.d/zimbra
    ResourceManager(default)[30825]: 2016/01/23_08:21:53 info: Retrying failed stop operation [zimbra]
    ResourceManager(default)[30825]: 2016/01/23_08:21:53 info: Running /etc/init.d/zimbra stop
    ResourceManager(default)[30825]: 2016/01/23_08:21:53 ERROR: Return code 127 from /etc/init.d/zimbra
    ResourceManager(default)[30825]: 2016/01/23_08:21:54 info: Retrying failed stop operation [zimbra]
    ResourceManager(default)[30825]: 2016/01/23_08:21:54 info: Running /etc/init.d/zimbra stop
    ResourceManager(default)[30825]: 2016/01/23_08:21:54 ERROR: Return code 127 from /etc/init.d/zimbra
    ResourceManager(default)[30825]: 2016/01/23_08:21:55 info: Retrying failed stop operation [zimbra]
    ResourceManager(default)[30825]: 2016/01/23_08:21:55 info: Running /etc/init.d/zimbra stop
    ResourceManager(default)[30825]: 2016/01/23_08:21:55 ERROR: Return code 127 from /etc/init.d/zimbra
    ResourceManager(default)[30825]: 2016/01/23_08:21:56 info: Retrying failed stop operation [zimbra]
    ResourceManager(default)[30825]: 2016/01/23_08:21:57 info: Running /etc/init.d/zimbra stop
    ResourceManager(default)[30825]: 2016/01/23_08:21:57 ERROR: Return code 127 from /etc/init.d/zimbra
    ResourceManager(default)[30825]: 2016/01/23_08:21:58 info: Retrying failed stop operation [zimbra]
    ResourceManager(default)[30825]: 2016/01/23_08:21:58 info: Running /etc/init.d/zimbra stop
    ResourceManager(default)[30825]: 2016/01/23_08:21:58 ERROR: Return code 127 from /etc/init.d/zimbra
    ResourceManager(default)[30825]: 2016/01/23_08:21:59 info: Retrying failed stop operation [zimbra]
    ResourceManager(default)[30825]: 2016/01/23_08:21:59 info: Running /etc/init.d/zimbra stop
    ResourceManager(default)[30825]: 2016/01/23_08:21:59 ERROR: Return code 127 from /etc/init.d/zimbra
    ResourceManager(default)[30825]: 2016/01/23_08:22:00 info: Retrying failed stop operation [zimbra]
    ResourceManager(default)[30825]: 2016/01/23_08:22:00 info: Running /etc/init.d/zimbra stop
    ResourceManager(default)[30825]: 2016/01/23_08:22:00 ERROR: Return code 127 from /etc/init.d/zimbra
    ResourceManager(default)[30825]: 2016/01/23_08:22:00 ERROR: Resource script for zimbra probably not LSB-compliant.
    ResourceManager(default)[30825]: 2016/01/23_08:22:00 WARN: it (zimbra) MUST succeed on a stop when already stopped
    ResourceManager(default)[30825]: 2016/01/23_08:22:00 WARN: Machine reboot narrowly avoided!
    ResourceManager(default)[30825]: 2016/01/23_08:22:00 info: Running /etc/init.d/named stop
    ResourceManager(default)[30825]: 2016/01/23_08:22:02 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd1 /opt ext4 stop
    Filesystem(Filesystem_/dev/drbd1)[32662]: 2016/01/23_08:22:02 INFO: Running stop for /dev/drbd1 on /opt
    /usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd1)[32654]: 2016/01/23_08:22:02 INFO: Success
    ResourceManager(default)[30825]: 2016/01/23_08:22:02 info: Running /etc/ha.d/resource.d/drbddisk r0 stop
    ResourceManager(default)[30825]: 2016/01/23_08:22:02 info: Running /etc/ha.d/resource.d/IPaddr 172.17.12.200/24/eth0:0 stop
    IPaddr(IPaddr_172.17.12.200)[317]: 2016/01/23_08:22:02 INFO: IP status = ok, IP_CIP=
    /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.17.12.200)[32759]: 2016/01/23_08:22:02 INFO: Success
    mach_down(default)[30799]: 2016/01/23_08:22:02 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
    mach_down(default)[30799]: 2016/01/23_08:22:02 info: mach_down takeover complete for node mail6.companytt.tn.
    Jan 23 08:22:02 mail5.companytt.tn heartbeat: [28745]: info: mach_down takeover complete.
    Jan 23 08:22:02 mail5.companytt.tn heartbeat: [28745]: info: Initial resource acquisition complete (mach_down)
    hb_standby(default)[1645]: 2016/01/23_08:22:32 Going standby [foreign].
    Jan 23 08:22:33 mail5.companytt.tn heartbeat: [28745]: info: mail5.companytt.tn wants to go standby [foreign]
    Jan 23 08:22:44 mail5.companytt.tn heartbeat: [28745]: WARN: No reply to standby request. Standby request cancelled.

    • Hi,

It seems your DRBD device is not working properly and not mounted. Please paste the output of the following commands when the whole system is running fine (master and slave):

      service drbd status
      cat /etc/selinux/config
      iptables -L
      
Thanks for replying, Iman, I'm very grateful. This is what you asked me for.
The master:
        service drbd status:
        drbd driver loaded OK; device status:
        version: 8.3.16 (api:88/proto:86-97)
        GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
        m:res cs ro ds p mounted fstype
        1:r0 Connected Primary/Secondary UpToDate/UpToDate C

        cat /etc/selinux/config
        # This file controls the state of SELinux on the system.
        # SELINUX= can take one of these three values:
        # enforcing - SELinux security policy is enforced.
        # permissive - SELinux prints warnings instead of enforcing.
        # disabled - No SELinux policy is loaded.
        SELINUX=disabled
        # SELINUXTYPE= can take one of these two values:
        # targeted - Targeted processes are protected,
        # mls - Multi Level Security protection.
        SELINUXTYPE=targeted

        iptables -L
        Chain INPUT (policy ACCEPT)
        target prot opt source destination

        Chain FORWARD (policy ACCEPT)
        target prot opt source destination

        Chain OUTPUT (policy ACCEPT)
        target prot opt source destination

        The slave:
        service drbd status
        drbd driver loaded OK; device status:
        version: 8.3.16 (api:88/proto:86-97)
        GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2014-11-24 14:51:37
        m:res cs ro ds p mounted fstype
        1:r0 Connected Secondary/Primary UpToDate/UpToDate C

        cat /etc/selinux/config
        # This file controls the state of SELinux on the system.
        # SELINUX= can take one of these three values:
        # enforcing - SELinux security policy is enforced.
        # permissive - SELinux prints warnings instead of enforcing.
        # disabled - No SELinux policy is loaded.
        SELINUX=disabled
        # SELINUXTYPE= can take one of these two values:
        # targeted - Targeted processes are protected,
        # mls - Multi Level Security protection.
        SELINUXTYPE=targeted

        iptables -L
        Chain INPUT (policy ACCEPT)
        target prot opt source destination

        Chain FORWARD (policy ACCEPT)
        target prot opt source destination

        Chain OUTPUT (policy ACCEPT)
        target prot opt source destination

        • Hi Anis,

          All of your configuration looks good. Please try running this command on both nodes and test again:

          setenforce 0
          

          If the problem persists, I will try to make a video ASAP 😉
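
          Beyond the runtime command above, the SELinux change can be made persistent across reboots. A minimal sketch, assuming the stock CentOS 6 /etc/selinux/config layout; the guards are there so the script also runs cleanly on hosts where SELinux is already disabled:

          ```shell
          # Stop SELinux enforcing for the running system; '|| true' keeps the
          # script going if SELinux is already disabled on this host
          setenforce 0 || true

          # Make the change persistent across reboots by switching
          # SELINUX=enforcing to SELINUX=permissive in the config file
          if [ -f /etc/selinux/config ]; then
              sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
              grep '^SELINUX=' /etc/selinux/config   # show the saved setting
          fi
          ```

          If your config already says SELINUX=disabled (as in the paste above), the sed is a no-op and nothing further is needed.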

  • Great article, Iman.

    Can you guide me on how to follow your article in a multi-server installation environment? Do I need to install HA and DRBD on the (LDAP+MTA+Proxy) server or on the mailbox server?

  • Hello Iman,
    I followed your article, but I ended up with the zimbra user being treated as a normal user account.

    After HA and DRBD were installed and configured, I finally tried to restart the Zimbra service…

    su - zimbra
    No directory, logging in with HOME=/
    zimbra@mx1:/$ exit

    Am I missing anything?

    Here is my DRBD status
    drbd driver loaded OK; device status:
    version: 8.4.3 (api:1/proto:86-101)
    srcversion: F97798065516C94BE0F27DC
    m:res cs ro ds p mounted fstype
    0:r0 Connected Primary/Secondary UpToDate/UpToDate C

    Heartbeat status
    heartbeat OK [pid 32492 et al] is running on mx1 [mx1]…

    HA logs
    Filesystem(Filesystem_/dev/drbd0)[1442]: 2016/05/09_00:41:31 INFO: Running stop for /dev/drbd0 on /opt
    Filesystem(Filesystem_/dev/drbd0)[1442]: 2016/05/09_00:41:31 INFO: Trying to unmount /opt
    Filesystem(Filesystem_/dev/drbd0)[1442]: 2016/05/09_00:41:31 INFO: unmounted /opt successfully
    /usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[1436]: 2016/05/09_00:41:31 INFO: Success
    ResourceManager(default)[32614]: 2016/05/09_00:41:31 info: Running /etc/ha.d/resource.d/drbddisk r0 stop
    ResourceManager(default)[32614]: 2016/05/09_00:41:31 info: Running /etc/ha.d/resource.d/IPaddr 10.0.0.100/24/eth0:0 stop
    IPaddr(IPaddr_10.0.0.100)[1588]: 2016/05/09_00:41:31 INFO: ifconfig eth0:3 down
    /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.0.0.100)[1564]: 2016/05/09_00:41:31 INFO: Success
    hb_standby(default)[1613]: 2016/05/09_00:42:01 Going standby [foreign].
    May 09 00:42:02 mx1 heartbeat: [32492]: info: mx1 wants to go standby [foreign]
    May 09 00:42:02 mx1 heartbeat: [32492]: info: standby: mx2 can take our foreign resources
    May 09 00:42:02 mx1 heartbeat: [1640]: info: give up foreign HA resources (standby).
    May 09 00:42:02 mx1 heartbeat: [1640]: info: foreign HA resource release completed (standby).
    May 09 00:42:02 mx1 heartbeat: [32492]: info: Local standby process completed [foreign].
    May 09 00:42:03 mx1 heartbeat: [32492]: WARN: 1 lost packet(s) for [mx2] [48:50]
    May 09 00:42:03 mx1 heartbeat: [32492]: info: remote resource transition completed.
    May 09 00:42:03 mx1 heartbeat: [32492]: info: No pkts missing from mx2!

    Note: I am awaiting your reply ASAP

    • Hi,

      I can see the problem here:

      May 09 00:42:03 mx1 heartbeat: [32492]: info: No pkts missing from mx2!

      Please paste all of your Heartbeat and DRBD configuration from every node

  • Hello Iman,
    I found the issue:

    1. If I stop the heartbeat service on node1, DRBD on the secondary node becomes primary; but after starting heartbeat on node1 again, both node1 and node2 remain Secondary/Secondary.

    2. /opt was not mounting on either node, so I am unable to run the Zimbra service.

    Any clue…? I am using Ubuntu 14.04 LTS

  • Hello Iman,
    I fixed the second issue from my previous posting myself; however, both nodes are now Secondary/Secondary. If I restart heartbeat, node2 turns into Secondary/Primary and mounts /dev/drbd0, but within a few seconds it turns back into Secondary/Secondary. The same happens on node1 when heartbeat starts.

    I am clueless as to what is causing the issue.

    • Hi Suresh,

      For testing, the heartbeat service should be restarted on node1 only; node2 does not need to do anything. If you want to test a real failover, you can shut down node1 😀
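
      For a controlled failover test without rebooting, Heartbeat ships helper scripts; a minimal sketch, assuming the standard Heartbeat package paths on CentOS 6 (the hb_standby script also appears in the logs earlier in this thread):

      ```shell
      # On node1 (the active node): hand all resources over to node2
      /usr/share/heartbeat/hb_standby

      # Watch the takeover progress in the Heartbeat log on either node
      tail -n 50 /var/log/ha-log

      # Take the resources back onto node1 afterwards
      /usr/share/heartbeat/hb_takeover

      # To simulate a real failure instead, stop Heartbeat (or power off node1)
      service heartbeat stop
      ```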

  • It seems you are not understanding my query, Iman. When node1 was shut down, the DRBD device was mounted on node2, but it unmounted immediately, so Zimbra was inactive; and vice-versa on node1, once it came back up.

    • Hi Suresh,

      Please make sure you have disabled the firewall, AppArmor, and SELinux on your system. I ran into the same problem as you when SELinux was still enabled on CentOS

  • Could this high availability be applied between a physical machine and a virtualized machine, both with the same versions of CentOS and Zimbra? I use virtualization with KVM. The virtualization host would export the DRBD-backed drive over NFS, and the virtualized machine would mount it as /opt.

  • Hi Iman ,

    I want to know whether both Zimbra servers should have the same hostname, like mail.imanudin.net, or whether they should be node1.imanudin.net and node2.imanudin.net, since Heartbeat requires each node to have a different hostname.

    Please guide.

    • Hi Neel,

      When you install Zimbra, you should use mail.imanudin.net on both servers (node1 and node2). After everything has been installed, change the hostnames to node1 and node2. mail.imanudin.net will then point to the alias IP that is managed by Heartbeat
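
      For illustration, with the example addresses from this article, /etc/hosts on both nodes would end up like this after the rename (the mail alias stays on the Heartbeat-managed IP):

      ```
      127.0.0.1       localhost
      192.168.80.91   node1.imanudin.net node1
      192.168.80.92   node2.imanudin.net node2
      192.168.80.93   mail.imanudin.net mail
      ```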

      • Hi Iman,

        Thanks, but if we change the names to node1 and node2 for Heartbeat management, will it affect Zimbra LDAP?

        Could you please help me out? I am installing Zimbra in a master/standby scenario, so that if the primary goes down I can access the secondary.

  • Hi Iman,

    I have configured the Zimbra server with a different relay server to send email externally, using fetchmail. I created some users on that relay server and configured their IDs in Outlook to send and receive email, but since last week the Outlook users (on POP) have been getting duplicate emails. There is no issue for the Zimbra users. Can you tell me whether this is a Zimbra issue or a relay-server issue?

      • We have two email servers: one is Zimbra, which handles my mail, and the second is a third-party email server named saturnworldindia. Here is what is happening: for example, my ID is created in Zimbra, and when I send mail to user abc, whose ID is created on the third-party email server, he gets a single email. But when the third-party users send internal email, they get the email twice. This issue affects the third-party email users, not the Zimbra users. What is the reason?

  • Hi, I am facing an issue where the Zimbra service gets killed automatically and /dev/drbd0 gets unmounted. Please check the logs below and kindly help me with this issue.
    ——————————————————————————-
    [root@zmbox1 ~]# tail /var/log/messages
    Nov 9 23:51:08 zmbox1 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
    Nov 9 23:51:51 zmbox1 kernel: block drbd0: role( Secondary -> Primary )
    Nov 9 23:51:51 zmbox1 kernel: kjournald starting. Commit interval 5 seconds
    Nov 9 23:51:52 zmbox1 kernel: EXT3-fs (drbd0): using internal journal
    Nov 9 23:51:52 zmbox1 kernel: EXT3-fs (drbd0): mounted filesystem with ordered data mode
    Nov 9 23:53:57 zmbox1 saslauthd: auth_zimbra_init: zimbra_cert_check is off!
    Nov 9 23:53:57 zmbox1 saslauthd: auth_zimbra_init: 1 auth urls initialized for round-robin
    Nov 9 23:54:59 zmbox1 kernel: block drbd0: role( Primary -> Secondary )
    Nov 9 23:54:59 zmbox1 kernel: block drbd0: bitmap WRITE of 0 pages took 0 jiffies
    Nov 9 23:54:59 zmbox1 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
    ——————————————————————————-

  • [root@zmbox1 ~]# cat /etc/hosts
    127.0.0.1 localhost
    10.1.1.10 zmbox1.nerist.ac.in zmbox1
    10.1.1.11 zmbox2.nerist.ac.in zmbox2
    10.1.1.12 mail.nerist.ac.in mail
    ——————————————————————
    [root@zmbox1 ~]# cat /etc/ha.d/haresources
    zmbox1.nerist.ac.in IPaddr::10.1.1.12/24/eth0:0 drbddisk::r0 Filesystem::/dev/drbd0::/opt::ext3 zimbra

  • root@zmbox2 ~]# cat /etc/hosts
    127.0.0.1 localhost
    10.1.1.10 zmbox1.nerist.ac.in zmbox1
    10.1.1.11 zmbox2.nerist.ac.in zmbox2
    10.1.1.12 mail.nerist.ac.in mail
    ——————————————————————
    [root@zmbox2 ~]# cat /etc/ha.d/haresources
    zmbox1.nerist.ac.in IPaddr::10.1.1.12/24/eth0:0 drbddisk::r0 Filesystem::/dev/drbd0::/opt::ext3 zimbra

  • Hi Iman,
    Thanks for your response.
    Yes, the /etc/ha.d/haresources entry is a single line.
    Please help me get past this issue.

  • Hi Iman,
    I have a doubt: will this also replicate errors from the primary to the secondary, say if the database is corrupted on the primary, or any other errors?
    My actual goal is to have a backup server.

    Thank you

  • We are taking a weekly backup of /opt/zimbra using rsync to another similar server,
    but during failover it showed a “No Such Blob” error.
    Is there any backup method for Zimbra Open Source?

    • Hi Saeid,

      You can move the /var/log folder to the DRBD device and create a symlink. For example, if you have mounted the DRBD device on /opt:

      mv /var/log/ /opt/
      ln -s /opt/log /var/log
      

      All data in /var/log (now a symlink to /opt/log) will be replicated by DRBD

  • What are the requirements for Zimbra HA?
    Zimbra NE?
    Load balancer?
    Shared storage?

    Anything else required?

    Is it possible to deploy an HA solution without shared storage?

    • Hi Ramesh Petl,

      Zimbra HA can use Zimbra OSE. The requirements are:

      – 2 servers (active/standby)
      – 2 hard disks on each server
      – Install and configure Heartbeat for online failover
      – Install and configure DRBD for persistent data (like RAID 1)

      Shared storage can be replaced by configuring DRBD
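
      As a sketch of the DRBD piece, a minimal r0 resource definition for the example hosts in this article could look like the following; the backing partition /dev/sdb1 and port 7788 are assumptions, so adjust them to your disk layout. Note that the names in the `on` sections must match each node's `uname -n`:

      ```
      resource r0 {
          protocol C;
          on node1.imanudin.net {
              device    /dev/drbd0;
              disk      /dev/sdb1;
              address   192.168.80.91:7788;
              meta-disk internal;
          }
          on node2.imanudin.net {
              device    /dev/drbd0;
              disk      /dev/sdb1;
              address   192.168.80.92:7788;
              meta-disk internal;
          }
      }
      ```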

  • Hi imanudin Pro,
    Could you send me a video of “How to install and configure Zimbra HA”? I am sorry, I am from Vietnam and my English is not good.
    Thank you very much!

  • Hi iman,

    I am trying to figure out how to achieve Active/Active for the Zimbra mailbox server. I can use HAProxy for the proxy, two MX records for the MTAs, and master/slave for LDAP. But for the mailbox server, I have Googled everywhere and found only Active/Standby models. Is it possible to use DRBD for Active/Active?

    Thanks.
