HA集群配置

HA 即（high available）高可用，又被叫做双机热备，用于关键性业务。简单理解就是，有两台机器A和B，正常是A提供服务，B待命闲置，当A宕机或服务宕掉，会切换至B机器继续提供服务。常用实现高可用的开源软件有heartbeat和keepalived，其中keepalived有负载均衡的功能。
如图所示为一个HA架构，一个交换机下面有两台机器web1和web2，其中web1为主节点，正常是它在提供服务，web2为备用闲置节点。web1和web2中间有一个心跳线，检查对方是否存活状态。流动IP，也叫vip是对外提供服务的IP，正常情况下是配置在web1上的，当web1宕机后，web2会自动配置该vip，对外提供服务。

下面我们用heartbeat来做HA集群，并且把Nginx服务作为HA对应的服务。

准备工作：
两个机器，都是CentOS6.5 网卡eth0 ip如下：

master 192.168.0.161
slave  192.168.0.162

1. 设置hostname （主从上都要进行）

# 在主上
hostname master
bash
#在从上
hostname slave
bash

2. 关闭防火墙（主从上都要进行）

# iptables
iptables -F 
service iptables save
# selinux
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

3. 配置hosts （主从上都要进行）

vim /etc/hosts
# 加入：
192.168.0.161 master
192.168.0.162 slave

4. 安装epel扩展源（主从上都要进行）

1	yum install -y epel-release

5. 安装heartbeat libnet Nginx

1	yum install -y heartbeat* libnet nginx

6. 主（master）上配置

cd /usr/share/doc/heartbeat-3.0.4/
cp  authkeys  ha.cf haresources   /etc/ha.d/
cd /etc/ha.d
vi  authkeys  //加入或更改为

auth 3
3 md5 Hello!

chmod 600 authkeys
vi  haresources  //加入

master 192.168.0.150/24/eth0:0 nginx   

vi  ha.cf   //改为如下内容：
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility     local0
keepalive 2
deadtime 30
warntime 10
initdead 60
udpport 694
ucast eth0 192.168.0.162
auto_failback on
node    master
node    slave
ping 192.168.0.1
respawn hacluster /usr/lib/heartbeat/ipfail

配置说明：

debugfile /var/log/ha-debug：保存heartbeat的调试信息；
dlogfile /var/log/ha-log：heartbeat日志信息；
dlogfacility local0：日志级别；
dkeepalive 2：跳时间间隔；
ddeadtime 30：超出该时间间隔未收到对方节点心跳，则认为对方已死亡；
dwarntime 10：超出该时间间隔未收到对方节点心跳，则发出警告并记录到日志；
dinitdead 60：在某些系统上，系统启动或重启之后需要经过一段时间网络才能恢复正常工作，该选项用于解决这种情况产生的时间间隔；
dudpport 694：设置广播通信使用的端口，649为默认端口；
ducast eth0 192.168.0.162：设置对方奇迹心跳检测的网卡和IP；
dauto_failback on：heartbeat的两台之极分别为主节点和从节点，主节点在正常情况下占用资源并运行所有服务，遇到故障时把资源交给从节点并由从节点运行服务；
dnode master：指定主；
dnode slave：指定从；
dping 192.168.0.1
drespawn hacluster/usr/lib/heartbeat/ipfail：指定与heartbeat一同启动和关闭的进程，该进程被自动监听视，遇到故障则从新启动。最常见的进程是ipfail，该进程用于检测和处理网络故障，需要配合ping语句指定pingnode来检测网络连接。如果你的系统是64位，请注意该文件路径

7. 把主上的三个配置拷贝到从上

1 2	cd /etc/ha.d/ scp authkeys ha.cf haresources slave:/etc/ha.d/

8. 到从上(slave) 编辑ha.cf

1 2	vi /etc/ha.d/ha.cf //只需要更改一个地方 ucast eth1 192.168.0.162 改为 ucast eth1 192.168.0.161

9. 启动heartbeat

先主后从

1	service heartbeat start

10. 检查测试

ifconfig

看是否有 eth0:0

1	ps aux \|grep nginx

看是否有nginx进程

11. 测试1

主上故意禁ping

1	iptables -I INPUT -p icmp -j DROP

可以看到日志/var/log/ha-log 发生如下变化

ResourceManager(default)[1751]: 2017/03/28_23:01:07 info: Running /etc/init.d/nginx  start
Mar 28 23:06:49 master heartbeat: [1543]: WARN: node 192.168.0.1: is dead
Mar 28 23:06:49 master heartbeat: [1543]: info: Link 192.168.0.1:192.168.0.1 dead.
Mar 28 23:06:49 master ipfail: [1571]: info: Status update: Node 192.168.0.1 now has status dead
harc(default)[2053]:    2017/03/28_23:06:49 info: Running /etc/ha.d//rc.d/status status
Mar 28 23:06:51 master ipfail: [1571]: info: NS: We are dead. :<
Mar 28 23:06:51 master ipfail: [1571]: info: Link Status update: Link 192.168.0.1/192.168.0.1 now has status dead
Mar 28 23:06:52 master ipfail: [1571]: info: We are dead. :<
Mar 28 23:06:52 master ipfail: [1571]: info: Asking other side for ping node count.
Mar 28 23:06:55 master ipfail: [1571]: info: Giving up because we were told that we have less ping nodes.
Mar 28 23:06:55 master ipfail: [1571]: info: Delayed giveup in 4 seconds.
Mar 28 23:06:59 master ipfail: [1571]: info: giveup() called (timeout worked)
Mar 28 23:06:59 master heartbeat: [1543]: info: master wants to go standby [all]
Mar 28 23:07:00 master heartbeat: [1543]: info: standby: slave can take our all resources
Mar 28 23:07:00 master heartbeat: [2079]: info: give up all HA resources (standby).
ResourceManager(default)[2092]: 2017/03/28_23:07:00 info: Releasing resource group: master 192.168.0.150/24/eth0:0 nginx
ResourceManager(default)[2092]: 2017/03/28_23:07:00 info: Running /etc/init.d/nginx  stop
ResourceManager(default)[2092]: 2017/03/28_23:07:00 info: Running /etc/ha.d/resource.d/IPaddr 192.168.0.150/24/eth0:0 stop
IPaddr(IPaddr_192.168.0.150)[2178]:     2017/03/28_23:07:00 INFO: IP status = ok, IP_CIP=
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.0.150)[2152]:  2017/03/28_23:07:00 INFO:  Success
Mar 28 23:07:00 master heartbeat: [2079]: info: all HA resource release completed (standby).
Mar 28 23:07:00 master heartbeat: [1543]: info: Local standby process completed [all].
Mar 28 23:07:02 master heartbeat: [1543]: WARN: 1 lost packet(s) for [slave] [197:199]
Mar 28 23:07:02 master heartbeat: [1543]: info: remote resource transition completed.
Mar 28 23:07:02 master heartbeat: [1543]: info: No pkts missing from slave!
Mar 28 23:07:02 master heartbeat: [1543]: info: Other node completed standby takeover of all resources.

12. 测试2

主上停止heartbeat服务

1	service heartbeat stop