
Fault Tolerance of a CoreOS etcd2 Cluster

etcd2 cluster members

  1. CoreOS1 (Node1), IP: 172.16.15.21 (Leader)
  2. CoreOS2 (Node2), IP: 172.16.15.22
  3. CoreOS3 (Node3), IP: 172.16.15.23


A cluster with 3 nodes is already fault tolerant.

Cluster size 3, majority 2, failure tolerance 1

It can tolerate the failure of one node.

For the size, majority, and failure tolerance of clusters with other node counts, see https://coreos.com/etcd/docs/latest/v2/admin_guide.html#optimal-cluster-size
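
The size, majority, and failure tolerance figures follow from the majority rule: a cluster of N members needs floor(N/2) + 1 of them to agree, and it can lose the remainder. A minimal sketch of that arithmetic in Python; the function names `majority` and `failure_tolerance` are illustrative and not part of etcd:

```python
def majority(cluster_size: int) -> int:
    """Smallest number of members that forms a majority (quorum)."""
    return cluster_size // 2 + 1


def failure_tolerance(cluster_size: int) -> int:
    """How many members can fail while a majority is still available."""
    return cluster_size - majority(cluster_size)


if __name__ == "__main__":
    # Print the same three columns the admin guide uses.
    print("CLUSTER SIZE  MAJORITY  FAILURE TOLERANCE")
    for size in range(1, 10):
        print(f"{size:>12}  {majority(size):>8}  {failure_tolerance(size):>17}")
```

By this formula a 4-node cluster tolerates only 1 failure, the same as a 3-node cluster, which is why odd cluster sizes are usually preferred.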

For the etcd2 configuration, see "About" (關於)

Let's try it out. At the start, every etcd2 member is healthy:

core@coreos1 ~ $ etcdctl member list
29df81d17418132: name=node01 peerURLs=http://172.16.15.21:2380 clientURLs=http://172.16.15.21:2379 isLeader=true
48852466e20da1b8: name=node03 peerURLs=http://172.16.15.23:2380 clientURLs=http://172.16.15.23:2379 isLeader=false
e1c17ca056760d6d: name=node02 peerURLs=http://172.16.15.22:2380 clientURLs=http://172.16.15.22:2379 isLeader=false
core@coreos1 ~ $ etcdctl cluster-health
member 29df81d17418132 is healthy: got healthy result from http://172.16.15.21:2379
member 48852466e20da1b8 is healthy: got healthy result from http://172.16.15.23:2379
member e1c17ca056760d6d is healthy: got healthy result from http://172.16.15.22:2379
cluster is healthy

Next, shut down Node3.

core@coreos1 ~ $ etcdctl cluster-health
member 29df81d17418132 is healthy: got healthy result from http://172.16.15.21:2379
failed to check the health of member 48852466e20da1b8 on http://172.16.15.23:2379: Get http://172.16.15.23:2379/health: dial tcp 172.16.15.23:2379: i/o timeout
member 48852466e20da1b8 is unreachable: [http://172.16.15.23:2379] are all unreachable
member e1c17ca056760d6d is healthy: got healthy result from http://172.16.15.22:2379
cluster is healthy

The output clearly shows that Node3 is now unreachable, yet the cluster is still reported as healthy: 2 of the 3 members are up, which meets the majority of 2.
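
When this check is scripted, the per-member lines can be pulled out of the `etcdctl cluster-health` output. The following is only a sketch: the regular expression assumes the output format shown in this post, which may differ in other etcdctl versions, and `parse_cluster_health` is a hypothetical helper, not an etcd API.

```python
import re

# Matches summary lines such as "member 29df81d17418132 is healthy: ..."
# (assumed format, taken from the output shown in this post).
MEMBER_LINE = re.compile(r"^member (\w+) is (healthy|unhealthy|unreachable)\b")


def parse_cluster_health(output: str) -> dict:
    """Map each member ID to the state reported for it."""
    states = {}
    for line in output.splitlines():
        match = MEMBER_LINE.match(line.strip())
        if match:
            states[match.group(1)] = match.group(2)
    return states


# Output captured after Node3 was shut down.
SAMPLE = """\
member 29df81d17418132 is healthy: got healthy result from http://172.16.15.21:2379
failed to check the health of member 48852466e20da1b8 on http://172.16.15.23:2379: Get http://172.16.15.23:2379/health: dial tcp 172.16.15.23:2379: i/o timeout
member 48852466e20da1b8 is unreachable: [http://172.16.15.23:2379] are all unreachable
member e1c17ca056760d6d is healthy: got healthy result from http://172.16.15.22:2379
cluster is healthy
"""

states = parse_cluster_health(SAMPLE)
healthy = [member for member, state in states.items() if state == "healthy"]
print(f"{len(healthy)} of {len(states)} members healthy")
```

Lines that start with "failed to check" and the final "cluster is ..." line are skipped on purpose, so only the three member entries end up in the result.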

Next, shut down Node2 as well.

core@coreos1 ~ $ etcdctl cluster-health
member 29df81d17418132 is unhealthy: got unhealthy result from http://172.16.15.21:2379
failed to check the health of member 48852466e20da1b8 on http://172.16.15.23:2379: Get http://172.16.15.23:2379/health: dial tcp 172.16.15.23:2379: i/o timeout
member 48852466e20da1b8 is unreachable: [http://172.16.15.23:2379] are all unreachable
failed to check the health of member e1c17ca056760d6d on http://172.16.15.22:2379: Get http://172.16.15.22:2379/health: dial tcp 172.16.15.22:2379: i/o timeout
member e1c17ca056760d6d is unreachable: [http://172.16.15.22:2379] are all unreachable
cluster is unhealthy

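Node2 is now unreachable too. Even Node1 reports unhealthy, and so does the cluster as a whole: with only 1 of 3 members left, the majority of 2 can no longer be reached. A 3-node cluster can tolerate the failure of at most 1 node.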
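
The three stages of this experiment can be replayed as a simple quorum check. This is a sketch of the majority rule only; `has_quorum` is a hypothetical helper, and it does not model how etcd itself detects failures.

```python
def has_quorum(cluster_size: int, reachable: int) -> bool:
    """True when the reachable members still form a majority."""
    return reachable >= cluster_size // 2 + 1


# Stages of the experiment: (description, members still running)
stages = [
    ("all nodes up", 3),
    ("Node3 down", 2),
    ("Node3 and Node2 down", 1),
]

for description, reachable in stages:
    verdict = "quorum kept" if has_quorum(3, reachable) else "quorum lost"
    print(f"{description}: {reachable}/3 reachable, {verdict}")
```

Only the last stage loses quorum, which matches the point at which `etcdctl cluster-health` switched to "cluster is unhealthy".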

To restore the etcd2 cluster, see the post on etcd2 disaster recovery: http://benjr.tw/96497
