Fault tolerance of a CoreOS etcd2 cluster

etcd2 cluster members

CoreOS1 (Node1), IP: 172.16.15.21 (Leader)
CoreOS2 (Node2), IP: 172.16.15.22
CoreOS3 (Node3), IP: 172.16.15.23

A cluster with 3 nodes is already fault tolerant.

Cluster size 3, majority 2, failure tolerance 1

It can tolerate the failure of one node.

For the size, majority, and failure tolerance of clusters with other node counts, see https://coreos.com/etcd/docs/latest/v2/admin_guide.html#optimal-cluster-size
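The numbers above follow from majority arithmetic: a cluster of N members needs floor(N/2) + 1 of them to form a majority, and the remainder is what it can afford to lose. Below is a minimal Python sketch of that calculation. The function names `majority` and `failure_tolerance` are made up for illustration and are not part of etcd.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of members that forms a majority (quorum)."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def failure_tolerance(cluster_size: int) -> int:
    """How many members can fail while a majority still remains."""
    return cluster_size - majority(cluster_size)


if __name__ == "__main__":
    # Print a small table for cluster sizes 1 through 9.
    print("size  majority  tolerance")
    for size in range(1, 10):
        print(f"{size:>4}  {majority(size):>8}  {failure_tolerance(size):>9}")
```

For size 3 this gives majority 2 and tolerance 1, matching the line above. A size of 4 needs a majority of 3 and still tolerates only 1 failure, which is why odd sizes are usually chosen.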

For etcd2 configuration, see these related posts:

etcd2 configuration and usage: https://benjr.tw/96404
Adding/removing an etcd2 node: https://benjr.tw/96449

Let's try it. At the start, every etcd2 member is healthy.

core@coreos1 ~ $ etcdctl member list
29df81d17418132: name=node01 peerURLs=http://172.16.15.21:2380 clientURLs=http://172.16.15.21:2379 isLeader=true
48852466e20da1b8: name=node03 peerURLs=http://172.16.15.23:2380 clientURLs=http://172.16.15.23:2379 isLeader=false
e1c17ca056760d6d: name=node02 peerURLs=http://172.16.15.22:2380 clientURLs=http://172.16.15.22:2379 isLeader=false

core@coreos1 ~ $ etcdctl cluster-health
member 29df81d17418132 is healthy: got healthy result from http://172.16.15.21:2379
member 48852466e20da1b8 is healthy: got healthy result from http://172.16.15.23:2379
member e1c17ca056760d6d is healthy: got healthy result from http://172.16.15.22:2379
cluster is healthy

Next, shut down Node3.

core@coreos1 ~ $ etcdctl cluster-health
member 29df81d17418132 is healthy: got healthy result from http://172.16.15.21:2379
failed to check the health of member 48852466e20da1b8 on http://172.16.15.23:2379: Get http://172.16.15.23:2379/health: dial tcp 172.16.15.23:2379: i/o timeout
member 48852466e20da1b8 is unreachable: [http://172.16.15.23:2379] are all unreachable
member e1c17ca056760d6d is healthy: got healthy result from http://172.16.15.22:2379
cluster is healthy

Node3 is now clearly reported as unreachable, but the cluster itself is still healthy, because two of the three members still form a majority.
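If you want to check this kind of output from a script, one option is to group the members by the state that `etcdctl cluster-health` reports. The sketch below assumes the "member <id> is <state>" line format seen in this post. The helper name `summarize_health` and the `SAMPLE` text are illustrative only.

```python
import re

# Matches lines such as "member 48852466e20da1b8 is unreachable: ..."
MEMBER_LINE = re.compile(
    r"^member (?P<id>[0-9a-f]+) is (?P<state>healthy|unhealthy|unreachable)\b"
)


def summarize_health(output: str) -> dict:
    """Group member IDs by the state reported on each 'member ... is ...' line."""
    summary = {"healthy": [], "unhealthy": [], "unreachable": []}
    for line in output.splitlines():
        match = MEMBER_LINE.match(line.strip())
        if match:
            summary[match.group("state")].append(match.group("id"))
    return summary


# Abbreviated copy of the output shown after Node3 was shut down.
SAMPLE = """\
member 29df81d17418132 is healthy: got healthy result from http://172.16.15.21:2379
member 48852466e20da1b8 is unreachable: [http://172.16.15.23:2379] are all unreachable
member e1c17ca056760d6d is healthy: got healthy result from http://172.16.15.22:2379
cluster is healthy
"""

if __name__ == "__main__":
    print(summarize_health(SAMPLE))
```

Lines that do not start with "member", such as the "failed to check the health of member" line and the final "cluster is ..." line, are skipped.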

Next, shut down Node2 as well.

core@coreos1 ~ $ etcdctl cluster-health
member 29df81d17418132 is unhealthy: got unhealthy result from http://172.16.15.21:2379
failed to check the health of member 48852466e20da1b8 on http://172.16.15.23:2379: Get http://172.16.15.23:2379/health: dial tcp 172.16.15.23:2379: i/o timeout
member 48852466e20da1b8 is unreachable: [http://172.16.15.23:2379] are all unreachable
failed to check the health of member e1c17ca056760d6d on http://172.16.15.22:2379: Get http://172.16.15.22:2379/health: dial tcp 172.16.15.22:2379: i/o timeout
member e1c17ca056760d6d is unreachable: [http://172.16.15.22:2379] are all unreachable
cluster is unhealthy

Node2 is now unreachable too. Even Node1 is reported as unhealthy, and so is the cluster as a whole. A 3-node cluster can tolerate the failure of at most 1 node.
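The two shutdowns can be replayed as plain majority arithmetic, which is consistent with Node1 turning unhealthy once it is the only member left: one member out of three is not a majority. This is a small sketch only. The names `has_quorum` and `replay_shutdowns` are hypothetical, and the member names are taken from the `etcdctl member list` output earlier in this post.

```python
def has_quorum(total_members: int, reachable_members: int) -> bool:
    """True when the reachable members still form a majority of the cluster."""
    return reachable_members >= total_members // 2 + 1


def replay_shutdowns(members, shutdown_order):
    """Return (stopped_member, reachable_count, quorum_kept) after each shutdown."""
    reachable = set(members)
    steps = []
    for stopped in shutdown_order:
        reachable.discard(stopped)
        steps.append((stopped, len(reachable), has_quorum(len(members), len(reachable))))
    return steps


if __name__ == "__main__":
    members = ["node01", "node02", "node03"]
    for stopped, count, ok in replay_shutdowns(members, ["node03", "node02"]):
        state = "quorum kept" if ok else "quorum lost"
        print(f"stopped {stopped}: {count}/{len(members)} reachable, {state}")
```

After the first shutdown 2 of 3 members remain and quorum is kept. After the second only 1 of 3 remains and quorum is lost.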

To restore the etcd2 cluster, see the post on etcd2 disaster recovery: https://benjr.tw/96497
