Let's start by looking at the cloud-config.yaml configuration file.
#cloud-config
hostname: coreos1
ssh_authorized_keys:
  - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5wZYPD/mBs+9O9CrUxdg9kpOus24VrMuNncdt4BRc4iF5npV90HYe5j/y3IG6+2MRbAb2edyf/FUcaJHN/V+i123456yuqyAT2rv9T0eB2+wpmYCUQzqZscJP2uLK8jMhezKWS0l7X5CgJf+d17VooS6CADR9MyTbku3upKp5yEnsCfB+pBLGdrqCUTnGHPfJcLTBIvuMriz/kae0azxcderfbw7YWR8oKdWjKYKlznnBmH6VYFcgv/jSXbRbdZjKNSXIm2xIj6TIIJmo6sWhptcGohi467ODyrzCDioXD1MsYx6ImTMcY5mzL2RDePAW7CM4gWIMaIxDeL5e10SX ben@appledeAir
coreos:
  units:
    - name: etcd2.service
      command: start
    - name: systemd-networkd.service
      command: stop
    - name: 00-eth0.network
      runtime: true
      content: |
        [Match]
        Name=ens32

        [Network]
        Address=172.16.15.21/24
        Gateway=172.16.15.2
        DNS=168.95.1.1
    - name: systemd-networkd.service
      command: start
  etcd2:
    name: "node01"
    discovery: https://discovery.etcd.io/9dd875ca6dd759d67445a681adde3875
    advertise-client-urls: http://172.16.15.21:2379
    initial-advertise-peer-urls: http://172.16.15.21:2380
    listen-client-urls: http://0.0.0.0:2379
    listen-peer-urls: http://172.16.15.21:2380
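hostname, ssh_authorized_keys, units, and the network settings were already covered in an earlier post (https://benjr.tw/96511), so this time the focus is on etcd2.
So what does etcd do? etcd is a distributed key/value store (a cluster needs at least three nodes to be fault tolerant; the data is replicated to each node to keep it reliable). Unlike a traditional relational database system (which is basically a set of tables), etcd2 stores nothing but key/value pairs and, like a hash table, looks data up by its key (the NoSQL approach).
For fleet, see https://benjr.tw/96502
This CoreOS machine uses the static IP 172.16.15.21/24 and DNS 168.95.1.1 (Chunghwa Telecom's DNS).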
etcd2
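About the contents of the etcd2 configuration:
- name: "node01"
The name here is the etcd node name, which is separate from the hostname set earlier. It can be left unset, in which case the system assigns a generated ID string as the name.
- discovery: https://discovery.etcd.io/9dd875ca6dd759d67445a681adde3875
The discovery service is how members find each other's cluster peer list (peer IPs are used for direct communication between servers). The token has to be generated at https://discovery.etcd.io/ ; simply open https://discovery.etcd.io/new?size=1 in a browser. A cluster needs at least 3 nodes to be fault tolerant, but I am setting up only one etcd2 node for now and will add more later with etcdctl member add. For the SIZE, MAJORITY, and FAILURE TOLERANCE of a fault-tolerant cluster, see https://coreos.com/etcd/docs/latest/v2/admin_guide.html#optimal-cluster-size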
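The relationship between cluster size, majority, and failure tolerance is simple arithmetic, so here is a minimal sketch of it. The function names majority, failure_tolerance, and cluster_table are made up for this illustration; the formulas are majority = floor(N/2) + 1 and tolerance = N - majority.

```shell
#!/bin/sh
# Hypothetical helpers (not part of etcd): quorum arithmetic for N members.

# Members that must agree: floor(N/2) + 1
majority() {
  echo $(( $1 / 2 + 1 ))
}

# Members that may fail while the cluster keeps working: N - majority
failure_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Print one line per cluster size from 1 to 5.
cluster_table() {
  size=1
  while [ "$size" -le 5 ]; do
    echo "size=$size majority=$(majority "$size") tolerance=$(failure_tolerance "$size")"
    size=$(( size + 1 ))
  done
}
```

Calling cluster_table shows why a single node (size=1) tolerates no failures and why going from 3 to 4 members does not add any tolerance.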
Besides generating the token on the website, you can also get one from the command line.
core@localhost ~ $ curl -w "\n" 'https://discovery.etcd.io/new?size=1'
https://discovery.etcd.io/9dd875ca6dd759d67445a681adde3875
- advertise-client-urls: http://172.16.15.21:2379
Many sample configurations also set port 4001. I looked it up: that port exists for compatibility with older etcd versions, so I do not set it here.
advertise-client-urls – List of this member's client URLs to advertise to the rest of the cluster. These URLs can contain domain names.
Besides an IP address, a domain name also works, as long as it can be resolved.
- initial-advertise-peer-urls: http://172.16.15.21:2380
initial-advertise-peer-urls – List of this member's peer URLs to advertise to the rest of the cluster. These addresses are used for communicating etcd data around the cluster. At least one must be routable to all cluster members. These URLs can contain domain names.
In other words, these are the peer URLs (used for direct server-to-server communication) that this member announces to the rest of the cluster. Besides an IP address, a domain name also works, as long as it can be resolved.
- listen-client-urls: http://0.0.0.0:2379
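listen-client-urls – List of URLs to listen on for client traffic.
Many sample configurations also listen on port 4001 here; that port is only for compatibility with older etcd versions, so I leave it out.
This is where etcd client traffic is served. http://0.0.0.0 listens on every interface, so any client can connect.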
- listen-peer-urls: http://172.16.15.21:2380
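listen-peer-urls – List of URLs to listen on for peer traffic.
This is used for data exchange between peer nodes.
For the other parameters, see the official documentation at https://coreos.com/etcd/docs/latest/op-guide/configuration.html
After booting from the CoreOS CD you land in text mode and install directly with the coreos-install command. This time -c is used to specify a cloud-init config.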
core@localhost ~ $ sudo coreos-install -d /dev/sda -C stable -c ~/cloud-config.yaml
2016/12/21 09:41:12 Checking availability of "local-file"
2016/12/21 09:41:12 Fetching user-data from datasource of type "local-file"
Downloading the signature for https://stable.release.core-os.net/amd64-usr/1185.3.0/coreos_production_image.bin.bz2...
2016-12-21 09:41:14 URL:https://stable.release.core-os.net/amd64-usr/1185.3.0/coreos_production_image.bin.bz2.sig [543/543] -> "/tmp/coreos-install.fmCj9mKD5k/coreos_production_image.bin.bz2.sig" [1]
Downloading, writing and verifying coreos_production_image.bin.bz2...
...
Success! CoreOS stable 1185.3.0 is installed on /dev/sda
core@localhost ~ $ sudo reboot
Parameters used:
-d ( DEVICE ) – Install CoreOS to the given device.
-C ( CHANNEL ) – Release channel to use (e.g. stable, beta)
-c ( CLOUD ) – Insert a cloud-init config to be executed on boot.
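A cloud-config file has to begin with the line #cloud-config, so a quick check before installing can save a reinstall. The sketch below is only an illustration: has_cloud_config_header and safe_install are hypothetical helper names, and the coreos-install flags are the same ones used in this post.

```shell
#!/bin/sh
# Succeed only if the first line of the file is exactly "#cloud-config".
has_cloud_config_header() {
  [ "$(head -n 1 "$1")" = "#cloud-config" ]
}

# Hypothetical wrapper: run the install from this post only when the
# header check passes. /dev/sda and the stable channel are the values
# used above; adjust them for your own machine.
safe_install() {
  if has_cloud_config_header "$1"; then
    sudo coreos-install -d /dev/sda -C stable -c "$1"
  else
    echo "refusing to install: $1 does not start with #cloud-config" >&2
    return 1
  fi
}
```

For example, safe_install ~/cloud-config.yaml would stop early if the header line were missing. It checks only the first line, not the YAML that follows.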
After the reboot you can connect over SSH.
Once the installation is done, check that the etcd2 service is running properly.
core@coreos1 ~ $ sudo systemctl status etcd2
● etcd2.service - etcd2
   Loaded: loaded (/usr/lib/systemd/system/etcd2.service; disabled; vendor preset: disabled)
  Drop-In: /run/systemd/system/etcd2.service.d
           └─20-cloudinit.conf
   Active: active (running) since Wed 2017-01-11 05:55:26 UTC; 27min ago
 Main PID: 957 (etcd2)
    Tasks: 7
   Memory: 20.9M
      CPU: 7.889s
   CGroup: /system.slice/etcd2.service
           └─957 /usr/bin/etcd2

Jan 11 05:55:26 coreos1 systemd[1]: Started etcd2.
Jan 11 05:55:26 coreos1 etcd2[957]: added local member e380570f06dea90a [http://172.16.15.21:2380] to
Jan 11 05:55:26 coreos1 etcd2[957]: e380570f06dea90a is starting a new election at term 1
Jan 11 05:55:26 coreos1 etcd2[957]: e380570f06dea90a became candidate at term 2
Jan 11 05:55:26 coreos1 etcd2[957]: e380570f06dea90a received vote from e380570f06dea90a at term 2
Jan 11 05:55:26 coreos1 etcd2[957]: e380570f06dea90a became leader at term 2
Jan 11 05:55:26 coreos1 etcd2[957]: raft.node: e380570f06dea90a elected leader e380570f06dea90a at ter
Jan 11 05:55:26 coreos1 etcd2[957]: setting up the initial cluster version to 2.3
Jan 11 05:55:26 coreos1 etcd2[957]: set the initial cluster version to 2.3
Jan 11 05:55:26 coreos1 etcd2[957]: published {Name:92d7c022309e4cf2a4d6acd621471130 ClientURLs:[http:
When debugging, use journalctl to get detailed etcd2 messages.
core@coreos1 ~ $ journalctl -u etcd2
The following commands confirm whether the cluster is healthy.
core@coreos1 ~ $ etcdctl cluster-health
member e380570f06dea90a is healthy: got healthy result from http://172.16.15.21:2379
cluster is healthy
core@coreos1 ~ $ etcdctl member list
e380570f06dea90a: name=node01 peerURLs=http://172.16.15.21:2380 clientURLs=http://172.16.15.21:2379 isLeader=true
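Once more nodes join, it can be handy to pull the leader out of the member list in a script. This is a minimal sketch that assumes the output format shown above (whitespace-separated key=value fields with isLeader=true on the leader's line); leader_name is a hypothetical helper name.

```shell
#!/bin/sh
# Read `etcdctl member list` output on stdin and print the leader's name.
leader_name() {
  awk '/isLeader=true/ {
    for (i = 1; i <= NF; i++)
      if ($i ~ /^name=/) { sub(/^name=/, "", $i); print $i }
  }'
}
```

Usage would look like etcdctl member list | leader_name; lines without isLeader=true produce no output.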
The member files are also created under the etcd2 data directory.
core@coreos1 ~ $ sudo ls -l /var/lib/etcd2/member/
total 16
drwx------. 2 etcd etcd 4096 Jan 11 05:55 snap
drwx------. 2 etcd etcd 4096 Jan 11 05:55 wal
To check the settings etcd2 is currently running with:
core@coreos1 ~ $ cat /run/systemd/system/etcd2.service.d/20-cloudinit.conf
[Service]
Environment="ETCD_ADVERTISE_CLIENT_URLS=http://172.16.15.21:2379"
Environment="ETCD_DISCOVERY=https://discovery.etcd.io/9dd875ca6dd759d67445a681adde3875"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=http://172.16.15.21:2380"
Environment="ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379"
Environment="ETCD_LISTEN_PEER_URLS=http://172.16.15.21:2380"
Environment="ETCD_NAME=node01"
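Comparing this drop-in with the cloud-config shows the naming pattern: each etcd2 key is upper-cased, its dashes become underscores, and ETCD_ is prefixed. The helper below (etcd_env_name, a name invented for this sketch) just reproduces that pattern as observed in the output above.

```shell
#!/bin/sh
# Convert a cloud-config etcd2 key into the variable name seen in
# 20-cloudinit.conf, e.g. listen-peer-urls -> ETCD_LISTEN_PEER_URLS.
etcd_env_name() {
  printf 'ETCD_%s\n' "$(printf '%s' "$1" | tr 'a-z-' 'A-Z_')"
}
```

For example, etcd_env_name advertise-client-urls prints the variable name to grep for when checking which value etcd2 actually received.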
etcdctl
With the service and the cluster both confirmed healthy, we can use the etcdctl command to check that etcd's key/value store actually works.
set stores key = test, value = CoreOS testing
core@coreos1 ~ $ etcdctl set /test "CoreOS testing"
CoreOS testing
get reads the key test
core@coreos1 ~ $ etcdctl get /test
CoreOS testing
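The same set and get can also be done over HTTP against the client URL, since etcd2 exposes its v2 keys API under /v2/keys. The sketch below assumes the client URL from this post is reachable; ETCD_ENDPOINT, key_url, http_set, and http_get are names made up for this example.

```shell
#!/bin/sh
# Assumption: etcd2 answers on the client URL configured in this post.
ETCD_ENDPOINT=${ETCD_ENDPOINT:-http://172.16.15.21:2379}

# Build the v2 keys URL for a key such as /test.
key_url() {
  printf '%s/v2/keys%s\n' "$ETCD_ENDPOINT" "$1"
}

# Rough equivalent of: etcdctl set KEY VALUE
http_set() {
  curl -s -X PUT "$(key_url "$1")" --data-urlencode "value=$2"
}

# Rough equivalent of: etcdctl get KEY (prints the raw JSON response)
http_get() {
  curl -s "$(key_url "$1")"
}
```

Running http_set /test "CoreOS testing" and then http_get /test should return JSON containing the stored value, which is a useful check from a machine that does not have etcdctl installed.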
Other related commands include:
- backup: backup an etcd directory
- mk: make a new key with a given value
- mkdir: make a new directory
- rm: remove a key
- rmdir: removes the key if it is an empty directory or a key-value pair
- get: retrieve the value of a key
- ls: retrieve a directory
- set: set the value of a key
- setdir: create a new or existing directory
- update: update an existing key with a given value
- updatedir: update an existing directory
- watch: watch a key for changes
- exec-watch: watch a key for changes and exec an executable
- member: member add, remove and list subcommands
Troubleshooting
- Temporary failure in name resolution
If you hit the error below, make sure your network can actually reach http://discovery.etcd.io. For network configuration and usage, see https://benjr.tw/96370
core@coreos1 ~ $ journalctl -u etcd2
...
Jan 11 08:00:32 coreos1 etcd2[890]: error #0: dial tcp: lookup discovery.etcd.io: Temporary failure in name resolution
Jan 11 08:00:32 coreos1 etcd2[890]: cluster status check: error connecting to https://discovery.etcd.io, retrying in 8s
- has previously registered with discovery service
You need to request a new token.
core@coreos1 ~ $ journalctl -u etcd2
...
Jan 10 06:03:48 coreos1 etcd2[867]: member "09c2ea6e6df44dfda86ce9f8e2e64eb1" has previously registered with discovery service (https://discovery.etcd.io/<disco key>),
Jan 10 06:03:48 coreos1 etcd2[867]: But etcd could not find valid cluster configuration in the given data dir (/var/lib/etcd2)
Jan 10 06:03:48 coreos1 etcd2[867]: Please check the given data dir path if the previous bootstrap succeeded
Jan 10 06:03:48 coreos1 systemd[1]: etcd2.service: Main process exited, code=exited, status=1/FAILURE
Jan 10 06:03:48 coreos1 systemd[1]: Failed to start etcd2.
- server error Gateway Timeout
At first I planned a 3-node cluster and generated the token with https://discovery.etcd.io/new?size=3, but when the first node came up the etcd2 service was not healthy. The error messages were as follows.
core@coreos1 ~ $ etcdctl cluster-health
cluster may be unhealthy: failed to list members
Error: client: etcd cluster is unavailable or misconfigured
error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout
error #1: dial tcp 127.0.0.1:4001: getsockopt: connection refused
core@coreos1 ~ $ journalctl -u etcd2
... 18 coreos1 etcd2[970]: found 1 peer(s), waiting for 2 more
Jan 11 08:37:18 coreos1 etcd2[970]: error #0: client: etcd member https://discovery.etcd.io returns server error [Gateway Timeout]
Jan 11 08:37:18 coreos1 etcd2[970]: waiting for other nodes: error connecting to https://discovery.etcd.io, retrying in 2s
Jan 11 08:37:20 coreos1 etcd2[970]: found self e5f1821e81b8d32d in the cluster
Jan 11 08:37:20 coreos1 etcd2[970]: found 1 peer(s), waiting for 2 more
When an etcd2 token is created for a given number of nodes, all of those nodes must start the service before the cluster works properly. If you just want to try out etcd2 first, https://discovery.etcd.io/new?size=1 is enough.
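The three cases above can be recognized by their log text, so a small filter can turn the journal into hints. This is only a sketch built from the messages quoted in this post; diagnose_etcd2 is a hypothetical helper and the hint wording is my own.

```shell
#!/bin/sh
# Read etcd2 journal lines on stdin and print one hint per known error line.
diagnose_etcd2() {
  while IFS= read -r line; do
    case $line in
      *"Temporary failure in name resolution"*)
        echo "dns: check network and DNS so discovery.etcd.io resolves" ;;
      *"has previously registered with discovery service"*)
        echo "token: request a new discovery token" ;;
      *"waiting for "*" more"*)
        echo "size: start every member counted in the token size" ;;
    esac
  done
}
```

A possible invocation is journalctl -u etcd2 --no-pager | diagnose_etcd2 | sort -u, which collapses repeated log lines into one hint each.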
Other posts on etcd2 configuration and usage
- Adding and removing etcd2 nodes – https://benjr.tw/96449
- Disaster recovery for an etcd2 cluster – https://benjr.tw/96497
- Fault tolerance of an etcd2 cluster – https://benjr.tw/96688