[K8S] Using MySQL Shell (mysqlsh) to Rescue a Broken, Unreachable MySQL Cluster
A colleague set up a MySQL Cluster (group replication) in k8s,
with a MySQL Router in front that routes traffic to the primary node, plus a MySQL Operator on top.
I don't fully understand all the details, but I'll jot down the commands that came in handy~
To run commands from the MySQL Operator,
first use kubectl exec
to get a shell inside the mysql-operator pod:
kubectl exec -it mysql-operator-xxx -- bash
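If the exact pod name isn't known (the mysql-operator-xxx above is just a placeholder), something like the following can help locate it; the namespace-wide listing and the grep pattern are only assumptions about how the operator pod is named in your cluster:

kubectl get pods -A | grep mysql-operator

Then plug the actual pod name (and its namespace via -n) into the kubectl exec command above.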
Next, run mysqlsh,
i.e. MySQL Shell.
Without the --sql
flag it starts in JavaScript mode by default, which is what you use to work with the MySQL administration objects.
Pass the --uri
flag to specify which MySQL node to connect to;
in the example below I'm pointing it at the endpoint exposed by MySQL Router:
$ mysqlsh --uri "root:[email protected]:3306/mysql" MySQL Shell 8.0.30 Copyright (c) 2016, 2022, Oracle and/or its affiliates. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type '\help' or '\?' for help; '\quit' to exit. WARNING: Using a password on the command line interface can be insecure. Creating a session to '[email protected]:3306/mysql' Fetching schema names for autocompletion... Press ^C to stop. Your MySQL connection id is 132 Server version: 8.0.30 MySQL Community Server - GPL Default schema set to `mysql`.
Once connected, you can call dba.getCluster()
to get the current cluster object,
and then run cluster.status()
to check its state.
In the example below, however, getting the cluster object failed,
because the MySQL cluster as a whole was in a pretty broken state at the time,
which is exactly why I had to go into the MySQL Operator to debug it…
MySQL innodb-cluster-instances.com-mysqlha.svc.cluster.local:3306 mysql JS > var cluster = dba.getCluster()
Dba.getCluster: This function is not available through a session to a standalone instance (metadata exists, instance belongs to that metadata, but GR is not active) (MYSQLSH 51314)
MySQL innodb-cluster-instances.com-mysqlha.svc.cluster.local:3306 mysql JS > cluster.status();
TypeError: Cannot read property 'status' of undefined
When a working cluster object can't be obtained,
you can try dba.rebootClusterFromCompleteOutage()
to reboot the cluster…
During this process it may ask whether certain members of the cluster should be rejoined,
and if some can't be reached, it may ask whether they should be removed; you have to judge these case by case~
In the example below, the instance I had initially connected to with mysqlsh
was not the one holding the most up-to-date data in the cluster, so it told me to switch to the most up-to-date one:
MySQL innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306 mysql JS > var cluster = dba.rebootClusterFromCompleteOutage();
Restoring the cluster 'innodb_cluster' from complete outage...

The instance 'innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y

Could not open a connection to 'innodb-cluster-0.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306': 'MySQL Error 2061: Could not open connection to 'innodb-cluster-0.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306': Authentication plugin 'caching_sha2_password' reported error: Authentication requires secure connection.'
Would you like to remove it from the cluster's metadata? [y/N]: n

Dba.rebootClusterFromCompleteOutage: The active session instance (innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306) isn't the most updated in comparison with the ONLINE instances of the Cluster's metadata. Please use the most up to date instance: 'innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306'. (RuntimeError)
MySQL innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306 mysql JS > Bye!
Following that instruction, I reconnected with mysqlsh to the node holding the most up-to-date data:
$ mysqlsh --uri "root:password@innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306/mysql" MySQL Shell 8.0.30 Copyright (c) 2016, 2022, Oracle and/or its affiliates. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type '\help' or '\?' for help; '\quit' to exit. WARNING: Using a password on the command line interface can be insecure. Creating a session to 'root@innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306/mysql' Fetching schema names for autocompletion... Press ^C to stop. Your MySQL connection id is 169 Server version: 8.0.30 MySQL Community Server - GPL Default schema set to `mysql`.
Run dba.rebootClusterFromCompleteOutage()
once more;
this time the cluster was rebooted successfully:
MySQL innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306 ssl mysql JS > var cluster = dba.rebootClusterFromCompleteOutage();
Restoring the cluster 'innodb_cluster' from complete outage...

The instance 'innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y

Could not open a connection to 'innodb-cluster-0.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306': 'MySQL Error 2061: Could not open connection to 'innodb-cluster-0.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306': Authentication plugin 'caching_sha2_password' reported error: Authentication requires secure connection.'
Would you like to remove it from the cluster's metadata? [y/N]: n

Validating instance configuration at innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306...

This instance reports its own address as innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306

Instance configuration is suitable.
* Waiting for seed instance to become ONLINE...
innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306 was restored.
Rejoining 'innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306' to the cluster.

Validating instance configuration at innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306...

This instance reports its own address as innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306

Instance configuration is suitable.
Rejoining instance 'innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306' to cluster 'innodb_cluster'...

Re-creating recovery account...
NOTE: User 'mysql_innodb_cluster_1001'@'%' already existed at instance 'innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306'. It will be deleted and created again with a new password.

The instance 'innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306' was successfully rejoined to the cluster.

The cluster was successfully rebooted.
Exit mysqlsh and connect to the MySQL Router endpoint again:
mysqlsh --uri "root:[email protected]:3306/mysql"
Now the cluster object can be obtained the normal way and its status checked.
In the example below, it flagged that the metadata for the two nodes innodb-cluster-0 and innodb-cluster-1 wasn't properly registered,
and that innodb-cluster-2 was offline (the whole set was broken…):
MySQL innodb-cluster.test-mysqlha.svc.cluster.local:3306 ssl mysql JS > var cluster = dba.getCluster()
MySQL innodb-cluster.test-mysqlha.svc.cluster.local:3306 ssl mysql JS > cluster.status()
{
    "clusterName": "innodb_cluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306",
        "ssl": "VERIFY_IDENTITY",
        "status": "OK_NO_TOLERANCE_PARTIAL",
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active.",
        "topology": {
            "innodb-cluster-0.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306": {
                "address": "innodb-cluster-0.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306",
                "instanceErrors": [
                    "NOTE: instance server_id is not registered in the metadata. Use cluster.rescan() to update the metadata."
                ],
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "recovery": {
                    "state": "ON"
                },
                "recoveryStatusText": "Distributed recovery in progress",
                "role": "HA",
                "status": "RECOVERING",
                "version": "8.0.30"
            },
            "innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306": {
                "address": "innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306",
                "instanceErrors": [
                    "WARNING: The replication recovery account in use by the instance is not stored in the metadata. Use Cluster.rescan() to update the metadata."
                ],
                "memberRole": "PRIMARY",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.30"
            },
            "innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306": {
                "address": "innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306",
                "instanceErrors": [
                    "NOTE: group_replication is stopped."
                ],
                "memberRole": "SECONDARY",
                "memberState": "OFFLINE",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "(MISSING)",
                "version": "8.0.30"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306"
}
Run cluster.rescan()
to rescan the cluster…
It can register nodes that are already part of the cluster but whose metadata hasn't been recorded yet:
MySQL innodb-cluster.test-mysqlha.svc.cluster.local:3306 ssl mysql JS > cluster.rescan();
Rescanning the cluster...

Result of the rescanning operation for the 'innodb_cluster' cluster:
{
    "name": "innodb_cluster",
    "newTopologyMode": null,
    "newlyDiscoveredInstances": [
        {
            "host": "innodb-cluster-0.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306",
            "member_id": "5909e59c-8b48-11ed-8b88-0687c9d0276a",
            "name": null,
            "version": "8.0.30"
        }
    ],
    "unavailableInstances": [
        {
            "host": "innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306",
            "label": "innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306",
            "member_id": "fe619839-8692-11ed-84c7-aaa2840857fc"
        }
    ],
    "updatedInstances": []
}

A new instance 'innodb-cluster-0.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306' was discovered in the cluster.
Would you like to add it to the cluster metadata? [Y/n]: y
Adding instance to the cluster metadata...
The instance 'innodb-cluster-0.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306' was successfully added to the cluster metadata.

The instance 'innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306' is no longer part of the cluster. The instance is either offline or left the HA group. You can try to add it to the cluster again with the cluster.rejoinInstance('innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306') command or you can remove it from the cluster configuration.
Would you like to remove it from the cluster metadata? [Y/n]: n
As for nodes that are offline or not in the cluster,
you can try cluster.rejoinInstance()
to bring them back in:
MySQL innodb-cluster.test-mysqlha.svc.cluster.local:3306 ssl mysql JS > cluster.rejoinInstance('innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306')
Validating instance configuration at innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306...

This instance reports its own address as innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306

Instance configuration is suitable.
Rejoining instance 'innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306' to cluster 'innodb_cluster'...

Re-creating recovery account...
NOTE: User 'mysql_innodb_cluster_1002'@'%' already existed at instance 'innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306'. It will be deleted and created again with a new password.

The instance 'innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306' was successfully rejoined to the cluster.
Once that's all done, try cluster.rescan()
again; this time nothing new turns up.
Then run cluster.status()
and you can see that innodb-cluster-0 is currently recovering,
innodb-cluster-1 is the primary node, and innodb-cluster-2 is a secondary:
MySQL innodb-cluster.test-mysqlha.svc.cluster.local:3306 ssl mysql JS > cluster.rescan();
Rescanning the cluster...

Result of the rescanning operation for the 'innodb_cluster' cluster:
{
    "name": "innodb_cluster",
    "newTopologyMode": null,
    "newlyDiscoveredInstances": [],
    "unavailableInstances": [],
    "updatedInstances": []
}

MySQL innodb-cluster.test-mysqlha.svc.cluster.local:3306 ssl mysql JS > cluster.status()
{
    "clusterName": "innodb_cluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306",
        "ssl": "VERIFY_IDENTITY",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure. 1 member is not active.",
        "topology": {
            "innodb-cluster-0.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306": {
                "address": "innodb-cluster-0.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306",
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "recovery": {
                    "state": "ON"
                },
                "recoveryStatusText": "Distributed recovery in progress",
                "role": "HA",
                "status": "RECOVERING",
                "version": "8.0.30"
            },
            "innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306": {
                "address": "innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306",
                "memberRole": "PRIMARY",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.30"
            },
            "innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306": {
                "address": "innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306",
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.30"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306"
}
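If you want to double-check from the SQL side as well, group membership can also be read straight out of performance_schema; a quick sketch, run after switching to SQL mode with \sql:

SELECT MEMBER_HOST, MEMBER_STATE, MEMBER_ROLE
FROM performance_schema.replication_group_members;

The MEMBER_STATE column should show ONLINE (or RECOVERING) for each member once the cluster is healthy.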
And that's the handful of commands I used this time
to rescue a broken MySQL Cluster with MySQL Shell~
Reference: Restart Innodb MySQL Cluster after Complete outage (All node Down)