[K8S] 使用 MySQL Shell (mysqlsh) 救援損毀無法連線的 MySQl 叢集

[K8S] 使用 MySQL Shell (mysqlsh) 救援損毀無法連線的 MySQl 叢集

同事在 k8s 中建立了一組 MySQL Cluster (group replication),

有個 MySQL Router 來負責導向主要節點,同時也加了一個 MySQL operator,

不太懂細節,不過還是把一些會用到的指令記錄一下吧~

 

要在 MySQL Operator 裡執行指令的話,

先用  kubectl exec 進到 mysql-operator pod 裡面:

kubectl exec -it mysql-operator-xxx -- bash

 

接著執行  mysqlsh ,也就是 MySQL Shell,

平常不加  --sql 參數時,預設是用 JavaScript 的方式來操作 MySQL 物件。

要加  --uri 參數指定要連線的 MySQL 節點,

像我在下面指定的是 MySQL Router 提供的節點:

$ mysqlsh --uri "root:password@innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306/mysql"

MySQL Shell 8.0.30

Copyright (c) 2016, 2022, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type '\help' or '\?' for help; '\quit' to exit.
WARNING: Using a password on the command line interface can be insecure.
Creating a session to 'root@innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306/mysql'
Fetching schema names for autocompletion... Press ^C to stop.
Your MySQL connection id is 132
Server version: 8.0.30 MySQL Community Server - GPL
Default schema set to `mysql`.

 

連上之後,可以使用  dba.getCluster() 取得目前的 cluster 物件,

然後可以執行  cluster.status() 查看狀態。

不過下面的例子裡,取得 cluster 物件失敗了,

因為當時的 MySQL cluster 整個有點壞掉,

這也是為什麼我需要進到 MySQL Operator 來除錯…

 MySQL  innodb-cluster-instances.com-mysqlha.svc.cluster.local:3306  mysql  JS > var cluster = dba.getCluster()
Dba.getCluster: This function is not available through a session to a standalone instance (metadata exists, instance belongs to that metadata, but GR is not active) (MYSQLSH 51314)

 MySQL  innodb-cluster-instances.com-mysqlha.svc.cluster.local:3306  mysql  JS > cluster.status();
TypeError: Cannot read property 'status' of undefined

 

無法取得正常執行狀態的 cluster 物件,

這時可以試著用  dba.rebootClusterFromCompleteOutage() 來將 cluster 重開機…

這過程中可能會問你某些叢集中的機器是否要重新加入,

也有可能因為連線不上,而問你是否要移除某些機器,這些都要自己判斷~

以下面例子來說,因為我一開始用 mysqlsh 連上的機器,

不是叢集中保有最新資料的那台,所以它叫我換成最新的那台:

 MySQL  innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306  mysql  JS > var cluster = dba.rebootClusterFromCompleteOutage();
Restoring the cluster 'innodb_cluster' from complete outage...

The instance 'innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y

Could not open a connection to 'innodb-cluster-0.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306': 'MySQL Error 2061: Could not open connection to 'innodb-cluster-0.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306': Authentication plugin 'caching_sha2_password' reported error: Authentication requires secure connection.'
Would you like to remove it from the cluster's metadata? [y/N]: n

Dba.rebootClusterFromCompleteOutage: The active session instance (innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306) isn't the most updated in comparison with the ONLINE instances of the Cluster's metadata. Please use the most up to date instance: 'innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306'. (RuntimeError)

 MySQL  innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306  mysql  JS >
Bye!

 

依照指示,用 mysqlsh 重新連上有最新資料的節點:

$ mysqlsh --uri "root:password@innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306/mysql"

MySQL Shell 8.0.30

Copyright (c) 2016, 2022, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type '\help' or '\?' for help; '\quit' to exit.
WARNING: Using a password on the command line interface can be insecure.
Creating a session to 'root@innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306/mysql'
Fetching schema names for autocompletion... Press ^C to stop.
Your MySQL connection id is 169
Server version: 8.0.30 MySQL Community Server - GPL
Default schema set to `mysql`.

 

再跑一次  dba.rebootClusterFromCompleteOutage() ,

這次有成功將叢集重新啟動了:

MySQL  innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306 ssl  mysql  JS > var cluster = dba.rebootClusterFromCompleteOutage();
Restoring the cluster 'innodb_cluster' from complete outage...

The instance 'innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y

Could not open a connection to 'innodb-cluster-0.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306': 'MySQL Error 2061: Could not open connection to 'innodb-cluster-0.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306': Authentication plugin 'caching_sha2_password' reported error: Authentication requires secure connection.'
Would you like to remove it from the cluster's metadata? [y/N]: n

Validating instance configuration at innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306...

This instance reports its own address as innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306

Instance configuration is suitable.
* Waiting for seed instance to become ONLINE...
innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306 was restored.
Rejoining 'innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306' to the cluster.
Validating instance configuration at innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306...

This instance reports its own address as innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306

Instance configuration is suitable.
Rejoining instance 'innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306' to cluster 'innodb_cluster'...

Re-creating recovery account...
NOTE: User 'mysql_innodb_cluster_1001'@'%' already existed at instance 'innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306'. It will be deleted and created again with a new password.
The instance 'innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306' was successfully rejoined to the cluster.

The cluster was successfully rebooted.

 

退出 mysqlsh,重新再連上 MySQL Router 節點:

mysqlsh --uri "root:password@innodb-cluster.test-mysqlha.svc.cluster.local:3306/mysql"

 

現在可以用正常的方法取得 cluster 物件,並查看狀態。

以下例來說,它有發現 innodb-cluster-0 和 innodb-cluster-1 這兩個節點沒有註冊,

而 innodb-cluster-2 這個節點離線了 (整組都壞了…):

 MySQL  innodb-cluster.test-mysqlha.svc.cluster.local:3306 ssl  mysql  JS > var cluster = dba.getCluster()
 MySQL  innodb-cluster.test-mysqlha.svc.cluster.local:3306 ssl  mysql  JS > cluster.status()
{
    "clusterName": "innodb_cluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306",
        "ssl": "VERIFY_IDENTITY",
        "status": "OK_NO_TOLERANCE_PARTIAL",
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active.",
        "topology": {
            "innodb-cluster-0.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306": {
                "address": "innodb-cluster-0.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306",
                "instanceErrors": [
                    "NOTE: instance server_id is not registered in the metadata. Use cluster.rescan() to update the metadata."
                ],
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "recovery": {
                    "state": "ON"
                },
                "recoveryStatusText": "Distributed recovery in progress",
                "role": "HA",
                "status": "RECOVERING",
                "version": "8.0.30"
            },
            "innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306": {
                "address": "innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306",
                "instanceErrors": [
                    "WARNING: The replication recovery account in use by the instance is not stored in the metadata. Use Cluster.rescan() to update the metadata."
                ],
                "memberRole": "PRIMARY",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.30"
            },
            "innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306": {
                "address": "innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306",
                "instanceErrors": [
                    "NOTE: group_replication is stopped."
                ],
                "memberRole": "SECONDARY",
                "memberState": "OFFLINE",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "(MISSING)",
                "version": "8.0.30"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306"
}

 

執行  cluster.rescan() 重新掃瞄…

可以把已經在叢集裡,但尚未存有相關 metadata 的新節點加進來:

MySQL  innodb-cluster.test-mysqlha.svc.cluster.local:3306 ssl  mysql  JS > cluster.rescan();
Rescanning the cluster...

Result of the rescanning operation for the 'innodb_cluster' cluster:
{
    "name": "innodb_cluster",
    "newTopologyMode": null,
    "newlyDiscoveredInstances": [
        {
            "host": "innodb-cluster-0.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306",
            "member_id": "5909e59c-8b48-11ed-8b88-0687c9d0276a",
            "name": null,
            "version": "8.0.30"
        }
    ],
    "unavailableInstances": [
        {
            "host": "innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306",
            "label": "innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306",
            "member_id": "fe619839-8692-11ed-84c7-aaa2840857fc"
        }
    ],
    "updatedInstances": []
}

A new instance 'innodb-cluster-0.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306' was discovered in the cluster.
Would you like to add it to the cluster metadata? [Y/n]: y
Adding instance to the cluster metadata...
The instance 'innodb-cluster-0.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306' was successfully added to the cluster metadata.

The instance 'innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306' is no longer part of the cluster.
The instance is either offline or left the HA group. You can try to add it to the cluster again with the cluster.rejoinInstance('innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306') command or you can remove it from the cluster configuration.
Would you like to remove it from the cluster metadata? [Y/n]: n

 

至於離線或不在叢集裡的節點,

可以試試  cluster.rejoinInstance() 把它加進來:

 MySQL  innodb-cluster.test-mysqlha.svc.cluster.local:3306 ssl  mysql  JS > cluster.rejoinInstance('innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306')
Validating instance configuration at innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306...

This instance reports its own address as innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306

Instance configuration is suitable.
Rejoining instance 'innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306' to cluster 'innodb_cluster'...

Re-creating recovery account...
NOTE: User 'mysql_innodb_cluster_1002'@'%' already existed at instance 'innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306'. It will be deleted and created again with a new password.
The instance 'innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306' was successfully rejoined to the cluster.

 

都完成後,再次試試  cluster.rescan() ,看起來沒有新東西了。

再執行  cluster.status() ,可以發現 innodb-cluster-0 目前正在復原中,

innodb-cluster-1 是 Primary 主節點,innodb-cluster-2 是副節點:

 MySQL  innodb-cluster.test-mysqlha.svc.cluster.local:3306 ssl  mysql  JS > cluster.rescan();
Rescanning the cluster...

Result of the rescanning operation for the 'innodb_cluster' cluster:
{
    "name": "innodb_cluster",
    "newTopologyMode": null,
    "newlyDiscoveredInstances": [],
    "unavailableInstances": [],
    "updatedInstances": []
}

 MySQL  innodb-cluster.test-mysqlha.svc.cluster.local:3306 ssl  mysql  JS > cluster.status()
{
    "clusterName": "innodb_cluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306",
        "ssl": "VERIFY_IDENTITY",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure. 1 member is not active.",
        "topology": {
            "innodb-cluster-0.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306": {
                "address": "innodb-cluster-0.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306",
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "recovery": {
                    "state": "ON"
                },
                "recoveryStatusText": "Distributed recovery in progress",
                "role": "HA",
                "status": "RECOVERING",
                "version": "8.0.30"
            },
            "innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306": {
                "address": "innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306",
                "memberRole": "PRIMARY",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.30"
            },
            "innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306": {
                "address": "innodb-cluster-2.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306",
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.30"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "innodb-cluster-1.innodb-cluster-instances.test-mysqlha.svc.cluster.local:3306"
}

 

以上,就是這次使用 MySQL Shell 救援損毀的 MySQL Cluster 時,

有用到的一些指令囉~

 

參考資料:Restart Innodb MySQL Cluster after Complete outage(All node Down)

(本頁面已被瀏覽過 512 次)

發佈留言

發佈留言必須填寫的電子郵件地址不會公開。 必填欄位標示為 *

這個網站採用 Akismet 服務減少垃圾留言。進一步了解 Akismet 如何處理網站訪客的留言資料