
DM Cluster Migration

Hacker_小峰 posted on 2025-03-13


Because the rack hosting the original DM cluster's machines is being decommissioned, the DM cluster and the replication tasks running on it have to be migrated. The migration is done by scaling out first and scaling in afterwards; the tricky part, and the focus of this post, is moving the tasks that are actively replicating through dm-workers.

Current DM cluster status:

$ tiup dm display dm-001

ID               Role          Host        Ports      OS/Arch       Status     Data Dir                         Deploy Dir
--               ----          ----        -----      -------       ------     --------                         ----------
10.0.0.5:9093  alertmanager  10.0.0.5  9093/9094  linux/x86_64  Up         /data/dm-data/alertmanager-9093  /data/dm-deploy/alertmanager-9093
10.0.0.6:8265  dm-master     10.0.0.6  8265/8295  linux/x86_64  Healthy    /data01/dm-data/dm-master-8265   /data01/dm-deploy/dm-master-8265
10.0.0.5:8261  dm-master     10.0.0.5  8261/8291  linux/x86_64  Healthy|L  /data/dm-data/dm-master-8261     /data/dm-deploy/dm-master-8261
10.0.0.5:8263  dm-master     10.0.0.5  8263/8293  linux/x86_64  Healthy    /data03/dm-data/dm-master-8263   /data03/dm-deploy/dm-master-8263
10.0.0.6:8268  dm-worker     10.0.0.6  8268       linux/x86_64  Free       /data02/dm-data/dm-worker-8268   /data02/dm-deploy/dm-worker-8268
10.0.0.5:8262  dm-worker     10.0.0.5  8262       linux/x86_64  Bound      /data/dm-data/dm-worker-8262     /data01/dm-deploy/dm-worker-8262
10.0.0.5:8264  dm-worker     10.0.0.5  8264       linux/x86_64  Free       /data/dm-data/dm-worker-8264     /data02/dm-deploy/dm-worker-8264
10.0.0.5:8266  dm-worker     10.0.0.5  8266       linux/x86_64  Bound      /data/dm-data/dm-worker-8266     /data04/dm-deploy/dm-worker-8266
10.0.0.5:3000  grafana       10.0.0.5  3000       linux/x86_64  Up         -                                /data/dm-deploy/grafana-3000
10.0.0.5:9090  prometheus    10.0.0.5  9090       linux/x86_64  Up         /data/dm-data/prometheus-9090    /data/dm-deploy/prometheus-9090
Total nodes: 10

Migration goal: replace every node on 10.0.0.5 with the new machine 10.0.0.8.

The two machines should have identical hardware and software configurations, and ideally the same disk mount layout as well.

# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs         63G     0   63G   0% /dev
tmpfs            63G     0   63G   0% /dev/shm
tmpfs            63G  2.1M   63G   1% /run
tmpfs            63G     0   63G   0% /sys/fs/cgroup
/dev/sda3        50G   14G   37G  27% /
/dev/sda2       494M  170M  324M  35% /boot
/dev/sda4        40G  431M   40G   2% /var
/dev/sda5       1.1T  223M  1.1T   1% /data
/dev/sdb1       894G   34M  894G   1% /data01
/dev/sdc1       894G   34M  894G   1% /data02
/dev/sdd1       894G   34M  894G   1% /data03
/dev/sde1       894G   34M  894G   1% /data04
tmpfs            13G     0   13G   0% /run/user/0
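A quick way to verify that the new machine's data mounts match the old one is to compare the `/data*` mount points from `df -h` on both hosts. The helper below is a small sketch (the hosts in the comment are this post's examples, and it assumes passwordless SSH as the tidb user):

```shell
# list_data_mounts: filter `df -h` output down to the /data* mount points.
list_data_mounts() {
  awk '$NF ~ /^\/data/ {print $NF}' | sort
}

# Intended use -- an empty diff means the layouts match:
#   diff <(ssh tidb@10.0.0.5 df -h | list_data_mounts) \
#        <(ssh tidb@10.0.0.8 df -h | list_data_mounts)
```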

Scale out new dm-master and dm-worker nodes

Scale out new nodes to replace the old nodes that will be taken offline.

In the scale-out configuration file, pay close attention to disk directory assignment and ports: dm-master is fine on ordinary SAS disks, while dm-worker needs SSDs.

$ tiup dm scale-out dm-001 dm-scale-out.yaml
$ cat dm-scale-out.yaml
---
master_servers:
  - host: 10.0.0.8
    name: master-1
    # ssh_port: 22
    port: 8261
    peer_port: 8291
    deploy_dir: "/data01/dm-deploy/dm-master-8261"
    data_dir: "/data01/dm-data/dm-master-8261"
    log_dir: "/data01/dm-deploy/dm-master-8261/log"
  - host: 10.0.0.8
    name: master-2
    # ssh_port: 22
    port: 8263
    peer_port: 8293
    deploy_dir: "/data/dm-deploy/dm-master-8263"
    data_dir: "/data/dm-data/dm-master-8263"
    log_dir: "/data/dm-deploy/dm-master-8263/log"

worker_servers:
  - host: 10.0.0.8
  #  ssh_port: 22
    name: dm-10.0.0.8-8262
    port: 8262
    deploy_dir: "/data02/dm-deploy/dm-worker-8262"
    data_dir: "/data02/dm-data/dm-worker-8262"
    log_dir: "/data02/dm-deploy/dm-worker-8262/log"
  - host: 10.0.0.8
  #  ssh_port: 22
    name: dm-10.0.0.8-8264
    port: 8264
    deploy_dir: "/data03/dm-deploy/dm-worker-8264"
    data_dir: "/data03/dm-data/dm-worker-8264"
    log_dir: "/data03/dm-deploy/dm-worker-8264/log"
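After applying the topology above, the new instances can be checked with `tiup dm display dm-001`. A small filter over its output confirms everything came up on the new host (a sketch; it relies on the column layout shown earlier, i.e. ID, Role, Host, Ports, OS/Arch, Status, ...):

```shell
# nodes_on_host: print "ID Role Status" for every instance on the given host.
nodes_on_host() {
  awk -v host="$1" '$3 == host {print $1, $2, $6}'
}

# Intended use:
#   tiup dm display dm-001 | nodes_on_host 10.0.0.8
```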

Migrating dm-worker replication tasks [key step]

The core of the work is changing the binding between each data source and its DM-worker.

How do you change a dm-worker binding?

dmctl --master-addr <master-addr> operate-source show   # list the upstream data sources
dmctl --master-addr <master-addr> get-config source <source-id>   # view a data source's configuration directly

tiup dmctl --master-addr 10.0.0.8:8261 pause-task <task-name>   # the task must be paused before the dm-worker binding can be changed

dmctl --master-addr <master-addr> list-member --worker   # check the source-to-worker bindings

# In this example <source-id> is bound to dm-10.0.0.5-8262.
# The following command rebinds that data source to dm-10.0.0.8-8262:

tiup dmctl --master-addr 10.0.0.8:8261 transfer-source <source-id> dm-10.0.0.8-8262

1. View the data source configuration

tiup dmctl --master-addr 10.0.0.8:8261 operate-source show

{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": true,
            "msg": "",
            "source": "<source-id>",
            "worker": "dm-10.0.0.5-8262"
        },
        {
            "result": true,
            "msg": "",
            "source": "<source-id>",
            "worker": "dm-10.0.0.5-8266"
        }
    ]
}

tiup dmctl --master-addr 10.0.0.8:8261 get-config source <source-id>

2. Change the source-to-DM-worker binding with transfer-source

2.1 list-member: list the DM-worker bindings

dmctl --master-addr <master-addr> list-member --worker

dmctl --master-addr 10.0.0.8:8261 list-member --worker

{
    "result": true,
    "msg": "",
    "members": [
        {
            "worker": {
                "msg": "",
                "workers": [
                    {
                        "name": "dm-10.0.0.6-8268",
                        "addr": "10.0.0.6:8268",
                        "stage": "free",
                        "source": ""
                    },
                    {
                        "name": "dm-10.0.0.8-8262",
                        "addr": "10.0.0.8:8262",
                        "stage": "free",
                        "source": ""
                    },
                    {
                        "name": "dm-10.0.0.8-8264",
                        "addr": "10.0.0.8:8264",
                        "stage": "free",
                        "source": ""
                    },
                    {
                        "name": "dm-10.0.0.5-8262",
                        "addr": "10.0.0.5:8262",
                        "stage": "bound",
                        "source": "<source-id>"
                    },
                    {
                        "name": "dm-10.0.0.5-8266",
                        "addr": "10.0.0.5:8266",
                        "stage": "bound",
                        "source": "<source-id>"
                    }
                ]
            }
        }
    ]
}
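The JSON above can be flattened into worker/source pairs for a quick overview. The helper below is a sketch that assumes the exact field layout shown (one "name" and one "source" line per worker); if jq is installed, a proper JSON query would be more robust:

```shell
# bound_workers: print "worker<TAB>source" for each worker in list-member JSON.
bound_workers() {
  grep -E '"(name|source)":' | sed 's/.*: *"\([^"]*\)".*/\1/' | paste - -
}

# Intended use:
#   tiup dmctl --master-addr 10.0.0.8:8261 list-member --worker | bound_workers
```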

2.2 pause-task

Before changing a binding, DM checks whether the worker being unbound is running a replication task; if it is, the task must be paused first and resumed after the binding has been changed.

# First pause task <task-name> running on source <source-id>.

tiup dmctl --master-addr 10.0.0.8:8261 pause-task <task-name> 
tiup dmctl --master-addr 10.0.0.8:8261 query-status  <task-name>

2.3 transfer-source

# In this example <source-id> is bound to dm-10.0.0.5-8262.
# The following command rebinds that data source to dm-10.0.0.8-8262:

tiup dmctl --master-addr 10.0.0.8:8261  transfer-source <source-id>  dm-10.0.0.8-8262

2.4 resume-task

# Run dmctl --master-addr <master-addr> list-member --worker again to confirm the transfer took effect.

tiup dmctl --master-addr 10.0.0.8:8261 list-member --worker

tiup dmctl --master-addr 10.0.0.8:8261 resume-task <task-name> 

tiup dmctl --master-addr 10.0.0.8:8261 query-status  <task-name>

Change the binding of the next dm-worker:

tiup dmctl --master-addr 10.0.0.8:8261 pause-task <task-name2>
tiup dmctl --master-addr 10.0.0.8:8261 transfer-source <source-id> dm-10.0.0.8-8264
tiup dmctl --master-addr 10.0.0.8:8261 list-member --worker
tiup dmctl --master-addr 10.0.0.8:8261 resume-task <task-name2>
tiup dmctl --master-addr 10.0.0.8:8261 query-status <task-name2>
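The pause → transfer → resume sequence repeats for every task, so it can be scripted. Below is a dry-run sketch: it only echoes the commands so you can review them first (drop the `echo` to actually execute; `MASTER` and the example arguments are placeholders from this post):

```shell
MASTER=10.0.0.8:8261

# move_source: pause a task, rebind its source to a new worker, resume, verify.
# usage: move_source <task-name> <source-id> <new-worker>
move_source() {
  echo tiup dmctl --master-addr "$MASTER" pause-task "$1"
  echo tiup dmctl --master-addr "$MASTER" transfer-source "$2" "$3"
  echo tiup dmctl --master-addr "$MASTER" resume-task "$1"
  echo tiup dmctl --master-addr "$MASTER" query-status "$1"
}

# move_source <task-name>  <source-id> dm-10.0.0.8-8262
# move_source <task-name2> <source-id> dm-10.0.0.8-8264
```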

Migrating the monitoring nodes [tricky part]

Error:

{"code": 1, "error": "executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@10.0.0.8:22' {ssh_stderr:   FAILED: instance 0 in group 3: no address\n, ssh_stdout: Checking /data/dm-deploy/prometheus-9090/conf/prometheus.yml\n, ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /data/dm-deploy/prometheus-9090/bin/prometheus/promtool check config /data/dm-deploy/prometheus-9090/conf/prometheus.yml}, cause: Process exited with status 1: check config failed", "errorVerbose": "check config failed\nexecutor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@10.0.0.8:22' {ssh_stderr:   FAILED: instance 0 in group 3: no address\n, ssh_stdout: Checking /data/dm-deploy/prometheus-9090/conf/prometheus.yml\n, ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /data/dm-deploy/prometheus-9090/bin/prometheus/promtool check config ...

Passwordless SSH

ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub 10.1.0.0

Migrating grafana/prometheus/alertmanager with the same scale-out-then-scale-in approach kept failing with odd SSH errors. A post on the TiDB forum finally pointed out that the monitoring components must be scaled in first and scaled out afterwards; with that, the whole thing took five minutes (facepalm).

Scale in grafana, Prometheus, and alertmanager

Scale in the original nodes first, then scale out:
tiup dm scale-in dm-001 -N 10.1.1.1:xxxx

Scale out the monitoring nodes onto the new machine

tiup dm scale-out dm-001 /home/tidb/dm/dm-scale-out-grafana.yaml
$ cat dm-scale-out-grafana.yaml
---
monitoring_servers:
  - host: 10.0.0.8
grafana_servers:
  - host: 10.0.0.8
alertmanager_servers:
  - host: 10.0.0.8

Other notes

After the IP change, the backend IPs in the corresponding monitoring URLs also need to be updated.

Migrating the TiDB control machine (the tiup host) is covered in a separate article.


Copyright notice: this is an original article by a TiDB community user, released under the CC BY-NC-SA 4.0 license. When reposting, please include a link to the original and this notice.
